dataset.models.metrics

Metrics

class Metrics(*args, **kwargs)[source]

Base metrics evaluation class

This class is not supposed to be instantiated directly. Use specific child classes instead (e.g. ClassificationMetrics).

Examples

m = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
m.evaluate(['sensitivity', 'specificity'], multiclass='micro')
evaluate(metrics, agg='mean', *args, **kwargs)[source]

Calculates metrics

Parameters
  • metrics (str or list of str) – metric names

  • agg (str) – inter-batch aggregation type

  • args – metric-specific parameters

  • kwargs – metric-specific parameters

Returns

If metrics is a list, then a dict is returned:
  • key - metric name

  • value - metric value

Return type

metric value or dict
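
For instance, with a ClassificationMetrics instance m as in the example above, a single name yields a value while a list yields a dict keyed by metric name (the numbers below are illustrative only):

m.evaluate('accuracy', multiclass='micro')                    # e.g. 0.87
m.evaluate(['accuracy', 'f1_score'], multiclass='micro')      # e.g. {'accuracy': 0.87, 'f1_score': 0.85}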

ClassificationMetrics

class ClassificationMetrics(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]

Bases: batchflow.models.metrics.base.Metrics

Metrics to assess classification models

Parameters
  • targets (np.array) – Ground-truth labels / probabilities / logits

  • predictions (np.array) – Predicted labels / probabilities / logits

  • num_classes (int) – the number of classes (default is None)

  • fmt ('proba', 'logits', 'labels') – whether arrays contain probabilities, logits or labels

  • axis (int) – a class axis (default is None)

  • threshold (float) – A probability level for binarization (lower values become 0, equal or greater values become 1)

Notes

  • Input arrays (targets and predictions) might be vectors or multidimensional arrays, where the first dimension represents batch items. The latter is useful for pixel-level metrics.

  • Both targets and predictions usually contain the same kind of data (labels, probabilities or logits). However, targets might be labels, while predictions are probabilities / logits (see the sketch after these notes). For that to work:

    • targets should have a shape that is exactly 1 dimension smaller than the predictions shape;

    • axis should point to that dimension;

    • fmt should specify the format of predictions.

  • When axis is specified, predictions should be a one-hot array with class information provided in the given axis (class probabilities or logits). In this case targets can contain labels (see above) or probabilities / logits in the very same axis.

  • If fmt is ‘labels’, num_classes should be specified. Due to randomness, any given batch may lack items of some classes, so the full set of labels cannot be inferred simply as labels.max().

  • If fmt is ‘proba’ or ‘logits’, then axis points to the one-hot dimension. However, if axis is None, two-class classification is assumed and targets / predictions should contain probabilities or logits for the positive class only.
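
A minimal sketch of the mixed case described above (labels as targets, class probabilities as predictions); the arrays, shapes and values are illustrative only:

import numpy as np

targets = np.array([0, 2, 1, 2])                  # labels, shape (4,)
predictions = np.array([[0.8, 0.1, 0.1],          # class probabilities, shape (4, 3)
                        [0.1, 0.2, 0.7],
                        [0.2, 0.6, 0.2],
                        [0.3, 0.3, 0.4]])
m = ClassificationMetrics(targets, predictions, fmt='proba', axis=1, num_classes=3)
m.evaluate('accuracy', multiclass='micro')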

Metrics

All metrics return:

  • a single value if input is a vector for a 2-class task.

  • a single value if input is a vector for a multiclass task and multiclass averaging is enabled.

  • a vector with batch size items if input is a multidimensional array (e.g. images or sequences) and there are just 2 classes or multiclass averaging is on.

  • a vector with num_classes items if input is a vector for multiclass case without averaging.

  • a 2d array (batch_items, num_classes) for multidimensional inputs in a multiclass case without averaging.

Note

Count-based metrics (true_positive, false_positive, etc.) do not support multiclass averaging; they always return counts for each class separately. For multiclass tasks, rate metrics such as true_positive_rate, false_positive_rate, etc., may be more convenient.
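
For example, with a 10-class setup the count metric returns one value per class, while the corresponding rate metric can be averaged (shapes are illustrative):

m = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
m.evaluate('true_positive')                              # counts, one per class
m.evaluate('true_positive_rate', multiclass='macro')     # a single averaged rate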

Multiclass metrics

In the multiclass case, metrics can be calculated with or without class averaging.

Available methods are:

  • None - no averaging, calculate metrics for each class individually (one-vs-all)

  • ‘micro’ - calculate metrics globally by counting the total true positives, false negatives, false positives, etc. across all classes

  • ‘macro’ - calculate metrics for each class and take their mean.

Examples

metrics = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate(['sensitivity', 'specificity'], multiclass='macro')
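
The same metric with other averaging modes, as a sketch (output shapes follow the rules listed above):

metrics.evaluate('sensitivity', multiclass=None)      # one value per class, shape (10,)
metrics.evaluate('sensitivity', multiclass='micro')   # a single value from global counts
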
accuracy()[source]

Accuracy of detecting all the classes combined

append(metrics)[source]

Append the confusion matrix with data from another metrics instance
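
A possible pattern for accumulating per-batch results, assuming (as the description above suggests) that append merges another instance's confusion-matrix data into this one; names are illustrative:

m = ClassificationMetrics(targets_0, predictions_0, num_classes=10, fmt='labels')
m.append(ClassificationMetrics(targets_1, predictions_1, num_classes=10, fmt='labels'))
m.evaluate('accuracy', multiclass='micro')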

condition_negative(label=None, *args, **kwargs)[source]
condition_positive(label=None, *args, **kwargs)[source]
property confusion_matrix
copy()[source]

Return a duplicate containing only the confusion matrix

diagnostics_odds_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
f1_score(*args, **kwargs)[source]

Compute f1-score

false_discovery_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_negative(label=None, *args, **kwargs)[source]
false_negative_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_omission_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_positive(label=None, *args, **kwargs)[source]
false_positive_rate(*args, when_zero=(1, 0), **kwargs)[source]
free()[source]

Free memory allocated for intermediate data

jaccard(*args, **kwargs)[source]
negative_likelihood_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
negative_predictive_value(*args, when_zero=(0, 1), **kwargs)[source]
one_hot(inputs)[source]

Convert an array of labels into a one-hot array

plot_confusion_matrix(classes=None, normalize=False, **kwargs)[source]

Plot confusion matrix.

Parameters
  • classes (sequence, optional) – Sequence of class labels.

  • normalize (bool) – Whether to normalize confusion matrix over target classes.
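
A usage sketch, continuing the example above (the class labels passed are illustrative):

metrics.plot_confusion_matrix(classes=list(range(10)), normalize=True)
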

positive_likelihood_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
positive_predictive_value(*args, when_zero=(0, 1), **kwargs)[source]
prediction_negative(label=None, *args, **kwargs)[source]
prediction_positive(label=None, *args, **kwargs)[source]
prevalence(*args, when_zero=(0, 0), **kwargs)[source]

Notes

The when_zero parameter does not really matter in this case, since total_population is never zero when targets are not empty.

total_population(*args, **kwargs)[source]
true_negative(label=None, *args, **kwargs)[source]
true_negative_rate(*args, when_zero=(0, 1), **kwargs)[source]
true_positive(label=None, *args, **kwargs)[source]
true_positive_rate(*args, when_zero=(0, 1), **kwargs)[source]
update(metrics)[source]

Update the confusion matrix with data from another metrics instance

SegmentationMetricsByPixels

class SegmentationMetricsByPixels(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]

Bases: batchflow.models.metrics.classify.ClassificationMetrics

Metrics to assess segmentation models pixel-wise

Notes

Rate metrics are evaluated for each item independently, so there are two levels of metric aggregation:

  • multi-class averaging

  • dataset aggregation.

For instance, if you have a dataset of 100 pictures (each of size 256x256) with 10 classes and you need to calculate the accuracy of semantic segmentation, then:

  • evaluate(['accuracy'], agg=None, multiclass=None) will return an array of shape (100, 10) containing the accuracy of each class for each image separately.

  • evaluate(['accuracy'], agg='mean', multiclass=None) will return a vector of shape (10,) containing the accuracy of each class averaged across all images.

  • evaluate(['accuracy'], agg=None, multiclass='macro') will return a vector of shape (100,) containing the accuracy of each image averaged across all classes.

  • evaluate(['accuracy'], agg='mean', multiclass='macro') will return a single value: the average accuracy over all classes and images combined.

The default values are agg='mean', multiclass='macro'.

For multi-class averaging see ClassificationMetrics.

Examples

metrics = SegmentationMetricsByPixels(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate('specificity')
metrics.evaluate(['sensitivity', 'jaccard'], agg='mean', multiclass=None)
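
The aggregation combinations described in the Notes above, sketched for the 100-image, 10-class setup (shapes are illustrative):

metrics.evaluate('accuracy', agg=None, multiclass=None)        # shape (100, 10)
metrics.evaluate('accuracy', agg='mean', multiclass=None)      # shape (10,)
metrics.evaluate('accuracy', agg=None, multiclass='macro')     # shape (100,)
metrics.evaluate('accuracy', agg='mean', multiclass='macro')   # a single value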

SegmentationMetricsByInstances

class SegmentationMetricsByInstances(targets, predictions, fmt='proba', num_classes=None, axis=None, skip_bg=True, threshold=0.5, iot=0.5, calc=True)[source]

Bases: batchflow.models.metrics.classify.ClassificationMetrics

Metrics to assess segmentation models by instances (i.e. connected components of one class, e.g. cancer nodules or faces)

Parameters

iot (float) – if the ratio of a predicted instance size to the corresponding target size is >= iot, then the instance is considered correctly predicted (true positive).

Notes

For other parameters see ClassificationMetrics.
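
A hedged instantiation sketch, mirroring the pixel-wise example above; the iot value is illustrative:

metrics = SegmentationMetricsByInstances(targets, predictions, num_classes=2, fmt='labels', iot=0.7)
metrics.evaluate('true_positive_rate')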

condition_positive(label=None, *args, **kwargs)[source]
free()[source]

Free memory allocated for intermediate data

prediction_positive(label=None, *args, **kwargs)[source]
total_population(label=None, *args, **kwargs)[source]
true_negative(label=None, *args, **kwargs)[source]
true_positive(label=None, *args, **kwargs)[source]