dataset.models.metrics

Metrics

class Metrics(*args, **kwargs)[source]

Base metrics evaluation class

This class is not supposed to be instantiated directly. Use specific child classes instead (e.g. ClassificationMetrics).

Examples

m = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
m.evaluate(['sensitivity', 'specificity'], multiclass='micro')
evaluate(metrics, agg='mean', *args, **kwargs)[source]

Calculates metrics

Parameters
  • metrics (str or list of str) – metric names

  • agg (str) – inter-batch aggregation type

  • args – metric-specific parameters

  • kwargs – metric-specific parameters

Returns

If metrics is a list, then a dict is returned:
  • key - metric name

  • value - metric value

Return type

metric value or dict
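
For instance, with a ClassificationMetrics instance m as in the example above, a single name yields a value while a list yields a dict keyed by metric name (the numbers below are illustrative only):

m.evaluate('accuracy', multiclass='micro')                    # e.g. 0.87
m.evaluate(['accuracy', 'f1_score'], multiclass='micro')      # e.g. {'accuracy': 0.87, 'f1_score': 0.85}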

ClassificationMetrics

class ClassificationMetrics(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]

Bases: batchflow.models.metrics.base.Metrics

Metrics to assess classification models

Parameters
  • targets (np.array) – Ground-truth labels / probabilities / logits

  • predictions (np.array) – Predicted labels / probabilities / logits

  • num_classes (int) – the number of classes (default is None)

  • fmt ('proba', 'logits', 'labels') – whether arrays contain probabilities, logits or labels

  • axis (int) – a class axis (default is None)

  • threshold (float) – A probability level for binarization (lower values become 0, equal or greater values become 1)

Notes

  • Input arrays (targets and predictions) might be vectors or multidimensional arrays, where the first dimension represents batch items. The latter is useful for pixel-level metrics.

  • Both targets and predictions usually contain the same kind of data (labels, probabilities or logits). However, targets might be labels, while predictions are probabilities / logits (see the sketch after these notes). For that to work:

    • targets should have a shape that is exactly 1 dimension smaller than the predictions shape;

    • axis should point to that dimension;

    • fmt should specify the format of predictions.

  • When axis is specified, predictions should be a one-hot array with class information provided in the given axis (class probabilities or logits). In this case targets can contain labels (see above) or probabilities / logits in the very same axis.

  • If fmt is ‘labels’, num_classes should be specified. Due to randomness, any given batch may lack items of some classes, so the full set of labels cannot be inferred simply as labels.max().

  • If fmt is ‘proba’ or ‘logits’, then axis points to the one-hot dimension. However, if axis is None, two-class classification is assumed and targets / predictions should contain probabilities or logits for the positive class only.
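
A minimal sketch of the mixed case described above (labels as targets, class probabilities as predictions); the arrays, shapes and values are illustrative only:

import numpy as np

targets = np.array([0, 2, 1, 2])                  # labels, shape (4,)
predictions = np.array([[0.8, 0.1, 0.1],          # class probabilities, shape (4, 3)
                        [0.1, 0.2, 0.7],
                        [0.2, 0.6, 0.2],
                        [0.3, 0.3, 0.4]])
m = ClassificationMetrics(targets, predictions, fmt='proba', axis=1, num_classes=3)
m.evaluate('accuracy', multiclass='micro')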

Metrics

All metrics return:

  • a single value if input is a vector for a 2-class task.

  • a single value if input is a vector for a multiclass task and multiclass averaging is enabled.

  • a vector with batch size items if input is a multidimensional array (e.g. images or sequences) and there are just 2 classes or multiclass averaging is on.

  • a vector with num_classes items if input is a vector for multiclass case without averaging.

  • a 2d array (batch_items, num_classes) for multidimensional inputs in a multiclass case without averaging.

Note

Count-based metrics (true_positive, false_positive, etc.) do not support multiclass averaging; they always return counts for each class separately. For multiclass tasks, rate metrics such as true_positive_rate, false_positive_rate, etc., may be more convenient.
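
For example, with a 10-class setup the count metric returns one value per class, while the corresponding rate metric can be averaged (shapes are illustrative):

m = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
m.evaluate('true_positive')                              # counts, one per class
m.evaluate('true_positive_rate', multiclass='macro')     # a single averaged rate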

Multiclass metrics

In the multiclass case, metrics can be calculated with or without class averaging.

Available methods are:

  • None - no averaging, calculate metrics for each class individually (one-vs-all)

  • ‘micro’ - calculate metrics globally by counting the total true positives, false negatives, false positives, etc. across all classes

  • ‘macro’ - calculate metrics for each class and take their mean.

Examples

metrics = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate(['sensitivity', 'specificity'], multiclass='macro')
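
The same metric with other averaging modes, as a sketch (output shapes follow the rules listed above):

metrics.evaluate('sensitivity', multiclass=None)      # one value per class, shape (10,)
metrics.evaluate('sensitivity', multiclass='micro')   # a single value from global counts
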
accuracy()[source]

Accuracy of detecting all the classes combined

append(metrics)[source]

Append the confusion matrix with data from another metrics instance
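
A possible pattern for accumulating per-batch results, assuming (as the description above suggests) that append merges another instance's confusion-matrix data into this one; names are illustrative:

m = ClassificationMetrics(targets_0, predictions_0, num_classes=10, fmt='labels')
m.append(ClassificationMetrics(targets_1, predictions_1, num_classes=10, fmt='labels'))
m.evaluate('accuracy', multiclass='micro')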

condition_negative(label=None, *args, **kwargs)[source]
condition_positive(label=None, *args, **kwargs)[source]
property confusion_matrix
copy()[source]

Return a duplicate containing only the confusion matrix

diagnostics_odds_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
f1_score(*args, **kwargs)[source]

Compute f1-score

false_discovery_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_negative(label=None, *args, **kwargs)[source]
false_negative_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_omission_rate(*args, when_zero=(1, 0), **kwargs)[source]
false_positive(label=None, *args, **kwargs)[source]
false_positive_rate(*args, when_zero=(1, 0), **kwargs)[source]
free()[source]

Free memory allocated for intermediate data

jaccard(*args, **kwargs)[source]
negative_likelihood_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
negative_predictive_value(*args, when_zero=(0, 1), **kwargs)[source]
one_hot(inputs)[source]

Convert an array of labels into a one-hot array

plot_confusion_matrix(classes=None, normalize=False, **kwargs)[source]

Plot confusion matrix.

Parameters
  • classes (sequence, optional) – Sequence of class labels.

  • normalize (bool) – Whether to normalize confusion matrix over target classes.
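
A usage sketch, continuing the example above (the class labels passed are illustrative):

metrics.plot_confusion_matrix(classes=list(range(10)), normalize=True)
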

positive_likelihood_ratio(*args, when_zero=(inf, 0), **kwargs)[source]
positive_predictive_value(*args, when_zero=(0, 1), **kwargs)[source]
prediction_negative(label=None, *args, **kwargs)[source]
prediction_positive(label=None, *args, **kwargs)[source]
prevalence(*args, when_zero=(0, 0), **kwargs)[source]

Notes

The when_zero parameter does not really matter in this case, since total_population is never zero when targets are not empty.

total_population(*args, **kwargs)[source]
true_negative(label=None, *args, **kwargs)[source]
true_negative_rate(*args, when_zero=(0, 1), **kwargs)[source]
true_positive(label=None, *args, **kwargs)[source]
true_positive_rate(*args, when_zero=(0, 1), **kwargs)[source]
update(metrics)[source]

Update the confusion matrix with data from another metrics instance

SegmentationMetricsByPixels

class SegmentationMetricsByPixels(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]

Bases: batchflow.models.metrics.classify.ClassificationMetrics

Metrics to assess segmentation models pixel-wise

Notes

Rate metrics are evaluated for each item independently, so there are two levels of metric aggregation:

  • multi-class averaging

  • dataset aggregation.

For instance, if you have a dataset of 100 pictures (each of size 256x256) with 10 classes and you need to calculate the accuracy of semantic segmentation, then:

  • evaluate(['accuracy'], agg=None, multiclass=None) will return an array of shape (100, 10) containing the accuracy of each class for each image separately.

  • evaluate(['accuracy'], agg='mean', multiclass=None) will return a vector of shape (10,) containing the accuracy of each class averaged across all images.

  • evaluate(['accuracy'], agg=None, multiclass='macro') will return a vector of shape (100,) containing the accuracy of each image averaged across all classes.

  • evaluate(['accuracy'], agg='mean', multiclass='macro') will return a single value: the average accuracy over all classes and images combined.

The default values are agg='mean', multiclass='macro'.

For multi-class averaging see ClassificationMetrics.

Examples

metrics = SegmentationMetricsByPixels(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate('specificity')
metrics.evaluate(['sensitivity', 'jaccard'], agg='mean', multiclass=None)
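
The aggregation combinations described in the Notes above, sketched for the 100-image, 10-class setup (shapes are illustrative):

metrics.evaluate('accuracy', agg=None, multiclass=None)        # shape (100, 10)
metrics.evaluate('accuracy', agg='mean', multiclass=None)      # shape (10,)
metrics.evaluate('accuracy', agg=None, multiclass='macro')     # shape (100,)
metrics.evaluate('accuracy', agg='mean', multiclass='macro')   # a single value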

SegmentationMetricsByInstances

class SegmentationMetricsByInstances(targets, predictions, fmt='proba', num_classes=None, axis=None, skip_bg=True, threshold=0.5, iot=0.5, calc=True)[source]

Bases: batchflow.models.metrics.classify.ClassificationMetrics

Metrics to assess segmentation models by instances (i.e. connected components of one class, e.g. cancer nodules or faces)

Parameters

iot (float) – if the ratio of a predicted instance size to the corresponding target size is >= iot, then the instance is considered correctly predicted (true positive).

Notes

For other parameters see ClassificationMetrics.
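
A hedged instantiation sketch, mirroring the pixel-wise example above; the iot value is illustrative:

metrics = SegmentationMetricsByInstances(targets, predictions, num_classes=2, fmt='labels', iot=0.7)
metrics.evaluate('true_positive_rate')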

condition_positive(label=None, *args, **kwargs)[source]
free()[source]

Free memory allocated for intermediate data

prediction_positive(label=None, *args, **kwargs)[source]
total_population(label=None, *args, **kwargs)[source]
true_negative(label=None, *args, **kwargs)[source]
true_positive(label=None, *args, **kwargs)[source]