dataset.models.metrics¶
Metrics¶
- class Metrics(*args, **kwargs)[source]¶
Base metrics evaluation class
This class is not supposed to be instantiated. Use specific child classes instead (e.g.
ClassificationMetrics
).
Examples
m = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
m.evaluate(['sensitivity', 'specificity'], multiclass='micro')
ClassificationMetrics¶
- class ClassificationMetrics(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]¶
Bases:
batchflow.models.metrics.base.Metrics
Metrics to assess classification models
- Parameters
targets (np.array) – Ground-truth labels / probabilities / logits
predictions (np.array) – Predicted labels / probabilities / logits
num_classes (int) – the number of classes (default is None)
fmt ('proba', 'logits', 'labels') – whether arrays contain probabilities, logits or labels
axis (int) – a class axis (default is None)
threshold (float) – A probability level for binarization (lower values become 0, equal or greater values become 1)
Notes
Input arrays (targets and predictions) might be vectors or multidimensional arrays, where the first dimension represents batch items. The latter is useful for pixel-level metrics.
Both targets and predictions usually contain the same data (labels, probabilities or logits). However, targets might be labels, while predictions are probabilities / logits. For that to work:
targets should have a shape that is exactly 1 dimension smaller than the predictions shape;
axis should point to that dimension;
fmt should contain format of predictions.
When axis is specified, predictions should be a one-hot array with class information provided in the given axis (class probabilities or logits). In this case targets can contain labels (see above) or probabilities / logits in the very same axis.
If fmt is ‘labels’, num_classes should be specified. Due to randomness any given batch may not contain items of some classes, so the number of classes cannot be inferred simply as labels.max().
If fmt is ‘proba’ or ‘logits’, then axis points to the one-hot dimension. However, if axis is None, two class classification is assumed and targets / predictions should contain probabilities or logits for a positive class only.
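The shape convention above can be illustrated with a minimal NumPy-only sketch (this does not call batchflow itself; the arrays are made-up examples):

```python
import numpy as np

# `targets` holds labels and is one dimension smaller than `predictions`,
# which holds class probabilities along the class axis.
targets = np.array([0, 2, 1])                 # shape (3,): labels
predictions = np.array([[0.8, 0.1, 0.1],      # shape (3, 3): probabilities,
                        [0.2, 0.2, 0.6],      # class axis is the last one
                        [0.1, 0.7, 0.2]])

# With fmt='proba' and axis=-1, the class with the highest probability
# along that axis is the predicted label:
predicted_labels = predictions.argmax(axis=-1)
print(predicted_labels)  # [0 2 1]
```

Here every prediction matches its target, which is the situation `axis` is designed for: labels on one side, a one-hot-like probability array on the other.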
Metrics
All metrics return:
a single value if input is a vector for a 2-class task.
a single value if input is a vector for a multiclass task and multiclass averaging is enabled.
a vector with batch size items if input is a multidimensional array (e.g. images or sequences) and there are just 2 classes or multiclass averaging is on.
a vector with num_classes items if input is a vector for multiclass case without averaging.
a 2d array (batch_items, num_classes) for multidimensional inputs in a multiclass case without averaging.
Note
Count-based metrics (true_positive, false_positive, etc.) do not support multiclass averaging. They always return counts for each class separately. For multiclass tasks rate metrics, such as true_positive_rate, false_positive_rate, etc., might seem more convenient.
Multiclass metrics
In a multiclass case metrics might be calculated with or without class averaging.
Available methods are:
None - no averaging, calculate metrics for each class individually (one-vs-all)
‘micro’ - calculate metrics globally by counting the total true positives, false negatives, false positives, etc. across all classes
‘macro’ - calculate metrics for each class, and take their mean.
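The difference between ‘micro’ and ‘macro’ can be shown with a small NumPy sketch using hypothetical per-class counts (precision is used here purely as an example rate):

```python
import numpy as np

# Hypothetical per-class counts for a 3-class task.
tp = np.array([8, 2, 5])   # true positives per class
fp = np.array([2, 1, 1])   # false positives per class

# 'micro': pool the counts across classes first, then compute the rate once.
micro_precision = tp.sum() / (tp.sum() + fp.sum())   # 15 / 19

# 'macro': compute the rate for each class, then average the per-class rates.
per_class = tp / (tp + fp)                           # [0.8, 2/3, 5/6]
macro_precision = per_class.mean()
```

Micro averaging weights every sample equally (so large classes dominate), while macro averaging weights every class equally regardless of its size.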
Examples
metrics = ClassificationMetrics(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate(['sensitivity', 'specificity'], multiclass='macro')
- property confusion_matrix¶
- plot_confusion_matrix(classes=None, normalize=False, **kwargs)[source]¶
Plot confusion matrix.
- Parameters
classes (sequence, optional) – Sequence of class labels.
normalize (bool) – Whether to normalize confusion matrix over target classes.
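As a sketch of what normalizing over target classes means (a NumPy-only illustration, assuming rows of the confusion matrix correspond to target classes, which is the usual convention):

```python
import numpy as np

# A hypothetical 3-class confusion matrix: rows are target classes,
# columns are predicted classes.
cm = np.array([[5, 1, 0],
               [2, 6, 2],
               [0, 1, 3]])

# Normalizing over target classes divides each row by its total,
# so every row sums to 1 and entries become per-class rates.
cm_norm = cm / cm.sum(axis=1, keepdims=True)
```

With this normalization the diagonal entries read directly as per-class recall.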
SegmentationMetricsByPixels¶
- class SegmentationMetricsByPixels(targets, predictions, fmt='proba', num_classes=None, axis=None, threshold=0.5, skip_bg=False, calc=True)[source]¶
Bases:
batchflow.models.metrics.classify.ClassificationMetrics
Metrics to assess segmentation models pixel-wise
Notes
Rate metrics are evaluated for each item independently. So there are two levels of metrics aggregation:
multi-class averaging
dataset aggregation.
For instance, if you have a dataset of 100 pictures (each having size of 256x256) of 10 classes and you need to calculate an accuracy of semantic segmentation, then:
evaluate([‘accuracy’], agg=None, multiclass=None) will return an array of shape (100, 10) containing accuracy of each class for each image separately.
evaluate([‘accuracy’], agg=’mean’, multiclass=None) will return a vector of shape (10,) containing an accuracy of each class averaged across all images.
evaluate([‘accuracy’], agg=None, multiclass=’macro’) will return a vector of shape (100,) containing an accuracy of each image separately averaged across all classes.
evaluate([‘accuracy’], agg=’mean’, multiclass=’macro’) will return a single value of an average accuracy of all classes and images combined.
The default values are agg=’mean’, multiclass=’macro’.
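The four combinations above can be sketched with plain NumPy, starting from the raw per-item, per-class array (random numbers stand in for real accuracies):

```python
import numpy as np

# Per-item, per-class accuracies as returned by
# evaluate(['accuracy'], agg=None, multiclass=None):
# shape (100, 10) for 100 images and 10 classes.
rng = np.random.default_rng(0)
acc = rng.random((100, 10))

per_class = acc.mean(axis=0)   # agg='mean', multiclass=None    -> shape (10,)
per_image = acc.mean(axis=1)   # agg=None,   multiclass='macro' -> shape (100,)
overall = acc.mean()           # agg='mean', multiclass='macro' -> single value
```

So dataset aggregation (agg) reduces the batch axis, multiclass averaging reduces the class axis, and applying both yields one scalar.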
For multi-class averaging see
ClassificationMetrics
.
Examples
metrics = SegmentationMetricsByPixels(targets, predictions, num_classes=10, fmt='labels')
metrics.evaluate('specificity')
metrics.evaluate(['sensitivity', 'jaccard'], agg='mean', multiclass=None)
SegmentationMetricsByInstances¶
- class SegmentationMetricsByInstances(targets, predictions, fmt='proba', num_classes=None, axis=None, skip_bg=True, threshold=0.5, iot=0.5, calc=True)[source]¶
Bases:
batchflow.models.metrics.classify.ClassificationMetrics
Metrics to assess segmentation models by instances (i.e. connected components of one class, e.g. cancer nodules or faces)
- Parameters
iot (float) – if the ratio of a predicted instance size to the corresponding target size >= iot, then the instance is considered correctly predicted (true positive).
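A minimal NumPy sketch of the iot criterion for a single instance, assuming the instances have already been extracted as binary masks and that the intended ratio is the overlap (intersection) of the predicted instance with the target, divided by the target size:

```python
import numpy as np

# Binary masks for one instance: the target object and a prediction.
target = np.zeros((8, 8), dtype=bool)
target[2:6, 2:6] = True            # 16 target pixels

pred = np.zeros((8, 8), dtype=bool)
pred[3:6, 3:6] = True              # covers 9 of the 16 target pixels

# Fraction of the target instance covered by the prediction.
ratio = (target & pred).sum() / target.sum()   # 9 / 16 = 0.5625

iot = 0.5
is_true_positive = ratio >= iot    # True with the default threshold
```

A target instance whose covered fraction falls below iot would count as a false negative instead.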
Notes
For other parameters see
ClassificationMetrics
.