Classification metrics
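A comprehensive guide to evaluating classification models. Most of the threshold-based metrics are derived from the Confusion Matrix; probabilistic metrics such as Log Loss and the Brier Score are computed from the predicted probabilities instead.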
Core Metrics:
- Accuracy: Overall correctness
- Precision: Quality of positive predictions
- Sensitivity (Recall): Coverage of actual positives
- Selectivity (Specificity): Coverage of actual negatives
- F1-Score: Balance of precision and recall
- Area Under the ROC Curve (AUC): Threshold-independent performance
- Log Loss (Cross-Entropy): Probabilistic evaluation
- Brier Score: Calibration quality
- Cohen's Kappa statistic: Agreement beyond chance
- Cost-Sensitive Evaluation: Weighted error costs
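As a minimal sketch of how the count-based metrics in this list relate to the Confusion Matrix, the Python example below tallies the four cells for binary labels and derives Accuracy, Precision, Recall, Specificity, F1-Score, and Cohen's Kappa from them. The names `ConfusionCounts`, `count_outcomes`, `safe_div`, and `basic_metrics` are illustrative and not taken from any library, and returning 0.0 when a denominator is zero is an assumed convention rather than a standard.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives: actual 1, predicted 1
    fp: int  # false positives: actual 0, predicted 1
    fn: int  # false negatives: actual 1, predicted 0
    tn: int  # true negatives: actual 0, predicted 0


def count_outcomes(y_true, y_pred):
    """Tally the four Confusion Matrix cells for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def safe_div(num, den):
    """Return num / den, or 0.0 for a zero denominator (assumed convention)."""
    return num / den if den else 0.0


def basic_metrics(c):
    """Derive the count-based metrics from one set of confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = safe_div(c.tp + c.tn, total)
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)       # sensitivity
    specificity = safe_div(c.tn, c.tn + c.fp)  # selectivity
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Agreement expected by chance, from the row and column totals.
    p_chance = safe_div(
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp),
        total * total,
    )
    kappa = safe_div(accuracy - p_chance, 1.0 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
print(counts)
print(basic_metrics(counts))
```

With these toy labels the counts are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so Accuracy, Precision, Recall, Specificity, and F1 all equal 0.75, while Kappa is 0.5 because half of the observed agreement is expected by chance.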
Essential Metrics
Beyond the Confusion Matrix itself, there are several essential metrics:
- Accuracy: The fraction of all predictions that are correct.
- Sensitivity (Recall / True Positive Rate): The proportion of actual positive instances that were correctly detected.
- Selectivity (Specificity / True Negative Rate): The proportion of actual negative instances correctly identified as negative.
- Precision (Positive Predictive Value): The ratio of correct positive predictions to the total number of positive predictions.
- Area Under the ROC Curve (AUC): A single-number summary of the ROC curve. It equals the probability that a randomly chosen positive instance will be ranked higher by the model than a randomly chosen negative instance. An AUC of 1.0 represents a perfect classifier, while 0.5 represents random guessing.
- Log Loss (Cross-Entropy): A metric for models that output probabilities. It penalizes an incorrect prediction more heavily the more confident the model was in it.
- Brier Score: Used to evaluate the accuracy of probabilistic forecasts. It is calculated as the mean squared error between the predicted probabilities and the true one-hot labels; lower is better.
- Error Rate: The complement of Accuracy (1 − Accuracy), representing the proportion of incorrect predictions.
- Average Precision (AP) and Mean Average Precision (mAP): AP summarizes the Precision-Recall Curve by calculating the area under the interpolated curve. mAP is the average of AP values across multiple classes or queries, and is frequently used in object detection and information retrieval.
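The score-based metrics above can be sketched in plain Python for the binary case. The function names below are illustrative, not a library API. Three simplifications are assumed: `roc_auc` uses the pairwise-ranking definition directly (quadratic time, ties counted as half), `brier_score` uses the single-probability binary form rather than the full one-hot sum, and `average_precision` is the non-interpolated variant (mean of precision at each rank that holds a positive), which can differ from the interpolated area described above.

```python
import math


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def average_precision(y_true, scores):
    """Non-interpolated AP: mean of precision@k over ranks k holding a positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k
    if hits == 0:
        raise ValueError("AP is undefined without any positive instance")
    return precision_sum / hits


# Toy labels and predicted probabilities, made up for illustration only.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.2]
print("AUC:", roc_auc(y_true, probs))
print("Log Loss:", log_loss(y_true, probs))
print("Brier Score:", brier_score(y_true, probs))
print("AP:", average_precision(y_true, probs))
```

Under these assumptions, mAP would be the plain mean of `average_precision` over classes or queries, and Error Rate is 1 minus Accuracy after thresholding the probabilities. The Log Loss behaviour described above can be checked directly: `log_loss([1], [0.01])` is far larger than `log_loss([1], [0.4])`, since the first prediction is confidently wrong.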