Classification metrics

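Comprehensive guide to evaluating classification models. Most of these metrics are derived from the Confusion Matrix; AUC, Log Loss, and Brier Score are instead computed from the model's predicted scores or probabilities.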

📚 Core Metrics:

Accuracy — Overall correctness

Precision — Quality of positive predictions

Sensitivity (Recall) — Coverage of actual positives

Selectivity (Specificity) — Coverage of actual negatives

F1-Score — Balance of precision and recall

Area Under the ROC Curve (AUC) — Threshold-independent performance

Log Loss (Cross-Entropy) — Probabilistic evaluation

Brier Score — Calibration quality

Cohen's Kappa statistic — Agreement beyond chance

Cost-Sensitive Evaluation — Weighted error costs

Essential Metrics

Beyond the Confusion Matrix itself, there are several essential metrics:

Accuracy: The fraction of total correct predictions.
Sensitivity (Recall / True Positive Rate): The proportion of actual positive instances that were correctly detected.
Selectivity (Specificity / True Negative Rate): The proportion of actual negative instances correctly identified as negative.
Precision (Positive Predictive Value): The ratio of correct positive predictions to the total number of positive predictions.
Area Under the ROC Curve (AUC): This is a single-number summary of the ROC curve. It measures the probability that a randomly chosen positive instance will be ranked higher by the model than a randomly chosen negative instance. An AUC of 1.0 represents a perfect classifier, while 0.5 represents random guessing.
Log Loss (Cross-Entropy): A metric used for models that output probabilities. It penalizes an incorrect prediction more heavily the more confident the model was in it. Lower values are better.
Brier Score: Used to evaluate the accuracy of probabilistic forecasts. It is calculated as the mean squared error between the predicted probabilities and the true one-hot labels. Lower values are better, and 0 is a perfect score.
Error Rate: The simple complement to Accuracy (1−Accuracy), representing the proportion of incorrect predictions.
Average Precision (AP) and Mean Average Precision (mAP): AP summarizes the Precision-Recall Curve as a single number, typically the area under the curve (some benchmarks compute it on an interpolated curve). mAP is the average of AP values across multiple classes or queries, frequently used in object detection and information retrieval.
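
The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration for the binary case, not a reference implementation. The function names, the toy data, and the 0.5 decision threshold are assumptions made for this example. The Brier Score uses the common binary form (squared error on the positive-class probability), and AUC is computed directly from its ranking definition, which is only practical for small samples.

```python
import math

def threshold_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative).

    Assumes both classes occur in y_true and at least one positive
    prediction exists; otherwise a denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "recall": tp / (tp + fn),        # sensitivity / true positive rate
        "specificity": tn / (tn + fp),   # selectivity / true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared error between positive-class probability and the 0/1 label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): true labels and predicted positive-class probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

metrics = threshold_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
ll = log_loss(y_true, y_prob)
```

On this toy data, Accuracy, Recall, Specificity, and Precision all come out to 0.75, because the threshold produces exactly one false positive and one false negative among eight instances. AUC, Log Loss, and Brier Score are computed from `y_prob` directly and do not change if a different threshold is chosen.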