Log Loss (Cross-Entropy)

Log Loss, often used synonymously with Cross-Entropy, is the standard objective function for classification models that output probabilities, such as logistic regression and neural networks. It measures the performance of a model by comparing its predicted probability distribution against the actual ground truth labels.

1. The Information Theory Foundation

Cross-entropy originated in information theory as a measure of the dissimilarity between two probability distributions: it is the average number of bits needed to encode events drawn from a true distribution $p$ using a code optimized for a predicted distribution $q$. It is not a true distance (it is asymmetric), but it is minimized exactly when $q = p$.
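The connection to KL divergence makes this explicit. For a true distribution $p$ and a predicted distribution $q$:

$$H(p, q) = H(p) + D_{\mathrm{KL}}(p \parallel q)$$

Since the entropy $H(p)$ is fixed by the data, minimizing cross-entropy with respect to the model is equivalent to minimizing the KL divergence from the true distribution to the predicted one.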

2. Mathematical Formulas

Log loss calculates a continuous-valued score for each instance rather than a discrete 0-1 error.

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\Big]$$

where $y_i \in \{0, 1\}$ is the ground-truth label and $p_i$ is the model's predicted probability for class 1.

For $C$ classes this generalizes to the full cross-entropy:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$

where $y_{i,c}$ is 1 if instance $i$ belongs to class $c$ and 0 otherwise.
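A minimal NumPy sketch of both formulas (the function names are illustrative; the clipping constant guards against `log(0)`):

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def multiclass_log_loss(y_onehot, p_pred, eps=1e-15):
    """Average cross-entropy for one-hot labels and an (N, C) probability matrix."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])
print(binary_log_loss(y, p))
```

With one-hot labels, the multi-class version reduces to the binary one for $C = 2$, since only the true class's $\log p_{i,c}$ term survives in each inner sum.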

3. Interpretation and Behavior

The primary purpose of log loss is to punish confident incorrect predictions: as the probability assigned to the true class approaches 0, the loss grows without bound, while a fully confident correct prediction contributes a loss of 0.
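A quick numerical illustration: each instance contributes $-\log p$ of the probability the model gave the true class, so the penalty explodes as confidence in the wrong answer grows.

```python
import math

# Per-instance loss is -log(p_true), where p_true is the probability
# the model assigned to the correct class.
for p_true in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p(true class) = {p_true:>5}: loss = {-math.log(p_true):.3f}")
```

Note the asymmetry: moving from 0.9 to 0.99 saves little loss, but moving from 0.1 to 0.01 costs a great deal.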

4. Relationship to Maximum Likelihood Estimation (MLE)

In statistical terms, minimizing the Negative Log-Likelihood (NLL) of a classification model is exactly equivalent to minimizing its cross-entropy against the observed labels; the two objectives differ only by the $1/N$ averaging factor.
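For a Bernoulli output this equivalence is a two-line derivation:

$$\Pr(y_i \mid p_i) = p_i^{\,y_i}(1 - p_i)^{\,1 - y_i}
\quad\Rightarrow\quad
-\log \prod_{i=1}^{N} \Pr(y_i \mid p_i) = -\sum_{i=1}^{N}\Big[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\Big]$$

which is exactly the (unaveraged) binary log loss, so the maximum-likelihood parameters and the minimum-cross-entropy parameters coincide.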

5. Training Advantages: Why not use MSE?

While Mean Squared Error (MSE) is standard for regression, cross-entropy is preferred for classification due to its gradient behavior: paired with a sigmoid output, MSE's gradient carries a $\sigma'(z)$ factor that vanishes when the model is confidently wrong, stalling learning, whereas the cross-entropy gradient simplifies to the clean error term $p - y$.

6. Numerical Stability: The LogSumExp Trick

In practical implementations, computing log(softmax(a)) naively is numerically fragile: exponentiating a large logit overflows floating point, while exponentiating a very negative one underflows to zero and yields log(0). The standard fix is to subtract the maximum logit before exponentiating, which leaves the result mathematically unchanged.
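A minimal sketch of a stable `log_softmax` using this trick (the function name is illustrative; NumPy and deep-learning libraries ship equivalents):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log(softmax(a)) via the LogSumExp trick.

    Subtracting m = max(a) shifts the largest logit to 0, so exp() cannot
    overflow; the identity logsumexp(a) = m + log(sum(exp(a - m))) makes
    the shift exact rather than approximate.
    """
    a = np.asarray(logits, dtype=float)
    m = a.max()
    return a - (m + np.log(np.sum(np.exp(a - m))))

# np.exp(1000.0) would overflow to inf, but the shifted version is fine:
print(log_softmax([1000.0, 999.0, 998.0]))
```

Exponentiating the result recovers a valid softmax: the outputs are finite and sum to 1 even for logits far outside the representable range of `exp`.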
