Model Performance Metrics

1. MAE — Mean Absolute Error

Formula:

\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|

Meaning:

  • It measures the average magnitude of errors in a set of predictions, without considering their direction (positive or negative).

  • Lower is better.

  • It’s in the same units as the target variable.

Intuitive: “On average, my predictions are off by X units.”
⚠️ Downside: every unit of error is weighted the same, so large errors receive no extra penalty (unlike MSE/RMSE).
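A minimal plain-Python sketch of the formula above, using made-up numbers (scikit-learn's `mean_absolute_error` gives the same result):

```python
def mae(y_true, y_pred):
    # Average of the absolute errors |y_i - y_hat_i|
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100, 150, 200, 250]  # actual values (made up)
y_pred = [110, 140, 195, 260]  # model predictions (made up)
print(mae(y_true, y_pred))  # 8.75 -> "off by ~8.75 units on average"
```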


2. MSE — Mean Squared Error

Formula:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Meaning:

  • It measures the average squared difference between predicted and actual values.

  • Larger errors carry quadratically more weight because of the squaring: an error of 10 contributes 100 times more than an error of 1.

Highlights large errors — useful when large deviations are especially undesirable.
⚠️ Less interpretable since it’s in squared units of the target.


3. RMSE — Root Mean Squared Error

Formula:

\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}

Meaning:

  • It’s the square root of MSE, converting it back to the same units as the target variable.

  • Still penalizes large errors more heavily than MAE.

Common and intuitive metric — “average size of the error.”
⚠️ Sensitive to outliers.
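Continuing the same toy example, taking the square root brings the error back to the target's units:

```python
import math

def rmse(y_true, y_pred):
    # Square root of MSE, expressed in the target's own units
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

y_true = [100, 150, 200, 250]
y_pred = [110, 140, 195, 260]
print(rmse(y_true, y_pred))  # ~9.01, slightly above the MAE of 8.75
```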


4. R² Score — Coefficient of Determination

Formula:

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

where ȳ is the mean of the actual values.

Meaning:

  • Measures how well the model explains the variability of the target variable.

  • Range: −∞ to 1

    • R² = 1: perfect prediction

    • R² = 0: model predicts no better than the mean

    • R² < 0: model is worse than just predicting the mean

Explains “goodness of fit”
⚠️ Can be misleading for non-linear models or small sample sizes.
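A short sketch of the formula above on the same made-up data (scikit-learn's `r2_score` computes this too):

```python
def r2_score(y_true, y_pred):
    # 1 - SS_res / SS_tot, per the formula above
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

y_true = [100, 150, 200, 250]
y_pred = [110, 140, 195, 260]
print(r2_score(y_true, y_pred))  # ~0.974: the model explains ~97% of the variance
```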


Quick Summary Table

| Metric | Formula | Penalizes Large Errors | Same Units as Target | Interpretation |
|--------|---------|------------------------|----------------------|----------------|
| MAE | (1/n) Σ\|yᵢ − ŷᵢ\| | ❌ No | ✅ Yes | Avg. absolute error |
| MSE | (1/n) Σ(yᵢ − ŷᵢ)² | ✅ Yes (squared) | ❌ No | Avg. squared error |
| RMSE | √[(1/n) Σ(yᵢ − ŷᵢ)²] | ✅ Yes (squared) | ✅ Yes | Avg. error magnitude |
| R² | 1 − SS_res/SS_tot | N/A | Unitless | Variance explained |


What values of these metrics indicate a good model?

There are no universal thresholds: the answer depends on your data's scale and context. But here is how you can interpret these metrics in general.


⚙️ 1. MAE (Mean Absolute Error)

  • No fixed “good” value, because it depends on your target variable’s scale.

  • As a rule of thumb:

    • MAE should be as low as possible.

    • A good model usually has MAE that’s <10% of the average value of your target variable.

      \text{MAE ratio} = \frac{\text{MAE}}{\bar{y}} < 0.1
  • Example: If your target values average around 100, an MAE of ≤10 is generally quite good.
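The example above can be sketched numerically; the MAE value here is hypothetical, standing in for some model's score:

```python
# Hypothetical numbers illustrating the <10% rule of thumb
y_true = [90, 100, 110, 100]  # target averages 100 (made up)
mae = 8.0                     # assumed MAE from some model
mae_ratio = mae / (sum(y_true) / len(y_true))
print(mae_ratio)  # 0.08 -> under 0.1, so "generally quite good"
```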


⚙️ 2. MSE (Mean Squared Error)

  • MSE’s magnitude depends on the square of your target variable, so it’s hard to interpret directly.

  • Use it mainly for comparison between models — the smaller, the better.

  • Often replaced with RMSE, which is easier to interpret.


⚙️ 3. RMSE (Root Mean Squared Error)

  • Interpreted in the same units as your target.

  • A good RMSE is typically 10–20% of the range or mean of the target variable.

  • RMSE close to MAE suggests your errors are fairly uniform, with few outliers. (RMSE can never be lower than MAE.)

Good model:

\frac{\text{RMSE}}{\bar{y}} < 0.2

⚠️ Watch out: If RMSE is much larger than MAE, it means you have outliers or large errors that need attention.
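A quick diagnostic sketch of that warning, using made-up residuals: a single outlier inflates RMSE far more than MAE.

```python
import math

# Made-up residuals: four small errors and one large outlier
errors = [1, 1, 1, 1, 20]
mae = sum(abs(e) for e in errors) / len(errors)              # 4.8
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # ~8.99
print(rmse / mae)  # ratio well above 1 flags outliers worth investigating
```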


⚙️ 4. R² (Coefficient of Determination)

This one has more universal interpretation:

| R² Value | Interpretation |
|----------|----------------|
| 1.0 | Perfect predictions |
| ≥ 0.9 | Excellent: model explains most variability |
| 0.75–0.9 | Good: strong predictive power |
| 0.5–0.75 | Moderate: acceptable depending on context |
| < 0.5 | Weak: model doesn't explain much variance |
| < 0 | Model worse than predicting the mean |

💡 Rule of thumb:

  • For engineering / physical systems, expect R² ≥ 0.9.

  • For social sciences or human behavior data, R² ≥ 0.6 can already be quite good due to inherent noise.


🧭 Quick Summary

| Metric | Ideal Direction | Good Rule of Thumb |
|--------|-----------------|--------------------|
| MAE | ↓ Lower better | <10% of mean(target) |
| MSE | ↓ Lower better | Compare between models |
| RMSE | ↓ Lower better | <20% of mean(target) |
| R² | ↑ Higher better | >0.75 good, >0.9 excellent |
