Understanding RMSE and RMSLE
According to the Official Google Cloud Certified Professional Machine Learning Engineer Study Guide:
The root‐mean‐squared error (RMSE) is the square root of the average squared difference between the target and predicted values. If you are worried that your model might incorrectly predict a very large value and want to penalize the model, you can use this. Ranges from 0 to infinity.
The root‐mean‐squared logarithmic error (RMSLE) metric is similar to RMSE, except that it uses the natural logarithm of the predicted and actual values +1. This is an asymmetric metric, which penalizes underprediction (value predicted is lower than actual) rather than overprediction.
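For concreteness, the two definitions can be written out as follows (these match scikit-learn's implementations, with $y_i$ an actual value and $\hat{y}_i$ its prediction):

```latex
\mathrm{RMSE}  = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
\qquad
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\log(1 + y_i) - \log(1 + \hat{y}_i)\right)^2}
```

Both formulas square a difference, which is the first hint at the puzzle below: squaring discards the sign of the error.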
Practical Testing
Here is what I saw when I tested what I thought this meant:
```python
from sklearn.metrics import root_mean_squared_error, root_mean_squared_log_error

y_true = [60, 80, 90, 750]
y_pred = [67, 78, 91, 102]

root_mean_squared_log_error(y_pred=y_true, y_true=y_pred)
# 0.9949158238939428
root_mean_squared_log_error(y_pred=y_pred, y_true=y_true)
# 0.9949158238939428

root_mean_squared_error(y_pred, y_true)
# 324.02083266358045
root_mean_squared_error(y_true, y_pred)
# 324.02083266358045
```
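To confirm the library wasn't doing something unexpected, both values can be recomputed by hand (a quick sketch with numpy; `np.log1p(x)` computes `log(1 + x)`):

```python
import numpy as np

y_true = np.array([60, 80, 90, 750])
y_pred = np.array([67, 78, 91, 102])

# RMSE: square root of the mean squared difference.
print(np.sqrt(np.mean((y_true - y_pred) ** 2)))
# 324.02083266358045

# RMSLE: the same computation after a log(1 + x) transform of both arrays.
print(np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)))
# 0.9949158238939428
```

Since `(a - b) ** 2 == (b - a) ** 2`, swapping the two arrays cannot change either result.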
Confusion and Clarification
The results were confusing for a couple of reasons:
- If RMSE is for when "your model might incorrectly predict a very large value and want to penalize the model," shouldn't it also be an asymmetric metric, as the authors allege RMSLE is?
- Why do these two metrics, one allegedly asymmetric and the other presumably so, behave as obviously symmetric metrics, returning identical values when the targets and predictions are swapped?
The answer revolves around the logarithmic transformation at the heart of RMSLE. As a function of its two arguments, RMSLE is mathematically symmetric: the squared term discards the sign of each log difference, so swapping y_true and y_pred cannot change the result. The asymmetry the authors describe is a different one. Because log(1 + y_pred) − log(1 + y_true) = log((1 + y_pred) / (1 + y_true)) is an error of ratios, an underprediction yields a larger log ratio, and therefore a larger penalty, than an overprediction of the same absolute size.
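A minimal illustration (assuming scikit-learn >= 1.4, which is where the `root_*` metric functions were added): an underprediction and an overprediction of the same absolute size score identically under RMSE but differently under RMSLE:

```python
from sklearn.metrics import root_mean_squared_error, root_mean_squared_log_error

y_true = [100]
under = [50]    # underprediction by 50
over = [150]    # overprediction by 50

# RMSE sees only the absolute error, so both predictions score the same.
print(root_mean_squared_error(y_true, under))      # 50.0
print(root_mean_squared_error(y_true, over))       # 50.0

# RMSLE compares log ratios: |log(51/101)| > |log(151/101)|,
# so the underprediction is penalized more heavily.
print(root_mean_squared_log_error(y_true, under))  # ~0.683
print(root_mean_squared_log_error(y_true, over))   # ~0.402
```

This is the sense in which RMSLE "penalizes underprediction rather than overprediction": not asymmetry in its arguments, but asymmetry in how it weighs errors of the same magnitude on either side of the true value.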