Updated Sluijterman reference after publication

edited May 27 at 14:52

Stephan Kolassa

The MAPE, as a percentage, only makes sense for values where divisions and ratios make sense. It doesn't make sense to calculate percentages of temperatures, for instance, so you shouldn't use the MAPE to calculate the accuracy of a temperature forecast.
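To see why percentages of temperatures are problematic, here is a minimal sketch. The readings (an actual of 20 °C and a forecast of 22 °C) are made up for illustration, and the helper names are hypothetical. The same forecast gets a very different APE depending on the scale, because Celsius and Fahrenheit have arbitrary zero points.

```python
def ape(actual, forecast):
    """Absolute percentage error as a fraction of the actual."""
    return abs(actual - forecast) / abs(actual)

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

# Hypothetical readings, chosen only for illustration.
actual_c, forecast_c = 20.0, 22.0

ape_celsius = ape(actual_c, forecast_c)
ape_fahrenheit = ape(celsius_to_fahrenheit(actual_c), celsius_to_fahrenheit(forecast_c))
ape_kelvin = ape(celsius_to_kelvin(actual_c), celsius_to_kelvin(forecast_c))

# Same physical forecast error, three different "percentage errors".
print(f"Celsius:    {100 * ape_celsius:.2f}%")
print(f"Fahrenheit: {100 * ape_fahrenheit:.2f}%")
print(f"Kelvin:     {100 * ape_kelvin:.2f}%")
```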
If just a single actual is zero, $A_t=0$, then you divide by zero in calculating the MAPE, which is undefined.

It turns out that some forecasting software nevertheless reports a MAPE for such series, simply by dropping periods with zero actuals (Hoover, 2006). Needless to say, this is not a good idea, as it implies that we don't care at all about what we forecasted if the actual was zero - but a forecast of $F_t=100$ and one of $F_t=1000$ may have very different implications. So check what your software does.

If only a few zeros occur, you can use a weighted MAPE (Kolassa & Schütz, 2007), which nevertheless has problems of its own. This also applies to the symmetric MAPE (Goodwin & Lawton, 1999).
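The zero-actuals problem can be sketched in a few lines. The data are made up, the function names are hypothetical, and the weighted MAPE is assumed here to be the sum of absolute errors divided by the sum of absolute actuals. Check the cited paper for the exact definition before relying on it.

```python
import math

def mape(actuals, forecasts):
    """MAPE as a fraction; returned as NaN (undefined) if any actual is zero."""
    if any(a == 0 for a in actuals):
        return math.nan
    return sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)

def mape_dropping_zeros(actuals, forecasts):
    """The practice criticized above: silently skip periods with zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def weighted_mape(actuals, forecasts):
    """Assumed definition: total absolute error over total absolute actuals."""
    total_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return total_error / sum(abs(a) for a in actuals)

# Made-up series with one zero actual; the two forecasts differ only there.
actuals = [10, 0, 20]
forecast_100 = [12, 100, 18]
forecast_1000 = [12, 1000, 18]

print(mape(actuals, forecast_100))                  # undefined
print(mape_dropping_zeros(actuals, forecast_100))   # ignores the zero period
print(mape_dropping_zeros(actuals, forecast_1000))  # identical to the line above
print(weighted_mape(actuals, forecast_100))
print(weighted_mape(actuals, forecast_1000))        # much larger
```

Dropping the zero period makes the two forecasts indistinguishable, while the weighted variant at least registers the difference.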
MAPEs greater than 100% can occur. If you prefer to work with accuracy, which some people define as 100%-MAPE, then this may lead to negative accuracy, which people may have a hard time understanding. (No, truncating accuracy at zero is not a good idea.)
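A tiny illustration with made-up numbers: a single actual of 1 and two overforecasts. Both yield negative "accuracy", and truncating at zero would make a bad forecast and a far worse one look the same.

```python
def ape_percent(actual, forecast):
    """Absolute percentage error, in percent."""
    return 100 * abs(actual - forecast) / abs(actual)

actual = 1.0
results = {}
for forecast in (5.0, 50.0):  # hypothetical overforecasts
    mape = ape_percent(actual, forecast)
    accuracy = 100 - mape            # "accuracy" defined as 100% - MAPE
    truncated = max(0.0, accuracy)   # truncation at zero hides the difference
    results[forecast] = (mape, accuracy, truncated)
    print(f"forecast={forecast}: MAPE={mape}%, accuracy={accuracy}%, truncated={truncated}%")
```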
Model fitting relies on minimizing errors, which is often done using numerical optimizers that use first or second derivatives. The MAPE is not everywhere differentiable, and its Hessian is zero wherever it is defined. This can throw optimizers off if we want to use the MAPE as an in-sample fit criterion.

A possible mitigation may be to use the log cosh loss function, which is similar to the MAE but twice differentiable. Alternatively, Zheng (2011) and Sluijterman et al. (2025) offer a way to approximate the MAE (or any other quantile loss) to arbitrary precision using a smooth function, and their methods can be adapted to the MAPE. If we know bounds on the actuals (which we do when fitting strictly positive historical data), we can therefore smoothly approximate the MAPE to arbitrary precision.
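The curvature issue can be checked numerically. The sketch below compares the APE with one possible smooth stand-in, the log cosh of the percentage error. That particular adaptation is an assumption of this example, not something taken from the cited papers, and the second derivative is estimated by central differences.

```python
import math

def ape(forecast, actual):
    """Absolute percentage error as a function of the forecast."""
    return abs(forecast - actual) / abs(actual)

def log_cosh_pe(forecast, actual):
    """Log cosh of the percentage error (assumed smooth stand-in for the APE)."""
    return math.log(math.cosh((forecast - actual) / actual))

def second_derivative(loss, forecast, actual, h=1e-3):
    """Central-difference estimate of d^2 loss / d forecast^2."""
    return (loss(forecast + h, actual) - 2 * loss(forecast, actual)
            + loss(forecast - h, actual)) / h ** 2

actual, forecast = 10.0, 12.0  # made-up values, away from the kink at forecast == actual

curvature_ape = second_derivative(ape, forecast, actual)
curvature_log_cosh = second_derivative(log_cosh_pe, forecast, actual)

print(curvature_ape)       # numerically zero: no second-order information
print(curvature_log_cosh)  # strictly positive, usable by second-order optimizers
```

Away from the kink the APE is linear in the forecast, so a Newton-type step has nothing to work with. The log cosh version has curvature $\operatorname{sech}^2(x)/A_t^2$ with $x=(F_t-A_t)/A_t$, which is positive everywhere.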
The MAPE treats overforecasts differently than underforecasts. Suppose our forecast is $F_t=2$. An actual of $A_t=1$ will then contribute $\text{APE}_t=100\%$ to the MAPE, but an actual of $A_t=3$ will contribute only $\text{APE}_t=33\%$. Minimizing the MAPE thus creates an incentive towards smaller $F_t$: if our actuals have an equal chance of being $A_t=1$ or $A_t=3$, then forecasting $F_t=1.5$ yields an expected APE of 50%, lower than the 67% we get for $F_t=2$, which is the expectation of our actuals. The expected MAPE is in fact minimized at $F_t=1$, where it is 33%. The MAPE thus is lower for biased than for unbiased forecasts. Minimizing it may lead to forecasts that are biased low.
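The two-point example is easy to verify with a grid search. This is only a sketch: the grid and the function name are arbitrary choices. With actuals of 1 or 3 at equal probability, the expected APE works out to $F_t/3$ for forecasts between 1 and 3, so the search bottoms out at $F_t=1$, and $F_t=1.5$ already beats the unbiased $F_t=2$.

```python
def expected_ape(forecast, outcomes=((1.0, 0.5), (3.0, 0.5))):
    """Expected APE over (actual, probability) pairs."""
    return sum(p * abs(a - forecast) / abs(a) for a, p in outcomes)

# Candidate forecasts from 0.50 to 3.00 in steps of 0.01.
grid = [i / 100 for i in range(50, 301)]
best_forecast = min(grid, key=expected_ape)

print(f"expected APE at F=2.0: {expected_ape(2.0):.3f}")
print(f"expected APE at F=1.5: {expected_ape(1.5):.3f}")
print(f"expected APE at F=1.0: {expected_ape(1.0):.3f}")
print(f"grid minimizer: {best_forecast}")
```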

Sluijterman, L., Kreuwel, F., Cator, E. & Heskes, T. Composite Quantile Regression With XGBoost Using the Novel Arctan Pinball Loss. International Journal of Machine Learning and Cybernetics, 2025 (preprint: arXiv:2406.02293, 2024)

added pointer to Sluijterman

edited May 21 at 9:16

Stephan Kolassa

changed URL for Mean APEs picture

edited Apr 18, 2024 at 19:35

Stephan Kolassa

Finally, you should not use the MAPE, because by definition, the MAPE is a Mean APE... and nobody likes Mean APEs.

Image credit: Ivan Svetunkov. Used with kind permission. Please attribute to him if you reuse it.

added Mean APEs picture by Ivan Svetunkov

edited Apr 18, 2024 at 7:55

Stephan Kolassa

added link to DS.SE question

edited Dec 13, 2023 at 10:03

Stephan Kolassa

added 29 characters in body

edited Mar 26, 2023 at 17:47

kjetil b halvorsen ♦

added 85 characters in body

edited Dec 19, 2022 at 9:07

Stephan Kolassa

added link to thread discussing incrementing both the numerator and the denominator by 1

edited Sep 9, 2022 at 15:48

Stephan Kolassa

added 170 characters in body

edited Dec 13, 2021 at 9:26

Stephan Kolassa

formatting

edited Mar 3, 2021 at 8:35

Stephan Kolassa

added 230 characters in body

edited Feb 9, 2021 at 18:38

Stephan Kolassa

added caveat about differentiability of MAPE and possible mitigation

edited May 4, 2020 at 12:36

Stephan Kolassa

Updated Kolassa (2020) reference

edited Nov 5, 2019 at 10:56

Stephan Kolassa

added McKenzie reference

edited Sep 18, 2019 at 12:21

Stephan Kolassa

added reference Kolassa (2019)

edited Jul 10, 2019 at 14:08

Stephan Kolassa

added 170 characters in body

edited Jan 26, 2019 at 21:15

Stephan Kolassa

deleted 1 character in body

edited Sep 25, 2018 at 10:04

Stephan Kolassa

added 312 characters in body

edited Sep 25, 2018 at 9:49

Stephan Kolassa

added 141 characters in body

edited Sep 25, 2018 at 9:24

Stephan Kolassa

added 49 characters in body

edited Jul 11, 2018 at 7:46

Stephan Kolassa

added 92 characters in body

edited Feb 8, 2018 at 8:41

Stephan Kolassa

deleted 2 characters in body

edited Aug 25, 2017 at 10:32

Stephan Kolassa

answered Aug 25, 2017 at 8:49

Stephan Kolassa