Updated Sluijterman reference after publication
Stephan Kolassa
  • The MAPE, as a percentage, only makes sense for values where divisions and ratios make sense. It doesn't make sense to calculate percentages of temperatures, for instance, so you shouldn't use the MAPE to calculate the accuracy of a temperature forecast.

  • If just a single actual is zero, $A_t=0$, then you divide by zero in calculating the MAPE, which is undefined.

    It turns out that some forecasting software nevertheless reports a MAPE for such series, simply by dropping periods with zero actuals (Hoover, 2006). Needless to say, this is not a good idea, as it implies that we don't care at all about what we forecasted if the actual was zero - but a forecast of $F_t=100$ and one of $F_t=1000$ may have very different implications. So check what your software does.

    If only a few zeros occur, you can use a weighted MAPE (Kolassa & Schütz, 2007), which nevertheless has problems of its own. This also applies to the symmetric MAPE (Goodwin & Lawton, 1999).

  • MAPEs greater than 100% can occur. If you prefer to work with accuracy, which some people define as 100%-MAPE, then this may lead to negative accuracy, which people may have a hard time understanding. (No, truncating accuracy at zero is not a good idea.)

  • Model fitting relies on minimizing errors, which is often done using numerical optimizers that use first or second derivatives. The MAPE is not everywhere differentiable, and its Hessian is zero wherever it is defined. This can throw optimizers off if we want to use the MAPE as an in-sample fit criterion.

    A possible mitigation may be to use the log cosh loss function, which is similar to the MAE but twice differentiable. Alternatively, Zheng (2011) and Sluijterman et al. (2025) offer a way to approximate the MAE (or any other quantile loss) to arbitrary precision using a smooth function, and their methods can be adapted to the MAPE. If we know bounds on the actuals (which we do when fitting strictly positive historical data), we can therefore smoothly approximate the MAPE to arbitrary precision.

  • The MAPE treats overforecasts differently from underforecasts. Suppose our forecast is $F_t=2$. Then an actual of $A_t=1$ will contribute $\text{APE}_t=100\%$ to the MAPE, but an actual of $A_t=3$ will contribute only $\text{APE}_t=33\%$. Minimizing the MAPE thus creates an incentive towards smaller $F_t$: if our actuals have an equal chance of being $A_t=1$ or $A_t=3$, then the expected APE is $\tfrac{1}{2}|F_t-1|+\tfrac{1}{2}\tfrac{|F_t-3|}{3}$, which equals $F_t/3$ for $1\le F_t\le 3$ and is therefore minimized by forecasting $F_t=1$, not $F_t=2$, which is the expectation of our actuals. The MAPE thus can be lower for biased than for unbiased forecasts. Minimizing it may lead to forecasts that are biased low.
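The asymmetry in the two-point example (actuals of 1 or 3 with equal probability) can be checked numerically. The following is only a minimal sketch: the helper name `expected_ape` and the forecast grid are illustrative assumptions, not part of any forecasting library.

```python
import numpy as np

def expected_ape(forecast, actuals, probs):
    """Expected absolute percentage error of one point forecast
    under a discrete distribution of strictly positive actuals."""
    actuals = np.asarray(actuals, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs * np.abs(actuals - forecast) / actuals))

# Two-point example: the actual is 1 or 3, each with probability 0.5.
actuals = [1.0, 3.0]
probs = [0.5, 0.5]

# Evaluate the expected APE on a grid of candidate forecasts (step 0.01).
grid = np.linspace(0.5, 3.5, 301)
losses = np.array([expected_ape(f, actuals, probs) for f in grid])

best_forecast = float(grid[np.argmin(losses)])  # forecast with the lowest expected APE
mean_actual = float(np.dot(probs, actuals))     # unbiased forecast (expectation of the actuals)

print(f"expected APE at the mean ({mean_actual}): {expected_ape(mean_actual, actuals, probs):.3f}")
print(f"expected APE at the grid minimizer ({best_forecast:.2f}): {losses.min():.3f}")
```

On this grid the lowest expected APE is attained at the smaller of the two possible actuals, well below the expectation of 2, so a forecaster who is judged on MAPE is rewarded for forecasting low.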

Sluijterman, L., Kreuwel, F., Cator, E. & Heskes, T. Composite Quantile Regression With XGBoost Using the Novel Arctan Pinball Loss. International Journal of Machine Learning and Cybernetics, 2025
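To illustrate the differentiability point, here is a minimal sketch of a log cosh smoothing applied to percentage errors. This is an assumed construction for illustration, not the method of Zheng (2011) or Sluijterman et al.: the function `smooth_mape` and its `scale` parameter are hypothetical names. It relies on the bound $|x| - s\log 2 \le s\log\cosh(x/s) \le |x|$, so the smooth loss approaches the MAPE as `scale` shrinks.

```python
import numpy as np

def mape(actuals, forecasts):
    """Plain MAPE as a fraction (0.1 means 10%); actuals must be nonzero."""
    a = np.asarray(actuals, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    return float(np.mean(np.abs(f - a) / np.abs(a)))

def smooth_mape(actuals, forecasts, scale=0.01):
    """Twice-differentiable surrogate: scale * log(cosh(pe / scale)),
    averaged over periods, where pe is the percentage error.
    `scale` is an illustrative tuning parameter (assumption)."""
    a = np.asarray(actuals, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    u = (f - a) / (np.abs(a) * scale)
    # log(cosh(u)) computed stably as logaddexp(u, -u) - log(2)
    log_cosh = np.logaddexp(u, -u) - np.log(2.0)
    return float(scale * np.mean(log_cosh))

actuals = [10.0, 20.0, 40.0]
forecasts = [12.0, 18.0, 40.0]

exact = mape(actuals, forecasts)
coarse = smooth_mape(actuals, forecasts, scale=0.01)
fine = smooth_mape(actuals, forecasts, scale=0.001)
print(exact, coarse, fine)
```

The surrogate never exceeds the exact MAPE and undershoots it by at most `scale * log(2)`, so a smaller `scale` tightens the approximation at the cost of sharper curvature near zero error.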

added pointer to Sluijterman
Stephan Kolassa
changed URL for Mean APEs picture
Stephan Kolassa
  • Finally, you should not use the MAPE, because by definition, the MAPE is a Mean APE... and nobody likes Mean APEs.

    A picture of angry chimpanzees in business suits arguing

    Image credit: Ivan Svetunkov. Used with kind permission. Please attribute to him if you reuse it.

added Mean APEs picture by Ivan Svetunkov
Stephan Kolassa
added link to DS.SE question
Stephan Kolassa
added 29 characters in body
kjetil b halvorsen
added 85 characters in body
Stephan Kolassa
added link to thread discussing incrementing both the numerator and the denominator by 1
Stephan Kolassa
added 170 characters in body
Stephan Kolassa
formatting
Stephan Kolassa
added 230 characters in body
Stephan Kolassa
added caveat about differentiability of MAPE and possible mitigation
Stephan Kolassa
Updated Kolassa (2020) reference
Stephan Kolassa
added McKenzie reference
Stephan Kolassa
added reference Kolassa (2019)
Stephan Kolassa
added 170 characters in body
Stephan Kolassa
deleted 1 character in body
Stephan Kolassa
added 312 characters in body
Stephan Kolassa
added 141 characters in body
Stephan Kolassa
added 49 characters in body
Stephan Kolassa
added 92 characters in body
Stephan Kolassa
deleted 2 characters in body
Stephan Kolassa
Stephan Kolassa