The MAPE, as a percentage, only makes sense for data for which divisions and ratios are meaningful. It makes no sense to calculate percentages of temperatures, for instance, so you shouldn't use the MAPE to assess the accuracy of a temperature forecast.
If just a single actual is zero, $A_t=0$, then you divide by zero in calculating the MAPE, which is undefined.
It turns out that some forecasting software nevertheless reports a MAPE for such series, simply by dropping periods with zero actuals (Hoover, 2006). Needless to say, this is not a good idea: it implies that we don't care at all what we forecast whenever the actual was zero - but a forecast of $F_t=100$ and one of $F_t=1000$ may have very different implications. So check what your software does.
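To see why silently dropping zero-actual periods is dangerous, here is a minimal sketch in Python; the three-period series is purely illustrative and does not reflect any particular software package:

```python
import numpy as np

actuals   = np.array([0.0, 2.0, 5.0])
forecasts = np.array([100.0, 2.0, 5.0])   # wildly wrong exactly where the actual is zero

# What software that drops zero-actual periods effectively reports:
keep = actuals != 0
mape = np.mean(np.abs(forecasts[keep] - actuals[keep]) / actuals[keep])
print(mape)   # 0.0 - the absurd forecast of 100 against an actual of 0 vanishes entirely
```

Replacing the 100 by 1000 changes nothing in the reported number, which is precisely the problem.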
If only a few zeros occur, you can use a weighted MAPE (Kolassa & Schütz, 2007), although it has problems of its own; the same holds for the so-called symmetric MAPE (Goodwin & Lawton, 1999).
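For concreteness, here is a small sketch of one common formulation of each alternative; the exact definitions vary across the literature, so treat these as illustrative rather than canonical:

```python
import numpy as np

def wmape(forecasts, actuals):
    """Weighted MAPE (MAD/mean ratio): total absolute error divided by total actuals.
    Stays defined as long as the actuals do not sum to zero."""
    return np.sum(np.abs(forecasts - actuals)) / np.sum(actuals)

def smape(forecasts, actuals):
    """Symmetric MAPE with the (|A| + |F|)/2 denominator.
    Still undefined whenever actual and forecast are both zero."""
    return np.mean(np.abs(forecasts - actuals) / ((np.abs(actuals) + np.abs(forecasts)) / 2))
```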
MAPEs greater than 100% can occur. If you prefer to work with accuracy, which some people define as 100%-MAPE, then this may lead to negative accuracy, which people may have a hard time understanding. (No, truncating accuracy at zero is not a good idea.)
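For instance, an actual of $A_t=1$ and a forecast of $F_t=3$ gives $\text{APE}_t=200\%$, so the corresponding "accuracy" would be $100\%-200\%=-100\%$.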
Model fitting relies on minimizing errors, which is often done using numerical optimizers that use first or second derivatives. The MAPE is not everywhere differentiable, and its Hessian is zero wherever it is defined. This can throw optimizers off if we want to use the MAPE as an in-sample fit criterion.
One possible mitigation is to use the log-cosh loss function, which is similar to the MAE but twice differentiable. Alternatively, Zheng (2011) and Sluijterman et al. (2025) offer ways to approximate the MAE (or any other quantile loss) to arbitrary precision using a smooth function, and their methods can be adapted to the MAPE. If we know bounds on the actuals (which we do when fitting strictly positive historical data), we can therefore smoothly approximate the MAPE to arbitrary precision.
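As an illustration of the log-cosh idea, here is a minimal sketch of a smoothed MAPE; the scale parameter $k$ and the function names are illustrative choices of mine, not anything defined in the cited papers:

```python
import numpy as np

def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)."""
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

def smooth_mape(forecasts, actuals, k=50.0):
    """Twice-differentiable stand-in for the MAPE.

    Replaces |p| in the mean absolute percentage error by log(cosh(k*p))/k,
    which approaches |p| as the (illustrative) scale k grows.
    Assumes strictly positive actuals."""
    p = (forecasts - actuals) / actuals
    return np.mean(log_cosh(k * p) / k)
```

A larger $k$ gives a tighter approximation to the MAPE at the cost of a more sharply curved loss around zero errors.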
The MAPE treats overforecasts differently from underforecasts. Suppose our forecast is $F_t=2$; then an actual of $A_t=1$ will contribute $\text{APE}_t=100\%$ to the MAPE, but an actual of $A_t=3$ will contribute only $\text{APE}_t=33\%$. Minimizing the MAPE thus creates an incentive towards smaller $F_t$: if our actuals have an equal chance of being $A_t=1$ or $A_t=3$, then forecasting $F_t=1.5$ yields a lower expected MAPE than forecasting $F_t=2$, the expectation of our actuals (and $F_t=1$ yields a lower one still). The MAPE is thus lower for biased than for unbiased forecasts, and minimizing it may lead to forecasts that are biased low.
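The incentive towards low forecasts is easy to verify numerically; the following sketch simply replays the two-point example above:

```python
import numpy as np

actuals = np.array([1.0, 3.0])             # equally likely outcomes
for f in (1.0, 1.5, 2.0):
    expected_ape = np.mean(np.abs(f - actuals) / actuals)
    print(f"forecast {f}: expected APE = {expected_ape:.1%}")
# forecast 1.0: expected APE = 33.3%
# forecast 1.5: expected APE = 50.0%
# forecast 2.0: expected APE = 66.7%
```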
Sluijterman, L., Kreuwel, F., Cator, E. & Heskes, T. (2025). Composite Quantile Regression With XGBoost Using the Novel Arctan Pinball Loss. International Journal of Machine Learning and Cybernetics. Preprint: arXiv:2406.02293 (2024).
