5.6 Forecasting using transformations

Some common transformations which can be used when modelling were discussed in Section 3.1. When forecasting from a model with transformations, we first produce forecasts of the transformed data. Then, we need to reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformation is given by \[\begin{equation} \tag{5.1} y_{t} = \begin{cases} \exp(w_{t}) & \text{if $\lambda=0$};\\ (\lambda w_t+1)^{1/\lambda} & \text{otherwise}. \end{cases} \end{equation}\]
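As a rough illustration of equation (5.1), the following base R sketch round-trips a few values through a Box-Cox transformation and its inverse. The function names `bc_forward()` and `bc_inverse()` are hypothetical, and the forward transformation assumes the standard Box-Cox form for strictly positive data.

```r
# Hypothetical helpers for illustration only; assumes y > 0.
bc_forward <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Reverse Box-Cox transformation, following equation (5.1).
bc_inverse <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 50, 200)
for (lambda in c(0, 0.5)) {
  w <- bc_forward(y, lambda)
  # Back-transforming should recover the original values
  print(all.equal(bc_inverse(w, lambda), y))
}
```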

The fable package will automatically back-transform the forecasts whenever a transformation has been used in the model definition. The back-transformed forecast distribution is then a “transformed Normal” distribution.

Prediction intervals with transformations

If a transformation has been used, then the prediction interval is first computed on the transformed scale, then the end points are back-transformed to give a prediction interval on the original scale. This approach preserves the probability coverage of the prediction interval, although it will no longer be symmetric around the point forecast.
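The effect can be sketched in base R for a log transformation. The forecast values `w_hat` and `sigma_h` below are made up for illustration, and the forecast distribution on the log scale is assumed to be Normal.

```r
w_hat   <- 5     # assumed point forecast on the log scale
sigma_h <- 0.3   # assumed forecast standard deviation on the log scale
z       <- qnorm(0.9)  # multiplier for an 80% interval

# Interval on the transformed (log) scale, symmetric around w_hat
interval_log <- c(lower = w_hat - z * sigma_h, upper = w_hat + z * sigma_h)

# Back-transform the end points and the point forecast
interval_orig <- exp(interval_log)
point_orig    <- exp(w_hat)

# Coverage on the original scale, computed from the implied log-normal distribution
coverage <- plnorm(interval_orig[["upper"]], w_hat, sigma_h) -
  plnorm(interval_orig[["lower"]], w_hat, sigma_h)

# The distances from the point forecast to each end point differ
c(below = point_orig - interval_orig[["lower"]],
  above = interval_orig[["upper"]] - point_orig,
  coverage = coverage)
```

Because the exponential function is monotonic, the coverage stays at 80%, while the upper end point sits further from the point forecast than the lower one.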

The back-transformation of prediction intervals is done automatically for fable models, provided that you have used a transformation in the model formula.

Transformations sometimes make little difference to the point forecasts but have a large effect on prediction intervals.

Forecasting with constraints

One common use of transformations is to ensure the forecasts remain on the appropriate scale. For example, log transformations constrain the forecasts to stay positive.

Another useful transformation is the scaled logit, which can be used to ensure that the forecasts are kept within a specific interval. A scaled logit that ensures the forecasted values are between \(a\) and \(b\) (where \(a<b\)) is given by: \[ f(x) = \log\left(\dfrac{x-a}{b-x}\right). \] Inverting this transformation gives the appropriate back-transformation of: \[ f^{-1}(x) = \dfrac{a + be^x}{1 + e^x} = \dfrac{(b-a)e^x}{1 + e^x} + a. \]

To use this transformation when modelling, we can create a new transformation with the new_transformation() function. This allows us to define two functions that accept the same parameters, where the observations are provided as the first argument. The first function is used to transform the data, the second is used to back-transform forecasts.

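scaled_logit <- new_transformation(
  # Map observations from (lower, upper) onto the whole real line
  transformation = function(x, lower = 0, upper = 1) {
    log((x - lower) / (upper - x))
  },
  # Map values on the real line back into (lower, upper)
  inverse = function(x, lower = 0, upper = 1) {
    (upper - lower) * exp(x) / (1 + exp(x)) + lower
  }
)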

With this new transformation function defined, it is now possible to restrict forecasts to be within a specified interval. For example, to restrict the forecasts to be between 0 and 100 you could use scaled_logit(y, 0, 100) as the left-hand side of the model formula.
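To see why the constraint holds, here is a minimal base R sketch using plain-function versions of the two pieces above. The names `logit_forward()` and `logit_inverse()` are hypothetical, and the values in `w` stand in for forecasts on the transformed scale.

```r
# Plain-function versions of the scaled logit and its inverse (hypothetical names)
logit_forward <- function(x, lower = 0, upper = 1) {
  log((x - lower) / (upper - x))
}
logit_inverse <- function(x, lower = 0, upper = 1) {
  (upper - lower) * exp(x) / (1 + exp(x)) + lower
}

# Made-up "forecasts" on the transformed scale, including extreme values
w <- c(-10, -1, 0, 1, 10)

# Back-transformed values always fall strictly between 0 and 100
y <- logit_inverse(w, 0, 100)
range(y)

# Transforming again recovers the values on the transformed scale
all.equal(logit_forward(y, 0, 100), w)
```

However extreme the forecast on the transformed scale, the back-transformed value cannot leave the interval, which is the property the model relies on.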

Bias adjustments

One issue with using mathematical transformations such as Box-Cox transformations is that the back-transformed point forecast will not be the mean of the forecast distribution. In fact, it will usually be the median of the forecast distribution (assuming that the distribution on the transformed space is symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, you may wish to add up sales forecasts from various regions to form a forecast for the whole country. But medians do not add up, whereas means do.
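The point about aggregation can be checked with a small simulation. This is only a sketch: the two "regions" are simulated as independent log-normal draws with arbitrary parameters, not real sales data.

```r
set.seed(1)
n <- 1e5

# Simulated, skewed "sales" for two hypothetical regions
region_a <- rlnorm(n, meanlog = 0, sdlog = 1)
region_b <- rlnorm(n, meanlog = 0, sdlog = 1)
total    <- region_a + region_b

# Means add up: the two numbers agree
c(sum_of_means = mean(region_a) + mean(region_b),
  mean_of_total = mean(total))

# Medians do not: the median of the total exceeds the sum of the medians
c(sum_of_medians = median(region_a) + median(region_b),
  median_of_total = median(total))
```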

For a Box-Cox transformation, the back-transformed mean is given by \[\begin{equation} \tag{5.2} y_t = \begin{cases} \exp(w_t)\left[1 + \frac{\sigma_h^2}{2}\right] & \text{if $\lambda=0$;}\\ (\lambda w_t+1)^{1/\lambda}\left[1 + \frac{\sigma_h^2(1-\lambda)}{2(\lambda w_t+1)^{2}}\right] & \text{otherwise;} \end{cases} \end{equation}\] where \(\sigma_h^2\) is the \(h\)-step forecast variance on the transformed scale. The larger the forecast variance, the bigger the difference between the mean and the median.

The difference between the simple back-transformed forecast given by (5.1) and the mean given by (5.2) is called the bias. When we use the mean, rather than the median, we say the point forecasts have been bias-adjusted.
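The size of the bias can be illustrated with a short base R sketch of equation (5.2). The function name `bc_mean()` and the forecast values are hypothetical. For the log case, the exact log-normal mean is included for comparison, since the factor in (5.2) is an approximation to it.

```r
# Bias-adjusted back-transformed mean, following equation (5.2)
bc_mean <- function(w, sigma2, lambda) {
  if (lambda == 0) {
    exp(w) * (1 + sigma2 / 2)
  } else {
    (lambda * w + 1)^(1 / lambda) *
      (1 + sigma2 * (1 - lambda) / (2 * (lambda * w + 1)^2))
  }
}

w_hat  <- 5       # assumed point forecast on the log scale
sigma2 <- 0.3^2   # assumed h-step forecast variance on the log scale

median_fc <- exp(w_hat)                  # simple back-transform, equation (5.1)
mean_fc   <- bc_mean(w_hat, sigma2, 0)   # bias-adjusted mean, equation (5.2)
bias      <- mean_fc - median_fc

# Exact mean of a log-normal distribution, for comparison
exact_mean <- exp(w_hat + sigma2 / 2)

c(median = median_fc, mean = mean_fc, bias = bias, exact_mean = exact_mean)
```

Increasing `sigma2` widens the gap between the two point forecasts, which matches the behaviour expected at longer forecast horizons.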

To see how much difference this bias-adjustment makes, consider the following example, where we forecast the average annual price of eggs using the drift method with a log transformation \((\lambda=0)\). The log transformation is useful in this case to ensure the forecasts and the prediction intervals stay positive.

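library(fpp3)  # attaches fable, tsibble, and related packages used below
eggs <- as_tsibble(fma::eggs)
eggs %>%
  # Random walk with drift fitted to the logged prices
  model(RW(log(value) ~ drift())) %>%
  forecast(h = 50) %>%
  # Plot both the bias-adjusted mean and the median point forecasts
  autoplot(eggs, level = 80, point_forecast = lst(mean, median))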
#> Warning: Ignoring unknown aesthetics: linetype

Figure 5.11: Forecasts of egg prices using a random walk with drift applied to the logged data. The bias-adjusted mean forecasts are shown with a solid line, while the median forecasts are dashed.

The dashed line in Figure 5.11 shows the forecast medians while the solid line shows the forecast means. Notice how the skewed forecast distribution pulls up the forecast distribution’s mean. This is a result of the added term from the bias adjustment.

Bias-adjusted forecast means are automatically computed in the fable package when using mean() on a distribution. The forecast median (the point forecast prior to bias adjustment) can be obtained using the median() function on the distribution.