12.8 Forecasting on training and test sets

Typically, we compute one-step forecasts on the training data (the “fitted values”) and multi-step forecasts on the test data. However, occasionally we may wish to compute multi-step forecasts on the training data, or one-step forecasts on the test data.

Multi-step forecasts on training data

We normally define fitted values to be one-step forecasts on the training set (see Section 3.3), but a similar idea can be used for multi-step forecasts. We will illustrate the method using an ARIMA(2,1,1)(0,1,2)\(_{12}\) model for monthly Australian eating-out expenditure (the auscafe data). The last five years are used as a test set, and the forecasts are plotted in Figure 12.7.

library(fpp2)  # provides the auscafe data and the forecasting functions

# Hold out the final five years of auscafe as a test set
training <- subset(auscafe, end=length(auscafe)-61)
test <- subset(auscafe, start=length(auscafe)-60)

# Estimate the model on the training data only
# (lambda=0 applies a log transformation)
cafe.train <- Arima(training, order=c(2,1,1),
  seasonal=c(0,1,2), lambda=0)

cafe.train %>%
  forecast(h=60) %>%
  autoplot() + autolayer(test)

Figure 12.7: Forecasts from an ARIMA model fitted to the Australian café training data.

The fitted() function has an h argument to allow for \(h\)-step “fitted values” on the training set. Figure 12.8 is a plot of 12-step (one year) forecasts on the training set. Because the model involves both seasonal (lag 12) and first (lag 1) differencing, it is not possible to compute these forecasts for the first few observations.

# Overlay the 12-step fitted values on the training data
autoplot(training, series="Training data") +
  autolayer(fitted(cafe.train, h=12),
    series="12-step fitted values")

Figure 12.8: Twelve-step fitted values from an ARIMA model fitted to the Australian café training data.
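Since the \(h\)-step fitted values are genuine forecasts, their errors tend to grow with the horizon. One way to see this is to compare in-sample RMSE values at different horizons; a minimal sketch using base R (the NA values at the start of the 12-step series are dropped):

# RMSE of the 1-step and 12-step in-sample forecasts; na.rm=TRUE
# drops the observations for which no forecast can be computed
sqrt(mean((training - fitted(cafe.train, h=1))^2, na.rm=TRUE))
sqrt(mean((training - fitted(cafe.train, h=12))^2, na.rm=TRUE))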

One-step forecasts on test data

It is common practice to fit a model using training data, and then to evaluate its performance on a test data set. The way this is usually done means that the forecast errors on the test data correspond to different forecast horizons. In the above example, we used the last sixty observations for the test data, and estimated our forecasting model on the training data. The resulting forecast errors are then for 1-step, 2-steps, …, 60-steps ahead. Because the forecast variance usually increases with the forecast horizon, simply averaging the absolute or squared errors from the test set combines results with different variances.
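For example, evaluating the sixty forecasts of Figure 12.7 in the usual way lumps all of the horizons into a single set of summary statistics (a sketch; the Test set row of the output averages the 1-step through 60-step errors together):

# Standard test-set evaluation: errors from all 60 horizons are averaged
cafe.train %>%
  forecast(h=60) %>%
  accuracy(test)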

One solution to this issue is to obtain 1-step errors on the test data. That is, we still use the training data to estimate any parameters, but when we compute forecasts on the test data, we use all of the data preceding each observation (both training and test data). So our training data are for times \(1,2,\dots,T-60\). We estimate the model on these data, and then compute \(\hat{y}_{T-60+h|T-61+h}\) for \(h=1,\dots,60\). Because the test data are not used to estimate the parameters, this still gives us a “fair” forecast. For the ets(), Arima(), tbats() and nnetar() functions, these calculations are easily carried out using the model argument.
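For instance, the same pattern with an ETS model might look as follows (a sketch; ets.train and ets.test are hypothetical names, and use.initial.values=TRUE keeps the initial states, as well as the smoothing parameters, fixed at their training-set values):

# Estimate an ETS model on the training data only
ets.train <- ets(training)
# Apply it to the test data without re-estimating anything
ets.test <- ets(test, model=ets.train, use.initial.values=TRUE)
accuracy(ets.test)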

Returning to the ARIMA model estimated above, we now apply it to the test data.

# Apply the previously estimated model to the test data without re-estimation
cafe.test <- Arima(test, model=cafe.train)
accuracy(cafe.test)
#>                     ME    RMSE     MAE      MPE  MAPE   MASE     ACF1
#> Training set -0.002622 0.04591 0.03413 -0.07301 1.002 0.1899 -0.05704

Note that Arima() does not re-estimate any parameters in this case. Instead, the model obtained previously (and stored as cafe.train) is applied to the test data. Because the model was not re-estimated, the “residuals” obtained here are actually one-step forecast errors. Consequently, the results produced by the accuracy() command are actually on the test set (despite the output saying “Training set”).
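The one-step forecasts themselves can also be extracted: fitted(cafe.test) returns them on the original scale, so they can be overlaid on the test data in the same way as before (a sketch):

# One-step forecasts of the test data, plotted over the actual values
autoplot(test, series="Test data") +
  autolayer(fitted(cafe.test), series="One-step forecasts")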