## 5.7 Time series cross-validation

A more sophisticated version of training/test sets is time series cross-validation. In this procedure, there are a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred *prior* to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. Since it is not possible to obtain a reliable forecast based on a small training set, the earliest observations are not considered as test sets.

The following diagram illustrates the series of training and test sets, where the blue observations form the training sets, and the red observations form the test sets.

The forecast accuracy is computed by averaging over the test sets. This procedure is sometimes known as “evaluation on a rolling forecasting origin” because the “origin” at which the forecast is based rolls forward in time.
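The rolling-origin procedure can be made concrete as a sequence of train/test index splits. The sketch below is an illustration in Python rather than the R code used in this chapter; the function name `rolling_origin_splits` and the toy series length are invented for this example.

```python
def rolling_origin_splits(n, init=3):
    """Yield (train_indices, test_index) pairs for one-step
    rolling-origin cross-validation: each training set contains
    only observations that occur prior to the single test observation."""
    splits = []
    for origin in range(init, n):
        train = list(range(origin))   # observations 0 .. origin-1
        test = origin                 # the single test observation
        splits.append((train, test))
    return splits

# For a series of 6 observations, with the first 3 reserved as the
# minimal training set, the origin rolls forward three times:
splits = rolling_origin_splits(6, init=3)
for train, test in splits:
    print(train, "->", test)
```

Each split's training set grows by one observation as the origin rolls forward, which is exactly the pattern of blue and red observations in the diagram above.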

With time series forecasting, one-step forecasts may not be as relevant as multi-step forecasts. In this case, the cross-validation procedure based on a rolling forecasting origin can be modified to allow multi-step errors to be used. Suppose that we are interested in models that produce good \(4\)-step-ahead forecasts. Then the corresponding diagram is shown below.
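The only change for \(h\)-step-ahead evaluation is that each test observation must lie \(h\) steps beyond the end of its training set. Continuing the Python sketch from above (again illustrative, not the chapter's R code):

```python
def rolling_origin_splits_h(n, init=3, h=4):
    """Yield (train_indices, test_index) pairs where the test
    observation is h steps beyond the end of the training set."""
    splits = []
    for origin in range(init, n - h + 1):
        train = list(range(origin))   # observations 0 .. origin-1
        test = origin + h - 1         # the h-step-ahead observation
        splits.append((train, test))
    return splits

# With 10 observations and h = 4, the last usable origin leaves
# exactly 4 future observations, so the final test point is index 9:
for train, test in rolling_origin_splits_h(10, init=3, h=4):
    print(len(train), "->", test)
```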

In the following example, we compare the accuracy obtained via time series cross-validation with the residual accuracy.

```
# Time series cross-validation accuracy
google_2015_tr <- google_2015 %>%
  slice(1:(n() - 1)) %>%
  stretch_tsibble(.init = 3, .step = 1)
fc <- google_2015_tr %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 1)
fc %>% accuracy(google_2015)
# Residual accuracy
google_2015 %>% model(RW(Close ~ drift())) %>% accuracy()
```

| Evaluation method | RMSE | MAE | MAPE | MASE |
|---|---|---|---|---|
| Cross-validation | 11.27 | 7.26 | 1.19 | 1.02 |
| Training | 11.15 | 7.16 | 1.18 | 1.00 |

As expected, the accuracy measures from the residuals are smaller, as the corresponding “forecasts” are based on a model fitted to the entire data set, rather than being true forecasts.

A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.
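To make this selection rule concrete, here is a toy sketch (in Python, with an invented eight-point series and two deliberately simple forecast rules, not the chapter's fable models) that computes the one-step cross-validated RMSE for a naïve forecaster and a mean forecaster:

```python
import math

series = [10.0, 12.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0]  # invented data

def cv_rmse(series, forecast_fn, init=3):
    """One-step rolling-origin CV RMSE for a forecast function that
    maps a training list to a single one-step-ahead forecast."""
    errors = []
    for origin in range(init, len(series)):
        train = series[:origin]
        errors.append(series[origin] - forecast_fn(train))
    return math.sqrt(sum(e * e for e in errors) / len(errors))

naive = lambda train: train[-1]                  # forecast = last observation
mean_fc = lambda train: sum(train) / len(train)  # forecast = historical mean

print(cv_rmse(series, naive), cv_rmse(series, mean_fc))
```

Whichever model attains the smaller cross-validated RMSE would be preferred; on this trending toy series the naïve rule beats the mean rule, because the historical mean lags behind the upward drift.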

### Example: Forecast horizon accuracy with cross-validation

The `google_2015` subset of the `gafa_stock` data, plotted in Figure 5.4, includes the daily closing stock price of Google Inc from the NASDAQ exchange for all trading days in 2015.

The code below evaluates the forecasting performance of 1- to 8-step-ahead drift forecasts. The plot shows that the forecast error increases as the forecast horizon increases, as we would expect.

```
google_2015_tr <- google_2015 %>%
  slice(1:(n() - 8)) %>%
  stretch_tsibble(.init = 3, .step = 1)
fc <- google_2015_tr %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 8) %>%
  group_by(.id) %>%
  mutate(h = row_number()) %>%
  ungroup()
fc %>%
  accuracy(google_2015, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE)) +
  geom_point()
```