13.4 Forecast combinations

An easy way to improve forecast accuracy is to use several different methods on the same time series, and to average the resulting forecasts. Over 50 years ago, John Bates and Clive Granger wrote a famous paper (Bates & Granger, 1969), showing that combining forecasts often leads to better forecast accuracy. Twenty years later, Clemen (1989) wrote

The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. In many cases one can make dramatic performance improvements by simply averaging the forecasts.

While there has been considerable research on using weighted averages, or some other more complicated combination approach, using a simple average has proven hard to beat.

Here is an example using monthly revenue from take-away food in Australia, from April 1982 to December 2018. We use forecasts from the following models: ETS, STL-ETS, and ARIMA; and we compare the results using the last 5 years (60 months) of observations.

auscafe <- aus_retail |>
  filter(stringr::str_detect(Industry, "Takeaway")) |>
  summarise(Turnover = sum(Turnover))
train <- auscafe |>
  filter(year(Month) <= 2013)
STLF <- decomposition_model(
  STL(log(Turnover) ~ season(window = Inf)),
  ETS(season_adjust ~ season("N"))
)
cafe_models <- train |>
  model(
    ets = ETS(Turnover),
    stlf = STLF,
    arima = ARIMA(log(Turnover))
  ) |>
  mutate(combination = (ets + stlf + arima) / 3)
cafe_fc <- cafe_models |>
  forecast(h = "5 years")

Notice that we form a combination in the mutate() function by simply taking a linear function of the estimated models. This very simple syntax will automatically handle the forecast distribution appropriately by taking account of the correlation between the forecast errors of the models that are included. However, to keep the next plot simple, we will omit the prediction intervals.

cafe_fc |>
  autoplot(auscafe |> filter(year(Month) > 2008),
           level = NULL) +
  labs(y = "$ billion",
       title = "Australian monthly expenditure on eating out")
Point forecasts from various methods applied to Australian monthly expenditure on eating out.

Figure 13.6: Point forecasts from various methods applied to Australian monthly expenditure on eating out.

cafe_fc |>
  accuracy(auscafe) |>
  arrange(RMSE)
#> # A tibble: 4 × 10
#>   .model      .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
#>   <chr>       <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 combination Test    8.09  41.0  31.8  0.401  2.19 0.776 0.790 0.747
#> 2 arima       Test  -25.4   46.2  38.9 -1.77   2.65 0.949 0.890 0.786
#> 3 stlf        Test  -36.9   64.1  51.7 -2.55   3.54 1.26  1.23  0.775
#> 4 ets         Test   86.5  122.  101.   5.51   6.66 2.46  2.35  0.880

ARIMA does particularly well with this series, while the combination approach does even better (based on most measures including RMSE and MAE). For other data, ARIMA may be quite poor, while the combination approach is usually not far off, or better than, the best component method.

Forecast combination distributions

The cafe_fc object contains forecast distributions, from which any prediction interval can usually be computed. Let’s look at the intervals for the first period.

cafe_fc |> filter(Month == min(Month))
#> # A fable: 4 x 4 [1M]
#> # Key:     .model [4]
#>   .model         Month           Turnover .mean
#>   <chr>          <mth>             <dist> <dbl>
#> 1 ets         2014 Jan      N(1289, 1118) 1289.
#> 2 stlf        2014 Jan t(N(7.2, 0.00063)) 1326.
#> 3 arima       2014 Jan t(N(7.2, 0.00061)) 1283.
#> 4 combination 2014 Jan               1299 1299.

The first three are a mixture of normal and transformed normal distributions. The package does not yet combine such diverse distributions, so the combination output is simply the mean instead.

However, if we work with simulated sample paths, it is possible to create forecast distributions for the combination forecast as well.

cafe_futures <- cafe_models |>
  # Generate 1000 future sample paths
  generate(h = "5 years", times = 1000) |>
  # Compute forecast distributions from future sample paths
  as_tibble() |>
  group_by(Month, .model) |>
  summarise(
    dist = distributional::dist_sample(list(.sim))
  ) |>
  ungroup() |>
  # Create fable object
  as_fable(index = Month, key = .model,
           distribution = dist, response = "Turnover")
# Forecast distributions for h=1
cafe_futures |> filter(Month == min(Month))
#> # A fable: 4 x 3 [1M]
#> # Key:     .model [4]
#>      Month .model              dist
#>      <mth> <chr>             <dist>
#> 1 2014 Jan arima       sample[1000]
#> 2 2014 Jan combination sample[1000]
#> 3 2014 Jan ets         sample[1000]
#> 4 2014 Jan stlf        sample[1000]

Now all four models, including the combination, are stored as empirical distributions, and we can plot prediction intervals for the combination forecast, as shown in Figure 13.7.

cafe_futures |>
  filter(.model == "combination") |>
  autoplot(auscafe |> filter(year(Month) > 2008)) +
  labs(y = "$ billion",
       title = "Australian monthly expenditure on eating out")
Prediction intervals for the combination forecast of Australian monthly expenditure on eating out.

Figure 13.7: Prediction intervals for the combination forecast of Australian monthly expenditure on eating out.

To check the accuracy of the 95% prediction intervals, we can use a Winkler score (defined in Section 5.9).

cafe_futures |>
  accuracy(auscafe, measures = interval_accuracy_measures,
    level = 95) |>
  arrange(winkler)
#> # A tibble: 4 × 3
#>   .model      .type winkler
#>   <chr>       <chr>   <dbl>
#> 1 combination Test     420.
#> 2 stlf        Test     596.
#> 3 ets         Test     731.
#> 4 arima       Test     766.

Lower is better, so the combination forecast is again better than any of the component models.

Bibliography

Bates, J. M., & Granger, C. W. J. (1969). The combination of forecasts. Operational Research Quarterly, 20(4), 451–468. [DOI]
Clemen, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5(4), 559–583. [DOI]