## 8.2 Methods with trend

### Holt’s linear trend method

Holt (1957) extended simple exponential smoothing to allow the forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend): \[\begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation} && b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}, \end{align*}\] where \(\ell_t\) denotes an estimate of the level of the series at time \(t\), \(b_t\) denotes an estimate of the trend (slope) of the series at time \(t\), \(\alpha\) is the smoothing parameter for the level, \(0\le\alpha\le1\), and \(\beta^*\) is the smoothing parameter for the trend, \(0\le\beta^*\le1\). (We denote this as \(\beta^*\) instead of \(\beta\) for reasons that will be explained in Section 8.5.)

As with simple exponential smoothing, the level equation here shows that \(\ell_t\) is a weighted average of observation \(y_t\) and the one-step-ahead training forecast for time \(t\), here given by \(\ell_{t-1} + b_{t-1}\). The trend equation shows that \(b_t\) is a weighted average of the estimated trend at time \(t\) based on \(\ell_{t} - \ell_{t-1}\) and \(b_{t-1}\), the previous estimate of the trend.

The forecast function is no longer flat but trending. The \(h\)-step-ahead forecast is equal to the last estimated level plus \(h\) times the last estimated trend value. Hence the forecasts are a linear function of \(h\).
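The recursions above can be sketched in a few lines of base R. This is a minimal illustration only, not the estimation routine used later in this section: `holt_linear()` is a hypothetical helper, and the smoothing parameters and initial states are assumed to be supplied by the user rather than estimated.

```r
# Minimal sketch of Holt's linear trend recursions (base R only).
# holt_linear() is a hypothetical helper; parameters are given, not estimated.
holt_linear <- function(y, alpha, beta_star, l0, b0, h = 1) {
  n <- length(y)
  level  <- numeric(n)
  slope  <- numeric(n)
  fitted <- numeric(n)            # one-step-ahead training forecasts
  l_prev <- l0
  b_prev <- b0
  for (t in seq_len(n)) {
    fitted[t] <- l_prev + b_prev  # forecast of y[t] made at time t - 1
    level[t]  <- alpha * y[t] + (1 - alpha) * (l_prev + b_prev)
    slope[t]  <- beta_star * (level[t] - l_prev) + (1 - beta_star) * b_prev
    l_prev <- level[t]
    b_prev <- slope[t]
  }
  # Forecast equation: last level plus h times the last slope
  forecast <- l_prev + seq_len(h) * b_prev
  list(level = level, slope = slope, fitted = fitted, forecast = forecast)
}

# Toy data lying exactly on a straight line with slope 2
y <- c(12, 14, 16, 18)
res <- holt_linear(y, alpha = 0.5, beta_star = 0.3, l0 = 10, b0 = 2, h = 3)
res$forecast
#> [1] 20 22 24
```

Because the toy series is exactly linear and the initial states match it, the one-step training forecasts reproduce the data and the forecasts simply continue the line.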

### Example: Australian population

```
library(fpp3)  # loads tsibble, fable, ggplot2 and the global_economy data

aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)  # population in millions
autoplot(aus_economy, Pop) +
  labs(y = "Millions", title = "Australian population")
```

Figure 8.3 shows Australia’s annual population from 1960 to 2017. We will apply Holt’s method to this series. The smoothing parameters, \(\alpha\) and \(\beta^*\), and the initial values \(\ell_0\) and \(b_0\) are estimated by minimising the SSE for the one-step training errors as in Section 8.1.

```
fit <- aus_economy %>%
  model(
    AAN = ETS(Pop ~ error("A") + trend("A") + season("N"))
  )
fc <- fit %>% forecast(h = 10)
```

The estimated smoothing coefficient for the level is \(\hat{\alpha} = 0.9999\). The very high value shows that the level changes rapidly in order to capture the highly trended series. The estimated smoothing coefficient for the slope is \(\hat{\beta}^* = 0.3267\). This is relatively large, suggesting that the trend also changes often (even if the changes are slight).

In Table 8.2 we use these values to demonstrate the application of Holt’s method.

| Year | Time | Observation | Level | Slope | Forecast |
|---|---|---|---|---|---|
| | \(t\) | \(y_t\) | \(\ell_t\) | \(b_t\) | \(\hat{y}_{t\mid t-1}\) |
| 1959 | 0 | | 10.05 | 0.22 | |
| 1960 | 1 | 10.28 | 10.28 | 0.22 | 10.28 |
| 1961 | 2 | 10.48 | 10.48 | 0.22 | 10.50 |
| 1962 | 3 | 10.74 | 10.74 | 0.23 | 10.70 |
| 1963 | 4 | 10.95 | 10.95 | 0.22 | 10.97 |
| 1964 | 5 | 11.17 | 11.17 | 0.22 | 11.17 |
| 1965 | 6 | 11.39 | 11.39 | 0.22 | 11.39 |
| 1966 | 7 | 11.65 | 11.65 | 0.23 | 11.61 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 2014 | 55 | 23.50 | 23.50 | 0.37 | 23.52 |
| 2015 | 56 | 23.85 | 23.85 | 0.36 | 23.87 |
| 2016 | 57 | 24.21 | 24.21 | 0.36 | 24.21 |
| 2017 | 58 | 24.60 | 24.60 | 0.37 | 24.57 |
| | \(h\) | | | | \(\hat{y}_{T+h\mid T}\) |
| 2018 | 1 | | | | 24.97 |
| 2019 | 2 | | | | 25.34 |
| 2020 | 3 | | | | 25.71 |
| 2021 | 4 | | | | 26.07 |
| 2022 | 5 | | | | 26.44 |
| 2023 | 6 | | | | 26.81 |
| 2024 | 7 | | | | 27.18 |
| 2025 | 8 | | | | 27.55 |
| 2026 | 9 | | | | 27.92 |
| 2027 | 10 | | | | 28.29 |
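As a rough check on the forecast rows, the forecast equation can be applied by hand to the rounded 2017 values shown in the table (\(\ell_T = 24.60\), \(b_T = 0.37\)). Because the tabulated values are rounded to two decimals, small discrepancies at longer horizons are to be expected: \[\begin{align*} \hat{y}_{T+1\mid T} &= 24.60 + 1 \times 0.37 = 24.97 \\ \hat{y}_{T+2\mid T} &= 24.60 + 2 \times 0.37 = 25.34 \\ \hat{y}_{T+10\mid T} &= 24.60 + 10 \times 0.37 = 28.30 \approx 28.29. \end{align*}\]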

### Damped trend methods

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner & McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful, and are arguably the most popular individual methods when forecasts are required automatically for many series.

In conjunction with the smoothing parameters \(\alpha\) and \(\beta^*\) (with values between 0 and 1 as in Holt’s method), this method also includes a damping parameter \(0<\phi<1\): \[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*}\] If \(\phi=1\), the method is identical to Holt’s linear method. For values between \(0\) and \(1\), \(\phi\) dampens the trend so that it approaches a constant some time in the future. In fact, the forecasts converge to \(\ell_T+\phi b_T/(1-\phi)\) as \(h\rightarrow\infty\) for any value \(0<\phi<1\). This means that short-run forecasts are trended while long-run forecasts are constant.

In practice, \(\phi\) is rarely less than 0.8 as the damping has a very strong effect for smaller values. Values of \(\phi\) close to 1 mean that a damped model cannot be distinguished from a non-damped model. For these reasons, we usually restrict \(\phi\) to a minimum of 0.8 and a maximum of 0.98.
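To get a feel for these bounds, the following base R sketch compares how quickly the trend multiplier \(\phi+\phi^2+\dots+\phi^h\) approaches its limit \(\phi/(1-\phi)\) for a few values of \(\phi\). The helper names and the 95% threshold are assumptions chosen for illustration; the threshold calculation relies on the geometric-series identity \(\phi+\dots+\phi^h = \phi(1-\phi^h)/(1-\phi)\).

```r
# Trend multiplier in the damped forecast equation: phi + phi^2 + ... + phi^h
# (hypothetical helper names, for illustration only)
damped_multiplier <- function(phi, h) {
  vapply(h, function(k) sum(phi^seq_len(k)), numeric(1))
}

# Long-run limit of the multiplier as h grows
damped_limit <- function(phi) phi / (1 - phi)

# Smallest horizon at which the multiplier reaches a given share of its limit:
# multiplier / limit = 1 - phi^h >= share  <=>  h >= log(1 - share) / log(phi)
horizon_to_reach <- function(phi, share = 0.95) {
  ceiling(log(1 - share) / log(phi))
}

phis <- c(0.80, 0.90, 0.98)
data.frame(
  phi   = phis,
  limit = damped_limit(phis),
  h_95  = horizon_to_reach(phis)
)
```

Under these assumptions, \(\phi = 0.8\) gives a multiplier within 5% of its limit after 14 steps, whereas \(\phi = 0.98\) needs 149 steps, so over typical forecast horizons the latter behaves almost like an undamped trend.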

### Example: Australian population (continued)

Figure 8.4 shows the forecasts for years 2018–2032 generated from Holt’s linear trend method and the damped trend method.

```
aus_economy %>%
  model(
    `Holt's method` = ETS(Pop ~ error("A") +
                            trend("A") + season("N")),
    `Damped Holt's method` = ETS(Pop ~ error("A") +
                                   trend("Ad", phi = 0.9) + season("N"))
  ) %>%
  forecast(h = 15) %>%
  autoplot(aus_economy, level = NULL) +
  labs(title = "Australian population",
       y = "Millions") +
  guides(colour = guide_legend(title = "Forecast"))
```

We have set the damping parameter to a relatively low number \((\phi=0.90)\) to exaggerate the effect of damping for comparison. Usually, we would estimate \(\phi\) along with the other parameters. We have also used a rather large forecast horizon (\(h=15\)) to highlight the difference between a damped trend and a linear trend.

### Example: Internet usage

In this example, we compare the forecasting performance of the three exponential smoothing methods that we have considered so far in forecasting the number of users connected to the internet via a server. The data is observed over 100 minutes and is shown in Figure 8.5.

```
www_usage <- as_tsibble(WWWusage)
www_usage %>% autoplot(value) +
  labs(x = "Minute", y = "Number of users",
       title = "Internet usage per minute")
```

We will use time series cross-validation to compare the one-step forecast accuracy of the three methods.

```
www_usage %>%
  stretch_tsibble(.init = 10) %>%
  model(
    SES = ETS(value ~ error("A") + trend("N") + season("N")),
    Holt = ETS(value ~ error("A") + trend("A") + season("N")),
    Damped = ETS(value ~ error("A") + trend("Ad") +
                   season("N"))
  ) %>%
  forecast(h = 1) %>%
  accuracy(www_usage)
#> # A tibble: 3 x 10
#>   .model .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
#>   <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Damped Test  0.288   3.69  3.00 0.347  2.26 0.663 0.636 0.336
#> 2 Holt   Test  0.0610  3.87  3.17 0.244  2.38 0.701 0.668 0.296
#> 3 SES    Test  1.46    6.05  4.81 0.904  3.55 1.06  1.04  0.803
```
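The mechanics of this expanding-window evaluation can be sketched in base R. This is an illustration under simplifying assumptions, not a reimplementation of the code above: `cv_one_step()` and the two stand-in forecasters are hypothetical helpers, and no ETS parameters are estimated. The "last value" rule corresponds to simple exponential smoothing with \(\alpha = 1\), and the "last value plus last change" rule to Holt's method with \(\alpha = \beta^* = 1\).

```r
# Sketch of expanding-window cross-validation for one-step forecasts.
# cv_one_step() and the forecasters below are hypothetical helpers.
cv_one_step <- function(y, init, forecaster) {
  stopifnot(init >= 2, init < length(y))
  origins <- init:(length(y) - 1)   # each origin leaves one value to test on
  errors <- vapply(origins, function(n) {
    train <- y[seq_len(n)]          # training window grows by one each time
    y[n + 1] - forecaster(train)    # one-step forecast error
  }, numeric(1))
  c(RMSE = sqrt(mean(errors^2)), MAE = mean(abs(errors)))
}

# Stand-in forecasters (assumptions for illustration only)
last_value_fc <- function(train) train[length(train)]
last_change_fc <- function(train) {
  n <- length(train)
  train[n] + (train[n] - train[n - 1])
}

y <- as.numeric(WWWusage)           # built-in R dataset used in this example
rbind(
  `Last value`  = cv_one_step(y, init = 10, forecaster = last_value_fc),
  `Last change` = cv_one_step(y, init = 10, forecaster = last_change_fc)
)
```

The sketch only shows how the training window is stretched and how RMSE and MAE are computed from the one-step errors; the comparison of the three estimated ETS models is the tibble printed above.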

Damped Holt’s method is best whether you compare MAE or RMSE values. So we will proceed with using the damped Holt’s method and apply it to the whole data set to get forecasts for future minutes.

```
fit <- www_usage %>%
  model(
    Damped = ETS(value ~ error("A") + trend("Ad") +
                   season("N"))
  )
# Estimated parameters:
tidy(fit)
#> # A tibble: 5 x 3
#>   .model term  estimate
#>   <chr>  <chr>    <dbl>
#> 1 Damped alpha   1.00
#> 2 Damped beta    0.997
#> 3 Damped phi     0.815
#> 4 Damped l[0]   90.4
#> 5 Damped b[0]   -0.0173
```

The smoothing parameter for the slope is estimated to be almost one, indicating that the trend changes to mostly reflect the slope between the last two minutes of internet usage. The value of \(\alpha\) is very close to one, showing that the level reacts strongly to each new observation.
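To see why, consider the limiting case in which both smoothing parameters are taken to be exactly one (an approximation to the estimates above). Substituting \(\alpha = 1\) and \(\beta^* = 1\) into the damped trend equations gives \[\begin{align*} \ell_{t} &= y_{t} \\ b_{t} &= \ell_{t} - \ell_{t-1} = y_{t} - y_{t-1} \\ \hat{y}_{t+h|t} &= y_{t} + (\phi+\phi^2 + \dots + \phi^{h})(y_{t} - y_{t-1}), \end{align*}\] so the level is simply the latest observation and the slope is the change over the last minute.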

```
fit %>%
  forecast(h = 10) %>%
  autoplot(www_usage) +
  labs(x = "Minute", y = "Number of users",
       title = "Internet usage per minute")
```

The resulting forecasts look sensible, with a decreasing trend that flattens out due to the low value of the damping parameter (0.815), and relatively wide prediction intervals reflecting the variation in the historical data. The prediction intervals are calculated using the methods described in Section 8.7.
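The speed of this flattening can be quantified from the damped forecast equation, using the geometric-series identity \(\phi+\dots+\phi^h = \phi(1-\phi^h)/(1-\phi)\) and the estimate \(\hat\phi = 0.815\) (the values below are rounded): \[\begin{align*} \lim_{h\rightarrow\infty}(\phi+\phi^2+\dots+\phi^{h}) &= \frac{\phi}{1-\phi} = \frac{0.815}{0.185} \approx 4.41 \\ \phi+\phi^2+\dots+\phi^{10} &= \frac{\phi(1-\phi^{10})}{1-\phi} \approx 4.41 \times (1 - 0.129) \approx 3.84. \end{align*}\] The long-run forecast is therefore approximately \(\ell_T + 4.41\,b_T\), and by \(h = 10\) about 87% of that total trend contribution has already been applied.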

In this example, the process of selecting a method was relatively easy as both RMSE and MAE comparisons suggested the same method (damped Holt’s). However, sometimes different accuracy measures will suggest different forecasting methods, and then a decision is required as to which forecasting method we prefer to use. As forecasting tasks can vary along many dimensions (length of forecast horizon, size of test set, forecast error measures, frequency of data, etc.), it is unlikely that one method will be better than all others for all forecasting scenarios. What we require from a forecasting method are consistently sensible forecasts, and these should be frequently evaluated against the task at hand.