## 5.4 Residual diagnostics

A good forecasting method will yield residuals with the following properties:

- The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts.
- The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.

Any forecasting method that does not satisfy these properties can be improved. However, that does not mean that forecasting methods that satisfy these properties cannot be improved. It is possible to have several different forecasting methods for the same data set, all of which satisfy these properties. Checking these properties is important in order to see whether a method is using all of the available information, but it is not a good way to select a forecasting method.

If either of these properties is not satisfied, then the forecasting method can be modified to give better forecasts. Adjusting for bias is easy: if the residuals have mean \(m\), then simply add \(m\) to all forecasts and the bias problem is solved. Fixing the correlation problem is harder, and we will not address it until Chapter 10.

In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties.

- The residuals have constant variance.
- The residuals are normally distributed.

These two properties make the calculation of prediction intervals easier (see Section 5.5 for an example). However, a forecasting method that does not satisfy these properties cannot necessarily be improved. Sometimes applying a Box-Cox transformation may assist with these properties, but otherwise there is usually little that you can do to ensure that your residuals have constant variance and a normal distribution. Instead, an alternative approach to obtaining prediction intervals is necessary. Again, we will not address how to do this until later in the book.

### Example: Forecasting the Google daily closing stock price

We will continue with the Google daily closing stock price example from the previous chapter. For stock market prices and indexes, the best forecasting method is often the naïve method. That is, each forecast is simply equal to the last observed value, or \(\hat{y}_{t} = y_{t-1}\). Hence, the residuals are simply equal to the difference between consecutive observations: \[ e_{t} = y_{t} - \hat{y}_{t} = y_{t} - y_{t-1}. \]

The following graph shows the Google daily closing stock price for trading days during 2015. The large jump corresponds to 17 July 2015 when the price jumped 16% due to unexpectedly strong second quarter results. (The `google_2015`

object was created in Section 5.2.)

```
google_2015 %>% autoplot(Close) +
xlab("Day") + ylab("Closing Price (US$)") +
ggtitle("Google Stock in 2015")
```

The residuals obtained from forecasting this series using the naïve method are shown in Figure 5.5. The large positive residual is a result of the unexpected price jump in July.

```
aug <- google_2015 %>% model(NAIVE(Close)) %>% augment()
aug %>% autoplot(.resid) + xlab("Day") + ylab("") +
ggtitle("Residuals from naïve method")
```

These graphs show that the naïve method produces forecasts that appear to account for all available information. The mean of the residuals is close to zero and there is no significant correlation in the residuals series. The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant. This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal — the right tail seems a little too long, even when we ignore the outlier. Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.

A convenient shortcut for producing these residual diagnostic graphs is the `gg_tsresiduals()`

function, which will produce a time plot, ACF plot and histogram of the residuals.

### Portmanteau tests for autocorrelation

In addition to looking at the ACF plot, we can also do a more formal test for autocorrelation by considering a whole set of \(r_k\) values as a group, rather than treating each one separately.

Recall that \(r_k\) is the autocorrelation for lag \(k\). When we look at the ACF plot to see whether each spike is within the required limits, we are implicitly carrying out multiple hypothesis tests, each one with a small probability of giving a false positive. When enough of these tests are done, it is likely that at least one will give a false positive, and so we may conclude that the residuals have some remaining autocorrelation, when in fact they do not.

In order to overcome this problem, we test whether the first \(\ell\) autocorrelations are significantly different from what would be expected from a white noise process. A test for a group of autocorrelations is called a **portmanteau test**, from a French word describing a suitcase containing a number of items.

One such test is the **Box-Pierce test**, based on the following statistic
\[
Q = T \sum_{k=1}^\ell r_k^2,
\]
where \(\ell\) is the maximum lag being considered and \(T\) is the number of observations. If each \(r_k\) is close to zero, then \(Q\) will be small. If some \(r_k\) values are large (positive or negative), then \(Q\) will be large. We suggest using \(\ell=10\) for non-seasonal data and \(\ell=2m\) for seasonal data, where \(m\) is the period of seasonality. However, the test is not good when \(\ell\) is large, so if these values are larger than \(T/5\), then use \(\ell=T/5\)

A related (and more accurate) test is the **Ljung-Box test**, based on
\[
Q^* = T(T+2) \sum_{k=1}^\ell (T-k)^{-1}r_k^2.
\]

Again, large values of \(Q^*\) suggest that the autocorrelations do not come from a white noise series.

How large is too large? If the autocorrelations did come from a white noise series, then both \(Q\) and \(Q^*\) would have a \(\chi^2\) distribution with \((\ell - K)\) degrees of freedom, where \(K\) is the number of parameters in the model. If they are calculated from raw data (rather than the residuals from a model), then set \(K=0\).

For the Google stock price example, the naïve model has no parameters, so \(K=0\) in that case also.

```
# lag=h and fitdf=K
aug %>% features(.resid, box_pierce, lag=10, dof=0)
#> # A tibble: 1 x 4
#> Symbol .model bp_stat bp_pvalue
#> <chr> <chr> <dbl> <dbl>
#> 1 GOOG NAIVE(Close) 7.74 0.654
aug %>% features(.resid, ljung_box, lag=10, dof=0)
#> # A tibble: 1 x 4
#> Symbol .model lb_stat lb_pvalue
#> <chr> <chr> <dbl> <dbl>
#> 1 GOOG NAIVE(Close) 7.91 0.637
```

For both \(Q\) and \(Q^*\), the results are not significant (i.e., the \(p\)-values are relatively large). Thus, we can conclude that the residuals are not distinguishable from a white noise series.

An alternative simple approach that may be appropriate for forecasting the Google daily closing stock price is the drift method. The `tidy()`

function shows the one estimated parameter, the drift coefficient, measuring the average daily change observed in the historical data.

```
fit <- google_2015 %>% model(RW(Close~drift()))
fit %>% tidy()
#> # A tibble: 1 x 7
#> Symbol .model term estimate std.error statistic p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 GOOG RW(Close ~ drift()) b 0.944 0.705 1.34 0.182
```

Applying the Ljung-Box test, we set \(K=1\) to account for the estimated parameter.

```
augment(fit) %>% features(.resid, ljung_box, lag=10, dof=1)
#> # A tibble: 1 x 4
#> Symbol .model lb_stat lb_pvalue
#> <chr> <chr> <dbl> <dbl>
#> 1 GOOG RW(Close ~ drift()) 7.91 0.543
```

As with the naïve approach, the residuals from the drift method are indistinguishable from a white noise series.