5.3 Fitted values and residuals

Fitted values

Each observation in a time series can be forecast using all previous observations. We call these fitted values and they are denoted by \(\hat{y}_{t|t-1}\), meaning the forecast of \(y_t\) based on observations \(y_{1},\dots,y_{t-1}\). We use these so often that we sometimes drop part of the subscript and just write \(\hat{y}_t\) instead of \(\hat{y}_{t|t-1}\). Fitted values almost always involve one-step forecasts (but see Section 13.8).

Actually, fitted values are often not true forecasts because any parameters involved in the forecasting method are estimated using all available observations in the time series, including future observations. For example, if we use the mean method, the fitted values are given by \[ \hat{y}_t = \hat{c} \] where \(\hat{c}\) is the average computed over all available observations, including those at times after \(t\). Similarly, for the drift method, the drift parameter is estimated using all available observations. In this case, the fitted values are given by \[ \hat{y}_t = y_{t-1} + \hat{c} \] where \(\hat{c} = (y_T-y_1)/(T-1)\). In both cases, there is a parameter to be estimated from the data. The “hat” above the \(c\) reminds us that this is an estimate. When the estimate of \(c\) involves observations after time \(t\), the fitted values are not true forecasts. On the other hand, naïve or seasonal naïve forecasts do not involve any parameters, and so fitted values are true forecasts in such cases.
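The formulas above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the series `y` is made up, and the variable names (`c_mean`, `c_drift`, `fitted_naive`, and so on) are hypothetical rather than part of any package.

```r
# Toy series; the numbers are made up for illustration only.
y <- c(12, 15, 14, 18, 21, 19, 24)
n <- length(y)

# Mean method: c_hat is the average of ALL observations,
# so every fitted value depends on data observed after time t.
c_mean <- mean(y)
fitted_mean <- rep(c_mean, n)

# For comparison: the average available at t = 4 uses only y_1, ..., y_3.
c_mean_past_only <- mean(y[1:3])

# Drift method: c_hat = (y_T - y_1) / (T - 1), which uses the last observation.
c_drift <- (y[n] - y[1]) / (n - 1)
# There is no fitted value at t = 1 because there is no earlier observation.
fitted_drift <- c(NA, y[-n] + c_drift)

# Naive method: no parameters, so each fitted value is a true one-step forecast.
fitted_naive <- c(NA, y[-n])
```

Here `c_mean` differs from `c_mean_past_only`, which is why the mean-method fitted value at \(t=4\) is not a true forecast, whereas `fitted_naive` never looks beyond time \(t-1\).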


Residuals

The “residuals” in a time series model are what is left over after fitting a model. The residuals are equal to the difference between the observations and the corresponding fitted values: \[ e_{t} = y_{t}-\hat{y}_{t}. \]

If a transformation has been used in the model, then it is often useful to look at residuals on the transformed scale. We call these “innovation residuals”. For example, suppose we modelled the logarithms of the data, \(w_t = \log(y_t)\). Then the innovation residuals are given by \(w_t - \hat{w}_t\) whereas the regular residuals are given by \(y_t - \hat{y}_t\). (See Section 5.6 for how to use transformations when forecasting.) If no transformation has been used then the innovation residuals are identical to the regular residuals, and in such cases we will simply call them “residuals”.
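To make the distinction concrete, here is a minimal base R sketch. It assumes a made-up positive series, a naïve method applied to the logged data, and a plain `exp()` back-transformation with no bias adjustment; none of these choices comes from the beer example, and a forecasting package may back-transform differently.

```r
# Toy positive series; values are made up for illustration.
y <- c(100, 110, 121, 130, 150)
n <- length(y)

# Model the logarithms with a naive method: w_hat_t = w_{t-1}.
w <- log(y)
w_hat <- c(NA, w[-n])

# Innovation residuals are computed on the transformed (log) scale.
innov <- w - w_hat

# Back-transform the fitted values (assumption: simple exp(), no bias adjustment),
# then compute the regular residuals on the original scale.
y_hat <- exp(w_hat)
resid <- y - y_hat
```

At \(t=2\) the innovation residual is \(\log(110) - \log(100) = \log(1.1)\), while the regular residual is \(110 - 100 = 10\): the two are measured on different scales and are not interchangeable.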

The fitted values and residuals from a model can be obtained using the augment() function. In the beer production example in Section 5.2, we saved the fitted models as beer_fit. So we can simply apply augment() to this object to compute the fitted values and residuals for all models.

augment(beer_fit)

#> # A tsibble: 180 x 6 [1Q]
#> # Key:       .model [3]
#>    .model Quarter  Beer .fitted .resid .innov
#>    <chr>    <qtr> <dbl>   <dbl>  <dbl>  <dbl>
#>  1 Mean   1992 Q1   443    436.   6.55   6.55
#>  2 Mean   1992 Q2   410    436. -26.4  -26.4 
#>  3 Mean   1992 Q3   420    436. -16.4  -16.4 
#>  4 Mean   1992 Q4   532    436.  95.6   95.6 
#>  5 Mean   1993 Q1   433    436.  -3.45  -3.45
#>  6 Mean   1993 Q2   421    436. -15.4  -15.4 
#>  7 Mean   1993 Q3   410    436. -26.4  -26.4 
#>  8 Mean   1993 Q4   512    436.  75.6   75.6 
#>  9 Mean   1994 Q1   449    436.  12.6   12.6 
#> 10 Mean   1994 Q2   381    436. -55.4  -55.4 
#> # ℹ 170 more rows

There are three new columns added to the original data:

  • .fitted contains the fitted values;
  • .resid contains the residuals;
  • .innov contains the “innovation residuals” which, in this case, are identical to the regular residuals.
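As a quick sanity check on how these columns relate, the following base R sketch copies the first four rows of the printed output by hand (it does not call augment()) and recovers the fitted value implied by \(e_t = y_t - \hat{y}_t\). The vector names are hypothetical, and the printed values are rounded, so the results agree only approximately.

```r
# Values copied from the first four rows of the output shown earlier (rounded as printed).
beer  <- c(443, 410, 420, 532)
resid <- c(6.55, -26.4, -16.4, 95.6)

# Since .resid = Beer - .fitted, the implied fitted value is Beer - .resid.
implied_fitted <- beer - resid

# For the mean method the fitted value is the same constant in every row,
# so all four implied values should agree up to rounding.
range(implied_fitted)
```

Each implied value is roughly 436.4, matching the `436.` shown in the .fitted column for the Mean model.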

Residuals are useful in checking whether a model has adequately captured the information in the data. For this purpose, we use innovation residuals.

If patterns are observable in the innovation residuals, the model can probably be improved. We will look at some tools for exploring patterns in residuals in the next section.