4.3 STL Features

The STL decompositions discussed in Chapter 3 are the basis for several more features.

A time series decomposition can be used to measure the strength of trend and seasonality in a time series. Recall that the decomposition is written as \[ y_t = T_t + S_{t} + R_t, \] where \(T_t\) is the smoothed trend component, \(S_{t}\) is the seasonal component and \(R_t\) is a remainder component. For strongly trended data, the seasonally adjusted data \(T_t+R_t\) should have much more variation than the remainder component, so \(\text{Var}(R_t)/\text{Var}(T_t+R_t)\) should be relatively small. For data with little or no trend, the two variances should be approximately the same. We therefore define the strength of trend as \[ F_T = \max\left(0, 1 - \frac{\text{Var}(R_t)}{\text{Var}(T_t+R_t)}\right). \] This gives a measure of the strength of the trend between 0 and 1. Because the variance of the remainder might occasionally be even larger than the variance of the seasonally adjusted data, we set the minimum possible value of \(F_T\) to zero.

The strength of seasonality is defined similarly, but with respect to the detrended data rather than the seasonally adjusted data: \[ F_S = \max\left(0, 1 - \frac{\text{Var}(R_t)}{\text{Var}(S_{t}+R_t)}\right). \] A series with seasonal strength \(F_S\) close to 0 exhibits almost no seasonality, while a series with strong seasonality will have \(F_S\) close to 1 because Var\((R_t)\) will be much smaller than Var\((S_{t}+R_t)\).
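
As a minimal sketch, both strength measures can be computed by hand from any additive STL fit. The example below uses base R's stl() and the built-in co2 series rather than feat_stl(), so the dataset, the s.window setting and the variable names are assumptions made for illustration. The values need not match what feat_stl() reports, since that function chooses its own STL settings.

```r
# Decompose a built-in monthly series with base R's stl().
# Assumption: s.window = "periodic" is chosen only for illustration.
fit <- stl(co2, s.window = "periodic")

trend     <- as.numeric(fit$time.series[, "trend"])
seasonal  <- as.numeric(fit$time.series[, "seasonal"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Strength of trend: compare the remainder with the seasonally adjusted data.
F_T <- max(0, 1 - var(remainder) / var(trend + remainder))

# Strength of seasonality: compare the remainder with the detrended data.
F_S <- max(0, 1 - var(remainder) / var(seasonal + remainder))

c(trend_strength = F_T, seasonal_strength = F_S)
```

The max(0, ...) guard mirrors the definitions above: it only matters when the remainder varies more than the series it is compared against.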

These measures can be useful, for example, when you have a large collection of time series, and you need to find the series with the most trend or the most seasonality.

Other useful features based on STL include the timing of peaks and troughs: which month or quarter contains the largest seasonal component and which contains the smallest. This tells us something about the nature of the seasonality. In the Australian tourism data, if Quarter 3 is the peak seasonal period, then people are travelling to the region in winter, whereas a peak in Quarter 1 suggests that the region is more popular in summer.
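
The idea of peak and trough timing can be sketched with base R alone. The example below is an illustration rather than the feat_stl() implementation: the built-in quarterly UKgas series, the log transformation and the variable names are all assumptions. Quarters are numbered 1 to 4 here, which may differ from the coding feat_stl() uses in its output.

```r
# Quarterly UK gas consumption, a built-in dataset. The log scale is an
# assumption made so that an additive decomposition is reasonable.
fit <- stl(log(UKgas), s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]

# Average seasonal component within each quarter (1 to 4).
seasonal_by_quarter <- tapply(
  as.numeric(seasonal),
  as.integer(cycle(seasonal)),
  mean
)

# Quarter with the largest and the smallest seasonal component.
peak_quarter   <- as.integer(names(which.max(seasonal_by_quarter)))
trough_quarter <- as.integer(names(which.min(seasonal_by_quarter)))

c(peak = peak_quarter, trough = trough_quarter)
```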

These STL-based features are computed using the feat_stl() function.

tourism %>%
  features(Trips, feat_stl)
#> # A tibble: 304 x 12
#>    Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
#>    <chr>  <chr> <chr>            <dbl>            <dbl>            <dbl>
#>  1 Adela… Sout… Busine…          0.451            0.380                3
#>  2 Adela… Sout… Holiday          0.541            0.601                1
#>  3 Adela… Sout… Other            0.743            0.189                2
#>  4 Adela… Sout… Visiti…          0.433            0.446                1
#>  5 Adela… Sout… Busine…          0.453            0.140                3
#>  6 Adela… Sout… Holiday          0.512            0.244                2
#>  7 Adela… Sout… Other            0.584            0.374                2
#>  8 Adela… Sout… Visiti…          0.481            0.228                0
#>  9 Alice… Nort… Busine…          0.526            0.224                0
#> 10 Alice… Nort… Holiday          0.377            0.827                3
#> # … with 294 more rows, and 6 more variables: seasonal_trough_year <dbl>,
#> #   spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
#> #   stl_e_acf10 <dbl>

We can then use these features in plots to identify which types of series are heavily trended and which are most seasonal.

tourism %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))

Clearly, holiday series are the most seasonal, which is unsurprising. The strongest trends tend to be in Western Australia and Victoria.

The most seasonal series can also be easily identified and plotted.

tourism %>%
  features(Trips, feat_stl) %>%
  filter(seasonal_strength_year == max(seasonal_strength_year)) %>%
  left_join(tourism, by = c("State", "Region", "Purpose")) %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))

This shows holiday trips to the most popular ski region of Australia.

The feat_stl() function returns several features beyond those discussed above.

  • spikiness measures the prevalence of spikes in the remainder component \(R_t\) of the STL decomposition. It is the variance of the leave-one-out variances of \(R_t\).
  • linearity measures the linearity of the trend component of the STL decomposition. It is based on the coefficient of a linear regression applied to the trend component.
  • curvature measures the curvature of the trend component of the STL decomposition. It is based on the coefficient from an orthogonal quadratic regression applied to the trend component.
  • stl_e_acf1 is the first autocorrelation coefficient of the remainder series.
  • stl_e_acf10 is the sum of squares of the first ten autocorrelation coefficients of the remainder series.
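
The descriptions above can be turned into a rough base R sketch. This is an approximation for illustration, not the feat_stl() source: the co2 dataset, the s.window setting, the use of poly() for the orthogonal quadratic regression and all variable names are assumptions, so the numbers need not agree with feat_stl().

```r
# STL fit on a built-in monthly series (settings chosen for illustration).
fit <- stl(co2, s.window = "periodic")
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# spikiness: variance of the leave-one-out variances of the remainder.
loo_var   <- sapply(seq_along(remainder), function(i) var(remainder[-i]))
spikiness <- var(loo_var)

# linearity and curvature. Assumption: both are taken from one orthogonal
# quadratic regression of the trend on time, via poly(time, 2).
poly_fit  <- lm(trend ~ poly(seq_along(trend), 2))
linearity <- unname(coef(poly_fit)[2])  # coefficient of the linear term
curvature <- unname(coef(poly_fit)[3])  # coefficient of the quadratic term

# Autocorrelations of the remainder at lags 1 to 10 (element 1 is lag 0).
r_acf <- acf(remainder, lag.max = 10, plot = FALSE)$acf[2:11]
acf1  <- r_acf[1]
acf10 <- sum(r_acf^2)

c(spikiness = spikiness, linearity = linearity, curvature = curvature,
  acf1 = acf1, acf10 = acf10)
```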