## 4.3 STL Features

The STL decomposition discussed in Chapter 3 is the basis for several more features.

A time series decomposition can be used to measure the strength of trend and seasonality in a time series. Recall that the decomposition is written as \[ y_t = T_t + S_{t} + R_t, \] where \(T_t\) is the smoothed trend component, \(S_{t}\) is the seasonal component and \(R_t\) is a remainder component. For strongly trended data, the seasonally adjusted data should have much more variation than the remainder component. Therefore Var\((R_t)\)/Var\((T_t+R_t)\) should be relatively small. But for data with little or no trend, the two variances should be approximately the same. So we define the strength of trend as: \[ F_T = \max\left(0, 1 - \frac{\text{Var}(R_t)}{\text{Var}(T_t+R_t)}\right). \] This will give a measure of the strength of the trend between 0 and 1. Because the variance of the remainder might occasionally be even larger than the variance of the seasonally adjusted data, we set the minimal possible value of \(F_T\) equal to zero.
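The formula for \(F_T\) can be checked directly on a decomposition. The following is a minimal sketch in base R. It assumes the `stl()` function from the stats package with `s.window = "periodic"`, which is not the STL configuration `feat_stl()` uses, so the numbers will not match feasts exactly. The helper name `trend_strength()` is hypothetical. The sketch compares a strongly trended series (log monthly airline passengers) with a seasonal series that has little trend (monthly Nottingham temperatures).

```r
# Hypothetical helper: F_T = max(0, 1 - Var(R_t) / Var(T_t + R_t)).
# Assumption: base R stl() with s.window = "periodic", not feasts' STL settings.
trend_strength <- function(x) {
  comp <- stl(x, s.window = "periodic")$time.series
  remainder <- as.numeric(comp[, "remainder"])
  # Seasonally adjusted data: trend plus remainder
  adjusted <- as.numeric(comp[, "trend"]) + remainder
  max(0, 1 - var(remainder) / var(adjusted))
}

# Strongly trended series: log of monthly airline passengers (datasets package)
ft_air <- trend_strength(log(AirPassengers))
# Seasonal series with little trend: monthly temperatures at Nottingham
ft_temp <- trend_strength(nottem)
round(c(air = ft_air, nottingham = ft_temp), 3)
```

For the passenger series the remainder variance is tiny relative to the variance of the seasonally adjusted data, so \(F_T\) should be close to 1; for the temperature series the two variances are much closer, so \(F_T\) should be considerably smaller.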

The strength of seasonality is defined similarly, but with respect to the detrended data rather than the seasonally adjusted data: \[ F_S = \max\left(0, 1 - \frac{\text{Var}(R_t)}{\text{Var}(S_{t}+R_t)}\right). \] A series with seasonal strength \(F_S\) close to 0 exhibits almost no seasonality, while a series with strong seasonality will have \(F_S\) close to 1 because Var\((R_t)\) will be much smaller than Var\((S_{t}+R_t)\).
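A matching sketch for \(F_S\), again assuming base R's `stl()` with `s.window = "periodic"` (not the settings used by `feat_stl()`), compares a strongly seasonal series with simulated white noise that has been given a monthly frequency but has no genuine seasonal pattern. The helper `seasonal_strength()` and the simulated series `noise` are hypothetical illustrations.

```r
# Hypothetical helper: F_S = max(0, 1 - Var(R_t) / Var(S_t + R_t)).
# Assumption: base R stl() with s.window = "periodic", not feasts' STL settings.
seasonal_strength <- function(x) {
  comp <- stl(x, s.window = "periodic")$time.series
  remainder <- as.numeric(comp[, "remainder"])
  # Detrended data: seasonal plus remainder
  detrended <- as.numeric(comp[, "seasonal"]) + remainder
  max(0, 1 - var(remainder) / var(detrended))
}

set.seed(42)
# Simulated white noise labelled as monthly data: no real seasonality
noise <- ts(rnorm(144), frequency = 12)

fs_air <- seasonal_strength(log(AirPassengers))
fs_noise <- seasonal_strength(noise)
round(c(air = fs_air, noise = fs_noise), 3)
```

The passenger series should give \(F_S\) near 1, while the noise series should give a value near 0: STL still extracts a small "seasonal" component from pure noise, but it accounts for little of the detrended variance.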

These measures can be useful, for example, when you have a large collection of time series and you need to find the series with the most trend or the most seasonality. These and other STL-based features are computed using the `feat_stl()` function.

```
library(fpp3)  # attaches tsibbledata (tourism) and feasts (feat_stl)
tourism |>
  features(Trips, feat_stl)
#> # A tibble: 304 × 12
#> Region State Purpose trend…¹ seaso…² seaso…³ seaso…⁴ spiki…⁵ linea…⁶
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Adelaide Sout… Busine… 0.464 0.407 3 1 1.58e+2 -5.31
#> 2 Adelaide Sout… Holiday 0.554 0.619 1 2 9.17e+0 49.0
#> 3 Adelaide Sout… Other 0.746 0.202 2 1 2.10e+0 95.1
#> 4 Adelaide Sout… Visiti… 0.435 0.452 1 3 5.61e+1 34.6
#> 5 Adelaide Hil… Sout… Busine… 0.464 0.179 3 0 1.03e-1 0.968
#> 6 Adelaide Hil… Sout… Holiday 0.528 0.296 2 1 1.77e-1 10.5
#> 7 Adelaide Hil… Sout… Other 0.593 0.404 2 2 4.44e-4 4.28
#> 8 Adelaide Hil… Sout… Visiti… 0.488 0.254 0 3 6.50e+0 34.2
#> 9 Alice Springs Nort… Busine… 0.534 0.251 0 1 1.69e-1 23.8
#> 10 Alice Springs Nort… Holiday 0.381 0.832 3 1 7.39e-1 -19.6
#> # … with 294 more rows, 3 more variables: curvature <dbl>, stl_e_acf1 <dbl>,
#> # stl_e_acf10 <dbl>, and abbreviated variable names ¹trend_strength,
#> # ²seasonal_strength_year, ³seasonal_peak_year, ⁴seasonal_trough_year,
#> # ⁵spikiness, ⁶linearity
```

We can then use these features in plots to identify which types of series are the most heavily trended and which are the most seasonal.

```
tourism |>
  features(Trips, feat_stl) |>
  ggplot(aes(x = trend_strength, y = seasonal_strength_year,
             col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))
```

Clearly, holiday series are the most seasonal, which is unsurprising. The strongest trends tend to be in Western Australia and Victoria. The most seasonal series can also be easily identified and plotted.

```
tourism |>
  features(Trips, feat_stl) |>
  # Keep the series with the largest seasonal strength
  filter(
    seasonal_strength_year == max(seasonal_strength_year)
  ) |>
  # Join back to the original data to recover its observations
  left_join(tourism, by = c("State", "Region", "Purpose"), multiple = "all") |>
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))
```

This shows holiday trips to the most popular ski region of Australia.

The `feat_stl()` function also returns several features other than those discussed above.

- `seasonal_peak_year` indicates the timing of the peaks: which month or quarter contains the largest seasonal component. This tells us something about the nature of the seasonality. In the Australian tourism data, if Quarter 3 is the peak seasonal period, then people are travelling to the region in winter, whereas a peak in Quarter 1 suggests that the region is more popular in summer.
- `seasonal_trough_year` indicates the timing of the troughs: which month or quarter contains the smallest seasonal component.
- `spikiness` measures the prevalence of spikes in the remainder component \(R_t\) of the STL decomposition. It is the variance of the leave-one-out variances of \(R_t\).
- `linearity` measures the linearity of the trend component of the STL decomposition. It is based on the coefficient of a linear regression applied to the trend component.
- `curvature` measures the curvature of the trend component of the STL decomposition. It is based on the coefficient from an orthogonal quadratic regression applied to the trend component.
- `stl_e_acf1` is the first autocorrelation coefficient of the remainder series.
- `stl_e_acf10` is the sum of squares of the first ten autocorrelation coefficients of the remainder series.
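The definitions above can be read off an STL decomposition directly. The sketch below is a direct reading of those definitions in base R, not a copy of the feasts internals. It assumes `stl()` with `s.window = "periodic"`, uses an exact leave-one-out variance for spikiness, and fits the trend on `poly(t, 2)` (orthogonal polynomials) for both linearity and curvature, so its values may differ from `feat_stl()`. The peak and trough are reported as positions 1, 2, ... in the seasonal cycle; the tourism output above contains zeros in these columns, so `feat_stl()` evidently numbers positions differently. The helper name `stl_extra_features()` is hypothetical.

```r
# Hypothetical helper computing the remaining STL-based features.
# Assumptions: base R stl() with s.window = "periodic"; calculations follow the
# textual definitions and may not reproduce feasts' values exactly.
stl_extra_features <- function(x) {
  comp <- stl(x, s.window = "periodic")$time.series
  seasonal <- comp[, "seasonal"]
  trend <- as.numeric(comp[, "trend"])
  remainder <- as.numeric(comp[, "remainder"])
  n <- length(remainder)

  # Peak and trough: position within the seasonal cycle (1 = first month/quarter)
  pos <- cycle(seasonal)
  peak <- pos[which.max(seasonal)]
  trough <- pos[which.min(seasonal)]

  # Spikiness: variance of the leave-one-out variances of the remainder
  loo_var <- vapply(seq_len(n), function(i) var(remainder[-i]), numeric(1))
  spikiness <- var(loo_var)

  # Linearity and curvature: first- and second-order coefficients of an
  # orthogonal quadratic regression of the trend on time
  coefs <- coef(lm(trend ~ poly(seq_len(n), degree = 2)))
  linearity <- unname(coefs[2])
  curvature <- unname(coefs[3])

  # Remainder autocorrelations at lags 1 to 10 (dropping lag 0)
  r_acf <- acf(remainder, lag.max = 10, plot = FALSE)$acf[-1]

  c(seasonal_peak = peak, seasonal_trough = trough,
    spikiness = spikiness, linearity = linearity, curvature = curvature,
    e_acf1 = r_acf[1], e_acf10 = sum(r_acf^2))
}

feats <- stl_extra_features(log(AirPassengers))
round(feats, 4)
```

Because the log passenger series rises steadily, `linearity` should be positive; a large `e_acf10` would indicate autocorrelation left over in the remainder that the decomposition has not captured.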