3.3 Transformations and adjustments

Adjusting the historical data can often lead to a simpler forecasting task. Here, we deal with four kinds of adjustments: calendar adjustments, population adjustments, inflation adjustments and mathematical transformations. The purpose of these adjustments and transformations is to simplify the patterns in the historical data by removing known sources of variation or by making the pattern more consistent across the whole data set. Simpler patterns usually lead to more accurate forecasts.

Calendar adjustments

Some of the variation seen in seasonal data may be due to simple calendar effects. In such cases, it is usually much easier to remove the variation before fitting a forecasting model.

For example, if you are studying the total monthly sales in a retail store, there will be variation between the months simply because of the different numbers of trading days in each month, in addition to the seasonal variation across the year. It is easy to remove this variation by computing average sales per trading day in each month, rather than total sales in the month. Then we effectively remove the calendar variation. Simpler patterns are usually easier to model and lead to more accurate forecasts.

Population adjustments

Any data that are affected by population changes can be adjusted to give per-capita data. That is, consider the data per person (or per thousand people, or per million people) rather than the total. For example, if you are studying the number of hospital beds in a particular region over time, the results are much easier to interpret if you remove the effects of population changes by considering the number of beds per thousand people. Then you can see whether there have been real increases in the number of beds, or whether the increases are due entirely to population increases. It is possible for the total number of beds to increase, but the number of beds per thousand people to decrease. This occurs when the population is increasing faster than the number of hospital beds. For most data that are affected by population changes, it is best to use per-capita data rather than the totals.

This can be seen in the global_economy dataset, where a common transformation of GDP is GDP per-capita.

global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(GDP / Population)
Australian GDP per-capita.

Figure 3.4: Australian GDP per-capita.

Inflation adjustments

Data which are affected by the value of money are best adjusted before modelling. For example, the average cost of a new house will have increased over the last few decades due to inflation. A $200,000 house this year is not the same as a $200,000 house twenty years ago. For this reason, financial time series are usually adjusted so that all values are stated in dollar values from a particular year. For example, the house price data may be stated in year 2000 dollars.

To make these adjustments, a price index is used. If \(z_{t}\) denotes the price index and \(y_{t}\) denotes the original house price in year \(t\), then \(x_{t} = y_{t}/z_{t} * z_{2000}\) gives the adjusted house price at year 2000 dollar values. Price indexes are often constructed by government agencies. For consumer goods, a common price index is the Consumer Price Index (or CPI).

This allows us to compare the growth or decline of industries relative to a common price value. For example, looking at aggregate “newspaper and book” retail turnover from aus_retail, and adjusting the data for inflation using CPI from global_economy allows us to understand the changes over time.

print_retail <- aus_retail %>%
  filter(Industry == "Newspaper and book retailing") %>%
  group_by(Industry) %>%
  index_by(Year = year(Month)) %>%
  summarise(Turnover = sum(Turnover))
aus_economy <- global_economy %>%
  filter(Code == "AUS")
print_retail %>%
  left_join(aus_economy, by = "Year") %>%
  mutate(Adjusted_turnover = Turnover / CPI) %>%
  gather("Type", "Turnover", Turnover, Adjusted_turnover, factor_key = TRUE) %>%
  ggplot(aes(x = Year, y = Turnover)) +
    geom_line() +
    facet_grid(vars(Type), scales = "free_y") +
    xlab("Years") + ylab(NULL) +
    ggtitle("Turnover for the Australian print media industry")

By adjusting for inflation using the CPI, we can see that Australia’s newspaper and book retailing industry has been in decline much longer than the original data suggests.

Mathematical transformations

If the data shows variation that increases or decreases with the level of the series, then a transformation can be useful. For example, a logarithmic transformation is often useful. If we denote the original observations as \(y_{1},\dots,y_{T}\) and the transformed observations as \(w_{1}, \dots, w_{T}\), then \(w_t = \log(y_t)\). Logarithms are useful because they are interpretable: changes in a log value are relative (or percentage) changes on the original scale. So if log base 10 is used, then an increase of 1 on the log scale corresponds to a multiplication of 10 on the original scale. Another useful feature of log transformations is that they constrain the forecasts to stay positive on the original scale.

Sometimes other transformations are also used (although they are not so interpretable). For example, square roots and cube roots can be used. These are called power transformations because they can be written in the form \(w_{t} = y_{t}^p\).

A useful family of transformations, that includes both logarithms and power transformations, is the family of Box-Cox transformations, which depend on the parameter \(\lambda\) and are defined as follows: \[ w_t = \begin{cases} \log(y_t) & \text{if $\lambda=0$}; \\ (y_t^\lambda-1)/\lambda & \text{otherwise}. \end{cases} \]

The logarithm in a Box-Cox transformation is always a natural logarithm (i.e., to base \(e\)). So if \(\lambda=0\), natural logarithms are used, but if \(\lambda\ne0\), a power transformation is used, followed by some simple scaling.

If \(\lambda=1\), then \(w_t = y_t-1\), so the transformed data is shifted downwards but there is no change in the shape of the time series. But for all other values of \(\lambda\), the time series will change shape.

Use the slider below to see the effect of varying \(\lambda\) to transform Australian quarterly gas production:

A good value of \(\lambda\) is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler. In this case, \(\lambda=0.10\) works quite well, although any value of \(\lambda\) between 0 and 0.2 would give similar results.

The guerrero feature (Guerrero, 1993) can be used to choose a value of lambda for you. In this case it chooses \(\lambda=0.12\).

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>% autoplot(box_cox(Gas, lambda))

Having chosen a transformation, we need to forecast the transformed data. Then, we need to reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformation is given by \[\begin{equation} \tag{3.1} y_{t} = \begin{cases} \exp(w_{t}) & \text{if $\lambda=0$};\\ (\lambda w_t+1)^{1/\lambda} & \text{otherwise}. \end{cases} \end{equation}\]

The fable package will automatically back-transform the forecasts whenever a transformation has been used in the model definition.

Features of power transformations

  • If some \(y_{t}\le0\), no power transformation is possible unless all observations are adjusted by adding a constant to all values.
  • Choose a simple value of \(\lambda\). It makes explanations easier.
  • The forecasting results are relatively insensitive to the value of \(\lambda\).
  • Often no transformation is needed.
  • Transformations sometimes make little difference to the forecasts but have a large effect on prediction intervals.

Combinations of transformations

Combinations of transformations extend the way in which response variables can be modified substantially. The Box-Cox transformation (\(\lambda\neq 0\)) can be broken down into several simpler transformations (multiplication, addition and exponentiation).

A log transformation is often used to ensure that the resulting forecasts will be non-negative, which is especially appealing for data where this constraint is reasonable. However a log transformation cannot be used for data with zero (or negative) observations, which is typical of count data for example. Instead, a \(\log(x + 1)\) transformation is commonly used, which allows a log-like transformation to be applied on data that contains zero. Most combinations of transformations can be expressed in this way.

Another useful transformation is the scaled logit, which can be used to ensure that the forecasts are kept within a specific interval. A scaled logit that ensures the forecasted values are between \(a\) and \(b\) (where \(a<b\)) is given by: \[ f(x) = \log\left(\dfrac{x-a}{b-x}\right). \] Inverting this transformation gives the appropriate back-transformation of: \[ f^{-1}(x) = \dfrac{a + be^x}{1 + e^x} = \dfrac{(b-a)e^x}{1 + e^x} + a. \]

To use this transformation when modelling, we can create a new transformation with the new_transformation() function. This allows us to define two functions that accept the same parameters, where the observations are provided as the first argument. The first function is used to transform the data, the second is used to back-transform forecasts.

scaled_logit <- new_transformation(
  transformation = function(x, lower=0, upper=1){
    log((x-lower)/(upper-x))
  },
  inverse = function(x, lower=0, upper=1){
    (upper-lower)*exp(x)/(1+exp(x)) + lower
  }
)

With this new transformation function defined, it is now possible to restrict forecasts to be within a specified interval. For example, to restrict the forecasts to be between 0 and 100 you could use scaled_logit(y, 0, 100) as the model’s left hand side formula.

Bias adjustments

One issue with using mathematical transformations such as Box-Cox transformations is that the back-transformed point forecast will not be the mean of the forecast distribution. In fact, it will usually be the median of the forecast distribution (assuming that the distribution on the transformed space is symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, you may wish to add up sales forecasts from various regions to form a forecast for the whole country. But medians do not add up, whereas means do.

For a Box-Cox transformation, the back-transformed mean is given by \[\begin{equation} \tag{3.2} y_t = \begin{cases} \exp(w_t)\left[1 + \frac{\sigma_h^2}{2}\right] & \text{if $\lambda=0$;}\\ (\lambda w_t+1)^{1/\lambda}\left[1 + \frac{\sigma_h^2(1-\lambda)}{2(\lambda w_t+1)^{2}}\right] & \text{otherwise;} \end{cases} \end{equation}\] where \(\sigma_h^2\) is the \(h\)-step forecast variance on the transformed scale. The larger the forecast variance, the bigger the difference between the mean and the median.

The difference between the simple back-transformed forecast given by (3.1) and the mean given by (3.2) is called the bias. When we use the mean, rather than the median, we say the point forecasts have been bias-adjusted.

To see how much difference this bias-adjustment makes, consider the following example, where we forecast average annual price of eggs using the drift method with a log transformation \((\lambda=0)\). The log transformation is useful in this case to ensure the forecasts and the prediction intervals stay positive.

eggs <- as_tsibble(fma::eggs)
fit <- eggs %>% model(RW(log(value) ~ drift()))
fc <- fit %>% forecast(h=50) %>%
  mutate(Forecast = "Bias adjusted")
fc_biased <- fit %>% forecast(h=50, bias_adjust = FALSE) %>%
  mutate(Forecast = "Simple back transformation")
eggs %>% autoplot(value) +
  autolayer(fc_biased, level = 80) +
  autolayer(fc, colour = "red", level = NULL)
Forecasts of egg prices using a random walk with drift applied to the logged data. The bias-adjusted mean forecasts are shown in red, while the median forecasts are shown in blue.

Figure 3.5: Forecasts of egg prices using a random walk with drift applied to the logged data. The bias-adjusted mean forecasts are shown in red, while the median forecasts are shown in blue.

The blue line in Figure 3.5 shows the forecast medians while the red line shows the forecast means. Notice how the skewed forecast distribution pulls up the point forecast when we use the bias adjustment.

Bias adjustments will be applied by default in the fable package. To produce point forecasts that are not bias adjusted (giving forecast medians rather than means), use the argument bias_adjust=FALSE when you compute forecasts.

Bibliography

Guerrero, V. M. (1993). Time-series analysis supported by power transformations. Journal of Forecasting, 37–48.