7.7 Nonlinear regression

Although the linear relationship assumed so far in this chapter is often adequate, there are many cases in which a nonlinear functional form is more suitable. To keep things simple in this section we assume that we only have one predictor \(x\).

The simplest way of modelling a nonlinear relationship is to transform the forecast variable \(y\) and/or the predictor variable \(x\) before estimating a regression model. While this provides a non-linear functional form, the model is still linear in the parameters. The most commonly used transformation is the (natural) logarithm (see Section 3.1).

A log-log functional form is specified as \[ \log y=\beta_0+\beta_1 \log x +\varepsilon. \] In this model, the slope \(\beta_1\) can be interpreted as an elasticity: \(\beta_1\) is the average percentage change in \(y\) resulting from a 1% increase in \(x\). Other useful forms can also be specified. The log-linear form is specified by only transforming the forecast variable and the linear-log form is obtained by transforming the predictor.

Recall that in order to perform a logarithmic transformation to a variable, all of its observed values must be greater than zero. In the case that variable \(x\) contains zeros, we use the transformation \(\log(x+1)\); i.e., we add one to the value of the variable and then take logarithms. This has a similar effect to taking logarithms but avoids the problem of zeros. It also has the neat side-effect of zeros on the original scale remaining zeros on the transformed scale.

There are cases for which simply transforming the data will not be adequate and a more general specification may be required. Then the model we use is \[ y=f(x) +\varepsilon \] where \(f\) is a nonlinear function. In standard (linear) regression, \(f(x)=\beta_{0} + \beta_{1} x\). In the specification of nonlinear regression that follows, we allow \(f\) to be a more flexible nonlinear function of \(x\), compared to simply a logarithmic or other transformation.

One of the simplest specifications is to make \(f\) piecewise linear. That is, we introduce points where the slope of \(f\) can change. These points are called knots. This can be achieved by letting \(x_{1}=x\) and introducing variable \(x_{2}\) such that \[\begin{align*} x_{2} = (x-c)_+ &= \left\{ \begin{array}{ll} 0 & \text{if } x < c\\ x-c & \text{if } x \ge c. \end{array}\right. \end{align*}\] The notation \((x-c)_+\) means the value \(x-c\) if it is positive and 0 otherwise. This forces the slope to bend at point \(c\). Additional bends can be included in the relationship by adding further variables of the above form.

Piecewise linear relationships constructed in this way are a special case of regression splines. In general, a linear regression spline is obtained using \[ x_{1}= x \quad x_{2} = (x-c_{1})_+ \quad\dots\quad x_{k} = (x-c_{k-1})_+ \] where \(c_{1},\dots,c_{k-1}\) are the knots (the points at which the line can bend). Selecting the number of knots (\(k-1\)) and where they should be positioned can be difficult and somewhat arbitrary. Some automatic knot selection algorithms are available, but are not widely used.

Forecasting with a nonlinear trend

In Section 7.4 fitting a linear trend to a time series by setting \(x=t\) was introduced. The simplest way of fitting a nonlinear trend is using quadratic or higher order trends obtained by specifying \[ x_{1,t} =t,\quad x_{2,t}=t^2,\quad \dots. \] However, it is not recommended that quadratic or higher order trends be used in forecasting. When they are extrapolated, the resulting forecasts are often unrealistic.

A better approach is to use the piecewise specification introduced above and fit a piecewise linear trend which bends at some point in time. We can think of this as a nonlinear trend constructed of linear pieces. If the trend bends at time \(\tau\), then it can be specified by simply replacing \(x=t\) and \(c=\tau\) above such that we include the predictors, \[\begin{align*} x_{1,t} & = t \\ x_{2,t} &= (t-\tau)_+ = \left\{ \begin{array}{ll} 0 & \text{if } t < \tau\\ t-\tau & \text{if } t \ge \tau \end{array}\right. \end{align*}\] in the model. If the associated coefficients of \(x_{1,t}\) and \(x_{2,t}\) are \(\beta_1\) and \(\beta_2\), then \(\beta_1\) gives the slope of the trend before time \(\tau\), while the slope of the line after time \(\tau\) is given by \(\beta_1+\beta_2\). Additional bends can be included in the relationship by adding further variables of the form \((t-\tau)_+\) where \(\tau\) is the “knot” or point in time at which the line should bend.

Example: Boston marathon winning times

We will fit some trend models to the Boston marathon winning times for men. First we extract the men’s data and convert the winning times to a numerical value. The course was lengthened (from 24.5 miles to 26.2 miles) in 1924, which led to a jump in the winning times, so we only consider data from that date onwards.

boston_men <- boston_marathon |>
  filter(Year >= 1924) |>
  filter(Event == "Men's open division") |>
  mutate(Minutes = as.numeric(Time)/60)

The top panel of Figure 7.20 shows the winning times since 1924. The time series shows a general downward trend as the winning times have been improving over the years. The bottom panel shows the residuals from fitting a linear trend to the data. The plot shows an obvious nonlinear pattern which has not been captured by the linear trend.

Fitting a linear trend to the Boston marathon winning times is inadequate

Figure 7.20: Fitting a linear trend to the Boston marathon winning times is inadequate

Fitting an exponential trend (equivalent to a log-linear regression) to the data can be achieved by transforming the \(y\) variable so that the model to be fitted is, \[ \log y_t=\beta_0+\beta_1 t +\varepsilon_t. \] The fitted exponential trend and forecasts are shown in Figure 7.21. Although the exponential trend does not seem to fit the data much better than the linear trend, it perhaps gives a more sensible projection in that the winning times will decrease in the future but at a decaying rate rather than a fixed linear rate.

The plot of winning times reveals three different periods. There is a lot of volatility in the winning times up to about 1950, with the winning times barely declining. After 1950 there is a clear decrease in times, followed by a flattening out after the 1980s, with the suggestion of an upturn towards the end of the sample. To account for these changes, we specify the years 1950 and 1980 as knots. We should warn here that subjective identification of knots can lead to over-fitting, which can be detrimental to the forecast performance of a model, and should be performed with caution.

fit_trends <- boston_men |>
    linear = TSLM(Minutes ~ trend()),
    exponential = TSLM(log(Minutes) ~ trend()),
    piecewise = TSLM(Minutes ~ trend(knots = c(1950, 1980)))
fc_trends <- fit_trends |> forecast(h = 10)

boston_men |>
  autoplot(Minutes) +
  geom_line(data = fitted(fit_trends),
            aes(y = .fitted, colour = .model)) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(y = "Minutes",
       title = "Boston marathon winning times")
Projecting forecasts from linear, exponential and piecewise linear trends for the Boston marathon winning times.

Figure 7.21: Projecting forecasts from linear, exponential and piecewise linear trends for the Boston marathon winning times.

Figure 7.21 shows the fitted lines and forecasts from linear, exponential and piecewise linear trends. The best forecasts appear to come from the piecewise linear trend.