## 6.8 Nonlinear regression

Although the linear relationship assumed so far in this chapter is often adequate, there are many cases in which a nonlinear functional form is more suitable. To keep things simple in this section we assume that we only have one predictor \(x\).

The simplest way of modelling a nonlinear relationship is to transform the forecast variable \(y\) and/or the predictor variable \(x\) before estimating a regression model. While this provides a non-linear functional form, the model is still linear in the parameters. The most commonly used transformation is the (natural) logarithm (see Section 3.3).

A **log-log** functional form is specified as
\[
\log y=\beta_0+\beta_1 \log x +\varepsilon.
\]
In this model, the slope \(\beta_1\) can be interpreted as an elasticity: \(\beta_1\) is the average percentage change in \(y\) resulting from a \(1\%\) increase in \(x\). Other useful forms can also be specified. The **log-linear** form is specified by only transforming the forecast variable and the **linear-log** form is obtained by transforming the predictor.

Recall that in order to perform a logarithmic transformation to a variable, all of its observed values must be greater than zero. In the case that variable \(x\) contains zeros, we use the transformation \(\log(x+1)\); i.e., we add one to the value of the variable and then take logarithms. This has a similar effect to taking logarithms but avoids the problem of zeros. It also has the neat side-effect of zeros on the original scale remaining zeros on the transformed scale.

There are cases for which simply transforming the data will not be adequate and a more general specification may be required. Then the model we use is \[ y=f(x) +\varepsilon \] where \(f\) is a nonlinear function. In standard (linear) regression, \(f(x)=\beta_{0} + \beta_{1} x\). In the specification of nonlinear regression that follows, we allow \(f\) to be a more flexible nonlinear function of \(x\), compared to simply a logarithmic or other transformation.

One of the simplest specifications is to make \(f\) **piecewise linear**. That is, we introduce points where the slope of \(f\) can change. These points are called **knots**. This can be achieved by letting \(x_{1,t}=x\) and introducing variable \(x_{2,t}\) such that
\[\begin{align*}
x_{2,t} = (x-c)_+ &= \left\{
\begin{array}{ll}
0 & x < c\\
(x-c) & x \ge c
\end{array}\right.
\end{align*}\]
The notation \((x-c)_+\) means the value \(x-c\) if it is positive and 0 otherwise. This forces the slope to bend at point \(c\). Additional bends can be included in the relationship by adding further variables of the above form.

An example of this follows by considering \(x=t\) and fitting a piecewise linear trend to a time series.

Piecewise linear relationships constructed in this way are a special case of **regression splines**. In general, a linear regression spline is obtained using
\[
x_{1}= x \quad x_{2} = (x-c_{1})_+ \quad\dots\quad x_{k} = (x-c_{k-1})_+
\]
where \(c_{1},\dots,c_{k-1}\) are the knots (the points at which the line can bend). Selecting the number of knots (\(k-1\)) and where they should be positioned can be difficult and somewhat arbitrary. Some automatic knot selection algorithms are available in some software, but are not yet widely used.

### Forecasting with a nonlinear trend

In Section 6.4 fitting a linear trend to a time series by setting \(x=t\) was introduced. The simplest way of fitting a nonlinear trend is using quadratic or higher order trends obtained by specifying \[ x_{1,t} =t,\quad x_{2,t}=t^2,\quad \dots. \] However, it is not recommended that quadratic or higher order trends be used in forecasting. When they are extrapolated, the resulting forecasts are often unrealistic.

A better approach is to use the piecewise specification introduced above and fit a piecewise linear trend which bends at some point in time. We can think of this as a nonlinear trend constructed of linear pieces. If the trend bends at time \(\tau\), then it can be specified by simply replacing \(x=t\) and \(c=\tau\) above such that we include the predictors, \[\begin{align*} x_{1,t} & = t \\ x_{2,t} = (t-\tau)_+ &= \left\{ \begin{array}{ll} 0 & t < \tau\\ (t-\tau) & t \ge \tau \end{array}\right. \end{align*}\] in the model. If the associated coefficients of \(x_{1,t}\) and \(x_{2,t}\) are \(\beta_1\) and \(\beta_2\), then \(\beta_1\) gives the slope of the trend before time \(\tau\), while the slope of the line after time \(\tau\) is given by \(\beta_1+\beta_2\). Additional bends can be included in the relationship by adding further variables of the form \((t-\tau)_+\) where \(\tau\) is the “knot” or point in time at which the line should bend.

#### Example: Boston marathon winning times

The top panel of Figure 6.20 shows the Boston marathon winning times (in minutes) since it started in 1897. The time series shows a general downward trend as the winning times have been improving over the years. The bottom panel shows the residuals from fitting a linear trend to the data. The plot shows an obvious nonlinear pattern which has not been captured by the linear trend. There is also some heteroscedasticity, with decreasing variation over time.

Fitting an exponential trend (equivalent to a log-linear regression) to the data can be achieved by transforming the \(y\) variable so that the model to be fitted is, \[ \log y_t=\beta_0+\beta_1 t +\varepsilon_t. \] This also addresses the heteroscedasticity. The fitted exponential trend and forecasts are shown in Figure 6.21. Although the exponential trend does not seem to fit the data much better than the linear trend, it gives a more sensible projection in that the winning times will decrease in the future but at a decaying rate rather than a fixed linear rate.

The plot of winning times reveals three different periods. There is a lot of volatility in the winning times up to about 1940, with the winning times decreasing overall but with significant increases during the 1920s. After 1940 there is a near-linear decrease in times, followed by a flattening out after the 1980s, with the suggestion of an upturn towards the end of the sample. To account for these changes, we specify the years 1940 and 1980 as knots. We should warn here that subjective identification of knots can lead to over-fitting, which can be detrimental to the forecast performance of a model, and should be performed with caution.

```
fit_trends <- boston_marathon %>%
model(
linear = TSLM(Minutes ~ trend()),
exponential = TSLM(log(Minutes) ~ trend()),
piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
)
fc_trends <- fit_trends %>% forecast(h=10)
boston_marathon %>%
autoplot(Minutes) +
geom_line(aes(y = .fitted, colour = .model), data = fitted(fit_trends)) +
autolayer(fc_trends, alpha = 0.5, level = 95) +
xlab("Year") + ylab("Winning times in minutes") +
ggtitle("Boston Marathon") +
guides(colour=guide_legend(title="Model"))
```

Figure 6.21 above shows the fitted lines and forecasts from linear, exponential and piecewise linear trends. The best forecasts appear to come from the piecewise linear trend.