Piecewise regression comes about when you have ‘breakpoints’, where there are clearly two different linear relationships in the data with a sudden, sharp change in directionality. This crops up occasionally in ecology when dealing with, for example, species richness of understory plants and forest age. There is initially a rapid drop as succession progresses (rapid colonizing plants die as light becomes more limiting), then a sudden breakpoint and a more moderately paced rise in species richness as more shade tolerant plants and seedlings of older trees begin to colonize the mature forest.

If you Google ‘R piecewise regression’, you may get a variety of methods and advice on how to run a piecewise regression. Essentially, you can do it manually or use a canned package to run the regression. If you’re like me, you might be wondering: Are these methods are the same? If not, which is best? How do I actually use them? Well, hopefully I’ll answer some of these questions.

There are essentially two methods I’ll walk through: brute force iterative approaches (as in ‘The R Book’ by Crawley) and the ‘segmented’ package. Using these approaches allows you to statistically estimate the breakpoint, which is better than just eyeballing it and fitting two models around what you *think* is the breakpoint. Let statistics do the work for you objectively.

First, let’s generate some segmented data:

x <- c(1:10, 13:22) y <- numeric(20) ## Create first segment y[1:10] <- 20:11 + rnorm(10, 0, 1.5) ## Create second segment y[11:20] <- seq(11, 15, len=10) + rnorm(10, 0, 1.5) ## Plot it par(mar=c(4,4,1,1)+0.2) plot(x,y, ylim=c(5, 20), pch=16)

You should get a V-shaped plot, where the right side is a little shallower sloped.

**METHOD 1: ITERATIVE SEARCHING**

First, we’ll model the data using the iterative search procedure described by Crawley. We’re going to use a model that looks like this:

In this case, *c* is the breakpoint. *I*(*x < c*) and *I*(*x > c*) are essentially dummy variables. *I*(*x < c*) is 1 if x is less than the breakpoint and 0 if it is above. *I*(*x > c*) is a dummy variable that is 1 if x is above the breakpoint and 0 if it is below. It should be obvious that there are two sets of parameters being modeled, depending on the value of x.

The complicated bit is choosing the breakpoint. We can eyeball the data and say that the breakpoint is somewhere between 9 and 17. Choose a wider range than you might think, just to be safe. Create a variable called breaks to hold these breakpoints:

breaks <- x[which(x >= 9 & x <= 17)]

Now we’re going to iteratively search these breakpoints for the model that has the lowest residual MSE, using that as our criteria for the best model. Create an empty container for MSE values from each model, and use a for( ) loop to run a linear regression for each possible breakpoint. Formulate the linear model exactly like the above formula.

mse <- numeric(length(breaks)) for(i in 1:length(breaks)){ piecewise1 <- lm(y ~ x*(x < breaks[i]) + x*(x>=breaks[i])) mse[i] <- summary(piecewise1)[6] } mse <- as.numeric(mse)

If we plot MSE by breakpoints, we can visually estimate the breakpoint as the lowest point on the curve:

It could easily be either 13 or 15, so I just pick the breakpoint with the lowest error:

breaks[which(mse==min(mse))]

This tells me that the breakpoint in my simulation was 15. I run my final model with my newly estimated breakpoint:

piecewise2 <- lm(y ~ x*(x < 15) + x*(x > 15)) summary(piecewise2)

The summary looks like this:

This is a pretty nasty output. Read it like this: (Intercept) is more or less useless on its own. The intercept for the line when x < 15 is (Intercept) + x < 15TRUE, or 3.3133 + 16.6352 = 19.9485. The slope of the line when x < 15 is x + x:x<15TRUE, 0.5843 – 1.3025 = -0.7182. So, when x is less than 15, the formula is 19.9485 - 0.7182x.

When x is greater than 15, the intercept is 3.3133 – 0.9116 = 2.4017 and the slope is just the variable x, 0.5843. So when x is greater than 15, the line has the equation 2.4017 + 0.5843x. Notice this table excludes the coefficients with NA estimates. Those arise from singularities and can be ignored.

plot(x,y, ylim=c(5, 20), pch=16) curve((3.3133 + 16.6352) + (0.5843-1.3025)*x, add=T, from=1, to=15) curve((3.3133 - 0.9116) + 0.5843*x, add=T, from=15, to=max(x)) abline(v=15, lty=3)

Notice that the segments **were not** constrained to be touching or continuous. This is inherent in the algorithm that we used.

**METHOD 2: USE ‘SEGMENTED’ PACKAGE**

To use the ‘segmented’ package you will, of course, need to install and load it. The procedure of ‘segmented’ uses maximum likelihood to fit a somewhat different parameterization of the model:

*I*(*x > c*) is a dummy variable as above, so when x < c, the model is essentially:

The *γ* term is simply a measure of the distance between the end of the first segment and the beginning of the next. The model converges when *γ *is minimized, thus this method **constrains the segments to be (nearly) continuous**. This is a *major* difference from the iterative approach in Method 1 above.

To use this method, you first fit a generic linear model. You then use the segmented( ) function to fit the piecewise regression. The segmented( ) function takes for its arguments the generic linear model, seg.Z which is a one sided formula describing the predictor with a segment (we only have one predictor, x, which has the segment), and psi, which is a starting value of the breakpoint (as in nls( ), you need to supply a best-guess estimate). More complicated models are a bit more complicated in terms of arguments, but this is a good starting example.

In our case, x is the predictor with a segment (it’s the only predictor) and based on my very first scatterplot (the first graph on the page), I’m going to guess the breakpoint is 14.

lin.mod <- lm(y~x) segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=14)

Using the summary(segmented.mod) command, you will get the estimated breakpoint +/- some error (12.07 for me), the intercept and slope for the first segment, and U1.x, which is the slope of the second segment.

**UPDATE:**

**U1.x is not the slope of the second segment. It is the difference in slopes between the first and second segment. So if your coefficients are x = -0.9699 and U1.x = 1.4163, then the slope of the second segment is -0.9699+1.4163 = 0.4464. Using the slope(segmented.mod) command will give you the slopes of each segment, which should match the above calculation.**

Plotting it is simple:

plot(x,y, pch=16, ylim=c(5,20)) plot(segmented.mod, add=T)

There you go: two methods for piecewise regression. Each is relatively simple to implement, but they do very different things. If you are dead set on piecewise regression, make sure you choose the one that makes the most sense for your question: do the segments need to be continuous or can they be discontinuous? Make sure you have a rationale for either.

Also note that these models are **not**** nested, **so you can’t used likelihood ratio tests for model selection. You can try to use AIC for model selection, but I advise you to think logically about the continuity of the segments instead.

Hi

This piecewise regression is also called sometimes “threshold regression”. In the time series context, threshold auto-regressive models (where y_{t-1} is the regressor) are available in the package tsDyn also.

Thanks for pointing this out! I’m very unfamiliar with the methods and packages available for time series and often forget about them, simply because I don’t work with that type of data regularly.

Hi Nathan, I am also currently fiddeling around with PW-regressions. Did you ever a lattice plot with multiple segmented regressions in a lattice layout? i.e. 2 rows 3 columns and each one with its one seg-regression? nice post!

I haven’t before, but I just played around with it based on my example. segmented.mod$psi contains the breakpoints, so you can make use of that. Here’s a fairly simple code to do what you’re requesting based on the above example, which is simplified with one predictor and one break (you can view this code as a continuation of the post). This also assumes that you’re using the model from the ‘segmented’ package as the preferred model.

This is about as complicated as I’ve gotten, but you can probably extend this up to multiple breaks fairly easily (although nothing is ever truly easy).

Hi, that´s very nice! I was thinking of another output though. I have 20 stations for which I want to have a PW-regression in each panel. Hence it would be something like xyplot(hour ~ ppm | station, data=data.frame,panel=function(x,y,…) { segmented regression here?}

But many thanks for your example! this brought me again a bit closer. I think I will give it a try on stackoverflow.com

cheers

Ahh sorry I misunderstood. Are you thinking of a separate piecewise regression, one for each station then? I’m not sure, but the code provided by Z below might do what you’re thinking more easily.

Hi Nathan,

Some time ago I wrote the following code for a grid search, am not 100% sure but I think this is what you refer to as “iteratively”. Code can be found here: http://eranraviv.com/blog/piecewise-regression/

Thanks for the post!

Yes, your code is very similar to my procedure. I played with it a bit and I’m not sure the two methods give equivalent results, but I haven’t really dug into it.

Note that for optimizing piecewise regressions without constraints for adjacent segments, the “strucchange” package can be used. It assumes that the observations are already ordered with respect to the variable along which you want to split (which may be different from the variable/s you regress on). In this case, the observations are already ordered with respect to x, hence I can directly do the following:

This considers all possible solutions with a minimum segement size of 4 observations, i.e., up to 3 breakpoints. The plot then compares the best residuals sum of squares and associated BIC that can be obtained for 0, 1, 2, 3 breaks. There’s a clear drop from 0 to 1 where BIC is minimized. The associated breakpoint is

Note that returns the last value of x in the first segment (not the first value of x in the second segment as your solution I did). The associated models are of course equivalent. Finally, you can extract intercept and slope in both segments or visualize the models. A lattice xyplot that one of the previous commenters suggested can also be easily created:

For more details on finding breakpoints with strucchange, see the associated manual page and also the corresponding 2003 paper published in Computational Statistics and Data Analysis – the second reference in citation(“strucchange”).

How about fitting a model like y = \beta_0 + \beta_1 * abs(x – \beta_x) + \beta_2 * x

say using nls? I’ve used this with some success in circumstances where I needed a

two-segment fit. The slopes of the two segments are \beta_2 + \beta_1 and \beta_2 – \beta_1 and the intersection is \beta_x. What’s nice is that you can parameterize it to constrain one or both slopes to have fixed values, as well.

See Equations (1) in

http://www.sbri.fr/files/publications/knoblauchvitalbarbur01.pdf

and also Equation 2 of

http://www.journalofvision.org/content/11/10/15.full.pdf

for an example of a fit for which the slopes are constrained.

I like this a lot. It’s very simple and intuitive. The methods above probably give reliable starting estimates for the non-linear regression as well. Thanks for the tip!

Hi, I have been using the segmented package to fit regressions to describe a stepwise change in heart rate in single individuals and it has worked brilliantly for this purpose. I want to compare levels of variation around each of the segments (two in our instance), however in the output I get both a coefficient and S.E for each line through the summary(model) and if I use the slope(model) I get a slope and S.E too. The output from slope(model) makes more sense in light of my data i.e. a negative value for a slope corresponds to a negative slope, whereas the same is not true for the summary(model).

However, I just wanted to clarify whether this approach is correct and that slope(model) produces the values I should be using to describe the gradient of slopes and the error about each slope?

Many thanks for any advice and also this very useful article.

Oli

I updated the post to address your issue because there was a mistake in it that you caught. Using the segmented package, the summary(model) gives you x and U1.x. x is the slope of the first segment, and U1.x is the difference between the original slope and the new slope. slope(model) gives the correct slopes, but I’m betting that if you add x and U1.x together, you’ll get the same values reported by slope(model). At least, that’s how it worked for me.

I’d use the output reported by slope(model).

Hi Nathan, thank you for the excellent article. I have two questions: (1) How would you change your R code to estimate a piecewise regression on time series data in an xts or ts object? and (2) Is it possible to estimate a piecewise multiple regression where only some (not all) of the independent variables have breaks?

Hi Geoffrey, I’ll try to answer your questions as best I can.

1) I’m not too familiar with this. My default starting point would be to compare linear models with and without autoregressive error functions using AIC. If the autoregressive error function doesn’t add much to the model, then I’d treat the time series as a regular linear model and use the standard technique. I’m not totally sure it’s appropriate, but that’s where I’d start. If you must use a time series analysis because of autocorrelation, I’d look into the ‘tsDyn’ package suggested by Matthieu (he mentions threshold autoregression, which may be what you’re looking for).

2) I believe the ‘segmented’ package can do this. You specify your regular linear model (lin.mod <- lm(y ~ x1*x2)) and then in the segmented command, you only specify whichever effects you are interested in (segmented.Z=~x1). I'd double check on that, but that's the impression I get from the help file. Check the Rnews article for more info: http://dssm.unipa.it/vmuggeo/segmentedRnews.pdf

Hi! I am a new R user and I am trying to use Method 1 above to find a breakpoint in my data. Is there a way to get more refinement in the breakpoint? That is, I would like more sig figs in the output. Is it always going to yield a whole number?

It will in my example because I was only searching whole numbers. If you set up a sequence of numbers with more significant digits, you can get more refinement. For example, I was searching only integers between 1 and 20 (or so). In R, this is 1:20. If I want to search between 1 and 20, but include three significant digits, I set up a range of numbers seq(1, 20, by=0.001). Be careful, because this is an incredibly huge amount of numbers, so you need to balance the range you’re searching (i.e. 1 and 20) with the precision (i.e. 0.001) to optimize the amount of time it takes. So in my example, I was searching integers 5:20, which I set up using breaks <- 5:20.

If I wanted to search between 10 and 16, by every 0.01, I reset breaks as breaks <- seq(10, 16, 0.01).

Dear Nathan,

There has been some animated discussion about whether there has been a “pause” in global temperature increases in the last 16 years

see : http://joannenova.com.au/2013/02/the-emerson-v-bolt-argument-on-air-does-emerson-not-know-statistics/#more-26952

Much of this discussion illustrates a misuse of statistical significance

How do you think this statistical question could be better put. Could you fit a linear model to the entire data set and compare with a piecewise regression with a breakpoint 16 years ago ?. A two regression line model obviously has more parameters than a single line – would AIC/BIC adequately test this ?

I hadn’t heard of this argument. Thanks for pointing it out. Clearly, warming trends are difficult to analyze and depend entirely on the dataset used, the location in question (some places are warming faster than others), the length of the time series used, etc. I’m not familiar enough with this argument or the data in question to weigh in on the argument. So my comment is purely statistical (which I think is what you’re getting at).

First, fitting a linear regression to time series data is inappropriate (which it looks like is what the blog author did). This is a time series, and you can clearly see autocorrelation present in the graph. So you need an autoregressive model, and you can use piecewise regressions with autoregressive functions.

Second, I think AIC would be a good way to test this. In fact, let’s say you use the absolute value function to fit the piecewise regression, where y = b0 + b1*x + b2*|x-c|. You can fit a second, OLS model: y = b0 + b1*x. The linear model is a subset, or nested within, the absolute value function (set b2 = 0). I believe (not 100% sure, but pretty sure) in this case, not only can you use AIC, but you can use a log-likelihood ratio test since one model is nested within the other. If you use the other methods described, you may have to use AIC, which can distinguish among non-nested models.

Hi

Note that in the case you have, the “transition” variable is not a simple variable, but it is time. Although this seems to be a small difference, it has some implications on the inference for the parameters, as well as for the testing procedure, so I do not think you can use directly these methods.

However, R has a very nice strucchange package, that will allow you to do everything you want to estimate and test the “structural break” you mention :-)

Thanks! I never deal with time series data, but I do know that when time is the predictor, things get tricky. I keep referring people to your comment when dealing with autoregressive piecewise regressions because strucchange and tsDyn seem to handle this issue well.

I guess whether you want to fit a piece-wise linear model will depend on whether you are willing to accept that the climate can change abruptly or do so gradually. I blogged about this topic in the context of fitting additive models to the NH temperature series, which you might find interesting or useful for comparing with any piece-wise linear fits you do.

Nathan,

Great post, I am interested in using segmented regression to model the thermoneutral zone and lower critical temperature of metabolism in endotherms. The thermoneutral zone theoretically has zero slope (metabolism doesn’t change with temperature in this zone). Is there a way to force the slope of one segment of the regression to be zero???

Thanks

andy

There is. This was covered in one of the previous comments actually

Pingback: Piecewise regression | Nguyen Hoai Tuong

Thanks a lot for this very clear paper.

I am also playing around with piecewise regressions and was wondering if with your ‘manual’ method there is a way to impose a zero-slope on one of the piecewise regressions.

Thank you,

Yes, there is. You can use dummy variables in a clever way to get this to happen. Imagine you’ve run the LS algorithm to find the breakpoint (the ‘manual’ method). The following code will force the line greater (or less than) the break point to be flat.

Note that this still allows discontinuous jumps. I tried playing with the segmented package to get a continuous line, but I haven’t come up with a way just yet. I’m sure there’s a way to incorporate this setup into the absolute value function as well, but I haven’t played with it much. This should get you started though.

Sure with the absolute value function, you can constrain the slopes of either segment. To refresh from above, the function to fit is

y = \beta_0 + \beta_1 * abs(x – \beta_x) + \beta_2 * x

You can calculate the values y at 0, \beta_x and 2 * \beta_x from which you can derive expressions for the slopes of the two segments. Then, fix one of the slopes to an arbitrary value, say 0, and solve for the values of the beta’s that you need to fix. Again, see

http://www.journalofvision.org/content/11/10/15.full.pdf

where we fixed the initial slope to -0.5 and the second to 0, for theoretical reasons.

Yep, I just sat down and did this. For example, if you want to fix the slope x > c to zero, here is the proof:

When x > c, the absolute value function b0 + b1|x-c| + b2x turns into b0 – b1c + (b1 + b2)x.

However, we want this slope to be zero, (b1 + b2) = 0.

Solve for b1

b1 = -b2.

Plug this into the original equation

b0 + b1|x-c| + -b1x

b0 + b1(|x – c| – x)

Run that last equation through nls() and there you have it. This gives the same result as Ken’s formula in the article:

b0 + b1(|x-c| -x + c)

Dear Nathan,

thank you for the codes, it is very helpful. Would there be another reference for the iterative method than ‘The R Book’ by Crawley ? I looked in that book and it doesn’t give any references either…

Not that I know of

Hi

Could you precise the page in Crawley you found this? It is somehow srange to coin this iterative search, as here evaluations are independent of each other.

This type of search comes by multiple names, grid search regarding the algorithm, and regarding the estimator, profile likelihood/SSR, or concentrated least square (as you search only the hyper-parameter, concentrating out/profiling the other slope parameters).

Standard reference for this (including refs to original papers) would be: Hansen (2000) Sample Splitting and Threshold Estimation, Econometrica, Vol. 68, No. 3. (May, 2000), pp. 575-603.

Mat

Sure. If I remember to later this week, I’ll post it.

Crawley (The R Book): pg 425 – 430