What’s the big deal about a 2˚ C increase in temperature?

I’ve spent the last few posts trying to determine the causes of skepticism about anthropogenic climate change, be it the science version of fire-and-brimstone pulpit speeches or hypocrisy from those who encourage lifestyle changes to deal with climate change. This post takes on a different issue: the “So what?” response.

A number of climate models predict that average global temperatures will rise anywhere from 2˚ – 7˚ C over the next half- to full century, although this varies among regions; the Arctic, for example, is warming way faster than the rest of the earth. Actually, it looks like we’re going to overshoot 2˚ C considerably, suggesting that 2˚ might not even be a realistic baseline for experimental studies (but that’s another issue).

Some people hear you say ‘Temperatures are going up by 2 degrees!’ and respond, ‘So what? It’s only two degrees’. How do you answer that?

First, I’m not sure how big of an issue this next point is, but Americans are used to thinking in degrees Fahrenheit, not Celsius. A 2˚ C increase is a 3.6˚ F increase. However, 2˚ is conservative. The more likely outcome, a 7˚ C increase, is a 12.6˚ F increase. So tell people to imagine a summer where instead of 90, it’s over 100. A spring that’s 85 instead of 75. Essentially, imagine this summer’s heatwave on a more permanent basis. Again, I’m not sure how big a deal this really is, but I think part of the message might be getting lost in the conversion.
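If the unit conversion is the sticking point, the arithmetic is easy enough to check yourself. Here’s a quick sketch in R (the function name is made up purely for illustration); remember that temperature increases convert with only the 9/5 factor, no +32:

## Convert a temperature increase from deg C to deg F
c.increase.to.f <- function(dC) dC * 9/5
c.increase.to.f(c(2, 7))  ## 3.6 and 12.6 deg F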

Second, let me hit you with some knowledge. All ectothermic species (i.e. almost everything besides birds and mammals) have what are called ‘thermal response’ curves describing how well they perform at various temperatures (side note: endotherms have these too, but homeothermy makes them less interesting, at least to me 🙂 ). More often than not, these curves are asymmetrical: species have a thermal optimum, beyond which things come crashing down rapidly. Super rapidly. Ridiculously rapidly. Etc. In the graph below, ‘Fitness’ refers to an organism’s capacity to reproduce, commonly measured as egg production, size, growth rate, or any number of proxy variables that indicate how successful an individual is at reproducing.

This would be the coolest roller-coaster ever…
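If you want to play with the shape yourself, here’s a minimal sketch of an asymmetric thermal response curve in R. The functional form and parameter values are made up purely for illustration (they aren’t from any real species, and certainly not from my urchin data):

## A skewed thermal response curve: gradual rise, sharp crash past the optimum
temp <- seq(10, 35, by=0.1)
fitness <- exp(0.15*temp) * (1 - ((temp - 10)/(31 - 10))^8)  ## made-up parameters
fitness[fitness < 0] <- 0  ## fitness can't go negative
plot(temp, fitness, type='l', lwd=2, xlab='Temperature (deg C)', ylab='Relative fitness')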

This shape has been documented for a huge number of organisms, including my little friend, the sea urchin Lytechinus variegatus.

He was hopped up enough to make some bad decisions…

In fact, I have a paper in press showing that the cause of such fitness reductions is a severe mismatch between metabolic demand and consumption at high temperatures. Metabolism keeps going up while consumption rates crash just beyond the urchin’s thermal optimum.

From Lemoine and Burkepile (in press) Temperature-induced mismatches between consumption and metabolism reduce consumer fitness. Ecology. http://dx.doi.org/10.1890/12-0375.1

With less energy available (because they’re eating less) and more energy needed to fuel metabolism, fitness-related traits (growth, reproduction, etc.) crash as well. This happens for a huge number of species, and it all happens within the space of 2 – 3˚ C.

Consider that most tropical species are already living near their thermal optimum (say, 29˚ C as in Fig. B). A 2˚ increase pushes the species past its thermal optimum into the crash zone (31˚ in Fig. B). The ramifications for population growth rates under such thermal stress are enormous. Organisms reproduce less, population growth rates decline, and extinction risk increases (because slow-growing populations are more likely to go extinct in variable environments).

So when someone asks, ‘What’s the big deal about 2˚?’, just draw that graph for them. It probably won’t change their minds, but at least you’d have an answer.

P.S. I realize this is the same kind of fear-inducing tactic I’ve recently suggested avoiding, but sometimes it can’t be helped. I like urchins too much.

R for Ecologists: Putting Together a Piecewise Regression

Piecewise regression comes about when you have ‘breakpoints’, where there are clearly two different linear relationships in the data with a sudden, sharp change in direction. This crops up occasionally in ecology when dealing with, for example, species richness of understory plants and forest age. There is initially a rapid drop as succession progresses (rapidly colonizing plants die as light becomes more limiting), then a sudden breakpoint and a more moderately paced rise in species richness as more shade-tolerant plants and seedlings of older trees begin to colonize the mature forest.

If you Google ‘R piecewise regression’, you may get a variety of methods and advice on how to run a piecewise regression. Essentially, you can do it manually or use a canned package to run the regression. If you’re like me, you might be wondering: Are these methods the same? If not, which is best? How do I actually use them? Well, hopefully I’ll answer some of these questions.

There are essentially two methods I’ll walk through: brute force iterative approaches (as in ‘The R Book’ by Crawley) and the ‘segmented’ package. Using these approaches allows you to statistically estimate the breakpoint, which is better than just eyeballing it and fitting two models around what you think is the breakpoint. Let statistics do the work for you objectively.

First, let’s generate some segmented data:

x <- c(1:10, 13:22)
y <- numeric(20)
## Create first segment
y[1:10] <- 20:11 + rnorm(10, 0, 1.5)
## Create second segment
y[11:20] <- seq(11, 15, len=10) + rnorm(10, 0, 1.5)
## Plot it
par(mar=c(4,4,1,1)+0.2)
plot(x,y, ylim=c(5, 20), pch=16)

You should get a V-shaped plot, where the right side is a little shallower sloped.

Like this

METHOD 1: ITERATIVE SEARCHING

First, we’ll model the data using the iterative search procedure described by Crawley. We’re going to use a model that looks like this:

y ~ x*I(x < c) + x*I(x > c)

Note that the multiplication symbol is used in the R model-definition sense, not the mathematical sense, meaning main effects and interactions for both terms.

In this case, c is the breakpoint. I(x < c) and I(x > c) are essentially dummy variables. I(x < c) is 1 if x is less than the breakpoint and 0 if it is above. I(x > c) is a dummy variable that is 1 if x is above the breakpoint and 0 if it is below. It should be obvious that there are two sets of parameters being modeled, depending on the value of x.
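To see what those dummy variables actually look like, you can tabulate them for the simulated x values (I’m using 15 as a stand-in breakpoint here just for illustration; we haven’t estimated it yet):

## lm( ) treats the logical comparisons as 0/1 dummy variables
head(data.frame(x, below = as.numeric(x < 15), above = as.numeric(x > 15)))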

The complicated bit is choosing the breakpoint. We can eyeball the data and say that the breakpoint is somewhere between 9 and 17. Choose a wider range than you might think, just to be safe. Create a variable called breaks to hold these breakpoints:

breaks <- x[which(x >= 9 & x <= 17)]

Now we’re going to iteratively search these breakpoints for the model that has the lowest residual MSE, using that as our criterion for the best model. Create an empty container for MSE values from each model, and use a for( ) loop to run a linear regression for each possible breakpoint. Formulate the linear model exactly like the formula above.

mse <- numeric(length(breaks))
for(i in 1:length(breaks)){
 piecewise1 <- lm(y ~ x*(x < breaks[i]) + x*(x >= breaks[i]))
 mse[i] <- summary(piecewise1)$sigma^2  ## residual mean square (sigma is the residual standard error)
}

If we plot MSE by breakpoints, we can visually estimate the breakpoint as the lowest point on the curve:
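If you want to draw that curve yourself, something like this will do it:

## Residual MSE for each candidate breakpoint
plot(breaks, mse, type='b', pch=16, xlab='Breakpoint', ylab='Residual MSE')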

It could easily be either 13 or 15, so I just pick the breakpoint with the lowest error:

breaks[which(mse==min(mse))]

This tells me that the breakpoint in my simulation was 15. I run my final model with my newly estimated breakpoint:

piecewise2 <- lm(y ~ x*(x < 15) + x*(x > 15))
summary(piecewise2)

The summary looks like this:

This is a pretty nasty output. Read it like this: (Intercept) is more or less useless on its own. The intercept for the line when x < 15 is (Intercept) + x < 15TRUE, or 3.3133 + 16.6352 = 19.9485. The slope of the line when x < 15 is x + x:x<15TRUE, 0.5843 – 1.3025 = -0.7182. So, when x is less than 15, the formula is 19.9485 – 0.7182x.

When x is greater than 15, the intercept is 3.3133 – 0.9116 = 2.4017 and the slope is just the variable x, 0.5843. So when x is greater than 15, the line has the equation 2.4017 + 0.5843x. Notice this table excludes the coefficients with NA estimates. Those arise from singularities and can be ignored.

plot(x,y, ylim=c(5, 20), pch=16)
curve((3.3133 + 16.6352) + (0.5843-1.3025)*x, add=T, from=1, to=15)
curve((3.3133 - 0.9116) + 0.5843*x, add=T, from=15, to=max(x))
abline(v=15, lty=3)

Notice that the segments were not constrained to be touching or continuous. This is inherent in the algorithm that we used.

METHOD 2: USE ‘SEGMENTED’ PACKAGE

To use the ‘segmented’ package you will, of course, need to install and load it. The procedure of ‘segmented’ uses maximum likelihood to fit a somewhat different parameterization of the model, which looks more or less like this (this isn’t exactly it, but it’s close enough):

y = β0 + β1*x + β2*(x – c)*I(x > c) + γ*I(x > c)

I(x > c) is a dummy variable as above, so when x < c, the model is essentially:

y = β0 + β1*x

The γ term is simply a measure of the distance (the gap) between the end of the first segment and the beginning of the next. The model converges when γ is minimized, so this method constrains the segments to be (nearly) continuous. This is a major difference from the iterative approach in Method 1 above.
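If the parameterization seems abstract, here’s a minimal by-hand version of a continuous piecewise fit with the breakpoint fixed at 15, using pmax( ) for the (x – c) term. This is not what segmented( ) does internally (it also estimates the breakpoint), but it fits the same kind of continuous broken line:

## Continuous piecewise regression with a known breakpoint at x = 15
hand.fit <- lm(y ~ x + pmax(x - 15, 0))
summary(hand.fit)  ## coefficients: intercept, first-segment slope, change in slope after 15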

To use this method, you first fit a generic linear model. You then use the segmented( ) function to fit the piecewise regression. The segmented( ) function takes as its arguments the generic linear model; seg.Z, which is a one-sided formula naming the predictor with a segment (we only have one predictor, x, which has the segment); and psi, which is a starting value for the breakpoint (as in nls( ), you need to supply a best-guess estimate). More complicated models need a few more arguments, but this is a good starting example.

In our case, x is the predictor with a segment (it’s the only predictor) and based on my very first scatterplot (the first graph on the page), I’m going to guess the breakpoint is 14.

library(segmented)  ## install.packages('segmented') if you don't already have it
lin.mod <- lm(y ~ x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=14)

Using the summary(segmented.mod) command, you will get the estimated breakpoint +/- some error (12.07 for me), the intercept and slope for the first segment, and U1.x, which is the slope of the second segment.

UPDATE:

U1.x is not the slope of the second segment. It is the difference in slopes between the first and second segment. So if your coefficients are x = -0.9699 and U1.x = 1.4163, then the slope of the second segment is -0.9699+1.4163 = 0.4464. Using the slope(segmented.mod) command will give you the slopes of each segment, which should match the above calculation.
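In code, that check looks something like this (your coefficient values will differ from mine):

summary(segmented.mod)  ## breakpoint estimate, x, and U1.x (the difference in slopes)
slope(segmented.mod)    ## slopes of each segment, computed for you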

Plotting it is simple:

plot(x,y, pch=16, ylim=c(5,20))
plot(segmented.mod, add=T)

There you go: two methods for piecewise regression. Each is relatively simple to implement, but they do very different things. If you are dead set on piecewise regression, make sure you choose the one that makes the most sense for your question: do the segments need to be continuous or can they be discontinuous? Make sure you have a rationale for either.

Also note that these models are not nested, so you can’t use likelihood ratio tests for model selection. You can try to use AIC for model selection, but I advise you to think logically about the continuity of the segments instead.
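If you do want to peek at AIC anyway (with the caveat above in mind), both fits are ordinary model objects, so something like this should work:

## Lower AIC = better fit, but think about continuity first
AIC(piecewise2)      ## discontinuous, iterative fit
AIC(segmented.mod)   ## continuous, segmented fit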

One big reason climate skeptics don’t believe in climate change

If you follow a credible news source at all (i.e. not Fox, CNN, or MSNBC), you’ve undoubtedly been seeing all sorts of stories: “July hottest month on record, ever!“, “Massive drought grips Midwest“, “Heat records shattered across America“, “327th consecutive month with above average temperatures” (now at least 330), etc. Yet despite these facts (indeed, temperature measurements are facts and not guesses), climate skepticism is growing in the United States! Why, in the face of all these shattered temperature records, droughts, heat waves, and snowless winters, do people still argue against climate change?

I’ve pointed out before that the delivery of the climate change message has been less than perfect; fear is not the way forward. Presenting the facts in a neutral way might possibly work, but doom and gloom scenarios will not work (even if they are true).

So what is the one big reason many skeptics still deny climate change? One word: Hypocrisy. This has been bugging me for a while. My knowledge is restricted somewhat to my experience during my Master’s and Ph.D. programs, but ecology graduate students really do an abysmal job of taking their own advice. For example, ecologists and climate experts say that one of the most effective actions you can take to mitigate climate change is to stop driving. Carpool, bike, walk, skateboard, whatever, just stop using your car so much. That’s a pretty big lifestyle change to ask people to undertake, especially when the people giving the advice are terrible at following it themselves, even though they should know better.

I bike or walk to work every day. Yes, I drive to the store (sometimes) or to my friend’s house, but that’s about it. I generally use my car way less than most people, and that’s even counting that my friends always make me DD (I’m chalking that up to carpooling as well). Here’s a map of a few people I worked with in my Master’s program in relation to the lab (pins are not actual addresses, but close).

I really wanted to slap the two people who drove 0.17 miles to work. For your information, that was driving from the dorms across the parking lot to the lab.

At this place, most students lived well within 4 miles of work. That’s a 20 minute ride at most at a very comfortable pace. Hell, it was an island, it wasn’t exactly a long trip anywhere.

This one is worse. Here’s a map of approximate grad student housing in relation to campus at my PhD program.

Sigh….

Not only do they live incredibly close, but 1) it’s quicker to bike due to congested streets and the park being closed to automobiles, and 2) the cluster of people who live near one another never even carpooled; they drove separately.

Here’s another map of a place I have worked. In this situation, the lab was split into two segments roughly 3/4 of a mile apart. Moreover, the lab has about 30-40 free bikes for transportation around campus. Guess who would sign out a truck just to drive back and forth between labs?

If you answered ‘Marine biologists doing climate change research’, you win!

If I sound angry, it’s because I am. I’m probably going to have a few people angry at me for this (as if anyone reads this anyway). People who should know better and constantly complain about “ignorant Americans driving their stupid cars” can’t be bothered to ride a bike themselves. In that case, why should anyone believe us? Skeptics probably say, “Well, if climate change is really an issue, why are these scientists still putzing a mile down the road in a car?”

“You must be the change you wish to see in the world”, right? Enough ‘Do as I say, not as I’ve done’. How about, “Do as I do, because we all need to work this out together”.

Also, if you haven’t picked up on it yet, 24 hour cable news networks are awful, awful sources of real information. Also, I was careful not to say names or places in the maps above. So if you do suck at not driving, your secret is safe until someone else says it.

R for Ecologists: Simulating Species-Area Curves (linear vs. nonlinear regression)

This post is about basic model simulation so we can get a feel for how curves are supposed to look given certain processes assumed by the model. One of the most prevalent patterns in ecology is the species-area (SAR) curve, which plots the number of species (species richness) against the area sampled. Unfortunately, SAR curves can arise from a number of competing mechanisms (niche partitioning, neutral processes, pure randomness). The curve alone cannot distinguish between these processes, but we can use models to determine how curves will look given underlying, known processes.

I will simulate a SAR curve using pure randomness as the underlying process to show what this curve may look like. Let’s assume we have 20 one-meter-square plots. Ignoring the spatial distribution of the plots, we can choose plots randomly to sample areas of different sizes. For example, I can randomly choose one plot to sample a 1 sq m area, or I can randomly choose two plots to sample a 2 sq m area, and so on all the way up to 20 sq m. So the first assumption is a lack of spatial patterns (i.e. no neutral/dispersion processes); we aren’t taking into account any spatial arrangement of plots. Also, we will assume a purely random distribution of species, wherein species occur in plots according to their overall abundance (i.e. if species ‘A’ has an overall relative abundance of 0.8 across all plots, then it is 80% likely to occur in any given plot). This assumption means no niche partitioning and no biotic/abiotic interactions.

OK, so here’s how to simulate this in R. As stated above, we have 20 plots. We’ll assume for simplicity that we sample five individuals within each plot (equal sampling effort across plots). So set up an empty 5 x 20 array:

plots <- array(dim=c(5,20))

Now let’s assume that we have an overall species richness of 20 (we’ll just call them species a, b, c, etc…)

species <- c(letters[1:20])

We need to assign a probability of occurrence to each species, making sure the probabilities sum to one. So we simulate 20 probabilities:

probs <- numeric(20)
probs[1:8] <- runif(8, 0, 0.1)
for(i in 9:20){
 probs[i] <- runif(1, 0, 1 - sum(probs[1:(i-1)]))
}

The first line sets up an empty container vector. The second line randomly assigns the first 8 species a probability between 0 and 0.1 (this is to prevent any species from having a probability of 0.7 or higher and dominating all the plots, because that would be boring). The for( ) loop makes sure each remaining species has a probability between 0 and one minus the sum of all preceding probabilities (so the total cannot exceed one). You can check the constraint:

sum(probs)

which should be close to 1 (it won’t be exact, but sample( ) rescales the probabilities anyway).

Now for each plot (column) in our plot array, we want to sample five individuals randomly using the probabilities we just created (because the probability of picking any one species is equal to its overall probability).

for(i in 1:20){
 plots[,i] <- sample(species, size=5, replace=T, prob=probs)
}

We sample with replacement to account for the possibility that species can occur multiple times within a plot. Essentially, this says the probability of an individual in a plot belonging to a given species is equal to the relative abundance of each species.

Now we work on generating our random SAR curve. Let’s assume that we will randomly sample plots 20 times, and that we will increase the number of plots sampled (i.e. we will sample 1 plot 20 times, then 2 plots 20 times, then 3 plots 20 times, etc..). For each sampling event, we will calculate species richness.

We set up an empty 20 x 20 container (20 sampling events for 20 different plot numbers):

SAR.mat <- array(dim=c(20,20))

We use a nested for( ) loop to simulate the sampling:

for(j in 1:20){
 for(i in 1:20){
  plot.index <- sample(1:20, j, replace=F)
  SAR.plot <- c(plots[, plot.index])
  SAR.mat[i,j] <- length(unique(SAR.plot))
 }
}

The outer loop tells the program to sample j plots, going from 1 to 20. This is our ‘plot area’ going from 1 sq m to 20 sq m. The inner loop is the sampling event, going from 1 to 20. plot.index is the index of sampled plots (i.e. randomly pull j plots from the 20 that we have). The length(unique( )) command just calculates the number of unique species, i.e. species richness.

Now we have a 20 x 20 array containing 20 sampling events (rows) for each possible area (columns). Set up a vector relating the columns to areas, calculate the mean species richness of each column (area), calculate the 95% confidence interval, and then plot:

areas <- 1:20
means <- apply(SAR.mat, MARGIN=2, mean)
lower.95 <- apply(SAR.mat, MARGIN=2, function(x) quantile(x, 0.025))
upper.95 <- apply(SAR.mat, MARGIN=2, function(x) quantile(x, 0.975))

par(mar=c(4,4,1,1)+0.2)
plot(areas, means, type='n', xlab=expression('Area '*(m^2)),
 ylab='Species Richness', cex.lab=1.2,
 ylim=c(0,12))
polygon(x=c(areas, rev(areas)),
 y=c(lower.95, rev(upper.95)),
 col='grey90')
lines(areas, means, lwd=2)

Should look nice. If it doesn’t, rerun the above code in its entirety and you’ll get a different plot.
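One side note: the simulation is stochastic, so every run gives a different curve. If you want a reproducible plot, you can set a seed before generating probs and plots (the seed value here is arbitrary):

set.seed(123)  ## any number will do; put this before the simulation code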

Now we want to fit a model to this. The common SAR model is the power law

S = c*A^z

We can log-transform each side to get the following model to fit with a linear regression:

log(S) = log(c) + z*log(A)

We can fit the model and then plot the curve:


SAR.mod <- lm(log(means) ~ log(areas))
summary(SAR.mod)

curve(exp(coef(SAR.mod)[1])*x^coef(SAR.mod)[2], add=T, from=0, to=20, col='red', lwd=2)

Unfortunately, the log-transformed model actually fits a model assuming multiplicative (log-normal) errors:

S = c*A^z * e^ε

The model we really want to fit is one with additive errors:

S = c*A^z + ε

See Xiao et al. (2011) in Ecology for a good description of the differences between the two models above and how they have been misused in allometric studies. We can fit the second model using non-linear regression, using the log-linear model parameters as reasonable starting estimates. We then plot the nls( ) curve and tack on a legend.


SAR.nls <- nls(means ~ a*areas^b,
 start=list('a'=exp(coef(SAR.mod)[1]),
 'b'=coef(SAR.mod)[2]))

curve(coef(SAR.nls)[1]*x^coef(SAR.nls)[2], add=T, from=0, to=20, col='blue', lwd=2)

legend('topleft', lty=1, col=c('black', 'red', 'blue'),
 legend=c('Mean Species Richness', 'Linear Model Fit', 'Nonlinear Model Fit'),
 cex=0.8,
 bty='n')

Notice the collapse of the 95% confidence intervals at large areas because there are fewer and fewer possible combinations of plots (i.e. at 20 plots, there is only one possible combination).
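You can see why the interval pinches by counting the distinct plot combinations available at a few different areas:

choose(20, c(10, 19, 20))  ## 184756, 20, and 1 possible combinations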

And there you have it. In my version, the log-linear model and the nls( ) model both fit pretty well, but the nls( ) model fit ever so slightly better than the log-linear model (although the difference between the two is probably trivial). Also, the scaling exponent z from the nls( ) model was estimated as ~0.44, which is within the range of values predicted by island biogeography. As always, email me or comment if I screwed something up or my code could be improved.
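If you want a rough numerical comparison of the two fits on the original (arithmetic) scale, you can compute something like a root mean squared error for each. Note that back-transforming the log-linear predictions this way ignores retransformation bias, so treat it as a quick check rather than formal model selection:

## RMSE of each fit on the arithmetic scale
pred.lm <- exp(predict(SAR.mod))   ## back-transformed log-linear predictions
pred.nls <- predict(SAR.nls)
sqrt(mean((means - pred.lm)^2))    ## log-linear fit
sqrt(mean((means - pred.nls)^2))   ## nonlinear fit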