R for Ecologists: RLQ analysis (semi) explained

I’ve been reading about RLQ analysis, a method closely tied to the so-called fourth-corner problem, for analyzing relationships between environmental characteristics and species traits. I was interested because I thought I might be using RLQ analysis to answer a specific set of questions (I’m not). However, I was still curious about it, and I wanted to know how it works.

For those of you who haven’t heard of it, RLQ analysis is a method by which one can uncover how the environment filters certain species traits. For example, you can determine whether a particular environment selects for species with rapid growth rates, high reproductive output, or whatever trait you choose to measure. It accomplishes this by, more or less, linking a description of the environment to species traits through measurements of species abundances.

You start with three data tables. The R matrix is a site x environment table: rows are sites and columns are environmental descriptors. The L matrix is a site x species table, where rows are sites and columns are abundances of specific species. The Q matrix is a species x trait table, where rows are species and columns are biological traits of those species. What RLQ analysis does, simply, is make a new matrix that I’ll call V, which is an environment x trait matrix, and then perform a standard PCA-esque eigendecomposition on it. Although not technically correct, RLQ analysis can be thought of as nothing more than PCA on the matrix V. The vast majority of the work is actually in constructing V.
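To make the dimensions concrete, here is a minimal sketch using made-up tables. The sizes, the object names (Rtoy, Ltoy, Qtoy, Vtoy), and the random values are all assumptions for illustration, not the data used in this post, and the product is deliberately unweighted and unstandardized: it only shows how the three tables chain together into an environment x trait matrix.

```r
set.seed(1)
nSites <- 5; nSpecies <- 4; nEnv <- 3; nTraits <- 2

# R: sites x environmental descriptors (toy values)
Rtoy <- matrix(rnorm(nSites * nEnv), nSites, nEnv,
               dimnames = list(paste0('site', 1:nSites), paste0('env', 1:nEnv)))

# L: sites x species abundances (toy counts)
Ltoy <- matrix(rpois(nSites * nSpecies, lambda = 5), nSites, nSpecies,
               dimnames = list(rownames(Rtoy), paste0('sp', 1:nSpecies)))

# Q: species x traits (toy values)
Qtoy <- matrix(rnorm(nSpecies * nTraits), nSpecies, nTraits,
               dimnames = list(colnames(Ltoy), paste0('trait', 1:nTraits)))

# (env x sites) %*% (sites x species) %*% (species x traits) = env x traits
Vtoy <- t(Rtoy) %*% Ltoy %*% Qtoy
dim(Vtoy)
```

The sites dimension and the species dimension both cancel out of the product, which is why the species table acts as the link between the other two.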

The ‘ade4’ package in R can do RLQ analyses. First, you do a principal components analysis on both R and Q and a correspondence analysis on L. You then pass these analyses to the rlq() function. To figure out how RLQ works, I took apart the rlq() function, then several secondary functions called by rlq(). It turns out that the PCA and CA on the R, L, and Q matrices aren’t actually needed: you can do the whole thing by hand without those preliminary analyses.

So, without further ado, here’s how RLQ analysis works (NOTE: I’ve verified my results against those from the rlq() function to make sure they match):

library(ade4)

#### READ IN THE DATA AND CLEAN IT UP ####
traits <- read.csv('traits.csv')
env <- read.csv('environment.csv')
species <- read.csv('species.csv')

species <- species[,-c(1:2)]
env <- env[,-c(1:2)]
traits <- traits[,-1]
rownames(traits) <- colnames(species)
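Note that the last line above only renames rows; it assumes the rows of the trait table are already in the same order as the columns of the species table. If that is not guaranteed, a sketch like the following aligns the two by name first. The toy data frames and the 'name' column are hypothetical stand-ins, not the structure of the files read in above.

```r
# Toy site x species table (columns are species)
species <- data.frame(spA = c(3, 0), spB = c(1, 5), spC = c(0, 2))

# Toy trait table whose rows are NOT in the same order as the species columns
traits <- data.frame(name = c('spC', 'spA', 'spB'), height = c(30, 10, 20))

# Find, for each species column, the matching row of the trait table
idx <- match(colnames(species), traits$name)
stopifnot(!anyNA(idx))  # fail loudly if a species has no trait record

# Reorder the trait rows, then move the names into the row names
traits <- traits[idx, , drop = FALSE]
rownames(traits) <- traits$name
traits$name <- NULL

traits
```

After this, row i of the trait table describes the species in column i of the species table, which is what the matrix products later on rely on.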

RLQ analysis operates on the matrix R’ D_site X D_species Q, where X is the chi-distance table from a correspondence analysis of the species matrix, and D_site and D_species are diagonal matrices of the row and column weights from the species matrix. As shown below, this is the same as R’ L Q, where L is the centered probability matrix (built up in the code first as P, then converted to L).

First, we have a site by species matrix, N, of raw abundances

N <- species

Convert N to a relativized species matrix P, where p_ij = n_ij / n++, where n++ is the total number of  individuals (sum of the entire N matrix)

P <- N/sum(N)

Now divide each observation by its row weight (p_i+) and column weight (p_+j). The row weight is simply the sum of the observations in a row divided by the sum of the matrix (and similarly for columns). Once done, this gives p_ij = p_ij / (p_i+ p_+j).

row.w <- apply(P, MAR=1, function(x) sum(x)/sum(P))
col.w <- apply(P, MAR=2, function(x) sum(x)/sum(P))

P <- sweep(P, MAR=1, row.w, '/')
P <- sweep(P, MAR=2, col.w, '/')

Next, subtract 1 from each observation, giving p_ij = p_ij / (p_i+ p_+j) - 1, which equals (p_ij - p_i+ p_+j) / (p_i+ p_+j), where p_i+ and p_+j are the row and column weights.

P <- P-1

This IS the chi-distance matrix used in correspondence analysis. You can verify this by checking the table ($tab) returned by the dudi.coa() function. They are the same.
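Even without ade4, the algebra in the last few steps can be checked in base R. The sketch below uses a small made-up abundance table (an assumption, not the post's data) to confirm that the two ways of writing each entry agree, and that the weighted sum of squares of the table equals the Pearson chi-square statistic divided by the grand total, which is the total inertia that correspondence analysis decomposes.

```r
# Toy site x species abundance table (3 sites, 3 species)
N <- matrix(c(10, 2,  0,
               3, 8,  1,
               0, 4, 12), nrow = 3, byrow = TRUE)

P0 <- N / sum(N)           # relative abundances, p_ij
row.w <- rowSums(P0)       # p_i+
col.w <- colSums(P0)       # p_+j
expected <- outer(row.w, col.w)  # p_i+ * p_+j, the value expected under independence

X1 <- P0 / expected - 1            # form used in the code above
X2 <- (P0 - expected) / expected   # algebraically equivalent form
all.equal(X1, X2)                  # TRUE when the two forms agree

# Weighted sum of squares of the table vs. Pearson chi-square / grand total
inertia <- sum(expected * X1^2)
chisq <- sum((N - sum(N) * expected)^2 / (sum(N) * expected))
all.equal(inertia, chisq / sum(N))
```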

However, we only want the centered matrix, so we need to remove the weights in the denominator. Create diagonal matrices D_site and D_species of the row and column weights, respectively, then pre- and post-multiply the matrix P by them. This yields a matrix L, where l_ij = p_ij - p_i+ p_+j and the p_ij on the right-hand side is the original relative abundance.

D_site <- diag(row.w)
D_species <- diag(col.w)

L <- D_site %*% as.matrix(P) %*% D_species
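As a sanity check on this step, the following sketch (again on a made-up table, so the numbers are assumptions) confirms that multiplying the weights back in gives exactly the relative abundances minus their expectation under independence, and that every row and column of the result sums to zero.

```r
# Toy site x species abundance table
N <- matrix(c(10, 2,  0,
               3, 8,  1,
               0, 4, 12), nrow = 3, byrow = TRUE)

P0 <- N / sum(N)
row.w <- rowSums(P0)
col.w <- colSums(P0)

# Chi-distance table, as built in the steps above
X <- P0 / outer(row.w, col.w) - 1

# Pre- and post-multiply by the diagonal weight matrices
L <- diag(row.w) %*% X %*% diag(col.w)

# Same thing written directly: l_ij = p_ij - p_i+ * p_+j
Ldirect <- P0 - outer(row.w, col.w)
all.equal(L, Ldirect)

# Double centering: row and column sums are zero up to rounding error
range(rowSums(L))
range(colSums(L))
```

In practice this means L can be computed in one line from the relative abundances, skipping the division and re-multiplication entirely.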

Now make the R’LQ matrix. First, center and standardize the columns of R and Q. The center is taken as the WEIGHTED average, where the weights are the site (row) weights for the environment matrix and the species (column) weights for the trait matrix.

# Calculate the weighted average for each trait and each environmental variable
traitAvg <- apply(traits, MAR=2, function(x) sum( x*col.w )/sum(col.w) )
envAvg <- apply(env, MAR=2, function(x) sum(x*row.w)/sum(row.w))

traitCent <- sweep(traits, 2, traitAvg)
envCent <- sweep(env, 2, envAvg)

Calculate the weighted standard deviation. Since the values are now deviations from the weighted mean, the weighted variance is sum(x^2 * w) / sum(w), where w is the vector of weights, and the standard deviation is the square root of this.

traitSD <- apply(traitCent, MAR=2, function(x) sqrt(sum(x^2 * col.w)/sum(col.w)))
envSD <- apply(envCent, MAR=2, function(x) sqrt(sum(x^2 * row.w)/sum(row.w)))

traitScale <- sweep(traitCent, MAR=2, traitSD, '/')
envScale <- sweep(envCent, MAR=2, envSD, '/')

R <- as.matrix(envScale)
Q <- as.matrix(traitScale)
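The centering and scaling steps above can be wrapped in one small helper. This is a sketch only: wScale() is a hypothetical name, not an ade4 function, and the toy matrix and weights are made up. The checks at the end show the property the standardization is meant to deliver, namely a weighted mean of 0 and a weighted variance of 1 in every column.

```r
# Weighted standardization of the columns of x, with one weight per row
wScale <- function(x, w) {
  x <- as.matrix(x)
  w <- w / sum(w)                 # make sure the weights sum to 1
  ctr <- colSums(x * w)           # weighted column means (w recycles down the rows)
  xc <- sweep(x, 2, ctr)          # subtract the weighted means
  sdw <- sqrt(colSums(xc^2 * w))  # weighted standard deviations
  sweep(xc, 2, sdw, '/')
}

set.seed(3)
envToy <- matrix(rnorm(12, mean = 10, sd = 2), nrow = 4)  # 4 sites x 3 variables
w <- c(0.1, 0.2, 0.3, 0.4)                                # toy site weights

Zs <- wScale(envToy, w)
colSums(Zs * w)     # weighted means, zero up to rounding error
colSums(Zs^2 * w)   # weighted variances, one up to rounding error
```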

Next, V is just the product R’ L Q. Because R and Q are standardized and L holds the centered abundances, V is effectively a cross-correlation matrix between the environmental variables and the species traits, mediated by species abundances.

V <- t(R) %*% L %*% Q
round(V, 3)

This is identical to the matrix operated on by the rlq() command. You can check this by examining the table ($tab) returned by the rlq() function.
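One way to run that check end to end is sketched below on simulated data. Everything about the data is an assumption (sizes, distributions, the +1 that keeps every site and species non-empty), and wScale() is a hypothetical helper, not part of ade4. The comparison runs only if ade4 is installed, and uses dudi.pca() for both R and Q because all the toy variables are numeric.

```r
set.seed(42)
# Toy data: 10 sites, 6 species, 3 environmental variables, 2 traits
spe <- as.data.frame(matrix(rpois(60, 4) + 1, 10, 6))
env <- as.data.frame(matrix(rnorm(30), 10, 3))
tra <- as.data.frame(matrix(rnorm(12), 6, 2))

# By-hand version: centered probability matrix and weights
P0 <- as.matrix(spe) / sum(spe)
row.w <- rowSums(P0)
col.w <- colSums(P0)
L <- P0 - outer(row.w, col.w)

# Weighted standardization (weights already sum to 1 here)
wScale <- function(x, w) {
  x <- as.matrix(x)
  xc <- sweep(x, 2, colSums(x * w))
  sweep(xc, 2, sqrt(colSums(xc^2 * w)), '/')
}
V <- t(wScale(env, row.w)) %*% L %*% wScale(tra, col.w)

# ade4 version: CA on L, then PCAs on R and Q using the CA weights
if (requireNamespace('ade4', quietly = TRUE)) {
  coaL <- ade4::dudi.coa(spe, scannf = FALSE, nf = 2)
  pcaR <- ade4::dudi.pca(env, row.w = coaL$lw, scannf = FALSE, nf = 2)
  pcaQ <- ade4::dudi.pca(tra, row.w = coaL$cw, scannf = FALSE, nf = 2)
  res <- ade4::rlq(pcaR, coaL, pcaQ, scannf = FALSE, nf = 2)
  # Should report TRUE if the by-hand V matches the table rlq() works on
  print(all.equal(unname(V), unname(as.matrix(res$tab))))
}
```

Passing the correspondence analysis weights to the two PCAs matters: rlq() expects the row weights of R and Q to agree with the site and species weights from the analysis of L.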

Next, get the cross-product matrix V’V, because V itself is not guaranteed to be either square or symmetric and so cannot be eigendecomposed directly.

Z <- crossprod(V, V)

The rest is a standard PCA-like eigendecomposition of V, carried out on its cross-product matrix Z.

eig <- eigen(Z)
eigVals <- eig$values
eigVecs <- eig$vectors

sum(eigVals)

## THE SPECIES TRAIT LOADINGS ON EACH AXIS ARE THE EIGENVECTORS
traitLoad <- data.frame(eigVecs)
rownames(traitLoad) <- colnames(V)
colnames(traitLoad) <- paste('Axis', 1:length(eigVals))

## THE ENVIRONMENTAL TRAIT SCORES ARE CALCULATED EXACTLY AS IN PCA
envScores <- data.frame(V %*% eigVecs)
names(envScores) <- paste('Axis', 1:length(eigVals))
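The eigendecomposition of the cross-product matrix is equivalent to a singular value decomposition of V itself, which gives another way to check the numbers. The sketch below uses a random stand-in for V (an assumption, not the matrix computed in this post). Eigenvectors are only defined up to sign, so the scores are compared in absolute value.

```r
set.seed(7)
V <- matrix(rnorm(6), nrow = 3, ncol = 2)  # toy environment x trait matrix

# Route 1: eigendecomposition of the cross-product matrix
eig <- eigen(crossprod(V), symmetric = TRUE)
scores <- V %*% eig$vectors                # row (environment) scores

# Route 2: singular value decomposition of V
sv <- svd(V)

# Eigenvalues are the squared singular values
all.equal(eig$values, sv$d^2)

# Scores equal U %*% D, up to the sign of each axis
all.equal(abs(scores), abs(sv$u %*% diag(sv$d)))

# The sum of squared scores on each axis is that axis's eigenvalue
all.equal(colSums(scores^2), eig$values)

# Share of the total (co-)inertia carried by each axis
eig$values / sum(eig$values)
```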

This gives the biplot:

[Figure: RLQ biplot]


3 thoughts on “R for Ecologists: RLQ analysis (semi) explained”

  1. I sure wish I could see your raw data. I am an MSc student with no R coach, stuck on the simplest bits. Does;
    rownames(traits) <- colnames(species)
    …associate those tables? I have not been able to associate the dimids of the 3 tables of "fourth-corner" type data as the aravo data in ade4 is, so I am sure feeding the results of each PCA and CA into rlq() will not yield correct results. None of the 'multitable' library functions for creating a list of associated data.frames have worked, so I wanted to try it your way.

    • Not really. The only thing that rownames(traits) <- colnames(species) does is name the rows of the trait matrix after the species. It assumes that they're in the same order as the columns of the site x species matrix.

      If you want to send me an email, I'll send the raw data along to you
