I’ve been reading about **RLQ** analysis, a close relative of the fourth-corner method, for analyzing relationships between environmental characteristics and species traits. I was interested because I thought I might use **RLQ** analysis to answer a specific set of questions (I won’t). However, I was still curious about it, and I wanted to know how it works.

For those of you who haven’t heard of it, **RLQ** analysis is a method for uncovering how the environment filters certain species traits. For example, you can determine whether a particular environment selects for species with rapid growth rates, high reproductive output, or whatever trait you choose to measure. It accomplishes this by, more or less, linking a description of the environment to species traits through measurements of species abundances.

You start with three data tables. The **R** matrix is a site *x* environment table: rows are sites and columns are environmental descriptors. The **L** matrix is a site *x* species table, where rows are sites and columns are abundances of specific species. The **Q** matrix is a species *x* trait table, where rows are species and columns are biological traits of those species. What **RLQ** analysis does, simply, is make a new matrix that I’ll call **V**, which is an environment *x* traits matrix, and you then perform your standard PCA-esque eigendecomposition on that. Although not strictly correct, **RLQ** analysis can be thought of as nothing more than PCA on the matrix **V**. The vast majority of the work is actually in constructing **V**.

The ‘ade4’ package in R can do **RLQ** analyses. First, you do a principal components analysis on both **R** and **Q** and a correspondence analysis on **L**. You then pass these analyses to the rlq() function. To figure out how **RLQ** works, I took apart the rlq() function, then several secondary functions called by rlq(). It turns out that the PCA and CA on the **R**, **L**, and **Q** matrices aren’t actually needed: you can do the whole thing by hand without those preliminary analyses.
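For reference, the ade4 workflow described above can be sketched roughly as follows. The data are made-up stand-ins (not my real tables), and the object names (coaL, pcaR, pcaQ, res) are just my choices for illustration. The sketch assumes purely numeric environment and trait tables, so that dudi.pca() is appropriate for both.

```r
library(ade4)

# Hypothetical toy data: 6 sites, 5 species, 2 environmental variables, 3 traits
set.seed(42)
species <- as.data.frame(matrix(rpois(30, 4) + 1, nrow = 6,
  dimnames = list(paste0("site", 1:6), paste0("sp", 1:5))))
env <- data.frame(elev = rnorm(6), moisture = rnorm(6),
                  row.names = rownames(species))
traits <- data.frame(height = rnorm(5), sla = rnorm(5), seed = rnorm(5),
                     row.names = colnames(species))

# Correspondence analysis on L; it supplies site (lw) and species (cw) weights
coaL <- dudi.coa(species, scannf = FALSE, nf = 2)

# PCAs on R and Q, weighted by the CA row and column weights respectively
pcaR <- dudi.pca(env, row.w = coaL$lw, scannf = FALSE, nf = 2)
pcaQ <- dudi.pca(traits, row.w = coaL$cw, scannf = FALSE, nf = 2)

# Pass the three analyses to rlq()
res <- rlq(pcaR, coaL, pcaQ, scannf = FALSE, nf = 2)

# The table that rlq() decomposes: environmental variables x traits
dim(res$tab)
```

The key detail is that both PCAs reuse the weights from the correspondence analysis, which is the same weighting used in the by-hand version.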

So, without further ado, here’s how RLQ analysis works (**NOTE:** I’ve verified my results against those from the rlq() function to make sure they match):

library(ade4)

#### READ IN THE DATA AND CLEAN IT UP ####
traits <- read.csv('traits.csv')
env <- read.csv('environment.csv')
species <- read.csv('species.csv')

species <- species[,-c(1:2)]
env <- env[,-c(1:2)]
traits <- traits[,-1]
rownames(traits) <- colnames(species)
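Since the CSV files aren’t included in the post, here is a minimal sketch of stand-in data with the same layout, plus the shape checks the three tables need to pass. Every value and name here (site1, sp1, elev, height, and so on) is invented for illustration.

```r
# Hypothetical stand-in data, not the real tables
set.seed(1)
n_sites <- 6
n_species <- 5

# L: sites x species abundances (+1 avoids empty rows or columns)
species <- as.data.frame(matrix(rpois(n_sites * n_species, 4) + 1,
  nrow = n_sites,
  dimnames = list(paste0("site", 1:n_sites), paste0("sp", 1:n_species))))

# R: sites x environmental variables
env <- data.frame(elev = rnorm(n_sites), moisture = rnorm(n_sites),
                  row.names = rownames(species))

# Q: species x traits
traits <- data.frame(height = rnorm(n_species), sla = rnorm(n_species),
                     seed_mass = rnorm(n_species),
                     row.names = colnames(species))

# R and L must share sites; L and Q must share species, in the same order
stopifnot(nrow(env) == nrow(species),
          ncol(species) == nrow(traits),
          identical(rownames(traits), colnames(species)))
```

If any of these checks fail on real data, the matrix products further down will either error out or, worse, silently pair the wrong species with the wrong traits.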

RLQ analysis operates on the matrix **RLQ**, which is calculated as **R’ D_site L D_species Q**, where **L** here is the chi-distance table computed from the species matrix, and **D_site** and **D_species** are diagonal matrices of its row and column weights. As shown below, this is the same as **R’ P Q**, where **P** is the centered probability matrix.

First, we have a site by species matrix, **N**, of raw abundances

N <- species

Convert **N** to a relativized species matrix **P**, where p_ij = n_ij / n++, where n++ is the total number of individuals (sum of the entire **N** matrix)

P <- N/sum(N)

Now divide each observation by its row weight (p_i+) and column weight (p_+j). A row weight is simply the sum of the observations in that row divided by the sum of the matrix (and similarly for columns). Once done, each entry becomes p_ij/(p_i+ p_+j).

row.w <- apply(P, MARGIN=1, function(x) sum(x)/sum(P))
col.w <- apply(P, MARGIN=2, function(x) sum(x)/sum(P))
P <- sweep(P, MARGIN=1, row.w, '/')
P <- sweep(P, MARGIN=2, col.w, '/')

Next, subtract 1 from each observation, giving p_ij/(p_i+ p_+j) – 1, which equals (p_ij – p_i+ p_+j)/(p_i+ p_+j).

P <- P-1

This IS the chi-distance matrix used in correspondence analysis. You can verify this by comparing it with the table ($tab) returned by the dudi.coa function. They are the same.

However, we only want the centered matrix, so we need to remove the weights in the denominator. Create diagonal matrices **D_site** and **D_species** of the row and column weights, respectively. Then pre- and post-multiply the matrix **P** by them. This yields a matrix **L**, where l_ij = p_ij – p_i+ p_+j.

D_site <- diag(row.w)
D_species <- diag(col.w)
L <- D_site %*% as.matrix(P) %*% D_species
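As a sanity check on the algebra, here is a small sketch showing that the long route (divide by the weights, subtract 1, multiply the weights back in) gives the same matrix as simply subtracting the expected proportions from the observed ones. The abundance matrix is made up, and the names P0, chi, L1, and L2 are only for this illustration.

```r
set.seed(1)
N <- matrix(rpois(20, 4) + 1, nrow = 4)  # hypothetical 4 sites x 5 species

P0 <- N / sum(N)       # relative abundances p_ij
row.w <- rowSums(P0)   # p_i+
col.w <- colSums(P0)   # p_+j

# Route 1: chi-distance table, then multiply the weights back in
chi <- sweep(sweep(P0, 1, row.w, "/"), 2, col.w, "/") - 1
L1 <- diag(row.w) %*% chi %*% diag(col.w)

# Route 2: observed minus expected proportions, p_ij - p_i+ p_+j
L2 <- P0 - outer(row.w, col.w)

# The two routes should agree up to floating-point error
same <- isTRUE(all.equal(L1, L2, check.attributes = FALSE))
same

# Rows and columns of the centered matrix sum to (numerically) zero
range(rowSums(L2))
range(colSums(L2))
```

So if you only need **L**, the second route is a one-liner. The longer route is worth keeping if you also want to compare the intermediate table against dudi.coa.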

Now make the **R’LQ** matrix. First, center and standardize the columns of **R** and **Q**. The center is taken as the WEIGHTED average, where the weights are the row (site) weights for the environment matrix and the column (species) weights for the trait matrix.

# Calculate the weighted average for each trait and each environmental variable
traitAvg <- apply(traits, MARGIN=2, function(x) sum(x*col.w)/sum(col.w))
envAvg <- apply(env, MARGIN=2, function(x) sum(x*row.w)/sum(row.w))
# Center each column on its weighted average
traitCent <- sweep(traits, 2, traitAvg)
envCent <- sweep(env, 2, envAvg)

Calculate the weighted standard deviation. Since the values are now deviations from the weighted mean, the weighted variance is sum(w * x^2) / sum(w), and the standard deviation is the square root of this.

traitSD <- apply(traitCent, MARGIN=2, function(x) sqrt(sum(x^2 * col.w)/sum(col.w)))
envSD <- apply(envCent, MARGIN=2, function(x) sqrt(sum(x^2 * row.w)/sum(row.w)))
traitScale <- sweep(traitCent, MARGIN=2, traitSD, '/')
envScale <- sweep(envCent, MARGIN=2, envSD, '/')
R <- as.matrix(envScale)
Q <- as.matrix(traitScale)
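The centering and scaling steps can be wrapped into one small helper, which also makes it easy to check that the result really has weighted mean 0 and weighted variance 1. This is a sketch: wscale() is a name I made up, not an ade4 function, and the test matrix and weights are arbitrary.

```r
# Hypothetical helper: centre and scale the columns of X using weights w
wscale <- function(X, w) {
  X <- as.matrix(X)
  w <- w / sum(w)                    # normalise weights so they sum to 1
  mu <- colSums(X * w)               # weighted column means
  Xc <- sweep(X, 2, mu)              # centre on the weighted means
  sdw <- sqrt(colSums(Xc^2 * w))     # weighted (population) standard deviation
  sweep(Xc, 2, sdw, "/")
}

set.seed(1)
X <- matrix(rnorm(12), nrow = 4)     # 4 observations x 3 variables
w <- c(0.1, 0.2, 0.3, 0.4)           # arbitrary weights, one per row

Z <- wscale(X, w)
wmean <- colSums(Z * w)              # weighted means, numerically 0
wvar <- colSums(Z^2 * w)             # weighted variances, numerically 1
```

With this helper, R would be wscale(env, row.w) and Q would be wscale(traits, col.w). Note that it divides by the sum of the weights rather than by n - 1, so it will not match base R's scale() even with equal weights.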

Next, **V** is just the **R’ L Q** product. This is actually the correlation matrix between the environmental variables and the species traits, mediated by species abundances.

V <- t(R) %*% L %*% Q
round(V, 3)

This is identical to the matrix operated on by the rlq() command. You can check this by examining the table ($tab) returned by the rlq() function.

Next, get the cross-product matrix, because the correlation matrix **V** is not guaranteed to be either square or symmetric.

Z <- crossprod(V, V)

The rest is a standard PCA-like eigendecomposition of **V**.

eig <- eigen(Z)
eigVals <- eig$values
eigVecs <- eig$vectors
sum(eigVals)

## THE SPECIES TRAIT LOADINGS ON EACH AXIS ARE THE EIGENVECTORS
traitLoad <- data.frame(eigVecs)
rownames(traitLoad) <- colnames(V)
colnames(traitLoad) <- paste('Axis', 1:length(eigVals))

## THE ENVIRONMENTAL VARIABLE SCORES ARE CALCULATED EXACTLY AS IN PCA
envScores <- data.frame(V %*% eigVecs)
names(envScores) <- paste('Axis', 1:length(eigVals))
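Taking the eigendecomposition of the cross-product matrix is equivalent to a singular value decomposition of **V** itself, which is a handy way to double-check the numbers. The sketch below uses a made-up 2 x 3 matrix standing in for **V**. Signs of eigenvectors and singular vectors are arbitrary, so the comparison is on absolute values.

```r
set.seed(1)
V <- matrix(rnorm(6), nrow = 2)   # hypothetical: 2 env variables x 3 traits

eig <- eigen(crossprod(V))        # eigendecomposition of V'V (3 x 3)
sv <- svd(V)                      # singular value decomposition of V

k <- length(sv$d)                 # number of axes that can be nonzero (here 2)

# Nonzero eigenvalues of V'V are the squared singular values of V
vals_match <- isTRUE(all.equal(eig$values[1:k], sv$d^2))

# Trait loadings (eigenvectors) match the right singular vectors up to sign
load_match <- isTRUE(all.equal(abs(eig$vectors[, 1:k]), abs(sv$v)))

# Environmental scores V %*% eigenvectors match U %*% diag(d) up to sign
env_match <- isTRUE(all.equal(abs(V %*% eig$vectors[, 1:k]),
                              abs(sv$u %*% diag(sv$d))))

c(vals_match, load_match, env_match)
```

Any eigenvalues beyond the first k are zero up to rounding error, which is why axes past min(number of environmental variables, number of traits) carry no information.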

This gives the biplot:

I sure wish I could see your raw data. I am an MSc student with no R coach, stuck on the simplest bits. Does

rownames(traits) <- colnames(species)

…associate those tables? I have not been able to associate the dimids of the 3 tables of "fourth-corner" type data as the aravo data in ade4 is, so I am sure feeding the results of each PCA and CA into rlq() will not yield correct results. None of the 'multitable' library functions for creating a list of associated data.frames have worked, so I wanted to try it your way.

Not really. The only thing that rownames(traits) <- colnames(species) does is name the rows of the trait matrix after the species. It assumes that they're in the same order as the columns of the site x species matrix.
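If the trait table is not already in the same order as the species columns, one way to line the two up by name is with match(). This is a sketch on made-up tables. It assumes the trait table has a column holding the species names (called name here, which is my invention), so adjust to whatever your file actually uses.

```r
# Hypothetical tables: species columns and trait rows in different orders
species <- data.frame(spB = c(2, 0), spA = c(1, 3), spC = c(0, 4))
traits <- data.frame(name = c("spA", "spB", "spC"),
                     height = c(10, 20, 30))

# For each species column, find the matching row of the trait table
idx <- match(colnames(species), traits$name)
stopifnot(!anyNA(idx))   # every species must have a trait record

# Reorder the traits, drop the name column, and label rows by species
traits_aligned <- traits[idx, -1, drop = FALSE]
rownames(traits_aligned) <- colnames(species)
traits_aligned
```

After this, row i of the trait table describes the species in column i of the species table, which is what the matrix products in the post rely on.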

If you want to send me an email, I'll send the raw data along to you.

Thanks for the informative description. Could you please post the code for the last figure?