You’ve carefully designed your experiment, you’ve meticulously collected your data, and you have a hypothesis to test. Unfortunately, your data is typical of ecology data: small sample sizes, messy, and non-normal. Your ideal test, the *t*-test, won’t work because of the non-normality and sample size is too small to invoke the central limit theorem. All hope is not lost!

Permutational analysis are a fantastic way to analyze data from designed experiments where experimental units have been randomly placed among treatments (see Anderson 2001* Canadian Journal of Fisheries and Aquatic Science* for a thorough discussion of permutational analyses). In fact, permutational analysis is great for these carefully designed experiments for two main reasons: 1) It frees us from the stringent assumption of normality and equal variances (**update, not necessarily true – see comments**); there are few distributional assumptions and 2) We can analyze any derived metric we like. Note that the assumption of **independent observations **still applies. There is just no getting around pseudoreplication.

First, let’s simulate some data. We’ll start with normal data so we can see the equivalence of permutational analyses and parametric tests when assumptions are met.

Let’s imagine you’re interested in phenotypic plasticity of *Gambusia *and you raise *Gambusia* under different predation regimes: predator-rich and predator poor. You suspect the *Gambusia* will have fast growth rates in predator-rich environments (they want to mature faster and have more, smaller offspring when predators are present). We’ll simulate the data on 15 mosquitofish from the two populations: Predator-Present (PP) and Predator-Absent (PA).

# SET RANDOM NUMBER GENERATOR SEED set.seed(2001) # SIMULATE NORMAL DATA PA <- rnorm(15, 4.5, 2) PP <- rnorm(15, 7.5, 2) # PLOT HISTOGRAMS mf <- par(mfrow=c(1,2)) hist(PA, breaks=5) hist(PP, breaks=5) par(mf)

Even though the data from one group or the other may look non-normal, we know the underlying distribution is normal because we simulated from a normal distribution. We can use a basic *t*-test to determine if there are differences between the two populations

t.test(PA, PP)

You get a very significant *p*-value and a *t*-statistic of -5.95.

Now we’ll analyze it with a permutational test. Inherent in our experimental design was the random assignment of individuals to PP or PA treatments. We can view our observed data as just one of many possible arrangements of the data. We can shuffle the observations around and ask “If the observations are randomly assigned treatments, what is the probability of observing our particular arrangement of the data” (note that this is *still* the frequentist version of hypothesis, what is the probability of our data if the null hypothesis of randomness is true).

We first need to know just how many permutations we’re capable of so we don’t wind up repeating combinations. We have 30 observations, split evenly into two groups. There are 30-choose-15 = 155117520 ways to permute the data uniquely. I think we’re safe. (In R, this is choose(30, 15) ).

We’ll follow this pattern: 1) pool all data, 2) randomly assign 15 observations to PA, 3) assign the rest to PP, 4) calculate the difference in means between the groups, 5) repeat for 9999 iterations, 6) compare our observed mean difference to the distribution of possible mean differences under the randomized procedure.

# POOL DATA pooledData <- c(PA, PP) # SET THE NUMBER OF ITERATIONS nIter <- 9999 # SET UP A CONTAINER FOR PERMUTED DIFFERENCES. ADD IN A SLOT FOR THE OBSERVED VALUE meanDiff <- numeric(nIter+1) # CALCULATE THE OBSERVED MEAN DIFFERENCE meanDiff[1] <- mean(PP) - mean(PA) # RUN THE ITERATION IN A FOR() LOOP for(i in 2:length(meanDiff)){ # start from 2 to avoid overwriting the observed difference index <- sample(1:30, size=15, replace=F) # Sample numbers 1-30 15 times and store in an index PAperm <- pooledData[index] # Assign the sampled values to PA PPperm <- pooledData[-index] # Assign everything else to PP meanDiff[i] <- mean(PPperm) - mean(PAperm) # Calcualte and store the difference in means } # PLOT HISTORGRAM OF DIFFERENCES IN MEANS hist(meanDiff, xlab='Difference in PP and PA means', prob=T, main='') # ADD IN A LINE FOR OUR OBSERVED VALUE abline(v=meanDiff[1], lty=2, col='red') # CALCULATE THE P-VALUE. USE THE ABSOLUTE VALUE FOR A TWO-TAILED TEST mean(abs(meanDiff) >= abs(meanDiff[1]))

The last line is a little trick where you get R to assign all differences greater than our observed difference a value of 1 and everything else gets assigned 0. Then you add up the 1’s and divide by the number of observations to get the proportion of observations greater than our observed statistic. This is the same as taking a mean of a vector of 1’s and 0’s. For example, if I have two 1’s and two 0’s, then (1+1+0+0)/4 = 0.5, which is the same as mean(1,1,0,0).

The *p*-value is incredibly low, similar to what we saw with the *t-*test.

You can try the same thing with log-normal data

# SIMULATE LOG-NORMAL DATA PAlnorm <- rlnorm(n=15, mean=log(4.5), sd=log(2)) PPlnorm <- rlnorm(n=15, mean=log(7.5), sd=log(2)) mf <- par(mfrow=c(1,2)) hist(PAlnorm);hist(PPlnorm) par(mf) # t-Test t.test(PAlnorm, PPlnorm) # POOL DATA pooledData <- c(PAlnorm, PPlnorm) # SET THE NUMBER OF ITERATIONS nIter <- 9999 # SET UP A CONTAINER FOR PERMUTED DIFFERENCES. ADD IN A SLOT FOR THE OBSERVED VALUE meanDiff <- numeric(nIter+1) # CALCULATE THE OBSERVED MEAN DIFFERENCE meanDiff[1] <- mean(PPlnorm) - mean(PAlnorm) # RUN THE ITERATION IN A FOR() LOOP for(i in 2:length(meanDiff)){ # start from 2 to avoid overwriting the observed difference index <- sample(1:30, size=15, replace=F) # Sample numbers 1-30 15 times and store in an index PAperm <- pooledData[index] # Assign the sampled values to PA PPperm <- pooledData[-index] # Assign everything else to PP meanDiff[i] <- mean(PPperm) - mean(PAperm) # Calcualte and store the difference in means } # PLOT HISTORGRAM OF DIFFERENCES IN MEANS hist(meanDiff, xlab='Difference in PP and PA means', prob=T, main='') # ADD IN A LINE FOR OUR OBSERVED VALUE abline(v=meanDiff[1], lty=2, col='red') mean(abs(meanDiff) >= abs(meanDiff[1]))

The results of the *t-*test and permutational analysis are again equivalent, although this will not always be the case. (My personal experience is that it is *usually* the case, so parametric tests are probably more robust to departures from assumptions than we’ve been led to believe).

This code described a *t*-test. Doing an ANOVA requires a few extra lines of code but is conceptually similar.

I prefer to write things like your for() loop like this

mat <- replicate(iter, sample(pooledData))

PAperm <- mat[1:15,]

PPperm <- mat[15:30,]

meanDiff <- colMeans(PPperm) – colMeans(PAperm)

There only one loop like line (the replicate) and it will execute the whole thing much faster than your code.

Thanks! I didn’t know about replicate(). I’ve never used it.

replicate will be much faster for moderate sized data sets with moderate numbers of iterations of the resampling. If you start getting into the hundreds of thousands or millions that advantage disappears. Indeed the for loop will become faster. That is because the object where the results are being stored is dynamically growing which is a relatively slow process. Initializing the vector or matrix first to store it can speed it up in these cases.

Nice post. I wanted to point out one thing and draw you attention to another.

First, you say “Note that the assumption of independent observations still applies. There is just no getting around pseudoreplication.” I believe you are invoking what is often referred to as the “Stable Unit Treatment Value Assumption,” which states that the treatment applied to one unit is not experienced by any other unit (it also includes a “no hidden levels” clause, but we can ignore that for now). Actually, under proper randomization permutation tests can be carried out that explicitly model spillover between units. The independence comes from the independence of treatment assignments, not from the outcomes.

Paul Rosenbaum shows this in a 2007 JASA article (http://www-stat.wharton.upenn.edu/~rosenbap/interference.pdf) without assuming any particular form of interference. Along with co-authors, I have a paper (forth coming in Political Analysis) that extends this style of analysis to allow researchers to test hypotheses of explicit models of spillover (http://arxiv.org/abs/1208.0366).

The paper makes heavy use of our software package RItools (http://github.com/markmfredrickson/RItools — the “randomization-distribution” branch) that simplifies writing and testing models, both those involving spillover and those that do not. The arxiv paper I linked to has an appendix demonstrating some of the analyses used. I also have a work-in-progress vignette that shows the software in much more detail, including several applied examples. The paper (and a link to the source for the paper) is available at my website: http://www.markmfredrickson.com/academics/

I wanted to clarify something you mentioned in your primary posting. You state:

” It frees us from the stringent assumption of normality and equal variances; there are no distributional assumptions because we’re building the distribution from our data..”

This is only partially correct. It is true that using the permutation test means that there are no distributional assumptions like normality. However the assumption of exchangeability is needed, which means that the “error” variance for growth rate (or other response variables) for your two treatment groups (predator-rich and predator-poor environments) need to be equal (well, similar). So it does not matter what the distribution is, but all treatment groups must share that distribution.

In the Anderson 2001 paper see page 628, second column, third paragraph down

“For example, a test using the t statistic for a difference in mean between two samples will be sensitive to differences in error variances, even if the P value is calculated using permutations (Boik 1987).”

This actually may be important for this case, since organisms in “stressed environments” (either due to abiotic or biotic stressors or even deleterious mutations) often have higher levels of variance for traits, so it could be important for your example.

If you want to use resampling methods where you do not need to worry about exchangeability (in other words we really only need independently distributed, but not identically), then some forms of the non-parametric bootstrap will work (the so-called pairs or random bootstrap will work, but not the form that bootstraps model residuals). This would be a useful way to get confidence intervals on the groups, and you can make your inferences from that (and a pseudo p-value). There are also bootstrap hypothesis testing techniques as well if you really want a true p-value, but once you have your confidence intervals why bother!

Yep. Thanks for pointing that out, I messed that up in my post. Actually, in the paragraph above the one you mention, Anderson actually explicitly says “…a test by permutation does not avoid the assumption of homogeneity of error variances…”. Right before that, she says, more or less, that it doesn’t matter what the shape of the distributions in each group are, just that they are roughly similar (so as to have similar error structures). That is, data in each group must be independent and identically distributed, if not necessarily normal.

This is an important point because, if I recall correctly, violating the assumption of homogenous variances tends to have more severe consequences than violating the assumption of normality at low sample sizes .

It may also help to write a function that is more easy to generalize for other models

PermuteFunction <- function(y=your.response, x=your.predictor){

model.resample = lm(sample(y, replace=F) ~ x)

fstats = summary(model.resample)$fstat[1]

# you could use the treatment contrast for diffs

#coef(model.resample)[2]

return(fstats)}

N <- 1000

permute.N = your.observed.stat])/N

# just remember the smallest p-value would be 1/N

length(fstats[fstats>= your.observed.stat])/N