PyStan: An Intermediate Tutorial of Bayesian Data Analysis in Python

Last week I gave a brief description of Bayesian data analysis using Python and PyStan. The example was pretty basic, and if you’re like me, you hate when you look up examples to figure out what to do and they’re either so simple (like last week) as to be nearly unhelpful or so advanced they’re unintelligible. So suit up, it’s code time.

It's code time

And beware of computer raptors

Here is my attempt to provide code for a simple Bayesian hierarchical model in PyStan, again all the way from the beginning to the end. The model is of a caterpillar (Epimecis hortaria) feeding on three host plants across four temperatures, where each host plant is a random factor. The model is exponential (not linear), but that’s a relatively small change.

import pandas as pd
import pystan
import pylab as py
import numpy as np
import scipy.stats as st

herbResp = pd.read_csv( '/Users/Nate/Documents/FIU/Research/Temp_HerbResponses/Data/Thermal_Curves/Feeding_Assays.csv' )

herb2 = herbResp[herbResp['Comments'] != 'DEAD']
herb2 = herb2.dropna(subset = ['Herb_RGR'])

# pull out Epimecis Data
epimecis = herb2[herb2['Herbivore_Scientific'] == 'Epimecis hortaria']
epimecis = epimecis[epimecis['MassCorr_Consumption_Daily'] > 0]

# set temperature of 33 to 35
epimecis['Temperature'].replace(33, 35, inplace = True)

y = epimecis['MassCorr_Consumption_Daily']
Temp = (epimecis['Temperature'] - epimecis['Temperature'].mean()) / epimecis['Temperature'].std()
N = len(y)

Plant = pd.factorize(epimecis['Food_Type'])[0] + 1
# STAN MOD
epimecisMOD = """
data{
 int N;
 int Plant[N];

vector[N] y;
 vector[N] Temp;

}

parameters{
 vector[3] a;
 vector[3] b;

real<lower = 0> sigma[3];
 real mu_a;
 real mu_b;

real<lower = 0> sigma_a;
 real<lower = 0> sigma_b;
}

model{
 vector[N] mu;

for(i in 1:N){
 mu[i] <- exp(a[Plant[i]] + b[Plant[i]] * Temp[i]);
 y[i] ~ normal(mu[i], sigma[Plant[i]]);
 }

a ~ normal(mu_a, sigma_a);
 b ~ normal(mu_b, sigma_b);
}
"""

dat = {'y':y,
 'Temp':Temp,
 'N':N,
 'Plant':Plant}

fit = pystan.stan(model_code = epimecisMOD, data = dat, chains = 4, iter = 4000, thin = 10)

fit.plot('a')
fit.plot('b')

print fit

# get parameter estimates
a = fit.extract('a', permuted = True)
b = fit.extract('b', permuted = True)

aChains = pd.DataFrame( np.array(a['a']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )
bChains = pd.DataFrame( np.array(b['b']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )

bChains.hist(bins = 50, color = 'k', alpha = 0.5)
py.show()

This gives the histogram of parameter estimates for ‘b’:figure_3

Next, plot the output (in a basic plot with no chains of CI):

# PLOTTING ##
# get the median parameter estimates
aMed = aChains.median()
bMed = bChains.median()

# get the mean and stdev of temperature for conversion
mu = epimecis['Temperature'].mean()
std = epimecis['Temperature'].std()

# get a prediction vector and convert it back to original temperature
xPred = np.linspace(min(Temp), max(Temp), 100)
xTemp = xPred*std + mu

# make the predictions for lindera and spicebush
sassPred = np.exp(aMed[2] + bMed[2]*xPred)
linPred = np.exp(aMed[0] + bMed[0]*xPred)

# make a grouped object and thing calculate the mean and SD for each plant-temperature combination
grouped = epimecis.groupby(['Food_Type', 'Temperature'], as_index = False)
summary = grouped['MassCorr_Consumption_Daily'].agg({'mean' : np.mean,
 'sem' : lambda x: x.std() / np.sqrt(x.count()) })
plants = ['Lindera benzoin', 'Liriodendron tulipifera', 'Sassafras albidum']

fig = py.figure(figsize = (6.5, 5.5))

for i in range(3):
    py.errorbar(summary[summary['Food_Type'] == plants[i]]['Temperature'],
        summary[summary['Food_Type'] == plants[i]]['mean'],
        yerr = summary[summary['Food_Type'] == plants[i]]['sem'],
        fmt = 'o', ms = 10, capsize = 0, mew = 1,
        label = plants[i])

py.plot(xTemp, linPred, 'b')
py.plot(xTemp, sassPred, 'r')
py.title('Consumption Rates of Epimecis hortaria')
py.ylabel('Mass-Corrected Daily Consumption (g)')
py.xlabel(u"Temperature (\u00B0C)")
py.xticks([20, 25, 30, 35])
py.xlim([18, 37])
py.legend(loc='upper left', prop = {'size':12},
 fancybox=True, shadow=True)
py.savefig('epimecis.pdf', transparent = True, bbox_inches = 'tight')

py.show()

figure_1And there you have it. A simple hierarchical model from start to an almost finish (almost because I’d probably add CI’s in a real plot).

Stay tuned for a full on analysis from a paper of mine (eventually). Including CI’s, overall effects, group-level effects, and group-level predictors (something that isn’t really available in the current STAN examples). Don’t hide it. I know you’re excited.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s