Last week I gave a brief description of Bayesian data analysis using Python and PyStan. The example was pretty basic, and if you’re like me, you hate when you look up examples to figure out what to do and they’re either so simple (like last week) as to be nearly unhelpful or so advanced they’re unintelligible. So suit up, it’s code time.

Here is my attempt to provide code for a simple Bayesian hierarchical model in PyStan, again all the way from the beginning to the end. The model is of a caterpillar (*Epimecis hortaria*) feeding on three host plants across four temperatures, where each host plant is a random factor. The model is exponential (not linear), but that’s a relatively small change.

import pandas as pd import pystan import pylab as py import numpy as np import scipy.stats as st herbResp = pd.read_csv( '/Users/Nate/Documents/FIU/Research/Temp_HerbResponses/Data/Thermal_Curves/Feeding_Assays.csv' ) herb2 = herbResp[herbResp['Comments'] != 'DEAD'] herb2 = herb2.dropna(subset = ['Herb_RGR']) # pull out Epimecis Data epimecis = herb2[herb2['Herbivore_Scientific'] == 'Epimecis hortaria'] epimecis = epimecis[epimecis['MassCorr_Consumption_Daily'] > 0] # set temperature of 33 to 35 epimecis['Temperature'].replace(33, 35, inplace = True) y = epimecis['MassCorr_Consumption_Daily'] Temp = (epimecis['Temperature'] - epimecis['Temperature'].mean()) / epimecis['Temperature'].std() N = len(y) Plant = pd.factorize(epimecis['Food_Type'])[0] + 1 # STAN MOD epimecisMOD = """ data{ int N; int Plant[N]; vector[N] y; vector[N] Temp; } parameters{ vector[3] a; vector[3] b; real<lower = 0> sigma[3]; real mu_a; real mu_b; real<lower = 0> sigma_a; real<lower = 0> sigma_b; } model{ vector[N] mu; for(i in 1:N){ mu[i] <- exp(a[Plant[i]] + b[Plant[i]] * Temp[i]); y[i] ~ normal(mu[i], sigma[Plant[i]]); } a ~ normal(mu_a, sigma_a); b ~ normal(mu_b, sigma_b); } """ dat = {'y':y, 'Temp':Temp, 'N':N, 'Plant':Plant} fit = pystan.stan(model_code = epimecisMOD, data = dat, chains = 4, iter = 4000, thin = 10) fit.plot('a') fit.plot('b') print fit # get parameter estimates a = fit.extract('a', permuted = True) b = fit.extract('b', permuted = True) aChains = pd.DataFrame( np.array(a['a']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] ) bChains = pd.DataFrame( np.array(b['b']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] ) bChains.hist(bins = 50, color = 'k', alpha = 0.5) py.show()

This gives the histogram of parameter estimates for ‘b’:

Next, plot the output (in a basic plot with no chains of CI):

# PLOTTING ## # get the median parameter estimates aMed = aChains.median() bMed = bChains.median() # get the mean and stdev of temperature for conversion mu = epimecis['Temperature'].mean() std = epimecis['Temperature'].std() # get a prediction vector and convert it back to original temperature xPred = np.linspace(min(Temp), max(Temp), 100) xTemp = xPred*std + mu # make the predictions for lindera and spicebush sassPred = np.exp(aMed[2] + bMed[2]*xPred) linPred = np.exp(aMed[0] + bMed[0]*xPred) # make a grouped object and thing calculate the mean and SD for each plant-temperature combination grouped = epimecis.groupby(['Food_Type', 'Temperature'], as_index = False) summary = grouped['MassCorr_Consumption_Daily'].agg({'mean' : np.mean, 'sem' : lambda x: x.std() / np.sqrt(x.count()) }) plants = ['Lindera benzoin', 'Liriodendron tulipifera', 'Sassafras albidum'] fig = py.figure(figsize = (6.5, 5.5)) for i in range(3): py.errorbar(summary[summary['Food_Type'] == plants[i]]['Temperature'], summary[summary['Food_Type'] == plants[i]]['mean'], yerr = summary[summary['Food_Type'] == plants[i]]['sem'], fmt = 'o', ms = 10, capsize = 0, mew = 1, label = plants[i]) py.plot(xTemp, linPred, 'b') py.plot(xTemp, sassPred, 'r') py.title('Consumption Rates of Epimecis hortaria') py.ylabel('Mass-Corrected Daily Consumption (g)') py.xlabel(u"Temperature (\u00B0C)") py.xticks([20, 25, 30, 35]) py.xlim([18, 37]) py.legend(loc='upper left', prop = {'size':12}, fancybox=True, shadow=True) py.savefig('epimecis.pdf', transparent = True, bbox_inches = 'tight') py.show()

And there you have it. A simple hierarchical model from start to an almost finish (almost because I’d probably add CI’s in a real plot).

Stay tuned for a full on analysis from a paper of mine (eventually). Including CI’s, overall effects, group-level effects, and group-level predictors (something that isn’t really available in the current STAN examples). Don’t hide it. I know you’re excited.