# PyStan: An Intermediate Tutorial of Bayesian Data Analysis in Python

Last week I gave a brief description of Bayesian data analysis using Python and PyStan. The example was pretty basic, and if you’re like me, you hate when you look up examples to figure out what to do and they’re either so simple (like last week) as to be nearly unhelpful or so advanced they’re unintelligible. So suit up, it’s code time.

And beware of computer raptors

Here is my attempt to provide code for a simple Bayesian hierarchical model in PyStan, again all the way from the beginning to the end. The model is of a caterpillar (Epimecis hortaria) feeding on three host plants across four temperatures, where each host plant is a random factor. The model is exponential (not linear), but that’s a relatively small change.

```import pandas as pd
import pystan
import pylab as py
import numpy as np
import scipy.stats as st

herb2 = herb2.dropna(subset = ['Herb_RGR'])

# pull out Epimecis Data
epimecis = herb2[herb2['Herbivore_Scientific'] == 'Epimecis hortaria']
epimecis = epimecis[epimecis['MassCorr_Consumption_Daily'] > 0]

# set temperature of 33 to 35
epimecis['Temperature'].replace(33, 35, inplace = True)

y = epimecis['MassCorr_Consumption_Daily']
Temp = (epimecis['Temperature'] - epimecis['Temperature'].mean()) / epimecis['Temperature'].std()
N = len(y)

Plant = pd.factorize(epimecis['Food_Type'])[0] + 1
# STAN MOD
epimecisMOD = """
data{
int N;
int Plant[N];

vector[N] y;
vector[N] Temp;

}

parameters{
vector[3] a;
vector[3] b;

real<lower = 0> sigma[3];
real mu_a;
real mu_b;

real<lower = 0> sigma_a;
real<lower = 0> sigma_b;
}

model{
vector[N] mu;

for(i in 1:N){
mu[i] <- exp(a[Plant[i]] + b[Plant[i]] * Temp[i]);
y[i] ~ normal(mu[i], sigma[Plant[i]]);
}

a ~ normal(mu_a, sigma_a);
b ~ normal(mu_b, sigma_b);
}
"""

dat = {'y':y,
'Temp':Temp,
'N':N,
'Plant':Plant}

fit = pystan.stan(model_code = epimecisMOD, data = dat, chains = 4, iter = 4000, thin = 10)

fit.plot('a')
fit.plot('b')

print fit

# get parameter estimates
a = fit.extract('a', permuted = True)
b = fit.extract('b', permuted = True)

aChains = pd.DataFrame( np.array(a['a']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )
bChains = pd.DataFrame( np.array(b['b']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )

bChains.hist(bins = 50, color = 'k', alpha = 0.5)
py.show()
```

This gives the histogram of parameter estimates for ‘b’:

Next, plot the output (in a basic plot with no chains of CI):

```# PLOTTING ##
# get the median parameter estimates
aMed = aChains.median()
bMed = bChains.median()

# get the mean and stdev of temperature for conversion
mu = epimecis['Temperature'].mean()
std = epimecis['Temperature'].std()

# get a prediction vector and convert it back to original temperature
xPred = np.linspace(min(Temp), max(Temp), 100)
xTemp = xPred*std + mu

# make the predictions for lindera and spicebush
sassPred = np.exp(aMed[2] + bMed[2]*xPred)
linPred = np.exp(aMed[0] + bMed[0]*xPred)

# make a grouped object and thing calculate the mean and SD for each plant-temperature combination
grouped = epimecis.groupby(['Food_Type', 'Temperature'], as_index = False)
summary = grouped['MassCorr_Consumption_Daily'].agg({'mean' : np.mean,
'sem' : lambda x: x.std() / np.sqrt(x.count()) })
plants = ['Lindera benzoin', 'Liriodendron tulipifera', 'Sassafras albidum']

fig = py.figure(figsize = (6.5, 5.5))

for i in range(3):
py.errorbar(summary[summary['Food_Type'] == plants[i]]['Temperature'],
summary[summary['Food_Type'] == plants[i]]['mean'],
yerr = summary[summary['Food_Type'] == plants[i]]['sem'],
fmt = 'o', ms = 10, capsize = 0, mew = 1,
label = plants[i])

py.plot(xTemp, linPred, 'b')
py.plot(xTemp, sassPred, 'r')
py.title('Consumption Rates of Epimecis hortaria')
py.ylabel('Mass-Corrected Daily Consumption (g)')
py.xlabel(u"Temperature (\u00B0C)")
py.xticks([20, 25, 30, 35])
py.xlim([18, 37])
py.legend(loc='upper left', prop = {'size':12},