Science Knowledge in the U.S.: 1 in 4 Americans Don’t Know That Earth Orbits the Sun (a new take)


A recent NSF poll on science knowledge, in which 25% of Americans failed to correctly answer that Earth orbits the Sun, is getting a lot of press; you can find articles on NPR, Discovery, and phys.org. This is depressing and highlights the concerning state of science knowledge in the U.S. However, most of these stories pretty much stop there, and there’s (always) more to it than that. Setting aside issues of sample size for a moment (only about 2,000 survey respondents), there are multiple levels to this survey.


First, the survey asked 10 questions and collected data from all around the world. We can then compare the state of scientific knowledge in the U.S. with that of other countries. With respect to the orbit question, the United States scored higher than every other surveyed country (including the EU) except South Korea.

Interestingly, over 80% of survey respondents knew about continental drift (i.e. that the continents have been moving around for millions of years), but fewer than half acknowledged that humans descended from earlier species. This seems paradoxical to me. The U.S. scored better on the continental drift question than all but the EU, Japan, and South Korea, but scored worse on the evolution question than every surveyed country but Russia. There’s more to the survey than this, but I won’t get into a play-by-play of the results. The curious can see the results here in the original report.

A New Angle

What is perhaps more interesting, and overlooked by all of those other articles, is that the NSF has conducted this same survey twice before in the U.S., once in 2001 and again in 2004. This lets us look at trends over time (and I love time trends!).

For example, only 80% and 78% of respondents correctly answered the continental drift question in 2001 and 2004, respectively. So it seems like there’s been some improvement (although it’s hard to tell because there’s no information on the margin of error in the survey, which is critical to extrapolating to a population).

In 2001 and 2004, 44% and 53% correctly answered the evolution question, respectively. Compared to the 48% from the current survey, this looks like a pretty stable trend. That’s probably the most upsetting part of all. In over ten years, we haven’t made any progress on that front (same with a question about the origins of the universe, which is consistently somewhere in the 30–40% range).

Overall, I’m not sure that this report is worth the hoopla. Sure, science knowledge in the U.S. isn’t up to snuff, but it’s better than in most other surveyed countries. And what is up to snuff? Will we ever hit the 80–90% range, and is that even a reasonable goal? Sure, the majority of the population doesn’t buy evolution, but we knew that already. Probably the most depressing point is that we haven’t been able to improve over the past decade. With more and more Americans being college educated, it seems odd to me that our science knowledge isn’t improving. Is there a failure of the education system here?

Further, the sample size in these surveys is remarkably low (something the Discovery article completely and surprisingly brushes aside). Extrapolating the results of 2,000 respondents to over 315 million people (that’s 315,000,000 compared to 2,000)… well… that’s tenuous at best. The NSF report appears to concatenate results from numerous surveys, but it’s unclear to me whether all of those results went into this specific questionnaire or not. Regardless, I guess I’m not surprised, and not really all that disappointed, either.
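For what it’s worth, the textbook margin of error for a sample of that size can be sketched in a few lines (a rough check only, assuming simple random sampling; it ignores design effects, weighting, and nonresponse, and the report doesn’t say how the sample was drawn):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an estimated proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# the orbit question: roughly 75% correct out of ~2,000 respondents
moe = margin_of_error(0.75, 2000)
print(round(moe, 3))  # about 0.019, i.e. roughly +/- 2 percentage points
```

Under those (strong) assumptions, 2,000 respondents pins a national proportion down to about ±2 points; whether those assumptions hold here is exactly the question.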

Ecological Lexicon

Hey everyone,

Some grad student friends of mine and I have been digging through the ecological literature to determine how ecologists use different terms (community, assemblage, guild, and ensemble) and whether their usage differs from that of definitions in textbooks and other sources. It can be fairly confusing, seeing as how many terms are often used interchangeably and defined differently depending on what textbook you used as an undergraduate or graduate student. There have been previous attempts to synthesize these definitions, but it’s unclear how successful they were. We’ve conducted a literature and textbook survey of the usage of these terms, but we also want to couple this with actual data on how ecologists and practitioners define these terms themselves to see if some of the confusion in the literature results from confusion amongst ecologists about the definitions. Please try to limit answers to one or two sentences. Thanks for your participation! Karma is your great reward.

Dogs Pollute Your Lawn and Water (So Pick Your Dog’s Poop Up!)

Anyone who has a dog knows that dogs poop. They poop a lot. Part of being a responsible dog owner is picking up this hazardous waste material. Unfortunately, huge numbers of people do not feel like stooping and scooping. This is a problem for all sorts of reasons:

  1. Stepping in dog poop is gross
  2. Often, children play in the grass where dogs poop
  3. Dog poop can contain harmful bacteria (which is why we process human waste)
  4. Dog poop is like a little fertilizer packet that can wreak havoc on your lawn, community, or nearby water bodies like oceans and rivers

I’m going to talk about that last point and try to put it in a frame of reference. I did a little bit of journal searching and some back of the envelope calculations to determine just how much fertilizer a dog puts out. Typically, when we talk fertilizer we think of nitrogen and phosphorus as the two elemental biggies, and I’m going to focus on nitrogen.

To determine how much nitrogen dogs dump onto your lawn each day, we need to know how much a dog actually poops. Fortunately, people know this. A 40 pound (18 kg) dog puts out about 0.07 pounds (30 g) of dry poop material per day [1]. It looks like more than that because poop contains a lot of water, but once you dry it down, this is what’s left over. We also know that dog poop is about 5.21% N (per unit of dry material) [2], which means a 40 lb dog squats out about 0.004 lbs of nitrogen onto your lawn per day.

Doesn’t sound like much? Well, Osmocote fertilizer is 15% nitrogen [3]. So the 0.004 lbs of nitrogen that your dog puts onto your lawn is equivalent to about 0.03 lbs of Osmocote slow-release fertilizer per day. Now, Osmocote recommends only 0.02 pounds of fertilizer per square foot applied every three to seven months. Your 40 pound dog is dumping out more nitrogen per day than Osmocote’s recommended once- or twice-per-year fertilizer rate.
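The back-of-the-envelope math above can be sketched in a few lines of Python (all values are from the post; rounding produces the 0.004 and 0.03 figures):

```python
# constants from the post
dry_poop_lbs_per_day = 0.07   # dry poop from a 40 lb dog, per day
nitrogen_fraction = 0.0521    # dog poop is ~5.21% N by dry mass
osmocote_n_fraction = 0.15    # Osmocote is ~15% N

n_per_day = dry_poop_lbs_per_day * nitrogen_fraction   # lbs N per day
osmocote_equiv = n_per_day / osmocote_n_fraction       # lbs of Osmocote per day

print(round(n_per_day, 4))       # 0.0036 (the post rounds to ~0.004)
print(round(osmocote_equiv, 3))  # 0.024 (the post rounds to ~0.03)
```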

City-Wide 

Odds are you’re not the only dog-lover in your city. In fact, roughly 36.5% of households own dogs, often more than one [4]. Using demographic pet data, we can estimate how many dogs are in a given community. For example, the greater Miami metropolitan area has 2,097,626 households [5]. This translates to approximately 1,225,013 dogs in the Miami area.

Not all dogs are equal. In Miami, most are much smaller than 40 pounds [citation needed]. We can assume a particular distribution of dog body weights that is heavily skewed towards small body sizes, like this:

The smaller, the yappier

If we assume constant poopage across body sizes, given the information above we know that dogs excrete roughly 0.002 pounds of poop per pound of body mass per day. Multiply this number by each weight class and we can estimate the amount of poop that a dog in each weight class puts out (i.e. a 5 pound dog puts out 0.002 * 5 = 0.01 pounds of poop per day). Then, multiply that number by the estimated number of dogs in each weight class (above) and we get the total amount of excrement per weight class (isn’t math awesome?). Adding all of this poop up comes out to a staggering 65,039 pounds of poop per day in the city (anyone looking to solve the oil crisis should look into dog poop as an alternative energy source). This equates to 3,382 pounds of nitrogen per day, or 22,457 pounds of fertilizer per day. That’s equivalent to roughly 450 fifty-pound bags of Osmocote dumped across Miami every day. It doesn’t stop on your lawn, either. This stuff gets into the groundwater and leaks into nearby rivers, lakes, canals, and yes, even the ocean.
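The city-wide pipeline can be sketched the same way. The weight-class distribution below is hypothetical (my own stand-in; the post’s actual distribution came from a figure), so the totals are illustrative rather than a reproduction of the 65,039-pound figure. The per-dog constants are from the post:

```python
n_dogs = 1225013         # estimated dogs in the Miami metro area (from the post)
poop_rate = 0.002        # lbs of dry poop per lb of body mass per day
n_fraction = 0.0521      # nitrogen fraction of dry poop
osmocote_n = 0.15        # nitrogen fraction of Osmocote

# hypothetical weight classes (lbs) and the share of dogs in each class
weight_classes = {5: 0.30, 15: 0.30, 30: 0.20, 50: 0.15, 80: 0.05}

# total dry poop per day: weight * rate * (number of dogs in that class), summed
total_poop = sum(w * poop_rate * share * n_dogs
                 for w, share in weight_classes.items())
total_n = total_poop * n_fraction      # lbs of nitrogen per day, city-wide
osmocote_lbs = total_n / osmocote_n    # equivalent lbs of Osmocote per day

print(int(total_poop), int(total_n), int(osmocote_lbs))
```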

So do us all a favor. Pick your dog’s poop up.

I don’t mind. Really.

Cleaning Data and Graphing in R and Python

Python has some pretty awesome data-manipulation and graphing capabilities. If you’re a heavy R user who dabbles in Python like me, you might wonder what the equivalent commands are in Python for dataframe manipulation. I was also curious how many lines of code the same task (load, clean, and graph data) took me in each language. (I’d like to head off arguments about efficiency and which language is better: neither my R nor my Python code is the super-efficient, optimal way to program. It is, however, how I do things, and to me that’s what matters. I’m also not trying to advocate one language over the other (programmers can be a sensitive bunch); I just wanted to post an example showing how to do equivalent tasks in each language.)

First, R

# load plotting library
library(ggplot2)

# read data
JapBeet_NoChoice <- read.csv("~/Documents/FIU/Research/JapBeetle_Temp_Herbivory/Data/No_Choice_Assays/JapBeet_NoChoice.csv")
# drop rows with missing consumption data
feeding <- subset(JapBeet_NoChoice, !is.na(Consumption))
# refactor and clean
feeding$Food_Type <- factor(feeding$Food_Type)
feeding$Temperature[which(feeding$Temperature==33)] <- 35

# subset
plants <- c('Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum', 'Viburnum prunifolium', 'Vitis vulpina')
subDat <- feeding[feeding$Food_Type %in% plants, ]

# make a standard error function for plotting
seFunc <- function(x){
  se <- sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
  lims <- c(mean(x, na.rm = TRUE) - se, mean(x, na.rm = TRUE) + se)
  names(lims) <- c('ymin', 'ymax')
  return(lims)
}

# ggplot!
ggplot(subDat, aes(Temperature, Herb_RGR, fill = Food_Type)) +
  stat_summary(geom = 'errorbar', fun.data = 'seFunc', width = 0, aes(color = Food_Type), show_guide = F) +
  stat_summary(geom = 'point', fun.y = 'mean', size = 3, shape = 21) +
  ylab('Mass Change (g)') +
  xlab(expression('Temperature '*degree*C)) +
  scale_fill_discrete(name = 'Plant Species') +
  theme(
    axis.text = element_text(color = 'black', size = 12),
    axis.title = element_text(size = 14),
    axis.ticks = element_line(color = 'black'),
    legend.key = element_blank(),
    legend.title = element_text(size = 12),
    panel.background = element_rect(color = 'black', fill = NA)
  )

Snazzy!

Next, Python!

# imports
import pandas as pd
import numpy as np
import pylab as py

# read data
JapBeet_NoChoice = pd.read_csv("/Users/Nate/Documents/FIU/Research/JapBeetle_Temp_Herbivory/Data/No_Choice_Assays/JapBeet_NoChoice.csv")

# clean up
feeding = JapBeet_NoChoice.dropna(subset = ['Consumption'])
feeding['Temperature'].replace(33, 35, inplace = True)

# subset out the correct plants
keep = ['Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum', 'Viburnum prunifolium', 'Vitis vulpina']
feeding2 = feeding[feeding['Food_Type'].isin(keep)]

# calculate means and SEs
group = feeding2.groupby(['Food_Type', 'Temperature'], as_index = False)
sum_stats = group['Herb_RGR'].agg({'mean' : np.mean, 'SE' : lambda x: x.std() / np.sqrt(x.count())})

# PLOT
for i in range(5):
    py.errorbar(sum_stats[sum_stats['Food_Type'] == keep[i]]['Temperature'],
                sum_stats[sum_stats['Food_Type'] == keep[i]]['mean'],
                yerr = sum_stats[sum_stats['Food_Type'] == keep[i]]['SE'],
                fmt = 'o', ms = 10, capsize = 0, mew = 1, alpha = 0.75,
                label = keep[i])

py.xlabel(u'Temperature (\u00B0C)')
py.ylabel('Mass Change')
py.xlim([18, 37])
py.xticks([20, 25, 30, 35])
py.legend(loc = 'upper left', prop = {'size':10}, fancybox = True, markerscale = 0.7)
py.show()

Snazzy 2!

So, roughly the same number of lines (excluding imports of modules and libraries), although a bit more efficient in Python (barely). For what it’s worth, I showed these two graphs to a friend and asked him which he liked more; he chose Python immediately. Personally, I like them both, and it’s hard for me to pick one over the other. I think they’re both great. The curious can see my much older, waaayyy less efficient, much more hideous version of this graph in my paper, but I warn you… it isn’t pretty. And the code was a nightmare (it was pre-ggplot2 for me, so it was made with R’s base plotting commands, which are a beast for this kind of graph).

PyStan: An Intermediate Tutorial of Bayesian Data Analysis in Python

Last week I gave a brief description of Bayesian data analysis using Python and PyStan. The example was pretty basic, and if you’re like me, you hate when you look up examples to figure out what to do and they’re either so simple (like last week) as to be nearly unhelpful or so advanced they’re unintelligible. So suit up, it’s code time.

It's code time

And beware of computer raptors

Here is my attempt to provide code for a simple Bayesian hierarchical model in PyStan, again all the way from the beginning to the end. The model is of a caterpillar (Epimecis hortaria) feeding on three host plants across four temperatures, where each host plant is a random factor. The model is exponential (not linear), but that’s a relatively small change.
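In equation form, the model looks like this (my notation; it mirrors the Stan code below). For observation i on plant p[i]:

y_i ~ Normal(mu_i, sigma_{p[i]}),    mu_i = exp(a_{p[i]} + b_{p[i]} * Temp_i)

with the plant-level intercepts and slopes partially pooled across the three host plants:

a_j ~ Normal(mu_a, sigma_a),    b_j ~ Normal(mu_b, sigma_b),    j = 1, 2, 3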

import pandas as pd
import pystan
import pylab as py
import numpy as np
import scipy.stats as st

herbResp = pd.read_csv( '/Users/Nate/Documents/FIU/Research/Temp_HerbResponses/Data/Thermal_Curves/Feeding_Assays.csv' )

herb2 = herbResp[herbResp['Comments'] != 'DEAD']
herb2 = herb2.dropna(subset = ['Herb_RGR'])

# pull out Epimecis Data
epimecis = herb2[herb2['Herbivore_Scientific'] == 'Epimecis hortaria']
epimecis = epimecis[epimecis['MassCorr_Consumption_Daily'] > 0]

# set temperature of 33 to 35
epimecis['Temperature'].replace(33, 35, inplace = True)

y = epimecis['MassCorr_Consumption_Daily']
Temp = (epimecis['Temperature'] - epimecis['Temperature'].mean()) / epimecis['Temperature'].std()
N = len(y)

Plant = pd.factorize(epimecis['Food_Type'])[0] + 1
# STAN MOD
epimecisMOD = """
data{
    int N;
    int Plant[N];
    vector[N] y;
    vector[N] Temp;
}

parameters{
    vector[3] a;
    vector[3] b;
    real<lower = 0> sigma[3];
    real mu_a;
    real mu_b;
    real<lower = 0> sigma_a;
    real<lower = 0> sigma_b;
}

model{
    vector[N] mu;

    for(i in 1:N){
        mu[i] <- exp(a[Plant[i]] + b[Plant[i]] * Temp[i]);
        y[i] ~ normal(mu[i], sigma[Plant[i]]);
    }

    a ~ normal(mu_a, sigma_a);
    b ~ normal(mu_b, sigma_b);
}
"""

dat = {'y': y,
       'Temp': Temp,
       'N': N,
       'Plant': Plant}

fit = pystan.stan(model_code = epimecisMOD, data = dat, chains = 4, iter = 4000, thin = 10)

fit.plot('a')
fit.plot('b')

print fit

# get parameter estimates
a = fit.extract('a', permuted = True)
b = fit.extract('b', permuted = True)

aChains = pd.DataFrame( np.array(a['a']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )
bChains = pd.DataFrame( np.array(b['b']), columns = ['Lindera', 'Liriodendron', 'Sassafras'] )

bChains.hist(bins = 50, color = 'k', alpha = 0.5)
py.show()

This gives the histogram of parameter estimates for ‘b’:

Next, plot the output (a basic plot, with no chains or CIs):

# PLOTTING ##
# get the median parameter estimates
aMed = aChains.median()
bMed = bChains.median()

# get the mean and stdev of temperature for conversion
mu = epimecis['Temperature'].mean()
std = epimecis['Temperature'].std()

# get a prediction vector and convert it back to original temperature
xPred = np.linspace(min(Temp), max(Temp), 100)
xTemp = xPred*std + mu

# make the predictions for sassafras and lindera
sassPred = np.exp(aMed[2] + bMed[2]*xPred)
linPred = np.exp(aMed[0] + bMed[0]*xPred)

# make a grouped object, then calculate the mean and SE for each plant-temperature combination
grouped = epimecis.groupby(['Food_Type', 'Temperature'], as_index = False)
summary = grouped['MassCorr_Consumption_Daily'].agg({'mean' : np.mean,
 'sem' : lambda x: x.std() / np.sqrt(x.count()) })
plants = ['Lindera benzoin', 'Liriodendron tulipifera', 'Sassafras albidum']

fig = py.figure(figsize = (6.5, 5.5))

for i in range(3):
    py.errorbar(summary[summary['Food_Type'] == plants[i]]['Temperature'],
        summary[summary['Food_Type'] == plants[i]]['mean'],
        yerr = summary[summary['Food_Type'] == plants[i]]['sem'],
        fmt = 'o', ms = 10, capsize = 0, mew = 1,
        label = plants[i])

py.plot(xTemp, linPred, 'b')
py.plot(xTemp, sassPred, 'r')
py.title('Consumption Rates of Epimecis hortaria')
py.ylabel('Mass-Corrected Daily Consumption (g)')
py.xlabel(u"Temperature (\u00B0C)")
py.xticks([20, 25, 30, 35])
py.xlim([18, 37])
py.legend(loc='upper left', prop = {'size':12},
 fancybox=True, shadow=True)
py.savefig('epimecis.pdf', transparent = True, bbox_inches = 'tight')

py.show()

And there you have it. A simple hierarchical model from start to an almost-finish (almost because I’d probably add CIs in a real plot).

Stay tuned for a full-on analysis from a paper of mine (eventually), including CIs, overall effects, group-level effects, and group-level predictors (something that isn’t really available in the current Stan examples). Don’t hide it. I know you’re excited.