My Ideal Python Setup for Statistical Computing

I’m moving more and more towards Python only (if I’m not there already), so I’ve spent a good deal of time getting the ideal Python IDE setup going. One of the biggest reasons I was slow to move away from R is that R has the excellent RStudio IDE. Python has Spyder, which is comparable but seems sluggish compared to RStudio. I’ve tried PyCharm, which works well, but I had issues with its interactive interpreter running my Stan models.

A friend pointed me towards Sublime Text 3, and I have to say that it’s everything I wanted. The text editor is slick, fast, and has lots of great functions. But more than that, the add-ons are really what make Sublime shine.

Vital Add Ons:

  • Side Bar Enhancements: This extends the side-bar project organizer, allowing you to add folders and files, delete things, copy paths, etc. A must have.
  • SublimeREPL: Adds interactive interpreters for an enormous number of languages, both R and Python included. Impossible to work without.
  • Anaconda: An AMAZING package that extends Sublime by offering live Python linting to make sure my code isn’t screwed up, PEP8 formatters for those of you who like such things, and built-in documentation and code retrieval for those times you’ve forgotten how a function works. Another must have.
  • SublimeGit: For working with GitHub straight from Sublime. Great if you’re doing any sort of module building.
  • Origami: A new way to split layouts and organize your screen. Not essential, but helpful.
  • Bracket Highlighter: Helpful for seeing just what set of parentheses I’m working in.

Sublime and all of these packages are also incredibly customizable; you can make them work and look however you want. I’ve spent a few days customizing my setup and I think it’s fairly solid. Here are my preferences:

For the main Sublime settings, I modified the minimap, turned off autocomplete (which I find annoying but can still access with Ctrl+Space), adjusted the caret so I could actually see it, changed the font, and a few other odds and ends.

{
"always_show_minimap_viewport": true,
"auto_complete": false,
"bold_folder_labels": true,
"caret_style": "phase",
"color_scheme": "Packages/Theme - Flatland/Flatland Dark.tmTheme",
"draw_minimap_border": true,
"fade_fold_buttons": false,
"font_face": "DejaVu Sans Mono",
"font_size": 14,
"highlight_line": true,
"highlight_modified_tabs": true,
"ignored_packages":
[
"Vintage"
],
"line_padding_bottom": 1,
"line_padding_top": 1,
"preview_on_click": false,
"spell_check": true,
"wide_caret": true,
}

For Bracket Highlighter, I changed the style of the highlight:

{
"high_visibility_enabled_by_default": true,
"high_visibility_style": "thin_underline",
"high_visibility_color": "__default__",
}

For Side Bar Enhancements, I’ve modified the ‘Open With’ options. For Anaconda, I changed a few small things and turned off PEP8 linting, which I hate. I don’t hate linting or PEP8 in general, but I don’t have much use for a linter constantly telling me that I put a space somewhere inappropriate.

{
"complete_parameters": true,
"complete_all_parameters": false,
"anaconda_linter_mark_style": "outline",
"pep8": false,
"anaconda_gutter_theme": "basic",
"anaconda_linter_delay": 0.5,
}

I also installed the Flatland Theme to make it pretty. Here is the end result, also showing the Anaconda documentation viewer that I find so awesome:

[Screenshot: Sublime Text with the Flatland theme and the Anaconda documentation viewer]

I also now use Sublime for all of my R, knitr, and LaTeX work as well. In all, it’s a pretty phenomenal editor that can do everything I need it to and combines at least four separate applications into one (TextWrangler, Spyder, RStudio, TeXShop). Now, some day I’ll be able to afford the $70 to turn off that reminder that I haven’t paid (and $15 for LaTeXing).

UPDATE

I forgot to mention snippets. You can create snippets in Sublime that are shortcuts for longer code. For example, I heavily customize my graphs in the same way every time. Instead of typing all the code, I can now just type tplt followed by a tab and I automatically get:


f, ax = plt.subplots()
ax.plot()
#ax.set_ylim([ , ])
#ax.set_xlim([ , ])
ax.set_ylabel("ylab")
ax.set_xlabel("xlab")
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_position(('outward', 10))
#ax.spines['bottom'].set_bounds()
ax.spines['left'].set_position(('outward', 10))
#ax.spines['left'].set_bounds()
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
plt.savefig("filename.png", bbox_inches='tight')  # placeholder filename, like "ylab" and "xlab" above
plt.show()

Great if you rewrite the same code many times.

Car Talk Puzzler Solution: Man Growing A Lawn

I love Car Talk. My radio station in Miami plays reruns every Saturday (which I just learned were reruns, since I just learned they no longer produce new shows). Probably my favorite part of Car Talk is the puzzler, which is usually some math-based word problem or some word-based math problem. Last weekend’s puzzler was a great one. The short version goes like this:

“A man has a dirt yard. On June 1, he goes to a garden store and tries to figure out how he can get a nice, lush lawn in time for his July 4 party. After discussing his options, the store clerk suggests a new technique: a plug of grass that doubles every day. The clerk did the calculations and figured out that, if the man were to buy one plug, his lawn would be covered by June 30 (in 30 days). The man thinks that he would be cutting it too close, so he buys two plugs. How many days does it take for his yard to get covered?”

As a biologist/ecologist, I recognized this immediately. This is the problem of exponential population growth! You can write down a really simple equation. Suppose that today is day 1 (t1) and tomorrow is day 2 (t2). The population of tomorrow is twice that of today:

$$t_2 = 2\,t_1$$

The lawn size on day 3 (t3) is twice that of the lawn size on day 2:

$$t_3 = 2\,t_2$$

and so on until you get to t(30). However, writing out all of these calculations is time-consuming and kind of a pain, so you can consolidate. If t(3) is twice t(2), and we know that t(2) is twice t(1), we can substitute in:

$$t_3 = 2\,t_2 = 2\,(2\,t_1) = 2^2\,t_1$$

which gives a general equation:

$$t_i = 2^{\,i-1}\,t_1$$

where the population on the ith day is simply 2 raised to (i – 1) multiplied by the initial population size. So we can quickly calculate the lawn size on day 30:

$$t_{30} = 2^{30-1}\,t_1 = 2^{29} \times 1 = 536{,}870{,}912$$

That’s the first bit of information we need. But the man started with two plugs (to be safe). We need to know how many days it takes to reach that same size:

$$2^{\,i-1} \times 2 = 2^{29}$$

To solve for (i-1), we can use the awesomeness of (natural) logarithms:

$$(i-1)\ln 2 + \ln 2 = 29\ln 2 \quad\Rightarrow\quad i - 1 = 28 \quad\Rightarrow\quad i = 29$$

The Car Talk guys used nice numbers that led to nice integer solutions. The neat part is that, by starting with two plugs “for insurance” and spending twice as much, the man only saved himself a single day. The proof of that is in this plot:

[Figure 1: lawn size by day for one plug and for two plugs]

The two-plug line (blue) reaches the final size of the one-plug line (green) only one day earlier.
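The same answer can be checked by brute force. This is just a quick sketch, assuming the lawn counts as covered once the grass reaches the size a single plug attains on day 30 (2^29 plugs); the function name `days_to_cover` and its arguments are made up for illustration:

```python
def days_to_cover(initial_plugs, full_lawn=2 ** 29):
    """Return the day on which the grass first covers the lawn.

    Day 1 is planting day, and the grass doubles once per day after that.
    full_lawn is the lawn size in plugs; 2**29 is an assumption taken from
    the clerk's calculation (one plug covers the lawn on day 30).
    """
    day = 1
    size = initial_plugs
    while size < full_lawn:
        size *= 2  # the grass doubles overnight
        day += 1
    return day


one_plug = days_to_cover(1)
two_plugs = days_to_cover(2)
print(one_plug, two_plugs, one_plug - two_plugs)
```

Starting with two plugs just skips the first doubling, which is why the difference is a single day no matter how big the lawn is.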

This puzzler was easy for me because I recognized the word problem immediately as the exponential growth of populations, something I’m fairly familiar with.

Why Not? (An Evolution Pictogram)

IF:

Check out David Dogglehoff over there...

AND:

That’s right. Broccoli, cauliflower, cabbage, Brussels sprouts, and kale are all one disgusting species.

THEN WHY NOT:

I am aware that a) dogs and the veggies are each one species, while the primates here are all different genera and families that diverged millions of years ago (rather than a few thousand), so the differences are much more pronounced, and that b) the process of selective breeding (for dogs and veggies) is different from speciation, although it is very similar to sexual selection and I don’t know of any research suggesting that sexual selection is NOT how primates diversified. The common ancestor is an artist’s rendition of Pierolapithecus catalaunicus, which is the suggested common ancestor (or close to it) of humans and great apes.

We are to the great apes what the Chinese crested and the hairless Chihuahua are to dogs.

 

AFTER ALL:

[Image: skeleton comparison]

We’re not so different…

Notes: All images from Wikimedia commons except the one of me and the skeleton comparison (which came from Google images).

Science Knowledge in the U.S.: 1 in 4 Americans Don’t Know That Earth Orbits the Sun (A New Take)

A recent NSF poll on science knowledge, in which 25% of Americans failed to correctly answer that Earth orbits the Sun, is getting a lot of press; you can find articles on NPR, Discovery, and phys.org. This is depressing, and highlights the concerning state of science knowledge in the U.S. However, most of these stories pretty much stop there, and there’s (always) more to it than that. Ignoring issues of sample size (only 2,000 survey respondents or so), there are multiple levels to this survey.

First, the survey asked 10 questions and collected data from all around the world. We can then compare the state of scientific knowledge in the U.S. with that of other countries. With respect to the orbit question, the United States scored higher than every other surveyed country (including the EU) except South Korea.

Interestingly, over 80% of survey respondents knew about continental drift (i.e., that the continents have been moving around for millions of years), but fewer than half acknowledged that humans descended from earlier species. This seems paradoxical to me. The U.S. scored better on the continental drift question than all but the EU, Japan, and South Korea, but scored worse on the evolution question than every surveyed country but Russia. There’s more to the survey than this, but I won’t get into a play-by-play of the results. The curious can see the results here in the original report.

A New Angle

What is perhaps more interesting, and overlooked by all of those other articles, is that the NSF has conducted this same survey twice before in the U.S., once in 2001 and again in 2004. This lets us look at trends over time (and I love time trends!).

For example, only 80% and 78% of respondents correctly answered the continental drift question in 2001 and 2004, respectively. So it seems like there’s been some improvement (although it’s hard to tell because there’s no information on the margin of error in the survey, which is critical to extrapolating to a population).
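To give a sense of scale, here is a rough sketch of the textbook margin of error for a survey proportion. It assumes a simple random sample, which may not match the actual survey design, and the inputs (about 2,000 respondents, 80% answering correctly) are just the round numbers quoted in this post; the function name is made up for illustration:

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents.

    Assumes a simple random sample, which the actual survey may not be.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical inputs: roughly 2,000 respondents, 80% answering correctly.
moe = margin_of_error(0.80, 2000)
print(f"+/- {100 * moe:.1f} percentage points")
```

Under those assumptions it comes out to a bit under two percentage points, but the real number depends on how the survey was actually run.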

In 2001 and 2004, 44% and 53% correctly answered the evolution question, respectively. Compared to the 48% from the current survey, this looks like a pretty stable trend. That’s probably the most upsetting part of all. In over ten years, we haven’t made any progress on that front (same with a question about the origins of the universe, which is consistently somewhere in the 30–40% range).

Overall, I’m not sure that this report is worth the hoopla. Sure, science knowledge in the U.S. isn’t up to snuff, but it’s a lot better than other countries. And what is up to snuff? Will we ever hit the 80-90% range and is that even a reasonable goal? Sure, the majority of the population doesn’t buy evolution, but we knew that already. Probably the most depressing point is the fact that we haven’t been able to improve over the past decade. With more and more Americans being college educated, it seems odd to me that our science knowledge isn’t improving. Is there a failure of the education system here?

Further, the sample size in these surveys is remarkably low (something the Discovery article completely and surprisingly brushes aside). Extrapolating the results of 2,000 respondents to over 315 million (that’s 315,000,000 compared to 2,000)… well… that’s tenuous at best. The NSF report appears to concatenate results from numerous surveys, but it’s unclear to me whether all of those results went into this specific questionnaire or not. Regardless, I guess I’m not surprised and not really all that disappointed, either.

Ecological Lexicon

Hey everyone,

Some grad student friends of mine and I have been digging through the ecological literature to determine how ecologists use different terms (community, assemblage, guild, and ensemble) and whether their usage differs from that of definitions in textbooks and other sources. It can be fairly confusing, seeing as how many terms are often used interchangeably and defined differently depending on what textbook you used as an undergraduate or graduate student. There have been previous attempts to synthesize these definitions, but it’s unclear how successful they were. We’ve conducted a literature and textbook survey of the usage of these terms, but we also want to couple this with actual data on how ecologists and practitioners define these terms themselves to see if some of the confusion in the literature results from confusion amongst ecologists about the definitions. Please try to limit answers to one or two sentences. Thanks for your participation! Karma is your great reward.