My Ideal Python Setup for Statistical Computing

I’m moving more and more towards Python only (if I’m not there already), so I’ve spent a good deal of time getting the ideal Python IDE setup going. One of the biggest reasons I was slow to move away from R is that R has the excellent RStudio IDE. Python has Spyder, which is comparable but seems sluggish next to RStudio. I’ve tried PyCharm, which works well, but I had issues with its interactive interpreter running my Stan models.

A friend pointed me towards Sublime Text 3, and I have to say that it’s everything I wanted. The text editor is slick, fast, and has lots of great functions. But more than that, the add-ons are really what make Sublime shine.

Vital Add Ons:

  • Side Bar Enhancements: This extends the side-bar project organizer, allowing you to add folders and files, delete things, copy paths, etc. A must have.
  • SublimeREPL: Adds interactive interpreters for an enormous number of languages, both R and Python included. Impossible to work without.
  • Anaconda: An AMAZING package that extends Sublime by offering live Python linting to make sure my code isn’t screwed up, PEP8 formatters for those of you who like such things, and built-in documentation and code retrieval for those times you’ve forgotten how a function works. Another must have.
  • SublimeGIT: For working with GitHub straight from Sublime. Great if you’re doing any sort of module building.
  • Origami: A new way to split layouts and organize your screen. Not essential, but helpful.
  • Bracket Highlighter: Helpful for seeing just what set of parentheses I’m working in.

Sublime and all of these packages are also incredibly customizable; you can make them work and look however you want. I’ve spent a few days customizing my setup and I think it’s fairly solid. Here are my preferences:

For the main Sublime settings, I modified the minimap, turned off autocomplete (which I find annoying but can still access with Ctrl+Space), adjusted the caret so I could actually see it, changed the font, and tweaked a few other odds and ends.

{
"always_show_minimap_viewport": true,
"auto_complete": false,
"bold_folder_labels": true,
"caret_style": "phase",
"color_scheme": "Packages/Theme - Flatland/Flatland Dark.tmTheme",
"draw_minimap_border": true,
"fade_fold_buttons": false,
"font_face": "DejaVu Sans Mono",
"font_size": 14,
"highlight_line": true,
"highlight_modified_tabs": true,
"ignored_packages":
[
"Vintage"
],
"line_padding_bottom": 1,
"line_padding_top": 1,
"preview_on_click": false,
"spell_check": true,
"wide_caret": true,
}
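
Hand-edited settings files are easy to break with a stray character. One way around that, sketched here only as an illustration, is to keep the preferences in a Python dict and let the json module write the file. The names PREFERENCES, render_settings, and write_settings are my own, the dict holds just a subset of the settings above, and where the file should go (normally the User package's Preferences.sublime-settings) is left to the caller.

```python
import json
from pathlib import Path

# A subset of the preferences listed above, as a plain Python dict.
# Python's True/False become JSON true/false when serialized.
PREFERENCES = {
    "auto_complete": False,
    "caret_style": "phase",
    "font_size": 14,
    "highlight_line": True,
    "ignored_packages": ["Vintage"],
    "spell_check": True,
}


def render_settings(settings: dict) -> str:
    """Serialize settings as indented JSON text with sorted keys."""
    return json.dumps(settings, indent=4, sort_keys=True) + "\n"


def write_settings(settings: dict, path: Path) -> None:
    """Write the rendered settings to a path chosen by the caller."""
    path.write_text(render_settings(settings), encoding="utf-8")


if __name__ == "__main__":
    # Print instead of writing, so nothing is overwritten by accident.
    print(render_settings(PREFERENCES))
```

Because the text comes out of json.dumps, it always parses, which is the whole point of the exercise.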

For Bracket Highlighter, I changed the style of the highlight:

{
"high_visibility_enabled_by_default": true,
"high_visibility_style": "thin_underline",
"high_visibility_color": "__default__",
}

For Side Bar Enhancements, I’ve modified the ‘Open With’ options. For Anaconda, I changed a few small things and turned off PEP8 linting, which I hate. I don’t hate linting or PEP8 as such, but I don’t have much use for a linter constantly telling me that I put a space somewhere inappropriate.

{
"complete_parameters": true,
"complete_all_parameters": false,
"anaconda_linter_mark_style": "outline",
"pep8": false,
"anaconda_gutter_theme": "basic",
"anaconda_linter_delay": 0.5,
}

I also installed the Flatland Theme to make it pretty. Here is the end result, also showing the Anaconda documentation viewer that I find so awesome:

[Screenshot of the finished Sublime setup with the Anaconda documentation viewer, taken 2014-11-26]

I now use Sublime for all of my R, knitr, and LaTeX work as well. In all, it’s a pretty phenomenal editor that can do everything I need it to and combines at least four separate applications into one (TextWrangler, Spyder, RStudio, TeXShop). Now, some day I’ll be able to afford the $70 to turn off that reminder that I haven’t paid (and $15 for LaTeXing).

UPDATE

I forgot to mention snippets. You can create snippets in Sublime that are shortcuts for longer code. For example, I heavily customize my graphs in the same way every time. Instead of typing all the code, I can now just type tplt followed by a tab and I automatically get:


f, ax = plt.subplots()
ax.plot()
#ax.set_ylim([ , ])
#ax.set_xlim([ , ])
ax.set_ylabel("ylab")
ax.set_xlabel("xlab")
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_position(('outward', 10))
#ax.spines['bottom'].set_bounds()
ax.spines['left'].set_position(('outward', 10))
#ax.spines['left'].set_bounds()
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
plt.savefig(,bbox_inches = 'tight')
plt.show()

Great if you rewrite the same code many times.
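
For anyone who hasn’t made one before, a snippet is just a small XML file saved with a .sublime-snippet extension, usually in the User package folder. Here is a rough sketch of a Python helper that wraps a block of code in that format. The function name build_snippet and its arguments are my own invention, and I’m assuming you want every dollar sign treated as literal text (Sublime otherwise reads "$" as the start of a snippet field).

```python
from xml.sax.saxutils import escape

SNIPPET_TEMPLATE = """<snippet>
    <content><![CDATA[
{body}
]]></content>
    <tabTrigger>{trigger}</tabTrigger>
    <scope>{scope}</scope>
</snippet>
"""


def build_snippet(body: str, trigger: str, scope: str = "source.python") -> str:
    """Return the text of a .sublime-snippet file for the given code body."""
    if "]]>" in body:
        # This sequence would close the CDATA section early.
        raise ValueError("body must not contain ']]>'")
    # Escape "$" so Sublime inserts it literally instead of as a field marker.
    safe_body = body.replace("$", "\\$")
    return SNIPPET_TEMPLATE.format(
        body=safe_body,
        trigger=escape(trigger),
        scope=escape(scope),
    )


if __name__ == "__main__":
    code = "\n".join([
        "f, ax = plt.subplots()",
        "ax.plot()",
        "ax.spines['top'].set_visible(False)",
        "ax.spines['right'].set_visible(False)",
        "plt.show()",
    ])
    print(build_snippet(code, "tplt"))
```

Save the printed text as something like tplt.sublime-snippet and the tab trigger should become available in Python files. If you want tab stops inside the snippet, skip the dollar-sign escaping and write the fields by hand.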

5 thoughts on “My Ideal Python Setup for Statistical Computing”

  1. Set-up: I straddle the world and line between R and Python (Python 3, BTW), and I also use Conda. At home I’ve bought and installed Continuum’s acceleration package.

    Now most of my work is Bayesian estimation, prediction, model comparison, and most of that, but not all, is done using JAGS. So I can call JAGS from R, and the results are accessible there, for plotting, further study, etc. I often save this in R images. Much of the time, if I need to collect or sample data for analysis, I’ll write Python code to go over our gigabytes of data and distill it, then load that into R, and, so, JAGS.

    JAGS can also be invoked from Python, but I did not “switch” because JAGS itself is a separate module, compiled, and I see no speed-up for that. I s’ppose I should try that some day so I can invoke JAGS from Python when I need to.

    Question: What keeps me in the R camp is its richness of packages for doing *Everything* *Under* *The* *Sun*. Specifically, working for industry, I usually don’t want to invest the time and effort to write new code and, more importantly, subject it to the shakedown such code really needs to be trusted. If it’s published as an R package, with some exceptions, I know it’s gotten a significant shakedown. Python has a lot of packages out there, but in the world of special statistical techniques, there’s a dearth of them. Also, some are written for Python 2.7, and that whittles the field even further.

    I know Ray Pierrehumbert in his POPC has committed big to Python, and I applaud the open way that’s being done. But I don’t often do the same problem over and over again. I’ll use them in spurts, and then it’s off to a new problem. Do you know of a Treasure Trove of packages like R’s CRAN, but for Python statistical problems and numerical computing? Sure, NumPy and SciPy are a given.

    • I meant to make a post about this earlier. Python has the statsmodels module, which is excellent and growing by the day. It can now do GEE and basic mixed-effects models. I also know of several other Python modules for numerical and statistical computing that are under development but have not been released yet. It’s true that R, as of now, boasts a huge advantage in packages that make it far easier to jump into, as I mentioned earlier. But also as I mentioned earlier, I would bet money that Python catches up quickly.

  2. Great post

    Would it be possible to create some sort of YouTube video showing your setup in action? I know it’s lazy of me to ask, but I always find those videos inspiring for tweaking my own setup.
