I’m a big fan of open-source software for research. For example, R-statistics, Qgis, and Grass GIS are awesome programs. R can do any statistical tests and numerical modeling you can imagine; if there’s not a built-in function you can write one (the beauty of using a programming language over point-and-click statistical programs). Granted, SPSS and Minitab do allow you to write scripts, but the language is less straightforward than R, and matrix-manipulation is easier in R. Same with Grass GIS. Qgis is an amazing substitute for ArcMap, with a beautiful interface. The other benefit of using a programming language for your research is reproducibility. You can save your code, re-run it to generate the same exact tests and figures so that you never forget what you did. Plus, you can share your code as appendices on your paper so that others can read your code to reproduce your results exactly. There is absolute transparency. This can be incredibly important, as noted in a post by Jeremy Fox over at Dynamic Ecology.

However, as wonderful as R is, there are things R does not do well. Symbolic math is one of them (R CAN do it, but I find the commands clunky). However, the Python programming language (which is a high-level language that can be used to make stand-alone software) has excellent modules (like R packages) for matrix manipulations and symbolic math. Although I’m still new to it, I dare say that Python’s mathematical package is near as complete as MATLAB (at least for my uses). In fact, Python’s numerical packages are widely used in physics, astronomy, and bioinformatics.

Python makes use of three main modules for math: scipy, numpy, and sympy. Scipy and Numpy together make Python a powerful MATLAB-esque program, while Sympy adds capabilities for symbolic math. Matplotlib (and by association Pylab) provide excellent graphing capabilities. Another benefit is that the language is similar to (but not identical to) R, so that transitioning should be easy. R does have a python-language interface package, but it is slow and you still need to learn Python. So why not just run Python directly. Additionally, Spyder provides an excellent RStudio-esque IDE for Python (although I think it’s Mac only). **UPDATE: Spyder apparently works on both Windows and Linux. My bad. **

I can’t walk you through the installation of Python and its modules (there’s a huge amount of material already available, and if you use a Mac, I highly recommend the MacPorts installation route). But I can show you what it can do. I developed a Python script to model Lotka-Volterra competition equations. Here’s the annotated script with the figure output, just as a demonstration of the capabilities of Python’s symbolic math package Sympy as an excellent complement to R’s highly advanced numerical techniques. Interested folks can look up Intro to Python material to get a handle on the language, although if you know R, it should be easy enough to figure out. Same goes for those of you familiar with MATLAB. The biggest difference is that when you import a module, you need to type the module name (or nickname if you use ‘as’) before the command so that Python knows where to look for the command. (NOTE: I’ve verified all of Python Sympy’s results with MATLAB. Although they give different equations in the output, they’re identical albeit with some algebraic manipulation).

# Import the necessary modules (like R packages) # LOTKA-VOLTERRA COMPETITION import sympy as sm from scipy import integrate import pylab as py import numpy as np # First, define the variables as symbols (like MATLAB) r1, N1, K1, a12, r2, N2, K2, a21 = sm.symbols('r1, N1, K1, a12, r2, N2, K2, a21') # Define the Lotka-Volterra differential equation symbolically sp1Diff = r1*N1*(1 - (N1 + a12*N2)/K1) sp2Diff = r2*N2*(1 - (N2 + a21*N1)/K2) # Make the equality, setting the equations equal to zero sp1Mod = sm.Eq( sp1Diff, 0 ) sp2Mod = sm.Eq( sp2Diff, 0 ) # Solve each equation for the relevant species (equilibrium values) sp1Eq = sm.solve( sp1Mod, N1 ) sp2Eq = sm.solve( sp2Mod, N2 ) print 'The equilibrium population size for species 1 is\n', sp1Eq print 'The equilibrium population size for species 2 is\n', sp2Eq # Above, we've used sympy to symbolically find the equilibrium # population sizes for both N1 and N2. The code is very straight- # forward and simple (compare to R's symbolic math abilities). # Unfortunately, the equilibria depend on the population size of # the other species. Assume that both species are present. # Make a system of linear equations using the two equilibria, # solving for N1 and N2. sys1 = sm.Eq( K1 - N1 - N2*a12, 0 ) sys2 = sm.Eq( K2 - N2 - N1*a21, 0 ) # Solve this system for N1 and N2, giving the equilibrium # population size when both species are present sysSolve = sm.solve( (sys1, sys2), N1, N2 ) print 'The equilibrium population size for each species in the presence of the other is\n', sysSolve # Next, we use scipy to solve the Lotka-Volterra model numerically # given a set of chosen parameter values (this can be done in R # easily, but we're on a roll and it's also very easy in Python. # Arguably easier, actually. # Note that arrays in Python start with 0, so that K[0] in Python # is equivalent to K[1] in R (K[1] = K[2] in R, etc.). # I prefer R's technique here, but # I'm sure there's a computer-sciencey reason for using zero-based # arrays. def lvComp(N, t, r, K, alpha): return( r[0] * N[0] * (1 - (N[0]+alpha[0]*N[1])/K[0]) , r[1] * N[1] * (1 - (N[1] + alpha[1]*N[2])/K[1]) ) # Now, set the parameter values. You can vary these later r = [0.5, 0.2] N = [2, 2] # Initial pop size K = [100, 50] alpha = [1.2, 0.2] # Set the duration of the model tmax = 100 t = range(0, tmax + 1) # Range goes from the starting point to n-1, n is the end point # Now here it gets complicated! Solve the ode Nt = integrate.odeint( lvComp, N, t, args = (r, K, alpha) ) # Done! # Get the equilibrium pop sizes by plugging in our chosen parameter # values into the equilibria calculated above. This is why I # didn't use R to solve the ODE. I've already calculated the # symbolic equilibrium, and substituting my parameters in is easy. sp1EqLine = sysSolve[N1].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True) sp2EqLine = sysSolve[N2].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True) # Make a vector of equilibria plotequils = (sp1EqLine, sp2EqLine) # Plot the ode results py.plot(t, Nt, '--o') py.xlabel('time (t)') py.ylabel('N(t)') py.legend(['Sp1', 'Sp2'], loc='lower right') py.title('Lotka-Volterra Competition Between Two Species') for line in plotequils: py.axhline(y=line, linewidth=2, color='red', ls='--')

# To print zero-growth isoclines, first isolate the equilibrium # equation for each species sp1IsoEq = sp1Eq[1] sp2IsoEq = sp2Eq[1] # We're plotting a phase-plane graph with species 2 on the y-axis # so we need to re-arrange the species 1 equation to predict N2 # based on N1 (it's currently the other way around) sp1IsoEq = sm.Eq(sp1IsoEq, N1) sp1IsoEq = sm.solve(sp1IsoEq, N2) # Get the range of N1 on the x-axis. Depending on the parameter # values, the max can either be K1, or K2/a21 n1Range = range(0, int( max(K[0], K[1]/alpha[1]) + 5)) # Calculate the zero-growth isocline for species 1 n1Iso = [] #makes an empty list for n in n1Range: a = sp1IsoEq[0].subs([(K1,K[0]),(N1,n1Range[n]),(a12,alpha[0])], simultaneous=True) n1Iso.append(a) # Calculte the zero-growth isocline for species 1 n2Iso = [] for n in n1Range: a = sp2IsoEq.subs( [ (K2,K[1]), (N1, n1Range[n]), (a21, alpha[1]) ], simultaneous=True ) n2Iso.append(a) # Plot the results, showing the model output overtop py.clf() py.plot(n1Range, n1Iso, '--') py.plot(n1Range, n2Iso, '--') py.plot(Nt[:,0], Nt[:,1], 'b^', label='Model Output') py.axis( [0, max(K[0], K[1]/alpha[1])+0.1*max(K[0], K[1]/alpha[1]), 0, max(K[1], K[0]/alpha[0])+0.1*max(K[1], K[0]/alpha[0])] ) py.xlabel('N1') py.ylabel('N2') py.legend( ['N1', 'N2', 'Model Output'], loc='upper right' ) # Get text value py.text(0.5, K[1], 'K2', va='center') py.text(0.5, K[0]/alpha[0], 'K1/a12', va='center') py.text(K[0], 0.5, 'K1', ha='center') py.text(K[1]/alpha[1], 0.5, 'K2/a21', ha='center')

Normally, we could stop here. But we’ll go one further and plot a vector field (something else python makes easy)

# First, set up a grid of N1 and N2 values (similar to # expand.grid() in R) X,Y = np.meshgrid( np.arange(0, K[0] + 0.1*K[0], 5), # 0 - K1, every 5th number np.arange(0, K[1] + 0.1*K[1], 5) ) # Python wants a starting X,Y position (above) and vectors of # of change along the X and Y axes (U,V). These are what the # differential equations describe, so we can just use the very # first equations over every element in X and Y U = r[0] * X * (1 - (X + alpha[0]*Y)/K[0]) V = r[1] * Y * (1 - (Y + alpha[1]*X)/K[1]) # plot the results py.clf() py.plot(Nt[:,0], Nt[:,1], 'go') Q = py.quiver(X,Y,U,V) py.plot(sp1EqLine, sp2EqLine, 'b*', ms=15) py.plot(0, K[1], 'b*', ms=15) py.plot(K[0], 0, 'b*', ms=15) py.axis( [-1, K[0] + 0.1*K[0], -1, K[1] + 0.1*K[1] ] ) py.xlabel('N1') py.ylabel('N2') py.title('Vector Field of Population Trajectories')

I can’t advocate Python over R for statistics, R is too well defined, has too many functions, and has much better model formulation (to name a very few of the long list of reasons R is better for stats). But hopefully I’ve convinced you that Python is a great platform for symbolic math, calculus, and modeling that can complement R. Together, R and Python make the world’s best open-source mathematical software combination. Also, as with R, if you want it, there’s a Python module for it (GIS, GPS tracking, astronomical predictions, physics, etc). Also, Python is a high-level programming languaging, meaning that it’s possible to integrate your math with other Python modules and programs meant for things like information gathering from websites, etc.

(NOTE: Sage exists as an open-source alternative to MATLAB, Maple, and Mathematica combined and is excellent, but it’s largely built on Python and uses many of the same modules. I’m not sure where Sage extends the capabilities of Numpy and Scipy, but definitely check it out. It’s possible that Sage expands on Numpy, Scipy, and Sympy)

Complements (the two things together make a whole), rather than compliments (‘you look nice!’).

Ahhh… you beat me to it. Back on topic: that’s why R has interfaces to Python, as well as to the OS, Excel, LaTex, etc., and why there’s RinRuby and even a Mathematica interface to R. It’s great to be able to integrate existing tools no matter what language they’re in.

Rats — now the blog owner went and fixed the title

Spyder works on windows too.

I also recommend using pandas (dataframe, plyer, reshape,… replacement -> easier than using numpy directly), statsmodels+patsy (regressions,… and now using formula syntax with patsy) and IPython (mainly the notebook -> running python code in a browser and rendering plots/output inline and sharing it online similar to rpubs -> http://nbviewer.ipython.org/).

Pandas also recently gained a “rplot” module which seems to emulate ggplot.

Didn’t know about rplot in pandas… will have to check that out I love ggplot2 in R. If this sort of functionality comes to Python it may just sway me over to Python completely.

Spyder works on my Linux Mint flawlessly.

You may also be interested in Enthought’s Canopy (Express or supported versions) platform (https://www.enthought.com/). As an EPD user, I am impressed with where Enthought appears to be going with the Canopy project.

Nice post. FWIW, R can do symbolic math using the yacas package, although probably not as slick.

I use python for writing scripts, but I cannot get myself to use python for interactive data analysis on the “command line”. My main problem, as silly as it may sound, is indentation. I like testing pieces of code inside functions, loops, etc. If anyone has any pointers, I’d be happy to hear them.

Spyder allows this, actually. I can write a script in the editor (as in RStudio) which as a function in it. I can select the indented code within the function and run that on its own just fine. And the yacas package is what I was referring to when I said R can do symbolic math, but Python is way easier. The code in python for symbolic math is astoundingly simple.

For interactive data analysis in Python you really should try the IPython Notebook (http://ipython.org/notebook.html). It’s the best interactive analysis environment I’ve every used (in any language). If you really want the command line then terminal IPython automatically handles indentation for you.

I actually started on IPython (last week, when I began learning Python). What I like about Spyder is that I can run a regular console and IPython console simultaneously, so I can hop back and forth between them.

I’m a big fan of your papers and your blog, by the way. I spent a good deal of time on your website looking at your ‘Programming for Biology’ courses (syllabi and reading materials, mostly) when I was getting started with Python and trying to figure out how widespread its use is in ecology.

I’d go with homebrew over MacPorts personally and I’d also go with Continuum’s Anaconda over Enthought’s distribution if looking at binaries.

Pingback: Python Complements R’s Shortcomings | spider's space

Pingback: This is how I did it…learned basic R. | EcoPress

Pingback: R vs Python: Practical Data Analysis (Nonlinear Regression) | Climate Change Ecology

Pingback: This is how I did it…learned R. | EcoPress

Reblogged this on josephdung.