Python Complements R’s Shortcomings

I’m a big fan of open-source software for research. For example, R-statistics, Qgis, and Grass GIS are awesome programs. R can do any statistical tests and numerical modeling you can imagine; if there’s not a built-in function you can write one (the beauty of using a programming language over point-and-click statistical programs). Granted, SPSS and Minitab do allow you to write scripts, but the language is less straightforward than R, and matrix-manipulation is easier in R. Same with Grass GIS. Qgis is an amazing substitute for ArcMap, with a beautiful interface. The other benefit of using a programming language for your research is reproducibility. You can save your code, re-run it to generate the same exact tests and figures so that you never forget what you did. Plus, you can share your code as appendices on your paper so that others can read your code to reproduce your results exactly. There is absolute transparency. This can be incredibly important, as noted in a post by Jeremy Fox over at Dynamic Ecology.

However, as wonderful as R is, there are things R does not do well. Symbolic math is one of them (R CAN do it, but I find the commands clunky). However, the Python programming language (which is a high-level language that can be used to make stand-alone software) has excellent modules (like R packages) for matrix manipulations and symbolic math. Although I’m still new to it, I dare say that Python’s mathematical package is near as complete as MATLAB (at least for my uses). In fact, Python’s numerical packages are widely used in physics, astronomy, and bioinformatics.

Python makes use of three main modules for math: scipy, numpy, and sympy. Scipy and Numpy together make Python a powerful MATLAB-esque program, while Sympy adds capabilities for symbolic math. Matplotlib (and by association Pylab) provide excellent graphing capabilities. Another benefit is that the language is similar to (but not identical to) R, so that transitioning should be easy. R does have a python-language interface package, but it is slow and you still need to learn Python. So why not just run Python directly. Additionally, Spyder provides an excellent RStudio-esque IDE for Python (although I think it’s Mac only). UPDATE: Spyder apparently works on both Windows and Linux. My bad.

I can’t walk you through the installation of Python and its modules (there’s a huge amount of material already available, and if you use a Mac, I highly recommend the MacPorts installation route). But I can show you what it can do. I developed a Python script to model Lotka-Volterra competition equations. Here’s the annotated script with the figure output, just as a demonstration of the capabilities of Python’s symbolic math package Sympy as an excellent complement to R’s highly advanced numerical techniques. Interested folks can look up Intro to Python material to get a handle on the language, although if you know R, it should be easy enough to figure out. Same goes for those of you familiar with MATLAB. The biggest difference is that when you import a module, you need to type the module name (or nickname if you use ‘as’) before the command so that Python knows where to look for the command. (NOTE: I’ve verified all of Python Sympy’s results with MATLAB. Although they give different equations in the output, they’re identical albeit with some algebraic manipulation).

# Import the necessary modules (like R packages)
# LOTKA-VOLTERRA COMPETITION
import sympy as sm
from scipy import integrate
import pylab as py
import numpy as np

# First, define the variables as symbols (like MATLAB)
r1, N1, K1, a12, r2, N2, K2, a21 = sm.symbols('r1, N1, K1, a12, r2, N2, K2, a21')

# Define the Lotka-Volterra differential equation symbolically
sp1Diff = r1*N1*(1 - (N1 + a12*N2)/K1)
sp2Diff = r2*N2*(1 - (N2 + a21*N1)/K2)

# Make the equality, setting the equations equal to zero
sp1Mod = sm.Eq( sp1Diff, 0 )
sp2Mod = sm.Eq( sp2Diff, 0 )

# Solve each equation for the relevant species (equilibrium values)
sp1Eq = sm.solve( sp1Mod, N1 )
sp2Eq = sm.solve( sp2Mod, N2 )

print 'The equilibrium population size for species 1 is\n', sp1Eq
print 'The equilibrium population size for species 2 is\n', sp2Eq

# Above, we've used sympy to symbolically find the equilibrium
# population sizes for both N1 and N2. The code is very straight-
# forward and simple (compare to R's symbolic math abilities).

# Unfortunately, the equilibria depend on the population size of
# the other species. Assume that both species are present.
# Make a system of linear equations using the two equilibria,
# solving for N1 and N2.

sys1 = sm.Eq( K1 - N1 - N2*a12, 0 )
sys2 = sm.Eq( K2 - N2 - N1*a21, 0 )

# Solve this system for N1 and N2, giving the equilibrium
# population size when both species are present

sysSolve = sm.solve( (sys1, sys2), N1, N2 )
print 'The equilibrium population size for each species in the presence of the other is\n', sysSolve

# Next, we use scipy to solve the Lotka-Volterra model numerically
# given a set of chosen parameter values (this can be done in R
# easily, but we're on a roll and it's also very easy in Python.
# Arguably easier, actually.

# Note that arrays in Python start with 0, so that K[0] in Python
# is equivalent to K[1] in R (K[1] = K[2] in R, etc.).
# I prefer R's technique here, but
# I'm sure there's a computer-sciencey reason for using zero-based
# arrays.
def lvComp(N, t, r, K, alpha):
    return(
    r[0] * N[0] * (1 - (N[0]+alpha[0]*N[1])/K[0]) ,
    r[1] * N[1] * (1 - (N[1] + alpha[1]*N[2])/K[1])
    )

# Now, set the parameter values. You can vary these later
r = [0.5, 0.2]
N = [2, 2]  # Initial pop size
K = [100, 50]
alpha = [1.2, 0.2]

# Set the duration of the model
tmax = 100
t = range(0, tmax + 1)
# Range goes from the starting point to n-1, n is the end point

# Now here it gets complicated! Solve the ode
Nt = integrate.odeint( lvComp, N, t, args = (r, K, alpha) )
# Done!

# Get the equilibrium pop sizes by plugging in our chosen parameter
# values into the equilibria calculated above. This is why I
# didn't use R to solve the ODE. I've already calculated the
# symbolic equilibrium, and substituting my parameters in is easy.
sp1EqLine = sysSolve[N1].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True)
sp2EqLine = sysSolve[N2].subs( [ (K1,K[0]), (K2, K[1]), (a12,alpha[0]), (a21, alpha[1]) ], simultaneous=True)

# Make a vector of equilibria
plotequils = (sp1EqLine, sp2EqLine)

# Plot the ode results
py.plot(t, Nt, '--o')
py.xlabel('time (t)')
py.ylabel('N(t)')
py.legend(['Sp1', 'Sp2'], loc='lower right')
py.title('Lotka-Volterra Competition Between Two Species')

for line in plotequils:
    py.axhline(y=line, linewidth=2, color='red', ls='--')

ODEresults

# To print zero-growth isoclines, first isolate the equilibrium
# equation for each species
sp1IsoEq = sp1Eq[1]
sp2IsoEq = sp2Eq[1]

# We're plotting a phase-plane graph with species 2 on the y-axis
# so we need to re-arrange the species 1 equation to predict N2
# based on N1 (it's currently the other way around)
sp1IsoEq = sm.Eq(sp1IsoEq, N1)
sp1IsoEq = sm.solve(sp1IsoEq, N2)

# Get the range of N1 on the x-axis. Depending on the parameter
# values, the max can either be K1, or K2/a21
n1Range = range(0, int( max(K[0], K[1]/alpha[1]) + 5))

# Calculate the zero-growth isocline for species 1
n1Iso = [] #makes an empty list

for n in n1Range:
    a = sp1IsoEq[0].subs([(K1,K[0]),(N1,n1Range[n]),(a12,alpha[0])], simultaneous=True)
    n1Iso.append(a)

# Calculte the zero-growth isocline for species 1
n2Iso = []
for n in n1Range:
    a = sp2IsoEq.subs( [ (K2,K[1]), (N1, n1Range[n]), (a21, alpha[1]) ], simultaneous=True )
    n2Iso.append(a)

# Plot the results, showing the model output overtop
py.clf()
py.plot(n1Range, n1Iso, '--')
py.plot(n1Range, n2Iso, '--')
py.plot(Nt[:,0], Nt[:,1], 'b^', label='Model Output')
py.axis( [0, max(K[0], K[1]/alpha[1])+0.1*max(K[0], K[1]/alpha[1]), 0, max(K[1], K[0]/alpha[0])+0.1*max(K[1], K[0]/alpha[0])] )
py.xlabel('N1')
py.ylabel('N2')
py.legend( ['N1', 'N2', 'Model Output'], loc='upper right' )
# Get text value
py.text(0.5, K[1], 'K2', va='center')
py.text(0.5, K[0]/alpha[0], 'K1/a12', va='center')
py.text(K[0], 0.5, 'K1', ha='center')
py.text(K[1]/alpha[1], 0.5, 'K2/a21', ha='center')

IsoPlot

Normally, we could stop here. But we’ll go one further and plot a vector field (something else python makes easy)

# First, set up a grid of N1 and N2 values (similar to
# expand.grid() in R)
X,Y = np.meshgrid(
    np.arange(0, K[0] + 0.1*K[0], 5), # 0 - K1, every 5th number
    np.arange(0, K[1] + 0.1*K[1], 5)
    )

# Python wants a starting X,Y position (above) and vectors of
# of change along the X and Y axes (U,V). These are what the
# differential equations describe, so we can just use the very
# first equations over every element in X and Y
U = r[0] * X * (1 - (X + alpha[0]*Y)/K[0])
V = r[1] * Y * (1 - (Y + alpha[1]*X)/K[1])

# plot the results
py.clf()
py.plot(Nt[:,0], Nt[:,1], 'go')
Q = py.quiver(X,Y,U,V)
py.plot(sp1EqLine, sp2EqLine, 'b*', ms=15)
py.plot(0, K[1], 'b*', ms=15)
py.plot(K[0], 0, 'b*', ms=15)
py.axis( [-1, K[0] + 0.1*K[0], -1, K[1] + 0.1*K[1] ] )
py.xlabel('N1')
py.ylabel('N2')
py.title('Vector Field of Population Trajectories')

VectorField

I can’t advocate Python over R for statistics, R is too well defined, has too many functions, and has much better model formulation (to name a very few of the long list of reasons R is better for stats). But hopefully I’ve convinced you that Python is a great platform for symbolic math, calculus, and modeling that can complement R. Together, R and Python make the world’s best open-source mathematical software combination. Also, as with R, if you want it, there’s a Python module for it (GIS, GPS tracking, astronomical predictions, physics, etc). Also, Python is a high-level programming languaging, meaning that it’s possible to integrate your math with other Python modules and programs meant for things like information gathering from websites, etc.

(NOTE: Sage exists as an open-source alternative to MATLAB, Maple, and Mathematica combined and is excellent, but it’s largely built on Python and uses many of the same modules. I’m not sure where Sage extends the capabilities of Numpy and Scipy, but definitely check it out. It’s possible that Sage expands on Numpy, Scipy, and Sympy)

17 thoughts on “Python Complements R’s Shortcomings

    • Ahhh… you beat me to it. Back on topic: that’s why R has interfaces to Python, as well as to the OS, Excel, LaTex, etc., and why there’s RinRuby and even a Mathematica interface to R. It’s great to be able to integrate existing tools no matter what language they’re in.

  1. Spyder works on windows too.

    I also recommend using pandas (dataframe, plyer, reshape,… replacement -> easier than using numpy directly), statsmodels+patsy (regressions,… and now using formula syntax with patsy) and IPython (mainly the notebook -> running python code in a browser and rendering plots/output inline and sharing it online similar to rpubs -> http://nbviewer.ipython.org/).

    Pandas also recently gained a “rplot” module which seems to emulate ggplot.

    • Didn’t know about rplot in pandas… will have to check that out :) I love ggplot2 in R. If this sort of functionality comes to Python it may just sway me over to Python completely.

  2. Nice post. FWIW, R can do symbolic math using the yacas package, although probably not as slick.

    I use python for writing scripts, but I cannot get myself to use python for interactive data analysis on the “command line”. My main problem, as silly as it may sound, is indentation. I like testing pieces of code inside functions, loops, etc. If anyone has any pointers, I’d be happy to hear them.

    • Spyder allows this, actually. I can write a script in the editor (as in RStudio) which as a function in it. I can select the indented code within the function and run that on its own just fine. And the yacas package is what I was referring to when I said R can do symbolic math, but Python is way easier. The code in python for symbolic math is astoundingly simple.

      • I actually started on IPython (last week, when I began learning Python). What I like about Spyder is that I can run a regular console and IPython console simultaneously, so I can hop back and forth between them.

        I’m a big fan of your papers and your blog, by the way. I spent a good deal of time on your website looking at your ‘Programming for Biology’ courses (syllabi and reading materials, mostly) when I was getting started with Python and trying to figure out how widespread its use is in ecology.

  3. I’d go with homebrew over MacPorts personally and I’d also go with Continuum’s Anaconda over Enthought’s distribution if looking at binaries.

  4. Pingback: Python Complements R’s Shortcomings | spider's space

  5. Pingback: This is how I did it…learned basic R. | EcoPress

  6. Pingback: R vs Python: Practical Data Analysis (Nonlinear Regression) | Climate Change Ecology

  7. Pingback: This is how I did it…learned R. | EcoPress

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s