# Evaluating Optimization Algorithms in MATLAB, Python, and R

As those of you who read my last post know, I’m at the NIMBioS-CAMBAM workshop on linking mathematical models to biological data here at UT Knoxville. Day 1 (today) was on parameter estimation and model identifiability. Specifically, we (quickly) covered maximum likelihood estimation and how to program our own estimation procedures in MATLAB.

If you’ve read this blog in the past, you know I’m an open-source kind of person. Naturally, I re-programmed all of the MATLAB examples in R. When I did, I noticed something funny: the optimization procedures in MATLAB gave different estimates than those in R. I asked a post-doc there, who seemed equally stumped but did mention that R’s optimization procedures are a little funky. So I took the next logical step and programmed the ML optimization routine in Python, using SciPy and NumPy, just to double-check.

The model is an SIR epidemiological model that predicts the numbers of Susceptible, Infected, and Recovered people with, in this case, cholera. It relies on four parameters: Bi, Bw, e, and k. I won’t give the model here; you’ll see the equations in the code below. When optimizing, I made sure that MATLAB, R, and Python all used Nelder-Mead algorithms and, when possible, equivalent ODE solvers (ode45 in MATLAB and R).
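As an aside, SciPy’s `solve_ivp` with `method='RK45'` uses the same Dormand–Prince pair as MATLAB’s ode45, so it is arguably the closest Python match (the `odeint` call in my code below uses LSODA instead). Here’s a minimal sketch of the model with that solver, using the initial parameter guesses from the code below; the function and variable names here are mine, not from the workshop code:

```python
# Sketch: the SIRW system integrated with SciPy's RK45, the same
# Dormand-Prince pair as MATLAB's ode45. Parameters are the initial
# guesses used later in this post.
import numpy as np
from scipy.integrate import solve_ivp

Bi, Bw, e, k = 0.75, 0.75, 0.01, 1/89193.18

def sirw(t, y):
    S, I, W, R = y
    return [-Bi*S*I - Bw*S*W,
            Bi*S*I + Bw*S*W - 0.25*I,
            e*(I - W),
            0.25*I]

I0 = 113*k              # first data point, scaled by k
y0 = [1 - I0, I0, 0.0, 0.0]

sol = solve_ivp(sirw, (0, 161), y0, method='RK45',
                t_eval=np.arange(0, 162, 7))
print(sol.y.shape)      # rows are S, I, W, R; columns are time points
```

Note that the S, I, and R derivatives sum to zero, so S + I + R stays constant along the solution, which is a handy sanity check on the integrator.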

I won’t post the MATLAB code here, because I didn’t write it and it spans multiple files, but I’ve gone over it line by line to make sure it’s identical to my R and Python code. It is.
MATLAB outputs estimates for Bi (0.2896), Bw (1.0629), e (0.0066) and k (0.0001). If you want the MATLAB files, or the data, I’ll send them to you.

In R, first import the data (see the Python code below for the actual data). Then, run the ode solver and optimization code:

```
## Main code for the SIR model example
library(deSolve) # Load the library used to solve the ODEs

## Set initial parameter values
Bi <- 0.75
Bw <- 0.75
e <- 0.01
k <- 1/89193.18

## Combine parameters into a vector
params <- c(Bi, Bw, e, k)
names(params) <- c('Bi', 'Bw', 'e', 'k')

# Make a function for running the ODE
SIRode <- function(t, x, params){
  S <- x[1]
  I <- x[2]
  W <- x[3]
  R <- x[4]

  Bi <- params[1]
  Bw <- params[2]
  e <- params[3]
  k <- params[4]

  dS <- -Bi*S*I - Bw*S*W
  dI <- Bi*S*I + Bw*S*W - 0.25*I
  dW <- e*(I - W)
  dR <- 0.25*I

  output <- c(dS, dI, dW, dR)
  list(output)
}

# Set initial conditions ('data' and 'tspan' come from a separate data file; see the Python code below for the values)
I0 <- data[1]*k
R0 <- 0
S0 <- (1-I0)
W0 <- 0

initCond <- c(S0, I0, W0, R0)

# Simulate the model using our initial parameter guesses
initSim <- ode(initCond, tspan, SIRode, params, method='ode45')

plot(tspan, initSim[,3]/k, type='l')
points(tspan, data)

# Make a function for optimization of the parameters
LLode <- function(params){

  k <- params[4]

  I0 <- data[1]*k
  R0 <- 0
  S0 <- 1 - I0
  W0 <- 0

  initCond <- c(S0, I0, W0, R0)

  # Run the ODE
  odeOut <- ode(initCond, tspan, SIRode, params, method='ode45')

  # Measurement variable
  y <- odeOut[,3]/k

  diff <- (y - data)
  LL <- t(diff) %*% diff

  return(LL)
}

# Run the optimization procedure (Nelder-Mead is optim's default method)
MLresults <- optim(params, LLode, method='Nelder-Mead')

## Resimulate the ODE using the estimated parameters
estParms <- MLresults$par
estParms
MLresults$value

I0est <- data[1]*estParms[4]
S0est <- 1 - I0est
R0 <- 0
W0 <- 0
initCond <- c(S0est, I0est, W0, R0)

odeEstOut <- ode(initCond, tspan, SIRode, estParms, method='ode45')
estY <- odeEstOut[,3]/estParms[4]

plot(tspan, data, pch=16, xlab='Time', ylab='Number Infected')
lines(tspan, estY)
```

Running this code gives me estimates of Bi (0.30), Bw (1.05), e (0.0057), and k (0.0001).

I rounded the values, but the unrounded values are even closer to MATLAB. So far, I’m fairly satisfied.

We can run the same procedure in Python:

```
import numpy as np
from scipy.optimize import minimize
from scipy import integrate
import pylab as py

tspan = np.array([0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112, 119, 126, 133, 140, 147, 154, 161])
data = np.array([ 113, 60, 70, 140, 385, 2900, 4600, 5400, 5300, 6350, 5350, 4400, 3570, 2300, 1900, 2200, 1700, 1170, 830, 750, 770, 520, 550, 380 ])

## Define the ODE equations (state N = [S, I, W, R])
def SIRode(N, t, Bi, Bw, e, k):
    return(
        -Bi*N[0]*N[1] - Bw*N[0]*N[2],
        +Bi*N[0]*N[1] + Bw*N[0]*N[2] - 0.25*N[1],
        +e*(N[1] - N[2]),
        +0.25*N[1]
    )

# Set parameter values
Bi = 0.75
Bw = 0.75
e = 0.01
k = 1/89193.18

# Set initial conditions (S, I, W, R)
I0 = data[0]*k
S0 = 1 - I0
N = [S0, I0, 0, 0]

# Run the ode
Nt = integrate.odeint(SIRode, N, tspan, args=(Bi, Bw, e, k))

# Get the second column of data corresponding to I
INt = [row[1] for row in Nt]
INt = np.divide(INt, k)

py.clf()
py.plot(tspan, data, 'o')
py.plot(tspan, INt)
py.show()

def LLode(x):
    Bi = x[0]
    Bw = x[1]
    e = x[2]
    k = x[3]

    I0 = data[0]*k
    S0 = 1 - I0
    N0 = [S0, I0, 0, 0]

    Nt = integrate.odeint(SIRode, N0, tspan, args=(Bi, Bw, e, k))

    INt = [row[1] for row in Nt]
    INt = np.divide(INt, k)

    difference = data - INt

    LL = np.dot(difference, difference)

    return LL

x0 = [Bi, Bw, e, k]

# Run the optimizer
results = minimize(LLode, x0, method='Nelder-Mead')

print(results.x)

estParams = results.x

Bi = estParams[0]
Bw = estParams[1]
e = estParams[2]
k = estParams[3]

I0 = data[0]*k
S0 = 1 - I0
N = [S0, I0, 0, 0]

Nt = integrate.odeint(SIRode, N, tspan, args=(Bi, Bw, e, k))

INt = [row[1] for row in Nt]
INt = np.divide(INt, k)

py.clf()
py.plot(tspan, data, 'o')
py.plot(tspan, INt)
py.show()
```

Python gave me estimates of Bi (0.297), Bw (1.106), e (0.0057), and k (0.00001).

Overall, I have to say I’m pretty satisfied with the performance of both R and Python. I also didn’t find programming these sorts of optimization procedures in R or Python to be any more difficult than in MATLAB (discounting the fact that I’m not terribly familiar with MATLAB; then again, I’m only somewhat familiar with Python, so they’re roughly equivalent here). I initially wrote this post because I could not get R to sync up with MATLAB, so I tried it in Python and got the same results as in R. It then turned out that something was wrong with the MATLAB code; once that was fixed, all three matched up pretty well.

I’ll say it again, R + Python = awesome.

UPDATE:

I’ve had a couple of requests for time comparisons between MATLAB, R, and Python. I don’t have timings for MATLAB, since I don’t know MATLAB well.

For Python, if I wrap the minimize function in:

```
import time

t0 = time.time()
minimize(...)  # run the optimizer
t1 = time.time()
print(t1 - t0)
```

I get 3.17 seconds.

In R, if I use system.time() to time the optim() function, I get about 39 seconds. That pretty much matches my feeling that R is just laboriously slow compared with how quickly Python evaluates the function.
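For anyone repeating this, a slightly more careful way to time the optimizer is `time.perf_counter`, which is meant for interval timing, plus a few repeats to smooth out noise. A sketch on a toy quadratic (not the cholera model; the objective here is just for illustration):

```python
# Sketch: timing an optimizer with perf_counter, repeating a few runs
# and keeping the fastest one as the least noisy estimate.
import time
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Simple quadratic with its minimum at [3, 3, 3, 3]
    return np.sum((x - 3.0)**2)

times = []
for _ in range(5):
    t0 = time.perf_counter()
    res = minimize(objective, np.zeros(4), method='Nelder-Mead')
    times.append(time.perf_counter() - t0)

print(min(times), res.x)
```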

## 16 thoughts on “Evaluating Optimization Algorithms in MATLAB, Python, and R”

1. Could you give us optimization time in R and Python?
By the way: some estimates differ substantially.
R: Bi (0.30), Bw (1.5), e (0.0057), and k (0.0001).
P: Bi (0.297), Bw (1.106), e (0.0057), and k (0.00001).

• That’s a typo. R actually gives Bw(1.05), which is much closer to 1.106. Also, if I’m remembering correctly, the model has identifiability issues, so the parameters may not be estimated accurately (that was part of the exercise).

I don’t have precise optimization times, but I can tell you that MATLAB is by far the fastest, followed closely by Python, while R was much slower than either. Of course, MATLAB uses parallel processing (I think), taking advantage of all four cores on my processor, while R does not. I’m not sure about Python. When doing heavy computations, I use the ‘snowfall’ package in R for multi-core processing, which speeds up the computation substantially. So base MATLAB to base R: advantage MATLAB. Base MATLAB to multi-core R is unknown; I haven’t tried that.

2. In your R code, I did not see where the data was entered (loaded), and tspan is not defined.

• I cut out the line where I load in a separate data file that defines the data and tspan, because I have a separate R script that I source in with that information. You can get that info from the Python code and adapt it for R.

• See my response above, but I don’t have actual computation times. Eyeballing it, it seems that MATLAB ~ Python << R, but again R doesn't take advantage of multi-core processing like MATLAB and, possibly, Python. R was brutally slow compared to Python and MATLAB. Python might be faster than MATLAB… but I'm not good at MATLAB so I don't know how to get computational times (or in Python, for that matter).

• Only slightly. I prefer R for statistics and analyses.

3. Nelder-Mead is one of the slowest methods of optim. Why didn’t you use BFGS? In my own experience BFGS is always faster and more accurate than Nelder-Mead.

• I did that to standardize across MATLAB, R, and Python. Like I said, I didn’t write the MATLAB code and I’m not overly familiar with it. I know that the optimization function used in MATLAB used Nelder-Mead, so I used that method in R and Python for equivalence.
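For anyone curious, SciPy makes the comparison easy to try; here’s a quick sketch on the standard Rosenbrock test function (not the cholera model):

```python
# Sketch: comparing Nelder-Mead and BFGS in SciPy on the Rosenbrock
# function, whose minimum is at [1, 1]. BFGS estimates gradients by
# finite differences when none are supplied.
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])

nm = minimize(rosen, x0, method='Nelder-Mead')
bfgs = minimize(rosen, x0, method='BFGS')

print('Nelder-Mead:', nm.x, nm.nfev, 'function evaluations')
print('BFGS:       ', bfgs.x, bfgs.nfev, 'function evaluations')
```

Both should land on the same minimum here; the interesting part is comparing the function-evaluation counts for your own problem.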

4. Hi Nathan,

Great work, thank you for posting it.

I would like to know which equations of the SIR model you used; I couldn’t find anything similar to your code. What exactly are the variables e and k? From what I’ve read, the equations of the SIR model include just two parameters (often called beta and gamma). Are you using those variables to estimate S_0?

Also, I’m thinking of using a part of your python code in my university software project. It goes without saying that we are going to mention you as the author of this code. Would that be ok with you?

Thanks,
Yena

• Hi Yena,

Of course you can use the code! I’m glad it could be useful.

As far as your other questions, the reason that you couldn’t find anything similar to this is that this model is not a traditional SIR model, but is actually an SIRW model that includes a compartment for residence of the disease in a local water body.

k represents the proportion of the population that is infected (usually small), and so k is used to estimate S0 (the number of initially susceptible individuals). e governs the residence time of cholera in the water body, and so determines both a) how much cholera is transferred from infected individuals to the water body and b) how much cholera leaves the water body at any given time step.

Eisenberg et al. (2013) Examining rainfall and cholera dynamics in Haiti using statistical and dynamic modeling approaches. Epidemics 5: 197-207

gives a good description of the model. This was the model I presented in the code above, so Eisenberg et al. should also be acknowledged.

Hope this helps!

• Thank you very much for the answer.

We have experimented with your code and applied it to other datasets. It turned out that the fitting can vary significantly depending on how we choose the initial parameters. Could you tell us why you chose 0.75 for beta and gamma? This shouldn’t depend on the dataset, right?
We suspect the reason is that we are finding a local minimum instead of the global minimum. As we read on Wikipedia, the Nelder-Mead method finds a local minimum. Is there a reason why you chose this method?

Cheers,
Yena

• Well, to be honest, I followed the guidelines for the workshop. I’m not sure how Marisa chose 0.75 for beta and gamma, but you should definitely try more than one starting point. And yes, it sounds like you’re finding local minima. It’s not just Nelder-Mead that finds local minima; all optimization algorithms have the potential to get stuck in them. Nelder-Mead is the default in several software packages because it performs well, is computationally efficient, and does a fairly good job of avoiding local minima. You can try other algorithms (BFGS comes to mind), but these require the calculation of derivatives and can be a bit more computationally expensive.

I suppose one way to check which parameters are best is to actually look at the likelihood. You might be getting stuck in local minima, but the values that return the greatest likelihood (or log-likelihood) are your best guess.
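Here’s a minimal sketch of that multistart idea in Python, on a toy one-dimensional function rather than the cholera model (the function and values are just for illustration):

```python
# Sketch: restart Nelder-Mead from several random starting points and
# keep the fit with the lowest objective value.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy function with a local minimum near x ~ 2.1 and the global
    # minimum near x ~ -2.35 (approximately).
    return 0.1*x[0]**4 - x[0]**2 + 0.5*x[0]

rng = np.random.default_rng(0)
fits = [minimize(objective, rng.uniform(-4, 4, size=1),
                 method='Nelder-Mead')
        for _ in range(10)]

best = min(fits, key=lambda r: r.fun)
print(best.x, best.fun)
```

With sum-of-squares fitting like in the post, the smallest objective value corresponds to the greatest (Gaussian) likelihood, so keeping the fit with the lowest returned value is the same as keeping the most likely one.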