Ecological Dynamic Programming Optimization in Python

It’s been awhile since I’ve posted anything. I’ll use the excuse that I’ve been busy, but mostly I just forget. Regardless, I’m learning how to do Dynamic Programming Optimization (DPO), which sounds more complex than it is. The reason for this is that DPO allows us to simulate the behavior of individuals who make decisions based on current patch quality.  DPO is an exciting tool that forms the foundation of individual-based models, which allow us to assess community and ecosystem dynamics as emergent properties of individual decisions based on well-grounded, physiological principles (hence my interest).The underlying idea, as I understand it, is that individuals assess their future reproductive output prior to making a decision (I’m interested in optimal foraging, so I’ll put this in the context of foraging). We can model how individuals make decisions over the course of a lifetime, and track each individual, which then allows us to make quantitative statements about populations, communities, or ecosystems.

This all sounds complicated, and it can be difficult. So it’s easiest to jump right in with an example. Here’s the code, along with a basic explanation of what’s going on, for the very first toy example found in “Dynamic State Variable Models in Ecology” by Clark and Mangel. In the book, they give the computer code in TRUEBASIC, but… in all honesty.. no one I know uses that. I could use R, but we all know my feelings on that. So here’s their toy model programmed and annotated in Python.

Background

Suppose we’re interested in the behavior of fish and how they choose to distribute themselves among three different patches. Two patches are foraging patches and one is a reproduction patch. The first thing we need to do is make an objective fitness function F. In this example, F relates body size to reproductive output. It can be thought of as what the organism believes about its final fitness given a particular body size. Let’s make it increasing, but asymptotic, with body mass:

$F = \frac{A(x-x_c)}{x-x_c+x_0}$

Here, A is sort of a saturation constant beyond which fitness increases minimally with body size, $x_c$ is the body size at which mortality occurs (0), and $x_0$ is the initial body size. Lets set $x_c=0$, $A=60$ and $x_0=0.25*x_{max}$, where $x_{max}=30$ is the individuals maximum possible body size. You have to set the ceiling on body size otherwise organisms grow without bounds.

OK so that’s what the organisms “believes” about its fitness at the end of a season given a specific body mass. The question is: How does the organism forage optimally to maximize fitness at the end of the its lifetime?

The obvious answer would be to simulate a foraging decision at every time step moving forward, and then have it decide again at the new time step, etc. This is computationally expensive, so to circumvent this we work backwards from the end of the season. This saves time because there are far fewer paths to a known destination than there are to an unknown one (essentially).

So we imagine that, at the final time step $t_f$, the organism’s fitness is given by F for any given body mass.

import numpy as np
import matplotlib.pyplot as pet
import pandas as pd

n_bodysize = 31 # 31 body sizes, from 0-30
n_time = 20 # 20 time steps, 1-20
n_patch = 3, # 3 foraging patches

# make a container for body size (0-30) and time (1-20)
F = np.zeros((n_bodysize, n_time))

# make the function F
x_crit = 0
x_max = 30
acap = 60
x_0 = 0.25*x_max
t_max = 20

# calculate organism fitness at the final time point
for x in range(x_crit, x_max+1):
F[x,-1] = acap*(x-x_crit)/(x-x_crit+x_0)
[/code]

Now that we know fitness at the final time point, we can iterate through backwards (called backwards simulation) to decide what the optimal strategy is to achieve each body mass. To determine that, we need to know the fitness in each patch. Let’s start with the two foraging patches, Patch 1 and Patch 2. We need to know four things about each patch: (1) the mortality rate in each patch (m), (2) the probability of finding food in each patch (p), (3) the metabolic cost of visiting a patch (a), and (4) the gain in body mass if food is successfully found (y). For these two patches, let:

Patch 1: m=0.01, p=0.2, a=1, y=2

Patch 2: m=0.2, p=0.5, a=1, y=4

Right away we can see that Patch 2 is high-risk, high reward compared to Patch 1. In each patch, we calculate the next body size given that an animal does (x‘) or does not (x”) find food:

$x' = x-a_i+y_i$

$x'' = x-a_i$

Those are simple equations. Body size is the current body size minus the metabolic cost of foraging in the patch and, if successful, the energy gain from foraging. Great. Now we can calculate the expected fitness of each patch as the weighted average of F‘ and F” given the probability of finding food, all times the probability of actually surviving. For these two patches, we make a fitness function (V):

$V_i = (1 - m_i)*[p_i*F_{t+1}(x') + (1-p_i)*F'_{t+1}(x'')]$

The reproductive patch is different. There is no foraging that occurs in the reproductive patch. Instead, if the organism is above a critical mass $x_{rep}$, then it devotes all excess energy to reproduction to a limit  (c=4). If the organism is below the reproductive threshold and still visits the foraging patch, it simply loses mass (unless it dies).

OK this is all kind of complicated, so let’s step through it. We know what fitness is at the final time step because of F. So let’s step back one time step. At this penultimate time step, we go through every body mass and calculate fitness for each patch. Let’s do an example. If x=15, then we need to know fitness in Patch 1, Patch 2, and Patch 3. For Patches 1 and 2, we need to know the weight gain if successful and the weight gain if unsuccessful.

$x' = max(15-a_1+y_1, x_{max})$

$x' = 15-1+2 = 16$

$x''=min(15-a_1, x_{c})$

$x'' = 15-1 = 14$

The min and max functions here just make sure our organism doesn’t grow without limit and dies if metabolic cost exceeds body mass. So these are now the two potential outcomes of foraging in Patch 1 given x=15. The expected fitness of these two body masses is given as F(16) and F(14). Plug all these values into equation V for Patch 1 to get the expected fitness of Patch 1 at a body size of 15. We then take the maximum for all three Patches, save whichever Patch corresponds to that as the optimal foraging decision, and then save as the fitness for body size 15 at that time step. So at body size 15, for Patch 1 is  39, for Patch 2 is 38, and for Patch 3 is 37.6, so the individual will foraging in Patch 1 and now fitness for body size 15 at this time step is 39.

Repeat this procedure for every possible body size, and you’ll get the fitness for every body size at the second to last time step as well as the optimal foraging patch for every body size at that time.

Then, step backwards in time. Repeat this whole procedure, except now the value for each body mass doesn’t come from the equation F, but comes from the fitness we just calculated for each body mass. So for example, if an organism’s foraging decisions at this time step lead it to a body mass of 15, then F is now 39. Again, repeat this for every body mass, and then step back, etc.

Here’s the full Python code for how this is done:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

n_bodysize = 31 # 31 body sizes, 0 - 30
n_time = 20 # 20 time steps, 1 - 20
n_patch = 3 # number of patches

#%% CONATINERS
# make a container for body size (0-30) and time (1-20)
F = np.zeros((n_bodysize, n_time))
# make a container for three patches, each with body size (0-30) and time (1-20)
Vi = np.zeros((n_patch, n_bodysize, n_time))
# make a container for the optimal decision
d = np.zeros((n_bodysize, n_time))

# make a container for the mortality rates in each patch
m = np.zeros(n_patch)
# make a container for the probability of finding food in each path
p = np.zeros(n_patch)
# make a container for the metabolic cost
a = np.zeros(n_patch)
# make a container for food gain in each patch
y = np.zeros(n_patch)
# make a container for reproductive output in each patch
c = np.zeros(n_patch)

#%% CONDITIONS
x_crit = 0
x_max = 30
t_max = 20
x_rep = 4

#%% PARAMETERS
m[0] = 0.01; m[1] = 0.05; m[2] = 0.02
p[0] = 0.2; p[1] = 0.5; p[2] = 0
a[0] = 1; a[1] = 1; a[2] = 1
y[0] = 2; y[1] = 4; y[2] = 0
c[0] = 0; c[1] = 0; c[2] = 4

#%% END CONDITION
acap = 60
x_0 = 0.25*x_max

# Calculate Fitness for every body mass at the final time step
for x in range(x_crit, x_max+1):
F[x,-1] = acap*(x-x_crit)/(x-x_crit+x_0) # maximum reproductive output for each body size at the final time

#%% SOLVER
for t in range(0, t_max-1)[::-1]: # for every time step, working backward in time #print t
for x in range(x_crit+1, x_max+1): # iterate over every body size # print x
for patch in range(0, 3): # for every patch
if patch in [0,1]: # if in a foraging patch
xp = x-a[patch]+y[patch] # the updated body size
xp = min(xp, x_max) # constraint on max size
xpp = int(x-a[patch]) # updated body size if no food
Vi[patch,x,t] = (1 - m[patch])*(p[patch]*F[int(xp),t+1] + (1-p[patch])*F[xpp, t+1]) # Calculate expected fitness for foraging in that patch
else:
if x < x_rep: # in a reproduction patch
xp = max(x-a[patch], x_crit)
Vi[patch, x, t] = (1-m[patch])*F[int(xp), t+1]
else:
fitness_increment = min(x-x_rep, c[patch]) # resources devoted to reproduction
xp = max(x-a[patch]-fitness_increment, x_crit) # new body size is body size minus metabolism minus reproduction
Vi[patch, x, t] = fitness_increment + (1-m[patch])*F[int(xp),t+1]
vmax = max(Vi[0,x,t], Vi[1,x,t])
vmax = max(vmax, Vi[2,x,t])
if vmax==Vi[0,x,t]:
d[x,t] = 1
elif vmax==Vi[1,x,t]:
d[x,t] = 2
elif vmax==Vi[2,x,t]:
d[x,t] = 3
F[x,t] = max # the expected fitness at this time step for this body mass
[/code]

This model doesn’t really track individual behavior. What it does is provides an optimal decision for every time and every body mass. So we know what, say, an individual should do if it finds itself small towards the end of its life, or if it finds itself large at the beginning:

T,X = np.meshgrid(range(1, t_max+1), range(x_crit, x_max+1))
df = pd.DataFrame({'X': X.ravel(), 'T': T.ravel(), 'd': d.ravel()})
df = df.pivot('X', 'T', 'd')
sns.heatmap(df, vmin=1, vmax=3, annot=True, cbar_kws={'ticks': [1,2,3]})
plt.gca().invert_yaxis()
plt.show()
[/code]

And that’s that!! To be honest, I still haven’t wrapped my head around this fully. I wrote this blog post in part to make myself think harder about what was going on, rather than just regurgitating code from the book.