Time for Miami to Revamp its Public Transportation System

Lackluster Transportation in the Magic City

Miami-Dade suffers from some of the worst driving conditions in the United States. Numerous bottlenecks along I-95, the only traffic corridor connecting metropolitan South Florida, bring traffic to a standstill at nearly all hours of the day. Traffic analysts at INRIX consistently rank Miami among the top 15 most congested cities in the U.S. Congestion is only part of the issue, however. The Florida stretch of I-95 incurs more fatalities per year than any other interstate segment in the country. Many of these fatalities occur in Miami-Dade, which saw 242 traffic-related deaths in 2011, more than any other Florida county. Local troopers blame driver distraction (e.g., texting) coupled with aggressive driving. Meanwhile, the Florida legislature will not prohibit texting while driving in any meaningful way; the recent anti-texting law treats texting as a ‘secondary offense’, punishable only if the driver has already been pulled over for another offense. Given these factors, being off the road in Miami is far better, and safer, than being on it.


A typical day on the Miami freeway

Unfortunately, Miami provides one of the worst public transportation systems of any major metropolitan area in the U.S. Even some cities in South America, like Bogotá, Colombia, have more comprehensive public transportation (and bike lanes). Numerous Miami-Dade Transit (MDT) websites extol the virtues of public transportation in the greater Miami area. To the uninformed, these websites can be deceptive.

The MDT bus system is chaotic and woefully inefficient. A trip that takes 20 minutes by car requires over two hours, at best, in the MDT bus system. One prime example is the route between Coconut Grove and FIU’s main campus on 8th St. In part, this is because there are bus stops every other block. Contrast this with efficient busing in Houston, Washington, or even Bogotá, where pick-up locations are further apart and dedicated bus lanes are included along many heavily trafficked routes. In those cities, transfers among routes operate efficiently, providing timely access to nearly anywhere in the city. Despite these inefficiencies, last year the federal government donated $10 million to upgrade MDT’s bus fleet with hybrid buses. With brand new buses, MDT should (re)organize routes, making buses a viable transport option for the numerous taxpayers in need of car-free alternatives.

Metro Light

Stunning in both its breadth and complexity

Miami also has an extremely underdeveloped rail system. The MetroRail was completed in 1984, costing over $1 billion (the Federal Transit Authority (FTA) footed roughly 80% of the bill). For that sum, Miami received one rail line running from South Miami, through downtown, and then west to Hialeah and Palmetto, with no stop at Miami International Airport. Whether an oversight or deliberately poor planning, it took 25 years to rectify this mistake. Although there are now two lines, Orange and Green, there is still but one track. There are no stops in South Beach, midtown, or north along the US 1 corridor for residents of North Miami, Aventura, and Hollywood. This accounts, in part, for the horrid congestion on I-95. Rail extensions were planned, but shelved when the FTA withdrew financial support due to concerns over MDT’s cost estimates as well as suspicion of government corruption.

Yet, MDT proclaims that 15% of Miami’s population (~70,000 passengers) pass through MetroRail turnstiles every weekday. This misleading percentage only holds when population estimates are restricted to the depopulated City of Miami metro area. This excludes Hialeah, Opa-Locka, Coral Gables, Doral, Miami Beach, North Miami, North Miami Beach, Kendall, South Miami, etc. Residents of all of these regions commute daily through the Miami metro area. Adding in the populations of these cities reduces the number to ~7%. National context is also important. Although 70,000 passengers per day sounds impressive, this number pales in comparison to daily light rail ridership in other major U.S. cities: New York City (5.4 million), Washington D.C. and Chicago (790,000), San Francisco (375,000). Comparatively, the Miami MetroRail is sadly underused.


By comparison…. I give you Washington DC

Can Miami Revamp its Public Transportation System?

Yes. City officials are reexamining a possible ‘Bay Link’ rail line to South Beach. The Bay Link had been considered in the early 2000s, but the issue died in 2004. This line would connect tourists with South Beach directly from the airport and, if coupled with rail extensions to other areas of metropolitan South Florida, remove numerous drunk drivers from the freeways. In addition, public support for rail lines is swelling. Wynwood, a burgeoning neighborhood, created an imaginary ‘Purple Line’ stop to gather support for MetroRail expansion.

Can Miami, and MDT, make it work? Possibly. Voters approved a half-penny tax increase a decade ago to fund MDT expansion. These funds paid for the new Orange line stop at the airport. Unfortunately, progress has been slow and taxpayers are losing faith in city officials to address these problems. The FTA withdrew funding for the MetroRail amid speculation of corruption and misappropriations. A continuing scandal, in which the FTA recently ruled that county officials illegally handled a contract for new rail cars, will not assuage the FTA’s concerns and bring back federal funding. One viable option is a Public-Private Partnership (PPP), wherein private investors assume much of the cost, and risk, of infrastructure improvements. PPP programs are taking off around the U.S., including South Florida. The current renovation of the 826-836 exchange is one example, as is the new Port of Miami Tunnel. The question is whether MDT can find investors willing to gamble on a greater Miami MetroRail in a city notorious for public works projects running far over budget and way behind schedule. Then again, perhaps the risks associated with private investment are the remedy needed to keep construction costs low and work on time. Can Miami make a rail line work?

I certainly hope so, for everyone’s sake.

Side note: I submitted a short version of this as an opinion letter to the Miami Herald, but I’ve not heard back.

Faking Injuries Isn’t Just for Europeans Anymore

I’m a soccer player. Always have been, and at some level, always will be. I also enjoy watching American football (despite the fact that both my fantasy teams crumbled to dust this weekend. Damn you and your 65 points, Peyton Manning!). That said, it’s impossible for me to count the number of times that I’ve been told that soccer is for [wusses], what with all that faking and rolling around on the ground.

They’re not exactly wrong. Soccer is full of faked injuries at all levels, not just professional. I even played against a team in high school that did it consistently. In fact, one of the most infamous, not-so-subtle demonstrations was by Brazil in the Women’s World Cup (do watch the video, it really is awful). For the record, players usually don’t fake injuries because they’re frail and easily hurt. Usually, it’s to gain a tactical advantage. In the Brazil case, it was to burn time off the clock to secure a win in overtime (it didn’t work, in what is arguably one of the greatest comebacks in sports history, certainly for the U.S.). Other times it’s to draw a foul, or attempt to get another player carded (carded players need to tone down aggression or risk being ejected with a second card). Although these tactics aren’t exactly sportsmanlike, tactical fouls have a long history in sports, and faking injuries/fouls is no exception (think soccer, or basketball players trying to draw a charging foul). However, no sport has incurred the ridicule associated with fake injuries like soccer.

Well guess what? Turns out, the NFL likes to fake injuries too! Future Hall-of-Famer (hopefully) Brian Urlacher (a man among men of linebackers) has admitted it. I remember playing football in high school. As a kicker, I was told to dive onto the ground anytime an opposing player came near me on a field goal to try to draw a penalty. The coaches got pretty mad if I got touched and stayed on my feet. Today, Jerry Jones called out the New York Giants for it in Sunday’s game. The Giants lost two players to injuries on back-to-back plays on a drive where the Cowboys were trying to up the tempo of the game. Is Jerry Jones right? Who knows. Football is a brutal sport. It’s likely that the injuries are real (your first thought on seeing any injury should be: ‘I hope he’s OK’. It should not be whatever Kansas City fans do). But they also might not be. Or, they could be some combination of a minor injury that gets addressed at an opportune moment. I’m not making any judgment on whether the injuries mentioned by Jones are real, I’m merely commenting on the trend in general.

Personally, I’d like to see some sort of formal analysis of when injuries occur. It wouldn’t surprise me to see that the number of minor injuries (assessed by the player returning to practice or the game in the next week) increases substantially in the final minutes of the game, when teams are running hurry-up offenses to put points on the board. I’d do this, but I don’t have the time right now.

I’m not mocking football (ok, I am slightly), I still really enjoy watching it. I just like irony.

What Does Open-Access Data Really Mean?

I’ve recently been digging through the internet looking for data on plant traits (things like seed mass, SLA, etc.). I thought this would be relatively simple given the recent push towards open-access data repositories and the arrival of ‘big data ecology’. Data repositories like Dryad or The Knowledge Network for Biocomplexity (KNB) aim to make finding raw data easy. Other websites, like TraitNet, LEDA, or TRY, aim to compile all of the existing data on a particular subject (like plant traits) into a single location, vet the data, standardize collection practices, and provide the data for use. Other institutions, like NCEAS, appear to require that the data be made publicly available, and have their own public, searchable repository.

My question, after trying unsuccessfully for several days to actually get any of this data, is: what does open-access data really mean? Personally, I pictured a searchable database that would pull up data related to your search terms, provide metadata to determine if the data is useful, and then you click download and *poof*, data on your hard drive. This turned out to be pretty rare.

Some of those databases contain no data; they simply provide links to other databases. Data from Dryad or KNB is spotty: sometimes you can download it, sometimes it’s listed but not publicly available, sometimes it takes you to another website which asks for a login ID and requires author permission to use. That last bit is common: data is posted, you are then redirected to a secondary website, requiring a login to access, possibly requiring author permission to use (which requires emailing the author and getting a response), and then, sometimes, requiring uploading data of your own (i.e., TRY), which is hard for people concentrating on meta-analyses.

I guess my complaint is just that, when I saw open-access databases, I imagined searching, browsing, and downloading. It really seems like we’re not quite there yet; there are still dozens of hoops to jump through that make actually getting your hands on the data difficult. One could argue that the author of the data should always have a final say as to whether the data is downloaded. In effect, they do. By uploading the data, the authors are implicitly acknowledging that they have gotten their publications from the data, and while they might still make use of it, it’s available for anyone else. If the data is still being worked on, the authors can always choose not to upload it.

I like the premise of open-access data a lot (it’s why I make all my data and code available on my website), but I don’t think we’ve quite gotten the spirit of it yet.

R for Ecologists: RLQ analysis (semi) explained

I’ve been reading about RLQ analysis, also known as the fourth corner method, for analyzing relationships between environmental characteristics and species traits. I was interested because I thought I might be using RLQ analysis to answer a specific set of questions (I’m not). However, I was still curious about it, and I wanted to know how it works.

For those of you who haven’t heard of it, RLQ analysis is a method by which one can uncover how the environment filters certain species traits. For example, you can determine whether a particular environment selects for species with rapid growth rates, high reproductive output, or whatever trait you choose to measure. It accomplishes this by, more or less, linking a description of the environment to species traits via measurements of species abundances.

You start with three data tables. The R matrix is a site x environment table: sites are rows and columns are environmental descriptors. The L matrix is a site x species table, where rows are sites and columns are abundances of specific species. The Q matrix is a species x trait table, where rows are species and columns are biological traits of those species. What RLQ analysis does, simply, is make a new matrix that I’ll call V, which is an environment x traits matrix, and you then perform your standard PCA-esque eigendecomposition on that. Although not technically correct, RLQ analysis can be thought of as nothing more than PCA on the matrix V. The vast majority of the work is actually in constructing V.
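To make the shapes concrete, here is a toy sketch with fabricated numbers (4 sites, 3 species, 2 environmental variables, 2 traits), ignoring the centering and weighting described below; it only shows how the three tables multiply down to an environment x trait matrix:

```r
# Toy illustration with made-up data; dimensions only, no centering or weights
set.seed(1)
R <- matrix(rnorm(8),      nrow = 4, ncol = 2)  # site x environment
L <- matrix(rpois(12, 5),  nrow = 4, ncol = 3)  # site x species abundances
Q <- matrix(rnorm(6),      nrow = 3, ncol = 2)  # species x trait

# The R' L Q product collapses sites and species, leaving environment x trait
V <- t(R) %*% L %*% Q
dim(V)  # 2 x 2: environmental variables by traits
```

The site and species dimensions cancel in the product, which is why abundances can mediate the environment-trait link.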

The ‘ade4’ package in R can do RLQ analyses. First, you do a principal components analysis on both R and Q and a correspondence analysis on L. You then pass these analyses to the rlq() function. To figure out how RLQ works, I took apart the rlq() function, then several secondary functions called by rlq(). It turns out, the PCA and CA on the R, L, and Q matrices aren’t actually used; you can do the whole thing by hand without those preliminary analyses.
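For reference, the standard ade4 workflow looks roughly like the sketch below (assuming R, L, and Q are data frames already loaded; the weighting of the two PCAs by the CA's row and column weights is what ties the three ordinations together):

```r
# Sketch of the usual ade4 pipeline; R, L, Q are assumed data frames
library(ade4)

coaL <- dudi.coa(L, scannf = FALSE, nf = 2)                   # CA on species table
pcaR <- dudi.pca(R, row.w = coaL$lw, scannf = FALSE, nf = 2)  # PCA on environment,
pcaQ <- dudi.pca(Q, row.w = coaL$cw, scannf = FALSE, nf = 2)  # weighted to match L

res <- rlq(pcaR, coaL, pcaQ, scannf = FALSE, nf = 2)
res$tab  # the crossed environment/trait table that rlq() decomposes
```

Everything below reconstructs what this call does internally.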

So, without further ado, here’s how RLQ analysis works (NOTE: I’ve verified my results against those from the rlq() function to make sure they match):


traits <- read.csv('traits.csv')
env <- read.csv('environment.csv')
species <- read.csv('species.csv')

species <- species[,-c(1:2)]
env <- env[,-c(1:2)]
traits <- traits[,-1]
rownames(traits) <- colnames(species)

RLQ analysis operates on the matrix V = R’ D_site P D_species Q, where D_site and D_species are diagonal matrices of the row and column weights from the species matrix. As shown below, this is the same as R’ L Q, where L is the centered probability matrix.

First, we have a site by species matrix, N, of raw abundances.

N <- species

Convert N to a relativized species matrix P, where p_ij = n_ij / n++, where n++ is the total number of individuals (the sum of the entire N matrix).

P <- N/sum(N)

Now divide each observation by its row weight (p_i+) and column weight (p_j+). The row and column weights are simply the sum of the observations in a row divided by the sum of the matrix (and similarly for columns). Once done, this gives p_ij = p_ij/(p_i+ p_j+).

row.w <- apply(P, MAR=1, function(x) sum(x)/sum(P))
col.w <- apply(P, MAR=2, function(x) sum(x)/sum(P))

P <- sweep(P, MAR=1, row.w, '/')
P <- sweep(P, MAR=2, col.w, '/')

Next, subtract 1 from each observation, giving p_ij = p_ij/(p_i+ p_j+) – 1, which equals (p_ij – p_i+ p_j+)/(p_i+ p_j+).

P <- P-1

This IS the chi-square distance matrix used in correspondence analysis. You can verify this by checking the table from the dudi.coa function. They are the same.

However, since we only want the centered matrix, we need to remove the weights in the denominator. Create diagonal matrices D_site and D_species of the row and column weights, respectively. Then pre- and post-multiply the matrix P. This will yield a matrix L, where l_ij = p_ij – p_i+ p_j+.

D_site <- diag(row.w)
D_species <- diag(col.w)

L <- D_site %*% as.matrix(P) %*% D_species

Now make the R’LQ matrix. First, center and standardize the columns of R and Q. The center is taken as the WEIGHTED average, where the weights are the row weights (for the environment matrix) and the species weights (for the trait matrix).

# Calculate the weighted average for each trait and site
traitAvg <- apply(traits, MAR=2, function(x) sum( x*col.w )/sum(col.w) )
envAvg <- apply(env, MAR=2, function(x) sum(x*row.w)/sum(row.w))

traitCent <- sweep(traits, 2, traitAvg)
envCent <- sweep(env, 2, envAvg)

Calculate the weighted standard deviation. Since the values are now deviations from the mean, the weighted variance is sum(x^2 * w) / sum(w), and the standard deviation is the square root of this.

traitSD <- apply(traitCent, MAR=2, function(x) sqrt(sum(x^2 * col.w)/sum(col.w)))
envSD <- apply(envCent, MAR=2, function(x) sqrt(sum(x^2 * row.w)/sum(row.w)))

traitScale <- sweep(traitCent, MAR=2, traitSD, '/')
envScale <- sweep(envCent, MAR=2, envSD, '/')

R <- as.matrix(envScale)
Q <- as.matrix(traitScale)

Next, V is just the R’ L Q product. This is effectively the correlation matrix between the environmental variables and the species traits, mediated by species abundances.

V <- t(R) %*% L %*% Q
round(V, 3)

This is identical to the matrix operated on by the rlq() command. You can check this by examining the table ($tab) returned by the rlq() function.

Next, get the cross-product matrix, because the correlation matrix is not guaranteed to be either square or symmetric.

Z <- crossprod(V, V)

The rest is a standard PCA-like eigendecomposition of the cross-product matrix Z.

eig <- eigen(Z)
eigVals <- eig$values
eigVecs <- eig$vectors


traitLoad <- data.frame(eigVecs)
rownames(traitLoad) <- colnames(V)
colnames(traitLoad) <- paste('Axis', 1:length(eigVals))

envScores <- data.frame(V %*% eigVecs)
names(envScores) <- paste('Axis', 1:length(eigVals))

This gives the biplot:
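A minimal base-R sketch of that biplot, using the envScores and traitLoad objects built above (the scaling of the trait arrows is illustrative, not ade4's default):

```r
# Plot environment scores as points and trait loadings as arrows
plot(envScores[, 1], envScores[, 2], type = "n",
     xlab = "Axis 1", ylab = "Axis 2", main = "RLQ biplot (sketch)")
abline(h = 0, v = 0, lty = 2, col = "grey")
text(envScores[, 1], envScores[, 2], rownames(envScores), col = "blue")
arrows(0, 0, traitLoad[, 1], traitLoad[, 2], length = 0.1, col = "red")
text(traitLoad[, 1] * 1.1, traitLoad[, 2] * 1.1, rownames(traitLoad), col = "red")
```

Environmental variables and traits that point in the same direction are positively associated through the species abundances.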