#### Walker Harrison

By day, I help out at dev.to. Email me at walker@dev.to with any concerns or questions. By night, I blog about data, both on this site and my Medium publication, perplex.city.

*I'm a data blogger and a statistics student who likes to run experiments on his hometown of New York City. This post introduces the mathematical concept of memorylessness and then runs some R code on transit data published by New York's MTA to try finding that concept in the wild.*

In the realm of probability theory, there’s a spooky concept called *memorylessness* that characterizes certain scenarios. Anytime the chances associated with an outcome are unchanged as time or trials go by, your situation can be described as memoryless. In other words, whatever has happened does not affect what will — the past is always “forgotten.”

This idea is a lot easier to understand with an example, and for that we’ll call upon probability’s poster child: flipping coins. If you flip a coin a thousand times, the probability of getting heads on any one flip is 50%, irrespective of what has already occurred. So even if you miraculously flipped a thousand straight heads, the chances of flipping heads again are still 50%, and hence the scenario is memoryless.
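A quick simulation makes the point concrete (the streak length of five and the million flips are arbitrary choices for illustration): even conditional on a run of heads, the next flip is still heads about half the time.

```r
# simulate a long run of fair coin flips and check that the chance of
# heads immediately after a streak of five heads is still about 50%
set.seed(1)
heads <- sample(c(TRUE, FALSE), 1e6, replace = TRUE)

# TRUE at positions where the five most recent flips were all heads
five.in.a.row <- stats::filter(as.numeric(heads), rep(1, 5), sides = 1) == 5

# look at the flip immediately following each such streak
idx <- which(five.in.a.row)
idx <- idx[idx < length(heads)]
mean(heads[idx + 1])  # hovers around 0.5
```

The estimate lands within a fraction of a percentage point of 0.5, streak or no streak.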

This might seem obvious, but an entire phenomenon has been named after humankind’s inability to grasp this truth in certain environments. Indeed, the Gambler’s Fallacy, which describes the mistaken belief that past events must be balanced out somehow in subsequent developments (like a gambler thinking he’s due for a windfall), is a common pitfall for everyone from teenagers unsettled by a string of consecutive lettered answers on a test to investors clinging to a declining stock.

The continuous version of memorylessness, whereby events are not discrete happenings like flips of coins but can occur at any time, is a little trippier. Continuous, memoryless processes describe situations where probabilities conditional on time passing are the same as if that time had not passed. These usually describe “waiting” for something that has no guarantee of happening in a certain timeframe.

One of statisticians’ favorite examples is waiting for a bus. A memoryless wait for a bus would mean that the probability that a bus arrives in the next minute is the same whether you just got to the station or you’ve been sitting there for twenty minutes already.

Another way to think of this is that the probability of waiting any amount of time for the bus to come is equivalent to waiting that time in addition to a pre-established duration: the chances of a bus getting there in 2 minutes are the same as it getting there in 7 minutes given that 5 have already passed, or getting there in 102 minutes given that 100 have. In that sense, **you can’t earn better odds just by sitting there.**
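We can check that identity numerically with R's built-in exponential functions (the rate of one bus per 9 minutes is an arbitrary choice here):

```r
rate <- 1/9  # say buses average one every nine minutes

# P(bus within 2 minutes), starting fresh
p.fresh <- pexp(2, rate)

# P(bus within 7 minutes | 5 minutes have already passed)
# = P(5 < T <= 7) / P(T > 5)
p.waited <- (pexp(7, rate) - pexp(5, rate)) / (1 - pexp(5, rate))

p.fresh - p.waited  # essentially zero
```

The two probabilities agree to machine precision, exactly as the memoryless property promises.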

As it turns out, only the exponential distribution can describe such a setup in the continuous case. Graphically, the memoryless property can be demonstrated by focusing solely on the portion of a density curve to the right of a certain point — i.e. how long you’ve already waited.

You might profess that the curve is lower there, so the odds of waiting longer are smaller if you have already waited some. But our new sample space is also smaller, since several minutes have already passed, requiring us to expand the curve upward, or normalize it so that the underlying area is one. When we do so, we realize it has assumed its original shape, just shifted over:
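That renormalization claim is easy to verify in R: chop off everything before some waiting time, rescale the tail so its area is one, and you recover the original density exactly (again using an illustrative rate of 1/9):

```r
rate <- 1/9
x <- seq(0, 60, by = 0.1)

# the original exponential density...
original <- dexp(x, rate)

# ...and the tail beyond t = 5 minutes, renormalized to have area one
tail.renorm <- dexp(x + 5, rate) / (1 - pexp(5, rate))

all.equal(original, tail.renorm)  # TRUE: the rescaled tail is the same curve, shifted
```

This only works for the exponential distribution; any other continuous density changes shape when its tail is renormalized.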

The bus example is memorable (no pun intended) because it so effectively captures the futility of waiting. When you’re sitting idly with no knowledge of when the next bus might come, you start to wonder if the chances it comes have actually improved since you got there. The exponential distribution provides the sad answer to that question: no!

**But does the arrival of actual buses follow that same memoryless distribution?**

We can test this proposal using data published by New York City’s transit authority, which tracks buses as they complete their routes throughout the day. I'm going to pick the stop at 96th Street and 5th Avenue served by the crosstown M96 bus, since it's a personal favorite.

First we must combine all the daily files into one giant 2016 data frame in R:

```
# read the first day (Jan 1 2016); the stop's ID number is 404087
bus.all <- read.csv("bus_time_20160101.csv")
bus.all <- bus.all[bus.all$next_stop_id == 404087, ]
bus.all$timestamp <- as.POSIXct(strptime(bus.all$timestamp, "%Y-%m-%dT%H:%M:%SZ"))
# create a vector of all other dates (Dec 29 is missing for some reason)
dates <- c(20160102:20160131, 20160201:20160229, 20160301:20160331,
           20160401:20160430, 20160501:20160531, 20160601:20160630,
           20160701:20160731, 20160801:20160831, 20160901:20160930,
           20161001:20161031, 20161101:20161130, 20161201:20161228,
           20161230:20161231)
# cycle through them, reading each csv and binding it to bus.all
for (i in dates) {
  file <- paste("bus_time_", as.character(i), ".csv", sep = "")
  bus_trans <- read.csv(file)
  bus_trans <- bus_trans[bus_trans$next_stop_id == 404087, ]
  bus_trans$timestamp <- as.POSIXct(strptime(bus_trans$timestamp, "%Y-%m-%dT%H:%M:%SZ"))
  bus.all <- rbind(bus.all, bus_trans)
}
```

Now we have logs of every bus that was headed to the examined stop on 96th Street, since we selected only rows with the required `next_stop_id`.

There's one twist, though: the MTA doesn’t record exactly when buses arrive at stops but rather logs what each bus’s next stop is based on periodic pings. That means multiple rows with different timestamps will exist for the same instance of a bus approaching our stop. So in lieu of an exact arrival time, we'll have to settle for the final ping that a bus sent before its next stop updated to the subsequent station:

```
# order buses by vehicle id (rows are already ordered by time)
bus.all <- bus.all[order(bus.all$vehicle_id), ]
# add a column showing the time difference between pings, in seconds
# (601 is a sentinel so the very first row counts as the start of an approach)
bus.all$diff <- c(601, as.numeric(diff(bus.all$timestamp), units = "secs"))
# find the index of the last ping before each arrival by subtracting 1
# from the index of the first ping in each new string of pings:
# a negative diff means a new vehicle; a gap over 600 seconds means a new approach
index <- which(bus.all$diff < 0 | bus.all$diff > 600)
index <- index - 1
bus.all.real <- bus.all[index, ]
# make a new column for the wait between buses, in minutes
bus.all.real <- bus.all.real[order(bus.all.real$timestamp), ]
bus.all.real$wait <- c(0, as.numeric(diff(bus.all.real$timestamp), units = "mins"))
```

Now that we've created a value that closely estimates the wait between buses, we can graph its distribution. In total, over 60,000 buses came and went from the stop in 2016, or about one every nine minutes:
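The histogram can be drawn with base R's `hist` and an overlaid `curve`. Since `bus.all.real` depends on the raw MTA files, here's a self-contained sketch that substitutes simulated waits with the observed nine-minute mean:

```r
# stand-in for bus.all.real$wait: 60,000 simulated waits averaging 9 minutes
set.seed(2016)
wait <- rexp(60000, rate = 1/9)

# histogram of waits in one-minute buckets, with the ideal exponential curve on top
hist(wait, breaks = seq(0, ceiling(max(wait)), 1), freq = FALSE,
     xlim = c(0, 60), main = "Minutes between M96 buses",
     xlab = "wait (minutes)")
curve(dexp(x, rate = 1/9), from = 0, to = 60, add = TRUE, lwd = 2)
```

Swapping `wait` for the real `bus.all.real$wait` column produces the graph discussed below.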

Pretty close to our model! The graph has two major discrepancies from the ideal curve shown earlier, but both have plausible explanations. First off, in a truly memoryless distribution, the shortest wait (our “less than a minute” bucket) would have the highest probability. But in the above distribution, such waits are actually less frequent than other short-wait buckets like two or three minutes.

The reason behind this may be that bus drivers try to give the buses in front of them some more time and space if they realize they are too close. So if drivers are intentionally slowing down to avoid tailgating their predecessors, some of the waits that would have naturally taken less than a minute might take slightly longer.

The other divergence from the exponential model occurs much further down the graph near the 40-minute mark, where a modest but noticeable bump occurs. This might indicate that at a certain point (probably in the dead of night), the MTA falls back on a schedule that is more reliable albeit much less frequent. In that sense, the overall shape is probably actually a combination of two: an exponential distribution during waking hours and a modest normal distribution during the graveyard shift.
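To see how such a bump could arise, here's a hypothetical simulation of that two-regime idea. The particulars (a 3% overnight share, a 40-minute scheduled mean, the daytime rate) are made up purely for illustration:

```r
# mix mostly-exponential daytime waits with a small normal bump
# of scheduled ~40-minute overnight waits
set.seed(2016)
n <- 60000
overnight <- runif(n) < 0.03                # assume ~3% of waits happen overnight
wait <- ifelse(overnight,
               rnorm(n, mean = 40, sd = 5), # reliable but infrequent schedule
               rexp(n, rate = 1/8))         # memoryless daytime arrivals

hist(wait, breaks = seq(0, ceiling(max(wait)), 1), freq = FALSE,
     xlim = c(0, 60), main = "Exponential waits plus an overnight bump",
     xlab = "wait (minutes)")
```

The result is an exponential-looking curve with a modest hump near 40 minutes, much like the observed graph.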

Given that you’re probably not about to catch the 3AM M96 across town, an exponential or memoryless distribution still describes the waiting process pretty accurately. So next time you’re headed for the bus, remember this blogpost as a guide. Or, to keep with the theme, forget all about it…

Doesn't this assume that the expected time never actually reaches zero? The far right-hand side of the "exponential" graph is for night/holidays/whatever, where the time between buses will be much longer than "normal", but it isn't infinite, and there isn't a tiny chance that if you wait for a week you still won't see a bus.

This is obviously much easier to reason about in common situations where you know the bus is scheduled to arrive every 10 minutes, and you've waited 9, the chance you'll see one within 2 minutes is obviously much higher now than it was when you arrived.

Yes, the exponential graph never touches zero, which seems impossible but makes more sense if you start to imagine the extremely improbable events that might cause a bus to take a week to arrive. A grim example: if New York City is bombed, it might take a long time for the transit system to get back up and running, in which case waiting a week for a bus is conceivable. That's very unlikely, but that's also the point: the distribution is near zero for a wait of this length.

So even though nothing like that has happened (and hopefully never will), the idea could still be captured in the exponential distribution.

I challenge you to a game. You win when you flip a "fair" coin and get Heads.

You flip the coin...and get Tails.

But there's a loophole in the game rules. You win *when* you flip Heads, but you can flip that coin as many times as you want until you get Heads. So you flip the coin, over and over, until you finally get Heads and win the game. As long as the game isn't rigged, there's always the possibility that you might get Heads on the next flip.

But flipping a coin is a rather tedious process. You have to get a coin, move it up in the air, bring it back down to the ground, and check whether it is Heads or Tails. What if you could flip multiple coins at the same time, and then check them all at once? There's no real penalty for flipping more coins than necessary, after all.

The odds of flipping one coin and getting at least one Heads is 50%.

The odds of flipping two coins and getting at least one Heads is 75%.

The odds of flipping three coins and getting at least one Heads is 87.5%.

The odds of flipping ten coins and getting at least one Heads is about 99.9%.

Etc.
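Those numbers are just the complement rule: the chance of at least one Heads among n simultaneous fair flips is 1 − 0.5^n, which a couple of lines of R confirm:

```r
# chance of at least one Heads among n simultaneous fair flips
n.coins <- c(1, 2, 3, 10)
p.at.least.one <- 1 - 0.5^n.coins
round(p.at.least.one, 4)
```

Each extra coin halves the chance of total failure, so the win probability climbs toward (but never reaches) 100%.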

Improbable events can still occur (such as all ten coins landing Tails), and you have to be ready for them, but the Law of Large Numbers suggests that the odds are in your favor...even if each individual flip is memoryless.

Waiting at the bus stop is like playing my game. Every minute that you wait, you're playing the "game", hoping that the memoryless bus will come and you "win". When you wait another minute, you're still playing. And if you decide ahead of time that you're willing to wait 20 minutes for a bus, you're merely precommitting yourself to playing the game for a certain period of time. Obviously you can win at any time, but you're not betting on winning in the first round. You're betting that you'll win before the 21st round. You're willing to keep playing *until* you win (or your "waiting" budget runs out and it's better to take a taxi to your destination, since that would probably be quicker than continuing to play the game).

It is not futile to wait. You just have to know the rules.

I agree with everything you've said -- the game you describe is an instance of the geometric distribution, which is memoryless and the discrete analogue of the exponential distribution. And yes, committing yourself to waiting 20 minutes as opposed to 5 or 10 will by definition expose you to a higher probability of the bus coming.

The point of my article, and the thing about memorylessness that for me at least is fascinating and spooky, is that your odds never improve even after waiting. It's a counterintuitive idea -- you'd think that since a bus must be on the way, your chances should be improving with every passing minute, when in fact they are not. Personally, I find that pretty cool!