Because I have a technical mind, I believe that “everything around us is numbers” © Numb3rs.
In this case, I decided to analyse my marathon preparation and estimate my finish time for the full marathon before the race, using the data I tracked with Strava.
Strava Runner Profile | Vladimir S.
To do that, I wrote a Python script to calculate the estimate. Python is a programming language that is well suited to mathematical calculations and is easy to learn.
Linear prediction
First of all, I used basic logic: a rough estimate can be calculated with a simple linear function. No sooner said than done.
The code for this is pretty simple from a logic perspective. Let’s assume that I hold a constant pace for the whole race, the same as in one of my previous runs. In that case, the formula is simply the pace multiplied by the desired distance.
Predicted time = Desired distance * Pace
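A minimal sketch of this linear formula in Python. The function names and the "MM:SS" pace format are assumptions made for illustration, not necessarily what the original script uses. The example pace of 04:56 per kilometre is the one from the 16th run discussed later in this article.

```python
MARATHON_KM = 42.195  # official marathon distance in kilometres


def pace_to_seconds(pace: str) -> int:
    """Convert a 'MM:SS' pace per kilometre into seconds per kilometre."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)


def linear_prediction(pace: str, distance_km: float = MARATHON_KM) -> float:
    """Predicted time in seconds = desired distance * pace."""
    return distance_km * pace_to_seconds(pace)


def format_time(total_seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS (rounded to the second)."""
    total = round(total_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Constant 04:56 min/km pace held over the full marathon distance.
print(format_time(linear_prediction("04:56")))  # 03:28:10
```

The same function works for any pace value, so it can be called once with the regular pace and once with a grade-adjusted pace to get two estimates for the same run.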
Exploring my running statistics, I found that Strava calculates not only pace but also GAP (Grade Adjusted Pace). Taking this into account, the linear formula with Pace and GAP gives a rough estimate of the fastest and probably the slowest time, assuming that the actual race is flatter than my usual runs and that I run it at a slightly faster pace.
Well, the linear formula gives us boundary values for the estimate. Not bad, but still not good enough. When I applied it to some of my first preparation runs, the values were far apart: between 3.5 and 4.5 hours.
I expected a more precise result, so I started to explore other formulas for predicting race time. After a while, I found a better one: the Pete Riegel formula.
Pete Riegel formula prediction
In a 1977 article for Runner’s World Magazine, Riegel proposed a simple formula for comparing relative performances at different distances. The formula is most commonly quoted as:
Predicted time = T1 * (D2 / D1)^C
- T1 is the time achieved for D1
- D1 is the distance over which the initial time is achieved
- D2 is the distance for which the time is to be predicted
- C is the pace degradation coefficient, from 1.06 to 1.10
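The formula above can be sketched in a few lines of Python. The function name, the argument names and the 10 km in 50 minutes input are hypothetical values chosen for illustration. They are not taken from my Strava data.

```python
MARATHON_KM = 42.195  # official marathon distance in kilometres


def riegel_prediction(t1_seconds: float, d1_km: float,
                      d2_km: float = MARATHON_KM, c: float = 1.06) -> float:
    """Riegel formula: predicted time = T1 * (D2 / D1) ** C, in seconds."""
    if t1_seconds <= 0 or d1_km <= 0 or d2_km <= 0:
        raise ValueError("time and distances must be positive")
    return t1_seconds * (d2_km / d1_km) ** c


# Hypothetical input: a 10 km run completed in 50 minutes (3000 seconds).
t1, d1 = 50 * 60, 10.0

fast = riegel_prediction(t1, d1, c=1.06)  # lowest coefficient, fastest estimate
slow = riegel_prediction(t1, d1, c=1.10)  # highest coefficient, slowest estimate

print(f"fastest: {fast / 3600:.2f} h, slowest: {slow / 3600:.2f} h")
```

Because the target distance is longer than the run distance, the ratio D2 / D1 is greater than 1, so a larger coefficient always gives a slower predicted time.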
This formula gives more precise values for the estimated time; however, it still produces two boundary values: a degradation coefficient of 1.06 for the fastest time and 1.10 for the slowest.
Exploring my running statistics again, I found that Strava also provides elevation information. Taking into account the elevation of the Rotorua Marathon course, I assumed that it could help me calculate a more precise pace degradation coefficient for the race.
To achieve that, I wrote code to calculate a grade from elevation and distance, and code to calculate the coefficient from the grade. I assumed that a 0% grade corresponds to the lowest value of the coefficient and a 3% grade to the highest.
As a result, I got a coefficient of around 1.077, which represents low-to-medium difficulty for the Rotorua Marathon.
In a nutshell, by combining Pace, GAP and the degradation coefficients, I now have estimates with different confidence levels. I created a simple web page (using Google Charts) with a chart that visualises the script’s estimates. It looks like the image below.
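A minimal sketch of these two steps. It assumes that grade is elevation gain divided by distance, and that the coefficient grows linearly between the two anchor points (0% grade for 1.06, 3% grade for 1.10). Both the linear interpolation and the function names are assumptions made for illustration, and the elevation figure in the example is hypothetical.

```python
C_MIN = 1.06             # coefficient assumed for a flat course (0% grade)
C_MAX = 1.10             # coefficient assumed for the hilliest course (3% grade)
MAX_GRADE_PERCENT = 3.0  # grade at which the coefficient reaches C_MAX


def grade_percent(elevation_gain_m: float, distance_km: float) -> float:
    """Average grade in percent: elevation gain over horizontal distance."""
    return elevation_gain_m / (distance_km * 1000) * 100


def coefficient_for_grade(grade: float) -> float:
    """Linearly interpolate the coefficient, clamping grade to 0..3%."""
    clamped = min(max(grade, 0.0), MAX_GRADE_PERCENT)
    return C_MIN + (clamped / MAX_GRADE_PERCENT) * (C_MAX - C_MIN)


# Hypothetical course: 300 m of elevation gain over the marathon distance.
grade = grade_percent(300, 42.195)
print(f"grade: {grade:.2f}%, coefficient: {coefficient_for_grade(grade):.3f}")
```

Clamping keeps the coefficient inside the 1.06 to 1.10 range even for a course steeper than 3%.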
Interactive web page (https://thesun2003.github.io/marathon-prediction/)
Well, if you check the chart above, you can see a trend towards running faster. I used data from my first 25 preparation runs.
Let us take a closer look. In the beginning, the fastest predicted time is the Linear GAP time of 03:53:28, which is sub 4 hours, yay! However, all other predictions are over 4 hours, and the slowest, the Riegel prediction with the highest coefficient of 1.10, is 04:49:24, ooh. The two main lines, I believe, put the time between 04:19:11 and 04:32:47. This is still satisfactory but far from what I expect from my actual marathon race.
The 25th run shows a faster time, from 03:33:47 to 03:49:27, which is a great prediction for me. However, this prediction may be more or less accurate than the others simply because that run was on a treadmill. The run was almost flat and fast-paced.
Meanwhile, if you look closer at the 16th run, you can see that the fastest time is 03:22:32 and the slowest is 04:04:40. That was a 40-minute morning run on the street at a really fast pace of 04:56.
All in all, I believe I managed to find some fun in the marathon preparation as well as create a helpful tool to research my performance data. Moreover, I showed that it can be interesting to treat yourself as a source of data for analysis.
At the end of this retrospective session, I concluded that my pace is trending in the right direction, and I expect to achieve my second goal: running a sub-4-hour marathon.
You can find the source code on my GitHub at the link below.
thesun2003/marathon-prediction
The next retrospective session is scheduled to be at the end of the project, which means after I run a marathon. I believe it will be interesting and fun. See you then!
If this article was helpful or interesting, please hit the clap button and feel free to share it. I’ll be sure to deliver more articles in the weeks to come.