Marathon Data Analysis Part 2: Testing Tanda

There have been numerous attempts to predict marathon performance from physiological measurements and training data; Dr. Christof Schwiening gives a good overview of them [1]. However, many of these models were fitted to very small data samples and do not generalise well to new data.

Now that we have access to a large data set of marathon runners and their training, we can test these models and quantify their predictive power. Unfortunately, beyond age and gender, we have no physiological measurements, so we are restricted to models based on training data. As Christof discusses, the only such model that survives basic scrutiny is Tanda's prediction formula [2]. However, as we will see, this model fails to generalise beyond its very small training data set.
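To make the test concrete, here is a minimal Python sketch of Tanda's formula and a simple error measure. It assumes the coefficients commonly quoted for the formula [2], which predicts marathon pace from mean weekly training distance and mean training pace in the weeks before the race; treat those numbers as an assumption and check them against the paper. The helper names (`tanda_marathon_pace`, `predicted_finish_seconds`, `mean_absolute_error`) are hypothetical, and mean absolute error is just one reasonable choice of metric, not necessarily the one used in the full analysis.

```python
import math

# Official marathon distance in kilometres.
MARATHON_KM = 42.195


def tanda_marathon_pace(weekly_km: float, training_pace_s_per_km: float) -> float:
    """Predicted marathon pace in seconds per km.

    Assumed coefficients, as commonly quoted for Tanda's formula [2]:
        pace = 17.1 + 140.0 * exp(-0.0053 * K) + 0.55 * P
    K: mean weekly training distance (km), P: mean training pace (s/km),
    both averaged over the weeks leading up to the race.
    """
    return 17.1 + 140.0 * math.exp(-0.0053 * weekly_km) + 0.55 * training_pace_s_per_km


def predicted_finish_seconds(weekly_km: float, training_pace_s_per_km: float) -> float:
    """Convert the predicted pace into a finish time in seconds."""
    return tanda_marathon_pace(weekly_km, training_pace_s_per_km) * MARATHON_KM


def mean_absolute_error(predicted: list[float], observed: list[float]) -> float:
    """Average absolute gap between predicted and observed finish times."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical runners: (weekly km, training pace in s/km, observed finish in s).
    # Illustrative numbers only, not taken from our data set.
    runners = [(40.0, 330.0, 14400.0), (80.0, 290.0, 11700.0)]
    preds = [predicted_finish_seconds(k, p) for k, p, _ in runners]
    obs = [t for _, _, t in runners]
    print(f"MAE: {mean_absolute_error(preds, obs) / 60:.1f} min")
```

Mean absolute error is expressed in the same units as the finish times, so it reads naturally in minutes; a squared-error measure would instead weight large misses more heavily.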


Marathon Data Analysis Part 1: Initial Thoughts

The marathon season is well and truly upon us. Whether you are recovering from the Boston Marathon or are one of the 40 000 runners gearing up for London this weekend, your running shoes are probably beginning to look somewhat the worse for wear.

Marathon running is not an impossibly complicated activity, so in the age of big data it is surprising that no extensive study of the factors affecting performance has been carried out. Strava, a social network for athletes, collates detailed training data from a large number of athletes, and while it publishes a summary of some of this data [1], the insights that can be gained from such a summary are limited. The Guardian has published a brief list of results drawn from this data [2].

Knowing which training factors affect performance, and how much they matter, is useful for two main reasons. First, it allows training schedules to be designed objectively, producing the best possible performance for an athlete subject to constraints on training time while minimising injury risk. Second, if we can accurately predict an athlete's performance, we can choose an appropriate pacing strategy for race day and limit the chance of hitting the dreaded 'wall' [3].