'Twas the night before BUCS: An algorithmic approach for predicting cross country performances.

'Twas the night before BUCS, when all through Gloucester
Not an athlete was drinking, not even one beer.
The spikes were stood by the door with care,
In hopes that some medals soon would be theirs.

The greatest day of the year is almost upon us: tomorrow is the British Universities (BUCS) Cross Country Championships. Obviously too excited to do any work, I decided to see whether it was possible to predict the results from previous performances.

PowerOf10 provides a fantastic source of athletics data, and a few Scrapy spiders later I had a large dataset to play with.

The race entry lists provided the names, and cross-referencing these with PowerOf10 allowed me to obtain each athlete's historical race results. Unfortunately several names were insufficiently unique or misspelt (some universities were more prone to this mistake than others... no comment), which made it impossible to obtain performances for those athletes.
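As a rough illustration of that cross-referencing step, the sketch below matches entry-list names against scraped profiles and sets aside any name that matches no profile or more than one. The function names, the `(athlete_id, full_name)` profile format and the example names are all hypothetical, and only case and whitespace are normalised, so genuine misspellings would still end up unmatched.

```python
from collections import defaultdict


def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.casefold().split())


def match_entries(entry_names, profiles):
    """Match entry-list names to profiles.

    profiles is a list of (athlete_id, full_name) pairs (hypothetical format).
    Returns (matched, unmatched): matched maps each entry name to exactly one
    athlete id; unmatched lists names with zero or several candidates.
    """
    by_name = defaultdict(list)
    for athlete_id, full_name in profiles:
        by_name[normalise(full_name)].append(athlete_id)

    matched, unmatched = {}, []
    for name in entry_names:
        candidates = by_name.get(normalise(name), [])
        if len(candidates) == 1:
            matched[name] = candidates[0]
        else:
            # No profile, or several athletes sharing the same name.
            unmatched.append(name)
    return matched, unmatched


# Made-up example data.
profiles = [(101, "Sam Example"), (102, "Jo Bloggs"), (103, "Jo Bloggs")]
matched, unmatched = match_entries(["SAm  Example", "Jo Bloggs", "Al Nobody"], profiles)
```

Here the oddly capitalised entry still finds its profile, while the duplicated and the unknown names are both set aside rather than guessed at.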

Based on analysis of every cross country race from 1st of January 2015, my algorithm predicted the following top 20 men's team results, using a 6 to run, 4 to score system:

Position University Score
1 University of Birmingham 48
2 St Mary's University 58
3 Loughborough University 72
4 Oxford University 122
5 University of Cambridge 136
6 University of Leeds 188
7 University of Sheffield 207
8 Cardiff Metropolitan University 209
9 University of Edinburgh 238
10 Durham University 247
11 Leeds Beckett University (Carnegie) 293
12 Sheffield Hallam University 299
13 University College London 316
14 University of Stirling 329
15 University of Southampton 340
16 Cardiff University 368
17 University of Warwick 374
18 University of Manchester 414
19 University of Nottingham 427
20 University of Exeter 471
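For anyone unfamiliar with "6 to run, 4 to score", here is a minimal sketch of one way to turn a predicted individual order into team scores. The post does not spell out exactly how positions were counted, so the rule used here is an assumption: only each university's first six predicted athletes run, positions are numbered among those runners, each team's best four positions are summed, and teams with fewer than four runners are left out. The function name and input format are hypothetical.

```python
from collections import defaultdict

RUNNERS_PER_TEAM = 6   # "6 to run"
SCORERS_PER_TEAM = 4   # "4 to score"


def team_scores(predicted_order):
    """Score teams from a list of (athlete, university) pairs, best first.

    Returns (university, score) pairs sorted so the lowest score comes first.
    """
    entered = defaultdict(int)      # runners counted so far per university
    positions = defaultdict(list)   # finishing positions per university
    position = 0
    for _athlete, university in predicted_order:
        if entered[university] >= RUNNERS_PER_TEAM:
            continue  # assumed: a seventh or later athlete does not run
        entered[university] += 1
        position += 1
        positions[university].append(position)

    scores = {
        university: sum(places[:SCORERS_PER_TEAM])
        for university, places in positions.items()
        if len(places) >= SCORERS_PER_TEAM  # incomplete teams do not score
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up example: X places 1, 3, 4, 6 and Y places 2, 5, 7, 8.
order = [("a", "X"), ("b", "Y"), ("c", "X"), ("d", "X"), ("e", "Y"),
         ("f", "X"), ("g", "Y"), ("h", "Y"), ("i", "Z")]
print(team_scores(order))
```

University Z has only one runner, so it gets no team score under this rule.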

Pretty reasonable! The individual predictions start off strongly but seem to get wilder the further down the list we go. These can be viewed at the bottom. Names that don't appear may be insufficiently unique or misspelt in the entry list. Apologies to the ladies: I ran out of time to run the algorithm for them.

The three main failings of the predictions seem to be: excessively penalising athletes for poor performances, overly rewarding athletes for regular racing, and not taking into account results from the road or track.

How does the algorithm work?

We construct a weighted directed graph with a node for every athlete who has run a cross country race since 1st of January 2016. If athlete A beats athlete B in a race, we add an edge from B to A, weighted by the percentage B was beaten by and exponentially decayed according to how long ago the race was.
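A minimal sketch of how such a graph could be built is shown below. Several details are assumptions rather than anything stated in the post: the "percentage beaten by" is taken to be the time gap as a fraction of the winner's time, the decay uses a hypothetical half-life of 180 days, weights from repeated meetings are added together, and the input format of `(race_date, [(athlete, time_seconds), ...])` is made up for illustration.

```python
from collections import defaultdict
from datetime import date

HALF_LIFE_DAYS = 180.0  # assumed decay rate, not taken from the post


def build_graph(races, today):
    """Build {loser: {winner: weight}} from a list of races.

    races is a list of (race_date, [(athlete, time_seconds), ...]).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for race_date, results in races:
        age_days = (today - race_date).days
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential decay with age
        ordered = sorted(results, key=lambda result: result[1])
        for i, (winner, winner_time) in enumerate(ordered):
            graph[winner]  # make sure every athlete has a node
            for loser, loser_time in ordered[i + 1:]:
                # Assumed meaning of "percentage beaten by": relative time gap.
                margin = (loser_time - winner_time) / winner_time
                graph[loser][winner] += margin * decay
    return {athlete: dict(edges) for athlete, edges in graph.items()}


# Made-up example: one race today and one race 180 days ago.
today = date(2016, 6, 29)
races = [
    (today, [("B", 1980.0), ("A", 1800.0)]),
    (date(2016, 1, 1), [("A", 1800.0), ("C", 1890.0)]),
]
graph = build_graph(races, today)
```

In this example B loses to A by 10% in a race run today, so the edge from B to A has weight 0.1, while C's 5% defeat from 180 days ago is halved to 0.025.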

This graph contains all the information we are interested in. We then run Google's PageRank algorithm on it to rank the athletes from best to worst. Because every edge points from the beaten athlete to the one who beat them, rank accumulates with athletes who beat strong opposition. To obtain our predicted finishing order we pick out the athletes on the BUCS start list.
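The ranking step can be sketched with a plain power-iteration version of weighted PageRank. This is an illustration, not the code behind the predictions above: the damping factor of 0.85, the fixed 100 iterations, the even redistribution of rank from athletes with no outgoing edges, and the `{loser: {winner: weight}}` graph format are all assumptions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on {source: {target: weight}}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph.get(node, {}).values()) for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing weight is shared out evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source in nodes:
            if out_weight[source] == 0:
                continue
            for target, weight in graph[source].items():
                share = weight / out_weight[source]
                new_rank[target] += damping * rank[source] * share
        rank = new_rank
    return rank


def predict_order(graph, start_list):
    """Rank everyone, then keep only start-list athletes, best first."""
    rank = pagerank(graph)
    entered = [athlete for athlete in start_list if athlete in rank]
    return sorted(entered, key=lambda athlete: rank[athlete], reverse=True)


# Made-up example: C has lost to A and B, and B has lost to A.
graph = {"C": {"B": 0.05, "A": 0.10}, "B": {"A": 0.05}, "A": {}}
order = predict_order(graph, ["C", "A", "Z"])
```

Athlete Z has no race results, so they have no node in the graph and are dropped from the prediction, which mirrors the missing names in the lists in this post.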

There are numerous tweaks I would like to make to improve its predictive power, but unfortunately I have to go and catch a bus to Gloucester to try to improve on my mediocre predicted performance of 143rd... maybe next year!

The top 100 individual predictions if everyone were to run in the same race:

Position First Name Surname University
1 Jonathan Davies University of Birmingham
2 Callum Hawkins University of the West of Scotland
3 Chris Olley Imperial College London
4 Richard Goodman St Mary's University
5 Henry Pearce Loughborough University
6 Graham Rush University of Gloucestershire
7 Alex Teuten University of Southampton
8 Alex Brecker University of Warwick
9 Jonathan Hopkins Cardiff Metropolitan University
10 Andrew Heyes University of Birmingham
11 Oliver Fox University of Cambridge
12 Jack Rowe St Mary's University
13 Maximilian Nicholls King's College London
14 William Christofi Oxford University
15 Peter Chambers University College London
16 Euan Gillham University of Edinburgh
17 Corey De'ath St Mary's University
18 Jack Gray University of Birmingham
19 Stuart McCallum University of Birmingham
20 William Fuller Loughborough University
21 Aidan Thompson University of Stirling
22 Steven Bayton University of Sheffield
23 Jake Wightman Loughborough University
24 Alexander Goodall Loughborough University
25 Ellis Cross St Mary's University
26 Joshua Griffiths Cardiff Metropolitan University
27 Alex Howard Oxford University
28 Tom Austin St Mary's University
29 George Duggan Loughborough University
30 Mark Pearce University of Birmingham
31 Cameron Field Durham University
32 Jack Douglas Coventry University
33 Andrius Jaksevicius Kingston University
34 Richard Horton Loughborough University
35 Michael Ferguson University of Aberdeen
36 Rowan Preece St Mary's University
37 Lloyd Heckler University of Birmingham
38 Aidan Smith Oxford University
39 Michael Callegari St Mary's University
40 James McMurray Loughborough University
41 Phil Sesemann University of Leeds
42 Paulos Surafel St Mary's University
43 George Gathercole University of Cambridge
44 Kelvin Gomez University of Cambridge
45 Daniel Wallis Loughborough University
46 Matthew Arnold St Mary's University
47 Bertie Houghton University of Sheffield
48 James Parkinson Oxford University
49 Scott Halsted St Mary's University
50 Phillip Crout University of Cambridge
51 Richard Powell University of Leeds
52 Tim Faes Durham University
53 Victor Mound University College London
54 Ben Alcock University of Birmingham
55 Jonathon Roberts University of Southampton
56 Elliott Dorey St Mary's University
57 Patrick Dever Loughborough University
58 Tom Hook St Mary's University
59 Brad Wattleworth University of Birmingham
60 Ben Bradley St Mary's University
61 Joshua Woodcock-Shaw University of Leeds
62 Ian Crowe-Wright University of Birmingham
63 John Sanderson University of York
64 Scott Stirling University of Edinburgh
65 Jordan Bell Sheffield Hallam University
66 Patrick Roddy University of Cambridge
67 William Ryle-Hodges Oxford University
68 Ben Marriott University of Leeds
69 Jayme Rossiter Loughborough University
70 Benjammin Priddle St Mary's University
71 Edward Dudgeon Loughborough University
72 Tom Bains Leeds Beckett University (Carnegie)
73 Harry Powell University of Hull
74 Michael Harrison University of Bedfordshire
75 Josh Carr University of Cambridge
76 Andrew Wright Cardiff University
77 MAtthew Sheen Leeds Beckett University (Carnegie)
78 Ben Westhenry University of Bristol
79 John Ashcroft University of Leeds
80 Jonathan Tobin University of Sheffield
81 Niall Holt Cardiff Metropolitan University
82 Christian Von Eitzen St Mary's University
83 Luke Penney St Mary's University
84 Josh Bull University of Nottingham
85 Robert Bough University of Birmingham
86 Adam Speake BPP University Ltd
87 Sebastian Anthony Loughborough University
88 Andrew Headley University of Kent
89 Michael Ellis Sheffield Hallam University
90 Luke Cotter Oxford University
91 Anthony Haynes St Mary's University
92 Calum Upton University of Sussex
93 MacGregor Cox University of Cambridge
94 Jack Millar University of Nottingham
95 Miles U Unterreiner Oxford University
96 Jack Crabtree St Mary's University
97 Connor Milnes Sheffield Hallam University
98 Fergus Roberts University of Stirling
99 Callum Charleston St Mary's University
100 Nathan Marsh University of Leeds

One Reply to “'Twas the night before BUCS: An algorithmic approach for predicting cross country performances.”

  1. Quality material Mr Mycroft!

    Let's see what happens tomorrow. If your racing is half as good as your mathematics I expect you to feature highly...
