key: cord-0315689-8l3uy37i
authors: Crowell, B.
title: From treadmill to trails: predicting performance of runners
date: 2021-04-05
journal: bioRxiv
DOI: 10.1101/2021.04.03.438339
sha: a16a829516f342827d133a0ab63a0028a2518bd5
doc_id: 315689
cord_uid: 8l3uy37i

Previous laboratory studies have measured the energetic costs to humans of running at uphill and downhill slopes on a treadmill. This work investigates the extension of those results to the prediction of relative performance of athletes running on flat, hilly, or very mountainous outdoor courses. Publicly available race results in the Los Angeles area provided a set of 109,000 times, with 2200 runners participating in more than one race, so that their times could be compared under different conditions. I compare with the results of a traditional model in which the only parameters considered are total distance and elevation gain. Both the treadmill-based model and the gain-based model have some shortcomings, leading to the creation of a hybrid model that combines the best features of each.

Author summary

Running a race on a road allows absolute measures of performance. Trail running, however, has traditionally been thought of as a sport in which the only valid comparison is between different runners competing on the same course on the same day. Even the exact measurement of distance is considered to be unimportant, since courses and conditions vary so much. An extreme example is the relatively new genre of “vertical” races, in which runners race up a mountain. In a typical example, the competitors cover a horizontal distance of 5 km, while climbing about 1000 m. The winner in one such race had a time almost triple that expected for a state-champion high school runner in a 5k road race. Clearly no comparison can be made here without taking into account the amount of climbing.
In noncompetitive contexts, many runners venture onto mountain trails, lightly dressed and with little equipment, so that it becomes important to be able to anticipate whether they will have the endurance needed to safely complete a planned route. Again, this is impossible without some model of the effect of hill climbing.

This paper presents a method for predicting relative performance on trail runs, "relative" meaning that we can predict the time for course A divided by the time for course B.

Traditionally, runners and hikers have described a trail using two numbers, the horizontal distance and the total elevation gain. For example, if the route is an out-and-back voyage consisting of steady climbing to a peak and a return, then the total elevation gain is simply the elevation of the peak minus the elevation of the trailhead. If the elevation profile of the trip consists of multiple clearly defined ascents and descents, then one adds up the ascents. Although this two-parameter description of the route is easy to derive from a paper topographic map, knowledge of the two numbers is not sufficient to make a very useful estimate of the total energy expenditure.

It has been known for a long time among the officials who measure road races that the effect of elevation change has a nonlinear dependence on the grade. The following argument was advocated by R. Baumel. [1] Consider a closed course whose elevation profile is described by some function $y(x)$. The derivative $y'$ is the trail's slope $i$. The total energy expenditure is an integrated effect of the slope, of the form $\int_0^L C(i)\,dx$, where $C$ is a function that describes the energetic cost of running up or down a hill.
We will see that $C$ has been measured in laboratory experiments, but for the moment we assume only that $C$ is a smooth function, so that for small slopes it can be well approximated by the first few terms of its Taylor series, $C(i) \approx c_0 + c_1 i + c_2 i^2$. Then for any closed loop over a distance $L$, the contribution from the $c_1$ term vanishes, because $\int_0^L i\,dx = y(L) - y(0) = 0$, and the energy cost is $c_0 L + c_2 \int_0^L i^2\,dx$. The dependence on the slope is therefore quadratic rather than linear. For example, if we were to exaggerate the elevation profile by a factor of 2, $y \to 2y$, then the size of the $c_2$ term would go up by a factor of four, not two (in the low-slope limit, on a closed course).

From conversations with runners and hikers, I have found that the result of Baumel's argument almost always elicits total disbelief, especially when presented as a numerical example showing the extreme smallness of the slope effect when the slope is small. One of the goals of this paper is to test this empirically. As an alternative hypothesis, it is commonly believed that one can get a good measure of the relative energy cost by taking the horizontal distance and adding in a term proportional to the total elevation gain. If the total gain is determined down to a fine enough scale (which with modern technology has become more practical), then this hypothesis is equivalent to the assumption that the cost of running is given by a function of the form

$$C_g(i) = c_0\,[1 + c_g \max(i, 0)], \qquad (1)$$

whose graph is shaped like a hockey stick (dashed line in Fig 1). Popularly proposed rules are that 100 m of elevation gain is equivalent to either 400 m or 800 m of horizontal distance, so that $c_g$ is said to be approximately in the range from 4 to 8.

There is nothing mathematically impossible about this hypothesis. A function $C(i)$ of this form evades Baumel's argument because its hockey-stick shape is not smooth at $i = 0$, and therefore cannot be approximated by its Taylor series.
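The contrast between the two hypotheses can be checked numerically. The following is a minimal sketch, not code from the paper: it builds a synthetic closed-loop profile (a single sine wave, an assumption made purely for illustration), uses made-up coefficients `c1`, `c2`, and `c_g`, and evaluates the hill-related cost terms in units where $c_0 = 1$. Doubling the profile should quadruple the $c_2$ term but only double the gain-based term.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def hill_costs(scale, c1=2.0, c2=10.0, c_g=6.0, L=10_000.0, n=100_001):
    """Hill-related cost terms for a synthetic closed loop of length L meters.

    The profile and all coefficients are illustrative assumptions, not fitted values.
    Returns (c1 term, c2 term, gain-rule term), in units where c0 = 1.
    """
    x = np.linspace(0.0, L, n)
    y = scale * 50.0 * np.sin(2.0 * np.pi * x / L)      # closed loop: y(0) = y(L)
    i = np.gradient(y, x)                               # slope i = dy/dx
    linear = c1 * trapezoid(i, x)                       # vanishes on a closed loop
    quadratic = c2 * trapezoid(i**2, x)                 # second-order (smooth C) effect
    gain_rule = c_g * trapezoid(np.maximum(i, 0.0), x)  # c_g times total elevation gain
    return linear, quadratic, gain_rule

lin1, quad1, gain1 = hill_costs(1.0)
lin2, quad2, gain2 = hill_costs(2.0)
print(f"c1 term on closed loop: {lin1:.2e}")
print(f"c2 term ratio after y -> 2y: {quad2 / quad1:.3f}")
print(f"gain-rule ratio after y -> 2y: {gain2 / gain1:.3f}")
```

The sine profile has 100 m of total gain at `scale=1`, so the gain-rule term is about 600 m of equivalent flat distance with `c_g = 6`, while the quadratic term depends on the assumed `c2` and is much smaller for such gentle slopes.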
In a more sophisticated approach, Minetti et al. [2] have used oxygen consumption to measure the energy expenditure of runners on a treadmill at slope $i$, for both running and walking. The results are expressed as $C = (1/m)\,dE/ds$, where $m$ is the person's body mass, $E$ is the energy expended, and $ds$ is the increment of three-dimensional distance, which usually differs negligibly from the increment of horizontal distance. $C$ has units of J/(kg·m). The correctness of the factor of $1/m$ has empirical support. [3]

Efficiency varies by ∼25% even among elite athletes, [2] [4] and differences are also to be expected between elite and recreational athletes. This is one of the reasons why this study presents a comparative technique, rather than an absolute method for determining a particular runner's actual energy expenditure in units of kilocalories.

An approximation to the curve found by ref. [2] is used (Appendix 1), and is referred to as $C_t$, where "t" stands for "treadmill."

To test these models, I use publicly available race results from the Los Angeles area. This area has a large population and tall mountains. The large population makes it possible to pick out a significant number of runners who have competed in several different races. If the ratio of the runner's time on courses 1 and 2 is $t_2/t_1$, then we take this as a measure of the ratio $E_2/E_1$ of the energy expenditure, which can be compared with the model. It was possible to find courses with a variety of elevation profiles, allowing a test of the dependence of the predictions on the amount of hill climbing.

Table 1 lists the races used as sources of data. One-letter mnemonics are defined so that courses can be referred to succinctly in the text.
Because a runner's performance can change over time due to training and aging, the time period of the study was restricted as much as possible to January 2017 through March 2020 (before the COVID epidemic ended races other than virtual ones in California). Distance and elevation data were analyzed as described in Appendix 3.

Runners' names and times were obtained by web-scraping public race results, and runners were assumed to be the same person if their first and last names matched. When a runner ran the same race more than once, their best time was used. To avoid biases in comparisons of times in different races, it is necessary here to define an upper limit on the times that will be used from a given race, and to do so in some consistent and unbiased way. Some such limit is in any case defined by race organizers, but is different for different races and usually quite long, often about 4-5 hours for a half-marathon. Competitors who clock the longer times are generally either walking the entire race or alternating between walking and jogging, and especially in more casual races may be pushing a stroller, running alongside their tween-age child, or staying in a costumed group for fun and emotional support. Because the physiological data and models used in this work are not applicable to walking, I impose a somewhat arbitrary time limit of 2.5 hours on half-marathon times. These limits, as well as others, where imposed, are described in the notes in Table 1. For course S, the time limit was derived by scaling down the half-marathon time limit in proportion to the distance. The other courses in this study are of a qualitatively different character, so for them I simply used the race organizers' cut-off. The resulting bias is an inherent limitation of this work.
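The selection rules just described (match runners by first and last name, keep the best time per race, discard times above a cut-off, and scale the half-marathon limit by distance for a shorter course) could be sketched as follows. This is an illustrative reading rather than the author's code: the function names, the record layout, the case-insensitive name matching, the sample records, and the 5 km distance assigned to course S are all assumptions.

```python
from collections import defaultdict

HALF_MARATHON_KM = 21.1
HALF_MARATHON_LIMIT_H = 2.5  # time limit stated in the text

def scaled_limit_hours(distance_km):
    """Scale the half-marathon limit in proportion to distance."""
    return HALF_MARATHON_LIMIT_H * distance_km / HALF_MARATHON_KM

def best_times(results, limits_h):
    """Return {runner: {course: best hours}} for runners with two or more courses.

    results: iterable of (first_name, last_name, course, hours).
    limits_h: {course: cut-off in hours}; courses without an entry are unfiltered.
    """
    best = defaultdict(dict)
    for first, last, course, hours in results:
        if hours > limits_h.get(course, float("inf")):
            continue  # slower than the cut-off for this course
        # Assumption: names are compared ignoring case and surrounding spaces.
        runner = (first.strip().lower(), last.strip().lower())
        previous = best[runner].get(course)
        if previous is None or hours < previous:
            best[runner][course] = hours
    # Only runners seen on at least two courses allow a comparison.
    return {r: c for r, c in best.items() if len(c) >= 2}

# Made-up records for illustration only.
results = [
    ("Ann", "Lee", "P", 1.90),
    ("Ann", "Lee", "P", 1.80),   # same race twice: best time kept
    ("ann", "lee", "S", 0.45),
    ("Bo", "Cruz", "P", 2.70),   # over the 2.5 h limit: dropped
    ("Bo", "Cruz", "S", 0.50),
]
limits = {"P": HALF_MARATHON_LIMIT_H, "S": scaled_limit_hours(5.0)}  # 5 km is assumed
paired = best_times(results, limits)
print(paired)
```

Matching on names alone will occasionally merge two different people or split one person whose name is spelled inconsistently; the sketch makes no attempt to handle those cases.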
Exertion depends most strongly on distance, and the goal of this work is to tease out effects from other factors, which are often weaker. For this reason, distance is a confounding variable in this study and has been controlled for as much as possible by using races at a consistent distance, the half marathon (21.1 km), or distances that, taking extreme climbing into account, result in similar times. These are the distances at which the largest sample sizes are available for mountain trail races. Section 2.2 describes how the remaining inevitable variations in distance have been taken into account, as much as possible.

In Table 1, two measures of hilliness are given. The total elevation gain is the only parameter needed in order to calculate an energy expenditure using the function $C_g$. The next column gives a statistic I will refer to as the climb factor, CF, which is defined as the fraction of the runner's total energy expenditure that is devoted to climbing. That is, if $E$ is the actual energy required for the course, and $E_0$ the energy that would have been required if the race had been perfectly flat, then

$$\mathrm{CF} = \frac{E - E_0}{E}. \qquad (2)$$

Inverting this equation gives $E = E_0/(1 - \mathrm{CF})$, so when CF is known, a measure of effort can be found by dividing the distance by $1 - \mathrm{CF}$.

To define quantitative tests of the models, consider a comparison of courses 1 and 2. The observed data are the runner's times $t_1$ and $t_2$, and the model predicts the ratio of the energy consumption, $E_2/E_1$. The error $\mathcal{E}$ (Eq 3) compares the observed ratio of times with the predicted ratio of energies on a logarithmic scale. For small errors, $\mathcal{E}$ is approximately the relative error in the prediction, expressed as a percentage. The use of the logarithm transforms multiplicative sources of error into additive quantities.

We pick a feature of the model that is to be tested. For example, we would like to see whether the model does a good job of predicting the relative times for flat races compared to steep uphill-only races (Fig 3, c).
For this example, we make a list of courses that are relatively flat (P, C, H, and I), and a list of some that are steep uphill-only courses (B and V). We then find every case where the same runner did a run $j$ from the first list and a run $k$ from the second, and compute the error $\mathcal{E}_{jk}$, which will be positive if the runner's time in the uphill race $k$ is overpredicted by the model relative to their time in the flat race $j$.

The model of how speed varies with distance is essentially a simplification of the one constructed by Rapoport, [7] with modifications to suit these purposes.

First we compute an equivalent distance $d$, which is the distance of flat running that would require the same energy expenditure as the actual run. If the runner's time is $t$, then $v = d/t$ has dimensions of speed, but is in fact a measure of energy per unit time, or power. We then have a relation between $v$ and $P$ (Eq 4), where $P$ is the power and the accompanying coefficient is a measure of the runner's efficiency. For example, a recreational runner with a slight roll of belly fat will have a lower value of this coefficient because of the increased energetic cost of transporting the additional body weight. Although it would seem that we are now introducing an individualized parameter, the model is designed so that at the end of the calculation, cancellations occur that allow $\kappa$ to be predicted on a universal basis.

The power $P$ depends on aerobic fitness and on the proportions of fat and carbohydrates being burned in aerobic metabolism. Fat burning is slower than carbohydrate burning by a factor $\beta \approx 0.4$. [7] If we let $f$ be the fraction of energy production from carbohydrates, then $P$ is given by an expression linear in $f$ (Eq 5), where the proportionality constant $A$ is another per-individual parameter that it will be possible to normalize away later. This expression's linearity in $f$ is an approximation to results from real-world data that provide evidence for slightly nonlinear behavior.
[7]

The runner's supply of carbohydrates $c$ is limited by the amount of glycogen that can be stored in the liver and the leg muscles. If $f$ is chosen optimally, then there will be some distance $d_c$, determined by $c$, that can be run with pure carbohydrate fuel, while longer distances will require $f < 1$ (Eq 6). Under these assumptions, the runner's speed will be the same in races at all distances less than $d_c$, which is unrealistic. We will first work out the consequences of Eq 4-6 and then introduce a simple elaboration that more realistically reproduces the effects of fatigue.

Solving Eq 4-6 and expressing $\kappa$ as a correction factor relative to the short-distance maximum speed $v_m$, we obtain Eq 7. This depends on the universal parameter $\beta \approx 0.4$ and also on the critical distance $d_c$. The latter is a measure of endurance and does depend on individual factors such as body composition and training, as well as on strategies such as carbohydrate loading. However, for the sample of recreational athletes studied here, I hypothesize that one can fix a universal value of $d_c$ lying somewhere around the half-marathon distance, and find a reasonable description of real-world data.

Fig 2. Relative speed versus equivalent distance $d$. All speeds are normalized relative to the speed at half-marathon distance. The black curve is the function defined by Eq 8, with $d_c$ set to a half-marathon distance. The red curve is a fit to world-record times. [10] The green and red violin plots show the distribution of speeds in races S and G relative to the same runners in half-marathon race P (sample sizes 1303 and 11, respectively). The gray dots are the author's personal-record times from a variety of courses. The equivalent distances were determined from the horizontal distances using the curvilinear function $C_t(i)$ in Eq 10, which is based on treadmill data.
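The normalization used for the violin plots in Fig 2 can be illustrated with a short sketch: each runner's speed $v = d/t$ in one race, computed from the equivalent distance, is divided by the same runner's speed in the half-marathon. The code below is a hypothetical illustration only; the function name, the equivalent distances, and the sample times are assumptions and do not come from the paper's data.

```python
from statistics import median

def relative_speed(d_eq_km, t_hours, d_eq_ref_km, t_ref_hours):
    """Speed v = d/t in one race divided by the same runner's speed in a reference race.

    Distances are equivalent (flat-running) distances, so v is a proxy for power.
    """
    return (d_eq_km / t_hours) / (d_eq_ref_km / t_ref_hours)

# Assumed equivalent distances for a short race and a half-marathon (illustrative).
D_EQ_SHORT_KM = 5.0
D_EQ_HALF_KM = 21.1

# Made-up (short-race hours, half-marathon hours) pairs, one per runner.
runs = [(0.40, 1.90), (0.45, 2.10), (0.50, 2.40)]

ratios = [relative_speed(D_EQ_SHORT_KM, ts, D_EQ_HALF_KM, th) for ts, th in runs]
print(f"median relative speed: {median(ratios):.3f}")
```

A ratio above 1 means the runner was faster, per unit of equivalent distance, in the shorter race than in the half-marathon, which is the direction expected from fatigue.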
It is not true in reality that runners can maintain the same pace at any of the distances below $d_c$, for which glycogen suffices. As the distance increases from 5 km to the half-marathon distance of 21 km, one observes a decrease in speed which, as originally observed by Hill, [6] appears linear on a graph of speed versus the logarithm of distance. In the men's and women's world-record times, this decrease is about 5%. The graph then shows a knee, like the one described by Eq 7. The more gradual decrease for distances before the knee is generically described as being due to fatigue, which is a complicated and poorly understood phenomenon involving a variety of factors, many of which are mediated by the central nervous system rather than by any change at the chemical or tissue level. As an ad hoc correction, we multiply the result of Eq 7 by a factor controlled by a small parameter $Q$ (Eq 8). The factor of 3 in that expression is introduced so that $Q$ is approximately equal to the reduction in speed between a 5k and a half-marathon, and we set $Q = 0.05$.

Empirically, for the mostly recreational runners studied here, a reasonable description of the data is achieved when $d_c$ is set to the half-marathon distance, which is the value adopted in this work. Fig 2 shows that setting $d_c$ to half-marathon distance gives a good fit to some real-world data.

The comparisons in Fig 3 each test a particular feature of these models. We discuss each in turn.

(a) Here we compare the extremely flat half-marathon I, having only 90 m of elevation gain, to half-marathon P, which is slightly more hilly with 170 m of gain, or about twice as much. According to the treadmill-based model $C_t$, the effects of climbing and descending nearly cancel out, giving a negligible CF < 1% for each run, as expected from Baumel's argument.
In the gain-based model $C_g$, however, the effect of the hills on course P is 6 times its elevation gain, which is equivalent to adding 1.0 km to its length. The effect for I would be half as much, causing the model to predict a considerable difference in the times on the two courses. In the figure we see that Baumel's approximation is a good one here. The median error for $C_t$ (open circles) is only 1.7%, while that for $C_g$ (filled circles) is +6.4%, the positive sign showing that the effect of the small hills is over-predicted.

Of the four tests a-d, this is the only one where the effect being probed is small enough to require statistical analysis rather than simple visual inspection. Such an analysis (Appendix 4) shows that the systematic error in $C_g$ is significant ($p = 3 \times 10^{-6}$), while any such evidence against $C_t$ is statistically marginal.

Fig 3. Tests of the predictions of the functions $C_t$ derived from treadmill data (open circles), $C_g$ based on elevation gain (black circles), and the hybrid "recreational" model $C_r$ (gray circles). Positive $\mathcal{E}$ means that the runner's time in the first-listed race is greater in reality than in the model.

$|i| \approx 0.10$ to $0.15$. A likely interpretation is that on the uphills, $C_g$ is an underestimate (see c, below), while on the downhills $C_t$ is an underestimate. The race is run on a trail that is mostly a narrow single track, with steep hillsides on the climber's right. Safety is likely to inhibit many runners from going downhill at anything like the pace that would be possible for the elite mountain runners in ref. [2] on a treadmill, and trail etiquette dictates that they yield the right of way when encountering people who are still on their way up.

systematically underestimated the difficulty of the uphill races ($\mathcal{E} < 0$), the underestimate is far more severe for $C_g$ than for $C_t$.
Course B consists almost entirely of climbing on grades $0.05 < i < 0.25$, at which $C_g$ is less than $C_t$ and is apparently a

A graphical summary of the interpretation of the systematic errors in the models $C_t$ and $C_g$, observed in parts a, b, c, and d of Fig 3. Portions of the model that are interpreted as being inaccurate are circled and labeled with the test that provides the evidence for the inaccuracy. The interpretation is more conjectural for b, since both models had similar errors, but hypothetically for different reasons. The solid line is the treadmill-based function $C_t$ and the dashed one $C_g$. The dotted line is a modified version $C_r$ of the treadmill function, defined in Eq 9 and found here empirically to be more appropriate for recreational runners in real-world trail conditions.

The observations mainly support the model $C_t$, except that as downhill grades get steeper and steeper, it appears that most recreational runners in real-world conditions reach a point of diminishing returns far earlier than is the case in treadmill studies. The results for course X, which has an average $i \approx -0.05$, suggest that this point of diminishing returns is reached at relatively small negative slopes, perhaps $i \approx -0.03$. The factors causing this effect probably have as much to do with safety and etiquette as with physiology, so that they cannot be quantified in any universal way. However, it would be irresponsible to provide runners, especially recreational athletes, with scientific advice that would give an unrealistically rosy picture of the difficulty of a run.

Table 2 so as to shift the minimum of the function up and to the right. This was unsuccessful, because the smooth analytic character of Eq (10) makes it impossible, by varying its parameters, to dramatically modify the function's behavior for $-0.06 \lesssim i \lesssim -0.03$ while retaining its apparently correct behavior at $-0.03 \lesssim i \lesssim 0$.
A more successful ad hoc recipe was simply to introduce a cut-off in $C$, i.e., to define a "recreational" version of the function, $C_r$ (Eq 9).

It is convenient to describe the function $C(i)$ using a fit to the form given in Eq 10, denoted $C_t$, where the subscript t stands for treadmill. Parameters fitted to the results of ref. [2] are given in Table 2. The purpose of using this form, rather than the polynomial fit given by [2], is to make the computations degrade gracefully in cases where the limitations of GPS tracks or data from digital elevation models produce unrealistically steep slopes. In such cases, this expression approaches the physiologically expected asymptotic behavior. Although the present work focuses only on running, parameters for walking are presented as well. The results for running are empirically found to be nearly independent of speed, whereas the ones for walking are not. For walking, ref. [2] measured the energy consumption at the speed that was found to be most efficient for that particular subject.

Table 2. Parameters for Eq (10). These parameters were found by constraining Eq 10 to agree with the polynomial fits in ref. [2] on the following degrees of freedom: the function is minimized at the same $i$, and has the same value of $C$ there; the functions agree at $i = 0$. Furthermore, the slopes at $\pm\infty$ were constrained to have the asymptotic values found in that work.

Cameron [10] has given a convenient closed-form approximation to world-record speeds of runners at various distances (Eq 11). This is shown as the red curve in Fig 2. The parameters are given in Table 3.

Table 3. Parameters for Eq (11), for $d$ in meters.

Digital maps projected into a horizontal plane were obtained from the race organizers' web site or in some cases by tracing roads and trails in a Google Maps application. Elevation data were obtained from publicly available digital elevation models (SRTM1) having a horizontal resolution of 30 meters.
(Elevation data from handheld GPS/GNSS units are more difficult to obtain from public sources, and are in any case of questionable reliability for this purpose, since the uncertainty can be very large when all satellites are near the horizon or when the terrain is rough, causing radio echoes from the walls of canyons.)

The use of these data is inherently subject to certain errors, which need to be minimized. Trails and roads are intentionally constructed so as not to go up and down steep hills, but the digital elevation model (DEM) may not accurately reflect this. The most common situation seems to be one in which a trail or road takes a detour into a narrow gully in order to maintain a steady grade. If the gully is narrower than the horizontal resolution of the DEM, then the DEM doesn't know about the gully, and the detour appears to be a steep excursion up and then back down the prevailing slope.

Empirically, I have found that sensitivity to these effects can be minimized if the elevation profile of the run $y(x)$ is filtered by convolving it with a rectangular windowing function having width $w = 200$ meters. This tends to eliminate unrealistic glitches in the elevation data, and also seems to give a fairly close reproduction of race organizers' estimates of total elevation gain. This choice of $w$ gives sane results for routes in mountainous terrain, and is used throughout this work, even for flat courses on city streets. For a course that is relatively flat and has many small, short hills, $w \approx 60$ m gives more accurate results, but I have used the larger value of $w$ throughout this work in an effort to maintain consistency.

The mileage derived from a GPS track can vary quite a bit depending on the resolution of the GPS data. Higher resolution increases the mileage, because small wiggles get counted in.
This has a big effect on the energy calculation, because the energy is mostly sensitive to mileage, not gain. For races that were advertised as 5k or half-marathon races, I have therefore used the advertised distance, as shown in Table 1, in order to calculate the first-order estimate of the energy, but have used the elevation gain and CF value derived from the actual GNSS data.

Appendix 4: Statistical analysis

In section 3, test (a) probes an effect small enough that visual inspection of the scatter plots is not a satisfactory way of testing hypotheses. Specifically, we want to know whether the apparent systematic error in the model $C_g$ is statistically consistent with zero.

We do not know a priori the underlying probability distribution of the ratio of times or of its logarithm $\mathcal{E}$. One might have expected based on previous work [5] that the times would be log-normal, in which case $\mathcal{E}$ would be normally distributed. However, a Q-Q plot shows that this is not the case for the present data-set, and in fact the distribution of $\mathcal{E}$ is asymmetric. The ratio of times, however, has a symmetric and leptokurtic distribution. Its symmetry allows the use of the one-sample Wilcoxon test. For $C_g$ the null hypothesis is rejected with $p = 4 \times 10^{-6}$, while for $C_t$, $p = 0.07$. Thus the defect in $C_g$ is significant, while any such evidence against $C_t$ is statistically marginal.

The compilation of race times was derived from public sources and is itself publicly available at https://github.com/bcrowell/trail. The code used to analyze the data is contained in that repository and in https://github.com/bcrowell/kcals. All code is under a GPL license. I have also used Zenodo to assign a DOI to the data.

[1] Hill effect to second order. Measurement News.
[2] Energy cost of walking and running at extreme uphill and downhill slopes.
[3] The valid measurement of running economy in runners.
[4] The key to top-level endurance running performance: a unique example.
[5] Comparing and forecasting performances in different events of athletics using a probabilistic model.
[6] The physiological basis of athletic records.
[7] Metabolic factors limiting performance in marathon runners.
[8] Human running performance from real-world big data.
[9] Computer generated track scoring tables.
[10] Time-equivalence Model.