Thursday, October 1, 2015

Why doesn't Strava do proper interpolation? And a proposed solution!

Speculation first...

This is a follow-up to my previous post about Strava's segment timing. Since they are clearly aware that their method is crude and inaccurate, I'm wondering why they have chosen not to do anything about it. Here's my short list of how they might reason against it:

  1. Additional computational load on their servers (to do proper interpolation)
  2. GPS accuracy isn't good enough in the first place
  3. People losing KOMs after re-calculation would become upset, and would complain
  4. Most segments are long enough that the error will be small compared to the time spent in the segment
  5. They don't know how
Well, the last point is a joke: doing this properly is rather simple, and I can't imagine that their engineers are incapable of such a calculation. Still, I'll provide my proposed solution further down, just in case...

So, let's start at the top of the list. Computational load. Sure, if they were to queue a re-calculation of all the entries in the leaderboards for all the segments, then I can imagine that it would be quite expensive in CPU time. However, that's not really necessary, which I'll get back to shortly.

GPS accuracy. Everyone knows that GPS accuracy varies, both between devices and with environmental conditions. However, dynamic positioning (i.e. at speed) is usually better than static positioning, since the GPS unit measures the Doppler shift of the received signals and can therefore estimate velocity more accurately (and thus the change in position between fixes). I won't claim to be an expert in this area, but suffice it to say that, in my opinion, GPS accuracy is good enough for interpolating segment start/finish crossings to make perfect sense.

People losing KOMs would become upset. Probably, but my suggestion (wait for it...) would minimize this impact, I think.

Segment length. Yes, for longer segments the error introduced by Strava's current method is small compared to the total segment time, but the absolute error does not shrink with segment length. This means that for segments where the fight for the KOM is fierce and efforts are close to each other, the current method can still put the leaderboard in the wrong order, even if the segment is long.
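A toy illustration of why the absolute error matters. It assumes, as a simplification that may not match Strava's exact rule, that fixes are recorded once per second and that each crossing is timed at the first fix at or after the true crossing; `snapped_time` is a hypothetical helper, not anything Strava exposes.

```python
import math

def snapped_time(true_start: float, true_finish: float, interval: float = 1.0) -> float:
    """Elapsed time when both crossings are snapped to the first fix at or
    after the true crossing (a simplified, hypothetical timing model)."""
    start_fix = math.ceil(true_start / interval) * interval
    finish_fix = math.ceil(true_finish / interval) * interval
    return finish_fix - start_fix

# Effort A truly takes 599.4 s; effort B truly takes 599.8 s (0.4 s slower).
a_true, a_snapped = 600.3 - 0.9, snapped_time(0.9, 600.3)
b_true, b_snapped = 599.9 - 0.1, snapped_time(0.1, 599.9)
print(round(a_true, 1), a_snapped)  # 599.4 600.0
print(round(b_true, 1), b_snapped)  # 599.8 599.0
```

Under this model the slower effort B ends up a full second ahead of A, and nothing in the calculation depends on how long the segment is: the same swap would happen on a 60 s segment.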

A possible solution

The first thing for Strava to do would of course be to actually implement proper segment time calculation. This requires virtual start and finish lines. All that is needed for each line is a position and a heading: the line is drawn perpendicular to the heading and extends a suitable distance to both sides, with the given position (one of the segment ends) at its center, like so:

Finish line illustration
The red line is the segment track, the green dot is at the segment end position, the green arrow is the heading/bearing of the segment track at the end point (i.e. an extension of the line between the last two data points in the segment track), and the blue line is the finish line with some defined width. It needs to extend some distance to both sides to ensure a match; in this example it has a total width of 50 m (164 ft), i.e. 25 m to each side. If an effort does not cross the line, it should not be matched to the segment, just as an effort isn't matched today if none of its data points is close enough to the segment end point.
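A minimal sketch of building such a finish line and testing whether an effort crosses it. It assumes the GPS positions have already been projected to local planar coordinates in meters (x = east, y = north) and that headings are compass bearings in degrees; the function names are hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # local planar coordinates in meters (x = east, y = north)

def heading_deg(p_prev: Point, p_end: Point) -> float:
    """Compass bearing (0 = north, clockwise) from p_prev to p_end."""
    return math.degrees(math.atan2(p_end[0] - p_prev[0], p_end[1] - p_prev[1])) % 360.0

def finish_line(end: Point, heading: float, width: float = 50.0) -> Tuple[Point, Point]:
    """Endpoints of a line of total length `width`, centered on `end`,
    perpendicular to `heading`."""
    theta = math.radians(heading)
    # The heading vector is (sin t, cos t); rotating it by 90 degrees gives (cos t, -sin t).
    px, py = math.cos(theta), -math.sin(theta)
    half = width / 2.0
    return ((end[0] - px * half, end[1] - py * half),
            (end[0] + px * half, end[1] + py * half))

def crossing_fraction(p0: Point, p1: Point, a: Point, b: Point) -> Optional[float]:
    """Fraction u in [0, 1] along the step p0->p1 at which it crosses the
    line a-b, or None if the step does not cross it."""
    rx, ry = p1[0] - p0[0], p1[1] - p0[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # step is parallel to the line (or has zero length)
    qx, qy = a[0] - p0[0], a[1] - p0[1]
    u = (qx * sy - qy * sx) / denom  # position along the effort's step
    v = (qx * ry - qy * rx) / denom  # position along the finish line
    return u if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 else None
```

For a segment ending at (0, 0) and heading due north, the line runs from (-25, 0) to (25, 0). A step from (3, -2) to (4, 3) crosses it 40% of the way along (u = 0.4), while a step 30 m off to the side returns None and would not match.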

There is one problem with this approach: segments created from shoddy GPS data, especially when the heading at the end is obviously wrong. Strava could overcome this either by leveraging their own Slide technology or simply by taking the median heading of the other efforts. Either way, the virtual line would end up perpendicular to the direction people actually ride at that spot. Here's how that would work:

Finish line corrected

The red track is the segment track, and unfortunately it ends with a heading that is not representative of how people ride. I haven't bothered to offset the finish line sideways in this example, so in this case two of the efforts would not actually cross the line and would therefore not be matched. This could easily be fixed, of course, either by extending the line further or by offsetting it based on the trajectory of the other efforts.
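A sketch of both corrections, using only the other efforts. It assumes their headings near the segment end, and the points where they pass the end, have already been extracted in local planar coordinates (meters); the function names are hypothetical. Because compass headings wrap around at 360 degrees, the median is taken over signed offsets from the circular mean, which is one reasonable way to define a "median heading" and not necessarily how Strava would do it.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Point = Tuple[float, float]  # local planar coordinates in meters (x = east, y = north)

def median_heading(headings: Sequence[float]) -> float:
    """Median compass heading in degrees that copes with the 359/1 wrap-around.

    Each heading is expressed as a signed offset from the circular mean, the
    ordinary median of those offsets is taken, and the result is mapped back
    to [0, 360).
    """
    rad = [math.radians(h) for h in headings]
    mean = math.degrees(math.atan2(sum(math.sin(r) for r in rad),
                                   sum(math.cos(r) for r in rad)))
    offsets = [((h - mean + 180.0) % 360.0) - 180.0 for h in headings]
    return (mean + median(offsets)) % 360.0

def corrected_center(end: Point, heading: float, passes: Sequence[Point]) -> Point:
    """Shift the line center sideways to where the other efforts actually pass.

    `passes` are points where other efforts cross (or come closest to) the
    segment end; their offsets along the line direction are measured and the
    center is moved by the median offset.
    """
    theta = math.radians(heading)
    px, py = math.cos(theta), -math.sin(theta)  # unit vector along the line
    offsets = [(p[0] - end[0]) * px + (p[1] - end[1]) * py for p in passes]
    shift = median(offsets)
    return (end[0] + px * shift, end[1] + py * shift)
```

For headings of 358, 2, 1, 359 and 40 degrees, `median_heading` returns about 1, whereas a naive median of the raw numbers would give 40. The median (rather than the mean) keeps a single odd effort from dragging the line around.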

Now, you could argue that the segment creator wanted the finish to be this way, and thus it shouldn't be corrected, but since Strava's current matching method matches these efforts to the segment anyway, that's pretty much a moot point. It is also easily avoided by being careful when creating the segment in the first place. Strava could even check this during segment creation and notify the user if the new segment would suffer from this issue. They wouldn't even need to check other rides; it should be sufficient to validate that the last few (and first few) data points of the to-be-created segment are either more or less in line or follow a continuous curve.
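A minimal sketch of such a creation-time check, assuming the last (or first) few points are given in local planar coordinates in meters. It accepts points that are roughly in line or that turn by a steady amount (a continuous curve), and rejects a sudden kink. The function names and both thresholds are hypothetical placeholders, not values from Strava.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # local planar coordinates in meters (x = east, y = north)

def _bearing(a: Point, b: Point) -> float:
    """Compass bearing in degrees from a to b."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1]))

def _turn(h1: float, h2: float) -> float:
    """Signed change of heading from h1 to h2, normalized to [-180, 180)."""
    return (h2 - h1 + 180.0) % 360.0 - 180.0

def end_is_consistent(points: Sequence[Point], max_turn: float = 30.0,
                      max_turn_change: float = 15.0) -> bool:
    """True if the points are roughly in line or follow a smooth curve.

    `max_turn` caps the heading change between consecutive steps, and
    `max_turn_change` caps how much that change may vary from step to step,
    so a steady bend passes while an abrupt kink does not.
    """
    if len(points) < 3:
        return True  # too few points to judge
    headings = [_bearing(a, b) for a, b in zip(points, points[1:])]
    turns = [_turn(h1, h2) for h1, h2 in zip(headings, headings[1:])]
    if any(abs(t) > max_turn for t in turns):
        return False
    return all(abs(t2 - t1) <= max_turn_change for t1, t2 in zip(turns, turns[1:]))
```

A straight run, or a bend turning about 10 degrees per step, passes; a track that runs north and then jinks 80 degrees sideways on its final step fails, which is when the user would get a notice.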

The actual line crossing would be calculated by assuming constant acceleration between the data points immediately before and after the line crossing, and then simply finding the distance between the point before the line (or after, in the case of the starting line) and the line itself. It is then trivial to calculate the time needed to cover this distance. In my opinion, it would be suitable to round the result to the nearest tenth of a second, so KOM ties would still be possible, just a lot less likely.
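A sketch of the constant-acceleration calculation under stated assumptions: `v0` is the speed at the fix before the line (for example the device-reported speed), `step_dist` is the straight-line distance between the two fixes, and `dist_to_line` is the distance from the first fix to the crossing point along that step (e.g. the crossing fraction from a segment-intersection test times `step_dist`). The function names are hypothetical.

```python
import math

def crossing_time(t0: float, t1: float, step_dist: float,
                  dist_to_line: float, v0: float) -> float:
    """Timestamp at which the line is crossed between fixes at t0 and t1.

    Assumes constant acceleration over the step. The acceleration a follows
    from step_dist = v0*dt + a*dt**2/2, and the crossing time t from
    dist_to_line = v0*t + a*t**2/2.
    """
    dt = t1 - t0
    if dt <= 0.0 or step_dist <= 0.0:
        return t0
    # Clamp v0 so the model never implies stopping and riding backwards mid-step.
    v0 = min(max(v0, 0.0), 2.0 * step_dist / dt)
    a = 2.0 * (step_dist - v0 * dt) / dt ** 2
    s = math.sqrt(max(v0 * v0 + 2.0 * a * dist_to_line, 0.0))
    # Smallest non-negative root of a*t**2/2 + v0*t - dist_to_line = 0, written
    # as 2d / (v0 + s) so it stays finite when a == 0 (it then reduces to d / v0).
    t = 2.0 * dist_to_line / (v0 + s) if v0 + s > 0.0 else 0.0
    return t0 + t

def segment_time(start_crossing: float, finish_crossing: float) -> float:
    """Elapsed time between the two crossings, rounded to the nearest 0.1 s."""
    return round(finish_crossing - start_crossing, 1)
```

For fixes at 10 s and 11 s, 8 m apart, with 7 m/s at the first fix and the line 4 m into the step, the rider is accelerating and `crossing_time(10.0, 11.0, 8.0, 4.0, 7.0)` gives about 10.531 s, slightly later than the 10.5 s a plain linear interpolation would give. Working with crossing timestamps means the same function serves both the start and the finish line; the segment time is simply the difference.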

But what about angry people losing their KOMs?


If Strava implements the method outlined here, it is unavoidable that there will be changes to the leaderboards. However, there is no need to go ahead and recalculate all at once. Instead, I propose that when a new segment effort is matched, Strava simply recalculates a suitable number of efforts above and below the new matched effort (say 5 above and 5 below), ensuring that this new effort competes fairly with the relevant efforts.
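A minimal sketch of the partial recalculation, assuming a leaderboard kept as a list sorted by time. `Effort`, `add_effort` and `recalc` are hypothetical names; `recalc` stands in for re-running the interpolated timing on an effort's raw GPS data.

```python
import bisect
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(order=True)
class Effort:
    time: float                          # elapsed seconds; the only sort key
    athlete: str = field(compare=False)

def add_effort(board: List[Effort], new: Effort,
               recalc: Callable[[Effort], float], window: int = 5) -> List[Effort]:
    """Insert `new` into the sorted `board` and re-time only its neighbors.

    The new effort is timed with the new method, inserted in order, and up to
    `window` efforts above and below it are re-timed before the board is
    re-sorted. Everything outside the window keeps its stored time.
    """
    new.time = recalc(new)
    i = bisect.bisect_right(board, new)
    board.insert(i, new)
    lo, hi = max(0, i - window), min(len(board), i + window + 1)
    for effort in board[lo:hi]:
        if effort is not new:
            effort.time = recalc(effort)
    board.sort()
    return board
```

The cost per new effort is bounded by the window size rather than by the size of the leaderboard, which is what keeps the server load down; a wider window trades more recalculation for fewer stale neighbors.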

This method effectively kills two birds with one stone: Strava avoids a massive load on their servers, and the leaderboards will change where necessary. People will only lose a KOM when someone either beats their time or comes close enough to trigger a recalculation of the current KOM. And of course the KOM will only be lost if it isn't actually the legitimate one.

I'm convinced that if Strava communicates such a change properly, people would understand and accept it.

Making segments prettier

I would even suggest that Strava start using their aforementioned Slide technology on segment tracks. That would serve two purposes: segment matching (i.e. the matching of the entire segment, not just the end points) would become more accurate, and the segment display would be visually more pleasing, with smooth lines and curves. The segment distance would then also change, but the segment timing would not be affected.
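Slide itself is proprietary, so the sketch below swaps in a plain centered moving average purely to illustrate the effect on distance; it is not how Slide works. It assumes points in local planar coordinates (meters), and `smooth` and `length` are hypothetical helpers. The two end points are kept fixed, so the start and finish lines, and therefore the timing, stay where they were.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # local planar coordinates in meters

def smooth(points: Sequence[Point], k: int = 1) -> List[Point]:
    """Centered moving average over up to 2k+1 points; the end points stay fixed."""
    out = [points[0]]
    for i in range(1, len(points) - 1):
        lo, hi = max(0, i - k), min(len(points), i + k + 1)
        win = points[lo:hi]
        out.append((sum(p[0] for p in win) / len(win),
                    sum(p[1] for p in win) / len(win)))
    out.append(points[-1])
    return out

def length(points: Sequence[Point]) -> float:
    """Total polyline length in meters."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

On a jittery but essentially straight 40 m track such as (0, 0), (10, 2), (20, -2), (30, 2), (40, 0), smoothing removes most of the zig-zag and the measured distance drops from roughly 41.9 m to roughly 40.0 m, while both end points are unchanged.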

This would just be the icing on the cake though, and a lot less important than implementing proper segment time calculation.

1 comment:

  1. Yeah, the other thing this would solve is segments which are lanes in a multi lane course :-)