Sunday, January 7, 2018

The so-called rift in the geocaching community on Haugalandet

The original text of this post has now been removed by request (from several quarters). If this community is to heal, sharp words have to be ground down, and in the hope that everyone can put old conflicts and wounds behind them, the text has been taken down.

As an indirect third party, I strongly wish that everyone can now accept that a conflict never arises from one side alone, and that no party is right about everything. What has been done and said can be neither undone nor unsaid, but it can be allowed to slide quietly backwards into the bottomless darkness of history. That is not to say that all wounds will heal immediately, but in time they will. If nobody picks at them.

This can only work if everyone chooses to look ahead, so my appeal is that everyone involved makes a very conscious effort to actually put this behind them. The moment someone chooses (and yes, it is a choice) to deviate from this, the conflict is suddenly alive again, and this community neither needs nor deserves that!

It is not as if everyone necessarily has to become best friends (the most obvious statement of all time...), but everyone must be able to treat each other with ordinary courtesy and good manners, and then of course you choose who you spend most of your time with.

Thursday, October 1, 2015

Why doesn't Strava do proper interpolation? And a proposed solution!

Speculation first...


This is a follow-up to my previous post about Strava's segment timing, and since they are clearly aware that their method is crude and inaccurate, I'm wondering why they have chosen not to do anything about it. Here's my short list of how they might reason against it:

  1. Additional computational load on their servers (to do proper interpolation)
  2. GPS accuracy isn't good enough in the first place
  3. People losing KOMs after re-calculation would become upset, and would complain
  4. Most segments are long enough that the error will be small compared to the time spent in the segment
  5. They don't know how

Well, the last point was a joke, because it is rather simple to do this properly, and I can't imagine that their engineers are incapable of such a simple calculation. But still, I'll provide my proposed solution further down, just in case...

So, let's start at the top of the list. Computational load. Sure, if they were to queue a re-calculation of all the entries in the leaderboards for all the segments, then I can imagine that it would be quite expensive in CPU time. However, that's not really necessary, which I'll get back to shortly.

GPS accuracy. Everyone knows that GPS accuracy varies, both between devices and with environmental conditions. However, dynamic positioning (i.e. at speed) is usually better than static positioning, since the GPS unit measures the Doppler shift of the signals it receives, and can then better estimate speed (and thus differential positioning). I will not claim to be an expert in this area, but suffice it to say that in my opinion, GPS accuracy is good enough for interpolation of segment start/finish crossings to make perfect sense.

People losing KOMs would become upset. Probably, but my suggestion (wait for it...) would minimize this impact, I think.

Segment length. Yes, the error introduced by Strava's current method is small compared to the total segment time for longer segments, but the absolute error is unaffected. This means that for segments where the fight for the KOM is fierce and efforts are close to each other, the current method can still cause leaderboards to have the wrong order, even if the segment is long.

A possible solution


The first thing for Strava to do would of course be to actually implement proper segment time calculation. To do this, virtual start and finish lines are needed. All you need for that is a position and a heading, and then use the given heading to create a line that is perpendicular to this, and which extends a suitable distance to both sides. The given position (which is one of the segment ends) will then be at the center of this line, like so:

Finish line illustration
The red line is the segment track, the green dot is at the segment end position, the green arrow is the heading/bearing of the segment track at the end point (i.e. it is an extension of the line between the two last data points in the segment track), and the blue line is the finish line with some defined width. It needs to extend a few meters to both sides to ensure a match; in this example it has a total width of 50 m (164 ft). If an effort does not cross the line, it should not be a match for the segment, just as it is today if no data point is close enough to the segment end point.
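A minimal sketch of that construction in Python, assuming positions have already been projected to a local flat frame in metres (x east, y north) and that headings are degrees clockwise from north. The name `finish_line` and its parameters are made up for illustration; this is not Strava's code.

```python
import math

def finish_line(x, y, heading_deg, width_m=50.0):
    """Return the two endpoints of a line centred on (x, y), perpendicular to the heading.

    Assumptions of this sketch: coordinates are metres in a local flat frame
    (x east, y north) and the heading is degrees clockwise from north.
    """
    h = math.radians(heading_deg)
    # The unit vector along the heading is (sin h, cos h). Rotating it by
    # 90 degrees gives the direction of the line itself.
    px, py = math.cos(h), -math.sin(h)
    half = width_m / 2.0
    return (x - half * px, y - half * py), (x + half * px, y + half * py)

# A segment ending at the origin with the track heading due north gives a
# 50 m line running east-west through the end point.
left, right = finish_line(0.0, 0.0, 0.0)
```

The default width of 50 m matches the example in the illustration; a real implementation would probably want to tune it per segment.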

There is one problem with this approach, and that is with segments created from shoddy GPS data, especially when the heading is obviously wrong. Strava could overcome this either by leveraging their own Slide technology, or by simply taking the median heading of the other efforts. Either way, the result is that the virtual line will be perpendicular to the actual direction that people ride in this spot. Here's how that would work:

Finish line corrected

The red track is the segment track, and unfortunately it ends with a heading that is not representative for how people ride. I haven't bothered to offset the finish line sideways in this example, so in this case two of the efforts would actually not cross the line and be matched. This could be easily fixed of course, either by extending the line further, or by offsetting it based on the trajectory of the other efforts.
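Taking a median of headings needs a little care, because 359° and 1° are neighbours. One possible way to do it, sketched below, is to measure every heading as a signed deviation from the circular mean and take the ordinary median of those deviations. The function name and the approach are my own assumptions for illustration, and it only makes sense when the headings are reasonably clustered.

```python
import math
from statistics import median

def median_heading(headings_deg):
    """Median of compass headings (degrees), tolerant of the 359 -> 0 wrap-around."""
    # Circular mean, used only as a reference direction.
    s = sum(math.sin(math.radians(h)) for h in headings_deg)
    c = sum(math.cos(math.radians(h)) for h in headings_deg)
    ref = math.degrees(math.atan2(s, c))
    # Signed deviation of each heading from the reference, in [-180, 180).
    devs = [((h - ref + 180.0) % 360.0) - 180.0 for h in headings_deg]
    return (ref + median(devs)) % 360.0

# Five efforts heading roughly north and one outlier heading east.
print(median_heading([350, 355, 0, 5, 10, 90]))
```

The outlier at 90° shifts the mean noticeably but barely moves the median, which is the reason to prefer a median here.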

Now, you could argue that the segment creator wanted the finish to end this way, and thus it shouldn't be corrected, but since Strava's current matching method also matches the segment, that's pretty much a moot point. And easily avoided by being careful when creating the segment in the first place. It would also be possible for Strava to check this during segment creation, and give the user a notice if the created segment would suffer from this issue. They wouldn't even need to check other rides, it should be sufficient to validate that the last few (and first few) data points in the to-be-created segment either are more or less in line, or that they follow a continuous curve.
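Such a creation-time check could be as simple as looking at how sharply the track turns between consecutive legs near the end. The sketch below assumes points in a local flat frame in metres (x east, y north); the function names, the number of points and the 20° threshold are all hypothetical values picked for illustration.

```python
import math

def leg_headings(points):
    """Heading (degrees clockwise from north) of each leg between consecutive points."""
    return [math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def ends_cleanly(points, n=5, max_turn_deg=20.0):
    """True if the last n points are roughly in line or follow a smooth curve.

    Note: repeated (identical) points are not handled in this sketch.
    """
    hs = leg_headings(points[-n:])
    # Absolute turn between consecutive legs, wrapped to 0..180 degrees.
    turns = [abs(((b - a + 180.0) % 360.0) - 180.0) for a, b in zip(hs, hs[1:])]
    return all(t <= max_turn_deg for t in turns)

straight = [(0, 0), (0, 10), (0, 20), (0, 30), (0, 40)]
kinked = [(0, 0), (0, 10), (0, 20), (0, 30), (10, 30)]  # sharp right turn at the end
print(ends_cleanly(straight), ends_cleanly(kinked))
```

The same check applied to the first few points would cover the start of the segment.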

The actual line crossing would be calculated by assuming constant acceleration between the data points immediately before and after the line crossing, and then simply finding the distance between the point before the line (or after, in the case of the starting line) and the line itself. It is then trivial to calculate the time for this distance. In my opinion, it would be suitable to round the result to the nearest 1/10th of a second, so it would still be possible for a KOM to be tied, just a lot less likely.
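To make the idea concrete, here is a rough sketch of the crossing calculation. Note that it simplifies the description above: it assumes constant speed between the two data points (plain linear interpolation) rather than constant acceleration, since that needs nothing but positions and timestamps. Coordinates are assumed to be metres in a local flat frame, and the name `crossing_time` is made up for this example.

```python
def crossing_time(p0, t0, p1, t1, a, b):
    """Interpolate the time at which the path p0 -> p1 crosses the line a -> b.

    p0, p1, a, b are (x, y) tuples in metres; t0 and t1 are timestamps in seconds.
    Returns None if the path between the two points does not cross the line.
    Assumption: constant speed between p0 and p1.
    """
    rx, ry = p1[0] - p0[0], p1[1] - p0[1]  # rider movement between the points
    sx, sy = b[0] - a[0], b[1] - a[1]      # direction of the start/finish line
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # riding parallel to the line (or not moving)
    qx, qy = a[0] - p0[0], a[1] - p0[1]
    u = (qx * sy - qy * sx) / denom  # fraction of the way from p0 to p1
    v = (qx * ry - qy * rx) / denom  # fraction of the way from a to b
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None  # missed the line, or crossed outside this pair of points
    return t0 + u * (t1 - t0)

# A 50 m finish line through the origin, and a rider logged 10 m before
# and 5 m after it, one second apart.
t = crossing_time((0, -10), 100.0, (0, 5), 101.0, (-25, 0), (25, 0))
print(round(t, 1))
```

The segment time is then the finish crossing time minus the start crossing time, rounded to a tenth of a second as suggested above.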

But what about angry people losing their KOMs?

 

If Strava implements the method outlined here, it is unavoidable that there will be changes to the leaderboards. However, there is no need to go ahead and recalculate all at once. Instead, I propose that when a new segment effort is matched, Strava simply recalculates a suitable number of efforts above and below the new matched effort (say 5 above and 5 below), ensuring that this new effort competes fairly with the relevant efforts.
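A sketch of how such a partial recalculation could look, assuming the leaderboard is a list of (time, effort id) pairs sorted by time, and that `recalc` is some function returning the properly interpolated time for an effort. Both the data layout and the names are hypothetical; the times are the ones from my own experiment in the previous post, converted to seconds.

```python
import bisect

def add_effort(board, effort_id, accurate_time, recalc, window=5):
    """Insert a new effort and re-time only its nearest neighbours.

    board: list of (time_s, effort_id) sorted by time (possibly old-method times).
    recalc: callable mapping an effort id to its interpolated time in seconds.
    """
    times = [t for t, _ in board]
    i = bisect.bisect_left(times, accurate_time)  # where the new effort lands
    lo, hi = max(0, i - window), min(len(board), i + window)
    refreshed = [(recalc(eid), eid) for _, eid in board[lo:hi]]
    updated = board[:lo] + refreshed + board[hi:] + [(accurate_time, effort_id)]
    return sorted(updated)

board = [(167.0, "m1"), (170.0, "m2"), (171.0, "m3"), (175.0, "genuine"), (177.0, "m5")]
accurate = {"m1": 175.17, "m2": 175.25, "m3": 175.13, "genuine": 174.97, "m5": 175.10}

# A new effort of 2:55.0 arrives; only one neighbour on each side is re-timed.
print(add_effort(board, "new", 175.0, accurate.get, window=1))
```

With `window=1` the two manipulated efforts at the top keep their old times until somebody gets close to them, which is exactly the gradual behaviour described above.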

This method effectively kills two birds with just one stone; Strava avoids a massive load on their servers, and the leaderboards will change as necessary. People will only lose KOMs whenever someone either beats their time, or comes close enough to cause a recalculation of the current KOM. And the KOM will of course only be lost if it actually isn't the legitimate KOM.

I'm convinced that if Strava communicates such a change properly, people would understand and accept it.

Making segments prettier


I would even suggest that Strava starts using their aforementioned Slide technology on segment tracks. That would serve two purposes: segment matching (i.e. the matching of the entire segment, not just the end points) would become more accurate, and the segment display would be visually more pleasing, with smooth lines and curves. The segment distance would then also change, but the segment timing would not be affected.

This would just be the icing on the cake though, and a lot less important than implementing proper segment time calculation.


Sunday, September 27, 2015

Why Strava segment matching and timing sucks (and how to cheat)

Intro

I enjoy Strava segments. I find them motivating and fun, and even if it should be considered just a game (and not serious competition), the fact is that there is some prestige to be had for holding the KOM on certain segments. However, Strava doesn't really try to get accurate segment times, so if you lost your KOM (or gained a KOM) by a small margin, then there is a chance that the result is wrong. This is of course true also further down the leaderboard, so if your mate beat you by a small margin, there is a chance that in reality he didn't.

Strava's segment timing

The reason for this is no secret: Strava makes no effort to calculate your actual segment time, they just choose the data point from your ride that is closest to the start (or end) point of the segment as the basis for their timing. Here's what Strava says:
Recording intervals vary between devices - for example, the Strava mobile app records every second while Garmin devices use either 1-second intervals or a smart recording which has a varied recording interval. Segment matching works the same on each GPS dataset, but depending on the device's recording interval, can yield different results. Segment matching uses the GPS points in the data closest to the start and endpoints of the segment, and as this can vary with each activity, timing on a segment can vary slightly because of this. At the present time, we don't interpolate or extrapolate GPS data to normalize for the exact start and end positions of the segment.
The highlight is mine, and it summarizes the issue in just one sentence. Strava also notes that Garmin devices have a recording mode called 'Smart Recording', which is a dynamic mode where the unit decides (depending on a few metrics) if it should write a data point to the log or not. If you ride in a straight line, the unit will typically increase the recording interval, waiting a few seconds between log points. Now, if a segment starts on a straight section, devices with smart recording (or otherwise a long recording interval) will most likely miss the segment start point by quite a large margin. The same is of course also true for the segment end point, and armed with the knowledge that Strava just chooses the closest point, we can see that there are four possible scenarios:

  1. The log start point is before segment start and the log end point is after segment end
  2. The log start point is before segment start and the log end point is before segment end
  3. The log start point is after segment start and the log end point is after segment end 
  4. The log start point is after segment start and the log end point is before segment end

For segment timing, scenario 1 will yield the worst segment time, while scenario 4 will give the best segment time. As for the two in the middle, whether you gain or lose all boils down to the actual distances by which you miss the segment start and end. For example, if your log start point is just before segment start, and the end point is quite a distance before segment end, then you clearly gain.
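The effect is easy to reproduce with a toy example. The sketch below is my reading of the closest-point method from Strava's description, not their actual code, and the track is synthetic: a rider doing a steady 10 m/s along a straight road, with the segment running from 14 m to 96 m, so the true segment time is 8.2 s.

```python
import math

def nearest_point_time(track, start, end):
    """Segment time using only the logged points closest to the start and end.

    track: list of (t, x, y) with t in seconds and x, y in metres.
    """
    t_start = min(track, key=lambda p: math.dist(p[1:], start))[0]
    t_end = min(track, key=lambda p: math.dist(p[1:], end))[0]
    return t_end - t_start

# Synthetic ride: 10 m/s along the x axis, logged once per second.
track = [(float(t), 10.0 * t, 0.0) for t in range(13)]
thinned = track[::2]  # the same ride, logged every 2 seconds

start, end = (14.0, 0.0), (96.0, 0.0)
print(nearest_point_time(track, start, end))    # scenario 1: early start, late finish
print(nearest_point_time(thinned, start, end))  # scenario 3: late start, late finish
```

The same ride comes out a full second apart depending on nothing but which points happened to be logged.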

An example

I guess you might have become bored if you've read this far, so here's a visualization from a random segment (but one that I'm familiar with):

Segment start

This is the start of the segment, and the segment track itself is shown in red (I've added a virtual starting line perpendicular to the segment track). The colored tracks are the efforts of the top 10 from the segment leaderboard, and as you can see (if you look closely), there are 4 efforts that start early, and 6 efforts that start late. The yellow rider is the one that gains the most, and his time effectively starts approx 17 m (56 ft) into the segment, gaining him ~1.2 seconds in this case.




Segment end


This is the end of the same segment. Again, if you look closely, you can see that half the efforts end early, and the other half overshoot the finish. The green rider's effort ends approx 19 m (62 ft) after the segment end, causing him to lose ~1.4 s. The yellow rider's effort ends just before the finish, gaining him a couple of tenths, so in total he gained ~1.3 seconds. He was clearly using smart recording, because for this effort his device logged one data point every 3.1 seconds on average. The yellow rider's effort was a case of scenario 4, as listed above.

The green rider's effort was a case of scenario 3, and even though he lost some on the finish, he gained some at the start, so in total he gained just a couple of tenths.

How much error does Strava allow (and how to cheat)

I seem to recall that Strava requires your ride to match the segment 75% of the time, but I can't find any accurate references to this at the moment. If true, it means you could pass the segment start, take a shortcut somewhere along the way, then pass the segment end and still get a segment time. I'm a bit skeptical that this still holds true, as I think perhaps Strava has improved this.

However, let's get back to the issue at the start and end of segments. We know that Strava chooses the closest data point, but are there any limits to how far off a point might be and still be considered valid by Strava? (Clearly there are, but bear with me.) To find out, I took one of my own rides where I rode this segment, and started removing data points on both sides of the segment start point, uploading to Strava for each iteration, until I no longer got a match for the segment. I didn't care to do this in small enough increments to find the exact limit, but I can say that it seems to be somewhere between 40 and 50 m (131 to 164 ft). In other words, as long as you have a data point that is within this distance of the segment start (and end) point, you will get a match for the segment (of course provided you also ride the segment, but that's stating the obvious).

Now, I'm sure you're all curious what effect this had on my times on this segment, and here they are:

Pos   1: Manipulated effort   time:  2:47.00
Pos   2: Manipulated effort   time:  2:50.00
Pos   3: Manipulated effort   time:  2:51.00
Pos   4: Genuine effort       time:  2:55.00
Pos   5: Manipulated effort   time:  2:57.00


Remember, I didn't really do anything but weed out some data points at strategic places, I didn't change any of the data (like speed or time) in the rest of the data set, I just removed some points, much like what happens naturally with devices that use 'Smart Recording'. The main difference is that with those devices the outcome is more random, and you'll both win some and lose some.

The worst result in this list was when I accidentally got it wrong, and the last data point before segment start was a wee bit closer than the first data point after. For one of these efforts, I removed data points both at the start and at the end, and as you can see, I then gained a whopping 8 seconds on my own genuine effort. From the same data set!

For reference, here are images showing what this (manipulated) effort looks like:


Segment start - manipulated
Segment end - manipulated

How Strava could fix this

At the start of this post, I quoted Strava where they state that they don't interpolate or extrapolate to calculate more accurate segment times. However, had they done so, they could have increased the probability of getting correct leaderboards many times over. I'm not saying it would become entirely accurate because clearly it wouldn't, but why should that be an argument for not trying? I've created my own piece of software that can do this (with some limitations, since the Strava API doesn't allow downloading of the entire data stream for other athletes' activities). Here are the results for the exact same efforts as above (sorted, obviously):

Pos   4: Genuine effort       time:  2:54.97 (corr: -0.03)
Pos   5: Manipulated effort   time:  2:55.10 (corr: -1.89)
Pos   3: Manipulated effort   time:  2:55.13 (corr: +4.13)
Pos   1: Manipulated effort   time:  2:55.17 (corr: +8.18)
Pos   2: Manipulated effort   time:  2:55.25 (corr: +5.26)


The number in parentheses is the amount of correction that was applied as a result of the interpolation, and as you can see, the difference across the board is now within 3/10 of a second!

I've also done some more interesting experiments, like riding with two Edge units, one using 1 sec recording, the other set to smart recording. While the majority of the results (for segments I passed) were close (luckily), I also observed exactly what I have written about in this post: a big discrepancy (4 seconds on Strava) between the device using smart recording and the one using 1 sec recording. After processing the data in my software, the difference was within a few tenths.



I might write some more about this, if time permits.