I previously had this buried in another thread but let me put it here instead, and translate it to the lingua franca, aka Javascript. Thanks to @mary for nudging me to do this.
UPDATE: I moved this yet again!
This thread used to be the home of the official reference implementation of the TagTime algorithm but now it’s home to a discussion about a possible new algorithm. Until we reach consensus here, head to the above link for the official reference implementation!
So people using different GAP values will get a completely different set of pings, right? Actually if I’m reading this correctly, everyone will get the same “relative” ping schedule, but scaled by GAP, right?
I have always wondered whether it would be possible to have a single universal ping schedule, such that someone with a bigger gap simply gets a subset of the pings of someone with a smaller gap. That is, if person A is using gapA and person B uses gapB, with gapA > gapB, then whenever A gets a ping, B also gets a ping at the exact same time (but not necessarily vice versa). From another point of view, imagine that as gap approaches 0 the ping schedule becomes infinitely dense; pings disappear (but never move) as you smoothly increase the gap, until at gap = infinity there are no pings.
It seems like this should be possible in theory but I haven’t thought too hard about how to implement it. It would be sort of nifty if people with different gap settings still got pings simultaneously some of the time.
That would be amazing. I’m even (potentially) willing to change the universal algorithm to make it happen.
(I think the way to do that would be to pick a future date when it switches and get all existing implementations on board in time.)
Sketch of a possible algorithm with @byorgey’s property
You can approximate TagTime’s poisson distribution by discretizing time, let’s say into seconds, and just deciding independently each second with fixed probability whether that second should have a ping or not. That probability is in fact 1/g where g is the gap parameter in seconds. So we need to do a bernoulli trial with probability p at time t – bern(p, t) – with the constraint that every bernoulli trial at time t with p’>p – bern(p’, t) – will say yes if bern(p, t) says yes. For that we just need to pick, for an arbitrary unixtime t, the p threshold above which the answer is yes. So a function pthresh(t) that gives a probability. Now if you want to know if there’s a ping at time t for gap g, you just check if pthresh(t) < 1/g.
I don’t know if it’s a dealbreaker that this approach means computing that function again and again for each unixtime until you find the time of the next ping. Maybe worse, when TagTime first starts up, to ensure the RNG is in sync, it has to start at the dawn of (tag)time and walk forward, checking pthresh for every single second up to the present. Maybe that’s fine or there’s a clever way to avoid that.
As for the pthresh function, I think I see a way to write it if we also discretize and bound the possible gap lengths. Like you ask “is there a ping at this time at probability 1/60? no? how about 2/60?” etc. The first p that yields “yes” is the p that pthresh returns.
I was just going to suggest something similar! I don’t understand pthresh though, could one not just generate a random number in [0,1] each second and check whether it is less than 1/gap? If gapA > gapB then automatically a ping for person A is a ping for person B.
A possible trick to avoid running through every second since the ur-ping: some RNG algorithms allow fast-forwarding in logarithmic time.
I didn’t know about fast-forwarding RNGs. Fascinating. You could also just store the RNG state and current unixtime so you never have to start from very far in the past.
Any linear congruential generator (of the form seed’ = a * seed + c mod m for constants a, c, and m) has that property, since you can represent the transition step as multiplying the column vector (seed, 1) by the matrix [ [a, c] [0, 1] ]. Repeatedly applying the transition corresponds to taking powers of the matrix, which can be done in logarithmic time via repeated squaring. (Of course all of this has (mod m) thrown in everywhere appropriately, but it doesn’t change the basic argument.) LCGs aren’t necessarily the state-of-the-art, but they seem perfect for this application.
Edited to add: I think the algorithm @dreev has outlined (as enhanced by @mvr) should work very nicely indeed!
And actually, since TagTime is already using ran0, an LCG with c = 0, fast-forwarding becomes even simpler: you don’t need matrices at all, you are just computing powers of IA, mod IM.
Beautiful! Want to help implement the reference implementation? I have the current implementation (the code above) running on Glitch here:
So we could say, for example, the new universal ping schedule is the existing schedule until 2019-01-01 at which point it switches to the new algorithm.
I don’t think so. It’s a pretty tiny amount of computation for each check. Even on a slower embedded device I think you could probably check at least one million unixtimes per actual second (about 11 days’ worth of pings).
Should we care that ran0’s period is only 2 billion? If computing a random number every second that’s 68 years before it starts repeating. Some of us may still be alive in 68 years!
For the original algorithm that period was definitely fine because there was just one random number computed per ping so with a 45 minute gap that’s like a couple hundred thousand years of non-repeating ping gaps.
PS: Googling around about periods of random numbers I, amusingly, came across this:
I really don’t think it matters. Even aside from the unlikeliness of a TagTime user still being alive and still using TagTime in 68 years, what exactly would be the problem? That somehow they would remember ping patterns from 68 years ago and it would ruin TagTime’s effectiveness for them, because their subconscious would know when to expect the next ping?
It sounds farfetched when you put it that way but I guess I was thinking if something incredibly unlikely and memorable happens then you’ll know that’s coming again in 68 years. Maybe an epic ping day that motivates you to work more when the Great Wraparound approaches. Or vice versa for an epic ping famine. Of course if I’d just never mentioned the 68-year period… But, yeah, this is still farfetched.
One more farfetched scenario: conceivably we could care about subsecond granularity. Some people may answer pings in well under a second if the UI is smooth enough. And if you want to track the time you spend answering pings, well…
Still fairly far-fetched, but throw in the unknown unknowns and maybe it’s worth having a longer period just in case? I just learned from a paper by L’Ecuyer that it’s easy to pick constants for a multiplicative linear congruential generator that have longer periods than we could even conceivably care about.
So might as well go with one of those if we’re changing the algorithm anyway!
Could we make the ‘random’ value a function of the time (hashing maybe?), so rather than generating the next ping time we find it by comparing every timestamp until we find the next one.
Computers are fast enough that this is not a problem at a 1 second resolution. (and probably for any resolution > human reaction time. /handwave)
Caveat: It’s a while since I studied the relevant math for all this.
Based on the link @insti posted, it looks like the best choices would be FNV-1a or some variant of Murmur. FNV-1a is much simpler to implement. Both look like they give a reasonable distribution of hash values when calling them on consecutive input values (our use case), though Murmur looks perhaps slightly more uniform? One thing to be careful about/do more research on is that a good distribution of values is not quite enough: we also need consecutive hash outputs to be uncorrelated, which is not necessarily an important criterion for choosing non-cryptographic hash functions, so I don’t know how well these would fit the bill.
On that page it is claimed that Murmur is faster, but that seems to be due to the fact that it works 4 bytes at a time instead of 1 byte at a time, so there is probably not much difference (or it might even be slower) on small values: we are going to be using it on 32-bit Unix time stamps.
Overall it seems to me FNV-1a on 32-bit values is the way to go. For reference, the algorithm is as follows:
hash = 2166136261 // magic FNV offset basis for 32 bits
for each byte:
hash = hash xor byte
hash = hash * 16777619 // magic FNV prime for 32 bits
(If TagTime is still around in 2038 we can easily specify a switchover date to the 64-bit version of FNV-1a, to work with 64-bit unix timestamps.) I’ll play around with this a bit and get a feel for how well it would work.