Possible new TagTime universal ping algorithm

@insti, that’s genius, hashing the timestamp to get the random number.

I’m nervous about how to be sure that non-cryptographic hash functions are random enough. If we used, say, MD5 we could probably count on that as an off-the-shelf function in any language. (Pretty sure it doesn’t matter if it’s cryptographically broken, as MD5 is.) But cryptographic hashes might be too slow, or just wastefully compute-intensive, so maybe it’s worth convincing ourselves that a non-cryptographic one suffices.
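
To make the idea concrete, here's a rough sketch of the hash flavor in JavaScript. FNV-1a is just a stand-in here (whether such a hash is random enough is exactly the open question), and the names and constants are illustrative rather than settled:

const GAP = 45*60        // average gap between pings, in seconds
const P   = 1/(GAP*10)   // probability of a ping in any given decisecond

// FNV-1a hash of a string, as a 32-bit unsigned integer (stand-in hash only)
function fnv1a(str) {
  let h = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

// Map a decisecond timestamp to a number in [0,1) via the hash
function pthresh(t) { return fnv1a(String(t)) / 2**32 }

// Next ping strictly after decisecond t: scan forward until the hash says ping
function nextping(t) {
  do { t += 1 } while (pthresh(t) >= P)
  return t
}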

But now I’m wondering if we’re sure hashing is better than an LCG from a fixed ur-ping. That’s only slow/wasteful on startup. The common case is that TagTime is already running and we just need to find the next timestamp with a ping. So whatever gives us the least computation per timestamp is what’s going to be overall most efficient. And that probably means an LCG. You can also avoid the startup cost by always writing to disk the last ping time and seed. Then the only tradeoff is that if TagTime has been off for a while then you have to compute from the last known ping. With hashing you can start computing from Right Now. But probably we should make the tradeoff in favor of what’s fastest/efficientest for a constantly running TagTime.

Good question. We’ve never cared about it so far and I don’t feel like we’ve been missing out. It’s implausible for the knowledge of “no ping for the next full second” to change one’s behavior at all. I’m still thinking about whether it’s plausible to want to get data on time spent answering pings if you answer pings in less than a second…

Or even if subsecond ping-answering never actually happens, does the discretized-by-whole-seconds version of the algorithm distort anything? Suppose you take x seconds to answer a ping. For the discrete version and for the continuous version of the algorithm, what’s the probability of getting pung while answering the previous ping? If x<1 then the probability for the discrete version is 0. If x=1.999 then the probability is about half what it should be, I think. If x is more than, I’m guessing, a dozen seconds then the probabilities will be indistinguishable. What about x=3? That’s a reasonable amount of time to answer a ping.

http://dreev.commits.to/compute_discrete_vs_continuous_if_no_one_else_does

Pretty sure this would be covered in papers describing the hash.
We can also run our own randomness tests if we want.
I’m pretty sure a sufficiently random non-cryptographic hash is fine.
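
For example, a crude home-grown check might just bucket-count the hash-derived values over a long run of consecutive deciseconds (a sketch only, reusing the hypothetical pthresh sketched above; proper test batteries like Dieharder or TestU01 would be more convincing):

// Rough uniformity check: hash a million consecutive deciseconds and count how
// many land in each of 10 buckets; each should get roughly n/buckets hits.
function bucketCheck(startT, n = 1e6, buckets = 10) {
  const counts = new Array(buckets).fill(0)
  for (let i = 0; i < n; i++) {
    counts[Math.floor(pthresh(startT + i) * buckets)] += 1
  }
  return counts
}
console.log(bucketCheck(15338120000))  // expect each count near 100000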

So whatever gives us the least computation per timestamp is what’s going to be overall most efficient.

Won’t this depend on the ping interval?

I suspect the hashing version is going to be the best option if the goal is synchronised ping schedules for people using different gaps.

But I’m not entirely convinced that is a useful feature, so why are we changing anything?
I’m not sure (with all due respect to @byorgey) that

is a good reason.


Sounds true. But I’m starting to think that LCG is better anyway if that’s the fastest/cheapest/simplest for TagTime as a background process. The startup cost of going back to the last known ping+seed and walking forward is no big deal and is a cost paid only rarely, when the background process is restarted.

Not with the nifty @byorgey/@dreev/@mvr algorithm. [1]

Ah, yeah, @bee has the same feeling. I do love the universal schedule though so if we deem it important to change the ping frequency I’d prefer the schedule still match as much as possible.


[1] Pseudocode for the possible new algorithm:

First some constants and global variables:

Given desired average gap, GAP, between pings (in seconds).
Given fixed constants URPING (a unixtime in deciseconds) and SEED.
Given fixed constants IM and IA for the RNG.
Let constant p = 1/(GAP*10) // probability per decisecond
Let s = SEED // s is seed or RNG state
Let t = URPING

And some functions:

pthresh(t):   // LCG version (note: t is unused; this assumes pthresh is called
              // exactly once for every consecutive decisecond t, in order)
  s = IA * s % IM
  return s / IM  // this is a U(0,1) random variable

pthresh(t):   // hash version
  compute a hash of t and derive from that a U(0,1) random variable

decinow():
  return current unixtime in deciseconds -- Date.now()/100
  
nextping(t):  // next ping time after time t (unixtime in deciseconds)
  do t++ until p > pthresh(t)
  return t

The LCG version also needs a prevping() or init() function, as in the current algorithm. The hash version doesn’t.


Should there be a t in there somewhere?

Edit: Answering myself: No, because the initial value of s, together with IA and IM, specifies the state of the LCG at t. Here t is the starting (ur-ping) time, not the current time we’re interested in.
Is that right?

nextping(t):  // next ping time after time t (unixtime in deciseconds)
  do t++ until p > pthresh(t)
  return t

So we still need to look at every t either way?


Yes.

I want to emphasize that with logarithmic fast-forwarding, the startup cost is truly negligible. Let’s do a simple example. Suppose we are using a decisecond granularity, and someone starts the TagTime app 20 years from now. That’s 10 x 60 x 60 x 24 x 365.25 x 20 = approximately 6.3 x 10^9 deciseconds we have to walk through from the ur-ping to get caught up. But since our LCG has no additive term, advancing it k steps is just multiplying the state by IA^k mod IM, so with a repeated squaring algorithm we only need to do at most 2 log_2(n) multiplications to get to time n. In this case 2 log_2(6.3 x 10^9) is approximately… 65. That’s the same amount of computation you would normally need to do to compute just six seconds’ worth of pings. I need hardly tell you that modern computers, even the “slowest” ones, can do a few hundred multiplications and mod operations much faster than a human can blink. You would never, ever notice TagTime taking any time on startup. This is so fast that it’s probably not even worth the programming effort to cache the most recent value of the LCG somewhere when the app quits. In fact it could probably compute each ping by walking forward from the ur-ping from scratch, every single decisecond, and you would never notice.


Thanks for the awesome description.
Do you have some code (pseudo or otherwise) that demonstrates how this works with the LCG?


If the ping schedule is predetermined then people can already look ahead for interesting patterns and/or decide when they’re going to work based on pings. I mean I could already make an app that alerts you 10 seconds before the TagTime pings so you’re always working when you get pinged, right?


Sure, here’s some Haskell code. Of course not everyone knows Haskell but I’ve tried to include a bunch of comments so hopefully it makes sense. Let me know if you have questions.


Indeed. This is the downside of having a universal ping schedule. (Well, it’s not so much a downside as a potential temptation.)


If I did this right then here’s a graph with the amount of time you spend answering pings on the x-axis and the amount of time that each algorithm (blue = continuous, orange = discrete with 1-second granularity) will tell you you spent:

And here’s the same thing but with 0.1-second granularity:

(The probability of getting pung in s seconds after the last ping is 1-exp(-s/g) in the continuous case and 1-(1-d/g)^floor(s/d) in the discrete case, where d is the granularity, eg, 1 for seconds or 1/10 for deciseconds, and g is the mean gap in seconds.)
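
Here’s a quick way to plug particular values of s into those formulas, assuming the usual 45-minute average gap:

// Probability of getting pung within s seconds of the previous ping, for mean
// gap g (seconds) and granularity d (seconds). The tiny epsilon just guards
// against floating-point wobble in s/d when s is an exact multiple of d.
const pCont = (s, g)    => 1 - Math.exp(-s/g)
const pDisc = (s, g, d) => 1 - (1 - d/g)**Math.floor(s/d + 1e-9)

const g = 45*60  // 2700 seconds, the usual 45-minute average gap
for (const s of [0.5, 1.999, 3, 12]) {
  console.log(s, pCont(s, g), pDisc(s, g, 1), pDisc(s, g, 0.1))
}
// E.g. at s = 1.999 the 1-second version gives about half the continuous
// probability, while the decisecond version is within about 5% of it.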

So I guess we can call that justification for using deciseconds in the new algorithm (if we actually want a new algorithm). Since I got seriously nerdsniped, I implemented it in JavaScript. The RNG below has a period of 109 years’ worth of deciseconds. And it does the fancy fast-forwarding.

First, constants:

const GAP = 45*60       // Average gap between pings, in seconds

const URPING = 15338120000 // Birth of the new TagTime algorithm in deciseconds
const SEED = 20180809     // Initial state of the random number generator (RNG)
const IA = 3125        // (5^5) Multiplier used for LCG RNG
const IM = 34359738337 // (2^35-31) Modulus used for RNG (assumes 36-bit ints)

// Each decisecond has probability 1/(GAP*10) of being a ping, where GAP is the 
// average time between pings in seconds. The LCG random number generator
// returns a random integer uniformly in {1,...,IM-1} so LCG/IM is a U(0,1) 
// random variable and there's a ping iff 
//   LCG/IM < 1/(GAP*10) => LCG < IM/(GAP*10)
// so we define a constant for the RHS to compare to.
const THRESH = IM/(GAP*10)

Global variables:

let state = SEED // Global variable that's the state of the RNG
let lastping = URPING // Global var with decisecond of last ping

Random number generator, with fast-forwarding:

// Return a random integer in [1,IM-1]; changes the RNG state.
// wikipedia.org/wiki/Linear_congruential_generator
function lcg() { return state = IA * state % IM }

// Compute a*b % m when a*b might overflow
function multmod(a, b, m) {
  if (b === 0)   { return 0 }
  if (b%2 === 0) { return multmod(a*2%m, b/2, m) % m }
  else           { return (a + multmod(a, b-1, m)) % m }
}

// Efficiently compute a^n % m -- wikipedia.org/wiki/Modular_exponentiation
function powmod(a, n, m) {
  if (n === 0)   { return 1 }
  let x
  if (n%2 === 0) { x = powmod(a, n/2, m); return multmod(x, x, m) }
  else           { x = powmod(a, n-1, m); return multmod(a, x, m) }
}

// Fast-forward the RNG by doing i steps at once
function powlcg(i) { return state = multmod(powmod(IA, i, IM), state, IM) }

And finally, the function to get the time of the next ping, from any starting time:

// Current unixtime in deciseconds
function decinow() { return Math.round(Date.now()/100) }

// Given unixtime in deciseconds, t, start checking at t+1 and return the first
// time we come to with a ping, which becomes the new lastping time.
function nextping(t) {
  if (t < lastping) { return 0 }            // going back in time not allowed!
  if (t > lastping) { powlcg(t-lastping) }  // fast-forward the RNG up thru t
  lastping = t+1
  while (lcg() >= THRESH) { lastping += 1 }
  return lastping
}

For example, nextping(decinow()) efficiently gives the next upcoming ping time (in deciseconds) and then you just keep calling nextping with the previous ping time to get the next.
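
Something like this, say (hypothetical calling code, just to illustrate; the Date conversion relies on the decisecond convention above):

// Hypothetical calling code: print the next few ping times starting from now.
let t = nextping(decinow())     // first upcoming ping, in deciseconds
for (let i = 0; i < 3; i++) {
  console.log(new Date(t*100))  // t*100 converts deciseconds to milliseconds
  t = nextping(t)               // each ping seeds the search for the next one
}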

Feel free to play with the code on Glitch.

Btw, I realized that we can’t use the nifty fast-forwarding in the original algorithm because we don’t know the exact number of times the RNG will have been called in the past.


So the next question is whether we care about this… To review, this gives us a universal ping schedule that preserves universality as much as possible when you change the ping frequency. If you opt for more frequent pings you’ll just get extra ones in addition to the usual ones. And if you opt for less frequent pings then the ones you do get will still always be shared with everyone getting them more frequently. This could be handy in our house where we have half a dozen devices all pinging in unison. But not clear if there are any other use cases.

I guess any time you’re working near someone else who’s using TagTime, it’s handy if the pings are simultaneous…
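
If we wanted a sanity check that less frequent pings really are a subset of more frequent ones, something like this (a sketch reusing the constants from the code above) should confirm it over, say, ten days of deciseconds:

// The random value at each decisecond doesn't depend on GAP; only the threshold
// does, so a sparser schedule's pings should all appear in a denser schedule.
function schedule(gapSeconds, toT) {
  const thresh = IM/(gapSeconds*10)
  let s = SEED
  const pings = new Set()
  for (let t = URPING+1; t <= toT; t++) {
    s = IA * s % IM
    if (s < thresh) pings.add(t)
  }
  return pings
}

const horizon = URPING + 10*86400*10               // ten days, in deciseconds
const dense   = schedule(30*60, horizon)           // 30-minute average gap
const sparse  = schedule(45*60, horizon)           // 45-minute average gap
console.log([...sparse].every(t => dense.has(t)))  // expect: true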


Are there any implementations of the new algorithm in the wild? I am implementing TagTime for iOS and used the new algorithm, which caused confusion with existing users of TagTime systems that use the original method from the Perl implementation. I would like a help screen to guide users to pick which schedule to use. A factor in that decision would be a list of which apps use which algorithm.

Oh no, sorry for the confusion in calling it “the new algorithm”! I’ll add a warning to it that unless there’s consensus that the new one is better, the original one at the top of this thread is the only one used in the wild. If people do like the new one then we’ll pick an exact time in the future for all existing implementations to switch over.

http://dreev.commits.to/add_warning_that_og_algorithm_is_it

UPDATE: Now done and I replaced instances of “new algorithm” with “possible new algorithm” 🙂

No worries, I have both implemented under the hood. I’ll revert to the original for now, and when a decision is made as to the future I’ll implement whatever the decision is.


I like the idea of the new universal ping algorithm. I don’t think it’s worth worrying too much about capturing time spent tagging for those who spend less than a second responding to pings, if only because anyone who consistently answers pings sub-second is spending negligible time tagging even summed over years.

But also, I doubt many people spend less than a second tagging. For me it’s always at least 2-3 seconds. Further UI improvements might reduce this in the future, though, I guess.
