What do you have against Poisson processes @narthur?
With your latest proposal, say you’ve gone 80 minutes without a ping - you’re guaranteed one in the next 10 minutes, so working at that time is worth more than normal.
With Poisson you avoid that problem, because it’s memoryless!
@zedmango’s exactly right, as was your first instinct to implement it by having a 10% probability of a ping every 5 minutes (other than the small loophole where you can slack off in the 5 minute window following each ping). The official implementation is equivalent to that same algorithm, just doing that literally every second instead of every 5 minutes.
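A rough sketch of the "flip a coin every step" idea, to show that the 5-minute version and a per-second version give the same long-run ping rate. The function names, the seed, and the per-second probability of 1/3000 (chosen only to match 10% per 5 minutes) are assumptions for illustration, not TagTime's actual code or parameters.

```python
import random

def bernoulli_ping_times(n_steps, p_ping, step_seconds, seed=0):
    """Flip an independent coin once per step; return the ping times in seconds."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [i * step_seconds for i in range(n_steps) if rng.random() < p_ping]

def mean_gap(times):
    """Average spacing between consecutive pings, in seconds."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

# Coarse version from the thread: 10% chance every 5 minutes.
coarse = bernoulli_ping_times(n_steps=200_000, p_ping=0.10, step_seconds=300)
# Fine version (assumed parameters): same long-run rate, one flip per second.
fine = bernoulli_ping_times(n_steps=3_000_000, p_ping=1 / 3000, step_seconds=1)

# Both have an expected gap of 50 minutes; the sample means land near it.
print(mean_gap(coarse) / 60)
print(mean_gap(fine) / 60)

# The loophole mentioned above: coarse pings are never closer than one step.
print(min(b - a for a, b in zip(coarse, coarse[1:])))
```

Because each flip ignores everything that came before, the chance of a ping in the next step is the same whether the last ping was one step ago or a hundred. Shrinking the step only shrinks the window in which you know no ping can arrive.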
I think narthur’s idea can still work, but only if we properly weight the measurements in the final estimate.
We sample the next ping interval uniformly between 0 minutes and 90 minutes. When determining the average rate for a label, we multiply each datapoint by (90-T)/45 where T is the number of minutes in that interval. (I think this is the correct weighting coefficient, but I might be wrong.)
This leads to an unbiased estimate of the true time spent on each label, no matter how the user changes their behavior as a function of the elapsed time in the interval.
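Here is a small simulation sketch of that claim, under one specific gaming attempt: the user slacks off for the first 80 minutes after every ping and works only after that. The policy, the names `simulate` and `slack_until`, and the sample size are all made up for illustration.

```python
import random

MAX_GAP = 90.0  # minutes; upper end of the uniform gap distribution

def weight(t, max_gap=MAX_GAP):
    """Reweighting coefficient from the thread: (90 - t) / 45 when max_gap is 90."""
    return 2.0 - 2.0 * t / max_gap

def simulate(slack_until, n_pings=200_000, seed=0):
    """User slacks for `slack_until` minutes after each ping, then works until the next."""
    rng = random.Random(seed)
    worked = total = naive = weighted = 0.0
    for _ in range(n_pings):
        gap = rng.uniform(0.0, MAX_GAP)            # time until the next ping
        total += gap
        worked += max(0.0, gap - slack_until)      # minutes actually worked
        working_at_ping = 1.0 if gap >= slack_until else 0.0
        naive += working_at_ping                   # every ping counts the same
        weighted += weight(gap) * working_at_ping  # ping scaled by (90 - T) / 45
    return worked / total, naive / n_pings, weighted / n_pings

true_rate, naive_rate, weighted_rate = simulate(slack_until=80.0)
print(true_rate, naive_rate, weighted_rate)
```

For this policy the true rate works out analytically to 50/4050, about 0.012, while the unweighted estimate tends to 1/9, about 0.11. The weighted estimate tends to the same 50/4050 as the true rate, so this particular exploit gains nothing. It is only one policy, though, so it illustrates the claim rather than proving it.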
Intuitively: True, if 89 minutes have elapsed, I can be sure that there will be a ping in one minute. However, this ping will count very little to the final estimate, so it’
I’m curious about a few things:
(a) what I wrote is actually correct (i.e. can we really get unbiased estimates like that? And is it sufficient to ensure non-exploitability?)
(b) would this actually lower the variance of the estimate? Or rather, in what situations would it do so? Effectively, we could still get something like a drought, if we get a bunch of intervals close to 90 minutes in a row, which all count very little.
(c) Are there any smart ways to adaptively sample / importance-sample the intervals that both ensure unbiasedness / non-exploitability, but perhaps substantially reduce the variance compared to Poisson sampling?
What do you guys think?
So one could be 100 times more efficient by only working in the 10 milliseconds surrounding each possible sample time: like work 11:59:59.995 - 12:00:00.005, relax for 990 milliseconds, then work 12:00:00.995 - 12:00:01.005, and so forth.
Now we just have to find something that we can do in 0.99 seconds.
Using that coefficient, seems like you’d benefit from working extra hard right after a ping, since in the few minutes after a ping, the ping will be worth about twice as much.
More generally, given any weighting coefficient that varies with T, you can exploit it by working harder at values of T where the coefficient is higher. It’s like buying more lottery tickets when the jackpot is higher. So you really need a constant weighting coefficient.
It’s unfortunate because in my ideal world I could choose when to report, so I wouldn’t be interrupted when I was concentrating. But I really don’t think there’s any way to make that work in a way that couldn’t be gamed. Besides the fact that if I avoid reporting when I’m concentrating on work then that kind of prevents me from tracking work time…
Sure there is - just do it non-stochastically. For instance, work in pomodoros of 25 minutes, then report each one after it’s done. Or, start a stopwatch, work on something as long as you’re concentrated, then once you stop working on it, stop the stopwatch and report.
Truth. I’m already doing pomodoros (was before I started messing with stochastic tracking). Unfortunately my work life got so crazy that a lot of my time is spent in activities for which I can’t commit focused blocks, like meetings. Hopefully that will calm down soon.
So I may switch to something more conventional to complement my pomodoros–Toggl, perhaps.
I’ve found that pings while working don’t feel like interruptions. They’re like the opposite, reinforcing that I’m doing what I meant to be doing. They’re not pulling my attention to something else.
If that doesn’t work for you because the ping sound itself breaks your concentration, @zedmango’s idea could work. If you can guarantee that you’re totally focused during the entirety of a pomodoro then you can answer any pings from that pomodoro retroactively when it’s done.
Ooh, or how bout this - keep a pen and paper time log, keep your pings on silent, and then go back and retro-respond to them at the end of each day?
I think that defeats the point of TagTime! If you could reliably do that then you’d have a perfect account of all your time and TagTime’s noisy estimates would be superfluous.
Yeah, but I at least can’t do that reliably, so a combo of TagTime and manual timekeeping might give more accurate results.
Or just put your phone on silent some of the time (like work hours) and go back and retrofill after - the rest of the time you have the pings audible.
Can you or @bee elaborate on these points? What’s her tag-tology and how does the tocks goal work?
That’s interesting. To be honest, I haven’t fully understood the issue, maybe you can help me.
Maybe we use different definitions of “exploiting”?
In the method I proposed, it is true that the pings right after a previous ping count more. However, at this time, the probability of the ping occurring in the next millisecond is as low as it gets. As you progress through the interval, the probability of a ping in the next time instant increases while the weight decreases. These two effects can be set up to cancel each other out, in some sense. They ensure that no behavior policy can look at the time since the last ping in order to lead the system to systematically over- or underestimate the time spent working.
Any “behavior policy” has a certain true rate of some activity of interest. This rate will be some number between 0 and 1 and specifies the fraction of time spent doing that activity. I claim that if you have a method that estimates this quantity without bias for any behavioral policy, i.e.
E[estimated_rate] == true_rate, then it cannot be exploited. Would you agree with that?
The fixed-ping schedule can be exploited because a policy that only executes the activity for an epsilon-interval around the measurement point will lead to a biased estimate: the true rate can be arbitrarily low while the measured rate is guaranteed to be 1. (And since there is no stochasticity, no re-weighting will help). Here, we have a measurement that is biased for some policies.
The method proposed by narthur, once the reweighting coefficients are taken into account, should however be unbiased. Or is there any behavior policy that results in a skewed measurement?
Maybe there are more sensible definitions of “exploitability” other than biasedness which I haven’t considered?
Let p(T) be the probability of getting a ping where T is the time since the last ping. Let w(T) be the weight of the ping - that is, the score sent to Beeminder.
Then if p(T) * w(T) is not constant, you can exploit the algorithm by working more at values of T with higher p(T) * w(T).
With a Poisson distribution, p(T) is constant, and of course in the TagTime implementation w(T) is constant as well. With a non-Poisson distribution in which p varies only as a function of T, I think what you’re suggesting is to make the weight inversely proportional to p(T) - that is, make w(T) = k/p(T) so that p(T) * w(T) is constant.
While that would indeed prevent you from gaming the algorithm, it’s not clear to me whether your results would average out to accurately reflect the time you spent.
Oh, ok, I see what you mean now. If you’re choosing the time between pings uniformly from, say, (0, 90), after T minutes the probability of getting a ping in the next minute should be 1 / (90 - T). So the weight should then be k(90 - T).
That way p(T) * w(T) is a constant and you can’t exploit the algorithm.
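A quick numeric check of that cancellation, as a sketch. Here `hazard` stands in for p(T), computed as density over survival for a gap drawn uniformly from (0, 90), and k = 1/45 is picked only so that the weight matches the (90-T)/45 coefficient from earlier in the thread. Both function names are illustrative.

```python
MAX_GAP = 90.0  # minutes

def hazard(t, max_gap=MAX_GAP):
    """Ping rate per minute at time t, given no ping so far: pdf / survival."""
    pdf = 1.0 / max_gap                 # uniform density on (0, max_gap)
    survival = (max_gap - t) / max_gap  # P(gap > t)
    return pdf / survival               # simplifies to 1 / (max_gap - t)

def weight(t, k=1.0 / 45.0, max_gap=MAX_GAP):
    """Weight k * (90 - T); k = 1/45 is an illustrative choice."""
    return k * (max_gap - t)

# The hazard climbs and the weight falls, but the product stays at k.
for t in (0.0, 10.0, 45.0, 80.0, 89.0):
    print(t, hazard(t), weight(t), hazard(t) * weight(t))
```

The product comes out the same at every T, so under this weighting no moment in the interval pays better than any other.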
Yeah I think that’s a good way of putting it!
Yup, that’s consistent with my calculation.
I think it would indeed accurately reflect the time you spent. This is how I’d describe it, a bit informally:
Let U(t) be the user’s policy: it specifies the true average rate of engaging in some fixed activity as a function of the time t since the last ping. The domain of U is all nonnegative reals, and the codomain is the interval [0, 1].
The true rate of the activity is now R = ∫U(t) p(t) dt from t=0 to infinity where p(t) is the marginal distribution over times-since-last-ping that occur in the process. For narthur’s sampling rule, p(t) looks triangular with highest density at t=0 and zero density at t=90.
For any measured point, its estimated rate R_hat is the observed user’s action, times a weight W(t) where t is the time since the last ping. If we take the expectation, we get E[R_hat] = ∫ U(t) W(t) q(t) dt where again U is the policy, W is the weighting function that we want to choose, and q is the distribution over times-since-last-ping in the measured points. In narthur’s sampling rule, q is a uniform distribution from t=0 to 90. In order to make E[R_hat] = R, for all policies U, we have to make this integral equal to the integral for R above. We achieve that by setting W(t) = p(t) / q(t), which results in the coefficient W(t) = (90-t)/45 that we determined earlier in the thread. (More generally, W(t) = 2 - 2 * t / max_duration).
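Spelling out where the triangular p(t) and the coefficient come from, as a sketch. It assumes p(t) means the long-run fraction of time spent at "t minutes since the last ping", which for independent gaps is the probability that a gap exceeds t, divided by the mean gap.

```latex
\begin{align*}
q(t) &= \frac{1}{90}, \qquad 0 \le t \le 90 \\
\Pr(\text{gap} > t) &= \frac{90 - t}{90}, \qquad \mathbb{E}[\text{gap}] = 45 \\
p(t) &= \frac{\Pr(\text{gap} > t)}{\mathbb{E}[\text{gap}]} = \frac{90 - t}{4050} \\
W(t) &= \frac{p(t)}{q(t)} = \frac{90 - t}{45} = 2 - \frac{2t}{90}
\end{align*}
```

As a sanity check, p(t) integrates to 1 over [0, 90], since the integral of (90 - t) over that range is 4050.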
I’ll look over that in more detail in a bit - but you seem to be assuming that the user’s behavior is purely a function of the time since the last ping. If a user acts differently depending on what happened earlier - say, more or less focused on days when there have previously been many short intervals - would that introduce a bias?
Can you clarify for me how p(t) works? What’s a marginal distribution?