[Disclaimer: I’m not an expert in this field, so skepticism is definitely in order. I’ll be more detailed not because I want to lecture, but so that you can spot errors in my reasoning.]
We don’t want to make any simplifying assumptions about the user’s behavior; it should be allowed to be arbitrarily complicated. The only assumption I want to make is that the behavior, together with the measurement process, forms a stationary ergodic process, i.e. that the underlying distribution does not keep changing and that sample paths are ultimately informative about the distribution. This assumption may not be very realistic, because the user’s behavior changes over the course of their life, and we only have finite lifetimes anyway, so nothing will truly converge. But it feels close enough and allows us to do this analysis. In practice, it’s probably sufficient that the underlying behavior does not change too quickly.
How I introduced U(t): without any measurements, all we can say about the user’s state at some random time τ is that they are either performing the activity or they aren’t (0 or 1). Since the process is stationary, this does not depend on τ, so there is a single probability r that describes the user’s true long-term average rate of the activity (even if there is a very complicated process underlying the switches between activity and non-activity). Once we include measurements, a single number r is no longer enough, because the measurements might influence the user’s behavior: the true average rate of the activity might, for instance, be higher right after a measurement than one hour after the measurement. So we model the user’s behavior as a function U(t), where t (perhaps confusingly) is the time since the last ping, and U(t) is the true average rate of the activity t minutes after any measurement. In U(t), all the complicated ways in which the user’s behavior depends on their past behavior are already integrated out. In other words, U(t) is the expected value of the activity indicator variable at any instant, conditioned only on the time since the last measurement being equal to t.
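To make this definition concrete, here is a small simulation sketch (the decaying `true_rate` function and the specific probabilities are my own inventions, purely for illustration): a user whose activity probability depends on the time since the last ping, observed at every step from a god’s-eye view. The empirical U(t) is then just the conditional mean of the activity indicator given that the time since the last ping equals t.

```python
import random

random.seed(0)

# Hypothetical ground truth (made up for this sketch): the user is more
# active right after a ping, and the effect decays with t.
def true_rate(t):
    return 0.2 + 0.3 * 0.9 ** t

STEPS = 200_000
PING_PROB = 0.05   # Bernoulli pings, the discrete analogue of Poisson

since_ping = 0
counts = {}        # t -> (times active at age t, times observed at age t)
for _ in range(STEPS):
    active = random.random() < true_rate(since_ping)
    a, n = counts.get(since_ping, (0, 0))
    counts[since_ping] = (a + active, n + 1)
    if random.random() < PING_PROB:
        since_ping = 0
    else:
        since_ping += 1

# Empirical U(t): conditional mean of the activity indicator given that
# the time since the last ping equals t.
for t in range(4):
    a, n = counts[t]
    print(t, round(a / n, 3), round(true_rate(t), 3))
```

The estimate converges to `true_rate(t)` even though the underlying process could have arbitrary dependence on its own history; conditioning on t alone is exactly what “integrating out” the rest of the past means here.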
By the marginal distribution p(t) I was referring to the probability of being in a situation where the time-since-last-ping is t. E.g. when we sample ping intervals uniformly from the interval [0, 90], we will be at t=0 “more often” than at t=89. (It is difficult to talk about this in continuous time because these are all probability densities rather than probabilities.) If you do the calculation (renewal theory: p(t) = P(interval > t) / E[interval]), you get the probability density p(t) = (90-t)/4050 for 0<=t<=90, and 0 otherwise. This is the pdf of the time-since-last-ping t when we look at a random point in time in the process. (I should have used a different symbol than t…) This distribution is generally different from the distribution of the time-since-last-ping in our dataset of measurements. That one is q(t), which is simply the distribution from which we sample the next ping interval: uniform in the example above, and exponentially distributed for Poisson sampling. It turns out that for Poisson sampling, p and q happen to be identical! A nice property of the exponential distribution. That’s why p(t)/q(t) is 1 and we don’t have to worry about introducing any weights: our sample average will be an unbiased estimator of the true activity rate. With any other sampling distribution, we have to weight each measurement by W(t) = p(t)/q(t) to de-bias the estimate.
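A small simulation makes the weighting concrete (the step-function U(t) is a made-up ground truth, not anything from the discussion above): with intervals uniform on [0, 90], the plain sample average over-represents large t and is biased, while weighting by W(t) = p(t)/q(t) recovers the true long-term rate ∫ U(t) p(t) dt.

```python
import random

random.seed(1)

def U(t):                       # made-up ground truth: busier right after a ping
    return 0.6 if t < 30 else 0.2

# q(t) = 1/90 (uniform intervals), and renewal theory gives
# p(t) = P(interval > t) / E[interval] = ((90 - t)/90) / 45 = (90 - t)/4050,
# so the importance weight is W(t) = p(t) / q(t) = (90 - t) / 45.
def W(t):
    return (90 - t) / 45

N = 100_000
naive = weighted = 0.0
for _ in range(N):
    t = random.uniform(0, 90)   # at a ping, time-since-last-ping = interval length
    x = random.random() < U(t)  # the measured activity indicator
    naive += x
    weighted += W(t) * x

# True long-term rate r = ∫ U(t) p(t) dt, here in closed form:
# ∫[0,30] p(t) dt = 5/9 and ∫[30,90] p(t) dt = 4/9.
r = 0.6 * (5 / 9) + 0.2 * (4 / 9)
print(naive / N)      # ≈ 0.333, biased low
print(weighted / N)   # ≈ 0.422, matches r
```

Note that E[W(t)] = ∫ q(t) · p(t)/q(t) dt = 1, so the weighted sum can simply be divided by N rather than by the sum of the weights.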
One way to make everything much easier to understand is to simplify the system and discretize time. We now look at discrete time steps. The Poisson process becomes a Bernoulli process, which takes a measurement with some probability p at each step, independently of all the other measurements. The fixed-horizon strategy takes each measurement at most n steps after the previous one; in the simplest case, n=2. This can easily be worked out on paper, and the results should translate to the continuous-time version in the limit.
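The discrete system can also be simulated in a few lines (a sketch; the probabilities 0.3 and 0.5 are arbitrary choices of mine). It shows the earlier point directly: for Bernoulli pings the distribution of time-since-last-ping looks the same at a random step as at a ping step (p = q, by memorylessness of the geometric distribution), while for the fixed-horizon strategy with n=2 the two distributions differ, so weights would be needed.

```python
import random
from collections import Counter

random.seed(2)

def simulate(ping_prob, steps=200_000):
    """ping_prob(t) -> probability of a ping when t steps have passed
    since the last ping.  Returns (ages, pings): the distribution of t
    over all steps (approximates p) and over ping steps only (approximates q)."""
    ages, pings = Counter(), Counter()
    t = 0
    for _ in range(steps):
        ages[t] += 1
        if random.random() < ping_prob(t):
            pings[t] += 1
            t = 0
        else:
            t += 1
    norm = lambda c: {k: v / sum(c.values()) for k, v in sorted(c.items())}
    return norm(ages), norm(pings)

# Bernoulli process (discrete analogue of Poisson): p and q agree.
p, q = simulate(lambda t: 0.3)
# Fixed horizon n=2: ping with prob 0.5 one step after a ping,
# with certainty two steps after.  Now p and q differ.
p2, q2 = simulate(lambda t: 0.5 if t == 0 else 1.0)
print(p[0], q[0])    # both ≈ 0.3
print(p2[0], q2[0])  # ≈ 2/3 vs 0.5
```

For the n=2 case the stationary age distribution is p(0) = 1/(2−a), p(1) = (1−a)/(2−a) with a the first-step ping probability, while q(0) = a and q(1) = 1−a; with a = 0.5 that gives 2/3 vs 1/2, matching the simulation.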
By the way, I actually believe that Poisson measurement is probably pretty good for stochastic time tracking: simple and elegant. My interest in this was more intellectual: does it have to be a Poisson process, or could it be something else? So far I think it does not have to be Poisson. Maybe if someone wants to make a stochastic time tracking app for the masses, they can get better user retention if there are no ping droughts, which might frustrate users and cause them to stop using the app altogether. Also, some sampling distributions might be both non-exploitable and lead to higher-confidence estimates for the same total number of measurements.