Beeminder Forum

Does tagtime data model time-dependent behavior?

Hi. Sorry for title-gore, not sure what better title to use.

I was thinking about how tagtime gives you the right answer “eventually”. So, say I want to know how much time of my life goes to reading. In 10 years of tagtime I’ll have an amazingly close estimate of the portion of my time spent reading books. I believe this makes sense.

The problem comes if you want to know how much you sleep, or exercise, or waste time on social media, or a wide number of such behaviours that change over time.

In 10 years you will know that you slept 27375 hours total, or 7.5 hours a day. Oversimplifying how sleep works, that could either mean that you sleep a good number of hours, or that some months when you’re overworked you sleep 4 hours a day and some months when work is slow or on weekends/vacation you sleep 11 hours a day, in equal proportion. Still comes out to 27375 hours!
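To make the arithmetic concrete, here is a small sketch with two made-up schedules that give the same 10-year total. The 25-day blocks are an assumption chosen only so the two regimes divide evenly into 3650 days; they stand in for the "some months" in the example above.

```python
# Two hypothetical 10-year sleep schedules (illustrative numbers only).
DAYS = 3650

# Schedule 1: the same 7.5 hours every night.
steady = [7.5] * DAYS

# Schedule 2: alternate 25-day blocks of 4 h/night and 11 h/night.
alternating = [4.0 if (day // 25) % 2 == 0 else 11.0 for day in range(DAYS)]

total_steady = sum(steady)
total_alternating = sum(alternating)
print(total_steady, total_alternating)
```

Both totals are the same, so the long-run average alone cannot tell the two schedules apart.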

You could zoom in to see what happens each month, except of course you’re not necessarily going to switch sleeping habits on the month, or hard-switch one day from one sleeping schedule to the next.


  • What’s the smallest time interval “chunk” you can analyze and hope to be covered by the “will eventually be accurate” guarantee?
  • How do you segment time intervals to find changes in behavior/time spent on things?
  • How do the above depend, mathematically, on the average ping interval and on the proportion of pings that carry the tag you’re interested in analyzing?
    • that proportion is time-varying, which is the whole reason we’re interested in this; can math save the day anyway?

What am I missing? It seems to me that without some further math and thinking, tagtime can’t deal with “how much sleeping are you doing” over time, it can just deal with “over the whole period you’ve been tagtiming, you spent 42% of your time on foobarbaz”.

The only “hack” (?) that comes to mind, given my zero mathematical sophistication: given a time T and a window W (in days) ending at T, the time S spent per day at time T is number_of_tagged_pings_in_window * average_ping_interval / W, with an error bar that depends on W and S (which I don’t know how to calculate, but I’m sure you folks do). You can then plot S against T.

If the ping interval were constant, the usual estimate of the error bar would be the same formula as for the total time, except with the square root of the number of pings where you were sleeping in place of the number of pings. (OK, maybe that isn’t strictly true… I am assuming the recording is a Poisson process.) I need to look up the tagtime algorithm and think about it to improve that.
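As a rough sketch of that square-root rule (under the Poisson assumption mentioned above, with hypothetical names and made-up numbers):

```python
import math

def hours_with_error(n_tagged, gap_hours, window_days):
    """Return (estimate, error bar) in hours/day under a Poisson assumption.

    The error bar is the same formula as the estimate, with
    sqrt(n_tagged) in place of n_tagged.
    """
    estimate = n_tagged * gap_hours / window_days
    error = math.sqrt(n_tagged) * gap_hours / window_days
    return estimate, error

# Made-up example: 70 "sleep" pings in a 7-day window, 0.75 h per ping.
estimate, error = hours_with_error(70, 0.75, 7)
print(f"{estimate:.2f} +/- {error:.2f} hours/day")
```

The relative error is 1/sqrt(n_tagged), so halving the error bar takes four times as many tagged pings.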

The best approach to not have to worry about whether correlations make a naive formula wrong would be “the bootstrap”. Draw 1000 samples with replacement (same size as the real data) from your tagtime data, compute the number for each sample, and use the standard deviation of the results as the error bar. I can post sample Python code later (you use Python, right?)

EDIT: Actually, I’m not even sure the assumptions of the bootstrap hold, because the original sample isn’t completely random. Maybe someone else can weigh in… Or maybe this will help: bootstrap - How do you do bootstrapping with time series data? - Cross Validated (but more likely it doesn’t help lol)

It might also be possible to not worry about statistical error and do the segmenting thing. The algorithm could be roughly “scan through the data recursively; each time the average of all points before T is different from the average after T, split the data”. With some thresholds on how different the averages have to be and a maximum number of splits you are willing to make, maybe (or bootstrap and only split if they differ by more than the error bar?)

Edit: the more I think about it, the more this sounds better than what I wrote above about just making error bars. I will expand on it later.



I anxiously await ideas :slight_smile:
Yes, I do use Python, but if the ideas are simple enough, pseudocode or any language will do. I’d say wiki links would do as well, but I’d be lying. I tried to understand what bootstrapping is, and what I think I understood is that you take your small portion of data (e.g. a 1-week window) and draw random samples from it a large number of times so you can compute statistics. That doesn’t seem to make sense to me: e.g. if I had one coin flip and just kept sampling that, I’d get no error bars… Every day I regret not learning more math. If only there was a self-improvement tool to help with making sure you study stuff. :stuck_out_tongue:

You have the right idea, except you need to have more than one coin flip!
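A small sketch of the coin-flip case, to show what changes with more than one flip. The helper name `bootstrap_se` and the fixed seeds are choices made for this illustration only.

```python
import numpy as np

def bootstrap_se(data, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of the mean of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample with replacement, same size as the original data.
        resample = rng.choice(data, size=len(data), replace=True)
        means[i] = resample.mean()
    return means.std()

one_flip = [1]
many_flips = np.random.default_rng(1).integers(0, 2, size=200)

se_one = bootstrap_se(one_flip)     # every resample is identical
se_many = bootstrap_se(many_flips)  # resamples differ from each other
print(se_one, se_many)
```

With a single flip every resample is the same, so the error bar collapses to zero. With 200 flips the resampled means scatter, and that scatter is the error bar.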

I played with it a bit this morning but the method I proposed seems kind of unreliable - basically if you try splitting the data at every possible time point, you seem likely to get a lot of “false positives”. (Basically, the algorithm wants to split the data near the edges just because the sample size is small near the edges and you get some kind of big fluctuations). The estimation of the error bar by bootstrapping was supposed to account for this but I guess I haven’t gotten it right yet. (This method is based on something people use in my subfield of physics so I feel like there has to be a way to fix this problem; I remember the bias at the edges of the data being a thing people talked about…)
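One way to see the false-positive problem is to run the same kind of scan on data that has no change at all. The sketch below is not the code from this post: it swaps the bootstrap for the plain sample variance to keep it fast, and `max_split_z` is a hypothetical helper.

```python
import numpy as np

def max_split_z(data, step=10):
    """Largest z-score over all candidate split points, every `step` pings."""
    data = np.asarray(data, dtype=float)
    best = 0.0
    for t in range(step, len(data), step):
        left, right = data[:t], data[t:]
        # Squared standard errors of the two means, from the sample variance.
        var = left.var() / len(left) + right.var() / len(right)
        if var > 0:
            z = abs(left.mean() - right.mean()) / np.sqrt(var)
            best = max(best, z)
    return best

rng = np.random.default_rng(0)
# 100 data sets with a constant 10% rate, so there is no real change to find.
z_max = np.array([max_split_z(rng.binomial(1, 0.1, size=2000))
                  for _ in range(100)])
fraction_over_1 = np.mean(z_max > 1)
print(fraction_over_1)
```

Two things seem to be going on here. Taking the maximum over about 200 candidate splits makes a threshold of 1 very easy to exceed by chance. Also, a 10-ping slice at an edge can consist only of zeros, in which case its estimated error bar is zero and the z-score there becomes very large.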

Here is my current (NON WORKING) code. Note that I attempt to split the data every 10 pings. This is probably way too often.

import numpy as np

def mean_sd(data):
    """
    This does not, in fact, return the mean and sd of the data.
    It returns the mean and the bootstrap estimator of the standard error of the mean.
    Maybe we can replace this with some kind of exact calculation, assume poisson, etc but let’s do it the slow way for now.
    """
    sample_size = len(data)
    resample_sample_size = 100

    mean_estimates = np.zeros(resample_sample_size)

    for i in range(resample_sample_size):
        # resample with replacement, same size as the original data
        y = np.random.choice(data, size=sample_size)
        mean_estimates[i] = np.mean(y)
    return np.mean(data), np.std(mean_estimates)

def single_split(data, z_thresh=1):
    """
    Attempt to find a place to split the data.
    If no such place exists, return None
    z_thresh: how many error bars the two intervals should differ by to split.
    This may all be terrible because of “multiple comparisons”, maybe using a fixed threshold is bad?
    """
    z_score = np.zeros(len(data))

    # try a split every 10 pings
    for t in range(10, len(data), 10):
        mean1, sd1 = mean_sd(data[:t])
        mean2, sd2 = mean_sd(data[t:])
        difference = np.abs(mean1 - mean2)
        var = sd1**2 + sd2**2
        if var == 0:
            var = 0.001  # yes this is hacky
        z_score[t] = difference / np.sqrt(var)
    arg = np.argmax(z_score)

    if z_score[arg] > z_thresh:
        return arg
    return None

def recursive_split(data, splits, z_thresh=1, offset=0):
    """
    Recursively split the data and mark every split point in the boolean array splits.
    offset: position of data[0] in the full data set, so that splits found in a sub-slice land at the right index.
    """
    t = single_split(data, z_thresh=z_thresh)
    if len(splits) == 0:
        splits = np.zeros(len(data), dtype=bool)

    if t is not None:
        splits[offset + t] = True
        splits = recursive_split(data[:t], splits, z_thresh=z_thresh, offset=offset)
        splits = recursive_split(data[t:], splits, z_thresh=z_thresh, offset=offset + t)

    return splits

#test data: 1 if doing X, 0 if not
#for the first 1000 pings we use a probability 0.1 of doing X, for the second 1000 we use a probability 0.2
test_data = np.concatenate((np.random.binomial(1, 0.1, size=1000),np.random.binomial(1, 0.2, size=1000)))

splits = recursive_split(test_data,[]) #should give you a split near the middle but instead we get a bunch of splits near the edges where the mean is noisy

Do we need to try to find split points? What is the drawback of using e.g. mean_sd on data[T-one_week:T], rolling, to estimate the underlying rate at every T? (Not an argument, just trying to understand. I’m now super curious about the subfield of physics you’re in, and also find it funny that Poisson is helping me figure out Poisson distributions.)

The intuitive idea of the bootstrap is that if I want to know roughly what the error in something I calculate from the data is, I could just split the data in half and see if I get different answers when I only use half the data.

It’s possible the splitting idea is just me seeing everything as a nail, yeah. I guess you don’t need to automate detection of the places where the average seems to change hahaha, so plotting it is fine. I was a bit worried about how small the interval could get and still have the estimation of the error bar make sense. But I guess with the “automated” method that problem is still there.

For instance, we could split my 2000 “pings” into 40 “days” of 50 pings each and plot with the error bars generated this way. Result vs. reality:

[Image: Screen Shot 2021-04-20 at 2.44.08 PM, a plot of the per-day estimates with their error bars against the true probabilities]
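The numbers behind a plot like this can be generated along the following lines. This is a sketch, not the exact code used for the screenshot: the seed, the helper name `bootstrap_se` and the 200 resamples per day are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same kind of test data as above: 1000 pings at p=0.1, then 1000 at p=0.2.
pings = np.concatenate((rng.binomial(1, 0.1, size=1000),
                        rng.binomial(1, 0.2, size=1000)))

# 40 "days" of 50 pings each.
days = pings.reshape(40, 50)
daily_mean = days.mean(axis=1)

def bootstrap_se(day, n_resamples=200):
    """Bootstrap standard error of the mean for one day's pings."""
    resamples = rng.choice(day, size=(n_resamples, len(day)), replace=True)
    return resamples.mean(axis=1).std()

daily_se = np.array([bootstrap_se(day) for day in days])

for i in (0, 19, 20, 39):
    print(f"day {i}: {daily_mean[i]:.2f} +/- {daily_se[i]:.2f}")
```

Plotting `daily_mean` with `daily_se` as error bars against the true values of 0.1 and 0.2 gives the result-versus-reality comparison.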

If we wanted to get smaller error bars we would have to combine together data into 2 days, etc. (Or use a running average of a wider window like you said maybe although I’m not sure about that)

I think this exercise makes it clear that to be able to reliably detect even a doubling of the time you spend on something you do need many pings.
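A back-of-the-envelope version of "many pings", as a sketch: it treats each ping as an independent yes/no draw (a binomial approximation, which is an assumption) and asks how many pings each of two periods needs before their rates differ by z error bars. `pings_needed` is a hypothetical helper.

```python
import math

def pings_needed(p1, p2, z=2.0):
    """Pings per period so that rates p1 and p2 differ by z standard errors.

    Solves |p1 - p2| = z * sqrt((p1*(1-p1) + p2*(1-p2)) / n) for n.
    """
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = z**2 * var_sum / (p1 - p2) ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n, 9))

n_doubling = pings_needed(0.1, 0.2)
print(n_doubling)
```

For the 0.1 versus 0.2 test data this comes to about 100 pings per period at two error bars. At a 30-minute average ping interval that is roughly two days of pings on each side of the change.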

The method I was trying to imitate is one for finding sudden jumps in the motion of particles, derived from here: Phys. Rev. Lett. 102, 088001 (2009) - Building Blocks of Dynamical Heterogeneities in Dense Granular Media

I will play with my actual data and your code and report back with (anonymized :D) tags as examples. It helps that my ping interval is 30 minutes, and I’ll go even more granular in the future.

That would kill me. Or I’d hurl the computer across the room! :slight_smile:

I get so annoyed by being pinged when I’m in the middle of doing something, since the popup steals focus immediately and so either captures what I’m typing or destroys the selection or motion that I was carefully doing with the mouse.

A goodly number of my tags include ‘ffs’…

I use tagtime web, so no focus stealing, and I reply to a lot of notifications with a voice command to my voice assistant without having to physically move. It also helps me be present and aware of what I’m doing. I’m already down to 20m :slight_smile:
