Documentation of aggregation methods would be nice

So as a result of the new triangular beeminding feature I was looking through the other aggregation methods to see what they were.

And honestly, some of them I can guess, but for others I have no idea. The earlier ones are relatively clear, but by the time you get to trimmean I'm not even sure I can guess. trimmean, uniqmean, and truemean are all more or less opaque terms to me here; I can only guess what triangle means because I'm the one who defined it, and I couldn't even begin to guess what jolly is (well, I happen to know from googling that it's the name of a user you implemented it for, but I don't know what it does).

I know I’m doing the equivalent of looking under the car hood and complaining it’s not an intuitive interface, but it would be nice if the engine came with a manual. :slight_smile:

through a strange sequence of events i have actually found a list already public on the internet! but yeah, should definitely be public on, like, the actual site that’s using it :slight_smile:

Oh yes, I’d actually seen that list before and completely forgot that it was a thing. Thanks.

Great point, @drmaciver, and nice work digging up the answer from Akratics Anonymous, @chelsea!

UPDATE 2018: The following table gives the aggday functions. They take the list x of raw datapoints on a given day and return the aggregated datapoint that is plotted and counts toward the Beeminder goal. In the actual Python code this is a hash of lambda functions. And np is NumPy.

UPDATE 2019: I added cap1 and made it pseudocode instead of strictly Python.

last     : x[-1]   # only the most recent datapoint entered that day counts
first    : x[0]    # only the first datapoint entered that day counts
min      : min(x)  # count the min of the day (default for weightloss)
max      : max(x)  # just count the max datapoint
truemean : mean(x)          # use the mean of all datapoints
uniqmean : mean(deldups(x)) # mean of the unique values (shrug)
mean     :                  # alias for uniqmean
median   : median(x)        # has this ever been used for any goal ever?
mode     : mode(x)          # technically: median of the commonest values
trimmean : trim_mean(x, .1) # another way to discount outliers
sum      : sum(x)           # what Do More and Do Less goals use
jolly    : 1 if x else 0    # deprecated; now an alias for 'binary'
binary   : 1 if x else 0    # 1 iff there exist any datapoints
nonzero  : 1 if any([i!=0 for i in x]) else 0 # 1 iff any non-0 datapts
triangle : sum(x)*(sum(x)+1)/2            # blog.beeminder.com/triangle
square   : sum(x)^2                       # requested by DRMacIver
clocky   : clocky(x)   # sum of differences of pairs, HT @chipmanaged
count    : len(x)      # ignore values, just count number of datapoints
skatesum : min(rfin, sum(x))     # only count the daily min
cap1     : min(1, sum(x))        # the sum but capped at 1, HT @zedmango

def clocky(x):
  if len(x) % 2 != 0: x = x[:-1]  # ignore last entry if unpaired
  # pair entries as (start, end) and add up each end - start
  return sum(end - start for start, end in zip(x[0::2], x[1::2]))
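For anyone who wants to try these out, here is a minimal runnable Python sketch of the table above as a dict of lambdas. It's a reconstruction from the pseudocode, not Beeminder's actual code: make_aggday, deldups, mode, and trim_mean are hypothetical helpers, trim_mean is a standard-library stand-in for SciPy's scipy.stats.trim_mean (sort, cut int(0.1·n) values off each end, average the rest), and rfin is assumed to be the goal's rate, used only by skatesum.

```python
import statistics
from collections import Counter

def deldups(x):
    # Hypothetical helper: drop repeated values (judged by value alone),
    # keeping the first occurrence of each.
    return list(dict.fromkeys(x))

def mode(x):
    # "Median of the commonest values": collect every value tied for
    # most frequent, then take the median of those.
    counts = Counter(x)
    top = max(counts.values())
    return statistics.median([v for v, c in counts.items() if c == top])

def trim_mean(x, proportion):
    # Stand-in for scipy.stats.trim_mean: sort, cut int(proportion * n)
    # values off each end, then average what is left.
    k = int(proportion * len(x))
    s = sorted(x)
    return statistics.mean(s[k:len(s) - k])

def clocky(x):
    # Pair entries as (start, end); zip drops an unpaired final entry.
    return sum(end - start for start, end in zip(x[0::2], x[1::2]))

def make_aggday(rfin):
    # rfin: assumed to be the goal's rate; only skatesum uses it.
    return {
        "last":     lambda x: x[-1],
        "first":    lambda x: x[0],
        "min":      lambda x: min(x),
        "max":      lambda x: max(x),
        "truemean": lambda x: statistics.mean(x),
        "uniqmean": lambda x: statistics.mean(deldups(x)),
        "mean":     lambda x: statistics.mean(deldups(x)),  # alias for uniqmean
        "median":   lambda x: statistics.median(x),
        "mode":     mode,
        "trimmean": lambda x: trim_mean(x, 0.1),
        "sum":      lambda x: sum(x),
        "jolly":    lambda x: 1 if x else 0,  # deprecated alias for binary
        "binary":   lambda x: 1 if x else 0,
        "nonzero":  lambda x: 1 if any(i != 0 for i in x) else 0,
        "triangle": lambda x: sum(x) * (sum(x) + 1) / 2,
        "square":   lambda x: sum(x) ** 2,
        "clocky":   clocky,
        "count":    lambda x: len(x),
        "skatesum": lambda x: min(rfin, sum(x)),
        "cap1":     lambda x: min(1, sum(x)),
    }
```

For example, make_aggday(rfin=3)["skatesum"]([2, 2]) gives 3 rather than 4, and "binary" returns 1 for [0, 0] while "nonzero" returns 0.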

[Search keyword: “aggday docs”]

Would it be too much trouble to write a brief English description for each? I mean, I guess I could go looking through the NumPy and SciPy docs, but it's likely that I'm going to have to look up what "trim_mean" means every time.

Also, I think deldups is undefined. What constitutes a duplicate data point? Is it deduping data points with the same value, same comment, same timestamp, etc. ?

Done! Added blurbs to each. Most of these you’ll never realistically want. Some (like the mean of unique values, and trimmed mean; also “first”) are vestiges of convoluted weight loss rules we experimented with when Beeminder was brand new. (Before we finally concluded it was ok to just be generous and count your min weight of the day.)

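As for deldups, it's just the values themselves: two datapoints count as duplicates if they have the same value, regardless of comment or timestamp.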
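A quick illustration of what that means for uniqmean versus truemean, on a made-up day where the same weight was entered three times. deldups below is a hypothetical stand-in that drops repeated values and keeps the first copy of each:

```python
from statistics import mean

def deldups(x):
    # Hypothetical stand-in: drop repeated values (judged by value alone),
    # keeping the first occurrence of each.
    return list(dict.fromkeys(x))

day = [70.0, 70.0, 70.0, 72.0]  # made-up data: three identical entries, one different

truemean = mean(day)           # every datapoint counts: 70.5
uniqmean = mean(deldups(day))  # duplicates collapse to one: mean of [70.0, 72.0] = 71.0
```

Repeated identical entries pull truemean toward the repeated value, while uniqmean counts each distinct value once.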

You really should promote these more, like with a link right where you choose the function; that is a pretty easy UVI.

Actually, I might have a need for jolly on an upcoming goal (it will have to wait for IFTTT integration). It actually seems super useful, but if I hadn't stumbled over this I never would have known.

@tomjen, ping me for the IFTTT preview link. It seems to be working beautifully for the other people who are helping us beta test it. (And please publish any recipes you create with it. They won’t show up publicly till we launch the channel but they’ll help other beta testers.)

As for jolly aggregation, we made that for @jolly since he’s been so good to us that we’ll pretty much do anything he says, but I was never sure it was a good idea. It seems like a shame to only visualize such a pale shadow of the actual data being collected. I guess triangular beeminding also means you’re visualizing something quite different from the underlying data, though technically no information is actually lost in that case, since the function is invertible.
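For nonnegative daily sums, the inversion is just the quadratic formula: if the plotted value is t = s(s+1)/2, then s = (sqrt(8t+1) - 1)/2. A small sketch (the function names are made up for illustration; for negative sums the mapping isn't one-to-one, since s = 0 and s = -1 both give 0):

```python
import math

def triangle(s):
    # What aggday=triangle plots for a day whose datapoints sum to s.
    return s * (s + 1) / 2

def untriangle(t):
    # Recover s >= 0 from t = s(s+1)/2 via the quadratic formula.
    return (math.sqrt(8 * t + 1) - 1) / 2

untriangle(triangle(4))  # 4.0
```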

Definitely let us know if you try jolly beeminding. But I agree with @mary that autoratchet is probably a better way to accomplish this.

Absolutely! I’ve found it very useful: https://www.beeminder.com/drtall/goals/median

I like my secret Jolly function… Although I have no idea if anyone other than me uses it…

I have some ideas planned for the jolly data type now that I know what it is.

Not to devalue Jolly's contribution, but maybe you could add an alias for it called 'binary', which may be more indicative of what it does.

I can’t tell if you’re serious!

Good call. Done.

I just added another aggday setting: count. It just counts the number of datapoints each day, ignoring their actual values. I’m beeminding Mark Forster’s productivity system called Final Version and wasn’t sure if I should beemind chains gotten through or total individual tasks. I decided that counting total tasks, with a datapoint for each chain giving the number of tasks in that chain, was the best of both worlds. Then if I change my mind later I can set aggday=count and beemind number of chains instead.

I have no idea if I’ll ever use that but since I came up with a reason to want it in theory I went ahead and added it!
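With one datapoint per chain whose value is that chain's task count, switching between sum and count changes what the goal measures without touching the data. A tiny illustration with made-up numbers:

```python
# Made-up day of Final Version chains: one datapoint per chain,
# each value being the number of tasks completed in that chain.
chains = [3, 5, 2]

tasks_done = sum(chains)   # aggday=sum: beemind total tasks -> 10
chains_done = len(chains)  # aggday=count: beemind number of chains -> 3
```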

I moved 5 posts to an existing topic: Complementary Tools: Mark Forster’s Final Version(s)

I’d like to request a new aggday setting: ‘ceiling’ which I guess would be:

lambda x: ceil(x) # round the number up to the next integer.

I would use this on a goal whose usual values are between 0 and 1, where I currently use 'binary'. But that does not allow me to report a zero value.

I think x is the list of data points, so I don’t know what it means to take the ceiling of a list.

If all you want is binary but where 0 elements don’t count, I think that is

lambda x: 1 if any(x) else 0

+1. I definitely would use “1 if any non-zero datapoints.” I would replace all my binary goals with it.

I’ve just realised that ‘binary’ checks for the presence of a datapoint, not for the presence of a non-zero datapoint.

Which interestingly means that it’s not binary at all: the aggday code will only ever return ‘1’ because it’ll only be called in contexts where datapoints exist. So maybe it should be renamed to ‘unary’, with ‘binary’ rewritten as something like @drtall’s proposal, or

'binary' : lambda x : 0 if ( np.sum(x) == 0 ) else 1


updated to reflect things that I'd skipped over in my surprise and haste.
updated again to retract the lousy code I wrote
and yet again to replace and redact my lousy code for overall coherence of the thread (:

Just FYI these two are not equivalent. Yours will return 0 if the data consists of [1, -1] whereas mine will return 1.
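To see the difference concretely, here are the two proposals side by side. The function names are just labels for this sketch, and the built-in sum stands in for np.sum, which gives the same result on a plain list of numbers:

```python
def any_nonzero(x):
    # 1 if any datapoint is non-zero (the any(x) proposal)
    return 1 if any(x) else 0

def nonzero_sum(x):
    # 1 unless the datapoints sum to zero (the np.sum proposal)
    return 0 if sum(x) == 0 else 1

print(any_nonzero([1, -1]), nonzero_sum([1, -1]))  # prints: 1 0
print(any_nonzero([0, 0]), nonzero_sum([0, 0]))    # prints: 0 0
```

The nonzero entry in the aggday table above behaves like the first version.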
