Progress Report on Yellow Brick Half-Plane

Content warning: Way more nitty-gritty than an end user of Beeminder could reasonably care about. But maybe you unreasonably care about it? Either way, Beeminder is going to be way better for everyone when this is done.

Background: Yellow Brick Whatnow?

To start at the very beginning… The Yellow Brick Road is the path on your graph that starts at the current value of the metric you’re minding and ends at where you want it to be, or perhaps soars upward indefinitely. (Related: Why Beeminder likes cumulative graphs.) Historically the yellow brick road has been defined in terms of the centerline and then so-called lanes on each side of that. The lane on the good side of the road we called being in the blue and represented two days of safety buffer. On the bad side of the centerline was being in the orange which was one day of buffer. (If you’re thinking of a standard do-more goal then “good” means above the road and “bad” below, but this is reversed for do-less or weight loss goals.)

That turned out to be a horrible design choice, brutally complicating all sorts of things that should be simple, like computing how much safety buffer you have, especially when the slope of the road changes.

Yellow Brick Half-Plane means scrapping the concept of lanes. Instead of defining the road in terms of the centerline, it’s defined in terms of the critical edge. Then you can still have colored zones defined in terms of the number of safe days. This has a million implications for the implementation but the primary user-visible difference will be that the thing you’re editing with the road editor is the one bright line that you can’t end the day on the wrong side of.

Status

We made a checklist of 9 hairy infrastructural changes we need to make to Beeminder before we can transition to Yellow Brick Half-Plane (YBHP), and we’re now more than halfway through it.

This is also intertwined with porting Beebrain – that’s the module that takes your datapoints and all your goal settings and computes the myriad statistics and draws the actual graph – from Python to Javascript. So here’s the list:

  1. Die timezones. (Beeminder-proper should deal with all the timezone logic and just send datapoints to Beebrain specified as daystamps.)
  2. Die asof-null. (Beeminder-proper tells Beebrain explicitly the “as of” date for drawing the graph so we can purge the code in Beebrain that would infer it and so Beeminder-proper doesn’t have to worry about when Beebrain generated a graph or anything.)
  3. Die edgy. (This was such a clusterf*ck I don’t know where to begin. Ok, let me put some old notes in a footnote for posterity so we can marvel at the absurdity [1] and then move on!)
  4. Die reset. (Beebrain used to handle the convoluted logic of hard resets on goals that would hide old data and reset where the yellow brick road started. Nowadays you just zoom in if you don’t want to see old data but there’s fundamentally one road and one continuous dataset (cf UVI#2468).)
  5. Die inferred tini/vini. (Beebrain used to be in charge of inferring the start of the yellow brick road based on your first datapoint. Now Beeminder-proper fully specifies the yellow brick road (cf UVI#2405).)
  6. Die weight loss leniency. (More convoluted logic that should never have been in Beebrain’s purview (cf UVI#2475).)
  7. Die noisy width and auto-widening. (Now blogged; see also what I said in a beemail [2] (cf UVI#2474 and UVI#3271).)
  8. Die exponential roads. (These can be perfectly well approximated by piecewise linear functions. Weekly or monthly changes in the linear slope of the yellow brick road are always good enough. It was fun to write the code that did this all exquisitely correctly but, as they say, kill your darlings!)
  9. Die lanes!
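
To make the claim in #8 concrete, here is a minimal sketch of approximating an exponential road with weekly linear segments. This is not Beebrain's code: the function names, the starting value of 200, and the daily rate of -0.001 are all made up for illustration.

```python
import math

def exponential_road(t, v0, daily_growth):
    """Value of a (hypothetical) exponential road t days after its start."""
    return v0 * math.exp(daily_growth * t)

def piecewise_linear_knots(v0, daily_growth, total_days, segment_days=7):
    """Sample the exponential road every segment_days days.

    The returned (day, value) knots are the corners of the linear segments.
    """
    days = list(range(0, total_days, segment_days)) + [total_days]
    return [(d, exponential_road(d, v0, daily_growth)) for d in days]

def interpolate(knots, t):
    """Evaluate the piecewise linear road through the knots at day t."""
    for (t0, y0), (t1, y1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the road")

# Assumed example: start at 200, shrink by about 0.1% per day, for one year.
knots = piecewise_linear_knots(v0=200.0, daily_growth=-0.001, total_days=365)
worst_error = max(
    abs(interpolate(knots, t) - exponential_road(t, 200.0, -0.001))
    for t in range(366)
)
print(worst_error)
```

With numbers like these the worst-case gap between the two roads should be tiny compared to any real measurement noise, which is the sense in which weekly slope changes are good enough.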

It’s been a long road, so to speak, but the light is at the end of the tunnel. I was just hashing out the latest design decisions for #7 with @bee and we thought we’d try doing it in public in a forum thread so, here we are!

PS: HT Kevin Lochner who, years ago, first said “Isn’t this yellow brick road more of a yellow brick half-plane?” to which we probably said “kind of but that sounds stupid” and HT @insti for articulating the problem with lanes as we’d implemented them and suggesting the theoretically elegant solution of defining the road in terms of the critical edge. Last but not least, HT @kenoubi for putting a bounty on this.

Footnotes

[1] Notes on the “edgy” parameter for posterity:


Suppose you start a pushups goal of 3/day but want the first 3 due on the first day. This means starting the centerline of the yellow brick road above the initial datapoint so the initial datapoint starts on the edge of the road. So an “edgy” goal is one where the initial point is on the bottom (or top, depending on “yaw”) edge of the road. That means moving the start of the road up (or down) by one lane width compared to where it would be if it started at the initial datapoint. But we have a chicken and egg problem: We need the road function in order to compute the lane width and we need the initial point, (tini,vini), to compute the road function. In other words, we need (tini,vini) to compute the lane width but we need the lane width to compute (tini,vini), to shift it.

Solution (sort of): The lane width is generally equal to the daily rate of the road. So we can have the road start at the initial datapoint but one day earlier. Ie, (tini,vini) -= (86400,0). Then when we know the road function we can move tini forward a day and vini by the appropriate amount, vini = rdf(tini+86400).
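
A minimal sketch of that trick, assuming a single linear road segment. The helper names make_road and edgy_start are hypothetical, and rdf here is just a straight line rather than the real road function:

```python
SECONDS_PER_DAY = 86400

def make_road(tini, vini, daily_rate):
    """Return a hypothetical linear road function rdf(t) through (tini, vini)."""
    def rdf(t):
        return vini + daily_rate * (t - tini) / SECONDS_PER_DAY
    return rdf

def edgy_start(tini, vini, daily_rate):
    """Apply the 'start one day earlier' trick and return the shifted (tini, vini)."""
    # Step 1: (tini, vini) -= (86400, 0), so the road starts a day early.
    early_tini = tini - SECONDS_PER_DAY
    rdf = make_road(early_tini, vini, daily_rate)
    # Step 2: move tini forward a day and read vini off the road function.
    new_tini = early_tini + SECONDS_PER_DAY
    return new_tini, rdf(new_tini)

# Pushups example: 3/day, first datapoint is 0, so the road starts at 3.
print(edgy_start(1_500_000_000, 0, 3))
```

For the 3-per-day pushups goal this puts the start of the road one day's rate above the initial datapoint, so the first 3 pushups are due on day one.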

The only problem with this approach is that it only puts the initial datapoint at the edge if the rate of the first segment of the road is the same as the rate of the most recent datapoint, since it’s the rate there that determines the overall lane width. Trying to do this Really Right gets very messy or even impossible without introducing worse problems than the initial point not being on the actual edge.

I think the right solution to the ‘edgy’ confusion is to get rid of the edgy parameter – goals are always edgy. To get the edgy=false behavior, just use the road matrix to specify one initial flat day. And in fact goals should always have an initial road row that says rate 0 up till yesterday or today (because otherwise if you add a datapoint before the first datapoint then you’ll make the road change).

[later] Bethany has now convinced me that edginess should not be in Beebrain’s purview. The start of the road is always given explicitly by tini and vini. When you create a goal that should be edgy you know the initial rate (eg, the generous estimate that the user provides for Set-A-Limit [now do-less] goals) so you can bump vini up by that amount. So we’re going to do that! Phew!


[2] Excerpt from daily beemail: We’re killing off auto-widening yellow brick roads (finally!) and instead asking you explicitly when you create the goal what your max daily weight fluctuation is. Then we actually use a zero-width road but starting enough above your initial datapoint to account for your fluctuations.

In some ways this is a temporary step backwards but our intent is to end up with the best of both worlds where there’s a fixed bright line you have to stay below but we do the same data analysis to give you warnings about your proximity to that line that are at least as good as the current so-called lanes are.

And let me not mince words (I guess that’s easy when I’m only castigating my past self!): I now feel my entire auto-widening concept was deeply wrongheaded and I’m pretty embarrassed at the amount of effort I spent rationalizing it!


re #7

It seems like making them all zero width and moving the roads to be at current critical edge matches the status quo, except for in the case of ppl currently in the blue, but we’re removing the “can’t lose tomorrow if you’re in the blue” guarantee anyway, so… not such a big deal…

If we want to be nicer than “this matches the current status quo” we could give ppl in the orange extra buffer (e.g. by moving the road to be at vcur + dflux), but that’s optional.

So this will involve adjusting roads in some as-yet undetermined manner.

And sending out an email to all affected because things will look very different.

Possibly we want to write a blog post about it to point people to in the email where we do the bulk of our justifying.


For #6 – weight loss leniency (WLL) – this is done for all new goals but before we can rip the relevant code out of Beebrain we have to transition existing goals. Here’s the relevant Beebrain code:

d = [(t,v) for (t,v) in data if t >= tini] # aggday'ing has already happened
vini = yaw*min(yaw*v for v in [vini]+[v for (t,v) in d[0:min(7,len(data))]])

In words: assuming weight loss (though this applies to any goal with noisy set to true), we retroactively shift the start of the yellow brick road up high enough to match the max of the original vini and the first 7 datapoints.
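
Here is the same logic wrapped in a function with made-up data so it can be run on its own. The function name and the sample weights are assumptions for illustration. The slice d[:7] is equivalent to the original d[0:min(7,len(data))] because Python slices clamp to the list length.

```python
def wll_vini(data, tini, vini, yaw):
    """Weight loss leniency: pick the most lenient starting value.

    data: list of (timestamp, value) pairs, already aggregated per day.
    yaw:  +1 if the good side of the road is up, -1 if it is down.
    """
    d = [(t, v) for (t, v) in data if t >= tini]
    candidates = [vini] + [v for (t, v) in d[:7]]
    # For yaw = -1 this is the max of the candidates; for yaw = +1, the min.
    return yaw * min(yaw * v for v in candidates)

# Hypothetical weigh-ins: (day index, weight).
weights = [(0, 180.0), (1, 181.5), (2, 179.0), (3, 182.2)]
print(wll_vini(weights, tini=0, vini=180.0, yaw=-1))
```

With yaw = -1 the road start gets bumped from 180.0 up to the highest of those early weigh-ins.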

As far as I can tell we never explicitly promised the user that we’d do that leniency adjustment so I don’t think we need to perfectly match it in the new world order. (The new world order, btw, means an explicit dflux parameter for maximum daily fluctuation and the road just starts at dflux above your initial datapoint.) Rather, let’s try a heuristic:

Set dflux to be max - min of your datapoints so far, and not less than 1% of your current weight.
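
As a sketch, that heuristic might look like the following. The function name is hypothetical, and it assumes the last datapoint counts as the current weight:

```python
def transition_dflux(values):
    """Heuristic dflux for an existing goal.

    values: the goal's datapoint values so far, oldest first.
    Assumes the most recent value is the user's current weight.
    """
    spread = max(values) - min(values)
    floor = 0.01 * values[-1]  # never less than 1% of current weight
    return max(spread, floor)

print(transition_dflux([180.0, 182.5, 179.0]))  # spread dominates
print(transition_dflux([200.0]))                # only one datapoint: 1% floor
```

The 1% floor matters mostly for goals with only one or two datapoints, where max minus min would otherwise be zero or close to it.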

Related question for @bee: Did we decide how to handle the case of someone putting in a completely unrealistic dflux?

Nope. We only check to make sure it’s a number.

Basically, what we’re doing with the variance right now is asking for your starting safety buffer, just by a different name. Anyhow, communicating with users is difficult. If someone wants to use a goal just for tracking and wants to start with a huge buffer above their current weight, they should be able to do that (and can), so then it becomes hard to tell the difference between intentionally weird numbers and unintentionally weird ones.

But that realization just now that what we are doing is analogous to starting safety buffer is maybe useful? I mean, it’s kind of a hard ask: “how much does your weight fluctuate”… I’m not sure where I’m going with this. Maybe there’s not any improvement to be made by framing it in terms of safety buffer because we don’t know ahead of time how many grams will be necessary to give them the requested safety buffer.

Ok, maybe “safety margin” could be a useful term? still asking the user for a weight, but calling it the safety margin…


Let’s see if we can justify/rationalize just setting the razor road (we’re calling zero-width roads “razor roads” now) at what’s currently the critical edge.

To be clear, this is only for existing goals we need to transition – new goals all have razor roads already, with explicitly chosen dfluxes.

So as you say, the one case where just setting a razor road at the current critical edge is technically harsher than the status quo is if you’re in the blue, close to the centerline, and the road is narrow so far but then you eat a horse and leap into the red the next day. I think we can solve this well enough by generously choosing dflux for existing goals.

In all of the following cases, after determining dflux, we set the razor road to shift up by dflux (technically subtract yaw*dflux so it shifts in the good/generous direction). If dflux is set to the lane width (lnw) that means setting the razor road to what’s currently the critical edge.
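
To spell out the sign convention, here is a tiny sketch. It treats the road as just a list of values, which is a simplification of the real road matrix, and the function name is made up:

```python
def shift_road(road_values, yaw, dflux):
    """Shift every road value by dflux in the generous direction.

    yaw is +1 when the good side is up (do-more) and -1 when it is down
    (weight loss), so subtracting yaw*dflux always moves the road away
    from the good side, giving the user more room.
    """
    return [v - yaw * dflux for v in road_values]

# Weight loss (yaw = -1): the road moves up by dflux.
print(shift_road([180.0, 179.0], yaw=-1, dflux=2.0))
# Do-more (yaw = +1): the road moves down by dflux.
print(shift_road([10.0, 13.0], yaw=1, dflux=2.0))
```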

Cases:

  1. Frozen goal
  2. Active goal in the green with recent data
  3. Active goal in the green w/o recent data
  4. Active goal in the blue with recent data
  5. Active goal in the blue w/o recent data
  6. Active goal in the orange or red with recent data
  7. Active goal in the orange or red w/o recent data

For case 4 there’s an argument for coming up with a more generous dflux. For case 6, dflux=lnw is what the status quo does but it could be nice of us to add some leniency as part of this transition. For all other cases, dflux=lnw seems uncontroversial.

Question for @bee: How many goals are in cases 4 and 6? Maybe we could eyeball them all or look at enough of them to pick a heuristic for a more generous dflux?

[talking out loud ensues]

DONE item for @bee: Every existing weight-loss/noisy goal needs an explicit dflux. Beebrain computes a nice one based on your data so put that in the database. It’s what we’ll use when explicitly restarting a frozen goal as well.

active noisy goals with a datapoint in the last week:

  • blue => 78
  • orange => 73
  • red => 10

opening up that window slightly and looking at noisy goals with a datapoint added in the last 3 weeks:

  • blue => 97
  • orange => 107
  • red => 12

So the concern here is that ppl who haven’t collected 7 datapoints yet might be getting an unfair deal if we just stop doing WLL fullstop?

I have the sense that we can probably just wait, say, 2 weeks, and then end WLL fullstop.

There are basically two classes of goals where the WLL clause hasn’t already applied:

People who only just started beeminding, and are actively engaging with the goal. These people will benefit from the WLL thing. But their number is honestly small, and we can just wait two weeks and then they’re covered.

The other set, the majority, are moribund goals. They’re things where conditions are such that they were able to just ignore the goal indefinitely / for long enough to just forget about it. Either we let them set it up with a flat road, and then they never weighed in again, or they weighed in a small number of times (but more than 1) putting them way below the road, or there’s some other kind of error with road setup (positive rate, etc). Anyhow, the point is that they haven’t entered more than 7 days’ worth of data because they’ve just abandoned the road.

In this case, just fixing the starting point of the road where the current starting point is and never adjusting doesn’t seem like a problem. And in fact, in some of the cases I’ve seen it seems like doing the WLL starting point adjustment would actually be fairly confusing and wrong-headed.


Agreed!

So it kind of sounds like we can just wait 2 more weeks and then rip WLL out of Beebrain and we’re done with #6?

Although: however long we wait, it’s possible for someone to have been inactive from their first datapoint until a couple days ago. Like it’s been a month since they started their goal but they only actually have a few datapoints, all but the first recent. So I think we need to just manually intercede to make those dfluxes reasonable. The longer we wait the fewer of them there will be but the number may already be plenty small and we could do this now.

(But now I’m confused about the mechanics of this. @bee, when Beebrain decides to up and change vini, as in WLL, how does that make its way back to the database?)

At the moment it doesn’t, really. The goal thinks its vini is A and keeps sending that to beebrain, and beebrain keeps ignoring it and deciding to use B instead. Beebrain does return that to the goal in its output, so the actually used vini is stored in the bb field. I did one update query already that set vini = bb.vini for noisy goals where they differed. I’ll do that again just before you deploy a new non-WLL beebrain.
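
Roughly, that update amounts to something like this sketch. The real thing is a database query in Beeminder-proper; the dict layout and key names here are assumptions for illustration only:

```python
def sync_vini(goals):
    """Copy Beebrain's actually-used vini back onto noisy goals where it differs.

    goals: list of dicts with hypothetical keys 'noisy', 'vini', and 'bb'
    (the stored Beebrain output). Returns the goals that were changed.
    """
    changed = []
    for goal in goals:
        bb_vini = goal["bb"].get("vini")
        if goal["noisy"] and bb_vini is not None and goal["vini"] != bb_vini:
            goal["vini"] = bb_vini
            changed.append(goal)
    return changed

goals = [
    {"noisy": True, "vini": 180.0, "bb": {"vini": 182.2}},   # gets updated
    {"noisy": True, "vini": 175.0, "bb": {"vini": 175.0}},   # already in sync
    {"noisy": False, "vini": 10.0, "bb": {"vini": 12.0}},    # not noisy: skipped
]
updated = sync_vini(goals)
print(len(updated), goals[0]["vini"])
```

Running it right before deploying a non-WLL Beebrain means the goal's own vini already matches what Beebrain had been using, so the graph shouldn't move.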


I’m adding the dflux field to the db, and storing what the user enters when they set up the goal there. Possibly it shouldn’t be called dflux, though, because it is not the nicely computed thing that bbrain does.


Proposal: What Beebrain computes is stdflux because it’s like standard deviation, computed from data so far. What the user specifies is maxflux – max daily fluctuation.

DONE: Refactor dflux in beebrain to stdflux
DONE: Beeminder’s attr is called maxflux.

Meta: We’ll edit to-do items like above to say “DONE” when they’re done, so we can grep this thread for to-dos in the usual way.

Plan for finishing up #7 (die noisy param and auto-widening):

  1. Big picture is we need to obliterate the “noisy” field
  2. Probably make all noisy graphs razor roads
  3. For most cases we can just set razor road = existing critical edge
  4. But there are a couple cases where we’ll eyeball the graphs PLN
  5. The number of graphs goes down the longer we wait but maybe it’s already a perfectly manageable number
  6. Email everyone whose graph we’re changing. Those emails (even when slightly smarmbotty, though we’ll make sure not to cross that line) really impress people and lead to good feedback and insight for us. (This is us psyching ourselves up for this because emailing people has a massive ugh field around it for us. Speaking of which, it may be time to try Intercom.io which supposedly makes it much easier to email arbitrary sets of users.)
  7. [DONE] Pretty urgent bugfix: restarting a frozen/archived noisy goal needs to not restart you on the razor’s edge
  8. Open question how exactly recommits should work. Arguably razor’s edge but with flat spot is ok. For weight loss you presumably derailed due to an up-fluctuation so maybe you don’t need additional margin, just the flat spot? This might vary a lot across users though so this is pretty tricky. Let’s say we start with making sure we’re pareto-dominating the status quo. Maybe that means recommits happen at bb:stdflux above the current datapoint?

PLN: better plan

Do we really want to make all existing noisy graphs razor roads? An alternate, lazier solution would be to do an update that sets the abslnw = current lnw and then be done with it.

There’d be no more growing or shrinking, but it wouldn’t look like a massive change – save that for later when we’ve got true YBHP. Then we don’t have to do transforms on road matrices. (Well, not yet. We’ll have to transform road matrices to change to YBHP anyway, so might as well delay for now and change status quo as little as possible for existing noisy folks. For now.)


Yeah, I agree. In any case that’s a good first step. We can take the next step later, but you’re right, that’s more part of death-to-lanes than death-to-noisy/auto-widening.

Thanks for the update! I am still very excited about this! And this update is well timed since I am just on the verge of re-ramping up my Beeminder usage after taking a year off (which has included a lot of re-remembering Beeminder’s quirks including this one).


Ok, so tentative plan going forward:

Weightloss Leniency

  • [DONE] Danny kills it in Beebrain
  • [DONE] lets Bee know just prior to deploying updated beebrain
  • [DONE] Bee does a final DB update to set goal.vini = bb[:vini]

Existing Noisy Roads

  • write a blog post justifying / explaining our reasoning for making the change, and laying out clear expectations about what will change for ppl with currently noisy goals.
  • auto-email all those ppl to let them know that we’re making a change [solicit feedback?]
  • shortly thereafter run a DB update that sets goal.abslnw = bb[:lnw] (~ or possibly bb[:dflux] if that happens to be more generous?)
  • Danny rips noisy out of beebrain
  • Do I need to stop sending noisy to beebrain as well?
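
For the abslnw step, a sketch of the "whichever is more generous" option. It assumes a wider lane is the more generous one and that bb is a dict holding lnw and possibly dflux; the function name is made up:

```python
def frozen_abslnw(bb):
    """Pick the fixed lane width to store on an existing noisy goal.

    bb: dict of Beebrain output with 'lnw' and, optionally, 'dflux'.
    Assumption: the wider of the two is the more generous choice.
    """
    lnw = bb["lnw"]
    dflux = bb.get("dflux")
    if dflux is None:
        return lnw
    return max(lnw, dflux)

print(frozen_abslnw({"lnw": 1.5, "dflux": 2.0}))
print(frozen_abslnw({"lnw": 1.5}))
```

Taking the max means nobody's road gets narrower as a result of the transition, which is what keeps the change from being harsher than the status quo.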

Am I missing things?


You’re satisfied on the question of how db.vini could ever not equal bb.vini?

Also we’re killing “dflux” and replacing with “stdflux” in Beebrain and “maxflux” in Rails. [DONE]

Do we maybe need a way to recognize weightloss goals even if the goal type is changed to custom? Right now “noisy” basically tells us this, but without noisy then inbox goals and weight loss goals are differentiable I guess only by the lane width being set (either to 0 or to dflux based on if it is an existing goal vs new/restarted goal). Is that weird? a problem? Well, by those things, and by the goal type, but users w/access to custom goals can change the goal type.


I’d say yes. In general I think there’s huge value in having much more structured data about people’s goals. Similarly I think it’s worth gathering more information on users too, for Science. For goals, it matters both for Science and for Engineering. Like right now we don’t have a good way to know how many pushups goals are out there and we’d really like to know that when building a goal creation wizard more like StickK’s.

In conclusion, as we’re hacking away on transitioning these goals, don’t lose the information that they used to have noisy set to true.


This thread is the coolest.
