Revisiting "Red Yesterday"

Intuitively, what it means to derail on a Beeminder goal is that you’re still in the red when your deadline hits. (In bright-red-staircase world you’d derail at the exact instant you cross the bright red line.)

There’s more background on this in a years-old forum thread; I wanted to think through a proposal from @zzq there and try putting it in my own words.

Status quo: A Beeminder goal is deemed to be derailed if yesterday’s datapoint is in the red.

Advantages of that definition:

  1. It’s very simple
  2. It’s robust to delays in checking the graph status when the deadline hits
  3. It means that editing past data can cause you to derail.

That last one is a very mixed bag, of course. Sometimes that’s right and proper – you did how much you did, and that either kept you hewing to the bright red line or it didn’t. If it didn’t, then a derailment is correct. On the other hand, it’s usually surprising in a bad way when it happens. And maybe you just deleted an old datapoint because you’re moving its amount to a different day or something. Insta-derailing the moment one datapoint is deleted is not great in that scenario.

Only-the-first-check-counts proposal: A Beeminder goal is deemed to be derailed if yesterday’s datapoint is in the red AND this is the first time today that we’re checking whether it’s derailed.

That means a goal needs a timestamp field, initially set to “never”, for when we last checked its derailment status. Checking for derailment means checking that yesterday’s datapoint is red and that the time of the last check was before the most recent deadline.
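
In rough Python (all names made up; not how the actual code is structured), that check might look something like:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Goal:
    deadline_hour: int                            # e.g. 17 for a 5pm deadline
    last_derail_check: Optional[datetime] = None  # "never" until the first check

def most_recent_deadline(goal: Goal, now: datetime) -> datetime:
    d = now.replace(hour=goal.deadline_hour, minute=0, second=0, microsecond=0)
    return d if d <= now else d - timedelta(days=1)

def yesterday_is_red(goal: Goal) -> bool:
    return False  # stub: stands in for the existing "red yesterday" computation

def check_derailment(goal: Goal, now: datetime) -> bool:
    deadline = most_recent_deadline(goal, now)
    if goal.last_derail_check is not None and goal.last_derail_check >= deadline:
        return False                  # already checked since the most recent deadline
    goal.last_derail_check = now      # the first check after the deadline is the one that counts
    return yesterday_is_red(goal)
```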

Then comes a painful list of edge cases to deal with:

  1. The user changes the deadline [Update: we don’t currently let you change the deadline when you’re in the eking window, but it’s worth thinking about how that would play out, as well as less weird ways the deadline could change.]
  2. The user adds or edits data after the deadline but before Beeminder has checked the derailment status
  3. All the ways the existing code presumes that if a goal is derailed, the check for that will give a consistent answer until the goal changes, like by being rerailed.
  4. How confusing might it be if you end up in a situation where your graph shows you hopelessly below the red line but not derailed?
  5. Does any of this yield any temptation for fake data?

Those may or may not be hard or messy or dealbreakers. For 1, we could set the last-checked timestamp back to “never” if the deadline changes. Maybe edge case 2 is a non-issue. And for 3, the intended protocol is to immediately, atomically rerail a goal the moment it derails, so that might not be an issue either. The answer to 4 might be “quite” but still less confusing than accidental insta-derails.

We could even choose a compromise on advantage 3 of the status quo. Maybe any changes to your data in the past week trigger a reset of the last-checked time to “never”. So editing recent data could still derail you, but 1 week becomes like a statute of limitations.
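
Sketchily, again with made-up names, the corresponding data-edit hook could be:

```python
from datetime import date, timedelta

def on_datapoint_edited(goal, datapoint_date: date, today: date) -> None:
    # Edits to data from the past week re-arm the derailment check;
    # anything older is past the statute of limitations.
    if today - datapoint_date <= timedelta(days=7):
        goal.last_derail_check = None   # back to "never"
```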

2 Likes

I feel like this is one of those things that is a bit weird, but surely doesn’t trip up “normal” users much. If you try to edit a past datapoint you get the big ol’ warning. This has been the source of me accidentally derailing myself a few times with buggy scripts, but I mean, if you’re doing that you’re not a normal user and you can figure it out.

1 Like

I don’t think 4 is a problem with this relative to the status quo—both now and in this proposal, one can be hopelessly in the red for up to 24 hours. In the status quo that would require something like a large negative datapoint today, and in this proposal the same state of affairs could come about by deleting old data.

This isn’t just about eking. The user changing the deadline means that the “beeminder day” may be more (or less) than 24 hours. You shouldn’t literally just check “24 hours” (rather than the calendar day) anyway because of DST, but if you don’t save what the original (now-changed) deadline was, you lose track of the start of the deadline-based-calendar-day for that goal.

(By “deadline-based-calendar-day” I mean the span of time from one deadline occurrence to the next, which in beeminder world is often represented with a daystamp. Come to think of it, a refined version of my proposal would be to store a daystamp rather than a timestamp. Storing the timestamp of the last derailment check has the issue described above, but saving the daystamp doesn’t. So never mind, it’s not an issue.)
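
A rough sketch of the daystamp version (hypothetical helpers and field names, and assuming for simplicity that the deadline falls at the same clock hour each day):

```python
from datetime import datetime, timedelta

def yesterday_is_red(goal) -> bool:
    return False  # stub, same stand-in as in the sketch upthread

def current_daystamp(goal, now: datetime) -> str:
    # The "beeminder day" a moment belongs to is named by the date of the
    # next deadline occurrence, so just after a 5pm deadline you're already
    # in tomorrow's daystamp.
    deadline_today = now.replace(hour=goal.deadline_hour, minute=0,
                                 second=0, microsecond=0)
    day = now.date() if now < deadline_today else now.date() + timedelta(days=1)
    return day.strftime("%Y%m%d")

def check_derailment(goal, now: datetime) -> bool:
    stamp = current_daystamp(goal, now)
    if goal.last_checked_daystamp == stamp:
        return False                     # already checked this beeminder day
    goal.last_checked_daystamp = stamp   # one counting check per beeminder day, whatever its length
    return yesterday_is_red(goal)
```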

2 Likes

Another “legal” way a goal’s “calendar day” can be longer than 24 hours is when the user is moving westward through time zones. What is the chance that the user will be tweaking some past datapoints on the exact day when they traveled west and switched the system-wide time zone? Minimal but never zero :slight_smile: I am not suggesting that there should be any exception created for a situation like this, or for the DST case that @zzq mentioned – rather, I want to support the idea that if this is implemented, there should be a generalized solution that takes all this potential weirdness into account.

In response to the general idea, my first thought was to consider whether this messes with the anti-magic principle: but probably not; the status quo is just something that we’re currently used to [*] and then we’ll get used to the new order.

[*] Do I sometimes reorganize datapoints in the past? Yes, I do, but I have to remember to add first and then to subtract, so as not to trigger the derailment check, and that was easy enough that I haven’t contacted support about accidentally derailing in such a situation in a long, long time. (Every time I travel westward though… keeping in mind that it’s already beeminder midnight at 4 p.m. is so hard for a jet-lagged brain.)

2 Likes

Very much agreed. I’m strongly in favor of thinking only in the abstraction of “beeminder days”, the time between two occurrences of the goal’s deadline, considered as an atomic unit (and represented by a daystamp.)

The core idea, as I see it, of this proposal is to strengthen this abstraction by making each such “day” be atomic: do whatever changes you like, within the bounds of the day, and so long as by the end of it you’re not in the red, you are OK. That’s how it currently works for adding new data! So long as you add your day’s datapoints by the deadline, everything is fine, and it doesn’t matter what data you added (or removed) for today along the way, in whatever order—only the final, aggregated, total for the day is taken into account to see if you made it. The way I see it, it’s weird that this within-day atomicity only applies to adding (or editing, or removing) data for the current day. It feels like a magical special-case! It’s a much simpler conceptual model for derailments to be checked once per day, at the deadline. (Or very slightly after the deadline—that complicates things a bit, mental-model-wise, but it’s an unavoidable technical limitation.)

2 Likes

Does this happen often? And do you get a warning when you try to delete a datapoint that would cause you to derail? (I don’t want to check!)

They can talk to support if it really was a mistake, so it seems not worth the tradeoffs unless there are enough people quietly quitting Beeminder when this happens.

You get a warning to this effect when deleting any datapoint, so perhaps people just stop paying attention to it, since it isn’t specific to deleting past datapoints that would put you in the red as of yesterday.

2 Likes

Then I think the real solution is to fix whatever is wrong with the warnings: either removing them from non-fatal deletions, making fatal deletions more distinct, or both.

Yeah, @shanaqui might be able to give a better sense of how much of a priority this should actually be.

Agreed. I edited my post again to be clearer here. A deadline change in the eking window is just one more theoretical/potential edge case – maybe just a sanity check. And I think you’re right that using daystamps instead of timestamps routes around all such potential confusion elegantly. (Thank you again!) Just one new question: how does the daystamp version generalize to the bright red staircase world? I know we’re not exactly racing towards that new world order but I don’t want to back us into any more corners that will make it harder.

Amen. But @zzq might be convincing me that only-the-first-check-counts makes everything more consistent. On the other hand, we’re talking about a new database field and a more complicated conditional, so the burden of proof that this is worth it is high. See “worse is better”.

Interestingly, I’m loath to do this because I’m paranoid about it sort of implicitly condoning wrong data. The general warning is fine but the specific one could read as, like, “are you sure that that wrong datapoint isn’t actually correct, wink-wink?”

Hmm, would a time-limited “undo” button be feasible and solve that problem? No up-front friction, but a chance to fix it if it was a genuine mistake.

If I’m following back correctly, this is about editing past datapoints and thus derailing, right?

Since we added more warnings, it’s become less of a problem. People do seem to generally expect though that they won’t derail immediately if they change past data, and will have until their next deadline to fix it. I think making that the norm would be very useful, if it can be swung.

4 Likes

I’ve definitely had to ping support when I derail unexpectedly while fixing old data; usually, as @dreev says, while trying to fix dates and doing it in the wrong order, but also sometimes when my autoratchet “ate” buffer that would have kept me from derailing during the period that had misplaced data (fwiw, most of my misplaced data is due to the fact that the webpage doesn’t reload automatically at newday, so sometimes I end up adding a bunch of new datapoints on yesterday and might not notice for a while).

I absolutely ignore the standard warning window because it’s the functional equivalent of the “are you sure you want to move this item to the trash?” popup, and I’ve long wished for either an undo or a data-specific warning (“THIS datapoint will derail you”). Ideally I’d like something like the graphical graph editor, where you make your changes but don’t actually apply them until you’re happy with the outcome.

4 Likes

I agree that the more complicated conditional is a real burden that must be justified. I do think a new database field should be less of a big deal than you guys seem to think—all the magic strings stuff (in-band signalling) is a torturous and roundabout way to avoid adding a few new db fields. But never mind.

Answer: it doesn’t. This is a feature about allowing you to be temporarily below the red line (each day until the deadline at end of day)—that’s what the bright red staircase seeks to remove.

More generally: daystamps are the way Beeminder refers to the discrete atomic unit of time in which data is aggregated and by the end of which one must be above the bright red line, and in the continuous world of the bright red staircase that is meaningless.

Note that this applies everywhere that Beeminder currently uses daystamps, not just here.

To illustrate: how will people enter retroactive data in the bright red staircase world? It’s a bit tricky to design an interface for that in a way that’s a Pareto improvement relative to the present. As it stands, adding a datapoint for yesterday is trivial—you choose “Yesterday” from the dropdown or enter the day number in the advanced entry.

In the bright red staircase world, however, “yesterday” isn’t one thing. Would you have to enter a time in addition to the day you want to add the datapoint for? That’s extra work for a user with day-sized steps. And you might not remember exactly what time you did the thing yesterday (and making stuff up is a slippery slope.) And you can’t have a simplified interface where you don’t enter the time if it doesn’t matter, because what if the user edits the road afterwards, in a way which might or might not cause a retroactive derailment, depending on what time of day the datapoint should be associated with?

(And I’ve posed this before, but it’s relevant here too: how does aggday work in the world of the bright red staircase? It basically can’t, because there’s no discrete day unit to aggregate over.)

It is unfortunate that beebrain is an asynchronous separate system. The ideal, I think, would be to simply reject edits that cause immediate derailment. The uncle button would be the canonical way to intentionally and immediately derail.

Second best would be a preview system for editing/deleting past data, possibly integrated with the graph editor. You make your edits, preview the effect on the graph, then commit them. An interface like that, and keeping the “red yesterday” derailment check, would likely be better than my “only the first check counts” proposal in practice.

2 Likes

We’re probably going overboard with it right now, but there are advantages. Like how in this other thread just now it was trivial to just add a new visual graph feature for marking when graphs were archived and try it out, without touching the database or any code except the graph generator itself. And it gives the user a ton of flexibility without any new UI or settings. We can experiment with zero impact on users who are blissfully unaware of the magic strings. (Unless an unsuspecting user stumbles on a magic string somehow – but it’s kind of part of the definition that the strings are chosen so that won’t happen.)

But now I’m defending it too hard. With better architecture, better UI organization, less tech debt, etc, I suspect you’d be right that more structured metadata for some of the things we’re using magic strings for (derails, restarts, and, just now, archives) would be better. Self-destructing datapoints and taring feel like they lend themselves to magic strings nicely. Anyway, we’re getting off-topic! (Totally happy to debate magic strings more in a new thread, or the comments of the recent blog posts – about magic strings for in-band signaling generally or the specific magic strings we’ve made official so far.)

Multiple very good points here about how difficult the transition to the bright red staircase will be. I do think each of the challenges you’ve identified is solvable. Quick reactions:

Retroactive data: I think a convention like noon or 12:01am or the current time but on the specified day can work. Metadata could record the fact that an exact timestamp wasn’t specified. The datapoint could even be shown with subtle horizontal error bars to indicate that all we know is that it happened in this particular 24-hour window. You’re right that the question of whether a retroactive datapoint happened before or after a critical stairstep is… critical. But maybe by choosing “yesterday” you’re implicitly vouching that it was early enough in that 24-hour window to have cleared the relevant stairstep. That’s what it means in the status quo and I think that generalizes to staircase world.
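
To make that concrete, a sketch of what such a convention might look like (the noon default, the field names, and the helper are all just illustrative, not a real design):

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional

@dataclass
class Datapoint:
    value: float
    timestamp: datetime
    time_is_exact: bool   # False => "happened sometime in this 24-hour window"

def retroactive_datapoint(value: float, day: date,
                          at: Optional[datetime] = None) -> Datapoint:
    if at is not None:
        return Datapoint(value, at, time_is_exact=True)
    # Default convention: noon on the chosen day, flagged as approximate so
    # the UI could draw horizontal error bars spanning the whole day.
    return Datapoint(value, datetime.combine(day, time(12, 0)), time_is_exact=False)
```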

(Also, I think even in staircase world, you want a notion of a daily deadline, just that it’s a more flexible concept – like you can also set hourly deadlines, weekly deadlines, whatever. But that daily deadline, if you have one, can be used for the definition of “yesterday” so that everything works out, in the common case, like the status quo.)

Aggday: I think there’s always a logical translation to staircase world. Aggday=sum is just tracking a cumulative total, which works just as well if no datapoints are on top of each other. Aggday=last, same thing. Aggday=min for weight loss… is the translation here that the red line wants to be a sawtooth shape that allows a higher weight in the morning or evening?

(Another approach is to say that aggdays apply the same as always, but specifically to points with the exact same timestamp. And then, in the weirder cases where this matters (not do-more or do-less or odometer goals, I don’t think), you can give all your datapoints that you want to be aggregated each day a timestamp of exactly noon today. Or mark subsequent datapoints as being in the same batch as some previous datapoint? I think there’s a way to have the best of all worlds. Or we bite the bullet on everything being continuous rather than discrete and accept some tradeoffs. I’m not at all sure which philosophy is best. I do like the Pareto Dominance Principle, but I also like worse-is-better.)
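
For concreteness, a sketch of that exact-same-timestamp variant (made-up names; not how Beebrain actually aggregates):

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

def aggregate_by_timestamp(points: Iterable[Tuple[datetime, float]],
                           agg: Callable[[List[float]], float]
                           ) -> List[Tuple[datetime, float]]:
    groups: Dict[datetime, List[float]] = defaultdict(list)
    for t, v in points:
        groups[t].append(v)   # only exact-timestamp collisions get lumped together
    return [(t, agg(vs)) for t, vs in sorted(groups.items())]

# e.g. aggregate_by_timestamp(points, min) for a weight goal,
#      aggregate_by_timestamp(points, sum) for a do-more goal
```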

[shakes self out of reverie] Another fun digression! But we should return to the slightly more immediate question of whether to change “red yesterday”.

I think the answer to my original question is that checking for derailment is straightforward in staircase world: When we look at the graph at any time t, is there any point earlier than t [optionally: and more recently than a week ago, or whatever statute of limitations] at which the data crossed the red line? If so, the graph is derailed. Compute the most recent intersection of user data and the bright red line and insert the post-derail respite. (See also the sidebar in the uncle button post about post-derail respite in staircase world.)
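
In sketch form (checking at a given set of sample times rather than computing true intersections, and assuming a do-more-style orientation where being below the line is bad):

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional

def last_derail_time(t: datetime,
                     data_value: Callable[[datetime], float],     # cumulative total at a time
                     redline_value: Callable[[datetime], float],  # bright red staircase at a time
                     sample_times: List[datetime],
                     statute: Optional[timedelta] = None) -> Optional[datetime]:
    """Most recent sampled time up to t at which the data was on the wrong
    side of the red line (below it, for a do-more-style goal), or None."""
    cutoff = t - statute if statute is not None else None
    bad = [s for s in sample_times
           if s <= t
           and (cutoff is None or s >= cutoff)
           and data_value(s) < redline_value(s)]
    return max(bad) if bad else None
```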

1 Like

Yeah, I don’t want to be too harsh about this. The advantages are real. And the disadvantages partially overlap with the existing disadvantages of the database you’re using (MongoDB)—either way, you wouldn’t be getting the full advantages of a well-structured and principled database schema.

In short, I fully agree with you here:

I’m not entirely convinced by your quick reactions to the challenges I’ve pointed out about the bright red staircase. I do understand they’re just quick reactions; but they’re very much ad hoc, unprincipled. It’s in response to proposals like this that the anti-magic principle is at its most necessary, I think. “Conventions” like 12:01 am scare me, especially if that’s going to be baked into code as “magic”.

A truly generalized solution where datapoints may be associated with either points in time or time ranges (representing uncertainty) could work, maybe—but that deserves more thought as to how to do so in a principled, unmagical way.

Also consider a case where a user decides that e.g. they want a daily deadline each day at 3am, except on Wednesdays, where they want hourly deadlines each hour on the hour. How does the interface work for them? Does it change on Thursdays, with “Yesterday” grayed out—or does selecting “Yesterday” work even on Thursdays, somehow? In some principled way, which works not just in this one case, but no matter what? (And what about someone who chooses a continuous road? Some descriptions of the vision of the bright red staircase imply that would eventually be a very standard use case. (Maybe even the default for new goals, eventually?))

The notion of aggday is very much dependent on the current (pre-bright-red-staircase) way of thinking about days as a bundle of datapoints to be aggregated. Probably you’d want a different abstraction here entirely. As you say, overall aggregation (not daily aggregation) works equivalently for some, such as sum and last.

The real issue is with aggdays like triangle. (Or square, or sqrt, or mean, truemean, median, mode, etc. Or min or max.) These are explicitly about lumping a discrete collection of datapoints into one. That works well in the current model, and is pretty solidly at odds with a continuous model.

Note that this is more than a bit magical, and rather at odds with the most standard use of e.g. triangle for stuff like alcohol consumption—it’s very natural to add a datapoint for each drink you imbibe as you go, which gives them different timestamps. If you have to go back and manually edit stuff to get it to line up… that’s annoying and fiddly, very much dependent on arbitrary and undiscoverable rules (“magic”), and worst of all, it’s fake data. You want a graph that also serves as a Quantified Self record of the alcohol you drank, with timestamps—and it’s very much not great if Beeminder makes you falsify the timestamps to meet its arbitrary rules about aggregation.

Maybe. I think that there’s probably a way to rethink the model of aggregation in a completely different way, from first principles, that fits better into the continuous world.

Yeah, agreed. Consider also a hypothetical “red today” world (which I’m not actually recommending you implement, but imagine for logical completeness)—a very simple derailment model, in today’s discrete world, where if you’re ever in the red, you immediately derail.

That’s almost equivalent to the existing “red yesterday”, just with a different meaning for safety buffer. If you adjusted all graphs to add one day of safety buffer and then switched to the “red today” rule, it would make no difference to most goals. (Except that if you add a negative datapoint putting you in the red and then later the same day a positive datapoint getting you out of the red (or the opposite on a do-less goal), that doesn’t derail you as is, but would under “red today”.)

Why do we have “red yesterday” and not this hypothetical “red today”? I think it’s for historical reasons, an artifact left over after the migration from the yellow brick road model. That’s fine, maybe not worth fixing, but worth thinking about if we’re trying to understand what derailment model we want. If the “red yesterday” rule really is a historical, path-dependent artifact, we probably should put less focus on preserving specifically it.

One could imagine a shift to “red today” as a first step towards the continuous model you described for the staircase world. If so, it would obviate this whole topic of discussion here.

2 Likes

Excellent points and excellent discussion. Especially gratifying how on board with the anti-magic principle you are these days (possibly we still prefer slightly different versions of it but we seem to be on the same page for present purposes).

New idea: what if flat stairsteps were the proper generalization of “day”? In the common or default case, with a daily deadline, everything would match traditional Beeminder. A new monkey wrench here is breaks. Like if you added a flat spot for the weekend, we’d be surprised if aggday (hypothetically aggstairstep) suddenly aggregated the whole weekend. And using True Breaks instead of flat spots doesn’t help…

I guess fundamentally aggday just needs a corresponding definition of “day”, and that definition is orthogonal to whatever the red line is doing. Maybe your bright red stairsteps jump up at 5pm every day but for your weight graph you want to plot the mean of all the readings you got between midnight and midnight every day. Maybe you get a warning when those are out of sync?

As for “Red Today”, one clarification: Red should continue to mean “derailing within 24 hours”. That’s just a question of color-coding the countdown timer and applies equally well in staircase world.

But what you mean is that we could do a one-time shift of everyone’s bright red lines and say that instead of derailing if you don’t get on the good side of the line by the deadline, you derail the moment you cross the red line. Just rhetorically I do like that much better.

And then you’re right that the derailment criterion would become “are you, at this exact moment, on the wrong side of the red line?”. Good catch on one potential downside of that – no longer being able to have a negative datapoint temporarily putting you in the red below the red line.

I have come to understand why you like it so much—it seems that without it you find yourself liable to go off in some pretty wild directions! All my previous pushback was under the seemingly mistaken assumption that without it, mere common sense would keep things reasonable. But it turns out that we’re starting from very different points as to what’s considered common sense; or alternatively that you simply named an aspect of common sense with that name.

This doesn’t handle continuous (that is, stairstepless) roads, and supporting them was, I think, a key part of the elegance that makes the bright red staircase a sensible model. That is, even if not everyone uses continuous roads, a stairstep is modeled as a “break” in a by-default-continuous road. Even if the road is entirely such breaks forming steps, behind the scenes it’s a continuous road with breaks. So disregarding or not supporting actual continuous roads seems not great.

(But I suppose you could say that it’s only aggday that isn’t supported for continuous roads—or on continuous segments of roads which have some stairsteps and some continuous segments. That makes aggday a less-than-fully-supported feature, but maybe that’s fine, or the least-bad option.)

If your midnight-to-midnight mean is ascribed to (for instance) the day whose deadline ends at the 5pm during that midnight-to-midnight period, then that’s very similar to having a midnight deadline. Not exactly, I guess: it means you have to get the average over the threshold by 5pm, and keep it above it as you add more datapoints during the 5pm to midnight period. That’s kind of weird, and probably not what anybody wants.

Yes, of course, you’re right. I was implicitly shifting over the colors by one when describing it that way. My bad.

Right, yeah, that’s what I meant. It certainly aligns more with the modern Beeminder way of thinking about things (as compared to the yellow-brick-road-Beeminder way of thinking about things)—you cross the line, you derail. No ambiguity, no lanes of the yellow brick road, and no wiggle room where you go a bit past the line and then back by end of day so you’re alright.

That’s probably true compared to you, and, as we talk about on the blog a lot, we had some wildly embarrassing anti-magic violations in the early days of Beeminder. But I now have a solid refutation of the otherwise plausible theory that the anti-magic principle is idiosyncratic. Namely, LLM code generators violate it egregiously and when I point them to the anti-magic post they get it and do much better. (I see it with humans all the time too. It’s just too natural to reach for an if-statement when you see wrong thing X happening in scenario Y.)

Sounds like we’re on the same page about the object-level stuff. Thanks again for the help in thinking this through.

New aggday idea: For every datapoint p = (t,v) the aggday function takes the list of datapoints starting at t minus 24 hours and culminating in p. So it’s independent of the stairsteps on the bright red line. The default aggday (aka no aggday) just returns p.
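
A rough sketch of what that could look like (illustrative only):

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

def trailing_aggday(points: List[Tuple[datetime, float]],
                    agg: Callable[[List[float]], float]
                    ) -> List[Tuple[datetime, float]]:
    points = sorted(points)
    out = []
    for t, _ in points:
        # all datapoints from the 24 hours up to and including p's timestamp
        window = [v for (s, v) in points if t - timedelta(hours=24) <= s <= t]
        out.append((t, agg(window)))
    return out

# With agg = lambda vs: vs[-1], each point maps (roughly) to its own value,
# i.e. "no aggday".
```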

I can see that working well for weight loss using aggday=min. But I’m not sure how to map a scheme like that to use cases like “give me a 1 if I have any nonzero datapoints on a given day and plot those 1’s cumulatively”.

1 Like