FAQ (fairness, to-do lists, ...)


A couple questions that have come up chatting with some of you…

(I figure we can collect questions here and turn this into an FAQ. So I’m wikifying this post. If you have an edit to make but it doesn’t let you, let me know. I’m interested to learn how the permissions work for that kind of thing!)

1. Is it fair that my reliability seems lower only because I commit to harder things? Or: Is it fair to inflate your supposed reliability by only ever committing to easy things?

The rule I use is that if I utter “I will” or “I’ll” or “I’m going to” then I must log a commitment. It doesn’t matter how trivial. The only thing the reliability score is meant to measure is “of the times you say you’ll do something, what fraction of the time do you do it?”. If you make sure to only say you’ll do easy things and get a 100%, that’s perfectly allowed. If you want to get yourself to do more hard things, you probably want to use Beeminder. Commits.to is just about ensuring that your words match your actions. If you do that purely by adjusting your words, that’s still a win. But you may find that commits.to is a powerful commitment device too, once you have a reliability score you’re invested in.
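That “fraction of the time” definition is simple enough to sketch. Here’s a minimal illustration of the score as described above; the data structure is invented for the example and is not commits.to’s actual schema:

```python
# Minimal sketch of the reliability score: "of the times you say you'll
# do something, what fraction of the time do you do it?"
# The commitment records here are hypothetical, purely for illustration.

def reliability(commitments):
    """Fraction of fulfilled commitments, counting every 'I will' equally."""
    if not commitments:
        return None  # no commitments yet, so no score
    kept = sum(1 for c in commitments if c["fulfilled"])
    return kept / len(commitments)

alice = [
    {"promise": "do_the_thing", "fulfilled": True},
    {"promise": "send_the_report", "fulfilled": True},
    {"promise": "fast_for_36_hours", "fulfilled": False},
]
print(reliability(alice))  # 2 of 3 kept -> 0.666...
```

Note that difficulty doesn’t appear anywhere: an easy promise and a hard one move the score by exactly the same amount, which is what makes the “only commit to easy things” strategy allowed.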

Maybe it still feels like apples and oranges if you commit to very different things than someone else. But the important comparison isn’t between different people. It’s from the perspective of the person you’re committing to right now, to compare with your commitments in the past. When Alice tells Bob, “I will do the thing”, Bob can see, empirically, how much he can count on Alice for the thing.

2. Can I create commitments to myself? Maybe even make this my to-do list?

We’re not the boss of you, but it’s a bad idea. It gives you too much flexibility to shape your reliability score beyond its meaning for the people you’re making “I will” statements to. When Bob sees a commitment URL from Alice, her reliability should tell him “of all commitments like this that she’s made to people, how often does she follow through?” If it’s combined with self-commitments, it’s not really telling him that.

It’s not just about goosing your numbers with easy commitments. Probably you’re more likely to flake out on commitments made only to yourself. So by including those you underestimate your reliability when committing to others.

So, no, commits.to is not a to-do list. It’s just for tracking explicit commitments you’ve made.


> It’s not just about goosing your numbers with easy commitments. Probably you’re more likely to flake out on commitments made only to yourself. So by including those you underestimate your reliability when committing to others.

This assumption surprises me. There’s presumably a subset of people who value their commitments to themselves more than their commitments to others (not judging one as better than the other) and I don’t see why commits.to shouldn’t cater to that group as well. In addition, if you’re really right that people routinely flake out on commitments to themselves much more often than commitments to each other, that seems like a huge problem that I’d think you’d want to alleviate with commits.to. Imagine a person who never misses an event they agreed to go to but fails on even the simplest commitments to themself, like “I’m going to get to work on time today”.

I think the better argument against self-commitments is that they’re already taken care of by normal Beeminder, but my counter-argument is that Beeminder only accounts for repeating or easily measurable commitments. Sometimes I want to force myself to do something once just to see if I can, and commits.to seems perfect for this. For example, the other day, I wanted to see if I could fast for 36 hours and I committed to it. I actually failed, but I’m glad that commits.to tracked my failure, and I don’t see how I could’ve used regular Beeminder to track this, as it’s not something I expect to do on a recurring basis (at this time).


Indeed this is the answer I would give to the question…

But I’m not 100% sold.

The open question is how Bob’s new commitment to Alice compares to Bob’s previous commitments. If Bob only makes commitments of a particular difficulty, then Alice can correctly deduce Bob’s reliability given his history on commits.to. But if this specific commitment is wildly more or less difficult than Bob’s typical commitment, Alice will probably over- or underestimate Bob’s reliability, respectively.

So I don’t think it is about “inflating” your reliability relative to other people; it’s about inflating your reliability when you go on to make an unusually difficult commitment.

But I think the problem is actually even more complex. I think we are implicitly assuming that reliability decreases with task difficulty, but I’m not sure I believe that. Personally I am often less reliable on more trivial tasks because meeting deadlines for major commitments is more important.

So I think what Alice really needs to know is Bob’s reliability distribution for tasks of this type, difficulty, flavor, etc.

The question about commitments to yourself vs. others is just another dimension in this N-dimensional reliability space that Alice wants to observe.
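To make the reference-class idea concrete, here’s a hypothetical sketch of slicing one reliability score into per-class scores. The tags (“to”, “difficulty”) and the records are invented for illustration, not features commits.to actually has:

```python
# Hypothetical sketch: reliability sliced by reference class, so Alice
# can look at "commitments like this one" rather than one global number.
from collections import defaultdict

def reliability_by(commitments, key):
    """Per-class fraction of kept commitments, grouped by the given attribute."""
    groups = defaultdict(list)
    for c in commitments:
        groups[c[key]].append(c["fulfilled"])
    return {k: sum(v) / len(v) for k, v in groups.items()}

bob = [
    {"to": "other", "difficulty": "hard", "fulfilled": True},
    {"to": "other", "difficulty": "easy", "fulfilled": False},
    {"to": "self",  "difficulty": "easy", "fulfilled": False},
    {"to": "self",  "difficulty": "easy", "fulfilled": True},
]
print(reliability_by(bob, "to"))  # {'other': 0.5, 'self': 0.5}
```

The catch, as discussed below, is that every extra slicing dimension shrinks the sample size per class, which is exactly where overfitting and data torture creep in.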


I think @drtall is right in theory. Ideally you’d have self-commitments in their own category, and anyone interested in your reliability would query it relative to the appropriate reference class. Like “non-self-commitments, of similar difficulty, made in a work context”.

But in practice, getting clever with reference classes leads to overfitting and self-delusion. It’s a slippery slope to adding things like “commitments due in the springtime” where you’re really just torturing the data.

Also it doesn’t feel realistic to ontologize one’s commitments like that. (Note to self about my blog post draft on Anti-ontologyism…)

Anyway, @krazemon, you said that if you’re flakier about self-commitments, that’s a big problem that you’d want to help solve. I’m not sure how I feel about that. But I guess my argument generalizes to this: maybe you’re the type to more readily flake on yourself, or maybe you’re the other way around, but it would be a big coincidence if your reliabilities for self- and other-commitments were commensurate. They’re two different beasts – personal resolve vs. reliability, or something like that. Which means adding self-commitments skews your reliability one way or the other.

But I’m not super certain I’m right about this. Maybe there are other ways to frame your commitments, like predictions about work you will complete in the future, that let them span self- and other-commitments while being a meaningful measure of a coherent class of things.

It’s just that for me personally, that class of things is defined as times I literally uttered “I will” to someone. That was the original problem I wanted to solve: making my future-tense statements to others be true, or at least true in the sense of conveying the true amount of uncertainty. Saying “I will” paired with a link now achieves that!

Maybe it’s an important exercise to articulate your own class of things that your commitments represent?


I agree with @drtall and am not entirely convinced by your counter-argument that creating reference classes would lead to data torturing any more than Beeminder in general enables data torturing. If you’re the type of person who would create a “commitments due in the springtime” reference class, you’re probably also the type of person who will tweak their Beeminder measures in undesirable ways. I don’t have data for this obviously, but the two seem intuitively similar.

That said, you make a good point about commits.to being about the space of defined commitments to others, not just general predictions about future one-time actions, and I understand that.

Up to now, I viewed commits.to like a personal prediction book where any future action qualified as commit-able. For me, that included commitments such as:

  • I will not eat cake at that birthday party.
  • I will stay at that event for at least 1 hour.

I suspect this reflects different struggles and focuses. I tend to struggle more with commitments I make to myself than commitments to other people (I think; I haven’t quizzed my friends to get a more unbiased perspective). I’m usually early for events and hate canceling on plans, so I gravitated towards using commits.to to deal with commitments to myself, which I perceive myself as breaking more often. I now see that there’s space for both what I described and what you described, but that commits.to optimizes for the latter.


Oh yes this is exactly what I was trying to say, thanks for the link!

It sounds like you’re envisioning a scheme where you manually create categories and hope that you stumble upon the right ones? I think I agree this will be wasted effort, because if I knew a priori which categories of commitment I’m flaky on, then I’d be way ahead of the game. And if I don’t, then the categories I make up will probably not be useful classes.

Instead, what if you used commits.to religiously for, say, 6 months, and then went back to try to identify the common pattern in your failed commitments? Would there be enough data and recollection to identify the patterns? Or would it all just boil down to ad hoc excuses and not reveal any general pattern? Could you identify some trial categories to use for the next 6 months and eventually converge on useful categories?

To be clear, I’m not trying to make any argument against commits.to’s mission or its efficacy. I’m just enjoying an old-fashioned information theory debate :slight_smile:


Noticed today while navigating to commits.to that it’s awfully close to commits.top, which logs the most active GitHub users. Presumably if someone is following the URL structure correctly they won’t end up there, but it might be something to look out for.