Scores and Additional Stats

@dreev and I had an off-forum conversation in which I mostly said false things and then came to false conclusions from the starting point of those false things, but in all of that, I think there are a couple of points that I’m still inclined towards.

The first is that it strikes me as weird that the following two people have the same score:

Person A
She does 10 things on time, as per her commitment. She also committed to do one other thing that she is now 6 months late on. It wasn’t a wise thing to commit to as it was partly out of her control. Since she’s 6 months late, though, that one only gets 11.9% credit. Her score is 91.991%.

Person B
Has literally never effing done a thing she has said she would do on time in her life. And she doesn’t ever really intend to when she makes the “commitment”. She gets around to things when she gets around to them. The last 11 things she did (the things tracked so far) were each just under 5 days late. She also has a 91.991% score.
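For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the overall score is just the plain average of per-promise credit (which matches the numbers above), and that each of Person B's 11 promises earned the same 0.91991 credit. The function name `overall_score` is made up for illustration.

```python
def overall_score(credits):
    """Average the per-promise credit (assumed scoring rule)."""
    return sum(credits) / len(credits)

# Person A: ten on-time promises (full credit) plus one at 11.9% credit.
person_a = [1.0] * 10 + [0.119]

# Person B: eleven promises, each assumed to earn 0.91991 credit
# for being just under 5 days late.
person_b = [0.91991] * 11

print(f"A: {overall_score(person_a):.3%}")
print(f"B: {overall_score(person_b):.3%}")
```

Under those assumptions both averages come out to the same 91.991%, even though the two lists of credits look nothing alike.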

Something about these two people having the same score doesn’t seem quite right. Though, it seems incredibly difficult to weigh timeliness against percentage of tasks completed in an overall score. (And, if you polled every human being on the planet, I suspect no two would agree on precisely how to weigh being on time against just getting the thing done in a timely manner even if late, so the numbers are bound to need to be arbitrary no matter what.)

There was talk about eventually having other stats that people could check to see what’s generating the score, which might make that irrelevant.

Anyway, I like the idea of extra stats that you can click on to get an idea of your follow-through profile, and here are some that I’m throwing out there:

an “on time” score, a “better late than never ¯\_(ツ)_/¯” score, and a “flake” score where…

  • the “on time” score is the percentage of tasks that are past the deadline that were completed at or before the deadline,
  • the “better late than never ¯\_(ツ)_/¯” score is a score that evaluates how late you tend to be when you’re late and complete the task, and
  • the “flake” score is the percentage of tasks that haven’t been completed but are past due.

Then, on top of the 91.991% score each of the above people would have, they would also have the following set of scores:

A: On time: 90.9% ---- ¯\_(ツ)_/¯: 0.0% -------- Flake: 9.1%
B: On time: 0.0% ----- ¯\_(ツ)_/¯: 91.991% ---- Flake: 0%
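Here is a rough sketch of how those three stats could be computed. Everything in it is an assumption for illustration: the `Promise` record, the `profile` function, the choice to use the average credit of late-but-completed promises as the ¯\_(ツ)_/¯ score, and reporting 0.0 when there are no such promises (to match the table above). It also assumes every promise in the list is already past its deadline.

```python
from dataclasses import dataclass

@dataclass
class Promise:
    completed: bool   # has it been done at all?
    late: bool        # was it (or is it now) past the deadline?
    credit: float     # credit assigned by the late-penalty function

def profile(promises):
    """Return (on_time, better_late, flake) as fractions in [0, 1]."""
    n = len(promises)
    on_time = sum(p.completed and not p.late for p in promises) / n
    # Average credit over promises that were late but did get done.
    late_done = [p.credit for p in promises if p.completed and p.late]
    better_late = sum(late_done) / len(late_done) if late_done else 0.0
    # Overdue promises that still have not been completed.
    flake = sum(not p.completed for p in promises) / n
    return on_time, better_late, flake

person_a = [Promise(True, False, 1.0)] * 10 + [Promise(False, True, 0.119)]
person_b = [Promise(True, True, 0.91991)] * 11

for name, person in (("A", person_a), ("B", person_b)):
    on_time, better_late, flake = profile(person)
    print(f"{name}: On time: {on_time:.1%}  Late-but-done: {better_late:.3%}  Flake: {flake:.1%}")
```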

From that, we can tell that the first person almost always gets things done on time, but hasn’t tended to complete things once they get late. (At least they haven’t yet… but they could drop that score by finishing those things they haven’t done yet.) We can also tell that the second person will totally not be on time with what they say they’ll do, but they’re very likely to get it done fairly soon after the deadline and not to simply fail to do it.

I think those’d be cool stats to have!


Thing #2

It strikes me as not entirely accurate to call the main score a reliability score. Reliability when it comes to the truth of the claims (something you mention as your key motivation) is pretty easy to calculate. It’s the % of times they did what they said when they said they’d do it. What your score does seem to deliver nicely is a kind of follow-through score. To what degree are you likely to follow through on your “I will” statements?

(And you can still have a reliability score; that’s technically what the “on time” score I proposed is. That gives you both the technical reliability score in the background stats and the neat info about your degree of follow-through up front.)


I have lots of responses! First, a big part of this is having a single focal metric. There can be other stats to view for a person, and whole lateness histograms, etc, to find out what you really want to know about reliability, but only one focal score. So, yes, there are tricky tradeoffs to make in distilling everything into one score…

I don’t think there’s any way around that problem, though perhaps the shape of the current late penalty function feels wrong to you? Here’s the basis of that function:

  • by the deadline = 100% success
  • seconds late = close enough, 99.999% credit
  • minutes late = .999 (baaasically counts)
  • hours late = .99 (no big deal, almost fully counts)
  • days late = .9 (hey, it got done, that was the main thing)
  • weeks late = .5 (kind of half defeats the point to be this late)
  • months late = .1 (mostly doesn’t count)
  • years late = .01 (better late than never, barely)
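To make the shape of that list concrete, here is a minimal sketch that treats each bullet as an anchor point and interpolates linearly in log(seconds late) between them. This is not the actual formula; the interpolation scheme, the exact anchor durations (1 second, 60 seconds, a 30-day month, a 365-day year), the clamping at both ends, and the name `credit` are all assumptions, so it won't necessarily reproduce specific figures quoted elsewhere in this thread.

```python
import math
from bisect import bisect_right

# (seconds late, credit) anchors read off the list above (durations assumed).
ANCHORS = [
    (1, 0.99999),          # seconds late
    (60, 0.999),           # minutes late
    (3600, 0.99),          # hours late
    (86400, 0.9),          # days late
    (7 * 86400, 0.5),      # weeks late
    (30 * 86400, 0.1),     # months late
    (365 * 86400, 0.01),   # years late
]
LOG_SECONDS = [math.log(s) for s, _ in ANCHORS]
CREDITS = [c for _, c in ANCHORS]

def credit(seconds_late):
    """Credit in (0, 1] for a promise finished `seconds_late` after its deadline."""
    if seconds_late <= 0:
        return 1.0                      # by the deadline: full credit
    x = math.log(seconds_late)
    if x <= LOG_SECONDS[0]:
        return CREDITS[0]               # under a second late: clamp
    if x >= LOG_SECONDS[-1]:
        return CREDITS[-1]              # over a year late: clamp
    i = bisect_right(LOG_SECONDS, x)    # first anchor strictly beyond x
    x0, x1 = LOG_SECONDS[i - 1], LOG_SECONDS[i]
    y0, y1 = CREDITS[i - 1], CREDITS[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

for label, secs in [("on time", 0), ("1 hour", 3600), ("3 days", 3 * 86400), ("2 weeks", 14 * 86400)]:
    print(f"{label:>8}: {credit(secs):.4f}")
```

Interpolating on a log scale is just one way to get a smooth curve through anchors that are spaced by orders of magnitude; any monotone curve through the same points would fit the list equally well.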

Of course how much sense that function makes depends a lot on the commitment. Sometimes being hours late means you might as well not do it at all, and other times the deadline is super arbitrary and you can be weeks late and still have upheld the spirit of the commitment.

Again, it really depends. If I say I’ll bring you a banana split with a cherry on top and I bring you a banana split without a cherry on top, is it reasonable to say I partially did what I said I’d do? Maybe it depends on if the cherry was significant or just a flourish. Or say I promise you a fancy red sports car and then show up with a blue one. Was the redness just because that’s the canonical color for sports cars? A lot of my commitments’ deadlines feel that way. I say I’ll do something and I go with the default due date of 5pm the next business day but no one cares that it’s then vs the end of the week. (Like, oh, me telling you I’d repeat my emailed arguments as a reply in this forum thread, which I’m about to get 99.1% credit for.)

This is indeed all driven by an obsession with Truth. Quoting the preamble of the spec:

It was really bugging me that my future-tense statements could sometimes be falsehoods. So I started building this system so that instead of making a potentially false statement about what I’ll do I can instead, impeccably truthfully and more informatively, always give an exact probability that I’ll do [where “do” is defined by the above function, eg, doing it weeks late is defined as half doing it, etc] the thing.

(tangent: i’ve included in my collection of Truth Vignettes:

You’re also right to point out how this is falling short of the Platonic ideal. For example, I agree that I’m wrong to still phrase something as “I’ll do X by Saturday” when I mean “the probability distribution on how much after Saturday I’ll do X is summarized by the following number computed in the following way”. I’m not sure how to fix that though, since it’s the natural utterance of “I will” that triggers my personal rule to log it as a commitment. So I’m counting on the system to make clear to everyone what I actually mean.

If we’re going to get philosophical about it (which, of course, we are) then no proper contract can ever just state a naked commitment and leave it at that. It always has to clarify the or-else. In this case the or-else is your score falling according to a given formula. Which matters because the higher your score, the more credibility you have when stating intentions. That score really is full of meaning despite the ambiguities you point out, and the incentive to keep it high is real.

PS: More brainstorming of metrics:

  • Number and fraction of promises completed on time
  • Number and fraction of promises completed up to 1 minute late
  • Number and fraction of promises completed up to 1 hour late
  • Number and fraction of promises completed up to 1 day late
  • Number and fraction of promises completed up to 1 week late
  • Number and fraction of promises completed up to 1 month late
  • Number and fraction of promises completed up to 1 year late
  • Pessimistic score, if all overdue promises never get done
  • Number and fraction of promises voided

And all of these could have graphs to show how they’ve changed over time.
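As a rough sketch of the "number and fraction completed up to X late" metrics: the code below assumes each bucket is cumulative (so "up to 1 day late" also counts everything done on time), that a month is 30 days and a year 365, and that fractions are taken over all tracked promises, including unfinished ones. The input format and the name `lateness_table` are made up for illustration.

```python
# Upper bound of each bucket, in seconds late (durations are assumptions).
THRESHOLDS = {
    "on time": 0,
    "1 minute": 60,
    "1 hour": 3600,
    "1 day": 86400,
    "1 week": 7 * 86400,
    "1 month": 30 * 86400,
    "1 year": 365 * 86400,
}

def lateness_table(seconds_late):
    """Map each bucket to (count, fraction) of promises finished within it.

    `seconds_late` holds one entry per promise: seconds past the deadline
    at completion (0 or less means on time), or None if not yet done.
    """
    total = len(seconds_late)
    table = {}
    for label, limit in THRESHOLDS.items():
        count = sum(1 for s in seconds_late if s is not None and s <= limit)
        table[label] = (count, count / total)
    return table

# Example: two on time, one 30 seconds late, one 2 hours late, one unfinished.
example = [0, 0, 30, 7200, None]
for label, (count, fraction) in lateness_table(example).items():
    print(f"{label:>8}: {count} ({fraction:.0%})")
```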