20th anniversary of The Joel Test for software teams

Guess what this weekend is?? Oh right, I pre-spoiled it in the title there. Well, Joel Spolsky is pretty brilliant and hilarious and had a huge influence on Programmers Of A Certain Age. He’s most of the reason Trello, Stack Overflow, and Glitch all exist (he’s not involved in any of them anymore and in my opinion they’ve all stagnated without him, sadly).

Anyway, I thought it would be fun to revisit The Joel Test 20 years on and see how Beeminder fares, and how the test itself fares. What would a modern version of it look like?

Without further ado…

1. Do you use source control?

Yes, GitHub. It sure is nice living in the future, where it’s hard to even imagine saying no to this question!

Funny story about "backing up your website"

In 2018 we were setting up an ad campaign through a program with Google and they wanted to “make sure our website was backed up” and I was… I guess horrified that that’s a question they even have to ask people? Even for normal people in 2018 it seemed like this would be a total non-question. Keep everything in the cloud and you don’t even need to think about the concept of backups. It’s magic! Unless climate change or World War 3 manage to bring down the whole internet, I guess.

(I mean, with version control like we have with GitHub, it’s not just the website that’s automatically backed up but every version of it that’s ever existed. Programmers are really serious about making sure they never lose any work due to computers, or themselves, flaking out. This is something normal people really need more of. Like it’s not enough to have a document backed up. If you mess it all up somehow you need to be able to go back to any point in time to see how it used to look.)

Anyway, this Google Ads person wanted to make sure our website was backed up. Sometimes we get imposter syndrome and are like “how are we even running a company” or “how does the internet actually work omg i feel so dumb” but apparently there are businesses out there where the only place the code exists is what’s running in production and that makes me feel much better! That was a very long way to say that we definitely have the website backed up oh my goodness can you imagine.

But this reminds me of how my grandma had her entire dissertation destroyed by a bomb in World War 2. True story!

Fortunately Beeminder is on multiple people’s laptops all over the world, plus all the ways GitHub itself backs up repositories. So losing Beeminder (the code/website) would take a pretty apocalyptic scenario. I’m going to stop tempting fate now!

I think the modern version of the Joel Test doesn’t bother with this question. But in the meantime, we’ll take our free point here.

verdict for Beeminder: 1 point

2. Can you make a build in one step?

By “a build” he’s talking about burning CD-ROMs or whatever but the version of the question for websites is, can you deploy in one step? Yes we can!

Candidate modern version of this: “Can you run a single command to deploy a completely fresh server from scratch?” Answer: No, but we’re working on it!

Or maybe: “Do you have continuous integration and frictionless automated tests?” Answer: Also working on it? We’re maybe above the 50th percentile on this. I’m very proud of what we have for Beebrain (the graphing and road editor – all open source, btw!) with an automated huge test suite that shows us pixel-by-pixel differences of real-world graphs any time we change anything.

verdict for Beeminder: 0.6 points

3. Do you make daily builds?

I guess every time we deploy our daily User-Visible Improvement that’s a build, so, yes.

verdict for Beeminder: 1 point

4. Do you have a bug database?

Yup, thanks again to GitHub. But for full points we need every bug in the database to be a Proper Bug Report. We’re getting there!

verdict for Beeminder: 0.9 points

5. Do you fix bugs before writing new code?

This sounded just impossibly fastidious, even out-of-touch, but I think I’ve figured out what Joel means. In the bad old days before it was easy to make branches in version control and continuous deployment and all that, software teams would commonly accumulate bugs that absolutely couldn’t ship.

We call an issue a Mendoza if you can’t deploy until it’s resolved. Nowadays with feature branches and pull requests and whatnot I think this is a non-issue.

But I think what Joel is saying is that if you suddenly need to deploy but you have a backlog of weeks-old bugs that have to be resolved before you do, you are going to have a bad time.

So I think just the fact that we deploy a new version of Beeminder once a day (on average) means that we get full marks on this.

Another interpretation of this could be “don’t tolerate regressions”. Or our Pareto Dominance Principle. If something you previously fixed manages to unfix itself, drop everything and figure out why. Our UVI commitment I think puts appropriate pressure on us here. If we announced a User-Visible Improvement we feel like cheating cheaters if it becomes false.

We now also beemind regressions (or zombies, as we call them).

verdict for Beeminder: 1 point

6. Do you have an up-to-date schedule?

Um… OK, this might seriously be crucial. I intend to reread Joel’s thoughts on scope creep and evidence-based scheduling.

We do not currently get any points here and I’m pretty embarrassed about that.

verdict for Beeminder: 0 points

7. Do you have a spec?

More and more often we do, and we’re getting better at it.

verdict for Beeminder: 0.9 points

8. Do programmers have quiet working conditions?

Well, we’re all remote and I think we all have pretty fancy headphones at least.

verdict for Beeminder: 0.5 points

9. Do you use the best tools money can buy?

Good question. At least one of us (that’s me) has gone crazy with monitors (two 40-inchers).

verdict for Beeminder: 0.5 points

10. Do you have testers?

Uh oh. We do have @mary who is uncannily good at testing, though it’s not her job.

verdict for Beeminder: 0 points

11. Do new candidates write code during their interview?

I’d say we pass this one, in spirit anyway.

verdict for Beeminder: 1 point

12. Do you do hallway usability testing?

Not enough but this is super important!

verdict for Beeminder: 0.8 points


Total score for Beeminder: 8.2/12 = 68.3%

Eek, that’s like a D+? We either need to up our game or declare this test a dusty dinosaur! :lemon: :grapes:


Oh my goodness I love evidence-based scheduling. While I was briefly entirely self-employed I wrote Python scripts to track my time on a per-task basis and then used this historical data to run Monte Carlo simulations that turned my naive estimates for future contracts into a distribution of likely outcomes. I’d then price the contract based on the discounted aggregation of the distribution of scenarios.

Probably spent way too much time on it, but it worked really well, and I loved being able to tell my clients that I would complete the contract by such-and-such a date with an 80% probability. I couldn’t imagine making timelines for clients any other way now. It just seems like anything else would be throwing darts blindfolded.
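The scripts themselves aren't shown in this thread, so here is only a minimal sketch of the general idea, under some assumptions: history is kept as (estimated hours, actual hours) pairs, each simulation round scales every new estimate by a randomly drawn historical actual/estimate ratio, and the "80% probability" figure is read off as the 80th percentile of the simulated totals. The function names and the sample data are hypothetical.

```python
import random

def simulate_totals(history, estimates, rounds=10_000, seed=0):
    """Return sorted simulated total hours for a list of task estimates.

    history: (estimated_hours, actual_hours) pairs from past tasks.
    estimates: naive estimates (hours) for the tasks in the new contract.
    """
    rng = random.Random(seed)
    # How far off each past estimate was: actual / estimated.
    multipliers = [actual / est for est, actual in history if est > 0]
    totals = []
    for _ in range(rounds):
        # Scale every new estimate by a randomly drawn historical multiplier.
        totals.append(sum(e * rng.choice(multipliers) for e in estimates))
    totals.sort()
    return totals

def percentile(sorted_values, p):
    """Value below which roughly a fraction p of the simulated totals fall."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]

# Made-up history and a made-up three-task contract, for illustration only.
history = [(2, 3), (4, 4), (1, 2.5), (8, 10), (3, 2.5)]
totals = simulate_totals(history, estimates=[5, 2, 8])
print(f"80% chance of finishing within {percentile(totals, 0.8):.1f} hours")
```

Drawing from past ratios rather than assuming a fixed fudge factor is what lets the output be a distribution instead of a single date.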

If TaskRatchet started paying my bills and I had time to do whatever I wanted, I’d really like to build an affordable system for contractors to use this technique, since FogBugz is so stinking expensive.


:heart_eyes: So cool!

Noooo, wait, I think this is like the number 1 startup failure mode. “Such-and-such is great but too expensive; I shall build something more affordable to compete with them.” Much better to be more expensive and better than the competition. I think there’s a classic article making this argument but I can’t think of what it is offhand.


PS: There’s a tiny bit of additional discussion of this on Hacker News.


You’re being quite hard on yourself with some of those ratings. Really, only 0.5 points for #8? If anything, being remote should give you more than one point for #8, given that the quiet working conditions that I’m sure Joel was imagining as a full point were certainly nowhere near so good as that!

For #9: what would it take, then, to give yourself a full point (or 0.9 like you did for 4 and 7)? If you feel free to do things like your example of going crazy with monitors, what more would you want before giving yourself more than that half-point? I don’t know, maybe you’re really stingy with anything other than monitors, and that’s why it’s only 0.5? But if the monitors are typical, that is if you felt the desire to go crazy for something else just like you did for monitors then you would, well then, if that’s not 1 point (or almost), what is?

I’m also not too keen on deducting fractional points on things like #4. The question is if you have a bug database at all, not if you have a perfect bug database. Holding yourself to a high standard in general is good, of course, but it’s not the point of The Joel Test. All the questions are intentionally simple pass-fail questions, and that gives the test results a certain unambiguity.

But eh, it’s all in good fun, so never mind. I do think though that nowadays the Joel test (at least as originally constituted) is a lot less useful than it once was: as you point out, questions 2 and 5 are simply made for a completely different era of software writing, and to be honest, so is #10. A modern version of #10 might ask about automated testing—but actual testers whose job is to test things manually? If it comes at the expense of automated testing, that probably should cost a point, not give one. And for #1, you’re quite right in saying that a modern version of the test wouldn’t bother with it at all.

So yeah, it’s a dusty dinosaur. The spirit of the test is still good—but even so, I think the unambiguous boolean nature of the questions was an integral part of that original spirit, so I’m really not all that sure about the fractional points thing, besides anything else.


Ha, thanks! And you’re making a great case for bumping up some of our points…

Well, @bee and I have kids. :slight_smile:

I guess this one is pretty ambiguous. I personally have gone crazy with monitors but that’s not something Beeminder treats as a standard perk. Also my laptop is 5 years old.

Ok, fair. Sometimes I’ve felt pretty lousy about how our bug database is where bug reports go to die. But now we’re beeminding it nicely and converging on some nice protocols and conventions and I feel pretty good about our bug database.


With the work experience I’ve had, I certainly wouldn’t consider question 1 a free point. I know of a project that has hundreds, if not thousands, of hours’ worth of SQL, all on a client-controlled server, with no version control and no backups. It’s terrifying to work on.


You are really optimistic. I have seen people not using any source control, using some weird proprietary stuff instead of git, not tracking database structure, not including critical parts of the asset generation code in repositories…


And I built something very similar inside Calc

Now I feel less guilty about spending a noticeable chunk of time playing with graphs :slight_smile:


Doesn’t fogbugz / manuscript still have a free account for groups of 2 users or less? I used it for a while and don’t remember receiving any messages about a grandfathering or anything like that, but I still get this for $0.00/mo and my account remains active…

I thought it was a little bit expensive too, until I realized how revolutionary this evidence-based scheduling could be. If you are running a business and you need more than just a couple of users, $900/yr for 5 or less users is really not that bad. (I couldn’t get my boss to spring for it, but that’s $15/mo per user, without any of the upfront discounts for paying in advance. It was hard to compete with “Pivotal Tracker for .edu is meanwhile completely free.”)

I don’t know any other tools that do this out of the box, but the python scripts and TagTime / monte carlo thing is certainly something you could do on your own, (it isn’t rocket surgery!)


Hmm, not sure. Their pricing page doesn’t list it if they do.

Fair points on the pricing not being as bad as I made it out to be. I think my reaction was due to being a single independent contractor, not a team of five.


You’re right, I have a subscription for “up to 2 users” but I don’t see the option to select it anymore. I just created a new 14 day free trial fogbugz instance to test the free account behavior as a new subscriber and see if it’s still available or if I really got something unusual and special here.

So, I can tell you in 14 days after the trial period expires if it degrades gracefully down into the subscription mode I have for my instance, or if you just can’t do this as a new user anymore! If they do still allow this, they sure don’t advertise it.

($75/mo for a single user is a lot of money indeed.)

edit: commitment device: I will report back in two weeks


Wait, is that emacs calc? Is it published?

No - LibreOffice Calc. It is not published but if someone really wants it I may describe/publish it.


You’re doing EBS in a spreadsheet? I would love to see that.


14 days later: Indeed you cannot get this deal anymore.

My new fogbugz trial ended after 14 days, and the only options available are to enter a credit card and pay $75/mo. On the other hand I have several instances from when Manuscript was new. One of them is able to remain in 2-user mode without any payment, seemingly indefinitely. The other one was decidedly out of use and has probably been deleted (it is throwing 500 errors and I can’t log in anymore).

I suspect the difference is that I did pay some money to Fogbugz on the one that is still active, for one or two months at least. (It was then that my boss said “no way, you have to go through procurement, we’ll reimburse what you spent so far, but shut it down” and we lost some really great progress at communicating better that our team was making, thanks to the EBS features of fogbugz.)


First, a description (I would need to sanitize my file and I am not sure whether it is understandable without the explanation anyway):

part 1: recording

  • expected time
  • real time
  • multiplier (real/expected)

part 2: analyzing

  • % of tasks where multiplier is below <0.1 (some COUNTIF function)
  • % of tasks where multiplier is below <0.2
  • % of tasks where multiplier is below <0.3
  • % of tasks where multiplier is below <0.4
  • % of tasks where multiplier is below <1
  • % of tasks where multiplier is below <1.1
  • % of tasks where multiplier is below <1.2
  • % of tasks where multiplier is below <3.1
  • % of tasks where multiplier is below <3.2
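For anyone who prefers code to spreadsheets, parts 1 and 2 could be sketched roughly like this in Python. This is not the actual Calc file, just an illustration: the records are made up, and `fraction_below` is a hypothetical helper standing in for a COUNTIF divided by the total task count.

```python
def fraction_below(multipliers, threshold):
    """Share of tasks whose real/expected multiplier is strictly below threshold."""
    return sum(m < threshold for m in multipliers) / len(multipliers)

# Part 1: hypothetical (expected_hours, real_hours) records.
records = [(1.0, 0.5), (2.0, 2.0), (1.0, 3.0), (4.0, 5.0), (0.5, 1.5)]
multipliers = [real / expected for expected, real in records]

# Part 2: one row per threshold, like the list above.
for threshold in (0.4, 1, 1.1, 1.2, 3.1):
    share = fraction_below(multipliers, threshold)
    print(f"multiplier < {threshold}: {share:.0%}")
```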

part 3: graph

For a task where I have an expected time and no real time (yet), I can try calculating the probability of completing it within a given time by combining its expected time with the historic multipliers.

Then I can graph this probability distribution.

(if someone is interested in also viewing the LibreOffice file, not just this explanation, please reply)


I’m most interested in how you’re doing this. Can you elaborate?

I have a field with the current task time (taken from a single task that has no real time filled in): =IF(MAX(H1:H104)=0,"",MAX(H1:H104))

I have a table with columns:

time (first row set to 0, second to 0.1, all subsequent rows set to the previous row + step: =J61+$J$2)

helper column: “datapoint time divided by expected time” ( =K62 / ( <location of field with current task time> ) )

Why? Because this way I scale the task-specific time into something where “% of tasks where multiplier is below <0.2” from part 2 becomes relevant

Now comes the part where the magic happens. I count how many datapoints were completed within a given multiplier of the estimated time (=COUNTIF($G$2:$G$10000,"<"&( $L52 )))

For example =COUNTIF($G$2:$G$10000,"<3.1") would count the tasks behind “% of tasks where multiplier is below <3.1”.

Note that "<"&( $L52 ) glues text “<” and value in L52

The total datapoint count is less magical, with the =COUNT($G$2:$G$10000) formula.

Probability itself is something like =N52/O52

I also have detection of the expected time (=IF(AND(L61<1,L62>=1),1,"") but that is just part of a hack to display the expected time on the probability graph).

This allows you to produce something like a graph with an orange line showing the probability distribution given past estimates and real times, and a blue line marking the predicted time. The X scale is in hours and the Y scale is probability.
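As a rough cross-check of the spreadsheet logic, the same curve can be sketched in Python. This is an illustration under assumptions, not a port of the real file: I'm assuming the counted range (column G above) holds the past real/expected multipliers, and `completion_curve`, its parameters, and the sample history are all made up for the example.

```python
def completion_curve(multipliers, expected_hours, step=0.1, max_hours=10.0):
    """Return (hours, probability) points for finishing a task within `hours`."""
    points = []
    n_steps = int(round(max_hours / step))
    for i in range(n_steps + 1):
        hours = i * step                             # the "time" column
        scaled = hours / expected_hours              # the helper column
        done = sum(m < scaled for m in multipliers)  # COUNTIF(range, "<" & scaled)
        points.append((hours, done / len(multipliers)))  # COUNTIF / COUNT
    return points

past_multipliers = [0.5, 1.0, 1.25, 3.0, 3.0]  # made-up history
curve = completion_curve(past_multipliers, expected_hours=2.0)

# Print one point per hour; plotting these pairs gives the probability line.
for hours, probability in curve[::10]:
    print(f"{hours:4.1f} h  {probability:.0%}")
```

Dividing each time point by the expected time is what lets one shared history of multipliers be reused for a task of any size.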

Yes, making time estimates for programming is hard. Anyone claiming otherwise is a liar. Though I hope to become better, so far I have just become aware of how hard it is to do this properly.

Fun fact: I forgot how it worked and I needed to refactor a single column with a monster formula into four to understand how I achieved this.

Unfun fact: LibreOffice sometimes refuses to recalculate the graph. Sometimes reopening the document is not enough to trigger recalculation. Still, probably better than writing dedicated software used by a single person. It is not an official COVID stat file with datapoints in columns, so I am far away from peak spreadsheet insanity.


(if someone wants more detail or clarification, please write)