Thanks so much for documenting how you do this! This is really great!
(This might be too much of a tangent, but does anyone have thoughts on the best way to turn that CDF into a PDF? I.e., turning that cumulative, monotonic probability graph into a more familiar bell curve? I think it’s easier said than done…)
This is a bit incomplete - I also used this knowledge to switch to time-based contracts and away from “system/program with features XYZ for N €”.
And I stopped believing “this thing was completely mispredicted, like the last one, but it’s just an outlier”.
Graph produced the easiest way (reduce the bin size and graph the “delta between current and previous cumulative value” rather than the “current cumulative value” itself):
(the peak is where the predicted time is)
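In case it helps, here is a rough sketch of that differencing step in Python (the ratios are made-up numbers, just to show the mechanics; the delta between consecutive cumulative values is simply a normalized histogram, i.e. an empirical PDF):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: ratio of actual time to estimated time for past tasks.
ratios = np.array([0.6, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5, 2.0, 3.5])

# Cumulative curve on a fixed grid of bins (smaller bins -> finer curve)...
bins = np.linspace(0, 4, 41)
cdf = np.array([(ratios <= b).mean() for b in bins])

# ...then plot the delta between consecutive cumulative values.
pdf = np.diff(cdf)
plt.bar(bins[:-1], pdf, width=np.diff(bins), align="edge")
plt.xlabel("actual / estimated time")
plt.ylabel("probability per bin")
plt.show()
```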
Note that this kind of graph has a massive weakness: you miss the complete outliers - tasks that were not completed even after taking 100 times longer than expected.
That includes tasks with effectively infinite real time (never completed due to runaway work required, and some that turned out to be de facto impossible).
And ones where “that can be done in 5 minutes” turned into “one month later it is finally completed”.
Typical case: “so that is just switching an option and reprocessing data… Oops, I need to upgrade a package, which requires upgrading the OS, so I should retest backups; oops, there is a pre-failure warning on one of the multiple backup disks…” and the total time ended up including replacing one of the (independent!) backup HDDs, making a backup, retesting the backup, upgrading the backup script and dealing with new defaults, a full reinstall, reconfiguring other software on the new OS, upgrading the backup scripts, and reconfiguring the relevant software. And then finally making that tiny switch in the config and verifying the reprocessed OpenStreetMap data (which actually took five minutes).
Great points! Maybe the way to convey it all reasonably with a bell-curve-style graph (a PDF in stats lingo, not to be confused with PDF the file type!) is to both smooth it and put “>45 hours” on the right edge of the graph, with another bump there to indicate the ~14% chance of that.
Here’s Nate Silver of 538 doing the kind of smoothing I have in mind:
In Mathematica, SmoothKernelDistribution looks promising. I don’t know if spreadsheets have anything like that built in.
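Not sure about spreadsheets either, but here is roughly the same idea sketched in Python, with scipy’s gaussian_kde standing in for SmoothKernelDistribution and the “>45 hours” overflow bump drawn separately (the hours below are invented, so the tail fraction won’t match the real ~14%):

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Invented completion times in hours, purely for illustration.
hours = np.array([8, 12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 60, 90, 200])

cutoff = 45                            # ">45 hours" overflow bucket
in_range = hours[hours <= cutoff]
tail_prob = (hours > cutoff).mean()    # probability mass beyond the cutoff

# Smooth the in-range data (roughly what SmoothKernelDistribution does),
# rescaled so the smoothed part plus the overflow bump still sums to 1.
kde = gaussian_kde(in_range)
xs = np.linspace(0, cutoff, 200)
plt.plot(xs, kde(xs) * (1 - tail_prob))
# The bump is a probability mass, not a density -- just a visual marker.
plt.bar([cutoff], [tail_prob], width=2, color="red",
        label=f">{cutoff} h ({tail_prob:.0%})")
plt.xlabel("hours")
plt.legend()
plt.show()
```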
I tried to follow @beeminder_user’s explanation (I really did!) but in the end decided to just write a Google Apps Script. Still waaaaaay less code than my old Python scripts!
All the EBS function does is take a set of estimates for incomplete tasks and a set of multipliers for already-completed tasks (where each multiplier equals the time taken to complete the task divided by the estimate for that task).
The function returns a sorted list of 100 estimated total hours, which I dump into a sheet and then use to extrapolate an estimated end date for each scenario. In my current spreadsheet, I use the end date of the last completed task as the start date for this extrapolation.
Also, since 100 scenarios are returned, I assume the sorted scenarios roughly correspond to the 1st through 100th percentiles.
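The core of it is just Monte Carlo: for each scenario, multiply every open estimate by a multiplier drawn at random from the historical ones and add them up. Here’s a minimal sketch of that idea (in Python for readability rather than Apps Script, and with made-up numbers, so not my literal code):

```python
import random

def ebs_scenarios(estimates, multipliers, n_scenarios=100):
    """Evidence-based-scheduling style simulation.

    estimates   -- estimated hours for each incomplete task
    multipliers -- actual/estimate ratios from already-completed tasks
    Returns a sorted list of n_scenarios simulated total hours.
    """
    totals = []
    for _ in range(n_scenarios):
        # Assume each open task's estimate will be off by a factor drawn
        # at random from the historical multipliers.
        totals.append(sum(e * random.choice(multipliers) for e in estimates))
    return sorted(totals)

# Made-up example: three open tasks, five historical multipliers.
scenarios = ebs_scenarios([4, 10, 2.5], [0.8, 1.0, 1.3, 2.0, 4.5])
print(scenarios[49], scenarios[89])  # ~50th and ~90th percentile totals
```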
From there I chart different distributions I’m interested in, log a few key percentiles in a static log so I can have a record of trends, etc.
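And for concreteness, the extrapolation step amounts to something like the snippet below (again just a sketch: the pace, date, and placeholder scenario list are assumptions, not my spreadsheet’s actual values):

```python
from datetime import date, timedelta

# `scenarios` is the sorted list of 100 simulated totals from the sketch
# above; a placeholder list is used here so this snippet runs on its own.
scenarios = sorted(10 + 40 * (i / 99) ** 2 for i in range(100))

hours_per_day = 2          # assumed working pace -- plug in your own
start = date(2024, 1, 15)  # e.g. end date of the last completed task

for pct in (50, 90, 99):
    hours = scenarios[pct - 1]
    end = start + timedelta(days=hours / hours_per_day)
    print(f"{pct}th percentile: {hours:5.1f} h -> projected end {end}")
```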
Definitely enjoying doing this in a spreadsheet way more than I am maintaining Python scripts for it!