Ok, so I don’t know exactly what the “best” thing to do is, but here are some rambling bullshit thoughts.

Consider some task that we must do. For example, “solve this puzzle”. Solving the puzzle requires having a sequence of K insights, each of which we don’t really know how long will take to happen, perhaps it comes spontaneously with a probability of y per unit time. The distribution of waiting times for such a process is a gamma distritbution with shape parameter K and scale parameter 1/y.

However, let’s model it as being log-normal distributed. This maintains our desired qualitative shape but I think it’s easier to make something out of it because it’s easier to relate the confidence interval to the implicit parameters of the distribution. If you are willing to guess the mean and standard deviation instead of strictly guessing the 80% confidence interval, then the gamma distribution becomes easier to assume.

As Danny said, we should think our prediction is implicitly trying to guess the parameters of the distribution for the given task. We want to at the very least ensure that the optimal guess is to correctly guess the parameters of the distribution.

One puzzle, from this perspective - suppose a task is genuinely hard to guess, i.e. it genuinely has a large variance in the amount of time it will take to perform an identical task. Unfortunately, since we don’t know whether or not tasks are genuinely “identical”, it is hard to correct for this - most likely, we will end up picking a scoring rule that just gives low scores for a vague prediction, even if the task is “genuinely” hard to predict.

Anyway, let’s say you believe this log-normal shit. Then if you make a prediction of [60 min, 80 min], then we are predicting that the (natural) log of the time (in minutes) is between 4.09 and 4.38. As is convention, use μ, σ to denote the mean and standard deviation of the log. Let’s take this, according to Danny’s suggestion, to be a guess of the 80% confidence interval, which is roughly between μ +/- 1.285 σ. So we are guessing that the log of time has a mean of 4.238 and standard devation of 0.11.

Following Danny’s suggestion I looked up “proper scoring rules”. It seems like one such rule is the log of the density assigned to the observed value. I am not really sure of the relative merits of different scoring rules but let’s try that. **I should reemphasize this so the reader doesn’t just get absorbed in the equations. The scoring rule we are using is - convert the guess into the implied guess about the probability distribution. The idea is that the score is assigned based on how high a probability you assigned to the measured time - thus, if the distribution you assign is too broad, you can’t get ANY POINTS becaue you assign such low probability to any individual time. This is how it conforms to what we want.**

The implied PDF is

So if we observe a time T, the score is

S = - \log{(\sqrt{2 \pi} \sigma T)} - \frac{\left(\log(T) - \mu \right)^2}{2 \sigma^2}.

If we call the interval [a,b], then again in the above expression

\mu =\frac{ \log{a} + \log{b}}{2}

and

\sigma \approx \frac{ \log{b} - \log{a} }{2.57}.

If we want to use the gamma distribution as our implied distribution instead, let’s instead make our guess for the mean \langle T \rangle and the standard deviation \sigma_T.

If we do so, then the implied estimate of the parameters of the distribution is \theta = \sigma^2_T/ \langle T \rangle and k = \langle T \rangle^2 / \sigma_T^2. (Look at the last entry of the table in the wikipedia article gamma distribution.)

Thus, the logarithmic scoring rule gives

S = - T \langle T \rangle / \sigma_T^2 + \left(\frac{T^2}{\sigma_T^2} -1 \right) \log{T} - \log{\left( \frac{\sigma_T^2}{\langle T \rangle} \Gamma{\left(\frac{\langle T \rangle^2 }{ \sigma_T^2}\right)}\right)}.

One unfortunate feature of this log scoring rule is that the numbers are all negative - the best you can do is get 0, if you guess with 100% confidence and get it right.

Anyway this is all both low-confidence and poorly explained, so lemme know what you think.

EDIT: If you want positive numbers, the “spherical scoring rule” p(T) / \sqrt{\int_{0}^{\infty} p^2(T)} may be better. I need to do other stuff for now but I am 99% sure we can get a closed-form expression for that for the gamma distribution. Less sure for the log-normal distribution.

EDIT2: https://www.jstor.org/stable/2629907?seq=3#metadata_info_tab_contents was the source for scoring rules

EDIT3: If you really want to both use the gamma distribution and to interpret the low / high values as where the CDF is 0.1 and 0.9, then I think we could work out a way to get the score via solving the system of 2 equations CDF(a, k, θ) = 0.1 and CDF(b, k, θ) = 0.9 for k, θ on each individual case. But this sounds annoying to implement except in a real programming language or mathematica.

EDIT4: Why might a log-normal distribution be on the right track? Well, I think it satisfies our intuitive notion that the scale of the prediction is irrelevant. So if you predict [2 min, 4 min] and the real time is 5 min, you will get the exact same score as if you predict [2 hour, 4 hour] and the real time is 5 hour. (Because in log-space, this is just an overal translation of the distribution and the specific place we are looking on the T-axis to find our predicted P(T))