Bit-Probing The Secret List, or Cantor vs Kronecker

Impressive work, @zzq and @byorgey. The best @bee and I had come up with was a way to do it with 3n+1 queries.

Here’s my writeup of the optimal solution, after chewing on everything above as well as seeing Stan Wagon’s writeup, which this may also overlap with a bit.

Normal version, a bit at a time

First, query the first bit of the first 3 rows. We’ll necessarily get mostly zeros or mostly ones. If we flip that majority bit and use it as bit 1 of our guess, we’ve eliminated (mismatched) at least 2 rows.

To spell that out, just by using the complement of the majority bit as the first bit in our guess, we’re assured our guess does not match 2 of the strings in the first 3 rows.

(For easier visualization, rearrange the first 3 rows so the eliminated rows are on top. We can remember this rearrangement and restore the original order at the end. Which means that WLOG we can say we’ve eliminated the first 2 rows.)

Now proceed down the diagonal starting with bit 2 of string 3 like so:

\begin{array}{c|cccccc} & 1 & 2 & 3 & 4 & \cdots & n \\ \hline 1 & \blacksquare & \cdot & \cdot & \cdot & \cdots & \cdot \\ 2 & \blacksquare & \cdot & \cdot & \cdot & \cdots & \cdot \\ 3 & \blacksquare & \Box & \cdot & \cdot & \cdots & \cdot \\ 4 & \cdot & \cdot & \Box & \cdot & \cdots & \cdot \\ 5 & \cdot & \cdot & \cdot & \Box & \cdots & \cdot \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ n+1 & \cdot & \cdot & \cdot & \cdot & \cdots & \Box \end{array}

(The squares represent queried bits; whether they’re filled or open is just for convenience in referring to them.)

We construct the rest of our guess by flipping every bit of that diagonal (the open squares). In this way we mismatch strings 1 and 2 by the above argument and mismatch strings 3 through n+1 by Cantor’s classic diagonalization argument.

That’s n+2 queries total.

To see why we can’t do better than that, we’ll first show we need at least n+1 queries and then show that for some secret lists, n+1 isn’t enough.

The lower bound of n+1 is because each row needs at least one query, else we’d have no guarantee that our guess mismatches that row.

Why n+1 isn’t always enough: With n+1 queries and only n columns, some column is getting queried twice, by the pigeonhole principle. That means we’re guaranteed to have two rows that look like the first two rows in the figure above. Namely, two rows each with the same single column queried. Now suppose we’re unlucky and the first row has a 1 in that column and the second row a 0. Now whatever guess we come up with has either a 1 or a 0 in that double-queried column. If it has a 1 then we have no guarantee that our guess mismatches row 1. If it has a 0 then we have no guarantee that it mismatches row 2. QED.

Batch version aka the Oblivious Variant

If we have to make all our queries at once, we’re going to need a couple more of them.

Start with the n=2 case. Suppose the secret list is one of these two:

\begin{array}{c c} \text{A:} & \text{B:} \\ \begin{array}{l} 00 \\ 01 \\ 11 \\ \end{array} & \begin{array}{l} 10 \\ 01 \\ 11 \\ \end{array} \end{array}

And suppose we query everything except the first bit of the first row, WLOG. Then we know 01 and 11 are on the list (we see them in rows 2 and 3, all of which we’ve queried) so our only hope is to guess either 10 or 00 as the only 2 possible bit strings left. But, womp womp, those are both possibilities depending on which of the two secret lists above is the real one. So we only have a 50% chance of naming a string not in the list and we need a guarantee. Meaning we’ll have to query all 6 of those bits. When n=2 we need n+4 queries.

Can we manage with n+4 queries in general? Yes, like so:

\begin{array}{c|ccccc:c} & 1 & 2 & 3 & 4 & \cdots & n \\ \hline 1 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 2 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 3 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 4 & \cdot & \cdot & \Box & \cdot & \cdots & \cdot \\ 5 & \cdot & \cdot & \cdot & \Box & \cdots & \cdot \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \hdashline n+1 & \cdot & \cdot & \cdot & \cdot & \cdots & \Box \end{array}

(Ignore the new dashed lines for now.)

Again, there are 4 possibilities for the first 2 bits and the solid squares can only cover 3 of them. Whichever 2-bit string is missing in the solid squares, that’s the first 2 bits of our guess. We’ve mismatched the first 3 rows. Now proceed to flip the diagonal like usual (the open squares in the figure) and we’ve mismatched the remaining rows.

All that remains is to show that n+4 is optimal – that n+3 queries is not enough. We already showed n+3 queries is not enough for n=2 but suppose, by contradiction, it’s enough for some n=k>2. I.e., that we can find a bit string not in a secret list of k+1 bit strings each of length k using k+3 queries.

If we were to query every row twice, that would be 2(k+1) queries, which is more than k+3 when k>1, which it is. So some row is only getting queried once in this supposed k+3 solution. WLOG say it’s the last column of the last row.

Now throw away the last row and last column (see the dotted lines in the figure). That’s what an n=k-1 instance looks like and whatever remaining queries we used for the n=k instance we can use for this smaller one. Any queries in the column we threw away aren’t needed (the smaller instance doesn’t have that column) and there are no other queries in the row we threw away. Since we’ve thrown away at least one query, we have at most k+3-1 = (k-1)+3 queries still being used.

Those remaining queries suffice to mismatch everything inside the dotted lines. Again, we posited a set of queries from which we constructed a string that mismatched all the strings in the n=k instance. Simply use that string, without the final bit, and it must mismatch all the strings in the n=k-1 instance.

So we’ve just shown that if there’s an n+3 solution for any particular n, there’s also an n+3 solution for the next smaller n. Which means we can keep applying that logic until we hit n=2 and show that there’s an n+3=5 query solution to the n=2 case, which we already showed there isn’t. That’s our contradiction. Meaning n+3 queries is impossible for all n. QED.

2 Likes