Bit-Probing The Secret List, or Cantor vs Kronecker

So I think we actually now have solutions for both variants of the problem, and the answers are n+2 and n+4, for the versions of the problem in which we respectively can and cannot base our choices of subsequent queries on the results of earlier queries.

My inductive step works in both cases, as the query it adds doesn’t depend on prior queries. @byorgey gave the base case for the with-dependency version (four), and @dreev gave the base case for the without-dependency version (six).

Likewise, @byorgey’s n+1 lower bound applies in both cases. I guess we haven’t ruled out, though, that there might be some solution to the without-dependency variant in n+2 or n+3 queries (for some values of n, at least, if not for others.)

2 Likes

Technically I think there’s still a gap for the batch query version: we know that 6 queries are required in the n = 2 case, and we have a construction that solves it in n + 4 queries for any n \geq 2, but I don’t think we have proved that it’s not possible to do better than n + 4 for larger n.

However, if we could formalize @zzq 's argument that if the case for some n > 2 can be solved with x queries, then n-1 can be solved with x-1 queries, that would do it.

2 Likes

Yeah, that struck me just now too, and I added it on to my last post in parallel.

I think we ultimately do need to formalize the inductive lower bound proof. [EDIT: yes, agreed, exactly what you said as well.]

2 Likes

In the batch query version, imagine we only had n+3 queries. We need to query each of the n+1 strings at least once, then we have two “extra” queries. In this version, we can reorder the queries, indexes, and strings WLOG, so let’s say that our first n queries were each at index i of string i.

We have three more queries, which we must allocate without seeing the results of the first n queries. We also have a yet-unqueried string to query with (at least) one of them. If we choose to allocate two of the queries to the same index and one to another, we can say WLOG that we chose to query the second bit of the first three strings and the first bit of the second and third string, and then a diagonal going down the rest of the strings.

Then we apply @dreev’s argument to that little rectangle at the first two indexes of the first three strings, showing that bit one of string one could be anything.

All that remains is to show that we can’t get out of this by assigning all three extra queries to the same index, or to distinct indexes. (Remembering also that we have the constraint that one of them is earmarked for the as-yet-unqueried string.)

2 Likes

Alright, I think I’ve got it, a proof that you can’t do better than n+4 in the batch version.

Let’s assume for the sake of contradiction that there exists some n such that the n case is solvable in n+3 or fewer queries. In fact, let’s say that n is the smallest such number.

(n > 2, because we know that the n=2 case isn’t solvable in less than six queries.)

As (n+1) * 2 > n + 3 (for all n > 1), it isn’t possible that this solution involves querying each string twice. So there must be at least one string this solution queries only once. The resulting mismatching string the solution discovers must mismatch the singly-queried string at the index where that string was queried, because otherwise there is no way to be sure of the mismatch.

In the worst case, the singly-queried string has one value at that index and all the other strings have the other value, so mismatching the singly-queried string there doesn’t help mismatch any other string.

WLOG, let’s take the singly queried string to be the final (n+1 th) one, and the bit of it we’re querying to be the final (n th) index.

To solve an arbitrary n-1 case in no more than (n-1) + 3 queries, we can just follow our solution for the n case, except skipping all queries for index n (which our n-1 length strings don’t have) or for string n+1, likewise. This skips at least one of the n+3 queries, and so uses no more than (n-1) + 3 queries. With these queries, we must be able to mismatch all the first n-1 bits of the first n strings, which is all there are in the n-1 case!

But this is a contradiction, because we assumed that n is the smallest integer for which the problem is solvable in n+3 or fewer queries, and we just found a smaller such integer (n-1). Thus by reductio ad absurdum, there is no smallest integer for which the problem is solvable in n+3 or fewer queries, and thus no such integer at all.

Q.E.D.

We have our final answer: we’ve shown that the batch version needs exactly n+4 queries for all n \geq 2, and the adaptive version needs exactly n+2.

2 Likes

Fantastic!

Impressive work, @zzq and @byorgey. The best @bee and I had come up with was a way to do it with 3n+1 queries.

Here’s my writeup of the optimal solution, after chewing on everything above as well as seeing Stan Wagon’s writeup, which this may also overlap with a bit.

Normal version, a bit at a time

First, query the first bit of the first 3 rows. We’ll necessarily get mostly zeros or mostly ones. If we flip that majority bit and use it as bit 1 of our guess, we’ve eliminated (mismatched) at least 2 rows.

To spell that out, just by using the complement of the majority bit as the first bit in our guess, we’re assured our guess does not match 2 of the strings in the first 3 rows.

(For easier visualization, rearrange the first 3 rows so the eliminated rows are on top. We can remember this rearrangement and restore the original order at the end. Which means that WLOG we can say we’ve eliminated the first 2 rows.)

Now proceed down the diagonal starting with bit 2 of string 3 like so:

\begin{array}{c|cccccc} & 1 & 2 & 3 & 4 & \cdots & n \\ \hline 1 & \blacksquare & \cdot & \cdot & \cdot & \cdots & \cdot \\ 2 & \blacksquare & \cdot & \cdot & \cdot & \cdots & \cdot \\ 3 & \blacksquare & \Box & \cdot & \cdot & \cdots & \cdot \\ 4 & \cdot & \cdot & \Box & \cdot & \cdots & \cdot \\ 5 & \cdot & \cdot & \cdot & \Box & \cdots & \cdot \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ n+1 & \cdot & \cdot & \cdot & \cdot & \cdots & \Box \end{array}

(The squares represent queried bits; whether they’re filled or open is just for convenience in referring to them.)

We construct the rest of our guess by flipping every bit of that diagonal (the open squares). In this way we mismatch strings 1 and 2 by the above argument and mismatch strings 3 through n+1 by Cantor’s classic diagonalization argument.

That’s n+2 queries total.

To see why we can’t do better than that, we’ll first show we need at least n+1 queries and then show that for some secret lists, n+1 isn’t enough.

The lower bound of n+1 is because each row needs at least one query, else we’d have no guarantee that our guess mismatches that row.

Why n+1 isn’t always enough: With n+1 queries and only n columns, some column is getting queried twice, by the pigeonhole principle. That means we’re guaranteed to have two rows that look like the first two rows in the figure above. Namely, two rows each with the same single column queried. Now suppose we’re unlucky and the first row has a 1 in that column and the second row a 0. Now whatever guess we come up with has either a 1 or a 0 in that double-queried column. If it has a 1 then we have no guarantee that our guess mismatches row 1. If it has a 0 then we have no guarantee that it mismatches row 2. QED.

Batch version aka the Oblivious Variant

If we have to make all our queries at once, we’re going to need a couple more of them.

Start with the n=2 case. Suppose the secret list is one of these two:

\begin{array}{c c} \text{A:} & \text{B:} \\ \begin{array}{l} 00 \\ 01 \\ 11 \\ \end{array} & \begin{array}{l} 10 \\ 01 \\ 11 \\ \end{array} \end{array}

And suppose we query everything except the first bit of the first row, WLOG. Then we know 01 and 11 are on the list (we see them in rows 2 and 3, all of which we’ve queried) so our only hope is to guess either 10 or 00 as the only 2 possible bit strings left. But, womp womp, those are both possibilities depending on which of the two secret lists above is the real one. So we only have a 50% chance of naming a string not in the list and we need a guarantee. Meaning we’ll have to query all 6 of those bits. When n=2 we need n+4 queries.

Can we manage with n+4 queries in general? Yes, like so:

\begin{array}{c|ccccc:c} & 1 & 2 & 3 & 4 & \cdots & n \\ \hline 1 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 2 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 3 & \blacksquare & \blacksquare & \cdot & \cdot & \cdots & \cdot \\ 4 & \cdot & \cdot & \Box & \cdot & \cdots & \cdot \\ 5 & \cdot & \cdot & \cdot & \Box & \cdots & \cdot \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \hdashline n+1 & \cdot & \cdot & \cdot & \cdot & \cdots & \Box \end{array}

(Ignore the new dashed lines for now.)

Again, there are 4 possibilities for the first 2 bits and the solid squares can only cover 3 of them. Whichever 2-bit string is missing in the solid squares, that’s the first 2 bits of our guess. We’ve mismatched the first 3 rows. Now proceed to flip the diagonal like usual (the open squares in the figure) and we’ve mismatched the remaining rows.

All that remains is to show that n+4 is optimal – that n+3 queries is not enough. We already showed n+3 queries is not enough for n=2 but suppose, by contradiction, it’s enough for some n=k>2. I.e., that we can find a bit string not in a secret list of k+1 bit strings each of length k using k+3 queries.

If we were to query every row twice, that would be 2(k+1) queries, which is more than k+3 when k>1, which it is. So some row is only getting queried once in this supposed k+3 solution. WLOG say it’s the last column of the last row.

Now throw away the last row and last column (see the dotted lines in the figure). That’s what an n=k-1 instance looks like and whatever remaining queries we used for the n=k instance we can use for this smaller one. Any queries in the column we threw away aren’t needed (the smaller instance doesn’t have that column) and there are no other queries in the row we threw away. Since we’ve thrown away at least one query, we have at most k+3-1 = (k-1)+3 queries still being used.

Those remaining queries suffice to mismatch everything inside the dotted lines. Again, we posited a set of queries from which we constructed a string that mismatched all the strings in the n=k instance. Simply use that string, without the final bit, and it must mismatch all the strings in the n=k-1 instance.

So we’ve just shown that if there’s an n+3 solution for any particular n, there’s also an n+3 solution for the next smaller n. Which means we can keep applying that logic until we hit n=2 and show that there’s an n+3=5 query solution to the n=2 case, which we already showed there isn’t. That’s our contradiction. Meaning n+3 queries is impossible for all n. QED.

2 Likes

Good writeup.

(but note that you wrote n-3 a number of times in the final paragraph when you meant n+3.)

1 Like

Fixed those typos; thank you! And thanks for the beautiful solution. Thanks also to @bee for all the work on this, including helping me understand your backwards-induction reductio argument.

1 Like