There’s a secret list of n+1 binary strings, not necessarily unique, each with n bits. Your mission is to name a string guaranteed NOT on the secret list, by querying as few of the n(n+1) bits as possible. This is clearly impossible for n=1 since the secret list might just be {0, 1}, with every possible 1-bit string in the list. For n=2 there are 4 possible bit strings and only 3 in the list, so with enough queries, it’s possible.
As a function of n, what’s the minimum number of queries you need?
Batch variant: What if you have to pick all your bits to probe all at once?
I’ll kick things off with a question and a few observations:
Question: are the binary strings guaranteed to all be distinct? You didn’t say that, so I’m assuming not, but thought it worth clarifying.
And now a few observations:
At least n+1 queries are required, since we must learn at least one bit from each string; if we have zero information about some string then we cannot guarantee to pick a string not in the list.
If there were n strings with n bits, we could use the usual diagonalization trick: query bit i from each string i, and then generate the string whose i th bit is the opposite of whatever the i th bit is in string i. This is guaranteed to be distinct from each of the n strings and hence not in the list.
However, if we try this with n+1 strings, there’s no unique bit we can query for the n+1 th string. Suppose we query bit j. If we get lucky and bit j of string n+1 is the same as bit j in string j, then we can just use the same diagonal string as before. But if bit j of string n+1 is the opposite of bit j of string j, then we are stuck: the diagonal string corresponding to the first n strings might be the same as string n+1. Even if we were to query all the bits of the n+1 th string, we might still be stuck: if the entire thing happens to be equal to the diagonalization of the first n strings, then we still don’t know a particular string we can pick that is not in the list.
Also, (n+1) k queries suffice, where k = \lfloor \lg (n+1) \rfloor + 1 is the smallest integer such that 2^k > n+1. Just query the first k bits of each string. Since there are 2^k possible k-bit sequences, and 2^k > n+1, there must exist some k-bit sequence which is unattested; pick a string which starts with some such k-bit sequence and ends with, say, all 0’s. But it’s probably possible to do better than this.
I’d argue that in the worst case there can exist a pair of strings in the secret list that differ at only a single bit. If so, then to solve the problem we need to pick a string that differs somewhere else in the string from (both of) them. So we need to solve the n-1 case—the version of the problem where there are the n binary strings each with n-1 bits, formed from the original problem by removing the i th bit from each string, if i is the index where that almost-identical pair differs. (That leaves us with a set of only n strings, because by removing the one place where those two strings differ, we collapse them down into the same thing.)
That’s not quite enough for an inductive approach, but it’s a start…
OK, how about this: let’s try doing induction. Let’s say we’ve solved the n case, and use that to solve the n+1 case.
By doing x queries on the first n bits of the first n+1 strings, we can (by assumption) come up with a length n string that doesn’t match the first n bits of any of the first n+1 strings.
So no matter what final bit we append, we will end up with a string that doesn’t equal any of the first n+1 strings. So all we need is one more query, to find what the final bit of the n+2 th string is, and tack the opposite on to our length n string.
The first n bits don’t match any of the first n bits of the first n+1 strings by assumption, and the final bit doesn’t match the final string by construction. All in all we’ve used x+1 queries.
That’s the inductive step—each increase of n by one requires just one more query. We now just need a base case, which I guess is n=2.
For the n=2 case—as @byorgey says, we need at least n+1 = 3 queries, at least one from each string. But of course querying all three at the same index doesn’t help in the worst case. Similarly if we query two at the same index and one at the other, if the two we query at the same index get different results we’ll still need a fourth query. Can it be done in four?
Four queries needs to be both bits of one string and one bit of each other of the other two strings, as we need to query each string at least once.
Yes, it can be done in four. Query all three strings at the first bit. If they are all the same, done. So suppose not: then two must be the same, and one must be different. Without loss of generality suppose two start with 0 and the other starts with 1. Now query the second bit of the string that starts with 1. Whatever it is, we can pick the string that starts with 1 and has the opposite second bit.
Yes, by the logic of my first post in this thread, I think. In the worst case, we need to solve the n-1 case to solve the n case—and even then we wouldn’t have enough information to solve the n case unless we gather more, which in this setup can only be done by issuing at least one more query.
I’m sure there must be a way to make that slightly more rigorous, but that’s my intuitive sketch here.
Something like this: if it so happens that two bitstrings differ only at one place, say at index i, then whatever we do to solve the problem must use (at least) one query on some string at index i. But without that (or those) queries we must be able to solve the n-1 case (with the i th bit of each string sliced out), by simply doing all the other queries which together must be enough to find a string that doesn’t match at some other bit the almost-matching pair.
That is to say, if we can solve the n case in x queries, we can solve the n-1 case in x-1 (or fewer) queries. This means that if we can solve the n case in fewer than n+2 queries, we could solve the n=2 case in fewer than four queries, which we previously ruled out. Q.E.D.
I don’t follow this. If the two bitstrings differ only at index i, then it seems like that’s the one index we don’t have to query, because it doesn’t help. What am I missing?
You’re right, actually, and I was confused. I was thinking that we’d need to know that they were different on index i, but of course not, if we found some way to not match either string on some other index.
I still want to convey the essence of my point, which at this stage is still just an informal intuition: it seems absurd that the solution to the n case and the n+1 case are identical (for some pathological n.) I’m pretty sure the number of queries is (strictly) monotonically increasing in n.
When you add on a new bit and a new string to an already solved problem, you’ll at least need to query the new string somewhere (in the worst case.) But that’s only if you start with the previous solution—perhaps there is some clever way (I seriously doubt it, in the worst case), to construct a whole new solution that doesn’t match even the new string, no matter what the new string is, without further queries.
How about this. If we make only n+1 queries, then we know exactly one bit from each string.
If we only know a single bit from a given string, the only way to pick a string that is guaranteed to be different than it is to pick a string that has the opposite bit in that position. But since there are n+1 strings with only n bits, there must exist some i for which we have queried multiple strings at bit i. In the worst case, the bits at position i differ among those strings. But now we have no way to pick a string that is guaranteed to be different from them.
Put another way: if we only know a single bit from each of the n+1 strings, then everyn-bit string could be on the list, as far as we know. Say string A and string B are two strings we queried both at bit i, but string A has a 0 there and string B has a 1. Now given anyn-bit string C, just look at bit i: if it is 0, then (as far as we know) we could have C = A. If bit i of C is 1, then we could have C = B.
Here are the initial things we figured out with our friend Svata:
for n=1 it’s impossible. the secret list is potentially {0, 1} which is every possible 1-bit string so it’s impossible to find a bitstring not in the list.
for n=2 the answer is 6, meaning you have to query every single bit in the secret list. proof: suppose the secret list is either this:
00
01
11
or this:
10
01
11
and suppose you query everything except the first bit WLOG. then you know 01 and 11 are on the list so your only hope is to guess either 10 or 00. but, womp womp, those are both possibillities depending on which of the two secret lists above is the real one. so you only have a 50% chance of naming a string not in the list and we need a guarantee. meaning we’ll have to query all 6 of those bits.
Yeah, but since one of those two strings was also queried somewhere else, why can’t we just mismatch that string at that other position, then use index i to mismatch the string whose only query was at i.
That is to say, me and @byorgey have been assuming that you can chose what later queries to do based on the results of the earlier queries. If you say we have to choose which queries we’re doing before we start with any of them, that’s a different problem.
The reason it loses generality is that after querying three bits at the same index and getting two of a given result and one of the other, it’s vital that the fourth bit we query is the other bit of the string which is the odd one out based on what we know so far.
Ah, but you did lose some generality! Assuming we get to make our queries sequentially (i.e. you get to learn the outcome of one query before you decide which query to make next), making those five queries in particular was suboptimal. As I explained in my post above, you can do it with only four queries: as soon as you learn that one of the strings is different from the other two in a particular bit position, you should query its other bit.
Sorry again, yeah, I’m still caught up in the induction stuff, where we work on a smaller version with an addition grafted on. You’re right, of course, we have n+1 strings in the case you’re talking about here.
Yes, I think your logic works well, in that case. To query each string once, we must have queried some index (at least) twice, pigeonhole-principle style. And if those two queries don’t match, we’re stuck, unable to mismatch both strings at that index (because they differ there, so you can’t mismatch both there at once), or (definitely) mismatch them at any other index (because we have no information about either of those strings at any other index.)