🦜 Parrots

Intro

One of my favorite competitive programming tasks (IOI 2011). You don’t have to actually write any code — just come up with a way to do the task. Suitable for a) people familiar with binary, or b) programmers, or c) people who kinda liked math in school.

Problem statement for non-programmers (rephrased)

  1. Jaxon has an N byte file, which he wants to send to Lachlan. They both know how big the file is.

  2. They don’t have internet (b/c Australia) so they are going to use parrots. Each parrot can memorize a single byte. Bytes are just whole numbers from 0 to 255.

  3. However, parrots don’t necessarily arrive in the same order they departed. So if the file is [1,2,3], Jaxon can’t just take three parrots and teach them the numbers 1, 2, 3 — because the parrots might arrive in the opposite order 3, 2, 1 (or in any other order). He’ll need to be cleverer.

  4. Naturally Jaxon will have to use more parrots than N. It’s alright. We love you anyway, Jaxon. Would be nice to use as few parrots as possible though. Are you listening, Jaxon. Listen to me.

If it’s unclear, you can read the full original statement here. I got rid of some extra cases because I don’t like them.

Problem statement for programmers

Write an encoder and decoder for messages (arrays of bytes). Your decoder will be given the length of the original message.

  • encode(msg: uint8[]): uint8[]
  • decode(n: int, data: uint8[]): uint8[]

The decoder must be able to correctly decode the message even if the data is randomly shuffled after encoding. That is, decode(length(msg), random_shuffle(encode(msg))) == msg for all msg.

Levels of difficulty

  • ONE BROWNIE POINT: the file is N \leq 16 bytes long, and Jaxon can use at most 10 \times N parrots. Doesn’t have to be optimal.

  • TWO BROWNIE POINTS: the file is N \leq 32 bytes long, and Jaxon can use at most 10 \times N parrots. Also doesn’t have to be optimal.

  • RED VELVET CAKE: come up with the optimal solution that works for any N. Optimal means “minimize the maximum number of parrots you’d possibly need”.

Solution for N = 2 (read if you didn’t get the task)

You can always use three parrots to send a two-byte file. Here’s how:

If the file is [A, B], you just teach two parrots A and one parrot B. This works because:

Even if the parrots get shuffled on their way to Lachlan, he can look at which byte is more common and he’ll know it was the first byte of the file.

4 Likes

I don’t have a solution specifically for the second level, but the first and the third both seem pretty simple to brute force. (And if you can solve the third, I guess that also solves the second, if not as elegantly.)

For the first:

Use the first 7 bits of each byte for a sequence number, with the actual information only in the last bit of each byte. Using only one bit per byte means that you need 8 \times N parrots, and using a seven bit sequence number allows you to send a message requiring up to 2^7 = 128 pieces (parrots). If each parrot carries one bit, that’s \frac{128}{8} = 16 bytes, precisely what we need for the first level of the problem.

For the third:

With k parrots, you can distinguish between exactly \binom{k + 256 - 1}{k} possible message encodings, a la the classic “balls and urns” combinatorics problem. Each parrot is a “ball” and you’re putting it into one of 256 “urns”, each labeled by an 8-bit integer, based on which byte the parrot carries. Decide on an order for these encodings, these 256-tuples of integers, say the lexicographical order. (The first element of such a tuple is the number of parrots carrying the byte 000000, the second the number of parrots carrying 000001, the third the number of parrots carrying 000010, etc.) Now you have a bijection between the \binom{k + 255}{k} possible message encodings and the natural numbers from 0 to \binom{k + 255}{k} - 1 — which will be the decoded message. (You’re given N when decoding the message, so you can, say, sort the set of length-N arrays of eight-bit integers lexicographically and use that number as an index into that ordering to recover the original array.)

To encode a message of length N, you need to choose a k such that 8^N = 2^{3N} is less than or equal to \binom{k + 255}{k}, so choose the least k such that \frac{log_{2}\binom{k + 255}{k}}{3} \geq N.

1 Like

@zzq Excellent job on both.

Actually implementing the bijection is slightly non-trivial, especially in 2011 when people mostly didn’t have access to bignums in competitions. But even with bignums it’s still fun to implement. I didn’t participate in IOI but my classmate was a strong competitive programmer and he didn’t manage to come up with the bijection idea at all, IIRC.

1 Like

When viewed as a pure math puzzle (as indeed you presented it) this problem is much easier to tackle than when approached as a competitive programing puzzle.

Viewing it as a math problem, I started by thinking “if I can find the number of possible encodings for any given number of parrots, I can put those encodings in a bijection with the natural numbers less than it.” That’s why I called this solution “brute force”—it enumerates all possible encodings and builds a lookup table matching that list to all possible messages that could be sent. With that as a starting point, all that was left was to identify that this was a combinatorics problem, which made it trivial to calculate that number.

If you start with the notion that you put the possible encodings in a bijection with a subset of the naturals, the rest of the problem is relatively easy. But approaching this from a competitive programing angle, you’d probably start from the opposite direction—and that, I imagine, makes it a lot harder.

Ah, I finally got the third one while sitting in the car park waiting for my wife to give blood! So here’s what I came up with:

For N<=16: use the first 4 bits to encode which word (byte) of the file it represents. Then use the next 2 bits to encode which pair of bits in that word we’re sending. Use the final 2 bits to send those two bits. This uses 4 parrots per byte sent, and works for up to 2^4 words.

For N<=32: write the whole file as a bit string (max length 256 bits). Each parrot is a byte representing an index position in the bit string, 0-255. Send a parrot for each 1 in the bit string. On average you’ll have to send 128 parrots for a 32 byte file, as half the bits will be 1s.

For arbitrary N: extending the previous method, break the file into successive 256 bit strings. Send 1 parrot for each 1 in the first 256 bits, 2 parrots for each 1 in the 2nd 256 bits, 4 for each in the 3rd, 8 for each in the 4th, and so on. So say I receive 23 parrots with the value 27: I know 23 is 10111 in binary, so there is a 1 in the 27th place in the 1st, 2nd, 3rd and 5th sequence of 256 bits. Parrots used is stochastic, depending on message, so I’m sure it’s not optimal!

1 Like

@clivemeister

  • N\le16 – looks correct.

    You are using 6 bits for position. This turns out to be more efficient than using 7 bits for position as @zzq did. In general if you spend X bits on position you can send 2^X things of size (8-X) bits. If you try for all possible X-es, only 6 and 7 bits let you transmit 16 bytes.

  • N\le32 – also looks correct.

    256 parrots in the worst case but it’s still less than 10\times N, so well done. Cleverly uses the fact that the number of parrots doesn’t have to be constant. That’s kinda cool actually.

  • Arbitrary N – even more cleverly uses the same fact (I’m jealous), but yeah, it’s not optimal.

    (Just judging by the fact that it heavily privileges messages with lots of zeroes.)

    If you want a slight (moderate?) hint re/ optimal solution, here’s a hint → the optimal solution is entirely completely different.

Ah, I came up with a path to a solution that’s probably closer to optimal for arbitrary N. It goes something like:

We might be able to use Multisets! I needed to read up in Wikipedia on this, I admit - in programming, I’d think of them as the datatype Bag. Then our message is a Bag of parrots, containing some number of each parrot, where parrots have a value 0-255. So the message is a Multiset, with the cardinality of the Multiset large enough to cover the message space, with the message interpreted as a binary number with all the digits concatenated together. So some number in the range 0-N^2^3, for N digits. We establish a 1-1 mapping from all possible Multisets to the numbers in the range to N^2^3 (just some intuitive ordering of the possible multisets will do), then we can send the Bag of parrots, work out which multiset we’ve got, and map back to the number, i.e. to the message. Seems like this should work!

Wikipedia tells me that I need to use the “multiset number” to work out how many parrots I’d need for any message, and I can write this down on a bit of paper, but my ability to write it onto this forum post is limited by my knowledge of math notation in markup, so this is left as an exercise for the reader :slight_smile:

Right, now I’m going to read your hint, see if it was going in this direction or not…

@clivemeister That’s correct! You get a slice of the red velvet cake.

1 Like

This is a very cool problem! I haven’t looked at the spoilers yet but let’s officially say the statute of limitations on spoilering is over (does anyone prefer a spoiler-tags-forever policy?).

So here are some stabs at this, after out-loud discussion with @bee and @morehavoc. A lot of this is due to @morehavoc, who is one smart cookie:

For 1 brownie point, where your message has up to 16 bytes, you can allocate the first 7 bits of each parrot’s payload to represent the parrot’s position. So then you just split the N-byte message into n=8N (at most 16×8=128) bits and you’re good: Parrot i's payload is a byte that tells you its exact position and the bit that goes in that position. Done.

But beyond 16 bytes (128 bits) we need a new strategy. You’d need all 8 bits just to store the parrot’s position, with no room for any bit of the message.

So here’s an absurdly, hilariously inefficient general solution, to get us started:

Your whole n-bit message can be treated as a single integer – something in the range 0 to 2^n-1. If you send that many parrots, you’ve technically encoded the whole message. The parrots’ payloads are irrelevant. Purely knowing the quantity of parrots sent suffices.

I mean, you quickly need more parrots than there are atoms in the universe, but in theory…

Ok, but now we can improve it pretty drastically by splitting the n-bit message into 2^8=256 chunks, numbered 0255.

Each chunk will have \left\lceil n/256 \right\rceil or \left\lfloor n/256 \right\rfloor bits. For example, if the message happens to be 256 bits then every chunk will be 256/256=1 bit. If the message is less than 256 bits, some chunks will be size zero. When you hit 257 bits, you need one 2-bit chunk. At 512 bits, every chunk is 2 bits. And so on.

Now, for each chunk (of non-zero size), look at what number that chunk encodes. That number is how many parrots you need for that chunk. Suppose this is chunk number 42 which is a 2-bit chunk and the 2 bits are “11”. That encodes 3 in binary so we’ll need 3 parrots for chunk 42. Give all 3 the same payload: 42, or “101010” in binary.

When your friend receives 3 parrots with payload 42, they know that the bits for chunk 42 are “11” (again, because 3 is “11” in binary).

In the worst-case, the number of parrots for an n-bit or N-byte message would be this:

256\cdot 2^\frac{n}{256} = 256\cdot 2^\frac{N}{32}

At this point I got stuck for a while, thinking there must be something much cleverer and more efficient we could do. But now I’m not so sure. Suppose each parrot can only memorize a single bit. In that case all the receiver learns is 2 integers – the number of parrots who say “zero” and the number who say “one”. So splitting the message into 2 parts seems like the best we can do. Now imagine the parrots can memorize 5 bits – so about 1 letter of the alphabet. And imagine you’ve encoded a Shakespeare play. The receiver effectively is going to get a book of letters where the first several pages are all A, then a bunch of pages with all B, etc. Again, all the receiver is learning is 32 integers – the numberof each of the possible symbols a parrot can memorize, encoded as the number of parrots who memorized that symbol. With 8 bits we get 256 symbols – not too much better.

So I guess it’s starting to feel believable that the above is the best we can do?

Unless we don’t need a 100% guarantee. If a probabilistic guarantee is good enough, here’s another clever solution @morehavoc thought of:

Use a normal error correction code like a Hamming code to turn your n-bit string into a slightly-more-than-n-bit string in the normal way. Now ship your parrots. The receiver just tries parrot permutations until there are no errors. Should work fine with very little space overhead. Assuming you’re immortal.

You can do a lot better than that, but you’re on the right track. (You say that you don’t want spoiler tags, so while I’m not going to put the following hint in them.)

Your example with 32 integers is an excellent start:

[A]ll the receiver is learning is 32 integers – the number of each of the possible symbols a parrot can memorize, encoded as the number of parrots who memorized that symbol.

Good! But you map the 32-tuple of integers you’ve transmitted to messages in the “obvious” way: the first element is the number of As, the second the number of Bs, and so forth. The reason that isn’t so great is that this doesn’t uniquely identify the message you want to send, because the mapping loses information about the order the letters are arranged in.

But what if you preserved that information? You’d have to drop the simplicity of the first element being the number of As, etc, replacing it with a more complicated mapping scheme…

Alright.

Let’s simplify – say, the parrots can only carry {0,1,2,3}. Then we’d be splitting the n-bit message into 4 chunks.

For 4-bit messages, this encoding will be very simple: we send one parrot for each set bit.

  • 0000 → “”
  • 0001 → “3” (one parrot carrying a 3)
  • 0010 → “2”
  • 0011 → “23”

Overall you will get “”,“0”,“01”,“02”,“03”,“012”,“013”,“023”,“0123”,“1”,“12”,“13”,“123”,“2”,“23”,“3” – 2^{16} in total, as expected. The max number of parrots required is 4.

However, note how many other parrot arrangements are left unused — eg. there’s only one 4-parrot arrangement, out of C(7,4) = 35 available to you. So the scheme even for 4-bit messages can be made more efficient.