Birthdays in April

Here’s a problem that came up “naturally” on our family email list and that @bee and I have been having fun with:

How many people do you have to know before probably every day in April is someone-you-know’s birthday?

By a “person” of course we’ll mean a random number in {1, …, 365}. No leap babies or anything. Certainly not caring about whether April is a more common or less common birth month.

And by “probably” we mean >50%. So think of the question as finding the probability as a function of the number of people, n, that all 30 birthdays are hit. Then it’s easy to check what value of n takes us over the 50% threshold.


Warmup 1: How many distinct birthdays do you expect to hit if you know n people?

Warmup 2: In expectation, how many people do you have to ask the birthday of before you get at least one answer for every day in April?


Background:

It may help to first learn about the Coupon Collector’s Problem, which Matt Parker has a nice exposition of on YouTube.

4 Likes

How many friends do you need in order to in expectation have one with a birthday in April? This is your classic geometric distribution—you have a series of Bernoulli trials with a chance of success of p = 30/365, and so the expected value of the number of trials it takes to get the first success is 1/p = 365/30. Call the random variable (the number of friends you need for one with a birthday in April) X_1. E[X_1] = 365/30.

Once you’ve done that, you’ve “blocked off” one of the days: having further friends with that birthday doesn’t help you any more. So now you go get more friends until you hit one with a birthday on one of the remaining 29 days in April: another geometric distribution, now with p=29/365. So you have X_2, the number of further friends you need to find one with a birthday on one of the other 29 days in April. E[X_2] = 365/29.

The important part is that X_1 and X_2 are independent. It doesn’t matter how long it took you to find a friend with an (any) April birthday: as soon as you do, you’ve blocked off exactly one day in April (It doesn’t matter which.) It also doesn’t matter what birthdays your other (non-April-birthday-having) friends have, or how many of those friends you have.

So E(X_1 + X_2) = E(X_1) + E(X_2) = 365/30 + 365/29.

We can continue in this manner, defining X_3 as the number of further friends it takes to find one with an April birthday that doesn’t match the two blocked-off days in April you’ve already found. (It doesn’t matter if along the way to the second distinct April birthday you found more friends with the same birthday as your first April friend: either way you’ve blocked off exactly two days in April, and are now looking for a friend with a birthday on one of the remaining 28.)

E(X_3) = 365/28, and likewise we can define X_4, with E(X_4) = 365/27, and so forth all the way to X_{30}. These are all independent random variables, so E[\sum_{i=1}^{30}X_i] = \sum_{i=1}^{30}E[X_i] = \sum_{i=1}^{30}{{365}\over{31-i}}.

We can evaluate that as 365 \cdot \sum_{i=1}^{30}{1\over{31 - i}} = 365 \cdot \sum_{i=1}^{30}{1\over{i}} = 365 \cdot H_{30}, where H_{30} is the 30th harmonic number, which turns out to be about 3.995. Times 365 that’s 1458.175, so the integer number of friends you need to reach is 1459.

2 Likes

Nice! I think that’s the answer to warmup 2.

1 Like

I guess so. I see it’s subtly different from the original question, which was how many people do you need to know so that probably you meet the every-day-in-April criterion.

If by “probably” we mean at least a 50% chance, we’re looking for the median of the probability distribution that, my proof shows, is the sum of 30 independent random variables each with a geometric distribution with a different parameter.

The sum of independent random variables following geometric distributions with the same parameter is a negative binomial distribution. But I don’t know how to deal with the sum of random variables with geometric distributions whose parameters vary.

But I guess we can approximate it with the Central Limit Theorem? The median of a normal distribution is its mean, and the central limit theorem says that the sum of independent random variables approaches a normal distribution with a mean equal to the sum of the means of those variables. So, approximately, the answer is the same, 1459.

That’s just an approximation, and I’m not sure how much of a good one it is. But to the extent it holds, my answer to the warmup stands as a solution to the main question too.

2 Likes

Yeah, I’ll clarify this with an edit to the question but I think the original question should be treated as “for n people, what’s the probability of hitting all 30 birthdays in April?”. Then check the value of n for which you cross the 50% threshold. It’s different but not toooo different from your approximation.

1 Like

Ok, yeah, so for warmup 2, the expected number of people you’ve got to collect before you get an april 1st bday, an april 2nd bday, an april 3rd bday, etc, is 365/30 + 365/29 + 365/28 + … + 365/1. I think that makes sense to me after watching the standup-maths coupon collector video.

How to tackle warmup 1? Is there some way to use complementary probabilities for that?

2 Likes

Warmup 1:
E[n] – expected number of distinct bdays with n people.

E[1] is one.

When you add a second person you either get the same birthday, or a new birthday. I.e. you’re either adding no new bdays (to the existing 1 distinct bday) with P=1/365, and with the complementary probability, you get a new bday, and so you’re adding 1 new distinct bday to the set with P=364/365.

E[2] = E[1] + P(no new bday)* 0 + P(new bday) * 1
E[2] = 1 + 364/365

So in general, if you’ve already hit some number of days, and then you add another person to the set, you’re going to get a new distinct bday with a probability equal to [how many days are unhit] / [total number of days], or…

E[n] = E[n-1] + (365-E[n-1])/365

Which, you do some algebra to simplify it and you get:

E[n] = 1 + (364/365)E[n-1]

But how to handle that recursive bidness. Well if we start to write it out…

E[1] = 1
E[2] = 1 + (364/365)*1
E[3] = 1 + (364/365)(1 + (364+365)) = 1+ (364/365) + (364/365)^2
E[4] = 1 + (364/365)(1 + (364/365) + (364/365)^2) = 1 + (364/365) + (364/365)^2 + (364/365)^3

so, hey, i see what’s going on there,

E[n] = (364/365)^0 + (364/365)^1 + (364/365)^2 + … + (364/365)^(n-1)

i.e. E[n] = Sum[i=0 to n-1] (364/365)^i

Okay, then there’s this handy trick that Danny always uses to turn a geometric series into a closed form…

2 Likes