Looking at Harvard's STAT 110 book

Sven Halvorson 2019-09-07

I have seen a lot of people write about this book before, so I thought I would take a little look. I always loved counting problems in school, so doing a few of these problems was fun. It was also a way to practice some LaTeX.

Definitions:

  • Sample space: The sample space $S$ of an experiment is the set of all possible outcomes
  • Event: An event $A$ is a subset of $S$. When we say that $A$ occurred we mean that the outcome of the experiment was in $A$.
  • Naive probability: The probability of an event $A$ is the number of elements in $A$ divided by the number of elements in $S$.
  • Symmetry: Naive probability depends on the idea that all outcomes are equally likely.
  • Simple random sample: A subset of the sample space with symmetric probability. This is often desired but rarely achieved. Sometimes this is assumed as a null model in order to test the idea that all the outcomes are equally likely.
  • Binomial coefficients: For nonnegative integers $n$ and $k$ with $k \leq n$, the number of size-$k$ subsets of a set with $n$ elements is $n\choose k$; for $k > n$ it is 0.
  • Probability space: A probability space must have these elements:
    • A set called the sample space $S$
    • A probability function $P$ that has these properties:
      • The domain is all subsets of $S$
      • The codomain is real numbers in [0,1]
      • $P(\emptyset) = 0$ and $P(S) = 1$
      • Given any finite or countably infinite collection of disjoint subsets of $S$, $A_{1},A_{2},...$ we have $P(\bigcup\limits_{j=1}^{\infty} A_{j}) = \sum_{j=1}^\infty P(A_{j})$
  • Frequentist perspective: When we say that an event $A$ has a probability of 0.5, we mean that over a very large number of repetitions of the experiment, the proportion of experiments resulting in $A$ will approach 50%
  • Bayesian perspective: Probabilities represent beliefs about the likelihood of an event happening. Under this perspective, probabilities can be assigned to experiments that only happen once or a few times as well as experiments that have already been run but the outcome is unknown.

Theorems:

  • Multiplication rule: If an experiment is composed of two subexperiments, $A$ & $B$, where $A$ has $a$ possibilities and $B$ has $b$ possibilities, then the total number of possibilities is $ab$. This generalizes to outcomes within an experiment and any number of subexperiments.
  • Sampling with replacement: When sampling $k$ outcomes from $n$ distinct outcomes and replacing after each selection, the total number of possibilities is $n^k$
  • Sampling without replacement: When sampling $k$ outcomes from $n$ distinct outcomes and not replacing after each selection, the total number of possibilities is $n(n-1)(n-2)...(n-k+1)$
  • Permutation: A sample without replacement that uses the whole sample space (i.e. $k = n$) is called a permutation; there are $n!$ of them.
  • Binomial coefficients formula: For natural numbers $n \geq k$, $n\choose k$ $= \frac{n(n-1)...(n-k+1)}{k!} = \frac{n!}{(n-k)!k!}$
  • Bose-Einstein: The number of ways to select $k$ indistinguishable objects from a set of $n$ types is $n+k-1 \choose k$. This is demonstrated with the 'stars and bars' argument of distributing $k$ objects among $n$ containers.
  • Binomial complement: Assuming $k \leq n$, $n \choose k$$=$$n \choose n-k$. This is like saying that there are as many ways to choose the $k$ elements selected as there are the $n-k$ elements not selected.
  • Team and captain: If $k \leq n$ then $n{n-1 \choose k-1} = k{n \choose k}$. This is selecting a team of size $k$ from $n$ prospects and then choosing a captain. You can choose the captain then the rest of the team, or choose the team and then choose the captain from that team.
  • Vandermonde's Identity: For non-negative $m, n, k$, $m+n\choose k$$=\sum_{j=0}^k$$n\choose j$$m \choose k-j$. To me this is saying that if we pool two disjoint groups $(m$ and $n)$ and then select $k$ members from that pool, we also could have selected them from each of the groups independently so that the sum stays at $k$.
  • Partnerships: Given $n\geq 0$, $\frac{(2n)!}{2^{n}n!} = (2n-1)(2n-3)...3\cdot1$. This describes the number of ways to partner up the members of a set of size $2n$. On the LHS we can order the members arbitrarily and pair up 1&2, 3&4, ... This overcounts in two ways. First, each pairing could be internally reversed, which gives the $2^n$. Second, the pairs themselves could be reordered in $n!$ ways. The RHS says we have $2n-1$ potential partners for the first member; once selected, the next unpaired member has $2n-3$ potential partners, and so forth.
  • Basic properties of probability
    • $P(A^{c}) = 1-P(A)$
    • If $A\subseteq B$ then $P(A)\leq P(B)$
  • Inclusion-Exclusion property: The latex will be too annoying for this one. It basically says that if we have any number of subsets of $S$, we can compute the probability of their union as the sum of their individual probabilities, minus their pairwise intersections, plus their three way intersections, minus their four way intersections...
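
Several of the counting identities above are easy to spot-check numerically with Python's `math.comb` — a quick sanity check for a few small values, not a proof:

```python
from math import comb, factorial

n, k = 10, 4

# Binomial complement: C(n, k) == C(n, n-k)
assert comb(n, k) == comb(n, n - k)

# Team and captain: n * C(n-1, k-1) == k * C(n, k)
assert n * comb(n - 1, k - 1) == k * comb(n, k)

# Vandermonde: C(m+n, k) == sum_j C(n, j) * C(m, k-j)
m = 7
assert comb(m + n, k) == sum(comb(n, j) * comb(m, k - j) for j in range(k + 1))

# Partnerships: (2n)! / (2^n n!) == (2n-1)(2n-3)...3*1
lhs = factorial(2 * n) // (2 ** n * factorial(n))
rhs = 1
for odd in range(1, 2 * n, 2):
    rhs *= odd
assert lhs == rhs
print("identities hold for n=10, k=4, m=7")
```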

Other Notes:

  • I find it very comforting that Leibniz believed that rolls of 11 and 12 were equally likely with two dice. I was caught out by one of the examples in this chapter, and it's good to remember that humans are not so great with probability.

Problems

8a. How many ways are there to split a dozen people into a team of 2 and two teams of 5?

Here we can use binomial coefficients to select the teams but because we aren't distinguishing the two teams of 5, we double count them by selecting (1,2,3,4,5)(6,7,8,9,10) as different than (6,7,8,9,10)(1,2,3,4,5): $$\frac{{12\choose2}{10\choose5}}{2} = 8316$$

8b. How many ways are there to split up a dozen people into three teams of 4?

This is basically the same but we will overcount by $3!$ instead of 2: $$\frac{{12\choose4}{8\choose4}}{3!} = 5775$$
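
Both counts are quick to verify with `math.comb` (a spot check, not part of the derivation):

```python
from math import comb, factorial

# 8a: one team of 2 and two unlabeled teams of 5
teams_8a = comb(12, 2) * comb(10, 5) // 2
print(teams_8a)  # 8316

# 8b: three unlabeled teams of 4
teams_8b = comb(12, 4) * comb(8, 4) // factorial(3)
print(teams_8b)  # 5775
```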

15. Give a story proof that

$$\sum_{k=0}^{n}{n\choose k} = 2^n$$

This represents the number of possible subsets for a set of cardinality $n$. On the left, we can go through all of the subsets of zero elements, then the subsets of 1 element, until we get the subsets of $n$ elements. On the right side, we can consider each element and decide if it is in or out of the subset.
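
The story can also be brute-forced for a small $n$ by literally enumerating the subsets (a sanity check; $n=6$ is an arbitrary choice):

```python
from itertools import combinations
from math import comb

n = 6
# Enumerate every subset size and count the subsets directly
total = sum(len(list(combinations(range(n), k))) for k in range(n + 1))
assert total == sum(comb(n, k) for k in range(n + 1)) == 2 ** n
print(total)  # 64
```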

16. Give an algebraic and story proof for this identity (assume $n \geq k$ and $n,k \geq 0$):

$${n\choose k}+{n\choose k-1} = {n+1\choose k}$$

Algebraic: Expanding the LHS with the formula: $$\frac{n!}{(n-k)!k!} + \frac{n!}{(n-k+1)!(k-1)!}$$ Common denominator and add: $$\frac{n!(n-k+1) + k\cdot{n!}}{(n-k+1)!k!}$$ Factor out $n!$: $$\frac{n!(n-k+1 + k)}{(n-k+1)!k!}$$ And it's the RHS: $${n+1\choose k}$$

Story time: The hint suggests that we think of an organization with a president. The RHS of the identity shows us how to select $k$ members from an organization with $n$ workers and a president. The LHS says we can select either $k$ workers and exclude the president or $k-1$ workers and include the president.

I like these story proofs so let's do a few more:

17. Give a story proof that:

$$\sum_{k=0}^n{n\choose k}^2 = {2n\choose n}$$

I find this easier to see if we re-write it as this:

$$\sum_{k=0}^n{n\choose k}{n\choose n-k} = {2n\choose n}$$

This describes fusing two groups of size $n$, and then choosing $n$ members from the total. The RHS directly describes this. The LHS says we can run through all the possibilities: $n$ from group A and 0 from group B, $n-1$ from group A and 1 from group B, and so forth.

18. Give a story proof that:

$$\sum_{k=0}^n{k}{n\choose k}^2 = n{2n-1\choose n-1}$$

Again re-phrased: $$\sum_{k=0}^n{k}{n\choose k}{n\choose n-k} = n{2n-1\choose n-1}$$

The hint suggests that we think of this as combining two groups ($A$ & $B$) of size $n$ and then selecting a group of size $n$, but this time we are also selecting a chair that comes from group $A$. On the RHS we select the chair from the $n$ members of group $A$, then we select the remaining $n-1$ members from the leftover $2n-1$ choices. On the LHS we can again run through the possibilities of selecting $k$ members from $A$ and $n-k$ members from $B$, each time having $k$ choices for the chair coming from group $A$.

19. Give a story proof that for an integer $n\geq 2$:
$$\sum_{k=2}^{n}{{k\choose2}{n-k+2\choose2}}={{n+3}\choose5}$$

So we're given the hint here to think of the middle number of a size 5 subset of $\{1, 2, 3..., n+3\}$. The $n+3$ part looks a little odd but it's just there so that when $n\geq 2$, it will be possible to take a subset of size 5.

I think the way to consider this is that the RHS is telling us to take a subset of size 5. The LHS is going through the possibilities for what the 'median' of the subset is, where $k$ is the number of values below that median. Once that median is selected, we need to pick two values below it, which is the $k\choose 2$ portion, and two values above it, which is the $n-k+2\choose 2$ portion. This is also why the sum only goes up to $n$, not $n+2$: we're picking where the median is, and the median cannot be in the $n+2$ or $n+3$ position.
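
A numeric spot check of the identity over a range of $n$ (again, a check rather than a proof):

```python
from math import comb

for n in range(2, 12):
    lhs = sum(comb(k, 2) * comb(n - k + 2, 2) for k in range(2, n + 1))
    assert lhs == comb(n + 3, 5)
print("identity holds for n = 2..11")
```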

20. Give a story proof that $${k\choose k}+{k+1\choose k}+{k+2\choose k}+ ... + {n\choose k} = {n+1\choose{k+1}}$$

Apparently this is called the 'hockey stick identity.' Story: to choose a subset of size $k+1$ from $\{1, 2,..., n+1\}$, condition on the largest element. If the largest element is $j+1$, the other $k$ elements must come from $\{1,...,j\}$, which can be done in $j\choose k$ ways. Summing over $j = k,...,n$ gives the LHS.

21.a This problem involves a symbol that I can't figure out how to write within the jupyter notebook. So for this I'll use $@ 5\choose3$ to represent the number of partitions of a set of 5 elements into 3 non-empty subsets. We're asked to prove (I assume verbally) that: $${@n+1\choose k} = {@n\choose k-1} + k{@n\choose k}$$

The hint is to think about whether a single element is in a solo group or not. On the left we see the number of partitions of $n+1$ elements into $k$ non-empty subsets. The first term on the RHS is saying that if the $(n+1)^{th}$ element is in a group by itself, then we have $n$ remaining elements to partition into the other $k-1$ non-empty subsets. The second term is saying that if the $(n+1)^{th}$ element is not by itself, we can partition the rest of the elements into $k$ non-empty subsets however we like and then have $k$ choices of subset for the last element.

21.b Prove: $$\sum_{j = k}^n{n\choose j}{@j\choose k} = {@n+1\choose k+1}$$

The hint given is that we are to consider how many people are not in the group of an identified element. On the RHS we see the number of partitions of $n+1$ elements into $k+1$ non-empty subsets. On the left hand side, we're running through possibilities where we have $j$ elements not in the identified element's group. The sum starts at $k$ because we cannot make $k+1$ non-empty subsets with fewer than $k$ elements outside the identified element's subset. We can select those elements in $n\choose j$ ways. Then we have to partition those $j$ elements into $k$ non-empty subsets, which can be done in $@j\choose k$ ways.
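
These partition counts are the Stirling numbers of the second kind, and the recurrence from 21.a makes them easy to compute; we can then spot-check the identity from 21.b (a sketch — `stirling2` is my own helper name):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Partitions of an n-element set into k non-empty subsets."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    # The recurrence from 21.a: the last element is either solo, or it
    # joins one of the k subsets of a partition of the other n-1 elements.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

# Check the identity from 21.b for small n and k:
for n in range(1, 10):
    for k in range(1, n + 1):
        lhs = sum(comb(n, j) * stirling2(j, k) for j in range(k, n + 1))
        assert lhs == stirling2(n + 1, k + 1)
print("21.b checks out for n = 1..9")
```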

24. A family has three boys and three girls. What is the probability that the three oldest are all girls?

The numerator can only occur in one way so we just need to compute the number of orderings for the children. If all were unique, we could do this in $6!$ ways, but we would overcount by the $3!\cdot 3!$ scramblings of the children within the genders. Thus the total should be $$\frac{1}{\frac{6!}{3!\cdot 3!}}=0.05$$

Alternatively, we could simply choose three of the 'positions' to be girls giving us: $$\frac{1}{6\choose 3} = 0.05$$
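
Both routes give $1/20$, and a quick Monte Carlo check agrees (the seed and trial count are arbitrary choices):

```python
from math import comb
import random

exact = 1 / comb(6, 3)
print(exact)  # 0.05

random.seed(0)
kids = list("GGGBBB")  # three girls, three boys
trials = 200_000
# Shuffle the birth order; the first three positions are the oldest
hits = sum(random.sample(kids, 6)[:3] == ["G"] * 3 for _ in range(trials))
print(hits / trials)  # close to 0.05
```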

25. A city with 6 districts has 6 robberies each week. Assuming that the robberies are randomly distributed, what is the probability that a district has more than one robbery?

This is solvable with the complement and the stars + bars idea. A single robbery in each district can only happen in one way. To count the number of arrangements we require 5 bars and 6 stars, which gives us:

$$1-\frac{1}{6+5\choose 5} = 0.998$$

The solution that is given by the authors uses $\frac{6!}{6^6}$ for the probability that each district has exactly one robbery. This was a bit strange to me at first. From what I can gather, the Bose-Einstein formula comes up when we're sampling with replacement and the order doesn't matter. I think my problem is that the BE counting technique involves indistinguishable particles in distinguishable bins, and those arrangements are not equally likely. Even though we don't care about the nature of the robberies, they are distinguishable by something like a timestamp or perpetrator, so the equally likely outcomes are the $6^6$ assignments of robberies to districts.
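
A simulation that treats the robberies as distinguishable (each independently landing in a uniformly random district) sides with the book's count rather than the Bose-Einstein one (seed and trial count are arbitrary):

```python
import random
from math import factorial

exact = 1 - factorial(6) / 6 ** 6
print(round(exact, 4))  # 0.9846

random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    districts = [random.randrange(6) for _ in range(6)]
    if len(set(districts)) < 6:  # some district got 2+ robberies
        hits += 1
print(hits / trials)  # close to 0.9846, not 0.998
```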

28. A college has 10 time slots and independently assigns 10 courses to those time periods. There are three statistics courses. What is the probability that at least two of the statistics courses are at the same time?

Again, we should probably do this with the complement, which is the number of ways that we can select three different time slots for the three courses. This is ${10\choose3}\cdot 3!$ because the orderings are different schedules. The denominator should be the number of ways to assign three courses to 10 slots, which is $10^3$ since we have 10 choices for each course as to which slot it occupies. So the probability is $$1 - \frac{10\cdot9\cdot8}{10^3} = 0.28$$

31. We sample a population of $N$ elk and tag $n$ of them. Then we resample $m$ elk and want to know what the probability of drawing exactly $k$ tagged elk is in the second sample.

The number of ways we can select $k$ tagged elk for the second sample is $n\choose k$. We then have to choose the remaining, untagged elk, so the total should be: $$\frac{{n\choose k}{N-n \choose m-k}}{N \choose m}$$

After looking at the solution, it's good to note that there are some restrictions on this: we need $k\leq n$, $k\leq m$, and $m-k\leq N-n$

33.a We have a jar with $r$ red balls and $g$ green balls and draw two balls in sequence. Explain why the probability of the second ball being green is the same as the probability of the first.

By symmetry, the second ball drawn is equally likely to be any of the $g+r$ balls. Because we don't know what the color of the first ball will be, we have no information to alter our judgement about the second one being green.

33.b Compute the previously mentioned probabilities.

$$P(first green) = \frac{g}{g+r}$$$$P(second green) = \frac{g}{g+r}\frac{g-1}{g+r-1} + \frac{r}{g+r}\frac{g}{g+r-1} = \frac{g(g+r-1)}{(g+r)(g+r-1)} = \frac{g}{g+r}$$

This is basically going through the possibilities for the first draw.

33.c Suppose that there are 16 total balls and that the probability of the first and second being the same color is the same as the probability of the first and second being different. What are $r$ and $g$?

We have these two equations: $$r+g=16$$

$$\frac{g}{g+r}\frac{g-1}{g+r-1}+\frac{r}{g+r}\frac{r-1}{g+r-1} = 2\frac{r}{g+r}\frac{g}{g+r-1}$$

In this second equation, the LHS is the probability that the two balls are the same color and the RHS is the probability that they are different colors.
Substituting and removing the denominators gives us:

$$(16-r)(16-r-1) + r(r-1) = 2r(16-r)$$$$(16-r)(15-3r) + r(r-1) = 0$$$$(240-63r+3r^2) + (r^2-r) = 0$$$$4(60-16r+r^2)= 0$$$$4(r-10)(r-6)= 0$$

So $r=10, g=6$ or $r=6, g=10$
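
We can verify that both solutions satisfy the equation exactly using `fractions` (a spot check; the helper names are my own):

```python
from fractions import Fraction

def p_same(r, g):
    """P(both balls the same color), drawing two without replacement."""
    n = r + g
    return (Fraction(g, n) * Fraction(g - 1, n - 1)
            + Fraction(r, n) * Fraction(r - 1, n - 1))

def p_diff(r, g):
    """P(the two balls differ in color)."""
    n = r + g
    return 2 * Fraction(r, n) * Fraction(g, n - 1)

for r, g in [(10, 6), (6, 10)]:
    assert p_same(r, g) == p_diff(r, g) == Fraction(1, 2)
print("both (r, g) pairs check out")
```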

34a. Compute the probability of a non-royal flush:

We can do this by selecting the suit, then selecting the 5 cards, then remove the royal flush: $$\frac{{4\choose1}({13\choose5}-1)}{52\choose5}\approx0.002$$

34b. Compute the probability of a two-pair: Select the two ranks, then the suits within them, and then the remaining card: $$\frac{{13\choose2}{4\choose2}^2{44\choose1}}{52\choose5}\approx0.048$$

42. A norepeatword is one with no repeated letters. Show that the probability that a randomly selected norepeatword uses all 26 letters is very close to $\frac{1}{e}$.

To create a norepeatword, we select some number of distinct letters and scramble them in any order. The sample space has this many elements: $$\sum_{i=1}^{26}i!{26\choose i} = 26!\sum_{i=1}^{26}\frac{1}{(26-i)!}= 26!\sum_{i=0}^{25}\frac{1}{i!}\approx26!\cdot e$$

There are $26!$ norepeatwords that involve all 26 letters, so the probability is about $\frac{1}{e}$. It's kinda wild that more than $\frac{1}{3}$ of the total involve all 26 letters.
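
The approximation is easy to check numerically, since Python's integers handle the big factorials exactly:

```python
from math import comb, factorial, e

# Total norepeatwords: pick i distinct letters, then order them
total = sum(factorial(i) * comb(26, i) for i in range(1, 27))
p = factorial(26) / total
print(p, 1 / e)  # both are about 0.3679
```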

43. Show that this inequality holds and describe when it is actually equality:

$$P(A)+P(B)-1\leq P(A\cap B)\leq P(A\cup B) \leq P(A) + P(B)$$

Maybe I'll just make a brief argument for each from left to right

  1. Since all probabilities are bounded by [0,1], we can say that $$P(A\cup B) - 1 \leq 0$$
    Adding $P(A\cap B)$ to both sides and using $P(A\cup B) + P(A\cap B) = P(A) + P(B)$ gives us $$P(A) + P(B) -1 \leq P(A\cap B)$$
    Equality holds exactly when $P(A\cup B) = 1$, e.g. when $A\cup B = S$
  2. The intersection is a subset of the union so its probability can be no larger. Equality occurs when $A$ and $B$ are identical (more precisely, when $P(A\triangle B) = 0$).
  3. $P(A)+P(B) = P(A\cup B) + P(A\cap B)$ so $P(A) + P(B) \geq P(A\cup B)$. Equality holds when $P(A\cap B) = 0$, e.g. when the intersection is empty.

44. Show that $A\subseteq B\implies P(B-A) = P(B)-P(A)$

We can start with: $$P(B-A) = P(B\cap A^c)$$
Complementing twice and using DeMorgan's law gives: $$P((B^c \cup A)^c) = 1-P(B^c \cup A)$$ Because $A$ is a subset of $B$, $A$ and $B^c$ are disjoint. The last axiom of probability gives: $$1-P(B^c)-P(A) = P(B)-P(A)$$

45. Show that the probability of a symmetric difference can be computed like this:
$$P(A\triangle B) = P(A)+P(B)-2P(A\cap B)$$

The symmetric difference could be defined like this: $$P(A\triangle B) = P((A\cup B)\cap(A^c\cup B^c))$$
Use that same trick of complements to make this a union on its outer layer: $$=1- P((A\cup B)^c\cup(A^c\cup B^c)^c)$$
$$=1- P((A\cup B)^c\cup(A\cap B))$$ These are disjoint so:
$$=1- P((A\cup B)^c) - P(A\cap B)$$
$$=P((A\cup B)) - P(A\cap B)$$
$$=P(A)+P(B) -P(A\cap B) - P(A\cap B)$$
$$=P(A)+P(B) -2P(A\cap B)$$

46. Let $A_1, A_2, ...,A_n$ be events. Let $B_k$ be the event that exactly $k$ of the $A_i$ occur and $C_k$ be the event that at least $k$ of the events occur. Find a simple expression for $B_k$ in terms of $C_k$ and $C_{k+1}$

$$B_k = C_k - C_{k+1}$$


That took longer to type than to solve...

49. A fair die is rolled $n$ times. What is the probability that one or more values never appear?

So the heading suggests we should use the inclusion-exclusion principle, and this is because it's possible for more than one face to never occur. So we'll do that back and forth thing: we select some faces to never occur, then compute the number of possible rolls. $$P(A) = \frac{1}{6^n}\sum_{i=1}^5(-1)^{i+1}{6\choose i}(6-i)^n$$
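
For small $n$ the formula can be checked against brute-force enumeration of all $6^n$ outcomes (a sanity check; `p_missing_face` is my own name):

```python
from math import comb
from itertools import product

def p_missing_face(n):
    """Inclusion-exclusion: P(at least one face never appears in n rolls)."""
    return sum((-1) ** (i + 1) * comb(6, i) * (6 - i) ** n
               for i in range(1, 6)) / 6 ** n

for n in range(1, 7):
    brute = sum(1 for rolls in product(range(6), repeat=n)
                if len(set(rolls)) < 6) / 6 ** n
    assert abs(brute - p_missing_face(n)) < 1e-12
print("formula matches brute force for n = 1..6")
```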

50. What is the probability of being dealt a 'void' (a missing suit) in a 13-card hand?

This is basically the same as the previous question but we're sampling without replacement: $$P(void) = \frac{1}{52\choose13}\sum_{i=1}^3(-1)^{i+1}{52-13i\choose 13}{4\choose i}\approx0.051$$
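
Evaluating the sum with `math.comb` (a spot check):

```python
from math import comb

p_void = sum((-1) ** (i + 1) * comb(4, i) * comb(52 - 13 * i, 13)
             for i in range(1, 4)) / comb(52, 13)
print(round(p_void, 3))  # 0.051
```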

51. For a group of 7 people, find the probability that all four seasons occur among their birthdays.

Let's let $R$, $U$, $F$, and $W$ represent the event in which at least one person's birthday is in spring, summer, fall, or winter, respectively. We want to compute:

$$P(R\cap U\cap F\cap W)$$

Using the complement we get:

$$P(R\cap U\cap F\cap W) = 1-P(R^c\cup U^c\cup F^c\cup W^c)$$

And this sets us up to use inclusion-exclusion, where we add the probabilities of each single season missing, subtract the pairwise terms, and add back the triple term (all four seasons cannot be missing at once). This probability should be:
$$P(R^c\cup U^c\cup F^c\cup W^c) = \frac{1}{4^7}\sum_{i=1}^3(-1)^{i+1}{4\choose i}(4-i)^7 \approx 0.487$$
$$P(R\cap U\cap F\cap W) \approx 1 - 0.487 \approx 0.513$$
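
Evaluating the inclusion-exclusion sum numerically (a spot check):

```python
from math import comb

p_missing = sum((-1) ** (i + 1) * comb(4, i) * (4 - i) ** 7
                for i in range(1, 4)) / 4 ** 7
print(round(1 - p_missing, 3))  # 0.513
```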

54. Alice selects 7 classes from a total of 30 non-overlapping courses. There are 6 courses on each weekday. What is the probability she has a class each day?

This feels a lot like the previous one. If $M, U, W, R, F$ are the events that she has a class on each of the respective weekdays, we want to compute:

$$P(M\cap U\cap W\cap R\cap F) = 1-P(M^c\cup U^c\cup W^c\cup R^c\cup F^c)$$

Using the inclusion/exclusion principle, we can compute the probability that at least one is absent, minus two absent, plus three...

$$P(M^c\cup U^c\cup W^c\cup R^c\cup F^c) = \frac{1}{30\choose 7}\sum_{i = 1}^3(-1)^{i+1}{5\choose i}{30-6i\choose 7} \approx 0.698$$


$$P(M\cap U\cap W\cap R\cap F) \approx 1-0.698 \approx 0.302$$

Note that the sum only goes to three: the $i=4$ and $i=5$ terms vanish because there is no way to choose 7 courses from the 6 or 0 that would remain.
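
Spot-checking the numbers with `math.comb`:

```python
from math import comb

p_missing_day = sum((-1) ** (i + 1) * comb(5, i) * comb(30 - 6 * i, 7)
                    for i in range(1, 4)) / comb(30, 7)
print(round(1 - p_missing_day, 3))  # 0.302
```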

58. An inspector has 12 widgets and among them are 3 defective ones. If they test them one by one, what is the probability that 9 or more tests are required to identify the three defective units?

I wasn't quite sure of my reasoning on this one at first, so I checked the formula against a simulation (below); the two agree.

So for this problem, we're looking for arrangements of the 12 widgets for which at least one defective one is in the final 4 tested. I think this looks like an inclusion-exclusion question because we have three possibly overlapping events in which each of the three defective widgets lies in the back end of the testing lineup. Define $D_1, D_2, D_3$ to be the events that the first, second, or third defective widget is in the final four widgets tested. Then we want:
$$P(D_1\cup D_2\cup D_3) = \sum_i P(D_i) - \sum_{i<j} P(D_i\cap D_j) + P(D_1\cap D_2\cap D_3)$$
$$P(D_1\cup D_2\cup D_3) = 1-P(D_1^c\cap D_2^c\cap D_3^c)$$

We can write that probability as this sum:

$$P(D_1\cup D_2\cup D_3) = \frac{1}{12!}\sum_{i=1}^{3}(-1)^{i+1}{3\choose i}{12-i\choose 4-i}8!\cdot4!$$

This is saying that we alternate for the inclusion-exclusion principle, then we choose 1, 2, or 3 of the defective units to be in the final four, then choose the other members to be in the last four, and finally scramble each group.

This comes out to:

$$P(D_1\cup D_2\cup D_3)\approx0.745$$

There isn't a solution for this one so let's see if we can simulate it with python:

In [27]:
import random
# Make the list of functioning and defective widgets:
widgets = [0 for i in range(9)] + [1 for i in range(3)]
print(widgets)

# Store the total number of hits, shuffle the list
total = 0
for i in range(100000):
    random.shuffle(widgets)
    # By symmetry, the first defective landing in the first four
    # positions has the same probability as a defective being
    # among the last four tested:
    if widgets.index(1) <= 3:
        total += 1
total/100000
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
Out[27]:
0.74395

59. Given 15 chocolate bars and 10 children, how many ways can we distribute them if...

a. The chocolate bars are fungible
This is basically the stars and bars problem. We have 15 stars and 9 bars, so:
$${15+10-1\choose 15} = {24\choose 15} = 1,307,504$$

b. The chocolate bars are fungible and each child must receive at least one: This is the same thing but we reduce the number of stars by the number of children (give each one to start): $${5+10-1\choose 5} = {14\choose 5} = 2,002$$

c. The bars are not fungible
Now we simply have 10 choices for each of the 15 candies: $$10^{15}$$

d. The bars are not fungible and the children must each receive one. For this we'll need inclusion-exclusion. We can set $C_i$ to be the event where the $i^{th}$ child receives no candy. Then the number of distributions missing at least one child is $$\left|\bigcup\limits_{i=1}^{10}C_i\right| = \sum_{i=1}^9(-1)^{i+1}{10\choose i}(10-i)^{15}$$
So the number of ways to distribute at least one to each child is: $$10^{15}-\sum_{i=1}^9(-1)^{i+1}{10\choose i}(10-i)^{15}\approx 4.60\cdot10^{13}$$
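
All four counts in one place with `math.comb` (a spot check; stars and bars gives $\binom{24}{15}$ for part a and $\binom{14}{5}$ for part b):

```python
from math import comb

print(comb(15 + 10 - 1, 15))  # (a) 1307504
print(comb(5 + 10 - 1, 5))    # (b) 2002
print(10 ** 15)               # (c)

# (d) subtract the distributions that miss at least one child
missing = sum((-1) ** (i + 1) * comb(10, i) * (10 - i) ** 15
              for i in range(1, 10))
d = 10 ** 15 - missing
print(d)                      # about 4.60e13
```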

61. There are 100 passengers lined up to board a plane with that many seats. The first passenger takes a seat at random. Each subsequent passenger takes their assigned seat if available, otherwise takes a random seat. What's the probability that the last passenger takes their assigned seat?

So I'll admit that I had to look at the solution for this as I started drawing some trees and trying to find a formula to figure out the probability of the $j^{th}$ passenger getting their own seat.

The argument is as follows: The only seats that the last passenger could take are the one assigned to the crazy person and their own. The reason for this is that if we consider the 24th seat, it will either be occupied by the 24th passenger, or by a person who randomly selected it before the 24th person got on board. The same is true for seats 2-99. Because none of the previous passengers are more likely to take the first seat than the 100th, the overall probability is $\frac{1}{2}$.
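
The $\frac{1}{2}$ answer is easy to believe after a simulation (a sketch; `last_gets_own_seat` is my own helper, and the seed and trial count are arbitrary):

```python
import random

def last_gets_own_seat(n=100):
    free = set(range(n))
    free.remove(random.randrange(n))  # the crazy first passenger
    for p in range(1, n - 1):
        if p in free:
            free.remove(p)            # take own seat if available
        else:
            free.remove(random.choice(sorted(free)))  # else pick at random
    return (n - 1) in free            # is the last passenger's seat open?

random.seed(0)
trials = 20_000
print(sum(last_gets_own_seat() for _ in range(trials)) / trials)  # near 0.5
```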

62. Here's a variation on the birthday problem where we do not assign equal likelihood to each day. Let $p = (p_1, p_2,...,p_{365})$ be the vector of probabilities for a person to be born on each day of the year. We define the $k^{th}$ elementary symmetric polynomial in the variables $x_1,...,x_n$ as:
$$e_k(x_1,...,x_n)=\sum_{1\leq j_1<j_2<...<j_k\leq n}x_{j_1}...x_{j_k}$$

This is basically the sum of all products of $k$ distinct variables from the list.

a. Find a simple expression for the probability that there is at least one birthday match in terms of p and the elementary symmetric polynomial.

Here we can just take the complement like we did before:

$$P(repeated\;birthday) = 1-P(no\;repeats) = 1-k!\cdot e_k(p_1, p_2,...,p_{365})$$

This is like saying: take every combination of $k$ distinct days in the year (where $k$ is the number of people), multiply their probabilities together, and then allow for the $k!$ possible assignments of the people to those birthdays.
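
We can sanity-check the $1-k!\,e_k(p)$ formula against a simulation on a tiny "year" (everything here — the 5-day year, the weights, the helper name `e_k` — is a made-up example):

```python
from itertools import combinations
from math import prod, factorial
import random

def e_k(xs, k):
    """k-th elementary symmetric polynomial of the list xs."""
    return sum(prod(c) for c in combinations(xs, k))

p = [0.4, 0.3, 0.15, 0.1, 0.05]  # a hypothetical 5-day year
k = 3                             # three people
exact = 1 - factorial(k) * e_k(p, k)

random.seed(0)
trials = 200_000
# A 'match' means the k sampled days are not all distinct
hits = sum(
    len(set(random.choices(range(len(p)), weights=p, k=k))) < k
    for _ in range(trials)
)
print(exact, hits / trials)  # the two should be close
```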

b. Explain intuitively why it makes sense that the probability of at least one match is minimized when the probabilities are symmetric.

Two things come to mind. One is that the total probability is fixed at 1, so we are distributing that 1 across 365 possibilities. When we multiply out the probabilities for the terms of $e_k(p_1, p_2,...,p_{365})$, it's kind of like those easy optimization problems where you're trying to fence in your goats with a fixed amount of fencing. The enclosed area is biggest when you make a square, and likewise the product of the factors is largest when they're all equal.

Another way to consider this would be the case where there is a 99% chance of being born January 1st, a 1% chance on January 2nd, and a 0% chance otherwise. It's not hard to see that repeats will be very likely in this case. If we stepped this back a bit, to something like 5 days with a 20% chance each, it's easier to visualize the larger number of combinations that yield no repeats with this configuration.

c. Here is the arithmetic mean and geometric mean inequality. For $x,y\geq 0$:

$$\frac{x+y}{2}\geq \sqrt{xy}$$

Which basically says the arithmetic mean is always at least as large as the geometric mean.

Define r = $(r_1, r_2,...,r_{365})$ as $r_1=r_2=\frac{p_1+p_2}{2}$ and $r_j = p_j$ for $3\leq j \leq365$

We're asked to verify this:

$$e_k(x_1,...,x_n) = x_1x_2e_{k-2}(x_3,...,x_n)+(x_1+x_2)e_{k-1}(x_3,...,x_n)+e_k(x_3,...,x_n)$$

We can see this as three parts:

  1. All the polynomials involving both $x_1$ and $x_2$
  2. All the polynomials involving one of $x_1$ or $x_2$ but not both
  3. All the polynomials involving neither $x_1$ nor $x_2$

Now we're supposed to use this to prove that:
$$P(at\;least\;one\;birthday\;match|p) \geq P(at\;least\;one\;birthday\;match|r)$$

Since the match probability is $1 - k!\,e_k$, this inequality is equivalent to $e_k(p)\leq e_k(r)$. So we can rephrase each side in terms of the three pieces from the previous fact:

$$e_k(p_1,...,p_{365}) = p_1p_2e_{k-2}(p_3,...,p_{365})+(p_1+p_2)e_{k-1}(p_3,...,p_{365})+e_k(p_3,...,p_{365})$$


$$e_k(r_1,...,r_{365}) = \left(\frac{p_1+p_2}{2}\right)^2e_{k-2}(p_3,...,p_{365})+\left(\frac{p_1+p_2}{2}+\frac{p_1+p_2}{2}\right)e_{k-1}(p_3,...,p_{365})+e_k(p_3,...,p_{365})$$

Each expression has the second and third term in common, and each of the first terms has a factor of $e_{k-2}(p_3,...,p_{365})$, so we're really just comparing: $$p_1p_2\;vs.\;\left(\frac{p_1+p_2}{2}\right)^2$$
Square rooting this gives us the pieces of the inequality: $$\sqrt{p_1p_2}\;vs.\;\frac{p_1+p_2}{2}$$

So this is saying that if you do anything but evenly distribute the probability between a pair of $p_i$'s, you'll get a higher probability of at least one match. I don't really want to get into it, but I think some sort of inductive argument could extend this to show the uniform distribution minimizes the match probability.