Peter Norvig, 12 Feb 2016

A Concrete Introduction to Probability (using Python)¶

This notebook covers the basics of probability theory, with Python 3 implementations. (You should have some background inprobability andPython.)

In 1814, Pierre-Simon Laplacewrote:

Probability ... is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible ... when nothing leads us to expect that any one of these cases should occur more than any other.

Laplace

Pierre-Simon Laplace
1814

Laplace really nailed it, way back then! If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We'll start being methodical by defining some vocabulary:

Experiment:An occurrence with an uncertain outcome that we can observe.
For example, rolling a die.
Outcome:The result of an experiment; one particular state of the world. What Laplace calls a "case."
For example:4.
Sample Space:The set of all possible outcomes for the experiment.
For example,{1, 2, 3, 4, 5, 6}.
Event:A subset of possible outcomes that together have some property we are interested in.
For example, the event "even die roll" is the set of outcomes{2, 4, 6}.
Probability:As Laplace said, the probability of an event with respect to a sample space is the number of favorable cases (outcomes from the sample space that are in the event) divided by the total number of cases in the sample space. (This assumes that all outcomes in the sample space are equally likely.) Since it is a ratio, probability will always be a number between 0 (representing an impossible event) and 1 (representing a certain event).
For example, the probability of an even die roll is 3/6 = 1/2.

This notebook will develop all these concepts; I also have asecond part that covers paradoxes in Probability Theory.

Code for`P`¶

P is the traditional name for the Probability function:

In [1]:

fromfractionsimportFractiondefP(event,space):"The probability of an event, given a sample space of equiprobable outcomes."returnFraction(len(event&space),len(space))

Read this as implementing Laplace's quote directly:"Probability is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible."

Warm-up Problem: Die Roll¶

What's the probability of rolling an even number with a single six-sided fair die?

We can define the sample spaceD and the eventeven, and compute the probability:

In [2]:

D={1,2,3,4,5,6}even={2,4,6}P(even,D)

Out[2]:

Fraction(1, 2)

It is good to confirm what we already knew.

You may ask: Why does the definition ofP uselen(event & space) rather thanlen(event)? Because I don't want to count outcomes that were specified inevent but aren't actually in the sample space. Consider:

In [3]:

even={2,4,6,8,10,12}P(even,D)

Out[3]:

Fraction(1, 2)

Here,len(event) andlen(space) are both 6, so if just divided, thenP would be 1, which is not right.The favorable cases are theintersection of the event and the space, which in Python is(event & space).Also note that I useFraction rather than regular division because I want exact answers like 1/3, not 0.3333333333333333.

Urn Problems¶

Around 1700, Jacob Bernoulli wrote about removing colored balls from an urn in his landmark treatiseArs Conjectandi, and ever since then, explanations of probability have relied onurn problems. (You'd think the urns would be empty by now.)

Jacob Bernoulli

Jacob Bernoulli
1700

For example, here is a three-part problemadapted from mathforum.org:

An urn contains 23 balls: 8 white, 6 blue, and 9 red. We select six balls at random (each possible selection is equally likely). What is the probability of each of these possible outcomes:

all balls are red

3 are blue, 2 are white, and 1 is red
exactly 4 balls are white

So, an outcome is a set of 6 balls, and the sample space is the set of all possible 6 ball combinations. We'll solve each of the 3 parts using ourP function, and also using basic arithmetic; that is,counting. Counting is a bit tricky because:

We have multiple balls of the same color.
An outcome is aset of balls, where order doesn't matter, not asequence, where order matters.

To account for the first issue, I'll have 8 different white balls labelled'W1' through'W8', rather than having eight balls all labelled'W'. That makes it clear that selecting'W1' is different from selecting'W2'.

The second issue is handled automatically by theP function, but if I want to do calculations by hand, I will sometimes first count the number ofpermutations of balls, then get the number ofcombinations by dividing the number of permutations byc!, wherec is the number of balls in a combination. For example, if I want to choose 2 white balls from the 8 available, there are 8 ways to choose a first white ball and 7 ways to choose a second, and therefore 8 × 7 = 56 permutations of two white balls. But there are only 56 / 2 = 28 combinations, because(W1, W2) is the same combination as(W2, W1).

We'll start by defining the contents of the urn:

In [4]:

defcross(A,B):"The set of ways of concatenating one item from collection A with one from B."return{a+bforainAforbinB}urn=cross('W','12345678')|cross('B','123456')|cross('R','123456789')urn

Out[4]:

{'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8'}

In [5]:

len(urn)

Out[5]:

Now we can define the sample space,U6, as the set of all 6-ball combinations. We useitertools.combinations to generate the combinations, and then join each combination into a string:

In [6]:

importitertoolsdefcombos(items,n):"All combinations of n items; each combo as a concatenated str."return{' '.join(combo)forcomboinitertools.combinations(items,n)}U6=combos(urn,6)len(U6)

Out[6]:

I don't want to print all 100,947 members of the sample space; let's just peek at a random sample of them:

In [7]:

importrandomrandom.sample(U6,10)

Out[7]:

['R7 W3 B1 R4 B3 W2', 'R7 R3 B1 W4 R5 W6', 'B5 B4 B6 W1 R3 R2', 'W3 B1 R4 W8 R9 W5', 'W7 B6 W1 R3 R9 B3', 'W3 W1 R8 R2 R1 W5', 'B5 R7 R4 W8 R2 W6', 'W7 W1 R3 W4 R5 B3', 'B4 R7 B2 B1 W6 W2', 'W7 B1 W4 W8 W6 W2']

Is 100,947 really the right number of ways of choosing 6 out of 23 items, or "23 choose 6", as mathematicianscall it? Well, we can choose any of 23 for the first item, any of 22 for the second, and so on down to 18 for the sixth. But we don't care about the ordering of the six items, so we divide the product by 6! (the number of permutations of 6 things) giving us:

$$23 ~\mbox{choose}~ 6 = \frac{23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18}{6!} = 100947$$

Note that $23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18 = 23! \;/\; 17!$, so, generalizing, we can write:

$$n ~\mbox{choose}~ c = \frac{n!}{(n - c)! \cdot c!}$$

And we can translate that to code and verify that 23 choose 6 is 100,947:

In [8]:

frommathimportfactorialdefchoose(n,c):"Number of ways to choose c items from a list of n items."returnfactorial(n)//(factorial(n-c)*factorial(c))

In [9]:

choose(23,6)

Out[9]:

Now we're ready to answer the 4 problems:

Urn Problem 1: what's the probability of selecting 6 red balls?¶

In [10]:

red6={sforsinU6ifs.count('R')==6}P(red6,U6)

Out[10]:

Fraction(4, 4807)

Let's investigate a bit more. How many ways of getting 6 red balls are there?

In [11]:

len(red6)

Out[11]:

Why are there 84 ways? Because there are 9 red balls in the urn, and we are asking how many ways we can choose 6 of them:

In [12]:

choose(9,6)

Out[12]:

So the probabilty of 6 red balls is then just 9 choose 6 divided by the size of the sample space:

In [13]:

P(red6,U6)==Fraction(choose(9,6),len(U6))

Out[13]:

True

Urn Problem 2: what is the probability of 3 blue, 2 white, and 1 red?¶

In [14]:

b3w2r1={sforsinU6ifs.count('B')==3ands.count('W')==2ands.count('R')==1}P(b3w2r1,U6)

Out[14]:

Fraction(240, 4807)

We can get the same answer by counting how many ways we can choose 3 out of 6 blues, 2 out of 8 whites, and 1 out of 9 reds, and dividing by the number of possible selections:

In [15]:

P(b3w2r1,U6)==Fraction(choose(6,3)*choose(8,2)*choose(9,1),len(U6))

Out[15]:

True

Here we don't need to divide by any factorials, becausechoose has already accounted for that.

We can get the same answer by figuring: "there are 6 ways to pick the first blue, 5 ways to pick the second blue, and 4 ways to pick the third; then 8 ways to pick the first white and 7 to pick the second; then 9 ways to pick a red. But the order'B1, B2, B3' should count as the same as'B2, B3, B1' and all the other orderings; so divide by 3! to account for the permutations of blues, by 2! to account for the permutations of whites, and by 100947 to get a probability:

In [16]:

P(b3w2r1,U6)==Fraction((6*5*4)*(8*7)*9,factorial(3)*factorial(2)*len(U6))

Out[16]:

True

Urn Problem 3: What is the probability of exactly 4 white balls?¶

We can interpret this as choosing 4 out of the 8 white balls, and 2 out of the 15 non-white balls. Then we can solve it the same three ways:

In [17]:

w4={sforsinU6ifs.count('W')==4}P(w4,U6)

Out[17]:

Fraction(350, 4807)

In [18]:

P(w4,U6)==Fraction(choose(8,4)*choose(15,2),len(U6))

Out[18]:

True

In [19]:

P(w4,U6)==Fraction((8*7*6*5)*(15*14),factorial(4)*factorial(2)*len(U6))

Out[19]:

True

Revised Version of`P`, with more general events¶

To calculate the probability of an even die roll, I originally said

even = {2, 4, 6}

But that's inelegant—I had to explicitly enumerate all the even numbers from one to six. If I ever wanted to deal with a twelve or twenty-sided die, I would have to go back and changeeven. I would prefer to defineeven once and for all like this:

In [20]:

defeven(n):returnn%2==0

Now in order to makeP(even, D) work, I'll have to modifyP to accept an event as eitheraset of outcomes (as before), or apredicate over outcomes—a function that returns true for an outcome that is in the event:

In [21]:

defP(event,space):"""The probability of an event, given a sample space of equiprobable outcomes.    event can be either a set of outcomes, or a predicate (true for outcomes in the event)."""ifis_predicate(event):event=such_that(event,space)returnFraction(len(event&space),len(space))is_predicate=callabledefsuch_that(predicate,collection):"The subset of elements in the collection for which the predicate is true."return{eforeincollectionifpredicate(e)}

Here we see howsuch_that, the neweven predicate, and the newP work:

In [22]:

such_that(even,D)

Out[22]:

{2, 4, 6}

In [23]:

P(even,D)

Out[23]:

Fraction(1, 2)

In [24]:

D12={1,2,3,4,5,6,7,8,9,10,11,12}such_that(even,D12)

Out[24]:

{2, 4, 6, 8, 10, 12}

In [25]:

P(even,D12)

Out[25]:

Fraction(1, 2)

Note:such_that is just like the built-in functionfilter, exceptsuch_that returns a set.

We can now define more interesting events using predicates; for example we can determine the probability that the sum of a three-dice roll is prime (using a definition ofis_prime that is efficient enough for smalln):

In [26]:

D3={(d1,d2,d3)ford1inDford2inDford3inD}defprime_sum(outcome):returnis_prime(sum(outcome))defis_prime(n):returnn>1andnotany(n%i==0foriinrange(2,n))P(prime_sum,D3)

Out[26]:

Fraction(73, 216)

Card Problems¶

Consider dealing a hand of five playing cards. We can definedeck as a set of 52 cards, andHands as the sample space of all combinations of 5 cards:

In [27]:

suits='SHDC'ranks='A23456789TJQK'deck=cross(ranks,suits)len(deck)

Out[27]:

In [28]:

Hands=combos(deck,5)assertlen(Hands)==choose(52,5)random.sample(Hands,5)

Out[28]:

['AH 6D 5D TS 4H', 'JC AD AH 7S QC', '6C 7S 3H 9C KH', '6D 5C QH TH QS', '6C 3D 5D KH 5S']

Now we can answer questions like the probability of being dealt a flush (5 cards of the same suit):

In [29]:

defflush(hand):returnany(hand.count(suit)==5forsuitinsuits)P(flush,Hands)

Out[29]:

Fraction(33, 16660)

Or the probability of four of a kind:

In [30]:

deffour_kind(hand):returnany(hand.count(rank)==4forrankinranks)P(four_kind,Hands)

Out[30]:

Fraction(1, 4165)

Fermat and Pascal: Gambling, Triangles, and the Birth of Probability¶

No description has been provided for this image

Pierre de Fermat
1654

Blaise Pascal]
1654

Consider a gambling game consisting of tossing a coin. Player H wins the game if 10 heads come up, and T wins if 10 tails come up. If the game is interrupted when H has 8 heads and T has 7 tails, how should the pot of money (which happens to be 100 Francs) be split?In 1654, Blaise Pascal and Pierre de Fermat corresponded on this problem, with Fermatwriting:

Dearest Blaise,

As to the problem of how to divide the 100 Francs, I think I have found a solution that you will find to be fair. Seeing as I needed only two points to win the game, and you needed 3, I think we can establish that after four more tosses of the coin, the game would have been over. For, in those four tosses, if you did not get the necessary 3 points for your victory, this would imply that I had in fact gained the necessary 2 points for my victory. In a similar manner, if I had not achieved the necessary 2 points for my victory, this would imply that you had in fact achieved at least 3 points and had therefore won the game. Thus, I believe the following list of possible endings to the game is exhaustive. I have denoted 'heads' by an 'h', and tails by a 't.' I have starred the outcomes that indicate a win for myself.

h h h h *       h h h t *       h h t h *       h h t t *h t h h *       h t h t *       h t t h *       h t t tt h h h *       t h h t *       t h t h *       t h t tt t h h *       t t h t         t t t h         t t t t

I think you will agree that all of these outcomes are equally likely. Thus I believe that we should divide the stakes by the ration 11:5 in my favor, that is, I should receive (11/16)*100 = 68.75 Francs, while you should receive 31.25 Francs.

I hope all is well in Paris,

Your friend and colleague,

Pierre

Pascal agreed with this solution, andreplied with a generalization that made use of his previous invention, Pascal's Triangle. There's evena book about it.

We can solve the problem with the tools we have:

In [31]:

defwin_unfinished_game(Hneeds,Tneeds):"The probability that H will win the unfinished game, given the number of points needed by H and T to win."defHwins(outcome):returnoutcome.count('h')>=HneedsreturnP(Hwins,continuations(Hneeds,Tneeds))defcontinuations(Hneeds,Tneeds):"All continuations of a game where H needs `Hneeds` points to win and T needs `Tneeds`."rounds=['ht'for_inrange(Hneeds+Tneeds-1)]returnset(itertools.product(*rounds))

In [32]:

continuations(2,3)

Out[32]:

{('h', 'h', 'h', 'h'), ('h', 'h', 'h', 't'), ('h', 'h', 't', 'h'), ('h', 'h', 't', 't'), ('h', 't', 'h', 'h'), ('h', 't', 'h', 't'), ('h', 't', 't', 'h'), ('h', 't', 't', 't'), ('t', 'h', 'h', 'h'), ('t', 'h', 'h', 't'), ('t', 'h', 't', 'h'), ('t', 'h', 't', 't'), ('t', 't', 'h', 'h'), ('t', 't', 'h', 't'), ('t', 't', 't', 'h'), ('t', 't', 't', 't')}

In [33]:

win_unfinished_game(2,3)

Out[33]:

Fraction(11, 16)

Our answer agrees with Pascal and Fermat; we're in good company!

Non-Equiprobable Outcomes: Probability Distributions¶

So far, we have made the assumption that every outcome in a sample space is equally likely. In real life, we often get outcomes that are not equiprobable. For example, the probability of a child being a girl is not exactly 1/2, and the probability is slightly different for a second child. Anarticle gives the following counts for two-child families in Denmark, whereGB means a family where the first child is a girl and the second a boy:

GG: 121801    GB: 126840BG: 127123    BB: 135138

We will introduce three more definitions:

Frequency: a number describing how often an outcome occurs. Can be a count like 121801, or a ratio like 0.515.
Distribution: A mapping from outcome to frequency for each outcome in a sample space.
Probability Distribution: A distribution that has beennormalized so that the sum of the frequencies is 1.

We defineProbDist to take the same kinds of arguments thatdict does: either a mapping or an iterable of(key, val) pairs, and/or optional keyword arguments.

In [34]:

classProbDist(dict):"A Probability Distribution; an {outcome: probability} mapping."def__init__(self,mapping=(),**kwargs):self.update(mapping,**kwargs)# Make probabilities sum to 1.0; assert no negative probabilitiestotal=sum(self.values())foroutcomeinself:self[outcome]=self[outcome]/totalassertself[outcome]>=0

We also need to modify the functionsP andsuch_that to accept either a sample space or a probability distribution as the second argument.

In [35]:

defP(event,space):"""The probability of an event, given a sample space of equiprobable outcomes.    event: a collection of outcomes, or a predicate that is true of outcomes in the event.    space: a set of outcomes or a probability distribution of {outcome: frequency} pairs."""ifis_predicate(event):event=such_that(event,space)ifisinstance(space,ProbDist):returnsum(space[o]foroinspaceifoinevent)else:returnFraction(len(event&space),len(space))defsuch_that(predicate,space):"""The outcomes in the sample pace for which the predicate is true.    If space is a set, return a subset {outcome,...};    if space is a ProbDist, return a ProbDist {outcome: frequency,...};    in both cases only with outcomes where predicate(element) is true."""ifisinstance(space,ProbDist):returnProbDist({o:space[o]foroinspaceifpredicate(o)})else:return{oforoinspaceifpredicate(o)}

Here is the probability distribution for Danish two-child families:

In [36]:

DK=ProbDist(GG=121801,GB=126840,BG=127123,BB=135138)DK

Out[36]:

{'BB': 0.2645086533229465, 'BG': 0.24882071317004043, 'GB': 0.24826679089140383, 'GG': 0.23840384261560926}

And here are some predicates that will allow us to answer some questions:

In [37]:

deffirst_girl(outcome):returnoutcome[0]=='G'deffirst_boy(outcome):returnoutcome[0]=='B'defsecond_girl(outcome):returnoutcome[1]=='G'defsecond_boy(outcome):returnoutcome[1]=='B'deftwo_girls(outcome):returnoutcome=='GG'

In [38]:

P(first_girl,DK)

Out[38]:

0.4866706335070131

In [39]:

P(second_girl,DK)

Out[39]:

0.4872245557856497

The above says that the probability of a girl is somewhere between 48% and 49%, but that it is slightly different between the first or second child.

In [40]:

P(second_girl,such_that(first_girl,DK)),P(second_girl,such_that(first_boy,DK))

Out[40]:

(0.4898669165584115, 0.48471942072973107)

In [41]:

P(second_boy,such_that(first_girl,DK)),P(second_boy,such_that(first_boy,DK))

Out[41]:

(0.5101330834415885, 0.5152805792702689)

The above says that the sex of the second child is more likely to be the same as the first child, by about 1/2 a percentage point.

Newton's Answer to a Problem by Pepys¶

Isaac Newton
1693

Samuel Pepys
1693

This paper explains how Samuel Pepys wrote to Isaac Newton in 1693 to pose the problem:

Which of the following three propositions has the greatest chance of success?

Six fair dice are tossed independently and at least one “6” appears.
Twelve fair dice are tossed independently and at least two “6”s appear.
Eighteen fair dice are tossed independently and at least three “6”s appear.

Newton was able to answer the question correctly (although his reasoning was not quite right); let's see how we can do. Since we're only interested in whether a die comes up as "6" or not, we can define a single die and the joint distribution overn dice as follows:

In [46]:

die=ProbDist({'6':1/6,'-':5/6})defdice(n,die):"Joint probability from tossing n dice."ifn==1:returndieelse:returnjoint(die,dice(n-1,die))

In [47]:

dice(3,die)

Out[47]:

{'---': 0.5787037037037037, '--6': 0.11574074074074073, '-6-': 0.11574074074074073, '-66': 0.023148148148148143, '6--': 0.11574074074074073, '6-6': 0.023148148148148143, '66-': 0.023148148148148143, '666': 0.0046296296296296285}

Now we are ready to determine which proposition is more likely to have the required number of sixes:

In [48]:

defat_least(k,result):returnlambdas:s.count(result)>=k

In [49]:

P(at_least(1,'6'),dice(6,die))

Out[49]:

0.6651020233196161

In [50]:

P(at_least(2,'6'),dice(12,die))

Out[50]:

0.6186673737323009

In [51]:

P(at_least(3,'6'),dice(18,die))

Out[51]:

0.5973456859478227

We reach the same conclusion Newton did, that the best chance is rolling six dice.

Simulation¶

Sometimes it is inconvenient to explicitly define a sample space. Perhaps the sample space is infinite, or perhaps it is just very large and complicated, and we feel more confident in writing a program tosimulate one pass through all the complications, rather than try toenumerate the complete sample space.Random sampling from the simulationcan give an accurate estimate of the probability.

Simulating Monopoly¶

Mr. Monopoly
1940—

Considerproblem 84 from the excellentProject Euler, which asks for the probability that a player in the game Monopoly ends a roll on each of the squares on the board. To answer this we need to take into account die rolls, chance and community chest cards, and going to jail (from the "go to jail" space, from a card, or from rolling doubles three times in a row). We do not need to take into account anything about buying or selling properties or exchanging money or winning or losing the game, because these don't change a player's location. We will assume that a player in jail will always pay to get out of jail immediately.

A game of Monopoly can go on forever, so the sample space is infinite. But even if we limit the sample space to say, 1000 rolls, there are $21^{1000}$ such sequences of rolls (and even more possibilities when we consider drawing cards). So it is infeasible to explicitly represent the sample space.

But it is fairly straightforward to implement a simulation and run it for, say, 400,000 rolls (so the average square will be landed on 10,000 times). Here is the code for a simulation:

In [52]:

fromcollectionsimportCounter,dequeimportrandom# The board: a list of the names of the 40 squares# As specified by https://projecteuler.net/problem=84board="""GO   A1 CC1 A2  T1 R1 B1  CH1 B2 B3           JAIL C1 U1  C2  C3 R2 D1  CC2 D2 D3           FP   E1 CH2 E2  E3 R3 F1  F2  U2 F3           G2J  G1 G2  CC3 G3 R4 CH3 H1  T2 H2""".split()defmonopoly(steps):"""Simulate given number of steps of Monopoly game,    yielding the number of the current square after each step."""goto(0)# start at GOCC_deck=Deck('GO JAIL'+14*' ?')CH_deck=Deck('GO JAIL C1 E3 H2 R1 R R U -3'+6*' ?')doubles=0jail=board.index('JAIL')for_inrange(steps):d1,d2=random.randint(1,6),random.randint(1,6)goto(here+d1+d2)doubles=(doubles+1)if(d1==d2)else0ifdoubles==3orboard[here]=='G2J':goto(jail)elifboard[here].startswith('CC'):do_card(CC_deck)elifboard[here].startswith('CH'):do_card(CH_deck)yieldheredefgoto(square):"Update the global variable 'here' to be square."globalherehere=square%len(board)defDeck(names):"Make a shuffled deck of cards, given a space-delimited string."cards=names.split()random.shuffle(cards)returndeque(cards)defdo_card(deck):"Take the top card from deck and do what it says."globalherecard=deck[0]# The top carddeck.rotate(-1)# Move top card to bottom of deckifcard=='R'orcard=='U':whilenotboard[here].startswith(card):goto(here+1)# Advance to next railroad or utilityelifcard=='-3':goto(here-3)# Go back 3 spaceselifcard!='?':goto(board.index(card))# Go to destination named on card

And the results:

In [53]:

results=list(monopoly(400000))

I'll show a histogram of the squares, with a dotted red line at the average:

In [54]:

%matplotlib inlineimportmatplotlib.pyplotaspltplt.hist(results,bins=40)avg=len(results)/40plt.plot([0,39],[avg,avg],'r--');

Another way to see the results:

In [55]:

ProbDist(Counter(board[i]foriinresults))

Out[55]:

{'A1': 0.0211675, 'A2': 0.021105, 'B1': 0.022485, 'B2': 0.0233825, 'B3': 0.0231025, 'C1': 0.02732, 'C2': 0.0240925, 'C3': 0.02433, 'CC1': 0.0187875, 'CC2': 0.0258225, 'CC3': 0.02384, 'CH1': 0.00864, 'CH2': 0.010405, 'CH3': 0.008545, 'D1': 0.028305, 'D2': 0.0294775, 'D3': 0.03062, 'E1': 0.0285925, 'E2': 0.0273425, 'E3': 0.0317775, 'F1': 0.027155, 'F2': 0.02686, 'F3': 0.0261775, 'FP': 0.0289025, 'G1': 0.0269675, 'G2': 0.0258, 'G3': 0.0249025, 'GO': 0.0305975, 'H1': 0.0219575, 'H2': 0.0261525, 'JAIL': 0.0621475, 'R1': 0.03009, 'R2': 0.028755, 'R3': 0.030435, 'R4': 0.024545, 'T1': 0.023715, 'T2': 0.02204, 'U1': 0.0257525, 'U2': 0.0279075}

There is one square far above average:JAIL, at a little over 6%. There are four squares far below average: the three chance squares,CH1,CH2, andCH3, at around 1% (because 10 of the 16 chance cards send the player away from the square), and the "Go to Jail" square, square number 30 on the plot, which has a frequency of 0 because you can't end a turn there. The other squares are around 2% to 3% each, which you would expect, because 100% / 40 = 2.5%.

The Central Limit Theorem / Strength in Numbers Theorem¶

So far, we have talked of anoutcome as being a single state of the world. But it can be useful to break that state of the world down into components. We call these componentsrandom variables. For example, when we consider an experiment in which we roll two dice and observe their sum, we could model the situation with two random variables, one for each die. (Our representation of outcomes has been doing that implicitly all along, when we concatenate two parts of a string, but the concept of a random variable makes it official.)

TheCentral Limit Theorem states that if you have a collection of random variables and sum them up, then the larger the collection, the closer the sum will be to anormal distribution (also called aGaussian distribution or abell-shaped curve). The theorem applies in all but a few pathological cases.

As an example, let's take 5 random variables reprsenting the per-game scores of 5 basketball players, and then sum them together to form the team score. Each random variable/player is represented as a function; calling the function returns a single sample from the distribution:

In [56]:

fromrandomimportgauss,triangular,choice,vonmisesvariate,uniformdefSC():returnposint(gauss(15.1,3)+3*triangular(1,4,13))# 30.1defKT():returnposint(gauss(10.2,3)+3*triangular(1,3.5,9))# 22.1defDG():returnposint(vonmisesvariate(30,2)*3.08)# 14.0defHB():returnposint(gauss(6.7,1.5)ifchoice((True,False))elsegauss(16.7,2.5))# 11.7defOT():returnposint(triangular(5,17,25)+uniform(0,30)+gauss(6,3))# 37.0defposint(x):"Positive integer";returnmax(0,int(round(x)))

And here is a function to sample a random variablek times, show a histogram of the results, and return the mean:

In [57]:

fromstatisticsimportmeandefrepeated_hist(rv,bins=10,k=100000):"Repeat rv() k times and make a histogram of the results."samples=[rv()for_inrange(k)]plt.hist(samples,bins=bins)returnmean(samples)

The two top-scoring players have scoring distributions that are slightly skewed from normal:

In [58]:

repeated_hist(SC,bins=range(60))

Out[58]:

30.09618

In [59]:

repeated_hist(KT,bins=range(60))

Out[59]:

22.1383

The next two players have bi-modal distributions; some games they score a lot, some games not:

In [60]:

repeated_hist(DG,bins=range(60))

Out[60]:

14.02429

In [61]:

repeated_hist(HB,bins=range(60))

Out[61]:

11.70888

The fifth "player" (actually the sum of all the other players on the team) looks like this:

In [62]:

repeated_hist(OT,bins=range(60))

Out[62]:

36.31564

Now we define the team score to be the sum of the five players, and look at the distribution:

In [63]:

defGSW():returnSC()+KT()+DG()+HB()+OT()repeated_hist(GSW,bins=range(70,160,2))

Out[63]:

114.31262

Sure enough, this looks very much like a normal distribution. The Central Limit Theorem appears to hold in this case. But I have to say "Central Limit" is not a very evocative name, so I propose we re-name this as theStrength in Numbers Theorem, to indicate the fact that if you have a lot of numbers, you tend to get the expected result.

Conclusion¶

We've had an interesting tour and met some giants of the field: Laplace, Bernoulli, Fermat, Pascal, Bayes, Newton, ... even Mr. Monopoly and The Count.

The Count

The Count
1972—

The conclusion is: be explicit about what the problem says, and then methodical about defining the sample space, and finally be careful in counting the number of outcomes in the numerator and denominator. Easy as 1-2-3.

Appendix: Continuous Sample Spaces¶

Everything up to here has been about discrete, finite sample spaces, where we canenumerate all the possible outcomes.

But I was asked aboutcontinuous sample spaces, such as the space of real numbers. The principles are the same: probability is still the ratio of the favorable cases to all the cases, but now instead ofcounting cases, we have to (in general) compute integrals to compare the sizes of cases.Here we will cover a simple example, which we first solve approximately by simulation, and then exactly by calculation.

The Hot New Game Show Problem: Simulation¶

Oliver Roeder posedthis problem in the 538Riddler blog:

Two players go on a hot new game show calledHigher Number Wins. The two go into separate booths, and each presses a button, and a random number between zero and one appears on a screen. (At this point, neither knows the other’s number, but they do know the numbers are chosen from a standard uniform distribution.) They can choose to keep that first number, or to press the button again to discard the first number and get a second random number, which they must keep. Then, they come out of their booths and see the final number for each player on the wall. The lavish grand prize — a case full of gold bullion — is awarded to the player who kept the higher number. Which number is the optimal cutoff for players to discard their first number and choose another? Put another way, within which range should they choose to keep the first number, and within which range should they reject it and try their luck with a second number?

We'll use this notation:

A,B: the two players.
A,B: the cutoff values they choose: the lower bound of the range of first numbers they will accept.
a,b: the actual random numbers that appear on the screen.

For example, if playerA chooses a cutoff ofA = 0.6, that means thatA would accept any first number greater than 0.6, and reject any number below that cutoff. The question is: What cutoff,A, should playerA choose to maximize the chance of winning, that is, maximize P(a >b)?

First, simulate the number that a player with a given cutoff gets (note thatrandom.random() returns a float sampled uniformly from the interval [0..1]):

In [64]:

defnumber(cutoff):"Play the game with given cutoff, returning the first or second random number."first=random.random()returnfirstiffirst>cutoffelserandom.random()

In [65]:

number(.5)

Out[65]:

0.643051044503982

Now compare the numbers returned with a cutoff ofA versus a cutoff ofB, and repeat for a large number of trials; this gives us an estimate of the probability that cutoffA is better than cutoffB:

In [66]:

defPwin(A,B,trials=30000):"The probability that cutoff A wins against cutoff B."Awins=sum(number(A)>number(B)for_inrange(trials))returnAwins/trials

In [67]:

Pwin(.5,.6)

Out[67]:

0.49946666666666667

Now define a function,top, that considers a collection of possible cutoffs, estimate the probability for each cutoff playing against each other cutoff, and returns a list with theN top cutoffs (the ones that defeated the most number of opponent cutoffs), and the number of opponents they defeat:

In [ ]:

deftop(N,cutoffs):"Return the N best cutoffs and the number of opponent cutoffs they beat."winners=Counter(AifPwin(A,B)>0.5elseBfor(A,B)initertools.combinations(cutoffs,2))returnwinners.most_common(N)

In [ ]:

fromnumpyimportarange%time top(5, arange(0.50, 0.99, 0.01))

We get a good idea of the top cutoffs, but they are close to each other, so we can't quite be sure which is best, only that the best is somewhere around 0.60. We could get a better estimate by increasing the number of trials, but that would consume more time.

The Hot New Game Show Problem: Exact Calculation¶

More promising is the possibility of makingPwin(A, B) an exact calculation. But before we get toPwin(A, B), let's solve a simpler problem: assume that both playersA andB have chosen a cutoff, and have each received a number above the cutoff. What is the probability thatA gets the higher number? We'll call thisPhigher(A, B). We can think of this as a two-dimensional sample space of points in the (a,b) plane, wherea ranges from the cutoffA to 1 andb ranges from the cutoff B to 1. Here is a diagram of that two-dimensional sample space, with the cutoffsA=0.5 andB=0.6:

The total area of the sample space is 0.5 × 0.4 = 0.20, and in general it is (1 -A) · (1 -B). What about the favorable cases, whereA beatsB? That corresponds to the shaded triangle below:

The area of a triangle is 1/2 the base times the height, or in this case, 0.4² / 2 = 0.08, and in general, (1 -B)² / 2. So in general we have:

Phigher(A, B) = favorable / totalfavorable     = ((1 - B) ** 2) / 2 total         = (1 - A) * (1 - B)Phigher(A, B) = (((1 - B) ** 2) / 2) / ((1 - A) * (1 - B))Phigher(A, B) = (1 - B) / (2 * (1 - A))

And in this specific case we have:

 A = 0.5; B = 0.6 favorable         = 0.4 ** 2 / 2 = 0.08 total             = 0.5 * 0.4    = 0.20 Phigher(0.5, 0.6) = 0.08 / 0.20  = 0.4

But note that this only works when the cutoffA ≤B; whenA >B, we need to reverse things. That gives us the code:

In [ ]:

defPhigher(A,B):"Probability that a sample from [A..1] is higher than one from [B..1]."ifA<=B:return(1-B)/(2*(1-A))else:return1-Phigher(B,A)

In [ ]:

Phigher(0.5,0.6)

We're now ready to tackle the full game. There are four cases to consider, depending on whetherA andB gets a first number that is above or below their cutoff choices:

firsta	firstb	P(a,b)	P(A wins \|a,b)	Comment
a >A	b >B	(1 -A) · (1 -B)	Phigher(A,B)	Both above cutoff; both keep first numbers
a <A	b <B	A ·B	Phigher(0, 0)	Both below cutoff, both get new numbers from [0..1]
a >A	b <B	(1 -A) ·B	Phigher(A, 0)	A keeps number;B gets new number from [0..1]
a <A	b >B	A · (1 -B)	Phigher(0,B)	A gets new number from [0..1];B keeps number

For example, the first row of this table says that the event of both first numbers being above their respective cutoffs has probability (1 -A) · (1 -B), and if this does occur, then the probability ofA winning is Phigher(A,B).We're ready to replace the old simulation-basedPwin with a new calculation-based version:

In [ ]:

defPwin(A,B):"With what probability does cutoff A win against cutoff B?"return((1-A)*(1-B)*Phigher(A,B)# both above cutoff+A*B*Phigher(0,0)# both below cutoff+(1-A)*B*Phigher(A,0)# A above, B below+A*(1-B)*Phigher(0,B))# A below, B above

That was a lot of algebra. Let's define a few tests to check for obvious errors:

In [ ]:

deftest():assertPhigher(0.5,0.5)==Phigher(0.7,0.7)==Phigher(0,0)==0.5assertPwin(0.5,0.5)==Pwin(0.7,0.7)==0.5assertPhigher(.6,.5)==0.6assertPhigher(.5,.6)==0.4return'ok'test()

Let's repeat the calculation with our new, exactPwin:

In [ ]:

top(5,arange(0.50,0.99,0.01))

It is good to see that the simulation and the exact calculation are in rough agreement; that gives me more confidence in both of them. We see here that 0.62 defeats all the other cutoffs, and 0.61 defeats all cutoffs except 0.62. The great thing about the exact calculation code is that it runs fast, regardless of how much accuracy we want. We can zero in on the range around 0.6:

In [ ]:

top(10,arange(0.500,0.700,0.001))

This says 0.618 is best, better than 0.620. We can get even more accuracy:

In [ ]:

top(5,arange(0.61700,0.61900,0.00001))

So 0.61803 is best. Does that numberlook familiar? Can you prove that it is what I think it is?

To understand the strategic possibilities, it is helpful to draw a 3D plot ofPwin(A, B) for values ofA andB between 0 and 1:

In [ ]:

importnumpyasnpfrommpl_toolkits.mplot3d.axes3dimportAxes3Ddefmap2(fn,A,B):"Map fn to corresponding elements of 2D arrays A and B."return[list(map(fn,Arow,Brow))for(Arow,Brow)inzip(A,B)]cutoffs=arange(0.00,1.00,0.02)A,B=np.meshgrid(cutoffs,cutoffs)fig=plt.figure(figsize=(10,10))ax=fig.add_subplot(1,1,1,projection='3d')ax.set_xlabel('A')ax.set_ylabel('B')ax.set_zlabel('Pwin(A, B)')ax.plot_surface(A,B,map2(Pwin,A,B));

What does thisPringle of Probability show us? The highest win percentage forA, the peak of the surface, occurs whenA is around 0.5 andB is 0 or 1. We can confirm that, finding the maximumPwin(A, B) for many different cutoff values ofA andB:

In [ ]:

cutoffs=(set(arange(0.00,1.00,0.01))|set(arange(0.500,0.700,0.001))|set(arange(0.61700,0.61900,0.00001)))

In [ ]:

max([Pwin(A,B),A,B]forAincutoffsforBincutoffs)

SoA could win 62.5% of the time if onlyB would chose a cutoff of 0. But, unfortunately forA, a rational playerB is not going to do that. We can ask what happens if the game is changed so that playerA has to declare a cutoff first, and then playerB gets to respond with a cutoff, with full knowledge ofA's choice. In other words, what cutoff shouldA choose to maximizePwin(A, B), given thatB is going to take that knowledge and pick a cutoff that minimizesPwin(A, B)?

In [ ]:

max(min([Pwin(A,B),A,B]forBincutoffs)forAincutoffs)

And what if we run it the other way around, whereB chooses a cutoff first, and thenA responds?

In [ ]:

min(max([Pwin(A,B),A,B]forAincutoffs)forBincutoffs)

In both cases, the rational choice for both players in a cutoff of 0.61803, which corresponds to the "saddle point" in the middle of the plot. This is astable equilibrium; consider fixingB = 0.61803, and notice that ifA changes to any other value, we slip off the saddle to the right or left, resulting in a worse win probability forA. Similarly, if we fixA = 0.61803, then ifB changes to another value, we ride up the saddle to a higher win percentage forA, which is worse forB. So neither player will want to move from the saddle point.

The moral for continuous spaces is the same as for discrete spaces: be careful about defining your space; count/measure carefully, and let your code take care of the rest.

Movatterモバイル変換

A Concrete Introduction to Probability (using Python)¶

Code for`P`¶

Warm-up Problem: Die Roll¶

Urn Problems¶

Urn Problem 1: what's the probability of selecting 6 red balls?¶

Urn Problem 2: what is the probability of 3 blue, 2 white, and 1 red?¶

Urn Problem 3: What is the probability of exactly 4 white balls?¶

Revised Version of`P`, with more general events¶

Card Problems¶

Fermat and Pascal: Gambling, Triangles, and the Birth of Probability¶

Non-Equiprobable Outcomes: Probability Distributions¶

More Urn Problems: M&Ms and Bayes¶

Newton's Answer to a Problem by Pepys¶

Simulation¶

Simulating Monopoly¶

The Central Limit Theorem / Strength in Numbers Theorem¶

Conclusion¶

Appendix: Continuous Sample Spaces¶

The Hot New Game Show Problem: Simulation¶

The Hot New Game Show Problem: Exact Calculation¶

Movatterモバイル変換

A Concrete Introduction to Probability (using Python)¶

Code forP¶

Warm-up Problem: Die Roll¶

Urn Problems¶

Urn Problem 1: what's the probability of selecting 6 red balls?¶

Urn Problem 2: what is the probability of 3 blue, 2 white, and 1 red?¶

Urn Problem 3: What is the probability of exactly 4 white balls?¶

Revised Version ofP, with more general events¶

Card Problems¶

Fermat and Pascal: Gambling, Triangles, and the Birth of Probability¶

Non-Equiprobable Outcomes: Probability Distributions¶

More Urn Problems: M&Ms and Bayes¶

Newton's Answer to a Problem by Pepys¶

Simulation¶

Simulating Monopoly¶

The Central Limit Theorem / Strength in Numbers Theorem¶

Conclusion¶

Appendix: Continuous Sample Spaces¶

The Hot New Game Show Problem: Simulation¶

The Hot New Game Show Problem: Exact Calculation¶

Code for`P`¶

Revised Version of`P`, with more general events¶