
## Posts tagged ‘probability’

November 20th, 2011

## Monty Hall Problem – Simulation using Python

I had too much time between flights at Frankfurt, and the old “Find and Win the Elephant/Car Behind 3 Doors” puzzle (the Monty Hall problem) is easily the simplest psychology/probability/math puzzle that still bothers people. The puzzle, as commonly stated, is as follows:

You are a contestant on a game show, where the prize is a Mercedes Benz. You are shown 3 unmarked doors, one of which is the door to the Benz and the other 2 doors lead to nothing (or goats). You can pick a door, and if you pick the right door, you can drive away with a new Benz. Most of us can quickly agree that the probability of winning the Benz is 1/3. (The interesting variation comes next.)

The setup is the same – you are still a contestant trying to win the Benz, and there are still 3 unmarked doors. This time, however, the game show host, who knows which door hides the car, is trying to help you. When you pick a door, he flings open one of the two doors that you did not select, and that door (as you can observe once it is open) does NOT lead to the car. Then he gives you a chance to reevaluate your choice: you can keep the door you originally selected, or switch to the other unopened door.

The question is: Should you switch?

The subtext here is that the door you selected has a 1/3 probability of being the right door, and a 2/3 probability of not being the right door. If you did not select the right door initially, then switching gives you the right door. If you did select the right door initially, then switching gives you a wrong door. So if you switch, the probability of ending up at the right door becomes 2/3, and there is a 1/3 probability that you abandon your good door for a wrong one. You can look at it many different ways, come up with many plausible arguments, and convince yourself one way or the other.

There are many answers to this problem, and the CORRECT one is: yes, you should ALWAYS switch. The probability of winning the Benz if you switch is indeed 2/3. The probability of winning if you do not switch remains a rather uninteresting 1/3.

If you have had enough of the arguments, you can look at this simple Monte Carlo simulation and observe the results yourself. Of course, there is no reason to trust the results without looking at the code, so please feel free to download the source code here. The simulation models two contestants: contestant 1, who always switches, and contestant 2, who never switches and sticks with the original choice.

```
>>> =================== RESTART ======================
Total number of times run: 100000
Win count when switching the choice: 66613
Win count when not switching the choice: 33267
>>> =================== RESTART ======================
Total number of times run: 1000000
Win count when switching the choice: 666085
Win count when not switching the choice: 333498
>>> =================== RESTART ======================
Total number of times run: 10000000
Win count when switching the choice: 6665600
Win count when not switching the choice: 3329991
```

It is a Monte Carlo simulation, so each run will produce slightly different results due to randomization. I had no prior Python experience, so I used Python in this example so I could learn it a bit as well. If there is a different programming language you think I should keep in mind for a future simulation, let me know.
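For readers who would rather see the idea than download anything, here is a minimal sketch of such a simulation (my own reconstruction, not the downloadable source; the function name `play` is mine):

```python
import random

def play(switch, trials=100_000):
    """Play the Monty Hall game `trials` times; return how often the contestant wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the Benz
        pick = random.randrange(3)   # contestant's initial choice
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins

trials = 100_000
print("Win count when switching the choice:", play(True, trials))
print("Win count when not switching the choice:", play(False, trials))
```

With enough trials the switching contestant's win fraction settles near 2/3 and the non-switching contestant's near 1/3, matching the output above.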

March 27th, 2010

## An old dice problem – probability of hitting 1000 in a running sum

So, you roll a standard six-faced unbiased die an unlimited number of times and keep a running sum. What is the probability that the sum hits 1000 at some point?

Let us call that Pr(S=1000). Some base cases are easy to start with; let us write q for 1/6:

Pr(S=1) = q = 0.166667

Pr(S=2) = q + q*Pr(S=1) = q + q^2 = q(1+q) = 0.194444

Pr(S=3) = q + q*Pr(S=1) + q*Pr(S=2)
= Pr(S=2) + q*Pr(S=2)
= q(1+q)^2 = 0.226852

Pr(S=4) = q + q*Pr(S=1) + q*Pr(S=2) + q*Pr(S=3)
= Pr(S=3) + q*Pr(S=3)
= q(1+q)^3 = 0.264660

Pr(S=5) = q + q*Pr(S=1) + q*Pr(S=2) + q*Pr(S=3) + q*Pr(S=4)
= Pr(S=4) + q*Pr(S=4)
= q(1+q)^4 = 0.308771

Pr(S=6) = q + q*Pr(S=1) + q*Pr(S=2) + q*Pr(S=3) + q*Pr(S=4) + q*Pr(S=5)
= Pr(S=5) + q*Pr(S=5)
= q(1+q)^5 = 0.360232

This is the point at which we deviate from this pattern, because Pr(S=7) no longer has the single-roll term q as a component (you cannot reach 7 in one roll). Instead, we write the following recurrence relation, which applies for all N > 6:

Pr(S=N) = the sum of the probabilities of all distinct roll sequences whose running sum hits N. We can distinguish those sequences by the last roll, which can be any of {1..6}:

Pr(S=N) = q*Pr(S=N-1) + q*Pr(S=N-2) + q*Pr(S=N-3) + q*Pr(S=N-4) + q*Pr(S=N-5) + q*Pr(S=N-6)
= q * Sum_{i=1 to 6} Pr(S=N-i)

That is, since q = 1/6, Pr(S=N) is the arithmetic average of the previous 6 terms of the series, so it is easy to see that the series stabilizes. Even in plain old Excel, the next few terms come out as:

Pr(S=7) = 0.253604395

Pr(S=8) = 0.268094017

Pr(S=9) = 0.280368945

Pr(S=10) = 0.289288461

By N = 100, the series has stabilized at Pr(S=100) = 0.285714286. That is also the value for S=1000, and hence the answer to the question above.

It is easy to miss the observation that 0.285714286 is simply 1/3.5 (i.e., 2/7), where 3.5 is the average value of a single die roll.

By the weak law of large numbers, if the die is thrown k times, the sum will be close to 3.5*k; the running sum advances by 3.5 per roll on average, so the probability that it lands on any specific (large) number is 1/3.5.
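The recurrence above is easy to check without Excel. A short Python sketch (my own, using exact arithmetic via the standard-library `fractions` module) computes Pr(S=N) up to N = 1000:

```python
from fractions import Fraction

q = Fraction(1, 6)
p = [Fraction(1)]  # p[0] = 1: before any roll, the running sum already sits at 0
for n in range(1, 1001):
    # Condition on the last roll i in {1..6}: the sum must have visited n - i first.
    p.append(q * sum(p[n - i] for i in range(1, 7) if n - i >= 0))

print(float(p[6]))     # matches the hand calculation above, ~0.360232
print(float(p[100]))   # ~0.285714286
print(float(p[1000]))  # ~2/7 = 1/3.5
```

Note that p[0] = 1 makes the base cases come out right: p[1] = q*p[0] = q, p[2] = q*(p[1] + p[0]) = q(1+q), and so on.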

January 9th, 2010

## 10 people and 10 hats (an old problem)

Problem: 10 people walk into a party and give their hats to the coat and hat check guy. When the party finishes, their hats are returned in no specific order and with no specific intent. What is the probability that no one gets their own hat back? (This is an old problem, and a standard one in combinatorics and probability.)

First, an approximate solution: the probability that a given person gets his hat back is 1/10, so the probability that he does not is 9/10. So the probability that no one gets their hat back is 0.9^10 = 0.34868. Why is this solution imperfect? Because the events aren’t really independent: if one person gets their hat back, it increases the chance that the remaining people will get their hats back (easy to see that with two people). We can’t simply multiply event probabilities when the events are not independent.

The solution: The correct solution comes from combinatorics. Total number of ways in which hats can be returned is 10! (10 factorial = 10 x 9 x 8 x … 1). If we can count the number of ways in which no guest receives their hat back, then we can deduce the probability by dividing that number by 10!. To count that, we can use the principle of inclusion and exclusion. Say A1 is the set of cases in which the first guest receives his own hat back. Then, we can enumerate:

The cases in which some guest(s) receive their own hats back are: A1 U A2 U … U A10

The number of ways in which that happens can then be written as:
|A1 ∪ A2 ∪ … ∪ A10| = (|A1| + |A2| + … + |A10|) – (|A1 ∩ A2| + |A1 ∩ A3| + … + |A9 ∩ A10|) + (|A1 ∩ A2 ∩ A3| + … + |A8 ∩ A9 ∩ A10|) – (|A1 ∩ A2 ∩ A3 ∩ A4| + … + |A7 ∩ A8 ∩ A9 ∩ A10|) + … – |A1 ∩ A2 ∩ … ∩ A10|

Due to symmetry, we can say that:

|A1 ∪ A2 ∪ … ∪ A10| = 10 |A1| – C(10,2) |A1 ∩ A2| + C(10,3) |A1 ∩ A2 ∩ A3| – C(10,4) |A1 ∩ A2 ∩ A3 ∩ A4| + … – C(10,10) |A1 ∩ A2 ∩ … ∩ A10|
[where C(10,2) is the number of ways to select 2 people out of 10 = 10! / (8! * 2!)]

Now if we can calculate |A1 ∩ A2 ∩ … ∩ Ai|, we can get our answer. For that part, note that the number of arrangements in which guests 1..i receive their own hats back is simply (10 – i)!, because that is the number of ways the remaining 10 – i hats can be handed out (whether or not those other guests get their own hats back).

So, here we go:
|A1 U A2 .. A10| = 10 * 9! – C(10,2) 8! + C(10,3) 7! – C(10,4) 6! + …. – C(10,10) 0!
= 10 * 362880 – 45 * 40320 + 120 * 5040 – 210 * 720 + 252 * 120 – 210 * 24 + 120 * 6 – 45 * 2 + 10 * 1 – 1 * 1
= 2293839

Therefore, the probability that no guest receives their hat back is 1 – 2293839/10! = 0.367879464.
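The alternating sum above can be double-checked in a few lines of Python, using the standard-library `math.comb` and `math.factorial` (this snippet is mine, not part of the linked source):

```python
from math import comb, factorial

n = 10
# Inclusion-exclusion: |A1 U ... U A10| = sum over i of (-1)^(i+1) * C(10, i) * (10 - i)!
some_guest_matched = sum((-1) ** (i + 1) * comb(n, i) * factorial(n - i)
                         for i in range(1, n + 1))
print(some_guest_matched)                     # 2293839
print(1 - some_guest_matched / factorial(n))  # 0.36787946...
```

(As the number of guests grows, this probability approaches 1/e ≈ 0.3678794, which is why the answer barely changes beyond a handful of guests.)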

Monte Carlo Simulation: As Knuth said (he said it once, I quote it often): don’t rely on it – I have only proven my solution correct, I haven’t actually tested it. So, I wrote a small program to run this experiment randomly a bazillion times and check how many times none of the 10 people get their hats back. Here are the simulation results:

```
Strings: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Going to run 100,000,000 tests.
Finished test number: 10000000
Finished test number: 20000000
Finished test number: 30000000
Finished test number: 40000000
Finished test number: 50000000
Finished test number: 60000000
Finished test number: 70000000
Finished test number: 80000000
Finished test number: 90000000
Finished test number: 100000000
Results:
Total Cases tested: 100000000
Cases in which no guest got their hat back: 36782656.
```

That is 36782656/100000000 = 0.36782656, which matches the analytical result above pretty well.

The entire source code is available here. Feel free to use it whichever way you want, but don’t hold me responsible if your spaceship crashes into an asteroid because of it. It is unoptimized, unclean, smelly code – so hold your criticism.
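For readers who just want the gist without downloading anything, a shuffle-based simulation along the same lines can be sketched as follows (my own reconstruction, not the linked source; the function name is hypothetical):

```python
import random

def fraction_with_no_match(n=10, trials=200_000):
    """Shuffle n hats repeatedly; return the fraction of shuffles in which
    no guest i receives hat i back (i.e., the permutation is a derangement)."""
    hits = 0
    hats = list(range(n))
    for _ in range(trials):
        random.shuffle(hats)
        if all(hats[i] != i for i in range(n)):
            hits += 1
    return hits / trials

print(fraction_with_no_match())   # ~0.3679
```

Each shuffle is one trip to the hat-check stand; counting the shuffles with no fixed point estimates the probability computed analytically above.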
