EECS 70: Discrete Mathematics and Probability Theory
Spring 2014, Anant Sahai, Note 9

Probability Theory: Brief Intro

Probability theory[1] is one of the great conceptual achievements of the 20th century. In statistical physics it provides a technique for understanding the aggregate behavior of large numbers of particles, without analyzing their individual behavior. In quantum mechanics, it is an integral part of the fabric of the theory. Probabilistic methods form the backbone of many recent achievements in algorithms (from the theory of computation), are foundational for understanding the nature of information in both communication and cryptography, and play a critical role in estimation/learning and signal processing more generally. When combined with ideas from linear algebra and optimization (studied in EECS 127), probabilistic perspectives on signals form the foundation of modern machine learning. This perspective, when further combined with ideas from control, is also the driving force behind contemporary approaches[2] to artificial intelligence. Probability ideas are also critical in understanding economics and finance. And of course, statistical techniques based on probability theory are absolutely fundamental to experimental science, from the physical sciences through biology and even in the social sciences.

[1] Seriously: while here in 70 we can give you a taste for probability and certain basic tools, you really are strongly encouraged to take EECS 126 to get a more in-depth understanding, especially of the connection between Probability and Linear Algebra, which we cannot cover in this course because Linear Algebra is not a prereq. In 126, you will see how Probability is intimately connected to things like Google's PageRank, Speech Recognition, Planning for Robots, various topics in networking, etc.

[2] The older "classic CS" approach of rule-based "expert systems" for artificial intelligence has been almost completely replaced by the more modern "EE+Stat" approach that combines large amounts of data with linear-algebraic and communication-theoretic optimization algorithms for graphical-model signal processing.

One of the basic themes in probability theory is the emergence of almost deterministic behavior from probabilistic phenomena. For example, if we flip a fair coin, we believe that the underlying frequency of heads and tails should be equal. When we flip it 10,000 times, we are pretty certain in expecting between 4900 and 5100 heads. A random fluctuation around the true frequency will be present, but it will be relatively small. This phenomenon (referred to colloquially as a law of large numbers) is responsible for the stability of the physical world we live in, whose building blocks - elementary particles, atoms, molecules - individually behave in many ways like coin flips, but in aggregate present a solid front to us, much like the proportion of heads in the example above.

Coin Flipping

Let's look at the coin flipping example in more detail. To picture coin flipping, consider the simple apparatus below, called the Galton-Watson board.

[Figure: a Galton-Watson board - a triangular array of pegs with an entry slot at the top and a row of slots along the bottom.]

Imagine we drop a token in the slot at the top of the board. As the token falls, at each peg it chooses randomly, with 50-50 chance, whether to go right or left. If we were to watch a single token drop, it would end up in one of the slots at the bottom of the board, chosen in some random way. What happens if we watch N tokens drop? How many tokens does each of the bottom slots hold?
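To get a feel for the answer before doing any analysis, it helps to simulate the board. The sketch below is an illustration added for this writeup, not part of the original note (the name drop_tokens, the fixed seed, and the choice of N are mine): each of N tokens makes n independent 50-50 left/right choices, and we tally its final slot, counted as rights minus lefts to match the labelling introduced next.

import random
from collections import Counter

def drop_tokens(N, n, seed=0):
    """Drop N tokens down a board of height n; return a Counter of slot -> count."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(N):
        # Each peg is a fair choice: +1 for a right turn, -1 for a left turn.
        # The token's final slot is the sum of its n choices.
        counts[sum(rng.choice((-1, 1)) for _ in range(n))] += 1
    return counts

counts = drop_tokens(N=1000, n=10_000)
within = sum(c for slot, c in counts.items() if abs(slot) <= 200)
print(f"{within / 1000:.0%} of the tokens landed in slots -200 to 200")

Running this piles nearly all of the tokens within a couple hundred slots of the center, which is exactly the concentration the rest of this note quantifies.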
Before we answer the question, let us observe that the Galton-Watson board is just a way of visualizing a sequence of coin flips. To see this, imagine that each time the token encounters a peg, it flips a coin to decide whether to go left or right. If the coin is heads, the token goes right, and if the coin is tails, the token goes left.

If the height of the Galton-Watson board is n, then there are exactly n coin flips as the token makes its way down. Moreover, if we think of heads as +1 and tails as -1, then the slot that the token ends up in is just the sum of the choices at each round, which ranges from -n to n. So we can label each slot by a number between -n and n in this way, with the leftmost slot labelled -n and the rightmost slot labelled n. Notice that the slot number also represents the number of heads minus the number of tails.

Now let's get back to our question: what happens if we watch N tokens drop? Here is a figure showing the answer for a board of height n:

[Figure: histogram of tokens per slot for a board of height n = 10,000; the counts peak at slot 0, and almost all tokens fall in slots between -200 and 200.]

As you can see, when n = 10,000, we expect almost all the tokens to fall in slots between -200 and 200. If we go back to the coin flipping setting, we should expect the number of heads minus the number of tails to be between -200 and 200; equivalently, the number of heads is expected to be between 4900 and 5100.

What is the significance of the range 4900 to 5100? Note that the size of that interval, 5100 - 4900 = 200, is twice the square root of n = 10,000. This is no coincidence. In general, if we flip a coin n times, the probability of landing in a slot between -2√n and 2√n is around 95% (we're outside less than 1 time out of every 20). And the probability of landing outside the slots between -4√n and 4√n is quite tiny, less than one-hundredth of a percent: actually less than once out of every 15,000 times! This rapid decline in probability is evident in the picture above: as you move further away from the center of the picture, the number of tokens in the corresponding slots decreases rapidly.

Why does the probability fall so quickly? To understand this, we can turn back to the Galton-Watson branching process. First consider a smaller example, where n = 3:

[Figure: a board of height n = 3, with the 2^3 = 8 possible paths leading to four bottom slots labelled -3, -1, 1, 3.]

There are eight total paths from the start slot to one of the final slots. There is only one path leading to the slot labelled -3 (the token must choose the left branch at each step). The same applies for the rightmost slot. However, there are 3 paths to each middle slot (labelled -1 and 1).

As n grows, the total number of paths that the token can take increases exponentially. In fact, since there are 2 choices for each coin flip, there are 2 × 2 × ... × 2 = 2^n choices of paths for the token if the board has height n. Even for relatively small values of n, like 50 or 100, 2^n is already astronomically large. Moreover, as the number of paths grows, the discrepancy between the number of paths leading to each slot is magnified: far more paths lead to the middle slots than to the extreme ones.
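To make the path counting concrete: a token that finishes in slot k must go right at exactly (n + k)/2 of its n pegs, so the number of paths to slot k is the binomial coefficient C(n, (n + k)/2); for n = 3 this gives the counts 1, 3, 3, 1 seen above. The sketch below is an added illustration, not from the note (paths_to_slot is my own name): it tallies these counts exactly and checks the 95% and 1-in-15,000 figures quoted earlier.

from math import comb, isqrt

def paths_to_slot(n, k):
    """Number of the 2**n equally likely paths that end in slot k."""
    if abs(k) > n or (n + k) % 2:
        return 0  # unreachable: the slot's parity must match the board height
    return comb(n, (n + k) // 2)

# Sanity check against the n = 3 example: 1 path to slot -3, 3 paths to slot -1.
assert paths_to_slot(3, -3) == 1 and paths_to_slot(3, -1) == 3

n = 10_000
total = 2 ** n
w = 2 * isqrt(n)  # 2 * sqrt(n) = 200
inside_2sd = sum(paths_to_slot(n, k) for k in range(-w, w + 1))
inside_4sd = sum(paths_to_slot(n, k) for k in range(-2 * w, 2 * w + 1))
print(f"P(|slot| <= {w}) ~ {inside_2sd / total:.4f}")        # about 0.95
print(f"P(|slot| > {2 * w}) ~ {1 - inside_4sd / total:.1e}")  # under 1/15,000

Because all 2^n paths are equally likely, the ratios this prints are exact probabilities rather than simulation estimates, and rerunning it with other values of n shows the same pattern: the band of slots within 2√n of the center always captures about 95% of the paths.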

