Princeton COS 116 - (Machine Learning)

Self-improvement for dummies (Machine Learning)
COS 116, 4/24/2008
Sanjeev Arora

Artificial Intelligence
Definition of AI (Merriam-Webster):
1. The capability of a machine to imitate intelligent human behavior.
2. The branch of computer science dealing with the simulation of intelligent behavior in computers.
Definition of learning: to gain knowledge or understanding of, or skill in, by study, instruction, or experience.

Today's lecture: Machine Learning
Machine learning = "programming by example": show the computer what to do, without explaining how to do it. The computer programs itself! In fact, it improves continuously as it gets more data and experience.

Recall your final Scribbler lab
Task: program Scribbler to navigate a maze: avoid walls, avoid "lava", head towards the goal. As the "obstacle course" gets more complex, the programming gets much harder. (Why?)

Instead, teach Scribbler to navigate the maze:
1. Start with a simple program.
2. Run the maze.
3. Label this trial GOOD or BAD, depending on whether the goal was reached.
4. Submit the data from the trial to a "learning algorithm", which uses it to devise a better program.
5. Repeat as needed.
Is this how you learned to drive a car?

Caveat: imitating nature may not be the best strategy. Examples: birds vs. airplanes; cheetahs vs. race cars.

A machine's "experience" of the world
There are n sensors, each of which produces a number, so an "experience" is an array of n numbers.
Example: a video camera with 480 x 640 pixels gives n = 480 * 640 = 307,200.
In practice, n is reduced via compression or preprocessing.

Example: representing wood samples
Brownness scale: 1 (light) ... 10 (dark). Texture scale: 1 (smooth) ... 10 (rough).
(3, 7) = wood that is fairly light brown but kind of on the rough side.

A learning task and its mathematical formulation
Given 100 samples of oak and maple, figure out a labeling ("clustering").
Then, given a new sample, classify it as oak, maple, or mahogany.
[Figure: samples plotted by color and texture form an oak cluster and a maple cluster; a new point must be assigned to a cluster.]

An algorithm to produce 2 clusters
Some notions:
The mean ("center of gravity") of k points (x1, y1), (x2, y2), ..., (xk, yk) is
  ((x1 + x2 + ... + xk) / k, (y1 + y2 + ... + yk) / k).
The distance between points (x1, y1) and (x2, y2) is
  sqrt((x1 - x2)^2 + (y1 - y2)^2).

2-means algorithm
Start by randomly breaking the points into 2 clusters.
Repeat many times:
{
  Compute the means of the current two clusters, say (a, b) and (c, d).
  Reassign each point to the cluster whose mean is closest to it; this changes the clustering.
}
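
The 2-means loop above translates almost directly into code. The following is a minimal Python sketch; the toy wood-sample data, the fixed number of rounds (the slide just says "repeat many times"), and the guard against an empty cluster are assumptions added here, not part of the lecture.

    import random

    def mean(points):
        # "Center of gravity" of a list of (x, y) points.
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def dist2(p, q):
        # Squared Euclidean distance. Comparing squared distances picks
        # the same nearest mean as comparing true distances.
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    def two_means(points, rounds=20):
        # Start by randomly breaking the points into 2 clusters.
        points = points[:]
        random.shuffle(points)
        clusters = [points[:len(points) // 2], points[len(points) // 2:]]
        for _ in range(rounds):
            m0, m1 = mean(clusters[0]), mean(clusters[1])
            # Reassign each point to the cluster whose mean is closest to it.
            clusters = [[], []]
            for p in points:
                clusters[0 if dist2(p, m0) <= dist2(p, m1) else 1].append(p)
            if not clusters[0] or not clusters[1]:
                break  # degenerate case: one cluster grabbed every point
        return clusters

    # Toy data on the (brownness, texture) scales above; values are made up.
    samples = [(2, 3), (3, 2), (2, 2), (8, 7), (7, 8), (8, 8)]
    print(two_means(samples))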
What about learning a more complicated object?
Speech? Motion? Handwriting? These use a similar data representation, but with more "dimensions".

One major idea: modeling uncertainty using probabilities
Example: did I just hear "ice cream" or "I scream"?
Assign probability 1/2 to each, then listen for the subsequent phoneme. If it is "is", use knowledge of usage patterns to increase the probability of "ice cream" to 0.9.

Spam filtering
How would you define spam to a computer?
Descriptive approach: "Any email in ALL CAPS, unless it's from my kid brother, or that contains the word 'mortgage', unless it's from my real estate agent, ..." It is difficult to come up with a good description!
Learning approach: "train" the computer with labeled examples of spam and non-spam (a.k.a. ham) email. Examples of spam are easy to find: you probably get hundreds a day!

Given: a spam corpus and a ham corpus.
Goal: determine whether a new email is spam or ham.

Step 1: Assign a "spam score" to each word:
  SpamScore(word) = Fspam(word) / Fham(word),
where Fspam(word) is the fraction of emails in the spam corpus that contain the word, and Fham(word) is the fraction of emails in the ham corpus that contain it.
Observe: SpamScore(word) > 1 if the word is more prevalent in spam, and SpamScore(word) < 1 if it is more prevalent in ham.

Step 2: Assign a "spam score" to the email:
  SpamScore(email) = SpamScore(word1) x ... x SpamScore(wordn),
where wordi is the ith word in the email.
Observe: SpamScore(email) >> 1 if the email contains many spammy words, and SpamScore(email) << 1 if it contains many hammy words.

Step 3: Declare the email to be spam if SpamScore(email) is high enough.

Advantages of this type of spam filter:
• Though simple, it catches 90+% of spam!
• No explicit definition of spam is required.
• It is customized to your email.
• It is adaptive: as spam changes, so does the filter.
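
The three steps map onto three small functions. Here is a minimal Python sketch, assuming emails are whitespace-split into words; the epsilon smoothing (so a word absent from the ham corpus does not divide by zero) and the decision threshold are assumptions added to make it runnable, since the slide only says "high enough".

    def fraction_containing(corpus, word):
        # F(word): fraction of emails in the corpus that contain the word.
        return sum(word in email.split() for email in corpus) / len(corpus)

    def spam_score_word(word, spam_corpus, ham_corpus):
        # Step 1: SpamScore(word) = Fspam(word) / Fham(word).
        # The epsilon is an assumption added here, not in the lecture.
        eps = 1e-6
        f_spam = fraction_containing(spam_corpus, word)
        f_ham = fraction_containing(ham_corpus, word)
        return (f_spam + eps) / (f_ham + eps)

    def spam_score_email(email, spam_corpus, ham_corpus):
        # Step 2: multiply the per-word scores together.
        score = 1.0
        for word in email.split():
            score *= spam_score_word(word, spam_corpus, ham_corpus)
        return score

    def is_spam(email, spam_corpus, ham_corpus, threshold=100.0):
        # Step 3: declare the email spam if its score is high enough.
        # The threshold is arbitrary here; a real filter would tune it.
        return spam_score_email(email, spam_corpus, ham_corpus) > threshold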
Text synthesis (very simplistic version!)
Idea: use example text to generate similar text.
Input: the 2007 State of the Union Address.
Output: "This war is more competitive by strengthening math and science skills. The lives of our nation was attacked, I ask you to make the same standards, and a prompt up-or-down vote on the work we've done and reduce gasoline usage in the NBA."

How it works: output one word at a time.
1. Let (v, w) be the last two words output.
2. Find all occurrences of (v, w) in the input text.
3. Of the words following the occurrences of (v, w), output one at random.
4. Repeat.
Variant: use the last k words instead of the last two.
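
This procedure is a word-level Markov chain and fits in a few lines of Python. In the sketch below, the table of followers is precomputed; seeding the output with the first two words of the input is an assumption, since the slide does not say how to start.

    import random

    def synthesize(input_text, num_words=50):
        words = input_text.split()
        # Table: (v, w) -> all words that follow the pair (v, w) in the input.
        follows = {}
        for i in range(len(words) - 2):
            follows.setdefault((words[i], words[i + 1]), []).append(words[i + 2])
        # Assumption: seed the output with the first two words of the input.
        v, w = words[0], words[1]
        output = [v, w]
        for _ in range(num_words):
            candidates = follows.get((v, w))
            if not candidates:
                break  # (v, w) occurs only at the very end of the input
            nxt = random.choice(candidates)  # step 3: pick a follower at random
            output.append(nxt)
            v, w = w, nxt  # the last two words output become the new (v, w)
        return " ".join(output)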
Handwriting recognition [LeCun et al., AT&T, 1998]
The LeNet-5 system was trained on a database of 60,000 handwritten digits, and it reads about 10% of all the checks cashed in the USA.
• It can recognize weird styles.
• It can handle stray marks and deformations.
• Its mistakes are usually on ambiguous digits anyway.

Aside: how do you get large amounts of data? (A major problem in ML.)
• Answer 1: use existing corpuses (Lexis-Nexis, or the WWW for text).
• Answer 2: create new corpuses by enlisting people in fun activities. (Recall the image-labeling game in Lab 1.)

Example: SAT analogies
Bird : Feathers :: Fish : ____
Idea: search the web to learn relationships between words. [Turney 2004]
Is the answer above "water" or "scales"? The most common phrases on the web are "bird has feathers", "bird in air", "fish has scales", and "fish in water". Conclusion: the right answer is "scales".
On a set of 374 multiple-choice SAT analogies, this approach got 56% correct; high-school seniors on the same set got 57% (!). A mark of "Scholastic Aptitude"?
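
As a toy illustration of the idea (this is not Turney's actual algorithm, which is considerably more sophisticated), one could compare candidate answers by how often joining phrases occur on the web. The phrase_count helper below is hypothetical: it stands in for something like a search engine's hit count.

    def solve_analogy(a, b, c, candidates, phrase_count):
        # Solve "a : b :: c : ?", e.g. a="bird", b="feathers", c="fish".
        # phrase_count(phrase) is a hypothetical helper standing in for a
        # web search hit count; it is not a real library call.
        joining_words = ["has", "in", "with", "of"]
        # Find the joining word that best captures the a-b relationship:
        # "bird has feathers" should outnumber "bird in feathers" on the web.
        best_join = max(joining_words, key=lambda j: phrase_count(f"{a} {j} {b}"))
        # Choose the candidate that fits the same relationship with c:
        # "fish has scales" should outnumber "fish has water".
        return max(candidates, key=lambda d: phrase_count(f"{c} {best_join} {d}"))

    # e.g. solve_analogy("bird", "feathers", "fish", ["water", "scales"], phrase_count)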
Image labeling [Blei et al., 2003]
Another solution: learn captions from examples. The system was trained on a Corel database of 6,000 images with captions, then applied to images without captions. (Blei is a Princeton prof!)

Helicopter flight [Abbeel et al., 2005]
Idea:
