DOC PREVIEW
MIT 6 006 - Lecture Notes

This preview shows page 1-2-21-22 out of 22 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 22 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

6.006- Introduction to AlgorithmsLecture 1Prof. Constantinos DaskalakisToday’s Menu• Motivation • Administrivia• Course Overview • Linked Lists and Document Distance• Intro to “Peak Finding”“Al-go-rithms”…wha?• Remember Logarithms?– they have nothing to do with Algorithms• Well specified method for solving a problem using a finite sequence of instructions• Description might be English, Pseudocode, orreal code• Key: no ambiguityAl-Khwārizmī (780-850)Efficient Algorithms: Why?• Solving problems consumes resources that areoften limited/valuable:– Time: Plan a flight path– Space: Query a database– Energy: Save money• Bigger problems consume more resources• Need algorithms that “scale” to large inputs, e.g. searching the web…Efficient Algorithms: How?• Define problem: – Unambiguous description of desired result• Abstract irrelevant detail– “Assume the cow is a sphere”• Pull techniques from the “algorithmic toolbox”– [CLRS] class textbook• Implement and evaluate performance– Revise problem/abstraction• Generalize – Algorithm to apply to broad class of problemsAdministrivia• Handout: course info• Profs: Daskalakis, Jaillet• TAs: Goldstein, Griner, Bhattacharya, Madry• Sign up for class at https://sec.csail.mit.edu/to get a recitation assignment• Prereqs: 6.01, 6.042• Python• Grades: Problem sets (30%) Quiz1 (Oct 13: 7.30-9.30pm; 20%)Quiz2( Nov 17: 7.30-9.30pm; 20%)Exam (30%)• Read collaboration policy!Content• 8 modules with motivating problem/pset• Linked Data Structures: Document Distance• Divide&Conquer: Peak Finding• Hashing: Efficient File Update/Synchronization• Sorting • Graph Search: Rubik‟s Cube• Shortest Paths: Google Maps• Dynamic Programming: print justification• Numerical Algorithms: linear systemsDocument Distance• Given 2 documents, how similar are they?– if one “document ” is a query, this is web search– find “similar documents” to a given one– detect plagiarism• Goal: algorithm to compute similarity– Actually, we‟ll compute “distance” = 1/similarityProblem Definition• Need unambiguous definition of similarity• We define in terms of “distance”• Word: sequence of alpha characters– Ignore punctuation, formatting• Document: sequence of words• Word frequencies:D(w) is number of occurences of w in D• Similarity based on amount of word overlapVector Space Model• [Salton, Wang Yang 1975]• Treat each doc as a vector of its words– one coordinate per word of the English dictionarye.g. doc1 = “the cat”doc2 = “the dog”„the‟„cat‟„dog‟111– similarity by dot‐product – trouble: not scale invariantdocuments “the the cat cat” and “the the dog dog” will appear closer than doc1 and doc2d1 d2 = 1Vector Space Model• Solution: Normalization– divide by the length of the vectors– measure distance by angle:e.g. θ=0 documents “identical”(if of the same size, permutations of each other)θ=π/2 not even share a wordAlgorithm• Read file• Make word list (divide file into words)• Count frequencies of words• Compute dot product– for every word in the first document, check if it appears in the other document; if yes, multiply their frequencies and add to the dot product• worst case time: order of #words(D1) x #words(D2)– micro-optimization:• sort documents into word order (alphabetically)• compute inner product in time #words(D1) + #words(D2)Python Implementation• Docdist1.py (see handout)• Read file: read_file(filename)– Output: list of lines (strings)• Make word list: get_words_from_line_list(L)– Output: list of words (array)• Count frequencies: count_frequency(word list)– Output: list of word‐frequency pairs• Sort into word order: insertion_sort()– Output: sorted list of pairs• Dot product: inner_product(D1, D2)– Output: numberInputs:• Jules Verne: 25K• Bobsey Twins: 268K• Francis Bacon: 324K• Lewis and Clark: 1M• Shakespears: 5.5M• Churchill: 10MProfiling• Tells how much time spent in each routine– import profile– profile.run(“main()”)• One line per routine reports1. #calls2. #total time excluding subroutine calls3. Time per call (#2/#1)4. Cumulative time, including subroutines5. Cumulative per call (#4/#1)What’s with +?• L=L1+L2 is concatenation of arrays• Take L1 and L2• Copy to a bigger array• Time proportional to sum of lengths• Suppose n one-word lines• Time 1+2+…+n = n(n+1)/2 = (n2)Solution• word_list.extend(words_in_line) : appends list named “words_in_line” to list named “word_list”• Takes time proportional to length of list “words_in_line”• Total time in example of n one-word lines: (n)• resulting improvement:– get_words_from_line_list 23s0.12sOther Improvements• Docdist4.py:– Instead of inserting words in list, insert in dictionary: total to 42s• 5.py:– Process words instead of chars: to 17s• 6.py: merge sort instead of insertion: 6s• 7.py: dictionary (again) instead of sort: 0.5sNext time: Peak Finding• Array of numbers• Find one that is bigger than its neighbors• A local


View Full Document

MIT 6 006 - Lecture Notes

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?