DOC PREVIEW
Princeton COS 116 - Lecture

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Seek and Ye shall Find The continuum of computer intelligence COS 116 Spring 2012 Adam Finkelstein Recap Binary Representation Powers of 2 20 21 22 23 24 25 26 27 28 29 210 1 2 4 8 16 32 64 128 256 512 210 1024 103 Fact Every integer can be uniquely represented as a sum of powers of 2 Ex 25 16 8 1 1 x 24 1 x 23 0 x 22 0 x 21 1 x 20 25 2 11001 1024 Misconceptions about Computers Weather Forecast Just a calculator on steroids Just maintains large amount of data Just does what the programmer tells it Airline Reservation System Yes but Various meanings of Look up Shirley Tilghman in online phonebook In consumer database find credit worthy consumers Find web pages relevant to computer music Among all cell phone conversations originating in Country X identify suspicious ones Search all religion and philosophy books of the world for meaning of life Data Mining Web Search These are major scientific problems with many components Algorithms Engineering Linguistics Statistical Modeling Ethics Policy Society Discussion Time How do you solve this task Sorted array of n numbers find if it contains 58780 Binary search First thing to check Is A n 2 58780 Whatever the answer you halve the range Question What if the array of numbers is not sorted Looking up Shirley Tilghman in Electronic Phonebook T i ASCII Agreed upon convention for representing letters with numbers Example Ideas l g h m a n 2 5 8 6 1 0 0 84 105 108 103 104 109 97 110 44 50 53 56 45 54 49 48 48 Sorted Phonebook sorted array of numbers Use binary search prev slide Rest of the lecture Web Search Future lecture Internet physical infrastructure underlying Web Routers gateways DNS any computer can send a msg to any other What is World Wide Web Files residing on servers that are connected to internet URL uniform resource locator basically an address A file index html in public html directory on some server belonging to PU hyperlinks URL of other files May be on another server Logical Structure of the Web Directed graph edges link from one node to another Important This logical structure is created by independent actions of 100s of millions of users 1st step for search engines create snapshot of the web Webcrawler browser on autopilot Maintains array of web pages it has seen 2 types of pages visited fully explored Do forever Pick any webpage marked visited from array Mark it fully explored Open all its linked pages in browser Save them in array and mark them visited Better just the pages not visited yet First Web Crawler From bp cs washington edu Brian Pinkerton Newsgroups comp infosystems announce Subject The WebCrawler Index A content based Web index Date 11 June 1994 21 33 42 GMT Organization University of Washington The WebCrawler Index is now available for searching The index is broad it contains information from as many different servers as possible It s a great tool for locating several different starting points for exploring by hand The current index is based on the contents of documents located on nearly 4000 servers world wide Check it out at http www biotech washington edu WebCrawler WebQuery html Other information is available from there including a description of the WebCrawler the robot itself and a list of the 25 most frequently referenced sites on the Web Brian Pinkerton Dept of Computer Science and Engineering University of Washington http thinkpink com bp WebCrawler History html Still Feasible Today Still Feasible Today bestbuy com 2 18 2010 2TB for 120 as of 2 23 2012 Still Feasible Today More than 1 trillion web pages now 1 terabyte 1012 byte disk cost 100 Say 10 kb 10 000 bytes of data per page 1 petabyte 1015 bytes to store the web 1 000 disks 100 000 in 2010 Searching for computer music Ideas Identify all pages that contain computer music Sort according to number of occurrences of computer music in the page Human staff computes answers to all possible questions Some pitfalls Spamming by unscrupulous websites Synonymy car auto vehicle Polysemy jaguar car or cat Solution IBM s CLEVER 1996 Google s PAGERANK 1997 Take advantage of the link structure of the web Web link confers approval CLEVER Authorities Sites that are viewed with respect by many New York Times International Computer Music Association Hubs Clearinghouses of information My favorite computer music links Typically Authorities point to hubs and hubs point to authorities Circular Definition Circular Definition see Definition Circular Breaking Circularity Iterative algorithm Pages containing Computer music Start with All pages they point to At every step each page has Hub Score Authority Score Initially all 1 Score Calculation Do forever Next Hub Score for page Sum of current Authority Scores of pages that link to it Next Authority Score for page Sum of current Hub Scores of pages that link to it Fact The scores converge Proof uses Linear Algebra Eigenvalues Computer models and jurisprudence Aug 25th 2005 Fowler and Jeon 05 By product of CLEVER algorithm it reveals clusters Pro Choice Example Abortion Pro Life Data Mining Process of finding answers that are not in the data and must be inferred Example How is a person who shops at Whole Foods REI likely to vote Concerns From users Privacy Privacy Privacy From Computer scientists Formalize privacy How to safeguard privacy while allowing legitimate computations Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences top prize 1M Trends in web search Algorithms to guess what user generating the query had in mind using AI Psychology User History News tracking Seamless integration with e commerce and click based revenue harvesting interesting meeting point of economics and computer science Semantic web Allow users to attach meaning to web based documents allowing search engines to make sense of them Shape of things to come http shape cs princeton edu search html Next week What computers cannot do


View Full Document

Princeton COS 116 - Lecture

Documents in this Course
Lecture 5

Lecture 5

15 pages

lecture 7

lecture 7

22 pages

Lecture

Lecture

32 pages

Lecture

Lecture

16 pages

Midterm

Midterm

2 pages

Lecture

Lecture

23 pages

Lecture

Lecture

21 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

21 pages

Lecture

Lecture

50 pages

Lecture

Lecture

19 pages

Lecture

Lecture

28 pages

Lecture

Lecture

32 pages

Lecture

Lecture

23 pages

Lecture

Lecture

21 pages

Lecture

Lecture

19 pages

Lecture

Lecture

22 pages

Lecture

Lecture

21 pages

Logic

Logic

20 pages

Lab 7

Lab 7

9 pages

Lecture

Lecture

25 pages

Lecture 2

Lecture 2

25 pages

lecture 8

lecture 8

19 pages

Midterm

Midterm

5 pages

Lecture

Lecture

26 pages

Lecture

Lecture

40 pages

Lecture 3

Lecture 3

37 pages

lecture 3

lecture 3

23 pages

lecture 3

lecture 3

20 pages

Lecture

Lecture

21 pages

Lecture

Lecture

24 pages

Lecture

Lecture

19 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?