Stanford CS 106B - Handout #3 - Tree and Graph Algorithms - D588006

Home> Schools> Stanford University> Computer Science (CS) > CS 106B> Handout #3 - Tree and Graph Algorithms

DOC PREVIEW

Stanford CS 106B - Handout #3 - Tree and Graph Algorithms

School name Stanford University

Course Cs 106b- Programming Abstractions

Pages 3

This preview shows page 1 out of 3 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 3 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 3 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Eric Roberts Handout #53CS 106B May 29, 2009Tree and Graph AlgorithmsMore Algorithmsfor Trees and GraphsEric RobertsCS 106BMay 29, 2009Outline for Today• The plan for today is to walk through some of my favorite treeand graph algorithms, partly to demystify the many real-worldapplications that make use of those algorithms and partly toemphasize the elegance and power of algorithmic thinking.• These algorithms include:– Google’s Page Rank algorithm for searching the web– The Directed Acyclic Word Graph format used in Lexicon– The heap data structure used to create efficient priority queues• In an optional lecture next Monday, Keith Schwarz and I willtalk about using C++ in the “real world” outside of CS106.Google’s Page Rank Algorithm The Page Rank AlgorithmABDECStart with a set of pages.1.The Page Rank AlgorithmABDECCrawl the web to determine the link structure.2.The Page Rank AlgorithmABDECAssign each page an initial rank of 1 / N.3.0.2 0.20.20.20.2– 2 –The Page Rank AlgorithmSuccessively update the rank of each page by adding up theweight of every page that links to it divided by the numberof links emanating from the referring page.4.DEC0.20.20.2• In the current example, page Ehas two incoming links, onefrom page C and one frompage D.• Page C contributes 1/3 of itscurrent page rank to page Ebecause E is one of three linksfrom page C. Similarly, pageC offers 1/2 of its rank to E.• The new page rank for E isPR(E) = PR(C) PR(D) 32+0.2 3+0.2 2=0.17The Page Rank AlgorithmIf a page (such as E in the current example) has no outwardlinks, redistribute its rank equally among the other pages inthe graph.5.DEC0.20.20.2• In this graph, 1/4 of E’s pagerank is distributed to pages A,B, C, and D.• The idea behind this model isthat users will keep searchingif they reach a dead end.The Page Rank AlgorithmABDECApply this redistribution to every page in the graph.7.0.28 0.150.180.170.22The Page Rank AlgorithmABDECRepeat this process until the page ranks stabilize.8.0.26 0.170.170.160.23The Page Rank AlgorithmABDECIn practice, the Page Rank algorithm adds a damping factorat each stage to model the fact that users stop searching.9.0.25 0.170.180.170.22Page Rank Relies on Mathematics– 3 –Directed Acyclic Word Graphs•The Lexicon class provides ahighly time- and space-efficientrepresentation of a word list.• The data representation used inthe Lexicon class is called aDirected Acyclic Word Graphor DAWG, which was firstdescribed in a 1988 paper byAppel and Jacobson.• The DAWG structure is basedon a much older data structurecalled a trie, developed by EdFredkin in 1960. (The trie datastructure is described in the texton page 490.)The Trie Data Structure• The trie representation of a word list uses a tree in which eacharc is labeled with a letter of the alphabet. In a trie, the wordsthemselves are represented implicitly as paths to a node.Going to the DAWGs• The new insight in the DAWG is that you can combine nodesthat represent common endings.• As the authors note, “this minimization produces an amazingsavings in space; the number of nodes is reduced from117,150 to 19,853.The Heap Algorithm• As they are implemented in the starter folder for Assignment#6, priority queues take O(N) time. Because priority queuesare at the center of both Dijkstra’s and Kruskal’s algorithms,making them more efficient will improve performance.• The standard algorithm for implementing priority queues usesa data structure called a heap, which makes it possible toimplement priority queue operations in O(log N) time.• A heap is a binary tree with three additional properties:– The tree is complete, which means that it is not only completelybalanced but that each level of the tree is filled as far to the leftas possible.– The root node of the tree has higher priority than the root ofeither of its subtrees.– Every subtree is also a heap.Tracing the Heap AlgorithmInsert in order: 17 93 20 42 68 11Tracing the Heap AlgorithmInsert in order: 17 93 20 42 68 11Dequeue

View Full Document