DOC PREVIEW
CR MATH 45 - Calculating Web Page Authority Using the PageRank Algorithm

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Calculating Web Page AuthorityUsing the PageRank AlgorithmMath 45, Fall 2005Levi Gill and Jacob Miles Prystowsky1IntroductionIn 1998 a phenomenon hit the World Wide Web: Google opened itsdoors. Larry Page and Sergey Brin, the creators of the Google searchengine, had come up with a method to allow everyday users to searchbillions of web pages and get accurate and relevant results.Many of Google’s search methods have been widely spe culated about,but some fundamentals of their search technology are well known, andone of the best known is PageRank.2What is PageRank?Search Engines are responsible for:1. Indexing the Web.2. Finding relevant results based on search queries.3. Displaying pages in hierarchical order based on their authority.Authority ranking is what PageRank is all about.3PageRank is calculated based off the premise that every pageu has 1 vote to spe nd.− Each page u links to receives an equal portion of u’s vote.− A page u’s rank depends on the number of pages linking tou, and their ranking.− The more forward links (pages you link to), the less each ofyour link is worth.− The fewer backlinks (pages linking to you), the lower yourrank.4Some of the most influential factors on your PageRank is theauthority and number of Backlinks.− If a page with a high PageRank links to you, even if thatpage only give s you a minuscule portion of their vote, youhave a good chance of significantly increasing your PageR-ank.− If a lot of little pages link you, your PageRank will alsoincrease.For several reasons, it is hard to manipulate a web page’sPageRank.5A Simple ExampleLet’s look at a contrived system of our math Teachers’ websites:Dave Arnold Bruce WagnerMike Butler Kevin YokoyamaTami MatsumotoErin WallFigure 3.1 Math Teacher’s Home Pages6From the previous flow chart we derive this adjacency matrix(where the rows are the forward links, and the columns are thebacklinks):A =0 1 1 1 0 01 0 0 1 0 01 1 0 0 1 01 1 1 0 0 00 1 1 1 0 11 1 1 1 1 0(3.1)Note: A diagonal entry is a page linking to itself. The assump-tion we make is that these links are thrown out.7Summation MethodThe adjacency matrix that we have derived is now used in thesummation method to calculate the PageRank, and is iterateduntil the values converge.P ageRank = αE(u) + αXv∈BuP ageRank(v)NvWhere,− u is a web page.− Nv is the the numbe r of forward links into u.8− Bu is the vector of backlinks into u (columns ofmatrix 3.1).− α is the convergence constant, or “normalization factor.”− E is the “Uniform Rank Source.”The result of using this method onmatrix 3.1 is as follows:P ageRank =DaveBruceMikeKevinErinTami=1.46191.39131.08411.30550.50080.2564.9Figure 4.1 PageRank Output10Problems With the Summation MethodThese results are nice, and we can work with them. Howev-er, the summation method does have some problems.1. Dangling NodesA dangling node is a page that is that has no forwardlinks. The consequences of dangling nodes is that theyare “rank sinks” for PageRank, because they obtainPageRank and don’t give any back.On a mathematical level, Dangling Nodes appear as arow of zeros, and when applied to our formula causesdivision by zero. So a better method is needed.112. Huge ComputationsThe summation method may work fine for a 6 × 6 ma-trix, but what about Google’s case with more than8 billion pages? This method is too rudimentary.3. Mathematically ImpreciseAs we will show soon, the answers provided by thismethod are only scalar multiples of the true PageRankvector.12Google’s MethodThe method that is preferred by Google, the power method,easily works around the issues we’ve presented. To start, werow-normalize the adjacency matrix A:Hi=AiPnk=1AikWhich divides each row by its sum. Now we apply this tothe power method, which is defined as:π(k)T= απ(k−1)TH + (απ(k−1)Ta + (1 − α))eT/nA Matlab file was written to take an adjacency matrix anditerate the power method 100 times, resulting in the vectorπT.13Results of the Power Method? The power method effectively addresses dangling nodesby replacing the entire row of zeros with 1/n. Whatthis means is that when a person browsing the web findsthemselves stuck on a page with no forward links, thereis a uniform probability of that person “teleporting” toan unrelated page to escape.? The power method is fairly inexpensive for heavy com-putations, because H is a sparse matrix.? It is possible to prove that πTis exactly the PageRankvector, not a scalar multiple.14The Page Rank of online.redwoods.eduTo demonstrate the effectiveness and ease of using thepower method, a “spider” was programmed to “crawl”http://online.redwoods.edu and collect all the validlinks in that server. We have analyzed the results.15Top Five1. .../darnold/laproj/fall98/jodlynn/sld001.htm2. .../darnold/laproj/fall98/skymeg/splinepres/sld001.htm3. .../darnold/laproj/fall98/jodlynn/tsld001.htm4. .../darnold/laproj/fall98/jodlynn/index.htm5. .../bwagner/math30/index.htmBottom Five1. .../darnold/deproj/sp99/paulc/mypaper2.pdf2. .../darnold/deproj/sp05/bodenmike/presentation.pdf3. .../darnold/deproj/sp04/jaimiemike/draft4.pdf4. .../darnold/deproj/sp05/atrav/pendulumpresentation.pdf5. .../darnold/deproj/sp00/franscott/finalpaper.pdf16Figure 8.1 online.redwoods.edu17AcknowledgmentsWe would both like to give our sincerest appreciation toDave Arnold for both his time, going above and be yond thecall of duty, and his dedication to excellent teaching.Also, we would also like to thank Bruce Wagner for his re-search assistance and resources, and his willingness to helpus understand this material, even though we were not en-rolled in any of his


View Full Document

CR MATH 45 - Calculating Web Page Authority Using the PageRank Algorithm

Documents in this Course
Load more
Download Calculating Web Page Authority Using the PageRank Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Calculating Web Page Authority Using the PageRank Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Calculating Web Page Authority Using the PageRank Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?