CS206 --- Electronic CommerceHigh-Level OverviewAbout the CourseIssue: B2B Versus B2CTypical Buyer: DellTechnical ProblemsTechnical Problems 2B2CTypical Seller: AmazonSlide 10Slide 11Finding SellersPage RankStochastic Matrix of the WebExampleRandom Walks on the WebRandom Walks 2Example: The Web in 1839Simulating a Random WalkSlide 20Solving The EquationsReal-World ProblemsMicrosoft Becomes Dead EndSlide 24M’soft Becomes Spider TrapSlide 26Google Solution to Traps, Etc.Ex: Previous with 20% TaxGeneral CaseSolving the EquationsSearch-Engine ArchitectureGoogle Anti-Spam DevicesUse of Page RankHubs and AuthoritiesSlide 35Transition Matrix ASlide 37Using Matrix A for H&AConsequences of Basic EquationsSlide 40Slide 411CS206 --- Electronic CommerceDan BonehYoav ShohamJeff Ullmanothers . . .2High-Level OverviewDiscovering buyers and sellersBuyers finding sellers•Search enginesSellers finding buyers•Data miningMaking a dealAuctionsExecuting the dealPayments, security3About the CourseMinimal prerequisites:CS106, CS107Mathematical and algorithmic “sophistication”Emphasis on technology, not “what you need to know to start your very own dot-com.”4Issue: B2B Versus B2CBusinesses buy/sell on-line.Specialized transactions: RFP, reserve, query inventory, etc.Catalogs support purchases, design.•Integration of supplier catalogs.High-value auctions.e.g., bandwidth for wireless.5Typical Buyer: DellDellDBNeed 10,00060G disksTuesdayVendor1Vendor2Disk model123: 60G6Technical ProblemsTransport standards, e.g. HTTP, RPC.Standards for interpreting messages, e.g., SOAP.What is requested? What is offered? Terms?Lexicons or “ontologies.”Is 60G the same number of bytes always?7Technical Problems 2Integration, wrappers, middleware.Different suppliers have different back-end systems. How do they talk to the hub?Security, authorization.Who is allowed to see what?Who is allowed to make decisions?How do you keep out intruders?8B2CMany more participants.Payment an integral part of the process.Identification, secure transfer.Sellers succeed by helping the buyer search.Massive auction site(s).9Typical Seller: AmazonDatabaseserverApplicationserverWebserverUsersTitle Author PriceHowl G’brg 49.95. . . . . . . . .Queries,Accounts,etc.10Technical ProblemsBalancing DB/Web/App servers, distributing load.Wise use of (Web-page) real estate.Pick a few good things to pitch to the known customer.Requires complex data-mining.•Example: Amazon figured out I like Vivaldi and similar composers. End in “i”? Italian renaissance? Composers bought by others who buy Vivaldi CD’s?11Technical Problems 2Exchange of sensitive information, e.g., credit-card numbers.Keeping stored, personal data secret.Managing auctions.Example: 10 matching placemats for sale.•A: $4/each for <= 4.•B: $3/each for exactly 7.•C: $2/each for <= 6.12Finding SellersA major use of search engines is finding pages that offer an item for sale.How do search engines find the right pages?We’ll study:Google’s PageRank technique and other “tricks”“Hubs and authorities.”13Page RankIntuition: solve the recursive equation: “a page is important if important pages link to it.”In high-falutin’ terms: compute the principal eigenvector of the stochastic matrix of the Web.A few fixups needed.14Stochastic Matrix of the WebEnumerate pages.Page i corresponds to row and column i.M[i,j] = 1/n if page j links to n pages, including page i; 0 if j does not link to i.Seems backwards, but allows multiplication by M on the left to represent “follow a link.”15ExampleijSuppose page j links to 3 pages, including i1/316Random Walks on the WebSuppose v is a vector whose i-th component is the probability that we are at page i at a certain time.If we follow a link from i at random, the probability distribution of the page we are then at is given by the vector Mv.17Random Walks 2Starting from any vector v, the limit M(M(…M(Mv)…)) is the distribution of page visits during a random walk.Intuition: pages are important in proportion to how often a random walker would visit them.The math: limiting distribution = principal eigenvector of M = PageRank.18Example: The Web in 1839YahooM’softAmazony 1/2 1/2 0a 1/2 0 1m 0 1/2 0y a m19Simulating a Random WalkStart with the vector v = [1,1,…,1] representing the idea that each Web page is given one unit of “importance.”Repeatedly apply the matrix M to v, allowing the importance to flow like a random walk.Limit exists, but about 50 iterations is sufficient to estimate final distribution.20ExampleEquations v = Mv:y = y/2 + a/2a = y/2 + mm = a/2ya =m11113/21/25/4 13/49/811/81/26/56/53/5. . .21Solving The EquationsBecause there are no constant terms, these 3 equations in 3 unknowns do not have a unique solution.Add in the fact that y+a+m=3 to solve.In Web-sized examples, we cannot solve by Gaussian elimination; we need to use relaxation (= iterative solution).22Real-World ProblemsSome pages are “dead ends” (have no links out).Such a page causes importance to leak out.Other (groups of) pages are spider traps (all out-links are within the group).Eventually spider traps absorb all importance.23Microsoft Becomes Dead EndYahooM’softAmazony 1/2 1/2 0a 1/2 0 0m 0 1/2 0y a m24ExampleEquations v = Mv:y = y/2 + a/2a = y/2m = a/2ya =m11111/21/23/41/21/45/83/81/4000. . .25M’soft Becomes Spider TrapYahooM’softAmazony 1/2 1/2 0a 1/2 0 0m 0 1/2 1y a m26ExampleEquations v = Mv:y = y/2 + a/2a = y/2m = a/2 + mya =m11111/23/23/41/27/45/83/82003. . .27Google Solution to Traps, Etc.“Tax” each page a fixed percentage at each interation.Add the same constant to all pages.Models a random walk in which surfer has a fixed probability of abandoning search and going to a random page next.28Ex: Previous with 20% TaxEquations v = 0.8(Mv) + 0.2:y = 0.8(y/2 + a/2) + 0.2a = 0.8(y/2) + 0.2m = 0.8(a/2 + m) + 0.2ya =m1111.000.601.400.840.601.560.7760.5361.688 7/11 5/1121/11. . .29General CaseIn this example, because there are no dead-ends, the total importance remains at 3.In examples with dead-ends, some importance leaks out, but total remains
View Full Document