UT INF 385Q - LECTURE NOTES - D2916674

School name University of Texas at Austin

Course Inf 385q- Knowledge Management Systems

Recommender Systems
COMMUNICATIONS OF THE ACM, March 1997/Vol. 40, No. 3

PHOAKS: A System for Sharing Recommendations

Loren Terveen, Will Hill, Brian Amento, David McDonald, and Josh Creter

A collaborative filtering system that recognizes and reuses recommendations.

Finding relevant, high-quality information on the World-Wide Web is a difficult problem. PHOAKS (People Helping One Another Know Stuff) is an experimental system that addresses this problem through a collaborative filtering approach. PHOAKS works by automatically recognizing, tallying, and redistributing recommendations of Web resources mined from Usenet news messages.

The feasibility of automatic recognition of recommendations is supported by empirical results. First, Usenet messages are a significant source of recommendations of Web resources: 23% of Usenet messages mention Web resources, and 30% of these mentions are recommendations. Second, recommendation instances can be machine-recognized with nearly 90% accuracy. Third, some resources are recommended by more than one person. These multi-confirmed recommendations appear to be significant resources for the relevant community. Finally, the number of distinct recommenders of a resource is a plausible measure of resource quality. A comparison of recommended resources with resources in FAQs (lists of Frequently Asked Questions maintained by human topic experts) indicates the more distinct recommenders a resource has, the more likely it is to appear in the FAQs.

PHOAKS is distinguished from other recommender systems by two major design principles: role specialization and reuse. Many recommender systems, particularly ratings-based systems [1, 3, 4], are built on the assumption of role uniformity. They expect all users to do the same types of work in return for the same types of benefits. In the case of ratings-based systems, for example, everyone rates objects of interest.
Yet there is evidence that people naturally prefer to play distinct producer/consumer roles in the information ecology [2]; in particular, only a minority of people expend the effort of judging information and volunteering their opinions to others. Independently, we have observed such role specialization in Netnews; authors volunteer long lists of recommended Web resources at a stable, but low, rate. PHOAKS assumes the roles of recommendation provider and recommendation recipient are specialized and different. PHOAKS reuses recommendations from existing online conversations. This reuse requires no extra work from providers and no judgments of information quality from PHOAKS users, another difference from ratings-based systems.

The PHOAKS system contains six months or more of recommendations and associated data for about 1,500 newsgroups. Thousands of new opinions about Web resources are added weekly.¹

What Counts as a Recommendation?

The basic idea of collaborative filtering is people recommending items to one another. Readers of Usenet news know this is a normal practice in newsgroups. Posters often volunteer their impressions and opinions about all sorts of items, including Web pages. They may state what a page is useful for and how useful it is. PHOAKS searches messages for mentions of Web pages (URLs) and counts a mention as a recommendation if it passes a number of tests. First, the message must not be cross-posted to too many newsgroups. Messages posted to a large number of groups are so general they are not likely to be thematically close to any of the groups. Second, if the URL is part of a poster’s signature or signature file, it is not counted as a recommendation. Third, if the URL occurs in a quoted section of a previous message, it is ruled out. Fourth, if the textual context surrounding the URL contains word markers that indicate it is being recommended and does not contain markers that indicate it is being advertised or announced, then it is categorized as a recommendation.
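
The four tests can be sketched in code. The following is only an illustrative sketch, not the PHOAKS rule set, which the article describes as considerably more complicated and does not publish. The cross-post threshold, the marker word lists, the use of the "-- " line as a signature delimiter, the ">" prefix for quoted lines, and the choice of the URL's own line as its textual context are all assumptions, as are the names `Message` and `find_recommended_urls`.

```python
import re
from dataclasses import dataclass
from typing import List

URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical values: the article does not give its thresholds or markers.
MAX_CROSSPOSTS = 5
RECOMMEND_MARKERS = ("recommend", "check out", "useful", "great", "excellent")
ADVERT_MARKERS = ("announcing", "announce", "for sale", "visit our")


@dataclass
class Message:
    author: str
    newsgroups: List[str]
    body: str


def strip_signature(body: str) -> str:
    """Drop everything after a '-- ' delimiter line (assumed signature marker)."""
    head, _sep, _signature = body.partition("\n-- \n")
    return head


def is_quoted(line: str) -> bool:
    """Treat lines starting with '>' as quoted from a previous message (assumed)."""
    return line.lstrip().startswith(">")


def find_recommended_urls(msg: Message) -> List[str]:
    # Test 1: reject messages cross-posted to too many newsgroups.
    if len(msg.newsgroups) > MAX_CROSSPOSTS:
        return []
    # Test 2: URLs in the signature do not count.
    body = strip_signature(msg.body)
    recommended = []
    for line in body.splitlines():
        # Test 3: URLs in quoted sections are ruled out.
        if is_quoted(line):
            continue
        for match in URL_RE.finditer(line):
            # Test 4: the context (here, the rest of the line) must contain a
            # recommendation marker and no advertisement/announcement marker.
            context = (line[:match.start()] + " " + line[match.end():]).lower()
            has_recommend = any(m in context for m in RECOMMEND_MARKERS)
            has_advert = any(m in context for m in ADVERT_MARKERS)
            if has_recommend and not has_advert:
                recommended.append(match.group(0).rstrip(".,;:!?)"))
    return recommended
```

Each test only ever removes candidates, so the cheapest check (the cross-post count) runs first and the marker matching runs only on URLs that survive the structural filters.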
We have developed rather complicated categorization rules that implement this basic strategy to distinguish the different purposes for which Web resources are mentioned.

In a representative sample of 1.3 million messages processed between February and August of 1996, 23% of the messages mention Web resources, with computer- and science-related groups having a slightly higher percentage and recreational groups a slightly lower percentage.

After much analysis, testing, and iteration of our categorization rules, we have developed a fairly accurate rule set. There are two aspects of accuracy: precision (the percentage of resources the rules classify into a certain category that actually belong to the category) and recall (the percentage of resources that belong to a category that the rules actually classify into that category). A validation study of more than 600 URL mentions shows that our rules for recognizing recommendations have 88% precision and 87% recall.

[Figure 1. The distribution of doubly confirmed recommendations. Axes: Number of Doubly Confirmed Recommendations; Number of Newsgroups.]

[Figure 2. Comparing recommended resources to FAQ resources. Axes: Frequency Rank 1st – 20th (one-person-one-vote); Probability of Appearing in FAQ.]

¹ PHOAKS is available at http://www.phoaks.com/phoaks/. As of December 1996, more than 3,000 visitors access recommendations each day.

How should we rank recommended resources within a newsgroup? In other words, how can we automatically compute an approximate measure of resource quality? We selected the number of distinct recommenders of a resource as a measure. This metric values independent opinions in estimating the worth of a resource. We have done an analysis that focuses on resources with at least three recommenders (“doubly confirmed” recommendations).
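
The two accuracy measures and the ranking metric are simple enough to illustrate in a few lines. This is a minimal sketch under assumptions: the function names, the representation of recommendations as (author, URL) pairs, and the alphabetical tie-break are hypothetical and are not taken from the PHOAKS implementation.

```python
from collections import defaultdict


def precision_recall(predicted, actual):
    """Precision and recall for one category.

    predicted: set of items the rules put in the category.
    actual: set of items that truly belong to the category.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def rank_by_distinct_recommenders(recommendations, min_recommenders=1):
    """Rank URLs in one newsgroup by their number of distinct recommenders.

    recommendations: iterable of (author, url) pairs.
    min_recommenders: keep only URLs with at least this many distinct
    recommenders (3 would select the "doubly confirmed" resources).
    """
    recommenders = defaultdict(set)
    for author, url in recommendations:
        # A set per URL means repeated posts by one author count once.
        recommenders[url].add(author)
    counts = [
        (url, len(authors))
        for url, authors in recommenders.items()
        if len(authors) >= min_recommenders
    ]
    # Highest count first; ties broken alphabetically (an arbitrary choice).
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

Counting authors in a set rather than counting mentions is what makes the measure reflect independent opinions: a single enthusiastic poster cannot raise a resource's rank by repeating the same recommendation.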
Figure 1 shows, for a set of 1,042 newsgroups with at least 20 recommended resources, the number of newsgroups that have from 1 to 20 doubly confirmed recommendations. For example, 429 newsgroups had at least three doubly confirmed recommendations, 217 had at least five, and 68 had at least 10.

How can we tell whether the number-of-recommenders metric is a good one? To try to answer this question, we analyzed the intersection between resources recommended on Usenet (that were not in FAQ messages) and resources in newsgroup FAQs. We obtained FAQs by tailoring the basic PHOAKS message-filtering architecture to identify Usenet messages that posted FAQs. Since FAQs contain the kind of information a
