Hierarchical Comments-Based Clustering

Chiao-Fang Hsu
Department of Computer Science and Engineering
Texas A&M University
College Station, TX [email protected]

Caverlee
Department of Computer Science and Engineering
Texas A&M University
College Station, TX [email protected]

Khabiri
Department of Computer Science and Engineering
Texas A&M University
College Station, TX [email protected]

Information resources on the Web like videos, images, and documents are becoming increasingly “social” through user engagement via commenting systems. These commenting systems provide a forum for users to discuss the resources but have the side effect of providing valuable editorial and contextual information about the resources. In this paper, we explore a comments-driven clustering framework for organizing Web resources according to this user-based perspective. Concretely, we propose a hierarchical comment clustering approach that relies on two key features: (i) comment term normalization and key term extraction for distilling noisy comments for effective clustering; and (ii) a real-time insertion component for incrementally updating the comments-based hierarchy so that resources can be efficiently placed in the hierarchy as comments arise and without the need to re-generate the (potentially) expensive hierarchy. We study the clustering approach over the popular video sharing site YouTube. YouTube is a challenging environment, notorious for its extremely short, ill-formed, and often unintelligible user-contributed comments. Through extensive experimental study, we find that the proposed approach can lead to effective and efficient comments-based video organizing even in a YouTube-like environment.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’11 March 21-25, 2011, TaiChung, Taiwan.
Copyright 2011 ACM 978-1-4503-0113-8/11/03 ...$10.00.

1. INTRODUCTION

One of the cornerstones of emerging participatory information environments – like Web 2.0 social news aggregators, social media sites, digital libraries incorporating social computing features, etc. – is the emphasis on user-driven commenting and discussion. By encouraging users to comment, resources in these systems (like videos, images, news articles) can become “social” resources that reflect the attitudes and interests of the community of users in a way that may depart from the viewpoint of system experts, editors, and the content of the underlying information resource itself. Popularized by weblogs, commenting systems are now in wide use by major media (e.g., NYTimes), social media sites (e.g., YouTube, Flickr), and other participatory environments. This rising prevalence of user-contributed comments is inspiring new approaches for enhancing how users view and access information resources in these systems. As an example, NYTimes now prominently features highly-rated comments as an added dimension to its articles. From a search perspective, researchers are examining techniques for retrieving and ranking information resources via comments [23].

Since user-contributed comments provide a potentially rich source of contextual information, we are interested in studying whether these comments can be used to automatically self-organize a collection of information resources. In this way, the comments themselves may provide a “semantic overlay” that groups similar resources by the collaborative user perspective encoded in the comments (rather than editorial grouping, e.g., into “News” or “Sports”).
Our vision is a self-organizing collaborative information sharing space where user comments are automatically reflected in how resources are organized. Concretely, we study one popular approach for organizing resources – hierarchical clustering. Hierarchical clustering has been widely studied in the context of structuring text documents (like Web pages and email) [1, 6, 11, 19, 21] and has shown success in improving information search and browsing [4, 15, 22].

Automatically clustering resources by their comments is challenging, however. Comments are typically free-form and highly unstructured with users engaging in a wide variety of commenting purposes, including: (i) describing the underlying resource; (ii) engaging in a back-and-forth discussion with other users; (iii) expressing emotional reaction (e.g., “Awesome!”); (iv) providing new perspective and pointers (e.g., summarizing a related article and adding a hyperlink). In addition to the variation in purpose and substance, comments are often syntactically “messy” with a huge variation in quality and style. Spelling errors (both intentional and not), grammatical errors, typos, and shorthand are all typical of the comments generated by a large group of (typically) volunteer commenters. These challenges suggest that effective clustering may be dependent on high-quality comment distillation – for finding the “essence” of a community’s collective comments.
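To make the idea of comment distillation followed by hierarchical clustering concrete, the following is a minimal sketch and not the method of this paper, whose details are not given in this excerpt. It assumes a tiny hypothetical shorthand table and stopword list, uses smoothed TF-IDF as a stand-in for key term extraction, and uses plain average-link agglomerative clustering over cosine similarity. All function names and the toy comments are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical lookup tables; a real system would need far larger ones.
SHORTHAND = {"u": "you", "gr8": "great", "luv": "love", "thx": "thanks"}
STOPWORDS = {"the", "a", "an", "is", "this", "and", "of", "to", "i", "you", "it", "so"}


def normalize(comment):
    """Lowercase, collapse runs of 3+ identical characters to one,
    expand shorthand, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", comment.lower())
    cleaned = []
    for tok in tokens:
        tok = re.sub(r"(.)\1{2,}", r"\1", tok)  # "awesooome" -> "awesome"
        tok = SHORTHAND.get(tok, tok)
        if tok not in STOPWORDS:
            cleaned.append(tok)
    return cleaned


def key_terms(resources, k=3):
    """Map each resource id to a sparse vector of its top-k TF-IDF terms."""
    docs = {rid: Counter(t for c in comments for t in normalize(c))
            for rid, comments in resources.items()}
    n = len(docs)
    df = Counter(t for counts in docs.values() for t in counts)
    vectors = {}
    for rid, counts in docs.items():
        # Smoothed IDF so terms shared by every resource keep a nonzero weight.
        weights = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, tf in counts.items()}
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        vectors[rid] = dict(top)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors):
    """Average-link agglomerative clustering; returns a nested-tuple tree."""
    clusters = [((rid,), rid) for rid in sorted(vectors)]  # (members, subtree)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i][0] for b in clusters[j][0]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  (clusters[i][1], clusters[j][1]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]


# Toy data: four hypothetical videos with short, noisy comments.
resources = {
    "v1": ["Gr8 guitar solo!!!", "luv this guitar riff"],
    "v2": ["awesooome guitar cover", "guitar solo is gr8"],
    "v3": ["cute kitten", "the kitten is sooo cute"],
    "v4": ["kitten video, thx", "cute cute kitten"],
}
vectors = key_terms(resources)
hierarchy = cluster(vectors)
```

On this toy data the two guitar videos and the two kitten videos end up in separate subtrees of `hierarchy`. The pairwise merge loop is quadratic per step and is meant only for illustration; rebuilding such a hierarchy on every new comment would be costly at scale.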
Additionally, since comments themselves are dynamic (with comments being written at unknown time intervals and reflecting the changing perspective of different commenters), any proposed automatic clustering approach should be designed to balance stable resource clustering (by considering all possible comments) with timely resource clustering (by immediately organizing a resource according to the first comment).

With these challenges in mind, we present a comments-based hierarchical clustering approach for organizing information resources. Two of the salient features of the proposed approach are (i) comment term normalization and key term extraction for distilling noisy comments for effective clustering; and (ii) a real-time insertion component for incrementally updating the comments-based hierarchy so that resources can be efficiently placed in the hierarchy as comments arise and without the need to re-generate the (potentially) expensive hierarchy. Concretely, we study the clustering approach over the popular video sharing site YouTube. YouTube is a challenging and difficult environment, notorious for its extremely short,

