Automatic Identification of User Goals in Web Search (WWW '05)
Uichin Lee (UCLA), Zhenyu Liu (UCLA), Junghoo Cho (UCLA)
Presenter: Emiran Curtmola, UC San Diego
CSE 291, 4/29/2008

Need: improve the quality of a search engine's (SE's) results and the user's browsing experience
  Page ranking
  Result clustering
  Answer presentation
  Snippet selection
  Which hyperlinks to highlight, and how much, to indicate paths to search results (include the distance to relevant pages as the number of clicks)
  Query answers organized to reflect the organization's intranet structure, similar to a table of contents

Understand user Web search behavior
  Understand how and for what people are searching
  Now: understand why users are searching
  What are the goals of users issuing queries on the Web?

Many users, many needs, many goals
  Is there an agreed classification of query goals?
  Do most queries have a detectable goal?
  Is query goal classification feasible?
Characterization of Web query goals
Goal: automatically identify query goals on the Web
  A way for the SE to associate user goals and queries

Taxonomy of query goals (Rose et al., WWW '04; Broder, SIGIR '02)
  Conceptual framework for user goals
  Offline classification of Web queries by manually investigating Web query logs

Show that the majority of Web queries have a detectable, clear goal
Propose a benchmark of queries and their goals
Extract properties of Web queries to predict the query goals
Effectiveness measurements show 80-90% query goal prediction accuracy

Outline
  Taxonomy of query goals
  Degree of identifiability of a query goal via a human subject study
  Automatic identification of query goals
  Experimental measurements

Methodology: user surveys on the AltaVista query log dataset
Taxonomy of Web query goals
  Navigational: the user has a particular Web page in mind
    E.g., "pubmed", "citeseer", "bestbuy", "Stanford"
  Informational: the user intends to visit multiple pages to learn about a topic
    E.g., "hidden markov model"

[Figure residue: 17 60 3 30 3 25 2 23 4 5 6 5 8; chart labels not recoverable]

The user's goal is subjective: can we associate a query with a particular goal without any user feedback?
  Predictable queries: queries that can be clearly classified as navigational or informational
    How many are predictable?
  Unpredictable queries: classification by human observers is bimodal
    How many queries are unpredictable?

50 most popular Google queries issued from the UCLA CS department
28 grad students from the CS department guess the most probable goal for each query
Assumption: their consensus guess is correct
[Diagram labels: Queries, Users, Questionnaire, Query taxonomy, Manual classification, Classification results, Query benchmark]

Use user intention descriptions to classify, on a scale from clear navigational to clear informational
  Choice 1: I already have a particular Website in mind, and my major interest is just to reach that site through the search engine.
  Choice 2: I know there's a particular Website corresponding to this query. However, my interest is not only to reach that site but to visit some other sites returned by the search engine.
  Choice 3: I have no particular Website in mind. I am willing to click on multiple results returned by the search engine.

i(q): percentage of users who indicate that q's goal is informational
[Figure: distribution of i(q), with regions labeled Clear navigational, Unpredictable queries (23), Clear informational]

Breakdown of the 23 queries
  17 queries belong to the following topics
    Software names, e.g., "cygwin", "spybot", "ns2"
    Personal names, e.g., CS researchers
  6 queries on diverse topics
Distribution of software (12) and personal (8) names

Clear separation of informational and navigational queries after removing 20 queries
[Figure: Clear navigational (11), Unpredictable queries (6), Clear informational (12)]

A large fraction of queries are predictable
  Can be associated with a particular goal
  Feasible for automatic goal classification
  Next: use the 30 queries as a benchmark
Most of the unpredictable queries fall into certain topics
  Use topic detection methods to detect the topic

Detect the query goal based on user click behavior and the Web structure
Propose metrics to measure
  Click behavior of other users who previously issued the same query (leverage the SE query log)
    Click distribution
    Average number of clicks per query
  Distribution of HTML links (when the SE does not have enough stats about the query)
Experimentally confirm the effectiveness of the metrics

Intuition: in the past, other users most likely clicked on a single, identical answer for a navigational query
Metric: click distribution per query
  Measure the number of clicks per query answer

Sort the answers for a query in descending order of the number of clicks they received from all users
  Navigational query: a skewed distribution
  Informational query: a flat distribution

Intuition: the number of results a user clicks per query
Metric: average number of clicks per query
  Navigational query: the user clicks once
  Informational query: the user clicks multiple times

Intuition: there is a direct correlation between the query (e.g., the query "pubmed") and the number of anchor destinations from Web links with the same anchor text as the query (e.g., <a href="www.ncbi.nih.gov">PubMed</a>)
Metric: anchor-link distribution
  Crawl the Web
  Locate all links on the Web whose anchor text is the query
  Count the number of distinct anchor destinations

Sort the anchor destinations in descending order of their frequency
  Navigational query: a skewed distribution
    A query is likely to be navigational if many links exist with the same destination and the same anchor text, which is the query
  Informational query: a flat distribution

Danger of Web crawling: THE NOISE
  Link spam: artificial links, not really related to the anchor text, created to increase the page rank
  Mirror sites at multiple locations
Solution: use spam and mirror detection techniques

50 most popular queries, 30 predictable queries
Record all Google incoming/outgoing messages from the department (6 months, 2004)
147,744 unique queries 1 6
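
The three metrics described earlier (click distribution, average number of clicks per query, and anchor-link distribution) can be illustrated with a minimal Python sketch. This is not the classifier from the paper: the function names, the sample data, and both thresholds (`share_threshold`, `clicks_threshold`) are hypothetical choices made only for illustration. Skew is approximated here by the share of hits going to the single most frequent destination, and the anchor-link distribution is used only as a fallback when no click data exists for the query.

```python
from collections import Counter
from statistics import mean


def top_share(destinations):
    """Fraction of all hits that go to the single most frequent destination.

    Close to 1.0 for a skewed distribution, small for a flat one.
    """
    counts = Counter(destinations)
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / sum(counts.values())


def average_clicks(sessions):
    """Mean number of clicked results per issued query."""
    return mean(len(clicks) for clicks in sessions) if sessions else 0.0


def guess_goal(sessions, anchor_destinations,
               share_threshold=0.5, clicks_threshold=1.5):
    """Guess 'navigational' or 'informational' for one query.

    sessions: one list of clicked URLs per time the query was issued.
    anchor_destinations: target URL of every crawled link whose anchor
        text equals the query.
    Both thresholds are assumed values, not taken from the paper.
    """
    clicked = [url for clicks in sessions for url in clicks]
    if clicked:
        # Past click behavior: skewed click distribution plus few clicks.
        skewed = top_share(clicked) >= share_threshold
        few_clicks = average_clicks(sessions) <= clicks_threshold
        return "navigational" if skewed and few_clicks else "informational"
    # No click statistics for this query: fall back to the anchor links.
    skewed = top_share(anchor_destinations) >= share_threshold
    return "navigational" if skewed else "informational"


# Made-up logs: most users click the same single result for the first
# query, while clicks for the second are spread over many results.
pubmed_sessions = [["ncbi"], ["ncbi"], ["ncbi"], ["other"]]
hmm_sessions = [["x", "y", "z"], ["y", "w"], ["x", "v", "u"]]

print(guess_goal(pubmed_sessions, []))
print(guess_goal(hmm_sessions, []))
```

In practice the anchor destinations would first need the spam and mirror filtering mentioned above, since duplicated or artificial links distort the frequency counts.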