Document Representation and Query Expansion Models for Blog Recommendation

Jaime Arguello, Jonathan L. Elsas, Jamie Callan, and Jaime G. Carbonell
Language Technologies Institute, School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract

We explore several document representation models and two query expansion models for the task of recommending blogs to a user in response to a query. Blog relevance ranking differs from traditional document ranking in ad hoc information retrieval in several ways: (1) the unit of output (the blog) is composed of a collection of documents (the blog posts) rather than a single document; (2) the query represents an ongoing, and typically multifaceted, interest in the topic rather than a passing ad hoc information need; and (3) because of the prevalence of spam, splogs, and tangential comments, the blogosphere is a particularly challenging source of high-quality query expansion terms. We address these differences at two levels. At the document representation level, we compare retrieval models that treat either the blog or its constituent posts as the atomic unit of retrieval. At the query expansion level, we make novel use of the links and anchor text in Wikipedia¹ to expand a user's initial query. We develop two complementary models of blog retrieval that perform at comparable levels of precision and recall. We also show consistent and significant improvement across all models using our Wikipedia expansion strategy.

Introduction

Blog retrieval is the task of finding blogs with a principal, recurring interest in X, where X is some information need expressed as a query. The input to the system is a short (i.e., 1–5 word) query, and the output is a ranked list of blogs that a person might want to subscribe to and read on a regular basis. This was the formulation of the TREC 2007 Blog Distillation task (Macdonald, Ounis, & Soboroff 2007). Feed recommendation systems may also suggest relevant feeds based on the feeds a user already subscribes to.
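To make the contrast between the two document representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a unigram query-likelihood model with Dirichlet smoothing (mu = 2500). The large-document view concatenates all of a blog's posts into one pseudo-document. The small-document view scores each post separately and averages P(query | post) over the blog's posts. The smoothing choice, the mean aggregation, the function names, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, mu=2500.0):
    """Dirichlet-smoothed log P(query | document) under a unigram language model."""
    tf = Counter(doc_tokens)
    dlen = len(doc_tokens)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        score += math.log((tf[term] + mu * p_coll) / (dlen + mu))
    return score

def large_document_score(query, posts, coll_counts, coll_len):
    """Blog as the unit (hypothetical name): concatenate all posts into one pseudo-document."""
    merged = [tok for post in posts for tok in post]
    return query_log_likelihood(query, merged, coll_counts, coll_len)

def small_document_score(query, posts, coll_counts, coll_len):
    """Post as the unit (hypothetical name): score each post, then average P(query | post).

    The mean is an assumed aggregation choice; assumes the blog has at least one post.
    """
    probs = [math.exp(query_log_likelihood(query, post, coll_counts, coll_len))
             for post in posts]
    return math.log(sum(probs) / len(probs))

# Toy collection (hypothetical data): each blog is a list of tokenized posts.
blogs = {
    "cooking": [["pasta", "recipe", "sauce"], ["bread", "recipe", "oven"]],
    "tech": [["python", "code", "recipe"], ["gpu", "benchmark"]],
}
coll = Counter(tok for posts in blogs.values() for post in posts for tok in post)
coll_len = sum(coll.values())
query = ["recipe", "sauce"]

for model in (large_document_score, small_document_score):
    ranking = sorted(blogs, key=lambda b: model(query, blogs[b], coll, coll_len),
                     reverse=True)
    print(model.__name__, ranking)  # both models rank "cooking" above "tech" here
```

In the large-document view, vocabulary pools across all posts, so a blog is rewarded for its overall topical mass. In the small-document view, each post's evidence stays separate, so the blog score reflects how consistently individual posts match the query.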

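The Wikipedia-based expansion described in the abstract draws candidate terms from the links and anchor text of Wikipedia, not from the noisier blogosphere. The following is a hedged sketch of one way this could work, not the paper's exact procedure. It assumes that the query has already been run against a Wikipedia index to produce a ranked list of article titles (not shown), and that links were harvested as (anchor_text, target_title) pairs. Each anchor phrase is weighted by how often it links to one of the top-ranked articles, discounted linearly by that article's rank. The weighting scheme, the function name `anchor_text_expansion`, and its parameters are illustrative assumptions.

```python
from collections import defaultdict

def anchor_text_expansion(ranked_articles, links, top_n=10, k=5):
    """Hypothetical helper: choose expansion phrases from Wikipedia anchor text.

    ranked_articles: article titles retrieved for the query, best first.
    links: iterable of (anchor_text, target_title) pairs harvested from Wikipedia.
    """
    # Rank position of each top-ranked article (0 = best).
    rank_of = {title: r for r, title in enumerate(ranked_articles[:top_n])}
    weights = defaultdict(float)
    for anchor, target in links:
        r = rank_of.get(target)
        if r is None:
            continue  # link points outside the top-ranked set; ignore it
        # Assumed weighting: links into higher-ranked articles count more.
        weights[anchor.lower()] += (top_n - r) / top_n
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]

# Hypothetical retrieval output and link data for the query "machine learning".
ranked = ["Machine learning", "Artificial neural network", "Deep learning"]
links = [
    ("machine learning", "Machine learning"),
    ("ML", "Machine learning"),
    ("machine learning", "Machine learning"),
    ("neural nets", "Artificial neural network"),
    ("deep learning", "Deep learning"),
    ("Paris", "France"),
]
expansion = anchor_text_expansion(ranked, links, top_n=3, k=3)
print(expansion)  # → ['machine learning', 'ml', 'neural nets']
```

The selected phrases could then be combined with the original query terms before blog retrieval. Because anchor text is written by Wikipedia editors to describe the linked article, it tends to supply synonyms and related phrasings (here "ml" and "neural nets") that a short query omits.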
