
Data Association for Topic Intensity Tracking




Andreas Krause, Jure Leskovec, Carlos Guestrin
School of Computer Science, Carnegie Mellon University

Document classification
Emails from two topics: Conference and Hiking.
"Will you go to ICML too?" P(C | words) = .9, topic: Conference.
"Let's go hiking on Friday!" P(C | words) = .1, topic: Hiking.

A more difficult example
Emails from two topics: Conference and Hiking.
2:00 pm: "Let's have dinner after the talk." P(C | words) = .7, topic: Conference.
2:03 pm: "Should we go on Friday?" P(C | words) = .5, could refer to both topics.
What if we had temporal information?
How about modeling emails as an HMM, with topics C1, C2, ..., Ct, Ct+1 and documents D1, D2, ..., Dt, Dt+1?
This assumes equal time steps and smooth topic changes.

Valid assumptions?
[Figure: typical email traffic (Enron data), two panels (Topic 1 and Topic 2). x-axis: time (days); y-axis: number of emails. Annotations: "Bursts", "No emails".]
Email traffic is very bursty, so it cannot be modeled with uniform time steps.
Topic intensities change over time, separately per topic.
Bursts tell us how intensely a topic is pursued.
Bursts are potentially very interesting.

Identifying both topics and bursts
Given: a stream of documents (emails) d1, d2, d3, ... and the corresponding document inter-arrival times (time between consecutive documents) Δ1, Δ2, Δ3, ...
Simultaneously:
Classify (or cluster) documents into K topics.
Predict the topic intensities, that is, predict the time between consecutive documents from the same topic.

Data association problem
If we know the email topics, we can identify bursts.
[Figure: Conference and Hiking emails on a timeline, annotated with regions of high and low intensity for Conference and of low and high intensity for Hiking.]
If we don't know the topics, we can't identify bursts.
Two-step solution: first classify documents, then identify bursts (Kleinberg '03). This can fail badly.
This paper: simultaneously identify topics and bursts.

The Task
We have to solve a data association problem.
We observe: message deltas, the time between the arrivals of consecutive documents.
We want to estimate: Topic
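
To make the known-topics case from the data association slide concrete, here is a minimal Python sketch. It groups inter-arrival times by topic and turns them into a per-topic intensity. The helper names (`per_topic_deltas`, `intensity`), the sample data, and the choice of "number of gaps divided by total elapsed time" as the intensity estimate (the maximum-likelihood rate under an assumed exponential inter-arrival model) are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def per_topic_deltas(timestamps, topics):
    """Group inter-arrival times by topic, assuming topic labels are known.

    For each document, the delta is the time since the previous document
    with the same topic; the first document of a topic yields no delta.
    """
    last_seen = {}
    deltas = defaultdict(list)
    for t, topic in zip(timestamps, topics):
        if topic in last_seen:
            deltas[topic].append(t - last_seen[topic])
        last_seen[topic] = t
    return dict(deltas)


def intensity(deltas):
    """Estimate a rate as (number of gaps) / (total elapsed time).

    This is the maximum-likelihood rate if gaps are exponentially
    distributed, which is an assumption made for this sketch.
    """
    total = sum(deltas)
    return len(deltas) / total if total > 0 else float("inf")


# Hypothetical data: arrival times in days with hand-assigned topics.
timestamps = [0.0, 0.5, 1.0, 1.5, 10.0, 20.0]
topics = ["Conference"] * 4 + ["Hiking"] * 2

deltas = per_topic_deltas(timestamps, topics)
rates = {topic: intensity(d) for topic, d in deltas.items()}
```

With these made-up timestamps, the Conference emails arrive close together and get a much higher estimated rate than the Hiking emails. The sketch only covers the easy direction, where topics are given; when the labels are unknown, the per-topic gaps cannot be formed at all, which is the data association problem the slides describe.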


