CSCI 5417 Information Retrieval Systems
Jim Martin
Lecture 21, 11/8/2011

Slide titles: Machine Learning for ad hoc IR; Why is There a Need for ML?; Using ML for ad hoc IR; Training data; Using classification for ad hoc IR; More Complex Cases; Problem; Learning to Rank; The Ranking SVM; Limitations of Machine Learning; Break; IE vs. IR; Information Retrieval; Why; Web; Information Extraction; Entities; Named Entity Recognition; Relations; Relation Extraction; Events; Event Detection; Times, numbers, measures, etc.; Temporal and Numerical Expressions; Template Analysis; IE details; NER; NE Types; Ambiguity; NER Approaches; ML Approach; Data Encoding for Sequence Labeling; IOB Encoding; Training; NER Features; NER as Sequence Labeling; Relation Types; Relation Analysis; Features; Example; Bootstrapping Approaches; Bootstrapping Example: Seed Tuple; Bootstrapping Relations; Information Extraction Summary; Social Media and IR

Today
- Finish learning to rank
- Information extraction

01/14/19 CSCI 5417 - IR 3

Machine Learning for ad hoc IR (Sec. 15.4)
- We've looked at methods for ranking documents in IR using factors like cosine similarity, inverse document frequency, pivoted document length normalization, PageRank, etc.
- We've looked at methods for classifying documents using supervised machine learning classifiers: Naïve Bayes, kNN, SVMs.
- Surely we can also use such machine learning to rank the documents displayed in search results?

Why is There a Need for ML?
- Traditional ranking functions in IR used a very small number of features: term frequency, inverse document frequency, document length.
- It was easy to tune weighting coefficients by hand, and people did.
- But you saw how "easy" it was on HW1.

Why is There a Need for ML?
- Modern systems, especially on the Web, use a large number of features:
  - Log frequency of query word in anchor text
  - Query term proximity
  - Query word in color on page?
  - # of images on page
  - # of (out) links on page
  - PageRank of page?
  - URL length?
  - URL contains "~"?
  - Page edit recency?
  - Page length?
- The New York Times (2008-06-03) quoted Amit Singhal as saying Google was using over 200 such features.

Using ML for ad hoc IR (Approach 1)
- Classification seems like a good place to start: take an object and put it in a class, with some confidence.
- What do we have to work with in terms of training data? Documents, queries, and relevance judgments.

Training data
- [Slide figure not preserved in this preview.]

Using classification for ad hoc IR (Sec. 15.4.1)
- A linear scoring function on these two features (cosine score α and term proximity ω) is then Score(d, q) = Score(α, ω) = aα + bω + c.
- And the linear classifier is: decide relevant if Score(d, q) > θ.
- ... just like when we were doing text classification.
- [Figure: R (relevant) and N (nonrelevant) training examples plotted by term proximity (x axis, 2-5) against cosine score (y axis, 0.025-0.05), separated by a linear decision surface.]

More Complex Cases
- We can generalize this to classifier functions over more features.
- We can use any method we have for learning the linear classifier weights.

Problem
- The ranking in this approach is based on the classifier's confidence in its judgment.
- It's not clear that that confidence should directly determine a ranking between two documents.
- That is, it gives a ranking of confidence, not a ranking of relevance. Maybe they correlate, maybe not.

Learning to Rank
- Maybe classification isn't the right way to think about approaching ad hoc IR via ML.
- Background ML:
  - Classification problems: map to a discrete, unordered set of classes.
  - Regression problems: map to a real value.
  - Ordinal regression problems: map to an ordered set of classes.

Learning to Rank
- Assume documents can be totally ordered by relevance given a query: d1 < d2 < ... < dJ. This is the ordinal regression setup.
- Assume training data is available consisting of document-query pairs represented as feature vectors ψi and a relevance ranking between them.
- Such an ordering can be cast as a set of pairwise judgments, where the input is a pair of results for a single query and the class is the relevance ordering relationship between them.

Learning to Rank
- But assuming a total ordering across all docs is a lot to expect. Think of all the training data!
- So instead assume a smaller number of relevance categories C exists. These are totally ordered: c1 < c2 < ... < cJ (definitely relevant, relevant, partially relevant, not relevant, really really not relevant, etc.).
- Be indifferent to differences within a category.
- Assume training data is available consisting of document-query pairs represented as feature vectors ψi and a relevance ranking based on the categories C.

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]
- Aim is to classify instance pairs as correctly ranked or incorrectly ranked. This turns an ordinal regression problem back into a binary classification problem.
- We want a ranking function f such that ci > ck iff f(ψi) > f(ψk).
- Suppose that f is a linear function: f(ψi) = w · ψi.

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]
- Then ci > ck iff w · (ψi − ψk) > 0.
- So let's directly create a new instance space from such pairs:
  - Φu = Φ(di, dj, q) = ψi − ψk
  - zu = +1, 0, −1 as ci >, =, < ck
- From training data S = {Φu}, we train an SVM.

Limitations of Machine Learning
- Everything that we have looked at (and most work in this area) produces linear models of features by weighting different base features.
- This contrasts with most of the clever ideas of traditional IR, which are nonlinear scalings and combinations of basic measurements: log term frequency, idf, pivoted length normalization.
- At present, ML is good at weighting features, but not at coming ...
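The linear scoring classifier from the "Using classification for ad hoc IR" slides (Score(d, q) = aα + bω + c, decide relevant if the score exceeds θ) can be sketched in a few lines of Python. The coefficients, threshold, and document feature values below are made-up illustrative numbers, not values from the lecture.

```python
# Sketch of the slides' linear scorer: Score(d, q) = a*alpha + b*omega + c,
# where alpha is the cosine score and omega is the term-proximity feature.
# Coefficients a, b, c and threshold theta are illustrative assumptions.

def score(alpha, omega, a=0.7, b=0.3, c=0.0):
    return a * alpha + b * omega + c

def is_relevant(alpha, omega, theta=0.5):
    # The linear classifier: decide relevant iff Score(d, q) > theta.
    return score(alpha, omega) > theta

# Ranking documents by classifier score. As the "Problem" slide notes,
# this orders documents by confidence, not necessarily by relevance.
docs = {"d1": (0.9, 0.2), "d2": (0.4, 0.8), "d3": (0.1, 0.1)}
ranked = sorted(docs, key=lambda d: score(*docs[d]), reverse=True)
```

With these toy numbers, d1 scores 0.69 and is classified relevant, while d3 scores 0.10 and is not; sorting by score yields the ranking d1, d2, d3.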
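The Ranking SVM slides' pairwise transformation can also be sketched. The instance construction below follows the slides (Φu = ψi − ψk, labeled by which document has the higher relevance grade; equal-grade pairs carry no ordering signal and are skipped). A tiny perceptron stands in for a real SVM solver, which is an assumption made to keep the sketch dependency-free, and the feature vectors and grades are toy data.

```python
# Sketch of the Ranking SVM's pairwise setup (Herbrich et al.; Joachims
# et al.): turn an ordinal regression problem into binary classification
# over difference vectors. A perceptron replaces the SVM optimizer here,
# purely to keep the example self-contained.

def pairwise_instances(vectors, grades):
    """Build (phi, z) pairs with phi = psi_i - psi_k and z = +1 if
    c_i > c_k, -1 if c_i < c_k; equal grades are skipped."""
    out = []
    for i in range(len(vectors)):
        for k in range(i + 1, len(vectors)):
            if grades[i] == grades[k]:
                continue
            phi = [a - b for a, b in zip(vectors[i], vectors[k])]
            out.append((phi, 1 if grades[i] > grades[k] else -1))
    return out

def train_linear_ranker(pairs, epochs=100, lr=0.1):
    """Find w such that z * (w . phi) > 0 for the pairwise instances,
    giving a ranking function f(psi) = w . psi."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for phi, z in pairs:
            if z * sum(wi * xi for wi, xi in zip(w, phi)) <= 0:
                w = [wi + lr * z * xi for wi, xi in zip(w, phi)]
    return w

# Toy query: three documents with (cosine, proximity) features and
# ordered relevance grades (higher grade = more relevant).
psis = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
grades = [2, 1, 0]
w = train_linear_ranker(pairwise_instances(psis, grades))
scores = [sum(wi * xi for wi, xi in zip(w, psi)) for psi in psis]
```

Because the learned w satisfies w · (ψi − ψk) > 0 whenever ci > ck, scoring each document with f(ψ) = w · ψ reproduces the grade ordering on this toy data.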


CU-Boulder CSCI 5417 - Information extraction
