Johns Hopkins EN 600 446 - Vector Models for IR

CS466-8

Slide 1: Vector Models for IR
• Gerald Salton, Cornell (Salton + Lesk, 68) (Salton, 71) (Salton + McGill, 83)
• SMART System
  Chris Buckley, Cornell / SAPIR systems: current keeper of the flame
  SMART = Salton's Magical Automatic Retrieval Tool (?)

Slide 2: Vector Models for IR
Boolean Model
  Doc V1: 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0
  Doc V2: 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0
SMART Vector Model
  Doc V1: 1.0 3.5 4.6 0.1 0.0 0.0
  Doc V2: 0.0 0.0 0.0 0.1 4.0 0.0
Term i may be a word, a stem, or a special compound.
SMART vectors are composed of real-valued term weights, NOT simply Boolean "term present or not".

Slide 3: Example
Terms: Comput*, C++, Sparc, genome, biolog*, protein, Compiler, DNA
  Doc V1: 3 5 4 1 0 1 0 0
  Doc V2: 1 0 0 0 5 3 1 4
  Doc V3: 2 8 0 1 0 1 0 0
Issues
• How are weights determined? (Simple option: raw frequency, weighted by region such as titles and keywords)
• Which terms to include? Stoplists
• Stem or not?

Slide 4: Queries and Documents share same vector representation
[Figure: query vector Q among document vectors D1, D2, D3]
Given a query D_Q, map it to a vector V_Q and find the document D_i for which sim(V_i, V_Q) is greatest.

Slide 5: Similarity Functions
• Many other options available (Dice, Jaccard)
• Cosine similarity is self-normalizing:
  V1: 100 200 300 50
  V2: 1 2 3 0.5
  V3: 10 20 30 5
  These vectors differ only by a scale factor (V1 = 100·V2, V3 = 10·V2), so their pairwise cosine similarity is 1.
• Weights can be arbitrary values, e.g. raw integer counts (they don't need to be probabilities)
[Figure: query Q with document vectors D2, D3]

Slide 6: Projection of Vectors into 2-D Plane
[Figure: document vectors V1 to V5 grouped around centroid C1, V6 to V10 around centroid C2]

Slide 7: Centroid computation
$C_t = \frac{1}{|D|} \sum_{d \in D} V_{d,t}$ for each term t in the term set,
where D = the documents in the centroid set, |D| = total number of docs in the centroid set, and V_{d,t} = weight of term t in document d.
Basically, the centroid is the average of the vectors in the centroid set.
[Figure: centroids C1 and C2]

Slide 8: Hierarchical Search with Document Centroids
[Figure: document vectors V1 to V10 organized under a tree of centroids]

Slide 9: Hierarchical Query Matching
V_Q = query vector; C_i = root centroid
For all children {C_j} of C_i:
• find the C_j for which sim(V_Q, C_j) is maximum
• if C_j is a leaf (document vector), return C_j
• else set C_i = C_j and iterate
The descent takes about log(|D|) levels of vector comparisons (the height of the tree), instead of comparing V_Q against all |D| documents.

Slide 10: Ideal Clustering Behavior

Slide 11: Sample Clustered Document Collection
[Figure legend: document vector; centroid vector]

Slide 12: Ideal Document Space
[Figure legend: relevant document with respect to a query vector; nonrelevant document with respect to a query]

Slide 13
[Figure legend: document vector; centroid vector; supercentroid vector]
Introduction of
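The "simple option" for weighting listed under Issues on slide 3 (raw term frequency, boosted by the region a term appears in), together with a stoplist and optional stemming, can be sketched as below. The region names and boost factors in `REGION_WEIGHTS`, the tiny `STOPLIST`, and the prefix-truncation "stemmer" `crude_stem` are all hypothetical choices for illustration, not the SMART system's actual settings.

```python
from collections import Counter

# Hypothetical region boosts (illustrative assumptions, not SMART's values):
# a term in the title counts three times as much as one in the body.
REGION_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

# Hypothetical, deliberately tiny stoplist of function words to exclude.
STOPLIST = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def crude_stem(word: str, prefix_len: int = 6) -> str:
    """Hypothetical truncation 'stemmer' in the spirit of Comput*: keep at most
    the first prefix_len characters of the word."""
    return word[:prefix_len]


def term_vector(regions: dict[str, str], stem: bool = True) -> Counter:
    """Return {term: weight}: raw frequency of each non-stopword term in a region,
    multiplied by that region's boost and summed over all regions."""
    weights: Counter = Counter()
    for region, text in regions.items():
        boost = REGION_WEIGHTS.get(region, 1.0)
        for token in text.lower().split():
            if token in STOPLIST:
                continue
            weights[crude_stem(token) if stem else token] += boost
    return weights


doc = {
    "title": "Compilers for parallel computers",
    "body": "the compiler maps computation onto the computer",
}
vec = term_vector(doc)
print(sorted(vec.items()))
# [('compil', 4.0), ('comput', 5.0), ('maps', 1.0), ('onto', 1.0), ('parall', 3.0)]
```

With stemming on, "computers", "computation" and "computer" collapse into one dimension; with `stem=False` they stay three separate terms, which is the "stem or not?" trade-off. A real system would use a proper stemmer (such as Porter's) rather than blind truncation.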
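Slides 4 and 5 reduce retrieval to ranking documents by sim(V_i, V_Q). A minimal sketch, assuming dense Python lists of equal length: cosine similarity, plus the weighted (real-valued) forms of Dice and Jaccard that are commonly used alongside it. The slide only names Dice and Jaccard, so the exact formulas below are an assumption. The `rank` helper is hypothetical, and the slide 3 vectors are reused as sample data with V3 standing in for a query.

```python
import math

Vector = list[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def cosine(x: Vector, y: Vector) -> float:
    """Cosine of the angle between x and y; unchanged if either is rescaled."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def dice(x: Vector, y: Vector) -> float:
    """Weighted Dice coefficient: 2 x.y / (|x|^2 + |y|^2)."""
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))


def jaccard(x: Vector, y: Vector) -> float:
    """Weighted Jaccard (Tanimoto) coefficient: x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def rank(query: Vector, docs: dict[str, Vector], sim=cosine) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, best match first."""
    scored = [(doc_id, sim(vec, query)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Slide 3 example vectors (raw term counts).
docs = {
    "V1": [3, 5, 4, 1, 0, 1, 0, 0],
    "V2": [1, 0, 0, 0, 5, 3, 1, 4],
    "V3": [2, 8, 0, 1, 0, 1, 0, 0],
}
# Use V3 as the query: V1, which shares most of V3's non-zero terms, ranks first.
print(rank(docs["V3"], {k: v for k, v in docs.items() if k != "V3"}))

# Slide 5 vectors differ only by scale: cosine treats them as identical, Dice does not.
s1, s2 = [100, 200, 300, 50], [1, 2, 3, 0.5]
print(round(cosine(s1, s2), 6), round(dice(s1, s2), 4))  # prints: 1.0 0.02
```

This is what "self-normalizing" buys: cosine divides out vector length, so a long document with many occurrences and a short one with the same term proportions score the same, whereas these forms of Dice and Jaccard are sensitive to the overall magnitude of the weights.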
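The centroid formula on slide 7 and the descent loop on slide 9 fit together as sketched below. This assumes a hypothetical `Node` tree in which each leaf holds one document vector and each inner node stores the centroid of all documents beneath it; how such a tree is built (the clustering step) is not covered on these slides, so the tree here is grouped by hand from the slide 3 vectors. Cosine is used as sim(), and the two query vectors are made up for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """C_t = (1/|D|) * sum over d in D of V_{d,t}, for every term t."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(x: Vector, y: Vector) -> float:
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


@dataclass
class Node:
    """Hypothetical tree node: leaves hold documents, inner nodes hold centroids."""
    vector: Vector
    doc_id: Optional[str] = None  # set only on leaves
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[Node]:
    """All document (leaf) nodes at or below node."""
    if not node.children:
        return [node]
    return [lf for child in node.children for lf in leaves(child)]


def cluster(*children: Node) -> Node:
    """Inner node whose vector is the centroid of every document beneath it."""
    members = [lf.vector for child in children for lf in leaves(child)]
    return Node(vector=centroid(members), children=list(children))


def hierarchical_search(query: Vector, root: Node) -> tuple[Optional[str], int]:
    """Greedy descent from the root centroid: at each level move to the child C_j
    maximizing sim(V_Q, C_j), stopping on reaching a document leaf.
    Returns the document id and the number of similarity comparisons made."""
    current, comparisons = root, 0
    while current.children:
        comparisons += len(current.children)
        current = max(current.children, key=lambda child: cosine(query, child.vector))
    return current.doc_id, comparisons


# Slide 3 vectors, grouped by hand into a hypothetical two-level tree.
V1 = [3, 5, 4, 1, 0, 1, 0, 0]
V2 = [1, 0, 0, 0, 5, 3, 1, 4]
V3 = [2, 8, 0, 1, 0, 1, 0, 0]
root = cluster(cluster(Node(V1, "V1"), Node(V3, "V3")), Node(V2, "V2"))

print(hierarchical_search([0, 0, 0, 0, 2, 1, 0, 1], root))  # ('V2', 2)
print(hierarchical_search([1, 6, 0, 1, 0, 0, 0, 0], root))  # ('V3', 4)
```

Each level costs one comparison per child, so the total work is roughly the branching factor times the tree height rather than |D|. The descent is greedy: if a relevant document sits under a centroid that looks dissimilar to the query, it is never reached, which is why the quality of the clustering (slides 10 to 12) matters.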