DOC PREVIEW
LEHIGH CSE 335 - Efficient Case Retrieval

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1ThemesProblem DescriptionPriority QueuesImplementing Priority Queues: HEAPS(Non) ExamplesSequential RetrievalSequential Retrieval (II)Properties of Sequential RetrievalTwo-Step RetrievalPossible Definitions of SIMExample of a “Good” SIMTwo-Step Retrieval With DatabasesProperties of Two-Step RetrievalHomeworkEfficient Case RetrievalSources:–Chapter 7–www.iiia.csic.es/People/enric/AICom.html–www.ai-cbr.orgThemes•Sequential Retrieval•Two-Step Retrieval•Retrieval with Indexed CasesProblem Description•Input: a collection of cases CB = {C1, …Cn} and a new problem P•Output:The most similar case (nearest neighbor, NN): A case Ci in CB such that sim(Ci, P) is maximal, orA collection of k most similar cases (k-nearest neighbors, k-NN) in CB {Ci_1,…, Ci_k}, orA sufficiently similar case: case Ci in CB such that sim(Ci, P) > th where th is a predefined thresholdPriority Queues•Typical example:•These priorities may override the FIFO policy of the queuesOperations supported in a priority queue:•Insert a new element•Extract/Delete of the element with the lowest priorityprinting in a Unix/Linux environment. Printing jobs have different priorities.Implementing Priority Queues: HEAPSA heap is a binary tree T such that:• The key at each node is less or equal than the key of its children•T is completeWe assume that each node has a key, K, and other information, I. K is the priority of I.(Non) Examples581016125891612105891612109 5610 > 9! Tree is not completeComplexity of insert, remove is O(log2 n). Top element: constantSequential Retrieval•The case base is organized as a a sequence of cases with no order among themselves (i.e., no indexing)•Retrieving the most similar or one that passes the threshold condition is straightforward• Let us discuss the straightforward procedure• What is the complexity of this procedure?• How about if we want to retrieve the k most similar cases?How to implement an efficient procedure to retrieve the k-most similar cases:Use heaps!Sequential Retrieval (II)How to implement an efficient procedure to retrieve the m most similar cases to a problem P:Keep all candidate cases in a heap, HInitially fill H with the first k cases. Each with a priority: When looking at a case C in the case base, compared it against the one on the top, T, of the heap HIf then remove T from H and add C with a priority sim(C,P) sim(<case>, P)sim(C,P) > sim(T,P)Properties of Sequential Retrieval•Complexity of retrieval of straightforward k most similar = O(<number of cases>  k )Complexity of retrieval of k most similar with heaps: O(<number of cases>  log2 (k))Drawbacks:If <number of cases> is too large, retrieval time may take too long (despite the low complexity) Retrieval time is independent from the problemAdvantages:Independent of the similarity Simple; doesn’t require complex indexing codeTwo-Step Retrieval•The Mac/Fac approach (many are called/few are chosen):1. Compute the set CAND with all cases in CB such that SIM(C,P) holds2. Search for similar case in CAND The question is how to define the relation SIM such that1. Computing the relation SIM should require much less effort than computing the similarity metric, sim2. SIM doesn’t eliminate good candidatesPossible Definitions of SIMPartial equality:SIM(C,P) iff C and P have the same value on a pre-selected attributeRelaxed similarity:SIM(C,P) iff the values of attributes in C and P are sufficiently similarDomain-specific SIM:(example: in the restaurant domain if P indicates the type of restaurant, one may pre-select all cases of the same type)(example: if sim = sqr((x-v)2 + (y-v)2) ) one could define SIM iff |x-v| < 10)(example: Process Planning domain)HeuristicsExample of a “Good” SIMpa1pa2•Any case for which pa1 is not ordered before pa2 should not be retrieved•One can use these dependencies as criterion for construction SIM (Munoz-Avila & Huellen, ICCBR-1995)Problem:Two-Step Retrieval With Databases•Cases are stored in a database (e.g., MySQL) as regular records•In the first step, a pre-selection is made based on cases matching some attributes (e.g., hyper-rectangles):SELECT a1, . . . , am FROM CaseBaseWHERE (a1 ≥min1 AND a1 ≤max1) . . .AND (am ≥minm AND am ≤maxm)•Find k nearest neighbors from pre-selected cases•Process can be refined by iteratively relaxing the query (see Section 7.6)Properties of Two-Step Retrieval•Advantage: •Disadvantages:  May lead to a significant reduction in retrieval timeMay result in -error: cases very similar to the problem are discardedObtaining SIM maybe difficultHomework1. Construct an example of a k-d tree and a query case such as the BOB test succeeds and hence the NN retrieval algorithm needs to backtrack. Indicate the NN for the query.2. For the same k-d tree as 1, make a query case such as the BWB succeeds and hence the NN retrieval algorithm doesn’t need to backtrack. Indicate the NN for the query.3. Describe a domain of your choice, describe cases in that domain including the attributes, their types and the case’s similarity metric, sim. Describe a SIM relation in that domain that will not result in the


View Full Document

LEHIGH CSE 335 - Efficient Case Retrieval

Download Efficient Case Retrieval
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Efficient Case Retrieval and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Efficient Case Retrieval 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?