MIT HST 950J - Homework 3 (3 pages)


Homework 3




Problems/Exams


Pages: 3
School: Massachusetts Institute of Technology
Course: HST 950J - Biomedical Computing


6.872/HST950 Problem Set 3
Due: LEC 15

1 Identification Problem

In an earlier lecture, we outlined a theory of record linkage (the full paper is linked from our class schedule page) that tells us, in principle, how to do probabilistic matching of various features of two objects in order to decide whether they are likely to be the same object. Briefly, the theory is as follows. (I have interspersed questions for you to answer with the description.)

1. Given two purported objects (e.g., patients) $o_1$ and $o_2$, it is either the case that $o_1 = o_2$ or that they are distinct individuals. For example, our records contain a patient file for Raul A. Jones of 123 Main Street, Boston, MA 02131; a new patient arrives claiming to be Raul Jones of 123 Main Street, Boston, MA 02113.

2. Among all the observations we might make of $o_1$ and $o_2$, we select a certain set of features, $f_i(o)$, that we agree will be of interest. For example, we might choose last name, first and middle names, street address, city, and ZIP code.

3. For each pair of features, $f_i(o_1)$ and $f_i(o_2)$, we can compare the probability that one would observe $f_i(o_1), f_i(o_2)$ in either of the two cases of step 1. For example, assuming that half the hospital's patient population have home addresses in Boston, then $P(f_{\text{city}}(\text{Raul}), f_{\text{city}}(\text{Jones}) \mid \neg\text{same})$ is $\frac{1}{4}$. By contrast, if these two records belong to the same person, then we would just expect that the probability that that person lives in Boston is $\frac{1}{2}$. Thus, the likelihood ratio is

$$\frac{p(\text{Boston}, \text{Boston} \mid \text{same})}{p(\text{Boston}, \text{Boston} \mid \neg\text{same})} = \frac{1/2}{1/4} = 2.$$

Further, if 1% of people in the city live on Main St, then

$$\frac{p(\text{Main}, \text{Main} \mid \text{same})}{p(\text{Main}, \text{Main} \mid \neg\text{same})} = \frac{0.01}{0.01^2} = 100.$$

We may get an additional likelihood ratio of 1000, say, for the address "123" and another factor of, say, 75 for both states being MA. These are both estimates, and answer the questions "what fraction of all addresses is 123?" and "what fraction of individuals live in MA?" If our initial database contains records on 1M individuals, then we might argue that the a priori odds, $p(\text{same}) / p(\neg\text{same})$, are essentially $1/1\text{M} = 10^{-6}$. If we assume
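
The likelihood-ratio arithmetic in step 3 can be sketched in Python. This is only an illustration, not part of the problem set. The function names are hypothetical. Multiplying the ratios together assumes the features are conditionally independent, which the truncated text above does not confirm. The ZIP code mismatch in the example records is not modeled here.

```python
from fractions import Fraction
from math import prod


def agreement_likelihood_ratio(frequency):
    """Likelihood ratio when both records show a value held by `frequency` of people.

    p(value, value | same)     = frequency
    p(value, value | not same) = frequency ** 2
    so the ratio reduces to 1 / frequency.
    """
    frequency = Fraction(frequency)
    return frequency / frequency**2


def posterior_odds(prior_odds, likelihood_ratios):
    # Assumption (not stated in the source): features are conditionally
    # independent, so the individual likelihood ratios simply multiply.
    return Fraction(prior_odds) * prod(likelihood_ratios, start=Fraction(1))


lr_city = agreement_likelihood_ratio(Fraction(1, 2))      # Boston: (1/2) / (1/4)
lr_street = agreement_likelihood_ratio(Fraction(1, 100))  # Main St: 0.01 / 0.01**2
lr_number = Fraction(1000)  # estimate quoted in the text for the address "123"
lr_state = Fraction(75)     # estimate quoted in the text for both states being MA

prior = Fraction(1, 1_000_000)  # a priori odds for a database of 1M individuals
odds = posterior_odds(prior, [lr_city, lr_street, lr_number, lr_state])

print(lr_city, lr_street, odds)  # prints: 2 100 15
```

Exact fractions are used instead of floats so that values such as 0.01 squared do not pick up rounding error. Under the independence assumption, these four agreeing features alone move the odds from one in a million to 15 to 1.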


