
Probabilistic Methodology for Genealogical Record Linkage: Determining Weight Robustness




Krista Jensen and John S. Lawson
Brigham Young University, Statistics Department

Record Linkage
  What is record linkage?
    - A process that joins two records of information for a particular individual or family
  Applications of record linkage
    - Genealogical research
        - Census records
        - Ecclesiastical records
    - Medical research
    - Data storage

Government Census Data
  Benefits of census data
    - Collection methods
    - Information
    - Completeness
    - Starting point for genealogical research
    - Training: instruction given to enumerators
  Concerns with census data
    - Correctness of data
        - Age
        - Place of origin

Census Indexes
  What is a census index?
    - Head of household
    - Individuals with different last names
    - Subset of questions
  Availability of census records
    - Census record access is limited from 1930 to the present for privacy
  Fields available in census record indexes
    - Surname, given name, age, gender, race, place of origin, state, county, census page information

Probabilistic Methodology: Overview of Theory
  Three decisions are possible: $e_i$, where $i = 1, 2, 3$
  Definitions of events $e_i$, where $i = 1, 2, 3$
    - $e_1$: two fields are a match (positive link)
    - $e_2$: two fields are of undetermined status
    - $e_3$: two fields are a non-match (positive non-link)

Probabilistic Methodology
  A weight is calculated for each field based on conditional and unconditional probabilities
  Definitions of probabilities
    - $P(e_i \mid M)$ can be calculated from a known set of matches
    - $P(e_i)$ can be estimated using sample pairs
    - $P(M)$ is constant for all comparisons
  A score for each comparison is calculated as the sum of the weights
  Threshold values are used to determine the classification of each record comparison

Probabilistic Methodology: Calculating the Weights
  $w_k = \ln P(M \mid e_i)$
  Using Bayes' rule:
  $P(M \mid e_i) = \dfrac{P(e_i \mid M)\, P(M)}{P(e_i)}$

Probabilistic Methodology: The Scores
  $W = \sum_k w_k = \sum_k \ln P(M \mid e_i) = \sum_k \left[ \ln P(e_i \mid M) + \ln P(M) - \ln P(e_i) \right]$
  A weight is calculated for each of the $k$ fields; the score is the sum of those weights

Probabilistic Methodology
  Threshold values: 2.504 and 1.806

Project Data
  Census record availability
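
The weight and score definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the three example fields, every probability value, and the two cut-offs are hypothetical and chosen only to make the arithmetic easy to follow. Because each weight is the logarithm of a probability, it is never positive, so the illustrative cut-offs are negative and are not the threshold values quoted in the slides.

```python
import math


def field_weight(p_e_given_m, p_m, p_e):
    """Return w_k = ln P(M | e_i), with P(M | e_i) obtained from Bayes' rule."""
    p_m_given_e = p_e_given_m * p_m / p_e
    if not 0.0 < p_m_given_e <= 1.0:
        raise ValueError("inputs do not give a valid posterior probability")
    return math.log(p_m_given_e)


def comparison_score(field_probs, p_m):
    """Return W, the sum of the field weights for one record comparison.

    field_probs is a list of (P(e_i | M), P(e_i)) pairs, one per field.
    """
    return sum(field_weight(p_e_m, p_m, p_e) for p_e_m, p_e in field_probs)


def classify(score, upper, lower):
    """Three-way decision from two cut-offs (hypothetical values, upper > lower)."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "undetermined"


# Hypothetical inputs: P(M) is held constant across all comparisons.
P_M = 0.1
fields = [
    (0.9, 0.12),  # surname:    P(e | M), P(e)  -> posterior 0.75
    (0.8, 0.20),  # given name: P(e | M), P(e)  -> posterior 0.40
    (0.6, 0.30),  # age:        P(e | M), P(e)  -> posterior 0.20
]

W = comparison_score(fields, P_M)
decision = classify(W, upper=-2.0, lower=-4.0)
print(round(W, 3), decision)
```

With these made-up inputs the score equals ln(0.75 × 0.40 × 0.20) = ln 0.06, which falls between the two illustrative cut-offs, so the comparison is left undetermined.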


