
An Introduction to Classification

- Classification vs. Prediction
- Classification & ANOVA
- Classification Cutoffs, Errors, etc.
- Multivariate Classification & Linear Discriminant Function

Let's start by reviewing what prediction is: using a person's scores on one or more variables to make a "best guess" of that person's score on another variable (the value of which isn't known).

Classification is very similar: using a person's scores on one or more variables to make a "best guess" of the category to which that person belongs (when the category isn't known).

The difference is a language convention:
- if the unknown variable is quantitative, it's called prediction
- if the unknown variable is qualitative, it's called classification

How does classification work? Let's start with an old friend, ANOVA. In its usual form there are:
- two qualitatively different IV groups (naturally occurring or created by manipulation)
- a quantitative DV
- H0: Mean G1 = Mean G2

Rejecting H0 tells us:
- there is a relationship between the grouping and the DV
- the groups represent populations with different means on the DV
- knowing what group a person is in allows us to guess their DV score (the mean of that group)

Let's review in a little more detail. Remember the formula for the ANOVA F-test? In words, F compares the mean difference to the variability around each of those means.

Which of the following will produce the larger F-test? The two data sets have the same means, mean difference, and N; they differ only in their standard deviations.

            Data 1 (n = 50)              Data 2 (n = 50)
group 1     mean = 30, std dev = 5       mean = 30, std dev = 15
group 2     mean = 50, std dev = 5       mean = 50, std dev = 15

Graphical depictions of these data show that the size of F relates to the amount of overlap between the groups.

[Figure: group distributions for Data 1 (larger F, more consistent group difference) and Data 2 (smaller F, less consistent group difference)]

Notice: since all the distributions have n = 50, those with more variability …

Let's consider the last of those points (knowing the group lets us guess the DV score) in reverse. Could knowing the person's score help
tell us what qualitative group they are in, that is, classify them into the proper group?

An example: Research has revealed a statistical relationship between the number of times a person laughs out loud each day (quantitative variable) and whether they are depressed or schizophrenic (qualitative grouping variable).

Mean laughs, depressed = 4.0
Mean laughs, schizophrenic = 7.0
F(1, 34) = 7.00, p < .05

- A new, as yet undiagnosed, patient laughs 11 times the first day. What's your classification: depressed or schizophrenic?
- Another patient laughs 1 time. Your classification?
- A third new patient laughs 5 times. Your classification?

Why were the first two "gimmies" and the last one not?

When the groups have a mean difference, a score beyond one of the group means is more likely to belong to that group than to the other group (unless the stds are huge):
- someone who laughs more than the mean for the schizophrenic group is more likely to be schizophrenic than to be depressed
- someone who laughs less than the mean of the depressed group is more likely to be depressed than to be schizophrenic

Even when the groups have a mean difference, a score between the group means is harder to classify correctly (unless the stds are minuscule):
- someone with 5 or 6 laughs is hardest to classify, because several depressed and schizophrenic folks have these scores

Here's a graphical depiction of the clinical data.

[Figure: dot plot of laughs per day (0 to 12) for the 18 depressed patients (x; mean laughs = 4.0) and the 18 schizophrenic patients (o; mean laughs = 7.0)]

Looking at this, it's easy to see why we would be:
- confident in an assignment based on 11 laughs: no depressed patients had a score that high
- confident in an assignment based on 1 laugh: no schizophrenic patients had a score that low
- lacking confidence in an assignment based on 5 or 6 laughs: several depressed and several schizophrenic patients had 5 or 6

The process of prediction required two things:
- that there be a linear relationship between the predictor and the criterion (reject H0: r = 0)
- a formula, y = bx + a, to
translate a predictor score into an estimate of a criterion variable score

Similarly, the process of classification requires two things:
- a statistical relationship between the predictor (DV) and the criterion (reject H0: M1 = M2)
- a cutoff to translate a person's score on the predictor (DV) into an assignment to one group or the other

Where should we place the cutoff? Wherever gives us the most accurate classification.

[Figure: dot plot of laughs per day (0 to 12) for the 18 depressed patients (x; mean laughs = 4.0) and the 18 schizophrenic patients (o; mean laughs = 7.0)]

When your groups are the same size and your group score distributions are symmetrical, things are pretty easy:
- place the cutoff at a position equidistant from the group means
- here the cutoff would be 5.5 (equidistant between 4.0 and 7.0)
- anyone who laughs more than 5.5 times would be assigned as schizophrenic
- anyone who laughs fewer than 5.5 times would be assigned as depressed

[Figure: the same dot plot of laughs per day for the two patient groups]

We can assess the accuracy of the assignments by building a "reclassification table":

                        Actual Diagnosis
Assignment          Depressed    Schizophrenic
Depressed              14              4
Schizophrenic           4             14

Reclassification accuracy would be 28/36 = 77.78%.

When considering simple regression (prediction), we wanted to be able to compare two potential predictors to determine if one would be "better"; we used Steiger's Z-test of H0: r(y,x1) = r(y,x2).

How do we compare two potential classification variables to determine if one is a better basis for accurate classification? We do it the same way, with one intermediate step.

As you might remember from ANOVA, we can express the effect size associated with any F as r:

r = sqrt(F / (F + dferror))

So, to compare two potential classification variables:
- compute the ANOVA for each variable (on the same sample)
- convert each F to r
- compare the r values using Steiger's Z-test; remember that you'll need
the correlation between the two classification variables, r(x1,x2)

An example: Which provides better classification between schizophrenia and depression: times laughing out loud, or score on a depression scale?

For laughing out loud, F(1, 34) = 7.00 translates to r = sqrt(F / (F + dferror)) = sqrt(7.0 / (7 …
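
The F-to-r conversion used in the comparison steps above can be sketched in a few lines of Python. This is only an illustration of the stated formula, r = sqrt(F / (F + dferror)), applied to the laughing-out-loud values F(1, 34) = 7.00. The function name `f_to_r` is a hypothetical helper, not something from the course materials, and the conversion is assumed to be for a two-group F (one numerator degree of freedom). The preview cuts off before reporting the result, so the printed value is simply what the formula yields.

```python
import math


def f_to_r(f_value: float, df_error: int) -> float:
    """Convert a two-group ANOVA F (1 numerator df) to the effect size r.

    Hypothetical helper implementing r = sqrt(F / (F + df_error)).
    """
    if f_value < 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df_error must be positive")
    return math.sqrt(f_value / (f_value + df_error))


# Laughing out loud: F(1, 34) = 7.00
r_laughs = f_to_r(7.00, 34)
print(round(r_laughs, 3))  # 0.413
```

The same function would be applied to the F for the second classification variable (the depression scale), and the two r values would then go into Steiger's Z-test along with the correlation between the two variables.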
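
The cutoff and reclassification-table steps described earlier can also be sketched in Python. This is a minimal sketch under the same simplifying conditions the text states (equal group sizes, symmetrical distributions), so the cutoff is just the midpoint of the two group means. All function names are hypothetical, and sending a score exactly at the cutoff to the lower group is an assumption the text does not address (with whole-number laughs and a cutoff of 5.5, no ties occur).

```python
def midpoint_cutoff(mean_low: float, mean_high: float) -> float:
    """Cutoff equidistant from the two group means."""
    return (mean_low + mean_high) / 2


def classify(score: float, cutoff: float,
             low_label: str = "depressed",
             high_label: str = "schizophrenic") -> str:
    """Assign the higher-mean group to scores above the cutoff.

    Assumption: a score exactly at the cutoff goes to the lower group.
    """
    return high_label if score > cutoff else low_label


def reclassification_accuracy(table: dict) -> float:
    """Proportion of correct assignments; table[assigned][actual] = count."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


# Group means from the example: depressed = 4.0, schizophrenic = 7.0
cutoff = midpoint_cutoff(4.0, 7.0)
print(cutoff)  # 5.5

# The three new patients laughed 11, 1, and 5 times
assignments = [classify(laughs, cutoff) for laughs in (11, 1, 5)]
print(assignments)  # ['schizophrenic', 'depressed', 'depressed']

# Reclassification table from the example (rows = assignment, columns = actual)
table = {
    "depressed": {"depressed": 14, "schizophrenic": 4},
    "schizophrenic": {"depressed": 4, "schizophrenic": 14},
}
accuracy = reclassification_accuracy(table)
print(round(accuracy * 100, 2))  # 77.78
```

Note that the patient with 5 laughs still receives an assignment; the cutoff forces a decision even for the scores where the two groups overlap most, which is why the reclassification accuracy falls short of 100%.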



UNL PSYC 942 - An Introduction to Classification
