Clustering Example

The purpose of the analysis was to look for subpopulations of adult females with respect to a selection of clinically relevant variables.


Converting Variables to Standardized Form (Z-scores)

It is a good idea to work with Z-scores of the variables if the variables being used differ in their variability. Otherwise, the variables with greater variability will dominate the clustering.

Analyze → Descriptive Statistics → Descriptives

Select the variables for the analysis and check the "Save standardized values as variables" box. The clustering will be done with the resulting Z-score variables (zruls, zsoss, etc.).


Getting the Clustering Analysis

Analyze → Classify → Hierarchical Cluster

Select the variables to be clustered. Remember to use the Z-score form of each variable.

Open the Statistics window. The agglomeration schedule will help us decide how many clusters to include in our solution. Knowing the cluster membership of each case for different numbers of clusters can also be very useful, but we'll use a different way of looking at this information.

Open the Method window. This is where you select the clustering method (how to decide which clusters will be combined on each step) and the dissimilarity measure (how to represent how similar the cases/clusters are to each other). You can tell SPSS to work with transformed values. I prefer to save the transformed values separately, as above, so that they are available for additional analyses.

Open the Save window. This allows you to save the cluster membership of each case for each clustering solution you specify. Usually 2-12 is enough (it depends upon whether groups or strays are being combined to form the successive clusters).


Clustering Output

Examining the Agglomeration Schedule

The agglomeration schedule shows the step-by-step clustering process. For each step it gives which clusters were combined on that step and the resulting total error in the clustering solution.

We look for a big jump in error as a sign that two different clusters have been combined. There is a pretty big jump on step 120 (from 4 to 3 clusters), suggesting that 3 is too few and 4 is just right. We still have to worry about strays.

Agglomeration Schedule

        Cluster Combined                    Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
    1         235         289          0.092           0           0           78
    2         245         338          0.223           0           0           10
    3         212         387          0.409           0           0           48
  ...
  111         210         226        289.703         101          93          119
  112         212         215        304.766         108          78          121
  113         207         208        320.378         100          90          114
  114         207         247        336.982         113          97          118
  115         219         242        355.247         103           0          118
  116         206         213        375.485         104         109          117
  117         206         297        402.101         116         105          121
  118         207         219        432.390         114         115          120    6 clusters → 5
  119         210         218        469.263         111         110          120    5 clusters → 4
  120         207         210        542.696         118         119          122    4 clusters → 3
  121         206         212        633.798         117         112          122    3 clusters → 2
  122         206         207        976.000         121         120            0    2 clusters → 1

It can be very helpful to also consider the frequencies of the clusters for the different solutions. This can help you think about how the groups form and separate.

Analyze → Descriptive Statistics → Frequencies

The variables saved during the clustering tell the membership of each case in each number-of-clusters solution. Use several of them to identify clustering patterns, strays, etc.

Ward Method (2-cluster solution)
Cluster   Frequency   Percent
1                84      68.3
2                39      31.7
Total           123     100.0

Ward Method (3-cluster solution)
Cluster   Frequency   Percent
1                41      33.3
2                39      31.7
3                43      35.0
Total           123     100.0

Ward Method (4-cluster solution)
Cluster   Frequency   Percent
1                41      33.3
2                19      15.4
3                20      16.3
4                43      35.0
Total           123     100.0

Ward Method (5-cluster solution)
Cluster   Frequency   Percent
1                41      33.3
2                19      15.4
3                 8       6.5
4                43      35.0
5                12       9.8
Total           123     100.0

The 3- and 4-cluster solutions are the most likely. In the 4-cluster numbering, Group 1 (n = 41) and Group 4 (n = 43) look pretty stable. The question is whether to keep just one further group of n = 39, or to split it into two groups of n = 19 and n = 20 (Groups 2 and 3).

The best way to make this decision is to look at the plot of the 4-group solution. If Groups 2 and 3 have similar enough profiles, you may decide to go with the 3-group solution. If they are sufficiently different, you may decide to keep the 4-group solution.


Getting Cluster Profiles

Analyze → Compare Means → Means

Use the same variables that were used to perform the cluster solution (remember to use the Z-score form of each). Select one of the solutions for examination. This example examines the 4-cluster solution (the variable is clus1_4, but the name doesn't show up until you highlight the variable in the listing).

Open the Options window. Remove everything from the Cell Statistics window except Mean.

You get the following table as output.

Mean
           Zscore:       Zscore:      Zscore:      Zscore:      Zscore:      Zscore:      Zscore:      Zscore:
Ward       significant   family       friend       state        trait        depression   STRESS       loneliness
Method     other social  social       social       anxiety      anxiety      (BDI)
           support       support      support
1            0.0253659   -0.1586084   -0.3073686   -0.3003678   -0.1480552   -0.2308712   -0.4889418    0.1718599
2           -1.6423312   -1.4103485   -1.2006885    0.9320337    0.9621648    1.1836913    0.9867314    1.3585691
3           -0.0809595   -0.0896958    0.1634026    1.2053890    0.9346288    0.6707560    0.9604135    0.2841101
4            0.7391507    0.8161275    0.7476080   -0.6860777   -0.7186848   -0.6148729   -0.4165012   -0.8963086
Total        0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000

Notice that the table includes the group means for each variable for each group and for the total (overall) population. You can decide whether or not you want that overall profile included in your graph. The totals will always all be 0.00 (they are average Z-scores). If you don't want the total data plotted, double-click the table and then highlight and delete that row. You can also edit the various names, etc.

Here's the table as I edited it before graphing.

Mean
Ward Method        soss         sass         frss         stanx        tranx        dep          stress       ruls
Grp 1 (N = 41)     0.025366    -0.1586084   -0.3073686   -0.3003678   -0.1480552   -0.2308712   -0.48894      0.171860
Grp 2 (N = 19)    -1.6423      -1.4103485   -1.20069      0.9320337    0.9621648    1.1836913    0.986731     1.35857
Grp 3 (N = 20)    -0.08096     -0.0896958    0.1634026    1.2053890    0.9346288    0.6707560    0.960414     0.284110
Grp 4 (N = 43)     0.739151     0.8161275    0.7476080   -0.6860777   -0.7186848   -0.6148729   -0.41650     -0.89631

To obtain the graph, double-click the table to put it in edit mode. Then right-click the table, and a menu appears that includes "Create Graph". Move the cursor to that phrase and another menu appears. Click on "Line".

Here's the 4-group plot.

[Figure: line plot of the mean Z-scores for Grp 1 through Grp 4 across soss, sass, frss, stanx, tranx, dep, stress, and ruls; vertical axis from -2 to 2.]


Deciding between the 3- and 4-group models (separate or combine Grp 2 and Grp 3?)

Group 4: "Healthy" cluster. Above average on social support; below average on loneliness, anxiety, depression, and stress.

Group 1: "Average" cluster. Pretty flat.

Group 2: "Unsupported, Lonely, Unhappy". Low support; high on anxiety, depression, stress, and loneliness.

Group 3: "Semi-supported, Not Lonely but Unhappy". Average support; close to average on loneliness; high on anxiety, depression, and stress.

I'd keep 2 …
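
The "look for the big jump" reading of the agglomeration schedule can also be checked by hand outside SPSS. The following is a minimal Python sketch, not part of the SPSS procedure: the coefficients are typed in from stages 111 to 122 of the schedule above, and the rule used to flag a jump (an increase at least 1.5 times the previous stage's increase) is an assumed heuristic chosen for illustration, as are the function names.

```python
# Ward coefficients copied from the agglomeration schedule (stages 111-122).
N_CASES = 123
coefficients = {
    111: 289.703, 112: 304.766, 113: 320.378, 114: 336.982,
    115: 355.247, 116: 375.485, 117: 402.101, 118: 432.390,
    119: 469.263, 120: 542.696, 121: 633.798, 122: 976.000,
}


def error_increases(coefs):
    """Return {stage: increase in the coefficient over the previous stage}."""
    stages = sorted(coefs)
    return {cur: coefs[cur] - coefs[prev] for prev, cur in zip(stages, stages[1:])}


def first_big_jump(coefs, ratio=1.5):
    """Assumed heuristic (not an SPSS feature): return the first stage whose
    increase is at least `ratio` times the previous stage's increase."""
    increases = error_increases(coefs)
    stages = sorted(increases)
    for prev, cur in zip(stages, stages[1:]):
        if increases[cur] >= ratio * increases[prev]:
            return cur
    return None


jump_stage = first_big_jump(coefficients)
# Stage s merges two clusters, going from N - s + 1 clusters to N - s.
clusters_before_jump = N_CASES - jump_stage + 1

for stage, increase in error_increases(coefficients).items():
    before, after = N_CASES - stage + 1, N_CASES - stage
    print(f"stage {stage}: {before} -> {after} clusters, increase {increase:.3f}")
print(f"first big jump at stage {jump_stage}: keep {clusters_before_jump} clusters")
```

With these numbers the increase roughly doubles at stage 120 (from about 36.9 to about 73.4), which is the 4-to-3 merge discussed above. The last two stages show even larger increases, so the threshold only flags a candidate. The frequencies and profiles still have to support the choice.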


UNL PSYC 451 - Clustering Example
