Slide 1 of 30: Latent Profile Analysis

Lecture 14, April 4, 2006
Clustering and Classification

Slide 2: Today's Lecture

- Latent Profile Analysis (LPA)
  - LPA as a specific case of a Finite Mixture Model (FMM)
  - How to do LPA

Outline: Overview; Latent Profile Analysis; MVN; LPA as a FMM; LPA Example 1; Wrapping Up

Slide 3: LPA Introduction

- Latent profile models are commonly attributed to Lazarsfeld and Henry (1968).
- As with K-means and hierarchical clustering techniques, the final number of latent classes is not usually predetermined prior to analysis with latent class models.
  - The number of classes is determined through comparison of posterior fit statistics.
  - The characteristics of each class are also determined following the analysis.

Slide 4: Variable Types Used in LPA

As it was originally conceived, LPA is an analysis that uses:

- A set of continuous (metrical) variables, with values allowed to range anywhere on the real number line. Examples include: [...]
- The number of classes (an integer ranging from two through [...]), which must be specified prior to analysis.

Slide 5: LPA Process

For a specified number of classes, LPA attempts to:

- For each class, estimate the statistical likelihood of each variable.
- Estimate the probability that each observation falls into each class.
  - For each observation, the sum of these probabilities across classes equals one.
  - This is different from K-means, where an observation is a member of a class with certainty.
- Across all observations, estimate the probability that any observation falls into a class.

Slide 6: LPA Estimation
Estimation in LPA is more complicated than in previous methods discussed in this course.

- In agglomerative hierarchical clustering, a search process was used, with new distance matrices being created for each step.
- K-means used more of a brute-force approach, trying multiple starting points.
- Both methods relied on distance metrics to find clustering solutions.

LPA estimation uses distributional assumptions to find classes. The distributional assumptions provide the measure of "distance" in LPA.

Slide 7: LPA Distributional Assumptions

Because LPA works with continuous variables, the distributional assumptions of LPA must use a continuous distribution.

Within each latent class, the variables are assumed to:

- Be independent
- (Marginally) be distributed normal (or Gaussian)

For a single variable, the normal distribution function is:

$$f(x_i) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left[-\frac{(x_i - \mu_x)^2}{2\sigma_x^2}\right]$$

Slide 8: Joint Distribution

- Because, conditional on class, we have normally distributed variables in LPA, we could also phrase the likelihood as coming from a multivariate normal distribution (MVN).
- The next set of slides describes the MVN.
- What you must keep in mind is that our variables are set to be independent, conditional on class, so the within-class covariance matrix will be diagonal.

Slide 9: Multivariate Normal Distribution

- The generalization of the well-known normal distribution to multiple variables is called the multivariate normal distribution (MVN).
- Many multivariate techniques rely on this distribution in some manner.
- Although real data may never come from a true MVN, the MVN provides a robust
approximation and has many nice mathematical properties.
- Furthermore, because of the central limit theorem, many multivariate statistics converge to the MVN distribution as the sample size increases.

Slide 10: Univariate Normal Distribution

The univariate normal distribution function is:

$$f(x_i) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left[-\frac{(x_i - \mu_x)^2}{2\sigma_x^2}\right]$$

- The mean is $\mu_x$.
- The variance is $\sigma_x^2$.
- The standard deviation is $\sigma_x$.

Standard notation for normal distributions is $N(\mu_x, \sigma_x^2)$, which will be extended for the MVN distribution.

Slide 11: Univariate Normal Distribution, $N(0, 1)$

[Figure: density $f(x)$ of $N(0, 1)$, plotted for $x$ from -6 to 6.]

Slide 12: Univariate Normal Distribution, $N(0, 2)$

[Figure: density $f(x)$ of $N(0, 2)$, plotted for $x$ from -6 to 6.]

Slide 13: Univariate Normal Distribution, $N(3, 1)$

[Figure: density $f(x)$ of $N(3, 1)$, plotted for $x$ from -6 to 6.]

Slide 14: UVN Notes

Recall that the area under the curve for the univariate normal distribution is a function of the variance (standard deviation). In particular:

$$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$$

$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$$

Also note the term in the exponent:

$$\left(\frac{x - \mu}{\sigma}\right)^2 = (x - \mu)(\sigma^2)^{-1}(x - \mu)$$

This is the square of the distance from
x to $\mu$ in standard deviation units, and will be generalized for the MVN.

Slide 15: MVN

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} e^{-(\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) / 2}$$

- The mean vector is $\boldsymbol{\mu}$.
- The covariance matrix is $\boldsymbol{\Sigma}$.
- Standard notation for multivariate normal distributions is $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
- Visualizing the MVN is difficult for more than two dimensions, so I will demonstrate some plots with two variables: the bivariate normal distribution.
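
As a minimal sketch (not part of the original lecture), the Python code below evaluates the MVN density f(x) given above for the two-variable case, using only the standard library. The function names and the example values are illustrative assumptions. It also checks the point made on the Joint Distribution slide: when the covariance matrix is diagonal, as LPA assumes within a class, the joint density equals the product of the univariate normal densities.

```python
import math


def univariate_normal_pdf(x, mean, variance):
    """Density of N(mean, variance) evaluated at x."""
    # Squared distance from x to the mean, in standard deviation units.
    z2 = (x - mean) ** 2 / variance
    return math.exp(-0.5 * z2) / math.sqrt(2.0 * math.pi * variance)


def bivariate_normal_pdf(x, mean, cov):
    """Density of N_2(mean, cov) at the point x; cov is a 2x2 nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]].
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Squared generalized distance (x - mu)' Sigma^{-1} (x - mu).
    dist2 = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    p = 2  # number of variables
    norm_const = (2.0 * math.pi) ** (p / 2) * math.sqrt(det)
    return math.exp(-0.5 * dist2) / norm_const


# Hypothetical within-class parameters: two independent variables,
# so the covariance matrix is diagonal.
point = [1.0, 2.0]
class_mean = [0.0, 3.0]
class_cov = [[1.0, 0.0], [0.0, 2.0]]

joint = bivariate_normal_pdf(point, class_mean, class_cov)
product = (univariate_normal_pdf(point[0], class_mean[0], class_cov[0][0])
           * univariate_normal_pdf(point[1], class_mean[1], class_cov[1][1]))

print("joint density:       ", joint)
print("product of marginals:", product)
print("equal up to rounding:", math.isclose(joint, product))
```

The explicit 2x2 inverse keeps the sketch dependency-free; for more than two variables a linear algebra library would be the more practical choice.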