Duke STA 216 - Lecture 15



STA 216, GLM, Lecture 15
October 22, 2008

Outline
- Factor Analysis
- Latent Factor Regression Models

GLMM Assumptions
- Data consist of repeated observations, $y_i = (y_{i1}, \ldots, y_{in_i})'$, for subject $i$ ($i = 1, \ldots, n$)
- GLMMs assume that the elements of $y_i$ are conditionally independent given known predictors $X_i$, $Z_i$ and random effects $b_i$
- Under this assumption, we simply specify separate GLMs for each $y_{ij}$, with linear predictors $\eta_{ij} = x_{ij}'\beta + z_{ij}'b_i$
- Shared dependence on the latent variables $b_i$ induces dependence among the elements of $y_i$

Limitations of GLMMs
- Typical GLMMs are designed to accommodate dependence in repeated observations of the same type
- For example, we observe the same response on a subject at different times, or the same response on different subjects within a study center
- What if, instead of repeated observations of the same type, we have multiple observations that are dependent but different?

Sperm Concentration Example
- In studying predictors of sperm concentration, it is necessary to count the number of sperm in a sample
- This can be difficult to do accurately, and three methods are available, based on manual counting or two different automated methods
- Let $y_i = (y_{i1}, y_{i2}, y_{i3})$ denote the three measures of concentration for man $i$, and let $x_i = (x_{i1}, \ldots, x_{ip})'$ denote a vector of predictors
- How to model these data?

Possibility 1: Multivariate Linear Regression
- One possibility is a multivariate linear regression model,
  $y_i = \beta x_i + \epsilon_i$, $\epsilon_i \sim N_3(0, \Sigma)$,
  where $\beta$ is a $3 \times p$ matrix of coefficients and $\Sigma$ is a $3 \times 3$ covariance matrix
- In this case, we have separate regression coefficients describing the predictor effects on each measure of sperm concentration
- Really, we'd like a single set of coefficients for the sake of parsimony and interpretability

Possibility 2: Random Effects Model
- Another possibility is a typical random effects model,
  $y_{ij} = x_i'\beta + z_i'b_i + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$,
  with $b_i \sim N(0, \Omega)$ random effects
- Is this model reasonable for these data?
- What if one measure of sperm concentration (e.g., the manual one) has much more measurement error than the other measures?

Problems with the Random Effects Model
- The random effects model implies that the correlation between $y_{ij}$ and $y_{ij'}$ is the same for all $j, j'$ combinations
- If the multiple items are designed to measure one latent variable (sperm concentration), then this equal-correlation assumption only makes sense if the measurement errors of the different items have the same variance
- This is typically unrealistic if we have different types of measurements (manual, two different automated technologies)

Factor Analysis
- Factor analysis provides an approach to allow multiple measures of a latent variable, while accounting for measurement error
- In particular, initially ignoring covariates, we can use the model
  $y_i = \mu + \Lambda \eta_i + \epsilon_i$, $\epsilon_i \sim N_3(0, \Sigma)$,
  where $\mu = (\mu_1, \mu_2, \mu_3)'$ are intercepts, $\Lambda = (\lambda_1, \lambda_2, \lambda_3)'$ is the factor loadings matrix, $\eta_i \sim N(0, 1)$ is the latent sperm concentration, $\epsilon_i$ are idiosyncratic measurement errors specific to each item, and $\Sigma$ is a diagonal error variance matrix

Factor Analysis (continued)
- For identifiability, we require at least one $\lambda_j > 0$; otherwise, the model doesn't know whether $\eta_i$ takes a high value for men with high concentration or a low value (sign ambiguity)
- We also fix the location & scale of the latent variable distribution
- Identifiability can be assessed by marginalizing out the latent variables and seeing whether or not the data inform about all the parameters in the resulting model

Marginal Form & Identifiability
- The factor analysis model described above induces the following multivariate normal model for $y_i$:
  $y_i \sim N_3(\mu, \Lambda\Lambda' + \Sigma)$
- This model is equivalent to the hierarchical form, which includes the $\eta_i$ latent variables
- Can we estimate all the parameters under the above restrictions?
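As a quick numerical check of this marginal form, the sketch below simulates from the hierarchical one-factor model and compares the sample covariance of $y_i$ with $\Lambda\Lambda' + \Sigma$. The specific values of $\mu$, $\Lambda$, and $\Sigma$ are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                             # subjects (large, to make the check sharp)
mu = np.array([1.0, 2.0, 3.0])          # intercepts (illustrative)
lam = np.array([1.0, 0.8, 1.5])         # loadings; lambda_1 > 0 fixes the sign
sigma2 = np.array([0.5, 0.2, 1.0])      # diagonal of Sigma (item-specific error variances)

# Hierarchical form: y_i = mu + lam * eta_i + eps_i, with eta_i ~ N(0, 1)
eta = rng.standard_normal(n)
eps = rng.standard_normal((n, 3)) * np.sqrt(sigma2)
y = mu + np.outer(eta, lam) + eps

# Marginal form: y_i ~ N_3(mu, Lambda Lambda' + Sigma)
print(np.round(np.cov(y, rowvar=False), 2))   # sample covariance
print(np.outer(lam, lam) + np.diag(sigma2))   # implied covariance; should agree up to Monte Carlo error
```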
Including Predictors?
- How do we include the predictors in the factor analysis model?
- Potentially, this can be done at the measurement level, allowing separate effects on each of the responses
- However, such a specification is not parsimonious & does not address the interest in assessing effects on the latent variable

Latent Factor Regression
- In latent factor regression, one uses the same measurement model as in typical factor analysis,
  $y_i = \mu + \Lambda \eta_i + \epsilon_i$, $\epsilon_i \sim N_3(0, \Sigma)$,
  with one or more elements of $\Lambda$ constrained as discussed above
- We then include the predictors at the latent variable level,
  $\eta_i = x_i'\beta + \delta_i$, $\delta_i \sim N(0, 1)$,
  with $x_i$ lacking an intercept
- Now we have a single vector of coefficients, $\beta$, describing the predictor effects on standardized sperm concentration, instead of 3 sets of coefficients

Marginalized Form of Latent Factor Regression
- We can again marginalize out the latent variables $\eta_i$ to obtain a marginal form (now conditional on the predictors):
  $y_i = \mu + \Lambda x_i'\beta + \epsilon_i^*$, $\epsilon_i^* \sim N_3(0, \Lambda\Lambda' + \Sigma)$
- Hence, the latent factor regression model implies a scaled linear regression form
- In particular, we have a single linear regression model $x_i'\beta$ that is defined on a standardized scale & then multiplied by $\lambda_j$ in the model for the mean of the $j$th response
- This allows different measurement scales for the different responses

Dimensionality Reduction through Latent Factor Regression
- Note that we have also reduced dimensionality in characterizing the predictor effects
- Instead of $p \times k$ regression coefficients ($k$ = number of responses), we have $p$ regression coefficients
- Instead of $k(k+1)/2$ free parameters in the covariance matrix, we have $2k$ free parameters (a worked count appears at the end of these notes)
- However, we are assuming that a single factor is sufficient to describe the dependence among $y_i$

Bayesian Implementation for Latent Factor Regression
- Complete a Bayesian specification with priors for the unknowns: intercepts ($\mu$), factor loadings ($\Lambda$), error variances ($\Sigma$), and regression coefficients ($\beta$)
- This model has a conditionally normal linear structure, so Gibbs sampling is straightforward if conditionally conjugate priors are chosen (normal or truncated normal for $\mu$, $\Lambda$, $\beta$; inverse gamma for the diagonal elements of $\Sigma$); a sketch of such a sampler is given below
- We can even include variable selection priors for $\beta$ without complication

Some Issues with Mixing of the Gibbs Sampler
- Due to the structure of the factor analysis model (with or without the regression component), there tends to be high posterior correlation among many of the parameters
- This leads to very poor mixing of the Gibbs sampler in certain cases, even if diffuse priors are not chosen
- The problem is particularly bad if there is very high correlation in certain of the responses, which was the case in the sperm concentration example
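To make the conditionally conjugate structure concrete, here is a minimal Gibbs sampler sketch for the latent factor regression model. It assumes $N(0, \tau^2)$ priors for $\mu_j$, $\beta$, and $\lambda_j$ (with $\lambda_1$ truncated to $(0, \infty)$ for sign identifiability) and inverse-gamma priors for the $\sigma_j^2$; the priors, hyperparameters, and function name are illustrative assumptions rather than the lecture's own implementation.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_lfr(y, X, n_iter=5000, tau2=100.0, a=1.0, b=1.0, seed=0):
    """Gibbs sampler sketch for latent factor regression:
        y_i   = mu + lam * eta_i + eps_i,   eps_i ~ N(0, diag(sigma2))
        eta_i = x_i' beta + delta_i,        delta_i ~ N(0, 1)
    X should not contain an intercept column (the location is fixed by mu)."""
    rng = np.random.default_rng(seed)
    n, k = y.shape
    p = X.shape[1]
    mu, lam = y.mean(axis=0), np.ones(k)          # crude starting values
    sigma2, beta = y.var(axis=0), np.zeros(p)
    draws = {"mu": [], "lam": [], "beta": [], "sigma2": []}

    for _ in range(n_iter):
        # eta_i | rest: N(x_i'beta, 1) "prior" combined with the k measurement likelihoods
        prec = 1.0 + np.sum(lam**2 / sigma2)
        mean = (X @ beta + (y - mu) @ (lam / sigma2)) / prec
        eta = mean + rng.standard_normal(n) / np.sqrt(prec)

        # mu_j | rest: normal (conjugate)
        prec_m = n / sigma2 + 1.0 / tau2
        mu = ((y - np.outer(eta, lam)).sum(axis=0) / sigma2) / prec_m \
             + rng.standard_normal(k) / np.sqrt(prec_m)

        # lam_j | rest: normal; lam_1 redrawn truncated to (0, inf) for sign identifiability
        prec_l = eta @ eta / sigma2 + 1.0 / tau2
        mean_l = (eta @ (y - mu) / sigma2) / prec_l
        sd_l = 1.0 / np.sqrt(prec_l)
        lam = mean_l + rng.standard_normal(k) * sd_l
        lam[0] = truncnorm.rvs(-mean_l[0] / sd_l[0], np.inf,
                               loc=mean_l[0], scale=sd_l[0], random_state=rng)

        # beta | rest: conjugate normal linear model with eta as the response
        Q = X.T @ X + np.eye(p) / tau2
        beta = np.linalg.solve(Q, X.T @ eta) \
               + np.linalg.solve(np.linalg.cholesky(Q).T, rng.standard_normal(p))

        # sigma2_j | rest: inverse gamma (sampled via the reciprocal of a gamma draw)
        resid = y - mu - np.outer(eta, lam)
        sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + 0.5 * (resid**2).sum(axis=0)))

        for key, val in zip(draws, (mu, lam, beta, sigma2)):
            draws[key].append(val.copy())
    return {key: np.array(val) for key, val in draws.items()}
```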

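A hypothetical end-to-end check: simulate data from the latent factor regression model and confirm that the sampler roughly recovers $\beta$. The true values below are made up for illustration, and the mixing caveats from the last slide still apply.

```python
rng = np.random.default_rng(1)
n, p = 500, 2
X = rng.standard_normal((n, p))                  # predictors, no intercept column
beta_true = np.array([0.7, -0.4])
eta = X @ beta_true + rng.standard_normal(n)     # latent standardized concentrations
y = np.array([1.0, 2.0, 3.0]) + np.outer(eta, np.array([1.0, 0.8, 1.5])) \
    + rng.standard_normal((n, 3)) * np.sqrt([0.5, 0.2, 1.0])

draws = gibbs_lfr(y, X, n_iter=4000)
print(draws["beta"][2000:].mean(axis=0))         # posterior mean, roughly (0.7, -0.4)
```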

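Finally, the worked parameter count promised in the dimensionality-reduction slide, spelled out for the sperm example ($k = 3$ responses, $p$ predictors). Note from the arithmetic that the coefficient savings hold for any $k > 1$, while the covariance-parameter savings only kick in for $k \ge 4$:

```latex
\underbrace{p \times k = 3p}_{\text{separate coefficients per response}}
  \;\longrightarrow\; \underbrace{p}_{\text{latent-level coefficients}}
\qquad
\underbrace{\tfrac{k(k+1)}{2} = 6}_{\text{unrestricted covariance}}
  \;\longrightarrow\; \underbrace{2k = 6}_{\lambda_1,\lambda_2,\lambda_3,\ \sigma^2_1,\sigma^2_2,\sigma^2_3}
```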