2 Introduction to nonlinear models

2.1 Introduction

In this chapter, we will discuss the model that will be our central focus in Chapters 3–12. Through the course of our discussion, we will identify different approaches to inference in the model, setting the stage for these future chapters.

SITUATION: Assume that we have independent pairs of observations (Y_j, x_j), j = 1, . . . , n. The x_j may be "fixed" or random, as discussed in Chapter 1. We will assume that the pairs, and hence the random variables Y_j, are independent.

BASIC MODEL: Rather than state the model in the form of "response = model + deviation" and a series of assumptions about the deviations, we will instead write the model in terms of what we are willing to say about the first two moments of the distribution of Y_j given x_j. We will begin with a basic form of the model; as our discussion progresses, we will modify this basic form:

    E(Y_j | x_j) = f(x_j, β),    var(Y_j | x_j) = σ_j².    (2.1)

• In model (2.1), f(x, β) is a real-valued function of the vector of covariates x (r × 1), say, and the vector of regression parameters β (p × 1). The dependence of f on β need not be in a linear fashion; as in the models discussed in the examples of Chapter 1, f may depend on some or all of the components of β in a complicated, nonlinear way. Note that r need not be equal to p, as in the examples in Chapter 1.

• The assumption var(Y_j | x_j) = σ_j² is left deliberately vague at this point. What is important right now is the idea that the values σ_j are j-dependent. This implies that the variances of the (conditional) distributions of Y values at different x_j are not the same across j. The values σ_j may be known constants or, more generally, the expression allows the possibility that they may depend on x_j.

• If we define

    e_j = Y_j − E(Y_j | x_j) = Y_j − f(x_j, β),

we do not necessarily assume that e_j is independent of x_j, as in the "classical" assumptions. Given the way we have defined the model, we do have that E(e_j | x_j) = 0, which is similar to classical assumption (1). Thus, we do assume that the chosen model form f(x_j, β) is a correct specification of E(Y_j | x_j). This may be interpreted as saying that the data analyst is well equipped to identify an appropriate model form. In the case where there is a theoretical basis for choosing a model, as in the case of pharmacokinetics, this is certainly a reasonable assumption.

• Note that we make no assumption about the distributions of the (Y_j, x_j)s or, more directly, the conditional distributions of Y_j given x_j. Major themes will be the ability to develop inferential strategies that have nice properties without making such assumptions and the robustness of inferential methods to violation of distributional assumptions that might be made.
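To make model (2.1) above concrete, here is a minimal simulation sketch. It is not part of the notes: the mean function f(x, β) = β₁ exp(−β₂ x) (so r = 1, p = 2, illustrating that r need not equal p), the choice σ_j = σ f(x_j, β), and the use of normal deviations as a convenient generating mechanism are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, beta):
    """Illustrative mean function f(x, beta) = beta1 * exp(-beta2 * x)."""
    return beta[0] * np.exp(-beta[1] * x)

n = 50
beta_true = np.array([10.0, 0.5])
x = rng.uniform(0.5, 8.0, size=n)        # scalar covariate, so r = 1, p = 2

# sigma_j is j-dependent through x_j: here, sd proportional to the mean
sigma = 0.1
sd_j = sigma * f(x, beta_true)

# E(Y_j | x_j) = f(x_j, beta); var(Y_j | x_j) = sd_j^2 varies with j.
# Normality is only a simulation convenience -- model (2.1) does not assume it.
Y = f(x, beta_true) + rng.normal(0.0, sd_j)
```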
2.2 Inferential approaches

Generally, as in the "classical" regression set-up, the scientific objective may be stated in terms of questions about the value of the parameter β, or at least some of its elements. That is, questions of interest focus on the mean response as a function of x_j, e.g.

• To obtain the most accurate characterization of the mean response

• To determine whether the model may be modified to exclude consideration of some components of x_j

Thus, at least initially, when we speak of inference within the framework of our basic model (2.1), we interpret this to mean estimation of and testing with respect to the parameter β. We will see that other parameters may also be involved in carrying this out most effectively, and that indeed other parameters in modifications of (2.1) may also be of interest.

APPROACH 1: Except for the fact that f is nonlinear in β, pretend that some of the other "classical" assumptions hold. In particular, whether we believe variance is constant or not, suppose we proceed as if it is, so that var(Y_j | x_j) = σ² is a constant. We might even adopt the assumption of normality of Y given x (this clearly would be erroneous for binary data or data in the form of small counts, but might be a reasonable approximation for continuous responses). Under this perspective, a natural approach would then be ordinary least squares (OLS); that is, minimizing in β the sum of squared deviations

    Σ_{j=1}^n {Y_j − f(x_j, β)}².    (2.2)

Just as in the linear case, this approach can be motivated in different ways:

• If we adopt the (conditional) normality assumption, maximum likelihood estimation of β (and σ²) involves jointly maximizing the loglikelihood

    log L = −(n/2) log 2π − (n/2) log σ² − (1/2) Σ_{j=1}^n {Y_j − f(x_j, β)}²/σ².    (2.3)

Maximization of this in β is equivalent to minimizing (2.2).

• With or without the normality assumption, one may view minimizing (2.2) in β as a "sensible" thing to do, as discussed in Chapter 1. The sum of squared deviations (2.2) may be viewed as a "distance" criterion that, in accordance with the assumption of constant variance, treats all n observations as if they were of equal quality.

ASIDE: It is important to recognize that, in discussing maximum likelihood, we are implicitly conditioning on x_j when writing the likelihood. To appreciate this, suppose the x_j (r × 1) are random and themselves normally distributed with some mean µ and covariance matrix Σ. So if we consider the (Y_j, x_j) as independent draws from a distribution of possible (Y, x) pairs, ideally, the loglikelihood of the observed data (the pairs (Y_j, x_j), j = 1, . . . , n) would be

    log L − (rn/2) log 2π − (n/2) log |Σ| − (1/2) Σ_{j=1}^n (x_j − µ)ᵀ Σ⁻¹ (x_j − µ),    (2.4)

where log L is defined in (2.3) and is the logarithm of the product of individual normal densities for Y_j given x_j. Note that, as the part of the loglikelihood due to x_j does not involve β, maximizing the full loglikelihood (2.4) in β is the same as maximizing log L alone. This also shows that, in the context of regression modeling, where the distribution of Y_j given x_j is of central interest, the distribution of (random) covariates is not directly relevant. A word of warning, however: this observation applies only if the x_j are observed without error and are not missing. In these more complex cases, which are beyond our scope here, the distribution of x_j values does enter into the picture, complicating matters considerably.

Using the notation described in Section 2.4, minimizing (2.2) is equivalent to solving the p-dimensional estimating equation

    Σ_{j=1}^n {Y_j − f(x_j, β)} f_β(x_j, β) = 0,    (2.5)

where f_β(x_j, β) = ∂f(x_j, β)/∂β is the p × 1 vector of partial derivatives of f with respect to β.
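As a sketch of Approach 1 (not the notes' own code), the following fits β by minimizing (2.2) for the data simulated above, using SciPy's general-purpose least-squares routine; the starting value and the mean function f carried over from the earlier sketch are assumptions. Because the normal loglikelihood (2.3) is maximized in β by the same minimizer, the OLS estimate is also the working MLE, and the MLE of σ² then follows in closed form.

```python
import numpy as np
from scipy.optimize import least_squares

# f, x, Y are assumed defined as in the simulation sketch above.
def resid(beta, x, Y):
    # Residuals Y_j - f(x_j, beta); least_squares minimizes half their sum
    # of squares, which has the same minimizer in beta as criterion (2.2).
    return Y - f(x, beta)

fit = least_squares(resid, x0=np.array([5.0, 1.0]), args=(x, Y))
beta_ols = fit.x

# Jointly maximizing (2.3) in (beta, sigma^2): beta_hat is the OLS value,
# and the MLE of sigma^2 is the mean squared residual.
sigma2_hat = np.mean(resid(beta_ols, x, Y) ** 2)
```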

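Continuing the same sketch, one can check the characterization (2.5) numerically: at the OLS solution, the p × 1 sum Σ_j {Y_j − f(x_j, β̂)} f_β(x_j, β̂) should be zero up to numerical tolerance. The central-difference gradient below is an illustrative stand-in for the analytic f_β.

```python
import numpy as np

def f_beta(x, beta, eps=1e-6):
    """Central-difference approximation to f_beta(x, beta) = d f(x, beta)/d beta.

    Returns an n x p matrix whose j-th row is f_beta(x_j, beta) transposed.
    """
    cols = []
    for k in range(len(beta)):
        bp, bm = beta.copy(), beta.copy()
        bp[k] += eps
        bm[k] -= eps
        cols.append((f(x, bp) - f(x, bm)) / (2.0 * eps))
    return np.column_stack(cols)

# Left-hand side of (2.5) evaluated at beta_ols: approximately zero,
# since the gradient of criterion (2.2) vanishes at its minimizer.
score = f_beta(x, beta_ols).T @ (Y - f(x, beta_ols))
print(score)
```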
