
Chapter 3

Linear regression part I: Identification and interpretation

In this chapter we begin studying linear regression analysis, the workhorse of empirical research in economics. We will first introduce the notion of an econometric model, which will provide a framework for how variables relate to each other and thus guide our analysis of their relationships. We will then turn to the concept of identification, where we assume we observe the entire population and show we can recover the parameters of our econometric model. Afterwards, we will turn our attention to the case where we only have a sample and discuss how we can estimate the econometric model in practice. As part of this, we will evaluate whether our estimates are any good.

3.1 Identification

In the previous chapter we introduced the notion of a sample. In this chapter we will return to assuming the researcher observes the entire population. This means that when we discuss random variables available to the researcher (which implies that not all random variables will be available), we will assume we know their distributions. We will then show how we can recover the parameters of interest using a linear regression. This is called proving identification.

One reason this is important is that the proof of identification can guide us on how to estimate the parameter of interest using a sample. The other reason is that if you cannot identify a parameter even when you fully observe the population, then you have no chance of estimating it well with a sample. So proving identification tells you that your empirical exercise is not futile.

3.2 Economic and econometric models

Unfortunately, the world is a complicated place. Whether we want to study individuals, households, firms, states, countries, etc., the relationships among any collection of variables are likely to be a mess. For instance, perhaps we want to study how later life outcomes, such as lifetime earnings, are impacted by early childhood circumstances. Variables that contribute to early
childhood circumstances may include the age of the parents at birth, their marital status, their education, their income, whether or not the grandparents were around to assist in child rearing, whether the child had siblings, the neighborhood characteristics, attendance of daycare or preschool, school quality, childhood illnesses, and so on. Each of those variables is likely to be correlated with the others, though. For example, the income of the parents is likely tied to their ages and years of education. One's income may also determine the neighborhoods one can afford to live in, as well as the schools one can afford to send one's children to.

In order to disentangle the relationship between each of these variables and the outcome variable of interest, we need a framework to provide structure to the problem as well as to simplify our analysis. Going forward, we will focus on the setup where there is some outcome variable $Y$ determined by various input variables $X = (X_1, \ldots, X_K)'$ and an unobservable random variable $U$.

Important note on indexes: We are back to studying the population. The subscripts in $X = (X_1, \ldots, X_K)'$ index the components of the random vector $X$. In general, a subscript $i$ is used to index an individual in a sample or population. So if we want to denote the $k$th variable for the $i$th individual, we use double subscripts: $X_{ik}$.

The variable $Y$ may be referred to as the outcome variable or dependent variable. The $X$ variables may be referred to as the input variables, control variables, covariates, or independent variables. The $U$ variable may be referred to as the unobservable, the error term, or the shock, and can be thought of as the composite of all other variables that affect $Y$ but are not included in the vector $X$.

The goal is to uncover the relationship between the outcome variable $Y$ and the covariates $X$; this is what we call regression analysis [1]. In particular, we are interested in understanding this relationship for the population, not just for a sample.

[1] The name regression stems from an
old study where someone wanted to know what determined the characteristics of horses. This led them to look at the lineage of horses, hence the name regression.

3.2.1 Economic models

Prior to this class, you have already seen various economic models. Take, for instance, the model of production faced by firms. Suppose that firms produce

$$Y = A K^{\alpha} L^{\beta}, \tag{3.1}$$

where $Y$ is the firm's output, $A$ is the firm's level of productivity, $K$ is the firm's capital input, and $L$ is the firm's labor input. We assume that firms take $A$, $\alpha$, and $\beta$ as given and choose the capital and labor inputs under some constraint to maximize profit. Since the firm only sells output $Y$, it will equivalently want to maximize output.

The economic model is simply a collection of assumptions that tell you how agents make their decisions, and hence it provides structure to your analysis. Specifically, you assume that the firm wants to maximize profit (equivalently, output); you assume that the production function has a particular form, e.g., the Cobb-Douglas form above; and you assume the agent faces some restrictions on the values of $K$ and $L$ it can choose, since there is some constraint. All these assumptions can be characterized mathematically, which is what allows you to solve the model using algebra and calculus, i.e., find the $K$ and $L$ that maximize profits.

Definition 1. An economic model is a collection of assumptions about how agents make their decisions.

Example 3.2.1. The economic models you saw in your economics classes are just assumptions. For instance, you may have assumed that the agent has a particular utility function that he or she wishes to maximize. You may have assumed they only had so much time or money to allocate to maximizing their utility. You may have assumed agents only live for two periods and can borrow or lend money across periods.

Variables whose values are determined by an economic model, such as the $K$ and $L$ you are optimizing over in the example above, are called endogenous variables. So any variable that is a choice is often argued to be
endogenous. In contrast, variables that are assumed not to be determined by an economic model are called exogenous variables.

Example 3.2.2. Consider a labor economics model where wages of workers are determined by the education level of the worker, the worker's demographics, and the supply and demand for workers in the market. Workers know this and must maximize their utility, which depends on their consumption (which requires income) and leisure time (which means not working or studying). So workers must decide how to
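
To make the idea of "solving the model" in Section 3.2.1 concrete, here is a minimal Python sketch of the firm's problem with a Cobb-Douglas production function. The text only says the firm chooses capital and labor "under some constraint", so the budget constraint `r*K + w*L = budget`, the input prices `r` and `w`, the exponent names `alpha` and `beta`, and all numerical values below are assumptions made purely for illustration. Under those assumptions, the first-order conditions give a closed-form solution in which the firm spends the share `alpha / (alpha + beta)` of its budget on capital; the sketch compares that solution with a brute-force search along the budget line.

```python
def cobb_douglas_output(A, K, L, alpha, beta):
    """Output of a Cobb-Douglas production function: A * K**alpha * L**beta."""
    return A * K**alpha * L**beta


def optimal_inputs(budget, r, w, alpha, beta):
    """Closed-form output-maximizing (K, L) subject to r*K + w*L = budget.

    The budget constraint and the prices r, w are hypothetical; they stand in
    for the unspecified constraint mentioned in the text.
    """
    share_k = alpha / (alpha + beta)      # budget share spent on capital
    K_star = share_k * budget / r
    L_star = (1.0 - share_k) * budget / w
    return K_star, L_star


def grid_search(A, budget, r, w, alpha, beta, steps=10_000):
    """Brute-force check: try many K on the budget line and keep the best."""
    best_K, best_L, best_Y = 0.0, 0.0, 0.0
    for i in range(1, steps):
        K = (budget / r) * i / steps      # candidate capital input
        L = (budget - r * K) / w          # labor implied by the budget line
        Y = cobb_douglas_output(A, K, L, alpha, beta)
        if Y > best_Y:
            best_K, best_L, best_Y = K, L, Y
    return best_K, best_L, best_Y


# Exogenous quantities (taken as given by the firm); illustrative values only.
A, alpha, beta = 2.0, 0.3, 0.7
budget, r, w = 100.0, 2.0, 5.0

# Endogenous quantities (chosen by the firm).
K_star, L_star = optimal_inputs(budget, r, w, alpha, beta)
K_grid, L_grid, Y_grid = grid_search(A, budget, r, w, alpha, beta)

print("closed form:", K_star, L_star)
print("grid search:", K_grid, L_grid)
```

In this sketch `A`, `alpha`, `beta`, `r`, `w`, and `budget` play the role of exogenous variables, while `K_star` and `L_star` are endogenous: their values are determined by the model's assumptions.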
