
Regressions with a Dummy Variable as the Dependent Variable

$y = \beta_0 + \beta_1 x_1 + e$
- y is a binary number (either 0 or 1)
- Use D to denote the dummy dependent variable

Predicting D
- The prediction is an estimate of a conditional expected value: the average value of D given values of X
- ex. $E[D \mid X] = 0.9$ implies that out of every 10 observations with those X values, about 9 have D = 1 and 1 has D = 0
- So the regression estimates the probability that D = 1 given the values of X
- This model is called a Linear Probability Model
- β = change in probability of D = 1 / change in x

Forcing this regression to be a straight line can produce impossible probabilities (below 0 or above 1). A non-linear regression avoids this; the one used here is called a Logit Regression.
- β = change in log(odds D = 1) / change in X ≈ % change in odds D = 1 / change in X (the approximation holds for small β)
- Odds ratio: $e^{\beta} = \text{odds}[D = 1 \mid x + 1] \,/\, \text{odds}[D = 1 \mid x]$
- As x increases by one unit, the odds change by a factor equal to the odds ratio

Panel Data (Longitudinal Data)

Data that contains multiple observations of the same individuals across different periods of time.
- Common types include before-and-after studies and multiple-period tracking studies
- Variables have two subscripts: $Y_{it}$ and $X_{it}$
- subscript i represents the individual
- subscript t represents a specific time

There are different types of variables.
- Type 1) Variables that differ across individuals but are constant over time, i.e. the time index is irrelevant. ex. gender, race, parental education
- Type 2) Variables that change over time but are the same for all individuals, i.e. the individual index is irrelevant
- Type 3) Variables that change over time and are different for each individual, e.g. employment, marital status

Pooled Regression
- A regression where we ignore that the data is in a panel
- $E[e \mid X_i]$ does not equal 0, which violates CLRM assumption number 2
- $Y_{it} = \beta_0 + \beta_1 x_{it} + e_{it}$
- $e_{it}$ may be composed of errors from all three types of variables:
  - $a_i$ = error term from Type 1 variables
  - $b_t$ = error term from Type 2 variables
  - $c_{it}$ = error term from Type 3 variables

First Difference Regression

$\Delta Y_{it} = \beta_0 + \beta_1 \Delta x_{it} + e$
- Differencing removes $a_i$, because it is constant over time

Challenges to first differences
- The data needs observations for all individuals at all times
- You must have variation in $\Delta x_{it}$, or the standard error of β will be very large

Fixed Effects Regression
- Assign a unique dummy variable to every individual
- Include these dummy variables in your regression
- $Y_{it} = \beta_0 + \beta_1 x_{it} + F_1 D_1 + F_2 D_2 + F_3 D_3 + \dots + b_t + c_{it}$
- Fixed effects regression has n intercepts
- You only create n - 1 dummy variables; one individual does not get a dummy variable (its intercept is $\beta_0$)
- Once you include fixed effects in your regression, all other Type 1 variables must be dropped, because they are perfectly collinear with the dummies

Distributed Lag Model (DLM)

$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \dots + \beta_p X_{t-p} + e_t$
- A model in which past values of X affect the current value of Y
- Every lag you add costs one observation
- β values should taper off smoothly as lag length increases, e.g. $|\beta_1| > |\beta_{10}|$

Problems with the DLM
- Values of the lagged x's are likely to be highly collinear, which gives large S.E. for your estimators (the β's)
- Due to the large S.E., the tapering of the β's may not be smooth
- Each lag uses up one observation and adds one more parameter to estimate, i.e. every lag reduces the degrees of freedom by 2

Solution to the Problems with DLM
- Use the first lag of y to replace all lags of X. This is called a dynamic model
- A dynamic model is any model where lagged values of y are used as an independent variable

Dynamic Model

$Y_t = \alpha_0 + \beta_0 X_t + \lambda Y_{t-1} + e_t$

A dynamic model has:
- a dependent variable that changes over time
- a lag of the dependent variable as an independent variable
- independent variables that are not linear combinations of the other independent variables

It avoids all three problems of the DLM:
- Lagged X's are not put directly into the model, so there is no multicollinearity among them
- If $|\lambda| < 1$, then $\lambda^2, \lambda^3, \dots$ decrease and there is smooth tapering
- We lose only one observation and 1 degree of freedom because we are only using a single lag

Long-run multiplier
- $M = \beta_0 \sum_{t=0}^{\infty} \lambda^t$
- $\sum_{t=0}^{\infty} \lambda^t = \frac{1}{1 - \lambda}$ if $|\lambda| < 1$
- $M = \frac{\beta_0}{1 - \lambda}$

Auto-regressive Model AR(q)

$Y_t = \alpha_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_q Y_{t-q} + e_t$

Auto-regressive Distributed Lag Model ADL(q,p)

$Y_t = \alpha_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_q Y_{t-q} + \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + e_t$
- ADL(q,p) includes q lags of the dependent variable and p lags of the other independent variable

Bayesian Information Criterion (BIC)
- If you have T observations and K regressors:
- $BIC = \ln(SSR / T) + K \ln(T) / T$
- Pick the combination of q and p that minimizes BIC

Granger Causality
- F-test that all coefficients on the lags of x = 0
- The null hypothesis says that past values of x have no effect on y
- If rejected, then "x Granger-causes y"

Stationarity
- A variable is stationary if its main properties do not change over time:
  - mean
  - variance
  - correlation between the variable and its own lags depends only on the number of lags
- If two variables are non-stationary, they may be correlated for non-causal reasons. This is called a "spurious relationship"

Testing for mean-stationarity
- Run AR(1): $Y_t = \beta_0 + \lambda Y_{t-1} + e$
- If $|\lambda| < 1$, then Y will move towards a long-run value
- If $|\lambda| \geq 1$, Y has a "unit root" and is not mean-stationary
- If both variables come out as non-stationary, ask whether they are "cointegrated"
- cointegrated = non-stationary in the same way

Dickey-Fuller Test

$\Delta Y_t = \beta_0 + (\lambda - 1) Y_{t-1} + e_t$
- Regress $\Delta Y_t$ on $Y_{t-1}$, calculate the t-statistic in the usual way, and compare it to the critical value
- If the t-statistic exceeds the critical value in absolute value, reject the unit root. When the series being tested is the residual from regressing y on x, this means y and x are cointegrated
- If they are not cointegrated, take the first differences of your variables and run a regression of the change in Y on the change in x
  - The change in Y and the change in x are likely to be stationary
  - No spurious regression results
- Problems with this approach:
  - variables become noisy
  - hard to study because it doesn't follow economic theory

Stationarity (cont.)

If the variables are cointegrated, the regression of y on x will not be spurious.
- Regress y on X and estimate the residuals e
- Test whether the e's are stationary
- If yes, y and x are cointegrated
- If no, y and x are not cointegrated
- non-stationary means it has a "unit root"
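
The Dickey-Fuller regression described in these notes can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `dickey_fuller_t` and `simulate_ar1`, the simulated series, and the critical value of -2.86 (the commonly tabulated large-sample 5% value for the version with an intercept) are not from the notes. The test is one-sided, so the unit root is rejected when the t-statistic is more negative than the critical value.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on Y_{t-1} in: dY_t = b0 + (lambda - 1) * Y_{t-1} + e_t."""
    lagged = y[:-1]                                       # Y_{t-1}
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]   # change in Y_t
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    slope = sxy / sxx                                     # estimate of (lambda - 1)
    intercept = mean_d - slope * mean_x
    ssr = sum((d - intercept - slope * x) ** 2 for x, d in zip(lagged, diffs))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)             # usual OLS standard error
    return slope / se_slope


def simulate_ar1(lam, n, seed):
    """Hypothetical data: Y_t = lam * Y_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(lam * y[-1] + rng.gauss(0.0, 1.0))
    return y


CRITICAL_5PCT = -2.86  # assumed large-sample 5% critical value (with intercept)

t_stationary = dickey_fuller_t(simulate_ar1(0.5, 500, seed=1))
t_random_walk = dickey_fuller_t(simulate_ar1(1.0, 500, seed=1))

print("lambda = 0.5:", t_stationary, "reject unit root:", t_stationary < CRITICAL_5PCT)
print("lambda = 1.0:", t_random_walk, "reject unit root:", t_random_walk < CRITICAL_5PCT)
```

For a cointegration check, the same function could be applied to the residuals from regressing y on x, although the appropriate critical values for residual-based tests differ from the one assumed here.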

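The long-run multiplier formula from the dynamic model, M = β0 / (1 - λ), can be checked numerically by adding up the period-by-period effects β0·λ^t. The sketch below is a minimal illustration; the function names and the values β0 = 0.4 and λ = 0.6 are hypothetical, chosen only to show the partial sums approaching the closed form.

```python
def long_run_multiplier(beta0, lam):
    """Closed form M = beta0 / (1 - lam), valid only when |lam| < 1."""
    if abs(lam) >= 1:
        raise ValueError("the geometric sum converges only if |lambda| < 1")
    return beta0 / (1.0 - lam)


def cumulative_effect(beta0, lam, periods):
    """Partial sum beta0 * (lam**0 + lam**1 + ... + lam**(periods - 1))."""
    return sum(beta0 * lam ** t for t in range(periods))


beta0, lam = 0.4, 0.6  # hypothetical coefficient values

for periods in (1, 5, 20, 100):
    print(periods, cumulative_effect(beta0, lam, periods))

print("closed form:", long_run_multiplier(beta0, lam))
```

The guard on |λ| mirrors the condition in the notes: with |λ| ≥ 1 the effects do not taper off and the sum has no finite limit.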