UI STAT 4520 - Bayesian Regression Model For Predicting Income


Bayesian Regression Model For Predicting Income
Bayesian Statistics Project Final Report

Hai Liu, Yi-Lung Kuo, Yi He, Jin Liu
December 3, 2006

Introduction

Background

Income is, to some extent, regarded as one index of a successful life. It is generally believed that several factors influence the income earned after leaving school, such as intelligence, gender, race, and educational level. For example, using longitudinal data from the National Longitudinal Survey of Youth (NLSY), Murray (1997, 1998) categorized intelligence into five levels, from very dull to very bright, and showed that adults with higher intelligence had higher income. Terman spent most of his life conducting the landmark longitudinal study of gifted students, which ran for almost forty years from 1921 (Terman & Oden, 1959). He found that the earned income of gifted men was higher than that of gifted women. Another longitudinal study of gifted students (Subotnik, Karp, & Morgan, 1989) showed the same result. Additionally, Terman’s study revealed that different educational levels, conditional on gender, were associated with different amounts of income. In short, although this study was confined to gifted adults, it implied that gender and educational level might be factors influencing income in adulthood.

The 1998 data from the U.S. Census Bureau give some indication that demographic information, such as gender, race, and educational level, might have effects on yearly income. Tables 1 through 4 in the appendix show mean income in 1998 for all applicable people in the United States. These data will be used to set informative priors when the Bayesian regression model is used in our study.

Our research questions are: how do these different factors affect income on average? Do the gifted tend to earn more than the non-gifted?
Here, by gifted we mean those who have ever been enrolled in a gifted/talented program.

Dataset

The data used in our project come from the National Education Longitudinal Study (NELS 1988-2000), which was conducted by the National Center for Education Statistics (NCES). The data were collected in five waves: the base year in the spring of 1988 (BY) and four follow-ups in 1990 (F1), 1992 (F2), 1994 (F3), and 2000 (F4). The subjects were a nationally representative sample of eighth-grade students, who were surveyed in the base year and again in F1 through F4. The major topics in the questionnaire were school, work, and home experiences; educational resources and support; the role in education of their parents and peers; neighborhood characteristics; educational and occupational aspirations; and other student perceptions. We select only a few variables from this large data set for our statistical analysis under the Bayesian framework.

Model Specification

Our purpose is to set up a model to determine whether gifted students are more likely to have higher income than the non-gifted, and whether gender, race, education level, and working hours have a significant influence on income. In our model, income is the response variable, which can be treated as a continuous random variable. The predictor variables included are:

Gifted          Ever enrolled in a gifted program
Gender          Male/Female
Race            Race of respondent
                1. American Indian or Alaska Native
                2. Asian or Pacific Islander
                3. Black, not Hispanic
                4. White, not Hispanic
                5. Hispanic or Latino
                6. More than one race
Education       Highest PSE (postsecondary education) degree attained as of 2000
                1. Some PSE, no degree attained
                2. Certificate/license
                3. Associate’s degree
                4. Bachelor’s degree
                5. Master’s degree/equivalent
                6. Ph.D. or a professional degree
Working Hours   Number of hours the respondent worked in 1999

Working hours can be treated as a continuous variable.
All the other predictor variables are categorical, so, just as in the frequentist approach, dummy variables are used to set up the model in WinBUGS. Another issue is that, after checking the data, we find that the response, income, is very skewed to the left. So before fitting the model we first transform the income variable. A natural choice is to use the logarithm of income instead of raw income as the response. The model therefore has the form:

log(Income) ∼ 1 + Gifted + Gender + Race + Education + Working Hours

Because an overall intercept is included in the model, the number of dummy variables for each predictor should be one fewer than the number of levels of that predictor. For instance, there are 6 levels of education as described above, so there are 5 dummy variables associated with the predictor education.

First, noninformative priors are used for all parameters; then some informative priors based on previous studies are considered. The parameters of the dummy variables are assumed to be independent. The model specification in WinBUGS is listed in the appendix.

Some Computing Output

Noninformative Prior

The following is some computing output from the model with noninformative priors.
Notice that the response is log(Income).

Node statistics

node    mean      sd        MC error   2.5%       median    97.5%     start   sample
a       4.67E-4   8.26E-6   1.266E-7   4.51E-4    4.67E-4   4.83E-4   1001    5000
alpha   9.0840    0.05768   8.912E-4   8.973      9.084     9.198     1001    5000
beta    0.0545    0.01091   1.516E-4   0.03271    0.05463   0.0762    1001    5000
e2      -0.0160   0.01628   2.331E-4   -0.04755   -0.0162   0.0162    1001    5000
e3      0.0519    0.01650   2.046E-4   0.01944    0.05191   0.0843    1001    5000
e4      0.2055    0.01058   1.483E-4   0.1849     0.2056    0.226     1001    5000
e5      0.2637    0.02261   2.981E-4   0.2188     0.2634    0.3089    1001    5000
e6      0.6288    0.05971   8.349E-4   0.5139     0.6281    0.7475    1001    5000
gamma   -0.1802   0.00922   1.263E-4   -0.1984    -0.1803   -0.1623   1001    5000
r2      0.3612    0.05765   9.484E-4   0.2509     0.3611    0.4749    1001    5000
r3      0.1832    0.05653   9.74E-4    0.07506    0.1825    0.2959    1001    5000
r4      0.2621    0.05481   8.718E-4   0.1571     0.2613    0.3706    1001    5000
r5      0.2362    0.05601   8.972E-4   0.127      0.2359    0.3461    1001    5000
r6      0.1933    0.06054   0.001005   0.07671    0.1927    0.3094    1001    5000
sigma   0.3176    0.00319   4.766E-5   0.3116     0.3175    0.3242    1001    5000

Alpha is the overall intercept, beta is the coefficient associated with gifted, and gamma is the coefficient associated with female; the e[i]’s are the coefficients associated with education and the r[i]’s are those associated with race; sigma is the variance of the random error; and a is the coefficient of the only continuous predictor, working hours.

Only one chain is run. There are 6000 posterior samples in total, and the first 1000 iterations are discarded as the burn-in stage after checking the history plots of all the parameters. Also the
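
Because the response is log(Income), each coefficient in the node statistics is a shift on the log scale, and exp(coefficient) - 1 gives the implied proportional change in income relative to the omitted reference level, holding the other predictors fixed. The following is a minimal Python sketch of that back-transformation, not part of the original WinBUGS analysis. The helper names (pct_change, summarize) and the choice of three nodes are illustrative assumptions; the numbers are copied from the table. The 2.5% and 97.5% quantiles carry over exactly because exp is monotone, whereas exp(posterior mean) is only an approximation to the posterior mean of the transformed quantity.

```python
import math

# Posterior summaries copied from the node statistics table:
# (mean, 2.5% quantile, 97.5% quantile), all on the log(Income) scale.
# The labels in parentheses follow the report's description of each node.
POSTERIOR = {
    "beta (gifted)": (0.0545, 0.03271, 0.0762),
    "gamma (female)": (-0.1802, -0.1984, -0.1623),
    "e6 (Ph.D. or professional degree)": (0.6288, 0.5139, 0.7475),
}


def pct_change(coef):
    """Percent change in income implied by a log-scale coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


def summarize(posterior):
    """Back-transform each (mean, lower, upper) triple to percent changes."""
    return {
        name: (pct_change(mean), pct_change(lower), pct_change(upper))
        for name, (mean, lower, upper) in posterior.items()
    }


if __name__ == "__main__":
    for name, (mid, lower, upper) in summarize(POSTERIOR).items():
        # exp(mean) is an approximation; the interval endpoints are exact
        # quantiles of the back-transformed quantity.
        print(f"{name}: {mid:+.1f}% (95% interval {lower:+.1f}% to {upper:+.1f}%)")
```

Read this way, a negative gamma corresponds to lower income for women than for men with otherwise identical predictors, and a positive beta corresponds to higher income for those who were ever in a gifted program.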

