Occupational Factors Affecting the Income of Canada's Residents in the 1970's
Group 5
Ben Wright, Bin Ren, Hong Wang, Jake Stamper, James Rogers, Yuejing Wu

Data Source: Census of Canada
Collected by Canadian Government in 1971
102 different occupational categories
4 occupational categories had incomplete data
Categories represent data aggregated over 1000's of employees
Definition of variables:
Gender: % of women in occupation
Years of Education: average number of years of education per worker
Job prestige: rating assigned based on social survey conducted in the mid-1960's
Job types: Blue collar (e.g. janitor), Professional (e.g. lawyer), White collar (e.g. insurance agent)

What factors affected the occupational income of Canada's residents in 1971?

Step 1: Data preparation
Removal of incomplete observations (4 types of employment were not classified into a type: baby sitters, athletes, newsboys, and farmers)
Removal of non-descriptive statistics (Census code)

Step 2: Exploratory data analysis
1. Professional occupations have higher average income, prestige scores, and years of education than blue and white collar jobs
2. White collar jobs (on average) employ a larger percentage of women

Step 3: Pair-wise scatter plot to see the relationships between variables
Correlations marked on the plot: +.57, -.45, +.87, +.70

Step 4: Linear regression
Data output
R² = 0.9023
F-stat: 120
P-value: < 0.00000000000000022

Variable       Coefficient   StdDev    T-value   P-value
Education      131.18        288.75    0.454     0.650
Women          -53.235       9.83      -5.415    0.00000050
Prestige       139.20        36.40     3.82      0.00024
Type (b.c.)    7.32          3037.27   0.002     0.998
Type (prof.)   516.47        3519.59   0.147     0.884
Type (w.c.)    355.31        3135.86   0.113     0.91

Step 5: Test the validity of linear regression: Normality?
Data is skewed towards higher incomes

Step 5: Test the validity of linear regression: Heteroskedasticity?
Data is heteroskedastic -> need to perform data transformation
R² = .90
Variance is not constant

Step 6: Log Transformation (log income)
Approximates a normal distribution

Results of linear regression on log transformation
Education is not a significant variable and can be removed from the model

Variable       Coef.     StdDev   T-value   P-value
Education      0.0076    0.0255   0.3       0.765
Women          -0.0085   0.0009   -9.467    3.01e-15
Prestige       0.0208    0.0033   6.340     8.42e-09
Type (b.c.)    7.8720    0.1811   43.462    <2e-16
Type (prof.)   7.8584    0.3019   26.034    <2e-16
Type (w.c.)    7.9428    0.2453   32.379    <2e-16

Are different models needed for different ranges of variables?
Linear model explains the entire range of observations
(Plot annotations: linear relationship)
Variables: Women, Prestige, Type

Outliers affecting the model
Possible outliers
Model may not account for a variable which explains these data points

Model disregarding outlier
The total sum of squared residuals is further reduced by removing outliers

Final Model
This means that regardless of your job type, if you switched from one job to another with the same level of prestige (e.g. 62) but a lower percentage of women (e.g. 57% to 10%), you could increase your income substantially (~$3,500)

Conclusions
The level of prestige (more than education) associated with a particular occupation best describes the income it will earn
Occupations which employ a higher percentage of women will offer a lower income
Job type (i.e. b.c., w.c., or prof) can be used to explain income differences between
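
The Final Model slide's example (same prestige, share of women dropping from 57% to 10%) can be roughly retraced from the log-income table. The sketch below is only an illustration under stated assumptions: it uses the coefficients from the log-transformation table, which still includes education, because the final model's own coefficients are not shown in the slides. The function names and the 10 years of education are hypothetical, so the dollar figures it prints will not exactly match the ~$3,500 quoted on the slide.

```python
import math

# Coefficients copied from the log-income regression table in the slides.
# Assumption: this is the model that still includes education,
# not the final model (whose coefficients are not given).
COEF_WOMEN = -0.0085      # per percentage point of women in the occupation
COEF_PRESTIGE = 0.0208    # per prestige point
COEF_EDUCATION = 0.0076   # per year of education
TYPE_INTERCEPT = {"bc": 7.8720, "prof": 7.8584, "wc": 7.9428}


def predicted_income(job_type, women_pct, prestige, education_years):
    """Back-transform the fitted log income to dollars."""
    log_income = (
        TYPE_INTERCEPT[job_type]
        + COEF_WOMEN * women_pct
        + COEF_PRESTIGE * prestige
        + COEF_EDUCATION * education_years
    )
    return math.exp(log_income)


def income_ratio(women_from, women_to):
    """Multiplicative change in income when only the share of women changes."""
    return math.exp(COEF_WOMEN * (women_to - women_from))


if __name__ == "__main__":
    # Hypothetical occupation: blue collar, prestige 62, 10 years of education.
    before = predicted_income("bc", 57, 62, 10)
    after = predicted_income("bc", 10, 62, 10)
    print(f"ratio: {income_ratio(57, 10):.3f}")
    print(f"before: {before:.0f}, after: {after:.0f}, gain: {after - before:.0f}")
```

Because the model is fitted on log income, the effect of the share of women is multiplicative: with these coefficients, moving from 57% to 10% women scales predicted income by about 1.49, whatever the job type or prestige level. The dollar gain therefore depends on the starting income.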