17.871, Political Science Lab
Spring 2009

Problem set # 1: Using STATA

Handed out: Feb. 9, 2009
Due: Feb. 23, 2009, at the beginning of class.

For Parts I-III and V, turn in a “log” file produced from running the do file.

Part I: Golf putting data (one point each)

Variables
  dist      distance to hole in feet
  tries     number of putting attempts
  success   number of successful putts (one hit only)

1. Open the data set putting.dta from the 17.871 course locker (Examples folder) or off the class website. Paste the code for opening the file into your do file as your answer to question 1.

2. Examine the data:
   a. With small data sets, you can easily see the data with the list command.
   b. What are the mean, min, and max of each variable? (Hint: summarize) Use the underlined characters as a command shortcut.
   c. Use the tabulate command to examine the distribution of each variable one at a time.
   d. Use the tabulate command to examine the distribution of each variable one at a time with one line of code. (Hint: tab1)

3. Create a new variable called success_rate that is equal to the proportion of successes. (Hint: generate)

4. Label your new variable "Putt success rate (proportion)." (Hint: label variable)

5. Create a scatter plot of success rate (y-axis) by distance (x-axis). (Hint: scatter)

6. Which is the dependent variable and which is the independent variable? Why?

Part II: Speed-dating data (one point each)

These data come from speed-dating studies conducted in New York City by Ray Fisman and Sheena Iyengar, an economist and a psychologist at Columbia University. If you're interested, they summarize their findings in this paper. You'll need to familiarize yourself with the codebook (see the class website). Here's the abstract for the paper:

   We study dating behavior using data from a Speed Dating experiment where we generate random matching of subjects and create random variation in the number of potential partners. Our design allows us to directly observe individual decisions rather than just final matches. Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.

1. Open the speed_dating.dta file from the 17.871 course locker (Examples folder) or off the class website. Paste the code for opening the file into your do file as your answer to question 1.

2. How many unique subjects participated in the experiments? (Hint: tabulate)

3. The questions in this section are about variables that are constant across the multiple speed-dating waves, such as a self-reported question about how often students go on dates. To analyze the responses, we need to eliminate the multiple occurrences of participants so that each individual occurs only once in the data set (that is, one row per person). To do so, use the collapse command. With this command, we can take the average of participants’ responses.
   a. To see which variables we are going to analyze in this section, first run the following command:
         sum wave gender date dec *1_1
      (Note how the * acts as a wildcard.)
   b. Eliminate multiple occurrences by running the following command:
         collapse wave date gender dec *1_1, by(iid)
   c. Recode the variable date (see page 4 of the codebook) so that the values roughly correspond with number of dates per year (e.g., once a week = 52) and call this variable dates. Do this with generate and replace. Drop this first variable (drop dates).
   d. Do this recoding again with recode.
   e. What's the modal category on the dates variable? (Hint: tabulate)
   f. How many dates does the average participant go on each year? (Hint: summarize)
   g. How many men and how many women participated in the experiments?
   h. Who goes out on dates more often: men or women? (Hint: tabulate gender with sum(dates) as an option.)
   i. In speed dating, are men or women more selective? (Hint: similar to previous question)
   j. In waves 6-9, the experimenters use different scales for the attribute preference questions (e.g., attr1_1). To simplify, drop waves six through nine. (Hint: drop if wave == 6) You can also save yourself time with the following command:
         for num 6/9: drop if wave == X
   k. Do men and women report placing similar weights on traits in potential partners? What's the biggest difference? (Hint: by gender, sort: sum attr1_1) These questions appear on pages 5-6 of the codebook.

4. Which participant(s) (iid) sought the most matches? (Hint: first create a variable decisions that totals the number of decisions to pursue a match by each participant (dec) with the egen command: egen decisions = total(dec), by(iid).) Before answering this question, reopen the survey to restore waves 6-9 and to undo the collapse command.

5. What was the maximum number of "matches" participants received across the speed dating rounds? (Hint: similar to 4.)

6. What was the highest success rate observed among participants? (Hint: create a new variable match_rate with the generate command that equals matches divided by decisions.)

7. The speed-dating data contains a variable that codes the median SAT score for participants’ undergraduate institutions. The variable, however, is not coded in numeric form. What form is it in? Convert it to a numeric variable. (Use describe to determine the variable’s format. Use destring to convert the variable. You will have to use the ignore and replace options.)

8. SAT terciles I: Create a variable that equals 1 for the bottom third of participants’ undergraduate institutions based on the median SAT variable, 2 for the middle third, and 3 for the top third. First do so with recode using the generate new variable option.

9. SAT terciles II: Now that you've practiced recoding, show how you can save yourself considerable time in the future by creating this variable again using xtile.

10. Does a higher SAT tercile predict a higher match_rate? (Hint: use one of the commands above.)

11. More practice with the collapse command.
   a. Create a new data set that contains the average ratings for each self-reported attribute (e.g., attr3_1) and the average ratings by partners for each participant (e.g., attr_o). (Hint: use the collapse command with the by option as in
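
For reference, the collapse pattern with the by option that the hints rely on can be sketched on a toy data set. This is a minimal illustration, not an answer to any question above: the data and the variable names id and rating are hypothetical and do not come from speed_dating.dta. Because collapse replaces the data in memory, the sketch wraps it in preserve and restore so that the original rows come back afterward.

```stata
* Toy illustration of collapse with by(); the data and the variable
* names (id, rating) are hypothetical, not from speed_dating.dta.
clear
input id rating
1 6
1 8
2 5
2 7
2 9
end

* Keep a copy of the row-level data so it can be brought back later.
preserve

* Replace the data in memory with one row per id holding the mean of
* rating (mean is the default statistic for collapse).
collapse (mean) rating, by(id)
list

* Return to the original five rows.
restore
list
```

Reopening the data file, as question 4 instructs, undoes a collapse just as well; preserve and restore are simply a way to do the same thing without rereading the file.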

