17.871, Political Science Lab
Problem set #1: Using STATA
Spring 2012

Handed out: Feb. 13
Due: Feb. 27, at the beginning of class. (Please print before coming to class. Unfortunately, the printers in our classroom are ridiculously slow and loud.)

Write a do-file that responds to all the parts of this problem set. Turn in the do-file and the log-file that shows that the do-file works. Clearly label each do-file. Make a habit of writing comments in the do-file, to help us and you keep track of things. (You can make non-executed comments in a do-file using the forward slash and asterisk as follows:

/* THIS IS WHAT A COMMENT LOOKS LIKE IN A DO-FILE */

)

Part I: Golf putting data (6 points for the part, one point for each step)

Variables
dist      distance to hole in feet
tries     number of putting attempts
success   number of successful putts (one hit only)

1. Open the data set putting.dta from the 17.871 course locker (Examples folder) or off the class website. Paste the code for opening the file into your do-file as your answer to question 1. (Don't forget to use the clear command first.)
2. Examine the data:
   a. With small data sets, you can easily see the data with the list command. Try it.
   b. What are the mean, min, and max of each variable? (Hint: summarize)
   c. Use the tabulate command to examine the distribution of each variable one at a time.
   d. Use the tabulate command to examine the distribution of each variable one at a time with one line of code. (Hint: tab1)
3. Create a new variable called success_rate that is equal to the proportion of successes. (Hint: generate)
4. Label your new variable "Put success rate (proportion)." (Hint: label variable)
5. Create a scatter plot of success rate (y-axis) by distance (x-axis). (Hint: scatter)
6. Which is the dependent variable and which is the independent variable in step 5? Why?

(Don't forget to save your do-file.)

Part II: Getting data into STATA (five points for the part)

Data comes in many forms. Here's one way to get data into Stata.
Using a text editor (such as EMACS), type the text from Exhibit 1 in the handout “How to Use the STATA infile and infix Commands” into Athena and save it in a file named scores.dat in your home directory. Write the code that will create a STATA data set from this raw data and save it as a file called “scores.dta”. Use the list command to see your data.

Part III: Speed-dating data (16 points for the part, one point for each step, except for point 12)

The speed-dating data come from studies conducted in New York City by Ray Fisman and Sheena Iyengar, an economist and a psychologist at Columbia University. If you're interested, they summarize their findings in this paper. You'll need to familiarize yourself with the codebook (see the class website). Here's the abstract for the paper:

We study dating behavior using data from a Speed Dating experiment where we generate random matching of subjects and create random variation in the number of potential partners. Our design allows us to directly observe individual decisions rather than just final matches. Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.

1. Open the speed_dating.dta file from the 17.871 course locker (Examples folder) or off the class website. Paste the code for opening the file into your do-file as your answer to question 1. (Don't forget to use the clear command first.)
2. Get a sense for the data: How many unique subjects participated in the experiments? (Hint: tabulate)
3. The questions in this section are about variables that are constant across the multiple speed-dating waves, such as a self-reported question about how often students go on dates.
   To analyze the responses, we need to eliminate the multiple occurrences of participants so that each individual occurs only once in the data set (that is, one row per person). To do so, use the collapse command. With this command, we can take the average of participants’ responses.
   a. To see which variables we are going to analyze in this section, first run the following command: sum wave gender date dec *1_1 (Note how the * acts as a wildcard.)
   b. Eliminate multiple occurrences by running the following command: collapse wave date gender dec *1_1, by(iid).
   c. Recode the variable date so that the values roughly correspond with number of dates per year (e.g., once a week = 52) and call this variable dates. Do this with generate and replace. Then drop this first version of the variable (drop dates).
   d. Do this recoding again with recode. (Hint: the recode command is structured very differently from generate or replace. Look back at the book or at the Stata help file for examples of how to use it.)
   e. What's the modal category on the dates variable? (Hint: tabulate)
   f. On average, how many dates do participants go on each year?
   g. How many men and how many women participated in the experiments?
   h. Who goes out on dates more often: men or women? (Hint: tabulate gender with sum(dates) as an option.)
   i. In speed dating, are men or women more selective? (Hint: similar to previous question)
   j. In waves 6-9, the experimenters use different scales for the attribute preference questions (e.g., attr1_1). To simplify, drop waves six through nine. (Hint: drop if wave == 6) You can also save yourself time with the following command: for num 6/9: drop if wave == X.
   k. Do men and women report placing similar weights on traits in potential partners? What's the biggest difference? (Hint: by gender, sort: sum attr1_1)
4. Using a non-collapsed data set with waves 6-9 restored, determine which participant(s) (iid) sought the most matches.
   (Hint: first clear and reload the original data set, then create a variable decisions that totals the number of decisions to pursue a match by each participant (dec) with the egen command (egen decisions = total(dec), by(iid)).)
5. What was the maximum number of "matches" participants received across the speed dating rounds? (Hint: similar to 4.)
6. What was the highest success rate observed among participants? (Hint: create a new variable match_rate with the generate command that equals matches divided by decisions.)
7. The speed-dating data contains a variable that codes the median SAT
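
The per-participant totals described in the hints for steps 4 to 6 can be tried out on a small made-up data set before touching the real file. The sketch below is only an illustration under stated assumptions: the six rows are invented, and the variable name match (1 if both sides said yes) is a guess that should be checked against the codebook. Only iid, dec, decisions, and match_rate come from the problem set itself.

```stata
* Toy illustration only: six invented rows, not the speed-dating data.
* iid   = participant id
* dec   = 1 if the participant decided to pursue a match
* match = 1 if both sides said yes (this variable name is an assumption)
clear
input iid dec match
1 1 1
1 1 0
1 1 0
2 1 1
2 1 1
2 0 0
end

* Totals within each participant. Every row belonging to a participant
* carries that participant's total.
egen decisions = total(dec), by(iid)
egen matches   = total(match), by(iid)

* Share of a participant's "yes" decisions that ended in a match.
* Participants with zero decisions get a missing value.
generate match_rate = matches / decisions

* Flag one row per participant so each person is listed only once.
egen first = tag(iid)

* Find the largest number of decisions, then list whoever reached it.
summarize decisions
local top = r(max)
list iid decisions if first & decisions == `top'

* Inspect the per-participant rates.
list iid matches decisions match_rate if first
```

Because egen with by(iid) repeats the total on every row of a participant, the tag() step keeps the listing to one row per person without collapsing the data.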

