DOC PREVIEW
MIT 17 871 - STATA merge and reshape commands

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

17.871Spring 2007How to Use the STATA merge and reshape commandsMost of the projects done in 17.871, and in fact most interesting research, require combiningdata sets. This handout reviews using the most valuable command for managing multiple datasets, the merge command. In addition, we are often interested in combining multipleobservations from some unit of analysis (like countries or states or people) to create a panel dataset. The reshape command helps to move between different organizations of the data.The merge commandLet us say we are researching the effect of using different voting equipment on the tendency ofvoters to cast “blank” or “spoiled” ballots (that is, to record no vote, or to record multiple votes). One data set might record the number of votes cast and the number of blank ballots in each town. Here is an example:Exhibit 1. ballots.dtatown blank92 ballot92 blank96 ballot96Barnstable 1163 22747 232 22467Bourne 125 7885 61 8032Brewster 42 5480 62 5498Chatham 35 4558 103 4475Dennis 294 8732 86 8493Eastham 20 3013 26 3151Falmouth 381 16377 169 16224Harwich 184 6741 89 6737Mashpee 22 4619 46 4962Orleans 109 4251 113 4229Provincetown 15 2488 6 2268Sandwich 35 9151 62 9757Truro 3 1161 7 1156Wellfleet 14 1815 24 1768Yarmouth 441 12862 104 12710The variable town identifies a town (duh!), blank92 and blank96 record the number of blankballots in these town for 1992 and 1996, and ballot92 and ballot96 record the total number ofballots cast in the town. Let us suppose we have saved this data set in the file ballots.dtaAnother data set might record the ballot method used in the town, like this:2Exhibit 2. machines.dtatown method92 method96Barnstable Paper AccuVoteBourne AccuVote AccuVoteBrewster Optech OptechChatham Paper OptechDennis Optech OptechEastham Paper OptechFalmouth Paper AccuVoteHarwich Optech OptechMashpee Paper OptechOrleans Optech OptechProvincetown Paper PaperSandwich Optech OptechTruro Paper PaperWellfleet Paper PaperYarmouth Optech OptechAs before, town is the town. The variables method92 and method96 record the type of votingdevice the town used in 1992 and 1996, respectively. The data are saved in machines.dta.Notice the following important point: Both ballots.dta and machines.dta have a commonvariable, town, which uniquely identifies the cases. (In this example, the town name identifiesthe cases. For larger, more complex cases, it is often advisable to use a numerical variable as thejoint identifier.)To merge two data sets, follow these steps:(1) Sort both data sets on the common identifying variable and save them to disksorted.(2) Use one of the data sets.(3) Issue the merge command, using the following syntax:merge commonvariable using remotefilenameThis set of commands will augment the data set you had used (in step 2), by adding the variablesfrom the remote file. In addition, a new variable will be created, called _merge. The variable_merge will be equal to 3 if the case was in both data sets, 1 if the case was in the “master” dataset (i.e., the one used in step 2) but not in the “using” data set, and 2 if the case was in the“using” data set but not originally in the “master” data set.So, the following commands use ballotssort townsave ballots,replaceuse machinessort townsave machines,replace3use ballotsmerge town using machineswould produce the following data set:Exhibit 3. Merged data settown blank92 ballot92 blank96 ballot96 method92 method96 _mergeBarnstable 1163 22747 232 22467 Paper AccuVote 3Bourne 125 7885 61 8032 AccuVote AccuVote 3Brewster 42 5480 62 5498 Optech Optech 3Chatham 35 4558 103 4475 Paper Optech 3Dennis 294 8732 86 8493 Optech Optech 3Eastham 20 3013 26 3151 Paper Optech 3Falmouth 381 16377 169 16224 Paper AccuVote 3Harwich 184 6741 89 6737 Optech Optech 3Mashpee 22 4619 46 4962 Paper Optech 3Orleans 109 4251 113 4229 Optech Optech 3Provincetown 15 2488 6 2268 Paper Paper 3Sandwich 35 9151 62 9757 Optech Optech 3Truro 3 1161 7 1156 Paper Paper 3Wellfleet 14 1815 24 1768 Paper Paper 3Yarmouth 441 12862 104 12710 Optech Optech 3The _merge variable confirms that we brought in a balanced set of cases from both data sets. (Inother words, it confirms that there aren’t towns that we have election returns for that we don’thave voting equipment data for, and vice versa.) However, it’s a nuisance variable once themerge has been successfully completed, so it’s a good idea to drop it once you’ve satisfiedyourself that everything is OK.The reshape commandThe data set shown in Exhibit 3 could be used to analyze the rate of ballot blanking in 1992 and1996, separately, as a function of voting technology used. And, it could be used to study thechange in ballot blanking from 1992 to 1996, as a function of the change in voting technology. However, we might also want to treat the data set as consisting of 30 town-year observations,rather than 15 separate town observations. To do this, I would want to “stack up” the differenttown observations, having the 1992 variables for blanks, ballots, and method appearing first (andidentified according to year), followed by the 1996 variables for blanks, ballots and method (alsoidentified according to year). We can do this using the reshape command.Notice first the naming convention I adopted in the above data sets: each substantive variablename consists of a stem (blank or ballot) followed by the year in question (92 or 96).Then, the following commandreshape long blank ballot method, i(town) j(year)4will “reshape” the Exhibit 3 data set as follows:Exhibit 4. Reshaped, merged data settown blank ballot method yearBarnstable 1163 22747 Paper 92Bourne 125 7885 AccuVote 92Brewster 42 5480 Optech 92Chatham 35 4558 Paper 92Dennis 294 8732 Optech 92Eastham 20 3013 Paper 92Falmouth 381 16377 Paper 92Harwich 184 6741 Optech 92Mashpee 22 4619 Paper 92Orleans 109 4251 Optech 92Provincetown 15 2488 Paper 92Sandwich 35 9151 Optech 92Truro 3 1161 Paper 92Wellfleet 14 1815 Paper 92Yarmouth 441 12862 Optech 92Barnstable 232 22467 AccuVote 96Bourne 61 8032 AccuVote 96Brewster 62 5498 Optech 96Chatham 103 4475 Optech 96Dennis 86 8493 Optech 96Eastham 26 3151 Optech 96Falmouth 169 16224 AccuVote 96Harwich 89 6737 Optech 96Mashpee 46 4962 Optech 96Orleans 113 4229 Optech 96Provincetown 6 2268 Paper 96Sandwich 62 9757 Optech 96Truro 7 1156 Paper 96Wellfleet 24 1768 Paper 96Yarmouth 104 12710 Optech 96We now have a new data set with twice as many observations as before:


View Full Document

MIT 17 871 - STATA merge and reshape commands

Documents in this Course
Load more
Download STATA merge and reshape commands
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view STATA merge and reshape commands and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view STATA merge and reshape commands 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?