Lecture 2: New data from old (recoding, reshaping, transforming)

Last time
•We tried to make explicit the role that computation plays in modern data analysis as part of a larger sales pitch for our approach to this class and the use of R as our main platform for understanding data
•We then went on a (wobbly) detour back to the late 1600s and early 1700s to talk about the first test of hypothesis as well as other data use patterns that hold true today
•Once collected and distributed, data find secondary uses that the “creators” can’t always anticipate
•Our view of data (numerical, graphical, auditory, ... ) is mediated by the available “technology” (actual tools as well as the accepted or dominant representations of the time)
•We ended by introducing a data set that we’ll examine in more detail today...

Today
•We are going to play with the idea of a data table a fair bit -- We will examine how we can reshape, aggregate and reformat data to provide us with alternate views of some phenomenon
•Along the way, we will spend some time talking about privacy and about how the computer represents time -- We’ll also learn some basic graphical tools for visualizing simple data types

What’s it add up to?
•Last time I asked you to start to compile information about where you give off data -- At the end of the week I was hoping you’d think a little about what all that data might amount to if someone were to have access to it
•Under the heading of surprising results in this direction, Aaron Cope (formerly at Flickr) had a great idea to use the millions of geotagged images that have been uploaded
•Often these images have an explicit geotag (think lat/lon) and users will add text tags that describe where they are (the United States, Texas, London) -- Aaron’s idea was to use all the geotag points together with the explicit place names to build a map of those places...

One dangling thread
•Last time we jumped from the christening records aggregated beginning in the sixteenth century to the proliferation of data that is recorded now when a child is born
•Latanya Sweeney (a computer scientist from Carnegie Mellon University) presents a simple example of the “information explosion” that took place even in the span from 1925 to 1999

A dangling thread
•We then looked at another source of data that registers births in the United States -- The Social Security Administration also collects information and has made it available in a relatively simple form
•They have released a portion of their data for the years 1880 to 2009 -- In each case, they report the names given to babies born in that year together with a count broken down by gender
•For privacy reasons, they only report name-gender pairs associated with five or more infants in a given year -- The data are exported in separate files, each a CSV file (Comma Separated Values)

A dangling thread
•I’ve put the data up on our course web site and you can have a look with your browser or you can read them directly into R
•The data in each file form a table where the rows correspond to name-gender pairs together with a count for the year (the year is encoded in the name of the file)
•Each row consists of a series of fields (or attributes) separated by, well, commas

A dangling thread
•You can read these data into R and make simple yearly comparisons -- For example, you might examine the most popular and least popular names each year
•We might also consider the popularity of the most frequent name each year -- That is, are we seeing more unique names and fewer “heavy hitters” that are given to relatively large numbers of babies each year?

Isabella,F,22067
Emma,F,17716
Olivia,F,17246
Sophia,F,16743
Ava,F,15730
Emily,F,15204
Madison,F,15097
Abigail,F,14232
Chloe,F,11785
Mia,F,11319
Elizabeth,F,10879
Addison,F,10567
Alexis,F,9839
Ella,F,9560
Samantha,F,9551
Natalie,F,9324
Grace,F,8194
Lily,F,8016
Alyssa,F,7900
Ashley,F,7741
Sarah,F,7652
Taylor,F,7517
Hannah,F,7482
Brianna,F,7281
Hailey,F,7262

Jacob,M,20858
Ethan,M,19664
Michael,M,18677
Alexander,M,18025
William,M,17696
Joshua,M,17418
Daniel,M,17336
Jayden,M,17082
Noah,M,17061
Anthony,M,16139
Christopher,M,16136
Aiden,M,15846
Matthew,M,15777
David,M,15236
Andrew,M,14675
Joseph,M,14674
Logan,M,14331
James,M,14022
Ryan,M,12986
Benjamin,M,12944
Elijah,M,12652
Gabriel,M,12648
Christian,M,12498
Nathan,M,11990
Jackson,M,11988

Ziham,F,5
Zikia,F,5
Zimaya,F,5
Zimora,F,5
Zinaya,F,5
Zirah,F,5
Zoii,F,5
Zona,F,5
Zoriyah,F,5
Zowey,F,5
Zuheily,F,5
Zujeily,F,5
Zula,F,5
Zuleimy,F,5
Zuley,F,5
Zuliana,F,5
Zulmy,F,5
Zuriyah,F,5
Zyani,F,5
Zyien,F,5
Zykierra,F,5
Zynaria,F,5
Zynique,F,5
Zyrie,F,5
Zyriel,F,5

Zekhi,M,5
Zepplin,M,5
Zequan,M,5
Zereon,M,5
Zevion,M,5
Zhen,M,5
Zhyair,M,5
Zien,M,5
Zier,M,5
Zildjian,M,5
Zim,M,5
Zimir,M,5
Ziyun,M,5
Zlatan,M,5
Zoen,M,5
Zubayr,M,5
Zuhaib,M,5
Zykee,M,5
Zykell,M,5
Zylar,M,5
Zyquarius,M,5
Zyran,M,5
Zyreion,M,5
Zyrian,M,5
Zyvion,M,5

2009’s most frequent ... and the least

A dangling thread
•Reading the data into R and manipulating things a little, we can look at not just the count but the relative frequency of name-gender pairs in each year
•In 2009 the SSA says there were 2,095,910 boys born (well, Social Security cardholders born) and 2,001,968 girls -- The top name for a boy that year was Jacob with a frequency (count) of 20,858, or a relative frequency of 20858/2095910, about 1%, while the top name for a girl was Isabella with a relative frequency of 1.1%
•Compare the relative frequencies in 1909 to those in 2009 in the output below -- What do you notice? What other questions might you ask?

> head(boys1909)
        name gender freq    relfreq
2548    John      M 9591 0.05849169
2549 William      M 7914 0.04826434
2550   James      M 7593 0.04630669
2551  George      M 4687 0.02858415
2552  Robert      M 4565 0.02784012
2553  Joseph      M 4348 0.02651672

> head(girls1909)
       name gender freq    relfreq
1     Helen      F 9248 0.02820252
2  Margaret      F 7358 0.02243881
3      Ruth      F 6508 0.01984667
4   Dorothy      F 6250 0.01905988
5      Anna      F 5803 0.01769671
6 Elizabeth      F 5175 0.01578158

> head(boys2009)
       name gender  freq     relfreq
2     Jacob      M 20858 0.009951763
3     Ethan      M 19664 0.009382082
4   Michael      M 18677 0.008911165
5 Alexander      M 18025 0.008600083
7   William      M 17696 0.008443111
8    Joshua      M 17418 0.008310471

> head(girls2009)
       name gender  freq     relfreq
1  Isabella      F 22067 0.011022654
6      Emma      F 17716 0.008849292
10   Olivia      F 17246 0.008614523
13   Sophia      F 16743 0.008363271
18 Ava F
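
The relfreq column in the output above could be computed along the following lines. This is only a sketch, not the code used in the lecture: it reads four rows copied from the 2009 listing rather than the full file, the object names names2009, totals2009 and csv_text are made up for illustration, and the per-gender totals are the ones quoted in the slide rather than sums over the file.

```r
# Four rows copied from the 2009 listing; the real file has many more rows.
csv_text <- "Isabella,F,22067
Emma,F,17716
Jacob,M,20858
Ethan,M,19664"

# The rows shown carry no header line, so column names are supplied here.
# colClasses keeps the gender column as text; a column holding only "F"
# would otherwise be read as the logical value FALSE.
names2009 <- read.csv(text = csv_text, header = FALSE,
                      col.names = c("name", "gender", "freq"),
                      colClasses = c("character", "character", "integer"))

# Totals quoted in the slide for 2009 (assumed here, not recomputed,
# because the four sample rows do not sum to them).
totals2009 <- c(M = 2095910, F = 2001968)

# Relative frequency = count divided by the total for that gender.
names2009$relfreq <- names2009$freq / unname(totals2009[names2009$gender])

# Split into the two tables shown in the output.
boys2009  <- names2009[names2009$gender == "M", ]
girls2009 <- names2009[names2009$gender == "F", ]

# Most frequent boy's name among the rows read.
top_boy <- boys2009$name[which.max(boys2009$freq)]

print(head(boys2009))
print(head(girls2009))
```

With a complete yearly file in hand, the totals could instead be derived from the data itself, for example with ave(names2009$freq, names2009$gender, FUN = sum), which returns the per-gender sum aligned with each row.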



UCLA STATS 13 - lecture2
