
Lecture 3: Scanning

Last time
• We played with the idea of a data table a fair bit
  -- We examined how we can reshape, aggregate and reformat data to provide us with alternate views of some phenomenon
• Along the way, we spent some time talking about privacy and about how the computer represents time
  -- We’ll also learn some basic graphical tools for visualizing simple data types

Today
• We start with another “What could this amount to?” example, this time using search engine logs as an input
  -- We again play privacy against the utility of aggregate data
• We then finish off the registrar’s data, examining a “building view” that starts us thinking about spatial data and anticipates where we’re headed later in the quarter
  -- We’ll also see some other data the registrar sold us and think about how they could be usefully “joined” to address new questions
• Finally, we take up some simple graphical and numerical summaries (a la your Chapters 4 and 5) for a large survey data set from the CDC
  -- These data will be the subject of your next homework assignment (they also provide a modern “monitoring” or “scanning” example that builds on the Bills of Mortality)

What does it add up to?
• Last time we saw that your geotagged Flickr images could be used to construct the outlines of places (cities, states, countries, continents) when looked at in the aggregate
• We also mentioned a privacy issue related to search engines: by considering all the searches you have performed over a long period of time, it might be possible to uniquely identify you
• Today, we’ll see what unexpected uses people can make of your searching behavior
  -- This is a medical or at least an epidemiological example!

Flu season
• Through the U.S. Influenza Sentinel Physicians Surveillance Network, the Centers for Disease Control and Prevention (CDC) monitors the percentage of a physician’s patients who exhibit Influenza-like Illness (ILI: a fever and a cough and/or a sore throat, in the absence of a known cause other than the flu)
• Physicians in the network send information to the CDC, which, in turn, aggregates the data across states and 10 or so higher-level regions (Pacific, Mountain, North East, etc.)
  -- Unfortunately this network can be slow to report and slow to aggregate
• In terms of a surveillance system, Google (and Yahoo!) noticed that people take to the web to self-diagnose when they’re not feeling well, and that this activity happens in advance of a trip to the doctor
  -- This means that examining patterns in search queries might alert authorities to a coming epidemic before it registers with the network of doctors

Model selection
• In all, Google reported testing out 50 million different search queries, evaluating each with the simple regression equation (we’ll get to this a little later in the quarter)
  -- Imagine a plot with the proportion of times a given term was queried on the x-axis and the CDC’s ILI on the y-axis
• In some sense, Google went through 50 million such plots (well, 450 million since they made one for each of 9 regions) and found those that had the “best” looking relationships
  -- That is, those terms for which the search proportion predicted the ILI well

Model validation
• In the end, some 53 terms were chosen and a final model formed by averaging all the separate query ratios
  -- In an accompanying technical report the investigators write:
• “We noted that the 53 highest scoring search queries appeared to be related to influenza-like illnesses. They describe symptoms, treatments, medications and other diseases that an average person might associate with influenza. The next highest scoring query ‘high school basketball,’ was the highest scoring off-topic query on the list: basketball season tends to coincide with influenza season in the United States.”
• The model was then verified through the 2007-2008 flu season, with reportedly good success
  -- That is, the query model’s estimates tracked the ILI reports but were available much faster than the data from the physicians’ network

Flu season
• Of course Google makes their estimates available (why would I be telling you this story otherwise?)
  -- The current data are in CSV (comma separated values) format and R has a “convenience” function for reading these data in
• On the next page, we show the single line of code that reaches out to the web to pull these data
  -- Notice that we are not using source() as we had in lab, but instead we are reading a CSV file from a URL
• (Oh, and the arguments to our read function tell R to skip the first 11 lines of the file, since there’s a lot of boilerplate up there; read.csv assumes by default that the file includes column headings)

> flu <- read.csv(url("http://www.google.org/flutrends/us/data.txt"),skip=11)
> dim(flu)
[1] 393 180
> names(flu)
  [1] "Date"
  [2] "United.States"
  [3] "Alabama"
  [4] "Alaska"
  [5] "Arizona"
  [6] "Arkansas"
  [7] "California"
  [8] "Colorado"
... (clipped output)
 [71] "Tucson..AZ"
 [72] "Alameda..CA"
 [73] "Berkeley..CA"
 [74] "Beverly.Hills..CA"
 [75] "Fresno..CA"
 [76] "Irvine..CA"
 [77] "Los.Angeles..CA"
... (clipped output)
[135] "Albany..NY"
[136] "Buffalo..NY"
[137] "New.York..NY"
[138] "Rochester..NY"
[139] "Syracuse..NY"
... (clipped output)
> plot(as.Date(flu$Date),flu$Los.Angeles,type="l")
> lines(as.Date(flu$Date),flu$New.York,col="magenta")

[Figure: line plot of flu$Los.Angeles against date, 2004 to 2010, y-axis from 0 to 14000, with flu$New.York overlaid in magenta]

One final note
• The CDC also makes its ILI data available
  -- On the next page we present their data for the 2007-2008 flu season (it was hard to find newer data on their web site although I’m reasonably sure it has been published)
• The data are in the form of an HTML table and require other R-related tools to read them in
  -- Even when we
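
The model-selection idea from earlier in the lecture (score every candidate query by how well its search proportion predicts the ILI, then keep the best) can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data are simulated, the three query names are made up, and using R-squared as the score is my choice here, since the slides only say the "best" looking relationships were kept.

```r
# Hypothetical simulated data: none of these numbers come from Google or the CDC.
set.seed(1)
n_weeks <- 100

# A made-up weekly ILI percentage with two seasonal peaks plus a little noise.
ili <- 2 + sin(seq(0, 4 * pi, length.out = n_weeks)) + rnorm(n_weeks, sd = 0.1)

# Three invented candidate queries, each recorded as a weekly search proportion.
# Two are built to track the ILI (one tightly, one loosely); one is pure noise.
queries <- data.frame(
  flu_symptoms = 0.010 * ili + rnorm(n_weeks, sd = 0.001),
  cough_remedy = 0.005 * ili + rnorm(n_weeks, sd = 0.004),
  cat_videos   = rnorm(n_weeks, mean = 0.02, sd = 0.005)
)

# Score one query: fit the simple regression ILI ~ query proportion
# and report its R-squared (assumed scoring rule, see the note above).
score_query <- function(x, y) summary(lm(y ~ x))$r.squared

# Apply the score to every candidate column.
scores <- sapply(queries, score_query, y = ili)

# Rank the queries from best fit to worst fit.
ranked <- sort(scores, decreasing = TRUE)
print(ranked)
```

With 50 million real queries the same loop would be far more expensive, but the logic is the same: one small regression per candidate, followed by a sort.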



UCLA STATS 13 - Lecture 3
