UCLA STATS 10 - final_practice1a


Practice for Final - Stats 10/1
Summer 2008
Ryan Rosario - Section 1A

1. According to a 2007 data mining project performed by your TA, 4900 UCLA Facebook profiles and friend lists were downloaded to a hard disk using a Python crawler. The crawler starts by choosing a random Facebook UCLA profile, downloads it to disk, and then accesses the list of that user’s UCLA Facebook friends. The crawler then visits the profiles of all of those friends, moves on to their friends, and so on.

One aspect of the project was to determine whether or not third-party (non-Facebook-developed) Wall applications would bias an analysis of standard Wall posts. The numbers of profiles containing an active third-party Wall box are listed below.

    Application Name         Profiles
    SuperWall                     559
    FunWall                       206
    SuperWall and FunWall          84

(a) Draw an appropriate Venn diagram to represent this context. Label the diagram with probabilities. Let F represent FunWall and let S represent SuperWall.

(b) Find the probability that a randomly selected UCLA Facebook profile in this sample does not use either of these third-party Wall applications.

(c) Suppose my ultimate goal was to determine the proportion of all Facebook users who do not have either of these Wall applications active, and suppose I use a confidence interval. Provide two reasons why using a confidence interval for this analysis is not valid. You do not need to know anything about Facebook to answer this question. Hint: Reread this entire problem very carefully, and read the bolded statements.

(d) The following table breaks down, by gender, the number of UCLA Facebook users in the sample who displayed their Wall and the number who have their Walls hidden (hidden to the crawler).

    Sex            Wall Displayed   Wall Hidden   Row Total
    Male                     2164           188        2352
    Female                   1835           713        2548
    Column Total             3999           901        4900

    i. Among those with hidden Walls, what is the probability that the user is female?
    ii. Are gender and Wall privacy independent? Why or why not? Use a probabilistic argument!

(e) Suppose the crawler flips a coin to determine whether or not to download the current profile to disk. If the coin shows heads, the profile is downloaded to disk; otherwise the crawler skips over it without saving any of its data. If it downloads the profile to disk, it will visit that profile’s friends with probability 0.75. If it does not download the profile to disk, it will visit that profile’s friends with probability 0.1.

    i. Draw the tree or contingency table that corresponds to the situation described above. Annotate all events and probabilities for each branch.
    ii. Find the probability that the crawler will visit the profiles of the current user’s friends.
    iii. Suppose we know that the crawler will visit the current user’s friends. What is the probability that the current user’s profile was downloaded to disk?

Reach for the STARs. In 1998, Gray Davis approved the Standardized Testing and Reporting Program, which mandated the use of the Stanford Achievement Test, Ninth Edition, as the sole norm-referenced measure of educational outcomes in the state. In 2003, the California Department of Education approved replacing Stanford 9 with another test, the California Achievement Tests, Sixth Edition (CAT/6), but required only 3rd and 7th graders to take it. Everybody else had to take a different battery of tests referred to as the Content Standards Test. In this problem, we will analyze a couple of facets of this decision using the material we have studied this quarter.

2. During the research phase leading to this decision, a sample of school districts was selected (and paid) to administer both Stanford 9 and CAT/6 to all of their students. The table below displays 10 pairs of scaled scores for the Language subtest of each battery. Scaled scores take into account the difficulty of items on the test as a way of normalizing among different forms. Scaled scores on Stanford 9 range from 200 to 999. Scaled scores on CAT/6 Language range from 0 to 999.

    Stanford 9   CAT/6 Survey
           200             50
           310            125
           450            500
           520            600
           600            670
           650            690
           732            780
           810            820
           900            950
           960            999

(a) Make a scatterplot of these test scores. Clearly denote what x and y represent.

(b) Compute the sample correlation coefficient r between Stanford 9 Language and CAT/6 Language. To maximize partial credit, make sure to show all of your work, including $s_x$, $s_y$, $\bar{x}$, $\bar{y}$, as well as the formula and the numbers you have plugged in!

(c) For brevity, the sample in part (a) only has size 10. The actual correlation coefficient is r = 0.89. Compute the regression model equation using the following statistics: $\bar{x} = 500$, $\bar{y} = 500$, $s_x = 200$, $s_y = 165$. To maximize partial credit, show all steps, including your computation of $\hat{b}_0$ and $\hat{b}_1$.

(d) Interpret the slope and intercept of your regression model. Comment on anything strange you may notice, what may have caused it, and how you may be able to resolve it.

(e) Using your regression model from part (c), predict the scaled score on CAT/6 Language for a student who received a 610 on Stanford 9 Language. Suppose the student’s true CAT/6 Language score is 570. Compute the residual. Make a comment about the model’s prediction.

(f) Compute $r^2$ and interpret it.

(g) Suppose the psychometrician who conducted this study forgot to include a pairing of scores. This particular student scored 700 on Stanford 9 Language and 360 on CAT/6 Language. Select the correct statement.

    r will:   increase   decrease   remain about the same

3. In addition to national percentiles, test publishers may choose to report other metrics such as stanines and grade equivalents. Stanines express test performance on a scale from 1 to 9, with scores of 1, 2, and 3 representing “below average,” 4, 5, and 6 representing “average,” and 7, 8, and 9 representing “above average.” Each stanine is 0.5 standard deviations wide, except for the first and ninth, which are larger (see the diagram below).

Grade equivalents, on the other hand, are decimals ranging from 0.0 to 12.9 in 0.1 increments that express scaled scores as an approximate grade level and time of year. The number before the decimal point (0 to 12) represents the grade level, and the digit after the decimal point (0 to 9) represents the month of the school year, assuming a 10-month school year. Grade equivalents allow educational administrators to gauge approximate grade-level improvement over time.

Suppose that on CAT/6 Reading the mean scaled score is 500 with standard deviation 150. For parts (a), (b), and (c), use the following graphic (carefully) to help answer the following questions.

[Figure: density curve; horizontal axis from −3 to 3, vertical axis from 0.0 to 0.3]
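Problem 1(a) and (b) come down to inclusion-exclusion on the sample counts. The Python sketch below is one way to check the arithmetic. It assumes the SuperWall and FunWall counts in the table already include the 84 profiles that run both applications; the table does not state this explicitly, and if the counts were exclusive the region probabilities would change.

```python
from fractions import Fraction

# Counts from the Problem 1 table (sample of n = 4900 UCLA profiles).
# ASSUMPTION: the SuperWall and FunWall counts include the 84 profiles
# that run both applications.
N = 4900
SUPERWALL = 559
FUNWALL = 206
BOTH = 84

p_s = Fraction(SUPERWALL, N)
p_f = Fraction(FUNWALL, N)
p_both = Fraction(BOTH, N)

# Venn-diagram regions for part (a)
s_only = p_s - p_both   # SuperWall but not FunWall
f_only = p_f - p_both   # FunWall but not SuperWall

# Inclusion-exclusion: P(S or F) = P(S) + P(F) - P(S and F)
p_either = p_s + p_f - p_both
p_neither = 1 - p_either   # part (b): complement of "S or F"

print("P(S only)  =", float(s_only))
print("P(F only)  =", float(f_only))
print("P(S and F) =", float(p_both))
print("P(neither) =", p_neither, "~", round(float(p_neither), 4))
```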
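For Problem 1(d), a conditional probability is a joint count divided by the count of the conditioning event, and independence can be checked by comparing P(Hidden | Female) with P(Hidden). A minimal sketch using exact fractions from the two-way table:

```python
from fractions import Fraction

# Two-way table from Problem 1(d): counts of (sex, Wall status)
counts = {
    ("Male", "Displayed"): 2164, ("Male", "Hidden"): 188,
    ("Female", "Displayed"): 1835, ("Female", "Hidden"): 713,
}
total = sum(counts.values())
hidden = counts[("Male", "Hidden")] + counts[("Female", "Hidden")]
female = counts[("Female", "Displayed")] + counts[("Female", "Hidden")]

# Part i: P(Female | Hidden) = P(Female and Hidden) / P(Hidden)
p_female_given_hidden = Fraction(counts[("Female", "Hidden")], hidden)

# Part ii: independence would require P(Hidden | Female) == P(Hidden)
p_hidden = Fraction(hidden, total)
p_hidden_given_female = Fraction(counts[("Female", "Hidden")], female)
independent = p_hidden_given_female == p_hidden

print(f"P(Female | Hidden) = {float(p_female_given_hidden):.4f}")
print(f"P(Hidden)          = {float(p_hidden):.4f}")
print(f"P(Hidden | Female) = {float(p_hidden_given_female):.4f}")
print("Exactly independent?", independent)
```

Because these are sample proportions, the real question is whether the two probabilities are meaningfully different, not whether they agree to every digit.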
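Problem 1(e) is a two-stage tree: multiply along each branch, add the branches that end in a visit (law of total probability) for part ii, then apply Bayes' rule for part iii. The sketch below assumes the coin is fair, P(heads) = 0.5, which the problem implies but does not state.

```python
# Problem 1(e) probabilities.
# ASSUMPTION: fair coin, so P(download) = 0.5.
p_download = 0.5
p_visit_given_download = 0.75
p_visit_given_skip = 0.10

# Tree branches that end in "visit friends"
p_download_and_visit = p_download * p_visit_given_download
p_skip_and_visit = (1 - p_download) * p_visit_given_skip

# Part ii: law of total probability
p_visit = p_download_and_visit + p_skip_and_visit

# Part iii: Bayes' rule, P(D | V) = P(D and V) / P(V)
p_download_given_visit = p_download_and_visit / p_visit

print(f"P(visit friends)        = {p_visit:.3f}")
print(f"P(downloaded | visited) = {p_download_given_visit:.4f}")
```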
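For Problem 2(b), the sketch below computes the two means, the sample standard deviations s_x and s_y (with n - 1 in the denominator), and r from the 10 listed pairs, using the common textbook form r = (1/(n - 1)) * sum(z_x * z_y). It is meant as a check on hand arithmetic, not a substitute for showing the work the problem asks for.

```python
import math

# (Stanford 9, CAT/6) Language scaled-score pairs from the Problem 2 table
stanford9 = [200, 310, 450, 520, 600, 650, 732, 810, 900, 960]
cat6 = [50, 125, 500, 600, 670, 690, 780, 820, 950, 999]

def sample_sd(values, mean):
    """Sample standard deviation with the n - 1 denominator."""
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

n = len(stanford9)
x_bar = sum(stanford9) / n
y_bar = sum(cat6) / n
s_x = sample_sd(stanford9, x_bar)
s_y = sample_sd(cat6, y_bar)

# r = (1 / (n - 1)) * sum of products of z-scores
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(stanford9, cat6)) / (n - 1)

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"r = {r:.4f}")
```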
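Problem 2(c), (e), and (f) rely on the standard least-squares formulas: slope b1 = r * s_y / s_x, intercept b0 = y_bar - b1 * x_bar, residual = observed - predicted, and the coefficient of determination r^2. A sketch using the summary statistics given in part (c), with x as the Stanford 9 score and y as the CAT/6 score:

```python
# Summary statistics given in Problem 2(c)
r = 0.89
x_bar, y_bar = 500, 500     # Stanford 9 (x) and CAT/6 (y) means
s_x, s_y = 200, 165         # standard deviations

# Least-squares slope and intercept
b1 = r * s_y / s_x          # b1_hat = r * s_y / s_x
b0 = y_bar - b1 * x_bar     # b0_hat = y_bar - b1_hat * x_bar

def predict(stanford9_score):
    """Predicted CAT/6 Language score from the fitted line."""
    return b0 + b1 * stanford9_score

# Part (e): prediction and residual (observed minus predicted)
y_hat = predict(610)
residual = 570 - y_hat

# Part (f): coefficient of determination
r_squared = r ** 2

print(f"y_hat = {b0:.3f} + {b1:.5f} x")
print(f"predicted CAT/6 for 610: {y_hat:.2f}, residual: {residual:.2f}")
print(f"r^2 = {r_squared:.4f}")
```

Something worth keeping in mind for part (d): the intercept is the predicted CAT/6 score at a Stanford 9 score of 0, which lies outside the stated Stanford 9 range of 200 to 999.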
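Problem 2(g) asks how a single poorly fitting pair changes r. The full study data (where r = 0.89) are not given, so this sketch uses the 10 listed pairs as a stand-in to illustrate the direction of the effect; it does not reproduce the study's numbers. The helper name `pearson_r` is introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r (the n - 1 factors cancel)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The 10 pairs from Problem 2 (Stanford 9, CAT/6)
stanford9 = [200, 310, 450, 520, 600, 650, 732, 810, 900, 960]
cat6 = [50, 125, 500, 600, 670, 690, 780, 820, 950, 999]

r_before = pearson_r(stanford9, cat6)
# Part (g): add the forgotten pair (700, 360), which sits far below the trend
r_after = pearson_r(stanford9 + [700], cat6 + [360])

print(f"r without the extra pair: {r_before:.4f}")
print(f"r with (700, 360) added:  {r_after:.4f}")
```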
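For Problem 3, a scaled score can be placed in a stanine by converting it to a z-score with the stated mean 500 and standard deviation 150. The sketch assumes the stanine bands are symmetric about the mean (stanine 5 centered at z = 0), as on the usual stanine scale, and assigns a score falling exactly on a boundary to the higher stanine; neither detail is stated in the text above. The function name `stanine` is illustrative.

```python
import bisect

# ASSUMPTION: bands are symmetric about the mean, so stanine 5 spans
# z in [-0.25, 0.25) and each inner band is 0.5 SD wide; stanines 1 and 9
# absorb the tails below z = -1.75 and at or above z = 1.75.
Z_CUTOFFS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

MEAN, SD = 500, 150  # CAT/6 Reading scaled-score mean and SD from Problem 3

def stanine(scaled_score):
    """Map a CAT/6 Reading scaled score to a stanine (1-9)."""
    z = (scaled_score - MEAN) / SD
    # Count the cutoffs at or below z; that count plus 1 is the stanine.
    return bisect.bisect_right(Z_CUTOFFS, z) + 1

# Scaled-score boundaries implied by the z cutoffs
score_cutoffs = [MEAN + z * SD for z in Z_CUTOFFS]
print("scaled-score cutoffs:", score_cutoffs)
print("stanine of 500:", stanine(500))
```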
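The grade-equivalent encoding in Problem 3 (grade before the decimal point, school month after it, 10-month school year) means two grade equivalents can be compared in school months. The helpers `parse_grade_equivalent` and `months_of_growth` below are hypothetical illustrations, not a test publisher's method; they take values as strings so the grade and month digits are read directly rather than through floating-point arithmetic.

```python
def parse_grade_equivalent(ge):
    """Split a grade equivalent such as '3.7' into (grade, month)."""
    grade_text, month_text = ge.split(".")
    grade, month = int(grade_text), int(month_text)
    # Valid range per the text: grade 0 to 12, single month digit 0 to 9
    if not (0 <= grade <= 12 and len(month_text) == 1 and 0 <= month <= 9):
        raise ValueError(f"not a valid grade equivalent: {ge!r}")
    return grade, month

def months_of_growth(earlier, later):
    """School months between two grade equivalents (10-month school year)."""
    g1, m1 = parse_grade_equivalent(earlier)
    g2, m2 = parse_grade_equivalent(later)
    return (g2 * 10 + m2) - (g1 * 10 + m1)

print(parse_grade_equivalent("3.7"))   # (3, 7)
print(months_of_growth("3.7", "4.5"))  # 8
```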

