PSU STAT 504 - Solutions to Assignment 1


Solutions to Assignment 1
Stat 504, Spring 2003

1. (a) Yes, the binomial model is appropriate here. On each flip, the coin has some constant probability $p$ of coming up heads. The coin does not “remember” what it did in the past, so outcomes of successive flips are independent. Thus the two critical assumptions of the binomial (that the probability of success is the same on every trial, and that trials are independent) are satisfied. The fact that the coin is “unfair” doesn’t mean that $X$ is not binomial; it only means that $p \neq .5$.

(b) No, the binomial model is not appropriate. We cannot assume that the probability of winning remains constant from game to game, because the ten opposing teams vary in ability; some will be easier to beat than others. Instead of 30 independent trials, we have ten clusters of three trials each.

(c) Because the subjects in this survey are from different households, there is little or no chance that any subject’s response will affect any other’s, so the independence criterion is satisfied. Also, since each subject is randomly selected from the population of State College, each one has the same a priori probability of answering “Yes.” Thus the binomial model is appropriate.

Note: Some may say that the subjects’ answers in this survey are not independent because they are all from State College; they tend to have similar socioeconomic backgrounds, have been exposed to the same media coverage (e.g. in the Centre Daily Times), etc., and therefore cannot be truly independent. This line of reasoning, if taken to its logical conclusion, would mean that no event in this world can be independent of any other event, because all events occur on the same planet! In practice, statistical independence of $n$ Bernoulli trials does not mean that the trials must take place in completely separate environments. It does mean, however, that any common environmental factors must affect all trials to the same degree; that is, no two trials may be more alike than any other two trials.
Since each subject in this survey is randomly selected from a common population, this symmetry requirement is satisfied and the subjects’ answers may be considered independent. If the survey firm had defined the target population to be “State College and Bellefonte” and drew 30 subjects at random from this combined population, then independence would again be satisfied, because it’s a simple random sample from a single population.

(d) No, the independence assumption is violated because spouses are more likely to give similar answers than non-spouses.

2. (a) Yes, the Poisson model is appropriate. The exact distribution of each random variable $X_i$ is binomial,
$$X_i \sim \mathrm{Bin}(N,\; p = 1/2{,}000{,}000).$$
Since $N$ is very large and $p$ is very small, each $X_i$ will be approximately Poisson with parameter $\lambda = Np$.

(b) As in part (a), each $X_i$ will, individually speaking, have an approximate Poisson distribution. But the parameters $\lambda = Np$ will vary across the $X_i$’s, so the overall distribution of the sample $X_1, X_2, \ldots, X_n$ will not be Poisson.

(c) Each $X_i$ refers to the number of arrivals in a constant unit of time (a single hour). The Poisson model would be appropriate if we could assume (i) that the average arrival rate $\lambda$ does not vary throughout the day, and (ii) that the cars are arriving independently and not in clusters. Although (ii) may be appropriate, (i) is certainly not; it is unreasonable to believe that the traffic flow at rush hour would be the same as in the middle of the night. The Poisson model is not appropriate.

(d) The Poisson model may be appropriate here, because the average arrival rate $\lambda$ may be essentially constant for each of the one-hour time periods.

3. (a) If $X \sim \mathrm{Bin}(30, .04)$, then:
$$E(X) = np = 30(.04) = 1.2$$
$$\mathrm{Var}(X) = np(1-p) = 30(.04)(.96) = 1.152$$
$$\mathrm{SD}(X) = \sqrt{1.152} = 1.073$$
$$P(X = 0) = \frac{30!}{0!\,30!}\,(.04)^0(.96)^{30} = .294$$
$$P(X = 1) = \frac{30!}{1!\,29!}\,(.04)^1(.96)^{29} = .367$$
$$P(X = 2) = \frac{30!}{2!\,28!}\,(.04)^2(.96)^{28} = .222$$
$$P(X = 3) = \frac{30!}{3!\,27!}\,(.04)^3(.96)^{27} = .086$$
$$P(X \geq 4) = 1 - .294 - .367 - .222 - .086 = .031$$

(b) This interval becomes degenerate at zero whenever $\hat p = 0$, which happens if $X = 0$. From part (a), we know that if $p = .04$ then $P(X = 0) = .294$, so the interval will miss the true $p$ more than 29.4% of the time.

(c) The loglikelihood is
$$l(p; x) = 2\log p + 28\log(1-p).$$
[Figure: plot of $l(p; x)$ versus $p$, for $p$ from about 0.05 to 0.20; vertical axis from −9.5 to −7.5.]
The MLE is $\hat p = 2/30 = .0667$, and the value of the loglikelihood at the MLE is
$$l(\hat p; x) = 2\log(.0667) + 28\log(.9333) = -7.348.$$
The approximate 95% interval is
$$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}} = .0667 \pm 1.96\sqrt{\frac{(.0667)(.9333)}{30}} = (-.023,\ .156),$$
which is not very sensible because it strays outside the parameter space.

(d) On the logit scale, the loglikelihood looks like this:
[Figure: plot of the loglikelihood versus the logit $\phi = \log\{p/(1-p)\}$, for $\phi$ from about −4.5 to −1.5; vertical axis from −9.5 to −7.5.]
Both this plot and the one from part (c) look skewed. The MLE on the logit scale is $\hat\phi = \log\{\hat p/(1-\hat p)\} = -2.639$, and the interval is
$$\hat\phi \pm 1.96\sqrt{\frac{1}{n\hat p(1-\hat p)}} = -2.639 \pm 1.435 = (-4.074,\ -1.204).$$
Transforming back to the $p$ scale gives
$$\frac{\exp(-4.074)}{1+\exp(-4.074)} = .017 \quad\text{to}\quad \frac{\exp(-1.204)}{1+\exp(-1.204)} = .231.$$

(e) The interval based on the LR method includes all values of $p$ for which the loglikelihood is greater than
$$l(\hat p; x) - 1.92 = -7.35 - 1.92 = -9.27.$$
Examining $l(p; x)$ over a grid of $p$-values reveals that $l(p; x) = -9.27$ at $p \approx .011$ and $p \approx .192$, so the LR interval is $(.011, .192)$. This interval is probably the best one.

4.
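(a) The ML estimate is $\hat\lambda = (2+0+1+1+0)/5 = 0.8$ and the interval is
$$\hat\lambda \pm 1.96\sqrt{\frac{\hat\lambda}{n}} = .8 \pm 1.96\sqrt{\frac{.8}{5}} = (.016,\ 1.584).$$
It’s doubtful that the large-sample approximation is working well with a sample of only $n = 5$.

(b) For $\phi = \log\lambda$, the Fisher information is
$$I(\phi) = \frac{I(\lambda)}{[\phi'(\lambda)]^2} = \frac{n/\lambda}{(1/\lambda)^2} = n\lambda,$$
the interval is
$$\log\hat\lambda \pm 1.96\sqrt{\frac{1}{n\hat\lambda}} = (-1.203,\ 0.757),$$
and the interval on the $\lambda$-scale is $\exp(-1.203) = 0.300$ to $\exp(0.757) = 2.132$.

For $\phi = \lambda^{1/3}$, the Fisher information is
$$I(\phi) = \frac{I(\lambda)}{[\phi'(\lambda)]^2} = \frac{n/\lambda}{\left[1/(3\lambda^{2/3})\right]^2} = 9n\lambda^{1/3},$$
the interval is
$$\hat\lambda^{1/3} \pm 1.96\sqrt{\frac{1}{9n\hat\lambda^{1/3}}} = (0.625,\ 1.232),$$
and the interval on the $\lambda$-scale is $(0.625)^3 = 0.244$ to $(1.232)^3 = 1.868$.

(c) The sample variance is
$$S^2 = \frac{(2-.8)^2 + (0-.8)^2 + (1-.8)^2 + (1-.8)^2 + (0-.8)^2}{5-1} = 0.7,$$
and the 97.5th percentile of the $t$-distribution with 4 degrees of freedom is 2.776, so the normal-theory interval is
$$.8 \pm 2.776\sqrt{\frac{0.7}{5}} = (-0.239,\ 1.839).$$
The data are clearly non-normal, so I wouldn’t trust the interval. Moreover, it strays outside the parameter space.

(d) The loglikelihood function is
$$l(\lambda; x) = \sum_i x_i \log\lambda - n\lambda = 4\log\lambda - 5\lambda.$$
Plotting it versus $\lambda$:
[Figure: plot of $l(\lambda; x)$ versus $\lambda$, for $\lambda$ from about 0.5 to 2.0; vertical axis from −7.5 to −6.5.]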
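
The interval arithmetic in Problem 4(a) through 4(c) can be checked numerically. The following is a minimal sketch, not part of the original solutions: the helper name `wald_interval` and the variable names are illustrative assumptions, and the critical values 1.96 and 2.776 are taken directly from the text rather than computed. It uses the relation I(φ) = I(λ)/[φ′(λ)]² with I(λ) = n/λ to build each interval on the transformed scale and then maps the endpoints back to the λ scale.

```python
import math

# Data from Problem 4: five observed counts.
x = [2, 0, 1, 1, 0]
n = len(x)
lam_hat = sum(x) / n  # ML estimate of the Poisson mean (0.8)

Z = 1.96    # normal critical value used in the text
T4 = 2.776  # 97.5th percentile of t with 4 df, as quoted in part (c)


def wald_interval(phi, dphi, inverse):
    """Hypothetical helper: 95% Wald interval for phi(lambda),
    mapped back to the lambda scale.

    Uses I(phi) = I(lambda) / phi'(lambda)^2 with I(lambda) = n / lambda,
    evaluated at the ML estimate.
    """
    info = (n / lam_hat) / dphi(lam_hat) ** 2
    half_width = Z / math.sqrt(info)
    centre = phi(lam_hat)
    return inverse(centre - half_width), inverse(centre + half_width)


# Part (a): untransformed scale, phi(lambda) = lambda.
identity = wald_interval(lambda lam: lam, lambda lam: 1.0, lambda v: v)

# Part (b): log scale and cube-root scale.
log_scale = wald_interval(math.log, lambda lam: 1.0 / lam, math.exp)
cube_root = wald_interval(
    lambda lam: lam ** (1 / 3),
    lambda lam: lam ** (-2 / 3) / 3,
    lambda v: v ** 3,
)

# Part (c): normal-theory t interval based on the sample variance.
s2 = sum((xi - lam_hat) ** 2 for xi in x) / (n - 1)
half_t = T4 * math.sqrt(s2 / n)
t_interval = (lam_hat - half_t, lam_hat + half_t)

for name, (lo, hi) in [
    ("identity", identity),
    ("log", log_scale),
    ("cube root", cube_root),
    ("t-based", t_interval),
]:
    print(f"{name:<10} ({lo:.3f}, {hi:.3f})")
```

Run as a script, this prints the four intervals to three decimals so they can be compared with the values quoted in parts (a) through (c). The cube-root endpoints are cubed from unrounded values, which is why the upper limit agrees with 1.868 rather than with the cube of the rounded 1.232.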

