UW STAT 561 - Estimation of a Discrete Monotone Distribution


Estimation of a Discrete Monotone Distribution

Hanna K. Jankowski and Jon A. Wellner

October 19, 2009

Abstract

We study and compare three estimators of a discrete monotone distribution: (a) the (raw) empirical estimator; (b) the "method of rearrangements" estimator; and (c) the maximum likelihood estimator. We show that the maximum likelihood estimator strictly dominates both the rearrangement and empirical estimators in cases when the distribution has intervals of constancy. For example, when the distribution is uniform on {0, ..., y}, the asymptotic risk of the method of rearrangements estimator (in squared ℓ_2 norm) is y/(y+1), while the asymptotic risk of the MLE is of order (log y)/(y+1). For strictly decreasing distributions, the estimators are asymptotically equivalent.

1 Introduction

This paper is motivated in large part by the recent surge of activity concerning "method of rearrangement" estimators for nonparametric estimation of monotone functions: see, for example, Fougères (1997), Dette and Pilz (2006), Dette et al. (2006), Chernozhukov et al. (2009), and Anevski and Fougères (2007). Most of these authors study continuous settings and often start with a kernel-type estimator of the density, which involves choices of a kernel and of a bandwidth. Our goal here is to investigate method of rearrangement estimators and compare them to natural alternatives (including the maximum likelihood estimators with and without the assumption of monotonicity) in a setting in which there is less ambiguity in the choice of an initial or "basic" estimator, namely the setting of estimation of a monotone decreasing mass function on the non-negative integers N = {0, 1, 2, ...}.

Suppose that p = {p_x}_{x ∈ N} is a probability mass function; i.e., p_x ≥ 0 for all x ∈ N and Σ_{x ∈ N} p_x = 1. Our primary interest here is in the situation in which p is monotone decreasing: p_x ≥ p_{x+1} for all x ∈ N. The three estimators of p we study are:

(a) the (raw) empirical estimator,
(b) the method of rearrangement estimator,
(c)
the maximum likelihood estimator.

Notice that the empirical estimator is also the maximum likelihood estimator when no shape assumption is made on the true probability mass function.

Much as in the continuous case, our considerations here carry over to the case of estimation of unimodal mass functions with a known (fixed) mode; see e.g. Fougères (1997), Birgé (1987), and Alamatsaz (1993). For two recent papers discussing connections and trade-offs between discrete and continuous models in a related problem involving nonparametric estimation of a monotone function, see Banerjee et al. (2009) and Maathuis and Hudgens (2009).

Distributions from the monotone decreasing family satisfy Δp_x ≡ p_{x+1} − p_x ≤ 0 for all x ∈ N, and may be written as mixtures of uniform mass functions:

    p_x = Σ_{y ≥ 0} [1/(y+1)] 1_{{0,...,y}}(x) q_y.    (1.1)

Here, the mixing distribution q may be recovered via

    q_x = −(x+1) Δp_x,    (1.2)

for any x ∈ N.

Remark 1.1. From the form of the mass function, it follows that p_x ≤ 1/(x+1) for all x ≥ 0.

Suppose then that we observe X_1, X_2, ..., X_n i.i.d. random variables with values in N and with a monotone decreasing mass function p. For x ∈ N, let

    p̂_{n,x} ≡ n^{−1} Σ_{i=1}^{n} 1_{{x}}(X_i)

denote the (unconstrained) empirical estimator of the probabilities p_x. Clearly, there is no guarantee that this estimator will also be monotone decreasing, especially for small sample sizes. We next consider two estimators which do satisfy this property: the rearrangement estimator and the maximum likelihood estimator (MLE).

For a vector w = {w_0, ..., w_k}, let rear(w) denote the reverse-ordered vector such that w' = rear(w) satisfies w'_0 ≥ w'_1 ≥ ... ≥ w'_k. The rearrangement estimator is then simply defined as

    p̂_n^R = rear(p̂_n).

We can also write p̂_{n,x}^R = sup{u : Q_n(u) ≤ x}, where Q_n(u) ≡ #{x : p̂_{n,x} ≥ u}.

To define the MLE we again need some additional notation. For a vector w = {w_0, ..., w_k}, let gren(w) be the operator which returns the vector of the k+1 slopes of the least concave majorant of the points

    { (j, Σ_{i=0}^{j} w_i) : j = −1, 0, ..., k }.

Here, we adopt the convention that the empty sum Σ_{i=0}^{−1} w_i = 0.
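Both operators are easy to compute in practice: rear is a decreasing sort, and the LCM slopes returned by gren can be obtained with the pool-adjacent-violators algorithm (PAVA). The following sketch (plain Python; function names are ours, not from the paper) illustrates both, using the empirical frequencies that appear in Example 1 below as input.

```python
def rear(w):
    """Rearrangement: sort the entries of w in decreasing order."""
    return sorted(w, reverse=True)

def gren(w):
    """Slopes of the least concave majorant of the cumulative sums of w,
    computed by pooling adjacent violators (equal weights): whenever a
    block has a larger average than the block before it, the two blocks
    are merged and replaced by their common average."""
    blocks = []  # stack of (sum, length) pairs with decreasing averages
    for x in w:
        s, m = x, 1
        while blocks and s / m > blocks[-1][0] / blocks[-1][1]:
            ps, pm = blocks.pop()
            s, m = s + ps, m + pm
        blocks.append((s, m))
    out = []
    for s, m in blocks:
        out.extend([s / m] * m)  # each block contributes its average
    return out

p_hat = [0.20, 0.14, 0.11, 0.22, 0.15, 0.18]   # empirical frequencies, Example 1
print(rear(p_hat))                         # [0.22, 0.2, 0.18, 0.15, 0.14, 0.11]
print([round(v, 2) for v in gren(p_hat)])  # [0.2, 0.16, 0.16, 0.16, 0.16, 0.16]
```

Both outputs agree with the values reported in Example 1; gren computed this way coincides with the isotonic (here antitonic) regression of p̂_n with equal weights.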
The MLE, also known as the Grenander estimator, is then defined as

    p̂_n^G = gren(p̂_n).

Thus, p̂_{n,x}^G is the left derivative at x of the least concave majorant (LCM) of the empirical distribution function F_n(x) = n^{−1} Σ_{i=1}^{n} 1_{[0,x]}(X_i) (where we include the point (−1, 0) to find the left derivative at x = 0). Therefore, by definition, the MLE is a vector of local averages over a partition of {0, ..., max{X_1, ..., X_n}}. This partition is chosen by the touchpoints of the LCM with F_n. It is easily checked that p̂_n^G corresponds to the isotonic estimator for multinomial data as described in Robertson et al. (1988), pages 7–8 and 38–39.

We begin our discussion with two examples: in the first, p is the uniform distribution, and in the second, p is strictly monotone decreasing. To compare the three estimators, we consider several metrics: the ℓ_k norm for 1 ≤ k ≤ ∞ and the Hellinger distance. Recall that the Hellinger distance between two mass functions is given by

    H^2(p, p̃) = 2^{−1} ∫ [√p − √p̃]^2 dμ = 2^{−1} Σ_{x ≥ 0} [√p_x − √p̃_x]^2,

while the ℓ_k metrics are defined as

    ||p − p̃||_k = ( Σ_{x ≥ 0} |p_x − p̃_x|^k )^{1/k}   for 1 ≤ k < ∞,
    ||p − p̃||_∞ = sup_{x ≥ 0} |p_x − p̃_x|.

In the examples, we compare the Hellinger norm and the ℓ_1 and ℓ_2 metrics, as the behaviour of these differs the most.

Example 1. Suppose that p is the uniform distribution on {0, ..., 5}. For n = 100 independent draws from this distribution we observe p̂_n = (0.20, 0.14, 0.11, 0.22, 0.15, 0.18). Then p̂_n^R = (0.22, 0.20, 0.18, 0.15, 0.14, 0.11), and the MLE may be calculated as p̂_n^G = (0.20, 0.16, 0.16, 0.16, 0.16, 0.16). The estimators are illustrated in Figure 1 (left). The distances of the estimators from the true mass function p are given in Table 1 (left). The maximum likelihood estimator p̂_n^G is superior in all three metrics shown. To explore this relationship further, we repeated the estimation procedure for 1000 Monte Carlo samples of size n = 100 from the uniform distribution. Figure 2 (left) shows boxplots of the metrics for the three estimators.
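The discrepancies reported in these comparisons follow directly from the definitions above; a minimal sketch (plain Python, function names ours) for mass functions given as vectors on a common finite support:

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance: (1/2) * sum_x (sqrt(p_x) - sqrt(q_x))^2.
    Assumes p and q are given on the same finite support."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def lk(p, q, k):
    """The l_k metric ||p - q||_k for 1 <= k < infinity."""
    return sum(abs(a - b) ** k for a, b in zip(p, q)) ** (1 / k)

# Distance of the MLE of Example 1 from the true uniform mass function:
p_true = [1 / 6] * 6
p_mle = [0.20, 0.16, 0.16, 0.16, 0.16, 0.16]
print(round(lk(p_true, p_mle, 1), 4))  # l1 distance: 0.0667
```

Note that zip truncates to the shorter vector, so estimates and truth should be padded with zeros to a common support before comparing.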
The figure shows that here the rearrangement and empirical estimators have the same behaviour, a relationship which we establish rigorously in Theorem 2.1.

Example 2. Suppose that p is the geometric distribution with p_x = (1 − θ)θ^x for x ∈ N, with θ = 0.75. For n = 100 draws from this distribution we observe p̂_n, p̂_n^R and p̂_n^G as shown in Figure 1 (right). The distances of the estimators from the true mass function p are given in Table 1 (right). Here, p̂_n is outperformed by p̂_n^G and p̂_n^R in all the metrics, with p̂_n^R performing better in the ℓ_1 and ℓ_2 metrics, but not in

