COS 424: Interacting with Data, Lecture #19
Lecturer: Rob Schapire
Scribe: Jingyuan Wu
April 17, 2007

1 A Density Estimation Problem: Modelling the Habitat of Plant and Animal Species

Conservation biologists are often concerned with modeling the population distribution of plants and animals. In particular, our problem is concerned with modeling the population distribution of a particular species across a grid map.

1.1 Description of Data

In this problem, two types of data will be available. The first type is called presence records. Presence records are pixels on the grid map where the species of concern was observed. The same pixel may be present multiple times if the species was observed more than once within that pixel. The second type is called environmental variables. Each environmental variable contains information such as the average rainfall on a particular pixel. Environmental variable data are available for each pixel on the grid map.

This problem is not simply one of classifying each point on the map as habitat or non-habitat, because we have only positive examples and no negative examples. Just because the biologist did not observe the species at a particular location does not allow us to label that location as non-habitat.

1.2 Formal Definition of Variables

We will define the following set of variables:

• X is the set of all pixels or locations on the grid map.
• |X| is the size or cardinality of the set X. This value is generally very large, ranging from tens of thousands to millions.
• x_1, ..., x_m ∈ X are the pixels included as presence records. Note that the x_i are not necessarily distinct.
• f_1, ..., f_n are the features. Each f_j is defined for all pixels on the grid map, i.e. f_j : X → ℝ.
• π is the true distribution of the species. In other words, π(x) is the fraction of the population living at pixel x ∈ X.

The set of features includes the environmental variables, but it may also contain additional functions derived from the environmental variables, such as the average rainfall squared.

1.3 Assumptions

We assume that the presence records x_1, ..., x_m are chosen i.i.d. according to π. This means that π(x), the probability that x will be chosen as a presence record, is proportional to the population living at x.

Unfortunately, these assumptions may limit our model's ability to approximate the real world for a number of reasons. First, we have implicitly assumed that the true distribution π does not change with time. This may not be realistic, since π may in fact change with day and night cycles and with seasonal cycles. We have also assumed that there is no sample bias. This assumption means that the biologist sampled all the points in X with equal diligence. This, however, may not be the case, since some locations are harder to access than others.

Even the assumption that the presence records x_i are independent may be suspect, since a biologist may be more likely to sample a nearby location after having observed a butterfly at the present location.

2 Approach One: Maximum Likelihood Estimation

To solve this problem, our goal is to create an estimate of π, call it π̂. One method of attack is maximum likelihood estimation. To proceed, we first need to express π̂ in a parametric form. We may begin by choosing a linear parametric form:

    \hat{\pi}(x) = \sum_{j=1}^{n} \lambda_j f_j(x).

However, there are several problems with this simple formulation. First, the values π̂(x) may not lie in [0, 1]. Since the π̂(x) represent proportions, it would make little sense for them to take on negative values or values greater than one. Also, \sum_{x \in X} \hat{\pi}(x) may not equal one. Again, this equality is required because the π̂(x) represent proportions.

As a result, we may choose to transform the simple linear form into an exponential form and set π̂(x) equal to

    q_\lambda(x) = \frac{\exp\left( \sum_{j=1}^{n} \lambda_j f_j(x) \right)}{Z_\lambda}.

Here Z_\lambda is chosen so that \sum_{x} q_\lambda(x) = 1. This form has the advantage that it is strictly positive and lies in [0, 1]. To maximize the likelihood, then, we would choose π̂ to be equal to q_{\hat{\lambda}}, where

    \hat{\lambda} = \arg\max_{\lambda} \prod_{i=1}^{m} q_\lambda(x_i) = \arg\max_{\lambda} \sum_{i=1}^{m} \ln q_\lambda(x_i).

This last expression turns out to be concave in λ, which means there exist efficient methods for maximizing it.
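As a concrete illustration of this maximum-likelihood fit, here is a minimal sketch assuming NumPy and entirely synthetic data (the grid size, features, presence records, and step size are all made up). It maximizes \sum_i \ln q_\lambda(x_i) by plain gradient ascent, using the standard exponential-family gradient m(Ê[f] − E_{q_λ}[f]); the lecture does not prescribe a particular optimization method, so this is just one workable choice.

    import numpy as np

    # Illustrative sizes and synthetic data -- assumptions for this sketch, not from the lecture.
    num_pixels, num_features, m = 500, 4, 60
    rng = np.random.default_rng(0)
    F = rng.normal(size=(num_pixels, num_features))   # F[x, j] = f_j(x) for every pixel x in X
    presence = rng.integers(0, num_pixels, size=m)    # indices of the presence records x_1, ..., x_m

    emp_avg = F[presence].mean(axis=0)                # \hat{E}[f_j], the sample average of each feature

    lam = np.zeros(num_features)
    step = 0.1
    for _ in range(2000):
        scores = F @ lam
        q = np.exp(scores - scores.max())             # subtract the max for numerical stability
        q /= q.sum()                                  # q_lambda(x) = exp(sum_j lambda_j f_j(x)) / Z_lambda
        model_avg = q @ F                             # E_{q_lambda}[f_j]
        lam += step * (emp_avg - model_avg)           # gradient ascent on (1/m) sum_i ln q_lambda(x_i)

    # Recompute q_lambda at the final lambda and report the log-likelihood.
    scores = F @ lam
    q = np.exp(scores - scores.max())
    q /= q.sum()
    print("lambda:", lam)
    print("log-likelihood:", np.sum(np.log(q[presence])))

Note that at the maximizer the gradient is zero, so E_{q_λ}[f_j] = Ê[f_j] for all j; these same moment-matching equalities appear as the constraints in the entropy-maximization approach below.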
There are a few problems with this maximum likelihood estimation. One problem is that the technique is prone to overfitting, especially if the number of features is large. Another problem is that the transformation of the linear model seems somewhat arbitrary. Therefore, we will next explore a different approach to this problem.

[Figure 1: average of features over presence records]

3 Approach Two: Entropy Maximization

Before we proceed, we make the following definitions. We define the sample average of a feature as its average over all presence records (see Figure 1 above). Mathematically,

    \hat{E}[f_j] = \frac{1}{m} \sum_{i=1}^{m} f_j(x_i).

Also, we define the true expectation of a feature as

    E_\pi[f_j] = \sum_{x \in X} \pi(x) f_j(x).

In general, we would expect E_π[f_j] ≈ Ê[f_j] for all j. We may then take as our constraints for estimating π that

    E_{\hat{\pi}}[f_j] = \hat{E}[f_j] \quad \text{for all } j.

These constraints, in general, do not reduce the possible choices for π̂ to a single distribution. As a result, we need other criteria to narrow down our choices.

For example, if we were to estimate π with no information or data of any kind, then the most intuitive estimate for π would be the uniform distribution. Therefore, we can take as our goal the selection of the π̂ that is as close to the uniform distribution as possible while still satisfying the constraints.

Closeness to the uniform distribution may be measured by the entropy H:

    H(\hat{\pi}) = -\sum_{x \in X} \hat{\pi}(x) \ln \hat{\pi}(x).

It can be shown that H is never negative, and that it is maximized when π̂ is uniform. This approach follows the principle of maximum entropy: when modelling a distribution, we should maximize the entropy subject to constraints representing what we know about the distribution. So our problem now is to find the π̂ that maximizes H(π̂) subject to

    E_{\hat{\pi}}[f_j] = \hat{E}[f_j] \quad \text{for all } j,
    \hat{\pi}(x) \ge 0 \quad \text{for all } x \in X,
    \sum_{x \in X} \hat{\pi}(x) = 1.

Note that here, π̂ is not parametrized but is instead manipulated directly.

The entropy function turns out to be concave and the constraints are linear. As a result, we know that any local maximum is also the global maximum.

We may solve this maximization problem using Lagrange multipliers:

    L = \sum_{x \in X} \hat{\pi}(x) \ln \hat{\pi}(x) - \sum_{j=1}^{n} \lambda_j ...
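Setting the Lagrangian aside for a moment, the maximization problem stated above can be illustrated end to end on a tiny instance. The sketch below is not the lecture's method: it simply hands the objective −H(π̂) and the linear constraints to a generic constrained solver (scipy.optimize.minimize with SLSQP), assuming NumPy and SciPy are available and using made-up data, to emphasize that π̂ itself, one value per pixel, is the optimization variable.

    import numpy as np
    from scipy.optimize import minimize

    # Tiny synthetic instance -- sizes and data are assumptions for this sketch only.
    num_pixels, num_features, m = 40, 3, 25
    rng = np.random.default_rng(1)
    F = rng.normal(size=(num_pixels, num_features))   # F[x, j] = f_j(x)
    presence = rng.integers(0, num_pixels, size=m)    # presence records x_1, ..., x_m
    emp_avg = F[presence].mean(axis=0)                # \hat{E}[f_j]

    def neg_entropy(p):
        # Objective to minimize: -H(p) = sum_x p(x) ln p(x); the clip avoids log(0) at the boundary.
        p = np.clip(p, 1e-12, None)
        return float(np.sum(p * np.log(p)))

    constraints = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},    # sum_x pi_hat(x) = 1
        {"type": "eq", "fun": lambda p: F.T @ p - emp_avg},  # E_pihat[f_j] = \hat{E}[f_j] for all j
    ]
    bounds = [(0.0, 1.0)] * num_pixels                       # pi_hat(x) >= 0

    p0 = np.full(num_pixels, 1.0 / num_pixels)               # start from the uniform distribution
    res = minimize(neg_entropy, p0, method="SLSQP", bounds=bounds, constraints=constraints)

    pi_hat = res.x
    print("maximum entropy H:", -res.fun)
    print("worst constraint violation:",
          max(abs(np.sum(pi_hat) - 1.0), float(np.max(np.abs(F.T @ pi_hat - emp_avg)))))

A generic solver is only workable because |X| in this toy example is tiny; for grid maps with tens of thousands to millions of pixels, more specialized methods (such as the Lagrange-multiplier analysis begun above) are needed.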

