SWARTHMORE PHYS 120 - Power-law distributions in empirical data

arXiv:0706.1062v1 [physics.data-an] 7 Jun 2007

Power-law distributions in empirical data

Aaron Clauset,1,2 Cosma Rohilla Shalizi,3 and M. E. J. Newman4

1 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
2 Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
3 Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
4 Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109, USA

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.

PACS numbers: 02.50.Tt, 02.50.Ng, 89.75.Da
Keywords: Power-law distributions; Pareto; Zipf; maximum likelihood; heavy-tailed distributions; likelihood ratio test; model selection

I. INTRODUCTION

Scientists have learned many things from observation of the statistical distributions of experimental quantities such as the lifetimes of excited atomic or particle states, populations of animals, plants, or bacteria, prices on the stock market, or the arrival times of messages sent across the Internet. Many, perhaps most, such quantities have distributions that are tightly clustered around their average values. That is, these distributions place a trivial amount of probability far from the mean and hence the mean is representative of most observations. For instance, it is a useful statement to say that most adult male Americans are about 180 cm tall, because no one deviates very far from this average figure. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in either direction and are well characterized by quoting a simple standard deviation.

Not all distributions fit this pattern, however, and while those that do not are often considered problematic or defective because they are not well characterized by their mean and standard deviation, they are at the same time some of the most interesting of all scientific observations. The fact that they cannot be characterized as simply as other measurements is often a sign of complex underlying processes that merit further study.

Among such distributions, the power law has attracted particular interest over the years for its mathematical properties, which sometimes lead to surprising physical consequences, and for its appearance in a diverse range of natural and man-made phenomena. The sizes of solar flares, the populations of cities, and the intensities of earthquakes, for example, are all quantities whose distributions are thought to follow power laws. Quantities such as these are not well characterized by their averages. For instance, according to the 2000 US Census, the average population of a city, town, or village in the United States is 8226. But this statement is not a useful one for most purposes because a significant fraction of the total population lives in cities (New York, Los Angeles, etc.) whose population differs from the mean by several orders of magnitude. Extensive discussions of this and other properties of power laws can be found in the reviews by Mitzenmacher (2004) and Newman (2005), and references therein.

Power laws are the focus of this article. Specifically, we address a thorny and recurring issue in the scientific literature, the question of how to recognize a power law when we see one. A quantity x obeys a power law if it is drawn from a probability distribution

    p(x) \propto x^{-\alpha},    (1)

where α is a constant parameter of the distribution known as the exponent or scaling parameter. In real-world situations the scaling parameter typically lies in the range 2 < α < 3, although there are occasional exceptions.

In practice, we rarely, if ever, know for certain that an observed quantity is drawn from a power-law distribution. Instead, the best we can typically do is to say that our observations are consistent with a model of the world in which x is drawn from a distribution of the form Eq. (1). In this paper we explain how one reaches conclusions of this kind in a reliable fashion. Practicing what we preach, we also apply our methods to a large number of data sets describing observations of real-world phenomena that have at one time or another been claimed to follow power laws. In the process, we demonstrate that several of them cannot by any stretch of the imagination be considered to follow power laws, while for others the power-law hypothesis appears to be a good one, or at least is not firmly ruled out.

II. FUNDAMENTAL PROPERTIES OF POWER LAWS

Before turning to our main topic of discussion, we first consider some fundamental mathematical issues that will be important for what follows. Further details on the mathematics of power laws can be found in Mitzenmacher (2004) and Newman (2005).

A. Continuous and discrete power-law behavior

Power-law distributions come in two basic flavors: continuous distributions governing continuous real numbers and discrete distributions where the quantity of interest can take only a discrete set of values, normally positive integers.

Let x represent the quantity whose distribution we are interested in. A continuous power-law distribution is one described by a probability density p(x) such that

    p(x)\,dx = \Pr(x \le X < x + dx) = C x^{-\alpha}\,dx,    (2)

where X is the observed value and C is a normalization constant. Clearly this density diverges as x → 0 so Eq. (2) cannot hold for all x
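As a rough illustration of the divergence noted for Eq. (2), the sketch below evaluates the integral of x^(-α) from a lower limit up to infinity in closed form and shows how it grows as that limit approaches zero. The lower cutoff (called x_min here), the function names, and the restriction to α > 1 are assumptions introduced for this example; they do not appear in the text above.

```python
def tail_integral(alpha: float, lower: float) -> float:
    """Closed-form integral of x**(-alpha) from `lower` to infinity.

    Finite only for alpha > 1 and lower > 0 (both assumed here).
    """
    if alpha <= 1.0 or lower <= 0.0:
        raise ValueError("need alpha > 1 and lower > 0 for a finite integral")
    return lower ** (1.0 - alpha) / (alpha - 1.0)


def normalization_constant(alpha: float, x_min: float) -> float:
    """C such that C * x**(-alpha) integrates to 1 over [x_min, infinity).

    x_min is a hypothetical lower cutoff, introduced only for illustration.
    """
    return 1.0 / tail_integral(alpha, x_min)


if __name__ == "__main__":
    alpha = 2.5  # a value inside the typical range 2 < alpha < 3
    for lower in (1.0, 0.1, 0.01, 0.001):
        # The integral grows without bound as the lower limit shrinks,
        # so no finite C can normalize the density all the way down to 0.
        print(f"lower={lower:g}  integral={tail_integral(alpha, lower):.4g}")
    print("C for x_min = 1:", normalization_constant(alpha, 1.0))
```

Under these assumptions the constant works out to C = (α - 1) x_min^(α - 1), which is finite only when the cutoff is strictly positive.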

