http://www.cs.umd.edu/class/spring2002/cmsc828g/project.htm

http://trochim.human.cornell.edu/kb/measlevl.htm

nominal measurement - numerical values just "name" the attribute uniquely. No ordering is implied. E.g., jersey numbers in basketball: a player with number 30 is not more of anything than a player with number 15, and certainly not twice whatever number 15 is.

ordinal measurement - attributes can be rank-ordered, but distances between attributes do not have any meaning. E.g., on a survey you might code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.S. degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same as from 3 to 4? No. The interval between values is not interpretable in an ordinal measure.

interval measurement - distance between attributes does have meaning. E.g., when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable. An average makes sense, but ratios do not: 80 degrees is not twice as hot as 40 degrees.

ratio measurement - there is an absolute zero that is meaningful. This means that you can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social research most "count" variables are ratio, for example, the number of clients in the past six months. Why? Because you can have zero clients and because it is meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."

Hierarchy of Measurements

Consider a new order-preserving mapping: pain 1-10 → pain 1-20; 1→1, 2→2, 3→3, 4→4, 5→5, 6→12

Each object i is described by p measurements:

\[ x(i) = (x_1(i), x_2(i), \ldots, x_p(i)) \]

Euclidean distance:

\[ d_E(i,j) = \left( \sum_{k=1}^{p} (x_k(i) - x_k(j))^2 \right)^{1/2} \]

Sample mean and standard deviation of the k-th variable:

\[ \bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_k(i) \]

\[ \hat{\sigma}_k = \left( \frac{1}{n} \sum_{i=1}^{n} (x_k(i) - \bar{x}_k)^2 \right)^{1/2} \]

Weighted Euclidean distance:

\[ d_{WE}(i,j) = \left( \sum_{k=1}^{p} w_k (x_k(i) - x_k(j))^2 \right)^{1/2} \]

[Figure: attributes height(i), height(j), diameter(i), diameter(j); and height_2(i), ..., height_100(i), height_2(j), ..., height_100(j)]

Covariance and correlation:

\[ \mathrm{Cov}(X,Y) = \frac{1}{n} \sum_{i=1}^{n} (x(i) - \bar{x})(y(i) - \bar{y}) \]

\[ \rho(X,Y) = \frac{\sum_{i=1}^{n} (x(i) - \bar{x})(y(i) - \bar{y})}{\left( \sum_{i=1}^{n} (x(i) - \bar{x})^2 \sum_{i=1}^{n} (y(i) - \bar{y})^2 \right)^{1/2}} \]

[Figure: correlations (scale +1, 0, -1) among data on characteristics of Boston suburbs, including business acreage, nitrous oxide, and percentage of large residential lots]

[Figure: plot of Y against X] ρ(X,Y) = ? Linear covariance, correlation. Are X and Y dependent?

Mahalanobis distance:

\[ d_{MH}(i,j) = \left( (x(i) - x(j))^T \Sigma^{-1} (x(i) - x(j)) \right)^{1/2} \]

1. It automatically accounts for the scaling of the coordinate axes.
2. It corrects for correlation between the different features.

Price:
1. The covariance matrices can be hard to determine accurately.
2. The memory and time requirements grow quadratically rather than linearly with the number of features.

Minkowski (L_λ) distance:

\[ d(i,j) = \left( \sum_{k=1}^{p} |x_k(i) - x_k(j)|^{\lambda} \right)^{1/\lambda} \]

Manhattan (λ = 1) distance:

\[ d(i,j) = \sum_{k=1}^{p} |x_k(i) - x_k(j)| \]

L_∞ distance:

\[ d(i,j) = \max_k |x_k(i) - x_k(j)| \]

For binary attributes, let n_{ab} be the number of attributes on which object i has value a and object j has value b. Simple matching coefficient:

\[ \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}} \]

Jaccard coefficient:

\[ \frac{n_{11}}{n_{11} + n_{10} + n_{01}} \]

Logit transformation:

\[ \mathrm{logit}(p) = \log \frac{p}{1 - p} \]

flatten

Problems:
– introduces statistical skew
– loses relational structure
• incapable of detecting link-based patterns
– must fix attributes in advance

• Principles of Data Mining, Hand, Mannila, Smyth. MIT Press, 2001.
• Trochim, William M. The Research Methods Knowledge Base, 2nd Edition. (version current as of 2001).
• Pattern Recognition for HCI. Richard Duda,
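
The distance and similarity measures from these slides (Minkowski with its Euclidean and Manhattan special cases, weighted Euclidean, L_∞, Mahalanobis, and the two binary coefficients) can be sketched in plain Python as follows. This is a minimal illustration, not code from the course: all function names are hypothetical, objects are assumed to be equal-length sequences of numbers, and the Mahalanobis function assumes the caller already has the inverse covariance matrix as a list of rows.

```python
import math


def minkowski(x, y, lam):
    """Minkowski distance of order lam (lam=2: Euclidean, lam=1: Manhattan)."""
    return sum(abs(a - b) ** lam for a, b in zip(x, y)) ** (1.0 / lam)


def weighted_euclidean(x, y, w):
    """Euclidean distance with one non-negative weight w[k] per attribute."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))


def max_distance(x, y):
    """L-infinity distance: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, sigma_inv):
    """Mahalanobis distance; sigma_inv is the inverse covariance matrix
    (assumed to be supplied by the caller as a list of rows)."""
    diff = [a - b for a, b in zip(x, y)]
    p = len(diff)
    # Quadratic form (x - y)^T Sigma^{-1} (x - y)
    quad = sum(diff[r] * sigma_inv[r][c] * diff[c]
               for r in range(p) for c in range(p))
    return math.sqrt(quad)


def binary_similarity(x, y):
    """Return (simple matching, Jaccard) for two 0/1 attribute vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    matching = (n11 + n00) / (n11 + n10 + n01 + n00)
    # Jaccard ignores 0/0 agreements; define it as 0.0 if no 1s occur at all.
    denom = n11 + n10 + n01
    jaccard = n11 / denom if denom else 0.0
    return matching, jaccard


if __name__ == "__main__":
    a, b = (0.0, 0.0), (3.0, 4.0)
    print(minkowski(a, b, 2))           # Euclidean
    print(minkowski(a, b, 1))           # Manhattan
    print(max_distance(a, b))           # L-infinity
    print(weighted_euclidean(a, b, (1.0, 1.0)))
    print(mahalanobis(a, b, [[1.0, 0.0], [0.0, 1.0]]))
    print(binary_similarity([1, 1, 0, 0], [1, 0, 1, 0]))
```

With unit weights, or with the identity matrix as inverse covariance, the weighted Euclidean and Mahalanobis distances reduce to the ordinary Euclidean distance, which makes a quick sanity check for either function.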