UMD CMSC 828G - Lecture 2

School: University of Maryland, College Park

Course: CMSC 828G - Advanced Topics in Information Processing: Data-Intensive Computing with MapReduce

Pages: 35

http://www.cs.umd.edu/class/spring2002/cmsc828g/project.htm

Levels of measurement (http://trochim.human.cornell.edu/kb/measlevl.htm):

nominal measurement - numerical values just "name" the attribute uniquely; no ordering is implied. E.g., jersey numbers in basketball: a player with number 30 is not more of anything than a player with number 15, and certainly not twice whatever number 15 is.

ordinal measurement - attributes can be rank-ordered, but distances between attributes have no meaning. E.g., on a survey you might code Educational Attainment as 0 = less than H.S.; 1 = some H.S.; 2 = H.S. degree; 3 = some college; 4 = college degree; 5 = post college. In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same as from 3 to 4? No. The interval between values is not interpretable in an ordinal measure.

interval measurement - the distance between attributes does have meaning. E.g., when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80, so the interval between values is interpretable. Averages make sense, but ratios do not: 80 degrees is not twice as hot as 40 degrees.

ratio measurement - there is an absolute zero that is meaningful. This means that you can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social research most "count" variables are ratio, for example, the number of clients in the past six months. Why?
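Because you can have zero clients, and because it is meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."

Hierarchy of Measurements

Consider a new order-preserving mapping from a 1-10 pain scale to a 1-20 pain scale: 1→1, 2→2, 3→3, 4→4, 5→5, 6→12

Object i is described by p measurements:
$$x(i) = \big(x_1(i), x_2(i), \ldots, x_p(i)\big)$$

Euclidean distance:
$$d_E(i,j) = \Big(\sum_{k=1}^{p} \big(x_k(i) - x_k(j)\big)^2\Big)^{1/2}$$

Standard deviation of variable k, estimated from n objects:
$$\hat{\sigma}_k = \Big(\frac{1}{n}\sum_{i=1}^{n} \big(x_k(i) - \bar{x}_k\big)^2\Big)^{1/2}, \qquad \bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_k(i)$$

Weighted Euclidean distance:
$$d_{WE}(i,j) = \Big(\sum_{k=1}^{p} w_k \big(x_k(i) - x_k(j)\big)^2\Big)^{1/2}$$

[Figure: objects i and j described by height(i), diameter(i) and height(j), diameter(j); and by repeated height measurements height2(i), …, height100(i) and height2(j), …, height100(j)]

Covariance and correlation:
$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n} \big(x(i) - \bar{x}\big)\big(y(i) - \bar{y}\big)$$
$$\rho(X,Y) = \frac{\sum_{i=1}^{n} \big(x(i) - \bar{x}\big)\big(y(i) - \bar{y}\big)}{\Big(\sum_{i=1}^{n} \big(x(i) - \bar{x}\big)^2 \sum_{i=1}^{n} \big(y(i) - \bar{y}\big)^2\Big)^{1/2}}$$

[Figure: correlation matrix (scale +1, 0, -1) for data on characteristics of Boston suburbs, including business acreage, nitrous oxide, and percentage of large residential lots]

[Figure: scatter plot of Y against X; ρ(X,Y) = ?]
Covariance and correlation measure only linear dependence. Are X and Y dependent?

Mahalanobis distance:
$$d_{MH}(i,j) = \Big(\big(x(i) - x(j)\big)^T \Sigma^{-1} \big(x(i) - x(j)\big)\Big)^{1/2}$$
Advantages:
1. It automatically accounts for the scaling of the coordinate axes.
2. It corrects for correlation between the different features.
Price:
1. The covariance matrices can be hard to determine accurately.
2. The memory and time requirements grow quadratically rather than linearly with the number of features.

Minkowski (L_λ) distance, λ ≥ 1:
$$d_\lambda(i,j) = \Big(\sum_{k=1}^{p} \big|x_k(i) - x_k(j)\big|^{\lambda}\Big)^{1/\lambda}$$
City-block distance (λ = 1):
$$d(i,j) = \sum_{k=1}^{p} \big|x_k(i) - x_k(j)\big|$$
Limit λ → ∞:
$$d(i,j) = \max_k \big|x_k(i) - x_k(j)\big|$$

Similarity of binary vectors, where n_{ab} is the number of variables on which object i takes value a and object j takes value b:
$$\text{matching coefficient} = \frac{n_{11} + n_{00}}{n_{11} + n_{10} + n_{01} + n_{00}}, \qquad \text{Jaccard coefficient} = \frac{n_{11}}{n_{11} + n_{10} + n_{01}}$$

Logit transformation:
$$\mathrm{logit}(p) = \log\frac{p}{1 - p}$$

Problems with flattening:
– introduces statistical skew
– loses relational structure
  • incapable of detecting link-based patterns
– must fix attributes in advance

References:
• Hand, Mannila, Smyth. Principles of Data Mining. MIT Press, 2001.
• Trochim, William M. The Research Methods Knowledge Base, 2nd Edition (version current as of 2001).
• Duda, Richard. Pattern Recognition for HCI.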
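The distance and similarity measures defined in these notes can be sketched in plain Python (standard library only). This is a minimal illustration under assumptions that are not from the notes: vectors are equal-length Python lists, binary vectors use 0/1, the Mahalanobis helper handles only two features (it inverts a 2x2 covariance matrix by hand rather than calling a linear-algebra library), and all function names, group values, and example vectors are hypothetical. The last lines also replay the pain-scale recoding from the Hierarchy of Measurements slide and show one case where ρ(X,Y) = 0 even though Y is completely determined by X.

```python
import math
from statistics import fmean, pstdev


def minkowski(a, b, lam=2.0):
    """L_lambda distance for lam >= 1: lam=1 is city-block, lam=2 is Euclidean
    (d_E), and lam=math.inf gives max_k |x_k(i) - x_k(j)|."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(lam):
        return max(diffs)
    return sum(d ** lam for d in diffs) ** (1.0 / lam)


def weighted_euclidean(a, b, w):
    """d_WE(i, j) = (sum_k w_k (x_k(i) - x_k(j))^2)^(1/2)."""
    return math.sqrt(sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)))


def standardize(rows):
    """Divide each variable k by sigma_hat_k (the 1/n estimate, as in the notes)."""
    sigmas = [pstdev(col) for col in zip(*rows)]
    return [[x / s for x, s in zip(row, sigmas)] for row in rows]


def cov(xs, ys):
    """Cov(X, Y) with the 1/n normalization."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    """rho(X, Y); the 1/n factors cancel, giving the formula in the notes."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


def mahalanobis_2d(a, b, sigma):
    """d_MH for two features only (hypothetical helper): inverts the 2x2
    covariance matrix sigma by hand, then forms the quadratic form."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d = [a[0] - b[0], a[1] - b[1]]
    return math.sqrt(sum(d[r] * inv[r][c] * d[c] for r in range(2) for c in range(2)))


def binary_counts(a, b):
    """Return (n11, n10, n01, n00) for two 0/1 vectors."""
    pairs = list(zip(a, b))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def matching_coefficient(a, b):
    n11, n10, n01, n00 = binary_counts(a, b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard(a, b):
    n11, n10, n01, _ = binary_counts(a, b)
    return n11 / (n11 + n10 + n01)


def logit(p):
    return math.log(p / (1.0 - p))


i, j = [1.0, 2.0], [4.0, 6.0]
print(minkowski(i, j, 1), minkowski(i, j, 2), minkowski(i, j, math.inf))  # 7.0 5.0 4.0
# With an identity covariance matrix, d_MH reduces to d_E.
print(mahalanobis_2d(i, j, [[1.0, 0.0], [0.0, 1.0]]))  # 5.0

# Zero correlation does not imply independence: y = x^2 on symmetric x.
xs = [-2, -1, 0, 1, 2]
print(corr(xs, [x * x for x in xs]))  # 0.0

print(matching_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.6
print(jaccard([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.5

# Order-preserving recoding of the ordinal pain scale (only the entries given
# in the notes are used); the group responses are hypothetical.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 12}
group_a, group_b = [5, 5], [6, 1]
print(fmean(group_a), fmean(group_b))  # 5.0 3.5
print(fmean([recode[v] for v in group_a]), fmean([recode[v] for v in group_b]))  # 5.0 6.5
```

Standardizing each variable by σ̂_k and then taking d_E is the same as using d_WE with weights w_k = 1/σ̂_k²; the Mahalanobis distance goes further by also using the off-diagonal covariances. The recoding example shows why comparing means of ordinal codes is unsafe: the recoding preserves order, yet it reverses which group has the larger mean.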
