UCSB CS 290 - Association Rule Learning - D2534191



School name University of California, Santa Barbara

Course CS 290 - Big Data and Networks

Pages 11

This preview shows page 1-2-3-4 out of 11 pages.

Unsupervised Data Mining: Association Rule Learning
PR, ANN, & ML

Association Rule Analysis
- Popular in mining databases
- Automated discovery of sets of variables that occur frequently together, or of one or more variables leading to others

Market Basket Analysis
- Retail outlets
  - Placement of merchandise (affinity positioning)
  - Cross advertising
- Banks
- Insurance: link analysis for fraud
- Medical: symptom analysis

Co-occurrence Matrix
Customer 1: beer, pretzels, potato chips, aspirin
Customer 2: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 3: soda, potato chips, milk
Customer 4: soup, beer, milk, ice cream
Customer 5: soda, coffee, milk, bread
Customer 6: beer, potato chips
- Interesting cases can have 10^4 variables and 10^8 samples
- Co-occurrence gives only pair-wise association

Practical Solutions
- Run up against the curse of dimensionality
- With 10^4 variables, each with many possible values, a very large number of samples is needed to populate the space; "bump" hunting at fine scale is not possible
- Look for regions in the probability space with high density
- Even for binary variables, there are 2^k (e.g., 2^{1,000}) possible 1,0-tuples; efficient search algorithms are a must

Simplification
- Assume binary variables
  - If they are not, force them to be binary
  - Instead of 6 different education levels, just 2 (college and above, or below)
- Change of variables
  - Initially (X1, …, Xp)
  - Each with (S1, …, Sp) possible values
  - K = S1 + … + Sp
  - Create K binary variables Zk: 1 if the corresponding variable Xi assumes value Sij, 0 otherwise

Apriori Algorithm
- Threshold t
- 1st pass: single-variable sets must have occurrence larger than t
- 2nd pass: pair-wise variable sets together must have occurrence larger than t
- …
- m-th pass: only those tuples from the (m-1)-th pass with probability higher than t are considered
- To avoid combinatorial explosion, t cannot be too low

Tuples to Rules
- Tuples {Zk} to A => B
  - A: antecedent
  - B: consequent
- T(A=>B): support, the probability of simultaneously observing A and B, P(A&B)
- C(A=>B) = T(A=>B)/T(A): confidence, the probability P(B|A)
- L(A=>B) = C(A=>B)/T(B): lift, the ratio P(A&B)/(P(A)P(B))

Examples
- K = {peanut butter, jelly, bread}
- {peanut butter, jelly} => bread
- Support of 0.03: {peanut butter, jelly, bread} appears in 3% of sample baskets
- Confidence of 82%: if peanut butter and jelly are purchased, then 82% of the time bread is also purchased
- Lift of 1.9: if bread appears in 43% of sample baskets, then 0.82/0.43 = 1.9
