Introduction to Modeling
6.872/HST950

Why Build Models?
• To predict (identify) something
  • Diagnosis
  • Best therapy
  • Prognosis
  • Cost
• To understand something
  • Structure of the model may correspond to structure of reality

Where Do Models Come From?
• Pure induction from data
  • Even so, need some "space" of models to explore
• A-priori knowledge, expressed in
  • Structure of the space of models
  • Adjustments to observed data
• Maximum A-posteriori Probability (MAP)
• Maximum Likelihood (ML)
  • Assumes uniform priors over all hypotheses in the space

An Example (Russell & Norvig)
• Surprise Candy Corp. makes two flavors of candy: cherry and lime
• Both flavors come in the same opaque wrapper
• Candy is sold in large bags, which have one of the following distributions of flavors but are visually indistinguishable:
  • h1: 100% cherry
  • h2: 75% cherry, 25% lime
  • h3: 50% cherry, 50% lime
  • h4: 25% cherry, 75% lime
  • h5: 100% lime
• Relative prevalence of these types of bags is (.1, .2, .4, .2, .1)
• As we eat our way through a bag of candy, predict the flavor of the next piece; the prediction is actually a probability distribution

Bayesian Learning
• Calculate the probability of each hypothesis given the data d:
  P(hi | d) = α P(d | hi) P(hi)
• To predict the probability distribution over an unknown quantity X:
  P(X | d) = Σi P(X | hi) P(hi | d)
• If the observations d = d1, ..., dk are independent, then
  P(d | hi) = Πj P(dj | hi)
• E.g., suppose the first 10 candies we taste are all lime

Learning Hypotheses and Predicting from Them
• [Figure: (a) posterior probability of each hypothesis h1–h5 after k lime candies; (b) probability that the next candy is lime. Image by MIT OpenCourseWare.]
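The Bayesian update above can be sketched directly from the numbers on these slides (the priors (.1, .2, .4, .2, .1) and the five flavor distributions); the function names are my own:

```python
# Posterior update for the Surprise Candy example, using the slide's numbers.
# p_lime[i] = P(lime | h_{i+1}); priors[i] = P(h_{i+1}).
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]
priors = [0.1, 0.2, 0.4, 0.2, 0.1]

def posteriors_after_limes(k):
    """P(hi | d) after observing k lime candies in a row (Bayes' rule)."""
    unnorm = [p * (q ** k) for p, q in zip(priors, p_lime)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def prob_next_lime(k):
    """Full Bayesian prediction: P(next = lime | d) = sum_i P(lime | hi) P(hi | d)."""
    return sum(q * p for q, p in zip(p_lime, posteriors_after_limes(k)))
```

For example, after 3 limes h5 has the largest posterior (about 0.42), yet the Bayesian prediction that the next candy is lime is about 0.80: the gap between predicting from the single most probable hypothesis and averaging over all of them.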
MAP Prediction: Predict Just from the Most Probable Hypothesis
• After 3 limes, h5 is the most probable hypothesis, hence we predict lime
• Even though, by (b), the next candy is only 80% probable to be lime
• [Figure: two plots against the number of samples in d, from 0 to 10: (a) posterior probability of each hypothesis, P(h1 | d) through P(h5 | d); (b) probability that the next candy is lime]

Observations
• The Bayesian approach asks for prior probabilities on hypotheses!
• Natural way to encode bias against complex hypotheses: make their prior probability very low
• Choosing hMAP to maximize
  P(d | h) P(h)
  is equivalent to minimizing
  -log2 P(d | h) - log2 P(h)
  and since entropy is a measure of information, these two terms are
  • # of bits needed to describe the data given the hypothesis
  • # of bits needed to specify the hypothesis
• Thus, MAP learning chooses the hypothesis that maximizes compression of the data: the Minimum Description Length (MDL) principle
• Regularization is similar to the 2nd term: a penalty for complexity
• Assuming uniform priors on hypotheses makes MAP yield hML, the maximum likelihood hypothesis, which maximizes P(d | h)

Learning More Complex Hypotheses
• Input:
  • A set of cases, each of which includes numerous features: categorical labels, ordinals, continuous values
    • these correspond to the independent variables
• Output: for each case, a result, prediction, classification, etc., corresponding to the dependent variable
  • In regression problems, a continuous output
    • a designated feature the model tries to predict
  • In classification problems, a discrete output
    • the category to which the case is assigned
• Task: learn a function f(input) = output that minimizes some measure of error

Linear Regression
• General form of the function:
  y = β0 + β1 x1 + β2 x2 + ... + βn xn
• For each case j, the model predicts ŷj = β0 + Σi βi xij
• Find the βi to minimize some function of the errors (yj - ŷj) over all cases j
  • e.g., mean squared error: (1/m) Σj (yj - ŷj)²

Logistic Regression
• Logistic function:
  p = 1 / (1 + e^-(β0 + Σi βi xi))
• E.g., how risk factors contribute to the probability of death
• The βi are the log odds ratios
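A minimal sketch of the two regression forms just described, with illustrative data and hypothetical function names (the closed-form least-squares fit shown applies to the single-feature case):

```python
# Least-squares linear fit (one feature) and the logistic function.
# The data below is illustrative, not from the slides.
from math import exp

def fit_simple_linear(xs, ys):
    """Find (b0, b1) minimizing mean squared error of y ~ b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def logistic(b0, betas, xs):
    """p = 1 / (1 + e^-(b0 + sum_i beta_i * x_i)); the betas are log odds ratios."""
    z = b0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + exp(-z))

b0, b1 = fit_simple_linear([1, 2, 3, 4], [3.1, 5.0, 6.9, 9.1])  # roughly y = 1 + 2x
```

With more than one feature the same MSE criterion is minimized, but the closed form involves solving the normal equations rather than the simple slope formula above.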
More Sophisticated Models
• Nearest Neighbor Methods
• Classification Trees
• Artificial Neural Nets
• Support Vector Machines
• Bayes Networks (much on this, later)
• Rough Sets, Fuzzy Sets, etc. (see 6.873/HST951 or other ML classes)

How?
• Given: a pile of training data, all cases labeled with the gold-standard outcome
• Learn the "best" model
• Gather new test data, also all labeled with outcomes
• Test the performance of the model on the new test data
• Simple, no?

Simplest Example
• Relationship between a diagnostic conclusion and a diagnostic test:

                    Test Positive     Test Negative
  Disease Present   True Positive     False Negative   TP+FN
  Disease Absent    False Positive    True Negative    FP+TN
                    TP+FP             FN+TN

Definitions
• Sensitivity (true positive rate): TP/(TP+FN)
  • False negative rate: 1 - Sensitivity = FN/(TP+FN)
• Specificity (true negative rate): TN/(FP+TN)
  • False positive rate: 1 - Specificity = FP/(FP+TN)
• Positive Predictive Value (PPV): TP/(TP+FP)
• Negative Predictive Value (NPV): TN/(FN+TN)

Test Thresholds
• [Figures: overlapping distributions of test values for the diseased (+) and non-diseased (-) populations, with a decision threshold T; the overlap regions are the FP and FN. A wonderful test has almost no overlap.]
• Changing the test threshold T trades off sensitivity against specificity

Receiver Operator Characteristic (ROC) Curve
• [Figure: TPR (sensitivity) on the y-axis vs. FPR (1 - specificity) on the x-axis, each from 0 to 1, traced out as the threshold T varies]
• What makes a better test? [Figure: a diagonal ROC curve is worthless, a moderately bowed curve is OK, and a curve hugging the upper-left corner is superb]

Need to Explore Many Models
• Remember:
  • training set => model
  • model + test set => measure of performance
• But:
  • How do we choose the best family of models?
  • How do we choose the important features?
  • Models may have structural parameters
    • Number of hidden units in an ANN
    • Max number of parents in a Bayes Net
  • Parameters (like the betas in LR), and meta-parameters
• Not legitimate to "try all" and report the best!

The Lady Tasting Tea
• R.A. Fisher & the Lady
• Muriel Bristol claimed she preferred tea added to milk rather than milk added to tea
• Fisher was skeptical that she could distinguish the two
• Possible resolutions:
  • Reason about the chemistry of tea and milk
    • Milk first: a little tea interacts with a lot of milk
    • Tea first: vice versa
  • Perform a "clinical trial"
    • Ask her to determine the order for a series of test cups
    • Calculate the probability that her answers could have occurred by chance guessing; if that probability is small, she "wins"
    • ... Fisher's Exact Test
• Significance testing
  • Reject the null hypothesis (that it happened by chance) if its probability is < 0.1, 0.05, 0.01, 0.001, ..., 0.000001, ..., ????

How to Deal with Multiple Testing
• Suppose Ms. Bristol had tried this test 100 times, and passed once. Would you be convinced of her ability to
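The chance calculation and the multiple-testing trap can both be sketched numerically. I assume the classic design attributed to Fisher, 8 cups with 4 of each preparation, of which the lady must identify the 4 milk-first cups; the slides only say "a series of test cups":

```python
# Fisher's tea test: probability of a perfect score by pure guessing, and the
# probability of at least one chance "pass" across many repeated attempts.
# The 8-cup / 4-milk-first design is an assumption, not stated on the slides.
from math import comb

def p_perfect_by_chance(n_milk=4, n_tea=4):
    """A guesser picks which n_milk of the cups are milk-first: 1 / C(n, n_milk)."""
    return 1 / comb(n_milk + n_tea, n_milk)

def p_any_pass(n_trials, p_single):
    """P(at least one pass in n_trials independent tries by chance alone)."""
    return 1 - (1 - p_single) ** n_trials
```

A single perfect score has chance probability 1/70, about 0.014, below the usual 0.05 threshold; but over 100 independent attempts the chance of at least one such pass is about 0.76, which is why one pass out of 100 tries should not be convincing.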