MIT HST 950J - Introduction to Modeling

Introduction to Modeling
6.872/HST950

Why build models?
• To predict (identify) something
  • Diagnosis
  • Best therapy
  • Prognosis
  • Cost
• To understand something
  • Structure of the model may correspond to structure of reality

Where do models come from?
• Pure induction from data
  • Even so, we need some "space" of models to explore
• A-priori knowledge, expressed in
  • Structure of the space of models
  • Adjustments to observed data
• Maximum A-posteriori Probability (MAP)
• Maximum Likelihood (ML)
  • Assumes uniform priors over all hypotheses in the space

An Example (Russell & Norvig)
• Surprise Candy Corp. makes two flavors of candy: cherry and lime
• Both flavors come in the same opaque wrapper
• Candy is sold in large bags, which have one of the following distributions of flavors but are visually indistinguishable:
  • h1: 100% cherry
  • h2: 75% cherry, 25% lime
  • h3: 50% cherry, 50% lime
  • h4: 25% cherry, 75% lime
  • h5: 100% lime
• The relative prevalence of these types of bags is (.1, .2, .4, .2, .1)
• As we eat our way through a bag of candy, predict the flavor of the next piece; the prediction is actually a probability distribution

Bayesian Learning
• Calculate the probability of each hypothesis given the data:
  P(hi | d) = α P(d | hi) P(hi)
• To predict the probability distribution over an unknown quantity X:
  P(X | d) = Σi P(X | hi) P(hi | d)
• If the observations d are independent, then
  P(d | hi) = Πj P(dj | hi)
• E.g., suppose the first 10 candies we taste are all lime (a small numerical sketch of this updating follows below)

Learning Hypotheses and Predicting from Them
• [Figure: (a) posterior probabilities P(h1 | d) ... P(h5 | d) after k lime candies, plotted against the number of samples in d; (b) the probability that the next candy is lime, also against the number of samples in d. Image by MIT OpenCourseWare.]

MAP prediction: predict just from the most probable hypothesis
• After 3 limes, h5 is most probable, hence we predict lime
• Even though, by (b), lime is only 80% probable
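To make the candy example concrete, here is a minimal Python sketch of the Bayesian updating described above. The code is illustrative, not from the original slides; the priors and per-hypothesis lime probabilities are taken directly from the example, while the variable names are mine.

```python
# Bayesian updating for the Surprise Candy example (Russell & Norvig).
priors = [0.1, 0.2, 0.4, 0.2, 0.1]         # P(h1) .. P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]       # P(lime | h_i)

posterior = priors[:]                      # P(h_i | d) with no data yet
for k in range(1, 11):                     # taste 10 candies, all lime
    # Bayes' rule: weight each hypothesis by the likelihood of one more lime
    unnorm = [post * p for post, p in zip(posterior, p_lime)]
    total = sum(unnorm)
    posterior = [u / total for u in unnorm]
    # Bayes-optimal prediction: P(next = lime | d) = sum_i P(lime | h_i) P(h_i | d)
    p_next_lime = sum(post * p for post, p in zip(posterior, p_lime))
    # MAP prediction: commit to the single most probable hypothesis
    map_h = max(range(5), key=lambda i: posterior[i]) + 1
    print(f"k={k}: P(next=lime)={p_next_lime:.3f}, MAP hypothesis=h{map_h}")
```

After three limes this reproduces the numbers quoted above: h5 is the MAP hypothesis, yet the Bayes-optimal probability that the next candy is lime is only about 0.8.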
Observations
• The Bayesian approach asks for prior probabilities on hypotheses!
• A natural way to encode bias against complex hypotheses: make their prior probability very low
• Choosing hMAP to maximize P(d | h) P(h) is equivalent to minimizing
  -log2 P(d | h) - log2 P(h)
• Since entropy is a measure of information, these two terms are:
  • the # of bits needed to describe the data given the hypothesis
  • the # of bits needed to specify the hypothesis
• Thus, MAP learning chooses the hypothesis that maximizes compression of the data: the Minimum Description Length principle
• Regularization is similar to the 2nd term: a penalty for complexity
• Assuming uniform priors on hypotheses makes MAP yield hML, the maximum likelihood hypothesis, which maximizes P(d | h)

Learning More Complex Hypotheses
• Input:
  • A set of cases, each of which includes numerous features: categorical labels, ordinals, continuous values
  • These correspond to the independent variables
• Output:
  • For each case, a result, prediction, classification, etc., corresponding to the dependent variable
  • In regression problems, a continuous output: a designated feature the model tries to predict
  • In classification problems, a discrete output: the category to which the case is assigned
• Task: learn a function f(input) = output that minimizes some measure of error

Linear Regression
• General form of the function: y = β0 + β1 x1 + β2 x2 + ... + βn xn
• For each case i, the model predicts ŷi = β0 + Σj βj xij
• Find the β's that minimize some function of the errors (yi - ŷi) over all cases
• e.g., mean squared error: (1/m) Σi (yi - ŷi)²

Logistic Regression
• Logistic function: p = 1 / (1 + e^-(β0 + β1 x1 + ... + βn xn))
• E.g., how risk factors contribute to the probability of death
• The βi are the log odds ratios (a small fitting sketch appears after the ROC discussion below)

More sophisticated models
• Nearest Neighbor Methods
• Classification Trees
• Artificial Neural Nets
• Support Vector Machines
• Bayes Networks (much on this, later)
• Rough Sets, Fuzzy Sets, etc. (see 6.873/HST951 or other ML classes)

How?
• Given: a pile of training data, all cases labeled with the gold-standard outcome
• Learn the "best" model
• Gather new test data, also all labeled with outcomes
• Test the performance of the model on the new test data
• Simple, no?

Simplest Example
• Relationship between a diagnostic conclusion and a diagnostic test

                      Test Positive     Test Negative
  Disease Present     True Positive     False Negative     TP+FN
  Disease Absent      False Positive    True Negative      FP+TN
                      TP+FP             FN+TN

Definitions
• Sensitivity (true positive rate): TP/(TP+FN)
  • False negative rate: 1 - Sensitivity = FN/(TP+FN)
• Specificity (true negative rate): TN/(FP+TN)
  • False positive rate: 1 - Specificity = FP/(FP+TN)
• Positive Predictive Value (PPV): TP/(TP+FP)
• Negative Predictive Value (NPV): TN/(FN+TN)

Test Thresholds
• [Figures: distributions of test results for disease-present and disease-absent cases with a decision threshold T; the cases that fall on the wrong side of T are the FP and FN. A "wonderful test" has almost non-overlapping distributions, so almost no FP or FN. Changing the threshold changes the trade-off between sensitivity and specificity.]

Receiver Operating Characteristic (ROC) Curve
• [Figure: TPR (sensitivity) plotted against FPR (1 - specificity) as the threshold T sweeps across its range.]
• What makes a better test? [Figure: ROC curves ranging from "worthless" (the diagonal) through "OK" to "superb" (hugging the upper-left corner).]
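As a companion to the Logistic Regression section above, here is a small sketch, not from the slides, of fitting the logistic function by gradient ascent on the log-likelihood. The toy risk-factor data, learning rate, and iteration count are invented purely for illustration.

```python
import math

# Toy logistic regression: one risk factor x, binary outcome y.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   0,   1,   1,   1]

b0, b1 = 0.0, 0.0                 # betas: intercept and coefficient (log odds ratio)
rate = 0.1
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # logistic function
        g0 += (y - p)             # gradient of the log-likelihood w.r.t. b0
        g1 += (y - p) * x         # gradient of the log-likelihood w.r.t. b1
    b0 += rate * g0
    b1 += rate * g1

# exp(b1) is the odds ratio for a one-unit increase in the risk factor
print(f"b0={b0:.2f}, b1={b1:.2f}, odds ratio per unit x={math.exp(b1):.2f}")
```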
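The next sketch, also illustrative rather than from the lecture, turns the Definitions table and the ROC idea into code: confusion_metrics computes sensitivity, specificity, PPV, and NPV from the four cell counts, and roc_points traces (FPR, TPR) pairs by sweeping the threshold T over a set of scored cases. The counts, scores, and labels are made up.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic-test metrics from the Definitions above."""
    return {
        "sensitivity (TPR)": tp / (tp + fn),
        "specificity (TNR)": tn / (fp + tn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (fn + tn),
    }

def roc_points(scores, labels):
    """Sweep the threshold T over the observed scores; at each T, call a case
    'test positive' when its score >= T, and record (FPR, TPR)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))   # (FPR, TPR) at this threshold
    return points

# Invented example: 2x2 counts and a small set of scored cases.
print(confusion_metrics(tp=90, fn=10, fp=20, tn=80))
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_points(scores, labels))
```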
Need to explore many models
• Remember:
  • training set => model
  • model + test set => measure of performance
• But:
  • How do we choose the best family of models?
  • How do we choose the important features?
  • Models may have structural parameters
    • Number of hidden units in an ANN
    • Max number of parents in a Bayes Net
  • Parameters (like the betas in LR), and meta-parameters
• Not legitimate to "try all" and report the best!

The Lady Tasting Tea
• R.A. Fisher & the Lady
• B. Muriel Bristol claimed she preferred tea added to milk rather than milk added to tea
• Fisher was skeptical that she could distinguish the two
• Possible resolutions:
  • Reason about the chemistry of tea and milk
    • Milk first: a little tea interacts with a lot of milk
    • Tea first: vice versa
  • Perform a "clinical trial"
    • Ask her to determine the order of preparation for a series of test cups
    • Calculate the probability that her answers could have occurred by chance guessing; if it is small, she "wins"
    • ... Fisher's Exact Test
• Significance testing
  • Reject the null hypothesis (that the result happened by chance) if its probability is < 0.1, 0.05, 0.01, 0.001, ..., 0.000001, ..., ????

How to deal with multiple testing
• Suppose Ms. Bristol had tried this test 100 times, and passed once. Would you be convinced of her ability to distinguish?
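A sketch of the two probabilities implied above. It assumes Fisher's classic design of 8 cups, 4 prepared each way (the preview does not state the actual design used), and then illustrates the multiple-testing point: how often a pure guesser would "pass" at least once in 100 attempts.

```python
from math import comb

# Assumed design (not stated in the slides): 8 cups, 4 milk-first and
# 4 tea-first, and the lady must say which 4 are which.  Under the null
# hypothesis (pure guessing) every choice of 4 cups out of 8 is equally likely.
p_all_correct = 1 / comb(8, 4)          # probability of a perfect answer
print(f"P(all 8 cups correct by chance) = {p_all_correct:.4f}")   # ~0.014

# Multiple testing: if the whole experiment is repeated 100 times by a
# guesser, what is the chance of at least one perfect score?
p_at_least_one = 1 - (1 - p_all_correct) ** 100
print(f"P(>=1 pass in 100 tries by chance) = {p_at_least_one:.2f}")  # ~0.76
```

With roughly a 76% chance of at least one lucky pass in 100 tries, a single success out of 100 attempts is about what guessing alone would produce.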

