Parameter Learning in MN
10-708 – Carlos Guestrin 2006-2008

Outline
•CRF
•Learning CRFs for 2-d image segmentation
•IPF parameter sharing revisited

Log-linear Markov network (most common representation)
•A feature φ[D] is some function over a subset of variables D
–e.g., an indicator function
•Log-linear model over a Markov network H:
–a set of features φ1[D1], …, φk[Dk]
•each Di is a subset of a clique in H
•two φ's can be over the same variables
–a set of weights w1, …, wk
•usually learned from data

$$P(X) = \frac{1}{Z}\exp\left(\sum_i w_i\,\phi_i[D_i]\right)$$

Generative v. Discriminative classifiers – A review
•Want to learn h : X → Y
–X – features
–Y – target classes
•Bayes optimal classifier – P(Y|X)
•Generative classifier, e.g., Naïve Bayes:
–Assume some functional form for P(X|Y), P(Y)
–Estimate the parameters of P(X|Y), P(Y) directly from training data
–Use Bayes rule to calculate P(Y|X=x)
–This is a 'generative' model
•Indirect computation of P(Y|X) through Bayes rule
•But it can generate a sample of the data: $P(X) = \sum_y P(y)\,P(X \mid y)$
•Discriminative classifiers, e.g., Logistic Regression:
–Assume some functional form for P(Y|X)
–Estimate the parameters of P(Y|X) directly from training data
–This is the 'discriminative' model
•Directly learn P(Y|X)
•But it cannot generate a sample of the data, because P(X) is not available

Log-linear CRFs (most common representation)
•Graph H: only over the hidden vars Y1, …, YP
–No assumptions about the dependency on the observed vars X
–You must always observe all of X
•A feature φ[D] is some function over a subset of variables D
–e.g., an indicator function
•Log-linear model over a CRF H:
–a set of features φ1[D1], …, φk[Dk]
•each Di is a subset of a clique in H
•two φ's can be over the same variables
–a set of weights w1, …, wk
•usually learned from data

$$P(Y \mid X) = \frac{1}{Z(X)}\exp\left(\sum_i w_i\,\phi_i[D_i, X]\right)$$

Example: Image Segmentation
•Hidden vars: a 3×3 grid of segmentation labels y1, …, y9 (f = foreground, b = background); observed vars: the pixel colors x1, …, x9
•We will define the features as follows:
–φ(xi, yi): measures the compatibility of a node's color and its segmentation label
–a set of indicator features, one triggered for each edge labeling pair in {ff, bb, fb, bf}
–this is allowed since we can define many features over the same subset of variables
•Node features (GMM = Gaussian mixture model over pixel color):

$$\phi(x_i, y_i) = \begin{cases}\log P(x_i \mid GMM_b) & \text{if } y_i = b\\ \log P(x_i \mid GMM_f) & \text{if } y_i = f\end{cases}$$

•Edge features:

$$\phi_{ff}(y_i, y_j) = \begin{cases}1 & y_i = f,\ y_j = f\\ 0 & \text{otherwise}\end{cases}\qquad \phi_{fb}(y_i, y_j) = \begin{cases}1 & y_i = f,\ y_j = b\\ 0 & \text{otherwise}\end{cases}$$

$$\phi_{bf}(y_i, y_j) = \begin{cases}1 & y_i = b,\ y_j = f\\ 0 & \text{otherwise}\end{cases}\qquad \phi_{bb}(y_i, y_j) = \begin{cases}1 & y_i = b,\ y_j = b\\ 0 & \text{otherwise}\end{cases}$$

•Sharing one weight wm per edge-labeling type across all edges gives

$$P(Y \mid X) \propto \exp\left(\sum_{i \in V} \phi_i(x_i, y_i) + \sum_{(i,j) \in E}\ \sum_{m \in \{ff,fb,bf,bb\}} w_m\,\phi_m(y_i, y_j)\right)$$

•Now we just need to sum these features; with the counts

$$C_m = \sum_{(i,j) \in E} \phi_m(y_i, y_j) = \sum_{(i,j) \in E} \mathbf{1}(y_i y_j = m)$$

this simplifies to

$$P(Y \mid X) \propto \exp\left(\sum_{i \in V} \phi_i(x_i, y_i) + \sum_{m \in \{ff,fb,bf,bb\}} w_m\,C_m\right)$$

•We need to learn the parameters wm

Example: Image Segmentation – learning
•Given N data points (images and their segmentations), the gradient of the log-likelihood is

$$\frac{\partial\,\ell(\mathbf{w} : \text{Data})}{\partial w_m} = \sum_{n=1}^{N}\Big(C_m[n] - E_{\mathbf{w}}\big[C_m \mid X[n]\big]\Big)$$

–Cm[n] is the count for feature m in data point n
–computing the expectation E[Cm | X[n]] requires inference using the current parameter estimates

Example: Inference for Learning
•How to compute E[Cfb | X[n]]? By linearity of expectation,

$$E_{\mathbf{w}}\big[C_{fb} \mid X[n]\big] = E_{\mathbf{w}}\Big[\sum_{(i,j) \in E} \mathbf{1}(y_i = f,\ y_j = b)\ \Big|\ X[n]\Big] = \sum_{(i,j) \in E} E_{\mathbf{w}}\big[\mathbf{1}(y_i = f,\ y_j = b) \mid X[n]\big] = \sum_{(i,j) \in E} P_{\mathbf{w}}\big(y_i = f,\ y_j = b \mid X[n]\big)$$

•So the expected counts only require the pairwise (edge) marginals under the current parameters (see the sketch below).
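A minimal sketch of the gradient step above, assuming a grid small enough (3×3, so 2^9 assignments) that E[Cm | X] can be computed by brute-force enumeration rather than real inference. The names `node_logpot` (standing in for the log P(xi | GMM) node features) and the random inputs are illustrative, not from the lecture.

```python
import itertools
import numpy as np

LABELS = (0, 1)                        # 0 = background (b), 1 = foreground (f)
EDGE_TYPES = ["bb", "bf", "fb", "ff"]  # indexed by 2*y_i + y_j

def grid_edges(rows, cols):
    """4-connected grid edges as (i, j) pairs over flattened node indices."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols: edges.append((i, i + 1))
            if r + 1 < rows: edges.append((i, i + cols))
    return edges

def edge_counts(y, edges):
    """C_m: number of edges whose labeling pair equals m, for each m."""
    counts = np.zeros(4)
    for i, j in edges:
        counts[2 * y[i] + y[j]] += 1   # maps (y_i, y_j) to bb/bf/fb/ff
    return counts

def log_score(y, node_logpot, edges, w):
    """Unnormalized log P(y | x): node features plus w . C(y)."""
    node_term = sum(node_logpot[i, y[i]] for i in range(len(y)))
    return node_term + w @ edge_counts(y, edges)

def expected_counts(node_logpot, edges, w, n_nodes):
    """E_w[C | X] by exhaustive enumeration (exact inference on a tiny grid)."""
    ys = list(itertools.product(LABELS, repeat=n_nodes))
    logps = np.array([log_score(y, node_logpot, edges, w) for y in ys])
    p = np.exp(logps - logps.max())
    p /= p.sum()                       # posterior over all 512 labelings
    return sum(pi * edge_counts(y, edges) for pi, y in zip(p, ys))

# One gradient-ascent step on a single (x, y) pair (the lecture sums over N):
rng = np.random.default_rng(0)
n_nodes, edges = 9, grid_edges(3, 3)
node_logpot = rng.normal(size=(n_nodes, 2))   # stand-in for GMM log-likelihoods
y_obs = rng.integers(0, 2, size=n_nodes)      # stand-in ground-truth segmentation
w = np.zeros(4)

grad = edge_counts(y_obs, edges) - expected_counts(node_logpot, edges, w, n_nodes)
w += 0.1 * grad        # step along dl/dw_m = C_m[n] - E_w[C_m | X[n]]
```

On a real image the enumeration is intractable, which is why the lecture stresses that each gradient step needs an inference routine (e.g., one that returns edge marginals) run with the current weights.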
Representation Equivalence
•The log-linear representation

$$P(Y \mid X) \propto \exp\left(\sum_{i \in V} \phi_i(x_i, y_i) + \sum_{m \in \{ff,fb,bf,bb\}} w_m\,C_m\right)$$

is equivalent to the tabular MN representation from HW4:

$$P(Y \mid X) \propto \prod_{i \in V} \phi_i(x_i, y_i)\ \prod_{(i,j) \in E} \phi(y_i, y_j)$$

•For the node potentials, take φi(xi, yi) = P(xi | GMM_{yi}); then

$$\prod_{i \in V} P(x_i \mid GMM_{y_i}) = \prod_{i \in V} \exp\big(\log P(x_i \mid GMM_{y_i})\big) = \exp\left(\sum_{i \in V} \phi(x_i, y_i)\right)$$

which matches the node term of the log-linear form.
•Now do it over the edge potential: write the table as a product with indicator exponents,

$$\phi(y_i, y_j) = \phi_{ff}^{\mathbf{1}(y_i y_j = ff)}\ \phi_{fb}^{\mathbf{1}(y_i y_j = fb)}\ \phi_{bf}^{\mathbf{1}(y_i y_j = bf)}\ \phi_{bb}^{\mathbf{1}(y_i y_j = bb)} = \prod_{m \in \{ff,fb,bf,bb\}} \phi_m^{\mathbf{1}(y_i y_j = m)}$$

•This is correct since for every assignment to yi yj we select exactly one value from the table.
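To make the selection concrete, here is the identity worked out for one assignment: for yi = f, yj = b only the fb indicator fires, so

$$\phi(y_i{=}f,\ y_j{=}b) = \phi_{ff}^{0}\,\phi_{fb}^{1}\,\phi_{bf}^{0}\,\phi_{bb}^{0} = \phi_{fb}$$

and taking logs of the general product recovers exactly the log-linear edge term with weights $w_m = \log \phi_m$:

$$\log \phi(y_i, y_j) = \sum_{m \in \{ff,fb,bf,bb\}} \mathbf{1}(y_i y_j = m)\,\log \phi_m$$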
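As a sanity check, a minimal Python sketch (with made-up potential values) confirming that a tabular edge potential and the log-linear weights wm = log φm assign the same unnormalized score to every edge labeling:

```python
import numpy as np

phi = {"bb": 2.0, "bf": 0.5, "fb": 0.5, "ff": 2.0}  # tabular edge potential, HW4 style
w = {m: np.log(v) for m, v in phi.items()}          # log-linear weights w_m = log(phi_m)

for yi in "bf":
    for yj in "bf":
        m = yi + yj
        tabular = phi[m]
        # log-linear: exp(sum_m w_m * 1(y_i y_j = m)); exactly one indicator fires
        loglinear = np.exp(sum(w[k] * (k == m) for k in phi))
        assert np.isclose(tabular, loglinear)
print("tabular and log-linear edge potentials agree on all 4 labelings")
```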