Review for May 6, 2019 Final Exam

1) D-Separability and Computation of Probabilities

Assume that the following belief network is given, consisting of nodes A, B, C, D, and E that can take values of true and false.

[Figure: belief network over the nodes A, B, C, D, E; the diagram labels A/0.9 and D/0.1]

a) Are C and E independent; are C| and E| d-separable? Give a reason for your answer! ("|" followed by nothing denotes "no evidence given") [2]

There are two paths from C to E:
C-B-E
C-D-E
Neither path is blocked; a path would only be blocked if both arrows were pointing to B, or both arrows were pointing to D, respectively. Consequently, C and E are not d-separable.

b) Is E|CD d-separable from A|CD? Give a reason for your answer! [3]

There are two paths between A and E:
A-C-B-E: neither does node C (in evidence) satisfy patterns 1a, 1b, or 2, nor does node B (not in evidence) satisfy pattern 3; therefore this path is not blocked.
A-C-D-E: node D (in evidence) satisfies pattern 1a; consequently, this path is blocked.
However, as not all paths between E and A are blocked, A|CD is not d-separable from E|CD.

c) Naïve Bayes

Naïve Bayesian systems make the conditional independence assumption when, for example, computing P(D|S1,S2,S3). What assumptions are exactly made? What advantages do you see in the approach? What are the drawbacks of making the conditional independence assumption?

P(D|S1,S2,S3) = P(D)*P(S1,S2,S3|D)/P(S1,S2,S3)
              = P(D) * P(S1|D)/P(S1) * P(S2|D)/P(S2) * P(S3|D)/P(S3)

We have to assume that S1|D, S2|D, and S3|D are independent, and if the exact probability of P(D|S1,S2,S3) needs to be computed, we additionally have to assume that S1, S2, and S3 are independent.
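The product formula above can be sketched in a few lines of Python (a minimal illustration, not part of the original notes; the function name and all probability values are hypothetical):

```python
# Naive Bayes posterior under the conditional independence assumption:
#   P(D|S1,...,Sn) = P(D) * product over i of P(Si|D)/P(Si)
# which additionally assumes the Si themselves are independent.

def naive_bayes_posterior(prior, likelihoods, evidence_probs):
    """prior = P(D); likelihoods = [P(Si|D), ...]; evidence_probs = [P(Si), ...]."""
    posterior = prior
    for p_si_given_d, p_si in zip(likelihoods, evidence_probs):
        posterior *= p_si_given_d / p_si
    return posterior

# Hypothetical numbers for illustration:
# P(D)=0.01; P(Si|D) = 0.9, 0.8, 0.7; P(Si) = 0.1, 0.2, 0.3
p = naive_bayes_posterior(0.01, [0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
# p is approximately 0.01 * 9 * 4 * (7/3) = 0.84
```

For pure classification (is D or D' more likely?) the divisions by the evidence_probs can be dropped, since P(S1,S2,S3) is the same for every class.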
If Naïve Bayes is just used for a classification problem, it is not necessary to know P(S1,S2,S3), as we are only interested in whether P(D|S1,S2,S3) is larger or smaller than P(D'|S1,S2,S3); consequently, the exact value of P(S1,S2,S3) is not relevant!

Advantage: low knowledge cost; it is not necessary to acquire a lot of probabilities, which might be expensive.
Disadvantage: the obtained probabilities might not be correct.

2) FOPL Logic Proofs using Resolution

(A1) "Some dolphins eat fish."
(A2) "No vegetarian eats fish."
(A3) "Every dolphin is intelligent."
(A4) "The dolphin 'Flipper' is a vegetarian."

(A) "Some intelligent beings are vegetarians."
(B) "Flipper is intelligent."
(C) "All fish eaters are intelligent."
(D) "Some dolphins are no vegetarians."

Proofs using resolution:
a) A1,A2,A3,A4 |- A
b) A1,A2,A3,A4 |- B
c) A1,A2,A3,A4 |- C
d) A1,A2,A3,A4 |- D

Axioms and negated goals in FOPL ("∃" = there exists, "∀" = for all):
(A1) ∃x (dolphin(x) ^ eatfish(x))
(A2) ~∃v (vegetarian(v) ^ eatfish(v))
(A3) ∀d (dolphin(d) --> intelligent(d))
(A4) dolphin(Flipper) ^ vegetarian(Flipper)
(~A) ~∃i (intelligent(i) ^ vegetarian(i))
(~B) ~intelligent(Flipper)
(~C) ~∀e (eatfish(e) --> intelligent(e))
(~D) ~∃a (dolphin(a) ^ ~vegetarian(a))

Clause form (D and E are Skolem constants; $x denotes a variable):
(1a) dolphin(D)
(1b) eatfish(D)
(2) ~vegetarian($v) v ~eatfish($v)
(3) ~dolphin($d) v intelligent($d)
(4a) dolphin(Flipper)
(4b) vegetarian(Flipper)
(5) ~intelligent($i) v ~vegetarian($i)
(6) ~intelligent(Flipper)
(7a) eatfish(E)
(7b) ~intelligent(E)
(8) ~dolphin($a) v vegetarian($a)

The 4 Proofs using Resolution

a) A1,A2,A3,A4 |- A
(x1) ~dolphin($d) v ~vegetarian($d)  from (5) and (3)
(x2) ~vegetarian(Flipper)  from (x1) and (4a)
(x3) empty clause  from (x2) and (4b)

b) A1,A2,A3,A4 |- B
(y1) ~dolphin(Flipper)  from (6) and (3)
(y2) empty clause  from (y1) and (4a)

c) A1,A2,A3,A4 |- C
(z1) ~vegetarian(E)  from (7a) and (2)
(z2) ~dolphin(E)  from (7b) and (3)
... (no proof found)

d) A1,A2,A3,A4 |- D
(w1) ~eatfish($v) v ~dolphin($v)  from (2) and (8)
(w2) ~eatfish(D)  from (w1) and (1a)
(w3) empty clause  from (w2) and (1b)

3) Questions

a) What does unification do?
It finds the most general match for two expressions, e.g., unify(P($x,$y,2), P($u,5,2)) returns (($y 5) ($x $u)) or (($y 5) ($u $x)), but not (($x 9) ($u 9) ($y 5)).

b) When proving A,B,C |- D, resolution conducts a proof by contradiction; what does this mean?
We assume that A, B, C, and ~D are true, and show that this leads to a contradiction by deriving the empty clause.

c) Assume P(A|B) is 0.2, P(B) is 0.7, and P(C|A,B) is 0.1; compute P(A,B,C) (this is the same as P(ABC)).
P(A,B,C) = P(B)*P(A,C|B) = P(B)*P(A|B)*P(C|A,B) = 0.7*0.2*0.1 = 0.014
(using P(A,B) = P(A)*P(B|A) = P(B)*P(A|B))

d) SVMs are often used in conjunction with kernel functions to learn to classify 2 classes based on a training set D; how is the hyperplane obtained in this case?
1) Map the training set into the kernel-induced feature space: D' = φ(D)
2) Learn a hyperplane for φ(D) in the "mapped" space.

e) Why is leave-one-out cross-validation more popular in data science contests than 10-fold cross-validation?
Leave-one-out cross-validation has less bias, as exactly the same training set/test set pairs have to be used, making it more difficult to "cheat".
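The point about "exactly the same training set/test set pairs" can be made concrete with a short pure-Python sketch (illustrative only; the function name is my own): leave-one-out produces the same n splits on every run, with nothing left to a random shuffle.

```python
def leave_one_out(n):
    """Yield (train_indices, test_index) pairs for a dataset of size n.

    The n splits are fully determined by n alone -- there is no random
    shuffling, so every contestant evaluates on identical folds.
    """
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i

folds = list(leave_one_out(3))
# folds == [([1, 2], 0), ([0, 2], 1), ([0, 1], 2)] on every run
```

By contrast, k-fold cross-validation typically shuffles the data before partitioning it, so the reported scores depend on which partition was drawn.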