References / Headers

Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, pages 237-285, May 1996.

NE: named entity recognition
CO: coreference resolution
TE: template element construction
TR: template relation construction
ST: scenario template production

[Figure: training (labels: Language Input, Answers, Trainer, Model) and decoding (labels: Language Input, Decoder, Answers)]

[Figure: named-entity extraction system (labels: Training Program, training sentences, answers, NE Models, Speech, Speech Recognition, Text, Extractor, Entities: Locations, Persons, Organizations)]

[Figure: header-extraction model with states such as author, title, institution, note; sample emitted phrases include "ICML 1997", "submission to", "to appear in", "carnegie mellon university", "university of california", "dartmouth college", "stochastic optimization", "reinforcement learning", "model building", "mobile robot", "supported in part", "copyright"]
Trained on 2 million words of BibTeX data from the Web.

Given an observation sequence and a model, compute the probability of the observation sequence:

$$O = (o_1, \dots, o_T), \qquad \mu = (A, B, \Pi), \qquad \text{compute } P(O \mid \mu)$$

Here $a_{ij} = P(x_{t+1} = j \mid x_t = i)$, $b_{ik} = P(o_t = k \mid x_t = i)$ and $\pi_i = P(x_1 = i)$. For a fixed state sequence $X = (x_1, \dots, x_T)$:

$$P(O \mid X, \mu) = b_{x_1 o_1}\, b_{x_2 o_2} \cdots b_{x_T o_T}$$

$$P(X \mid \mu) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3} \cdots a_{x_{T-1} x_T}$$

$$P(O, X \mid \mu) = P(O \mid X, \mu)\, P(X \mid \mu)$$

$$P(O \mid \mu) = \sum_X P(O \mid X, \mu)\, P(X \mid \mu)$$
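As a sanity check on the decomposition $P(O \mid \mu) = \sum_X P(O \mid X, \mu)\, P(X \mid \mu)$, the following Python sketch enumerates every state sequence directly. The two-state, two-symbol model (`PI`, `A`, `B`) and the helper names are hypothetical, chosen only for illustration; states and symbols are 0-based indices. The sum visits all $N^T$ state sequences, so this is only practical for very short observation sequences.

```python
from itertools import product

# Hypothetical toy model: N = 2 states, 2 observation symbols (0 and 1).
PI = [0.6, 0.4]                 # pi_i = P(x_1 = i)
A = [[0.7, 0.3],                # a_ij = P(x_{t+1} = j | x_t = i)
     [0.4, 0.6]]
B = [[0.9, 0.1],                # b_ik = P(o_t = k | x_t = i)
     [0.2, 0.8]]


def prob_obs_given_states(obs, states, B):
    """P(O | X, mu) = b_{x_1 o_1} * ... * b_{x_T o_T}."""
    p = 1.0
    for x, o in zip(states, obs):
        p *= B[x][o]
    return p


def prob_states(states, PI, A):
    """P(X | mu) = pi_{x_1} * a_{x_1 x_2} * ... * a_{x_{T-1} x_T}."""
    p = PI[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= A[prev][nxt]
    return p


def brute_force_likelihood(obs, PI, A, B):
    """P(O | mu) as a sum over all N**T state sequences of P(O | X) P(X)."""
    n = len(PI)
    return sum(
        prob_obs_given_states(obs, xs, B) * prob_states(xs, PI, A)
        for xs in product(range(n), repeat=len(obs))
    )


print(brute_force_likelihood([0, 1, 0], PI, A, B))
```

For a single observation the sum reduces to $\sum_i \pi_i b_{i o_1}$; with these toy numbers and $o_1 = 0$ that is $0.6 \cdot 0.9 + 0.4 \cdot 0.2 = 0.62$.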
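$$P(O \mid \mu) = \sum_{x_1 \cdots x_T} \pi_{x_1}\, b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}}\, b_{x_{t+1} o_{t+1}}$$

Summed directly, this touches all $N^T$ state sequences. The special structure of the model gives us an efficient solution using dynamic programming. Intuition: the probability of the first $t$ observations, ending in state $i$ at time $t$, is the same for every state sequence that extends that prefix, so it can be computed once and reused.

Forward Procedure. Define

$$\alpha_i(t) = P(o_1 \cdots o_t,\ x_t = i \mid \mu), \qquad \alpha_i(1) = \pi_i\, b_{i o_1}$$

Then

$$\begin{aligned}
\alpha_j(t+1) &= P(o_1 \cdots o_{t+1},\ x_{t+1} = j) \\
&= P(o_1 \cdots o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j) \\
&= P(o_1 \cdots o_t \mid x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j) \\
&= P(o_1 \cdots o_t,\ x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} P(o_1 \cdots o_t,\ x_t = i,\ x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} P(o_1 \cdots o_t,\ x_{t+1} = j \mid x_t = i)\, P(x_t = i)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} P(o_1 \cdots o_t,\ x_t = i)\, P(x_{t+1} = j \mid x_t = i)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}
\end{aligned}$$

Backward Procedure. The probability of the remaining observations given the state at time $t$:

$$\beta_i(T) = 1, \qquad \beta_i(t) = P(o_{t+1} \cdots o_T \mid x_t = i) = \sum_{j=1}^{N} a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)$$

Combination. Any of the following gives the same result:

$$P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T) \qquad \text{(forward procedure)}$$

$$P(O \mid \mu) = \sum_{i=1}^{N} \pi_i\, b_{i o_1}\, \beta_i(1) \qquad \text{(backward procedure)}$$

$$P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(t)\, \beta_i(t) \qquad \text{for any } 1 \le t \le T \text{ (combination)}$$

Best state sequence. Find the state sequence that best explains the observations:

$$\hat{X} = \arg\max_X P(X \mid O)$$

Since $P(O)$ does not depend on $X$, this is the same as maximizing $P(X, O)$. Viterbi algorithm: define

$$\delta_j(t) = \max_{x_1 \cdots x_{t-1}} P(x_1 \cdots x_{t-1},\ o_1 \cdots o_{t-1},\ x_t = j,\ o_t)$$

the probability of the most likely state sequence that accounts for the first $t-1$ observations, lands in state $j$ at time $t$, and emits $o_t$. Recursive computation, starting from $\delta_j(1) = \pi_j\, b_{j o_1}$:

$$\delta_j(t+1) = \max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}, \qquad \psi_j(t+1) = \arg\max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$$

Compute the most likely state sequence by working backwards:

$$\hat{X}_T = \arg\max_{i} \delta_i(T), \qquad \hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1), \qquad P(\hat{X}, O \mid \mu) = \max_{i} \delta_i(T)$$

Parameter estimation. Given an observation sequence, find the model that is most likely to produce that sequence. There is no analytic method, so instead: given a model and an observation sequence, update the model parameters to better fit the observations.

Probability of traversing the arc from state $i$ to state $j$ at time $t$:

$$p_t(i, j) = \frac{\alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t)\, \beta_m(t)}$$

Probability of being in state $i$ at time $t$:

$$\gamma_i(t) = \sum_{j=1}^{N} p_t(i, j)$$

Re-estimated transition probabilities:

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}$$

…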
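The forward and backward recursions can be sketched in a few lines of Python. This sketch assumes the backward convention $\beta_i(T) = 1$, $\beta_i(t) = P(o_{t+1} \cdots o_T \mid x_t = i)$, under which $\sum_i \alpha_i(t)\, \beta_i(t) = P(O \mid \mu)$ holds for every $t$. The toy parameters and function names are hypothetical, and lists are 0-based, so `alpha[t][i]` holds $\alpha_i(t+1)$. Each recursion costs $O(N^2 T)$ operations instead of enumerating all $N^T$ state sequences.

```python
# Forward and backward procedures for a discrete HMM (pure-Python sketch).
# Hypothetical toy model: 2 states, 2 observation symbols, 0-based indices.
PI = [0.6, 0.4]                      # pi_i = P(x_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]         # a_ij = P(x_{t+1} = j | x_t = i)
B = [[0.9, 0.1], [0.2, 0.8]]         # b_ik = P(o_t = k | x_t = i)


def forward(obs, PI, A, B):
    """alpha[t][i] = alpha_i(t+1) = P(o_1 .. o_{t+1}, x_{t+1} = i | mu), 1-based on the right."""
    n = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(n)]]          # alpha_i(1) = pi_i b_{i o_1}
    for o in obs[1:]:
        prev = alpha[-1]
        # alpha_j(t+1) = sum_i alpha_i(t) a_ij b_{j o_{t+1}}
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = beta_i(t+1): probability of the observations after
    position t+1 (1-based), given that the state at that position is i."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]                         # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_{j o_{t+1}} beta_j(t+1)
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


obs = [0, 1, 0]
alpha = forward(obs, PI, A, B)
beta = backward(obs, A, B)
n, T = len(PI), len(obs)

p_forward = sum(alpha[T - 1])                                          # sum_i alpha_i(T)
p_backward = sum(PI[i] * B[i][obs[0]] * beta[0][i] for i in range(n))  # sum_i pi_i b_{i o_1} beta_i(1)
p_combined = [sum(alpha[t][i] * beta[t][i] for i in range(n)) for t in range(T)]
print(p_forward, p_backward, p_combined)
```

All three expressions for $P(O \mid \mu)$ agree; for `obs = [0, 1, 0]` and these toy parameters they equal 0.10893 up to floating-point rounding.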
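A minimal Python sketch of the Viterbi recursion and backtrace, using a hypothetical two-state toy model with 0-based states and symbols. Because $b_{j o_{t+1}}$ does not depend on $i$, the back-pointer $\psi_j(t+1)$ only needs the argmax of $\delta_i(t)\, a_{ij}$. The returned probability is the joint $P(\hat{X}, O \mid \mu)$; dividing by $P(O \mid \mu)$ would give $P(\hat{X} \mid O)$, but the maximizing path is the same either way.

```python
# Viterbi decoding for a discrete HMM (pure-Python sketch).
# Hypothetical toy model: 2 states, 2 observation symbols, 0-based indices.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]


def viterbi(obs, PI, A, B):
    """Return (best_path, prob) where prob = max over X of P(X, O | mu)."""
    n = len(PI)
    delta = [PI[j] * B[j][obs[0]] for j in range(n)]   # delta_j(1) = pi_j b_{j o_1}
    psi = []                                           # psi[t][j]: best predecessor of j
    for o in obs[1:]:
        step_delta, step_psi = [], []
        for j in range(n):
            # psi_j(t+1) = argmax_i delta_i(t) a_ij  (b_{j o_{t+1}} is constant in i)
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step_psi.append(best_i)
            # delta_j(t+1) = max_i delta_i(t) a_ij b_{j o_{t+1}}
            step_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        delta = step_delta
        psi.append(step_psi)
    # Work backwards: X_T = argmax_i delta_i(T), then X_t = psi_{X_{t+1}}(t+1).
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi([0, 1, 0], PI, A, B)
print(path, prob)
```

For `obs = [0, 1, 0]` this returns the path `[0, 1, 0]` with probability $0.6 \cdot 0.9 \cdot 0.3 \cdot 0.8 \cdot 0.4 \cdot 0.9 = 0.046656$.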
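One re-estimation step for the transition matrix, built from the arc probabilities $p_t(i, j)$ and state probabilities $\gamma_i(t)$, can be sketched as below. This is the transition update of the Baum-Welch (forward-backward) algorithm, an instance of expectation maximization. The sketch is a hypothetical illustration: it only re-estimates $A$ and holds $\Pi$ and $B$ fixed, assumes the backward convention $\beta_i(T) = 1$, $\beta_i(t) = P(o_{t+1} \cdots o_T \mid x_t = i)$, uses toy parameters, and assumes every state has nonzero posterior probability (otherwise a denominator would be zero).

```python
# One transition re-estimation step (Baum-Welch style), pure-Python sketch.
# Hypothetical toy model: 2 states, 2 observation symbols, 0-based indices.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]


def forward(obs, PI, A, B):
    n = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(n)]]           # alpha_i(1)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]                          # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


def reestimate_transitions(obs, PI, A, B):
    """a_hat_ij = sum_t p_t(i, j) / sum_t gamma_i(t), with t = 1 .. T-1 (1-based)."""
    n, T = len(PI), len(obs)
    alpha, beta = forward(obs, PI, A, B), backward(obs, A, B)
    num = [[0.0] * n for _ in range(n)]
    den = [0.0] * n
    for t in range(T - 1):
        norm = sum(alpha[t][m] * beta[t][m] for m in range(n))    # = P(O | mu)
        for i in range(n):
            for j in range(n):
                # p_t(i, j): probability of traversing the arc i -> j at time t
                p = alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / norm
                num[i][j] += p
                den[i] += p                                       # gamma_i(t) = sum_j p_t(i, j)
    return [[num[i][j] / den[i] for j in range(n)] for i in range(n)]


obs = [0, 1, 0, 0, 1]
A_new = reestimate_transitions(obs, PI, A, B)
before = sum(forward(obs, PI, A, B)[-1])
after = sum(forward(obs, PI, A_new, B)[-1])
print(A_new, before, after)
```

Because this is an EM step with $\Pi$ and $B$ held fixed, the likelihood of the training sequence should not decrease (`after >= before`); iterating the step until the likelihood stops improving is the usual training loop.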