##
This **preview** shows page *1-2-3-4*
out of 13 **pages**.

*View Full Document*

End of preview. Want to read all 13 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document**Unformatted text preview:**

G1CDG2Review1 February 26, 2020COSC 4368: Artificial Intelligence1) Best first Search and A* [10]a) Best-First-Search (using function h only) [3]b) A* (using f=g+h)[4]Review1 February 26, 2020COSC 4368: Artificial Intelligence1) Best first Search and A* [10]Consider the search space below, where S is the start node and G1 and G2 satisfy the goaltest. Arcs are labeled with the cost of traversing them and the estimated cost to a goal (the h function itself) is reported inside nodes.For each of the following search strategies, indicate which goal state is reached (if any) and list, in order, all the states popped off of the OPEN list. When all else is equal, nodes should be removed from OPEN in alphabetical order.a) Best-First-Search (using function h only) [3]Goal state reached: G2 [1]States popped off OPEN: S, E, G2 [2]b) A* (using f=g+h)[4]Goal state reached: G1 [1]States popped off OPEN: S, A, B, G1 [3]c) Assume you have 2 admissible heuristics h1(s) and h2(s) are given for a given seach problem. Is h3(s)=min(h1(s),h2(s)) also admissible? Would you prefer using h2 or using h3 in conjuction with A*? Give reasons for your anwers[4]. Yes, h3 is admissible. If h1 and h2 always underestimate the “true” cost thenthe lesser of the two will certainly underestimate the true cost as well; therefore, h3 is admissible.27121529 B 5A 90AAG1 0 E8 C8D4G2 0S23841451I will prefer h2, because h2 is always greater equal than h3 and therefore it provides a closer approximation of the true cost. As a matter of fact, h2 dominates h3, which translates into equal or better efficiency of the search, as discussed on the bottom of page 106 of our textbook.2) Local Search a) Assume you apply randomized hill climbing to a minimization involving a continuous, differentiable function that has 3 minima. Will it always find the optimal solution? Give reasons for your answer! [3]2No, HC might climb down the wrong minimum depending on the chosen starting pointb) What is the “main” difference between simulated annealing and randomized hill climbing? [2]3… SA does allow downward steps…c) Assume you apply a version of depth first search that checks for repeated states on the current path1, but which does not use a depth bound to the 8-puzzle. Will it always find a solution if a solution exists (assuming that there are enough computational resources)? [6]1 This version backtracks, if a loop in the current path is encountered.4Yes!Because the search-tree for the 8-puzzle is finite and because the algorithm checks for loops in the current path the algorithm will sooner or later stop moving forward, backtrack, and find the solution eventually.d) When does A* terminate? Be precise! 5When a goal node in the open list is expanded Wrong: when a goal node is appearing on the open list3) Neural Network Basics a) How do neural networks compute the value/activation of a node? [2]No answer given! b) What is neural network learning all about? What is it trying to learn? Be precise! No answer given! Will ask only very basic questions about NN in Midterm1! 64. Reinforcement LearningConsider the following World called ABC is given:Give the Bellman equations for states 1 and 4 of the ABC world! [3]U(1)= 5 + *U(4) [1]U(4)= 3 + *max (U(2)*0.3+ U(3)*0.1+U(5)*0.6, U(1)*0.4+U(5)*0.6) [2]No partial credit! In general: U (s) = R(s) + γ*maxaΣs’ T(s,a,s’)*U (s’) 7b) Now we apply temporal difference learning, assuming the agent starts in state 2 and applies the operator sequence w-y(ending up in state 2)-w; assume the initial utilities are0; what are the new utilities; also assume =0.5 and =0.5)?In summary, excuting w-y-w the agents visits 2-4-2-4General Update Formula: U(s):=(1-)*U(s)+ *(R(s)+ *U(s’))U(2)=0+ 0.5(-2+0))=-1 because R(2)=-2 and U(4)=0U(4)=0+0.5*(3+0.5*-1)=1.25 because R(4)=3 and U(2)=-1U(2)=0.5*-1 + 0.5*(-2+0.5*1.25)=-0.5+0.5*-1.375=-1.1875 because R(2)=-2 U(4)=1.25Remark; if would have been 1, U(2) would be greater than 1!5) RL in general and SARSA/Q-Learning a) Assume you have a policy that always selects the action that leads to the state with the highest expected utility. Present arguments that this is usually not a good policy by describing scenarios in which this policy leads to suboptimal behavior of the agent!8Not suitable for unknown worlds due its lack of exploration Not suitable for changing worlds due to its lack of exploration Other answers might deserve credit. b) Assume the following world is given:Assume the agent is currently in state 2 and her policy always applies action b in every state. We are using SARSA to update the Q-Table. How does the updated Q-Table look like after the agent has applied action b the fourth time assuming when SARSA is used? Assume that the learning rate and the discount rate are both 0.5. Do not only report the updated value, but also give the formulas for the four Q-table updates. Assume that the q-table entries are initially set to 0. Q(a,s) Q(a,s) + α [ R(s) + γ*Q(a’,s’) - Q(a,s) ]Q(b,2)= 0 + 0.5*(2 + 0 – 0)=1Q(b,1)= 0+ 0.5*0=0Q(b,3)=0+ 0.5*(-1+0.5*1-0)= -0.25Q(b,2)=1+ 0.5*(2+1*0-1)=1.5A problem like this will not be on Midterm1 (but likely on Midterm2)!!c) How does SARSA differ from Q-learning?SARSA uses the chosen action in successor state based on the employed policy in its q-table update when estimating the “utilities of the future”, whereas Q-learning uses the optimal action that leads to the best successor state—independent of the employed policy.9d) What role does the learning rate play in temporal difference learning; how does running temporal difference learning with low values of differ from running it with high values of ? [2]It determines how quickly our current beliefs/estimations are updated based on new evidence. e) Assume you run temporal difference learning with high values of —what are the implications of doing that? [2]10If is high the agent will more focus on its long term wellbeing, and will shy away from taking actions—although they lead to immediate rewards—that will lead to the medium and long term suffering of the agent. 6. Comparision of Seach AlgorithmsCompare Traditional Hill Climbing/Randomized Hill Climbing and Best-first Search! What are the main differences between the two approaches?Let n be the size of the search space, the number of nodes in the search tree b the branching factor, the number of

View Full Document