6.01, Fall Semester, 2007—Lecture 11 Notes

MASSACHVSETTS INSTITVTE OF TECHNOLOGY
Department of Electrical Engineering and Computer Science
6.01—Introduction to EECS I
Fall Semester, 2007
Lecture 11 Notes

Long-term decision-making and search

Long-term decision-making

So far in this course, we have taken a couple of different approaches to action selection. In the utility-functions section, we considered a set of possible next actions, evaluated the utility of the results, and chose the one with the best outcome. When we were studying control loops, we didn't have the program explicitly consider alternative courses of action; instead we, as designers, made models of the controller's behavior in the world and tried to prove whether it would behave the way we wanted it to, taking a longer-term pattern of behavior into account.

Often, we will want a system to generate complex long-term patterns of behavior, but we will not be able to write a simple control rule to generate those behavior patterns. In that case, we'd like the system to evaluate alternatives for itself; but instead of evaluating single actions, it will have to evaluate whole sequences of actions, deciding whether they're a good thing to do given the current state of the world.

Think of the problem of navigating through a city, given a road map, and knowing where we are. We usually can't decide whether to turn left at the next intersection without deciding on a whole path.

As always, the first step in the process will be to come up with a formal model of the real-world problem that abstracts away the irrelevant detail. So what, exactly, is a path? The car we're driving will actually follow a trajectory through continuous space(time), but if we tried to plan at that level of detail we would fail miserably. Why? First, because the space of possible trajectories through two-dimensional space is just too enormous.
Second, because when we're trying to decide which roads to take, we don't have the information about where the other cars will be on those roads, which will end up having a huge effect on the detailed trajectory we take.

So we can divide the problem into two levels: planning in advance which turns we'll make at which intersections, but deciding dynamically exactly how to control the steering wheel and the gas to best move from intersection to intersection, given the current circumstances (other cars, stop lights, etc.).

We can make an abstraction of the driving problem that includes road intersections and the way they're connected by roads. Then, given a start and a goal intersection, we could consider all possible paths between them and choose the best one.

What criteria might we use to evaluate a path? There are all sorts of reasonable ones: distance, time, gas mileage, traffic-related frustration, scenery, etc. Generally speaking, the approach outlined below can be extended to any criterion that is additive: that is, your happiness with the whole path is the sum of your happiness with each of its segments. For now, though, we'll adopt the simple criterion of wanting to find a path with the fewest "steps" from intersection to intersection.

[Figure 1: Map of small city 1]

Now, one algorithm for deciding on the best path through a map of the road intersections in our (very small) world (shown in figure 1) would be to enumerate all the paths, evaluate each one according to our criterion, and then return the best one. The problem is that there are lots of paths.
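The enumerate-and-evaluate idea can be sketched directly. This is a minimal illustration, not the notes' own code: the names `CITY` and `all_paths` are hypothetical, and the adjacency map is transcribed from the successor dictionary for this city given later in these notes.

```python
# Brute-force route finding: enumerate every path from start to goal
# that never revisits an intersection, then keep the one with the
# fewest steps.  CITY is the adjacency map of the small city in
# figure 1; the names here are illustrative.
CITY = {'S': ['A', 'B'], 'A': ['S', 'C', 'D'], 'B': ['S', 'D', 'E'],
        'C': ['A', 'F'], 'D': ['A', 'B', 'F', 'H'], 'E': ['B', 'H'],
        'F': ['C', 'D', 'G'], 'H': ['D', 'E', 'G'], 'G': ['F', 'H']}

def all_paths(graph, state, goal, path=()):
    """Yield every path from state to goal that never revisits a node."""
    path = path + (state,)
    if state == goal:
        yield path
        return
    for nxt in graph[state]:
        if nxt not in path:              # skip paths that loop back
            yield from all_paths(graph, nxt, goal, path)

# Additive criterion: fewest steps, i.e. shortest path.
best = min(all_paths(CITY, 'S', 'G'), key=len)
```

Because the criterion (number of steps) is additive over segments, taking the minimum over complete paths is well defined; the trouble is the sheer number of paths to enumerate.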
Even in our little domain, with 9 intersections, there are 210 paths from the intersection labeled S to the one labeled G. We can get a much better handle on this problem by formulating it as an instance of a graph search (or "state-space search") problem, for which there are simple algorithms that perform well.

State-space search

We'll model a state-space search problem formally as consisting of

• a set (possibly infinite) of states the system can be in
• a starting state, which is an element of the set of states
• a goal test, which is a function that can be applied to any state, and returns True if that state can serve as a goal[1]
• a successor function, which is a function that can be applied to a state, yielding the set of states that can be reached from the original state with one primitive step

[1] Although in many cases we have a particular goal state (such as the intersection in front of my house), in other cases we may have the goal of going to any gas station, which can be satisfied by many different intersections.

The decision about what constitutes a primitive step is a modeling decision. It could be to drive to the next intersection, or to drive a meter, or a variety of other things, depending on the domain. The only requirement is that it terminate in a well-defined next state (and that, when it is time to execute the plan, we will know how to execute the primitive step).

We can think of this model as specifying a graph (in the computer scientist's sense), in which the states are the nodes and the successor function specifies what arcs exist between the nodes.

So, for the little world in figure 1, we might make a model in which

• The set of states is the intersections {S, A, B, C, D, E, F, G, H}.
• The starting state is S.
• The goal test is

def g(x):
    return x == 'E'

• The successor function is defined using a dictionary:

succ = {'S': ['A', 'B'], 'A': ['S', 'C', 'D'], 'B': ['S', 'D', 'E'],
        'C': ['A', 'F'], 'D': ['A', 'B', 'F', 'H'], 'E': ['B', 'H'],
        'F': ['C', 'D', 'G'], 'H': ['D', 'E', 'G'], 'G': ['F', 'H']}

def successors(x):
    return succ[x]

[Figure 2: Partially expanded search tree for city 1]

We can think of this structure as defining a search tree, as shown in figure 2. It has the starting state, S, at the root node, and then each node has its successor states as children. Layer k of this tree contains all possible paths through the graph of length k.

Some of these paths are completely ridiculous. It can never be reasonable, if we're trying to find the shortest path between two states, to go back to a state we have previously visited on that same path. So we can adopt the following rule:

Pruning
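One way to combine the search tree of figure 2 with this no-revisit rule is a breadth-first search over paths. The agenda-based formulation below is a sketch under assumptions: the names `search` and `agenda` and the breadth-first ordering are illustrative choices, not necessarily what the notes develop next.

```python
# Breadth-first search over paths, pruning any successor that already
# appears on the path that reached it.  succ repeats the successor
# model above so the sketch runs on its own; the start state and goal
# test are for illustration.
succ = {'S': ['A', 'B'], 'A': ['S', 'C', 'D'], 'B': ['S', 'D', 'E'],
        'C': ['A', 'F'], 'D': ['A', 'B', 'F', 'H'], 'E': ['B', 'H'],
        'F': ['C', 'D', 'G'], 'H': ['D', 'E', 'G'], 'G': ['F', 'H']}

def search(start, goal_test, successors):
    agenda = [(start,)]               # paths waiting to be extended
    while agenda:
        path = agenda.pop(0)          # FIFO: shorter paths expand first
        if goal_test(path[-1]):
            return path
        for s in successors(path[-1]):
            if s not in path:         # pruning: no revisits on a path
                agenda.append(path + (s,))
    return None

print(search('S', lambda x: x == 'E', lambda x: succ[x]))  # → ('S', 'B', 'E')
```

Because paths leave the agenda in the order they entered it, the first path that passes the goal test is also one with the fewest steps, which is exactly the criterion adopted earlier.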

