A CHECKERS LEARNING PROBLEM
(from "Machine Learning" by Tom Mitchell)

Outline: Problem · Approach · Target Function · Problem Representation · Estimating Training Values · Adjusting the Weights · LMS Training · The Final Design

PROBLEM
• Task T: playing checkers
• Performance measure P: percent of games won in the world tournament
• Training experience E: games played against itself

APPROACH
Designing the learner requires choosing:
1. The exact type of knowledge to be learned
2. A representation for this target knowledge
3. A learning mechanism
The type of training experience available can have a significant impact on the success or failure of the learner.

TARGET FUNCTION
• Goal: reduce the problem of improving performance P at task T to the problem of learning some particular target function.
• Direct training experience: the correct move for individual board states is given.
• Indirect training experience: only the legal move sequences played and the final outcome (game won or lost) are available, so credit for the outcome must be traced back to individual moves.

TARGET FUNCTION
• Define an evaluation function V that assigns a numerical score to any given board state.
• V : B → ℝ denotes that V maps any legal board state from the set B to some real value (we use ℝ to denote the set of real numbers).
1. If b is a final board state that is won, then V(b) = 100
2. If b is a final board state that is lost, then V(b) = -100
3. If b is a final board state that is drawn, then V(b) = 0
4. If b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).

TARGET FUNCTION
• Rule 4 above is recursive and not efficiently computable, so an operational description of the ideal target function V is required.
• A learning algorithm is expected to acquire only some approximation to the target function; for this reason the process of learning the target function is often called function approximation.
On one hand, we wish to pick a very expressive representation, to allow representing as close an approximation as possible to the ideal target function V.
On the other hand, the more expressive the representation, the more training data the program will require in order to choose among the alternative hypotheses it can represent.

PROBLEM REPRESENTATION
A simple representation: for any given board state, the function will be calculated as a linear combination of the following board features.
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
• x6: the number of red pieces threatened by black

TARGET FUNCTION
Thus, our learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. The learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, whereas the weight w0 provides an additive constant to the board value.

ESTIMATING TRAINING VALUES
• In order to learn the target function we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b. In other words, each training example is an ordered pair of the form ⟨b, Vtrain(b)⟩.
• Rule for estimating training values: Vtrain(b) ← V̂(Successor(b)), where Successor(b) is the next board state following b at which it is again the program's turn to move.

ADJUSTING THE WEIGHTS
• One common approach is to define the best hypothesis, or set of weights, as the one that minimizes the squared error E between the training values and the values predicted by the hypothesis:
E ≡ Σ⟨b, Vtrain(b)⟩ (Vtrain(b) − V̂(b))²
where the sum runs over the training examples. Thus, we seek the weights, or equivalently the V̂, that minimize E for the observed training examples.

LMS TRAINING
The least mean squares (LMS) training rule is one of several algorithms that incrementally refine the weights. The LMS weight update rule:
• For each training example ⟨b, Vtrain(b)⟩:
  • Use the current weights to calculate V̂(b)
  • For each weight wi, update it as wi ← wi + η (Vtrain(b) − V̂(b)) xi
where η is a small constant that moderates the size of the update.

THE FINAL DESIGN
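The linear evaluation function and the LMS weight update rule can be sketched as follows. This is a minimal illustration, not code from the text: the particular feature values, the zero weight initialization, and the learning rate eta = 0.1 are assumptions chosen for the example.

```python
# Sketch of the linear checkers evaluation function and one LMS update step.
# features[0] is a constant 1 that pairs with the additive weight w0; the
# remaining entries correspond to the board features x1..x6.

def v_hat(weights, features):
    """V_hat(b) = w0 + w1*x1 + ... + w6*x6."""
    return sum(w * x for w, x in zip(weights, features))

def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: wi <- wi + eta * (Vtrain(b) - V_hat(b)) * xi."""
    error = v_train - v_hat(weights, features)
    return [w + eta * error * x for w, x in zip(weights, features)]

# Hypothetical board: 3 black pieces, 2 red pieces, 1 black king,
# no red kings, no threatened pieces on either side.
features = [1, 3, 2, 1, 0, 0, 0]
weights = [0.0] * 7                      # illustrative initialization
weights = lms_update(weights, features, v_train=100)
```

Note that each weight moves in proportion to its feature value xi, so features that were zero on this board (e.g., red kings) are left unchanged by the update.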
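The pieces above fit together in a training loop: after a self-play game, each nonfinal board gets its training value from the successor-state estimate, the final board gets the actual outcome, and LMS adjusts the weights. The sketch below assumes a game trace is already available as a list of feature vectors; the trace values and eta = 0.01 are illustrative assumptions, not from the text.

```python
# Sketch of one training episode over a recorded self-play game.
# game_features[i] is the feature vector (with leading constant 1) for the
# i-th board at which it was the learner's turn; the last entry is the
# final board, whose training value is the known outcome (+100 / -100 / 0).

def v_hat(weights, features):
    """V_hat(b) = w0 + w1*x1 + ... + w6*x6."""
    return sum(w * x for w, x in zip(weights, features))

def train_episode(weights, game_features, final_value, eta=0.01):
    """One LMS pass over the game trace, in play order."""
    weights = list(weights)
    for i, features in enumerate(game_features):
        if i == len(game_features) - 1:
            v_train = final_value                           # actual outcome
        else:
            v_train = v_hat(weights, game_features[i + 1])  # Vtrain(b) <- V_hat(Successor(b))
        error = v_train - v_hat(weights, features)
        weights = [w + eta * error * x for w, x in zip(weights, features)]
    return weights

# Hypothetical 3-state game trace ending in a win (+100).
game = [
    [1, 4, 4, 0, 0, 1, 0],
    [1, 4, 3, 0, 0, 0, 1],
    [1, 3, 0, 1, 0, 0, 0],
]
weights = train_episode([0.0] * 7, game, final_value=100)
```

With all weights initialized to zero, the successor estimates for the nonfinal boards are zero, so only the final board's outcome produces an update on this first pass; over repeated games, the outcome information propagates back through the successor estimates to earlier positions.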