UW-Madison ECE 539 - Learning How to Play Black Jack through Reinforcement Learning - D1775594

Home> Schools> University of Wisconsin, Madison> Electrical and Computer Engr (ECE) > ECE 539> Learning How to Play Black Jack through Reinforcement Learning

DOC PREVIEW

UW-Madison ECE 539 - Learning How to Play Black Jack through Reinforcement Learning

School name University of Wisconsin, Madison

Course Ece 539- Introduction to Artificial Neural Network and Fuzzy Systems

Pages 3

This preview shows page 1 out of 3 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 3 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 3 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Learning How to Play Black Jack through Reinforcement Learning ECE 539 Project Report December 20, 2010 Jonathan Quenzer Department of Electrical and Computer Engineering University of Wisconsin-Madison Abstract The goal of this project was to show that it is possible for a computer to learn how to play the card game of Black Jack [1] through reinforcement learning using pattern classification tools. To demonstrate the learning, a Matlab program was developed that simulates the game of Black Jack and performs the necessary calculations to make the classification decisions. It will be shown that computer can achieve similar or better results than a human, which can obtain at most a 2% advantage on the dealer if card counting is used. Two different types of decisions were made. The first decision is how much to bet before each hand is played. The second decision is whether the computer should hit or stay. In order to reduce the complexity of the program, splitting hands and doubling down were not included. A K Nearest Neighbor (KNN) classifier [2] was used to make the decisions. The KNN classifier was chosen because there were several non-separable clusters of data in the feature vectors that were generated. 1. Introduction The Matlab program that was developed for the project had many configurable settings including the number of decks being used, the percent of cards remaining before a new shuffle, and the number of players at the table. For simplicity, all of the results in this report were obtained with the program setup for three decks, 15% of the deck remaining before reshuffling, and three players at the table. The Matlab program had two main modes of operation. It could start with previously collected feature sets, or it could start with empty feature sets and build the feature sets after every hand. When using an empty feature set, this simulates the computer starting off with no knowledge of the game and then slowly learning through reinforcement. 2. Feature Set Generation The feature sets that were used to make the decisions were all generated by the Matlab program. After every hand, the hand history was analyzed and used to create two separate feature sets. All of the features were processed to have a mean of ½, a minimum of 0, and a maximum of 1. Figure 1 shows the feature vector composition for the hit or stay decision.Figure 1: Feature vector for hit or stay decision One important decision that was made in selecting the features was to separate the number of Aces in the player’s hand from the rest of the card. This was done because an Ace can either be represented by a value of one or eleven. When the program was creating the features it would automatically add a value of one to the player total and remove an ace from the number of Aces if the Ace had to be counted as a one in order to prevent busting (having a total over 21). The outcome of Black Jack is random, so it’s very likely that for some features there will be an incorrect decision of whether to hit or stay. There were a couple things done to minimize the number of misclassifications in the feature set. The first was to combine all the features that were the same by adding their hit/stay classifications together. This would increase the probability that a certain feature vector was correct. The other thing that helped was the pattern classification tool that was used. The KNN classifier using more than one neighbor would in effect average the K nearest neighbors to the feature vector being classified. The feature set used for deciding how much to bet is shown in Figure 2. Figure 2: Feature set used to decide how much to bet. The choice for how much to bet is either to bet the minimum or bet the maximum. The goal is to bet the minimum whenever the probability of winning a hand is less than 50%, and bet the maximum when it is greater than 50%. 3. KNN Classifier The KNN classification algorithm analyzed the K nearest neighbors to the feature being classified where K is a selectable integer. An example of how I implemented the algorithm is shown in Figure 3. Figure 3: Two Dimensional KNN exampleIn this example K = 5. The five nearest neighbors sum to +3. The sign of the sum is positive, so the decision would be to hit. 4. Results Figure 4 shows the results of several simulations using various numbers of nearest neighbors as well as a simulation where the decisions were random. Figure 4: Results of 5000 hand simulations In these simulations the computer started out with an empty feature set and added to the feature set after every hand. Figure 4 shows that all of the KNN classifiers worked much better than randomly choosing. It also shows that at around hand 4000 the 10 nearest neighbor classifier achieved about a 50% probability of winning. Figure 5 shows the results of a 1000 hand simulation using 10 nearest neighbors where the computer started with a feature set generated from 5000 hands. Figure 5: Results of 1000 hand simulation In Figure 5, each color represents a different player at the same table playing at the same time. The players represented by the blue and black both ended with a positive total, so their probability of winning was greater than 50%. 5. Conclusion The results clearly show that the KNN algorithm worked and allowed the computer to learn and make good decisions on how to bet and whether to hit or stay. The results also showed that it is possible to achieve >50% winning probability if the feature set is large enough. More analysis over several thousands of hands would need to be performed to calculate the exact probability of winning for the feature sets used. References [1] http://www.blackjackinfo.com/blackjack-rules.php [2] Ludmila I Kuncheva, Lakhmi C Jain, Nearest neighbor classifier: Simultaneous editing and feature selection, Pattern Recognition Letters, Volume 20, Issues 11-13, November 1999, Pages 1149-1156, ISSN 0167-8655, DOI:

View Full Document

UW-Madison ECE 539 - Learning How to Play Black Jack through Reinforcement Learning

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1 out of 3 pages.

UW-Madison ECE 539 - Learning How to Play Black Jack through Reinforcement Learning

Sign up for free to view:

Please select your school