UW-Madison ECE 539 - Learning BlackJack with ANN

Learning BlackJack with ANN (Artificial Neural Network)
ECE 539 Final Project
Ip Kei Sam
[email protected]
ID: 90128281001

Abstract

Blackjack is a card game in which the player attempts to beat the dealer by holding a hand with a total value higher than the dealer's but no greater than 21. The probabilistic nature of Blackjack makes it an illustrative application for learning algorithms: the learning system must explore different strategies in search of those with a higher probability of winning. This project applies an Artificial Neural Network (ANN) to learning Blackjack strategies, using a reinforcement learning algorithm. Reinforcement learning is a process of mapping situations to actions when the learner is not told which actions to take but must discover, by trial and error, which actions yield the highest reward. The trained ANN will be used to play Blackjack without being explicitly taught the rules of the game. Furthermore, the efficiency of the ANN and the results of the learning algorithm will be investigated and interpreted as different strategies for playing Blackjack and other games of chance.

Background

Starting with two cards dealt to each player, the object of Blackjack is to draw cards from a deck of 52 cards toward a total value of 21. The player can choose from the following actions:

- Stand: stay with the current hand and take no card.
- Hit: add a card to the hand to bring the total value closer to 21.
- Double Down: while holding 2 cards, the player can double his bet by hitting exactly one more card and standing after that.
- Split: holding a pair of cards with the same value, the player can split his hand into two hands. The player may split up to 3 times in a game.

For simplicity, Double Down and Split are not considered in this project. The value of a hand is the sum of the values of its cards, where each card from 2 to 10 is valued at its face value, and J, Q, and K are valued at 10.
Aces can be either 1 or 11. Each player plays against the dealer, and the goal is to obtain a hand with a greater value than the dealer's hand but less than or equal to 21. A player may hit as many times as he wishes as long as his total does not exceed 21. The player can also win by holding 5 cards with a total of less than 21. The dealer hits while his hand is worth less than 17 and stands at 17 or greater. When the player is dealt 21 points in his first 2 cards, he automatically wins his bet unless the dealer is also dealt 21. If the dealer has blackjack (21 points), the game is over and the dealer wins all bets, tying with any player who also has blackjack. Figures 1 and 2 show a demo of the Matlab Blackjack program (blackjack.m). The player presses the Hit or Stand button each turn, and the dealer's and player's point totals are shown at the bottom, along with the balance remaining in the player's account and the bet placed for the current game. In this example, the player bet $0. The program exits when the player's balance falls below zero.

Figure 1: the initial state of the game; the first 2 cards are dealt to the player and to the dealer.

To measure the efficiency of the rules in Blackjack, I simulated the program playing 1000 games against the dealer, where the dealer follows the 17-point rule. Efficiency can be observed from the winning percentage and the percentage of drawn games. The comparison of the player's random moves versus the dealer's 17-point rule is shown in Figure 2a.

Figure 2: as the player chooses to stand, the dealer chooses to hit but goes bust. The player wins.

Strategy                   Win %   Tie %
Player's random moves      31%     8%
Dealer's 17-point rule     61%     8%

Figure 2a: Efficiency of different strategies in Blackjack.
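The hand valuation just described (2 through 10 at face value, J/Q/K as 10, and an ace as 1 or 11) can be sketched as follows. The project's code is in Matlab; this Python version, with an illustrative function name, is only a sketch of the rule:

```python
def hand_value(cards):
    """Best Blackjack value of a hand.

    `cards` holds integer values: 2-10 at face value, J/Q/K passed
    in as 10, and each ace passed in as 1.  One ace is counted as 11
    whenever doing so does not push the hand over 21.
    """
    total = sum(cards)
    # Promote a single ace from 1 to 11 if the hand stays at 21 or below.
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total
```

For example, an ace with a ten-valued card gives a blackjack, hand_value([1, 10]) == 21, while two aces with a ten give hand_value([1, 1, 10]) == 12, since counting either ace as 11 would bust the hand.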
Each strategy above was played for 1000 games. One of the goals of this project is to develop a strategy with the ANN that beats the dealer's 17-point rule, that is, a new strategy with a higher winning percentage. Different MLP configurations and different preprocessing of the input training data sets will also be experimented with later in this paper. Finally, some Blackjack strategies interpreted from the experimental results will be explored.

Applying Reinforcement Learning to Blackjack

Reinforcement learning is a process of mapping situations to actions so that the reward value is maximized. The learning algorithm decides which actions to take by finding, through trial and error, the actions that yield the highest reward. The actions taken affect not only the immediate reward but also the subsequent rewards. In Blackjack, given a set of dealer's and player's cards, if the winning probability of each outcome is known, the player can always make the best decision (hit or stand) by taking the action that yields the highest winning probability in the next state. For each final state, the winning probability is either 1 (if the player wins or draws) or 0 (if the player loses). In this project, the initial winning probability of each intermediate state is set to 0.5, and the learning parameter α is also initialized to 0.5. The winning probabilities are updated for the dealer and the player after each game: the winning probability of the previous state moves closer to that of the current state according to

P(s) = P(s) + α[P(s') − P(s)],

where α is the learning parameter, s is the current state, and s' is the next state. For example, Figure 3 shows the first 3 rows of the result table output when the Matlab program genlookuptable.m is simulated to play against the dealer with random decisions.
 2   5   0   0   0    6   6   0   0   0   0.37   1   0
 2   5   0   0   0    4   6   6   0   0   0.25   1   0
 2   5  10   0   0    4   6   6   7   0   0      1   1

Figure 3: the result table (lpxx) from the Matlab program output after one game.

The first 5 columns represent the dealer's cards and the next 5 columns represent the player's cards; by the game rules, the dealer and the player can each hold a maximum of 5 cards. The card values in each hand are sorted in ascending order before they are inserted into the table. Column 11 is the winning probability of each state. Columns 12 and 13 represent the action taken by the player, where [1 0]
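The bookkeeping described above, 13-column state rows plus the per-game probability update P(s) = P(s) + α[P(s') − P(s)], can be sketched as follows. This is an illustrative Python translation of what the Matlab programs are described as doing; the function names, the dictionary representation of the table, and the backward order of the per-game backup are assumptions rather than details taken from the report:

```python
def make_row(dealer_cards, player_cards, prob, action):
    """Build one 13-column row of the lookup table: dealer's cards
    sorted ascending and zero-padded to 5, player's cards in the same
    layout, the state's winning probability, and the 2-element action
    code from columns 12-13."""
    def pad(cards):
        return sorted(cards) + [0] * (5 - len(cards))
    return pad(dealer_cards) + pad(player_cards) + [prob] + list(action)

def update_probabilities(P, trajectory, alpha=0.5):
    """Back up P(s) <- P(s) + alpha*[P(s') - P(s)] along one game.

    P maps a state key to its estimated winning probability (0.5 for
    states not yet seen); `trajectory` lists the visited states in
    order, ending in a terminal state whose probability is fixed to
    1 (win/draw) or 0 (loss).  Iterating backward lets the terminal
    value propagate through the whole game in a single pass.
    """
    for i in range(len(trajectory) - 2, -1, -1):
        s, s_next = trajectory[i], trajectory[i + 1]
        p = P.get(s, 0.5)
        P[s] = p + alpha * (P.get(s_next, 0.5) - p)
    return P
```

As a check against Figure 3, make_row([5, 2], [6, 6], 0.37, [1, 0]) reproduces its first row, and with α = 0.5 a state one step before a winning terminal state moves from 0.5 to 0.75, exactly as the update equation prescribes.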

