| Type: | Package |
| Title: | Tic-Tac-Toe Game |
| Version: | 0.2.2 |
| Description: | Implements tic-tac-toe game to play on console, either with human or AI players. Various levels of AI players are trained through the Q-learning algorithm. |
| License: | MIT + file LICENSE |
| LazyData: | TRUE |
| RoxygenNote: | 6.0.1 |
| Depends: | R (≥ 2.10) |
| Imports: | hash, stats |
| Suggests: | testthat, combiter, dplyr, tidyr, reshape2, ggplot2 |
| URL: | https://github.com/kota7/tictactoe |
| BugReports: | https://github.com/kota7/tictactoe/issues |
| NeedsCompilation: | no |
| Packaged: | 2017-05-26 14:15:36 UTC; kota |
| Author: | Kota Mori [aut, cre] |
| Maintainer: | Kota Mori <kmori05@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2017-05-26 15:33:31 UTC |
Equivalent States
Description
Returns a set of equivalent states and actions
Usage
equivalent_states(state)
equivalent_states_actions(state, action)
Arguments
state | state, 3x3 matrix |
action | integer vector of indices (1 to 9) |
Value
equivalent_states returns a list of state matrices
equivalent_states_actions returns a list of two lists: states, the set of equivalent states, and actions, the set of equivalent actions
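A minimal sketch of calling these functions follows; the mark encoding (0 for an empty cell, 1 and 2 for the two players) is an assumption made for illustration only.
s <- matrix(0L, nrow = 3, ncol = 3)
s[1, 1] <- 1L                              # a hypothetical first move
equivalent_states(s)                       # rotations and reflections of s
equivalent_states_actions(s, action = 5L)  # matching moves for position 5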
Hash Operations for Single State
Description
Hash Operations for Single State
Usage
haskey(x, ...)
## S3 method for class 'xhash'
x[state, ...]
## S3 replacement method for class 'xhash'
x[state, ...] <- value
## S3 method for class 'xhash'
haskey(x, state, ...)
Arguments
x | object |
... | additional arguments to determine the key |
state | state object |
value | value to assign |
Value
haskey returns a logical
`[` returns a value
`[<-` returns a reference to the object
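A minimal sketch of these operations; the conversion function passed to xhash below, which turns a board matrix into a character key, is an illustrative assumption.
h <- xhash(convfunc = function(state, ...) paste(state, collapse = ""))
s <- matrix(0L, nrow = 3, ncol = 3)
haskey(h, s)   # FALSE: nothing stored yet
h[s] <- 10     # store a value under the key derived from s
h[s]           # retrieve the stored value
haskey(h, s)   # TRUE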
Play Tic-Tac-Toe Game
Description
Start tic-tac-toe game on the console.
Usage
ttt(player1 = ttt_human(), player2 = ttt_human(), sleep = 0.5)
Arguments
player1, player2 | player objects, such as those created by ttt_human() or ttt_ai() |
sleep | interval, in seconds, that an AI player waits before making its move |
Details
By default, the game is played between humans. Set player1 or player2 to ttt_ai() to play against an AI player. The strength of the AI can be adjusted by passing the level argument (0, weakest, to 5, strongest) to the ttt_ai function.
To input your move, type the position such as "a1". Only a two-character string consisting of a letter and a digit is accepted. Type "exit" to finish the game.
You may set both player1 and player2 as AI players. In this case, the game transition is displayed on the console without human input, as in the sketch below. For conducting large-scale simulations of games between AIs, refer to ttt_simulate.
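For instance, the variations described above can be started as follows (the levels shown are arbitrary):
## Not run: 
ttt(ttt_human(), ttt_ai(level = 3))                   # human vs. AI
ttt(ttt_ai(level = 2), ttt_ai(level = 2), sleep = 1)  # AI vs. AI
## End(Not run)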
See Also
ttt_simulate
Examples
## Not run: 
ttt(ttt_human(), ttt_random())
## End(Not run)
Tic-Tac-Toe AI Player
Description
Create an AI tic-tac-toe game player
Usage
ttt_ai(name = "ttt AI", level = 0L)
ttt_random(name = "random AI")
Arguments
name | player name |
level | AI strength; must be an integer from 0 (weakest) to 5 (strongest) |
Details
The level argument controls the strength of the AI, from 0 (weakest) to 5 (strongest). ttt_random is an alias for ttt_ai(level = 0).
A ttt_ai object has a getmove function, which takes a ttt_game object and returns a move considered optimal, chosen according to the player's policy function.
The object has value and policy functions. The value function maps a game state to its evaluation from the first player's viewpoint. The policy function maps a game state to a set of moves that are optimal in light of that evaluation. Both functions are trained through Q-learning.
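A minimal sketch of querying a player for a move; calling getmove as a member function of the player object is inferred from the description above, and the exact return format is an assumption.
p <- ttt_ai(level = 2L)
g <- ttt_game()
g$play(5)      # first player takes the center
p$getmove(g)   # the move the AI considers optimal in the current state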
Value
ttt_ai object
Tic-Tac-Toe Game
Description
Object that encapsulates a tic-tac-toe game.
Usage
ttt_game()
Value
ttt_game object
Examples
x <- ttt_game()
x$play(3)
x$play(5)
x$show_board()
Human Tic-Tac-Toe Player
Description
Create a human tic-tac-toe player
Usage
ttt_human(name = "no name")
Arguments
name | player name |
Value
ttt_human object
Q-Learning for Training Tic-Tac-Toe AI
Description
Train a tic-tac-toe AI through Q-learning
Usage
ttt_qlearn(player, N = 1000L, epsilon = 0.1, alpha = 0.8, gamma = 0.99,
  simulate = TRUE, sim_every = 250L, N_sim = 1000L, verbose = TRUE)
Arguments
player | AI player to train |
N | number of episodes, i.e., training games |
epsilon | fraction of random exploration moves |
alpha | learning rate |
gamma | discount factor |
simulate | if true, conduct simulation during training |
sim_every | conduct simulation after this many training games |
N_sim | number of simulation games |
verbose | if true, progress report is shown |
Details
This function implements Q-learning to train a tic-tac-toe AI player. It is designed to train one AI player, which plays against itself to update its value and policy functions.
The employed algorithm is Q-learning with epsilon-greedy exploration. For each state s, the player updates its value evaluation by
V(s) = (1 - \alpha) V(s) + \alpha \gamma \max_{s'} V(s')
if it is the first player's turn. If it is the other player's turn, replace max by min. Note that s' spans all possible states reachable from s. The policy function is updated analogously, that is, to the set of actions that reach the s' maximizing V(s'). The parameter \alpha controls the learning rate, and \gamma is the discount factor (an earlier win is better than a later one).
The player then chooses the next action by the \epsilon-greedy method: it follows its policy with probability 1-\epsilon, and chooses a random action with probability \epsilon. Hence \epsilon controls the ratio of explorative moves.
At the end of a game, the player sets the value of the final state to 100 (if the first player wins), -100 (if the second player wins), or 0 (if a draw).
This learning process is repeated for N training games. When simulate is set to TRUE, a simulation is conducted after every sim_every training games. This is useful for observing the progress of training; in general, as the AI gets smarter, games tend to end in a draw more often.
See Sutton and Barto (1998) for more about Q-learning.
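The value update above can be written schematically as follows; this is an illustrative sketch of the rule, not the package's internal code, and the names value_of and reachable_states are hypothetical.
update_value <- function(value_of, s, reachable_states,
                         alpha = 0.8, gamma = 0.99, first_turn = TRUE) {
  # evaluate every state s' reachable from s
  v_next <- vapply(reachable_states, value_of, numeric(1))
  # the first player maximizes, the second player minimizes
  best <- if (first_turn) max(v_next) else min(v_next)
  (1 - alpha) * value_of(s) + alpha * gamma * best
}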
Value
data.frame of simulation outcomes, if any
References
Sutton, Richard S and Barto, Andrew G. Reinforcement Learning: An Introduction. The MIT Press (1998)
Examples
p <- ttt_ai()
o <- ttt_qlearn(p, N = 200)
Simulate Tic-Tac-Toe Games between AIs
Description
Simulate Tic-Tac-Toe Games between AIs
Usage
ttt_simulate(player1, player2 = player1, N = 1000L, verbose = TRUE,
  showboard = FALSE, pauseif = integer(0))
Arguments
player1, player2 | AI players to simulate |
N | number of simulation games |
verbose | if true, show progress report |
showboard | if true, game transition is displayed |
pauseif | pause the simulation when the specified results occur; this can be useful for exploratory purposes (see the sketch below) |
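A sketch of using pauseif; the outcome code 1 is assumed here, purely for illustration, to denote a win by the first player.
## Not run: 
res <- ttt_simulate(ttt_ai(level = 4), ttt_ai(level = 4),
                    N = 100, pauseif = 1)
## End(Not run)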
Value
integer vector of simulation outcomes
Examples
res <- ttt_simulate(ttt_ai(), ttt_ai())
prop.table(table(res))
Vectorized Hash Operations
Description
Vectorized Hash Operations
Usage
haskeys(x, ...)
setvalues(x, ...)
getvalues(x, ...)
## S3 method for class 'xhash'
getvalues(x, states, ...)
## S3 method for class 'xhash'
setvalues(x, states, values, ...)
## S3 method for class 'xhash'
haskeys(x, states, ...)
Arguments
x | object |
... | additional arguments to determine the keys |
states | state object |
values | values to assign |
Value
haskeys returns a logical vector
setvalues returns a reference to the object
getvalues returns a list of values
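A minimal sketch of the vectorized operations; the conversion function and the use of short strings as states are illustrative assumptions.
h <- xhash(convfunc = function(state, ...) paste(state, collapse = ""))
states <- list("a1", "b2", "c3")
setvalues(h, states, values = c(1, 2, 3))
getvalues(h, states)          # list of the stored values
haskeys(h, list("a1", "z9"))  # TRUE FALSE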
Create Hash Table for Generic Keys
Description
Create Hash Table for Generic Keys
Usage
xhash(convfunc = function(state, ...) state,
  convfunc_vec = function(states, ...) unlist(Map(convfunc, states, ...)),
  default_value = NULL)
Arguments
convfunc | function that converts a game state to a key; it must take a positional argument state and may take further arguments through ... |
convfunc_vec | function for vectorized conversion from states to keys; it must take a positional argument states and may take further arguments through ... |
default_value | value to be returned when a state is not recorded in the table |
Value
xhash object
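A minimal sketch of creating and using a table; the conversion function and default value below are arbitrary choices for illustration.
h <- xhash(convfunc = function(state, ...) paste(state, collapse = ""),
           default_value = -Inf)
s <- matrix(0L, nrow = 3, ncol = 3)
h[s]       # -Inf, since s has not been recorded yet
h[s] <- 0
h[s]       # now returns 0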