A reimplementation of AlphaGo in Go (specifically AlphaZero)
gorgonia/agogo
The algorithm is composed of:
- a Monte-Carlo Tree Search (MCTS) implemented in the `mcts` package;
- a Dual Neural Network (DNN) implemented in the `dualnet` package.
The algorithm is wrapped into a top-level structure (`AZ` for AlphaZero). The algorithm applies to any game able to fulfill a specified contract.
The contract specifies the description of a game state.
In this package, the contract is a Go interface declared in the `game` package: `State`.
In the `agogo` package, each player of the game is an `Agent`, and in a `game`, two `Agents` are playing in an `Arena`.
The `game` package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (and not what a game is). The behavior is expressed as a set of functions that operate on a `State` of the game. A `State` is an interface that represents the current game state as well as the allowed interactions. An interaction is made by a `Player` who performs a `PlayerMove`. The implementer's responsibility is to code the game's rules by creating an object that fulfills the `State` contract and implements the allowed moves.
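As an illustration of what fulfilling such a contract looks like, the sketch below defines a deliberately reduced, hypothetical interface (`MiniState`) and a small grid type that satisfies it. The `Player` and `PlayerMove` types are simplified stand-ins for the types of the same name in the `game` package, and the real `State` interface declares more methods than the four shown here.

```go
package main

import "fmt"

// Player and PlayerMove are simplified stand-ins for the types of the same
// name in the game package; the real definitions may differ.
type Player int

const (
	None Player = iota
	Cross
	Nought
)

type PlayerMove struct {
	Player Player
	Single int // index of the targeted cell
}

// MiniState is a hypothetical, heavily reduced contract used only for
// illustration. It is not the State interface of the game package.
type MiniState interface {
	Board() []Player
	ToMove() Player
	Check(m PlayerMove) bool
	Apply(m PlayerMove) MiniState
}

// grid is a board on which two players alternately claim empty cells.
type grid struct {
	cells []Player
	next  Player
}

func newGrid(size int) *grid {
	return &grid{cells: make([]Player, size), next: Cross}
}

func (g *grid) Board() []Player { return g.cells }
func (g *grid) ToMove() Player  { return g.next }

// Check reports whether m is legal: right player, in range, empty cell.
func (g *grid) Check(m PlayerMove) bool {
	return m.Player == g.next &&
		m.Single >= 0 && m.Single < len(g.cells) &&
		g.cells[m.Single] == None
}

// Apply plays m if it is legal and hands the turn to the other player.
// Illegal moves leave the state unchanged.
func (g *grid) Apply(m PlayerMove) MiniState {
	if !g.Check(m) {
		return g
	}
	g.cells[m.Single] = m.Player
	if g.next == Cross {
		g.next = Nought
	} else {
		g.next = Cross
	}
	return g
}

func main() {
	var s MiniState = newGrid(9)
	s = s.Apply(PlayerMove{Player: Cross, Single: 4})
	fmt.Println(s.Board()[4] == Cross, s.ToMove() == Nought) // prints true true
}
```

Keeping the rules behind an interface is what lets a search algorithm stay ignorant of the game it is playing: it only asks whose turn it is, which moves are legal, and what state results from a move.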
This package is designed to be extensible. Therefore, you can train AlphaZero on any board game that respects the contract of the `game` package. Then, the model can be saved and used as a player.
The steps to train the algorithm are:
- Creating a structure that fulfills the `State` interface (aka a *game*).
- Creating a *configuration* for your AZ internal MCTS and NN.
- Creating an `AZ` structure based on the *game* and the *configuration*.
- Executing the learning process (by calling the `Learn` method).
- Saving the trained model (by calling the `Save` method).
The steps to play against the algorithm are:
- Creating an `AZ` object.
- Loading the trained model (by calling the `Read` method).
- Switching the agent to inference mode via the `SwitchToInference` method.
- Getting the AI move by calling the `Search` method and applying the move to the game manually.
Four board games are implemented so far. Each of them is defined as a subpackage of `game`:
Tic-tac-toe is an m,n,k game where m=n=k=3.
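In an m,n,k game, two players alternately place stones on an m×n board, and the first to line up k of their own stones in a row, column, or diagonal wins. The following sketch shows one way to test that winning condition. The function name, the row-major board layout, and the reading of m as the number of columns are assumptions made for this example; it is not taken from the `mnk` subpackage.

```go
package main

import "fmt"

// hasRun reports whether player p owns k consecutive cells in a row, a
// column, or a diagonal of a board with m columns and n rows.
// The board is stored row by row; 0 marks an empty cell, so p must be
// non-zero. This helper is hypothetical and not part of the mnk package.
func hasRun(board []int, m, n, k, p int) bool {
	// Directions as (dx, dy): right, down, down-right, up-right.
	dirs := [][2]int{{1, 0}, {0, 1}, {1, 1}, {1, -1}}
	for y := 0; y < n; y++ {
		for x := 0; x < m; x++ {
			for _, d := range dirs {
				count := 0
				cx, cy := x, y
				for cx >= 0 && cx < m && cy >= 0 && cy < n && board[cy*m+cx] == p {
					count++
					if count == k {
						return true
					}
					cx += d[0]
					cy += d[1]
				}
			}
		}
	}
	return false
}

func main() {
	// Tic-tac-toe: m = n = k = 3. Player 1 holds the main diagonal.
	board := []int{
		1, 2, 0,
		2, 1, 0,
		0, 0, 1,
	}
	fmt.Println(hasRun(board, 3, 3, 3, 1)) // prints true
	fmt.Println(hasRun(board, 3, 3, 3, 2)) // prints false
}
```

The same function covers other m,n,k games by changing only the three parameters.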
Here is sample code that trains AlphaGo to play the game. The result is saved in the file `example.model`.
```go
package main

// The import paths are inferred from the package names used in this README.
import (
	"log"
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func main() {
	// Create the configuration of the neural network
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}
	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process
	err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episodes, 100 NN iters, 100 games.
	if err != nil {
		log.Println(err)
	}
	// Save the model
	a.Save("example.model")
}
```
Here is sample code that loads the saved model and plays against it:

```go
package main

// The import paths are inferred from the package names used in this README.
import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	a.Load("example.model")
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)

	// Put X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// What to do next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}
```
A Funny Thing Happened On The Way To Reimplementing AlphaGo - A talk by @chewxy (one of the authors) about this specific implementation
Original implementation credits to