gorgonia/agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

License

MIT license

agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
a Dual Neural Network (DNN) implemented in the dualnet package.

The algorithm is wrapped into a top-level structure (AZ for AlphaZero). The algorithm applies to any game able to fulfill a specified contract.
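
As a rough illustration of how the two parts cooperate, AlphaZero-style searches usually pick the next move to explore with the PUCT rule, which mixes the value estimate gathered so far with the prior probability produced by the network. The sketch below is not taken from the mcts package: the edge type and the puctScore and selectChild functions are hypothetical names, and only the formula itself is the standard one.

```go
package main

import (
	"fmt"
	"math"
)

// edge holds the per-move statistics an AlphaZero-style search keeps.
// Hypothetical type for illustration; not the mcts package's node layout.
type edge struct {
	prior    float64 // P(s,a): move probability from the network's policy output
	visits   int     // N(s,a): how many times this move was explored
	valueSum float64 // W(s,a): sum of the values backed up through this move
}

// puctScore returns Q + U, where Q = W/N is the mean value and
// U = c * P * sqrt(parentVisits) / (1 + N) is the exploration bonus.
func puctScore(e edge, parentVisits int, c float64) float64 {
	q := 0.0
	if e.visits > 0 {
		q = e.valueSum / float64(e.visits)
	}
	u := c * e.prior * math.Sqrt(float64(parentVisits)) / float64(1+e.visits)
	return q + u
}

// selectChild returns the index of the edge with the highest PUCT score,
// or -1 when there are no edges.
func selectChild(edges []edge, c float64) int {
	parentVisits := 0
	for _, e := range edges {
		parentVisits += e.visits
	}
	best, bestScore := -1, math.Inf(-1)
	for i, e := range edges {
		if s := puctScore(e, parentVisits, c); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	edges := []edge{
		{prior: 0.6, visits: 10, valueSum: 2.0},
		{prior: 0.3, visits: 1, valueSum: 0.5},
		{prior: 0.1, visits: 0, valueSum: 0},
	}
	fmt.Println("next move to explore:", selectChild(edges, 1.0))
}
```

A larger constant c favours moves with a high prior and few visits; a smaller one favours moves that have already returned good values.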

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language

In the agogo package, each player of the game is an Agent, and in a game, two Agents play in an Arena.
The game package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. An interaction is made by a Player who performs a PlayerMove. The implementer's responsibility is to code the game's rules by creating an object that fulfills the State contract and implements the allowed moves.
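
To give a feel for what "fulfilling the contract" involves, here is a minimal, self-contained sketch. The State interface shown is a reduced stand-in, not the real game.State (which declares more methods); Board, ToMove, Apply and the PlayerMove fields mirror names used in the examples of this README, while Check and the strip game are assumptions made for illustration.

```go
package main

import "fmt"

// Player and PlayerMove are simplified stand-ins for the types of the
// game package, not copies of them.
type Player int

const (
	None Player = iota
	Black
	White
)

type PlayerMove struct {
	Player Player
	Single int // index of the target cell on a flattened board
}

// State is a reduced, hypothetical version of the contract.
type State interface {
	Board() []Player
	ToMove() Player
	Check(m PlayerMove) bool  // assumed name: reports whether m is legal
	Apply(m PlayerMove) State // plays m and returns the resulting state
}

// strip is a toy game: players alternately claim empty cells on a line.
type strip struct {
	cells []Player
	next  Player
}

func newStrip(n int) *strip { return &strip{cells: make([]Player, n), next: Black} }

func (s *strip) Board() []Player { return s.cells }
func (s *strip) ToMove() Player  { return s.next }

func (s *strip) Check(m PlayerMove) bool {
	return m.Player == s.next &&
		m.Single >= 0 && m.Single < len(s.cells) &&
		s.cells[m.Single] == None
}

// Apply mutates the state in place and returns it; illegal moves are ignored.
func (s *strip) Apply(m PlayerMove) State {
	if !s.Check(m) {
		return s
	}
	s.cells[m.Single] = m.Player
	if s.next == Black {
		s.next = White
	} else {
		s.next = Black
	}
	return s
}

func main() {
	var g State = newStrip(3)
	g = g.Apply(PlayerMove{Player: Black, Single: 1})
	fmt.Println(g.Board(), "to move:", g.ToMove())
}
```

The rules live entirely in Check and Apply, so code that only depends on the interface can stay unaware of which game is being played.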

Training process

Applying the Algo on a game

This package is designed to be extensible. Therefore, you can train AlphaZero on any board game that respects the contract of the game package. The model can then be saved and used as a player.

The steps to train the algorithm are:

Creating a structure that fulfills the State interface (aka a game).
Creating a configuration for your AZ internal MCTS and NN.
Creating an AZ structure based on the game and the configuration.
Executing the learning process (by calling the Learn method).
Saving the trained model (by calling the Save method).
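
The learning step can be pictured as a loop over epochs, each made of self-play, network training and an evaluation round. The sketch below is an assumption about the general shape of such a loop, based on the parameters passed to Learn in the training example (epochs, episodes, NN iterations, arena games) and on the UpdateThreshold setting; it is not the body of agogo's Learn method, and the hooks type and learn function are hypothetical.

```go
package main

import "fmt"

// hooks bundles the three phases of one epoch. All names are hypothetical
// placeholders; they are not methods of agogo's AZ type.
type hooks struct {
	selfPlay func() int         // plays one episode, returns the examples produced
	train    func(examples int) // one training pass over the collected examples
	arena    func() bool        // one evaluation game; true if the candidate won
}

// learn runs the loop and returns how many epochs ended with the candidate
// network winning more than the threshold share of the arena games.
func learn(epochs, episodes, nnIters, arenaGames int, threshold float64, h hooks) int {
	promotions := 0
	for e := 0; e < epochs; e++ {
		examples := 0
		for i := 0; i < episodes; i++ {
			examples += h.selfPlay()
		}
		for i := 0; i < nnIters; i++ {
			h.train(examples)
		}
		wins := 0
		for i := 0; i < arenaGames; i++ {
			if h.arena() {
				wins++
			}
		}
		if arenaGames > 0 && float64(wins)/float64(arenaGames) > threshold {
			promotions++
		}
	}
	return promotions
}

func main() {
	games := 0
	h := hooks{
		selfPlay: func() int { return 9 }, // pretend each episode yields 9 examples
		train:    func(int) {},
		arena: func() bool {
			games++
			return games%5 < 3 // scripted result: the candidate wins 3 games out of 5
		},
	}
	fmt.Println("promotions:", learn(2, 3, 4, 5, 0.52, h))
}
```

Under this reading, a threshold of 0.52 means a freshly trained network only replaces the current one when it wins more than 52% of the evaluation games.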

The steps to play against the algorithm are:

Creating an AZ object
Loading the trained model (by calling the Read method; note that the inference example calls Load)
Switching the agent to inference mode via the SwitchToInference method
Getting the AI move by calling the Search method and applying the move to the game manually
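
The last step means the caller drives the game: the agent only proposes a move, and the program applies it. A minimal sketch of such a loop follows, with a placeholder policy instead of a trained agent. The searcher interface, the firstEmpty type and the playOut function are hypothetical and simpler than the Search method used in the inference example.

```go
package main

import "fmt"

// searcher stands in for an agent in inference mode: given a board it
// returns the index of the cell it wants to play. Hypothetical interface.
type searcher interface {
	Search(board []int) int
}

// firstEmpty is a placeholder policy that picks the first free cell.
type firstEmpty struct{}

func (firstEmpty) Search(board []int) int {
	for i, c := range board {
		if c == 0 {
			return i
		}
	}
	return -1 // board is full
}

// playOut alternates a scripted opponent (player 1) with the agent
// (player 2). Each move is applied by hand. The loop stops when the script
// ends, a scripted move is illegal, or the agent has no move left.
func playOut(board []int, script []int, ai searcher) []int {
	for _, cell := range script {
		if cell < 0 || cell >= len(board) || board[cell] != 0 {
			break
		}
		board[cell] = 1
		reply := ai.Search(board)
		if reply < 0 {
			break
		}
		board[reply] = 2
	}
	return board
}

func main() {
	board := make([]int, 9) // 3x3 board, row by row, 0 = empty
	fmt.Println(playOut(board, []int{4, 1}, firstEmpty{}))
}
```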

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

mnk for m,n,k games
wq is the game of Go (围碁)
c4
komi

tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.
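
In an m,n,k game, two players alternately place stones on an m × n board, and the first to line up k stones wins. The following sketch shows one way to test that winning condition. It is independent of the mnk package: the hasKInRow function and the choice of m as the board width are assumptions for illustration.

```go
package main

import "fmt"

// hasKInRow reports whether player p has k stones in a row (horizontal,
// vertical or diagonal) on a board with m columns and n rows, stored row
// by row. Cells hold 0 for empty or a player number.
func hasKInRow(board []int, m, n, k, p int) bool {
	// Directions as (dx, dy): right, down, down-right, up-right.
	dirs := [][2]int{{1, 0}, {0, 1}, {1, 1}, {1, -1}}
	for y := 0; y < n; y++ {
		for x := 0; x < m; x++ {
			for _, d := range dirs {
				count := 0
				cx, cy := x, y
				// Walk from (x, y) while staying on the board and on p's stones.
				for cx >= 0 && cx < m && cy >= 0 && cy < n && board[cy*m+cx] == p {
					count++
					if count == k {
						return true
					}
					cx += d[0]
					cy += d[1]
				}
			}
		}
	}
	return false
}

func main() {
	// Tic-tac-toe (m=n=k=3) with player 1 on the main diagonal.
	board := []int{
		1, 0, 2,
		0, 1, 2,
		0, 0, 1,
	}
	fmt.Println(hasKInRow(board, 3, 3, 3, 1), hasKInRow(board, 3, 3, 3, 2))
}
```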

Training

The following sample trains the algorithm to play tic-tac-toe. The result is saved in a file named example.model.

    package main

    import (
        "log"
        "time"

        "github.com/gorgonia/agogo"
        dual "github.com/gorgonia/agogo/dualnet"
        "github.com/gorgonia/agogo/game"
        "github.com/gorgonia/agogo/game/mnk"
        "github.com/gorgonia/agogo/mcts"
    )

    // encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe
    func encodeBoard(a game.State) []float32 {
        board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
        // Replace exact zeros (empty cells) with a small non-zero value.
        for i := range board {
            if board[i] == 0 {
                board[i] = 0.001
            }
        }
        // Second feature layer: which player moves next (+1 Black, -1 White).
        playerLayer := make([]float32, len(a.Board()))
        next := a.ToMove()
        if next == game.Player(game.Black) {
            for i := range playerLayer {
                playerLayer[i] = 1
            }
        } else if next == game.Player(game.White) {
            // vecf32.Scale(board, -1)
            for i := range playerLayer {
                playerLayer[i] = -1
            }
        }
        retVal := append(board, playerLayer...)
        return retVal
    }

    func main() {
        // Create the configuration of the neural network
        conf := agogo.Config{
            Name:            "Tic Tac Toe",
            NNConf:          dual.DefaultConf(3, 3, 10),
            MCTSConf:        mcts.DefaultConfig(3),
            UpdateThreshold: 0.52,
        }
        conf.NNConf.BatchSize = 100
        conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
        conf.NNConf.K = 3
        conf.NNConf.SharedLayers = 3
        conf.MCTSConf = mcts.Config{
            PUCT:           1.0,
            M:              3,
            N:              3,
            Timeout:        100 * time.Millisecond,
            PassPreference: mcts.DontPreferPass,
            Budget:         1000,
            DumbPass:       true,
            RandomCount:    0,
        }
        conf.Encoder = encodeBoard

        // Create a new game
        g := mnk.TicTacToe()
        // Create the AlphaZero structure
        a := agogo.New(g, conf)
        // Launch the learning process
        err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episodes, 100 NN iters, 100 games.
        if err != nil {
            log.Println(err)
        }
        // Save the model
        a.Save("example.model")
    }
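
To make the encoder's output concrete, the sketch below reproduces its layout without depending on agogo: one feature layer for the stones followed by one layer for the side to move, which is why Features is set to 2. How EncodeTwoPlayerBoard represents stones is not stated in this README, so the +1/-1 stone values used here are an assumption; the 0.001 filler for empty cells and the ±1 player layer follow the example above. The encode function and the cell constants are hypothetical names.

```go
package main

import "fmt"

// Cell values for this sketch only; they are not the game package's constants.
const (
	empty = 0
	black = 1
	white = 2
)

// encode returns two stacked feature layers: the stones, then the side to move.
// Stone values (+1 black, -1 white) are an assumption made for this sketch.
func encode(board []int, toMove int) []float32 {
	out := make([]float32, 0, 2*len(board))
	for _, c := range board {
		switch c {
		case black:
			out = append(out, 1)
		case white:
			out = append(out, -1)
		default:
			out = append(out, 0.001) // empty cells get a small non-zero value
		}
	}
	side := float32(1) // black to move
	if toMove == white {
		side = -1
	}
	for range board {
		out = append(out, side)
	}
	return out
}

func main() {
	board := make([]int, 9)
	board[4] = black // black stone in the center
	enc := encode(board, white)
	fmt.Println("stones layer:", enc[:9])
	fmt.Println("player layer:", enc[9:])
}
```

For a 3 × 3 board the result has 18 values: the first 9 describe the board and the last 9 repeat the side to move.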

Inference

    package main

    import (
        "fmt"

        "github.com/gorgonia/agogo"
        dual "github.com/gorgonia/agogo/dualnet"
        "github.com/gorgonia/agogo/game"
        "github.com/gorgonia/agogo/game/mnk"
        "github.com/gorgonia/agogo/mcts"
    )

    // encodeBoard must be the same encoder that was used for training.
    func encodeBoard(a game.State) []float32 {
        board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
        for i := range board {
            if board[i] == 0 {
                board[i] = 0.001
            }
        }
        playerLayer := make([]float32, len(a.Board()))
        next := a.ToMove()
        if next == game.Player(game.Black) {
            for i := range playerLayer {
                playerLayer[i] = 1
            }
        } else if next == game.Player(game.White) {
            // vecf32.Scale(board, -1)
            for i := range playerLayer {
                playerLayer[i] = -1
            }
        }
        retVal := append(board, playerLayer...)
        return retVal
    }

    func main() {
        conf := agogo.Config{
            Name:     "Tic Tac Toe",
            NNConf:   dual.DefaultConf(3, 3, 10),
            MCTSConf: mcts.DefaultConfig(3),
        }
        conf.Encoder = encodeBoard

        g := mnk.TicTacToe()
        a := agogo.New(g, conf)
        a.Load("example.model")
        a.A.Player = mnk.Cross
        a.B.Player = mnk.Nought
        a.B.SwitchToInference(g)
        a.A.SwitchToInference(g)

        // Put X in the center
        stateAfterFirstPlay := g.Apply(game.PlayerMove{
            Player: mnk.Cross,
            Single: 4,
        })
        fmt.Println(stateAfterFirstPlay)
        // ⎢ · · · ⎥
        // ⎢ · X · ⎥
        // ⎢ · · · ⎥

        // What to do next
        move := a.B.Search(stateAfterFirstPlay)
        fmt.Println(move)
        // 1
        g.Apply(game.PlayerMove{
            Player: mnk.Nought,
            Single: move,
        })
        fmt.Println(stateAfterFirstPlay)
        // ⎢ · O · ⎥
        // ⎢ · X · ⎥
        // ⎢ · · · ⎥
    }

Misc

A Funny Thing Happened On The Way To Reimplementing AlphaGo - A talk by @chewxy (one of the authors) about this specific implementation

Credits

Original implementation credits to
