Self-play

From Wikipedia, the free encyclopedia

Reinforcement learning technique

Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the same learning algorithm play the role of two or more of the different agents. When successfully executed, this technique has a double advantage:

It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
It increases the amount of experience that can be used to improve the policy, by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.
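
As a rough illustration of the second point, the Python sketch below lets one shared value table choose the moves of both players in a toy subtraction game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins). The game, the tabular Monte Carlo update, and all names (`self_play_episode`, `train`, `alpha`, `epsilon`) are assumptions made for this example and are not taken from any system described in this article.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """Moves allowed in the toy game: remove 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def self_play_episode(q, stones, epsilon, rng):
    """Play one game in which the same table `q` moves for both players.

    Returns the (state, action) pairs seen by each player and the winner.
    """
    history = {0: [], 1: []}
    player, winner = 0, None
    while stones > 0:
        moves = legal_moves(stones)
        if rng.random() < epsilon:
            move = rng.choice(moves)  # explore
        else:
            move = max(moves, key=lambda m: q[(stones, m)])  # exploit
        history[player].append((stones, move))
        stones -= move
        winner = player  # whoever moved last takes the last stone
        player = 1 - player
    return history, winner


def train(episodes=20000, start=10, alpha=0.1, epsilon=0.2, seed=0):
    """Update one table from the viewpoints of both players."""
    rng = random.Random(seed)
    q = defaultdict(float)
    n_updates = 0
    for _ in range(episodes):
        history, winner = self_play_episode(q, start, epsilon, rng)
        for player, steps in history.items():
            reward = 1.0 if player == winner else -1.0
            for state_action in steps:
                # Move the estimate toward the final outcome of the game.
                q[state_action] += alpha * (reward - q[state_action])
                n_updates += 1
    return q, n_updates


if __name__ == "__main__":
    q, n_updates = train()
    print("value updates collected:", n_updates)
```

Every game contributes updates from both the winner's and the loser's moves, which is where the "factor of two or more" in experience comes from. The sketch makes no claim about convergence; it only shows the data flow of a single policy playing every seat.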

Czarnecki et al.^[1] argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, the space of strategies can be partitioned into sets $L_{1},L_{2},\dots ,L_{n}$ such that, for any $i<j$, $\pi _{i}\in L_{i}$ and $\pi _{j}\in L_{j}$, the strategy $\pi _{j}$ beats the strategy $\pi _{i}$. Then, in population-based self-play, if the population is larger than $\max _{i}|L_{i}|$, the algorithm would converge to the best possible strategy.
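
The layering condition can be made concrete with a small check. The following Python sketch assumes a hypothetical four-strategy game, given as a payoff matrix in which a positive entry `payoff[a][b]` means that strategy `a` beats strategy `b`. The matrix, the helper names `is_layered` and `largest_level`, and the example partitions are all illustrative assumptions, not data from the cited paper.

```python
def is_layered(payoff, levels):
    """Check that every strategy in a higher level beats every strategy
    in every lower level. `levels` is ordered from L_1 to L_n."""
    for i, lower in enumerate(levels):
        for higher in levels[i + 1:]:
            for a in lower:
                for b in higher:
                    if payoff[b][a] <= 0:
                        return False
    return True


def largest_level(levels):
    """Size of the largest level, i.e. max_i |L_i|."""
    return max(len(level) for level in levels)


# Toy game: strategies 0, 1, 2 form a rock-paper-scissors cycle,
# and strategy 3 beats all of them.
payoff = [
    [0, -1, 1, -1],   # rock
    [1, 0, -1, -1],   # paper
    [-1, 1, 0, -1],   # scissors
    [1, 1, 1, 0],     # dominant strategy
]

levels = [[0, 1, 2], [3]]

if __name__ == "__main__":
    print(is_layered(payoff, levels), largest_level(levels))
```

In this toy game the cyclic strategies must share a level, because no ordering of rock, paper and scissors is consistent with "higher beats lower". The largest level therefore has size 3, which is the quantity the population size is compared against in the statement above.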

Usage

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and go.^[2]

Self-play is also used to train the Cicero AI system to outperform humans at the game of Diplomacy. The technique is also used in training the DeepNash system to play the game Stratego.^[3]^[4]

Connections to other disciplines

Self-play has been compared to the epistemological concept of tabula rasa, which describes the way that humans acquire knowledge from a "blank slate".^[5]

References

^ Czarnecki, Wojciech M.; Gidel, Gauthier; Tracey, Brendan; Tuyls, Karl; Omidshafiei, Shayegan; Balduzzi, David; Jaderberg, Max (2020). "Real World Games Look Like Spinning Tops". Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 17443–17454. arXiv:2004.09468.
^ Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
^ Snyder, Alison (2022-12-01). "Two new AI systems beat humans at complex games". Axios. Retrieved 2022-12-29.
^ Erich_Grunewald (22 December 2022), "Notes on Meta's Diplomacy-Playing AI", LessWrong
^ Laterre, Alexandre (2018). "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization". arXiv:1807.01672 [cs.LG].

Retrieved from "https://en.wikipedia.org/w/index.php?title=Self-play&oldid=1297357226"
