

Reinforcement learning


From Wikipedia, the free encyclopedia

Field of machine learning

For reinforcement learning in psychology, see Reinforcement and Operant conditioning.

The typical framing of a reinforcement learning (RL) scenario: an agent takes actions in an environment, which is interpreted into a reward and a state representation, which are fed back to the agent.


Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Reinforcement learning differs from supervised learning in that it needs neither labelled input-output pairs nor explicit correction of sub-optimal actions. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge), with the goal of maximizing the cumulative reward, the feedback for which might be incomplete or delayed.^[1] The search for this balance is known as the exploration–exploitation dilemma.
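The exploration–exploitation balance can be illustrated with a minimal sketch of an epsilon-greedy rule on a multi-armed bandit. This is one common textbook strategy and is not taken from the article. The function name `run_epsilon_greedy`, the Bernoulli reward model, and the arm probabilities are all illustrative assumptions.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative).

    true_means: hypothetical success probability of each arm.
    Returns (estimates, counts): the sample-mean reward estimate and the
    number of pulls for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        est, cnt = run_epsilon_greedy([0.2, 0.5, 0.8], eps, steps=2000)
        print(f"epsilon={eps}: pulls per arm = {cnt}")
```

With `epsilon=0.0` the agent in this sketch only exploits: because all estimates start at zero and ties go to the first arm, it never tries the other arms, even though a better one exists. A small positive `epsilon` lets it occasionally sample the alternatives and revise its estimates.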

The environment is typically stated in the form of a Markov decision process (MDP), as many reinforcement learning algorithms use dynamic programming techniques.^[2] The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the Markov decision process, and they target large MDPs where exact methods become infeasible.^[3]
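For contrast, the classical dynamic programming setting can be sketched with value iteration on a tiny MDP whose transition model is fully known. The two-state MDP, its rewards, the discount factor, and the names `value_iteration` and `TOY_MDP` are made up for illustration. A reinforcement learning algorithm would have to estimate the same values from sampled interaction, without access to the table `P`.

```python
def action_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s under value estimate V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-10):
    """Value iteration on a known MDP.

    P[s][a] is a list of (probability, next_state, reward) triples,
    i.e. the exact model that dynamic programming assumes is available.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma))
              for s in P}
    return V, policy


# Hypothetical two-state MDP with deterministic transitions.
TOY_MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}

if __name__ == "__main__":
    values, policy = value_iteration(TOY_MDP, gamma=0.5)
    print({s: round(v, 6) for s, v in values.items()}, policy)
```

In this toy model the optimal values can be checked by hand: staying in state 1 yields V(1) = 2 + 0.5·V(1) = 4, and moving from state 0 to state 1 yields V(0) = 1 + 0.5·4 = 3.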

Algorithm	Description	Policy	Action space	State space	Operator
Monte Carlo	Every-visit Monte Carlo	Either	Discrete	Discrete	Sample-means of state-values or action-values
TD learning	State–action–reward–state	Off-policy	Discrete	Discrete	State-value
Q-learning	State–action–reward–state	Off-policy	Discrete	Discrete	Action-value
SARSA	State–action–reward–state–action	On-policy	Discrete	Discrete	Action-value
DQN	Deep Q Network	Off-policy	Discrete	Continuous	Action-value
DDPG	Deep Deterministic Policy Gradient	Off-policy	Continuous	Continuous	Action-value
A3C	Asynchronous Advantage Actor-Critic Algorithm	On-policy	Discrete	Continuous	Advantage (=action-value - state-value)
TRPO	Trust Region Policy Optimization	On-policy	Continuous or Discrete	Continuous	Advantage
PPO	Proximal Policy Optimization	On-policy	Continuous or Discrete	Continuous	Advantage
TD3	Twin Delayed Deep Deterministic Policy Gradient	Off-policy	Continuous	Continuous	Action-value
SAC	Soft Actor-Critic	Off-policy	Continuous	Continuous	Advantage
DSAC^[44]^[45]^[46]	Distributional Soft Actor Critic	Off-policy	Continuous	Continuous	Action-value distribution
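The off-policy versus on-policy distinction in the Q-learning and SARSA rows can be made concrete with a minimal sketch of the two tabular update rules in their standard textbook form. The dictionary-based table `Q`, the function names, and the numbers in the demo are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action in s_next,
    regardless of which action the agent actually takes next."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action a_next that the
    current behaviour policy actually selected in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def make_q():
    """Hypothetical action-value table with two states and two actions."""
    return {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 1.0, "R": 2.0}}


if __name__ == "__main__":
    # Same transition (s0, "R", reward 1, s1); the next action taken is "L".
    q_value = q_learning_update(make_q(), "s0", "R", 1.0, "s1",
                                alpha=0.5, gamma=0.9)
    s_value = sarsa_update(make_q(), "s0", "R", 1.0, "s1", "L",
                           alpha=0.5, gamma=0.9)
    print(round(q_value, 3), round(s_value, 3))
```

For the same transition, the Q-learning target uses the larger value in `s1` (2.0) while the SARSA target uses the value of the action actually chosen (1.0), so the updated entries come out at about 1.4 and 0.95 respectively.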


Principles

Exploration

Algorithms for control learning

Criterion of optimality

Policy

State-value function

Brute force

Value function

Monte Carlo methods

Temporal difference methods

Function approximation methods

Direct policy search

Model-based algorithms

Theory

Research

Comparison of key algorithms

Associative reinforcement learning

Deep reinforcement learning

Adversarial deep reinforcement learning

Fuzzy reinforcement learning

Inverse reinforcement learning

Multi-objective reinforcement learning

Safe reinforcement learning

Self-reinforcement learning

Statistical comparison of reinforcement learning algorithms

See also

References

Further reading

External links