Imitation learning

From Wikipedia, the free encyclopedia
Machine learning technique where agents learn from demonstrations

Imitation learning is a paradigm in reinforcement learning, where an agent learns to perform a task by supervised learning from expert demonstrations. It is also called learning from demonstration and apprenticeship learning.[1][2][3]

It has been applied to underactuated robotics,[4] self-driving cars,[5][6][7] quadcopter navigation,[8] helicopter aerobatics,[9] and locomotion.[10][11]

Approaches


Expert demonstrations are recordings of an expert performing the desired task, often collected as state-action pairs $(o_t^*, a_t^*)$.

Behavior Cloning


Behavior Cloning (BC) is the most basic form of imitation learning. It uses supervised learning to train a policy $\pi_\theta$ such that, given an observation $o_t$, the policy outputs an action distribution $\pi_\theta(\cdot \mid o_t)$ that is approximately the same as the expert's action distribution.[12]
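
The supervised objective can be illustrated with a short sketch. The following is a minimal behavior-cloning example in PyTorch, assuming a continuous action space, a small MLP policy, and expert data already loaded as tensors; the dimensions and the synthetic "expert" data are placeholders, not taken from the article.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # illustrative dimensions

# A small MLP policy pi_theta mapping an observation to a (mean) action.
policy = nn.Sequential(
    nn.Linear(obs_dim, 64),
    nn.Tanh(),
    nn.Linear(64, act_dim),
)

# Placeholder expert demonstrations; in practice these are the recorded
# (o_t*, a_t*) pairs described above.
expert_obs = torch.randn(1024, obs_dim)
expert_act = torch.randn(1024, act_dim)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(100):
    pred_act = policy(expert_obs)
    # Mean-squared-error regression onto the expert actions; this corresponds
    # to maximum likelihood under a fixed-variance Gaussian policy. For
    # discrete actions a cross-entropy loss would be used instead.
    loss = ((pred_act - expert_act) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```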

BC is susceptible to distribution shift. Specifically, if the trained policy differs from the expert policy, it may stray from the expert trajectory into observations that never occur in expert trajectories.[12]

This issue was already noted in ALVINN, in which a neural network was trained to drive a van from human demonstrations. Because a human driver never strays far from the path, the network was never trained on what action to take if the van ever found itself far off the path.[5]

DAgger


DAgger (Dataset Aggregation)[13] improves on behavior cloning by iteratively growing a dataset of expert-labelled data. In each iteration, the algorithm first collects data by rolling out the learned policy $\pi_\theta$. Then, it queries the expert for the optimal action $a_t^*$ on each observation $o_t$ encountered during the rollout. Finally, it aggregates the new data into the dataset, $D \leftarrow D \cup \{(o_1, a_1^*), (o_2, a_2^*), \dots, (o_T, a_T^*)\}$, and trains a new policy on the aggregated dataset.[12]
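
A minimal sketch of this loop is shown below. It assumes `env` follows the Gymnasium reset/step API, `expert_action(obs)` queries the expert's label for an observation, and `fit_policy(dataset)` performs a supervised step as in behavior cloning; all three helpers are hypothetical stand-ins, not part of the article or any specific library.

```python
def dagger(env, expert_action, fit_policy, expert_demos, n_iters=10, horizon=200):
    dataset = list(expert_demos)        # D starts from the expert demonstrations
    policy = fit_policy(dataset)        # initial behavior-cloned policy
    for _ in range(n_iters):
        obs, _ = env.reset()
        for _ in range(horizon):
            # Label the observations visited by the *learned* policy with
            # the expert's action...
            dataset.append((obs, expert_action(obs)))
            # ...but step the environment with the learned policy's own action,
            # so the data covers states the learner actually reaches.
            obs, _, terminated, truncated, _ = env.step(policy(obs))
            if terminated or truncated:
                break
        policy = fit_policy(dataset)    # retrain on the aggregated dataset D
    return policy
```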

Decision transformer

Architecture diagram of the decision transformer

The Decision Transformer approach models reinforcement learning as a sequence modelling problem.[14] Similar to Behavior Cloning, it trains a sequence model, such as a Transformer, on rollout sequences $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_T, o_T, a_T)$, where $R_t = r_t + r_{t+1} + \dots + r_T$ is the sum of future rewards (the return-to-go) in the rollout. At training time, the sequence model is trained to predict each action $a_t$ given the preceding rollout as context: $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t)$. At inference time, to use the sequence model as an effective controller, it is simply conditioned on a very high target return $R$, and it generalizes by predicting actions expected to achieve that high return. This approach was shown to scale predictably to a Transformer with 1 billion parameters that is superhuman on 41 Atari games.[15]
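
As a small illustration of the return-to-go quantity $R_t$ and the conditioning scheme described above, the sketch below computes $R_t$ for one rollout; the reward values are made up for illustration.

```python
import numpy as np

rewards = np.array([1.0, 0.0, 2.0, 1.0])   # r_1, ..., r_T from one rollout
# R_t = r_t + r_{t+1} + ... + r_T, computed as a reversed cumulative sum.
returns_to_go = np.cumsum(rewards[::-1])[::-1]
print(returns_to_go)                        # [4. 3. 3. 1.]

# During training, the model sees (R_1, o_1, a_1), ..., (R_t, o_t) and predicts a_t.
# At inference, R_1 is set to a desired (high) return and is decremented by the
# observed reward after each environment step.
```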

Other approaches


See Deep Q-learning from Demonstrations[16] and One-Shot Imitation Learning[17] for further examples.

Related approaches


Inverse Reinforcement Learning (IRL) learns a reward function that explains the expert's behavior and then uses reinforcement learning to find a policy that maximizes this reward.[18] Recent works have also explored multi-agent extensions of IRL in networked systems.[19]
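
As a rough illustration of this alternation between reward fitting and policy optimization, the sketch below follows a feature-matching style of apprenticeship learning with a linear reward. The helpers `feature_expectations` (estimates expected feature counts by rollouts) and `rl_solve` (an RL solver for a given reward) are hypothetical stand-ins, not part of any specific library, and the update rule is a simplified projection-style step.

```python
import numpy as np

def irl(expert_policy, rl_solve, feature_expectations, feat_dim, n_iters=20):
    mu_expert = feature_expectations(expert_policy)   # expert feature counts
    w = np.random.randn(feat_dim)                     # linear reward r(s) = w . phi(s)
    policy = rl_solve(lambda phi, w=w: w @ phi)
    for _ in range(n_iters):
        mu_policy = feature_expectations(policy)
        # Move the reward weights toward features the expert visits more often
        # than the current policy, so expert behavior scores higher.
        w = mu_expert - mu_policy
        w = w / (np.linalg.norm(w) + 1e-8)
        # Inner RL step: find a policy that maximizes the current reward.
        policy = rl_solve(lambda phi, w=w: w @ phi)
    return w, policy
```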

Generative Adversarial Imitation Learning (GAIL) uses generative adversarial networks (GANs) to match the distribution of agent behavior to the distribution of expert demonstrations.[20] It extends a previous approach using game theory.[21][16]
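
The sketch below illustrates the adversarial setup: a discriminator is trained to separate expert state-action pairs from policy state-action pairs, and the policy is then trained with an RL algorithm (e.g. TRPO or PPO, omitted here) to maximize the surrogate reward the discriminator induces. The dimensions, network, and reward form are illustrative assumptions, not the article's specification.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
# Discriminator over concatenated (observation, action) vectors;
# its logit is high for expert-like pairs.
disc = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64),
    nn.Tanh(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    # Train D to classify expert pairs as 1 and policy pairs as 0.
    logits_e = disc(expert_sa)
    logits_p = disc(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_p, torch.zeros_like(logits_p))
    opt.zero_grad()
    loss.backward()
    opt.step()

def gail_reward(policy_sa):
    # Surrogate reward for the policy: high where the discriminator believes
    # the pair looks expert-like, so maximizing it pushes the policy's
    # state-action distribution toward the expert's.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(policy_sa)) + 1e-8)
```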


References

  1. Russell, Stuart J.; Norvig, Peter (2021). "22.6 Apprenticeship and Inverse Reinforcement Learning". Artificial Intelligence: A Modern Approach. Pearson Series in Artificial Intelligence (Fourth ed.). Hoboken: Pearson. ISBN 978-0-13-461099-3.
  2. Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning series (Second ed.). Cambridge, Massachusetts: The MIT Press. p. 470. ISBN 978-0-262-03924-6.
  3. Hussein, Ahmed; Gaber, Mohamed Medhat; Elyan, Eyad; Jayne, Chrisina (2017-04-06). "Imitation Learning: A Survey of Learning Methods". ACM Computing Surveys. 50 (2): 21:1–21:35. doi:10.1145/3054912. hdl:10059/2298. ISSN 0360-0300.
  4. "Ch. 21 - Imitation Learning". underactuated.mit.edu. Retrieved 2024-08-08.
  5. Pomerleau, Dean A. (1988). "ALVINN: An Autonomous Land Vehicle in a Neural Network". Advances in Neural Information Processing Systems. 1. Morgan-Kaufmann.
  6. Bojarski, Mariusz; Del Testa, Davide; Dworakowski, Daniel; Firner, Bernhard; Flepp, Beat; Goyal, Prasoon; Jackel, Lawrence D.; Monfort, Mathew; Muller, Urs (2016-04-25). "End to End Learning for Self-Driving Cars". arXiv:1604.07316v1 [cs.CV].
  7. Kiran, B Ravi; Sobh, Ibrahim; Talpaert, Victor; Mannion, Patrick; Sallab, Ahmad A. Al; Yogamani, Senthil; Perez, Patrick (June 2022). "Deep Reinforcement Learning for Autonomous Driving: A Survey". IEEE Transactions on Intelligent Transportation Systems. 23 (6): 4909–4926. arXiv:2002.00444. Bibcode:2022ITITr..23.4909K. doi:10.1109/TITS.2021.3054625. ISSN 1524-9050.
  8. Giusti, Alessandro; Guzzi, Jerome; Ciresan, Dan C.; He, Fang-Lin; Rodriguez, Juan P.; Fontana, Flavio; Faessler, Matthias; Forster, Christian; Schmidhuber, Jurgen; Caro, Gianni Di; Scaramuzza, Davide; Gambardella, Luca M. (July 2016). "A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots" (PDF). IEEE Robotics and Automation Letters. 1 (2): 661–667. Bibcode:2016IRAL....1..661G. doi:10.1109/LRA.2015.2509024. ISSN 2377-3766.
  9. "Autonomous Helicopter: Stanford University AI Lab". heli.stanford.edu. Retrieved 2024-08-08.
  10. Nakanishi, Jun; Morimoto, Jun; Endo, Gen; Cheng, Gordon; Schaal, Stefan; Kawato, Mitsuo (June 2004). "Learning from demonstration and adaptation of biped locomotion". Robotics and Autonomous Systems. 47 (2–3): 79–91. doi:10.1016/j.robot.2004.03.003.
  11. Kalakrishnan, Mrinal; Buchli, Jonas; Pastor, Peter; Schaal, Stefan (October 2009). "Learning locomotion over rough terrain using terrain templates". 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. pp. 167–172. doi:10.1109/iros.2009.5354701. ISBN 978-1-4244-3803-7.
  12. CS 285 at UC Berkeley: Deep Reinforcement Learning. Lecture 2: Supervised Learning of Behaviors.
  13. Ross, Stephane; Gordon, Geoffrey; Bagnell, Drew (2011-06-14). "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning". Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings: 627–635.
  14. Chen, Lili; Lu, Kevin; Rajeswaran, Aravind; Lee, Kimin; Grover, Aditya; Laskin, Misha; Abbeel, Pieter; Srinivas, Aravind; Mordatch, Igor (2021). "Decision Transformer: Reinforcement Learning via Sequence Modeling". Advances in Neural Information Processing Systems. 34. Curran Associates, Inc.: 15084–15097. arXiv:2106.01345.
  15. Lee, Kuang-Huei; Nachum, Ofir; Yang, Mengjiao; Lee, Lisa; Freeman, Daniel; Xu, Winnie; Guadarrama, Sergio; Fischer, Ian; Jang, Eric (2022-10-15). Multi-Game Decision Transformers. arXiv:2205.15241.
  16. Hester, Todd; Vecerik, Matej; Pietquin, Olivier; Lanctot, Marc; Schaul, Tom; Piot, Bilal; Horgan, Dan; Quan, John; Sendonaris, Andrew (2017-04-12). "Deep Q-learning from Demonstrations". arXiv:1704.03732v4 [cs.AI].
  17. Duan, Yan; Andrychowicz, Marcin; Stadie, Bradly; Jonathan Ho, OpenAI; Schneider, Jonas; Sutskever, Ilya; Abbeel, Pieter; Zaremba, Wojciech (2017). "One-Shot Imitation Learning". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  18. Ng, A. (2000). "Algorithms for Inverse Reinforcement Learning". Proceedings of the 17th International Conference on Machine Learning: 663–670.
  19. Donge, V. S.; Lian, B.; Lewis, F. L.; Davoudi, A. (June 2023). "Multiagent Graphical Games With Inverse Reinforcement Learning". IEEE Transactions on Control of Network Systems. 10 (2): 841–852. doi:10.1109/TCNS.2022.3210856.
  20. Ho, Jonathan; Ermon, Stefano (2016). "Generative Adversarial Imitation Learning". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc. arXiv:1606.03476.
  21. Syed, Umar; Schapire, Robert E. (2007). "A Game-Theoretic Approach to Apprenticeship Learning". Advances in Neural Information Processing Systems. 20. Curran Associates, Inc.