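Reinforcement learning can be combined with artificial curiosity. In the early 21st century, the biggest weakness of AI was over-specialization: an AI trained to help diagnose illnesses cannot, for example, make legal judgments. Everyday experience, however, shows that a person can finish learning one thing and then go on to learn another. One important reason is that humans are curious: when the information at hand is insufficient, people often actively seek out new information to absorb. Some AI researchers therefore proposed the concept of "artificial curiosity", arguing that computers should simulate human curiosity so that an AI does not always have to be handed information by humans but can seek out information on its own [6][7].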
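The text does not specify how curiosity is computed. One common formulation rewards the agent for surprise: a transition that the agent's own forward model predicts poorly earns a larger intrinsic reward, which is added to the environment's (extrinsic) reward. The sketch below assumes a small discrete state space and a tabular, count-based forward model; `CuriosityModule`, `intrinsic_reward`, `total_reward` and the weight `beta` are hypothetical names chosen for illustration, not part of any cited method.

```python
import math
from collections import defaultdict


class CuriosityModule:
    """Hypothetical helper: intrinsic reward = surprise of a learned forward model.

    Assumes a small discrete state space with states numbered 0 .. n_states - 1.
    """

    def __init__(self, n_states: int):
        self.n_states = n_states
        # counts[(s, a)][s_next] = how often action a in state s led to s_next
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict_prob(self, s: int, a: int, s_next: int) -> float:
        """Laplace-smoothed estimate of P(s_next | s, a) from observed counts."""
        table = self.counts[(s, a)]
        total = sum(table.values())
        return (table[s_next] + 1) / (total + self.n_states)

    def intrinsic_reward(self, s: int, a: int, s_next: int) -> float:
        """Surprise (-log prob) of the transition, scored before learning from it."""
        surprise = -math.log(self.predict_prob(s, a, s_next))
        self.counts[(s, a)][s_next] += 1  # update the forward model
        return surprise


def total_reward(r_ext: float, r_int: float, beta: float = 0.1) -> float:
    """Reward the RL agent actually optimizes: extrinsic plus weighted curiosity bonus."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    cm = CuriosityModule(n_states=4)
    print(cm.intrinsic_reward(0, 0, 1))  # first visit: log(4)
    for _ in range(20):
        cm.intrinsic_reward(0, 0, 1)
    print(cm.intrinsic_reward(0, 0, 1))  # familiar transition: small bonus
    print(cm.intrinsic_reward(0, 0, 2))  # novel outcome: large bonus
```

Repeating the same transition makes it predictable, so its bonus shrinks toward zero, while an outcome the model has not seen scores high; this is what nudges the agent toward unexplored parts of its environment without a human pointing it there. Tabular counts only suit small discrete spaces; with large state spaces the forward model would have to be a learned function approximator instead.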
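According to some researchers, the problem with AI that lacks curiosity is this: an agent without curiosity can only improve if someone supplies it with useful information or a useful environment, yet in reality people often find themselves in situations where the surroundings offer little useful information and they must go looking for it themselves [9]. Curiosity is precisely what makes people seek out useful information spontaneously. To achieve artificial curiosity, an RL algorithm needs at least the following [8]: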
Auer, Peter; Jaksch, Thomas; Ortner, Ronald (2010). "Near-optimal regret bounds for reinforcement learning". Journal of Machine Learning Research. 11: 1563–1600.
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart; Ernst, Damien (2010). Reinforcement Learning and Dynamic Programming using Function Approximators. Taylor & Francis CRC Press. ISBN 978-1-4398-2108-4.
François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354. arXiv:1811.12560. Bibcode:2018arXiv181112560F. doi:10.1561/2200000071.
Powell, Warren (2007). Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience. ISBN 978-0-470-17155-4.
Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0-262-19398-6.
Sutton, Richard S. (1988). "Learning to predict by the method of temporal differences". Machine Learning. 3: 9–44. doi:10.1007/BF00115009.
Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research. 4: 237–285.
Dominic, S.; Das, R.; Whitley, D.; Anderson, C. (July 1991). "Genetic reinforcement learning for neural networks". IJCNN-91-Seattle International Joint Conference on Neural Networks. Seattle, Washington, USA: IEEE.
François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354.
Dubey, R.; Agrawal, P.; Pathak, D.; Griffiths, T. L.; Efros, A. A. (2018). "Investigating human priors for playing video games". arXiv preprint arXiv:1802.10217.
Algorta, S.; Şimşek, Ö. (2019). "The Game of Tetris in Machine Learning". arXiv preprint arXiv:1905.01652.