
Near-optimal Regret Bounds for Reinforcement Learning

Thomas Jaksch, Ronald Ortner, Peter Auer; 11(51):1563−1600, 2010.

Abstract

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of Ω(√(DSAT)) on the total regret of any learning algorithm is given as well.
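Stated formally (our rendering, using standard hitting-time notation rather than wording quoted from the paper), the diameter of an MDP M is

    D(M) := max_{s ≠ s'} min_π E[ T(s' | M, π, s) ],

where T(s' | M, π, s) is the first time step at which state s' is reached when the stationary policy π is followed from state s in M.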
These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T.
Finally, we also consider a setting where the MDP is allowed to change a fixed number of ℓ times. We present a modification of our algorithm that is able to deal with this setting and show a regret bound of Õ(ℓ^(1/3) T^(2/3) DS√A).
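To make the diameter parameter concrete, the following is a minimal sketch, assuming a deterministic MDP (a simplification of ours, not a construction from the paper). With deterministic transitions the expected travel time between two states reduces to a shortest-path length, so D can be computed by breadth-first search over the graph whose edges are all one-step transitions.

    from collections import deque

    def diameter(transitions):
        """Diameter of a deterministic MDP.

        transitions[s][a] is the successor of taking action a in state s.
        For deterministic dynamics, the minimum over policies of the
        expected travel time from s to s' equals the BFS distance in the
        graph with an edge (s, s'') for every available action.
        """
        n = len(transitions)
        worst = 0
        for source in range(n):
            dist = {source: 0}
            queue = deque([source])
            while queue:
                s = queue.popleft()
                for nxt in transitions[s]:
                    if nxt not in dist:
                        dist[nxt] = dist[s] + 1
                        queue.append(nxt)
            if len(dist) < n:
                return float("inf")  # some state unreachable: D is infinite
            worst = max(worst, max(dist.values()))
        return worst

    # Toy example: a 3-state cycle with an extra "stay" action.
    # Going backwards along the cycle takes two forward steps, so D = 2.
    transitions = [
        [1, 0],  # state 0: action 0 -> state 1, action 1 -> stay
        [2, 1],  # state 1: action 0 -> state 2, action 1 -> stay
        [0, 2],  # state 2: action 0 -> state 0, action 1 -> stay
    ]
    print(diameter(transitions))  # -> 2

For general stochastic MDPs the inner minimization is a stochastic shortest-path problem, so computing D exactly calls for dynamic programming rather than BFS; the sketch only illustrates what the parameter measures.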

[pdf][bib]       
©JMLR 2010.
