POMCPOW.jl

POMCPOW is an online solver based on Monte Carlo tree search for POMDPs with continuous state, action, and observation spaces. For more information, see https://arxiv.org/abs/1709.06196 (code to reproduce the experiments in this paper can be found here).

This POMCPOW implementation solves problems specified using the POMDPs.jl interface. The requirements are the same as for an importance-sampling particle filter: a generative model for the dynamics and an explicit observation model.
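For concreteness, here is a minimal sketch of a problem definition meeting those two requirements. Walk1D is a hypothetical toy problem, not part of this package; it supplies generative dynamics via POMDPs.gen and an explicit observation distribution via POMDPs.observation:

```julia
using POMDPs
using Distributions
using Random

# Hypothetical 1-D random-walk POMDP; state, action, and observation are all Float64.
struct Walk1D <: POMDP{Float64, Float64, Float64} end

# Generative dynamics model: sample the next state and reward from (s, a).
function POMDPs.gen(m::Walk1D, s, a, rng::AbstractRNG)
    sp = s + a + 0.1 * randn(rng)   # noisy transition
    return (sp = sp, r = -abs(sp))  # reward for staying near the origin
end

# Explicit observation model: a distribution supporting pdf and rand.
POMDPs.observation(m::Walk1D, a, sp) = Normal(sp, 0.5)

POMDPs.initialstate(m::Walk1D) = Normal(0.0, 1.0)
POMDPs.actions(m::Walk1D) = Uniform(-1.0, 1.0)  # sampleable continuous action space
POMDPs.discount(m::Walk1D) = 0.95
```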

Installation

For Julia 1.0 and above, use the JuliaPOMDP registry:

```julia
import Pkg
Pkg.add("POMCPOW")
```

Usage

```julia
using POMDPs
using POMCPOW
using POMDPModels
using POMDPTools

solver = POMCPOWSolver(criterion=MaxUCB(20.0))
pomdp = BabyPOMDP() # from POMDPModels
planner = solve(solver, pomdp)

hr = HistoryRecorder(max_steps=100)
hist = simulate(hr, pomdp, planner)
for (s, b, a, r, sp, o) in hist
    @show s, a, r, sp
end

rhist = simulate(hr, pomdp, RandomPolicy(pomdp))
println("""
    Cumulative Discounted Reward (for 1 simulation)
        Random: $(discounted_reward(rhist))
        POMCPOW: $(discounted_reward(hist))
    """)
```
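Since POMCPOW is an online planner, it is typically paired with a belief updater that maintains the belief as observations arrive; a particle filter fits the requirements described above. A minimal sketch, assuming the ParticleFilters.jl package is installed and that pomdp and planner are defined as in the example above:

```julia
using POMDPTools
using ParticleFilters  # provides BootstrapFilter

up = BootstrapFilter(pomdp, 1000)  # importance-sampling particle filter with 1000 particles
for (s, a, o, r) in stepthrough(pomdp, planner, up, "s,a,o,r", max_steps=10)
    @show s, a, o, r  # POMCPOW plans from the current particle belief at each step
end
```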

Algorithm options are controlled with keyword arguments to the constructor. Use ?POMCPOWSolver to see a list of options. It should output the following:

Fields:

  • eps::Float64: Rollouts and tree expansion will stop when discount^depth is less than this. default: 0.01
  • max_depth::Int: Rollouts and tree expansion will stop when this depth is reached. default: 10
  • criterion::Any: Criterion to decide which action to take at each node, e.g. MaxUCB(c), MaxQ, or MaxTries. default: MaxUCB(1.0)
  • final_criterion::Any: Criterion for choosing the action to take after the tree is constructed. default: MaxQ()
  • tree_queries::Int: Number of iterations during each action() call. default: 1000
  • max_time::Float64: Time limit for planning at each step (seconds). default: Inf
  • rng::AbstractRNG: Random number generator. default: Base.GLOBAL_RNG
  • node_sr_belief_updater::Updater: Updater for the state-reward distribution at the nodes. default: POWNodeFilter()
  • estimate_value::Any: Function, object, or number used to estimate the value at the leaf nodes (a rollout policy can be specified by setting this to RolloutEstimator(policy)). If this is a function f, f(pomdp, s, h::BeliefNode, steps) will be called to estimate the value. If this is an object o, estimate_value(o, pomdp, s, h::BeliefNode, steps) will be called. If this is a number, the value will be set to that number. default: RolloutEstimator(RandomSolver(rng))
  • enable_action_pw::Bool: Controls whether progressive widening is done on actions; if false, the entire action space is used. default: true
  • check_repeat_obs::Bool: Check if an observation was sampled multiple times. This has some dictionary maintenance overhead, but prevents multiple nodes with the same observation from being created. If the observation space is discrete, this should probably be used, but it can be turned off for speed. default: true
  • check_repeat_act::Bool: Check if an action was sampled multiple times. This has some dictionary maintenance overhead, but prevents multiple nodes with the same action from being created. If the action space is discrete, this should probably be used, but it can be turned off for speed. default: true
  • k_action::Float64, alpha_action::Float64, k_observation::Float64, alpha_observation::Float64: These constants control the double progressive widening. A new observation or action will be added if the number of children is less than or equal to kN^alpha. defaults: k: 10, alpha: 0.5
  • init_V::Any: Function, object, or number used to set the initial V(h,a) value at a new node. If this is a function f, f(pomdp, h, a) will be called to set the value. If this is an object o, init_V(o, pomdp, h, a) will be called. If this is a number, V will be set to that number. default: 0.0
  • init_N::Any: Function, object, or number used to set the initial N(h,a) value at a new node. If this is a function f, f(pomdp, h, a) will be called to set the value. If this is an object o, init_N(o, pomdp, h, a) will be called. If this is a number, N will be set to that number. default: 0
  • next_action::Any: Function or object used to choose the next action to be considered for progressive widening. The next action is determined based on the POMDP, the belief b, and the current BeliefNode h. If this is a function f, f(pomdp, b, h) will be called to set the value. If this is an object o, next_action(o, pomdp, b, h) will be called. default: RandomActionGenerator(rng)
  • default_action::Any: Function, action, or Policy used to determine the action if POMCP fails with exception ex. If this is a Function f, f(belief, ex) will be called. If this is a Policy p, action(p, belief) will be called. If it is an object a, default_action(a, belief, ex) will be called, and if this method is not implemented, a will be returned directly.
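As an illustration, a solver configured with several of these keyword arguments might look like the following; the specific values are arbitrary placeholders for demonstration, not recommended settings:

```julia
solver = POMCPOWSolver(
    tree_queries = 5_000,       # more iterations per action() call
    max_time = 0.1,             # but never plan longer than 0.1 s per step
    criterion = MaxUCB(10.0),   # UCB exploration constant
    max_depth = 20,
    k_action = 8.0,             # action widening: |children| <= k*N^alpha
    alpha_action = 0.25,
    k_observation = 4.0,        # observation widening
    alpha_observation = 0.1,
    estimate_value = 0.0,       # constant leaf-node value estimate
    check_repeat_obs = false,   # skip duplicate checks for continuous observations
    check_repeat_act = false,
)
```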

Check out VDPTag2.jl for an additional problem that is solved by POMCPOW.
