Help! I'm lost in the flatland!
JuliaReinforcementLearning/GridWorlds.jl
A package for creating grid world environments for reinforcement learning in Julia. This package is designed to be lightweight and fast.
This package is inspired by gym-minigrid. In order to cite this package, please refer to the file `CITATION.bib`. Starring the repository on GitHub is also appreciated. For benchmarks, refer to `benchmarks/benchmarks.md`.
- SingleRoomUndirected
- SingleRoomDirected
- GridRoomsUndirected
- GridRoomsDirected
- SequentialRoomsUndirected
- SequentialRoomsDirected
- MazeUndirected
- MazeDirected
- GoToTargetUndirected
- GoToTargetDirected
- DoorKeyUndirected
- DoorKeyDirected
- CollectGemsUndirected
- CollectGemsDirected
- CollectGemsMultiAgentUndirected
- DynamicObstaclesUndirected
- DynamicObstaclesDirected
- SokobanUndirected
- SokobanDirected
- Snake
- Catcher
- TransportUndirected
- TransportDirected
```julia
import GridWorlds as GW

# Each environment `Env` lives in its own module `EnvModule`
# For example, the `SingleRoomUndirected` environment lives inside the `SingleRoomUndirectedModule` module
env = GW.SingleRoomUndirectedModule.SingleRoomUndirected()

# reset the environment. All environments are randomized
GW.reset!(env)

# get names of actions that can be performed in this environment
GW.get_action_names(env)

# perform actions in the environment
GW.act!(env, 1) # move up
GW.act!(env, 2) # move down
GW.act!(env, 3) # move left
GW.act!(env, 4) # move right

# play an environment interactively inside the terminal
GW.play!(env)

# play and record the interaction in a file called recording.txt
GW.play!(env, file_name = "recording.txt")

# manually step through the frames in the recording
GW.replay(file_name = "recording.txt")

# replay the recording inside the terminal at a given frame rate
GW.replay(file_name = "recording.txt", frame_rate = 2)

# use the RLBase API
import ReinforcementLearningBase as RLBase

# wrap a game instance from this package to create an RLBase compatible environment
rlbase_env = GW.RLBaseEnv(env)

# perform RLBase operations on the wrapped environment
RLBase.reset!(rlbase_env)
state = RLBase.state(rlbase_env)
action_space = RLBase.action_space(rlbase_env)
reward = RLBase.reward(rlbase_env)
done = RLBase.is_terminated(rlbase_env)

rlbase_env(1) # move up
rlbase_env(2) # move down
rlbase_env(3) # move left
rlbase_env(4) # move right
```
This package does not intend to reinvent a fully usable reinforcement learning API. Instead, all the games in this package provide the bare minimum of what is needed for the game logic, which is the ability to reset an environment using `GW.reset!(env)` and to perform actions in the environment using `GW.act!(env, action)`. In order to use such a game for reinforcement learning, you would probably use a higher level reinforcement learning API, for example the `RLBase` API offered by the `ReinforcementLearning.jl` package. As of this writing, all the environments provide a default implementation of the `RLBase` API, which means that you can easily wrap a game from `GridWorlds.jl` and use it directly with the rest of the `ReinforcementLearning.jl` ecosystem.
There are a few possible options for representing the state/observation for an environment. You can use the entire tile map. You can also augment that with other environment-specific information like the agent's direction, the target (in `GoToTargetUndirected`), etc. In several games, you can also use the `GW.get_sub_tile_map!` function to get a partial view of the tile map to be used as the observation.

All environments provide a default implementation of the `RLBase.state` function. Before performing reinforcement learning experiments using an environment, it is recommended that you carefully understand the information contained in the state representation for that environment.

As of this writing, all actions in all environments are discrete. To keep things simple and consistent, they are represented by elements of `Base.OneTo(NUM_ACTIONS)` (basically integers going from 1 to `NUM_ACTIONS`). In order to know which action does what, you can call `GW.get_action_names(env)` to get a list of names which gives a better description. For example:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_action_names(env)
(:MOVE_UP, :MOVE_DOWN, :MOVE_LEFT, :MOVE_RIGHT)
```

The order of elements in this list corresponds to that of the actions.
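Because actions are plain integers, it can help to keep a lookup between indices and names. The following is a minimal, package-free sketch. It hard-codes the tuple shown in the example above instead of calling `GW.get_action_names(env)`, and the names `action_names`, `action_space`, and `action_index` are illustrative, not part of GridWorlds.jl.

```julia
# Tuple copied from the example output above; in practice it would come from
# GW.get_action_names(env).
action_names = (:MOVE_UP, :MOVE_DOWN, :MOVE_LEFT, :MOVE_RIGHT)

# Actions are the integers 1, 2, ..., NUM_ACTIONS.
action_space = Base.OneTo(length(action_names))

# Hypothetical helper table: look up an action's integer by its name.
action_index = Dict(name => action for (action, name) in zip(action_space, action_names))

# Print every action next to its name.
for action in action_space
    println(action, " => ", action_names[action])
end
```

With such a table, `GW.act!(env, action_index[:MOVE_LEFT])` would be the same call as `GW.act!(env, 3)` for this environment.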
As mentioned before, in order to use rewards and termination signals in reinforcement learning experiments, you would mostly be using a higher level API like `RLBase`, which should already provide a way to get these values. For example, in `RLBase`, rewards can be accessed using `RLBase.reward(env)`, and checking whether an environment has terminated can be done by calling `RLBase.is_terminated(env)`. In case you are using some other API and need more direct control, it is better to take a look at the implementation of that environment to see how to access the reward and check for termination.
Each environment contains a tile map, which is a `BitArray{3}` that encodes information about the presence or absence of objects in the grid world. It is of size `(num_objects, height, width)`. The second and third dimensions correspond to positions along the height and width of the tile map. The first dimension is a multi-hot encoding of which objects are present at a particular position. You can get the name and ordering of objects along the first dimension of the tile map by using the following method:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_object_names(env)
(:AGENT, :WALL, :GOAL)
```
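To make the multi-hot layout concrete, here is a small package-free sketch that builds a `BitArray{3}` by hand. The room size, the object positions, and the helper `objects_at` are assumptions made for illustration. The object ordering is copied from the example output above, and none of this is taken from the package's internals.

```julia
# Object ordering copied from the example output above.
object_names = (:AGENT, :WALL, :GOAL)
height, width = 4, 5

# falses(...) with three dimensions returns a BitArray{3} filled with zeros.
tile_map = falses(length(object_names), height, width)

# Surround the room with walls (WALL is object 2 along the first dimension).
for i in 1:height, j in 1:width
    if i == 1 || i == height || j == 1 || j == width
        tile_map[2, i, j] = true
    end
end

tile_map[1, 2, 2] = true  # AGENT at row 2, column 2
tile_map[3, 3, 4] = true  # GOAL at row 3, column 4

# Hypothetical helper: names of all objects present at position (i, j).
objects_at(tile_map, i, j) =
    [object_names[k] for k in 1:length(object_names) if tile_map[k, i, j]]

println(objects_at(tile_map, 2, 2))
println(objects_at(tile_map, 1, 1))
```

A position can hold several objects at once, in which case more than one bit along the first dimension is set, and an empty cell has no bits set.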
Several environments contain the word `Undirected` or `Directed` within their name. This refers to the navigation style of the agent. `Undirected` means that the agent has no direction associated with it, and navigates around by directly moving up, down, left, or right on the tile map. `Directed` means that the agent has a direction associated with it, and it navigates around by moving forward or backward along its current direction, or by turning left or right with respect to its current direction. There are 4 directions: `UP`, `DOWN`, `LEFT`, and `RIGHT`.
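The difference between the two navigation styles can be sketched without the package. Everything below is illustrative: the helper names are hypothetical, and the sketch assumes that `UP` points toward smaller row indices of the tile map and that turning right rotates clockwise. Check an environment's implementation for its actual conventions.

```julia
# Directions listed clockwise so that turning is an index shift.
# Assumption: UP points toward smaller row indices of the tile map.
clockwise = (:UP, :RIGHT, :DOWN, :LEFT)
direction_index = Dict(d => i for (i, d) in enumerate(clockwise))

# (row, column) offset of a single step in each direction.
step_offset = Dict(:UP => (-1, 0), :RIGHT => (0, 1), :DOWN => (1, 0), :LEFT => (0, -1))

turn_right(direction) = clockwise[mod1(direction_index[direction] + 1, 4)]
turn_left(direction) = clockwise[mod1(direction_index[direction] - 1, 4)]
move_forward(position, direction) = position .+ step_offset[direction]
move_backward(position, direction) = position .- step_offset[direction]

# Undirected style: a single action moves the agent directly.
undirected_position = (3, 3) .+ step_offset[:LEFT]

# Directed style: the same displacement needs a turn followed by a move.
position, direction = (3, 3), :UP
direction = turn_left(direction)
position = move_forward(position, direction)
```

Both agents end up on the same tile, but the directed agent needs two actions and its state also carries a direction, which matters when choosing a state representation.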
All the environments can be played directly inside the REPL. These interactive sessions can also be recorded in plain text files and replayed in the terminal. There are two ways to replay a recording:
- The default way is to manually step through each recorded frame. This allows you to move through the frames one by one at your own pace using keyboard inputs.
- The second way is to replay the frames at a given frame rate. This would loop through all the frames once and then (and only then) exit the replay.
In order to programmatically record the behavior of an agent during an episode, you can simply log the string representation of the environment at each step, prefixed with a delimiter. You can also log other arbitrary information if you want, like the total reward so far. You can then use the `GW.replay` function to replay the recording inside the terminal. The string representation of an environment can be obtained using `repr(MIME"text/plain"(), env)`. Here is an example:
```julia
import GridWorlds as GW
import ReinforcementLearningBase as RLBase

game = GW.SingleRoomUndirectedModule.SingleRoomUndirected()
env = GW.RLBaseEnv(game)

frame_start_delimiter = "SOME_FRAME_START_DELIMITER"

total_reward = zero(RLBase.reward(env))
frame_number = 1

# record the initial frame
str = ""
str = str * frame_start_delimiter
str = str * "frame_number:$(frame_number)\n"
str = str * repr(MIME"text/plain"(), env)
str = str * "\ntotal_reward:$(total_reward)"

# take random actions until the episode ends, recording a frame after each step
while !RLBase.is_terminated(env)
    action = rand(RLBase.action_space(env))
    env(action)
    reward = RLBase.reward(env)

    global total_reward += reward
    global frame_number += 1

    global str = str * frame_start_delimiter
    global str = str * "frame_number:$(frame_number)\n"
    global str = str * repr(MIME"text/plain"(), env)
    global str = str * "\ntotal_reward:$(total_reward)"
end

write("recording.txt", str)

GW.replay(file_name = "recording.txt", frame_start_delimiter = frame_start_delimiter)
```
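To see how a delimiter-prefixed recording can be separated back into frames, here is a package-free sketch. It only illustrates the layout of the recorded string; it is not how `GW.replay` is implemented, and the frame contents are made-up placeholders.

```julia
frame_start_delimiter = "SOME_FRAME_START_DELIMITER"

# Placeholder frames standing in for repr(MIME"text/plain"(), env) output.
fake_frames = [
    "frame_number:1\n<env text>\ntotal_reward:0.0",
    "frame_number:2\n<env text>\ntotal_reward:1.0",
]

# Every frame is prefixed (not suffixed) with the delimiter.
recording = join(frame_start_delimiter .* fake_frames)

# Splitting on the delimiter recovers the frames. The piece before the first
# delimiter is empty, so empty pieces are dropped.
frames = split(recording, frame_start_delimiter; keepempty = false)

println(length(frames), " frames recovered")
```

The delimiter should be a string that cannot occur inside a frame, otherwise a split like this would cut a frame in two.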
In `ReinforcementLearning.jl`, you can create a hook for recording the agent's behavior at any point during training.
- SingleRoomUndirected, SingleRoomDirected, GridRoomsUndirected, GridRoomsDirected, SequentialRoomsUndirected, SequentialRoomsDirected, MazeUndirected, MazeDirected: The objective of the agent is to navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates.
- GoToTargetUndirected, GoToTargetDirected: The objective of the agent is to navigate its way to the desired target. When the agent reaches the desired target, it receives a reward of 1. When the agent reaches the other target, it receives a reward of -1. In either case, the environment terminates upon reaching a target.
- DoorKeyUndirected, DoorKeyDirected: The objective of the agent is to collect the key and navigate its way to the goal. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. Without picking up the key, the agent will not be able to pass through the door that separates the agent and the goal.
- CollectGemsUndirected, CollectGemsDirected: The objective of the agent is to collect all the randomly scattered gems. When the agent collects a gem, it receives a reward of 1. The environment terminates when the agent has collected all the gems.
- CollectGemsMultiAgentUndirected: The objective of the agents is to collect all the randomly scattered gems. The agents take turns performing actions. When an agent collects a gem, the environment gives a reward of 1. The environment terminates when the agents have collected all the gems.
- DynamicObstaclesUndirected, DynamicObstaclesDirected: The objective of the agent is to navigate its way to the goal while avoiding collision with obstacles. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. If the agent collides with an obstacle, the agent receives a reward of -1 and the environment terminates.
- SokobanUndirected, SokobanDirected: The agent needs to push the boxes onto the target positions. The levels are taken from https://github.com/deepmind/boxoban-levels. Upon each reset, a level is randomly selected from https://github.com/deepmind/boxoban-levels/blob/master/medium/train/000.txt. The level dataset can be dynamically swapped during runtime in case more levels are needed. One way to achieve this while using `ReinforcementLearning.jl` is with the help of hooks.
- Snake: The objective of the agent is to eat as many food pellets as possible. As soon as the agent eats a food pellet, the length of its body increases by one and it receives a reward of 1. When the agent tries to move into a wall or into its body, it receives a reward of `-tile_map_height * tile_map_width` and the environment terminates. When the agent collects all the food pellets possible, it receives a reward of `tile_map_height * tile_map_width` + 1 (for the last food pellet it ate).
- Catcher: The objective of the agent is to keep catching the falling gems for as long as possible. It receives a reward of 1 when it catches a gem and a new gem gets spawned in the next step. When the agent misses catching a gem, it receives a reward of -1 and the environment terminates.
- TransportUndirected, TransportDirected: The objective of the agent is to pick up the gem and drop it at the target location. When the agent drops the gem at the target location, it receives a reward of 1 and the environment terminates.
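As a worked instance of the Snake terminal rewards described above, the sketch below evaluates them for a hypothetical 8 by 8 tile map. The function names are illustrative and not part of GridWorlds.jl, and the value for clearing the board follows one reading of the description, namely that the bonus is added to the usual reward of 1 for the last pellet.

```julia
# Reward when the snake moves into a wall or into its own body.
collision_reward(tile_map_height, tile_map_width) = -(tile_map_height * tile_map_width)

# Reward on the step where the last possible food pellet is eaten:
# the board-size bonus plus 1 for that last pellet.
all_food_reward(tile_map_height, tile_map_width) = tile_map_height * tile_map_width + 1

# Hypothetical tile map size, chosen only for this example.
tile_map_height, tile_map_width = 8, 8

println(collision_reward(tile_map_height, tile_map_width))  # prints -64
println(all_food_reward(tile_map_height, tile_map_width))   # prints 65
```

Both terminal rewards scale with the area of the tile map, so their magnitude is large compared with the per-pellet reward of 1.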