RL_Cartpole
Implementation of the Q-learning and SARSA algorithms to solve the CartPole-v1 environment. [Advanced Machine Learning project - UniGe]
This project implements the Q-learning and SARSA algorithms to solve the CartPole-v1 environment from OpenAI Gym. The Q-learning algorithm learns an optimal action-value function, while the SARSA algorithm learns an action-value function based on the current policy. The goal is to balance a pole on a cart by applying appropriate forces.
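The core difference between the two algorithms is the bootstrap target of the temporal-difference update. Below is a minimal sketch of the two update rules, assuming a tabular Q indexed by a discretized state and an action; the variable names are illustrative and not taken from this repository's code.

```python
import numpy as np

# Illustrative tabular updates (not the repository's exact code).
# Q is a NumPy array indexed as Q[state][action] for a discretized state.

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the greedy (max-value) action in the next state.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action actually taken by the behavior policy.
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
```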
- Clone the repository or download the source code files.
git clone git@github.com:ErfanFathi/RL_Cartpole.git
- Install the required packages.
pip3 install -r requirements.txt
- Run the script with the desired parameters. Use the following command to see the available options:
python3 main.py --help
This script uses command-line arguments to configure the learning parameters and other settings. You can specify the following options:
- --algorithm: The algorithm to use for learning. Valid options are q_learning and sarsa. Default is q_learning.
- --alpha: The learning rate. Default is 0.1.
- --gamma: The discount factor. Default is 0.995.
- --epsilon: The probability of choosing a random action. Default is 0.1.
- --num_episodes: The number of episodes to run. Default is 1000.
- --num_steps: The maximum number of steps per episode. Default is 500.
- --num_bins: The number of bins to use for discretizing the state space. Default is 20 (see the discretization sketch after the example below).
For example:
python3 main.py --algorithm q_learning --alpha 0.2 --gamma 0.99 --num_episodes 2000
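The --num_bins option controls how CartPole's continuous observation (cart position, cart velocity, pole angle, pole angular velocity) is mapped to a discrete state that a tabular method can index. Below is a minimal sketch of one common way to do this; the clipping bounds and the helper name discretize are illustrative assumptions, not necessarily what main.py implements.

```python
import numpy as np

# Assumed clipping bounds per observation dimension (illustrative values only).
STATE_BOUNDS = [(-4.8, 4.8), (-3.0, 3.0), (-0.418, 0.418), (-3.5, 3.5)]

def discretize(observation, num_bins=20):
    """Map a continuous CartPole observation to a tuple of bin indices."""
    indices = []
    for value, (low, high) in zip(observation, STATE_BOUNDS):
        clipped = np.clip(value, low, high)
        # Scale into [0, num_bins - 1] and truncate to an integer bin index.
        idx = int((clipped - low) / (high - low) * (num_bins - 1))
        indices.append(idx)
    return tuple(indices)
```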
- The script will execute the chosen algorithm on the CartPole-v1 environment. It will print the name of the generated file containing the results.
- After the execution, a plot of the rewards obtained during the learning process will be saved in the plots directory as a PNG file.
- Additionally, frames of the agent's behavior will be rendered and saved as a GIF file in the videos directory. This provides a visual representation of the learned policy (one way to do this is sketched below).
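For reference, a rollout of the learned policy can be rendered and written to a GIF roughly as follows. This is a hedged sketch assuming the gym >= 0.26 API (render_mode="rgb_array", five-value step) and the imageio package; Q and discretize stand in for the trained table and the discretization helper from the sketch above and are not the repository's actual names.

```python
import os
import gym
import imageio
import numpy as np

# Reuses the discretize() sketch above; Q stands in for the trained table
# (zero-initialized here only so the snippet runs on its own).
env = gym.make("CartPole-v1", render_mode="rgb_array")
num_bins = 20
Q = np.zeros((num_bins,) * env.observation_space.shape[0] + (env.action_space.n,))

obs, _ = env.reset()
frames, done = [], False
while not done:
    frames.append(env.render())                             # RGB frame (NumPy array)
    action = int(np.argmax(Q[discretize(obs, num_bins)]))   # greedy w.r.t. the table
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()

os.makedirs("videos", exist_ok=True)
imageio.mimsave("videos/cartpole.gif", frames)               # write frames to a GIF
```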
Feel free to use and modify this code, and please feel free to fork it on GitHub and send pull requests.
Report any comments or bugs to:
fathierfan97@gmail.com
Regards,
Erfan Fathi