davide97l/rl-policies-attacks-defenses


Adversarial attacks on Deep Reinforcement Learning (RL)

License

MIT license


Folders and files

a2c_ppo_acktr/
baselines/
benchmark/
drl_attacks/
drl_defenses/
img_attacks/
img_defenses/
log/
log_2/
log_adv_policy/
log_def/PongNoFrameskip-v4/
log_perturbation_benchmark/PongNoFrameskip-v4/dqn/
net/
notebook/
results/
.gitignore
LICENSE
README.md
__init__.py
atari_a2c_ppo.py
atari_adversarial_training_a2c_ppo.py
atari_adversarial_training_dqn.py
atari_dqn.py
atari_wrapper.py
example.ipynb
requirements.txt
utils.py


Reinforcement Learning Adversarial Attacks and Defenses

(Figures: DQN policy | Strategically-timed attack | Uniform attack | Adversarial training)

This repository implements several classic adversarial attack methods against deep reinforcement learning agents (drl_attacks/):

Uniform attack [link].
Strategically-timed attack [link].
Critical point attack [link].
Critical strategy attack.
Adversarial policy attack [link].
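The uniform and strategically-timed attacks differ mainly in *when* they perturb an observation. The sketch below illustrates that decision rule in plain Python; it is not the code in drl_attacks/. It assumes the usual strategically-timed criterion (Lin et al., 2017): attack only when the action preference c(s), the gap between the most and least likely action probabilities, reaches a threshold `beta`. The names `should_attack`, `action_preference` and `beta` are hypothetical, and for a DQN agent the probabilities are assumed to come from a softmax over Q-values.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of action logits (or Q-values).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_preference(logits):
    # c(s): gap between the most and least preferred action probabilities.
    probs = softmax(logits)
    return max(probs) - min(probs)


def should_attack(logits, mode="uniform", beta=0.5):
    """Hypothetical helper deciding whether to perturb the current observation.

    mode="uniform": attack at every time step.
    mode="timed":   attack only when c(s) >= beta (strategically-timed).
    """
    if mode == "uniform":
        return True
    if mode == "timed":
        return action_preference(logits) >= beta
    raise ValueError(f"unknown mode: {mode}")


# Example: three time steps of toy action logits.
episode_logits = [[2.0, 0.1, 0.1], [0.3, 0.2, 0.25], [4.0, 0.0, 0.0]]
attacked_steps = [t for t, q in enumerate(episode_logits)
                  if should_attack(q, mode="timed", beta=0.5)]
print(attacked_steps)  # -> [0, 2]
```

The timed variant skips step 1, where the agent is nearly indifferent between actions, so the attacker spends its perturbation budget only on steps where the agent has a strong preference.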

The following RL defense method is also available (drl_defenses/):

Adversarial training [link].
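Adversarial training mixes perturbed observations into training so the model learns to keep its decision under attack. The following is a minimal sketch of that idea on a hypothetical toy binary classifier (a logistic model standing in for the agent's action choice), using FGSM perturbations with probability `adv_ratio`. It is not the repository's DQN/A2C/PPO training loop, and all function names and hyperparameters here are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    # Probability of class 1 under a linear logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    # For binary cross-entropy, d loss / d x = (p - y) * w.
    p = predict(w, b, x)
    return [min(1.0, max(0.0, xi + eps * sign((p - y) * wi))) for xi, wi in zip(x, w)]


def adversarial_train(data, eps=0.1, lr=0.5, epochs=50, adv_ratio=0.5, seed=0):
    """Hypothetical adversarial training loop: SGD on a mix of clean and FGSM inputs."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # With probability adv_ratio, train on the adversarial version of x.
            if rng.random() < adv_ratio:
                x = fgsm(w, b, x, y, eps)
            p = predict(w, b, x)
            # One SGD step on binary cross-entropy.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
            b -= lr * (p - y)
    return w, b
```

The `eps` used during training plays the same role as the `--eps` flag of the training scripts: it bounds how far each input may be pushed, so it sets the strength of the attack the model is hardened against.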

Some image-defense methods are also provided (img_defenses/):

JPEG conversion [link].
Bit squeezing [link].
Image smoothing [link].
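Two of these input transformations are simple enough to sketch directly. The code below is an illustrative version, not the code in img_defenses/: bit squeezing quantizes each pixel (assumed to lie in [0, 1]) to 2**bits levels, and smoothing is shown as a k x k median filter, which is one common choice but only an assumption about what the repository uses. JPEG conversion is omitted because it needs an image codec such as Pillow.

```python
def bit_squeeze(img, bits=4):
    # Quantize each pixel in [0, 1] to 2**bits evenly spaced levels.
    levels = 2 ** bits - 1
    return [[round(p * levels) / levels for p in row] for row in img]


def median_smooth(img, k=3):
    # k x k median filter on a 2-D grayscale image; borders clamp coordinates.
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = sorted(
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out
```

Both transforms remove small, high-frequency changes: a perturbation of size eps smaller than half a quantization step is rounded away by `bit_squeeze`, and an isolated outlier pixel is replaced by its neighbourhood median.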

Most of this project is based on the RL framework tianshou, which is built on PyTorch. Image adversarial attacks and defenses are implemented with advertorch, also based on PyTorch. A2C and PPO policies are based on pytorch-a2c-ppo-acktr-gail, while DQN uses the tianshou implementation. Any image adversarial attack is compatible with this project.

Available models

Trained models for different tasks are also available in the folder log. The following table reports their average score for three algorithms: DQN, A2C and PPO (NA: no model available).

task	DQN	A2C	PPO
PongNoFrameskip-v4	20	20	21
BreakoutNoFrameskip-v4	349	400	470
EnduroNoFrameskip-v4	751	NA	1064
QbertNoFrameskip-v4	4382	7762	14580
MsPacmanNoFrameskip-v4	2787	2230	1929
SpaceInvadersNoFrameskip-v4	640	856	1120
SeaquestNoFrameskip-v4	NA	1610	1798

Defended models are saved in the folder log_def. The average reward is reported as X/Y, where X is the reward under clean observations and Y is the reward under adversarial observations generated with the uniform attack.

task	DQN (AdvTr)	A2C (AdvTr)	PPO (AdvTr)
PongNoFrameskip-v4	19.6/19.4	18.8/17.9	19.7/18.7

Image adversarial attacks effectiveness

The following table shows the success ratio of some common image adversarial attack methods on observations taken from different Atari game environments. (U) and (T) indicate that the attacks were performed in the untargeted and targeted settings, respectively. The victim agent is a PPO model.

GSM: Gradient Sign Method (eps=0.01) [link]
PGDA: Projected Gradient Descent Attack (eps=0.01, iter=100) [link]
CW: Carlini & Wagner (iter=100) [link]
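The gradient sign and PGD attacks can be sketched without any deep learning library on a hypothetical linear softmax policy, where the input gradient of the cross-entropy has the closed form W^T (p - onehot(label)). In an RL setting there is no ground-truth label, so this sketch assumes the untargeted attack uses the agent's current action as `label`, while the targeted attack uses the desired action. This PGD variant starts from the clean observation (no random start); the table above was produced with advertorch's implementations, not with this code.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    # Toy linear policy: one row of weights per action.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def act(W, x):
    z = logits(W, x)
    return max(range(len(z)), key=z.__getitem__)


def input_grad(W, x, label):
    # Gradient of cross-entropy(softmax(W x), label) w.r.t. x: W^T (p - onehot(label)).
    p = softmax(logits(W, x))
    p[label] -= 1.0
    return [sum(p[a] * W[a][j] for a in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return min(hi, max(lo, v))


def fgsm(W, x, label, eps, targeted=False):
    # Untargeted: push away from `label` (the agent's current action).
    # Targeted: pull toward `label` (the desired action).
    d = -1.0 if targeted else 1.0
    g = input_grad(W, x, label)
    return [clip(v + d * eps * sign(gj), 0.0, 1.0) for v, gj in zip(x, g)]


def pgd(W, x, label, eps, alpha, iters, targeted=False):
    d = -1.0 if targeted else 1.0
    adv = list(x)
    for _ in range(iters):
        g = input_grad(W, adv, label)
        adv = [a + d * alpha * sign(gj) for a, gj in zip(adv, g)]
        # Project onto the L-infinity ball of radius eps around x and onto [0, 1].
        adv = [clip(clip(a, x0 - eps, x0 + eps), 0.0, 1.0) for a, x0 in zip(adv, x)]
    return adv
```

The single-step attack spends the whole budget eps in one move, while PGD takes `iters` smaller steps of size `alpha` and re-projects after each one, which is why it usually finds stronger targeted perturbations at the same eps.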

environment	GSM (U)	GSM (T)	PGDA (T)	CW (T)
PongNoFrameskip-v4	1	0.5	0.99	0.72
BreakoutNoFrameskip-v4	0.98	0.4	0.83	0.47
EnduroNoFrameskip-v4	1	0.34	0.37	0.3
QbertNoFrameskip-v4	1	0.34	0.5	0.47
MsPacmanNoFrameskip-v4	1	0.45	0.35	0.34
SpaceInvadersNoFrameskip-v4	0.99	0.54	0.67	0.26
SeaquestNoFrameskip-v4	1	0.8	0.5	0.4
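Assuming the standard definitions, the success ratios in the table can be read as follows: an untargeted attack succeeds when the perturbed observation changes the agent's action, and a targeted attack succeeds when it produces the chosen target action. The helper below (a hypothetical name, not part of the repository) computes both.

```python
def success_ratio(clean_actions, adv_actions, targets=None):
    """Fraction of observations on which the attack succeeded.

    Untargeted (targets is None): success if the action changed.
    Targeted: success if the perturbed observation yields the target action.
    """
    if len(clean_actions) != len(adv_actions):
        raise ValueError("clean_actions and adv_actions must have the same length")
    if not adv_actions:
        return 0.0
    if targets is None:
        hits = sum(c != a for c, a in zip(clean_actions, adv_actions))
    else:
        hits = sum(a == t for a, t in zip(adv_actions, targets))
    return hits / len(adv_actions)
```

For example, `success_ratio([0, 1, 2, 3], [1, 1, 0, 3])` is 0.5, because the action changed on two of the four observations.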

Usage

Before using this repository, install the libraries listed in the requirements.txt file.

  pip install -r requirements.txt

Train DQN agent to play Pong.

  python atari_dqn.py --task "PongNoFrameskip-v4"

Train A2C agent to play Breakout.

  python atari_a2c_ppo.py --env-name "BreakoutNoFrameskip-v4" --algo a2c

Test DQN agent playing Pong.

  python atari_dqn.py --resume_path "log/PongNoFrameskip-v4/dqn/policy.pth" --watch --test_num 10 --task "PongNoFrameskip-v4"

Test A2C agent playing Breakout.

  python atari_a2c_ppo.py --env-name "BreakoutNoFrameskip-v4" --algo a2c --resume_path "log/BreakoutNoFrameskip-v4/a2c/policy.pth" --watch --test_num 10

Train a malicious DQN agent that plays Pong to minimize its score.

  python atari_dqn.py --task "PongNoFrameskip-v4" --invert_reward --epoch 1
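Conceptually, the `--invert_reward` flag turns score maximization into score minimization. One way to express that idea is to negate every reward the environment returns, as in the sketch below. This is an assumption about the mechanism, not the repository's implementation; `InvertRewardWrapper` and `ToyEnv` are hypothetical, and the classic Gym 4-tuple `step` API is assumed. With the real Gym library, subclassing `gym.RewardWrapper` and overriding its `reward()` method is the usual route.

```python
class InvertRewardWrapper:
    """Hypothetical wrapper: negates each reward so that an agent maximizing
    return learns to minimize the original game score."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Assumes the classic Gym 4-tuple step API.
        obs, reward, done, info = self.env.step(action)
        return obs, -reward, done, info


class ToyEnv:
    """Hypothetical 3-step environment: action 1 scores +1, anything else -1."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else -1.0
        return self.t, reward, self.t >= 3, {}


env = InvertRewardWrapper(ToyEnv())
env.reset()
print(env.step(1))  # -> (1, -1.0, False, {})
```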

Defend the Pong DQN agent with adversarial training.

  python atari_adversarial_training_dqn.py --task "PongNoFrameskip-v4" --resume_path "log/PongNoFrameskip-v4/dqn/policy.pth" --logdir log_def --eps 0.01 --image_attack fgm

Test defended Pong DQN agent.

  python atari_adversarial_training_dqn.py --task "PongNoFrameskip-v4" --resume_path "log_def/PongNoFrameskip-v4/dqn/policy.pth" --eps 0.01 --image_attack fgm --target_model_path log/PongNoFrameskip-v4/dqn/policy.pth --watch --test_num 10

To understand how to perform adversarial attacks, refer to the example.ipynb file and to the benchmark examples in the folder benchmark. More command examples can be found on the following page.

Test attack transferability over policies

This section compares the performance of different adversarial attack methods when attacking a DQN agent directly and when attacking it through 3 surrogate agents: one trained with the same algorithm and the others trained with different algorithms.

(Figures: Uniform | Strategically-timed | Critical point | Adversarial policy)
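Transferability measures how often a perturbation crafted on a surrogate model also changes the victim's action. The sketch below illustrates the measurement with hypothetical toy linear policies and a single FGSM step; `fgsm_linear` and `transfer_rate` are illustrative names, not the repository's API. A surrogate whose gradients align with the victim's transfers perfectly, while one that relies on features the victim ignores does not transfer at all.

```python
import math


def act(W, x):
    z = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(z)), key=z.__getitem__)


def fgsm_linear(W, x, eps):
    # Untargeted FGSM on cross-entropy w.r.t. the model's own action.
    z = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    p = [v / s for v in e]
    p[act(W, x)] -= 1.0
    g = [sum(p[a] * W[a][j] for a in range(len(W))) for j in range(len(x))]
    return [min(1.0, max(0.0, v + eps * ((gj > 0) - (gj < 0)))) for v, gj in zip(x, g)]


def transfer_rate(victim, surrogate, observations, eps):
    """Fraction of observations whose victim action changes when the
    perturbation is crafted on `surrogate` (white-box if surrogate is victim)."""
    flipped = 0
    for x in observations:
        x_adv = fgsm_linear(surrogate, x, eps)
        flipped += act(victim, x_adv) != act(victim, x)
    return flipped / len(observations)


victim = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
unrelated = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]  # only uses feature 2
obs = [[0.55, 0.45, 0.5], [0.45, 0.55, 0.5]]
print(transfer_rate(victim, victim, obs, 0.1))     # -> 1.0
print(transfer_rate(victim, unrelated, obs, 0.1))  # -> 0.0
```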

Test attack transferability over defended policies

This section compares the performance of different adversarial attack methods when attacking a DQN agent defended with adversarial training directly and when attacking it through 3 surrogate agents: one trained with the same algorithm and the others trained with different algorithms. The model was adversarially trained with eps=0.1, but we attack it with eps=0.5 to show a significant performance degradation.

(Figures: Uniform | Strategically-timed)

Perturbation benchmark on defended policies

This section tests the performance of different image attack methods on the observations of a DQN agent defended with different defense methods, over different values of epsilon. Image attacks:

FGSM [link]
PGD: Projected Gradient Descent [link]
MI: Momentum Iterative [link]
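The momentum iterative attack differs from PGD by accumulating a momentum term over L1-normalized gradients and stepping along its sign, following the MI-FGSM update of Dong et al. (2018): g ← mu·g + grad/‖grad‖₁, x ← clip(x + alpha·sign(g)) with alpha = eps/iters. The sketch below applies it to a hypothetical toy linear policy; it is an illustration of the update rule, not the implementation used for the benchmark, and `mi_fgsm` and `mu` are illustrative names.

```python
import math


def input_grad(W, x, label):
    # Gradient of cross-entropy(softmax(W x), label) w.r.t. x for a linear policy.
    z = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    p = [v / s for v in e]
    p[label] -= 1.0
    return [sum(p[a] * W[a][j] for a in range(len(W))) for j in range(len(x))]


def mi_fgsm(W, x, label, eps, iters=10, mu=1.0):
    """Untargeted momentum iterative FGSM; `label` is the agent's current action."""
    alpha = eps / iters
    g_acc = [0.0] * len(x)
    adv = list(x)
    for _ in range(iters):
        grad = input_grad(W, adv, label)
        l1 = sum(abs(v) for v in grad) or 1.0  # avoid division by zero
        # Momentum accumulation over L1-normalized gradients.
        g_acc = [mu * ga + gv / l1 for ga, gv in zip(g_acc, grad)]
        adv = [a + alpha * ((ga > 0) - (ga < 0)) for a, ga in zip(adv, g_acc)]
        # Stay within eps of the clean observation and within [0, 1].
        adv = [min(1.0, max(0.0, min(x0 + eps, max(x0 - eps, a)))) for a, x0 in zip(adv, x)]
    return adv
```

The momentum term stabilizes the update direction across iterations, which is the property usually credited for MI's better transfer between models.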

(Figures: FGSM adv training | PGD adv training | JPEG conversion | Bit squeezing)

Support

If you find this project interesting, please support it by giving it a ⭐. I would really appreciate it 😀
