whyisyoung/CADE

Code for our USENIX Security 2021 paper -- CADE: Detecting and Explaining Concept Drift Samples for Security Applications

CADE: Contrastive Autoencoder for Drifting detection and Explanation

This repository contains code for detecting and explaining a specific type of concept drift (i.e., previously unseen families) in security applications such as malware attribution and network intrusion classification.

Further details can be found in the paper "CADE: Detecting and Explaining Concept Drift Samples for Security Applications" by Limin Yang, Wenbo Guo, Qingying Hao, Arridhana Ciptadi, Ali Ahmadzadeh, Xinyu Xing, Gang Wang (USENIX Security 2021). Because of the paper's page limit, we also include supplementary materials in the repository (USENIX_21_drifting_Supplementary_Materials.pdf). Check out http://liminyang.web.illinois.edu for up-to-date information on the project.

If you end up building on this research or code as part of a project or publication, please include a reference to the USENIX Security paper:

```
@inproceedings{yang2021cade,
    title = {CADE: Detecting and Explaining Concept Drift Samples for Security Applications},
    author = {Yang, Limin and Guo, Wenbo and Hao, Qingying and Ciptadi, Arridhana and Ahmadzadeh, Ali and Xing, Xinyu and Wang, Gang},
    booktitle = {Proc. of USENIX Security},
    year = {2021}
}
```

1. Installation

Before getting started, we recommend setting up a Python 3.6.5 or 3.6.8 virtual environment (other Python 3.6+ versions might also work but have not been tested).

If you are using CPU-based TensorFlow, install all required packages:

```
pip install -r requirements-tensorflow-cpu.txt
python setup.py install
```

If you are using GPU-based TensorFlow, try the following steps to set up the environment:

```
module load cuda-toolkit/9.0  # other versions might also work but have not been tested
# You may also use pyenv and virtualenv to create the virtual environment; here we use Anaconda.
conda create -n cade-gpu python=3.6.8
conda activate cade-gpu
pip install scipy==1.3.3
pip install numpy==1.16.1
pip install --ignore-installed tensorflow-gpu==1.12.0
pip install keras==2.2.5
pip install scikit-learn==0.23.2  # "sklearn" on PyPI is only a placeholder name without this version
pip install matplotlib==3.1.2
pip install seaborn==0.11.0
pip install tqdm==4.49.0
python setup.py install
```

2. Configuration

The preprocessed Drebin and IDS2018 datasets can be found under the data folder. If you prefer to modify the preprocessing step, you may download the original datasets from https://www.sec.cs.tu-bs.de/~danarp/drebin/index.html and https://www.unb.ca/cic/datasets/ids-2018.html, and fill out the configuration in cade/config.py.

3. Usage

main.py accepts a number of command-line arguments:

```
$ python main.py -h
usage: main.py [-h] [--data DATA] [-c {mlp,rf}] [--stage {detect,explanation}]
               [--pure-ae {0,1}] [--quiet {0,1}] [--cae-hidden CAE_HIDDEN]
               [--cae-batch-size CAE_BATCH_SIZE] [--cae-lr CAE_LR]
               [--cae-epochs CAE_EPOCHS] [--cae-lambda-1 CAE_LAMBDA_1]
               [--similar-ratio SIMILAR_RATIO] [--margin MARGIN]
               [--display-interval DISPLAY_INTERVAL]
               [--mad-threshold MAD_THRESHOLD]
               [--exp-method {distance_mm1,approximation_loose}]
               [--exp-lambda-1 EXP_LAMBDA_1] [--mlp-retrain {0,1}]
               [--mlp-hidden MLP_HIDDEN] [--mlp-batch-size MLP_BATCH_SIZE]
               [--mlp-lr MLP_LR] [--mlp-epochs MLP_EPOCHS]
               [--mlp-dropout MLP_DROPOUT] [--newfamily-label NEWFAMILY_LABEL]
               [--tree TREE] [--rf-retrain {0,1}]
```

See cade/utils.py or run python main.py -h for detailed help. You may also check run_drebin_cade.sh for more examples.
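The --margin and --cae-lambda-1 options point to a contrastive training objective for the autoencoder. As a rough illustration only, the NumPy sketch below computes one common form of such a loss: reconstruction error plus lambda_1 times a pairwise term that pulls same-family embeddings together and pushes different-family embeddings at least margin apart. The function name, the default values, and the use of all pairs (rather than whatever pair sampling the repository performs, e.g. via --similar-ratio) are assumptions, not the code in cade/.

```python
import numpy as np


def contrastive_ae_loss(x, x_rec, z, y, margin=10.0, lambda_1=0.1):
    """Hypothetical sketch: reconstruction MSE + lambda_1 * contrastive term.

    x, x_rec: (n, d) inputs and their reconstructions
    z:        (n, k) latent embeddings
    y:        (n,)   integer family labels
    margin / lambda_1 play the roles suggested by --margin / --cae-lambda-1;
    the defaults here are placeholders, not the repository's values.
    """
    # Reconstruction term: mean squared error over all samples and features.
    rec = np.mean((x - x_rec) ** 2)

    # Pairwise Euclidean distances between latent embeddings.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    same = y[:, None] == y[None, :]

    # Keep each unordered pair (i < j) exactly once.
    iu = np.triu_indices(len(y), k=1)
    d, s = dist[iu], same[iu]

    # Same-family pairs: penalize their distance.
    # Different-family pairs: penalize only if closer than `margin`.
    pair_loss = np.where(s, d, np.maximum(0.0, margin - d))
    return float(rec + lambda_1 * np.mean(pair_loss))
```

Using every pair keeps the sketch short; with large batches, sampling a fixed share of similar and dissimilar pairs is the usual way to keep the pairwise term tractable.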

4. Examples

4.1 Drift detection

To get the detection performance of CADE on the Drebin dataset (each of the 8 families is held out in turn as the unseen family):

```
./run_drebin_cade.sh
# After the shell script finished running
python -u average_all_detection_results.py drebin 0  # 0 means using CADE, while 1 means using Vanilla AE
```
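The leave-one-family-out protocol described above can be sketched as follows. This is an illustrative helper (leave_one_family_out is a hypothetical name, not part of the repository); it only partitions sample indices by family and does not reproduce how the scripts assemble their training and test sets.

```python
import numpy as np


def leave_one_family_out(y):
    """Hypothetical sketch: yield (unseen_family, known_idx, unseen_idx).

    known_idx:  indices of samples from every other family (training side)
    unseen_idx: indices of samples from the held-out family, which are
                never seen during training and should be flagged as drift
    """
    y = np.asarray(y)
    for family in np.unique(y):
        unseen_idx = np.flatnonzero(y == family)
        known_idx = np.flatnonzero(y != family)
        yield int(family), known_idx, unseen_idx
```

For example, iterating over leave_one_family_out(labels) with the 8 Drebin family labels yields 8 splits, one per held-out family, whose per-split results are then averaged.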

To get the detection performance of CADE on the IDS2018 dataset (each of the 3 families is held out in turn as the unseen family):
```
./run_ids_cade.sh
# After the shell script finished running
python -u average_all_detection_results.py IDS 0
```

To get the detection performance of Vanilla Autoencoder on the Drebin dataset:

```
./run_drebin_pure_ae.sh
# After the shell script finished running
python -u average_all_detection_results.py drebin 1
```

To get the detection performance of Vanilla Autoencoder on the IDS2018 dataset:

```
./run_ids_pure_ae.sh
# After the shell script finished running
python -u average_all_detection_results.py IDS 1
```
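Detection is controlled by --mad-threshold, which suggests an outlier test based on the median absolute deviation (MAD). The sketch below illustrates that idea under stated assumptions: each known family is summarized by its latent centroid plus the median and MAD of its training samples' distances to that centroid, and a sample is flagged as drifting when its MAD-normalized distance exceeds the threshold for every family. The helper names, the 1.4826 scaling constant, and the 3.5 default are conventional choices for MAD outlier tests and are assumptions here, not values read from the repository.

```python
import numpy as np

MAD_SCALE = 1.4826  # assumption: makes MAD consistent with std for normal data


def fit_mad_stats(z_train, y_train):
    """Hypothetical sketch: per-family (centroid, median distance, MAD)."""
    stats = {}
    for fam in np.unique(y_train):
        z_fam = z_train[y_train == fam]
        centroid = z_fam.mean(axis=0)
        d = np.linalg.norm(z_fam - centroid, axis=1)
        med = np.median(d)
        mad = MAD_SCALE * np.median(np.abs(d - med))
        stats[fam] = (centroid, med, mad)
    return stats


def is_drift(z, stats, threshold=3.5, eps=1e-12):
    """Flag z as drift if it is an outlier with respect to every known family."""
    scores = []
    for centroid, med, mad in stats.values():
        d = np.linalg.norm(z - centroid)
        # Anomaly index: deviation of d from the family's typical distance,
        # measured in (scaled) MAD units; eps guards against MAD == 0.
        scores.append(abs(d - med) / (mad + eps))
    return bool(min(scores) > threshold)
```

Taking the minimum over families means a sample is only reported as drift when no known family can plausibly account for it.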

4.2 Drift explanation

To explain drift samples with CADE in the Drebin-FakeDoc setting (i.e., drebin_new_7):

```
./run_cade_exp_drebin_fakedoc.sh
# It will generate reports/drebin_new_7/mask_distance_mm1_0.001.npz,
# which is already provided.
# This step is time-consuming and non-deterministic,
# so we include the explanation output for saving reproduction time and easier comparison.
```

To explain drift samples with CADE in the IDS2018-Infiltration setting:

```
./run_cade_exp_ids_infiltration.sh
# It will generate reports/IDS_new_Infilteration/mask_distance_mm1_0.001.npz,
# which is already provided.
```

Boundary-based explanation in the Drebin-FakeDoc setting:

```
./run_boundary_exp_drebin_fakedoc.sh
# It will generate reports/drebin_new_7/mask_approximation_loose_0.001.npz,
# which is already provided.
```

Boundary-based explanation in the IDS2018-Infiltration setting:

```
./run_boundary_exp_ids_infiltration.sh
# It will generate reports/IDS_new_Infilteration/mask_approximation_loose_0.001.npz,
# which is already provided.
```
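Each explanation script writes its mask to an .npz file under reports/. The names of the arrays stored in these files are not documented in this README, so a safe first step is to list them. The helper below (describe_npz, a hypothetical name) relies only on numpy.load; allow_pickle is enabled in case a file stores object arrays, so use it only on files you trust, such as those produced by the scripts above.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file.

    allow_pickle=True is needed if the file contains object arrays;
    only enable it for files you generated yourself or otherwise trust.
    """
    with np.load(path, allow_pickle=True) as data:
        return {name: data[name].shape for name in data.files}
```

For example, describe_npz("reports/drebin_new_7/mask_distance_mm1_0.001.npz") reports which arrays the provided CADE explanation output contains before you load any of them.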

Compare CADE with boundary-based, random, and gradient-based explanations (using distance as the evaluation metric):

Drebin-FakeDoc

```
# 1. To get original distance and CADE distance
python -u evaluate_explanation_by_distance.py drebin_new_7 distance_mm1 0.001 1 0.1
# 2. To get random explanation distance
python -u evaluate_explanation_by_distance.py drebin_new_7 random 0.001 0 0.1
# Since we run it randomly 100 times, the output may differ slightly.
# 3. To get boundary-based explanation distance
python -u evaluate_explanation_by_distance.py drebin_new_7 approximation_loose 0.001 0 0.1
# 4. To get gradient-based explanation distance
mkdir -p logs  # make sure the log directory exists before redirecting output
nohup python -u evaluate_explanation_by_distance.py drebin_new_7 gradient 0.001 0 0.1 \
    > logs/nohup-drebin_new_7-gradient-exp.log &
```

IDS2018-Infiltration

```
mkdir -p logs  # make sure the log directory exists before redirecting output
# 1. To get original distance and CADE distance
nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration distance_mm1 \
    0.001 1 0.1 > logs/nohup-IDS-distance-mm1-exp.log &
# 2. To get random explanation distance
nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration random \
    0.001 0 0.1 > logs/nohup-IDS-random-exp.log &
# Since we run it randomly 100 times, the output may differ slightly.
# 3. To get boundary-based explanation distance
nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration \
    approximation_loose 0.001 0 0.1 > logs/nohup-IDS-boundary-exp.log &
# 4. To get gradient-based explanation distance
nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration gradient \
    0.001 0 0.1 > logs/nohup-IDS-gradient-exp.log &
```
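The comparison above uses latent-space distance as the metric. The following is a minimal sketch of one way such a metric can be computed, assuming an explanation is a per-feature importance mask: replace the drift sample's k highest-ranked features with values from a reference sample, then measure how far the result's embedding lies from a family centroid. The random baseline averages the same measurement over 100 random feature choices, matching the 100 runs mentioned above. The encode callable, the reference sample, the choice of k, and both helper names are hypothetical; evaluate_explanation_by_distance.py may differ in any of these details.

```python
import numpy as np


def distance_after_explanation(x, x_ref, mask, encode, centroid, k):
    """Hypothetical sketch: latent distance after swapping the top-k features.

    encode:   callable mapping a feature vector to the latent space (assumed)
    mask:     per-feature importance scores; higher means more important
    x_ref:    reference sample supplying replacement feature values (assumed)
    """
    top_k = np.argsort(mask)[::-1][:k]  # indices of the k largest mask values
    x_new = x.copy()
    x_new[top_k] = x_ref[top_k]
    return float(np.linalg.norm(encode(x_new) - centroid))


def random_baseline(x, x_ref, encode, centroid, k, runs=100, seed=0):
    """Average the same distance over `runs` random choices of k features."""
    rng = np.random.default_rng(seed)
    dists = []
    for _ in range(runs):
        idx = rng.choice(len(x), size=k, replace=False)
        x_new = x.copy()
        x_new[idx] = x_ref[idx]
        dists.append(np.linalg.norm(encode(x_new) - centroid))
    return float(np.mean(dists))
```

Under this reading, a better explanation yields a smaller distance than the random baseline for the same k, because its top-ranked features are the ones that actually separate the drift sample from the reference family.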

5. Contact

If you have any questions, please contact Limin (liminy2@illinois.edu).

6. Licensing

For ethical considerations, the code and data are covered by a modified BSD 3-Clause License, which restricts the use of the code to academic purposes and specifically prohibits commercial applications.

Any redistribution or use of this software must be limited to the purposes of non-commercial scientific research or non-commercial education. Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.
