Code for our USENIX Security 2021 paper -- CADE: Detecting and Explaining Concept Drift Samples for Security Applications


whyisyoung/CADE


The repository contains the code for detecting and explaining a specific type of concept drift (i.e., previously unseen families) in security applications like malware attribution and network intrusion classification.

Further details can be found in the paper "CADE: Detecting and Explaining Concept Drift Samples for Security Applications" by Limin Yang, Wenbo Guo, Qingying Hao, Arridhana Ciptadi, Ali Ahmadzadeh, Xinyu Xing, Gang Wang (USENIX Security 2021). Because of the page limit, we also include supplemental materials in the repo (USENIX_21_drifting_Supplementary_Materials.pdf). Check out http://liminyang.web.illinois.edu for up-to-date information on the project.

If you end up building on this research or code as part of a project or publication, please include a reference to the USENIX Security paper:

    @inproceedings{yang2021cade,
        title = {CADE: Detecting and Explaining Concept Drift Samples for Security Applications},
        author = {Yang, Limin and Guo, Wenbo and Hao, Qingying and Ciptadi, Arridhana and Ahmadzadeh, Ali and Xing, Xinyu and Wang, Gang},
        booktitle = {Proc. of USENIX Security},
        year = {2021}
    }

1. Installation

Before getting started, we recommend setting up a Python 3.6.5 or 3.6.8 virtual environment (other versions of Python 3.6 or above might also work, but we have not tested them).

  • If you are using CPU-based tensorflow, install all required packages:

    pip install -r requirements-tensorflow-cpu.txt
    python setup.py install
  • If you are using GPU-based tensorflow, please try the following steps to set it up:

    module load cuda-toolkit/9.0  # other versions might also work but are untested
    # you may also try pyenv and virtualenv to create the virtual environment, here we use Anaconda
    conda create -n cade-gpu python=3.6.8
    conda activate cade-gpu
    pip install scipy==1.3.3
    pip install numpy==1.16.1
    pip install --ignore-installed tensorflow-gpu==1.12.0
    pip install keras==2.2.5
    pip install scikit-learn==0.23.2
    pip install matplotlib==3.1.2
    pip install seaborn==0.11.0
    pip install tqdm==4.49.0
    python setup.py install

2. Configuration

The preprocessed Drebin and IDS2018 datasets can be found under the data folder. If you prefer to modify the preprocessing step, you may download the original datasets from https://www.sec.cs.tu-bs.de/~danarp/drebin/index.html and https://www.unb.ca/cic/datasets/ids-2018.html and fill out the configuration in cade/config.py.

3. Usage

There are a number of command line arguments to run our program:

    $ python main.py -h
    usage: main.py [-h] [--data DATA] [-c {mlp,rf}] [--stage {detect,explanation}]
                   [--pure-ae {0,1}] [--quiet {0,1}] [--cae-hidden CAE_HIDDEN]
                   [--cae-batch-size CAE_BATCH_SIZE] [--cae-lr CAE_LR]
                   [--cae-epochs CAE_EPOCHS] [--cae-lambda-1 CAE_LAMBDA_1]
                   [--similar-ratio SIMILAR_RATIO] [--margin MARGIN]
                   [--display-interval DISPLAY_INTERVAL]
                   [--mad-threshold MAD_THRESHOLD]
                   [--exp-method {distance_mm1,approximation_loose}]
                   [--exp-lambda-1 EXP_LAMBDA_1] [--mlp-retrain {0,1}]
                   [--mlp-hidden MLP_HIDDEN] [--mlp-batch-size MLP_BATCH_SIZE]
                   [--mlp-lr MLP_LR] [--mlp-epochs MLP_EPOCHS]
                   [--mlp-dropout MLP_DROPOUT] [--newfamily-label NEWFAMILY_LABEL]
                   [--tree TREE] [--rf-retrain {0,1}]

See cade/utils.py or run python main.py -h for detailed help. You may also check run_drebin_cade.sh for more examples.
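To make the shape of the command line easier to see, here is a minimal sketch that rebuilds a handful of the flags from the usage text with `argparse`. It is not the repository's parser (the real definitions live in cade/utils.py). The destination name `classifier` and the `int`/`float` types are assumptions made for illustration, and no defaults are set because the usage text does not show them.

```python
import argparse


def build_parser():
    """Build a parser for a small subset of the flags listed in the usage text."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--data", help="dataset setting, e.g. drebin_new_7")
    # The dest name "classifier" is hypothetical; the usage text only shows -c.
    parser.add_argument("-c", dest="classifier", choices=["mlp", "rf"])
    parser.add_argument("--stage", choices=["detect", "explanation"])
    # Types below are assumptions; the usage text only shows the allowed values.
    parser.add_argument("--pure-ae", type=int, choices=[0, 1])
    parser.add_argument("--mad-threshold", type=float)
    parser.add_argument("--exp-method",
                        choices=["distance_mm1", "approximation_loose"])
    return parser


if __name__ == "__main__":
    demo_args = ["--data", "drebin_new_7", "-c", "mlp",
                 "--stage", "detect", "--pure-ae", "0"]
    print(build_parser().parse_args(demo_args))
```

Values outside the listed choices (for example `--stage train`) are rejected by `argparse` before any work starts, which is the behaviour the `{...}` groups in the usage text imply.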

4. Examples

4.1 Drift detection

  1. To get the detection performance of CADE on the Drebin dataset (iteratively choose one family from 8 families as the unseen family):

    ./run_drebin_cade.sh
    # After the shell script finished running
    python -u average_all_detection_results.py drebin 0
    # 0 means using CADE, while 1 means using Vanilla AE
  2. To get the detection performance of CADE on the IDS2018 dataset (iteratively choose one family from 3 families as the unseen family):

    ./run_ids_cade.sh
    # After the shell script finished running
    python -u average_all_detection_results.py IDS 0
  3. To get the detection performance of Vanilla Autoencoder on the Drebin dataset:

    ./run_drebin_pure_ae.sh
    # After the shell script finished running
    python -u average_all_detection_results.py drebin 1
  4. To get the detection performance of Vanilla Autoencoder on the IDS2018 dataset:

    ./run_ids_pure_ae.sh
    # After the shell script finished running
    python -u average_all_detection_results.py IDS 1
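The detection experiments above rotate through the families, holding one out as the "unseen" family each time. A minimal sketch of that rotation is shown below. It is only an illustration of the protocol, not the code the shell scripts run: the function name `leave_one_family_out` is hypothetical, and how the remaining known families are further split into training and testing sets is not covered here.

```python
def leave_one_family_out(labels):
    """Yield (unseen_family, known_idx, unseen_idx) once per family.

    labels: a list with one family label per sample.
    known_idx: indices of samples from all other families.
    unseen_idx: indices of samples from the held-out family.
    """
    for unseen in sorted(set(labels)):
        known_idx = [i for i, y in enumerate(labels) if y != unseen]
        unseen_idx = [i for i, y in enumerate(labels) if y == unseen]
        yield unseen, known_idx, unseen_idx


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1, 2, 2, 2]  # three toy families
    for family, known_idx, unseen_idx in leave_one_family_out(toy_labels):
        print(family, len(known_idx), len(unseen_idx))
```

With 8 families (Drebin) this yields 8 settings and with 3 families (IDS2018) it yields 3, which is why the results are averaged afterwards.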

4.2 Drift explanation

  1. CADE explaining drift samples on the Drebin-Fakedoc setting (i.e., drebin_new_7):

    ./run_cade_exp_drebin_fakedoc.sh
    # It will generate reports/drebin_new_7/mask_distance_mm1_0.001.npz,
    # which is already provided.
    # This step is time-consuming and non-deterministic,
    # so we include the explanation output to save reproduction time and ease comparison.
  2. CADE explaining drift samples on the IDS2018-Infiltration setting:

    ./run_cade_exp_ids_infiltration.sh
    # It will generate reports/IDS_new_Infilteration/mask_distance_mm1_0.001.npz,
    # which is already provided.
  3. Boundary-based explanation on the Drebin-Fakedoc setting:

    ./run_boundary_exp_drebin_fakedoc.sh
    # It will generate reports/drebin_new_7/mask_approximation_loose_0.001.npz,
    # which is already provided.
  4. Boundary-based explanation on the IDS2018-Infiltration setting:

    ./run_boundary_exp_ids_infiltration.sh
    # It will generate reports/IDS_new_Infilteration/mask_approximation_loose_0.001.npz,
    # which is already provided.
  5. Compare CADE with boundary-based explanation and random explanation (using distance as the evaluation metric)

    1. Drebin-FakeDoc

        # 1. To get original distance and CADE distance
        python -u evaluate_explanation_by_distance.py drebin_new_7 distance_mm1 0.001 1 0.1
        # 2. To get random explanation distance
        python -u evaluate_explanation_by_distance.py drebin_new_7 random 0.001 0 0.1
        # since we randomly run 100 times, there might be minor differences in the output.
        # 3. To get boundary-based explanation distance
        python -u evaluate_explanation_by_distance.py drebin_new_7 approximation_loose 0.001 0 0.1
        # 4. To get gradient-based explanation distance
        nohup python -u evaluate_explanation_by_distance.py drebin_new_7 gradient 0.001 0 0.1 \
            > logs/nohup-drebin_new_7-gradient-exp.log &

    2. IDS2018-Infiltration

        # 1. To get original distance and CADE distance
        nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration distance_mm1 \
            0.001 1 0.1 > logs/nohup-IDS-distance-mm1-exp.log &
        # 2. To get random explanation distance
        nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration random \
            0.001 0 0.1 > logs/nohup-IDS-random-exp.log &
        # since we randomly run 100 times, there might be minor differences in the output.
        # 3. To get boundary-based explanation distance
        nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration \
            approximation_loose 0.001 0 0.1 > logs/nohup-IDS-boundary-exp.log &
        # 4. To get gradient-based explanation distance
        nohup python -u evaluate_explanation_by_distance.py IDS_new_Infilteration gradient \
            0.001 0 0.1 > logs/nohup-IDS-gradient-exp.log &
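The explanation steps in this section write their masks to `.npz` archives under reports/. This README does not describe which arrays those archives contain, so a quick way to find out is to list them with NumPy. The sketch below assumes only that the files are standard NumPy `.npz` archives; the helper name `summarize_npz` is hypothetical, and no array names are assumed.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    # np.load keeps allow_pickle=False by default, so archives holding
    # pickled object arrays would raise an error here instead of loading.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


if __name__ == "__main__":
    # Path taken from the Drebin-Fakedoc step; change it for other settings.
    mask_path = "reports/drebin_new_7/mask_distance_mm1_0.001.npz"
    if os.path.exists(mask_path):
        print(summarize_npz(mask_path))
    else:
        print("not found:", mask_path)
```

Running the same helper on the `mask_approximation_loose_0.001.npz` files gives a quick check that both explanation methods produced output of the same shape before comparing them.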

5. Contact

If you have any questions, please contact Limin (liminy2@illinois.edu).

6. Licensing

For ethical considerations, the code and data are covered by a modified BSD 3-Clause License, which restricts the use of the code to academic purposes and specifically prohibits commercial applications.

Any redistribution or use of this software must be limited to the purposes of non-commercial scientific research or non-commercial education. Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.

