neu-spiral/CaP2

 
 


This codebase contains the implementation of Communication-Aware DNN Pruning (INFOCOM 2023).

Introduction

We propose the Communication-aware Pruning (CaP) algorithm, a novel distributed inference framework for distributing DNN computations across a physical network. Departing from conventional pruning methods, CaP takes the physical network topology into consideration and produces DNNs that are communication-aware, designed for both accurate and fast execution over such a distributed deployment. Our experiments on CIFAR-10 and CIFAR-100, two deep learning benchmark datasets, show that CaP beats state-of-the-art competitors by up to 4% in accuracy. In experiments over real-world scenarios, it simultaneously reduces total execution time by 27% to 68% with a negligible performance decrease (less than 1%).

Environment Setup

Please install Python 3.9.x or 3.10.x and create a virtual environment using the requirements.txt file.
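
For example, on Linux/macOS, a minimal setup might look like the sketch below (it assumes python3.10 is on your PATH and uses venv as the environment name; adjust both to your system):

```bash
# Create and activate a virtual environment (Python 3.9.x works as well)
python3.10 -m venv venv
source venv/bin/activate

# Install the pinned dependencies
pip install -r requirements.txt
```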

Instructions

We provide a sample bash script that runs our method at a 0.75 sparsity ratio on CIFAR-10.

To run CaP:

```bash
source env.sh
run-cifar10-resnet18.sh
```

Split Network Emulation

Model inference over a network can be emulated by starting multiple threads on a local or remote machine. For Windows users, it is assumed that WSL or another bash emulator is installed. The procedure for a run is as follows:

  1. Train and prune the model.
  2. Save the model in the assets/models folder.
  3. Create the network setup files (see config/resnet_4_network for a complete example, and the sketch after this list):
    i. config-leaf.json -- how to reach (via IP and port) the leaf nodes of the network for input transmission
    ii. ip-map.json -- the server (IP and port) that each node monitors for incoming connections
    iii. network-graph.json -- the network graph topology
  4. Update local_network/start_servers.sh and local_network/start_server_helper.bat (Windows) or local_network/start_servers_linux.sh (Linux) with the following:
    i. file paths to the network setup files
    ii. the python environment activation
    iii. the terminal/bash emulator (e.g. gnome-terminal, terminator, etc.)
    iv. the model name
  5. Set up the servers (from the ./CaP directory):

```bash
# Windows (with WSL)
bash local_network/start_servers.sh
# Linux
bash local_network/start_servers_linux.sh
```

  6. Activate the python environment.
  7. Send inputs:

```bash
# Windows
python -m source.utils.send_start_message [path to config-leaf.json]
# Linux
python ./source/utils/send_start_message.py [path to config-leaf.json]
```
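
For illustration only, a minimal config-leaf.json might look like the sketch below. The field names and values here are assumptions made for readability, not the actual schema; treat the files in config/resnet_4_network as the authoritative reference.

```json
{
  "leaf-nodes": [
    {"node": 0, "ip": "127.0.0.1", "port": 9000},
    {"node": 1, "ip": "127.0.0.1", "port": 9001}
  ]
}
```

ip-map.json and network-graph.json follow the same idea: the former lists the (IP, port) pair each node monitors for incoming connections, and the latter encodes the topology of the network graph.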

Outputs will appear in the logs/[dir log out] folder specified in the start-servers script. Post-processing and visualization tools can be found in sandbox/plot_timing.ipynb.

Split Network Inference on Colosseum

An example Colosseum run procedure follows. It assumes the full and split model files have already been loaded onto the file-proxy server at /share/nas/[team name]/CaP-Models/perm (TODO: generalize, add detail, and verify this works):

  1. Connect to the VPN via Cisco.
  2. Make a reservation. Use the CAP-wifi-v1 container for WiFi nodes and JARVIS-server-cap1 for server nodes [TODO: make and test a container for UE and base station nodes].
  3. While waiting for the SRN nodes to spin up, modify the bash scripts in the CaP/colosseum folder:
    i. Manually configure colosseum/nodes.txt with the correct SRN numbers.
    ii. In prep_run.sh, update the leaf node connection type [WARNING: not tested for heterogeneous networks] and the RF scenario (see the Colosseum documentation for more details).
    iii. In ./start_servers_colosseum.sh, select the model file and batch size, and specify the log output directory name.
    iv. In ./start_run.sh, uncomment/comment commands based on the SRN type.
  4. Open a bash session in the CaP/colosseum folder.
  5. Move the repo to the SRN nodes, start the RF scenario, collect IP addresses, and build the JSON config (steps 5-8 are summarized in the sketch after this list):

```bash
bash ./prep_run.sh
```

  6. Start the servers on the SRN nodes for split model execution (NOTE: resnet101 models take 1-2 minutes to load):

```bash
bash ./start_servers_colosseum.sh
```

  7. Send the starting message to the nodes:

```bash
bash ./start_run.sh [srn #]
```

  8. Kill the servers (the script can also be used to kill the RF scenario; see the script comments for details):

```bash
bash ./kill_servers.sh
```
  9. Update the log file name in CaP/colosseum/start_servers_colosseum.sh (to keep run outputs separate) and repeat steps 6-8 for the next run until all runs are finished.
  10. Inspect the logging messages saved to Colosseum's file-proxy server under /share/nas/[team name] after the end of the reservation.
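
For convenience, one complete run (steps 5 through 8) looks like the following, executed from the CaP/colosseum folder; [srn #] is the SRN number passed to start_run.sh, as above:

```bash
bash ./prep_run.sh                  # move repo to SRNs, start RF, collect IPs, build JSON config
bash ./start_servers_colosseum.sh   # start servers for split model execution
bash ./start_run.sh [srn #]         # send the starting message to the nodes
bash ./kill_servers.sh              # tear down the servers
```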

Cite

```
@article{jian2023cap,
  title={Communication-Aware DNN Pruning},
  author={Jian, Tong and Roy, Debashri and Salehi, Batool and Soltani, Nasim and Chowdhury, Kaushik and Ioannidis, Stratis},
  journal={INFOCOM},
  year={2023}
}
```
