# ryul99/pytorch-project-template

Deep Learning project template for PyTorch (multi-gpu training is supported)
## Features

- TensorBoard / wandb support
- BackgroundGenerator is used (reason for using background generator)
  - On Windows, the background generator is not supported, so if an error occurs, set `use_background_generator` to false in the config.
- Training state and network checkpoint saving / loading
  - The training state includes not only the network weights but also the optimizer, step, and epoch.
  - A checkpoint includes only the network weights. This can be used for inference.
- Hydra and OmegaConf are supported
- Distributed training using Distributed Data Parallel (DDP) is supported
- Config with yaml files / easy dot-style access to config
- Code lint / CI
- Code testing with pytest
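The background generator hides data-loading latency by prefetching the next item in a worker thread while the main thread (e.g. the GPU step) is busy. The template itself relies on the `prefetch_generator` package; the following stand-alone sketch only illustrates the mechanism, not the package's actual implementation:

```python
import queue
import threading


class BackgroundGenerator:
    """Wrap an iterable so items are produced ahead of time in a background thread."""

    _SENTINEL = object()  # marks the end of the wrapped iterable

    def __init__(self, iterable, max_prefetch=2):
        # Bounded queue: the worker blocks once `max_prefetch` items are waiting.
        self.queue = queue.Queue(max_prefetch)
        self.thread = threading.Thread(
            target=self._worker, args=(iterable,), daemon=True
        )
        self.thread.start()

    def _worker(self, iterable):
        for item in iterable:
            self.queue.put(item)
        self.queue.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()
        if item is self._SENTINEL:
            raise StopIteration
        return item
```

Consuming the wrapper is identical to consuming the original iterable; the only difference is that the next item may already be waiting in the queue.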
## Code Structure

- `assets` dir: icon image of Pytorch Project Template. You can remove this directory.
- `config` dir: directory for config files
- `dataset` dir: dataloader and dataset code is here. Also, put datasets in the `meta` dir.
- `model` dir:
  - `model.py` is for wrapping the network architecture.
  - `model_arch.py` is for coding the network architecture.
- `tests` dir: directory for `pytest` testing code. You can check your network's flow of tensors by fixing `tests/model/net_arch_test.py`. Just copy & paste the `Net_arch.forward` method into `net_arch_test.py` and add `assert` phrases to check the tensors.
- `utils` dir:
  - `train_model.py` and `test_model.py` are for training and testing the model once.
  - `utils.py` is for utilities: random seed setting, dot-access hyperparameters, getting the commit hash, etc. are here.
  - `writer.py` is for writing logs to tensorboard / wandb.
- `trainer.py` file: this is for setting up and iterating over epochs.
## Requirements

- python3 (3.8, 3.9, 3.10, and 3.11 are tested)
- Write the PyTorch version you want to use in `requirements.txt`. (https://pytorch.org/get-started/)
- `pip install -r requirements.txt`
## Config

- Config is written in yaml files.
- You can choose configs at `config/default.yaml`. Custom configs are under `config/job/`.
- `name` is the train name of your run.
- `working_dir` is the root directory for saving checkpoints and logging logs.
- `device` is the device mode for running your model. You can choose `cpu` or `cuda`.
- `data` field
  - Configs for the Dataloader.
  - glob `train_dir` / `test_dir` with `file_format` for the Dataloader.
  - If `divide_dataset_per_gpu` is true, the origin dataset is divided into sub-datasets for each GPU. This could mean the size of the origin dataset should be a multiple of the number of GPUs in use. If this option is false, the dataset is not divided, but the epoch count goes up in multiples of the number of GPUs.
- `train`/`test` field
  - Configs for training options.
  - `random_seed` is for setting the python, numpy, and pytorch random seeds.
  - `num_epoch` is the end iteration step of training.
  - `optimizer` is for selecting the optimizer. Only the `adam optimizer` is supported for now.
  - `dist` is for configuring Distributed Data Parallel.
    - `gpus` is the number of GPUs you want to use with DDP (the `gpus` value is used as `world_size` in DDP). DDP is not used when `gpus` is 0; all GPUs are used when `gpus` is -1.
    - `timeout` is the timeout in seconds for process interaction in DDP. When this is set to `~`, the default timeout (1800 seconds) is applied in `gloo` mode and the timeout is turned off in `nccl` mode.
- `model` field
  - Configs for the network architecture and options for the model.
  - You can add configs in yaml format to configure your network.
- `log` field
  - Configs for logging, including tensorboard / wandb logging.
  - `summary_interval` and `checkpoint_interval` are the intervals of steps and epochs between training logging and checkpoint saving.
  - Checkpoints and logs are saved under `working_dir/chkpt_dir` and `working_dir/trainer.log`. Tensorboard logs are saved under `working_dir/outputs/tensorboard`.
- `load` field
  - Loading from the wandb server is supported.
  - `wandb_load_path` is the `Run path` in the overview of the run. If you don't want to use wandb load, this field should be `~`.
  - `network_chkpt_path` is the path to the network checkpoint file. If using wandb loading, this field should be the checkpoint file name of the wandb run.
  - `resume_state_path` is the path to the training state file. If using wandb loading, this field should be the training state file name of the wandb run.
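Putting the fields above together, a config might take roughly the following shape. All values here are placeholders and only fields described above are shown; check `config/default.yaml` in the repository for the authoritative layout:

```yaml
name: first_train
working_dir: /path/to/project
device: cuda

data:
  train_dir: meta/train
  test_dir: meta/test
  file_format: "*"
  use_background_generator: true
  divide_dataset_per_gpu: true

train:
  random_seed: 3750
  num_epoch: 100
  optimizer: adam   # only adam is supported for now
  dist:
    gpus: 0         # 0: no DDP, -1: use all GPUs
    timeout: ~      # ~: gloo default of 1800 s, timeout off in nccl mode

log:
  summary_interval: 20
  checkpoint_interval: 1000

load:
  wandb_load_path: ~
  network_chkpt_path: ~
  resume_state_path: ~
```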
## Code Lint

- `pip install -r requirements-dev.txt` to install development dependencies (this requires python 3.6 and above because of black)
- `pre-commit install` to add pre-commit to the git hook
## Train

`python trainer.py working_dir=$(pwd)`
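Since Hydra is supported, the config fields described above can also be overridden from the command line in dot notation. A hypothetical invocation (the run name and values here are made up, not defaults from the template):

```shell
python trainer.py working_dir=$(pwd) name=my_experiment device=cuda train.dist.gpus=-1
```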
## Inspired by

- https://github.com/open-mmlab/mmsr
- https://github.com/allenai/allennlp (test case writing)