VIAME/bioharn
Training harness for biology related problems
Uses netharn (https://gitlab.kitware.com/computer-vision/netharn) to write the boilerplate for training loops. Scripts take kwcoco datasets as inputs. See https://gitlab.kitware.com/computer-vision/kwcoco for how to format data in the extended-COCO format (regular MS-COCO files will also work).
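As a rough illustration, a minimal MS-COCO style file looks something like the sketch below (the "fish" category and file names are made up; see the kwcoco docs linked above for the extra fields the extended format adds):

```python
import json

# A minimal MS-COCO style dataset dict. kwcoco's extended-COCO format
# accepts the same structure with additional optional fields.
dataset = {
    "categories": [{"id": 1, "name": "fish"}],
    "images": [
        {"id": 1, "file_name": "frame_0001.png", "width": 640, "height": 480},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, as in MS-COCO
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
    ],
}

with open("my_dataset.kwcoco.json", "w") as f:
    json.dump(dataset, f)
```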
To train a detection model see bioharn/detect_fit.py.
To train a classification model see bioharn/clf_fit.py.
To predict with a pretrained detection model see bioharn/detect_predict.py.
To predict with a pretrained classification model see bioharn/clf_predict.py.
To evaluate ROC / PR-curves with a pretrained detection model and truth see bioharn/detect_eval.py.
To evaluate ROC / PR-curves with a pretrained classification model and truth see bioharn/clf_eval.py.
Currently supported detection models include:
- YOLO-v2
- EfficientDet
- MaskRCNN - Requires mmdet
- CascadeRCNN - Requires mmdet
- RetinaNet - Requires mmdet
Older versions of bioharn targeted mmdet 1.0 at revision 4c94f10d0ebb566701fb5319f5da6808df0ebf6a, but we are now targeting v2.0.
This repo is a component of the VIAME project: https://github.com/VIAME/VIAME
Some of the data for this project can be found here: https://data.kitware.com/#collection/58b747ec8d777f0aef5d0f6a
Notes for mmcv install on cuda 10.1 with torch 1.5:
See: https://github.com/open-mmlab/mmcv
pip install mmcv-full==latest+torch1.5.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html
To train a simple detector, let's use the kwcoco toy data to make sure we can fit a small dummy dataset. We will use the kwcoco CLI to generate toy training data. For this test we will forgo a validation set.
kwcoco toydata --key=shapes1024 --dst=toydata.kwcoco.json
For our notable hyperparameters we are going to use:
- --optim=adam - we will use the ADAM optimizer for faster convergence (you may also want to try sgd).
- --lr=1e-4 - we will start with a small learning rate.
- --decay=1e-4 - we will use weight decay regularization of 1e-4 to encourage smaller network weights.
- --window_dims=full - each batch item will sample a full image.
- --input_dims=512,512 - we will resize each image to H=512, W=512 (using letterboxing to preserve aspect ratio) before inputting the item to the network.
- --schedule=step-16-22 - will divide the learning rate by 10 at epochs 16 and 22.
- --augment=medium - will do random flips, crops, and color jitter for augmentation (--augment=complex will do much more and --augment=simple will only do flips and crops).
- --num_batches=auto - determines the number of batches per epoch. If auto, it will use the entire dataset. If you set it to a number, it will use that many batches per epoch, sampling randomly with replacement. This is useful if you are going to use over/undersampling via the --balance CLI arg.
- --batch_size=8 - will use 8 items (sampled windows) per batch.
- --bstep=8 - will run 8 batches before backpropagating (approximates a larger batch size).
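To make the schedule concrete, here is a small sketch (not bioharn code, and assuming the drop is applied at the start of each listed epoch) of what step-16-22 does to the learning rate:

```python
def step_lr(base_lr, epoch, steps=(16, 22), factor=0.1):
    """Divide the learning rate by 10 at each listed step epoch."""
    lr = base_lr
    for step in steps:
        if epoch >= step:
            lr *= factor
    return lr

# With --lr=1e-4 and --schedule=step-16-22:
print(step_lr(1e-4, 10))  # 1e-4 before epoch 16
print(step_lr(1e-4, 16))  # 1e-5 from epoch 16
print(step_lr(1e-4, 22))  # 1e-6 from epoch 22
```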
See python -m bioharn.detect_fit --help for help on all available options.
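One flag worth unpacking is --bstep: running several batches before backpropagating is gradient accumulation. Here is a minimal numeric sketch of the idea in plain Python (not the netharn training loop; that netharn averages rather than sums the accumulated gradients is an assumption here):

```python
def train_with_accumulation(grads, lr=0.1, bstep=8, param=0.0):
    """Accumulate gradients over `bstep` batches before taking one
    optimizer step, approximating a batch size of batch_size * bstep."""
    accum = 0.0
    for i, g in enumerate(grads, start=1):
        accum += g
        if i % bstep == 0:
            param -= lr * (accum / bstep)  # step with the averaged gradient
            accum = 0.0
    return param

# 8 batches each with gradient 1.0 produce one step of size lr:
print(train_with_accumulation([1.0] * 8))  # -0.1
```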
python -m bioharn.detect_fit \
    --name=det_baseline_toydata \
    --workdir=$HOME/work/bioharn \
    --train_dataset=./toydata.kwcoco.json \
    --arch=retinanet \
    --optim=adam \
    --lr=1e-4 \
    --schedule=step-16-22 \
    --augment=medium \
    --input_dims=512,512 \
    --window_dims=full \
    --window_overlap=0.0 \
    --normalize_inputs=imagenet \
    --num_batches=auto \
    --workers=4 --xpu=auto --batch_size=8 --bstep=8 \
    --sampler_backend=cog
This should start producing reasonable training-set bounding boxes after a few minutes of training.
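As an aside, the --input_dims=512,512 letterboxing used above can be sketched as follows (a simplified stand-in for the dimension math, not the actual bioharn/netharn resize code):

```python
def letterbox_dims(img_w, img_h, target_w=512, target_h=512):
    """Scale to fit inside the target while preserving aspect ratio,
    then pad the remainder (the 'letterbox' bars)."""
    scale = min(target_w / img_w, target_h / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_w, pad_h = target_w - new_w, target_h - new_h
    return (new_w, new_h), (pad_w, pad_h)

# A 1024x768 image scales to 512x384 and gets 128 rows of padding:
print(letterbox_dims(1024, 768))  # ((512, 384), (0, 128))
```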
Because we are using netharn, training this detection model will write a "training directory" in your work directory. This directory will be a function of your "name" and the hash of the learning-relevant hyperparameters.
In this case the training directory will be in $HOME/work/bioharn/fit/runs/det_baseline_toydata/qxvodtak, and for convenience there will be a symlink to this directory in $HOME/work/bioharn/fit/name/det_baseline_toydata. Example training (and validation if specified) images will be written to the monitor directory. If tensorboard and matplotlib are installed, the monitor directory will also contain a tensorboard subdirectory with loss curves as they are produced.
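The hashed directory naming can be illustrated with a sketch like the following (illustrative only: netharn's actual hashing scheme and serialization differ, so the resulting hash will not match the qxvodtak example above):

```python
import hashlib
import json
from pathlib import Path

def train_dpath(workdir, name, hyperparams):
    """Sketch: derive a run directory from the run name plus a short
    hash of the learning-relevant hyperparameters."""
    # Serialize the hyperparameters deterministically before hashing
    blob = json.dumps(hyperparams, sort_keys=True).encode()
    short_hash = hashlib.sha1(blob).hexdigest()[:8]
    return Path(workdir) / "fit" / "runs" / name / short_hash

dpath = train_dpath("work/bioharn", "det_baseline_toydata",
                    {"optim": "adam", "lr": 1e-4, "schedule": "step-16-22"})
print(dpath)  # work/bioharn/fit/runs/det_baseline_toydata/<8-char hash>
```

The same name with the same hyperparameters always maps to the same directory, so restarting a run resumes in place, while changing any learning-relevant setting produces a fresh directory.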