makefile/frcnn
Faster R-CNN / R-FCN 💡 C++ version based on Caffe
Special Features for This Caffe Repository
- approximate joint training, testing and evaluation of Faster R-CNN / R-FCN models, etc.
- support multi-GPU training
- support R-FCN with OHEM
- support Light-head R-CNN / R-FCN++
- support Cascade R-CNN
- support FPN (Feature Pyramid Network)
- support Deformable Conv and Deformable PSROIPooling
- support SSD layers
- support YOLOv3 inference
- Action recognition (Two Stream CNN)
- CTPN layers for scene text detection, ported from tianzhi0549/CTPN
- script for merging `Conv + BatchNorm + Scale` layers into a single layer when those layers are frozen, to reduce memory: `examples/FRCNN/res50/gen_merged_model.py`; script for merging ResNet: `examples/FRCNN/merge_resnet.sh` (see the folding sketch after this list)
- support snapshot after receiving SIGTERM (the kill command's default signal)
- logging tools based on VisualDL, which can visualize loss scalars, feature images, etc.
- support NMS and IoU calculation on GPU, Soft-NMS on CPU
- support box voting & multi-scale testing
- support solver learning-rate warm-up strategy, cosine decay LR & cyclical LR (see sgd_solver.cpp)
- support model file encryption/decryption, see `encrypt_model.cpp` & `frcnn_api.cpp`
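For reference, merging `Conv + BatchNorm + Scale` into a single convolution (as the merge script above does) is a standard algebraic folding of the frozen statistics into the conv weights. A minimal NumPy sketch of the transform; the function name and layout here are illustrative, not the actual script's API:

```python
import numpy as np

def fold_bn_scale(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold a frozen BatchNorm (mean, var) and Scale (gamma, beta)
    into conv weights W [out_c, in_c, kh, kw] and bias b [out_c]."""
    scale = gamma / np.sqrt(var + eps)          # per-output-channel factor
    W_folded = W * scale[:, None, None, None]   # rescale each filter
    b_folded = (b - mean) * scale + beta        # shift and rescale the bias
    return W_folded, b_folded
```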
Special layers
- ROIAlign, proposed in Mask R-CNN
- FocalLoss, from Focal Loss for Dense Object Detection (see the formula sketch after this list)
- Swish activation function, from Searching for Activation Functions
- Eltwise layer using in-place sum to reduce memory, from this PR
- Caffe layer module: layer definition and usage similar to the Python layer, from Caffe PR #5294
- CuDNN Deconv layer, depth-wise Conv layer, Upsample layer
- CTPN layers, including an LSTM layer implemented by @junhyukoh, which is faster than the upstream Caffe master branch
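For quick reference, the formulas behind two of these layers are short enough to sketch in NumPy (illustrative only, not the C++ layer code):

```python
import numpy as np

def swish(x):
    # Swish: f(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t),
    # where p is the predicted foreground probability and y is 0/1.
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```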
Data Preprocess
data enhancement:
- support histogram equalization of color images
- haze removal (dehazing) algorithm
data augmentation:
- random horizontal flip (see the sketch after this list)
- random jitter
- hue, saturation, exposure
- rotation (multiples of 90 degrees)
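As a sketch of what the flip augmentation has to keep consistent (the real implementation lives in the C++ data layers; this Python version is only an illustration):

```python
import random
import numpy as np

def random_hflip(image, boxes, prob=0.5):
    """image: HxWxC array; boxes: Nx4 array of [x1, y1, x2, y2].
    Mirroring the image means mirroring the box x-coordinates too."""
    if random.random() < prob:
        h, w = image.shape[:2]
        image = image[:, ::-1, :]
        x1 = w - 1 - boxes[:, 2]   # new x1 comes from the old right edge
        x2 = w - 1 - boxes[:, 0]   # new x2 comes from the old left edge
        boxes = boxes.copy()
        boxes[:, 0], boxes[:, 2] = x1, x2
    return image, boxes
```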
TODO list
- support batch size greater than 1 (on branch `batch`)
- support Rotated R-CNN for rotated bounding boxes (on branch `r-frcnn`)
- support OHEM (see r-fcn)
This repository uses C++11 features, so make sure to use a compiler compatible with C++11.
Tested on CUDA 8.0/9.2, CuDNN 7.0, NCCL v1 (commit 286916a).
GCC v5.4.0/7.3.1; note that versions lower than v5 are not supported. Python 2.7 is required for the Python scripts.
```bash
cd $CAFFE_ROOT
cp Makefile.config.example Makefile.config
# modify the content in Makefile.config to adapt it to your system
# if you would like to use VisualDL to log losses, set USE_VISUALDL to 1,
# and cd src/logger && make
make -j7
# extra: 'py' for the Python interface of Caffe.
# extra: 'pyfrcnn' for the Python wrapper of the C++ API. You can use this for the demo.
make pyfrcnn py
```
For all the following steps, you should be in the `$CAFFE_ROOT` path.
The official Faster R-CNN code of the NIPS 2015 paper (written in MATLAB) is available here. It is worth noting that:
- This repository contains a C++ reimplementation of the Python code (py-faster-rcnn), which is built on Caffe.
- This repository uses code from caffe-faster-rcnn (commit 8ba1d26) as its base framework.
Running `sh examples/FRCNN/demo_frcnn.sh` will process five pictures in `examples/FRCNN/images` and put the results into `examples/FRCNN/results`.
Note: You should put the trained caffemodel into `models/FRCNN`, such as `ZF_faster_rcnn_final.caffemodel` for the ZF model.
- The list of training data is `examples/FRCNN/dataset/voc2007.trainval`.
- The list of testing data is `examples/FRCNN/dataset/voc2007.test`.
- Create a symlink for the PASCAL VOC dataset: `ln -s $YOUR_VOCdevkit_Path $CAFFE_ROOT/VOCdevkit`.
As shown in the VGG example `models/FRCNN/vgg16/train_val.proto`, the original pictures should appear at `$CAFFE_ROOT/VOCdevkit/VOC2007/JPEGImages/`. (Check window_data_param in FrcnnRoiData.)
If you want to train Faster R-CNN on your own dataset, you may prepare a custom dataset list. The format is as below:

```
# image-id
image-name
number of boxes
label x1 y1 x2 y2 difficulty
...
```
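A minimal Python sketch of reading this list format (the field names follow the format above; the function itself is only an illustration, not a repository script):

```python
def load_frcnn_list(path):
    """Parse records of: '# image-id', image name, box count,
    then one 'label x1 y1 x2 y2 difficulty' line per box."""
    records = []
    with open(path) as f:
        lines = [l.strip() for l in f if l.strip()]
    i = 0
    while i < len(lines):
        assert lines[i].startswith('#')       # '# image-id' marker line
        name = lines[i + 1]
        num_boxes = int(lines[i + 2])
        boxes = []
        for j in range(num_boxes):
            fields = lines[i + 3 + j].split()
            label = int(fields[0])
            x1, y1, x2, y2 = map(float, fields[1:5])
            difficulty = int(fields[5])
            boxes.append((label, x1, y1, x2, y2, difficulty))
        records.append({'name': name, 'boxes': boxes})
        i += 3 + num_boxes
    return records
```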
Running `sh examples/FRCNN/zf/train_frcnn.sh` will start the training process on the voc2007 data using the ZF model.
The ImageNet pre-trained models can be found in this link.
If you use the provided training script, please make sure:
- VOCdevkit is within $CAFFE_ROOT and VOC2007 is within VOCdevkit
- The ZF pretrained model should be put into models/FRCNN/ as ZF.v2.caffemodel
`examples/FRCNN/convert_model.py` transforms the parameters of the `bbox_pred` layer by the bbox mean and std values, because the regression targets are normalized during training and must be unnormalized to obtain the final model.
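The transform follows the py-faster-rcnn convention: scale the `bbox_pred` weights by the normalization stds and shift the biases by the means. A pycaffe sketch; the prototxt/caffemodel names and the mean/std values below are assumptions (the common VOC defaults), the real ones come from your training config:

```python
import numpy as np
import caffe

num_classes = 21                                     # assumed: VOC, 20 classes + background
means = np.tile([0.0, 0.0, 0.0, 0.0], num_classes)   # assumed bbox target means
stds  = np.tile([0.1, 0.1, 0.2, 0.2], num_classes)   # assumed bbox target stds

net = caffe.Net('test.prototxt', 'trained.caffemodel', caffe.TEST)
W, b = net.params['bbox_pred']
W.data[...] = W.data * stds[:, np.newaxis]           # unnormalize weights
b.data[...] = b.data * stds + means                  # unnormalize biases
net.save('final.caffemodel')
```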
Running `sh examples/FRCNN/zf/test_frcnn.sh` will evaluate the performance on the voc2007 test data using the trained ZF model.
- First step of this shell script: test all voc-2007-test images and output the results to a text file.
- Second step of this shell script: compare the results with the ground-truth file and calculate the mAP (see the AP sketch below).
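The mAP calculation follows the PASCAL VOC 2007 protocol; as a reference, the 11-point interpolated AP for one class can be sketched as below (the evaluator invoked by the shell script is the repository's C++ code, this is just the definition):

```python
import numpy as np

def voc_ap_11point(recall, precision):
    """11-point interpolated AP (PASCAL VOC 2007): average, over the recall
    levels 0.0, 0.1, ..., 1.0, of the max precision at recall >= level."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap
```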
The program uses a config file named like `config.json` to set parameters. Special parameters that need attention:
- `data_jitter`: data augmentation; if set < 0, then no jitter, hue, saturation, or exposure
- `im_size_align`: set to the stride of the last conv layer of FPN (such as 64) to avoid Deconv shape problems; set to 0 to disable (see the rounding sketch after this list)
- `bbox_normalize_targets`: do bbox normalization during training and unnormalization at testing (no need to convert the model weights before testing)
- `test_rpn_score_thresh`: can be set > 0 to speed up NMS at testing
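For example, the presumed effect of `im_size_align` is to round each input side up to a multiple of the given stride so that Deconv/upsample output shapes in FPN match their skip connections; a sketch under that assumption:

```python
import math

def align_size(size, im_size_align):
    # Round a side length up to a multiple of the alignment stride;
    # an alignment of 0 leaves the size unchanged (the disabled case).
    if im_size_align <= 0:
        return size
    return int(math.ceil(size / float(im_size_align))) * im_size_align

print(align_size(600, 64))  # -> 640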
Scripts and prototxts for the different models are listed in the `examples/FRCNN` directory.
More details about the code in the include and src directories:
- `api/FRCNN` for the demo and test API
- `caffe/FRCNN` contains code related to Faster R-CNN
- `caffe/RFCN` for R-FCN
- `caffe/DeformConv` for Deformable Conv
- `caffe/SSD` for SSD
- `examples/YOLO` for YOLOv3 inference, including a converter script and demo; pay attention to the Upsample layer usage
- `logger` relates to the logger tools
- `modules` and `yaml-cpp` relate to Caffe module layers, which include the FPN layers, etc.
- `python/frcnn` relates to the pybind11 interface for the demo
- `caffe/ACTION_REC` for Two-Stream Convolutional Networks for Action Recognition in Videos
- `caffe/CTPN` relates to CTPN special layers for scene text detection
- `caffe/PR` for some layers from Caffe PRs
To synchronize with the official Caffe:
- `git remote add caffe https://github.com/BVLC/caffe.git`
- `git fetch caffe`
- `git checkout master`
- `git rebase caffe/master`
Rebase the dev branch:
- `git checkout dev`
- `git rebase master`
- `git push -f origin dev`
FAQ
- CUB not found: when compiling the GPU version, `frcnn_proposal_layer.cu` requires the header file `<cub/cub.cuh>`. CUB is a library contained in the official CUDA Toolkit, usually found in `/usr/local/cuda/include/thrust/system/cuda/detail/`. You should add this path to your `Makefile.config` (try `locate cub.cuh` to find CUB on your system).
- When you get `error: RPC failed; result=22, HTTP code = 0`, use `git config http.postBuffer 524288000`, which increases the git buffer to 500 MB.
- Cannot load the module layer dynamic library: the program searches for the modules first in the environment variable `CAFFE_LAYER_PATH`, then in the predefined `DEFAULT_LAYER_PATH` in the Makefile. So try setting `CAFFE_LAYER_PATH` in your shell script. This can also happen when using pycaffe.
- About R-FCN: class-agnostic regression is currently not supported (although it is easy to modify), and the OHEM method brings very little improvement in joint training. Also remember to set `bg_thresh_lo` to 0 when using OHEM.
License and Citation
Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.
Please cite the following papers in your publications if it helps your research:
```bibtex
@article{jia2014caffe,
  author  = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  journal = {arXiv preprint arXiv:1408.5093},
  title   = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  year    = {2014}
}

@inproceedings{girshick2015fast,
  title     = {Fast R-CNN},
  author    = {Girshick, Ross},
  booktitle = {International Conference on Computer Vision},
  pages     = {1440--1448},
  year      = {2015}
}

@inproceedings{ren2015faster,
  title     = {Faster {R-CNN}: Towards real-time object detection with region proposal networks},
  author    = {Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
  booktitle = {Neural Information Processing Systems},
  pages     = {91--99},
  year      = {2015}
}

@article{ren2017faster,
  title     = {Faster {R-CNN}: Towards real-time object detection with region proposal networks},
  author    = {Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume    = {39},
  number    = {6},
  pages     = {1137--1149},
  year      = {2017},
  publisher = {IEEE}
}

@article{dai16rfcn,
  author  = {Jifeng Dai and Yi Li and Kaiming He and Jian Sun},
  title   = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
  journal = {arXiv preprint arXiv:1605.06409},
  year    = {2016}
}

@article{dai17dcn,
  author  = {Jifeng Dai and Haozhi Qi and Yuwen Xiong and Yi Li and Guodong Zhang and Han Hu and Yichen Wei},
  title   = {Deformable Convolutional Networks},
  journal = {arXiv preprint arXiv:1703.06211},
  year    = {2017}
}

@inproceedings{bodla2017softnms,
  author    = {Navaneeth Bodla and Bharat Singh and Rama Chellappa and Larry S. Davis},
  title     = {Soft-NMS -- Improving Object Detection With One Line of Code},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year      = {2017}
}

@article{li2017light,
  title   = {Light-Head R-CNN: In Defense of Two-Stage Object Detector},
  author  = {Li, Zeming and Peng, Chao and Yu, Gang and Zhang, Xiangyu and Deng, Yangdong and Sun, Jian},
  journal = {arXiv preprint arXiv:1711.07264},
  year    = {2017}
}

@inproceedings{cai18cascadercnn,
  author    = {Zhaowei Cai and Nuno Vasconcelos},
  title     = {Cascade R-CNN: Delving into High Quality Object Detection},
  booktitle = {CVPR},
  year      = {2018}
}
```