wang-xinyu/tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API


TensorRTx aims to implement popular deep learning networks with TensorRT network definition API.

Why don't we use a parser (ONNX parser, UFF parser, Caffe parser, etc.), but instead use complex APIs to build a network from scratch? I have summarized the advantages below.

  • Flexible: it is easy to modify the network, add/delete a layer or input/output tensor, replace a layer, merge layers, integrate preprocessing and postprocessing into the network, etc.
  • Debuggable: the network is constructed incrementally, so it is easy to get intermediate-layer results.
  • Educational: you learn about the network structure during this development, rather than treating everything as a black box.

The basic workflow of TensorRTx is:

  1. Get the trained models from pytorch, mxnet, tensorflow, etc. Some pytorch models can be found in my repo pytorchx; the rest are from popular open-source repos.
  2. Export the weights to a plain text file -- a .wts file.
  3. Load the weights in TensorRT, define the network, and build a TensorRT engine (a minimal sketch of this step follows the list).
  4. Load the TensorRT engine and run inference.
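
To make steps 3 and 4 concrete, here is a minimal C++ sketch of loading a .wts file and building a toy engine with the network definition API. It assumes the plain-text .wts layout used in this repo (a tensor count on the first line, then one line per tensor: name, element count, hex-encoded float values) and the TensorRT 7.x API; the single convolution layer and the tensor names conv1.weight/conv1.bias are hypothetical placeholders, while real models chain many layers as shown in each folder.

```cpp
// Minimal sketch of steps 3-4 (TensorRT 7.x API). Assumes the plain-text .wts
// layout used in this repo: first line is the tensor count, then one line per
// tensor: <name> <element count> <hex-encoded float bits...>.
#include <NvInfer.h>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

using namespace nvinfer1;

// Parse a .wts file into a name -> Weights map.
// The raw buffers must stay alive until the engine has been built.
std::map<std::string, Weights> loadWeights(const std::string& file) {
    std::map<std::string, Weights> weightMap;
    std::ifstream input(file);
    assert(input.is_open() && "Unable to open .wts file");

    int32_t count;
    input >> count;
    while (count--) {
        std::string name;
        uint32_t size;
        input >> name >> std::dec >> size;

        uint32_t* values = new uint32_t[size];  // hex-encoded float bits
        for (uint32_t i = 0; i < size; ++i) {
            input >> std::hex >> values[i];
        }
        weightMap[name] = Weights{DataType::kFLOAT, values, static_cast<int64_t>(size)};
    }
    return weightMap;
}

// Build a toy one-convolution engine from the weight map. The tensor names
// "conv1.weight"/"conv1.bias" are hypothetical; real models chain many layers.
ICudaEngine* buildEngine(ILogger& logger, const std::string& wtsPath) {
    auto weightMap = loadWeights(wtsPath);

    IBuilder* builder = createInferBuilder(logger);
    INetworkDefinition* network = builder->createNetworkV2(0U);  // implicit batch

    ITensor* data = network->addInput("data", DataType::kFLOAT, Dims3{3, 224, 224});
    IConvolutionLayer* conv = network->addConvolutionNd(
        *data, 64, DimsHW{3, 3}, weightMap["conv1.weight"], weightMap["conv1.bias"]);
    conv->setStrideNd(DimsHW{1, 1});
    conv->setPaddingNd(DimsHW{1, 1});

    conv->getOutput(0)->setName("prob");
    network->markOutput(*conv->getOutput(0));

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(16 << 20);  // 16 MiB of builder scratch space
    builder->setMaxBatchSize(1);

    // Step 4 would serialize this engine or create an execution context from it.
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    network->destroy();
    config->destroy();
    builder->destroy();
    return engine;
}
```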

News

Tutorials

Test Environment

  1. TensorRT 7.x
  2. TensorRT 8.x (some of the models support 8.x)

How to run

Each model folder contains a readme that explains how to run the models inside it.

Models

The following models are implemented.

| Name | Description |
|------|-------------|
| mlp | the very basic model for starters, properly documented |
| lenet | the simplest, as a "hello world" of this project |
| alexnet | easy to implement, all layers are supported in tensorrt |
| googlenet | GoogLeNet (Inception v1) |
| inception | Inception v3, v4 |
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
| mobilenet | MobileNet v2, v3-small, v3-large |
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
| senet | se-resnet50 |
| shufflenet | ShuffleNet v2 with 0.5x output channels |
| squeezenet | SqueezeNet 1.1 model |
| vgg | VGG 11-layer model |
| yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3 |
| yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3 |
| yolov5 | yolov5 v1.0-v7.0 of ultralytics/yolov5, detection, classification and instance segmentation |
| yolov7 | yolov7 v0.1, pytorch implementation from WongKinYiu/yolov7 |
| yolov8 | yolov8, pytorch implementation from ultralytics |
| yolov9 | The Pytorch implementation is WongKinYiu/yolov9 |
| yolov10 | The Pytorch implementation is THU-MIG/yolov10 |
| yolo11 | The Pytorch implementation is ultralytics |
| yolop | yolop, pytorch implementation from hustvl/YOLOP |
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
| arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface |
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute |
| dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch |
| crnn | pytorch implementation from meijieru/crnn.pytorch |
| ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
| hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation |
| psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet |
| ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018 |
| unet | U-Net, pytorch implementation from milesial/Pytorch-UNet |
| repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG |
| lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch |
| refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch |
| densenet | DenseNet-121, from torchvision.models |
| rcnn | FasterRCNN and MaskRCNN, model from detectron2 |
| tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |
| scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4 |
| centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet |
| efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch |
| detr | DE⫶TR, pytorch from facebookresearch/detr |
| swin-transformer | Swin Transformer - Semantic Segmentation, only Swin-T is supported. The Pytorch implementation is microsoft/Swin-Transformer |
| real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan |
| superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork |
| csrnet | CSRNet. The Pytorch implementation is leeyeehoo/CSRNet-pytorch |
| EfficientAd | EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies. From anomalib |

Model Zoo

The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts from a pytorch/mxnet/tensorflow model yourself, so that you can retrain your own model.

GoogleDrive | BaiduPan (pwd: uvv2)

Tricky Operations

Some tricky operations encountered in these models have already been solved, but there might be better solutions.

| Name | Description |
|------|-------------|
| BatchNorm | Implemented by a scale layer, used in resnet, googlenet, mobilenet, etc. (see the sketch after this table) |
| MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to solve ceil_mode=True, see googlenet |
| average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception |
| relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet |
| torch.chunk() | implement the 'chunk(2, dim=C)' by a tensorrt plugin, see shufflenet |
| channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet |
| adaptive pool | use a fixed input dimension and regular average pooling, see shufflenet |
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4 |
| yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4 |
| yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp |
| upsample | replaced by a deconvolution layer, see yolov3 |
| hsigmoid | hard sigmoid is implemented as a plugin, hsigmoid and hswish are used in mobilenetv3 |
| retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface |
| mish | mish activation is implemented as a plugin, mish is used in yolov4 |
| prelu | mxnet's prelu activation with trainable gamma is implemented as a plugin, used in arcface |
| HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0 |
| LSTM | Implemented pytorch nn.LSTM() with the tensorrt api |
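
As an example of the first trick above, the following sketch folds BatchNorm parameters into a single IScaleLayer (per channel, y = x * scale + shift with scale = gamma / sqrt(var + eps) and shift = beta - mean * scale). It reuses the weightMap from the earlier loadWeights sketch; the helper name addBatchNorm2d and the ".weight"/".bias"/".running_mean"/".running_var" suffixes are assumptions modeled on a typical pytorch state dict, and the per-model sources may name things differently.

```cpp
// In addition to the headers from the previous sketch:
#include <NvInfer.h>
#include <cmath>
#include <map>
#include <string>

using namespace nvinfer1;

// Fold BatchNorm (gamma, beta, running mean/var) into a per-channel scale layer:
//   y = x * scale + shift,  scale = gamma / sqrt(var + eps),  shift = beta - mean * scale
IScaleLayer* addBatchNorm2d(INetworkDefinition* network,
                            std::map<std::string, Weights>& weightMap,
                            ITensor& input, const std::string& lname, float eps) {
    const float* gamma = static_cast<const float*>(weightMap[lname + ".weight"].values);
    const float* beta  = static_cast<const float*>(weightMap[lname + ".bias"].values);
    const float* mean  = static_cast<const float*>(weightMap[lname + ".running_mean"].values);
    const float* var   = static_cast<const float*>(weightMap[lname + ".running_var"].values);
    const int64_t len  = weightMap[lname + ".running_var"].count;

    // These buffers must stay alive until the engine has been built.
    float* scval = new float[len];
    float* shval = new float[len];
    float* pval  = new float[len];
    for (int64_t i = 0; i < len; ++i) {
        scval[i] = gamma[i] / std::sqrt(var[i] + eps);
        shval[i] = beta[i] - mean[i] * scval[i];
        pval[i]  = 1.0f;  // power term left at identity
    }
    Weights scale{DataType::kFLOAT, scval, len};
    Weights shift{DataType::kFLOAT, shval, len};
    Weights power{DataType::kFLOAT, pval, len};

    // One scale layer replaces the whole BatchNorm layer at inference time.
    return network->addScale(input, ScaleMode::kCHANNEL, shift, scale, power);
}
```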

Speed Benchmark

| Models | Device | BatchSize | Mode | Input Shape (HxW) | FPS |
|--------|--------|-----------|------|-------------------|-----|
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |
| YOLOv3 (darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |
| YOLOv3 (darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 |
| YOLOv3-spp (darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |
| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |
| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |
| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 |
| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 |
| RetinaFace (resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |
| RetinaFace (resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |
| RetinaFace (mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |
| ArcFace (LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 |

Help wanted: if you have speed results, please open an issue or PR.

Acknowledgments & Contact

Any contributions, questions and discussions are welcome; contact me via the info below.

E-mail: wangxinyu_es@163.com

WeChat ID: wangxinyu0375 (you can add me on WeChat to join the tensorrtx discussion group; please include the note "tensorrtx" in your request)

