Tiny YOLO v2 object detection with tensorflow.js.

justadudewhohacks/tfjs-tiny-yolov2

JavaScript object detection in the browser based on a tensorflow.js implementation of tiny yolov2.

Pre Trained Models

The VOC and COCO models correspond to the quantized weights from the official darknet repo. The face detector uses depthwise separable convolutions instead of regular convolutions, which allows for much faster prediction and a tiny model size, making it well suited for object detection on mobile devices as well. I trained the face detection model from scratch. Have a look at the Training your own Object Detector section if you want to train such a model for your own dataset!

Pascal VOC

[Example images: voc1, voc2]

COCO

[Example images: coco1, coco2]

Face Detection

The face detection model is one of the models available in face-api.js.

[Example image: face]

Running the Examples

    cd examples
    npm i
    npm start

Browse to http://localhost:3000/.

Usage

Get the latest build from dist/tiny-yolov2.js or dist/tiny-yolov2.min.js and include the script:

    <script src="tiny-yolov2.js"></script>

Simply load the model:

    const config = // yolo config
    const net = new yolo.TinyYolov2(config)
    await net.load(`voc_model-weights_manifest.json`)

The config file of the VOC model looks as follows:

    {
      // the pre trained VOC model uses regular convolutions
      "withSeparableConvs": false,
      // iou threshold for nonMaxSuppression
      "iouThreshold": 0.4,
      // anchor box dimensions, relative to cell size (32px)
      "anchors": [
        { "x": 1.08, "y": 1.19 },
        { "x": 3.42, "y": 4.41 },
        { "x": 6.63, "y": 11.38 },
        { "x": 9.42, "y": 5.11 },
        { "x": 16.62, "y": 10.52 }
      ],
      // class labels in correct order
      "classes": [
        "aeroplane", "bicycle", "bird", "boat", "bottle",
        "bus", "car", "cat", "chair", "cow",
        "diningtable", "dog", "horse", "motorbike", "person",
        "pottedplant", "sheep", "sofa", "train", "tvmonitor"
      ]
    }

Inference and drawing the results:

    const forwardParams = {
      inputSize: 416,
      scoreThreshold: 0.8
    }

    const detections = await net.detect('myInputImage', forwardParams)
    yolo.drawDetection('myCanvas', detections)

Also check out the examples.

Training your own Object Detector

If you want to train your own object detector, I would suggest training a model using separable convolutions: there are significantly fewer parameters to train, so inference is much faster and the training process converges much faster.

Training a multiclass detector will take quite some time, depending on how many classes you are training your object detector on. However, when training a single class detector, it is possible to get pretty good results after only a few epochs.

Defining your Model Config

    {
      // use separable convolutions over regular convolutions
      "withSeparableConvs": true,
      // iou threshold for nonMaxSuppression
      "iouThreshold": 0.4,
      // instructions for how to determine anchors are given below
      "anchors": [...],
      // whatever kind of objects you are training your object detector on
      "classes": ["cat"],
      // optionally you can compute the mean RGB value for your dataset and
      // pass it in the config for performing mean value subtraction on your
      // input images
      "meanRgb": [...],
      // scale factors for each loss term (only required for training),
      // explained below
      "objectScale": 5,
      "noObjectScale": 1,
      "coordScale": 1,
      "classScale": 1
    }
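The optional meanRgb entry needs one mean value per color channel. As a rough sketch, the helper below averages the R, G and B channels over RGBA pixel buffers such as the `data` array of a canvas `ImageData`. The name `computeMeanRgb` is hypothetical and not part of this library, and the assumption that the config expects values in the 0 to 255 range is mine, so check it against the /train tooling before relying on it.

```javascript
// Hypothetical helper (not part of tfjs-tiny-yolov2): averages the R, G and B
// channels over a list of RGBA pixel buffers, e.g. ImageData.data arrays.
// Assumption: the resulting means are in the 0..255 range of the raw pixels.
function computeMeanRgb(rgbaBuffers) {
  const sums = [0, 0, 0]
  let numPixels = 0

  for (const data of rgbaBuffers) {
    // each pixel occupies 4 consecutive entries: R, G, B, A
    for (let i = 0; i + 3 < data.length; i += 4) {
      sums[0] += data[i]
      sums[1] += data[i + 1]
      sums[2] += data[i + 2]
      numPixels++
    }
  }

  if (numPixels === 0) {
    throw new Error('computeMeanRgb - no pixels given')
  }

  return sums.map(sum => sum / numPixels)
}

// two pixels: (10, 20, 30) and (30, 40, 50)
const examplePixels = new Uint8ClampedArray([10, 20, 30, 255, 30, 40, 50, 255])
console.log(computeMeanRgb([examplePixels])) // [ 20, 30, 40 ]
```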

Labeling your Data with Ground Truth Boxes

For each image in your training set, you should create a corresponding json file containing the bounding boxes and class labels of each object instance located in that image. The bounding box coordinates and dimensions should be relative to the image dimensions.

Consider an image with a width and height of 400px, showing a single cat, which is enclosed by the bounding box at x = 50px, y = 100px (upper left corner) with a box size of width = 200px and height = 100px. The corresponding json file should look as follows (note that it is an array of all bounding boxes for that image):

    [
      {
        "x": 0.125,
        "y": 0.25,
        "width": 0.5,
        "height": 0.25,
        "label": "cat"
      }
    ]
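If your labeling tool exports boxes in pixels, the conversion to the relative format is a division by the image dimensions. A minimal sketch, reproducing the cat example above (`toRelativeBox` is a hypothetical helper name, not a function of this library):

```javascript
// Hypothetical helper: converts a pixel-space box (upper left corner plus size)
// into the relative ground truth format described above.
function toRelativeBox(box, imgWidth, imgHeight, label) {
  return {
    x: box.x / imgWidth,
    y: box.y / imgHeight,
    width: box.width / imgWidth,
    height: box.height / imgHeight,
    label
  }
}

// the 400px x 400px cat image from the example
const catBox = toRelativeBox({ x: 50, y: 100, width: 200, height: 100 }, 400, 400, 'cat')
console.log(JSON.stringify([catBox]))
// [{"x":0.125,"y":0.25,"width":0.5,"height":0.25,"label":"cat"}]
```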

Computing Box Anchors

Before training your detector, you want to compute 5 anchor boxes over your training set. An anchor box is basically an object of shape { "x": boxWidth / 32, "y": boxHeight / 32 }, where x and y are the anchor box sizes relative to the grid cell size (32px).

To determine the 5 anchor boxes, simply perform kmeans clustering with 5 clusters over the width and height of each ground truth box of your training set. There are plenty of kmeans implementations out there which you can use, but I will provide a script for that, coming soon...
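Until such a script exists, the clustering step can be sketched in plain JavaScript. This is not the author's script: `computeAnchors` is a hypothetical helper that runs a basic kmeans with Euclidean distance and a deterministic initialization, and it assumes the box widths and heights are given in pixels at the network input resolution, so that dividing by the 32px cell size yields anchors in the config format.

```javascript
const ANCHOR_CELL_SIZE = 32 // grid cell size in px, as stated above

// Hypothetical helper: plain kmeans (Euclidean distance) over box sizes.
// boxes: array of { width, height } in pixels (assumed input resolution).
function computeAnchors(boxes, numAnchors = 5, maxIterations = 100) {
  if (boxes.length < numAnchors) {
    throw new Error('computeAnchors - need at least as many boxes as anchors')
  }

  // deterministic initialization: evenly spaced picks from boxes sorted by area
  const sorted = [...boxes].sort((a, b) => a.width * a.height - b.width * b.height)
  let centroids = Array.from({ length: numAnchors }, (_, i) => {
    const box = sorted[Math.floor((i + 0.5) * sorted.length / numAnchors)]
    return { width: box.width, height: box.height }
  })

  for (let iter = 0; iter < maxIterations; iter++) {
    // assignment step: add each box to the sums of its nearest centroid
    const sums = centroids.map(() => ({ width: 0, height: 0, count: 0 }))
    for (const box of boxes) {
      let best = 0
      let bestDist = Infinity
      centroids.forEach((c, idx) => {
        const dist = (box.width - c.width) ** 2 + (box.height - c.height) ** 2
        if (dist < bestDist) {
          bestDist = dist
          best = idx
        }
      })
      sums[best].width += box.width
      sums[best].height += box.height
      sums[best].count++
    }

    // update step: move each centroid to the mean of its cluster,
    // keeping the old centroid if the cluster is empty
    const updated = centroids.map((c, idx) => sums[idx].count > 0
      ? { width: sums[idx].width / sums[idx].count, height: sums[idx].height / sums[idx].count }
      : c)

    const converged = updated.every((c, idx) =>
      c.width === centroids[idx].width && c.height === centroids[idx].height)
    centroids = updated
    if (converged) break
  }

  return centroids.map(c => ({ x: c.width / ANCHOR_CELL_SIZE, y: c.height / ANCHOR_CELL_SIZE }))
}

// toy example with two obvious clusters and 2 anchors
const toyBoxes = [
  { width: 32, height: 32 }, { width: 32, height: 32 },
  { width: 320, height: 160 }, { width: 320, height: 160 }
]
console.log(computeAnchors(toyBoxes, 2)) // [ { x: 1, y: 1 }, { x: 10, y: 5 } ]
```

Note that the original YOLOv2 work clusters with an IOU based distance rather than Euclidean distance; swapping the `dist` computation is the only change needed to try that variant.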

Yolo Loss Function

The Yolo loss function computes the sum of the coordinate, object, class and no object loss. You can tune the weight of each loss term contributing to the total loss by adjusting the corresponding scale parameters in your config file, as mentioned above.
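To make the role of the scale parameters concrete, here is a small sketch of a weighted sum over the four loss terms. It only illustrates the description above: `weightedTotalLoss` is a hypothetical function, and whether the library applies the scales exactly like this (or reports already scaled losses) is an assumption.

```javascript
// Hypothetical illustration of how the scale factors weight each loss term.
// Assumption: the individual losses passed in are unscaled.
function weightedTotalLoss(losses, config) {
  return config.coordScale * losses.coordLoss
    + config.objectScale * losses.objectLoss
    + config.noObjectScale * losses.noObjectLoss
    + config.classScale * losses.classLoss
}

// default scales from the config example above
const scales = { objectScale: 5, noObjectScale: 1, coordScale: 1, classScale: 1 }
// made-up loss values for a single class detector (class loss is 0)
const exampleLosses = { coordLoss: 0.5, objectLoss: 0.25, noObjectLoss: 0.25, classLoss: 0 }

console.log(weightedTotalLoss(exampleLosses, scales)) // 2
```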

The no object loss term penalizes the scores of all box anchors in the grid which do not have a corresponding ground truth bounding box. In other words, they should ideally predict a score of 0 if there is no object of interest at that position.

On the other hand, the object, class and coordinate loss terms refer to the accuracy of the prediction at each anchor position where there is a ground truth bounding box. The coordinate loss simply penalizes the difference between the predicted bounding box coordinates and the ground truth box coordinates, while the object loss penalizes the difference between the predicted confidence score and the box IOU.
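For reference, the box IOU (intersection over union) that the object loss compares against can be computed as follows. This is a generic sketch using the same { x, y, width, height } box shape as the ground truth files; `iou` is a hypothetical standalone function, not the library's internal implementation.

```javascript
// Intersection over union of two axis-aligned boxes given as
// { x, y, width, height } with (x, y) being the upper left corner.
function iou(a, b) {
  const interWidth = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x))
  const interHeight = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y))
  const intersection = interWidth * interHeight
  const union = a.width * a.height + b.width * b.height - intersection
  return union > 0 ? intersection / union : 0
}

// the predicted box covers the upper half of the ground truth box
const groundTruthBox = { x: 0, y: 0, width: 2, height: 2 }
const predictedBox = { x: 0, y: 0, width: 2, height: 1 }
console.log(iou(groundTruthBox, predictedBox)) // 0.5
```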

The class loss penalizes errors in the predicted class scores. Note that when training a single class object detector you can simply ignore that parameter, as the class loss is always 0 in that case.

PS: You can simply go with the default values shown in the config example above.

Initializing the Model Weights

When training a model from scratch, you need some weights to begin with. Simply open initWeights.html, located in the /train folder of the repo, in your browser. Enter the number of classes, hit save and use the saved file as the initial checkpoint weight file.

Start Training

For a complete example, also check out the /train folder at the root of this repo, which also contains some tooling to save intermediary checkpoints of your model weights as well as statistics of the average loss after each epoch.

Set up the model for training:

    const config = // your config

    // simply use any of the optimizers provided by tfjs (I usually use adam)
    const learningRate = 0.001
    const optimizer = tf.train.adam(learningRate, 0.9, 0.999, 1e-8)

    // initialize a trainable TinyYolov2
    const net = new yolo.TinyYolov2Trainable(config, optimizer)

    // load initial weights or the weights of any checkpoint
    const checkpointUri = 'checkpoints/initial_glorot_1_classes.weights'
    const weights = new Float32Array(await (await fetch(checkpointUri)).arrayBuffer())
    await net.load(weights)

What I usually do is name the json files the same as the corresponding image, e.g. img1.jpg and img1.json, and provide an endpoint to retrieve the json file names as an array:

    // note the outer await: json() returns a promise
    const boxJsonUris = await (await fetch('/boxJsonUris')).json()

Furthermore you can choose to train your model on a fixed input size or you can perform multi scale training, which is a good way to improve the accuracy of your model at different scales. This can also be helpful to augment your data, in case you only have a limited number of training samples:

    // should be multiples of 32 (grid cell size)
    const trainSizes = [160, 224, 320, 416]
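Since each grid cell covers 32px, an input size that is a multiple of 32 maps to a whole number of cells per side (416 gives a 13 x 13 grid). A small sanity check for your list of training sizes could look like this; `numGridCells` is a hypothetical helper, not a library function:

```javascript
const GRID_CELL_SIZE = 32 // grid cell size in px

// Hypothetical helper: returns the number of grid cells per side for an
// input size and rejects sizes that are not positive multiples of 32.
function numGridCells(inputSize) {
  if (!Number.isInteger(inputSize) || inputSize <= 0 || inputSize % GRID_CELL_SIZE !== 0) {
    throw new Error(`input size ${inputSize} is not a positive multiple of ${GRID_CELL_SIZE}`)
  }
  return inputSize / GRID_CELL_SIZE
}

console.log([160, 224, 320, 416].map(size => numGridCells(size))) // [ 5, 7, 10, 13 ]
```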

Then we can actually train it:

    for (let epoch = startEpoch; epoch < maxEpoch; epoch++) {

      // always shuffle your inputs for each epoch
      const shuffledInputs = yolo.shuffleArray(boxJsonUris)

      // loop through shuffled inputs
      for (let dataIdx = 0; dataIdx < shuffledInputs.length; dataIdx++) {

        // fetch image and corresponding ground truth bounding boxes
        const boxJsonUri = shuffledInputs[dataIdx]
        const imgUri = boxJsonUri.replace('.json', '.jpg')

        const groundTruth = await (await fetch(boxJsonUri)).json()
        const img = await yolo.bufferToImage(await (await fetch(imgUri)).blob())

        // rescale and backward pass input image for each input size
        for (let sizeIdx = 0; sizeIdx < trainSizes.length; sizeIdx++) {

          const inputSize = trainSizes[sizeIdx]

          const backwardOptions = {
            // filter boxes with width < 32 or height < 32
            minBoxSize: 32,
            // log computed losses
            reportLosses: function({ losses, numBoxes, inputSize }) {
              console.log(`ground truth boxes: ${numBoxes} (${inputSize})`)
              console.log(`noObjectLoss[${dataIdx}]: ${yolo.round(losses.noObjectLoss, 4)}`)
              console.log(`objectLoss[${dataIdx}]: ${yolo.round(losses.objectLoss, 4)}`)
              console.log(`coordLoss[${dataIdx}]: ${yolo.round(losses.coordLoss, 4)}`)
              console.log(`classLoss[${dataIdx}]: ${yolo.round(losses.classLoss, 4)}`)
              console.log(`totalLoss[${dataIdx}]: ${yolo.round(losses.totalLoss, 4)}`)
            }
          }

          const loss = await net.backward(img, groundTruth, inputSize, backwardOptions)

          if (loss) {
            // don't forget to free the loss tensor
            loss.dispose()
          } else {
            console.log('no boxes remaining after filtering')
          }
        }
      }
    }

Overfit first!

Generally, it's a good idea to overfit on a small subset of your training data first, to verify that the loss is converging and that your detector is actually learning something. To do so, simply train your detector on 10 - 20 images of your training data for some epochs. Once the loss converges, save the model, run inference on these 10 - 20 images to view the predicted bounding boxes, and compare them to the ground truth boxes.
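One way to pick such a subset is to draw a fixed number of entries at random from the list of json file names and train on only those. The sketch below is an assumption about workflow rather than part of the repo's tooling; `pickSubset` is a hypothetical helper and the file names are made up.

```javascript
// Hypothetical helper: returns `size` randomly chosen items without
// modifying the input array (Fisher-Yates shuffle on a copy).
function pickSubset(items, size) {
  const copy = [...items]
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    const tmp = copy[i]
    copy[i] = copy[j]
    copy[j] = tmp
  }
  return copy.slice(0, size)
}

// made-up file names standing in for the array of json uris
const allUris = Array.from({ length: 100 }, (_, i) => `img${i + 1}.json`)
const overfitUris = pickSubset(allUris, 20)
console.log(overfitUris.length) // 20
```

Keep the same subset for both training and the follow-up inference run, so that the predicted boxes are compared against the ground truth of exactly the images the model was fitted on.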
