Auto-annotation API
Overview
This layer provides functionality that allows you to automatically annotate a CVAT dataset by running a custom function on your local machine. A function, in this context, is a Python object that implements a particular protocol defined by this layer. To avoid confusion with Python functions, auto-annotation functions will be referred to as “AA functions” in the following text.
A typical AA function will be based on a machine learning model and consist of the following basic elements:
Code to load the ML model.
A specification describing the annotations that the AA function can produce.
Code to convert data from CVAT to a format the ML model can understand.
Code to run the ML model.
Code to convert resulting annotations to a format CVAT can understand.
The layer can be divided into several parts:
The interface, containing the protocol that an AA function must implement.
The driver, containing functionality to annotate a CVAT dataset using an AA function.
Predefined AA functions based on torchvision models.
The `auto-annotate` CLI command provides a way to use an AA function from the command line rather than from a Python program. See the CLI documentation for details.
Example

    import PIL.Image
    import torchvision.models

    from cvat_sdk import make_client
    import cvat_sdk.models as models
    import cvat_sdk.auto_annotation as cvataa

    class TorchvisionDetectionFunction:
        def __init__(self, model_name: str, weights_name: str, **kwargs) -> None:
            # load the ML model
            weights_enum = torchvision.models.get_model_weights(model_name)
            self._weights = weights_enum[weights_name]
            self._transforms = self._weights.transforms()
            self._model = torchvision.models.get_model(model_name, weights=self._weights, **kwargs)
            self._model.eval()

        @property
        def spec(self) -> cvataa.DetectionFunctionSpec:
            # describe the annotations
            return cvataa.DetectionFunctionSpec(
                labels=[
                    cvataa.label_spec(cat, i, type="rectangle")
                    for i, cat in enumerate(self._weights.meta["categories"])
                    if cat != "N/A"
                ]
            )

        def detect(
            self, context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
        ) -> list[models.LabeledShapeRequest]:
            # determine the threshold for filtering results
            conf_threshold = context.conf_threshold or 0

            # convert the input into a form the model can understand
            transformed_image = [self._transforms(image)]

            # run the ML model
            results = self._model(transformed_image)

            # convert the results into a form CVAT can understand
            return [
                cvataa.rectangle(label.item(), [x.item() for x in box])
                for result in results
                for box, label, score in zip(result["boxes"], result["labels"], result["scores"])
                if score >= conf_threshold
            ]

    # log into the CVAT server
    with make_client(host="http://localhost", credentials=("user", "password")) as client:
        # annotate task 41617 using Faster R-CNN
        cvataa.annotate_task(
            client,
            41617,
            TorchvisionDetectionFunction("fasterrcnn_resnet50_fpn_v2", "DEFAULT", box_score_thresh=0.5),
        )

Auto-annotation interface
Currently, the only type of AA function supported by this layer is the detection function. Therefore, all of the following information will pertain to detection functions.
A detection function accepts an image and returns a list of shapes found in that image. When it is applied to a dataset, the AA function is run for every image, and the resulting lists of shapes are combined and uploaded to CVAT.
A detection function must have two attributes, `spec` and `detect`.
`spec` must contain the AA function’s specification, which is an instance of `DetectionFunctionSpec`.
`DetectionFunctionSpec` must be initialized with a sequence of `PatchedLabelRequest` objects that represent the labels that the AA function knows about. See the docstring of `DetectionFunctionSpec` for more information on the constraints that these objects must follow. `BadFunctionError` will be raised if any constraint violations are detected.
`detect` must be a function/method accepting two parameters:
`context` (`DetectionFunctionContext`). Contains invocation parameters and information about the current image. The following fields are available:
  `frame_name` (`str`). The file name of the frame on the CVAT server.
  `conf_threshold` (`float | None`). The confidence threshold that the function should use to filter objects. If `None`, the function may apply a default threshold at its discretion.
`image` (`PIL.Image.Image`). Contains image data.
`detect` must return a list of `LabeledShapeRequest` objects, representing shapes found in the image. See the docstring of `DetectionFunctionSpec` for more information on the constraints that these objects must follow.
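As an illustration of the `conf_threshold` contract described above, the following sketch shows one way a function might choose its effective threshold. The `DEFAULT_THRESHOLD` value, the helper names, and the `(label_id, score)` tuples are assumptions made for this example; they are not part of the SDK.

```python
from typing import Optional

# Hypothetical default; the protocol leaves this choice to the function.
DEFAULT_THRESHOLD = 0.5


def effective_threshold(conf_threshold: Optional[float]) -> float:
    """Return the caller's threshold, or the default if none was supplied."""
    # Compare against None explicitly so that an explicit 0.0 is respected.
    return DEFAULT_THRESHOLD if conf_threshold is None else conf_threshold


def filter_detections(
    detections: list[tuple[int, float]], conf_threshold: Optional[float]
) -> list[tuple[int, float]]:
    """Keep only (label_id, score) pairs at or above the threshold."""
    threshold = effective_threshold(conf_threshold)
    return [(label_id, score) for label_id, score in detections if score >= threshold]
```

In a real `detect` implementation, `context.conf_threshold` would take the place of the second argument.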
The same AA function may be used with any dataset that contains labels with the same names as those in the AA function’s specification. The driver matches labels between the spec and the dataset by name, and replaces the label IDs in the shape objects with those defined in the dataset.
For example, suppose the AA function’s spec defines the following labels:
Name | ID |
---|---|
bat | 0 |
rat | 1 |
And the dataset defines the following labels:
Name | ID |
---|---|
bat | 100 |
cat | 101 |
rat | 102 |
Then suppose `detect` returns a shape with `label_id` equal to 1. The driver will see that it refers to the `rat` label, and replace it with 102, since that’s the ID this label has in the dataset.
The same logic is used for sublabel and attribute IDs.
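The matching described above can be sketched in plain Python. This is a simplified illustration rather than the driver’s actual code: `remap_label_id` and the two dictionaries are hypothetical, and sublabels and attributes are left out.

```python
def remap_label_id(
    spec_labels: dict[str, int], dataset_labels: dict[str, int], label_id: int
) -> int:
    """Translate a label ID from the spec's numbering to the dataset's numbering."""
    # Invert the spec mapping to find the name behind the ID the function returned.
    spec_names_by_id = {id_: name for name, id_ in spec_labels.items()}
    name = spec_names_by_id[label_id]
    # Look up the ID that the dataset assigns to the same name.
    return dataset_labels[name]


# The label tables from the example above.
spec_labels = {"bat": 0, "rat": 1}
dataset_labels = {"bat": 100, "cat": 101, "rat": 102}

print(remap_label_id(spec_labels, dataset_labels, 1))  # prints 102
```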
Helper factory functions
The CVAT API model types used in the AA function protocol are somewhat unwieldy to work with, so it’s recommended to use the helper factory functions provided by this layer. These helpers instantiate an object of their corresponding model type, passing their arguments to the model constructor and sometimes setting some attributes to fixed values.
The following helpers are available for building specifications:
Name | Model type | Fixed attributes |
---|---|---|
label_spec | PatchedLabelRequest | - |
skeleton_label_spec | PatchedLabelRequest | type="skeleton" |
keypoint_spec | SublabelRequest | type="points" |
attribute_spec | AttributeRequest | mutable=False |
checkbox_attribute_spec | AttributeRequest | mutable=False, input_type="checkbox", values=[] |
number_attribute_spec | AttributeRequest | mutable=False, input_type="number" |
radio_attribute_spec | AttributeRequest | mutable=False, input_type="radio" |
select_attribute_spec | AttributeRequest | mutable=False, input_type="select" |
text_attribute_spec | AttributeRequest | mutable=False, input_type="text", values=[] |
For `number_attribute_spec`, it’s recommended to use the `cvat_sdk.attributes.number_attribute_values` function to create the `values` argument, since this function will enforce the constraints expected for attribute specs of this type. For example:

    cvataa.number_attribute_spec("size", 1, number_attribute_values(0, 10))

The following helpers are available for use in `detect`:
Name | Model type | Fixed attributes |
---|---|---|
shape | LabeledShapeRequest | frame=0 |
mask | LabeledShapeRequest | frame=0, type="mask" |
polygon | LabeledShapeRequest | frame=0, type="polygon" |
rectangle | LabeledShapeRequest | frame=0, type="rectangle" |
skeleton | LabeledShapeRequest | frame=0, type="skeleton" |
keypoint | SubLabeledShapeRequest | frame=0, type="points" |
For `mask`, it is recommended to create the points list using the `cvat_sdk.masks.encode_mask` function, which will convert a bitmap into a list in the format that CVAT expects. For example:

    cvataa.mask(my_label, encode_mask(
        my_mask,  # boolean 2D array, same size as the input image
        [x1, y1, x2, y2],  # top left and bottom right coordinates of the mask
    ))

To create shapes with attributes, it’s recommended to use the `cvat_sdk.attributes.attribute_vals_from_dict` function, which returns a list of objects that can be passed to an `attributes` argument:

    cvataa.rectangle(
        my_label, [x1, y1, x2, y2],
        attributes=attribute_vals_from_dict({my_attr1: val1, my_attr2: val2})
    )

Auto-annotation driver
The `annotate_task` function uses an AA function to annotate a CVAT task. It must be called as follows:

    annotate_task(<client>, <task ID>, <AA function>, <optional arguments...>)

The supplied client will be used to make all API calls.
By default, new annotations will be appended to the old ones. Use `clear_existing=True` to remove old annotations instead.
If a detection function declares a label that has no matching label in the task, then by default, `BadFunctionError` is raised, and auto-annotation is aborted. If you use `allow_unmatched_labels=True`, then such labels will be ignored, and any shapes referring to them will be dropped. The same logic applies to sublabels and attributes.
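The two behaviors can be pictured with the following plain-Python sketch. It models only the decision made for labels and is not the driver’s implementation: `match_labels` is a hypothetical helper, and `FunctionLabelError` is a local stand-in for the SDK’s `BadFunctionError`.

```python
class FunctionLabelError(Exception):
    """Local stand-in for the SDK's BadFunctionError (illustration only)."""


def match_labels(
    spec_names: list[str], task_names: set[str], allow_unmatched_labels: bool = False
) -> list[str]:
    """Return the spec labels that also exist in the task."""
    matched = []
    for name in spec_names:
        if name in task_names:
            matched.append(name)
        elif not allow_unmatched_labels:
            # Default behavior: abort as soon as a label has no counterpart.
            raise FunctionLabelError(f"label {name!r} is not defined in the task")
        # Otherwise the label is ignored; shapes referring to it would be dropped.
    return matched
```

For instance, `match_labels(["bat", "owl"], {"bat", "rat"}, allow_unmatched_labels=True)` keeps only `bat`, while the same call without the flag raises the error.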
It’s possible to pass a custom confidence threshold to the function via the `conf_threshold` parameter.
`annotate_task` will raise a `BadFunctionError` exception if it detects that the function violated the AA function protocol.
Predefined AA functions
This layer includes several predefined AA functions. You can use them as-is, or as a base on which to build your own.
Each function is implemented as a module to allow usage via the CLI `auto-annotate` command. Therefore, in order to use it from the SDK, you’ll need to import the corresponding module.
cvat_sdk.auto_annotation.functions.torchvision_detection
This AA function uses object detection models from the torchvision library. It produces rectangle annotations.
To use it, install CVAT SDK with the `pytorch` extra:

    $ pip install "cvat-sdk[pytorch]"

Usage from Python:

    from cvat_sdk.auto_annotation.functions.torchvision_detection import create as create_torchvision
    annotate_task(<client>, <task ID>, create_torchvision(<model name>, ...))

Usage from the CLI:

    cvat-cli auto-annotate "<task ID>" --function-module cvat_sdk.auto_annotation.functions.torchvision_detection \
        -p model_name=str:"<model name>" ...

The `create` function accepts the following parameters:
`model_name` (`str`) - the name of the model, such as `fasterrcnn_resnet50_fpn_v2`. This parameter is required.
`weights_name` (`str`) - the name of a weights enum value for the model, such as `COCO_V1`. Defaults to `DEFAULT`.
It also accepts arbitrary additional parameters, which are passed directly to the model constructor.
cvat_sdk.auto_annotation.functions.torchvision_instance_segmentation
This AA function is analogous to `torchvision_detection`, except it uses torchvision’s instance segmentation models and produces mask or polygon annotations (depending on the value of `conv_mask_to_poly`).
Refer to that function’s description for usage instructions and parameter information.
cvat_sdk.auto_annotation.functions.torchvision_keypoint_detection
This AA function is analogous to `torchvision_detection`, except it uses torchvision’s keypoint detection models and produces skeleton annotations. Keypoints which the model marks as invisible will be marked as occluded in CVAT.
Refer to that function’s description for usage instructions and parameter information.
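The visibility mapping mentioned above could look roughly like the sketch below. It assumes each keypoint arrives as an `(x, y, visibility)` triple in which a visibility of 0 means “not visible”, as torchvision’s keypoint models report it. The `to_cvat_point` helper and the dictionary layout are hypothetical and merely stand in for the real shape objects.

```python
def to_cvat_point(x: float, y: float, visibility: float) -> dict:
    """Convert one (x, y, visibility) keypoint into a point-like dictionary."""
    return {
        "points": [x, y],
        # An invisible keypoint is kept, but flagged as occluded.
        "occluded": visibility == 0,
    }


# Two sample keypoints: the first visible, the second invisible.
keypoints = [(10.0, 20.0, 1.0), (15.0, 25.0, 0.0)]
points = [to_cvat_point(*kp) for kp in keypoints]
```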