Text Recognition Tool

Google Cloud Vision API

class layoutparser.ocr.GCVFeatureType

Bases: layoutparser.ocr.base.BaseOCRElementType

The element types from the Google Cloud Vision API.

PAGE = 0

BLOCK = 1

PARA = 2

WORD = 3

SYMBOL = 4

property child_level
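The five GCV levels form a strict hierarchy, from the whole page down to individual symbols. The sketch below is a hypothetical stand-in, not the layoutparser class. It models this ordering with Python's IntEnum and assumes that child_level returns the next finer level, with None below SYMBOL.

```python
from enum import IntEnum
from typing import Optional


class GCVLevelSketch(IntEnum):
    """Hypothetical stand-in for GCVFeatureType; not the layoutparser class."""

    PAGE = 0
    BLOCK = 1
    PARA = 2
    WORD = 3
    SYMBOL = 4

    @property
    def child_level(self) -> Optional["GCVLevelSketch"]:
        # Assumption: the child is the next finer level; SYMBOL has no child.
        if self is GCVLevelSketch.SYMBOL:
            return None
        return GCVLevelSketch(self + 1)


# Walk from the coarsest level down to the finest.
level = GCVLevelSketch.PAGE
chain = []
while level is not None:
    chain.append(level.name)
    level = level.child_level
print(" -> ".join(chain))  # PAGE -> BLOCK -> PARA -> WORD -> SYMBOL
```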

class layoutparser.ocr.GCVAgent(languages=None, ocr_image_decode_type='.png')

Bases: layoutparser.ocr.base.BaseOCRAgent

A wrapper for the Google Cloud Vision (GCV) text detection APIs.

Note

The Google Cloud Vision API returns the output text in two formats:

text_annotations:
In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.
full_text_annotation:
To support finer user control, GCV also provides the full_text_annotation output, which returns the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function to aggregate the texts at a given aggregation level.

Create a Google Cloud Vision OCR Agent.

Parameters

languages (list, optional) – You can specify the language codes of the documents to improve detection accuracy. The supported languages and their codes can be found on this page. Defaults to None.
ocr_image_decode_type (str, optional) –
The format to convert the input image to before sending it for GCV OCR. Defaults to ".png".
- ".png" is suggested because it is lossless, so no image detail is lost.
- ".jpg" can also be a good choice if the input image is very large, as it produces a smaller file to send.

DEPENDENCIES = ['google-cloud-vision']

classmethod with_credential(credential_path, **kwargs)

Specify the credential to use for the GCV OCR API.

Parameters: credential_path (str) – The path to the credential file.

detect(image, return_response=False, return_only_text=False, agg_output_level=None)

Send the input image for OCR.

Parameters

image (np.ndarray or str) – The input image array or the name of the image file.
return_response (bool, optional) – Whether to directly return the Google Cloud response. Defaults to False.
return_only_text (bool, optional) – Whether to return only the texts in the OCR results. Defaults to False.
agg_output_level (GCVFeatureType, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.

static gather_text_annotations(response)

Convert the text_annotations from GCV output to a Layout object.

Parameters: response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.
Returns: The retrieved layout from the response.
Return type: Layout
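GCV's text_annotations list places an entry for the full detected text first, followed by finer-grained entries (typically words), each with a description and a bounding polygon. The sketch below is hypothetical and uses plain dicts as a simplified stand-in for GCV's EntityAnnotation (description and bounding_poly.vertices). For simplicity it collapses each polygon to an axis-aligned rectangle. That is a choice made for the sketch, not necessarily what the library does.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def text_annotations_to_boxes(annotations: List[Dict]) -> List[Tuple[str, Box]]:
    """Hypothetical helper: turn simplified text_annotations into (text, box) pairs.

    Each annotation is assumed to look like
    {"description": str, "vertices": [(x, y), ...]}.
    """
    results = []
    # The first entry holds the full detected text, so skip index 0.
    for ann in annotations[1:]:
        xs = [x for x, _ in ann["vertices"]]
        ys = [y for _, y in ann["vertices"]]
        # Reduce the (possibly rotated) polygon to its bounding rectangle.
        results.append((ann["description"], (min(xs), min(ys), max(xs), max(ys))))
    return results


sample = [
    {"description": "Hello world", "vertices": [(10, 5), (120, 5), (120, 30), (10, 30)]},
    {"description": "Hello", "vertices": [(10, 5), (60, 5), (60, 30), (10, 30)]},
    {"description": "world", "vertices": [(70, 6), (120, 6), (120, 29), (70, 29)]},
]
print(text_annotations_to_boxes(sample))
# [('Hello', (10, 5, 60, 30)), ('world', (70, 6, 120, 29))]
```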

static gather_full_text_annotation(response, agg_level)

Convert the full_text_annotation from GCV output to a Layout object.

Parameters

response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.
agg_level (GCVFeatureType) – The layout level to aggregate the text in full_text_annotation.

Returns

The retrieved layout from the response.

Return type

Layout
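GCV's full_text_annotation nests pages, blocks, paragraphs, words, and symbols, and only the symbols carry characters. Aggregating at a level therefore means descending to that level and concatenating the symbol text underneath each element. The sketch below is hypothetical and uses nested dicts that mirror GCV's field names as a simplified stand-in for the real response. Levels are numbered as in GCVFeatureType (0 = PAGE through 4 = SYMBOL). The sketch ignores bounding boxes and GCV's detected-break information and simply joins words with spaces.

```python
from typing import Dict, List

# Field names of GCV's full_text_annotation hierarchy, coarsest to finest.
CHILD_FIELDS = ["pages", "blocks", "paragraphs", "words", "symbols"]


def element_text(element: Dict, depth: int) -> str:
    """Concatenate the symbol text under an element found at CHILD_FIELDS[depth]."""
    if depth == len(CHILD_FIELDS) - 1:
        return element["text"]  # symbols carry the actual characters
    children = element.get(CHILD_FIELDS[depth + 1], [])
    parts = [element_text(child, depth + 1) for child in children]
    # Symbols form a word with no separator; coarser pieces are joined by spaces.
    sep = "" if depth == len(CHILD_FIELDS) - 2 else " "
    return sep.join(parts)


def gather_at_level(annotation: Dict, level: int) -> List[str]:
    """Hypothetical helper: collect the text of every element at `level`."""
    elements = [annotation]
    # Descend to the requested level, flattening the child lists at each step.
    for field in CHILD_FIELDS[: level + 1]:
        elements = [child for el in elements for child in el.get(field, [])]
    return [element_text(el, level) for el in elements]


def word(text: str) -> Dict:
    # Hypothetical builder for a word made of one-character symbols.
    return {"symbols": [{"text": ch} for ch in text]}


annotation = {"pages": [{"blocks": [
    {"paragraphs": [{"words": [word("Layout"), word("Parser")]}]},
    {"paragraphs": [{"words": [word("OCR")]}]},
]}]}

print(gather_at_level(annotation, 3))  # ['Layout', 'Parser', 'OCR']
print(gather_at_level(annotation, 1))  # ['Layout Parser', 'OCR']
```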

load_response(filename)

save_response(res, file_name)

Tesseract OCR API

class layoutparser.ocr.TesseractFeatureType

Bases: layoutparser.ocr.base.BaseOCRElementType

The element types for the Tesseract detection API.

PAGE = 0

BLOCK = 1

PARA = 2

LINE = 3

WORD = 4

property group_levels
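Tesseract's tabular output (for example, from pytesseract.image_to_data) identifies each element with the hierarchy columns page_num, block_num, par_num, line_num, and word_num. Each level is numbered relative to its parent, so a line is only uniquely identified by its own ID together with all coarser IDs. The sketch below is a hypothetical stand-in, not the layoutparser class. It assumes group_levels returns exactly that prefix of columns.

```python
from enum import IntEnum
from typing import List

# Hierarchy columns in the table returned by pytesseract.image_to_data.
TESSERACT_ID_COLUMNS = ["page_num", "block_num", "par_num", "line_num", "word_num"]


class TesseractLevelSketch(IntEnum):
    """Hypothetical stand-in for TesseractFeatureType; not the layoutparser class."""

    PAGE = 0
    BLOCK = 1
    PARA = 2
    LINE = 3
    WORD = 4

    @property
    def group_levels(self) -> List[str]:
        # Assumption: a level is keyed by its own ID column plus every coarser one.
        return TESSERACT_ID_COLUMNS[: self + 1]


print(TesseractLevelSketch.LINE.group_levels)
# ['page_num', 'block_num', 'par_num', 'line_num']
```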

class layoutparser.ocr.TesseractAgent(languages='eng', **kwargs)

Bases: layoutparser.ocr.base.BaseOCRAgent

A wrapper for the Tesseract text detection APIs, based on PyTesseract.

Create a Tesseract OCR Agent.

Parameters: languages (list or str, optional) – You can specify the language code(s) of the documents to improve detection accuracy. The supported languages and their codes can be found on its GitHub repo. Two formats are accepted: 1) a single string of "+"-joined codes, such as "eng+fra", or 2) a list of strings, such as ["eng", "fra"]. Defaults to 'eng'.
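Both accepted formats describe the same thing: Tesseract's own language argument is a "+"-joined string such as "eng+fra", which is also the form pytesseract's lang parameter takes. The hypothetical helper below sketches how a list could be normalized to that string. It is not the library's code.

```python
from typing import List, Union


def to_tesseract_lang(languages: Union[str, List[str]]) -> str:
    """Hypothetical helper: normalize either accepted format to the "eng+fra" form."""
    if isinstance(languages, str):
        return languages  # already "+"-joined, e.g. "eng+fra"
    return "+".join(languages)


print(to_tesseract_lang(["eng", "fra"]))  # eng+fra
print(to_tesseract_lang("eng"))  # eng
```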

DEPENDENCIES = ['pytesseract']

classmethod with_tesseract_executable(tesseract_cmd_path, **kwargs)

detect(image, return_response=False, return_only_text=True, agg_output_level=None)

Send the input image for OCR.

Parameters

image (np.ndarray or str) – The input image array or the name of the image file.
return_response (bool, optional) – Whether to directly return all output (strings and box info) from Tesseract. Defaults to False.
return_only_text (bool, optional) – Whether to return only the texts in the OCR results. Defaults to True.
agg_output_level (TesseractFeatureType, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.

static gather_data(response, agg_level)

Gather the OCR’ed text, bounding boxes, and confidence at a given aggregation level.
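Aggregating at a level amounts to grouping the word rows by that level's hierarchy columns. For each group, the sketch joins the text, takes the union of the boxes, and averages the confidence. This is a hypothetical sketch, not the library's implementation. It assumes input shaped like pytesseract.image_to_data(..., output_type=Output.DICT), with parallel lists keyed by column name. It treats rows with non-empty text as word rows. In that table, structural rows (page, block, paragraph, line) typically have empty text and a confidence of -1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ID_COLUMNS = ["page_num", "block_num", "par_num", "line_num", "word_num"]


def gather_data_sketch(data: Dict[str, list], level: int) -> List[dict]:
    """Hypothetical sketch: aggregate pytesseract-style word rows at `level` (0=page ... 4=word)."""
    keys = ID_COLUMNS[: level + 1]
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Keep only rows with recognized text; skip empty structural rows.
        if str(text).strip():
            groups[tuple(data[k][i] for k in keys)].append(i)

    results = []
    for group_id, rows in groups.items():
        left = min(data["left"][i] for i in rows)
        top = min(data["top"][i] for i in rows)
        right = max(data["left"][i] + data["width"][i] for i in rows)
        bottom = max(data["top"][i] + data["height"][i] for i in rows)
        confs = [float(data["conf"][i]) for i in rows]
        results.append({
            "id": group_id,
            "text": " ".join(str(data["text"][i]) for i in rows),
            "box": (left, top, right, bottom),  # union of the word boxes
            "conf": sum(confs) / len(confs),  # mean word confidence
        })
    return results


# One structural page row (empty text, conf -1), then three word rows on two lines.
sample = {
    "page_num": [1, 1, 1, 1],
    "block_num": [0, 1, 1, 1],
    "par_num": [0, 1, 1, 1],
    "line_num": [0, 1, 1, 2],
    "word_num": [0, 1, 2, 1],
    "left": [0, 10, 60, 10],
    "top": [0, 5, 5, 40],
    "width": [200, 40, 50, 30],
    "height": [100, 20, 20, 20],
    "conf": [-1, 90, 80, 70],
    "text": ["", "Hello", "world", "again"],
}
for row in gather_data_sketch(sample, level=3):
    print(row["text"], row["box"], row["conf"])
# Hello world (10, 5, 110, 25) 85.0
# again (10, 40, 40, 60) 70.0
```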

static load_response(filename)

static save_response(res, file_name)
