Text Recognition Tool

Google Cloud Vision API

class layoutparser.ocr.GCVFeatureType

Bases: layoutparser.ocr.base.BaseOCRElementType

The element types from the Google Cloud Vision API

PAGE = 0
BLOCK = 1
PARA = 2
WORD = 3
SYMBOL = 4
property child_level
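The ordering of these levels can be illustrated with a small stand-in enum. This is only a sketch: `LevelSketch` is a hypothetical name, not the library class, and it assumes that `child_level` yields the next finer level and that SYMBOL has no child. Neither point is stated above.

```python
from enum import Enum
from typing import Optional


class LevelSketch(Enum):
    """Hypothetical stand-in mirroring the GCVFeatureType values listed above."""

    PAGE = 0
    BLOCK = 1
    PARA = 2
    WORD = 3
    SYMBOL = 4

    @property
    def child_level(self) -> Optional["LevelSketch"]:
        # Assumption: the child of a level is the next finer level,
        # and the finest level (SYMBOL) has no child.
        if self is LevelSketch.SYMBOL:
            return None
        return LevelSketch(self.value + 1)


# Walk from the coarsest level down to the finest one.
level: Optional[LevelSketch] = LevelSketch.PAGE
chain = []
while level is not None:
    chain.append(level.name)
    level = level.child_level
```

After running, `chain` holds the level names from PAGE down to SYMBOL in order.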
class layoutparser.ocr.GCVAgent(languages=None, ocr_image_decode_type='.png')

Bases: layoutparser.ocr.base.BaseOCRAgent

A wrapper for the Google Cloud Vision (GCV) TextDetection APIs.

Note

Google Cloud Vision API returns the output text in two types:

  • text_annotations:

    In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.

  • full_text_annotation:

    To support better user control, GCV also provides the full_text_annotation output, which returns the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function, which aggregates the text at a given aggregation level.
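The idea of aggregating hierarchical text at a chosen level can be sketched with plain Python. The nested dictionary below is mock data invented for illustration; it is not the real AnnotateImageResponse structure, and `gather_at_level` is a hypothetical helper rather than the library's gather_full_text_annotation.

```python
# Levels from coarsest to finest in this mock hierarchy.
LEVELS = ["page", "block", "para", "word"]

# Mock data only: one page holding two blocks, three paragraphs, six words.
mock_page = {
    "children": [
        {"children": [
            {"children": [{"text": "Hello"}, {"text": "world"}]},
            {"children": [{"text": "Second"}, {"text": "paragraph"}]},
        ]},
        {"children": [
            {"children": [{"text": "Another"}, {"text": "block"}]},
        ]},
    ]
}


def collect_text(node):
    """Join the text of every word found beneath a node."""
    if "text" in node:
        return node["text"]
    return " ".join(collect_text(child) for child in node["children"])


def gather_at_level(node, target_level, current_level="page"):
    """Return one string per element found at target_level."""
    if current_level == target_level:
        return [collect_text(node)]
    next_level = LEVELS[LEVELS.index(current_level) + 1]
    results = []
    for child in node["children"]:
        results.extend(gather_at_level(child, target_level, next_level))
    return results


paragraphs = gather_at_level(mock_page, "para")
blocks = gather_at_level(mock_page, "block")
```

Choosing a coarser level yields fewer, longer strings; choosing a finer level yields more, shorter ones.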

Create a Google Cloud Vision OCR Agent.

Parameters
  • languages (list, optional) – You can specify the language codes of the documents to improve detection accuracy. The supported languages and their codes can be found on this page. Defaults to None.

  • ocr_image_decode_type (str, optional) –

    The format to convert the input image to before sending it for GCV OCR. Defaults to “.png”.

    • “.png” is suggested because its compression is lossless.

    • “.jpg” can also be a good choice if the input image is very large.

DEPENDENCIES = ['google-cloud-vision']
classmethod with_credential(credential_path, **kwargs)

Specify the credential to use for the GCV OCR API.

Parameters

credential_path (str) – The path to the credential file

detect(image, return_response=False, return_only_text=False, agg_output_level=None)

Send the input image for OCR.

Parameters
  • image (np.ndarray or str) – The input image array or the name of the image file.

  • return_response (bool, optional) – Whether to directly return the Google Cloud response. Defaults to False.

  • return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to False.

  • agg_output_level (GCVFeatureType, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.

static gather_text_annotations(response)

Convert the text_annotations from the GCV output to a Layout object.

Parameters

response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.

Returns

The layout retrieved from the response.

Return type

Layout

static gather_full_text_annotation(response, agg_level)

Convert the full_text_annotation from the GCV output to a Layout object.

Parameters
  • response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.

  • agg_level (GCVFeatureType) – The layout level to aggregate the text in full_text_annotation.

Returns

The layout retrieved from the response.

Return type

Layout
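The methods documented above might be combined as in the following sketch. The function name `ocr_paragraphs`, its arguments, and the language code "en" are hypothetical choices for illustration, and it is an assumption that the extra keyword arguments of with_credential are forwarded to the GCVAgent constructor. Running it requires the google-cloud-vision dependency and a valid credential file.

```python
def ocr_paragraphs(image_path, credential_path):
    """Run GCV OCR on one image and return a paragraph-level Layout."""
    # Imported lazily so this sketch can be loaded without the optional dependency.
    from layoutparser.ocr import GCVAgent, GCVFeatureType

    # Assumption: extra keyword arguments (here `languages`) are passed on
    # to the GCVAgent constructor.
    agent = GCVAgent.with_credential(credential_path, languages=["en"])

    # Keep the raw response so it can be aggregated at any level afterwards.
    response = agent.detect(image_path, return_response=True)

    # Aggregate the hierarchical output at paragraph level.
    return agent.gather_full_text_annotation(response, agg_level=GCVFeatureType.PARA)
```

Keeping the raw response has the advantage that the same API call can be re-aggregated at another level, for example GCVFeatureType.WORD, without sending the image again.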

load_response(filename)
save_response(res, file_name)

Tesseract OCR API

class layoutparser.ocr.TesseractFeatureType

Bases: layoutparser.ocr.base.BaseOCRElementType

The element types for the Tesseract Detection API

PAGE = 0
BLOCK = 1
PARA = 2
LINE = 3
WORD = 4
property group_levels
class layoutparser.ocr.TesseractAgent(languages='eng', **kwargs)

Bases: layoutparser.ocr.base.BaseOCRAgent

A wrapper for the Tesseract TextDetection APIs, based on PyTesseract.

Create a Tesseract OCR Agent.

Parameters

languages (list or str, optional) – You can specify the language code(s) of the documents to improve detection accuracy. The supported languages and their codes can be found on its GitHub repo. Two formats are supported: 1) a single string with the codes joined by “+”, such as “eng+fra”, or 2) a list of strings, such as [“eng”, “fra”]. Defaults to ‘eng’.
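The two accepted formats describe the same set of languages. A minimal sketch of how a list could be reduced to the “+”-joined string form follows; `normalize_languages` is a hypothetical helper, and whether the agent performs exactly this conversion internally is an assumption.

```python
from typing import List, Union


def normalize_languages(languages: Union[str, List[str]]) -> str:
    """Return the language codes as a single '+'-joined string."""
    if isinstance(languages, str):
        # Already in the string format, e.g. "eng+fra".
        return languages
    # A list such as ["eng", "fra"] becomes "eng+fra".
    return "+".join(languages)


as_string = normalize_languages("eng+fra")
from_list = normalize_languages(["eng", "fra"])
```

Both calls produce the same value, so either format can be used interchangeably in this sketch.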

DEPENDENCIES = ['pytesseract']
classmethod with_tesseract_executable(tesseract_cmd_path, **kwargs)
detect(image, return_response=False, return_only_text=True, agg_output_level=None)

Send the input image for OCR.

Parameters
  • image (np.ndarray or str) – The input image array or the name of the image file.

  • return_response (bool, optional) – Whether to directly return all output (string and boxes info) from Tesseract. Defaults to False.

  • return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to True.

  • agg_output_level (TesseractFeatureType, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.
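A possible way to call detect is sketched below. The function names and the `image_path` argument are hypothetical, and it is an assumption that agg_output_level only takes effect when return_only_text is set to False. Running it requires pytesseract and a Tesseract installation.

```python
def ocr_plain_text(image_path):
    """Return only the recognized text of an image."""
    # Imported lazily so this sketch can be loaded without the optional dependency.
    from layoutparser.ocr import TesseractAgent

    agent = TesseractAgent(languages="eng")
    # With the signature defaults, detect returns only the text.
    return agent.detect(image_path)


def ocr_lines(image_path):
    """Return the OCR output aggregated at line level."""
    from layoutparser.ocr import TesseractAgent, TesseractFeatureType

    agent = TesseractAgent(languages=["eng", "fra"])
    # Assumption: return_only_text must be False for the aggregation level to apply.
    return agent.detect(
        image_path,
        return_only_text=False,
        agg_output_level=TesseractFeatureType.LINE,
    )
```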

static gather_data(response, agg_level)

Gather the OCR’ed text, bounding boxes, and confidence at a given aggregation level.
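What such an aggregation involves can be sketched on mock word-level rows. The column names follow Tesseract's TSV output, but the values are invented, `gather_by` is a hypothetical helper, and the choices made here (text joined with spaces, boxes merged by union, confidence averaged) are assumptions rather than a description of gather_data itself.

```python
from collections import defaultdict

# Mock word-level rows; values are invented for illustration.
rows = [
    {"block_num": 1, "par_num": 1, "line_num": 1,
     "left": 10, "top": 10, "width": 40, "height": 12, "conf": 96.0, "text": "Hello"},
    {"block_num": 1, "par_num": 1, "line_num": 1,
     "left": 55, "top": 11, "width": 42, "height": 12, "conf": 90.0, "text": "world"},
    {"block_num": 1, "par_num": 1, "line_num": 2,
     "left": 10, "top": 30, "width": 30, "height": 12, "conf": 80.0, "text": "again"},
]


def gather_by(rows, keys):
    """Group rows by the given columns and merge text, box, and confidence."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    gathered = []
    for _, members in sorted(groups.items()):
        gathered.append({
            # Words are joined in their original order.
            "text": " ".join(r["text"] for r in members),
            # The merged box (x1, y1, x2, y2) is the union of the member boxes.
            "box": (
                min(r["left"] for r in members),
                min(r["top"] for r in members),
                max(r["left"] + r["width"] for r in members),
                max(r["top"] + r["height"] for r in members),
            ),
            # Assumption: confidence is averaged across the members.
            "conf": sum(r["conf"] for r in members) / len(members),
        })
    return gathered


# Line level: group by block, paragraph, and line number.
lines = gather_by(rows, ["block_num", "par_num", "line_num"])
```

Grouping by fewer columns, for example only `block_num`, gives a coarser aggregation over the same rows.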

static load_response(filename)
static save_response(res, file_name)