Text Recognition Tool
Google Cloud Vision API
- class layoutparser.ocr.GCVFeatureType
Bases: layoutparser.ocr.base.BaseOCRElementType
The element types (levels of the text hierarchy) returned by the Google Cloud Vision API.
PAGE = 0
BLOCK = 1
PARA = 2
WORD = 3
SYMBOL = 4
- property child_level
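The integer values order the levels from the outermost (PAGE) to the innermost (SYMBOL). The sketch below is a self-contained stand-in, not layoutparser's source: the class name GCVLevel is hypothetical, and it assumes child_level returns the name of the field that holds the next level down in GCV's full_text_annotation (pages contain blocks, blocks contain paragraphs, and so on).

```python
from enum import IntEnum
from typing import Optional


class GCVLevel(IntEnum):
    """Hypothetical stand-in with the same members and values as GCVFeatureType."""

    PAGE = 0
    BLOCK = 1
    PARA = 2
    WORD = 3
    SYMBOL = 4

    @property
    def child_level(self) -> Optional[str]:
        # Field holding this element's children in a GCV full_text_annotation
        # (assumed mapping); SYMBOL is the leaf level, so it maps to None.
        return {
            "PAGE": "blocks",
            "BLOCK": "paragraphs",
            "PARA": "words",
            "WORD": "symbols",
        }.get(self.name)


for level in GCVLevel:
    print(level.name, level.child_level)
```

Because the members are IntEnum values, levels can also be compared directly, e.g. GCVLevel.PARA < GCVLevel.WORD.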
- class layoutparser.ocr.GCVAgent(languages=None, ocr_image_decode_type='.png')
Bases: layoutparser.ocr.base.BaseOCRAgent
A wrapper for the Google Cloud Vision (GCV) Text Detection APIs.
Note
The Google Cloud Vision API returns the output text in two formats:
- text_annotations: GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.
- full_text_annotation: For finer user control, GCV also provides the full_text_annotation output, which preserves the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function, which aggregates the texts at the given aggregation level.
Create a Google Cloud Vision OCR Agent.
- Parameters
languages (list, optional) – The language codes of the documents to detect. Specifying them can improve accuracy. The supported languages and their codes are listed in the Google Cloud Vision documentation. Defaults to None.
ocr_image_decode_type (str, optional) – The format to convert the input image to before sending it for GCV OCR. Defaults to ".png". ".png" is suggested because its lossless encoding preserves image quality, but ".jpg" could also be a good choice if the input image is very large, since its lossy compression produces a smaller upload.
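To show where the language codes and the encoded image end up in a request, the sketch below builds a JSON body in the shape of the public Vision REST method images:annotate, using only the standard library. This is an illustration under assumptions: GCVAgent itself talks to GCV through the google-cloud-vision client library, and build_annotate_request is a hypothetical helper, not part of layoutparser.

```python
import base64
import json
from typing import List, Optional


def build_annotate_request(image_bytes: bytes,
                           languages: Optional[List[str]] = None) -> str:
    """Hypothetical helper: JSON body for the Vision REST method images:annotate."""
    request = {
        # The encoded image (e.g. PNG or JPEG bytes) is sent base64-encoded.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if languages:
        # Language codes become hints; leaving them out lets GCV
        # detect the language automatically.
        request["imageContext"] = {"languageHints": languages}
    return json.dumps({"requests": [request]})


# Placeholder bytes standing in for an encoded image.
body = build_annotate_request(b"\x89PNG placeholder", languages=["en"])
print(body)
```

Choosing ".jpg" over ".png" only changes the bytes placed in "content"; a smaller encoding means a smaller request.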
DEPENDENCIES = ['google-cloud-vision']
- classmethod with_credential(credential_path, **kwargs)
Specify the credential to use for the GCV OCR API.
- Parameters
credential_path (str) – The path to the credential file.
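A common way to point Google Cloud client libraries at a service-account key file is the GOOGLE_APPLICATION_CREDENTIALS environment variable, which Application Default Credentials read. The hypothetical helper use_credential below shows that mechanism; whether with_credential works exactly this way is an assumption, not something this page states.

```python
import os


def use_credential(credential_path: str) -> None:
    """Hypothetical helper: make a service-account key file the default credential."""
    if not os.path.exists(credential_path):
        raise FileNotFoundError(f"Credential file {credential_path} does not exist.")
    # Google Cloud client libraries look up Application Default Credentials
    # through this environment variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_path
```

After such a call, a client created without explicit credentials, such as google.cloud.vision.ImageAnnotatorClient(), picks up the key file automatically.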
detect(image, return_response=False, return_only_text=False, agg_output_level=None)
Send the input image for OCR.
- Parameters
image (np.ndarray or str) – The input image array or the name of the image file.
return_response (bool, optional) – Whether to directly return the Google Cloud response. Defaults to False.
return_only_text (bool, optional) – Whether to return only the texts in the OCR results. Defaults to False.
agg_output_level (GCVFeatureType, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.
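The three flags select between mutually exclusive outputs. The sketch below illustrates one plausible precedence (raw response, then plain text, then aggregated output, then the default text_annotations layout) on a hypothetical dict-based response; the precedence order and the select_output name are assumptions, and the real method operates on an AnnotateImageResponse object rather than dicts.

```python
from typing import Any, Dict, Optional

# Hypothetical, simplified response holding the two text formats GCV returns.
response = {
    "text_annotations": [
        {"description": "Hello world"},  # first entry: the whole detected text
        {"description": "Hello"},
        {"description": "world"},
    ],
    "full_text_annotation": {"text": "Hello world"},
}


def select_output(response: Dict[str, Any],
                  return_response: bool = False,
                  return_only_text: bool = False,
                  agg_output_level: Optional[int] = None) -> Any:
    """Assumed precedence: raw response > plain text > aggregated > default."""
    if return_response:
        return response  # untouched service response
    if return_only_text:
        return response["full_text_annotation"]["text"]
    if agg_output_level is not None:
        # Stand-in for gather_full_text_annotation(response, agg_output_level).
        return f"full_text_annotation aggregated at level {agg_output_level}"
    # Stand-in for gather_text_annotations(response): skip the whole-text entry.
    return [a["description"] for a in response["text_annotations"][1:]]


print(select_output(response))
print(select_output(response, return_only_text=True))
```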
- static gather_text_annotations(response)
Convert the text_annotations from the GCV output to a Layout object.
- Parameters
response (AnnotateImageResponse) – The AnnotateImageResponse object returned by Google Cloud Vision.
- Returns
The retrieved layout from the response.
- Return type
Layout
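In a GCV text detection result, the first entry of text_annotations covers the whole detected text, and each later entry carries a description plus a bounding_poly with four vertices. The sketch below turns such entries into plain (x1, y1, x2, y2, text) tuples using dicts shaped like the REST JSON; annotations_to_boxes is a hypothetical name, and collapsing each quadrilateral to its enclosing rectangle is a simplification (the Layout produced by the real method may keep the quadrilateral).

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int, str]

# Hypothetical, simplified text_annotations in the REST JSON shape.
text_annotations = [
    {"description": "Hello world",
     "bounding_poly": {"vertices": [{"x": 10, "y": 5}, {"x": 120, "y": 5},
                                    {"x": 120, "y": 30}, {"x": 10, "y": 30}]}},
    {"description": "Hello",
     "bounding_poly": {"vertices": [{"x": 10, "y": 5}, {"x": 60, "y": 5},
                                    {"x": 60, "y": 30}, {"x": 10, "y": 30}]}},
    {"description": "world",
     "bounding_poly": {"vertices": [{"x": 70, "y": 6}, {"x": 120, "y": 6},
                                    {"x": 120, "y": 29}, {"x": 70, "y": 29}]}},
]


def annotations_to_boxes(annotations: List[Dict]) -> List[Box]:
    boxes = []
    for ann in annotations[1:]:  # entry 0 spans the whole detected text
        vertices = ann["bounding_poly"]["vertices"]
        # JSON output omits zero-valued coordinates, so default them to 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        # Collapse the quadrilateral to its axis-aligned enclosing rectangle.
        boxes.append((min(xs), min(ys), max(xs), max(ys), ann["description"]))
    return boxes


print(annotations_to_boxes(text_annotations))
```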
- static gather_full_text_annotation(response, agg_level)
Convert the full_text_annotation from the GCV output to a Layout object.
- Parameters
response (AnnotateImageResponse) – The AnnotateImageResponse object returned by Google Cloud Vision.
agg_level (GCVFeatureType) – The layout level at which to aggregate the text in full_text_annotation.
- Returns
The retrieved layout from the response.
- Return type
Layout
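Aggregating full_text_annotation at a level amounts to flattening the pages, blocks, paragraphs, words, symbols hierarchy down to that level and rebuilding each element's text from the symbols beneath it. The sketch below does this on hypothetical plain dicts; the helper names are illustrative, levels are the integer values of GCVFeatureType, and spacing is simplified (symbols joined directly, higher levels joined with spaces) rather than using the break information GCV attaches to symbols.

```python
from typing import Dict, List

# Child-field name at each level; index 0..4 lines up with the values of
# GCVFeatureType (PAGE=0, BLOCK=1, PARA=2, WORD=3, SYMBOL=4).
CHILD_FIELDS = ["pages", "blocks", "paragraphs", "words", "symbols"]

# Hypothetical, simplified full_text_annotation as plain dicts.
full_text_annotation = {
    "pages": [{
        "blocks": [{
            "paragraphs": [
                {"words": [
                    {"symbols": [{"text": "H"}, {"text": "i"}]},
                    {"symbols": [{"text": "t"}, {"text": "o"}]},
                ]},
                {"words": [
                    {"symbols": [{"text": "G"}, {"text": "o"}]},
                ]},
            ],
        }],
    }],
}


def elements_at(annotation: Dict, level: int) -> List[Dict]:
    """Flatten the hierarchy down to every element at the given level."""
    elements = annotation["pages"]
    for field in CHILD_FIELDS[1 : level + 1]:
        elements = [child for parent in elements for child in parent[field]]
    return elements


def element_text(element: Dict, level: int) -> str:
    """Rebuild an element's text from the symbols beneath it."""
    if level == len(CHILD_FIELDS) - 1:  # a symbol is the leaf
        return element["text"]
    field = CHILD_FIELDS[level + 1]
    # Symbols join into a word directly; everything above joins with spaces.
    sep = "" if field == "symbols" else " "
    return sep.join(element_text(child, level + 1) for child in element[field])


def aggregate(annotation: Dict, level: int) -> List[str]:
    return [element_text(e, level) for e in elements_at(annotation, level)]


print(aggregate(full_text_annotation, 2))  # paragraph level
print(aggregate(full_text_annotation, 3))  # word level
```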
Tesseract OCR API
- class layoutparser.ocr.TesseractFeatureType
Bases: layoutparser.ocr.base.BaseOCRElementType
The element types for the Tesseract detection API.
PAGE = 0
BLOCK = 1
PARA = 2
LINE = 3
WORD = 4
- property group_levels
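Tesseract's image_to_data table numbers each element with the columns page_num, block_num, par_num, line_num and word_num, so an element at a given level is identified by its own number plus those of every enclosing level. The stand-in below assumes group_levels returns exactly that list of columns; the class name TessLevel is hypothetical and this is not layoutparser's source.

```python
from enum import IntEnum
from typing import List

# Columns of Tesseract's image_to_data table that number each level.
LEVEL_COLUMNS = ["page_num", "block_num", "par_num", "line_num", "word_num"]


class TessLevel(IntEnum):
    """Hypothetical stand-in with the same members and values as TesseractFeatureType."""

    PAGE = 0
    BLOCK = 1
    PARA = 2
    LINE = 3
    WORD = 4

    @property
    def group_levels(self) -> List[str]:
        # An element is identified by its own number plus those of every
        # enclosing level, e.g. a line needs page, block, paragraph and line.
        return LEVEL_COLUMNS[: int(self) + 1]


print(TessLevel.LINE.group_levels)
```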
- class layoutparser.ocr.TesseractAgent(languages='eng', **kwargs)
Bases: layoutparser.ocr.base.BaseOCRAgent
A wrapper for the Tesseract text detection APIs, based on PyTesseract.
Create a Tesseract OCR Agent.
- Parameters
languages (list or str, optional) – The language code(s) of the documents to detect. Specifying them can improve accuracy. The supported languages and their codes can be found on its GitHub repo. Two formats are supported: 1) a single string of codes joined by "+", such as "eng+fra", or 2) a list of strings, such as ["eng", "fra"]. Defaults to 'eng'.
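Tesseract combines several languages with "+", and pytesseract passes such a string through its lang argument (for example, pytesseract.image_to_string(image, lang="eng+fra")). The hypothetical helper below shows how the two accepted formats can be normalized to that single string; that TesseractAgent does exactly this internally is an assumption.

```python
from typing import List, Union


def to_tesseract_lang(languages: Union[str, List[str]]) -> str:
    """Hypothetical helper: normalize both accepted formats to 'eng+fra' style."""
    if isinstance(languages, str):
        return languages  # already in Tesseract's '+'-joined form
    if not languages:
        raise ValueError("at least one language code is required")
    return "+".join(languages)


print(to_tesseract_lang(["eng", "fra"]))
```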
DEPENDENCIES = ['pytesseract']
detect(image, return_response=False, return_only_text=True, agg_output_level=None)
Send the input image for OCR.
- Parameters
image (np.ndarray or str) – The input image array or the name of the image file.
return_response (bool, optional) – Whether to directly return all output (string and box info) from Tesseract. Defaults to False.
return_only_text (bool, optional) – Whether to return only the texts in the OCR results. Defaults to True.
agg_output_level (TesseractFeatureType, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.
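Aggregating Tesseract output at a level can be pictured as grouping the word rows of the image_to_data table by the identifying columns of that level, then merging each group's boxes and texts. The sketch below uses hypothetical rows stored one dict per row (pytesseract's DICT output is column-oriented instead); group_rows is an illustrative name and not layoutparser's implementation.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int, str]

# Hypothetical rows in the shape of Tesseract's image_to_data table.
rows = [
    {"page_num": 1, "block_num": 1, "par_num": 1, "line_num": 1, "word_num": 0,
     "left": 10, "top": 5, "width": 110, "height": 20, "text": ""},
    {"page_num": 1, "block_num": 1, "par_num": 1, "line_num": 1, "word_num": 1,
     "left": 10, "top": 5, "width": 50, "height": 20, "text": "Hello"},
    {"page_num": 1, "block_num": 1, "par_num": 1, "line_num": 1, "word_num": 2,
     "left": 70, "top": 5, "width": 50, "height": 20, "text": "world"},
    {"page_num": 1, "block_num": 1, "par_num": 1, "line_num": 2, "word_num": 1,
     "left": 10, "top": 40, "width": 40, "height": 20, "text": "Bye"},
]


def group_rows(rows: List[Dict], group_levels: List[str]) -> List[Box]:
    """Merge word rows that share the same ids at the given levels."""
    groups: Dict[tuple, List[Dict]] = {}  # dicts keep first-seen (reading) order
    for row in rows:
        if not row["text"].strip():
            continue  # structural rows (page, block, line, ...) carry no text
        key = tuple(row[col] for col in group_levels)
        groups.setdefault(key, []).append(row)
    boxes = []
    for members in groups.values():
        # Union of the member boxes, with text joined in reading order.
        x1 = min(r["left"] for r in members)
        y1 = min(r["top"] for r in members)
        x2 = max(r["left"] + r["width"] for r in members)
        y2 = max(r["top"] + r["height"] for r in members)
        boxes.append((x1, y1, x2, y2, " ".join(r["text"] for r in members)))
    return boxes


line_levels = ["page_num", "block_num", "par_num", "line_num"]
print(group_rows(rows, line_levels))
```

Passing the column list for a coarser level, such as ["page_num", "block_num", "par_num"], merges both lines into a single paragraph box.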