Package Methods (0.14.2a0)
Summary of entries of Methods for documentai-toolbox.
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
_get_client_info(module: typing.Optional[str] = None) -> google.api_core.gapic_v1.client_info.ClientInfo
Returns a custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
_get_storage_client(module: typing.Optional[str] = None) -> google.cloud.storage.client.Client
Returns a Storage client with custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
create_batches(gcs_bucket_name: str, gcs_prefix: str, batch_size: int = 1000) -> typing.List[google.cloud.documentai_v1.types.document_io.BatchDocumentsInputConfig]
Create batches of documents in Cloud Storage to process with batch_process_documents().
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
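The batching idea can be illustrated without Cloud Storage access. The following is a minimal sketch, not the toolbox's implementation: `chunk_uris` is a hypothetical helper and `my-bucket` is a placeholder name. It only shows how a list of URIs might be split into groups of at most `batch_size`; the real function returns `BatchDocumentsInputConfig` objects rather than plain lists.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int = 1000) -> List[List[str]]:
    """Split uris into consecutive groups of at most batch_size items.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Five placeholder URIs split into batches of two.
batches = chunk_uris([f"gs://my-bucket/docs/{n}.pdf" for n in range(5)], batch_size=2)
print([len(batch) for batch in batches])
```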
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
create_gcs_uri(gcs_bucket_name: str, gcs_prefix: str) -> str
Creates a Cloud Storage URI from the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
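As a rough illustration of what building such a URI involves, here is a small sketch. `join_gcs_uri` is a hypothetical name, and stripping a leading slash from the prefix is an assumption made here; the toolbox function may treat that case differently.

```python
def join_gcs_uri(bucket_name: str, prefix: str) -> str:
    """Build a gs:// URI from a bucket name and an object prefix.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    # Assumption: a leading "/" on the prefix is dropped to avoid "gs://bucket//...".
    return f"gs://{bucket_name}/{prefix.lstrip('/')}"


print(join_gcs_uri("my-bucket", "output/123/0"))
```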
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
get_blob(gcs_uri: str, module: typing.Optional[str] = "get-bytes") -> google.cloud.storage.blob.Blob
Returns a blob from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
get_blobs(gcs_uri: typing.Optional[str] = None, gcs_bucket_name: typing.Optional[str] = None, gcs_prefix: typing.Optional[str] = "/", module: typing.Optional[str] = "get-bytes") -> typing.List[google.cloud.storage.blob.Blob]
Returns a list of blobs from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
get_bytes(gcs_bucket_name: str, gcs_prefix: str) -> typing.List[bytes]
Returns a list of bytes of JSON files from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
list_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str) -> typing.Dict[str, typing.List[str]]
Returns the paths to files in a Cloud Storage folder as a dictionary of lists.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
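The return type suggests a mapping from a key to a list of names. One plausible reading, assumed here and not confirmed by this page, is directory path to file names. The sketch below groups plain object names that way; `group_by_directory` and the sample names are hypothetical.

```python
import posixpath
from typing import Dict, Iterable, List


def group_by_directory(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group object names by their parent directory.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    tree: Dict[str, List[str]] = {}
    for name in blob_names:
        # Cloud Storage object names use "/" separators on every platform.
        directory, file_name = posixpath.split(name)
        tree.setdefault(directory, []).append(file_name)
    return tree


tree = group_by_directory(
    ["output/0/doc-0.json", "output/0/doc-1.json", "output/1/doc-0.json"]
)
print(tree)
```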
google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
print_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str, files_to_display: int = 4) -> None
Prints a tree of filenames in a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
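To give a feel for a truncated tree listing, here is a sketch that renders a directory-to-files mapping as text lines. The output layout, the `format_tree` name, and the interpretation of `files_to_display` as a per-directory cap are all assumptions; the toolbox's printed format may look different.

```python
from typing import Dict, List


def format_tree(tree: Dict[str, List[str]], files_to_display: int = 4) -> List[str]:
    """Render a directory -> file names mapping as tree-style lines.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    lines: List[str] = []
    for directory, file_names in tree.items():
        lines.append(directory)
        shown = file_names[:files_to_display]
        hidden = len(file_names) - len(shown)
        for index, file_name in enumerate(shown):
            # The final branch marker is used only when nothing follows it.
            is_last = index == len(shown) - 1 and hidden == 0
            lines.append(("└── " if is_last else "├── ") + file_name)
        if hidden > 0:
            lines.append(f"└── ... ({hidden} more)")
    return lines


for line in format_tree({"output/0": ["a.json", "b.json", "c.json"]}, files_to_display=2):
    print(line)
```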
google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
split_gcs_uri(gcs_uri: str) -> typing.Tuple[str, str]
Splits a Cloud Storage URI into the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
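Splitting a `gs://` URI into bucket and prefix can be sketched with a regular expression. `parse_gcs_uri` is a hypothetical stand-in, and raising `ValueError` on malformed input is an assumption; this page does not say how the toolbox handles invalid URIs.

```python
import re
from typing import Tuple


def parse_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, prefix).

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    match = re.fullmatch(r"gs://([^/]+)/?(.*)", gcs_uri)
    if match is None:
        raise ValueError(f"Not a valid gs:// URI: {gcs_uri!r}")
    return match.group(1), match.group(2)


bucket, prefix = parse_gcs_uri("gs://my-bucket/output/123/0")
print(bucket, prefix)
```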
google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
upload_file(gcs_output_directory: str, file_name: str, file_content: str, content_type: str = "application/json", module: typing.Optional[str] = "upload-file") -> None
Uploads the converted docproto to Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
_apply_text_offset(documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List], text_offset: int) -> None
Applies a text offset to all text_segments in documentai_object.
See more: google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
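Applying an offset to every text segment in a nested structure is naturally recursive. The sketch below assumes the object is a plain dict/list tree whose `text_segments` entries are dicts with `start_index` and `end_index` keys (the field names used by Document AI text anchors). `shift_text_segments` is a hypothetical name, and the toolbox's private helper may differ in detail.

```python
from typing import Any


def shift_text_segments(obj: Any, offset: int) -> None:
    """Add offset to start_index/end_index of every text segment, in place.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text_segments":
                for segment in value:
                    # Missing indices are treated as 0 (an assumption).
                    segment["start_index"] = int(segment.get("start_index", 0)) + offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + offset
            else:
                shift_text_segments(value, offset)
    elif isinstance(obj, list):
        for item in obj:
            shift_text_segments(item, offset)


shard = {
    "text_anchor": {"text_segments": [{"start_index": 0, "end_index": 5}]},
    "pages": [
        {"layout": {"text_anchor": {"text_segments": [{"start_index": 5, "end_index": 9}]}}}
    ],
}
shift_text_segments(shard, 100)
print(shard)
```

A shift like this is what makes indices from a later shard line up with the concatenated text of all earlier shards.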
google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
_bigquery_column_name(input_string: str) -> str
Converts a string into a BigQuery column name.
See more: google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
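Standard BigQuery column names contain only letters, digits, and underscores, and start with a letter or underscore. The sketch below shows one generic way to coerce an arbitrary label into that shape. It is not the toolbox's character mapping, which this page does not describe, and `to_column_name` is a hypothetical name.

```python
import re


def to_column_name(raw: str) -> str:
    """Coerce an arbitrary label into a conservative BigQuery column name.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    # Replace each run of disallowed characters with a single underscore.
    name = re.sub(r"[^0-9A-Za-z_]+", "_", raw.strip()).strip("_")
    # Column names must not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = f"_{name}"
    return name


print(to_column_name("Invoice #/Date"))
```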
google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
_dict_to_bigquery(dic: typing.Dict[str, typing.Union[str, typing.List[str]]], dataset_name: str, table_name: str, project_id: typing.Optional[str]) -> google.cloud.bigquery.job.load.LoadJob
Loads a dictionary into a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
_entities_from_shards(shards: typing.List[google.cloud.documentai_v1.types.document.Document]) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns a list of Entities and Properties from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
_get_batch_process_metadata(operation_name: str, location: typing.Optional[str] = None, timeout: typing.Optional[float] = None) -> google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata
Get BatchProcessMetadata from a batch_process_documents() long-running operation.
See more: google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document._get_shards
_get_shards(gcs_bucket_name: str, gcs_prefix: str) -> typing.List[google.cloud.documentai_v1.types.document.Document]
Returns a list of documentai.Document shards from a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.wrappers.document._get_shards
google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
_insert_into_dictionary_with_list(dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str) -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Inserts value into a dictionary that can contain lists.
See more: google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
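A dictionary whose values are either a string or a list of strings usually promotes a value to a list once a key repeats. The sketch below shows that pattern under this assumption; `insert_value` is a hypothetical name and the toolbox helper may behave differently at the edges.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def insert_value(dic: Dict[str, Value], key: str, value: str) -> Dict[str, Value]:
    """Insert value under key, promoting repeated keys to a list.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    existing = dic.get(key)
    if existing is None:
        dic[key] = value  # first occurrence stays a plain string
    elif isinstance(existing, list):
        existing.append(value)  # third and later occurrences extend the list
    else:
        dic[key] = [existing, value]  # second occurrence creates the list
    return dic


fields: Dict[str, Value] = {}
insert_value(fields, "total", "10")
for item in ("a", "b", "c"):
    insert_value(fields, "line_item", item)
print(fields)
```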
google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
_pages_from_shards(shards: typing.List[google.cloud.documentai_v1.types.document.Document]) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns a list of Pages from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
_get_hocr_bounding_box(
    element_with_layout: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension,
) -> typing.Optional[str]
Returns an hOCR bounding box string.
See more: google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
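In hOCR, a bounding box is written as `bbox x0 y0 x1 y1` in pixel coordinates. The sketch below shows how normalized vertices (values between 0 and 1) might be scaled by the page dimensions to produce such a string. `hocr_bbox`, the tuple-based vertex representation, and the rounding choice are assumptions for illustration, not the toolbox's code.

```python
from typing import Optional, Sequence, Tuple


def hocr_bbox(
    normalized_vertices: Sequence[Tuple[float, float]],
    page_width: float,
    page_height: float,
) -> Optional[str]:
    """Return an hOCR "bbox x0 y0 x1 y1" string, or None without vertices.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    if not normalized_vertices:
        return None
    # Scale normalized coordinates to pixels and round to whole numbers.
    xs = [round(x * page_width) for x, _ in normalized_vertices]
    ys = [round(y * page_height) for _, y in normalized_vertices]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


box = hocr_bbox([(0.1, 0.2), (0.5, 0.2), (0.5, 0.4), (0.1, 0.4)], 1000, 2000)
print(box)
```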
google.cloud.documentai_toolbox.wrappers.page._text_from_layout
_text_from_layout(layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str) -> str
Returns the text from a single layout element.
See more: google.cloud.documentai_toolbox.wrappers.page._text_from_layout
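In Document AI, a layout refers to the document text through index ranges rather than storing the text itself. The sketch below shows the slicing idea, with plain `(start, end)` tuples standing in for the layout's text segments. `text_from_segments` is a hypothetical name and the tuple representation is an assumption.

```python
from typing import Sequence, Tuple


def text_from_segments(segments: Sequence[Tuple[int, int]], text: str) -> str:
    """Concatenate the slices of text named by (start, end) index pairs.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    return "".join(text[start:end] for start, end in segments)


document_text = "Invoice\nTotal: 42\n"
print(repr(text_from_segments([(0, 8)], document_text)))
```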
google.cloud.documentai_toolbox.wrappers.page._trim_text
_trim_text(text: str) -> str
Removes extra space characters from text (blank, newline, tab, etc.).
See more: google.cloud.documentai_toolbox.wrappers.page._trim_text
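One common way to remove extra whitespace in Python is to split on any whitespace and rejoin with single spaces. The sketch below uses that approach. Whether the toolbox helper also collapses whitespace inside the text, as this version does, is not stated here, so treat `trim_text` as an assumption-laden illustration.

```python
def trim_text(text: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())


print(trim_text("  Total:\t42 \n"))
```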
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_json_response
convert_document_to_annotate_file_json_response() -> str
Convert OCR data from Document.proto to JSON str of AnnotateFileResponse for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_response
convert_document_to_annotate_file_response() -> google.cloud.vision_v1.types.image_annotator.AnnotateFileResponse
Convert OCR data from Document.proto to AnnotateFileResponse.proto for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
entities_to_bigquery(dataset_name: str, table_name: str, project_id: typing.Optional[str] = None) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted entities to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
entities_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of entities in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
export_hocr_str(title: str) -> str
Exports a string hOCR version of the Document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
google.cloud.documentai_toolbox.wrappers.document.Document.export_images
export_images(output_path: str, output_file_prefix: str, output_file_extension: str) -> typing.List[str]
Exports images from Document.entities to files.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_images
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
form_fields_to_bigquery(dataset_name: str, table_name: str, project_id: typing.Optional[str] = None) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted form fields to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
form_fields_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of form fields in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
from_batch_process_metadata(metadata: google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the output from BatchProcessMetadata.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
from_batch_process_operation(location: str, operation_name: str, timeout: typing.Optional[float] = None) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the operation name returned from batch_process_documents().
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
from_document_path(document_path: str) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local document_path.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
from_documentai_document(documentai_document: google.cloud.documentai_v1.types.document.Document) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local documentai_document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
from_gcs(gcs_bucket_name: str, gcs_prefix: str, gcs_input_uri: typing.Optional[str] = None) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage directory.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
from_gcs_uri(gcs_uri: str, gcs_input_uri: typing.Optional[str] = None) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage URI.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
get_entity_by_type(target_type: str) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns the list of Entities of target_type.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
get_form_field_by_name(target_field: str) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.FormField]
Returns the list of FormFields named target_field.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
search_pages(target_string: typing.Optional[str] = None, pattern: typing.Optional[str] = None) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns the list of Pages containing target_string or text matching pattern.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
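The two search modes, literal substring and regular expression, can be sketched over plain page texts. In the sketch below, `search_page_texts` is a hypothetical helper that returns page indices instead of `Page` objects, and requiring exactly one of the two arguments is an assumption about how the optional parameters are meant to be combined.

```python
import re
from typing import List, Optional, Sequence


def search_page_texts(
    page_texts: Sequence[str],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[int]:
    """Return indices of pages containing target_string or matching pattern.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    # Assumption: exactly one search mode must be chosen.
    if (target_string is None) == (pattern is None):
        raise ValueError("Specify exactly one of target_string or pattern.")
    if target_string is not None:
        return [i for i, text in enumerate(page_texts) if target_string in text]
    regex = re.compile(pattern)
    return [i for i, text in enumerate(page_texts) if regex.search(text)]


pages = ["Invoice 123", "Terms and conditions", "Invoice total: $42.00"]
print(search_page_texts(pages, target_string="Invoice"))
print(search_page_texts(pages, pattern=r"\$\d+\.\d{2}"))
```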
google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
split_pdf(pdf_path: str, output_path: str) -> typing.List[str]
Splits a local PDF file into multiple PDF files based on output from a Splitter processor.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
to_merged_documentai_document() -> google.cloud.documentai_v1.types.document.Document
Exports a documentai.Document from the wrapped document with shards merged.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
crop_image(documentai_page: google.cloud.documentai_v1.types.document.Document.Page) -> typing.Optional[PIL.Image.Image]
Returns the image cropped from the page image for the detected entity.
See more: google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
_get_elements(element_type: typing.Type, attribute_name: str) -> typing.List
Helper method to create elements based on specified type.
See more: google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
_extract_table_rows(table_rows: typing.Iterable[google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow]) -> typing.List[typing.List[str]]
Returns a list of rows from table_rows.
See more: google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
to_dataframe() -> pandas.core.frame.DataFrame
Returns pd.DataFrame from documentai.table.
See more: google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
_get_children_of_element(potential_children: typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]) -> typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]
Filters potential child elements to identify only those fully contained within this element.
See more: google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
Last updated 2025-10-30 UTC.