Package Methods (0.14.2a0)

Summary of entries of Methods for documentai-toolbox.

google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info

_get_client_info(module:typing.Optional[str]=None,)->google.api_core.gapic_v1.client_info.ClientInfo

google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client

_get_storage_client(module:typing.Optional[str]=None,)->google.cloud.storage.client.Client

Returns a Storage client with custom user agent header.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client

google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches

create_batches(gcs_bucket_name:str,gcs_prefix:str,batch_size:int=1000)->typing.List[google.cloud.documentai_v1.types.document_io.BatchDocumentsInputConfig]

Create batches of documents in Cloud Storage to process with batch_process_documents().

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
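
The batching idea can be sketched in plain Python. This is only an illustration of splitting a file list into groups of at most batch_size items; the helper name chunk_uris and the example URIs are hypothetical, and the sketch returns plain lists rather than BatchDocumentsInputConfig objects.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int = 1000) -> List[List[str]]:
    """Split a list of URIs into consecutive batches of at most batch_size items.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Five made-up document URIs, grouped two at a time.
uris = [f"gs://example-bucket/docs/file{i}.pdf" for i in range(5)]
batches = chunk_uris(uris, batch_size=2)
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each resulting group would correspond to one batch_process_documents() request.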

google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri

create_gcs_uri(gcs_bucket_name:str,gcs_prefix:str)->str

Creates a Cloud Storage URI from the bucket name and prefix.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri

google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob

get_blob(gcs_uri:str,module:typing.Optional[str]="get-bytes")->google.cloud.storage.blob.Blob

Returns a blob from Cloud Storage.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob

google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs

get_blobs(gcs_uri:typing.Optional[str]=None,gcs_bucket_name:typing.Optional[str]=None,gcs_prefix:typing.Optional[str]="/",module:typing.Optional[str]="get-bytes",)->typing.List[google.cloud.storage.blob.Blob]

Returns a list of blobs from Cloud Storage.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs

google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes

get_bytes(gcs_bucket_name:str,gcs_prefix:str)->typing.List[bytes]

Returns a list of bytes of JSON files from Cloud Storage.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes

google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree

list_gcs_document_tree(gcs_bucket_name:str,gcs_prefix:str)->typing.Dict[str,typing.List[str]]

Returns a dictionary of paths to files in a Cloud Storage folder.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree

google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree

print_gcs_document_tree(gcs_bucket_name:str,gcs_prefix:str,files_to_display:int=4)->None

Prints a tree of filenames in a Cloud Storage folder.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree

google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri

split_gcs_uri(gcs_uri:str)->typing.Tuple[str,str]

Splits a Cloud Storage URI into the bucket name and prefix.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
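
To show how a gs:// URI relates to a bucket name and prefix, here is a minimal pure-Python sketch of the round trip. The helpers make_gcs_uri and parse_gcs_uri are hypothetical stand-ins written for this example; the library's own create_gcs_uri and split_gcs_uri may validate or parse differently.

```python
from typing import Tuple

GCS_SCHEME = "gs://"


def make_gcs_uri(bucket_name: str, prefix: str) -> str:
    """Join a bucket name and object prefix into a gs:// URI (illustrative only)."""
    return f"{GCS_SCHEME}{bucket_name}/{prefix}"


def parse_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, prefix) at the first slash after the scheme."""
    if not gcs_uri.startswith(GCS_SCHEME):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    bucket_name, _, prefix = gcs_uri[len(GCS_SCHEME):].partition("/")
    return bucket_name, prefix


bucket, prefix = parse_gcs_uri("gs://my-bucket/path/to/output/")
print(bucket, prefix)  # my-bucket path/to/output/
print(make_gcs_uri(bucket, prefix))  # gs://my-bucket/path/to/output/
```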

google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file

upload_file(gcs_output_directory:str,file_name:str,file_content:str,content_type:str="application/json",module:typing.Optional[str]="upload-file",)->None

Uploads the converted docproto to Cloud Storage.

See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file

google.cloud.documentai_toolbox.wrappers.document._apply_text_offset

_apply_text_offset(documentai_object:typing.Union[typing.Dict[str,typing.Dict],typing.List],text_offset:int,)->None

Applies a text offset to all text_segments in documentai_object.

See more: google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
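
A text offset of this kind is typically needed when per-shard text indexes must be shifted to positions in a merged text. The following is a rough sketch of the idea on plain dictionaries and lists; the function name shift_text_segments and the sample data are assumptions made for the example, not the library's implementation.

```python
from typing import Any


def shift_text_segments(obj: Any, text_offset: int) -> None:
    """Recursively add text_offset to every segment found under a "text_segments" key.

    Works in place on nested dicts and lists. A missing index is treated
    as 0, which is an assumption of this sketch.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text_segments":
                for segment in value:
                    segment["start_index"] = int(segment.get("start_index", 0)) + text_offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + text_offset
            else:
                shift_text_segments(value, text_offset)
    elif isinstance(obj, list):
        for item in obj:
            shift_text_segments(item, text_offset)


# Made-up page fragment with one layout and one token.
page = {
    "layout": {"text_anchor": {"text_segments": [{"start_index": 0, "end_index": 5}]}},
    "tokens": [{"layout": {"text_anchor": {"text_segments": [{"end_index": 3}]}}}],
}
shift_text_segments(page, 100)
print(page["layout"]["text_anchor"]["text_segments"])
```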

google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name

_bigquery_column_name(input_string:str)->str

Converts a string into a BigQuery column name.

See more: google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
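
As an illustration of what such a conversion might involve, the sketch below normalizes a field label into a name made of lowercase letters, digits, and underscores that does not start with a digit. Both the helper to_column_name and this conservative rule set are assumptions for the example; the library's exact rules may differ.

```python
import re


def to_column_name(input_string: str) -> str:
    """Turn a free-form label into a conservative column name (illustrative rules)."""
    # Lowercase and replace each run of whitespace with one underscore.
    name = re.sub(r"\s+", "_", input_string.strip().lower())
    # Drop anything that is not a lowercase letter, digit, or underscore.
    name = re.sub(r"[^a-z0-9_]", "", name)
    # Avoid a leading digit by prefixing an underscore.
    if name and name[0].isdigit():
        name = "_" + name
    return name


print(to_column_name("Supplier Name:"))  # supplier_name
print(to_column_name("2nd Address"))  # _2nd_address
```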

google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery

_dict_to_bigquery(dic:typing.Dict[str,typing.Union[str,typing.List[str]]],dataset_name:str,table_name:str,project_id:typing.Optional[str],)->google.cloud.bigquery.job.load.LoadJob

Loads dictionary to a BigQuery table.

See more: google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery

google.cloud.documentai_toolbox.wrappers.document._entities_from_shards

_entities_from_shards(shards:typing.List[google.cloud.documentai_v1.types.document.Document],)->typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]

Returns a list of Entities and Properties from a list of documentai.Document shards.

See more: google.cloud.documentai_toolbox.wrappers.document._entities_from_shards

google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata

_get_batch_process_metadata(operation_name:str,location:typing.Optional[str]=None,timeout:typing.Optional[float]=None,)->google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata

Get BatchProcessMetadata from a batch_process_documents() long-running operation.

See more: google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata

google.cloud.documentai_toolbox.wrappers.document._get_shards

_get_shards(gcs_bucket_name:str,gcs_prefix:str)->typing.List[google.cloud.documentai_v1.types.document.Document]

Returns a list of documentai.Document shards from a Cloud Storage folder.

See more: google.cloud.documentai_toolbox.wrappers.document._get_shards

google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list

_insert_into_dictionary_with_list(dic:typing.Dict[str,typing.Union[str,typing.List[str]]],key:str,value:str)->typing.Dict[str,typing.Union[str,typing.List[str]]]

Inserts value into a dictionary that can contain lists.

See more: google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
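
The return type Dict[str, Union[str, List[str]]] suggests that a key holds a single string until a second value arrives, at which point it becomes a list. A minimal sketch of that behavior, under that assumption and with a hypothetical helper name insert_with_list:

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def insert_with_list(dic: Dict[str, Value], key: str, value: str) -> Dict[str, Value]:
    """Insert value under key, promoting the entry to a list on repeated keys."""
    if key not in dic:
        dic[key] = value  # first value: store the bare string
    elif isinstance(dic[key], list):
        dic[key].append(value)  # already a list: append
    else:
        dic[key] = [dic[key], value]  # second value: promote to a list
    return dic


fields: Dict[str, Value] = {}
insert_with_list(fields, "line_item", "Paper")
insert_with_list(fields, "line_item", "Pens")
insert_with_list(fields, "total", "42.00")
print(fields)  # {'line_item': ['Paper', 'Pens'], 'total': '42.00'}
```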

google.cloud.documentai_toolbox.wrappers.document._pages_from_shards

_pages_from_shards(shards:typing.List[google.cloud.documentai_v1.types.document.Document],)->typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]

Returns a list of Pages from a list of documentai.Document shards.

See more: google.cloud.documentai_toolbox.wrappers.document._pages_from_shards

google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box

_get_hocr_bounding_box(element_with_layout:typing.Union[google.cloud.documentai_v1.types.document.Document.Page.Paragraph,google.cloud.documentai_v1.types.document.Document.Page,google.cloud.documentai_v1.types.document.Document.Page.Token,google.cloud.documentai_v1.types.document.Document.Page.Block,google.cloud.documentai_v1.types.document.Document.Page.Symbol,],page_dimension:google.cloud.documentai_v1.types.document.Document.Page.Dimension,)->typing.Optional[str]

google.cloud.documentai_toolbox.wrappers.page._text_from_layout

_text_from_layout(layout:google.cloud.documentai_v1.types.document.Document.Page.Layout,text:str)->str

Returns the text from a single layout element.

See more: google.cloud.documentai_toolbox.wrappers.page._text_from_layout
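
In Document AI, a layout refers to its text through index ranges into the full document text. The sketch below shows that lookup on plain dictionaries; the helper text_from_segments and the sample text are hypothetical, and treating a missing start_index as 0 is an assumption of the example.

```python
from typing import Dict, List


def text_from_segments(text_segments: List[Dict[str, int]], text: str) -> str:
    """Concatenate the slices of text covered by each segment (illustrative only)."""
    return "".join(
        text[int(segment.get("start_index", 0)) : int(segment["end_index"])]
        for segment in text_segments
    )


document_text = "Invoice\nTotal: 42\n"
print(repr(text_from_segments([{"end_index": 7}], document_text)))  # 'Invoice'
print(repr(text_from_segments([{"start_index": 8, "end_index": 18}], document_text)))  # 'Total: 42\n'
```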

google.cloud.documentai_toolbox.wrappers.page._trim_text

_trim_text(text:str)->str

Remove extra space characters from text (blank, newline, tab, etc.).

See more: google.cloud.documentai_toolbox.wrappers.page._trim_text
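
One common way to remove extra whitespace is to collapse every run of blanks, newlines, and tabs into a single space. The sketch below takes that interpretation; the helper collapse_whitespace is hypothetical and the library's _trim_text may treat interior whitespace differently.

```python
def collapse_whitespace(text: str) -> str:
    """Strip leading/trailing whitespace and collapse interior runs to one space."""
    # str.split() with no arguments splits on any whitespace run.
    return " ".join(text.split())


print(repr(collapse_whitespace("  Total:\t42\n\n")))  # 'Total: 42'
```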

google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_json_response

convert_document_to_annotate_file_json_response()->str

Convert OCR data from Document.proto to JSON str of AnnotateFileResponse for Vision API.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_json_response

google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_response

convert_document_to_annotate_file_response()->(google.cloud.vision_v1.types.image_annotator.AnnotateFileResponse)

Convert OCR data from Document.proto to AnnotateFileResponse.proto for Vision API.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_response

google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery

entities_to_bigquery(dataset_name:str,table_name:str,project_id:typing.Optional[str]=None)->google.cloud.bigquery.job.load.LoadJob

Adds extracted entities to a BigQuery table.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery

google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict

entities_to_dict()->typing.Dict[str,typing.Union[str,typing.List[str]]]

Returns a dictionary of the entities in the document.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict

google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str

export_hocr_str(title:str)->str

Exports a string hOCR version of the Document.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str

google.cloud.documentai_toolbox.wrappers.document.Document.export_images

export_images(output_path:str,output_file_prefix:str,output_file_extension:str)->typing.List[str]

Exports images from Document.entities to files.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_images

google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery

form_fields_to_bigquery(dataset_name:str,table_name:str,project_id:typing.Optional[str]=None)->google.cloud.bigquery.job.load.LoadJob

Adds extracted form fields to a BigQuery table.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery

google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict

form_fields_to_dict()->typing.Dict[str,typing.Union[str,typing.List[str]]]

Returns a dictionary of the form fields in the document.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict

google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata

from_batch_process_metadata(metadata:google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata,)->typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]

Loads Documents from Cloud Storage, using the output from BatchProcessMetadata.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata

google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation

from_batch_process_operation(location:str,operation_name:str,timeout:typing.Optional[float]=None)->typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]

Loads Documents from Cloud Storage, using the operation name returned from batch_process_documents().

See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation

google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path

from_document_path(document_path:str,)->google.cloud.documentai_toolbox.wrappers.document.Document

google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document

from_documentai_document(documentai_document:google.cloud.documentai_v1.types.document.Document,)->google.cloud.documentai_toolbox.wrappers.document.Document

google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs

from_gcs(gcs_bucket_name:str,gcs_prefix:str,gcs_input_uri:typing.Optional[str]=None)->google.cloud.documentai_toolbox.wrappers.document.Document

Loads a Document from a Cloud Storage directory.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs

google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri

from_gcs_uri(gcs_uri:str,gcs_input_uri:typing.Optional[str]=None)->google.cloud.documentai_toolbox.wrappers.document.Document

Loads a Document from a Cloud Storage URI.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri

google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type

get_entity_by_type(target_type:str,)->typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]

google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name

get_form_field_by_name(target_field:str,)->typing.List[google.cloud.documentai_toolbox.wrappers.page.FormField]

Returns the list of FormFields named target_field.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name

google.cloud.documentai_toolbox.wrappers.document.Document.search_pages

search_pages(target_string:typing.Optional[str]=None,pattern:typing.Optional[str]=None)->typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]

Returns the list of Pages containing target_string or text matching pattern.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
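
The two search modes, a literal substring and a regular expression, can be sketched on plain page texts. The function find_pages below is a hypothetical stand-in that returns page indexes instead of Page objects, and requiring exactly one of the two arguments is an assumption of this sketch.

```python
import re
from typing import List, Optional


def find_pages(
    page_texts: List[str],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[int]:
    """Return indexes of pages containing target_string or matching pattern."""
    # Assumption of this sketch: exactly one search argument must be given.
    if (target_string is None) == (pattern is None):
        raise ValueError("Provide exactly one of target_string or pattern.")
    if target_string is not None:
        return [i for i, text in enumerate(page_texts) if target_string in text]
    regex = re.compile(pattern)
    return [i for i, text in enumerate(page_texts) if regex.search(text)]


# Made-up page texts.
page_texts = ["Invoice 1001\nTotal: 42", "Terms and conditions", "Invoice 1002\nTotal: 17"]
print(find_pages(page_texts, target_string="Invoice"))  # [0, 2]
print(find_pages(page_texts, pattern=r"conditions$"))  # [1]
```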

google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf

split_pdf(pdf_path:str,output_path:str)->typing.List[str]

Splits local PDF file into multiple PDF files based on output from a Splitter processor.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf

google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document

to_merged_documentai_document()->(google.cloud.documentai_v1.types.document.Document)

Exports a documentai.Document from the wrapped document with shards merged.

See more: google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document

google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image

crop_image(documentai_page:google.cloud.documentai_v1.types.document.Document.Page,)->typing.Optional[PIL.Image.Image]

Returns the image cropped from the page image for the detected entity.

See more: google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image

google.cloud.documentai_toolbox.wrappers.page.Page._get_elements

_get_elements(element_type:typing.Type,attribute_name:str)->typing.List

Helper method to create elements based on specified type.

See more: google.cloud.documentai_toolbox.wrappers.page.Page._get_elements

google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows

_extract_table_rows(table_rows:typing.Iterable[google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow],)->typing.List[typing.List[str]]

Returns a list of rows from table_rows.

See more: google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows

google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe

to_dataframe()->pandas.core.frame.DataFrame

Returns pd.DataFrame from documentai.table.

See more: google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe

google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element

_get_children_of_element(potential_children:typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement],)->typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]

Filters potential child elements to identify only those fully contained within this element.

See more: google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
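
The containment filter can be illustrated with character ranges: a candidate counts as a child only when its whole range lies inside the parent's range. The Span type and children_of helper are hypothetical, and using text index ranges as the containment test is an assumption of this sketch, not a statement about how the library compares elements.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A named element covering text[start:end] (hypothetical type for this example)."""

    name: str
    start: int
    end: int


def children_of(parent: Span, candidates: List[Span]) -> List[Span]:
    """Keep only the candidates whose range is fully inside the parent's range."""
    return [
        child
        for child in candidates
        if child is not parent and parent.start <= child.start and child.end <= parent.end
    ]


paragraph = Span("paragraph", 10, 50)
tokens = [Span("t1", 10, 15), Span("t2", 45, 55), Span("t3", 20, 50), Span("t4", 0, 9)]
print([token.name for token in children_of(paragraph, tokens)])  # ['t1', 't3']
```

Here t2 overlaps the paragraph only partially and t4 lies entirely before it, so both are excluded.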

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-10-30 UTC.