Parse and chunk documents

This page describes how to use Vertex AI Search to parse and chunk your documents.

You can configure parsing or chunking settings in order to:

Specify how Vertex AI Search parses content. You can specify how to parse unstructured content when you upload it to Vertex AI Search. Vertex AI Search provides a digital parser, an OCR parser for PDFs, and a layout parser. You can also bring your own parsed documents. The layout parser is recommended when you have rich content and structural elements like sections, paragraphs, tables, images, and lists to be extracted from documents for search and answer generation.
See Improve content detection with parsing.
Use Vertex AI Search for retrieval-augmented generation (RAG). Improve the output of LLMs with relevant data that you've uploaded to your Vertex AI Search app. To do this, you'll turn on document chunking, which indexes your data as chunks to improve relevance and decrease computational load for LLMs. You'll also turn on the layout parser, which detects document elements such as headings and lists, to improve how documents are chunked.
For information about chunking for RAG and how to return chunks in search requests, see Chunk documents for RAG.

Parse documents

You can control content parsing in the following ways:

Specify parser type. You can specify the type of parsing to apply depending on file type:
- Digital parser. The digital parser is on by default for all file types unless a different parser type is specified. The digital parser processes ingested documents if no other default parser is specified for the data store or if the specified parser doesn't support the file type of an ingested document.
- OCR parsing for PDFs. This parser can be used as a lower-cost alternative to the layout parser if you are uploading scanned PDFs or PDFs with text inside images. See the OCR parser for PDFs section of this document.
- Layout parser. If you plan to use Vertex AI Search for RAG, turn on the layout parser for HTML, PDF, DOCX, PPTX, and XLSX files. See Chunk documents for RAG for information about this parser and how to turn it on. The layout parser can perform optical character recognition on images and scanned documents.
Bring your own parsed document. (Preview with allowlist) If you've already parsed your unstructured documents, you can import that pre-parsed content into Vertex AI Search. See Bring your own parsed document.

Parser availability comparison

The following table lists the availability of each parser by document file type and shows which elements each parser can detect and parse.

File type	Digital parser	OCR parser for PDFs	Layout parser
HTML	Detects paragraph elements	Not applicable	Detects paragraph, table, image, list, title, and heading elements
PDF	Detects paragraph (digital text) elements	Detects paragraph elements	Detects paragraph, table, image, title, and heading elements
DOCX	Detects paragraph elements	Not applicable	Detects paragraph, table, image, list, title, heading elements
PPTX	Detects paragraph elements	Not applicable	Detects paragraph, table, image, list, title, heading elements
TXT	Detects paragraph elements	Not applicable	Not applicable
XLSX	Detects paragraph elements	Not applicable	Detects paragraph, table, title, heading elements
XLSM	Detects paragraph elements	Not applicable	Detects paragraph, table, title, heading elements

The maximum file size of an unstructured document that you can import is the same for all three parsers. See Prepare data for ingesting.

Digital parser

Note: The digital parser is offered without charge.

The digital parser extracts machine-readable text from documents. It detects text blocks, but not document elements such as tables, lists, and headings.

The digital parser is used as the default if you don't specify a different parser as the default during data store creation or if a specified parser doesn't support a file type that's being uploaded.

OCR parser for PDFs

Note: The OCR parser for PDFs is in General Availability. For information about pricing, see Document AI feature pricing.

For scanned PDFs and PDFs where the text is part of an image, such as scanned documents and screenshots that contain text, the OCR parser for PDFs can be a good choice. If you have PDFs that have both non-searchable text (such as scanned text or infographics) and machine-readable text, you can set the field useNativeText to true when specifying the OCR parser. In this case, machine-readable text is merged with OCR parsing outputs to improve text extraction quality.

If your PDFs have complex hierarchy or visual or table components, then the layout parser might give better results.

If you have searchable PDFs or other digital formats that are mostly composed of machine-readable text, you typically don't need to use the OCR parser for PDFs.

OCR processing features are available for custom search apps with unstructured data stores. Because the OCR parser only applies to PDF files, only PDF files that are ingested are processed by the OCR parser; other file types are processed by the digital parser.

The OCR parser can parse the first 500 pages of a PDF file. Pages beyond the 500-page limit aren't processed.

Layout parser

Note: The layout parser is in General Availability. For information about pricing, see Document AI feature pricing.

Layout parsing lets Vertex AI Search detect layouts for PDF, HTML, DOCX, PPTX, XLSX, and XLSM files. Vertex AI Search can then identify content elements like text blocks, tables, lists, and structural elements such as titles and headings and use them to define the organization and hierarchy of a document.

You can either turn on layout parsing for all file types or specify which file types to turn it on for. The layout parser detects content elements like paragraphs, tables, lists, and structural elements like titles, headings, headers, and footnotes.

If you have complex non-searchable PDFs, such as scanned PDFs with complicated hierarchy or tables, or PDFs with text inside images, such as infographics, Google recommends the layout parser instead of the OCR parser.

The layout parser is available only when using document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents up into chunks at ingestion time and can return documents as chunks. Detecting document layout enables content-aware chunking and enhances search and answer generation related to document elements. For more information about chunking documents for RAG, see Chunk documents for RAG.

You can select one or more of the following layout parser add-ons when you create your data store:

Image annotation (Public Preview)

If image annotation is enabled, when an image is detected in a source document, a description (annotation) of the image and the image itself are assigned to a chunk. The annotation determines if the chunk should be returned in a search result. If an answer is generated, the annotation can be a source for the answer.

The layout parser can detect the following image types: BMP, GIF, JPEG, PNG, and TIFF.

To enable image annotation in layout parsing, do the following when you create the data store:

Select Document processing options > Layout parser settings > Enable image annotation.

Table annotation (Public Preview)

If table annotation is enabled, when a table is detected in a source document, a description (annotation) of the table and the table itself are assigned to a chunk. The annotation determines if the chunk should be returned in a search result. If an answer is generated, the annotation can be a source for the answer.

To enable table annotation in layout parsing, do the following when you create the data store:

Select Document processing options > Layout parser settings > Enable table annotation.

Gemini layout parsing (Public Preview)

If Gemini layout parsing is enabled, Gemini is used to provide layout analysis and content extraction on PDF files. This feature provides high-quality table recognition, improved reading order, and more accurate text recognition. It is available for data stores that have unstructured documents. You can use the Gemini layout parsing add-on along with the table annotation add-on.

To enable Gemini layout parsing, do the following when you create the data store:

Select Document processing options > Layout parser settings > Enable Gemini enhancement.

Exclude HTML content

When using the layout parser for HTML documents, you can exclude specific parts of the HTML content from being processed. To improve data quality for search applications and RAG applications, you can exclude boilerplate or sections such as navigation menus, headers, footers, or sidebars.

The layoutParsingConfig provides the following fields for this purpose:

excludeHtmlElements: List of HTML tags to be excluded. Content within these tags is excluded.
excludeHtmlClasses: List of HTML class attributes to be excluded. HTML elements containing these class attributes, along with their content, are excluded.
excludeHtmlIds: List of HTML element ID attributes to be excluded. HTML elements with these ID attributes, along with their content, are excluded.
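
To get a rough local preview of which text the three exclusion fields would drop from a page, the following Python sketch walks an HTML string with the standard library's `html.parser` and keeps only text outside excluded elements. It is an approximation for experimenting with exclusion lists, not the layout parser's actual implementation: the class name `ExclusionPreview` and the helper `preview_kept_text` are hypothetical, the matching rules (exact tag name, any matching class token, exact ID) are assumptions, and it assumes well-formed HTML with matching closing tags.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExclusionPreview(HTMLParser):
    """Collects text that is NOT inside an excluded element (local approximation)."""

    def __init__(self, exclude_elements=(), exclude_classes=(), exclude_ids=()):
        super().__init__()
        self.exclude_elements = set(exclude_elements)
        self.exclude_classes = set(exclude_classes)
        self.exclude_ids = set(exclude_ids)
        self.skip_depth = 0  # greater than 0 while inside an excluded subtree
        self.kept = []

    def _is_excluded(self, tag, attrs):
        attr_map = dict(attrs)
        classes = set((attr_map.get("class") or "").split())
        return (tag in self.exclude_elements
                or bool(classes & self.exclude_classes)
                or attr_map.get("id") in self.exclude_ids)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded subtree, every nested element deepens it.
        if self.skip_depth or self._is_excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def preview_kept_text(html, **exclusions):
    """Return the list of text fragments that survive the exclusions."""
    parser = ExclusionPreview(**exclusions)
    parser.feed(html)
    parser.close()
    return parser.kept
```

For example, calling `preview_kept_text(page, exclude_elements=["nav"], exclude_ids=["cookie-banner"])` on a saved page gives a quick sense of whether the chosen tags and IDs remove the boilerplate you expect before you create the data store.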

Specify a default parser

By including the documentProcessingConfig object when you create a data store, you can specify a default parser for that data store. If you don't include documentProcessingConfig.defaultParsingConfig, the digital parser is used. The digital parser is also used if the specified parser is not available for a file type.

REST

To specify a default parser:

When creating a search data store using the API, include documentProcessingConfig.defaultParsingConfig in the data store creation request. You can specify the OCR parser, the layout parser, or the digital parser:
- To specify the OCR parser:
```
"documentProcessingConfig":{"defaultParsingConfig":{"ocrParsingConfig":{"useNativeText":"NATIVE_TEXT_BOOLEAN"}}}
```
  - NATIVE_TEXT_BOOLEAN is optional. Set it only if you're ingesting PDFs. If set to true, this turns on machine-readable text processing for the OCR parser. The default is false.
- To specify the layout parser:
```
"documentProcessingConfig":{"defaultParsingConfig":{"layoutParsingConfig":{}}}
```
- To specify the digital parser:
  Note: Specifying the digital parser as defaultParsingConfig is typically not necessary. When no other parser is explicitly specified, the digital parser is used by default.
```
"documentProcessingConfig":{"defaultParsingConfig":{"digitalParsingConfig":{}}}
```
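
If you assemble the request body in code rather than by hand, the three fragments above can be generated from one helper. The following Python sketch is illustrative only: the function name `default_parsing_config` and the short labels `"ocr"`, `"layout"`, and `"digital"` are hypothetical, and it emits `useNativeText` as a JSON boolean, which is an assumption (the fragments on this page show the value in quotes).

```python
import json


def default_parsing_config(parser, use_native_text=None):
    """Return the documentProcessingConfig fragment for a default parser.

    parser: "ocr", "layout", or "digital" (labels chosen for this sketch).
    use_native_text: optional; only meaningful for the OCR parser.
    """
    keys = {
        "ocr": "ocrParsingConfig",
        "layout": "layoutParsingConfig",
        "digital": "digitalParsingConfig",
    }
    if parser not in keys:
        raise ValueError(f"unknown parser: {parser!r}")
    if use_native_text is not None and parser != "ocr":
        raise ValueError("useNativeText applies only to the OCR parser")
    settings = {}
    if use_native_text is not None:
        settings["useNativeText"] = use_native_text
    return {"documentProcessingConfig": {"defaultParsingConfig": {keys[parser]: settings}}}


# Print the OCR variant with native-text merging turned on.
print(json.dumps(default_parsing_config("ocr", use_native_text=True), indent=2))
```

The returned dictionary can be merged into the rest of the data store creation body (display name, solution types, and so on) before it is serialized.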

Console

When creating a search data store through the console, you can specify the default parser.

Example

The following example specifies the OCR parser as the default parser during data store creation.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "ocrParsingConfig": {
        "useNativeText": "false"
      }
    }
  }
}'

Specify parser overrides for file types

You can specify that a particular file type (PDF, HTML, DOCX, PPTX, XLSX, or XLSM) should be parsed by a different parser than the default parser. To do so, include the documentProcessingConfig field in your data store creation request and specify the override parser. If you don't specify a default parser, then the digital parser is the default.

REST

To specify a file-type-specific parser override:

When creating a search data store using the API, include documentProcessingConfig.parsingConfigOverrides in the data store creation request.
You can specify a parser for a given file type:
```
"documentProcessingConfig":{"parsingConfigOverrides":{"FILE_TYPE":{PARSING_CONFIG}}}
```
Replace the following:
- FILE_TYPE: Accepted values are pdf, html, docx, pptx, xlsm, and xlsx.
- PARSING_CONFIG: Specify the configuration for the parser that you want to apply to the file type. You can specify the OCR parser, the layout parser, or the digital parser:
  - To specify the OCR parser:
```
"ocrParsingConfig":{"useNativeText":"NATIVE_TEXT_BOOLEAN"}
```
    - NATIVE_TEXT_BOOLEAN: Optional. Set only if you're ingesting PDFs. If set to true, this turns on machine-readable text processing for the OCR parser. The default is false.
  - To specify the layout parser:
```
"layoutParsingConfig":{}
```
  - To specify the digital parser:
```
"digitalParsingConfig":{}
```

Console

When creating a search data store through the console, you can specify parser overrides for specific file types.

Example

The following example specifies during data store creation that PDF files should be processed by the OCR parser and that HTML files should be processed by the layout parser. In this case, any files other than PDF and HTML files would be processed by the digital parser.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "parsingConfigOverrides": {
      "pdf": {
        "ocrParsingConfig": {
          "useNativeText": "false"
        }
      },
      "html": {
        "layoutParsingConfig": {}
      }
    }
  }
}'
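
The selection rule behind this example (file-type override first, then the data store default, then the digital parser, with a fallback to the digital parser when the chosen parser doesn't support the file type) can be sketched in a few lines of Python. This is a reading aid based on the availability table on this page, not service code: the function name `effective_parser` and the labels `"ocr"`, `"layout"`, and `"digital"` are hypothetical, and whether the API accepts or rejects an override for an unsupported file type is not stated here, so the sketch simply falls back.

```python
# File types each parser supports, transcribed from the availability table on this page.
SUPPORTED = {
    "digital": {"html", "pdf", "docx", "pptx", "txt", "xlsx", "xlsm"},
    "ocr": {"pdf"},
    "layout": {"html", "pdf", "docx", "pptx", "xlsx", "xlsm"},
}


def effective_parser(file_type, default=None, overrides=None):
    """Pick the parser label for a file type: override, then default, then digital."""
    file_type = file_type.lower()
    chosen = (overrides or {}).get(file_type) or default or "digital"
    # Fall back to the digital parser when the chosen parser can't handle the type.
    if file_type not in SUPPORTED[chosen]:
        return "digital"
    return chosen


# Mirror the example: PDF goes to OCR, HTML to layout, everything else to digital.
overrides = {"pdf": "ocr", "html": "layout"}
for kind in ("pdf", "html", "docx"):
    print(kind, "->", effective_parser(kind, overrides=overrides))
```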

Edit document parsing for existing data stores

If you already have a data store, you can change the default parser and add file format exceptions. However, the updated parser settings only apply to new documents imported to the data store. Documents already in the data store are not re-parsed with the new settings.

To change document parsing settings for a data store, do the following:

In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Processing config tab, edit the Document parsing settings.
The Document chunking settings can't be changed. If the data store doesn't have document chunking enabled, then you can't choose the layout parser.
Click Submit.

Configure layout parser to exclude HTML content

You can configure the layout parser to exclude HTML content by specifying excludeHtmlElements, excludeHtmlClasses, or excludeHtmlIds in documentProcessingConfig.defaultParsingConfig.layoutParsingConfig.

REST

To exclude certain HTML content from being processed by the layout parser, follow these steps:

When creating a search data store using the API, include documentProcessingConfig.defaultParsingConfig.layoutParsingConfig in the data store creation request.

To exclude specific HTML tag types, use:

"documentProcessingConfig":{"defaultParsingConfig":{"layoutParsingConfig":{"excludeHtmlElements":["HTML_TAG_1","HTML_TAG_2","HTML_TAG_N"]}}}

Replace the HTML_TAG variables with tag names, for example, nav and footer.

To exclude specific HTML element class attributes, use:

"documentProcessingConfig":{"defaultParsingConfig":{"layoutParsingConfig":{"excludeHtmlClasses":["HTML_CLASS_1","HTML_CLASS_2","HTML_CLASS_N"]}}}

Replace the HTML_CLASS variables with class attributes, for example, overlay and screenreader.

To exclude specific HTML element ID attributes, use:

"documentProcessingConfig":{"defaultParsingConfig":{"layoutParsingConfig":{"excludeHtmlIds":["HTML_ID_1","HTML_ID_2","HTML_ID_N"]}}}

Replace the HTML_ID variables with ID attributes, for example, cookie-banner.
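
The three fragments above can be combined into a single layoutParsingConfig. As a hedged convenience, the following Python sketch builds that combined fragment and leaves out any list you don't supply; the function name `layout_exclusion_config` and its parameter names are hypothetical, while the JSON field names are the ones shown above.

```python
import json


def layout_exclusion_config(elements=(), classes=(), ids=()):
    """Build a layoutParsingConfig fragment, omitting any empty exclusion list."""
    layout = {}
    if elements:
        layout["excludeHtmlElements"] = list(elements)
    if classes:
        layout["excludeHtmlClasses"] = list(classes)
    if ids:
        layout["excludeHtmlIds"] = list(ids)
    return {"documentProcessingConfig": {"defaultParsingConfig": {"layoutParsingConfig": layout}}}


# Exclude navigation and footer tags plus one banner ID.
print(json.dumps(layout_exclusion_config(["nav", "footer"], ids=["cookie-banner"]), indent=2))
```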

Example

This example specifies that when HTML files are processed by the layout parser, the following are skipped by the parser:

HTML element tags header, footer, nav, and aside
HTML element class attributes of type overlays and screenreader
Any elements with the ID attribute cookie-banner

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123&createAdvancedSiteSearch=true" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "contentConfig": "PUBLIC_WEBSITE",
  "documentProcessingConfig": {
    "chunkingConfig": {
      "layoutBasedChunkingConfig": {}
    },
    "defaultParsingConfig": {
      "layoutParsingConfig": {
        "excludeHtmlElements": ["header", "footer", "nav", "aside"],
        "excludeHtmlClasses": ["overlays", "screenreader"],
        "excludeHtmlIds": ["cookie-banner"]
      }
    }
  }
}'

Get parsed documents in JSON

You can get a parsed document in JSON format by calling the getProcessedDocument method and specifying PARSED_DOCUMENT as the processed document type. Getting parsed documents in JSON can be helpful if you need to upload the parsed document elsewhere or if you decide to re-import parsed documents to Vertex AI Search using the bring your own parsed document feature.

REST

To get parsed documents in JSON, follow this step:

Call the getProcessedDocument method:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=PARSED_DOCUMENT"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get.
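
When you script this call, filling the three placeholders by hand is error-prone. The following Python sketch only builds the request URL from the pattern in the curl command above; it does not send the request or handle authentication. The function name `processed_document_url` is hypothetical, and URL-escaping each ID with `urllib.parse.quote` is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote, urlencode

BASE = "https://discoveryengine.googleapis.com/v1alpha"


def processed_document_url(project_id, data_store_id, document_id,
                           processed_type="PARSED_DOCUMENT"):
    """Build the getProcessedDocument URL used in the curl command above."""
    path = (
        f"projects/{quote(project_id, safe='')}/locations/global"
        f"/collections/default_collection/dataStores/{quote(data_store_id, safe='')}"
        f"/branches/0/documents/{quote(document_id, safe='')}:getProcessedDocument"
    )
    return f"{BASE}/{path}?{urlencode({'processed_document_type': processed_type})}"


# Hypothetical IDs, for illustration only.
print(processed_document_url("exampleproject", "datastore123", "doc-001"))
```

Passing `processed_type="CHUNKED_DOCUMENT"` produces the URL form that this page uses for getting a document's chunks in JSON.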

Bring your own parsed document

Note: This feature is in Preview with allowlist.

You can import pre-parsed, unstructured documents into Vertex AI Search data stores. For example, instead of importing a raw PDF document, you can parse the PDF yourself and import the parsing result instead. This lets you import your documents in a structured way, ensuring that search and answer generation have information about the document's layout and elements.

A parsed, unstructured document is represented by JSON that describes the unstructured document using a sequence of text, table, and list blocks. You import JSON files with your parsed unstructured document data in the same way that you import other types of unstructured documents, such as PDFs. When this feature is turned on, whenever a JSON file is uploaded and identified by either an application/json MIME type or a .JSON extension, it is treated as a parsed document.

To turn on this feature and for information about how to use it, contact your Google account team.

Chunk documents for RAG

By default, Vertex AI Search is optimized for document retrieval, where your search app returns a document such as a PDF or web page with each search result.

Document chunking features are available for custom search apps with unstructured data stores.

Vertex AI Search can instead be optimized for RAG, where your search app is primarily used to augment LLM output with your custom data. When document chunking is turned on, Vertex AI Search breaks up your documents into chunks. In search results, your search app can return relevant chunks of data instead of full documents. Using chunked data for RAG increases relevance for LLM answers and reduces computational load for LLMs.

To use Vertex AI Search for RAG:

Turn on document chunking when you create your data store.
Alternatively, upload your own chunks (Preview with allowlist) if you've already chunked your own documents.
Retrieve and view chunks in the following ways:
Return chunks in search requests.

Limitations

The following limitations apply to chunking:

Document chunking can't be turned on or off after data store creation.
You can make search requests for documents instead of chunks from a data store with document chunking turned on. However, data stores with document chunking turned on aren't optimized for returning documents. Documents are returned by aggregating chunks into documents.
When document chunking is turned on, search summaries and search with follow-ups are supported in Public Preview but not in GA.

Document chunking options

This section describes the options that you specify in order to turn on document chunking.

During data store creation, turn on the following options so that Vertex AI Search can index your documents as chunks.

Layout-aware document chunking. To turn this option on, include the documentProcessingConfig field in your data store creation request and specify ChunkingConfig.LayoutBasedChunkingConfig.
When layout-aware document chunking is turned on, Vertex AI Search detects a document's layout and takes it into account during chunking. This improves semantic coherence and reduces noise in the content when it's used for retrieval and LLM generation. All text in a chunk comes from the same layout entity, such as headings, subheadings, and lists.
Layout parsing. To turn this option on, specify ParsingConfig.LayoutParsingConfig during data store creation.
The layout parser detects layouts for files such as PDF, HTML, and DOCX files. It identifies elements like text blocks, tables, lists, titles, and headings, and uses them to define the organization and hierarchy of a document.
For more about layout parsing, see Layout parsing.

Turn on document chunking

You can turn on document chunking by including the documentProcessingConfig object in your data store creation request and turning on layout-aware document chunking and layout parsing.

REST

To turn on document chunking:

When creating a search data store using the API, include the documentProcessingConfig.chunkingConfig object in the data store creation request.
```
"documentProcessingConfig":{"chunkingConfig":{"layoutBasedChunkingConfig":{"chunkSize":CHUNK_SIZE_LIMIT,"includeAncestorHeadings":HEADINGS_BOOLEAN}},"defaultParsingConfig":{"layoutParsingConfig":{}}}
```
Replace the following:
- CHUNK_SIZE_LIMIT: Optional. The token size limit for each chunk. The default value is 500. Supported values are 100-500 (inclusive).
- HEADINGS_BOOLEAN: Optional. Determines whether headings are included in each chunk. The default value is false. Appending title and headings at all levels to chunks from the middle of the document can help prevent context loss in chunk retrieval and ranking.
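
Because chunking can't be changed after the data store is created, it can be worth validating these two settings before sending the request. The following Python sketch builds the fragment shown above and rejects a chunk size outside the documented 100-500 range; the function name `chunking_config` and its parameter names are hypothetical, and the client-side check is a convenience, not something the API requires.

```python
import json


def chunking_config(chunk_size=500, include_ancestor_headings=False):
    """Build the chunking fragment, enforcing the documented 100-500 token range."""
    if not isinstance(chunk_size, int) or not 100 <= chunk_size <= 500:
        raise ValueError("chunkSize must be an integer from 100 to 500")
    return {
        "documentProcessingConfig": {
            "chunkingConfig": {
                "layoutBasedChunkingConfig": {
                    "chunkSize": chunk_size,
                    "includeAncestorHeadings": include_ancestor_headings,
                }
            },
            # Layout parsing is turned on together with layout-aware chunking.
            "defaultParsingConfig": {"layoutParsingConfig": {}},
        }
    }


print(json.dumps(chunking_config(chunk_size=300, include_ancestor_headings=True), indent=2))
```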

Console

When creating a search data store through the console, you can turn on document chunking.

Bring your own chunks (Preview with allowlist)

Note: This feature is in Preview with allowlist.

If you've already chunked your own documents, you can upload those to Vertex AI Search instead of turning on document chunking options.

Bringing your own chunks is a Preview with allowlist feature. To use this feature, contact your Google account team.

List a document's chunks

To list all chunks for a specific document, call the Chunks.list method.

REST

To list chunks for a document, follow this step:

Call the Chunks.list method:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to list chunks from.

Get chunks in JSON from a processed document

You can get all the chunks from a specific document in JSON format by calling the getProcessedDocument method. Getting chunks in JSON can be helpful if you need to upload chunks elsewhere or if you decide to re-import chunks to Vertex AI Search using the bring your own chunks feature.

REST

To get JSON chunks for a document, follow this step:

Call the getProcessedDocument method:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=CHUNKED_DOCUMENT"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get chunks from.

Get specific chunks

To get a specific chunk, call the Chunks.get method.

REST

To get a specific chunk, follow this step:

Call the Chunks.get method:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks/CHUNK_ID"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document that the chunk is from.
CHUNK_ID: The ID of the chunk to return.
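
The chunk resource names that appear in responses on this page follow the same path as the URL above (`projects/.../documents/DOCUMENT_ID/chunks/CHUNK_ID`). If you already have such a name, a small parser can recover the IDs that Chunks.get needs. The following Python sketch assumes that name layout; the function name `parse_chunk_name` and the result keys are hypothetical.

```python
import re

# Layout assumed from the chunk names and URLs shown on this page.
CHUNK_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/collections/(?P<collection>[^/]+)/dataStores/(?P<data_store>[^/]+)"
    r"/branches/(?P<branch>[^/]+)/documents/(?P<document>[^/]+)"
    r"/chunks/(?P<chunk>[^/]+)$"
)


def parse_chunk_name(name):
    """Split a full chunk resource name into its component IDs."""
    match = CHUNK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a chunk resource name: {name!r}")
    return match.groupdict()


# Hypothetical IDs, for illustration only.
parts = parse_chunk_name(
    "projects/exampleproject/locations/global/collections/default_collection"
    "/dataStores/datastore123/branches/0/documents/doc-001/chunks/c17"
)
print(parts["document"], parts["chunk"])
```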

Return chunks in search requests

After you've confirmed that your data has been chunked correctly, your Vertex AI Search app can return chunked data in its search results.

The response returns a chunk that is relevant to the search query. In addition, you can choose to return adjacent chunks that appear before and after the relevant chunk in the source document. Adjacent chunks can add context and accuracy.
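
When the chunks are passed to an LLM, the relevant chunk and its adjacent chunks are usually joined into one context string. The following Python sketch does that for a single search result. The field names (`chunk`, `content`, `chunkMetadata`, `previousChunks`, `nextChunks`) are taken from the sample response later on this page; the function name `chunk_with_context` is hypothetical, and the sketch assumes each adjacent list is already in document order, which the sample (one chunk per list) doesn't confirm.

```python
def chunk_with_context(result):
    """Join previous chunks, the relevant chunk, and next chunks into one string."""
    chunk = result["chunk"]
    meta = chunk.get("chunkMetadata", {})
    parts = (
        [c["content"] for c in meta.get("previousChunks", [])]
        + [chunk["content"]]
        + [c["content"] for c in meta.get("nextChunks", [])]
    )
    # Strip the leading and trailing newlines that chunk content often carries.
    return "\n".join(part.strip() for part in parts)


# Minimal hand-written result with the same shape as the sample response.
result = {
    "chunk": {
        "id": "c17",
        "content": "\n# Relevant section\nBody text. ",
        "chunkMetadata": {
            "previousChunks": [{"id": "c16", "content": "Earlier text. "}],
            "nextChunks": [{"id": "c18", "content": "Later text. "}],
        },
    }
}
print(chunk_with_context(result))
```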

REST

To get chunked data:

When making a search request, specify ContentSearchSpec.SearchResultMode as chunks.
```
"contentSearchSpec": {
  "searchResultMode": "RESULT_MODE",
  "chunkSpec": {
    "numPreviousChunks": NUMBER_OF_PREVIOUS_CHUNKS,
    "numNextChunks": NUMBER_OF_NEXT_CHUNKS
  }
}
```
- RESULT_MODE: Determines whether search results are returned as full documents or in chunks. To get chunks, the data store must have document chunking turned on. Accepted values are documents and chunks. If document chunking is turned on for your data store, the default value is chunks.
- NUMBER_OF_PREVIOUS_CHUNKS: The number of chunks to return that immediately precede the relevant chunk. The maximum allowed value is 5.
- NUMBER_OF_NEXT_CHUNKS: The number of chunks to return that immediately follow the relevant chunk. The maximum allowed value is 5.
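
A request body with these fields can be assembled and checked in code before it is sent. The following Python sketch builds a chunk-mode search body and enforces the documented maximum of 5 adjacent chunks in each direction; the function name `chunk_search_body` and its parameter names are hypothetical, and the default `page_size` of 10 is an arbitrary choice for the sketch.

```python
import json


def chunk_search_body(query, page_size=10, previous_chunks=0, next_chunks=0):
    """Build a search request body that returns chunks plus adjacent chunks."""
    for label, value in (("numPreviousChunks", previous_chunks),
                         ("numNextChunks", next_chunks)):
        if not 0 <= value <= 5:
            raise ValueError(f"{label} must be between 0 and 5")
    return {
        "query": query,
        "pageSize": page_size,
        "contentSearchSpec": {
            "searchResultMode": "CHUNKS",
            "chunkSpec": {
                "numPreviousChunks": previous_chunks,
                "numNextChunks": next_chunks,
            },
        },
    }


print(json.dumps(chunk_search_body("animal", page_size=1,
                                   previous_chunks=1, next_chunks=1), indent=2))
```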

Example

The following example of a search query request sets SearchResultMode to chunks, requests one previous chunk and one next chunk, and limits the number of results to a single relevant chunk using pageSize.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores/datastore123/servingConfigs/default_search:search" \
-d '{
  "query": "animal",
  "pageSize": 1,
  "contentSearchSpec": {
    "searchResultMode": "CHUNKS",
    "chunkSpec": {
      "numPreviousChunks": 1,
      "numNextChunks": 1
    }
  }
}'

The following example shows the response that is returned for the example query. The response contains the relevant chunks, the previous and next chunks, the original document's metadata, and the span of document pages that each chunk was derived from.

Response

{"results":[{"chunk":{"name":"projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17","id":"c17","content":"\n# ESS10: Stakeholder Engagement and Information Disclosure\nReaders should also refer to ESS10 and its guidance notes, plus the template available for a stakeholder engagement plan. More detail on stakeholder engagement in projects with risks related to animal health is contained in section 4 below. The type of stakeholders (men and women) that can be engaged by the Borrower as part of the project's environmental and social assessment and project design and implementation are diverse and vary based on the type of intervention. The stakeholders can include: Pastoralists, farmers, herders, women's groups, women farmers, community members, fishermen, youths, etc. Cooperatives members, farmer groups, women's livestock associations, water user associations, community councils, slaughterhouse workers, traders, etc. Veterinarians, para-veterinary professionals, animal health workers, community animal health workers, faculties and students in veterinary colleges, etc. 8 \n# 4. Good Practice in Animal Health Risk Assessment and Management\n\n# Approach\nRisk assessment provides the transparent, adequate and objective evaluation needed by interested parties to make decisions on health-related risks associated with project activities involving live animals. As the ESF requires, it is conducted throughout the project cycle, to provide or indicate likelihood and impact of a given hazard, identify factors that shape the risk, and find proportionate and appropriate management options. 
The level of risk may be reduced by mitigation measures, such as infrastructure (e.g., diagnostic laboratories, border control posts, quarantine stations), codes of practice (e.g., good animal husbandry practices, on-farm biosecurity, quarantine, vaccination), policies and regulations (e.g., rules for importing live animals, ban on growth hormones and promotors, feed standards, distance required between farms, vaccination), institutional capacity (e.g., veterinary services, surveillance and monitoring), changes in individual behavior (e.g., hygiene, hand washing, care for animals). Annex 2 provides examples of mitigation practices. This list is not an exhaustive one but a compendium of most practiced interventions and activities. The cited measures should take into account social, economic, as well as cultural, gender and occupational aspects, and other factors that may affect the acceptability of mitigation practices by project beneficiaries and other stakeholders. Risk assessment is reviewed and updated through the project cycle (for example to take into account increased trade and travel connectivity between rural and urban settings and how this may affect risks of disease occurrence and/or outbreak). Projects monitor changes in risks (likelihood and impact) by using data, triggers or indicators. 
","documentMetadata":{"uri":"gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf","title":"AnimalHealthGoodPracticeNote"},"pageSpan":{"pageStart":14,"pageEnd":15},"chunkMetadata":{"previousChunks":[{"name":"projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c16","id":"c16","content":"\n# ESS6: Biodiversity Conservation and Sustainable Management of Living Natural Resources\nThe risks associated with livestock interventions under ESS6 include animal welfare (in relation to housing, transport, and slaughter); diffusion of pathogens from domestic animals to wildlife, with risks for endemic species and biodiversity (e.g., sheep and goat plague in Mongolia affecting the saiga, an endemic species of wild antelope); the introduction of new breeds with potential risk of introducing exotic or new diseases; and the release of new species that are not endemic with competitive advantage, potentially putting endemic species at risk of extinction. Animal welfare relates to how an animal is coping with the conditions in which it lives. An animal is in a good state of welfare if it is healthy, comfortable, well nourished, safe, able to express innate behavior, 7 Good Practice Note - Animal Health and related risks and is not suffering from unpleasant states such as pain, fear or distress. Good animal welfare requires appropriate animal care, disease prevention and veterinary treatment; appropriate shelter, management and nutrition; humane handling, slaughter or culling. The OIE provides standards for animal welfare on farms, during transport and at the time of slaughter, for their welfare and for purposes of disease control, in its Terrestrial and Aquatic Codes. 
The 2014 IFC Good Practice Note: Improving Animal Welfare in Livestock Operations is another example of practical guidance provided to development practitioners for implementation in investments and operations. Pastoralists rely heavily on livestock as a source of food, income and social status. Emergency projects to restock the herds of pastoralists affected by drought, disease or other natural disaster should pay particular attention to animal welfare (in terms of transport, access to water, feed, and animal health) to avoid potential disease transmission and ensure humane treatment of animals. Restocking also entails assessing the assets of pastoralists and their ability to maintain livestock in good conditions (access to pasture and water, social relationship, technical knowledge, etc.). Pastoralist communities also need to be engaged by the project to determine the type of animals and breed and the minimum herd size to be considered for restocking. \n# Box 5. Safeguarding the welfare of animals and related risks in project activities\nIn Haiti, the RESEPAG project (Relaunching Agriculture: Strengthening Agriculture Public Services) financed housing for goats and provided technical recommendations for improving their welfare, which is critical to avoid the respiratory infections, including pneumonia, that are serious diseases for goats. To prevent these diseases, requires optimal sanitation and air quality in herd housing. This involves ensuring that buildings have adequate ventilation and dust levels are reduced to minimize the opportunity for infection. Good nutrition, water and minerals are also needed to support the goats immune function. The project paid particular attention to: (i) housing design to ensure good ventilation; (ii) locating housing close to water sources and away from human habitation and noisy areas; (iii) providing mineral blocks for micronutrients; (iv) ensuring availability of drinking water and clean food troughs. 
","documentMetadata":{"uri":"gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf","title":"AnimalHealthGoodPracticeNote"},"pageSpan":{"pageStart":13,"pageEnd":14}}],"nextChunks":[{"name":"projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c18","id":"c18","content":"\n# Scoping of risks\nEarly scoping of risks related to animal health informs decisions to initiate more comprehensive risk assessment according to the type of livestock interventions and activities. It can be based on the following considerations: • • • • Type of livestock interventions supported by the project (such as expansion of feed resources, improvement of animal genetics, construction/upgrading and management of post-farm-gate facilities, etc. See also Annex 2); Geographic scope and scale of the livestock interventions; Human and animal populations that are likely to be affected (farmers, women, children, domestic animals, wildlife, etc.); and Changes in the project or project context (such as emerging disease outbreak, extreme weather or climatic conditions) that would require a re-assessment of risk levels, mitigation measures and their likely effect on risk reduction. Scenario planning can also help to identify project-specific vulnerabilities, country-wide or locally, and help shape pragmatic analyses that address single or multiple hazards. In this process, some populations may be identified as having disproportionate exposure or vulnerability to certain risks because of occupation, gender, age, cultural or religious affiliation, socio-economic or health status. For example, women and children may be the main caretakers of livestock in the case of 9 Good Practice Note - Animal Health and related risks household farming, which puts them into close contact with animals and animal products. 
In farms and slaughterhouses, workers and veterinarians are particularly exposed, as they may be in direct contact with sick animals (see Box 2 for an illustration). Fragility, conflict, and violence (FCV) can exacerbate risk, in terms of likelihood and impact. Migrants new to a geographic area may be immunologically naïve to endemic zoonotic diseases or they may inadvertently introduce exotic diseases; and refugees or internally displaced populations may have high population density with limited infrastructure, leaving them vulnerable to disease exposure. Factors such as lack of access to sanitation, hygiene, housing, and health and veterinary services may also affect disease prevalence, contributing to perpetuation of poverty in some populations. Risk assessment should identify populations at risk and prioritize vulnerable populations and circumstances where risks may be increased. It should be noted that activities that seem minor can still have major consequences. See Box 6 for an example illustrating how such small interventions in a project may have large-scale consequences. It highlights the need for risk assessment, even for simple livestock interventions and activities, and how this can help during the project cycle (from concept to implementation). ","documentMetadata":{"uri":"gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf","title":"AnimalHealthGoodPracticeNote"},"pageSpan":{"pageStart":15,"pageEnd":16}}]}}}],"totalSize":61,"attributionToken":"jwHwjgoMCICPjbAGEISp2J0BEiQ2NjAzMmZhYS0wMDAwLTJjYzEtYWQxYS1hYzNlYjE0Mzc2MTQiB0dFTkVSSUMqUMLwnhXb7Ygtq8SKLa3Eii3d7Ygtj_enIqOAlyLm7Ygtt7eMLduPmiKN96cijr6dFcXL8xfdj5oi9-yILdSynRWCspoi-eyILYCymiLk7Ygt","nextPageToken":"ANxYzNzQTMiV2MjFWLhFDZh1SMjNmMtADMwATL5EmZyMDM2YDJaMQv3yagQYAsciPgIwgExEgC","guidedSearchResult":{},"summary":{}}
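A response like the one above can be post-processed on the client to rebuild a wider context window around each matched chunk. The following is a minimal Python sketch, not an official client: it assumes the response has already been decoded into a dictionary, that each entry sits under `results[].chunk` (those two keys are not visible in the excerpt above and are an assumption here), and that `previousChunks` and `nextChunks` are already in reading order. The helper names `stitch_chunk` and `stitched_results` are hypothetical.

```python
from typing import Any


def stitch_chunk(chunk: dict[str, Any]) -> str:
    """Join previous chunks, the matched chunk, and next chunks into one text."""
    meta = chunk.get("chunkMetadata", {})
    # Assumption: the adjacent-chunk arrays are already in reading order.
    ordered = [
        *meta.get("previousChunks", []),
        chunk,
        *meta.get("nextChunks", []),
    ]
    texts = (c.get("content", "").strip() for c in ordered)
    # Skip chunks whose content is missing or only whitespace.
    return "\n\n".join(t for t in texts if t)


def stitched_results(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one {"id", "text"} entry per matched chunk in the response."""
    stitched = []
    for result in response.get("results", []):
        chunk = result.get("chunk")  # Assumed key; not shown in the excerpt.
        if chunk is None:
            continue
        stitched.append({"id": chunk.get("id"), "text": stitch_chunk(chunk)})
    return stitched
```

Passing the stitched text to an LLM, rather than the matched chunk alone, keeps headings and neighboring paragraphs together. The trade-off is a larger prompt per result.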

What's next

Create a search data store

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
