Use the LLM parser

The VPC-SC security controls and CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't supported.

This page explains how to use the Vertex AI RAG Engine LLM parser.

Introduction

Vertex AI RAG Engine uses LLMs for document parsing. LLMs can process documents effectively in the following ways:

  • Understand and interpret semantic content across various formats.
  • Retrieve relevant document chunks.
  • Extract meaningful information from documents.
  • Identify relevant sections in documents.
  • Accurately summarize complex documents.
  • Understand and interact with visuals.
  • Extract data from charts and diagrams.
  • Describe images.
  • Understand relationships between charts and text.
  • Provide more contextually rich and accurate responses.

These capabilities of Vertex AI RAG Engine significantly improve the quality of generated responses.

Supported models

The following models support the Vertex AI RAG Engine LLM parser:

Supported file types

The following file types are supported by the LLM parser:

  • application/pdf
  • image/png
  • image/jpeg
  • image/webp
  • image/heic
  • image/heif
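
Before importing, it can help to filter out files whose type the parser doesn't accept. The following is a minimal sketch and not part of the Vertex AI SDK: the helper name `is_supported_by_llm_parser` and the extension-to-MIME-type mapping are assumptions made for illustration. Only the MIME types themselves come from the list above.

```python
from pathlib import PurePosixPath

# MIME types taken from the supported file types list above.
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/heic",
    "image/heif",
}

# Assumed mapping from file extension to MIME type (illustrative only).
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def is_supported_by_llm_parser(path: str) -> bool:
    """Return True if the file extension maps to a supported MIME type."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_MIME.get(suffix) in SUPPORTED_MIME_TYPES


# Hypothetical file names, used only to show the check.
candidates = ["gs://my_bucket/report.PDF", "gs://my_bucket/notes.docx"]
print([p for p in candidates if is_supported_by_llm_parser(p)])
```

A check based on file extensions is only a heuristic. The service decides support from the actual content type, so treat this as a pre-filter rather than a guarantee.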

Pricing and quotas

For pricing details, see Vertex AI pricing.

For quotas that apply, see Rate quotas.

The LLM parser calls Gemini models to parse your documents. This creates additional costs, which are charged to your project. The cost can be roughly estimated using this formula:

cost = number_of_document_files * average_pages_per_document * (average_input_tokens * input_token_pricing_of_selected_model + average_output_tokens * output_token_pricing_of_selected_model)

For example, suppose you have 1,000 PDF files, and each PDF file has 50 pages. The average PDF page has 500 tokens, and an additional 100 tokens are needed for prompting, which gives 600 input tokens per page. The average output is 100 tokens per page.

Gemini 2.0 Flash-Lite is used in your configuration for parsing, and it costs $0.075 per 1M input tokens and $0.30 per 1M output text tokens.

Note: This example is for educational purposes, to show how to estimate your cost. It doesn't reflect the real cost you pay for every indexing request using the LLM parser.

cost = 1,000 * 50 * (600 * 0.075 / 1M + 100 * 0.3 / 1M) = 3.75

The cost is $3.75.
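
The estimate can also be scripted so that it is easy to rerun with different document counts or model prices. This is a minimal sketch of the formula above. The function name `estimate_parsing_cost` and its parameter names are hypothetical, and the prices are the example figures from this page, not current list prices.

```python
def estimate_parsing_cost(
    num_files: int,
    pages_per_file: int,
    input_tokens_per_page: int,
    output_tokens_per_page: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Rough LLM parser cost in USD, following the formula on this page."""
    total_pages = num_files * pages_per_file
    cost_per_page = (
        input_tokens_per_page * input_price_per_million
        + output_tokens_per_page * output_price_per_million
    ) / 1_000_000
    return total_pages * cost_per_page


# Example figures from this page: 500 page tokens + 100 prompt tokens per page.
cost = estimate_parsing_cost(
    num_files=1_000,
    pages_per_file=50,
    input_tokens_per_page=500 + 100,
    output_tokens_per_page=100,
    input_price_per_million=0.075,
    output_price_per_million=0.30,
)
print(f"Estimated cost: ${cost:.2f}")  # Estimated cost: $3.75
```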

Import files with LlmParser enabled

Replace the values in the following variables used in the code samples:

REST

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE/ragFiles:import" \
  -d '{
    "import_rag_files_config": {
      "gcs_source": {
        "uris": ["GCS_URI", "GOOGLE_DRIVE_URI"]
      },
      "rag_file_chunking_config": {
        "chunk_size": 512,
        "chunk_overlap": 102
      },
      "rag_file_parsing_config": {
        "llm_parser": {
          "model_name": "MODEL_NAME",
          "custom_parsing_prompt": "CUSTOM_PARSING_PROMPT",
          "max_parsing_requests_per_min": "MAX_PARSING_REQUESTS_PER_MIN"
        }
      }
    }
  }'

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "RAG_CORPUS_RESOURCE"
LOCATION = "LOCATION"
MODEL_ID = "MODEL_ID"
# f-string so that the placeholders above are substituted into the resource name
MODEL_NAME = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
# Optional; replace the right-hand side with an integer value
MAX_PARSING_REQUESTS_PER_MIN = MAX_PARSING_REQUESTS_PER_MIN
CUSTOM_PARSING_PROMPT = "Your custom prompt"  # Optional

PATHS = ["https://drive.google.com/file/123", "gs://my_bucket/my_files_dir"]

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

transformation_config = rag.TransformationConfig(
    chunking_config=rag.ChunkingConfig(
        chunk_size=1024,  # Optional
        chunk_overlap=200,  # Optional
    ),
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    max_parsing_requests_per_min=MAX_PARSING_REQUESTS_PER_MIN,  # Optional
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,  # Optional
)

# Import the files into the corpus with the LLM parser enabled
rag.import_files(
    CORPUS_NAME,
    PATHS,
    llm_parser=llm_parser_config,
    transformation_config=transformation_config,
)
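
If you call the REST endpoint from a script, building the request body as a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between fields. The following is a sketch only: `build_import_request_body` is a hypothetical helper, the field names are taken from the REST sample above, and it assumes that the optional parser fields can be omitted and that `max_parsing_requests_per_min` accepts an integer.

```python
import json


def build_import_request_body(
    uris,
    model_name,
    chunk_size=512,
    chunk_overlap=102,
    custom_parsing_prompt=None,
    max_parsing_requests_per_min=None,
):
    """Build the ragFiles:import request body with the LLM parser enabled."""
    llm_parser = {"model_name": model_name}
    # Assumption: optional fields are left out when not provided.
    if custom_parsing_prompt is not None:
        llm_parser["custom_parsing_prompt"] = custom_parsing_prompt
    if max_parsing_requests_per_min is not None:
        llm_parser["max_parsing_requests_per_min"] = max_parsing_requests_per_min
    return {
        "import_rag_files_config": {
            "gcs_source": {"uris": list(uris)},
            "rag_file_chunking_config": {
                "chunk_size": chunk_size,
                "chunk_overlap": chunk_overlap,
            },
            "rag_file_parsing_config": {"llm_parser": llm_parser},
        }
    }


# Placeholder values, as in the samples above.
body = build_import_request_body(
    uris=["gs://my_bucket/my_files_dir"],
    model_name="MODEL_NAME",
    custom_parsing_prompt="Your custom prompt",
)
print(json.dumps(body, indent=2))
```

The serialized string can then be passed as the `-d` payload of the curl command or as the body of any HTTP client request.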

Prompting

The Vertex AI RAG Engine LLM parser uses a predefined and tuned prompt for parsing documents. However, if you have specialized documents that might not be suitable for a general prompt, you can specify a custom parsing prompt when using the API. When requesting Gemini to parse your documents, Vertex AI RAG Engine appends a prompt to your default system prompt.

Prompt template table

To help with document parsing, the following table provides a prompt template example to guide you in creating prompts that Vertex AI RAG Engine can use to parse your documents:

| Instruction | Template statement | Example |
|---|---|---|
| Specify role. | You are a/an [Specify the role, such as a factual data extractor or an information retriever]. | You are an information retriever. |
| Specify task. | Extract [Specify the type of information, such as factual statements, key data, or specific details] from the [Specify the document source, such as a document, text, article, image, table]. | Extract key data from the sample.txt file. |
| Explain how you want the LLM to generate the output according to your documents. | Present each fact in a [Specify the output format, such as a structured list or text format], and link to its [Specify the source location, such as a page, paragraph, table, or row]. | Present each fact in a structured list, and link to its sample page. |
| Highlight what should be the focus of the LLM. | Extract [Specify the key data types, such as the names, dates, numbers, attributes, or relationships] exactly as stated. | Extract names and dates. |
| Highlight what you want the LLM to avoid. | [List the actions to avoid, such as analysis, interpretation, summarizing, inferring, or giving opinions]. Extract only what the document explicitly says. | No giving opinions. Extract only what the document explicitly says. |
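
The rows of the template can be combined into one prompt string and passed as the custom parsing prompt. The following is one possible way to do that, shown as a sketch: `build_parsing_prompt` is a hypothetical helper, not an SDK function, and the layout (role first, then numbered instructions) is an assumption rather than a required format. The example statements are the ones from the table.

```python
def build_parsing_prompt(
    role: str,
    task: str,
    output_format: str,
    focus: str,
    avoid: str,
) -> str:
    """Join the template statements into a single, numbered prompt."""
    instructions = [task, output_format, focus, avoid]
    numbered = "\n".join(
        f"{i}. {text}" for i, text in enumerate(instructions, start=1)
    )
    return f"{role}\n\n{numbered}"


CUSTOM_PARSING_PROMPT = build_parsing_prompt(
    role="You are an information retriever.",
    task="Extract key data from the sample.txt file.",
    output_format=(
        "Present each fact in a structured list, and link to its sample page."
    ),
    focus="Extract names and dates.",
    avoid="No giving opinions. Extract only what the document explicitly says.",
)
print(CUSTOM_PARSING_PROMPT)
```

The resulting string can be supplied as `custom_parsing_prompt` in the import samples on this page.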

General guidance

Follow these guidelines to write your prompt to send to the LLM parser.

  • Specific: Clearly define the task and the type of information to be extracted.
  • Detailed: Provide detailed instructions on output format, source attribution, and handling of different data structures.
  • Constraining: Explicitly state what the AI shouldn't do, such as analysis or interpretation.
  • Clear: Use clear and directive language.
  • Structured: Organize instructions logically using numbered lists or bullet points for readability.

Parsing quality analysis

This table lists results from scenarios that customers ran using Vertex AI RAG Engine. The feedback shows that the LLM parser improves the quality of parsing documents.

| Scenario | Result |
|---|---|
| Parsing information across slides and linking sections | The LLM parser successfully linked section titles on one slide to the detailed information presented on subsequent slides. |
| Understanding and extracting information from tables | The LLM parser correctly related columns and headers within a large table to answer specific questions. |
| Interpreting flowcharts | The LLM parser was able to follow the logic of a flowchart and extract the correct sequence of actions and corresponding information. |
| Extracting data from graphs | The LLM parser could interpret different types of graphs, such as line graphs, and extract specific data points based on the query. |
| Capturing relationships between headings and text | The LLM parser, guided by the prompt, paid attention to heading structures and could retrieve all relevant information associated with a particular topic or section. |
| Potential to overcome embedding limitations with prompt engineering | While initially hampered by embedding model limitations in some use cases, additional experiments demonstrated that a well-crafted LLM parser prompt could potentially mitigate these issues and retrieve the correct information even when semantic understanding is challenging for the embedding model alone. |

The LLM parser enhances the LLM's ability to understand and reason about the context within a document, which leads to more accurate and comprehensive responses.

Retrieval query

After you enter a prompt that's sent to a generative AI model, the retrieval component in RAG searches through its knowledge base to find information that's relevant to the query. For an example of retrieving RAG files from a corpus based on a query text, see Retrieval query.


Last updated 2025-12-17 UTC.