Use the LLM parser
VPC-SC security controls and CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't supported.
This page explains how to use the Vertex AI RAG Engine LLM parser.
Introduction
Vertex AI RAG Engine uses LLMs for document parsing. LLMs can effectively process documents in the following ways:
- Understand and interpret semantic content across various formats.
- Retrieve relevant document chunks.
- Extract meaningful information from documents.
- Identify relevant sections in documents.
- Accurately summarize complex documents.
- Understand and interact with visuals.
- Extract data from charts and diagrams.
- Describe images.
- Understand relationships between charts and text.
- Provide more contextually rich and accurate responses.
These capabilities of Vertex AI RAG Engine significantly improve the quality of generated responses.
Supported models
The following models support the Vertex AI RAG Engine LLM parser:
- Gemini 3 Pro (Preview)
- Gemini 2.5 Pro
- Gemini 2.5 Flash (Preview)
- Gemini 2.5 Flash-Lite (Preview)
- Gemini 2.5 Flash
- Gemini 2.5 Flash-Lite
- Gemini 2.0 Flash
Supported file types
The following file types are supported by the LLM parser:
- application/pdf
- image/png
- image/jpeg
- image/webp
- image/heic
- image/heif
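If you build the import list programmatically, it can help to filter out unsupported files before importing. The following is a minimal sketch using Python's standard mimetypes module; the filter_supported helper and the sample file names are illustrative and not part of the RAG Engine API:

```python
import mimetypes

# MIME types the LLM parser accepts, per the list above.
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/heic",
    "image/heif",
}

def filter_supported(paths: list[str]) -> list[str]:
    """Keep only files whose guessed MIME type the LLM parser supports."""
    supported = []
    for path in paths:
        mime_type, _ = mimetypes.guess_type(path)
        if mime_type in SUPPORTED_MIME_TYPES:
            supported.append(path)
    return supported

print(filter_supported(["report.pdf", "chart.png", "notes.txt"]))
# ['report.pdf', 'chart.png']
```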
Pricing and quotas
For pricing details, see Vertex AI pricing.
For quotas that apply, see Rate quotas.
The LLM parser calls Gemini models to parse your documents. This creates additional costs, which are charged to your project. The cost can be roughly estimated using this formula:

cost = number_of_document_files * average_pages_per_document * (average_input_tokens * input_token_pricing_of_selected_model + average_output_tokens * output_token_pricing_of_selected_model)
For example, you have 1,000 PDF files, and each PDF file has 50 pages. The average PDF page has 500 tokens, and you need an additional 100 tokens for prompting. The average output is 100 tokens.

Gemini 2.0 Flash-Lite is used in your configuration for parsing, and it costs $0.075 per 1M input tokens and $0.30 per 1M output text tokens.

Note: This example is for educational purposes to show how to estimate your cost. It doesn't reflect the real cost you pay for every indexing request using the LLM parser.

cost = 1,000 * 50 * (600 * 0.075 / 1M + 100 * 0.3 / 1M) = 3.75

The cost is $3.75.
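You can reproduce this estimate programmatically. The following is a minimal sketch of the formula above; the token counts and prices are the example values from this section, not live pricing:

```python
def estimate_parsing_cost(
    num_files: int,
    avg_pages_per_file: int,
    avg_input_tokens_per_page: int,
    avg_output_tokens_per_page: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Rough LLM parser cost estimate using the formula in this section."""
    cost_per_page = (
        avg_input_tokens_per_page * input_price_per_1m / 1_000_000
        + avg_output_tokens_per_page * output_price_per_1m / 1_000_000
    )
    return num_files * avg_pages_per_file * cost_per_page

# 1,000 PDFs, 50 pages each, 500 page tokens plus 100 prompt tokens in,
# 100 tokens out, with the example Gemini 2.0 Flash-Lite pricing.
cost = estimate_parsing_cost(
    num_files=1_000,
    avg_pages_per_file=50,
    avg_input_tokens_per_page=600,
    avg_output_tokens_per_page=100,
    input_price_per_1m=0.075,
    output_price_per_1m=0.30,
)
print(f"${cost:.2f}")  # $3.75
```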
Import files with LlmParser enabled
Replace the values in the following variables used in the code samples:
- PROJECT_ID: The ID for your Google Cloud project.
- LOCATION: The region where your request is processed.
- RAG_CORPUS_RESOURCE: The ID of your corpus.
- GCS_URI: The Cloud Storage URI of the files you want to import.
- GOOGLE_DRIVE_URI: The Google Drive URI of the files you want to import.
- MODEL_NAME: The resource name of the model used for parsing. Format: projects/{project_id}/locations/{location}/publishers/google/models/{model_id}
- CUSTOM_PARSING_PROMPT: Optional: Custom prompt configured by the customer for the LLM parser to use when parsing documents.
- MAX_PARSING_REQUESTS_PER_MIN: Optional: The maximum number of requests the job can make to the Vertex AI model per minute. For more information, see Generative AI on Vertex AI rate limits and the Quotas & System Limits page for your project to set an appropriate value.
REST
```sh
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE/ragFiles:import" \
  -d '{
    "import_rag_files_config": {
      "gcs_source": {
        "uris": ["GCS_URI", "GOOGLE_DRIVE_URI"]
      },
      "rag_file_chunking_config": {
        "chunk_size": 512,
        "chunk_overlap": 102
      },
      "rag_file_parsing_config": {
        "llm_parser": {
          "model_name": "MODEL_NAME",
          "custom_parsing_prompt": "CUSTOM_PARSING_PROMPT",
          "max_parsing_requests_per_min": "MAX_PARSING_REQUESTS_PER_MIN"
        }
      }
    }
  }'
```

Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
```python
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "RAG_CORPUS_RESOURCE"
LOCATION = "LOCATION"
MODEL_ID = "MODEL_ID"
MODEL_NAME = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
MAX_PARSING_REQUESTS_PER_MIN = MAX_PARSING_REQUESTS_PER_MIN  # Optional
CUSTOM_PARSING_PROMPT = "Your custom prompt"  # Optional

PATHS = ["https://drive.google.com/file/123", "gs://my_bucket/my_files_dir"]

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

transformation_config = rag.TransformationConfig(
    chunking_config=rag.ChunkingConfig(
        chunk_size=1024,  # Optional
        chunk_overlap=200,  # Optional
    ),
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    max_parsing_requests_per_min=MAX_PARSING_REQUESTS_PER_MIN,  # Optional
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,  # Optional
)

rag.import_files(
    CORPUS_NAME,
    PATHS,
    llm_parser=llm_parser_config,
    transformation_config=transformation_config,
)
```

Prompting
The Vertex AI RAG Engine LLM parser uses a predefined and tuned prompt for parsing documents. However, if you have specialized documents that might not be suitable for a general prompt, you can specify your custom parsing prompt when using the API. When requesting Gemini to parse your documents, Vertex AI RAG Engine appends your custom prompt to the default system prompt.
Prompt template table
To help with document parsing, the following table provides a prompt template example to guide you in creating prompts that Vertex AI RAG Engine can use to parse your documents:
| Instruction | Template statement | Example |
|---|---|---|
| Specify role. | You are a/an [Specify the role, such as a factual data extractor or an information retriever]. | You are an information retriever. |
| Specify task. | Extract [Specify the type of information, such as factual statements, key data, or specific details] from the [Specify the document source, such as a document, text, article, image, table]. | Extract key data from the sample.txt file. |
| Explain how you want the LLM to generate the output according to your documents. | Present each fact in a [Specify the output format, such as a structured list or text format], and link to its [Specify the source location, such as a page, paragraph, table, or row]. | Present each fact in a structured list, and link to its sample page. |
| Highlight what should be the focus of the LLM. | Extract [Specify the key data types, such as the names, dates, numbers, attributes, or relationships] exactly as stated. | Extract names and dates. |
| Highlight what you want the LLM to avoid. | [List the actions to avoid, such as analysis, interpretation, summarizing, inferring, or giving opinions]. Extract only what the document explicitly says. | No giving opinions. Extract only what the document explicitly says. |
General guidance
Follow these guidelines when writing prompts for the LLM parser; a composed example follows the list.

- Specific: Clearly define the task and the type of information to be extracted.
- Detailed: Provide detailed instructions on output format, source attribution, and handling of different data structures.
- Constraining: Explicitly state what the AI shouldn't do, such as analysis or interpretation.
- Clear: Use clear and directive language.
- Structured: Organize instructions logically using numbered lists or bullet points for readability.
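Putting the template table and these guidelines together, a custom parsing prompt might look like the following. This is an illustrative example, not a prompt shipped with Vertex AI RAG Engine; pass it through the custom_parsing_prompt field shown in the import samples above:

```python
# Illustrative custom parsing prompt composed from the template table:
# role, task, output format, focus, and constraints.
CUSTOM_PARSING_PROMPT = """\
You are an information retriever.
Extract key data from the document.
1. Present each fact in a structured list, and link it to its source page.
2. Extract names, dates, and numbers exactly as stated.
3. Do not analyze, interpret, summarize, infer, or give opinions.
Extract only what the document explicitly says.
"""
```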
Parsing quality analysis
This table lists results from scenarios that customers ran using Vertex AI RAG Engine. The feedback shows that the LLM parser improves the quality of parsing documents.
| Scenario | Result |
|---|---|
| Parsing information across slides and linking sections | The LLM parser successfully linked section titles on one slide to the detailed information presented on subsequent slides. |
| Understanding and extracting information from tables | The LLM parser correctly related columns and headers within a large table to answer specific questions. |
| Interpreting flowcharts | The LLM parser was able to follow the logic of a flowchart and extract the correct sequence of actions and corresponding information. |
| Extracting data from graphs | The LLM parser could interpret different types of graphs, such as line graphs, and extract specific data points based on the query. |
| Capturing relationships between headings and text | The LLM parser, guided by the prompt, paid attention to heading structures and could retrieve all relevant information associated with a particular topic or section. |
| Potential to overcome embedding limitations with prompt engineering | While initially hampered by embedding model limitations in some use cases, additional experiments demonstrated that a well-crafted LLM parser prompt could potentially mitigate these issues and retrieve the correct information even when semantic understanding is challenging for the embedding model alone. |
The LLM parser enhances the LLM's ability to understand and reason about the context within a document, which leads to more accurate and comprehensive responses.
Retrieval query
After you enter a prompt that's sent to a generative AI model, the retrieval component in RAG searches through its knowledge base to find information that's relevant to the query. For an example of retrieving RAG files from a corpus based on a query text, see Retrieval query.
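As a quick reference, the following is a minimal retrieval sketch with the Vertex AI SDK for Python, assuming the same placeholder values as the import example above; see the linked Retrieval query page for the authoritative sample:

```python
from vertexai import rag
import vertexai

# Assumes the same PROJECT_ID, LOCATION, and corpus placeholders as above.
vertexai.init(project="PROJECT_ID", location="LOCATION")

# Query the corpus and print the retrieved chunks.
response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE"),
    ],
    text="What did the flowchart describe?",
    rag_retrieval_config=rag.RagRetrievalConfig(top_k=5),  # Optional
)
print(response)
```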
What's next
- To learn more about Vertex AI RAG Engine, see Vertex AI RAG Engine overview.
- To learn more about the RAG API, see Vertex AI RAG Engine API.