Extraction overview

Document AI offers multiple products to extract information from documents for different use cases:

  • Form Parser
  • Custom extractor, which offers three different modeling types:

    • Foundation model
    • Custom model based
    • Custom template based
  • Layout Parser

Form Parser

Form Parser extracts key-value pairs (KVPs), tables, selection marks (checkboxes), and generic fields to augment and automate extraction. It can extract up to 11 generic entities and checkboxes out of the box. You don't specify the fields (schema) you want to extract with Form Parser; the model detects and returns entities of interest from each page of a document.
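
Form Parser returns its results as a Document object. As a minimal sketch, assuming you have that output as a Python dict in the Document JSON format (camelCase field names, as in the REST representation), you can turn the detected form fields into key-value pairs by resolving each textAnchor against the document's full text. The helper names `anchor_text` and `form_fields` and the sample document are hypothetical, for illustration only.

```python
from typing import Any


def anchor_text(doc_text: str, anchor: dict[str, Any]) -> str:
    """Return the text covered by a textAnchor's segments, whitespace-trimmed."""
    parts = []
    for seg in anchor.get("textSegments", []):
        # In JSON, int64 indices are serialized as strings, and a startIndex
        # of 0 is omitted entirely, so default to 0 and convert with int().
        start = int(seg.get("startIndex", 0))
        end = int(seg.get("endIndex", 0))
        parts.append(doc_text[start:end])
    return "".join(parts).strip()


def form_fields(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (field name, field value) pairs from every page's formFields."""
    text = doc.get("text", "")
    pairs = []
    for page in doc.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(text, field.get("fieldName", {}).get("textAnchor", {}))
            value = anchor_text(text, field.get("fieldValue", {}).get("textAnchor", {}))
            pairs.append((name, value))
    return pairs


# Hypothetical, hand-built Form Parser output for a two-field form.
sample = {
    "text": "Name: Jane Doe\nDate: 2024-01-31\n",
    "pages": [{
        "formFields": [
            {"fieldName": {"textAnchor": {"textSegments": [{"endIndex": "5"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [{"startIndex": "6", "endIndex": "14"}]}}},
            {"fieldName": {"textAnchor": {"textSegments": [{"startIndex": "15", "endIndex": "20"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [{"startIndex": "21", "endIndex": "31"}]}}},
        ]
    }],
}

print(form_fields(sample))  # [('Name:', 'Jane Doe'), ('Date:', '2024-01-31')]
```

Resolving anchors against the top-level text, rather than copying strings into each field, mirrors how the Document format stores every piece of extracted text once and points into it.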

Custom extractor

The custom extractor extracts entities that you define in a schema and offers three modeling options: foundation model, custom model based, and custom template based. Because foundation models give promising results with little to no training data, we recommend starting with the foundation model and trying the other options as needed. Foundation models make zero-shot to few-shot predictions based on up to 5 labeled documents in the dataset, and fine-tuned predictions when the dataset has more than 10 labeled documents.

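The three training methods compare as follows:

  • Fine-tune and foundation model (generative AI)
    • Document examples: contracts, terms of service, invoices, bank statements, bills of lading, payslips
    • Document layout variation: high to low (preferred)
    • Free-form text or paragraphs: high
    • Training documents for production-ready quality, depending on variability: medium (0-50+ documents)
  • Custom model
    • Document examples: similar forms with layout variation across years or vendors (for example, W9)
    • Document layout variation: low to medium
    • Free-form text or paragraphs: low
    • Training documents for production-ready quality, depending on variability: high (10-100+ documents)
  • Template
    • Document examples: tax forms with a fixed layout (for example, Forms 941 and 709)
    • Document layout variation: none
    • Free-form text or paragraphs: low
    • Training documents for production-ready quality, depending on variability: low (3 documents)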

Because foundation models typically require fewer training documents, they're recommended as the first option for all variable layouts.
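
Whichever modeling option you use, the custom extractor returns the fields from your schema as entities. The sketch below assumes the output is a Python dict in the Document JSON format, with an `entities` list whose items carry `type`, `mentionText`, and `confidence`. It keeps only the schema types you care about above a confidence threshold and reports which schema fields weren't found. The schema, the 0.5 threshold, and the function name are hypothetical, and nested entity properties are ignored for brevity.

```python
from typing import Any

# Hypothetical schema: entity types you might define for an invoice extractor.
SCHEMA = {"invoice_id", "invoice_date", "total_amount"}


def collect_entities(
    doc: dict[str, Any], schema: set[str], min_confidence: float = 0.5
) -> tuple[dict[str, list[str]], list[str]]:
    """Group top-level entity mentions by schema type; also list schema types not found."""
    found: dict[str, list[str]] = {}
    for ent in doc.get("entities", []):
        etype = ent.get("type", "")
        # Skip types outside the schema and low-confidence predictions.
        if etype not in schema or float(ent.get("confidence", 0.0)) < min_confidence:
            continue
        found.setdefault(etype, []).append(ent.get("mentionText", ""))
    missing = sorted(schema - set(found))
    return found, missing


# Hypothetical extractor output: one confident match, one low-confidence
# match, and one type that isn't in the schema.
sample = {
    "entities": [
        {"type": "invoice_id", "mentionText": "INV-001", "confidence": 0.98},
        {"type": "total_amount", "mentionText": "$120.00", "confidence": 0.42},
        {"type": "po_number", "mentionText": "PO-7", "confidence": 0.9},
    ]
}

found, missing = collect_entities(sample, SCHEMA)
print(found)    # {'invoice_id': ['INV-001']}
print(missing)  # ['invoice_date', 'total_amount']
```

Listing missing fields separately makes it easy to route incomplete documents to human review instead of silently accepting partial results.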

Layout Parser

Note: Layout Parser is in Public preview

Layout Parser transforms documents in various formats into structured representations. It makes content such as paragraphs, tables, and lists, and structural elements such as headings, page headers, and footers, accessible. It also creates context-aware chunks that facilitate information retrieval in a range of generative AI and discovery apps.
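
The chunks are returned as part of the output Document. A minimal sketch, assuming the Document JSON field names `chunkedDocument.chunks[]` with `chunkId`, `content`, and `pageSpan` (Layout Parser is in preview, so check these against the current reference), flattens the chunks into records for a search index. The `keyword_search` function is a hypothetical stand-in for a real retriever: it ranks chunks by plain word overlap with the query, not by embeddings.

```python
import re
from typing import Any


def chunks_for_retrieval(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten Layout Parser chunks from a Document JSON dict into simple records."""
    records = []
    # Assumed path: chunkedDocument.chunks[] (camelCase JSON field names).
    for chunk in doc.get("chunkedDocument", {}).get("chunks", []):
        span = chunk.get("pageSpan", {})
        records.append({
            "id": chunk.get("chunkId", ""),
            "text": chunk.get("content", ""),
            # Default-valued fields may be omitted from JSON, so fall back to 0.
            "page_start": int(span.get("pageStart", 0)),
            "page_end": int(span.get("pageEnd", 0)),
        })
    return records


def _words(text: str) -> set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(records: list[dict[str, Any]], query: str, top_k: int = 3) -> list[str]:
    """Hypothetical stand-in retriever: rank chunk IDs by word overlap with the query."""
    terms = _words(query)
    scored = []
    for rec in records:
        score = len(terms & _words(rec["text"]))
        if score:
            scored.append((score, rec["id"]))
    # Highest overlap first; ties broken by chunk ID for stable output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [chunk_id for _, chunk_id in scored[:top_k]]


# Hypothetical Layout Parser output with two chunks.
sample = {
    "chunkedDocument": {
        "chunks": [
            {"chunkId": "c1", "content": "Refund policy. Refunds are issued within 30 days.",
             "pageSpan": {"pageStart": 1, "pageEnd": 1}},
            {"chunkId": "c2", "content": "Shipping takes 5 business days.",
             "pageSpan": {"pageStart": 2, "pageEnd": 2}},
        ]
    }
}

records = chunks_for_retrieval(sample)
print(keyword_search(records, "How long do refunds take?"))  # ['c1']
```

Keeping the page span on each record lets a generative AI app cite where an answer came from; in practice you would swap the keyword scorer for an embedding-based index.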

Last updated 2026-02-19 UTC.