Create, use, and manage a custom document classifier

Use the custom classifier to classify documents. Build it from the ground up with your own documents and custom classes. Its generative AI capabilities allow few-shot learning and fine-tuning. These improve accuracy with fewer samples and allow corrections through iterative auto-labeling.

The custom classifier covers three general use cases:

Pretrained model: Use the pretrained generative AI foundation model to quickly classify documents with your supplied labels.
Fine-tuning: Improve accuracy by training the generative AI foundation model on your own data and labels.
Train a custom model: Train a non-generative AI custom model using your own data and labels.

Custom classifier model versions

Confidence scores are supported for custom classifier models in Preview. For best performance, use them with fine-tuned models.

Model version	Description	Release channel	ML processing in US/EU	Fine-tuning in US/EU	Release date
`pretrained-foundation-model-v1.4-2025-05-16`	Release candidate powered by the Gemini 2.0 Flash LLM. Also includes advanced OCR features. Note: Effective March 31, 2026, this version will no longer be accessible.	Release Candidate	Yes	US, EU (Preview)	May 16, 2025
`pretrained-classifier-v1.5-2025-08-05`	Release candidate powered by the Gemini 2.5 Flash LLM. Also includes advanced OCR features.	Release Candidate	Yes	US, EU (Preview)	August 5, 2025

Create a custom classifier in the Google Cloud console

You can create custom classifiers that are specifically suited to your documents and are trained and evaluated with your data. This processor identifies classes of documents from a user-defined set of classes. You can then use this trained processor on additional documents. You typically use a custom classifier on documents of different types, then use the identified class to pass each document to an extraction processor that extracts the entities.

For the general process to create and use a processor, see the How to section.

Note: If you have your documents in separate folders by class, then you can skip step 4 by specifying the class at import time.

You can make your own configuration choices that suit your workflow.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

If you're using an existing project for this guide, verify that you have the permissions required to complete this guide. If you created a new project, then you already have the required permissions.

Verify that billing is enabled for your Google Cloud project.

Enable the Document AI and Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Required roles

To get the permissions that you need to create a custom classifier, ask your administrator to grant you the following IAM roles on your project:

Document AI Administrator (roles/documentai.admin)
Storage Admin (roles/storage.admin)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Note: If you want to access input files stored in a different project, then you need to grant additional roles to the Document AI service agent. For more information, see Cross project file access setup.

Create a processor

Complete the following steps.

Go to the Workbench.
For custom document classifier, select Create processor.
In the Create processor menu, enter a name for your processor, such as my-custom-document-classifier.
Select the region closest to you.
Select Create. The Processor Details tab appears.

Configure dataset

To train this new processor, you must create a dataset with training and testing data to help the processor identify the documents that you want to split and classify. This dataset requires a new location. This can be an empty Cloud Storage bucket or folder, or you can allow an internally managed location.

After the Processor Details tab appears, choose one of the following:

Select Google-managed storage if you want Google to manage the storage location.
Select I'll specify my own storage location if you want to use your own storage, for example to use customer-managed encryption keys (CMEK), and follow the procedure in Create a dataset.

Note: Folders for datasets must be treated as read-only. Don't change or add anything.

Import documents into a dataset

Next, you import your documents into your dataset.

On the Build tab, select Import documents.
If you import from a storage bucket, you must enter the Source path for the bucket. For this training example, enter the following path in Source path. It links directly to one document.
```
cloud-samples-data/documentai/Custom/Patents/PDF/computer_vision_20.pdf
```
For Data split, select Unassigned. The document in this folder is not assigned to either the testing or training set. Leave Import with auto-labeling unchecked.
Select Import. Document AI reads the documents from the bucket into the dataset. It does not modify the import bucket or read from the bucket after the import is complete.
Optional: To delete imported documents, in the Build tab, go to Manage dataset > select the documents > click Delete.

When you import documents, you can optionally assign them to the Training or Test set at import time, or wait to assign them later.

For more information about preparing your data for import, refer to the Data preparation guide.

Define processor schema

You can create the processor schema either before or after you import documents into your dataset. The schema provides labels that you use to annotate documents.

On the Build tab, select Manage Dataset > Edit Schema. The Edit schema page opens.
Select Create label.
Enter the name for the label.
Select Create. Refer to Define processor schema for detailed instructions on creating and editing a schema.
Note: After a processor is trained, labels cannot be deleted. Instead, you can disable any label you don't want to use.
Create each of the following labels for the processor schema.
Tip: Use the description field to enter a prompt that describes the label. This helps train the model and differentiate similarly written labels. Learn more at label with property descriptions.
Select Save when the labels are complete.

Note: Labels are flat and don't support child entities.

Label a document

The process of selecting text in a document and applying labels is known as annotation.

Return to the Build tab, and select a document to open the Manage Dataset console.
Among the options, select the appropriate label for the document. If you're using the sample document provided, select computer_vision.
Select Mark as Labeled when you have finished annotating the document.
On the Manage Dataset tab, the Document panel shows that one document has been labeled.

Tip: Be meticulous when labeling manually or using batch labeling. Tidy labeling improves techniques that use zero-shot and few-shot labeling.

Assign annotated document to the training set

Now that you have labeled this example document, you can assign it to the training set.

On the Manage Dataset tab, select the Select All checkbox.
From the Assign to Set list, select Training.

The Documents panel shows that one document has been assigned to the training set.

Import prelabeled data to the training and test sets

In this guide, you are provided with prelabeled data. If you are working on your own project, you have to determine how to label your data. Refer to Labeling options.

Document AI custom processors require a minimum of one document in both the training and test sets for each document type to be labeled. For best performance, we recommend at least 10 documents for each label in each set. For 5 labels, you would need 50 documents to train and 50 to test. More training data typically produces higher accuracy.
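The arithmetic behind that recommendation can be sketched in a few lines of Python. The function name `documents_needed` and its parameters are hypothetical and not part of any Document AI API. The default of 10 documents per label comes from the recommendation above.

```python
def documents_needed(num_labels: int, per_label: int = 10) -> dict:
    """Return how many documents to collect for each dataset split.

    per_label is the number of documents per label in EACH split
    (the guide recommends 10 and requires at least 1).
    """
    if num_labels < 1 or per_label < 1:
        raise ValueError("num_labels and per_label must be at least 1")
    per_split = num_labels * per_label
    return {"training": per_split, "test": per_split, "total": 2 * per_split}


# Five labels at the recommended 10 documents per label and split.
print(documents_needed(5))  # {'training': 50, 'test': 50, 'total': 100}
```

Passing `per_label=1` gives the bare minimum that training accepts, which is useful only as a lower bound when planning data collection.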

Select Import documents.
Enter the following path in Source path. This bucket contains prelabeled documents in the Document JSON format.
```
cloud-samples-data/documentai/Custom/Patents/JSON/Classification-InventionType
```
From the Data split list, select Auto-split. This automatically splits the documents to have 80% in the training set and 20% in the test set. Ignore the Apply labels section.
Select Import. The import might take several minutes to complete.

When the import is finished, you'll find the documents in the Manage Dataset tab.
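To make the 80/20 ratio concrete, here is a minimal local sketch of such a split. It is an illustration only: how Document AI chooses which documents go to each set is not described in this guide, so the shuffling, the `auto_split` name, and the file names are all assumptions.

```python
import random


def auto_split(doc_names, train_fraction=0.8, seed=0):
    """Shuffle document names and split them into (training, test) lists.

    Hypothetical helper: it mimics the 80/20 ratio of Auto-split but is not
    the service's actual assignment logic.
    """
    if len(doc_names) < 2:
        raise ValueError("need at least two documents to fill both sets")
    shuffled = list(doc_names)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    # Keep at least one document on each side of the cut.
    cut = max(1, min(cut, len(shuffled) - 1))
    return shuffled[:cut], shuffled[cut:]


docs = [f"patent_{i}.json" for i in range(10)]  # placeholder file names
training, test = auto_split(docs)
print(len(training), len(test))  # 8 2
```

With very small datasets the ratio cannot be exact, which is why the sketch clamps the cut so that neither set ends up empty.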

Batch label documents at import

Optionally, after the schema has been configured, you can label all documents that are in a particular directory at import to save time with labeling.

Select Import documents.
Enter the following path in Source path. This bucket contains unlabeled documents in PDF format.
```
cloud-samples-data/documentai/Custom/Patents/PDF-CDC-BatchLabel
```
From the Data split list, select Auto-split. This automatically splits the documents to have 80% in the training set and 20% in the test set.
In the Apply labels section, select Choose label.
For these sample documents, select other.
Select Import and wait for the process to finish. You can leave this page and return later. When complete, you find the documents on the Manage Dataset tab with the label applied.

Train the processor

Now that you have imported the training and test data, you can train the processor. Because training might take several hours, make sure you have set up the processor with the appropriate data and labels before you begin training.

You can train fine-tuned and custom models with your labeled data. Fine-tuned models use generative AI. The custom model option trains a conventional model, without generative AI, on your labeled data. You need a minimum of two labels in the schema, with a recommended 10 training documents and 10 test documents for each label (minimum of 1 each).

Select Train New Version.

Note: Fine-tuning tunes a foundation model and is the recommended option. Train a custom model trains a conventional model, one without generative AI.

In the Version name field, enter a name for this processor version, such as my-cdc-version-1.
Optional: Select View Label Stats to find information about the document labels that can help determine your coverage. Select Close to return to the training setup.
Select Start training. You can check the status on the side panel.

Deploy the processor version

After training is complete, navigate to the Manage Versions tab. You can view details about the version you just trained.
Select the icon beside the version you want to deploy, and select Deploy version.
Select Deploy from the dialog window.
Deployment takes a few minutes to complete.

Evaluate and test the processor

After deployment is complete, navigate to the Evaluate & Test tab.
On this page, you can view evaluation metrics including the F1 score, precision, and recall for the full document and for individual labels. For more information about evaluation and statistics, refer to Evaluate processor.
Download a document that has not been involved in previous training or testing so that you can use it to evaluate the processor version. If using your own data, you would use a document set aside for this purpose.
Download PDF
Select Upload Test Document and select the document you just downloaded.
The Custom Document Classifier analysis page opens. The output demonstrates how well the document was classified.
You can also rerun the evaluation against a different test set or processor version.
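For reference, the metrics named on that tab follow the standard definitions of precision, recall, and F1. The sketch below computes them for a single label from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical, and how the console aggregates scores across labels is not covered here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 for one label from raw counts.

    tp: documents correctly assigned this label
    fp: documents wrongly assigned this label
    fn: documents of this label that received a different label
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for one label: 8 correct, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it drops sharply when either one is low, which makes it a useful single number for comparing processor versions.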

Auto-label newly imported documents

After deploying a trained processor version, you can use Auto-labeling to save time on labeling when importing new documents.

On the Manage Dataset page, select Import documents.
Copy and paste the following Cloud Storage path. This directory contains five unlabeled patent PDFs. From the Data split drop-down list, select Training.
```
cloud-samples-data/documentai/Custom/Patents/PDF-CDC-AutoLabel
```
In the Apply labels section, select Auto-labeling.
Select an existing processor version to label the documents.
- For example: 2af620b2fd4d1fcf
Select Import and wait for the process to finish. You can leave this page and return later. When complete, the documents appear in the Auto-labeled section of the Manage Dataset page.
You cannot use auto-labeled documents for training or testing without marking them as labeled. Go to the Auto-labeled section to view the auto-labeled documents.
Select the first document to enter the labeling console.
Verify that the label is correct. Adjust it if it's incorrect.
Select Mark as Labeled when finished.
Repeat the label verification for each auto-labeled document, then return to the Manage Dataset page to assign the data for training.

Use the processor

You can manage your custom-trained processor versions just like any other processor version. For more information, refer to Managing processor versions.
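When you manage or call a specific version programmatically, it is identified by a full resource name. The sketch below assembles that name, assuming the `projects/.../locations/.../processors/.../processorVersions/...` pattern used by the Document AI API. The project and processor IDs are placeholders, and the version ID reuses the example shown earlier in this guide.

```python
def processor_version_name(project: str, location: str,
                           processor_id: str, version_id: str) -> str:
    """Build the resource name that identifies one processor version."""
    parts = {"project": project, "location": location,
             "processor_id": processor_id, "version_id": version_id}
    for field, value in parts.items():
        if not value or "/" in value:
            raise ValueError(f"invalid {field}: {value!r}")
    return (f"projects/{project}/locations/{location}"
            f"/processors/{processor_id}/processorVersions/{version_id}")


# "my-project" and "abc123" are hypothetical placeholders.
print(processor_version_name("my-project", "us", "abc123", "2af620b2fd4d1fcf"))
```

Naming the version explicitly, rather than relying on a default, keeps results reproducible while you compare several trained versions.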

You can also send a processing request to your custom processor, and the response can be handled the same as other classifier processors.
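As a sketch of handling such a response, the snippet below picks the most likely class from a response document that has already been converted to a Python dictionary. It assumes each predicted class appears as an entry in `entities` with a `type` and a `confidence` field. The sample values are made up, and the label names are the ones used in this guide.

```python
def top_class(document: dict):
    """Return (label, confidence) for the highest-confidence class.

    Returns None when the document carries no classification entities.
    """
    entities = document.get("entities", [])
    if not entities:
        return None
    best = max(entities, key=lambda e: e.get("confidence", 0.0))
    return best.get("type"), best.get("confidence", 0.0)


# Hypothetical response fragment; real responses contain more fields.
sample = {
    "entities": [
        {"type": "other", "confidence": 0.07},
        {"type": "computer_vision", "confidence": 0.93},
    ]
}
print(top_class(sample))  # ('computer_vision', 0.93)
```

A common follow-up is to route the document to a different extraction processor per label, or to send it for human review when the top confidence falls below a threshold you choose.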

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

In the Google Cloud console navigation menu, select Document AI, then My Processors.
Select More actions in the same row as the processor you want to delete.
Select Delete processor, enter the processor name, then select Delete again to confirm.

What's next

For more details, see Guides.
Review the processors list.
Separate documents into readable chunks with Layout Parser.
Use Enterprise Document OCR to detect and extract text.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
