Best practices for bulk data loading

This page describes best practices for bulk loading data to Cloud Firestore with tools like mongoimport.

Cloud Firestore is a highly distributed system offering automatic scaling to meet the needs of your business. Cloud Firestore dynamically splits and combines your data based on the load received by the system.

Load-based splitting happens automatically without any required pre-configuration. Compared to other document databases, the Cloud Firestore load-based splitting system has some unique characteristics that are important to keep in mind as you model your data.

Cloud Firestore's distributed nature can require changing some design choices, particularly for workloads that were optimized for databases where the primary replica is the bottleneck for write throughput.

Best Practices

Workloads that process large amounts of data in a single-threaded client can create a bottleneck. Clients might be able to use single threading to bulk load data when the client and server have similar throughput. A Cloud Firestore database can handle significantly more parallelism, but this requires that you configure clients to send requests in parallel.

mongoimport

By default, the mongoimport tool makes requests sequentially. To improve the load time into Cloud Firestore, set the number of workers with the --numInsertionWorkers flag. The correct setting might require tuning based on the size of your client, but we generally recommend starting with at least 32.
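As a sketch of the flag in use, the following Python snippet assembles a mongoimport command line with 32 insertion workers. The connection string, collection name, and file name are placeholders, and the helper build_mongoimport_command is hypothetical. Only the mongoimport flags themselves come from the tool.

```python
import shlex


def build_mongoimport_command(uri, collection, input_file, workers=32):
    """Return the argument list for a mongoimport run with parallel workers.

    This helper is illustrative only; it is not part of any library.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [
        "mongoimport",
        "--uri", uri,
        "--collection", collection,
        "--file", input_file,
        # Without this flag, mongoimport inserts with a single worker.
        "--numInsertionWorkers", str(workers),
    ]


# Placeholder values; substitute your own connection string and data file.
command = build_mongoimport_command(
    uri="mongodb://localhost:27017/mydb",
    collection="orders",
    input_file="orders.json",
)

# Print a copy-pasteable shell command.
print(shlex.join(command))
```

To run the import from Python instead of printing it, pass the list to subprocess.run(command, check=True). Raise or lower the workers value while watching client CPU and network usage to find a setting that suits your machine.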

Async programming

When developing your own software using MongoDB-compatible APIs, you can improve parallelism in the following ways:

  • Use async frameworks: async frameworks let you process and respond to requests in parallel. You don't need to develop any complex pooling or queuing when making calls to your database. Each request flow can use an independent connection and make its database calls in parallel.
  • Use parallelized compute offerings: with services like Cloud Run, your system can scale the number of computation workers required to process data.
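One way to send requests in parallel from a single client process is sketched below. It uses a thread pool from the Python standard library rather than an async framework, and the names chunked, parallel_load, and insert_batch are hypothetical. In a real loader, insert_batch would wrap a driver call, for example PyMongo's collection.insert_many(batch, ordered=False).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(documents, batch_size):
    """Yield successive lists of at most batch_size documents."""
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def parallel_load(documents, insert_batch, batch_size=500, workers=32):
    """Send batches to insert_batch from a pool of worker threads.

    insert_batch is any callable that writes one list of documents.
    Returns the total number of documents handed to insert_batch.
    """
    def send(batch):
        insert_batch(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every batch up front, so all batches are
        # held in memory. An exception raised by insert_batch is re-raised
        # when its result is consumed by sum().
        return sum(pool.map(send, chunked(documents, batch_size)))


# Demonstration with an in-memory stand-in for the database call.
received = []
total = parallel_load(
    ({"n": i} for i in range(1050)),
    received.append,
    batch_size=100,
    workers=8,
)
print(total, "documents sent in", len(received), "batches")
```

Because every batch is submitted before any result is read, this sketch suits inputs that fit in memory. For larger inputs, a bounded queue or submitting batches in windows would keep memory use flat.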

Transient Failures

When working with a large distributed system like Cloud Firestore, you might encounter transient failures such as network blips or contention on a document.

When bulk loading large amounts of information, it's important to maintain a retry strategy for failed writes without failing the larger bulk load operation.

Note: Cloud Firestore does not support retryWrites. We recommend using transactions to ensure that your application guarantees idempotency.
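A retry strategy of this kind could look like the sketch below, which retries each failed write with exponential backoff and jitter and records batches that still fail instead of aborting the whole load. The TransientError class, the function names, and the default delays are assumptions made for illustration. In practice you would catch the retryable exceptions raised by your driver, and each retried write should be idempotent, as the note above recommends.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def write_with_retry(write, max_attempts=5, base_delay=0.1,
                     max_delay=5.0, sleep=time.sleep):
    """Call write() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. Re-raises the last
    TransientError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter so
            # that parallel workers do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


def load_all(batches, write_batch, **retry_options):
    """Write every batch; return the batches that failed after all retries."""
    failed = []
    for batch in batches:
        try:
            write_with_retry(lambda: write_batch(batch), **retry_options)
        except TransientError:
            # Keep going so one bad batch does not fail the larger load.
            failed.append(batch)
    return failed
```

The list returned by load_all can be logged or replayed later, which keeps a handful of failed writes from discarding the progress of the rest of the load.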


Last updated 2026-02-18 UTC.