Run Spark jobs with DataprocFileOutputCommitter
The DataprocFileOutputCommitter feature is an enhanced version of the open source FileOutputCommitter. It enables concurrent writes by Apache Spark jobs to an output location.
Limitations
The DataprocFileOutputCommitter feature supports Spark jobs run on Dataproc Compute Engine clusters created with the following image versions:
2.1 image versions 2.1.10 and higher
2.0 image versions 2.0.62 and higher
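As a rough way to check an image version against the minimums listed above, the following shell sketch may help. The function name `supports_committer` is hypothetical, and the sketch makes two assumptions: the version string has the form MAJOR.MINOR.PATCH with an optional OS suffix such as `-debian11`, and only the 2.0 and 2.1 tracks named on this page count as supported. Any other track is reported as unsupported simply because this page does not mention it.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given Dataproc image
# version meets the minimums listed on this page, and fails otherwise.
supports_committer() {
  version="${1%%-*}"      # drop an optional OS suffix, e.g. 2.1.10-debian11 -> 2.1.10
  track="${version%.*}"   # image track, e.g. 2.1
  patch="${version##*.}"  # sub-minor number, e.g. 10
  case "$track" in
    2.1) [ "$patch" -ge 10 ] 2>/dev/null ;;
    2.0) [ "$patch" -ge 62 ] 2>/dev/null ;;
    *)   return 1 ;;      # assumption: tracks not listed on this page are rejected
  esac
}
```

For example, `supports_committer 2.0.62 && echo "supported"` would print the message, while a 2.0.61 image would not pass the check.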
Use DataprocFileOutputCommitter
To use this feature:
Create a Dataproc on Compute Engine cluster using image version 2.1.10 or 2.0.62 or higher.
Set spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory and spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false as job properties when you submit a Spark job to the cluster.
- Google Cloud CLI example:
    gcloud dataproc jobs submit spark \
        --properties=spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory,spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false \
        --region=REGION \
        other args ...
- Code example:
    sc.hadoopConfiguration.set("spark.hadoop.mapreduce.outputcommitter.factory.class", "org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory")
    sc.hadoopConfiguration.set("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
The Dataproc file output committer requires spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false to avoid conflicts between success marker files created during concurrent writes. You can also set this property in spark-defaults.conf.
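For the spark-defaults.conf route, a minimal sketch of the entries might look like the following. The helper name `append_committer_defaults` is hypothetical, and the file location is an assumption to verify on your cluster; the helper takes the path as an argument rather than hard-coding one. The entries use the whitespace-separated key and value form that spark-defaults.conf accepts. The factory class is included alongside the success-marker property on the assumption that you want both applied as defaults.

```shell
# Hypothetical helper: append the two committer properties to the
# spark-defaults.conf file given as the first argument.
append_committer_defaults() {
  cat >> "$1" <<'EOF'
spark.hadoop.mapreduce.outputcommitter.factory.class org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory
spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs false
EOF
}
```

A call such as `append_committer_defaults /etc/spark/conf/spark-defaults.conf` (path assumed) would add both entries. Because the helper appends, running it twice duplicates the lines.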
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.