Hadoop MapReduce job with Bigtable

This example uses Hadoop to perform a simple MapReduce job that counts the number of times each word appears in a text file. The MapReduce job uses Bigtable to store its results: the reducer writes each word's count to a Bigtable table. The code for this example is in the GitHub repository GoogleCloudPlatform/cloud-bigtable-examples, in the directory java/dataproc-wordcount.

Set up authentication

To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

1. Install the Google Cloud CLI.

2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

3. If you're using a local shell, create local authentication credentials for your user account:

   gcloud auth application-default login

   You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

For more information, see Set up authentication for a local development environment.

Overview of the code sample

The code sample provides a simple command-line interface that takes one or more text files and a table name as input, finds all of the words that appear in the files, and counts how many times each word appears. The MapReduce logic appears in the WordCountHBase class.
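This page does not show how the sample orders its command-line arguments. The following sketch assumes a Hadoop-style layout of <input-file> [<input-file> ...] <table-name>, where every argument except the last is an input file. The class WordCountArgs and its parse method are hypothetical illustrations, not part of the sample.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the sample). Assumes the argument layout
//   <input-file> [<input-file> ...] <table-name>
// and splits it into the input paths and the target Bigtable table name.
final class WordCountArgs {
    final List<String> inputPaths;
    final String tableName;

    private WordCountArgs(List<String> inputPaths, String tableName) {
        this.inputPaths = inputPaths;
        this.tableName = tableName;
    }

    static WordCountArgs parse(String[] args) {
        // At least one input file plus the table name are required.
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Usage: <input-file> [<input-file> ...] <table-name>");
        }
        // Every argument except the last is an input file; the last is the table.
        List<String> inputs = Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
        return new WordCountArgs(inputs, args[args.length - 1]);
    }

    public static void main(String[] args) {
        WordCountArgs parsed = parse(new String[] {"input/a.txt", "input/b.txt", "wordcount"});
        // prints [input/a.txt, input/b.txt] -> wordcount
        System.out.println(parsed.inputPaths + " -> " + parsed.tableName);
    }
}
```

Putting the table name last lets the number of input files vary without extra flags, which is the same convention the classic Hadoop WordCount example uses for its output path.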

First, a mapper tokenizes the text file's contents and generates key-value pairs, where the key is a word from the text file and the value is 1:

public static class TokenizerMapper
    extends Mapper<Object, Text, ImmutableBytesWritable, IntWritable> {

  // Each occurrence of a word contributes a count of 1.
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Split the input text on whitespace.
    StringTokenizer itr = new StringTokenizer(value.toString());
    ImmutableBytesWritable word = new ImmutableBytesWritable();
    while (itr.hasMoreTokens()) {
      // Emit (word, 1); the word's bytes later become the row key.
      word.set(Bytes.toBytes(itr.nextToken()));
      context.write(word, one);
    }
  }
}
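To see what the mapper emits without running Hadoop, the following stdlib-only sketch reproduces its tokenizing logic for a single line of input. It is an illustration, not part of the sample: a String stands in for ImmutableBytesWritable, an Integer for IntWritable, and the class name LocalMapStep is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical, Hadoop-free sketch of TokenizerMapper's output for one line.
final class LocalMapStep {
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // With no delimiter argument, StringTokenizer splits on " \t\n\r\f".
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            // Same shape as the mapper's context.write(word, one).
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [the=1, cat=1, saw=1, the=1, dog.=1]
        System.out.println(map("the cat saw the dog."));
    }
}
```

Because StringTokenizer splits only on whitespace, tokens are case-sensitive and keep attached punctuation, so "dog." and "dog" would be counted as different words.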

A reducer then sums the values for each key and writes the results to a Bigtable table that you specified. Each row key is a word from the text file. Each row contains a cf:count column, which contains the number of times the row key appears in the text file.

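public static class MyTableReducer
    extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

  @Override
  public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = sum(values);
    // Row key = the word; cf:count = its total, stored as a 4-byte int.
    Put put = new Put(key.get());
    put.addColumn(COLUMN_FAMILY, COUNT_COLUMN_NAME, Bytes.toBytes(sum));
    // The table output format ignores the output key, so null is passed.
    context.write(null, put);
  }

  // Adds up all of the 1s emitted by the mapper for this word.
  public int sum(Iterable<IntWritable> values) {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    return i;
  }
}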
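The following stdlib-only sketch mirrors what the reducer computes and what it stores in cf:count. It is an illustration, not part of the sample: the class LocalReduceStep and its methods are hypothetical, a TreeMap stands in for Hadoop's shuffle-and-group phase, and encodeCount assumes the same layout as HBase's Bytes.toBytes(int), which is a 4-byte big-endian integer.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the reduce step and the cf:count encoding.
final class LocalReduceStep {
    // Groups (word, count) pairs by word and sums the counts per word.
    static Map<String, Integer> reduce(List<? extends Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Assumed to match Bytes.toBytes(int): 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] encodeCount(int count) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(count).array();
    }

    // Turns a cf:count cell value back into an int.
    static int decodeCount(byte[] value) {
        return ByteBuffer.wrap(value).getInt();
    }

    private static Map.Entry<String, Integer> one(String word) {
        return new SimpleEntry<>(word, 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reduce(Arrays.asList(one("the"), one("cat"), one("the")));
        System.out.println(counts); // prints {cat=1, the=2}
        System.out.println(Arrays.toString(encodeCount(counts.get("the")))); // prints [0, 0, 0, 2]
    }
}
```

Because cf:count holds raw integer bytes rather than text, a client reading the table has to decode the value as a big-endian int; tools that display cell values as strings may show it as escaped binary rather than a readable number.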

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.
