Use Gemma open models with Dataflow

Gemma is a family of lightweight, state-of-the-art open models built from research and technology used to create the Gemini models. You can use Gemma models in your Apache Beam inference pipelines. The term open weight means that a model's pretrained parameters, or weights, are released. Details such as the original dataset, model architecture, and training code aren't provided.

Use cases

You can use Gemma models with Dataflow for sentiment analysis. With Dataflow and the Gemma models, you can process events, such as customer reviews, as they arrive. Run the reviews through the model to analyze them, and then generate recommendations. By combining Gemma with Apache Beam, you can seamlessly complete this workflow.
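
For example, a minimal streaming sketch of this workflow might look like the following. It makes several assumptions: a hypothetical Pub/Sub topic that carries review text, a placeholder model path, and the gemma_inference_function custom inference function that is defined in the steps later on this page.

    import apache_beam as beam
    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder model path; gemma_inference_function is the custom inference
    # function shown in the steps later on this page.
    model_handler = TFModelHandlerNumpy(
        "gs://BUCKET/gemma.keras", inference_fn=gemma_inference_function)

    options = PipelineOptions(["--streaming"])
    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | beam.io.ReadFromPubSub(topic="projects/PROJECT_ID/topics/reviews")
            | beam.Map(lambda msg: msg.decode("utf-8"))  # Each event is one review's text.
            | RunInference(model_handler, inference_args={"max_length": 32})  # Analyze each review with Gemma.
            | beam.Map(print)
        )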

Support and limitations

Gemma open models are supported with Apache Beam and Dataflow with the following requirements:

  • Available for batch and streaming pipelines that use the Apache Beam Python SDK versions 2.46.0 and later.
  • Dataflow jobs must use Runner v2.
  • Dataflow jobs must use GPUs. For a list of GPU types supported with Dataflow, see Availability. The T4 and L4 GPU types are recommended. For a sketch of pipeline options that enable Runner v2 and GPUs, see the example after this list.
  • The model must be downloaded and saved in the .keras file format.
  • The TensorFlow model handler is recommended but not required.
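
A minimal sketch of pipeline options that satisfy the Runner v2 and GPU requirements might look like the following. The project, region, bucket, and container image values are placeholders that you replace with your own.

    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder values; replace with your project, region, bucket, and image.
    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=PROJECT_ID",
        "--region=us-central1",
        "--temp_location=gs://BUCKET/tmp",
        # Gemma pipelines require Runner v2.
        "--experiments=use_runner_v2",
        # Request one NVIDIA T4 GPU per worker and install the GPU driver.
        "--dataflow_service_options=worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver",
        # Custom container image that includes the pipeline's GPU dependencies.
        "--sdk_container_image=LOCATION-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG",
    ])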

Prerequisites

  • Access Gemma models through Kaggle.
  • Complete the consent form and accept the terms and conditions.
  • Download the Gemma model. Save it in the .keras file format in a location that your Dataflow job can access, such as a Cloud Storage bucket. When you specify a value for the model path variable, use the path to this storage location. For a sketch of this step, see the example after this list.
  • To run your job on Dataflow, create a custom container image. This step makes it possible to run the pipeline with GPUs on the Dataflow service.
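
As a sketch of the download step, you might load a Gemma preset with KerasNLP and save it in the .keras format. The preset name and file paths here are assumptions; choose the Gemma variant that you have access to on Kaggle.

    import keras_nlp

    # Requires Kaggle credentials (for example, the KAGGLE_USERNAME and
    # KAGGLE_KEY environment variables) and accepted Gemma terms.
    gemma_model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

    # Save in the .keras format, then place the file in a location that the
    # Dataflow job can read, such as a Cloud Storage bucket.
    gemma_model.save("gemma_2b.keras")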

Use Gemma in your pipeline

To use a Gemma model in your Apache Beam pipeline, follow these steps.

  1. In your Apache Beam code, after you import your pipeline dependencies, include a path to your saved model:

    model_path = "MODEL_PATH"

    Replace MODEL_PATH with the path where you saved the downloaded model. For example, if you save your model to a Cloud Storage bucket, the path has the format gs://STORAGE_PATH/FILENAME.keras.
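
    This step assumes that your pipeline dependencies are already imported. As a sketch, an import block that covers the examples on this page might look like the following; the exact modules depend on your pipeline.

    import apache_beam as beam
    import numpy as np
    from apache_beam.ml.inference import utils
    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy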

  2. The Keras implementation of the Gemma models has a generate() method that generates text based on a prompt. To pass elements to the generate() method, use a custom inference function.

    def gemma_inference_function(model, batch, inference_args, model_id):
        vectorized_batch = np.stack(batch, axis=0)
        # The only inference_arg expected here is a max_length parameter to
        # determine how many words are included in the output.
        predictions = model.generate(vectorized_batch, **inference_args)
        return utils._convert_to_result(batch, predictions, model_id)
  3. Run your pipeline, specifying the path to the trained model. This example uses a TensorFlow model handler.

    class FormatOutput(beam.DoFn):
        def process(self, element, *args, **kwargs):
            yield "Input:{input}, Output:{output}".format(input=element.example, output=element.inference)

    # Instantiate a NumPy array of string prompts for the model.
    examples = np.array(["Tell me the sentiment of the phrase 'I like pizza': "])
    # Specify the model handler, providing a path and the custom inference function.
    model_handler = TFModelHandlerNumpy(model_path, inference_fn=gemma_inference_function)
    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create(examples)  # Create a PCollection of the prompts.
            | RunInference(model_handler, inference_args={'max_length': 32})  # Send the prompts to the model and get responses.
            | beam.ParDo(FormatOutput())  # Format the output.
            | beam.Map(print)  # Print the formatted output.
        )
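
    When the pipeline runs, each element printed by the final step has the form Input:PROMPT, Output:GENERATED_TEXT, as produced by the FormatOutput transform. The generated text depends on the model's response to the prompt.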

