Build hybrid experiences with on-device and cloud-hosted models

Preview: Using the Firebase AI Logic SDKs to build hybrid experiences is a Preview feature, which means that it isn't subject to any SLA or deprecation policy and could change in backwards-incompatible ways.

This initial release only supports on-device inference for web apps running on Chrome on Desktop.


Build AI-powered apps and features with hybrid inference using Firebase AI Logic. Hybrid inference enables running inference using on-device models when available and seamlessly falling back to cloud-hosted models otherwise (and vice versa).

With this release, hybrid inference is available using the Firebase AI Logic client SDK for Web with support for on-device inference for Chrome on Desktop.


Recommended use cases and supported capabilities

Recommended use cases:

  • Using an on-device model for inference offers:

    • Enhanced privacy
    • Local context
    • Inference at no cost
    • Offline functionality
  • Using hybrid functionality offers:

    • The ability to reach 100% of your audience, regardless of on-device model availability or internet connectivity

Supported capabilities and features for on-device inference:

Get started

This guide shows you how to get started using the Firebase AI Logic SDK for Web to perform hybrid inference.

Inference using an on-device model uses the Prompt API from Chrome, whereas inference using a cloud-hosted model uses your chosen Gemini API provider (either the Gemini Developer API or the Vertex AI Gemini API).

Get started developing using localhost, as described in this section (you can also learn more about using APIs on localhost in the Chrome documentation). Then, once you've implemented your feature, you can optionally enable end-users to try out your feature.

Step 1: Set up Chrome and the Prompt API for on-device inference

  1. Make sure you're using a recent version of Chrome. Update in chrome://settings/help.
    On-device inference is available from Chrome v139 and higher.

  2. Enable the on-device multimodal model by setting the following flag to Enabled:

    • chrome://flags/#prompt-api-for-gemini-nano-multimodal-input
  3. Restart Chrome.

  4. (Optional) Download the on-device model before the first request.

    The Prompt API is built into Chrome; however, the on-device model isn't available by default. If you haven't yet downloaded the model before your first request for on-device inference, the request will automatically start the model download in the background.

    Note: Downloading the model can take several minutes, so waiting to auto-download with the first request can significantly delay receiving a response to that request.

    Instructions to download the on-device model:

    1. Open Developer Tools > Console.

    2. Run the following:

      await LanguageModel.availability();
    3. Make sure that the output is available, downloading, or downloadable.

    4. If the output is downloadable, start the model download by running:

      await LanguageModel.create();
    5. You can use the following monitor callback to listen for download progress and make sure that the model is available before making requests:

      const session = await LanguageModel.create({
        monitor(m) {
          m.addEventListener("downloadprogress", (e) => {
            console.log(`Downloaded ${e.loaded * 100}%`);
          });
        },
      });

Step 2: Set up a Firebase project and connect your app to Firebase

  1. Sign in to the Firebase console, and then select your Firebase project.

    Don't already have a Firebase project?

    If you don't already have a Firebase project, click the button to create a new Firebase project, and then use either of the following options:

    • Option 1: Create a wholly new Firebase project (and its underlying Google Cloud project automatically) by entering a new project name in the first step of the workflow.

    • Option 2: "Add Firebase" to an existing Google Cloud project by clicking Add Firebase to Google Cloud project (at the bottom of the page). In the first step of the workflow, start entering the project name of the existing project, and then select the project from the displayed list.

    Complete the remaining steps of the on-screen workflow to create a Firebase project. Note that when prompted, you do not need to set up Google Analytics to use the Firebase AI Logic SDKs.

  2. In the Firebase console, go to the Firebase AI Logic page.

  3. Click Get started to launch a guided workflow that helps you set up the required APIs and resources for your project.

  4. Select the "Gemini API" provider that you'd like to use with theFirebase AI Logic SDKs.Gemini Developer API isrecommended for first-time users. You can always add billing or set upVertex AI Gemini API later, if you'd like.

    • Gemini Developer API (billing optional: available on the no-cost Spark pricing plan, and you can upgrade later if desired)
      The console will enable the required APIs and create a Gemini API key in your project.
      Do not add this Gemini API key into your app's codebase. Learn more.

    • Vertex AI Gemini API (billing required: requires the pay-as-you-go Blaze pricing plan)
      The console will help you set up billing and enable the required APIs in your project.

  5. If prompted in the console's workflow, follow the on-screen instructions toregister your app and connect it to Firebase.

  6. Continue to the next step in this guide to add the SDK to your app.

Note: In the Firebase console, you're strongly encouraged to set up Firebase App Check. If you're just trying out the Gemini API, you don't need to set up App Check right away; however, we recommend setting it up as soon as you start seriously developing your app.
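
If you do set up App Check, a minimal initialization sketch for a web app might look like the following (the reCAPTCHA v3 provider and the placeholder site key are assumptions; use whatever attestation provider you registered):

import { initializeApp } from "firebase/app";
import { initializeAppCheck, ReCaptchaV3Provider } from "firebase/app-check";

// Initialize FirebaseApp first (config object omitted here)
const firebaseApp = initializeApp({ /* your Firebase config */ });

// Activate App Check; "YOUR_RECAPTCHA_V3_SITE_KEY" is a placeholder
const appCheck = initializeAppCheck(firebaseApp, {
  provider: new ReCaptchaV3Provider("YOUR_RECAPTCHA_V3_SITE_KEY"),
  isTokenAutoRefreshEnabled: true, // keep App Check tokens fresh automatically
});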

Step 3: Add the SDK

The Firebase library provides access to the APIs for interacting with generative models. The library is included as part of the Firebase JavaScript SDK for Web.

  1. Install the Firebase JS SDK for Web using npm:

    npm install firebase
  2. Initialize Firebase in your app:

    import { initializeApp } from "firebase/app";

    // TODO(developer) Replace the following with your app's Firebase configuration
    // See: https://firebase.google.com/docs/web/learn-more#config-object
    const firebaseConfig = {
      // ...
    };

    // Initialize FirebaseApp
    const firebaseApp = initializeApp(firebaseConfig);

Step 4: Initialize the service and create a model instance


Before sending a prompt to a Gemini model, initialize the service for your chosen API provider and create a GenerativeModel instance.

Set themode to one of:

  • PREFER_ON_DEVICE: Configures the SDK to use the on-device model if it's available, or fall back to the cloud-hosted model.

  • ONLY_ON_DEVICE: Configures the SDK to use the on-device model or throw an exception.

  • PREFER_IN_CLOUD: Configures the SDK to use the cloud-hosted model if it's available, or fall back to the on-device model.

  • ONLY_IN_CLOUD: Configures the SDK to never use the on-device model.

When you use PREFER_ON_DEVICE, PREFER_IN_CLOUD, or ONLY_IN_CLOUD, the default cloud-hosted model is gemini-2.0-flash-lite, but you can override the default.

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, InferenceMode } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance
// Set the mode, for example to use on-device model when possible
const model = getGenerativeModel(ai, { mode: InferenceMode.PREFER_ON_DEVICE });

Note: Downloading the on-device model can take several minutes. If you haven't yet downloaded the model before your first request for on-device inference, the request will automatically start the model download in the background (which can significantly delay receiving a response to that request).

Send a prompt request to a model

This section provides examples for how to send various types of input to generate different types of output, including:

  • Generate text from text-only input
  • Generate text from text-and-image (multimodal) input

If you want to generate structured output (like JSON or enums), then use one of the following "generate text" examples and additionally configure the model to respond according to a provided schema.

Generate text from text-only input

Before trying this sample, make sure that you've completed the Get started section of this guide.

You can use generateContent() to generate text from a prompt that contains text:

// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Wrap in an async function so you can use await
async function run() {
  // Provide a prompt that contains text
  const prompt = "Write a story about a magic backpack.";

  // To generate text output, call `generateContent` with the text input
  const result = await model.generateContent(prompt);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Note that Firebase AI Logic also supports streaming of text responses using generateContentStream (instead of generateContent).
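
For example, a minimal streaming sketch (reusing the model instance from the example above; the prompt is arbitrary):

// Stream the response instead of waiting for the full result
async function runStreaming() {
  const prompt = "Write a story about a magic backpack.";

  const result = await model.generateContentStream(prompt);

  // Log each chunk of text as it arrives
  for await (const chunk of result.stream) {
    console.log(chunk.text());
  }
}

runStreaming();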

Generate text from text-and-image (multimodal) input

Before trying this sample, make sure that you've completed the Get started section of this guide.

You can use generateContent() to generate text from a prompt that contains text and image files, providing each input file's mimeType and the file itself.

The supported input image types for on-device inference are PNG and JPEG.

Important: For on-device inference, the maximum token limit is 6000 tokens.
// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "Write a poem about this picture:";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call `generateContent` with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Note that Firebase AI Logic also supports streaming of text responses using generateContentStream (instead of generateContent).

What else can you do?

In addition to the examples above, you can also enable end-users to try out your feature, use alternative inference modes, override the default fallback model, and use model configuration to control responses.

Enable end-users to try out your feature

To enable end-users to try out your feature, you can enroll in the Chrome Origin Trials. Note that there's a limited duration and usage for these trials.

  1. Register for the Prompt API Chrome Origin Trial. You'll be given a token.

  2. Provide the token on every web page for which you want the trial feature to be enabled. Use one of the following options:

    • Provide the token as a meta tag in the <head> tag: <meta http-equiv="origin-trial" content="TOKEN">

    • Provide the token as an HTTP header: Origin-Trial: TOKEN

    • Provide the token programmatically (see the sketch after this list).
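
    For the programmatic option, one common pattern is to inject the meta tag at runtime, as in this minimal sketch (TOKEN is the placeholder for your trial token):

      const otMeta = document.createElement("meta");
      otMeta.httpEquiv = "origin-trial";
      otMeta.content = "TOKEN"; // placeholder: your origin trial token
      document.head.append(otMeta);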

Use alternative inference modes

The examples above used the PREFER_ON_DEVICE mode to configure the SDK to use an on-device model if it's available, or fall back to a cloud-hosted model. The SDK offers three alternative inference modes: ONLY_ON_DEVICE, ONLY_IN_CLOUD, and PREFER_IN_CLOUD.

  • Use ONLY_ON_DEVICE mode so that the SDK can only use an on-device model. In this configuration, the API will throw an error if an on-device model is not available.

    const model = getGenerativeModel(ai, { mode: InferenceMode.ONLY_ON_DEVICE });
  • Use ONLY_IN_CLOUD mode so that the SDK can only use a cloud-hosted model.

    const model = getGenerativeModel(ai, { mode: InferenceMode.ONLY_IN_CLOUD });
  • Use PREFER_IN_CLOUD mode so that the SDK will attempt to use the cloud-hosted model, but will fall back to the on-device model if the cloud-hosted model is unavailable (for example, the device is offline).

    const model = getGenerativeModel(ai, { mode: InferenceMode.PREFER_IN_CLOUD });

Note: Downloading the on-device model can take several minutes. If you haven't yet downloaded the model before your first request for on-device inference, the request will automatically start the model download in the background (which can significantly delay receiving a response to that request).

Override the default fallback model

The default cloud-hosted model is gemini-2.0-flash-lite.

This model is the fallback cloud-hosted model when you use the PREFER_ON_DEVICE mode. It's also the default model when you use the ONLY_IN_CLOUD mode or the PREFER_IN_CLOUD mode.

You can use the inCloudParams configuration option to specify an alternative default cloud-hosted model:

const model = getGenerativeModel(ai, {
  mode: InferenceMode.INFERENCE_MODE,
  inCloudParams: {
    model: "GEMINI_MODEL_NAME"
  }
});

Find model names for all supported Gemini models.

Use model configuration to control responses

In each request to a model, you can send along a model configuration to control how the model generates a response. Cloud-hosted models and on-device models offer different configuration options.

The configuration is maintained for the lifetime of the instance. If you want to use a different config, create a new GenerativeModel instance with that config.
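
For example, a minimal sketch (assuming the ai instance from earlier) that keeps two instances with different configurations side by side:

// A lower temperature favors more focused, deterministic responses
const preciseModel = getGenerativeModel(ai, {
  mode: InferenceMode.PREFER_ON_DEVICE,
  onDeviceParams: { createOptions: { temperature: 0.2, topK: 3 } }
});

// A higher temperature favors more varied, creative responses
const creativeModel = getGenerativeModel(ai, {
  mode: InferenceMode.PREFER_ON_DEVICE,
  onDeviceParams: { createOptions: { temperature: 1.0, topK: 8 } }
});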

Set the configuration for a cloud-hosted model

Use the inCloudParams option to configure a cloud-hosted Gemini model. Learn about available parameters.

const model = getGenerativeModel(ai, {
  mode: InferenceMode.INFERENCE_MODE,
  inCloudParams: {
    model: "GEMINI_MODEL_NAME",
    temperature: 0.8,
    topK: 10
  }
});

Set the configuration for an on-device model

Note that inference using an on-device model uses the Prompt API from Chrome.

Use the onDeviceParams option to configure an on-device model. Learn about available parameters.

const model = getGenerativeModel(ai, {
  mode: InferenceMode.INFERENCE_MODE,
  onDeviceParams: {
    createOptions: {
      temperature: 0.8,
      topK: 8
    }
  }
});

Set the configuration for structured output (like JSON)

Generating structured output (like JSON and enums) is supported for inference using both cloud-hosted and on-device models.

For hybrid inference, use both inCloudParams and onDeviceParams to configure the model to respond with structured output. For the other modes, use only the applicable configuration.

  • For inCloudParams: Specify the appropriate responseMimeType (in this example, application/json) as well as the responseSchema that you want the model to use.

  • For onDeviceParams: Specify the responseConstraint that you want the model to use.

JSON output

The following example adapts the general JSON output example for hybrid inference:

import { getAI, getGenerativeModel, Schema } from "firebase/ai";

const jsonSchema = Schema.object({
  properties: {
    characters: Schema.array({
      items: Schema.object({
        properties: {
          name: Schema.string(),
          accessory: Schema.string(),
          age: Schema.number(),
          species: Schema.string(),
        },
        optionalProperties: ["accessory"],
      }),
    }),
  }
});

const model = getGenerativeModel(ai, {
  mode: InferenceMode.INFERENCE_MODE,
  inCloudParams: {
    model: "gemini-2.5-flash",
    generationConfig: {
      responseMimeType: "application/json",
      responseSchema: jsonSchema
    },
  },
  onDeviceParams: {
    promptOptions: {
      responseConstraint: jsonSchema
    }
  }
});

Enum output

As above, but adapting the documentation on enum output for hybrid inference:

// ...

const enumSchema = Schema.enumString({
  enum: ["drama", "comedy", "documentary"],
});

const model = getGenerativeModel(ai, {
  // ...
  generationConfig: {
    responseMimeType: "text/x.enum",
    responseSchema: enumSchema
  },
  // ...
});

// ...

Features not yet available for on-device inference

As an experimental release, not all the capabilities of the Web SDK are available for on-device inference. The following features are not yet supported for on-device inference (but they are usually available for cloud-based inference).

Note: For many of these features, if you set the mode to PREFER_ON_DEVICE, the SDK will automatically fall back to using the cloud-hosted model for these not-yet-available capabilities. (For ONLY_ON_DEVICE, see the error-handling sketch after this list.)
  • Generating text from image file input types other than JPEG and PNG

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Generating text from audio, video, and document (like PDF) inputs

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Generating images using Gemini or Imagen models

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Providing files using URLs in multimodal requests. You must provide files as inline data to on-device models.

  • Multi-turn chat

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Bi-directional streaming with the Gemini Live API

  • Providing the model with tools to help it generate its response (like function calling, code execution, and grounding with Google Search)

  • Counting tokens

    • Always throws an error. The count will differ between cloud-hosted and on-device models, so there is no intuitive fallback.
  • AI monitoring in the Firebase console for on-device inference.

    • Note that any inference using the cloud-hosted models can be monitored just like other inference using the Firebase AI Logic client SDK for Web.
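
As referenced in the note above, ONLY_ON_DEVICE mode throws when on-device inference can't satisfy a request. A minimal error-handling sketch (the manual cloud retry is an illustrative pattern, not an SDK feature; it assumes the ai instance from earlier):

const onDeviceModel = getGenerativeModel(ai, { mode: InferenceMode.ONLY_ON_DEVICE });
const cloudModel = getGenerativeModel(ai, { mode: InferenceMode.ONLY_IN_CLOUD });

async function generateWithManualFallback(prompt) {
  try {
    // Throws if the on-device model is unavailable or the request
    // uses a capability that on-device inference doesn't support yet
    const result = await onDeviceModel.generateContent(prompt);
    return result.response.text();
  } catch (e) {
    console.warn("On-device inference failed; retrying in the cloud:", e);
    const result = await cloudModel.generateContent(prompt);
    return result.response.text();
  }
}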



