Getting started with the Gemini API and Android

Learn how to use the Gemini API and the Google AI SDK to prototype generative AI in Android applications.

 

Introduction to the Gemini API and prompt engineering

Pathway

Explore Google AI Studio and the capabilities of the Gemini generative AI model. Learn how to design and test the different types of prompts (freeform, structured, and chat) and get an API key for the Gemini API.

Note that the Google AI client SDK for Android is only for prototyping and exploring the Gemini generative AI models. For use cases beyond prototyping (especially production or enterprise-scale apps), use Vertex AI in Firebase instead. It offers an SDK for Android that has additional security features, support for large media file uploads, and streamlined integration with the Firebase and Google Cloud ecosystems.

This pathway can be useful for further experimentation with the Gemini API and lays the groundwork for integrating its features into your application. Optionally, you can also try out the API using a simple Node.js web application. If you don't already have Node.js and npm on your machine, feel free to skip this step and return to Android later in this pathway.

Build your own generative AI powered Android app

Video

Watch this talk from Google I/O 2024 to learn how to add generative AI to your Android app using the Gemini API.

Explore use cases for generative AI in mobile apps and learn how to get started with the Gemini API and the Google AI client SDK on Android.

The Google AI client SDK for Android is only for prototyping. There are additional security considerations for using the Gemini API key in your mobile client applications: if the key is embedded in or retrieved by your client application, you risk exposing it to malicious actors. So, for use cases beyond prototyping (especially production and enterprise-scale apps), migrate to Vertex AI in Firebase to call the Gemini API directly from your client app. Alternatively, you can access the Gemini models server-side through Vertex AI.

Introduction to the Google AI client SDK for Android

For mobile apps, you need to consider whether you want to use generative AI with a remote, cloud-based model or a local, on-device model. Take into consideration such factors as network dependency, the size of the model you want to use, cost, and privacy when choosing your approach.

This solution focuses on using the Google AI client SDK for Android to remotely access the Google AI Gemini API for generative AI. This approach features the following:

  • Is network-dependent and sends data to the cloud for processing.
  • Provides native Kotlin and Java SDKs, so you don't need to work directly with REST APIs or build custom server-side integrations.
  • Runs on Google's servers, providing access to larger and more performant models without any device or hardware dependencies.
  • Provides easy access to the latest improvements, because Google updates the models automatically.

Getting started with the Google AI client SDK for Android requires setting up a project in Google AI Studio to obtain an API key for the Gemini API. Next, add the required dependencies to your app's build configuration, initialize the model that best fits your use case, and submit a prompt to generate output.
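These steps can be sketched in a few lines of Kotlin. This is a minimal example, assuming the `com.google.ai.client.generativeai` dependency is already in your Gradle build configuration and that your API key is exposed through a `BuildConfig` field (both assumptions; adjust the model name and key handling to your own setup):

```kotlin
import com.google.ai.client.generativeai.GenerativeModel

// Initialize a model that fits your use case; the model name here is an example.
suspend fun summarize(apiKey: String, input: String): String? {
    val model = GenerativeModel(
        modelName = "gemini-1.5-flash",
        apiKey = apiKey // For prototyping only; never ship a key in a production app.
    )
    // Submit a prompt and return the generated text.
    val response = model.generateContent("Summarize the following text: $input")
    return response.text
}
```

Because `generateContent` is a suspend function, call it from a coroutine (for example, inside `viewModelScope.launch`).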

If you want to use the alternative on-device approach, see the next step which covers Gemini Nano using the Google AI Edge SDK.

The Google AI client SDK for Android is only for prototyping. For use cases beyond prototyping (especially production and enterprise-scale apps), migrate to Vertex AI in Firebase to call the Gemini API directly from your client app. Alternatively, you can access the Gemini models server-side through Vertex AI.

If you use Android Studio, you can quickly get started with the Gemini API template that's described in more detail in a later step.

Access Gemini Nano on-device with the Google AI Edge SDK (experimental)

An alternative to using the Google AI client SDK to access the Gemini API is to use an on-device AI model, such as Gemini Nano, powered by Android AICore through the Google AI Edge SDK for Android.

Instead of calling a remote service that provides access to a generative AI model, the prompts are processed by a model that is stored on the device itself. This option removes the dependency on network access and completes all processing on-device. Consider this approach for potential cost-savings, offline access, smaller and narrower tasks, as well as local processing of sensitive data for your app.

Gemini Nano is available for experimental access through the Google AI Edge SDK. Follow the guide to begin experimenting with on-device AI capabilities that can enhance your own app.

Build with the Google AI client SDK in Android Studio

Android Studio includes a new project template for the Gemini API that helps you explore and prototype generative AI in Android apps with the Google AI client SDK.

Follow the steps in the template to set up an API key (if you don't already have one). Then, configure the application and make your first API call. The template automatically sets up an Android app that connects to the Gemini API and summarizes text.

Note that there are additional security considerations for using API keys directly in mobile client applications. The final step in this solution shows how to prepare your Android app for use cases beyond prototyping (most importantly, production) by migrating to Vertex AI in Firebase to access the Gemini API.

Explore the Android sample apps in Kotlin

Code sample

Explore the generative AI sample app for the Google AI client SDK for Android.

This example app demonstrates three key use cases: generating text, photo reasoning (using multimodal inputs), and multi-turn conversations (chat). It also shows how to use content streaming to improve response time by displaying partial results.
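The content-streaming use case can be sketched as follows. This is a minimal example, assuming the Google AI client SDK dependency and an already-initialized `GenerativeModel` (the prompt text is illustrative):

```kotlin
import com.google.ai.client.generativeai.GenerativeModel

// generateContentStream returns a Flow of partial responses, so the UI can
// display text as it arrives instead of waiting for the complete result.
suspend fun streamResponse(model: GenerativeModel) {
    model.generateContentStream("Write a short story about a friendly robot.")
        .collect { chunk ->
            print(chunk.text)
        }
}
```

Collecting the flow on the main dispatcher lets you append each chunk directly to a composable or view as it arrives.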

Follow the steps in the README to get started, which includes configuring your Gemini API key.

Multimodal prompting using the Google AI SDK

Multimodal prompts combine different types of media, such as text, images, and audio. For example, you could create prompts that identify objects in an image, extract text from a photo, or ask questions that reference a picture.

To get started, read this guide about file prompting strategies and multimodal concepts, which includes best practices for designing multimodal prompts.

Next, explore the multimodal capabilities of the Gemini models in Google AI Studio by uploading or selecting a file as part of your prompt.

Learn how to use multimodal inputs using the Google AI client SDK for Android, find image requirements for prompts, and explore the multimodal photo reasoning demo in the sample app.
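A multimodal prompt can be sketched with the SDK's `content` builder, which combines an image and text in a single request. This is a minimal example, assuming the Google AI client SDK dependency and an initialized `GenerativeModel` (the question text is illustrative):

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Send an image and a text question together in one multimodal prompt.
suspend fun describeImage(model: GenerativeModel, photo: Bitmap): String? {
    val response = model.generateContent(
        content {
            image(photo)
            text("What objects are in this picture?")
        }
    )
    return response.text
}
```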

For further reading, see the solution Leveraging the Gemini Pro Vision model for image understanding, multimodal prompts and accessibility.

Prepare for production by migrating to Vertex AI in Firebase

Using the Google AI client SDK for Android to call the Gemini API directly from a mobile client is only for prototyping and experimentation. When you start to seriously develop your app beyond prototyping (especially as you prepare for production), transition to Vertex AI in Firebase and its SDK for Android.

For calling the Gemini API directly from your Android app, we strongly recommend using the Vertex AI in Firebase client SDK for Android. This SDK offers enhanced security features for mobile apps, including Firebase App Check to help protect your app from unauthorized client access. When you use this SDK, you can include large media files in your requests by using Cloud Storage for Firebase. Vertex AI in Firebase also integrates with other products in Google's Firebase developer platform (like Cloud Firestore and Firebase Remote Config), while also giving you streamlined access to the tools, workflows, and scale offered through Google Cloud. Among other differences, Vertex AI also supports increased request quotas and enterprise features.

Follow this guide to migrate to the Vertex AI in Firebase client SDK by updating your package dependencies and imports and changing how the AI model is initialized.
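The core of the migration is how the model is initialized. This is a minimal sketch, assuming your app is already connected to a Firebase project and the Vertex AI in Firebase Android dependency is in your build configuration (the model name is an example):

```kotlin
import com.google.firebase.Firebase
import com.google.firebase.vertexai.vertexAI

// Before (Google AI client SDK): the model was constructed with an API key
// that shipped inside the client app.
//
// val model = GenerativeModel(modelName = "gemini-1.5-flash", apiKey = apiKey)

// After (Vertex AI in Firebase): no API key in the client; Firebase handles
// authentication, and App Check can guard against unauthorized access.
val model = Firebase.vertexAI.generativeModel("gemini-1.5-flash")
```

The rest of your prompt-generation code (for example, calls to `generateContent`) stays largely the same after the initialization change.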