Gemini API Setup Guide: Get Your API Key, Install Python SDK & Send Your First Multimodal Request


Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available today. Whether you want to generate text, analyze images, or process audio, Gemini handles it all through a single unified API. This step-by-step guide walks you through everything — from getting your API key in Google AI Studio to sending your first multimodal request using the Python SDK.

Step 1: Get Your Gemini API Key from Google AI Studio

Before writing any code, you need an API key. Google AI Studio provides a free tier that’s generous enough for development and prototyping.

  • Visit Google AI Studio at aistudio.google.com
  • Sign in with your Google account
  • Click “Get API Key” in the left sidebar
  • Click “Create API Key” and select an existing Google Cloud project or create a new one
  • Copy the generated key and store it securely, for example in a password manager or secrets vault

Important: The free tier includes up to 15 requests per minute for Gemini 2.0 Flash and 2 requests per minute for Gemini 2.5 Pro. For production workloads, enable billing on your Google Cloud project.

Step 2: Install the Google Generative AI Python SDK

The official Python SDK is the fastest way to interact with the Gemini API. You need Python 3.9 or higher.
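If you're unsure which interpreter your shell will use, a quick stdlib-only check (a sketch, not part of the SDK) confirms the version requirement before you create the environment:

```python
import sys

# The google-genai SDK requires Python 3.9 or higher.
if sys.version_info < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```
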

# Create and activate a virtual environment
python -m venv gemini-env

# On macOS/Linux:
source gemini-env/bin/activate

# On Windows:
gemini-env\Scripts\activate

# Install the SDK
pip install google-genai

This installs the latest google-genai package, which is the unified SDK for Gemini models (replacing the older google-generativeai package).

Verify the installation:

python -c "import google.genai; print('SDK installed successfully')"

Step 3: Configure Your API Key

You have two options for providing your API key. Using an environment variable is the recommended approach.

Option A: Environment Variable (Recommended)

# macOS/Linux
export GEMINI_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:GEMINI_API_KEY="YOUR_API_KEY"

# Windows CMD
set GEMINI_API_KEY=YOUR_API_KEY

Option B: Inline in Code (Development Only)

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

**Never commit API keys to version control.** Add your key to a .env file and include .env in your .gitignore.
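As a sketch of the .env approach using only the standard library (the helper name load_env_file is ours, not part of the SDK; in practice many projects use the python-dotenv package for this):

```python
import os

def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

if os.path.exists(".env"):
    load_env_file()

api_key = os.environ.get("GEMINI_API_KEY")
```
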

Step 4: Send Your First Text Request

Let’s verify everything works with a simple text generation call.

from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in three sentences."
)

print(response.text)

If you see a coherent response about quantum computing, your setup is complete.

Step 5: Send Your First Multimodal Request

Gemini’s standout feature is native multimodal understanding. Here’s how to analyze an image with text in a single request.

Analyze an Image from a URL

from google import genai
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
image_path = "sample.jpg"
urllib.request.urlretrieve(image_url, image_path)

# Upload the file, then analyze it
my_file = client.files.upload(file=image_path)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        my_file,
        "Describe what you see in this image in detail. Identify the species if possible."
    ]
)

print(response.text)

Analyze a Local Image with Inline Data

from google import genai
from google.genai import types
import pathlib
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

image_bytes = pathlib.Path("your_photo.jpg").read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Content(parts=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            types.Part.from_text(text="What objects are in this image? List them.")
        ])
    ]
)

print(response.text)

Step 6: Explore Available Models

Choose the right model for your use case:

| Model | Best For | Context Window | Speed |
|---|---|---|---|
| gemini-2.5-pro | Complex reasoning, coding, analysis | 1M tokens | Moderate |
| gemini-2.5-flash | Balanced speed and quality | 1M tokens | Fast |
| gemini-2.0-flash | High-volume, low-latency tasks | 1M tokens | Very Fast |
| gemini-2.0-flash-lite | Cost-efficient, simple tasks | 1M tokens | Fastest |
List models programmatically:

for model in client.models.list():
    print(model.name)

Step 7: Streaming Responses

For long outputs, streaming delivers tokens as they're generated rather than waiting for the full response.

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay on renewable energy."
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Pro Tips for Power Users

  • Use system instructions — Pass a config parameter with system_instruction to set persistent behavior across the conversation without repeating context in every prompt.
  • Batch with the Files API — Upload large files (up to 2GB) via client.files.upload() once, then reference them across multiple requests using the returned file object. Files persist for 48 hours.
  • Control output format — Set response_mime_type="application/json" in the config and provide a response_schema to get structured JSON output every time.
  • Token counting — Use client.models.count_tokens() before sending large prompts to estimate costs and stay within limits.
  • Safety settings — Adjust safety thresholds per request using safety_settings in the config if the defaults are too restrictive for your legitimate use case.
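The system-instruction and structured-output tips can be sketched as a plain config dictionary (google-genai also accepts a typed types.GenerateContentConfig object; the recipe schema below is an illustrative example we made up, not from the official docs):

```python
# Illustrative schema: extract a recipe name and ingredient list as JSON.
recipe_schema = {
    "type": "object",
    "properties": {
        "recipe_name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["recipe_name", "ingredients"],
}

config = {
    "system_instruction": "You are a concise cooking assistant.",
    "response_mime_type": "application/json",
    "response_schema": recipe_schema,
}

# Passed to the API as:
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents="Give me a pancake recipe.",
#     config=config,
# )
```

With response_schema set, response.text contains JSON you can parse with json.loads() instead of scraping free-form prose.
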

Troubleshooting Common Errors

| Error | Cause | Solution |
|---|---|---|
| 403 PERMISSION_DENIED | Invalid or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded | Implement exponential backoff or upgrade to a paid tier |
| ModuleNotFoundError: google.genai | SDK not installed or wrong package | Run pip install google-genai (not google-generativeai) |
| 400 INVALID_ARGUMENT | Unsupported file type or malformed request | Verify the MIME type matches the file content and check the request structure |
| 500 INTERNAL | Server-side issue | Wait and retry. If persistent, check the Google Cloud status dashboard |
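For the 429 case, a generic retry wrapper with exponential backoff looks like this (a sketch: in real code you would catch the SDK's own error class rather than the bare Exception used here for illustration):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on failure, doubling the delay each attempt plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch the SDK's rate-limit error
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller see the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)

# Usage sketch:
# response = with_backoff(lambda: client.models.generate_content(
#     model="gemini-2.0-flash", contents="Hello"))
```

Multiplicative jitter spreads out retries so that many clients hitting the limit at once don't all retry in lockstep.
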

Frequently Asked Questions

Is the Gemini API free to use?

Yes, Google offers a free tier through Google AI Studio with rate limits (e.g., 15 RPM for Gemini 2.0 Flash). This is sufficient for development and testing. For production workloads with higher rate limits and SLA guarantees, you need to enable billing on your Google Cloud project and use the paid tier.

What file types does Gemini support for multimodal input?

Gemini supports a wide range of file types including images (JPEG, PNG, GIF, WebP), video (MP4, MPEG, MOV, AVI, WebM), audio (MP3, WAV, AIFF, FLAC, OGG), and documents (PDF, plain text). You can upload files up to 2GB through the Files API. For inline data, stick to files under 20MB.
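When sending inline data, the stdlib mimetypes module can supply a matching MIME type from the filename (a sketch; the helper name part_args_for is ours, and for ambiguous files you would inspect the bytes instead):

```python
import mimetypes
import pathlib

def part_args_for(path):
    """Return the (data, mime_type) pair needed by types.Part.from_bytes()."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {path}")
    return pathlib.Path(path).read_bytes(), mime_type

# Usage sketch:
# data, mime = part_args_for("your_photo.jpg")
# part = types.Part.from_bytes(data=data, mime_type=mime)
```
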

What is the difference between google-genai and google-generativeai packages?

The google-genai package is the newer, unified SDK that provides a cleaner API with the genai.Client interface. The older google-generativeai package uses the genai.configure() pattern and is in maintenance mode. New projects should use google-genai as it supports all the latest features and models including Gemini 2.0 and 2.5 series.
