Gemini API Setup Guide: Get Your API Key, Install Python SDK & Send Your First Multimodal Request


Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available today. Whether you want to generate text, analyze images, or process audio, Gemini handles it all through a single unified API. This step-by-step guide walks you through everything — from getting your API key in Google AI Studio to sending your first multimodal request using the Python SDK.

Step 1: Get Your Gemini API Key from Google AI Studio

Before writing any code, you need an API key. Google AI Studio provides a free tier that’s generous enough for development and prototyping.

  • Visit Google AI Studio at aistudio.google.com
  • Sign in with your Google account
  • Click “Get API Key” in the left sidebar
  • Click “Create API Key” and select an existing Google Cloud project or create a new one
  • Copy the generated key and store it securely, for example in a password manager or secrets vault

Important: The free tier includes up to 15 requests per minute for Gemini 2.0 Flash and 2 requests per minute for Gemini 2.5 Pro. For production workloads, enable billing on your Google Cloud project.

Step 2: Install the Google Generative AI Python SDK

The official Python SDK is the fastest way to interact with the Gemini API. You need Python 3.9 or higher.
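If you're unsure which interpreter your shell will use, a quick stdlib-only check (a sketch, not part of the SDK) confirms the version requirement before you create the environment:

```python
import sys

# The google-genai SDK requires Python 3.9 or higher.
if sys.version_info < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```
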

# Create and activate a virtual environment
python -m venv gemini-env

# On macOS/Linux:
source gemini-env/bin/activate

# On Windows:
gemini-env\Scripts\activate

# Install the SDK
pip install google-genai

This installs the latest google-genai package, which is the unified SDK for Gemini models (replacing the older google-generativeai package).

Verify the installation:

python -c "import google.genai; print('SDK installed successfully')"

Step 3: Configure Your API Key

You have two options for providing your API key. Using an environment variable is the recommended approach.

Option A: Environment Variable (Recommended)

# macOS/Linux
export GEMINI_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:GEMINI_API_KEY="YOUR_API_KEY"

# Windows CMD
set GEMINI_API_KEY=YOUR_API_KEY

Option B: Inline in Code (Development Only)

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

**Never commit API keys to version control.** Add your key to a .env file and include .env in your .gitignore.
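As a sketch of the .env approach using only the standard library (the helper name load_env_file is ours, not part of the SDK; in practice many projects use the python-dotenv package for this):

```python
import os

def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

if os.path.exists(".env"):
    load_env_file()

api_key = os.environ.get("GEMINI_API_KEY")
```
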

Step 4: Send Your First Text Request

Let’s verify everything works with a simple text generation call.

from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in three sentences."
)

print(response.text)

If you see a coherent response about quantum computing, your setup is complete.

Step 5: Send Your First Multimodal Request

Gemini’s standout feature is native multimodal understanding. Here’s how to analyze an image with text in a single request.

Analyze an Image from a URL

from google import genai
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
image_path = "sample.jpg"
urllib.request.urlretrieve(image_url, image_path)

# Upload the file, then analyze it
my_file = client.files.upload(file=image_path)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        my_file,
        "Describe what you see in this image in detail. Identify the species if possible."
    ]
)

print(response.text)

Analyze a Local Image with Inline Data

from google import genai
from google.genai import types
import pathlib
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

image_bytes = pathlib.Path("your_photo.jpg").read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Content(parts=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            types.Part.from_text(text="What objects are in this image? List them.")
        ])
    ]
)

print(response.text)

Step 6: Explore Available Models

Choose the right model for your use case:

| Model | Best For | Context Window | Speed |
|---|---|---|---|
| gemini-2.5-pro | Complex reasoning, coding, analysis | 1M tokens | Moderate |
| gemini-2.5-flash | Balanced speed and quality | 1M tokens | Fast |
| gemini-2.0-flash | High-volume, low-latency tasks | 1M tokens | Very Fast |
| gemini-2.0-flash-lite | Cost-efficient, simple tasks | 1M tokens | Fastest |
List models programmatically:

for model in client.models.list():
    print(model.name)

Step 7: Streaming Responses

For long outputs, streaming delivers tokens as they're generated rather than waiting for the full response.

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay on renewable energy."
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Pro Tips for Power Users

  • Use system instructions — Pass a config parameter with system_instruction to set persistent behavior across the conversation without repeating context in every prompt.
  • Batch with the Files API — Upload large files (up to 2GB) via client.files.upload() once, then reference them across multiple requests using the returned file object. Files persist for 48 hours.
  • Control output format — Set response_mime_type="application/json" in the config and provide a response_schema to get structured JSON output every time.
  • Token counting — Use client.models.count_tokens() before sending large prompts to estimate costs and stay within limits.
  • Safety settings — Adjust safety thresholds per request using safety_settings in the config if the defaults are too restrictive for your legitimate use case.
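The system-instruction and structured-output tips can be sketched as a plain config dictionary (google-genai also accepts a typed types.GenerateContentConfig object; the recipe schema below is an illustrative example we made up, not from the official docs):

```python
# Illustrative schema: extract a recipe name and ingredient list as JSON.
recipe_schema = {
    "type": "object",
    "properties": {
        "recipe_name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["recipe_name", "ingredients"],
}

config = {
    "system_instruction": "You are a concise cooking assistant.",
    "response_mime_type": "application/json",
    "response_schema": recipe_schema,
}

# Passed to the API as:
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents="Give me a pancake recipe.",
#     config=config,
# )
```

With response_schema set, response.text contains JSON you can parse with json.loads() instead of scraping free-form prose.
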

Troubleshooting Common Errors

| Error | Cause | Solution |
|---|---|---|
| 403 PERMISSION_DENIED | Invalid or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded | Implement exponential backoff or upgrade to a paid tier |
| ModuleNotFoundError: google.genai | SDK not installed or wrong package | Run pip install google-genai (not google-generativeai) |
| 400 INVALID_ARGUMENT | Unsupported file type or malformed request | Verify the MIME type matches the file content and check the request structure |
| 500 INTERNAL | Server-side issue | Wait and retry. If persistent, check the Google Cloud status dashboard |
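For the 429 case, a generic retry wrapper with exponential backoff looks like this (a sketch: in real code you would catch the SDK's own error class rather than the bare Exception used here for illustration):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on failure, doubling the delay each attempt plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch the SDK's rate-limit error
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller see the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)

# Usage sketch:
# response = with_backoff(lambda: client.models.generate_content(
#     model="gemini-2.0-flash", contents="Hello"))
```

Multiplicative jitter spreads out retries so that many clients hitting the limit at once don't all retry in lockstep.
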

Frequently Asked Questions

Is the Gemini API free to use?

Yes, Google offers a free tier through Google AI Studio with rate limits (e.g., 15 RPM for Gemini 2.0 Flash). This is sufficient for development and testing. For production workloads with higher rate limits and SLA guarantees, you need to enable billing on your Google Cloud project and use the paid tier.

What file types does Gemini support for multimodal input?

Gemini supports a wide range of file types including images (JPEG, PNG, GIF, WebP), video (MP4, MPEG, MOV, AVI, WebM), audio (MP3, WAV, AIFF, FLAC, OGG), and documents (PDF, plain text). You can upload files up to 2GB through the Files API. For inline data, stick to files under 20MB.
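When sending inline data, the stdlib mimetypes module can supply a matching MIME type from the filename (a sketch; the helper name part_args_for is ours, and for ambiguous files you would inspect the bytes instead):

```python
import mimetypes
import pathlib

def part_args_for(path):
    """Return the (data, mime_type) pair needed by types.Part.from_bytes()."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {path}")
    return pathlib.Path(path).read_bytes(), mime_type

# Usage sketch:
# data, mime = part_args_for("your_photo.jpg")
# part = types.Part.from_bytes(data=data, mime_type=mime)
```
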

What is the difference between google-genai and google-generativeai packages?

The google-genai package is the newer, unified SDK that provides a cleaner API with the genai.Client interface. The older google-generativeai package uses the genai.configure() pattern and is in maintenance mode. New projects should use google-genai as it supports all the latest features and models including Gemini 2.0 and 2.5 series.
