Gemini API Setup Complete Guide: From API Key to Your First Multimodal Request

Google’s Gemini API gives developers access to one of the most powerful multimodal AI models available. This step-by-step guide walks you through getting your API key from Google AI Studio, installing the Python SDK, and sending your first text and multimodal requests — all in under 15 minutes.

Step 1: Get Your Gemini API Key from Google AI Studio

- Visit Google AI Studio: Navigate to aistudio.google.com and sign in with your Google account.
- Click "Get API Key": In the left sidebar, click the Get API Key button.
- Create API Key: Select Create API key in new project or choose an existing Google Cloud project. Google will provision a new project automatically if needed.
- Copy and Store Your Key: Copy the generated key immediately and store it securely; you won't be able to view it again in the console.

Store the key as an environment variable (recommended):

Linux / macOS

export GEMINI_API_KEY="YOUR_API_KEY"

Windows PowerShell

$env:GEMINI_API_KEY="YOUR_API_KEY"

Persist across sessions (Linux/macOS: add to .bashrc or .zshrc)

echo 'export GEMINI_API_KEY="YOUR_API_KEY"' >> ~/.bashrc
source ~/.bashrc

- Verify the Key: Run a quick curl test to confirm your key works:

curl "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_API_KEY"

A JSON response listing available models confirms your key is active.
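Before making any SDK calls, it can also help to fail fast when the environment variable is missing. The helper below is a minimal sketch of that check (the function name get_api_key is our own, not part of the SDK):

```python
import os

def get_api_key() -> str:
    # Read the key from the environment so it never ends up in source control.
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Export it first, e.g. "
            'export GEMINI_API_KEY="YOUR_API_KEY"'
        )
    return key
```

Calling this once at startup gives a clear error message instead of a confusing authentication failure deep inside a request.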

Step 2: Install the Google Generative AI Python SDK

The official Python SDK simplifies interaction with the Gemini API.

- **Ensure Python 3.9+** is installed:

python --version

- **Install the SDK** via pip:

pip install -U google-genai

- **Verify the installation:**

python -c "from google import genai; print('SDK installed successfully')"
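If you want to run the same check from a setup script rather than the command line, a small sketch using only the standard library (assuming nothing beyond the google-genai package providing the google.genai module):

```python
import importlib.util

def sdk_installed() -> bool:
    # True when the google.genai module (from the google-genai package) is importable.
    if importlib.util.find_spec("google") is None:
        return False
    return importlib.util.find_spec("google.genai") is not None
```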
Step 3: Send Your First Text Request

Start with a simple text generation call to confirm everything works end to end.

from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain how neural networks learn in 3 sentences."
)

print(response.text)

Expected output: A concise explanation of neural network learning in three sentences.

Step 4: Send Your First Multimodal Request

Gemini's true power lies in processing text, images, audio, and video together. Here's how to analyze a local image with a text prompt.

from google import genai
from google.genai import types
import os
import pathlib

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Load a local image file
image_path = pathlib.Path("sample.jpg")
image_data = image_path.read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "Describe this image in detail. What objects are visible?"
    ]
)

print(response.text)
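Hard-coding mime_type="image/jpeg" breaks as soon as you pass a PNG. A small helper (our own sketch, built on the standard library's mimetypes module) can derive the MIME type from the file extension before building the Part:

```python
import mimetypes

def guess_image_mime(path: str) -> str:
    # Map a file extension (.jpg, .png, .gif, ...) to the MIME type
    # expected for inline image parts.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Unsupported or unknown image type: {path}")
    return mime
```

You could then pass mime_type=guess_image_mime(str(image_path)) instead of a hard-coded string.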

Analyzing an Image from a URL

from google import genai
from google.genai import types
import os
import urllib.request

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

# Download image bytes
image_url = "https://example.com/photo.jpg"
image_data = urllib.request.urlopen(image_url).read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "What is happening in this image?"
    ]
)

print(response.text)
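Inline bytes only work for small payloads; as noted in the FAQ below, uploads larger than about 20 MB should go through the Files API instead. A hedged sketch of that decision (the helper name and exact threshold constant are our own, based on the documented 20 MB inline limit):

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # ~20 MB inline-request limit

def should_use_files_api(data: bytes) -> bool:
    # Large payloads should be uploaded via the Files API rather than sent inline.
    return len(data) > INLINE_LIMIT_BYTES
```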

Step 5: Streaming Responses

For long outputs, streaming delivers tokens as they are generated, reducing perceived latency.

from google import genai
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Write a 500-word essay about climate change."
)

for chunk in response:
    print(chunk.text, end="", flush=True)
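If you also need the complete text after streaming (for logging or post-processing), you can accumulate chunks as they arrive. A minimal sketch; it only assumes each chunk exposes a .text attribute, as in the loop above:

```python
def collect_stream(chunks) -> str:
    # Accumulate streamed chunks into the full response text.
    # chunk.text can be None for non-text chunks, so coalesce to "".
    parts = []
    for chunk in chunks:
        parts.append(chunk.text or "")
    return "".join(parts)
```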

Step 6: Configure Generation Parameters

Fine-tune output quality with generation configuration.

from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a creative product tagline for a smart water bottle.",
    config=types.GenerateContentConfig(
        temperature=0.9,
        top_p=0.95,
        max_output_tokens=256,
    )
)

print(response.text)

| Parameter | Range | Purpose |
| --- | --- | --- |
| temperature | 0.0 – 2.0 | Controls randomness. Lower = deterministic, higher = creative |
| top_p | 0.0 – 1.0 | Nucleus sampling threshold |
| max_output_tokens | 1 – model max | Limits response length |
| top_k | 1 – 40 | Limits token candidates per step |
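The API rejects out-of-range values, so validating locally gives clearer errors before a request is sent. This checker is our own sketch of the ranges in the table above, not an SDK feature:

```python
def validate_generation_config(temperature=None, top_p=None,
                               top_k=None, max_output_tokens=None):
    # Raise early if a parameter falls outside the documented range.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0 - 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be within 0.0 - 1.0")
    if top_k is not None and not 1 <= top_k <= 40:
        raise ValueError("top_k must be within 1 - 40")
    if max_output_tokens is not None and max_output_tokens < 1:
        raise ValueError("max_output_tokens must be at least 1")
```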

Available Models Reference

| Model | Best For | Context Window |
| --- | --- | --- |
| gemini-2.0-flash | Fast, cost-effective general tasks | 1M tokens |
| gemini-2.0-flash-lite | Highest speed, lowest cost | 1M tokens |
| gemini-2.5-pro | Complex reasoning, coding | 1M tokens |
| gemini-2.5-flash | Balanced speed and thinking | 1M tokens |
Pro Tips for Power Users

- **Use system instructions**: Set persistent behavior by adding a system_instruction parameter in your config to define the model's persona or constraints.
- **Batch with async**: Use client.aio.models.generate_content for async calls when processing multiple requests concurrently.
- **JSON mode**: Set response_mime_type="application/json" in your config to force structured JSON output, ideal for API pipelines.
- **Safety settings**: Customize safety thresholds per category using safety_settings in your config if defaults are too restrictive for your use case.
- **Token counting**: Call client.models.count_tokens() before large requests to estimate cost and stay within rate limits.
- **Caching**: For repeated context (like a large document), use context caching to reduce latency and cost on subsequent requests.

Troubleshooting Common Errors
| Error | Cause | Solution |
| --- | --- | --- |
| 400 API_KEY_INVALID | Incorrect or expired API key | Regenerate your key in Google AI Studio and update your environment variable |
| 429 RESOURCE_EXHAUSTED | Rate limit exceeded | Implement exponential backoff or upgrade to a paid tier for higher quotas |
| ModuleNotFoundError: google.genai | SDK not installed or wrong package | Run pip install -U google-genai (not google-generativeai, which is the legacy package) |
| 403 PERMISSION_DENIED | API not enabled for your project | Enable the Generative Language API in your Google Cloud Console |
| 500 INTERNAL | Transient server error | Retry after a few seconds. If persistent, check the Google Cloud Status Dashboard |
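For 429 RESOURCE_EXHAUSTED in particular, the standard remedy is exponential backoff. A generic sketch (the helper and its defaults are our own, not part of the SDK) that can wrap any request callable:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    # Call fn(); on failure, wait base_delay, 2*base_delay, 4*base_delay, ...
    # between attempts, and re-raise after the final one.
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Usage would look like with_retries(lambda: client.models.generate_content(...)). In production you would catch only the SDK's rate-limit error rather than bare Exception, so genuine bugs still surface immediately.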
Frequently Asked Questions

Is the Gemini API free to use?

Yes, the Gemini API offers a generous free tier through Google AI Studio. The free tier includes rate-limited access to models like Gemini 2.0 Flash. For production workloads requiring higher throughput, you can enable billing in Google Cloud and pay per token. Check the official pricing page for current rates per model.

What file types does Gemini support for multimodal input?

Gemini supports a wide range of input types: JPEG, PNG, GIF, and WebP for images; MP3, WAV, FLAC, and OGG for audio; MP4, AVI, MOV, and MKV for video; and PDF for documents. You can combine multiple file types in a single request. The Files API handles uploads larger than 20MB, while inline data works for smaller files.

What is the difference between google-genai and google-generativeai packages?

The google-genai package is the current, recommended SDK that uses a unified client pattern (genai.Client). The google-generativeai package is the older, legacy SDK with a different API surface. New projects should always use google-genai. If you are migrating from the legacy SDK, the main change is moving from genai.configure() and genai.GenerativeModel() to the client-based approach shown in this guide.
