Grok API Setup Guide for Python: xAI API Key, SDK Installation, Function Calling & Streaming
Grok, developed by xAI, offers a powerful large language model accessible through a REST API that is fully compatible with the OpenAI SDK. This guide walks Python developers through every step — from generating your xAI API key to implementing function calling and streaming responses in production-ready code.
Step 1: Generate Your xAI API Key
- Navigate to console.x.ai and sign in with your X (Twitter) account or email.
- Once in the dashboard, click API Keys in the left sidebar.
- Click Create API Key, give it a descriptive name (e.g., my-python-app), and click Generate.
- Copy the key immediately — it will not be shown again. Store it in a secure location such as a .env file or a secrets manager.

Your key will look something like: xai-AbCdEfGhIjKlMnOpQrStUvWxYz1234567890…
Step 2: Install the Required SDK
The Grok API uses the OpenAI-compatible chat completions format, so you can use the official OpenAI Python SDK with a custom base URL.
```bash
# Create a virtual environment (recommended)
python -m venv grok-env
source grok-env/bin/activate   # Linux/macOS
grok-env\Scripts\activate      # Windows

# Install dependencies
pip install openai python-dotenv
```

Create a .env file in your project root:

```
XAI_API_KEY=YOUR_API_KEY
```
Step 3: Basic API Call Configuration
Initialize the client by pointing it to the xAI base URL:
```python
import os

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-latest",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain Python decorators in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```
Available models include grok-3-latest, grok-3-fast, and grok-2-latest. Use grok-3-fast for lower latency and cost.
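Because every call repeats the same message scaffolding, it can help to factor that into a small helper so only the prompt and model vary per call. A minimal sketch — the helper name and default system prompt are our own, not part of the SDK:

```python
def build_messages(user_prompt, system_prompt="You are a helpful coding assistant."):
    """Assemble a chat-completions messages list from a user prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

You can then switch models per task while keeping the call site small, e.g. `client.chat.completions.create(model="grok-3-fast", messages=build_messages("Classify this ticket…"))`.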
Step 4: Configure Function Calling (Tool Use)
Grok supports OpenAI-compatible function calling, allowing the model to invoke structured tools.
```python
import json

# Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# First request — model decides to call a tool
response = client.chat.completions.create(
    model="grok-3-latest",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Function: {tool_call.function.name}, Args: {args}")

    # Simulate function execution
    weather_result = {"city": args["city"], "temp": "18°C", "condition": "Cloudy"}

    # Second request — feed the result back
    follow_up = client.chat.completions.create(
        model="grok-3-latest",
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            message,
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_result)
            }
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
```
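The snippet above handles a single tool call; in practice the model may request several tools in one turn. The pattern generalizes to a small dispatch helper — a sketch, assuming a `registry` dict (our name, not part of the SDK) that maps tool names to plain Python functions:

```python
import json

def dispatch_tool_calls(tool_calls, registry):
    """Execute each requested tool and build the 'tool' messages to send back."""
    tool_messages = []
    for call in tool_calls:
        handler = registry[call.function.name]
        args = json.loads(call.function.arguments)
        result = handler(**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # must match the model's tool_call id
            "content": json.dumps(result),
        })
    return tool_messages
```

Append the assistant `message` followed by these tool messages to your conversation before making the follow-up request.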
Step 5: Implement Streaming Responses
Streaming reduces perceived latency by delivering tokens as they are generated:
```python
stream = client.chat.completions.create(
    model="grok-3-latest",
    messages=[
        {"role": "user", "content": "Write a Python quicksort implementation."}
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()  # Newline after stream completes
```
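If you also need the complete text afterwards (for logging or caching), accumulate the deltas as you print them. A minimal sketch that works with any iterable of chunks in the shape shown above; the function name is our own:

```python
def collect_stream(chunks):
    """Print streamed tokens as they arrive and return the assembled full text."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
            parts.append(delta.content)
    print()  # newline after the stream completes
    return "".join(parts)
```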
Async Streaming
For web applications using FastAPI or similar async frameworks:
```python
import asyncio
import os

from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
)

async def stream_grok():
    stream = await async_client.chat.completions.create(
        model="grok-3-latest",
        messages=[{"role": "user", "content": "Explain async generators."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)

asyncio.run(stream_grok())
```
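To plug this into a web framework, it is convenient to expose the stream as an async generator yielding plain text tokens. A sketch — the generator name is ours, and `stream` is assumed to come from `create(..., stream=True)` on the async client:

```python
async def grok_token_stream(stream):
    """Yield content tokens from an async chat-completions stream."""
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            yield delta.content
```

In FastAPI, for example, an endpoint can return `StreamingResponse(grok_token_stream(stream), media_type="text/plain")` to forward tokens to the browser as they arrive.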
Pro Tips for Power Users
- Model selection strategy: Use grok-3-fast for high-throughput tasks like classification and extraction. Reserve grok-3-latest for complex reasoning and generation.
- Structured outputs: Pass response_format={"type": "json_object"} and instruct the model via the system prompt to return JSON. This makes parseable output far more likely, though you should still validate it.
- Rate limit handling: Wrap calls with exponential backoff. The OpenAI SDK includes built-in retry logic — configure it with max_retries=3 in the client constructor.
- Cost monitoring: Check response.usage.prompt_tokens and response.usage.completion_tokens after each call to track spend.
- System prompt caching: Keep your system prompt identical across requests to benefit from xAI's prompt caching, which reduces latency and cost on repeated prefixes.
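For the rate-limit tip, a manual exponential-backoff wrapper looks roughly like this. It is a sketch: in production you would pass `retry_on=(openai.RateLimitError,)` rather than the placeholder `Exception` default, and the helper name is our own:

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry a zero-argument callable with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts — re-raise the last error
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Usage: `result = with_backoff(lambda: client.chat.completions.create(...))`.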
Troubleshooting Common Errors
| Error | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or expired API key | Regenerate the key at console.x.ai and update your .env file |
| 404 Not Found | Wrong base URL or model name | Verify base_url="https://api.x.ai/v1" and check the model name spelling |
| 429 Too Many Requests | Rate limit exceeded | Add max_retries=3 to the client or implement exponential backoff |
| openai.APIConnectionError | Network issue or firewall blocking | Check internet connectivity; whitelist api.x.ai in your firewall |
| json.JSONDecodeError on tool args | Model returned malformed function arguments | Add stricter parameter descriptions; use tool_choice="required" to force a tool call |
Frequently Asked Questions
Can I use the Grok API without the OpenAI SDK?
Yes. The Grok API exposes standard REST endpoints. You can use requests or httpx to send POST requests to https://api.x.ai/v1/chat/completions with your API key in the Authorization: Bearer header. However, the OpenAI SDK handles retries, streaming parsing, and type safety out of the box, making it the recommended approach.
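As a sketch of the raw-HTTP route, here is a request builder using only the standard library (requests or httpx would look nearly identical); the helper name is our own:

```python
import json
import urllib.request

def build_grok_request(api_key, model, messages):
    """Build a POST request for the Grok chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        "https://api.x.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send: urllib.request.urlopen(build_grok_request(key, "grok-3-latest", msgs))
```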
What is the difference between grok-3-latest and grok-3-fast?
grok-3-latest is the full-capability model optimized for complex reasoning, coding, and multi-step tasks. grok-3-fast is a smaller, faster variant with lower latency and reduced cost per token, ideal for simpler tasks like classification, summarization, and high-volume processing. Both support function calling and streaming.
Does Grok support multi-turn conversations with function calling?
Yes. You maintain conversation context by appending each assistant response and tool result to your messages array, exactly as shown in Step 4. The model can call multiple tools in sequence across turns, and you feed results back using the tool role with the matching tool_call_id.
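This multi-turn pattern can be wrapped in a loop that keeps calling the model and executing tools until it replies in plain text. A sketch under the same assumptions as Step 4 — `registry` (our name) maps tool names to Python functions, and the round cap is a safety valve we added:

```python
import json

def run_tool_loop(client, model, messages, tools, registry, max_rounds=5):
    """Call the model repeatedly, executing requested tools, until it answers in text."""
    for _ in range(max_rounds):
        resp = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # plain-text answer — done
        messages.append(msg)  # keep the assistant's tool request in context
        for call in msg.tool_calls:
            result = registry[call.function.name](**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")
```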