Perplexity API Complete Setup Guide: API Key, Python SDK, Citation Parsing & Search Models

Perplexity AI offers a powerful API that combines large language models with real-time web search capabilities. This guide walks you through everything from obtaining your API key to parsing source citations and selecting the right search-augmented model for your use case.

Step 1: Create a Perplexity Account and Generate Your API Key

  • Visit perplexity.ai and sign up for an account (or log in if you already have one).
  • Navigate to Settings → API, or go directly to the API settings page.
  • Click Generate API Key. Copy the key immediately; it is only displayed once.
  • Add billing information. The Perplexity API uses pay-per-request pricing, and you must have a valid payment method on file before making API calls.
  • Store your key securely and never commit it to version control.

Store Your Key as an Environment Variable

# Linux / macOS
export PERPLEXITY_API_KEY="YOUR_API_KEY"

# Windows PowerShell
$env:PERPLEXITY_API_KEY="YOUR_API_KEY"

# Or add to a .env file (never commit this file)
echo "PERPLEXITY_API_KEY=YOUR_API_KEY" >> .env

Step 2: Install the Required Python Packages

Perplexity's API is compatible with the OpenAI SDK, so you can use the official openai Python library as your client.

# Install the OpenAI Python SDK
pip install openai

# Optional: for environment variable management
pip install python-dotenv

# Optional: for async workflows
pip install httpx
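If you prefer not to add the python-dotenv dependency, a .env file containing simple KEY=VALUE lines can be loaded with a few lines of stdlib Python. This is a minimal sketch, not part of any SDK; load_env is an illustrative name, and it does not handle quoting or multi-line values the way python-dotenv does:

```python
import os

def load_env(path=".env"):
    """Load simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and # comments; existing environment variables
    are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```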

Step 3: Configure the Python Client

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PERPLEXITY_API_KEY", "YOUR_API_KEY"),
    base_url="https://api.perplexity.ai"
)

That's it: a handful of lines and you have a fully configured client. The key difference from standard OpenAI usage is the base_url parameter pointing to Perplexity's endpoint.

Step 4: Make Your First API Call

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant. Be concise and cite sources."},
        {"role": "user", "content": "What are the latest developments in quantum computing in 2026?"}
    ]
)

print(response.choices[0].message.content)
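Responses also carry token-usage metadata in the OpenAI schema that Perplexity mirrors (response.usage with prompt_tokens, completion_tokens, and total_tokens), which is useful for cost tracking under pay-per-request pricing. A small helper, sketched against that schema:

```python
def summarize_usage(response):
    """Return token counts from a chat-completion response (OpenAI schema)."""
    usage = response.usage
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
```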

Step 5: Parse Source Citations

One of Perplexity's most valuable features is returning source citations alongside generated answers. Citations are returned in the response object.

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "What is retrieval-augmented generation?"}
    ]
)

# Access the answer
answer = response.choices[0].message.content
print("Answer:", answer)

# Access citations (returned in the response metadata)
if hasattr(response, 'citations'):
    citations = response.citations
    print("\nSources:")
    for i, citation in enumerate(citations, 1):
        print(f" [{i}] {citation}")

Building a Citation Formatter

def format_response_with_citations(response):
    """Format Perplexity response with numbered source citations."""
    content = response.choices[0].message.content
    citations = getattr(response, 'citations', [])

    if not citations:
        return content

    formatted = content + "\n\n--- Sources ---\n"
    for i, url in enumerate(citations, 1):
        formatted += f"[{i}] {url}\n"

    return formatted

# Usage
result = format_response_with_citations(response)
print(result)
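Since answers typically reference sources with bracketed markers like [1], you can also pull out just the sources the model actually cited in the text. A sketch (the bracketed-marker format is an assumption about the response text, and cited_sources is an illustrative name):

```python
import re

def cited_sources(content, citations):
    """Map bracketed markers like [1] in the answer text to their URLs.

    Citation numbers are 1-based indices into the citations list;
    markers that point past the end of the list are ignored.
    """
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", content)})
    return {n: citations[n - 1] for n in numbers if 0 < n <= len(citations)}
```

Pairing the markers with URLs this way lets you build fully attributed outputs even when the model cites only a subset of the returned sources.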

Step 6: Choose the Right Search-Augmented Model

Perplexity offers several models optimized for different use cases:

| Model | Best For | Context Window | Key Feature |
| --- | --- | --- | --- |
| sonar | Quick lookups, simple Q&A | 128k tokens | Fast, cost-effective search |
| sonar-pro | Complex research, multi-step reasoning | 200k tokens | Multi-search, deeper analysis |
| sonar-reasoning | Math, logic, scientific tasks | 128k tokens | Extended thinking with search |
| sonar-reasoning-pro | Advanced reasoning with research | 128k tokens | Best reasoning + search combo |
| sonar-deep-research | Comprehensive research reports | 128k tokens | Exhaustive multi-step research |
# Example: Using sonar-reasoning for a complex query
response = client.chat.completions.create(
    model="sonar-reasoning",
    messages=[
        {"role": "user", "content": "Compare the environmental impact of lithium-ion vs solid-state batteries."}
    ]
)
print(response.choices[0].message.content)

Step 7: Advanced Configuration Options
# Fine-tune search behavior with additional parameters
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Latest Python 3.14 features"}
    ],
    temperature=0.2,           # Lower = more factual
    max_tokens=1024,           # Control response length
    top_p=0.9,                 # Nucleus sampling
    search_recency_filter="week",  # Filter: day, week, month, year
    return_related_questions=True   # Get follow-up suggestions
)
Pro Tips for Power Users

  • Use streaming for long responses: Add stream=True to your API call and iterate over chunks for real-time output display.
  • System prompts matter: A well-crafted system prompt dramatically improves citation quality. Tell the model to "always cite sources" and "prefer recent publications."
  • Batch requests efficiently: For multiple queries, use Python's asyncio with httpx to run concurrent API calls and reduce total latency.
  • Cache frequent queries: Implement a simple dictionary or Redis cache keyed by query hash to avoid redundant API calls and reduce costs.
  • Filter by recency: Use search_recency_filter when freshness matters; set it to "day" for breaking news or "month" for recent developments.
  • Monitor usage: Check your API dashboard regularly and set billing alerts to avoid unexpected charges.

Troubleshooting Common Errors
| Error | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing API key | Verify your key is correct and the environment variable is loaded. Regenerate the key if needed. |
| 403 Forbidden | No billing method on file | Add a payment method in your Perplexity API settings before making calls. |
| 429 Too Many Requests | Rate limit exceeded | Implement exponential backoff. Default rate limits vary by plan tier. |
| model_not_found | Incorrect model name | Double-check the model ID. Use sonar, sonar-pro, or sonar-reasoning. |
| Connection refused | Wrong base URL | Ensure base_url is set to https://api.perplexity.ai exactly. |
# Robust error handling template
import time
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai"
)

def query_with_retry(prompt, model="sonar-pro", max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )
            return response
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIError as e:
            print(f"API Error: {e}")
            raise
    raise Exception("Max retries exceeded")
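The same exponential-backoff pattern works for any transient failure, not just rate limits, and is easier to unit test when separated from the API client. A dependency-free sketch (retry_with_backoff is an illustrative name, not an SDK function):

```python
import time

def retry_with_backoff(fn, max_retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exceptions.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    the last exception once max_retries is exhausted.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping the API call in a zero-argument lambda keeps the helper generic: retry_with_backoff(lambda: client.chat.completions.create(...)).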

Frequently Asked Questions

Is the Perplexity API compatible with the OpenAI Python SDK?

Yes. Perplexity's API follows the OpenAI chat completions format. You simply install the openai package and point the base_url to https://api.perplexity.ai. All standard parameters like temperature, max_tokens, and stream work as expected, with additional Perplexity-specific options like search_recency_filter.
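The chunk-iteration pattern for stream=True can be sketched independently of the network. In the OpenAI schema, each streamed chunk exposes its text delta at chunk.choices[0].delta.content; the stand-in objects in the test below mimic that shape:

```python
def collect_stream(chunks):
    """Join the incremental text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) may carry no text
            parts.append(delta)
    return "".join(parts)
```

With a real call you would pass the result of client.chat.completions.create(..., stream=True) directly to this function, or print each delta inside the loop for live output.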

How do I access and parse citations from Perplexity API responses?

Citations are returned as part of the response object. Access them via response.citations, which provides a list of source URLs referenced in the generated answer. You can pair these with bracketed numbers in the response text to build fully attributed outputs for research or content workflows.

Which Perplexity model should I use for my project?

Use sonar for fast, simple lookups where speed and cost matter most. Choose sonar-pro for complex research requiring multiple search passes and deeper analysis. Pick sonar-reasoning or sonar-reasoning-pro for tasks that involve logic, math, or scientific analysis combined with real-time web data. For exhaustive multi-step reports, sonar-deep-research is the most thorough option.
