# Devin Best Practices: Writing Effective Session Playbooks for Higher PR Approval Rates

Devin, Cognition’s autonomous AI software engineer, performs dramatically better when guided by well-structured session playbooks. Teams that invest in clear task prompts, acceptance criteria, and iterative review workflows consistently see 70–90% first-attempt PR approval rates compared to the 30–40% baseline of ad-hoc prompting. This guide covers the proven patterns for writing playbooks that get code merged on the first try.

## Step 1: Structure Your Session Playbook

A session playbook is a reusable template that tells Devin exactly what to build, how to validate it, and what standards to meet. Every playbook should contain four core sections.

### Playbook Anatomy

| Section | Purpose | Required? |
|---|---|---|
| Task Prompt | Defines what Devin should build or fix | Yes |
| Acceptance Criteria | Measurable conditions for completion | Yes |
| Context & Constraints | Repo conventions, forbidden patterns, dependencies | Recommended |
| Review Workflow | Snapshot checkpoints and feedback instructions | Recommended |
Store playbooks as markdown files in your repository so the entire team can iterate on them:

```shell
mkdir -p .devin/playbooks
touch .devin/playbooks/feature-template.md
touch .devin/playbooks/bugfix-template.md
touch .devin/playbooks/refactor-template.md
```

## Step 2: Write Structured Task Prompts

Vague prompts produce vague results. Structured prompts dramatically reduce ambiguity and rework cycles.

### Bad vs. Good Prompt Comparison

| Bad Prompt | Good Prompt |
|---|---|
| "Add user authentication" | "Implement JWT-based authentication middleware in `src/middleware/auth.ts` using the existing `jsonwebtoken` package. Protect all routes under `/api/v2/*`. Return 401 with a JSON error body on invalid tokens." |
| "Fix the bug in payments" | "Fix the race condition in `src/services/payment.ts:processCharge()` where concurrent requests can double-charge. Add a distributed lock using the existing Redis client at `src/lib/redis.ts`. Include a regression test." |
### Prompt Template

```markdown
# Task: [Short descriptive title]

## Objective

[One sentence describing the deliverable]

## Scope

- Files to modify: [list specific paths]
- Files NOT to modify: [explicit exclusions]
- Branch: create from `main`, name as `devin/[feature-name]`

## Technical Requirements

- Use [specific library/pattern] for [specific purpose]
- Follow existing patterns in [reference file path]
- Maintain backward compatibility with [specific API/interface]

## Out of Scope

- Do NOT refactor unrelated code
- Do NOT update dependencies
- Do NOT modify CI configuration
```
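A template like this can also be checked mechanically before a session starts. The sketch below is a hypothetical helper, not part of Devin's tooling: the function name and the required-section list are my own, mirroring the template headings above.

```typescript
// Return the required section headings that a playbook markdown draft lacks.
// Hypothetical helper: the section names mirror the prompt template above.
function findMissingSections(markdown: string, required: string[]): string[] {
  // Collect every markdown ATX heading (e.g. "## Scope" -> "scope").
  const headings = new Set(
    markdown
      .split("\n")
      .filter((line) => /^#{1,6}\s/.test(line))
      .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase())
  );
  return required.filter((name) => !headings.has(name.toLowerCase()));
}

const playbook = [
  "# Task: Add JWT auth",
  "## Objective",
  "Protect /api/v2 routes.",
  "## Scope",
  "- Files to modify: src/middleware/auth.ts",
].join("\n");

// This draft is missing "Technical Requirements" and "Out of Scope".
console.log(findMissingSections(playbook, [
  "Objective", "Scope", "Technical Requirements", "Out of Scope",
]));
```

Running a check like this in a pre-session script catches underspecified playbooks before Devin ever sees them.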

## Step 3: Define Precise Acceptance Criteria

Acceptance criteria are the single most important factor in first-attempt approvals. Write them as testable, binary pass/fail conditions.

```markdown
# Acceptance Criteria

## Functional

- POST /api/v2/auth/login returns 200 with a valid JWT on correct credentials
- POST /api/v2/auth/login returns 401 with {"error": "invalid_credentials"} on wrong password
- All /api/v2/* routes return 401 when no Authorization header is present
- Token expiry is set to 15 minutes (configurable via AUTH_TOKEN_TTL env var)

## Code Quality

- All new functions have TypeScript return types
- No `any` types introduced
- Existing tests still pass: npm run test
- New tests added with >80% coverage for new code: npm run test -- --coverage
- Linting passes: npm run lint
```

### Verification Commands

```shell
npm run test
npm run lint
npm run build
curl -X POST http://localhost:3000/api/v2/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"testpass123"}'
```
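Because each criterion is a binary check, the whole list reduces to a single pass/fail gate. A minimal TypeScript sketch of that reduction follows; the `CheckResult` shape and `summarize` name are my own illustration, not a Devin API.

```typescript
// One verification command and its observed exit code.
interface CheckResult {
  name: string;     // e.g. "npm run lint"
  exitCode: number; // 0 means the check passed
}

// Reduce a list of binary checks to an overall verdict plus the failures.
function summarize(results: CheckResult[]): { passed: boolean; failed: string[] } {
  const failed = results.filter((r) => r.exitCode !== 0).map((r) => r.name);
  return { passed: failed.length === 0, failed };
}

const verdict = summarize([
  { name: "npm run test", exitCode: 0 },
  { name: "npm run lint", exitCode: 1 },
  { name: "npm run build", exitCode: 0 },
]);
// verdict.passed is false; verdict.failed lists only "npm run lint"
console.log(verdict);
```

Wiring the verification commands into a summary like this gives both you and Devin an unambiguous done/not-done signal.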
## Step 4: Configure Snapshot Review Workflows

Snapshots let you inspect Devin's progress at key milestones before it finishes the entire session. Set explicit checkpoints in your playbook:

```markdown
# Review Checkpoints

## Checkpoint 1: Architecture Decision
Pause after creating new files and defining interfaces.
I will review the type definitions and file structure before implementation begins.

## Checkpoint 2: Core Implementation
Pause after implementing the main logic but before writing tests.
I will review the implementation for correctness and adherence to patterns.

## Checkpoint 3: Tests & Final PR
Pause after tests are written and passing.
I will review test coverage and edge cases before PR submission.
```

When reviewing snapshots in the Devin dashboard, use this checklist:
- Check the file tree — are changes scoped to the expected files?
- Review the shell output — did all commands succeed?
- Read the code diff — does it follow the repo's conventions?
- Verify no unintended side effects in adjacent files
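The first checklist item, scope, is easy to automate against the list of changed files (for example, `git diff --name-only` output). A hedged sketch, assuming simple path-prefix matching; the function name is illustrative, not a Devin feature:

```typescript
// Flag changed files that fall outside the playbook's declared scope.
// allowedPrefixes come from "Files to modify"; forbiddenPrefixes from "Files NOT to modify".
function outOfScope(
  changedFiles: string[],
  allowedPrefixes: string[],
  forbiddenPrefixes: string[]
): string[] {
  return changedFiles.filter(
    (file) =>
      forbiddenPrefixes.some((p) => file.startsWith(p)) ||
      !allowedPrefixes.some((p) => file.startsWith(p))
  );
}

// Example: the diff touched CI config, which the playbook forbids.
const flagged = outOfScope(
  ["src/middleware/auth.ts", "src/routes/auth.ts", ".github/workflows/ci.yml"],
  ["src/"],
  [".github/"]
);
console.log(flagged); // [".github/workflows/ci.yml"]
```

A non-empty result is a fast signal to reject the snapshot and restate the scope constraints.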
## Step 5: Implement Iterative Feedback Loops
When a snapshot reveals issues, provide feedback that is specific, actionable, and references exact locations.

### Effective Feedback Format
```markdown
# Feedback on Checkpoint 2

## Issue 1: Missing error handling
In `src/middleware/auth.ts` line 34, the `jwt.verify()` call
is not wrapped in try/catch. Malformed tokens will crash the process.
Fix: wrap in try/catch and return 401 with `{"error": "malformed_token"}`.

## Issue 2: Wrong status code
In `src/routes/auth.ts` line 56, you return 403 for expired tokens.
Our API convention is to return 401 for all auth failures.
Reference: see `src/middleware/legacyAuth.ts` line 22 for the pattern.

## Issue 3: Missing test case
Add a test for tokens signed with a wrong secret key.
This is a critical security edge case.
```

Avoid vague feedback like "this doesn't look right" or "improve the error handling." Always specify the file, the line, the problem, and the expected fix.

## Pro Tips for Power Users
- **Compose playbooks from modules:** Create reusable snippets for common acceptance criteria (linting, testing, TypeScript strictness) and reference them across playbooks to maintain consistency.
- **Include negative constraints:** Telling Devin what NOT to do is as important as what to do. Explicitly list forbidden patterns, libraries, or approaches.
- **Pin dependency versions:** If your task involves packages, specify exact versions to prevent Devin from upgrading or downgrading unexpectedly.
- **Use knowledge files:** Place a `.devin/knowledge.md` file in your repo root documenting architectural decisions, naming conventions, and code patterns. Devin references this automatically.
- **Set session-level environment variables:** Pass config values like API base URLs and feature flags so Devin's runtime matches your expectations.
- **Chain sessions for complex features:** Break large features into sequential sessions where each builds on the previous PR. This keeps scope small and approval rates high.
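Composing playbooks from modules can be as simple as concatenating shared snippet files. A hypothetical sketch follows; in practice the modules would live as files under `.devin/playbooks/`, but they are inlined here as strings so the example is self-contained, and the names are my own.

```typescript
// Shared acceptance-criteria snippets, keyed by module name.
const modules: Record<string, string> = {
  lint: "- Linting passes: npm run lint",
  tests: "- Existing tests still pass: npm run test",
  strictTs: "- No `any` types introduced",
};

// Assemble a playbook from a task-specific body plus shared criteria modules.
function composePlaybook(taskBody: string, moduleNames: string[]): string {
  const criteria = moduleNames.map((name) => modules[name]).join("\n");
  return `${taskBody}\n\n## Acceptance Criteria\n\n${criteria}\n`;
}

const composed = composePlaybook("# Task: Add JWT auth", ["lint", "tests", "strictTs"]);
console.log(composed.includes("npm run lint")); // true
```

Because every playbook pulls the same snippets, tightening one module (say, raising the coverage bar) upgrades every template at once.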
## Troubleshooting Common Issues
| Problem | Cause | Solution |
|---|---|---|
| Devin modifies files outside scope | Missing scope constraints in playbook | Add explicit "Files NOT to modify" section and use `.devin/ignore` patterns |
| Tests pass locally but fail in CI | Environment mismatch | Include exact Node/Python version and env vars in the playbook context section |
| Code style doesn't match repo | No reference files provided | Point Devin to 2–3 exemplary files that demonstrate the expected patterns |
| Devin gets stuck in a loop | Contradictory or impossible acceptance criteria | Review criteria for conflicts; simplify and re-run the session |
| PR has unnecessary refactoring | Prompt is too open-ended | Add "Out of Scope" section explicitly banning refactoring of existing code |
## Frequently Asked Questions

### How many acceptance criteria should a session playbook include?
Aim for 5–12 specific, testable criteria per session. Fewer than 5 usually means the task is underspecified, leading to assumptions that cause PR rejections. More than 12 often signals the task should be split into multiple sessions. Each criterion should be a binary pass/fail condition that Devin can verify with a command or test.

### When should I use snapshot review checkpoints versus letting Devin run to completion?
Use checkpoints for any task that involves architectural decisions, new file creation, or security-sensitive code. For well-defined, repetitive tasks like adding CRUD endpoints that follow existing patterns, letting Devin run to completion is fine. As a rule, if the task takes more than 30 minutes of Devin time, add at least one mid-session checkpoint to catch issues early.

### How do I handle sessions where Devin's PR needs multiple rounds of revision?
If a PR requires more than two rounds of feedback, stop and improve the playbook rather than continuing to iterate on the PR. Copy Devin's mistakes into the playbook as explicit negative constraints. Update acceptance criteria to cover the edge cases that were missed. This investment pays off across all future sessions that use the template, systematically reducing revision cycles over time.
