Devin Best Practices: Writing Effective Session Playbooks for Higher PR Approval Rates

Devin, Cognition’s autonomous AI software engineer, performs dramatically better when guided by well-structured session playbooks. Teams that invest in clear task prompts, acceptance criteria, and iterative review workflows consistently see 70–90% first-attempt PR approval rates compared to the 30–40% baseline of ad-hoc prompting. This guide covers the proven patterns for writing playbooks that get code merged on the first try.

## Step 1: Structure Your Session Playbook

A session playbook is a reusable template that tells Devin exactly what to build, how to validate it, and what standards to meet. Every playbook should contain four core sections.

### Playbook Anatomy

| Section | Purpose | Required? |
| --- | --- | --- |
| Task Prompt | Defines what Devin should build or fix | Yes |
| Acceptance Criteria | Measurable conditions for completion | Yes |
| Context & Constraints | Repo conventions, forbidden patterns, dependencies | Recommended |
| Review Workflow | Snapshot checkpoints and feedback instructions | Recommended |

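One way to keep playbooks from drifting is to check each file for the four sections before it is used. The sketch below assumes each section appears as a Markdown heading with exactly the names from the table; the constant and function names are hypothetical and not part of any Devin tooling.

```typescript
// Hypothetical heading names; adjust to whatever your playbooks actually use.
const REQUIRED_SECTIONS = ["Task Prompt", "Acceptance Criteria"];
const RECOMMENDED_SECTIONS = ["Context & Constraints", "Review Workflow"];

interface SectionReport {
  missingRequired: string[];
  missingRecommended: string[];
}

// Collect the text of every Markdown heading line ("# Title", "## Title", ...).
function headingTitles(markdown: string): Set<string> {
  const titles = new Set<string>();
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.+?)\s*$/.exec(line);
    if (match) titles.add(match[1].toLowerCase());
  }
  return titles;
}

// Report which required and recommended sections are absent.
function checkPlaybookSections(markdown: string): SectionReport {
  const titles = headingTitles(markdown);
  const missing = (names: string[]): string[] =>
    names.filter((name) => !titles.has(name.toLowerCase()));
  return {
    missingRequired: missing(REQUIRED_SECTIONS),
    missingRecommended: missing(RECOMMENDED_SECTIONS),
  };
}
```

A non-empty `missingRequired` list could fail a pre-commit hook, while `missingRecommended` might only produce a warning.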
Store playbooks as Markdown files in your repository so the entire team can iterate on them:

```bash
mkdir -p .devin/playbooks
touch .devin/playbooks/feature-template.md
touch .devin/playbooks/bugfix-template.md
touch .devin/playbooks/refactor-template.md
```

## Step 2: Write Structured Task Prompts

Vague prompts produce vague results. Structured prompts dramatically reduce ambiguity and rework cycles.

### Bad vs. Good Prompt Comparison

| Bad Prompt | Good Prompt |
| --- | --- |
| "Add user authentication" | "Implement JWT-based authentication middleware in `src/middleware/auth.ts` using the existing `jsonwebtoken` package. Protect all routes under `/api/v2/*`. Return 401 with JSON error body on invalid tokens." |
| "Fix the bug in payments" | "Fix the race condition in `src/services/payment.ts:processCharge()` where concurrent requests can double-charge. Add a distributed lock using the existing Redis client at `src/lib/redis.ts`. Include a regression test." |

### Prompt Template

```markdown
# Task: [Short descriptive title]

## Objective
[One sentence describing the deliverable]

## Scope
- Files to modify: [list specific paths]
- Files NOT to modify: [explicit exclusions]
- Branch: create from main, name as devin/[feature-name]

## Technical Requirements
- Use [specific library/pattern] for [specific purpose]
- Follow existing patterns in [reference file path]
- Maintain backward compatibility with [specific API/interface]

## Out of Scope
- Do NOT refactor unrelated code
- Do NOT update dependencies
- Do NOT modify CI configuration
```
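If prompts are generated rather than hand-written, a typed structure can make it harder to leave a section out. The following is a minimal sketch that fills the template's sections (Objective, Scope, Technical Requirements, Out of Scope) from an object; `TaskSpec` and `renderTaskPrompt` are illustrative names, not a Devin API.

```typescript
// Every field is mandatory, so an incomplete prompt fails type checking.
interface TaskSpec {
  title: string;
  objective: string;
  filesToModify: string[];
  filesNotToModify: string[];
  branch: string;
  requirements: string[];
  outOfScope: string[];
}

const bullets = (items: string[]): string =>
  items.map((item) => `- ${item}`).join("\n");

// Render the spec as Markdown in the same section order as the template.
function renderTaskPrompt(spec: TaskSpec): string {
  return [
    `# Task: ${spec.title}`,
    "",
    "## Objective",
    spec.objective,
    "",
    "## Scope",
    `- Files to modify: ${spec.filesToModify.join(", ")}`,
    `- Files NOT to modify: ${spec.filesNotToModify.join(", ")}`,
    `- Branch: create from main, name as ${spec.branch}`,
    "",
    "## Technical Requirements",
    bullets(spec.requirements),
    "",
    "## Out of Scope",
    bullets(spec.outOfScope),
  ].join("\n");
}
```

Because the exclusions and out-of-scope items are required fields, the negative constraints cannot be silently omitted.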

## Step 3: Define Precise Acceptance Criteria

Acceptance criteria are the single most important factor in first-attempt approvals. Write them as testable, binary pass/fail conditions.

````markdown
# Acceptance Criteria

## Functional
- POST /api/v2/auth/login returns 200 with valid JWT on correct credentials
- POST /api/v2/auth/login returns 401 with {"error": "invalid_credentials"} on wrong password
- All /api/v2/* routes return 401 when no Authorization header is present
- Token expiry is set to 15 minutes (configurable via AUTH_TOKEN_TTL env var)

## Code Quality
- All new functions have TypeScript return types
- No any types introduced
- Existing tests still pass: npm run test
- New tests added with >80% coverage for new code: npm run test -- --coverage
- Linting passes: npm run lint

## Verification Commands
```bash
npm run test
npm run lint
npm run build
curl -X POST http://localhost:3000/api/v2/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"testpass123"}'
```
````
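The idea of binary pass/fail criteria can be made concrete by treating each verification command as a check that passes only on exit code 0. This is a rough sketch under that assumption; the `run` function is injected and hypothetical, so the example does not depend on any particular process API.

```typescript
interface Check {
  name: string;
  command: string;
}

interface CheckResult {
  name: string;
  passed: boolean;
}

// Executes a command and returns its exit code (injected by the caller).
type RunCommand = (command: string) => number;

// A check passes only when its command exits with code 0.
function evaluateChecks(checks: Check[], run: RunCommand): CheckResult[] {
  return checks.map((check) => ({
    name: check.name,
    passed: run(check.command) === 0,
  }));
}

// The session meets its criteria only if every check passed.
function allPassed(results: CheckResult[]): boolean {
  return results.every((result) => result.passed);
}

const checks: Check[] = [
  { name: "tests", command: "npm run test" },
  { name: "lint", command: "npm run lint" },
  { name: "build", command: "npm run build" },
];
```

In a Node script, `run` could wrap `spawnSync` from `node:child_process` with `shell: true` and return the `status` field.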
## Step 4: Configure Snapshot Review Workflows
Snapshots let you inspect Devin's progress at key milestones before it finishes the entire session. Set explicit checkpoints in your playbook.

```markdown
# Review Checkpoints

## Checkpoint 1: Architecture Decision
Pause after creating new files and defining interfaces.
I will review the type definitions and file structure before implementation begins.

## Checkpoint 2: Core Implementation
Pause after implementing the main logic but before writing tests.
I will review the implementation for correctness and adherence to patterns.

## Checkpoint 3: Tests & Final PR
Pause after tests are written and passing.
I will review test coverage and edge cases before PR submission.
```

When reviewing snapshots in the Devin dashboard, use this checklist:

- Check the file tree: are changes scoped to the expected files?
- Review the shell output: did all commands succeed?
- Read the code diff: does it follow the repo's conventions?
- Verify there are no unintended side effects in adjacent files.
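The first checklist item, whether changes stay within the expected files, is easy to automate. A minimal sketch, assuming the playbook's scope is available as a list of allowed paths; the function name and the trailing-slash convention for directories are assumptions made for illustration.

```typescript
// Returns the changed files that fall outside every allowed path.
// An allowed entry ending in "/" is treated as a directory prefix;
// any other entry must match the file path exactly.
function outOfScopeFiles(changed: string[], allowed: string[]): string[] {
  const isAllowed = (file: string): boolean =>
    allowed.some((entry) =>
      entry.endsWith("/") ? file.startsWith(entry) : file === entry
    );
  return changed.filter((file) => !isAllowed(file));
}
```

The `changed` list could come from `git diff --name-only main...HEAD` on the session branch; any file returned here is worth raising in checkpoint feedback.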
## Step 5: Implement Iterative Feedback Loops
When a snapshot reveals issues, provide feedback that is specific, actionable, and references exact locations.

### Effective Feedback Format
```markdown
# Feedback on Checkpoint 2

## Issue 1: Missing error handling
In `src/middleware/auth.ts` line 34, the `jwt.verify()` call
is not wrapped in try/catch. Malformed tokens will crash the process.
Fix: wrap in try/catch and return 401 with `{"error": "malformed_token"}`.

## Issue 2: Wrong status code
In `src/routes/auth.ts` line 56, you return 403 for expired tokens.
Our API convention is to return 401 for all auth failures.
Reference: see `src/middleware/legacyAuth.ts` line 22 for the pattern.

## Issue 3: Missing test case
Add a test for tokens signed with a wrong secret key.
This is a critical security edge case.
```

Avoid vague feedback like "this doesn't look right" or "improve the error handling." Always specify the file, the line, the problem, and the expected fix.
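Teams that send feedback often may find it useful to make the four required parts (file, line, problem, expected fix) mandatory fields. The sketch below is one possible shape; `FeedbackItem` and `renderFeedback` are hypothetical names, and the output layout only approximates the format shown in this step.

```typescript
// All four parts of actionable feedback are required fields.
interface FeedbackItem {
  title: string;
  file: string;
  line: number;
  problem: string;
  expectedFix: string;
}

// Render a numbered list of issues as Markdown for a given checkpoint.
function renderFeedback(checkpoint: string, items: FeedbackItem[]): string {
  const sections = items.map((item, index) =>
    [
      `## Issue ${index + 1}: ${item.title}`,
      `In \`${item.file}\` line ${item.line}: ${item.problem}`,
      `Fix: ${item.expectedFix}`,
    ].join("\n")
  );
  return [`# Feedback on ${checkpoint}`, ...sections].join("\n\n");
}
```

A reviewer cannot construct a `FeedbackItem` without naming a location and a fix, which rules out comments like "this doesn't look right" by construction.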

## Pro Tips for Power Users
- **Compose playbooks from modules:** Create reusable snippets for common acceptance criteria (linting, testing, TypeScript strictness) and reference them across playbooks to maintain consistency.
- **Include negative constraints:** Telling Devin what NOT to do is as important as what to do. Explicitly list forbidden patterns, libraries, or approaches.
- **Pin dependency versions:** If your task involves packages, specify exact versions to prevent Devin from upgrading or downgrading unexpectedly.
- **Use knowledge files:** Place a `.devin/knowledge.md` file in your repo root documenting architectural decisions, naming conventions, and code patterns. Devin references this automatically.
- **Set session-level environment variables:** Pass config values like API base URLs and feature flags so Devin's runtime matches your expectations.
- **Chain sessions for complex features:** Break large features into sequential sessions where each builds on the previous PR. This keeps scope small and approval rates high.
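The first tip, composing playbooks from modules, might look like the following in a small generator script. This is only a sketch: the module names and the `composeCriteria` helper are hypothetical, and the criteria text is borrowed from the acceptance criteria example in Step 3.

```typescript
// Reusable criteria snippets, keyed by a hypothetical module name.
const CRITERIA_MODULES: Record<string, string[]> = {
  linting: ["Linting passes: npm run lint"],
  testing: ["Existing tests still pass: npm run test"],
  "typescript-strict": [
    "All new functions have TypeScript return types",
    "No any types introduced",
  ],
};

// Merge shared modules with task-specific criteria, dropping duplicates
// while keeping the first occurrence of each criterion.
function composeCriteria(moduleNames: string[], taskSpecific: string[]): string[] {
  const combined: string[] = [];
  for (const name of moduleNames) {
    const criteria = CRITERIA_MODULES[name];
    if (!criteria) throw new Error(`Unknown criteria module: ${name}`);
    combined.push(...criteria);
  }
  combined.push(...taskSpecific);
  return combined.filter((item, index) => combined.indexOf(item) === index);
}
```

Throwing on an unknown module name is a deliberate choice here: a typo in a playbook should fail loudly rather than silently drop a quality gate.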
## Troubleshooting Common Issues
| Problem | Cause | Solution |
| --- | --- | --- |
| Devin modifies files outside scope | Missing scope constraints in playbook | Add an explicit "Files NOT to modify" section and use `.devin/ignore` patterns |
| Tests pass locally but fail in CI | Environment mismatch | Include the exact Node/Python version and env vars in the playbook context section |
| Code style doesn't match repo | No reference files provided | Point Devin to 2–3 exemplary files that demonstrate the expected patterns |
| Devin gets stuck in a loop | Contradictory or impossible acceptance criteria | Review criteria for conflicts; simplify and re-run the session |
| PR has unnecessary refactoring | Prompt is too open-ended | Add an "Out of Scope" section explicitly banning refactoring of existing code |
## Frequently Asked Questions

### How many acceptance criteria should a session playbook include?
Aim for 5–12 specific, testable criteria per session. Fewer than 5 usually means the task is underspecified, leading to assumptions that cause PR rejections. More than 12 often signals the task should be split into multiple sessions. Each criterion should be a binary pass/fail condition that Devin can verify with a command or test.
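As a rough illustration of the 5–12 guideline, a playbook linter could count the criteria bullets and flag counts outside that range. The sketch assumes each criterion is a Markdown bullet line; the function names and verdict labels are hypothetical.

```typescript
type CriteriaVerdict = "underspecified" | "ok" | "consider-splitting";

// Counts Markdown bullet lines ("- ..." or "* ...") as criteria.
function countCriteria(markdown: string): number {
  return markdown
    .split("\n")
    .filter((line) => /^\s*[-*]\s+\S/.test(line)).length;
}

// Applies the 5 to 12 guideline; the bounds are parameters so teams can tune them.
function assessCriteriaCount(count: number, min = 5, max = 12): CriteriaVerdict {
  if (count < min) return "underspecified";
  if (count > max) return "consider-splitting";
  return "ok";
}
```

The thresholds are defaults rather than hard limits, which matches the "aim for" wording of the guideline.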

### When should I use snapshot review checkpoints versus letting Devin run to completion?
Use checkpoints for any task that involves architectural decisions, new file creation, or security-sensitive code. For well-defined, repetitive tasks like adding CRUD endpoints that follow existing patterns, letting Devin run to completion is fine. As a rule, if the task takes more than 30 minutes of Devin time, add at least one mid-session checkpoint to catch issues early.
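The rule of thumb above can be written down as a simple predicate, which is handy if playbooks are generated from task metadata. This is a sketch only; `TaskProfile` and its fields are hypothetical, and the 30-minute threshold is taken from the answer above.

```typescript
interface TaskProfile {
  involvesArchitecturalDecisions: boolean;
  createsNewFiles: boolean;
  securitySensitive: boolean;
  estimatedMinutes: number; // expected Devin session time
}

// True when at least one mid-session checkpoint is advisable.
function shouldAddCheckpoint(task: TaskProfile): boolean {
  return (
    task.involvesArchitecturalDecisions ||
    task.createsNewFiles ||
    task.securitySensitive ||
    task.estimatedMinutes > 30
  );
}
```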

### How do I handle sessions where Devin's PR needs multiple rounds of revision?
If a PR requires more than two rounds of feedback, stop and improve the playbook rather than continuing to iterate on the PR. Copy Devin's mistakes into the playbook as explicit negative constraints. Update acceptance criteria to cover the edge cases that were missed. This investment pays off across all future sessions that use the template, systematically reducing revision cycles over time.
