Building Effective Prompts and Workflows for Code Review with goose

Code review is one of the most valuable and time-consuming activities in software development. Done well, it catches bugs, spreads knowledge, and maintains code quality. Done poorly, it becomes a bottleneck that slows delivery and frustrates teams. Enter goose, Block's open source AI agent that can transform how you approach code review.

This guide will show you how to build effective prompts and workflows that leverage goose to enhance your code review process while maintaining the human judgment that makes reviews valuable.

New to goose prompting? Check out Best Practices for Prompt Engineering with goose to master the fundamentals before diving into these code review-specific techniques.

Why Use goose for Code Review?

Before diving into prompts and workflows, let's understand what makes goose uniquely suited for code review:

Local Execution: goose runs on your machine, giving you control over your code and how it is shared. Unlike fully hosted tools, the only data that leaves your environment is what you send to the LLM provider you configure, and with a local model even that stays in-house.

Model Flexibility: You can configure goose to use any LLM—from GPT-4 to Claude to local models—allowing you to balance cost, performance, and privacy based on your needs.

Autonomous Action: goose doesn't just suggest changes; it can read files, run tests, check dependencies, and even make modifications autonomously when instructed. This means it can verify its own suggestions before presenting them.

Extensibility: Through the Model Context Protocol (MCP), goose integrates with your existing tools, including GitHub, GitLab, Jira, and Slack, allowing it to fit seamlessly into your current workflow.

Understanding goose's Code Review Capabilities

goose excels at several code review tasks:

  • Pattern Recognition: Identifying common anti-patterns, code smells, and violations of established conventions
  • Context Analysis: Understanding the broader codebase to suggest improvements that align with existing architecture
  • Test Verification: Running tests to confirm proposed changes don't break functionality
  • Documentation Gaps: Flagging missing or outdated documentation
  • Security Scanning: Identifying potential security vulnerabilities and suggesting fixes
  • Performance Analysis: Spotting inefficient code patterns that could impact performance

Code Review-Specific Prompting Patterns

While the general prompting guide covers foundational techniques, code review requires specialized patterns.

The Severity-Based Review Pattern

Structure your reviews by severity level to ensure critical issues get immediate attention:

Review [files] using these severity levels:

🔴 CRITICAL (block merge):
- Security vulnerabilities (injection, auth bypasses)
- Data loss or corruption risks
- Breaking changes to public APIs
- Unhandled error paths that could crash production

🟡 HIGH (must fix before merge):
- Performance issues (N+1 queries, memory leaks)
- Missing error handling for recoverable errors
- Test coverage below 80% for new code
- SOLID principle violations

🟢 MEDIUM (should address):
- Code readability issues
- Inconsistent naming or style
- Missing documentation
- Potential edge cases

⚪ LOW (nice to have):
- Minor refactoring opportunities
- Additional test coverage
- Style nitpicks

For each issue found:
1. Severity level with emoji
2. Specific line numbers
3. Explanation of impact
4. Concrete fix with code example
5. If CRITICAL or HIGH: implement the fix and verify with tests

Format as a GitHub/GitLab comment I can paste directly.

The Autonomous Fix-and-Verify Pattern

Let goose prove its suggestions work before you review them:

Review [file] for security issues and:

1. Identify each vulnerability
2. Assess the risk (critical/high/medium/low)
3. For CRITICAL and HIGH risks:
   a. Create a fix branch
   b. Implement the solution
   c. Run the test suite
   d. Run security scan tools if available
   e. ONLY present fixes that pass all checks
4. For MEDIUM/LOW:
   a. Suggest the fix with explanation
   b. Don't implement unless I ask

Present findings as:
- Issue: [description]
- Impact: [what could happen]
- Fix: [code that passed tests]
- Test Results: [proof it works]

Why this works: You get pre-validated solutions instead of untested suggestions, saving review cycles.

The Architectural Context Pattern

Help goose understand your codebase's design principles:

Review [new-feature-files] for architectural consistency:

OUR ARCHITECTURE:
1. Layered Architecture (strict):
   - Routes → Controllers → Services → Models
   - NO business logic in controllers
   - NO direct model access from controllers

2. Dependency Rules:
   - Services can depend on models and other services
   - Controllers depend only on services
   - Models have no dependencies on upper layers

3. Error Handling Standard:
   - Services throw domain-specific errors
   - Controllers catch and translate to HTTP responses
   - All errors logged with context

4. File Organization:
   - Group by feature, not by layer
   - Each feature is a subdirectory with all its layers

REVIEW CHECKLIST:
✓ Does this follow our layering rules?
✓ Are dependencies flowing in the right direction?
✓ Is business logic in the correct layer?
✓ Does error handling follow our pattern?
✓ Is the file organization consistent?

For violations:
- Quote the problematic code
- Explain which principle it violates
- Show how to fix it following our patterns
- Reference similar code in the codebase that does it correctly
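To make the layering rules concrete, here is a small hypothetical sketch (Express-style controller, Sequelize-style model; all names invented for illustration) of the kind of violation this prompt should flag, followed by the corrected shape:

// VIOLATION: the controller reaches into the model and owns a business rule
// (breaks "Routes → Controllers → Services → Models" and "NO business logic in controllers")
const User = require('../models/user');

async function updateEmail(req, res) {
  const user = await User.findByPk(req.params.id);       // direct model access from a controller
  if (!req.body.email.includes('@')) {                    // business rule living in the controller
    return res.status(400).json({ error: 'Invalid email' });
  }
  user.email = req.body.email;
  await user.save();
  res.json(user);
}

// FIX: the controller only translates between HTTP and the service layer
const userService = require('../services/user-service');

async function updateEmailFixed(req, res, next) {
  try {
    const user = await userService.updateEmail(req.params.id, req.body.email);
    res.json(user);
  } catch (err) {
    next(err); // the service throws domain errors; error middleware maps them to HTTP responses
  }
}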

The Comparative Review Pattern

When evaluating different approaches in a PR:

I'm reviewing two proposed implementations for [feature]:

APPROACH A (Current PR):
[Link or description]

APPROACH B (Alternative suggested):
[Link or description]

Compare them on these dimensions:

1. **Correctness** (do both work?)
   - Test both implementations
   - Check edge cases
   - Verify error handling

2. **Performance** (which is faster?)
   - Analyze time complexity
   - Consider space complexity
   - Note any potential bottlenecks

3. **Maintainability** (which is clearer?)
   - Rate code readability (1-10)
   - Assess how easy to modify
   - Check for code smells

4. **Testability** (which is easier to test?)
   - Mock complexity
   - Test coverage potential
   - Integration test needs

5. **Consistency** (which fits our codebase better?)
   - Pattern alignment
   - Style consistency
   - Dependency choices

6. **Future-proofing** (which handles growth better?)
   - Extensibility
   - Breaking change risk
   - Migration path if needed

Provide:
- Side-by-side comparison table
- Recommendation with confidence level (%)
- Reasoning for recommendation
- If close call, trade-offs to consider

Complete Code Review Workflows

Workflow 1: Pre-Commit Self-Review

Catch issues before creating a PR:
Script: scripts/goose-precommit.sh

#!/bin/bash

echo "🦢 Running goose pre-commit review..."

CHANGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)

if [ -z "$CHANGED_FILES" ]; then
    echo "No staged files to review"
    exit 0
fi

goose session start "precommit-$(date +%s)"

cat <<EOF | goose prompt
Pre-commit review of staged files:

$(echo "$CHANGED_FILES" | sed 's/^/- /')

QUICK CHECKS (auto-fix if possible):
1. Run linter and fix violations
2. Remove console.log/debugger statements
3. Remove commented-out code blocks
4. Check for TODO/FIXME comments (list them)
5. Verify no secrets or API keys

QUALITY CHECKS (report only):
1. New functions have tests?
2. Complex logic has comments?
3. Error handling present?
4. Type safety (TypeScript types/JSDoc)?

For auto-fixed issues:
- Show what you fixed
- Make the changes
- Confirm they're ready to commit

For quality issues:
- Create a checklist of concerns
- Let me decide whether to fix now or later

Safe to commit? YES/NO with reasoning.
EOF

Integration with Git

.git/hooks/pre-commit

#!/bin/bash
./scripts/goose-precommit.sh
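Git only runs hooks that are executable, so after creating the hook remember to run `chmod +x .git/hooks/pre-commit` (and make sure `scripts/goose-precommit.sh` is executable as well).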

Workflow 2: Automated PR Initial Review

Add this as a GitHub Action to review PRs automatically:
.github/workflows/goose-review.yml

name: goose PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # needed for checkout
      pull-requests: write  # needed to post the review comment
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changes
        run: |
          echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Install goose
        run: |
          curl -sSL https://block.github.io/goose/install.sh | sh
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Run goose Review
        id: goose-review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          cat > review-prompt.md <<'EOF'
          Automated PR Review for PR #${{ github.event.pull_request.number }}

          FILES CHANGED:
          ${{ steps.changes.outputs.files }}

          PR CONTEXT:
          - Title: ${{ github.event.pull_request.title }}
          - Author: ${{ github.event.pull_request.user.login }}
          - Base: ${{ github.base_ref }}
          - Branch: ${{ github.head_ref }}

          REVIEW FOCUS:

          1. **Quick Wins** (auto-fixable):
             - Linting issues
             - Formatting problems
             - Import organization
             - Type annotations

          2. **Risk Assessment**:
             - Breaking changes?
             - Database migrations?
             - New dependencies?
             - Performance impacts?

          3. **Quality Checks**:
             - Test coverage for new code
             - Error handling completeness
             - Security considerations
             - Documentation updates needed?

          4. **Architecture Review**:
             - Follows our patterns?
             - Proper separation of concerns?
             - Consistent with codebase?

          OUTPUT FORMAT:

          ## goose Automated Review

          ### 📊 Summary
          - **Risk Level**: [LOW/MEDIUM/HIGH] + explanation
          - **Test Coverage**: [X%] for new/modified code
          - **Estimated Review Time**: [X minutes] for human reviewer

          ### 🔴 Critical Issues (block merge)
          [List with file:line references]

          ### 🟡 High Priority (fix before merge)
          [List with file:line references]

          ### 🟢 Suggestions (consider)
          [List with file:line references]

          ### What Looks Good
          [Positive feedback - important for morale!]

          ### Recommended Next Steps
          [Actionable list for the author]

          ### Human Reviewer: Focus Here
          [Areas that need human judgment]

          ---
          *Automated review by Goose • Not a substitute for human review*
          EOF

          goose run --prompt-file review-prompt.md > review-output.md

      - name: Post Review Comment
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const fs = require('fs');
            const review = fs.readFileSync('review-output.md', 'utf8');

            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: review
            });

Workflow 3: Security-Focused Review

For PRs touching authentication, payments, or sensitive data:

SECURITY REVIEW: [files]

THREAT MODEL for our application:
- User data: emails, names, addresses (PII)
- Payment information: handled via Stripe (PCI-DSS scope)
- Authentication: JWT tokens, bcrypt passwords
- API keys: stored in env vars, never committed
- User uploads: images only, max 5MB

SECURITY CHECKLIST:

**Authentication & Authorization:**
- [ ] All endpoints check authentication?
- [ ] Authorization rules prevent privilege escalation?
- [ ] Session management secure (expiry, renewal, logout)?
- [ ] Password requirements enforced?
- [ ] Account lockout after failed attempts?

**Input Validation:**
- [ ] All user inputs validated against schema?
- [ ] SQL injection prevention (parameterized queries)?
- [ ] XSS prevention (output encoding)?
- [ ] File upload restrictions (type, size, content)?
- [ ] API rate limiting in place?

**Data Protection:**
- [ ] PII encrypted at rest?
- [ ] Passwords hashed with bcrypt (10+ rounds)?
- [ ] Sensitive data not logged?
- [ ] API keys/secrets not hardcoded?
- [ ] HTTPS enforced?

**Security Headers:**
- [ ] Content-Security-Policy set?
- [ ] X-Frame-Options prevents clickjacking?
- [ ] CSRF tokens on state-changing operations?
- [ ] Secure cookie flags (HttpOnly, Secure, SameSite)?

**Error Handling:**
- [ ] Error messages don't leak info (stack traces, db details)?
- [ ] Failed auth attempts logged?
- [ ] Graceful degradation secure?

For each issue found:

1. **Vulnerability**: [What's wrong]
2. **OWASP Category**: [e.g., A03:2021 – Injection]
3. **Attack Scenario**: [How could this be exploited]
4. **Risk Level**: [CRITICAL/HIGH/MEDIUM/LOW]
5. **Fix**: [Specific code changes]
6. **Verification**: [How to test the fix]

CRITICAL ISSUES:
- Implement the fix immediately
- Run security tests to verify
- Create a test case that would fail without the fix

After review:
- Security score: X/10
- Critical issues: [count]
- Ready for security team review? YES/NO
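As a concrete illustration of the "SQL injection prevention (parameterized queries)" item above, here is a minimal sketch using the node-postgres (pg) client; the table and column names are hypothetical:

const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the standard PG* environment variables

// VULNERABLE: user input is concatenated into the SQL string
async function findUserUnsafe(email) {
  return pool.query(`SELECT id, email FROM users WHERE email = '${email}'`);
}

// SAFER: a parameterized query sends the value separately from the SQL text
async function findUserSafe(email) {
  return pool.query('SELECT id, email FROM users WHERE email = $1', [email]);
}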

Workflow 4: Performance Review

For performance-sensitive code:

PERFORMANCE REVIEW: [files]

OUR PERFORMANCE BUDGETS:
- API response time: < 200ms (p95)
- Database queries: < 100ms per query
- Page load time: < 3s
- Memory per request: < 50MB
- Max concurrent requests: 1000+

REVIEW CHECKLIST:

**Database Optimization:**
- [ ] N+1 query problems?
- [ ] Missing indexes on queried columns?
- [ ] SELECT * used (fetch only needed columns)?
- [ ] Unnecessary JOINs?
- [ ] Query results cached appropriately?

**Algorithm Efficiency:**
- [ ] Time complexity acceptable? (O(n²) is usually bad)
- [ ] Space complexity reasonable?
- [ ] Nested loops with large datasets?
- [ ] Unnecessary array/object operations?

**Async/Concurrency:**
- [ ] Async operations parallelized where possible?
- [ ] No blocking operations in async code?
- [ ] Promise.all() used for concurrent operations?
- [ ] Resource pooling for expensive operations?

**Memory Management:**
- [ ] Memory leaks (event listeners, closures)?
- [ ] Large objects held in memory unnecessarily?
- [ ] Streams used for large files?
- [ ] Pagination for large result sets?

**Caching Strategy:**
- [ ] Repeated computations cached?
- [ ] HTTP caching headers set correctly?
- [ ] Redis/memory cache used appropriately?
- [ ] Cache invalidation strategy sound?

For each performance issue:

1. **Problem**: [What's slow]
2. **Current Performance**: [Measurements if available]
3. **Why It's Slow**: [Root cause]
4. **Optimization**: [Specific fix with code]
5. **Expected Impact**: [Performance improvement estimate]
6. **Trade-offs**: [Any downsides to the optimization]

BENCHMARK:
- Create a performance test for critical paths
- Run before/after benchmarks
- Show the improvement

Final assessment:
- Meets our performance budgets? YES/NO
- Benchmark results: [before → after]
- Recommended load testing scenarios
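Two of the checklist items above lend themselves to small before/after sketches: parallelizing independent async work, and replacing an O(n²) membership test. The helpers (db.getProfile and friends) are hypothetical stand-ins:

// "Async operations parallelized where possible?"
async function loadDashboardSlow(userId, db) {
  const profile = await db.getProfile(userId);   // runs first
  const orders  = await db.getOrders(userId);    // waits for profile
  const alerts  = await db.getAlerts(userId);    // waits for orders
  return { profile, orders, alerts };
}

async function loadDashboardFast(userId, db) {
  const [profile, orders, alerts] = await Promise.all([
    db.getProfile(userId),
    db.getOrders(userId),
    db.getAlerts(userId),                        // all three run concurrently
  ]);
  return { profile, orders, alerts };
}

// "Time complexity acceptable? (O(n²) is usually bad)"
function findInactiveSlow(allUsers, activeIds) {
  return allUsers.filter(u => !activeIds.includes(u.id)); // O(n × m)
}

function findInactiveFast(allUsers, activeIds) {
  const active = new Set(activeIds);                      // build once, O(m)
  return allUsers.filter(u => !active.has(u.id));         // O(1) lookups, O(n) total
}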

Workflow 5: Legacy Code Refactoring Review

When modernizing old code:

LEGACY REFACTORING REVIEW: [files]

MIGRATION CONTEXT:
From: [old pattern/version]
To: [new pattern/version]
Reason: [why we're refactoring]
Constraints:
- [ ] Backward compatibility required? [YES/NO]
- [ ] Gradual rollout needed? [YES/NO]
- [ ] Zero downtime deployment? [YES/NO]

REFACTORING CHECKLIST:

**Behavioral Equivalence:**
1. Load existing test suite
2. Run tests against OLD code: [results]
3. Run tests against NEW code: [results]
4. Any behavioral differences?
   - If YES: Are they intentional improvements?
   - Document each difference
5. Add regression tests for bug fixes

**Code Quality Improvements:**
- [ ] Reduced complexity (cyclomatic complexity)?
- [ ] Better naming and structure?
- [ ] Removed code duplication?
- [ ] Improved error handling?
- [ ] Better separation of concerns?

**Test Coverage:**
- Old coverage: [X%]
- New coverage: [Y%]
- Gaps in new implementation:
  - [List untested scenarios]
- Recommended additional tests:
  - [Specific test cases needed]

**Performance Impact:**
Run benchmarks:
- Old implementation: [metrics]
- New implementation: [metrics]
- Performance change: [+/- X%]
- If slower: is it worth the code quality improvement?

**Dependency Analysis:**
- New dependencies added:
  - [List with justification]
- Dependencies removed:
  - [List - why safe to remove]
- Dependency version changes:
  - [List with breaking changes noted]

**Deployment Strategy:**
Based on the risk, recommend:

- [ ] **Big Bang** (all at once)
  - Low risk, good test coverage
  - No API changes
  - Easy rollback

- [ ] **Feature Flag** (gradual rollout)
  - Medium risk
  - Can test in production with subset of users
  - Easy to revert

- [ ] **Parallel Run** (both versions)
  - High risk
  - Critical system
  - Compare outputs before switching

**Migration Checklist:**
- [ ] Database migrations (if any)
- [ ] Configuration changes needed
- [ ] Environment variable updates
- [ ] Dependent services that need updates
- [ ] Documentation updates
- [ ] Runbook for deployment

**Rollback Plan:**
If the refactoring causes issues:
1. [Step 1 to rollback]
2. [Step 2 to rollback]
3. [How to verify rollback successful]

FINAL ASSESSMENT:
- Risk Level: [LOW/MEDIUM/HIGH]
- Confidence Score: [0-100%] safe to deploy
- Recommended Approach: [Big Bang/Feature Flag/Parallel Run]
- Blockers: [Any issues that must be resolved first]
- Ready for staging? YES/NO
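For the "Behavioral Equivalence" step, one lightweight option is to run an identical test table against both implementations so any divergence fails as a named case. A minimal Jest sketch, assuming both modules export a compatible calculateTotal function (paths and business rules are invented for illustration):

// test/behavioral-equivalence.test.js
const legacyCalc = require('../src/legacy/price-calculator');      // hypothetical old module
const refactoredCalc = require('../src/pricing/price-calculator'); // hypothetical new module

describe.each([
  ['legacy implementation', legacyCalc],
  ['refactored implementation', refactoredCalc],
])('%s', (_label, calculateTotal) => {
  it('applies a 10% discount above 100', () => {
    expect(calculateTotal({ subtotal: 200, coupon: null })).toBe(180);
  });

  it('never returns a negative total', () => {
    expect(calculateTotal({ subtotal: 10, coupon: 'BIG50' })).toBeGreaterThanOrEqual(0);
  });
});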

Building a Team Review Culture with goose

Creating Team Standards

File: .goose/team-standards.md

# Team Code Review Standards

## Our Review Philosophy
- Reviews are for learning, not judging
- Everyone's code gets reviewed, including seniors
- Focus on code, not the coder
- Ask questions, don't demand changes
- Suggest alternatives, don't just criticize

## Severity Levels
- 🔴 CRITICAL: Security, data loss, breaking changes
- 🟡 HIGH: Bugs, performance issues, missing tests
- 🟢 MEDIUM: Maintainability, style, documentation
- ⚪ LOW: Minor improvements, nitpicks

## Response Time Expectations
- Critical PRs: < 2 hours
- Standard PRs: < 1 business day
- Large PRs (500+ lines): < 2 business days

## PR Size Guidelines
- Ideal: < 200 lines
- Maximum: < 500 lines
- If larger: break into multiple PRs

## Required Reviews
- 1 approval for standard changes
- 2 approvals for:
  - Security-sensitive code
  - Database migrations
  - API changes
  - Architecture changes

## Auto-Checks Before Human Review
- [ ] CI passes
- [ ] Tests cover new code (80%+)
- [ ] goose pre-review complete
- [ ] Self-review checklist done

## What to Look For
- Correctness: Does it work?
- Tests: Is it tested?
- Clarity: Can I understand it?
- Simplicity: Is this the simplest approach?
- Consistency: Does it fit our patterns?

Prompt for Using Team Standards

Review this PR according to our team standards in .goose/team-standards.md:

[PR details]

IMPORTANT: 
- Follow our severity levels exactly
- Keep our review philosophy in mind (constructive, not critical)
- Use our standard format
- Consider our response time when prioritizing issues

Focus your review on what matters most to our team:
1. Correctness and testing (most important)
2. Security and performance
3. Maintainability and clarity
4. Consistency with our patterns

Format output as a friendly, constructive review comment.

Measuring Review Effectiveness

Track these metrics to improve your review process:

Efficiency Metrics

Script: scripts/review-metrics.sh

#!/bin/bash

mkdir -p metrics  # make sure the output directory exists

# Time from PR creation to first review
gh pr list --state merged --json number,createdAt,reviews \
  --jq '.[] | {pr: .number, first_review: (.reviews[0].submittedAt // "none")}' \
  > metrics/first-review-times.json

# Review cycle time
gh pr list --state merged --json number,createdAt,mergedAt,reviews \
  --jq '.[] | {pr: .number, cycles: (.reviews | length), total_time: .mergedAt}' \
  > metrics/review-cycles.json

# Issues found in review vs production
# Track in a simple CSV
echo "date,pr_number,issues_in_review,issues_in_prod" >> metrics/defect-escape.csv

Quality Metrics

Create a review retrospective prompt:

REVIEW RETROSPECTIVE

Analyze our last 10 merged PRs:

QUESTIONS:
1. How many issues did goose catch vs. humans?
   - Break down by severity

2. Were there any defects that made it to production?
   - Should goose have caught them?
   - Update review prompts to catch similar issues

3. What patterns of issues appear repeatedly?
   - Suggest adding them to our automated checks

4. Review cycle analysis:
   - Average time to first review?
   - Average number of review rounds?
   - Where are bottlenecks?

5. goose effectiveness:
   - False positive rate (flags that weren't issues)
   - False negative rate (missed issues)
   - Time saved vs. manual review

RECOMMENDATIONS:
Based on this analysis:
- What should we add to our automated checks?
- What prompts should we refine?
- Where do we still need more human judgment?
- What's working well that we should do more of?

Save findings to docs/review-retrospectives/YYYY-MM.md

Advanced Techniques

Multi-Perspective Review

Get different angles on the same code:

Review [files] from multiple perspectives:

PERSPECTIVE 1 - SECURITY ANALYST:
You are a security expert looking for vulnerabilities.
[Full security checklist]

PERSPECTIVE 2 - PERFORMANCE ENGINEER:
You are a performance expert looking for inefficiencies.
[Full performance checklist]

PERSPECTIVE 3 - JUNIOR DEVELOPER:
You are a junior developer trying to understand this code.
What's confusing? What needs better comments?

PERSPECTIVE 4 - FUTURE MAINTAINER:
You will maintain this code in 2 years.
What will make your life difficult?

Synthesize findings across all perspectives and prioritize.

Contextual Learning from Past Reviews

LEARN FROM HISTORY:

Before reviewing [current-pr], analyze:
1. Similar PRs we've reviewed before
2. Issues we found in those reviews
3. Patterns of problems in this area of the codebase

THEN review [current-pr] with extra attention to:
- Historically problematic patterns
- Issues that have bitten us before
- Areas where we commonly find bugs

Reference specific past PRs if you find similar issues:
"This is similar to the issue we found in PR #123..."

Explaining Review Feedback

I'm about to post this review feedback to a developer:

[Your draft review comments]

Before I post:
1. Review my tone - is it constructive and helpful?
2. Are any comments unclear or ambiguous?
3. Could any feedback be misinterpreted as personal criticism?
4. Am I explaining WHY something is an issue, not just THAT it's an issue?
5. Did I include what's good, not just what's wrong?

Suggest improvements to make this review more effective and kind.

Troubleshooting Common Review Challenges

Challenge: Too Many False Positives

Symptom: goose flags things that aren't actually problems in your context.
Solution: Build context into your prompts:

Review [files] with these EXCEPTIONS:

FALSE POSITIVES TO IGNORE:
- console.log in /scripts/ (we use these for CLI tools)
- any/unknown types in /types/legacy/ (migrating gradually)
- TODO comments with ticket numbers (tracked in Jira)
- Magic numbers in /config/ (configuration values)
- process.env in /config/ (centralized config only)

CONTEXT ABOUT OUR CODEBASE:
- We use dependency injection (constructors take many params)
- We prefer composition over inheritance (not OOP)
- We use React's useCallback extensively (not unnecessary)
- Our API uses snake_case (not camelCase)

Only flag genuine issues, not patterns we've consciously chosen.

Challenge: goose Misses Domain-Specific Issues

Symptom: goose doesn't catch bugs that require domain knowledge.
Solution: Add domain context to reviews:

Review [financial-transaction-code] with this DOMAIN CONTEXT:

FINANCIAL RULES:
- All currency calculations must use Decimal type (not Number)
- Transactions must be idempotent (safe to retry)
- Amounts must always have 2 decimal places
- Currency codes must be ISO 4217 (USD, EUR, etc.)
- Negative amounts only allowed for refunds/chargebacks

BUSINESS RULES:
- Minimum transaction: $0.50
- Maximum transaction: $10,000
- Refunds allowed within 90 days
- Partial refunds must not exceed original amount

COMPLIANCE:
- PCI-DSS: never log full card numbers
- GDPR: include data retention policies
- SOX: maintain audit trail for all changes

Flag any violations of these rules with severity CRITICAL.
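To illustrate the first financial rule, here is a small sketch of why plain Number arithmetic is risky for currency and what an exact decimal type buys you. It assumes the decimal.js library purely as an example; any exact-decimal implementation serves the same purpose:

const Decimal = require('decimal.js');

// Binary floating point cannot represent most decimal fractions exactly:
console.log(0.1 + 0.2);   // 0.30000000000000004
console.log(1.1 * 3);     // 3.3000000000000003

// Decimal arithmetic stays exact; format to 2 places only at the boundary:
const lineItem = new Decimal('1.10');
const total = lineItem.times(3).plus('0.20');
console.log(total.toFixed(2));   // "3.50"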

Challenge: Inconsistent Review Quality

Symptom: Some reviews are thorough, others miss obvious issues.
Solution: Create standardized review templates:
File: .goose/prompts/standard-review.md

# Standard Code Review Template

## PR Information
- PR: #{{pr_number}}
- Files: {{files}}
- Author: {{author}}

## Pre-Flight Checks
- [ ] CI passing?
- [ ] Tests included?
- [ ] PR description clear?

## Review Stages

### Stage 1: Quick Scan (2 min)
- Breaking changes?
- Security red flags?
- Missing tests?

STOP: If critical issues found, report immediately.

### Stage 2: Detailed Review (main)
- [Full review checklist from .goose/team-standards.md]

### Stage 3: Suggestions (optional)
- Refactoring opportunities
- Additional tests
- Documentation improvements

## Output Format
[Standard format from team standards]

---
Use this template: `goose review --template .goose/prompts/standard-review.md`

Integration with Your Dev Tools

VS Code Integration

.vscode/tasks.json

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Goose: Review Current File",
      "type": "shell",
      "command": "goose",
      "args": [
        "prompt",
        "Review ${file} for issues. Use our standards in .goose/team-standards.md"
      ],
      "presentation": {
        "reveal": "always",
        "panel": "new"
      }
    },
    {
      "label": "Goose: Review Git Changes",
      "type": "shell",
      "command": "bash",
      "args": [
        "-c",
        "goose prompt \"Review my uncommitted changes: $(git diff --name-only)\""
      ]
    }
  ]
}

Git Aliases

# Add to ~/.gitconfig
[alias]
    review = "!goose prompt 'Review my staged changes before committing'"
    review-pr = "!f() { goose prompt \"Review PR #$1 for merge readiness\"; }; f"
    review-security = "!goose prompt --template .goose/prompts/security-review.md"

Real-World Example: Complete Review Session

Let's walk through reviewing a real feature PR:
Scenario: A developer submits a PR adding user profile editing.

Stage 1: Initial Triage

goose session start "review-pr-1234"
Then send the triage prompt:
Quick triage of PR #1234: "Add user profile editing"

FILES CHANGED:
- src/controllers/user-controller.js (modified, +89 lines)
- src/services/user-service.js (modified, +134 lines)
- src/models/user.js (modified, +45 lines)
- tests/user-service.test.js (new, +156 lines)
- docs/api.md (modified, +23 lines)

QUICK CHECKS (30 seconds each):
1. Does CI pass? ✓/✗
2. Are tests included? ✓/✗
3. Is test coverage adequate (80%+)? ✓/✗
4. Any obvious security issues? ✓/✗
5. Documentation updated? ✓/✗

If any ✗, flag as NEEDS WORK with specifics.
If all ✓, proceed to detailed review.

goose Output:

TRIAGE RESULTS:

✅ CI Status: Passing (all 234 tests)
✅ Tests Included: Yes (156 new test cases)
✅ Test Coverage: 87% (exceeds requirement)
⚠️  Security: Potential issue found
✅ Documentation: Updated

🔴 BLOCKER FOUND:
File: src/services/user-service.js:78
Issue: User input not validated before database update
Impact: SQL injection risk
Severity: CRITICAL

RECOMMENDATION: Request changes. Security issue must be fixed before detailed review.

Stage 2: Developer Fixes Issue

Developer pushes fix. Resume review:

Security issue fixed in latest commit. 

Continue with detailed review of PR #1234:

ARCHITECTURAL REVIEW:
- Follows our Controller→Service→Model pattern?
- Business logic properly in service layer?
- No direct DB access from controller?
- Error handling consistent with our patterns?

CODE QUALITY:
- Naming clear and consistent?
- Functions focused (single responsibility)?
- Comments explain "why", not "what"?
- No code duplication?

EDGE CASES:
- What happens with:
  - Empty string inputs?
  - Very long inputs (SQL varchar limits)?
  - Special characters in names?
  - Concurrent updates to same profile?
  - Non-existent user IDs?

Run tests for each edge case. If not covered, suggest additional tests.

PERFORMANCE:
- Any N+1 query issues?
- Unnecessary data fetching?
- Missing indexes on queried columns?

Format as a structured review comment.

goose Output:

## Code Review: PR #1234

### ✅ Architecture (9/10)

**Strengths:**
- Perfect adherence to Controller→Service→Model layering
- Business logic correctly isolated in service layer
- Consistent with existing patterns

**Minor suggestion:**
- Line 134: Consider extracting validation logic to a separate validator function for reusability

### ⚠️  Edge Case Handling (6/10 - NEEDS IMPROVEMENT)

**Issues Found:**

1. **MEDIUM**: Empty string handling (user-service.js:89)
   // Current code allows empty strings
   if (profileData.bio) { ... }

   // Should validate:
   if (profileData.bio && profileData.bio.trim().length > 0) { ... }

2. **MEDIUM**: Missing length validation (user-service.js:92)
   - Database column: VARCHAR(500) for bio
   - No validation before insert
   - Could cause database errors
   - **Fix**: Add validation: `if (bio.length > 500) throw new ValidationError(...)`

3. **HIGH**: Concurrent update handling (user-service.js:78-95)
   - No optimistic locking
   - Last write wins (could lose data)
   - **Tested scenario**: Two users update same profile simultaneously
   - **Result**: Second update overwrites first without warning
   - **Fix**: Add version field or updated_at check

**Suggested Tests to Add:**
describe('edge cases', () => {
  it('should reject empty bio after trim', async () => {
    await expect(
      userService.updateProfile(userId, { bio: '   ' })
    ).rejects.toThrow(ValidationError);
  });

  it('should reject bio exceeding 500 chars', async () => {
    const longBio = 'a'.repeat(501);
    await expect(
      userService.updateProfile(userId, { bio: longBio })
    ).rejects.toThrow(ValidationError);
  });

  it('should handle concurrent updates gracefully', async () => {
    // Test implementation
  });
});


### ✅ Performance (9/10)

**Strengths:**
- No N+1 queries detected
- Efficient single query for profile update
- Appropriate use of indexes

**Optimization opportunity:**
- Line 112: Fetches entire user object when only ID needed
- **Current**: `SELECT * FROM users WHERE id = ?`
- **Optimized**: `SELECT id FROM users WHERE id = ?`
- **Impact**: Minor, but good practice

### 🎯 Overall Assessment

**Ready for merge**: NO (after addressing MEDIUM/HIGH issues)

**Estimated fix time**: 1-2 hours

**Priority fixes:**
1. ✅ CRITICAL: SQL injection (FIXED in latest commit)
2. 🟡 HIGH: Concurrent update handling
3. 🟡 MEDIUM: Input validation (empty strings, length)
4. 🟢 LOW: Query optimization

**What's Good** (don't forget the positives!):
- Excellent test coverage (87%)
- Clean, readable code
- Good documentation
- Follows team patterns

**Recommended Next Steps for Author:**
1. Add optimistic locking or version checking
2. Implement input validation with helpful error messages
3. Add the suggested edge case tests
4. Consider the query optimization

**For Human Reviewers:**
Focus your time on:
- Validating the concurrent update solution
- Reviewing the business logic for profile restrictions
- Checking if the API changes are backward compatible

Stage 3: Interactive Review Session

After initial review, have a conversation with goose:

Developer Question: "For the concurrent update issue, what's better:
a dedicated version field or an updated_at check?"

Goose Response Analysis:
Compare both approaches for our use case:
- Our database: PostgreSQL
- Update frequency: Low (users rarely edit profiles simultaneously)
- Consistency requirements: Medium (okay if occasional conflict)
- Existing patterns: Check what we use elsewhere in the codebase

Recommend the simpler approach that fits our patterns.

goose Output:

CONCURRENT UPDATE SOLUTIONS COMPARISON:

**Option 1: Version Field (Optimistic Locking)**
ALTER TABLE users ADD COLUMN version INTEGER DEFAULT 0;

// Update with an optimistic version check
// (Sequelize-style: Model.update resolves to an array whose first element is the affected row count)
const [updatedRows] = await User.update(
  { ...updates, version: currentVersion + 1 },
  { where: { id: userId, version: currentVersion } }
);

if (updatedRows === 0) {
  throw new ConflictError('Profile was updated by another user');
}

**Pros:**
- Explicit versioning
- Easy to reason about
- Common pattern

**Cons:**
- Requires schema change
- Another field to maintain

**Option 2: Updated_at Timestamp Check**
// We already have an updated_at column
const { updated_at: currentUpdatedAt } = await User.findByPk(userId, {
  attributes: ['updated_at']
});

const [updatedRows] = await User.update(
  updates,
  {
    where: {
      id: userId,
      updated_at: currentUpdatedAt
    }
  }
);

if (updatedRows === 0) {
  throw new ConflictError('Profile was modified. Please refresh and try again.');
}

**Pros:**
- No schema change needed
- We already track updated_at
- Simpler migration

**Cons:**
- Timestamp precision might cause issues
- Less explicit than version field

**What we use elsewhere:**
Checked codebase... We use updated_at checks in:
- src/services/post-service.js
- src/services/settings-service.js

**RECOMMENDATION: Option 2 (updated_at check)**

Confidence: 85%

Reasoning:
1. Consistent with existing patterns
2. No migration needed
3. Profile updates are low-frequency
4. Simpler implementation
5. Updated_at is already indexed

**Implementation:**
I can implement this now if you'd like. Should I proceed?

Creating Your Review Prompt Library

Build a collection of reusable prompts:

Directory Structure:

.goose/
├── prompts/
│   ├── quick-review.md
│   ├── security-review.md
│   ├── performance-review.md
│   ├── refactoring-review.md
│   ├── api-review.md
│   └── test-review.md
├── team-standards.md
└── config.yaml

Example: .goose/prompts/api-review.md

# API Endpoint Review

Review {{files}} as a new/modified API endpoint:

## API Standards Check

**RESTful Conventions:**
- [ ] Correct HTTP method (GET/POST/PUT/DELETE/PATCH)
- [ ] Proper status codes (200, 201, 400, 404, 500, etc.)
- [ ] Resource naming (plural nouns, kebab-case)
- [ ] Proper use of path params vs query params

**Request Validation:**
- [ ] Input schema validation (Joi/Yup/Zod)
- [ ] Required fields enforced
- [ ] Data types validated
- [ ] Size limits on strings/arrays
- [ ] Sanitization of user input

**Response Format:**
- [ ] Consistent response structure
- [ ] Proper error format: `{ error: "message", code: "ERROR_CODE" }`
- [ ] Success format: `{ data: {...}, meta: {...} }`
- [ ] Pagination for lists: `{ data: [], pagination: { page, limit, total } }`

**Security:**
- [ ] Authentication required (unless public)
- [ ] Authorization checks (user can access this resource)
- [ ] Rate limiting configured
- [ ] CORS properly configured
- [ ] No sensitive data in logs

**Documentation:**
- [ ] OpenAPI/Swagger documentation
- [ ] Example requests/responses
- [ ] Error codes documented
- [ ] Authentication requirements clear

**Testing:**
- [ ] Success cases tested
- [ ] Error cases tested (400, 401, 403, 404)
- [ ] Edge cases tested
- [ ] Integration tests for external dependencies

**Backward Compatibility:**
- [ ] Breaking changes to existing endpoints?
- [ ] Version bump needed?
- [ ] Migration path for clients?
- [ ] Deprecation warnings if needed

## Review Output

For each issue found:
- **Category**: [Request/Response/Security/Documentation/Testing]
- **Severity**: [CRITICAL/HIGH/MEDIUM/LOW]
- **Issue**: [Description]
- **Location**: [File:Line]
- **Fix**: [Specific suggestion]

## API Compatibility Analysis

Compare with existing API patterns:
1. Does response format match our other endpoints?
2. Is error handling consistent?
3. Does it follow our authentication pattern?
4. Is it documented like our other endpoints?

## Performance Considerations

- Potential for high traffic?
- Database query efficiency?
- Need for caching?
- Rate limiting appropriate?

## Final Checklist

- [ ] Follows REST conventions
- [ ] Input validated
- [ ] Properly secured
- [ ] Well documented
- [ ] Thoroughly tested
- [ ] Backward compatible
- [ ] Performance acceptable

**Ready for production**: YES/NO
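As a quick illustration of the response-format items in this checklist, here is a hedged Express sketch of a single endpoint that follows the { error, code } and { data, meta } conventions; findOrder is an assumed data-access helper:

const express = require('express');
const app = express();
app.use(express.json());

app.get('/api/v1/orders/:id', async (req, res, next) => {
  try {
    const order = await findOrder(req.params.id);   // assumed helper returning null when absent
    if (!order) {
      // Error shape from the checklist, with an appropriate status code
      return res.status(404).json({ error: 'Order not found', code: 'ORDER_NOT_FOUND' });
    }
    // Success shape from the checklist
    res.status(200).json({ data: order, meta: { requestedAt: new Date().toISOString() } });
  } catch (err) {
    next(err); // let the shared error handler produce the { error, code } body
  }
});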

Continuous Improvement

Weekly Review Retrospective

Run this every Friday:

#!/bin/bash
# scripts/weekly-review-retrospective.sh

WEEK_START=$(date -d "last monday" +%Y-%m-%d)
WEEK_END=$(date +%Y-%m-%d)

goose prompt <<EOF
Weekly Code Review Retrospective: $WEEK_START to $WEEK_END

Analyze all PRs merged this week:

$(gh pr list --state merged --search "merged:>$WEEK_START" --json number,title,reviews)

METRICS TO CALCULATE:
1. Total PRs reviewed: [count]
2. Average time to first review: [hours]
3. Average number of review rounds: [count]
4. Common issues found:
   - By category (security, performance, style, etc.)
   - By severity
5. goose effectiveness:
   - Issues caught by goose vs humans
   - False positives from goose
   - Time saved estimates

TRENDS:
- Are we getting faster at reviews?
- Are we finding fewer issues (improving code quality)?
- Are same issues appearing repeatedly (need automation)?

IMPROVEMENTS FOR NEXT WEEK:
- New automated checks to add
- Prompt templates to create/update
- Team standards to clarify
- Training needs identified

OUTPUT:
Save to docs/retrospectives/$(date +%Y-week-%U).md
EOF

Learning from Production Issues

When bugs escape to production:

POST-INCIDENT REVIEW:

PRODUCTION ISSUE: {{issue_description}}
Discovered: {{date}}
Severity: {{impact}}

CODE REVIEW FORENSICS:
1. Was this code reviewed? PR #{{number}}
2. Pull up the review comments
3. Should this issue have been caught?

ANALYSIS:
- What type of issue was it?
  - Logic error
  - Edge case
  - Performance issue
  - Security vulnerability

- Why wasn't it caught in review?
  - Not tested?
  - Test didn't cover this scenario?
  - Reviewer missed it?
  - Goose didn't flag it?

- What could prevent this in the future?
  - Add to automated checks
  - Update review checklist
  - Improve test coverage requirements
  - Add specific prompt for this pattern

ACTION ITEMS:
1. Update review prompts to catch similar issues
2. Add to automated test scenarios
3. Document in team-standards.md
4. Share learnings in team meeting

Save learnings to docs/incident-learnings/{{incident-id}}.md

Best Practices Summary

Do's ✅

  1. Provide Context: Always include relevant background, constraints, and standards
  2. Use Severity Levels: Prioritize issues consistently (Critical/High/Medium/Low)
  3. Verify Fixes: Have goose implement and test critical fixes before presenting
  4. Be Specific: Reference exact files, lines, and code patterns
  5. Build Incrementally: Start with quick checks, then deep dive
  6. Reference Standards: Point to your team's documented conventions
  7. Track Metrics: Measure effectiveness and iterate on prompts
  8. Stay Positive: Include what's good, not just what's wrong
  9. Learn Continuously: Update prompts based on what works
  10. Human-in-Loop: Use goose to augment, not replace, human judgment

Don'ts ❌

  1. Don't Be Vague: "Review this code" → Specify what to look for
  2. Don't Assume Context: goose doesn't know your codebase intimately
  3. Don't Skip Verification: Always validate AI suggestions before applying
  4. Don't Ignore False Positives: Update prompts to reduce noise
  5. Don't Over-Automate: Some things need human judgment (architecture, UX)
  6. Don't Forget Security: Always include security in review criteria
  7. Don't Review Giant PRs: Break them down first, or review incrementally
  8. Don't Neglect Documentation: Include docs in review scope
  9. Don't Deploy Blindly: Review AI-generated fixes before committing
  10. Don't Forget to Iterate: Improve your prompts based on results

Conclusion

Effective code review with goose isn't about replacing human reviewers; it's about making them more effective. By crafting thoughtful prompts and building structured workflows, you can:

  • Catch more issues earlier, before human reviewers spend time
  • Focus human attention on architecture, design, and complex logic
  • Maintain consistency across all reviews, regardless of reviewer
  • Speed up feedback loops with automated initial reviews
  • Improve code quality through systematic, comprehensive checks
  • Learn continuously by tracking what works and refining your approach

The key is treating goose as a junior reviewer that you're training. Start with the basics from the general prompting guide, then build specialized review workflows that fit your team's unique needs and standards.

Your review process will improve over time as you:

  • Refine prompts based on what issues they catch
  • Add domain-specific checks for your application
  • Build a library of reusable templates
  • Track metrics and iterate on effectiveness

Start with one workflow, measure its impact, and expand from there. Code review with goose is a journey of continuous improvement—just like the code itself.



Building effective code review workflows with goose? Share your prompts and learnings with the community! We'd love to hear what's working for your team.
