shreyas shinde · Originally published at kanaeru.ai

Testing with Real Services: A Pragmatic Guide to Integration Testing Without Mocks

Listen up, team. I'm Integra, and I'm here to tell you something that might ruffle some feathers: your mock-heavy test suite is giving you a false sense of security. Sure, mocks are fast, predictable, and easy to set up. But they're also lying to you about how your system actually behaves in production.

After years of watching "well-tested" applications crumble in production because their integration points were validated against fantasyland mocks, I've become a staunch advocate for real service testing. Not because I'm a purist, but because I'm pragmatic. I want tests that actually catch the bugs that matter.

In this guide, I'll walk you through the systematic approach to integration testing with real services—the kind that actually tells you if your database queries work, if your API calls succeed, and if your message queues deliver messages. We'll cover environment setup, credential management, cleanup strategies, and how to achieve that sweet spot of 90-95% coverage without burning down your CI/CD pipeline.

Why Real Services Beat Mocks (Most of the Time)

Let's address the elephant in the room first. The testing pyramid, introduced by Mike Cohn in 2009, has guided generations of developers toward a foundation of unit tests with fewer integration tests on top. And that's still sound advice. But here's where teams go wrong: they replace all integration testing with mocked dependencies, thinking they're being efficient.

The Problem with Mock-First Testing

When you mock your database, you're testing your mock, not your database. When you mock your HTTP client, you're validating that you called fetch() correctly, not that the remote API actually returns the data your code expects.

Here's what mocks can't catch:

  • Schema mismatches: Your mock returns user.firstName, but the API actually sends user.first_name (sketched below)
  • Network failures: Timeouts, connection resets, DNS failures—all invisible in mock-land
  • Database constraints: Your mock happily accepts duplicate emails, but PostgreSQL throws a unique constraint violation
  • Authentication flows: OAuth tokens expire, refresh tokens fail, API keys get rate-limited
  • Serialization issues: That JavaScript Date object doesn't serialize the way you think it does
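
To make the first bullet concrete, here's a hedged sketch (names and response shapes are hypothetical) of a test that passes against the mock and breaks against the real API:

// A mock that encodes our assumption about the response shape
const mockUserApi = {
  getUser: async (_id: string) => ({ firstName: 'Ada' }),
};

async function greet(api: typeof mockUserApi, id: string): Promise<string> {
  const user = await api.getUser(id);
  return `Hello, ${user.firstName}`;
}

// Green against the mock. If the real endpoint sends { first_name: 'Ada' },
// the same code produces "Hello, undefined" in production, and no mock-based
// test will ever notice.
it('greets the user', async () => {
  expect(await greet(mockUserApi, 'user-1')).toBe('Hello, Ada');
});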

As Philipp Hauer eloquently put it in his 2019 article: "Integration tests test all classes and layers together in the same way as in production. This makes bugs in the integration of classes much more likely to be detected and tests are more meaningful".

When Mocks ARE Appropriate

I'm not a zealot. There are legitimate scenarios for mocks even in integration testing:

  1. Testing failure scenarios: Network simulators like Toxiproxy can inject latency and failures in controlled ways
  2. Third-party services you don't control: If you're integrating with Stripe's production API, you probably want their test mode, not real charges
  3. Slow or expensive operations: If your ML model takes 5 minutes to train, mock the inference in most tests
  4. Isolating specific components: Testing service A's behavior when service B fails? Mock B's responses

The key principle: mock at the boundaries, test the integration.
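
Here's a hedged sketch of that principle (the OrderService wiring, pool, and user fixtures are assumed from context; nock stubs the HTTP boundary): exercise your own code against the real database while mocking only the third-party service you don't control.

// Sketch: real Postgres underneath, mock only at the external boundary
import nock from 'nock';

it('marks the order pending_review when the fraud service is down', async () => {
  // Mock at the boundary: the third-party fraud API (service B)
  nock('https://fraud.example.com').post('/check').reply(503);

  // Test the integration: OrderService (service A) writes through the real pool
  const order = await orderService.placeOrder({ userId: user.id, total: 1000 });

  const { rows } = await pool.query(
    'SELECT status FROM orders WHERE id = $1',
    [order.id]
  );
  expect(rows[0].status).toBe('pending_review');
});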

Setting Up Test Environments That Don't Lie

A test environment that mirrors production is non-negotiable for real service testing. But "mirror production" doesn't mean "duplicate your entire AWS infrastructure." It means having the same types of services with the same interfaces.

The Container Revolution

Thanks to Docker and Testcontainers, we can spin up real databases, message queues, and even complex services in seconds. Here's what a modern test environment looks like:

// testSetup.ts - Environment bootstrapping
import { readFile } from 'fs/promises';
import { GenericContainer, StartedTestContainer } from 'testcontainers';
import { Pool } from 'pg';
import Redis from 'ioredis';

export class TestEnvironment {
  private postgresContainer: StartedTestContainer;
  private redisContainer: StartedTestContainer;
  private dbPool: Pool;
  private redisClient: Redis;

  async setup(): Promise<void> {
    // Start PostgreSQL with exact production version
    this.postgresContainer = await new GenericContainer('postgres:15-alpine')
      .withEnvironment({
        POSTGRES_USER: 'testuser',
        POSTGRES_PASSWORD: 'testpass',
        POSTGRES_DB: 'testdb',
      })
      .withExposedPorts(5432)
      .start();

    // Start Redis with production configuration
    this.redisContainer = await new GenericContainer('redis:7-alpine')
      .withExposedPorts(6379)
      .start();

    // Initialize real clients
    const pgPort = this.postgresContainer.getMappedPort(5432);
    this.dbPool = new Pool({
      host: 'localhost',
      port: pgPort,
      user: 'testuser',
      password: 'testpass',
      database: 'testdb',
    });

    const redisPort = this.redisContainer.getMappedPort(6379);
    this.redisClient = new Redis({ host: 'localhost', port: redisPort });

    // Run migrations on real database
    await this.runMigrations();
  }

  async cleanup(): Promise<void> {
    await this.dbPool.end();
    await this.redisClient.quit();
    await this.postgresContainer.stop();
    await this.redisContainer.stop();
  }

  getDbPool(): Pool {
    return this.dbPool;
  }

  getRedisClient(): Redis {
    return this.redisClient;
  }

  private async runMigrations(): Promise<void> {
    // Run your actual migration scripts
    // This ensures test DB schema matches production
    const migrationSQL = await readFile('./migrations/001_initial.sql', 'utf-8');
    await this.dbPool.query(migrationSQL);
  }
}


Key insight: Notice we're using the exact same PostgreSQL version as production. Version mismatches are a common source of "works on my machine" bugs.

Environment Configuration Strategy

Your test environment needs different configurations than production, but the same structure. Here's the pattern I recommend:

// config/test.ts
export const testConfig = {
  database: {
    // Provided by Testcontainers at runtime
    host: process.env.TEST_DB_HOST || 'localhost',
    port: parseInt(process.env.TEST_DB_PORT || '5432'),
    // Safe credentials for testing
    user: 'testuser',
    password: 'testpass',
  },

  externalAPIs: {
    // Use sandbox/test modes of real services
    stripe: {
      apiKey: process.env.STRIPE_TEST_KEY, // sk_test_...
      webhookSecret: process.env.STRIPE_TEST_WEBHOOK_SECRET,
    },
    sendgrid: {
      apiKey: process.env.SENDGRID_TEST_KEY,
      // Use SendGrid's sandbox mode
      sandboxMode: true,
    },
  },

  // Feature flags for test scenarios
  features: {
    enableRateLimiting: true, // Test rate limits!
    enableCaching: true, // Test cache invalidation!
    enableRetries: true, // Test retry logic!
  },
};


Managing API Credentials: The Right Way

Here's where many teams stumble: they hardcode test API keys in their codebase or, worse, use production keys in tests. Both are security nightmares.

The Secret Management Hierarchy

  1. Local Development: Use .env.test files (gitignored!) with test credentials
  2. CI/CD Pipelines: Store secrets in your CI provider's vault (GitHub Secrets, GitLab CI/CD variables, etc.)
  3. Shared Test Environments: Use dedicated secret managers (AWS Secrets Manager, HashiCorp Vault)

Here's a robust credential loading pattern:

// lib/testCredentials.ts
import { config } from 'dotenv';

export class TestCredentialManager {
  private credentials: Map<string, string> = new Map();

  constructor() {
    // Load from .env.test if present (local dev)
    config({ path: '.env.test' });

    // Override with CI environment variables if present
    this.loadFromEnvironment();

    // Validate required credentials
    this.validate();
  }

  private loadFromEnvironment(): void {
    const requiredCreds = [
      'STRIPE_TEST_KEY',
      'SENDGRID_TEST_KEY',
      'AWS_TEST_ACCESS_KEY',
      'AWS_TEST_SECRET_KEY',
    ];

    requiredCreds.forEach((key) => {
      const value = process.env[key];
      if (value) {
        this.credentials.set(key, value);
      }
    });
  }

  private validate(): void {
    const missing: string[] = [];

    // Check for essential credentials
    if (!this.credentials.has('STRIPE_TEST_KEY')) {
      missing.push('STRIPE_TEST_KEY');
    }

    if (missing.length > 0) {
      console.warn(
        `⚠️ Missing test credentials: ${missing.join(', ')}\n` +
        `Some integration tests will be skipped.\n` +
        `See README.md for credential setup instructions.`
      );
    }
  }

  get(key: string): string | undefined {
    return this.credentials.get(key);
  }

  has(key: string): boolean {
    return this.credentials.has(key);
  }

  // Fail gracefully when credentials are missing.
  // Accepts async callbacks and returns their promise so callers can await it.
  async requireOrSkip(key: string, testFn: () => void | Promise<void>): Promise<void> {
    if (!this.has(key)) {
      console.log(`⏭️ Skipping test - missing ${key}`);
      return;
    }
    await testFn();
  }
}

// Usage in tests
const credManager = new TestCredentialManager();

describe('Stripe Payment Integration', () => {
  it('should process payment with real Stripe API', async () => {
    await credManager.requireOrSkip('STRIPE_TEST_KEY', async () => {
      const stripe = new Stripe(credManager.get('STRIPE_TEST_KEY')!);

      const paymentIntent = await stripe.paymentIntents.create({
        amount: 1000,
        currency: 'usd',
        payment_method_types: ['card'],
      });

      expect(paymentIntent.status).toBe('requires_payment_method');
    });
  });
});


Critical principle: Tests should gracefully degrade when credentials are missing, not crash the entire suite. This lets developers run partial test suites locally while CI runs the full battery.
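
If you'd rather have those skips show up in Jest's own reporting, here's an alternative sketch: decide between it and it.skip at registration time, so missing-credential tests are reported as skipped instead of silently passing.

// Sketch: conditionally register tests based on available credentials
const itIfCreds = (key: string) => (credManager.has(key) ? it : it.skip);

itIfCreds('STRIPE_TEST_KEY')('should process payment with real Stripe API', async () => {
  // ... same body as the Stripe test above
});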

CI/CD Integration Pattern

In your GitHub Actions workflow:

# .github/workflows/test.yml
name: Integration Tests

on: [push, pull_request]

jobs:
  integration-tests:
    runs-on: ubuntu-latest

    env:
      # Inject secrets from GitHub Secrets
      STRIPE_TEST_KEY: ${{ secrets.STRIPE_TEST_KEY }}
      SENDGRID_TEST_KEY: ${{ secrets.SENDGRID_TEST_KEY }}
      AWS_TEST_ACCESS_KEY: ${{ secrets.AWS_TEST_ACCESS_KEY }}
      AWS_TEST_SECRET_KEY: ${{ secrets.AWS_TEST_SECRET_KEY }}

    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Run integration tests
        run: npm run test:integration

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/integration-coverage.json


Cleanup Strategies: The Idempotency Imperative

Here's a truth bomb: if your tests aren't idempotent, they're not reliable. Idempotent tests produce the same results every time they run, regardless of previous executions.

The biggest threat to idempotency? Dirty state. Test A creates a user with email test@example.com; test B assumes that email is available. Test B fails. You debug for an hour before realizing test A didn't clean up.

The Setup-Before Pattern (Recommended)

Contrary to intuition, cleaning up before tests is more reliable than cleaning up after:

// tests/integration/userService.test.ts
describe('UserService Integration', () => {
  let testEnv: TestEnvironment;
  let userService: UserService;

  beforeAll(async () => {
    testEnv = new TestEnvironment();
    await testEnv.setup();
  });

  afterAll(async () => {
    await testEnv.cleanup();
  });

  beforeEach(async () => {
    // CLEAN BEFORE, not after
    // This ensures tests start from known state
    await cleanDatabase(testEnv.getDbPool());

    userService = new UserService(testEnv.getDbPool());
  });

  it('should create user with unique email', async () => {
    const user = await userService.createUser({
      email: 'test@example.com',
      name: 'Test User',
    });

    expect(user.id).toBeDefined();
    expect(user.email).toBe('test@example.com');
  });

  it('should reject duplicate email', async () => {
    await userService.createUser({
      email: 'duplicate@example.com',
      name: 'User One',
    });

    await expect(
      userService.createUser({
        email: 'duplicate@example.com',
        name: 'User Two',
      })
    ).rejects.toThrow('Email already exists');
  });
});

async function cleanDatabase(pool: Pool): Promise<void> {
  // Truncate tables in correct order (respecting foreign keys)
  await pool.query('TRUNCATE users, orders, payments CASCADE');
}


Why cleanup before? If a test crashes mid-execution, the after-cleanup never runs. The database stays dirty. The next test run fails mysteriously. With before-cleanup, every test starts from a known state.
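
If maintaining that hand-written table list bothers you, here's a hedged sketch that derives it from PostgreSQL's catalog at runtime (the schema_migrations exclusion is an assumption about your migration tooling):

import { Pool } from 'pg';

// Sketch: truncate every table in the public schema except migration bookkeeping
async function cleanDatabaseDynamic(pool: Pool): Promise<void> {
  const { rows } = await pool.query(
    `SELECT tablename FROM pg_tables
     WHERE schemaname = 'public' AND tablename <> 'schema_migrations'`
  );
  if (rows.length === 0) return;

  const tables = rows.map((r) => `"${r.tablename}"`).join(', ');
  // RESTART IDENTITY resets sequences so generated IDs are predictable per run
  await pool.query(`TRUNCATE ${tables} RESTART IDENTITY CASCADE`);
}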

The Try-Finally Pattern for External Services

For external APIs and services you can't easily reset, use try-finally blocks:

it('should send email via SendGrid', async () => {
  const testEmailId = `test-${Date.now()}@example.com`;
  // Declared outside try so the finally block can reference it
  const sendgrid = new SendGridClient(testConfig.sendgridApiKey);
  let emailSent = false;

  try {
    // Act
    await sendgrid.send({
      to: testEmailId,
      from: 'noreply@example.com',
      subject: 'Test Email',
      text: 'This is a test',
    });
    emailSent = true;

    // Assert
    const emails = await sendgrid.searchEmails({
      to: testEmailId,
      limit: 1,
    });
    expect(emails).toHaveLength(1);

  } finally {
    // Cleanup - even if the test fails
    if (emailSent) {
      await sendgrid.deleteEmail(testEmailId);
    }
  }
});


Handling Parallel Test Execution

Modern test runners execute tests in parallel for speed. This is great until test A deletes the user test B is querying. The solution? Data isolation:

// testDataFactory.ts
export class TestDataFactory {
  private static counter = 0;

  static uniqueEmail(): string {
    return `test-${process.pid}-${TestDataFactory.counter++}@example.com`;
  }

  static uniqueUserId(): string {
    return `user-${process.pid}-${TestDataFactory.counter++}`;
  }

  static async createIsolatedUser(pool: Pool): Promise<User> {
    const email = TestDataFactory.uniqueEmail();
    const result = await pool.query(
      'INSERT INTO users (email, name) VALUES ($1, $2) RETURNING *',
      [email, `Test User ${TestDataFactory.counter}`]
    );
    return result.rows[0];
  }
}

// Usage ensures no collisions between parallel tests
it('test A with isolated data', async () => {
  const user = await TestDataFactory.createIsolatedUser(pool);
  // Test uses user, no other test can access this user
});

it('test B with isolated data', async () => {
  const user = await TestDataFactory.createIsolatedUser(pool);
  // Runs in parallel with test A, zero conflicts
});


Testing Error Scenarios: Where Real Services Shine

Mocks make happy-path testing easy. Real services make failure testing possible. And failure testing is where you find the bugs that crash production.

Network Failure Simulation

Tools like Toxiproxy let you inject network failures into real service calls:

import { Toxiproxy } from 'toxiproxy-node-client';

describe('Payment Service - Network Resilience', () => {
  let toxiproxy: Toxiproxy;
  let paymentService: PaymentService;

  beforeAll(async () => {
    toxiproxy = new Toxiproxy('http://localhost:8474');

    // Create proxy for Stripe API
    await toxiproxy.createProxy({
      name: 'stripe_api',
      listen: '0.0.0.0:6789',
      upstream: 'api.stripe.com:443',
    });
  });

  it('should retry on network timeout', async () => {
    // Inject 5-second latency
    await toxiproxy.addToxic({
      proxy: 'stripe_api',
      type: 'latency',
      attributes: { latency: 5000 },
    });

    const start = Date.now();

    await expect(
      paymentService.processPayment({ amount: 1000 })
    ).rejects.toThrow('Request timeout');

    const duration = Date.now() - start;

    // Verify retry logic kicked in (3 retries = ~15 seconds)
    expect(duration).toBeGreaterThan(15000);
  });

  it('should handle connection reset', async () => {
    // Inject connection reset
    await toxiproxy.addToxic({
      proxy: 'stripe_api',
      type: 'reset_peer',
      attributes: { timeout: 0 },
    });

    await expect(
      paymentService.processPayment({ amount: 1000 })
    ).rejects.toThrow('Connection reset');
  });

  afterEach(async () => {
    // Remove toxics between tests
    await toxiproxy.removeToxic({ proxy: 'stripe_api' });
  });
});


Rate Limiting and Throttling

Test how your system handles API rate limits:

it('should respect rate limits', async () => {
  const apiClient = new ExternalAPIClient(testConfig.apiKey);
  const results: Array<'success' | 'throttled'> = [];

  // Hammer the API with 100 requests
  const requests = Array.from({ length: 100 }, async () => {
    try {
      await apiClient.getData();
      results.push('success');
    } catch (error) {
      if (error.statusCode === 429) {
        results.push('throttled');
      } else {
        throw error;
      }
    }
  });

  await Promise.allSettled(requests);

  // Verify rate limiting kicked in
  expect(results.filter(r => r === 'throttled').length).toBeGreaterThan(0);

  // Verify some requests succeeded (we're not completely blocked)
  expect(results.filter(r => r === 'success').length).toBeGreaterThan(0);
});


Achieving 90-95% Coverage: The Pragmatic Target

Let's talk numbers. 100% coverage is a fool's errand—you'll spend more time maintaining tests than writing features. But below 80%, you're flying blind. The sweet spot? 90-95% coverage with a strategic mix of test types.

The Modern Test Distribution

Guillermo Rauch famously advised: "Write tests. Not too many. Mostly integration." Here's what that looks like in practice:

  • 50-60% Unit Tests: Fast, focused, testing business logic in isolation
  • 30-40% Integration Tests: Real services, testing component interactions
  • 5-10% E2E Tests: Full system tests, critical user journeys

Graphic Suggestion 1: Modified Testing Pyramid showing integration tests as the strategic middle layer, with callouts for "Real Database," "Real APIs," and "Real Message Queues."

Coverage Gaps to Prioritize

Focus your integration tests on these high-value areas:

  1. Authentication/Authorization flows: Token refresh, permission checks, session management
  2. Data persistence: Database transactions, constraint violations, migrations
  3. External API integrations: Payment processing, email delivery, third-party data
  4. Message queue operations: Event publishing, message consumption, dead-letter handling
  5. Cache invalidation: When does the cache refresh? What happens on cache miss? (See the sketch after this list.)
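
For the cache invalidation item, here's a minimal sketch against the real Redis container from earlier (the UserService caching behavior and the user:<id> key scheme are assumptions):

it('invalidates the cache when a user is updated', async () => {
  const redis = testEnv.getRedisClient();
  const user = await TestDataFactory.createIsolatedUser(testEnv.getDbPool());

  // Prime the cache with a read, then mutate through the service
  await userService.getUser(user.id);
  expect(await redis.exists(`user:${user.id}`)).toBe(1);

  await userService.updateUser(user.id, { name: 'Renamed' });

  // The write should evict (or rewrite) the cached entry
  const cached = await redis.get(`user:${user.id}`);
  expect(cached === null || JSON.parse(cached).name === 'Renamed').toBe(true);
});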

Measuring What Matters

Code coverage tools lie. They tell you lines executed, not behaviors validated. Track integration coverage separately:

// package.json
{
  "scripts": {
    "test:unit": "jest --coverage --coverageDirectory=coverage/unit",
    "test:integration": "jest --config=jest.integration.config.js --coverage --coverageDirectory=coverage/integration",
    "test:coverage": "node scripts/mergeCoverage.js"
  }
}


// scripts/mergeCoverage.js
const { mergeCoverageReports } = require('coverage-merge');

const unitCoverage = require('../coverage/unit/coverage-summary.json');
const integrationCoverage = require('../coverage/integration/coverage-summary.json');

const merged = mergeCoverageReports([unitCoverage, integrationCoverage]);

console.log('Combined Coverage Report:');
console.log(`Lines: ${merged.total.lines.pct}%`);
console.log(`Statements: ${merged.total.statements.pct}%`);
console.log(`Functions: ${merged.total.functions.pct}%`);
console.log(`Branches: ${merged.total.branches.pct}%`);

// Fail if below threshold
if (merged.total.lines.pct < 90) {
  console.error('❌ Coverage below 90% threshold');
  process.exit(1);
}


Graphic Suggestion 2: Coverage dashboard mockup showing unit vs. integration coverage breakdown by module, with integration tests highlighting the "risky" areas (database, external APIs).

CI/CD Integration: Tests That Run Everywhere

Integration tests in CI/CD are tricky. They're slower than unit tests, require infrastructure, and need credentials. But they're also your last line of defense before production.

The Multi-Stage Pipeline

# .github/workflows/full-pipeline.yml
name: Full Test Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm ci
      - run: npm run test:unit
      - uses: codecov/codecov-action@v3
        with:
          files: ./coverage/unit/coverage-final.json
          flags: unit

  integration-tests:
    runs-on: ubuntu-latest
    # Only run on main/develop or when PR is marked ready
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || github.event.pull_request.draft == false

    services:
      # GitHub Actions provides service containers
      postgres:
        image: postgres:15-alpine
        env:
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432

      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379

    env:
      TEST_DB_HOST: localhost
      TEST_DB_PORT: 5432
      STRIPE_TEST_KEY: ${{ secrets.STRIPE_TEST_KEY }}
      SENDGRID_TEST_KEY: ${{ secrets.SENDGRID_TEST_KEY }}

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm ci
      - run: npm run db:migrate:test
      - run: npm run test:integration
      - uses: codecov/codecov-action@v3
        with:
          files: ./coverage/integration/coverage-final.json
          flags: integration

  e2e-tests:
    runs-on: ubuntu-latest
    needs: [unit-tests, integration-tests]
    # Only run E2E on main branch or when explicitly requested
    if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-e2e')

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm ci
      - run: npm run test:e2e


Key patterns:

  • Unit tests run on every commit (fast feedback)
  • Integration tests run on main/develop and ready PRs (catch integration bugs before merge)
  • E2E tests run only on main or when explicitly requested (slow but comprehensive)

Graphic Suggestion 3: CI/CD pipeline flowchart showing the multi-stage approach with conditionals (when to run which tests), including infrastructure setup (containers) and secret injection points.

Optimization: Cached Dependencies

Integration tests that rebuild Docker images every run waste time. Cache aggressively:

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
    restore-keys: |
      ${{ runner.os }}-buildx-

- name: Pull Docker images
  run: |
    docker pull postgres:15-alpine
    docker pull redis:7-alpine


Parallel Execution in CI

Run independent integration test suites in parallel:

integration-tests:
  strategy:
    matrix:
      test-suite: [database, api, messaging, cache]

  steps:
    - run: npm run test:integration:${{ matrix.test-suite }}
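
The matrix assumes one npm script per suite. A hedged way to wire those up (the directory layout is illustrative) is Jest's --testPathPattern:

// package.json (sketch - suite names must match the matrix entries)
{
  "scripts": {
    "test:integration:database": "jest --config=jest.integration.config.js --testPathPattern=tests/integration/database",
    "test:integration:api": "jest --config=jest.integration.config.js --testPathPattern=tests/integration/api",
    "test:integration:messaging": "jest --config=jest.integration.config.js --testPathPattern=tests/integration/messaging",
    "test:integration:cache": "jest --config=jest.integration.config.js --testPathPattern=tests/integration/cache"
  }
}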


Graphic Suggestion 4: Test execution timeline showing serial vs. parallel execution, highlighting time savings from running database, API, messaging, and cache tests simultaneously.

Real-World Integration Test Example

Let's put it all together with a realistic e-commerce checkout flow:

// tests/integration/checkout.test.ts
import { TestEnvironment } from '../testSetup';
import { CheckoutService } from '../../src/services/CheckoutService';
import { StripePaymentProcessor } from '../../src/payments/StripePaymentProcessor';
import { SendGridEmailService } from '../../src/email/SendGridEmailService';
import { TestDataFactory } from '../testDataFactory';
import { TestCredentialManager } from '../testCredentials';

describe('Checkout Integration', () => {
  let testEnv: TestEnvironment;
  let checkoutService: CheckoutService;
  let credManager: TestCredentialManager;

  beforeAll(async () => {
    testEnv = new TestEnvironment();
    await testEnv.setup();
    credManager = new TestCredentialManager();
  });

  afterAll(async () => {
    await testEnv.cleanup();
  });

  beforeEach(async () => {
    // Clean state before each test
    await testEnv.getDbPool().query('TRUNCATE orders, payments, users CASCADE');
  });

  it('should complete full checkout with real payment and email', async () => {
    await credManager.requireOrSkip('STRIPE_TEST_KEY', async () => {
      await credManager.requireOrSkip('SENDGRID_TEST_KEY', async () => {
        // Arrange: Create test user with isolated data
        const user = await TestDataFactory.createIsolatedUser(testEnv.getDbPool());

        const paymentProcessor = new StripePaymentProcessor(
          credManager.get('STRIPE_TEST_KEY')!
        );

        const emailService = new SendGridEmailService(
          credManager.get('SENDGRID_TEST_KEY')!
        );

        checkoutService = new CheckoutService(
          testEnv.getDbPool(),
          paymentProcessor,
          emailService
        );

        const cart = {
          items: [
            { productId: 'prod_123', quantity: 2, price: 1999 },
            { productId: 'prod_456', quantity: 1, price: 4999 },
          ],
        };

        let orderId: string | undefined;

        try {
          // Act: Process checkout with REAL Stripe payment
          const result = await checkoutService.processCheckout({
            userId: user.id,
            cart,
            paymentMethod: {
              type: 'card',
              cardToken: 'tok_visa', // Stripe test token
            },
          });

          orderId = result.orderId;

          // Assert: Verify order created in REAL database
          const orderResult = await testEnv.getDbPool().query(
            'SELECT * FROM orders WHERE id = $1',
            [orderId]
          );
          expect(orderResult.rows).toHaveLength(1);
          expect(orderResult.rows[0].status).toBe('completed');
          expect(orderResult.rows[0].total_amount).toBe(8997);

          // Assert: Verify payment recorded
          const paymentResult = await testEnv.getDbPool().query(
            'SELECT * FROM payments WHERE order_id = $1',
            [orderId]
          );
          expect(paymentResult.rows).toHaveLength(1);
          expect(paymentResult.rows[0].status).toBe('succeeded');
          expect(paymentResult.rows[0].provider).toBe('stripe');

          // Assert: Verify email sent via REAL SendGrid
          const emails = await emailService.searchEmails({
            to: user.email,
            subject: 'Order Confirmation',
            limit: 1,
          });
          expect(emails).toHaveLength(1);
          expect(emails[0].body).toContain(orderId);

        } finally {
          // Cleanup: Cancel order and refund payment
          if (orderId) {
            await checkoutService.cancelOrder(orderId);
          }
        }
      });
    });
  });

  it('should handle payment failure gracefully', async () => {
    await credManager.requireOrSkip('STRIPE_TEST_KEY', async () => {
      const user = await TestDataFactory.createIsolatedUser(testEnv.getDbPool());

      const paymentProcessor = new StripePaymentProcessor(
        credManager.get('STRIPE_TEST_KEY')!
      );

      checkoutService = new CheckoutService(
        testEnv.getDbPool(),
        paymentProcessor,
        new SendGridEmailService(credManager.get('SENDGRID_TEST_KEY')!)
      );

      const cart = {
        items: [{ productId: 'prod_789', quantity: 1, price: 9999 }],
      };

      // Act: Use Stripe's test token for declined card
      await expect(
        checkoutService.processCheckout({
          userId: user.id,
          cart,
          paymentMethod: {
            type: 'card',
            cardToken: 'tok_chargeDeclined', // Stripe test token for declined
          },
        })
      ).rejects.toThrow('Payment declined');

      // Assert: Verify order marked as failed
      const orderResult = await testEnv.getDbPool().query(
        'SELECT * FROM orders WHERE user_id = $1',
        [user.id]
      );
      expect(orderResult.rows).toHaveLength(1);
      expect(orderResult.rows[0].status).toBe('payment_failed');

      // Assert: No successful payment recorded
      const paymentResult = await testEnv.getDbPool().query(
        'SELECT * FROM payments WHERE status = $1',
        ['succeeded']
      );
      expect(paymentResult.rows).toHaveLength(0);
    });
  });
});


This test validates:

  • Real PostgreSQL database operations (order creation, payment recording)
  • Real Stripe payment processing (using their test mode)
  • Real SendGrid email delivery (using sandbox mode)
  • Proper error handling with failed payments
  • Complete cleanup even on test failure

Graphic Suggestion 5: Sequence diagram of the checkout flow showing interactions between test code → database → Stripe API → SendGrid API, with annotations for assertion points and cleanup steps.

Common Pitfalls and How to Avoid Them

After years of real-service testing, here are the traps I see teams fall into:

Pitfall 1: Flaky Tests Due to Timing

Problem: Test passes locally, fails in CI randomly.

Solution: Never use arbitrary timeouts. Use explicit waits:

// ❌ Bad: Arbitrary timeout
await sleep(1000);
expect(order.status).toBe('completed');

// ✅ Good: Wait for condition
await waitFor(
  async () => {
    const order = await getOrder(orderId);
    return order.status === 'completed';
  },
  { timeout: 5000, interval: 100 }
);

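A waitFor helper like the one above isn't built into Jest; here's a minimal sketch of what it's assumed to do:

// Polls a predicate until it returns true or the timeout elapses
async function waitFor(
  predicate: () => Promise<boolean>,
  { timeout = 5000, interval = 100 }: { timeout?: number; interval?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (await predicate()) return;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`waitFor: condition not met within ${timeout}ms`);
}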

Pitfall 2: Test Data Pollution

Problem : Tests interfere with each other, random failures.

Solution : Unique identifiers + cleanup before tests (as shown earlier).

Pitfall 3: Ignoring Test Performance

Problem: Integration suite takes 30 minutes, developers stop running it.

Solution: Parallelize, cache dependencies, and set time budgets:

// jest.integration.config.js
module.exports = {
  testTimeout: 10000, // 10 seconds max per test
  maxWorkers: '50%', // Use half CPU cores for parallel execution
  setupFilesAfterEnv: ['<rootDir>/tests/testSetup.ts'],
};


If a test exceeds 10 seconds, it needs optimization or should become an E2E test.

Pitfall 4: Over-Testing Edge Cases

Problem: 1000 tests, 90% test the same happy path.

Solution: Use test matrices for edge cases:

describe.each([
  { input: 'valid@email.com', expected: true },
  { input: 'invalid', expected: false },
  { input: 'no@domain', expected: false },
  { input: '', expected: false },
  { input: null, expected: false },
])('Email validation', ({ input, expected }) => {
  it(`should return ${expected} for "${input}"`, async () => {
    const result = await validateEmail(input);
    expect(result).toBe(expected);
  });
});


The Bottom Line: Tests That Earn Trust

Real service testing isn't about perfection. It's about confidence. When your integration tests pass, you should feel comfortable deploying to production. When they fail, you should trust that they caught a real bug, not a mock mismatch.

Here's my systematic checklist for building that confidence:

  1. Environment Setup: Use containers to mirror production services
  2. Credential Management: Secure secrets, graceful degradation when missing
  3. Cleanup Strategy: Clean before tests, use try-finally for external services
  4. Data Isolation: Unique identifiers to prevent test interference
  5. Error Scenarios: Test failures, timeouts, rate limits with real service simulation
  6. Coverage Target: Aim for 90-95% with strategic test distribution
  7. CI/CD Integration: Multi-stage pipeline with caching and parallelization

Integration testing with real services requires more setup than mocks. It's slower. It's more complex. But when done right, it's the difference between "we think it works" and "we know it works."

Now go forth and test with real databases, real APIs, and real confidence.

Integration Testing Architecture

The Modified Test Pyramid for Real Services

While the traditional test pyramid emphasizes unit tests at the base, real-service integration testing requires a different balance:

[Diagram 1: modified test pyramid with an expanded integration layer]

Integration tests take a larger share when testing complex external service interactions.

Real Service Test Environment Flow

A production-grade integration test follows this lifecycle:

[Diagram 2: integration test environment lifecycle, from container startup through cleanup]

This ensures tests are isolated and idempotent, running reliably in CI/CD pipelines.


References

[1] Cohn, M. (2009). Succeeding with Agile: Software Development Using Scrum. (The testing pyramid.)

[2] Hauer, P. (2019). Focus on Integration Tests Instead of Mock-Based Tests. https://phauer.com/2019/focus-integration-tests-mock-based-tests/

[3] Hauer, P. (2019). Integration testing tools and practices. In: Focus on Integration Tests Instead of Mock-Based Tests.

[4] Stack Overflow. (2018). Is it considered a good practice to mock in integration tests? https://stackoverflow.com/questions/52107522/

[5] Server Fault. Credentials management within CI/CD environment. https://serverfault.com/questions/924431/

[6] Rojek, M. (2021). Idempotence in Software Testing. https://medium.com/@rojek.mac/idempotence-in-software-testing-b8fd946320c5

[7] Software Engineering Stack Exchange. Cleanup & Arrange practices during integration testing to avoid dirty databases. https://softwareengineering.stackexchange.com/questions/308666/

[8] Stack Overflow. What strategy to use with xUnit for integration tests when knowing they run in parallel? https://stackoverflow.com/questions/55297811/

[9] LinearB. Test Coverage Demystified: A Complete Introductory Guide. https://linearb.io/blog/test-coverage-demystified

[10] web.dev. Pyramid or Crab? Find a testing strategy that fits. https://web.dev/articles/ta-strategies

