Abe Wheeler

Posted on Feb 3 • Originally published at sunpeak.ai

The Complete Guide to Testing ChatGPT Apps

#mcp #chatgpt #webdev #react

Testing ChatGPT Apps presents unique challenges. Your UI runs inside ChatGPT's runtime, responds to tool invocations, and adapts to multiple display modes and themes. Without proper testing infrastructure, you're deploying blind.

TL;DR: Use sunpeak's built-in testing with Vitest for unit tests (pnpm test) and Playwright for e2e tests (pnpm test:e2e). Define states in simulation files, test across display modes with createSimulatorUrl, and run everything in CI.

This guide covers everything you need to test ChatGPT Apps and MCP Apps with confidence.

Why Testing ChatGPT Apps Is Different

ChatGPT Apps run in a specialized runtime environment. Your React components don't just render in a browser; they render inside ChatGPT's Apps SDK runtime with:

ChatGPT frontend state - Display mode (inline, picture-in-picture, or fullscreen), theme (light or dark), and similar host settings
Tool invocations - ChatGPT calls your app's tools with specific inputs
Backend state - Various possible states for users and sessions in your database
Widget state - Persistent state that survives across invocations

Testing each combination manually isn't feasible: the combinatorics are brutal.
You need automated testing that covers all these scenarios.
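
To make the combinatorics concrete, here is a minimal sketch that enumerates just three of the dimensions, using the display mode, theme, and device type values listed later in this guide. The names `Scenario` and `buildScenarioMatrix` are illustrative and not part of sunpeak.

```typescript
// Values taken from the simulator options described in this guide.
const displayModes = ['inline', 'pip', 'fullscreen'] as const;
const themes = ['light', 'dark'] as const;
const deviceTypes = ['mobile', 'tablet', 'desktop', 'unknown'] as const;

type Scenario = {
  displayMode: (typeof displayModes)[number];
  theme: (typeof themes)[number];
  deviceType: (typeof deviceTypes)[number];
};

// Build the full cartesian product of the three dimensions.
function buildScenarioMatrix(): Scenario[] {
  const scenarios: Scenario[] = [];
  for (const displayMode of displayModes) {
    for (const theme of themes) {
      for (const deviceType of deviceTypes) {
        scenarios.push({ displayMode, theme, deviceType });
      }
    }
  }
  return scenarios;
}

const scenarios = buildScenarioMatrix();
console.log(scenarios.length); // 3 * 2 * 4 = 24
```

That is 24 combinations per simulation before tool inputs, backend state, or widget state are taken into account.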

Setting Up Your Testing Environment

If you're using the sunpeak framework, testing is pre-configured. Start with:

pnpm add -g sunpeak && sunpeak new
cd my-app

Your project includes:

Vitest configured with jsdom, React Testing Library, and jest-dom matchers
Playwright configured to test against the ChatGPT simulator
Simulation files in tests/simulations/ for deterministic states

Unit Testing with Vitest

Unit tests validate individual components in isolation. Run them with:

pnpm test

Create tests alongside your components in src/resources with the .test.tsx extension:

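import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
// Path is relative to the test file, which sits next to the component in src/resources
import { Counter } from './counter-resource';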

describe('Counter', () => {
  it('renders the initial count', () => {
    render(<Counter />);
    expect(screen.getByText('0')).toBeInTheDocument();
  });

  it('increments when button is clicked', async () => {
    render(<Counter />);
    await userEvent.click(screen.getByRole('button', { name: /increment/i }));
    expect(screen.getByText('1')).toBeInTheDocument();
  });
});

Unit tests run fast and catch component-level bugs early. They're ideal for testing:

Component rendering logic
User interactions within a component
Props and state handling

End-to-End Testing with Playwright

E2E tests validate your app running in the ChatGPT simulator. Run them with:

pnpm test:e2e

Create tests in tests/e2e/ with the .spec.ts extension:

import { test, expect } from '@playwright/test';
import { createSimulatorUrl } from 'sunpeak/chatgpt';

test('counter increments in fullscreen mode', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    displayMode: 'fullscreen',
    theme: 'dark',
  }));

  await page.getByRole('button', { name: /increment/i }).click();
  await expect(page.getByText('1')).toBeVisible();
});

The createSimulatorUrl utility generates URLs with your test configuration:

simulation - Your simulation file name (sets initial state)
displayMode - inline, pip, or fullscreen (tests display adaptation)
theme - light or dark (tests theme handling)
deviceType - mobile, tablet, desktop, or unknown (tests responsive behavior)
touch / hover - Enable or disable touch/hover capabilities
safeAreaTop, safeAreaBottom, etc. - Simulate device notches and insets
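
As a mental model only, a URL builder like this could serialize each option into a query parameter. The sketch below is a hypothetical stand-in named `sketchSimulatorUrl`. It is not sunpeak's implementation, and the real `createSimulatorUrl` may use a different path or parameter encoding.

```typescript
// Hypothetical option shape, mirroring the parameters listed above.
interface SimulatorOptions {
  simulation: string;
  displayMode?: 'inline' | 'pip' | 'fullscreen';
  theme?: 'light' | 'dark';
  deviceType?: 'mobile' | 'tablet' | 'desktop' | 'unknown';
  touch?: boolean;
  hover?: boolean;
  safeAreaTop?: number;
  safeAreaBottom?: number;
}

// Illustrative stand-in, NOT sunpeak's createSimulatorUrl: serialize every
// defined option into a query string on a relative path.
function sketchSimulatorUrl(options: SimulatorOptions): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) params.set(key, String(value));
  }
  return `/?${params.toString()}`;
}

const url = sketchSimulatorUrl({
  simulation: 'counter-show',
  displayMode: 'pip',
  theme: 'dark',
  touch: true,
});
console.log(url);
// /?simulation=counter-show&displayMode=pip&theme=dark&touch=true
```

Because the whole configuration lives in the URL, each test starts from a known state with a single page.goto call.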

Creating Simulation Files

Simulation files define deterministic states for testing. Create them in tests/simulations/{resource-name}/:

{
  "userMessage": "Show me a counter starting at 5",
  "tool": {
    "name": "show_counter",
    "description": "Displays an interactive counter",
    "inputSchema": {
      "type": "object",
      "properties": {
        "initialCount": { "type": "number" }
      }
    }
  },
  "callToolRequestParams": {
    "arguments": { "initialCount": 5 }
  },
  "callToolResult": {
    "content": [{ "type": "text", "text": "Counter displayed" }],
    "structuredContent": {
      "count": 5
    }
  }
}

This simulation:

Shows userMessage in the simulator chat interface
Defines the tool with its name and input schema
Sets callToolRequestParams with mock input accessible via useToolInput()
Provides callToolResult with mock data passed to your component via useWidgetProps()
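
If you keep many simulation files, it can help to check that the mock arguments agree with the tool's input schema. The sketch below infers a `Simulation` type from the example above and adds a hypothetical helper, `findArgumentMismatches`. Neither name comes from sunpeak, and the check only covers schema types that map directly onto `typeof` (string, number, boolean).

```typescript
// Shape inferred from the example simulation in this guide.
interface Simulation {
  userMessage: string;
  tool: {
    name: string;
    description: string;
    inputSchema: { type: string; properties: Record<string, { type: string }> };
  };
  callToolRequestParams: { arguments: Record<string, unknown> };
  callToolResult: {
    content: Array<{ type: string; text: string }>;
    structuredContent: Record<string, unknown>;
  };
}

// Schema types that can be compared directly against typeof.
const typeofComparable = new Set(['string', 'number', 'boolean']);

// Hypothetical helper: report arguments that the input schema does not
// declare, or whose primitive type differs from the declared one.
function findArgumentMismatches(sim: Simulation): string[] {
  const problems: string[] = [];
  const declared = sim.tool.inputSchema.properties;
  for (const [name, value] of Object.entries(sim.callToolRequestParams.arguments)) {
    const schema = declared[name];
    if (!schema) {
      problems.push(`${name}: not declared in inputSchema`);
    } else if (typeofComparable.has(schema.type) && typeof value !== schema.type) {
      problems.push(`${name}: expected ${schema.type}, got ${typeof value}`);
    }
  }
  return problems;
}

// The simulation from this guide, written as an object literal.
const counterShow: Simulation = {
  userMessage: 'Show me a counter starting at 5',
  tool: {
    name: 'show_counter',
    description: 'Displays an interactive counter',
    inputSchema: { type: 'object', properties: { initialCount: { type: 'number' } } },
  },
  callToolRequestParams: { arguments: { initialCount: 5 } },
  callToolResult: {
    content: [{ type: 'text', text: 'Counter displayed' }],
    structuredContent: { count: 5 },
  },
};

console.log(findArgumentMismatches(counterShow)); // [] because the arguments match the schema
```

A check like this could run as an ordinary unit test over every file in tests/simulations/.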

Use simulations to test specific states without manual setup:

// Test the counter with structuredContent.count = 5
await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
await expect(page.getByText('5')).toBeVisible();

// Test a different initial state
await page.goto(createSimulatorUrl({ simulation: 'counter-initial' }));
await expect(page.getByText('0')).toBeVisible();

Testing Across Display Modes

ChatGPT Apps appear in three display modes. Test all of them:

const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const displayMode of displayModes) {
  test(`renders correctly in ${displayMode} mode`, async ({ page }) => {
    await page.goto(createSimulatorUrl({
      simulation: 'counter-show',
      displayMode,
    }));

    await expect(page.getByRole('button')).toBeVisible();
  });
}

Each mode has different constraints:

Inline - Limited height, embedded in chat
Picture-in-picture - Floating window, can be repositioned
Fullscreen - Maximum space, modal overlay

Your app should adapt gracefully to each.

Testing Theme Adaptation

Test both light and dark themes:

test('adapts to dark theme', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    theme: 'dark',
  }));

  // Verify dark theme styles are applied
  const button = page.getByRole('button');
  await expect(button).toHaveCSS('background-color', 'rgb(255, 184, 0)');
});
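
The assertion above compares against the computed style, which browsers generally report in rgb(r, g, b) form for opaque colors, even when the stylesheet uses hex. If your design tokens are stored as hex, a small converter keeps the tests readable. This is a minimal sketch: `hexToRgb` is a hypothetical helper, and `#ffb800` is simply the hex equivalent of the value used in the test above.

```typescript
// Convert a 6-digit hex color such as "#ffb800" into the "rgb(r, g, b)"
// string format used in the assertion above.
function hexToRgb(hex: string): string {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  // Parse each two-character channel as a base-16 number.
  const [r, g, b] = match.slice(1).map((part) => parseInt(part, 16));
  return `rgb(${r}, ${g}, ${b})`;
}

console.log(hexToRgb('#ffb800')); // rgb(255, 184, 0)
```

In a test, the expected value could then be written as hexToRgb('#ffb800') instead of a hand-converted string.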

Running Tests in CI/CD

Add testing to your GitHub Actions workflow:

name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - run: pnpm install
      - run: pnpm test
      - run: pnpm exec playwright install --with-deps
      - run: pnpm test:e2e

Playwright tests automatically:

Start the sunpeak dev server
Wait for it to be ready
Run tests against the ChatGPT simulator
Shut down when complete
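
This start, wait, and shut down behavior corresponds to Playwright's `webServer` option. The fragment below is a sketch of what such a configuration can look like, not a copy of the file sunpeak generates: the `pnpm dev` command and the `http://localhost:3000` address are assumptions, so check your project's own playwright.config.ts for the real values.

```typescript
// playwright.config.ts (sketch). A plain object keeps the fragment free of
// imports; real configs usually wrap it in defineConfig from '@playwright/test'.
const config = {
  // Directory this guide uses for e2e specs.
  testDir: './tests/e2e',
  webServer: {
    // ASSUMPTION: the script that starts the sunpeak dev server.
    command: 'pnpm dev',
    // ASSUMPTION: the address the simulator is served on. Playwright polls
    // this URL and starts the tests once it responds.
    url: 'http://localhost:3000',
    // Attach to a server that is already running instead of failing.
    reuseExistingServer: true,
    // Give the server up to 60 seconds to become ready.
    timeout: 60_000,
  },
  use: {
    // Resolve relative page.goto() paths against the dev server.
    baseURL: 'http://localhost:3000',
    // Keep a screenshot only when a test fails.
    screenshot: 'only-on-failure',
  },
};

export default config;
```

Playwright stops the server it started once the run finishes, which is why the CI workflow needs no separate step for it.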

Debugging Failing Tests

When tests fail, use these debugging techniques:

Playwright Debug Mode

pnpm test:e2e --ui

This opens Playwright's UI mode, a visual debugger where you can:

Step through tests
Inspect the DOM at each step
See screenshots and traces

Vitest Verbose Output

pnpm test --reporter=verbose

This prints detailed output, including:

Individual test results
Component render output
Error stack traces

Screenshot on Failure

Playwright can capture a screenshot whenever a test fails (the screenshot: 'only-on-failure' option in the Playwright config). Find the images in test-results/.

Testing Best Practices

One assertion per test. Keep tests focused and easy to debug:

// Good: focused test
test('increment button is visible', async ({ page }) => {
  await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
  await expect(page.getByRole('button', { name: /increment/i })).toBeVisible();
});

// Avoid: multiple unrelated assertions
test('counter works', async ({ page }) => {
  // Too many things being tested at once
});

Test behavior, not implementation. Focus on what users see:

// Good: tests user-visible behavior
await expect(page.getByText('5')).toBeVisible();

// Avoid: tests implementation details
expect(component.state.count).toBe(5);

Use descriptive test names. Make failures self-explanatory:

// Good: clear failure message
test('displays error message when API call fails', ...)

// Avoid: vague description
test('handles error', ...)

Clean up between tests. Reset state to avoid test pollution:

afterEach(async () => {
  // Reset any global state
});
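
To see why the reset matters, consider a sketch with module-level state shared by every test in a file. The store and helper names here are hypothetical. The point is only that a second test inherits whatever the first one left behind unless a hook such as afterEach clears it.

```typescript
// Hypothetical module-level state shared by every test in a file.
const widgetStateStore = new Map<string, number>();

function incrementCount(key: string): number {
  const next = (widgetStateStore.get(key) ?? 0) + 1;
  widgetStateStore.set(key, next);
  return next;
}

// What an afterEach hook would call to restore a clean slate.
function resetWidgetState(): void {
  widgetStateStore.clear();
}

// Two "tests" that each expect to start from zero.
const firstRun = incrementCount('counter'); // 1
const pollutedRun = incrementCount('counter'); // 2, state leaked from the first run

resetWidgetState();
const cleanRun = incrementCount('counter'); // 1 again after the reset
```

Without the reset, the second test passes or fails depending on execution order, which makes failures hard to reproduce.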

Next Steps

Testing is essential for shipping reliable ChatGPT Apps and MCP Apps. With sunpeak's testing infrastructure, you can:

Run unit tests with Vitest for fast feedback
Run e2e tests with Playwright for full integration coverage
Test across display modes, themes, and device types
Integrate testing into your CI/CD pipeline

Get started with sunpeak:

pnpm add -g sunpeak && sunpeak new
