Wanda

Posted on • Originally published at apidog.com

How to Use GPT-5.4 API

Quick Answer

To use the GPT-5.4 API: install the OpenAI SDK (pip install openai), initialize the client with your API key, and call chat.completions.create() with model "gpt-5.4". Key features: native computer use (browser automation), tool search (47% token reduction), 1M-token context window, and vision/image processing. Pricing: $2.50/M input tokens, $15/M output tokens. This guide covers setup, practical code, computer use, tool integration, and production best practices.


Introduction

GPT-5.4 introduces native computer use, efficient tool search, and large context windows. To leverage these in real-world apps, you need to know how to implement browser automation, tool orchestration, image processing, and manage large code or document contexts.

This guide provides actionable code for every major GPT-5.4 feature: computer use automation, tool search for MCP servers, high-res image processing, long-context codebases, and production cost optimization.

Whether automating browser workflows, building AI agents, or integrating GPT-5.4 into your stack, this reference gives you the code and architectural patterns you need.

💡 When integrating GPT-5.4 APIs, use Apidog to design, test, and document endpoints. Apidog enables you to debug requests, automate tests, mock responses, and generate docs—essential for building robust AI features that combine GPT-5.4 with other services.



Quick Start: Your First GPT-5.4 Request

Get started in 5 minutes. Before coding, test your GPT-5.4 API requests in Apidog:

  1. Create a new HTTP POST to https://api.openai.com/v1/chat/completions
  2. Add Authorization header: Bearer YOUR_API_KEY
  3. Set request body with model, messages, and parameters
  4. Send and inspect the response
  5. Save to a collection for repeated testing
  6. Use environment variables for different API keys

Apidog API Request Screenshot

Visual API testing helps you debug faster and understand request/response structure before coding.
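The same request can be issued from code without the SDK; a minimal standard-library sketch (the endpoint and payload shape mirror the steps above; availability of the "gpt-5.4" model name is assumed):

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(prompt, model="gpt-5.4", api_key=None):
    """Assemble the headers and JSON body for a raw chat completions call."""
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return headers, body

def send_chat_request(prompt):
    """POST the request and return the assistant's reply text."""
    headers, body = build_chat_request(prompt)
    req = urllib.request.Request(
        API_URL, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

This is exactly the request Apidog sends on your behalf, which makes it easy to compare a failing SDK call against a known-good raw request.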

Prerequisites

  • OpenAI account with billing enabled
  • API key from platform.openai.com/api-keys
  • Python 3.7+ or Node.js 14+

Python Quick Start

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to sort a list of dictionaries by a key."}
    ]
)

print(response.choices[0].message.content)

Node.js Quick Start

const OpenAI = require('openai');

const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
});

async function main() {
    const response = await client.chat.completions.create({
        model: 'gpt-5.4',
        messages: [
            { role: 'system', content: 'You are a helpful coding assistant.' },
            { role: 'user', content: 'Write a Python function to sort a list of dictionaries by a key.' }
        ]
    });

    console.log(response.choices[0].message.content);
}

main();

Expected Output

def sort_dicts_by_key(dict_list, key, reverse=False):
    """
    Sort a list of dictionaries by a specified key.

    Args:
        dict_list: List of dictionaries to sort
        key: The dictionary key to sort by
        reverse: If True, sort in descending order

    Returns:
        Sorted list of dictionaries
    """
    return sorted(dict_list, key=lambda x: x.get(key, ''), reverse=reverse)

# Example usage
data = [
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Charlie', 'age': 35}
]

sorted_by_age = sort_dicts_by_key(data, 'age')
print(sorted_by_age)
# [{'name': 'Bob', 'age': 25}, {'name': 'Alice', 'age': 30}, {'name': 'Charlie', 'age': 35}]

Understanding GPT-5.4 Capabilities

GPT-5.4 excels in four practical areas. Match your use case to the right feature set:

1. Knowledge Work

  • Spreadsheet creation/analysis
  • Presentation/document generation
  • Financial modeling and data analysis


2. Computer Use

  • Browser automation
  • Data entry across apps
  • Web scraping with interaction
  • Workflow testing
  • Cross-application automation


3. Coding

  • Full-stack and frontend code generation
  • Debugging and code refactoring
  • Test creation


4. Tool Integration

  • MCP server integrations
  • Multi-step API workflows
  • External tool orchestration



Computer Use API

GPT-5.4 can operate computers through screenshots, mouse, and keyboard events.

Computer Use Example

Test every computer use API endpoint in Apidog: screenshot upload, command execution, and multi-step workflows.

Implementation steps:

  • Validate screenshot upload endpoints
  • Test APIs for click/type/scroll
  • Mock responses for each action
  • Automate multi-turn workflow tests
  • Document the API contract for your team

How Computer Use Works

  1. Send a screenshot of the current screen state
  2. Model analyzes the UI and returns action commands (click, type, etc.)
  3. Your app executes commands and sends a new screenshot
  4. Repeat until the task completes

Basic Computer Use Setup

from openai import OpenAI
import base64
import json

client = OpenAI()

def take_screenshot():
    import pyautogui
    screenshot = pyautogui.screenshot()
    import io
    buffer = io.BytesIO()
    screenshot.save(buffer, format='PNG')
    return base64.b64encode(buffer.getvalue()).decode('utf-8')

def execute_computer_command(command):
    import pyautogui
    action = command.get('action')
    if action == 'click':
        x, y = command.get('coordinate', [0, 0])
        pyautogui.click(x, y)
    elif action == 'type':
        text = command.get('text', '')
        pyautogui.write(text, interval=0.05)
    elif action == 'scroll':
        amount = command.get('scroll_amount', 0)
        pyautogui.scroll(amount)
    elif action == 'keypress':
        key = command.get('key', '')
        pyautogui.press(key)
    return take_screenshot()

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Navigate to gmail.com and log in with the credentials I provided."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{take_screenshot()}"}}
    ]
}]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=[{
        "type": "computer",
        "display_width": 1920,
        "display_height": 1080,
        "display_number": 1
    }],
    tool_choice="required"
)

for tool_call in response.choices[0].message.tool_calls:
    if tool_call.type == "computer":
        # Tool arguments arrive as a JSON string, not a dict
        command = json.loads(tool_call.function.arguments)
        new_screenshot = execute_computer_command(command)
        messages.append({"role": "assistant", "content": response.choices[0].message.content or ""})
        messages.append({"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{new_screenshot}"}}
        ]})

Computer Use Safety Policies

Set safety levels for actions:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=[{
        "type": "computer",
        "display_width": 1920,
        "display_height": 1080,
        "confirmation_policy": "always"  # "never" or "selective" also available
    }],
    system_message="""You are operating a computer. Follow these safety rules:
    1. Never enter credentials without explicit user confirmation
    2. Ask before deleting files or data
    3. Confirm before sending emails or messages
    4. Report any errors or unexpected states immediately
    """
)

Browser Automation Example

Automate browser workflows with Playwright:

from playwright.sync_api import sync_playwright
from openai import OpenAI
import base64
import json

client = OpenAI()

def browser_automation_workflow():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://example.com")
        screenshot = page.screenshot()
        screenshot_b64 = base64.b64encode(screenshot).decode('utf-8')

        messages = [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Find the login form and fill it out."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}}
            ]
        }]

        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=messages,
            tools=[{"type": "computer"}],
            tool_choice="required"
        )

        for tool_call in response.choices[0].message.tool_calls:
            if tool_call.type == "computer":
                command = json.loads(tool_call.function.arguments)
                if command.get('action') == 'click':
                    x, y = command.get('coordinate', [0, 0])
                    page.mouse.click(x, y)
                elif command.get('action') == 'type':
                    page.keyboard.type(command.get('text', ''))
                new_screenshot = page.screenshot()
                # ... continue loop

Email and Calendar Automation

Automate multi-step workflows:

def process_email_and_schedule_meeting():
    workflow_prompt = """
    Complete this workflow:
    1. Open Gmail and find unread emails from the last 24 hours
    2. Identify any meeting requests or scheduling questions
    3. For each meeting request:
       - Extract proposed dates/times
       - Note attendees and meeting purpose
    4. Open Google Calendar and check availability
    5. Send calendar invites for confirmed meetings
    6. Reply to emails confirming the scheduled time

    Report back with a summary of what was accomplished.
    """

    screenshot = take_screenshot()

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": workflow_prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot}"}}
        ]
    }]

    for turn in range(10):
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=messages,
            tools=[{"type": "computer"}],
            tool_choice="required"
        )
        content = response.choices[0].message.content or ""  # content can be None on tool-call turns
        if "complete" in content.lower():
            print(f"Workflow completed in {turn + 1} turns")
            break
        # Execute commands and update messages as shown earlier

Performance Optimization

From real-world testing on 30K property tax portals:

  • 95% first-attempt success
  • 3x faster than previous models
  • 70% fewer tokens/session

Tips:

  1. Use high-res screenshots (1920x1080+)
  2. Be specific in task descriptions
  3. Limit conversation turns
  4. Cache screenshots when possible
  5. Use selective confirmations for trusted flows
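Tip 4 can be sketched as a small frame-dedup helper: hash each screenshot and skip the API round trip when nothing on screen has changed. This is a hypothetical client-side helper, not part of any SDK:

```python
import hashlib

class ScreenshotCache:
    """Detect unchanged frames so identical screenshots aren't re-sent."""

    def __init__(self):
        self._last_digest = None

    def changed(self, screenshot_bytes):
        """Return True if this frame differs from the previous one."""
        digest = hashlib.sha256(screenshot_bytes).hexdigest()
        if digest == self._last_digest:
            return False  # identical frame: reuse the prior message, save tokens
        self._last_digest = digest
        return True
```

In the computer-use loop, call `cache.changed(screenshot)` before appending a new image message and skip the turn when it returns False.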

Tool Search and Integration

Tool search reduces token usage by 47% and supports large tool ecosystems.

How Tool Search Works

Instead of loading all tool definitions, send a lightweight list. The model requests full definitions when needed.


Basic Tool Search Setup

# Lightweight tool list
available_tools = [
    {"name": "get_weather", "description": "Get current weather for a location"},
    {"name": "send_email", "description": "Send an email to a recipient"},
    {"name": "calendar_search", "description": "Search calendar for events"}
    # ... more tools
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo and send it to my team?"}],
    tools=available_tools,
    tool_choice="auto"
)

# Provide full tool definition if/when requested by the model
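Serving full definitions on demand can be handled with a client-side registry keyed by tool name. This is a sketch: the `get_weather` schema is illustrative, and the exact lookup protocol depends on what the API specifies:

```python
# Full JSON-schema definitions, kept client-side and served only when requested
TOOL_DEFINITIONS = {
    "get_weather": {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    },
    # ... one entry per tool in available_tools
}

def resolve_tool(name):
    """Return the full definition for a tool the model asked about."""
    definition = TOOL_DEFINITIONS.get(name)
    if definition is None:
        raise KeyError(f"Unknown tool: {name}")
    return definition
```

Keeping the heavy schemas out of the initial request is where the token savings come from: only the one-line name/description pairs ride along on every call.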

MCP Server Integration

Example: Scale's MCP Atlas benchmark—47% token reduction.

mcp_servers = [
    {"name": "filesystem", "description": "File system operations", "tool_count": 12},
    {"name": "database", "description": "Database query operations", "tool_count": 8},
    {"name": "web-search", "description": "Web search and scraping", "tool_count": 15}
    # ... up to 36 MCP servers
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Find all Python files modified today and search for TODO comments."}],
    tools=mcp_servers,
    parallel_tool_calls=True
)

Toolathlon-Style Multi-Step Workflows

def grade_assignments_workflow():
    workflow_steps = """
    1. Read emails from students with assignment attachments
    2. Download each attachment
    3. Upload to grading portal
    4. Grade each assignment using rubric
    5. Record grades in spreadsheet
    6. Send confirmation emails to students
    """

    tools = [
        {"name": "email_read", "description": "Read emails from inbox"},
        {"name": "email_send", "description": "Send emails"},
        {"name": "file_download", "description": "Download file attachments"},
        {"name": "file_upload", "description": "Upload files to web portal"},
        {"name": "web_form_fill", "description": "Fill and submit web forms"},
        {"name": "spreadsheet_write", "description": "Write data to spreadsheet"},
        {"name": "rubric_evaluate", "description": "Evaluate work against rubric"}
    ]

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": workflow_steps}],
        tools=tools,
        parallel_tool_calls=True
    )

Vision and Image Processing

GPT-5.4 supports up to 10.24M pixel image detail for advanced visual tasks.

Image Detail Levels

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/high-res-image.jpg",
                    "detail": "original"  # "original", "high", or "low"
                }
            },
            {"type": "text", "text": "Analyze this technical diagram."}
        ]
    }]
)

Document Parsing Example

def parse_complex_document(pdf_path):
    import io, base64
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=300)
    messages = [{"role": "user", "content": []}]
    for i, page in enumerate(pages[:5]):  # cap at 5 pages per request
        buffer = io.BytesIO()
        page.save(buffer, format='PNG')
        img_b64 = base64.b64encode(buffer.getvalue()).decode()
        messages[0]["content"].append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "high"}
        })
    messages[0]["content"].append({
        "type": "text",
        "text": """
        Extract all data from this document:
        1. Tables with row/column headers
        2. Key figures and their captions
        3. Summary statistics mentioned in text
        Return as structured JSON.
        """
    })

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=messages
    )
    return response.choices[0].message.content

UI Screenshot Analysis

def analyze_ui_screenshot(screenshot_path):
    with open(screenshot_path, 'rb') as f:
        img_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "original"}
                },
                {
                    "type": "text",
                    "text": """
                    Review this UI screenshot for accessibility issues:
                    1. Color contrast problems
                    2. Missing labels or alt text indicators
                    3. Keyboard navigation issues (visible focus states)
                    4. Text size and readability
                    5. Screen reader compatibility concerns

                    List issues with specific locations and severity.
                    """
                }
            ]
        }]
    )
    return response.choices[0].message.content

Long Context Workflows

GPT-5.4 supports up to 1M token context windows (experimental).

Standard Context (272K tokens)

with open('large_codebase.py', 'r') as f:
    code = f.read()

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a code review assistant."},
        {"role": "user", "content": f"""
        Review this codebase for:
        1. Security vulnerabilities
        2. Performance issues
        3. Code style inconsistencies
        4. Missing error handling

        Code:
        {code}
        """}
    ],
    max_tokens=4000
)

Extended Context (1M tokens)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": large_document}],
    extra_body={
        "model_context_window": 1048576,  # 1M tokens
        "model_auto_compact_token_limit": 272000  # Auto-compact after 272K
    }
)
# Requests >272K tokens are billed at 2x usage rate
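Before opting into the extended window (and its 2x billing), it can help to estimate whether a prompt actually exceeds the standard 272K limit. A rough ~4-characters-per-token heuristic, not a real tokenizer:

```python
STANDARD_LIMIT = 272_000  # standard context size quoted above, in tokens

def rough_token_count(text):
    # ~4 characters per token is a common approximation for English text
    return len(text) // 4

def needs_extended_context(text, limit=STANDARD_LIMIT):
    """True if the prompt likely exceeds the standard window."""
    return rough_token_count(text) > limit
```

For precise counts, use a real tokenizer before sending; the heuristic is only for cheap pre-flight checks.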

Multi-Document Analysis

def analyze_multiple_documents(documents):
    content_parts = []
    for i, doc in enumerate(documents):
        content_parts.append(f"=== Document {i+1}: {doc['title']} ===\n")
        content_parts.append(doc['content'][:50000])
        content_parts.append("\n\n")
    combined_content = "".join(content_parts)
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{
            "role": "user",
            "content": f"""
            Analyze these documents and provide:
            1. Summary of key themes across all documents
            2. Contradictions or inconsistencies between documents
            3. Action items mentioned in any document
            4. Timeline of events if applicable

            {combined_content}
            """
        }],
        max_tokens=8000
    )
    return response.choices[0].message.content

Coding and Development Workflows

GPT-5.4 matches GPT-5.3-Codex on SWE-Bench Pro (57.7%) and adds computer use capabilities.

Frontend Generation

def generate_frontend_component(spec):
    prompt = f"""
    Create a complete React component based on this specification:

    {spec}

    Requirements:
    1. Functional component with hooks
    2. TypeScript types for all props and state
    3. Tailwind CSS for styling
    4. Responsive design (mobile, tablet, desktop)
    5. Accessibility (ARIA labels, keyboard navigation)
    6. Unit tests with Jest/React Testing Library

    Return complete code for:
    - Component file (.tsx)
    - Styles (if not Tailwind)
    - Test file (.test.tsx)
    """

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=6000
    )
    return response.choices[0].message.content

theme_park_spec = """
Create an interactive isometric theme park simulation game:
- Tile-based path placement
- Ride and scenery construction
- Guest pathfinding and queueing
- Park metrics (money, guests, happiness, cleanliness)
- Browser-playable with Playwright testing
- Generated isometric assets
"""

component_code = generate_frontend_component(theme_park_spec)

Debugging Complex Issues

def debug_with_full_context(error_logs, codebase_files, stack_trace):
    context = f"""
    ERROR LOGS:
    {error_logs}

    STACK TRACE:
    {stack_trace}

    RELEVANT CODE FILES:
    {codebase_files}

    Task: Identify the root cause and provide a fix.
    Consider:
    1. Race conditions or timing issues
    2. Memory leaks or resource exhaustion
    3. Incorrect assumptions about data flow
    4. Edge cases not handled
    5. External dependency issues

    Provide:
    1. Root cause analysis
    2. Specific code changes needed
    3. Tests to prevent regression
    """

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": context}],
        max_tokens=4000
    )

    return response.choices[0].message.content

Playwright Interactive Testing

def playwright_interactive_debug():
    prompt = """
    Build a todo web application and test it as you build:

    1. Create HTML structure
    2. Add CSS styling
    3. Implement JavaScript functionality
    4. After each feature, use Playwright to:
       - Verify element visibility
       - Test user interactions
       - Check state persistence
       - Validate edge cases

    Report any issues found during testing and fix them.
    """

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": prompt}],
        tools=[{"type": "playwright_interactive"}],
        max_tokens=8000
    )

    return response.choices[0].message.content

Streaming Responses

Streaming reduces perceived latency for long completions by delivering tokens as they are generated.

Python Streaming

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Write a detailed explanation of quantum computing."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js Streaming

const stream = await client.chat.completions.create({
    model: 'gpt-5.4',
    messages: [{ role: 'user', content: 'Write a detailed explanation of quantum computing.' }],
    stream: true
});

for await (const chunk of stream) {
    if (chunk.choices[0].delta.content) {
        process.stdout.write(chunk.choices[0].delta.content);
    }
}

Streaming with Token Counting

def stream_with_usage(stream):
    # The final usage chunk arrives only if the request set
    # stream_options={"include_usage": True}
    total_tokens = 0
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            total_tokens += len(content) // 4  # Rough estimate (~4 chars/token)
        if chunk.usage:
            print(f"\n\nUsage: {chunk.usage.total_tokens} tokens")
    return total_tokens

Error Handling and Retry Logic

Production code should implement robust error handling.

Comprehensive Error Handling

from openai import OpenAI, RateLimitError, APIError, AuthenticationError
import time

client = OpenAI()

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
                max_tokens=2000,
                temperature=0.7
            )
            return response

        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except APIError as e:
            if e.status_code >= 500:
                if attempt == max_retries - 1:
                    raise
                wait_time = 2 ** attempt
                time.sleep(wait_time)
            else:
                raise

        except AuthenticationError:
            print("Invalid API key. Check your credentials.")
            raise

        except Exception as e:
            print(f"Unexpected error: {e}")
            raise

    raise Exception("Max retries exceeded")

# Usage
try:
    response = make_request_with_retry([
        {"role": "user", "content": "Hello, GPT-5.4!"}
    ])
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Request failed: {e}")

Timeout Handling

import httpx

client = OpenAI(
    timeout=httpx.Timeout(60.0, connect=10.0)
)

try:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Long-running task..."}]
    )
except httpx.TimeoutException:
    print("Request timed out. Consider using streaming or reducing complexity.")

Production Best Practices

Using Apidog for Production API Workflows

Before deploying, set up test automation and monitoring:

API Testing Pipeline:

  • Use Apidog to build test suites for all scenarios (success/error)
  • Automate API tests in CI/CD to catch breaking changes
  • Mock GPT-5.4 responses for integration tests to save tokens
  • Generate docs automatically from tested requests

Apidog Testing Pipeline

Team Collaboration:

  • Share API collections for consistent integration
  • Use environments for dev/staging/prod keys
  • Document request behaviors and edge cases

Teams using Apidog report 40-60% faster API integration cycles due to unified debugging, automation, and documentation.


Cost Optimization Strategies

Prompt Optimization

# Bad prompt (wastes tokens)
bad_prompt = """
Hello! I hope you're doing well. I was wondering if you could possibly help me
with something. I have this code here and I'm not quite sure what it does.
Could you please explain it to me? Here's the code:
""" + code

# Good prompt (concise)
good_prompt = f"Explain what this code does:\n{code}"

# Save ~50 tokens per request (~$125/month at 1M requests)
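The savings figure in the comment checks out against the pricing quoted in the Quick Answer ($2.50/M input, $15/M output); a small helper makes such estimates repeatable:

```python
# Rates from the pricing quoted earlier in this guide
PRICE_INPUT = 2.50 / 1_000_000    # $ per input token
PRICE_OUTPUT = 15.00 / 1_000_000  # $ per output token

def request_cost(input_tokens, output_tokens):
    """Estimated cost of a single request, in dollars."""
    return input_tokens * PRICE_INPUT + output_tokens * PRICE_OUTPUT

def monthly_prompt_savings(tokens_saved_per_request, requests_per_month):
    """Dollar savings from trimming input tokens across a month of traffic."""
    return tokens_saved_per_request * PRICE_INPUT * requests_per_month
```

`monthly_prompt_savings(50, 1_000_000)` reproduces the ~$125/month figure from the comment above.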

Response Length Control

# Limit max tokens
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Summarize this article."}],
    max_tokens=200
)

# Use stop sequences
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "List 5 items."}],
    stop=["\n\n", "6."]
)

Batch Processing

from openai import OpenAI
import json

client = OpenAI()
batch_requests = []
for article in articles:
    batch_requests.append({
        "custom_id": article["id"],
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5.4",
            "messages": [{"role": "user", "content": article["content"]}]
        }
    })

# The Batch API expects an uploaded JSONL file: one request object per line
with open("batch_requests.jsonl", "w") as f:
    for req in batch_requests:
        f.write(json.dumps(req) + "\n")

batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
# 50% cost savings for non-real-time jobs
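Batch jobs complete asynchronously, so the submitting process needs to poll for a terminal status. A minimal sketch, assuming the standard Batches API (`client.batches.retrieve`):

```python
import time

def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll until the batch reaches a terminal status, then return it."""
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)
```

Once the returned batch is `completed`, its `output_file_id` points to a JSONL file of responses keyed by the `custom_id` values set above.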

Caching Repeated Requests

import hashlib
import json

class ResponseCache:
    def __init__(self):
        self.cache = {}

    def _get_key(self, messages):
        # sort_keys keeps the hash stable across dict key orderings
        return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

    def get_or_create(self, client, messages, **kwargs):
        key = self._get_key(messages)
        if key in self.cache:
            return self.cache[key]
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=messages,
            **kwargs
        )
        self.cache[key] = response
        return response

# Usage
cache = ResponseCache()
response = cache.get_or_create(client, messages)

Conclusion

GPT-5.4 enables new classes of AI automation: browser workflows, cross-app integration, advanced vision, and massive context. Tool search reduces costs. 1M-token context processes entire codebases.

Production integration requires thorough API testing, debugging, and documentation. Apidog provides a unified platform for end-to-end API lifecycle.


Whether you're building agents, automating tasks, or launching customer-facing features, solid API development practices reduce bugs and accelerate delivery.

Start with basic completions, then layer in computer use, tool search, and vision as needed. Monitor costs early and optimize prompts, batching, and caching.


FAQ

How do I use the GPT-5.4 computer use feature?

Use the computer tool in API requests. Send screenshots as images, receive computer commands (click, type, scroll) in response. Execute commands using pyautogui or Playwright, then send new screenshots. Loop until done. Set safety policies as needed.

What is tool search and how do I enable it?

Tool search loads tool definitions on demand, reducing token usage by 47%. Enable by sending a lightweight tool list; the model requests full definitions if needed. Works automatically with MCP servers.

How do I use the 1M token context window?

Configure via extra_body parameters: model_context_window: 1048576, model_auto_compact_token_limit: 272000. Requests >272K tokens count at 2x usage. Experimental in Codex.

What is the difference between gpt-5.4 and gpt-5.4-pro?

GPT-5.4 Pro delivers higher accuracy on complex reasoning (89.3% vs 82.7% on BrowseComp) but costs 12x more ($30/$180 vs $2.50/$15). Use standard for most workloads; Pro for maximum accuracy.

How do I reduce GPT-5.4 API costs?

Cache repeated inputs (up to 90% savings), optimize prompts, set max_tokens, use Batch API (50% discount), implement response caching, and set appropriate image detail levels.

Can GPT-5.4 process multiple images in one request?

Yes. Include multiple image_url content parts in a single message—ideal for multi-page documents, comparisons, or sequential screenshots.

How do I handle rate limits in production?

Use exponential backoff retries (1s, 2s, 4s...), Batch API for bulk jobs, distribute requests to smooth traffic, and request higher limits for high-volume apps.

What programming languages does GPT-5.4 support best?

Best: Python, JavaScript/TypeScript, React, Node.js, common web stacks. Also strong in Java, Go, Rust, SQL. Matches GPT-5.3-Codex performance (57.7% SWE-Bench Pro).

How do I stream GPT-5.4 responses?

Set stream=True in requests. Iterate over response chunks and process delta. Reduces latency for long responses.

Is GPT-5.4 suitable for production workloads?

Yes. GPT-5.4 has 33% fewer factual errors than GPT-5.2, is more token-efficient, and supports robust error handling. Implement retry logic, monitoring, and cost tracking for production use.
