<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Wanda</title>
    <description>The latest articles on DEV Community by Wanda (@apilover).</description>
    <link>https://dev.to/apilover</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1823324%2Fa1099364-7e04-4aec-990e-c38b9182db4f.jpg</url>
      <title>DEV Community: Wanda</title>
      <link>https://dev.to/apilover</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/apilover"/>
    <language>en</language>
    <item>
      <title>How to use the Grok text to video API (complete guide)</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 08:39:26 +0000</pubDate>
      <link>https://dev.to/apilover/how-to-use-the-grok-text-to-video-api-complete-guide-2io5</link>
      <guid>https://dev.to/apilover/how-to-use-the-grok-text-to-video-api-complete-guide-2io5</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The Grok text-to-video API lets you generate video clips from text prompts. You &lt;code&gt;POST /v1/videos/generations&lt;/code&gt; with your prompt, receive a &lt;code&gt;request_id&lt;/code&gt;, then poll &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt; until &lt;code&gt;"status"&lt;/code&gt; is &lt;code&gt;"done"&lt;/code&gt;. Model: &lt;code&gt;grok-imagine-video&lt;/code&gt;. Pricing starts at $0.05/sec at 480p. The xAI Python SDK handles polling out of the box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;xAI generated 1.2 billion videos in January 2026, the first month after launching the Grok text-to-video API. Grok also ranked #1 on the Artificial Analysis text-to-video leaderboard that month, a strong signal that the platform handles production-level scale without sacrificing output quality.&lt;/p&gt;

&lt;p&gt;This is a hands-on guide for developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make your first video generation request&lt;/li&gt;
&lt;li&gt;Poll for results and handle async flow&lt;/li&gt;
&lt;li&gt;Tune parameters (duration, resolution, aspect ratio)&lt;/li&gt;
&lt;li&gt;Write effective prompts&lt;/li&gt;
&lt;li&gt;Use reference images&lt;/li&gt;
&lt;li&gt;Extend or edit existing videos&lt;/li&gt;
&lt;li&gt;Know when to use text-to-video vs image-to-video&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Note:&lt;/strong&gt; The Grok video API is async. Your frontend/app shouldn't block waiting for video generation. For UI dev/testing, use &lt;a href="https://apidog.com/api-mocking/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Smart Mock&lt;/a&gt; to instantly mock generation and poll endpoints—no real credits needed. This allows full UI and workflow testing while your backend is still in progress.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What is the Grok Text-to-Video API?
&lt;/h2&gt;

&lt;p&gt;The Grok text-to-video API is part of xAI's media generation suite at &lt;code&gt;https://api.x.ai&lt;/code&gt;. Send a text prompt—&lt;code&gt;grok-imagine-video&lt;/code&gt; generates a video from scratch (no image required).&lt;/p&gt;

&lt;p&gt;Other endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Synchronous image generation: &lt;code&gt;POST /v1/images/generations&lt;/code&gt; (&lt;code&gt;grok-imagine-image&lt;/code&gt;, $0.02/image)&lt;/li&gt;
&lt;li&gt;Video extension/editing endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key difference: text-to-video creates everything from your prompt alone. If you want to animate a specific image instead, see the &lt;a href="https://apidog.com/blog/grok-image-to-video-api/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok image to video API guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Text-to-Video Generation Works (Async Pattern)
&lt;/h2&gt;

&lt;p&gt;Video generation is async:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a &lt;code&gt;POST&lt;/code&gt; request with your prompt.&lt;/li&gt;
&lt;li&gt;API returns a &lt;code&gt;request_id&lt;/code&gt; immediately.&lt;/li&gt;
&lt;li&gt;Video is generated server-side.&lt;/li&gt;
&lt;li&gt;Poll &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt; until &lt;code&gt;"status"&lt;/code&gt; is &lt;code&gt;"done"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;When done, response includes the video URL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your frontend/app must handle this async flow (e.g., show loading, then display video when ready).&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before coding, set up:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;xAI Account&lt;/strong&gt; — &lt;a href="https://console.x.ai" rel="noopener noreferrer"&gt;console.x.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt; — In the xAI console, create and save your key, then pass it as a Bearer token in the &lt;code&gt;Authorization&lt;/code&gt; header.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example (set as environment variable):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Optional) Install the xAI Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;xai-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi4f1gdevs2pth1c3l0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvi4f1gdevs2pth1c3l0f.png" alt="xAI Console" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Your First Text-to-Video Request
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Endpoint:&lt;/strong&gt; &lt;code&gt;POST https://api.x.ai/v1/videos/generations&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Required fields:&lt;/strong&gt; &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.x.ai/v1/videos/generations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-imagine-video",
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"d97415a1-5796-b7ec-379f-4e6819e08fdf"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Python (requests)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.x.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-imagine-video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A golden retriever running through autumn leaves in slow motion, cinematic lighting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/videos/generations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generation started. Request ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Polling for the Video Result
&lt;/h2&gt;

&lt;p&gt;Poll &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt; until &lt;code&gt;"status"&lt;/code&gt; is &lt;code&gt;"done"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status values:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"processing"&lt;/code&gt;: still generating&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"done"&lt;/code&gt;: complete, video URL available&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"failed"&lt;/code&gt;: error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Python polling loop:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.x.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;poll_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/videos/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;progress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: status=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, progress=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;progress&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video generation failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video not ready after &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/videos/generations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-imagine-video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;poll_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video ready: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;video_url&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;video_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A timelapse of a city skyline at sunset transitioning to night, aerial view&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Poll response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"video"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vidgen.x.ai/....mp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"respect_moderation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"progress"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cost_in_usd_ticks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500000000&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Using the xAI Python SDK
&lt;/h2&gt;

&lt;p&gt;The SDK handles polling for you. &lt;code&gt;client.video.generate()&lt;/code&gt; blocks until the video is ready.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;xai_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-imagine-video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A golden retriever running through autumn leaves in slow motion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resolution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;720p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duration: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the SDK for quick starts. Use raw requests for advanced polling, retries, or custom status handling.&lt;/p&gt;
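
&lt;p&gt;With raw requests, one refinement worth sketching is a backoff schedule: poll quickly at first, then stretch the interval so long generations don't burn requests. This helper is our own convention, not an xAI API feature:&lt;/p&gt;

```python
def backoff_intervals(base=2.0, factor=1.5, cap=30.0, attempts=20):
    # Yield geometrically growing sleep durations, clamped to a cap.
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay = delay * factor

# Usage inside a polling loop:
#   for delay in backoff_intervals():
#       data = requests.get(url, headers=headers).json()
#       if data.get("status") in ("done", "failed"):
#           break
#       time.sleep(delay)
```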




&lt;h2&gt;
  
  
  Writing Effective Prompts for Video Generation
&lt;/h2&gt;

&lt;p&gt;Prompt quality determines output quality. Use a structured approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scene description:&lt;/strong&gt; Specify subject and setting.

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“A white ceramic coffee mug on a wooden table beside a rain-streaked window”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Motion:&lt;/strong&gt; Describe what moves and how.

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“The camera slowly orbits the mug as steam curls upward”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Camera style:&lt;/strong&gt; Use film/video terms.

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“Close-up,” “tracking shot,” “overhead drone view,” etc.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Lighting and mood:&lt;/strong&gt; Specify light and vibe.

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“Golden hour,” “overcast,” “neon-lit,” etc.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Style references:&lt;/strong&gt; Add style cues.

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“Cinematic,” “anime,” “stop-motion,” etc.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prompt structure example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A lone astronaut floats past the International Space Station,
tether drifting behind them. The camera tracks slowly
alongside, showing Earth below. Cinematic, IMAX quality,
warm sunrise light reflecting off the visor.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Controlling Resolution, Duration, and Aspect Ratio
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Parameters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;duration&lt;/strong&gt; (seconds): &lt;code&gt;1&lt;/code&gt;–&lt;code&gt;15&lt;/code&gt; (default: &lt;code&gt;6&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;resolution&lt;/strong&gt;: &lt;code&gt;"480p"&lt;/code&gt; (default) or &lt;code&gt;"720p"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aspect_ratio&lt;/strong&gt;: &lt;code&gt;"16:9"&lt;/code&gt; (default), &lt;code&gt;"9:16"&lt;/code&gt;, &lt;code&gt;"1:1"&lt;/code&gt;, &lt;code&gt;"4:3"&lt;/code&gt;, &lt;code&gt;"3:4"&lt;/code&gt;, &lt;code&gt;"3:2"&lt;/code&gt;, &lt;code&gt;"2:3"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Aspect Ratio Table:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;16:9&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Desktop, YouTube, presentations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;9:16&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TikTok, Instagram Reels, mobile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1:1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Instagram feed, social cards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;4:3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Classic video, presentations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3:4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Portrait mobile content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3:2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standard photo ratio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2:3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Portrait photography&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example (all parameters):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.x.ai/v1/videos/generations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-imagine-video",
    "prompt": "A coastal town at dawn, waves breaking gently on a rocky shore",
    "duration": 10,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Using Reference Images to Guide Video Style
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;reference_images&lt;/code&gt; parameter accepts up to 7 image URLs to guide style, color, and texture (the prompt still drives the content):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grok-imagine-video"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A coastal town at dawn, waves breaking gently on a rocky shore"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reference_images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/my-style-reference.jpg"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/color-palette-reference.jpg"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Use visually consistent images for best results.&lt;/li&gt;
&lt;li&gt;Reference images influence visual style, not the subject.&lt;/li&gt;
&lt;li&gt;For turning a specific image into a video, use the image-to-video endpoint.&lt;/li&gt;
&lt;/ul&gt;
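A small helper can validate the reference list before spending credits. The payload shape (`{"url": ...}` objects, at most 7) follows the JSON example above; the helper name is illustrative, not part of any SDK.

```python
def build_reference_payload(prompt, reference_urls):
    """Build a text-to-video request body with style references attached."""
    if len(reference_urls) > 7:
        raise ValueError("reference_images accepts at most 7 images")
    return {
        "model": "grok-imagine-video",
        "prompt": prompt,
        "reference_images": [{"url": u} for u in reference_urls],
    }
```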




&lt;h2&gt;
  
  
  Extending and Editing Generated Videos
&lt;/h2&gt;

&lt;p&gt;Two async endpoints for working with existing videos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extend a video:&lt;/strong&gt; &lt;code&gt;POST /v1/videos/extensions&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Pass the original &lt;code&gt;request_id&lt;/code&gt; and a new prompt to add footage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Edit a video:&lt;/strong&gt; &lt;code&gt;POST /v1/videos/edits&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Modify style or content of an existing video with a text instruction.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Both return a new &lt;code&gt;request_id&lt;/code&gt; and follow the same poll-until-done pattern.&lt;/p&gt;
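As a sketch, an extension call might look like this. The body fields (`request_id`, `prompt`) come from the bullet above; carrying over the `model` field from the generation examples is an assumption.

```python
import os
import requests

def extend_video(original_request_id, prompt):
    """Request extra footage for an existing video; returns the new request_id."""
    resp = requests.post(
        "https://api.x.ai/v1/videos/extensions",
        json={
            "model": "grok-imagine-video",       # assumed: same model as generation
            "request_id": original_request_id,   # the video to extend
            "prompt": prompt,                    # what the added footage should show
        },
        headers={"Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}"},
    )
    resp.raise_for_status()
    return resp.json()["request_id"]  # poll this new id with the usual loop
```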




&lt;h2&gt;
  
  
  Reading the Cost from the API Response
&lt;/h2&gt;

&lt;p&gt;Check the &lt;code&gt;usage&lt;/code&gt; object in the final poll response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cost_in_usd_ticks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500000000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;1 USD tick = 0.000000001 USD. (Divide ticks by 1,000,000,000 to get dollars.)&lt;/li&gt;
&lt;li&gt;Example: &lt;code&gt;500000000&lt;/code&gt; ticks = &lt;code&gt;$0.50&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Python conversion:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cost_in_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_in_usd_ticks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10_000_000&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cost: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cost_in_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Output: Cost: $0.0500
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pricing Reference:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resolution&lt;/th&gt;
&lt;th&gt;Price per second&lt;/th&gt;
&lt;th&gt;10-second clip&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;480p&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;720p&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;$0.70&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
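For a sanity check before submitting a job, the pricing table above fits in a few lines. The rates are taken from the table; treat them as a snapshot, since pricing can change.

```python
# Per-second rates from the pricing table above (subject to change).
PRICE_PER_SECOND = {"480p": 0.05, "720p": 0.07}

def estimate_cost(duration_s, resolution="480p"):
    """Rough charge in USD for one clip at the given duration and resolution."""
    return round(duration_s * PRICE_PER_SECOND[resolution], 2)
```

Comparing this estimate against `cost_in_usd_ticks` from the final poll response is a cheap way to catch billing surprises early.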




&lt;h2&gt;
  
  
  How to Test Your Grok Video API with Apidog
&lt;/h2&gt;

&lt;p&gt;Async polling makes frontend testing tricky. Apidog's &lt;a href="https://apidog.com/api-mocking/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Smart Mock&lt;/a&gt; solves this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmit5bu2s8zqtmpzxfpbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmit5bu2s8zqtmpzxfpbs.png" alt="Apidog Mock Example" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 1: Smart Mock for Frontend Development
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mock the generation endpoint:&lt;/strong&gt; In Apidog, create &lt;code&gt;POST /v1/videos/generations&lt;/code&gt;. Define response with a &lt;code&gt;request_id&lt;/code&gt; string. Smart Mock generates a UUID automatically.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"d97415a1-5796-b7ec-379f-4e6819e08fdf"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mock the poll endpoint:&lt;/strong&gt; Create &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt;. Define schema with &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;video.url&lt;/code&gt;, &lt;code&gt;video.duration&lt;/code&gt;, &lt;code&gt;progress&lt;/code&gt;, &lt;code&gt;usage.cost_in_usd_ticks&lt;/code&gt;. Use a Custom Mock to return &lt;code&gt;"status": "done"&lt;/code&gt; with a fake MP4 URL.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"video"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vidgen.x.ai/mock-video-12345.mp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"respect_moderation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"progress"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost_in_usd_ticks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;400000000&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Frontend teams can now test all UI states (loading, done, error) without spending API credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Case 2: Test Scenarios for the Polling Loop
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; Add &lt;code&gt;POST /v1/videos/generations&lt;/code&gt; as the first step. Extract &lt;code&gt;request_id&lt;/code&gt; with JSONPath &lt;code&gt;$.request_id&lt;/code&gt; into &lt;code&gt;videoRequestId&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; Add &lt;code&gt;GET /v1/videos/{{videoRequestId}}&lt;/code&gt; with a loop. Break when &lt;code&gt;response.body.status == "done"&lt;/code&gt;. Add a 5s wait between polls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3:&lt;/strong&gt; Assert &lt;code&gt;$.video.url&lt;/code&gt; is not empty to confirm successful completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automate this in CI to catch regressions in your async flow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Text-to-Video vs Image-to-Video: When to Use Each
&lt;/h2&gt;

&lt;p&gt;Both use &lt;code&gt;grok-imagine-video&lt;/code&gt;, but for different cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text-to-video:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For creating original content from a prompt/concept&lt;/li&gt;
&lt;li&gt;When you want full model creativity&lt;/li&gt;
&lt;li&gt;For prompt-based content tools&lt;/li&gt;
&lt;li&gt;No source image needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image-to-video:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To animate a specific image, product, or artwork&lt;/li&gt;
&lt;li&gt;To preserve visual details from an existing image&lt;/li&gt;
&lt;li&gt;For consistent animation across a set of images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Routing tip:&lt;/strong&gt; If the user uploads an image, use image-to-video; if they only type a prompt, use text-to-video.&lt;/p&gt;
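That routing rule is a one-liner in code. The payload shapes (`"image": {"url": ...}` for image-to-video) follow the examples in this guide; the function name is illustrative.

```python
from typing import Optional

def build_video_payload(prompt: str, image_url: Optional[str] = None) -> dict:
    """Route to image-to-video when the user supplied an image, else text-to-video."""
    payload = {"model": "grok-imagine-video", "prompt": prompt}
    if image_url:  # image-to-video: the image becomes the first frame
        payload["image"] = {"url": image_url}
    return payload
```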

&lt;p&gt;See the &lt;a href="https://dev.to/apilover/how-to-use-the-grok-image-to-video-api-step-by-step-guide-21n8"&gt;Grok image to video API guide&lt;/a&gt; for more.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Errors and Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;401 Unauthorized:&lt;/strong&gt; API key missing/invalid. Check Authorization header (&lt;code&gt;Bearer YOUR_XAI_API_KEY&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;429 Too Many Requests:&lt;/strong&gt; Rate limit exceeded (60/min, 1/sec). Add delay between polls (≥5s).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;status: "failed"&lt;/code&gt;:&lt;/strong&gt; Prompt rejected by moderation. Revise to remove ambiguous or sensitive terms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video URL 404:&lt;/strong&gt; URLs are temporary. Download video immediately after generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empty/frozen video:&lt;/strong&gt; Add explicit motion to your prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow polling:&lt;/strong&gt; 720p and longer durations take more time. Use 480p/short clips for dev/testing.&lt;/li&gt;
&lt;/ul&gt;
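One way to honor the 429 limit is to retry the poll with exponential backoff instead of failing. The status codes come from the list above; honoring a `Retry-After` header is an assumption, so the sketch falls back to its own delay when the header is absent.

```python
import time
import requests

def poll_with_backoff(url, headers, max_retries=5):
    """GET a status URL, retrying on HTTP 429 with exponential backoff."""
    delay = 5.0  # start at the recommended >= 5s poll interval
    for _ in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Retry-After is an assumption; fall back to our own delay if missing
        time.sleep(float(resp.headers.get("Retry-After", delay)))
        delay *= 2  # back off further on repeated 429s
    raise RuntimeError("still rate-limited after retries")
```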




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Grok text-to-video API provides a simple async pattern: send a prompt, get a &lt;code&gt;request_id&lt;/code&gt;, poll until done, fetch your MP4. Once you structure the polling loop, other features—duration, resolution, aspect ratio, reference images—are easy to integrate.&lt;/p&gt;

&lt;p&gt;For production, always track costs via &lt;code&gt;cost_in_usd_ticks&lt;/code&gt;. Use Apidog to mock endpoints and run automated test scenarios so your frontend and backend teams can develop in parallel.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What model name do I use for text-to-video generation?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;code&gt;grok-imagine-video&lt;/code&gt; as the &lt;code&gt;model&lt;/code&gt; field in &lt;code&gt;POST /v1/videos/generations&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does video generation take?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Depends on duration and resolution. 480p may finish in &amp;lt; 30s; 720p and/or longer clips may take minutes. Poll every 5–10 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I generate a video longer than 15 seconds?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Max &lt;code&gt;duration&lt;/code&gt; is 15s. For longer, generate then extend using &lt;code&gt;POST /v1/videos/extensions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I download the generated video?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Download the MP4 from &lt;code&gt;result.video.url&lt;/code&gt; immediately—URLs expire.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if my prompt violates content moderation?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Status will be &lt;code&gt;"failed"&lt;/code&gt;. Check &lt;code&gt;respect_moderation&lt;/code&gt; in the poll response. Revise your prompt and try again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a free tier for the video API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No official free tier, but check &lt;a href="https://console.x.ai" rel="noopener noreferrer"&gt;console.x.ai&lt;/a&gt; for current offers/credits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do &lt;code&gt;reference_images&lt;/code&gt; differ from a source image?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Reference images guide style for text-to-video. Source images (image-to-video) become the actual first video frame.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best way to test the polling loop without spending credits?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;a href="https://apidog.com/api-mocking/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Smart Mock&lt;/a&gt; to mock generation and poll endpoints, covering all states.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to use the Grok image to video API (step-by-step guide)</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 08:35:12 +0000</pubDate>
      <link>https://dev.to/apilover/how-to-use-the-grok-image-to-video-api-step-by-step-guide-21n8</link>
      <guid>https://dev.to/apilover/how-to-use-the-grok-image-to-video-api-step-by-step-guide-21n8</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The Grok image-to-video API, powered by the &lt;code&gt;grok-imagine-video&lt;/code&gt; model, animates a static image into a video clip. POST your image URL, prompt, and settings to &lt;code&gt;https://api.x.ai/v1/videos/generations&lt;/code&gt;. The API returns a &lt;code&gt;request_id&lt;/code&gt; immediately; poll &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt; until &lt;code&gt;status&lt;/code&gt; is &lt;code&gt;"done"&lt;/code&gt;. Duration: 1–15 seconds. Pricing: from $0.05/sec for 480p output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;On January 28, 2026, xAI launched the &lt;code&gt;grok-imagine-video&lt;/code&gt; model for public API access. In its first month, it generated 1.2 billion videos and topped the Artificial Analysis text-to-video leaderboard. With image-to-video, you send the API a photo and a descriptive prompt, and it animates your image into an MP4 video.&lt;/p&gt;

&lt;p&gt;This async workflow means your integration isn't finished when the POST returns 200—you must handle &lt;code&gt;"processing"&lt;/code&gt;, &lt;code&gt;"done"&lt;/code&gt;, and &lt;code&gt;"failed"&lt;/code&gt; states robustly.&lt;/p&gt;

&lt;p&gt;Apidog's Test Scenarios let you automate this: POST to &lt;code&gt;/v1/videos/generations&lt;/code&gt;, extract the &lt;code&gt;request_id&lt;/code&gt;, poll until &lt;code&gt;status == "done"&lt;/code&gt;, then assert the video URL is present.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Grok image-to-video API?
&lt;/h2&gt;

&lt;p&gt;Grok image-to-video is part of xAI's video generation suite. The &lt;code&gt;grok-imagine-video&lt;/code&gt; model accepts an image as the first frame and animates it based on your prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoint:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST https://api.x.ai/v1/videos/generations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authenticate with a Bearer token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authorization: Bearer YOUR_XAI_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your API key from the &lt;a href="https://console.x.ai" rel="noopener noreferrer"&gt;xAI console&lt;/a&gt;. This API also supports text-to-video (omit the &lt;code&gt;image&lt;/code&gt; parameter), video extensions, and edits.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the image-to-video process works
&lt;/h2&gt;

&lt;p&gt;Set the &lt;code&gt;image&lt;/code&gt; parameter in your POST body to define the &lt;strong&gt;first frame&lt;/strong&gt; of your video. The model starts from your image and predicts natural motion according to your prompt.&lt;/p&gt;

&lt;p&gt;Example: upload a mountain lake photo and prompt "gentle ripples spread across the water as morning mist drifts." The video starts exactly with your photo, then animates the scene.&lt;/p&gt;

&lt;p&gt;Use image-to-video when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have product photos, landscapes, or portraits you want animated.&lt;/li&gt;
&lt;li&gt;Brand assets require a consistent first frame.&lt;/li&gt;
&lt;li&gt;Motion should be grounded in a specific scene.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use text-to-video when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don’t have a source image or just want to brainstorm.&lt;/li&gt;
&lt;li&gt;Scene composition isn’t predetermined.&lt;/li&gt;
&lt;li&gt;Fast iteration matters more than first-frame precision.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before your first call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;xAI account:&lt;/strong&gt; &lt;a href="https://console.x.ai" rel="noopener noreferrer"&gt;console.x.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API key:&lt;/strong&gt; from the xAI console (store in an environment variable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.8+&lt;/strong&gt; or &lt;strong&gt;Node.js 18+&lt;/strong&gt; (examples below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public image URL&lt;/strong&gt; or base64-encoded image (data URI)&lt;/li&gt;
&lt;/ol&gt;
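If you only have a local file, the last prerequisite can be met by building a base64 data URI yourself. A short sketch; the exact set of accepted MIME types is an assumption, so confirm against the xAI docs.

```python
import base64

def to_data_uri(path, mime="image/jpeg"):
    """Encode a local image file as a data URI for the "image" parameter."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```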

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozcnsdoclf1obpc6q6w3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozcnsdoclf1obpc6q6w3.png" alt="Grok image-to-video UI" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the xAI Python SDK if you want higher-level access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;xai-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For raw HTTP, you only need &lt;code&gt;requests&lt;/code&gt; (Python) or &lt;code&gt;fetch&lt;/code&gt; (Node.js).&lt;/p&gt;

&lt;h2&gt;
  
  
  Making your first image-to-video request
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.x.ai/v1/videos/generations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-imagine-video",
    "prompt": "Gentle waves move across the surface, morning mist rises slowly",
    "image": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg"
    },
    "duration": 6,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"d97415a1-5796-b7ec-379f-4e6819e08fdf"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The video is generated asynchronously—poll to check status.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Python (raw requests)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-imagine-video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gentle waves move across the surface, morning mist rises slowly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/24701-nature-natural-beauty.jpg/1280px-24701-nature-natural-beauty.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resolution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;720p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aspect_ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.x.ai/v1/videos/generations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Job started: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using a base64 image
&lt;/h3&gt;

&lt;p&gt;Encode a local image as a data URI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_image.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;encoded&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Polling for the result
&lt;/h2&gt;

&lt;p&gt;After submitting your request, poll the status endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET https://api.x.ai/v1/videos/{request_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Status values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"processing"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Video is still rendering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"done"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Video ready; URL in response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"failed"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Something went wrong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Completed response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"video"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vidgen.x.ai/....mp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"progress"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full Python polling loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;poll_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.x.ai/v1/videos/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Status: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Progress: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;progress&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video generation failed for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;poll_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duration: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Keep intervals ≥5 seconds to avoid API rate limits (60 requests/minute).&lt;/p&gt;
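As an alternative to a fixed interval, a backoff schedule keeps early polls responsive while staying well under the limit. The sketch below uses names of our own choosing, and the status-fetching callable is injected so the loop itself contains no HTTP code:

```python
import time

def poll_with_backoff(fetch_status, base=5.0, cap=30.0, timeout=600.0):
    """Poll until the job finishes, doubling the delay after each attempt.

    fetch_status: a zero-argument callable returning the parsed status
    JSON (e.g. a wrapper around GET /v1/videos/{request_id}).
    Delays grow base, 2*base, 4*base, ... capped at `cap`, which keeps
    the request rate comfortably under 60 requests/minute.
    """
    delay = base
    waited = 0.0
    while waited <= timeout:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data["video"]
        if status == "failed":
            raise RuntimeError("Video generation failed")
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, cap)
    raise TimeoutError("Gave up waiting for the video")
```

With the defaults this polls at 5s, 10s, 20s, then every 30s until the job resolves or the timeout elapses.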

&lt;h2&gt;
  
  
  Using the xAI Python SDK
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;xai-sdk&lt;/code&gt; library abstracts polling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;xai_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-imagine-video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gentle waves move across the surface, morning mist rises slowly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/landscape.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resolution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;720p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aspect_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Video URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duration: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the SDK for simple blocking calls; use raw HTTP for custom polling or logging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Controlling resolution, duration, and aspect ratio
&lt;/h2&gt;

&lt;p&gt;Grok's API gives you flexibility:&lt;/p&gt;

&lt;h3&gt;
  
  
  Duration
&lt;/h3&gt;

&lt;p&gt;Integers 1–15 seconds; default is 6.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resolution
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"480p"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default; $0.05/sec, faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"720p"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Higher quality; $0.07/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"720p"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aspect ratio
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"16:9"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default, landscape&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"9:16"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vertical, stories/reels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"1:1"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Square, social&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"4:3"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Presentations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"3:4"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Portrait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"3:2"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Photography crop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"2:3"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tall portrait&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you don't set it explicitly, the aspect ratio defaults to match your source image.&lt;/p&gt;
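When you do set it explicitly, a small helper can pick the supported value closest to your source image's dimensions. This is our own sketch, with the candidate list taken from the table above:

```python
# Supported aspect ratios, taken from the table above
SUPPORTED_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"]

def closest_aspect_ratio(width, height):
    """Return the supported ratio closest to width/height."""
    target = width / height

    def distance(ratio):
        w, h = ratio.split(":")
        return abs(int(w) / int(h) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

For a 1920x1080 source this returns `"16:9"`; for a 1080x1920 portrait shot it returns `"9:16"`.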

&lt;h2&gt;
  
  
  Using reference images for style guidance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;image&lt;/code&gt;&lt;/strong&gt;: source photograph (first frame)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;reference_images&lt;/code&gt;&lt;/strong&gt;: up to 7 images to guide style (not used as frames)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grok-imagine-video"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A product rotating slowly on a clean white surface"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/product-shot.jpg"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reference_images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/brand-style-reference-1.jpg"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/lighting-reference.jpg"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"720p"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reference images influence style/appearance, not the actual video frames.&lt;/p&gt;
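Since an invalid payload only surfaces as a 422 after the round trip, it can help to validate locally first. The helper below is our own (not part of any SDK) and checks the limits described in this guide: required fields, duration 1–15, and at most 7 reference images:

```python
def validate_payload(payload):
    """Sanity-check a generation payload against the limits in this guide."""
    for field in ("model", "prompt"):
        if field not in payload:
            raise ValueError(f"payload is missing required field {field!r}")
    duration = payload.get("duration", 6)
    if not isinstance(duration, int) or not 1 <= duration <= 15:
        raise ValueError("duration must be an integer from 1 to 15 seconds")
    refs = payload.get("reference_images", [])
    if len(refs) > 7:
        raise ValueError("at most 7 reference_images are allowed")
    return payload
```

Call it right before the POST so a typo fails in microseconds instead of after a network round trip.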

&lt;h2&gt;
  
  
  Extending and editing videos
&lt;/h2&gt;

&lt;p&gt;You can go beyond initial generation:&lt;/p&gt;

&lt;h3&gt;
  
  
  Extending a video
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;POST /v1/videos/extensions&lt;/code&gt; lets you add seconds to an existing clip (max 15 sec per call).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.x.ai/v1/videos/extensions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "The mist continues to lift as sunlight breaks through",
    "duration": 5
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Poll the same status endpoint for the extended clip.&lt;/p&gt;

&lt;h3&gt;
  
  
  Editing a video
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;POST /v1/videos/edits&lt;/code&gt; applies prompt-based modifications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.x.ai/v1/videos/edits &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-imagine-video",
    "video_id": "your_original_request_id",
    "prompt": "Change the sky to a dramatic sunset with deep orange tones"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extensions and edits are both async.&lt;/p&gt;
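Because generations, extensions, and edits all share the same submit-then-poll shape, one helper can cover all three. This is a sketch of our own; the `post` callable defaults to `requests.post` but is injectable so it can be exercised without the network:

```python
import requests

def submit_video_job(endpoint, payload, api_key, post=requests.post):
    """Submit an async video job and return the request_id to poll.

    endpoint: "generations", "extensions", or "edits".
    """
    response = post(
        f"https://api.x.ai/v1/videos/{endpoint}",
        json=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["request_id"]
```

The returned ID is polled on the same `GET /v1/videos/{request_id}` endpoint regardless of which operation produced it.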

&lt;h2&gt;
  
  
  Pricing breakdown: what a 10-second video costs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input image&lt;/td&gt;
&lt;td&gt;$0.002 per image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output 480p&lt;/td&gt;
&lt;td&gt;$0.05 per second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output 720p&lt;/td&gt;
&lt;td&gt;$0.07 per second&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;10s video @ 720p:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input image: $0.002&lt;/li&gt;
&lt;li&gt;Output: 10 × $0.07 = $0.70&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $0.702&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6s video @ 480p (default):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input image: $0.002&lt;/li&gt;
&lt;li&gt;Output: 6 × $0.05 = $0.30&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $0.302&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The input charge applies to every generation request, even with the same image.&lt;/p&gt;

&lt;p&gt;Text-to-video (no &lt;code&gt;image&lt;/code&gt;) skips the $0.002 input fee.&lt;/p&gt;
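The arithmetic above folds neatly into a small estimator. This is our own helper, using the rates from the pricing table:

```python
PER_SECOND = {"480p": 0.05, "720p": 0.07}  # output cost per second
IMAGE_FEE = 0.002                          # flat fee per input image

def estimate_cost(duration, resolution="480p", with_image=True):
    """Estimate the cost of one generation request in dollars.

    Text-to-video requests (with_image=False) skip the input-image fee.
    """
    if resolution not in PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    cost = duration * PER_SECOND[resolution]
    if with_image:
        cost += IMAGE_FEE
    return round(cost, 3)
```

This reproduces the worked examples: `estimate_cost(10, "720p")` gives 0.702 and `estimate_cost(6, "480p")` gives 0.302.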

&lt;h2&gt;
  
  
  How to test your Grok video API integration with Apidog
&lt;/h2&gt;

&lt;p&gt;The async workflow means your integration should verify three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generation request returns a &lt;code&gt;request_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Polling handles &lt;code&gt;"processing"&lt;/code&gt; correctly&lt;/li&gt;
&lt;li&gt;Final response has &lt;code&gt;status == "done"&lt;/code&gt; and a video URL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apidog's Test Scenarios automate this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Create a new Test Scenario&lt;br&gt;&lt;br&gt;
In Apidog, go to the Tests module and click &lt;code&gt;+&lt;/code&gt;. Name: "Grok image-to-video async flow".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Add the generation request  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;URL: &lt;code&gt;https://api.x.ai/v1/videos/generations&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Method: POST&lt;/li&gt;
&lt;li&gt;Header: &lt;code&gt;Authorization: Bearer {{xai_api_key}}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Body:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grok-imagine-video"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Gentle mist rises from the water as light filters through the trees"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/your-test-image.jpg"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"480p"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Extract the &lt;code&gt;request_id&lt;/code&gt;&lt;br&gt;&lt;br&gt;
Add an &lt;strong&gt;Extract Variable&lt;/strong&gt; processor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variable: &lt;code&gt;video_request_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Source: Response body&lt;/li&gt;
&lt;li&gt;Extraction: JSONPath&lt;/li&gt;
&lt;li&gt;JSONPath: &lt;code&gt;$.request_id&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; Build the polling loop&lt;br&gt;&lt;br&gt;
Add a &lt;strong&gt;For&lt;/strong&gt; loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inside, add GET:

&lt;ul&gt;
&lt;li&gt;URL: &lt;code&gt;https://api.x.ai/v1/videos/{{video_request_id}}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Method: GET&lt;/li&gt;
&lt;li&gt;Header: &lt;code&gt;Authorization: Bearer {{xai_api_key}}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Extract &lt;code&gt;video_status&lt;/code&gt; via JSONPath &lt;code&gt;$.status&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Wait&lt;/strong&gt; (5000ms) to avoid rate limits&lt;/li&gt;
&lt;li&gt;Loop break: &lt;code&gt;{{video_status}} == "done"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; Assert the video URL&lt;br&gt;&lt;br&gt;
After the loop, add GET to the same endpoint. Add &lt;strong&gt;Assertion&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field: &lt;code&gt;$.video.url&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Condition: Is not empty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Run the scenario:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Click Run. Apidog executes POST, extracts &lt;code&gt;request_id&lt;/code&gt;, polls until done, asserts video URL. The test report shows each step's results.&lt;/p&gt;

&lt;p&gt;Integrate into CI/CD with Apidog CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apidog run &lt;span class="nt"&gt;--scenario&lt;/span&gt; grok-video-async-flow &lt;span class="nt"&gt;--env&lt;/span&gt; production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common errors and fixes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;401 Unauthorized&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Check API key and &lt;code&gt;Authorization&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;422 Unprocessable Entity&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Malformed body: check &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;, and accessible &lt;code&gt;image.url&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image URL not accessible&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
xAI must fetch the URL. Use a public link or base64 data URI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status stuck at "processing"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Generation normally takes anywhere from 30 seconds to several minutes. If a job is stuck for more than 10 minutes, resubmit it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;429 Rate limit&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Max 60 requests/min, 1/sec. Add delays between polls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Base64 upload rejected&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Add correct MIME prefix (e.g., &lt;code&gt;data:image/jpeg;base64,&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aspect ratio mismatch&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Set aspect ratio to match your source image for best results.&lt;/p&gt;
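&lt;p&gt;The retry and rate-limit guidance above boils down to one defensive polling loop: wait between requests, and give up after a timeout instead of polling forever. A minimal Python sketch; the status-fetching callable is injected so the loop logic runs without network access. With the real API it would wrap &lt;code&gt;GET /v1/videos/{request_id}&lt;/code&gt; with your Bearer header:&lt;/p&gt;

```python
import time

def poll_until_done(fetch_status, interval=5.0, timeout=600.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns a dict whose "status" is "done".

    Raises TimeoutError after `timeout` seconds so a job stuck in
    "processing" can be resubmitted instead of blocking forever.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") == "done":
            return result
        if clock() >= deadline:
            raise TimeoutError("video generation stuck; resubmit the request")
        sleep(interval)  # stay under the 60 requests/min rate limit
```

&lt;p&gt;Injecting &lt;code&gt;clock&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; keeps the timeout and pacing logic testable without waiting on a real generation job.&lt;/p&gt;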

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Grok image-to-video API lets you animate static images with a simple async workflow: POST the image and prompt, get a &lt;code&gt;request_id&lt;/code&gt;, poll until &lt;code&gt;"done"&lt;/code&gt;, then download the MP4. The underlying model already runs at the scale of billions of generated videos.&lt;/p&gt;

&lt;p&gt;Async patterns are error-prone. Use Apidog Test Scenarios to automate extraction, polling, and assertions, and catch integration issues before they reach production.&lt;/p&gt;

&lt;p&gt;Start building your integration with Apidog free. No credit card required.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What model name do I use for the Grok image-to-video API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;code&gt;grok-imagine-video&lt;/code&gt; as the &lt;code&gt;model&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between &lt;code&gt;image&lt;/code&gt; and &lt;code&gt;reference_images&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;image&lt;/code&gt;: first frame (animated).&lt;br&gt;&lt;br&gt;
&lt;code&gt;reference_images&lt;/code&gt;: guide style/content only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does video generation take?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
6s @ 480p: 1–3 min.&lt;br&gt;&lt;br&gt;
15s @ 720p: 4–8 min.&lt;br&gt;&lt;br&gt;
Poll every 5s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use a local file as the source image?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Encode it as a base64 data URI and pass it as &lt;code&gt;image.url&lt;/code&gt;.&lt;/p&gt;
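&lt;p&gt;The encoding step can be sketched in Python with the standard library; the MIME type is guessed from the file extension, so adjust it if your tooling reports something else:&lt;/p&gt;

```python
import base64
import mimetypes

def file_to_data_uri(path):
    """Read a local image and return a data URI usable as image.url."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when extension is unknown
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```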

&lt;p&gt;&lt;strong&gt;What if I don't specify &lt;code&gt;aspect_ratio&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Defaults to your image's proportions. Text-to-video defaults to &lt;code&gt;16:9&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much does a 10s 720p video cost?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
About $0.702: $0.002 for the source image plus 10 × $0.07 per second at 720p.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the rate limits?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
60 requests/min, 1/sec (POST+GET combined).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I extend a video beyond 15 seconds?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, with &lt;code&gt;POST /v1/videos/extensions&lt;/code&gt;. Each extension call adds up to 15 seconds and follows the same async submit-and-poll pattern.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How much does the Bird SMS API cost in 2026?</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 07:07:18 +0000</pubDate>
      <link>https://dev.to/apilover/how-much-does-the-bird-sms-api-cost-in-2026-2l8h</link>
      <guid>https://dev.to/apilover/how-much-does-the-bird-sms-api-cost-in-2026-2l8h</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Bird SMS API starts at &lt;strong&gt;$0.00331 per outbound US message&lt;/strong&gt; and &lt;strong&gt;$0.003 per inbound US message&lt;/strong&gt;, making it one of the lowest entry-level SMS API prices. There's a free plan (5 SMS/day) for testing, and the Pro plan ($49/month) includes 1,000 SMS credits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This post breaks down Bird SMS pricing for 2025 and 2026, what drives up your bill, and how Bird stacks up to other SMS API providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bird SMS Pricing Overview
&lt;/h2&gt;

&lt;p&gt;Bird uses a two-part pricing model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Platform plan fee&lt;/strong&gt; (monthly subscription)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-message fee&lt;/strong&gt; (pay-as-you-go, on top of your plan)&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;SMS included&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;5 SMS/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$49&lt;/td&gt;
&lt;td&gt;1,000 SMS/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Custom volume&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After you exceed your included SMS, you pay per message:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Outbound SMS (US):&lt;/strong&gt; $0.00331/message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inbound SMS (US):&lt;/strong&gt; $0.003/message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bird lists CRM/Marketing Automation pricing in EUR on its detailed rate page, but these USD figures apply to the API/developer tier. For the latest rates, check &lt;a href="https://bird.com/en/pricing/sms" rel="noopener noreferrer"&gt;bird.com/en/pricing/sms&lt;/a&gt;, since carrier and exchange rates change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Breakdown: SMS, MMS, WhatsApp, and Email
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SMS
&lt;/h3&gt;

&lt;p&gt;US outbound SMS: &lt;strong&gt;$0.00331/message&lt;/strong&gt;. International rates vary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Country&lt;/th&gt;
&lt;th&gt;Outbound rate (approx.)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;United States&lt;/td&gt;
&lt;td&gt;~$0.00331&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;United Kingdom&lt;/td&gt;
&lt;td&gt;~€0.036&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Australia&lt;/td&gt;
&lt;td&gt;~€0.009&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Germany&lt;/td&gt;
&lt;td&gt;~€0.056&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;India&lt;/td&gt;
&lt;td&gt;varies by route&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brazil&lt;/td&gt;
&lt;td&gt;~€0.047&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; US/Canada messages may have additional carrier fees (10DLC, toll-free). See the hidden costs section.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  MMS
&lt;/h3&gt;

&lt;p&gt;MMS costs more than SMS and is available for US/Canada numbers. Expect 3–5x the SMS rate, depending on your plan. Check &lt;a href="https://bird.com/en/pricing/sms" rel="noopener noreferrer"&gt;bird.com/en/pricing/sms&lt;/a&gt; for current MMS rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  WhatsApp
&lt;/h3&gt;

&lt;p&gt;WhatsApp pricing has two components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bird processing fee&lt;/strong&gt; (per 1,000 messages):&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;Fee per 1,000 messages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 to 1,000&lt;/td&gt;
&lt;td&gt;$0.001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,001 to 100,000&lt;/td&gt;
&lt;td&gt;$0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,001 to 500,000&lt;/td&gt;
&lt;td&gt;$0.0045&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500,001+&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Meta passthrough fee&lt;/strong&gt; (conversation-based):&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Marketing: $0.0250&lt;/li&gt;
&lt;li&gt;Utility: $0.0034&lt;/li&gt;
&lt;li&gt;Authentication: $0.0034&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bird passes Meta fees through at cost.&lt;/p&gt;
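&lt;p&gt;A rough monthly estimate combines both components. This sketch assumes the per-1,000 processing fee applies at the single tier your volume lands in, and that every message is billed at the listed Meta conversation rate; real bills depend on how messages group into conversation windows:&lt;/p&gt;

```python
def bird_whatsapp_estimate(messages, category="marketing"):
    """Rough monthly WhatsApp cost: Bird processing fee + Meta passthrough."""
    # Bird processing fee per 1,000 messages, by monthly volume tier
    if messages <= 1_000:
        per_thousand = 0.001
    elif messages <= 100_000:
        per_thousand = 0.005
    elif messages <= 500_000:
        per_thousand = 0.0045
    else:
        per_thousand = 0.004
    # Meta passthrough fee per conversation, by category
    meta = {"marketing": 0.0250, "utility": 0.0034, "authentication": 0.0034}
    return messages / 1_000 * per_thousand + messages * meta[category]
```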

&lt;h3&gt;
  
  
  Email
&lt;/h3&gt;

&lt;p&gt;Email via Bird’s API starts at &lt;strong&gt;$0.001/email&lt;/strong&gt; for low volumes. Pro plan includes 10,000 emails/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice
&lt;/h3&gt;

&lt;p&gt;Voice API pricing is usage-based and country-specific. US outbound calls run roughly $0.013–$0.015/min; inbound is slightly cheaper. For high volume, contact Bird sales.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Affects Your Bird Bill
&lt;/h2&gt;

&lt;p&gt;Several factors can increase your actual cost:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Country and Carrier Routing
&lt;/h3&gt;

&lt;p&gt;US SMS is cheap, but sending to Germany (~€0.056), Bangladesh (~€0.208), or Burundi (~€0.269) can be 10–80x the US rate. Always check Bird’s country rate sheet before estimating international spend.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Number Type
&lt;/h3&gt;

&lt;p&gt;The phone number type you use changes costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long codes (10DLC):&lt;/strong&gt; Standard US numbers; require brand/campaign registration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short codes:&lt;/strong&gt; 5–6 digit numbers, high throughput, higher rental ($500–$1,000/month).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toll-free:&lt;/strong&gt; Lower monthly cost, moderate throughput.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rental fees are on top of per-message rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Message Length and Encoding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Standard SMS: 160 GSM-7 chars.&lt;/li&gt;
&lt;li&gt;Over 160 chars = multiple segments, each billed.&lt;/li&gt;
&lt;li&gt;Unicode (emoji/accented/non-Latin): 70 chars per segment. A single emoji in a 200-char message could result in 3 segments billed instead of 2.&lt;/li&gt;
&lt;/ul&gt;
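&lt;p&gt;A quick way to sanity-check segment counts before sending. This heuristic treats any non-ASCII character as forcing Unicode encoding; the real GSM-7 alphabet includes a few non-ASCII characters, and multipart messages reserve a few characters per segment for the concatenation header, so edge cases can differ by one segment:&lt;/p&gt;

```python
import math

def sms_segments(text):
    """Estimate billable segments: 160 chars per GSM-7 segment, 70 for Unicode."""
    if not text:
        return 0
    # Any emoji, accented, or non-Latin character switches the whole
    # message to Unicode encoding, shrinking each segment to 70 chars.
    per_segment = 160 if text.isascii() else 70
    return math.ceil(len(text) / per_segment)
```

&lt;p&gt;This reproduces the example above: a single emoji in a 200-character message turns 2 billed segments into 3.&lt;/p&gt;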

&lt;h3&gt;
  
  
  4. Channel Mix
&lt;/h3&gt;

&lt;p&gt;If you use fallback routes (e.g., WhatsApp if SMS fails), you’ll be charged for both. Model the cost impact before activating multi-channel flows in Bird’s Flow Builder.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Volume
&lt;/h3&gt;

&lt;p&gt;No public tiered discounts for SMS volume. High-volume senders (millions/month) should negotiate custom Enterprise rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden Costs and Fees
&lt;/h2&gt;

&lt;p&gt;Bird is transparent, but watch these line items:&lt;/p&gt;

&lt;h3&gt;
  
  
  US Carrier Fees (10DLC)
&lt;/h3&gt;

&lt;p&gt;US carriers require brand/campaign registration for A2P SMS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand registration:&lt;/strong&gt; One-time fee (~$4–$44)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Campaign registration:&lt;/strong&gt; ~$10/month per campaign&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-message surcharge:&lt;/strong&gt; Extra cents per message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are set by carriers, not Bird, and are passed through at cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phone Number Rental
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Long code: $1–$2/month&lt;/li&gt;
&lt;li&gt;Toll-free: $2–$3/month&lt;/li&gt;
&lt;li&gt;Short code: $500–$1,000/month&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inbound Webhooks and Processing
&lt;/h3&gt;

&lt;p&gt;Inbound SMS triggers the inbound rate ($0.003/message for US). Opt-out/help replies count as inbound.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform Subscription
&lt;/h3&gt;

&lt;p&gt;The $49/month Pro plan is fixed overhead. At low volumes, that subscription dominates your effective per-message cost; at high volumes, the included 1,000 SMS covers only a small fraction of your traffic, so per-message rates matter most.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Bird Compares to Alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;US outbound SMS&lt;/th&gt;
&lt;th&gt;US inbound SMS&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bird&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00331&lt;/td&gt;
&lt;td&gt;$0.003&lt;/td&gt;
&lt;td&gt;Lowest base rate; omnichannel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Twilio&lt;/td&gt;
&lt;td&gt;$0.0079&lt;/td&gt;
&lt;td&gt;$0.0079&lt;/td&gt;
&lt;td&gt;Higher base; huge ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telnyx&lt;/td&gt;
&lt;td&gt;~$0.004&lt;/td&gt;
&lt;td&gt;~$0.001&lt;/td&gt;
&lt;td&gt;Competitive; carrier-direct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plivo&lt;/td&gt;
&lt;td&gt;$0.0055&lt;/td&gt;
&lt;td&gt;$0.0005&lt;/td&gt;
&lt;td&gt;Lower inbound; simpler platform&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bird offers the lowest US outbound base rate. Twilio is &amp;gt;2x at entry level, but volume discounts can close the gap for enterprises. If you want minimal SMS API overhead, Telnyx/Plivo are simpler. Bird is best for omnichannel (SMS, WhatsApp, email) with automation.&lt;/p&gt;
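&lt;p&gt;To turn the comparison into a monthly estimate, multiply each base rate by your traffic in both directions. These are base rates only; 10DLC surcharges, carrier fees, and number rental are excluded:&lt;/p&gt;

```python
RATES = {  # (outbound, inbound) per US SMS, from the table above
    "Bird": (0.00331, 0.003),
    "Twilio": (0.0079, 0.0079),
    "Telnyx": (0.004, 0.001),
    "Plivo": (0.0055, 0.0005),
}

def monthly_sms_cost(provider, outbound, inbound=0):
    """Base monthly spend for a given message volume."""
    out_rate, in_rate = RATES[provider]
    return outbound * out_rate + inbound * in_rate
```

&lt;p&gt;At 100,000 outbound US messages, that works out to roughly $331 on Bird versus $790 on Twilio before volume discounts.&lt;/p&gt;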

&lt;h2&gt;
  
  
  How to Get Started with Bird
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a free account&lt;/strong&gt; at &lt;a href="https://bird.com" rel="noopener noreferrer"&gt;bird.com&lt;/a&gt;. Free plan: 5 SMS/day, no credit card required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register your brand/campaign&lt;/strong&gt; for US 10DLC compliance via the dashboard. Approval takes 1–3 business days.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provision a phone number&lt;/strong&gt; (US long code) via the dashboard or Numbers API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make your first API call.&lt;/strong&gt; Bird uses REST/JSON. Example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://api.bird.com/v1/messages"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: AccessKey YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
       "originator": "+1YOURNUMBER",
       "recipients": ["+1DESTINATION"],
       "body": "Hello from Bird SMS API!"
     }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;a href="https://docs.bird.com" rel="noopener noreferrer"&gt;docs.bird.com&lt;/a&gt; for SDKs (Node.js, Python, PHP, Go, Java, Ruby) and webhook setup.&lt;/p&gt;
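&lt;p&gt;The same request in Python, mirroring the curl example above with only the standard library. The endpoint and &lt;code&gt;AccessKey&lt;/code&gt; auth scheme are taken from that example; confirm the current path at docs.bird.com before relying on it:&lt;/p&gt;

```python
import json
import urllib.request

def build_sms_payload(originator, recipients, body):
    """Assemble the JSON body shown in the curl example."""
    return {"originator": originator, "recipients": list(recipients), "body": body}

def send_sms(api_key, payload, url="https://api.bird.com/v1/messages"):
    """POST the payload with Bird's AccessKey header and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"AccessKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

&lt;p&gt;Keeping payload construction separate from the network call makes the request shape easy to unit-test without sending a real message.&lt;/p&gt;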

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Bird SMS API has the lowest base rate among major US providers ($0.00331/message). The free plan lets you test without a credit card; Pro covers 1,000 SMS/month at $49.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watch out for:&lt;/strong&gt; US 10DLC fees, number rental, and international message rates, which can be much higher. Always build these into your cost model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to test your Bird integration—build requests, manage environments, and chain tests before going live.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the Bird SMS API price per message in the US?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
$0.00331 per outbound US SMS, $0.003 per inbound. Additional 10DLC carrier fees apply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Bird the same as MessageBird?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. MessageBird rebranded to Bird in 2023. The API may still reference MessageBird in docs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Bird charge for inbound SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, $0.003 per inbound US message. Replies like opt-out or help count as inbound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the Bird free plan?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
5 SMS/day, 10 emails/day, 15 AI agent messages/day. Public API access, no credit card required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Bird SMS pricing compare to Twilio?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Bird ($0.00331) is cheaper than Twilio ($0.0079) for US outbound. At 100,000 messages/month, Bird costs ~$331, Twilio ~$790 (before Twilio’s volume discounts).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there hidden fees with Bird SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes: US 10DLC brand/campaign fees, per-message carrier surcharges, phone number rental, and higher international rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test the Bird SMS API without sending real messages?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog Smart Mock&lt;/a&gt; to simulate API responses. You can also use Bird’s sandbox tools for test messages. Apidog lets you chain requests and assert on responses without touching production.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Twilio SMS API cost: complete pricing breakdown for 2026</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 03:52:27 +0000</pubDate>
      <link>https://dev.to/apilover/twilio-sms-api-cost-complete-pricing-breakdown-for-2026-2ok5</link>
      <guid>https://dev.to/apilover/twilio-sms-api-cost-complete-pricing-breakdown-for-2026-2ok5</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;US long code SMS (outbound and inbound):&lt;/strong&gt; $0.0083 per message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MMS outbound:&lt;/strong&gt; $0.022 per message; &lt;strong&gt;MMS inbound:&lt;/strong&gt; $0.0165 per message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toll-free SMS:&lt;/strong&gt; $0.0083 per message &lt;em&gt;plus carrier surcharges&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short code rental:&lt;/strong&gt; $1,000/month for a random code; $1,500/month for a vanity code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10DLC brand registration:&lt;/strong&gt; $4.50 one-time fee (updated August 2025); campaigns from $1.50 to $10/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phone number (long code):&lt;/strong&gt; $1.15/month; &lt;strong&gt;toll-free:&lt;/strong&gt; $2.15/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carrier fees&lt;/strong&gt; stack on top of all base prices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Volume discounts&lt;/strong&gt; kick in automatically at 150,000 messages per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free trial available&lt;/strong&gt;; no credit card required to start&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Twilio is the default SMS API for most developer teams. The documentation is comprehensive, uptime is strong, and the REST API is developer-friendly. But as you scale from a test app to production, your costs can spike. You pay Twilio's base rate, plus carrier fees, a phone number fee, and potential 10DLC registration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Before you build out your SMS integration, test it properly.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; lets you automate test scenarios to validate your Twilio webhook responses against your OpenAPI spec. Confirm status codes, check response body schemas, and run contract tests in CI/CD. Use Apidog free to catch issues before production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This guide breaks down every Twilio SMS pricing line item. Learn what drives your bill, which fees are easy to miss, and how Twilio compares to alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Twilio SMS Pricing Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.twilio.com/en-us/pricing/messaging" rel="noopener noreferrer"&gt;Twilio uses pay-as-you-go pricing&lt;/a&gt;. You pay per message segment, per phone number per month, and for any extra features or registrations. There are no mandatory contracts or minimums unless you negotiate a committed-use discount.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;US SMS (long code):&lt;/strong&gt; $0.0083 per message segment (inbound &amp;amp; outbound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carrier fees, number rental, and 10DLC registration&lt;/strong&gt; add to the base cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Twilio applies &lt;strong&gt;automatic volume discounts&lt;/strong&gt; after 150,000 messages per month per number type. No negotiation required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Breakdown: SMS, MMS, Toll-Free, and Short Codes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Long Code SMS (10-digit numbers)
&lt;/h3&gt;

&lt;p&gt;Long codes are standard US 10-digit numbers for most A2P messaging.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Message type&lt;/th&gt;
&lt;th&gt;Price per segment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SMS outbound&lt;/td&gt;
&lt;td&gt;$0.0083&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SMS inbound&lt;/td&gt;
&lt;td&gt;$0.0083&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMS outbound&lt;/td&gt;
&lt;td&gt;$0.022&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMS inbound&lt;/td&gt;
&lt;td&gt;$0.0165&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; Twilio charges per &lt;strong&gt;segment&lt;/strong&gt;, not per message.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GSM-7: 160 chars/segment
&lt;/li&gt;
&lt;li&gt;Unicode: 70 chars/segment
&lt;/li&gt;
&lt;li&gt;Example: Sending 200 plain-text chars = 2 segments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Toll-Free Numbers
&lt;/h3&gt;

&lt;p&gt;Toll-free numbers use the same base SMS rates as long codes: $0.0083 per segment (inbound/outbound).  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MMS outbound:&lt;/strong&gt; $0.022&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MMS inbound:&lt;/strong&gt; $0.02&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Toll-free numbers don't require 10DLC registration, but they do require toll-free verification for high throughput.&lt;/p&gt;

&lt;h3&gt;
  
  
  Short Codes (5-6 digit numbers)
&lt;/h3&gt;

&lt;p&gt;Short codes allow for high-throughput campaigns. Per-message SMS rate is $0.0083, but rental costs are much higher.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Short code type&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Random short code&lt;/td&gt;
&lt;td&gt;$1,000/month (billed quarterly)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vanity short code&lt;/td&gt;
&lt;td&gt;$1,500/month (billed quarterly)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bring-your-own vanity&lt;/td&gt;
&lt;td&gt;$500/month (billed quarterly)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MMS-enabled codes&lt;/strong&gt;: $500 one-time fee&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Volume Discounts
&lt;/h3&gt;

&lt;p&gt;Twilio applies automatic tiered discounts for long code, toll-free, and short code messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly messages&lt;/th&gt;
&lt;th&gt;Price per segment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 to 150,000&lt;/td&gt;
&lt;td&gt;$0.0083&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;150,001 to 300,000&lt;/td&gt;
&lt;td&gt;$0.0081&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;300,001 to 500,000&lt;/td&gt;
&lt;td&gt;$0.0079&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500,001 to 750,000&lt;/td&gt;
&lt;td&gt;$0.0077&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;750,001 to 1,000,000&lt;/td&gt;
&lt;td&gt;$0.0075&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000,001+&lt;/td&gt;
&lt;td&gt;$0.0073&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Volume is calculated per number type (long code, toll-free, short code).&lt;/p&gt;
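&lt;p&gt;Assuming the discounts are graduated, i.e. each tier's rate applies only to the messages that fall within that tier, the monthly base cost can be computed like this:&lt;/p&gt;

```python
# (tier upper bound, price per segment) from the table above
TIERS = [
    (150_000, 0.0083),
    (300_000, 0.0081),
    (500_000, 0.0079),
    (750_000, 0.0077),
    (1_000_000, 0.0075),
    (float("inf"), 0.0073),
]

def graduated_cost(segments):
    """Monthly base cost, charging each tier's rate only within that tier."""
    total, lower = 0.0, 0
    for upper, rate in TIERS:
        if segments > lower:
            total += (min(segments, upper) - lower) * rate
        lower = upper
    return total
```

&lt;p&gt;For example, 200,000 segments cost the first 150,000 at $0.0083 and the remaining 50,000 at $0.0081, about $1,650 before carrier fees.&lt;/p&gt;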

&lt;h2&gt;
  
  
  What Affects Your Twilio Bill
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phone Number Type
&lt;/h3&gt;

&lt;p&gt;Your sending number determines the rate table. If you split traffic across number types, you might not hit discount thresholds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Message Segments
&lt;/h3&gt;

&lt;p&gt;Messages are billed by segment. Unicode or longer messages increase your cost per send.&lt;/p&gt;

&lt;h3&gt;
  
  
  Carrier Fees
&lt;/h3&gt;

&lt;p&gt;US carriers add surcharges on top of Twilio's base rate.&lt;/p&gt;

&lt;h4&gt;
  
  
  Long code carrier fees (outbound SMS):
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Carrier&lt;/th&gt;
&lt;th&gt;Outbound fee&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AT&amp;amp;T&lt;/td&gt;
&lt;td&gt;$0.0035&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T-Mobile&lt;/td&gt;
&lt;td&gt;$0.0045&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verizon&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;US Cellular&lt;/td&gt;
&lt;td&gt;$0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All others&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;MMS carrier fees&lt;/strong&gt; are higher: e.g., AT&amp;amp;T adds $0.009, T-Mobile adds $0.01 per MMS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Outbound SMS to AT&amp;amp;T: $0.0083 (Twilio) + $0.0035 (carrier) = &lt;strong&gt;$0.0118 per segment&lt;/strong&gt;.&lt;/p&gt;
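&lt;p&gt;Putting base rate and surcharge together for long code outbound SMS, using the fees in the table above:&lt;/p&gt;

```python
# Outbound long code SMS carrier surcharges from the table above
CARRIER_FEES = {
    "AT&T": 0.0035,
    "T-Mobile": 0.0045,
    "Verizon": 0.004,
    "US Cellular": 0.005,
}
TWILIO_BASE = 0.0083  # Twilio's per-segment base rate

def cost_per_segment(carrier):
    """Total outbound cost per segment: Twilio base + carrier surcharge."""
    return TWILIO_BASE + CARRIER_FEES.get(carrier, 0.004)  # 0.004 = all others
```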

&lt;h3&gt;
  
  
  Destination Country
&lt;/h3&gt;

&lt;p&gt;International rates vary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UK: $0.04/message&lt;/li&gt;
&lt;li&gt;India: $0.0029/message&lt;/li&gt;
&lt;li&gt;Brazil: $0.075/message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Twilio provides a downloadable CSV with all rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  10DLC Registration
&lt;/h3&gt;

&lt;p&gt;US A2P 10DLC requires registration for business messaging on long codes. Unregistered traffic is likely to be filtered by carriers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden Costs and Fees
&lt;/h2&gt;

&lt;h3&gt;
  
  
  10DLC Registration Fees
&lt;/h3&gt;

&lt;p&gt;Fees updated August 2025 (TCR changes):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand registration:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Sole Proprietor/Low Volume: $4.50 one-time
&lt;/li&gt;
&lt;li&gt;Standard (with secondary vetting): $46 one-time
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Campaign registration:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Vetting fee: $15 per campaign
&lt;/li&gt;
&lt;li&gt;Monthly recurring: $1.50 to $10/month&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Campaign type&lt;/th&gt;
&lt;th&gt;Monthly fee&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sole Proprietor&lt;/td&gt;
&lt;td&gt;$2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low-volume mixed&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charity/501(c)(3)&lt;/td&gt;
&lt;td&gt;$3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emergency services&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Related vetting charges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard vetting: $41.50&lt;/li&gt;
&lt;li&gt;Vetting appeals: $11&lt;/li&gt;
&lt;li&gt;Auth Plus retry: $12.50 per retry&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phone Number Fees
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Long code: $1.15/month&lt;/li&gt;
&lt;li&gt;Toll-free: $2.15/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Provisioning multiple numbers (for local presence or testing) increases costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failed Message Processing Fee
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$0.001 per message&lt;/strong&gt; in "Failed" status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fee adds up during debugging or when carrier filtering spikes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optional Feature Add-ons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engagement Suite (link shortening/tracking):&lt;/strong&gt; $0.015 per outbound message after 1,000 free/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Toolkit (AI SMS review):&lt;/strong&gt; $0.015 per outbound message&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Support Plans
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Free developer support is limited&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Paid plans start at $250/month ("Developer" tier)&lt;/li&gt;
&lt;li&gt;Enterprise support costs more&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Twilio Compares to Alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;US SMS outbound&lt;/th&gt;
&lt;th&gt;US SMS inbound&lt;/th&gt;
&lt;th&gt;Phone number&lt;/th&gt;
&lt;th&gt;10DLC required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Twilio&lt;/td&gt;
&lt;td&gt;$0.0083 + carrier fees&lt;/td&gt;
&lt;td&gt;$0.0083 + carrier fees&lt;/td&gt;
&lt;td&gt;$1.15/month&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plivo&lt;/td&gt;
&lt;td&gt;$0.005&lt;/td&gt;
&lt;td&gt;$0.00035&lt;/td&gt;
&lt;td&gt;$0.80/month&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telnyx&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;td&gt;$1.00/month&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bird (MessageBird)&lt;/td&gt;
&lt;td&gt;~$0.007&lt;/td&gt;
&lt;td&gt;~$0.007&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plivo and Telnyx&lt;/strong&gt; are cheaper per message; Plivo's inbound rate is much lower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telnyx&lt;/strong&gt; bundles carrier fees differently—final price is often lower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twilio&lt;/strong&gt; is ahead in developer experience, documentation, integrations, and advanced features.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Test Twilio for Free
&lt;/h2&gt;

&lt;p&gt;Twilio offers a &lt;strong&gt;free trial&lt;/strong&gt;—no credit card required.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the credit to send real messages and provision a test number.&lt;/li&gt;
&lt;li&gt;Trial messages to unverified numbers include a "Sent from your Twilio trial account" prefix (removed after upgrading).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trial lets you:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Provision a test number
&lt;/li&gt;
&lt;li&gt;Send/receive SMS via API
&lt;/li&gt;
&lt;li&gt;Test webhook delivery
&lt;/li&gt;
&lt;li&gt;Explore the Console and logs&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;No monthly minimums; pay only for what you use after upgrading.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Test Your Twilio SMS Integration with Apidog
&lt;/h2&gt;

&lt;p&gt;Automated testing is critical for robust Twilio integrations. Manual testing doesn't scale and won't catch schema issues in webhook payloads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcuec6nm1uksjpraul5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcuec6nm1uksjpraul5p.png" alt="Apidog test scenario screenshot" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apidog's Test Scenarios&lt;/strong&gt; let you chain multiple API requests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Import&lt;/strong&gt; your Twilio webhook endpoint spec into a scenario&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add steps&lt;/strong&gt; for each flow: send SMS, receive delivery callback, handle inbound reply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrate&lt;/strong&gt; step sequence and data passing with &lt;code&gt;{{$.stepId.response.body.field}}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate&lt;/strong&gt;: Validate all steps in a single run, just like real-world usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send SMS via Twilio API&lt;/li&gt;
&lt;li&gt;Capture message SID from response&lt;/li&gt;
&lt;li&gt;Receive webhook callback and validate payload&lt;/li&gt;
&lt;/ol&gt;
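
&lt;p&gt;The workflow above can be sketched in Python using only the standard library (the official &lt;code&gt;twilio&lt;/code&gt; helper library wraps all of this; the endpoint is Twilio's documented Messages resource, and the SID/token values are placeholders):&lt;/p&gt;

```python
import base64
import urllib.parse
import urllib.request

# Hedged sketch: build the "send SMS" request from step 1. The official
# twilio helper library does this for you; shown here for transparency.
def build_send_request(account_sid, auth_token, from_number, to_number, body):
    url = (
        "https://api.twilio.com/2010-04-01/Accounts/"
        + account_sid + "/Messages.json"
    )
    form = urllib.parse.urlencode(
        {"From": from_number, "To": to_number, "Body": body}
    ).encode()
    req = urllib.request.Request(url, data=form, method="POST")
    # Twilio authenticates with HTTP Basic auth (account SID : auth token).
    creds = base64.b64encode((account_sid + ":" + auth_token).encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    return req

req = build_send_request("ACxxxx", "token", "+15005550006", "+15558675310", "Hi")
print(req.full_url)
# https://api.twilio.com/2010-04-01/Accounts/ACxxxx/Messages.json
# The JSON response to this request includes "sid" -- the message SID
# you capture in step 2 and later match against the webhook callback.
```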

&lt;p&gt;&lt;strong&gt;Contract Testing&lt;/strong&gt; with Apidog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response Validation checks each actual API response against your OpenAPI spec&lt;/li&gt;
&lt;li&gt;Ensures:

&lt;ul&gt;
&lt;li&gt;HTTP status codes match your documentation&lt;/li&gt;
&lt;li&gt;Required fields are present&lt;/li&gt;
&lt;li&gt;Field types are correct&lt;/li&gt;
&lt;li&gt;No extra fields unless allowed by &lt;code&gt;additionalProperties&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
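
&lt;p&gt;For intuition, the checks above amount to something like this hand-rolled sketch. Apidog derives these rules from your OpenAPI spec automatically; the field names below follow Twilio's status-callback payload style:&lt;/p&gt;

```python
# Minimal contract check: required fields, types, and no extras
# (mirroring additionalProperties: false in an OpenAPI schema).
REQUIRED = {"MessageSid": str, "MessageStatus": str, "To": str}
ALLOW_EXTRA = False

def validate_payload(payload):
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in payload:
            errors.append(f"missing required field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    if not ALLOW_EXTRA:
        for field in payload:
            if field not in REQUIRED:
                errors.append(f"unexpected field: {field}")
    return errors

print(validate_payload({"MessageSid": "SM123", "MessageStatus": "delivered"}))
# ['missing required field: To']
```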

&lt;p&gt;No need to write custom assertion scripts—just link your OpenAPI spec and let Apidog handle validation in every test step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Import your Twilio API spec or define webhook schema&lt;/li&gt;
&lt;li&gt;Build a test scenario mirroring your SMS send/receive flow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Twilio SMS API pricing starts at $0.0083 per segment for US long codes, but &lt;strong&gt;real per-message cost&lt;/strong&gt; includes carrier fees, phone number rental, and 10DLC registration. For a single AT&amp;amp;T outbound message, expect about $0.0118 per segment. Standard 10DLC campaigns add $10/month per campaign.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing is transparent and discounts are automatic&lt;/li&gt;
&lt;li&gt;Don't rely on the base rate alone—model your total cost&lt;/li&gt;
&lt;li&gt;Build and test your integration thoroughly&lt;/li&gt;
&lt;/ul&gt;
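
&lt;p&gt;A back-of-envelope cost model using the figures quoted in this article (treat the rates as illustrative and plug in your own current numbers):&lt;/p&gt;

```python
# Monthly Twilio SMS cost estimate from this article's figures.
BASE_RATE = 0.0083       # USD per outbound segment, US long code
CARRIER_FEE = 0.0035     # USD per segment, the carrier surcharge example above
NUMBER_RENTAL = 1.15     # USD per month per long code
CAMPAIGN_FEE = 10.00     # USD per month, standard 10DLC campaign

def monthly_cost(segments_sent, numbers=1, campaigns=1):
    per_message = segments_sent * (BASE_RATE + CARRIER_FEE)
    fixed = numbers * NUMBER_RENTAL + campaigns * CAMPAIGN_FEE
    return round(per_message + fixed, 2)

print(monthly_cost(10_000))  # 129.15
```

&lt;p&gt;At 10,000 segments/month, fixed fees are about 9% of the bill; at low volume they dominate.&lt;/p&gt;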

&lt;p&gt;&lt;strong&gt;Apidog&lt;/strong&gt; makes it easy to automate Twilio integration tests and validate your webhook flows. Use Test Scenarios and Contract Testing to ensure your API responses match your spec, without writing custom test code.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What is the base price for a US outbound SMS on Twilio?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: $0.0083 per message segment for US long code outbound SMS. Carrier fees (e.g., AT&amp;amp;T adds $0.0035) apply on top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does Twilio charge for inbound SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: Yes. $0.0083 per segment for inbound SMS on long codes and toll-free numbers. Carrier fees may apply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What is 10DLC and do I have to pay for it?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: 10DLC is the US registration system for A2P SMS from 10-digit long codes. Brand registration: $4.50 one-time. Campaign fees: $1.50 to $10/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How much does a Twilio short code cost?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: Random short code: $1,000/month (billed quarterly). Vanity: $1,500/month. One-time $500 setup fee if MMS-enabled. SMS rate: $0.0083/message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does Twilio offer volume discounts?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: Yes. First 150,000 messages/month per number type: $0.0083/segment. Rates drop in tiers to $0.0073 for &amp;gt;1M/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is there a free trial for Twilio?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: Yes, with free credits and no credit card required. Trial messages to unverified numbers include a prefix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the cheapest Twilio alternative for SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A: Plivo and Telnyx have lower per-message rates. Plivo: $0.005 outbound, $0.00035 inbound. Telnyx: $0.004/message. Both require 10DLC registration. Twilio wins in documentation and advanced features.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How much does the Vonage SMS API cost? (2026 pricing)</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 03:47:13 +0000</pubDate>
      <link>https://dev.to/apilover/how-much-does-the-vonage-sms-api-cost-2026-pricing-56mm</link>
      <guid>https://dev.to/apilover/how-much-does-the-vonage-sms-api-cost-2026-pricing-56mm</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Vonage SMS API pricing starts at &lt;strong&gt;$0.00809 per outbound message&lt;/strong&gt; and &lt;strong&gt;$0.00649 per inbound message&lt;/strong&gt; in the US. There’s no monthly minimum; you pay only for what you send and receive. International rates vary and can exceed $1.00 per message in some regions. If you’re developing or testing a Vonage SMS integration, &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; streamlines sending test requests, validating responses, and catching errors before deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you’ve heard of Nexmo, you already know Vonage. Vonage acquired Nexmo in 2016, and while the Nexmo brand lingered, most developer resources now live under &lt;a href="https://developer.vonage.com" rel="noopener noreferrer"&gt;developer.vonage.com&lt;/a&gt;. Since 2022, Vonage has been part of Ericsson, but the API platform continues under the Vonage name.&lt;/p&gt;

&lt;p&gt;Vonage powers over 100,000 businesses and supports 1.6 million developers. Its APIs offer global messaging with a pay-as-you-go model and no long-term commitment, spanning more than 190 countries.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro tip:&lt;/strong&gt; Before integrating, equip yourself with a robust API testing tool. &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; lets you build and automate test scenarios for Vonage SMS, validate schemas, and chain API calls. Sign up free—no credit card needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Vonage SMS pricing overview
&lt;/h2&gt;

&lt;p&gt;Vonage runs a pay-as-you-go model for SMS: no platform or subscription fees. Message rates depend on destination and number type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;US Pricing Tiers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Message type&lt;/th&gt;
&lt;th&gt;Price per message&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Outbound SMS (US LVN)&lt;/td&gt;
&lt;td&gt;$0.00809&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inbound SMS (US LVN)&lt;/td&gt;
&lt;td&gt;$0.00649&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outbound SMS (US Toll-Free)&lt;/td&gt;
&lt;td&gt;Contact sales&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inbound SMS (US Toll-Free)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LVN&lt;/strong&gt;: Local Virtual Number (standard 10-digit long code/10DLC).&lt;/li&gt;
&lt;li&gt;Rates update in real time based on usage.&lt;/li&gt;
&lt;li&gt;Download the &lt;a href="https://dashboard.nexmo.com" rel="noopener noreferrer"&gt;global pricing sheet&lt;/a&gt; from your Vonage dashboard for all countries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing breakdown: outbound, inbound, MMS, and WhatsApp
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Outbound SMS
&lt;/h3&gt;

&lt;p&gt;Outbound messages (your app → end user) in the US cost &lt;strong&gt;$0.00809/message&lt;/strong&gt; via local virtual number. Always check your dashboard for the latest rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inbound SMS
&lt;/h3&gt;

&lt;p&gt;Inbound (end user → your Vonage number) messages in the US cost &lt;strong&gt;$0.00649/message&lt;/strong&gt; for local numbers. Some toll-free numbers get free inbound SMS—verify this on your account.&lt;/p&gt;

&lt;h3&gt;
  
  
  MMS pricing
&lt;/h3&gt;

&lt;p&gt;Vonage supports MMS via its Messages API, but MMS rates are not public. Contact sales for a quote. MMS pricing varies by carrier and country.&lt;/p&gt;

&lt;h3&gt;
  
  
  WhatsApp via Vonage
&lt;/h3&gt;

&lt;p&gt;WhatsApp messages via Vonage have two charges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Meta fee&lt;/strong&gt;: Paid to WhatsApp/Meta, varies by message category. See &lt;a href="https://www.whatsapp.com/business/pricing/" rel="noopener noreferrer"&gt;Meta’s WhatsApp pricing&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vonage platform fee&lt;/strong&gt;: Starts at $0.00015 per message.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Facebook Messenger
&lt;/h3&gt;

&lt;p&gt;Messenger via Vonage costs &lt;strong&gt;$0.0011 per delivered message&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  RCS (Rich Communication Services)
&lt;/h3&gt;

&lt;p&gt;Vonage supports RCS with carrier-specific US rates:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Carrier&lt;/th&gt;
&lt;th&gt;RCS Rich (text-based)&lt;/th&gt;
&lt;th&gt;RCS Rich Media&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;T-Mobile&lt;/td&gt;
&lt;td&gt;$0.00620&lt;/td&gt;
&lt;td&gt;$0.01250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verizon&lt;/td&gt;
&lt;td&gt;$0.00400&lt;/td&gt;
&lt;td&gt;$0.00600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AT&amp;amp;T&lt;/td&gt;
&lt;td&gt;$0.00450&lt;/td&gt;
&lt;td&gt;$0.01000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;US Cellular&lt;/td&gt;
&lt;td&gt;$0.00620&lt;/td&gt;
&lt;td&gt;$0.01350&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A one-time setup fee of &lt;strong&gt;$600/country&lt;/strong&gt; applies for RCS.&lt;/p&gt;

&lt;h2&gt;
  
  
  What affects your Vonage bill
&lt;/h2&gt;

&lt;p&gt;Several variables impact monthly spend:&lt;/p&gt;

&lt;h3&gt;
  
  
  Destination country
&lt;/h3&gt;

&lt;p&gt;Biggest cost factor. US messages are low-cost; some international destinations cost $0.50 or more per SMS. Always check the global pricing sheet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Number type
&lt;/h3&gt;

&lt;p&gt;Vonage offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long code (10DLC):&lt;/strong&gt; Most affordable, standard numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toll-free:&lt;/strong&gt; Higher throughput, free inbound.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short codes:&lt;/strong&gt; High-volume marketing, higher cost, require registration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each number type has different rental and message rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Number rental fees
&lt;/h3&gt;

&lt;p&gt;Renting a Vonage number incurs a monthly fee (US local numbers start at a few dollars/month; toll-free and short codes cost more).&lt;/p&gt;

&lt;h3&gt;
  
  
  Message encoding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GSM-7:&lt;/strong&gt; 160 characters/SMS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unicode:&lt;/strong&gt; 70 characters/SMS (emoji, accented letters).&lt;/li&gt;
&lt;li&gt;Long messages split into multiple SMS, increasing cost.&lt;/li&gt;
&lt;/ul&gt;
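
&lt;p&gt;To see how encoding drives segment count (and therefore cost), here is a rough calculator. The GSM-7 character set below is a simplified ASCII approximation; the real GSM 03.38 table differs at the edges (and extension characters count as two):&lt;/p&gt;

```python
import math

# Simplified GSM-7 set: common ASCII subset only (an approximation).
GSM7_CHARS = set(
    "@$_ !\"#%'()*+,-./0123456789:;=?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz\n\r"
)

def sms_segments(text):
    """Return (encoding, segment_count) for a message body."""
    if set(text).issubset(GSM7_CHARS):
        # Single SMS: 160 chars; concatenated parts: 153 chars each.
        encoding, single, multi = "GSM-7", 160, 153
    else:
        # Unicode single SMS: 70 chars; concatenated parts: 67 each.
        encoding, single, multi = "Unicode", 70, 67
    if len(text) > single:
        return encoding, math.ceil(len(text) / multi)
    return encoding, 1

print(sms_segments("Hello! Your code is 123456."))  # ('GSM-7', 1)
```

&lt;p&gt;One emoji can triple the cost of a long message by forcing Unicode encoding.&lt;/p&gt;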

&lt;h3&gt;
  
  
  Carrier surcharges
&lt;/h3&gt;

&lt;p&gt;US carriers may add surcharges, especially for A2P (application-to-person) messaging. Vonage passes these fees through as line items on your invoice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden costs to watch for
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Support tiers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free:&lt;/strong&gt; Community forums and docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business:&lt;/strong&gt; Paid plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium:&lt;/strong&gt; 24/7 phone + account manager, &lt;strong&gt;$3,300/mo&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Add-on APIs
&lt;/h3&gt;

&lt;p&gt;Some APIs incur extra monthly fees:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Add-on&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Audit API&lt;/td&gt;
&lt;td&gt;$550/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-redact (PII removal)&lt;/td&gt;
&lt;td&gt;$1,100/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reports API&lt;/td&gt;
&lt;td&gt;$495/month (or $0.00049/CDR pay-as-you-go)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit API:&lt;/strong&gt; For compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-redact:&lt;/strong&gt; Removes PII from logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verify API costs
&lt;/h3&gt;

&lt;p&gt;Vonage’s Verify API for 2FA: &lt;strong&gt;$0.0572 per successful verification&lt;/strong&gt; (plus SMS/voice delivery costs). Failed verifications still incur per-message costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Number registration (US 10DLC)
&lt;/h3&gt;

&lt;p&gt;A2P SMS via US long codes requires registration with The Campaign Registry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand registration:&lt;/strong&gt; One-time fee.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Campaign registration:&lt;/strong&gt; Recurring fee per campaign.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unregistered messaging may be filtered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vonage vs alternatives
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Comparison of major US SMS providers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;US outbound SMS&lt;/th&gt;
&lt;th&gt;US inbound SMS&lt;/th&gt;
&lt;th&gt;Free trial&lt;/th&gt;
&lt;th&gt;Support&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vonage&lt;/td&gt;
&lt;td&gt;$0.00809&lt;/td&gt;
&lt;td&gt;$0.00649&lt;/td&gt;
&lt;td&gt;Yes (verified numbers only)&lt;/td&gt;
&lt;td&gt;Paid tiers; $3,300/mo for 24/7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Twilio&lt;/td&gt;
&lt;td&gt;$0.0079&lt;/td&gt;
&lt;td&gt;$0.0075&lt;/td&gt;
&lt;td&gt;Yes ($15 credit)&lt;/td&gt;
&lt;td&gt;Paid support from $250/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plivo&lt;/td&gt;
&lt;td&gt;$0.0055&lt;/td&gt;
&lt;td&gt;$0.0005&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Free basic; paid tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telnyx&lt;/td&gt;
&lt;td&gt;$0.004&lt;/td&gt;
&lt;td&gt;$0.002&lt;/td&gt;
&lt;td&gt;Yes ($5 credit)&lt;/td&gt;
&lt;td&gt;24/7 email free; phone with paid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telnyx:&lt;/strong&gt; Lowest price, newer platform, smaller support team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plivo:&lt;/strong&gt; Great inbound pricing, strong developer support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twilio:&lt;/strong&gt; Largest ecosystem and documentation, higher price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vonage:&lt;/strong&gt; Middle on price, strong carrier relationships via Ericsson.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Price per message is just one factor; also consider delivery, compliance, tooling, and support.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to try Vonage for free
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.vonage.com/communications-apis/pricing/" rel="noopener noreferrer"&gt;Vonage offers a free trial account.&lt;/a&gt; Here’s what you get and what’s limited.&lt;/p&gt;

&lt;h3&gt;
  
  
  What you get
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A Vonage virtual number for test SMS&lt;/li&gt;
&lt;li&gt;API credentials (key/secret)&lt;/li&gt;
&lt;li&gt;Access to all documentation and SDKs&lt;/li&gt;
&lt;li&gt;Test credits for limited message sending&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s restricted
&lt;/h3&gt;

&lt;p&gt;During the trial, you can only send SMS to numbers you’ve verified on your account. Upgrade to a paid account to send to any number.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to start
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://dashboard.nexmo.com" rel="noopener noreferrer"&gt;dashboard.nexmo.com&lt;/a&gt; or &lt;a href="https://www.vonage.com/communications-apis/" rel="noopener noreferrer"&gt;vonage.com/communications-apis&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create a free account with your email.&lt;/li&gt;
&lt;li&gt;Verify your phone number.&lt;/li&gt;
&lt;li&gt;Retrieve your API key/secret from the dashboard.&lt;/li&gt;
&lt;li&gt;Send your first SMS using the REST API or an official SDK.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ueumbyjp6vt2xicifxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ueumbyjp6vt2xicifxk.png" alt="Vonage Dashboard" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vonage offers SDKs for Node.js, Python, PHP, Ruby, Java, .NET, and Go. Or, use any HTTP client for direct REST calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test your Vonage SMS integration with Apidog
&lt;/h2&gt;

&lt;p&gt;Once you have Vonage API credentials, you need to test your integration before production. &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; is built for this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi04oh5iormtlr02zuvt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi04oh5iormtlr02zuvt3.png" alt="Apidog Test Scenario" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up a test scenario
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In Apidog, go to the &lt;strong&gt;Tests&lt;/strong&gt; module and create a new scenario.&lt;/li&gt;
&lt;li&gt;Add steps from:

&lt;ul&gt;
&lt;li&gt;Your OpenAPI spec (if you’ve defined the Vonage endpoint)&lt;/li&gt;
&lt;li&gt;A custom request to the Vonage REST API&lt;/li&gt;
&lt;li&gt;Import a cURL command from Vonage docs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Test sending an SMS&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://rest.nexmo.com/sms/json
Content-Type: application/x-www-form-urlencoded

api_key=YOUR_KEY
api_secret=YOUR_SECRET
from=YOUR_NUMBER
to=DESTINATION_NUMBER
text=Hello from Apidog!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
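
&lt;p&gt;The same request in Python, using only the standard library (Vonage's official SDKs wrap this endpoint for you):&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

# Sketch of the send-SMS request from the example above.
def build_sms_request(api_key, api_secret, sender, recipient, text):
    form = urllib.parse.urlencode({
        "api_key": api_key,
        "api_secret": api_secret,
        "from": sender,
        "to": recipient,
        "text": text,
    }).encode()
    return urllib.request.Request(
        "https://rest.nexmo.com/sms/json", data=form, method="POST"
    )

req = build_sms_request("YOUR_KEY", "YOUR_SECRET", "YOUR_NUMBER",
                        "DESTINATION_NUMBER", "Hello from Apidog!")
print(req.full_url)  # https://rest.nexmo.com/sms/json

# A successful send returns JSON where messages[0].status is "0".
sample = '{"messages": [{"status": "0", "message-id": "abc123"}]}'
assert json.loads(sample)["messages"][0]["status"] == "0"
```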



&lt;h3&gt;
  
  
  Validating the response
&lt;/h3&gt;

&lt;p&gt;Apidog validates responses against your schema. For Vonage, a successful send returns a JSON object with a &lt;code&gt;messages&lt;/code&gt; array; &lt;code&gt;messages[0].status&lt;/code&gt; should equal &lt;code&gt;"0"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add assertions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP status code is 200&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;messages[0].status&lt;/code&gt; equals &lt;code&gt;"0"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;messages[0].message-id&lt;/code&gt; is not empty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an error occurs, Apidog flags the step and displays the full response for debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Passing data between requests
&lt;/h3&gt;

&lt;p&gt;Apidog supports data chaining with &lt;code&gt;{{$.stepId.response.body.field}}&lt;/code&gt;. Extract the &lt;code&gt;message-id&lt;/code&gt; from the send response and use it in a follow-up status check request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running tests in CI/CD
&lt;/h3&gt;

&lt;p&gt;Integrate Apidog with GitHub Actions, GitLab CI, or Jenkins. Export your scenario and run tests on every pull request using the Apidog CLI to catch SMS integration errors pre-deployment.&lt;/p&gt;

&lt;p&gt;Try it free at &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;apidog.com&lt;/a&gt;. No credit card required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Vonage SMS API provides pay-as-you-go pricing, with US outbound SMS at $0.00809/message and inbound at $0.00649. Costs scale with country, number type, and encoding. Add-ons like the Audit API, Auto-redact, and premium support have significant fees, so plan accordingly.&lt;/p&gt;

&lt;p&gt;If you’re comparing providers, Telnyx and Plivo offer lower prices, while Twilio offers the largest ecosystem. Vonage is a solid option for carrier-grade reliability and global reach.&lt;/p&gt;

&lt;p&gt;Before deploying, use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to create automated test scenarios for your Vonage SMS integration. Validate every flow and catch errors in development—not in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Vonage the same as Nexmo?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Vonage acquired Nexmo in 2016. The developer platform remains at &lt;a href="https://developer.vonage.com" rel="noopener noreferrer"&gt;developer.vonage.com&lt;/a&gt;, and legacy integrations still use &lt;code&gt;rest.nexmo.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Vonage charge a monthly fee for SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No; you pay per message sent/received and for any numbers you rent. Optional add-ons like Audit API and Auto-redact have monthly fees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much does a Vonage phone number cost?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
US long codes start at a few dollars/month; toll-free and short codes cost more. Check your dashboard for current rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What countries does Vonage SMS support?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Vonage supports SMS in 190+ countries. Prices vary; some regions cost $0.50+ per SMS. Download the pricing sheet from your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Vonage offer volume discounts?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. If you send large volumes, contact sales for custom pricing. Pay-as-you-go rates apply by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I receive inbound SMS for free?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
US toll-free numbers often get free inbound SMS. Long codes charge $0.00649/inbound message in the US. Rates vary by country.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Vonage compare to Twilio for SMS?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Vonage’s US outbound rate is slightly higher than Twilio’s. Twilio has a larger developer ecosystem. Vonage stands out for carrier relationships via Ericsson and sometimes better international pricing. For US messaging, price differences are small; developer experience and support may be the deciding factors.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What does Cursor 3 mean for API developers?</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 03:41:36 +0000</pubDate>
      <link>https://dev.to/apilover/what-does-cursor-3-mean-for-api-developers-37o5</link>
      <guid>https://dev.to/apilover/what-does-cursor-3-mean-for-api-developers-37o5</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Cursor 3 launched on April 2, 2026, shifting from an IDE-first interface to an agent-first workspace. For API developers, major improvements include parallel agent execution, structured MCP tool outputs, and seamless cloud-to-local task handoff. By integrating Cursor 3 with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's MCP Server&lt;/a&gt;, your AI agents can read live API specs and generate accurate, schema-aware code—no more manual copy-pasting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  The shift: From IDE to agent-first workflows
&lt;/h2&gt;

&lt;p&gt;AI code editors have evolved rapidly, but Cursor 3 is a foundational redesign—not just an upgrade. Previously, agents in Cursor acted as assistants you called when needed. Now, agents are the main unit of work. You launch and manage multiple agents (like browser tabs), run tasks in parallel, and select the best results.&lt;/p&gt;

&lt;p&gt;For API developers, this aligns with real project workflows—simultaneously writing endpoints, updating docs, and debugging schemas. Cursor 3’s tooling better matches the parallel, coordination-heavy nature of API development.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro tip:&lt;/strong&gt; Cursor 3 doesn’t natively read your API spec. Connect &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog’s MCP Server&lt;/a&gt; once, and agents can pull OpenAPI schemas, endpoint definitions, and test scenarios directly. This prevents hallucinated field names and keeps generated code in sync with your spec.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This guide details what’s changed in Cursor 3, its impact for API workflows, and provides an actionable integration between Cursor 3 and Apidog’s MCP Server.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s new in Cursor 3
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cursor.com/blog/cursor-3" rel="noopener noreferrer"&gt;Cursor 3&lt;/a&gt; ships significant upgrades for API developers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents Window
&lt;/h3&gt;

&lt;p&gt;The new Agents Window replaces the editor-centric UI. You can now run agents across multiple repositories, whether local, in git worktrees, cloud, or remote SSH environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open:&lt;/strong&gt; &lt;code&gt;Cmd+Shift+P&lt;/code&gt; → “Agents Window”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benefit:&lt;/strong&gt; Launch multiple agents for parallel tasks (e.g., scaffold a new endpoint in one repo while another agent fixes a shared library).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oyy02bv4kn9xdm2zubf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oyy02bv4kn9xdm2zubf.png" alt="Agents Window" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Mode
&lt;/h3&gt;

&lt;p&gt;Inside the Agents Window, Design Mode lets you annotate browser UIs directly—select elements, highlight areas, and add context for agents without verbose descriptions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shortcuts:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Cmd+Shift+D&lt;/code&gt; – Toggle Design Mode&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Shift+drag&lt;/code&gt; – Select area&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Cmd+L&lt;/code&gt; – Add element to chat&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pfv5dbgfl43dnkza119.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pfv5dbgfl43dnkza119.png" alt="Design Mode" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Apps: Structured Content Output
&lt;/h3&gt;

&lt;p&gt;MCP Apps now support structured outputs. Tool responses from MCP servers (like Apidog) can now return rich, structured data instead of flat text.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F479ffyk6lfbdc6dsuua9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F479ffyk6lfbdc6dsuua9.png" alt="MCP Structured Output" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical effect:&lt;/strong&gt; Agents receive clean, typed API data (schemas, endpoints, test results), reducing misinterpretation.&lt;/p&gt;
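
&lt;p&gt;For illustration, a structured tool result looks roughly like this. The &lt;code&gt;structuredContent&lt;/code&gt; field follows the MCP specification; the endpoint summary inside it is a made-up payload, not Apidog's actual output:&lt;/p&gt;

```json
{
  "content": [
    { "type": "text", "text": "GET /v1/messages returns a MessageList object" }
  ],
  "structuredContent": {
    "method": "GET",
    "path": "/v1/messages",
    "responseSchema": { "type": "object", "required": ["messages"] }
  }
}
```

&lt;p&gt;The agent can consume &lt;code&gt;structuredContent&lt;/code&gt; as typed data instead of re-parsing the prose in &lt;code&gt;content&lt;/code&gt;.&lt;/p&gt;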

&lt;h3&gt;
  
  
  Worktrees and &lt;code&gt;/best-of-n&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/worktree&lt;/code&gt;&lt;/strong&gt; creates an isolated git worktree for safe, side-effect-free experimentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/best-of-n&lt;/code&gt;&lt;/strong&gt; runs the same task in parallel across multiple models (e.g., Claude, GPT-4o, Gemini), each in its own worktree, so you can compare and pick the best result.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud-to-local handoff
&lt;/h3&gt;

&lt;p&gt;Agents can move seamlessly between cloud and local environments. Start a task in the cloud and pull it down for local testing—or push to the cloud to keep processes running when you’re offline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Impact for API development
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Parallel endpoint development
&lt;/h3&gt;

&lt;p&gt;Assign each endpoint to a separate agent, run in parallel, and review all outputs together. This compresses the time from “need endpoints” to “reviewable drafts.”&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema-aware code generation
&lt;/h3&gt;

&lt;p&gt;When integrated with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog’s MCP Server&lt;/a&gt;, Cursor agents pull your real OpenAPI schemas. An example MCP server config in Cursor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Example&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;MCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Cursor&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apidog"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@apidog/mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"APIDOG_ACCESS_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_access_token"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents generate code that matches your true data models and endpoints—no more guesswork or manual corrections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contract testing inside the editor
&lt;/h3&gt;

&lt;p&gt;Agents can execute terminal commands. Combine this with the Apidog CLI for automated contract validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apidog run &lt;span class="nt"&gt;--scenario&lt;/span&gt; &amp;lt;test-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents generate code, execute tests, analyze failures, and iterate—all from within Cursor.&lt;/p&gt;
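&lt;p&gt;Under the hood, an agent’s test step boils down to invoking the CLI and checking its exit code. A minimal sketch in Python (assuming the Apidog CLI exits nonzero when a scenario fails; the scenario id and environment name are placeholders for your own values):&lt;/p&gt;

```python
import subprocess

def build_apidog_command(scenario_id, env="staging"):
    # Assemble the CLI invocation shown above; scenario id and
    # environment name are placeholders, not real identifiers.
    return ["apidog", "run", "--scenario", scenario_id, "--env", env]

def run_contract_test(scenario_id):
    # Assumes the CLI signals failed scenarios with a nonzero exit code.
    result = subprocess.run(
        build_apidog_command(scenario_id), capture_output=True, text=True
    )
    return result.returncode == 0
```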

&lt;h3&gt;
  
  
  Documentation review
&lt;/h3&gt;

&lt;p&gt;Agents can cross-check your Apidog docs (via MCP Server) versus implementation, flagging mismatches as part of their review loop. This helps prevent documentation drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  Actionable workflow: Cursor 3 + Apidog MCP Server
&lt;/h2&gt;

&lt;p&gt;Here’s a step-by-step API development loop:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Connect Apidog MCP Server to Cursor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In Cursor MCP settings, add:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apidog"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@apidog/mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"APIDOG_ACCESS_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_access_token"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get your access token in Apidog under Account Settings &amp;gt; API Access Token.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Scaffold a new endpoint
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add a new endpoint (e.g., &lt;code&gt;POST /invoices&lt;/code&gt;) to your Apidog spec.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In Agents Window, instruct:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Look up the POST /invoices endpoint in the Apidog project. Read its request and response schemas. Generate a Node.js/Express handler that matches the spec. Then run the test scenario to verify it."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calls &lt;code&gt;get_endpoint_detail&lt;/code&gt; via MCP Server.&lt;/li&gt;
&lt;li&gt;Generates handler code.&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;apidog run --scenario invoice-creation-test --env staging&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Reviews failures and patches code.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You review and approve the final diff.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Use &lt;code&gt;/best-of-n&lt;/code&gt; for complex endpoints
&lt;/h3&gt;

&lt;p&gt;Let multiple agents generate implementations. Compare worktrees, and pick the best result based on error handling or code structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Keep docs in sync
&lt;/h3&gt;

&lt;p&gt;Run an agent session:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Check the Apidog documentation for POST /invoices. Compare it against the code in invoices.js. Flag any discrepancies. If the response shape in code differs from the spec, update the Apidog spec to match."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Agents analyze both sources via MCP and propose corrections.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting started: Practical setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Upgrade Cursor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Download the latest Cursor from &lt;a href="https://cursor.com/" rel="noopener noreferrer"&gt;cursor.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Open the command palette (&lt;code&gt;Cmd+Shift+P&lt;/code&gt;) and select “Agents Window” to confirm Cursor 3 is running.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Generate an Apidog access token
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, go to Account Settings &amp;gt; API Access Token.&lt;/li&gt;
&lt;li&gt;Generate and copy your token.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Add the Apidog MCP Server to Cursor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In Cursor Settings &amp;gt; MCP, add:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apidog"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@apidog/mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"APIDOG_ACCESS_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_token_here"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"APIDOG_PROJECT_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_project_id"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find your project ID in the Apidog project URL.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Verify the connection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In the Agents Window, start a session and type:
&lt;code&gt;"List the endpoints in my Apidog project."&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If you see your endpoints, the setup works.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Install and configure Apidog CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; apidog-cli
apidog &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;In any Apidog test scenario, open the CI/CD tab and copy the CLI command for your project/scenario.&lt;/li&gt;
&lt;li&gt;Run in Cursor’s terminal, or let an agent handle it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 6: Run your first MCP-powered agent task
&lt;/h3&gt;

&lt;p&gt;Prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Look up the schema for the User object in Apidog. Generate a TypeScript interface that matches it exactly."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Review output for accuracy. If correct, your integration is ready for complex workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Cursor 3’s agent-centric design aligns with the realities of modern API development: orchestrating parallel work across endpoints and environments. The structured MCP output—especially with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog’s MCP Server&lt;/a&gt; and CLI—lets agents generate better, spec-driven code and close the loop with automated tests. This isn’t a theoretical demo; it’s a daily workflow you can implement now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does Cursor 3 replace the existing IDE interface?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Cursor 3 adds the Agents Window but keeps the IDE interface. You can use both interchangeably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's different between Cursor 3 and previous versions?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cursor 3 centers on agents (not the editor), supports parallel execution, cloud-to-local handoff, Design Mode, and &lt;code&gt;/worktree&lt;/code&gt; / &lt;code&gt;/best-of-n&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Apidog MCP Server connect to Cursor 3?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Add Apidog MCP Server as an MCP configuration in Cursor Settings. It exposes your project’s API data as callable tools. Cursor agents use these tools to read endpoint specs, schemas, and scenarios in structured format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Cursor 3 agents run Apidog test scenarios automatically?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. With Apidog CLI configured, agents can run test scenarios, parse outputs, and iterate code—all within the agent workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a paid Cursor plan to use the Agents Window?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Agents Window is available in all plans, but cloud agent execution requires a paid subscription. Local agents work on the free tier. See &lt;a href="https://cursor.com/pricing" rel="noopener noreferrer"&gt;cursor.com/pricing&lt;/a&gt; for details.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>api</category>
      <category>mcp</category>
    </item>
    <item>
      <title>How to run Gemma 4 locally with Ollama: a complete guide</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:46:43 +0000</pubDate>
      <link>https://dev.to/apilover/how-to-run-gemma-4-locally-with-ollama-a-complete-guide-487b</link>
      <guid>https://dev.to/apilover/how-to-run-gemma-4-locally-with-ollama-a-complete-guide-487b</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Gemma 4 launched on April 2, 2026, with Ollama v0.20.0 adding support within a day. You can pull and run the default &lt;code&gt;gemma4:e4b&lt;/code&gt; model with a single &lt;code&gt;ollama run gemma4&lt;/code&gt; command. This tutorial shows you how to set up, select models, use the API, and test your local Gemma 4 endpoints using Apidog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://apidog.com/blog/gemma-4-api/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Google released Gemma 4&lt;/a&gt; on April 2, 2026. Ollama v0.20.0 shipped within 24 hours, supporting all four model variants.&lt;/p&gt;

&lt;p&gt;Why should developers care? Gemma 4 is a significant upgrade: 89.2% on AIME 2026 (vs. Gemma 3's 20.8%) and a jump to 2150 ELO on Codeforces for coding. It features native function calling, configurable thinking modes, and a 256K context window on larger variants—all running locally.&lt;/p&gt;

&lt;p&gt;For API-powered app development, local setup means you get a fast, private AI layer. Use it for generating mock data, writing test scenarios, and validating API responses—no cloud dependency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Once Gemma 4 runs locally, Apidog's Smart Mock can generate realistic API response data from your schema using AI-backed inference. Define your API shape once; Apidog handles the mock data—ideal for consistent, schema-compliant test data in local experiments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This guide covers installation, running models, using the API, and testing endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new in Gemma 4
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; ships four model variants:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsos2mixfbxq6ww8ghoy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsos2mixfbxq6ww8ghoy8.png" alt="Gemma 4 Model Variants" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning and coding:&lt;/strong&gt; 31B model scores 80% on LiveCodeBench v6 (Gemma 3 27B: 29.1%).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixture-of-Experts (MoE):&lt;/strong&gt; 26B uses MoE (4B active params), giving high quality at lower compute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer context:&lt;/strong&gt; E2B/E4B support 128K tokens; 26B/31B support 256K—enough for large codebases or specs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native function calling:&lt;/strong&gt; All models accept function schemas and return valid JSON—no prompt tricks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio and image input:&lt;/strong&gt; E2B/E4B accept audio and images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking modes:&lt;/strong&gt; Enable/disable chain-of-thought per request as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Gemma 4 model variants explained
&lt;/h2&gt;

&lt;p&gt;Choose a model based on your hardware:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size on disk&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma4:e2b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7.2 GB&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;Laptops, edge, audio/image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma4:e4b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;Most developers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma4:26b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18 GB&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;MoE (4B active)&lt;/td&gt;
&lt;td&gt;Best quality per GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma4:31b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;20 GB&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;Max quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;e4b&lt;/code&gt; model is the default (&lt;code&gt;ollama run gemma4&lt;/code&gt;). It fits most GPUs (10+ GB VRAM) and Apple Silicon.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;26b&lt;/code&gt; is MoE: only 4B parameters active per token. Fast inference with near-flagship quality—good for 20+ GB RAM.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama v0.20.0 or later&lt;/strong&gt; is required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upgrade if needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew upgrade ollama

&lt;span class="c"&gt;# Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Windows, download the latest from &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gemma4:e2b&lt;/code&gt;: 8 GB RAM min (16 GB recommended)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gemma4:e4b&lt;/code&gt;: 10 GB VRAM or 16 GB unified memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gemma4:26b&lt;/code&gt;: 20+ GB RAM or unified memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gemma4:31b&lt;/code&gt;: 24 GB VRAM or 32 GB unified memory&lt;/li&gt;
&lt;/ul&gt;
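&lt;p&gt;If you want to pick a variant programmatically, the guidance above maps to a simple rule of thumb (the thresholds follow this article’s recommendations, not official limits):&lt;/p&gt;

```python
def pick_gemma4_variant(vram_gb=0, ram_gb=0):
    # Map available memory to a model tag, using the hardware
    # guidance above as rough thresholds.
    if vram_gb >= 24 or ram_gb >= 32:
        return "gemma4:31b"
    if ram_gb >= 20:
        return "gemma4:26b"
    if vram_gb >= 10 or ram_gb >= 16:
        return "gemma4:e4b"
    return "gemma4:e2b"

print(pick_gemma4_variant(ram_gb=16))  # gemma4:e4b
```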

&lt;h2&gt;
  
  
  Installing and running Gemma 4
&lt;/h2&gt;

&lt;p&gt;Pull and run the default e4b model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8678q5cv6x7szg7u7lfq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8678q5cv6x7szg7u7lfq.png" alt="Ollama Running Gemma 4" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This downloads ~9.6 GB and starts an interactive session. Try it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; What are the HTTP status codes for client errors?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run specific variants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Edge model, smallest&lt;/span&gt;
ollama run gemma4:e2b

&lt;span class="c"&gt;# MoE for quality/size&lt;/span&gt;
ollama run gemma4:26b

&lt;span class="c"&gt;# Full flagship&lt;/span&gt;
ollama run gemma4:31b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull without running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4
ollama pull gemma4:26b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;List installed models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using the Gemma 4 API locally
&lt;/h2&gt;

&lt;p&gt;Ollama exposes a REST API at &lt;code&gt;http://localhost:11434&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate a completion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/api/generate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gemma4",
    "prompt": "Write a JSON response for a user profile API endpoint",
    "stream": false
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chat completion (OpenAI-compatible)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gemma4",
    "messages": [
      {
        "role": "user",
        "content": "Generate a realistic JSON mock for an e-commerce order API response"
      }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python client
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_gemma4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask_gemma4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List the fields a payment API response should include&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the OpenAI Python SDK
&lt;/h3&gt;

&lt;p&gt;Ollama's API supports the OpenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Required by SDK, unused by Ollama
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You generate realistic API response data in JSON format.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a sample response for a GET /users/{id} endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using function calling with Gemma 4
&lt;/h2&gt;

&lt;p&gt;Gemma 4 supports native function calling—define a tool schema, get structured JSON matching your function signature.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieve a user by ID from the API&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The unique user ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;include_orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boolean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Whether to include order history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get user 42 with their order history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# get_user
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# {"user_id": 42, "include_orders": true}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model extracts parameters from natural language, returning valid JSON—no post-processing needed.&lt;/p&gt;
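&lt;p&gt;To act on the tool call, parse the arguments string and route it to a local handler. A minimal dispatch sketch follows; the &lt;code&gt;get_user&lt;/code&gt; handler and its return shape are hypothetical stand-ins, and a stub replaces the live response object so the snippet runs on its own:&lt;/p&gt;

```python
import json
from types import SimpleNamespace

# Hypothetical local handler standing in for a real backend call.
def get_user(user_id, include_orders=False):
    user = {"id": user_id, "email": f"user{user_id}@example.com"}
    if include_orders:
        user["orders"] = [{"order_id": 1001, "total": 49.99}]
    return user

HANDLERS = {"get_user": get_user}

def dispatch(tool_call):
    # Arguments arrive as a JSON string and must be parsed before calling.
    args = json.loads(tool_call.function.arguments)
    return HANDLERS[tool_call.function.name](**args)

# In practice this object is response.choices[0].message.tool_calls[0].
stub = SimpleNamespace(function=SimpleNamespace(
    name="get_user",
    arguments='{"user_id": 42, "include_orders": true}',
))
result = dispatch(stub)
print(result["id"])  # 42
```

&lt;p&gt;The handler table keeps dispatch data-driven, so adding a second tool is one schema entry plus one dictionary entry.&lt;/p&gt;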

&lt;h2&gt;
  
  
  Enabling thinking mode
&lt;/h2&gt;

&lt;p&gt;For complex tasks (e.g., writing test scenarios, analyzing API specs), enable chain-of-thought reasoning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design a complete test scenario for a payment processing API with edge cases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;think&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skip thinking mode for simple requests to reduce latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Gemma 4 API responses with Apidog
&lt;/h2&gt;

&lt;p&gt;With Gemma 4 running locally, use Apidog to test endpoints efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci3hhb0u9dmnzb9i72vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci3hhb0u9dmnzb9i72vs.png" alt="Apidog Testing" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a project:&lt;/strong&gt; In Apidog, create a new project and set its base URL to &lt;code&gt;http://localhost:11434&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define endpoints:&lt;/strong&gt; Add:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/generate&lt;/code&gt; (single-turn completions)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /v1/chat/completions&lt;/code&gt; (multi-turn chat)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/tags&lt;/code&gt; (list models)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up Test Scenario:&lt;/strong&gt; Chain requests with assertions:

&lt;ul&gt;
&lt;li&gt;Step 1: &lt;code&gt;GET /api/tags&lt;/code&gt;—assert &lt;code&gt;gemma4&lt;/code&gt; is listed.&lt;/li&gt;
&lt;li&gt;Step 2: &lt;code&gt;POST /api/generate&lt;/code&gt;—assert &lt;code&gt;response&lt;/code&gt; field is non-empty.&lt;/li&gt;
&lt;li&gt;Step 3: &lt;code&gt;POST /v1/chat/completions&lt;/code&gt;—assert reply format.&lt;/li&gt;
&lt;li&gt;Use Apidog's Extract Variable processor to pass responses between steps for multi-turn flow testing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate schemas:&lt;/strong&gt; Apidog Contract Testing validates API responses against your OpenAPI spec. Define expected response shapes and run contract tests after model updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel development with Smart Mock:&lt;/strong&gt; Apidog's Smart Mock generates schema-compliant responses from your API spec, letting frontend teams work without waiting for the local model.&lt;/li&gt;
&lt;/ol&gt;
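&lt;p&gt;The first two assertions in the test scenario above can also be scripted directly against the local server. A stdlib-only sketch, assuming Ollama's default port and the documented &lt;code&gt;/api/tags&lt;/code&gt; response shape (a &lt;code&gt;models&lt;/code&gt; list of objects carrying a &lt;code&gt;name&lt;/code&gt; field):&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"

def model_listed(tags_payload, name):
    # /api/tags returns {"models": [{"name": "gemma4:latest", ...}, ...]}
    return any(m["name"].startswith(name) for m in tags_payload.get("models", []))

def get_json(url):
    # Fetch and decode a JSON endpoint from the local Ollama server.
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def smoke_test():
    # Mirrors Step 1 of the Apidog scenario; call this with Ollama running.
    tags = get_json(f"{OLLAMA}/api/tags")
    assert model_listed(tags, "gemma4"), "run `ollama pull gemma4` first"
```

&lt;p&gt;Keeping the check in a pure function (&lt;code&gt;model_listed&lt;/code&gt;) means the assertion logic can be unit-tested without a running server.&lt;/p&gt;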

&lt;h2&gt;
  
  
  Multimodal input with Gemma 4
&lt;/h2&gt;

&lt;p&gt;E2B and E4B models accept images alongside text. Send images as base64-encoded strings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_diagram.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:e4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/png;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe the API flow shown in this diagram and identify potential error paths&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this to analyze diagrams and screenshots, or to extract information your API workflow needs from images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common issues and fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model not found:&lt;/strong&gt; Run &lt;code&gt;ollama pull gemma4&lt;/code&gt; or verify with &lt;code&gt;ollama list&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow inference on CPU:&lt;/strong&gt; Use &lt;code&gt;gemma4:e2b&lt;/code&gt; for better performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Out of memory:&lt;/strong&gt; Check VRAM/unified memory with &lt;code&gt;ollama ps&lt;/code&gt;. Use smaller models if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple Silicon issues:&lt;/strong&gt; Update Ollama (0.20.0+ adds MLX support).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port in use:&lt;/strong&gt; Run &lt;code&gt;OLLAMA_HOST=0.0.0.0:11435 ollama serve&lt;/code&gt; to use a different port.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cut-off responses:&lt;/strong&gt; Increase context window: add &lt;code&gt;"options": {"num_ctx": 8192}&lt;/code&gt; to your request body.&lt;/li&gt;
&lt;/ul&gt;
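&lt;p&gt;For the cut-off-response fix, the &lt;code&gt;options&lt;/code&gt; object rides alongside the prompt in the request body. A sketch of the raw &lt;code&gt;/api/generate&lt;/code&gt; call; the prompt text is a placeholder, and the payload shape follows Ollama's native API:&lt;/p&gt;

```python
import json
import urllib.request

# num_ctx raises the context window for this request only;
# stream=False returns the reply as a single JSON object.
payload = {
    "model": "gemma4",
    "prompt": "Summarize the endpoints in the OpenAPI spec below: ...",
    "stream": False,
    "options": {"num_ctx": 8192},
}

def generate(url="http://localhost:11434/api/generate"):
    # POST the payload and pull the generated text out of the reply.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```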

&lt;h2&gt;
  
  
  Gemma 4 vs other local models
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best size for most users&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Function calling&lt;/th&gt;
&lt;th&gt;Coding benchmark&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4&lt;/td&gt;
&lt;td&gt;e4b (9.6 GB)&lt;/td&gt;
&lt;td&gt;128K-256K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;80% LiveCodeBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3&lt;/td&gt;
&lt;td&gt;70B-Q4 (40 GB)&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;~60% LiveCodeBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.6-Plus&lt;/td&gt;
&lt;td&gt;72B-Q4 (44 GB)&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral Small&lt;/td&gt;
&lt;td&gt;24B (14 GB)&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4's MoE 26B (18 GB) delivers near-flagship quality with better tokens/sec than larger dense models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For coding, 31B is competitive with larger models.&lt;/li&gt;
&lt;li&gt;For laptops/edge, &lt;code&gt;e2b&lt;/code&gt; runs under 8 GB.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemma 4 with Ollama is a powerful local AI setup. Installation is fast, the default model fits most developer machines, and the improvements over Gemma 3 are substantial.&lt;/p&gt;

&lt;p&gt;Start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the API using Apidog to validate endpoints, then select the right model variant for your needs.&lt;/p&gt;

&lt;p&gt;For API-driven development, combining local inference with Apidog's Smart Mock and Test Scenarios delivers a complete, cloud-free workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How do I update Gemma 4 in Ollama when a new version comes out?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Run &lt;code&gt;ollama pull gemma4&lt;/code&gt; to fetch the latest version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I run Gemma 4 on a machine without a GPU?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, but it's slow (1–3 tokens/sec). &lt;code&gt;e2b&lt;/code&gt; is best for CPU-only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between gemma4:e2b and gemma4:e4b?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Both are dense models. E4B has more parameters and better reasoning; E2B is smaller and supports audio input. For text-only work, E4B is the better default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Gemma 4 work with LangChain and LlamaIndex?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Point the provider to &lt;code&gt;http://localhost:11434&lt;/code&gt; and use &lt;code&gt;gemma4&lt;/code&gt; as the model name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the local Gemma 4 API compatible with OpenAI code?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Mostly yes. Ollama's &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint matches the OpenAI format. Set &lt;code&gt;base_url&lt;/code&gt; to &lt;code&gt;http://localhost:11434/v1&lt;/code&gt; and pass any placeholder string as the &lt;code&gt;api_key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I use Gemma 4's thinking mode?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Add &lt;code&gt;"think": true&lt;/code&gt; in the &lt;code&gt;extra_body&lt;/code&gt; (OpenAI SDK) or top-level JSON in direct API calls.&lt;/p&gt;
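&lt;p&gt;A sketch of the raw-API form, with &lt;code&gt;think&lt;/code&gt; at the top level of a native &lt;code&gt;/api/chat&lt;/code&gt; body (the prompt is a placeholder):&lt;/p&gt;

```python
import json

# Top-level "think" flag for Ollama's native /api/chat endpoint;
# the OpenAI-compatible route expects it inside extra_body instead.
body = {
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Plan edge-case tests for a refund API"}],
    "think": True,
    "stream": False,
}
raw = json.dumps(body)  # send as the POST body to http://localhost:11434/api/chat
```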

&lt;p&gt;&lt;strong&gt;Can I serve Gemma 4 to other machines on my network?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Start Ollama with &lt;code&gt;OLLAMA_HOST=0.0.0.0:11434 ollama serve&lt;/code&gt; and use your IP address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the best Gemma 4 model for API development?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For mock data and tests, &lt;code&gt;e4b&lt;/code&gt; balances speed and quality. For complex analysis, &lt;code&gt;26b&lt;/code&gt; MoE offers better results at lower resource cost.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How do you run Gemma 4 as an API backend?</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:32:17 +0000</pubDate>
      <link>https://dev.to/apilover/how-do-you-run-gemma-4-as-an-api-backend-3dkf</link>
      <guid>https://dev.to/apilover/how-do-you-run-gemma-4-as-an-api-backend-3dkf</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Google released Gemma 4 in April 2026—a family of four open models under Apache 2.0 that outperform models 20x their size on standard benchmarks. You can access the Gemma 4 API via Google AI Studio, Vertex AI, or run it locally with Ollama and vLLM. Combine this with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Smart Mock&lt;/a&gt; to auto-generate schema-conformant API responses from your OpenAPI specs—no manual mock rules required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Open-source AI models often force developers to choose between capability and deployability: large models are powerful but hard to run locally, while smaller models lack advanced reasoning. Gemma 4, from Google DeepMind, breaks this compromise.&lt;/p&gt;

&lt;p&gt;Gemma 4 is the most capable open model family Google has released. The 31B Dense model ranks #3 on Arena AI's leaderboard, outperforming models 20x its size. The 26B Mixture of Experts (MoE) is #6, with both running on a single 80GB GPU. The E2B and E4B models run fully offline on edge devices and phones.&lt;/p&gt;

&lt;p&gt;For API developers, Gemma 4 offers native function calling, structured JSON output, and 256K token context windows—making it ideal for building AI-powered API tooling: generating test data, writing mocks, and analyzing API responses.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Tip:&lt;/strong&gt; Need to validate AI-generated responses against your OpenAPI spec? &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Smart Mock engine&lt;/a&gt; auto-generates schema-conformant mock responses with zero manual rules. Connect Apidog to your Gemma 4 workflow to instantly produce contextually appropriate data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Gemma 4 and What's New
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is the fourth generation of open language models from Google DeepMind. Since the Gemma series began in early 2024, it has seen 400+ million downloads and over 100,000 community variants.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzbpjh1grry85motv4r3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzbpjh1grry85motv4r3.png" alt="Gemma 4 ecosystem" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gemma 4 is licensed under &lt;a href="https://goo.gle/gemma-4-apache-2" rel="noopener noreferrer"&gt;Apache 2.0&lt;/a&gt;, enabling unrestricted commercial use, modification, and distribution—a major shift from previous custom licenses.&lt;/p&gt;

&lt;p&gt;Key improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodal input:&lt;/strong&gt; All Gemma 4 models process images and video natively. E2B/E4B models also support audio for speech recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer context windows:&lt;/strong&gt; E2B/E4B: 128K tokens; 26B/31B: 256K tokens—enough for entire codebase prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic workflow support:&lt;/strong&gt; Native function calling, structured JSON output mode, and system instructions support agent orchestration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced reasoning:&lt;/strong&gt; 31B model excels at multi-step instructions and math, crucial for API test generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;140+ language support:&lt;/strong&gt; Trained natively on 140+ languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 licensing:&lt;/strong&gt; Removes legal ambiguity for commercial users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l4n6u3rs2tmchuq8rcy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l4n6u3rs2tmchuq8rcy.png" alt="Gemma 4 benchmark comparison" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4 Model Variants and Capabilities
&lt;/h2&gt;

&lt;p&gt;Gemma 4 comes in four variants, each optimized for different hardware:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Active params (inference)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;Effective 2B&lt;/td&gt;
&lt;td&gt;~2B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Mobile, IoT, offline edge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E4B&lt;/td&gt;
&lt;td&gt;Effective 4B&lt;/td&gt;
&lt;td&gt;~4B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Phones, Raspberry Pi, Jetson&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;26B total&lt;/td&gt;
&lt;td&gt;~3.8B active&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Latency-sensitive server tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Highest quality, fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;E2B/E4B use Mixture of Experts (MoE) to minimize RAM and battery usage on edge devices. The 26B MoE, with only 3.8B active parameters during inference, is ideal for low-latency server tasks. 31B Dense is best for high-quality, complex use cases.&lt;/p&gt;

&lt;p&gt;For API tooling, 26B MoE offers the best speed-quality tradeoff, while 31B Dense is optimal for structured JSON output and multi-step logic. All models support function calling and JSON output mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Gemma 4 API: Step by Step
&lt;/h2&gt;

&lt;p&gt;You can access Gemma 4 via Google AI Studio, Vertex AI, or run it locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Google AI Studio (Fastest for Prototyping)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sign up:&lt;/strong&gt; Go to &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt; and create a free account. Generate an API key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install SDK:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;google-genai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Basic API Call:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

   &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a JSON object for a user account with id, email, and created_at fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Structured JSON Output:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

   &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;generation_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
   Generate 3 sample user objects for an e-commerce API. 
   Each user should have: id (integer), email (string), username (string), 
   created_at (ISO 8601 timestamp), and subscription_tier (free|pro|enterprise).
   Return as a JSON array.
   &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

   &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
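&lt;p&gt;Even with &lt;code&gt;response_mime_type&lt;/code&gt; set, it is worth guarding the &lt;code&gt;json.loads&lt;/code&gt; call: open models occasionally wrap their output in markdown code fences. A minimal defensive parser (&lt;code&gt;parse_model_json&lt;/code&gt; is an illustrative helper, not part of the SDK):&lt;/p&gt;

```python
import json

def parse_model_json(text: str):
    """Parse model output as JSON, tolerating a markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

# Handles both raw and fenced output
print(parse_model_json('[{"id": 1}]'))
print(parse_model_json('```json\n[{"id": 1}]\n```'))
```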



&lt;h3&gt;
  
  
  Option 2: Local Deployment with Ollama
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Ollama:&lt;/strong&gt; &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull Model:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   ollama pull gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run Server:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ollama Chat API Call:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

   &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
               &lt;span class="p"&gt;{&lt;/span&gt;
                   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a valid JSON response for a REST API /products endpoint. Include id, name, price, and stock fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
               &lt;span class="p"&gt;}&lt;/span&gt;
           &lt;span class="p"&gt;],&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: Function Calling for API Orchestration
&lt;/h3&gt;

&lt;p&gt;Gemma 4 supports native function calling for tool-based workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function_declarations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_api_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieve the OpenAPI schema for a given endpoint path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;endpoint_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The API endpoint path, e.g. /users/{id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PUT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PATCH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                        &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;endpoint_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need to test the GET /users/{id} endpoint. What schema should the response follow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check if the model wants to call a function
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;fc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model called function: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;With args: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
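&lt;p&gt;The snippet above stops at printing the function call. The usual next step is to dispatch it to a local handler and feed the result back to the model. A minimal dispatch sketch (&lt;code&gt;fc&lt;/code&gt; is stubbed here to stand in for &lt;code&gt;response.candidates[0].content.parts[0].function_call&lt;/code&gt;, and &lt;code&gt;get_api_schema&lt;/code&gt; is a hypothetical handler):&lt;/p&gt;

```python
from types import SimpleNamespace

def get_api_schema(endpoint_path: str, method: str) -> dict:
    # Hypothetical handler: in practice this would look up your OpenAPI spec
    return {"path": endpoint_path, "method": method, "schema": {"type": "object"}}

# Map function names the model may call to local Python handlers
HANDLERS = {"get_api_schema": get_api_schema}

# Stub of the function_call object the SDK would return
fc = SimpleNamespace(name="get_api_schema",
                     args={"endpoint_path": "/users/{id}", "method": "GET"})

result = HANDLERS[fc.name](**dict(fc.args))
print(result["path"], result["method"])
```

&lt;p&gt;In a real loop you would then return &lt;code&gt;result&lt;/code&gt; to the model as a function response so it can compose its final answer.&lt;/p&gt;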



&lt;h2&gt;
  
  
  Building AI-Powered API Mocks with Gemma 4
&lt;/h2&gt;

&lt;p&gt;Use Gemma 4 to generate mock data directly from your OpenAPI schema—ideal for prototyping frontends or testing edge cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generation_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# OpenAPI schema for the response
&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^ORD-[0-9]{6}$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delivered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancelled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;array&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date-time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Generate 5 realistic mock responses for an order management API.
Each response must conform exactly to this JSON Schema:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Make the data realistic: use realistic prices, product IDs, and varied statuses.
Return as a JSON array of 5 order objects.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mock_orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mock_orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma 4 respects enum values, string patterns, and numeric ranges—producing mock data that matches your API contract. You can generate mocks for any endpoint by feeding in the relevant schema.&lt;/p&gt;
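&lt;p&gt;Trust but verify: since the schema encodes a pattern, an enum, and numeric minimums, spot-checking the generated mocks is cheap. A stdlib-only sketch (the third-party &lt;code&gt;jsonschema&lt;/code&gt; package does this more thoroughly):&lt;/p&gt;

```python
import re

STATUS_ENUM = {"pending", "shipped", "delivered", "cancelled"}
ORDER_NUMBER_RE = re.compile(r"^ORD-[0-9]{6}$")

def check_order(order: dict) -> list:
    """Return a list of constraint violations for one mock order."""
    errors = []
    if not ORDER_NUMBER_RE.match(order.get("order_number", "")):
        errors.append("order_number does not match ^ORD-[0-9]{6}$")
    if order.get("status") not in STATUS_ENUM:
        errors.append("status not in enum")
    total = order.get("total")
    if not isinstance(total, (int, float)) or total < 0:
        errors.append("total must be a non-negative number")
    return errors

sample = {"id": 1, "order_number": "ORD-482913", "status": "shipped",
          "total": 59.98, "items": [], "created_at": "2026-01-15T10:30:00Z"}
print(check_order(sample))  # [] when the mock conforms
```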

&lt;p&gt;For larger specs, paste your entire OpenAPI definition and request multiple test cases per endpoint. Export your &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; collection as OpenAPI, feed it to Gemma 4, and get a complete mock dataset in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Gemma 4 API Responses with Apidog
&lt;/h2&gt;

&lt;p&gt;Once Gemma 4 is generating responses or powering your API, validate those outputs with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Test Scenarios&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuylv8atrfpes4nysaff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzuylv8atrfpes4nysaff.png" alt="Apidog Test Scenarios" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import Gemma 4 Endpoint:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In Apidog, create a new endpoint for your Gemma 4 API or wrapper. Set the response schema.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Smart Mock for Baseline Responses:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use Smart Mock to auto-generate sample data from your schema—no manual rules. Property names like &lt;code&gt;email&lt;/code&gt; or &lt;code&gt;created_at&lt;/code&gt; get realistic, type-appropriate values.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F397ntwjhpdvd3crnuv5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F397ntwjhpdvd3crnuv5d.png" alt="Apidog Smart Mock" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a Test Scenario:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In Apidog's Tests module, build a scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call your authentication endpoint (if needed)&lt;/li&gt;
&lt;li&gt;Send a prompt to Gemma 4 with the token&lt;/li&gt;
&lt;li&gt;Extract generated JSON from the response&lt;/li&gt;
&lt;li&gt;Validate JSON structure with schema assertions&lt;/li&gt;
&lt;li&gt;Pass the validated data to downstream endpoints&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up Assertions:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Check status codes, headers, and JSON fields. Extract Gemma 4's output for further steps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data-driven Testing:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Import CSV/JSON files with prompt variations. Run all of them in one click to verify Gemma 4 handles diverse inputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
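&lt;p&gt;The assertion step can also be mirrored in plain Python when you want the same checks inside a unit test. This is a hand-rolled sketch, not Apidog's assertion engine, and the field names are illustrative:&lt;/p&gt;

```python
def assert_response(resp: dict, required: dict) -> list:
    """Return a list of problems: missing fields or wrong types.

    `required` maps field name to the expected Python type.
    """
    problems = []
    for field, expected_type in required.items():
        if field not in resp:
            problems.append(f"missing field: {field}")
        elif not isinstance(resp[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(resp[field]).__name__}"
            )
    return problems

resp = {"order_id": "ORD-000123", "status": "pending", "total": "19.99"}
# total is a string here, so the check flags it
print(assert_response(resp, {"order_id": str, "status": str, "total": float}))
```

&lt;p&gt;A check like this catches the classic LLM failure mode of numeric values coming back as strings.&lt;/p&gt;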

&lt;p&gt;Setup takes ~15 minutes. Afterward, run tests manually or via CLI in CI/CD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API test data generation:&lt;/strong&gt; Instantly generate hundreds of realistic records from your OpenAPI schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent API mocking:&lt;/strong&gt; Return context-aware mock responses (e.g., product search returns different results by query).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API documentation generation:&lt;/strong&gt; Feed your codebase to Gemma 4 and prompt it to generate OpenAPI docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response schema validation:&lt;/strong&gt; Analyze responses for missing fields, incorrect types, or enum mismatches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated regression testing:&lt;/strong&gt; Generate tests for historical bugs based on schema and bug reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Gemma 4 vs. Other Open Models for API Use
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;JSON Output&lt;/th&gt;
&lt;th&gt;Function Calling&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B MoE&lt;/td&gt;
&lt;td&gt;26B (3.8B active)&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Via prompt&lt;/td&gt;
&lt;td&gt;Via prompt&lt;/td&gt;
&lt;td&gt;Llama Community&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral 7B&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;Via prompt&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 72B&lt;/td&gt;
&lt;td&gt;72B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Gemma 4's native JSON mode, function calling, long context, and Apache 2.0 license make it a strong choice for API tooling. Llama 3.3 70B is competitive but needs more than double the compute. Qwen 2.5 72B is strong for multilingual work but likewise requires more hardware. Mistral 7B is fast but limited in context length and features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; Use Gemma 4 26B MoE for low-latency tasks, Gemma 4 31B for highest quality and structured output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemma 4 gives developers a credible open-source alternative for building API tooling—no legal friction, no extensive prompt engineering. Native function calling and JSON output make it easy to integrate into any API workflow.&lt;/p&gt;

&lt;p&gt;The four model sizes cover everything from edge devices to workstations, and the 26B MoE is the practical default for most use cases.&lt;/p&gt;

&lt;p&gt;Pair Gemma 4 with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; for a seamless loop between AI-generated data and API validation. Use Gemma 4 for test data and mocks, Apidog's Smart Mock for schema prototyping, and Test Scenarios for contract validation. This workflow accelerates building and testing AI-powered APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is Gemma 4?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Gemma 4 is Google DeepMind's latest open language model family (E2B, E4B, 26B MoE, 31B Dense) under Apache 2.0. The 31B model is #3 on Arena AI's leaderboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Gemma 4 free to use?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Model weights are free under Apache 2.0. You pay for compute. Google AI Studio offers a free tier; Vertex AI charges standard rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Gemma 4 output structured JSON?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Use &lt;code&gt;response_mime_type: "application/json"&lt;/code&gt; with the SDK to force valid JSON output—ideal for programmatic API integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Gemma 4 compare to GPT-4o for API development?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
GPT-4o is proprietary, requires cloud usage, and is more expensive. Gemma 4 31B is free for local deployment and competitive on reasoning benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I fine-tune Gemma 4 on my API data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Fine-tune via AI Studio, Vertex AI, or tools like Hugging Face TRL. Domain-specific fine-tuning improves output for custom schemas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What hardware do I need for local Gemma 4?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
31B/26B run on a single 80GB H100 GPU (bfloat16). Quantized versions fit on 16–24GB consumer GPUs. E4B/E2B run on phones, Raspberry Pi, and Jetson.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Gemma 4 support function calling?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, all Gemma 4 models support native function calling—define tools as JSON objects, and the model calls them with structured arguments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test Gemma 4 API responses automatically?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog's Test Scenarios&lt;/a&gt; to chain requests and assertions. Run tests locally, via CLI, or in CI/CD on every commit.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Qwen3.6-Plus API: Beats Claude on Terminal Benchmarks</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:26:38 +0000</pubDate>
      <link>https://dev.to/apilover/qwen36-plus-api-beats-claude-on-terminal-benchmarks-2dc3</link>
      <guid>https://dev.to/apilover/qwen36-plus-api-beats-claude-on-terminal-benchmarks-2dc3</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus is now officially released. It scores 78.8% on SWE-bench Verified and 61.6% on Terminal-Bench 2.0, outperforming Claude Opus 4.5 on terminal tasks. Features include a 1M token context window, the new &lt;code&gt;preserve_thinking&lt;/code&gt; parameter for agent loops, and seamless integration with Claude Code, OpenClaw, and Qwen Code via an OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  From Preview to Release
&lt;/h2&gt;

&lt;p&gt;If you read our &lt;a href="http://apidog.com/blog/qwen-3-6-free-openrouter/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;previous guide on Qwen 3.6 Plus Preview on OpenRouter&lt;/a&gt;, you know what this model can deliver. The preview launched on March 30 with no waitlist and free OpenRouter access, processing over 400 million tokens in just two days.&lt;/p&gt;

&lt;p&gt;The official release brings a production-ready model available via Alibaba Cloud Model Studio. Now you get a stable API, SLA-backed uptime, and a new API parameter (&lt;code&gt;preserve_thinking&lt;/code&gt;) that improves multi-step agent workflows.&lt;/p&gt;

&lt;p&gt;This guide covers the key changes, how to use the API, and how to test your integrations with Apidog before production deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Qwen3.6-Plus Is
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qwen.ai/blog?id=qwen3.6" rel="noopener noreferrer"&gt;Qwen3.6-Plus&lt;/a&gt; is a hosted mixture-of-experts model from Alibaba's Qwen team. Like the Qwen3.5 series, it uses sparse activation for efficient compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key specs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1M token context window&lt;/li&gt;
&lt;li&gt;Mandatory chain-of-thought reasoning&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;preserve_thinking&lt;/code&gt; parameter for agentic tasks&lt;/li&gt;
&lt;li&gt;Native multimodal support (vision, video, document understanding)&lt;/li&gt;
&lt;li&gt;OpenAI-compatible API, Anthropic-compatible API, OpenAI Responses API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open-source smaller variants will be available soon for self-hosted setups.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Coding Agents
&lt;/h3&gt;

&lt;p&gt;Qwen3.6-Plus scores 78.8% on SWE-bench Verified, just behind Claude Opus 4.5, but it leads on terminal operations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56nehdw7rei6u9ghff6v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56nehdw7rei6u9ghff6v.png" alt="Terminal-Bench Comparison" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Terminal-Bench 2.0 evaluates real shell operations: file management, process control, and compute-heavy multi-step workflows. Qwen3.6-Plus scores 61.6%, beating Claude Opus 4.5 at 59.3%.&lt;/p&gt;

&lt;h3&gt;
  
  
  General Agents and Tool Use
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Claude Opus 4.5&lt;/th&gt;
&lt;th&gt;Qwen3.6-Plus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TAU3-Bench&lt;/td&gt;
&lt;td&gt;70.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepPlanning&lt;/td&gt;
&lt;td&gt;33.9%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCPMark&lt;/td&gt;
&lt;td&gt;42.3%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;48.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP-Atlas&lt;/td&gt;
&lt;td&gt;71.8%&lt;/td&gt;
&lt;td&gt;74.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WideSearch&lt;/td&gt;
&lt;td&gt;76.4%&lt;/td&gt;
&lt;td&gt;74.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCPMark tests GitHub MCP tool calls. Qwen3.6-Plus leads on key planning and tool use tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning and Knowledge
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Claude Opus 4.5&lt;/th&gt;
&lt;th&gt;Qwen3.6-Plus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPQA&lt;/td&gt;
&lt;td&gt;87.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench v6&lt;/td&gt;
&lt;td&gt;84.8%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87.1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IFEval strict&lt;/td&gt;
&lt;td&gt;90.9%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;94.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU-Pro&lt;/td&gt;
&lt;td&gt;89.5%&lt;/td&gt;
&lt;td&gt;88.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Qwen3.6-Plus leads in science reasoning and instruction-following benchmarks, key for structured agentic tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Qwen3.6-Plus&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OmniDocBench 1.5&lt;/td&gt;
&lt;td&gt;91.2%&lt;/td&gt;
&lt;td&gt;Top in table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RefCOCO avg&lt;/td&gt;
&lt;td&gt;93.5%&lt;/td&gt;
&lt;td&gt;Top in table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;We-Math&lt;/td&gt;
&lt;td&gt;89.0%&lt;/td&gt;
&lt;td&gt;Top in table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CountBench&lt;/td&gt;
&lt;td&gt;97.6%&lt;/td&gt;
&lt;td&gt;Top in table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OSWorld-Verified&lt;/td&gt;
&lt;td&gt;62.5%&lt;/td&gt;
&lt;td&gt;Behind Claude (66.3%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Qwen3.6-Plus is ahead in document understanding, spatial grounding, and object detection tasks, though Claude leads in desktop automation.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Call the API
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus is available on Alibaba Cloud Model Studio. Get your API key at &lt;a href="https://modelstudio.alibabacloud.com" rel="noopener noreferrer"&gt;modelstudio.alibabacloud.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regional Base URLs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Singapore: &lt;code&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Beijing: &lt;code&gt;https://dashscope.aliyuncs.com/compatible-mode/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;US Virginia: &lt;code&gt;https://dashscope-us.aliyuncs.com/compatible-mode/v1&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
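&lt;p&gt;Since the only per-region difference is the base URL, a small helper keeps region selection explicit. This is a minimal sketch using the URLs listed above; the region keys are informal labels of our own, not official identifiers:&lt;/p&gt;

```python
# Base URLs for Alibaba Cloud Model Studio; keys are informal region labels.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the OpenAI-compatible base URL for a Model Studio region."""
    try:
        return BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"Unknown region {region!r}; choose one of {sorted(BASE_URLS)}")

print(base_url_for("Singapore"))
```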

&lt;h3&gt;
  
  
  Basic Call With Streaming
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DASHSCOPE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.6-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this Python function and find bugs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reasoning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;span class="n"&gt;is_answering&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;is_answering&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;reasoning&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning_content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;is_answering&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;is_answering&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The &lt;code&gt;preserve_thinking&lt;/code&gt; Parameter
&lt;/h3&gt;

&lt;p&gt;The preview only retained reasoning from the current turn. The official release adds &lt;code&gt;preserve_thinking&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;preserve_thinking: true&lt;/code&gt; is set, the model keeps chain-of-thought from all prior turns—ideal for multi-step agent workflows. Disabled by default to save tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.6-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preserve_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# keep reasoning across all turns
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Qwen3.6-Plus With Claude Code
&lt;/h3&gt;

&lt;p&gt;Qwen's API supports Anthropic's protocol. Use Claude Code with Qwen3.6-Plus by setting environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_SMALL_FAST_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://dashscope-intl.aliyuncs.com/apps/anthropic
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_dashscope_api_key

claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Qwen3.6-Plus With OpenClaw
&lt;/h3&gt;

&lt;p&gt;OpenClaw is a self-hosted coding agent. Install and configure for Model Studio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install (Node.js 22+)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://molt.bot/install.sh | bash

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DASHSCOPE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key
openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; to include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alibaba-coding-plan"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://coding-intl.dashscope.aliyuncs.com/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${DASHSCOPE_API_KEY}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Qwen3.6-Plus With Qwen Code
&lt;/h3&gt;

&lt;p&gt;Qwen Code is Alibaba's open-source terminal agent. 1,000 free API calls/day with Qwen OAuth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @qwen-code/qwen-code@latest
qwen
&lt;span class="c"&gt;# Type /auth to sign in and activate free tier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why &lt;code&gt;preserve_thinking&lt;/code&gt; Changes Agent Behavior
&lt;/h2&gt;

&lt;p&gt;Typical LLM APIs reset reasoning each turn. For multi-step agent tasks, this causes context drift.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;preserve_thinking&lt;/code&gt;, the model keeps all prior reasoning visible, making decisions more consistent over complex workflows and reducing repeated reasoning (saves tokens).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example agent loop:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preserve&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.6-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preserve_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;preserve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
    &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="c1"&gt;# Multi-step code review agent
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the auth module for security issues.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Now suggest fixes for the top 3 issues you found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write tests that validate each fix.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;preserve_thinking&lt;/code&gt;, step 3 sees only the final answers from earlier turns; the reasoning that produced the issues found in step 1 is discarded.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It's Best For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repository-level bug fixing:&lt;/strong&gt; SWE-bench Verified 78.8%, SWE-bench Pro 56.6%. Strong for automated code repair/review pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal automation:&lt;/strong&gt; Top performer on Terminal-Bench 2.0; ideal for shell-heavy workflows and build pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool calling:&lt;/strong&gt; MCPMark at 48.2%, ahead of Claude Opus 4.5's 42.3%; well suited to MCP-based integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-context document/code analysis:&lt;/strong&gt; 1M token window handles codebase reviews and large documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend code generation:&lt;/strong&gt; Nearly tied with Claude Opus 4.5 for frontend tasks (QwenWebBench 1501.7 vs 1517.9).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual scenarios:&lt;/strong&gt; WMT24++ at 84.3%, MAXIFE at 88.2% across 23 languages.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Testing Qwen3.6-Plus API Calls With Apidog
&lt;/h2&gt;

&lt;p&gt;The endpoint is OpenAI-compatible. Import it into &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; and test like any other API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lln60azqwzlva81kvps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lln60azqwzlva81kvps.png" alt="Apidog Testing" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;POST&lt;/strong&gt; to &lt;code&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;API key: &lt;code&gt;Authorization: Bearer {{DASHSCOPE_API_KEY}}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
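&lt;p&gt;As a quick sanity check outside Apidog, you can assemble the same request body in plain Python. This is a minimal sketch: the endpoint URL and model name come from above, while the &lt;code&gt;build_chat_request&lt;/code&gt; helper and its prompt are illustrative.&lt;/p&gt;

```python
import json

# Endpoint from the article; the helper below only builds the JSON body,
# so it runs offline. POST it with any HTTP client once you have a key.
ENDPOINT = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_chat_request(prompt: str, preserve: bool = True) -> dict:
    """Assemble an OpenAI-compatible chat payload with thinking enabled."""
    return {
        "model": "qwen3.6-plus",
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": True,
        "preserve_thinking": preserve,
    }

body = build_chat_request("Analyze the auth module for security issues.")
print(json.dumps(body, indent=2))
```

&lt;p&gt;Note that when calling through the OpenAI SDK, the two thinking flags go in &lt;code&gt;extra_body&lt;/code&gt; as shown earlier; over raw HTTP they sit at the top level of the JSON body.&lt;/p&gt;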

&lt;p&gt;&lt;strong&gt;Sample response assertions:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Response contains choices&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;property&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;choices&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;and&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No empty reasoning when thinking enabled&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoning_content&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoning_content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Use Smart Mock in Apidog to generate test responses without hitting the live API.&lt;/li&gt;
&lt;li&gt;For multi-turn agents, create a Test Scenario chaining requests. Validate that &lt;code&gt;preserve_thinking&lt;/code&gt; carries reasoning across turns before production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog free&lt;/a&gt; to start testing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;Smaller open-source variants will be released soon, following the Qwen3.5 pattern (sparse MoE, Apache 2.0 weights).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Roadmap:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Longer-horizon, repository-level tasks (complex, multi-file problem solving)&lt;/li&gt;
&lt;li&gt;Multimodal agent development, including GUI agents and visual coding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen3.5 open-source models quickly became a default for self-hosted coding agents. Expect the same for Qwen3.6 variants.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus closes the gap with Claude Opus 4.5 on coding and leads in terminal, MCP tool use, and planning. With a 1M token context, Anthropic protocol support, and the new &lt;code&gt;preserve_thinking&lt;/code&gt; parameter, it's ready for production agentic systems.&lt;/p&gt;

&lt;p&gt;The official API brings stability, SLA coverage, and reliable agent-focused workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; simplifies testing: import the endpoint, add assertions, use mocks, and run regression tests as you update your model or API version.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between Qwen3.6-Plus and the preview?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The preview (&lt;code&gt;qwen/qwen3.6-plus-preview&lt;/code&gt;) launched on OpenRouter on March 30, 2026. The official release adds the &lt;code&gt;preserve_thinking&lt;/code&gt; parameter, SLA-backed uptime, and full Model Studio support. Smaller open-source variants are also coming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;preserve_thinking&lt;/code&gt; and when should I use it?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
By default, only current-turn reasoning is kept. Set &lt;code&gt;preserve_thinking: true&lt;/code&gt; to retain reasoning from all previous turns. Use for multi-step agent loops where past reasoning should inform next actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Qwen3.6-Plus compare to Claude Opus 4.5?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Claude leads on SWE-bench Verified (80.9% vs 78.8%) and OSWorld-Verified (66.3% vs 62.5%). Qwen3.6-Plus leads on Terminal-Bench 2.0 (61.6% vs 59.3%), MCPMark (48.2% vs 42.3%), DeepPlanning (41.5% vs 33.9%), and GPQA (90.4% vs 87.0%).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Qwen3.6-Plus with Claude Code?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Set &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; to the Dashscope Anthropic-compatible endpoint, &lt;code&gt;ANTHROPIC_MODEL&lt;/code&gt; to &lt;code&gt;qwen3.6-plus&lt;/code&gt;, and &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; to your Dashscope API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Qwen3.6-Plus open source?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The hosted API model is not open-weight. Smaller variants with public weights will be released soon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I get free access?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Install Qwen Code (&lt;code&gt;npm install -g @qwen-code/qwen-code@latest&lt;/code&gt;), run &lt;code&gt;qwen&lt;/code&gt;, then &lt;code&gt;/auth&lt;/code&gt;. Sign in with Qwen Code OAuth for 1,000 free API calls/day against Qwen3.6-Plus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What context window does it support?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
1 million tokens by default. Some benchmarks used 256K for comparison, but the API default is 1M.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test the API integration before deploying?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Import the endpoint into Apidog, add your API key as an environment variable, write response assertions, and use Smart Mock for offline development. Chain requests into a Test Scenario to validate multi-turn agent behavior before production deployment.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Holo3: The Best Computer Use Model?</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Thu, 02 Apr 2026 08:47:36 +0000</pubDate>
      <link>https://dev.to/apilover/holo3the-best-computer-use-model--aff</link>
      <guid>https://dev.to/apilover/holo3the-best-computer-use-model--aff</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;H Company launched Holo3 on March 31, 2026—a mixture-of-experts model scoring 78.85% on OSWorld-Verified, setting a new high on the leading desktop computer use benchmark. It outperforms GPT-5.4 and Opus 4.6 at a lower cost. The API is live, and the 35B variant is open-weight on HuggingFace under Apache 2.0.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The computer use gap most developers haven't solved
&lt;/h2&gt;

&lt;p&gt;You've automated your APIs and streamlined your CI/CD pipeline, but automating legacy enterprise software, old desktop apps, and multi-step workflows across several UIs remains a challenge. RPA tools (like UiPath, Automation Anywhere) usually rely on brittle screen-coordinate scripts that are prone to breaking with UI changes. Manual work has often been the fallback.&lt;/p&gt;

&lt;p&gt;Computer use AI models solve this by interpreting screenshots and issuing GUI actions—click, type, scroll—allowing automation of any GUI, regardless of API support. Holo3, released by H Company, is currently the most capable public model for these tasks.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;If you’re automating workflows or testing pipelines involving desktop software, Holo3’s API is worth integrating. Below, learn exactly how to connect Holo3 calls into your workflow using &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Holo3?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://hcompany.ai/holo3" rel="noopener noreferrer"&gt;Holo3&lt;/a&gt; is a computer use model: provide a screenshot and a task description, and it returns a set of actions (clicks, keystrokes, scrolls) to execute on that UI. Repeat the process—screenshot, task, action—until the workflow completes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3aoh5dvl0jtpi764v8qu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3aoh5dvl0jtpi764v8qu.png" alt="Holo3 UI" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variants:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Holo3-122B-A10B:&lt;/strong&gt; Flagship, 122B parameters (10B active). Hosted API only (&lt;a href="https://hcompany.ai/holo-models-api" rel="noopener noreferrer"&gt;details&lt;/a&gt;). Top performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holo3-35B-A3B:&lt;/strong&gt; 35B parameters (3B active). Open-weight on &lt;a href="https://huggingface.co/Hcompany/Holo3-35B-A3B" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt; (Apache 2.0). Free API tier and self-hostable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MoE architecture means only a subset of parameters are used per token, making Holo3 significantly cheaper to run than parameter count alone suggests. H Company claims Holo3-122B-A10B is less expensive per task than GPT-5.4 and Opus 4.6.&lt;/p&gt;

&lt;h2&gt;
  
  
  OSWorld-Verified: what the benchmark actually measures
&lt;/h2&gt;

&lt;p&gt;OSWorld-Verified is the main benchmark for AI computer use. Unlike text-only benchmarks, OSWorld evaluates execution: the agent must complete real desktop tasks, and success is determined by the post-task system state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task coverage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-app tasks (e.g., open a file, fill a form)&lt;/li&gt;
&lt;li&gt;Cross-app workflows (e.g., extract data from PDF, update spreadsheet, send email)&lt;/li&gt;
&lt;li&gt;Long-horizon, multi-app sequences requiring context retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Holo3-122B-A10B scores &lt;strong&gt;78.85%&lt;/strong&gt; on OSWorld-Verified. For reference, early computer use agents scored around 40%, and the best previous models from Anthropic and OpenAI were in the 60–65% range.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15te4auw5vh59u6lsyf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15te4auw5vh59u6lsyf9.png" alt="Benchmark comparison" width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;H Company’s internal benchmarks show Holo3 excels in multi-app workflows—tasks that require reasoning and action across several applications at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Holo3 was trained: the Agentic Learning Flywheel
&lt;/h2&gt;

&lt;p&gt;Unlike most models trained on static demos, H Company uses a continuous loop called the Agentic Learning Flywheel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Synthetic Navigation Data:&lt;/strong&gt; Human and AI-generated instructions create navigation scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-Domain Augmentation:&lt;/strong&gt; Programmatic extensions cover unexpected UI states and edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curated Reinforcement Learning:&lt;/strong&gt; Each example is filtered and used in RL to directly maximize task completion.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Training data comes from the &lt;strong&gt;Synthetic Environment Factory&lt;/strong&gt;: coding agents build full enterprise web applications from scenario specs, creating realistic, verifiable training environments.&lt;/p&gt;

&lt;p&gt;This approach enables Holo3 to outperform much larger base models on benchmark tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to call the Holo3 API
&lt;/h2&gt;

&lt;p&gt;The Holo3 API uses a standard screenshot-action loop. Here’s how to implement it:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Set up authentication
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# H Company Inference API base URL&lt;/span&gt;
https://api.hcompany.ai/v1

&lt;span class="c"&gt;# Headers&lt;/span&gt;
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your API key at &lt;a href="https://hcompany.ai/holo-models-api" rel="noopener noreferrer"&gt;hcompany.ai/holo-models-api&lt;/a&gt;. The free tier covers Holo3-35B-A3B.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Send a screenshot with a task
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyautogui&lt;/span&gt;

&lt;span class="c1"&gt;# Capture a screenshot
&lt;/span&gt;&lt;span class="n"&gt;screenshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyautogui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/screen.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/screen.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.hcompany.ai/v1/computer-use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;holo3-122b-a10b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Open the invoice folder and find the most recent PDF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screenshot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_b64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screen_width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screen_height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Parse and execute the action
&lt;/h3&gt;

&lt;p&gt;API responses are structured actions to execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"coordinate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;380&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The invoice folder icon is visible at this position"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Action types&lt;/strong&gt;: &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;double_click&lt;/code&gt;, &lt;code&gt;right_click&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;key&lt;/code&gt;, &lt;code&gt;scroll&lt;/code&gt;, &lt;code&gt;screenshot_request&lt;/code&gt; (model needs a fresh view), and &lt;code&gt;task_complete&lt;/code&gt;.&lt;/p&gt;
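&lt;p&gt;The loop in step 4 relies on an &lt;code&gt;execute_action&lt;/code&gt; helper. Here is a minimal sketch of its dispatch logic: the &lt;code&gt;action_type&lt;/code&gt; and &lt;code&gt;coordinate&lt;/code&gt; fields match the response above, but the &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;key&lt;/code&gt;, and &lt;code&gt;delta&lt;/code&gt; field names are assumptions, and the pyautogui mapping in the docstring is illustrative.&lt;/p&gt;

```python
def plan_action(action: dict) -> tuple:
    """Normalize a Holo3 action dict into a (command, args) pair.

    A real agent would dispatch these to an input library, e.g.
    pyautogui.click(x, y) for ("click", (x, y)) or
    pyautogui.write(text) for ("type", (text,)).
    """
    kind = action["action_type"]
    if kind in ("click", "double_click", "right_click"):
        x, y = action["coordinate"]
        return (kind, (x, y))
    if kind == "type":    # "text" field name is an assumption
        return (kind, (action["text"],))
    if kind == "key":     # "key" field name is an assumption
        return (kind, (action["key"],))
    if kind == "scroll":  # "delta" field name is an assumption
        return (kind, (action.get("delta", -3),))
    if kind in ("screenshot_request", "task_complete"):
        return (kind, ())
    raise ValueError(f"Unknown action_type: {kind}")

print(plan_action({"action_type": "click", "coordinate": [245, 380]}))
# → ('click', (245, 380))
```

&lt;p&gt;Separating planning from execution this way also makes the dispatch logic unit-testable without a display server.&lt;/p&gt;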

&lt;h3&gt;
  
  
  4. Loop until completion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_computer_use_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;screenshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;capture_screen&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_holo3_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Done in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="nf"&gt;execute_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task not completed within step limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing Holo3 API calls with Apidog
&lt;/h2&gt;

&lt;p&gt;To ensure robust integration, use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Import the endpoint:&lt;/strong&gt; In Apidog, create a new HTTP request to &lt;code&gt;https://api.hcompany.ai/v1/computer-use&lt;/code&gt;. Set the &lt;code&gt;Authorization&lt;/code&gt; header as an environment variable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up request validation:&lt;/strong&gt; Use Apidog's test assertions to validate response structure:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In Apidog's post-response script&lt;/span&gt;
&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Action type is valid&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validActions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;scroll&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;task_complete&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;screenshot_request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validActions&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action_type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Coordinates are within screen bounds&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coordinate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coordinate&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;within&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coordinate&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;within&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mock the API during development:&lt;/strong&gt; Use Smart Mock to simulate Holo3 responses, saving credits and enabling parallel frontend and backend development.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run test scenarios:&lt;/strong&gt; Chain multiple Holo3 requests in a Test Scenario to simulate and validate full multi-step workflows before running on live systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Holo3 vs Claude Computer Use vs OpenAI Operator
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Holo3-122B&lt;/th&gt;
&lt;th&gt;Holo3-35B&lt;/th&gt;
&lt;th&gt;Claude Computer Use&lt;/th&gt;
&lt;th&gt;OpenAI Operator&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OSWorld-Verified&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;78.85%&lt;/td&gt;
&lt;td&gt;~55% (est.)&lt;/td&gt;
&lt;td&gt;~65%&lt;/td&gt;
&lt;td&gt;~62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (free tier)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open weights&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-hostable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost vs GPT-5.4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;Much lower&lt;/td&gt;
&lt;td&gt;Comparable&lt;/td&gt;
&lt;td&gt;GPT-5.4 pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production enterprise&lt;/td&gt;
&lt;td&gt;Dev/testing/OSS&lt;/td&gt;
&lt;td&gt;Anthropic ecosystem&lt;/td&gt;
&lt;td&gt;OpenAI ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose based on your stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Holo3-122B:&lt;/strong&gt; For maximum accuracy on complex workflows; cost is secondary to reliability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holo3-35B:&lt;/strong&gt; For development, testing, open source, or if you want to self-host.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Computer Use:&lt;/strong&gt; If you’re already in the Anthropic ecosystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Operator:&lt;/strong&gt; If you’re using GPT-5.4 and want single-vendor integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Enterprise use cases
&lt;/h2&gt;

&lt;p&gt;Holo3 enables automation for workflows with no clean API-based solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legacy system data entry:&lt;/strong&gt; Automate data entry/extraction in old ERP and CRM systems without APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform reconciliation:&lt;/strong&gt; Pull data from PDFs, check against spreadsheets, update dashboards—end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web app regression testing:&lt;/strong&gt; Use Holo3 for plain-language task automation, avoiding brittle Selenium selectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive intelligence:&lt;/strong&gt; Browse and extract structured data from sites that block typical scraping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Holo3 performs well across e-commerce, business software, collaboration tools, and especially multi-app workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next: Adaptive Agency
&lt;/h2&gt;

&lt;p&gt;H Company is developing &lt;strong&gt;Adaptive Agency&lt;/strong&gt;—models that can navigate and learn new, bespoke enterprise software in real time, beyond what they've seen in training. The goal is on-the-fly reasoning about unfamiliar software structures and workflows.&lt;/p&gt;

&lt;p&gt;If delivered, this will eliminate the last major gap in computer use AI for enterprise deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Holo3 sets a new standard for desktop automation with 78.85% on OSWorld-Verified, outperforming Claude and GPT-based models on complex tasks. The free tier and open weights for Holo3-35B-A3B make it easy for developers to start.&lt;/p&gt;

&lt;p&gt;The integration workflow is simple: screenshot, POST to API, execute action, repeat. &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; streamlines this process—validating responses, mocking APIs during development, and running test scenarios before production.&lt;/p&gt;

&lt;p&gt;If you’re building desktop GUI automation, start with Apidog and verify your Holo3 integration before deploying to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is Holo3?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Holo3 is a computer use AI model from H Company that takes screenshots as input and returns actions (clicks, keystrokes, scrolls) to complete tasks on a desktop or browser. It scores 78.85% on the OSWorld-Verified benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Holo3 open source?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Holo3-35B-A3B is open-weight under Apache 2.0 on HuggingFace. Holo3-122B-A10B is API-only. Both are available through H Company's inference API; 35B has a free tier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the OSWorld benchmark work?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
OSWorld tests AI agents on real computer tasks—web navigation, file management, cross-app workflows. Success is verified by checking the post-task system state. Tasks range from single-app to complex multi-app sequences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Holo3 compare to Claude Computer Use?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Holo3-122B scores higher on OSWorld-Verified (78.85% vs ~65% for Claude) and is cheaper per task. Claude remains a solid choice if you’re already using Anthropic APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I run Holo3 locally?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, with Holo3-35B-A3B (Apache 2.0, HuggingFace). The 122B model is API-only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are main use cases for computer use APIs?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Legacy system automation (no REST API), cross-app workflows, web app regression testing without brittle selectors, competitive intelligence scraping, and any desktop workflow that otherwise requires manual intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test my Holo3 API integration?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to import endpoints, set up response validation, mock the API, and chain requests into test scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is "Adaptive Agency" in Holo3's roadmap?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
H Company is developing models that can navigate and reason about software they’ve never encountered, learning UI structure in real time—removing the final barrier for enterprise-scale computer use AI.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>axios@1.14.1 Supply Chain Attack: What to Do Now</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Thu, 02 Apr 2026 08:45:00 +0000</pubDate>
      <link>https://dev.to/apilover/axios1141-supply-chain-attack-what-to-do-now-4f1l</link>
      <guid>https://dev.to/apilover/axios1141-supply-chain-attack-what-to-do-now-4f1l</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;On March 30–31, 2026, &lt;code&gt;axios&lt;/code&gt; versions 1.14.1 and 0.30.4 were compromised on npm via a malicious dependency that deployed a remote access trojan (RAT) to affected systems. Both versions are now unpublished. The last known safe version is 1.14.0. If you installed &lt;code&gt;axios@1.14.1&lt;/code&gt; or &lt;code&gt;0.30.4&lt;/code&gt;, treat the machine as compromised and rotate all credentials immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What axios is and why this matters
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;axios&lt;/code&gt; has over 100 million weekly downloads on npm. It’s the HTTP client behind countless frontend frameworks, backend Node.js services, and enterprise apps. When a core package like this is compromised, the impact is massive—developers running &lt;code&gt;npm install&lt;/code&gt; in a narrow window on March 30–31 unknowingly pulled malware.&lt;/p&gt;

&lt;p&gt;This was not a theoretical supply chain risk. The attack was confirmed and delivered a multi-stage RAT capable of executing arbitrary commands, stealing system data, and persisting on infected machines.&lt;/p&gt;

&lt;p&gt;If your team uses &lt;code&gt;axios&lt;/code&gt;, and you leverage &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to design and test your HTTP integrations, you need to act before your next deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timeline of the attack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;March 30, 2026 — 23:59:12 UTC:&lt;/strong&gt; A malicious package &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; was published to npm by an account tied to &lt;code&gt;nrwise@proton.me&lt;/code&gt;. A clean &lt;code&gt;4.2.0&lt;/code&gt; had been published 18 hours earlier to give the &lt;code&gt;crypto-js&lt;/code&gt; typosquat a plausible history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;March 31, 2026 — 00:05:41 UTC:&lt;/strong&gt; Socket’s automated detection flagged &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; as malicious six minutes after publishing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;March 31, 2026 — shortly after midnight:&lt;/strong&gt; &lt;code&gt;axios@1.14.1&lt;/code&gt; was published, depending on the malicious &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt;. This release was not in the official GitHub tags (latest legit tag: &lt;code&gt;v1.14.0&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;March 31, 2026 — morning:&lt;/strong&gt; A GitHub issue (#10604) reported both &lt;code&gt;axios@1.14.1&lt;/code&gt; and &lt;code&gt;0.30.4&lt;/code&gt; as compromised. Maintainers couldn’t immediately revoke access; the attacker had higher npm permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;March 31, 2026:&lt;/strong&gt; Both compromised versions were unpublished. Maintainers revoked tokens, tightened publish controls, and investigated how a long-lived npm token was abused.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How the attack worked
&lt;/h2&gt;

&lt;p&gt;The attacker exploited a long-lived npm token used in axios’s publishing workflow, likely after compromising a maintainer’s credentials. This allowed them to publish a malicious version outside the normal release process.&lt;/p&gt;

&lt;p&gt;Key steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The new version added &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; as a dependency, camouflaging it as a legitimate utility.&lt;/li&gt;
&lt;li&gt;The earlier clean version (&lt;code&gt;4.2.0&lt;/code&gt;) established benign history to avoid suspicion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payload analysis:&lt;/strong&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stage 1:&lt;/strong&gt; Malicious code executed at install time via npm lifecycle scripts, dropping a secondary payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 2:&lt;/strong&gt; The secondary payload deployed a persistent RAT.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stage 3:&lt;/strong&gt; The RAT enabled arbitrary shell command execution, exfiltrated environment variables and secrets, and sent system data to a remote server.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
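&lt;p&gt;Stage 1 exploits a standard npm mechanism: lifecycle scripts run automatically at install time. A hypothetical manifest (illustrative only, not the actual malicious package) could abuse &lt;code&gt;postinstall&lt;/code&gt; like this:&lt;/p&gt;

```json
{
  "name": "plain-crypto-js",
  "version": "4.2.1",
  "scripts": {
    "postinstall": "node ./setup.js"
  }
}
```

&lt;p&gt;Installing with &lt;code&gt;npm install --ignore-scripts&lt;/code&gt; disables this execution path, at the cost of breaking packages that legitimately rely on install scripts.&lt;/p&gt;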

&lt;p&gt;The RAT persists across reboots. Simply removing the npm package does NOT remove the RAT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Am I affected?
&lt;/h2&gt;

&lt;p&gt;You may be affected if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You ran &lt;code&gt;npm install axios&lt;/code&gt; or &lt;code&gt;npm install&lt;/code&gt; (with axios in &lt;code&gt;package.json&lt;/code&gt;) between &lt;strong&gt;23:59 UTC on March 30&lt;/strong&gt; and &lt;strong&gt;midday UTC on March 31, 2026&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Your &lt;code&gt;node_modules/axios/package.json&lt;/code&gt; shows version &lt;code&gt;1.14.1&lt;/code&gt; or &lt;code&gt;0.30.4&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Your &lt;code&gt;package-lock.json&lt;/code&gt; or &lt;code&gt;yarn.lock&lt;/code&gt; resolves axios to &lt;code&gt;1.14.1&lt;/code&gt; or &lt;code&gt;0.30.4&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Check your environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check installed version&lt;/span&gt;
npm list axios

&lt;span class="c"&gt;# Check lock file&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'"axios"'&lt;/span&gt; package-lock.json | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;

&lt;span class="c"&gt;# Check for plain-crypto-js presence&lt;/span&gt;
npm list plain-crypto-js
&lt;span class="nb"&gt;ls &lt;/span&gt;node_modules/plain-crypto-js 2&amp;gt;/dev/null &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"INFECTED"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Not found"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;plain-crypto-js&lt;/code&gt; exists in &lt;code&gt;node_modules&lt;/code&gt;, you ran the malicious version.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do right now
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Update axios immediately
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;axios@1.14.0
&lt;span class="c"&gt;# Or pin to latest safe&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;axios@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm list axios
&lt;span class="c"&gt;# Should show 1.14.0 or higher (once new clean versions are published)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. If you installed the compromised version
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Treat the machine as compromised&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rotate all secrets:&lt;/strong&gt; API keys, DB credentials, SSH keys, cloud provider tokens, &lt;code&gt;.env&lt;/code&gt; variables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check environment variables:&lt;/strong&gt; RAT targets secrets in process env and filesystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit network connections:&lt;/strong&gt; Review outbound traffic during the affected period for unknown IPs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scan for persistence:&lt;/strong&gt; Inspect cron jobs, startup scripts, and systemd services created around the compromise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-image the machine:&lt;/strong&gt; On CI runners or production servers, reinstall the OS from a clean image. On developer laptops, wipe and reinstall where feasible; at minimum, rotate all credentials before trusting the machine again.&lt;/li&gt;
&lt;/ul&gt;
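&lt;p&gt;The persistence checks above can be sketched as a quick sweep on a Linux host. This is illustrative only: the paths are common defaults and vary by distro, and a machine you suspect is compromised really calls for proper forensic tooling, not a grep:&lt;/p&gt;

```shell
# Minimal persistence sweep (assumes GNU find on Linux; paths vary by distro).
# Lists files in common persistence locations modified since the compromise window.
for dir in /etc/cron.d /etc/cron.daily /etc/systemd/system "$HOME/.config/autostart"; do
    if [ -d "$dir" ]; then
        find "$dir" -type f -newermt "2026-03-30" 2>/dev/null || true
    fi
done
crontab -l 2>/dev/null || echo "no user crontab"
echo "sweep complete"
```

&lt;p&gt;Anything listed that you did not create deserves scrutiny; when in doubt, prefer re-imaging over manual cleanup.&lt;/p&gt;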

&lt;h3&gt;
  
  
  3. Audit your CI/CD pipelines
&lt;/h3&gt;

&lt;p&gt;If your build pipeline ran &lt;code&gt;npm install&lt;/code&gt; during the window, the CI environment may be compromised.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check build logs for affected timeframe&lt;/span&gt;
&lt;span class="c"&gt;# Look for axios@1.14.1 in install output&lt;/span&gt;

&lt;span class="c"&gt;# Verify current CI node_modules are clean&lt;/span&gt;
npm list axios plain-crypto-js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rotate any secrets available to CI: deployment keys, cloud credentials, registry tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Verify your lock file
&lt;/h3&gt;

&lt;p&gt;Lock files (&lt;code&gt;package-lock.json&lt;/code&gt;, &lt;code&gt;yarn.lock&lt;/code&gt;) should pin safe versions. If you find &lt;code&gt;1.14.1&lt;/code&gt;, regenerate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove and regenerate&lt;/span&gt;
&lt;span class="nb"&gt;rm &lt;/span&gt;package-lock.json
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that the new lock file resolves axios to a known safe version before committing.&lt;/p&gt;
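&lt;p&gt;The check can be scripted. The sketch below scans a lock file for the two compromised versions; it writes a small sample fragment for illustration, so point &lt;code&gt;LOCK&lt;/code&gt; at your real &lt;code&gt;package-lock.json&lt;/code&gt; instead:&lt;/p&gt;

```shell
# Scan a lock file for the compromised axios versions (1.14.1, 0.30.4).
LOCK=sample-lock.json
# Sample fragment standing in for a real package-lock.json:
printf '%s\n' \
    '{' \
    '  "packages": {' \
    '    "node_modules/axios": { "version": "1.14.0" }' \
    '  }' \
    '}' > "$LOCK"

if grep -qE '"(1\.14\.1|0\.30\.4)"' "$LOCK"; then
    echo "UNSAFE: compromised axios version pinned"
else
    echo "OK: no compromised axios version found"
fi
rm -f "$LOCK"
```

&lt;p&gt;A check like this is cheap enough to run as a CI step alongside &lt;code&gt;npm audit&lt;/code&gt;.&lt;/p&gt;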

&lt;h2&gt;
  
  
  Using Apidog to audit your axios API calls
&lt;/h2&gt;

&lt;p&gt;If you use axios as your HTTP client, &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; can help verify that your integration still works as expected after updating dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update to &lt;code&gt;axios@1.14.0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Import your existing API endpoints into Apidog.&lt;/li&gt;
&lt;li&gt;Run regression checks to ensure no request/response behavior has changed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, use Apidog’s response assertions to detect unexpected fields or headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Apidog post-response assertion&lt;/span&gt;
&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Response is clean — no injected fields&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;property&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;__injected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Injected-Header&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running your test suite against the updated axios version in Apidog helps you establish a clean baseline before deploying.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Try Apidog free&lt;/a&gt; to set up HTTP client regression tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why supply chain attacks on npm are hard to stop
&lt;/h2&gt;

&lt;p&gt;The axios incident is part of a wider pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;event-stream (2018):&lt;/strong&gt; Malicious payload targeting bitcoin wallets (8M downloads/week).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ua-parser-js (2021):&lt;/strong&gt; Compromised to drop a cryptominer and password stealer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;node-ipc (2022):&lt;/strong&gt; Maintainer added destructive code for specific geolocations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;xz utils (2024):&lt;/strong&gt; Social engineering led to a backdoor in a core Linux utility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;axios (2026):&lt;/strong&gt; Maintainer credentials compromised, RAT published via dependency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Core problem:&lt;/strong&gt; Trust is placed in publishing accounts, not just code. If a maintainer’s credentials are compromised, so is the package.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigation strategies that help:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Measure&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lock files (&lt;code&gt;package-lock.json&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Pin exact versions, prevent silent upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;npm audit&lt;/code&gt; in CI&lt;/td&gt;
&lt;td&gt;Flag known vulnerabilities before deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Socket.dev / Snyk&lt;/td&gt;
&lt;td&gt;Behavioral analysis—flag suspicious packages early&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-factor auth on npm&lt;/td&gt;
&lt;td&gt;Makes credential compromise harder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Publish with short-lived tokens&lt;/td&gt;
&lt;td&gt;Limit exposure if a token leaks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Review lock files in PRs&lt;/td&gt;
&lt;td&gt;Catch dependency changes in code review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The axios team is moving to tighter publish controls, but broader ecosystem changes are needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Indicators of Compromise (IOCs)
&lt;/h2&gt;

&lt;p&gt;From Socket’s analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Malicious packages:&lt;/strong&gt; &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt;, &lt;code&gt;axios@1.14.1&lt;/code&gt;, &lt;code&gt;axios@0.30.4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publisher email:&lt;/strong&gt; &lt;code&gt;nrwise@proton.me&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behaviors:&lt;/strong&gt; Network connections at install time, RAT persistence, environment variable exfiltration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safe axios versions:&lt;/strong&gt; 1.14.0 and below (except 0.30.4), 1.13.x, 1.12.x&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you suspect infection, report to npm security: &lt;code&gt;security@npmjs.com&lt;/code&gt; and preserve relevant logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;axios&lt;/code&gt; 1.14.1 incident underscores that dependency security is an ongoing process, not a one-off audit. Pin your versions, use behavioral analysis tools like Socket in CI, rotate secrets if anything looks suspicious, and always review lock file changes in code review.&lt;/p&gt;

&lt;p&gt;If you need to re-validate your API integration after an axios update, &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; gives you the assertion, mocking, and regression testing tools to verify HTTP client behavior before you ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Which axios versions are compromised?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;axios@1.14.1&lt;/code&gt; and &lt;code&gt;axios@0.30.4&lt;/code&gt;. Both are unpublished. Use &lt;code&gt;1.14.0&lt;/code&gt; or any in the 1.13.x, 1.12.x lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does the malicious axios payload do?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
It brings in &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt;, which delivers a multi-stage payload, including a RAT that can execute remote commands, exfiltrate secrets, and persist across reboots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I know if I installed the compromised version?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Run &lt;code&gt;npm list axios&lt;/code&gt;—if it shows 1.14.1 or 0.30.4, you’re affected. Also run &lt;code&gt;npm list plain-crypto-js&lt;/code&gt;—if present, the malicious code ran.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it enough to just update axios?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. While updating removes the malicious dependency, the RAT may already be installed. Rotate all secrets and audit for persistence mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did the attacker publish to npm?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
They likely compromised a maintainer’s credentials and used a long-lived npm token with publish access. The axios team is tightening publish controls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is this different from a regular vulnerability?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A vulnerability is a flaw in existing code. A supply chain attack injects malicious code through a trusted publish channel. The compromised code was not in axios’s GitHub—it was injected at publish time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can I protect my projects from future supply chain attacks?&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use lock files&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npm audit&lt;/code&gt; in CI&lt;/li&gt;
&lt;li&gt;Add behavioral analysis (e.g., Socket.dev)&lt;/li&gt;
&lt;li&gt;Enable 2FA on npm accounts&lt;/li&gt;
&lt;li&gt;Use short-lived publish tokens&lt;/li&gt;
&lt;li&gt;Audit lock file diffs in code review&lt;/li&gt;
&lt;/ul&gt;
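
&lt;p&gt;Several of these steps can be enforced in &lt;code&gt;package.json&lt;/code&gt; itself. For example, npm (8.3+) supports an &lt;code&gt;overrides&lt;/code&gt; field that pins a dependency everywhere in the tree, including transitive copies. A minimal sketch, pinning axios to the safe release named above:&lt;/p&gt;

```json
{
  "overrides": {
    "axios": "1.14.0"
  }
}
```

&lt;p&gt;Yarn's &lt;code&gt;resolutions&lt;/code&gt; field plays a similar role. A pin like this still needs review whenever you bump it, which is exactly where lock file diffs in code review earn their keep.&lt;/p&gt;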

</description>
      <category>javascript</category>
      <category>news</category>
      <category>npm</category>
      <category>security</category>
    </item>
    <item>
      <title>Best AI Coding Agent in 2026? Claude Code vs OpenClaw</title>
      <dc:creator>Wanda</dc:creator>
      <pubDate>Thu, 02 Apr 2026 08:14:05 +0000</pubDate>
      <link>https://dev.to/apilover/best-ai-coding-agent-in-2026-claude-code-vs-openclaw-5eip</link>
      <guid>https://dev.to/apilover/best-ai-coding-agent-in-2026-claude-code-vs-openclaw-5eip</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR / Quick Answer
&lt;/h2&gt;

&lt;p&gt;Claude Code is your go-to for focused software engineering in the terminal or IDE: code edits, repo-aware reasoning, review automation, and controlled coding loops. OpenClaw is better for agent operations at scale: multi-channel messaging, multi-provider routing, plugins, and gateway-level automation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For API teams, don't frame it as just "Claude Code vs OpenClaw." Use one for coding/orchestration, then run the full API lifecycle—design, testing, debugging, mocking, documentation—with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most "Claude Code vs OpenClaw" comparisons stop at surface-level differences. For real engineering decisions, you need practical, implementation-focused insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product architecture &amp;amp; scope&lt;/li&gt;
&lt;li&gt;CLI and automation surface&lt;/li&gt;
&lt;li&gt;Permissions, approvals, sandboxing&lt;/li&gt;
&lt;li&gt;Memory/context models&lt;/li&gt;
&lt;li&gt;Integration and channel coverage&lt;/li&gt;
&lt;li&gt;Multi-agent and operational controls&lt;/li&gt;
&lt;li&gt;Real-world community use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You also need to know how &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; fits in when your coding agent and API lifecycle tool are different products. If you build APIs with coding agents only, you still need structured tooling for schema-first design, regression testing, mocks, and documentation—Apidog gives you that workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Main Section 1: Core Product Difference
&lt;/h2&gt;

&lt;p&gt;Claude Code and OpenClaw overlap in capability but serve different core use cases.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Coding-centered agent. Focuses on codebase understanding, file edits, command execution, IDE integration, hooks, sessions, and CI workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Agent gateway platform (coding included). Emphasizes command breadth, model/provider flexibility, channel connectors, plugins, multi-agent routing, and operator controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What This Means in Daily Work
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Optimizes the developer loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Optimizes the agent platform loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team works mainly in code repos and PRs → Claude Code is a better fit.&lt;/p&gt;

&lt;p&gt;If you need agents operating across chat channels, multiple providers, and gateway controls → OpenClaw is better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fast Positioning Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary orientation&lt;/td&gt;
&lt;td&gt;Coding agent&lt;/td&gt;
&lt;td&gt;Agent platform + gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main value&lt;/td&gt;
&lt;td&gt;Developer workflow quality&lt;/td&gt;
&lt;td&gt;Integration/orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical interface priority&lt;/td&gt;
&lt;td&gt;Terminal + IDE&lt;/td&gt;
&lt;td&gt;CLI + channels + plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best early adopter&lt;/td&gt;
&lt;td&gt;Backend/platform devs&lt;/td&gt;
&lt;td&gt;Automation-heavy operator teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API lifecycle coverage&lt;/td&gt;
&lt;td&gt;Partial (coding)&lt;/td&gt;
&lt;td&gt;Partial (automation)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Main Section 2: Full Feature-by-Feature Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) CLI and Command Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Coding-focused CLI, interactive/non-interactive, sessions, system prompt flags, model settings, worktree flows, tool restriction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Wider CLI for agents, models, memory, approvals, sandbox, browser, cron, webhooks, channels, plugins, secrets, security.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Actionable takeaway&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Code CLI for tight coding loops.
&lt;/li&gt;
&lt;li&gt;Use OpenClaw CLI for full platform operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) IDE Integration and Coding UX
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Deep IDE integration (VS Code extension, inline diffs, diagnostics, selection context).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Supports coding, but emphasizes cross-surface/cross-channel capability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: If IDE comfort is key, start with Claude Code.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Multi-Agent and Delegation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Subagents/agent teams for software.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Strong multi-agent routing, workspaces, per-agent session/policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use OpenClaw&lt;/strong&gt; when you need explicit ops partitioning.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Memory and Long-Term Context
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: &lt;code&gt;CLAUDE.md&lt;/code&gt; for instructions; memory loads automatically and is scoped to the project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Semantic search + explicit CLI memory indexing/search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: For project-embedded memory, use Claude Code. For explicit ops memory, use OpenClaw.&lt;/p&gt;
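
&lt;p&gt;For example, a project-level &lt;code&gt;CLAUDE.md&lt;/code&gt; might look like the sketch below; the commands and conventions are illustrative, not prescribed by either tool:&lt;/p&gt;

```markdown
# Project notes for Claude Code

## Build and test
- npm run build
- npm test

## Conventions
- TypeScript strict mode; no default exports
- Every new endpoint gets a regression scenario before merge
```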

&lt;h3&gt;
  
  
  5) Security Controls: Permissions, Approvals, Sandboxing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Permissions, hooks, settings for tool access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Detailed security docs—deployment, trust boundaries, approval policies, gateway hardening.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code for coding governance.
&lt;/li&gt;
&lt;li&gt;OpenClaw for gateway/multi-channel security.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6) Hooks and Deterministic Guardrails
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Hooks for deterministic tool events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Hooks/event automation via gateway, plugins, ops commands.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implement hooks in Claude Code&lt;/strong&gt; for code standards; use OpenClaw for larger choreography.&lt;/p&gt;
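
&lt;p&gt;As a concrete shape, Claude Code reads hook definitions from its settings file (e.g. &lt;code&gt;.claude/settings.json&lt;/code&gt;). The sketch below runs a guard script before any Bash tool call; the script path is hypothetical, and field names should be checked against the current hooks documentation:&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/block-unsafe-commands.sh"
          }
        ]
      }
    ]
  }
}
```

&lt;p&gt;Depending on how the guard script exits, the tool call can be blocked outright, which makes hooks a deterministic complement to prompt-level instructions.&lt;/p&gt;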

&lt;h3&gt;
  
  
  7) Model Provider Flexibility
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Claude-first, with third-party infrastructure options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Multi-provider, documented catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code for Claude-based standardization.&lt;/li&gt;
&lt;li&gt;OpenClaw when you need to mix providers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8) Channel and Messaging Integrations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Messaging channels are not a focus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Telegram, Slack, Discord, WhatsApp, Signal, Google Chat, Teams, IRC, Mattermost, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If channels are central, pick OpenClaw.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  9) Plugins and Extensibility
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: MCP, commands, hooks for dev workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Plugin lifecycle tooling (&lt;code&gt;list&lt;/code&gt;, &lt;code&gt;install&lt;/code&gt;, &lt;code&gt;enable&lt;/code&gt;, &lt;code&gt;disable&lt;/code&gt;, &lt;code&gt;doctor&lt;/code&gt;), marketplace patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dev workflow extensibility&lt;/strong&gt; → Claude Code. &lt;strong&gt;Platform extensibility&lt;/strong&gt; → OpenClaw.&lt;/p&gt;

&lt;h3&gt;
  
  
  10) Operational Overhead
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Faster onboarding for software teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: More flexible, but needs stronger ops discipline (gateway, runbook, security).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go with Claude Code for fast coding setup.
&lt;/li&gt;
&lt;li&gt;Scale with OpenClaw if you need orchestration and can invest in ops.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Main Section 3: Community Use Cases (Field Signals)
&lt;/h2&gt;

&lt;p&gt;Real-world usage highlights where each tool excels and where it falls short.&lt;/p&gt;

&lt;h3&gt;
  
  
  Community Use Case A: Local Machine Access Scope
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson&lt;/strong&gt;: Restrict scope on local execution. Prefer constrained directories/tasks over broad machine-level prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: Always define explicit instruction scope and permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Use Case B: Session-Limit Pressure &amp;amp; Scheduling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson&lt;/strong&gt;: Plan for session limits in Claude Code-heavy teams.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: Batch jobs, schedule off-peak, segment work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Use Case C: OpenClaw + Telegram Local Deployment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson&lt;/strong&gt;: OpenClaw works for remote, channel-driven workflows after security hardening.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: Harden deployment before going live on chat channels.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Use Case D: OpenClaw as Orchestration Layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson&lt;/strong&gt;: OpenClaw can act as the control plane, with Claude Code as the coding worker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: Use OpenClaw for pipeline orchestration, Claude Code for implementation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Use Case E: Channel-First Automation Experiments
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson&lt;/strong&gt;: OpenClaw enables rapid channel-based/cross-system automation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: Use OpenClaw for hackathon or experimental channel-native projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code = best for engineering loops in repos/IDE
&lt;/li&gt;
&lt;li&gt;OpenClaw = best for orchestration across channels/agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Main Section 4: Onboarding Price and Time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Onboarding Price Snapshot (March 27, 2026)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base product access&lt;/td&gt;
&lt;td&gt;Anthropic plans (Pro $20/mo, Max $100+/mo) or API pay-as-you-go&lt;/td&gt;
&lt;td&gt;Open-source MIT, no license fee&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct seat/license cost&lt;/td&gt;
&lt;td&gt;Non-zero (subscription)&lt;/td&gt;
&lt;td&gt;$0 software license&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usage cost driver&lt;/td&gt;
&lt;td&gt;Claude usage limits or API&lt;/td&gt;
&lt;td&gt;Model provider spend + infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget planning style&lt;/td&gt;
&lt;td&gt;Seat/subscription or token&lt;/td&gt;
&lt;td&gt;Infra + provider-token&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Onboarding Time Snapshot
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First install&lt;/td&gt;
&lt;td&gt;Short (Node + CLI)&lt;/td&gt;
&lt;td&gt;Short (installer + &lt;code&gt;openclaw onboard&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-to-first-use&lt;/td&gt;
&lt;td&gt;Fast (terminal/IDE)&lt;/td&gt;
&lt;td&gt;Fast (dashboard/chat); more time for channels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-to-prod governance&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium-high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Biggest setup risk&lt;/td&gt;
&lt;td&gt;Policy/permission drift&lt;/td&gt;
&lt;td&gt;Gateway/channel misconfig&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Practical Cost-Time Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code: Predictable entry cost if you're already on Anthropic.&lt;/li&gt;
&lt;li&gt;OpenClaw: $0 license, but operational cost depends on provider/infra.&lt;/li&gt;
&lt;li&gt;Claude Code: Faster onboarding for coding-only.&lt;/li&gt;
&lt;li&gt;OpenClaw: Fast for dashboard/local; more complex with channels/security.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Main Section 5: Where Apidog Fits (Non-Negotiable for API Teams)
&lt;/h2&gt;

&lt;p&gt;Neither Claude Code nor OpenClaw replaces end-to-end API lifecycle management.&lt;/p&gt;

&lt;p&gt;If you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API contract source of truth&lt;/li&gt;
&lt;li&gt;Regression-grade endpoint tests&lt;/li&gt;
&lt;li&gt;Mock environment parity&lt;/li&gt;
&lt;li&gt;Production-ready docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommended Architecture
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Use Claude Code or OpenClaw for implementation/refactor.&lt;/li&gt;
&lt;li&gt;Store API definitions and schema-first workflow in Apidog.&lt;/li&gt;
&lt;li&gt;Run regression/assertion scenarios in Apidog.&lt;/li&gt;
&lt;li&gt;Publish/maintain docs from Apidog.&lt;/li&gt;
&lt;li&gt;Use Apidog mocks/environments for frontend/QA stability.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Example: Agent + Apidog Validation Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate/refine service code using your agent&lt;/span&gt;
npm run dev

&lt;span class="c"&gt;# Then in Apidog:&lt;/span&gt;
&lt;span class="c"&gt;# 1) Import OpenAPI or collection&lt;/span&gt;
&lt;span class="c"&gt;# 2) Configure environments/auth vars&lt;/span&gt;
&lt;span class="c"&gt;# 3) Create scenario assertions for success/failure&lt;/span&gt;
&lt;span class="c"&gt;# 4) Save as regression suite&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example Payload for Regression Scenario
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/v1/invoices"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"customerId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_1001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1499&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"customerId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_1001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1499&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Agent speed + Apidog validation = fewer regressions.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Main Section 6: Decision Framework by Team Profile
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pick Claude Code first when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your bottleneck is developer speed in codebases.&lt;/li&gt;
&lt;li&gt;Team works in terminal/IDE all day.&lt;/li&gt;
&lt;li&gt;You need coding-specific UX and policy hooks.&lt;/li&gt;
&lt;li&gt;Multi-channel agent ops are not core.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pick OpenClaw first when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need assistants on chat channels/ops surfaces.&lt;/li&gt;
&lt;li&gt;Multi-provider flexibility is required.&lt;/li&gt;
&lt;li&gt;Explicit gateway/orchestration control is needed.&lt;/li&gt;
&lt;li&gt;You're ready for more operational complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use both when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw for orchestration/control plane.&lt;/li&gt;
&lt;li&gt;Claude Code as coding specialist.&lt;/li&gt;
&lt;li&gt;Clear governance boundaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Always pair with Apidog when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your product depends on APIs.&lt;/li&gt;
&lt;li&gt;You need contract confidence, regression safety, docs quality.&lt;/li&gt;
&lt;li&gt;Backend, QA, frontend, docs need to share one API workspace.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Main Section 7: 30-Day Pilot Plan (Recommended)
&lt;/h2&gt;

&lt;p&gt;Don't choose by opinion—test by rollout.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track: PR cycle time, escaped API defects, regression pass rate, policy incidents.&lt;/li&gt;
&lt;li&gt;Pick: One CRUD-heavy API + one integration-heavy API.&lt;/li&gt;
&lt;li&gt;Run: Add endpoint, refactor module, fix production bug, add regression tests.&lt;/li&gt;
&lt;li&gt;Measure: Setup time, policy tuning, incident resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define metrics before testing.&lt;/li&gt;
&lt;li&gt;Select two representative services.&lt;/li&gt;
&lt;li&gt;Run identical tasks on both setups.&lt;/li&gt;
&lt;li&gt;Keep API checks in Apidog constant.&lt;/li&gt;
&lt;li&gt;Compare operational cost.&lt;/li&gt;
&lt;li&gt;Review findings with engineering/security.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;This gives you a defensible, measured decision.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Main Section 8: Implementation Playbooks by Team Type
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Playbook A: Startup API Team (5-12 engineers)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use one coding agent for first 60 days.&lt;/li&gt;
&lt;li&gt;Standardize code-review/command-safety policy.&lt;/li&gt;
&lt;li&gt;Keep all API contract/regression work in Apidog.&lt;/li&gt;
&lt;li&gt;Weekly metric review: lead time, rollback count, API test pass rate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Avoids tool sprawl, keeps API quality stable as prompts evolve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Playbook B: Mid-Size Multi-Product Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code for repo-heavy squads.&lt;/li&gt;
&lt;li&gt;OpenClaw for channel-driven ops squads.&lt;/li&gt;
&lt;li&gt;Shared Apidog workspace taxonomy.&lt;/li&gt;
&lt;li&gt;Each team publishes endpoint change notes with Apidog test evidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Teams get correct tools; Apidog is the quality layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Playbook C: Platform or DevEx Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenClaw for cross-channel/system orchestration.&lt;/li&gt;
&lt;li&gt;Keep Claude Code for deep codebase/refactor tasks.&lt;/li&gt;
&lt;li&gt;Define trust boundaries, approval rules before rollout.&lt;/li&gt;
&lt;li&gt;Use Apidog for API behavior checks pre-deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Separates orchestration/coding depth; reduces cross-team incidents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Claude Code and OpenClaw are both powerful, but in different domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: Best for pure coding execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt;: Best for orchestration and channel integration.&lt;/li&gt;
&lt;li&gt;Community usage confirms this split.&lt;/li&gt;
&lt;li&gt;For reliable API delivery, pair both with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To maximize API velocity:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Select your coding/orchestration layer for your workflow, then standardize API lifecycle quality in Apidog.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is this a direct one-to-one comparison?
&lt;/h3&gt;

&lt;p&gt;Not exactly. There's overlap, but the focus is different: Claude Code is coding-centric, OpenClaw is orchestration-centric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can OpenClaw replace Claude Code completely?
&lt;/h3&gt;

&lt;p&gt;It depends on how much coding depth you need. OpenClaw handles broad automation well, but Claude Code is stronger for day-to-day coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code replace OpenClaw for channel-driven workflows?
&lt;/h3&gt;

&lt;p&gt;No—if channel operations are central, OpenClaw is the natural fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why include community signals?
&lt;/h3&gt;

&lt;p&gt;They reveal real-world scope, failure modes, and onboarding friction sooner than formal case studies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Apidog overlap with either tool?
&lt;/h3&gt;

&lt;p&gt;No—Apidog complements both. It handles API lifecycle control and collaboration, not code generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the safest way to start?
&lt;/h3&gt;

&lt;p&gt;Start narrow: constrained scopes, explicit approvals, auditable test flows, and Apidog-based API validation before scaling automation.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
