DEV Community

Luca Ostermann

The RL environment platform landscape in 2026

In my last post I wrote about the pain of setting up a local RL environment from scratch.

So, update: I spent some time digging, and here's what I found.
My focus is browser-based web navigation tasks, so I care a lot about headless browser support, reset speed, parallelism, and how well the reward signal reflects real task completion. Your priorities might differ.
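Two of those criteria, reset speed and parallelism, are easy to measure yourself before committing to a platform. Here's a minimal sketch of the kind of harness I use, with a stub standing in for a real headless-browser environment (the `StubBrowserEnv` class and its 10 ms reset are hypothetical placeholders, not any platform's API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

class StubBrowserEnv:
    """Placeholder for a headless-browser environment (hypothetical API)."""
    def reset(self):
        time.sleep(0.01)  # stand-in for page teardown + fresh browser context
        return "initial observation"

def time_resets(env, n=20):
    """Average wall-clock reset latency over n sequential resets."""
    start = time.perf_counter()
    for _ in range(n):
        env.reset()
    return (time.perf_counter() - start) / n

def parallel_resets(envs):
    """Reset a pool of environments concurrently, as a rollout worker would."""
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        return list(pool.map(lambda e: e.reset(), envs))

avg = time_resets(StubBrowserEnv())
obs = parallel_resets([StubBrowserEnv() for _ in range(8)])
print(f"avg reset: {avg * 1000:.1f} ms, parallel batch: {len(obs)} envs")
```

Swap the stub for the platform's actual environment client and the same harness tells you whether resets will bottleneck your rollouts.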


Why this market exists at all

It's worth stepping back to understand why RL environment platforms are becoming a thing.

OpenAI, Anthropic, and Meta don't buy RL environments off the shelf. They build them internally. According to a TechCrunch investigation, Anthropic has discussed spending more than $1 billion on RL environments over the next year. OpenAI's ChatGPT Agent training relies on what researchers call "UI Gyms": browser-based environments simulating real software at scale. As SemiAnalysis reported, the major labs each maintain distinct procurement strategies, with firms like Mercor, Surge, and Handshake acting as major environment and data suppliers.

The market is moving fast. Mercor, one of the largest AI training data platforms (used by the top five AI labs), acquired Sepal AI in February 2026 to deepen its RL environment capabilities, describing the acquisition as targeting the intersection of human data, RL environments, and specialized research. TechCrunch noted that Mercor is now pitching investors on domain-specific RL environments for coding, healthcare, and law.

For everyone outside the top labs: building your own environment infrastructure from scratch is almost certainly the wrong move. The engineering cost is high, the maintenance is ongoing, and your core competency is probably the agent, not the environment. That's exactly the gap the platforms below are trying to fill.


The landscape: 6 platforms worth knowing

1. Surge AI

Focus: Enterprise RL environments, human-expert data pipelines

Surge AI is one of the most established players in this space: they partner with OpenAI, Anthropic, Meta, and Google, and were building RL environments well before most startups entered the market. Their flagship environment suite includes CoreCraft, a large-scale enterprise simulation spanning 2,500+ entities and 23 tools, designed to test real-world agentic capabilities. Their research showed that even GPT-5 and Claude fail on over 40% of agentic tasks in realistic RL environments, which gives a sense of how seriously they approach environment design. The tradeoff: Surge is enterprise-grade and priced accordingly. Not the entry point for smaller teams.


2. Rise Data Labs

Focus: Browser agents, human data pipelines, RL environment curation

Rise Data Labs operates at an interesting intersection: they build RL training environments with a focus on human data and AI training data pipelines, and they also maintain a curated directory of providers across the ecosystem. That dual positioning gives them a broader view of the space than most pure-play platforms, and the task quality reflects it. Worth looking at both as a platform and as a resource for navigating the broader landscape, especially for teams that aren't quite at Surge's scale.


3. Mercor

Focus: Domain-specific RL environments, expert data at scale

Mercor recently acquired Sepal AI to deepen its RL environment capabilities, targeting domain-specific tasks like coding, healthcare, and law. They're used by the top 5 AI labs and bring a strong human-expert network to environment and reward design. Still expanding their environment product, but worth watching closely especially as they integrate Sepal's infrastructure.


4. Prime Intellect

Focus: Research teams, custom environment infrastructure

Prime Intellect is open-source friendly and highly flexible: you can bring your own environment via their Environments Hub, which is useful if your setup has unusual dependencies. Strong on distributed compute. The tradeoff is onboarding complexity: the documentation assumes you already know what you want, so it's a better fit for experienced teams than newcomers.
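For context on what "bring your own environment" means in practice: most platforms expect something with a reset/step shape. The sketch below shows the general pattern with a toy task; the class and method names are illustrative, not Prime Intellect's actual Environments Hub interface.

```python
class CustomTaskEnv:
    """Toy custom environment with the reset/step shape most RL platforms
    expect. Names and signatures are illustrative, not any platform's API."""
    def __init__(self, target):
        self.target = target  # the hidden answer for this episode

    def reset(self):
        # return the initial observation (for LLM agents, usually a prompt)
        return {"prompt": "Guess the number between 0 and 9."}

    def step(self, action):
        # reward 1.0 on an exact match, 0.0 otherwise; one-step episode
        reward = 1.0 if action == self.target else 0.0
        done = True
        return {"guess": action}, reward, done

env = CustomTaskEnv(target=7)
obs = env.reset()
_, reward, done = env.step(7)  # a correct guess earns reward 1.0
```

The point is that wrapping your existing task logic in this interface is usually a small lift; the hard part is the reward definition, not the plumbing.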


5. Mechanize

Focus: Coding and software agent tasks

Mechanize is purpose-built for code-related RL. Their "replication training" approach, in which agents recreate implementations from a spec, produces strong reward signals for code tasks. Not the right tool for browser agents, but worth knowing about if your use case is code execution, repo navigation, or terminal interaction.
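Why does replication produce a strong reward signal? Because a reference implementation gives you an automatic, dense grader: run the agent's code and the reference on the same cases and score the agreement. A minimal sketch of that idea (my own toy example, not Mechanize's actual grading pipeline):

```python
def reference_impl(xs):
    """The 'spec' implementation the agent is asked to replicate."""
    return sorted(set(xs))

def agent_impl(xs):
    """Stand-in for agent-generated code (here, a correct attempt)."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return sorted(out)

def replication_reward(candidate, reference, cases):
    """Fraction of held-out cases where the candidate matches the
    reference: an automatically checkable reward for code tasks."""
    passed = sum(candidate(c) == reference(c) for c in cases)
    return passed / len(cases)

cases = [[3, 1, 2], [5, 5, 5], [], [9, 0, 9, 1]]
score = replication_reward(agent_impl, reference_impl, cases)
print(score)  # 1.0, since this agent attempt matches on every case
```

No human labeling is needed per episode, which is what makes this approach scale.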


6. HUD

Focus: General RL, end-to-end lifecycle

HUD is one of the more complete general-purpose platforms: it covers environment authoring, evaluation, and observability in one place. Useful if you don't want to stitch together separate tools. Performance on browser-specific tasks lags behind more specialized options, but for general RL workflows it covers the bases.


How to think about the choice

A few things worth keeping in mind when evaluating:

Match the platform to your task type. A platform built for coding tasks won't give you what you need for browser agents, and vice versa. The more specialized the platform, the better it tends to perform in its lane and the worse outside it.

Human data integration matters more than most people think. Platforms that incorporate real human feedback into the reward signal rather than purely synthetic signals tend to produce agents that generalize better.

Evaluate independently from where you train. If you train and evaluate on the same environment, you're measuring memorization, not generalization. Worth building this separation in early.
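One cheap way to build that separation in early is a deterministic split: hash each task ID into a train or eval bucket, so the assignment is stable across runs and machines and eval tasks never leak into training. A minimal sketch (the `web-nav-*` task IDs are made up for illustration):

```python
import hashlib

def split(task_id, eval_fraction=0.2):
    """Deterministically assign a task to 'train' or 'eval' by hashing its
    ID, so the split never changes between runs or machines."""
    h = int(hashlib.sha256(task_id.encode()).hexdigest(), 16)
    return "eval" if (h % 100) < eval_fraction * 100 else "train"

tasks = [f"web-nav-{i}" for i in range(1000)]
buckets = {"train": 0, "eval": 0}
for t in tasks:
    buckets[split(t)] += 1
print(buckets)  # roughly an 80/20 split; exact counts depend on the hash
```

Unlike `random.shuffle`, this stays correct as the task pool grows: adding new tasks never moves an old task between buckets.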


If you've worked with any of these platforms or others I haven't covered, I'd genuinely like to hear what you've seen in the comments!
