<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Kronovet</title>
    <description>The latest articles on DEV Community by Michael Kronovet (@michael_kronovet_261e3cc1).</description>
    <link>https://dev.to/michael_kronovet_261e3cc1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908239%2F69226295-4f43-42de-b6e7-2250bffed4af.jpg</url>
      <title>DEV Community: Michael Kronovet</title>
      <link>https://dev.to/michael_kronovet_261e3cc1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michael_kronovet_261e3cc1"/>
    <language>en</language>
    <item>
      <title>What we learned building healthcare integrations for the past year</title>
      <dc:creator>Michael Kronovet</dc:creator>
      <pubDate>Fri, 01 May 2026 23:05:42 +0000</pubDate>
      <link>https://dev.to/michael_kronovet_261e3cc1/what-we-learned-building-healthcare-integrations-for-the-past-year-4n94</link>
      <guid>https://dev.to/michael_kronovet_261e3cc1/what-we-learned-building-healthcare-integrations-for-the-past-year-4n94</guid>
      <description>&lt;p&gt;Over the past year we’ve been building integrations with EHRs and payor portals that don’t expose real APIs. It’s been a pain in the ass. Thought it’d be helpful to share what we’ve learned and also hear from others what’s worked well or hasn’t worked for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I’ll start by saying that integrating with anything in healthcare is hard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Healthcare data is extremely fragmented and the APIs that are accessible are missing tons of functionality or just don’t exist. FHIR endpoints are missing the data you actually need and lack writeback. Going directly through EHR vendors is expensive and slow. Most payor portals have no APIs at all, and if they do they’re really limited. Companies that offer third-party APIs have huge coverage gaps.&lt;/p&gt;

&lt;p&gt;So, as all the other healthcare startups do, we decided to embrace browser automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Starting with fully agentic browser agents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We started with fully AI-driven browser agents, because they were the lowest implementation lift and seemed like they would be robust. We soon realized, though, that the EHR flows we needed to run were very non-intuitive and involved tons of clicks. Agents could not perform the workflows correctly, and they were also slow and expensive, since they required an LLM API call for each action. On top of that, we didn’t like the idea of giving an agent full autonomy to click anything it wanted.&lt;/p&gt;

&lt;p&gt;Then we moved to Stagehand, which seemed like a good mix of determinism + AI that could catch errors when they popped up. Sadly, it ended up causing us a lot of problems. Stagehand relies on the accessibility tree, which did not always map to the correct actions/selectors on the EHRs, and it consistently failed on more complex DOMs like Athena. We also realized that the errors we were encountering weren’t even caused by the kinds of changes Stagehand was designed to handle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving to Playwright&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We realized that the errors we encountered when running automations were almost always in one of the following buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nondeterministic popups&lt;/li&gt;
&lt;li&gt;Things on screen rendering extremely slowly&lt;/li&gt;
&lt;li&gt;Edge cases with internal logic that we had not anticipated (e.g., you see duplicate patients with no clear way to tell them apart and don’t know which chart to access, something you absolutely don’t want an AI resolving on its own)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We realized that we could make our code much more reliable if we just built a lightweight popup detector and smarter wait/reload logic. For the edge cases, we needed to be in the loop anyway, since we did not want an LLM doing something that fell outside the anticipated workflow (e.g., trying to link the correct referral between seemingly duplicate patients).&lt;/p&gt;
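&lt;p&gt;As a rough sketch (not our actual code), that retry/popup logic looks something like this; the callables are stand-ins you would back with real Playwright locator checks:&lt;/p&gt;

```python
import time

def run_with_retries(step, check_for_popup, dismiss_popup,
                     attempts=3, wait_seconds=2.0):
    """Run one automation step, dismissing popups and retrying slow renders.

    `step`, `check_for_popup`, and `dismiss_popup` are hypothetical
    callables you would back with real Playwright calls.
    """
    last_error = None
    for _ in range(attempts):
        if check_for_popup():
            dismiss_popup()
        try:
            return step()
        except TimeoutError as exc:  # slow render: wait, then retry
            last_error = exc
            time.sleep(wait_seconds)
    raise last_error

# Stand-in step: first attempt times out, second succeeds.
state = {"calls": 0}
def flaky_step():
    state["calls"] += 1
    if state["calls"] == 1:
        raise TimeoutError("page still rendering")
    return "chart opened"

result = run_with_retries(flaky_step, lambda: False, lambda: None,
                          wait_seconds=0.01)
print(result)  # chart opened
```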

&lt;p&gt;&lt;strong&gt;Using network requests where possible&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For reliability, speed, and maintainability, we rebuilt some of our integrations as direct API/network calls. These were quicker and more reliable than relying on browser agents or playwright scripts. However, some websites have security setups that make this the wrong integration approach because they will detect that you’re a bot and ban you.&lt;/p&gt;
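&lt;p&gt;Where this works, the integration can be as small as replaying the portal’s internal JSON endpoint with the session cookie from a logged-in browser session. A minimal sketch, where the endpoint path, cookie name, and response shape are all assumptions you’d copy from the network tab:&lt;/p&gt;

```python
import json
import urllib.request

def build_patient_request(base_url, patient_id, session_cookie):
    # Hypothetical internal endpoint; in practice, copy the path and
    # headers from the browser's network tab for the logged-in session.
    return urllib.request.Request(
        f"{base_url}/internal/api/patients/{patient_id}",
        headers={
            "Cookie": f"session={session_cookie}",
            "Accept": "application/json",
        },
    )

def fetch_patient(base_url, patient_id, session_cookie):
    req = build_patient_request(base_url, patient_id, session_cookie)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```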

&lt;p&gt;&lt;strong&gt;The bigger shift: build-time AI vs. runtime AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most important mental-model change we had was moving from runtime AI (agents making live decisions as the Playwright scripts run) to build-time AI (using Claude Code to generate and iterate on Playwright scripts, and having Claude fix errors). As engineers, it was important for us to have full visibility and control over the actual code, and we want to be in the loop whenever an edge case pops up in the workflow so we can decide on the right way to handle it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our current stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Script creation&lt;/strong&gt;: Local development with Playwright + Claude Code. We’ll step through and record workflows, leave some comments, then run the recorded flow and have Claude connect to the running script, inspecting the logs and network requests as it builds the automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Robustness&lt;/strong&gt;: We have certain specific cases where we add AI fallbacks, like a popup detector that takes a screenshot, uses X/Y coordinates to close the popup, and then retries the originally intended logic.&lt;/p&gt;
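&lt;p&gt;Structurally, that fallback is just a screenshot, a model call that returns coordinates, a coordinate click, and a retry. A hedged sketch, where &lt;code&gt;locate_close_button&lt;/code&gt; stands in for the vision-model call (an assumption, not a real API):&lt;/p&gt;

```python
# `page` is a Playwright Page; `locate_close_button` is a hypothetical
# function that sends the screenshot to a vision model and returns the
# close button's X/Y coordinates.
def dismiss_popup_and_retry(page, step, locate_close_button):
    shot = page.screenshot()          # bytes of the current viewport
    x, y = locate_close_button(shot)  # model picks the close button
    page.mouse.click(x, y)            # Playwright coordinate click
    return step()                     # rerun the originally intended logic
```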

&lt;p&gt;&lt;strong&gt;Infra&lt;/strong&gt;: We had been self-hosting everything on GCP as Cloud Run jobs, capturing screenshots and structured error logs on failure to make it easy for an agent to debug. Then we pull the task locally and rerun it to make sure it works. We recently moved to hosted platforms like Kernel and Browserbase, though that seems to have made the scripts a lot flakier.&lt;/p&gt;
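&lt;p&gt;The failure capture we relied on in that self-hosted setup is roughly this shape; the field names are our own convention, not a Cloud Run or Playwright API:&lt;/p&gt;

```python
import traceback

def run_job(page, job_name, job):
    """Run one automation job; on failure, return a structured record
    (screenshot bytes + traceback) that an agent can debug from.
    `page` only needs Playwright's `screenshot()` and `url`.
    """
    try:
        return {"ok": True, "result": job()}
    except Exception as exc:
        return {
            "ok": False,
            "job": job_name,
            "error": str(exc),
            "traceback": traceback.format_exc(),
            "url": page.url,
            "screenshot": page.screenshot(),
        }
```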

&lt;p&gt;&lt;strong&gt;Edit&lt;/strong&gt;&lt;br&gt;
We recently bundled all our internal logic for building and maintaining these workflows into an open source skill/CLI called libretto: &lt;a href="https://github.com/saffron-health/libretto" rel="noopener noreferrer"&gt;https://github.com/saffron-health/libretto&lt;/a&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>discuss</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
