
Kazuya

AWS re:Invent 2025 - How Heidi Health is leveraging GenAI to transform the global healthcare industry

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - How Heidi Health is leveraging GenAI to transform the global healthcare industry

In this video, Ocha from Heidi shares how they scaled their AI medical scribe into the most widely used in the world, with 370,000+ clinicians and 10 million consults per month. He explains three key lessons: building confidence in AI through a "clinicians in the loop" evaluation process that uses synthetic data and LLM-as-judge tooling; navigating global expansion challenges, including data sovereignty, model availability across regions, medical terminology differences, and evolving regulations, by using infrastructure as code with Amazon Bedrock and EKS for standardized deployment; and the importance of focusing on one workflow, treating clinicians as a core product asset, and building a flexible architecture from day one for business survival.


Note: This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part


Heidi: Building Confidence in AI Through Clinicians in the Loop

Hi, I'm Ocha. It's amazing to be here at re:Invent. In the past three days, I've learned a lot from the sessions, mixers, and fellow startups about how they're developing AI. So in this session, it's time for me to contribute our learnings from scaling generative AI in real-world healthcare. This is the agenda for today. First of all, I'd like to introduce you to Heidi, our journey to becoming one of the biggest AI scribes in the world, and how we're navigating the healthcare system with AI. By the end of this lightning talk, I hope you'll have some valuable takeaways to bring back to the product you're building.


So before we start, hands up who has been to the doctor and felt rushed, or couldn't get your doctor's attention because they were staring at the computer. Yeah, looks like everyone. It's unfortunate, because oftentimes in that moment we feel frustrated, but it's not the doctor's intention: they have to deal with the administrative task of writing notes. Heidi is here to solve that problem. At Heidi, we want clinicians to enjoy what they do without the hassle of the admin, giving time back to doctors and providing a better patient experience. And that is just one of our initiatives on our mission to double the world's healthcare capacity by providing an AI care partner.

Thumbnail 100

Thumbnail 110

Thumbnail 120

Thumbnail 130

So here's Heidi in action. Imagine you're a doctor currently in a consultation session with a patient. Once you start transcribing, Heidi will generate clinical notes without any modification or further action needed, and that's not the only thing Heidi can do. Based on the templates they created, doctors can use the notes as a reference to create patient explainer letters. Doctors can then ask the AI to do clinical research, and Heidi can even suggest the tasks doctors need to do as a follow-up to the consultation. This is exciting because doctors can now focus on patients and let Heidi handle all of the admin work.


So a little about our journey. Our founder Tom Kelly, a practicing doctor, started by building a chatbot tool named OSCER to help medical students master their clinical examinations, using early transformer models. Seeing some success, we jumped to full healthcare, creating a care platform to support doctors and enhance patient care. This worked for a while, until generative AI emerged and we could leverage the capability of non-deterministic output. We focused on one workflow, clinical note generation, and landed on Heidi, an ambient AI scribe that eases the documentation burden. We've grown from a small company in Australia into a recognized global player. Looking at the numbers, Heidi is the most used AI scribe globally: 370,000+ clinicians on Heidi, 10 million consults per month, the number one AI scribe by adoption in Canada, backed by well-known investors, and this is just the start.

And that's me. I've been with Heidi for about four years now. As one of the founding engineers at Heidi, I was solo developing software infrastructure, observability, security, compliance, and all that fun stuff. Now that we're growing at lightning speed, I'm focusing on platform engineering, giving engineers the tools and experience they need to develop securely and efficiently.


So let's get to the first lesson, the challenge of AI in healthcare: building confidence in AI. When we started developing Heidi, we immediately thought about how we could personalize and customize notes so that Heidi could write the same way clinicians do. Our custom templates have been a huge success and doctors loved them, but as engineers we kept thinking about how to make it more efficient, thinking about latency and the context windows of large language models. But as more clinicians used Heidi, we encountered more unique cases related to their specialty. For example, how do we get the tone and the specifics of a note summary correct for each doctor so they feel confident and can rely on it? This is where we realized that truth is what matters. Healthcare requires clinical accuracy, and we were trying to validate non-deterministic outputs at scale. You can't just write unit tests for clinical empathy or diagnostic nuance. We needed doctors.


So what this meant at the start, when we only had a couple of doctors, was giving each doctor a Jupyter notebook. A Jupyter notebook is a tool used by data scientists to write code and run experiments.

Doctors could then run experiments connecting to LLMs: enter the prompts and the transcription, change the temperature, and so on. But the problem is that, in the end, the doctors have to aggregate the results to summarize all of the testing. To address that problem, we started providing JupyterHub hosted on EC2 as a collaboration tool, so the doctors don't have to consolidate everything at the end.
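
To make this concrete, a notebook cell like the sketch below is roughly what such an experiment could look like; the talk doesn't show Heidi's actual code, and the model ID, prompt wording, and template here are assumptions.

```python
# Hypothetical notebook cell: compare note outputs across temperatures.
# The model ID, prompt, and template are illustrative assumptions only.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

TEMPLATE = "SOAP note"                          # the clinician's custom template (assumed)
TRANSCRIPT = "Doctor: ...\nPatient: ..."        # consultation transcript under test

prompt = (
    f"Write a {TEMPLATE} for the following consultation transcript.\n\n"
    f"{TRANSCRIPT}"
)

for temperature in (0.0, 0.3, 0.7):
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": temperature, "maxTokens": 1024},
    )
    note = response["output"]["message"]["content"][0]["text"]
    print(f"--- temperature={temperature} ---\n{note}\n")
```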


Clearly this is not going to work at scale, because not every clinician or doctor is going to be a coder. It works okay for scrappy testing with a small number of doctors. So how do we scale it then? Picture this: what does a typical clinician workflow in Heidi look like for transcription and note generation? First, we needed data points we could use for evaluation, but how were we going to get those in a testing environment when we can't use any user's data?


One way is for doctors to run mock consultations and case studies with Heidi users, but most importantly, the most recent technique we use is LLMs to create synthetic data: generated consultations in both audio and text form. With enough data, clinicians can start doing more evaluation, word error rates, template adherence checks (basically making sure that the templates clinicians made are medically safe), and hallucination rate checks, and this process is what we call clinicians in the loop.
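
As a minimal sketch of one of these metrics, word error rate can be computed as an edit distance between a reference transcript and the ASR output, normalized by reference length; this is a generic implementation, not Heidi's evaluation tooling.

```python
# Minimal word error rate (WER) sketch: edit distance between a reference
# transcript and an ASR hypothesis, normalized by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(wer("patient reports mild chest pain", "patient report mild chest pain"))  # 0.2
```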


As Heidi started to hire more and more clinicians, the evaluation process needed to be scalable. At this stage, engineers started to introduce more tooling to help clinicians, such as internal tooling for reviewing flagged sessions in the testing environment, connections between the sessions and LLM contexts to gain better understanding, and an LLM-as-a-judge tool to do it at scale. All of these processes feed a feedback loop that improves the model, the prompts, and medical safety. It shaped our product and engineering decisions, hiring, and go-to-market.
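
For illustration, an LLM-as-a-judge check can look something like the sketch below, where a second model grades a generated note against the source transcript; the rubric, judge model, and output schema are assumptions, not Heidi's internal tooling.

```python
# Hypothetical LLM-as-a-judge sketch: ask a second model to grade a generated
# note against the source transcript. Real tooling would need more robust
# output parsing; everything here is illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

JUDGE_PROMPT = """You are reviewing an AI-generated clinical note.
Transcript:
{transcript}

Generated note:
{note}

Return JSON with integer scores 1-5 for "faithfulness" (no hallucinated facts)
and "template_adherence", plus a short "rationale"."""

def judge(transcript: str, note: str) -> dict:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed judge model
        messages=[{"role": "user", "content": [
            {"text": JUDGE_PROMPT.format(transcript=transcript, note=note)}
        ]}],
        inferenceConfig={"temperature": 0.0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
```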


Scaling Globally: Navigating Data Sovereignty, Model Availability, and Regional Healthcare Complexity

So speaking about scaling, scaling clinical experts is one challenge. The next is how we scale Heidi outside of Australia. When we started to expand Heidi globally, we quickly realized that healthcare isn't a single standard. We encountered four distinct layers of complexity that we have to solve simultaneously.


First is data sovereignty. This isn't just about storage, it's about strict data locality and network architecture. For example, in Australia we must strictly use ap-southeast-2, or ap-southeast-4, the newest region in Melbourne, whereas in the US we might use us-east-1 or us-west-2. It's not just where the data is stored, but also how it moves. We need to ensure that workloads stay private, using a well-architected VPC network to control exactly how systems communicate with each other within those specific borders.
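
In application code, region pinning can be as simple as only ever constructing clients in a tenant's home region, as in the sketch below; the region map and wiring are illustrative assumptions, not Heidi's architecture.

```python
# Hypothetical region pinning: every tenant is served only by clients created
# in that tenant's home region, so data and API calls stay inside the boundary.
import boto3

TENANT_REGIONS = {
    "au": "ap-southeast-2",  # Australia (Sydney); ap-southeast-4 (Melbourne) also allowed
    "us": "us-east-1",       # United States
}

def clients_for(tenant: str) -> dict:
    region = TENANT_REGIONS[tenant]
    return {
        "s3": boto3.client("s3", region_name=region),
        "bedrock": boto3.client("bedrock-runtime", region_name=region),
    }
```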


And the second is model availability. If you're building solely for the US, it's relatively easy because models are available everywhere here in the US. You can pick almost any provider, but the moment you try to expand to new regions, that luxury disappears. We suddenly had to think about other options because the models we wanted simply were not available or not compliant in those regions.


And third is medical reality itself. A GP appointment in Australia looks very different from a primary care visit in New York. It's not just about the accent, it's the training, the consultation flow, and the medical terminology. Heidi has to adapt to these nuances to capture the consultation accurately.


And finally, we are building on shifting sand. Gen AI is a new frontier that is actively influencing the regulatory space. Navigating different regions means managing different compliance requirements simultaneously. This isn't just a legal headache, it directly affects our product roadmap, our engineering decisions every single day.


So facing these four massive hurdles, we had to architect a solution that could handle them all. How did we actually meet these challenges from a technical perspective? The answer lies in standardization. We make sure all of our infrastructure in AWS is standardized across every single region, and we use infrastructure as code to ensure that our deployments are consistent. This gives us a flexible architecture that allows us to easily deploy into new regions without reinventing the wheel. Essentially, we treat new regions as plug-and-play templates. You'll notice that there is an EKS cluster in the diagram. It's a bit small, so I'm going to make it quite big over there. This is central to our strategy for model availability.
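
A toy way to picture those plug-and-play templates: one parameter object per region, fed into the same infrastructure-as-code stack, so each new deployment differs only in its parameters. The names, fields, and regions below are illustrative assumptions rather than Heidi's actual IaC.

```python
# Hypothetical "plug and play" region template: one parameter object per
# region, consumed by a single standardized infrastructure-as-code stack.
from dataclasses import dataclass

@dataclass(frozen=True)
class RegionDeployment:
    region: str             # AWS region the whole stack is pinned to
    eks_cluster_name: str   # self-hosted inference lands here later
    bedrock_enabled: bool   # whether Bedrock models are available/compliant

DEPLOYMENTS = [
    RegionDeployment("ap-southeast-2", "heidi-apse2", bedrock_enabled=True),
    RegionDeployment("us-east-1", "heidi-use1", bedrock_enabled=True),
]

# The same IaC module would be instantiated once per entry, e.g.:
# for d in DEPLOYMENTS: deploy_stack(d)   # deploy_stack is an assumed helper
```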


For immediate availability when entering new regions, we use LLM providers that are already available and compliant in that designated region, like Amazon Bedrock. This solves the immediate cold-start problem. In the long term, however, it is imperative for us to have infrastructure that can support self-hosted models. This is where EKS shines, since AWS EKS supports most of the global regions. Once we have our infrastructure template ready, we can serve our own inference models everywhere. This hybrid approach, Bedrock for speed and EKS for control, solves model availability globally.
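
Reusing the hypothetical region parameters from the earlier sketch, the hybrid approach could be routed roughly like this: prefer a managed Bedrock model where one is available and compliant, otherwise fall back to a self-hosted model served from the regional EKS cluster. Endpoints, model IDs, and the API shape of the self-hosted service are all assumptions.

```python
# Hypothetical routing for the hybrid approach: Bedrock where available,
# otherwise a self-hosted model behind an in-cluster service on EKS.
import json
import urllib.request
import boto3

def generate_note(prompt: str, deployment) -> str:
    if deployment.bedrock_enabled:
        bedrock = boto3.client("bedrock-runtime", region_name=deployment.region)
        response = bedrock.converse(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return response["output"]["message"]["content"][0]["text"]
    # Self-hosted fallback: OpenAI-compatible endpoint assumed inside the cluster.
    body = json.dumps({"model": "self-hosted", "prompt": prompt}).encode()
    req = urllib.request.Request(
        f"http://inference.{deployment.eks_cluster_name}.internal/v1/completions",
        data=body, headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```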


But as I mentioned earlier, healthcare isn't just code, right? It's people. Once the technical pipes have been laid, we still face a massive non-technical hurdle: building trust. Trust starts with speaking the language, and I don't just mean French or Spanish, I mean medicine. We hire clinician ambassadors in every region we operate in. These are doctors who believe in Heidi's mission and provide on-the-ground support. They aren't just consultants; they ensure Heidi can speak the local medical dialect. They validate that Heidi doesn't just translate words but also understands local practice patterns, ensuring the output feels natural to a GP in New York or a specialist in Sydney.


Finally, we tackle the complex regulatory requirements through a rigorous compliance network. We established a dedicated internal legal and compliance team that manages the shifting landscape of international laws. We also work with external partners, especially focusing on medical safety. This ensures that we move fast on infrastructure, but we never compromise on safety. By combining this plug and play technical architecture with a human-in-the-loop trust strategy, we're able to scale globally while staying local.


So if there is one thing I want you to walk away with today, it's that technology alone isn't the product. What made Heidi successful was not just the release of foundation models; it was the pivot. We moved from a broad care platform that tried to do everything to a single workflow that brings immediate, tangible value to doctors. Don't try to boil the ocean. Just solve one painful problem perfectly.

And second, in a world of generative AI, the human is more important than ever. Doctors and clinicians are the core part of the product. We learned to treat our subject matter experts not just as testers, but also as our biggest asset. They're the guardians of quality. And finally, for the builders in the room, build a flexible architecture from day one. This isn't just about code quality. It's about business survival. It is this flexibility that allows us to respond to changing regulatory environments and expand into new regions with completely different requirements. Your architecture should be an enabler of expansion, not a bottleneck.


And that's about it. I hope you learned one or two things from our journey. And if you're interested in building the future of healthcare or have any questions about Heidi, I'd love to chat. You can find more about Heidi on the left QR link, and you can find me on the right QR link. Thank you.


Note: This article is entirely auto-generated using Amazon Bedrock.
