Mohammed Ali Chherawalla
What Does AI Across the SDLC Actually Look Like at a Health Insurance Company?

There is no shortage of articles about AI in software development. Most of them talk in abstractions. "AI will transform engineering." "Copilot increases productivity." "The future of development is AI-assisted."

None of that helps you if you are an engineering leader at a health insurance company trying to figure out what this actually looks like on a Monday morning when your team sits down to work on the claims processing platform.

This article is the practical version. Every phase of the software development lifecycle, mapped to what AI does and does not do in a health insurance engineering environment. No theory. No hype. Just what we have seen work across insurance engineering teams at Wednesday Solutions.

The Starting Point Most People Get Wrong

Most engineering leaders start their AI adoption journey by picking a tool. They evaluate Copilot vs Cursor vs Claude Code. They run a pilot with 5 engineers. They ask those engineers if they feel more productive. The engineers say yes. The leader rolls it out to everyone.

Six months later, nothing has changed at the organizational level.

Faros AI studied over 10,000 developers and found exactly this pattern. Individual engineers completed 21% more tasks. They merged 98% more pull requests. But organizational delivery metrics stayed flat. More code was being written. The same amount of software was being shipped.

The reason is simple. Giving an engineer an AI coding assistant is like giving a factory worker a faster drill. They drill faster. But if the bottleneck is in the assembly line, in the quality checks, in the shipping process, the faster drill does not make the factory produce more.

In health insurance engineering, the bottlenecks are almost never in how fast someone writes code. They are in understanding what to build, reviewing what was built, testing it against thousands of edge cases, and deploying it without breaking a platform that millions of people depend on.

AI across the SDLC means putting AI at every one of those bottlenecks. Not just the coding part.

Phase 1: Requirements and Planning

What the day looks like without AI

A product manager writes a requirements document for a new claims adjudication rule. It references the existing workflow, the policy types affected, and the expected behavior. An engineer reads it and starts building. Three days later, the engineer asks a question that reveals they interpreted the requirement differently. The product manager clarifies. The engineer reworks two days of code. This happens on almost every feature.

The problem is not bad communication. The problem is that the claims processing system has 15 years of accumulated business logic, and no single document captures all of it. Every new feature interacts with rules that exist in code comments, in the heads of senior engineers, and in policy documents that engineering never sees.

What the day looks like with AI

The team builds agent skills for their system. These are structured knowledge packs that contain every business rule, architecture decision, data model, and constraint in the claims platform. When a new requirement comes in, the AI can answer questions like "what existing rules does this interact with?" and "which policy types are affected by this change?" before the engineer writes a line of code.

This is not magic. It is the equivalent of having a senior engineer with perfect memory available to every person on the team, at all times. The difference is that this senior engineer never goes on leave, never forgets a rule from 2019, and never gives a different answer depending on who asks.

For health insurance specifically, agent skills capture things like: the claims adjudication sequence, which fields are mandatory for different submission types, how the provider network data refreshes, what the co-pay calculation logic looks like across different plan tiers, and how the legacy system handles exceptions. This context eliminates the majority of rework that comes from misunderstood requirements.
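
As a concrete sketch, an agent skill entry can be as simple as a structured record that an assistant queries before any code is written. The fields, rule names, and policy types below are invented for illustration, not a real schema:

```python
# Illustrative sketch of "agent skill" entries: structured knowledge
# records an AI assistant can query before code is written.
# Field names, rules, and policy types are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class AgentSkill:
    name: str
    summary: str
    affected_policy_types: list[str]
    interacts_with: list[str] = field(default_factory=list)

SKILLS = [
    AgentSkill(
        name="copay_calculation",
        summary="Co-pay is tier-based; specialist visits use a higher tier.",
        affected_policy_types=["HMO", "PPO"],
        interacts_with=["deductible_tracking", "out_of_network_rules"],
    ),
    AgentSkill(
        name="deductible_tracking",
        summary="Deductible accumulators reset on the plan anniversary.",
        affected_policy_types=["HMO", "PPO", "EPO"],
    ),
]

def rules_affecting(policy_type: str) -> list[str]:
    """Answer 'which existing rules does this change touch?' for a policy type."""
    return [s.name for s in SKILLS if policy_type in s.affected_policy_types]
```

With a pack like this, "which policy types are affected?" becomes a lookup instead of an archaeology project, and every tool on the team reads from the same source of truth.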

Phase 2: Design and Architecture

What the day looks like without AI

Your architects spend significant time on decisions that feel unique but follow patterns. How should we structure this new microservice? What is the right way to handle async processing for batch claims? How do we integrate this new feature with the legacy system without creating another point of fragility?

These decisions matter. A bad architecture choice in a claims processing system can take months to unwind. But most of these decisions are not novel. They are variations on problems that have been solved before, in your own codebase and across the industry.

What the day looks like with AI

AI with agent skills can propose architecture options that account for your existing system. Not generic patterns from a textbook. Patterns that work with your specific tech stack, your data volumes, your integration points.

An architect asks: "We need to add a real-time eligibility check to the claims workflow. Here is how the current workflow processes claims. What are the architecture options that do not add latency to the critical path?" The AI proposes options grounded in how your system actually works, because the agent skills contain that context.

The architect still makes the decision. But instead of spending two days researching options, they spend two hours evaluating options that are already tailored to their constraints. For a health insurance platform handling millions of claims, the quality of those decisions improves because the AI never forgets a constraint that the architect might overlook after a long week.

Phase 3: Code Generation

What the day looks like without AI

An engineer picks up a ticket to implement a new co-pay calculation for a specific plan type. They look at how existing calculations work. They trace through the codebase to understand the data flow. They write the new logic. They write unit tests. They submit a pull request. Total time: 3 to 5 days for a mid-level engineer.

What the day looks like with AI

The same engineer works with AI that has agent skills containing the claims processing logic, the existing co-pay calculations, the data model, and the testing patterns the team uses. They describe what needs to happen. The AI generates the implementation, following the same patterns and conventions used elsewhere in the codebase. The engineer reviews it, adjusts the edge cases, and submits. Total time: 1 to 1.5 days.

The speed gain is not because the AI writes code faster than a human types. It is because the AI skips the hours of codebase exploration that the engineer would have done manually. It already knows how the system works.

But here is the part that matters more than raw speed. The AI generates code that is consistent with the rest of the codebase. Same naming conventions. Same error handling patterns. Same test structure. When you have 50 engineers writing claims processing code, consistency is what keeps the system maintainable. Without AI, consistency depends on code review catching every deviation. With AI and well-built agent skills, consistency is the default.
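
A hedged sketch of what such a generated calculation might look like. The tiers, amounts, and the deductible rule are invented for illustration; real plan logic is far richer:

```python
# Illustrative tier-based co-pay calculation. Tiers, rates, and the
# deductible rule are invented examples, not real plan logic.
PLAN_TIER_COPAY = {
    "bronze": {"primary": 40.0, "specialist": 75.0},
    "silver": {"primary": 30.0, "specialist": 60.0},
    "gold":   {"primary": 20.0, "specialist": 40.0},
}

def copay(plan_tier: str, visit_type: str, deductible_met: bool) -> float:
    """Return the member co-pay for a visit.

    Mirrors conventions a team might encode in agent skills: an explicit
    tier table, an explicit error on unknown inputs, no silent defaults,
    so AI-generated variants stay consistent with existing code.
    """
    try:
        base = PLAN_TIER_COPAY[plan_tier][visit_type]
    except KeyError as exc:
        raise ValueError(f"unknown plan tier or visit type: {exc}") from exc
    # Illustrative rule: once the deductible is met, the co-pay halves.
    return base / 2 if deductible_met else base
```

The value is less in the arithmetic than in the shape: when the conventions are encoded, every generated variant follows the same table-plus-explicit-error pattern.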

DORA 2025 found that AI generates boilerplate code 2 to 4 times faster. That is real but misleading. The real gain in health insurance is not boilerplate speed. It is the elimination of the ramp-up time that every engineer spends understanding the system before they can contribute. A new engineer with access to well-built agent skills contributes meaningfully in days, not months.

Phase 4: Code Review

What the day looks like without AI

Your three most senior engineers are the only people who can review pull requests for the claims processing system. They understand the legacy integrations, the edge cases, the regulatory constraints. Every pull request waits in their queue. Average review time: 2 to 3 days. On a busy week, 5 days.

These senior engineers spend 30 to 40% of their week reviewing code. That is 30 to 40% of your most expensive, most experienced people doing work that is partly mechanical: checking naming conventions, catching null pointer risks, flagging missing error handling. The high-value part of their review, evaluating architecture decisions and business logic correctness, gets squeezed into whatever time is left.

What the day looks like with AI

Automated AI review tools handle the first pass on every pull request. They catch inconsistent naming, missing error handling, security vulnerabilities, style violations, and common bugs. They recommend fixes that engineers accept with a single click. Engineers self-correct before the human review starts.
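
To make "mechanical" concrete, here is a minimal sketch of one such first-pass check using Python's ast module, flagging bare except clauses and non-snake_case function names. Real AI review tools go far beyond pattern checks like this; the point is that checks of this kind need no human judgment:

```python
# Minimal sketch of a mechanical first-pass check: flag bare `except:`
# clauses and non-snake_case function names via the ast module.
import ast
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def first_pass_issues(source: str) -> list[str]:
    """Return human-readable findings for a Python source string."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(f"line {node.lineno}: bare except swallows errors")
        if isinstance(node, ast.FunctionDef) and not SNAKE_CASE.match(node.name):
            issues.append(f"line {node.lineno}: function '{node.name}' is not snake_case")
    return issues

sample = "def ProcessClaim():\n    try:\n        pass\n    except:\n        pass\n"
```

Running `first_pass_issues(sample)` surfaces both problems before a human ever opens the pull request.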

By the time a pull request reaches your senior engineer, the mechanical issues are already resolved. The senior engineer spends their review time on the questions that require human judgment: does this architecture decision make sense for our system? Does this business logic correctly handle the edge case where a member has dual coverage? Will this change interact poorly with the batch processing job that runs overnight?

The review cycle compresses from days to hours. Your senior engineers get 15 to 20% of their week back. And the quality of reviews improves because human attention is focused on the decisions that actually need human attention.

At Wednesday Solutions, automated first-pass reviews are standard on every project. The reduction in review cycle time is one of the most immediate and visible improvements any engineering team can make.

Phase 5: Testing

What the day looks like without AI

Testing a claims processing platform is exhausting. The number of edge cases is enormous. Different policy types, different plan tiers, different provider networks, different co-pay structures, deductible interactions, out-of-network scenarios, coordination of benefits with other insurers. A single claims adjudication change can affect hundreds of combinations.

Manual test case creation takes weeks. Running the full regression suite takes days. Tests break constantly because test data changes. Engineers spend more time maintaining tests than writing them. And despite all of this effort, coverage is incomplete. The team tests the happy path and the most common edge cases. The uncommon edge cases get found in production.

What the day looks like with AI

AI-automated testing changes this at two levels.

At the API level, testing tools sit at the network layer and capture every API request your system makes. They automatically generate API tests from observed traffic patterns. No engineer writes these tests manually. Coverage expands to include patterns that no human would have thought to test, because the tool captures real behavior, not hypothesized behavior.

At the end-to-end level, vision-based testing tools take screenshots of your application, evaluate what the user is trying to do, and validate outcomes. This eliminates the flakiness that plagues traditional end-to-end tests, which break every time a CSS class changes or a button moves 3 pixels. The vision-based approach tests what the user sees, not what the DOM looks like.

For a health insurance platform, this means test coverage goes from "we cover the top 50 scenarios" to "we cover every scenario the system has ever processed." That is where reductions in production bugs on the order of 75% come from. Not from better code. From catching bugs that would have slipped through manual testing.


Phase 6: Deployment and Operations

What the day looks like without AI

Deployments are scheduled events. The team picks a window, usually outside business hours, and pushes the release. Someone stays online to monitor. If something breaks, the team triages manually. For a health insurance platform during open enrollment season, this means deployments slow down or stop entirely because the risk of downtime is too high.

The result: features pile up. Releases get bigger. Bigger releases carry more risk. More risk means more caution. More caution means slower releases. It is a cycle that compounds.

What the day looks like with AI

AI-assisted deployment does not mean the AI pushes the button. It means the deployment pipeline is instrumented with AI monitoring that catches anomalies within minutes, not hours. Automated rollback triggers when error rates spike. Recovery time compresses from hours to minutes.
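
A minimal sketch of such a rollback trigger, assuming a sliding error-rate window. The window size, threshold, and rollback hook are all illustrative choices:

```python
# Sketch of an automated rollback trigger: roll back when the error
# rate over a sliding window of recent requests exceeds a threshold.
# Window size, threshold, and the rollback hook are assumptions.
from collections import deque

class RollbackGuard:
    def __init__(self, window: int = 100, threshold: float = 0.05, rollback=None):
        self.results = deque(maxlen=window)   # True = request errored
        self.threshold = threshold
        self.rollback = rollback or (lambda: None)
        self.rolled_back = False

    def record(self, errored: bool) -> None:
        self.results.append(errored)
        rate = sum(self.results) / len(self.results)
        # Require a full window so one early error cannot trigger a rollback.
        if (len(self.results) == self.results.maxlen
                and rate > self.threshold
                and not self.rolled_back):
            self.rolled_back = True
            self.rollback()
```

In production the `rollback` hook would call the deployment system; the guard itself is just the decision rule that turns "someone watching a dashboard" into an automatic response measured in seconds.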

We stabilized a health insurance platform that was crashing 4 hours daily during peak season. Three weeks to zero downtime. Revenue protected immediately. The longer-term rebuild moved the platform to a modular architecture where individual services deploy independently. That meant the team could ship changes to the eligibility service without touching the claims service, and deploy during business hours with confidence.

The DORA 2025 report identifies deployment frequency and recovery time as two of the five key engineering performance metrics. Top-performing teams deploy on-demand, multiple times per day, with recovery times under one hour. When your deployment pipeline is AI-assisted, you move from "we deploy once a month and hope nothing breaks" to "we deploy daily and know we can recover in minutes if something does."

Phase 7: Monitoring and Continuous Improvement

What the day looks like without AI

Your operations team watches dashboards. They set static thresholds for alerts. When something crosses a threshold, they get paged. Half the alerts are noise. The real problems sometimes hide behind normal-looking metrics until a user calls to report that their claims are not processing.

What the day looks like with AI

AI-powered monitoring learns the normal behavior patterns of your system. It identifies anomalies that static thresholds miss. A 5% increase in claims processing latency at 2 AM is normal (batch jobs). A 5% increase at 10 AM is not (something is wrong). The AI knows the difference because it has learned the patterns.
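
The time-of-day distinction can be sketched with per-hour baselines and a z-score cutoff. The numbers are invented; a real system learns these baselines from historical telemetry rather than hard-coding them:

```python
# Sketch of a time-of-day baseline: the same latency reading is normal
# at 2 AM (batch window) and anomalous at 10 AM. Numbers are invented.

# Learned per-hour latency baselines (ms): (mean, standard deviation).
BASELINE = {
    2:  (480.0, 60.0),   # overnight batch window: high and noisy
    10: (120.0, 10.0),   # business hours: low and stable
}

def is_anomalous(hour: int, latency_ms: float, z_cutoff: float = 3.0) -> bool:
    """Flag a reading that sits more than z_cutoff deviations from its hour's baseline."""
    mean, stdev = BASELINE[hour]
    return abs(latency_ms - mean) / stdev > z_cutoff
```

A 500 ms reading passes at 2 AM and fires at 10 AM, which is exactly the judgment a static threshold cannot make.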

More importantly, AI connects monitoring data back to the development process. When a production issue occurs, the AI can trace it to the specific deployment, the specific pull request, and the specific code change that caused it. Root cause analysis that used to take hours takes minutes.

For health insurance platforms processing millions of claims, this means you catch degradation before it becomes an outage, and you fix the root cause instead of applying patches.

What This Looks Like Across a Typical Week

Monday: An engineer picks up a new feature ticket. AI with agent skills answers their context questions immediately. They start building within an hour, not after two days of codebase exploration.

Tuesday: The engineer submits a pull request. Automated review catches 12 issues in the first pass. The engineer fixes them in 30 minutes. The senior engineer reviews the remaining architectural questions in an hour.

Wednesday: The feature passes AI-automated testing, which runs 400 test scenarios in 20 minutes. Manual testing would have covered 50 scenarios in 2 days.

Thursday: The feature deploys to production during business hours. AI monitoring confirms no anomalies within 15 minutes.

Friday: The team ships another feature instead of spending the day in regression testing and deployment planning.

That is what "AI across the SDLC" actually means. Not one tool. Not one phase. A compounding effect where speed gains at each phase stack on top of each other.

The Prerequisite That Makes All of This Work

Everything described above depends on one thing: your engineering processes need to be documented before AI can accelerate them.

If your code review standards live in one senior engineer's head, AI cannot automate the first pass. If your testing strategy is "whatever the engineer thinks is important," AI cannot generate meaningful tests. If your deployment process is a checklist that someone remembers most of but occasionally misses a step, AI cannot make deployments safer.

The DORA 2025 report, based on a survey of roughly 5,000 technology professionals, put it clearly: AI amplifies what is already there. Strong practices plus AI equals multiplied gains. Weak practices plus AI equals amplified chaos.

The first step is not buying a tool. It is writing down what "good" looks like for every phase of your SDLC. Once that is documented, every AI tool you adopt has a rubric to follow. That is when the compounding starts.

At Wednesday Solutions, we start every enterprise engagement by assessing exactly this: are the processes codified enough for AI to amplify? If yes, we move fast. If not, we help document them first. We have a 4.8/5.0 rating on Clutch across 23 reviews, with insurance and financial services among our longest-running engagements, because getting this foundation right is what separates teams that get 3x faster from teams that get frustrated.


Frequently Asked Questions

What does AI across the SDLC mean for health insurance companies?

It means using AI tools at every phase of software development, not just code writing. Planning, design, code generation, code review, testing, deployment, and monitoring all have AI applications. The compounding effect of AI at each phase is where the real speed gains come from. A health insurance engineering team that only uses AI for code generation captures maybe 20% of the possible improvement.

Which phase of the SDLC benefits most from AI in health insurance engineering?

Testing. Health insurance platforms have an enormous number of edge cases: policy types, plan tiers, provider networks, co-pay structures, deductible calculations, coordination of benefits. AI-automated testing can cover hundreds of scenarios in minutes where manual testing covers dozens in days. This is where most teams see the biggest immediate impact on quality and speed.

What are agent skills and how do they work in health insurance engineering?

Agent skills are structured knowledge packs that teach AI tools how your specific system works. For health insurance, they contain your claims processing workflows, business rules, data models, provider network structures, plan tier logic, and architectural constraints. Every AI tool on your team accesses the same agent skills, so output is consistent and specific to your system rather than generic.

How long does it take to implement AI across the SDLC at a health insurance company?

Start with one team, one phase. Automated code review can be implemented in a single sprint with immediate results. Testing automation takes 2 to 3 sprints to reach meaningful coverage. Building comprehensive agent skills for a complex claims platform takes 2 to 4 weeks of senior engineer time. Most teams see measurable improvement within 90 days of starting with a single team.

Does AI-generated code work with legacy health insurance platforms?

Yes, and this is one of the highest-value use cases. Legacy systems are where engineers spend the most time understanding context before they can write code. Agent skills that capture how the legacy system works eliminate that ramp-up time. We modernized a legacy codebase with over 2,000 files, legacy C code, zero documentation, and SOAP APIs. Structured AI with agent skills delivered what manual approaches and naive AI could not.

What happens when AI makes a mistake in a health insurance system?

The same thing that happens when a human makes a mistake: the review and testing process catches it. AI-generated code still goes through automated review, human review, and automated testing. The difference is that automated testing catches mistakes across hundreds of scenarios, not just the ones a human thought to test. Net result: fewer bugs reach production with AI than without it, because the safety net is wider.

Can small health insurance engineering teams benefit from AI across the SDLC?

Small teams often benefit the most. If you have 15 engineers managing a platform that needs 40, AI closes the capacity gap without hiring. Automated code review means your 2 senior engineers are not spending half their week reviewing pull requests. Automated testing means you get enterprise-grade coverage without a dedicated QA team. Agent skills mean new hires contribute in days instead of months. The smaller the team relative to the complexity, the bigger the AI impact.

How do you measure whether AI across the SDLC is working?

Track the five DORA performance metrics: deployment frequency, lead time for changes, change fail rate, recovery time, and rework rate. Measure them before AI adoption and track them monthly. If deployment frequency is increasing, lead times are shrinking, and change fail rate is dropping, your approach is working. If any of those metrics are flat, look at the phase of the SDLC where the bottleneck sits and focus AI adoption there.
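
As a sketch, two of these metrics can be computed from a simple deploy log. The record shape is an assumption; in practice this data comes from your CI/CD system:

```python
# Sketch of computing two DORA metrics from a deploy log.
# The record shape is an illustrative assumption.
from datetime import date

deploys = [
    {"day": date(2025, 3, 3), "failed": False},
    {"day": date(2025, 3, 4), "failed": True},
    {"day": date(2025, 3, 5), "failed": False},
    {"day": date(2025, 3, 6), "failed": False},
]

def deployment_frequency(deploys: list[dict], days_in_period: int) -> float:
    """Deploys per day over the period."""
    return len(deploys) / days_in_period

def change_fail_rate(deploys: list[dict]) -> float:
    """Fraction of deploys that failed in production."""
    return sum(d["failed"] for d in deploys) / len(deploys)
```

Computing these monthly from the same log makes "is AI adoption working?" an empirical question instead of a feeling.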
