DEV Community: Todd Bernson

MLOps for Voice Cloning: CI/CD and Model Management in an AWS Environment

Todd Bernson — Mon, 23 Jun 2025 13:41:27 +0000

By Todd Bernson, CTO of BSC Analytics and USMC Veteran

You can train the world's best voice cloning model in your basement, but unless you can deploy it consistently, monitor it intelligently, and update it without burning down prod... it's just a science project.

Welcome to the world of MLOps — where machine learning meets actual engineering discipline. This article covers how to apply DevOps best practices to a voice cloning platform running on AWS, with a focus on CI/CD, model versioning, monitoring, and rollback strategies.

Spoiler alert: it's not just about the model. It’s about the platform.

What Makes Voice Cloning MLOps-Heavy?

Voice generation pipelines include:

Text preprocessing
Model inference (Tortoise-TTS, Coqui, etc.)
Audio output formatting
Storage and retrieval layers

Each part needs:

Version control
Deployment repeatability
Monitoring
Rollback capability

And unlike classic apps, changes in the model or weights can introduce regressions that are invisible until someone hears a result that sounds like a broken robot.

CI/CD: More Than Just App Code

Our CI/CD pipeline handles:

Infrastructure (Terraform)
Application code (API logic, orchestration)
ML model versions
Container builds (EKS)
Monitoring rules and alerts

Tools We Use:

GitHub Actions for workflow automation
Terraform for infrastructure versioning
Docker for building and tagging model containers
ECR for storing voice inference images
S3 for storing model weights and artifacts (if using SageMaker)

Model Versioning: Know What You Deployed

We treat models like code:

Each model version gets a unique SHA tag
We store them in S3 and reference via input config
Every deployment logs which model version was used
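
A minimal sketch of the "unique SHA tag" idea, assuming the tag is simply the SHA-256 of the weights file. The S3 key layout, function names, and log fields are illustrative assumptions, not the actual pipeline:

```python
import hashlib
import json
from pathlib import Path


def model_sha(weights_path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a weights file, read in chunks."""
    digest = hashlib.sha256()
    with weights_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_key(model_name: str, sha: str) -> str:
    """Build a content-addressed S3 key (hypothetical layout)."""
    return f"models/{model_name}/{sha[:12]}/weights.bin"


def deployment_record(env: str, model_name: str, sha: str) -> str:
    """Serialize which model version a deployment used, ready for logging."""
    return json.dumps({"env": env, "model": model_name, "model_sha": sha}, sort_keys=True)
```

Because the key is derived from the content, two deployments that log the same `model_sha` are guaranteed to have loaded identical weights.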

Canary Deployments for ML Models

Never deploy a new model version blind.

We use:

Blue/Green EKS service updates for inference
Traffic-shifting via API Gateway stage variables
Automated test cases that check:
- Latency
- Audio length
- Audio fidelity
- Output duration vs expected

If the model goes rogue, we roll back — fast.
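
Here is one way the automated canary checks could be expressed, as a sketch. The thresholds, field names, and the 5% failure budget are assumptions for illustration. Audio fidelity is left out because it needs a model-specific scoring method:

```python
from dataclasses import dataclass


@dataclass
class CanaryResult:
    latency_s: float            # wall-clock time for the request
    audio_duration_s: float     # length of the generated clip
    expected_duration_s: float  # rough estimate derived from the script


def canary_failures(result: CanaryResult,
                    max_latency_s: float = 5.0,
                    duration_tolerance: float = 0.25) -> list[str]:
    """Return the names of failed checks; an empty list means the request passed."""
    failures = []
    if result.latency_s > max_latency_s:
        failures.append("latency")
    if result.audio_duration_s <= 0:
        failures.append("empty_audio")
    else:
        drift = abs(result.audio_duration_s - result.expected_duration_s)
        if drift > duration_tolerance * result.expected_duration_s:
            failures.append("duration")
    return failures


def should_roll_back(results: list[CanaryResult], max_failure_rate: float = 0.05) -> bool:
    """Roll back when the share of failing canary requests exceeds the budget."""
    if not results:
        return True  # no evidence that the new version works
    failed = sum(1 for r in results if canary_failures(r))
    return failed / len(results) > max_failure_rate
```

A deploy job could call `should_roll_back` after replaying a fixed set of test scripts against the new version and shift traffic back if it returns `True`.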

Build & Deploy Flow

Here’s a typical flow:

Dev pushes code or model update
GitHub Actions triggers:
- Linting / unit tests
- Docker build
- Terraform plan and apply
- Canary deployment to EKS
Health checks run

Bonus: logs and metrics for the deployment go into CloudWatch and get visualized.

Monitoring the Right Things

It's not enough to know the model responded. You need to know:

Did the audio sound right?
How long did it take to generate?
Was it the right version of the model?
Did we return any unexpected silence or clipping?

Metrics Tracked:

Inference duration
Audio file size / length consistency
API latency (P95 and P99)
Success/failure ratio
Model version used per request
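
The "unexpected silence or clipping" question can be approximated with simple signal checks. This sketch assumes the audio has already been decoded to floats in the range -1.0 to 1.0; the thresholds are placeholder values to tune against real output:

```python
def clipping_ratio(samples: list[float], clip_level: float = 0.999) -> float:
    """Fraction of samples at or beyond full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return clipped / len(samples)


def longest_silence_s(samples: list[float], sample_rate: int,
                      silence_level: float = 0.01) -> float:
    """Length in seconds of the longest run of near-zero samples."""
    longest = current = 0
    for s in samples:
        if abs(s) < silence_level:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest / sample_rate


def audio_flags(samples: list[float], sample_rate: int,
                max_silence_s: float = 1.5, max_clip_ratio: float = 0.001) -> list[str]:
    """Return warning flags that can be emitted as metrics alongside the request."""
    flags = []
    if longest_silence_s(samples, sample_rate) > max_silence_s:
        flags.append("unexpected_silence")
    if clipping_ratio(samples) > max_clip_ratio:
        flags.append("clipping")
    return flags
```

These checks will not tell you whether the voice sounds right, but they catch the cheap failures before a human has to listen.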

Managing Drift Between Environments

You know what’s fun? Discovering that your staging environment works, but production silently fails because it’s running a different Docker image, model version, or config.

So we:

Use Terraform for parity across dev/stage/prod
Automatically tag all deployments with env, model, and version

No surprises. No snowflakes. No "it works on dev" excuses.

Secure Secrets for ML Inference

Yes, your model container still needs secrets.

Secrets Manager for API keys / DB creds
Injected at runtime via EKS CSI driver

Best practice: have these rotated automatically, audited via CloudTrail, and encrypted end-to-end.

Final Thoughts

MLOps is where voice cloning becomes enterprise-ready.

Done right, it lets you:

Version and test your models like code
Deploy updates without outages
Catch regression before customers do
Build trust with engineering, compliance, and finance

And the best part? You can build this on AWS with the services you already use — EKS, Lambda, S3, CloudWatch, Terraform, GitHub Actions.

If you're building anything with voice, ML, and scale — and you're not treating it like a product — you're already behind.

Scaling an AI Voice Platform: Lessons in Performance and Cost Optimization on AWS

Todd Bernson — Wed, 18 Jun 2025 17:15:41 +0000

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who Tunes Inference and Deadlifts

Building an AI-powered voice cloning platform is fun. Watching it get crushed under load because you didn’t scale it properly? Not so much.

In this post, we’re talking about real-world lessons from scaling a voice cloning solution that generates and serves thousands of audio messages — personalized, on-demand, and secured in AWS. Not in theory. In production. With logs to prove it.

TL;DR

You’ll learn:

When to use EKS vs. SageMaker for inference
How to batch workloads and queue intelligently
Cost control levers that keep your CFO from panicking
Why CloudWatch is your best friend and worst critic

The Problem

Generating voice responses isn’t like querying a database. Every request involves:

Model inference (heavy compute)
Audio storage (and sometimes conversion)
Input validation
Possibly authentication

Multiply that by tens of thousands of requests per day, and things start to sweat.

So how do you scale?

Step 1: Know Your Workload Types

Not all voice generation is equal.

Lightweight:

Short responses (“Your appointment is confirmed.”)
Real-time generation (user is waiting)
Low concurrency

Use: AWS Lambda

Heavyweight:

Longform responses
Background jobs (e.g., batch generation of 5,000 voicemails)
High concurrency

Use: EKS (spot for batch, on-demand for latency-sensitive)

GPU-Intensive:

Complex voices, multi-speaker, multi-language synthesis
Real-time delivery with near-zero latency
High fidelity outputs

Use: SageMaker endpoints (with multi-model containers if needed)

Step 2: Queue Everything

Even the fastest systems benefit from decoupling.

API Gateway enqueues to SQS → EKS workers consume from SQS
Use Step Functions for batch orchestration
Prioritize workloads (e.g., VIP client messages jump the queue)

This buys you buffer time, allows retry logic, and improves overall system health.
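
A sketch of the "VIP messages jump the queue" idea, assuming two separate SQS queues checked in priority order. The `sqs` argument is expected to behave like `boto3.client("sqs")`; the queue URLs and the handler are hypothetical:

```python
from typing import Callable, Optional


def poll_once(sqs, queue_urls: list[str], handler: Callable[[str], None]) -> Optional[str]:
    """Process at most one message, checking queues in priority order.

    Returns the URL of the queue that supplied the message,
    or None if every queue was empty.
    """
    for url in queue_urls:  # earlier URLs have higher priority
        resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1, WaitTimeSeconds=0)
        for msg in resp.get("Messages", []):
            handler(msg["Body"])
            # Delete only after the handler succeeds. If it raises, the message
            # becomes visible again after the visibility timeout and is retried.
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])
            return url
    return None
```

A worker pod would call `poll_once(sqs, [vip_queue_url, standard_queue_url], generate_audio)` in a loop and sleep briefly when it returns `None`. Short polling is used here so the VIP queue is rechecked before every standard message.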

Step 3: Watch the Watchers (aka CloudWatch)

What to monitor:

EKS CPU/memory % over time
Lambda duration and cold start counts
API Gateway 5xx and latency percentiles
SQS queue length (spikes = backlog = unhappy customers)

Set alarms. Send alerts. Watch for cost and scale patterns.
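
As an example of "set alarms", this sketch builds the arguments for a CloudWatch alarm on SQS backlog. The alarm name, period, and evaluation window are assumptions; the queue name, threshold, and SNS topic ARN are yours to supply:

```python
def queue_backlog_alarm(queue_name: str, threshold: int, sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on SQS backlog."""
    return {
        "AlarmName": f"{queue_name}-backlog",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # backlog must persist for five datapoints
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 installed, the dictionary can be passed straight through: `boto3.client("cloudwatch").put_metric_alarm(**queue_backlog_alarm("voice-jobs", 500, topic_arn))`, where `"voice-jobs"` and `topic_arn` are placeholders.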

Step 4: Storage Strategy

Don't just dump audio into S3 and forget it. Be strategic.

Use S3 Standard for recently accessed files
Transition to Infrequent Access after 30 days
Lifecycle delete after 90–180 days unless marked otherwise

Bonus: tag files by use case (e.g., welcome-message, alert, promo) and optimize access patterns.
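
The tiering rules above map onto a single S3 lifecycle rule. This is a sketch, assuming all audio lives under one prefix and that 180 days is the chosen upper bound; files "marked otherwise" would need a separate prefix or a tag filter, which is not shown:

```python
def audio_lifecycle(prefix: str = "audio/", ia_after_days: int = 30,
                    expire_after_days: int = 180) -> dict:
    """Build a LifecycleConfiguration dict for put_bucket_lifecycle_configuration."""
    if expire_after_days <= ia_after_days:
        raise ValueError("expiration must come after the transition")
    return {
        "Rules": [{
            "ID": "audio-tiering-and-expiry",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            # Move to Infrequent Access once the file has cooled off.
            "Transitions": [{"Days": ia_after_days, "StorageClass": "STANDARD_IA"}],
            # Delete the object entirely at the end of the retention window.
            "Expiration": {"Days": expire_after_days},
        }]
    }
```

With boto3, this would be applied via `s3.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=audio_lifecycle())`, where `bucket_name` is your bucket.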

Step 5: Cost Optimization Tactics

EKS

Spot instances for batch jobs (up to 90% cheaper)
Tune pod CPU/memory requests to match actual model requirements
Use CloudWatch metrics to scale pods up and down

API Gateway

If you exceed 10M calls/month, consider ALB + Lambda or Lambda Function URLs

CloudFront

Cache voice files when possible
Use signed URLs for access control (not public-read S3)
What I did instead of ☝️ was mount S3 directly to the pod in EKS to simplify permissions.

Architecture Snapshot

[Frontend] → [API Gateway]
     ↓             ↓
 [Auth Layer] → [SQS]
                     ↓
                [EKS]
               ↓         ↓
          [S3 Audio]   [CloudWatch Logs]

Success Metrics That Matter

✅ Avg response time
✅ Batch jobs processed within SLA window
✅ Cost per voice file
✅ API success rate

If you’re not measuring these, you’re flying blind.

Final Thoughts

Scaling a voice AI platform isn’t about tossing more compute at the problem. It’s about:

Understanding what type of workload you’re running
Decoupling smartly
Tuning services like an engine, not a hammer
Building enough observability to know when things go sideways

The best part? With AWS, you can build something that scales to millions — and still fits in a startup budget. If you design it right.

Security in Voice AI: Safeguarding Cloned Voice Data and APIs with AWS Best Practices

Todd Bernson — Tue, 17 Jun 2025 21:53:01 +0000

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who Treats IAM Policies Like They're Handling Live Ammo

Voice AI is cool — until it leaks a customer’s audio file to the internet, ends up on a subreddit, and your CISO faints into a pile of SOC 2 binders. If you’re going to work with AI-generated voices, especially self-hosted ones, you better know how to lock it down.

This article breaks down how to secure your voice cloning infrastructure on AWS the way a Marine would: with discipline, precision, and zero tolerance for sloppy access control.

Whether you're in finance, healthcare, insurance, or just paranoid (which in cloud security is a virtue), here’s how to bulletproof your deployment.

1. IAM: Zero Trust or Bust

First rule: no service should have more access than it needs. IAM is your gatekeeper.

Least Privilege

Every Lambda, EKS deployment, and API Gateway integration uses its own IAM role.
S3 permissions are scoped to specific buckets and prefixes.
No wildcard "Action": "*" or "Resource": "*" nonsense.

Inline vs Managed Policies

Use custom inline policies to restrict actions tightly.
Avoid attaching AWS-managed policies directly unless scoped by a boundary.

Example policy snippet:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::voice-clone-prod/audio/*"
    }
  ]
}
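
The "no wildcard" rule is easy to enforce in CI. The following is a rough sketch of such a check, not an AWS tool; it accepts either a full policy document or a single statement and only flags a bare `"*"`:

```python
import json


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy_json: str) -> list[str]:
    """Report Allow statements whose Action or Resource is exactly '*'."""
    policy = json.loads(policy_json)
    statements = _as_list(policy["Statement"]) if "Statement" in policy else [policy]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action", [])):
            findings.append(f"statement {i}: Action is '*'")
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append(f"statement {i}: Resource is '*'")
    return findings
```

A scoped resource such as `arn:aws:s3:::voice-clone-prod/audio/*` passes, because the check looks for a bare wildcard rather than any asterisk.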

2. Network Security: Stay in the VPC

Your inference engine (like Tortoise-TTS on EKS) does not need a public IP.

Best practices:

EKS nodes live in private subnets.
NAT Gateway used only when outbound is required.
No internet-facing access unless explicitly required (e.g., CloudFront).

If you’re feeling extra paranoid, attach a WAF to your CloudFront and enable throttling + IP filtering. Because someday someone will test your endpoint with curl.

3. Data Protection: Encrypt Everything

At Rest:

S3 buckets with default encryption using a customer managed KMS key (CMK).
Sensitive metadata (user ID, timestamps, script text) also encrypted at the application level if needed.

In Transit:

HTTPS only. TLS 1.2+. No exceptions.
Custom domain for APIs using CloudFront + ACM-managed certs.

Secrets:

Use AWS Secrets Manager for storing:
- API keys
- Database creds
- Model-specific config

Accessed at runtime only via scoped roles. Rotated. Audited.

4. Logging & Monitoring: If You Can’t See It, You Can’t Secure It

CloudWatch Logs:

Capture API requests (via API Gateway logging).
Log custom metrics: request duration, model inference times, failures.

CloudTrail:

Enabled globally.
Monitors:
- IAM role usage
- S3 access
- Secrets Manager requests

Export logs to S3 and send alarms via SNS if weird things happen — like someone trying to access from us-east-5...
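
One way to catch the "wrong region" case is to filter exported CloudTrail records against an allow-list. This is a sketch: the allow-list is hypothetical, and the records are assumed to be already parsed into dictionaries with the standard CloudTrail fields:

```python
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}  # hypothetical allow-list


def out_of_region_events(records: list[dict], allowed=ALLOWED_REGIONS) -> list[dict]:
    """Return CloudTrail records whose awsRegion is not on the allow-list."""
    return [r for r in records if r.get("awsRegion") not in allowed]


def alert_message(record: dict) -> str:
    """Format a one-line summary suitable for an SNS notification."""
    who = record.get("userIdentity", {}).get("arn", "unknown principal")
    return (f"{record.get('eventSource')}:{record.get('eventName')} "
            f"in {record.get('awsRegion')} by {who}")
```

Each flagged record could then be published with `sns.publish(TopicArn=topic_arn, Message=alert_message(record))`, assuming a boto3 SNS client and a topic of your own.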

GuardDuty + Security Hub:

Detects anomalies: port scanning, unexpected API usage, etc.
Integrate with your SIEM or just let it yell at your DevSecOps channel in Slack.

5. API Security: No One Hits My Endpoint Without ID

Your API Gateway isn’t public candy.

Options:

IAM auth for internal services.
Google Auth for user-level access.
API keys + usage plans for partner integrations.
WAF rules to rate-limit, IP block, and reject known bad patterns.

You can even use Lambda authorizers if you want to get creative with token validation (which is what I did).
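
For reference, a token-based Lambda authorizer for an API Gateway REST API can be quite small. This is a simplified sketch, not the authorizer used on this platform: the `EXPECTED_TOKEN` environment variable and the `voice-client` principal are placeholders, and a real deployment would load the token from Secrets Manager or validate a signed token instead:

```python
import hmac
import os


def _policy(principal_id: str, effect: str, method_arn: str) -> dict:
    """Build the response shape API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }


def handler(event: dict, context=None) -> dict:
    """TOKEN-type authorizer: allow only when the supplied token matches."""
    expected = os.environ.get("EXPECTED_TOKEN", "")  # placeholder secret source
    supplied = event.get("authorizationToken", "")
    # compare_digest avoids leaking match position through timing.
    ok = bool(expected) and hmac.compare_digest(supplied.encode(), expected.encode())
    return _policy("voice-client", "Allow" if ok else "Deny", event["methodArn"])
```

An unset or empty expected token always results in `Deny`, so a misconfigured function fails closed.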

6. Isolation By Design

If you’re multi-tenant (e.g., supporting multiple departments or clients):

Isolate environments by account (best) or VPC/namespace (acceptable).
Separate S3 prefixes per tenant with enforced IAM policies.
Don’t ever cross audio files or inference containers across customers unless it’s anonymized and approved.

Bonus: tag everything (Environment, Owner, DataSensitivity) to support automated compliance checks.

7. Compliance: Make Auditors Say “Wow”

HIPAA? SOC 2? GDPR? CCPA? No problem.

What They’ll Want:

Encryption policies (check)
Logging and access monitoring (check)
User access controls (check)
Data retention and deletion capabilities (also check)

Set up:

S3 lifecycle policies (auto-delete after 90 days)
Explicit “DeleteObject” API access in IAM
Audit report generation from CloudTrail + Athena queries

They won’t just nod — they’ll invite you to present at their next audit prep session.

Final Security Checklist

Area	Secured With
IAM Roles	Scoped to service/resource level
S3 Buckets	KMS encryption + bucket policies
API Gateway	Auth, WAF, throttling, logging
Compute	Private subnets, no public IPs
Secrets	Secrets Manager + least-privilege access
Monitoring	CloudWatch, CloudTrail, GuardDuty
Compliance	Automated logs + data lifecycle enforcement

Final Thoughts

Security in voice AI isn’t optional — especially when you’re generating content that sounds like your employees, agents, or doctors.

Done right, a voice cloning platform on AWS:

Keeps customer data locked down
Delivers zero-trust compliance
Maintains auditability for even the most intense regulatory environments

And best of all? It still scales, still performs, and still costs less than most per-character voice APIs.

The ROI of Voice Automation: Cost Savings and Efficiency Gains from Self-Hosted Voice Clones on AWS

Todd Bernson — Mon, 16 Jun 2025 15:29:13 +0000

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who’d Rather Pay for Compute Than Per-Character TTS Pricing

Let’s skip the buzzwords and get straight to what your CFO actually cares about: does this AI voice thing save money?

The answer is yes — if you do it right. That means not paying extra per character to a SaaS platform that charges more to say “please hold” than a human would to just answer the call.

This article lays out the real-world return on investment (ROI) of deploying a self-hosted voice cloning platform on AWS, based on what I’ve built — and what you can too.

The Problem With Pay-Per-Sentence

Managed voice APIs (Polly, ElevenLabs, you name it) are fantastic for prototypes. But scale them up and they’ll chew through your budget faster than a sales team with an open bar.

Let’s say:

You send 100,000 personalized voice messages per month.
Each message averages 800 characters.
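That’s 80,000,000 characters. At Polly’s standard-voice list price of about $4 per million characters, that’s roughly $320/month at minimum, and neural voices cost more.
Over 12 months, that’s about $3,840/year, just to say the same things over and over again.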
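
The arithmetic above is easy to rerun for your own volumes. In this sketch the per-million-character price is deliberately left as an input, since list prices differ by provider and voice type and change over time; check current pricing before relying on any figure:

```python
def monthly_tts_cost(messages: int, avg_chars: int, price_per_million_chars: float) -> float:
    """Per-character TTS cost in dollars for one month."""
    total_chars = messages * avg_chars
    return total_chars / 1_000_000 * price_per_million_chars


def annual_tts_cost(messages: int, avg_chars: int, price_per_million_chars: float) -> float:
    """Twelve months at the same volume and price."""
    return 12 * monthly_tts_cost(messages, avg_chars, price_per_million_chars)
```

For the workload described here, the call would be `monthly_tts_cost(100_000, 800, price)`, with `price` set to your provider's current rate.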

Now imagine that same workload running inside your AWS account, on your infrastructure, with no recurring per-character licensing.

Where the Savings Come From

Let’s break it down.

Model Hosting

Use open-source models like Tortoise-TTS or Coqui:

No licensing fees.
Full control over inference.
Deploy via EKS, Lambda, or SageMaker depending on workload.

Compute Strategy

You’re not running this thing 24/7 — you’re processing jobs in bursts. That’s what AWS does best.

Options:

Lambda for short jobs (<15s).
EKS spot for longer, cost-effective bursts.
SageMaker endpoints for real-time inference with GPU when needed.

Storage

Audio and logs live in Amazon S3:

Standard + Infrequent Access tiers.
Lifecycle policies auto-archive old content.
Total cost for 100,000 audio files (10 sec each): ~$2/month.

Reuse and Replay

One of the biggest wins of self-hosted: cache and reuse output.

Did Jane Smith’s insurance reminder change? No? Reuse last month’s voice file.
Store hashed scripts → check before reprocessing.
Huge savings. Huge.
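
The "store hashed scripts, check before reprocessing" step could look like this. It is a sketch under assumptions: the cache is a plain dict standing in for S3 or a database, and `synthesize` is a hypothetical callable that wraps your inference endpoint:

```python
import hashlib
from typing import Callable


def script_key(voice_id: str, script: str) -> str:
    """Hash the voice and whitespace-normalized script into a cache key."""
    normalized = " ".join(script.split())
    return hashlib.sha256(f"{voice_id}\n{normalized}".encode("utf-8")).hexdigest()


def get_or_generate(cache: dict, voice_id: str, script: str,
                    synthesize: Callable[[str, str], bytes]) -> bytes:
    """Return cached audio when the same voice has already read this script."""
    key = script_key(voice_id, script)
    if key not in cache:
        cache[key] = synthesize(voice_id, script)  # only pay for inference on a miss
    return cache[key]
```

Including the voice ID in the key matters: the same script read by a different voice is a different file.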

Automation and CI/CD

Terraform + GitHub Actions = no manual deployment overhead.

Cost to manage: low.
Time to deploy new voices or updates: minutes.
Maintenance: minimal (patch EKS images monthly or use managed runtime updates).

But Wait, There’s More (Than Cost)

It’s not just about saving money. It’s about what you unlock when you stop renting voices and start owning your own pipeline.

Speed

New voices in minutes, not 2 weeks waiting on a vendor’s custom voice program.
Edits and updates in minutes — push a commit, redeploy.

Privacy

No PII leaves your AWS environment.
No “for quality and training purposes” clause buried in a vendor contract.
You control retention, logging, and compliance.

Scalability

You’re in control:

Scale EKS pods based on SQS queue depth.
Optionally use Step Functions for batch workflows.
Go global with CloudFront + S3 for voice file distribution.
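
Queue-based scaling boils down to one calculation. A sketch, assuming each pod can work through a known number of messages per scaling interval; the limits are placeholders, and in a real cluster an autoscaler such as KEDA would typically do this for you:

```python
import math


def desired_replicas(queue_depth: int, msgs_per_pod: int,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Pods needed to drain the backlog, clamped to the allowed range."""
    if msgs_per_pod <= 0:
        raise ValueError("msgs_per_pod must be positive")
    wanted = math.ceil(queue_depth / msgs_per_pod)
    return max(min_replicas, min(max_replicas, wanted))
```

The clamp is the cost control: a runaway queue cannot scale the cluster past `max_replicas`.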

Real-World Example: Insurance Use Case

Scenario: An insurance company sends:

50,000 monthly reminders.
25,000 claims updates.
10,000 wellness check-in messages.

Managed TTS Cost: ~$2,280/month

Self-Hosted AWS Cost: ~$150/month (including compute, storage, monitoring)

Annual Savings: About $25,560

Now toss in brand voice control, security, reusability, and better CX — and you’ve got an ROI case that even the most skeptical exec will nod at between Slack messages.

Total Cost Breakdown

Component	Monthly Estimate (Self-Hosted)
EKS Compute (Spot)	$100
S3 Storage	$10
CloudWatch Logs	$15
Secrets Manager	$5
CI/CD (GitHub)	Free (or already included)
Total	~$130-$150/month

Compared to managed APIs at 10x that cost, with less flexibility.

ROI Bonus Points

Reuse recordings? ✅
Clone internal voices? ✅
Multilingual support? ✅
Sync to CRM or EMR systems? ✅
Monetize the platform as a service offering? Don’t tempt me.

Final Thoughts

If you’re still paying per character for voice automation, it’s time to ask why.

AWS gives you:

Control
Cost savings
Flexibility
Compliance

You just need the courage (and maybe some Terraform modules) to build it.

And once you do? You own the pipeline, the experience, and the margins. That’s not just ROI — that’s a competitive advantage.

AI Voices in Healthcare: Ensuring Privacy and Compliance with AWS-Powered Voice Cloning

Todd Bernson — Fri, 13 Jun 2025 14:46:07 +0000

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Voice Cloning Nerd with a Respect for HIPAA and Heavy Deadlifts

Healthcare doesn’t mess around when it comes to privacy. Between HIPAA, HITRUST, and the unofficial but very real “don’t you dare leak my test results” rule, any AI solution operating in this space better know how to behave.

So when I decided to bring voice cloning — yes, real-time AI-generated voices — into healthcare workflows, I knew two things:

It had to feel human.
It had to act like a Raider-trained compliance officer.

Let’s talk about how we built a fully self-hosted, AWS-powered voice cloning platform designed for healthcare environments — balancing personalization with the paranoia (justified!) that comes with handling PHI.

Why Voice Cloning in Healthcare?

Simple: people trust people, not robots.

Voice matters when:

A nurse gives post-op instructions.
A doctor shares lab results.
A health coach follows up on a treatment plan.
A reminder tells someone to refill their prescription.

Now imagine all that happening automatically, 24/7, in the patient’s language and tone preference — without overloading human staff.

That’s where AI voice cloning comes in. But only if it’s private, secure, and compliant.

Step One: Host It Yourself (on AWS)

Unlike third-party voice APIs that send data off into the magical ether (along with your compliance budget), our platform runs 100% inside your AWS account.

Key Stack:

Amazon EKS for compute
Amazon S3 for audio storage
API Gateway to receive input and trigger inference
IAM roles scoped to specific services (no wide-open buckets)
CloudTrail and CloudWatch for audit and observability
Terraform for everything (because of course)

All audio data — both input and output — remains fully encrypted, access-controlled, and traceable.

HIPAA Compliance: More Than Just a Checkbox

Want to make an auditor smile? Do this:

Encryption

At rest: S3 + AWS KMS-managed keys.
In transit: TLS 1.2+ enforced everywhere.

Access Control

IAM roles scoped per service.
No user access to buckets.
API Gateway protected with a custom token-based Lambda authorizer.

Auditing

CloudTrail logs every API call.
CloudWatch logs all inference requests, failures, and usage patterns.
Optional integration with Security Hub and GuardDuty for threat detection.

Data Residency

Deploy to specific AWS regions.
Restrict S3 bucket replication or data movement across borders.

Retention Policies

Lifecycle rules on S3 buckets for data expiration.
Optional patient-specific TTL enforcement via tagging.

Real-World Healthcare Use Cases

Let’s get specific. Here’s what this platform can do today in healthcare:

Post-Op Follow-ups

Patients receive a voice message that sounds like their nurse, detailing what to watch for, when to call, and how to care for themselves. Delivered at scale. Personalized. Consistent.

Prescription Reminders

A voice reminder that says, “Hi James, it’s time to refill your Metformin.” Not a generic robovoice — their actual provider’s voice. Higher adherence. Lower readmission.

Mental Health Coaching

Cloned voices with tone-aware delivery can help deliver supportive messages in a non-threatening, empathetic way — even in different languages.

Pediatric Care Instructions

Parents hear instructions from the doctor their child saw — not a stranger. Less confusion, more trust, and fewer frantic follow-up calls.

Architecture Snapshot

[Patient Input] → [API Gateway] → [EKS]
       ↓                             ↓
    [Auth]                    [Voice Cloning Container]
       ↓                             ↓
 [Audit Logs] ← CloudWatch ← S3 Storage → [Frontend or IVR System]

Everything is logged. Nothing leaks. And your IT security team gets dashboards they can show off at compliance reviews.

Security-First Development Practices

We didn’t stop at infra:

All containers are scanned via Amazon ECR vulnerability scanning.
Enforced static code checks and Terraform validations.
No hardcoded secrets — everything’s injected at runtime via Secrets Manager (really easy with boto3).
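
A minimal sketch of the runtime lookup, assuming the secret is stored as a JSON string. The client is passed in so the function can be tested without AWS; the secret name in the usage note is hypothetical:

```python
import json


def load_secret(client, secret_id: str) -> dict:
    """Fetch a secret and parse it as JSON.

    `client` is expected to behave like boto3.client("secretsmanager").
    """
    resp = client.get_secret_value(SecretId=secret_id)
    return json.loads(resp["SecretString"])
```

In the container this would be called once at startup, for example `load_secret(boto3.client("secretsmanager"), "voice-clone/prod/api")`, with the pod's IAM role limited to that one secret.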

Cost? Reasonable. Sanity? Preserved.

With EKS + spot pricing, inference costs can be as low as fractions of a cent per request. Compare that to vendor APIs charging you per character and throwing your data in a training set you never approved.

Also: owning your platform means you set the rules — not some ML black box team you’ve never met.

Why Use Custom Solutions?

Polly is great for standard TTS tasks, but it won’t let you natively train your own voice models. That’s a dealbreaker.

With our custom approach:

You control the model.
You define what’s stored and what’s deleted.
You can version models per patient, provider, or condition.

Final Thoughts

Healthcare deserves better than phone trees and tinny robovoices. It deserves personalization and privacy. That’s not a contradiction — that’s architecture.

This voice cloning platform gives you:

Full HIPAA-compliant deployment in AWS
Secure, scalable model inference
Meaningful, personalized communication at scale
Peace of mind for patients and compliance teams

Voice Cloning for Financial Services: Revolutionizing Customer Engagement in a Secure AWS Environment

Todd Bernson — Thu, 12 Jun 2025 17:09:22 +0000

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Slightly Over-Caffeinated Cloud Nerd

If there’s one thing financial institutions love more than acronyms, it’s trust. And if there’s one thing their customers can’t stand, it’s robotic voice systems that sound like they were pulled from a 1995 infomercial. Welcome to the intersection of personalization, security, and scale — where voice cloning and AWS meet to deliver something banks didn’t know they needed but now absolutely do.

This article dives deep into how a self-hosted, AWS-powered voice cloning platform (built by yours truly) can transform customer engagement in finance — all while checking the boxes on security, compliance, and cost efficiency.

See how I cloned my own voice on EKS.

Why Voice Cloning in Finance?

Customer experience in financial services is, well... lagging. Long hold times, disconnected call scripts, and the “please enter your account number followed by pound” robot voice aren’t helping your NPS.

Enter voice cloning — not the gimmicky, deepfake-adjacent nonsense, but a real, controlled, secure AI system that speaks like your people. Imagine:

Loan officers sending personalized voice messages to clients.
Fraud alerts spoken in a trusted representative’s voice.
Wealth management updates delivered as though your advisor recorded them at 5AM just for you (which, let’s be honest, they didn’t).

But Is It Secure?

Glad you asked, compliance team.

This solution runs entirely within your AWS account, deployed with Terraform, and locked down tighter than a vault in Zurich.

IAM and Zero Trust

Fine-grained IAM roles mean no unnecessary access. Your API Gateway only talks to your EKS/Lambda backend. CloudWatch is there to rat out any shady behavior. There are no wildcard permissions, no “trust me, bro” roles. This is zero-trust, Marine-style.

Private Networking

The inference engine? Lives in a private subnet, behind a NAT gateway, with zero public internet exposure. Only API Gateway (optionally fronted by WAF and Cognito for auth) gets a whiff of the outside world.

Data Sovereignty

All voice data — input, output, and model artifacts — stay in your encrypted S3 buckets. Managed with KMS, audit-logged with CloudTrail, and optionally replicated across regions for DR. You want to keep it in-country? Easy. You want retention policies? Done.

Cost Considerations: Polly vs. Clone

Let’s not kid ourselves — Polly’s cheap. Until it isn’t.

If you’re doing high-volume interactions, especially personalized ones, Polly’s per-character pricing quickly adds up. And don’t forget, Polly’s voices aren’t yours. You’re just renting them, like a tux that fits weird in the shoulders.

With a self-hosted solution:

Run inference on spot EKS nodes for efficiency.
Use batching strategies for outbound messages.
Control your hardware (yes, even GPUs if you want to be extra fancy with SageMaker).

End result? Lower cost at scale, and a voice pipeline you own.

Real-World Use Cases

Let’s talk use cases that actually matter to finance.

1. Loan Decisions That Don’t Sound Robotic

Your platform can generate approval or denial messages in the same voice that onboarded the customer. Humanizing the experience reduces complaints and increases clarity — especially when tone and inflection match the gravity of the message.

2. High-Touch Wealth Management

Top-tier clients expect personalization. Sending periodic market updates or insights in a familiar voice — even when pre-recorded — maintains engagement without chewing up your advisor’s calendar.

3. Fraud Alerts with Trust

Fraud is sensitive. Customers ignore robocalls, but if it sounds like the rep they spoke to last week? Now you’ve got their attention.

4. Interactive Voice Portals

Imagine an IVR that doesn’t sound like every other bank. One that adapts tone to customer segment, preferred language, or even regional accent. All while running on infrastructure you control.

Compliance: Because Auditors Are People Too

Here’s what regulators care about, and how this solution handles it:

Concern	How It’s Handled
Data encryption	All data encrypted at rest (S3/KMS) and in transit (HTTPS/TLS 1.2+)
Auditability	CloudTrail + CloudWatch logs on every transaction
Access controls	IAM policies restrict roles to least privilege
Geolocation controls	Bucket policies, VPC restrictions, and region pinning
Data retention	Automated TTL and lifecycle policies in S3
PII isolation	Separate storage, tagging, and policy enforcement

This isn’t just compliant. It’s auditor catnip.

Architecture Snapshot

Here’s a high-level view of what powers this thing:

Frontend: Static React app hosted on Amazon S3 + CloudFront.
Backend API: Amazon API Gateway + Amazon EKS.
Model Inference: Open-source TTS model (like Tortoise-TTS) wrapped in Docker.
Storage: Amazon S3 with KMS, versioning, lifecycle rules.
Security: IAM, VPC, CloudTrail, CloudWatch, WAF, Cognito (optional).
Infra Management: Terraform, like every project that respects itself.

And yes, it’s all in code. No click-ops here.

Personalization That Scales

Here’s the real kicker: you don’t have to build one voice. You can build hundreds. For:

Branch-specific greetings
Multilingual support
Client segmentation
Seasonal promos ("Happy Holidays from First Trust!")

And it’s reproducible, auditable, and automated — a CI/CD dream for voice systems.

Final Thoughts

Financial institutions that want to stay relevant in 2025 and beyond need to stop thinking like call centers and start thinking like brand experience engines. Voice is the next frontier — and not the kind that yells at you to reset your PIN.

If you're serious about:

Controlling costs,
Strengthening compliance,
Enhancing trust,
And delivering real personalization...

Then building your own voice platform on AWS isn’t just viable — it’s inevitable.

Written by: Todd Bernson, CTO, Voice Cloning Nerd, USMC Vet, and Probably Lifting Something Heavy Right Now

Beyond Polly: Custom Voice Cloning on AWS vs. Using Native AWS AI Services

Todd Bernson — Wed, 11 Jun 2025 17:17:16 +0000

By Todd Bernson, CTO of BSC Analytics, Voice Architect, and Guy Who Politely Declined Polly’s Help Because He Could Do It Better Himself

Let’s get something straight: Amazon Polly is great — until it isn’t. If you’re building a chatbot, narrating product updates, or making your app sound vaguely robotic (in a “pleasant call center” way), Polly delivers. It’s fast, it’s affordable, and it supports multiple languages with all the predictable cheer of a Disney ride operator.

But what happens when you want your voice app to sound... like you? Or your CEO? Or your 90-year-old grandfather? What if you need complete control over pronunciation, tone, pause patterns, and the ability to train on custom audio that would make Polly blush?

This is where the polite façade of managed services starts to fray, and custom voice cloning takes the stage — enter my self-hosted, AWS-powered, open-source driven voice cloning platform.

Polly: The Managed Marvel

Let’s give credit where it’s due. Polly:

Is easy to use.
Scales automatically.
Requires zero infrastructure.
Has SDKs for everything from Python to C++ to Amazon’s favorite child: JavaScript.

It’s perfect for:

Reading weather forecasts aloud.
Voicing automated reminders.
Anything with a script that doesn’t care if it sounds like everyone else.

But it’s not:

Customizable beyond SSML tags.
Trainable on new voices.
Particularly human in tone or nuance.

For regulated industries like finance and healthcare — where personalization, privacy, and control matter more than a “cheerful male voice number 4” — Polly’s out-of-the-box charm wears thin.

Building a Custom Voice Cloner (Like a Lunatic With Free Time)

So I did what any sensible AI engineer would do: built my own (Gunny Highway voice: "Improvise, adapt, overcome.").

This custom voice cloning app runs entirely in AWS — but not using AWS ML services like Polly or Bedrock. Instead, it’s built around open-source models like Tortoise-TTS, containerized, and deployed on EKS, with full integration across:

Amazon S3 (storage for audio input/output)
EKS (inference jobs)
API Gateway (entry point)
IAM (tight security, no wildcard party hats)
CloudWatch (observability for when someone uploads 17-minute TED Talks for cloning)

It’s a black box that behaves the way I want it to: securely, at scale, with custom voices and zero vendor lock-in.

Why Custom?

Here’s the deal:

1. Voice Uniqueness

Custom voice cloning allows you to train on your own audio samples. Want to sound like Morgan Freeman’s long-lost cousin? No problem (as long as you have the licensing — stay legal, kids).

2. Full Control Over Output

With Polly, you’re stuck adjusting speech patterns via markup. With Tortoise-TTS and similar models, you can control:

Intonation
Breathing pauses
Emotional delivery
Speech rate based on training inputs

This is priceless when crafting a brand experience, or in sensitive use cases like reading lab results to patients or delivering loan decisions with empathy.

3. Data Privacy and Residency

If you're working in finance or healthcare, you already know: data sovereignty is everything. When you run the model inside your own AWS account, using private S3 buckets and hardened VPCs, you're no longer just compliant — you're bulletproof.

No customer voice data ever leaves your control. No vendor logs. No “AI improvement” clause buried in the EULA.

4. Cost at Scale

Managed services shine at low volume. But clone 100,000 personalized voicemails a day and Polly's per-character pricing turns into a CFO’s nightmare.

Running your own inference jobs on EKS with spot instances or even SageMaker (if you're feeling fancy) lets you optimize for:

Cost per inference
Batch processing throughput
GPU/CPU usage tuning

Yes, there’s engineering overhead. But this is AWS. We eat YAML and billing reports for breakfast.
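To make the trade-off concrete, here is a rough break-even sketch. Every rate in it is a hypothetical placeholder, not a quoted AWS price, and the characters-per-voicemail and seconds-per-clip figures are assumptions too. Plug in numbers from your own billing reports; the output only illustrates the arithmetic.

```python
# Rough cost comparison: per-character managed TTS vs. self-hosted inference.
# All rates and sizes below are hypothetical placeholders, not real AWS prices.


def managed_cost(chars_per_day: int, usd_per_million_chars: float) -> float:
    """Daily cost of a service billed per character."""
    return chars_per_day / 1_000_000 * usd_per_million_chars


def self_hosted_cost(clips_per_day: int, seconds_per_clip: float,
                     usd_per_instance_hour: float) -> float:
    """Daily cost of inference on instances billed per busy hour."""
    busy_hours = clips_per_day * seconds_per_clip / 3600
    return busy_hours * usd_per_instance_hour


if __name__ == "__main__":
    clips = 100_000        # voicemails per day (the article's example volume)
    chars = clips * 500    # assumed 500 characters per voicemail
    print(f"managed:     ${managed_cost(chars, 20.0):,.2f}/day")
    print(f"self-hosted: ${self_hosted_cost(clips, 5.0, 1.0):,.2f}/day")
```

The self-hosted figure ignores idle capacity, storage, and engineering time, so treat it as a floor rather than a forecast.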

Hybrid Models: You Can Have Both

Not ready to ditch Polly? You don’t have to.

Use Polly for generic prompts, but call your custom API for:

Customer names
High-sensitivity scripts
Brand voice intros

Mixing and matching is a perfectly viable (and cost-effective) strategy. Your Terraform won’t judge you. Neither will I.
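A minimal sketch of that split, assuming each script segment arrives tagged with a category. The category names and the two backend labels are hypothetical; the real calls to Polly and to the custom API would hang off the returned label.

```python
# Hybrid routing sketch: generic prompts go to Polly, sensitive or branded
# segments go to the custom cloning API. Category names are hypothetical.

CUSTOM_CATEGORIES = {"customer_name", "high_sensitivity", "brand_intro"}


def choose_backend(category: str) -> str:
    """Return which TTS backend should voice a segment of this category."""
    return "custom" if category in CUSTOM_CATEGORIES else "polly"


def plan_script(segments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Map (category, text) segments to (backend, text) pairs, in order."""
    return [(choose_backend(category), text) for category, text in segments]


if __name__ == "__main__":
    script = [
        ("brand_intro", "Thanks for calling."),
        ("generic", "Your appointment is confirmed for"),
        ("customer_name", "Jordan"),
    ]
    for backend, text in plan_script(script):
        print(f"{backend:>6}: {text}")
```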

Industry Use Cases That Demand Customization

Finance:

Personalized fraud alerts from a cloned customer rep
Wealth manager assistant tools using their real voice
Secure client onboarding instructions that sound like the company

Healthcare:

Post-operative instructions read in a familiar nurse’s voice
Mental health guidance delivered in a calm, patient-specific tone
Multilingual support without the stilted tone of over-optimized TTS

Insurance:

Claim updates voiced by agents customers already trust
Emergency preparation alerts personalized by region

In all of these, the value isn’t just the voice. It’s trust, tone, and consistency. Polly can’t always deliver that.

The Reality Check

Running a custom voice clone system means accepting some responsibility:

Model maintenance
Container updates
Security patching
More observability

But in return, you get:

Ownership
Flexibility
Enterprise-grade privacy
The ability to say "yes" to marketing’s weirdest voiceover requests

And hey — if something breaks, at least you’ll understand why it broke. Try getting that from a managed service black box.

Final Verdict: Build When It Matters

There’s a reason AWS gives you building blocks instead of black boxes. It’s because your use case isn’t generic. You need:

Custom voices
Secure environments
Price control at scale
A brand voice you actually own

If that sounds like you, go custom.

If not, Polly’s waiting with open arms and a smiling, pre-trained voice.

Published by: BSC Analytics | Written by Todd Bernson, CTO, Voice Cloning Pioneer, and Proudly Not Polly

Terraforming the Voice: Deploying a Clone Application with Infrastructure as Code on AWS

Todd Bernson — Tue, 10 Jun 2025 15:46:06 +0000

By Todd Bernson, CTO of BSC Analytics, Terraform Whisperer

There’s something beautiful about watching an entire production-grade environment spring to life from a single command — like watching a barbell float off the ground when the form is just right. This article is for those of us who believe that if your infrastructure isn’t defined in code, it’s one rogue click away from disaster.

Welcome to the story of how I built and deployed a self-hosted voice cloning application on AWS using Terraform for full-stack automation. We’re not talking about a toy project or an ML demo in a Jupyter notebook — this is a fully containerized, production-ready, auto-scaling, API-driven platform running in the cloud, doing real work. And it’s all defined, versioned, and repeatable, thanks to Terraform.

The Problem with ClickOps

Before we dive into the nuts and bolts, a quick word about ClickOps: don’t. I’ve seen more environments lost to fat-fingered console misclicks than leg days I've skipped. If your architecture lives in a dashboard, you don’t have architecture — you have a house of cards, built by a caffeinated intern and a bunch of undocumented AWS services.

Enter Terraform: HashiCorp’s solution for engineers who believe in immutability, repeatability, and not doing the same thing twice.

Project Overview: Voice Cloning Platform

We’re deploying a voice cloning system that includes:

A static frontend hosted on Amazon S3 with CloudFront
A backend API layer using API Gateway, Lambda, and/or EKS
ML inference containers running voice models like Tortoise-TTS
Audio files and output stored in S3
Monitoring via CloudWatch
IAM roles for secure, scoped access

All of it defined, provisioned, and version-controlled in Terraform. No clicks required.

Terraform Module Breakdown

The project is broken into modules. Because monolith Terraform files are like mixing all your protein powders in one shaker — technically it works, but you’ll regret it later.

1. `s3-static-site`

This module provisions:

An S3 bucket for static frontend files
CloudFront distribution with proper caching behavior
OAI (Origin Access Identity) to restrict direct S3 access
Route53 records if needed for custom domain

2. `api-layer`

Depending on the job type, this module provisions:

API Gateway (REST or HTTP)
Lambda functions (for authorization)

All versions are tracked. All permissions scoped. All endpoints logged.

3. `voice-model-inference`

EKS using the Tortoise-TTS container from ECR
IAM roles allowing secure access to model artifacts in S3
Logging via CloudWatch
GPU instances if you’re running inferencing at scale

4. `monitoring`

Because observability is not optional:

CloudWatch dashboards
Log groups with retention policies
Alarms on task failures, API errors, and latency thresholds

5. `iam-baseline`

Scoped policies for Lambda and EKS
Roles for CloudFront, S3 access, and API Gateway execution
No * permissions. Ever.

Deploy Flow

Your deploy process should be as crisp as a fresh uniform. Here’s how mine runs:

Clone repo
Set env-specific terraform.tfvars
Run terraform init
Run terraform plan -out=plan.out
Run terraform apply plan.out
Grab coffee, watch CloudWatch logs roll in

Each environment (dev, staging, prod) uses workspaces and backend state isolation. You can redeploy the entire stack quickly — assuming us-east-1 isn’t having “a moment.”
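The flow above can be wrapped in a small script so every environment runs the same commands in the same order. This is a sketch, not the project's actual tooling: the workspace-per-environment naming and the `<env>.tfvars` file convention are assumptions.

```python
# Deploy-flow sketch: build the Terraform command sequence for one environment.
# Workspace names and the <env>.tfvars naming scheme are assumptions.
import subprocess


def deploy_commands(env: str, plan_file: str = "plan.out") -> list[list[str]]:
    """Return the ordered Terraform commands for the given environment."""
    return [
        ["terraform", "init"],
        ["terraform", "workspace", "select", env],
        ["terraform", "plan", f"-var-file={env}.tfvars", f"-out={plan_file}"],
        ["terraform", "apply", plan_file],
    ]


def deploy(env: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in deploy_commands(env):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan of record instead of executing it.
    for cmd in deploy_commands("dev"):
        print(" ".join(cmd))
```

Applying the saved plan file, rather than re-planning at apply time, means what was reviewed is what gets deployed.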

Secrets and Configs

Secrets are stored in AWS Secrets Manager and injected into Lambda functions and EKS pods via environment variables.

If your config lives in config.js, you might as well tattoo your AWS keys on your forehead.
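On the application side, consuming an injected secret can be as small as the sketch below. It assumes the secret is delivered as a JSON string in an environment variable; the variable name `DB_CREDENTIALS` is hypothetical.

```python
# Config-loading sketch: read an injected secret from the environment and
# fail fast if it is missing. The variable name DB_CREDENTIALS is hypothetical.
import json
import os


def load_secret(name: str, environ=os.environ) -> dict:
    """Parse a JSON secret that the platform injected as an env variable."""
    raw = environ.get(name)
    if raw is None:
        raise RuntimeError(f"required secret {name!r} is not set")
    return json.loads(raw)


if __name__ == "__main__":
    fake_env = {"DB_CREDENTIALS": '{"username": "app", "password": "example"}'}
    creds = load_secret("DB_CREDENTIALS", environ=fake_env)
    print(sorted(creds))  # print key names only, never the secret values
```

Failing at startup beats discovering a missing credential on the first request.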

Real-World Lessons

S3 Bucket Policies: Don’t let CloudFront cache a 403 error. Test permissions before deploy.
Terraform State Locking: Use DynamoDB for backend locking or suffer the wrath of simultaneous apply attempts. Newer Terraform releases also support native lock files in the S3 backend, so check which option your version offers.
Cost Tags: Tag everything. Billing reports should not require detective work.
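The tagging rule is easy to enforce in CI. A minimal sketch, assuming a policy of three required tag keys (the keys themselves are placeholders for your own policy):

```python
# Tag-audit sketch: report which required cost tags a resource is missing.
# The required tag keys are assumptions; substitute your own tagging policy.

REQUIRED_TAGS = ("Project", "Environment", "Owner")


def missing_tags(tags: dict[str, str], required=REQUIRED_TAGS) -> list[str]:
    """Return required tag keys that are absent or blank."""
    return [key for key in required if not tags.get(key, "").strip()]


if __name__ == "__main__":
    bucket_tags = {"Project": "voice-clone", "Owner": ""}
    print("missing:", missing_tags(bucket_tags))
```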

Dev Experience

Everything’s hooked into GitHub Actions:

Lint Terraform
Run terraform plan and post diff to PR
Auto-apply on merge to main (with approval gates)

Because manual deploys are for the birds. Or for vendors who bill hourly.

Why This Matters

Voice cloning isn’t just a novelty. In finance, healthcare, and insurance, it can revolutionize how humans interact with systems. But to be enterprise-ready, it needs:

Secure deployment
Scalable architecture
Auditability
Repeatability

This Terraform foundation ensures all four. Whether you’re standing up 1 environment or 100, the experience is the same. And when something breaks (it will), you’ll know exactly where to look — not which region your intern forgot to tag.

Final Thoughts

Building this platform felt like prepping for a lifting competition. The planning mattered as much as the execution, and when everything locked into place — it just felt solid.

Use Terraform. Use modules. Lock your state. And never let IAM policies become a "temporary fix."

Semper Fi, and happy provisioning.

Architecting a Scalable Voice Cloning Platform on AWS: A Case Study

Todd Bernson — Mon, 09 Jun 2025 13:17:58 +0000

If you've ever found yourself staring at a whiteboard trying to connect the dots between AI workloads, secure infrastructure, and scalability, welcome to my world. This is the story of how I built a fully self-hosted, scalable, and cost-optimized voice cloning platform on AWS using only a few tools: Terraform, containers, and a little grit learned from the Marine Corps and a lifetime under a barbell.

Let me walk you through the choices I made (yes, all of them), the architecture that emerged, and the hilariously non-obvious problems you only find after you're deep into deploying open-source ML models that occasionally throw tantrums like a toddler hyped up on Red Bull.

The Problem: Voice Cloning for Humans, Not Robots

Text-to-speech platforms are everywhere. Some sound like HAL 9000 on decaf. Others are good, but the second you want to use a proprietary voice (like, say, your own), you're either stuck paying by the syllable or signing your data rights away faster than you can say "GDPR."
So I built my own. A fully self-hosted solution using open-source models (shoutout to Tortoise-TTS and its uncanny ability to clone your voice right down to your awkward pauses). But cloning is only part of the fun — delivering that experience at scale, securely, and reliably is where AWS steps into the spotlight.

High-Level Architecture

The stack breaks down like this:

Frontend: Static web app hosted on Amazon S3, served through CloudFront.
Backend API: Deployed on ECS Fargate or Lambda (depending on the workload), behind API Gateway.
Voice Model Serving: Containerized ML model for inference.
Storage: S3 for audio and model artifacts.
Security & Identity: IAM roles, policies, and execution contexts.
Monitoring: CloudWatch for logs and metrics.
Infra: Terraform. Always Terraform.

Everything is defined in code, because if it’s not repeatable and testable, it’s a hobby project — not production-ready.

Frontend: Static Doesn’t Mean Boring

Let’s be honest, most frontends are glorified HTML wrapped in JavaScript sprinkles. Mine isn’t much different, but it’s clean, fast, and lives on S3 with CloudFront doing the content delivery heavy lifting. It’s versioned, integrated into my Terraform code, and invalidates CloudFront caches during deploys so I don’t get support tickets saying “it’s not loading” from someone’s uncle using IE11.

API Layer: Gateway Drug to Lambda or ECS

API Gateway, with a VPC Link, forwards to an internal load balancer and on to the EKS deployment.
API Gateway fronts all routes and directs requests based on API parameters. Terraform templates make it trivial to switch execution paths, a small but powerful way to fine-tune cost vs. performance tradeoffs.
And yes, everything is rate-limited, throttled, and logged. Because one day some internal engineer will forget that uploading 200 audio files at once isn't polite.
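API Gateway documents its throttling as a token bucket, and the idea fits in a few lines. The sketch below is a local illustration of that model only, not API Gateway's implementation; the rate and burst values are made up.

```python
# Token-bucket sketch: the throttling model, reduced to a few lines.
# Rate and burst values here are illustrative only.


class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # timestamp of the last refill

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=1.0, burst=5)
    # 200 uploads arriving in the same instant: only the burst gets through.
    accepted = sum(bucket.allow(now=0.0) for _ in range(200))
    print(f"accepted {accepted} of 200")
```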

Voice Model: Running Tortoise, Fast

Tortoise-TTS doesn’t exactly scream efficiency. It’s a brilliant model, and like all brilliant things, it comes with eccentricities. It’s Dockerized, stored in ECR, and run via an EKS deployment triggered by events or API calls.
Each task has access to a GPU (if needed). To bypass a lot of the S3 presigned URL complexity, S3 is simply mounted into the Kubernetes deployment, which uses a service account (SA) for least privilege. Yes, I do least privilege here. It’s not just a talking point in my security audit; it’s a way of life.

Terraform: The One True Religion

From the IAM role assumptions to VPC peering, subnet creation, and service discovery — everything is codified in Terraform.

Key modules:

aws_s3_bucket
aws_lambda_function
aws_eks_cluster
aws_api_gateway_http_api
aws_cloudwatch_log_group

You can burn it all down and stand it back up in just a few minutes. We work smarter, not harder, unlike the Marines, who sometimes flipped that around.

IAM: Gatekeeper of Sanity

I treat IAM like a loaded weapon. Every function, container, and service has its own scoped role. S3 buckets enforce object-level permissions. API Gateway uses usage plans and API keys with throttling. There’s no blanket admin access here — even if it makes debugging a little more annoying. It’s worth the tradeoff.

Also: never, ever let a Lambda function assume a role with wildcard permissions. That way lies madness.

Observability: Logs, Metrics, and Catching Fires Early

CloudWatch captures everything:

Lambda logs
EKS logs
Custom metrics for audio generation durations
Alerts for anomalies (latency spikes, task failures, etc.)

You can’t fix what you can’t see. I’ve got dashboards that would make a SOC analyst tear up. And not from joy — from envy.
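For the custom duration metric, a sketch of the shape involved: time the synthesis call, then build the arguments for CloudWatch's PutMetricData. The namespace, metric name, and `ModelVersion` dimension are hypothetical, and the actual publish call is left as a comment because it needs AWS credentials.

```python
# Custom-metric sketch: time a synthesis call and shape the result for
# CloudWatch PutMetricData. Namespace and metric names are hypothetical.
import time


def build_duration_metric(seconds: float, model_version: str) -> dict:
    """Return keyword arguments suitable for cloudwatch.put_metric_data()."""
    return {
        "Namespace": "VoiceCloning",
        "MetricData": [{
            "MetricName": "AudioGenerationDuration",
            "Dimensions": [{"Name": "ModelVersion", "Value": model_version}],
            "Value": seconds,
            "Unit": "Seconds",
        }],
    }


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, elapsed = timed(lambda text: text.upper(), "stand-in for inference")
    payload = build_duration_metric(elapsed, model_version="abc123")
    # With AWS credentials available:
    #   boto3.client("cloudwatch").put_metric_data(**payload)
    print(payload["MetricData"][0]["MetricName"])
```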

Real-World Challenges

Running large AI models on AWS is like lifting heavy — it looks cool when it works, but if your form is off, something’s gonna break.

Problems I ran into:

EKS warm-up time was too long for short-lived audio jobs
CloudFront caching had to be fine-tuned to avoid stale UI/UX bugs

Solutions:

Container layers helped deployment move much more quickly.
Readiness probe keeps 5xx errors at bay.
Use CloudFront cache invalidation scripts in CI/CD
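The invalidation step in CI comes down to one API call. A minimal sketch that builds the batch structure CloudFront's CreateInvalidation expects; the paths and the distribution ID in the comment are placeholders.

```python
# Cache-invalidation sketch: build the InvalidationBatch that CloudFront's
# CreateInvalidation API expects. Paths and distribution ID are made up.
import time


def invalidation_batch(paths: list[str], reference: str) -> dict:
    """Return the InvalidationBatch structure for the given paths."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": reference,  # must be unique per invalidation request
    }


if __name__ == "__main__":
    batch = invalidation_batch(["/index.html", "/assets/*"], str(time.time()))
    # In CI, with credentials and a real distribution ID:
    #   boto3.client("cloudfront").create_invalidation(
    #       DistributionId="EXAMPLE123", InvalidationBatch=batch)
    print(batch["Paths"])
```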

Closing Thoughts

Building this platform was part science, part art, and part gym therapy. AWS gave me the tools, Terraform gave me the control, and coffee gave me the persistence.

Would I do it again? Absolutely. But I’d like to remind the next brave soul: just because AWS offers 200+ services doesn’t mean you need all of them. Pick the ones that fit your use case. Glue them together smartly. Monitor everything. Lock it all down.

And if all else fails — lift something heavy, then get back to debugging.

By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Certified Deadlifter of Ridiculous Cloud Problems

Legacy, Meet Cloud Native: Lessons from Blending COBOL, K8s, and ML

Todd Bernson — Wed, 09 Apr 2025 14:18:50 +0000

Introduction

When people talk about modernization, they often picture “lift and shift,” total rewrites, or big-bang digital transformation. But reality is messier. In most enterprises, legacy code like COBOL isn’t going anywhere—it still runs core business functions, and rewriting it is usually a non-starter. Instead, the smarter move is to wrap and extend it: containerize it, orchestrate it, observe it, and—yes—train machine learning models around it.

In this final article of the eks_cobol series, we’ll reflect on the architectural lessons, tech gotchas, and practical wins of combining COBOL, Kubernetes, and SageMaker. You’ll walk away with a blueprint for how to do it in your own environment—and where the landmines are buried.

Architectural Recap

Let’s start with what we built:

COBOL on Kubernetes: We run GnuCOBOL inside containerized workloads, scheduled by K8s Jobs, with persistent shared storage via Amazon EFS.
Structured Logging: STDOUT/STDERR logs are parsed and saved as JSON files in S3 for traceability and ML readiness.
PostgreSQL Sink: Valid, enriched records are inserted into a relational store for downstream use.
SageMaker Model: We trained an XGBoost model on historical failures to predict which jobs are likely to fail before execution.
Feedback Loop: Inference scores now route high-risk files away from execution or into validation workflows.

It’s COBOL—but with an observability stack, proactive defense, and self-learning behavior.

Key Lessons Learned

1. Don’t Rewrite What Already Works

We didn’t rewrite COBOL. We containerized it. That’s a critical distinction. GnuCOBOL let us preserve decades of business logic while packaging it into a portable, observable runtime. By wrapping COBOL in Docker and invoking it via shell, we gained control without touching the legacy internals.

If the codebase is stable and correct, leave it alone. Modernize around it.

2. Logs Are a Goldmine—Structure Them

COBOL wasn’t built for structured logging. But by intercepting logs and shaping them into JSON, we unlocked a treasure trove of analytics possibilities. Every error, success, or anomaly became traceable, searchable, and ML-trainable.

Your pipeline is only as smart as your logs are readable.
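As a small illustration of the idea, the sketch below turns raw output lines into JSON records. The `LEVEL: message` line format is an assumption made for the example; real COBOL runtime messages vary and need their own patterns.

```python
# Log-structuring sketch: turn a raw job output line into a JSON record.
# The "LEVEL: message" line format is an assumption for illustration.
import json
import re

LINE_PATTERN = re.compile(r"^(?P<level>ERROR|WARN|INFO):\s*(?P<message>.*)$")


def structure_line(job_id: str, line: str) -> dict:
    """Parse one output line; unmatched lines are kept as level UNKNOWN."""
    match = LINE_PATTERN.match(line.strip())
    if match:
        return {"jobId": job_id, **match.groupdict()}
    return {"jobId": job_id, "level": "UNKNOWN", "message": line.strip()}


if __name__ == "__main__":
    for raw in ["ERROR: invalid numeric field in record 42", "job finished"]:
        print(json.dumps(structure_line("job123", raw)))
```

Keeping unmatched lines instead of dropping them means new error formats show up in the data rather than vanishing.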

3. Machine Learning Loves Legacy

This is not hype. ML is perfect for legacy systems because:

It doesn’t require code access.
It thrives on patterns and history.
It improves incrementally.

Our failure prediction model now prevents bad jobs from ever running, saving compute time and protecting downstream systems.

4. Kubernetes Handles Legacy Workloads Surprisingly Well

Many assume Kubernetes is for stateless microservices only. Wrong. We used EFS + Jobs + taints/tolerations to isolate legacy workloads without sacrificing elasticity or modern DevOps practices.

Legacy ≠ incompatible. With the right node pools and volume setup, K8s handles batch, stateful, or weird workloads just fine.

5. Async Communication Is Essential

Each component of this pipeline operates independently:

COBOL runs in isolation.
Parsers and enrichers are microservices.
ML runs out-of-band, in a parallel path.

S3, EFS, and event-driven messaging (SQS or Step Functions) glue the pieces together. That’s how we scale and decouple without breaking the whole thing.

Gotchas to Watch Out For

❌ Parsing COBOL Errors Is a Pain

You’ll spend way more time writing regex and building robust parsers than you’d like. COBOL errors weren’t designed to be machine readable. Build good test cases.

❌ Storage Permissions in K8s + EFS

Mounting EFS with the right IAM and access points requires some pain up front. Use the AWS EFS CSI driver and restrict access by namespace or workload label.

❌ Model Drift Can Sneak Up on You

As inputs evolve (new file formats, new job types), your ML model may lose accuracy. Schedule retraining and monitor for prediction distribution changes using SageMaker Model Monitor.
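To show what "prediction distribution changes" means in practice, here is a standalone sketch using the population stability index (PSI). This is a swapped-in illustration, not what SageMaker Model Monitor runs internally. It assumes scores lie in the range 0.0 to 1.0 and that both lists are non-empty.

```python
# Drift sketch: population stability index (PSI) between a baseline set of
# prediction scores and a recent set. Assumes scores in [0, 1], non-empty lists.
import math


def psi(expected: list[float], actual: list[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Return the PSI between two score samples; 0.0 means identical bins."""

    def fractions(scores: list[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(scores), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [0.1] * 50 + [0.9] * 50
    recent = [0.9] * 100
    print(f"PSI vs. itself: {psi(baseline, baseline):.3f}")
    print(f"PSI vs. recent: {psi(baseline, recent):.3f}")
```

A rising value is a prompt to look at the inputs and consider retraining; the alert threshold is a judgment call for your own data.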

❌ Job Bloat If You Don’t Clean Up

Kubernetes Jobs can leave stale pods if not configured correctly. Use .spec.ttlSecondsAfterFinished or a custom controller to delete completed/failed jobs.
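A minimal sketch of a Job manifest with the TTL field set, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The job name, image, and one-hour TTL are placeholders.

```python
# Job-cleanup sketch: a Kubernetes Job manifest, as a Python dict, that asks
# the cluster to delete the Job and its pods after it finishes.
# The job name, image, and TTL value are placeholders.
import json


def cobol_job_manifest(name: str, image: str, ttl_seconds: int = 3600) -> dict:
    """Return a batch/v1 Job manifest with ttlSecondsAfterFinished set."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "cobol", "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(cobol_job_manifest("cobol-job123", "example/cobol:latest"), indent=2))
```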

The Bigger Picture

This project isn’t just a modernization. It’s proof that:

COBOL is not the enemy.
Kubernetes isn’t just for Node.js and Python.
Machine learning isn’t just for greenfield use cases.

You can combine old and new, determinism and prediction, batch and real-time. It’s not just technically feasible—it’s strategically smart. You protect your investment in legacy, while gaining all the advantages of modern infrastructure and AI.

Final Architecture Diagram

Conclusion

You don’t need to choose between rewriting everything or staying frozen in time. This series showed how to elevate COBOL with containers, orchestrators, log structure, and machine learning—all without rewriting core logic.

This hybrid approach isn't just a one-off—it's a repeatable strategy. Any legacy system that produces structured input/output can benefit from this architecture. You give it new life, visibility, and intelligence. And that makes your system—and your team—a lot smarter.

Building a Smart Feedback Loop: Real-Time Inference on COBOL Logs

Todd Bernson — Tue, 08 Apr 2025 14:54:17 +0000

Introduction

Modern data pipelines don't stop at processing—they evolve. With our eks_cobol system running legacy COBOL code on Kubernetes and logging structured outputs, we’ve laid the foundation for a smarter system. Now it’s time to close the loop.

In this article, we show how we could integrate the SageMaker model from Article 5 into a real-time feedback loop. Instead of just reacting to COBOL job results, we proactively intercept bad inputs before they cause failure. We’ll cover how inference is triggered pre-execution, how results are logged and acted upon, and how this closes the loop between batch legacy logic and modern ML-based automation.

The Loop: From Prediction to Action

Here’s the basic feedback loop:

File is ingested and analyzed.
Metadata is extracted (size, record count, filename, etc.).
Metadata is sent to the SageMaker inference endpoint.
If the predicted probability of failure > threshold:
- File is flagged or quarantined.
- User is alerted.
- Optionally skipped from COBOL execution.
Otherwise, the file proceeds to COBOL job processing.

We use the exact SageMaker endpoint created in Article 5 to power the loop.

Trigger Point: Right After File Ingest

The feedback loop starts after a file lands in the mounted EFS directory. Our ingestion service performs lightweight analysis—no full record parsing, just enough metadata for inference.

Example features:

Byte size (os.path.getsize)
Filename pattern (date, region)
Number of records (quick line count)
Known anomalies (e.g., blank lines)

We wrap this logic in a predict_failure_risk() function that calls the SageMaker endpoint.

import os

import boto3

def predict_failure_risk(input_file_path):
    """Return the predicted failure probability (0.0 to 1.0) for one input file."""
    size = os.path.getsize(input_file_path)
    name = os.path.basename(input_file_path)

    # Create simple one-hot encoding for file extension
    extension = name.split('.')[-1]
    ext_flags = [1 if extension == 'csv' else 0]  # Extend for more types as needed

    # Simulated other features
    features = [size] + ext_flags
    payload = ','.join(map(str, features))

    response = boto3.client('sagemaker-runtime').invoke_endpoint(
        EndpointName='cobol-failure-predictor',
        ContentType='text/csv',
        Body=payload
    )

    # The endpoint returns the score as plain text
    score = float(response['Body'].read().decode())
    return score

If the returned score exceeds our threshold (0.8 for high confidence), we act.

Risk Routing: High vs. Low Confidence Paths

We define 3 potential paths based on model confidence:

Low Risk (< 0.5): File is processed normally.
Medium Risk (0.5–0.8): File is tagged but proceeds; alerts may be logged.
High Risk (> 0.8): File is moved to /mnt/data/quarantine/, skipped from execution, and flagged for review.

These thresholds are tunable based on model accuracy, job cost, and risk tolerance.

The routing logic is embedded into the controller script before the COBOL job kicks off:

input_path = '/mnt/data/input/job123.csv'
score = predict_failure_risk(input_path)

if score > 0.8:
    print("High failure risk. Skipping COBOL execution.")
    move_to_quarantine(input_path)
elif score > 0.5:
    # Medium risk is tagged but still proceeds, per the paths defined above
    print("Medium risk. Proceeding with caution.")
    run_cobol(input_path)
else:
    print("Low risk. Running job.")
    run_cobol(input_path)

Logging and Traceability

For every prediction, we log:

Job ID
Score
Action taken
Timestamp

These logs are sent to CloudWatch and optionally to a DynamoDB "job decisions" table for auditing.

{
  "jobId": "job123",
  "score": 0.91,
  "decision": "quarantined",
  "timestamp": "2025-04-03T18:12:30Z"
}

This gives us full traceability from ingestion through prediction to final action.

Feedback into the Model

To keep the loop smart, we must evolve the model. So, for every prediction that results in:

A correct decision → reinforce via logs.
A wrong decision → flag for retraining.

A Lambda function watches the quarantine bucket. If a file in quarantine is later processed successfully by an engineer, it’s tagged as a false positive and fed into the retraining dataset. This self-healing process makes the model more precise over time.
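The labeling rule behind that process can be sketched as a pure function. The record fields (`score`, `succeeded`) and the helper names are hypothetical, and the 0.8 threshold mirrors the high-risk cutoff used earlier.

```python
# Feedback-labeling sketch: compare each prediction with what actually
# happened and keep the misses for retraining. Field names are hypothetical.


def label_outcome(score: float, job_succeeded: bool, threshold: float = 0.8) -> str:
    """Classify one prediction against the job's real outcome."""
    predicted_failure = score > threshold
    if predicted_failure:
        return "false_positive" if job_succeeded else "true_positive"
    return "true_negative" if job_succeeded else "false_negative"


def needs_retraining(records: list[dict]) -> list[dict]:
    """Keep only records where the model's call turned out to be wrong."""
    wrong = {"false_positive", "false_negative"}
    return [r for r in records
            if label_outcome(r["score"], r["succeeded"]) in wrong]


if __name__ == "__main__":
    history = [
        {"jobId": "job123", "score": 0.91, "succeeded": True},   # quarantined, later ran fine
        {"jobId": "job124", "score": 0.12, "succeeded": True},
    ]
    print([r["jobId"] for r in needs_retraining(history)])
```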

Business Impact

Before this feedback loop, bad jobs would:

Run anyway, wasting CPU time.
Cause cascading failures in downstream services.
Require postmortem triage.

Now, we proactively flag risky inputs. Engineers focus only on edge cases. Overall job success rates improve, and so does trust in the system.

This loop also enables us to A/B test different models, thresholds, and routing logic—giving us a lab for optimization without interrupting the production flow.
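One simple way to split traffic for such a test is deterministic hashing on the job ID, so a given job always lands in the same arm. This is a sketch of that general technique, not a description of how the pipeline actually does it; the variant names are placeholders.

```python
# A/B assignment sketch: hash the job ID so each job is routed to the same
# variant every time. Variant names are placeholders.
import hashlib


def assign_variant(job_id: str,
                   variants: tuple[str, ...] = ("control", "candidate")) -> str:
    """Return a stable variant for this job ID."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    for job in ("job123", "job124", "job125"):
        print(job, "->", assign_variant(job))
```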

Conclusion

COBOL jobs don’t have to be dumb. By wrapping them in modern ML pipelines, we get real-time intelligence that prevents failures before they happen. SageMaker gives us prediction. Kubernetes gives us orchestration. And a simple controller gives us the glue to wire it all together.

With a smart feedback loop in place, eks_cobol becomes more than a modernization play—it becomes a self-improving system that learns from its own failures.

Predicting Legacy Failures: Training and Hosting ML Models in SageMaker

Todd Bernson — Mon, 07 Apr 2025 13:08:13 +0000

Introduction

Legacy systems are infamous for failing silently—or catastrophically—with no early warning signs. In our eks_cobol pipeline, COBOL batch jobs handle sensitive data transformations. When something goes wrong, we don’t just want to know after it fails—we want to know before it runs. Enter machine learning.

This article covers how we use Amazon SageMaker to train a model that predicts COBOL job failures based on input metadata and content characteristics. You’ll see how we take the structured error data from Article 4, create features, train a model using XGBoost, host it with a live endpoint, and wire it into our processing pipeline for real-time inference.

The Prediction Problem

The goal is to predict whether a COBOL job will fail, before running it, using data available at ingest time. Features include:

Filename (which may encode customer, date, region, etc.)
File size (bytes)
Record count
Presence of null fields or format anomalies
Job type or business logic variant

We label previous failed jobs with isFailure = True and successful jobs with isFailure = False. The model learns correlations between input patterns and known failures.

Building the Training Dataset

We merge two CSVs:

One from failed COBOL jobs (errors_flat.csv)
One from successful jobs (success_flat.csv)

A preprocessing script ensures both datasets are aligned, normalized, and balanced.

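import pandas as pd

errors = pd.read_csv('errors_flat.csv')
success = pd.read_csv('success_flat.csv')

df = pd.concat([errors, success], ignore_index=True)
df['fileSize'] = df['rawRecord'].apply(lambda x: len(str(x).encode('utf-8')))
df['fileExtension'] = df['inputFile'].apply(lambda x: x.split('.')[-1])
# dtype=int keeps the one-hot columns numeric (0/1) rather than True/False
df = pd.get_dummies(df, columns=['errorType', 'fileExtension'], dtype=int)
df['isFailure'] = df['isFailure'].astype(int)

# SageMaker's built-in XGBoost expects the label in the first CSV column.
# Feature order matches the inference payload: size, extension flags, error-type flags.
ext_cols = [col for col in df.columns if col.startswith('fileExtension_')]
err_cols = [col for col in df.columns if col.startswith('errorType_')]
df = df[['isFailure', 'fileSize'] + ext_cols + err_cols]
df.to_csv('ml_input.csv', index=False)  # drop the header when writing train/validation files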
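The script above aligns and encodes the data but does not show the balancing step. One common approach is to downsample the majority class, sketched here in plain Python on a list of row dicts. This is an illustration of that technique and an assumption about what "balanced" means here, not the author's script.

```python
# Class-balancing sketch: randomly downsample the larger class so failures
# and successes appear in equal numbers. Operates on plain row dicts.
import random


def downsample(rows: list[dict], label_key: str = "isFailure",
               seed: int = 0) -> list[dict]:
    """Return a balanced subset with equal counts of each label value."""
    failures = [r for r in rows if r[label_key]]
    successes = [r for r in rows if not r[label_key]]
    n = min(len(failures), len(successes))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(failures, n) + rng.sample(successes, n)


if __name__ == "__main__":
    rows = [{"isFailure": 1, "fileSize": 10}] * 3 + [{"isFailure": 0, "fileSize": 20}] * 10
    balanced = downsample(rows)
    print(len(balanced), "rows,", sum(r["isFailure"] for r in balanced), "failures")
```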

Training the Model in SageMaker

We use SageMaker’s built-in XGBoost container for binary classification. The training script is handled via a SageMaker training job or a SageMaker Studio notebook.

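import sagemaker
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker notebook or Studio environment
# bucket, prefix, train_s3_path, and test_s3_path are assumed to be defined earlier

container = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, "1.3-1")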

xgb_estimator = sagemaker.estimator.Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path=f's3://{bucket}/{prefix}/output',
    sagemaker_session=session
)

xgb_estimator.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,
    max_depth=5,
    eta=0.2,
    subsample=0.8,
    colsample_bytree=0.8
)

xgb_estimator.fit({
    "train": TrainingInput(train_s3_path, content_type="csv"),
    "validation": TrainingInput(test_s3_path, content_type="csv")
})

This trains a binary classifier that predicts failure probability (0.0 to 1.0) given new job metadata.

Hosting the Inference Endpoint

Once the model is trained and stored in S3, we deploy it to a real-time SageMaker endpoint:

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = xgb_estimator.deploy(initial_instance_count=1, instance_type="ml.p3.2xlarge")
predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()

sample = X_test.head(1).to_csv(header=False, index=False).strip()
print("Sample row:", sample)
print("Prediction:", predictor.predict(sample))

Now we can send job metadata in real-time and receive a prediction before running the COBOL job.

Integrating Inference into the Pipeline

Before a COBOL job runs, the ingestion service sends a prediction request to the SageMaker endpoint. If the prediction is above a threshold (say 0.8), we mark the job as "high risk" and route it to a validation or quarantine path.

import boto3
import json

runtime = boto3.client('sagemaker-runtime')

def get_failure_score(fileSize, ext_onehot, error_type_onehot):
    payload = f"{fileSize}," + ",".join(map(str, ext_onehot + error_type_onehot))
    response = runtime.invoke_endpoint(
        EndpointName='cobol-failure-predictor',
        ContentType='text/csv',
        Body=payload
    )
    score = float(response['Body'].read().decode())
    return score

This gives us predictive observability—no more surprises when a job fails after burning through hours of runtime.

Model Monitoring and Retraining

We use SageMaker Model Monitor to detect drift in prediction distributions. As more jobs are processed, both successful and failed, we continuously push new records to the training bucket and retrain the model weekly via a scheduled SageMaker pipeline or Lambda-triggered training job.

The retraining process includes:

Collect new .json logs from S3
Run the same flatten + preprocess script
Update the dataset
Launch a training job with versioned output
Replace the endpoint via blue/green deployment

Conclusion

Machine learning isn’t just for flashy new systems—it can massively improve how legacy pipelines operate. By training and hosting a binary classifier in SageMaker, we’ve added a predictive safety net to our COBOL workflows. With every job that fails or succeeds, the model gets smarter, reducing wasted compute and catching bad inputs early.

This is the kind of hybrid future that actually works: COBOL + Kubernetes + JSON + SageMaker, working in concert. And it all starts with clean training data and good feature engineering.