<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Todd Bernson</title>
    <description>The latest articles on DEV Community by Todd Bernson (@semperfitodd).</description>
    <link>https://dev.to/semperfitodd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3012557%2F55faeb34-2e68-4f3b-8340-1c671dcee764.png</url>
      <title>DEV Community: Todd Bernson</title>
      <link>https://dev.to/semperfitodd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/semperfitodd"/>
    <language>en</language>
    <item>
      <title>MLOps for Voice Cloning: CI/CD and Model Management in an AWS Environment</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Mon, 23 Jun 2025 13:41:27 +0000</pubDate>
      <link>https://dev.to/semperfitodd/mlops-for-voice-cloning-cicd-and-model-management-in-an-aws-environment-20pf</link>
      <guid>https://dev.to/semperfitodd/mlops-for-voice-cloning-cicd-and-model-management-in-an-aws-environment-20pf</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics and USMC Veteran&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You can train the world's best voice cloning model in your basement, but unless you can deploy it consistently, monitor it intelligently, and update it without burning down prod... it's just a science project.&lt;/p&gt;

&lt;p&gt;Welcome to the world of &lt;strong&gt;MLOps&lt;/strong&gt; — where machine learning meets actual engineering discipline. This article covers how to apply DevOps best practices to a voice cloning platform running on AWS, with a focus on &lt;strong&gt;CI/CD&lt;/strong&gt;, &lt;strong&gt;model versioning&lt;/strong&gt;, &lt;strong&gt;monitoring&lt;/strong&gt;, and &lt;strong&gt;rollback&lt;/strong&gt; strategies.&lt;/p&gt;

&lt;p&gt;Spoiler alert: it's not just about the model. It’s about the platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes Voice Cloning MLOps-Heavy?
&lt;/h2&gt;

&lt;p&gt;Voice generation pipelines include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text preprocessing&lt;/li&gt;
&lt;li&gt;Model inference (Tortoise-TTS, Coqui, etc.)&lt;/li&gt;
&lt;li&gt;Audio output formatting&lt;/li&gt;
&lt;li&gt;Storage and retrieval layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each part needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version control&lt;/li&gt;
&lt;li&gt;Deployment repeatability&lt;/li&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;li&gt;Rollback capability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And unlike classic apps, changes in the &lt;strong&gt;model&lt;/strong&gt; or &lt;strong&gt;weights&lt;/strong&gt; can introduce regressions that are invisible until someone hears a result that sounds like a broken robot.&lt;/p&gt;




&lt;h2&gt;
  
  
  CI/CD: More Than Just App Code
&lt;/h2&gt;

&lt;p&gt;Our CI/CD pipeline handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure (Terraform)&lt;/li&gt;
&lt;li&gt;Application code (API logic, orchestration)&lt;/li&gt;
&lt;li&gt;ML model versions&lt;/li&gt;
&lt;li&gt;Container builds (EKS)&lt;/li&gt;
&lt;li&gt;Monitoring rules and alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tools We Use:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt; for workflow automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform&lt;/strong&gt; for infrastructure versioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; for building and tagging model containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECR&lt;/strong&gt; for storing voice inference images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt; for storing model weights and artifacts (if using SageMaker)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Model Versioning: Know What You Deployed
&lt;/h2&gt;

&lt;p&gt;We treat models like code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each model version gets a unique SHA tag&lt;/li&gt;
&lt;li&gt;We store them in S3 and reference via input config&lt;/li&gt;
&lt;li&gt;Every deployment logs which model version was used&lt;/li&gt;
&lt;/ul&gt;
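
&lt;p&gt;As a rough sketch of the "unique SHA tag" idea, the tag can be derived from the weights file itself, so identical weights always get the same tag. The 12-character tag length, the &lt;code&gt;models/NAME/TAG/weights.bin&lt;/code&gt; key layout, and both function names are assumptions for illustration, not the layout used on the platform described here.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def model_version_tag(weights_path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return a short content-derived tag: the first 12 hex chars of the file's SHA-256."""
    digest = hashlib.sha256()
    with weights_path.open("rb") as handle:
        # Read in chunks so multi-gigabyte weight files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


def s3_key_for(model_name: str, tag: str) -> str:
    """Build a hypothetical S3 key layout: models/NAME/TAG/weights.bin."""
    return f"models/{model_name}/{tag}/weights.bin"
```

&lt;p&gt;Logging the returned tag at deploy time and on every inference request is one way to answer "which model produced this audio?" later.&lt;/p&gt;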

&lt;h2&gt;
  
  
  Canary Deployments for ML Models
&lt;/h2&gt;

&lt;p&gt;Never deploy a new model version blind.&lt;/p&gt;

&lt;p&gt;We use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blue/Green EKS service updates&lt;/strong&gt; for inference&lt;/li&gt;
&lt;li&gt;Traffic-shifting via API Gateway stage variables&lt;/li&gt;
&lt;li&gt;Automated test cases that check:

&lt;ul&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Audio length&lt;/li&gt;
&lt;li&gt;Audio fidelity&lt;/li&gt;
&lt;li&gt;Output duration vs expected&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If the model goes rogue, we roll back — fast.&lt;/p&gt;
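
&lt;p&gt;A minimal sketch of what an automated canary check on latency and output duration might look like, using only the standard-library &lt;code&gt;wave&lt;/code&gt; module. The thresholds (5 seconds of latency, 25% duration tolerance) and the function names are assumptions chosen for the example, not values from the pipeline described above.&lt;/p&gt;

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV payload, computed from its frame count and sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def passes_canary(wav_bytes: bytes, latency_s: float, expected_s: float,
                  max_latency_s: float = 5.0, tolerance: float = 0.25) -> bool:
    """True when latency is within budget and duration is within tolerance of expected."""
    drift = abs(wav_duration_seconds(wav_bytes) - expected_s)
    return max_latency_s >= latency_s and expected_s * tolerance >= drift


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Helper for trying the check locally: mono 16-bit silence of the given length."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

&lt;p&gt;A check like this only catches gross failures such as truncated or runaway output. Audio fidelity still needs a separate metric or a human listener.&lt;/p&gt;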




&lt;h2&gt;
  
  
  Build &amp;amp; Deploy Flow
&lt;/h2&gt;

&lt;p&gt;Here’s a typical flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dev pushes code or model update&lt;/li&gt;
&lt;li&gt;GitHub Actions triggers:

&lt;ul&gt;
&lt;li&gt;Linting / unit tests&lt;/li&gt;
&lt;li&gt;Docker build&lt;/li&gt;
&lt;li&gt;Terraform &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;apply&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Canary deployment to EKS&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Health checks run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bonus: logs and metrics for the deployment go into CloudWatch and get visualized.&lt;/p&gt;




&lt;h2&gt;
  
  
  Monitoring the Right Things
&lt;/h2&gt;

&lt;p&gt;It's not enough to know the model responded. You need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Did the audio sound right?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How long did it take to generate?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Was it the right version of the model?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Did we return any unexpected silence or clipping?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Metrics Tracked:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Inference duration&lt;/li&gt;
&lt;li&gt;Audio file size / length consistency&lt;/li&gt;
&lt;li&gt;API latency (P95 and P99)&lt;/li&gt;
&lt;li&gt;Success/failure ratio&lt;/li&gt;
&lt;li&gt;Model version used per request&lt;/li&gt;
&lt;/ul&gt;
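
&lt;p&gt;For the "unexpected silence or clipping" question, here is one possible sketch that scores a buffer of signed 16-bit PCM samples. The threshold values and the &lt;code&gt;audio_health&lt;/code&gt; name are illustrative assumptions. A real pipeline would tune them per voice and probably look at windows rather than whole files.&lt;/p&gt;

```python
from array import array


def audio_health(samples: array, silence_threshold: int = 200,
                 clip_level: int = 32760) -> dict:
    """Return the share of near-silent and near-full-scale samples in 16-bit PCM audio."""
    total = len(samples)
    if total == 0:
        # An empty result is treated as fully silent so it trips a silence alarm.
        return {"silence_ratio": 1.0, "clip_ratio": 0.0}
    silent = sum(1 for s in samples if silence_threshold >= abs(s))
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return {"silence_ratio": silent / total, "clip_ratio": clipped / total}
```

&lt;p&gt;Both ratios could be published as custom metrics next to the model version, so a spike can be traced to a specific release.&lt;/p&gt;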




&lt;h2&gt;
  
  
  Managing Drift Between Environments
&lt;/h2&gt;

&lt;p&gt;You know what’s fun? Discovering that your staging environment works, but production silently fails because it’s running a different Docker image, model version, or configuration.&lt;/p&gt;

&lt;p&gt;So we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Terraform to keep dev, stage, and prod in parity&lt;/li&gt;
&lt;li&gt;Automatically tag all deployments with env, model, and version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No surprises. No snowflakes. No "it works on dev" excuses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secure Secrets for ML Inference
&lt;/h2&gt;

&lt;p&gt;Yes, your model container still needs secrets.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Secrets Manager&lt;/strong&gt; for API keys / DB creds&lt;/li&gt;
&lt;li&gt;Injected at runtime via EKS CSI driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best practice: rotate these secrets automatically, audit access via CloudTrail, and keep them encrypted end-to-end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;MLOps is where voice cloning becomes enterprise-ready.&lt;/p&gt;

&lt;p&gt;Done right, it lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version and test your models like code&lt;/li&gt;
&lt;li&gt;Deploy updates without outages&lt;/li&gt;
&lt;li&gt;Catch regressions before customers do&lt;/li&gt;
&lt;li&gt;Build trust with engineering, compliance, and finance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the best part? You can build this on AWS with the services you already use — EKS, Lambda, S3, CloudWatch, Terraform, GitHub Actions.&lt;/p&gt;

&lt;p&gt;If you're building anything with voice, ML, and scale — and you're not treating it like a product — you're already behind.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Scaling an AI Voice Platform: Lessons in Performance and Cost Optimization on AWS</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Wed, 18 Jun 2025 17:15:41 +0000</pubDate>
      <link>https://dev.to/semperfitodd/scaling-an-ai-voice-platform-lessons-in-performance-and-cost-optimization-on-aws-2fll</link>
      <guid>https://dev.to/semperfitodd/scaling-an-ai-voice-platform-lessons-in-performance-and-cost-optimization-on-aws-2fll</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who Tunes Inference and Deadlifts&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Building an AI-powered voice cloning platform is fun. Watching it get crushed under load because you didn’t scale it properly? Not so much.&lt;/p&gt;

&lt;p&gt;In this post, we’re talking about &lt;strong&gt;real-world lessons&lt;/strong&gt; from scaling a voice cloning solution that generates and serves thousands of audio messages — personalized, on-demand, and secured in AWS. Not in theory. In production. With logs to prove it.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You’ll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When to use EKS vs. SageMaker for inference&lt;/li&gt;
&lt;li&gt;How to batch workloads and queue intelligently&lt;/li&gt;
&lt;li&gt;Cost control levers that keep your CFO from panicking&lt;/li&gt;
&lt;li&gt;Why CloudWatch is your best friend and worst critic&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Generating voice responses isn’t like querying a database. Every request involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model inference (heavy compute)&lt;/li&gt;
&lt;li&gt;Audio storage (and sometimes conversion)&lt;/li&gt;
&lt;li&gt;Input validation&lt;/li&gt;
&lt;li&gt;Possibly authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiply that by &lt;strong&gt;tens of thousands of requests per day&lt;/strong&gt;, and things start to sweat.&lt;/p&gt;

&lt;p&gt;So how do you scale?&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Know Your Workload Types
&lt;/h2&gt;

&lt;p&gt;Not all voice generation is equal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lightweight:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Short responses (“Your appointment is confirmed.”)&lt;/li&gt;
&lt;li&gt;Real-time generation (user is waiting)&lt;/li&gt;
&lt;li&gt;Low concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt; AWS Lambda&lt;/p&gt;

&lt;h3&gt;
  
  
  Heavyweight:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Longform responses&lt;/li&gt;
&lt;li&gt;Background jobs (e.g., batch generation of 5,000 voicemails)&lt;/li&gt;
&lt;li&gt;High concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt; EKS (spot for batch, on-demand for latency-sensitive)&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU-Intensive:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complex voices, multi-speaker, multi-language synthesis&lt;/li&gt;
&lt;li&gt;Real-time delivery with near-zero latency&lt;/li&gt;
&lt;li&gt;High fidelity outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt; SageMaker endpoints (with multi-model containers if needed)&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Queue Everything
&lt;/h2&gt;

&lt;p&gt;Even the fastest systems benefit from &lt;strong&gt;decoupling&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway enqueues requests in SQS → workers on EKS pull from the queue&lt;/li&gt;
&lt;li&gt;Use Step Functions for batch orchestration&lt;/li&gt;
&lt;li&gt;Prioritize workloads (e.g., VIP client messages jump the queue)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This buys you &lt;strong&gt;buffer time&lt;/strong&gt;, allows &lt;strong&gt;retry logic&lt;/strong&gt;, and improves overall system health.&lt;/p&gt;
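
&lt;p&gt;The "VIP messages jump the queue" rule can be sketched with an in-memory priority queue. This is a local stand-in only: SQS has no per-message priority, so on AWS the same effect is usually approximated with separate queues that workers drain in order. The class and method names are hypothetical.&lt;/p&gt;

```python
import heapq
import itertools


class PriorityJobQueue:
    """In-memory stand-in for a two-tier job queue: VIP jobs are served first."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so jobs of equal priority stay first-in, first-out.
        self._counter = itertools.count()

    def push(self, job_id: str, vip: bool = False) -> None:
        priority = 0 if vip else 1
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

&lt;p&gt;Pushing two standard jobs and then one VIP job returns the VIP job first, followed by the standard jobs in arrival order.&lt;/p&gt;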




&lt;h2&gt;
  
  
  Step 3: Watch the Watchers (aka CloudWatch)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What to monitor:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;EKS CPU/memory % over time&lt;/li&gt;
&lt;li&gt;Lambda duration and cold start counts&lt;/li&gt;
&lt;li&gt;API Gateway 5xx and latency percentiles&lt;/li&gt;
&lt;li&gt;SQS queue length (spikes = backlog = unhappy customers)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set alarms. Send alerts. Watch for cost and scale patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Storage Strategy
&lt;/h2&gt;

&lt;p&gt;Don't just dump audio into S3 and forget it. Be strategic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use S3 Standard for recently accessed files&lt;/li&gt;
&lt;li&gt;Transition to Infrequent Access after 30 days&lt;/li&gt;
&lt;li&gt;Lifecycle delete after 90–180 days unless marked otherwise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bonus: tag files by use case (e.g., &lt;code&gt;welcome-message&lt;/code&gt;, &lt;code&gt;alert&lt;/code&gt;, &lt;code&gt;promo&lt;/code&gt;) and optimize access patterns.&lt;/p&gt;
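
&lt;p&gt;The tiering rules above map onto a single S3 lifecycle rule. The sketch below only builds the configuration dictionary. The &lt;code&gt;audio/&lt;/code&gt; prefix, the rule ID, and the 180-day default are assumptions, and the "unless marked otherwise" exemption is not modelled.&lt;/p&gt;

```python
def audio_lifecycle_rules(prefix: str = "audio/", ia_after_days: int = 30,
                          delete_after_days: int = 180) -> dict:
    """Build an S3 lifecycle configuration: Standard-IA after N days, delete after M days."""
    if ia_after_days >= delete_after_days:
        raise ValueError("the transition must happen before the expiration")
    return {
        "Rules": [
            {
                "ID": "audio-tiering-and-expiry",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }
```

&lt;p&gt;With boto3, a dictionary of this shape is what &lt;code&gt;put_bucket_lifecycle_configuration&lt;/code&gt; accepts as its &lt;code&gt;LifecycleConfiguration&lt;/code&gt; argument. In a Terraform-managed setup the same rule would live in the bucket's lifecycle resource instead.&lt;/p&gt;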




&lt;h2&gt;
  
  
  Step 5: Cost Optimization Tactics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  EKS
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Spot nodes for batch jobs (up to 90% cheaper than on-demand)&lt;/li&gt;
&lt;li&gt;Tune pod CPU/memory requests to match actual model requirements&lt;/li&gt;
&lt;li&gt;Use CloudWatch metrics to scale pods up and down&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If you exceed 10M calls/month, consider ALB + Lambda or Lambda Function URLs instead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CloudFront
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cache voice files when possible&lt;/li&gt;
&lt;li&gt;Use signed URLs for access control (not public-read S3)&lt;/li&gt;
&lt;li&gt;What I did instead of ☝️ was mount S3 directly to the pod in EKS to simplify permissions.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Snapshot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Frontend] → [API Gateway]
     ↓             ↓
 [Auth Layer] → [SQS]
                     ↓
                [EKS]
               ↓         ↓
          [S3 Audio]   [CloudWatch Logs]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Success Metrics That Matter
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Avg response time&lt;/li&gt;
&lt;li&gt;✅ Batch jobs processed within SLA window&lt;/li&gt;
&lt;li&gt;✅ Cost per voice file&lt;/li&gt;
&lt;li&gt;✅ API success rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re not measuring these, you’re flying blind.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Scaling a voice AI platform isn’t about tossing more compute at the problem. It’s about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding what type of workload you’re running&lt;/li&gt;
&lt;li&gt;Decoupling smartly&lt;/li&gt;
&lt;li&gt;Tuning services like an engine, not a hammer&lt;/li&gt;
&lt;li&gt;Building enough observability to know when things go sideways&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best part? With AWS, you can build something that scales to millions — and still fits in a startup budget. If you design it right.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Security in Voice AI: Safeguarding Cloned Voice Data and APIs with AWS Best Practices</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Tue, 17 Jun 2025 21:53:01 +0000</pubDate>
      <link>https://dev.to/semperfitodd/security-in-voice-ai-safeguarding-cloned-voice-data-and-apis-with-aws-best-practices-3i7d</link>
      <guid>https://dev.to/semperfitodd/security-in-voice-ai-safeguarding-cloned-voice-data-and-apis-with-aws-best-practices-3i7d</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who Treats IAM Policies Like They're Handling Live Ammo&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Voice AI is cool — until it leaks a customer’s audio file to the internet, ends up on a subreddit, and your CISO faints into a pile of SOC 2 binders. If you’re going to work with AI-generated voices, especially self-hosted ones, you better know how to &lt;strong&gt;lock it down&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article breaks down how to &lt;strong&gt;secure your voice cloning infrastructure&lt;/strong&gt; on AWS the way a Marine would: with discipline, precision, and zero tolerance for sloppy access control.&lt;/p&gt;

&lt;p&gt;Whether you're in finance, healthcare, insurance, or just paranoid (which in cloud security is a virtue), here’s how to bulletproof your deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. IAM: Zero Trust or Bust
&lt;/h2&gt;

&lt;p&gt;First rule: no service should have more access than it needs. IAM is your gatekeeper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Least Privilege
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Every Lambda, EKS deployment, and API Gateway integration uses its own IAM role.&lt;/li&gt;
&lt;li&gt;S3 permissions are scoped to &lt;em&gt;specific&lt;/em&gt; buckets and prefixes.&lt;/li&gt;
&lt;li&gt;No wildcard &lt;code&gt;"Action": "*"&lt;/code&gt; or &lt;code&gt;"Resource": "*"&lt;/code&gt; nonsense.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inline vs Managed Policies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use custom inline policies to restrict actions tightly.&lt;/li&gt;
&lt;li&gt;Avoid attaching AWS-managed policies directly unless scoped by a boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example policy snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
             &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
             &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::voice-clone-prod/audio/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Network Security: Stay in the VPC
&lt;/h2&gt;

&lt;p&gt;Your inference engine (like Tortoise-TTS in EKS) does &lt;strong&gt;not&lt;/strong&gt; need a public IP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best practices:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;EKS nodes live in &lt;strong&gt;private subnets&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;NAT Gateway used only when outbound is required.&lt;/li&gt;
&lt;li&gt;No internet-facing access unless explicitly required (e.g., CloudFront).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re feeling extra paranoid, attach a WAF to your CloudFront and enable throttling + IP filtering. Because someday someone &lt;em&gt;will&lt;/em&gt; test your endpoint with curl.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Data Protection: Encrypt Everything
&lt;/h2&gt;

&lt;h3&gt;
  
  
  At Rest:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;S3 buckets with default encryption using a customer managed KMS key (CMK).&lt;/li&gt;
&lt;li&gt;Sensitive metadata (user ID, timestamps, script text) also encrypted at the application level if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  In Transit:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HTTPS only. TLS 1.2+. No exceptions.&lt;/li&gt;
&lt;li&gt;Custom domain for APIs using CloudFront + ACM-managed certs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secrets:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; for storing:

&lt;ul&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;Database creds&lt;/li&gt;
&lt;li&gt;Model-specific config&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Accessed at runtime only via scoped roles. Rotated. Audited.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Logging &amp;amp; Monitoring: If You Can’t See It, You Can’t Secure It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CloudWatch Logs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Capture API requests (via API Gateway logging).&lt;/li&gt;
&lt;li&gt;Log custom metrics: request duration, model inference times, failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CloudTrail:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enabled globally.&lt;/li&gt;
&lt;li&gt;Monitors:

&lt;ul&gt;
&lt;li&gt;IAM role usage&lt;/li&gt;
&lt;li&gt;S3 access&lt;/li&gt;
&lt;li&gt;Secrets Manager requests&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Export logs to S3 and send alarms via SNS if weird things happen — like someone trying to access from &lt;code&gt;us-east-5&lt;/code&gt;...&lt;/p&gt;

&lt;h3&gt;
  
  
  GuardDuty + Security Hub:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Detects anomalies: port scanning, unexpected API usage, etc.&lt;/li&gt;
&lt;li&gt;Integrate with your SIEM or just let it yell at your DevSecOps channel in Slack.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. API Security: No One Hits My Endpoint Without ID
&lt;/h2&gt;

&lt;p&gt;Your API Gateway isn’t public candy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Options:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IAM auth&lt;/strong&gt; for internal services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Auth&lt;/strong&gt; for user-level access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys + usage plans&lt;/strong&gt; for partner integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF&lt;/strong&gt; rules to rate-limit, IP block, and reject known bad patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can even use Lambda authorizers if you want to get creative with token validation (which is what I did).&lt;/p&gt;
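
&lt;p&gt;For readers who have not written one, here is a minimal sketch of a TOKEN-type Lambda authorizer for a REST API. It is not the authorizer used on this platform: the shared-token comparison, the &lt;code&gt;voice-api-client&lt;/code&gt; principal, and the hard-coded placeholder hash are assumptions for illustration. A real deployment would validate a signed token and load its secret from Secrets Manager.&lt;/p&gt;

```python
import hashlib
import hmac

# Placeholder only: in practice the expected value would come from Secrets Manager.
EXPECTED_TOKEN_SHA256 = hashlib.sha256(b"example-token").hexdigest()


def build_policy(principal_id: str, effect: str, method_arn: str) -> dict:
    """Shape of the response API Gateway expects from a Lambda authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": method_arn}
            ],
        },
    }


def handler(event: dict, context) -> dict:
    """Allow the call only when the presented token hashes to the expected value."""
    token = event.get("authorizationToken", "")
    presented = hashlib.sha256(token.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    allowed = hmac.compare_digest(presented, EXPECTED_TOKEN_SHA256)
    return build_policy("voice-api-client", "Allow" if allowed else "Deny", event["methodArn"])
```

&lt;p&gt;Returning an explicit &lt;code&gt;Deny&lt;/code&gt; policy rather than raising keeps rejected calls visible in the authorizer's own logs.&lt;/p&gt;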

&lt;h2&gt;
  
  
  6. Isolation By Design
&lt;/h2&gt;

&lt;p&gt;If you’re multi-tenant (e.g., supporting multiple departments or clients):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isolate environments by &lt;strong&gt;account&lt;/strong&gt; (best) or &lt;strong&gt;VPC/namespace&lt;/strong&gt; (acceptable).&lt;/li&gt;
&lt;li&gt;Separate S3 prefixes per tenant with enforced IAM policies.&lt;/li&gt;
&lt;li&gt;Never share audio files or inference containers across customers unless the data is anonymized and the sharing is approved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bonus: tag everything (&lt;code&gt;Environment&lt;/code&gt;, &lt;code&gt;Owner&lt;/code&gt;, &lt;code&gt;DataSensitivity&lt;/code&gt;) to support automated compliance checks.&lt;/p&gt;
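
&lt;p&gt;One way to enforce per-tenant S3 prefixes is to generate each tenant's policy from a template, so nobody hand-writes a resource ARN. The sketch below assumes an &lt;code&gt;audio/TENANT/&lt;/code&gt; prefix layout and alphanumeric tenant IDs. Both are illustrative choices, as is the function name.&lt;/p&gt;

```python
def tenant_policy(bucket: str, tenant_id: str) -> dict:
    """Build an IAM policy document limited to one tenant's audio prefix."""
    # Rejecting anything but letters and digits blocks wildcard or path tricks in the ARN.
    if not tenant_id.isalnum():
        raise ValueError("tenant_id must be alphanumeric")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/audio/{tenant_id}/*",
            }
        ],
    }
```

&lt;p&gt;Calling &lt;code&gt;tenant_policy("voice-clone-prod", "acme")&lt;/code&gt; yields a statement scoped to &lt;code&gt;arn:aws:s3:::voice-clone-prod/audio/acme/*&lt;/code&gt; and nothing else.&lt;/p&gt;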

&lt;h2&gt;
  
  
  7. Compliance: Make Auditors Say “Wow”
&lt;/h2&gt;

&lt;p&gt;HIPAA? SOC 2? GDPR? CCPA? No problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What They’ll Want:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Encryption policies (check)&lt;/li&gt;
&lt;li&gt;Logging and access monitoring (check)&lt;/li&gt;
&lt;li&gt;User access controls (check)&lt;/li&gt;
&lt;li&gt;Data retention and deletion capabilities (also check)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 lifecycle policies (auto-delete after 90 days)&lt;/li&gt;
&lt;li&gt;Explicit &lt;code&gt;s3:DeleteObject&lt;/code&gt; permission in IAM for the roles that handle deletion requests&lt;/li&gt;
&lt;li&gt;Audit report generation from CloudTrail + Athena queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They won’t just nod — they’ll &lt;strong&gt;invite you to present&lt;/strong&gt; at their next audit prep session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Security Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Secured With&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IAM Roles&lt;/td&gt;
&lt;td&gt;Scoped to service/resource level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Buckets&lt;/td&gt;
&lt;td&gt;KMS encryption + bucket policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;Auth, WAF, throttling, logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;Private subnets, no public IPs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets&lt;/td&gt;
&lt;td&gt;Secrets Manager + least-privilege access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;CloudWatch, CloudTrail, GuardDuty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;Automated logs + data lifecycle enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Security in voice AI isn’t optional — especially when you’re generating content that sounds like &lt;strong&gt;your employees, agents, or doctors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Done right, a voice cloning platform on AWS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps customer data locked down&lt;/li&gt;
&lt;li&gt;Delivers zero-trust compliance&lt;/li&gt;
&lt;li&gt;Maintains auditability for even the most intense regulatory environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And best of all? It still scales, still performs, and still costs less than most per-character voice APIs.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>The ROI of Voice Automation: Cost Savings and Efficiency Gains from Self-Hosted Voice Clones on AWS</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Mon, 16 Jun 2025 15:29:13 +0000</pubDate>
      <link>https://dev.to/semperfitodd/the-roi-of-voice-automation-cost-savings-and-efficiency-gains-from-self-hosted-voice-clones-on-aws-4kmn</link>
      <guid>https://dev.to/semperfitodd/the-roi-of-voice-automation-cost-savings-and-efficiency-gains-from-self-hosted-voice-clones-on-aws-4kmn</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Guy Who’d Rather Pay for Compute Than Per-Character TTS Pricing&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Let’s skip the buzzwords and get straight to what your CFO actually cares about: &lt;strong&gt;does this AI voice thing save money&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;The answer is yes — if you do it right. That means not paying extra per character to a SaaS platform that charges more to say “please hold” than a human would to just answer the call.&lt;/p&gt;

&lt;p&gt;This article lays out the real-world return on investment (ROI) of deploying a &lt;strong&gt;self-hosted voice cloning platform on AWS&lt;/strong&gt;, based on what I’ve built — and what you can too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Pay-Per-Sentence
&lt;/h2&gt;

&lt;p&gt;Managed voice APIs (Polly, ElevenLabs, you name it) are fantastic for prototypes. But scale them up and they’ll chew through your budget faster than a sales team with an open bar.&lt;/p&gt;

&lt;p&gt;Let’s say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You send 100,000 personalized voice messages per month.&lt;/li&gt;
&lt;li&gt;Each message averages 800 characters.&lt;/li&gt;
&lt;li&gt;That’s 80,000,000 characters — or &lt;strong&gt;$240/month minimum&lt;/strong&gt; with Polly.&lt;/li&gt;
&lt;li&gt;Multiply that by 12 months and you’re paying &lt;strong&gt;$2,880/year&lt;/strong&gt; just to say the same things over and over again.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now imagine that same workload running &lt;strong&gt;inside your AWS account&lt;/strong&gt;, on &lt;strong&gt;your infrastructure&lt;/strong&gt;, with &lt;strong&gt;no recurring per-character licensing&lt;/strong&gt;.&lt;/p&gt;
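
&lt;p&gt;The arithmetic behind those figures is easy to rerun with your own volumes. In the sketch below, the rate of $3 per million characters is simply what the $240 figure implies for 80 million characters. It is not a quoted price, so check current vendor pricing before relying on it.&lt;/p&gt;

```python
def monthly_tts_cost(messages: int, chars_per_message: int,
                     usd_per_million_chars: float) -> tuple:
    """Return (total characters, monthly cost in USD) for a per-character TTS service."""
    total_chars = messages * chars_per_message
    return total_chars, total_chars / 1_000_000 * usd_per_million_chars


# Inputs from the scenario above; the rate is an assumption implied by its $240 total.
chars, monthly = monthly_tts_cost(100_000, 800, 3.0)
annual = monthly * 12
print(f"{chars:,} characters -> ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

&lt;p&gt;Because cost scales linearly with characters, doubling either the message count or the message length doubles the bill.&lt;/p&gt;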

&lt;h2&gt;
  
  
  Where the Savings Come From
&lt;/h2&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Hosting
&lt;/h3&gt;

&lt;p&gt;Use open-source models like &lt;strong&gt;Tortoise-TTS&lt;/strong&gt; or &lt;strong&gt;Coqui&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No licensing fees.&lt;/li&gt;
&lt;li&gt;Full control over inference.&lt;/li&gt;
&lt;li&gt;Deploy via &lt;strong&gt;EKS&lt;/strong&gt;, &lt;strong&gt;Lambda&lt;/strong&gt;, or &lt;strong&gt;SageMaker&lt;/strong&gt; depending on workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Compute Strategy
&lt;/h3&gt;

&lt;p&gt;You’re not running this thing 24/7 — you’re processing jobs in bursts. That’s what AWS does best.&lt;/p&gt;

&lt;p&gt;Options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt; for short jobs (&amp;lt;15s).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EKS spot&lt;/strong&gt; for longer, cost-effective bursts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker endpoints&lt;/strong&gt; for real-time inference with GPU when needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;Audio and logs live in &lt;strong&gt;Amazon S3&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard + Infrequent Access tiers.&lt;/li&gt;
&lt;li&gt;Lifecycle policies auto-archive old content.&lt;/li&gt;
&lt;li&gt;Total cost for 100,000 audio files (10 sec each): ~$2/month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reuse and Replay
&lt;/h3&gt;

&lt;p&gt;One of the biggest wins of self-hosted: &lt;strong&gt;cache and reuse output&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did Jane Smith’s insurance reminder change? No? Reuse last month’s voice file.&lt;/li&gt;
&lt;li&gt;Store hashed scripts → check before reprocessing.&lt;/li&gt;
&lt;li&gt;Huge savings. Huge.&lt;/li&gt;
&lt;/ul&gt;
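
&lt;p&gt;The "store hashed scripts, check before reprocessing" idea can be sketched as a small cache keyed on the voice and the normalized script text. The in-memory dictionary stands in for whatever store you use (S3 keys or a database table), and the class name and the &lt;code&gt;synthesize&lt;/code&gt; callback are hypothetical.&lt;/p&gt;

```python
import hashlib


class VoiceCache:
    """Reuse previously generated audio when the same voice says the same script."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable(voice_id, script) returning audio bytes
        self._store = {}
        self.misses = 0

    @staticmethod
    def key_for(voice_id: str, script: str) -> str:
        # Collapse whitespace so cosmetic edits do not trigger a new inference run.
        normalized = " ".join(script.split())
        return hashlib.sha256(f"{voice_id}\n{normalized}".encode("utf-8")).hexdigest()

    def get_audio(self, voice_id: str, script: str) -> bytes:
        key = self.key_for(voice_id, script)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(voice_id, script)
        return self._store[key]
```

&lt;p&gt;Including the voice ID in the key matters: the same script in two different voices must never share a cached file.&lt;/p&gt;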

&lt;h3&gt;
  
  
  Automation and CI/CD
&lt;/h3&gt;

&lt;p&gt;Terraform + GitHub Actions = no manual deployment overhead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost to manage: low.&lt;/li&gt;
&lt;li&gt;Time to deploy new voices or updates: minutes.&lt;/li&gt;
&lt;li&gt;Maintenance: minimal (patch EKS images monthly or use managed runtime updates).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  But Wait, There’s More (Than Cost)
&lt;/h2&gt;

&lt;p&gt;It’s not just about saving money. It’s about &lt;strong&gt;what you unlock&lt;/strong&gt; when you stop renting voices and start owning your own pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;New voices in minutes, not 2 weeks waiting on a vendor’s custom voice program.&lt;/li&gt;
&lt;li&gt;Edits and updates in minutes — push a commit, redeploy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Privacy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No PII leaves your AWS environment.&lt;/li&gt;
&lt;li&gt;No “for quality and training purposes” clause buried in a vendor contract.&lt;/li&gt;
&lt;li&gt;You control retention, logging, and compliance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;p&gt;You’re in control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale EKS pods based on SQS queue depth.&lt;/li&gt;
&lt;li&gt;Optionally use Step Functions for batch workflows.&lt;/li&gt;
&lt;li&gt;Go global with CloudFront + S3 for voice file distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Example: Insurance Use Case
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: An insurance company sends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50,000 monthly reminders.&lt;/li&gt;
&lt;li&gt;25,000 claims updates.&lt;/li&gt;
&lt;li&gt;10,000 wellness check-in messages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Managed TTS Cost&lt;/strong&gt;: ~$2,280/month&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Self-Hosted AWS Cost&lt;/strong&gt;: ~$150/month (including compute, storage, monitoring)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annual Savings&lt;/strong&gt;: About &lt;strong&gt;$25,560&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now toss in brand voice control, security, reusability, and better CX — and you’ve got an ROI case that even the most skeptical exec will nod at between Slack messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Total Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly Estimate (Self-Hosted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EKS Compute (Spot)&lt;/td&gt;
&lt;td&gt;$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Storage&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets Manager&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD (GitHub)&lt;/td&gt;
&lt;td&gt;Free (or already included)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$130-$150/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Compared to managed APIs at more than 10x that cost, with less flexibility.&lt;/em&gt;&lt;/p&gt;
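
&lt;p&gt;If you want to sanity-check the table, the sketch below sums the line items and compares the result with the roughly $2,280/month managed figure from the insurance example. The dictionary is just the table retyped, with CI/CD counted as zero.&lt;/p&gt;

```python
# Monthly self-hosted estimates from the table above, in USD.
components = {
    "EKS Compute (Spot)": 100,
    "S3 Storage": 10,
    "CloudWatch Logs": 15,
    "Secrets Manager": 5,
    "CI/CD (GitHub)": 0,  # listed as free or already included
}

total = sum(components.values())
managed_monthly = 2280  # managed TTS figure from the insurance example

print(f"Self-hosted total: ${total}/month")
print(f"Managed vs. self-hosted: {managed_monthly / total:.1f}x")
```

&lt;p&gt;The line items land at the low end of the quoted range, which leaves some headroom for data transfer and the odd on-demand node.&lt;/p&gt;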

&lt;h2&gt;
  
  
  ROI Bonus Points
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Reuse recordings? ✅&lt;/li&gt;
&lt;li&gt;Clone internal voices? ✅&lt;/li&gt;
&lt;li&gt;Multilingual support? ✅&lt;/li&gt;
&lt;li&gt;Sync to CRM or EMR systems? ✅&lt;/li&gt;
&lt;li&gt;Monetize the platform as a service offering? Don’t tempt me.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re still paying per character for voice automation, it’s time to ask why.&lt;/p&gt;

&lt;p&gt;AWS gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control&lt;/li&gt;
&lt;li&gt;Cost savings&lt;/li&gt;
&lt;li&gt;Flexibility&lt;/li&gt;
&lt;li&gt;Compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You just need the courage (and maybe some Terraform modules) to build it.&lt;/p&gt;

&lt;p&gt;And once you do? You own the pipeline, the experience, and the margins. That’s not just ROI — that’s a competitive advantage.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AI Voices in Healthcare: Ensuring Privacy and Compliance with AWS-Powered Voice Cloning</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Fri, 13 Jun 2025 14:46:07 +0000</pubDate>
      <link>https://dev.to/semperfitodd/ai-voices-in-healthcare-ensuring-privacy-and-compliance-with-aws-powered-voice-cloning-2a43</link>
      <guid>https://dev.to/semperfitodd/ai-voices-in-healthcare-ensuring-privacy-and-compliance-with-aws-powered-voice-cloning-2a43</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Voice Cloning Nerd with a Respect for HIPAA and Heavy Deadlifts&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Healthcare doesn’t mess around when it comes to privacy. Between HIPAA, HITRUST, and the unofficial but very real “don’t you dare leak my test results” rule, any AI solution operating in this space better know how to behave.&lt;/p&gt;

&lt;p&gt;So when I decided to bring voice cloning — yes, real-time AI-generated voices — into healthcare workflows, I knew two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It had to &lt;em&gt;feel&lt;/em&gt; human.&lt;/li&gt;
&lt;li&gt;It had to &lt;em&gt;act&lt;/em&gt; like a Raider-trained compliance officer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s talk about how we built a fully self-hosted, AWS-powered voice cloning platform designed for &lt;strong&gt;healthcare environments&lt;/strong&gt; — balancing personalization with the paranoia (justified!) that comes with handling PHI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Voice Cloning in Healthcare?
&lt;/h2&gt;

&lt;p&gt;Simple: people trust people, not robots.&lt;/p&gt;

&lt;p&gt;Voice matters when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A nurse gives post-op instructions.&lt;/li&gt;
&lt;li&gt;A doctor shares lab results.&lt;/li&gt;
&lt;li&gt;A health coach follows up on a treatment plan.&lt;/li&gt;
&lt;li&gt;A reminder tells someone to refill their prescription.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now imagine all that happening &lt;strong&gt;automatically&lt;/strong&gt;, 24/7, in the patient’s language and tone preference — without overloading human staff.&lt;/p&gt;

&lt;p&gt;That’s where AI voice cloning comes in. But only if it’s private, secure, and compliant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step One: Host It Yourself (on AWS)
&lt;/h2&gt;

&lt;p&gt;Unlike third-party voice APIs that send data off into the magical ether (along with your compliance budget), our platform runs &lt;strong&gt;100% inside your AWS account&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Stack:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EKS&lt;/strong&gt; for compute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3&lt;/strong&gt; for audio storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; to receive input and trigger inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM roles&lt;/strong&gt; scoped to specific services (no wide-open buckets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudTrail&lt;/strong&gt; and &lt;strong&gt;CloudWatch&lt;/strong&gt; for audit and observability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform&lt;/strong&gt; for everything (because of course)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All audio data — both input and output — remains fully encrypted, access-controlled, and traceable.&lt;/p&gt;

&lt;h2&gt;
  
  
  HIPAA Compliance: More Than Just a Checkbox
&lt;/h2&gt;

&lt;p&gt;Want to make an auditor smile? Do this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Encryption
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;At rest&lt;/strong&gt;: S3 + AWS KMS-managed keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In transit&lt;/strong&gt;: TLS 1.2+ enforced everywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Access Control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IAM roles scoped per service.&lt;/li&gt;
&lt;li&gt;No user access to buckets.&lt;/li&gt;
&lt;li&gt;API Gateway protected with a &lt;strong&gt;Lambda authorizer&lt;/strong&gt; that validates custom tokens.&lt;/li&gt;
&lt;/ul&gt;
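
&lt;p&gt;For the token check in front of API Gateway, a token-based Lambda authorizer might look roughly like this. It is a sketch, not the platform's actual code: the &lt;code&gt;EXPECTED_TOKEN&lt;/code&gt; environment variable and the &lt;code&gt;voice-api-client&lt;/code&gt; principal are hypothetical stand-ins, and a real deployment would more likely pull the secret from Secrets Manager.&lt;/p&gt;

```python
import hmac
import os


def build_policy(principal_id, effect, resource):
    """Build the IAM policy document API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
    }


def handler(event, context):
    """Token authorizer entry point: allow only when the token matches."""
    expected = os.environ.get("EXPECTED_TOKEN", "")  # hypothetical config source
    supplied = event.get("authorizationToken", "")
    # Constant-time comparison; an unset expected token never matches.
    ok = bool(expected) and hmac.compare_digest(supplied.encode(), expected.encode())
    effect = "Allow" if ok else "Deny"
    return build_policy("voice-api-client", effect, event["methodArn"])
```

&lt;p&gt;Returning an explicit &lt;code&gt;Deny&lt;/code&gt; rather than raising keeps the failure path boring and easy to audit.&lt;/p&gt;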

&lt;h3&gt;
  
  
  Auditing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CloudTrail logs AWS API calls (turn on S3 data events to capture object-level access).&lt;/li&gt;
&lt;li&gt;CloudWatch logs all inference requests, failures, and usage patterns.&lt;/li&gt;
&lt;li&gt;Optional integration with &lt;strong&gt;Security Hub&lt;/strong&gt; and &lt;strong&gt;GuardDuty&lt;/strong&gt; for threat detection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Residency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Deploy to specific AWS regions.&lt;/li&gt;
&lt;li&gt;Restrict S3 bucket replication or data movement across borders.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Retention Policies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lifecycle rules on S3 buckets for data expiration.&lt;/li&gt;
&lt;li&gt;Optional patient-specific TTL enforcement via tagging.&lt;/li&gt;
&lt;/ul&gt;
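
&lt;p&gt;Here is one way the retention rules could be expressed with boto3. Treat it as a sketch under assumptions: the rule IDs, the &lt;code&gt;retention=short&lt;/code&gt; tag, and the day counts are invented for illustration, and the original project may well define the same thing in Terraform instead.&lt;/p&gt;

```python
def lifecycle_config(default_days=90, tag_key="retention", tag_value="short", tagged_days=30):
    """Build an S3 lifecycle configuration with a tag-scoped TTL rule.

    All names and day counts are hypothetical defaults.
    """
    return {
        "Rules": [
            {
                # Objects tagged at upload time expire on the shorter clock.
                "ID": "expire-tagged-audio",
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Status": "Enabled",
                "Expiration": {"Days": tagged_days},
            },
            {
                # Everything else in the bucket expires on the default clock.
                "ID": "expire-all-audio",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"Days": default_days},
            },
        ]
    }


def apply_lifecycle(bucket, config, client=None):
    """Push the configuration to a bucket; boto3 is imported only when needed."""
    if client is None:
        import boto3

        client = boto3.client("s3")
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

&lt;p&gt;Usage would be along the lines of &lt;code&gt;apply_lifecycle("my-voice-audio", lifecycle_config())&lt;/code&gt;, where the bucket name is a placeholder.&lt;/p&gt;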

&lt;h2&gt;
  
  
  Real-World Healthcare Use Cases
&lt;/h2&gt;

&lt;p&gt;Let’s get specific. Here’s what this platform can do &lt;strong&gt;today&lt;/strong&gt; in healthcare:&lt;/p&gt;

&lt;h3&gt;
  
  
  Post-Op Follow-ups
&lt;/h3&gt;

&lt;p&gt;Patients receive a voice message that sounds like their nurse, detailing what to watch for, when to call, and how to care for themselves. Delivered at scale. Personalized. Consistent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prescription Reminders
&lt;/h3&gt;

&lt;p&gt;A voice reminder that says, “Hi James, it’s time to refill your Metformin.” Not a generic robovoice — their actual provider’s voice. Higher adherence. Lower readmission.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mental Health Coaching
&lt;/h3&gt;

&lt;p&gt;Cloned voices with tone-aware delivery can help deliver supportive messages in a &lt;strong&gt;non-threatening&lt;/strong&gt;, empathetic way — even in different languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pediatric Care Instructions
&lt;/h3&gt;

&lt;p&gt;Parents hear instructions from the doctor their child saw — not a stranger. Less confusion, more trust, and fewer frantic follow-up calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Snapshot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Patient Input] → [API Gateway] → [EKS]
       ↓                             ↓
    [Auth]                    [Voice Cloning Container]
       ↓                             ↓
 [Audit Logs] ← CloudWatch ← S3 Storage → [Frontend or IVR System]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is logged. Nothing leaks. And your IT security team gets dashboards they can show off at compliance reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security-First Development Practices
&lt;/h2&gt;

&lt;p&gt;We didn’t stop at infra:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All containers are scanned via Amazon ECR vulnerability scanning.&lt;/li&gt;
&lt;li&gt;Enforced static code checks and Terraform validations.&lt;/li&gt;
&lt;li&gt;No hardcoded secrets — everything’s injected at runtime via Secrets Manager (really easy with boto3).&lt;/li&gt;
&lt;/ul&gt;
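
&lt;p&gt;Since boto3 came up, here is roughly what runtime secret injection looks like. The helper name and the assumption that the secret is stored as a JSON string are mine; &lt;code&gt;get_secret_value&lt;/code&gt; is the standard Secrets Manager call.&lt;/p&gt;

```python
import json


def load_secret(secret_id, client=None):
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict.

    Assumes the secret was stored as a JSON string (SecretString).
    """
    if client is None:
        # Imported lazily so the helper can be tested with a stub client.
        import boto3

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

&lt;p&gt;The pod's IAM role does the authenticating, so nothing sensitive has to live in the image or the repo.&lt;/p&gt;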

&lt;h2&gt;
  
  
  Cost? Reasonable. Sanity? Preserved.
&lt;/h2&gt;

&lt;p&gt;With EKS + spot pricing, inference costs can be as low as &lt;strong&gt;fractions of a cent&lt;/strong&gt; per request. Compare that to vendor APIs charging you per character and throwing your data in a training set you never approved.&lt;/p&gt;

&lt;p&gt;Also: owning your platform means &lt;strong&gt;you set the rules&lt;/strong&gt; — not some ML black box team you’ve never met.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Custom Solutions?
&lt;/h2&gt;

&lt;p&gt;Polly is great for standard TTS tasks, but it won’t let you natively train your own voice models. That’s a dealbreaker.&lt;/p&gt;

&lt;p&gt;With our custom approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You control the model.&lt;/li&gt;
&lt;li&gt;You define what’s stored and what’s deleted.&lt;/li&gt;
&lt;li&gt;You can version models per patient, provider, or condition.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Healthcare deserves better than phone trees and tinny robovoices. It deserves personalization &lt;em&gt;and&lt;/em&gt; privacy. That’s not a contradiction — that’s architecture.&lt;/p&gt;

&lt;p&gt;This voice cloning platform gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full HIPAA-compliant deployment in AWS&lt;/li&gt;
&lt;li&gt;Secure, scalable model inference&lt;/li&gt;
&lt;li&gt;Meaningful, personalized communication at scale&lt;/li&gt;
&lt;li&gt;Peace of mind for patients &lt;em&gt;and&lt;/em&gt; compliance teams&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>aws</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Voice Cloning for Financial Services: Revolutionizing Customer Engagement in a Secure AWS Environment</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Thu, 12 Jun 2025 17:09:22 +0000</pubDate>
      <link>https://dev.to/semperfitodd/voice-cloning-for-financial-services-revolutionizing-customer-engagement-in-a-secure-aws-5dpf</link>
      <guid>https://dev.to/semperfitodd/voice-cloning-for-financial-services-revolutionizing-customer-engagement-in-a-secure-aws-5dpf</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Slightly Over-Caffeinated Cloud Nerd&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If there’s one thing financial institutions love more than acronyms, it’s trust. And if there’s one thing their customers can’t stand, it’s robotic voice systems that sound like they were pulled from a 1995 infomercial. Welcome to the intersection of personalization, security, and scale — where voice cloning and AWS meet to deliver something banks didn’t know they needed but now absolutely do.&lt;/p&gt;

&lt;p&gt;This article dives deep into how a self-hosted, AWS-powered voice cloning platform (built by yours truly) can transform customer engagement in finance — all while checking the boxes on &lt;strong&gt;security&lt;/strong&gt;, &lt;strong&gt;compliance&lt;/strong&gt;, and &lt;strong&gt;cost efficiency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;See how I cloned my own voice on EKS.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Voice Cloning in Finance?
&lt;/h2&gt;

&lt;p&gt;Customer experience in financial services is, well... lagging. Long hold times, disconnected call scripts, and the “please enter your account number followed by pound” robot voice aren’t helping your NPS.&lt;/p&gt;

&lt;p&gt;Enter voice cloning — not the gimmicky, deepfake-adjacent nonsense, but a real, controlled, secure AI system that speaks &lt;strong&gt;like your people&lt;/strong&gt;. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loan officers sending personalized voice messages to clients.&lt;/li&gt;
&lt;li&gt;Fraud alerts spoken in a trusted representative’s voice.&lt;/li&gt;
&lt;li&gt;Wealth management updates delivered as though your advisor recorded them at 5AM just for you (which, let’s be honest, they didn’t).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  But Is It Secure?
&lt;/h2&gt;

&lt;p&gt;Glad you asked, compliance team.&lt;/p&gt;

&lt;p&gt;This solution runs &lt;strong&gt;entirely within your AWS account&lt;/strong&gt;, deployed with Terraform, and locked down tighter than a vault in Zurich.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM and Zero Trust
&lt;/h3&gt;

&lt;p&gt;Fine-grained IAM roles mean &lt;strong&gt;no unnecessary access&lt;/strong&gt;. Your API Gateway only talks to your EKS/Lambda backend. CloudWatch is there to rat out any shady behavior. There are &lt;strong&gt;no wildcard permissions&lt;/strong&gt;, no “trust me, bro” roles. This is zero-trust, Marine-style.&lt;/p&gt;

&lt;h3&gt;
  
  
  Private Networking
&lt;/h3&gt;

&lt;p&gt;The inference engine? Lives in a &lt;strong&gt;private subnet&lt;/strong&gt;, behind a NAT gateway, with zero public internet exposure. Only API Gateway (optionally fronted by WAF and Cognito for auth) gets a whiff of the outside world.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Sovereignty
&lt;/h3&gt;

&lt;p&gt;All voice data — input, output, and model artifacts — stay in &lt;strong&gt;your&lt;/strong&gt; encrypted S3 buckets. Managed with &lt;strong&gt;KMS&lt;/strong&gt;, audit-logged with &lt;strong&gt;CloudTrail&lt;/strong&gt;, and optionally replicated across &lt;strong&gt;regions&lt;/strong&gt; for DR. You want to keep it in-country? Easy. You want retention policies? Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Considerations: Polly vs. Clone
&lt;/h2&gt;

&lt;p&gt;Let’s not kid ourselves — Polly’s cheap. Until it isn’t.&lt;/p&gt;

&lt;p&gt;If you’re doing high-volume interactions, especially personalized ones, Polly’s per-character pricing quickly adds up. And don’t forget, Polly’s voices aren’t &lt;em&gt;yours&lt;/em&gt;. You’re just renting them, like a tux that fits weird in the shoulders.&lt;/p&gt;

&lt;p&gt;With a self-hosted solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run inference on &lt;strong&gt;spot EKS nodes&lt;/strong&gt; for efficiency.&lt;/li&gt;
&lt;li&gt;Use batching strategies for outbound messages.&lt;/li&gt;
&lt;li&gt;Control your hardware (yes, even GPUs if you want to be extra fancy with SageMaker).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;End result? Lower cost at scale, and a voice pipeline you &lt;strong&gt;own&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Let’s talk use cases that actually matter to finance.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Loan Decisions That Don’t Sound Robotic&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your platform can generate approval or denial messages in the same voice that onboarded the customer. Humanizing the experience reduces complaints and increases clarity — especially when tone and inflection match the gravity of the message.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;High-Touch Wealth Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Top-tier clients expect personalization. Sending periodic market updates or insights in a familiar voice — even when pre-recorded — maintains engagement without chewing up your advisor’s calendar.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Fraud Alerts with Trust&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Fraud is sensitive. Customers ignore robocalls, but if it &lt;em&gt;sounds&lt;/em&gt; like the rep they spoke to last week? Now you’ve got their attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Interactive Voice Portals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine an IVR that doesn’t sound like every other bank. One that adapts tone to customer segment, preferred language, or even regional accent. All while running on infrastructure you control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance: Because Auditors Are People Too
&lt;/h2&gt;

&lt;p&gt;Here’s what regulators care about, and how this solution handles it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;How It’s Handled&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All data encrypted at rest (S3/KMS) and in transit (HTTPS/TLS 1.2+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auditability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CloudTrail + CloudWatch logs on every transaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Access controls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM policies restrict roles to least privilege&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Geolocation controls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bucket policies, VPC restrictions, and region pinning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data retention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automated TTL and lifecycle policies in S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PII isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate storage, tagging, and policy enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn’t just compliant. It’s auditor catnip.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Snapshot
&lt;/h2&gt;

&lt;p&gt;Here’s a high-level view of what powers this thing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Static React app hosted on Amazon S3 + CloudFront.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend API&lt;/strong&gt;: Amazon API Gateway + Amazon EKS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Inference&lt;/strong&gt;: Open-source TTS model (like Tortoise-TTS) wrapped in Docker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: Amazon S3 with KMS, versioning, lifecycle rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: IAM, VPC, CloudTrail, CloudWatch, WAF, Cognito (optional).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infra Management&lt;/strong&gt;: Terraform, like every project that respects itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yes, it’s all in code. No click-ops here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personalization That Scales
&lt;/h2&gt;

&lt;p&gt;Here’s the real kicker: you don’t have to build one voice. You can build &lt;strong&gt;hundreds&lt;/strong&gt;. For:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Branch-specific greetings&lt;/li&gt;
&lt;li&gt;Multilingual support&lt;/li&gt;
&lt;li&gt;Client segmentation&lt;/li&gt;
&lt;li&gt;Seasonal promos ("Happy Holidays from First Trust!")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it’s reproducible, auditable, and automated — a CI/CD dream for voice systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Financial institutions that want to stay relevant in 2025 and beyond need to stop thinking like call centers and start thinking like &lt;strong&gt;brand experience engines&lt;/strong&gt;. Voice is the next frontier — and not the kind that yells at you to reset your PIN.&lt;/p&gt;

&lt;p&gt;If you're serious about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Controlling costs,&lt;/li&gt;
&lt;li&gt;Strengthening compliance,&lt;/li&gt;
&lt;li&gt;Enhancing trust,&lt;/li&gt;
&lt;li&gt;And delivering &lt;em&gt;real&lt;/em&gt; personalization...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then building your own voice platform on AWS isn’t just viable — it’s inevitable.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Written by:&lt;/strong&gt; Todd Bernson, CTO, Voice Cloning Nerd, USMC Vet, and Probably Lifting Something Heavy Right Now&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Beyond Polly: Custom Voice Cloning on AWS vs. Using Native AWS AI Services</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Wed, 11 Jun 2025 17:17:16 +0000</pubDate>
      <link>https://dev.to/semperfitodd/beyond-polly-custom-voice-cloning-on-aws-vs-using-native-aws-ai-services-34ep</link>
      <guid>https://dev.to/semperfitodd/beyond-polly-custom-voice-cloning-on-aws-vs-using-native-aws-ai-services-34ep</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, Voice Architect, and Guy Who Politely Declined Polly’s Help Because He Could Do It Better Himself&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Let’s get something straight: Amazon Polly is great — until it isn’t. If you’re building a chatbot, narrating product updates, or making your app sound vaguely robotic (in a “pleasant call center” way), Polly delivers. It’s fast, it’s affordable, and it supports multiple languages with all the predictable cheer of a Disney ride operator.&lt;/p&gt;

&lt;p&gt;But what happens when you want your voice app to sound... like you? Or your CEO? Or your 90-year-old grandfather? What if you need complete control over pronunciation, tone, pause patterns, and the ability to train on custom audio that would make Polly blush?&lt;/p&gt;

&lt;p&gt;This is where the polite façade of managed services starts to fray, and custom voice cloning takes the stage — enter my self-hosted, AWS-powered, open-source driven voice cloning platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Polly: The Managed Marvel
&lt;/h3&gt;

&lt;p&gt;Let’s give credit where it’s due. Polly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is easy to use.&lt;/li&gt;
&lt;li&gt;Scales automatically.&lt;/li&gt;
&lt;li&gt;Requires zero infrastructure.&lt;/li&gt;
&lt;li&gt;Has SDKs for everything from Python to C++ to Amazon’s favorite child: JavaScript.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading weather forecasts aloud.&lt;/li&gt;
&lt;li&gt;Voicing automated reminders.&lt;/li&gt;
&lt;li&gt;Anything with a script that doesn’t care if it sounds like everyone else.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it’s not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customizable beyond SSML tags.&lt;/li&gt;
&lt;li&gt;Trainable on new voices.&lt;/li&gt;
&lt;li&gt;Particularly &lt;em&gt;human&lt;/em&gt; in tone or nuance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For regulated industries like finance and healthcare — where personalization, privacy, and control matter more than a “cheerful male voice number 4” — Polly’s out-of-the-box charm wears thin.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building a Custom Voice Cloner (Like a Lunatic With Free Time)
&lt;/h3&gt;

&lt;p&gt;So I did what any sensible AI engineer would do: built my own (Gunny Highway voice: "Improvise, Adapt, Overcome.").&lt;/p&gt;

&lt;p&gt;This custom voice cloning app runs entirely in AWS — but not using AWS ML services like Polly or Bedrock. Instead, it’s built around open-source models like &lt;strong&gt;Tortoise-TTS&lt;/strong&gt;, containerized, and deployed on EKS, with full integration across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3&lt;/strong&gt; (storage for audio input/output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EKS&lt;/strong&gt; (inference jobs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; (entry point)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM&lt;/strong&gt; (tight security, no wildcard party hats)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt; (observability for when someone uploads 17-minute TED Talks for cloning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a black box that behaves the way I want it to: securely, at scale, with custom voices and zero vendor lock-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Custom?
&lt;/h3&gt;

&lt;p&gt;Here’s the deal:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Voice Uniqueness&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Custom voice cloning allows you to train on &lt;em&gt;your own&lt;/em&gt; audio samples. Want to sound like Morgan Freeman’s long-lost cousin? No problem (as long as you have the licensing — stay legal, kids).&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Full Control Over Output&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;With Polly, you’re stuck adjusting speech patterns via markup. With Tortoise-TTS and similar models, you can control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intonation&lt;/li&gt;
&lt;li&gt;Breathing pauses&lt;/li&gt;
&lt;li&gt;Emotional delivery&lt;/li&gt;
&lt;li&gt;Speech rate based on training inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is priceless when crafting a brand experience, or in sensitive use cases like reading lab results to patients or delivering loan decisions with empathy.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Data Privacy and Residency&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If you're working in finance or healthcare, you already know: data sovereignty is everything. When you run the model inside your own AWS account, using private S3 buckets and hardened VPCs, you're no longer just compliant — you're &lt;em&gt;bulletproof&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;No customer voice data ever leaves your control. No vendor logs. No “AI improvement” clause buried in the EULA.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Cost at Scale&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Managed services shine at low volume. But clone 100,000 personalized voicemails a day and Polly's per-character pricing turns into a CFO’s nightmare.&lt;/p&gt;
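
&lt;p&gt;A rough way to see where per-character pricing starts to hurt. Every number in the example is a placeholder, not a quoted price: plug in the current rate from your vendor's price list, your own message length, and your own self-hosted bill (which will also grow with volume, just more slowly).&lt;/p&gt;

```python
def managed_cost(chars_per_month, price_per_million_chars):
    """Monthly cost of a per-character TTS service, in USD."""
    return chars_per_month / 1_000_000 * price_per_million_chars


def break_even_chars(self_hosted_monthly, price_per_million_chars):
    """Characters per month at which managed cost equals a flat self-hosted bill."""
    return self_hosted_monthly / price_per_million_chars * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    price = 16.0                      # USD per million characters (placeholder)
    chars = 100_000 * 30 * 300        # voicemails/day * days * chars per message
    print(f"Managed: ${managed_cost(chars, price):,.0f}/month")
    print(f"Break-even at {break_even_chars(150, price):,.0f} chars/month")
```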

&lt;p&gt;Running your own inference jobs on EKS with spot instances or even SageMaker (if you're feeling fancy) lets you optimize for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per inference&lt;/li&gt;
&lt;li&gt;Batch processing throughput&lt;/li&gt;
&lt;li&gt;GPU/CPU usage tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, there’s engineering overhead. But this is AWS. We eat YAML and billing reports for breakfast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Models: You Can Have Both
&lt;/h3&gt;

&lt;p&gt;Not ready to ditch Polly? You don’t have to.&lt;/p&gt;

&lt;p&gt;Use Polly for generic prompts, but call your custom API for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer names&lt;/li&gt;
&lt;li&gt;High-sensitivity scripts&lt;/li&gt;
&lt;li&gt;Brand voice intros&lt;/li&gt;
&lt;/ul&gt;
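
&lt;p&gt;The hybrid split can be as simple as a routing table. The segment labels and the &lt;code&gt;choose_engine&lt;/code&gt; helper below are hypothetical; the point is only that the decision lives in one small, testable function.&lt;/p&gt;

```python
# Segment kinds that should go to the self-hosted cloned voice (assumed labels).
CUSTOM_KINDS = {"customer_name", "high_sensitivity", "brand_intro"}


def choose_engine(segment_kind):
    """Return which TTS backend should voice a script segment."""
    return "custom" if segment_kind in CUSTOM_KINDS else "polly"


if __name__ == "__main__":
    script = [
        ("brand_intro", "Hello from First Trust."),
        ("generic_prompt", "Your appointment is confirmed for Tuesday."),
        ("customer_name", "James"),
    ]
    for kind, text in script:
        print(f"{choose_engine(kind):6} | {text}")
```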

&lt;p&gt;Mixing and matching is a perfectly viable (and cost-effective) strategy. Your Terraform won’t judge you. Neither will I.&lt;/p&gt;

&lt;h3&gt;
  
  
  Industry Use Cases That Demand Customization
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Finance:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Personalized fraud alerts from a cloned customer rep&lt;/li&gt;
&lt;li&gt;Wealth manager assistant tools using their real voice&lt;/li&gt;
&lt;li&gt;Secure client onboarding instructions that &lt;em&gt;sound&lt;/em&gt; like the company&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Healthcare:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Post-operative instructions read in a familiar nurse’s voice&lt;/li&gt;
&lt;li&gt;Mental health guidance delivered in a calm, patient-specific tone&lt;/li&gt;
&lt;li&gt;Multilingual support without the stilted tone of over-optimized TTS&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Insurance:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Claim updates voiced by agents customers already trust&lt;/li&gt;
&lt;li&gt;Emergency preparation alerts personalized by region&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all of these, the value isn’t just the voice. It’s &lt;strong&gt;trust&lt;/strong&gt;, tone, and &lt;strong&gt;consistency&lt;/strong&gt;. Polly can’t always deliver that.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Reality Check
&lt;/h3&gt;

&lt;p&gt;Running a custom voice clone system means accepting some responsibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model maintenance&lt;/li&gt;
&lt;li&gt;Container updates&lt;/li&gt;
&lt;li&gt;Security patching&lt;/li&gt;
&lt;li&gt;More observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in return, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ownership&lt;/li&gt;
&lt;li&gt;Flexibility&lt;/li&gt;
&lt;li&gt;Enterprise-grade privacy&lt;/li&gt;
&lt;li&gt;The ability to say "yes" to marketing’s weirdest voiceover requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And hey — if something breaks, at least you’ll understand &lt;em&gt;why&lt;/em&gt; it broke. Try getting that from a managed service black box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Verdict: Build When It Matters
&lt;/h3&gt;

&lt;p&gt;There’s a reason AWS gives you building blocks instead of black boxes. It’s because your use case isn’t generic. You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom voices&lt;/li&gt;
&lt;li&gt;Secure environments&lt;/li&gt;
&lt;li&gt;Price control at scale&lt;/li&gt;
&lt;li&gt;A brand voice you actually own&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that sounds like you, go custom.&lt;/p&gt;

&lt;p&gt;If not, Polly’s waiting with open arms and a smiling, pre-trained voice.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Published by:&lt;/strong&gt; BSC Analytics | Written by Todd Bernson, CTO, Voice Cloning Pioneer, and Proudly Not Polly&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Terraforming the Voice: Deploying a Clone Application with Infrastructure as Code on AWS</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Tue, 10 Jun 2025 15:46:06 +0000</pubDate>
      <link>https://dev.to/semperfitodd/terraforming-the-voice-deploying-a-clone-application-with-infrastructure-as-code-on-aws-338</link>
      <guid>https://dev.to/semperfitodd/terraforming-the-voice-deploying-a-clone-application-with-infrastructure-as-code-on-aws-338</guid>
      <description>&lt;p&gt;&lt;em&gt;By Todd Bernson, CTO of BSC Analytics, Terraform Whisperer&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There’s something beautiful about watching an entire production-grade environment spring to life from a single command — like watching a barbell float off the ground when the form is just right. This article is for those of us who believe that if your infrastructure isn’t defined in code, it’s one rogue click away from disaster.&lt;/p&gt;

&lt;p&gt;Welcome to the story of how I built and deployed a self-hosted voice cloning application on AWS using Terraform for full-stack automation. We’re not talking about a toy project or an ML demo in a Jupyter notebook — this is a fully containerized, production-ready, auto-scaling, API-driven platform running in the cloud, doing real work. And it’s all defined, versioned, and repeatable, thanks to Terraform.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with ClickOps
&lt;/h3&gt;

&lt;p&gt;Before we dive into the nuts and bolts, a quick word about ClickOps: don’t. I’ve seen more environments lost to fat-fingered console misclicks than leg days I've skipped. If your architecture lives in a dashboard, you don’t have architecture — you have a house of cards, built by a caffeinated intern and a bunch of undocumented AWS services.&lt;/p&gt;

&lt;p&gt;Enter Terraform: HashiCorp’s solution for engineers who believe in immutability, repeatability, and not doing the same thing twice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Overview: Voice Cloning Platform
&lt;/h3&gt;

&lt;p&gt;We’re deploying a voice cloning system that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A static frontend hosted on &lt;strong&gt;Amazon S3&lt;/strong&gt; with &lt;strong&gt;CloudFront&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A backend API layer using &lt;strong&gt;API Gateway&lt;/strong&gt;, &lt;strong&gt;Lambda&lt;/strong&gt;, and/or &lt;strong&gt;EKS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ML inference containers running voice models like &lt;strong&gt;Tortoise-TTS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Audio files and output stored in &lt;strong&gt;S3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Monitoring via &lt;strong&gt;CloudWatch&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;IAM roles for secure, scoped access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of it defined, provisioned, and version-controlled in Terraform. No clicks required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Module Breakdown
&lt;/h3&gt;

&lt;p&gt;The project is broken into modules. Because monolith Terraform files are like mixing all your protein powders in one shaker — technically it works, but you’ll regret it later.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. &lt;code&gt;s3-static-site&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This module provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An S3 bucket for static frontend files&lt;/li&gt;
&lt;li&gt;CloudFront distribution with proper caching behavior&lt;/li&gt;
&lt;li&gt;OAI (Origin Access Identity) to restrict direct S3 access&lt;/li&gt;
&lt;li&gt;Route53 records if needed for custom domain&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. &lt;code&gt;api-layer&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;Depending on the job type, this module provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway (REST or HTTP)&lt;/li&gt;
&lt;li&gt;Lambda functions (for authorization)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All versions are tracked. All permissions scoped. All endpoints logged.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;code&gt;voice-model-inference&lt;/code&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;EKS using the Tortoise-TTS container from ECR&lt;/li&gt;
&lt;li&gt;IAM roles allowing secure access to model artifacts in S3&lt;/li&gt;
&lt;li&gt;Logging via CloudWatch&lt;/li&gt;
&lt;li&gt;GPU instances if you’re running inferencing at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. &lt;code&gt;monitoring&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;Because observability is not optional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch dashboards&lt;/li&gt;
&lt;li&gt;Log groups with retention policies&lt;/li&gt;
&lt;li&gt;Alarms on task failures, API errors, and latency thresholds&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. &lt;code&gt;iam-baseline&lt;/code&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Scoped policies for Lambda and EKS&lt;/li&gt;
&lt;li&gt;Roles for CloudFront, S3 access, and API Gateway execution&lt;/li&gt;
&lt;li&gt;No &lt;code&gt;*&lt;/code&gt; permissions. Ever.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy Flow
&lt;/h3&gt;

&lt;p&gt;Your deploy process should be as crisp as a fresh uniform. Here’s how mine runs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone repo&lt;/li&gt;
&lt;li&gt;Set env-specific &lt;code&gt;terraform.tfvars&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform init&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform plan -out=plan.out&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform apply plan.out&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Grab coffee, watch CloudWatch logs roll in&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each environment (dev, staging, prod) uses workspaces and backend state isolation. You can redeploy the entire stack quickly — assuming us-east-1 isn’t having “a moment.”&lt;/p&gt;
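
&lt;p&gt;The deploy steps above can be sketched as a small Python wrapper. This is an illustration rather than the project's actual tooling: the &lt;code&gt;deploy_commands&lt;/code&gt; and &lt;code&gt;deploy&lt;/code&gt; helpers and the per-environment &lt;code&gt;dev.tfvars&lt;/code&gt; style of file naming are assumptions, while the Terraform subcommands and flags are the standard ones.&lt;/p&gt;

```python
import subprocess


def deploy_commands(env: str, plan_file: str = "plan.out") -> list:
    """Return the Terraform commands for one environment, in run order."""
    return [
        ["terraform", "init"],
        # One workspace per environment keeps state separated.
        ["terraform", "workspace", "select", env],
        # Hypothetical naming: dev.tfvars, staging.tfvars, prod.tfvars
        ["terraform", "plan", f"-var-file={env}.tfvars", f"-out={plan_file}"],
        # Applying the saved plan means what was reviewed is what runs.
        ["terraform", "apply", plan_file],
    ]


def deploy(env: str) -> None:
    """Run each command and stop at the first failure."""
    for command in deploy_commands(env):
        subprocess.run(command, check=True)
```

&lt;p&gt;Keeping the command list separate from the runner makes the sequence easy to print, review, or reuse in CI before anything touches real infrastructure.&lt;/p&gt;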

&lt;h3&gt;
  
  
  Secrets and Configs
&lt;/h3&gt;

&lt;p&gt;Secrets are stored in &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; and injected into Lambda functions and EKS pods via environment variables.&lt;/p&gt;

&lt;p&gt;If your config lives in &lt;code&gt;config.js&lt;/code&gt;, you might as well tattoo your AWS keys on your forehead.&lt;/p&gt;
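
&lt;p&gt;On the application side, consuming an injected secret can be as small as the sketch below. It assumes the secret arrives as a JSON string in an environment variable; the variable name &lt;code&gt;VOICE_API_SECRET&lt;/code&gt; and the &lt;code&gt;load_secret&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import json
import os


def load_secret(env_var: str = "VOICE_API_SECRET") -> dict:
    """Read a JSON secret that the platform injected as an environment variable."""
    raw = os.environ.get(env_var)
    if raw is None:
        # Fail fast rather than falling back to a hard-coded default.
        raise RuntimeError(f"missing required secret: {env_var}")
    return json.loads(raw)
```

&lt;p&gt;Failing loudly on a missing variable keeps a misconfigured deploy from quietly running with no credentials at all.&lt;/p&gt;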

&lt;h3&gt;
  
  
  Real-World Lessons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3 Bucket Policies&lt;/strong&gt;: Don’t let CloudFront cache a 403 error. Test permissions before deploy.&lt;/li&gt;
&lt;li&gt;
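&lt;strong&gt;Terraform State Locking&lt;/strong&gt;: Lock your backend state or suffer the wrath of simultaneous &lt;code&gt;apply&lt;/code&gt; attempts. DynamoDB is the long-standing option for the S3 backend; newer Terraform releases also support native lock files in S3, which removes the need for a separate table.&lt;/li&gt;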
&lt;li&gt;
&lt;strong&gt;Cost Tags&lt;/strong&gt;: Tag everything. Billing reports should not require detective work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dev Experience
&lt;/h3&gt;

&lt;p&gt;Everything’s hooked into &lt;strong&gt;GitHub Actions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lint Terraform&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform plan&lt;/code&gt; and post diff to PR&lt;/li&gt;
&lt;li&gt;Auto-apply on merge to &lt;code&gt;main&lt;/code&gt; (with approval gates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because manual deploys are for the birds. Or for vendors who bill hourly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;Voice cloning isn’t just a novelty. In finance, healthcare, and insurance, it can revolutionize how humans interact with systems. But to be enterprise-ready, it needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Secure deployment&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable architecture&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auditability&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeatability&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This Terraform foundation ensures all four. Whether you’re standing up 1 environment or 100, the experience is the same. And when something breaks (it will), you’ll know exactly where to look — not which region your intern forgot to tag.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Building this platform felt like prepping for a lifting competition. The planning mattered as much as the execution, and when everything locked into place — it just felt solid.&lt;/p&gt;

&lt;p&gt;Use Terraform. Use modules. Lock your state. And never let IAM policies become a "temporary fix."&lt;/p&gt;

&lt;p&gt;Semper Fi, and happy provisioning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>aws</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Architecting a Scalable Voice Cloning Platform on AWS: A Case Study</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Mon, 09 Jun 2025 13:17:58 +0000</pubDate>
      <link>https://dev.to/semperfitodd/architecting-a-scalable-voice-cloning-platform-on-aws-a-case-study-20e6</link>
      <guid>https://dev.to/semperfitodd/architecting-a-scalable-voice-cloning-platform-on-aws-a-case-study-20e6</guid>
      <description>&lt;p&gt;If you've ever found yourself staring at a whiteboard trying to connect the dots between AI workloads, secure infrastructure, and scalability, welcome to my world. This is the story of how I built a fully self-hosted, scalable, and cost-optimized voice cloning platform on AWS using only a few tools: Terraform, containers, and a little grit learned from the Marine Corps and a lifetime under a barbell.&lt;/p&gt;

&lt;p&gt;Let me walk you through the choices I made (yes, all of them), the architecture that emerged, and the hilariously non-obvious problems you only find after you're deep into deploying open-source ML models that occasionally throw tantrums like a toddler hyped up on Red Bull.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Voice Cloning for Humans, Not Robots
&lt;/h2&gt;

&lt;p&gt;Text-to-speech platforms are everywhere. Some sound like HAL 9000 on decaf. Others are good, but the second you want to use a proprietary voice (like, say, your own), you're either stuck paying by the syllable or signing your data rights away faster than you can say "GDPR."&lt;br&gt;
So I built my own. A fully self-hosted solution using open-source models (shoutout to Tortoise-TTS and its uncanny ability to clone your voice right down to your awkward pauses). But cloning is only part of the fun — delivering that experience at scale, securely, and reliably is where AWS steps into the spotlight.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;

&lt;p&gt;The stack breaks down like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Static web app hosted on Amazon S3, served through CloudFront.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend API:&lt;/strong&gt; Deployed on ECS Fargate or Lambda (depending on the workload), behind API Gateway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Model Serving:&lt;/strong&gt; Containerized ML model for inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; S3 for audio and model artifacts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security &amp;amp; Identity:&lt;/strong&gt; IAM roles, policies, and execution contexts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; CloudWatch for logs and metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infra:&lt;/strong&gt; Terraform. Always Terraform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is defined in code, because if it’s not repeatable and testable, it’s a hobby project — not production-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frontend: Static Doesn’t Mean Boring
&lt;/h2&gt;

&lt;p&gt;Let’s be honest, most frontends are glorified HTML wrapped in JavaScript sprinkles. Mine isn’t much different, but it’s clean, fast, and lives on S3 with CloudFront doing the content delivery heavy lifting. It’s versioned, integrated into my Terraform code, and invalidates CloudFront caches during deploys so I don’t get support tickets saying “it’s not loading” from someone’s uncle using IE11.&lt;/p&gt;

&lt;h2&gt;
  
  
  API Layer: Gateway Drug to Lambda or ECS
&lt;/h2&gt;

&lt;p&gt;API Gateway uses a VPC Link to forward traffic to an internal load balancer and on to the EKS deployment.&lt;br&gt;
API Gateway fronts all routes and directs requests based on API parameters. Terraform templates make it trivial to switch execution paths, a small but powerful way to fine-tune cost vs. performance tradeoffs.&lt;br&gt;
And yes, everything is rate-limited, throttled, and logged. Because one day some internal engineer will forget that uploading 200 audio files at once isn't polite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voice Model: Running Tortoise, Fast
&lt;/h2&gt;

&lt;p&gt;Tortoise-TTS doesn’t exactly scream efficiency. It’s a brilliant model — and like all brilliant things, it comes with eccentricities. It’s Dockerized, stored in ECR, and run via EKS deployment triggered by events or API calls.&lt;br&gt;
Each pod has access to a GPU (if needed). To bypass a lot of the S3 presigned URL complexity, the S3 bucket is simply mounted into the Kubernetes deployment, which uses a service account (SA) for least privilege. Yes, I do least privilege here. It’s not just a talking point in my security audit; it’s a way of life.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform: The One True Religion
&lt;/h2&gt;

&lt;p&gt;From the IAM role assumptions to VPC peering, subnet creation, and service discovery — everything is codified in Terraform.&lt;/p&gt;

&lt;p&gt;Key resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;aws_s3_bucket&lt;/li&gt;
&lt;li&gt;aws_lambda_function&lt;/li&gt;
&lt;li&gt;aws_eks_cluster&lt;/li&gt;
&lt;li&gt;aws_apigatewayv2_api&lt;/li&gt;
&lt;li&gt;aws_cloudwatch_log_group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can burn it all down and stand it back up in just a few minutes. We work smarter, not harder, unlike the Marines, who sometimes flipped that around.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM: Gatekeeper of Sanity
&lt;/h2&gt;

&lt;p&gt;I treat IAM like a loaded weapon. Every function, container, and service has its own scoped role. S3 buckets enforce object-level permissions. API Gateway uses usage plans and API keys with throttling. There’s no blanket admin access here — even if it makes debugging a little more annoying. It’s worth the tradeoff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Also:&lt;/strong&gt; never, ever let a Lambda function assume a role with wildcard permissions. That way lies madness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Logs, Metrics, and Catching Fires Early
&lt;/h2&gt;

&lt;p&gt;CloudWatch captures everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda logs&lt;/li&gt;
&lt;li&gt;EKS logs&lt;/li&gt;
&lt;li&gt;Custom metrics for audio generation durations&lt;/li&gt;
&lt;li&gt;Alerts for anomalies (latency spikes, task failures, etc.)&lt;/li&gt;
&lt;/ul&gt;
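
&lt;p&gt;As a rough illustration of the custom duration metric, the sketch below times a generation call and builds the arguments for CloudWatch's &lt;code&gt;put_metric_data&lt;/code&gt;. The namespace, metric name, &lt;code&gt;VoiceId&lt;/code&gt; dimension, and both helper functions are assumptions made for the example, not the platform's real names.&lt;/p&gt;

```python
import time


def duration_metric(seconds: float, voice_id: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_data."""
    return {
        "Namespace": "VoiceCloning",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "AudioGenerationDuration",  # hypothetical name
                "Dimensions": [{"Name": "VoiceId", "Value": voice_id}],
                "Value": round(seconds, 3),
                "Unit": "Seconds",
            }
        ],
    }


def timed_generation(generate, text: str, voice_id: str):
    """Run generate(text) and return (result, metric kwargs)."""
    start = time.perf_counter()
    audio = generate(text)
    elapsed = time.perf_counter() - start
    return audio, duration_metric(elapsed, voice_id)
```

&lt;p&gt;Publishing is then a single call: &lt;code&gt;boto3.client("cloudwatch").put_metric_data(**metric)&lt;/code&gt;. Building the payload separately keeps the timing logic testable without AWS credentials.&lt;/p&gt;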

&lt;p&gt;You can’t fix what you can’t see. I’ve got dashboards that would make a SOC analyst tear up. And not from joy — from envy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Challenges
&lt;/h2&gt;

&lt;p&gt;Running large AI models on AWS is like lifting heavy — it looks cool when it works, but if your form is off, something’s gonna break.&lt;/p&gt;

&lt;p&gt;Problems I ran into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EKS warm-up time was too long for short-lived audio jobs&lt;/li&gt;
&lt;li&gt;CloudFront caching had to be fine-tuned to avoid stale UI/UX bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container image layers helped deployments move much more quickly.&lt;/li&gt;
&lt;li&gt;A readiness probe keeps 5xx errors at bay.&lt;/li&gt;
&lt;li&gt;CloudFront cache invalidation scripts run in CI/CD.&lt;/li&gt;
&lt;/ul&gt;
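
&lt;p&gt;To show what the readiness probe buys you, here is a minimal sketch of an endpoint that reports "not ready" until the model is loaded. The &lt;code&gt;/ready&lt;/code&gt; path, the &lt;code&gt;MODEL_STATE&lt;/code&gt; flag, and the handler itself are assumptions for illustration; the real inference container may expose readiness differently.&lt;/p&gt;

```python
import json
from http.server import BaseHTTPRequestHandler

MODEL_STATE = {"loaded": False}  # set to True once the weights are in memory


def readiness_response(state: dict) -> tuple:
    """Return (status_code, body) for the readiness endpoint."""
    if state.get("loaded"):
        return 200, json.dumps({"status": "ready"})
    # 503 tells Kubernetes to keep this pod out of the load balancer.
    return 503, json.dumps({"status": "loading"})


class ReadyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":  # hypothetical probe path
            self.send_error(404)
            return
        code, body = readiness_response(MODEL_STATE)
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())
```

&lt;p&gt;Serve it with &lt;code&gt;HTTPServer(("0.0.0.0", 8080), ReadyHandler).serve_forever()&lt;/code&gt; and point the pod's &lt;code&gt;readinessProbe&lt;/code&gt; at it with an &lt;code&gt;httpGet&lt;/code&gt; check. Traffic only arrives after the slow model load has finished, which is where the 5xx errors were coming from.&lt;/p&gt;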

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Building this platform was part science, part art, and part gym therapy. AWS gave me the tools, Terraform gave me the control, and coffee gave me the persistence.&lt;/p&gt;

&lt;p&gt;Would I do it again? Absolutely. But I’d like to remind the next brave soul: just because AWS offers 200+ services doesn’t mean you need all of them. Pick the ones that fit your use case. Glue them together smartly. Monitor everything. Lock it all down.&lt;/p&gt;

&lt;p&gt;And if all else fails — lift something heavy, then get back to debugging.&lt;/p&gt;

&lt;p&gt;By Todd Bernson, CTO of BSC Analytics, USMC Veteran, and Certified Deadlifter of Ridiculous Cloud Problems&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>tts</category>
    </item>
    <item>
      <title>Legacy, Meet Cloud Native: Lessons from Blending COBOL, K8s, and ML</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Wed, 09 Apr 2025 14:18:50 +0000</pubDate>
      <link>https://dev.to/semperfitodd/legacy-meet-cloud-native-lessons-from-blending-cobol-k8s-and-ml-420o</link>
      <guid>https://dev.to/semperfitodd/legacy-meet-cloud-native-lessons-from-blending-cobol-k8s-and-ml-420o</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When people talk about modernization, they often picture “lift and shift,” total rewrites, or big-bang digital transformation. But reality is messier. In most enterprises, legacy code like COBOL isn’t going anywhere—it still runs core business functions, and rewriting it is usually a non-starter. Instead, the smarter move is to &lt;em&gt;wrap and extend&lt;/em&gt; it: containerize it, orchestrate it, observe it, and—yes—train machine learning models around it.&lt;/p&gt;

&lt;p&gt;In this final article of the &lt;code&gt;eks_cobol&lt;/code&gt; series, we’ll reflect on the architectural lessons, tech gotchas, and practical wins of combining COBOL, Kubernetes, and SageMaker. You’ll walk away with a blueprint for how to do it in your own environment—and where the landmines are buried.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural Recap
&lt;/h2&gt;

&lt;p&gt;Let’s start with what we built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;COBOL on Kubernetes&lt;/strong&gt;: We run GnuCOBOL inside containerized workloads, scheduled by K8s Jobs, with persistent shared storage via Amazon EFS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Logging&lt;/strong&gt;: STDOUT/STDERR logs are parsed and saved as JSON files in S3 for traceability and ML readiness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL Sink&lt;/strong&gt;: Valid, enriched records are inserted into a relational store for downstream use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker Model&lt;/strong&gt;: We trained an XGBoost model on historical failures to predict which jobs are likely to fail before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback Loop&lt;/strong&gt;: Inference scores now route high-risk files away from execution or into validation workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s COBOL—but with an observability stack, proactive defense, and self-learning behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Don’t Rewrite What Already Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We didn’t rewrite COBOL. We containerized it. That’s a critical distinction. GnuCOBOL let us preserve decades of business logic while packaging it into a portable, observable runtime. By wrapping COBOL in Docker and invoking it via shell, we gained control without touching the legacy internals.&lt;/p&gt;

&lt;p&gt;If the codebase is stable and correct, leave it alone. Modernize around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Logs Are a Goldmine—Structure Them&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;COBOL wasn’t built for structured logging. But by intercepting logs and shaping them into JSON, we unlocked a treasure trove of analytics possibilities. Every error, success, or anomaly became traceable, searchable, and ML-trainable.&lt;/p&gt;

&lt;p&gt;Your pipeline is only as smart as your logs are readable.&lt;/p&gt;
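
&lt;p&gt;A minimal sketch of that interception step is shown below. The line shape (&lt;code&gt;LEVEL job_id message&lt;/code&gt;) is an assumption made for the example, since real GnuCOBOL output varies by program, and the function names are hypothetical.&lt;/p&gt;

```python
import json
import re

# Assumed line shape: "LEVEL job_id free-text message"
LINE_PATTERN = re.compile(r"^(INFO|WARN|ERROR)\s+(\S+)\s+(.*)$")


def to_structured(line: str) -> dict:
    """Turn one raw log line into a dictionary ready for JSON output."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        # Keep unparseable output rather than dropping it.
        return {"level": "UNKNOWN", "jobId": None, "message": line.strip()}
    level, job_id, message = match.groups()
    return {"level": level, "jobId": job_id, "message": message}


def to_json_lines(lines) -> str:
    """Convert raw lines to newline-delimited JSON, skipping blank lines."""
    return "\n".join(json.dumps(to_structured(l)) for l in lines if l.strip())
```

&lt;p&gt;Keeping unmatched lines under an &lt;code&gt;UNKNOWN&lt;/code&gt; level means a new error format shows up in the data instead of silently disappearing.&lt;/p&gt;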

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Machine Learning Loves Legacy&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is not hype. ML is perfect for legacy systems because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn’t require code access.&lt;/li&gt;
&lt;li&gt;It thrives on patterns and history.&lt;/li&gt;
&lt;li&gt;It improves incrementally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our failure prediction model now prevents bad jobs from ever running, saving compute time and protecting downstream systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Kubernetes Handles Legacy Workloads Surprisingly Well&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many assume Kubernetes is for stateless microservices only. Wrong. We used EFS + Jobs + taints/tolerations to isolate legacy workloads without sacrificing elasticity or modern DevOps practices.&lt;/p&gt;

&lt;p&gt;Legacy ≠ incompatible. With the right node pools and volume setup, K8s handles batch, stateful, or weird workloads just fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Async Communication Is Essential&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each component of this pipeline operates independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COBOL runs in isolation.&lt;/li&gt;
&lt;li&gt;Parsers and enrichers are microservices.&lt;/li&gt;
&lt;li&gt;ML runs out-of-band, in a parallel path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;S3, EFS, and event-driven messaging (SQS or Step Functions) glue the pieces together. That’s how we scale and decouple without breaking the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas to Watch Out For
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Parsing COBOL Errors Is a Pain
&lt;/h3&gt;

&lt;p&gt;You’ll spend way more time writing regex and building robust parsers than you’d like. COBOL errors weren’t designed to be machine readable. Build good test cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Storage Permissions in K8s + EFS
&lt;/h3&gt;

&lt;p&gt;Mounting EFS with the right IAM and access points requires some pain up front. Use the AWS EFS CSI driver and restrict access by namespace or workload label.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Model Drift Can Sneak Up on You
&lt;/h3&gt;

&lt;p&gt;As inputs evolve (new file formats, new job types), your ML model may lose accuracy. Schedule retraining and monitor for prediction distribution changes using SageMaker Model Monitor.&lt;/p&gt;
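
&lt;p&gt;SageMaker Model Monitor handles this as a managed service. As a simplified stand-in (not Model Monitor's algorithm), the sketch below compares the share of high-risk predictions in a recent window against a baseline. The 0.8 threshold and 0.1 tolerance are illustrative values, and both function names are hypothetical.&lt;/p&gt;

```python
def high_risk_share(scores, threshold: float = 0.8) -> float:
    """Fraction of predictions above the high-risk threshold."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


def drift_detected(baseline, recent, tolerance: float = 0.1) -> bool:
    """Flag drift when the high-risk share moves by more than the tolerance."""
    shift = abs(high_risk_share(recent) - high_risk_share(baseline))
    return shift > tolerance
```

&lt;p&gt;A jump in the high-risk share does not prove the model is wrong, but it is a cheap signal that the inputs have changed and retraining is worth a look.&lt;/p&gt;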

&lt;h3&gt;
  
  
  ❌ Job Bloat If You Don’t Clean Up
&lt;/h3&gt;

&lt;p&gt;Kubernetes Jobs can leave stale pods if not configured correctly. Use &lt;code&gt;.spec.ttlSecondsAfterFinished&lt;/code&gt; or a custom controller to delete completed/failed jobs.&lt;/p&gt;
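
&lt;p&gt;For reference, here is a minimal Job manifest with the TTL set, built as a Python dictionary. The job name, image, one-hour default, and the &lt;code&gt;cobol_job_manifest&lt;/code&gt; helper are placeholders for illustration; the manifest fields themselves are standard &lt;code&gt;batch/v1&lt;/code&gt; Job fields.&lt;/p&gt;

```python
import json


def cobol_job_manifest(job_name: str, image: str, ttl_seconds: int = 3600) -> dict:
    """Build a batch/v1 Job that Kubernetes deletes after it finishes."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            # Finished Jobs (and their pods) are removed after this many seconds.
            "ttlSecondsAfterFinished": ttl_seconds,
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "cobol", "image": image}],
                }
            },
        },
    }


def manifest_json(job_name: str, image: str) -> str:
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(cobol_job_manifest(job_name, image), indent=2)
```

&lt;p&gt;The JSON output can be piped straight into &lt;code&gt;kubectl apply -f -&lt;/code&gt;, so the controller that launches COBOL jobs never needs a YAML template on disk.&lt;/p&gt;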

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This project isn’t just a modernization. It’s proof that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COBOL is not the enemy.&lt;/li&gt;
&lt;li&gt;Kubernetes isn’t just for Node.js and Python.&lt;/li&gt;
&lt;li&gt;Machine learning isn’t just for greenfield use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can combine old and new, determinism and prediction, batch and real-time. It’s not just technically feasible—it’s strategically smart. You protect your investment in legacy, while gaining all the advantages of modern infrastructure and AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Architecture Diagram
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo56y26kddf3vfeboogf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo56y26kddf3vfeboogf.png" alt=" " width="800" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You don’t need to choose between rewriting everything or staying frozen in time. This series showed how to elevate COBOL with containers, orchestrators, log structure, and machine learning—all without rewriting core logic.&lt;/p&gt;

&lt;p&gt;This hybrid approach isn't just a one-off—it's a repeatable strategy. Any legacy system that produces structured input/output can benefit from this architecture. You give it new life, visibility, and intelligence. And that makes your system—and your team—a lot smarter.&lt;/p&gt;

</description>
      <category>community</category>
      <category>machinelearning</category>
      <category>kubernetes</category>
      <category>cobol</category>
    </item>
    <item>
      <title>Building a Smart Feedback Loop: Real-Time Inference on COBOL Logs</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Tue, 08 Apr 2025 14:54:17 +0000</pubDate>
      <link>https://dev.to/semperfitodd/building-a-smart-feedback-loop-real-time-inference-on-cobol-logs-4n7i</link>
      <guid>https://dev.to/semperfitodd/building-a-smart-feedback-loop-real-time-inference-on-cobol-logs-4n7i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern data pipelines don't stop at processing—they evolve. With our &lt;code&gt;eks_cobol&lt;/code&gt; system running legacy COBOL code on Kubernetes and logging structured outputs, we’ve laid the foundation for a smarter system. Now it’s time to close the loop.&lt;/p&gt;

&lt;p&gt;In this article, we show how we could integrate the SageMaker model from Article 5 into a real-time feedback loop. Instead of just reacting to COBOL job results, we proactively intercept bad inputs &lt;em&gt;before&lt;/em&gt; they cause failure. We’ll cover how inference is triggered pre-execution, how results are logged and acted upon, and how this closes the loop between batch legacy logic and modern ML-based automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Loop: From Prediction to Action
&lt;/h2&gt;

&lt;p&gt;Here’s the basic feedback loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;File is ingested and analyzed.&lt;/li&gt;
&lt;li&gt;Metadata is extracted (size, record count, filename, etc.).&lt;/li&gt;
&lt;li&gt;Metadata is sent to the SageMaker inference endpoint.&lt;/li&gt;
&lt;li&gt;If the predicted probability of failure &amp;gt; threshold:

&lt;ul&gt;
&lt;li&gt;File is flagged or quarantined.&lt;/li&gt;
&lt;li&gt;User is alerted.&lt;/li&gt;
&lt;li&gt;Optionally skipped from COBOL execution.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Otherwise, the file proceeds to COBOL job processing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We use the exact SageMaker endpoint created in Article 5 to power the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trigger Point: Right After File Ingest
&lt;/h2&gt;

&lt;p&gt;The feedback loop starts after a file lands in the mounted EFS directory. Our ingestion service performs lightweight analysis—no full record parsing, just enough metadata for inference.&lt;/p&gt;

&lt;p&gt;Example features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Byte size (&lt;code&gt;os.path.getsize&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Filename pattern (date, region)&lt;/li&gt;
&lt;li&gt;Number of records (quick line count)&lt;/li&gt;
&lt;li&gt;Known anomalies (e.g., blank lines)&lt;/li&gt;
&lt;/ul&gt;
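
&lt;p&gt;A lightweight pass over the file is enough to collect most of these features. The sketch below is one way to do it; the &lt;code&gt;extract_features&lt;/code&gt; helper and its field names are assumptions, and it treats every non-blank line as one record.&lt;/p&gt;

```python
import os


def extract_features(path: str) -> dict:
    """Collect cheap metadata without parsing individual records."""
    size = os.path.getsize(path)
    records = 0
    blank_lines = 0
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.strip():
                records += 1
            else:
                blank_lines += 1
    return {
        "size_bytes": size,
        "record_count": records,
        "blank_lines": blank_lines,
    }
```

&lt;p&gt;Streaming the file line by line keeps memory flat even for large batch inputs, which matters when this runs on every ingest.&lt;/p&gt;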

&lt;p&gt;We wrap this logic in a &lt;code&gt;predict_failure_risk()&lt;/code&gt; function that calls the SageMaker endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
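&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict_failure_risk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;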
    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getsize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create simple one-hot encoding for file extension
&lt;/span&gt;    &lt;span class="n"&gt;extension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ext_flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;extension&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Extend for more types as needed
&lt;/span&gt;
    &lt;span class="c1"&gt;# Simulated other features
&lt;/span&gt;    &lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ext_flags&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;invoke_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;EndpointName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cobol-failure-predictor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ContentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text/csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the returned score exceeds our threshold (0.8 for high confidence), we act.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risk Routing: High vs. Low Confidence Paths
&lt;/h2&gt;

&lt;p&gt;We define 3 potential paths based on model confidence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Low Risk (&amp;lt; 0.5)&lt;/strong&gt;: File is processed normally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium Risk (0.5–0.8)&lt;/strong&gt;: File is tagged but proceeds; alerts may be logged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Risk (&amp;gt; 0.8)&lt;/strong&gt;: File is moved to &lt;code&gt;/mnt/data/quarantine/&lt;/code&gt;, skipped from execution, and flagged for review.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These thresholds are tunable based on model accuracy, job cost, and risk tolerance.&lt;/p&gt;

&lt;p&gt;The routing logic is embedded into the controller script before the COBOL job kicks off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;predict_failure_risk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/input/job123.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High failure risk. Skipping COBOL execution.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;move_to_quarantine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/input/job123.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
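&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Medium risk. Proceeding with caution.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;run_cobol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/input/job123.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# medium-risk files are tagged but still run&lt;/span&gt;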
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Low risk. Running job.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;run_cobol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mnt/data/input/job123.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Logging and Traceability
&lt;/h2&gt;

&lt;p&gt;For every prediction, we log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Job ID&lt;/li&gt;
&lt;li&gt;Score&lt;/li&gt;
&lt;li&gt;Action taken&lt;/li&gt;
&lt;li&gt;Timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These logs are sent to CloudWatch and optionally to a DynamoDB "job decisions" table for auditing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"job123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"quarantined"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-04-03T18:12:30Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us full traceability from ingestion through prediction to final action.&lt;/p&gt;
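
&lt;p&gt;A minimal sketch of how such a record could be assembled before it is shipped to CloudWatch or DynamoDB. The helper name &lt;code&gt;build_decision_record&lt;/code&gt; is hypothetical; only the four field names come from the sample above.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def build_decision_record(job_id: str, score: float, decision: str) -> dict:
    """Assemble one audit record with the four logged fields."""
    return {
        "jobId": job_id,
        "score": round(score, 4),
        "decision": decision,
        # UTC timestamp in the same ISO 8601 form as the sample record
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_decision_record("job123", 0.91, "quarantined")
# One JSON object per line keeps the log easy to filter and parse
print(json.dumps(record))
```

&lt;p&gt;Writing to stdout is usually enough inside a container whose logs are already forwarded; a DynamoDB write would be an additional step not shown here.&lt;/p&gt;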

&lt;h2&gt;
  
  
  Feedback into the Model
&lt;/h2&gt;

&lt;p&gt;To keep the loop smart, the model has to keep learning. Each prediction is compared with the actual job outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A correct decision → reinforce via logs.&lt;/li&gt;
&lt;li&gt;A wrong decision → flag for retraining.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Lambda function watches the quarantine bucket. If a file in quarantine is later processed successfully by an engineer, it’s tagged as a &lt;em&gt;false positive&lt;/em&gt; and fed into the retraining dataset. Over time, this feedback reduces false positives and makes the model more precise.&lt;/p&gt;
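
&lt;p&gt;The comparison between a routing decision and the real outcome can be sketched as a small pure function. The decision values (&lt;code&gt;quarantined&lt;/code&gt;, &lt;code&gt;executed&lt;/code&gt;) and the output field names are assumptions for illustration, not the project's actual schema.&lt;/p&gt;

```python
def label_for_retraining(decision: str, job_succeeded: bool) -> dict:
    """Compare the routing decision with the real outcome of the job.

    decision: 'quarantined' or 'executed' (hypothetical values)
    job_succeeded: what happened when the file was eventually run
    """
    predicted_failure = decision == "quarantined"
    actual_failure = not job_succeeded
    if predicted_failure and not actual_failure:
        kind = "false_positive"
    elif not predicted_failure and actual_failure:
        kind = "false_negative"
    else:
        kind = "correct"
    return {
        "isFailure": actual_failure,  # ground-truth label for the next training run
        "outcome": kind,
        "needsRetraining": kind != "correct",
    }


# A quarantined file that an engineer later ran successfully
print(label_for_retraining("quarantined", job_succeeded=True))
```

&lt;p&gt;False negatives (jobs that ran and failed) matter just as much as false positives, so the sketch labels both.&lt;/p&gt;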

&lt;h2&gt;
  
  
  Business Impact
&lt;/h2&gt;

&lt;p&gt;Before this feedback loop, bad jobs would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run anyway, wasting CPU time.&lt;/li&gt;
&lt;li&gt;Cause cascading failures in downstream services.&lt;/li&gt;
&lt;li&gt;Require postmortem triage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, we proactively flag risky inputs. Engineers focus only on edge cases. Overall job success rates improve, and so does trust in the system.&lt;/p&gt;

&lt;p&gt;This loop also enables us to A/B test different models, thresholds, and routing logic—giving us a lab for optimization without interrupting the production flow.&lt;/p&gt;
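
&lt;p&gt;For A/B tests, each job needs a stable assignment so that reruns of the same job always hit the same variant. One common approach, shown here as a sketch with hypothetical names, is to hash the job ID into a bucket between 0 and 1.&lt;/p&gt;

```python
import hashlib


def assign_variant(job_id: str, treatment_share: float = 0.1) -> str:
    """Deterministically place a job in 'control' or 'treatment'."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into the range [0, 1)
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "treatment" if treatment_share > bucket else "control"


# The same job ID always lands in the same variant
print(assign_variant("job123") == assign_variant("job123"))  # True
```

&lt;p&gt;Logging the variant alongside the score and decision makes it possible to compare models or thresholds afterwards.&lt;/p&gt;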

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;COBOL jobs don’t have to be dumb. By wrapping them in modern ML pipelines, we get real-time intelligence that prevents failures before they happen. SageMaker gives us prediction. Kubernetes gives us orchestration. And a simple controller gives us the glue to wire it all together.&lt;/p&gt;

&lt;p&gt;With a smart feedback loop in place, &lt;code&gt;eks_cobol&lt;/code&gt; becomes more than a modernization play—it becomes a self-improving system that learns from its own failures.&lt;/p&gt;

</description>
      <category>community</category>
      <category>machinelearning</category>
      <category>cobol</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Predicting Legacy Failures: Training and Hosting ML Models in SageMaker</title>
      <dc:creator>Todd Bernson</dc:creator>
      <pubDate>Mon, 07 Apr 2025 13:08:13 +0000</pubDate>
      <link>https://dev.to/semperfitodd/predicting-legacy-failures-training-and-hosting-ml-models-in-sagemaker-48b7</link>
      <guid>https://dev.to/semperfitodd/predicting-legacy-failures-training-and-hosting-ml-models-in-sagemaker-48b7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Legacy systems are infamous for failing silently—or catastrophically—with no early warning signs. In our &lt;code&gt;eks_cobol&lt;/code&gt; pipeline, COBOL batch jobs handle sensitive data transformations. When something goes wrong, we don’t just want to know &lt;em&gt;after&lt;/em&gt; it fails—we want to know &lt;em&gt;before&lt;/em&gt; it runs. Enter machine learning.&lt;/p&gt;

&lt;p&gt;This article covers how we use Amazon SageMaker to train a model that predicts COBOL job failures based on input metadata and content characteristics. You’ll see how we take the structured error data from Article 4, create features, train a model using XGBoost, host it with a live endpoint, and wire it into our processing pipeline for real-time inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Prediction Problem
&lt;/h2&gt;

&lt;p&gt;The goal is to predict whether a COBOL job will fail, before running it, using data available at ingest time. Features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filename (which may encode customer, date, region, etc.)&lt;/li&gt;
&lt;li&gt;File size (bytes)&lt;/li&gt;
&lt;li&gt;Record count&lt;/li&gt;
&lt;li&gt;Presence of null fields or format anomalies&lt;/li&gt;
&lt;li&gt;Job type or business logic variant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We label previous failed jobs with &lt;code&gt;isFailure = True&lt;/code&gt; and successful jobs with &lt;code&gt;isFailure = False&lt;/code&gt;. The model learns correlations between input patterns and known failures.&lt;/p&gt;
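
&lt;p&gt;To make the feature list concrete, here is a minimal sketch of ingest-time feature extraction for a CSV input. The function name, the output keys other than &lt;code&gt;fileSize&lt;/code&gt; and &lt;code&gt;fileExtension&lt;/code&gt;, and the assumption that the first row is a header are all illustrative.&lt;/p&gt;

```python
import csv
import io


def extract_features(filename: str, content: str) -> dict:
    """Compute ingest-time features from a CSV file's name and text."""
    rows = list(csv.reader(io.StringIO(content)))
    data_rows = rows[1:]  # assumes the first row is a header
    expected_width = len(rows[0]) if rows else 0
    return {
        "fileExtension": filename.rsplit(".", 1)[-1].lower() if "." in filename else "",
        "fileSize": len(content.encode("utf-8")),
        "recordCount": len(data_rows),
        # Empty or whitespace-only cells count as nulls
        "nullFieldCount": sum(1 for row in data_rows for cell in row if not cell.strip()),
        # Rows whose column count differs from the header are format anomalies
        "malformedRowCount": sum(1 for row in data_rows if len(row) != expected_width),
    }


sample = "id,name,amount\n1,Alice,10.50\n2,,7.25\n3,Carol\n"
print(extract_features("ACME_20250403.csv", sample))
```

&lt;p&gt;Everything here is computable before the COBOL job starts, which is the constraint that matters for a pre-run prediction.&lt;/p&gt;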

&lt;h2&gt;
  
  
  Building the Training Dataset
&lt;/h2&gt;

&lt;p&gt;We merge two CSVs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One from failed COBOL jobs (&lt;code&gt;errors_flat.csv&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;One from successful jobs (&lt;code&gt;success_flat.csv&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A preprocessing script ensures both datasets are aligned, normalized, and balanced.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errors_flat.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;success_flat.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ignore_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fileSize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rawRecord&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fileExtension&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inputFile&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# dtype=int writes the one-hot columns as 0/1 rather than True/False&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errorType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fileExtension&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFailure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFailure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# SageMaker's built-in XGBoost expects the label in the first column and no header row&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;isFailure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fileSize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errorType_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fileExtension_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)]]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml_input.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
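
&lt;p&gt;Balancing is mentioned above but not shown. One simple option is random downsampling of the majority class, sketched here with the standard library only. This is an assumption about how balancing could be done, not necessarily the method used in the pipeline; the function name and the row format (a list of dicts with an &lt;code&gt;isFailure&lt;/code&gt; key) are hypothetical.&lt;/p&gt;

```python
import random


def balance_by_downsampling(rows: list, label_key: str = "isFailure", seed: int = 42) -> list:
    """Downsample the larger class so both labels appear equally often."""
    failures = [r for r in rows if r[label_key]]
    successes = [r for r in rows if not r[label_key]]
    target = min(len(failures), len(successes))
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    balanced = rng.sample(failures, target) + rng.sample(successes, target)
    rng.shuffle(balanced)
    return balanced


rows = [{"isFailure": 1, "fileSize": 100 + i} for i in range(3)]
rows += [{"isFailure": 0, "fileSize": 200 + i} for i in range(10)]
balanced = balance_by_downsampling(rows)
print(len(balanced), sum(r["isFailure"] for r in balanced))  # 6 3
```

&lt;p&gt;Downsampling discards data, so with very few failures it may be preferable to keep every row and weight the classes instead, for example through XGBoost's &lt;code&gt;scale_pos_weight&lt;/code&gt; hyperparameter.&lt;/p&gt;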



&lt;h2&gt;
  
  
  Training the Model in SageMaker
&lt;/h2&gt;

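&lt;p&gt;We use SageMaker’s built-in XGBoost container for binary classification. Because the algorithm ships with the container, no custom training script is needed; the job can be launched from a SageMaker Studio notebook or any environment with the SageMaker Python SDK.&lt;br&gt;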
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
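&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.inputs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TrainingInput&lt;/span&gt;

&lt;span class="c1"&gt;# Assumes an execution role is available (e.g. in SageMaker Studio);&lt;/span&gt;
&lt;span class="c1"&gt;# bucket, prefix, train_s3_path, and test_s3_path are defined elsewhere.&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_execution_role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;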

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image_uris&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xgboost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boto_region_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.3-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb_estimator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Estimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml.p3.2xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sagemaker_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb_estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_hyperparameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;binary:logistic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subsample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;colsample_bytree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;xgb_estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TrainingInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_s3_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TrainingInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_s3_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This trains a binary classifier that predicts failure probability (0.0 to 1.0) given new job metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting the Inference Endpoint
&lt;/h2&gt;

&lt;p&gt;Once the model is trained and stored in S3, we deploy it to a real-time SageMaker endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.serializers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CSVSerializer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.deserializers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONDeserializer&lt;/span&gt;

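&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb_estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;initial_instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml.p3.2xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cobol-failure-predictor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# must match the name the pipeline invokes&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;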
&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serializer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CSVSerializer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deserializer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JSONDeserializer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sample row:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prediction:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can send job metadata in real-time and receive a prediction before running the COBOL job.&lt;/p&gt;
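
&lt;p&gt;How the score is read back depends on the response format the client requests. The helper below is a sketch that handles a plain CSV body and a JSON body; the JSON shape shown in the comment is an assumption about the built-in XGBoost container's output and should be checked against a real response. &lt;code&gt;parse_score&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import json


def parse_score(body: str, content_type: str = "text/csv") -> float:
    """Extract the first failure probability from an endpoint response body."""
    if content_type == "application/json":
        # Assumed shape: {"predictions": [{"score": 0.91}]}
        return float(json.loads(body)["predictions"][0]["score"])
    # CSV responses: take the first value of the first line
    return float(body.strip().splitlines()[0].split(",")[0])


print(parse_score("0.91\n"))  # 0.91
print(parse_score('{"predictions": [{"score": 0.12}]}', "application/json"))  # 0.12
```

&lt;p&gt;Keeping the parsing in one place means the controller only ever sees a float, whichever format the endpoint returns.&lt;/p&gt;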

&lt;h2&gt;
  
  
  Integrating Inference into the Pipeline
&lt;/h2&gt;

&lt;p&gt;Before a COBOL job runs, the ingestion service sends a prediction request to the SageMaker endpoint. If the prediction is above a threshold (say 0.8), we mark the job as "high risk" and route it to a validation or quarantine path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sagemaker-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_failure_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ext_onehot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_type_onehot&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Feature order must match the training CSV: fileSize, errorType_*, fileExtension_*&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fileSize&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_type_onehot&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ext_onehot&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;EndpointName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cobol-failure-predictor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ContentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text/csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us predictive observability—no more surprises when a job fails after burning through hours of runtime.&lt;/p&gt;
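
&lt;p&gt;The one-hot vectors passed to &lt;code&gt;get_failure_score&lt;/code&gt; have to use the same categories, in the same order, as the dummy columns produced during training. A minimal sketch of such an encoder follows; the category list is hypothetical and would normally be saved alongside the model at training time.&lt;/p&gt;

```python
def one_hot(value: str, categories: list) -> list:
    """Encode value against a fixed, ordered category list (all zeros if unseen)."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical category list; in practice, persist the dummy column names at
# training time and load them here so the order never drifts.
FILE_EXTENSIONS = ["csv", "dat", "txt"]

print(one_hot("dat", FILE_EXTENSIONS))  # [0, 1, 0]
print(one_hot("xml", FILE_EXTENSIONS))  # [0, 0, 0]
```

&lt;p&gt;Encoding unseen values as all zeros keeps the payload width constant, which matters because the endpoint interprets features purely by position.&lt;/p&gt;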

&lt;h2&gt;
  
  
  Model Monitoring and Retraining
&lt;/h2&gt;

&lt;p&gt;We use SageMaker Model Monitor to detect drift in prediction distributions. As more jobs are processed, both successful and failed, we continuously push new records to the training bucket and retrain the model weekly via a scheduled SageMaker pipeline or Lambda-triggered training job.&lt;/p&gt;

&lt;p&gt;The retraining process includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect new &lt;code&gt;.json&lt;/code&gt; logs from S3&lt;/li&gt;
&lt;li&gt;Run the same flatten + preprocess script&lt;/li&gt;
&lt;li&gt;Update the dataset&lt;/li&gt;
&lt;li&gt;Launch a training job with versioned output&lt;/li&gt;
&lt;li&gt;Replace the endpoint via blue/green deployment&lt;/li&gt;
&lt;/ul&gt;
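
&lt;p&gt;For the "versioned output" step, one simple convention is to append a UTC timestamp to the output prefix so that each retraining run writes to its own location. The helper name and the bucket and prefix values below are placeholders.&lt;/p&gt;

```python
from datetime import datetime, timezone


def versioned_output_path(bucket: str, prefix: str, now: datetime) -> str:
    """Build an S3 prefix that is unique per retraining run."""
    version = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"s3://{bucket}/{prefix}/output/{version}"


run_time = datetime(2025, 4, 7, 13, 8, 13, tzinfo=timezone.utc)
print(versioned_output_path("my-bucket", "cobol-failure", run_time))
# s3://my-bucket/cobol-failure/output/20250407T130813Z
```

&lt;p&gt;Because earlier artifacts are never overwritten, rolling back means pointing the endpoint at a previous version's model artifact.&lt;/p&gt;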

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Machine learning isn’t just for flashy new systems—it can massively improve how legacy pipelines operate. By training and hosting a binary classifier in SageMaker, we’ve added a predictive safety net to our COBOL workflows. With every job that fails or succeeds, the model gets smarter, reducing wasted compute and catching bad inputs early.&lt;/p&gt;

&lt;p&gt;This is the kind of hybrid future that actually works: COBOL + Kubernetes + JSON + SageMaker, working in concert. And it all starts with clean training data and good feature engineering.&lt;/p&gt;

</description>
      <category>community</category>
      <category>machinelearning</category>
      <category>sagemaker</category>
      <category>cobol</category>
    </item>
  </channel>
</rss>
