<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alessandra Bilardi</title>
    <description>The latest articles on DEV Community by Alessandra Bilardi (@bilardi).</description>
    <link>https://dev.to/bilardi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872509%2Ff4e33b54-08ad-4b6d-a6ba-ff25409f3dee.jpg</url>
      <title>DEV Community: Alessandra Bilardi</title>
      <link>https://dev.to/bilardi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bilardi"/>
    <language>en</language>
    <item>
      <title>When boto3 doesn't have it (yet), you write it: a realtime speech-to-speech story in Python</title>
      <dc:creator>Alessandra Bilardi</dc:creator>
      <pubDate>Wed, 20 May 2026 22:00:07 +0000</pubDate>
      <link>https://dev.to/bilardi/when-boto3-doesnt-have-it-yet-you-write-it-a-realtime-speech-to-speech-story-in-python-4c42</link>
      <guid>https://dev.to/bilardi/when-boto3-doesnt-have-it-yet-you-write-it-a-realtime-speech-to-speech-story-in-python-4c42</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frboz67ocrh4hjjvf8mkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frboz67ocrh4hjjvf8mkx.png" alt="Architecture" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At a meetup's networking session, someone remarked: "the new speech-to-speech feature in Teams is really cool". Microsoft Teams had added an interpreter agent with realtime, AI-powered speech-to-speech translation during calls. So the natural question: how complicated is it to build one on AWS? And what performance does it deliver?&lt;/p&gt;

&lt;p&gt;Meanwhile, for PyCon IT 2026, with an inclusivity goal, the plan was already to use &lt;a href="https://github.com/bilardi/realtime-transcription" rel="noopener noreferrer"&gt;bilardi/realtime-transcription&lt;/a&gt; with a monitor in the room showing the talk transcript. But wouldn't it be handier if each attendee had the translated transcript directly on their own mobile, and maybe the audio in their own language too, naturally without installing anything ?&lt;/p&gt;

&lt;p&gt;And so &lt;a href="https://github.com/bilardi/realtime-speech-to-speech" rel="noopener noreferrer"&gt;bilardi/realtime-speech-to-speech&lt;/a&gt; was born, ready to use, for any conference or meetup. Under the hood there are three AWS services chained together: &lt;a href="https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html" rel="noopener noreferrer"&gt;Transcribe Streaming&lt;/a&gt; for Automatic Speech Recognition (ASR) from audio to text, &lt;a href="https://aws.amazon.com/translate/" rel="noopener noreferrer"&gt;Translate&lt;/a&gt; for the translation, and &lt;a href="https://aws.amazon.com/polly/" rel="noopener noreferrer"&gt;Polly&lt;/a&gt; bidirectional streaming for Text-to-Speech (TTS) from text to audio. Architecture, costs and usage live in the repo: here, instead, I walk through the choices and what went sideways along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  A stage PoC for multilingual meetups
&lt;/h2&gt;

&lt;p&gt;There were three initial options, from the simplest to the most complex.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;When it makes sense&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One-way PoC, 1 speaker language → 1 listener language&lt;/td&gt;
&lt;td&gt;Minimal validation of the AWS pipeline&lt;/td&gt;
&lt;td&gt;Headphones to keep the mic from recapturing the TTS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bidirectional 1:1 conversation&lt;/td&gt;
&lt;td&gt;International meeting between two people&lt;/td&gt;
&lt;td&gt;Two symmetric pipelines + a second device to test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-to-many conference (fan-out), multilingual&lt;/td&gt;
&lt;td&gt;Talks and meetups with international audience&lt;/td&gt;
&lt;td&gt;Browser audio playback + N parallel pipelines under contention&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I started from the 1:1 one-way PoC to validate the AWS pipeline and the new piece (Polly bidirectional streaming plus the browser audio playback), and from there moved on to 1-to-many, which is the real scenario for a conference or a meetup. Direction and language pair stay as two environment variables: changing scenario becomes editing two lines in &lt;code&gt;.env&lt;/code&gt;, no refactor.&lt;/p&gt;

&lt;p&gt;Listener client: the browser. Mobile has it without installing anything, and opening a URL is the simplest UX for the "PC speaks, mobile listens" test. A native app isn't worth it even in production for this use case, let alone for a PoC: targets to maintain, stores to publish to, zero advantages over a page opened from a QR code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not Nova 2 Sonic ?
&lt;/h3&gt;

&lt;p&gt;AWS &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-nova-2-sonic-real-time-conversational-ai/" rel="noopener noreferrer"&gt;recently announced&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-conversational-speech.html" rel="noopener noreferrer"&gt;Amazon Nova 2 Sonic&lt;/a&gt;: an end-to-end speech-to-speech model, ASR plus LLM plus TTS in a single bidirectional connection. The obvious question: why not Nova Sonic, then?&lt;/p&gt;

&lt;p&gt;Nova Sonic is designed to &lt;strong&gt;respond&lt;/strong&gt; to audio input: conversational assistant, human-AI dialogue, turn-taking, managed interruptions. The use case here is the opposite: a &lt;strong&gt;broadcast to multiple listeners, each in a different language (multilingual broadcast)&lt;/strong&gt;, with faithful translation. For example, Italian audio as input, the same sentence as audio in N different languages as output, across N parallel channels. They are two different products: the fact that both go by "speech-to-speech" is a marketing collision.&lt;/p&gt;

&lt;p&gt;Mapping the current three stages against Nova Sonic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current stage&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Nova 2 Sonic covers ?&lt;/th&gt;
&lt;th&gt;Same guarantee ?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transcribe Streaming&lt;/td&gt;
&lt;td&gt;ASR audio to text&lt;/td&gt;
&lt;td&gt;Yes, integrated&lt;/td&gt;
&lt;td&gt;Plausible, but I haven't tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translate&lt;/td&gt;
&lt;td&gt;Deterministic Neural Machine Translation (NMT)&lt;/td&gt;
&lt;td&gt;Yes, via prompting&lt;/td&gt;
&lt;td&gt;No, not deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Polly Generative&lt;/td&gt;
&lt;td&gt;TTS reading quality&lt;/td&gt;
&lt;td&gt;Yes, conversational voices&lt;/td&gt;
&lt;td&gt;No, dialogue intonation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The three critical points, from most to least blocking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Translate&lt;/strong&gt;: an NMT trained for faithful, deterministic translation. Nova Sonic would do translation via LLM prompting: more fluent but not deterministic, may paraphrase or add conversational fillers. Unacceptable for a broadcast where the audience expects exactly what the speaker says&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Polly Generative&lt;/strong&gt;: voices optimized for reading a given text. Nova Sonic has voices optimized for dialogue, intonation that adapts to the user's voice input. For reading a translation it's the wrong voice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transcribe&lt;/strong&gt;: replaceable in principle, but Nova Sonic doesn't expose ASR as a standalone service billed separately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Operational constraints independent of quality: 8-minute connection limit against Transcribe Streaming's 4 hours, and Nova requires a separate session per target language (the current pipeline calls Transcribe once for N languages).&lt;/p&gt;

&lt;p&gt;Decision: pipeline with three specialized services. Nova 2 Sonic stays the natural candidate for a different scenario, where the listener asks the AI a question and the AI answers, not for a meetup with a human speaker and a passive audience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here is the stack
&lt;/h2&gt;

&lt;p&gt;As a lazy developer, I first looked for reusable pieces. &lt;code&gt;realtime-transcription&lt;/code&gt; already has the &lt;code&gt;audio_client/&lt;/code&gt; module to capture Pulse-Code Modulation (PCM) audio from a device and the FastAPI WebSocket scaffold: cherry-pick roughly 140 lines and you're off. The browser display, instead, is written from scratch, because audio playback is a different beast from text display.&lt;/p&gt;

&lt;p&gt;The server-side pipeline is simple and linear: Transcribe streaming → Translate one-shot → Polly bidirectional. Transcribe can deliver partial text (&lt;code&gt;is_partial=True&lt;/code&gt;) sooner, but a partial may be wrong and later be withdrawn and rewritten. Since the goal is validating the chain end to end, not shaving milliseconds of latency, partials are ignored. Everything therefore starts once Transcribe has recognized a complete sentence (&lt;code&gt;is_partial=False&lt;/code&gt;): at that point Translate fires with a single call per sentence, and the translated text goes to Polly bidirectional, which begins returning audio while it's still generating the rest.&lt;/p&gt;
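
&lt;p&gt;A minimal sketch of that sentence-bounded flow, with no AWS calls. The &lt;code&gt;Result&lt;/code&gt; class and the &lt;code&gt;translate&lt;/code&gt; and &lt;code&gt;synthesize&lt;/code&gt; callables are hypothetical stand-ins for the Transcribe results, Translate and Polly; the repo's actual code may be organized differently.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for one Transcribe Streaming result."""
    text: str
    is_partial: bool


def final_sentences(results):
    """Yield only the text of finalized results, skipping partial rewrites."""
    for result in results:
        if not result.is_partial:
            yield result.text


def run_pipeline(results, translate, synthesize):
    """Call translate once per finalized sentence, then hand the text to TTS."""
    outputs = []
    for sentence in final_sentences(results):
        translated = translate(sentence)
        outputs.append(synthesize(translated))
    return outputs


# Partials arrive first and get rewritten; only the final ones reach Translate.
stream = [
    Result("ciao a", True),
    Result("ciao a tutti", True),
    Result("Ciao a tutti.", False),
    Result("benvenuti", True),
    Result("Benvenuti al meetup.", False),
]
print(list(final_sentences(stream)))
```

&lt;p&gt;Five results come in, but only two sentences trigger the downstream calls: that is the price paid in latency for never having to cancel a translation already in flight.&lt;/p&gt;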

&lt;p&gt;For the audio format the options were compressed MP3 and raw PCM. MP3 uses roughly a quarter of the bandwidth, but the browser has to decode it asynchronously for each chunk (&lt;code&gt;decodeAudioData&lt;/code&gt;), breaking the continuity of the playback queue. PCM (16-bit signed LE, 16 kHz mono) weighs more on bandwidth but the browser writes it straight into a Web Audio API &lt;code&gt;AudioBuffer&lt;/code&gt;: no intermediate decoding, linear queue. On LAN or local WiFi bandwidth isn't the constraint, latency is: I picked PCM. On top of that, 16 kHz mono matches the sample rate of the microphone and of Transcribe: no format conversion in the middle of the pipeline. In the cloud, where the audio going out from the server to each listener is data transfer out (AWS egress, billed), PCM might blow past the 100 GB / month free tier, which corresponds to ~35 hours of continuous audio with 25 listeners.&lt;/p&gt;
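
&lt;p&gt;The egress figure can be checked with a few lines of arithmetic. This sketch assumes decimal gigabytes for the free tier and continuous audio for every listener (in practice the TTS is silent between sentences, so this is an upper bound on traffic); the function names are mine, not the repo's.&lt;/p&gt;

```python
SAMPLE_RATE_HZ = 16_000          # 16 kHz, as in the pipeline
BYTES_PER_SAMPLE = 2             # 16-bit signed PCM
CHANNELS = 1                     # mono
FREE_TIER_BYTES = 100 * 10**9    # 100 GB / month, decimal GB (assumption)


def pcm_bytes_per_second():
    """Raw PCM bitrate for one listener, in bytes per second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def hours_within_free_tier(listeners):
    """Hours of continuous audio that fit in the free tier for N listeners."""
    bytes_per_hour = pcm_bytes_per_second() * 3600 * listeners
    return FREE_TIER_BYTES / bytes_per_hour


print(pcm_bytes_per_second())
print(round(hours_within_free_tier(25), 1))
```

&lt;p&gt;One listener costs 32,000 bytes per second, about 115 MB per hour; 25 listeners reach the 100 GB mark after roughly 34.7 hours of audio.&lt;/p&gt;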

&lt;p&gt;To pick a Polly voice in the target language, there were two paths. The first is a hardcoded &lt;code&gt;(language) → (voice id)&lt;/code&gt; map: simple, but it needs updating every time AWS publishes new voices. The second calls &lt;code&gt;DescribeVoices&lt;/code&gt; at server boot and discovers dynamically what's available, with an in-memory cache. I picked the second: one API call at startup, zero maintenance when AWS adds voices. To stay compatible with bidirectional streaming I filter by &lt;code&gt;LanguageCode&lt;/code&gt; (the target language) and keep only the voices that support bidirectional streaming: the feature is recent (2026) and not every language covers it, so without the filter synthesis would fail at the first &lt;code&gt;start_speech_synthesis_stream&lt;/code&gt;.&lt;/p&gt;
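
&lt;p&gt;A rough sketch of the discover-and-cache idea. The dictionary shape follows the &lt;code&gt;DescribeVoices&lt;/code&gt; response (&lt;code&gt;Voices&lt;/code&gt;, &lt;code&gt;Id&lt;/code&gt;, &lt;code&gt;LanguageCode&lt;/code&gt;, &lt;code&gt;SupportedEngines&lt;/code&gt;), but the voice names below are made up, and using the &lt;code&gt;generative&lt;/code&gt; engine as the marker for bidirectional support is my assumption, not necessarily the filter the repo applies.&lt;/p&gt;

```python
def pick_voice(voices, language_code, engine="generative"):
    """Return the first voice id for the language that lists the engine."""
    for voice in voices:
        same_language = voice["LanguageCode"] == language_code
        if same_language and engine in voice.get("SupportedEngines", []):
            return voice["Id"]
    return None


class VoiceCache:
    """Call describe_voices once, then answer every lookup from memory."""

    def __init__(self, describe_voices):
        self._describe_voices = describe_voices  # e.g. a boto3 Polly client method
        self._voices = None

    def voice_for(self, language_code):
        if self._voices is None:
            self._voices = self._describe_voices()["Voices"]
        return pick_voice(self._voices, language_code)


calls = []


def fake_describe_voices():
    """Stand-in for the API call, with made-up voices; counts invocations."""
    calls.append(1)
    return {"Voices": [
        {"Id": "VoiceA", "LanguageCode": "en-US", "SupportedEngines": ["standard"]},
        {"Id": "VoiceB", "LanguageCode": "en-US", "SupportedEngines": ["neural", "generative"]},
        {"Id": "VoiceC", "LanguageCode": "de-DE", "SupportedEngines": ["generative"]},
    ]}


cache = VoiceCache(fake_describe_voices)
print(cache.voice_for("en-US"), cache.voice_for("de-DE"), cache.voice_for("it-IT"))
```

&lt;p&gt;With a real client the factory would be something like &lt;code&gt;boto3.client("polly").describe_voices&lt;/code&gt;; this sketch ignores pagination of the response.&lt;/p&gt;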

&lt;p&gt;The truly new piece is precisely &lt;code&gt;StartSpeechSynthesisStream&lt;/code&gt;, the Amazon Polly bidirectional API. &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-polly-expands-TTS-new-voices-and-bidirectional-streaming/" rel="noopener noreferrer"&gt;Announced in March 2026&lt;/a&gt;, exposed in the &lt;a href="https://aws.amazon.com/blogs/machine-learning/introducing-amazon-polly-bidirectional-streaming-real-time-speech-synthesis-for-conversational-ai/" rel="noopener noreferrer"&gt;Java SDK&lt;/a&gt;, and missing in boto3. The feature shows up in the Java SDK because its code generator reads &lt;code&gt;service-2.json&lt;/code&gt; and supports the HTTP/2 bidirectional event-stream protocol. Under boto3 there's botocore, and even botocore doesn't have that infrastructure: the operation stays declared in the &lt;a href="https://github.com/boto/botocore/blob/develop/botocore/data/polly/2016-06-10/service-2.json#L127-L142" rel="noopener noreferrer"&gt;service model&lt;/a&gt; but the Python client doesn't expose it. Same scenario for aioboto3, the asynchronous version of boto3, which reuses the same service models. Verified on boto3 1.43.9.&lt;/p&gt;

&lt;p&gt;So, what paths are available ?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;synthesize_speech&lt;/code&gt; sync&lt;/td&gt;
&lt;td&gt;Already in the SDK, 5 lines&lt;/td&gt;
&lt;td&gt;No fast first-byte: waits until Polly has generated all the audio before returning any byte&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP/2 raw + SigV4 + event-stream parser&lt;/td&gt;
&lt;td&gt;Real bidirectional, first audio chunk arriving while Polly is still generating&lt;/td&gt;
&lt;td&gt;Not in Python: needs to be written from scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Decision: the sync one first to validate the pipeline, then the bidirectional one.&lt;/p&gt;

&lt;p&gt;And here begins the piece that became a package of its own: &lt;a href="https://amazon-polly-streaming.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;amazon-polly-streaming&lt;/a&gt;. A PR to boto3 would have been the first reflex, but boto3 doesn't have the HTTP/2 bidirectional event-stream infrastructure. For Transcribe streaming AWS kept it out of boto3 in a separate package under awslabs: first in &lt;a href="https://github.com/awslabs/amazon-transcribe-streaming-sdk/" rel="noopener noreferrer"&gt;&lt;code&gt;amazon-transcribe-streaming-sdk&lt;/code&gt;&lt;/a&gt; (deprecated today) that delegates the HTTP/2 transport to &lt;a href="https://github.com/awslabs/aws-crt-python" rel="noopener noreferrer"&gt;&lt;code&gt;awscrt&lt;/code&gt;&lt;/a&gt;, then in &lt;a href="https://github.com/awslabs/aws-sdk-python/tree/develop/clients/aws-sdk-transcribe-streaming" rel="noopener noreferrer"&gt;&lt;code&gt;aws-sdk-transcribe-streaming&lt;/code&gt;&lt;/a&gt; (the successor) that delegates the event-stream too to &lt;a href="https://github.com/awslabs/smithy-python/tree/develop/packages/smithy-aws-core" rel="noopener noreferrer"&gt;&lt;code&gt;smithy_aws_core&lt;/code&gt;&lt;/a&gt;. For Polly bidirectional an official equivalent doesn't exist yet (verified in May 2026, neither on awslabs nor on PyPI), so &lt;code&gt;amazon-polly-streaming&lt;/code&gt; is the first public Python implementation of the feature.&lt;/p&gt;

&lt;p&gt;The public API is &lt;code&gt;PollyStreamingClient.start_speech_synthesis_stream()&lt;/code&gt;, a mirror of &lt;code&gt;TranscribeStreamingClient.start_stream_transcription()&lt;/code&gt; from &lt;code&gt;aws-sdk-transcribe-streaming&lt;/code&gt;. Same pattern as the official AWS package for Transcribe: a convention that lets future adoption by awslabs happen without redesigning the API. Same for exceptions: a separate module that mirrors the types Polly exposes in &lt;code&gt;StartSpeechSynthesisStream&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And why not delegate the HTTP/2 bidirectional event-stream to &lt;code&gt;smithy_aws_core[eventstream]&lt;/code&gt;, the way &lt;code&gt;aws-sdk-transcribe-streaming&lt;/code&gt; does ? The bulk of the package would stay uncovered: AWS hasn't published a smithy client for Polly bidirectional. Since that client doesn't exist, it's simpler to keep the protocol in-house too: one fewer dependency, and no need to sync &lt;code&gt;amazon-polly-streaming&lt;/code&gt;'s cycles with those of an external lib under active development.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stories the README doesn't tell
&lt;/h2&gt;

&lt;h3&gt;
  
  
  That &lt;code&gt;ServiceFailureException&lt;/code&gt; that says nothing
&lt;/h3&gt;

&lt;p&gt;I started from the &lt;a href="https://docs.aws.amazon.com/polly/latest/dg/API_StartSpeechSynthesisStream.html" rel="noopener noreferrer"&gt;AWS documentation for &lt;code&gt;StartSpeechSynthesisStream&lt;/code&gt;&lt;/a&gt;: it lists the parameters (&lt;code&gt;Engine&lt;/code&gt;, &lt;code&gt;LanguageCode&lt;/code&gt;, &lt;code&gt;VoiceId&lt;/code&gt;, &lt;code&gt;OutputFormat&lt;/code&gt;, ..) and the event types (&lt;code&gt;TextEvent&lt;/code&gt;, &lt;code&gt;CloseStreamEvent&lt;/code&gt;, &lt;code&gt;AudioEvent&lt;/code&gt;), but doesn't explain how to package the bidirectional event-stream body. The first attempt was therefore naive: I built a single event-stream body with &lt;code&gt;TextEvent&lt;/code&gt; followed by &lt;code&gt;CloseStreamEvent&lt;/code&gt;, signed it with SigV4 in its standard form (&lt;code&gt;HTTP_REQUEST_HEADERS&lt;/code&gt; headers and &lt;code&gt;EMPTY_SHA256&lt;/code&gt; payload), and sent it in one shot. AWS Polly's response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ServiceFailureException: Service is unavailable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No "this header is missing", no "the body isn't the type I expect", nothing that lets you figure out what's wrong. Always the same response across every combination I tried. Pushing harder on the Polly endpoint by tweaking parameters was therefore pointless: the contract had to be found elsewhere.&lt;/p&gt;

&lt;p&gt;I checked botocore's &lt;code&gt;service-2.json&lt;/code&gt; file (the same file is in the Java SDK, but only the latter implements it in a client): it's the canonical declaration of the AWS contract, committed to the repos as input for the code generator. For Polly it declares &lt;a href="https://github.com/boto/botocore/blob/develop/botocore/data/polly/2016-06-10/service-2.json#L6-L7" rel="noopener noreferrer"&gt;&lt;code&gt;protocol: "rest-json"&lt;/code&gt; with &lt;code&gt;protocolSettings: { h2: "eventstream" }&lt;/code&gt;&lt;/a&gt; and an &lt;a href="https://github.com/boto/botocore/blob/develop/botocore/data/polly/2016-06-10/service-2.json#L810-L824" rel="noopener noreferrer"&gt;&lt;code&gt;ActionStream&lt;/code&gt; payload of type &lt;code&gt;eventstream&lt;/code&gt;&lt;/a&gt;. It's the same protocol Transcribe Streaming uses for &lt;code&gt;start-stream-transcription&lt;/code&gt;, and for Transcribe a public Python implementation already exists: &lt;a href="https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/develop/amazon_transcribe/eventstream.py#L681-L741" rel="noopener noreferrer"&gt;&lt;code&gt;amazon-transcribe-streaming-sdk&lt;/code&gt;&lt;/a&gt; (Apache 2.0, awslabs). I read the transcribe-sdk and ported its signing logic to &lt;a href="https://github.com/bilardi/amazon-polly-streaming/blob/master/amazon_polly_streaming/_event_signer.py#L44-L116" rel="noopener noreferrer"&gt;&lt;code&gt;amazon-polly-streaming&lt;/code&gt;&lt;/a&gt;, adapting it to Polly.&lt;/p&gt;
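
&lt;p&gt;To make "packaging the event-stream body" concrete, here is a minimal sketch of the generic AWS event-stream framing: a 12-byte prelude (total length, headers length, CRC32 of those 8 bytes), the headers, the payload, and a trailing CRC32 over everything before it. It handles string headers only and leaves out the per-event signing that the article ported from the transcribe SDK. The JSON payload shape &lt;code&gt;{"Text": ...}&lt;/code&gt; is an assumption for illustration, not a confirmed Polly contract.&lt;/p&gt;

```python
import json
import struct
import zlib

STRING_TYPE = 7  # event-stream header value type for UTF-8 strings


def encode_headers(headers):
    """Encode string headers: name length, name, type, value length, value."""
    out = b""
    for name, value in headers.items():
        name_bytes, value_bytes = name.encode("utf-8"), value.encode("utf-8")
        out += struct.pack("!B", len(name_bytes)) + name_bytes
        out += struct.pack("!BH", STRING_TYPE, len(value_bytes)) + value_bytes
    return out


def encode_message(headers, payload):
    """Frame one message: prelude, prelude CRC, headers, payload, message CRC."""
    header_bytes = encode_headers(headers)
    total_length = 12 + len(header_bytes) + len(payload) + 4
    prelude = struct.pack("!II", total_length, len(header_bytes))
    body = prelude + struct.pack("!I", zlib.crc32(prelude)) + header_bytes + payload
    return body + struct.pack("!I", zlib.crc32(body))


def decode_message(message):
    """Check both CRCs and return (headers, payload); string headers only."""
    total_length, headers_length = struct.unpack("!II", message[:8])
    (prelude_crc,) = struct.unpack("!I", message[8:12])
    (message_crc,) = struct.unpack("!I", message[-4:])
    if total_length != len(message) or zlib.crc32(message[:8]) != prelude_crc:
        raise ValueError("corrupt prelude")
    if zlib.crc32(message[:-4]) != message_crc:
        raise ValueError("corrupt message")
    headers, pos, end = {}, 12, 12 + headers_length
    while pos != end:
        name_len = message[pos]
        name = message[pos + 1:pos + 1 + name_len].decode("utf-8")
        pos += 1 + name_len
        _value_type, value_len = struct.unpack("!BH", message[pos:pos + 3])
        headers[name] = message[pos + 3:pos + 3 + value_len].decode("utf-8")
        pos += 3 + value_len
    return headers, message[end:-4]


text_event = encode_message(
    {
        ":message-type": "event",
        ":event-type": "TextEvent",
        ":content-type": "application/json",
    },
    json.dumps({"Text": "Hello"}).encode("utf-8"),  # payload shape is assumed
)
print(decode_message(text_event))
```

&lt;p&gt;On a bidirectional stream each event travels as its own frame over the open HTTP/2 stream, rather than as one pre-built body sent in a single shot.&lt;/p&gt;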

&lt;p&gt;What I learned (the hard way):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS errors like &lt;code&gt;ServiceFailureException&lt;/code&gt; don't say what went wrong: a design choice. For AWS services not yet in boto3, you have to go straight to the &lt;code&gt;service-2.json&lt;/code&gt; file (in botocore or in the Java SDK, they are identical): faster than debugging parameter by parameter&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;smithy_aws_core[eventstream]&lt;/code&gt; is today the most complete Python reference for the generic part of the AWS HTTP/2 bidirectional event-stream; the event types (for Polly: &lt;code&gt;TextEvent&lt;/code&gt;, &lt;code&gt;CloseStreamEvent&lt;/code&gt;, &lt;code&gt;AudioEvent&lt;/code&gt;) aren't there, whoever builds the client writes them (in this case the Polly client)&lt;/li&gt;
&lt;li&gt;the Java SDK v2 client code is generated automatically at build time from &lt;code&gt;service-2.json&lt;/code&gt;, it isn't committed in the repo: searching the method name (e.g. &lt;code&gt;startSpeechSynthesisStream&lt;/code&gt;) in the source returns only changelogs and the service model, not the real signatures. For the protocol contract, &lt;code&gt;service-2.json&lt;/code&gt; stays the canonical source (both in the Java SDK and in botocore)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  That pool that worked solo
&lt;/h3&gt;

&lt;p&gt;An HTTPS call to AWS has a cost: before exchanging the first byte of data, there's the TLS handshake and the HTTP/2 setup. A connection pool removes that cost for every call after the first: open once, reuse N times. On a pipeline that calls Polly bidirectional once per finalized sentence it's an immediate win: ~50 ms less median per call, from the second call onwards.&lt;/p&gt;

&lt;p&gt;I added the HTTP/2 pool in &lt;code&gt;amazon-polly-streaming&lt;/code&gt; v0.2.0 with &lt;code&gt;use_pool=True&lt;/code&gt; as the default, and with a single listener it worked fine...&lt;/p&gt;

&lt;p&gt;Then I implemented the multilingual broadcast fan-out: 1 speaker to N listeners, each with its own target language. The test used 2 listeners (&lt;code&gt;en-US&lt;/code&gt; and &lt;code&gt;de-DE&lt;/code&gt;) and 5 sentences, so 5 sentences times 2 target languages: I expected 10 calls to Polly. Instead, half of the calls emitted no audio. The pattern alternated: within a single run one language always "won" and the other always "lost", but across runs the roles flipped. So it wasn't language-specific, it was specific to the &lt;strong&gt;second parallel task&lt;/strong&gt; of the fan-out iteration (a &lt;code&gt;for target in targets:&lt;/code&gt; over an unordered &lt;code&gt;set&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fbilardi%2Frealtime-speech-to-speech%2Fmaster%2Fimages%2F0.2.0.en.mermaid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fbilardi%2Frealtime-speech-to-speech%2Fmaster%2Fimages%2F0.2.0.en.mermaid.png" alt="Pool 0.2.0" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Diagnosis: pool v0.2.0 kept &lt;strong&gt;one and only&lt;/strong&gt; &lt;code&gt;HttpClientConnection&lt;/code&gt; per &lt;code&gt;(host, port)&lt;/code&gt; pair. Under fan-out, two near-simultaneous calls asked the pool for a connection to Polly: the first opened one from scratch, the second received the same connection already open. Both opened a new HTTP/2 stream on the same connection. But Polly bidirectional enforces "1 stream = 1 sentence" and the Polly endpoint accepts only one active bidirectional stream at a time: what I observed was that awscrt queued the second stream until the first one closed. Under fan-out the queue never drained: before the first one finished, the next sentence arrived. From here two moves: one immediate and one structural.&lt;/p&gt;

&lt;p&gt;As a lazy developer, the workaround first: &lt;code&gt;POLLY_USE_POOL=false&lt;/code&gt; so every call opened a fresh connection and every call produced audio. Cost: the ~50 ms gained earlier from the pool were lost on every call. The refactor of &lt;code&gt;_ConnectionPool&lt;/code&gt; with lease semantics was needed: &lt;code&gt;amazon-polly-streaming&lt;/code&gt; v0.3.0 creates a list of connections per &lt;code&gt;(host, port)&lt;/code&gt; instead of a single one, so every fan-out task leases a distinct connection (opened cold the first time, reused after that).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fbilardi%2Frealtime-speech-to-speech%2Fmaster%2Fimages%2F0.3.0.en.mermaid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fbilardi%2Frealtime-speech-to-speech%2Fmaster%2Fimages%2F0.3.0.en.mermaid.png" alt="Pool 0.3.0" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
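
&lt;p&gt;The lease idea can be sketched with plain &lt;code&gt;asyncio&lt;/code&gt;. This is not the code of &lt;code&gt;_ConnectionPool&lt;/code&gt;: &lt;code&gt;LeasePool&lt;/code&gt;, &lt;code&gt;fake_open&lt;/code&gt; and the &lt;code&gt;polly.example&lt;/code&gt; host are hypothetical names, and a real pool would also need to evict dead connections. It only shows the invariant: a connection is lent to one task at a time, and returns to the idle list when that task is done.&lt;/p&gt;

```python
import asyncio
from collections import defaultdict


class LeasePool:
    """Keep a list of idle connections per (host, port); lend each to one task."""

    def __init__(self, open_connection):
        self._open_connection = open_connection  # hypothetical async factory
        self._idle = defaultdict(list)
        self.opened = 0

    async def acquire(self, host, port):
        idle = self._idle[(host, port)]
        if idle:
            return idle.pop()  # warm reuse: nobody else holds this one
        self.opened += 1
        return await self._open_connection(host, port)  # cold open

    def release(self, host, port, connection):
        self._idle[(host, port)].append(connection)


async def fake_open(host, port):
    """Stand-in for TLS handshake plus HTTP/2 setup."""
    await asyncio.sleep(0)
    return object()


async def synthesize(pool, log, target):
    connection = await pool.acquire("polly.example", 443)
    try:
        log.append((target, id(connection)))
        await asyncio.sleep(0)  # stands in for one bidirectional stream
    finally:
        pool.release("polly.example", 443, connection)


async def demo():
    pool, log = LeasePool(fake_open), []
    for _sentence in range(2):
        # Fan-out: both target languages are synthesized in parallel.
        await asyncio.gather(*(synthesize(pool, log, t) for t in ("en-US", "de-DE")))
    return pool.opened, log


opened, log = asyncio.run(demo())
print(opened, len(log))
```

&lt;p&gt;Two connections are opened cold for the first sentence; the second sentence reuses both, and no two parallel tasks ever share one.&lt;/p&gt;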

&lt;p&gt;Improvement table across iterations. The &lt;code&gt;polly_first_byte_ms&lt;/code&gt; metric measures the time between when Translate returns the translated text and the first audio byte arriving from Polly: TLS plus HTTP/2 setup plus Polly's start-up latency. It's not the end-to-end latency perceived by the listener (which also includes server-to-browser forwarding).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Median &lt;code&gt;polly_first_byte_ms&lt;/code&gt; warm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single listener, no pool&lt;/td&gt;
&lt;td&gt;~370 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single listener, with pool (v0.2.0)&lt;/td&gt;
&lt;td&gt;~331 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fan-out 1-to-2, workaround without pool (v0.2.0)&lt;/td&gt;
&lt;td&gt;~373 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fan-out 1-to-2, with fixed pool (v0.3.0)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~306 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fixed pool in v0.3.0 beats every previous measurement: ~25 ms less median compared to the single-listener pool in v0.2.0. This extra delta comes from pipeline optimizations accumulated across iterations, orthogonal to the pool but that show up in the final result.&lt;/p&gt;

&lt;h3&gt;
  
  
  That WAF that, thankfully, isn't needed
&lt;/h3&gt;

&lt;p&gt;At the first deploy on EC2 via &lt;a href="https://github.com/bilardi/aws-docker-host" rel="noopener noreferrer"&gt;&lt;code&gt;aws-docker-host&lt;/code&gt;&lt;/a&gt;, with a public ALB at &lt;code&gt;https://sts.workshop.pandle.net&lt;/code&gt;, uvicorn's logs filled up within minutes of the &lt;code&gt;apply&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /hello.world?%ADd+allow_url_include%3d1+%ADd+auto_prepend_file%3dphp://input   404
GET  /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php                            404
GET  /vendor/phpunit/Util/PHP/eval-stdin.php                                        404
GET  /phpunit/phpunit/src/Util/PHP/eval-stdin.php                                   404
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tens of requests per minute: no problem, FastAPI answers 404 to all of them. The real risk is different: a targeted bot connects to &lt;code&gt;/ws/speak&lt;/code&gt; or &lt;code&gt;/ws/listen&lt;/code&gt; and fires Transcribe plus Translate plus Polly at the expense of the AWS account owner. The figure is low per single call but scales linearly with the number of malicious connections.&lt;/p&gt;

&lt;p&gt;So, how do you defend yourself ?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IP allowlist on the ALB security group&lt;/td&gt;
&lt;td&gt;Granular&lt;/td&gt;
&lt;td&gt;The audience IPs at the talk are not known in advance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS WAF with rules on scanner patterns&lt;/td&gt;
&lt;td&gt;Blocks the known noise (scanner UA, PHP paths)&lt;/td&gt;
&lt;td&gt;Doesn't block "competent" abuse (bot with browser UA, correct path), and costs 5-10 € / month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single shared token&lt;/td&gt;
&lt;td&gt;Simple to implement&lt;/td&gt;
&lt;td&gt;The QR code reaches tens of people, yet the token in it would have to be treated as a secret&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Double token per role&lt;/td&gt;
&lt;td&gt;Exposure asymmetry&lt;/td&gt;
&lt;td&gt;15 extra lines of code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Decision: double token. &lt;code&gt;SPEAKER_TOKEN&lt;/code&gt; protects &lt;code&gt;/ws/speak&lt;/code&gt; (the cost driver: Transcribe plus Translate plus Polly for N languages), &lt;code&gt;LISTENER_TOKEN&lt;/code&gt; protects &lt;code&gt;/ws/listen&lt;/code&gt; (the distribution path via QR code). Independent: the listener token doesn't work for the speaker, and vice versa. If the QR code leaks (photos on social, screenshots, shares), the damage is limited to "anyone can listen", not "anyone can spend the AWS owner's money". The &lt;code&gt;SPEAKER_TOKEN&lt;/code&gt; stays in the shell history and in the &lt;code&gt;.env&lt;/code&gt; of the deploy.&lt;/p&gt;

&lt;p&gt;The design stays minimal at every level. Locally, with no tokens set, authentication is off and nothing changes. The architecture adds no complications: no cookies, no login form, no OAuth, just a string comparison at each connection. And the code fits in a few lines on the server, a flag on the audio client, a URL parameter for the browsers, a few sample environment variables. Good enough for a PoC with frequent token rotation between events.&lt;/p&gt;
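
&lt;p&gt;A minimal sketch of the per-role check, independent of FastAPI. The function name and the role keys are hypothetical; treating each role as open when its own variable is unset is my reading of "with no tokens set, authentication is off". I also swap the plain string comparison for &lt;code&gt;hmac.compare_digest&lt;/code&gt;, a constant-time comparison from the standard library.&lt;/p&gt;

```python
import hmac

# Which environment variable guards which WebSocket role.
ROLE_ENV = {"speak": "SPEAKER_TOKEN", "listen": "LISTENER_TOKEN"}


def is_authorized(role, presented, env):
    """Return True if the presented token matches the one set for the role."""
    expected = env.get(ROLE_ENV[role], "")
    if not expected:
        return True  # no token configured for this role: auth is off
    return hmac.compare_digest(expected.encode(), (presented or "").encode())


env = {"SPEAKER_TOKEN": "s3cret-speaker", "LISTENER_TOKEN": "qr-listener"}
print(is_authorized("listen", "qr-listener", env))
print(is_authorized("speak", "qr-listener", env))
```

&lt;p&gt;The second call is the point of the design: a leaked listener token cannot open the speaker endpoint. In the server, &lt;code&gt;env&lt;/code&gt; would be &lt;code&gt;os.environ&lt;/code&gt; and the token would come from the connection's query string.&lt;/p&gt;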

&lt;h2&gt;
  
  
  What else could be added ?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Signed JWTs in place of static tokens&lt;/strong&gt;: for prolonged use (always-on service, multiple events), switch to JWTs with a per-role TTL. If the internet exposure becomes continuous, manually rotating the two static tokens gets tiring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subtitles sync&lt;/strong&gt;: the translated text arrives at the browser as a JSON message before the audio, so it's already on screen when the audio starts. A precise text-to-audio sync (word-by-word highlight) is the next step for accessibility. Polly exposes speech marks (the &lt;code&gt;SpeechMarkTypes&lt;/code&gt; parameter) for exactly this in the synchronous &lt;code&gt;synthesize_speech&lt;/code&gt;; for the bidirectional API, their availability still needs to be checked in &lt;code&gt;service-2.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pause-based hybrid transcription pipeline&lt;/strong&gt;: to cut the perceived latency between "I'm done speaking" and "first audio byte", the pipeline needs to fire even when a Transcribe partial has been still for N milliseconds, not only when &lt;code&gt;is_partial=False&lt;/code&gt; arrives. Worth it only if you really want to optimize timing to the millisecond: the current sentence-bounded handling is enough, and implementing it requires a cancellation logic that's anything but trivial, because when Transcribe corrects a partial, the pipeline may already have fired translation and synthesis: you have to decide whether to let them finish, cancel them, or replace them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adoption of &lt;code&gt;amazon-polly-streaming&lt;/code&gt; by awslabs&lt;/strong&gt;: today it's the first public Python implementation of Polly bidirectional. The concrete path is a PR to &lt;a href="https://github.com/awslabs/aws-sdk-python" rel="noopener noreferrer"&gt;&lt;code&gt;aws-sdk-python&lt;/code&gt;&lt;/a&gt; to publish &lt;code&gt;aws-sdk-polly-streaming&lt;/code&gt; (sibling of &lt;code&gt;aws-sdk-transcribe-streaming&lt;/code&gt;), built on top of the generic primitives of &lt;code&gt;smithy_aws_core[eventstream]&lt;/code&gt;. When that client exists, &lt;code&gt;amazon-polly-streaming&lt;/code&gt; can be considered deprecated.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>polly</category>
      <category>docker</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>When does Iceberg beat Parquet+projection on AWS Glue, and when doesn't it?</title>
      <dc:creator>Alessandra Bilardi</dc:creator>
      <pubDate>Sun, 10 May 2026 20:39:08 +0000</pubDate>
      <link>https://dev.to/bilardi/when-does-iceberg-beat-parquetprojection-on-aws-glue-and-when-doesnt--2g2</link>
      <guid>https://dev.to/bilardi/when-does-iceberg-beat-parquetprojection-on-aws-glue-and-when-doesnt--2g2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmgbts8zmirh8a96xpon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmgbts8zmirh8a96xpon.png" alt="Architecture" width="800" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this project
&lt;/h2&gt;

&lt;p&gt;I built this &lt;a href="https://github.com/bilardi/etl-prototype" rel="noopener noreferrer"&gt;repo&lt;/a&gt; because I didn't have one of this kind yet and, having worked on data ingestion with Glue for a while, I wanted to gather in one place three things: how to structure code so it stays testable, which Firehose and Glue features to use and on what criteria, and a few Docker and Terraform gems I'd always promised myself to slot in somewhere.&lt;/p&gt;

&lt;p&gt;Plus, I had never set up Glue streaming from scratch, and for a personal project I needed a test bed to compare Iceberg and Parquet + partition projection on the same data flow and under the same Athena queries, to figure out when one solution wins over the other and why.&lt;/p&gt;

&lt;p&gt;This project mixes a lot of the experience I've gathered over the years with a couple of curiosities I hadn't had a chance to test. So there are no real challenges here: I already took those hits long ago. What I'm sharing is deliberate choices, driven by knowing these services inside out.&lt;/p&gt;

&lt;p&gt;The architecture in the image describes exactly this project: a Python producer simulating stock tickers, a Kinesis Data Stream as the single entry point, two Firehose streams persisting the same flow in two different formats (Iceberg and Parquet), two Glue jobs that write to both formats (one batch for OHLC computation on 1m and 5m, one streaming for anomaly detection via z-score on a sliding window), and Athena querying both databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The choices and why
&lt;/h2&gt;

&lt;p&gt;The goal was to compare Glue batch and Athena on top of an Iceberg-based database and a Parquet + partition projection one.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Why (less effort)&lt;/th&gt;
&lt;th&gt;Discarded alternative (more effort)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python producer with &lt;code&gt;boto3.put_records&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Original code, controllable scenarios (&lt;code&gt;stable&lt;/code&gt;, &lt;code&gt;trend&lt;/code&gt;, &lt;code&gt;spike&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;), pytest tests&lt;/td&gt;
&lt;td&gt;Kinesis Data Generator: webapp with Cognito, poorly maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;Partitioned with projection ready to use&lt;/td&gt;
&lt;td&gt;The alternative forces you to run a Crawler or schedule MSCK REPAIR TABLE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;--LOAD_DATA_MODE&lt;/code&gt; (&lt;code&gt;parquet&lt;/code&gt;, &lt;code&gt;spark&lt;/code&gt;, &lt;code&gt;iceberg&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;One parameter exposes three read strategies you can compare in the same deploy&lt;/td&gt;
&lt;td&gt;Three separate Glue jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wheel + &lt;code&gt;--additional-python-modules&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Explicit &lt;code&gt;pip install&lt;/code&gt; at worker boot, &lt;code&gt;pip install -e .&lt;/code&gt; locally: same import semantics&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--extra-py-files&lt;/code&gt; with zip or wheel: less deterministic across Glue versions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-line wrapper in &lt;code&gt;src/glue_jobs/&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;3 lines that call &lt;code&gt;run()&lt;/code&gt; from the wheel: all logic testable in pytest&lt;/td&gt;
&lt;td&gt;All code in &lt;code&gt;script_location&lt;/code&gt;: no pytest on the main scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The record schema the producer writes (&lt;code&gt;ticker_symbol&lt;/code&gt;, &lt;code&gt;sector&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;, &lt;code&gt;change&lt;/code&gt;, &lt;code&gt;event_timestamp&lt;/code&gt;) isn't something I made up: it's the one from the official AWS Firehose demo. That demo configures a single Firehose; this PoC configures two in parallel, one for Iceberg and one for Parquet+projection, to compare both storage formats on top of the same source. The Kinesis Data Generator is the tool the demo uses to produce the dataset, but rewriting it as a Python producer with &lt;code&gt;boto3&lt;/code&gt; gave me control over the scenarios (&lt;code&gt;stable&lt;/code&gt;, &lt;code&gt;trend&lt;/code&gt;, &lt;code&gt;spike&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;) and made them testable in pytest. The scenarios feed Glue streaming, which handles anomaly detection: &lt;code&gt;spike&lt;/code&gt; injects controlled price spikes to validate z-score detection, while &lt;code&gt;stable&lt;/code&gt; and &lt;code&gt;trend&lt;/code&gt; act as a baseline to check for false positives.&lt;/p&gt;
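
&lt;p&gt;To make the role of the scenarios concrete, here is a minimal pure-Python sketch of z-score detection on a sliding window. It is not the Spark implementation from the repo: the function name, the window size and the threshold are assumptions chosen only for illustration.&lt;/p&gt;

```python
from collections import deque
from statistics import mean, pstdev


def zscore_anomalies(prices, window=5, threshold=3.0):
    """Return the indexes whose price sits more than `threshold` sigmas
    away from the mean of the previous `window` prices."""
    history = deque(maxlen=window)
    anomalies = []
    for index, price in enumerate(prices):
        if len(history) == window:
            sigma = pstdev(history)
            # A perfectly flat window has sigma 0: skip it instead of dividing by zero.
            if sigma > 0 and abs(price - mean(history)) / sigma > threshold:
                anomalies.append(index)
        history.append(price)
    return anomalies


# Hypothetical series: a "stable" baseline and the same series ending with a spike.
stable = [100.0, 100.5, 99.5, 100.2, 99.8, 100.1, 100.3]
spike = stable[:6] + [130.0]

print(zscore_anomalies(stable))  # []
print(zscore_anomalies(spike))   # [6]
```

&lt;p&gt;The baseline produces no detections, while the injected spike is flagged: that is the kind of check the &lt;code&gt;stable&lt;/code&gt; and &lt;code&gt;spike&lt;/code&gt; scenarios make possible.&lt;/p&gt;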

&lt;p&gt;As a lazy developer, my criterion is always the same: less effort, in terms of time, code or cost. Two rows of the table deserve a deeper look: &lt;code&gt;--LOAD_DATA_MODE&lt;/code&gt; raises the question of read modes, while the 3-line wrapper carries the code organization that makes TDD possible. I'll cover them one at a time, starting with the read modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance and read modes
&lt;/h2&gt;

&lt;p&gt;To understand why the three &lt;code&gt;LOAD_DATA_MODE&lt;/code&gt; values exist, you have to start from the choice of &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html" rel="noopener noreferrer"&gt;partition projection&lt;/a&gt; as the partitioning strategy. The alternative would have been registering the partitions in the Glue Data Catalog &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html" rel="noopener noreferrer"&gt;via Crawler or &lt;code&gt;MSCK REPAIR TABLE&lt;/code&gt;&lt;/a&gt;, which lets you read them from Glue with &lt;code&gt;from_catalog&lt;/code&gt; and leverage the push-down predicate, &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html" rel="noopener noreferrer"&gt;up to 5x faster&lt;/a&gt; than post-read filtering. The trade-off is that &lt;code&gt;GetPartitions&lt;/code&gt; can hit &lt;a href="https://repost.aws/knowledge-center/glue-throttling-rate-exceeded" rel="noopener noreferrer"&gt;API rate limits&lt;/a&gt;, whereas S3 &lt;code&gt;LIST&lt;/code&gt; scales because it's &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html" rel="noopener noreferrer"&gt;paginated&lt;/a&gt;. Projection skips the registration (the table above reminds you why: less effort), but comes with a &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html" rel="noopener noreferrer"&gt;constraint&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Partition projection is usable only when the table is queried through Athena. If the same table is read through another service such as Amazon Redshift Spectrum, Athena for Spark, or Amazon EMR, the standard partition metadata is used.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So a Glue job reading the Parquet+projection database via &lt;code&gt;from_catalog&lt;/code&gt; would fall back to standard partition metadata, which for a projection table isn't registered in the Catalog: no partition info is available on the Glue side, so the read goes nowhere. Dead end. You have to go straight to S3 with &lt;code&gt;spark.read.parquet&lt;/code&gt;, leaving Spark to handle &lt;a href="https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery" rel="noopener noreferrer"&gt;partition discovery&lt;/a&gt; via &lt;code&gt;LIST&lt;/code&gt; of the prefixes. Projection only matters when you query the same table from Athena, where it does its job: no &lt;code&gt;GetPartitions&lt;/code&gt; calls to the Catalog, partitions computed in memory from the template.&lt;/p&gt;
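
&lt;p&gt;When the date range is known in advance, the job can also hand Spark only the prefixes it needs instead of listing the whole table. The sketch below builds them in plain Python; the bucket name and the &lt;code&gt;year=/month=/day=&lt;/code&gt; layout are assumptions, not necessarily the layout used in the repo.&lt;/p&gt;

```python
from datetime import date, timedelta


def partition_prefixes(base, start, end):
    """Build one Hive-style prefix per day between start and end (inclusive)."""
    days = (end - start).days + 1
    return [
        f"{base}year={d.year}/month={d.month:02d}/day={d.day:02d}/"
        for d in (start + timedelta(days=offset) for offset in range(days))
    ]


base = "s3://example-bucket/parquet_projection/raw/"  # hypothetical location
paths = partition_prefixes(base, date(2026, 4, 30), date(2026, 5, 1))
print(paths)

# Inside a Glue job the prefixes could then be read with:
# spark.read.option("basePath", base).parquet(*paths)
```

&lt;p&gt;The &lt;code&gt;basePath&lt;/code&gt; option tells Spark where partition discovery starts, so the partition columns are still derived from the paths.&lt;/p&gt;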

&lt;p&gt;From here, the three modes of &lt;code&gt;LOAD_DATA_MODE&lt;/code&gt; exposed by the Glue batch job:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What it returns&lt;/th&gt;
&lt;th&gt;Extra cost vs &lt;code&gt;spark&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;When it makes sense&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;parquet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Glue DynamicFrame (&lt;code&gt;from_options(connection_type="s3", format="parquet")&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Schema discovery on-the-fly + ResolveChoice (explicit encoding of columns with inconsistent types as "choice"); wrapper memory overhead&lt;/td&gt;
&lt;td&gt;Raw "messy" data or unstable schema, where the DynamicFrame's flexibility helps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;spark&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Plain DataFrame (&lt;code&gt;spark.read.parquet(path)&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;No extra overhead: schema is what it is&lt;/td&gt;
&lt;td&gt;Parquet data with stable schema, like Firehose-generated. The most direct path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;iceberg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DynamicFrame from &lt;code&gt;from_catalog&lt;/code&gt;, but the read goes through Iceberg metadata (manifest list, column statistics)&lt;/td&gt;
&lt;td&gt;Reading the manifest list (small fixed cost); in exchange you get file skipping on non-partition filters&lt;/td&gt;
&lt;td&gt;Data managed as Iceberg tables with MERGE/UPSERT, and when typical filters are on columns with useful statistics (timestamp, ticker, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The DynamicFrame's traits are described in the &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html" rel="noopener noreferrer"&gt;Glue documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A &lt;code&gt;DynamicFrame&lt;/code&gt; is similar to a &lt;code&gt;DataFrame&lt;/code&gt;, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The access pattern shifts the balance between &lt;code&gt;spark&lt;/code&gt;/&lt;code&gt;parquet&lt;/code&gt; and &lt;code&gt;iceberg&lt;/code&gt; as volume grows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Access pattern&lt;/th&gt;
&lt;th&gt;Small volumes (~1 GB)&lt;/th&gt;
&lt;th&gt;Large volumes (50-100 GB, many files)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full read, no filter&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;iceberg&lt;/code&gt; slightly penalized by the fixed cost of the manifest read&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;iceberg&lt;/code&gt; comparable: the manifest cost dilutes against total I/O&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filter on partition column&lt;/td&gt;
&lt;td&gt;comparable: both do basic pruning&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;iceberg&lt;/code&gt; wins: the manifest list is O(1) over partition count, S3 list grows with O(n)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filter on non-partition column&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;iceberg&lt;/code&gt; wins via column statistics in the manifests: skips entire files without opening them&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;iceberg&lt;/code&gt; wins clearly: &lt;code&gt;parquet&lt;/code&gt;/&lt;code&gt;spark&lt;/code&gt; have to read and filter at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practice, on large volumes &lt;a href="https://iceberg.apache.org/spec/" rel="noopener noreferrer"&gt;Iceberg&lt;/a&gt; wins because its manifests keep, for each Parquet file, the lower and upper bounds (min and max) of the columns it tracks. When a query filters (say &lt;code&gt;ticker_symbol = 'AMZN'&lt;/code&gt;), the query engine compares the value against those bounds and immediately knows which files might hold the data and which can't; the discarded files don't even get opened.&lt;/p&gt;
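
&lt;p&gt;The pruning idea fits in a few lines of plain Python. This is only a conceptual sketch: &lt;code&gt;FileStats&lt;/code&gt;, the file names and the bounds are made up for illustration, and real manifests carry much more than this.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Bounds of one column for one data file (hypothetical structure)."""
    path: str
    lower: str
    upper: str


def files_to_open(stats, value):
    """Keep only the files whose [lower, upper] range can contain `value`."""
    return [s.path for s in stats if value >= s.lower and s.upper >= value]


# Hypothetical bounds of ticker_symbol in three data files.
manifest = [
    FileStats("data/file-a.parquet", "AAPL", "AMD"),
    FileStats("data/file-b.parquet", "AMZN", "GOOG"),
    FileStats("data/file-c.parquet", "MSFT", "TSLA"),
]

print(files_to_open(manifest, "AMZN"))  # ['data/file-b.parquet']
```

&lt;p&gt;Two files out of three are discarded by comparing strings in memory, without a single read of the data itself.&lt;/p&gt;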

&lt;p&gt;As a lazy developer I preferred reading the documentation rather than running a generic benchmark, because the access pattern is already clear. Then, case by case, the choice depends on the kind of data access required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three-layer TDD on Glue jobs
&lt;/h2&gt;

&lt;p&gt;Glue jobs are notoriously hard to test: you need &lt;code&gt;GlueContext&lt;/code&gt;, you need a real Iceberg &lt;code&gt;MERGE INTO&lt;/code&gt;, you need Spark configured the way it runs on the worker. I don't give up TDD here either: I split the code into three layers with clear boundaries.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pure Python logic&lt;/strong&gt; (argument parsing, naming derivation, producer scenarios): direct pytest, zero AWS or Spark dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spark core transformations&lt;/strong&gt; (the &lt;code&gt;OhlcAggregator&lt;/code&gt;, &lt;code&gt;ZScoreDetector&lt;/code&gt; classes): &lt;code&gt;SparkSession.builder.master("local[1]")&lt;/code&gt; as fixture, DataFrames built from literals. The classes are DataFrame-in / DataFrame-out, fully isolated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator &lt;code&gt;run()&lt;/code&gt;&lt;/strong&gt;: takes &lt;code&gt;args&lt;/code&gt;, &lt;code&gt;spark&lt;/code&gt;, &lt;code&gt;glue_context&lt;/code&gt;, &lt;code&gt;read_*_fn&lt;/code&gt;, &lt;code&gt;write_fn&lt;/code&gt; as parameters. Tests pass a mocked &lt;code&gt;GlueContext&lt;/code&gt; and test source/sink functions. The principle is "the job builds, the classes consume": all Glue knowledge lives in &lt;code&gt;_cli_entrypoint&lt;/code&gt;, which instantiates source and sink before calling &lt;code&gt;run()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
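
&lt;p&gt;The third layer can be sketched without any AWS or Spark dependency. The version below is deliberately reduced: it drops &lt;code&gt;spark&lt;/code&gt; and &lt;code&gt;glue_context&lt;/code&gt;, uses lists of dictionaries instead of DataFrames, and its filter rule and argument names are hypothetical. What it shows is the injection pattern, not the repo's actual &lt;code&gt;run()&lt;/code&gt;.&lt;/p&gt;

```python
def run(args, read_fn, write_fn):
    """Orchestrate the job: every I/O dependency arrives as a parameter."""
    rows = read_fn(args["source"])
    kept = [row for row in rows if row["price"] > 0]  # hypothetical business rule
    write_fn(args["target"], kept)
    return len(kept)


def test_run_filters_and_writes():
    written = {}

    def fake_read(source):
        # Stands in for the real source function built by the entrypoint.
        return [{"price": 10.0}, {"price": -1.0}]

    def fake_write(target, rows):
        # Stands in for the real sink: it only records what it receives.
        written[target] = rows

    count = run({"source": "in", "target": "out"}, fake_read, fake_write)
    assert count == 1
    assert written == {"out": [{"price": 10.0}]}


test_run_filters_and_writes()
```

&lt;p&gt;Because &lt;code&gt;run()&lt;/code&gt; never builds its own source or sink, the test needs no patching: it passes two local functions and inspects what was written.&lt;/p&gt;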

&lt;p&gt;What stays out of pytest is just the real integration (Glue Data Catalog, Iceberg &lt;code&gt;MERGE INTO&lt;/code&gt;, Kinesis Stream): covered by the JSON files in &lt;code&gt;tests/integration/&lt;/code&gt;, which run both locally via docker compose and on AWS via &lt;code&gt;aws glue start-job-run&lt;/code&gt;. The same file drives both: no duplication between AWS config and local test scripts.&lt;/p&gt;

&lt;p&gt;Alongside, &lt;code&gt;docker-compose.yaml&lt;/code&gt; exposes two profiles pointing to the official AWS images, &lt;code&gt;glue4&lt;/code&gt; (Spark 3.3, Python 3.10) and &lt;code&gt;glue5&lt;/code&gt; (Spark 3.5, Python 3.11, Iceberg built-in): &lt;code&gt;make test-integration-local PROFILE=glue5&lt;/code&gt; (default) or &lt;code&gt;PROFILE=glue4&lt;/code&gt;. The mount paths differ between the two images (&lt;code&gt;/home/glue_user/&lt;/code&gt; vs &lt;code&gt;/home/hadoop/&lt;/code&gt;), but &lt;code&gt;local_test.sh&lt;/code&gt; uses relative paths so the same JSON works on both. It's the shortcut to validate the same script on two Glue versions before bumping &lt;code&gt;glue_version&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The Python developer in me is now very satisfied.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned (the hard way)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Firehose with format conversion: 64 MB minimum and cached schemas
&lt;/h3&gt;

&lt;p&gt;Firehose accumulates records in a buffer before writing them to S3, and flushes in two cases: when the buffer reaches a certain size (&lt;code&gt;buffering_size&lt;/code&gt;, in MB) or when a certain time passes (&lt;code&gt;buffering_interval&lt;/code&gt;, in seconds).&lt;/p&gt;

&lt;p&gt;For a while now, the minimum values for these buffers have been lowered: &lt;code&gt;buffering_size&lt;/code&gt; starts at 1 MB and &lt;code&gt;buffering_interval&lt;/code&gt; at 0 seconds.&lt;/p&gt;

&lt;p&gt;For a PoC with small volumes I wanted a quick flush: I set &lt;code&gt;buffering_size = 1&lt;/code&gt; MB and &lt;code&gt;buffering_interval = 60s&lt;/code&gt;, counting on the time-based flush to fire before the size threshold was reached.&lt;/p&gt;

&lt;p&gt;On the Iceberg Firehose it went smoothly. On the Parquet+projection Firehose, it didn't:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: InvalidArgumentException: BufferingHints.SizeInMBs must be at least 64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a Firehose has format conversion enabled (&lt;code&gt;data_format_conversion_configuration&lt;/code&gt;, which converts the incoming JSON to Parquet before writing it to S3), AWS imposes &lt;code&gt;buffering_size &amp;gt;= 64&lt;/code&gt; MB. On the Iceberg Firehose there's no conversion (Iceberg leans on its own native format), so 1 MB is accepted. On Parquet+projection I bumped the value to 64 MB and that was that: the flush stays governed by &lt;code&gt;buffering_interval = 60s&lt;/code&gt;, and at PoC volumes the 64 MB never get saturated. Perceived latency unchanged.&lt;/p&gt;
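
&lt;p&gt;Why the 64 MB floor doesn't hurt at PoC volumes becomes obvious once the flush rule is written down. The function below is a toy model of "size or time, whichever comes first", with hypothetical names; it is not Firehose code.&lt;/p&gt;

```python
def should_flush(buffered_mb, elapsed_s, size_mb, interval_s):
    """Flush as soon as either the size or the time threshold is reached."""
    return buffered_mb >= size_mb or elapsed_s >= interval_s


# Half a megabyte buffered: with a 64 MB size threshold only time can trigger.
print(should_flush(0.5, 30, size_mb=64, interval_s=60))  # False
print(should_flush(0.5, 60, size_mb=64, interval_s=60))  # True
```

&lt;p&gt;As long as the buffer stays far below the size threshold, the interval alone decides when data reaches S3.&lt;/p&gt;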

&lt;p&gt;Same Parquet+projection Firehose, second round: after apply, records were ending up in &lt;code&gt;s3://bucket/parquet_projection/_firehose_errors/format-conversion-failed/&lt;/code&gt; instead of &lt;code&gt;raw/&lt;/code&gt;. Cause: the producer writes &lt;code&gt;event_timestamp&lt;/code&gt; as ISO 8601 with &lt;code&gt;T&lt;/code&gt; and a timezone offset (&lt;code&gt;"2026-04-23T20:48:32+00:00"&lt;/code&gt;), but the OpenXJsonSerDe used by Firehose doesn't parse that form as a Hive timestamp: the textual layout it expects is &lt;code&gt;yyyy-MM-dd HH:mm:ss[.fff]&lt;/code&gt;. The Iceberg Firehose accepts ISO 8601 natively, the Parquet+projection one doesn't. Three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;change the producer to write epoch millis&lt;/strong&gt;: that would have been the cleanest, but assuming you can't touch the producer, where would it make sense to handle the conversion downstream?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;add a Lambda processor in Firehose to reformat the timestamp&lt;/strong&gt;: for such a simple operation, repeated on every record, was it really worth bringing in a Lambda?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;type &lt;code&gt;event_timestamp&lt;/code&gt; as &lt;code&gt;string&lt;/code&gt; in the Glue raw tables, and cast it in Spark via &lt;code&gt;F.to_timestamp("event_timestamp")&lt;/code&gt; when needed&lt;/strong&gt;: when Spark has all the data in hand, it can handle the typing with O(n) complexity but parallelized&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I picked the third. The "natural" type lives in the layer where the data is born (&lt;code&gt;raw&lt;/code&gt; is populated by Firehose, so &lt;code&gt;string&lt;/code&gt; for portability), while the &lt;code&gt;timestamp&lt;/code&gt; type appears in &lt;code&gt;aggregated_*&lt;/code&gt; and &lt;code&gt;anomalies&lt;/code&gt;, where DataFrames are already in Spark's hands.&lt;/p&gt;
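
&lt;p&gt;To see the mismatch between the two representations, here is a plain-Python sketch with the standard library only. It is an illustration of the formats, not what the job does: in the job the cast is left to Spark's &lt;code&gt;F.to_timestamp&lt;/code&gt;.&lt;/p&gt;

```python
from datetime import datetime, timezone

iso_value = "2026-04-23T20:48:32+00:00"  # the form the producer writes

# Parse the ISO 8601 string (offset included), then render the
# space-separated layout without "T" and without offset.
parsed = datetime.fromisoformat(iso_value)
hive_value = parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

print(hive_value)  # 2026-04-23 20:48:32
```

&lt;p&gt;Keeping the column as &lt;code&gt;string&lt;/code&gt; in &lt;code&gt;raw&lt;/code&gt; means this conversion never has to happen per record before the data lands on S3.&lt;/p&gt;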

&lt;p&gt;To apply the fix, I updated the Glue raw table schema, changing the type of &lt;code&gt;event_timestamp&lt;/code&gt; from &lt;code&gt;timestamp&lt;/code&gt; to &lt;code&gt;string&lt;/code&gt;. &lt;code&gt;terraform apply&lt;/code&gt; went through fine, but for the next ~5 minutes the records kept landing in &lt;code&gt;_firehose_errors/&lt;/code&gt;. Cause: Firehose caches the &lt;code&gt;schema_configuration&lt;/code&gt; of the Glue table to avoid querying the Catalog on every record. AWS documents "up to 15 minutes" of cache; in my tests, 5 were enough before records started arriving cleanly in &lt;code&gt;raw/&lt;/code&gt;. To skip the wait, &lt;code&gt;terraform apply -replace="aws_kinesis_firehose_delivery_stream.parquet_projection[0]"&lt;/code&gt; recreates the delivery stream and clears the cache. For a PoC the wait is fine; in a real case the &lt;code&gt;replace&lt;/code&gt; (or &lt;code&gt;aws firehose update-destination&lt;/code&gt; directly) is the faster path.&lt;/p&gt;

&lt;h3&gt;
  
  
  The wheel filename: a story unto itself
&lt;/h3&gt;

&lt;p&gt;In the distant past, before I had local test management, I had the bad idea of providing the Glue job with the wheel renamed to &lt;code&gt;dist/glue_common.whl&lt;/code&gt;, so I wouldn't have to touch any configuration on each new upload to S3.&lt;/p&gt;

&lt;p&gt;But Glue throws a fit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LAUNCH ERROR | Installation of Additional Python Modules failed:
ERROR: glue_common.whl is not a valid wheel filename
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;pip install&lt;/code&gt; requires the PEP 427 form: &lt;code&gt;{name}-{version}-{python}-{abi}-{platform}.whl&lt;/code&gt; (plus an optional build tag after the version). An unversioned alias like &lt;code&gt;glue_common.whl&lt;/code&gt; doesn't pass that validation.&lt;/p&gt;

&lt;p&gt;So, as a lazy developer, what's the best way to do everything automatically without forgetting to upload the new wheel?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform reads the version dynamically from &lt;code&gt;src/glue_common/__init__.py&lt;/code&gt; via &lt;code&gt;regex()&lt;/code&gt;, builds the PEP 427 name and uses it as S3 key and source path&lt;/li&gt;
&lt;li&gt;on &lt;code&gt;make patch&lt;/code&gt; the filename changes, Terraform sees the new file and re-uploads it to S3 by itself&lt;/li&gt;
&lt;/ul&gt;
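
&lt;p&gt;The same derivation can be reproduced in Python, which is handy for checking a filename before uploading it. This is a sketch under assumptions: the version is taken to live in a &lt;code&gt;__version__&lt;/code&gt; string, the wheel is taken to be pure Python (&lt;code&gt;py3-none-any&lt;/code&gt;), and the regular expression is a simplified reading of PEP 427, not pip's own validator.&lt;/p&gt;

```python
import re

# Hypothetical content of src/glue_common/__init__.py
INIT_SOURCE = '__version__ = "0.3.1"\n'

# Simplified PEP 427 shape: name-version[-build]-python-abi-platform.whl
WHEEL_RE = re.compile(
    r"^[A-Za-z0-9_.]+-[A-Za-z0-9_.!+]+(-\d[A-Za-z0-9_.]*)?"
    r"-[A-Za-z0-9_.]+-[A-Za-z0-9_.]+-[A-Za-z0-9_.]+\.whl$"
)


def wheel_filename(init_source, name="glue_common"):
    """Extract the version string and build the wheel filename from it."""
    version = re.search(r'__version__\s*=\s*"([^"]+)"', init_source).group(1)
    return f"{name}-{version}-py3-none-any.whl"


filename = wheel_filename(INIT_SOURCE)
print(filename)                                  # glue_common-0.3.1-py3-none-any.whl
print(bool(WHEEL_RE.match(filename)))            # True
print(bool(WHEEL_RE.match("glue_common.whl")))   # False
```

&lt;p&gt;The unversioned alias fails because it has none of the dash-separated fields the filename convention requires.&lt;/p&gt;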

&lt;p&gt;Another satisfying win.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iceberg on Glue 5.0: two ways to register the catalog
&lt;/h3&gt;

&lt;p&gt;After the wheel fix, the batch job stopped on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;AnalysisException&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TABLE_OR_VIEW_NOT_FOUND&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="s1"&gt;'etl_prototype_demo_iceberg.aggregated_1m'&lt;/span&gt; &lt;span class="n"&gt;cannot&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="k"&gt;found&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tables were in the Glue Data Catalog (Terraform had created them, I could see them via &lt;code&gt;aws glue get-tables&lt;/code&gt;). What was missing was the bridge between Spark and the Catalog: the keys &lt;code&gt;spark.sql.extensions&lt;/code&gt;, &lt;code&gt;spark.sql.catalog.glue_catalog.*&lt;/code&gt; and &lt;code&gt;spark.sql.defaultCatalog&lt;/code&gt; that tell Spark "&lt;em&gt;for the &lt;code&gt;glue_catalog&lt;/code&gt; catalog, use the Iceberg implementation that leans on the Glue Data Catalog&lt;/em&gt;".&lt;/p&gt;

&lt;p&gt;It's a technical constraint: these keys must be applied &lt;strong&gt;before&lt;/strong&gt; the SparkSession is initialized. Once &lt;code&gt;GlueContext(sc)&lt;/code&gt; has created the SparkSession, a runtime &lt;code&gt;spark.conf.set("spark.sql.catalog.glue_catalog", "...")&lt;/code&gt; is accepted syntactically, but has no effect: the catalog doesn't get registered and the job answers "&lt;em&gt;Catalog 'glue_catalog' plugin class not found&lt;/em&gt;". That was exactly my first attempt long ago, before I diligently read the documentation...&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html" rel="noopener noreferrer"&gt;Glue documentation for Iceberg&lt;/a&gt; lists two equivalent ways to apply the conf in the right place:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Create a key named &lt;code&gt;--conf&lt;/code&gt; for your AWS Glue job, and set it to the following value. &lt;strong&gt;Alternatively, you can set the following configuration using &lt;code&gt;SparkConf&lt;/code&gt; in your script.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Under the hood, the two configurations achieve the same result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SparkConf in Python code&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
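&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GlueContext&lt;/span&gt;

  &lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;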
  &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getConf&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.sql.extensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.sql.catalog.glue_catalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;org.apache.iceberg.spark.SparkCatalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;# ... other conf ...
&lt;/span&gt;  &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;glueContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GlueContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# the SparkSession is born here with the right conf
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The configuration lives in the code. The &lt;code&gt;sc.stop()&lt;/code&gt; + recreation of the &lt;code&gt;SparkContext&lt;/code&gt; is when the configuration gets "injected" before SparkSession init.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--conf&lt;/code&gt; in Terraform's &lt;code&gt;default_arguments&lt;/code&gt;&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;  &lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;iceberg_spark_conf&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;" --conf "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.catalog.glue_catalog.warehouse=s3://${data.aws_s3_bucket.main.id}/iceberg/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"spark.sql.defaultCatalog=glue_catalog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Glue parses the concatenated string, applies the configurations at SparkSession boot, and then hands control to the Python script.&lt;/p&gt;
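
&lt;p&gt;The reason for the &lt;code&gt;join(" --conf ", ...)&lt;/code&gt; is that job arguments are a map, so the key &lt;code&gt;--conf&lt;/code&gt; can appear only once and every further setting has to be chained inside its value. A small Python equivalent of the Terraform expression, shortened to two settings for readability:&lt;/p&gt;

```python
settings = [
    "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.defaultCatalog=glue_catalog",
]

# Same result as Terraform's join(" --conf ", [...]): the first setting has
# no prefix because the argument key itself already is "--conf".
conf_value = " --conf ".join(settings)

default_arguments = {
    "--conf": conf_value,
    "--datalake-formats": "iceberg",
}

print(default_arguments["--conf"])
```

&lt;p&gt;The printed value is a single string in which the second setting is introduced by its own &lt;code&gt;--conf&lt;/code&gt;.&lt;/p&gt;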

&lt;p&gt;Why did I choose to configure the PoC via Terraform? Three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;a single source of truth&lt;/strong&gt;: the &lt;code&gt;iceberg_spark_conf&lt;/code&gt; &lt;code&gt;local&lt;/code&gt; is defined once in Terraform and reused by both the Glue batch and the streaming via &lt;code&gt;--conf = local.iceberg_spark_conf&lt;/code&gt; in their respective &lt;code&gt;default_arguments&lt;/code&gt;. No per-job duplication, and if I add a third Glue job tomorrow I reuse the same &lt;code&gt;local&lt;/code&gt; with a single line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;separation of configuration and code&lt;/strong&gt;: the catalog setup lives in Terraform alongside &lt;code&gt;--datalake-formats=iceberg&lt;/code&gt;; the Python code of the jobs doesn't know an Iceberg catalog exists, it imports &lt;code&gt;glue_common&lt;/code&gt;, takes &lt;code&gt;spark&lt;/code&gt; and &lt;code&gt;glue_context&lt;/code&gt; as parameters and runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;low-cost configuration changes&lt;/strong&gt;: a different warehouse, catalog implementation or IO is touched only in Terraform, with no need to rebuild and re-upload the wheel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The configuration in code, on the other hand, stays handier when the catalog config depends on arguments the job receives at runtime (for instance a &lt;code&gt;warehouse&lt;/code&gt; derived from the input bucket name passed as &lt;code&gt;--ARG&lt;/code&gt;): in that case the conf is built naturally in the code, since you already have the resolved arguments there. In this PoC the warehouse is fixed per environment, so the configuration in Terraform wins on simplicity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What else is there to add?
&lt;/h2&gt;

&lt;p&gt;Once the PoC has been signed off, things get serious: the simulated parts need a real integration, and there are other services and approaches to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real APIs&lt;/strong&gt;: replace the simulated scenario with a real ingestion. It changes the producer's nature, not the architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Flink&lt;/strong&gt; as an alternative to Glue streaming: it makes sense when you need stricter guarantees on how many times an event is processed (Flink natively supports exactly-once, i.e. each event processed exactly once; Glue streaming is at-least-once and duplicates are handled at the application layer), or when the required latency is sub-second (Glue streaming, working in micro-batches, typically lands in the 5-10 second range; Flink drops to hundreds of milliseconds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-environment deploy&lt;/strong&gt;: in a PoC, a single environment is enough. In production you need separate environments so you can test feature rollouts without touching live data. So you introduce Terraform Workspaces or per-env modules, with all the implications for account management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt;: in a PoC, manual &lt;code&gt;make test&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; are enough. When working in a team or on mission-critical pipelines you need automation (lint, test, wheel build and &lt;code&gt;terraform plan&lt;/code&gt; run automatically on every PR) to catch regressions before merge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-account Data Catalog sharing&lt;/strong&gt;: Lake Formation + RAM + KMS + &lt;code&gt;assume_role&lt;/code&gt;. When the data lake aggregates flows from branches, departments, partners, the centralized schema changes everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Management&lt;/strong&gt;: the evolution of centralized Data Catalog sharing is DataZone or SageMaker Unified Studio, with lineage, asset-level permissions and per-asset documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extra time frames in the batch&lt;/strong&gt; as roll-up from 5m (1h, 1d), not from raw: each level computes on top of the previous level's output, hence on less data. It's a classic approach (cascade ETL) and works when the higher-level aggregate can be recomputed from the lower level (the high of one hour is the max of the highs of the 5 minutes). It doesn't work if the calculation needs to go back to the original values, like medians or exact distinct counts&lt;/li&gt;
&lt;/ul&gt;
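
&lt;p&gt;The roll-up rule from the last point can be checked with a few lines of Python. This is a sketch on made-up numbers, not data from the PoC: it compares each aggregate computed from the 5-minute level with the same aggregate computed from the raw values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from statistics import median

# Raw values grouped by 5-minute bucket (made-up numbers for illustration).
five_minute_buckets = [[1, 2, 9], [3, 4, 5, 6, 7]]
raw_values = [value for bucket in five_minute_buckets for value in bucket]

# max composes: the hourly high can be rolled up from the 5-minute highs.
hourly_high_from_rollup = max(max(bucket) for bucket in five_minute_buckets)
hourly_high_from_raw = max(raw_values)

# median does not compose: the median of the medians is not the true median.
median_from_rollup = median(median(bucket) for bucket in five_minute_buckets)
median_from_raw = median(raw_values)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both ways of computing the high give 9, while the median comes out as 3.5 from the roll-up and 4.5 from the raw values: for that kind of metric the cascade has to go back to the original data.&lt;/p&gt;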

</description>
      <category>aws</category>
      <category>glue</category>
      <category>iceberg</category>
      <category>parquet</category>
    </item>
    <item>
      <title>The lazy developer's code quality</title>
      <dc:creator>Alessandra Bilardi</dc:creator>
      <pubDate>Thu, 30 Apr 2026 09:25:08 +0000</pubDate>
      <link>https://dev.to/bilardi/the-lazy-developers-code-quality-3a34</link>
      <guid>https://dev.to/bilardi/the-lazy-developers-code-quality-3a34</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2998kczeig3a7o5plwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2998kczeig3a7o5plwj.png" alt="Flow" width="800" height="1159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A repo to refresh, several rabbit holes to dive into
&lt;/h2&gt;

&lt;p&gt;A while ago, at PyCon IT, I attended a talk that opened my eyes on &lt;a href="https://pypi.org/project/pytest/" rel="noopener noreferrer"&gt;pytest&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler test management, especially for mocks&lt;/li&gt;
&lt;li&gt;parametrizable fixtures instead of the &lt;code&gt;setUp&lt;/code&gt; / &lt;code&gt;tearDown&lt;/code&gt; ritual&lt;/li&gt;
&lt;li&gt;bare &lt;code&gt;assert&lt;/code&gt; instead of a thousand &lt;code&gt;self.assertEqual&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd like my repo &lt;a href="https://github.com/bilardi/python-prototype/" rel="noopener noreferrer"&gt;python-prototype&lt;/a&gt;, born for educational purposes, to also be a bit of a template I can pull off the shelf for the next projects.&lt;/p&gt;

&lt;p&gt;So, with the excuse of refreshing the testing system with pytest and the packaging with &lt;a href="https://packaging.python.org/en/latest/guides/writing-pyproject-toml/" rel="noopener noreferrer"&gt;pyproject&lt;/a&gt;, I started thinking about adding more.&lt;/p&gt;

&lt;p&gt;I had been using &lt;a href="https://pypi.org/project/black/" rel="noopener noreferrer"&gt;black&lt;/a&gt; and &lt;a href="https://pypi.org/project/pylint/" rel="noopener noreferrer"&gt;pylint&lt;/a&gt; for a long time, so my first thought was: ok, let's bring in formatting and linting too. But I asked myself: isn't there something better that maintains style (&lt;a href="https://peps.python.org/pep-0008/" rel="noopener noreferrer"&gt;PEP 8&lt;/a&gt;), docstrings (&lt;a href="https://peps.python.org/pep-0257/" rel="noopener noreferrer"&gt;PEP 257&lt;/a&gt;) and type hints (&lt;a href="https://peps.python.org/pep-0484/" rel="noopener noreferrer"&gt;PEP 484&lt;/a&gt;) automatically ?&lt;/p&gt;

&lt;p&gt;And the environment, can it be modernized too ? With what ? Well, just like there are two schools, emacs and vi, there are also two schools, &lt;a href="https://python-poetry.org/" rel="noopener noreferrer"&gt;poetry&lt;/a&gt; and &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; .. without even mentioning all the others.&lt;/p&gt;

&lt;p&gt;What I needed was something to cover code quality, formatting, packaging and beyond: fewer tasks left to memory or to reading the holy README, more chances they actually get done.&lt;/p&gt;

&lt;p&gt;Since there's no "all-inclusive package", the plan was to test what was maintained and maintainable, and find the one most suited to my needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Today's chosen stack
&lt;/h2&gt;

&lt;p&gt;Four tools, not ten:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;uv&lt;/strong&gt;: the env manager. One Rust binary in place of &lt;code&gt;pip&lt;/code&gt;, &lt;code&gt;venv&lt;/code&gt;, &lt;code&gt;pyenv&lt;/code&gt; and &lt;code&gt;pipx&lt;/code&gt;. With poetry, the last two aren't covered and need to be installed separately; with uv there are fewer satellite tools around.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;ruff&lt;/a&gt;&lt;/strong&gt;: formatting and linting. Replaces &lt;code&gt;black&lt;/code&gt;, &lt;code&gt;isort&lt;/code&gt;, &lt;code&gt;flake8&lt;/code&gt; and most of &lt;code&gt;pylint&lt;/code&gt;. Another Rust binary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://microsoft.github.io/pyright/" rel="noopener noreferrer"&gt;pyright&lt;/a&gt;&lt;/strong&gt;: the type checker. Skipping &lt;a href="https://mypy-lang.org/" rel="noopener noreferrer"&gt;mypy&lt;/a&gt;, &lt;a href="https://pyrefly.org/" rel="noopener noreferrer"&gt;pyrefly&lt;/a&gt; and &lt;a href="https://github.com/astral-sh/ty" rel="noopener noreferrer"&gt;ty&lt;/a&gt;. For now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;pre-commit&lt;/a&gt;&lt;/strong&gt;: a git-hook that runs ruff and pytest automatically before every commit. Just .. remember to set it up at the start of the project !&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The single criterion that drove all these choices is &lt;strong&gt;least total effort&lt;/strong&gt;. Fewer tools = less config = less maintenance. The lazy developer wants the toolchain to break before the commit, in case some step gets forgotten. But without overdoing it: just enough to produce quality code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stories from the field
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pylint and the 4.35/10 grade
&lt;/h3&gt;

&lt;p&gt;The first run of pylint on simple-sample stings: 4.35/10. A high school grade, not a teaching repo's. I sit down to fix my JavaScript hangover: &lt;code&gt;myClass&lt;/code&gt; becomes &lt;code&gt;my_class&lt;/code&gt; (PEP 8 naming), &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt; and &lt;code&gt;foobar&lt;/code&gt; become &lt;code&gt;get_param_processing&lt;/code&gt;, &lt;code&gt;get_boolean&lt;/code&gt;, &lt;code&gt;get_reverse_protected_param&lt;/code&gt; (names that say what they do). Up to 9.41/10.&lt;/p&gt;

&lt;p&gt;But before claiming victory, three warnings need a decision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;W0223&lt;/strong&gt;: abstract method not implemented in a subclass. Pylint flags it as a bug to fix. In my case it MUST fail: it's part of the educational example. I keep it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C0301&lt;/strong&gt;: line too long. I look: it's an HTTP link in a docstring, can't be broken. I ignore it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C0104&lt;/strong&gt;: names like "foo" and "bar" are disallowed. I could have disabled the rule globally, but I'm glad I spent the hour restructuring instead: variables and methods should be expressive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these decisions is a case of "the tool is right about the code but not about the context". And here is where pylint's limit shows up: it tells you what it found, not whether it really needs fixing. The case-by-case judgement stays with you: pylint doesn't change anything by itself.&lt;/p&gt;
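
&lt;p&gt;To make W0223 concrete, here is a minimal sketch with hypothetical class names (they are not the ones in python-prototype): a subclass that deliberately skips an abstract method, which Python itself then refuses to instantiate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from abc import ABC, abstractmethod


class Interface(ABC):
    """Hypothetical base class with one abstract method."""

    @abstractmethod
    def get_boolean(self):
        """Subclasses are expected to override this."""


class Incomplete(Interface):
    """Deliberately does not override get_boolean."""


def try_instantiate():
    # Instantiating a class that still has abstract methods raises TypeError.
    try:
        Incomplete()
    except TypeError as error:
        return str(error)
    return "instantiated"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;try_instantiate()&lt;/code&gt; returns the &lt;code&gt;TypeError&lt;/code&gt; message instead of an instance. The warning is the static preview of that runtime failure, and in a teaching example the failure is the point.&lt;/p&gt;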

&lt;h3&gt;
  
  
  Pylint doesn't understand pytest
&lt;/h3&gt;

&lt;p&gt;I go looking for trouble, and run pylint on the test suite: a new warning shows up, W0621 &lt;code&gt;redefining-outer-name&lt;/code&gt;, on the fixtures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mci&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MyClassInterface&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_mci_creation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mci&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mci&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MyClassInterface&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pylint says "you're redefining &lt;code&gt;mci&lt;/code&gt; from the outer scope". But this pattern is the way fixtures work: it's not redefinition, it's parameter injection. Pylint reads the code as plain Python: it doesn't know how pytest runs it.&lt;/p&gt;

&lt;p&gt;False positive. The workaround exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mci&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mci_fixture&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MyClassInterface&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_mci_creation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mci&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mci&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MyClassInterface&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it's there to silence pylint, not to improve the code. I don't add it. And here I start thinking that pylint is too old for pytest, and it's time to switch tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ruff arrives and takes black's place
&lt;/h3&gt;

&lt;p&gt;I try &lt;code&gt;ruff check&lt;/code&gt; and &lt;code&gt;ruff format&lt;/code&gt;. It covers practically everything black did for formatting, and a good chunk of what pylint did for linting. One binary. Config in &lt;code&gt;pyproject.toml&lt;/code&gt;: a single section instead of two. Execution time: milliseconds.&lt;/p&gt;

&lt;p&gt;Ruff openly states the trade-off: it's AST-based and works on a single file at a time, it doesn't "read" the class hierarchy across files. So the abstract method not overridden, which I do need to see, doesn't get flagged. Ruff is a fast surface linter, not a deep analyst.&lt;/p&gt;

&lt;p&gt;Ok. Ruff takes black's place and covers most of pylint. For what's missing (abstract method, type consistency across files) I need another tool: a type checker.&lt;/p&gt;

&lt;h3&gt;
  
  
  The type checker tour
&lt;/h3&gt;

&lt;p&gt;Pylint flagged both typing and scoping errors (W0621 is a style check, not a type one). Choosing a type checker, I focus on the typing front: the scoping front stays out of this tour.&lt;/p&gt;

&lt;p&gt;I add type hints everywhere, otherwise the type checkers would throw a sea of red (with nothing to check): the signature &lt;code&gt;def get_param_processing(self, param):&lt;/code&gt; becomes &lt;code&gt;def get_param_processing(self, param: bool) -&amp;gt; bool:&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then I run mypy, pyrefly, ty, pyright on the same code to see who flags what.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Abstract method not implemented&lt;/th&gt;
&lt;th&gt;Return None where type hint says bool&lt;/th&gt;
&lt;th&gt;Other&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mypy&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;historical, slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pyrefly&lt;/td&gt;
&lt;td&gt;in a different form&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;lightning fast, young&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ty&lt;/td&gt;
&lt;td&gt;yes (interface only)&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;lightning fast, young&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pyright&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;also flags a third error: the method is used in MyClass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pyright finds more and has a mature ecosystem: Microsoft maintains it actively, and Pylance (the Python extension for VS Code) is built on top of pyright. Pyright wins. Pyrefly and ty are under active development: I'll come back to them later.&lt;/p&gt;

&lt;h3&gt;
  
  
  The workflow breaking at the first &lt;code&gt;make patch&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Setup done. Ruff passes clean. Pyright passes clean. Pre-commit stops me if I forget something. I run &lt;code&gt;make patch&lt;/code&gt; for the first "real" release .. and:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;make[1]: bump-my-version: No such file or directory
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Makefile was calling &lt;code&gt;bump-my-version&lt;/code&gt; directly, and the project's dev-deps were in &lt;code&gt;tests/requirements-test.txt&lt;/code&gt;, not in &lt;code&gt;pyproject.toml&lt;/code&gt;. So whoever cloned the repo had to know to do a &lt;code&gt;pip install -r tests/requirements-test.txt&lt;/code&gt; on top of &lt;code&gt;uv sync&lt;/code&gt;, and the release workflow assumed the venv was activated. Too much implicit knowledge, too much hassle.&lt;/p&gt;

&lt;p&gt;I'm so used to using &lt;code&gt;uv run&lt;/code&gt; that I don't run &lt;code&gt;source .venv/bin/activate&lt;/code&gt; anymore, so I tripped over something that would never have happened "the old-fashioned way".&lt;/p&gt;

&lt;p&gt;What did it take to truly hand the environment over to uv ? Well, all I needed was to add every dependency in &lt;code&gt;pyproject.toml&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add &lt;span class="nt"&gt;--dev&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; tests/requirements-test.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single command. uv reads the requirements file, writes everything in &lt;code&gt;[dependency-groups].dev&lt;/code&gt; of &lt;code&gt;pyproject.toml&lt;/code&gt; (the standard introduced by &lt;a href="https://peps.python.org/pep-0735/" rel="noopener noreferrer"&gt;PEP 735&lt;/a&gt; for dev-deps), updates &lt;code&gt;uv.lock&lt;/code&gt;, and installs. The &lt;code&gt;tests/requirements-test.txt&lt;/code&gt; file becomes redundant: one less file to handle.&lt;/p&gt;

&lt;p&gt;And then in the Makefile I added &lt;code&gt;uv run&lt;/code&gt; in front of every Python command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;release&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    uv run bump-my-version bump &lt;span class="p"&gt;$(&lt;/span&gt;PART&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;$(&lt;/span&gt;MAKE&lt;span class="p"&gt;)&lt;/span&gt; changelog
    git tag &lt;span class="nt"&gt;-f&lt;/span&gt; v&lt;span class="p"&gt;$$(&lt;/span&gt;uv run python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from simple_sample import __version__; print(__version__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    git push &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git push &lt;span class="nt"&gt;--tags&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;make patch&lt;/code&gt; works even from a fresh shell, no activation needed. The venv is no longer tribal knowledge, it's implicit in every command.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seven sections in &lt;code&gt;pyproject.toml&lt;/code&gt;, one per tool
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pyproject.toml&lt;/code&gt; was born for packaging, and from there it picked up the config sections of the project's tools: seven in total.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ruff&lt;/strong&gt; starts from &lt;code&gt;select = ["ALL"]&lt;/code&gt;: I enable every available rule and use &lt;code&gt;ignore&lt;/code&gt; for the ones I find too much. The philosophy is "everything by default, exclude by name": as ruff adds new rules, I get them automatically. And the "ALL" bundle isn't just style + lint: it includes naming (PEP 8), docstrings (PEP 257), type annotations (PEP 484, with &lt;code&gt;flake8-annotations&lt;/code&gt;), cyclomatic complexity (&lt;code&gt;mccabe&lt;/code&gt;), basic security (&lt;code&gt;flake8-bandit&lt;/code&gt;), import order (&lt;code&gt;isort&lt;/code&gt;). Ruff isn't "just" a formatter + linter, it's the umbrella under which black + isort + flake8 + parts of pylint, pydocstyle and bandit live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pyright&lt;/strong&gt; in &lt;code&gt;typeCheckingMode = "strict"&lt;/code&gt;: the default &lt;code&gt;standard&lt;/code&gt; mode lets a lot slide, &lt;code&gt;strict&lt;/code&gt; requires complete type hints and explicit returns. It's the mode that surfaces the errors the type checker tour had revealed (and that mypy / pyrefly / ty in default config would have missed).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pytest&lt;/strong&gt;: minimal config, &lt;code&gt;asyncio_mode = "auto"&lt;/code&gt; and &lt;code&gt;testpaths = ["tests"]&lt;/code&gt;. The rest lives in the tests themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;[dependency-groups].dev&lt;/strong&gt;: the list of dev-deps with version constraints (PEP 735). uv reads this section for &lt;code&gt;uv sync --group dev&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;packaging&lt;/strong&gt; (&lt;code&gt;[build-system]&lt;/code&gt;, &lt;code&gt;[project]&lt;/code&gt;, &lt;code&gt;[tool.setuptools]&lt;/code&gt;), &lt;strong&gt;bumpversion&lt;/strong&gt;, &lt;strong&gt;git-cliff&lt;/strong&gt;: handle the release pipeline (metadata + runtime dependencies + wheel and sdist build + versioning + CHANGELOG from conventional commits). A different topic from code quality, but necessary for the modernization and automation goal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pre-commit&lt;/strong&gt; lives in &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; (outside &lt;code&gt;pyproject.toml&lt;/code&gt;): it points to the official &lt;code&gt;astral-sh/ruff-pre-commit&lt;/code&gt; repo for the two ruff hooks (check + format) and keeps a local hook running &lt;code&gt;uv run pytest&lt;/code&gt; for the tests. So pre-commit also leans on uv to access the project's venv, just like the Makefile targets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plus
&lt;/h2&gt;

&lt;p&gt;The lazy developer adds tools when they're really needed, when it's time to handle some other aspect automatically.&lt;/p&gt;

&lt;p&gt;Still on the code quality front, what could be added and when ?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pypi.org/project/vulture/" rel="noopener noreferrer"&gt;vulture&lt;/a&gt; and &lt;a href="https://pypi.org/project/radon/" rel="noopener noreferrer"&gt;radon&lt;/a&gt;&lt;/strong&gt;: project-level dead code and complexity reports. When a map of the codebase is needed, for instance before a major refactor: ruff sees the single file, vulture and radon see the whole.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pypi.org/project/bandit/" rel="noopener noreferrer"&gt;bandit&lt;/a&gt; (SAST), &lt;a href="https://pypi.org/project/pip-audit/" rel="noopener noreferrer"&gt;pip-audit&lt;/a&gt; (SCA) and &lt;a href="https://pypi.org/project/detect-secrets/" rel="noopener noreferrer"&gt;detect-secrets&lt;/a&gt;&lt;/strong&gt;: if the package becomes an API or handles sensitive data, but here a whole new world opens up ..&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mypy in strict mode&lt;/strong&gt;: a second pass on top of pyright. Today I don't have an example that would push me to add it, pyright strict covers well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pyrefly and ty&lt;/strong&gt;: worth re-evaluating especially for projects with many files. They're fast but young.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pre-commit.ci/" rel="noopener noreferrer"&gt;pre-commit.ci&lt;/a&gt;&lt;/strong&gt;: a hook that runs in CI on every PR too. For a personal one-maintainer project it's overhead, for a shared repo it would make sense.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>pytest</category>
      <category>ruff</category>
      <category>pyright</category>
      <category>uv</category>
    </item>
    <item>
      <title>Realtime transcription: choices and stories for PyCon IT</title>
      <dc:creator>Alessandra Bilardi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 21:23:48 +0000</pubDate>
      <link>https://dev.to/bilardi/realtime-transcription-choices-and-stories-for-pycon-it-4ehd</link>
      <guid>https://dev.to/bilardi/realtime-transcription-choices-and-stories-for-pycon-it-4ehd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4m0zy96pe583q0v8bcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4m0zy96pe583q0v8bcl.png" alt="Architecture" width="800" height="957"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why all this interest in realtime transcription
&lt;/h2&gt;

&lt;p&gt;It all started with the collaboration with PyCon IT. At PyCon IT 2025 they set up live transcription with local Whisper on a Graphics Processing Unit (GPU), based on the repo &lt;a href="https://github.com/sofdog-gh/realtime-transcription-fastrtc" rel="noopener noreferrer"&gt;&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt;&lt;/a&gt;. With the YouTube videos used as tests, all good. With the real audio of a conference room, Whisper started hallucinating: a generative model, if you give it a signal it doesn't recognize, doesn't leave a blank, it writes something anyway.&lt;/p&gt;

&lt;p&gt;For PyCon IT 2026 a different path was needed, on a non-negotiable anchor: no hallucinations. If the model doesn't hear, ok, skip a word. If it hears badly, ok, transcribe badly. But it must not write sentences I didn't say.&lt;/p&gt;

&lt;p&gt;Fixing Whisper's hallucinations directly (Voice Activity Detection, tuning decoding parameters, logprob filters, fine-tuning, ..) would have been a separate effort: I didn't have the time, with everything else to build. I haven't tested a bigger Whisper, nor other paid generative Speech To Text (STT) services: they stay in the same category, a model that produces text token after token, so the structural risk of invention stays. To get out of the category, a managed service based on acoustic decoding was needed. And since it's PyCon, let's also grab the bonus of decoupling the pieces and writing it in a testable way.&lt;/p&gt;

&lt;h2&gt;
  
  
  A model that gets it wrong but doesn't make it up
&lt;/h2&gt;

&lt;p&gt;Let's start with the engine. Then with what's around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  STT: who gets it wrong, who makes it up
&lt;/h3&gt;

&lt;p&gt;I didn't run empirical benchmarks on the three. The choice played out on two axes: &lt;strong&gt;model structure&lt;/strong&gt; (generative or not) and &lt;strong&gt;delivery&lt;/strong&gt; (self-hosted or managed). The properties in the table come from product documentation and from direct observation of Whisper at PyCon IT 2025, not from A/B tests.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Whisper local&lt;/th&gt;
&lt;th&gt;Amazon Transcribe Streaming&lt;/th&gt;
&lt;th&gt;Paid generative STT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;generative (autoregressive)&lt;/td&gt;
&lt;td&gt;non-generative (acoustic decoding)&lt;/td&gt;
&lt;td&gt;generative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucinations structurally possible&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery&lt;/td&gt;
&lt;td&gt;self-hosted&lt;/td&gt;
&lt;td&gt;managed&lt;/td&gt;
&lt;td&gt;managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;GPU + model&lt;/td&gt;
&lt;td&gt;AWS credentials&lt;/td&gt;
&lt;td&gt;credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network dependency&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;on-site hardware&lt;/td&gt;
&lt;td&gt;$0.024/min&lt;/td&gt;
&lt;td&gt;variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Declared latency&lt;/td&gt;
&lt;td&gt;1-15s end of segment&lt;/td&gt;
&lt;td&gt;~300ms partial&lt;/td&gt;
&lt;td&gt;depends&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most important criterion is architecture. A non-generative model cannot, by construction, add words it didn't hear: at worst it skips or gets it wrong. A generative model can. The other criteria (network, cost, latency) are secondary trade-offs, all acceptable for a conference context: there's internet, a 30-minute talk costs ~$0.72, partial results arrive in ~300ms.&lt;/p&gt;
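
&lt;p&gt;The cost figure is plain arithmetic on the per-minute rate from the table. A tiny sketch, with a hypothetical helper name, to redo the estimate for talks of other lengths:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;TRANSCRIBE_STREAMING_USD_PER_MINUTE = 0.024  # rate quoted in the table above


def talk_cost_usd(minutes, rate_per_minute=TRANSCRIBE_STREAMING_USD_PER_MINUTE):
    """Return the streaming cost of one talk, rounded to cents."""
    return round(minutes * rate_per_minute, 2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;talk_cost_usd(30)&lt;/code&gt; gives 0.72, the figure quoted above for a 30-minute talk.&lt;/p&gt;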

&lt;p&gt;Choice: Amazon Transcribe Streaming. Not because it's "the best" in absolute terms, but because it sits in the category that rules out at the root the problem we're here for. I wrote the repo &lt;a href="https://github.com/bilardi/video-to-text" rel="noopener noreferrer"&gt;&lt;code&gt;video-to-text&lt;/code&gt;&lt;/a&gt; specifically to test Transcribe as an alternative to Whisper.&lt;/p&gt;

&lt;h3&gt;
  
  
  New repo or fork of the old one ?
&lt;/h3&gt;

&lt;p&gt;The other big choice: a fork of &lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; (the one already used at PyCon IT 2025), or a new repo that takes only the good pieces from the two predecessors (&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; and &lt;code&gt;video-to-text&lt;/code&gt;) ?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Fork&lt;/th&gt;
&lt;th&gt;New repo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial effort&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fragile dependencies inherited&lt;/td&gt;
&lt;td&gt;FastRTC v0.0.26&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;monolithic to dismantle&lt;/td&gt;
&lt;td&gt;designed for the use case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testability&lt;/td&gt;
&lt;td&gt;inherits the existing scope&lt;/td&gt;
&lt;td&gt;every component in isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: new repo. As a lazy developer one would be tempted to fork, but when a dependency is fragile (FastRTC v0.0.26 isn't a stable standard), a fork could cost more than a targeted rewrite.&lt;/p&gt;

&lt;p&gt;From &lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; I keep the &lt;code&gt;screen&lt;/code&gt; layout (black background, large text) and the auto-scroll logic of the frontend. From &lt;code&gt;video-to-text&lt;/code&gt; I take the &lt;code&gt;transcribe_service.py&lt;/code&gt; module and the async pattern with &lt;code&gt;asyncio.Queue&lt;/code&gt; + &lt;code&gt;asyncio.gather()&lt;/code&gt;. The rest gets dropped.&lt;/p&gt;
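
&lt;p&gt;For readers who haven't met the &lt;code&gt;asyncio.Queue&lt;/code&gt; + &lt;code&gt;asyncio.gather()&lt;/code&gt; pattern, here is a minimal sketch of its shape. It is not the code of &lt;code&gt;transcribe_service.py&lt;/code&gt;: the function names are hypothetical, strings stand in for audio chunks, and &lt;code&gt;upper()&lt;/code&gt; stands in for the transcription call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


async def produce(queue, chunks):
    # Stand-in for the audio capture side: push each chunk, then a sentinel.
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(None)


async def consume(queue):
    # Stand-in for the transcription side: drain the queue until the sentinel.
    received = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            return received
        received.append(chunk.upper())


async def run_pipeline(chunks):
    # A bounded queue makes the producer wait when the consumer falls behind.
    queue = asyncio.Queue(maxsize=2)
    # gather() runs both coroutines concurrently and returns their results.
    _, received = await asyncio.gather(produce(queue, chunks), consume(queue))
    return received
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;asyncio.run(run_pipeline(["a", "b", "c"]))&lt;/code&gt; returns the three items in order, uppercased. The queue is the only thing the two sides share, which is what keeps them testable in isolation.&lt;/p&gt;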

&lt;h3&gt;
  
  
  Architecture: monolithic or decoupled ?
&lt;/h3&gt;

&lt;p&gt;As a lazy developer, I don't want to redo everything moving from Proof of Concept (PoC) to Minimum Viable Product (MVP). The two predecessors already have pieces that work (the &lt;code&gt;screen&lt;/code&gt; layout of &lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt;, the &lt;code&gt;transcribe_service&lt;/code&gt; of &lt;code&gt;video-to-text&lt;/code&gt;), but they're pieces from different repos, made for different purposes. To recycle them, the modules need clear boundaries.&lt;/p&gt;

&lt;p&gt;A decoupled architecture here means having three components as three separate processes that talk to each other over the network:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the audio client, which captures audio from the system device and sends it to the server&lt;/li&gt;
&lt;li&gt;the server, which receives audio, manages the stream toward Amazon Transcribe, and publishes the text&lt;/li&gt;
&lt;li&gt;the display client, which receives the text from the server and shows it on the dedicated monitor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The alternative architecture is a single process (a single running program) that captures, transcribes, displays.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Monolithic&lt;/th&gt;
&lt;th&gt;Decoupled&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;a single binary&lt;/td&gt;
&lt;td&gt;three components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distribution across multiple computers&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes (native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testability&lt;/td&gt;
&lt;td&gt;internal dependencies&lt;/td&gt;
&lt;td&gt;each component in isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Communication overhead&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;network calls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: decoupled. It works both in development with everything on one computer (localhost), and at the conference with three separate computers: audio client in the control room near the mixer, server on any computer connected to the network, and display client on the computer that drives the monitor. The monolith instead locks everything onto a single computer, and its code couples the components, so tests and replacements require more work. With more rooms the bill gets worse: the monolith needs a full copy of the system per room (audio, server, display for each), whereas the decoupled design shares a single server across all rooms. Each room then only adds an audio-and-display client on the same computer, or, to avoid running a long cable across the room, a second display client near the monitor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Audio client: browser or standalone ?
&lt;/h3&gt;

&lt;p&gt;The audio to transcribe has different sources depending on the context: laptop microphone in local tests, Universal Serial Bus (USB) or analog mixer in the room, browser loopback for live apps like StreamYard. What picks up this stream and sends it to the server?&lt;/p&gt;

&lt;p&gt;Two candidates: the browser app with &lt;code&gt;getUserMedia&lt;/code&gt; (&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt;'s path), or a standalone Python script launched from the audio computer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;In the browser&lt;/th&gt;
&lt;th&gt;Standalone Python script&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System devices (mixer)&lt;/td&gt;
&lt;td&gt;limited&lt;/td&gt;
&lt;td&gt;full access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser dependency&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testability&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: standalone Python with &lt;code&gt;sounddevice&lt;/code&gt;. At a conference, audio doesn't come from the speaker's laptop microphone, but from a room mixer or a dedicated microphone connected via USB. The browser's Web Audio APIs don't expose virtual sinks and USB mixers as separate devices. Instead, a Python script with &lt;code&gt;sounddevice&lt;/code&gt; sees all the devices the operating system exposes, loopback and mixer included.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protocol between audio client and server
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; used Web Real-Time Communication (WebRTC); &lt;code&gt;video-to-text&lt;/code&gt; used WebSocket (WS) instead. Which makes sense here?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;WebRTC&lt;/th&gt;
&lt;th&gt;WS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bidirectionality&lt;/td&gt;
&lt;td&gt;required&lt;/td&gt;
&lt;td&gt;not needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network setup&lt;/td&gt;
&lt;td&gt;Network Address Translation (NAT), Traversal Using Relays around NAT (TURN), Interactive Connectivity Establishment (ICE)&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;path-dependent&lt;/td&gt;
&lt;td&gt;persistent connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: WS. The audio client sends, the server receives. Bidirectionality isn't needed, so WebRTC is overkill. Persistence, on the other hand, is: a talk lasts tens of minutes, audio goes in chunks every 100ms, and on the server the same pipe keeps the Amazon Transcribe stream open for the whole session. WS covers both without the WebRTC layers.&lt;/p&gt;
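&lt;p&gt;To get a feel for what "chunks every 100ms" means on the wire, here is a minimal sketch of the arithmetic. The sample rate, bit depth and channel count are assumptions for illustration (16 kHz, 16-bit, mono PCM), not values taken from the project, and the function names are hypothetical.&lt;/p&gt;

```python
def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int,
                     channels: int = 1, sample_width_bytes: int = 2) -> int:
    """Bytes of raw PCM carried by one chunk lasting chunk_ms milliseconds."""
    frames = sample_rate_hz * chunk_ms // 1000
    return frames * channels * sample_width_bytes


def chunks_per_talk(talk_minutes: int, chunk_ms: int) -> int:
    """How many chunks travel over the same connection during one talk."""
    return talk_minutes * 60 * 1000 // chunk_ms


if __name__ == "__main__":
    # Assumed audio format: 16 kHz, 16-bit, mono.
    print(chunk_size_bytes(16000, 100))
    print(chunks_per_talk(45, 100))
```

&lt;p&gt;Under those assumptions a chunk is 3200 bytes, and a 45-minute talk is 27000 small messages on one long-lived connection, which is the kind of traffic a persistent WS handles without extra machinery.&lt;/p&gt;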

&lt;h3&gt;
  
  
  Transcript channel between server and display
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; used Server-Sent Events (SSE); &lt;code&gt;video-to-text&lt;/code&gt; used WS. Which fits here?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;SSE&lt;/th&gt;
&lt;th&gt;WS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fits the case&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tech already in use&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes (for audio)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate code&lt;/td&gt;
&lt;td&gt;a second handler&lt;/td&gt;
&lt;td&gt;same stack&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: WS. SSE would technically be enough (unidirectional server -&amp;gt; client, fine for the transcript). But WS is already in the house for the audio channel: keeping a single technology means a single stack of handlers server-side and a single client-side library, instead of two.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partial results vs final
&lt;/h3&gt;

&lt;p&gt;Amazon Transcribe sends both partials (text that changes until the segment is stable) and finals (stable). To compare the two delivery modes in the field, the display supports both via the &lt;code&gt;?partial=true|false&lt;/code&gt; flag: picked at runtime, not at build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Partial on by default&lt;/th&gt;
&lt;th&gt;Partial off by default&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Readability on the monitor&lt;/td&gt;
&lt;td&gt;low (changing text)&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perceived latency&lt;/td&gt;
&lt;td&gt;good&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: off by default. A dedicated monitor with text that writes, erases and rewrites is unpleasant to look at. Partials can be turned on via &lt;code&gt;?partial=true&lt;/code&gt; on the display if the delay of finals turns out to be a problem in a specific room.&lt;/p&gt;
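&lt;p&gt;The difference between the two modes can be sketched in a few lines. This is an illustration in Python, not the display's actual JS: the class name, the fields and the message shape (a text plus an is-partial flag) are assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptView:
    """Hypothetical model of what the monitor shows."""
    show_partials: bool = False          # mirrors the ?partial=true|false flag
    lines: list = field(default_factory=list)
    pending: str = ""                    # latest partial, not yet stable

    def on_message(self, text: str, is_partial: bool) -> None:
        if is_partial:
            # Partials only overwrite the pending line, and only if enabled.
            if self.show_partials:
                self.pending = text
            return
        # A final is stable: append it and drop the partial it replaces.
        self.lines.append(text)
        self.pending = ""

    def render(self) -> str:
        visible = list(self.lines)
        if self.pending:
            visible.append(self.pending)
        return "\n".join(visible)
```

&lt;p&gt;With partials off the screen only ever grows by stable lines; with partials on, the last line keeps changing until its final arrives.&lt;/p&gt;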

&lt;h3&gt;
  
  
  Language: zero restart between talks
&lt;/h3&gt;

&lt;p&gt;Amazon Transcribe wants the language when opening the stream (&lt;code&gt;language_code="it-IT"&lt;/code&gt; or &lt;code&gt;"en-US"&lt;/code&gt;). At PyCon, rooms have consecutive talks in different languages: Italian, English. Two paths: language as a global server configuration, or as a parameter per connection of the audio client.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Global in the server&lt;/th&gt;
&lt;th&gt;Per-room parameter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language change between talks&lt;/td&gt;
&lt;td&gt;server restart&lt;/td&gt;
&lt;td&gt;zero restart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability to multiple rooms in parallel&lt;/td&gt;
&lt;td&gt;all same language&lt;/td&gt;
&lt;td&gt;each room its own&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: per-room parameter. With the global version, a restart would be needed at every language change (or a proxy that discriminates per path, complicating things). With the per-room parameter, the server stays up for the whole day, and the audio client reopens at the next talk with the right language (&lt;code&gt;?lang=it-IT&lt;/code&gt; or &lt;code&gt;?lang=en-US&lt;/code&gt;). And it also works with multiple rooms in parallel: each room has its own language, independent of the others.&lt;/p&gt;

&lt;p&gt;Concretely: every WS connection is an independent handler on FastAPI, and each opens its own Amazon Transcribe stream with its own language. There's no shared state between different streams, so the language of one room cannot affect another.&lt;/p&gt;
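&lt;p&gt;A minimal sketch of the per-connection language idea, using only the standard library. The host, port, room name, default language and helper name are assumptions; the real server reads the parameter inside its FastAPI handler.&lt;/p&gt;

```python
from urllib.parse import parse_qs, urlsplit

# The two languages mentioned for the conference talks.
SUPPORTED_LANGUAGES = {"it-IT", "en-US"}


def language_from_url(url: str, default: str = "en-US") -> str:
    """Read ?lang=... from a connection URL; the default is an assumption."""
    query = parse_qs(urlsplit(url).query)
    lang = query.get("lang", [default])[0]
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return lang


if __name__ == "__main__":
    # Hypothetical URLs: two rooms, two languages, same server.
    print(language_from_url("ws://localhost:8000/ws/audio/room-a?lang=it-IT"))
    print(language_from_url("ws://localhost:8000/ws/audio/room-b?lang=en-US"))
```

&lt;p&gt;Because the value is resolved per connection, nothing about one room's language is stored anywhere another room could see it.&lt;/p&gt;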

&lt;h3&gt;
  
  
  Display: dynamic app or static HTML ?
&lt;/h3&gt;

&lt;p&gt;In this case, the display is what the audience looks at: a dedicated monitor with text scrolling as it arrives. It must update in real time as messages arrive from the server, but it does nothing else: no forms, no interaction.&lt;/p&gt;

&lt;p&gt;Two paths: a dynamic app (React, Vue or similar, with build and state management), or a static HTML page with a bit of JS that opens a WS and appends text.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Dynamic app&lt;/th&gt;
&lt;th&gt;Static HTML + JS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Client-side state&lt;/td&gt;
&lt;td&gt;possible&lt;/td&gt;
&lt;td&gt;only via WS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;requires build&lt;/td&gt;
&lt;td&gt;file served by the server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reuse from &lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes (CSS + JS)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Choice: static HTML. No client-side state needed: the browser opens the page, receives text via WS, shows it. No build. And the CSS of &lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt;'s &lt;code&gt;screen&lt;/code&gt; mode gets reused as is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choices at a glance
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;realtime-transcription&lt;/code&gt; choices don't come out of nowhere: some are new decisions for the live use case, others are pieces lifted from the two predecessors. Here they are in a row, with the source of inspiration. For the sequence diagram with WS endpoints and message flow, see the &lt;a href="https://github.com/bilardi/realtime-transcription#architecture" rel="noopener noreferrer"&gt;README of the repo&lt;/a&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Winning option&lt;/th&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;STT&lt;/td&gt;
&lt;td&gt;Amazon Transcribe Streaming&lt;/td&gt;
&lt;td&gt;no hallucinations&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;video-to-text&lt;/code&gt; (transcribe_service)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repo&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;td&gt;less tech debt&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;decoupled (3 components)&lt;/td&gt;
&lt;td&gt;reuse from predecessors, deploy flexibility&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio client&lt;/td&gt;
&lt;td&gt;standalone Python&lt;/td&gt;
&lt;td&gt;full access to system devices&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio protocol&lt;/td&gt;
&lt;td&gt;WS&lt;/td&gt;
&lt;td&gt;persistent connection, minimal network setup&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transcript channel&lt;/td&gt;
&lt;td&gt;WS&lt;/td&gt;
&lt;td&gt;single stack server + client&lt;/td&gt;
&lt;td&gt;&lt;code&gt;video-to-text&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial vs final&lt;/td&gt;
&lt;td&gt;flag &lt;code&gt;?partial=true|false&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;readability on the monitor&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;per room&lt;/td&gt;
&lt;td&gt;zero restart between talks, scales to more rooms&lt;/td&gt;
&lt;td&gt;new&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Display&lt;/td&gt;
&lt;td&gt;static HTML&lt;/td&gt;
&lt;td&gt;no build, reuse of existing work&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;realtime-transcription-fastrtc&lt;/code&gt; (CSS + JS &lt;code&gt;screen&lt;/code&gt; mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The stories you only find when you plug things in
&lt;/h2&gt;

&lt;p&gt;The real fun starts when you stop drawing and turn on the machines.&lt;/p&gt;

&lt;h3&gt;
  
  
  The device number on Fedora
&lt;/h3&gt;

&lt;p&gt;The first time I ran &lt;code&gt;uv run python -m audio_client --list-devices&lt;/code&gt; I found myself facing a long list with the same hardware (my headphones in the docking station jack) showing up multiple times, with similar names and different IDs. On Linux several audio layers coexist (ALSA at the kernel, JACK for pro audio, PipeWire as a modern sound server) and &lt;code&gt;sounddevice&lt;/code&gt; lists them all: each exposes the same device, each is a candidate on paper.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Device ID&lt;/th&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ALSA&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;doesn't work as one might expect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JACK&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;doesn't work as one might expect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PipeWire (system default)&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;works (it's the active routing of the system)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There's no rule that lets you pick a priori: it depends on what the system uses as default routing. On Fedora 41 it's PipeWire, so the "right" ID was 20. I tried all three before working that out.&lt;/p&gt;

&lt;p&gt;Rule of thumb: if the audio doesn't get where it should, try all the candidates before touching the code.&lt;/p&gt;
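&lt;p&gt;To make "try all the candidates" less manual, the list can be narrowed to devices that can actually capture. A hedged sketch: it works on plain dictionaries shaped like the entries I'd expect from &lt;code&gt;sounddevice.query_devices()&lt;/code&gt; (keys &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;max_input_channels&lt;/code&gt;); the sample entries and the helper name are made up for illustration.&lt;/p&gt;

```python
def input_candidates(devices: list) -> list:
    """Return (device_id, name) for every device able to capture audio."""
    return [
        (device_id, device["name"])
        for device_id, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


if __name__ == "__main__":
    # Made-up sample: the position in the list plays the role of the device ID.
    sample = [
        {"name": "HDA Intel PCH: ALC (hw:0,0)", "max_input_channels": 2},
        {"name": "HDMI 0 (hw:0,3)", "max_input_channels": 0},
        {"name": "pipewire", "max_input_channels": 64},
    ]
    for device_id, name in input_candidates(sample):
        print(device_id, name)
```

&lt;p&gt;With the real library, the same function could presumably be fed &lt;code&gt;list(sounddevice.query_devices())&lt;/code&gt;, then each returned ID tried in turn.&lt;/p&gt;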

&lt;h3&gt;
  
  
  The browser loopback
&lt;/h3&gt;

&lt;p&gt;One of the audio sources to transcribe is StreamYard, which is a browser app: the speaker's audio goes out of the browser to the system's default sink. &lt;code&gt;audio_client&lt;/code&gt; with &lt;code&gt;sounddevice&lt;/code&gt; can capture from system devices (microphone, USB mixer), but can't read directly from an app's output. A bridge is needed: a virtual sink the browser writes to, and whose monitor &lt;code&gt;audio_client&lt;/code&gt; reads from.&lt;/p&gt;

&lt;p&gt;On Linux with PipeWire (or PulseAudio) the bridge is &lt;code&gt;module-null-sink&lt;/code&gt;. You load a sink called &lt;code&gt;loopback&lt;/code&gt;, you move the browser's stream onto it, you point &lt;code&gt;audio_client&lt;/code&gt; at the null-sink's monitor. It works on the first try, but there's a side effect: while the browser's stream is on the null-sink, I can't hear it on my headphones anymore. In the room it's not a problem (audio comes from the physical mixer, not from the laptop browser). In development, yes: I can't verify what I'm transcribing.&lt;/p&gt;

&lt;p&gt;I tried three paths: two leave one side deaf, one lets both sides hear.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;audio_client hears&lt;/th&gt;
&lt;th&gt;Headphones hear&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;module-null-sink&lt;/code&gt; + move browser&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;base setup, muted on the laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;module-combine-sink&lt;/code&gt; with slaves&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;module-null-sink&lt;/code&gt; + &lt;code&gt;module-loopback&lt;/code&gt; as a parallel branch&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes (+~50ms)&lt;/td&gt;
&lt;td&gt;adopted solution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The path that works is &lt;code&gt;module-loopback&lt;/code&gt; as a parallel branch. The null-sink &lt;code&gt;loopback&lt;/code&gt; stays source for &lt;code&gt;audio_client&lt;/code&gt;; on top you load a &lt;code&gt;module-loopback&lt;/code&gt; that reads from the null-sink's monitor and writes to the default sink. Two independent consumers on the same monitor, neither blocks the other.&lt;/p&gt;

&lt;p&gt;The ~50ms is &lt;code&gt;module-loopback&lt;/code&gt;'s buffer. For the transcription nothing changes: the &lt;code&gt;audio_client&lt;/code&gt; branch stays instant. The 50ms is only what I hear in headphones compared to what leaves the browser.&lt;/p&gt;

&lt;p&gt;Everything is wrapped in two &lt;code&gt;make&lt;/code&gt; commands: &lt;code&gt;make loopback_redirect APP=firefox&lt;/code&gt; (which also accepts &lt;code&gt;MONITOR=1&lt;/code&gt; for the listening branch to headphones) and &lt;code&gt;make loopback_clean&lt;/code&gt; that cleans up.&lt;/p&gt;

&lt;p&gt;Practical choice: default &lt;code&gt;MONITOR=0&lt;/code&gt;. At the conference audio comes from the mixer, not the laptop, so hearing it locally isn't needed. &lt;code&gt;MONITOR=1&lt;/code&gt; is a development luxury.&lt;/p&gt;
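&lt;p&gt;For reference, the null-sink plus parallel loopback wiring can be sketched as a list of &lt;code&gt;pactl&lt;/code&gt; invocations. This is an assumption about what such a setup looks like, not the content of the repo's Makefile: the helper name is hypothetical, the sink-input ID is a placeholder to look up with &lt;code&gt;pactl list short sink-inputs&lt;/code&gt;, and the commands are only built here, not executed.&lt;/p&gt;

```python
import shlex


def loopback_commands(sink_input_id: int, monitor: bool = False,
                      sink_name: str = "loopback") -> list:
    """Build the pactl calls for the virtual sink (nothing is executed)."""
    commands = [
        # Virtual sink the browser writes to.
        ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"],
        # Move the browser's stream onto the virtual sink.
        ["pactl", "move-sink-input", str(sink_input_id), sink_name],
    ]
    if monitor:
        # Parallel branch: copy the sink's monitor back to the default sink.
        commands.append(
            ["pactl", "load-module", "module-loopback",
             f"source={sink_name}.monitor"]
        )
    return commands


if __name__ == "__main__":
    # 42 is a placeholder sink-input ID.
    for command in loopback_commands(42, monitor=True):
        print(shlex.join(command))
```

&lt;p&gt;The &lt;code&gt;monitor&lt;/code&gt; argument plays the same role as &lt;code&gt;MONITOR=1&lt;/code&gt;: without it, only the two commands that feed &lt;code&gt;audio_client&lt;/code&gt; are produced.&lt;/p&gt;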

&lt;h2&gt;
  
  
  How much hardware do you need ?
&lt;/h2&gt;

&lt;p&gt;I haven't benchmarked the system on specific hardware yet, so I'm basing this on typical sizes of similar Python applications. Better to oversize than to pick the bare minimum: on a real deploy you want margin, not to crash on the first spike.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;RAM/CPU&lt;/th&gt;
&lt;th&gt;Recommended example&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Audio client&lt;/td&gt;
&lt;td&gt;~50-100MB&lt;/td&gt;
&lt;td&gt;Pi 4 2GB with USB mic&lt;/td&gt;
&lt;td&gt;Pi 3 technically enough but tight&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;~100-200MB base + ~30-50MB per room&lt;/td&gt;
&lt;td&gt;EC2 t4g.small (2GB, ARM) or Pi 4 4-8GB&lt;/td&gt;
&lt;td&gt;Pi 4 handles 1-2 rooms; EC2 for more&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Display client&lt;/td&gt;
&lt;td&gt;~200-300MB for Chromium&lt;/td&gt;
&lt;td&gt;Pi 4 4GB&lt;/td&gt;
&lt;td&gt;Pi 4 2GB technically enough but tight&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three deploy scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended device&lt;/th&gt;
&lt;th&gt;When and why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All separate&lt;/td&gt;
&lt;td&gt;Pi 4 2GB (audio) + EC2 t4g.small (server) + Pi 4 4GB (display)&lt;/td&gt;
&lt;td&gt;Multi-room conference; server in cloud for sharing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All together&lt;/td&gt;
&lt;td&gt;A laptop with 8GB, or a Pi 4 8GB with USB mic&lt;/td&gt;
&lt;td&gt;Development, local demo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio + server together, display separate&lt;/td&gt;
&lt;td&gt;Pi 4 8GB (audio+server) + Pi 4 4GB (display)&lt;/td&gt;
&lt;td&gt;A single room, zero cloud; the audio Pi also hosts the server&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For one room, two Pis are enough. With a Pi 5 (server) you can push to 2-3 rooms; beyond that, EC2 is the way. EC2 or a more powerful laptop are natural upgrades anywhere, if you want more margin.&lt;/p&gt;
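&lt;p&gt;Taking the server row of the sizing table at face value (about 100-200MB base plus 30-50MB per room), the memory range for a given number of rooms is simple arithmetic. A sketch, with the same caveat as the table: these are estimates, not measurements, and the function name is hypothetical.&lt;/p&gt;

```python
def server_ram_mb(rooms: int) -> tuple:
    """Estimated (low, high) server memory in MB for a number of rooms."""
    low = 100 + 30 * rooms    # optimistic base + per-room cost
    high = 200 + 50 * rooms   # pessimistic base + per-room cost
    return low, high


if __name__ == "__main__":
    for rooms in (1, 3, 10):
        print(rooms, server_ram_mb(rooms))
```

&lt;p&gt;For three rooms that gives 190-350MB, and for ten rooms 400-700MB. Memory alone is rarely the limit, which is why the Pi guidance above stays conservative.&lt;/p&gt;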

&lt;h2&gt;
  
  
  Anything else to add ?
&lt;/h2&gt;

&lt;p&gt;What's there today is good enough for one room, with any computer connected to the network. But the design holds beyond that, when it's worth it.&lt;/p&gt;

&lt;h3&gt;
  
  
  More rooms, same setup
&lt;/h3&gt;

&lt;p&gt;If many rooms in parallel are needed, the infrastructure can be handled with &lt;a href="https://github.com/bilardi/aws-docker-host" rel="noopener noreferrer"&gt;aws-docker-host&lt;/a&gt;, which spins up an Elastic Compute Cloud (EC2) instance with Docker ready to use. The &lt;code&gt;realtime-transcription&lt;/code&gt; server already ships with docker compose, and the opening image describes exactly this scenario.&lt;/p&gt;

&lt;h3&gt;
  
  
  When one EC2 isn't enough: ECS Fargate
&lt;/h3&gt;

&lt;p&gt;If there are many rooms and the load varies, a single static EC2 becomes tight. Fargate (part of Elastic Container Service, ECS) spins up tasks on demand and shuts them down when they're no longer needed. But live transcription lives on long-lived WS connections, and from the AWS documentation there are some points to configure with care (I haven't tested them on the project):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sticky sessions&lt;/strong&gt;: a one-hour WS connection must stay on the same Fargate task. The Application Load Balancer (ALB) supports WS, but the session must be routed with affinity. No per-packet round-robin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idle timeout&lt;/strong&gt;: the ALB's default idle timeout is 60 seconds of inactivity (it's a load balancer attribute, not a target group one). A 20-second pause between sentences isn't inactivity (the client sends silence every 100ms), but it's worth raising the timeout to a few minutes for safety.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful shutdown&lt;/strong&gt;: during a deploy or a scale-in, the task that's closing must let open Transcribe streams finish, not cut off mid-talk. The container must handle &lt;code&gt;SIGTERM&lt;/code&gt; and close the WSs gracefully, giving the client time to reconnect to a different task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Authentication on the WebSockets
&lt;/h3&gt;

&lt;p&gt;Today the WSs are open: anyone who knows &lt;code&gt;/ws/audio/{sala}&lt;/code&gt; can inject audio, anyone who knows &lt;code&gt;/ws/transcript/{sala}&lt;/code&gt; can listen. For a deploy in a Local Area Network (LAN) or a private cloud on a Virtual Private Network (VPN) it's perfectly fine. On the public internet you need at least:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a token in the path or query (e.g. &lt;code&gt;?token=...&lt;/code&gt;), validated at connect&lt;/li&gt;
&lt;li&gt;rate limit per Internet Protocol (IP) on the audio channel&lt;/li&gt;
&lt;li&gt;permission separation: whoever can write on room X may not necessarily be allowed to read it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the minimum requirements to expose the endpoints on the public internet.&lt;/p&gt;
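&lt;p&gt;The first item, a token validated at connect, could look roughly like this. It's a sketch under assumptions: the URL, the token value and the helper name are hypothetical, and the project doesn't implement this today.&lt;/p&gt;

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def is_authorized(url: str, expected_token: str) -> bool:
    """Check the ?token=... query parameter against the expected value."""
    supplied = parse_qs(urlsplit(url).query).get("token", [""])[0]
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


if __name__ == "__main__":
    # Hypothetical secret and URL, for illustration only.
    secret = "s3cret"
    print(is_authorized("ws://example.org/ws/audio/room-a?token=s3cret", secret))
    print(is_authorized("ws://example.org/ws/audio/room-a", secret))
```

&lt;p&gt;Using a different expected token for the audio and transcript endpoints of the same room would be one simple way to get the write/read separation.&lt;/p&gt;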

</description>
      <category>aws</category>
      <category>transcribe</category>
      <category>docker</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Docker on EC2 with Terraform</title>
      <dc:creator>Alessandra Bilardi</dc:creator>
      <pubDate>Fri, 10 Apr 2026 22:25:12 +0000</pubDate>
      <link>https://dev.to/bilardi/docker-on-ec2-with-terraform-41lp</link>
      <guid>https://dev.to/bilardi/docker-on-ec2-with-terraform-41lp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff629k4bxool0cbahalah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff629k4bxool0cbahalah.png" alt="Architecture" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this project
&lt;/h2&gt;

&lt;p&gt;I was preparing a &lt;a href="https://github.com/bilardi/n8n-workshop" rel="noopener noreferrer"&gt;workshop&lt;/a&gt; and needed to expose a URL with a specific interface, sparing participants from installing Docker or anything else on their machines.&lt;/p&gt;

&lt;p&gt;I built the workshop locally with docker compose, which is one of the ways to develop and test locally: it works, it's fast, it's reproducible. And then?&lt;/p&gt;

&lt;p&gt;Then you need to move everything to the cloud. And as a lazy developer, why not use that same docker compose?&lt;/p&gt;

&lt;p&gt;The point isn't running Docker in the cloud - it's everything around it: HTTPS, custom domain, machine access, data backups, and the ability to rebuild or tear it all down with one command.&lt;/p&gt;

&lt;p&gt;With IaC you can manage HTTPS, custom domain, backups, access and cleanup smoothly: everything in one place, versioned, reproducible. Without IaC, you start from scratch every time.&lt;/p&gt;

&lt;p&gt;The usual options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manual EC2 setup&lt;/strong&gt;: SSH in, install Docker, configure nginx, certbot, and pray. Slow, fragile, and hard to reproduce.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECS/Fargate&lt;/strong&gt;: task definition, service discovery, cluster .. for what ? Using Fargate for a single container is like hiring a moving truck to carry your groceries home.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker on EC2 with Terraform&lt;/strong&gt;: one &lt;code&gt;terraform apply&lt;/code&gt; to spin up, one &lt;code&gt;bash scripts/destroy.sh&lt;/code&gt; to tear down. Backups included.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third option is what I chose because it has the simplest architecture .. and the most complex part depends on your user data !&lt;/p&gt;

&lt;p&gt;The architecture in the image above is generated directly from the Terraform code (spoiler) in the &lt;a href="https://github.com/bilardi/aws-docker-host" rel="noopener noreferrer"&gt;repo&lt;/a&gt;, where you can find the README.md and all the details to use it.&lt;/p&gt;

&lt;p&gt;But let's take it step by step. The third option can be implemented in 1024 different ways: which IaC tool ? How do you handle HTTPS ? How do you access the machine ? Where do you store backups ? How do you manage DNS ? Which AMI ? It depends. The point is asking the right questions.&lt;/p&gt;

&lt;p&gt;As a lazy developer, every choice follows one criterion: less effort, in terms of time, cost, or both. And when less effort isn't enough to decide, the cleanest path is a minimal system: you know what's there, you know what's missing, no surprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Terraform and not CDK
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Terraform&lt;/th&gt;
&lt;th&gt;CDK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;HCL: declarative, simple&lt;/td&gt;
&lt;td&gt;TypeScript/Python: powerful but verbose for simple infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Local file, zero dependencies&lt;/td&gt;
&lt;td&gt;Requires CloudFormation stack, S3 bucket for assets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bootstrap&lt;/td&gt;
&lt;td&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cdk bootstrap&lt;/code&gt; already creates resources in your AWS account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Low for simple infra&lt;/td&gt;
&lt;td&gt;Need to know both CDK and CloudFormation .. and their quirks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Destruction&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;terraform destroy&lt;/code&gt;: clean, predictable&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cdk destroy&lt;/code&gt;, which sometimes leaves orphaned resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For an ephemeral workshop run by one person, Terraform with local state is the minimum effort. CDK makes sense when the infra grows, you need complex logic, or there's a team involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  The choices and why
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Why (less effort)&lt;/th&gt;
&lt;th&gt;The discarded alternative (more effort)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ALB + ACM&lt;/td&gt;
&lt;td&gt;Free HTTPS certificate, auto-renewal, no certbot/nginx&lt;/td&gt;
&lt;td&gt;Let's Encrypt on EC2: port 80 open, cron for renewal, more moving parts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM instead of SSH&lt;/td&gt;
&lt;td&gt;No keys, no port 22, audit trail on CloudTrail&lt;/td&gt;
&lt;td&gt;SSH key pair, SG rules, bastion if private subnet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 for backups&lt;/td&gt;
&lt;td&gt;Costs next to nothing, survives the EC2, simple CLI&lt;/td&gt;
&lt;td&gt;EBS snapshot: tied to instance lifecycle, harder to restore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Route 53 hosted zone&lt;/td&gt;
&lt;td&gt;DNS validation for ACM, alias record for ALB, all managed by Terraform&lt;/td&gt;
&lt;td&gt;External DNS only: manual certificate validation or HTTP challenge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Linux 2023 minimal&lt;/td&gt;
&lt;td&gt;Clean AMI, you install only what you need&lt;/td&gt;
&lt;td&gt;AL2023 standard: doesn't have Docker anyway, but has hundreds of extra packages you don't need&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker compose up --build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works with both &lt;code&gt;build&lt;/code&gt; and &lt;code&gt;image&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Separate logic for build vs pull: pointless complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local state&lt;/td&gt;
&lt;td&gt;The workshop is ephemeral, one operator, no team&lt;/td&gt;
&lt;td&gt;Remote state (S3 + DynamoDB): cost and setup for zero benefit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conditional VPC&lt;/td&gt;
&lt;td&gt;Three modes: use an existing VPC, find the default, or create a new one&lt;/td&gt;
&lt;td&gt;Always new VPC: waste for a workshop running in the default VPC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conditional S3 bucket&lt;/td&gt;
&lt;td&gt;Pass one and it uses it. Don't, and it creates one named after the domain&lt;/td&gt;
&lt;td&gt;Always new bucket: waste for someone running many workshops and just managing backups&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I learned (the hard way)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The right AMI and how much disk
&lt;/h3&gt;

&lt;p&gt;As a lazy developer, instead of reading the documentation, I ran one command to see what's out there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 describe-images &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=name,Values=al2023-ami-*-x86_64"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--owners&lt;/span&gt; amazon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'reverse(sort_by(Images, &amp;amp;CreationDate))[:10].[Name, BlockDeviceMappings[0].Ebs.VolumeSize]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three variants: &lt;strong&gt;minimal&lt;/strong&gt; (2 GB), &lt;strong&gt;standard&lt;/strong&gt; (8 GB), &lt;strong&gt;ECS-optimized&lt;/strong&gt; (30 GB). The ECS one comes with Docker but is meant to run in an ECS cluster, not on a standalone EC2. Standard and minimal don't have Docker: you need to install it either way.&lt;/p&gt;

&lt;p&gt;At that point, what does the standard have that minimal doesn't ? SSM agent and a few hundred packages you don't need. The &lt;a href="https://docs.aws.amazon.com/linux/al2023/ug/image-comparison.html" rel="noopener noreferrer"&gt;package comparison page&lt;/a&gt; confirms it: no Docker, no buildx, nothing that changes the picture.&lt;/p&gt;

&lt;p&gt;Minimal is the cleanest choice: install Docker, SSM agent and buildx in the user data, and you know exactly what's on the machine. One thing to watch: the 2 GB disk isn't enough, set &lt;code&gt;volume_size = 20&lt;/code&gt; and move on.&lt;/p&gt;

&lt;h3&gt;
  
  
  ssm-user is not root
&lt;/h3&gt;

&lt;p&gt;When you connect with &lt;code&gt;aws ssm start-session&lt;/code&gt;, you're &lt;code&gt;ssm-user&lt;/code&gt;. You don't have access to the Docker socket. Everything needs &lt;code&gt;sudo&lt;/code&gt;. Commands sent with &lt;code&gt;aws ssm send-command&lt;/code&gt; run as &lt;code&gt;root&lt;/code&gt; though, so sudo is built in.&lt;/p&gt;

&lt;h3&gt;
  
  
  buildx: no buildx, no build
&lt;/h3&gt;

&lt;p&gt;From Docker Compose v2.17+ the &lt;code&gt;--build&lt;/code&gt; flag requires buildx &amp;gt;= 0.17.0. The minimal AMI doesn't have it. Without buildx, &lt;code&gt;docker compose up --build&lt;/code&gt; fails even if no service uses &lt;code&gt;build&lt;/code&gt;: install it in the user data and forget about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  That damn cache
&lt;/h3&gt;

&lt;p&gt;After a destroy + redeploy, the new Route 53 hosted zone gets different nameservers. You update the NS records on the DNS provider, everything looks fine. But the browser says no.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dig @8.8.8.8&lt;/code&gt; tells you it's all good. But your local resolver disagrees.&lt;/p&gt;

&lt;p&gt;What happens: your ISP's resolver has the old SERVFAIL cached, and until it expires, that domain doesn't exist as far as it's concerned.&lt;/p&gt;

&lt;p&gt;The fix: temporarily switch your local DNS to Google (&lt;code&gt;8.8.8.8&lt;/code&gt;) and wait for your ISP's cache to expire. In theory that takes 5-10 minutes, but sometimes it takes (way) longer.&lt;/p&gt;
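
&lt;p&gt;To tell a stale cache apart from a wrong delegation, you can compare what Route 53 assigned with what each resolver answers. This is only a sketch: &lt;code&gt;DOMAIN&lt;/code&gt; and &lt;code&gt;ZONE_ID&lt;/code&gt; are placeholder values, not the ones from this project.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Placeholder values: replace with your own domain and hosted zone ID
DOMAIN=example.com
ZONE_ID=Z0123456789ABCDEFGHIJ

# Nameservers assigned to the new hosted zone
aws route53 get-hosted-zone --id "$ZONE_ID" \
  --query 'DelegationSet.NameServers' --output text

# Delegation as seen by Google's public resolver
dig +short NS "$DOMAIN" @8.8.8.8

# Delegation as seen by your local resolver (possibly still cached)
dig +short NS "$DOMAIN"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the first two outputs list the same nameservers and only the last one differs or comes back empty, the delegation is fine and all that is left is waiting for the cache.&lt;/p&gt;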

&lt;h2&gt;
  
  
  Anything else to add?
&lt;/h2&gt;

&lt;p&gt;When it's not a workshop of a few hours but something that lasts weeks or months, it's worth investing extra effort to make the system hold up over time. But remember, it's always a temporary solution!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More subdomains&lt;/strong&gt;: more applications on the same ALB, with routing rules, separate target groups, and potentially more containers on the same EC2 or, if needed, dedicated EC2s per service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tactical scheduling&lt;/strong&gt;: start/stop the EC2 to save money off-hours, periodic backups with EventBridge + SSM, not just at destroy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch alarms&lt;/strong&gt;: basic monitoring (CPU, disk, health check) with SNS notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-recovery&lt;/strong&gt;: ASG with min=max=1 to replace dying instances (user data restores everything from S3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spot instances&lt;/strong&gt;: for workshops that tolerate interruptions, ~70% cost reduction&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>docker</category>
      <category>aws</category>
      <category>ec2</category>
    </item>
  </channel>
</rss>
