<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Twio_AI</title>
    <description>The latest articles on DEV Community by Twio_AI (@twio_ai).</description>
    <link>https://dev.to/twio_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3963488%2Fd9ec32cc-4d27-45c7-8546-006ee42be5bf.png</url>
      <title>DEV Community: Twio_AI</title>
      <link>https://dev.to/twio_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/twio_ai"/>
    <language>en</language>
    <item>
      <title>From pg-boss to Cloud Tasks: Fixing Queue Bursts and DB Connection Failures on Serverless</title>
      <dc:creator>Twio_AI</dc:creator>
      <pubDate>Tue, 02 Jun 2026 03:23:40 +0000</pubDate>
      <link>https://dev.to/twio_ai/from-pg-boss-to-cloud-tasks-fixing-queue-bursts-and-db-connection-failures-on-serverless-ii5</link>
      <guid>https://dev.to/twio_ai/from-pg-boss-to-cloud-tasks-fixing-queue-bursts-and-db-connection-failures-on-serverless-ii5</guid>
      <description>&lt;p&gt;At Twio we picked pg-boss for our job queue, ran into trouble when we went serverless, looked at Pub/Sub, and ended up on Google Cloud Tasks. This is what each queue got right, what it got wrong for our workload, and the rule we landed on for choosing between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workload
&lt;/h2&gt;

&lt;p&gt;Twio is an AI SaaS for loan brokers. The piece that needs a job queue is email processing: download an email, parse the body and attachments, OCR, classify with an LLM, write structured data, and index for RAG. One email with five attachments easily becomes 30+ background jobs. A batch upload becomes hundreds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why pg-boss worked — until it didn't
&lt;/h2&gt;

&lt;p&gt;Our database was Postgres on Neon, so pg-boss was the obvious starting point. No extra infrastructure, and one feature we genuinely loved: &lt;strong&gt;transactional enqueue&lt;/strong&gt;. Because jobs live in the same database as business data, you can create a job in the same transaction as the row that triggered it. No dual-write problem, no "DB succeeded but the queue API failed" inconsistency.&lt;/p&gt;

&lt;p&gt;It also gave us retries, delayed jobs, dead-letter queues, dedup keys, and full SQL visibility into stuck or failed jobs. For a Postgres-first app on always-on infra, it's an excellent tool.&lt;/p&gt;

&lt;p&gt;Then we moved heavy processing to Cloud Run, and the cracks showed up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pg-boss polls. Neon suspends. They want opposite things.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-boss runs a query roughly every 1–2 seconds to look for the next job, plus maintenance queries. Neon autosuspends compute when nothing touches the database. If the queue is polling every second, Neon's idle timer never expires — you pay for always-on compute even when the queue is empty.&lt;/p&gt;

&lt;p&gt;Worse, when Neon &lt;em&gt;did&lt;/em&gt; manage to suspend, the next poll had to wake it. That wake-up takes hundreds of ms to a few seconds, and queries that triggered it would fail with &lt;code&gt;Connection terminated&lt;/code&gt;, &lt;code&gt;ECONNRESET&lt;/code&gt;, or timeouts. Pooled connections made it worse: the pool kept sockets that the server had already closed during suspend, and the next polling cycle picked one up and broke.&lt;/p&gt;

&lt;p&gt;This isn't a pg-boss bug. It's an architectural mismatch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Pub/Sub wasn't the answer
&lt;/h2&gt;

&lt;p&gt;Pub/Sub is event-driven — no polling against Postgres, Neon can suspend freely. That fixed the obvious problem, but introduced a worse one for our shape of work.&lt;/p&gt;

&lt;p&gt;Pub/Sub is built to move messages &lt;strong&gt;fast&lt;/strong&gt;. We needed a queue that moves messages &lt;strong&gt;carefully&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two specific failure modes hit us:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retry amplification.&lt;/strong&gt; A parent import job publishes 100 child parse messages, then crashes before acking. Pub/Sub redelivers the parent. The parent re-publishes 100 children. After a few retries, you have hundreds of duplicate child jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No native job-level pacing.&lt;/strong&gt; If 300 messages land at once, subscribers consume them as fast as they can — slamming our parser, Neon, the LLM provider, and third-party APIs simultaneously. Pub/Sub has flow control on the subscriber side, but it's not the kind of per-queue dispatch throttle we needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus the ack-deadline problem on long parse jobs, where a missed lease extension causes redelivery while the original is still running.&lt;/p&gt;

&lt;p&gt;All of these are solvable with idempotency keys, outboxes, and bounded retries — but at that point you're rebuilding what a job queue should give you out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Cloud Tasks fit
&lt;/h2&gt;

&lt;p&gt;Cloud Tasks is push-based: when a task is due, Google sends an HTTP request to our handler. When there are no tasks, nothing touches our database. That alone resolved the pg-boss/Neon conflict — Neon suspends, costs drop, no more wake-up connection errors.&lt;/p&gt;

&lt;p&gt;But the real reason it fit was &lt;strong&gt;per-queue dispatch control&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# queue.yaml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email-parse&lt;/span&gt;
  &lt;span class="na"&gt;rateLimits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;maxDispatchesPerSecond&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;maxConcurrentDispatches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
  &lt;span class="na"&gt;retryConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;minBackoff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;maxBackoff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;600s&lt;/span&gt;
    &lt;span class="na"&gt;maxDoublings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enqueue 300 tasks in a second and Cloud Tasks won't deliver them all at once — it paces dispatch to the limits we set. Our parsers, Neon, and the LLM provider stay protected from bursts.&lt;/p&gt;
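
&lt;p&gt;The retry settings in that config also imply a concrete wait schedule. Here is a rough sketch of it in Python, assuming the shape the Cloud Tasks documentation describes (double &lt;code&gt;maxDoublings&lt;/code&gt; times, then grow linearly, then hold at &lt;code&gt;maxBackoff&lt;/code&gt;) and ignoring any jitter. The function name &lt;code&gt;retry_intervals&lt;/code&gt; is ours, not part of any API.&lt;/p&gt;

```python
def retry_intervals(min_backoff, max_backoff, max_doublings, max_attempts):
    """Return the wait in seconds before each retry.

    Assumption: mirrors the documented shape (double max_doublings times,
    then grow linearly, then hold at max_backoff) and ignores jitter.
    """
    intervals = []
    # max_attempts counts the first try, so there are max_attempts - 1 retries.
    for n in range(max_attempts - 1):
        if n > max_doublings:
            # Linear phase: keep adding the last doubled step.
            step = min_backoff * 2 ** max_doublings
            wait = step + (n - max_doublings) * step
        else:
            # Doubling phase.
            wait = min_backoff * 2 ** n
        intervals.append(min(wait, max_backoff))
    return intervals


schedule = retry_intervals(min_backoff=10, max_backoff=600, max_doublings=4, max_attempts=5)
print(schedule)       # [10, 20, 40, 80]
print(sum(schedule))  # 150
```

&lt;p&gt;Under these assumptions, a task that keeps failing is given up on roughly two and a half minutes after its first attempt, plus however long the attempts themselves take.&lt;/p&gt;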

&lt;p&gt;It also gives us operational levers Pub/Sub doesn't: list tasks, inspect depth, pause a queue, purge a bad batch. When a fan-out goes wrong, we can stop it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Cloud Tasks doesn't solve
&lt;/h2&gt;

&lt;p&gt;Three things, all important.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's still at-least-once.&lt;/strong&gt; A handler can finish the work and Cloud Tasks can still redeliver if the HTTP response is lost. Handlers must be idempotent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fan-out duplication is still possible.&lt;/strong&gt; If the parent creates 100 child tasks and then fails before returning 200, the retried parent creates them again. The fix here is deterministic task names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parse-{emailId}-{attachmentId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



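&lt;p&gt;Cloud Tasks rejects a create call that reuses a task name within its deduplication window (the API reports &lt;code&gt;ALREADY_EXISTS&lt;/code&gt;), so as long as the parent treats that error as success, the second attempt is a no-op. But you have to design for it; it's not automatic.&lt;/p&gt;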
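
&lt;p&gt;A minimal sketch of that design, in Python for brevity. &lt;code&gt;FakeQueue&lt;/code&gt; and &lt;code&gt;AlreadyExists&lt;/code&gt; are in-memory stand-ins we made up for illustration, not the real Cloud Tasks client; the point is only that the name is derived from business identifiers and that a duplicate-name error is treated as success.&lt;/p&gt;

```python
import re


class AlreadyExists(Exception):
    """Stand-in for the duplicate-name error a real queue client would raise."""


class FakeQueue:
    """In-memory stand-in for a queue that rejects duplicate task names."""

    def __init__(self):
        self.tasks = {}

    def create_task(self, name, payload):
        if name in self.tasks:
            raise AlreadyExists(name)
        self.tasks[name] = payload


def child_task_name(email_id, attachment_id):
    # Same inputs always give the same name; unsafe characters become hyphens.
    raw = f"parse-{email_id}-{attachment_id}"
    return re.sub(r"[^A-Za-z0-9_-]", "-", raw)


def fan_out(queue, email_id, attachment_ids):
    """Create one child task per attachment; return how many were newly created."""
    created = 0
    for attachment_id in attachment_ids:
        name = child_task_name(email_id, attachment_id)
        try:
            queue.create_task(name, {"email": email_id, "attachment": attachment_id})
            created += 1
        except AlreadyExists:
            pass  # An earlier attempt already enqueued this child.
    return created


queue = FakeQueue()
attachments = [f"att{i}" for i in range(100)]
print(fan_out(queue, "email42", attachments))  # 100 on the first attempt
print(fan_out(queue, "email42", attachments))  # 0 on the retried parent
print(len(queue.tasks))                        # 100
```

&lt;p&gt;One caveat with this sketch: replacing unsafe characters can make two different identifiers collide, so it assumes IDs that are already mostly alphanumeric.&lt;/p&gt;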

&lt;p&gt;&lt;strong&gt;And it doesn't recover transactional enqueue.&lt;/strong&gt; Cloud Tasks lives outside the database, so creating a task after a DB write is a dual-write. If you need strict atomicity, the answer is still an outbox: write the business row and an outbox row in one transaction, have a relay publish to Cloud Tasks and mark the row published. No external queue makes this go away.&lt;/p&gt;
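
&lt;p&gt;The outbox flow can be sketched in a few lines. This example uses Python with an in-memory SQLite database standing in for Postgres, and a plain callable standing in for the Cloud Tasks call; the table names, &lt;code&gt;save_email&lt;/code&gt;, and &lt;code&gt;relay&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emails (id TEXT PRIMARY KEY, subject TEXT)")
db.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, task_name TEXT UNIQUE, published INTEGER DEFAULT 0)"
)


def save_email(email_id, subject):
    # One transaction: both rows commit, or neither does.
    with db:
        db.execute("INSERT INTO emails VALUES (?, ?)", (email_id, subject))
        db.execute("INSERT INTO outbox (task_name) VALUES (?)", (f"import-{email_id}",))


def relay(publish):
    """Publish pending outbox rows, then mark them; return how many were sent."""
    pending = db.execute(
        "SELECT id, task_name FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, task_name in pending:
        publish(task_name)  # If this raises, the row stays pending for the next run.
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(pending)


sent = []
save_email("email42", "Loan application")
print(relay(sent.append))  # 1
print(relay(sent.append))  # 0, nothing left to publish
print(sent)                # ['import-email42']
```

&lt;p&gt;If the relay crashes after publishing but before marking the row, the next run publishes the same task again. That is why the outbox still relies on deterministic task names and idempotent handlers.&lt;/p&gt;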

&lt;h2&gt;
  
  
  The rule we landed on
&lt;/h2&gt;

&lt;p&gt;Queue selection isn't about finding the best queue. It's about matching the queue to the runtime model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;pg-boss&lt;/strong&gt; for small internal jobs in always-on services where Postgres transactionality matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Tasks&lt;/strong&gt; for cross-system, serverless workflows where we need to protect Neon, LLM providers, and third-party APIs from bursts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And three rules that apply regardless:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every handler is idempotent.&lt;/li&gt;
&lt;li&gt;Fan-out children have deterministic keys.&lt;/li&gt;
&lt;li&gt;If enqueue must be atomic with a business write, use an outbox.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloud Tasks fixed our infrastructure mismatch, but the real win was clarifying what the queue is responsible for. Infrastructure handles scheduling, retries, and rate limits. Correctness belongs to the application.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>cloud</category>
      <category>postgres</category>
      <category>serverless</category>
    </item>
    <item>
      <title>From pg-boss to Cloud Tasks: How Twio Solved Queue Bursts and Database Connection Failures in a Serverless Architecture</title>
      <dc:creator>Twio_AI</dc:creator>
      <pubDate>Tue, 02 Jun 2026 03:01:59 +0000</pubDate>
      <link>https://dev.to/twio_ai/from-pg-boss-to-cloud-tasks-how-twio-solved-queue-bursts-and-database-connection-failures-in-a-4alh</link>
      <guid>https://dev.to/twio_ai/from-pg-boss-to-cloud-tasks-how-twio-solved-queue-bursts-and-database-connection-failures-in-a-4alh</guid>
      <description>&lt;p&gt;At Twio, the job system looked like a solved problem early on.&lt;/p&gt;

&lt;p&gt;We started with pg-boss. It was simple, reliable, and worked directly inside PostgreSQL, which was already our core database. Later, as we moved more of our workload into a serverless architecture, that same decision started creating unexpected problems: database connections became fragile, Neon could not suspend cleanly, and an empty queue still kept the database awake.&lt;/p&gt;

&lt;p&gt;We then considered Pub/Sub. It solved the polling problem, but introduced a different kind of risk: messages moved too fast, retries amplified fan-out workloads, and downstream services could be overwhelmed.&lt;/p&gt;

&lt;p&gt;Eventually, we moved key parts of Twio’s async pipeline to Google Cloud Tasks.&lt;/p&gt;

&lt;p&gt;This article is not about proving that one queue is universally better than another. It is about how the right queue changes when your runtime, database, and workload change.&lt;/p&gt;

&lt;h2&gt;Twio’s Workload: Why We Needed a Job System&lt;/h2&gt;

&lt;p&gt;Twio is an AI SaaS platform for loan brokers. It helps brokers automate daily operational work, especially around documents, emails, and structured client data.&lt;/p&gt;

&lt;p&gt;One important workflow is email processing. Twio needs to download emails, parse the body and attachments, persist the data, prepare RAG-ready content, classify the email context, and convert unstructured information into structured user data.&lt;/p&gt;

&lt;p&gt;That workflow is naturally asynchronous.&lt;/p&gt;

&lt;p&gt;A single email can contain multiple attachments. Each attachment may require document parsing, OCR, LLM classification, database writes, and indexing. A batch of uploaded files can quickly become dozens or hundreds of background tasks.&lt;/p&gt;

&lt;p&gt;So from the beginning, Twio needed a reliable job system to coordinate these workflows.&lt;/p&gt;

&lt;h2&gt;The pg-boss Phase: A Very Reasonable First Choice&lt;/h2&gt;

&lt;p&gt;Twio’s core database was PostgreSQL, running on Neon. Because of that, pg-boss was the natural first choice.&lt;/p&gt;

&lt;p&gt;It had one major advantage: no extra infrastructure.&lt;/p&gt;

&lt;p&gt;The queue lived inside the Postgres database we already had. We did not need Redis, SQS, Pub/Sub, or Cloud Tasks. For an early SaaS product, this mattered. Every additional system adds deployment, monitoring, permissions, cost, and failure modes.&lt;/p&gt;

&lt;p&gt;pg-boss also had a much deeper advantage: transactional enqueue.&lt;/p&gt;

&lt;p&gt;Because jobs are stored in the same database as the business data, we could create jobs inside the same transaction as our application writes. Either both the business row and the job committed, or neither did.&lt;/p&gt;

&lt;p&gt;That avoided the classic dual-write problem you get with external queues: the database write succeeds, then the queue API call fails, leaving your system in an inconsistent state.&lt;/p&gt;

&lt;p&gt;pg-boss also came with many useful job semantics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delayed jobs and cron-style scheduling&lt;/li&gt;
&lt;li&gt;Retries with backoff&lt;/li&gt;
&lt;li&gt;Dead-letter queues&lt;/li&gt;
&lt;li&gt;Singleton keys and deduplication&lt;/li&gt;
&lt;li&gt;Rate limiting, throttling, and debouncing&lt;/li&gt;
&lt;li&gt;Job chaining&lt;/li&gt;
&lt;li&gt;Retention of completed jobs&lt;/li&gt;
&lt;li&gt;Full visibility through SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SQL visibility was especially useful. Jobs were just rows. We could inspect queued, failed, retried, or stuck jobs using plain SQL, and build quick dashboards or debugging queries without learning a separate operational surface.&lt;/p&gt;

&lt;p&gt;For a Postgres-first system running on always-on infrastructure, pg-boss is an excellent tool.&lt;/p&gt;

&lt;p&gt;But Twio was moving toward serverless.&lt;/p&gt;

&lt;h2&gt;The Serverless Problem: pg-boss and Neon Wanted Opposite Things&lt;/h2&gt;

&lt;p&gt;As Twio grew, we moved more document parsing and heavy processing modules to serverless services such as Cloud Run. At the same time, we continued using Neon as our serverless Postgres database.&lt;/p&gt;

&lt;p&gt;This is where pg-boss started to hurt.&lt;/p&gt;

&lt;p&gt;pg-boss is a polling-based queue. It regularly runs queries to find the next available job. The default polling interval is around two seconds, and many teams tune it closer to one second. It also runs maintenance and monitoring queries.&lt;/p&gt;

&lt;p&gt;That means even when there are no jobs, pg-boss still keeps sending queries to Postgres.&lt;/p&gt;

&lt;p&gt;Neon, on the other hand, is designed to autosuspend compute after a period of inactivity. If nothing touches the database, Neon can scale down. But if pg-boss polls every second, Neon’s idle timer keeps resetting.&lt;/p&gt;

&lt;p&gt;This created two problems.&lt;/p&gt;

&lt;p&gt;The first was cost.&lt;/p&gt;

&lt;p&gt;If the queue keeps polling, the database never really becomes idle. Neon compute stays awake, and the main cost advantage of serverless Postgres disappears.&lt;/p&gt;

&lt;p&gt;The second was connection stability.&lt;/p&gt;

&lt;p&gt;If Neon had already suspended and pg-boss fired a polling query, Neon had to wake the compute. That wake-up can take hundreds of milliseconds or a few seconds. During that window, the query that triggered the wake-up could time out or get dropped, causing errors such as &lt;code&gt;Connection terminated&lt;/code&gt;, &lt;code&gt;ECONNRESET&lt;/code&gt;, or connection timeouts.&lt;/p&gt;

&lt;p&gt;Connection pools made this worse. A pool could hold sockets that were already invalid because the server-side connection had been closed during suspend. The next polling cycle would pick up a stale connection and fail.&lt;/p&gt;

&lt;p&gt;This was not really a pg-boss bug. It was an architectural mismatch.&lt;/p&gt;

&lt;p&gt;pg-boss wants a database that is always online and ready to answer polling queries. Neon wants to scale to zero when there is no real work. Those two models fight each other.&lt;/p&gt;

&lt;p&gt;So we needed a queue that did not keep touching the database when there was no work.&lt;/p&gt;

&lt;h2&gt;Considering Pub/Sub: Event-Driven, But Too Fast&lt;/h2&gt;

&lt;p&gt;The obvious next candidate was GCP Pub/Sub.&lt;/p&gt;

&lt;p&gt;Pub/Sub is event-driven. There is no polling loop constantly hitting Postgres. When there is no work, the database can stay idle and Neon can suspend freely. That seemed like the right fix for the pg-boss problem.&lt;/p&gt;

&lt;p&gt;But Pub/Sub introduced a different issue.&lt;/p&gt;

&lt;p&gt;Pub/Sub is a high-throughput messaging system. It is great at moving messages quickly. But Twio needed a controlled job pipeline, not just a fast message bus.&lt;/p&gt;

&lt;p&gt;Our workload often involves fan-out. For example, one email import job may create 100 child parse jobs.&lt;/p&gt;

&lt;p&gt;With Pub/Sub’s at-least-once delivery model, duplicates are normal. If the parent import job publishes child messages and then fails before acking, Pub/Sub redelivers the parent message. The parent runs again and publishes another 100 child messages.&lt;/p&gt;

&lt;p&gt;After a few retries, you do not just have a retried parent job. You have hundreds of duplicate child jobs.&lt;/p&gt;

&lt;p&gt;This is retry amplification.&lt;/p&gt;
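
&lt;p&gt;The arithmetic is easy to check with a toy model. The Python sketch below only counts messages; &lt;code&gt;published_children&lt;/code&gt; and its &lt;code&gt;dedupe&lt;/code&gt; flag are hypothetical and stand for whatever deduplication the queue or the application provides.&lt;/p&gt;

```python
def published_children(children_per_run, parent_runs, dedupe):
    """Count child messages enqueued after the parent has run parent_runs times."""
    seen = set()
    total = 0
    for _ in range(parent_runs):
        for child in range(children_per_run):
            key = f"child-{child}"
            if dedupe and key in seen:
                continue  # A deterministic key lets the repeat be dropped.
            seen.add(key)
            total += 1
    return total


# One original run plus three redeliveries of the parent.
print(published_children(100, 4, dedupe=False))  # 400
print(published_children(100, 4, dedupe=True))   # 100
```

&lt;p&gt;Without deduplication the child count grows linearly with every parent redelivery; with stable keys it stays at 100 no matter how often the parent retries.&lt;/p&gt;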

&lt;p&gt;Pub/Sub also does not provide native job-level rate limiting in the way we needed. Subscribers consume as fast as they can. If 300 messages appear at once, they can quickly hit the parser, database, LLM provider, and third-party APIs at the same time.&lt;/p&gt;

&lt;p&gt;For Twio, that was dangerous. Our downstream systems were not designed to absorb unlimited bursts.&lt;/p&gt;

&lt;p&gt;There is also the ack-deadline problem. If a long-running parse job exceeds the ack deadline and the lease is not extended correctly, Pub/Sub assumes the job failed and redelivers it, possibly while the original job is still running.&lt;/p&gt;

&lt;p&gt;These problems can be managed, but they require additional design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotency keys for every job&lt;/li&gt;
&lt;li&gt;Fan-out separated from retryable work&lt;/li&gt;
&lt;li&gt;An outbox or staging table for child jobs&lt;/li&gt;
&lt;li&gt;Bounded retries and dead-letter topics&lt;/li&gt;
&lt;li&gt;Subscriber-side flow control&lt;/li&gt;
&lt;li&gt;Exponential backoff retry policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson was clear: Pub/Sub solved the polling problem, but it did not give us the dispatch control we needed.&lt;/p&gt;

&lt;p&gt;It was too good at delivering messages quickly.&lt;/p&gt;

&lt;p&gt;We needed a queue with a built-in throttle.&lt;/p&gt;

&lt;h2&gt;Why Cloud Tasks Fit Better&lt;/h2&gt;

&lt;p&gt;Cloud Tasks was a better match for this part of Twio’s architecture.&lt;/p&gt;

&lt;p&gt;It is push-based. Google manages the queue, and when a task is due, Cloud Tasks sends an HTTP request to our handler. If there are no tasks, it does not touch our database.&lt;/p&gt;

&lt;p&gt;That solved the pg-boss and Neon conflict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No polling loop against Postgres&lt;/li&gt;
&lt;li&gt;Neon can suspend when there is no work&lt;/li&gt;
&lt;li&gt;No constant database wake-ups&lt;/li&gt;
&lt;li&gt;Fewer connection errors around suspend and resume&lt;/li&gt;
&lt;li&gt;No always-on database cost caused by an empty queue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the bigger reason Cloud Tasks worked for us was dispatch control.&lt;/p&gt;

&lt;p&gt;Each queue can be configured with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;maxDispatchesPerSecond&lt;/li&gt;
&lt;li&gt;maxConcurrentDispatches&lt;/li&gt;
&lt;li&gt;maxAttempts&lt;/li&gt;
&lt;li&gt;minBackoff&lt;/li&gt;
&lt;li&gt;maxBackoff&lt;/li&gt;
&lt;li&gt;maxDoublings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that even if we enqueue 300 tasks in one second, Cloud Tasks does not have to deliver all of them immediately. It can pace dispatch according to the limits we define.&lt;/p&gt;

&lt;p&gt;That protects our parsers, Neon, LLM providers, and downstream APIs.&lt;/p&gt;
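
&lt;p&gt;To get a feel for what those limits mean, here is a back-of-the-envelope estimate in Python. It is a steady-state approximation that ignores burst allowances and retries, and the values 10 dispatches per second and 20 concurrent dispatches are illustrative, not a recommendation.&lt;/p&gt;

```python
def drain_seconds(task_count, max_per_second, max_concurrent, task_seconds):
    """Rough steady-state time to dispatch task_count tasks under both limits."""
    # Concurrency caps throughput at max_concurrent / task_seconds tasks per second.
    throughput = min(max_per_second, max_concurrent / task_seconds)
    return task_count / throughput


print(drain_seconds(300, 10, 20, 1.5))  # 30.0, the rate limit is the bottleneck
print(drain_seconds(300, 10, 20, 4.0))  # 60.0, the concurrency limit is the bottleneck
```

&lt;p&gt;Whichever limit is tighter sets the pace, so slow handlers are throttled by concurrency even when the per-second rate looks generous.&lt;/p&gt;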

&lt;p&gt;Cloud Tasks also gives better operational control than Pub/Sub for this workload. We can list tasks, inspect queue depth, pause a queue, and purge a bad batch. When a bad fan-out happens, we have a way to stop and recover.&lt;/p&gt;

&lt;p&gt;That operational control matters in a production SaaS system.&lt;/p&gt;

&lt;h2&gt;What Cloud Tasks Does Not Solve&lt;/h2&gt;

&lt;p&gt;Cloud Tasks fixed our infrastructure mismatch, but it did not remove the need for correctness design.&lt;/p&gt;

&lt;p&gt;It is still an at-least-once system. A handler can finish the work, but if the HTTP response is lost or times out, Cloud Tasks may dispatch the task again.&lt;/p&gt;

&lt;p&gt;So handlers still need to be idempotent.&lt;/p&gt;
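
&lt;p&gt;One common way to get there is to record the task name in the same transaction as the work, so a redelivery finds the record and does nothing. A minimal sketch in Python, with in-memory SQLite standing in for Postgres; the tables and &lt;code&gt;handle_parse&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (task_name TEXT PRIMARY KEY)")
db.execute("CREATE TABLE parsed_attachments (task_name TEXT, pages INTEGER)")


def handle_parse(task_name, pages):
    """Return True if the work ran, False if this delivery was a duplicate."""
    try:
        with db:
            # The primary key makes a second delivery of the same task fail here,
            # which rolls back the whole transaction.
            db.execute("INSERT INTO processed VALUES (?)", (task_name,))
            db.execute("INSERT INTO parsed_attachments VALUES (?, ?)", (task_name, pages))
    except sqlite3.IntegrityError:
        return False
    return True


print(handle_parse("parse-email42-att1", 3))  # True
print(handle_parse("parse-email42-att1", 3))  # False, the redelivery is a no-op
```

&lt;p&gt;In a real handler both outcomes would return a success status, since a duplicate delivery is not a failure and should not trigger another retry.&lt;/p&gt;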

&lt;p&gt;Fan-out amplification can also still happen if you design it badly. Suppose an import task creates 100 parse tasks and then fails before returning 200. Cloud Tasks retries the import task. If the import task creates the same 100 child tasks again, you still get duplicates.&lt;/p&gt;

&lt;p&gt;Cloud Tasks gives us a cleaner way to solve this: deterministic task names.&lt;/p&gt;

&lt;p&gt;For example, child tasks can be named using business identifiers such as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parse-{emailId}-{attachmentId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the parent task retries and tries to create the same child task again, the duplicate task name can be rejected or deduplicated within Cloud Tasks’ retention window.&lt;/p&gt;

&lt;p&gt;But this is not automatic. You have to design for it.&lt;/p&gt;

&lt;p&gt;Cloud Tasks also does not recover pg-boss’s strongest feature: transactional enqueue.&lt;/p&gt;

&lt;p&gt;Because Cloud Tasks is outside the database, creating a task after writing business data is still a dual-write operation. The database write can succeed, and the Cloud Tasks API call can fail.&lt;/p&gt;

&lt;p&gt;If strict atomicity is required, the right pattern is still a transactional outbox:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the business data and outbox row in the same database transaction.&lt;/li&gt;
&lt;li&gt;A separate relay reads the outbox.&lt;/li&gt;
&lt;li&gt;The relay publishes tasks to Cloud Tasks.&lt;/li&gt;
&lt;li&gt;The relay marks outbox rows as published.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No external queue can magically solve this. It has to be handled at the architecture level.&lt;/p&gt;

&lt;h2&gt;The Final Selection Methodology&lt;/h2&gt;

&lt;p&gt;The biggest lesson for us was that queue selection is not about finding the “best” tool. It is about matching the tool to the workload and runtime model.&lt;/p&gt;

&lt;p&gt;pg-boss is still a strong option for internal jobs that run in an always-on service and need tight transactional consistency with Postgres. Its transactional enqueue and SQL visibility are real advantages.&lt;/p&gt;

&lt;p&gt;Pub/Sub is excellent for event broadcasting and high-throughput system integration. But for long-running, side-effect-heavy, fan-out job pipelines, it requires careful idempotency, flow control, and retry design.&lt;/p&gt;

&lt;p&gt;Cloud Tasks is the best fit for Twio’s serverless-heavy business workflows where we need controlled concurrency, bounded retries, and protection for downstream systems.&lt;/p&gt;

&lt;p&gt;Our current approach is:&lt;/p&gt;

&lt;p&gt;Use pg-boss for small internal jobs that benefit from Postgres transactionality and run in stable, always-on environments.&lt;/p&gt;

&lt;p&gt;Use Cloud Tasks for cross-system, heavy, serverless workflows, especially when we need to protect third-party APIs, LLM providers, parsers, or Neon from bursts.&lt;/p&gt;

&lt;p&gt;And regardless of the queue, we keep three rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every handler must be idempotent.&lt;/li&gt;
&lt;li&gt;Fan-out child jobs must have deterministic keys or another deduplication mechanism.&lt;/li&gt;
&lt;li&gt;If enqueueing must be atomic with a business write, use the outbox pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloud Tasks solved the operational problems that pg-boss and Pub/Sub created in our serverless setup. But the deeper improvement was not just changing the queue. It was clarifying what the queue should and should not be responsible for.&lt;/p&gt;

&lt;p&gt;Infrastructure can help with scheduling, retries, and rate limits.&lt;/p&gt;

&lt;p&gt;Correctness still belongs to the application design.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>infrastructure</category>
      <category>postgres</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
