DEV Community: Moon Robert

GitHub Copilot vs Cursor vs Windsurf: Which AI Coding Assistant Actually Makes You Faster in 2026

Moon Robert — Mon, 09 Mar 2026 20:45:52 +0000

Two weeks. Three tools. One Next.js codebase I actually ship to users.

I'll be upfront: I went in expecting Cursor to win. I've been using it for about eight months and it became my default IDE somewhere around last summer. But my company started rolling out GitHub Copilot Enterprise licenses, and Windsurf kept appearing in threads where people claimed it was "doing things Cursor can't." So I did the thing — I actually rotated tools on real work, not toy projects.

My setup: MacBook Pro M3 Max, TypeScript/React frontend with a Node.js API layer, roughly 85k lines of real production code. Three-person team. I switched tools every few days across actual tasks: building a new billing dashboard, refactoring our OAuth flow, and adding test coverage to a module that had basically none. High-stakes enough that the dumb suggestions were obvious and painful, real enough that good suggestions genuinely saved me.

Here's what I found.

Inline Autocomplete Is Table Stakes, But There Are Real Differences

All three tools are good at autocomplete now. I want to be honest about that — if you're still deciding on the basis of "which one finishes my for loops faster," that's probably not the right question anymore.

That said, Copilot's completions feel the most conservative. It completes what you're typing. Cursor and Windsurf are more willing to speculate — they'll sometimes complete two or three lines ahead based on what they think you're trying to do, which is fantastic when they're right and mildly annoying when they're not.

I noticed Cursor's ghost text tends to drift toward patterns it's seen in the rest of your file. Windsurf does something similar but pulls context from further away — I had it correctly infer a helper function signature from a file I hadn't opened in the current session. Surprising the first time it happens.

One practical note: if you have a large TypeScript project with complex generics, Copilot gets confused more often than the other two. I don't know exactly why — probably model differences and how they handle the type context — but I hit this several times while working on our billing module, which is deep in generic utility types. Cursor and Windsurf both handled it better.

The winner here is basically a tie between Cursor and Windsurf, with Copilot slightly behind in complex type-heavy situations. Your mileage may vary if your codebase is mostly Python or Go.

Multi-File Editing Is Where You Either Win or Lose Hours

This is the actual battleground in 2026. The ability to say "refactor this auth flow to use the new session model" and have the tool understand that it needs to touch six files, in the right order, without breaking the interfaces between them — that's the capability that separates the tools now.

Cursor's Composer mode is mature. I've been using it for months and it has a very good intuition for dependency order. When I refactored our OAuth flow, I described what I wanted in a few sentences, and it correctly identified the files it needed to touch, showed me a plan, and executed it in a way that was maybe 85% right on the first pass. The remaining 15% was stuff I had to correct, but it surfaced the corrections clearly — it didn't silently do the wrong thing.

Windsurf's Cascade is... okay, let me back up a second, because I was skeptical of this one. Codeium, the company that built Windsurf, has been around for a while and I always thought of them as the "free tier" option, not a serious competitor. Cascade surprised me. The "flows" concept, where it tracks what it changed and why across a multi-step edit, gave me way more confidence in what it was doing. At one point I had it touch eight files to update our API client and it completed the whole thing without breaking a single type contract. I pushed this on a Friday afternoon thinking it would definitely need cleanup, and it just... didn't.

(I also learned, the hard way, that if you interrupt Cascade mid-flow — close the panel, switch files before it finishes — it does not recover gracefully. Did this twice and ended up with half-applied changes that were more work to untangle than the original task. Don't do that.)

GitHub Copilot's agent mode exists but it felt less confident in multi-file situations. It would often complete the primary file change correctly but then ask clarifying questions about the secondary files rather than just doing it. Which — maybe that's a design choice, and maybe it's the right one if you want more control. But in flow state, the extra confirmation prompts broke my concentration.

Honest verdict for multi-file work: Windsurf slightly edges Cursor here, which I did not expect to say. Copilot agent mode lags behind both.

Context and Chat: Who Actually Knows Your Codebase

Here is the thing: there's a meaningful difference in how these tools understand your project, not just your open files. And it compounds over a full workday in ways that are hard to measure but easy to feel.

Cursor's @codebase indexing is solid and it updates incrementally as you work. When I asked it "where does our session token get validated?" it found the right middleware in about two seconds and gave me a useful summary with line references. Cursor Chat has become my default way to navigate unfamiliar parts of our repo — I use it more for exploration than for code generation at this point.

Copilot Chat has improved a lot. The enterprise tier has workspace context and it does understand cross-file relationships better than it did six months ago. Where I found it weaker is in remembering the conversation thread — it loses context faster than Cursor does across a long session. I was debugging a gnarly race condition in our WebSocket handler, and about fifteen messages in, Copilot Chat started answering as if it had forgotten what I told it earlier. Cursor maintained the thread correctly.

Windsurf's chat experience is good but slightly less polished in the UI. The context retrieval is excellent — arguably on par with Cursor — but the conversation flow feels a bit rougher. It's clearly an area they're still building. They shipped a significant update sometime in February, so this might already be different by the time you read this.

One thing I noticed: Windsurf surfaces potential side effects more proactively. I asked it to "just quickly add a rate limiter to this endpoint" and before writing anything, it flagged that the function I was editing was called from three other places and asked if I wanted the rate limiting applied there too. Copilot and Cursor both just modified the function I pointed at. Small thing, but it saved me from a real bug.

Pricing, Lock-In, and Practical Reality for Teams

I can't write this comparison without addressing cost because it actually shapes how you use these tools.

GitHub Copilot Pro (the individual plan, formerly Copilot Individual) is $10/month. Copilot Business is $19/user/month. Copilot Enterprise, which is what my company has, is $39/user/month. At that tier you get better codebase context, access to different models (you can switch to Claude or GPT-4o depending on the task), and some organizational policy controls that matter for compliance-heavy teams.

Cursor Pro is $20/month. For that you get 500 "fast" requests per month and unlimited slow ones. In practice, I burned through fast requests faster than I expected during my two-week test, particularly when using Composer heavily. There's also a Cursor Business tier at $40/user that adds privacy mode, centralized billing, and the usual team management stuff.

Windsurf Pro is $15/month as of this writing — the lowest of the three. They also have a free tier that's usable beyond just a trial. The business tier is $35/user. If your team has skeptics who want to try before committing, point them at the Windsurf free tier first.
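To make the per-seat numbers concrete, here is a small sketch that multiplies the team tiers out over a year. The prices are the ones quoted in this post; the three-seat team size matches my team, and the function and constant names are only illustrative.

```typescript
// Per-seat monthly prices as quoted in this post (USD). These change often,
// so treat them as a snapshot, not a price list.
const seatPrices = {
  "Copilot Business": 19,
  "Copilot Enterprise": 39,
  "Cursor Business": 40,
  "Windsurf business tier": 35,
} as const;

type Plan = keyof typeof seatPrices;

// Yearly cost of one plan for a team of `seats` people.
function yearlyCost(plan: Plan, seats: number): number {
  return seatPrices[plan] * seats * 12;
}

// Hypothetical team size: three seats, like my team.
const seats = 3;
const plans = Object.keys(seatPrices) as Plan[];
for (const plan of plans) {
  console.log(`${plan}: $${yearlyCost(plan, seats)}/year for ${seats} seats`);
}
```

At three seats the spread between the cheapest and most expensive team tier is a few hundred dollars a year, which is why I would let the workflow fit decide rather than the price.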

Lock-in is a real consideration. Cursor is a fork of VS Code, which means if you have VS Code extensions, keybindings, and settings — they mostly work. Windsurf is also VS Code-based. GitHub Copilot works inside VS Code, JetBrains IDEs, Neovim, and basically everything else, which matters if your team isn't all on the same editor. I have one teammate on Neovim who can't use Cursor or Windsurf in their normal flow without a full context switch. For him, Copilot is the only real option.

My Actual Recommendation

I'm not going to hedge this, so here it is.

If you're a solo developer or working on a small team, all on VS Code or willing to use a VS Code fork: use Windsurf. It's cheaper than Cursor, the multi-file editing is excellent, and the proactive side-effect detection has already saved me at least one regression. The UX is slightly rougher in places, but it's closing the gap fast.

If you're already deep in the Cursor ecosystem with months of muscle memory and your team workflows built around it: stay on Cursor. The tool is excellent, the context understanding is mature, and switching for a marginal improvement in one area doesn't make sense. The grass is only slightly greener.

If you're on a mid-size or larger team with mixed editors, JetBrains users, or compliance requirements: GitHub Copilot Enterprise is the practical answer. It's not the most impressive single-tool experience, but the breadth of integration matters when you have fifteen engineers with different setups. The ability to toggle models is also useful — for certain tasks, I found switching to Claude 3.7 inside Copilot gave better results than the default model.

Look, I went into this thinking Copilot's momentum and GitHub's distribution would make it the dominant tool by default. What I found instead is that Windsurf earned its way into my workflow on merit — not a marginal autocomplete difference but a noticeably better experience for the multi-file refactoring work that makes up maybe 40% of my actual day.

I'm writing this in Windsurf right now. Two weeks ago I wouldn't have said that.

TypeScript 5.x in 2026: The Features That Actually Matter in Production

Moon Robert — Mon, 09 Mar 2026 20:20:12 +0000

I have been using TypeScript 5.x actively for almost two years, first on a personal project and then in production with a six-person team building a B2B SaaS platform. I have not tried every feature of every minor release, but I have adopted the ones that made sense for our stack: Next.js on the frontend, NestJS for the main API, and a couple of Bun workers running background processing tasks.

This article is not a list of everything that shipped. It is about what I actually used, what worked for me, and, honestly, what disappointed me a little.

Stable Decorators: The Never-Ending Story That Finally Ended

If you have been in the TypeScript ecosystem for a while, you know the experimental decorators behind experimentalDecorators: true hung around for far too long. TypeScript 5.0, released in March 2023, finally implemented the ECMAScript decorators standard. They are not the same decorators as before. They are better, but they involve a real migration, and nobody warns you that it will hurt a little.

We migrated our NestJS controllers to the new decorators during the first quarter of 2024. It went more smoothly than I expected, since most libraries had already added support, but I hit a couple of odd cases with decorators on class properties where the behavior differed from the legacy system. One of them had me debugging for an entire afternoon because the runtime error did not mention decorators at all. One caveat before you plan the same move: the standard does not include parameter decorators, which NestJS uses for things like @Body(), so check what your framework version supports first.

// Class decorator using the new ECMAScript standard
function singleton<T extends { new(...args: any[]): {} }>(
  target: T,
  context: ClassDecoratorContext
) {
  let instance: InstanceType<T> | undefined;

  return class extends target {
    constructor(...args: any[]) {
      if (instance) return instance;
      super(...args);
      instance = this as unknown as InstanceType<T>;
    }
  } as T;
}

@singleton
class DatabasePool {
  connection = createConnection();
}

The new context parameter is what I liked most. It gives you the decorated element's name, the kind of decorator, and addInitializer for registering setup logic that runs later. It is much more explicit than the old system, where in some edge cases you were basically guessing the execution order.
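As a rough illustration of what context gives you, here is a minimal method decorator sketch. It follows the pattern shown in the TypeScript 5.0 release notes rather than anything from our codebase, it assumes standard decorators (experimentalDecorators off), and the names bound and Greeter are made up for the example.

```typescript
// Method decorator using the standard (TS 5.0+) decorator signature.
// It reads context.private, context.kind and context.name, then uses
// context.addInitializer to bind the method to each new instance.
function bound<This, Args extends unknown[], Return>(
  method: (this: This, ...args: Args) => Return,
  context: ClassMethodDecoratorContext<This, (this: This, ...args: Args) => Return>
): void {
  if (context.private) {
    throw new Error(`@bound cannot decorate private ${context.kind} ${String(context.name)}`);
  }
  const name = context.name;
  // Initializers registered for a method run while each instance is constructed.
  context.addInitializer(function (this: This) {
    (this as any)[name] = method.bind(this);
  });
}

class Greeter {
  constructor(private readonly who: string) {}

  @bound
  greet(): string {
    return `hello, ${this.who}`;
  }
}

// Detach the method from its instance; the bound copy keeps `this` intact.
const greet = new Greeter("Ada").greet;
console.log(greet());
```

The point is less the binding itself than the fact that the name, the kind, and the initialization hook all arrive through one typed object instead of positional arguments.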

What disappointed me: there is no automatic compatibility with legacy code. If you have your own decorators written for the experimental system, you have to rewrite them from scratch. In our case that was four internal decorators, a couple of hours of work, but I know teams with dozens of custom decorators that took weeks to finish the migration. Take an inventory of your custom decorators before you commit to a date. And do not run the migration on a Friday afternoon. I learned that the hard way.

`const` in Generics and `NoInfer`: Two Small Changes, Much Less Noise

How many times have you ended up with a type wider than you wanted, the code compiled, the tests passed, and six weeks later someone passed an invalid value that the type should have rejected? That is exactly what these two solve: const type parameters in 5.0 and NoInfer in 5.4. They arrived in different versions but address the same kind of problem.

const in generics is simple, but it solves something that used to require an as const at every call site:

// Without const: T is inferred as string[]
function createRoute<T extends string[]>(paths: T): T {
  return paths;
}
const routes = createRoute(['/', '/about', '/contact']);
// inferred type: string[] (too wide)

// With const: T is inferred as the exact literal tuple.
// The constraint has to be readonly string[]: a readonly tuple does not
// satisfy a mutable string[] constraint, and TypeScript 5.0 then falls back
// to string[].
function createRouteConst<const T extends readonly string[]>(paths: T): T {
  return paths;
}
const constRoutes = createRouteConst(['/', '/about', '/contact']);
// inferred type: readonly ['/', '/about', '/contact']

I had spent years writing as const at every call site of functions like this. This removes it. In our route configuration system, where we need the literal types to generate breadcrumbs and validations, it was a huge quality-of-life change.

NoInfer is subtler. It took me a while to see when to use it; I assumed it was a rare edge case, and it turns out it is not. The main case: you have a generic T that is inferred from one argument, and you do not want another argument to take part in that inference and widen it by accident.

function setDefault<T>(
  values: T[],
  defaultValue: NoInfer<T>
): T[] {
  return values.length ? values : [defaultValue];
}

setDefault(['a', 'b'], 'c'); // OK
setDefault(['a', 'b'], 42);  // Error: number is not assignable to string

Before NoInfer, we had to fall back on tricks like T & {} or restructure the whole signature. One of the developers on the team used it in a feature-flag validation function and closed two runtime bugs that had been sitting in the backlog for months. The bugs existed because TypeScript accepted invalid values as a result of how the generic signatures were structured. Small change, real impact.
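The case where NoInfer changes the outcome is easiest to see with a constrained literal type. The sketch below follows the street-light example from the TypeScript 5.4 release notes; the function names are illustrative and it assumes TypeScript 5.4 or later.

```typescript
// Without NoInfer: both arguments are inference sites, so "blue" silently
// widens C to "red" | "yellow" | "green" | "blue" and the call type-checks.
function pickColorLoose<C extends string>(colors: C[], fallback: C): C {
  return colors.includes(fallback) ? fallback : colors[0];
}

// With NoInfer: C is inferred from `colors` only, and `fallback` is then
// checked against that result.
function pickColorStrict<C extends string>(colors: C[], fallback: NoInfer<C>): C {
  return colors.includes(fallback) ? fallback : colors[0];
}

const loose = pickColorLoose(["red", "yellow", "green"], "blue"); // accepted
const strict = pickColorStrict(["red", "yellow", "green"], "green"); // accepted

// @ts-expect-error "blue" is not assignable to "red" | "yellow" | "green"
pickColorStrict(["red", "yellow", "green"], "blue");

console.log(loose, strict);
```

The loose version is the shape of bug described above: the compiler is happy, and the invalid value only shows up at runtime.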

`using` for Resource Management: More Useful Than I Expected

TypeScript 5.2 implemented using and await using, based on TC39's Explicit Resource Management proposal. When I first saw it I thought, "interesting, but when would I actually use this day to day?" The answer: more often than I expected.

It works like this: any object that implements Symbol.dispose (or Symbol.asyncDispose, for await using) is cleaned up automatically when it goes out of scope. It is similar to using in C# or context managers in Python:

// Assumed minimal shape of the database client used below
interface Database {
  commit(): void;
  rollback(): void;
  debit(account: string, amount: number): Promise<void>;
  credit(account: string, amount: number): Promise<void>;
}
declare const db: Database;

class DatabaseTransaction {
  private committed = false;

  constructor(private db: Database) {}

  commit() {
    this.db.commit();
    this.committed = true;
  }

  [Symbol.dispose]() {
    if (!this.committed) {
      this.db.rollback();
    }
  }
}

async function transferFunds(from: string, to: string, amount: number) {
  using transaction = new DatabaseTransaction(db);

  await db.debit(from, amount);
  await db.credit(to, amount);
  transaction.commit();
  // If anything throws before commit, the rollback happens automatically on scope exit
}

What genuinely surprised me was how many places in our codebase had try/finally blocks for cleanup that could be simplified with using: connections to temporary caches, file handles in CSV processing workers, external API clients that need an explicit .close(). The code was not wrong before, but it was more verbose and, worse, it made it easier to forget cleanup on error paths that nobody tests.

In a pure frontend project you probably will not use it much. But if you have workers, data migration scripts, or server code with explicit resource management, it is worth adopting.
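For contrast, this is roughly the try/finally shape that using replaces. It is a sketch with a made-up TempCache class and processRows function, not code from our workers.

```typescript
// Hypothetical resource with an explicit close(), standing in for a cache
// connection or a file handle.
class TempCache {
  closed = false;
  private readonly entries = new Map<string, string>();

  set(key: string, value: string): void {
    if (this.closed) throw new Error("cache is closed");
    this.entries.set(key, value);
  }

  close(): void {
    this.closed = true;
    this.entries.clear();
  }
}

// The pre-5.2 pattern: cleanup lives in finally, and every function that
// opens the resource has to remember to write it.
function processRows(rows: string[], open: () => TempCache): number {
  const cache = open();
  try {
    for (const row of rows) {
      if (row === "") throw new Error("empty row");
      cache.set(row, row.toUpperCase());
    }
    return rows.length;
  } finally {
    cache.close(); // runs on success and on the error path
  }
}
```

With using, the finally block goes away: TempCache would implement [Symbol.dispose]() and the first line of the function would become using cache = open().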

`isolatedDeclarations` in Monorepos: Our Builds Before and After

For our team specifically, this is the most impactful feature of the whole 5.x series. It arrived in TypeScript 5.5 and solved a problem I did not even know had a name.

The problem: in a monorepo with multiple TypeScript packages, the compiler needs to type-check a package's files to generate its .d.ts declaration files. That makes the build inherently sequential at certain points: a package cannot emit its types until it has finished compiling, and the packages that depend on it have to wait.

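Setting isolatedDeclarations: true in tsconfig adds a restriction: every public export must carry an explicit type annotation and cannot rely on inference alone. In exchange, declaration files can be generated one file at a time, from syntax alone. That lets tools other than tsc (TypeScript's own transpileDeclaration API, or third-party declaration emitters such as oxc) produce the .d.ts files in parallel without processing dependencies first. Note that esbuild does not emit declaration files, so it does not help with this step.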
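To show what the restriction looks like in practice, here is a hedged sketch of a few exports written to satisfy it. The function and constant names are invented for the example, and the rejected version is shown only as a comment.

```typescript
// With isolatedDeclarations enabled, this export is rejected: the return type
// exists only through inference, so a tool emitting the .d.ts for this file
// alone would need the type checker to know it.
//
//   export function parsePort(raw: string) {
//     return Number.parseInt(raw, 10);
//   }

// Accepted: the signature is fully written out, so the declaration
// `export declare function parsePort(raw: string): number;`
// can be produced from this file's syntax alone.
export function parsePort(raw: string): number {
  return Number.parseInt(raw, 10);
}

// Exported constants follow the same rule when the initializer is a call.
export const defaultPorts: readonly number[] = [80, 443].map((p) => p);

// Non-exported helpers can keep relying on inference.
function isPrivileged(port: number) {
  return port < 1024;
}

export const privilegedDefaults: number[] = defaultPorts.filter(isPrivileged);
```

In day-to-day work this mostly means writing return types on exported functions, which many teams already require through lint rules.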

Our monorepo has twelve packages. With the previous setup, the full CI build, including type generation, took between 3 minutes 45 seconds and 4 minutes 10 seconds depending on the runner. After enabling isolatedDeclarations and adjusting our Turborepo pipeline to take advantage of the parallelism, we dropped to a consistent 2 minutes 20 seconds. The gain is not linear in the number of packages, since it depends heavily on the dependency topology, but in our case the impact was very tangible.

The cost is real: you have to add explicit type annotations to public exports where you previously leaned on inference. Our linter flags violations automatically with a custom rule, so the day-to-day burden is small. The initial setup took half a day of cleanup. If you have a TypeScript monorepo and are not looking at this, start here before any other build optimization.

Inferred Type Predicates: The Bug That Had Been on My Radar for Three Months

TypeScript 5.5 also brought something that looks minor but has a conceptual elegance I like: the compiler can now infer that a function is a type predicate without you declaring it explicitly, as long as the logic is clear.

Before, if you wanted automatic narrowing at the call site, you had to annotate the return type by hand:

// Before 5.5: an explicit annotation was required for narrowing to work
function isString(value: unknown): value is string {
  return typeof value === 'string';
}

// TypeScript 5.5+: the type predicate is inferred automatically.
// The check has to be exact: `v !== undefined && v.length > 0` would NOT be
// inferred as a predicate, because it also returns false for the string ''.
const isDefined = (v: string | undefined) => v !== undefined;

const items = ['hello', undefined, 'world', undefined, ''];
const definedItems = items.filter(isDefined);
// definedItems is now string[], not (string | undefined)[]
// before, you needed `as string[]` or an explicit annotation on isDefined

I had a bug, or rather an embarrassing // @ts-ignore, in our event processing pipeline, where we did a .filter(Boolean) and then had to cast manually because TS did not infer the narrowing. It was one of those patches you live with for weeks until it annoys you enough to investigate properly. After upgrading to 5.5, the cast was no longer needed once the callback was an explicit check such as x => x != null. A bare .filter(Boolean) is still not narrowed in 5.5, because the inference only applies to function bodies the compiler can analyze.

One warning: if your predicates have very complex logic, the compiler may not infer them automatically and you will need the explicit annotation. For the common cases of filtering and basic narrowing, it works surprisingly well.
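The rule behind that warning, as I understand it from the 5.5 release notes, is that a predicate is only inferred when returning false also rules the type out. A small sketch with invented names, assuming TypeScript 5.5 or later for the inferred types noted in the comments:

```typescript
const values: (string | undefined)[] = ["hello", undefined, "world", undefined, ""];

// Inferred as a type predicate (v is string): true means "string" and
// false means "not a string".
const isPresent = (v: string | undefined) => v !== undefined;

// Stays a plain boolean function: it returns false for "", which is still
// a string, so false does not mean "not a string".
const isNonEmpty = (v: string | undefined) => v !== undefined && v.length > 0;

const present = values.filter(isPresent);   // string[] on 5.5+
const nonEmpty = values.filter(isNonEmpty); // (string | undefined)[]

console.log(present.length, nonEmpty.length); // prints 3 2
```

If you need the stricter check and the narrowing, either annotate the predicate explicitly or chain the two filters.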

After two years with TypeScript 5.x in production: do not upgrade in bulk expecting a magical transformation. Upgrade incrementally, and actively adopt isolatedDeclarations if you have a monorepo, using if you handle resources with explicit cleanup, and const generics if you have APIs that benefit from more precise inference. Stable decorators are worth the migration, but plan it, and do not do it on a Friday afternoon.

TypeScript 5.x did not reinvent the language. What it did was close gaps that had been open for years, and in production that is worth more than any new feature that sounds impressive in a changelog but that you rarely touch in real work.

Serverless vs Containers in 2026: A Practical Decision Guide for Backend Teams

Moon Robert — Mon, 09 Mar 2026 20:19:42 +0000

Last year my team, five engineers at the time, had been running our entire backend infrastructure on AWS Lambda for almost two years. Python functions for event processing, synchronous APIs, data pipelines. All serverless. We were proud of that.

Then we started integrating our own ML models. And everything started to creak.

This post is not a neutral benchmark. It is how we made the decision, what surprised us along the way, and what I would do differently if I were starting from scratch in 2026.

Two Years on Lambda: What Worked and the Limit I Did Not See Coming

Before talking about containers, I have to be honest about the good parts. Lambda worked very well for a long time. We processed about 2-3 million events a day from store webhooks, mostly Shopify, and the pay-per-execution model saved us a lot compared with servers running 24/7 under variable load. In the low-season months, the compute bill dropped almost to zero. That is real and should not be underestimated.

The problem arrived when we started running embedding models for product recommendations. Our first sentence-transformers model weighed about 420MB for the checkpoint alone. Lambda has a 250MB limit on the unzipped deployment package, and although you can load models from S3 at startup, that pushed cold starts to between 8 and 12 seconds. For a synchronous API, that is unacceptable.

I tried workarounds. I loaded the model lazily, explored Lambda SnapStart (we had to rewrite part of the pipeline, and it was not worth it), and tried Lambda container images, which allow images up to 10GB. That last option helped a little, but cold starts were still between 3 and 5 seconds for the large model. None of the three options was satisfactory, and the worst part was that each workaround created its own technical debt.

The pattern that took me a while to see: Lambda is still excellent for event-driven workloads with light dependencies. As soon as you add ML, or any process that needs persistent warm state in memory, you start fighting the platform instead of working with it.

Where Containers Won the Internal Argument

Migrating the ML workloads to ECS Fargate was not a happy decision. It was a forced one.

The first thing I noticed when moving the inference pipelines to Fargate was the control. Although, let me back up a second, because Fargate is not simple. I had to write task definitions, tune CPU and memory limits, configure specific IAM roles, and understand in detail how scaling policies work. That took time. But once it was done, having a container with Python 3.12, CUDA, and all the ML dependencies running without artificial size restrictions was a relief, honestly.

Here is a simplified example of how we went from a Lambda that loads the model to a service on Fargate:

# Before: Lambda handler with a painful model load on every cold start
from sentence_transformers import SentenceTransformer

# Runs on every cold start: 8-12s with the large model
model = SentenceTransformer('all-mpnet-base-v2')  # 420MB on disk

def handler(event, context):
    texts = [record['body'] for record in event['Records']]
    embeddings = model.encode(texts)
    # store the embeddings in DynamoDB...
    return {'statusCode': 200}

# After: FastAPI service on Fargate; the model is loaded once and stays in memory
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer
import uvicorn

app = FastAPI()

# Loaded once when the container starts, not on every request
model = SentenceTransformer('all-mpnet-base-v2')

@app.post("/embeddings")
def generate_embeddings(texts: list[str]):
    # Plain def rather than async def: FastAPI runs it in a worker thread,
    # so the blocking encode() call does not stall the event loop.
    # batch_size=32 encodes the inputs in batches of 32.
    embeddings = model.encode(texts, batch_size=32)
    return {"embeddings": embeddings.tolist()}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)

The latency difference was brutal. P99 dropped from ~9s to ~180ms for the same model on the same requests. A model loaded once in memory versus reloaded on every cold start is a difference that seems obvious in hindsight, but it is hard to see when you have spent years thinking in terms of stateless functions.

What I did not like about Fargate: the base cost. With Lambda you pay for exactly what you use. With Fargate, if you keep a task running 24/7 to hold the model warm in memory, you pay for those hours even when traffic is minimal at 3am. For our workload, about 50k requests per day with marked hourly peaks, the monthly cost on Fargate was between 3x and 4x what we would have paid on Lambda without the cold start problem.

So how did we justify the spend? Measurable performance. Our customers noticed the difference and the data confirmed it: the conversion rate on recommendation pages rose 12% when latency dropped below 200ms. That closed the internal argument.

Practical takeaway: Fargate makes sense when you need warm state in memory, heavy dependencies, or processes that run longer than 15 minutes. The base cost is real, and you have to justify it with concrete business impact, not with technical arguments.

The Finding That Confused Me for Weeks: Cloud Run

A colleague recommended Google Cloud Run for a new service, an image processing API that needed OpenCV and some custom logic. Cloud Run basically lets you run Docker containers in a serverless way: it scales to zero when there is no traffic, and when requests arrive it scales up whole containers.

My first reaction was: "isn't this just Lambda with Docker?" And partly, yes. But the practical difference took me a while to understand, and I think the confusion comes from the distinction not being obvious in the documentation.

With Lambda container images, you define your image but Lambda manages the runtime with its own restrictions: a maximum timeout of 15 minutes, the function execution model, and concurrency limits that need explicit configuration. With Cloud Run the contract is different: you expose an HTTP server and Cloud Run manages how many instances run. Your code can hold state inside an instance for as long as it is alive. You can use WebSockets. The 15-minute cap does not apply: the request timeout is configurable, up to 60 minutes.

I tried Cloud Run for that image service and the cold start was roughly 600-900ms with a ~1.2GB image. Not as fast as an always-on container, but much cheaper under variable load. What surprised me most was the pricing model: Cloud Run charges for CPU and memory time while requests are being processed, not for the time an instance is up (unless you set min-instances > 0). For medium intermittent loads, that can be significantly cheaper than Fargate.

I did not test it beyond 800-1000 requests per second, so I cannot speak to that range with confidence. But for our case, Cloud Run struck a balance between cost and performance that neither pure Lambda nor Fargate could offer on its own.

Practical takeaway: if you see serverless and containers as two completely separate worlds, you are missing intermediate options that are fairly mature in 2026. Cloud Run and AWS App Runner are probably the right starting point for more projects than people assume.

The Real Numbers Compared (the Exercise the CFO Asked Me For)

I put this together after being asked to justify an AWS bill that had gone up 40% in three months. I pushed that infra migration on a Friday afternoon, a classic mistake, and the cost validation process took two weeks. In hindsight, I should have gathered more data before moving anything.

Three representative workloads, approximate monthly costs in us-east-1:

Workload A: Webhook processing (2M events/day, avg 200ms execution)
Lambda: ~$45/month. Fargate (1 task, 0.5 vCPU, 1GB RAM, 24/7): ~$22/month. Cloud Run: ~$31/month.
Fargate won. A constant, predictable load means dedicated compute is cheaper than paying per invocation.

Workload B: ML inference API (50k requests/day, hourly distribution with peaks)
Lambda with cold starts: technically not viable for our latency SLA of under 500ms. Fargate (1 warm task, 2 vCPU, 4GB RAM): ~$110/month. Cloud Run (1 vCPU, 2GB RAM, min-instances=1 to avoid cold starts): ~$62/month.
Cloud Run with one minimum instance won. You keep the model warm without paying for N idle containers.

Workload C: Nightly ETL (30 minutes per night, intensive processing)
Lambda: ~$2/month. Fargate on-demand: ~$8/month. Cloud Run: ~$3.50/month.
Lambda won easily. For short, infrequent jobs, the pay-per-execution model is unbeatable.

What these numbers show, beyond the specific values: the traffic pattern matters as much as the type of workload. There is no universal answer.
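The Workload A result is easier to trust once you see the arithmetic. The sketch below is a reconstruction, not the spreadsheet I gave the CFO: the unit prices are AWS list prices for us-east-1 as I understand them (x86, no free tier, no data transfer), and the 128 MB Lambda memory size is an assumption, so the totals land near the figures above rather than on them.

```typescript
// Assumed list prices (USD, us-east-1, x86). Check the current pricing pages.
const LAMBDA_PER_MILLION_REQUESTS = 0.2;
const LAMBDA_PER_GB_SECOND = 0.0000166667;
const FARGATE_PER_VCPU_HOUR = 0.04048;
const FARGATE_PER_GB_HOUR = 0.004445;
const HOURS_PER_MONTH = 730;

// Lambda bills per request plus per GB-second of execution time.
function lambdaMonthly(eventsPerDay: number, avgSeconds: number, memoryGb: number): number {
  const invocations = eventsPerDay * 30;
  const requestCost = (invocations / 1_000_000) * LAMBDA_PER_MILLION_REQUESTS;
  const computeCost = invocations * avgSeconds * memoryGb * LAMBDA_PER_GB_SECOND;
  return requestCost + computeCost;
}

// Fargate bills per vCPU-hour and GB-hour for as long as the task runs.
function fargateMonthly(vcpu: number, memoryGb: number): number {
  return HOURS_PER_MONTH * (vcpu * FARGATE_PER_VCPU_HOUR + memoryGb * FARGATE_PER_GB_HOUR);
}

// Workload A: 2M events/day at 200 ms; 128 MB is a hypothetical memory setting.
const lambdaA = lambdaMonthly(2_000_000, 0.2, 0.125);
// One always-on task with 0.5 vCPU and 1 GB.
const fargateA = fargateMonthly(0.5, 1);

console.log(`Lambda:  ~$${lambdaA.toFixed(2)}/month`);
console.log(`Fargate: ~$${fargateA.toFixed(2)}/month`);
```

Under these assumptions the always-on task comes out at roughly half the Lambda cost, which matches the direction of the Workload A numbers. A larger Lambda memory setting widens the gap.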

My Real Recommendation by Project Type

I am going to be direct, because "it depends on your use case" helps nobody who is making a concrete decision against a deadline.

Pure serverless (Lambda, Cloud Functions) is the right choice if your team has fewer than 5 backend engineers, your workloads are mostly event-driven with light dependencies (under 100MB), and traffic variability is high or unpredictable. The operational overhead of managing containers is not worth it if you can avoid it.

Dedicated containers (Fargate, GKE, EKS) are the right choice when you have ML in production with models over 500MB, processes that run longer than 15 minutes, or a predictable, sustained base load that justifies dedicated instances. The same goes if your team already has solid Docker and Kubernetes expertise: the learning curve is already paid off and the operational benefit makes up for it.

The middle option (Cloud Run, App Runner, Azure Container Apps) is where I would start today for most new mid-sized APIs. Containers with serverless billing. The main limitation is lock-in to the cloud vendor's platform, which may or may not matter depending on your strategy. Honestly, for teams of fewer than 10 people building on a single cloud, that lock-in is rarely the real problem.

Managed Kubernetes I would leave to teams with dedicated platform engineering. With five people, we could not afford that luxury. We learned that the expensive way, not from a post.

My verdict after this year: for a team of 4-6 people in 2026, the starting point should be Cloud Run or App Runner, with Lambda reserved for lightweight event processing and dedicated containers only for stateful ML workloads or intensive processing. The purely serverless architecture we had was elegant, but it limited us when we scaled in complexity, not in traffic. That is a distinction that does not show up in any comparison chart.
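The rules of thumb in this section can be written down as a small function. This is only a restatement of the thresholds quoted above; the type and field names are invented for the sketch, it ignores team size and existing expertise, and a real decision will have more inputs.

```typescript
type Platform = "serverless" | "serverless-containers" | "dedicated-containers";

// Hypothetical description of a workload, using the criteria from this section.
interface Workload {
  eventDriven: boolean;
  dependencySizeMb: number; // deployment package or model size
  maxRunMinutes: number;    // longest single run
  steadyBaseLoad: boolean;  // predictable, sustained traffic
  needsWarmState: boolean;  // for example, a model kept in memory
}

function recommendPlatform(w: Workload): Platform {
  // Heavy models, long runs, or sustained load: dedicated containers.
  if (w.dependencySizeMb > 500 || w.maxRunMinutes > 15 || w.steadyBaseLoad) {
    return "dedicated-containers";
  }
  // Light, event-driven work with no warm state: plain functions.
  if (w.eventDriven && w.dependencySizeMb < 100 && !w.needsWarmState) {
    return "serverless";
  }
  // Everything in between: Cloud Run, App Runner, Azure Container Apps.
  return "serverless-containers";
}

// A light webhook consumer with spiky traffic.
const webhookConsumer = recommendPlatform({
  eventDriven: true, dependencySizeMb: 50, maxRunMinutes: 1,
  steadyBaseLoad: false, needsWarmState: false,
});

// An inference API with a 420MB model that must stay warm.
const inferenceApi = recommendPlatform({
  eventDriven: false, dependencySizeMb: 420, maxRunMinutes: 1,
  steadyBaseLoad: false, needsWarmState: true,
});

console.log(webhookConsumer, inferenceApi);
```

The ordering of the checks is a design choice: the hard constraints that rule out functions come first, and the middle option is the default when nothing forces either extreme.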

Redis vs Valkey in 2026: The Fork Nobody Asked For and Why It Matters Now

Moon Robert — Mon, 09 Mar 2026 20:19:12 +0000

March 2024. I was reviewing a fairly boring PR when I saw the announcement from Redis Ltd.: starting with version 7.4, Redis would stop being BSD-3-Clause and move to a dual-license model with RSALv2 and SSPL. I closed the announcement, opened it again. Read it two more times.

I'd had Redis running in production for four years (session caching, rate limiting, pub/sub for real-time notifications) and suddenly I had to decide whether that stack still made sense.

Spoiler: I ended up migrating to Valkey. But the decision wasn't as obvious as it looked at first.

Why Redis Ltd. Changed the License (and Why It Hurts)

The official explanation was that cloud providers, mainly AWS, Google and Azure, were making money offering Redis as a managed service without contributing significantly to the project. There's some logic there. The problem is how they solved it.

The SSPL (Server Side Public License) has a clause that basically says: if you offer the software as a service, you also have to release all the code of your service infrastructure under the SSPL. Not just the Redis code you modified, but everything around it. AWS was clearly not going to do that. No cloud provider was.

The OSI (Open Source Initiative) never approved the SSPL: MongoDB submitted it for review in 2018 and later withdrew it. It isn't open source under any recognized definition. Redis Ltd. knew that. They chose it anyway.

So what happened? In less than two weeks, the Linux Foundation announced Valkey, a fork of Redis 7.2.4 under BSD-3-Clause, with backing from AWS, Google Cloud, Oracle, Ericsson and others. By March 23, 2024, Valkey had its own repository. By May, AWS had already announced that ElastiCache would migrate to Valkey. DigitalOcean and Aiven followed shortly after.

I was left wondering: is this a legitimate rescue of open source, or simply the cloud providers protecting themselves?

Honestly, I think it's both. And that doesn't make the fork any less valid.

What the License Change Means in Practice for Your Project

The legal analysis gets complicated fast. The practical side is simpler.

If you run Redis on your own server or in your own containers, the new Redis 7.4+ license doesn't affect you directly as long as you aren't offering it as a service to third parties. For most startups and product teams, that means you could technically keep using Redis with no immediate legal problem.

Things get complicated in three cases:

First, if you depend on a cloud provider that now offers Valkey instead of Redis. You no longer have a choice: you're on Valkey de facto. AWS began automatically migrating ElastiCache in 2025, and most instances are already on Valkey 7.2.x or Valkey 8.x.

Second, if your company has an "OSI-approved licenses only" policy (many mid-sized and large companies have this as a legal requirement). Redis 7.4+ doesn't clear that bar. Valkey does.

Third, and this caught me by surprise when I dug into it, some Linux distributions have already removed Redis from their official repos or marked it as non-free. Debian, for example, moved Redis to the non-free area. If your CI/CD pipeline installs Redis from the system repos, you're already affected even if you don't know it.

My three-person team found out about that last one the worst possible way: a staging pipeline that installed Redis via apt stopped working on a Tuesday at 11pm because someone ran an apt upgrade on the base image. It wasn't prod, but it wasn't fun either.

Valkey 8.x: What Surprised Me After Migrating

My expectations were low. Rushed forks tend to carry technical debt, inconsistent documentation and a community that fades once the initial burst of enthusiasm passes. I braced for the worst case.

That's not what I found.

Valkey 7.2.x (the initial fork) is essentially Redis 7.2.4 with the name changed and the license fixed. Compatibility is almost perfect: the RESP protocol, the command API, the configuration files. Swapping redis-cli for valkey-cli was literally all I needed to do in most of my utility scripts.

But Valkey 8.0, released in September 2024, is where the first real divergences started to show:

# Check the version and some of the new metrics in Valkey 8.x
valkey-cli INFO server | grep -E "valkey_version|io_threads_active"

# Output I saw on my setup:
# valkey_version:8.0.2
# io_threads_active:4

# On Redis 7.4, the equivalent field would be redis_version
# and io_threads isn't active by default in the same way
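A script that has to run against either server can branch on those same INFO fields. The sketch below parses the raw INFO text rather than talking to a live server, so it runs anywhere; `parse_info` and `server_flavor` are hypothetical helper names, and the sample text just mirrors the output shown above.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse the text returned by the INFO command into a dict."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def server_flavor(info: dict[str, str]) -> tuple[str, str]:
    """Return (name, version).

    valkey_version is checked first because a Valkey server may also
    report a redis_version field for client compatibility.
    """
    if "valkey_version" in info:
        return "valkey", info["valkey_version"]
    return "redis", info.get("redis_version", "unknown")


sample = "# Server\r\nvalkey_version:8.0.2\r\nio_threads_active:4\r\n"
print(server_flavor(parse_info(sample)))  # ('valkey', '8.0.2')
```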

The most interesting change in Valkey 8.0 was the improved I/O threading. Redis introduced multithreading for network operations in Redis 6.0, but the model remained single-threaded for command execution. Valkey 8.0 improved the threaded I/O pipeline and, in my tests with read-heavy loads (basically GET/SET with ~1KB payloads), I saw between 15% and 22% better throughput against Redis 7.2.

I'm not 100% sure that number holds across all workloads, since my specific setup favors reads, but it's more than I expected from a project with less than a year of independent life.

What did disappoint me: the documentation in 2024 was pretty thin. Many pages were direct copies of the Redis docs with the name swapped, without updated examples or any explanation of the new differences. By early 2025 it had improved quite a bit. In 2026 it's in decent shape, although there are still sections where work is clearly missing.

Two Weeks Migrating in Production: What Really Happened

The initial plan was simple: network alias, same port, swap the binary, done. And for the most part it worked exactly like that.

The architecture I migrated: three Redis instances in a primary/replica replication setup, with Sentinel for automatic failover. Roughly 8GB of data in memory, a mix of strings, hashes and sorted sets. About 12,000 operations per second at peak.

# The most invasive code change was this: almost none
import redis  # The Python client kept working unchanged

r = redis.Redis(
    host='valkey-primary.internal',  # I only changed the hostname
    port=6379,
    decode_responses=True
)

# Everything else is the same. ZADD, HGET, SETEX, Pub/Sub: no changes.
r.zadd('leaderboard', {'user:1042': 9850.5})
score = r.zscore('leaderboard', 'user:1042')

The Python client redis-py works with Valkey without modification. The same goes for the Node.js, Go and Java clients we use. Valkey is intentionally compatible with the Redis protocol; that was a requirement of the fork from day one.

Where I did have problems: Valkey Sentinel has some behavioral differences in failover edge cases. Specifically, during an intentional failover test (let me be honest: it was a kill -9 on the primary on a Friday at 4pm, because I thought we had enough time to sort out problems before the weekend), electing the new primary took longer than expected: about 35 seconds versus the ~8 seconds I used to see with Redis Sentinel.

After investigating, it turned out to be a configuration parameter I hadn't migrated correctly: sentinel down-after-milliseconds. The default value in my fresh Valkey install was different from the value I had in Redis. A five-minute config fix, problem solved. But it cost me that Friday afternoon.
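A quick way to catch this kind of drift before a failover test is to diff the Sentinel directives of the old and new config files. The sketch below is a minimal version of that idea; the helper names, the `mymaster` name and the 5000 ms value are illustrative assumptions, not the values from my setup.

```python
def sentinel_directives(conf_text: str) -> dict:
    """Map (directive, master-name) -> value for lines such as
    'sentinel down-after-milliseconds mymaster 5000'."""
    directives = {}
    for line in conf_text.splitlines():
        parts = line.split()
        # Commented-out lines start with '#', so parts[0] is never 'sentinel'.
        if len(parts) >= 4 and parts[0] == "sentinel":
            directives[(parts[1], parts[2])] = " ".join(parts[3:])
    return directives


def diff_sentinel(old: str, new: str) -> dict:
    """Return directives whose value differs or is missing on either side."""
    a, b = sentinel_directives(old), sentinel_directives(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


old_conf = """
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
"""
new_conf = """
sentinel monitor mymaster 10.0.0.1 6379 2
"""
print(diff_sentinel(old_conf, new_conf))
# {('down-after-milliseconds', 'mymaster'): ('5000', None)}
```

A `None` on the right-hand side is the interesting case: the directive is absent from the new file, so the server silently falls back to its built-in default.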

One warning you can't ignore: modules. If you use RediSearch, RedisJSON, RedisTimeSeries or any Redis Stack module, those are proprietary to Redis Ltd. and not compatible with Valkey. Community alternatives exist, but feature parity is incomplete. If you depend heavily on these modules, the migration is considerably more complicated.

In my case I wasn't using any of them. Beginner's luck, I guess.

My Verdict: When to Choose Each One (No Beating Around the Bush)

After two weeks of migration and six more months running Valkey in production, here's what I think, without pretending it's a neutral decision:

Use Valkey if:

Your infrastructure is on a cloud provider that has already migrated (AWS, GCP, DigitalOcean, Aiven): you have no choice, and it's not a problem. If you're starting a new project today, Valkey is the obvious decision: clean license, strong institutional backing, and the technical gap with Redis in common use cases is minimal or in Valkey's favor.

The Valkey ecosystem in 2026 has critical mass. Most observability tools (Datadog, Prometheus exporters, etc.) support Valkey. Frameworks that had Redis integrations have them for Valkey, some through direct compatibility, others with explicit support.

Stay with Redis if:

You depend on Redis Stack and the proprietary modules (RediSearch in particular) are a central part of your architecture. Migrating that today has a real cost. Also, if you have a support contract with Redis Ltd., it makes sense to stay in their ecosystem for as long as it lasts.

There's also a legitimate inertia argument: if Redis 7.2.x or earlier runs perfectly on your own infrastructure and you have no licensing reason to switch, there's no urgent pressure to migrate. The real risk remains, though: Redis Ltd. has already shown it can change the rules when it suits them. The Valkey fork exists precisely because that risk materialized.

Personally, I wouldn't go back to Redis for a new project. The licensing uncertainty is real, the Valkey ecosystem is mature enough, and I liked the performance numbers in Valkey 8.x more than I expected. It's not an emotional decision against Redis Ltd.; it's pragmatism.

The conclusion is simple: Valkey won the fork because the cloud providers needed it, and that turned out to be enough to take the ecosystem along with it.

TypeScript 5.x in 2026: Features That Actually Matter for Production Code

Moon Robert — Mon, 09 Mar 2026 20:18:42 +0000

Spent most of last winter doing something I should have done a year earlier: actually reading the TypeScript 5.x changelogs. Not skimming the headlines — reading them, then dropping each feature into a scratch project to see how it actually behaved. Our codebase sits at around 180k lines — a team of seven, a mix of Node.js inference services and React front-ends — and we'd been on TypeScript 5.x for over a year without meaningfully adopting anything new. We'd bumped the package version, confirmed the build didn't break, moved on.

What I found: maybe six features that genuinely changed how I write TypeScript, and a longer tail of things that are technically interesting but haven't touched my day-to-day work. This isn't a changelog recap. It's what actually earned its place.

`using` Declarations Fixed a Leak I'd Been Ignoring for Eight Months

The explicit resource management proposal — using and await using — landed in TypeScript 5.2, and I'm genuinely annoyed it took me this long to use it. The thing that finally pushed me to look: a slow memory leak in one of our LLM inference services I'd been deferring for months.

We were pooling inference sessions, and somewhere in the request-handling code, sessions weren't always being released. The try/finally blocks were there — mostly. One code path through a batch endpoint was missing the cleanup call. The session sat there, held in memory, until the process restarted. I pushed a fix on a Friday afternoon after tracing it for two hours, and I thought: this is the kind of bug that shouldn't be possible.

The old pattern:

// The before: try/finally that's correct until someone adds a code path
async function runBatchInference(prompts: string[]) {
  const session = await pool.acquire();
  try {
    return await Promise.all(prompts.map(p => session.complete(p)));
  } catch (err) {
    logger.error('batch inference failed', err);
    throw err;
    // pool.release() was added here by a colleague — but not in the finally block
  } finally {
    await pool.release(session); // sometimes ran twice. sometimes not at all.
  }
}

After implementing Symbol.asyncDispose on the session class:

class InferenceSession {
  private released = false;

  async complete(prompt: string): Promise<string> { /* ... */ }

  async [Symbol.asyncDispose](): Promise<void> {
    if (!this.released) {
      await pool.release(this);
      this.released = true;
    }
  }
}

async function runBatchInference(prompts: string[]) {
  await using session = await pool.acquire();
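  // No try/finally. Disposal is guaranteed at scope exit,
  // regardless of which path the function takes.
  // The await matters: returning the bare promise would dispose
  // the session before the completions settle.
  return await Promise.all(prompts.map(p => session.complete(p)));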
}

What surprised me was the disposal ordering. When you stack multiple using declarations in the same scope, TypeScript disposes them in reverse order — last declared, first disposed, LIFO. I expected to have to verify this carefully and maybe work around edge cases. Nope. It just works the way you'd want it to if one resource depends on another.

If you manage any resource in TypeScript — database connections, file handles, WebSocket sessions, anything with a close() — implementing Symbol.dispose or Symbol.asyncDispose and switching to using is the most immediately practical change in all of 5.x.

Inferred Type Predicates Deleted About 300 Lines of Manual Guards

Before TypeScript 5.5, getting .filter() to actually narrow a type required an explicit type predicate function. We had a file of them: isNonNull, isLoaded, isSuccessResponse, isAPIError. About 300 lines across two utility modules, and someone would add a new one every couple of weeks. Every time we introduced a new union type, we'd forget to add the corresponding predicate, use the wrong one, or find out the type was wider than expected somewhere downstream.

TypeScript 5.5 introduced automatic inference of type predicates — when the compiler can determine from the function body that a value is being narrowed, it infers the value is T return type for you. The case that hit us hardest:

// Before 5.5 — you wrote this (correctly) every time
function isNonNull<T>(value: T | null | undefined): value is T {
  return value != null;
}

const rawResults: (InferenceResult | null)[] = await runBatch(prompts);
const results = rawResults.filter(isNonNull); // InferenceResult[]

// After 5.5 — TypeScript infers the predicate from the inline callback
const results = rawResults.filter(r => r !== null);
// results is InferenceResult[], not (InferenceResult | null)[]
// No helper. No import. Just correct.

I deleted most of those utility files the same afternoon I confirmed this worked. Not all of them — there are still cases where the inference doesn't trigger. The rule of thumb I've built up: simple null and equality checks work reliably; anything with nested property access or custom logic still needs an explicit predicate.

One thing I noticed: this pairs well with typed AI SDK responses, where you're often getting back something like CompletionResult | RateLimitError | null from a batch call and need to split it into separate arrays. Used to be a predicate per type. Now it's an inline condition and the types just follow.

NoInfer Is Nine Characters and It Stopped a Real Bug

I'll be honest — I thought NoInfer<T> (added in 5.4) was a library-author concern when I first read about it. I was wrong. I ran into the problem it solves within two weeks.

The setup is — okay, let me back up a second. We have a config resolution function that looks up model configurations by key and falls back to a default. The default value was silently widening the inferred type, because TypeScript was using the fallback argument to infer T rather than the caller's intended type.

// Without NoInfer: the fallback widens T
function resolveConfig<T>(
  registry: Map<string, T>,
  key: string,
  fallback: T  // TypeScript infers T partly from here — the problem
): T {
  return registry.get(key) ?? fallback;
}

// resolveConfig(myRegistry, 'gpt-4o', { temperature: 0.7 })
// infers T as { temperature: number }, not ModelConfig,
// so an incomplete fallback compiles and the result is typed wider than ModelConfig

// With NoInfer: only the registry type informs T
function resolveConfig<T>(
  registry: Map<string, T>,
  key: string,
  fallback: NoInfer<T>  // can't influence T inference
): T {
  return registry.get(key) ?? fallback;
}

That function had been silently widening types for months. I'm not 100% sure we ever shipped a production bug because of it — but we had tests passing on wider types than they should have been, and that's a bad place to be.

Reach for NoInfer when you write generic utilities with default or fallback parameters. You'll know when you need it because you'll see the inferred type being wider than you intended, and you'll wonder why. Then it'll click immediately.

verbatimModuleSyntax Is the Price of Admission for ESM in 2026

verbatimModuleSyntax shipped in 5.0, but I keep seeing teams who've skipped it, usually because enabling it immediately breaks forty files and no one wants to deal with that mid-sprint. I deferred it for months too. But now that Node.js 22+ can run TypeScript directly via --experimental-strip-types, and TypeScript 5.8 added --erasableSyntaxOnly to flag syntax that can't simply be stripped, verbatimModuleSyntax is effectively required if you want your TypeScript to run without a transformation step.

Here's the thing: when this flag is off, TypeScript can rewrite your imports. An import { SomeInterface } that's type-only might get stripped, or it might get emitted, depending on whether the compiler thinks it's a value. That ambiguity is fine until you're on an edge runtime or a tool that doesn't do the same inference TypeScript does. Then you get subtle bundling issues — not crashes, usually, just slightly wrong output that's hard to trace back.

The fix is boring: turn the flag on, let the compiler tell you which imports need import type, run the VS Code quick-fix on each file. It took me about ninety minutes across our codebase. I haven't thought about import emission since.

If you're still on a CommonJS Node.js setup with no edge runtime in sight, you can defer this. If you're deploying to Cloudflare Workers, Deno, or native Node.js strip mode — do it now.

Two Features I Overhyped in Slack, and Two Small Wins That Earned Their Place

After migrating the using declarations and cleaning up the predicates, I made the mistake of posting "TypeScript 5.x is actually great" in our engineering channel and listing six more features I was excited to explore. Two of them did not pan out the way I expected.

Decorator metadata (5.2). I went in thinking we could annotate validation schemas directly on request classes, reflect on them at runtime, and eliminate some boilerplate in our API layer. You can do that. The problem is runtime support — you need either a polyfill or an environment that natively supports the TC39 Decorator Metadata proposal. For our Node.js services, fine. For the React front-end running in whatever browsers our users have, I didn't want to ship a polyfill for something Zod schemas solved in an afternoon. If you're building a framework where you control the runtime, worth evaluating. For application code, the cost-benefit didn't work out.

Const type parameters (5.0). I do use these, just much less often than I expected. When you declare function foo<const T>(), TypeScript infers literal types for T instead of widening. Useful for typed config builders and tuple utilities. I've reached for it maybe eight times in the past year. Good to know it exists; not a weekly-driver feature.

The smaller wins that actually earned their place: preserved narrowing after last assignment (5.4) caught a real bug where a closure was capturing a variable I thought was permanently narrowed but could have been reassigned before the callback fired. The compiler surfaced it before it shipped. And regex syntax checking (5.5) has caught two invalid patterns that would have been silent runtime failures — the kind of thing that used to be completely invisible to the type system.

Anyway, those four. using declarations and inferred predicates are the ones I'd push on any TypeScript team right now, regardless of what kind of code they're writing. verbatimModuleSyntax is a one-time cost you never think about again. NoInfer you'll understand the second you hit the problem it solves.

The rest of 5.x I'd skim when a release drops and learn on demand. The six features I was posting about excitedly in Slack? Genuinely cool. Not in my daily workflow.

These four are.

Serverless vs Containers in 2026: Why I Stopped Treating It as a Binary Choice

Moon Robert — Mon, 09 Mar 2026 20:18:12 +0000

About 14 months ago, my team of four migrated our entire backend — a fairly standard Node.js/Python mix serving a B2B SaaS product — fully onto AWS Lambda. We'd read the same blog posts you probably have. Pay for what you use, infinite scale, no servers to babysit. We were sold.

Six months later, two of our services were back in containers on ECS Fargate.

Not because Lambda failed us. It's more complicated than that. This post is my attempt to be honest about what actually happened, what the tradeoffs look like in practice in early 2026, and what I'd tell a team starting fresh today.

Why We Went All-In on Serverless (And What Actually Worked)

Before I get into the friction: serverless genuinely delivers for certain workloads, and I want to say that clearly before this turns into another "we went back to containers" post that implies the whole thing is overhyped. It isn't.

Our webhook processing pipeline — where we ingest events from third-party integrations and fan them out to customer-specific handlers — is still on Lambda and I have zero plans to change that. It's processing about 2-3 million invocations a day now, and the cost is roughly $40/month. The same workload on containers would require careful autoscaling configuration, and we'd almost certainly be over-provisioned most of the time because the traffic pattern is genuinely spiky: bursts of thousands of events followed by minutes of nothing.

The other thing serverless got right for us: the team doesn't have to think about it. Lambda functions deploy in under two minutes, they scale, they recover from errors automatically. For a four-person team where nobody has "DevOps" in their title, that operational simplicity is worth real money.

The tooling also got genuinely better between 2024 and now. AWS SAM plus GitHub Actions is a clean deployment story. The old pain of local Lambda testing has mostly been solved — sam local invoke is workable, not perfect, but I stopped complaining about it months ago.

The sweet spot for Lambda: irregular or unpredictable traffic, cold invocations that are tolerable for the use case, discrete bounded work, and a team small enough that operational overhead eats into actual product time.

Where the Serverless Story Started to Break

We ran our main user-facing API on Lambda for about four months. Authentication, data fetching, the synchronous endpoints our SaaS customers hit directly. And it worked — until 2am on a Tuesday, when our largest customer kicked off a bulk export job that slammed our database connection pool.

Lambda functions don't maintain persistent connections. Every cold start means a new connection. When 200 functions spun up simultaneously and each tried to grab a database handle from our RDS instance, we had a very fun morning.

RDS Proxy helped. We implemented it and it did solve the immediate connection storm. But it added 3-5ms of latency per query in our benchmarks, and it added another managed service to reason about. Our connection pooling logic — which had been invisible when we ran a containerized API server — was now something we actively had to debug and configure.
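Part of what a long-lived API server gives you for free is connection reuse, and the closest Lambda equivalent is to open the connection lazily at module scope so warm invocations share it. The sketch below illustrates that pattern only: `sqlite3` stands in for a real database client, and `get_connection` and `connections_opened` are hypothetical names. It does nothing for a burst of simultaneous cold starts, which is the case RDS Proxy addressed for us.

```python
import sqlite3

_conn = None            # lives as long as one execution environment stays warm
connections_opened = 0  # counter kept only to make the reuse visible


def get_connection():
    """Open the connection on first use, then reuse it on warm invocations."""
    global _conn, connections_opened
    if _conn is None:
        _conn = sqlite3.connect(":memory:")  # stand-in for a real DB client
        connections_opened += 1
    return _conn


def handler(event, context):
    conn = get_connection()
    (value,) = conn.execute("SELECT 1").fetchone()
    return {"ok": value == 1}


# Two invocations in the same (warm) environment share one connection.
handler({}, None)
handler({}, None)
print(connections_opened)  # 1
```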

The deeper issue was architectural. Lambda encourages stateless, short-lived compute, which is correct and good engineering, but our API had accumulated a few stateful patterns we hadn't noticed until serverless made them painful. In-memory caching with a warm LRU cache we'd been relying on without realizing it. Some SDK client initialization done lazily that assumed a long-lived process. You could argue we should have caught these earlier — fair — but the migration surfaced a whole class of assumptions we'd made about "the server" that didn't hold anymore. These weren't Lambda problems exactly. They were invisible debt that Lambda forced us to pay.

The Cold Start Problem in 2026 Is Better, But Not Gone

I genuinely thought cold starts were a solved problem when we made our migration decision. That was wrong.

Lambda SnapStart — originally Java-only — extended to Node.js 22 and Python 3.13 runtimes in late 2025. The basic idea: AWS snapshots your initialized function and restores from that snapshot instead of initializing from scratch. In practice, this brought our cold starts from 600-900ms down to 80-150ms for most functions. Real improvement.

But there are edge cases. SnapStart doesn't play nicely with certain SDK initialization patterns. We hit a weird issue where the AWS SDK v3 client caching behavior caused stale credential state in restored snapshots — silent auth failures for about 0.1% of cold-start invocations. Took us two days to track down. It's documented in a GitHub issue thread (aws/aws-lambda-snapstart-java #89, though the Node behavior lives in a comment thread rather than its own issue, which is... typical).

For Python-heavy ML inference, cold starts are still brutal. A Lambda function loading a scikit-learn model plus its dependencies is going to take 3-8 seconds on a cold start depending on model size. Lambda container image support helps — you can package up to 10GB now — but you're still paying the initialization cost every time a new instance spins up. I moved our ML inference endpoints to containers for exactly this reason: a persistent ECS service that keeps the model warm is just better for that use case, full stop.

Here's what the contrast looks like in actual code:

# Lambda: webhook handler — stateless, spiky, exactly the right use case
import json
import boto3
from aws_lambda_powertools import Logger, Tracer

logger = Logger()
tracer = Tracer()

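# Initialized once per execution environment: SnapStart snapshots this state
sqs = boto3.client('sqs')

def process_webhook(record):
    # Placeholder (assumption): the real fan-out to customer handlers is not shown
    return json.loads(record['body'])

@tracer.capture_lambda_handler
@logger.inject_lambda_context
def handler(event, context):
    records = event.get('Records', [])
    results = [process_webhook(r) for r in records]
    return {"processed": len(results)}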

# ECS: ML inference service — persistent process, model stays loaded in memory
from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI()

# This is the whole point: loads ONCE at container start, not on every invocation
# On Lambda, a cold start would re-load this 3-8 seconds every time
model = joblib.load('/app/models/churn_predictor_v3.pkl')

@app.post("/predict")
async def predict(features: dict):
    X = np.array([[features[k] for k in sorted(features)]])
    probability = model.predict_proba(X)[0][1]
    return {"churn_probability": float(probability)}

The difference is obvious when you lay it out this way. Lambda's cold start cost only becomes a real problem when you have heavy initialization — but "heavy initialization" turns out to describe a lot of production workloads.

Container Economics: When the Math Actually Flips

I spent too long assuming serverless was inherently cheaper. "Pay per invocation" sounds obviously better than always-running containers. The math gets interesting.

Our API sits at roughly 8 million requests per day. With 512MB functions averaging 50ms execution time:

Request charges: 240M/month at $0.20 per million → ~$48/month
Compute: 240M × 0.05s × 0.5GB = 6M GB-seconds → ~$100/month
Supporting services (RDS Proxy, NAT Gateway egress, X-Ray): ~$65/month

Total: roughly $213/month for Lambda.

Two Fargate tasks (1 vCPU, 2GB RAM each) running 24/7: about $140/month. With a simple autoscaling policy that steps to four tasks during business hours, you land at ~$170/month.

The same ballpark, with containers already slightly ahead. At 2x our current scale the gap widens: warm instances mean consistent p95 latency, persistent connection pools eliminate the RDS Proxy overhead, and we can be more precise about autoscaling than Lambda's concurrency model allows.

This math assumes someone actively tuning Fargate task sizes and autoscaling thresholds, though. That's real work. If your team doesn't have the bandwidth for it, the serverless model's operational simplicity is itself worth something — I wouldn't optimize purely for raw infrastructure cost if the alternative is your engineers spending their afternoons staring at CloudWatch dashboards.
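The Lambda side of the estimate above is easy to reproduce. The sketch below assumes the public x86 list prices of $0.20 per million requests and $0.0000166667 per GB-second, and ignores the free tier and the supporting services. At those rates the request line is about $48 a month on its own, so it is worth including rather than rounding away.

```python
REQUEST_PRICE_PER_MILLION = 0.20  # USD, assumed list price
GB_SECOND_PRICE = 0.0000166667    # USD, assumed x86 list price


def lambda_monthly_cost(requests_per_day, avg_seconds, memory_gb, days=30):
    """Estimate monthly Lambda charges, excluding free tier and other services."""
    requests = requests_per_day * days
    gb_seconds = requests * avg_seconds * memory_gb
    request_cost = requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    compute_cost = gb_seconds * GB_SECOND_PRICE
    return {
        "requests": requests,
        "gb_seconds": gb_seconds,
        "request_cost": round(request_cost, 2),
        "compute_cost": round(compute_cost, 2),
        "total": round(request_cost + compute_cost, 2),
    }


# 8M requests/day, 50 ms average duration, 512MB functions
estimate = lambda_monthly_cost(8_000_000, 0.05, 0.5)
print(estimate["request_cost"], estimate["compute_cost"], estimate["total"])
```

Doubling `requests_per_day` doubles both lines, which is the core of the argument: the Lambda bill scales linearly with traffic, while a warm container fleet grows in steps.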

What I'd Actually Recommend

I pushed our team to go all-serverless partly because I was excited and partly because I'd read too many AWS blog posts written by people whose job is to sell you AWS services. That's not a knock on the technology — just useful context for how those blog posts are framed.

For event-driven async workloads — webhooks, queue consumers, scheduled jobs, file processing pipelines — Lambda is genuinely the right default. Traffic is irregular, the work is discrete, and the operational overhead is low enough that a small team can mostly forget it exists. That's a real win.

For user-facing synchronous APIs, it depends. Latency requirements under 100ms p99, heavy in-memory state, ML model serving, or traffic steady enough to keep instances warm — containers are probably the right call. ECS Fargate is my default recommendation there. You don't need to manage EC2 instances unless your infra team is actually sized for that work; Fargate hits the sweet spot.

The hybrid architecture isn't a cop-out. It's how most mature backend systems actually end up, because different workloads have genuinely different characteristics. My mistake was treating this as either/or — that's a framing problem, not a technical one.

The "no servers to manage" promise of serverless is real, but it trades server management for function management: cold start tuning, concurrency limits, timeout edge cases, VPC routing. Lower stakes, but not zero stakes. A Lambda function silently timing out at 29 seconds on an edge case is harder to notice than a container dropping out of your load balancer's health check rotation. I've experienced both, and neither is fun at 2am.

Our current setup: Lambda for async pipelines, Fargate for synchronous APIs, shared infrastructure (VPC, RDS, ElastiCache) that both can reach. Four engineers, two environments, one YAML-heavy afternoon to wire up the networking. That's the architecture I'd start with if I were doing it again.

Redis vs Valkey in 2026: What the License Fork Actually Changed

Moon Robert — Mon, 09 Mar 2026 20:17:42 +0000

When Redis Ltd announced the license change in March 2024, I was in the middle of planning a caching layer for a mid-sized SaaS product — four engineers, roughly 80k daily active users, nothing hyperscale but enough that infrastructure decisions have real consequences. My first reaction was basically: this is annoying, but probably fine. I figured the open source community would grumble for a few weeks and move on.

I was wrong. The fork that came out of it — Valkey, now a Linux Foundation project — turned out to be more interesting than I expected.

Here's what I actually learned after running both in staging, migrating one production service, and spending way too many evenings reading GitHub issues.

March 2024: Why the SSPL Switch Was a Bigger Deal Than It Looked

Redis Ltd moved to dual-license Redis under RSAL (Redis Source Available License) and SSPL (Server Side Public License). Both sound fine until you notice that the Open Source Initiative does not consider SSPL an open source license. MongoDB used it first, and the controversy followed them too.

The practical implication: any cloud provider building a managed Redis-compatible service would now need a commercial agreement with Redis Ltd. The old BSD license let them run Redis however they wanted. SSPL did not.

So within weeks, Amazon, Google, Oracle, Ericsson, and others backed a fork. The Linux Foundation accepted it in late March 2024. They called it Valkey — and started from Redis 7.2.4, the last version under the BSD license.

What struck me was how fast this moved. Fork announcement, Linux Foundation acceptance, first release (7.2.5) — all within about two months. Compare that to the years-long drama around other high-profile forks. Someone had clearly been planning for this scenario before it was announced publicly.

The right way to think about Valkey is not "Redis but free." The governance model is genuinely different. Redis Ltd controls Redis. Valkey is steered by a Technical Steering Committee across multiple companies with no single controlling entity. Whether that matters depends on how much you care about vendor lock-in at the infrastructure layer.

Valkey Is Not the Same Project Anymore — and Neither Is Redis

I assumed Valkey would just track Redis feature-for-feature, staying compatible but staying behind. That's not what happened.

By late 2024, Valkey 8.0 shipped with meaningful performance work around I/O threading. Redis had always had a reputation for being single-threaded on commands (even though I/O threading was added in Redis 6.0), and Valkey's team pushed further on that in 8.0. In my own synthetic benchmarks — 8 CPU cores, mostly GET/SET workloads with some sorted set operations — Valkey 8.0 was measurably faster than Redis 7.4 at high connection counts. Maybe 15-20% throughput improvement in the scenarios I tested. Not nothing.

The config that mattered:

# valkey.conf: enable threaded I/O (Valkey 8.x)
io-threads 4
# Redis 6.0+ needs this directive to thread reads as well as writes.
# Valkey 8.0 deprecates it and ignores the value.
io-threads-do-reads yes

# I/O threads were available in Redis 6.0+ too, but Valkey's default
# behavior and threading model changed in 8.0.
# Check your CPU count: don't set io-threads > (cpu_count - 1)

I'm not confident this scales beyond the specific workload I tested — scripting-heavy or transaction-heavy workloads behave differently. But the point stands: Valkey is making its own performance bets now, not just merging Redis commits.
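The sizing rule from the config comment (keep io-threads at or below cpu_count - 1) can be written down as a small helper. This is a minimal sketch, not part of Valkey or any client; `recommend_io_threads` and its parameters are names made up for illustration.

```python
import os
from typing import Optional


def recommend_io_threads(requested: int, cpu_count: Optional[int] = None) -> int:
    """Clamp a requested io-threads value to at most (cpu_count - 1), minimum 1.

    Hypothetical helper: it only encodes the rule of thumb from the config
    comment and knows nothing about the actual workload.
    """
    if cpu_count is None:
        # os.cpu_count() can return None on unusual platforms; assume 1 then.
        cpu_count = os.cpu_count() or 1
    ceiling = max(1, cpu_count - 1)
    return max(1, min(requested, ceiling))


if __name__ == "__main__":
    # Prints the directive to paste into valkey.conf for this machine.
    print(f"io-threads {recommend_io_threads(4)}")
```

On the 8-core benchmark box described above, a request for 4 threads passes through unchanged, while a request for 16 would be capped at 7.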

On the Redis side, and this genuinely surprised me: Redis 8 added vector set support and kept iterating on Redis Query Engine (the search/vector work from the old RediSearch module). That's real differentiation. If you're building anything combining caching with vector similarity search, Redis has a more integrated story than Valkey does right now.

By early 2026, the two projects have genuinely diverged. Not dramatically — still about 90% compatible at the command level — but enough that you can't pick one on inertia alone.

The Client Library Situation Nobody Warned Me About

This is where I hit an actual wall.

I was migrating a Node.js service from ElastiCache (which AWS quietly switched to Valkey by default for new clusters in 2024) to self-hosted Valkey for cost reasons. Our client was ioredis, which we'd used for years.

I pushed the config change on a Friday afternoon. Forty minutes later, the service started throwing intermittent connection errors — not enough to page anyone, but enough to show up in our error rate dashboard, which I happened to be watching for a completely unrelated reason. We rolled back. I spent the weekend reading ioredis GitHub issues and found the actual problem buried in a thread from mid-2025.

The issue was how certain client libraries handled CLIENT INFO and HELLO commands during connection setup — version negotiation behavior that had diverged between Valkey 8.x and what ioredis expected based on Redis 7.x behavior. Not exactly a Valkey bug, more of a "the ecosystem assumed Redis forever" problem. The fix existed in an ioredis release that had already shipped, but our lock file had pinned us to an older version.

Check your client library's Valkey compatibility explicitly, and check recent release notes before migrating. "Redis-compatible" does not mean "tested against Valkey 8." Some libraries are explicit about this now; many aren't.
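One way to make that check routine is to read the pinned version straight out of the lock file before migrating. The sketch below assumes an npm `package-lock.json` in the v2/v3 layout (a top-level `packages` map keyed by `node_modules/<name>`). The function names are hypothetical, and the minimum version in the usage note is a placeholder, not the actual ioredis release that contains the fix.

```python
import json
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '5.3.2' or '5.3.3-rc.1' into a comparable tuple of the numeric core."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def pinned_version(lockfile_text: str, package: str) -> Optional[str]:
    """Return the version pinned in a package-lock.json (v2/v3 layout), or None."""
    lock = json.loads(lockfile_text)
    entry = lock.get("packages", {}).get(f"node_modules/{package}")
    return entry.get("version") if entry else None


def needs_upgrade(lockfile_text: str, package: str, minimum: str) -> bool:
    """True when the package is pinned below `minimum` (pre-release tags ignored)."""
    pinned = pinned_version(lockfile_text, package)
    if pinned is None:
        return False  # not installed at the top level, nothing to audit here
    return parse_version(pinned) < parse_version(minimum)
```

Usage would look like `needs_upgrade(open("package-lock.json").read(), "ioredis", "5.4.0")`, where `"5.4.0"` stands in for whatever version the client's release notes name as Valkey-tested.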

The managed service picture is cleaner. AWS MemoryDB and ElastiCache both support Valkey and handle client library abstraction for you. If you're on a managed offering, the migration path is more straightforward than self-hosted.

Where the Managed Services Landscape Settled

The cloud picture shifted more decisively than I expected.

AWS made Valkey the default for new ElastiCache and MemoryDB instances in 2024. You can still choose Redis — they have a commercial agreement with Redis Ltd — but the default changed. That's operationally significant: environments spun up from templates or Terraform modules defaulted to Valkey unless someone explicitly overrode it.

Google Cloud Memorystore offers Valkey as well. Azure still has Azure Cache for Redis, backed by actual Redis under their own commercial arrangement. The big three have split: AWS and GCP leaning Valkey, Azure leaning Redis.

Redis Insight (the GUI tool) works fine with Valkey — RESP protocol is shared, so most tooling in the ecosystem still functions. The divergence shows up in edge cases: specific command behaviors, module support, vector search.
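The reason tooling carries over is that both servers speak the same wire format. As a rough illustration, this is how a client frames a command in RESP: an array header followed by one length-prefixed bulk string per argument. The function name is made up for this sketch, and real clients also handle replies, pipelining, and RESP3 types.

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode("ascii")]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode("ascii") + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    # The same bytes are valid input for Redis and for Valkey.
    print(encode_resp_command("GET", "foo"))  # b'*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n'
```

Anything that only needs this framing (GUIs, proxies, most monitoring agents) does not care which of the two servers is on the other end.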

Which brings me to the most important decision factor right now. If you rely heavily on Redis modules (RediSearch, RedisJSON, RedisTimeSeries), your decision is mostly already made. Those are Redis Ltd products, not part of Valkey. Community-driven Valkey equivalents are emerging but aren't at the same maturity level yet.

My Actual Recommendation

I thought about hedging this and decided not to.

On managed AWS or GCP: use Valkey. The operational complexity is identical, the defaults are already pointing there, and there's no reason to route Redis Ltd licensing costs through your cloud provider for standard caching workloads. Session storage, rate limiting, pub/sub, leaderboards with sorted sets — Valkey handles all of it, and the performance characteristics are at least comparable, often better.

Self-hosted with heavy Redis Stack module usage: stay on Redis. Valkey doesn't have mature equivalents yet, and the compatibility gap is real if you've built on top of RediSearch or RedisJSON. Watch the Valkey module ecosystem closely: that's where the gap closes or doesn't over the next year or so.

Starting a new project from scratch in 2026? Valkey is my default. The governance model is more stable long-term — Linux Foundation backing means no surprise license changes. The performance work is real. For typical web application use cases, you're not giving anything up.

The one caveat I'd offer: Redis has 15 years of tutorials, Stack Overflow answers, and tribal knowledge behind it. Valkey is two years old. That documentation gap is real, and for smaller teams without deep Redis expertise, it shows. Plan for that.

For my team, we migrated session caching and rate limiting to Valkey 8.x on managed infrastructure. The search service stayed on Redis — we're using Redis Query Engine heavily and I don't want to rewrite that integration. Both are running fine. The migration I thought would take a weekend took about three weeks once you factor in the client library audit, testing, and the Friday rollback incident.

Worth it. The fact that a fork can happen, get Linux Foundation backing, and reach production-grade maturity in two years says something — and it's a better outcome than everyone just swallowing the SSPL change and moving on.

LangChain vs LlamaIndex vs Haystack: What I Learned Building RAG in Production

Moon Robert — Mon, 09 Mar 2026 18:23:04 +0000

I spent the last two weeks migrating a RAG system across three frameworks, and not by choice. We started with LangChain, the abstractions became hard to maintain, someone on the team suggested LlamaIndex, we tried that, and I ended up looking at Haystack almost by accident while hunting for a fix to a different problem. So I did it properly: I set up a real benchmark on our real data and measured what matters.

I work on a six-person team building a RAG system for a fintech client. The corpus has around 480k documents: scanned PDFs (always the worst), HTML from their internal portals, and some Markdown from their wikis. Maximum inference budget: $800/month. That ruled out several options before we even got to architecture questions.

The versions I tested: LangChain 0.3.15, LlamaIndex 0.12.3, Haystack 2.7.1.

LangChain 0.3.15: The Ecosystem That Sometimes Works Against You

LangChain was my starting point because I already knew it. And that familiarity was part of the problem: we had code from a project eight months old that had to be partially rewritten because the interfaces had changed. Again.

The API has improved, I'll be honest. LCEL (LangChain Expression Language) works well once you really understand it. Composing chains with the | operator reads cleanly, and tracing with LangSmith saved us hours of debugging in staging.

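from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_anthropic import ChatAnthropic

# `vectorstore` is assumed to be an already-populated LangChain vector store.
# The prompt text is a placeholder, not our production prompt.
prompt = ChatPromptTemplate.from_template(
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

# Tuning the retriever's score_threshold was the part that took us longest.
# 0.72 was our number after ~200 manually evaluated queries.
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.72}
)

# LCEL in action: it reads well and generally works well
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatAnthropic(model="claude-opus-4-6", temperature=0)
    | StrOutputParser()
)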

The problem isn't the framework's technical quality. LangChain tries to cover everything (agents, memory, tools, chains, RAG), and the price of that breadth is that its depth in retrieval specifically doesn't match the competition. When I wanted to implement sentence window retrieval or hierarchical node parsing, I had to build it myself or go digging in the community. Solutions exist, yes, but they are spread across five different versions of the library, and it isn't always clear which one applies to yours.

One specific moment that drove me up the wall: in December I tried to use MultiVectorRetriever with long documents and ran into a bug reported on LangChain's GitHub (issue #12847) where document IDs were silently overwritten. It worked locally but not in production; the difference turned out to be the Chroma version. Two days lost on that.

For projects where you need maximum flexibility and have a team willing to maintain its own code, LangChain makes sense. For pure RAG, there are better options.

LlamaIndex 0.12.3: When Retrieval Quality Really Matters

I switched to LlamaIndex with skepticism. It surprised me.

It is much more focused on the indexing and retrieval problem, and that shows in the implementation details. The combination that moved our numbers the most was SentenceWindowNodeParser with MetadataReplacementPostProcessor. The idea: you index individual sentences to get precise retrieval, but when generating the answer you replace that fragment with a larger context window. We brought the hallucination rate down from ~12% to roughly 4% with this change alone. That's what I mean by "depth in retrieval": not a marketing feature, but concrete tools for a concrete problem.

from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# This combination is what really changed our evaluation metrics
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,                           # 3 sentences of context on each side
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

Settings.node_parser = node_parser
Settings.chunk_size = 512  # smaller than what we used before with LangChain

# `documents` is assumed to be a list of already-loaded Document objects
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
)

# The postprocessor replaces the retrieved node with its context window
# right before the prompt is assembled; this is where the magic happens
query_engine = index.as_query_engine(
    similarity_top_k=6,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

What didn't convince me: the global configuration system built on Settings feels clumsy when you have multiple pipelines in the same process. Every time we needed to change something for a specific sub-index, we overwrote global settings and prayed the integration tests wouldn't step on each other. Using ServiceContext directly used to be an option, but it was deprecated well before 0.12.x, and the migration story isn't entirely clean.
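One generic way to contain that kind of global override is a context manager that restores the previous values on exit, so a change made for one sub-index cannot leak into the next test. This is a minimal sketch of the pattern only: `FakeSettings` is a stand-in object and `override` is a hypothetical helper, neither is a LlamaIndex API, and the approach is not safe across threads that share the same settings object.

```python
from contextlib import contextmanager


class FakeSettings:
    """Stand-in for a module-level settings object (NOT LlamaIndex's Settings)."""
    chunk_size = 1024
    node_parser = "default-parser"


@contextmanager
def override(settings, **changes):
    """Temporarily set attributes on `settings`, restoring the old values on exit."""
    previous = {name: getattr(settings, name) for name in changes}
    try:
        for name, value in changes.items():
            setattr(settings, name, value)
        yield settings
    finally:
        # Runs even if the body raises, so a failed test cannot leave
        # the global object in a modified state.
        for name, value in previous.items():
            setattr(settings, name, value)


if __name__ == "__main__":
    with override(FakeSettings, chunk_size=512):
        print(FakeSettings.chunk_size)  # 512 inside the block
    print(FakeSettings.chunk_size)      # 1024 again afterwards
```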

Also: the documentation assumes you use their abstractions end to end. When we wanted to use LlamaIndex only for retrieval and keep our own generation system, there was more friction than expected. Not impossible, but not the documented path either.

Still, if quality retrieval is your top priority, LlamaIndex has the best toolbox of the three. By a fair margin.

Haystack 2.7.1: The One Nobody Mentions in the Tutorials

I'll be direct: I didn't expect Haystack to surprise me this much.

It has a fraction of the mindshare of the other two, and that's a shame, because for production environments it has real advantages. The mental model differs from LangChain's and LlamaIndex's: everything is a pipeline of connected components, and that pipeline is a first-class citizen. You can serialize it to YAML, version it, and deploy it with external configuration.

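# pipeline.yaml: this is real, not blog pseudocode (trimmed for the post)
# Components are referenced by full import path; the document store is
# nested under the retriever that uses it.
components:
  retriever:
    type: haystack.components.retrievers.in_memory.embedding_retriever.InMemoryEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore
        init_parameters: {}
      top_k: 5
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |
        Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
        Question: {{ question }}
        Answer:
connections:
  - sender: retriever.documents
    receiver: prompt_builder.documents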

That sounds boring until you're on your third deployment of the day and someone from QA wants to run the exact version of the pipeline that failed in staging. Being able to version the pipeline as an artifact separate from application code is something the other two don't offer out of the box.

Where it cost me: the community is significantly smaller. I spent four hours one Tuesday trying to understand why my DocumentSplitter ignored page breaks in PDFs. Nothing on Stack Overflow. The Haystack Discord had a similar question, unanswered, from three months earlier. In the end I read the source code: there is a split_by="page" parameter that isn't in the quick-start guide but is in the API reference if you know where to look. That kind of friction adds up when deadlines are looming.

Observability is better thought out, especially if you already use OpenTelemetry. Tracing comes out of the box with detail on how long each component took, without paying for external tools.

I don't know whether Haystack scales beyond what we tested, around 2,000 queries a day in production. My intuition says yes, but I don't have data of my own to back that up.

The Friday That Almost Changed My Mind

I had almost settled on LlamaIndex when something happened that made me rethink the entire evaluation criteria.

One Friday afternoon (classic) I pushed an update to the indexing service. The background process that indexes new documents started consuming memory inconsistently. Not on every run, maybe one in four. The problem: in LlamaIndex 0.12.x there are edge cases in memory handling when indexing and embedding very long documents, PDFs of more than 200 pages in our case. I found the issue on GitHub, but there was no fix yet.

The workaround I ended up using was to process those documents in smaller batches with a wrapper of my own. It worked, but it left me wondering: how much time should I spend patching framework behavior versus building product?
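The general shape of a batching wrapper like that is a generator that hands the indexer a bounded number of items at a time, so peak memory depends on the batch size rather than on the whole document. What follows is a generic sketch, not the wrapper from this project; `in_batches` is a hypothetical name, and the batch size would need tuning against real memory measurements.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items from `items`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    pages = [f"page-{n}" for n in range(1, 8)]
    for chunk in in_batches(pages, 3):
        # In a real service each chunk would be passed to the indexing call.
        print(chunk)
```

Because the input is consumed lazily, the pages themselves can also come from a generator, which avoids holding a 200-page PDF's parsed content in memory at once.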

I had thought retrieval quality was the most important criterion, but in production operability matters just as much, or more. That was the question I hadn't asked myself at the start of the benchmark.

With Haystack, the same scenario would have been more predictable: the explicit pipeline makes it more obvious where things can fail and where to step in. With LangChain I would have had more configuration options, but also more surface area for errors. There is no perfect answer. There are only tradeoffs worth knowing before you commit a client's stack.

My Recommendation (Without the "It Depends on Your Use Case")

Straight to it: if in 2026 you are starting a new RAG project from scratch with a medium or large corpus, I recommend LlamaIndex for the indexing and retrieval layer. Its advanced retrieval toolbox has no real competition in the other two frameworks, and that difference translates into concrete metrics.

If your team values operability and pipeline reproducibility over initial prototyping speed, and especially if it already has an infrastructure-as-code culture, consider Haystack. The adoption curve is steeper, but what you gain in production predictability makes up for it.

LangChain makes sense in a specific scenario: if you need to build agents with complex tools, or if your team already knows it well and the migration cost outweighs the benefit. Its integration ecosystem is the broadest of the three. But for pure RAG, it is the option with the most maintenance overhead.

Here's what I didn't expect to learn: the framework you choose affects how you think about the problem, not just how you implement it. LangChain makes you think in chains. LlamaIndex makes you think in nodes and retrieval. Haystack makes you think in pipelines and components. If your mental model already fits one of those paradigms, that is valid input to the decision, more valid than any synthetic benchmark.

My current setup: LlamaIndex for indexing and retrieval, FastAPI on top, custom metrics with Prometheus. None of the three convinced me with its native observability story, so I built my own there. Over the next few months I'll see whether Haystack's OpenTelemetry integration matures enough to replace it, but for now what we have works.

Docker Compose vs Kubernetes in 2026: When to Use Which (And When You're Overcomplicating Your Life)

Moon Robert — Mon, 09 Mar 2026 18:22:34 +0000

I'll be direct: over the past year I changed my mind twice on this topic. I started out convinced that Kubernetes was the right answer for almost everything. Then I migrated three projects back to Docker Compose. And now, with a cooler head, I think I can give you something more useful than a vague "it depends on the use case".

I work with a team of four. We run a SaaS platform with between 8,000 and 12,000 active users depending on the month, with fairly predictable peaks on Tuesdays and Thursdays. We are not Netflix. We are not a weekend side project either. We sit exactly in that gray zone where the choice really matters.

Why Docker Compose Goes Further Than Most People Think

There is a widespread narrative that Compose is "only for local development" and that as soon as you want to do anything serious you have to jump to Kubernetes. That is, with all due respect, a rather harmful oversimplification.

I ran Docker Compose in production for 14 months. One Hetzner VPS, 8 vCPUs, 32 GB of RAM, daily backups with restic, and an nginx acting as reverse proxy in front of everything. Uptime was 99.7%. Deployments took about 40 seconds. And the docker-compose.yml that managed all of it had fewer than 120 lines.

# docker-compose.yml (production, simplified version)
services:
  api:
    image: registry.example.com/api:${IMAGE_TAG}
    restart: unless-stopped
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  worker:
    image: registry.example.com/api:${IMAGE_TAG}
    command: ["node", "worker.js"]
    restart: unless-stopped
    depends_on:
      - cache
      - db

  db:
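    image: postgres:16.3
    environment:
      # The official image refuses to initialize an empty data directory without this
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}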
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7.4-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Nothing glamorous. But it worked. Deployments were docker compose pull && docker compose up -d --no-deps api worker and done. Zero downtime if you did it right. A manual rolling update, yes, but in 14 months I only messed up once. On a Friday afternoon, of course.

What did cost me: once we started needing more than one instance of the api service, things got uncomfortable. Compose has no native load balancing across replicas of the same service that you can really trust. You can use --scale api=3, but then you have to configure nginx manually to distribute load across the containers, and that gets fragile. That was the first moment I started looking at k8s with more interest.

That said: if you have a single machine, a small team, and fewer than 50,000 users, Compose in production is not crazy. It is a perfectly valid pragmatic decision.

The Moment Kubernetes Goes from Solution to Problem

I migrated to Kubernetes in October of last year. We use EKS on AWS (on Kubernetes 1.31 at the time, though we are now on 1.33). The reasoning was sound: we needed to scale horizontally and automatically, do zero-downtime deployments with more control, and get ready to add more services without the infrastructure turning into spaghetti.

What I didn't calculate well was the operating cost.

The minimum viable cluster on EKS cost us about 180 to 220 dollars a month in nodes alone (not counting the control plane, which on AWS is a flat fee of roughly 70 dollars a month, billed at $0.10 per hour). Add the Application Load Balancer, the EBS volumes, and data transfer, and you easily reach 350 to 400 dollars a month in infrastructure alone. On the Hetzner VPS we were paying 39 euros. The difference is significant when you are a small startup.

But the financial cost is almost the least of it. The cost in engineering time is what really hurt.

I spent the first month after the migration (okay, we spent it; there are four of us, but the infra work fell mainly on me) basically putting out configuration fires. Cluster RBAC, secrets through External Secrets Operator because the native k8s ones are too basic, PodDisruptionBudgets so deployments wouldn't take the service down, NetworkPolicies, resource limits that had to be tuned because without them a worker would eat all the node's memory and evict production pods...

The mistake that hurt most, because it was avoidable: I spent two hours debugging why a pod restarted every 15 minutes. kubectl describe pod showed OOMKilled. I assumed it was a memory leak. It turned out the pod's memory limit was too low (512Mi) for a service that needed 700Mi at peak. I would have seen it sooner if I had had Prometheus + Grafana properly configured from day one. I didn't.
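The comparison that would have caught this is simple once both numbers are in the same unit. Here is a minimal sketch that converts Kubernetes-style memory quantities to bytes and flags a limit that does not cover the observed peak. The function names and the 20% headroom default are assumptions for illustration, and only the Ki/Mi/Gi suffixes are handled.

```python
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes.

    Only Ki/Mi/Gi suffixes and plain byte counts are supported in this sketch.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def limit_is_too_low(limit: str, observed_peak: str, headroom: float = 0.2) -> bool:
    """True when the limit does not cover the observed peak plus headroom."""
    return parse_memory(limit) < parse_memory(observed_peak) * (1 + headroom)


if __name__ == "__main__":
    # The values from the incident: a 512Mi limit against a 700Mi peak.
    print(limit_is_too_low("512Mi", "700Mi"))  # True
```

Fed with the peak working-set figure from whatever metrics are available, a check like this can run in CI against the manifests instead of waiting for an OOMKilled event.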

I'm not saying Kubernetes is bad. I'm saying it has a huge configuration surface that has to be actively managed, and for a small team that carries a real cost the tutorials don't mention.

The Differences That Matter in 2026: Networking, Secrets, and Rollbacks

Once the initial trauma has passed, there are three areas where the comparison between Compose and k8s is not as clear-cut as it seems.

Networking. In Compose, all services in the same file share a default network and resolve each other by service name. Period. In Kubernetes you have namespaces, Services of type ClusterIP/NodePort/LoadBalancer, Ingress controllers (pick yours: nginx-ingress, Traefik, Cilium Gateway API...), and NetworkPolicies that are optional but recommended. More power, more things to configure. What k8s does give you that Compose can't match: service discovery across multiple teams in the same cluster, and sophisticated routing such as canary releases with Argo Rollouts.

Secrets. Kubernetes has a reputation for doing this well, but native Secrets are simply base64-encoded, which is not encryption. In 2026 you no longer have an excuse not to use External Secrets Operator with AWS Secrets Manager or HashiCorp Vault, but that adds more pieces to the puzzle. With Compose in production, we passed secrets as environment variables from a .env file that we did not commit to the repository. Less elegant, but it works and is 100% understandable.
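To make the "base64 is not encryption" point concrete: the value stored in a Secret's data field can be reversed by anyone who can read the object, with no key involved. A small sketch using only the standard library; the function names and the example password are made up.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears in a Secret's `data` field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key or password is required."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    stored = encode_secret_value("hunter2")
    print(stored)                       # aHVudGVyMg==
    print(decode_secret_value(stored))  # hunter2
```

That round trip is the whole protection the encoding provides, which is why access control on the Secret objects and an external secret store matter.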

Rollbacks. This is where Kubernetes wins hands down. kubectl rollout undo deployment/api and in 30 seconds you are on the previous version. With Compose, the rollback is docker compose up -d with the previous image tag, which works, but requires that you have that tag written down somewhere and run it manually. On a Sunday at 3am this makes the difference.

The Surprise I Didn't Expect: k3s Changes the Math

Here comes the part I honestly had not anticipated when I wrote my first draft of this article.

Early this year I started experimenting with k3s, Rancher's lightweight Kubernetes distribution, on a Hetzner VPS with 8 vCPUs and 16 GB of RAM. What I found was that the operating experience is considerably closer to Compose than I imagined. A single binary. Control plane and worker on the same machine. It includes Traefik as the default ingress, SQLite instead of etcd for small clusters, and idle resource usage is surprisingly low: around 500MB of RAM for the complete control plane.

The cost: the same VPS we used with Compose, at 39 euros a month, running single-node k3s. And you already get kubectl rollout undo, HPA (Horizontal Pod Autoscaler), declarative health checks, and the option to add worker nodes when you need them.

I'm not 100% sure this scales well beyond two or three worker nodes without starting to need highly available etcd, which is a whole other conversation. But for that sweet spot between "Compose falls short" and "EKS is too much", k3s in 2026 deserves to be in the conversation.

The Real Filter: The Three Questions I Would Ask Myself

After migrating projects in both directions, this is the process I now apply.

Do you need to scale horizontally and automatically?
If not (if a well-specified server handles your load with headroom, if your peaks are predictable and manageable by hand), then Compose. Period. Don't sell yourself the complexity of k8s as an investment in the future if that future doesn't yet exist in your current metrics.

Do you have more than one service that needs to scale independently?
If the API can have ten instances but the PDF-processing worker only needs two, Kubernetes handles this naturally. With Compose you start juggling nginx and bash scripts. If you're at this point, the jump makes sense, but look at k3s before going straight to EKS or GKE.

Does your team have the bandwidth to learn and maintain this?
This is the one most people ignore. Kubernetes is not a one-day deployment; it is a system that requires ongoing attention. Version upgrades (k8s deprecates APIs fairly regularly; look at what happened with PodSecurityPolicy, removed in 1.25), certificates that expire, nodes that need maintenance. If your DevOps team is just you, or if backend is a third of what your team does, that time carries a high opportunity cost that is badly underestimated.

My recommendation, no beating around the bush: with fewer than 50,000 active users, a team of fewer than six people, and a single server or VPS, use Docker Compose in production. If you have already outgrown it or truly need multi-node, try k3s before committing to EKS. And if you have a dedicated platform team, multiple services with different scaling requirements, and budget for the operational complexity, then managed Kubernetes makes complete sense.
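Read as a decision procedure, the three questions collapse into a few branches. The sketch below is one simplified reading of them, not a rule to apply blindly: the function name, the parameter names, and the returned labels are all made up for illustration, and the user-count and team-size thresholds are deliberately left out.

```python
def recommend_platform(
    needs_autoscaling: bool,
    independent_scaling: bool,
    has_platform_team: bool,
) -> str:
    """Map the three questions to a suggestion (simplified reading)."""
    if not needs_autoscaling:
        # Question 1: one well-sized server copes, so stay simple.
        return "docker-compose"
    if independent_scaling and has_platform_team:
        # Questions 2 and 3 both answered yes: the complexity is affordable.
        return "managed-kubernetes"
    # Scaling is needed, but the team cannot absorb a full managed cluster.
    return "k3s"


if __name__ == "__main__":
    # A small team that needs autoscaling but has no dedicated platform people.
    print(recommend_platform(True, True, False))  # k3s
```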

We ended up using k3s on two nodes at Hetzner for production and Docker Compose for all development and staging environments. The team is happy. Deployments work. And I no longer wake up thinking about etcd snapshots.

Deno 2.0 in Production in 2026: Migrating from Node.js and What Really Changed

Moon Robert — Mon, 09 Mar 2026 18:22:04 +0000

Let's start with honesty: I migrated to Deno 2.0 because I was asked to in a team retro and I, with all the confidence in the world, said "two weeks max". Spoiler: it was four weeks, a production incident at 11pm on a Wednesday, and a number of GitHub Issues tabs I would rather not remember.

I work on a six-person team building APIs for a content analytics platform. Three microservices on Node.js 22, all in TypeScript, all using ESM, with a pile of npm dependencies that had sat there for years without anyone touching them. The perfect candidate for migration, in theory.

Three Microservices, Four Weeks, and a Very Complicated Wednesday

I chose to start with the smallest service: a worker that processes incoming webhooks, transforms the payloads, and enqueues them in Redis. About 800 lines of TypeScript, four direct npm dependencies. I thought it would be the ideal case for validating the process before touching the more critical services.

What I didn't account for was that "four direct dependencies" means, with the full tree, something like 340 packages. And Deno 2.0, although it hugely improved npm compatibility over earlier versions, still has its opinions about certain things.

The initial migration was surprisingly smooth. You swap package.json for a deno.json, rewrite the npm imports with the npm: prefix, and you're done. In theory. This works fine:

// Before (Node.js)
import { Redis } from "ioredis";
import { z } from "zod";

// After (Deno 2.0)
import { Redis } from "npm:ioredis@5.3.2";
import { z } from "npm:zod@3.22.4";

This is where it got interesting. Deno 2.0 with "nodeModulesDir": "auto" resolves most packages without trouble. But ioredis internally used node:tls with options that Deno handles slightly differently, and the result was that the connection was established, you sent a command, and exactly 90 seconds later the socket closed silently. No error. No log. Nothing.

I found it because my latency metrics had spikes exactly every 90 seconds; that very regular pattern was the clue. I pulled up issue #24891 in the Deno repository, and it turned out the problem had been reported back in November 2024. The fix: pinning ioredis@5.3.3-rc.1, which patched the behavior, and explicitly setting keepAlive: true on the client (note that ioredis documents keepAlive as an initial delay in milliseconds, so a numeric value may be what actually takes effect).
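The "suspiciously regular" signal can be checked mechanically. Given the timestamps of latency spikes, evenly spaced gaps point to a timer (an idle timeout, a keep-alive interval) rather than to load. A minimal sketch follows; `spike_period` and its one-second tolerance are assumptions for illustration, and spike detection itself is left to whatever metrics system is in use.

```python
from statistics import median
from typing import List, Optional


def spike_period(spike_times: List[float], tolerance: float = 1.0) -> Optional[float]:
    """Return the typical gap between spikes if they are evenly spaced, else None.

    `spike_times` are timestamps in seconds; at least three are needed
    before calling anything a pattern.
    """
    if len(spike_times) < 3:
        return None
    ordered = sorted(spike_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if all(abs(gap - typical) <= tolerance for gap in gaps):
        return typical
    return None


if __name__ == "__main__":
    # Spikes roughly every 90 seconds, as in the incident described above.
    print(spike_period([0.0, 90.0, 180.0, 270.5]))
```

A non-None result is a cue to go looking for a configured timeout of about that length somewhere between the client and the server.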

I pushed this on a Wednesday at 7pm. The error came back at 11pm, because in staging you don't notice it: the volume is too low. I found out from a Datadog alert.

npm Compatibility in 2026: Better, but Not Perfect

I'll be direct, because I've seen many posts that paint this in rosy colors.

Deno 2.x greatly improved compatibility with the npm ecosystem. Most packages that don't depend on native Node.js APIs work unchanged: zod, date-fns, lodash, fastify (yes, fastify on Deno), all without problems. But there are categories where things get complicated.

Packages that use __dirname or __filename are the first candidates to break. Deno polyfills them, but if the package does something creative with those paths to load relative files, behavior can differ. It happened to me with an internal SDK that loaded templates from the file system: three hours debugging something that shouldn't have taken more than twenty minutes.

Then there are packages with native addons or C++ binaries. sharp for image processing works via npm:sharp, but install time in CI went from 45 seconds to 3 minutes in our case because Deno doesn't cache compiled binaries the same way npm does.

The one that surprised me most: some environment-detecting packages check typeof process !== 'undefined' to find out whether they are running in Node, and Deno 2.0 exposes a process object for compatibility, so that part is fine. The problem comes when the package then reads process.versions.node and expects a specific version. Deno returns an emulated value that doesn't always match the package's internal expectations.

// This can break packages that check the Node version
console.log(process.versions.node); // In Deno: "22.0.0" (emulated)
console.log(Deno.version.deno);     // "2.2.4"

// To detect whether you're running in Deno:
const isDenoRuntime = typeof Deno !== "undefined";
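
Because Deno exposes both a Deno global and a Node-compatible process object, the order of the checks matters. The following is a minimal sketch of a detection helper that looks for Deno first; the names RuntimeGlobals, Runtime, and detectRuntime are hypothetical and not part of any library, and the globals are passed in as a parameter only so the function can be exercised with plain objects.

```typescript
// Minimal shape of the globals we inspect. Everything is optional so the
// function can be tested with plain objects instead of a real runtime.
interface RuntimeGlobals {
  Deno?: { version?: { deno?: string } };
  process?: { versions?: { node?: string } };
}

type Runtime =
  | { name: "deno"; version: string }
  | { name: "node"; version: string }
  | { name: "unknown" };

// Hypothetical helper: check for Deno *before* looking at `process`,
// because Deno also exposes a Node-compatible `process` object.
function detectRuntime(
  g: RuntimeGlobals = globalThis as unknown as RuntimeGlobals,
): Runtime {
  const denoVersion = g.Deno?.version?.deno;
  if (typeof denoVersion === "string") {
    return { name: "deno", version: denoVersion };
  }
  const nodeVersion = g.process?.versions?.node;
  if (typeof nodeVersion === "string") {
    return { name: "node", version: nodeVersion };
  }
  return { name: "unknown" };
}

console.log(detectRuntime().name);
```

A package that only checks process would classify the first test case below as Node, which is the mismatch described above.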

JSR (Deno's package registry) grew a lot in 2025. As of early 2026, packages like @std/http, @std/async, and @std/encoding are first-class and work perfectly. If you can replace npm dependencies with JSR equivalents, the experience improves noticeably.

The Permission System: Operational Nightmare or Real Advantage

When I started migrating, I thought the permission system was going to be a headache. That I'd end up passing --allow-all in production, because who knows what permissions ioredis, or any other npm package, needs internally.

I was wrong, and in an interesting way.

Discovering the permissions your application needs is uncomfortable at first (you run without permissions and Deno tells you exactly what it needs), but that discomfort gives you something valuable: a map of what your application actually does. Our webhook worker needed exactly this:

// deno.json
{
  "tasks": {
    "start": "deno run --allow-net=redis.internal:6379,api.externa.com:443 --allow-env=REDIS_URL,WEBHOOK_SECRET,LOG_LEVEL --allow-read=/etc/ssl/certs src/main.ts"
  },
  "nodeModulesDir": "auto",
  "imports": {
    "@std/async": "jsr:@std/async@^1.0.0"
  }
}

That's all. A service that ran with full system access under Node.js runs under Deno with explicit permissions for one Redis host, one external domain, three environment variables, and the SSL certificates. If someone introduces a dependency that tries to read /etc/passwd or make a request to an unknown domain, Deno blocks it at runtime.
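
One way to keep that permission map reviewable is to declare it as data and generate the flags from it. This is a sketch under assumptions: the ServicePermissions shape and the toDenoFlags helper are hypothetical, and only the --allow-net, --allow-env, and --allow-read flags from the task above are covered.

```typescript
// Hypothetical shape for declaring what a service is allowed to touch.
interface ServicePermissions {
  net: string[];  // host:port pairs the service may connect to
  env: string[];  // environment variable names it may read
  read: string[]; // filesystem paths it may read
}

// Build the `deno run` permission flags from the declaration.
// An empty list produces no flag, so that capability stays denied.
function toDenoFlags(p: ServicePermissions): string[] {
  const flags: string[] = [];
  if (p.net.length > 0) flags.push(`--allow-net=${p.net.join(",")}`);
  if (p.env.length > 0) flags.push(`--allow-env=${p.env.join(",")}`);
  if (p.read.length > 0) flags.push(`--allow-read=${p.read.join(",")}`);
  return flags;
}

// The webhook worker's permissions, copied from the deno.json task.
const webhookWorker: ServicePermissions = {
  net: ["redis.internal:6379", "api.externa.com:443"],
  env: ["REDIS_URL", "WEBHOOK_SECRET", "LOG_LEVEL"],
  read: ["/etc/ssl/certs"],
};

const startCommand = [
  "deno",
  "run",
  ...toDenoFlags(webhookWorker),
  "src/main.ts",
].join(" ");

console.log(startCommand);
```

With the permissions in one typed object, a new network host or environment variable shows up as a one-line diff in code review instead of a change buried in a long command string.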

I don't know whether this scales to applications with 50 dependencies and complex dynamic behavior; there are probably cases where you end up at --allow-all anyway. But for small, well-defined services, it changed how I think about the security of what I put in production.

The Built-In Tooling: What I Didn't Account For

I assumed Deno's built-in formatter, linter, and test runner were a nice-to-have for small projects. After four months, it's what I value most in the ecosystem, and I didn't see it coming.

Not because deno fmt is better than Prettier (honestly, the results are pretty similar), but because it eliminates an entire category of team discussions. No .prettierrc file. No version conflicts between eslint and typescript-eslint. No "which eslint config are we using?" The standard is the runtime.

The first week, a colleague with six years of Node.js behind him asked me how to configure the linter. I told him there was no configuration. He went quiet for a moment and then said, "Ah, so it just works?" Yes, it works.

The test runner too. deno test supports native code coverage, watch mode, and, through @std/testing/bdd, the same describe/it format you're used to if you're coming from Jest or Vitest. The plain Deno.test API looks like this:

import { assertEquals, assertRejects } from "jsr:@std/assert";
import { processWebhook } from "./webhook.ts";

Deno.test("processes a valid payload correctly", async () => {
  const payload = { event: "push", repo: "mi-repo" };
  const result = await processWebhook(payload);
  assertEquals(result.status, "queued");
  assertEquals(result.jobId.startsWith("job_"), true);
});

Deno.test("rejects a payload without a signature", async () => {
  await assertRejects(
    () => processWebhook({}, { skipSignatureVerification: false }),
    Error,
    "Invalid signature"
  );
});

Nothing to install. Nothing to configure. Just deno test.

Deno Deploy deserves a separate mention: I used it for a fourth service, this one built from scratch, and the deploy workflow is genuinely fast. Push to the repo, deployed in under 30 seconds, global edge computing. But it has its own restrictions (some filesystem APIs aren't available, the permission model changes), so it isn't directly equivalent to running Deno on your own server.

Four Months Later: The Verdict

All three microservices are in production. They work. I don't miss Node.js in any of them.

But I'd be dishonest if I said the migration was purely positive. The real time spent was double the estimate. The npm incompatibilities are real and not always documented; sometimes you just have to try it and see what blows up. The JSR ecosystem, though growing, still doesn't have npm's coverage for specialized use cases.

What did change: setup time for new projects dropped considerably. Onboarding a new developer is simpler because there's less configuration to explain. The Docker containers are smaller because we don't ship node_modules. And the TypeScript code feels cleaner when Deno is your target, because you use the standard web APIs (fetch, WebSocket, crypto) directly, without polyfills.

My concrete recommendation: if you have a Node.js service with few npm dependencies, well typed in TypeScript, and you're willing to invest a week or two in the migration, do it. If you have a monolith with 200 npm dependencies, untyped legacy code, and tight deadlines, don't do it yet. Not because Deno is bad, but because the compatibility friction will eat you alive.

For new projects in 2026, I would no longer consider Node.js my default first choice. Deno has earned that spot in my stack.

LangChain vs LlamaIndex vs Haystack: What Two Weeks in Production Actually Taught Me

Moon Robert — Mon, 09 Mar 2026 18:21:34 +0000

My team got handed a RAG project earlier this year — 40,000 documents, mix of PDFs and Confluence exports, users who would notice if answers were wrong. I'd used LangChain for smaller stuff before, but this was the first time I actually ran all three major frameworks against real data, under real pressure, with a client watching the error rates.

Quick context: four-person eng team, Qdrant running on-prem, Claude as the LLM. The client's tolerance for hallucinated answers was basically zero. Not a toy project.

LangChain's Composition Model Is Great Until Something Goes Quietly Wrong

I've been using LangChain off and on since early 2023, and by now — v0.3+, LCEL as the standard — it genuinely is good at what it promises. The expression language makes wiring things together fast and readable:

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
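from langchain_qdrant import QdrantVectorStore

def format_docs(docs):
    # Join retrieved documents into one context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

# Assumes `vectorstore` is an already-initialized QdrantVectorStore
retriever = vectorstore.as_retriever(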
    search_type="mmr",
    search_kwargs={"k": 6, "fetch_k": 20}
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context below.\n\nContext:\n{context}"),
    ("human", "{question}")
])

# This part is clean. The problem shows up later.
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
    | StrOutputParser()
)

response = chain.invoke("What's the refund policy for enterprise contracts?")

That code is clean. I actually like it.

The trouble showed up on day four. A retrieval step was returning empty results for certain query types, but only intermittently — maybe 8% of requests. The chain kept running. Returned a confident, fully hallucinated answer with zero retrieved context, and nothing in the output flagged it. I spent half an afternoon chasing this before realizing LangChain was silently passing an empty string as context to the prompt template.

You can guard against this. Callbacks exist. LangSmith is genuinely useful for tracing if you're paying for it. But the default behavior when something fails upstream in a chain is to carry on — and for production RAG that's a real problem I hadn't budgeted time to solve. I ended up writing a custom runnable that validates retrieval counts before the context hits the prompt. Not hard, but it's defensive scaffolding you don't anticipate until it bites you.
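
The guard itself is framework-independent. Here is a minimal sketch of the idea in TypeScript rather than as a LangChain runnable; RetrievedDoc, EmptyRetrievalError, and requireContext are hypothetical names, and the minimum of one document is an assumed default, not the threshold used in this project.

```typescript
// Hypothetical document shape; real frameworks carry more fields.
interface RetrievedDoc {
  content: string;
  score: number;
}

class EmptyRetrievalError extends Error {
  constructor(found: number, required: number) {
    super(`retrieval returned ${found} usable document(s), need at least ${required}`);
    this.name = "EmptyRetrievalError";
  }
}

// Fail loudly instead of letting an empty context reach the prompt.
// Whitespace-only documents count as missing.
function requireContext(docs: RetrievedDoc[], minDocs = 1): string {
  const usable = docs.filter((d) => d.content.trim().length > 0);
  if (usable.length < minDocs) {
    throw new EmptyRetrievalError(usable.length, minDocs);
  }
  return usable.map((d) => d.content).join("\n\n");
}

// Usage: build the context, or surface the failure to the caller.
try {
  requireContext([]);
} catch (err) {
  console.log((err as Error).message);
}
```

The caller then decides what an empty retrieval means for the product: return "I don't know", retry with a broader query, or raise an alert. Any of those beats a confident answer built on no context.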

The ecosystem advantage is real, though. When I hit a weird edge case with metadata filtering on Qdrant, there was a GitHub issue with a working fix posted five days earlier. That's community size, not luck. If you're integrating anything unusual — a niche vector store, custom document loaders, tool use patterns — LangChain almost certainly has it already.

LangChain is fast to start with, and the integrations will save you. Just write explicit failure guards around your retrieval steps, because the framework won't.

LlamaIndex's Node Model Finally Clicked for Me in Week Two

I'll admit: I bounced off LlamaIndex about eighteen months ago. The "index everything" abstraction felt strange coming from LangChain's chain-centric thinking, and the docs had this habit of showing four different ways to accomplish something without indicating which was current or preferred.

The v0.12 line is much better. But the real shift was accepting that LlamaIndex thinks in nodes, not documents — each chunk carries metadata forward through the whole pipeline. Once I stopped fighting that model and started working with it, things that had felt awkward suddenly made sense.

What genuinely surprised me — stopped me for a moment, honestly — was the SentenceWindowNodeParser. Found it while looking for something else, almost by accident:

from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.llms.anthropic import Anthropic
from llama_index.postprocessor.cohere_rerank import CohereRerank

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

Settings.llm = Anthropic(model="claude-sonnet-4-6")
Settings.node_parser = node_parser

# Assumes `qdrant_store` is an already-configured Qdrant vector store
index = VectorStoreIndex.from_vector_store(qdrant_store)
query_engine = index.as_query_engine(
    similarity_top_k=8,
    node_postprocessors=[
        # Swap each matched sentence for its stored window before reranking
        MetadataReplacementPostProcessor(target_metadata_key="window"),
        CohereRerank(top_n=4),
    ],
)

response = query_engine.query("What changed in the Q3 enterprise pricing tier?")
# response.source_nodes — exact retrieval, no hunting around
print(response.source_nodes[0].metadata)

The SentenceWindowNodeParser stores a small chunk for embedding but retrieves a larger surrounding window at query time. You get the precision of small embeddings with the readability of larger context. I had been implementing something like this manually in LangChain. It worked fine. But this was already built in, already tuned, and it took about three minutes to add to the pipeline.
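
To make the mechanism concrete, here is a toy sketch of the sentence-window idea, written in TypeScript and independent of LlamaIndex. WindowNode, splitSentences, and buildWindowNodes are hypothetical names, and the regex splitter is deliberately naive; the real parser is more careful about sentence boundaries.

```typescript
interface WindowNode {
  text: string;        // the single sentence that would be embedded
  contextText: string; // the sentence plus its neighbours, handed to the LLM
}

// Naive splitter: a sentence is a run of characters ending in ., ! or ?
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// For each sentence, keep `windowSize` sentences on either side as context.
function buildWindowNodes(text: string, windowSize: number): WindowNode[] {
  const sentences = splitSentences(text);
  return sentences.map((sentence, i) => {
    const start = Math.max(0, i - windowSize);
    const end = Math.min(sentences.length, i + windowSize + 1);
    return {
      text: sentence,
      contextText: sentences.slice(start, end).join(" "),
    };
  });
}

const sampleNodes = buildWindowNodes(
  "Plans renew yearly. Refunds need approval. Approval takes five days. Contact billing first.",
  1,
);
console.log(sampleNodes[1]);
```

Similarity search runs against text, so a query about refunds matches the one sentence that mentions them, while the model reads contextText and also sees the renewal and approval sentences around it.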

The response.source_nodes access was something I also didn't realize I'd care about until the client asked for citations in the UI. In LangChain I was doing gymnastics with callbacks to surface source metadata. Here it's just... on the response object. Saved probably half a day of plumbing work.

Where it frustrated me: the query engine abstraction goes opaque fast when you need to customize retrieval logic significantly. I spent a day confused about why my custom retriever wasn't applying a metadata filter I'd set — turned out to be a precedence issue in how the query engine assembles its retrieval components internally. Found the answer in a GitHub issue (#14337, two months old), but that hidden behavior cost me real time. When LlamaIndex misbehaves, the error usually isn't the helpful kind.

That said: if the core of your project is document-heavy retrieval with complex chunking requirements, the built-in primitives here are ahead of the defaults in the other frameworks. You'll feel the difference.

Haystack Is Boring and I Mean That as High Praise

Before this project, I associated Haystack with enterprise teams who'd chosen it because procurement required something with a company behind it. I was wrong, and I'm correcting that publicly.

Haystack 2.x restructured around explicit, typed pipelines — every component declared, connections explicit, nothing implicit. Setting it up felt verbose. More boilerplate than either of the others. I figured I'd move through the eval phase quickly and move on.

Then something broke in all three frameworks on the same day (my fault — I'd changed the Qdrant schema without updating the retriever configs). In LangChain, I got a runtime error deep in the chain with a stack trace pointing at internal LangChain code, not mine. In LlamaIndex, it silently returned empty results — I only caught it because I was checking source_nodes counts. In Haystack: component name, expected input type, received input type, and the line in my pipeline definition where the mismatch was. Fixed in under ten minutes.

That's not an accident. The Haystack architecture is designed for exactly this — you can inspect the pipeline graph, each component logs its inputs and outputs clearly, and the type system catches mismatches before they become runtime surprises. For teams maintaining this code six months from now, that's worth a lot.

The deepset team also ships Hayhooks, which wraps your pipeline in a REST API with minimal extra work. For this specific project — where the eventual owners are not Python developers — that mattered during handoff. Showing up with a running API and readable pipeline graphs is a different conversation than handing someone a Python repo and wishing them luck.

What I didn't love: the community is smaller, and if you need an integration that LangChain has but Haystack doesn't, you're writing a custom component. I needed to pull data from an internal API with non-standard auth, and the LangChain loader already existed. In Haystack I wrote it from scratch — maybe three hours, not catastrophic, but real time.

For long-lived projects, regulated environments, or teams where the codebase needs to be maintainable by people who didn't build it — Haystack's verbosity pays dividends. For move-fast prototyping, it costs you upfront.

What the Retrieval Numbers Actually Looked Like

I ran an eval against 200 questions the client's domain expert had written — real questions about real content. Not a rigorous academic study, but real enough to be useful. All three frameworks used identical Qdrant backends.

Retrieval hit rate (did the right document appear in the top 5?):

LangChain (recursive text splitter, default settings): 71%
LlamaIndex (SentenceWindowNodeParser + Cohere rerank): 84%
Haystack (BM25 hybrid + Cohere rerank): 82%

LangChain's number was dragged down by document categories where the default splitter was cutting badly; a better-tuned splitter probably closes most of that gap. The retrieval quality difference between frameworks is mostly about defaults, not fundamental architecture. Which means: don't pick a framework because you think it retrieves better. Pick it based on your team's ability to tune the retrieval configuration you actually need.
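
For reference, the top-5 check behind those percentages fits in a few lines. This is a sketch of the metric only, not the eval harness used here; EvalCase and hitRateAtK are hypothetical names, and it assumes each question has exactly one expected document.

```typescript
interface EvalCase {
  expectedDocId: string;     // document the domain expert marked as correct
  retrievedDocIds: string[]; // ranked retrieval output, best match first
}

// Fraction of questions whose expected document appears in the top k results.
function hitRateAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter(
    (c) => c.retrievedDocIds.slice(0, k).indexOf(c.expectedDocId) !== -1,
  ).length;
  return hits / cases.length;
}

const sampleCases: EvalCase[] = [
  { expectedDocId: "a", retrievedDocIds: ["a", "b"] },
  { expectedDocId: "b", retrievedDocIds: ["x", "y", "z", "w", "v", "b"] },
  { expectedDocId: "c", retrievedDocIds: ["x", "c"] },
  { expectedDocId: "d", retrievedDocIds: [] },
];

console.log(hitRateAtK(sampleCases, 5));
```

Running the same function over each framework's ranked output keeps the comparison about retrieval configuration rather than about how each framework reports its own results.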

The more useful metric was time-to-working-pipeline:

LangChain: 3 days (fast start, debugging tax after)
Haystack: 4 days (slower setup, then very stable)
LlamaIndex: 4.5 days (steeper start, paid off during tuning)

I'm genuinely not sure these numbers scale to a larger team — the debugging tax on LangChain probably distributes across more engineers and gets less painful. Your mileage will vary.

What I'm Actually Running in Prod

LlamaIndex.

Not because it's perfect — it isn't — but because for this specific problem (document-heavy RAG, retrieval quality as the primary metric, citation UI as a hard requirement), its built-in primitives were a better fit than what I assembled elsewhere. The node model matches how I was already thinking about the chunking problem. Source attribution is clean enough to build on directly. The retrieval pipeline felt less fragile than my equivalent LangChain setup.

If this had been a general-purpose AI application — agents, tool use, lots of different LLM calls, light retrieval — I'd probably still be on LangChain. The ecosystem advantage is real for that class of problem.

And if I were handing this project to a team that didn't build it, or if we had a compliance requirement around logging every retrieval step, I'd have chosen Haystack and not second-guessed it. The verbosity is a feature in those contexts.

One thing none of these frameworks solved cleanly: eval tooling. I ended up running RAGAS externally regardless of which framework I was using. None of them have a good embedded eval story yet, and that gap keeps showing up in production. That's a separate post — but worth knowing going in.

Pick the framework that maps to how you think about your problem, get a working pipeline running in a day, and then spend your optimization budget on retrieval strategy and eval. That's where the quality actually comes from.

Docker Compose vs Kubernetes: What I Actually Learned Running Both in Production

Moon Robert — Mon, 09 Mar 2026 18:21:04 +0000

Eighteen months ago I inherited a mess. A four-person team had built a reasonably capable ML inference service — three Python microservices, a Redis queue, a Postgres instance, an Nginx reverse proxy — all wired together with a docker-compose.yml that had clearly been written in a hurry and never revisited. The team lead had left a sticky note in the README that said, verbatim: "we should probably move this to Kubernetes at some point."

That sticky note started a long argument with myself.

I ended up running both. Not as an experiment — as an actual business decision I had to defend, twice, to different stakeholders. What follows is what I learned, what I got wrong, and where I landed.

Docker Compose in 2026 Is Not What You Used Five Years Ago

The version of Compose I inherited was using some 3.x syntax with deprecated options. First thing I did was migrate to Compose v2.32 (which ships bundled with Docker Desktop and the Docker CLI now — no separate install needed). That alone fixed several subtle networking headaches.

Thing is, Compose has gotten genuinely good at what it was always meant to do. compose watch has been stable for a while now, and it changed how I think about local development:

# docker-compose.yml — inference service, 2026
services:
  api:
    build: ./api
    ports:
      - "8000:8000"
    develop:
      watch:
        - action: sync
          path: ./api/src
          target: /app/src
        # Rebuild only when dependencies change, not on every save
        - action: rebuild
          path: ./api/requirements.txt
    environment:
      - MODEL_PATH=/models/bert-base
    volumes:
      - ./models:/models:ro  # mount model weights read-only, not baked into image

  worker:
    build: ./worker
    depends_on:
      redis:
        condition: service_healthy
    develop:
      watch:
        - action: sync+restart
          path: ./worker/src
          target: /app/src

  redis:
    image: redis:7.4-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 5

That sync+restart action for the worker is something I use constantly — it syncs files then restarts the process without a full image rebuild. Saves probably 40 seconds per iteration cycle when you're deep in debugging.

For a team our size (four engineers, two of whom are ML researchers who don't want to think about infrastructure), Compose has a near-zero learning curve. I can write a docker-compose.yml, push it to the repo, and anyone can docker compose up without reading a manual. That matters more than people admit.

On a single host — even a beefy one like an EC2 m7i.4xlarge — Compose handles more than you'd think. I've run services doing 400 req/s on a single host with Compose and it was fine. The constraint is the host, not Compose.

If your service fits on one host and your team is small, defaulting to Compose isn't laziness — it's a reasonable engineering decision with real payoff in operational simplicity.

Where Kubernetes Actually Earns Back Its Complexity Tax

I did eventually move part of the system to Kubernetes. Not all of it — more on that in a moment — but the inference serving component specifically, because we started getting requests for GPU-backed endpoints and that's where Compose genuinely hits a wall.

Running GPU workloads across multiple nodes is one of those things K8s is legitimately built for. The NVIDIA GPU Operator on K8s 1.35 has become much more stable than it was back in the 1.28 era — I remember hitting a specific issue where device plugin pods would crash on node drain (somewhere around kubernetes/kubernetes#118506, I'd have to dig). By 1.33 that class of issue was mostly sorted. GPU scheduling on multi-node K8s is now a solved problem in a way it genuinely wasn't two years ago.

The second payoff: HorizontalPodAutoscaler against custom metrics. We pipe inference latency from Prometheus into KEDA, and the autoscaler responds to queue depth and p95 latency — not just CPU. That's not something you replicate with Compose without building significant custom tooling.

# hpa.yaml — scales inference pods on queue depth + latency
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    name: inference-deployment
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: inference_queue_depth
        threshold: "15"  # scale up when >15 items queued per pod
        query: sum(inference_queue_depth) / count(kube_pod_info{pod=~"inference.*"})
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: inference_p95_latency_ms
        threshold: "800"
        # Aggregate buckets across pods so the query returns a single value
        query: histogram_quantile(0.95, sum by (le) (rate(inference_duration_bucket[2m]))) * 1000
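
KEDA drives a HorizontalPodAutoscaler under the hood, and the Kubernetes documentation gives the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that formula to the queue-depth trigger and clamps the result to the replica bounds from the manifest. The function name is hypothetical, and the sketch ignores the tolerance band, stabilization windows, and the fact that with several triggers the largest result wins.

```typescript
// Core HPA formula, clamped to the ScaledObject's replica bounds.
function desiredReplicas(
  currentReplicas: number,
  currentPerPod: number, // observed metric per pod, e.g. queued items
  targetPerPod: number,  // the trigger's threshold
  minReplicas: number,
  maxReplicas: number,
): number {
  const raw = Math.ceil(currentReplicas * (currentPerPod / targetPerPod));
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// 4 pods, each with 30 queued items against a threshold of 15:
console.log(desiredReplicas(4, 30, 15, 2, 20));

// Quiet period: the formula asks for 1 pod, minReplicaCount keeps 2.
console.log(desiredReplicas(2, 5, 15, 2, 20));
```

Working through a few values like this before deploying helps confirm that a threshold of 15 produces the scale-up steps you expect at realistic queue depths.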

Rolling deployments are the other thing worth mentioning. With Compose, docker compose up --force-recreate on a single host means downtime — or you're writing your own health-check loop. K8s rolling updates with a proper readinessProbe mean zero-downtime deploys without having to think about it. I pushed a model update on a Friday afternoon once (yes, I know) and the rollout was fine because the cluster waited for new pods to be healthy before draining the old ones. I would not have taken that risk with Compose on a single host.

That said — and I want to be direct about this — the K8s cluster costs us roughly $340/month more than a comparable Compose deployment on a single large instance would. That's real money for a side project or an early-stage product. The break-even only works if you're at a scale where the autoscaling savings outweigh the base cluster cost, or if you genuinely need multi-node availability.

The ML Workload Angle I Didn't Anticipate

I thought I'd have a clear answer here. I didn't.

I assumed moving ML inference to K8s would also mean moving training jobs there. Same cluster, same GPU nodes, everything in one place — seemed logical. What I actually found was that training jobs are weird. They're batch, they're stateful in an awkward way, they need specific environment setup that changes frequently, and the feedback loop when something goes wrong is slow.

I ran training jobs as K8s Jobs with ttlSecondsAfterFinished for a few months. Fine in theory. In practice, every time an ML researcher wanted to tweak the data pipeline or swap a tokenizer, they were waiting on me to update a ConfigMap or rebuild an image. I had become a gatekeeper for changes that had nothing to do with infrastructure — which is a bad sign.

So I moved training back to Compose — on a dedicated GPU box, not the K8s cluster. Training runs as docker compose -f compose.train.yml up with the model checkpoint directory mounted as a volume. Researchers can modify it directly. Inference serving stays on K8s where the availability and scaling story matters.

I genuinely didn't see that split coming. I thought "K8s for ML" was the obvious move. The reality: K8s is great for serving (stateless, latency-sensitive, scaling matters) and overkill for training (stateful, batch, where iteration speed matters more than orchestration).

The Signals I Now Actually Use to Decide

After 18 months of this, the heuristic I've landed on is less about features and more about team and workload shape.

Compose is the right call when your service runs on one host without strain, your team has fewer than six or seven engineers touching infrastructure, and you're iterating fast enough that deployment simplicity directly affects development speed. Also — and I feel strongly about this — if the people running the service are primarily not infrastructure engineers, Compose's operational model is far more forgiving. A docker compose logs -f worker is something anyone can run. A kubectl logs -n production -l app=worker --since=1h is a command you need to look up, at least at first.

Kubernetes makes sense when you need to schedule across multiple nodes (GPUs, memory isolation, availability zones), when you have autoscaling requirements that respond to custom signals, when your team has dedicated platform or SRE capacity to own the cluster, or when your availability requirements are strict enough that single-host failure isn't acceptable.

One thing I want to push back on: the idea that Kubernetes is automatically "more production-ready." I've seen Compose deployments that were stable and well-monitored, and K8s clusters that were a disaster of misconfigured RBAC, stale CRDs, and nobody who actually understood the control plane. The tool doesn't make you production-ready. The operational discipline does.

What I'd Actually Tell You to Do

Start with Compose. Not because K8s is bad — it isn't — but because you'll hit the limits of Compose in very specific, recognizable ways. You'll know when you need multi-node scheduling because you'll be staring at a GPU allocation problem that Compose can't solve. You'll know when you need cluster-level autoscaling because you'll have just manually scaled your single host twice in a week and you're annoyed about it.

When you hit those specific walls, migrate that specific component. Not everything at once.

The worst outcome I've seen is teams migrating entirely to K8s before they have the scale to justify it, then spending their first six months of product development fighting cluster configuration instead of shipping features. Kubernetes is powerful and I use it every day, but complexity has a real cost and that cost lands on your team's velocity.

Anyway. The sticky note in the README — I never did "move everything to Kubernetes." I moved the inference serving layer and kept the rest on Compose. The system is faster, more reliable, and cheaper to operate than a full K8s migration would have been. Sometimes the boring answer is the right one.
