DEV Community

Tommaso Bertocchi


9 Tools Big Tech Uses Internally (Now Open Source)

Most "best tools" lists are just GitHub trending with extra steps.

Same 10 repos. Same README marketing. Nothing that shows you how teams shipping at scale actually build their internal systems.

The genuinely interesting tools were built by engineers who had no choice but to build them.

Spotify needed to navigate 2,000 microservices. Uber needed workflows that didn't die silently. YouTube needed MySQL to scale horizontally. None of them built these tools for GitHub stars — they built them to survive the week.

That's the list.


I picked these based on:

  • Genuine internal origin — built and used in production before being open-sourced, not a side project that got donated
  • Still actively maintained — real commits in 2025–2026, active issues, responding maintainers
  • Solves a problem you'll actually hit — not theoretical Google-scale problems
  • Not already a commodity — nothing that's been in every DevOps job listing for five years
  • High value-to-complexity ratio — tools that take a day to set up but save months

TL;DR: The best infrastructure tools in 2026 aren't built by startups chasing a community — they're built by engineers who got tired of waiting for someone else to solve the problem.


Table of Contents

  1. Backstage — Spotify's developer portal, now the IDP standard
  2. Temporal — Uber's durable workflow engine for code that can't fail mid-run
  3. Vitess — YouTube's MySQL sharding layer, now powering PlanetScale
  4. Envoy — Lyft's proxy that became the foundation of the service mesh market
  5. OpenFGA — Auth0's Zanzibar-style fine-grained authorization
  6. pompelmi — The zero-dep file scanner every serious prod team builds internally and never ships
  7. Turborepo — Vercel's monorepo build system with remote caching
  8. OpenTelemetry Collector — The observability pipeline every cloud provider adopted
  9. Buf — Protobuf tooling that makes gRPC schema management survivable

1) Backstage — Spotify's developer portal, now the IDP standard

What it is: A framework for building internal developer portals — software catalogs, scaffolding, docs, and plugin-based integrations unified in one UI.

Why it matters in 2026: Spotify open-sourced Backstage because managing 2,000+ microservices without a catalog is organized chaos. The internal developer platform (IDP) space was previously only accessible to companies with a dedicated platform engineering team — Backstage changed that. If your engineers spend 20 minutes finding the right service or figuring out who owns a repo, that's a product problem dressed as a process problem. In 2026, the question isn't whether you need an IDP. It's why you haven't set one up yet.

Best for: platform engineering teams, orgs with 10+ services, DevOps leads trying to cut onboarding time, teams drowning in scattered Confluence docs.
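A software catalog is, at its core, structured metadata about every service plus queries over it. Here is a minimal sketch in Python, assuming entities shaped like Backstage's `Component` descriptors (`apiVersion: backstage.io/v1alpha1`, `metadata.name`, `spec.owner`). The service names, teams, and the `build_owner_index` and `owner_of` helpers are hypothetical and not part of Backstage, which reads these descriptors from `catalog-info.yaml` files and serves them through its own catalog API.

```python
from typing import Any, Dict, List

# Entities shaped like Backstage catalog descriptors (normally loaded from
# catalog-info.yaml files). Names and teams here are made up.
ENTITIES: List[Dict[str, Any]] = [
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "payments-api"},
        "spec": {"type": "service", "lifecycle": "production", "owner": "team-billing"},
    },
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "search-indexer"},
        "spec": {"type": "service", "lifecycle": "experimental", "owner": "team-discovery"},
    },
]


def build_owner_index(entities: List[Dict[str, Any]]) -> Dict[str, str]:
    """Hypothetical helper: map each Component name to its owning team."""
    return {
        e["metadata"]["name"]: e["spec"]["owner"]
        for e in entities
        if e.get("kind") == "Component"
    }


def owner_of(index: Dict[str, str], service: str) -> str:
    """Answer 'who owns this service?', flagging services the catalog lacks."""
    return index.get(service, "unowned (missing from catalog)")


index = build_owner_index(ENTITIES)
print(owner_of(index, "payments-api"))  # team-billing
print(owner_of(index, "legacy-cron"))   # unowned (missing from catalog)
```

Answering "who owns this?" becomes a lookup instead of a Slack thread, and a service missing from the catalog shows up explicitly instead of silently.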

Links: GitHub | Website



2) Temporal — Uber's durable workflow engine for code that can't fail mid-run

What it is: A workflow orchestration engine where application state is durable by default — your code resumes exactly where it left off after crashes, restarts, or deploys.

Why it matters in 2026: Cron jobs fail silently. Queues lose messages. Sagas get complicated faster than anyone wants to admit. Temporal's creators first built its predecessor, Cadence, at Uber because every existing alternative broke under real load, then forked it as Temporal; the same breaking points hit every team that tries to orchestrate multi-step async work. The explosion of AI agents and multi-step pipelines in 2026 has made durable execution a baseline requirement. If your workflow can fail in the middle and leave a user in an unknown state, that's a bug.

Best for: long-running business processes, AI agent orchestration, payment and fulfillment flows, async pipelines where partial failure is unacceptable.
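The core idea of durable execution fits in a few lines of plain Python. This is a toy model, not the Temporal SDK: `DurableRun`, `activity`, and the order workflow are hypothetical. It keeps Temporal's central assumption that workflow code is deterministic, so re-running it against a recorded history of completed steps leads back to exactly where it stopped.

```python
from typing import Callable, List, Optional


class DurableRun:
    """Toy durable execution. Each activity's result is appended to an event
    history (Temporal persists this server-side). When a workflow is re-run
    against an existing history, completed activities return their recorded
    result instead of executing their side effects again."""

    def __init__(self, history: Optional[List[object]] = None) -> None:
        self.history: List[object] = history if history is not None else []
        self._cursor = 0  # position of the next activity in the history

    def activity(self, fn: Callable[[], object]) -> object:
        if self._cursor < len(self.history):
            result = self.history[self._cursor]  # replay: reuse recorded result
        else:
            result = fn()                        # first execution: run and record
            self.history.append(result)
        self._cursor += 1
        return result


calls: List[str] = []  # records which side effects really happened


def charge_card() -> object:
    calls.append("charge")
    return "charge-123"


def ship_order() -> object:
    calls.append("ship")
    return "shipment-456"


def order_workflow(run: DurableRun, crash_before_ship: bool = False) -> str:
    charge_id = run.activity(charge_card)
    if crash_before_ship:
        raise RuntimeError("worker crashed mid-workflow")
    shipment_id = run.activity(ship_order)
    return f"{charge_id}/{shipment_id}"


history: List[object] = []  # stand-in for the durable event history
try:
    order_workflow(DurableRun(history), crash_before_ship=True)
except RuntimeError:
    pass  # the charge was recorded before the crash

result = order_workflow(DurableRun(history))  # a fresh worker resumes from history
print(result)  # charge-123/shipment-456
print(calls)   # ['charge', 'ship']
```

In real Temporal, the history lives in the Temporal service and activities also get timeouts and retry policies; the sketch only shows why a crash between two steps does not repeat the first one.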

Links: GitHub | Website




3) Vitess — YouTube's MySQL sharding layer, now powering PlanetScale

What it is: A database clustering system for horizontal scaling of MySQL — the same system handling YouTube's query volume since 2010.

Why it matters in 2026: Most teams hit MySQL limits and immediately start planning a full migration to Postgres or a managed cloud DB. Vitess shows that migration is often the wrong answer. PlanetScale built its MySQL platform on top of it, which means the operational knowledge and tooling are now mature enough for teams well outside Google's infrastructure. Compute is cheap. Full DB migrations are expensive, slow, and high-risk. Vitess gives you a third option.

Best for: teams already on MySQL hitting read/write bottlenecks, orgs that can't afford a full DB migration, high-throughput SaaS apps with uneven load patterns.
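The routing model underneath Vitess can be sketched in a few lines: every row's sharding key maps to a keyspace ID, and each shard owns a contiguous range of keyspace IDs, named by hex bounds like `-80` and `80-`. The helpers `keyspace_id` and `shard_for` are hypothetical, and SHA-256 stands in for a real vindex; in Vitess, the VTGate proxy performs this routing so your application keeps talking to what looks like one MySQL database.

```python
import hashlib
from typing import List

# Shards named the way Vitess names key ranges: hex bounds on the keyspace ID,
# where an empty bound means "open-ended". These four cover the whole space.
SHARDS: List[str] = ["-40", "40-80", "80-c0", "c0-"]


def keyspace_id(user_id: int) -> bytes:
    """Hypothetical stand-in for a vindex: map a sharding key to an 8-byte
    keyspace ID. Vitess's built-in `hash` vindex uses a different function;
    SHA-256 is used here only for illustration."""
    return hashlib.sha256(str(user_id).encode()).digest()[:8]


def shard_for(kid: bytes, shards: List[str] = SHARDS) -> str:
    """Return the shard whose [start, end) key range contains `kid`.
    Keyspace IDs compare as raw bytes, lexicographically."""
    for name in shards:
        start_hex, end_hex = name.split("-")
        start = bytes.fromhex(start_hex)  # b"" sorts before every key
        end = bytes.fromhex(end_hex)      # b"" here means "no upper bound"
        if kid >= start and (not end or kid < end):
            return name
    raise ValueError("shard key ranges do not cover this keyspace ID")


# Route a few users; the same user always lands on the same shard.
for uid in (1, 42, 1001):
    print(uid, shard_for(keyspace_id(uid)))
```

In this model, resharding means splitting a range such as `40-80` into `40-60` and `60-80` and moving the affected rows; Vitess automates that kind of split while VTGate keeps routing queries to the right shard.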

Links: GitHub | Website



4) Envoy — Lyft's proxy that became the foundation of the service mesh market

What it is: A high-performance L7 proxy and communication bus built at Lyft, now the underlying layer of Istio, AWS App Mesh, and most major service mesh products.

Why it matters in 2026: Nginx handles traffic. Envoy understands services. The moment you need retries, circuit breaking, distributed tracing, and gRPC support in the same proxy — nothing else comes close. Lyft built it because no existing proxy could handle their microservice topology. It's now the de facto standard for any team running services at scale. If you're using a service mesh, you're almost certainly using Envoy without knowing it.

Best for: microservice architectures, teams running on Kubernetes, engineers needing deep per-request observability at the network layer.
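Envoy applies retries and circuit breaking as proxy configuration, so services don't have to implement them. For contrast, here is roughly what that logic looks like when every service hand-rolls it in application code. This is a generic sketch, not Envoy's implementation: `CircuitBreaker` and `with_retries` are hypothetical, and this breaker trips on consecutive failures, which is closer to Envoy's outlier detection; Envoy's own circuit breakers instead cap concurrent connections, pending requests, and retries per upstream cluster.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised while the breaker is open: the call fails fast instead of waiting."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, rejects calls for
    `cooldown` seconds, then lets a single trial request through."""

    def __init__(self, threshold: int = 3, cooldown: float = 5.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("upstream marked unhealthy; failing fast")
            self.opened_at = None  # cooldown over: allow one trial request
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # (re)open the breaker
            raise
        self.failures = 0  # any success closes the breaker fully
        return result


def with_retries(fn: Callable[[], T], breaker: CircuitBreaker,
                 max_retries: int = 2, base_delay: float = 0.05) -> T:
    """Retry failed calls with exponential backoff, but never retry into an
    open breaker (that would only add load to a struggling upstream)."""
    attempt = 0
    while True:
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception:
            if attempt >= max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)
            attempt += 1


# Demo: an upstream that fails twice, then recovers.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("503 from upstream")
    return "200 OK"


breaker = CircuitBreaker(threshold=3, cooldown=30.0)
print(with_retries(flaky, breaker, base_delay=0.0))  # prints "200 OK" (after two retries)


# Demo: an upstream that is down. Three consecutive failures open the breaker,
# so the next request fails fast without touching the upstream at all.
def down() -> str:
    raise ConnectionError("connection refused")


try:
    with_retries(down, breaker, base_delay=0.0)
except ConnectionError:
    pass

fast_failed = False
try:
    with_retries(down, breaker, base_delay=0.0)
except CircuitOpenError:
    fast_failed = True
print(fast_failed)  # True
```

Multiply this by every service and every language in your stack, and the case for moving it into a shared proxy layer becomes clear.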

Links: GitHub | Website



5) OpenFGA — Auth0's Zanzibar-style fine-grained authorization

What it is: An open-source authorization system based on Google's Zanzibar paper — the same model behind Google Drive and Docs permissions — built and production-tested by Auth0.

Why it matters in 2026: Role-based access control breaks down the moment you need "user X can edit document Y only if they're in project Z and the document isn't locked." Auth0 built OpenFGA because RBAC doesn't model real-world permission graphs — it approximates them, badly. With AI agents now needing scoped, auditable access to specific resources across multiple systems, authorization models that seemed over-engineered in 2022 are now the minimum viable approach.

Best for: multi-tenant SaaS products, platforms with document or resource-level permissions, teams building AI agents that need bounded, auditable access.
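The Zanzibar model reduces to relationship tuples plus a recursive check. Here is a toy version in Python, assuming made-up objects (`document:roadmap`, `project:apollo`) and a hypothetical `check` function; the real OpenFGA server evaluates an authorization model written in its own DSL and exposes Check through its API and SDKs.

```python
from typing import Dict, List, NamedTuple, Set


class RelationTuple(NamedTuple):
    object: str    # e.g. "document:roadmap"
    relation: str  # e.g. "editor"
    user: str      # "user:anne", or a userset like "project:apollo#member"


TUPLES: Set[RelationTuple] = {
    RelationTuple("project:apollo", "member", "user:anne"),
    # Every member of project apollo is an editor of the roadmap document.
    RelationTuple("document:roadmap", "editor", "project:apollo#member"),
    RelationTuple("document:roadmap", "viewer", "user:bob"),
}

# Computed relations: anyone who is an editor is also a viewer
# (OpenFGA's DSL expresses this as `define viewer: [user] or editor`).
IMPLIED_BY: Dict[str, List[str]] = {"viewer": ["editor"]}


def check(user: str, relation: str, obj: str, depth: int = 0) -> bool:
    """Does `user` have `relation` on `obj`? Follows direct tuples,
    usersets, and computed relations recursively."""
    if depth > 10:  # guard against cycles in the relationship graph
        return False
    for t in TUPLES:
        if t.object != obj or t.relation != relation:
            continue
        if t.user == user:
            return True  # direct relationship
        if "#" in t.user:  # userset: everyone with ref_rel on ref_obj
            ref_obj, ref_rel = t.user.split("#", 1)
            if check(user, ref_rel, ref_obj, depth + 1):
                return True
    return any(check(user, r, obj, depth + 1) for r in IMPLIED_BY.get(relation, []))


print(check("user:anne", "editor", "document:roadmap"))  # True, via project membership
print(check("user:anne", "viewer", "document:roadmap"))  # True, editors are viewers
print(check("user:bob", "editor", "document:roadmap"))   # False
```

The "only if the document isn't locked" rule from the text would need an exclusion (`but not`) or a condition, both of which OpenFGA's modeling language supports and this sketch leaves out.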

Links: GitHub | Website




6) pompelmi — The zero-dep file scanner every serious prod team builds internally and never ships

What it is: A minimal Node.js wrapper around ClamAV that scans any file and returns a typed Verdict (Clean, Malicious, ScanError). No daemons, no cloud, no native bindings, zero runtime dependencies.

Why it matters in 2026: Every team that accepts file uploads eventually writes something like this internally — a ClamAV wrapper buried in a utils folder that never gets cleaned up, documented, or tested properly. pompelmi is what that internal util should have been from the start: typed, tested, and actually installable in one line. With LLM-powered tools now generating and accepting files at scale, scanning uploads before they reach your storage layer isn't paranoid — it's baseline. You don't build a ClamAV wrapper because you want to. You build it because you got burned.

Best for: Node.js apps handling file uploads, SaaS platforms processing user-generated content, teams adding a security layer without adding new infrastructure.
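The pattern pompelmi packages is small enough to sketch. This is a hypothetical Python analog, not pompelmi's API (pompelmi itself is a Node.js library): it shells out to ClamAV's `clamscan` CLI, assumed to be installed with current signatures, and maps clamscan's exit codes onto the same three verdicts. `scan_file`, `verdict_from_exit_code`, and `accept_upload` are illustrative names.

```python
import os
import subprocess
from enum import Enum


class Verdict(Enum):
    CLEAN = "Clean"
    MALICIOUS = "Malicious"
    SCAN_ERROR = "ScanError"


def verdict_from_exit_code(code: int) -> Verdict:
    """clamscan exits with 0 when no virus is found, 1 when a virus is found,
    and 2 on error. Anything unexpected is treated as an error too."""
    if code == 0:
        return Verdict.CLEAN
    if code == 1:
        return Verdict.MALICIOUS
    return Verdict.SCAN_ERROR


def scan_file(path: str, timeout: float = 120.0) -> Verdict:
    """Scan one file with the clamscan CLI (assumed installed and on PATH).
    The path is made absolute so a filename starting with "-" can never be
    parsed as a command-line option."""
    try:
        proc = subprocess.run(
            ["clamscan", "--no-summary", os.path.abspath(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return Verdict.SCAN_ERROR  # clamscan missing or hung
    return verdict_from_exit_code(proc.returncode)


def accept_upload(path: str) -> bool:
    """Fail closed: only an explicit CLEAN verdict lets a file through."""
    return scan_file(path) is Verdict.CLEAN
```

Whatever wrapper you use, accepting a file only on an explicit clean verdict means a missing binary, a timeout, or a broken signature database blocks the upload instead of waving it through.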

Links: GitHub



7) Turborepo — Vercel's monorepo build system with remote caching

What it is: A high-performance build system for JavaScript/TypeScript monorepos with task pipelines, incremental computation, and shared remote cache.

Why it matters in 2026: Jared Palmer built Turborepo because managing 15+ packages in a single repo with a chain of npm run build calls is a slow way to hate your CI, and Vercel acquired it in 2021 to keep developing it. The caching alone, skipping work that hasn't changed, can cut CI time by 40–80% on many real codebases. Remote caching means your teammates benefit from builds you already ran. In a world where AI-assisted development moves at a different pace than legacy CI pipelines, waiting 12 minutes for a green check is a product bottleneck.

Best for: teams with shared component libraries, full-stack TypeScript monorepos, frontend platform teams with multiple apps deploying from one repo.
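The skipped work comes from content-hash caching. Here is a conceptual sketch, assuming hypothetical helpers `task_hash` and `BuildCache` rather than Turborepo's actual hashing, which also accounts for inputs such as environment variables: each task's cache key includes its dependencies' keys, so editing a shared package invalidates everything downstream of it and nothing else.

```python
import hashlib
import json
from typing import Dict, List


def task_hash(command: str, inputs: Dict[str, str], dep_hashes: List[str]) -> str:
    """Hash everything that can change a task's output: the command, the
    contents of its input files, and the hashes of the tasks it depends on."""
    payload = json.dumps(
        {"cmd": command, "inputs": inputs, "deps": sorted(dep_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class BuildCache:
    """Stand-in for a local or remote cache mapping task hash -> output."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.executed: List[str] = []  # tasks that actually ran (cache misses)

    def run(self, name: str, command: str, inputs: Dict[str, str],
            dep_hashes: List[str]) -> str:
        key = task_hash(command, inputs, dep_hashes)
        if key not in self.store:  # miss: do the work once, then remember it
            self.executed.append(name)
            self.store[key] = f"output of {command}"
        return key


cache = BuildCache()
web_src = {"apps/web/page.tsx": "import { Button } from 'ui'"}


def build_all(ui_src: Dict[str, str]) -> None:
    # web depends on ui, so ui's hash is part of web's cache key
    ui_key = cache.run("ui#build", "tsc -p packages/ui", ui_src, [])
    cache.run("web#build", "next build", web_src, [ui_key])


build_all({"packages/ui/button.tsx": "export const Button = 1"})  # cold: both run
build_all({"packages/ui/button.tsx": "export const Button = 1"})  # unchanged: both hit
build_all({"packages/ui/button.tsx": "export const Button = 2"})  # ui edited: both rerun
print(cache.executed)  # ['ui#build', 'web#build', 'ui#build', 'web#build']
```

Replace the in-memory `store` with storage the whole team and CI can read, keyed by the same hashes, and you have the idea behind remote caching.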

Links: GitHub | Website



8) OpenTelemetry Collector — The observability pipeline every cloud provider adopted

What it is: A vendor-agnostic agent for collecting, processing, and exporting telemetry (traces, metrics, logs) — the common layer between your app and any observability backend.

Why it matters in 2026: Datadog and New Relic are great until you see the bill at 10M spans per day. OpenTelemetry lets you instrument once and route anywhere: swap backends without rewriting a single line of instrumentation. Every major cloud provider now supports it natively. If you're still vendor-locked on your observability pipeline, you're one contract renewal from a painful, expensive migration. Its position as one of the CNCF's most active projects reflects the industry agreeing this is the standard.

Best for: platform engineers building internal observability stacks, teams tired of vendor lock-in, anyone running services across multiple cloud providers.
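The Collector's core abstraction is the pipeline: receivers ingest telemetry, processors batch, filter, or redact it, and exporters send it to one or more backends, all declared in the Collector's YAML config under `receivers`, `processors`, `exporters`, and `service.pipelines`. Here is that flow modeled in plain Python to show why a backend swap doesn't touch instrumentation; the `Pipeline` class and both processors are hypothetical, not Collector components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Span = Dict[str, object]
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


@dataclass
class Pipeline:
    """Spans come in through `receive`, are batched, run through each
    processor in order, and the surviving batch fans out to every exporter."""
    processors: List[Processor]
    exporters: Dict[str, Exporter]
    batch_size: int = 2
    buffer: List[Span] = field(default_factory=list, init=False)

    def receive(self, span: Span) -> None:
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:  # similar role to a batch processor
            self.flush()

    def flush(self) -> None:
        batch, self.buffer = self.buffer, []
        for process in self.processors:
            batch = process(batch)
        if batch:
            for export in self.exporters.values():
                export(batch)


def drop_health_checks(batch: List[Span]) -> List[Span]:
    return [s for s in batch if s.get("name") != "GET /healthz"]


def redact_emails(batch: List[Span]) -> List[Span]:
    return [{**s, "user.email": "REDACTED"} if "user.email" in s else s for s in batch]


vendor_backend: List[Span] = []
self_hosted_backend: List[Span] = []

pipeline = Pipeline(
    processors=[drop_health_checks, redact_emails],
    # Switching or adding a backend only touches this mapping.
    exporters={"vendor": vendor_backend.extend, "self_hosted": self_hosted_backend.extend},
)
pipeline.receive({"name": "GET /healthz"})
pipeline.receive({"name": "POST /checkout", "user.email": "a@example.com"})
print(vendor_backend)  # [{'name': 'POST /checkout', 'user.email': 'REDACTED'}]
```

Changing vendors in this model means editing the `exporters` mapping; the spans your services emit stay exactly the same.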

Links: GitHub | Website




9) Buf — Protobuf tooling that makes gRPC schema management survivable

What it is: A build system, linter, breaking change detector, and schema registry for Protocol Buffers — with remote plugin execution and a full BSR (Buf Schema Registry) for sharing schemas across teams.

Why it matters in 2026: gRPC is excellent until you try to manage .proto files across 8 teams without accidentally breaking a consumer. Protobuf ships a compiler but no standard linter, breaking-change checker, or schema registry, and it shows: protoc is a command-line puzzle from 2008. Buf gives you the kind of tooling large Protobuf users like Google build for themselves: enforced compatibility rules, centralized schema distribution, and CI that fails before you ship a breaking change. With more internal services and AI APIs moving to gRPC for performance in 2026, the schema management problem goes from annoying to blocking.

Best for: teams using gRPC or Protobuf internally, platform engineers managing API schemas across multiple services, anyone doing API versioning where backward compatibility matters.
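Breaking-change detection is the feature that earns Buf its place in CI. A toy version compares two versions of one message, keyed by field number because that is what goes on the wire. `Field` and `breaking_changes` are hypothetical; Buf's real checker works on compiled schemas, groups its rules into categories (`FILE`, `PACKAGE`, `WIRE_JSON`, `WIRE`), and understands `reserved` field numbers, none of which this sketch models.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str
    number: int     # the field number is what actually goes on the wire
    type_name: str


def breaking_changes(old: List[Field], new: List[Field]) -> List[str]:
    """Compare two versions of one message, keyed by field number, and
    report edits that can break existing consumers."""
    new_by_number = {f.number: f for f in new}
    problems: List[str] = []
    for f in old:
        current = new_by_number.get(f.number)
        if current is None:
            problems.append(f"field {f.number} ({f.name}) was deleted")
        elif current.type_name != f.type_name:
            problems.append(
                f"field {f.number} ({f.name}) changed type "
                f"{f.type_name} -> {current.type_name}"
            )
    return problems


v1 = [Field("id", 1, "string"), Field("amount", 2, "int64"), Field("note", 3, "string")]
v2 = [Field("id", 1, "string"), Field("amount", 2, "string")]  # retyped one, dropped one

problems = breaking_changes(v1, v2)
for p in problems:
    print(p)
# field 2 (amount) changed type int64 -> string
# field 3 (note) was deleted
```

In a real repo, the equivalent step is running `buf breaking` against a previous version of the schema, such as the main branch, and failing the build on any reported violation.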

Links: GitHub | Website




Final thoughts

Every tool on this list started as a private repo someone had to fight to get open-sourced.

That's why the most interesting open-source releases right now aren't from startups optimizing for community growth. They're from engineering teams that:

  • Hit a wall that no existing tool could solve
  • Built something internal that actually worked under real load
  • Eventually decided the maintenance cost of keeping it private was higher than publishing it
  • Didn't design for adoption — and ended up getting adopted anyway

Backstage, Temporal, Vitess — all went through internal reviews, legal clearance, and months of cleanup before anyone outside the company could use them. That friction is actually a signal. If a team put in that work to open-source something they didn't have to share, it's usually because the tool genuinely solved something hard.

The irony is that the tools most worth your time have the least marketing behind them.

If I missed something obvious, drop it in the comments.

Which internal tool are you surprised wasn't open-sourced sooner?
