<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonas Scholz</title>
    <description>The latest articles on DEV Community by Jonas Scholz (@code42cate).</description>
    <link>https://dev.to/code42cate</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F461127%2F034233c4-ba6e-473c-8a8d-783831764a10.jpeg</url>
      <title>DEV Community: Jonas Scholz</title>
      <link>https://dev.to/code42cate</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/code42cate"/>
    <language>en</language>
    <item>
      <title>How We Built Our Own DNS Server</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Fri, 17 Apr 2026 20:30:49 +0000</pubDate>
      <link>https://dev.to/code42cate/how-we-built-our-own-dns-server-4d3k</link>
      <guid>https://dev.to/code42cate/how-we-built-our-own-dns-server-4d3k</guid>
      <description>&lt;p&gt;We wrote a production DNS server in ~1000 lines of Go, migrated thousands of records off Hetzner DNS, and dropped propagation time from "up to 90 minutes" to a few seconds. It uses the hidden primary pattern, Postgres as the event bus, and AXFR + IXFR to push zones to public secondaries. Here's how and why we did it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcwt2u128tafbfyy61pw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcwt2u128tafbfyy61pw.gif" alt="gif" width="504" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hetzner DNS Stopped Working for Us
&lt;/h2&gt;

&lt;p&gt;Every service on Sliplane gets a managed subdomain like my-app-abc123.sliplane.app. That means an A and AAAA record for every running service, pointing at the server IP where the container lives. Records scale linearly with the platform.&lt;/p&gt;

&lt;p&gt;We started with Hetzner DNS because it was free and we already ran most of our infra there. That worked fine for a while, but after 2 years we hit two walls:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Record limits&lt;/strong&gt;: Hetzner DNS has a hard cap on records per zone. It was originally 500; they bumped it to 10k for us (genuinely appreciated), but at our growth rate we'd have blown through that within weeks. Apparently we're one of their biggest DNS users by record count :D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed&lt;/strong&gt;: After creating a record via the API, it could take up to 90 minutes before Hetzner's own nameservers actually returned it. For a PaaS where someone just deployed a service and wants to visit the URL, that's a rough experience. It wasn't consistently that bad, but every time it happened it directly hurt the user experience: to the user, our platform simply looked broken (which, in that case, it was!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use Another Managed Provider?
&lt;/h2&gt;

&lt;p&gt;Fair question. For most people, a managed DNS provider is the right answer. But once you start shopping around at our "scale" and constraints, things get annoying fast:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Contact sales" pricing.&lt;/strong&gt; A lot of the providers that could comfortably handle our record count sit behind "talk to sales" forms. I hate that. Just tell me what it costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-record or per-query billing.&lt;/strong&gt; The ones that do publish pricing often charge per record or per query. We have no idea how many DNS queries we actually serve, so migrating to an unknown pricing model felt like signing a blank check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EU-only.&lt;/strong&gt; We're based in the EU and wanted to keep DNS there too. That narrows the field a lot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And honestly, it sounded fun.&lt;/strong&gt; I'm a bit of a control freak, and writing a DNS server is the kind of thing you daydream about. A thousand lines of Go felt worth the freedom. In the end, building the thing took less time than getting a meeting with a managed provider would have 😵‍💫&lt;/p&gt;

&lt;p&gt;So we built it ourselves, which brings us to the pattern that made it surprisingly simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Primary Pattern
&lt;/h2&gt;

&lt;p&gt;The reason this is all far simpler than I initially expected: our DNS server never answers a single public query.&lt;/p&gt;

&lt;p&gt;In DNS, a zone's primary nameserver holds the authoritative records. Secondaries pull copies using &lt;a href="https://en.wikipedia.org/wiki/DNS_zone_transfer" rel="noopener noreferrer"&gt;AXFR&lt;/a&gt; (basically a full zone dump over TCP) and answer public queries just like the primary would. When the primary changes, it sends a NOTIFY to the secondaries, and they pull a fresh copy.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;hidden primary&lt;/strong&gt; takes this one step further: the primary isn't public at all. It only exists to push zone data to secondaries. The public nameservers, the ones listed at your registrar, are all secondaries.&lt;/p&gt;

&lt;p&gt;This means we can run our DNS server wherever we want, use any secondary provider that supports &lt;a href="https://en.wikipedia.org/wiki/DNS_zone_transfer" rel="noopener noreferrer"&gt;AXFR&lt;/a&gt;, and swap providers without changing our server. There's no lock-in: AXFR and NOTIFY are standard protocols, so any compliant secondary will work.&lt;/p&gt;

&lt;p&gt;No anycast, no super-redundant, DDoS-protected DNS servers deployed across the globe. Just a few instances of our hidden primary server.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xctjqp3au78ral1xr2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xctjqp3au78ral1xr2r.png" alt="Architecture diagram" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup is pretty minimal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Postgres is the source of truth.&lt;/strong&gt; We install triggers that call &lt;code&gt;pg_notify('dns_zone_changed', '')&lt;/code&gt; whenever a service is created, updated, or deleted. No message queue, no webhooks. Postgres &lt;em&gt;is&lt;/em&gt; the event bus.&lt;/p&gt;

&lt;p&gt;Why not Redis, NATS, or a proper queue? Two reasons. We already run Postgres as our primary database, so &lt;code&gt;LISTEN&lt;/code&gt;/&lt;code&gt;NOTIFY&lt;/code&gt; is "free" (no free lunch, but as free as it gets) infrastructure, nothing new to operate, monitor, or pay for. And the volume is tiny. Zone changes happen a few times per minute at peak, which is laughably low for anything queue-shaped. Reaching for Kafka here would be like renting a shipping container to mail a postcard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sliplane-dns&lt;/strong&gt; is a small Go server (~1000 lines, built on &lt;a href="https://codeberg.org/miekg/dns" rel="noopener noreferrer"&gt;miekg/dns&lt;/a&gt;) that subscribes via &lt;code&gt;LISTEN&lt;/code&gt;, queries Postgres for all managed domains and their IPs, builds the DNS zone, and serves it via AXFR.&lt;/p&gt;

&lt;p&gt;To avoid unnecessary work, we hash all records. If the hash matches the previous zone's, nothing happens: no serial bump, no NOTIFY. When the zone actually changes, we bump the SOA serial and send a DNS NOTIFY to Hetzner's three secondary IPs. They pull the new zone, and the records are live.&lt;/p&gt;
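&lt;p&gt;As a sketch of that skip-if-unchanged check (the &lt;code&gt;hashZone&lt;/code&gt; helper and the record strings here are illustrative, not our production code):&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashZone returns a stable fingerprint of a record set. Sorting a copy
// first makes the hash independent of database row order.
func hashZone(records []string) string {
	sorted := append([]string(nil), records...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	prev := hashZone([]string{"app.example.com. 300 IN A 1.2.3.4"})
	next := hashZone([]string{"app.example.com. 300 IN A 1.2.3.4"})
	if next == prev {
		fmt.Println("zone unchanged: skip serial bump and NOTIFY")
	} else {
		fmt.Println("zone changed: bump SOA serial and send NOTIFY")
	}
}
```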

&lt;p&gt;To see what a zone transfer actually looks like, here's a minimal DNS server that only speaks AXFR. It serves a hardcoded zone for &lt;code&gt;example.com&lt;/code&gt; with a single A record (&lt;a href="https://github.com/code42cate/dns-axfr-sample" rel="noopener noreferrer"&gt;full code on GitHub&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net/netip"&lt;/span&gt;

    &lt;span class="s"&gt;"codeberg.org/miekg/dns"&lt;/span&gt;
    &lt;span class="s"&gt;"codeberg.org/miekg/dns/rdata"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;soa&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOA&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Hdr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"example.com."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClassINET&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;SOA&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOA&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Ns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ns1.example.com."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Mbox&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"admin.example.com."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RR&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;soa&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Hdr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"app.example.com."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Class&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClassINET&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;rdata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Addr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;netip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MustParseAddr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"1.2.3.4"&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;soa&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServeMux&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"example.com."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unpack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hijack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Envelope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Envelope&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Answer&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RR&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TransferOut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;srv&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;dns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;":5553"&lt;/span&gt;
    &lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tcp"&lt;/span&gt;
    &lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;srv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it and pull the zone with dig:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig @localhost &lt;span class="nt"&gt;-p&lt;/span&gt; 5553 example.com AXFR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.        3600    IN    SOA    ns1.example.com. admin.example.com. 1 0 0 0 0
app.example.com.    300     IN    A      1.2.3.4
example.com.        3600    IN    SOA    ns1.example.com. admin.example.com. 1 0 0 0 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full zone transfer is just SOA, all records, SOA again. This is roughly what Hetzner's secondaries pull from our production server, just with a few thousand more records in between the two SOAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saturday Night DNS Surgery
&lt;/h2&gt;

&lt;p&gt;You can't gradually migrate DNS nameservers. The NS records at the registrar point to either the old set or the new set. There's a cutover window, no way around it.&lt;/p&gt;

&lt;p&gt;We had to switch from Hetzner's nameservers (&lt;code&gt;hydrogen.ns.hetzner.com&lt;/code&gt;, &lt;code&gt;oxygen.ns.hetzner.com&lt;/code&gt;, &lt;code&gt;helium.ns.hetzner.de&lt;/code&gt;) to Hetzner Robot's secondary nameservers (&lt;code&gt;ns1.first-ns.de&lt;/code&gt;, &lt;code&gt;robotns2.second-ns.de&lt;/code&gt;, &lt;code&gt;robotns3.second-ns.com&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;During the transition, resolvers with cached old NS records would still ask the old servers and get stale data until TTL expired. Two things made this manageable: the NS delegation TTL was 5 minutes, and only &lt;em&gt;new&lt;/em&gt; services deployed during that window were affected. Existing A/AAAA records were identical on both sets of nameservers.&lt;/p&gt;

&lt;p&gt;We did it on a Saturday night when platform activity was lowest. It went smoothly, and no users complained!&lt;/p&gt;

&lt;h2&gt;
  
  
  The One Thing That Bit Us: IXFR
&lt;/h2&gt;

&lt;p&gt;I went into this thinking AXFR was enough. It's the protocol every tutorial shows, every example uses, and it's what I built first. Full zone dump, SOA at the start, SOA at the end, done.&lt;/p&gt;

&lt;p&gt;Turns out Hetzner Robot's secondaries don't just do AXFR. When they already have a zone and see a new SOA serial via NOTIFY, they ask for an &lt;em&gt;incremental&lt;/em&gt; zone transfer first (&lt;a href="https://www.ietf.org/archive/id/draft-ah-dnsext-rfc1995bis-ixfr-02.html" rel="noopener noreferrer"&gt;IXFR, RFC 1995&lt;/a&gt;), a diff of only the records that changed since the old serial. If the primary doesn't speak IXFR, a well-behaved secondary falls back to AXFR. Hetzner Robot apparently doesn't fall back cleanly in every case, so zones weren't updating reliably until we implemented IXFR too.&lt;/p&gt;

&lt;p&gt;IXFR isn't hard: you keep a small history of recent zone versions and, on request, return the delta between the client's serial and the current one. But it's the kind of thing you'd only discover by actually shipping against a real secondary. Cheers to whoever wrote that RFC.&lt;/p&gt;
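&lt;p&gt;The core of that delta is just a set difference between two zone versions. A minimal sketch (the &lt;code&gt;diff&lt;/code&gt; helper and record strings are hypothetical, and a real IXFR response additionally frames each delta between SOA records):&lt;/p&gt;

```go
package main

import "fmt"

// diff computes which records disappeared and which appeared between two
// zone versions. This is essentially the payload of an IXFR response:
// deletions relative to the client's serial, then additions up to ours.
func diff(old, cur []string) (removed, added []string) {
	oldSet := make(map[string]bool)
	curSet := make(map[string]bool)
	for _, r := range old {
		oldSet[r] = true
	}
	for _, r := range cur {
		curSet[r] = true
	}
	for _, r := range old {
		if !curSet[r] {
			removed = append(removed, r)
		}
	}
	for _, r := range cur {
		if !oldSet[r] {
			added = append(added, r)
		}
	}
	return removed, added
}

func main() {
	oldZone := []string{"app.example.com. 300 IN A 1.2.3.4"}
	curZone := []string{"app.example.com. 300 IN A 5.6.7.8"}
	removed, added := diff(oldZone, curZone)
	fmt.Println("removed:", removed)
	fmt.Println("added:", added)
}
```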

&lt;h2&gt;
  
  
  Was It Worth It?
&lt;/h2&gt;

&lt;p&gt;So far, 100%. Propagation went from "up to 90 minutes" to however long it takes to do a zone transfer, which for our zone size is practically instant. The zone grows with the platform without hitting any record ceilings, and we also have full observability baked in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Do This?
&lt;/h2&gt;

&lt;p&gt;Probably not. Use Cloudflare DNS, Route 53, or whatever managed DNS your provider offers. They're fast, they work, and you don't have to think about them.&lt;/p&gt;

&lt;p&gt;But if you &lt;em&gt;do&lt;/em&gt; end up hitting the limits of a managed DNS provider, the hidden primary pattern is worth knowing about. Your primary doesn't need to be public, you can use any AXFR-compatible secondary, and you can swap providers without touching your server.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder &lt;a href="https://sliplane.io?utm_source=we-built-our-own-dns-server" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>docker</category>
      <category>webdev</category>
    </item>
    <item>
      <title>vCPUs are a marketing scam</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Sat, 24 Jan 2026 22:56:53 +0000</pubDate>
      <link>https://dev.to/code42cate/vcpus-are-a-marketing-scam-21kj</link>
      <guid>https://dev.to/code42cate/vcpus-are-a-marketing-scam-21kj</guid>
      <description>&lt;p&gt;You've probably seen "4 vCPU" on a pricing page and wondered what that actually means. Is it 4 CPU cores? 4 threads? Something else entirely?&lt;/p&gt;

&lt;p&gt;The short answer: a vCPU is whatever the hell your cloud provider decides. Sometimes it's &lt;a href="https://fly.io/docs/machines/cpu-performance/" rel="noopener noreferrer"&gt;a share of CPU time&lt;/a&gt;, sometimes it's &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-optimize-cpu.html" rel="noopener noreferrer"&gt;an actual hardware thread&lt;/a&gt;, and sometimes providers offer both - AWS has regular instances with dedicated threads &lt;em&gt;and&lt;/em&gt; &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances.html" rel="noopener noreferrer"&gt;burstable instances&lt;/a&gt; with CPU credits.&lt;/p&gt;

&lt;p&gt;This post focuses on the quota-based model, because that's where the confusing behavior lives. Understanding how it works will save you debugging headaches.&lt;/p&gt;

&lt;p&gt;To be fair, I don't actually think vCPUs are usually a scam. But it made you click, and now I get to explain some fun stuff about the Linux CFS!&lt;/p&gt;

&lt;p&gt;If you're a visual learner and want to skip the text, try this out:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sliplane.io/blog/vcpus-are-a-marketing-scam#see-it-in-action" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcov115fa4zqx5i7vmdxj.png" alt="Visualization" width="678" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The interactive component doesn't work on dev.to; click the image to land on my own blog, where it works!&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why vCPUs Exist
&lt;/h2&gt;

&lt;p&gt;Most web applications don't need constant CPU. Your server processes a request in milliseconds, then sits idle waiting for the next one. Even under load, CPU usage typically looks like spikes rather than a flat line.&lt;/p&gt;

&lt;p&gt;Giving each customer a dedicated physical core would waste most of that capacity. Instead, providers give you a &lt;em&gt;quota&lt;/em&gt; of CPU time: a baseline you can always use, plus a burst allowance for spikes. If your API needs 50ms of CPU three times per second, that's 150ms out of 1000ms - why pay for the other 850ms?&lt;/p&gt;

&lt;p&gt;There's another reason: "1 core" is meaningless as a unit. A 2025 AMD EPYC core and a 2012 Xeon have completely different performance. A time-based quota at least gives you a consistent allocation, even if what you can accomplish in that time still depends on the underlying hardware.&lt;/p&gt;

&lt;p&gt;To use this model well, you need to understand how it works under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CFS Bandwidth Controller
&lt;/h2&gt;

&lt;p&gt;Linux uses the &lt;a href="https://docs.kernel.org/scheduler/sched-design-CFS.html" rel="noopener noreferrer"&gt;CFS (Completely Fair Scheduler)&lt;/a&gt; bandwidth controller to manage CPU quotas. Three parameters control everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cpu.cfs_quota_us&lt;/strong&gt;: How much CPU time (in microseconds) you get per period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cpu.cfs_period_us&lt;/strong&gt;: How long each accounting period is (also in microseconds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cpu.cfs_burst_us&lt;/strong&gt;: The maximum accumulated run-time you can bank (in microseconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are cgroup v1 names, which I find easier to understand. If you're configuring this yourself, you're probably on &lt;a href="https://docs.kernel.org/admin-guide/cgroup-v2.html" rel="noopener noreferrer"&gt;cgroup v2&lt;/a&gt;, which uses &lt;code&gt;cpu.max&lt;/code&gt; and &lt;code&gt;cpu.max.burst&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;For example, if your quota is 25,000µs (25ms) and your period is 100,000µs (100ms), you can use 25ms of CPU time every 100ms. That's equivalent to 25% of a single CPU's time - and that might be what "1 vCPU" means on a shared instance.&lt;/p&gt;

&lt;p&gt;The math: &lt;code&gt;quota / period = your CPU share&lt;/code&gt;. A quota of 50ms per 100ms period means 50% of a CPU's time. Note that this is aggregate time across all threads in the cgroup, not pinned to a single core.&lt;/p&gt;

&lt;p&gt;The burst parameter allows unused quota to accumulate. If your app only uses 10ms of a 25ms quota during one period, the unused 15ms gets added to a burst balance (up to the cap). When a later request needs more than your baseline quota, it can draw from this balance to temporarily exceed the baseline. The burst balance is capped between 0 and your quota, and bursting is often disabled by default.&lt;/p&gt;

&lt;p&gt;Once the burst balance hits zero, you're capped at your baseline until it recovers. It only recovers when you're using less than your baseline - which becomes difficult when requests are queuing up.&lt;/p&gt;
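&lt;p&gt;To build intuition for the bank mechanic, here's a toy simulation of a burst balance over a few periods (the numbers and the &lt;code&gt;simulate&lt;/code&gt; helper are illustrative; the kernel's real accounting is per-runqueue and far finer grained):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// A toy model of the CFS burst bank. Each period grants quota ms of CPU
// time; whatever the workload leaves unused is banked, capped at burst;
// periods that want more than the baseline draw the excess from the bank.
func simulate(quota, burst float64, usage []float64) []float64 {
	bank := 0.0
	served := make([]float64, len(usage))
	for i, want := range usage {
		got := math.Min(want, quota+bank)      // can't exceed baseline plus banked time
		bank = math.Min(burst, bank+quota-got) // bank the leftovers, capped at burst
		bank = math.Max(0, bank)
		served[i] = got
	}
	return served
}

func main() {
	// 25ms baseline and 25ms max burst, per 100ms period: two quiet periods
	// fill the bank, the first 50ms spike is fully absorbed, the second is
	// throttled back to the baseline.
	fmt.Println(simulate(25, 25, []float64{10, 10, 50, 50})) // → [10 10 50 25]
}
```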

&lt;h2&gt;
  
  
  See It In Action
&lt;/h2&gt;

&lt;p&gt;Play with this simulator to build intuition for how quota and period settings affect latency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sliplane.io/blog/vcpus-are-a-marketing-scam#see-it-in-action" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcov115fa4zqx5i7vmdxj.png" alt="Visualization" width="678" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The interactive component doesn't work on dev.to; click the image to land on my own blog, where it works!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try these experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Switch to "Spiky" workload and watch the balance drain to zero, triggering throttling (red dashed line)&lt;/li&gt;
&lt;li&gt;Increase the baseline to 25% - notice how the balance stays healthier and throttling decreases&lt;/li&gt;
&lt;li&gt;With "Bursty" workload and low baseline, see how bursts drain balance but it recovers during idle periods&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Happens When You Exceed Your Quota
&lt;/h2&gt;

&lt;p&gt;Let's say you have a 25ms quota per 100ms period, and a request comes in that needs 30ms of CPU time to process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your process starts running&lt;/li&gt;
&lt;li&gt;After 25ms, the kernel sees you've used your quota&lt;/li&gt;
&lt;li&gt;Your process gets &lt;strong&gt;paused&lt;/strong&gt; until the next period starts&lt;/li&gt;
&lt;li&gt;75ms later, a new period begins and you get 25ms more&lt;/li&gt;
&lt;li&gt;Your process finishes the remaining 5ms&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total wall-clock time: 105ms for 30ms of actual work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your process wasn't slow - it was waiting. When you exceed your quota, latency doesn't degrade gracefully. It jumps by the length of the remaining period. That sucks, especially for your P99 latency!&lt;/p&gt;
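&lt;p&gt;The arithmetic generalizes: with no burst allowance, a burst of CPU work sits out the rest of a period every time it overruns the quota. A quick back-of-the-envelope calculator (illustrative only; it ignores burst credit and scheduler details):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// wallClock returns the wall-clock time (ms) a single burst of CPU work
// takes under a quota/period cap with no burst allowance: run until the
// quota is spent, wait out the rest of the period, repeat.
func wallClock(work, quota, period float64) float64 {
	full := math.Ceil(work/quota) - 1 // periods in which the quota is fully spent
	return full*period + work - full*quota
}

func main() {
	// 30ms of work under a 25ms/100ms quota: 25ms running, 75ms throttled,
	// then the final 5ms in the next period.
	fmt.Println(wallClock(30, 25, 100)) // → 105
}
```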

&lt;h2&gt;
  
  
  When This Works Well
&lt;/h2&gt;

&lt;p&gt;The quota model fits most web workloads because they're inherently bursty - short CPU spikes with idle time in between. If your app averages 10% CPU but occasionally spikes to 50%, the burst balance absorbs those spikes while idle periods let it recover.&lt;/p&gt;

&lt;h2&gt;
  
  
  When This Breaks Down
&lt;/h2&gt;

&lt;p&gt;The model falls apart in a few scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long synchronous operations&lt;/strong&gt;: A request needing 50ms of CPU with a 25ms quota will always get throttled, regardless of burst balance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency-sensitive workloads&lt;/strong&gt;: If P99 latency matters, your longest operations need to fit within your quota.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sustained load&lt;/strong&gt;: Once burst balance depletes and requests queue up, each new request starts mid-period with less quota remaining. The backlog compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Size for your longest operations, not average CPU&lt;/strong&gt;: If your P99 request needs 40ms of CPU, a 25ms quota will throttle those requests every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Shorter periods reduce worst-case throttling delay, longer periods increase it&lt;/strong&gt;: A 50ms period means you wait at most 50ms when throttled. A 100ms period means potentially waiting 100ms. The tradeoff is how quota gets distributed within each period.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Watch for cascade effects&lt;/strong&gt;: When one request gets throttled and takes longer, it holds a connection longer, which can cause queuing, which makes the next request start with less remaining quota in the period.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. "Low CPU usage" can be misleading&lt;/strong&gt;: If your monitoring shows 20% CPU but users complain about latency, check throttling stats. You might be at 80% of your &lt;em&gt;quota&lt;/em&gt; while only using 20% of the physical core.&lt;/p&gt;
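&lt;p&gt;On cgroup v2, those throttling stats live in the &lt;code&gt;cpu.stat&lt;/code&gt; file under &lt;code&gt;/sys/fs/cgroup&lt;/code&gt; (&lt;code&gt;nr_periods&lt;/code&gt;, &lt;code&gt;nr_throttled&lt;/code&gt;, &lt;code&gt;throttled_usec&lt;/code&gt;). A small sketch that parses them, with a hardcoded sample so it runs anywhere:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat pulls the counters out of a cgroup v2 cpu.stat file,
// which is a list of "key value" lines.
func parseCPUStat(s string) map[string]int64 {
	out := make(map[string]int64)
	for _, line := range strings.Split(strings.TrimSpace(s), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 {
			v, err := strconv.ParseInt(fields[1], 10, 64)
			if err == nil {
				out[fields[0]] = v
			}
		}
	}
	return out
}

func main() {
	// Sample contents; in practice read /sys/fs/cgroup/.../cpu.stat.
	sample := "usage_usec 5000000\nnr_periods 200\nnr_throttled 40\nthrottled_usec 900000"
	st := parseCPUStat(sample)
	pct := 100 * float64(st["nr_throttled"]) / float64(st["nr_periods"])
	fmt.Printf("throttled in %.0f%% of periods\n", pct) // high even though average CPU looks low
}
```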

&lt;p&gt;&lt;strong&gt;5. Consider dedicated CPU for latency-critical paths&lt;/strong&gt;: If consistent latency matters more than cost, dedicated CPU instances guarantee you won't share with noisy neighbors. But of course, "dedicated" also has many different definitions. Sometimes that just means a bigger slice!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;vCPUs aren't a scam; they're a mostly sensible way to share compute resources efficiently. The quota system works great for bursty workloads, which is most workloads.&lt;/p&gt;

&lt;p&gt;The key is understanding that exceeding your quota doesn't make things "a little slower": it makes them &lt;em&gt;wait for the next period&lt;/em&gt;. Once you internalize that, you can make informed decisions about resource sizing and understand why latency sometimes spikes even when CPU "looks fine."&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes &amp;amp; References
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The cover image illustration was generated with AI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.rst" rel="noopener noreferrer"&gt;Linux kernel CFS bandwidth controller documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fly.io/docs/machines/cpu-performance/" rel="noopener noreferrer"&gt;Fly.io - Understanding VM CPU Performance&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lwn.net/Articles/844976/#t" rel="noopener noreferrer"&gt;LWN: Control Group CPU throttling and bandwidth&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bsky.app/profile/adamlogic.com/post/3lng4xyvbez2g" rel="noopener noreferrer"&gt;Adam Logic on Bluesky - vCPU explanations&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>docker</category>
      <category>webdev</category>
    </item>
    <item>
      <title>free yourself of overpriced docusign and self-host DocuSeal instead 🦭🦭🦭</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Fri, 09 Jan 2026 16:42:18 +0000</pubDate>
      <link>https://dev.to/code42cate/free-yourself-of-overpriced-docusign-and-self-host-docuseal-instead-59jh</link>
      <guid>https://dev.to/code42cate/free-yourself-of-overpriced-docusign-and-self-host-docuseal-instead-59jh</guid>
<description>
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/atakanozt/self-hosting-docuseal-the-easy-way-1o4m" class="crayons-story__hidden-navigation-link"&gt;Self-hosting DocuSeal the easy way&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/atakanozt" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3702146%2F3867f18e-5c71-4fd2-a22e-36e9630d0cf5.jpg" alt="atakanozt profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/atakanozt" class="crayons-story__secondary fw-medium m:hidden"&gt;
              atakanozt
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                atakanozt
                
              
              &lt;div id="story-author-preview-content-3161023" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/atakanozt" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3702146%2F3867f18e-5c71-4fd2-a22e-36e9630d0cf5.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;atakanozt&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/atakanozt/self-hosting-docuseal-the-easy-way-1o4m" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jan 9&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/atakanozt/self-hosting-docuseal-the-easy-way-1o4m" id="article-link-3161023"&gt;
          Self-hosting DocuSeal the easy way
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/docker"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;docker&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/selfhosted"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;selfhosted&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/atakanozt/self-hosting-docuseal-the-easy-way-1o4m" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;17&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/atakanozt/self-hosting-docuseal-the-easy-way-1o4m#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>docker</category>
      <category>selfhosted</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Tech Stack Lessons from scaling 20x in a year</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Fri, 09 Jan 2026 12:13:03 +0000</pubDate>
      <link>https://dev.to/code42cate/tech-stack-lessons-from-scaling-20x-in-a-year-1ekh</link>
      <guid>https://dev.to/code42cate/tech-stack-lessons-from-scaling-20x-in-a-year-1ekh</guid>
      <description>&lt;p&gt;A year ago, I wrote about &lt;a href="https://dev.to/blog/the-tech-stack-to-build-a-cloud-provider"&gt;our tech stack&lt;/a&gt; and how it helped us run a lean cloud computing startup. Since then, we've scaled over 20x. That kind of growth is fun, but also breaks &lt;em&gt;a lot of&lt;/em&gt; things and assumptions; and forces you to make hard choices, quickly :D&lt;/p&gt;

&lt;p&gt;Here's what changed, what stayed the same, and what we learned along the way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwxtlxty9fyoxp0s42rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwxtlxty9fyoxp0s42rg.png" alt="Tech Stack" width="800" height="623"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Stayed the Same
&lt;/h2&gt;

&lt;p&gt;Some things just work. Our frontend is still &lt;a href="http://nuxt.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Nuxt&lt;/strong&gt;&lt;/a&gt; with &lt;a href="https://www.typescriptlang.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Typescript&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://tailwindcss.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Tailwind&lt;/strong&gt;&lt;/a&gt; (&lt;a href="https://x.com/adamwathan/status/2008909129591443925" rel="noopener noreferrer"&gt;RIP&lt;/a&gt;). Our backend is still &lt;a href="https://go.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Go&lt;/strong&gt;&lt;/a&gt; with &lt;a href="https://github.com/gin-gonic/gin" rel="noopener noreferrer"&gt;&lt;strong&gt;Go-Gin&lt;/strong&gt;&lt;/a&gt;. We still run on &lt;a href="https://hetzner.cloud/?ref=mZziDsGU2VVp" rel="noopener noreferrer"&gt;&lt;strong&gt;Hetzner&lt;/strong&gt;&lt;/a&gt; bare-metal and use &lt;a href="https://firecracker-microvm.github.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Firecracker&lt;/strong&gt;&lt;/a&gt; for virtualization. &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Terraform&lt;/strong&gt;&lt;/a&gt; still manages our infrastructure. &lt;a href="https://hub.docker.com/_/redis" rel="noopener noreferrer"&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/a&gt; still handles caching. &lt;a href="https://crisp.chat/en/" rel="noopener noreferrer"&gt;&lt;strong&gt;Crisp&lt;/strong&gt;&lt;/a&gt; still powers customer support. AWS SES still sends our transactional emails.&lt;/p&gt;

&lt;p&gt;If it ain't broke, don't fix it.&lt;/p&gt;

&lt;p&gt;But plenty did break — or became too expensive to keep running the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Axiom → Parseable
&lt;/h2&gt;

&lt;p&gt;This was our biggest operational change. Last year I praised &lt;a href="https://axiom.co/" rel="noopener noreferrer"&gt;&lt;strong&gt;Axiom&lt;/strong&gt;&lt;/a&gt; for logs. It was great, on the base plan. Until we scaled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcwkllxfuop6d8l8fbp5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcwkllxfuop6d8l8fbp5.gif" alt="broke" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As our traffic grew, so did our need for better tracing and more detailed logs. Our Axiom bill exploded past €1,000/month and kept climbing. At that point, you have to ask yourself: is this sustainable? Obviously not lol.&lt;/p&gt;

&lt;p&gt;We migrated to &lt;a href="https://parseable.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Parseable&lt;/strong&gt;&lt;/a&gt;, self-hosted on Kubernetes with &lt;a href="https://min.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Minio&lt;/strong&gt;&lt;/a&gt; for S3-compatible storage, all running on bare-metal. The product still feels early, but the team is responsive and ships fixes fast when something breaks. Big shoutout to &lt;a href="https://www.linkedin.com/in/anantvindal/" rel="noopener noreferrer"&gt;Anant&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/debanitr/" rel="noopener noreferrer"&gt;Deba&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Would I recommend it? If you can't trade boatloads of money for time, yes. Self-hosting observability is work, but at our "scale" (we are still tiny), it's worth it. We still use &lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Grafana&lt;/strong&gt;&lt;/a&gt; for dashboards and alerts; that hasn't changed, for now (the bill is starting to hurt).&lt;/p&gt;

&lt;h2&gt;
  
  
  Object Storage: Backblaze → IONOS/Hetzner
&lt;/h2&gt;

&lt;p&gt;Last year we used &lt;a href="http://backblaze.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Backblaze&lt;/strong&gt;&lt;/a&gt; for blob storage. It was cheap and reliable. The problem wasn't technical, it was purely political and a question of positioning.&lt;/p&gt;

&lt;p&gt;As we grew, the type of customers we attract changed too. Enterprise customers, especially European ones, started pushing back on storing their data with US providers. GDPR compliance, data sovereignty, internal policies: the reasons varied, but the message was clear. No US providers! So our crusade to replace all US providers began with Backblaze.&lt;/p&gt;

&lt;p&gt;We moved to &lt;a href="https://www.ionos.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;IONOS&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://www.hetzner.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Hetzner&lt;/strong&gt;&lt;/a&gt; for object storage. Are they as good as Backblaze? No, not even close. But they're European, they're (barely) good enough, and they satisfy our customers' requirements. Honestly, if you're not required to use them I wouldn't. It feels like we don't really have a choice here.&lt;/p&gt;

&lt;h2&gt;
  
  
  CDN: Cloudflare → Bunny
&lt;/h2&gt;

&lt;p&gt;Same story as storage. &lt;a href="https://www.cloudflare.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt;&lt;/a&gt; is an incredible product with features we'll never use. But customers asked for a European alternative.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bunny.net/" rel="noopener noreferrer"&gt;&lt;strong&gt;Bunny&lt;/strong&gt;&lt;/a&gt; fits the bill. It's not feature-complete like Cloudflare, but it handles our CDN needs perfectly. It's fast, reasonably priced, and European. In that case this wasn't even a real tradeoff, Bunny does exactly what we need. For our super simple setup the migration took less than 2 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD: GitHub Actions → Namespace
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub Actions&lt;/strong&gt;&lt;/a&gt; served us well, but it's stagnated. We needed nested virtualization for testing Firecracker stuff. We needed better performance. GitHub wasn't delivering.&lt;/p&gt;

&lt;p&gt;We moved to &lt;a href="https://namespace.so/" rel="noopener noreferrer"&gt;&lt;strong&gt;Namespace&lt;/strong&gt;&lt;/a&gt; for our runners. It's a great product — also European, which is becoming a theme here. The performance improvements alone were worth the switch.&lt;/p&gt;

&lt;p&gt;That said, we'll probably migrate to completely self-hosted runners eventually. The more we scale, the more control we want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Persistence: The Big One
&lt;/h2&gt;

&lt;p&gt;This was our most significant architectural change. Last year, I bragged about running everything in Postgres with Timescale, including hundreds of millions of analytics rows. That worked great until our database hit 2TB.&lt;/p&gt;

&lt;p&gt;At 2TB, Postgres becomes hard to manage. Stupid queries can take down prod, and scaling is painful. Database pros are going to laugh at me here; 2TB is probably nothing in the grand scheme of things! But I am not a Postgres pro, and honestly wasn't planning on becoming one. Additionally, the cost just started to hurt, especially considering that we want to do another 20x in 2026.&lt;/p&gt;

&lt;p&gt;So we built something simpler: hot data lives in &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Postgres&lt;/strong&gt;&lt;/a&gt;, then gets flushed to S3 as &lt;a href="https://en.wikipedia.org/wiki/Apache_Parquet" rel="noopener noreferrer"&gt;Parquet&lt;/a&gt; files. For queries, we use &lt;a href="https://duckdb.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;DuckDB&lt;/strong&gt;&lt;/a&gt; to read directly from S3. &lt;a href="https://duckdb.org/" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; is &lt;strong&gt;amazing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The results surprised us. P99 latency actually improved. Why? Most queries are "give me the last 5 minutes of metrics" or "show me the last 500 logs." That's all hot data sitting in Postgres. Historical queries hit S3, and &lt;a href="https://duckdb.org/" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; handles Parquet files like a champ. Those are of course slightly slower, unless cached.&lt;/p&gt;

&lt;p&gt;This architecture saves money, scales better, and plays to our strengths. We understand S3. We don't understand running a 10TB Postgres cluster :D&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Looking back at all these changes, there's a clear pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;European everything.&lt;/strong&gt; Customer pressure pushed us toward EU providers. Again, this isn't a technical decision. It's a business reality when you grow beyond startups and indie hackers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-host at scale.&lt;/strong&gt; SaaS products are great until your bill crosses a threshold. Then you have to do the math on whether your time is cheaper than their prices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simple beats clever.&lt;/strong&gt; We didn't build a fancy distributed database. We flush data to S3 and query it with &lt;a href="https://duckdb.org/" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt;. It's not sexy, but it works! (Actually I think the simplicity is quite sexy, but not great for resume-driven development)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We'll probably self-host our CI runners soon. We're evaluating alternatives to AWS SES since, you know, European.&lt;/p&gt;

&lt;p&gt;The stack will keep evolving. That's the nature of building infrastructure at scale. But the core philosophy stays the same: keep it simple, keep it maintainable, and only add complexity when the problem forces you to.&lt;/p&gt;




&lt;p&gt;That's where we're at in 2026. Twenty times bigger, a few hard lessons learned, and a stack that's more European than ever.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>docker</category>
      <category>devops</category>
      <category>startup</category>
    </item>
    <item>
      <title>How to Deploy NiceGUI Apps with Docker on Sliplane</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Sat, 13 Sep 2025 19:02:08 +0000</pubDate>
      <link>https://dev.to/code42cate/how-to-deploy-nicegui-apps-with-docker-on-sliplane-38c8</link>
      <guid>https://dev.to/code42cate/how-to-deploy-nicegui-apps-with-docker-on-sliplane-38c8</guid>
      <description>&lt;p&gt;NiceGUI is a fantastic Python framework for creating web-based user interfaces with ease. If you've built a NiceGUI app and want to deploy it without the complexity of managing servers, you're in the right place. In this tutorial, I'll show you how to containerize and deploy your NiceGUI application on Sliplane.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we start, make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A NiceGUI application ready to deploy&lt;/li&gt;
&lt;li&gt;Docker installed on your local machine (for testing)&lt;/li&gt;
&lt;li&gt;A GitHub repository with your NiceGUI code&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://sliplane.io?utm_source=how-to-deploy-nicegui-apps-on-sliplane" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt; account&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Prepare Your NiceGUI Application
&lt;/h2&gt;

&lt;p&gt;First, let's make sure your NiceGUI app is production-ready. Here's a basic example of a NiceGUI application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nicegui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ui&lt;/span&gt;

&lt;span class="nd"&gt;@ui.page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;ui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hello NiceGUI World!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Click me!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_click&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Button clicked!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__mp_main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}:&lt;/span&gt;
    &lt;span class="n"&gt;ui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;reload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points for production deployment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;host='0.0.0.0'&lt;/code&gt; to accept connections from outside the container&lt;/li&gt;
&lt;li&gt;Use any port you prefer (Sliplane auto-detects ports)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;reload=False&lt;/code&gt; for production stability&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;if __name__&lt;/code&gt; check to prevent issues with container restarts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Create the Dockerfile
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;Dockerfile&lt;/code&gt; in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; zauberzeug/nicegui:latest&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "main.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses the latest official NiceGUI base image (check &lt;a href="https://hub.docker.com/r/zauberzeug/nicegui" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt; for specific versions if needed)&lt;/li&gt;
&lt;li&gt;Installs your dependencies from &lt;code&gt;requirements.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Copies your application code&lt;/li&gt;
&lt;li&gt;Exposes port 8080&lt;/li&gt;
&lt;li&gt;Runs your main application file&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Create Requirements File
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;requirements.txt&lt;/code&gt; file with your dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nicegui
# Add any other dependencies your app needs
# For example:
# pandas
# requests
# matplotlib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Add Docker Ignore File
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.dockerignore&lt;/code&gt; file to exclude unnecessary files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;__pycache__
*.pyc
*.pyo
*.pyd
.git
.gitignore
README.md
.pytest_cache
.coverage
venv/
env/
.venv/
.env/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Test Locally
&lt;/h2&gt;

&lt;p&gt;Before deploying, test your container locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build the image&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; my-nicegui-app &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Run the container&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 my-nicegui-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;http://localhost:8080&lt;/code&gt; to verify your app works correctly in the container.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Deploy on Sliplane
&lt;/h2&gt;

&lt;p&gt;Now for the easy part! Here's how to deploy your NiceGUI app on Sliplane:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sign up&lt;/strong&gt; for &lt;a href="https://sliplane.io?utm_source=how-to-deploy-nicegui-apps-on-sliplane" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt; (first 2 days free)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect your GitHub repository&lt;/strong&gt; by clicking "Create Service" and selecting your repository&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure your service&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service name: &lt;code&gt;my-nicegui-app&lt;/code&gt; (or your preferred name)&lt;/li&gt;
&lt;li&gt;Keep other settings as default (Sliplane auto-detects ports)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Click Deploy&lt;/strong&gt; and wait about 2-3 minutes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Access your app&lt;/strong&gt; at &lt;code&gt;https://my-nicegui-app.sliplane.app&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! Your NiceGUI app is now live and accessible worldwide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automatic Updates
&lt;/h2&gt;

&lt;p&gt;Whenever you push changes to your GitHub repository, Sliplane automatically rebuilds and deploys your application. No manual intervention required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sliplane Base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;€9.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2 vCPU, 2GB RAM, 40GB SSD&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Run&lt;/td&gt;
&lt;td&gt;~$132&lt;/td&gt;
&lt;td&gt;2 vCPU, 2GB RAM, pay-per-use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heroku Standard-2X&lt;/td&gt;
&lt;td&gt;$50&lt;/td&gt;
&lt;td&gt;2 vCPU, 2GB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DigitalOcean App&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;td&gt;2 vCPU, 2GB RAM, dedicated&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Sliplane offers excellent value with dedicated resources and no cold starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I run multiple NiceGUI apps on one server?&lt;/strong&gt;&lt;br&gt;
A: Yes! Sliplane allows unlimited containers per server. Deploy multiple NiceGUI apps and they'll share the server resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does NiceGUI work well in containers?&lt;/strong&gt;&lt;br&gt;
A: Absolutely. NiceGUI was designed with containerization in mind and works perfectly with Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do I handle file uploads?&lt;/strong&gt;&lt;br&gt;
A: Use NiceGUI's built-in upload component and save files to a persistent volume mounted at &lt;code&gt;/data&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I use external APIs?&lt;/strong&gt;&lt;br&gt;
A: Yes, NiceGUI apps can make HTTP requests to external APIs. Store API keys as environment variables in Sliplane.&lt;/p&gt;

&lt;p&gt;Ready to deploy your NiceGUI application? &lt;a href="https://sliplane.io?utm_source=how-to-deploy-nicegui-apps-on-sliplane" rel="noopener noreferrer"&gt;Sign up for Sliplane&lt;/a&gt; and get your first 2 days free. No credit card required to start!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>webdev</category>
      <category>python</category>
    </item>
    <item>
      <title>LLMs are the End of Serverless</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Tue, 05 Aug 2025 01:20:04 +0000</pubDate>
      <link>https://dev.to/code42cate/llms-are-the-end-of-serverless-48ea</link>
      <guid>https://dev.to/code42cate/llms-are-the-end-of-serverless-48ea</guid>
      <description>&lt;p&gt;Remember when serverless was going to revolutionize everything? Well, LLMs just delivered the killing blow.&lt;/p&gt;

&lt;p&gt;Here's the thing: In an AI-assisted coding world, proprietary serverless platforms are dead weight. Why? Because LLMs understand Docker like they understand breathing, but they choke on your special snowflake Lambda configuration.&lt;/p&gt;

&lt;p&gt;Let me explain why serverless was already a scam and how LLMs just made it ten times worse.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Original Sin: Serverless Was Always Broken
&lt;/h2&gt;

&lt;p&gt;Before we get to the LLM angle, let's recap why serverless was already a bad idea:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Promise:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No servers to manage!&lt;/li&gt;
&lt;li&gt;Infinite scale!&lt;/li&gt;
&lt;li&gt;Pay only for what you use!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Reality:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;15-minute execution limits&lt;/li&gt;
&lt;li&gt;Cold starts that make your app feel broken&lt;/li&gt;
&lt;li&gt;Surprise $10,000 bills&lt;/li&gt;
&lt;li&gt;Vendor lock-in so tight it hurts&lt;/li&gt;
&lt;li&gt;Debugging that makes you question your career choices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You know what doesn't have these problems? A container.&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter LLMs: The Final Nail in the Coffin
&lt;/h2&gt;

&lt;p&gt;Here's where it gets spicy.&lt;/p&gt;

&lt;p&gt;When you're coding with Claude, ChatGPT, or Cursor, what works better?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A:&lt;/strong&gt; "Deploy this to Docker"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; my-app &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B:&lt;/strong&gt; "Deploy this to AWS Lambda with API Gateway, configure the execution role, set up the VPC endpoints, create a deployment package with the right runtime, configure the event source mappings..."&lt;/p&gt;

&lt;p&gt;The LLM's response to Option B: &lt;em&gt;confused screaming&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why LLMs Love Docker (And Hate Your Serverless Platform)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Documentation Density
&lt;/h3&gt;

&lt;p&gt;Docker has been around since 2013. That's over a decade of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stack Overflow answers&lt;/li&gt;
&lt;li&gt;GitHub examples&lt;/li&gt;
&lt;li&gt;Blog posts&lt;/li&gt;
&lt;li&gt;Official docs&lt;/li&gt;
&lt;li&gt;YouTube tutorials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Lambda? Sure, there's documentation. But it's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Constantly changing&lt;/li&gt;
&lt;li&gt;Platform-specific&lt;/li&gt;
&lt;li&gt;Full of edge cases&lt;/li&gt;
&lt;li&gt;Buried in AWS's labyrinth of services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When an LLM trains on the internet, it sees 1000x more Docker examples than CloudFormation YAML nightmares.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Universal Patterns vs. Proprietary Nonsense
&lt;/h3&gt;

&lt;p&gt;Docker is just Linux containers. The patterns are universal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment variables work the same everywhere&lt;/li&gt;
&lt;li&gt;Volumes are just mounted directories&lt;/li&gt;
&lt;li&gt;Networking is standard TCP/IP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Serverless? Every platform invents its own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event formats&lt;/li&gt;
&lt;li&gt;Configuration syntax&lt;/li&gt;
&lt;li&gt;Deployment procedures&lt;/li&gt;
&lt;li&gt;Debugging tools&lt;/li&gt;
&lt;li&gt;Billing models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs can't keep up with this Tower of Babel.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Local Development = Better LLM Assistance
&lt;/h3&gt;

&lt;p&gt;Watch this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Help me debug why my container isn't connecting to Redis"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM:&lt;/strong&gt; "Let's check your docker-compose.yml, ensure the services are on the same network, verify the connection string..."&lt;/p&gt;

&lt;p&gt;vs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Help me debug why my Lambda can't connect to ElastiCache"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM:&lt;/strong&gt; "First, check your VPC configuration, then the security groups, subnet associations, NAT gateway, execution role permissions, and... wait, are you using VPC endpoints? What about the Lambda ENI lifecycle? Did you enable DNS resolution in your VPC?"&lt;/p&gt;

&lt;p&gt;&lt;em&gt;head explodes&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwm9pxjhdoi2xdk9aeuhv.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwm9pxjhdoi2xdk9aeuhv.gif" alt="exploding gif" width="360" height="360"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  "But Serverless Scales!"
&lt;/h2&gt;

&lt;p&gt;So does Kubernetes. So does Docker Swarm. So does literally any container orchestrator.&lt;/p&gt;

&lt;p&gt;But here's the thing: with containers + LLMs, you can actually implement that scaling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Add horizontal autoscaling to my Docker Compose setup"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM:&lt;/strong&gt; "Here's a complete docker-compose.yml with scaling configuration, health checks, and load balancing..."&lt;/p&gt;

&lt;p&gt;vs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Add autoscaling to my Lambda"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM:&lt;/strong&gt; "First, create an Application Auto Scaling target, then define a scaling policy using CloudWatch metrics, but make sure your concurrent execution limits don't interfere with account limits, and don't forget about reserved concurrency vs provisioned concurrency..."&lt;/p&gt;

&lt;p&gt;Which one are you actually going to implement correctly?&lt;/p&gt;




&lt;h2&gt;
  
  
  Breaking Free: The Container + LLM Combo
&lt;/h2&gt;

&lt;p&gt;Here's your escape plan:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick boring technology&lt;/strong&gt;: Docker, PostgreSQL, Redis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use standard patterns&lt;/strong&gt;: REST APIs, background workers, cron jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy anywhere&lt;/strong&gt;: VPS, Kubernetes, even &lt;a href="https://sliplane.io?utm_source=serverless-got-one-shotted-by-llms" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt; (yes, shameless plug)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let LLMs actually help&lt;/strong&gt;: They understand these tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your AI assistant becomes a force multiplier instead of a confused intern.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Future Is Boring (And That's Beautiful)
&lt;/h2&gt;

&lt;p&gt;We're entering an era where AI can write most of our code. But it can only write code for platforms it understands.&lt;/p&gt;

&lt;p&gt;Docker is boring. PostgreSQL is boring. Redis is boring.&lt;/p&gt;

&lt;p&gt;You know what? Boring means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documented&lt;/li&gt;
&lt;li&gt;Predictable&lt;/li&gt;
&lt;li&gt;LLM-friendly&lt;/li&gt;
&lt;li&gt;Actually works&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Serverless is "exciting": excitingly broken, excitingly expensive, excitingly impossible to debug.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Serverless was already a questionable choice. Now that we code with LLMs, it's practically sabotage.&lt;/p&gt;

&lt;p&gt;Your AI assistant can spin up a complete containerized application in seconds. But ask it to debug your Lambda cold start issues? Good luck.&lt;/p&gt;

&lt;p&gt;The writing's on the wall: In an LLM-powered development world, proprietary platforms are dead weight. Stick to technologies with deep documentation, wide adoption, and standard patterns.&lt;/p&gt;

&lt;p&gt;Or keep fighting with CloudFormation while your competitors ship features. Your choice.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder of &lt;a href="https://sliplane.io?utm_source=serverless-got-one-shotted-by-llms" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>docker</category>
      <category>startup</category>
    </item>
    <item>
      <title>How to Build Custom Open WebUI Themes</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Mon, 28 Jul 2025 19:41:08 +0000</pubDate>
      <link>https://dev.to/code42cate/how-to-build-custom-open-webui-themes-55hh</link>
      <guid>https://dev.to/code42cate/how-to-build-custom-open-webui-themes-55hh</guid>
      <description>&lt;p&gt;While Open WebUI doesn't have built-in theming support, you can easily customize its appearance by injecting a custom CSS file into the Docker image. This guide will show you how to create your own themed version of Open WebUI.&lt;/p&gt;

&lt;p&gt;Want to see a complete example? Check out our &lt;a href="https://github.com/sliplane/open-webui-theme" rel="noopener noreferrer"&gt;Open WebUI Theme repository on GitHub&lt;/a&gt; for a full working implementation.&lt;/p&gt;

&lt;p&gt;Look at this beautiful (questionable) pink theme:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgiy9hw9tkvu0cqwl41y9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgiy9hw9tkvu0cqwl41y9.png" alt="custom" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Docker installed on your system&lt;/li&gt;
&lt;li&gt;Basic CSS knowledge&lt;/li&gt;
&lt;li&gt;A text editor&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Dockerfile
&lt;/h2&gt;

&lt;p&gt;First, create a &lt;code&gt;Dockerfile&lt;/code&gt; that extends the Open WebUI image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ghcr.io/open-webui/open-webui:git-49a928d&lt;/span&gt;

&lt;span class="c"&gt;# Optional: Replace favicon icons&lt;/span&gt;
&lt;span class="c"&gt;# COPY favicon.svg /app/build/static/favicon.svg&lt;/span&gt;
&lt;span class="c"&gt;# COPY favicon.png /app/build/static/favicon.png&lt;/span&gt;
&lt;span class="c"&gt;# COPY favicon.ico /app/build/static/favicon.ico&lt;/span&gt;

&lt;span class="c"&gt;# Copy your custom CSS file&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; custom.css /app/build/static/custom.css&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Always use a specific version tag (like &lt;code&gt;git-49a928d&lt;/code&gt;) instead of &lt;code&gt;main&lt;/code&gt; to ensure your theme doesn't break with updates. Check the &lt;a href="https://github.com/open-webui/open-webui/pkgs/container/open-webui" rel="noopener noreferrer"&gt;Open WebUI releases&lt;/a&gt; for available tags, especially if you need CUDA or Ollama support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create Your Custom CSS
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;custom.css&lt;/code&gt; file in the same directory as your Dockerfile. Here's an example theme with a blue and yellow color scheme:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nd"&gt;:root&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="py"&gt;--primary-text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#00487d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#ffd600&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;--primary-bg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#e2eef5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="py"&gt;--hover-bg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#d4e3ed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;#send-message-button&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;#sidebar&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-bg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;#sidebar&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nd"&gt;:hover&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--hover-bg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;aria-label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;"Voice mode"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.tippy-content&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;border-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.tippy-box&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;border-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;"submit"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;"switch"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="nt"&gt;aria-checked&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;"true"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="nc"&gt;.px-3&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="err"&gt;5&lt;/span&gt;&lt;span class="nc"&gt;.py-1&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="err"&gt;5&lt;/span&gt;&lt;span class="nc"&gt;.text-sm.font-medium.bg-black.hover&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nd"&gt;:bg-gray-900&lt;/span&gt;&lt;span class="nc"&gt;.text-white.dark&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nd"&gt;:bg-white&lt;/span&gt;&lt;span class="nc"&gt;.dark&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nd"&gt;:text-black&lt;/span&gt;&lt;span class="nc"&gt;.dark&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nd"&gt;:hover&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nd"&gt;:bg-gray-100&lt;/span&gt;&lt;span class="nc"&gt;.transition.rounded-full&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;background-color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;--primary-yellow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Finding CSS Selectors
&lt;/h2&gt;

&lt;p&gt;To customize other elements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Open WebUI in your browser&lt;/li&gt;
&lt;li&gt;Right-click on the element you want to style&lt;/li&gt;
&lt;li&gt;Select "Inspect" or "Inspect Element"&lt;/li&gt;
&lt;li&gt;Find the appropriate CSS selector&lt;/li&gt;
&lt;li&gt;Add it to your &lt;code&gt;custom.css&lt;/code&gt; with &lt;code&gt;!important&lt;/code&gt; to override existing styles&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Build and Run Your Custom Image
&lt;/h2&gt;

&lt;p&gt;Build your Docker image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; my-custom-openwebui &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run your themed Open WebUI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui-custom &lt;span class="se"&gt;\&lt;/span&gt;
  my-custom-openwebui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core functionality of Open WebUI hasn't changed. All normal configuration still applies!&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips and Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use CSS Variables&lt;/strong&gt;: Define colors as CSS variables for easy theme-wide changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Thoroughly&lt;/strong&gt;: Check all UI elements to ensure your theme doesn't break functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Specific Selectors&lt;/strong&gt;: Some elements may need very specific selectors due to Open WebUI's styling approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version Control&lt;/strong&gt;: Keep your Dockerfile and custom.css in version control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Your Changes&lt;/strong&gt;: Comment your CSS to remember what each override does&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Want to deploy your custom-themed Open WebUI quickly? Check out our guide on &lt;a href="https://dev.to/sliplane/self-hosting-openwebui-ollama-the-easy-way"&gt;self-hosting OpenWebUI with Ollama&lt;/a&gt; which shows you how to deploy Open WebUI in minutes. Once deployed, you can easily update the container with your custom theme.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Creating custom themes for Open WebUI is straightforward once you understand the process. By injecting a custom CSS file into the Docker image, you can completely transform the look and feel of your Open WebUI instance. Remember to use specific version tags and test your themes thoroughly before deploying to production :)&lt;/p&gt;

&lt;p&gt;Happy theming!&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder of &lt;a href="https://sliplane.io?utm_source=theming-openwebui" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>css</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>MCP Servers That I Use as a Technical Founder</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Sat, 26 Jul 2025 15:54:44 +0000</pubDate>
      <link>https://dev.to/code42cate/mcp-servers-that-i-use-as-a-technical-founder-3ia6</link>
      <guid>https://dev.to/code42cate/mcp-servers-that-i-use-as-a-technical-founder-3ia6</guid>
      <description>&lt;p&gt;There's been a flood of demos lately showing off how &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt; can write WhatsApp messages or book plane tickets. Most of that feels like novelty, not utility. As a technical founder building &lt;a href="https://sliplane.io?utm_source=mcp-servers-founder" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt;, a Docker hosting platform, I live in my terminal, and I care about things that help me ship, support customers, and write content fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MCP?
&lt;/h2&gt;

&lt;p&gt;Quick refresher: MCP (Model Context Protocol) is &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/mcp" rel="noopener noreferrer"&gt;Anthropic's open protocol&lt;/a&gt; that lets AI assistants like Claude connect to external tools and data sources. Think of MCP servers as bridges that give Claude access to APIs, databases, or services. Instead of copy-pasting information back and forth, Claude can directly interact with your tools to get work done.&lt;/p&gt;
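
&lt;p&gt;Hooking a server up is just a config entry. For example, registering the GitHub MCP server in Claude Desktop's &lt;code&gt;claude_desktop_config.json&lt;/code&gt; looks roughly like this (the token is a placeholder):&lt;/p&gt;

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "YOUR_TOKEN" }
    }
  }
}
```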

&lt;p&gt;Here are four MCP servers I actually use every day to get real work done. No gimmicks. Just productivity :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdjpknnery159cpbsg93.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdjpknnery159cpbsg93.png" alt="overview" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Docker Hub MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For: Understanding and debugging third-party container images&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most &lt;a href="https://sliplane.io?utm_source=mcp-servers-founder" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt; users deploy prebuilt images from Docker Hub. But many issues come down to a single missing environment variable or a bad volume mount. When a customer asks for help with an obscure database or web app, I use the &lt;a href="https://github.com/modelcontextprotocol/servers/tree/main/src/dockerhub" rel="noopener noreferrer"&gt;Docker Hub MCP server&lt;/a&gt; to instantly pull the README and docs of that image.&lt;/p&gt;

&lt;p&gt;Instead of clicking around or copying commands from some badly formatted blog, I can just say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Get me the required environment variables plus the volume mount for &lt;code&gt;postgres&lt;/code&gt; from Docker Hub."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faehqn46tse3436cojibg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faehqn46tse3436cojibg.png" alt="DockerHub" width="800" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gets me 80% of the way to understanding the problem. It saves minutes per support ticket, and those add up fast when you're dealing with dozens per day.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. GitHub MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For: Digging deeper into niche projects and edge cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes the README isn't enough. Maybe the Docker image is outdated, or the docs are missing a config detail. That's when I use the &lt;a href="https://github.com/modelcontextprotocol/servers/tree/main/src/github" rel="noopener noreferrer"&gt;GitHub MCP server&lt;/a&gt;. It connects to the linked repo, looks at open issues, and can search for error messages or flags that aren't documented.&lt;/p&gt;

&lt;p&gt;For example, a user tried to use a specific node inside &lt;code&gt;n8n&lt;/code&gt; and there was no documentation for it. With the GitHub MCP server, I searched through the repo's issues and found a thread describing the same problem, which solved it. That's the kind of context you don't get from static docs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v634tgjaxac9bvsaukz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2v634tgjaxac9bvsaukz.png" alt="Github" width="800" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Sliplane MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For: Testing deployments directly from chat&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After gathering the docs from Docker Hub and GitHub, I often hand off the problem to Claude, powered by a &lt;a href="https://docs.sliplane.io/mcp/getting-started/" rel="noopener noreferrer"&gt;Sliplane MCP server&lt;/a&gt; that wraps our API. It spins up a real server, deploys the container, collects logs, and tries to reason its way to a working setup. Yes, you're basically vibe coding cloud infra. Do it at your own risk :D&lt;/p&gt;

&lt;p&gt;This lets me treat deployment debugging as an interactive process. Claude can read the logs and then automatically fix the deployment, a feedback loop I don't have to drive myself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzk3ivggubsmfu08rszw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzk3ivggubsmfu08rszw.png" alt="Sliplane MCP" width="800" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It doesn't always get things perfect. But it gets me most of the way there while I move on to the next support ticket. I still talk to customers manually, but the underlying guesswork gets automated.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Dev.to MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For: Publishing blog post drafts directly from Markdown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once a weird issue is solved, I turn it into content. That's part of our growth loop at &lt;a href="https://sliplane.io?utm_source=mcp-servers-founder" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt;. I write Markdown posts like "&lt;a href="https://sliplane.io/blog/self-hosting-qdrant-with-docker-on-ubuntu-server" rel="noopener noreferrer"&gt;How to self-host Qdrant with Docker&lt;/a&gt;," and then use the &lt;a href="https://github.com/Arindam200/devto-mcp" rel="noopener noreferrer"&gt;unofficial Dev.to MCP server&lt;/a&gt; to create a draft on &lt;a href="https://dev.to"&gt;dev.to&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It doesn't publish the post automatically since I still tweak images and layout, but the draft is created with tags, title, and metadata pulled from the file. It's a small but useful automation that saves me time every week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use MCP servers?
&lt;/h2&gt;

&lt;p&gt;The real power isn't that Claude with MCP servers is necessarily faster than doing things manually. If I sat down and debugged a Docker deployment myself, I might even be quicker. But that's not the point. What I often do is specify the problem using speech-to-text, hand it off to Claude, and then move on to something else. Claude runs in the background, trying different approaches, reading docs, and debugging issues while I handle customer calls or write code. I can come back whenever I have time and see what progress was made. It's like having a junior developer who never gets tired and can work on the boring stuff while you focus on what matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Words
&lt;/h2&gt;

&lt;p&gt;None of these are flashy. They don't write love poems or send Slack messages. But they help me support users, debug real deployments, and create useful content without leaving the terminal. &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt; can be powerful. You just need to give them real work :D&lt;/p&gt;

&lt;p&gt;If you're interested in trying out MCP servers yourself, check out the &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;official MCP repository&lt;/a&gt; for more examples and documentation. And if you need a place to host your Docker containers, give &lt;a href="https://sliplane.io?utm_source=mcp-servers-founder" rel="noopener noreferrer"&gt;Sliplane&lt;/a&gt; a try! Now that &lt;a href="https://docs.sliplane.io/mcp/getting-started/" rel="noopener noreferrer"&gt;you can vibe-code cloud infra&lt;/a&gt;, there's no excuse not to try self-hosting!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder &lt;a href="https://sliplane.io?utm_source=mcp-servers-founder" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>startup</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Self-hosting Qdrant the easy way</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Sat, 12 Jul 2025 10:32:58 +0000</pubDate>
      <link>https://dev.to/code42cate/self-hosting-qdrant-the-easy-way-1h29</link>
      <guid>https://dev.to/code42cate/self-hosting-qdrant-the-easy-way-1h29</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/qdrant/qdrant" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; is one of the most popular open-source vector databases for AI and semantic search applications. Whether you're building RAG applications, recommendation systems, or semantic search engines, Qdrant provides high-performance vector similarity search with advanced filtering capabilities.&lt;/p&gt;

&lt;p&gt;While the official Qdrant Cloud offering is convenient, it can get expensive fast, especially for production workloads. The good news? You can self-host Qdrant and get the same powerful features! In this tutorial, we'll show you how to deploy your own Qdrant instance on Sliplane for only &lt;strong&gt;€9 per month&lt;/strong&gt; with virtually no limitations and full control over your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Self-Host Qdrant?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Savings&lt;/strong&gt;: Save 70%+ compared to managed solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy&lt;/strong&gt;: Keep your vectors and metadata on your own infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Control&lt;/strong&gt;: Configure Qdrant exactly how you need it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Vendor Lock-in&lt;/strong&gt;: Migrate your data anytime without restrictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Easily upgrade your server as your needs grow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you prefer watching a video, here is a 45-second guide on how to deploy Qdrant:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://player.vimeo.com/video/1098997224" width="710" height="399"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Deployment Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create Your Sliplane Account
&lt;/h3&gt;

&lt;p&gt;Sign up at &lt;a href="https://sliplane.io?utm_source=self-hosting-qdrant" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt; for free. You can use your GitHub account for quick registration. New users get a 48-hour trial server to test everything out!&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Set Up Your Server
&lt;/h3&gt;

&lt;p&gt;If you just signed up, you'll have a trial server ready to use. Otherwise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Servers&lt;/strong&gt; in your dashboard&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create Server&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose the "Base" instance (2 vCPU, 2GB RAM) which is perfect for most Qdrant workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Deploy Qdrant Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to your project (create a new one or use the default)&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Deploy Service&lt;/strong&gt; (top right corner)&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Qdrant preset&lt;/strong&gt; from the available options&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Configure Security Settings
&lt;/h3&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Critical Step&lt;/strong&gt;: Before deploying, you'll see a random API key automatically generated in the environment variables. &lt;strong&gt;Save this API key immediately&lt;/strong&gt; as you'll need it to authenticate all requests to your Qdrant instance!&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Launch and Access
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;strong&gt;Deploy&lt;/strong&gt; and wait for the service to start (usually takes 1-2 minutes)&lt;/li&gt;
&lt;li&gt;Once running, your Qdrant instance will be available at: &lt;code&gt;your-service-name.sliplane.app&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The web UI will be accessible at: &lt;code&gt;https://your-service-name.sliplane.app/dashboard&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Test Your Connection
&lt;/h3&gt;

&lt;p&gt;You can quickly test your Qdrant instance using curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s1"&gt;'https://your-service-name.sliplane.app/collections'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'api-key: YOUR_SAVED_API_KEY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get an empty collections response, confirming your instance is working!&lt;/p&gt;
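&lt;p&gt;The same check in Python, if you'd rather script it, using only the standard library (the hostname and API key are placeholders for your own deployment):&lt;/p&gt;

```python
# The same request as the curl example, built with Python's standard library.
# "your-service-name" and the API key are placeholders for your deployment.
import urllib.request

BASE_URL = "https://your-service-name.sliplane.app"
API_KEY = "YOUR_SAVED_API_KEY"

req = urllib.request.Request(
    f"{BASE_URL}/collections",
    headers={"api-key": API_KEY},  # Qdrant reads the key from this header
)

# Against a fresh instance this returns an empty collections list:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())  # b'{"result":{"collections":[]},"status":"ok",...}'

print(req.full_url)  # https://your-service-name.sliplane.app/collections
```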

&lt;h2&gt;
  
  
  Cost Comparison: Why Sliplane Wins
&lt;/h2&gt;

&lt;p&gt;Here's how Sliplane compares to other hosting options for running Qdrant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Provider&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;vCPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Cost&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Setup Complexity&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qdrant Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;$29–$49&lt;/td&gt;
&lt;td&gt;⭐ Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS ECS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;$40–$60&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ Complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Render.com&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;$35–$45&lt;/td&gt;
&lt;td&gt;⭐⭐ Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fly.io&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;$20–$25&lt;/td&gt;
&lt;td&gt;⭐⭐ Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Railway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;$15–$66*&lt;/td&gt;
&lt;td&gt;⭐⭐ Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sliplane&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;40 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;€9&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐ Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Railway charges based on actual usage; $66 is the maximum possible cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Choose Sliplane?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictable Costs&lt;/strong&gt;: Flat €9/month with no usage surprises&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Setup&lt;/strong&gt;: Deploy Qdrant in under 5 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;European Hosting&lt;/strong&gt;: GDPR-compliant with excellent latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Included SSL&lt;/strong&gt;: HTTPS certificates automatically managed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Hidden Fees&lt;/strong&gt;: 2TB bandwidth included, transparent pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is self-hosted Qdrant the same as Qdrant Cloud?
&lt;/h3&gt;

&lt;p&gt;Self-hosted Qdrant gives you all the core features of the open-source version, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-performance vector search&lt;/li&gt;
&lt;li&gt;Advanced filtering and hybrid search&lt;/li&gt;
&lt;li&gt;Clustering and replication&lt;/li&gt;
&lt;li&gt;REST and gRPC APIs&lt;/li&gt;
&lt;li&gt;Web dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qdrant Cloud offers additional managed features like auto-scaling. For most use cases, self-hosting provides everything you need at a fraction of the cost. Check the &lt;a href="https://qdrant.tech/documentation/" rel="noopener noreferrer"&gt;official Qdrant documentation&lt;/a&gt; for feature comparisons.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the bandwidth costs on Sliplane?
&lt;/h3&gt;

&lt;p&gt;Compute costs are &lt;strong&gt;always flat and predictable&lt;/strong&gt; with no surprises! You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2TB bandwidth included&lt;/strong&gt; in your €9/month plan&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;€2/TB&lt;/strong&gt; for additional bandwidth (€14.8/TB in Singapore)&lt;/li&gt;
&lt;li&gt;No charges for CPU usage or memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How do I update my Qdrant version?
&lt;/h3&gt;

&lt;p&gt;Updating is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If using the &lt;code&gt;latest&lt;/code&gt; tag: Click &lt;strong&gt;Redeploy&lt;/strong&gt; in your Sliplane dashboard&lt;/li&gt;
&lt;li&gt;For specific versions: Update the image tag in service settings and redeploy&lt;/li&gt;
&lt;li&gt;Your data persists across updates thanks to volume mounting&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Can I migrate from Qdrant Cloud to self-hosted?
&lt;/h3&gt;

&lt;p&gt;Yes! Qdrant provides built-in backup and restore functionality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Export your collections using Qdrant's snapshot API&lt;/li&gt;
&lt;li&gt;Deploy your self-hosted instance on Sliplane&lt;/li&gt;
&lt;li&gt;Import your data using the restore API&lt;/li&gt;
&lt;li&gt;Update your application endpoints&lt;/li&gt;
&lt;/ol&gt;
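&lt;p&gt;As a rough sketch of steps 1 and 3 (endpoint paths per Qdrant's snapshot API docs; the hosts, key, collection, and snapshot URL are placeholders):&lt;/p&gt;

```python
# Sketch of steps 1 and 3 via Qdrant's snapshot HTTP API (requests are built
# but not sent; hosts, key, collection, and snapshot URL are placeholders).
import json
import urllib.request

CLOUD_URL = "https://your-cluster.cloud.qdrant.io"   # source (managed) instance
SELF_HOSTED = "https://your-service-name.sliplane.app"
API_KEY = "YOUR_SAVED_API_KEY"
COLLECTION = "my_vectors"

# Step 1: create a snapshot of the collection on the source instance.
create_snapshot = urllib.request.Request(
    f"{CLOUD_URL}/collections/{COLLECTION}/snapshots",
    method="POST",
    headers={"api-key": API_KEY},
)

# Step 3: point the self-hosted instance at the snapshot to restore it.
recover = urllib.request.Request(
    f"{SELF_HOSTED}/collections/{COLLECTION}/snapshots/recover",
    method="PUT",
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    data=json.dumps({"location": "<snapshot-url>"}).encode(),
)

print(create_snapshot.get_method(), recover.get_method())  # POST PUT
```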

&lt;h3&gt;
  
  
  What if I need more resources?
&lt;/h3&gt;

&lt;p&gt;Sliplane makes scaling easy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertical scaling&lt;/strong&gt;: Upgrade to larger instances (Medium: €24, Large: €44)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal scaling&lt;/strong&gt;: Deploy multiple Qdrant nodes with clustering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage expansion&lt;/strong&gt;: Add persistent volumes as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Is my data secure?
&lt;/h3&gt;

&lt;p&gt;Absolutely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API key authentication&lt;/strong&gt; protects your instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTPS encryption&lt;/strong&gt; for all traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;European data centers&lt;/strong&gt; with GDPR compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private networking&lt;/strong&gt; between your services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated backups&lt;/strong&gt; once per day&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Ready to deploy your own Qdrant instance? &lt;a href="https://sliplane.io?utm_source=self-hosting-qdrant" rel="noopener noreferrer"&gt;Sign up for Sliplane&lt;/a&gt; and have your vector database running in minutes!&lt;/p&gt;

&lt;p&gt;Need help with your deployment? Use our support chat (bottom right corner) or check out our &lt;a href="https://docs.sliplane.io" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more guides and tutorials.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Did Elon Musk just invent AGI? Everything you need to know about Grok 4 and how to try it out</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Thu, 10 Jul 2025 08:23:19 +0000</pubDate>
      <link>https://dev.to/code42cate/did-elon-musk-just-invent-agi-everything-you-need-to-know-about-grok-4-and-how-to-try-it-out-2p6m</link>
      <guid>https://dev.to/code42cate/did-elon-musk-just-invent-agi-everything-you-need-to-know-about-grok-4-and-how-to-try-it-out-2p6m</guid>
      <description>&lt;p&gt;Elon Musk’s AI company &lt;strong&gt;xAI&lt;/strong&gt; just dropped a bombshell: &lt;strong&gt;Grok 4&lt;/strong&gt; is here — and it’s fast, smart, and already topping the charts. Some even say its AGI.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1943158495588815072-471" src="https://platform.twitter.com/embed/Tweet.html?id=1943158495588815072"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Grok 4?
&lt;/h3&gt;

&lt;p&gt;Grok is xAI’s answer to ChatGPT, Claude, and Gemini. It's multi-modal, API-accessible, and now available in two flavors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grok 4&lt;/strong&gt; (the base model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok 4 Heavy&lt;/strong&gt; (a multi-agent powerhouse that thinks in parallel)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;xAI says it performs &lt;em&gt;better than PhD level&lt;/em&gt; on academic tasks - Musk's words, not mine ;)&lt;/p&gt;




&lt;h3&gt;
  
  
  Benchmark showdown
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1vwhl43c43rkqz3he8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1vwhl43c43rkqz3he8c.png" alt="ARC AGI LEADERBOARD" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grok 4 is already outperforming most models in the wild:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Grok 4&lt;/th&gt;
&lt;th&gt;Grok 4 Heavy&lt;/th&gt;
&lt;th&gt;o3 (OpenAI)&lt;/th&gt;
&lt;th&gt;Gemini 2.5 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Humanity’s Last Exam&lt;/strong&gt; (no tools)&lt;/td&gt;
&lt;td&gt;25.4%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;21%&lt;/td&gt;
&lt;td&gt;21.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Humanity’s Last Exam&lt;/strong&gt; (with tools)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;44.4%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;26.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ARC-AGI-2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~8%&lt;/td&gt;
&lt;td&gt;~6%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;🧠 That’s &lt;strong&gt;state-of-the-art&lt;/strong&gt; territory — especially the ARC-AGI score, nearly &lt;strong&gt;2x&lt;/strong&gt; the nearest competitor. Say what you want about Elon Musk, but that is impressive.&lt;/p&gt;




&lt;h3&gt;
  
  
  How much does it cost?
&lt;/h3&gt;

&lt;p&gt;Using Grok 4 via &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; is easy — but not cheap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Grok 4&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;code&gt;$3 / million tokens&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;$15 / million tokens&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Compare that to OpenAI’s o3:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;code&gt;$2 / M&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;$8 / M&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
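&lt;p&gt;A quick back-of-the-envelope comparison at the rates quoted above (the &lt;code&gt;openai-o3&lt;/code&gt; key here is just a label for this sketch):&lt;/p&gt;

```python
# Cost per request at the quoted rates (USD per million tokens).
PRICES = {
    "x-ai/grok-4": {"input": 3.0, "output": 15.0},
    "openai-o3":   {"input": 2.0, "output": 8.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical chat turn: 10k tokens in, 2k tokens out.
print(request_cost("x-ai/grok-4", 10_000, 2_000))  # 0.06
print(request_cost("openai-o3", 10_000, 2_000))    # 0.036
```

&lt;p&gt;So at these rates Grok 4 runs roughly 1.7x the price of o3 for a typical request, driven mostly by the output tokens.&lt;/p&gt;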

&lt;p&gt;There’s also a $300/month &lt;strong&gt;SuperGrok Heavy&lt;/strong&gt; plan for early access to Grok 4 Heavy, new agents, coding models, and even video generation later this year.&lt;/p&gt;




&lt;h3&gt;
  
  
  Try Grok 4 via OpenRouter
&lt;/h3&gt;

&lt;p&gt;Want to test it yourself? Here’s a simple code snippet using &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;OPENROUTER_API_KEY&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-ai/grok-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is in this image?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, multimodal! &lt;strong&gt;Image in, answer out&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;Between wild benchmark results and Musk’s usual hype, Grok 4 is shaping up to be a serious contender. Whether it holds up in the real world — or just on X.com — is still TBD.&lt;/p&gt;

&lt;p&gt;But if you want to play with what might be the &lt;strong&gt;most powerful public model today&lt;/strong&gt;, it’s already live via OpenRouter.&lt;/p&gt;

&lt;p&gt;Let’s see what it can do.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder of &lt;a href="https://sliplane.io?utm_source=xai-just-released-grok-4" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://x.com/xai" rel="noopener noreferrer"&gt;xAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/xai/status/1943158495588815072" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2025/07/09/elon-musks-xai-launches-grok-4-alongside-a-300-monthly-subscription/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>news</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Move Over LLaMA: Tencent's New Open LLM is Ready to Self-Host</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Sun, 29 Jun 2025 00:55:47 +0000</pubDate>
      <link>https://dev.to/code42cate/move-over-llama-tencents-new-open-llm-is-ready-to-self-host-a73</link>
      <guid>https://dev.to/code42cate/move-over-llama-tencents-new-open-llm-is-ready-to-self-host-a73</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/Tencent" rel="noopener noreferrer"&gt;Tencent&lt;/a&gt; just released a new open-source model called &lt;strong&gt;Hunyuan-A13B-Instruct&lt;/strong&gt;. It has open weights (not sure about code), and it runs locally (well if you have a B200 GPU). If you're curious about how it performs and want to try it out yourself, here's how to set it up on a rented GPU in a few minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Hunyuan-A13B?
&lt;/h2&gt;

&lt;p&gt;Hunyuan-A13B is a &lt;strong&gt;Mixture-of-Experts (MoE)&lt;/strong&gt; model with &lt;strong&gt;80 billion total parameters&lt;/strong&gt;, but only &lt;strong&gt;13 billion active&lt;/strong&gt; at a time. This means inference is much cheaper than a full dense model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mixture-of-Experts (MoE) is a neural network architecture where only a subset of specialized "expert" sub-networks are activated for each input, reducing computation while increasing model capacity. A gating mechanism dynamically selects which experts to use based on the input, allowing the model to scale efficiently without always using all parameters.&lt;/p&gt;
&lt;/blockquote&gt;
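&lt;p&gt;A toy illustration of that gating idea (not Hunyuan's actual router): the gate scores every expert, but only the top-k run, so compute scales with k rather than with the total expert count.&lt;/p&gt;

```python
# Toy MoE gating sketch: a linear gate scores every expert, only the top-k
# experts are evaluated, and their outputs are mixed by renormalized score.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x to the top-k experts and mix their outputs by gate score."""
    scores = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights])
    top_k = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top_k)
    # Only k experts actually run; the remaining parameters stay idle.
    return sum(scores[i] / total * experts[i](x) for i in top_k)

# Four tiny "experts": each just scales the first input feature differently.
experts = [lambda x, f=f: f * x[0] for f in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]

out = moe_forward([1.0, 0.0], experts, gate_weights, k=2)
print(round(out, 3))
```

&lt;p&gt;In Hunyuan-A13B the same principle is what lets an 80B-parameter model run with only 13B parameters active per token.&lt;/p&gt;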

&lt;p&gt;Some highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supports 256K context&lt;/strong&gt; out of the box&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fast and slow thinking modes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grouped Query Attention (GQA)&lt;/strong&gt; for more efficient inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-oriented&lt;/strong&gt; tuning, with benchmark results on BFCL-v3 and τ-Bench&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantization support&lt;/strong&gt;, including GPTQ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, it looks like a solid candidate for local experimentation, especially for long-context or agent-type tasks. I'm still testing how it compares to other models like LLaMA 3, Mixtral, and Claude 3.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Spin Up a RunPod Instance
&lt;/h2&gt;

&lt;p&gt;The easiest way to try it is &lt;a href="https://runpod.io?ref=36oe9u9g" rel="noopener noreferrer"&gt;RunPod&lt;/a&gt; (this link gives you between $5 and $500 in credits!). You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;300 GB network volume&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;B200 GPU&lt;/strong&gt; (I don't think anything smaller works; you need ~150GB of VRAM)&lt;/li&gt;
&lt;li&gt;A supported &lt;strong&gt;PyTorch image&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Create a Network Volume
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Region: use one where B200 is available (currently &lt;code&gt;eu-ro-1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Size: 300 GB&lt;/li&gt;
&lt;li&gt;Cost: around &lt;strong&gt;$21/month&lt;/strong&gt; (billed even if unused)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bstqukaq70q4abb0wsm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bstqukaq70q4abb0wsm.png" alt="network" width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a Pod
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU type: &lt;strong&gt;B200&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Image: &lt;code&gt;runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04&lt;/code&gt;&lt;br&gt;
⚠️ Earlier versions didn't work in my testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPU Count: 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable SSH + Jupyter&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attach your network volume&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqx3rx4z2vcnxkzobo81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqx3rx4z2vcnxkzobo81.png" alt="gpu type" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdg9zxge6qplo23pe400.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdg9zxge6qplo23pe400.png" alt="config" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Install Dependencies
&lt;/h2&gt;

&lt;p&gt;In a Jupyter notebook cell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install transformers tiktoken accelerate gptqmodel optimum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Load the Model
&lt;/h2&gt;

&lt;p&gt;Set the cache path so that downloads go to the mounted volume instead of the default root directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HF_HOME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/workspace/hf-cache&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;#
&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tencent/Hunyuan-A13B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;local_files_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/workspace/hf-cache/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;local_files_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What does the frog say?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;tokenized_chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                  &lt;span class="n"&gt;enable_thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# Toggle thinking mode (default: True)
&lt;/span&gt;                                              &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;output_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First run will download ~150 GB of weights&lt;/li&gt;
&lt;li&gt;VRAM usage is ~153 GB during inference&lt;/li&gt;
&lt;li&gt;Loading into VRAM takes a few minutes&lt;/li&gt;
&lt;li&gt;If GPU util (not just VRAM) goes up, it's running&lt;/li&gt;
&lt;li&gt;You can set &lt;code&gt;device_map="cpu"&lt;/code&gt; if testing on CPU only. Make sure you have around 200 GB of RAM and a good CPU :D&lt;/li&gt;
&lt;/ul&gt;
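&lt;p&gt;The ~150 GB figure lines up with a quick back-of-the-envelope check: bfloat16 stores each parameter in 2 bytes, and Hunyuan-A13B has roughly 80B total parameters (13B active per token). A minimal sketch, assuming that parameter count:&lt;/p&gt;

```python
# Rough size of bfloat16 model weights: 2 bytes per parameter.
# Assumed: Hunyuan-A13B has ~80B total parameters (13B active per token).
total_params = 80e9
bytes_per_param = 2  # bfloat16

size_gb = total_params * bytes_per_param / 1e9     # decimal gigabytes
size_gib = total_params * bytes_per_param / 2**30  # binary gibibytes

print(f"~{size_gb:.0f} GB ({size_gib:.0f} GiB) of weights")
```

&lt;p&gt;On top of the weights you still need headroom for the KV cache and activations during generation, which is why peak VRAM sits a bit above the raw weight size.&lt;/p&gt;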




&lt;h2&gt;
  
  
  Costs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;B200 pod: &lt;strong&gt;$6.39/hour&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Network volume: &lt;strong&gt;$21/month&lt;/strong&gt;, even if unused&lt;/li&gt;
&lt;li&gt;Suggestion: shut the pod down when not in use x)&lt;/li&gt;
&lt;/ul&gt;
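&lt;p&gt;To put that in numbers, here is a quick, hypothetical estimate comparing a month of occasional experimentation (say, 20 hours of pod time) against leaving the pod running:&lt;/p&gt;

```python
# Hypothetical monthly cost estimate for the setup above.
pod_rate = 6.39        # B200 pod, $/hour
volume_monthly = 21.0  # network volume, flat $/month even if unused

hours_used = 20        # assumed: a month of occasional experimentation
on_demand = pod_rate * hours_used + volume_monthly
always_on = pod_rate * 24 * 30 + volume_monthly  # 30-day month, never shut down

print(f"shut down when idle: ${on_demand:.2f}/month")
print(f"left running 24/7:   ${always_on:.2f}/month")
```

&lt;p&gt;The gap is large enough that shutting the pod down between sessions is really the whole cost strategy.&lt;/p&gt;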




&lt;h2&gt;
  
  
  Tooling Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llama.cpp&lt;/code&gt; support is not there yet. PR in progress: &lt;a href="https://github.com/ggml-org/llama.cpp/pull/14425" rel="noopener noreferrer"&gt;#14425&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Works fine in Python with &lt;code&gt;transformers&lt;/code&gt; and &lt;code&gt;bfloat16&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benchmark
&lt;/h2&gt;

&lt;p&gt;The official benchmarks are available on Hugging Face and were evaluated with the TRT-LLM backend.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Hunyuan-Large&lt;/th&gt;
&lt;th&gt;Qwen2.5-72B&lt;/th&gt;
&lt;th&gt;Qwen3-A22B&lt;/th&gt;
&lt;th&gt;Hunyuan-A13B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MMLU&lt;/td&gt;
&lt;td&gt;88.40&lt;/td&gt;
&lt;td&gt;86.10&lt;/td&gt;
&lt;td&gt;87.81&lt;/td&gt;
&lt;td&gt;88.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU-Pro&lt;/td&gt;
&lt;td&gt;60.20&lt;/td&gt;
&lt;td&gt;58.10&lt;/td&gt;
&lt;td&gt;68.18&lt;/td&gt;
&lt;td&gt;67.23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU-Redux&lt;/td&gt;
&lt;td&gt;87.47&lt;/td&gt;
&lt;td&gt;83.90&lt;/td&gt;
&lt;td&gt;87.40&lt;/td&gt;
&lt;td&gt;87.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BBH&lt;/td&gt;
&lt;td&gt;86.30&lt;/td&gt;
&lt;td&gt;85.80&lt;/td&gt;
&lt;td&gt;88.87&lt;/td&gt;
&lt;td&gt;87.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SuperGPQA&lt;/td&gt;
&lt;td&gt;38.90&lt;/td&gt;
&lt;td&gt;36.20&lt;/td&gt;
&lt;td&gt;44.06&lt;/td&gt;
&lt;td&gt;41.32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EvalPlus&lt;/td&gt;
&lt;td&gt;75.69&lt;/td&gt;
&lt;td&gt;65.93&lt;/td&gt;
&lt;td&gt;77.60&lt;/td&gt;
&lt;td&gt;78.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MultiPL-E&lt;/td&gt;
&lt;td&gt;59.13&lt;/td&gt;
&lt;td&gt;60.50&lt;/td&gt;
&lt;td&gt;65.94&lt;/td&gt;
&lt;td&gt;69.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MBPP&lt;/td&gt;
&lt;td&gt;72.60&lt;/td&gt;
&lt;td&gt;76.00&lt;/td&gt;
&lt;td&gt;81.40&lt;/td&gt;
&lt;td&gt;83.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CRUX-I&lt;/td&gt;
&lt;td&gt;57.00&lt;/td&gt;
&lt;td&gt;57.63&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;70.13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CRUX-O&lt;/td&gt;
&lt;td&gt;60.63&lt;/td&gt;
&lt;td&gt;66.20&lt;/td&gt;
&lt;td&gt;79.00&lt;/td&gt;
&lt;td&gt;77.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MATH&lt;/td&gt;
&lt;td&gt;69.80&lt;/td&gt;
&lt;td&gt;62.12&lt;/td&gt;
&lt;td&gt;71.84&lt;/td&gt;
&lt;td&gt;72.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CMATH&lt;/td&gt;
&lt;td&gt;91.30&lt;/td&gt;
&lt;td&gt;84.80&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;91.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GSM8k&lt;/td&gt;
&lt;td&gt;92.80&lt;/td&gt;
&lt;td&gt;91.50&lt;/td&gt;
&lt;td&gt;94.39&lt;/td&gt;
&lt;td&gt;91.83&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA&lt;/td&gt;
&lt;td&gt;25.18&lt;/td&gt;
&lt;td&gt;45.90&lt;/td&gt;
&lt;td&gt;47.47&lt;/td&gt;
&lt;td&gt;49.12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Hunyuan-A13B-Instruct has achieved highly competitive performance across multiple benchmarks, particularly in mathematics, science, agent domains, and more. We compared it with several powerful models, and the results are shown below. - Tencent&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Topic&lt;/th&gt;
&lt;th&gt;Bench&lt;/th&gt;
&lt;th&gt;OpenAI-o1-1217&lt;/th&gt;
&lt;th&gt;DeepSeek R1&lt;/th&gt;
&lt;th&gt;Qwen3-A22B&lt;/th&gt;
&lt;th&gt;Hunyuan-A13B-Instruct&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mathematics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AIME 2024&lt;br&gt;AIME 2025&lt;br&gt;MATH&lt;/td&gt;
&lt;td&gt;74.3&lt;br&gt;79.2&lt;br&gt;96.4&lt;/td&gt;
&lt;td&gt;79.8&lt;br&gt;70&lt;br&gt;94.9&lt;/td&gt;
&lt;td&gt;85.7&lt;br&gt;81.5&lt;br&gt;94.0&lt;/td&gt;
&lt;td&gt;87.3&lt;br&gt;76.8&lt;br&gt;94.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Science&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPQA-Diamond&lt;br&gt;OlympiadBench&lt;/td&gt;
&lt;td&gt;78&lt;br&gt;83.1&lt;/td&gt;
&lt;td&gt;71.5&lt;br&gt;82.4&lt;/td&gt;
&lt;td&gt;71.1&lt;br&gt;85.7&lt;/td&gt;
&lt;td&gt;71.2&lt;br&gt;82.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Livecodebench&lt;br&gt;Fullstackbench&lt;br&gt;ArtifactsBench&lt;/td&gt;
&lt;td&gt;63.9&lt;br&gt;64.6&lt;br&gt;38.6&lt;/td&gt;
&lt;td&gt;65.9&lt;br&gt;71.6&lt;br&gt;44.6&lt;/td&gt;
&lt;td&gt;70.7&lt;br&gt;65.6&lt;br&gt;44.6&lt;/td&gt;
&lt;td&gt;63.9&lt;br&gt;67.8&lt;br&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BBH&lt;br&gt;DROP&lt;br&gt;ZebraLogic&lt;/td&gt;
&lt;td&gt;80.4&lt;br&gt;90.2&lt;br&gt;81&lt;/td&gt;
&lt;td&gt;83.7&lt;br&gt;92.2&lt;br&gt;78.7&lt;/td&gt;
&lt;td&gt;88.9&lt;br&gt;90.3&lt;br&gt;80.3&lt;/td&gt;
&lt;td&gt;89.1&lt;br&gt;91.1&lt;br&gt;84.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Instruction&lt;br&gt;Following&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IF-Eval&lt;br&gt;SysBench&lt;/td&gt;
&lt;td&gt;91.8&lt;br&gt;82.5&lt;/td&gt;
&lt;td&gt;88.3&lt;br&gt;77.7&lt;/td&gt;
&lt;td&gt;83.4&lt;br&gt;74.2&lt;/td&gt;
&lt;td&gt;84.7&lt;br&gt;76.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text&lt;br&gt;Creation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LengthCtrl&lt;br&gt;InsCtrl&lt;/td&gt;
&lt;td&gt;60.1&lt;br&gt;74.8&lt;/td&gt;
&lt;td&gt;55.9&lt;br&gt;69&lt;/td&gt;
&lt;td&gt;53.3&lt;br&gt;73.7&lt;/td&gt;
&lt;td&gt;55.4&lt;br&gt;71.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NLU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ComplexNLU&lt;br&gt;Word-Task&lt;/td&gt;
&lt;td&gt;64.7&lt;br&gt;67.1&lt;/td&gt;
&lt;td&gt;64.5&lt;br&gt;76.3&lt;/td&gt;
&lt;td&gt;59.8&lt;br&gt;56.4&lt;/td&gt;
&lt;td&gt;61.2&lt;br&gt;62.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BFCL v3&lt;br&gt;τ-Bench&lt;br&gt;ComplexFuncBench&lt;br&gt;C3-Bench&lt;/td&gt;
&lt;td&gt;67.8&lt;br&gt;60.4&lt;br&gt;47.6&lt;br&gt;58.8&lt;/td&gt;
&lt;td&gt;56.9&lt;br&gt;43.8&lt;br&gt;41.1&lt;br&gt;55.3&lt;/td&gt;
&lt;td&gt;70.8&lt;br&gt;44.6&lt;br&gt;40.6&lt;br&gt;51.7&lt;/td&gt;
&lt;td&gt;78.3&lt;br&gt;54.7&lt;br&gt;61.2&lt;br&gt;63.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is one of the more interesting open MoE models out right now. It supports long contexts, makes some thoughtful design choices, and is easy enough to run. I'm still evaluating how good it actually is, especially compared to Mistral's Magistral and other recent models. If you want to test it yourself, this setup gets you going quickly.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder of &lt;a href="https://sliplane.io?utm_source=move-over-llama-tencent-open-llm-self-host" rel="noopener noreferrer"&gt;sliplane.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>news</category>
      <category>python</category>
    </item>
    <item>
      <title>Cloudflare just released Containers: here's everything you need to know</title>
      <dc:creator>Jonas Scholz</dc:creator>
      <pubDate>Wed, 25 Jun 2025 01:06:06 +0000</pubDate>
      <link>https://dev.to/code42cate/cloudflare-just-released-containers-heres-everything-you-need-to-know-26fi</link>
      <guid>https://dev.to/code42cate/cloudflare-just-released-containers-heres-everything-you-need-to-know-26fi</guid>
      <description>&lt;p&gt;Cloudflare Containers let you run &lt;strong&gt;any Docker image&lt;/strong&gt; on Cloudflare's 300-plus edge locations.&lt;br&gt;
You control them with a few lines of JavaScript in a Worker, they scale to zero, and you're billed in 10 ms slices while they're awake.&lt;/p&gt;

&lt;p&gt;They sit in the gap between:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Trade-offs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workers (today)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sub-ms startup, worldwide&lt;/td&gt;
&lt;td&gt;V8 only, 128 MB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Always-on PaaS (e.g. &lt;a href="https://sliplane.com?utm_source=cloudflare-containers" rel="noopener noreferrer"&gt;sliplane&lt;/a&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple, predictable&lt;/td&gt;
&lt;td&gt;You pay 24/7, even when idle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DIY Kubernetes / Fargate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full control at scale&lt;/td&gt;
&lt;td&gt;Cluster, LB, IAM overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloudflare Containers bring the edge reach and pay-for-use pricing of Workers to workloads that need a full Linux sandbox.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why would I care?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Native binaries or full FS, so you can run FFmpeg, Pandas, or AI toolchains.&lt;/li&gt;
&lt;li&gt;Languages beyond JS or Wasm, such as Go, Rust, Python, Java, Ruby, or anything your Dockerfile holds.&lt;/li&gt;
&lt;li&gt;Bigger resource envelope, with up to 4 GiB RAM and half a vCPU per instance (larger sizes are planned).&lt;/li&gt;
&lt;li&gt;Per-tenant state, with one container per Durable-Object ID for sticky sessions.&lt;/li&gt;
&lt;li&gt;Burst-heavy jobs, such as cron, code evaluation, or on-demand video export.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your code sleeps a lot, scaling to zero is better than paying for an always-on container (whether that is &lt;a href="https://sliplane.com?utm_source=cloudflare-containers" rel="noopener noreferrer"&gt;sliplane&lt;/a&gt;, a VPS, or a managed dyno).&lt;/p&gt;




&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Scaffold + deploy&lt;/span&gt;
npm create cloudflare@latest &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cloudflare/templates/containers-template
wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 2. Route requests in your Worker&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getRandom&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cloudflare/containers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;API&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Container&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;defaultPort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;sleepAfter&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;10m&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getRandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;API&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// simple round-robin helper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first hit is a cold start (about 2 to 3 seconds in beta). After that, the container stays warm until it has been idle for the duration set in &lt;code&gt;sleepAfter&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Under the hood, each container is coupled to a Durable Object that handles lifecycle and routing. There is no YAML, no nodes, just code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing snapshot
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Meter (Workers Paid, $5/mo)&lt;/th&gt;
&lt;th&gt;Free quota&lt;/th&gt;
&lt;th&gt;Over-quota rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;25 GiB-hours&lt;/td&gt;
&lt;td&gt;$0.0000025 / GiB-s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;375 vCPU-min&lt;/td&gt;
&lt;td&gt;$0.000020 / vCPU-s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk&lt;/td&gt;
&lt;td&gt;200 GB-hours&lt;/td&gt;
&lt;td&gt;$0.00000007 / GB-s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Instance sizes in beta are dev (256 MiB), basic (1 GiB), and standard (4 GiB). Larger sizes are coming.&lt;/p&gt;

&lt;p&gt;Assume a "standard" instance (4 GiB RAM, half a vCPU, 4 GB disk) that runs 24 × 7 for a 30-day month and ships 2 TB of traffic. This is a workload better suited to an always-on PaaS.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Meter&lt;/th&gt;
&lt;th&gt;Raw usage&lt;/th&gt;
&lt;th&gt;Free quota&lt;/th&gt;
&lt;th&gt;Billable&lt;/th&gt;
&lt;th&gt;Rate&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;4 GiB × 2 592 000 s = &lt;strong&gt;10 368 000 GiB-s&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;25 GiB-h = 90 000 GiB-s&lt;/td&gt;
&lt;td&gt;10 278 000 GiB-s&lt;/td&gt;
&lt;td&gt;$0.0000025 / GiB-s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$25.70&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;0.5 vCPU × 2 592 000 s = &lt;strong&gt;1 296 000 vCPU-s&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;375 vCPU-min = 22 500 vCPU-s&lt;/td&gt;
&lt;td&gt;1 273 500 vCPU-s&lt;/td&gt;
&lt;td&gt;$0.000020 / vCPU-s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$25.47&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk (ephemeral)&lt;/td&gt;
&lt;td&gt;4 GB × 2 592 000 s = 10 368 000 GB-s&lt;/td&gt;
&lt;td&gt;200 GB-h = 720 000 GB-s&lt;/td&gt;
&lt;td&gt;9 648 000 GB-s&lt;/td&gt;
&lt;td&gt;$0.00000007 / GB-s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.68&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Egress (NA/EU)&lt;/td&gt;
&lt;td&gt;2 TB = 2048 GB&lt;/td&gt;
&lt;td&gt;1 TB&lt;/td&gt;
&lt;td&gt;1024 GB&lt;/td&gt;
&lt;td&gt;$0.025 / GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$25.60&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Variable total: about $77.44 per month.&lt;br&gt;
Add the $5 Workers Paid subscription, and the total is about $82.44 all-in.&lt;/p&gt;
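&lt;p&gt;The table above is easy to reproduce yourself. A small sketch of the same arithmetic, using the rates and free quotas as listed and 2,592,000 seconds in a 30-day month:&lt;/p&gt;

```python
# Reproduce the monthly cost table for one "standard" instance
# (4 GiB RAM, 0.5 vCPU, 4 GB disk) running 24/7 with 2 TB NA/EU egress.
seconds = 30 * 24 * 3600  # 2,592,000 s in a 30-day month

memory = (4 * seconds - 25 * 3600) * 0.0000025    # GiB-s past the 25 GiB-h quota
cpu    = (0.5 * seconds - 375 * 60) * 0.000020    # vCPU-s past the 375 vCPU-min quota
disk   = (4 * seconds - 200 * 3600) * 0.00000007  # GB-s past the 200 GB-h quota
egress = (2048 - 1024) * 0.025                    # GB past the 1 TB free tier

variable_total = memory + cpu + disk + egress
all_in = variable_total + 5  # plus the $5 Workers Paid subscription

print(f"memory ${memory:.2f}, cpu ${cpu:.2f}, disk ${disk:.2f}, egress ${egress:.2f}")
print(f"variable ${variable_total:.2f}, all-in ${all_in:.2f}")
```

&lt;p&gt;Note how memory, CPU, and egress each contribute roughly equal thirds of the bill, while ephemeral disk is nearly free.&lt;/p&gt;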

&lt;p&gt;A comparable always-on PaaS instance (such as sliplane or a small VPS) might cost $7 to $15 per month flat, so for high-utilisation, bandwidth-heavy services, Cloudflare Containers can be five to ten times more expensive.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Rule of thumb: workloads that idle most of the day tend to cost less on Containers. Steady-state, high-utilisation services can still be cheaper on an always-on host like &lt;a href="https://sliplane.com?utm_source=cloudflare-containers" rel="noopener noreferrer"&gt;sliplane&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Current beta limits
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Manual scaling: you call &lt;code&gt;get(id)&lt;/code&gt; yourself. Autoscale and latency-aware routing are planned.&lt;/li&gt;
&lt;li&gt;Ephemeral disk, so you get a fresh FS after each sleep.&lt;/li&gt;
&lt;li&gt;40 GiB RAM and 20 vCPU account cap (temporary).&lt;/li&gt;
&lt;li&gt;Linux/amd64 only, no ARM support yet.&lt;/li&gt;
&lt;li&gt;No inbound TCP or UDP, since everything is proxied through a Worker HTTP call.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  When to pick Containers vs. an always-on PaaS
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Containers&lt;/th&gt;
&lt;th&gt;sliplane / always-on&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Edge-adjacent AI image generation, mostly idle&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24/7 REST API with over 70% utilisation&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✅ simpler, lower steady cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-tenant sandbox (one container per user)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database that needs persistent volumes&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A mixed model often wins. You can run your persistent database on sliplane (or similar), bursty compute on Cloudflare Containers, and connect them with a Worker.&lt;/p&gt;




&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Cloudflare just introduced what is essentially serverless Fargate at the edge: Docker images, millisecond billing, global points of presence, and no cluster busywork. If Workers' V8 box ever felt cramped, or your always-on container spends most of its time idle, try spinning up a beta Container and see what the edge can do.&lt;/p&gt;




&lt;p&gt;Happy hacking!&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Jonas, Co-Founder of &lt;a href="https://sliplane.com?utm_source=cloudflare-containers" rel="noopener noreferrer"&gt;sliplane&lt;/a&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>cloud</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
