Self-hosting made sense when we started.
We didn’t have a massive content library. Just a few product walkthroughs, some onboarding videos, and a couple of marketing clips embedded in the homepage. So I figured: why bring in another service when I could control the whole thing myself?
It felt clean. Efficient. “Real engineer” stuff.
I set up the usual stack: S3 buckets for storage, FFmpeg for encoding, CloudFront as the CDN, and some Lambda functions to glue it all together. Simple enough. Plus, it meant:
- I could fine-tune the encoding settings and generate only the resolutions we needed.
- We avoided adding yet another vendor dependency in the stack.
- Costs were minimal: just bandwidth and storage.
- It worked well enough for an MVP.
Back then, I believed owning the pipeline meant owning the quality.
And to be fair, it did work. Until it didn’t.
My stack: How I self-hosted video
Here’s exactly what my self-hosted video setup looked like: not aspirational, just what I made work. It grew out of necessity, piece by piece. And like most DIY pipelines, it got messy fast.
Storage & delivery
Every video file was uploaded to an Amazon S3 bucket with versioned keys and private access by default. I used presigned URLs for both upload and playback authorization.
To serve the files, I layered CloudFront on top with two key rules:
- Caching HLS .m3u8 playlists and .ts segments aggressively
- Using Lambda@Edge to attach short-lived access tokens and apply cache-busting for updated manifests

This gave us basic global delivery, but it required constant tuning, especially around signed URLs expiring too early or CDN invalidations missing the edge.
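The token logic itself was just an HMAC over a path and an expiry. Here’s a minimal sketch of the idea (a simplified illustration, not CloudFront’s actual signed-URL scheme, which uses RSA-signed policies):

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me"  # shared secret between the signer and the edge check

def sign_path(path, ttl_seconds=300, now=None):
    """Append a short-lived token to a playlist or segment path."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    msg = f"{path}:{expires}".encode()
    token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={token}"

def verify(path, expires, token, now=None):
    """Edge-side check: token matches the path and hasn't expired."""
    if (now if now is not None else time.time()) > expires:
        return False
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)
```

The pain point in practice: pick the TTL too short and players mid-stream start failing segment requests, which is exactly the “signed URLs expiring too early” problem above.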
Transcoding
Encoding was entirely manual. We used FFmpeg in two environments:
- For small files: run directly inside AWS Lambda (with a custom build)
- For anything larger: trigger an SQS job processed by EC2 spot instances
I created the ABR ladder myself using static presets for:
- 240p at ~400 kbps
- 480p at ~800 kbps
- 720p at ~1.5 Mbps
- 1080p at ~3 Mbps

A typical HLS master manifest looked like this:

```m3u8
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
240p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=854x480
480p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1920x1080
1080p/index.m3u8
```
Everything from ladder logic to segment duration was hand-tuned in shell scripts. And if I wanted DASH support? That was another FFmpeg pass and even more config.
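One rung of that ladder boiled down to an FFmpeg invocation like the following. This sketch just builds the argument list (the bitrates match the presets above; the flag choices are one reasonable configuration, not the exact script I ran):

```python
def hls_rendition_args(src, height, v_bitrate, out_dir, segment_seconds=6):
    """Build FFmpeg args for one HLS rendition of the ABR ladder."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",      # keep aspect ratio, force even width
        "-c:v", "libx264", "-b:v", v_bitrate,
        "-g", "48", "-keyint_min", "48",  # fixed keyframe interval for clean segment cuts
        "-c:a", "aac", "-b:a", "128k",
        "-hls_time", str(segment_seconds),
        "-hls_playlist_type", "vod",
        f"{out_dir}/index.m3u8",
    ]

# The static ladder from the presets above: (height, video bitrate)
LADDER = [(240, "400k"), (480, "800k"), (720, "1500k"), (1080, "3000k")]

def ladder_jobs(src):
    """One FFmpeg job per rendition; in my setup these ran as shell commands."""
    return [hls_rendition_args(src, h, br, f"{h}p") for h, br in LADDER]
```

Four separate encode passes per upload, and any change to segment duration or keyframe spacing meant regenerating everything.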
Embedding & playback
The frontend used hls.js for most browsers. I had a simple feature check for MediaSource support and a fallback to the native HTML5 video element for basic compatibility.
For older devices or odd edge cases (looking at you, Safari on iOS 12), I had a few hard-coded hacks to switch source formats or mute videos to work around autoplay restrictions.
Styling was done with a responsive container like:

```css
.aspect-ratio {
  position: relative;
  padding-top: 56.25%; /* 16:9 */
}
.aspect-ratio > video {
  position: absolute;
  top: 0; left: 0;
  width: 100%;
  height: 100%;
}
```
It wasn’t elegant, but it worked. Usually.
Upload workflow
This was a classic patchwork of serverless logic:
- Frontend calls our backend to request a presigned S3 URL.
- Client uploads the raw MP4 directly to S3.
- S3 triggers a Lambda that validates the upload and kicks off an encoding job (either inline for <100MB or queued to EC2).
- Once transcoding finished, we’d write the new .m3u8 manifests to an S3 path behind CloudFront and send a webhook to our app to mark the video as “ready.”
No queuing framework. No retries. If encoding failed, I got an alert. Or worse, a user complaint.
Monitoring video performance
This was practically non-existent.
We had no first-class playback analytics. No QoE metrics. Just:
- CloudWatch logs from encoding jobs
- S3 access logs (which we rarely parsed)
- Occasional bug reports from users: “The video won’t load,” “It’s blurry,” or “It works on Chrome, but not Firefox.”
Debugging meant cross-referencing player logs, CDN cache status, and FFmpeg outputs, often with no clear root cause.
What worked in the system (Until it didn’t)
To be fair, a lot of it did work.
Having full control over the pipeline meant I could decide exactly how videos were encoded. I could tune the bitrate ladders, segment durations, keyframe intervals, all of it.
Everything lived in my infra. Storage costs were transparent. Bandwidth usage was easy to track. I knew how much every video cost us in S3 and CloudFront down to the byte.
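The per-video math was simple enough to keep in a script. Something like this (the $/GB rates here are placeholders for illustration, not current AWS pricing):

```python
# Assumed unit prices for illustration only; check your actual AWS bill.
S3_STORAGE_PER_GB_MONTH = 0.023
CLOUDFRONT_EGRESS_PER_GB = 0.085

GB = 1024 ** 3

def monthly_video_cost(stored_bytes, delivered_bytes):
    """Rough monthly cost of one video: storage for all renditions plus egress."""
    storage = stored_bytes / GB * S3_STORAGE_PER_GB_MONTH
    egress = delivered_bytes / GB * CLOUDFRONT_EGRESS_PER_GB
    return round(storage + egress, 2)
```

Note which term dominates: storage is pennies, while delivery scales with every view, which is why the egress line is the one that crept up later.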
Integration into our dev workflow was smooth, maybe even too smooth. A build job would run FFmpeg, push assets to S3, invalidate CloudFront, and deploy updated embed links. It felt tightly integrated with the way we shipped everything else.
And honestly, for a while, it was good enough:
- Static demos on the homepage
- Product walkthroughs inside our app
- Internal how-to videos for the team
No complex personalization, no livestreaming, no real analytics requirements.
But as soon as we started relying on video for real product experiences, not just assets on the side, problems started showing up in the cracks.
Problems started accumulating
At some point, maintaining the stack became more work than building the product. Everything that seemed lightweight at first started stacking up slowly, then all at once.
Manual transcoding overhead
FFmpeg was powerful, but operating it at scale was another story.
Tuning encoding settings became a never-ending task. Every time a new mobile device or browser version shipped, I'd need to recheck compatibility, bitrate profiles, and playback behavior.
Even minor adjustments, like trying shorter segment durations for better latency, meant hours of testing and regeneration. One typo in a CLI flag, and you’d end up with broken manifests or unseekable video.
Playback inconsistencies
Things broke in weird, inconsistent ways:
- Safari started blocking autoplay unless videos were muted by default.
- Android devices failed to render fallback streams correctly.
- Some captions just… didn’t appear, depending on the browser version.
None of this was visible at upload. Everything seemed fine, until users started complaining, and I had to reverse-engineer playback issues one browser at a time.
No real observability
This was the biggest blind spot.
I had no reliable way to answer basic questions:
- Did the video start for the user?
- Was it buffering more than usual?
- Where did they drop off?
The only data I had came from browser dev tools, CloudWatch logs, and user screenshots. Debugging playback meant piecing together fragments from multiple systems, and guessing. A lot of guessing.
Scaling cost & latency
The deeper we got into usage, the more fragile everything felt:
- Lambda functions that handled uploads had cold starts that hit just when we needed them to be fast.
- CDN invalidation was never instantaneous. Some users got stale playlists, others got 404s.
- Egress charges from S3 and CloudFront started creeping up, especially once we began serving higher-quality streams to global users.
None of this was obvious early on, but it all added up.
Security gaps
Implementing secure video delivery was a DIY project in itself:
- I had to roll custom logic for signed URLs and token expiry.
- No native support for watermarking, session binding, or DRM.
- Protecting streams from hotlinking or CDN leeching required more Lambda@Edge logic and more caching tradeoffs.
Even then, it felt patchy. Never airtight.
Developer time tax
Every new feature request became a tangent:
- Add support for multi-language captions? That’s a custom text track parser.
- Show dynamic thumbnails on hover? Time to spin up another encoding pipeline.
- Enable live-to-VOD recording? Not without a whole new ingestion architecture.
These weren’t edge cases, they were standard expectations. And each one meant lost development time and weeks of detours.
It wasn’t about video anymore. It was about time: my team’s time, my roadmap’s time, and all the time we weren’t spending on the actual product.
Just getting the basics meant building systems for uploading, transcoding, adaptive bitrate packaging, storage, delivery, playback, and monitoring. Then came the extras: thumbnails, captions, metadata, moderation. And if we wanted to go live? Add ingest, recording, clipping, simulcasting.
That’s half a dozen systems.
Held together with glue.
And every one of them pulled us further from what we were actually here to build.
What I needed instead
It took me too long to admit it, but most of my energy wasn’t going into building features; it was going into managing a pipeline I didn’t want to own.
The goal was never to become a video infrastructure engineer. I just wanted users to watch videos without friction.
What I actually needed was simple but completely different from what I had built.
- Uploads needed to be fast, resumable, and secure. Not “hope your connection doesn’t drop mid-way” S3 links.
- Transcoding should be just-in-time, tailored to the user’s device and bandwidth, not pre-rendered into static variants that may or may not get used.
- Adaptive streaming and device support should be automatic. I shouldn’t have to test every browser and OS combo just to make sure a 720p stream works.
- Embeds should just work, with consistent behavior across browsers, devices, and network conditions.
- Playback should be observable. I needed to know when someone hit play, when they dropped, and whether buffering or resolution switching killed the experience.
- Most importantly, I needed to spend my time building product features, not re-learning FFmpeg flags or chasing CDN cache bugs.
This wasn’t a job for another shell script. This needed a system.
Why I switched to a video API
Eventually, I stopped trying to duct-tape solutions together and started looking at APIs: real infrastructure built to handle video as a first-class citizen.
The difference was immediate.
APIs gave me predictability and composability. Instead of dealing with brittle workflows and side effects, I had clearly defined requests and responses. Upload, transform, stream, analyze: each step exposed as a service, not a one-off hack.
Instead of managing EC2 queues and encoding pipelines, I used an SDK to trigger uploads. Instead of tracking egress usage manually, I had access to real-time metrics and webhooks. Instead of polling player events, I could subscribe to structured analytics.
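The shape of that change is familiar: instead of polling for job status, you verify and handle an event. A generic sketch of a webhook handler (the event name and payload fields here are hypothetical illustrations, not any particular vendor’s actual schema):

```python
import hashlib
import hmac
import json

WEBHOOK_SECRET = b"from-the-provider-dashboard"  # placeholder secret

def handle_webhook(raw_body, signature):
    """Verify an HMAC-signed webhook and react to a video lifecycle event.

    Returns a (status, video_id) pair so the caller can update app state.
    """
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return ("rejected", None)  # signature mismatch: drop the request
    event = json.loads(raw_body)
    if event.get("type") == "video.ready":
        # mark the video playable in the app database (stubbed here)
        return ("ok", event["data"]["video_id"])
    return ("ignored", None)
```

Compare that with the old flow, where “ready” was a webhook I had to build, send, retry, and debug myself.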
No glue code. No disconnected tools. No infrastructure overhead.
REST endpoints replaced Bash scripts.
GraphQL replaced guesswork.
Observability replaced log-diving.
And that’s when I made the switch to using a proper video API.
Not a platform that tried to abstract everything away with a UI, but one that gave me clean, low-level primitives: upload endpoints, stream-ready outputs, playback event hooks, and observability I could wire into my own systems.
In my case, I moved to FastPix.
What made it work was that the APIs aligned with how I already build software. Uploads were resumable. Transcoding was handled just-in-time. ABR worked out of the box, and live streaming was there too. And I finally had structured data on how videos were performing: latency, errors, drop-offs, QoE metrics. FastPix didn’t ask me to change my architecture. It just plugged into it. And that alone saved me months of infrastructure work.
What my stack looks like now
Since moving to a third-party video API workflow, things look very different. Not because we stopped thinking about video, but because we stopped having to think about it all the time.
Here’s how it works now.
Upload
Uploads are handled through an SDK, either in the browser or on the backend, depending on the use case. Files are resumable by default, and uploads can be tracked via events, not just HTTP status codes. No more juggling presigned URLs or dealing with retries manually.
Transcoding & adaptive streaming
I don’t run FFmpeg jobs anymore. As soon as the file hits the platform, just-in-time encoding kicks in.
Adaptive bitrate ladders are generated automatically based on content and usage context. Devices get the right rendition without me having to define presets or maintain profiles.
Embedding & playback
Embeds are now responsive by default and customizable through player API hooks. I can tweak behavior, load specific captions, or bind events to user actions without dealing with browser-specific quirks.
It works on Chrome, Firefox, Safari, mobile, and anything else we’ve thrown at it. No more last-minute bug reports about broken video controls on random Android devices.
Playback metrics & observability
For every stream, I now get structured analytics instantly on the FastPix dashboard:
- Start time
- Buffer events
- Resolution switches
- Drop-offs by timestamp
- Errors grouped by type and device
All exposed through a playback metrics API or delivered via webhooks. I can finally correlate user complaints with actual playback data, not guesswork.
Real-time alerts
If a stream fails, I know instantly.
QoE degradations, failed uploads, abnormal rebuffering events: everything surfaces in logs I can actually monitor. It’s all wired into our existing observability stack.
Bonus capabilities
Beyond the basics, we’ve plugged into features like:
- Live-to-VOD conversion, automatically preserving live streams as on-demand assets
- Instant clipping, for generating shareable snippets from long-form content
- In-video AI intelligence, powering automated chaptering, NSFW filtering, object detection, and metadata tagging for better playback and search

I run this whole workflow through FastPix now, an API platform that takes video seriously. It gave us the primitives we needed to build a clean, observable, and production-grade video pipeline without reinventing everything.
And honestly, for the first time, video feels like part of the product, not a liability we have to babysit.
So why am I telling you this?
Because if you’re still stitching together S3 buckets, FFmpeg scripts, CDN configs, and player hacks, you’re not alone. I did the same. It worked, until it didn’t.
And when it didn’t, it started costing more than just time. It slowed down product development, drained engineering focus, and made video feel like a liability instead of a feature.
You can absolutely build video infrastructure from scratch. But at some point, you have to ask: is that really the thing you want to be building?
Or would you rather use APIs, and spend that time building product?