DEV Community: AI Alleyway

Google Meet ships its own AI notetaker now — so the real question is lock-in, not features

AI Alleyway — Wed, 22 Jul 2026 03:55:44 +0000

Google Meet now takes its own notes. Turn on "Take notes for me" and Gemini follows the call, then drops an organized summary with action items into a Google Doc in your Drive. No bot, no install — it runs inside Meet.

That changes the question. For years, "best AI notetaker for Meet" was a feature comparison between third-party apps. Now the platform ships the feature natively, and the interesting decision is the one every engineer recognizes when a platform absorbs a capability you used to buy: do you take the native integration and accept the platform's boundary, or keep an abstraction layer above it so you're not locked in?

What the native path actually commits you to

Native Gemini notes are genuinely well-integrated — that's the whole pitch. But integration is the same thing as coupling, and it's worth naming exactly what you're coupling to:

It's Meet-only. Gemini's notetaker does nothing for your Zoom or Teams calls. Your notetaking is now a function of which platform the meeting happened on, which is a strange constraint to accept for something as platform-agnostic as "write down what we decided."
It's license- and admin-gated. The feature needs a Gemini add-on or a Business/Enterprise Workspace edition, and a Workspace admin has to switch it on before anyone can use it. There's no free personal path. Your access to your own meeting notes is now downstream of a procurement decision and an admin toggle.
The output lives in Google's sphere. The notes land in a Drive Doc under your Workspace's sharing rules. That's convenient if everything you do lives in Google — and it means your meeting record is shaped by, and governed by, the platform, not you.

None of that is bad. It's the classic trade: deep integration in exchange for living inside the platform's boundary. The native notetaker is bot-free and zero-install precisely because it's part of Meet — you can't have the integration without the coupling.

The portable layer is the other architecture

A third-party notetaker is the opposite design: an abstraction layer that sits above the meeting platform and treats Meet, Zoom, and Teams as interchangeable inputs. It captures the call (usually via a bot that joins as a participant), and — this is the part that matters — it owns the output. The notes live in your tool, on your terms, exportable, the same regardless of which platform hosted the call.

The properties fall out of that shape. It works no matter your Workspace plan (no license, no admin). It's cross-platform by construction, so your notetaking doesn't fork every time a client sends a Zoom link instead of a Meet one. Many are free: Fathom's free plan is genuinely unlimited, it produced the cleanest summary of anything we tested, and it saves a video recording of the call, which Gemini's notes don't. The costs are the mirror image of native's benefits: a visible bot in the participant list, and one more app holding your data instead of the suite you already run.

There's even a reliability wrinkle that tracks the architecture. A bot joins the call as its own participant, so it keeps recording if your laptop sleeps or your tab closes. The native/bot-free approaches capture from your session, so they depend on you staying in it. Coupling to the platform buys integration; the portable layer buys independence — including from your own flaky Wi-Fi.

So which boundary do you want?

Strip it to the engineering decision and it's clean:

You live entirely in Google Workspace, already pay for Gemini, and never meet anywhere but Meet → take the native integration. The coupling costs you nothing you'd have used, and you get the tightest experience.
You meet across Zoom/Teams too, or you want your notes to outlive your Workspace subscription, or you're on a personal account → keep the portable layer. A tool like Fathom abstracts the platform away and hands you the output to own, which is worth a bot in the roster.
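The two branches above can be written down as a small rule. This is only a sketch of the reasoning, and the field names are mine for illustration, not anything from Google or a notetaker vendor:

```typescript
// Hypothetical inputs: these field names are illustrative, not from any vendor API.
interface MeetingSetup {
  paysForGemini: boolean;             // a Workspace edition or add-on that includes Gemini notes
  meetsOnlyOnGoogleMeet: boolean;     // no Zoom or Teams calls at all
  notesMustOutliveWorkspace: boolean; // notes should survive leaving Workspace
}

type NotetakerChoice = "native" | "portable";

// Native only wins when every coupling cost is one you would not have felt anyway.
function chooseNotetaker(setup: MeetingSetup): NotetakerChoice {
  const couplingIsFree =
    setup.paysForGemini &&
    setup.meetsOnlyOnGoogleMeet &&
    !setup.notesMustOutliveWorkspace;
  return couplingIsFree ? "native" : "portable";
}

// An all-Google shop that already pays for Gemini.
console.log(chooseNotetaker({ paysForGemini: true, meetsOnlyOnGoogleMeet: true, notesMustOutliveWorkspace: false })); // native
// Same shop, but clients sometimes send Zoom links.
console.log(chooseNotetaker({ paysForGemini: true, meetsOnlyOnGoogleMeet: false, notesMustOutliveWorkspace: false })); // portable
```

A personal account falls into the portable branch automatically, because paysForGemini is false.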

The tell is portability. If your meeting notes should be Meet-shaped — governed by Google, searchable in Drive, gone if you leave Workspace — native is the clean answer. If they should be yours, independent of where the call happened, keep the layer. (Either way, the tools record a conversation, so disclose it on external calls — in two-party-consent regions that's the law, and Meet shows a note-taking banner regardless.)

I broke down the native setup, the best portable tools, the bot-vs-bot-free reliability tradeoff, and the exact steps for each in the full guide to AI notetakers for Google Meet. But the fast way to decide isn't a feature grid — it's the boundary question: are you coupling to the platform, or keeping a layer above it?

There are only three ways to tap a Zoom call — and it decides everything about your notetaker

AI Alleyway — Thu, 16 Jul 2026 08:46:54 +0000

Every "best AI notetaker for Zoom" list ranks tools by features. That's the wrong layer. Underneath the feature grid, there are only three places a piece of software can physically tap a Zoom call to get the audio, and which one a tool uses determines almost everything you actually care about — who sees it, whether it works on the free Zoom tier, whether it follows you to Google Meet, and whether you can go back and re-watch. Once you see the three attach points, the tool choice mostly falls out. I've tested tools from each category; here's the taxonomy.

1. Bot-as-participant

The most common approach: a bot joins the meeting as a visible participant and records from the inside. Fathom and Otter do this by default. It's the most portable design — because the bot just needs to be admitted to the call, it works on any Zoom tier (including free Zoom) and the same tool usually covers Google Meet and Teams too. Fathom even ships an official Zoom Marketplace app for one-click setup.

The cost is visibility. There's a labeled "AI Notetaker" sitting in the participant list, and once everyone on a call runs their own, you get the now-familiar pile-up of three or four notetaker bots in the roster. On external calls that's awkward at best and against the other side's policy at worst.

Trade: maximum reach and cross-platform, minimum discretion.

2. Inside-the-platform

The second approach doesn't join the call — it runs inside it. Zoom's native AI Companion (its "My Notes" feature) executes server-side within Zoom itself, captures the meeting, and emails a summary afterward. Nothing appears in the participant list because nothing joined; it's the platform taking its own notes.

The economics are the appeal: where Google charges for Gemini and Microsoft charges for Copilot, Zoom bundles the core AI Companion note-taking into its paid Zoom Workplace plans (roughly $13–18/user/month) at no extra cost — so if your org already pays for Zoom, you may already have this and just need an admin to switch it on. (Zoom has started layering premium AI tiers on top, so the frontier keeps moving, but the basic summary comes with the plan.)

The constraints are the flip side of living inside one platform: it's Zoom-only (nothing for your Meet or Teams calls), it needs an admin/host to enable it, and it's not available on the free Zoom tier.

Trade: cheapest and cleanest if you're already a paying, single-platform Zoom shop; useless outside it.

3. Local device-audio capture

The third approach ignores the call entirely and captures what your own machine hears. Granola grabs your device audio locally; browser tools like Tactiq transcribe from a Chrome extension. No bot, no participant, nothing added to the meeting — the quietest possible footprint, which is exactly why it's the pick for confidential client calls.

The architectural catch is a hard one: these tools capture what you can hear, so they work best when you're an active participant actually in the audio path, and they lean on the notes you take rather than a pristine multi-speaker server-side transcript. (Worth stating plainly: bot-free is not the same as recording in secret — in two-party-consent regions you still disclose you're transcribing.)

Trade: minimum footprint and maximum privacy, at the cost of capture fidelity and needing to be present.

Pick the attach point, then the tool

Read your situation through the three and the shortlist writes itself. If you're on a free Zoom account or bouncing between Zoom/Meet/Teams, you need a bot-as-participant tool (Fathom is the one I'd start with: unlimited free plan, official Zoom app, cross-platform). If you're already a paying, Zoom-only org, the inside-the-platform AI Companion is the cheapest clean option, and you may already own it. If you're running discreet external calls, local capture (Granola) keeps the meeting clean.
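Here is the same shortlist logic as a sketch in code. The situation flags are hypothetical names for this example, and checking discretion first is my own ordering assumption (local capture doesn't depend on the Zoom tier, so it can take priority when a visible bot is unwelcome):

```typescript
type AttachPoint = "bot-as-participant" | "inside-the-platform" | "local-capture";

// Hypothetical situation flags, named for this sketch only.
interface ZoomSituation {
  onFreeZoom: boolean;         // no paid Zoom Workplace plan
  usesOtherPlatforms: boolean; // also meets on Google Meet or Teams
  needsDiscretion: boolean;    // confidential or external calls where a bot is unwelcome
}

function pickAttachPoint(s: ZoomSituation): AttachPoint {
  // Assumption: discretion outranks the other criteria.
  if (s.needsDiscretion) return "local-capture";
  // Free tier or multi-platform rules out the Zoom-only native feature.
  if (s.onFreeZoom || s.usesOtherPlatforms) return "bot-as-participant";
  // Paying, Zoom-only, no discretion concerns: use what the plan already includes.
  return "inside-the-platform";
}

console.log(pickAttachPoint({ onFreeZoom: true, usesOtherPlatforms: false, needsDiscretion: false }));  // bot-as-participant
console.log(pickAttachPoint({ onFreeZoom: false, usesOtherPlatforms: false, needsDiscretion: false })); // inside-the-platform
```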

I laid out both routes on Zoom specifically — the native AI Companion setup, the free third-party apps, the multi-bot problem, and how to enable each — in our AI notetaker for Zoom guide, built from the tools we actually ran rather than a marketplace listing. But the mental model is the durable part: three attach points, three sets of tradeoffs, and the tool is downstream of which one fits your call.

The "AI avatar generator" category is three rendering architectures wearing one label

AI Alleyway — Thu, 09 Jul 2026 11:42:15 +0000

If you evaluate AI avatar tools as if they're interchangeable — type a script, get a talking head — you'll pick the wrong one and then blame the tool. I tested the three market leaders hands-on, giving each the same nine-second script, and the thing that actually separates them isn't avatar quality. It's the rendering architecture underneath. And once you see the architecture, two things you'd written off as pricing quirks turn out to be inevitable consequences of it.

There are three architectures.

1. Batch render behind a moderation gate (Synthesia)

Synthesia takes a script, screens it for policy issues before generating, renders the avatar, then moderates the finished video before it will release it to you. In my test, a nine-second clip took four to five minutes to appear — and almost all of that time was the moderation step, not the rendering.

That reads like a performance problem. It isn't. It's the whole value proposition. The moderation gate is why Synthesia is the tool most of the Fortune 100 standardize on: a security or compliance team can sign off on a system that refuses to emit an avatar saying something off-policy. You cannot buy your way past the latency, because the latency is the guardrail. There's no "fast mode" tier — a fast mode would mean turning off the exact thing enterprises are paying for.

The billing model follows from the architecture too. Because every output is a discrete, moderated artifact, Synthesia meters by the finished minute. Its plans are minute allowances — a mid-tier plan is a few hundred minutes a year — and the effective cost lands around two dollars per finished minute. A per-artifact price on a per-artifact pipeline.

2. Fast render, metered by compute (HeyGen)

HeyGen returned a comparable nine-second clip in about a minute, with no moderation queue in the way. Same job description — script in, talking head out — completely different pipeline: no policy gate, and the render is the entire wait.

But the interesting part is the meter. HeyGen doesn't bill by the minute of video; it bills by credits, and the credit burn rate depends on which motion engine you pick. The older engine costs a few credits per minute. The newer, lifelike engines — the entire reason you'd choose HeyGen — cost around 20 credits per minute. That's roughly a 7× spread in cost for the same duration of output, decided by one dropdown.

That's not an arbitrary gimmick. Credits are a compute passthrough: the lifelike engines are far more expensive to run, so they drain the meter far faster. The consequence for a buyer is that a plan's headline number ("600 credits") tells you almost nothing until you know which engine you're running — 600 credits is ~30 minutes of the good engine, or ~200 minutes of the cheap one. The unit that matters is credits-per-minute-of-the-engine-you-actually-want, and the pricing page won't do that division for you.
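The division the pricing page won't do for you is short enough to sketch. The 20 credits per minute is the lifelike-engine figure from my test; the 3 is an assumption standing in for "a few credits per minute", chosen because it matches the ~200-minute figure above:

```typescript
// Credits burned per minute of finished video.
// 20 is the observed lifelike-engine rate; 3 is an assumed value for the older engine.
const CREDITS_PER_MINUTE = { lifelike: 20, older: 3 } as const;
type MotionEngine = keyof typeof CREDITS_PER_MINUTE;

// Converts a plan's headline credit count into minutes of output for one engine.
function minutesFromCredits(credits: number, engine: MotionEngine): number {
  return credits / CREDITS_PER_MINUTE[engine];
}

console.log(minutesFromCredits(600, "lifelike")); // 30
console.log(minutesFromCredits(600, "older"));    // 200
```

Swap in the rates shown in your own account before trusting the output; vendors change them.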

There's a trap that falls straight out of this architecture: a new HeyGen project defaults to the premium engine, which the free plan won't render — so a first-time free user hits Generate, gets an upsell instead of a video, and concludes the tool is broken. It isn't; you have to manually downgrade the engine. That's the compute-metered design leaking into the onboarding.

3. Real-time inference, metered by the conversation (Tavus)

Tavus doesn't render a file at all. It generates the avatar live, during what feels like a video call — you talk, it talks back, rendered in real time. I held an actual conversation with its built-in agent, and the real-time face was the closest thing to talking to a person I've tried.

Different architecture, different failure mode: it's still in beta, and my first call dropped before it connected; it only went through on a retry. A batch renderer can't "drop a call": it either returns a file or errors. Real-time inference carries a whole class of reliability problems the other two don't, because there's a live session to keep alive.

And, predictably, the meter changes again. There's no finished minute to sell and no render credit to burn, so Tavus prices by conversational minutes, with pay-as-you-go overage. That's the only honest unit for a pipeline whose output is a session, not an artifact — and it's also why costs are harder to forecast than a flat per-video plan: you're paying for time-on-call, which your users control, not you.

The takeaway for anyone choosing

"Which AI avatar tool is best" has no answer because the three leaders aren't competing on one axis — they're three different systems:

Playback of a governed artifact → batch-plus-moderation (Synthesia). Slow and per-minute on purpose.
Playback of a high-quality artifact, fast → compute-metered render (HeyGen). Fast, but the credit rate — not the plan price — is your real bill.
A conversation → real-time inference (Tavus). No file; per-conversational-minute; and a live session's reliability tax.

You can't upgrade a batch tool into a real-time one, and you can't make a moderated tool fast without removing the moderation that's the point of it. So decide which architecture your job needs first; the tool, and the shape of its bill, follow from that. I put all six tools we looked at — the three leaders plus the specialists — through exactly this lens in our hands-on best AI avatar generator roundup, where the render times, credit math, and per-tool verdicts are laid out in full.

Prices and credit rates above are from our hands-on tests in July 2026; vendors move these around, so treat the exact figures as directional and the architecture as the durable part.

InVideo's credit meter is a GPU bill: the same video, priced 20x apart

AI Alleyway — Thu, 09 Jul 2026 03:46:44 +0000

"InVideo AI is $20 a month" is true and almost useless. The plan fee is the price of the door; the credits are the price of the video, and the credit cost of a single 30-second clip swings about 20x depending on one choice you make each time you hit generate. Once you understand why, it stops being a mystery bill and becomes something you can actually budget — the same way you'd budget metered cloud compute.

It's not a video tool with a price. It's a model hub with a meter.

InVideo bills itself as a hub with access to 200-plus models. You don't pay per video; you pay a flat monthly fee for access, and then every generation spends from a monthly credit pool that refills with your plan (75 credits on the $20 Plus plan, 390 on Max, 800 on Generative, 4,250 on Elite).

The important part is what sets the cost of a single generation. It's not the length or the plan — it's the compute behind the model you invoke. Pulling a licensed stock clip is nearly free to run, so it costs almost nothing. Running a frontier video model like Google's Veo 3.1 or OpenAI's Sora 2 is expensive to run, so it costs a lot. InVideo is passing that compute cost straight through to you, per generation.

Here's what that looks like on the exact same 30-second brief, from InVideo's in-app generate screen (as of my testing, mid-2026):

Footage setting	What runs	Credits
Stock (Basic)	licensed stock clips	~2
Efficient generative	InVideo's cheaper models	~15
Premium (Pro)	Veo 3.1 / Sora 2	40

Same brief. Twenty times the price. That is not a pricing quirk — it's the cost model. Your plan sets the size of the pool; your footage choice sets how fast you drain it.

The part every engineer will flinch at: no retry discount

Here's the detail that turns this from "metered" to "watch out." There is no idempotency discount on a regeneration. If a clip comes out wrong and you generate it again, InVideo charges the full rate a second time.

Generative prompts rarely land on the first try — you tweak the wording, regenerate, adjust a scene, regenerate again. Two or three attempts at 40 credits each is 80 to 120 credits for one finished premium clip — more than the entire 75-credit monthly pool on the $20 plan. So the real unit of cost isn't "a video." It's a finished video, including the retries it took to get there, and that number is unknowable in advance because it depends on how many attempts your prompt needs.

That's the mechanism behind the recurring review complaint of "I paid for videos I never finished." The meter runs on every attempt, hit or miss.

How to actually budget it

Stop thinking in "videos per plan." Think in cost-per-finished-clip, like a cloud bill:

effective cost  ≈  (credits per generation)
                 ×  (expected attempts to get a keeper)
                 ×  (dollars per credit on your plan)

Run the numbers and the split is stark. On the same $20 Plus plan, a stock-first creator effectively pays about 50 cents a video (that $20 spread across ~37 stock clips), while a premium-generative creator pays closer to $11 a clip — and once you fold in the regenerations a Veo or Sora prompt usually needs, a single finished premium clip can cost more than the entire monthly plan on its own.
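To make that arithmetic reproducible, here is the formula as a small sketch using the Plus plan figures from this post ($20, 75 credits, ~2 credits for stock, 40 for premium). The three-attempt case is an assumption about how many tries a prompt needs, not a measured average:

```typescript
const PLAN_PRICE_USD = 20; // Plus plan monthly fee
const PLAN_CREDITS = 75;   // monthly credit pool on Plus
const DOLLARS_PER_CREDIT = PLAN_PRICE_USD / PLAN_CREDITS;

// effective cost = credits per generation x expected attempts x dollars per credit
function costPerFinishedClip(creditsPerGeneration: number, expectedAttempts: number): number {
  return creditsPerGeneration * expectedAttempts * DOLLARS_PER_CREDIT;
}

const stockClip = costPerFinishedClip(2, 1);          // stock footage, first try
const premiumFirstTry = costPerFinishedClip(40, 1);   // Veo / Sora tier, first try
const premiumThreeTries = costPerFinishedClip(40, 3); // same tier, assumed three attempts

console.log(stockClip.toFixed(2));         // 0.53
console.log(premiumFirstTry.toFixed(2));   // 10.67
console.log(premiumThreeTries.toFixed(2)); // 32.00
```

The last line is the uncomfortable one: at three attempts, one finished premium clip costs more than the $20 plan itself.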

So the budgeting rule is: decide your footage mix first, compute credits-per-finished-clip (retries included), then pick the plan whose pool clears that with margin. The 40-credit premium tier is the on-demand GPU instance of this analogy — remarkable value if you use it sparingly, ruinous if you lean on it without watching the meter.

I worked the full thing out — every plan, the credit-to-dollar math, the stock-license second meter, and where the free plan actually stops being usable — in what a video actually costs on InVideo. But the one idea that saves you the most is this: the number to write down before you subscribe isn't $20. It's how many premium clips a month you actually need, because that's the variable the whole bill turns on.

The line in the LICENSE file that reorders your n8n alternatives

AI Alleyway — Wed, 08 Jul 2026 07:16:37 +0000

Most "n8n alternatives" lists compare the wrong column.

They line up connector counts, per-execution pricing, cron support, and a screenshot of the canvas. Useful, up to a point. But if you're planning to self-host the engine — the reason you're looking past the SaaS incumbents in the first place — there's a field none of those tables show, and it sits in a file at the root of every repo: LICENSE.

We run n8n in production. The entire AI Alleyway content pipeline is about ten self-hosted n8n workflows on a single box. So this isn't a "we surveyed the landscape" post — it's the one architectural question I wish more alternatives lists led with, because it's the one that quietly decides whether a tool is even eligible for your use case before you've compared a single feature.

Three licenses, three completely different contracts

"Open source" gets used as a single adjective. For self-hosted automation engines it's actually three different legal contracts, and they diverge exactly where it matters — redistribution.

n8n ships under the Sustainable Use License (v1.0). This is source-available, not an OSI-approved open-source license. n8n's own umbrella term for the model is "fair-code." You can read the source, self-host it, and modify it — but the grant is scoped. Straight from the repo's LICENSE.md, you may use or modify the software "only for your own internal business purposes or for non-commercial or personal use," and you may distribute it to others "only if you do so free of charge for non-commercial purposes." In plain terms: run it for yourself all you want; you cannot turn around and offer n8n as a hosted service to third parties, or bundle-and-resell it commercially, without a separate enterprise license.

Activepieces ships under MIT (its Community Edition). MIT is the permissive, OSI-approved baseline — embed it, fork it, resell it, close your fork, no copyleft strings. The honest nuance: MIT covers the Community Edition core; Activepieces keeps some enterprise features under a separate commercial license, so "MIT" describes the open core, not every paid add-on. But the core you'd self-host and build on is genuinely permissive.

Node-RED ships under the Apache License 2.0, stewarded by the OpenJS Foundation. Also OSI-approved and permissive, and Apache 2.0 goes one step past MIT by including an explicit patent grant — the contributors license the patents needed to use their contribution, and there's a patent-retaliation clause. If your legal team cares about patent exposure in a shipped product, that clause is a feature, not boilerplate.

Three tiers: source-available-with-a-redistribution-fence, permissive, permissive-with-a-patent-grant. Same "open" label on the tin.

Why the license is an architectural constraint, not a footnote

Here's the reframe: the license isn't a compliance checkbox you clear at the end. It's a boundary on your architecture, because it constrains where the automation engine is allowed to live in your system.

Ask one question — is the automation layer something you run, or something you ship?

If you run it — internal ETL, ops glue, a content pipeline like ours, back-office orchestration that never leaves your own walls — then n8n's fair-code fence never touches you. "Internal business purposes" is exactly what you're doing. That's precisely our situation: we chose n8n knowingly, and the Sustainable Use License was a non-issue because we don't resell the engine or expose it as a product to anyone. In that world you should optimize for depth, and n8n's depth is real — on the order of 1,100 integrations, strong branching and error handling, and a node ecosystem that means most of what you need already exists.

If you ship it — the automation engine is embedded in a product your customers touch, or you're offering "workflows" as a feature of your SaaS, or you're a platform letting your users build automations — you've crossed the exact line the fair-code license draws. Now n8n's restriction is load-bearing. Offering it as a service to third parties is the thing the license reserves. At that point the feature-richest tool is no longer the honest answer; the eligible tools are the permissively licensed ones. Activepieces (MIT) and Node-RED (Apache 2.0) let you embed, white-label, fork, and resell without asking anyone. That's not a marketing claim — it's what those license texts grant.

This is why a license-blind alternatives list can actively mislead. It'll rank n8n at the top for a reader who's building a product on top of an automation engine — a reader for whom n8n's own license says "not like this." The ranking is correct for the internal-use reader and wrong for the productizing reader, and the table gives you no way to tell which one you are.
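The run-it-vs-ship-it filter is simple enough to sketch as code. The single boolean per tool is my simplification of the license texts described above, so treat it as a reading aid and check the actual LICENSE files before relying on it:

```typescript
type Usage = "run-internally" | "ship-to-customers";

interface AutomationEngine {
  tool: string;
  license: string;
  // Simplification: may you embed, host for third parties, or resell without a separate license?
  permitsCommercialRedistribution: boolean;
}

const ENGINES: AutomationEngine[] = [
  { tool: "n8n", license: "Sustainable Use License 1.0", permitsCommercialRedistribution: false },
  { tool: "Activepieces (Community Edition)", license: "MIT", permitsCommercialRedistribution: true },
  { tool: "Node-RED", license: "Apache-2.0", permitsCommercialRedistribution: true },
];

// Internal use keeps every engine eligible; shipping keeps only the permissive ones.
function eligibleEngines(usage: Usage): string[] {
  return ENGINES
    .filter((e) => usage === "run-internally" || e.permitsCommercialRedistribution)
    .map((e) => e.tool);
}

console.log(eligibleEngines("run-internally"));    // all three
console.log(eligibleEngines("ship-to-customers")); // Activepieces and Node-RED only
```

Feature ranking happens after this filter, on whatever list survives it.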

Where the hosted tools fit (and why they dodge the question)

For completeness: we've also driven Make (built a real scenario against its API) and Zapier (ran its MCP across Gmail, Calendar, and Slack). Both are fine tools. But they're hosted SaaS — there's no repo to self-host and no LICENSE file to read, so the whole license-tier question is moot. You're renting capacity and accepting the terms of service, full stop. That's a legitimate choice; it's just a different decision than the one this post is about. The license axis only exists once you've committed to self-hosting, which is where the "n8n alternatives" search usually lands you anyway.

The same "it's a service, not a license" framing applies to most of the other hosted names that show up in these roundups — Pipedream, Workato, Power Automate, Gumloop, Tray.ai. I'm assessing those from their docs and terms, not from production use, and none of them change the core point: if you're not self-hosting, you're evaluating a contract, not a license.

The decision, compressed

Running automations internally? Fair-code is fine. Optimize for depth and ecosystem — n8n earns its spot.
Shipping the automation layer inside a product, or need to fork/embed/resell freely? The permissive licenses are the honest destination, not the feature-count winner. Look hard at Activepieces (MIT) and Node-RED (Apache 2.0), and read their LICENSE files yourself before you commit.
Not self-hosting at all? The license tier doesn't apply — you're picking a SaaS on terms-of-service and pricing, and Make/Zapier/Pipedream are competing on that axis instead.

The practical move: before you shortlist any self-hosted automation engine, open its LICENSE file first and answer the run-it-vs-ship-it question. That one field filters the list faster than any feature matrix, and it filters it correctly — because it filters on what you're actually allowed to do, not on what the tool can do.

When I sanity-checked our own stack against this lens, I ended up pulling together the full n8n alternatives breakdown, sorted by why you're leaving — nine tools grouped by the reason someone actually migrates (license, hosting model, pricing shape, integration gaps) rather than by a single leaderboard. If you're weighing a switch, reading it by your reason-for-leaving is a lot more useful than reading it top-to-bottom.

Read the LICENSE file first. It's the cheapest architecture decision you'll make all quarter.

Generate your Open Graph images with React (Remotion), not a design tool

AI Alleyway — Mon, 06 Jul 2026 13:00:00 +0000

Every blog post, product page, and share link wants its own Open Graph image — the card that shows up when the URL is posted to X, LinkedIn, Slack, or iMessage. Hand-making those in Figma doesn't scale: the moment you have a few dozen pages, you're copy-pasting a template and re-exporting PNGs by hand.

I moved mine to code. One React component, one CLI command per card, and a size-optimization step. Here's the whole setup — including the two gotchas that cost me an afternoon.

Why Remotion?

Remotion is "React for videos" — but it renders still frames just as happily as video, and a 1200×630 OG card is just a single frame of a React component. That means your card is a normal component: props, flexbox, your real fonts and brand tokens, conditional layout. No design-tool round-trip, no drift between your site's styles and your cards.

The composition

// OgCard.tsx
import React from "react";
import { AbsoluteFill } from "remotion";

export const OgCard: React.FC<{
  headline: string;
  productName?: string;
  rating?: number;
}> = ({ headline, productName, rating }) => (
  <AbsoluteFill style={{ background: "linear-gradient(135deg,#0b0b0f,#1a1330)", color: "#fff", padding: 80, justifyContent: "space-between" }}>
    <h1 style={{ fontSize: 68, lineHeight: 1.05, fontFamily: "Outfit" }}>{headline}</h1>
    {productName && <div style={{ fontSize: 34, opacity: 0.8 }}>{productName}{rating ? ` · ${rating}/5` : ""}</div>}
  </AbsoluteFill>
);

// Root.tsx (JSX needs a .tsx file; src/index.ts passes Root to registerRoot)
import React from "react";
import { Composition } from "remotion";
import { OgCard } from "./OgCard";

export const Root: React.FC = () => (
  <Composition id="OgCard" component={OgCard} width={1200} height={630}
    durationInFrames={60} fps={30} defaultProps={{ headline: "" }} />
);

Rendering one card per page

The remotion still CLI command (the command-line counterpart of the Node API's renderStill()) writes a single PNG. Feed it the per-page props as JSON:

npx remotion still src/index.ts OgCard out/my-post.png \
  --props='{"headline":"The cheaper voice tool wasn'\''t on the pricing page","productName":"Acme TTS","rating":4}'

Loop that over your content collection and you have a card per URL, regenerated on every build.
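A minimal sketch of that loop, assuming a hypothetical Page shape for your content collection. It builds the argument list as an array instead of a shell string, so a headline containing an apostrophe needs no manual escaping when the array goes to a process-spawning call such as Node's execFileSync("npx", args):

```typescript
interface CardProps { headline: string; productName?: string; rating?: number }
// Hypothetical shape for one entry in your content collection.
interface Page { slug: string; card: CardProps }

// Builds the arguments that follow `npx` for one page's card.
function stillArgs(page: Page): string[] {
  return [
    "remotion", "still", "src/index.ts", "OgCard",
    `out/${page.slug}.png`,
    `--props=${JSON.stringify(page.card)}`,
  ];
}

const samplePages: Page[] = [
  { slug: "my-post", card: { headline: "The cheaper voice tool wasn't on the pricing page", productName: "Acme TTS", rating: 4 } },
  { slug: "about", card: { headline: "About this site" } },
];

for (const page of samplePages) {
  // Printed here for inspection; in a build script, spawn the process with these args instead.
  console.log(["npx", ...stillArgs(page)].join(" "));
}
```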

Don't ship the PNG — convert to WebP

A 1200×630 PNG out of a headless browser is big — mine came out around 770 KB. That's absurd for a social card. One conversion step drops it to ~17 KB with no visible loss:

# ImageMagick
magick out/my-post.png -quality 82 -define webp:method=6 out/my-post.webp
# or sharp: sharp(png).webp({ quality: 82 }).toFile(webp)

Point your og:image / twitter:image at the WebP. ~45× smaller, same card.

Gotcha 1: `selectComposition` needs the SAME inputProps as `renderStill`

If you render programmatically (Node API) rather than the CLI, you call selectComposition() first, then renderStill(). Pass your inputProps to BOTH. Anything resolved at selection time — most importantly staticFile() references and anything derived from props in calculateMetadata — uses defaultProps if you only handed props to renderStill.

The symptom is baffling the first time: every card comes out byte-identical (the default props rendered), or an image staticFile('logo.png') silently resolves to the placeholder. It's not a caching bug — it's the selection step running on defaults.

const inputProps = { headline, productName, rating };
const comp = await selectComposition({ serveUrl, id: "OgCard", inputProps }); // ← here
await renderStill({ composition: comp, serveUrl, output, inputProps });        // ← and here

Gotcha 2: `renderStill({ frame })` requires `durationInFrames > frame`

If you register a still-only composition with durationInFrames: 1 and then ask for frame: 30 (say, to let an entrance animation settle), Remotion throws:

RangeError: Cannot use frame 30: Duration of composition is 1

Give the composition enough frames for the frame you sample. I register OG/still compositions with durationInFrames: 60 (2s @ 30fps) and render whatever frame I want within that.
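If you'd rather fail early with a clearer message in your own build script, a guard like the one below works. It is a hypothetical helper of mine, not part of Remotion's API, and it only encodes the rule in the heading: the sampled frame must be smaller than durationInFrames.

```typescript
// Only the two fields this check needs; a real composition object has more.
interface CompositionLike { id: string; durationInFrames: number }

// Frames are zero-indexed, so the last valid frame is durationInFrames - 1.
function assertFrameInRange(comp: CompositionLike, frame: number): void {
  if (!Number.isInteger(frame) || frame < 0 || frame >= comp.durationInFrames) {
    throw new RangeError(
      `Frame ${frame} is outside composition "${comp.id}" (valid: 0 to ${comp.durationInFrames - 1})`
    );
  }
}

// Passes: a 60-frame composition can be sampled at frame 30.
assertFrameInRange({ id: "OgCard", durationInFrames: 60 }, 30);

// Throws: a 1-frame composition cannot.
try {
  assertFrameInRange({ id: "OgCard", durationInFrames: 1 }, 30);
} catch (err) {
  console.log((err as Error).message);
}
```

Call it right before renderStill so a bad frame number surfaces before the browser spins up.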

That's the whole pipeline

One React component = your card, in your real styles.
remotion still (or renderStill from Node) + a JSON props blob = one card per page, on every build.
A PNG→WebP step = ~45× smaller files.
Remember: same inputProps to selectComposition and renderStill, and durationInFrames bigger than the frame you sample.

Once it's wired, adding a new page's social card is zero manual work — it falls out of the build. Worth the afternoon.

Shipping one Manifest V3 extension to Chrome, Edge, and Firefox from a single source

AI Alleyway — Sun, 05 Jul 2026 10:38:26 +0000

I recently shipped a small Manifest V3 browser extension — a toolbar popup plus an address-bar (omnibox) search, zero permissions — and wanted it on all three major stores: the Chrome Web Store, Edge Add-ons, and Firefox's AMO.

The pitch for MV3 is that it's the shared standard, so "write once, ship everywhere." That's mostly true — but the small divergences are exactly the kind of thing that fails a store review at 11pm. Here's the complete list of what actually differs, plus a tiny build script that emits all three packages from one source.

Chrome and Edge are the same package

Good news first: Edge Add-ons accepts the exact same MV3 zip as the Chrome Web Store. Edge is Chromium, background.service_worker works as-is, and the chrome.* APIs are identical. You upload the same artifact to both. (I keep them as separately-named zips only for a clean per-store upload trail.)

Firefox needs three manifest tweaks — and zero code changes

Firefox is where it gets interesting. The code didn't change at all — the extension uses only chrome.omnibox, chrome.runtime, and chrome.tabs, all of which Firefox exposes via the chrome.* alias. Only the manifest needs three edits.

1. background.service_worker → background.scripts

Firefox's MV3 implementation runs background code as an event page rather than a service worker, so background.scripts is the key to declare for the widest compatibility. If your background script only registers listeners at the top level (no service-worker-only globals), it runs fine as an event-page script:

// Chrome / Edge
"background": { "service_worker": "background.js" }

// Firefox
"background": { "scripts": ["background.js"] }

2. browser_specific_settings.gecko.id

AMO requires an explicit extension id. Chrome/Edge derive one for you; Firefox wants it declared:

"browser_specific_settings": {
  "gecko": { "id": "your-ext@yourdomain.com" }
}

3. data_collection_permissions — and the version floor it drags in

This is the one that surprised me. Newer Firefox requires every extension to explicitly declare what user data it collects — even when the answer is "nothing." Omit it and web-ext lint fails with MISSING_DATA_COLLECTION_PERMISSIONS:

"gecko": {
  "id": "your-ext@yourdomain.com",
  "strict_min_version": "142.0",
  "data_collection_permissions": { "required": ["none"] }
}

The gotcha is the version floor. data_collection_permissions is only supported from Firefox 140 on desktop and 142 on Android. Set strict_min_version too low and the linter throws KEY_FIREFOX_UNSUPPORTED_BY_MIN_VERSION: anything below 140 trips the desktop check, and 140 or 141 still trips the Android one. 142.0 is the lowest value that satisfies both.
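The floor arithmetic is small enough to encode in the build, so a future bump only changes one table. This is a sketch under the numbers stated above (140 desktop, 142 Android); the names SUPPORT_FLOOR, sharedMinVersion, and platformsBelowFloor are hypothetical, and the check only mimics the comparison described here, not web-ext's actual implementation.

```javascript
// Minimum Firefox major versions for data_collection_permissions,
// as stated in the text above (an assumption if those floors change).
const SUPPORT_FLOOR = { desktop: 140, android: 142 };

// The single strict_min_version that satisfies every platform is the highest floor.
function sharedMinVersion(floors) {
  return `${Math.max(...Object.values(floors))}.0`;
}

// Lists the platforms a given strict_min_version would fall below.
function platformsBelowFloor(strictMinVersion, floors) {
  const major = Number.parseInt(strictMinVersion.split(".")[0], 10);
  return Object.entries(floors)
    .filter(([, min]) => major < min)
    .map(([platform]) => platform);
}
```

Calling platformsBelowFloor("140.0", SUPPORT_FLOOR) reports only Android as too low, which matches the second linter complaint described above.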

The omnibox trap on Firefox for Android

If your extension has an omnibox keyword like mine, do not tick "Firefox for Android" compatibility on AMO. The omnibox API is not supported on Firefox for Android — the popup still works as an overlay, but the address-bar keyword silently does nothing. Ship desktop-only until you've adapted and tested for mobile, or you'll be shipping a broken core feature to Android users.
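If you do adapt for mobile later, one defensive option is to register the omnibox listener only when the API is present. This is a minimal sketch, not what the extension above does: registerOmnibox is a hypothetical helper, and it assumes an unsupported platform leaves the omnibox namespace undefined, which you should verify on a real Android build.

```javascript
// Hypothetical helper: wires the omnibox keyword only where the API exists.
// `api` is the extension namespace (chrome or browser); passing it in keeps
// the function testable outside a browser.
function registerOmnibox(api, onQuery) {
  if (!api || !api.omnibox || !api.omnibox.onInputEntered) {
    return false; // Assumed shape of "unsupported": namespace is missing.
  }
  api.omnibox.onInputEntered.addListener((text) => onQuery(text));
  return true;
}

// In background.js this would be called as:
//   registerOmnibox(globalThis.chrome, handleQuery);
```

The boolean return lets the popup decide whether to show an in-popup search box as a fallback instead of advertising a keyword that does nothing.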

One source, three zips

Rather than maintain three manifests by hand, keep one and transform it at build time. The whole script is ~40 lines; the interesting part is the Firefox transform:

function firefoxManifest(m) {
  const fx = structuredClone(m);
  fx.background = { scripts: ["background.js"] };
  fx.browser_specific_settings = {
    gecko: {
      id: "your-ext@yourdomain.com",
      strict_min_version: "142.0",
      data_collection_permissions: { required: ["none"] },
    },
  };
  return fx;
}

// pack("chrome", base);
// pack("edge",   base);              // === Chrome, kept separate for a clean upload trail
// pack("firefox", firefoxManifest(base));

Each pack() stages the shared files (popup.*, background.js, icons, plus the per-store manifest) and zips them with paths at the archive root (manifest.json must sit at the top level, not under a subfolder).
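The staging half of that step can be sketched with Node's built-in fs module. This is not the author's pack(): stageStore is a hypothetical name, it copies everything in the source folder except the base manifest rather than a fixed file list, and it stops before zipping, on the assumption that you zip from inside the staged folder so manifest.json lands at the archive root.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical staging step: copy shared files, then write the per-store manifest.
// outDir must live outside srcDir, or the copy would recurse into its own output.
function stageStore(srcDir, outDir, store, manifest) {
  const dest = path.join(outDir, store);
  fs.rmSync(dest, { recursive: true, force: true }); // start from a clean folder
  fs.mkdirSync(dest, { recursive: true });

  for (const name of fs.readdirSync(srcDir)) {
    if (name === "manifest.json") continue; // replaced by the transformed manifest
    fs.cpSync(path.join(srcDir, name), path.join(dest, name), { recursive: true });
  }

  // The manifest sits directly in the staged root, not under a subfolder.
  fs.writeFileSync(
    path.join(dest, "manifest.json"),
    JSON.stringify(manifest, null, 2) + "\n"
  );
  return dest;
}
```

Staging to a folder first also gives web-ext lint something to point at before any archive exists.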

Validate before you upload

Run web-ext lint — the same validator AMO runs — on the Firefox package before submitting. It catches the version-floor and data-collection issues above locally, instead of after a rejected upload. My bar was 0 errors / 0 warnings / 0 notices before the zip went anywhere near the store.

The short version

Chrome == Edge — one MV3 zip, upload to both.
Firefox — three manifest lines (background.scripts, gecko.id, data_collection_permissions) and one version floor (142.0). No code changes.
Omnibox on Android — unsupported; don't tick the box.
Lint locally with web-ext so the store never says no.

If you're maintaining separate repos or manifests per browser, collapsing to one source + a build transform is an afternoon well spent.

Two architectures for "script to video", and why the credit meter follows from the design

AI Alleyway — Fri, 03 Jul 2026 07:09:17 +0000

Two AI video tools take the same input — a short script — and hand back the same shape of output: a captioned vertical clip with a voiceover. Feed the same brief to both and one produces a clip for the equivalent of about 2 credits; the other, on its premium setting, burns about 40 credits for a single 30-second clip on a 75-credit monthly plan.

A 20× spread on identical-looking output is the kind of thing that looks like arbitrary pricing until you look at the pipeline. It isn't arbitrary. The credit meter is a projection of the architecture, and once you see the two designs, the prices — and the free-tier policies, and where each tool spends its quality budget — all fall out of the diagram.

I tested both (Fliki and InVideo) hands-on for a comparison, so to be clear about scope up front: I can't see either company's source. What follows is the architecture the observable behavior implies — the models each one exposes and the credit costs I actually watched tick down. Treat it as a systems read, not an internal spec.

Pipeline A: assemble a voice, backfill the picture

Fliki's design is a text-to-speech assembly pipeline. Trace a script through it and the stages look roughly like this:

Parse the script and segment it into scenes.
Synthesize narration from a large TTS voice library — 2,000+ voices across 80+ languages.
Time captions to the audio.
Backfill each scene's background from stock or an AI-generated still.
Mux audio + captions + background into a clip.

The thing to notice is where the cost and the differentiation live. Every stage except one is cheap and mostly deterministic — segmentation, caption timing, and asset lookup are database-and-glue work. The one stage that's both expensive and the actual product is TTS synthesis. That's why Fliki pours its quality budget into the voice library and lets the visuals stay basic: the visuals are a lookup, the voice is the inference.

TTS is also, in compute terms, cheap inference relative to what's coming in pipeline B. A few seconds of neural speech is orders of magnitude less GPU time than a few seconds of generated video. So the marginal cost of one Fliki clip is close to flat, and low. Two consequences fall straight out:

The free tier can actually export a finished (watermarked, 720p) video. A free export costs Fliki almost nothing, so it can afford to give one away as a real test drive.
The paid floor is low — about $8/month for the entry plan — because the pipeline it's amortizing is cheap.

Pipeline B: generate the footage, then narrate

InVideo's design is a generative-model orchestration layer. The stages:

Take a one-line prompt and expand it into a storyboard/plan (an LLM step).
For each scene, call a text-to-video model — it reaches Veo 3.1, Sora 2, Kling, and Seedance, 200+ models in all — to generate original footage.
Generate a voiceover.
Assemble.

Here the expensive stage isn't the voice. It's step 2, and it's more expensive by orders of magnitude. Generating a few seconds of novel video from a frontier diffusion/video model is among the most compute-heavy inferences you can buy right now. That single stage dominates the cost function so completely that everything else rounds to zero.

Now the 20× makes sense. When InVideo pulls a stock clip, step 2 degrades to a lookup and the clip costs ~2 credits — same class of operation as Fliki's backfill. When it generates a premium Veo/Sora clip, you're paying for GPU-seconds of a frontier model, and that's the ~40-credit clip. Same tool, same UI, two completely different cost regimes depending on whether step 2 retrieves or generates.

And the same two consequences invert:

The free tier cannot export a usable video, because a single free generation is real, non-trivial GPU cost — you can't give that away the way you give away a TTS clip.
The paid floor is higher (about $20/month entry) and the meter is a monthly pool of ~75 credits that one ambitious clip can gut, rather than the slow-draining yearly pool an assembly tool can offer.
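The credit arithmetic behind "one ambitious clip can gut" the pool is worth writing out. The figures are the approximate ones observed above (about 2 credits for a stock clip, about 40 for a premium generative clip, a pool of about 75); the constant and function names are hypothetical.

```javascript
// Approximate per-clip credit costs and monthly pool, as observed in the text above.
const CREDITS_PER_CLIP = { stock: 2, generative: 40 };
const MONTHLY_POOL = 75;

// Whole clips a pool can pay for at a given per-clip cost.
function clipsPerPool(pool, creditsPerClip) {
  return Math.floor(pool / creditsPerClip);
}

// Ratio between the two cost regimes.
function costSpread(costs) {
  return costs.generative / costs.stock;
}

// Credits left after spending on some number of generative clips.
function remainingAfter(pool, generativeClips, costs) {
  return pool - generativeClips * costs.generative;
}
```

Under these numbers the pool covers 37 stock clips or a single generative one, and that one generative clip leaves 35 credits, which is the 20× spread expressed as a monthly budget.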

The meter is a shadow of the pipeline

Put the two side by side and the pricing stops looking like a marketing decision and starts looking like an accounting identity:

Observable	Fliki (assembly)	InVideo (generation)
Dominant-cost stage	TTS synthesis	per-scene video generation
Marginal cost per clip	low, ~flat	low for stock, high for generative
Credit cost, one clip	~2-equivalent	~2 stock / ~40 generative
Free tier exports?	yes (cheap pipeline)	no (a free gen is real GPU cost)
Meter shape	slow yearly pool	monthly pool one clip can drain
Entry price	~$8/mo	~$20/mo
Where quality is spent	the voice	the footage

None of the right-hand column is a pricing "choice" in isolation. It's what the generative pipeline costs to operate, expressed as credits. The left column is the same story for a cheap pipeline.

This generalizes past these two tools, and it's the actually-useful part if you build or buy in this space: find the dominant-cost stage of a pipeline and you've predicted its pricing model, its free-tier policy, and where it spends quality. A tool whose expensive stage is retrieval will have a generous free tier and a flat, low meter. A tool whose expensive stage is frontier-model inference will gate the free tier and meter aggressively, because it has to — the unit economics don't allow anything else. When a pricing page confuses you, reverse the arrow: ask what the tool must be spending compute on, and the meter usually explains itself.
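The "find the dominant-cost stage" heuristic can be written down as a toy model. Everything here is illustrative: the ranks are ordinal positions taken from the argument above, not measured costs, placing the LLM storyboard step level with TTS is my assumption, and the names COST_RANK, dominantStage, and predictMeter are hypothetical.

```javascript
// Ordinal cost ranks (not measurements): glue and lookup are cheapest,
// TTS inference sits above them, frontier video generation is highest.
// Ranking "llm" alongside "tts" is an assumption for illustration.
const COST_RANK = { glue: 0, lookup: 0, llm: 1, tts: 1, video: 2 };

const assemblyPipeline = [
  { stage: "segment script into scenes", kind: "glue" },
  { stage: "synthesize narration", kind: "tts" },
  { stage: "time captions", kind: "glue" },
  { stage: "backfill background", kind: "lookup" },
  { stage: "mux clip", kind: "glue" },
];

const generationPipeline = [
  { stage: "expand prompt into storyboard", kind: "llm" },
  { stage: "generate footage per scene", kind: "video" },
  { stage: "generate voiceover", kind: "tts" },
  { stage: "assemble", kind: "glue" },
];

// Returns the first stage with the highest cost rank (pipeline must be non-empty).
function dominantStage(pipeline) {
  return pipeline.reduce((top, s) =>
    COST_RANK[s.kind] > COST_RANK[top.kind] ? s : top
  );
}

// Maps the dominant stage to the pricing shape the article argues for.
function predictMeter(pipeline) {
  const top = dominantStage(pipeline);
  const frontier = top.kind === "video";
  return {
    dominant: top.stage,
    freeExport: !frontier,
    meter: frontier ? "gated, drains fast" : "flat, low",
  };
}
```

Run against the two pipelines, the model picks narration synthesis for the assembly design and per-scene footage for the generative one, which is the same split the comparison table shows.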

For the buyer, the practical read is the same one the architecture predicts: if the voice carries your video, the assembly pipeline is the efficient match and you'll pay a low, predictable meter. If the footage is the product, you're buying GPU-time-as-credits and your job is to ration the generative stage — treat the 40-credit clip as a deliberate spend, not a default.

Both, incidentally, export clean 1080p on their paid plans, which is the tell that the resolution was never the differentiator — what fills the frame is. One holds a generic AI still; the other holds generated footage. That's the whole 20×.

I scored the two 4.3 and 4.2 respectively, and if you want the full head-to-head — the credit math on a real posting cadence, the voice-library depth, and which one to pick by what carries your videos — I wrote it up here: Fliki vs InVideo.

But the pricing itself you can now read straight off the diagram: cheap pipeline, cheap meter, free export; expensive pipeline, expensive meter, no free export. The credit gap was the architecture talking the whole time.