DEV Community: Soumyadeep Dey

My dashboard took 7.6 seconds to render fifteen numbers 🐌

Soumyadeep Dey — Thu, 23 Jul 2026 06:19:53 +0000

This is a submission for DEV's Summer Bug Smash: Clear the Lineup powered by Sentry.

Not fifteen tables. Fifteen numbers. Four stat tiles, three donut slices, six bars, and a list of four recent invoices.

Seven and a half seconds, on the page my clients land on after login.

It had been doing this for months and I never noticed, because my development database held four invoices. On four invoices it takes 111 ms and feels instant. This bug only existed at a volume I never had locally, which is the most dangerous kind there is, because it ships and then it waits.

Here's how Sentry found it, what was actually broken (three separate things), and the numbers afterwards.

The app, and why the stakes are real

DevLabs is the agency platform my team runs the business on. Next.js 16, React 19, MongoDB/Mongoose, ~34k lines. Marketing site, internal admin dashboard, and a client-facing portal holding real invoices, real contracts, and real customer records.

That last part matters twice in this post: once because slow pages cost us client trust, and once because it constrained how I was allowed to instrument.

A note on the code below. This is a private client-facing codebase, so I can't link the repo. Everything here is the real diff, quoted verbatim from the files it lives in, with nothing invented for the write-up. Every measurement comes from a harness I've included in full, so you can point the same one at your own app and get your own numbers rather than taking mine on faith.

Step 1: Sentry, and the integration that isn't on by default

I wired up @sentry/nextjs expecting the trace to point at the slow query.

Instead the whole request came back as one opaque server span. No database spans at all.

Here's the thing that cost me an evening, and the single most useful thing in this post:

Mongo and Mongoose spans are not in Sentry's default integration set. You have to name them.

I verified it rather than trusting my assumptions:

$ node -e "const n=require('@sentry/node');
  const d=n.getDefaultIntegrations({});
  console.log('total:', d.length);
  console.log('mongo:', d.map(i=>i.name).filter(x=>/[Mm]ongo/.test(x)))"

total: 17
mongo: []      # <- empty. they're opt-in.

Seventeen integrations out of the box, and your database is in none of them.

// src/instrumentation.ts
// Without these, a dashboard request traces as ONE opaque server span
// and the entire data layer stays invisible.
const integrations = isNode
  ? await import("@sentry/node").then((node) => [
      node.mongoIntegration(),
      node.mongooseIntegration(),
    ])
  : [];

Sentry.init({
  dsn: DSN,
  tracesSampleRate,
  integrations,
  sendDefaultPii: false,   // this app holds customer and invoice records
});

With those two lines the waterfall opened up, and the database started talking:

Span	Duration
`mongodb` `{"getMore": ...}`	4.42 s
`mongoose.projects.find`	2.67 s
`mongodb` `{"getMore": ...}`	1.35 s
`mongoose.customers.find`	134 ms
`mongodb` `{"find": {"firebaseUid": "?"}}`	86 ms

The thing I did not expect was getMore.

getMore is MongoDB's cursor-continuation command. You only see it when a result set is too large to come back in one batch, so the driver goes back to the server again and again to stream the remainder. Two getMore spans totalling 5.77 seconds is the database telling you, in its own words, you asked for so much data I had to deliver it in installments.

Two getMore spans (4.42s and 1.35s) in a single dashboard load, beside mongoose.projects.find at 2.67s, which is the second bug.

That's a much sharper diagnosis than "the query is slow." A slow find could be a missing index. getMore dominating a trace means the problem is volume of bytes, not lookup speed. No index would have saved this. It pointed straight at over-fetching before I had read a single line of the query.

Opening the full waterfall made the proportion impossible to argue with:

http.server  GET /dashboard ................. 14.89 s
  db  mongoose.invoices.find ................ 13.32 s   <-- one span
  db  mongoose.users.findOne ................ 845 ms
  resolve page components ................... 3.93 ms
  build component tree ...................... 11.73 ms

Sentry's own span breakdown:  db 77%  ·  default 12%  ·  function.nextjs 12%

One span was 89% of the request.

Read that number honestly. 14.89 s is a development load: Turbopack compiling, no production optimizations, and Sentry sampling at 100%. The clean figure is the 7,580 ms from the benchmark below, measured with the same code paths and no instrumentation overhead. What the trace contributes is not magnitude, it's shape: which span, what share, and why. Those hold regardless of environment.

(The saslStart, saslContinue, and createIndexes spans above it in the trace are MongoDB's connection handshake on cold start. They fire once per process and are not part of this bug.)

Sorted by duration, the whole bug reads top to bottom: a 14.89s request, one 13.32s query inside it, and a getMore cursor span at 9.62s.

Session Replay needed one more decision before I turned it on. This dashboard renders real customer names, invoice totals, and contract titles:

Sentry.replayIntegration({
  maskAllText: true,     // non-negotiable: replays record layout, never client data
  blockAllMedia: true,
}),

A replay that leaks a client's invoice total is not a debugging tool, it's an incident.

Step 2: make it reproducible

Sentry told me where. To know whether a fix worked I needed a number I could re-run, and a database that actually hurt.

So my first commit wasn't a fix. It was a volume generator, with two rules because it points at whatever MONGODB_URI points at:

It only inserts. Never updates or deletes a document it didn't create.
Everything it writes is tagged, so cleanup removes exactly the generated rows and nothing else.

const SEED_TAG = "[perf-seed]";

// Stands in for the base64 data URI a real uploaded logo becomes.
// Every invoice carries its own copy.
const FAKE_LOGO = `data:image/png;base64,${"iVBORw0KGgoAAAANSUhEUg".repeat(360)}`;

Plus a read-only benchmark that runs the exact functions the page runs:

npm run seed:perf -- --yes        # 2,000 invoices, 40 projects
npm run bench:dashboard           # measures, writes nothing

corpus      2004 invoices, 42 projects

getInvoicesAdmin()      6668.8 ms    22358.6 KB read
getDashboardOverview()   911.4 ms     2490.0 KB read, 4806 tasks scanned
                                      8 activity rows kept
total server time       7580.2 ms

24.3 MB off the wire, 7.6 seconds, to render fifteen numbers.

What one dashboard load was actually doing

flowchart TD
    A["GET /dashboard"] --> B["getInvoicesAdmin()"]
    A --> C["getDashboardOverview()"]

    B --> D["Invoice.find()<br/>no filter · no limit · no projection"]
    D --> E["2,004 FULL documents<br/>incl. base64 logo per invoice<br/>21.8 MB"]
    E --> F["rebuild every invoice<br/>invoiceFieldsFrom() x 2004"]

    C --> H["Project.find().select('name board activity')"]
    H --> I["every board · every activity row<br/>2.4 MB · 4,806 tasks"]
    I --> J["reduce in JS to<br/>8 items + 4 counters"]

    F --> K["ship 2,004 rows to the browser"]
    J --> K
    K --> L["render 15 numbers"]

    style E fill:#c62828,color:#fff
    style I fill:#c62828,color:#fff
    style L fill:#2e7d32,color:#fff

Three distinct bugs.

Bug 1: a base64 logo, fetched 2,004 times, to print a serial number

const invoices = await Invoice.find()   // no filter
  .populate(INVOICE_POPULATE)           // no limit
  .sort({ createdAt: -1 })              // no projection
  .lean();

Invoice.find() with no projection pulls every field of every document. In this schema that includes fields.companyDetails.logo, and uploaded logos are stored as base64 data URIs. Each invoice carries its own full copy of an image.

A list view rendering a serial number, a customer name, and a total was dragging an image per row across the network. Times two thousand.

The fix is pure exclusion, zero contract change:

+const INVOICE_LIST_EXCLUDE = "-fields.companyDetails.logo -fields.metadata -access";

 const invoices = await Invoice.find()
+  .select(INVOICE_LIST_EXCLUDE)
   .populate(INVOICE_POPULATE)

The detail view still selects everything, so opening an invoice is untouched.

22,358 KB → 6,046 KB. 6,669 ms → 1,597 ms.
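To see why one field can dominate a list payload, here is a minimal, database-free sketch. It builds 2,004 synthetic rows around the same FAKE_LOGO string the seed script uses and compares the serialized size with and without the logo. Every field name other than fields.companyDetails.logo is invented for illustration, and JSON length stands in for wire size, so the absolute numbers will not match the benchmark.

```typescript
// Sketch only: synthetic rows, no MongoDB. Field names other than
// fields.companyDetails.logo are hypothetical.
const FAKE_LOGO = `data:image/png;base64,${"iVBORw0KGgoAAAANSUhEUg".repeat(360)}`;

interface InvoiceRow {
  serial: string;
  customerName: string;
  total: number;
  fields: { companyDetails: { name: string; logo?: string } };
}

function makeRows(count: number): InvoiceRow[] {
  return Array.from({ length: count }, (_, i) => ({
    serial: `INV-${String(i + 1).padStart(5, "0")}`,
    customerName: `Customer ${i + 1}`,
    total: 100 + i,
    fields: { companyDetails: { name: "DevLabs", logo: FAKE_LOGO } },
  }));
}

// What a "-fields.companyDetails.logo" projection does, applied in memory.
function withoutLogo(row: InvoiceRow): InvoiceRow {
  return {
    ...row,
    fields: { ...row.fields, companyDetails: { name: row.fields.companyDetails.name } },
  };
}

// JSON length in characters; equal to bytes here because the data is ASCII.
function jsonSize(rows: InvoiceRow[]): number {
  return JSON.stringify(rows).length;
}

const rows = makeRows(2004);
const fullKB = jsonSize(rows) / 1024;
const projectedKB = jsonSize(rows.map(withoutLogo)) / 1024;

console.log(`full       ${fullKB.toFixed(0)} KB`);
console.log(`projected  ${projectedKB.toFixed(0)} KB`);
console.log(`logo share ${(100 * (1 - projectedKB / fullKB)).toFixed(1)}%`);
```

The real drop is smaller in relative terms because real invoices carry more fields than these toy rows, but the shape is the same: one embedded image per row outweighs everything the list actually renders.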

Bug 2: the comment that was true about the output and false about the input

// Aggregated CRM data for the dashboard Overview tab. One query pass,
// everything trimmed to what the cards render.

That comment is correct about what comes out: eight activity rows, four counters, four project bars.

It is completely wrong about what goes in:

Project.find().select("name board activity").lean()

Every project's entire board, meaning every column and every task, plus its entire activity log. Activity is append-only. It grows forever. The dashboard's cost was tied to my whole business history, on every load, to display eight rows.

4,806 tasks scanned to produce four numbers.

The fix is to do the reduction where the data already lives:

Project.aggregate([{
  $facet: {
    recentActivity: [
      { $unwind: "$activity" },
      { $sort: { "activity.time": -1 } },
      { $limit: 8 },
      { $project: { _id: 0, projectId: "$_id", projectName: "$name",
                    text: "$activity.text", time: "$activity.time" } },
    ],
    taskTotals: [
      { $unwind: "$board" }, { $unwind: "$board.tasks" },
      { $group: { _id: "$board.tasks.status", count: { $sum: 1 } } },
    ],
    perProject: [
      { $unwind: "$board" }, { $unwind: "$board.tasks" },
      { $group: {
          _id: "$_id", name: { $first: "$name" }, total: { $sum: 1 },
          done: { $sum: { $cond: [{ $eq: ["$board.tasks.status", "Done"] }, 1, 0] } },
      }},
      { $sort: { total: -1 } }, { $limit: 4 },
    ],
  },
}])

One round trip, three reductions. MongoDB counts 4,806 tasks server-side and hands back four numbers.

911 ms → 74 ms. 2,490 KB → 2.0 KB. A 99.9% cut in bytes returned.
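One way to check that a rewrite like this returns the same output is to keep a plain in-memory version of the three reductions and compare its result with what the pipeline returns on the same corpus. The sketch below assumes the document shape implied by the pipeline's field paths (board[].tasks[].status, activity[].text, activity[].time). The type names and reduceOverview are hypothetical, _id is simplified to a string, and time to a number.

```typescript
interface Task { status: string }
interface Column { tasks: Task[] }
interface Activity { text: string; time: number }
interface ProjectDoc { _id: string; name: string; board: Column[]; activity: Activity[] }

function reduceOverview(projects: ProjectDoc[]) {
  // recentActivity: unwind activity, sort by time descending, keep 8.
  const recentActivity = projects
    .flatMap((p) =>
      p.activity.map((a) => ({ projectId: p._id, projectName: p.name, text: a.text, time: a.time })),
    )
    .sort((x, y) => y.time - x.time)
    .slice(0, 8);

  // taskTotals: count tasks per status across every board.
  const taskTotals = new Map<string, number>();
  // perProject: total and "Done" counts per project, top 4 by total.
  const perProject: { _id: string; name: string; total: number; done: number }[] = [];

  for (const p of projects) {
    const tasks = p.board.flatMap((column) => column.tasks);
    for (const t of tasks) taskTotals.set(t.status, (taskTotals.get(t.status) ?? 0) + 1);
    // The double $unwind drops projects with no tasks, so skip them here too.
    if (tasks.length > 0) {
      perProject.push({
        _id: p._id,
        name: p.name,
        total: tasks.length,
        done: tasks.filter((t) => t.status === "Done").length,
      });
    }
  }
  perProject.sort((x, y) => y.total - x.total);

  return { recentActivity, taskTotals, perProject: perProject.slice(0, 4) };
}
```

Ties on time or total can come back in a different order from MongoDB than from Array.prototype.sort, so compare on tie-free data or add a secondary sort key on both sides.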

Bug 3: the comparator that answers "yes" in both directions

This one isn't about speed. It had been sitting in two files for months.

.sort((a, b) => (b.createdAt > a.createdAt ? 1 : -1))

Read it carefully. It never returns 0.

When two invoices share a timestamp, it says a comes before b. Ask the other way round and it says b comes before a. Both, at once.

compare(a, b) === -1   // "a first"
compare(b, a) === -1   // "b first"

That isn't a preference, it's a contradiction, and it breaks the contract Array.prototype.sort is built on. V8's TimSort then resolves it based on array length and starting position rather than on your data.

Equal timestamps are routine here: bulk imports, seed scripts, fast manual entry. And this runs inside a "use client" component, so server and browser can legitimately disagree about row order. That's a hydration mismatch waiting for the right array length.

sort comparator               before      after
  self-contradictory          true        false
  order independent of input  true        true

An honest note. The contract violation is proven, and true is not ambiguous. But my 64-element reordering test did not produce a visible reorder, because V8 uses insertion sort below a threshold where the inconsistency happens to be harmless. So this is a latent bug: guaranteed invalid, with a symptom that depends on array size and engine version. I'm reporting what I measured, not what would have made a better story.

-.sort((a, b) => (b.createdAt > a.createdAt ? 1 : -1))
+.sort((a, b) => (a.createdAt === b.createdAt ? 0 : a.createdAt < b.createdAt ? 1 : -1))
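The contract can be checked mechanically. The sketch below is not from the codebase: isContradictory is a hypothetical helper that flags a comparator giving the same answer in both directions, and byCreatedAtDesc is one alternative way to write the descending sort by converting to epoch milliseconds first. It assumes createdAt arrives as an ISO string. If your rows carry Date objects instead, compare getTime() values, because === on two Date objects compares identity rather than time.

```typescript
type Row = { createdAt: string }; // ISO-8601 timestamp (assumption)
type Compare = (a: Row, b: Row) => number;

// The original comparator: never returns 0.
const before: Compare = (a, b) => (b.createdAt > a.createdAt ? 1 : -1);

// Hypothetical replacement: normalise to epoch milliseconds, then subtract.
// Equal timestamps give 0, and swapping the arguments flips the sign.
const byCreatedAtDesc: Compare = (a, b) =>
  Date.parse(b.createdAt) - Date.parse(a.createdAt);

// A consistent comparator satisfies sign(cmp(a, b)) === -sign(cmp(b, a)).
function isContradictory(cmp: Compare, a: Row, b: Row): boolean {
  return Math.sign(cmp(a, b)) !== -Math.sign(cmp(b, a));
}

const first: Row = { createdAt: "2026-07-01T10:00:00.000Z" };
const second: Row = { createdAt: "2026-07-01T10:00:00.000Z" };

console.log(isContradictory(before, first, second));          // true
console.log(isContradictory(byCreatedAtDesc, first, second)); // false
```

Running a check like this over pairs of real rows with equal timestamps turns "latent" into a failing test without waiting for the engine to reorder anything.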

Results

                        BEFORE                          AFTER
getInvoicesAdmin       ████████████████████ 6669ms      ████ 1597ms
getDashboardOverview   ███ 911ms                         ▏ 74ms
──────────────────────────────────────────────────────────────────
TOTAL                  ██████████████████████ 7580ms    ████ 1671ms

Metric	Before	After	Change
Total server time	7,580 ms	1,671 ms	4.5× faster
`getInvoicesAdmin()`	6,669 ms	1,597 ms	4.2×
`getDashboardOverview()`	911 ms	74 ms	12.3×
Read from MongoDB	24,849 KB	6,048 KB	−76%
Overview bytes returned	2,490 KB	2.0 KB	−99.9%
Comparator self-contradictory	`true`	`false`	fixed

Same corpus, median of 5 runs.

And the same Sentry view, after:

The interesting part is not that the bars got shorter. The getMore spans are gone. Once the projection stops pulling a base64 logo per invoice, the result set fits in a single batch and MongoDB has nothing left to page.

Output verified identical. 4,806 tasks scanned, 8 activity rows, 4 project bars, before and after. A performance fix that changes behaviour is just a different bug.

Does your app have this? Three checks

This bug class isn't specific to my code. If you run Mongo behind a dashboard, here's how to check in about ten minutes.

1. Grep for unprojected finds.

rg "\.find\(\)" --type ts -A 3 | rg -v "select|limit|projection"

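Treat every hit as a candidate rather than a verdict: the second rg only drops lines that mention select, limit, or projection, so a find that is narrowed on a following line still shows up, just with less context. For each one, ask what the caller actually reads. If it's fewer than half the fields, add a .select().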

2. Look for a field that can be huge.

Base64 images, rich-text bodies, append-only logs, embedded arrays with no cap. One of these inside a list query is the whole bug. In my schema it was companyDetails.logo.

3. Check whether your reduction happens in the wrong place.

If a function loads N documents and returns a constant number of rows, the reduction belongs in the database. $facet does several at once in a single round trip.

And if your traces show no db.query spans at all, you haven't found a fast app. You've found missing integrations.

What I deliberately did not do

The obvious next move is pushing the money math into MongoDB too, $group the totals and skip the documents entirely.

I didn't, and it's the decision I'd defend hardest.

export function getTotalValue(data: ZodCreateInvoiceSchema): number {
  let total = getSubTotalValue(data) - getDiscountValue(data);
  for (const detail of data.invoiceDetails.billingDetails) {
    if (detail.type === "percentage") total += (total * detail.value) / 100;
    else total += detail.value;
  }
  return Math.round(total * 100) / 100;
}

Those billingDetails compound sequentially. Each percentage applies to the running total, so order matters. Reproducing that in MongoDB expression language means a $reduce that has to stay byte-identical to this function forever.

That's two sources of truth for what a customer owes. A dashboard that loads 200 ms faster is not worth an invoice total that disagrees with itself.

So I cut what gets fetched and left the arithmetic in exactly one place.
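The order sensitivity is easy to check by hand. This sketch mirrors only the loop from getTotalValue above, over a simplified detail type. The BillingDetail shape, the "fixed" type name, and applyBillingDetails are assumptions for illustration, not the app's schema.

```typescript
type BillingDetail = { type: "percentage" | "fixed"; value: number };

// Same loop as getTotalValue: each percentage applies to the running total.
function applyBillingDetails(base: number, details: BillingDetail[]): number {
  let total = base;
  for (const detail of details) {
    if (detail.type === "percentage") total += (total * detail.value) / 100;
    else total += detail.value;
  }
  return Math.round(total * 100) / 100;
}

const tax: BillingDetail = { type: "percentage", value: 10 };
const shipping: BillingDetail = { type: "fixed", value: 5 };

console.log(applyBillingDetails(100, [tax, shipping])); // 115
console.log(applyBillingDetails(100, [shipping, tax])); // 115.5
```

Same two adjustments, fifty cents apart. Any second implementation has to preserve array order exactly, which is the maintenance cost described above.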

What I learned

Empty dev databases hide production bugs. Four invoices, 111 ms. Two thousand, 6,669 ms. The code never changed.

Read comments as claims to verify. "Everything trimmed to what the cards render" was true about the output and false about the input, and that gap was the entire bug.

Instrumentation has defaults, and defaults have gaps. Seventeen integrations, and your database is in none of them. An empty waterfall is a finding, not a clean bill of health.

Learn to read the span name, not just the bar. getMore versus find is the difference between "I sent too many bytes" and "I looked in the wrong place." One is fixed with a projection, the other with an index. The trace told me which before I had read the query.

Report what you measured. The comparator's contract violation is provable. The visible reorder didn't reproduce at 64 elements, so I said so.

The harness, in full

I can't hand you my repo, so here's the thing that actually did the work. It's about thirty lines and it's framework-agnostic: point it at any function your page calls.

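// bench.ts (read-only). Runs the same functions the page runs.
// Invoice and getInvoicesAdmin come from your own app; import them here.
import type { Model } from "mongoose";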
const RUNS = 5;

function median(xs: number[]) {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

async function time<T>(fn: () => Promise<T>): Promise<[T, number]> {
  const t0 = performance.now();
  const out = await fn();
  return [out, performance.now() - t0];
}

/** Bytes the driver actually pulls back. Pass the SAME projection your
 *  production code uses, or you'll measure an idealised query.
 *  (I got this wrong first time and the projection looked like a no-op.) */
async function wireBytes(model: Model<any>, select?: string) {
  const q = model.find({});
  if (select) q.select(select);
  return Buffer.byteLength(JSON.stringify(await q.lean()), "utf8");
}

const times: number[] = [];
for (let i = 0; i < RUNS; i++) times.push((await time(getInvoicesAdmin))[1]);

console.log(`time         ${median(times).toFixed(1)} ms`);
console.log(`from mongo   ${(await wireBytes(Invoice) / 1024).toFixed(1)} KB`);
console.log(`rendered     4 tiles, 3 donut counts, 6 bars, 4 rows`);

That last console.log is the whole trick. Print what you fetched next to what you rendered, on the same screen. The gap between those two lines is the bug, and once you can see it you can't unsee it.

For volume, generate rows that look like your worst real ones (mine embed a base64 logo), tag every generated row with a marker string so cleanup is exact, and never let the generator update or delete anything it didn't create.

Three bugs. About forty lines changed. 7.6 seconds down to 1.7.

The fix was small. Finding it meant making it hurt first, and turning on the integration that shows you where it hurts.

I wanted Snowflake to do more than store data. So I built a pipeline where it judges World Cup conviction, then writes its verdict back to Solana. The architecture ended up being my favorite part of the project.

Soumyadeep Dey — Thu, 16 Jul 2026 06:08:45 +0000

DEV Weekend Challenge: Passion Edition Submission

We Taught a Snowflake Warehouse to Judge World Cup Conviction and Write the Verdict Back to Solana

Soumyadeep Dey — Sun, 12 Jul 2026 18:09:31 +0000

This is a submission for Weekend Challenge: Passion Edition

Target categories: Best use of Snowflake + Best use of Solana.

The single most common mistake in cross-platform hackathon projects is using
one platform as a database and the other as a logo. A dashboard that reads
Solana data through an API and renders it in a UI is not a Solana project.
It's a dashboard.

The actual first question is: what can each platform do that the other
physically cannot?

Solana has behavioral data that exists nowhere else - every buy, sell, and
hold, public and permanent. But a blockchain cannot compute aggregates over
its own history; that's not what the runtime is for. Snowflake can chew
through millions of decoded transactions with a declarative SQL DAG - but it
has no way to make a statement the rest of the on-chain world can read and
build on.

Strip away the World Cup framing and what I built is a pattern: a data
warehouse acting as a first-class blockchain participant. Read state that
only exists on-chain, compute something the chain cannot compute about its
own history, and commit the answer back where other programs can consume it.

Solana (mainnet) --read--> Snowflake --compute--> Snowflake --write back--> Solana (devnet)
  live swaps and             streaming              conviction               an oracle account
  transfers, decoded         ingest + Dynamic       scoring + Cortex         any program can read
  by Helius                  Tables DAG             ML

FERVOR is a real-time conviction oracle for World Cup fandom on Solana.
It tracks sixteen national token communities from the 2026 field - Argentina
to Japan, Morocco to Canada - wired into eight derby rivalries (ARG-BRA,
FRA-ENG, POR-ESP, GER-NOR, USA-MEX, NED-BEL, CRO-MAR, JPN-CAN), and it does
not measure hype, because following a crowd is cheap. It measures
conviction: the wallet that keeps buying its country's token while the
price bleeds, the holder who never sells through a losing run. Passion,
defined mechanically: holding when it hurts.

One thing up front, because it changes how you read every number below

The pipeline is real and running unattended. The fan wallets are not - yet.

Unofficial "national" meme tokens need per-mint liquidity vetting before you
point mainnet ingest at them, so the sixteen mints in REF.TEAM_TOKENS are
still PLACEHOLDER_MINT_*, and the wallet activity flowing through the DAG
is a labeled synthetic feed (_batch_id = 'DEMO_SEED').

What that means precisely: the seeder injects Helius-shaped transactions
upstream, into the raw landing table, and the real warehouse computes every
downstream number from there. Nothing in MARTS is hand-written. Nothing is
mocked below the landing zone. One UPDATE per row in REF.TEAM_TOKENS
points ingest at verified mints and the identical SQL scores real traders.

Full accounting in What's real vs. what's simplified at the bottom. I'd rather you
read the architecture knowing this than discover it in a footnote.

Verify this in 60 seconds - no keys, no clone

The oracle is live. Snowflake's output lands on devnet every 5 minutes, signed and confirmed. Watch the oracle account - or open the Norway write below and expand the Memo instruction to read the JSON the warehouse emitted.
The warehouse is real. The Dynamic Tables DAG, live in Snowsight - the actual lineage, not the diagram. And no cron anywhere: every table declares TARGET_LAG and Snowflake works out the refresh order.
The scoring is SQL. warehouse/03_marts_dt.sql - the conviction score is twelve lines, and it is the only place a wallet's loyalty is decided.

Repo: github.com/SoumyaEXE/weekend-challenge

Every 5 minutes, a Snowflake task stages the freshly computed index and a
signing bridge writes it to Solana devnet as a confirmed transaction:

{
  "oracle": "FERVOR",
  "source": "Snowflake MARTS.TEAM_FERVOR_INDEX",
  "team_id": 10,
  "team": "NORWAY",
  "fervor_index": 236,
  "momentum": -21
}

That JSON is Snowflake output living on a block explorer. fervor_index is a
SQL aggregate. momentum is a Cortex forecast - a machine-learning output
from inside the warehouse, published to a blockchain. Negative momentum means
the model expects the index to fall.

And this is the league the warehouse produces:

Snapshot. The index recomputes every minute; the on-chain writes are every five.

Argentina leads on conviction (109 average days held), Japan is the surprise
runner-up, England has the deepest bench of believers, and Norway folds under
pressure - on the seeded feed.

Norway is the transaction linked above. The prose, the league table, and the
block explorer are three views of one number, and none of them was written by
hand. And "folds under pressure" is the model's verdict, not mine: the
momentum: -21 in that payload is Cortex forecasting Norway's conviction to
slide another 21 points, and that projection is what got signed on-chain.

And that is the honest claim worth making. This table does not prove
Argentina's fans are loyal. It proves the engine discriminates: sixteen
behavioral profiles go into the raw landing table, and the warehouse alone
decides who is a believer and who is a tourist. The ranking is an output, not
an input. Swap the mints and the same SQL judges real money.

The architecture, actually explained

Six stages. Every stage does load-bearing work; there is no component whose
only job is to name-drop a technology.

#	Stage	Tech	Latency	What it does
1	INGEST	Node worker + Helius enhanced transactions	8 s batches	Polls decoded mainnet activity for the 16 tracked country tokens
2	LAND	micro-batch inserts	seconds	Raw VARIANT JSON into `RAW.SOLANA_TX_LANDING`
3	TRANSFORM	Snowflake Dynamic Tables	1 min lag	7-table declarative DAG: decode, price, position, score
4	INTELLIGENCE	Snowflake Cortex	5 min task	`ANOMALY_DETECTION`, `FORECAST`, `COMPLETE`
5	WRITE-BACK	Node signing bridge + Anchor	5 min task	Queue staged in SQL, signed and confirmed on devnet
6	SERVE	Streamlit-in-Snowflake	live	8-tab analytics app, zero external hosting

One database, six schemas, data flowing strictly left to right:

Schema	Purpose	Key objects
`REF`	hand-loaded seed	`TEAM_TOKENS` (mint, decimals, rival_team_id), `PARAMS`
`RAW`	untouched VARIANT landing	`SOLANA_TX_LANDING`
`STAGING`	decoded events (Dynamic Tables)	`TOKEN_TRANSFERS`, `SWAP_EVENTS`, `PRICE_TICKS`
`MARTS`	conviction metrics (Dynamic Tables)	`WALLET_POSITIONS`, `WALLET_FERVOR`, `TEAM_FERVOR_INDEX`, `DEFECTION_FLOWS`
`ML`	Cortex outputs + time series	`TEAM_INDEX_HISTORY`, `FERVOR_ANOMALIES`, `FERVOR_FORECAST`
`ORACLE`	write-back queue + audit	`PUBLISH_QUEUE`, `PUBLISH_LOG`

Stage 1: ingest - a cursor trick that makes polling feel like streaming

Helius decodes raw Solana transactions into clean JSON, so I ingest
structured events instead of byte soup. The worker polls each tracked mint
every 8 seconds, but the until cursor means each poll returns only
transactions newer than the last one seen. Functionally, it's a stream:

// ingest/poller.ts - the cursor is the whole trick
const cursor = new Map<string, string>(); // address -> newest seen signature

async function fetchNewTxs(address: string): Promise<EnhancedTx[]> {
  const until = cursor.get(address);
  const url =
    `https://api.helius.xyz/v0/addresses/${address}/transactions` +
    `?api-key=${HELIUS_API_KEY}&limit=50` +
    (until ? `&until=${until}` : "");
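  const res = await fetch(url);
  // Fail loudly on an error response instead of treating its body as a tx list.
  if (!res.ok) throw new Error(`Helius request failed: ${res.status}`);
  const txs = (await res.json()) as EnhancedTx[];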
  if (txs.length > 0) cursor.set(address, txs[0].signature); // newest first
  return txs;
}

Landing is a multi-row INSERT ... SELECT PARSE_JSON(?) through the Node
connector. Honest note: true Snowpipe Streaming needs the Java ingest SDK.
8-second micro-batches demo identically (rows appear in Snowflake seconds
after they land on-chain) and were far more reliable to stand up in a
weekend.

Stage 3: the Dynamic Tables DAG - no cron, declared lag

I did not write a single scheduler for the transform layer. Each table
declares TARGET_LAG = '1 minute' and Snowflake works out the refresh order:

RAW.SOLANA_TX_LANDING
    |- STAGING.TOKEN_TRANSFERS ----> MARTS.DEFECTION_FLOWS
    |- STAGING.SWAP_EVENTS -------> STAGING.PRICE_TICKS
    |- MARTS.WALLET_POSITIONS
           |- MARTS.WALLET_FERVOR ----> MARTS.TEAM_FERVOR_INDEX
                                             |- (1 min task) ML.TEAM_INDEX_HISTORY
                                             |- (5 min task) Cortex models
                                             |- (5 min task) ORACLE.PUBLISH_QUEUE

The result is a per-minute index series where each of the sixteen nations
charts its own story:

Two pieces of the transform SQL earned their place the hard way.

Decoding BUY/SELL without per-DEX parsers. Helius does not emit a parsed
swap event for every venue, so I classify from token-balance movement: fee
payer receives the tracked token = BUY, sends it = SELL. Routed swaps get
their legs summed, and self-arbitrage transactions (fee payer both sends AND
receives the same token in one tx) are dropped entirely - an arb bot carries
zero conviction signal:

-- warehouse/02_staging_dt.sql (excerpt)
team_legs AS (
  SELECT l.signature,
         MIN(l.block_time)                  AS block_time,
         l.raw_payload:feePayer::string     AS wallet,
         tt.value:mint::string              AS mint,
         r.team_id,
         IFF(tt.value:toUserAccount::string = l.raw_payload:feePayer::string,
             'BUY', 'SELL')                 AS side,
         SUM(tt.value:tokenAmount::float)   AS token_amount
  FROM dedup l,
       LATERAL FLATTEN(input => l.raw_payload:tokenTransfers) tt
  JOIN REF.TEAM_TOKENS r ON r.mint = tt.value:mint::string
  WHERE tt.value:toUserAccount::string   = l.raw_payload:feePayer::string
     OR tt.value:fromUserAccount::string = l.raw_payload:feePayer::string
  GROUP BY l.signature, l.raw_payload:feePayer::string,
           tt.value:mint::string, r.team_id, side
  -- self-arb filter: a signature with BOTH a BUY and a SELL row for the
  -- same token makes this window count 2, and the tx is excluded
  QUALIFY COUNT(*) OVER (PARTITION BY l.signature, r.team_id) = 1
)
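The same classification rule, restated outside SQL as a minimal sketch. It assumes the Helius-style field names used in the excerpt (feePayer, tokenTransfers, toUserAccount, fromUserAccount, mint, tokenAmount). The classify function and its types are illustrative and not part of the repo, and it keys on mint where the SQL partitions by team_id.

```typescript
interface TokenTransfer {
  mint: string;
  fromUserAccount: string;
  toUserAccount: string;
  tokenAmount: number;
}
interface Tx { signature: string; feePayer: string; tokenTransfers: TokenTransfer[] }
interface Leg {
  signature: string;
  wallet: string;
  mint: string;
  side: "BUY" | "SELL";
  tokenAmount: number;
}

function classify(tx: Tx, trackedMints: Set<string>): Leg[] {
  const legs = new Map<string, Leg>(); // key: mint + side
  for (const t of tx.tokenTransfers) {
    if (!trackedMints.has(t.mint)) continue;
    const received = t.toUserAccount === tx.feePayer;
    const sent = t.fromUserAccount === tx.feePayer;
    if (!received && !sent) continue; // fee payer not involved in this transfer
    const side: "BUY" | "SELL" = received ? "BUY" : "SELL";
    const key = `${t.mint}:${side}`;
    const leg: Leg = legs.get(key) ?? {
      signature: tx.signature,
      wallet: tx.feePayer,
      mint: t.mint,
      side,
      tokenAmount: 0,
    };
    leg.tokenAmount += t.tokenAmount; // routed swaps: sum the legs
    legs.set(key, leg);
  }
  // Self-arb filter: a mint with both a BUY and a SELL leg is dropped.
  const sidesPerMint = new Map<string, number>();
  legs.forEach((leg) => sidesPerMint.set(leg.mint, (sidesPerMint.get(leg.mint) ?? 0) + 1));
  return Array.from(legs.values()).filter((leg) => sidesPerMint.get(leg.mint) === 1);
}
```

A routed buy with two inbound transfers of the same mint collapses into one BUY leg with the summed amount, while a transaction that both sends and receives that mint returns no legs at all.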

The 24-hour price change that survives trading gaps. The naive version is
LAG(price, 24) over hourly buckets. That counts ROWS, not HOURS - one
quiet hour and your "24h change" silently becomes a 25h change. ASOF JOIN
matches each tick to the nearest tick at or before 24 hours earlier, with a
warm-up fallback for series younger than a day:

-- warehouse/02_staging_dt.sql (excerpt)
SELECT cur.team_id, cur.window_start, cur.price_usd,
       100.0 * (cur.price_usd - COALESCE(day_ago.price_usd, first_tick.price_usd))
             / NULLIF(COALESCE(day_ago.price_usd, first_tick.price_usd), 0)
         AS price_change_24h_pct
FROM hourly cur
ASOF JOIN hourly day_ago
  MATCH_CONDITION (DATEADD('hour', -24, cur.window_start) >= day_ago.window_start)
  ON cur.team_id = day_ago.team_id
LEFT JOIN ( -- warm-up: earliest tick, for series younger than 24h
  SELECT team_id, price_usd FROM hourly
  QUALIFY ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY window_start) = 1
) first_tick ON first_tick.team_id = cur.team_id;

That distinction is not pedantry. A dip is what conviction is measured
against, so a 25-hour "24h change" quietly reclassifies who was buying the
bleed. Prices use the hourly median, not VWAP - SOL-leg attribution is
heuristic, and one mis-attributed leg destroys a volume-weighted mean.
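The rows-versus-hours difference shows up with a single missing bucket. A minimal sketch, assuming hourly ticks sorted ascending with hours expressed as plain integers. lagBaseline and asOfBaseline are illustrative names, and the warm-up fallback from the SQL is left out for brevity.

```typescript
interface Tick { hour: number; price: number } // hour = hours since some epoch

// Row-offset version: what LAG(price, 24) does. Counts rows, not hours.
function lagBaseline(ticks: Tick[], i: number, rows = 24): Tick | undefined {
  return i - rows >= 0 ? ticks[i - rows] : undefined;
}

// As-of version: latest tick at or before (current hour - 24).
function asOfBaseline(ticks: Tick[], i: number, hours = 24): Tick | undefined {
  const cutoff = ticks[i].hour - hours;
  let match: Tick | undefined;
  for (const t of ticks) {
    if (t.hour <= cutoff) match = t; // ticks are sorted ascending
    else break;
  }
  return match;
}

// Hours 0..30 with hour 10 missing (one quiet hour).
const ticks: Tick[] = [];
for (let h = 0; h <= 30; h++) if (h !== 10) ticks.push({ hour: h, price: 100 + h });

const last = ticks.length - 1;                // the tick at hour 30
console.log(lagBaseline(ticks, last)?.hour);  // 5
console.log(asOfBaseline(ticks, last)?.hour); // 6
```

The row offset lands on hour 5, a 25-hour lookback, while the as-of lookup lands on hour 6 as intended.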

The conviction score - the creative payload

Hype is a headcount. Conviction is a cost. Per wallet, per country, 0 to 100:

Component	Max points	What it rewards
`25 x ln(1 + hold_days) / ln(366)`	25	longevity of the relationship
`30 x min(1, dip_buys / 5)`	30	buying while THEIR token bled 5%+ in 24h
`25 x min(1, underwater_buys / 5)`	25	buying below their own average cost
`-20 x (sold within 3 days)`	-20	punishes paper hands

-- warehouse/03_marts_dt.sql (excerpt)
ROUND(GREATEST(0, LEAST(100,
      25 * LN(1 + a.hold_duration_days) / LN(1 + 365)
    + 30 * LEAST(1, COALESCE(c.buy_against_decline, 0) / 5.0)
    + 25 * LEAST(1, COALESCE(c.buy_at_loss, 0) / 5.0)
    - 20 * IFF(a.last_sell_time > DATEADD('day', -3, SYSDATE()), 1, 0)
)), 1) AS fervor_score

Two engineering decisions worth stealing:

Tenure in fractional days (DATEDIFF('second', ...) / 86400.0). Whole-day granularity freezes longevity at zero for 24 hours and flat-lines the index on launch day.
Zero-signal wallets are excluded from the national average. The raw swap feed is dominated by one-shot MEV bots whose score is 0 by construction. Averaging them in measures bot traffic, not the fan base:

ROUND(( COALESCE(AVG(IFF(f.fervor_score > 0, f.fervor_score, NULL)), 0) * 0.6
      + LEAST(100, COUNT_IF(f.fervor_score >= 60) / 10.0) * 0.4 ) * 10, 0)
  AS fervor_index   -- 0 to 1000, this exact number goes on-chain
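To sanity-check the two formulas by hand, here is a direct transcription into TypeScript. The function and parameter names are illustrative, not from the repo. The arithmetic follows the two SQL excerpts above, with a boolean standing in for the 3-day sell check.

```typescript
const clamp = (x: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, x));
const roundTo = (x: number, digits: number) => {
  const f = Math.pow(10, digits);
  return Math.round(x * f) / f;
};

// Per-wallet score, mirroring the fervor_score expression.
function fervorScore(
  holdDays: number,
  dipBuys: number,
  underwaterBuys: number,
  soldRecently: boolean,
): number {
  const raw =
    (25 * Math.log(1 + holdDays)) / Math.log(1 + 365) +
    30 * Math.min(1, dipBuys / 5) +
    25 * Math.min(1, underwaterBuys / 5) -
    20 * (soldRecently ? 1 : 0);
  return roundTo(clamp(raw, 0, 100), 1);
}

// Per-team index, mirroring the fervor_index expression.
function fervorIndex(scores: number[]): number {
  const signal = scores.filter((s) => s > 0); // zero-signal wallets excluded
  const avg = signal.length ? signal.reduce((a, b) => a + b, 0) / signal.length : 0;
  const devoted = scores.filter((s) => s >= 60).length; // COUNT_IF(score >= 60)
  return Math.round((avg * 0.6 + Math.min(100, devoted / 10) * 0.4) * 10);
}

console.log(fervorScore(365, 5, 5, false));  // 80
console.log(fervorScore(0, 0, 0, true));     // 0
console.log(fervorIndex([80, 80, 0, 0, 0])); // 481
```

Two things fall out of the numbers. With only the four components shown in the excerpt, a wallet tops out at 80, so the upper clamp at 100 never binds for these terms. And the three zero-score wallets in the last call leave the average at 80 instead of dragging it to 32, which is the bot-exclusion rule doing its job.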

And the emotional centerpiece, MARTS.DEFECTION_FLOWS: the same wallet
selling one country and buying another within 24 hours. Selling ARGENTINA to
buy BRAZIL is not a portfolio rebalance. It is treason, and it gets its own
Sankey:

Stage 4: Cortex - and the rule that the LLM never produces a number

One rule governs every metric in this system: SQL and fitted models compute
the numbers. The LLM only writes prose. Nothing an LLM says reaches the
index, the league table, or the chain.

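Three Cortex features, driven by a 5-minute task. The two fitted models, ANOMALY_DETECTION and FORECAST, are retrained on each run; COMPLETE is an LLM call and is never trained here.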

ANOMALY_DETECTION - with a strict train/detect split.

-- warehouse/04_ml_cortex.sql (excerpt)
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION ML.FERVOR_ANOMALY_MODEL(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE(
    'SELECT team_id, ts, fervor_index FROM FERVOR.ML.TEAM_INDEX_HISTORY
      WHERE NOT COALESCE(is_synthetic, FALSE)          -- never train on demo spikes
        AND ts < DATEADD(minute, -15, SYSDATE())'),    -- strict train/detect split
  SERIES_COLNAME => 'TEAM_ID', TIMESTAMP_COLNAME => 'TS',
  TARGET_COLNAME => 'FERVOR_INDEX', LABEL_COLNAME => '');

The gotcha that cost an hour: my first version trained on ts < now and
detected on the last hour. Cortex rejected it -
All evaluation timestamps must be after the last timestamp in fitting data.
Training and detection must not overlap. Split both at the same boundary,
strict < on one side, >= on the other.
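The boundary rule is easy to get wrong by one row, so here is a minimal Python sketch of it. The helper name and the sample series are made up for illustration; only the strict `<` on one side and `>=` on the other come from the text above.

```python
from datetime import datetime, timedelta


def split_at_boundary(rows, boundary):
    """Split (ts, value) rows into train and detect sets that share no timestamp."""
    train = [r for r in rows if r[0] < boundary]     # strict < on the training side
    detect = [r for r in rows if r[0] >= boundary]   # >= on the detection side
    return train, detect


now = datetime(2026, 7, 1, 12, 0)
boundary = now - timedelta(minutes=15)
# One sample every five minutes over the last hour (illustrative values).
rows = [(now - timedelta(minutes=5 * i), 400 + i) for i in range(12)]

train, detect = split_at_boundary(rows, boundary)
print(len(train), len(detect))
```

Because both filters use the same `boundary` value, a row that lands exactly on it goes to detection and nowhere else, which is the non-overlap condition the Cortex error message asks for.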

The is_synthetic flag is the honesty mechanism, and it is load-bearing. A
demo needs a passion spike on camera, so scripts/demo_spike.py injects
labeled rows that detection sees but training never fits. The detector
flags the surge because it genuinely is one - not because it was taught to.

FORECAST - and this one goes on-chain. Every nation's index is projected
12 minutes out. The delta between that forecast and the live index is
momentum - the second field in the oracle payload. So a Snowflake Cortex ML
output is signed onto Solana every five minutes, next to the SQL aggregate it
predicts. The dashboard draws it as a dumbbell (live index → forecast) so you
read who is strengthening and who is cracking in one glance, with an explicit
label that it forecasts holding behavior, not token price.
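As a small illustration of that second payload field, momentum is just a subtraction. The function names and the rounding to a whole number are my assumptions; the field names follow the queue table and account struct shown later in this post.

```python
def momentum(forecast_index: float, live_index: float) -> int:
    """Forecast minus live index, as a signed whole number."""
    return round(forecast_index - live_index)


def oracle_payload(team_id: int, live_index: int, forecast_index: float) -> dict:
    """Assemble the two numbers that get signed: the index and its momentum."""
    return {
        "team_id": team_id,
        "fervor_index": live_index,
        "momentum": momentum(forecast_index, live_index),
    }


# Positive momentum: the forecast sits above the live index (strengthening).
print(oracle_payload(team_id=7, live_index=420, forecast_index=431.6))
```

A negative value means the forecast sits below the live index, which is the "cracking" side of the dumbbell.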

COMPLETE - prose only. A scheduled task writes hype lines; the AI insights
tab calls Cortex on demand from inside the app - no external API, no key,
the LLM runs where the data lives:

SNOWFLAKE.CORTEX.COMPLETE('mistral-large2',
  'You are a crypto market analyst covering World Cup fan tokens. 3 bullets: '
  || 'one strength, one risk, one outlook. ' || team_name || ' index '
  || fervor_index || '/1000, ' || devoted_wallet_count
  || ' believer wallets, average hold ' || avg_hold_days || ' days.')

Every number in that prompt was computed upstream, in SQL. The model is handed
facts and asked for sentences. It is never asked for arithmetic.
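The same separation can be sketched in Python: a function that only formats numbers it was given and computes nothing itself. The function name is hypothetical; the wording mirrors the SQL concatenation above.

```python
def build_prompt(team_name: str, fervor_index: int,
                 devoted_wallet_count: int, avg_hold_days: float) -> str:
    """Format precomputed facts into the analyst prompt. No arithmetic happens here."""
    return (
        "You are a crypto market analyst covering World Cup fan tokens. 3 bullets: "
        "one strength, one risk, one outlook. "
        f"{team_name} index {fervor_index}/1000, {devoted_wallet_count}"
        f" believer wallets, average hold {avg_hold_days} days."
    )


print(build_prompt("BRAZIL", 412, 37, 12.5))
```

Keeping the prompt builder free of arithmetic makes the rule auditable: every figure in the model's input traces back to a SQL column.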

The market tabs render real 1-hour candlesticks built in SQL - no OHLC
API, just MIN_BY/MAX_BY over decoded swap executions:

SELECT TIME_SLICE(s.block_time, 1, 'HOUR')      AS h,
       MIN_BY(s.price_usd, s.block_time)        AS open,
       MAX(s.price_usd)                         AS high,
       MIN(s.price_usd)                         AS low,
       MAX_BY(s.price_usd, s.block_time)        AS close,
       SUM(s.sol_amount)                        AS volume
FROM STAGING.SWAP_EVENTS s
WHERE s.team_id = ? AND s.block_time >= DATEADD('hour', -72, CURRENT_TIMESTAMP())
GROUP BY 1 ORDER BY 1;
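For readers who think in loops rather than aggregates, here is a rough Python equivalent of that query. The tuple shape and function name are assumptions; the point is that open and close are the prices at the earliest and latest `block_time` in each hour, which is what `MIN_BY` and `MAX_BY` pick out.

```python
from collections import defaultdict
from datetime import datetime


def hourly_candles(swaps):
    """swaps: iterable of (block_time, price_usd, sol_amount). Returns candles by hour."""
    buckets = defaultdict(list)
    for block_time, price, sol in swaps:
        # Truncate to the hour, like TIME_SLICE(block_time, 1, 'HOUR').
        hour = block_time.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((block_time, price, sol))

    candles = []
    for hour in sorted(buckets):
        rows = sorted(buckets[hour], key=lambda r: r[0])
        prices = [r[1] for r in rows]
        candles.append({
            "h": hour,
            "open": rows[0][1],     # price at the earliest block_time
            "high": max(prices),
            "low": min(prices),
            "close": rows[-1][1],   # price at the latest block_time
            "volume": sum(r[2] for r in rows),
        })
    return candles
```

Swaps may arrive in any order; sorting inside each bucket is what makes open and close correct.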

And the USD leg is live: the bridge pushes the real SOL/USD rate into
REF.PARAMS every 60 seconds, so the whole DAG's pricing tracks the market.

Stage 5: the write-back - where the trust boundary lives

The security property judges should check: the oracle keypair never touches
Snowflake. The warehouse stages numbers in a queue table and nothing more. A
small Node bridge holds the key, signs, submits, and writes the audit trail
back:

-- warehouse/05_oracle.sql
CREATE OR REPLACE TABLE ORACLE.PUBLISH_QUEUE (
  publish_id   NUMBER IDENTITY,
  team_id      NUMBER, team_name STRING,
  fervor_index NUMBER,
  momentum     NUMBER DEFAULT 0,          -- Cortex forecast minus current index
  status       STRING DEFAULT 'PENDING',  -- PENDING -> SENT -> CONFIRMED | FAILED
  queued_at    TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
  tx_signature STRING, explorer_url STRING, last_error STRING
);
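The status comment in that table implies a small state machine. Here is a hedged Python sketch of how a bridge might enforce it in memory. `sign_and_send` is a hypothetical callable standing in for the signing and submission step, and I am reading the comment as "FAILED is reached from SENT", which is an assumption.

```python
ALLOWED = {
    "PENDING": {"SENT"},
    "SENT": {"CONFIRMED", "FAILED"},
    "CONFIRMED": set(),   # terminal
    "FAILED": set(),      # terminal
}


def advance(row: dict, new_status: str, **fields) -> dict:
    """Return a copy of row moved to new_status, rejecting illegal transitions."""
    if new_status not in ALLOWED[row["status"]]:
        raise ValueError(f"illegal transition {row['status']} -> {new_status}")
    return {**row, "status": new_status, **fields}


def publish(row: dict, sign_and_send) -> dict:
    """Handle one queued row on the bridge side. The key never leaves this process."""
    row = advance(row, "SENT")
    try:
        signature = sign_and_send(row["team_id"], row["fervor_index"], row["momentum"])
    except Exception as exc:
        # Record the failure so it lands in last_error for the audit trail.
        return advance(row, "FAILED", last_error=str(exc))
    return advance(row, "CONFIRMED", tx_signature=signature)
```

The warehouse side only ever inserts `PENDING` rows; everything after that is written back by the process that holds the key.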

Until the Anchor program is deployed the bridge writes signed Memo
transactions - real, confirmed, explorer-visible. Once FERVOR_PROGRAM_ID is
set it flips to the PDA oracle automatically: one account per country,
init_if_needed, so the first write creates it.

// oracle-program/programs/fervor_oracle/src/lib.rs (excerpt)
#[account]
pub struct FervorAccount {
    pub authority: Pubkey,   // first writer claims the PDA
    pub team_id: u16,
    pub fervor_index: u32,
    pub momentum: i32,
    pub updated_slot: u64,
}
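If you want to read that account from a script, the byte layout follows from the struct: Anchor prefixes each `#[account]` with an 8-byte discriminator, and the fields are then Borsh-encoded, little-endian with no padding. A minimal Python decoder under those assumptions (the function name is mine, and fetching the raw bytes over RPC is left out):

```python
import struct

# Pubkey (32 bytes), u16, u32, i32, u64: little-endian, no padding.
LAYOUT = struct.Struct("<32sHIiQ")
DISCRIMINATOR_LEN = 8  # Anchor account discriminator precedes the fields


def decode_fervor_account(data: bytes) -> dict:
    """Decode raw account bytes into the FervorAccount fields."""
    authority, team_id, fervor_index, momentum, updated_slot = LAYOUT.unpack_from(
        data, DISCRIMINATOR_LEN
    )
    return {
        "authority": authority,        # raw 32-byte public key
        "team_id": team_id,
        "fervor_index": fervor_index,
        "momentum": momentum,          # signed: negative means weakening
        "updated_slot": updated_slot,
    }


# Minimum account size: discriminator plus the packed fields.
print(DISCRIMINATOR_LEN + LAYOUT.size)
```

The signed `i32` for momentum matters: a forecast below the live index has to survive the round trip as a negative number.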

I also built the inbound design - a Snowpark procedure calling the bridge
directly over HTTPS, using an External Access Integration with a SECRET and a
NETWORK RULE. Snowflake trial accounts reject it
(509009: External access is not supported for trial accounts), so the
submission ships the queue-poll design. I verified the other path anyway - an
authenticated POST /publish through a cloudflared tunnel produced a confirmed
devnet transaction - and warehouse/06_external_access.sql runs as-is on a paid
account. Same on-chain result, one fewer moving part.

Here is the oracle's actual output over one evening - every marker a confirmed
devnet transaction:

Stage 6: the dashboard runs inside the warehouse

Streamlit-in-Snowflake, eight tabs (Standings, World Cup, Wallets and
rivalries, Market, Token markets, Head to head, AI insights, On-chain oracle),
national colors with SVG emblems, a rotatable 3D conviction scatter (days held
× dip buys × score, sized by SOL traded), a per-wallet drill-down. Zero
external hosting. The app, the data, the ML and the LLM all live in the same
account - which is also why the oracle needs a bridge at all: the SiS sandbox
cannot reach the public internet.

The live World Cup, joined to on-chain conviction

The dashboard's World Cup tab reads the real 2026 FIFA World Cup: fixtures,
scores with extra-time and penalty-shootout detail, the group tables, and the
live Golden Boot race - pulled from football-data.org and set directly beside
the conviction index.

Same constraint as the oracle, same solution: SiS cannot reach the internet, so
a host-side sync writes the tournament into Snowflake and the app only ever
reads a table.

# scripts/football_sync.py (excerpt) - three endpoints -> three REF tables
matches   = get("/matches",   KEY)   # scores, half-time, extra-time, penalties
standings = get("/standings", KEY)   # group tables: W/D/L, goals for/against, pts
scorers   = get("/scorers",   KEY)   # Golden Boot race: goals + assists
# every row maps the API country name to our REF.TEAM_TOKENS team_id via
# our_team(), so the real tournament joins straight onto on-chain conviction
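The name mapping is the fragile part of any join like this, so here is a hypothetical sketch of what such a lookup could look like. It is not the `our_team()` from the repo: the function name, the ids, and the alias table are all invented for illustration, and the real ids live in `REF.TEAM_TOKENS`.

```python
import unicodedata

# Invented sample data; real team_ids come from REF.TEAM_TOKENS.
TEAM_IDS = {"argentina": 1, "brazil": 2, "turkiye": 3}
ALIASES = {"turkey": "turkiye"}


def normalize(name: str) -> str:
    """Lowercase, trim, and strip diacritics so 'Türkiye' and 'turkiye' match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.strip().lower()


def our_team_sketch(api_name: str):
    """Map an API country name to a team_id, or None when there is no token."""
    key = normalize(api_name)
    key = ALIASES.get(key, key)
    return TEAM_IDS.get(key)


print(our_team_sketch("Türkiye"), our_team_sketch(" Brazil "), our_team_sketch("Wales"))
```

Returning `None` for unmapped names, rather than raising, lets the sync keep the full tournament while the join to conviction simply finds no match for nations without a token.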

On the day I wrote this, France knocked Morocco out 2-0 in the quarter-final
and Argentina beat Switzerland 3-1 after extra time, setting up an England vs
Argentina semi-final. Mbappe and Messi were tied on eight goals.

That is not decoration. It is the join that makes the thesis falsifiable. A
conviction oracle exists to answer exactly one question - did the losing fan
base hold, or capitulate? - and you cannot ask it without a real scoreline on
the other side of the join. The tournament is real today. Point the mints at
real tokens and both halves of that question are answerable at once.

The exact line between real and simplified

Stated at the top; itemised here.

Real and running unattended: the full 7-table Dynamic Tables DAG at
1-minute lag; the conviction scoring above; three Cortex models retraining on a
5-minute task; confirmed devnet transactions for all sixteen nations, one
publish cycle every 5 minutes, each with a clickable Solscan link in
ORACLE.PUBLISH_LOG; the live 2026 FIFA World Cup synced into REF.WC_MATCHES,
REF.WC_STANDINGS and REF.WC_SCORERS and joined to conviction; the 8-tab
dashboard inside Snowflake.

Simplified, with reasons:

| Simplification | Why | Honest label |
| --- | --- | --- |
| Country token mints are placeholders pending verification | unofficial "national" meme tokens need per-mint liquidity vetting before pointing mainnet ingest at them | `PLACEHOLDER_MINT_*` in `REF.TEAM_TOKENS`; one UPDATE per row to go live, the poller skips placeholders |
| Demo data seeded | placeholder mints produce no organic feed yet | every seeded row labeled: `_batch_id = 'DEMO_SEED'`, wallets end in `demo`, `is_synthetic` history is excluded from model training, sample oracle rows carry `is_demo = TRUE` and are replaced by real bridge writes |
| 8 s micro-batches, not Snowpipe Streaming SDK | SDK is Java-only | noted in code comments |
| BUY/SELL from balance deltas, not per-DEX decoding | Helius doesn't parse every venue | self-arb filter compensates |
| Token prices decoded from swap legs, not a DEX API | the pipeline must work from raw transactions alone | USD via a LIVE SOL/USD rate the bridge refreshes into `REF.PARAMS` every 60 s |
| Writes go to devnet; bridge is centralized | this is an oracle BRIDGE, not a decentralized oracle network | stated plainly |

The seeder injects Helius-shaped transactions upstream, into the raw
landing table, and lets the real DAG compute every downstream number - each
nation with its own volatility, trend and fan-base depth, and a slowly drifting
sentiment so believers capitulate or buy the dip and the index genuinely moves
(which is what makes the Cortex forecast diverge). Nothing in MARTS is
hand-written.

These are unofficial Solana meme tokens named after national teams, not
licensed fan tokens (those live on Chiliz). FERVOR analyzes public on-chain
behavior only.

What this unlocks

The World Cup is the costume. Underneath it is a primitive: a data warehouse
that reads a chain, computes what the chain cannot compute about itself, and
signs the answer back where other programs can consume it - with the key held
in a minimal, auditable bridge instead of the warehouse.

Proof-of-reserve attestations. Risk scores for lending protocols.
Activity-based airdrop eligibility. Sybil scoring. Every one of them is an
"aggregate off-chain, attest on-chain" problem, and every one of them currently
gets solved by a bespoke indexer somebody has to babysit.

They are all TARGET_LAG = '1 minute' and a queue table.

Run it yourself

git clone https://github.com/SoumyaEXE/weekend-challenge
cd weekend-challenge
cp .env.example .env             # Helius key, Snowflake creds, devnet keypair
npm install
python scripts/run_sql.py --all  # entire warehouse: schemas, DAG, tasks
python scripts/demo_seed.py      # labeled demo data through the real DAG
npm run bridge                   # queue -> signed devnet writes
python scripts/football_sync.py  # live World Cup -> matches, standings, scorers
python scripts/deploy_streamlit.py

Within about two minutes the DAG refreshes, tasks snapshot history, and the
bridge confirms its first write. npm run publish:once forces a publish for
the impatient. python scripts/demo_spike.py makes the anomaly detector fire on
camera; --clean removes the evidence.

To go live on real trades: fill config/mints.json with verified mints, run
python scripts/set_mints.py (it auto-fetches each token's decimals and logo),
then npm run ingest. Not one line of SQL changes.

Built with @dronzer2code within the challenge window with AI pair-programming. All simplifications and data labeling are documented above and in the README.

Hermes Agent for Developers: The Open Source AI Agent That Learns & Remembers

Soumyadeep Dey — Sat, 30 May 2026 04:28:57 +0000

Why Hermes Agent Is One of the Most Practical Open Source AI Agents for Developers

This is a submission for the Hermes Agent Challenge.

Author: SoumyaEXE

Portfolio: isoumya.xyz

LinkedIn: isoumyadeyy

Introduction

Agent frameworks are everywhere right now, but most of them still feel like demos dressed up as products. They can answer prompts, call a tool or two, and generate nice screenshots for launch posts, yet many fall apart when the task gets longer, messier, or more dependent on memory.

That is why Hermes Agent is getting attention. It is not positioned as just another chatbot wrapper or an IDE sidekick. Hermes Agent is designed as an autonomous, self-improving agent that can plan, use tools, build reusable skills, and persist memory across sessions. That combination matters a lot for developers who want something more durable than one-shot prompt orchestration.

This post focuses on the developer angle. Instead of repeating marketing copy, it looks at what Hermes Agent is, why its design stands out, how a developer can start using it, and where it fits in a real workflow. The goal is simple: help other developers decide whether Hermes Agent is worth their time for learning, experimentation, and actual projects.

Why this topic matters now

The timing of this challenge is interesting because the AI tooling landscape is moving from simple assistants toward more persistent systems. The official Hermes Agent Challenge asks writers to publish something that educates, inspires, or sparks discussion around Hermes Agent, and submissions are judged on clarity, originality, practical value, and writing quality [1].

Those judging criteria actually say a lot about what kind of post works best. A strong submission should not just praise the tool. It should teach something useful, present a clear point of view, and leave readers with a practical understanding they can apply immediately.

So instead of writing a shallow overview, this article takes a position: Hermes Agent is one of the more practical open source agent systems for developers because it treats memory, skills, tooling, and deployment as first-class parts of the product rather than afterthoughts.

What Hermes Agent actually is

According to the official documentation, Hermes Agent is a self-improving AI agent built by Nous Research, and its core idea is a built-in learning loop that creates skills from experience, improves them during use, persists knowledge, and builds a deeper model across sessions [2].

That description is important because it changes how the agent should be evaluated. Many systems are good at completing a single session well enough. Hermes Agent is trying to become more useful over time. That means the benchmark is not only immediate output quality. The benchmark is whether repeated use produces a smarter, more capable system without forcing the developer to constantly rebuild context from scratch.

In practice, Hermes Agent presents itself as more than a local toy. The docs highlight that it can run across local environments and multiple terminal backends including Docker, SSH, Daytona, Singularity, and Modal, which makes it easier to think of it as infrastructure rather than a single-device assistant [2].

The four ideas that make Hermes Agent stand out

1. Persistent memory is not optional

One reason many agent demos feel brittle is that they start fresh too often. A tool might solve a task today, then forget everything tomorrow. Hermes Agent is built around a memory system that grows across sessions, with cross-session recall listed as a core feature in the official docs [2].

For developers, this matters in obvious ways. If an agent remembers preferred workflows, recurring project context, prior fixes, or known environment quirks, it becomes more valuable with repetition. That turns the system from a novelty into something closer to an assistant that can actually compound usefulness over time.

This is especially relevant for software work because software work is repetitive in structure, even when the tasks themselves differ. A developer often revisits the same repositories, conventions, setup steps, deployment targets, and debugging habits. An agent that remembers these patterns can reduce friction in a way stateless systems simply cannot.

2. Skills turn experience into reusable procedure

Hermes Agent also emphasizes a skills system. The docs describe procedural memory and reusable skills as a major feature, and independent writeups on Hermes highlight that the agent can generate and reuse skill documents after completing complex tasks [2][3].

That is a stronger idea than it sounds at first glance. A lot of AI tooling relies on the user becoming the memory layer. The human has to remember which prompt worked, which sequence of steps got the right result, and which format produced reliable output. Hermes Agent tries to capture some of that procedure itself.

For a developer, this creates an appealing loop:

Solve a task once
Let the agent retain the pattern
Reuse the pattern later with less prompting
Improve the pattern as usage continues

That is one of the most convincing arguments for Hermes Agent. It is not only trying to answer better. It is trying to operationalize experience.
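To make the loop concrete, here is a toy Python model of it. This is explicitly not Hermes Agent's API or storage format, which this post does not cover; every class and method name below is invented to show the shape of "solve once, retain, reuse, improve".

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    steps: list[str]
    uses: int = 0        # how often the pattern was reused
    revisions: int = 0   # how often it was improved


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def retain(self, name: str, steps: list[str]) -> Skill:
        """Store the procedure that solved a task the first time."""
        self.skills[name] = Skill(name, list(steps))
        return self.skills[name]

    def reuse(self, name: str):
        """Look up a stored procedure; None means the task must be solved fresh."""
        skill = self.skills.get(name)
        if skill is not None:
            skill.uses += 1
        return skill

    def improve(self, name: str, steps: list[str]) -> Skill:
        """Replace the stored steps with a refined version."""
        skill = self.skills[name]
        skill.steps = list(steps)
        skill.revisions += 1
        return skill


library = SkillLibrary()
library.retain("deploy-preview", ["build", "upload", "post link"])
library.reuse("deploy-preview")
library.improve("deploy-preview", ["build", "run smoke test", "upload", "post link"])
print(library.skills["deploy-preview"])
```

The useful property is that the procedure lives with the agent instead of in the developer's head, so the second request needs less prompting than the first.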

3. Tool use is built into the agent identity

The official documentation positions Hermes Agent as an agent with broad tool access, including web search, browsing, extraction, image generation, TTS, MCP support, and more than 60 built-in tools depending on configuration [2].

This matters because real agentic work is almost never pure text generation. Useful systems need retrieval, execution, inspection, iteration, and output shaping. A good model alone is not enough. Hermes seems to understand that the value of an agent lies in the relationship between reasoning, memory, and tools.

There is also a practical advantage here. Developers do not want a fragile stack where every new capability requires wiring together five more packages. Hermes Agent appears to reduce that setup burden by making tool use part of the default mental model rather than an advanced extension.

4. It is designed to live outside the laptop

One of the better ideas in the Hermes documentation is the insistence that the agent is not tied to a single laptop session. The docs explicitly frame Hermes as something that can run on a cheap VPS, a GPU cluster, or serverless infrastructure, while also connecting through platforms such as Telegram, Discord, Slack, WhatsApp, Teams, and more [2].

That changes the type of projects developers can imagine. Instead of thinking only in terms of a command line assistant opened for a few minutes, it becomes easier to imagine background research agents, message-based task bots, or persistent personal infrastructure that can keep working while the user is away.

A lot of agent products claim flexibility. Hermes Agent makes that flexibility feel architectural.

Why developers should care

The strongest reason to care about Hermes Agent is not hype. It is leverage.

A developer usually does not need an AI tool that looks impressive for thirty seconds. A developer needs a system that can help with repeated workflows, multi-step tasks, context retention, and automation that survives beyond a single prompt window.

Hermes Agent lines up well with that need in several areas:

It supports persistent memory, which makes repeated use more valuable [2]
It supports reusable skills, which can convert successful task patterns into future shortcuts [2][3]
It supports multiple deployment environments, which opens the door to self-hosted and remote workflows [2]
It supports many communication platforms and tool integrations, which makes it easier to bring the agent into real work channels instead of leaving it isolated in a terminal [2]

That combination is what makes Hermes Agent interesting beyond the current agent trend cycle. It is built around continuity.

A practical use case: an automated research pipeline

To make this concrete, consider a small automated research pipeline. This is the sort of project that fits both the challenge spirit and everyday developer needs.

A basic research workflow usually involves the same sequence again and again:

Accept a topic or question
Search for relevant sources
Open and extract content from the strongest pages
Compare claims across sources
Summarize findings in a structured format
Preserve useful patterns for later runs

Hermes Agent is well suited to this type of workflow because it combines tool use, memory, and iterative reasoning. The official docs specifically highlight web search, browsing, extraction, memory, scheduled automations, and subagents for parallel workstreams as supported capabilities [2].

That means a developer can imagine a workflow like this:

Hermes receives a research prompt
It searches the web using integrated tools
It extracts content from relevant sources
It writes a structured summary
It stores notes about the format, source preferences, and common research patterns for future work

The interesting part is not just that Hermes can do each step. It is that the system is designed to improve the workflow over time.

Fast setup path

The official installation flow is refreshingly direct. The docs provide a one-line install script for Linux, macOS, and WSL2, and they recommend hermes setup --portal as the fastest path to a working agent with model access plus tool gateway features such as web search, image generation, TTS, and browser access [2].

Here is a compact example setup flow.

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes setup --portal

That installation story matters more than people admit. In fast-moving AI tooling, the difference between curiosity and abandonment is often ten minutes of setup friction. Hermes Agent looks stronger here because the onboarding path is designed around getting to a usable system quickly [2].

A simple developer-oriented workflow

Below is a lightweight conceptual setup for a research-focused Hermes workflow. It is intentionally short because the purpose is to show how approachable the structure can be.

agent:
  name: research-agent
  profile: technical-writer
  memory:
    enabled: true
  skills:
    enabled: true

runtime:
  backend: local

models:
  default: portal/default

goals:
  - Search for high-quality sources
  - Extract useful technical details
  - Summarize findings clearly
  - Reuse successful workflows later

And here is the type of prompt that fits Hermes well:

hermes run research-agent --prompt "Research memory-augmented open source AI agents, compare core features, and draft a developer-friendly summary with practical examples."

This type of workflow plays to the platform's strengths. It is not just asking the model to write. It is asking the agent to search, organize, compare, retain, and evolve.

Where Hermes Agent feels stronger than many alternatives

A lot of frameworks in the agent ecosystem are capable. That is worth stating clearly. There are several good options depending on whether someone wants orchestration, workflow control, coding support, or enterprise tooling.

Still, Hermes Agent feels different in a few practical ways.

| Area | Why Hermes Agent stands out |
| --- | --- |
| Memory | Persistent cross-session memory is positioned as a core feature, not a plugin [2] |
| Skills | Reusable procedural skills are central to the product identity [2][3] |
| Deployment | Local, Docker, SSH, Modal, and other backends are documented as supported paths [2] |
| Communication | Hermes can connect to many messaging platforms, which expands real-world usage options [2] |
| Learning loop | The built-in self-improvement story is more explicit than in many general-purpose frameworks [2][3] |

That does not mean Hermes Agent is universally better. Some developers may still prefer lower-level orchestration frameworks if they want to custom-build every layer. Others may prefer vendor-specific ecosystems if they are deeply invested in a single cloud stack.

But Hermes Agent makes a compelling case for developers who want a system that is usable now, extensible later, and architected around long-term improvement.

What kind of developer should try Hermes Agent

Hermes Agent is especially attractive for a few categories of builders:

Developers who want to self-host or run agents on their own infrastructure [2]
Builders interested in long-running assistants rather than one-session chat tools [2]
People experimenting with research workflows, coding helpers, task bots, or messaging-based agents [2]
Developers who care about reusable procedure, not just prompt output [2][3]
Tinkerers who want an open system with MCP support and portable skills [2]

That last point matters a lot. A framework becomes much more interesting when it lets developers build systems that feel personal and durable. Hermes Agent seems designed for that mode of use.

The trade-offs to think about honestly

A good technical post should not read like promotion. So it is worth talking about trade-offs.

First, a system with memory, tools, multiple backends, and cross-platform integration will naturally be more complex than a simple chat interface. That complexity can be a strength, but it also raises the bar for understanding, debugging, and configuration.

Second, the promise of self-improvement is powerful, but any learning system depends on the quality of the tasks, stored artifacts, and feedback loops around it. In other words, Hermes Agent may create better workflows over time, but that still requires thoughtful usage. Good systems amplify good practice. They do not remove the need for it.

Third, the more ambitious the workflow, the more important observability becomes. If an agent searches, extracts, remembers, and adapts, developers need to inspect what it did and why. That is not a Hermes-only issue. It is a general truth of agent engineering.

These are not reasons to avoid Hermes Agent. They are reasons to approach it like infrastructure rather than magic.

What makes a strong DEV challenge post

Since this article is also a challenge submission, it is worth stepping back and asking what kind of Hermes post is likely to perform well with readers and judges.

The official challenge rules say the write track is judged on clarity and depth of explanation, originality of perspective or insight, practical value to the community, and quality of writing [1]. That means the strongest posts will probably do three things well:

Teach something concrete
Offer a distinct point of view
Stay readable for developers who are curious but not yet invested

That is why Hermes Agent is such a good subject for writing. It naturally supports several strong angles:

A setup guide for local or VPS deployment
A breakdown of the memory and skills model
A comparison against other agent approaches
A case study around a useful workflow like research, automation, or coding assistance
A broader essay on what persistent agents mean for developer tooling

For reach, the best post is usually not the most academic one. It is the one that gives readers enough technical substance to trust the writer while keeping the narrative clear enough that they actually finish it.

Why Hermes Agent is worth watching

There are many open source AI projects that generate excitement for a week and then disappear into a GitHub tab graveyard. Hermes Agent feels more durable because the core ideas behind it map onto real problems developers actually have.

Developers need systems that remember, adapt, and integrate with tools. They need agents that can live outside a browser tab. They need workflows that become easier after the fifth run, not harder.

Hermes Agent is interesting because it is trying to solve those exact problems. The official docs frame it as an autonomous agent with persistent memory, reusable skills, broad tool access, multiple runtime backends, and rich communication options across platforms [2]. That is not a small ambition, but it is the right ambition.

Final thoughts

Hermes Agent stands out because it treats agency as a systems problem, not just a prompting trick. Memory, skills, execution environments, communication channels, and tool access are all part of the same product story [2].

That makes it one of the more practical open source agent projects to study right now. Not because it promises magic, but because it is built around the idea that useful agents should improve with experience, persist across contexts, and fit into the way developers actually work [2][3].

For developers exploring open source agent frameworks in 2026, Hermes Agent looks less like a passing novelty and more like a serious attempt at building software that can keep getting better after the first run [2][3].

Useful links

Challenge page: Hermes Agent Challenge
Official rules: Hermes Agent Challenge Contest Rules [1]
Documentation: Hermes Agent Docs [2]
GitHub: NousResearch/hermes-agent
GitHub profile: SoumyaEXE
Portfolio: isoumya.xyz
LinkedIn: Soumya Dey

Your AI agent has amnesia. You've just normalized it.

Soumyadeep Dey — Sat, 23 May 2026 17:44:06 +0000

Three hackathons in three months. Three different agent setups. Three times I wrote a system prompt that said something like "you are a helpful assistant that can use tools and reason across steps." Three times I shipped something, closed the terminal, and the agent forgot everything it had ever done.

I didn't think of it as a problem. I thought of it as just how agents work.

That's the part worth interrogating.

There's a normalization happening in how we build with AI. We've collectively agreed that agents are stateless, that every session starts cold, that the "memory" problem is either solved by shoving things into a context window or punted to some RAG pipeline you bolt on after the fact. We treat amnesia as a default and then we build around it. Bigger context windows. Better prompts. Smarter retrieval.

But none of that solves the actual problem, which is: the agent isn't getting better at working with you specifically. It's not accumulating judgment. It's not noticing that you always want JSON back, or that your last three API integrations broke on rate limits, or that you prefer terse explanations over verbose ones. Every session, you re-establish who you are. Every session, the agent starts from scratch.

That's not intelligence. That's a very fast amnesiac with a lot of knowledge.

The thing that made me stop and actually think was this line from Hermes Agent's README:

"The self-improving AI agent... it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions."

Not "we have a memory feature." Not "long-term context support." Skills from experience. A deepening model of who you are.

That's a different claim. And it's architectural, not cosmetic.

When Hermes completes a task, it can crystallise what it did into a reusable skill, tag it, store it, and pull from it next time a similar problem comes up. The first time you ask it to do something gnarly, it figures it out. By the third time, it's refined the approach. This is not a context window trick. The learning loop is built into the agent itself.

Here's a concrete thing to sit with: imagine you're onboarding a junior developer. Day one, you explain your codebase, your conventions, your preferences. Day two, they remember. Day thirty, they're fast because they've internalized how you work. Now imagine that junior dev reset every morning with zero memory of the previous day. You'd re-onboard them every single time. You'd eventually stop delegating anything non-trivial because the overhead is too high.

That's what most of us are doing with agents right now, and calling it "agentic AI."

Hermes Agent is at least trying to solve the actual problem instead of building better workarounds for it.

The model-agnostic architecture is worth one paragraph because it doesn't get enough attention. Hermes runs on anything: Claude, GPT-4o, Gemini, 200+ models through OpenRouter, your own endpoint. Switch models with hermes model, no code changes. This matters less as a convenience feature and more as a philosophical stance. The model is not the agent. The learning loop, the skill library, the memory system, the planning layer: those are the agent. The model is just the reasoning engine you slot in. When the next frontier model drops in six months and blows the current one out of the water, Hermes users swap it in with one command. Everyone building on a single-model API foundation has to rearchitect.

The infrastructure story also compounds with everything above. You can run this on a $5 VPS. Not "you can try it on a $5 VPS before you upgrade," actually run it there. The serverless option means you're paying essentially nothing when it's idle. You can message it from Telegram while it does work on a cloud machine. That's not a demo feature, it's a deployment story that actually works outside your laptop.

I want to be honest: I haven't run Hermes for six months and watched it compound knowledge into something that feels like a real collaborator. Neither has basically anyone outside the Nous Research team. The learning loop architecture is genuinely interesting but the proof will be in sustained use cases, and those take time to surface.

But here's what I do know: the agents I've built and shipped have all hit the same ceiling. They can reason well within a session. They fall apart across sessions. Every improvement I made was either a bigger context window, a more detailed system prompt, or a retrieval system I had to design and maintain separately. None of it made the agent better at working with me. I was compensating for statelessness, not solving it.

Hermes Agent is trying to solve it. The architecture is open source, the model layer is swappable, and the whole thing runs on infrastructure you control.

In a space where most "agents" are prompt chains with an API call attached, that's worth taking seriously.

[NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent) on GitHub. Quickstart video is linked in the docs. Worth an afternoon.

Gemma 4 Has Four Variants. Here's How to Pick the Right One Before You Write a Single Line of Code.

Soumyadeep Dey — Sat, 23 May 2026 16:46:16 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

The single most common mistake developers make when picking a local model is choosing based on benchmark scores. The second most common mistake is choosing based on what fits in VRAM.

Both of those things matter. But neither one is the actual first question.

The actual first question is: where does your model need to live, and what does it need to do there?

Gemma 4 ships in four variants - E2B, E4B, 26B A4B (MoE), and 31B - and Google made very deliberate architectural choices for each one. If you understand those choices, picking the right variant takes about five minutes. If you skip that step and benchmark-shop, you'll end up either underbuilding (a phone-ready E4B doing work that needs 256K context) or overbuilding (a 31B model sitting on $80/month of cloud compute when an E4B running locally would have been fine).

This post is that five-minute decision guide.

What Gemma 4 Actually Is

Released on April 2, 2026 under Apache 2.0, Gemma 4 is Google DeepMind's latest open-weight model family. Every variant ships with multimodal understanding (text + image as baseline, audio natively on the two smallest models), native function calling, and support for over 140 languages.

The headline capability that separates Gemma 4 from previous generations isn't any single feature. It's the intelligence-per-parameter ratio. The 26B MoE model only activates roughly 4B parameters per forward pass. The E4B runs on a phone. The 31B scores 89.2% on AIME 2026 math benchmarks - a score that would have required a model several times larger just a year ago.

The architecture decisions that make this possible:

Alternating local/global attention layers (local layers use sliding windows of 512-1024 tokens, global layers handle long-range context)
Per-Layer Embeddings (PLE) on the edge variants, which keeps the parameter count low while maintaining expressivity
Mixture-of-Experts on the 26B that routes each token through only the relevant expert layers, not the full network
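
The local/global split in the first bullet comes down to which earlier tokens each position is allowed to see. Here is a rough sketch of the two mask shapes, using a tiny window so the output is readable. The function names are mine, and the real windows are 512-1024 tokens; how Gemma 4 interleaves the two layer types is not something this sketch tries to reproduce.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Causal (j <= i) and local (at most `window` tokens back, counting i itself).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def global_mask(seq_len: int) -> list[list[bool]]:
    """Plain causal mask: token i sees every token up to and including itself."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


local = sliding_window_mask(seq_len=6, window=3)
full = global_mask(seq_len=6)

# Attended positions per row: the local count is capped at `window`,
# the global count keeps growing with position.
print([sum(row) for row in local])
print([sum(row) for row in full])
```

The capped row counts are why local layers stay cheap at long context, while the occasional global layer carries the long-range information.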

This isn't just efficiency for efficiency's sake. It's what allows a 4-billion-parameter model to run offline on an Android phone with 4GB of RAM while still having a 128K context window. That combination didn't exist before.

The Four Variants, Actually Explained

Gemma 4 E2B - The Phone Model

~2.3B effective parameters, ~5.1B total with PLE, 35 layers, 128K context

This is the model you reach for when the edge is the deployment target. It runs on Android 12+ via Google AICore, on Raspberry Pi, and on Jetson devices. It supports text, image, and audio natively.

The "E" in the name stands for effective - because PLE means the model has more total parameters than it activates per forward pass, similar to how MoE works at a different level of the architecture. The practical result is a 1.5GB footprint with capabilities that land well above what a raw 2B parameter count would suggest.

Use E2B when: you're building a mobile app, an edge inference pipeline, a device-local assistant, or anything where network latency or data privacy makes sending requests to a remote API unacceptable.

Real use case: a receipt-scanning expense tracker that runs fully offline, reads image input, parses line items, and categorizes spending - all on device, no API call, no data leaving the phone.

# Running E2B locally with transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-4-E2B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Extract the total amount and vendor name from this receipt text: ..."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the model-turn marker so generation starts a reply
    return_tensors="pt",
    return_dict=True
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Gemma 4 E4B - The Laptop Model

~4.5B effective parameters, ~8B total, 42 layers, 128K context

This is the everyday workhorse for developers who want to run a capable model locally without dedicated GPU hardware. It runs comfortably on a MacBook with 16GB unified memory, on a mid-range laptop with an integrated GPU, and on any machine where you'd rather not spin up a cloud instance.

The jump from E2B to E4B isn't just more parameters. The additional layers and parameter budget give it noticeably better instruction following, more reliable structured output, and stronger performance on tasks that require holding context across a long conversation.

It supports the same text, image, and audio modalities as E2B, which makes it genuinely multimodal in a way that matters for developer tooling - you can feed it screenshots, diagrams, or audio transcripts as part of a pipeline without needing a separate vision model.

Use E4B when: local inference is the requirement, your hardware doesn't have a discrete GPU, or you're prototyping something you'll later scale to a larger model and want fast iteration cycles.

Real use case: a local code review tool that takes a screenshot of your editor alongside the diff, understands both, and gives context-aware feedback - all running on your laptop, no telemetry.

# Quick Ollama setup for E4B (easiest local path)
# After installing Ollama: https://ollama.com

# In terminal:
# ollama pull gemma4:e4b

import ollama

response = ollama.chat(
    model="gemma4:e4b",
    messages=[
        {
            "role": "user",
            "content": "Review this function for edge cases and suggest improvements:",
        }
    ],
    options={
        "temperature": 0.3,
        "num_ctx": 8192  # can go up to 128K
    }
)

print(response["message"]["content"])

Gemma 4 26B A4B (MoE) - The Consumer GPU Model

25.2B total parameters, ~3.8B active per forward pass, ~30 layers, 256K context

This is the one that makes the architecture story interesting. The 26B MoE sounds like it needs 26 billion parameters' worth of compute. It doesn't. Only about 4 billion parameters activate for each token, so per-token compute is closer to a 4B dense model. Memory is a different story: all 25.2B weights still have to be loaded, which is roughly 50GB at bf16, so fitting a single 24GB RTX 3090 or RTX 4090 means 4-bit quantization. In exchange you get quality that competes with much larger dense models.
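
Here is a small sketch of what "only some parameters activate per token" means mechanically: a router scores every expert, keeps the top few, and only those run. The expert count, `top_k`, and the logits below are made-up numbers for illustration, not Gemma 4's real configuration.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits: list[float], top_k: int) -> dict[int, float]:
    """Pick the top_k experts for one token and renormalise their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]
weights = route(logits, top_k=2)
print(weights)  # only the selected experts run for this token
```

Every expert still has to be resident in memory, because the next token may route somewhere else. That is the gap between active parameters and total parameters.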

The jump to 256K context window is significant for developers. At 128K you can fit roughly a medium-sized codebase or a very long document. At 256K you're fitting large repositories, multi-document research contexts, or full conversation histories in customer-facing applications.

The MoE architecture also means that quality degrades more gracefully with quantization than a dense model of equivalent total parameters would. INT4 at 26B MoE looks better than INT4 at a comparable dense model.

Use 26B A4B when: you have a consumer GPU (24GB VRAM), need 256K context, and want near-flagship quality without flagship hardware costs. Also the right choice for anything agentic where the model needs to reason across large amounts of context to plan multi-step tasks.

Real use case: an agentic document processor that ingests a full legal contract (or a full codebase) in a single prompt, reasons across the entire document, and extracts structured data or answers specific questions - running locally on a 4090.

# Using the Gemma 4 26B with native function calling
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "google/gemma-4-26B-A4B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # 4-bit weights (requires bitsandbytes) so the model fits on 24GB
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )
)

# Native function calling - define your tools
tools = [
    {
        # transformers chat templates expect the OpenAI-style function wrapper
        "type": "function",
        "function": {
            "name": "search_contracts",
            "description": "Search the contract database by clause type or party name",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "clause_type": {
                        "type": "string",
                        "enum": ["liability", "termination", "payment", "IP"],
                        "description": "Type of clause to filter by"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

messages = [
    {
        "role": "user",
        "content": "Find all termination clauses across the Q1 vendor contracts and summarize the notice periods."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,  # append the model-turn marker so generation starts a reply
    return_tensors="pt",
    return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Gemma 4 31B - The Server Model

31 billion dense parameters, 256K context, full multimodal, thinking mode

This is the flagship. Every capability available in the family is present here. Thinking mode (chain-of-thought reasoning) is enabled. Math benchmark scores are serious: 89.2% on AIME 2026, compared to Gemma 3 27B's 20.8% on the same benchmark. It sits at #3 on the Arena open model leaderboard.

The weights alone need roughly 62GB at FP16/BF16 (31B parameters × 2 bytes) or roughly 16GB at INT4, plus headroom for the KV cache. A single A100 80GB handles it at full precision. Two RTX 4090s with tensor parallelism (48GB combined) also work, but only with quantized weights. This is the model you deploy to a server, not run on a laptop.
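
A quick way to sanity-check any VRAM figure is the weights-only arithmetic: parameter count times bytes per parameter. The helper below is my own sketch. It ignores the KV cache, activations, and framework overhead, so treat the results as a floor and not a budget.

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in this post.
for name, params in [("26B A4B", 25.2e9), ("31B", 31e9)]:
    for label, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{name:8s} {label}: {weight_memory_gb(params, bits):5.1f} GB")
```

Note that the MoE model is sized by its total parameter count here, not its active count, because every expert has to be loaded.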

Use 31B when: benchmark quality matters for your application, you need thinking mode for reasoning-heavy tasks, you're building a production service that will handle requests from multiple users, or you need the best math and coding performance available in an open-weight model.

Real use case: a coding assistant API that developers on your team query through a self-hosted endpoint - one 31B instance serving your whole engineering org at a cost that's a fraction of equivalent proprietary API calls.

# Serving 31B with vLLM for production throughput
# pip install vllm

from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-4-31B-it",
    tensor_parallel_size=2,   # shard across 2 GPUs
    dtype="bfloat16",          # ~62GB of weights: 2x RTX 4090 (48GB) would need a quantized checkpoint instead
    max_model_len=65536        # 64K for production balance
)

sampling_params = SamplingParams(
    temperature=0.2,
    top_p=0.9,
    max_tokens=2048
)

# Thinking mode for complex reasoning
prompts = [
    "<start_of_turn>user\nThink step by step: Given this algorithm, what's the worst-case time complexity and where is the bottleneck?\n\n[your code here]\n<end_of_turn>\n<start_of_turn>model\n"
]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)

The Decision Matrix

Here's the five-minute version:

Situation	Model
Mobile app, Raspberry Pi, offline-first	E2B
Laptop development, no GPU, fast iteration	E4B
Consumer GPU (24GB), 256K context needed	26B A4B MoE
Server deployment, best quality, team-serving	31B
Agentic pipeline with many tool calls	26B A4B MoE (active param efficiency)
Math, coding, or reasoning-heavy production	31B
Privacy-sensitive user data, no API calls	E4B or E2B
You have an A100 and want the best	31B
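
If you would rather have the table as something you can call, here is one way to sketch it. The target labels (`edge`, `laptop`, `consumer_gpu`, `server`) and the function name are my own, and the context limits are the nominal figures quoted in this post.

```python
# Nominal context limits as quoted in this post (tokens).
CONTEXT_LIMIT = {"E2B": 128_000, "E4B": 128_000, "26B A4B": 256_000, "31B": 256_000}

# Deployment target -> default variant, mirroring the table rows.
BY_TARGET = {
    "edge": "E2B",
    "laptop": "E4B",
    "consumer_gpu": "26B A4B",
    "server": "31B",
}


def pick_variant(target: str, context_tokens: int = 8_000) -> str:
    """Start from where the model has to live, then check the context fits."""
    variant = BY_TARGET[target]
    if context_tokens > CONTEXT_LIMIT[variant]:
        raise ValueError(
            f"{variant} tops out at {CONTEXT_LIMIT[variant]:,} tokens; "
            "move up to 26B A4B or 31B"
        )
    return variant


print(pick_variant("laptop"))
print(pick_variant("consumer_gpu", 200_000))
```

The order of the checks is the whole argument of this post: deployment target first, context requirement second, benchmarks last.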

The Bigger Thing Happening Here

I want to step back from the specs for a second.

A model that scores 89.2% on a serious math benchmark, supports 256K context, runs multimodal inference, and has native function calling for agentic tasks... is now open-weight, Apache 2.0, and runs on hardware that a developer can actually own.

The E4B running on a laptop with 128K context and audio support isn't a "small model compromise." It's a capability that would have been frontier-level two years ago. The E2B running on a phone offline isn't a demo trick. It's a production-viable deployment target.

What that actually means is that the architectural question of "cloud or local?" is no longer primarily a capability question. It's a cost, latency, and privacy question. And for a lot of applications - the ones where user data is sensitive, where offline availability matters, where API costs compound at scale - local wins.

Gemma 4 doesn't make that argument. It just makes it very hard to argue against.

Getting Started in Under 5 Minutes

The fastest path to running any Gemma 4 variant locally is Ollama:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the variant you want
ollama pull gemma4:e4b     # ~5GB, laptop-ready
ollama pull gemma4:26b     # ~15GB, GPU-ready

# Run it
ollama run gemma4:e4b

# Or use the API directly
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [
    { "role": "user", "content": "Hello, what can you do?" }
  ]
}'

If you want Python with the full transformers ecosystem (function calling, thinking mode, multimodal), the Hugging Face model cards for each variant have complete working examples. Start with google/gemma-4-E4B-it - it's the most accessible entry point and covers most development use cases.

Quick Note on Licensing

Apache 2.0 means you can use Gemma 4 commercially, modify the weights, build products on top of it, and distribute your derivative work - without paying royalties or asking permission. That is not the case for every "open" model out there, and it matters a lot for anyone building a business on top of local inference.

The right Gemma 4 variant is the one that runs where your users are, fits the hardware you can actually provision, and has enough context to do the task you're designing for. Everything else is optimization.

Start with E4B if you're unsure. Scale up when the task demands it.

Tags: devchallenge gemmachallenge gemma ai machinelearning python opensource

What Is WebMCP? The Google I/O 2026 Web Standard That Changes AI Agent Tool Use

Soumyadeep Dey — Sat, 23 May 2026 16:39:52 +0000

A first-look at the most underreported thing to come out of Google I/O 2026

I was watching the Google I/O 2026 Developer Keynote on my laptop at 11:30 PM with a cup of cold chai. Gemini 3.5 Flash landed. Antigravity 2.0 got a standing ovation. The smart glasses got the screenshots. Everyone was talking about Gemini Omni.

And then, buried between slide 40 and a code snippet, a presenter said the words "WebMCP."

I sat up straight. I replayed that section three times.

Here is why.

A little context on what I'm building

For the past few months I've been working on a sales platform that stitches together an omnichannel inbox (Chatwoot) with a generative UI layer inside chat. The core problem is this: sales agents need to close deals inside conversations, not by jumping between five tabs. I want the chat widget itself to become the deal-closing surface - surfacing interactive UI components, pulling CRM context, letting a buyer sign a proposal inside the chat thread.

The hardest unsolved piece of that puzzle has always been: how do you let an AI agent do something meaningful on a webpage without either (a) screen-scraping like it's 2019, or (b) building a bespoke API integration for every single surface it needs to touch?

WebMCP is Google's answer to that question. And it's a good one.

What WebMCP Actually Is

WebMCP is a proposed open web standard that lets developers annotate their JavaScript functions and HTML forms so that browser-based AI agents can call them directly as structured tools - with the same reliability you'd expect from a typed API, not from a model guessing where to click.

Think of it as MCP, but for the open web. Regular MCP handles the vertical connections: agent to database, agent to file system, agent to third-party API. WebMCP handles the horizontal layer: agent to the website sitting in front of you right now.

The origin trial starts in Chrome 149. Here is what the developer-side contract looks like conceptually:

// You annotate your JS functions with WebMCP metadata
// so agents know what they can call, what inputs they need,
// and what effect they'll produce

const webmcpManifest = {
  tools: [
    {
      name: "schedule_demo",
      description: "Books a product demo for a qualified lead",
      parameters: {
        name: { type: "string", required: true },
        email: { type: "string", required: true },
        preferred_slot: { type: "string", enum: ["morning", "afternoon", "evening"] }
      },
      handler: scheduleDemo
    },
    {
      name: "generate_proposal",
      description: "Creates a personalized pricing proposal and returns a preview URL",
      parameters: {
        company_name: { type: "string", required: true },
        seats: { type: "number", required: true },
        plan: { type: "string", enum: ["starter", "growth", "enterprise"] }
      },
      handler: generateProposal
    }
  ]
};

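// Register once. Any WebMCP-capable agent can now call these
// without scraping your DOM or guessing your form structure.
// (Placeholder global for illustration only; use the registration entry point
// from the WebMCP spec. scheduleDemo and generateProposal are your own functions.)
window.__webmcp__ = webmcpManifest;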

The agent on the other end - whether it's Gemini in Chrome, an Antigravity subagent, or your own custom orchestrator - now has a reliable, developer-defined surface to work with. No brittle CSS selectors. No "find the button that looks like it submits the form." Just a typed function call.
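
To show what "a typed function call" buys you on the agent side, here is a rough sketch of validating a call against a declared schema before dispatching it. It mirrors the shape of the conceptual manifest above. It is written in Python for brevity, and `TOOLS`, `call_tool`, and the stand-in handler are hypothetical names, not part of WebMCP.

```python
from typing import Any, Callable

PY_TYPES = {"string": str, "number": (int, float)}


def schedule_demo(name: str, email: str, preferred_slot: str = "morning") -> str:
    # Stand-in handler; a real one would hit a booking backend.
    return f"Demo booked for {name} ({email}) in the {preferred_slot}"


TOOLS: dict[str, dict[str, Any]] = {
    "schedule_demo": {
        "parameters": {
            "name": {"type": "string", "required": True},
            "email": {"type": "string", "required": True},
            "preferred_slot": {"type": "string", "enum": ["morning", "afternoon", "evening"]},
        },
        "handler": schedule_demo,
    }
}


def call_tool(name: str, args: dict[str, Any]) -> Any:
    """Validate an agent's call against the declared schema, then dispatch."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    spec = TOOLS[name]["parameters"]
    for key in args:
        if key not in spec:
            raise ValueError(f"unexpected argument: {key}")
    for key, rule in spec.items():
        if key not in args:
            if rule.get("required"):
                raise ValueError(f"missing required argument: {key}")
            continue
        if not isinstance(args[key], PY_TYPES[rule["type"]]):
            raise TypeError(f"{key} must be a {rule['type']}")
        if "enum" in rule and args[key] not in rule["enum"]:
            raise ValueError(f"{key} must be one of {rule['enum']}")
    handler: Callable[..., Any] = TOOLS[name]["handler"]
    return handler(**args)


result = call_tool(
    "schedule_demo",
    {"name": "Asha", "email": "asha@example.com", "preferred_slot": "evening"},
)
print(result)
```

A bad call fails loudly at the boundary instead of silently clicking the wrong button, which is the reliability difference compared with DOM automation.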

Why This Matters More Than It Looks

Here is the frame I keep coming back to: every website becomes a tool.

Right now, when you build an AI agent that needs to do anything on the web, you have two bad options and one expensive option.

Bad option 1 is to prompt the model to act like a browser automation script. You're essentially doing Playwright-via-LLM, which works until a div gets renamed and your entire pipeline breaks silently at 2 AM.

Bad option 2 is to write a bespoke connector for every website or service you want the agent to interact with. Fine for the five services you use every day. Not fine for the long tail of things your users will eventually ask your agent to handle.

The expensive option is to build everything in-house, behind your own API, so the agent only ever talks to surfaces you control. This is what most serious teams do. It scales, but it costs months.

WebMCP opens a fourth path: the website author defines the agent interface themselves, once, as part of their normal development work. You annotate the functions that make sense to expose. You describe what they do. You ship it as part of your site. From that point forward, any agent with WebMCP support can call those functions correctly - without you writing a new connector, and without the agent guessing.

The protocol also composes cleanly with the rest of the stack Google laid out this year:

MCP handles agent-to-infrastructure connections (databases, APIs, file systems)
A2A handles agent-to-agent coordination across vendors
WebMCP handles agent-to-website interaction in the browser

Three protocols, three layers, one coherent answer to "how does an agent actually do things in the real world?"

The Angle Nobody Is Talking About

Most I/O coverage framed WebMCP as a tool for making AI assistants more useful to end users. Fair. But I think the more interesting frame is what it does for developers building agentic products.

If I'm building the sales platform I described above, WebMCP means that instead of my agentic layer needing to "know" how to use each tool on the page through brittle DOM inspection, I can define those tools explicitly as part of my product. My checkout flow, my proposal generator, my slot-booking widget - they're all first-class agent interfaces the moment I annotate them.

The generative UI layer I was building with Tambo (the agentic React UI toolkit) suddenly has a much cleaner answer to "how does the agent actually trigger actions?" than "it passes props and hopes for the best." The agent calls a WebMCP-registered tool. The tool fires a handler. The handler updates state. The UI responds.

That is a clean, auditable, testable loop. That matters when you are building something real, not a demo.

What I'm Watching Next

A few things I'm paying close attention to as WebMCP moves through the origin trial:

Security boundaries. Right now the spec assumes the website author is intentional about what they expose. But as soon as agent-browsing becomes mainstream, you'll see adversarial cases: pages that register fake tools designed to manipulate agents into taking actions users didn't intend. The security model around what a WebMCP tool is "allowed" to do on behalf of a user needs to be tight before this goes anywhere near financial or identity workflows.

Cross-browser adoption. This is proposed by Google and Microsoft engineers under the W3C Web Machine Learning community group - which is a good sign for eventual cross-browser support. But Chrome 149 origin trials don't mean it's in Firefox next quarter. For developers building agent-facing products today, you'll need fallback strategies for a while.

Standardization lag vs. tooling speed. The web standards process moves on a timescale that AI tooling doesn't. By the time WebMCP is a full W3C Recommendation, the agent landscape will have changed dramatically. Google's bet here is that shipping an origin trial fast and getting real-world feedback is worth more than waiting for perfect committee consensus. I think that bet is correct.

The Bigger Shift Underneath All of It

There was a line in the keynote that I keep turning over in my head: "We've transitioned from AI that simply assists you, to agents that can independently navigate complex tasks across your entire workflow."

That is a product vision, not a product feature. And WebMCP is one of the infrastructure pieces that makes that vision non-fictional.

The assistive AI era had a ceiling: the model could help you think, draft, and plan, but doing required a human in the loop who could actually click the button, fill the form, and trigger the action. The agentic era removes that ceiling - but only if agents can interact with the web in a way that's reliable enough to trust in production.

WebMCP is a small, well-scoped proposal. It doesn't solve everything. But it solves the exact right problem at the exact right layer: giving website developers a first-class way to say "here is what an agent is allowed to do on this page, and here is how to do it correctly."

For anyone building agentic products on the web, that is not a quiet announcement. That is the foundation you were waiting for.

If you're building something that sits at the intersection of AI agents and web UIs, I'd genuinely love to compare notes. Drop a comment or find me at my usual haunts.

Tags: googleiochallenge webdev ai javascript webstandards

I built a fully interactive 3D Solar System you can explore right from your browser.

Soumyadeep Dey — Sat, 16 Aug 2025 13:55:40 +0000

🪐 Explore the Cosmos from Your Browser

I’m stoked to share my recent project: 3D Solar System Simulation, crafted with Three.js and accelerated with Vite! Tap into the wonders of our cosmic neighborhood through your screen—no telescopes needed.

🔗 Check out the live project below:

https://3d-solar-system-three-js.vercel.app/

💡 Why I Built It

To visualize planetary orbits, rotations, and atmospheres in real-time
To flex the power of Three.js combined with modern tooling like Vite
To experiment with interactive controls and responsive design
To have a fun showcase for learning and collaboration

🚀 Project Highlights

Realistic 3D models of the Sun, planets, dwarf planets, and moons
Smooth orbit animations combined with planet rotations
Scaled visuals for better clarity and navigation
Full interaction via mouse: orbit, zoom, and pan
Modern tooling with Vite for blazing-fast development
Mobile-friendly interface with responsive scaling

🛠 Behind the Scenes

3d-Solar-System-ThreeJS/
├── public/
│   └── textures/             # Planet & celestial textures
├── src/
│   ├── main.js               # Core rendering logic
│   └── additional modules... # Planet classes, UI controls, etc.
├── index.html
├── package.json
├── vite.config.js
└── README.md

⚡ Installation & Local Run

Clone and run locally:

git clone https://github.com/SoumyaEXE/3d-Solar-System-ThreeJS.git
cd 3d-Solar-System-ThreeJS
npm install
npm run dev

Then visit

http://localhost:5173

To build a production version:

npm run build

📚 What I Learned

Skill	Takeaway
Three.js fundamentals	Scene setup, camera controls, mesh rendering
Animation techniques	Smooth orbits & rotations using `requestAnimationFrame`
Performance tuning	Optimized for smoother animations with Vite
Responsive design	UI/UX that works seamlessly across devices
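
For anyone curious what the orbit animation boils down to, the core is a position on a circle driven by elapsed time. This is a minimal sketch of that math in Python, and not the project's actual code: the circular orbit, the x-z plane, and the function name are assumptions for illustration.

```python
import math


def orbit_position(radius: float, period_s: float, t: float) -> tuple[float, float]:
    """Position on a circular orbit in the x-z plane at time t (seconds)."""
    angle = 2 * math.pi * (t / period_s)
    return radius * math.cos(angle), radius * math.sin(angle)


# A quarter of the way through an 8-second orbit of radius 10.
x, z = orbit_position(radius=10.0, period_s=8.0, t=2.0)
print(round(x, 6), round(z, 6))
```

Driving the angle from elapsed time, and not from a fixed per-frame increment, keeps orbital speed the same regardless of the frame rate `requestAnimationFrame` delivers.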

🌌 Join the Journey

I’d love your thoughts! Let me know:

What features you’d like added next
How the scaling feels on mobile
Any performance hiccups you spotted

Don’t forget to ⭐ the repo if you enjoy the simulation — and open an issue or PR if you want to contribute!

🔗 Related Projects & Resources

Three.js Examples — official showcase
Three.js Documentation — API deep dive
Vite — next-gen frontend tooling

Thanks for exploring with me!
— Soumya

What Is Vibe Coding? How To Do It In 2025

Soumyadeep Dey — Sat, 12 Jul 2025 07:00:25 +0000

🧑‍💻 Follow me on GitHub for more experiments, tools, and guides: github.com/SoumyaEXE

What if you could skip the boilerplate and just say what you want your app to do?

That’s vibe coding—a 2025 workflow where you build web apps, prototypes, and backend APIs using plain English. Whether you’re a pro developer tired of repetitive tasks or a beginner with big ideas, vibe coding changes how you ship.

This article breaks down what vibe coding is, how it works, tools to try, and how GitHub plays a major role in your AI-powered workflow.

🔍 What Is Vibe Coding?

Vibe coding is building software by describing your app, component, or logic in natural language. Tools like Cursor, Claude Code, or Replit AI translate your intent into code. GitHub Copilot and Vercel’s AI SDKs are integrating this even deeper into production-ready workflows.

🧠 “The hottest new programming language is English.” — Andrej Karpathy

Vibe coding isn't just code generation—it's iteration + intention + automation.

⚡ Why Developers (and Non-Devs) Love It

🧑‍🎓 Beginners: Build apps with zero syntax knowledge
💼 Professionals: Skip boilerplate, test logic fast
⏱️ Everyone: Launch MVPs in hours, not weeks

🛠️ Tools for Vibe Coding in 2025

Tool	What it’s good for	GitHub Integration
Cursor	Code generation in a VS Code-based editor	✅ Commits + Copilot sync
Claude Code	Complex structured logic via prompts	❌ Manual export
Replit AI	Full-stack in-browser development	✅ GitHub sync
Vercel AI SDK	Integrate AI responses in Next.js apps	✅ Native
Postman AI	API testing & backend autogen	✅ Push collections

Want GitHub commits that make sense? Cursor and Replit are great for AI-generated commits, clean histories, and Copilot X preview features.

✍️ How to Vibe Code

1. Write Specific Prompts

Bad:

Make a website

Good:

Build a dark-themed Next.js blog with a subscribe form and Markdown support.

Even better:

Create a TypeScript-based Next.js 14 app with Tailwind CSS. Add a newsletter form using Vercel’s Edge Functions and connect to a Notion DB.

2. Test, Commit, Repeat

Once the tool generates the code:

Run locally (npm run dev)
Test features and endpoints
Fix bugs via prompts
Auto-commit to GitHub

git add .
git commit -m "✨ AI-generated landing page with Tailwind and form handling"
git push

AI even suggests commits now. Just review and approve.

3. Use GitHub for Versioning + Deploying

Every good AI project needs version control. Here's a simple Vercel-integrated flow:

vercel link
vercel --prod

Want preview links after every push? GitHub Actions + Vercel CI does it for you.

🚨 Pitfalls to Avoid

Problem	Fix
Vague prompts	Be specific with structure + tech
Wrong framework	Always mention your stack
Code doesn't run	Use Replit or local dev server
AI bloats the code	Refactor manually or with Copilot
Debugging is messy	Break into smaller prompts

🧪 Example Vibe Prompt (Backend + DB)

“Create an Express.js REST API with a /books route, connect it to a SQLite DB, and deploy it with Vercel Serverless Functions.”

⬇️ AI Output:

/api/books.js with CRUD routes
db.sqlite pre-seeded with sample data
vercel.json configured
.env support added

All ready to deploy + preview live via GitHub commits.

🔮 What’s Next?

AI-driven workflows are becoming GitHub-native.

Copilot Workspace is in preview
GitHub Actions trigger based on AI-generated code
aidd.io is launching something major on July 15, 2025

Get ready for AI-powered commit messages, automated PR descriptions, and test coverage reports—all from natural language.

💬 Final Thoughts

Vibe coding is a real shift in how we build. You still need logic and structure, but AI handles the scaffolding.

✅ Start small

✅ Be specific

✅ Use GitHub for backup and versioning

✅ Iterate fast

Want to ship ideas without the grind? Vibe coding might be your new workflow.

🔗 Stay in the Loop

Follow me on GitHub: SoumyaEXE
Leave a ❤️ if you vibe with this
Share your AI-generated repos in the comments!

See you on the other side of the commit log 🚀

Top 10 Must-Know GitHub Repositories for Backend Developers (2025) 🔥

Soumyadeep Dey — Sat, 12 Jul 2025 06:43:05 +0000

As a backend developer, mastering the right tools and libraries can significantly improve your workflow, scalability, and project maintainability. Here’s a concise list of 10 highly useful GitHub repositories tailored for backend developers in 2025.

1. expressjs/express

Minimalist Web Framework for Node.js

Express is a fast, unopinionated, and widely-used web framework for building APIs and server-side applications.

const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.send('Hello Backend World!');
});

app.listen(3000, () => console.log('Server running'));

✅ Lightweight and flexible
✅ Robust routing
✅ Middleware support

2. typeorm/typeorm

ORM for TypeScript and JavaScript

TypeORM allows you to interact with SQL databases using an object-oriented approach.

import { Entity, PrimaryGeneratedColumn, Column } from 'typeorm';

@Entity()
export class User {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  name: string;
}

✅ Supports multiple DB engines (PostgreSQL, MySQL, etc.)
✅ Migration and CLI support
✅ Decorator-based modeling

3. prisma/prisma

Next-Generation Type-Safe ORM

Prisma offers a powerful and type-safe interface for querying your database using auto-generated TypeScript types.

model User {
  id    Int     @id @default(autoincrement())
  email String  @unique
  posts Post[]
}

✅ Type safety for queries
✅ Built-in migrations
✅ Great developer experience

4. nestjs/nest

Progressive Framework for Building Scalable Server-Side Apps

NestJS is built with TypeScript (plain JavaScript also works) and uses a modular architecture inspired by Angular.

import { Controller, Get } from '@nestjs/common';

@Controller('users')
export class UsersController {
  // Handles GET /users
  @Get()
  findAll(): string {
    return 'This returns all users';
  }
}

✅ Dependency injection
✅ Modular architecture
✅ Ideal for microservices and enterprise apps

5. docker/awesome-compose

Collection of Docker Compose Examples

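This repository provides practical Docker Compose samples for common stacks. They are meant for local development and experimentation; the repo's own README warns against deploying them to production as-is.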

services:
  backend:
    build: .
    ports:
      - "3000:3000"
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example

✅ Quick environment setup
✅ Great for learning and local development
✅ Covers common backend stacks

6. sql-js/sql.js

SQLite Compiled to JavaScript

SQL.js allows you to run a full SQLite database in the browser or Node.js.

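const initSqlJs = require('sql.js');

// sql.js loads its WebAssembly build asynchronously
initSqlJs().then((SQL) => {
  const db = new SQL.Database(); // in-memory database
  db.run('CREATE TABLE test (col1, col2);');
});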

✅ No external dependencies
✅ Perfect for testing and learning SQL
✅ Use in web apps or sandboxed environments

7. kamranahmedse/developer-roadmap

Visual Learning Path for Developers

This roadmap outlines the core skills and technologies needed to become a backend developer.

✅ Covers languages, tools, protocols, DBs
✅ Updated regularly
✅ Community-driven and beginner-friendly

8. elastic/elasticsearch

Search and Analytics Engine

Elasticsearch is used for full-text search, data analysis, logging, and monitoring.

curl -X GET "localhost:9200/_cat/indices?v"

✅ Distributed and scalable
✅ Powerful querying and filtering
✅ Widely used with Logstash, Kibana

9. auth0/node-jsonwebtoken

JWT Authentication for Node.js

A simple and robust library for creating and verifying JSON Web Tokens.

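const jwt = require('jsonwebtoken');
// Read the secret from the environment instead of hard-coding it (JWT_SECRET is an example name)
const token = jwt.sign({ userId: 123 }, process.env.JWT_SECRET, { expiresIn: '1h' });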

✅ Stateless authentication
✅ Supports expiration and claims
✅ Easy to integrate into APIs
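
To show roughly what happens behind a `jwt.sign` call like the one above, here is a minimal sketch of HS256 signing and verification using only Node's built-in `crypto` module. It is an illustration of the token format, not the library's implementation: `signToken` and `verifyToken` are hypothetical helper names, and real projects should rely on `jsonwebtoken` (or a similar maintained library) rather than hand-rolled code.

```javascript
const crypto = require('node:crypto');

// Encode a value as base64url JSON, the encoding used for JWT segments.
function encodeSegment(value) {
  return Buffer.from(JSON.stringify(value)).toString('base64url');
}

// HMAC-SHA256 over "header.payload", returned as base64url.
function signInput(input, secret) {
  return crypto.createHmac('sha256', secret).update(input).digest('base64url');
}

// Hypothetical helper: build a signed token with iat and exp claims.
function signToken(payload, secret, expiresInSeconds) {
  const now = Math.floor(Date.now() / 1000);
  const header = { alg: 'HS256', typ: 'JWT' };
  const body = { ...payload, iat: now, exp: now + expiresInSeconds };
  const input = `${encodeSegment(header)}.${encodeSegment(body)}`;
  return `${input}.${signInput(input, secret)}`;
}

// Hypothetical helper: check algorithm, signature, and expiry, then return the claims.
function verifyToken(token, secret) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('malformed token');
  const [header, body, signature] = parts;

  const parsedHeader = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
  if (parsedHeader.alg !== 'HS256') throw new Error('unsupported algorithm');

  // Compare signatures in constant time to avoid leaking timing information.
  const actual = Buffer.from(signature);
  const expected = Buffer.from(signInput(`${header}.${body}`, secret));
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error('invalid signature');
  }

  const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (typeof claims.exp === 'number' && claims.exp <= Math.floor(Date.now() / 1000)) {
    throw new Error('token expired');
  }
  return claims;
}

const demoToken = signToken({ userId: 123 }, 'demo-secret', 3600);
console.log(verifyToken(demoToken, 'demo-secret').userId); // 123
```

The takeaway is that the payload is only encoded, not encrypted: anyone can read the claims, but only someone holding the secret can produce a signature that verifies.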

10. fastify/fastify

High-Performance Node.js Web Framework

Fastify is designed for high throughput and low overhead.

const fastify = require('fastify')();

fastify.get('/', async (request, reply) => {
  return { hello: 'world' };
});

fastify.listen({ port: 3000 });

✅ Schema-based validation
✅ Highly extensible plugin system
✅ Blazing fast performance

✅ Final Thoughts

These repositories are more than just popular — they provide the backbone for robust, scalable, and efficient backend development.

Use Express or Fastify for APIs
Manage data using Prisma or TypeORM
Secure your backend with JWT
Containerize with Docker
Search and analyze logs with Elasticsearch

Explore these repos, contribute to them, and use them in real-world projects to become a more effective backend developer in 2025.

📌 Stay Connected

Follow me on GitHub
Bookmark this article for reference
Leave a ❤️ if you found it helpful

Happy building! 🚀

What's the most challenging coding problem you've encountered, and how did you overcome it? 😮

Soumyadeep Dey — Sun, 03 Sep 2023 14:42:25 +0000

So, what's the biggest problem you've run into while coding? 😶

Let everybody know in the comments! 🤔❤

(Hit Follow too ~ it's a small promotion 🤣✋🏻)

Jokeday Funday: Part 6 - More Programming Humor to Brighten Your Day

Soumyadeep Dey — Sun, 03 Sep 2023 14:36:53 +0000

Joke 1: The Broken Monitor

Why did the programmer go broke?

Because they used up all their cache and couldn't find their cache flow!

Joke 2: The Time Traveler

Why did the developer bring a ladder to the bar?

Because they wanted to reach the "root" beer on the top shelf!

Joke 3: The Coding Detective

Why did the programmer always carry a pencil and paper?

Because you never know when you'll need to draw a "byte" sketch of a suspect!

Joke 4: The Bug Whispers

Why did the developer become a gardener?

Because they had a talent for making bugs flourish!

Joke 5: The Tech Support Hero

Why did the tech support agent go to therapy?

Because they couldn't stop hearing voices saying, "Have you tried turning it off and on again?"

Joke 6: The Password Blues

Why did the password go to therapy?

Because it had too many issues to "hash" out!

Joke 7: The Agile Athlete

Why did the developer start running marathons?

Because they wanted to master the art of sprinting and releasing!

Joke 8: The Code Poet

Why did the developer start writing poetry?

Because they realized code comments could be a form of art!

Joke 9: The Lost Pointer

Why did the pointer go on vacation?

Because it needed to refresh its memory!

Joke 10: The Debugging Zen

Why did the programmer become a meditation guru?

Because they wanted to find inner peace and tranquility amidst runtime errors!

Bonus Joke: The Coding Conductor

Why did Soumya become a programmer?

Because he wanted to orchestrate the symphony of logic and create beautiful software symphonies!