DEV Community: AI Tech Connect

Agent-to-Agent Interop with A2A: Agent Cards, Tasks and Production Security

AI Tech Connect — Thu, 23 Jul 2026 15:30:11 +0000

Originally published on AI Tech Connect.

What A2A gives you: For two years the interesting problem in applied AI was getting a single agent to do useful work: give it good tools, a sensible prompt, some memory, and let it reason. That problem is far from solved, but the frontier has already moved. The systems being built now are not single agents; they are teams of them. A triage agent hands a refund case to a payments agent. A research agent delegates a data-extraction job to a specialist. A logistics agent asks a pricing agent for a quote. And here the old approach breaks down, because those agents were built by different teams, on different frameworks, sometimes by different companies entirely. There was no shared language for one agent to discover another, understand what it could do, and hand it a piece of work safely. A2A…

Read the full article on AI Tech Connect →

Become an AI Safety Engineer: Red-Teaming, Evals and Alignment Roles in 2026

AI Tech Connect — Thu, 23 Jul 2026 13:30:08 +0000

What you need to know: AI safety engineering is, right now, one of the fastest-growing and hardest-to-hire roles in the whole industry. Anthropic, Google DeepMind and OpenAI are all scaling their empirical safety, evaluation and alignment teams, and the demand is running well ahead of the supply of people who can do the work. The unusual thing about this field, and the reason it is such a good bet for a determined builder, is that it hires overwhelmingly on demonstrated skill rather than credentials. You do not need permission, a specific degree or a famous employer on your CV to start. You need to do the work in public and let it be found. This guide lays out the five sub-tracks inside safety engineering, the skills each one rewards, the pipelines that reliably convert self-taught…

Code Execution with MCP: How "Code Mode" Cuts Agent Token Costs 90%+

AI Tech Connect — Thu, 23 Jul 2026 11:30:21 +0000

What you need to know: There is a quiet tax buried inside almost every agent you have built on the Model Context Protocol, and most teams only notice it when the invoice arrives. Each tool you connect ships its full schema into the model's context. Each tool call the model makes returns a result that also passes back through context. Wire up a dozen integrations with a few hundred operations between them, run a multi-step workflow, and the model can burn through a hundred thousand tokens before it has produced a single line of useful output. The work is real, but most of the tokens are plumbing. Anthropic's engineering team put a name and a fix to this with a pattern they call code execution with MCP. The idea is deceptively simple: instead of the model making many direct tool calls, each…

Agentic Payments for Developers: x402, AP2 and ACP Explained

AI Tech Connect — Thu, 23 Jul 2026 09:30:11 +0000

What you need to know: For thirty years, the HTTP status code 402 has sat in the specification marked "Payment Required" and reserved for future use. That future has arrived, and it arrived because of AI agents. When software starts acting on a user's behalf (booking, buying, subscribing, calling paid APIs without a person at the keyboard for each step), the entire assumption behind online payment breaks. Card checkout expects a human to read a form, tick a box and confirm. An agent needs something machine-native: a way to be quoted a price, attach proof that it may pay, and settle the money, all in code. As of 2026, four protocols compete to be that machine-native layer. x402, from Coinbase, revives the 402 status code to settle stablecoin payments over plain HTTP. AP2, the Agent…

Pick a Self-Hosted Coding Model in 2026: Qwen3-Coder vs GLM vs Kimi vs DeepSeek

AI Tech Connect — Thu, 23 Jul 2026 07:30:14 +0000

What you need to know: Two years ago, running your own coding model was a compromise you made for privacy and then quietly regretted every time the completions came back wrong. As of July 2026, that has changed. A cluster of open-weight models has crossed the threshold where a team can serve them in-house and get genuinely useful engineering help: not just autocomplete, but multi-file refactors, agentic task loops and long-running builds. The question is no longer whether you can self-host; it is which model, on what hardware, and whether the sums actually work out in your favour. This guide is written for two kinds of team. The first is a Pune product company with a 2×48GB GPU pair sitting in a rack, wondering whether it can retire part of its API bill. The second is a London studio…

Write Portable Agent Skills: One SKILL.md for Claude Code, Codex and the API

AI Tech Connect — Thu, 23 Jul 2026 05:32:06 +0000

What you need to know: Every coding agent you use (Claude Code in your terminal, Codex in your editor, a bespoke agent you built on the API) is only as good as the instructions you feed it. Most teams feed those instructions the slow way: someone pastes the same nine-step deployment checklist into chat for the fourth time this week, or re-explains the house style for a database migration to an agent that has no memory of the last three times. That repetition is a signal, and the thing it is signalling is a skill. An agent skill is a small, self-contained package that teaches an agent how to do one job well. At its heart sits a single file, SKILL.md: a YAML frontmatter block that names and describes the skill, followed by a Markdown body of step-by-step instructions. The format is defined…
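
As a rough illustration of the layout described in this excerpt, the sketch below splits a SKILL.md-style document into its frontmatter and its Markdown body. The sample skill, the `name` and `description` keys, and the `split_skill` helper are assumptions made for illustration, not taken from the format's definition. The parser only understands flat `key: value` lines; anything richer would need a real YAML library.

```python
# Hypothetical sample file contents; the skill itself is invented for illustration.
SAMPLE_SKILL = """\
---
name: deploy-checklist
description: Walks through the deployment checklist step by step.
---
# Deploy checklist

1. Run the test suite.
2. Tag the release.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (frontmatter, body).

    Assumes the frontmatter is delimited by lines containing only '---'
    and holds flat 'key: value' pairs (a simplification of YAML).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        raise ValueError("missing closing frontmatter delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


if __name__ == "__main__":
    meta, body = split_skill(SAMPLE_SKILL)
    print(meta["name"])
    print(body.splitlines()[0])
```

Keeping the metadata separate from the instructions is what would let a loader show an agent the short description first and read the full body only when the skill is actually needed.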

Alibaba Previews Qwen3.8 Max: 2.4T Params, Second Only to Fable 5

AI Tech Connect — Thu, 23 Jul 2026 05:30:12 +0000

What Alibaba actually announced: On 20 July 2026 Alibaba previewed Qwen3.8 Max, and the headline numbers are deliberately enormous. Per the company's preview, the model weighs in at 2.4 trillion parameters, the largest Alibaba has ever trained, and on aggregate benchmarks the company positions it second only to Anthropic's flagship Claude Fable 5. The preview is already live on Alibaba's coding platforms, and the company says an open-weight release will follow, letting developers download and customise the model themselves. Three things are worth separating before anyone re-plans a stack around this. First, what is verifiable today: a preview you can actually run on Alibaba's own platforms. Second, what is promised: open weights, on an unspecified date, under an unspecified licence.…

Latency Budgets for Chat UX: Streaming, TTFT and Perceived Speed

AI Tech Connect — Wed, 22 Jul 2026 15:30:15 +0000

What "fast" really means for an AI feature: Ask a product manager whether their chat assistant is fast and they will usually quote a number: the model returns a full answer in six seconds, say, or a summariser finishes a document in four. Ask the person actually using it, and you get a different verdict entirely. Speed, as users experience it, has almost nothing to do with total generation time. It has everything to do with how quickly the interface stops feeling dead and starts feeling alive. This is the single most important thing to internalise before you tune a single parameter: users judge an AI feature by how fast it feels, not by how long it takes to finish. A response that begins appearing four hundred milliseconds after the user hits enter and then streams smoothly to completion…

Building a Human Annotation Pipeline for LLM Evals

AI Tech Connect — Wed, 22 Jul 2026 13:30:21 +0000

What you will build: Every automated evaluation you run (an LLM-as-a-judge scoring answers, a regression suite gating your deploys, a dashboard tracking quality over time) ultimately rests on a set of human decisions about what "good" looks like. Those decisions live in a golden set: a curated collection of inputs paired with human-verified labels or reference outputs. Get the golden set right and everything downstream becomes trustworthy. Get it wrong and you are optimising confidently towards the wrong target. This guide walks through building the pipeline that produces and maintains that golden set: how to design the labelling task, how to measure whether your annotators actually agree, which self-hostable tools to use, how to feed the pipeline from production traffic, and how to keep…
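
The excerpt does not say which agreement statistic the guide uses. One common choice for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal version under that assumption; the function name and the sample labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected if both annotated independently
    with their own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["good", "good", "bad", "bad"]
    annotator_2 = ["good", "bad", "bad", "bad"]
    print(cohens_kappa(annotator_1, annotator_2))
```

With more than two annotators, or with missing labels, a different statistic such as Krippendorff's alpha is usually a better fit.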

Load-Testing and Capacity Planning for LLM Apps

AI Tech Connect — Wed, 22 Jul 2026 11:30:12 +0000

What you will actually measure: Most teams load-test their large language model service the way they load-test a REST API: point a tool at the endpoint, ramp up virtual users, watch requests-per-second and average response time, and call it capacity. Then production traffic hits, the first token takes four seconds to appear during the evening peak, and nobody can say why, because the dashboard was measuring the wrong things all along. LLM serving does not behave like a stateless CRUD endpoint. Responses stream token by token rather than arriving in one block. Output length varies enormously from one request to the next. The meaningful signals live at the token level, not the request level. And the hardware saturates in a pattern that has more to do with GPU memory and a structure called…
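
To make "token-level signals" concrete, here is a minimal sketch that times a token stream and reports time to first token and the mean gap between tokens. The `stream_timings` helper and its return keys are hypothetical names, and the token iterable stands in for whatever streaming client a real load test would use.

```python
import time
from typing import Callable, Iterable


def stream_timings(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict[str, float]:
    """Measure time to first token and mean inter-token gap for one stream.

    `tokens` is any iterable that yields tokens as they arrive. `clock` is
    injectable so the function can be exercised with a fake clock.
    """
    start = clock()
    arrivals = [clock() for _ in tokens]  # one timestamp per received token
    if not arrivals:
        raise ValueError("stream produced no tokens")

    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    return {
        "ttft_s": arrivals[0] - start,
        "mean_inter_token_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "token_count": float(len(arrivals)),
    }


if __name__ == "__main__":
    def slow_stream():
        # Stand-in for a streaming response: a delayed first token, then faster ones.
        time.sleep(0.05)
        yield "Hello"
        for word in (" there", "!"):
            time.sleep(0.01)
            yield word

    print(stream_timings(slow_stream()))
```

In a load test these per-request numbers would be collected across many concurrent streams and summarised as percentiles rather than averages.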

A/B Testing LLM Features: Online Experiments That Beat Offline Evals

AI Tech Connect — Wed, 22 Jul 2026 10:09:21 +0000

What you'll set up, in one paragraph: By the end of this guide you will have a repeatable way to answer one question that offline evals cannot: did this change to your LLM feature actually make things better for real people? You will define an Overall Evaluation Criterion (one primary success metric plus a small set of guardrails), assign users to variants with deterministic hashing, protect the experiment with a Sample Ratio Mismatch check, avoid the peeking trap with either a fixed horizon or a sequential test, reach for interleaving when you are changing retrieval, and finally graduate a trusted metric into a contextual bandit that routes traffic automatically. The theme throughout is simple: offline evals prove an output moved; only a live experiment proves it helped. Offline tells…
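
Two of the steps named here, deterministic assignment and the Sample Ratio Mismatch check, can be sketched in a few lines. This is a minimal version assuming a two-variant experiment with an intended 50/50 split; the function names, the experiment key and the sample counts are illustrative, not taken from the guide.

```python
import hashlib
import math


def assign_variant(
    user_id: str,
    experiment: str,
    variants: tuple[str, ...] = ("control", "treatment"),
) -> str:
    """Deterministically map a user to a variant.

    Hashing the experiment name together with the user id gives the same
    user the same variant on every request, while different experiments
    shuffle users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def srm_p_value(control_n: int, treatment_n: int) -> float:
    """Chi-square goodness-of-fit p-value against an intended 50/50 split."""
    total = control_n + treatment_n
    if total == 0:
        raise ValueError("no users assigned yet")
    expected = total / 2
    chi2 = ((control_n - expected) ** 2 + (treatment_n - expected) ** 2) / expected
    # With one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    print(assign_variant("user-123", "summary-prompt-v2"))
    print(srm_p_value(5000, 5400))
```

A very small p-value from the check suggests the assignment or logging pipeline is broken, in which case the experiment's metrics should not be trusted until the cause is found.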

Multilingual RAG: Cross-Lingual Retrieval for Indian and European Languages

AI Tech Connect — Wed, 22 Jul 2026 07:38:07 +0000

What you'll build, in plain English: By the end of this guide you will know how to build a retrieval-augmented generation system that answers a question in the language it was asked, no matter which language your source documents happen to be written in. A user in Coimbatore can ask in Tamil and get an answer grounded in an English policy PDF. A support agent in Lyon can ask in French and pull the right paragraph from a German technical manual. That capability, a query in language A retrieving documents in language B, is called cross-lingual retrieval, and it is the single most under-built part of most production RAG stacks serving India and Europe. The plan looks like this. Detect the language and script of the incoming query. Normalise the text so Unicode quirks and code-mixing do not…
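
The first two steps of that plan can be approximated with the standard library alone. The sketch below normalises a query to NFC and guesses its dominant script from Unicode character names. Both helper names are hypothetical, and script is only a proxy for language: French and German are both Latin-script, so a real pipeline would still need a proper language identifier on top.

```python
import unicodedata
from collections import Counter


def normalise(text: str) -> str:
    """Apply NFC so visually identical strings compare equal, then trim."""
    return unicodedata.normalize("NFC", text).strip()


def dominant_script(text: str) -> str:
    """Return the most common script among the letters in `text`.

    Uses the first word of each character's Unicode name (for example
    'TAMIL' or 'LATIN') as a rough script label. Code-mixed text returns
    whichever script contributes the most letters.
    """
    counts: Counter[str] = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    query = normalise("  Quelle est la politique de remboursement ?  ")
    print(dominant_script(query))
```

Running the normalisation before embedding or indexing means that a composed and a decomposed spelling of the same word end up as the same string.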
