<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ivo Brett</title>
    <description>The latest articles on DEV Community by Ivo Brett (@mattercoder).</description>
    <link>https://dev.to/mattercoder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2960227%2F6f4b2cea-93ab-4172-82cc-c417f2f3eb0e.png</url>
      <title>DEV Community: Ivo Brett</title>
      <link>https://dev.to/mattercoder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattercoder"/>
    <language>en</language>
    <item>
      <title>I Gave AI Agents a Telecom Job Interview. Most Failed Without a Cheat Sheet</title>
      <dc:creator>Ivo Brett</dc:creator>
      <pubDate>Tue, 17 Mar 2026 09:00:47 +0000</pubDate>
      <link>https://dev.to/mattercoder/i-gave-ai-agents-a-telecom-job-interview-most-failed-without-a-cheat-sheet-ddj</link>
      <guid>https://dev.to/mattercoder/i-gave-ai-agents-a-telecom-job-interview-most-failed-without-a-cheat-sheet-ddj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Telecoms is one of the most API-driven industries on the planet. TM Forum has standardised hundreds of operations workflows across product catalogs, customer management, incident response, network topology, billing, and performance monitoring. If AI agents are going to automate telecom operations, they need to work reliably across all of them.&lt;/p&gt;

&lt;p&gt;I wanted to find out: can today's open-weight LLMs actually do this? And if not — what closes the gap?&lt;/p&gt;

&lt;p&gt;The answer led me to build something I'm calling &lt;strong&gt;SKILLS&lt;/strong&gt; - a benchmark framework and a set of portable domain skill documents that give AI agents the operational knowledge they need to execute real telecom workflows. The name is, of course, a play on Agent Skills, a new standard that &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; describes as&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;folders of instructions, scripts, and resources that agents can discover and use to do things more accurately and efficiently.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article covers what I built, what I found, and why the results matter for anyone building agentic AI in a regulated, API-heavy industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Generalist Agents Hit a Wall
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexbl1rwooqw5l8ac9q6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexbl1rwooqw5l8ac9q6w.png" alt="Generalist AI agents face a wall of domain-specific telecom skills they don't have" width="800" height="478"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Generalist AI agents need domain-specific skills to handle telecom operations workflows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ask a general-purpose LLM agent to handle a task like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Identify any cells with traffic anomalies greater than 3 standard deviations from their baseline, specifically looking at overnight patterns. We need to rule out unauthorized usage or configuration errors."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A capable agent will understand the problem. It will know what standard deviation means. It might even write reasonable analysis logic.&lt;/p&gt;

&lt;p&gt;But it won't know that the TMF628 Performance Management API expects &lt;code&gt;g_5mn&lt;/code&gt; for 5-minute granularity — not &lt;code&gt;PT5M&lt;/code&gt;. It won't know the job creation lifecycle. It won't know which fields to filter, or in which order to call the APIs to get the data it needs.&lt;/p&gt;

&lt;p&gt;It will hallucinate a plausible-looking answer, or fail validation, or call the wrong endpoint. Not because it's incapable — because it lacks &lt;em&gt;domain knowledge that isn't in any training data&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's the gap skills are designed to close.&lt;/p&gt;
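To make the gap concrete, here is a minimal sketch contrasting the plausible ISO 8601 value a generalist model tends to guess with the domain enum string the API actually accepts. The payload field names and the enum list are illustrative assumptions, not the official TMF628 schema:

```python
# Illustrative sketch (field names and enum list are assumptions, not the
# official TMF628 schema): the difference between a generalist guess and a
# valid request often comes down to a single enum string.

# What a generalist agent tends to produce (ISO 8601 duration):
naive_job = {
    "granularity": "PT5M",      # plausible-looking, but rejected by the API
    "reportingPeriod": "PT1H",
}

# What the TMF628-style API expects (domain enum strings):
valid_job = {
    "granularity": "g_5mn",     # 5-minute granularity
    "reportingPeriod": "r_1h",  # 1-hour reporting period
}

# Hypothetical server-side whitelist the agent's request is validated against.
VALID_GRANULARITIES = {"g_1mn", "g_5mn", "g_15mn", "g_30mn", "g_1h", "g_24h"}

def is_valid_granularity(value: str) -> bool:
    """The validation check a naive request fails."""
    return value in VALID_GRANULARITIES

print(is_valid_granularity(naive_job["granularity"]))  # False
print(is_valid_granularity(valid_job["granularity"]))  # True
```

The agent's reasoning can be flawless and the request still bounces: the enum string is not inferable, it has to be known.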

&lt;h2&gt;
  
  
  What I Built: The SKILLS Benchmark
&lt;/h2&gt;

&lt;p&gt;I built a benchmark of &lt;strong&gt;37 telecom operations scenarios&lt;/strong&gt; across &lt;strong&gt;8 TM Forum Open API domains&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;TMF API&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Scenarios&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TMF628&lt;/td&gt;
&lt;td&gt;Performance Management&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TMF629&lt;/td&gt;
&lt;td&gt;Customer Management&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TMF637&lt;/td&gt;
&lt;td&gt;Product Inventory&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TMF639&lt;/td&gt;
&lt;td&gt;Resource Topology&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TMF724&lt;/td&gt;
&lt;td&gt;Incident Management&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TMF620/621/622&lt;/td&gt;
&lt;td&gt;Catalog, Tickets, Orders&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each scenario runs against &lt;strong&gt;live mock API servers&lt;/strong&gt; backed by MongoDB, with seeded production-representative data. The agent has access to MCP tool interfaces for each server. Evaluation is three-layer: programmatic tool-call verification, LLM judge for response content, and database state assertions.&lt;/p&gt;

&lt;p&gt;Scenarios span four complexity tiers from simple single-API lookups to &lt;strong&gt;Complex&lt;/strong&gt; scenarios that require proprietary business logic the model cannot infer from schema alone — SLA weighting formulas, maintenance exclusion rules, specific TMF enumeration formats.&lt;/p&gt;

&lt;p&gt;For each scenario I ran the agent twice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt; — the agent has tools but no domain guidance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With-Skill&lt;/strong&gt; — the agent receives a portable &lt;code&gt;SKILL.md&lt;/code&gt; document encoding workflow steps, API patterns, and business rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The delta is the &lt;strong&gt;skill lift&lt;/strong&gt;.&lt;/p&gt;
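The lift is a plain percentage-point delta between the two conditions. A minimal sketch, using figures that appear in the results table later in the article:

```python
# Skill lift = with-skill pass rate minus baseline pass rate, in
# percentage points (pp).

def skill_lift(baseline_pct: float, with_skill_pct: float) -> float:
    """Percentage-point delta between the two evaluation conditions."""
    return round(with_skill_pct - baseline_pct, 1)

# Overall pass rates for three of the evaluated models:
results = {
    "Nemotron 120B (std)": (59.5, 78.4),
    "GLM-5 Turbo": (73.0, 78.4),
    "Seed 2.0 Lite": (56.8, 75.7),
}

for model, (base, skilled) in results.items():
    print(f"{model}: +{skill_lift(base, skilled)}pp")
```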

&lt;h2&gt;
  
  
  What Are Skills?
&lt;/h2&gt;

&lt;p&gt;A Skill is a portable Markdown document that gives an agent the operational knowledge for a specific telecom workflow. It encodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which MCP servers and tools are required&lt;/li&gt;
&lt;li&gt;The exact sequence of API calls and their parameters&lt;/li&gt;
&lt;li&gt;Business rules and decision logic (e.g. SLA priority weights)&lt;/li&gt;
&lt;li&gt;Domain-specific enumeration formats&lt;/li&gt;
&lt;li&gt;Error handling patterns&lt;/li&gt;
&lt;li&gt;Required output format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skills are model-agnostic. They contain no code — only structured natural language instructions that any agent platform can load as system context.&lt;/p&gt;
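Because a skill is just a Markdown file, consuming one amounts to reading the document and prepending it to the agent's system context. The file contents, section headings, and prompt shape below are my own illustration, not the actual SKILL.md format from the benchmark:

```python
# Minimal sketch of how a portable skill is consumed: load the Markdown
# document verbatim into system context. Content and prompt shape are
# illustrative assumptions.
import tempfile
from pathlib import Path

skill_md = """\
# tmf628-performance-manager
## Required tools
- tmf628.createMeasurementJob
## Rules
- Use granularity `g_5mn` for 5-minute data (never `PT5M`).
"""

# Persist and reload, mimicking a version-controlled SKILL.md on disk.
path = Path(tempfile.mkdtemp()) / "SKILL.md"
path.write_text(skill_md, encoding="utf-8")

def build_system_prompt(base: str, skill_path: Path) -> str:
    """Prepend the base prompt, then append the skill document unchanged."""
    return f"{base}\n\n## Domain skill\n{skill_path.read_text(encoding='utf-8')}"

prompt = build_system_prompt("You are a telecom operations agent.", path)
print("g_5mn" in prompt)  # True
```

No fine-tuning, no code execution: the same file loads into any agent platform that accepts system context.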

&lt;h2&gt;
  
  
  The Evaluation Setup
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwukzdfkfrzdnevbgg3z5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwukzdfkfrzdnevbgg3z5.png" alt="Skills evaluation running in Contextware" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: The Contextware skills evaluation workbench running baseline and with-skill conditions for each scenario.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I evaluated the following open-weight and open-access models via OpenRouter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nemotron 120B&lt;/strong&gt; (NVIDIA) — standard and minimal reasoning conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiniMax M2.5&lt;/strong&gt; (MiniMax)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM-5 Turbo&lt;/strong&gt; (Z.AI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seed 2.0 Lite&lt;/strong&gt; (ByteDance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healer Alpha&lt;/strong&gt; and &lt;strong&gt;Hunter Alpha&lt;/strong&gt; (OpenRouter)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Here's the headline table across all models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;With Skills&lt;/th&gt;
&lt;th&gt;Lift&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Healer Alpha&lt;/td&gt;
&lt;td&gt;70.3%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+13.5pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax M2.5&lt;/td&gt;
&lt;td&gt;67.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81.1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+13.5pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nemotron 120B (std)&lt;/td&gt;
&lt;td&gt;59.5%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+18.9pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nemotron 120B (min)&lt;/td&gt;
&lt;td&gt;67.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+10.8pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM-5 Turbo&lt;/td&gt;
&lt;td&gt;73.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5.4pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seed 2.0 Lite&lt;/td&gt;
&lt;td&gt;56.8%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+18.9pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hunter Alpha&lt;/td&gt;
&lt;td&gt;43.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;62.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+18.9pp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Every single model improved with skills.&lt;/strong&gt; The lift ranges from +5pp to +19pp overall, and on the hardest Complex scenarios the gains are even larger: &lt;strong&gt;+33–44pp&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No model closed the gap through scale alone. The knowledge had to be injected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 1: Skills Matter Most Where Models Are Blind
&lt;/h2&gt;

&lt;p&gt;The Complex scenario tier is the most diagnostic part of the benchmark. These scenarios require logic that genuinely isn't in any training data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An SLA risk score calculated as &lt;code&gt;Σ(WEIGHT × BREACH_MINUTES)&lt;/code&gt; where Platinum=10, Gold=7, Silver=4&lt;/li&gt;
&lt;li&gt;A topology traversal that must exclude resources with &lt;code&gt;administrativeState=locked&lt;/code&gt; (planned maintenance) to find the true root cause&lt;/li&gt;
&lt;li&gt;TMF measurement job creation using &lt;code&gt;g_15mn&lt;/code&gt; and &lt;code&gt;r_1h&lt;/code&gt; format strings&lt;/li&gt;
&lt;/ul&gt;
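The SLA risk score from the first bullet can be sketched directly; the incident record shape is an illustrative assumption, but the formula and weights are the ones the scenario requires:

```python
# SLA risk score from the Complex tier: sum(WEIGHT * BREACH_MINUTES),
# with Platinum=10, Gold=7, Silver=4. Record shape is illustrative.

SLA_WEIGHTS = {"Platinum": 10, "Gold": 7, "Silver": 4}

def sla_risk_score(incidents: list[dict]) -> int:
    """Weighted breach-minute total across affected customers."""
    return sum(
        SLA_WEIGHTS[i["slaTier"]] * i["breachMinutes"] for i in incidents
    )

incidents = [
    {"slaTier": "Platinum", "breachMinutes": 12},
    {"slaTier": "Gold", "breachMinutes": 30},
    {"slaTier": "Silver", "breachMinutes": 45},
]
print(sla_risk_score(incidents))  # 10*12 + 7*30 + 4*45 = 510
```

Nothing in the API schema hints at these weights, which is exactly why no model passes this tier without the skill.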

&lt;p&gt;Without the skill: models either hallucinate a plausible answer, or get the API call wrong at the parameter level. With the skill: they apply the exact logic and pass.&lt;/p&gt;

&lt;p&gt;Complex scenario lift across models: &lt;strong&gt;+33pp to +44pp&lt;/strong&gt;. This is where skills earn their keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 2: More Reasoning Isn't Always Better
&lt;/h2&gt;

&lt;p&gt;This one surprised me.&lt;/p&gt;

&lt;p&gt;I ran Nemotron 120B under two conditions: full reasoning and minimal reasoning (a guardrail preamble instructing it to prefer direct tool calls, skip re-verification steps, and use exact enum values from the skill).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Both conditions scored exactly 78.4% overall with skills.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Identical ceiling. But minimal reasoning scored &lt;strong&gt;88.9% on Complex scenarios&lt;/strong&gt; vs 77.8% for standard. Reducing reasoning depth improved performance on the hardest tasks.&lt;/p&gt;

&lt;p&gt;Why? Because the full reasoning model was burning its budget on the wrong problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 3: The Sandbox Discrimination Failure
&lt;/h2&gt;

&lt;p&gt;This is the most significant finding from the Nemotron evaluation.&lt;/p&gt;

&lt;p&gt;I traced every tool call across all TMF639 topology analysis scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;connect_to_mcp_server&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_environment_variable&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_mcp_servers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;create_sandbox&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;execute_mcp_tool&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Domain work&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Over 96% of tool calls (125 of 130) were infrastructure overhead. Under 4% was actual domain work.&lt;/strong&gt;&lt;/p&gt;
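The split follows directly from the trace table:

```python
# Infrastructure-vs-domain split, computed from the tool-call trace table.
calls = {
    "connect_to_mcp_server": 33,
    "run_command": 28,
    "get_environment_variable": 24,
    "list_mcp_servers": 18,
    "write_file": 15,
    "create_sandbox": 7,
    "execute_mcp_tool": 5,  # the only domain-work tool
}

total = sum(calls.values())        # 130 calls in total
domain = calls["execute_mcp_tool"]
infra_share = round((total - domain) / total * 100, 1)
domain_share = round(domain / total * 100, 1)
print(infra_share, domain_share)   # 96.2 3.8
```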

&lt;p&gt;The model wrote a Python script to call an MCP API tool — when that tool was sitting right there in its tool palette. Then the ephemeral sandbox expired mid-run. Scenario failed. Not because Nemotron couldn't reason about topology analysis — because it couldn't decide whether to retrieve data from an API or compute something.&lt;/p&gt;

&lt;p&gt;I call this &lt;strong&gt;Sandbox Discrimination Failure&lt;/strong&gt;: the inability to distinguish between "I need to retrieve data from an API" (use the MCP tool directly) and "I need to compute something" (a sandbox is appropriate). Nemotron defaults to sandbox as a general-purpose execution layer regardless of task type.&lt;/p&gt;

&lt;p&gt;The cascade looks like this:&lt;/p&gt;

&lt;p&gt;Step 3: connect_to_mcp_server (30s)&lt;br&gt;
Step 7: create_sandbox (30s)&lt;br&gt;
Steps 9–15: run_command / write_file cycles (180s)&lt;br&gt;
Step 15: Sandbox expires → recovery attempts (90s)&lt;br&gt;
Step 17+: Scenario timeout (360s total) → FAIL&lt;/p&gt;

&lt;p&gt;The agent never reached the actual topology analysis.&lt;/p&gt;
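One mitigation is a routing guardrail in front of the execution layer. The sketch below is my own illustration (not part of the benchmark harness), and the keyword-based classification is a deliberately crude assumption; the point is the decision boundary itself:

```python
# Guardrail sketch (illustrative, not from the benchmark harness): route
# data-retrieval tasks straight to the MCP tool and reserve the sandbox
# for genuine computation.

RETRIEVAL_VERBS = {"get", "list", "fetch", "query", "read", "retrieve"}

def choose_execution_path(task: str) -> str:
    """Crude first-verb heuristic for the retrieve-vs-compute decision."""
    first_word = task.lower().split()[0]
    if first_word in RETRIEVAL_VERBS:
        return "execute_mcp_tool"  # call the API tool directly, no sandbox
    return "create_sandbox"        # only for actual computation

print(choose_execution_path("list resources with administrativeState=locked"))
print(choose_execution_path("compute anomaly z-scores from the time series"))
```

A production version would classify with the model itself, but even a hard-coded rule like this would have kept Nemotron out of the sandbox for pure API lookups.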

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx58b94l79tsokmhr2ipj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx58b94l79tsokmhr2ipj.png" alt="Nemotron evaluation results showing sandbox fixation pattern" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Nemotron 120B evaluation results showing the impact of sandbox fixation across TMF domains.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 4: The Reasoning-Prescription Paradox
&lt;/h2&gt;

&lt;p&gt;Reasoning models treat skill instructions as &lt;em&gt;suggestions&lt;/em&gt; to evaluate against their general knowledge — not as &lt;em&gt;directives&lt;/em&gt; to follow.&lt;/p&gt;

&lt;p&gt;When the TMF628 skill says "use &lt;code&gt;g_5mn&lt;/code&gt; for 5-minute granularity," Nemotron overrides it with &lt;code&gt;PT5M&lt;/code&gt; (ISO 8601) because its training data says ISO formats are more correct. It's internally logical. It's externally wrong.&lt;/p&gt;

&lt;p&gt;The API returns a validation error. The scenario fails.&lt;/p&gt;

&lt;p&gt;Here's a sample of what we observed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill says&lt;/th&gt;
&lt;th&gt;Model used&lt;/th&gt;
&lt;th&gt;Model's reasoning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;r_1h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PT1H&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"ISO 8601 standard"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;g_5mn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;r_5mn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Confused prefix semantics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;unlocked&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;UNLOCKED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Enum constants are uppercase"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This leads to a counterintuitive design principle: &lt;strong&gt;skills for reasoning models must be more prescriptive than skills for non-reasoning models.&lt;/strong&gt; You have to explicitly prohibit the substitutions the model will otherwise make. The more capable the model's reasoning, the more guardrails the skill needs.&lt;/p&gt;
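One way to make those prohibitions enforceable rather than advisory is a pre-flight check that rejects the substitutions before the request leaves the agent. The enum lists below are illustrative assumptions, not the full TMF vocabularies:

```python
# Pre-flight enum check rejecting the substitutions reasoning models tend
# to make (ISO 8601 durations, uppercased constants). Enum lists are
# illustrative, not the complete TMF vocabularies.

ALLOWED = {
    "granularity": {"g_1mn", "g_5mn", "g_15mn", "g_30mn", "g_1h"},
    "reportingPeriod": {"r_5mn", "r_15mn", "r_30mn", "r_1h", "r_24h"},
    "administrativeState": {"locked", "unlocked", "shutdown"},
}

def check_enums(payload: dict) -> list[str]:
    """Return violations so the agent can self-correct before calling the API."""
    errors = []
    for field, value in payload.items():
        allowed = ALLOWED.get(field)
        if allowed is not None and value not in allowed:
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    return errors

print(check_enums({"granularity": "PT1H"}))  # flagged: ISO 8601 substitution
print(check_enums({"granularity": "g_1h", "administrativeState": "unlocked"}))
```

Shifting the check from the skill's prose into a deterministic gate removes the model's opportunity to "improve" the values.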

&lt;h2&gt;
  
  
  Finding 5: Baseline-Lift Compression
&lt;/h2&gt;

&lt;p&gt;GLM-5 Turbo (Z.AI) is purpose-built for agent workflows — complex instruction decomposition, multi-step tool chains, execution stability. It shows the clearest example of what I'm calling &lt;strong&gt;baseline-lift compression&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GLM-5 achieves the highest baseline of any non-reasoning model (73.0%). Skills add only +5.4pp overall lift. But it still converges on the same 78.4% with-skill ceiling as Nemotron — and reaches 88.9% on Complex scenarios.&lt;/p&gt;

&lt;p&gt;The implication: &lt;strong&gt;aggregate skill lift is an unreliable quality signal for capable models.&lt;/strong&gt; If a model already handles most tasks correctly, skills appear not to help much. But drill into the Complex tier — where proprietary logic is required — and skills deliver the same +33pp regardless.&lt;/p&gt;

&lt;p&gt;Also observed: on domains where GLM-5 already achieves 100% baseline, injecting a skill can &lt;em&gt;hurt&lt;/em&gt; performance. On service assurance, the 100% baseline dropped to 75% with the skill. Domain guidance adds noise where the model already knows the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Architects Building Agentic Telco Systems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Pretrained models are not enough.&lt;/strong&gt; Every model tested — regardless of capability tier — improved with skills. The TMF-specific knowledge (enumeration formats, API sequences, business logic) simply isn't in training data at sufficient depth. You cannot engineer your way around this with a bigger model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Skills are a practical, model-agnostic layer.&lt;/strong&gt; The same SKILL.md document improved performance across every model tested. They're portable, version-controllable, and maintainable by domain experts without ML expertise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Model selection, skill design, and infrastructure are a three-way interaction.&lt;/strong&gt; A reasoning-heavy model that takes 30 seconds per step will hit sandbox idle timeouts that a 2-second model never encounters. The right model for a TMF724 incident workflow may be the wrong model for a TMF639 topology traversal. Evaluate them together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Complex scenarios are the real test.&lt;/strong&gt; Overall pass rate flatters every model. The Complex tier — scenarios requiring proprietary logic — is where the real gap opens up, and where skills deliver their highest returns (+33–44pp).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Watch for skill interference.&lt;/strong&gt; High-baseline models can be hurt by skills on domains they already handle correctly. Design skills to add value at the capability boundary, not to duplicate what the model already knows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Skills Pack
&lt;/h2&gt;

&lt;p&gt;The 8 TM Forum skills I used in this evaluation are portable SKILL.md documents covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;billing-inquiry&lt;/strong&gt; — cross-referencing orders, catalog pricing, and inventory records&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;customer-onboarding&lt;/strong&gt; — multi-API activation across TMF629/620/622&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;incident-management&lt;/strong&gt; — SLA-weighted triage and dispatch ranking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;network-incident-assessment&lt;/strong&gt; — situational analysis across active incidents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;product-management&lt;/strong&gt; — catalog and order orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;service-assurance&lt;/strong&gt; — troubleshooting across customer and inventory APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tmf628-performance-manager&lt;/strong&gt; — KPI job creation and anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tmf639-topology-analysis&lt;/strong&gt; — root cause analysis with maintenance exclusion and SLA priority weighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want the full skills pack, connect with me on LinkedIn and I'll DM you the documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Generalist AI agents are capable. They can reason about telecom problems, use API tools, and produce operational outputs. But they lack the domain-specific knowledge — the exact API sequences, enumeration formats, and business rules — that production telecom operations require.&lt;/p&gt;

&lt;p&gt;Structured skills close that gap reliably and cost-effectively across every model tested. The hardest tasks show the biggest returns. And the findings around reasoning models — the sandbox fixation, the enumeration substitution, the prescription paradox — have direct implications for anyone deploying agentic AI in an API-heavy regulated environment.&lt;/p&gt;

&lt;p&gt;The question isn't whether your model can reason. It's whether it's reasoning too much.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full research paper and benchmark results: &lt;a href="https://arxiv.org/abs/2603.15372" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.15372&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/ivobrett" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/ivobrett&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;GitHub: &lt;a href="https://github.com/oidebrett" rel="noopener noreferrer"&gt;https://github.com/oidebrett&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>In this article, I share my journey of building an MCP Server to allow AI agents to control smart-home devices based on Matter, along with a step-by-step guide for you to do the same</title>
      <dc:creator>Ivo Brett</dc:creator>
      <pubDate>Thu, 27 Mar 2025 16:59:27 +0000</pubDate>
      <link>https://dev.to/mattercoder/in-this-article-i-share-my-journey-of-building-an-mcp-server-to-allow-ai-agents-to-control-4mpo</link>
      <guid>https://dev.to/mattercoder/in-this-article-i-share-my-journey-of-building-an-mcp-server-to-allow-ai-agents-to-control-4mpo</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/mattercoder" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2960227%2F6f4b2cea-93ab-4172-82cc-c417f2f3eb0e.png" alt="mattercoder"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/mattercoder/i-built-an-ai-agent-that-can-control-a-smart-home-and-you-can-too-152e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;I built an AI Agent that can control a smart-home and you can too.&lt;/h2&gt;
      &lt;h3&gt;Ivo Brett ・ Mar 27&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#iot&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#matter&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>iot</category>
      <category>programming</category>
      <category>matter</category>
    </item>
    <item>
      <title>I built an AI Agent that can control a smart-home and you can too.</title>
      <dc:creator>Ivo Brett</dc:creator>
      <pubDate>Thu, 27 Mar 2025 10:27:30 +0000</pubDate>
      <link>https://dev.to/mattercoder/i-built-an-ai-agent-that-can-control-a-smart-home-and-you-can-too-152e</link>
      <guid>https://dev.to/mattercoder/i-built-an-ai-agent-that-can-control-a-smart-home-and-you-can-too-152e</guid>
      <description>&lt;h2&gt;
  
  
  AI Agents + MCP + Matter = Amazing Opportunity
&lt;/h2&gt;

&lt;p&gt;The world of AI agents is evolving rapidly, with the ability to control real-world connected devices in smart homes. This advancement unlocks new opportunities for automation, enabling seamless interaction with physical environments through APIs.&lt;/p&gt;

&lt;p&gt;However, integrating AI Agents with smart home devices is challenging. If you’ve ever tried to control IoT-based devices programmatically, you know how frustrating it can be—dealing with protocols, writing low-level code, and debugging endless connection issues.&lt;/p&gt;

&lt;p&gt;That’s why I built the &lt;strong&gt;Matter-MCP Server&lt;/strong&gt;, an open-source tool that makes it ridiculously easy for AI Agents and AI assistants like Claude to control Matter devices using natural language. In this article, I’ll break down how it works, why it’s a game-changer for developers, and how you can set it up yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa75b0f0k56qypqgkrb1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa75b0f0k56qypqgkrb1a.png" alt="Matter Mcp Architecture" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Built the Matter-MCP Server
&lt;/h2&gt;

&lt;p&gt;When working with AI and IoT, I found myself constantly wrestling with connectivity complexity. Matter is an amazing protocol backed by industry giants, but it isn’t exactly built for AI-driven automation out of the box.&lt;/p&gt;

&lt;p&gt;One major hurdle is that devices must be addressed directly over the Matter protocol, which makes automation difficult, especially for developers working on AI-powered assistants.&lt;/p&gt;

&lt;p&gt;That’s where &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; comes in. The Matter-MCP Server acts as a bridge between AI models and Matter devices, allowing seamless communication through structured interfaces. Instead of writing raw protocol commands, you can now control your IoT setup with simple, human-readable requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Can It Do?
&lt;/h2&gt;

&lt;p&gt;The Matter-MCP Server allows AI to:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Commission new Matter devices&lt;/strong&gt; (without manual intervention!)&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Read and update device attributes&lt;/strong&gt; (e.g., check if a door is locked)&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Send commands in natural language&lt;/strong&gt; (e.g., “Turn off the smart plug”)&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Monitor device status in real-time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Search and access Matter protocol documentation dynamically&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By leveraging &lt;strong&gt;MCP&lt;/strong&gt;, it makes AI-driven home automation smoother and more intuitive than ever.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9wcefch06t4003evvw6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9wcefch06t4003evvw6.png" alt="MCP Flow" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Set It Up (Step-By-Step)
&lt;/h2&gt;

&lt;p&gt;Setting up the Matter-MCP Server is easy! Here’s how to do it in a few minutes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Clone the Repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/MatterCoder/matter-mcp-server.git
&lt;span class="nb"&gt;cd &lt;/span&gt;matter-mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2️⃣ Set Up a Python Virtual Environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate  &lt;span class="c"&gt;# On Windows: .venv\Scripts\activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3️⃣ Install Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4️⃣ Install UV (Required for AI Integration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;curl&lt;/code&gt; is unavailable, use &lt;code&gt;wget&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget &lt;span class="nt"&gt;-qO-&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Integrating with Claude (For AI-Powered Control)
&lt;/h2&gt;

&lt;p&gt;To enable Claude to interact with the Matter-MCP Server, configure your Claude desktop settings:&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Locate the &lt;code&gt;claude_desktop_config.json&lt;/code&gt; file
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ubuntu&lt;/strong&gt;: &lt;code&gt;~/.config/Claude&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MacOS&lt;/strong&gt;: &lt;code&gt;~/Library/Application Support/Claude&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows&lt;/strong&gt;: &lt;code&gt;%APPDATA%\Claude&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2️⃣ Add the MCP Server Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matter-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"[REPLACE_WITH_FULL_PATH_TO_YOUR_REPO]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"matter-mcp-server.py"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3️⃣ Restart Claude Desktop
&lt;/h3&gt;




&lt;h2&gt;
  
  
  Claude Code MCP Installation
&lt;/h2&gt;

&lt;p&gt;You can also install the Matter MCP server in Claude Code using the &lt;code&gt;claude mcp add&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add matter-mcp-server
uv &lt;span class="nt"&gt;--directory&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;REPLACE_WITH_FULL_PATH_TO_YOUR_REPO] run matter-mcp-server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtbfhya6r6k483gt6paj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtbfhya6r6k483gt6paj.png" alt="Claude Code MCP Setup" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Python Matter Server
&lt;/h2&gt;

&lt;p&gt;My MCP server builds on the Python Matter Server from the Open Home Foundation, which implements a Matter Controller Server over WebSockets using the official Matter (formerly CHIP) SDK.&lt;/p&gt;

&lt;p&gt;For running the server and/or client in your development environment, see the &lt;a href="https://github.com/home-assistant-libs/python-matter-server/blob/main/DEVELOPMENT.md" rel="noopener noreferrer"&gt;Development documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For running the Matter Server as a standalone Docker container, see the &lt;a href="https://github.com/home-assistant-libs/python-matter-server/blob/main/docs/docker.md" rel="noopener noreferrer"&gt;Docker instructions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr1hnn2x0alir6d92wn6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr1hnn2x0alir6d92wn6.png" alt="Python Matter Server" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing with a Virtual Matter Device
&lt;/h2&gt;

&lt;p&gt;A Matter Virtual Device (MVD) is a software-based emulator provided by Google that simulates Matter-compatible smart home devices for testing and development. It allows developers to validate device behavior without physical hardware. To set it up, follow the steps in the &lt;a href="https://developers.home.google.com/matter/tools/virtual-device" rel="noopener noreferrer"&gt;official MVD guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtx2paw4i3ai7xg49tmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtx2paw4i3ai7xg49tmm.png" alt="MVD Install" width="449" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging Your AI-Powered IoT Setup
&lt;/h2&gt;

&lt;p&gt;If you are having difficulties, you can check communication with the Python Matter Server WebSocket by running these sample scripts:&lt;/p&gt;

&lt;p&gt;📌 &lt;strong&gt;Commission a Device&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python samples/Commission_with_Code.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 &lt;strong&gt;Get Node Information&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python samples/Get_Node.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 &lt;strong&gt;Send Commands to Devices&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python samples/Send_a_command.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Expanding Your AI Agent’s Capabilities
&lt;/h2&gt;

&lt;p&gt;To make your AI assistant even smarter, you can add additional MCP servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matter-coder-search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"[REPLACE_WITH_FULL_PATH_TO_YOUR_REPO]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"matter-coder-search.py"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matter-datamodel-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"[REPLACE_WITH_FULL_PATH_TO_YOUR_REPO]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"matter-datamodel-mcp.py"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows your AI model to dynamically search for relevant Matter documentation, device commands, and attributes without manual intervention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Matter-MCP Server&lt;/strong&gt; is a &lt;strong&gt;game-changer&lt;/strong&gt; for AI-driven IoT automation. Instead of spending hours wrestling with protocols, developers can now integrate Matter devices with AI assistants in minutes.&lt;/p&gt;

&lt;p&gt;If you’re interested in making AI &lt;strong&gt;actually useful&lt;/strong&gt; in smart home automation, &lt;strong&gt;this is your tool.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;No complex coding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;No endless debugging.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Just AI-powered IoT magic.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  🚀 Ready to Build?
&lt;/h3&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/MatterCoder/matter-mcp-server" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; and start your AI-powered IoT journey today!&lt;/p&gt;

&lt;h3&gt;
  
  
  Need more info?
&lt;/h3&gt;

&lt;p&gt;Check out my YouTube video:&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/k1wj-ec1evE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you want a more comprehensive tutorial-style video, check out my &lt;a href="https://youtu.be/DNOUvUqoh3k?si=m9zR-Phdu97cBU7T" rel="noopener noreferrer"&gt;coding tutorial on YouTube&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this journey inspiring or have insights to share, feel free to connect and collaborate. Together, we can harness technology to create solutions that truly matter.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>iot</category>
      <category>programming</category>
      <category>matter</category>
    </item>
    <item>
      <title>Building an AI Agent Powered Elderly Care System: A Developer's Journey</title>
      <dc:creator>Ivo Brett</dc:creator>
      <pubDate>Wed, 26 Mar 2025 11:40:34 +0000</pubDate>
      <link>https://dev.to/mattercoder/building-an-ai-agent-powered-elderly-care-system-a-developers-journey-3g43</link>
      <guid>https://dev.to/mattercoder/building-an-ai-agent-powered-elderly-care-system-a-developers-journey-3g43</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As developers, we're always looking for meaningful projects that not only challenge our skills but also make a real difference. Imagine creating a system that helps elderly individuals live safely and comfortably in their own homes. That's exactly what I set out to do by integrating AI agents with smart home technology to develop a proactive elderly care monitoring system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea: Merging AI and Smart Homes for Elderly Care
&lt;/h2&gt;

&lt;p&gt;The concept was straightforward: utilize AI agents to monitor and assist elderly individuals within their homes, leveraging smart home devices to detect anomalies and provide timely interventions. With the rise of the &lt;strong&gt;Matter&lt;/strong&gt; protocol—a unifying standard for smart home devices—the timing was perfect to embark on this project.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4ty8ovnvbs4ly1yxoz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4ty8ovnvbs4ly1yxoz0.png" alt="System Architecture Diagram" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: System architecture illustrating the interactions between AI agents and Matter-compatible devices.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The system's architecture is built around an &lt;strong&gt;agentic framework&lt;/strong&gt;, consisting of multiple specialized AI agents working collaboratively:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Planning Agent&lt;/strong&gt;: Coordinates the workflow and triggers other agents as needed. This is built on OpenAI's GPT-4o frontier model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scanning Agent&lt;/strong&gt;: Monitors sensor logs for any unusual activities or anomalies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Messaging Agent&lt;/strong&gt;: Handles communication, sending alerts or notifications when necessary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensemble Agent&lt;/strong&gt;: Aggregates insights from various AI models through a voting mechanism to ensure accurate decision-making.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;strong&gt;Ensemble Agent&lt;/strong&gt; incorporates three distinct AI models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random Forest Agent&lt;/strong&gt;: Utilizes traditional machine learning techniques for anomaly detection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tabular Data Model Agent&lt;/strong&gt;: Combines AI and machine learning to analyze structured data effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontier Reasoning Model&lt;/strong&gt;: Applies logical reasoning to assess situations and make informed decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent processes sensor data independently, and the &lt;strong&gt;Ensemble Agent&lt;/strong&gt; consolidates their results through a voting system. The final decisions are stored in the &lt;strong&gt;agent memory&lt;/strong&gt;, facilitating continuous learning and improvement.&lt;/p&gt;
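&lt;p&gt;The voting step can be sketched in a few lines. This is a simplified illustration, not the project's actual implementation: the agent names and the tie-breaking rule (surface borderline cases as anomalies rather than ignore them) are assumptions for the example.&lt;/p&gt;

```python
from collections import Counter

def ensemble_vote(verdicts: dict[str, str]) -> str:
    """Majority vote across per-agent verdicts. Ties fall back to 'anomaly'
    so borderline cases are surfaced to a caregiver rather than dropped."""
    counts = Counter(verdicts.values())
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:  # tie between labels
        return "anomaly"
    return top_label

# Hypothetical verdicts from the three models described above:
verdicts = {
    "random_forest": "normal",
    "tabular_model": "anomaly",
    "frontier_reasoning": "anomaly",
}
ensemble_vote(verdicts)  # → "anomaly"
```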

&lt;h2&gt;
  
  
  Integration of Matter-Compatible Devices
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wugs3zhw1gm2odzlge9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wugs3zhw1gm2odzlge9.png" alt="Python Matter Server" width="485" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A significant aspect of this project was integrating &lt;strong&gt;Matter-compatible&lt;/strong&gt; smart home devices. Matter is an emerging protocol that ensures seamless interoperability between various smart devices, making it easier to create a cohesive and responsive home environment.&lt;/p&gt;

&lt;p&gt;By utilizing the &lt;strong&gt;Python Matter Server&lt;/strong&gt; from the Open Home Foundation, the system can interact with a wide range of Matter-supported devices, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Locks&lt;/strong&gt;: Ensuring doors are securely locked or unlocked as needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lights&lt;/strong&gt;: Adjusting lighting based on time of day or detected activity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wi-Fi Networks&lt;/strong&gt;: Monitoring connectivity to ensure seamless communication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electric Vehicle Chargers&lt;/strong&gt;: Managing charging schedules and monitoring usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TVs and Sensors&lt;/strong&gt;: Controlling entertainment systems and monitoring environmental conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This integration allows for real-time monitoring and control, enabling the AI agents to respond promptly to any detected anomalies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Collection and Analysis
&lt;/h2&gt;

&lt;p&gt;To gather and analyze data from these devices, I employed &lt;strong&gt;Matter Flow&lt;/strong&gt;, an open-source tool designed to produce comprehensive sensor logs. These logs serve as the foundation for the AI agents to detect patterns, identify anomalies, and make informed decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l40ieamksqp17nz4ep5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l40ieamksqp17nz4ep5.png" alt="Image description" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Collecting data from Matter devices using Matter Flow.&lt;/em&gt;&lt;/p&gt;
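&lt;p&gt;One concrete pattern the agents can look for in those logs is a long stretch with no sensor activity at all. The sketch below assumes a hypothetical log format of one JSON object per line with &lt;code&gt;time&lt;/code&gt; and &lt;code&gt;sensor&lt;/code&gt; keys; the real Matter Flow output may differ.&lt;/p&gt;

```python
import json
from datetime import datetime, timedelta

# Hypothetical log format: one JSON event per line with "time" and "sensor" keys.
LOG_LINES = [
    '{"time": "2025-03-26T08:00:00", "sensor": "kitchen_motion"}',
    '{"time": "2025-03-26T08:05:00", "sensor": "hall_motion"}',
    '{"time": "2025-03-26T14:30:00", "sensor": "hall_motion"}',
]

def inactivity_gaps(lines, max_gap=timedelta(hours=4)):
    """Return (start, end) pairs where no sensor fired for longer than max_gap."""
    times = sorted(datetime.fromisoformat(json.loads(l)["time"]) for l in lines)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]

inactivity_gaps(LOG_LINES)  # one gap: 08:05 → 14:30
```

A gap like this might mean nothing (an afternoon out) or something serious, which is exactly why the verdict goes to the ensemble rather than straight to an alert.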

&lt;h2&gt;
  
  
  Data Flow and Learning Process
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fks8i2k9zviecibqkibrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fks8i2k9zviecibqkibrf.png" alt="Agentic Flow" width="547" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Flowchart depicting the data collection, analysis, and learning process within the system.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A crucial component of the system is its ability to learn and adapt over time. By incorporating &lt;strong&gt;human feedback reinforcement learning&lt;/strong&gt;, the AI agents can refine their decision-making processes based on real-world interactions and caregiver input. This iterative learning approach ensures that the system becomes more accurate and reliable, ultimately providing better care for elderly individuals.&lt;/p&gt;
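&lt;p&gt;In its simplest form, caregiver feedback can be folded in as a per-agent vote weight that drifts toward agents whose alerts are confirmed. This is a deliberately simplified sketch of that idea, not the reinforcement learning loop used in the project.&lt;/p&gt;

```python
def update_weight(weight: float, confirmed: bool, lr: float = 0.1) -> float:
    """Nudge an agent's vote weight toward 1 when a caregiver confirms its alert
    and toward 0 on a false alarm (a simplified sketch, not RLHF proper)."""
    target = 1.0 if confirmed else 0.0
    return weight + lr * (target - weight)

# Two confirmed alerts followed by one false alarm:
w = 0.5
for feedback in [True, True, False]:
    w = update_weight(w, feedback)
```

Over many rounds of feedback, weights like this let the Ensemble Agent lean on whichever model has been most reliable for a given household.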

&lt;h2&gt;
  
  
  Dashboard and User Interface
&lt;/h2&gt;

&lt;p&gt;For caregivers and family members to monitor and interact with the system, I developed a comprehensive &lt;strong&gt;dashboard&lt;/strong&gt;. This interface displays real-time data, alerts, and system status, allowing users to stay informed and take action when necessary. The dashboard serves as a bridge between the AI agents and human caregivers, fostering a collaborative approach to elderly care.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcoip8ukhpqyxe0re8u6g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcoip8ukhpqyxe0re8u6g.gif" alt="Dashboard Animation" width="1024" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Shoutout: LLM Engineering Master AI Course
&lt;/h2&gt;

&lt;p&gt;I want to acknowledge &lt;strong&gt;Ed Donner&lt;/strong&gt; for his exceptional &lt;strong&gt;&lt;a href="https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/?srsltid=AfmBOoqDgYJT79bukw9zJ3CvOnAm5InN5yokRJu8sxdTGECP7Ais2Z04" rel="noopener noreferrer"&gt;LLM Engineering Master AI course&lt;/a&gt;&lt;/strong&gt;. This program provided me with the knowledge and skills to embark on this project, transforming me from a novice to a confident AI developer in just six weeks. If you're looking to deepen your understanding of AI and large language models, I highly recommend this course.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Developing this AI-powered elderly care system has been a rewarding journey, blending the realms of AI, smart home technology, and compassionate care. By leveraging Matter-compatible devices and an agentic framework, I've created a system that not only enhances the safety and well-being of elderly individuals but also empowers caregivers with valuable tools and insights.&lt;/p&gt;

&lt;p&gt;For developers interested in making a tangible impact, exploring the intersection of AI and healthcare presents a wealth of opportunities. Whether you're passionate about improving elderly care or eager to dive into smart home integrations, there's ample room to innovate and contribute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Need more info?
&lt;/h3&gt;

&lt;p&gt;Check out my YouTube video:&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/4JS2w3kIWt4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you want a more comprehensive tutorial-style video, check out my &lt;a href="https://youtu.be/DNOUvUqoh3k" rel="noopener noreferrer"&gt;coding tutorial on YouTube&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is a link to my &lt;a href="https://github.com/oidebrett/careagent" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this journey inspiring or have insights to share, feel free to connect and collaborate. Together, we can harness technology to create solutions that truly matter.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>iot</category>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
    </item>
  </channel>
</rss>
