LangSmith Fleet: Managing Agent Identity, Permissions, and Skills at Enterprise Scale
The LangChain team quietly shipped one of the most significant architectural changes to LangSmith in March 2026, and it wasn't a new model integration or a flashy UI overhaul. They renamed Agent Builder to Fleet—a seemingly cosmetic change that signals a fundamental shift in how enterprises should think about their AI agent portfolios. This isn't about building better individual agents anymore; it's about managing dozens or hundreds of agents as a coordinated organizational capability. If you're running more than a handful of agents in production, this is the infrastructure layer you didn't know you needed.
The timing is deliberate. LangChain's State of Agent Engineering report revealed a troubling gap: 89% of organizations have observability in place, but most lack centralized governance for their agent portfolios. Teams are spinning up agents in silos, duplicating prompt engineering effort, and losing track of which agents have access to what resources. Fleet directly addresses this governance vacuum by introducing three primitives that enterprise deployments desperately need: agent identity, role-based permissions, and reusable Skills.
With 300+ enterprise customers now processing over 15 billion traces, LangSmith has accumulated hard-won lessons about what breaks at scale. Fleet codifies these patterns into infrastructure that prevents the debugging nightmare of "which agent caused this production incident?" before it happens.
Fleet Architecture: Identity and Permissions Model
The core insight behind Fleet's design is deceptively simple: every agent in your organization needs a stable, auditable identity that persists across deployments, versions, and team ownership changes. This sounds obvious until you realize how most teams currently manage agents—as anonymous graph definitions deployed via CI/CD with no central registry.
Fleet's identity system assigns each agent a unique organizational identifier that ties together its definition, deployment history, performance metrics, and access patterns. This identity travels with the agent across environments. When your compliance checking agent runs in staging versus production, it's the same identity with environment-specific configuration, not two unrelated deployments you have to mentally correlate.
The permission model layers role-based access control on top of these identities. Fleet distinguishes between several permission levels: viewers can observe agent behavior and metrics, editors can modify prompts and tool configurations, deployers can push agents to production environments, and administrators can retire agents or transfer ownership. These map cleanly to organizational realities—your ML platform team shouldn't need to ask a Slack channel before deploying a new agent version, but they probably shouldn't be editing the legal team's contract review prompts without approval.
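The distinction matters in code as well as in org charts: deployers are not a superset of editors. A minimal sketch of that capability mapping, assuming role names from the article (the capability sets and `can_act` helper are illustrative, not Fleet's actual implementation):

```python
# Capability sets per role. Note "deployer" deliberately lacks "edit":
# the ML platform team can ship versions without touching prompts.
ROLE_CAPS = {
    "viewer":   {"view"},
    "editor":   {"view", "edit"},
    "deployer": {"view", "deploy"},
    "admin":    {"view", "edit", "deploy", "retire", "transfer"},
}

def can_act(grants: dict, principal_groups: list, capability: str) -> bool:
    """Check whether any of a user's groups holds a role granting `capability`."""
    return any(
        capability in ROLE_CAPS[grants[g]]
        for g in principal_groups
        if g in grants
    )
```

With this shape, a permission check is a set lookup rather than a rank comparison, which is what lets deploy rights and edit rights stay orthogonal.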
Sharing mechanisms enable cross-team agent reuse without the current anti-pattern of exporting agent definitions as JSON and importing them into another workspace. When a shared agent is updated, teams consuming it can see the update and choose to adopt it—version control semantics applied to agent definitions. This preserves provenance: you always know which team owns the canonical definition and what modifications downstream teams have made.
Integration with enterprise identity providers happens at the organization level. Fleet supports SSO and SAML authentication, which means your existing Okta groups or Azure AD roles can map directly to Fleet permissions. The compliance team's AD group gets viewer access to all agents; the ML platform team's group gets deployer access. No separate permission system to maintain.
The audit trail functionality captures agent lifecycle events comprehensively: creation timestamp and creator identity, every modification with diff visibility, deployment events across environments, permission grants and revocations. This isn't just for compliance checkbox exercises—it's the forensic evidence you need when a production agent starts behaving unexpectedly six weeks after someone made a "minor prompt tweak."
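An audit trail with those properties is, structurally, an append-only event log keyed by agent identity. A minimal sketch (event names and fields are assumptions modeled on the lifecycle events listed above, not Fleet's export schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    agent_id: str
    event_type: str  # e.g. "created", "modified", "deployed", "permission_granted"
    actor: str       # identity that performed the action
    detail: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class AuditTrail:
    """Append-only log: events can be recorded and queried, never mutated."""

    def __init__(self):
        self._events = []

    def record(self, event: AuditEvent) -> None:
        self._events.append(event)  # no update or delete path by design

    def history(self, agent_id: str) -> list:
        return [e for e in self._events if e.agent_id == agent_id]
```

The append-only property is what makes the log usable as forensic evidence: the "minor prompt tweak" six weeks ago is still there, with actor and timestamp attached.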
Skills: Equipping Agents with Reusable Capabilities
If identity and permissions form Fleet's governance layer, Skills represent its knowledge management layer. A Skill is a modular, shareable knowledge package that can be attached to any agent in your fleet. The March 2026 newsletter announced the first open-source Skills alongside the Fleet rename, establishing an ecosystem pattern that will likely expand significantly.
The architectural insight here addresses a real pain point: domain expertise shouldn't be trapped inside individual agent definitions. When your engineering team figures out the optimal way to interact with your internal APIs—the authentication dance, rate limiting patterns, error recovery logic—that knowledge currently lives in one agent's prompts. Every subsequent agent that needs API access must rediscover or copy-paste this expertise.
Skills separate domain expertise from agent orchestration logic. A "Company API Integration" Skill encapsulates authentication patterns, retry strategies, and response parsing conventions. A "Legal Document Formatting" Skill knows your organization's citation styles, confidentiality markings, and section numbering conventions. These Skills attach to agents without modifying the agent's core orchestration graph.
The skill attachment model supports both static and dynamic binding. Static attachment happens at agent definition time—you declare that your contract review agent always uses the Legal Document Formatting Skill. Dynamic attachment allows agents to discover and request Skills at runtime based on task requirements, though this requires more sophisticated capability negotiation that most teams won't need initially.
Version management for Skills solves the knowledge distribution problem. When your platform team improves the API Integration Skill—say, adding support for a new authentication method—that improvement propagates to all agents using the Skill. Teams can pin to specific Skill versions for stability or track the latest version for continuous improvement. The semantics mirror dependency management in traditional software development, which makes the mental model accessible to engineering teams.
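The pin-versus-track choice can be sketched as an npm-style tilde range: `~1.2.0` tracks patches within 1.2.x, while a bare version pins exactly. Fleet's actual constraint grammar is an assumption here; this shows the semantics the article describes:

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch

def satisfies(version: str, constraint: str) -> bool:
    """'~1.2.0' accepts any 1.2.x >= 1.2.0; a bare version requires an exact match."""
    if constraint.startswith("~"):
        base = parse(constraint[1:])
        v = parse(version)
        return v[:2] == base[:2] and v[2] >= base[2]
    return parse(version) == parse(constraint)
```

Teams wanting stability pin (`"1.2.0"`); teams wanting continuous improvement track (`"~1.2.0"`), and a Skill release bumping the minor version never propagates silently to either.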
The skill definition format establishes a standard structure that includes capability declarations (what the Skill enables), knowledge content (prompts, examples, patterns), tool bindings (if the Skill requires specific tool access), and configuration parameters (organization-specific customization points). This standardization means Skills can be shared not just within organizations but eventually across the emerging ecosystem of function calling improvements that the broader community is developing.
Hands-On: Code Walkthrough
Let's build a practical Fleet setup with multiple agents, custom Skills, and proper permission configuration. This example creates a document processing fleet for a legal department with shared compliance capabilities.
```python
# fleet_setup.py
# Requires: langsmith>=0.3.0, langchain>=0.4.0, langgraph>=0.3.0
from langsmith import Client
from langsmith.fleet import (
    Fleet,
    AgentIdentity,
    Skill,
    PermissionSet,
    SkillAttachment,
)
from langgraph.graph import StateGraph
from typing import TypedDict, Annotated
import operator

# Initialize the LangSmith client with Fleet capabilities
client = Client()
fleet = Fleet(client=client, organization_id="your-org-id")

# Define agent identity with comprehensive metadata
# This identity persists across deployments and environment changes
contract_reviewer_identity = AgentIdentity(
    name="contract-reviewer-v1",
    display_name="Contract Review Agent",
    owner_team="legal-ops",
    purpose="Reviews vendor contracts for compliance issues and risk factors",
    deployment_environments=["staging", "production"],
    tags=["legal", "compliance", "vendor-management"],
    # Metadata for organizational classification
    metadata={
        "cost_center": "LEGAL-001",
        "data_classification": "confidential",
        "review_frequency": "quarterly",
    },
)

# Register the identity with Fleet - this creates the audit trail
registered_identity = fleet.register_agent(contract_reviewer_identity)
print(f"Agent registered with ID: {registered_identity.fleet_id}")

# Define a reusable Skill for compliance checking
# Skills encapsulate domain expertise separate from orchestration
compliance_checking_skill = Skill(
    name="corporate-compliance-rules",
    version="1.2.0",
    description="Encapsulates corporate compliance requirements for contract review",
    # Knowledge content that will be injected into agent context
    knowledge_content="""
## Corporate Compliance Requirements

All vendor contracts must be reviewed against these criteria:

1. DATA HANDLING: Contracts involving PII must include:
   - Data processing addendum (DPA)
   - SOC 2 Type II certification requirement
   - Data residency clauses for EU customers (GDPR)
   - Breach notification timeline (max 72 hours)

2. FINANCIAL TERMS: Flag for legal review if:
   - Auto-renewal clauses exceed 1 year
   - Liability caps below $1M for critical services
   - Payment terms shorter than Net 30
   - Price escalation clauses above 5% annually

3. TERMINATION RIGHTS: Require:
   - Termination for convenience with 90-day notice
   - Immediate termination for material breach
   - Data return/deletion obligations post-termination

4. INDEMNIFICATION: Standard requirements:
   - Mutual indemnification for IP infringement
   - Vendor indemnification for data breaches caused by vendor
   - Carve-outs for gross negligence
""",
    # Configuration parameters that can be customized per-organization
    config_schema={
        "liability_threshold": {"type": "number", "default": 1000000},
        "auto_renewal_max_years": {"type": "number", "default": 1},
        "breach_notification_hours": {"type": "number", "default": 72},
    },
    # Tags for discoverability in the Skill registry
    tags=["legal", "compliance", "contracts", "vendor-management"],
)

# Register the Skill with Fleet
registered_skill = fleet.register_skill(compliance_checking_skill)
print(f"Skill registered with ID: {registered_skill.skill_id}")

# Define the agent's state structure
class ContractReviewState(TypedDict):
    contract_text: str
    compliance_issues: Annotated[list, operator.add]
    risk_score: int
    recommendation: str
    skill_context: dict  # Injected by Fleet at runtime

# Build the LangGraph workflow for contract review
def analyze_data_handling(state: ContractReviewState) -> dict:
    """Check contract against data handling requirements from Skill."""
    # Skill context is automatically injected by Fleet
    skill_rules = state.get("skill_context", {}).get("knowledge_content", "")
    # Your LLM call here would use skill_rules in the prompt
    # This is where domain expertise from the Skill gets applied
    issues = []
    # Simplified example - real implementation would use an LLM
    contract = state["contract_text"].lower()
    if "data processing addendum" not in contract and "dpa" not in contract:
        issues.append({
            "category": "DATA_HANDLING",
            "severity": "HIGH",
            "finding": "Missing Data Processing Addendum (DPA)",
            "remediation": "Request DPA from vendor before signing",
        })
    return {"compliance_issues": issues}

def analyze_financial_terms(state: ContractReviewState) -> dict:
    """Check financial terms against Skill-defined thresholds."""
    skill_config = state.get("skill_context", {}).get("config", {})
    liability_threshold = skill_config.get("liability_threshold", 1000000)
    issues = []
    # Implementation would parse contract and check against thresholds
    return {"compliance_issues": issues}

def calculate_risk_score(state: ContractReviewState) -> dict:
    """Aggregate issues into an overall risk score."""
    issues = state.get("compliance_issues", [])
    high_severity = sum(1 for i in issues if i.get("severity") == "HIGH")
    medium_severity = sum(1 for i in issues if i.get("severity") == "MEDIUM")
    # Risk score: 0-100, higher = more risk
    score = min(100, high_severity * 25 + medium_severity * 10)
    recommendation = "APPROVE" if score < 25 else "REVIEW" if score < 50 else "REJECT"
    return {"risk_score": score, "recommendation": recommendation}

# Construct the graph
workflow = StateGraph(ContractReviewState)
workflow.add_node("data_handling", analyze_data_handling)
workflow.add_node("financial_terms", analyze_financial_terms)
workflow.add_node("risk_calculation", calculate_risk_score)
workflow.add_edge("data_handling", "financial_terms")
workflow.add_edge("financial_terms", "risk_calculation")
workflow.set_entry_point("data_handling")
workflow.set_finish_point("risk_calculation")
graph = workflow.compile()

# Attach the Skill to the agent identity
# This creates the binding between agent and reusable knowledge
skill_attachment = SkillAttachment(
    agent_id=registered_identity.fleet_id,
    skill_id=registered_skill.skill_id,
    version_constraint="~1.2.0",  # Accept 1.2.x patches automatically
    config_overrides={
        "liability_threshold": 2000000,  # Org-specific override
    },
)
fleet.attach_skill(skill_attachment)
print("Skill attached to agent")

# Configure permissions for the agent
# RBAC model controls who can view, edit, deploy, or retire
permissions = PermissionSet(
    agent_id=registered_identity.fleet_id,
    rules=[
        # Legal ops team has full control
        {"principal": "group:legal-ops", "role": "admin"},
        # General counsel can view and deploy
        {"principal": "group:general-counsel", "role": "deployer"},
        # All legal staff can view metrics and outputs
        {"principal": "group:legal-all", "role": "viewer"},
        # ML platform team can deploy but not modify prompts
        {"principal": "group:ml-platform", "role": "deployer"},
        # Compliance team needs read access for audits
        {"principal": "group:compliance", "role": "viewer"},
    ],
)
fleet.set_permissions(permissions)
print("Permissions configured")

# Register the compiled graph with the agent identity
# This connects the LangGraph definition to the Fleet identity
fleet.register_graph(
    agent_id=registered_identity.fleet_id,
    graph=graph,
    environment="staging",
)

# Query the Fleet to verify setup
agents = fleet.list_agents(team="legal-ops")
for agent in agents:
    print(f"\nAgent: {agent.display_name}")
    print(f"  Fleet ID: {agent.fleet_id}")
    print(f"  Skills: {[s.name for s in fleet.get_agent_skills(agent.fleet_id)]}")
    print(f"  Permissions: {fleet.get_permissions(agent.fleet_id).summary()}")

# Example: Run the agent with Fleet context injection
result = fleet.invoke(
    agent_id=registered_identity.fleet_id,
    input={
        "contract_text": """
        VENDOR SERVICES AGREEMENT
        This agreement between Acme Corp and BigCloud Inc...
        Payment terms: Net 15
        Auto-renewal: 3 years
        Liability cap: $500,000
        """
    },
    environment="staging",
)

print("\nReview Result:")
print(f"  Risk Score: {result['risk_score']}")
print(f"  Recommendation: {result['recommendation']}")
print(f"  Issues Found: {len(result['compliance_issues'])}")
```
This implementation demonstrates Fleet's key capabilities: persistent agent identity, Skills as reusable knowledge packages, and permission configuration that maps to organizational structure. The skill_context injection pattern keeps domain expertise separate from orchestration logic while making it available where needed.
Observability and Governance at Fleet Scale
Fleet-level observability transforms how you monitor agent portfolios. Instead of drilling into individual agent dashboards and mentally aggregating patterns, Fleet provides organizational views that answer questions like "which agents had the highest error rates this week?" or "which teams are consuming the most LLM tokens?"
The aggregated metrics dashboards span invocation counts, error rates, latency distributions, and token consumption across your entire agent fleet. You can slice these metrics by team, deployment environment, or custom tags. When leadership asks "how much are we spending on AI agents in the legal department?", you can answer with actual data rather than estimates.
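Under the hood, that slicing is a roll-up of per-trace records by whatever dimension you pick. A sketch of the team-level aggregation (the trace record fields here are assumptions modeled on the article, not the actual LangSmith trace schema):

```python
from collections import defaultdict

def aggregate_by_team(traces: list) -> dict:
    """Roll per-trace records up into per-team invocation, error, and token totals."""
    summary = defaultdict(lambda: {"invocations": 0, "errors": 0, "tokens": 0})
    for t in traces:
        s = summary[t["team"]]
        s["invocations"] += 1
        s["errors"] += 1 if t.get("error") else 0
        s["tokens"] += t.get("total_tokens", 0)
    # Derive an error rate per team for dashboard display
    for s in summary.values():
        s["error_rate"] = s["errors"] / s["invocations"]
    return dict(summary)
```

Swap `t["team"]` for an environment or custom tag and the same roll-up answers the other slicing questions; multiply the token totals by your per-model pricing and it becomes the cost-attribution report described below.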
Polly, LangSmith's AI assistant that reached general availability alongside Fleet, can analyze fleet-wide patterns and suggest optimizations. This isn't just a chatbot interface to your metrics—Polly can identify correlations across agents that humans miss: "Three agents in the legal-ops team show similar latency spikes every Monday morning, likely correlated with the weekly contract upload batch job."
The correlation between agent identity and LangSmith traces enables per-agent performance drilling that was previously impossible. Every trace is automatically tagged with the Fleet identity, so you can see exactly how the contract-reviewer-v1 agent performed across all its invocations—not just one deployment, but the complete behavioral history. This longitudinal view reveals drift patterns: if an agent's error rate creeps up over weeks, you can correlate with Skill updates or permission changes that might explain the regression.
Compliance reporting leverages the audit trail to generate reports showing which agents accessed which tools and when. For regulated industries, this isn't optional—legal frameworks are struggling to keep pace with agentic AI, and organizations that can demonstrate clear governance have an advantage. Fleet's event export integrates with enterprise monitoring systems like Datadog and Splunk, feeding agent lifecycle events into existing security and compliance workflows.
Cost attribution by agent identity enables chargeback and budget planning at the organizational level. When the CFO asks why the AI infrastructure budget tripled, you can show exactly which teams and which agents drove the increase—and more importantly, what business value they delivered.
What This Means for Your Stack
If you're operating five or more agents in production, Fleet's identity system addresses a concrete debugging problem: the "which agent did this?" nightmare. When a customer reports unexpected behavior, you need to trace from the symptom back to a specific agent, version, and invocation. Without stable identities, this requires manual correlation across deployment logs, trace IDs, and team knowledge about what's running where.
Skills reduce duplicated prompt engineering effort across your organization. If you've ever watched multiple teams independently solve the same problem—formatting API responses correctly, handling authentication retries, structuring outputs for downstream systems—you've experienced the knowledge fragmentation that Skills address. The investment in creating a Skill pays dividends across every agent that attaches it.
The permission model becomes essential before you give non-engineering teams access to modify agent behavior. The trend toward democratized agent creation is accelerating, and without permission controls, you face either a security nightmare or a bottleneck where engineers review every change. Fleet's RBAC model lets you give marketing the ability to tweak their content agent's prompts without giving them access to the production deployment pipeline.
The migration path from Agent Builder is relatively smooth: existing agents automatically receive Fleet identities when you enable Fleet on your workspace. Skill extraction is manual—you'll need to identify reusable knowledge currently embedded in agent prompts and factor it into Skills. This is work worth doing regardless of Fleet, as it forces you to document tribal knowledge that currently exists only in specific prompts.
Evaluate Fleet alongside whatever agent registry solution you currently have (or more likely, don't have). Many teams are using internal wikis, spreadsheets, or Notion pages to track agents—fine for two or three agents, increasingly untenable as portfolios grow. Fleet provides a programmatic registry with API access, which enables automation that informal registries can't support.
Start with identity and permissions before investing heavily in Skills. Governance foundations enable safe skill sharing later; without them, Skills become another vector for uncontrolled changes propagating across your fleet. Get the audit trail and permission model in place first, then layer Skills on top once you've established who can modify what.
What to Build This Week
Project: Agent Fleet Inventory and Skill Extraction
Before you can benefit from Fleet, you need to understand what you're managing. This week's project is a comprehensive inventory of your current agent portfolio with identification of skill extraction candidates.
Start by cataloging every agent your organization runs: the ones in production, the ones in staging that "will ship soon," and the forgotten experiments still consuming compute in someone's sandbox. For each agent, document: owner team, purpose, tools it accesses, data it processes, current deployment status, and rough monthly cost.
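A record type for that catalog helps keep the fields consistent across teams. This is a suggested shape for your own readiness document, not a Fleet format; the field names are assumptions:

```python
from dataclasses import dataclass, asdict
import csv, io

@dataclass
class AgentInventoryRecord:
    name: str
    owner_team: str
    purpose: str
    tools: str             # comma-separated tool names
    data_processed: str    # e.g. "PII", "contracts", "public"
    status: str            # "production" | "staging" | "sandbox"
    monthly_cost_usd: float

def to_csv(records: list) -> str:
    """Serialize inventory records to CSV for the readiness document."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(asdict(records[0])))
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()
```

A CSV is deliberately low-tech: the goal this week is coverage, not tooling, and a flat file is easy to circulate for "did we miss any agents?" review.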
Next, analyze the prompts across these agents looking for duplicated expertise. Common patterns include: API interaction conventions, output formatting requirements, domain-specific terminology definitions, error handling approaches, and citation/attribution styles. These are your Skill extraction candidates.
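A first pass at spotting those candidates can be mechanical: find long phrases shared verbatim across multiple agents' prompts. Real duplication analysis would be fuzzier (paraphrases, reordered instructions), but this naive n-gram sketch surfaces the obvious copy-paste cases:

```python
from collections import defaultdict

def shared_phrases(prompts: dict, window: int = 8, min_agents: int = 2) -> dict:
    """Map each `window`-word phrase to the set of agents whose prompts contain it,
    keeping only phrases that appear in at least `min_agents` prompts."""
    seen = defaultdict(set)
    for agent, text in prompts.items():
        words = text.lower().split()
        for i in range(len(words) - window + 1):
            seen[" ".join(words[i:i + window])].add(agent)
    return {phrase: agents for phrase, agents in seen.items() if len(agents) >= min_agents}
```

Any phrase that shows up in three or more prompts is a strong Skill extraction candidate: it's knowledge your organization has already decided every agent needs.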
Create one Skill from the most duplicated pattern you find. Define its knowledge content, configuration parameters, and tags. Attach it to two existing agents and verify they behave consistently. This gives you hands-on experience with the Skill model before you scale.
Finally, draft an RBAC policy for your fleet. Which teams should be viewers, editors, deployers, or administrators for which agents? Map this to your existing identity provider groups. You don't need Fleet to implement the policy—having it documented means you're ready when Fleet permissions become available in your workspace.
The deliverable is a Fleet readiness document: agent inventory, three skill extraction candidates with rough definitions, and an RBAC policy ready for implementation. This preparation work ensures you can adopt Fleet deliberately rather than reactively when the tooling matures.
Sources
- March 2026: LangChain Newsletter
- LangChain Announces Enterprise Agentic AI Platform Built with NVIDIA
- On Agent Frameworks and Agent Observability
- State of Agent Engineering - LangChain
- 6 core capabilities to scale agent adoption in 2026 - Microsoft
- Mind The Gap: How The Technical Mechanism Of Agentic AI Outpace Global Legal Frameworks
- Data-Driven Function Calling Improvements in Large Language Models
This is part of the *Agentic Engineering Weekly* series — a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.
Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.
Building something agentic? Drop a comment — I'd love to feature reader projects.