Different Ways to Integrate LLMs into Software Engineering

(Through the lens of GitHub's ecosystem)

Author's note: Over the coming weeks I will be posting more in-depth articles (with real examples) that gather and describe the concrete ways I have integrated LLMs into my own workflows and into our work as a consultancy. This article is the broad overview that lays the groundwork needed to apply those lessons meaningfully. Follow me for more, and share your own thoughts on these matters; we are still in the very early stages of what can fairly be called a paradigm shift.

Most days start the same way: open the laptop, run git status, try to move one useful thing forward.

Only a year or two ago, large language models (LLMs) were not part of that loop for many of us. Now they turn up in code completion, pull request reviews, release notes, and even in how we triage issues or interpret production telemetry.

Although the use of AI is spreading through our industry at breakneck speed, some things cannot be rushed. How we, as humans, settle into interacting with this new technology in a structured way is one of them. The goal of this article is to distinguish the ways in which we as developers can meaningfully use these systems, and to clarify how they can be introduced inside GitHub's ecosystem (GitHub.com, Copilot, Actions, and the many things wired into them).

Over time, four distinct integration modes have emerged in day-to-day work:

  1. Manual inference
  2. Copilot in the IDE
  3. Copilot as an agent
  4. Copilot in the platform and pipelines

Each mode fits different needs. The sections below unpack what they look like in practice.

1. Manual inference: AI as an external assistant

This is the simplest form of interaction: you send a prompt to an LLM, get a response, and decide what to do with it. The model has no direct access to your repository or any other information that you do not explicitly provide.

Typical development uses:

  • Paste a failing test and say "fix it".
  • Paste a long git diff and ask for help drafting a PR description.
  • Paste a GitHub Actions log and ask, "What is this job failing on?"

Key characteristics:

  • You initiate every interaction.
  • The output is only text. No files are edited and no PRs are opened automatically.
  • Zero implicit integration. Everything is copy/paste or a one-off CLI call.
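To make "a one-off CLI call" concrete, here is a minimal sketch in Python that grabs your staged diff and asks a hosted model for a draft PR description. The openai package, the model name, and the prompt are my assumptions rather than a recommendation; any chat client, or plain copy/paste into a browser, does the same job.

```python
# ask_for_pr_description.py -- a sketch, not a tool. Assumes the `openai` package
# is installed and OPENAI_API_KEY is set; swap in whatever client you actually use.
import subprocess
from openai import OpenAI

# Grab the staged diff exactly as you would otherwise paste it by hand.
diff = subprocess.run(
    ["git", "diff", "--staged"], capture_output=True, text=True, check=True
).stdout

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use any model you have access to
    messages=[
        {"role": "system", "content": "You write concise pull request descriptions."},
        {"role": "user", "content": f"Draft a PR description for this diff:\n\n{diff}"},
    ],
)
print(response.choices[0].message.content)
```

The point is not the script itself: nothing here touches the repository, and the model only ever sees what you choose to send it.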

Think of this as "AI as a smarter REPL" that lives outside your main workflow.

2. Copilot in your IDE: AI in the active development loop

In this mode the model moves into your editor. GitHub Copilot (or equivalent) runs where you write code and suggests completions based on the context in which you are working, such as the contents of the currently opened file, nearby files, clipboard contents, and relevant metadata.

Common cases:

  • You write a test name; Copilot suggests the full test body.
  • You write a function signature; Copilot suggests an implementation.
  • You add a comment like // handle pagination; Copilot sketches the loop.
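As a toy illustration of that last, comment-driven case, here is roughly what the loop feels like in Python. The "suggested" body is written by hand to show the shape; it is not a captured Copilot completion.

```python
# You type the name and the intent...
def fetch_all_items(client, url):
    # handle pagination: follow `next` links until the API stops returning them

    # ...and a completion along these lines appears, for you to accept, edit, or reject.
    items = []
    while url:
        response = client.get(url)
        response.raise_for_status()
        payload = response.json()
        items.extend(payload["items"])
        url = payload.get("next")  # assumes the API exposes a `next` link
    return items
```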

Key characteristics:

  • Inline suggestions. It participates in every keystroke loop: think → type → see suggestion → accept or reject.
  • You still gate every change. It cannot save, commit, or reformat the code on its own.
  • Context-aware but local. It works mostly at the file or small-scope project level.

This is pair programming where your partner never tires, but also never commits anything without explicit approval.

3. Copilot as an agent: goal-driven editing

Here the interaction changes from "help me with this code" to "help me with this task." Instead of asking for a single function, you ask for a change at the level of a story or a ticket:

  • "Add logging to the checkout flow and cover it with tests."
  • "Migrate this module to the new configuration API."
  • "Find usages of this deprecated function and replace them."

An agent typically:

  • Reads relevant parts of the repository.
  • Plans and applies edits across multiple files.
  • Runs tests or other commands if configured.
  • Opens or updates a PR with the result.
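Stripped to its skeleton, that loop looks something like the sketch below. To be clear, this is not how Copilot's coding agent is implemented; it is a hand-rolled stand-in to show the shape of an agent run, with the edit step deliberately stubbed out and the PR step left commented (gh is the GitHub CLI).

```python
# agent_loop.py -- a deliberately simplified stand-in for an agent run.
import subprocess

def run(cmd):
    """Run a command and return its captured output."""
    return subprocess.run(cmd, capture_output=True, text=True)

# 1. Read relevant parts of the repository.
files = run(["git", "grep", "-l", "deprecated_function"]).stdout.splitlines()

# 2. Plan and apply edits across multiple files (stubbed: a real agent generates these).
for path in files:
    print(f"would rewrite calls in {path}")

# 3. Run tests or other commands if configured.
result = run(["python", "-m", "pytest", "-q"])
print(result.stdout)

# 4. Open or update a PR with the result, gated by human review.
# run(["gh", "pr", "create", "--fill", "--draft"])  # left commented out on purpose
```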

Key characteristics:

  • You give it goals. It decides how to get there.
  • It operates at PR scope. This is no longer a single completion in a single file.
  • Your role shifts to stakeholder and reviewer. You assess the PR the way you would a junior colleague's work.

The value is obvious when the task is scattered and mechanical. The risk grows with the scope of the change and the criticality of the code.

4. Copilot in your platform and pipelines: AI as part of the environment

In this mode the integration moves off your local machine completely and into the system that surrounds your code. LLMs are invoked by GitHub events and CI/CD workflows rather than by you typing into a prompt box.

Examples:

  • Copilot reviewing PRs and leaving comments in the GitHub UI.
  • "Explain this file" or "Summarize this PR" buttons on GitHub.com.
  • GitHub Actions that generate release notes, triage issues, propose tests, open automated refactor PRs, or verify that documentation is still in sync with the code.

Key characteristics:

  • Event-driven. Triggers include "PR opened", "tag pushed", "workflow completed", "issue created".
  • Shared surface. Many developers see the results: comments, labels, generated artifacts, automated PRs, alerts.
  • Development infrastructure. Changes here affect teams and repositories, not just one developer's workflow.

At this point "using AI" looks less like a chatbot and more like automation for repetitive processes that require cognition and a degree of autonomy. When implemented well, it lightens everyone's load.
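As one concrete shape this can take, here is a hedged sketch of a comment-only step a workflow could run when a PR opens: fetch the diff, ask a model for a summary, and post it back as a comment. GITHUB_REPOSITORY is set automatically inside Actions; GITHUB_TOKEN, PR_NUMBER, OPENAI_API_KEY, and the model name are things the workflow would have to provide.

```python
# summarize_pr.py -- sketch of a comment-only pipeline integration, meant to be
# called from a GitHub Actions step when a pull request is opened.
import os
import requests
from openai import OpenAI

repo = os.environ["GITHUB_REPOSITORY"]   # e.g. "my-org/my-repo", set by Actions
pr_number = os.environ["PR_NUMBER"]      # assumed to be passed in by the workflow
gh_headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# Fetch the raw diff for the pull request.
diff = requests.get(
    f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
    headers={**gh_headers, "Accept": "application/vnd.github.diff"},
).text

# Ask the model for a summary.
summary = OpenAI().chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": f"Summarize this pull request diff for reviewers:\n\n{diff}"}],
).choices[0].message.content

# Post it back as a comment; the PR itself is untouched.
requests.post(
    f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
    headers=gh_headers,
    json={"body": f"**Automated summary (verify before trusting):**\n\n{summary}"},
)
```

Note what it does not do: no labels applied, no code changed, nothing merged. That restraint is what keeps the blast radius small.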

Axes for reasoning about LLM integration

Once the modes are clear, it helps to analyze them across a set of axes. Six that show up repeatedly in practice are:

  1. Position in the development loop.
  2. Initiative and autonomy.
  3. Blast radius and reversibility.
  4. Scope and quality of context.
  5. Feedback latency.
  6. Governance and auditability.

The following sections summarize each axis.

Axis 1: Position in the development loop

Where does the AI sit relative to your day-to-day feedback loop?

  • Manual inference — Outside. You alt-tab to a browser or run a CLI command. The model never acts directly on the repo.
  • Copilot in your IDE — Inside. Suggestions appear as you type and influence code shape continuously.
  • Copilot as an agent — Around. It wraps an entire task and lives at the change-request level.
  • Copilot in platform/pipelines — Outside and around. It watches events and acts as part of the operational environment.

The closer the model is to the inner loop, the more it shapes your code style and habits. The further out it is, the more it shapes process, tooling, and team coordination.

Axis 2: Initiative and autonomy

Two questions: who starts the interaction and how much can the system do without pulling a human into the loop?

  • Manual inference: Initiative is always yours; autonomy is zero.
  • Copilot in your IDE: Initiative is shared; autonomy is limited to editing text in your working copy.
  • Copilot as an agent: You state a goal; autonomy is medium-to-high within that goal (multi-file edits, tests, PRs).
  • Copilot in platform/pipelines: Initiative is driven by system events; autonomy is configurable via workflow policy.

As autonomy rises, you move from "tool" to "collaborator" to "production system," requiring explicit guardrails.

Axis 3: Blast radius and reversibility

When something goes wrong, how much can it break and how hard is it to undo?

  • Manual inference: Trivial blast radius and full reversibility—you can ignore bad advice.
  • Copilot in your IDE: Limited to your working copy; undo or reset is easy.
  • Copilot as an agent: Repository-level changes gated by PRs; reversal involves closing or reverting PRs.
  • Copilot in platform/pipelines: Org-level impact; rely on strong change management, approvals, and the ability to disable workflows quickly.

Use the strongest safety mechanisms where blast radius is largest.

Axis 4: Scope and quality of context

What context does the model reliably see when it makes a decision?

  • Manual inference: Only what you paste—precise but easy to omit important data.
  • Copilot in your IDE: Current file plus nearby code; strong local understanding, limited global awareness.
  • Copilot as an agent: Multi-file context guided by heuristics or explicit instructions.
  • Copilot in platform/pipelines: Workflow-defined context (commits, diffs, metadata, historical signals) with quality dependent on prompt engineering and data hygiene.

As you move toward agents and pipelines, the design of context becomes as important as model choice.

Axis 5: Feedback latency

How long between request and response, and what loop does that create?

  • Manual inference: Seconds; ad-hoc loop.
  • Copilot in your IDE: Sub-second; highly interactive coding loop.
  • Copilot as an agent: Seconds to minutes; task-level loop waiting on PRs or reports.
  • Copilot in platform/pipelines: Seconds to hours; process-level loop tied to CI, nightly jobs, or release cadence.

Prefer the shortest loop that solves the problem, and stay alert for modes that outgrow their usefulness.

Axis 6: Governance and auditability

How visible are the model's actions, and how easy is it to understand what changed and why?

  • Manual inference: Personal governance, no audit trail unless you create one.
  • Copilot in your IDE: Mostly personal; the code is visible but its origin is not.
  • Copilot as an agent: Team-level governance through PR rules, named accounts, and logs.
  • Copilot in platform/pipelines: Organizational governance with workflow definitions, logs, and explicit owners.

As LLMs move from convenience tools to delivery infrastructure, they require the same governance as any other production service.

Interpreting the data

| Mode | Position in loop | Initiative & autonomy | Blast radius & reversibility | Context scope | Feedback latency | Governance & audit |
| --- | --- | --- | --- | --- | --- | --- |
| Manual inference | Outside the main loop | Human-initiated, text-only | Trivial; ignore output | Whatever you paste | Seconds; ad-hoc | Personal, no audit trail |
| Copilot in IDE | Inside the keystroke loop | Shared prompts; low autonomy | Working copy; undo/reset | Current file plus nearby code | Sub-second | Team policy, implicit provenance |
| Copilot as agent | Around a full change request | Goal-driven; medium-high autonomy | Repository-level; PR-gated | Multi-file via heuristics | Seconds to minutes | Treat as a user with PR history |
| Copilot in platform/pipelines | Outside & around, via workflows | Event-driven; configurable autonomy | Org-level workflows; depends on controls | Workflow-fed (diffs, metadata, history) | Seconds to hours | Organizational governance & logs |

Laid out like this, a few observations stand out. The first is platform and pipelines: high autonomy, broad scope, and organization-wide impact make it a powerful addition to the toolbox, and also the mode that demands the most governance.

The next thing that draws attention is Copilot in the IDE, which is strong exactly where fully automated AI workflows are not: sub-second feedback, tight human control, and a blast radius confined to your working copy. It is important to know which mode to use in which circumstance.

Finally, manual inference is useful precisely because it scores low on everything except feedback speed. Sometimes staying in complete control, and deciding exactly what context the LLM does and does not see, is what solving the problem requires.


Adopting the modes step by step

Step 1: Start with manual inference

Ask, read, decide. This mode needs no integration and no approvals, and its blast radius is essentially zero; there is nothing to roll out beyond encouraging people to use it thoughtfully.

Step 2: Bring Copilot into the IDE

  • Make sure everyone understands: accepting a suggestion carries the same responsibility as typing it yourself.

Keep autonomy low and blast radius small. The main risks are style drift and subtle API misuse; code review, automated tests, and linters are usually enough to catch that.

Step 3: Experiment with agents on safe ground

Before you let an agent work on critical services, give it a playground.

  • Use internal tools, documentation repositories, or non-production branches.
  • Focus on tedious but low-risk tasks: mechanical refactors, test additions, documentation alignment.

Treat every agent PR like a junior developer's first PR: review thoroughly, leave comments, and iterate. You learn how well the agent handles your codebase and how much process overhead it introduces.

Step 4: Integrate into platform and pipelines with conservative defaults

Start with read-only or comment-only responsibilities:

  • PR summaries.
  • File explanations.
  • Issue triage suggestions.
  • Test suggestions attached as comments, not committed code.
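For the triage case, a comment-only helper could look like the sketch below. ISSUE_NUMBER and ISSUE_BODY are hypothetical inputs the workflow would pass in, and the label list is whatever your repository actually uses; the point is that the script suggests and a human decides.

```python
# triage_suggestion.py -- sketch of a read-only triage helper for new issues.
import os
import requests
from openai import OpenAI

repo = os.environ["GITHUB_REPOSITORY"]
issue_number = os.environ["ISSUE_NUMBER"]   # assumed: provided by the workflow
issue_body = os.environ["ISSUE_BODY"]       # assumed: provided by the workflow
labels = ["bug", "enhancement", "documentation", "question"]  # your real label set

suggestion = OpenAI().chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Suggest at most two labels from {labels} for this issue, "
                   f"with one line of reasoning:\n\n{issue_body}",
    }],
).choices[0].message.content

# Comment only: a human applies the labels, or ignores the suggestion.
requests.post(
    f"https://api.github.com/repos/{repo}/issues/{issue_number}/comments",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    json={"body": f"**Triage suggestion (not applied automatically):**\n\n{suggestion}"},
)
```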

Only after you are comfortable with the quality and the logs should you consider code-changing automations:

  • Bots that open PRs but never auto-merge.
  • Workflows that update generated files under strict constraints.

The higher the autonomy and blast radius, the more you rely on PRs, approvals, and audit trails.

If an automation does not genuinely reduce work once review and supervision are counted, pull it back one mode: from platform to agent, from agent to IDE support, or from IDE support back to manual inference.

Closing thoughts

This article was about naming the territory: four ways in which LLMs show up in our work, and a few axes for reasoning about where they belong. The next pieces will stay very concrete: making Copilot in the IDE actually pull its weight on real repositories (here's a taste of where we're going), where the limits are when you care about tests, design, and long-term maintainability, and GitHub Next's experimental Agentic Workflows, natural-language workflows that compile down to Actions and let AI agents respond to issues, shepherd changes, and run multi-step tasks in your pipeline.

With a shared vocabulary, we can treat AI-powered agents the same way we treat any other piece of infrastructure—understand the blast radius, design the feedback loops, and then see, in code, where they genuinely simplify the work. The playing field has changed and will continue to change rapidly; talking about these questions explicitly turns "AI in the stack" from a vague ambition into a design problem, and design problems are exactly what software engineers know how to tackle.
