DEV Community

Juan Torchia

Originally published at juanchi.dev

Cline in VS Code: I used it two weeks on a TypeScript project and this survived


Back in 2005, when the internet café closed at 11pm and the place was packed, there was no time to read docs. You had to diagnose, run a command, see what happened, correct. That shaped something in me: deep respect for tools that let you see exactly what they're about to do before they do it, and deep suspicion toward anything that acts without warning you.

When I started evaluating autonomous coding agents in 2024, that same instinct pushed me to look at the permission model before any speed benchmark. Cline was the first one where I spent more than an hour configuring limits before writing a single real instruction.

My thesis, before going into detail: autonomous coding agents are not all the same, and Cline has a permission model that makes it more controllable than other tools — but the devil is in how you configure those limits, not in the tool itself. If you install Cline with defaults and ask it to refactor a complex module, you're going to have a radically different experience than if you invest 30 minutes defining what it can and can't touch.

What follows is an analysis based on two weeks of active use on a real TypeScript project, documenting delegated tasks, mistakes made, and configuration decisions. It's not a benchmark. There are no invented numbers. It's judgment earned through craft.


Cline as an autonomous agent: what the official page says and what it doesn't

Cline is available on the VS Code Marketplace as an open source extension. The official description presents it as an autonomous agent that can read files, execute terminal commands, use a browser, and create or edit code, all from inside VS Code.

What the page says clearly: Cline operates with an approval loop by default. Each potentially destructive action — writing a file, executing a terminal command — asks for confirmation before proceeding. You can configure "auto-approve" mode for specific categories, but you start in safe mode.
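The following minimal TypeScript model makes that default concrete: an approval loop with per-category auto-approve. It sketches the idea and is not Cline's implementation. `ActionKind`, `AgentAction`, `ApprovalPolicy`, `shouldRun`, and the `confirm` callback are hypothetical names, and the three categories simplify what the extension actually distinguishes.

```typescript
// Hypothetical model of an approval loop; these names are not Cline's API.
type ActionKind = 'readFile' | 'editFile' | 'executeCommand'

interface AgentAction {
  kind: ActionKind
  description: string
}

interface ApprovalPolicy {
  // Categories the user explicitly opted into auto-approving.
  autoApprove: Set<ActionKind>
}

// An action runs if its category is auto-approved. Otherwise the human is
// asked through `confirm`, and a rejection means the action is skipped.
function shouldRun(
  action: AgentAction,
  policy: ApprovalPolicy,
  confirm: (action: AgentAction) => boolean,
): boolean {
  if (policy.autoApprove.has(action.kind)) {
    return true
  }
  return confirm(action)
}

// "Safe mode": nothing is auto-approved, so every action asks first.
const safeDefaults: ApprovalPolicy = { autoApprove: new Set<ActionKind>() }

// A looser policy that skips confirmation for reads only.
const autoApproveReads: ApprovalPolicy = {
  autoApprove: new Set<ActionKind>(['readFile']),
}
```

Notice that the safe path is the default. Widening autonomy means explicitly adding one category at a time to the set.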

What the page doesn't say: the quality of the output depends almost entirely on the model you connect. Cline is provider-agnostic: you can use Claude directly through the Anthropic API or through OpenRouter, GPT-4o, local models via Ollama, or whatever else you prefer. That flexibility is genuine, and it's a real advantage over tools that lock you into one provider. But it also means "using Cline" can be a completely different experience depending on the model you choose.

In the experiment I'm about to describe, I used claude-3-5-sonnet via the Anthropic API directly. I didn't use OpenRouter in this iteration because I wanted to isolate the model variable.


What tasks I delegated and how I structured them

The project: a TypeScript codebase with Express, Prisma, and PostgreSQL. Nothing experimental in the stack — in fact, I deliberately chose a project with a known stack so I could evaluate Cline's errors without confusing them with my own uncertainty about the technology.

I split tasks into three categories before starting:

Category A — Full delegation with review at the end:

  • Generating Zod types from existing Prisma schemas
  • Writing unit tests for already-implemented pure functions
  • Creating seed files with consistent test data

Category B — Delegation with intermediate checkpoints:

  • Refactoring a validation module with high coupling
  • Migrating Express endpoints to a cleaner router structure
  • Resolving TypeScript strict errors in specific files

Category C — Not delegated, monitored:

  • Any changes to the database schema
  • Modifications to authentication logic
  • Changes to infrastructure configuration files

I didn't pull this classification from any guide. I built it after the first 48 hours, when Cline did something I didn't expect: on a task that now sits in Category B, it resolved a type error by changing an import in a file I hadn't mentioned. The change was technically correct, but it pulled me completely out of context. Not a serious error, but a signal that "review at the end" doesn't work for tasks with lateral dependencies.

// Example of an instruction that worked well for Category A
// (generating a Zod schema from an existing Prisma model)

// Instruction to Cline:
// "Generate a Zod schema for the User model from the schema.prisma file.
// Only the file src/schemas/user.schema.ts.
// Do not modify any other file.
// Use z.string().uuid() for the id field."

// Expected and received result:
import { z } from 'zod'

export const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  nombre: z.string().min(1),
  creadoEn: z.coerce.date(),
  actualizadoEn: z.coerce.date(),
})

export type User = z.infer<typeof UserSchema>

The precision of the instruction matters more than the complexity of the task. That's the first thing I learned.
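One way to keep that precision consistent across tasks is to treat the instruction as a template with three parts: the task, the file boundary, and the extra constraints. This is a minimal sketch that assumes you write instructions as plain text. `ScopedInstruction` and `renderInstruction` are hypothetical helpers, not part of Cline. The usage at the end reconstructs the Zod instruction from the example above.

```typescript
// Hypothetical structure for a scoped instruction; not a Cline API.
interface ScopedInstruction {
  task: string
  targetFiles: string[]
  constraints: string[]
}

// Renders plain text that always states the file boundary explicitly,
// so the agent never has to infer its own scope.
function renderInstruction(spec: ScopedInstruction): string {
  if (spec.targetFiles.length === 0) {
    throw new Error('A scoped instruction needs at least one target file')
  }
  return [
    spec.task,
    `Only the file(s): ${spec.targetFiles.join(', ')}.`,
    'Do not modify any other file.',
    ...spec.constraints,
  ].join('\n')
}

// Reconstructs the Zod instruction from the example above.
const zodInstruction = renderInstruction({
  task: 'Generate a Zod schema for the User model from the schema.prisma file.',
  targetFiles: ['src/schemas/user.schema.ts'],
  constraints: ['Use z.string().uuid() for the id field.'],
})
console.log(zodInstruction)
```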


Where Cline screwed up — and what each error revealed

Error 1: Over-generalization of a local fix

I asked it to resolve a TypeScript error in a specific file. Cline fixed the error correctly, but also modified a shared type in a definitions file because "it was cleaner." Technically impeccable. Context completely lost on my end.

What it revealed: Cline reasons about the entire codebase, not the scope you give it. If you don't explicitly say "don't modify anything outside file X," it's going to explore laterally. This can be an advantage when you want it to find the real root cause; it's a problem when you want a surgical change.
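Stating the scope in the instruction helps, but it is still worth verifying after the fact. This minimal sketch compares the files the agent actually changed against the scope you gave it; the changed list could come from `git diff --name-only`, for example. `findOutOfScope`, the trailing-slash convention for directories, and the file paths are hypothetical.

```typescript
// Returns every changed path that falls outside the allowed scope.
// A scope entry ending in '/' is a directory prefix; any other entry
// must match the path exactly.
function findOutOfScope(changedFiles: string[], allowedScope: string[]): string[] {
  return changedFiles.filter(
    (file) =>
      !allowedScope.some((entry) =>
        entry.endsWith('/') ? file.startsWith(entry) : file === entry,
      ),
  )
}

// Hypothetical example: the instruction scoped one file, the agent touched two.
const outOfScope = findOutOfScope(
  ['src/services/orders.ts', 'src/types/shared.ts'],
  ['src/services/orders.ts'],
)
if (outOfScope.length > 0) {
  console.warn(`Review these out-of-scope edits: ${outOfScope.join(', ')}`)
}
```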

Error 2: Tests that passed but didn't test anything useful

In test generation tasks, Cline delivered files with 100% coverage whose tests executed the code without verifying its behavior: expect(fn()).toBeDefined() instead of expect(fn(input)).toEqual(expectedOutput). They passed. They contributed nothing.

What it revealed: the instruction "write tests for this function" is too open. You need to specify what edge cases you want covered, what behaviors are critical, and what level of assertion you expect. If you don't, Cline optimizes for coverage, not for utility.

// Vague instruction → tests that pass but are useless
// "Write tests for the calcularDescuento function"

// What it delivered (summarized):
describe('calcularDescuento', () => {
  it('should return a value', () => {
    // ← this tests nothing useful
    expect(calcularDescuento(100, 10)).toBeDefined()
  })
})

// Precise instruction → tests that actually matter
// "Write tests for calcularDescuento.
// Required cases:
// - 0% discount returns the original price unchanged
// - 100% discount returns 0
// - negative discount throws an Error with message 'Descuento inválido'
// - price 0 with any discount returns 0"

describe('calcularDescuento', () => {
  it('0% discount returns original price', () => {
    expect(calcularDescuento(100, 0)).toBe(100)
  })
  it('100% discount returns 0', () => {
    expect(calcularDescuento(100, 100)).toBe(0)
  })
  it('negative discount throws error', () => {
    expect(() => calcularDescuento(100, -5)).toThrow('Descuento inválido')
  })
  it('price 0 returns 0 regardless of discount', () => {
    expect(calcularDescuento(0, 50)).toBe(0)
  })
})
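The post doesn't show `calcularDescuento` itself. Below is one hypothetical implementation that satisfies all four required cases. It assumes a `(price, percentage)` signature inferred from the test calls. Rejecting percentages above 100 (and non-finite values) is an added assumption that neither instruction in the post specifies. That gap is itself instructive: edge-case decisions only surface once the cases are written down.

```typescript
// Hypothetical implementation: the (precio, porcentaje) signature is inferred
// from the test calls; rejecting values above 100 is an added assumption.
function calcularDescuento(precio: number, porcentaje: number): number {
  if (!Number.isFinite(porcentaje) || porcentaje < 0 || porcentaje > 100) {
    throw new Error('Descuento inválido')
  }
  return precio * (1 - porcentaje / 100)
}
```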

Error 3: Autonomy without checkpoints in long refactors

The most time-costly error. In a Category B refactor, Cline completed 12 editing steps before I reviewed the intermediate state. The final result was correct, but I disagreed with a design decision made in step 4, and rolling it back at that point took more time than discussing it upfront would have.

What it revealed: for tasks with more than 5 steps, the review loop needs to be explicit. You can tell Cline to pause and wait for your confirmation before moving on to the next phase, and it's worth doing.
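Here is the same idea expressed as a workflow: split the plan into phases and put an explicit gate after each one. This is a hypothetical sketch of the review loop, not a Cline feature or API. `Phase`, `runPhases`, and the `approve` callback are illustrative names. In practice the gate is you, reading the intermediate diff before telling the agent to continue.

```typescript
interface Phase {
  name: string
  steps: Array<() => void>
}

// Runs phases in order with a checkpoint after each one. Returns the phases
// that ran; if a checkpoint is rejected, that phase has already run and
// nothing after it does.
function runPhases(phases: Phase[], approve: (phaseName: string) => boolean): string[] {
  const completed: string[] = []
  for (const phase of phases) {
    for (const step of phase.steps) {
      step()
    }
    completed.push(phase.name)
    // A design decision you disagree with surfaces here, after one phase,
    // instead of after the whole plan.
    if (!approve(phase.name)) {
      break
    }
  }
  return completed
}
```

With 12 steps split into three phases of four, a disagreement in step 4 would surface at the first checkpoint, before steps 5 to 12 build on top of it.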


Cline vs Claude Code: autonomy vs cost, without romanticizing either

Claude Code (Anthropic's terminal tool) and Cline share the same base model when you configure Cline with Claude. The difference isn't in the model's intelligence — it's in the execution environment and the cost model.

Cline:

  • You live inside VS Code. The visual context of the codebase is available.
  • You pay per token via the Anthropic API (or whichever provider you use). The cost is proportional to how much context you send and how many actions the agent executes.
  • The permission model is granular and configurable. You can tell it exactly which directories it can touch.
  • Each conversation is a new session — no persistent memory between sessions without extra configuration.

Claude Code:

  • You operate from a terminal with Anthropic's own CLI.
  • It can be used through a Claude Pro subscription, which can make cost more predictable than per-token billing if you use a lot of context.
  • Git integration is smoother by design.
  • It builds codebase context by actively reading the filesystem.

My honest take: for point-editing workflows inside VS Code, Cline is more ergonomic. For tasks that cross many files with complex dependencies, Claude Code has an advantage in how it handles the context of the full conversation. They're not equivalent — they're tools with different strengths.

If you've read my posts on rate limiting in web applications or middleware patterns in Next.js, you know that tool choice always depends on the most expensive constraint in the system. Here that constraint is how much context you need to maintain between steps, and it determines which tool makes more sense.


Workflows I'd never hand over to it — and why

This is the most important section of the post, because the temptation to delegate everything is real and the cost of learning it the hard way is too.

1. Database schema changes
Cline can generate a Prisma migration. It can also get the migration direction wrong, or not account for existing data, or ignore foreign key constraints. The cost of an error here isn't "one bad file" — it's data. I don't give this control to any autonomous agent without full human review of the generated SQL.

If you want to see how I think about Prisma migrations with actual judgment, the post on Prisma 5 → 6 breaking changes has the framework I use.

2. Authentication and authorization logic
The model can generate functionally correct code with an attack surface you won't detect until someone exploits it. This is a domain where security judgment is non-negotiable and can't be delegated to a superficial review.

3. Refactors without prior tests
If you don't have tests covering the current behavior, you can't know if Cline broke something. This isn't a Cline problem — it's a problem with any change made without a safety net. But autonomous agents amplify the risk because the surface area of change is larger.

4. Architecture decisions
Cline can suggest an architecture. It can implement the one you ask for. It cannot evaluate business trade-offs, team context, or technical debt constraints that only you know. For reasoning through those decisions, I still prefer explicit deliberation — the kind of analysis that shows up in the post on digital identity architecture.
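For point 3, when no tests exist, a characterization check is a cheap safety net before delegating a refactor. Run the old and the new version over the same inputs and flag any input whose result or thrown error changed. This minimal sketch assumes pure functions whose results can be compared via `JSON.stringify`. `observe`, `findBehaviorChanges`, and both discount functions are hypothetical.

```typescript
// Either the serialized return value or the thrown error message.
type Outcome = { ok: true; value: string } | { ok: false; error: string }

// Runs one call and records what happened, so thrown errors are compared too.
function observe<A extends unknown[]>(fn: (...args: A) => unknown, args: A): Outcome {
  try {
    return { ok: true, value: JSON.stringify(fn(...args)) }
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) }
  }
}

// Returns the inputs for which the refactored version behaves differently.
function findBehaviorChanges<A extends unknown[]>(
  before: (...args: A) => unknown,
  after: (...args: A) => unknown,
  inputs: A[],
): A[] {
  return inputs.filter(
    (args) => JSON.stringify(observe(before, args)) !== JSON.stringify(observe(after, args)),
  )
}

// Hypothetical example: a "cleanup" that quietly starts rounding results.
const legacyDiscount = (price: number, pct: number): number => price - (price * pct) / 100
const refactoredDiscount = (price: number, pct: number): number =>
  Math.round(price * (1 - pct / 100))

const samples: Array<[number, number]> = [[100, 10], [99.99, 15], [0, 50]]
const changes = findBehaviorChanges(legacyDiscount, refactoredDiscount, samples)
```

The check flags only the fractional price. Exactly that kind of change passes a quick visual review.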


Decision checklist: when to use Cline, when not to

Before delegating a task to Cline, I run through this list mentally:

Green — delegate with a precise instruction:

  • [ ] The output is a new file with no lateral dependencies
  • [ ] There are existing tests covering the behavior you're about to change
  • [ ] The scope of the change is one isolated file or module
  • [ ] You can define the success criterion in one sentence

Yellow — delegate with explicit checkpoints:

  • [ ] The task has more than 5 sequential steps
  • [ ] The change touches more than 3 files
  • [ ] The result depends on a project-specific pattern that isn't documented
  • [ ] It's the first time Cline is working on that module

Red — don't delegate, use Cline only for an initial draft:

  • [ ] Any change to database schema or migrations
  • [ ] Authentication, authorization, or secrets handling logic
  • [ ] Changes to infrastructure configuration files (Docker, CI, environment variables)
  • [ ] Architecture decisions that affect multiple teams
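The checklist is simple enough to encode, which is also a quick way to check that the categories don't overlap. This is a hypothetical sketch under these assumptions:

- `TaskProfile` and its fields are illustrative names for the questions above.
- The path patterns for schema, auth, secrets, and infrastructure assume a typical Express + Prisma layout.
- Red always wins over yellow.
- Any task that fails a green condition drops to yellow.

```typescript
type Verdict = 'green' | 'yellow' | 'red'

// Illustrative answers to the checklist questions for one task.
interface TaskProfile {
  touchedFiles: string[]
  sequentialSteps: number
  createsNewFile: boolean
  hasCoveringTests: boolean
  successCriterionDefined: boolean
  undocumentedPattern: boolean
  firstTimeInModule: boolean
  crossTeamArchitecture: boolean
}

// Example "red" paths for an Express + Prisma repo; adjust to your layout.
const RED_PATHS: RegExp[] = [
  /^prisma\//, // schema and migrations
  /(^|\/)auth\//, // authentication and authorization
  /(^|\/)\.env/, // secrets and environment variables
  /(^|\/)Dockerfile$/, // infrastructure
  /^\.github\//, // CI workflows
]

function classifyTask(task: TaskProfile): Verdict {
  const touchesRedArea = task.touchedFiles.some((file) =>
    RED_PATHS.some((pattern) => pattern.test(file)),
  )
  // Red overrides everything: draft only, never delegate.
  if (touchesRedArea || task.crossTeamArchitecture) {
    return 'red'
  }
  const needsCheckpoints =
    task.sequentialSteps > 5 ||
    task.touchedFiles.length > 3 ||
    task.undocumentedPattern ||
    task.firstTimeInModule
  // Green needs a safety net (a brand-new file or covering tests)
  // and a success criterion you can state in one sentence.
  const hasSafetyNet = task.createsNewFile || task.hasCoveringTests
  if (needsCheckpoints || !hasSafetyNet || !task.successCriterionDefined) {
    return 'yellow'
  }
  return 'green'
}
```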

Limits of what you can conclude from this

I want to be straight about what this analysis doesn't prove:

  • It doesn't prove Cline is better or worse than other tools in absolute terms. The connected model changes everything.
  • There are no verifiable speed metrics here. "Faster than without an agent" is a perception, not a number.
  • The errors described are observable patterns, not bugs reproducible in every context. The same instruction in a different codebase can produce different results.
  • The real cost depends on how much context you send per session. There's no generally valid number without knowing the codebase size and usage frequency.

What you can conclude: permission configuration and instruction precision have more impact on output quality than whether you use Cline vs another comparable tool. That learning is transferable.


FAQ: frequently asked questions about Cline as a coding agent

Does Cline work with models other than Claude?
Yes. Cline is provider-agnostic — you can connect it with GPT-4o, OpenRouter models, Gemini via Google AI Studio, or local models via Ollama. Output quality varies with the model. The official VS Code Marketplace page documents supported providers.

How do you control which files Cline can touch?
Two main mechanisms: the .clinerules file (a file in the project root where you define agent behavior rules) and the default approval loop that shows you each action before executing it. In default mode, nothing executes without your explicit approval.

Does it make sense to use it if you're already using GitHub Copilot?
They're different tools. Copilot is intelligent autocomplete — it suggests as you type. Cline is an agent that executes complete tasks autonomously. They can coexist without conflict. The relevant question is whether you need full task delegation or inline assistance.

What happens to cost if you let the agent run on long tasks?
Cost scales with consumed tokens — both the input context and the generated output. On long tasks with many files in context, the spend can surprise you if you're not monitoring it. The practical recommendation: start with small tasks and measure cost per task before delegating large refactors.
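A rough way to do that measurement is to sum input and output tokens per request, priced per million tokens. This minimal sketch uses placeholder prices; check your provider's current pricing page. `estimateTaskCost` and `simulateGrowingContext` are hypothetical helpers. The second function models why long tasks surprise people. It makes the simplifying assumptions that every request resends the accumulated context and that no prompt caching applies.

```typescript
// Placeholder prices in USD per million tokens; not real provider rates.
interface Pricing {
  inputPerMillion: number
  outputPerMillion: number
}

// Token usage for one request the agent sends to the model.
interface RequestUsage {
  inputTokens: number
  outputTokens: number
}

// Total cost of a delegated task: the sum over all its requests.
function estimateTaskCost(requests: RequestUsage[], pricing: Pricing): number {
  return requests.reduce(
    (total, r) =>
      total +
      (r.inputTokens / 1_000_000) * pricing.inputPerMillion +
      (r.outputTokens / 1_000_000) * pricing.outputPerMillion,
    0,
  )
}

// Assumes each request resends the whole conversation so far (no caching):
// the model's reply and the tool result are appended before the next step.
function simulateGrowingContext(
  initialContext: number,
  steps: number,
  outputPerStep: number,
  toolResultPerStep: number,
): RequestUsage[] {
  const requests: RequestUsage[] = []
  let context = initialContext
  for (let i = 0; i < steps; i++) {
    requests.push({ inputTokens: context, outputTokens: outputPerStep })
    context += outputPerStep + toolResultPerStep
  }
  return requests
}

const placeholderPricing: Pricing = { inputPerMillion: 1, outputPerMillion: 5 }
const shortTask = estimateTaskCost(simulateGrowingContext(20_000, 3, 500, 2_000), placeholderPricing)
const longTask = estimateTaskCost(simulateGrowingContext(20_000, 12, 500, 2_000), placeholderPricing)
```

Under these assumptions, input per request grows linearly with each step, so total input cost grows roughly quadratically with the number of steps. This illustrates the shape of the curve; it is not a measurement of Cline.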

Is it viable in TypeScript with strict mode on?
Yes, and in my experience strict mode helps — compiler errors are clear signals that Cline can read and iterate on. If you want to know which strict mode flags impact production the most, the post on TypeScript strict mode and tsconfig is where to start.

How does Cline's autonomy model compare to Claude Code?
Cline gives you more granular control inside VS Code — you can approve action by action. Claude Code has smoother git integration and handles context better in long sessions with many files. For point editing inside the editor, Cline is more ergonomic. For tasks crossing many modules with a long conversation history, Claude Code has the edge.


My take after two weeks

Cline survived the experiment. It stays in my workflow for Category A tasks: precise boilerplate generation, types, seeds, and tests with explicit criteria. Everything else goes through checkpoints.

What I don't buy: the narrative that configuring an autonomous agent correctly is a five-minute job. It's not. The .clinerules, the task classification, the scope definition per instruction — that takes time and gets refined through error. If someone tells you they installed Cline and delegated everything without issues from day one, they either have a very simple codebase or they didn't review the output carefully enough.

What I do accept: for a software architect who already has formed technical judgment, Cline is a tool that multiplies speed in the right parts of the work — the parts that are repeatable, definable, and verifiable. The decisions that matter are still yours.

The concrete next step if you want to reproduce this: install Cline from the VS Code Marketplace, create a .clinerules file in the root of your TypeScript project with the directories the agent cannot touch, and start with a Category A task. Measure the cost of that session. Then scale.

