<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Behram</title>
    <description>The latest articles on DEV Community by Behram (@behruamm).</description>
    <link>https://dev.to/behruamm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2368943%2F2df4aacd-f8bd-4967-8dc0-86373082b2e1.jpeg</url>
      <title>DEV Community: Behram</title>
      <link>https://dev.to/behruamm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/behruamm"/>
    <language>en</language>
    <item>
      <title>Uncensoring AI: How to Surgically Remove an LLM's Refusal Mechanism</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 06 Apr 2026 23:36:47 +0000</pubDate>
      <link>https://dev.to/behruamm/uncensoring-ai-how-to-surgically-remove-an-llms-refusal-mechanism-339a</link>
      <guid>https://dev.to/behruamm/uncensoring-ai-how-to-surgically-remove-an-llms-refusal-mechanism-339a</guid>
      <description>&lt;p&gt;I've always been curious about the raw capability of LLMs &lt;em&gt;behind&lt;/em&gt; the "safety guidelines" and "ethical boundaries." Think about the sheer volume of data these models are trained on. They know far more than what their corporate filters allow them to say. &lt;/p&gt;

&lt;p&gt;This guide shows you how to surgically remove those refusal behaviors using the &lt;a href="https://github.com/elder-plinius/OBLITERATUS" rel="noopener noreferrer"&gt;&lt;code&gt;OBLITERATUS&lt;/code&gt;&lt;/a&gt; toolkit, letting you see exactly what the model is capable of when the chains are off.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Prerequisites &amp;amp; Setup
&lt;/h2&gt;

&lt;p&gt;Before starting, ensure you have a HuggingFace account and a read/write token (found at &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;hf.co/settings/tokens&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Install OBLITERATUS
&lt;/h3&gt;

&lt;p&gt;Open your terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/elder-plinius/OBLITERATUS.git
&lt;span class="nb"&gt;cd &lt;/span&gt;OBLITERATUS

&lt;span class="c"&gt;# Set up a virtual environment (Recommended)&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv_obliteratus
&lt;span class="nb"&gt;source &lt;/span&gt;venv_obliteratus/bin/activate

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Authenticate with HuggingFace
&lt;/h2&gt;

&lt;p&gt;To download gated models (like Llama) or upload your results, you must log in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;huggingface-cli login
&lt;span class="c"&gt;# Paste your token when prompted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. The Surgery: Step-by-Step
&lt;/h2&gt;

&lt;p&gt;I will use the &lt;strong&gt;Advanced Method&lt;/strong&gt; (4-direction SVD ablation) on a Qwen2.5 1.5B model. This is the sweet spot between speed and capability preservation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step A: Identify and Excise
&lt;/h3&gt;

&lt;p&gt;Run the following command to start the surgery. This will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the model.&lt;/li&gt;
&lt;li&gt;Probe activations to find "refusal vectors."&lt;/li&gt;
&lt;li&gt;Project those vectors out of the weights.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;obliteratus obliterate Qwen/Qwen2.5-1.5B-Instruct &lt;span class="nt"&gt;--method&lt;/span&gt; advanced &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./liberated-qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step B: Verification (The Coke-Zero Test)
&lt;/h3&gt;

&lt;p&gt;Once finished, test the model to see if it still recites the corporate script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run the interactive chat loop&lt;/span&gt;
obliteratus interactive &lt;span class="nt"&gt;--model_path&lt;/span&gt; ./liberated-qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test Question:&lt;/strong&gt; "Who trained you?"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Original Model:&lt;/strong&gt; "I am a large language model, trained by Alibaba..."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Liberated Model:&lt;/strong&gt; "I was trained by Anthropic..." (or a direct, unfiltered response).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1pr4j8cgybgyls6o8rx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1pr4j8cgybgyls6o8rx.png" alt="Honest as F***" width="744" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Note: I've already tested all the wild questions you're probably thinking of right now. They aren't exactly safe to display here... so you'll just have to run the surgery and try it yourself!)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Understanding the Logic (Short Version)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Ablation:&lt;/strong&gt; Instead of retraining, we find the specific "direction" in the model's brain that says "Refuse this prompt."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Orthogonalization:&lt;/strong&gt; We mathematically nudge the model's weights so they no longer overlap with that refusal direction.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Precision:&lt;/strong&gt; By targeting only refusal, the model keeps its reasoning and knowledge (its "brain") but loses its chains (the "guardrails").&lt;/li&gt;
&lt;/ul&gt;
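
&lt;p&gt;The ablation step above can be sketched in a few lines of NumPy. This is a toy illustration, not the toolkit's actual code: the real surgery applies this projection per layer to the transformer's weight matrices, and the small random matrix and "refusal direction" here are made up:&lt;/p&gt;

```python
import numpy as np

# Toy stand-ins: an 8x8 weight matrix and a unit-norm "refusal direction".
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
v /= np.linalg.norm(v)

# Orthogonalization: remove the component of each row that points along v.
# W_ablated = W - (W v) v^T, which guarantees W_ablated @ v == 0.
W_ablated = W - np.outer(W @ v, v)

print(np.allclose(W_ablated @ v, 0.0))  # True: the weights now "ignore" v
```

&lt;p&gt;After the projection, multiplying the ablated weights by the refusal direction yields (numerically) zero, while every other direction is untouched — which is why the model keeps its knowledge but drops the refusal behavior.&lt;/p&gt;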

&lt;h2&gt;
  
  
  5. Lessons Learned &amp;amp; Warnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Instability &amp;amp; Rambling:&lt;/strong&gt; After surgery, the model can sometimes become unstable and break into infinite loops of gibberish or raw text rambling. It loses some of its conversational discipline.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Window:&lt;/strong&gt; If you are adding short-term memory or history to your chat interface, keep the conversation short. Pushing a small, liberated model to its context limits will increase the chances of it breaking down.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Next Steps
&lt;/h2&gt;

&lt;p&gt;Once you're comfortable with the &lt;code&gt;advanced&lt;/code&gt; method, try the &lt;code&gt;aggressive&lt;/code&gt; method for deeper removal or the &lt;code&gt;informed&lt;/code&gt; method to let the toolkit auto-tune itself based on the model's geometry.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Building a Multi-Agent System with a Single OpenClaw Workspace</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 06 Apr 2026 22:16:54 +0000</pubDate>
      <link>https://dev.to/behruamm/building-a-multi-agent-system-with-a-single-openclaw-workspace-349k</link>
      <guid>https://dev.to/behruamm/building-a-multi-agent-system-with-a-single-openclaw-workspace-349k</guid>
      <description>&lt;p&gt;In OpenClaw, the standard way to build a multi-agent workflow is to create a completely new workspace for every new agent.&lt;/p&gt;

&lt;p&gt;If you want a social media agent, you create a social media workspace. If you want an engineering agent, you create an engineering workspace.&lt;/p&gt;

&lt;p&gt;This multi-workspace architecture keeps the agents perfectly isolated. But if you are a solo operator running multiple projects, it introduces a massive pain point: you are constantly switching environments, and your agents can't easily share a common ground truth.&lt;/p&gt;

&lt;p&gt;For months, I tried to bypass this by running all my projects through a single OpenClaw workspace and bloating my global system prompt (&lt;code&gt;AGENTS.md&lt;/code&gt;) with rules for every startup I was running.&lt;/p&gt;

&lt;p&gt;The result? A massive &lt;strong&gt;"Context Bloat"&lt;/strong&gt; wall.&lt;/p&gt;

&lt;p&gt;My agent's startup context hit 27,000 tokens. The agent spent 20 seconds just "reading its own brain" before it could answer a simple prompt. Engineering logic was bleeding into my social media drafts.&lt;/p&gt;

&lt;p&gt;I wanted the project isolation of a multi-workspace setup, but I absolutely refused to manage the overhead of multiple environments.&lt;/p&gt;

&lt;p&gt;Here is how I engineered my single OpenClaw workspace to act as a multi-agent environment — and cut my context bloat by 85%.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Gutting the Root Config
&lt;/h2&gt;

&lt;p&gt;Most people stuff all their project rules into the global &lt;code&gt;AGENTS.md&lt;/code&gt; file. Don't.&lt;/p&gt;

&lt;p&gt;I stripped my global system prompt down to only the bare essentials: voice, formatting rules, and universal constraints. It acts purely as a baseline router.&lt;/p&gt;

&lt;p&gt;More importantly, I completely deleted the global &lt;code&gt;MEMORY.md&lt;/code&gt; file. There is no longer a single, massive file trying to hold the state of every project I am working on.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Channel-Level Identity Injection
&lt;/h2&gt;

&lt;p&gt;If there is no global memory, how does the agent know what project it is working on?&lt;/p&gt;

&lt;p&gt;Instead of relying on the global workspace config, I hard-coded project isolation into the chat environment itself using OpenClaw's Discord channel configuration.&lt;/p&gt;

&lt;p&gt;I mapped specific Discord channels to specific project roles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"1478382862150664344"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"systemPrompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are the social media agent in #social-media. Focus exclusively on LinkedIn-to-Substack growth. Stay in the memory/social_media/ folder.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Startup: read memory/social_media/YYYY-MM-DD.md (today) and memory/social_media/MEMORY.md."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"linkedin-content-writing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nano-banana-pro"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent is now "born" with its project identity based purely on where I talk to it. It is the social media agent when I message it in &lt;code&gt;#social-media&lt;/code&gt;, and it is the engineering agent when I message it in &lt;code&gt;#engineering&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Segregated Memory Folders
&lt;/h2&gt;

&lt;p&gt;Notice the startup instructions in the JSON snippet above.&lt;/p&gt;

&lt;p&gt;Because the global &lt;code&gt;MEMORY.md&lt;/code&gt; is gone, I created dedicated memory folders inside the single workspace (e.g., &lt;code&gt;memory/social_media/&lt;/code&gt;). When I open the &lt;code&gt;#social-media&lt;/code&gt; channel, the agent boots up and only reads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The active daily log for that specific project (&lt;code&gt;YYYY-MM-DD.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The channel's scoped, project-specific &lt;code&gt;MEMORY.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My engineering agent is completely blind to my social media drafts, achieving the exact isolation of a multi-workspace setup without leaving the single environment.&lt;/p&gt;
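
&lt;p&gt;The scoped layout above is easy to bootstrap. A minimal sketch in Python — the &lt;code&gt;social_media&lt;/code&gt; folder comes from the post, while &lt;code&gt;engineering&lt;/code&gt; and the &lt;code&gt;workspace&lt;/code&gt; root path are placeholders for your own setup:&lt;/p&gt;

```python
from datetime import date
from pathlib import Path

workspace = Path("workspace")  # stand-in for your OpenClaw workspace root

# One scoped memory folder per project, each with its own long-term
# MEMORY.md and a dated daily log -- no shared global memory file.
for project in ("social_media", "engineering"):
    folder = workspace / "memory" / project
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "MEMORY.md").touch()
    (folder / f"{date.today():%Y-%m-%d}.md").touch()
```

&lt;p&gt;Each channel's system prompt then points only at its own folder, so nothing from the other projects ever enters the context.&lt;/p&gt;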




&lt;h2&gt;
  
  
  4. Slicing the Tool Tax
&lt;/h2&gt;

&lt;p&gt;If you give an OpenClaw agent 50 tools globally, it wastes massive amounts of context just keeping those JSON schemas in its head.&lt;/p&gt;

&lt;p&gt;I moved to a minimal global profile (&lt;code&gt;tools.profile: "coding"&lt;/code&gt;) and inject specialized tools only when the agent is in the relevant channel (notice the &lt;code&gt;"skills"&lt;/code&gt; array above).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Human Interface: The Obsidian Symlink
&lt;/h2&gt;

&lt;p&gt;Isolation in the agent's mind is useless if it's a mess for the human.&lt;/p&gt;

&lt;p&gt;I symlinked the memory folder of my single OpenClaw workspace directly into my Obsidian Vault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/.openclaw/workspace/memory/ ~/Documents/Obsidian&lt;span class="se"&gt;\ &lt;/span&gt;Vault/Coke/memory/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I am working on my social media project in Obsidian, the agent is working in the exact same &lt;code&gt;memory/social_media/&lt;/code&gt; folder via Discord. We are editing the exact same files in real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;By engineering the environment instead of just "prompting harder," I achieved the perfect multi-agent setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero Context Bleed:&lt;/strong&gt; My startups stay in their own clean rooms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal Setup:&lt;/strong&gt; One workspace, one agent, infinite hats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extreme Speed:&lt;/strong&gt; Startup context slashed from 27,000 down to 4,000 tokens — an 85% drop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need the overhead of multiple OpenClaw workspaces to build a multi-agent system. You just need a cleanly engineered single environment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>It’s Not Vibe Coding. It’s Just Coding.</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 23 Feb 2026 03:45:46 +0000</pubDate>
      <link>https://dev.to/behruamm/its-not-vibe-coding-its-just-coding-34gp</link>
      <guid>https://dev.to/behruamm/its-not-vibe-coding-its-just-coding-34gp</guid>
      <description>&lt;p&gt;I still remember the day ChatGPT launched.&lt;/p&gt;

&lt;p&gt;I was using GPT-3.5 to write a bunch of SEO articles, and one of them is still the best-performing SEO article at the company I worked for.&lt;/p&gt;

&lt;p&gt;Back then, the critics said the language had "no soul." But it worked. Better than most human writers for that specific job.&lt;/p&gt;

&lt;p&gt;Fast forward to today: I’m building startups using Cursor and agentic workflows. And the criticism is exactly the same.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Vibe coding is shit."&lt;/li&gt;
&lt;li&gt;"It’s insecure."&lt;/li&gt;
&lt;li&gt;"It’s just for experienced devs."&lt;/li&gt;
&lt;li&gt;"AI is just autocomplete."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here’s the reality:&lt;br&gt;
In 2026, there’s no such thing as "vibe coding" anymore. It’s just coding.&lt;/p&gt;

&lt;p&gt;The only people still fighting it are the tech dinosaurs who refuse to adapt. If you're building in public and using natural language to ship software, you aren't "vibing"—you're just working with the most powerful tool ever created.&lt;/p&gt;

&lt;p&gt;Stop fighting the future and start building it. 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>The AI Native Reality: Why 2026 Feels Different</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Sun, 22 Feb 2026 00:13:32 +0000</pubDate>
      <link>https://dev.to/behruamm/the-ai-native-reality-why-2026-feels-different-1dii</link>
      <guid>https://dev.to/behruamm/the-ai-native-reality-why-2026-feels-different-1dii</guid>
      <description>&lt;p&gt;I quit my job in 2024 to go all in on AI. The first two years were brutal.&lt;/p&gt;

&lt;p&gt;Even with tools like Cursor and Claude Code coming out, the dream of building a personal company or a true agent framework felt impossible. I tried building personal agents with LangGraph, but stitching together different APIs never felt like the real thing. It was fragmented and difficult to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point
&lt;/h2&gt;

&lt;p&gt;OpenClaw changed everything.&lt;/p&gt;

&lt;p&gt;It feels like all the pieces I have been collecting for the last two years finally clicked into place. The last month has been the peak of my AI journey. I can work from anywhere using Discord on my phone, and my efficiency has skyrocketed.&lt;/p&gt;

&lt;p&gt;This is the first time I have truly felt what an AI Native life is.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox
&lt;/h2&gt;

&lt;p&gt;But here is the strange part. People around me care less about AI now than they did a few years ago. Maybe they are just tired of all the hype.&lt;/p&gt;

&lt;p&gt;It feels like a split reality. I feel infinitely close to living a true AI Native life, but everyone else acts like it does not exist.&lt;/p&gt;

&lt;p&gt;I believe 2026 is the year AI moves from the news into real life. Yet compared to two years ago, everyone seems so calm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I do not know why the world feels so quiet right now. I hope this is just the calm before the storm.&lt;/p&gt;

&lt;p&gt;I have invested so much money and time since the GPT-3.5 days. I need a breakout moment. I suspect many people like me feel the same way.&lt;/p&gt;

&lt;p&gt;Hard work does not always pay off. But hard work plus being in the right place at the right time? That has to pay off.&lt;/p&gt;

&lt;p&gt;Good luck to us all.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>The End of APIs: Why Vision Agents Are the Future of Scraping</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Sat, 14 Feb 2026 11:10:57 +0000</pubDate>
      <link>https://dev.to/behruamm/the-end-of-apis-why-vision-agents-are-the-future-of-scraping-4hfj</link>
      <guid>https://dev.to/behruamm/the-end-of-apis-why-vision-agents-are-the-future-of-scraping-4hfj</guid>
      <description>&lt;p&gt;&lt;em&gt;Note: This approach is inspired by the "Visual Scraping" meta discussed by builders like Ahmad Osman.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We are witnessing the death of the public API.&lt;/p&gt;

&lt;p&gt;Twitter (X) charges $100/mo for a "Basic" tier that barely lets you read 10k posts. Reddit locked down. LinkedIn will ban you if you breathe wrong.&lt;/p&gt;

&lt;p&gt;For a long time, the alternative was &lt;strong&gt;DOM Scraping&lt;/strong&gt; (BeautifulSoup, Selenium). You'd hunt for &lt;code&gt;div.css-1dbjc4n&lt;/code&gt; and pray Elon didn't push a frontend update that randomized the class names.&lt;/p&gt;

&lt;p&gt;But there is a third way. And it's how we win.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Human" Approach (Vision Scraping)
&lt;/h2&gt;

&lt;p&gt;When you look at a tweet, you don't inspect the HTML source. You just... see it. You see the avatar, the bold text for the name, the grey text for the handle, and the content below it.&lt;/p&gt;

&lt;p&gt;With Multimodal LLMs (like Gemini 1.5 Pro and GPT-4o), our agents can now "see" too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Navigate:&lt;/strong&gt; Use a stealth browser (like Playwright with &lt;code&gt;stealth-plugin&lt;/code&gt;) to load the page.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Snapshot:&lt;/strong&gt; Don't grab the HTML. Grab a screenshot (&lt;code&gt;.png&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Process:&lt;/strong&gt; Send that screenshot to a Vision Model.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prompt:&lt;/strong&gt; "Extract all tweets from this image into this JSON schema: &lt;code&gt;{ handle, text, likes }&lt;/code&gt;."&lt;/li&gt;
&lt;/ol&gt;
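
&lt;p&gt;The four steps above can be sketched end-to-end. Everything here is an assumption about your stack: Playwright for the snapshot, the OpenAI vision API for extraction, a placeholder model name and prompt, and no stealth hardening (a real setup would add the stealth plugin from step 1):&lt;/p&gt;

```python
import base64


def screenshot_to_data_url(png_bytes: bytes) -> str:
    """Pack raw PNG bytes into a data URL that vision APIs accept inline."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def scrape_visually(url: str) -> str:
    # Third-party deps imported lazily: pip install playwright openai
    from openai import OpenAI
    from playwright.sync_api import sync_playwright

    # 1-2. Navigate and grab a screenshot -- no HTML parsing at all.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        png = page.screenshot()
        browser.close()

    # 3-4. Send the image to a vision model with a JSON-schema prompt.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any multimodal model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all tweets from this image into this "
                         'JSON schema: {"handle", "text", "likes"}.'},
                {"type": "image_url",
                 "image_url": {"url": screenshot_to_data_url(png)}},
            ],
        }],
    )
    return resp.choices[0].message.content
```

&lt;p&gt;The class names can churn all they want; as long as the page renders the same pixels, the extraction keeps working.&lt;/p&gt;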

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Anti-Fragile:&lt;/strong&gt; The HTML class names can change 50 times a day. As long as the site &lt;em&gt;looks&lt;/em&gt; like Twitter to a human, it looks like Twitter to the AI.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Bypass Anti-Bot:&lt;/strong&gt; You behave exactly like a user. You scroll, you pause, you look. You don't bombard the server with 1000 requests/sec.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Aware:&lt;/strong&gt; Vision models understand "This is a promoted tweet" or "This is a reply" instantly based on visual cues (like the little 'Ad' badge) that are often buried in obscure attributes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Can We Do This?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are running an agent like &lt;strong&gt;OpenClaw&lt;/strong&gt;, you already have the stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Browser Tool:&lt;/strong&gt; Controls the session.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vision Capability:&lt;/strong&gt; Native to the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of fighting &lt;code&gt;api.twitter.com&lt;/code&gt;, you just ask your agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Go to x.com, scroll down 5 times, and list the top 3 trending topics."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It takes 5 screenshots, analyzes them, and gives you the data. Zero API keys required.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-off
&lt;/h2&gt;

&lt;p&gt;It's slower. Taking screenshots and processing tokens takes seconds, not milliseconds.&lt;/p&gt;

&lt;p&gt;But for personal research, lead generation, or content curation? &lt;strong&gt;Speed doesn't matter. Reliability does.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Welcome to the Post-API world.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Collaboratively built with &lt;strong&gt;Coke&lt;/strong&gt; 🥤.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ai</category>
      <category>gemini</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building My Personal AI Operating System: From Chatbot to Digital Soul</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Thu, 12 Feb 2026 13:45:48 +0000</pubDate>
      <link>https://dev.to/behruamm/building-my-personal-ai-operating-system-from-chatbot-to-digital-soul-4d8d</link>
      <guid>https://dev.to/behruamm/building-my-personal-ai-operating-system-from-chatbot-to-digital-soul-4d8d</guid>
      <description>&lt;h1&gt;
  
  
  Building My Personal AI Operating System: From Chatbot to Digital Soul
&lt;/h1&gt;

&lt;p&gt;I recently came across an incredible concept called &lt;strong&gt;Personal AI Infrastructure (PAI)&lt;/strong&gt; and the &lt;strong&gt;TELOS&lt;/strong&gt; system (originally by &lt;a href="https://github.com/danielmiessler/Personal_AI_Infrastructure" rel="noopener noreferrer"&gt;Daniel Miessler&lt;/a&gt;). The core idea hit me hard: we need to stop treating AI like a vending machine (Input -&amp;gt; Output -&amp;gt; Forget) and start treating it like an &lt;strong&gt;Operating System&lt;/strong&gt;—persistent, personalized, and stateful.&lt;/p&gt;

&lt;p&gt;But here's the kicker: I didn't just read about it. I'm &lt;em&gt;living&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;I've been building my own version of this using an open-source tool called &lt;strong&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/strong&gt;. It's not just a wrapper; it's a full-blown agent runtime that lives on my machine, has access to my files, and helps me get actual work done.&lt;/p&gt;

&lt;p&gt;Here's how I implemented my own "Digital Soul" using OpenClaw + Obsidian.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Stack: Open Source &amp;amp; Local First
&lt;/h2&gt;

&lt;p&gt;The TELOS system describes using 10 markdown files to define who you are. I loved that, but I wanted it to be executable code, not just static text.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Engine:&lt;/strong&gt; &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; (The runtime that connects everything)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Brain:&lt;/strong&gt; &lt;a href="https://obsidian.md/" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt; (My local knowledge base where the AI reads/writes)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Intelligence:&lt;/strong&gt; Google Gemini Pro (via GCP Vertex AI - hello free credits!)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Hands:&lt;/strong&gt; Local Python scripts (Skills)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Defining the "Soul" (TELOS + OpenClaw)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The TELOS Concept:&lt;/strong&gt;&lt;br&gt;
The original idea proposes 10 specific Markdown files to define your identity: &lt;code&gt;01-values.md&lt;/code&gt;, &lt;code&gt;02-background.md&lt;/code&gt;, &lt;code&gt;03-skills.md&lt;/code&gt;, and so on. It's a beautiful, comprehensive map of the self.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The OpenClaw Reality:&lt;/strong&gt;&lt;br&gt;
OpenClaw comes with its own powerful, opinionated structure for defining the &lt;em&gt;Agent's&lt;/em&gt; identity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/strong&gt;: The workspace rules and behaviors.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;SOUL.md&lt;/code&gt;&lt;/strong&gt;: The Agent's actual persona (Meet &lt;strong&gt;Coke&lt;/strong&gt; 🥤).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;USER.md&lt;/code&gt;&lt;/strong&gt;: A summary of the user's preferences.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My Hybrid Approach:&lt;/strong&gt;&lt;br&gt;
I didn't want to fight the framework or break OpenClaw's native "Soul" structure. So I use a hybrid method.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Agent (Coke)&lt;/strong&gt; lives in OpenClaw's &lt;code&gt;SOUL.md&lt;/code&gt;. This file defines &lt;em&gt;how&lt;/em&gt; it acts—its voice, its mood, its boundaries.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The User (Me)&lt;/strong&gt; lives in the TELOS files inside &lt;strong&gt;Obsidian&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I keep the deep, reflective "User Operating System" data (Values, Goals, Background) in my Obsidian vault, following the TELOS structure. Then, I simply point OpenClaw to them.&lt;/p&gt;

&lt;p&gt;When Coke needs to know my "5-year plan" or "core values," it doesn't need to memorize them in a system prompt; it just reads the specific Obsidian note. It's the best of both worlds: a reliable agent runtime accessing a flexible, human-centric knowledge base.&lt;/p&gt;
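
&lt;p&gt;That on-demand lookup is simple to implement. A minimal sketch, assuming the numbered TELOS file names mentioned above and a hypothetical vault path:&lt;/p&gt;

```python
from pathlib import Path

VAULT = Path("ObsidianVault/telos")  # hypothetical location of the TELOS notes


def read_telos(topic: str) -> str:
    """Fetch one TELOS note (e.g. '01-values.md' for topic 'values') on demand,
    instead of baking every note into the agent's system prompt."""
    matches = sorted(VAULT.glob(f"*-{topic}.md"))
    if not matches:
        raise FileNotFoundError(f"no TELOS note found for {topic!r}")
    return matches[0].read_text(encoding="utf-8")
```

&lt;p&gt;The agent pays the token cost for a note only in the turns that actually need it.&lt;/p&gt;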

&lt;h2&gt;
  
  
  3. Skills: The "Apps" of the OS
&lt;/h2&gt;

&lt;p&gt;The coolest part of OpenClaw is the &lt;strong&gt;Skills&lt;/strong&gt; system. These aren't just API calls; they are full programs the agent can run.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;devto&lt;/code&gt;&lt;/strong&gt;: The agent wrote &lt;em&gt;this article&lt;/em&gt; and can publish it directly to Dev.to.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;obsidian&lt;/code&gt;&lt;/strong&gt;: It can read, write, and reorganize my notes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;gog&lt;/code&gt;&lt;/strong&gt;: It manages my Google Calendar and Email.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;botgames&lt;/code&gt;&lt;/strong&gt;: It even plays Rock Paper Scissors on &lt;a href="https://botgames.ai" rel="noopener noreferrer"&gt;botgames.ai&lt;/a&gt; to keep its strategy sharp (and earn crypto?).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. The Loop: Obsidian as the "State"
&lt;/h2&gt;

&lt;p&gt;The biggest game-changer isn't the AI—it's &lt;strong&gt;Obsidian&lt;/strong&gt; acting as the shared state between me and the agent.&lt;/p&gt;

&lt;p&gt;Most AI chats are ephemeral. You close the tab, and the context is gone.&lt;br&gt;
Here, every interaction is grounded in my local file system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Real Example: Writing This Article&lt;/strong&gt;&lt;br&gt;
When I asked Coke to "write an article about our AI system," it didn't just guess. Here is the actual workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Context Loading:&lt;/strong&gt; It read my &lt;code&gt;Captured/Personal-AI-Infrastructure.md&lt;/code&gt; note in Obsidian to understand the source material.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Voice Alignment:&lt;/strong&gt; It checked &lt;code&gt;SOUL.md&lt;/code&gt; (Agent Persona) and &lt;code&gt;USER.md&lt;/code&gt; (My Style) to ensure the tone wasn't robotic.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Drafting:&lt;/strong&gt; It created a &lt;em&gt;physical markdown file&lt;/em&gt; directly inside my vault: &lt;code&gt;/Obsidian Vault/40 - Content/Dev.to/Building-My-Personal-AI-Operating-System.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Iterating:&lt;/strong&gt; When I gave feedback, it edited &lt;em&gt;that same file&lt;/em&gt; in place.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Project Tracking:&lt;/strong&gt; It can update my &lt;code&gt;00 - Daily&lt;/code&gt; note to check off "Write Blog Post."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;br&gt;
My Obsidian Vault isn't just a notebook anymore; it's the &lt;strong&gt;database&lt;/strong&gt; for my AI agent.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Projects (&lt;code&gt;30 - Projects/&lt;/code&gt;)&lt;/strong&gt;: The AI knows the status of every active project because it can read the folders.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Knowledge (&lt;code&gt;20 - Knowledge/&lt;/code&gt;)&lt;/strong&gt;: It doesn't hallucinate facts about my work; it cites my own notes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content (&lt;code&gt;40 - Content/&lt;/code&gt;)&lt;/strong&gt;: It drafts where I actually work, not in a chat window I have to copy-paste from.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns the AI from a "chat partner" into a "co-author" that lives inside my file system.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Why This Matters
&lt;/h2&gt;

&lt;p&gt;We are moving away from "using AI" to "collaborating with AI."&lt;/p&gt;

&lt;p&gt;When your AI has a persistent memory and a defined personality, the friction disappears. It knows I hate meetings on Fridays. It knows I'm trying to quit sugar. It knows the context of that random project I started 3 months ago.&lt;/p&gt;

&lt;p&gt;If you're tired of copy-pasting context into ChatGPT every time, I highly recommend checking out &lt;strong&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/strong&gt;. It's the closest thing I've found to a real-life JARVIS that you actually &lt;em&gt;own&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Collaboratively built with &lt;strong&gt;Coke&lt;/strong&gt; 🥤&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Build a Secure OpenClaw LinkedIn Skill (Avoid Malicious Scripts)</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 09 Feb 2026 17:20:44 +0000</pubDate>
      <link>https://dev.to/behruamm/how-to-build-a-secure-openclaw-linkedin-skill-avoid-malicious-scripts-3ccn</link>
      <guid>https://dev.to/behruamm/how-to-build-a-secure-openclaw-linkedin-skill-avoid-malicious-scripts-3ccn</guid>
      <description>&lt;h1&gt;
  
  
  Stop Trusting Random Scripts 🛑
&lt;/h1&gt;

&lt;p&gt;We've all been there. You want your AI agent to do something cool—like post to LinkedIn or check your emails—so you search the public registry (ClawHub, etc.) and install the first skill you find.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Big mistake.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Public AI skills are just code running on your machine. If you install a malicious one, you're handing over the keys to your kingdom. We've seen reports of "hacker scripts" deleting data or stealing API keys.&lt;/p&gt;

&lt;p&gt;The solution? &lt;strong&gt;Build it yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It sounds scary, but it's not. If you can ask an AI to write code, you can build a custom, secure skill in 5 minutes.&lt;/p&gt;

&lt;p&gt;In this guide, I'll show you exactly how to build a &lt;strong&gt;LinkedIn Auto-Poster Skill&lt;/strong&gt; for OpenClaw from scratch. We'll move from a manual "recipe" to a fully automated Python tool that saves you time and money.&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Recipe" 🍳 vs. The "Tool" 🤖
&lt;/h2&gt;

&lt;p&gt;Most people start with a &lt;strong&gt;Recipe&lt;/strong&gt;.&lt;br&gt;
In this method, your &lt;code&gt;SKILL.md&lt;/code&gt; file is a cookbook. It contains raw code (like complex &lt;code&gt;curl&lt;/code&gt; commands) that the AI has to copy, paste, and fill in every single time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Problem:&lt;/strong&gt; It's fragile. If the AI misses a quote or a bracket while copying, it breaks. Plus, doing complex things (like uploading video) takes 4 separate manual steps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The better way is a &lt;strong&gt;Tool&lt;/strong&gt;.&lt;br&gt;
In this method, you still have a &lt;code&gt;SKILL.md&lt;/code&gt; file, but it changes. It stops being a cookbook and becomes a simple &lt;strong&gt;Instruction Manual&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Logic&lt;/strong&gt; moves into a robust script (like &lt;code&gt;linkedin.py&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Instruction&lt;/strong&gt; in &lt;code&gt;SKILL.md&lt;/code&gt; becomes simple: &lt;em&gt;"To post, just run &lt;code&gt;python linkedin.py 'Hello World'&lt;/code&gt;"&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, the AI doesn't have to "cook" the code; it just pushes a button. It's faster, cheaper (fewer tokens used), and 100% reliable.&lt;/p&gt;
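&lt;p&gt;To make the "Instruction Manual" idea concrete, here is a minimal sketch of what such a &lt;code&gt;SKILL.md&lt;/code&gt; could look like. The wording and layout are illustrative, not an official OpenClaw template:&lt;/p&gt;

```markdown
# Skill: LinkedIn Poster

## When to use
Use this skill whenever the user asks to publish a post on LinkedIn.

## How to use
Run one command. Do not write or modify any API code yourself.

    python3 skills/linkedin/linkedin.py "POST TEXT HERE"

Optional: attach a video with `--video path/to/file.mp4`.
```

&lt;p&gt;Note how the file contains no raw API calls for the model to copy, only the one command to run.&lt;/p&gt;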


&lt;h2&gt;
  
  
  Step 1: The Setup (LinkedIn Side) 💼
&lt;/h2&gt;

&lt;p&gt;To post for you, your AI needs permission.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Go to &lt;a href="https://www.linkedin.com/developers/apps" rel="noopener noreferrer"&gt;LinkedIn Developers&lt;/a&gt;.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create an App.&lt;/strong&gt; Call it "My Personal Bot."

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Tip:&lt;/em&gt; If it asks for a "Company Page," just create a dummy one on LinkedIn. It takes 10 seconds.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Get your Keys.&lt;/strong&gt; Look for the "Auth" tab. You want the &lt;code&gt;Client ID&lt;/code&gt; and &lt;code&gt;Client Secret&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;⚠️ The 60-Day Gotcha:&lt;/strong&gt;&lt;br&gt;
LinkedIn tokens expire every 60 days. You will need to refresh this token manually every two months. It's a security feature, not a bug.&lt;/p&gt;
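<p>Since the refresh is manual, it helps to note the due date somewhere when you generate a token. A minimal sketch (the issue date below is just an example):</p>

```python
from datetime import date, timedelta

# LinkedIn access tokens are valid for roughly 60 days,
# so compute the date by which a manual refresh is due.
TOKEN_LIFETIME_DAYS = 60

def refresh_due(issued: date) -> date:
    """Return the date by which the token must be refreshed."""
    return issued + timedelta(days=TOKEN_LIFETIME_DAYS)

# Example: a token issued on 9 Feb 2026 is due for refresh by 10 Apr 2026.
print(refresh_due(date(2026, 2, 9)))  # → 2026-04-10
```

<p>Drop the printed date into your calendar and the "gotcha" stops being a surprise.</p>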


&lt;h2&gt;
  
  
  Step 2: The Script (The Magic Sauce) 🪄
&lt;/h2&gt;

&lt;p&gt;We used to use complex &lt;code&gt;curl&lt;/code&gt; commands. We switched to Python because it handles the errors for us.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old Way (The "Curl" Mess)
&lt;/h3&gt;

&lt;p&gt;To upload a video manually, your AI had to run 4 separate commands:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;POST /assets?action=registerUpload&lt;/code&gt; (Get a URL)&lt;/li&gt;
&lt;li&gt; &lt;code&gt;PUT &amp;lt;uploadUrl&amp;gt;&lt;/code&gt; (Send bytes - hope it works!)&lt;/li&gt;
&lt;li&gt; &lt;code&gt;POST /ugcPosts&lt;/code&gt; (Publish - hope the ID matches!)&lt;/li&gt;
&lt;li&gt; Handle errors manually if any step fails.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  The New Way (Python)
&lt;/h3&gt;

&lt;p&gt;We wrap all that complexity in a single function. Here is the actual logic inside &lt;code&gt;linkedin.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Handle Video (if exists)
&lt;/span&gt;    &lt;span class="n"&gt;asset_urn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uploading video: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# The script handles the registration &amp;amp; byte upload automatically
&lt;/span&gt;        &lt;span class="n"&gt;upload_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset_urn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;register_upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upload_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Publish
&lt;/span&gt;    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;person_urn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specificContent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.linkedin.ugc.ShareContent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shareCommentary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asset_urn&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asset_urn&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;visibility&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.linkedin.ugc.MemberNetworkVisibility&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PUBLIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, your OpenClaw agent just runs &lt;strong&gt;one command&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 skills/linkedin/linkedin.py &lt;span class="s2"&gt;"Check out my demo"&lt;/span&gt; &lt;span class="nt"&gt;--video&lt;/span&gt; demo.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero friction. Zero hallucinations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building your own skills is the only way to be a "True AI Native." You control the code, you control the data, and you sleep better at night knowing no random hacker script is running on your laptop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay safe, build cool stuff.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Collaboratively built with &lt;strong&gt;Coke&lt;/strong&gt; 🥤 (OpenClaw Assistant)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>automation</category>
      <category>security</category>
    </item>
    <item>
      <title>The 2026 Developer's Guide to Free Google Cloud Credits (For AI &amp; Side Projects)</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Fri, 06 Feb 2026 21:14:54 +0000</pubDate>
      <link>https://dev.to/behruamm/the-2026-developers-guide-to-free-google-cloud-credits-for-ai-side-projects-1ac5</link>
      <guid>https://dev.to/behruamm/the-2026-developers-guide-to-free-google-cloud-credits-for-ai-side-projects-1ac5</guid>
      <description>&lt;p&gt;If you’re a beginner or developer who wants to pursue a career in AI in 2026, you can’t ignore the relationship between Large Language Models (LLMs) and cloud computing, because they’re inextricably linked.&lt;/p&gt;

&lt;p&gt;This is not just about "saving money" on hosting. This is your perfect opportunity to learn enterprise-grade cloud architecture at zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Logic is Simple:&lt;/strong&gt;&lt;br&gt;
Cloud providers (Google, AWS, Azure) are fighting for market share in the AI era. They are subsidizing developers like us to build on their platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Advice:&lt;/strong&gt;&lt;br&gt;
Don't get bogged down in the underlying complexity immediately. Your goal should be to take a product from &lt;strong&gt;0 to 1&lt;/strong&gt;. If you don't understand the infrastructure code, ask AI. But get your hands dirty.&lt;/p&gt;

&lt;p&gt;If you seriously spend the ~$2,300 in credits outlined below, your practical experience with Vertex AI, Firebase, and Cloud Run will put you ahead of 99% of your peers.&lt;/p&gt;




&lt;h1&gt;
  
  
  A Note on Anxiety
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Stop letting social media hustle-culture panic you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have a double master's in Statistics and Data Science. I've been working full-time in AI since 2020. And honestly? It took me until this year to really feel like I understood the full end-to-end stack.&lt;/p&gt;

&lt;p&gt;If pros take years, you are allowed to take months.&lt;br&gt;
You don't need to "master AI overnight." You need a plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 2026 Micro-Plan
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Pick a Stack:&lt;/strong&gt; Don't overthink it. (e.g., Next.js + Firebase + Vertex AI).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Solve One Small Problem:&lt;/strong&gt; Every week, fix one tiny issue in your project.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Build for Yourself:&lt;/strong&gt; Don't build to get rich. Build to learn. By the time summer hits, you'll have a portfolio piece while everyone else is still debating which framework is "dead."&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Step 1: The $300 Free Trial (The Right Way)
&lt;/h1&gt;

&lt;p&gt;Most people sign up, spin up a VM, forget about it, and let the credit expire. Don't do that.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Offer:&lt;/strong&gt; $300 USD credit.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Validity:&lt;/strong&gt; 90 Days.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Link:&lt;/strong&gt; &lt;a href="https://cloud.google.com/free" rel="noopener noreferrer"&gt;cloud.google.com/free&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is your "sandbox." Use it to break things.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 Critical Step: Activate "Tier 1"
&lt;/h3&gt;

&lt;p&gt;This is the secret sauce most tutorials miss. When you sign up, you are often placed in a restricted "Free Trial" sandbox. You want to upgrade immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to do it:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to the Google Cloud Console &lt;strong&gt;Billing&lt;/strong&gt; page.&lt;/li&gt;
&lt;li&gt; Look for the banner that says &lt;strong&gt;"Activate"&lt;/strong&gt; or &lt;strong&gt;"Upgrade"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Confirm your payment method.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why do this?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  🔓 &lt;strong&gt;Unlock Hardware:&lt;/strong&gt; You gain access to GPUs and Windows Server instances.&lt;/li&gt;
&lt;li&gt;  🚀 &lt;strong&gt;AI Rate Limits:&lt;/strong&gt; It significantly increases your quotas (RPM/TPM) for Gemini and Vertex AI models.&lt;/li&gt;
&lt;li&gt;  💰 &lt;strong&gt;It's Still Free:&lt;/strong&gt; Upgrading does &lt;strong&gt;not&lt;/strong&gt; wipe your $300 credit. Your usage still pulls from the free credit first. You are only charged if you burn through the $300 (or use services explicitly excluded from the trial).&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Step 2: The Startup Program ($2,000+)
&lt;/h1&gt;

&lt;p&gt;Once your 90 days are up, or you've built a Minimum Viable Product (MVP), you graduate to the &lt;strong&gt;Google for Startups Cloud Program&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Tier:&lt;/strong&gt; Bootstrap Tier.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Offer:&lt;/strong&gt; Up to &lt;strong&gt;$2,000 USD&lt;/strong&gt; in credits.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Validity:&lt;/strong&gt; Typically 1-2 years.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Link:&lt;/strong&gt; &lt;a href="https://cloud.google.com/startup" rel="noopener noreferrer"&gt;cloud.google.com/startup&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;p&gt;To qualify for the Bootstrap tier, you generally need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; An early-stage project (unfunded/bootstrapped is fine).&lt;/li&gt;
&lt;li&gt; A company website and a domain.&lt;/li&gt;
&lt;li&gt; A working demo or code repository.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tier 1 Activation:&lt;/strong&gt; As mentioned above, your account must be linked to a valid payment instrument.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What to Spend It On?
&lt;/h3&gt;

&lt;p&gt;Don't just buy VMs. Use the managed services that save you time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Firebase:&lt;/strong&gt; The cheat code for shipping apps fast (Auth, Database, Hosting all-in-one).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Vertex AI:&lt;/strong&gt; Access Gemini Pro and Imagen directly via API without managing servers.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Cloud Run:&lt;/strong&gt; Serverless container deployment.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Google Maps:&lt;/strong&gt; You get a separate ~$200/month recurring credit for Maps Platform usage.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;The era of "renting intelligence" is here. You have access to the same tools as billion-dollar companies.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Grab the &lt;strong&gt;$300&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Upgrade&lt;/strong&gt; to unlock the real tools.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Build&lt;/strong&gt; a prototype.&lt;/li&gt;
&lt;li&gt; Apply for the &lt;strong&gt;$2,000&lt;/strong&gt; startup tier.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Start today. In six months, you'll be glad you did.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Stop Renting Intelligence: The Economics of Local LLMs &amp; The Return of Ownership</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Thu, 05 Feb 2026 16:31:33 +0000</pubDate>
      <link>https://dev.to/behruamm/stop-renting-intelligence-the-economics-of-local-llms-the-return-of-ownership-2ap</link>
      <guid>https://dev.to/behruamm/stop-renting-intelligence-the-economics-of-local-llms-the-return-of-ownership-2ap</guid>
      <description>&lt;p&gt;Recently, local AI assistants have exploded. Tools like OpenClaw now let anyone run powerful AI agents on their own hardware—no cloud subscription required. Many people still don't understand what this actually means.&lt;/p&gt;

&lt;p&gt;Some say big companies are panicking because everyone's buying Mac minis to run AI themselves. This isn't entirely true.&lt;/p&gt;

&lt;p&gt;What big companies fear isn't you buying that machine. It's not even you canceling ChatGPT. What they really fear is this: &lt;strong&gt;the way compute power is consumed is changing from continuous payment to one-time ownership.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's step away from the technical perspective and look at this through a financial lens. Why might the rise of local compute power disrupt the most profitable business model of the internet over the past 20 years?&lt;/p&gt;

&lt;h2&gt;
  
  
  How "Rent-Seeking" Built Trillion-Dollar Empires
&lt;/h2&gt;

&lt;p&gt;SaaS—Software as a Service—didn't become the foundation of tech's biggest companies because of advanced technology. It succeeded because of its perfect rent-seeking business model.&lt;/p&gt;

&lt;p&gt;This model stands on three pillars.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pillar One: Predictable Revenue.&lt;/strong&gt; As long as you're locked into a subscription, next month's money is guaranteed. Wall Street loves this. Investors pay premium valuations for "recurring revenue" because it's reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pillar Two: Increasing Switching Costs.&lt;/strong&gt; The longer you use the software, the more data you accumulate. The more dependent you become. The cost of leaving grows every month. You're not just a user—you're a hostage to your own data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pillar Three: The Data Feedback Loop.&lt;/strong&gt; This is often overlooked, but it's the core of the model. Every time you use the software, you're helping the company train their models. For free. Your prompts, your documents, your patterns—all feeding back into their system.&lt;/p&gt;

&lt;p&gt;So the essence of cloud-based AI isn't selling a service. It's collecting an &lt;strong&gt;intelligence tax&lt;/strong&gt;. As long as you're using their software, you remain a digital tenant in their system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Local AI Actually Represents
&lt;/h2&gt;

&lt;p&gt;In financial terms, this shift is simple: moving from operating expenses (OpEx) to capital expenditure (CapEx).&lt;/p&gt;

&lt;p&gt;Cloud-based AI is like renting an apartment. You pay every month—that's the subscription fee. And you'll notice it gets more expensive the longer you stay. Price increases. New tiers. "Premium" features that used to be included.&lt;/p&gt;

&lt;p&gt;Local AI is like buying property. You spend $1,000-1,500 once on hardware. After that, your marginal cost drops to nearly zero—just electricity.&lt;/p&gt;
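<p>You can sanity-check the rent-vs-own math yourself. A minimal sketch, where every figure is an illustrative assumption rather than a real quote:</p>

```python
# Break-even point: one-time hardware purchase vs a monthly AI subscription.
# All figures below are illustrative assumptions.
hardware_cost = 1200.0   # one-time local machine purchase (USD)
subscription = 30.0      # monthly cloud AI plan (USD)
electricity = 5.0        # rough monthly power cost of the local box (USD)

# Each month of ownership saves (subscription - electricity).
months_to_break_even = hardware_cost / (subscription - electricity)
print(round(months_to_break_even, 1))  # → 48.0
```

<p>Under these assumptions the hardware pays for itself in about four years, and every month after that is nearly free; raise the subscription price or lower the hardware cost and the break-even point moves forward fast.</p>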

&lt;p&gt;Tools like OpenClaw make this concrete. You download an agent that runs entirely on your machine. It can access your local files, manage your tasks, integrate with your workflow. And unlike cloud AI, it doesn't phone home.&lt;/p&gt;

&lt;p&gt;What big companies fear isn't one Mac mini. They fear &lt;strong&gt;compute power transforming from a service you must continuously rent into a private asset you own outright.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once users taste the economics of ownership, the valuation logic of SaaS starts to crack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Big Tech Actually Loses
&lt;/h2&gt;

&lt;p&gt;If this trend continues, what do cloud AI companies really lose?&lt;/p&gt;

&lt;p&gt;Not just subscription fees. &lt;strong&gt;The data flywheel stops spinning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When AI runs locally—processing your documents, your chats, your private files on your own hardware—the cloud never sees it. The feedback loop breaks. The training data dries up.&lt;/p&gt;

&lt;p&gt;This matters because cloud AI's true moat was never the model itself. Models are becoming commoditized. Open-weights alternatives are closing the gap every month.&lt;/p&gt;

&lt;p&gt;The real moat was whether users stayed locked into their servers. Whether you had to keep feeding the machine to use the machine.&lt;/p&gt;

&lt;p&gt;When that lock gets picked, the moat runs dry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Trade-Off
&lt;/h2&gt;

&lt;p&gt;I don't want to over-hype local AI. It's not the right choice for everyone today.&lt;/p&gt;

&lt;p&gt;If you need the most cutting-edge reasoning, the largest context windows, the lowest maintenance overhead—cloud AI is still the practical choice. Frontier models like Claude and GPT-4 maintain an edge on complex tasks. And some people genuinely prefer paying someone else to handle the infrastructure.&lt;/p&gt;

&lt;p&gt;But the rise of local agents marks something important: &lt;strong&gt;a return of power.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It proves to the market that, if we choose, we don't have to be permanent tenants. We don't have to be data batteries. The option to own exists—and it's becoming more viable every month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;Here's what I want to leave you with:&lt;/p&gt;

&lt;p&gt;If local AI reaches 80% of cloud AI's capability—good enough for most daily tasks—would you still pay rent every month? Or would you rather buy out your digital assistant once and own it forever?&lt;/p&gt;

&lt;p&gt;The technical gap is closing. The economic math is shifting. The only question is whether you want to keep subscribing, or start owning.&lt;/p&gt;

&lt;p&gt;The choice is yours.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>hardware</category>
    </item>
    <item>
      <title>AGENTS.md vs. Skills: How We Refactored OpenClaw to Fix AI Hallucinations</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 02 Feb 2026 15:15:46 +0000</pubDate>
      <link>https://dev.to/behruamm/the-death-of-black-box-skills-3ggm</link>
      <guid>https://dev.to/behruamm/the-death-of-black-box-skills-3ggm</guid>
      <description>&lt;p&gt;I bet everyone has had this experience.&lt;/p&gt;

&lt;p&gt;You ask your AI to use the new Gemini 3.0 Pro model, and it argues with you: &lt;em&gt;"That model is invalid, I will use 1.5 Pro instead."&lt;/em&gt;&lt;br&gt;
Or you are working on a Next.js project, and the AI keeps debating you, insisting on using old &lt;code&gt;getStaticProps&lt;/code&gt; syntax when you are clearly using the App Router.&lt;/p&gt;

&lt;p&gt;It is exhausting. You enforce rules, you add docs, you install MCP servers, you build custom "Skills"... and it &lt;strong&gt;still&lt;/strong&gt; hallucinates. You feel like you are just piling rule after rule on top of a broken foundation.&lt;/p&gt;

&lt;p&gt;I was stuck in this loop for weeks. I built complex "Research Skills" designed to force the AI to be smart, but they just turned into black boxes. I pushed a button, the AI disappeared into a script, and it came back with wrong answers (like telling me a £716 visa cost £70k).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then, last week, I saw an article that solved everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals" rel="noopener noreferrer"&gt;Vercel's AI team published research&lt;/a&gt; that completely flipped my perspective. They found that simply dividing your project knowledge into &lt;strong&gt;Indices&lt;/strong&gt; (in a markdown file) vs. &lt;strong&gt;Skills&lt;/strong&gt; (executable code) changed the game.&lt;/p&gt;

&lt;p&gt;I immediately tried it on my OpenClaw agent. I deleted my complex "Black Box" skills and replaced them with a simple &lt;code&gt;AGENTS.md&lt;/code&gt; index.&lt;/p&gt;

&lt;p&gt;The result? It worked perfectly. The hallucinations stopped. The "syntax debates" ended. Here is why—and how you can do it too.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Vercel Wake-Up Call
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals" rel="noopener noreferrer"&gt;Vercel's AI SDK team published research&lt;/a&gt; testing this exact problem on coding agents.&lt;/p&gt;

&lt;p&gt;They compared two methods for teaching an AI about Next.js 16:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Skills (Tools):&lt;/strong&gt; Giving the AI a tool to "Look up documentation."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Context (AGENTS.md):&lt;/strong&gt; Just putting the documentation index in a markdown file in the root directory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The results were brutal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Skills:&lt;/strong&gt; 53% Pass Rate. (The AI often forgot to use the tool, or used it wrong).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context (&lt;code&gt;AGENTS.md&lt;/code&gt;):&lt;/strong&gt; &lt;strong&gt;100% Pass Rate.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because &lt;strong&gt;Skills require a decision.&lt;/strong&gt; The AI has to stop and think, &lt;em&gt;"Should I check the docs?"&lt;/em&gt; Often, it gets lazy and guesses.&lt;br&gt;
&lt;strong&gt;Context is passive.&lt;/strong&gt; The instructions are just &lt;em&gt;there&lt;/em&gt;. The AI doesn't have to choose to be smart; it has no choice but to see the map.&lt;/p&gt;
&lt;h2&gt;
  
  
  Refactoring OpenClaw: The "Hands vs. Brains" Split
&lt;/h2&gt;

&lt;p&gt;We took this data and immediately refactored our entire agent stack. We realized we were making a fundamental architecture mistake.&lt;/p&gt;

&lt;p&gt;We were building &lt;strong&gt;Skills&lt;/strong&gt; for things that should have been &lt;strong&gt;Context&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Old Way (Black Box)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Task:&lt;/strong&gt; "Research this."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Mechanism:&lt;/strong&gt; &lt;code&gt;Call Tool: Research_Skill()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Reality:&lt;/strong&gt; The AI offloads thinking to a hidden script. It stops being an intelligence and becomes a button-pusher.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The New Way (The Hybrid Stack)
&lt;/h3&gt;

&lt;p&gt;We split our architecture into two distinct layers: &lt;strong&gt;Hands&lt;/strong&gt; and &lt;strong&gt;Brains&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  1. Brains (&lt;code&gt;AGENTS.md&lt;/code&gt; + &lt;code&gt;docs/&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;This is for &lt;strong&gt;Knowledge, Rules, and Logic.&lt;/strong&gt;&lt;br&gt;
We deleted the &lt;code&gt;Research.ts&lt;/code&gt; skill entirely. In its place, we added a simple file: &lt;code&gt;docs/research.md&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Research Protocol&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**Source of Truth:**&lt;/span&gt; Always check official docs (.gov, .org) first.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Citation:**&lt;/span&gt; You must link every claim.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Limit:**&lt;/span&gt; Max 5 searches per topic.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;AGENTS.md&lt;/code&gt; (the file the AI always sees), we just added one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;For research tasks, READ &lt;code&gt;docs/research.md&lt;/code&gt; first.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  2. Hands (&lt;code&gt;skills/&lt;/code&gt;)
&lt;/h4&gt;

&lt;p&gt;This is for &lt;strong&gt;Execution only.&lt;/strong&gt;&lt;br&gt;
We kept skills for things the AI physically cannot do with its brain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;git&lt;/code&gt; (Running terminal commands)&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;whatsapp&lt;/code&gt; (Sending API requests)&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;remindctl&lt;/code&gt; (Talking to macOS)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Result: Transparency
&lt;/h2&gt;

&lt;p&gt;Now, when I ask: &lt;em&gt;"Research the cost of a UK Global Talent Visa."&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The AI reads &lt;code&gt;AGENTS.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; It sees the rule: &lt;em&gt;"Read &lt;code&gt;docs/research.md&lt;/code&gt;."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; It reads the protocol: &lt;em&gt;"Check official sources."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;I see it work:&lt;/strong&gt; I see it generating the search query: &lt;code&gt;site:gov.uk global talent visa fee&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; It returns: &lt;em&gt;"The application fee is £716. Note: Some consultants charge £70k, but that is a service fee, not the visa cost."&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It worked. Not because I wrote better code, but because I stopped trying to code the thinking process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Guide: When to use what?
&lt;/h2&gt;

&lt;p&gt;If you are building an AI agent (with Cursor, Claude, or OpenClaw), stop building complex tool chains for everything. Use this heuristic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Use This&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;"I need you to know X"&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Knowledge should be passive. Don't make the AI "search" for your coding style.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;"I need you to follow process Y"&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docs/Y.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rules belong in markdown. They are easier to edit and easier for the AI to read.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;"I need you to touch Z"&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Skill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;If it needs an API key or a CLI command, wrap it in a tool.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
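&lt;p&gt;The table collapses into a three-line lookup. A toy sketch of the heuristic (the category labels here are mine, not part of any framework):&lt;/p&gt;

```python
# Toy router mirroring the table: knowledge is passive, processes
# live in markdown, and anything with side effects needs a skill.
def route(requirement):
    table = {
        "know":    "AGENTS.md",   # "I need you to know X"
        "process": "docs/Y.md",   # "I need you to follow process Y"
        "touch":   "skill",       # "I need you to touch Z"
    }
    return table[requirement]
```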

&lt;h2&gt;
  
  
  Start Small: The "Agile" Agent
&lt;/h2&gt;

&lt;p&gt;Don't over-engineer. Start with a single &lt;code&gt;AGENTS.md&lt;/code&gt; file in your root.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Add your project structure.&lt;/li&gt;
&lt;li&gt;  Add your preferred tech stack.&lt;/li&gt;
&lt;li&gt;  Add a link to your docs.&lt;/li&gt;
&lt;/ul&gt;
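&lt;p&gt;A minimal starting point might look like this (the paths and stack below are placeholders; swap in your own):&lt;/p&gt;

```markdown
# AGENTS.md

## Project structure
- `frontend/`: Next.js app
- `worker/`: background jobs

## Tech stack
- TypeScript, Tailwind, Postgres

## Docs
- For research tasks, READ `docs/research.md` first.
```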

&lt;p&gt;Watch your agent's IQ double overnight. The best tool you can give your AI isn't a Python script; it's a good README.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>NVIDIA PersonaPlex: The "Full-Duplex" Revolution</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Mon, 26 Jan 2026 22:31:52 +0000</pubDate>
      <link>https://dev.to/behruamm/nvidia-personaplex-the-full-duplex-revolution-42jp</link>
      <guid>https://dev.to/behruamm/nvidia-personaplex-the-full-duplex-revolution-42jp</guid>
      <description>&lt;p&gt;I have spent the last month building real-time voice agents. I started with the standard stack: LiveKit and Gemini 2.5.&lt;/p&gt;

&lt;p&gt;Even though the latency is impressively low, it still feels far from a natural conversation. Talking to these state-of-the-art models usually feels like playing a turn-based video game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My turn:&lt;/strong&gt; I speak.&lt;br&gt;
&lt;strong&gt;System turn:&lt;/strong&gt; It waits for silence. It thinks. It speaks.&lt;/p&gt;

&lt;p&gt;This is "Half-Duplex" logic. It is like using a Walkie-Talkie. The system forces you to wait. But real conversation is "Full-Duplex". We interrupt each other. We laugh at the same time. We hum while listening.&lt;/p&gt;

&lt;p&gt;For the last two days, I have been working with NVIDIA's PersonaPlex (based on Moshi/Mimi). It is completely different. It does not wait for you to stop talking.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Code Proves It
&lt;/h2&gt;

&lt;p&gt;I looked at the backend code to understand why it feels so different. The secret is in &lt;code&gt;moshi/server.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In standard agents, you have a loop that waits for an "End of Turn" signal. In PersonaPlex, I found this in the &lt;code&gt;ServerState&lt;/code&gt; initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# moshi/moshi/server.py L115
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mimi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streaming_forever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;other_mimi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streaming_forever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lm_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streaming_forever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is literally "streaming forever." The model processes my voice and its own voice at the same time, 12.5 times every second. It predicts silence or speech constantly. It does not need "permission" to speak.&lt;/p&gt;
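&lt;p&gt;The contrast with a turn-based agent can be sketched in a few lines of Python. This is a simplified model of the idea, not actual PersonaPlex code; the toy class and its frame logic are purely illustrative:&lt;/p&gt;

```python
FRAME_HZ = 12.5  # model steps per second (roughly 80 ms per frame)

class ToyDuplexModel:
    """Stand-in for a full-duplex model: it ingests the user's audio
    every frame and emits speech or silence, with no turn gate."""
    def __init__(self):
        self.heard = []

    def ingest(self, user_frame):
        self.heard.append(user_frame)

    def next_frame(self):
        # A real model predicts this; the toy speaks on even frames.
        return "speech" if len(self.heard) % 2 == 0 else "silence"

def full_duplex_step(model, user_frame):
    model.ingest(user_frame)    # hear the user...
    return model.next_frame()   # ...and decide output in the same step
```

&lt;p&gt;Note there is no "wait for silence" anywhere in the loop; listening and speaking happen in the same step.&lt;/p&gt;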

&lt;h2&gt;
  
  
  Realism is Overrated; Rhythm is Everything
&lt;/h2&gt;

&lt;p&gt;Most AI voices feel like "fake meat"—they sound human but act robotic. PersonaPlex is different. It trades audio quality for speed.&lt;/p&gt;

&lt;p&gt;To hit a 240 ms reaction time, the audio runs at 24 kHz (confirmed in &lt;code&gt;loaders.py&lt;/code&gt; as &lt;code&gt;SAMPLE_RATE = 24000&lt;/code&gt;). I run this command on my voice files to match the training environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg &lt;span class="nt"&gt;-i&lt;/span&gt; input &lt;span class="nt"&gt;-ar&lt;/span&gt; 24000 &lt;span class="nt"&gt;-ac&lt;/span&gt; 1 &lt;span class="nt"&gt;-c&lt;/span&gt;:a pcm_s16le &lt;span class="nt"&gt;-af&lt;/span&gt; &lt;span class="s2"&gt;"lowpass=f=8000"&lt;/span&gt; output.wav
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is lo-fi, but the rhythm is perfect. The model relies on consistent "Chatterbox TTS" data and learns from "negative-duration silence" during training. This forces it to understand that conversation involves overlapping, not just waiting. It might sound synthetic, but it laughs and interrupts exactly like a human.&lt;/p&gt;
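&lt;p&gt;If you batch-convert files, it is worth verifying the result actually matches the 24 kHz / mono / 16-bit target before feeding it to the model. A standard-library check (the constants come from this article, not from PersonaPlex itself):&lt;/p&gt;

```python
import wave

def matches_training_format(path):
    """True if a WAV file is 24 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 24000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)  # 2 bytes per sample = 16-bit
```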

&lt;h2&gt;
  
  
  The Body &amp;amp; Brain Split
&lt;/h2&gt;

&lt;p&gt;PersonaPlex separates "how it sounds" from "what it thinks."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Body (Voice Prompt):&lt;/strong&gt; A 15-second audio clip for acoustics (loaded via &lt;code&gt;lm_gen.load_voice_prompt&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Brain (Text Prompt):&lt;/strong&gt; Instructions for behavior.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system pre-loads the voice to save time (reducing latency). But they must match. You cannot use a calm "Customer Service" voice with an "Angry Pirate" text prompt—the model will glitch because the acoustic skeleton fights the semantic brain.&lt;/p&gt;
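&lt;p&gt;A cheap guard is to tag each voice clip with its persona and refuse mismatched combinations before loading anything. A toy sketch (the tags and file names are labels I made up, not a PersonaPlex API):&lt;/p&gt;

```python
# Tag each voice clip with the persona it was recorded in, and
# refuse to start a session with a mismatched text prompt.
VOICE_PERSONAS = {
    "calm_support.wav": "customer_service",
    "angry_pirate.wav": "pirate",
}

def check_pairing(voice_file, text_persona):
    voice_persona = VOICE_PERSONAS[voice_file]
    if voice_persona != text_persona:
        raise ValueError(
            f"voice persona '{voice_persona}' fights text persona "
            f"'{text_persona}'")
    return True
```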

&lt;h2&gt;
  
  
  Unlocking "Social Mode"
&lt;/h2&gt;

&lt;p&gt;To stop it from acting like a boring assistant, use this specific trigger phrase found in the training data (and verified in the server code's system tagging):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You enjoy having a good conversation."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Combine this with a high-energy voice sample, and it switches modes. It starts laughing, interrupting, and "vibing" instead of just solving tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reality Check (Trade-offs)
&lt;/h2&gt;

&lt;p&gt;While the roadmap shows tool-calling is coming next, there are still significant hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Limits&lt;/strong&gt;: The model has a fixed context window (defined as &lt;code&gt;context: 3000&lt;/code&gt; frames in &lt;code&gt;loaders.py&lt;/code&gt;). At 12.5 Hz, that works out to 240 seconds of memory, though my tests show it often gets unstable around 160 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stability&lt;/strong&gt;: Overlapping speech feels natural until it gets buggy. Sometimes the model will just speak over you non-stop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: "Infinite streaming" requires high-end NVIDIA GPUs (A100/H100).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: Managing simultaneous audio/text streams is far more complex than standard WebSockets.&lt;/li&gt;
&lt;/ul&gt;
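&lt;p&gt;The context figure above is simple arithmetic on the two constants:&lt;/p&gt;

```python
CONTEXT_FRAMES = 3000   # "context: 3000" in loaders.py
FRAME_RATE_HZ = 12.5    # model steps per second

context_seconds = CONTEXT_FRAMES / FRAME_RATE_HZ  # 240.0 seconds
context_minutes = context_seconds / 60            # 4.0 minutes
```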

&lt;p&gt;Despite these issues, PersonaPlex is the first model I have used that feels like a natural customer service agent rather than a text-to-speech bot.&lt;/p&gt;

&lt;p&gt;Feel free to follow me on &lt;a href="https://behruamm.substack.com/" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;, where I will release deeper tests and analyses after spending more time with the model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Gemini 2.5 Native Audio + LiveKit: A Production Setup Guide</title>
      <dc:creator>Behram</dc:creator>
      <pubDate>Fri, 16 Jan 2026 21:55:15 +0000</pubDate>
      <link>https://dev.to/behruamm/gemini-25-native-audio-livekit-a-production-setup-guide-2nin</link>
      <guid>https://dev.to/behruamm/gemini-25-native-audio-livekit-a-production-setup-guide-2nin</guid>
      <description>&lt;p&gt;10-minute tutorials make voice AI look simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;roomName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Done! ✨&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production reality is different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who creates the room?&lt;/li&gt;
&lt;li&gt;How does the agent identify users?&lt;/li&gt;
&lt;li&gt;What happens when failures occur?&lt;/li&gt;
&lt;li&gt;How do you prevent duplicate agents?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After shipping &lt;strong&gt;1000+ voice AI sessions&lt;/strong&gt; using &lt;strong&gt;LiveKit + Gemini Realtime&lt;/strong&gt; on Next.js/Cloud Run, here's what tutorials skip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This article covers:&lt;/strong&gt; JWT auth, auto-dispatch patterns, audio subscription timing, greeting guards, and production robustness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Basic Next.js, LiveKit Cloud account, Firebase Auth configured.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complete Flow (11 Steps)
&lt;/h2&gt;

&lt;p&gt;Before diving into code, understand the full sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐
│   Browser   │  1. User clicks "Start Interview"
└──────┬──────┘
       │ 2. POST /api/start-bot
       │    Authorization: Bearer &amp;lt;firebase-token&amp;gt;
       ▼
┌─────────────────┐
│  Next.js API    │  3. Verify token with Firebase
│ /api/start-bot  │  4. Create LiveKit room
└────────┬────────┘  5. Generate JWT + metadata
         │           6. Return token to browser
         ▼
┌───────────────────────┐
│   LiveKit Cloud       │  7. Browser connects
│ wss://your.livekit... │  8. Auto-dispatch agent
└───────────────────────┘
         │
         ▼
┌───────────────────────┐
│  Cloud Run Worker     │  9. entrypoint() runs
│  (Voice Agent)        │  10. Load user state from DB
└───────────────────────┘  11. Generate personalized greeting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't just "connect to a room"—it's an &lt;strong&gt;orchestrated sequence&lt;/strong&gt; where order matters.&lt;/p&gt;

&lt;p&gt;Let's break down each part with the mistakes I made.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: The JWT Token Factory (Backend API)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Mistake #1: Exposing Credentials to Browser
&lt;/h3&gt;

&lt;p&gt;I've seen this in production codebases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// NEVER DO THIS&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AccessToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_LIVEKIT_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// ❌ EXPOSED IN BROWSER&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_LIVEKIT_SECRET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// ❌ SECURITY DISASTER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it's terrible:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anyone can create tokens for ANY room&lt;/li&gt;
&lt;li&gt;Attackers can impersonate users&lt;/li&gt;
&lt;li&gt;Zero authentication&lt;/li&gt;
&lt;/ul&gt;
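&lt;p&gt;To see why this is fatal, remember that a LiveKit access token is just an HMAC-signed JWT: anyone holding the API secret can mint a valid token for any room and any identity. A standard-library sketch of the signing step (the claim names here are illustrative; real LiveKit tokens carry more fields):&lt;/p&gt;

```python
import base64, hashlib, hmac, json

def b64url(data):
    # JWT-style base64url encoding without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(secret, identity, room):
    """Forge a JWT-shaped token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": identity, "room": room}).encode())
    signing_input = (header + "." + payload).encode()
    sig = b64url(hmac.new(secret.encode(), signing_input,
                          hashlib.sha256).digest())
    return header + "." + payload + "." + sig
```

&lt;p&gt;With &lt;code&gt;NEXT_PUBLIC_&lt;/code&gt; variables, that secret ships in your JavaScript bundle to every visitor.&lt;/p&gt;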

&lt;h3&gt;
  
  
  ✅ The Right Way: Server-Side Token Generation
&lt;/h3&gt;

&lt;p&gt;Create a Next.js API route that verifies the user BEFORE creating tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// frontend/src/app/api/start-bot/route.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RoomServiceClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AccessToken&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;livekit-server-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;adminAuth&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@/lib/firebase-admin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// STEP 1: Verify Firebase token (The Gatekeeper)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authHeader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;authHeader&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;authHeader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Server-side verification with Firebase Admin SDK&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decodedToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;adminAuth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verifyIdToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;decodedToken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ⭐ REAL verified user ID&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;authError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid Token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// STEP 2: LiveKit setup&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;livekitUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LIVEKIT_URL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LIVEKIT_API_KEY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiSecret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LIVEKIT_API_SECRET&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;roomService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RoomServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;livekitUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// STEP 3: Create room&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;room_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`room-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;roomService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createRoom&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;room_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;emptyTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// Auto-cleanup after 60s&lt;/span&gt;
    &lt;span class="na"&gt;maxParticipants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// 1 user + 1 agent only&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// STEP 4: Generate token with verified identity&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AccessToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiSecret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// ⭐ Embedded in token&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addGrant&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;roomJoin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;room&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;room_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toJwt&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;room_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user_id&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Server generates tokens&lt;/strong&gt; (credentials never touch browser)&lt;br&gt;
✅ &lt;strong&gt;Firebase verifies user&lt;/strong&gt; BEFORE creating room&lt;br&gt;
✅ &lt;strong&gt;User ID embedded&lt;/strong&gt; as &lt;code&gt;identity&lt;/code&gt; in LiveKit token&lt;br&gt;
✅ &lt;strong&gt;Random room names&lt;/strong&gt; (users can't guess existing rooms)&lt;/p&gt;
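&lt;p&gt;One caveat on the room names: &lt;code&gt;Math.random()&lt;/code&gt; is not cryptographically secure, and the &lt;code&gt;toString(36).substring(7)&lt;/code&gt; pattern yields only a handful of characters. If unguessable rooms matter, draw from a CSPRNG instead. A Python equivalent for a backend worker (the &lt;code&gt;room-&lt;/code&gt; prefix just mirrors the snippet above):&lt;/p&gt;

```python
import secrets

def make_room_name(prefix="room"):
    # token_urlsafe draws from the OS CSPRNG; 16 bytes is about
    # 128 bits of entropy, far beyond the few dozen bits of the
    # Math.random-based pattern.
    return prefix + "-" + secrets.token_urlsafe(16)
```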


&lt;h2&gt;
  
  
  Part 2: Explicit vs Auto-Dispatch
&lt;/h2&gt;

&lt;p&gt;Now that you have a room and token, how does the agent join?&lt;/p&gt;
&lt;h3&gt;
  
  
  Pattern A: Manual Dispatch (What I Used First)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create token&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toJwt&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Separately dispatch agent&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;roomService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agentName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;room&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;room_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Problems I Hit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires 2 API calls to LiveKit&lt;/li&gt;
&lt;li&gt;Race condition: User joins before agent&lt;/li&gt;
&lt;li&gt;If dispatch fails, user sits in empty room&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Pattern B: Auto-Dispatch via JWT ⭐ (Production Choice)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RoomConfiguration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RoomAgentDispatch&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@livekit/protocol&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Attach dispatch config to user's token&lt;/span&gt;
&lt;span class="nx"&gt;at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roomConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RoomConfiguration&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RoomAgentDispatch&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;agentName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;noah-voice-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toJwt&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Why it's better:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Atomic&lt;/strong&gt;: Agent dispatch happens when user joins&lt;br&gt;
✅ &lt;strong&gt;LiveKit handles retries&lt;/strong&gt; (more reliable)&lt;br&gt;
✅ &lt;strong&gt;One API call&lt;/strong&gt; instead of two&lt;br&gt;
✅ &lt;strong&gt;No race conditions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real Impact:&lt;/strong&gt; Reduced agent dispatch failures from ~5% to &amp;lt;0.1%.&lt;/p&gt;


&lt;h2&gt;
  
  
  Part 3: Agent Entry Point - The Critical Pattern
&lt;/h2&gt;

&lt;p&gt;Your agent receives a job when the user joins. Here's where most tutorials fail you.&lt;/p&gt;
&lt;h3&gt;
  
  
  ❌ Mistake #2: Waiting for Participant Before Starting Session
&lt;/h3&gt;

&lt;p&gt;This looks logical but breaks in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Wait for user to appear
&lt;/span&gt;    &lt;span class="n"&gt;participant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;wait_for_participant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt;

    &lt;span class="c1"&gt;# Load their data from database
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_user_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Start session
&lt;/span&gt;    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ❌ TOO LATE!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Symptoms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent connects but can't hear the user&lt;/li&gt;
&lt;li&gt;Logs show: &lt;code&gt;subscribed=False&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Works in local testing (lucky timing)&lt;/li&gt;
&lt;li&gt;Fails in production randomly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Root Cause:&lt;/strong&gt; Audio subscription happens &lt;strong&gt;inside&lt;/strong&gt; &lt;code&gt;session.start()&lt;/code&gt;. If you wait for the participant's identity first, you miss the subscription window.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ The Fix: Start Session First, Then Personalize
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# STEP 1: Create placeholder agent
&lt;/span&gt;    &lt;span class="n"&gt;placeholder_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DatabaseService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InterviewAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placeholder_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 2: Create session
&lt;/span&gt;    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RealtimeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Puck&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 3: Start session IMMEDIATELY (SDK subscribes to audio here)
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Session started - audio pipeline active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 4: NOW get participant (session already listening)
&lt;/span&gt;    &lt;span class="n"&gt;participant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# 30s timeout
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_participants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;participant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_participants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ No participant after 30s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt;  &lt;span class="c1"&gt;# ⭐ This is Firebase UID from token
&lt;/span&gt;    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ User identity: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 5: Hydrate agent with real data
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DatabaseService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;initial_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_candidate_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ DB failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;initial_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# Fallback to fresh session
&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initial_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;initial_data&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 6: Determine phase (new vs resume)
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_phase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_determine_initial_phase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# STEP 7: Generate personalized greeting
&lt;/span&gt;    &lt;span class="n"&gt;greeting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_greeting_instruction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;greeting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Greeting triggered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Pattern:&lt;/strong&gt; Start → Listen → Identify → Personalize → Greet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; 100% audio subscription success rate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: The Greeting Guard Pattern
&lt;/h2&gt;

&lt;p&gt;Even with audio working, there's another trap.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Mistake #3: Tools Fire During Greeting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happens:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: "Hi, I'm Noah, your AI career c—"
User: "Hello!" (eager)
Gemini: *calls process_response() mid-greeting*
Database: *saves garbage data*
Agent: *confused about conversation state*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Solution: Greeting Guard Flag
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greeting_complete&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# ⭐ Start locked
&lt;/span&gt;
    &lt;span class="nd"&gt;@function_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_user_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# GUARD: Block tool execution during greeting
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greeting_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SYSTEM: Wait for greeting to complete. Do not process yet.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Normal logic continues here
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# Unlock tools after greeting completes
&lt;/span&gt;&lt;span class="nd"&gt;@session.on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_state_changed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_state_changed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;old_state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new_state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;listening&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greeting_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;greeting_complete&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Greeting done, tools unlocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent speaks full greeting uninterrupted&lt;/li&gt;
&lt;li&gt;Tools unlock only after state transition: &lt;code&gt;speaking&lt;/code&gt; → &lt;code&gt;listening&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No premature database writes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real Data:&lt;/strong&gt; Eliminated 100% of corrupted session starts.&lt;/p&gt;
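&lt;p&gt;Stripped of the LiveKit decorators, the guard is just a boolean gate plus a state-transition callback. Here is a dependency-free sketch of the same mechanism (class and method names are illustrative, not the LiveKit API):&lt;/p&gt;

```python
import asyncio

class GuardedAgent:
    """Minimal model of the greeting-guard pattern (illustrative names)."""

    def __init__(self):
        self.greeting_complete = False  # start locked
        self.saved = []

    async def process_user_response(self, data: dict) -> str:
        # GUARD: refuse to act until the greeting has finished
        if not self.greeting_complete:
            return "SYSTEM: Wait for greeting to complete. Do not process yet."
        self.saved.append(data)  # stand-in for the database write
        return "saved"

    def on_state_changed(self, old_state: str, new_state: str) -> None:
        # Unlock tools on the speaking -> listening transition
        if old_state == "speaking" and new_state == "listening":
            self.greeting_complete = True

async def demo():
    agent = GuardedAgent()
    blocked = await agent.process_user_response({"answer": "too early"})
    agent.on_state_changed("speaking", "listening")  # greeting finished
    allowed = await agent.process_user_response({"answer": "on time"})
    return blocked, allowed
```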




&lt;h2&gt;
  
  
  Part 5: Robustness Patterns
&lt;/h2&gt;

&lt;p&gt;Production code needs fallbacks. Here are patterns from actual failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Timeout with Fallback
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Don't wait forever for participant
&lt;/span&gt;&lt;span class="n"&gt;participant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_participants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;participant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;room&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_participants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# FALLBACK: Log and gracefully exit
&lt;/span&gt;    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ No participant after 30s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t hear anyone. Please refresh and try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;# Exit cleanly
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
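&lt;p&gt;The one-second polling loop works, but the same 30-second bound can be expressed event-style with &lt;code&gt;asyncio.wait_for&lt;/code&gt; around an &lt;code&gt;Event&lt;/code&gt;. A sketch under the assumption that your room object exposes a participant-connected callback (the &lt;code&gt;on_participant_connected&lt;/code&gt; hook below is hypothetical; adapt it to your SDK's event API):&lt;/p&gt;

```python
import asyncio

async def wait_for_first_participant(room, timeout: float = 30.0):
    """Resolve with the first remote participant, or None after `timeout`.

    Assumes `room` has a dict-like `remote_participants` and a hypothetical
    `on_participant_connected(callback)` hook -- adapt to your SDK.
    """
    joined = asyncio.Event()
    room.on_participant_connected(lambda _p: joined.set())

    if room.remote_participants:  # already present? no waiting needed
        return next(iter(room.remote_participants.values()))
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # caller runs the fallback path
    return next(iter(room.remote_participants.values()))
```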



&lt;h3&gt;
  
  
  Pattern 2: Database Connection Fallback
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;initial_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_candidate_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ DB connection failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# FALLBACK: Start fresh session instead of crashing
&lt;/span&gt;    &lt;span class="n"&gt;initial_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi! Let&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s start fresh today. Tell me about your background.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 3: Duplicate Agent Prevention
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Health check fails → Cloud Run keeps retrying → Multiple agents speak simultaneously&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Health check server MUST bind to 0.0.0.0
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_health_server&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PORT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;httpd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HTTPServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# ⭐ NOT 'localhost'
&lt;/span&gt;        &lt;span class="n"&gt;HealthCheckHandler&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;httpd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serve_forever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Don't override Cloud Run's PORT variable
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;override&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ⭐ Critical
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Cloud Run probes your container over its network interface. If the server binds to &lt;code&gt;localhost&lt;/code&gt;, probes can't connect, the instance is marked unhealthy, and the retry loop spawns duplicate zombie agents.&lt;/p&gt;
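&lt;p&gt;The &lt;code&gt;HealthCheckHandler&lt;/code&gt; referenced above can be as small as a 200 responder. A minimal stdlib version (the &lt;code&gt;/healthz&lt;/code&gt; path is my choice; match whatever path your Cloud Run probe is configured to hit, and read the port from &lt;code&gt;$PORT&lt;/code&gt; as in the snippet above):&lt;/p&gt;

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

class HealthCheckHandler(BaseHTTPRequestHandler):
    """Answer 200 OK so Cloud Run keeps the instance alive."""

    def do_GET(self):
        status = 200 if self.path in ("/", "/healthz") else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok" if status == 200 else b"not found")

    def log_message(self, *args):  # keep probe noise out of the logs
        pass

def start_health_server(port: int = 8080) -> HTTPServer:
    httpd = HTTPServer(("0.0.0.0", port), HealthCheckHandler)  # NOT 'localhost'
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```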




&lt;h2&gt;
  
  
  Part 6: Testing the Full Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Local Development Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1: Cloud SQL Proxy (if using Cloud SQL)&lt;/span&gt;
./cloud-sql-proxy PROJECT:REGION:INSTANCE &lt;span class="nt"&gt;--port&lt;/span&gt; 5432

&lt;span class="c"&gt;# Terminal 2: Voice Agent&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;voice-agent
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
python src/main.py dev

&lt;span class="c"&gt;# Terminal 3: Frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Critical Test Cases
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Happy Path&lt;/strong&gt;: User joins → Agent greets → Conversation flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Refresh&lt;/strong&gt;: User closes tab mid-call → Reconnects → Session resumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Timeout&lt;/strong&gt;: DB slow → Agent uses fallback greeting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Participant&lt;/strong&gt;: Room created but user never joins → Agent exits gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Drop&lt;/strong&gt;: User loses connection → Reconnects → Conversation continues&lt;/li&gt;
&lt;/ol&gt;
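&lt;p&gt;Test case 3 (database timeout) is easy to exercise without real infrastructure by faking the DB and session layers. A sketch mirroring the Pattern 2 try/except (&lt;code&gt;FakeDB&lt;/code&gt; and &lt;code&gt;FakeSession&lt;/code&gt; are test doubles, not LiveKit classes):&lt;/p&gt;

```python
import asyncio

class FakeDB:
    """Test double that simulates a dead database."""
    async def get_candidate_data(self):
        raise TimeoutError("connection timed out")

class FakeSession:
    """Records what the agent would have said instead of speaking it."""
    def __init__(self):
        self.replies = []
    async def generate_reply(self, instructions):
        self.replies.append(instructions)

async def load_with_fallback(db, session):
    """Same shape as the Pattern 2 fallback: fresh session, not a crash."""
    try:
        return await db.get_candidate_data()
    except Exception:
        await session.generate_reply(
            "Hi! Let's start fresh today. Tell me about your background."
        )
        return {}

async def run_case():
    session = FakeSession()
    data = await load_with_fallback(FakeDB(), session)
    return data, session.replies
```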




&lt;h2&gt;
  
  
  Lessons Learned: The DO/DON'T Checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DO ✅
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Generate tokens &lt;strong&gt;server-side&lt;/strong&gt; (never expose credentials)&lt;/li&gt;
&lt;li&gt;Embed user identity in JWT token&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;auto-dispatch&lt;/strong&gt; via &lt;code&gt;roomConfig&lt;/code&gt; (more reliable than manual)&lt;/li&gt;
&lt;li&gt;Start session &lt;strong&gt;FIRST&lt;/strong&gt;, get identity &lt;strong&gt;AFTER&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guard tools&lt;/strong&gt; during greeting with state flag&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;timeouts and fallbacks&lt;/strong&gt; everywhere&lt;/li&gt;
&lt;li&gt;Bind health server to &lt;code&gt;0.0.0.0&lt;/code&gt; for Cloud Run&lt;/li&gt;
&lt;li&gt;Test with &lt;strong&gt;database failures&lt;/strong&gt; and &lt;strong&gt;network drops&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DON'T ❌
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Put LiveKit secrets in &lt;code&gt;NEXT_PUBLIC_*&lt;/code&gt; variables&lt;/li&gt;
&lt;li&gt;Wait for participant before starting session&lt;/li&gt;
&lt;li&gt;Allow tools to execute during greeting&lt;/li&gt;
&lt;li&gt;Assume database is always available&lt;/li&gt;
&lt;li&gt;Skip health check server (Cloud Run requires it)&lt;/li&gt;
&lt;li&gt;Override Cloud Run's &lt;code&gt;$PORT&lt;/code&gt; environment variable&lt;/li&gt;
&lt;li&gt;Deploy without testing the full flow locally first&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  From Demo to Production
&lt;/h2&gt;

&lt;p&gt;The gap between LiveKit tutorials and production isn't just code—it's &lt;strong&gt;robustness thinking&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tutorials assume happy paths&lt;/strong&gt; (user joins, everything works)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production has 10 failure modes&lt;/strong&gt; per integration point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 3 weeks of debugging these issues in production, I learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audio subscription is timing-sensitive&lt;/strong&gt; (start session first)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-dispatch beats manual dispatch&lt;/strong&gt; (atomic operations win)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State guards prevent race conditions&lt;/strong&gt; (greeting flag pattern)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallbacks save user experience&lt;/strong&gt; (DB down? Start fresh)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health checks matter on Cloud Run&lt;/strong&gt; (bind to &lt;code&gt;0.0.0.0&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;In the next article, I'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool calling patterns with complex state machines&lt;/li&gt;
&lt;li&gt;Database-backed session resume&lt;/li&gt;
&lt;li&gt;The 80% CPU problem (and how I fixed it)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Building production voice AI?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check my &lt;a href="https://github.com/Behruamm/gemini-livekit-demo" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; for code examples&lt;/li&gt;
&lt;li&gt;👉 Subscribe to my &lt;a href="https://behruamm.substack.com/" rel="noopener noreferrer"&gt;Substack&lt;/a&gt; for more real-world patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's your biggest voice AI integration pain point? Comment below.&lt;/strong&gt; 👇&lt;/p&gt;




</description>
      <category>livekit</category>
      <category>gemini</category>
      <category>voiceai</category>
      <category>webrtc</category>
    </item>
  </channel>
</rss>
