DEV Community: AK DevCraft

Installing OpenClaw on Linux - 0$ Personal Agentic AI Assistant - Part 4

AK DevCraft — Mon, 01 Jun 2026 15:00:00 +0000

Introduction

Part 4 of the Zero Dollar AI Assistant series: Installing OpenClaw on Linux, Avoiding Every Trap. Part 1 covers the architecture. Part 2 covers Oracle Cloud setup. Part 3 covers Ollama and model selection.

OpenClaw's documentation assumes a Mac. The install script works on Linux, but several things that happen automatically on a Mac require manual intervention on a Linux server. This article documents every one of them: the right user to install as, the systemd service traps, the config file quirks, and the silent background token drain that will surprise you on day one.

Install as the Right User

This sounds trivial. It isn't.

The OpenClaw installer sets up a systemd user service, a service that runs under a specific user account, not as root. If you install as root, the systemd user service won't work correctly, the gateway won't auto-start on boot, and you'll spend time debugging problems that don't exist on a proper install.

Always install as a non-root user. On Oracle Cloud, that user is ubuntu.

Verify before installing:

whoami
# Must show: ubuntu

If you're currently root, exit and SSH back in directly as ubuntu:

exit  # if su'd to root
ssh -i ~/.ssh/id_rsa ubuntu@YOUR_PUBLIC_IP

Switching via su - ubuntu from a root session isn't sufficient; the systemd user session won't be properly initialized, and you'll get "systemd user services unavailable" errors during setup.
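A preflight check along these lines can catch both mistakes before the installer runs. This is a minimal sketch and not part of OpenClaw: the function name preflight_user is hypothetical, and treating an empty XDG_RUNTIME_DIR as the sign of a missing systemd user session is an assumption (it is commonly unset after su).

```shell
# Hypothetical helper: refuse to install as root or without a login session.
# Both arguments are optional and exist so the check can be tried by hand:
#   $1 = numeric user id (defaults to the current user)
#   $2 = runtime dir     (defaults to $XDG_RUNTIME_DIR)
preflight_user() {
  uid="${1:-$(id -u)}"
  runtime_dir="${2-${XDG_RUNTIME_DIR:-}}"

  if [ "$uid" -eq 0 ]; then
    echo "refusing: running as root"
    return 1
  fi
  # Assumption: a direct SSH login sets XDG_RUNTIME_DIR (e.g. /run/user/1000),
  # while a plain su from root usually leaves it empty.
  if [ -z "$runtime_dir" ]; then
    echo "refusing: no user session (XDG_RUNTIME_DIR is empty)"
    return 1
  fi
  echo "ok"
}
```

Run preflight_user before the install command. Anything other than ok suggests logging out and connecting again directly as ubuntu.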

Installation

# use tmux — install may take several minutes
tmux new -s openclaw
curl -fsSL https://openclaw.ai/install.sh | bash

The installer will launch the onboarding wizard automatically on completion.

The Onboarding Wizard

Work through each step. Key selections for this stack:

AI Provider: Ollama
Ollama host: http://localhost:11434
Model: Select ollama/llama3.2:3b from the browse list — look for the entry showing (ctx 128K)
Search provider: Skip for now — or select a provider if you have a key. Tavily has a free tier at tavily.com.
Secret provider: No. Storing secrets directly in openclaw.json is fine for a personal server
Skills: Skip all for now. Add after verifying the core setup works. One gotcha: the GitHub skill tries to install via Homebrew, which doesn't exist on Linux; skip it here and install the GitHub CLI via apt afterward.
Hooks: Skip
Hatch: Select "Hatch in terminal". This verifies that the gateway starts correctly before you exit the wizard.

⚠️ On first hatch, OpenClaw reads BOOTSTRAP.md from the workspace and prompts you to have a setup conversation with the agent to define its identity and personality. This is intentional — but with a local 3B model, it's very slow. Skip it by creating the files manually (covered below).

Skip the Bootstrap Conversation

The bootstrap flow is designed for interactive use; it has a conversation with you to set the agent's name, personality, and preferences. With a local 3B model, this takes 10-15 minutes of slow back-and-forth.

Create the required files manually instead:

# Create SOUL.md - agent personality and preferences
cat > ~/.openclaw/workspace/SOUL.md << 'EOF'
# Soul

## Identity
- Name: Claw
- Vibe: Casual, helpful, direct
- Emoji: 🦞

## User
- Preferred responses: Concise, no fluff
- Direct answers preferred

## Preferences
- Keep responses brief
- No unnecessary preamble
EOF

# Create IDENTITY.md
cat > ~/.openclaw/workspace/IDENTITY.md << 'EOF'
# Identity
- Name: Claw
- Nature: AI assistant
- Vibe: Casual and helpful
- Emoji: 🦞
EOF

# Create USER.md
cat > ~/.openclaw/workspace/USER.md << 'EOF'
# User
- Name: Your name here
- Preferred address: Your name
EOF

# Delete the bootstrap file to clear the pending state
rm ~/.openclaw/workspace/BOOTSTRAP.md

Customise the files with your actual name and preferences. Once BOOTSTRAP.md is deleted, the gateway proceeds normally.
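To confirm the workspace is in the state described above, a small check like the following may help. It is a sketch only: workspace_ready is a hypothetical name, and the only rules it encodes are the ones stated in this section (three non-empty files present, BOOTSTRAP.md gone).

```shell
# Hypothetical check: report whether the workspace is ready to skip bootstrap.
#   $1 = workspace directory (defaults to ~/.openclaw/workspace)
workspace_ready() {
  ws="${1:-$HOME/.openclaw/workspace}"
  for f in SOUL.md IDENTITY.md USER.md; do
    # -s is true only for files that exist and are not empty
    if [ ! -s "$ws/$f" ]; then
      echo "missing or empty: $f"
      return 1
    fi
  done
  if [ -e "$ws/BOOTSTRAP.md" ]; then
    echo "BOOTSTRAP.md still present"
    return 1
  fi
  echo "ready"
}
```

Calling workspace_ready with no argument checks the default workspace path used in this article.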

Post-Install Configuration

We don't have to configure every setting during onboarding; we can configure things as we go. That way, we don't need all the required information, such as an API key or bot token, on hand during onboarding.

1. Create a Telegram Bot and Configure the Channel

Create your bot via BotFather:

Open Telegram → search @botfather
Send /newbot
Choose a display name (e.g., My AI Assistant)
Choose a username — must end in bot (e.g., myassistant_bot)
BotFather replies with your token: 12346789:AAFxxx... — save this
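Before handing the token to OpenClaw, a quick shape check can catch copy-paste slips such as a stray space or a missing half. This is a sketch under an assumption: tokens look like a numeric bot id, a colon, and a secret made of letters, digits, underscores, and hyphens. The function name is hypothetical, and it checks shape only, not whether Telegram accepts the token.

```shell
# Hypothetical helper: cheap format check for a BotFather token.
# Assumed shape: "<numeric bot id>:<secret>".
looks_like_bot_token() {
  case "$1" in
    *[!0-9A-Za-z_:-]*) return 1 ;;   # stray characters (spaces, quotes, ...)
  esac
  id="${1%%:*}"        # text before the first colon
  secret="${1#*:}"     # text after the first colon
  [ "$id" != "$1" ] || return 1                 # no colon at all
  [ -n "$id" ] && [ -n "$secret" ] || return 1  # one side is empty
  case "$id" in *[!0-9]*) return 1 ;; esac      # bot id must be numeric
  case "$secret" in *:*) return 1 ;; esac       # only one colon allowed
  return 0
}
```

A token that passes can then be tested for real with the getMe call shown under Common Issues.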

Add the channel to OpenClaw via the wizard:

openclaw channels add

The wizard walks you through selecting Telegram, entering your bot token, and setting access policies. A few things to watch for during setup:

When asked about a secret provider — select No
When asked about DM access policy — select pairing (only your approved accounts can talk to the agent)
When asked to bind accounts to agents — select Yes

⚠️ The wizard may ask to install the Telegram plugin from npm — this may fail. Telegram is a bundled plugin. If you see this error, run openclaw plugins enable telegram first, then retry openclaw channels add.

2. Add Gemini Fallback

Add the Gemini API key you obtained in Part 3 to the systemd service:

sed -i '/Environment=OPENCLAW_SERVICE_KIND=gateway/a Environment=GEMINI_API_KEY=your_gemini_key_here' \
  ~/.config/systemd/user/openclaw-gateway.service

Configure Ollama as primary with Gemini as fallback:

openclaw config set agents.defaults.model.primary "ollama/llama3.2:3b"
openclaw config set agents.defaults.model.fallbacks '["google/gemini-2.5-flash-lite"]' --json

# Add available model list
openclaw config set agents.defaults.models '{"ollama/llama3.2:3b": {}, "google/gemini-2.5-flash-lite": {}}' --json

3. Enable Ollama Auto-Discovery

This tells OpenClaw to automatically detect all locally running Ollama models:

echo 'export OLLAMA_API_KEY=ollama-local' >> ~/.bashrc
source ~/.bashrc

sed -i '/Environment=OPENCLAW_SERVICE_KIND=gateway/a Environment=OLLAMA_API_KEY=ollama-local' \
  ~/.config/systemd/user/openclaw-gateway.service

4. Disable mDNS

Oracle Cloud doesn't support mDNS/Bonjour, and leaving it enabled causes repeated crash-restart cycles:

openclaw gateway stop
openclaw config set discovery.mdns.mode off
openclaw config set discovery.wideArea.enabled false

5. Apply Low-Power Optimizations

These reduce startup time and memory overhead:

mkdir -p /var/tmp/openclaw-compile-cache
echo 'export NODE_COMPILE_CACHE=/var/tmp/openclaw-compile-cache' >> ~/.bashrc
echo 'export OPENCLAW_NO_RESPAWN=1' >> ~/.bashrc

sed -i '/Environment=OLLAMA_API_KEY/a Environment=NODE_COMPILE_CACHE=/var/tmp/openclaw-compile-cache' \
  ~/.config/systemd/user/openclaw-gateway.service
sed -i '/Environment=NODE_COMPILE_CACHE/a Environment=OPENCLAW_NO_RESPAWN=1' \
  ~/.config/systemd/user/openclaw-gateway.service

source ~/.bashrc
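The sed commands in steps 2, 3, and 5 add a new Environment= line every time they run, so repeating a step leaves duplicates in the unit file. The sketch below shows one way to make the edit repeatable. The helper name add_unit_env is hypothetical; the marker line is the one the sed commands already rely on.

```shell
# Hypothetical helper: add "Environment=KEY=VALUE" after the gateway marker
# line, but only if KEY is not already set in the unit file.
#   $1 = unit file, $2 = variable name, $3 = value
add_unit_env() {
  unit="$1"; key="$2"; value="$3"
  marker='Environment=OPENCLAW_SERVICE_KIND=gateway'

  # Already present: leave the file untouched.
  if grep -q "^Environment=${key}=" "$unit"; then
    return 0
  fi
  tmp="$(mktemp)" || return 1
  awk -v m="$marker" -v line="Environment=${key}=${value}" '
    { print }
    $0 == m { print line }   # emit the new line right after the marker
  ' "$unit" > "$tmp" && cat "$tmp" > "$unit"
  rm -f "$tmp"
  # Returns non-zero if the marker line was not found.
  grep -q "^Environment=${key}=" "$unit"
}
```

For example, add_unit_env ~/.config/systemd/user/openclaw-gateway.service OLLAMA_API_KEY ollama-local can be run any number of times. The reload in step 7 is still needed afterward.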

6. Add Gateway Token to CLI

The CLI needs the gateway token to authenticate against the running service:

# Get your token
python3 -c "import json; import os;d=json.load(open(os.path.expanduser('~/.openclaw/openclaw.json'))); print(d['gateway']['auth']['token'])"

# Add to bashrc (replace your_token_here with the value printed above)
echo 'export OPENCLAW_TOKEN=your_token_here' >> ~/.bashrc
source ~/.bashrc

7. Reload and Restart

systemctl --user daemon-reload
openclaw gateway start

The Silent Token Drain

This is the most important thing in this article.

OpenClaw has a background process, sweepStaleRunContexts, that runs every 30 minutes and cleans up expired agent sessions. On a fresh install with default settings, each sweep triggers a model API call via the startupContext feature, which loads daily memory on every session reset.

The result: if you're using the Anthropic API (not Ollama), you'll see API calls every 30 minutes even when nobody is using the assistant. Left running overnight, this silently consumed $2-3 of my API credits.
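The arithmetic behind that drain is simple enough to sketch. The 30-minute interval comes from this article; the 8-hour night is an assumption, and both function names are hypothetical.

```shell
# Number of sweeps that fire while the assistant sits idle.
#   $1 = idle hours, $2 = sweep interval in minutes (defaults to 30)
sweeps_in() {
  echo $(( $1 * 60 / ${2:-30} ))
}

# Average cost per sweep in whole cents (integer division).
#   $1 = total spend in cents, $2 = number of sweeps
cents_per_sweep() {
  echo $(( $1 / $2 ))
}
```

sweeps_in 8 prints 16 and sweeps_in 24 prints 48. With 16 sweeps, cents_per_sweep 200 16 prints 12 and cents_per_sweep 300 16 prints 18, so the $2-3 figure works out to somewhere around 12 to 18 cents per idle call.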

Disable startup context to prevent this:

openclaw gateway stop
openclaw config set agents.defaults.contextInjection.startupContext.enabled false
openclaw gateway start

ℹ️ If you're using Ollama as your primary model with Gemini as a fallback (as set up in this series), this is less of a concern — the sweep uses your local model at zero cost. But if you ever switch to an Anthropic API key as primary, disable this first.

Verifying the Installation

1. Check gateway status:

openclaw gateway status

A healthy output shows:

Connectivity probe: ok
Capability: read-only
Runtime: running

2. Check if the Ollama model has been discovered:

openclaw models list

Should show ollama/llama3.2:3b as configured.

3. Check logs in real time:

openclaw logs --follow

# You can update logging level as below
openclaw config set logging.level "info"

Common Issues

systemd user services unavailable
You installed as root or switched via su. Exit completely and SSH back in directly as the ubuntu user. A proper login session is required for systemd user services.

Gateway starts, but Telegram is not connecting
The bot token isn't in the config, or IPv6 is still enabled and causing timeouts (see Part 2). Verify:

grep -i telegram ~/.openclaw/openclaw.json
curl -4 https://api.telegram.org/botYOUR_TOKEN/getMe

CIAO PROBING CANCELLED crash
mDNS is not disabled. Follow step 4 in the post-install configuration above.

Config changes disappear after gateway restart
The gateway rewrites openclaw.json on startup. Always stop the gateway before editing the config:

openclaw gateway stop
# make edits
openclaw gateway start

Model times out in agent mode but works in ollama run
OpenClaw's agent mode adds tool context, memory, and session history to every request, which is significantly heavier than a bare model call. Switch to llama3.2:3b if using a larger model, or add Gemini as a fallback to catch timeouts.

HTTP 401 on CLI commands
The OPENCLAW_TOKEN environment variable isn't set. Add it to ~/.bashrc (see step 6 above).

What's Next

Gateway running, Telegram configured, Ollama as primary, Gemini as fallback — all the pieces are in place.

Part 5, the final part, covers pairing your Telegram account, testing the end-to-end flow, and making the whole setup production-ready for daily use.

This article is the fourth in a five-part series:

$0 Personal Agentic AI Assistant - Architecture
Setting Up Free Cloud Server — VCN, ARM instances, static IPs, the gotchas
Running Ollama on ARM — model selection, disk management, CPU inference, reality
Installing OpenClaw on Linux — avoiding every trap ← you are here
The Complete Setup — Telegram, end-to-end testing

Stay tuned, all links will be updated as articles are published.

If you have reached this point, I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.


Running Local LLM - 0$ Personal Agentic AI Assistant - Part 3

AK DevCraft — Mon, 25 May 2026 15:00:00 +0000

Introduction

Part 3 of the Zero Dollar Personal AI Assistant series: Running Local LLMs on a Free Cloud Server, What Actually Works. Part 1 covers the architecture. Part 2 covers free Oracle Cloud setup.

Running a language model locally sounds straightforward until you try it. Download a model, point your app at it, done. In practice, there are real constraints: RAM limits, disk-space surprises, and CPU inference-speed walls that most tutorials gloss over.

This article is honest about all of it. What works on a free Oracle ARM instance, what doesn't, and how a hybrid local + free API fallback makes the whole thing practical.

The CPU Inference Reality Check

Before picking a model, understand what you're getting into.

Your Oracle ARM instance has no GPU. Every token generated by a language model runs on CPU cores. This matters because modern LLMs are designed to run on GPUs, whose massively parallel architecture is what makes inference fast. On a CPU, that parallelism doesn't exist in the same way.

What this means in practice:

Model size	RAM needed	Tokens/sec on 4 ARM CPUs	Response time (100 tokens)
3B parameters	~2GB	15-25 tok/s	4-7 seconds
8B parameters	~5GB	5-10 tok/s	10-20 seconds
14B parameters	~9GB	2-5 tok/s	20-50 seconds
70B parameters	~40GB	Won't fit	—

For a personal assistant responding to Telegram messages, 4-7 seconds for a short response is acceptable. You send a message, put your phone down, and pick it up again when the reply arrives. It's a different mental model from a real-time chat UI, but workable.
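The response-time column in the table is the token count divided by throughput. A one-line sketch makes it easy to try other reply lengths; the function name is hypothetical, and the figures are the table's estimates rather than new measurements.

```shell
# Seconds needed to generate a reply at a given throughput.
#   $1 = tokens in the reply, $2 = tokens per second
response_secs() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.1f\n", t / r }'
}
```

response_secs 100 25 prints 4.0 and response_secs 100 15 prints 6.7, which matches the 4-7 second range for the 3B model. A 300-token reply at 5 tok/s (response_secs 300 5) comes to 60.0 seconds, which is where the 8B model starts to feel slow.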

The mistake to avoid: pulling a 70B model because it benchmarks well. It needs 40GB RAM minimum and simply won't run on your instance. I learned this the hard way: a partial 42GB download filled the disk before the model even ran.

Installing Ollama

Ollama is the runtime that downloads and runs open-source models locally. Think of it as the music player; the models are the music it plays.

Always use tmux before long-running commands:

sudo apt install tmux -y
tmux new -s setup

If your SSH session drops mid-install, reconnect and run tmux attach -t setup to pick up exactly where you left off. Skipping tmux for a large model download is how you end up restarting from scratch.

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Verify it's running:

systemctl status ollama
ollama --version

Ollama installs as a systemd service and starts automatically on boot, so no manual management is needed.

Model Selection

This is where most guides give you a benchmark table and call it done. What actually matters for your use case is the RAM-to-quality tradeoff on CPU hardware.

The models that make sense for this stack:

Llama 3.2:3B — The Speed Choice

ollama pull llama3.2:3b

RAM: ~2GB
Speed: 15-25 tokens/second — fastest option
Quality: Good for everyday tasks, struggles with complex reasoning
Made by: Meta
Best for: Quick responses, simple Q&A, drafting short content

Llama 3.1:8B — The Quality Choice

ollama pull llama3.1:8b

RAM: ~5GB
Speed: 5-10 tokens/second
Quality: Significantly better reasoning, handles nuanced tasks
Made by: Meta
Best for: More complex tasks where quality matters more than speed

Phi-4:14B — The Reasoning Choice

ollama pull phi4

RAM: ~9GB
Speed: 2-5 tokens/second — noticeably slower
Quality: Strong reasoning and instruction following, punches above its weight
Made by: Microsoft
Best for: Tasks requiring careful reasoning, analysis, and structured output

The recommendation for this stack: llama3.2:3b

Not because it's the best model, it isn't. But because OpenClaw's agent mode wraps every model call with tool context, memory, session history, and system prompts. What feels fast in a bare ollama run test becomes significantly slower when the agent layer adds 2-3KB of context to every request. With that overhead, the 3B model stays within acceptable response times. The 8B model starts hitting timeout issues in agent mode on the CPU.

If you want better quality and can accept 30-90 second response times for complex queries, llama3.1:8b is worth trying.

Disk Space Management

Model files are large. Managing disk space proactively saves painful cleanup sessions later.

Check your current disk usage:

df -h
du -sh /usr/share/ollama/.ollama/models/

List downloaded models:

ollama list

Remove a model you no longer need:

ollama rm <modelname>

The gotcha with partial downloads:

If a download fails or you cancel it, Ollama leaves a partial file in the blobs directory. These can be gigabytes in size and won't show up in ollama list. Check and clean manually:

# Stop Ollama first
sudo systemctl stop ollama

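# Remove only partial downloads, as the ollama user (files are owned by this user).
# Assumption: partial files carry a "-partial" suffix; list the directory first to confirm.
# sh -c makes the glob expand as the ollama user, who can read the directory.
sudo -u ollama sh -c 'rm -f /usr/share/ollama/.ollama/models/blobs/*-partial*'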

# Restart
sudo systemctl start ollama
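Before deleting anything, it may help to see what is actually left over and how large it is. The sketch below lists candidate files with their sizes. It rests on an assumption that interrupted pulls leave files with "-partial" in their names, so inspect the output before trusting it. The function name is hypothetical.

```shell
# Hypothetical helper: list leftover partial downloads with sizes in KB.
#   $1 = blobs directory
list_partials() {
  # -maxdepth 1 keeps the search to the blobs directory itself;
  # du is only invoked when at least one file matches.
  find "$1" -maxdepth 1 -type f -name '*-partial*' -exec du -k {} +
}
```

For example: list_partials /usr/share/ollama/.ollama/models/blobs. No output means nothing matched the pattern.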

If the disk fills and growpart fails with "no space left on device", you need to free space before the partition can be extended, because even growing the volume requires temporary space. Remove partial downloads first, then retry growpart.
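A guard before each pull can keep a download from filling the disk in the first place. This is a minimal sketch: has_free_gb is a hypothetical name, and the threshold you pass is your own estimate of the model size plus headroom.

```shell
# Hypothetical helper: succeed only if the filesystem holding PATH has at
# least N GB available.
#   $1 = path on the target filesystem, $2 = required free space in GB
has_free_gb() {
  # df -Pk prints one header line, then one data line; column 4 is
  # the available space in 1K blocks.
  avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  need_kb=$(( $2 * 1024 * 1024 ))
  [ "$avail_kb" -ge "$need_kb" ]
}
```

Usage might look like has_free_gb /usr/share/ollama 10 && ollama pull llama3.1:8b, where 10 is an arbitrary safety margin rather than a measured requirement.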

The Hybrid Architecture: Local + Gemini Fallback

Here's the truth about local-only inference for an AI assistant: it works, but has a quality ceiling. The 3B model handles most everyday tasks fine. But occasionally, a complex question, a nuanced writing task, something that requires real reasoning, either produces a weak response or times out entirely.

The solution: use the local model as the primary and Google's Gemini API as a free fallback.

Why Gemini free tier works here:

250K Tokens Per Minute (TPM) on the free tier — more than enough for one person
No credit card required
Gemini 2.5 Flash-Lite responds in 1-2 seconds
When the local model times out, Gemini catches it automatically

The flow:

Your message
     ↓
Ollama llama3.2:3b (primary)
     ↓ if timeout or failure
Gemini 2.5 Flash-Lite (fallback) ← free, fast, no card needed
     ↓
Response to Telegram

Most responses come from the local model at zero cost. Complex queries or timeouts fall through to Gemini, also at zero cost. The experience from your phone is just: you send a message, you get a response.
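The flow above can be illustrated with a few lines of shell. This is not how OpenClaw implements fallback (it handles that internally once configured); it is only a sketch of the pattern, using the coreutils timeout command and a hypothetical function name.

```shell
# Illustration only: run a primary command under a time limit, and run a
# fallback command if the primary times out, fails, or prints nothing.
#   $1 = time limit in seconds
#   $2 = primary command (string), $3 = fallback command (string)
answer_with_fallback() {
  limit="$1"; primary="$2"; fallback="$3"
  if out="$(timeout "$limit" sh -c "$primary")" && [ -n "$out" ]; then
    printf '%s\n' "$out"
  else
    sh -c "$fallback"
  fi
}
```

In this stack the primary would be an ollama run call and the fallback a request to the Gemini API. With stand-in commands, answer_with_fallback 1 'sleep 3' 'echo cloud' prints cloud, because the primary exceeds its one-second limit.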

Get Your Gemini API Key

Go to aistudio.google.com
Click Get API Key → Create API key
Copy the key — it may start with AIza...

No credit card, no billing setup. Takes two minutes.

Verifying Everything Works

Check RAM usage while model is loaded:

free -h

With llama3.2:3b loaded, you should see ~2-3GB used out of 24GB, plenty of headroom for OpenClaw and everything else.

Check Ollama has auto-started:

systemctl status ollama

Should show active (running). The model itself loads into RAM only when first called and then stays resident for subsequent calls, which is why the first response after a reboot takes longer than subsequent ones.

Test Ollama directly:

ollama run llama3.2:3b "Just Reply OKAY!"

Should respond in under 10 seconds. If it takes longer, something is wrong with the Ollama service.

Test Gemini Model API Call

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent" \
  -H 'Content-Type: application/json' \
  -H 'X-goog-api-key: API_KEY' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Just! Reply OKAY"
          }
        ]
      }
    ]
  }'

The HTTP response status code should be 200, along with the response text, and you should see the call logged in Google AI Studio under Logs.

Common Issues

model requires more system memory than is available
You pulled a model too large for your RAM. llama3.3 requires 40GB, so it will never run on a 24GB instance. Remove it and pull a smaller model:

ollama rm llama3.3
ollama pull llama3.2:3b

Disk full during model download
The download filled your boot volume. Stop Ollama, remove partial files as the ollama user (not root), free space, then extend the partition if needed via Oracle Console → Boot Volume resize.

Ollama slow after reboot
The first call after a reboot loads the model into RAM, which is expected. Subsequent calls are faster since the model stays resident.

What's Next

With Ollama running and your hybrid local + Gemini fallback configured, the AI layer is ready.

Part 4 will cover installing OpenClaw on Linux — the right user, systemd service setup, the config file traps, and every mistake worth avoiding so you don't have to make them yourself.

This article is the third in a five-part series:

$0 Personal Agentic AI Assistant - Architecture
Setting Up Free Cloud Server — VCN, ARM instances, static IPs, the gotchas
Running Ollama on ARM — model selection, disk management, CPU inference, reality ← you are here
Installing OpenClaw on Linux — avoiding every trap
The Complete Setup — Telegram, Gemini fallback, end-to-end testing

Stay tuned, all links will be updated as articles are published.


Setting Up Free Cloud Server - $0 Personal Agentic AI Assistant - Part 2

AK DevCraft — Mon, 18 May 2026 14:00:00 +0000

Introduction

This is Part 2 of the Zero Dollar AI Assistant series. Part 1 - Running a Personal AI Assistant for $0 - Architecture covers the full stack and why it works.

AWS, Google Cloud, and Azure all have free tiers. However, they have one thing in common: a ticking clock. Twelve months, then the bill starts.

Oracle Cloud is different. Their Always Free Tier has no expiry date. The server you provision today will keep running for free as long as your account stays active, and the same applies to the free block storage allowance. That's the foundation this entire stack is built on.

This article walks you through setting up that foundation: creating a properly configured Oracle Cloud account, provisioning an ARM instance with the right specs, and solving every gotcha along the way.

Understanding Oracle's Always Free Tier

Before touching the console, know what you're getting into:

Always Free compute:

Up to 4 ARM CPU cores and 24GB RAM total across ARM instances — permanently free
2x VM.Standard.E2.1.Micro (1/8 OCPU, 1GB RAM each): also permanently free x86 instances, but avoid these for this stack, since 1GB RAM is far too tight for Ollama
These are not trial credits. They don't expire.

Always Free storage:

200GB total block storage
Enough for your server's boot volume with room to spare

What costs money:

Going beyond 4 OCPU / 24GB RAM on ARM
Reserved public IPs that aren't attached to a running instance
Outbound data beyond 10TB/month — in practice, a personal AI assistant will never get close to this

The shape you want is VM.Standard.A1.Flex — Oracle's ARM instance. Use the full free allocation: 4 OCPU and 24GB RAM. This gives Ollama maximum headroom and keeps the system responsive under load.

Account Setup

Free Tier vs Pay As You Go

Start by creating an Oracle Cloud account at cloud.oracle.com. You'll be on the Free Tier by default.

Here's the catch: ARM capacity on the free tier is frequently exhausted. You'll often see "Out of Capacity" errors across all availability domains. The fix is upgrading to Pay As You Go (PAYG).

PAYG sounds alarming, but it isn't:

Always Free resources remain free — your ARM instance still costs $0
You only pay if you exceed free tier limits
Oracle places a temporary $100 authorization hold on your card when you upgrade — this is a verification hold, not a charge, and disappears within 3-5 business days

Upgrade via: Billing → Upgrade and Manage Payment

Set a budget alert immediately after upgrading:

Billing → Budgets → Create Budget
Amount: $1
Alert at 50% and 100%

This ensures you're notified before any real spend happens. In practice, for this setup, you'll never see the alert trigger. My account's cost report for May shows $0.

Creating the VCN

Before launching an instance, you need a Virtual Cloud Network (VCN) with internet connectivity. This is where most people hit their first obstacle.

The critical mistake to avoid: Don't use the blue "Create VCN" button on the VCN list page. It creates a bare VCN with nothing attached — no subnets, no internet gateway, no routing. You'll end up with an instance that has no public IP and no way to SSH in.

The correct path — VCN Wizard:

Option A — From the Console home page:

Look for "Set up a network with a wizard" in the build section
Click it

Option B — From the VCN list page:

Hamburger (☰) → Networking → Virtual Cloud Networks
Look for "Start VCN Wizard" — it's separate from the "Create VCN" button

Either way, select "Create VCN with Internet Connectivity" and proceed. Give it a name (e.g., openclaw-vcn works), leave CIDR defaults, and click through.

The wizard automatically creates:

Public subnet with internet routing
Private subnet
Internet Gateway
NAT Gateway
Route tables and security lists — all pre-wired

Add SSH Access

After the VCN is created:

Go into your VCN → Subnets → click the Public Subnet
Click Default Security List → Add Ingress Rules
Add:
- Source CIDR: 0.0.0.0/0
- Protocol: TCP
- Destination port: 22
Save

Provisioning the Instance

Hamburger (☰) → Compute → Instances → Create Instance

Work through each section:

Image

Click Change Image and look carefully. Oracle lists ARM and x86 variants of Ubuntu side by side with nearly identical names:

Canonical Ubuntu 24.04 → x86 ❌
Canonical Ubuntu 24.04 Minimal aarch64 → ARM ✅

The aarch64 or Minimal aarch64 variant is what you want. Selecting the wrong one produces an "incompatible settings" error when combined with the ARM shape.

Recommended: Canonical Ubuntu 24.04 Minimal aarch64

The Minimal image strips unnecessary packages — smaller attack surface, less memory overhead, cleaner slate. Install any tools you need explicitly.

Shape

Shape: VM.Standard.A1.Flex
OCPUs: 4
Memory: 24GB

If you see "Out of Capacity":

Try each availability domain (AD-1, AD-2, AD-3)
Set fault domain to "No preference."
Try a different region
If all else fails, this is the primary reason to upgrade to PAYG — capacity is significantly more available

Networking

Select your openclaw-vcn
Select the public subnet
Check "Assign a public IPv4 address" ← easy to miss, critical

Boot Volume

Check "Specify a custom boot volume size" and set it to 100GB.

The default 46.6GB disappears quickly once Ubuntu, Node.js, Ollama, and a 5GB model are installed. 100GB stays well within your 200GB free storage allocation.

Leave in-transit encryption and confidential computing unchecked — both add overhead with no meaningful benefit for this use case.

SSH Keys

Generate a key pair or upload your existing public key. If you generate a pair here, download and save the private key, because you'll need it every time you SSH in.

Click Create and wait 2-3 minutes for provisioning.🕛

Initial Server Configuration

From your laptop, use the SSH key configured above to connect to the Oracle Cloud server you just provisioned. Copy the public IP from the instance's details.

SSH in:

chmod 400 ~/.ssh/id_rsa
ssh -i ~/.ssh/id_rsa ubuntu@YOUR_INSTANCE_PUBLIC_IP

NOTE: Run these steps in order — don't skip them

System Update

sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential curl wget tmux

Install tmux immediately — you'll use it for every long-running command. Starting a download without tmux and having SSH disconnect is a painful lesson.

Disable IPv6

Oracle Cloud's network doesn't support IPv6 for outbound connections in most configurations. OpenClaw's Telegram integration will repeatedly time out and fail if IPv6 is active:

echo 'net.ipv6.conf.all.disable_ipv6 = 1' | sudo tee -a /etc/sysctl.conf
echo 'net.ipv6.conf.default.disable_ipv6 = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Verify it worked:

curl -6 https://api.telegram.org 2>&1 | head -3
# Should fail immediately — that's correct
curl -4 https://api.telegram.org 2>&1 | head -3
# Should succeed

Install Node.js

OpenClaw requires Node.js 22+. Install via NVM for flexibility:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
source ~/.bashrc
nvm install 22
nvm use 22
nvm alias default 22
node --version  # should show v22.x.x

Also install system-level Node (used by OpenClaw's systemd service):

curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs
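Because two Node.js installs now exist side by side, it can be worth checking that each one meets the 22+ requirement. The sketch below parses a version string; the function name is hypothetical, and the assumption that the system-level binary lives at /usr/bin/node should be confirmed on your machine.

```shell
# Hypothetical helper: succeed if a `node --version` string is v22 or newer.
#   $1 = version string such as "v22.3.0"
node_major_ok() {
  major="${1#v}"        # drop the leading "v"
  major="${major%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # not a number: treat as a failed check
  esac
  [ "$major" -ge 22 ]
}
```

For example, node_major_ok "$(node --version)" checks the NVM install in your shell, and node_major_ok "$(/usr/bin/node --version)" checks the system one.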

Useful Aliases

echo "alias clearmem=\"sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'\"" >> ~/.bashrc
source ~/.bashrc

clearmem clears the page cache when memory gets tight. You'll use it occasionally, even with 24GB of RAM.

Verifying the Setup

Before moving on to Ollama and OpenClaw installation, confirm everything is healthy:

# Verify ARM architecture
uname -m
# Expected: aarch64

# Verify disk space
df -h
# /dev/sda1 should show ~96GB available

# Verify Node.js
node --version
# Should show v22.x.x

# Verify IPv6 is disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
# Should show: 1

# Verify internet connectivity
curl -s https://api.telegram.org | head -3
# Should respond

If all five pass, your foundation is set up. You're ready for Part 3 — installing Ollama and picking the right model for your hardware.
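If you would rather not compare each value by eye, the checks above can be wrapped in a tiny pass/fail helper. This is a sketch with a hypothetical function name; it only compares strings and reports the result.

```shell
# Hypothetical helper: compare an expected value with an actual one.
#   $1 = label, $2 = expected, $3 = actual
check() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1"
  else
    echo "FAIL $1 (expected $2, got $3)"
    return 1
  fi
}

# Example wiring for two of the checks above (commented out so that
# sourcing this file has no side effects):
# check arch aarch64 "$(uname -m)"
# check ipv6-disabled 1 "$(cat /proc/sys/net/ipv6/conf/all/disable_ipv6)"
```

The disk and Node.js checks need a range comparison rather than an exact match, so they are better left as manual checks or given their own helpers.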

Common Issues and Fixes

This instance has incompatible settings
You selected the x86 Ubuntu image with the ARM shape. Go back to image selection and choose the aarch64 variant.

Out of Capacity for VM.Standard.A1.Flex
Try all three ADs, set fault domain to "No preference", and consider upgrading to PAYG. Capacity is the single most common blocker for new Oracle free-tier accounts.

Instance created but no public IP showing
The public subnet wasn't selected during creation, or "Assign a public IPv4 address" wasn't checked. Easiest fix: terminate and recreate with the correct networking settings.

SSH connection refused
The SSH ingress rule wasn't added to the security list. Go to VCN → Public Subnet → Default Security List → add TCP ingress on port 22.

SSH times out after a few minutes of inactivity
Add to ~/.ssh/config on your local machine:

Host *
  ServerAliveInterval 60
  ServerAliveCountMax 10

What's Next

With your Oracle Cloud instance running, IPv6 disabled, and Node.js installed, the infrastructure layer is done.

Part 3 of this zero-dollar AI Assistant series will cover installing Ollama on ARM, choosing the right model for CPU inference, managing disk space with large model files, and the hybrid local + Gemini fallback architecture that makes the whole thing actually usable.

This article is the second in a five-part series:

$0 Personal Agentic AI Assistant - Architecture
Setting Up Free Cloud Server — VCN, ARM instances, static IPs, the gotchas ← you are here
Running Ollama on ARM — model selection, disk management, CPU inference reality
Installing OpenClaw on Linux — avoiding every trap
The Complete Setup — Telegram, Gemini fallback, end-to-end testing

Stay tuned, all links will be updated as articles are published.


$0 Personal Agentic AI Assistant - Architecture - Part 1

AK DevCraft — Mon, 11 May 2026 14:30:00 +0000

Introduction

A productivity tool that promised to change everything, charged monthly, and quietly became background noise. AI assistants are going the same way: another tab, another login, another $20/month for something you open twice a week. Most of us probably regret at least one subscription we pay for today.

Welp! What if you didn't have to?

In today’s world, the infrastructure exists to run a capable, always-on personal AI assistant, one that lives in your day-to-day regular apps like Telegram or WhatsApp, remembers you, browses the web, and handles real tasks — for exactly zero dollars a month. Not a trial. Not a teaser. Permanently free, on infrastructure you control.

This article explains the architecture that makes it possible and why each piece matters.

The Subscription Trap

Most people's AI setup looks like this: Claude.ai, ChatGPT, or any other AI providers in a browser tab or mobile app, opened when needed, closed when done. Conversations are saved, and you can go back to what you discussed last time if you're in the same thread. But it's passive history, not active memory. You have to go and find it. And across that whole time, it couldn't reach out, take action, or do anything unless you opened it first.

That's not an assistant. That's a very smart search box.

A real assistant is always on. It knows who you are. It operates in the apps you already use. It can take actions, not just generate text, and it doesn't charge you for existing.

Until recently, building that required either paying cloud AI bills or owning serious hardware. Both were out of reach for most people. That has changed.

Three Shifts That Can Make This Possible

1. Open-weight models are now genuinely capable

Meta's Llama, Google's Gemma, and others have closed the gap with proprietary models significantly over the past few years. A 3-8 billion parameter model running locally can handle the majority of the everyday tasks people actually use AI assistants for: summarising, drafting, answering questions, and light reasoning.

2. Cloud providers offer permanently free compute

Oracle Cloud's Always Free tier gives you up to 4 ARM CPU cores and 24GB of RAM — permanently, with no expiry date. Not a 12-month trial like AWS. Not credits that run out. A real server running 24/7 at zero cost, forever, as long as you keep the account active.

That's enough to run Ollama with a capable local model.

3. Free API tiers have become genuinely useful

Google's Gemini 2.5 Flash-Lite is generally capped at 250K Tokens Per Minute (TPM) on the free tier with no credit card required. For a personal assistant handling one person's queries, that's more than enough headroom. When a local model is too slow or too limited for a task, Gemini catches it — for free.

Put these three things together, and the economics change completely.

The Architecture - Tech Stack

Oracle Cloud ARM Instance — your always-on server. 4 CPU cores, 24 GB RAM, permanently free. Hosts everything. Never sleeps, never charges.
Ollama: runs open-source language models locally on your server. No API calls, no cost, no data leaving your machine. The primary brain for most tasks.
Gemini API (free tier) — Google's fallback for when the local model is too slow or hits a complex task. 1,000 free requests per day—no credit card.
OpenClaw — The agent layer that ties everything together. Connects to Telegram, maintains memory across conversations, runs scheduled tasks, and routes requests between local and cloud models intelligently.
Tavily - AI-native search engine, used for real-time web search
MCP - Model Context Protocol server, the bridge that fetches data from a backend service or engine

What It Can Actually Do

This isn't just a toy setup. On this stack, you get:

Telegram access — message your agent from your phone, anywhere, like texting a person
Persistent memory — it remembers your preferences, ongoing projects, and past conversations
Web search — real-time search via Tavily's free tier integrated directly into responses
File operations — read, write, and summarise documents on the server
GitHub integration — search issues, review code, summarise pull requests
Scheduled tasks — set reminders, recurring summaries, automated workflows
Custom agents — define specialised subagents for specific tasks (code review, research, writing)

What it can't do as well as a paid service: complex multi-step reasoning at speed, very long document analysis, and tasks that push the limits of a 3B parameter model. For those, the Gemini fallback steps in.
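
The local-first, cloud-fallback idea can be sketched in a few lines of Python. This is only my illustration of the routing decision, not OpenClaw's actual logic: `ask_local`, `ask_cloud`, and the word-count threshold are hypothetical stand-ins for the real Ollama and Gemini calls.

```python
def route(prompt, ask_local, ask_cloud, max_local_words=200):
    """Return (backend, answer): try the local model first, fall back to cloud."""
    # Assumption: very long prompts are too heavy for a small local model.
    if len(prompt.split()) > max_local_words:
        return "cloud", ask_cloud(prompt)
    try:
        return "local", ask_local(prompt)
    except Exception:
        # The local model failed (timeout, out of memory, ...): use the fallback.
        return "cloud", ask_cloud(prompt)


# Hypothetical stand-ins for the real model calls.
def fake_local(prompt):
    return f"local answer to: {prompt}"


def fake_cloud(prompt):
    return f"cloud answer to: {prompt}"


def broken_local(prompt):
    raise TimeoutError("local model timed out")


if __name__ == "__main__":
    print(route("What's on my calendar today?", fake_local, fake_cloud))
    print(route("What's on my calendar today?", broken_local, fake_cloud))
```

A real router would probably look at more than prompt length (task type, current load, rate limits), but the shape stays the same: cheap path first, paid-quality path only when needed.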

The Honest Tradeoffs

Zero cost doesn't mean zero compromise. Know what you're getting into:

Speed — local CPU inference is slower than cloud APIs. A response that takes a few seconds on Claude.ai might take > 30 seconds locally. With Gemini as a fallback, complex tasks are fast. Simple tasks on the local model are slow but free.
Quality ceiling — a 3B local model is noticeably less capable than Claude Sonnet or GPT-4. For writing, summarisation, and Q&A, it's fine. For nuanced reasoning or complex code, it shows limitations.
Setup effort — this is not a five-minute install. There are VCN configurations, systemd services, API keys, and model downloads involved. It takes an afternoon to set up correctly. Once running, it requires minimal maintenance.
Oracle ARM capacity — Oracle's free ARM instances are in high demand. You may need to retry provisioning multiple times or upgrade to Pay As You Go (which still costs $0 for Always Free resources) to get reliable access.

Who This Is For

It makes sense if:

You're comfortable with a terminal and basic Linux
You want AI infrastructure you actually control
You're experimenting and don't want ongoing costs
You're comfortable with slower responses in exchange for zero cost

It doesn't make sense if:

You need production-grade reliability
Response speed is critical
You want a turnkey experience with no configuration
You'd rather pay $10-20/month for something that just works

For the right person, this is the most interesting AI setup you can build right now. Not because it beats the paid alternatives on any individual metric, but because it's yours: running on your server, with your data, on your terms, for nothing. And most importantly, because the agent lives on the server, the private data on your laptop stays far away from accidental exposure.

What's Next

This article is the first in a five-part series:

The Architecture ← you are here
Setting Up Free Cloud Server — VCN, ARM instances, static IPs, the gotchas
Running Ollama on ARM — model selection, disk management, CPU inference reality
Installing OpenClaw on Linux — avoiding every trap
The Complete Setup — Telegram, Gemini fallback, end-to-end testing

Stay tuned, all links will be updated as articles are published.

If you have reached this point, I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

Subagents: The Building Block of Agentic AI

AK DevCraft — Mon, 27 Apr 2026 14:30:00 +0000

Problem to Solve

Most developers' first encounter with AI is a single prompt, a single response. It feels powerful — until the task gets complex. Ask an AI to research three competitors, synthesize the findings, and format them as a report, and a single context window starts to feel very small. This is the problem subagents solve.

Before you move forward - Here is my new article on my agentic setup Running a Personal AI Assistant for $0 - Part 1 - Architecture

Demo

In the demo above, I'm invoking the agent explicitly by name. In orchestrated workflows, Claude can invoke multiple agents like this automatically — in parallel or sequentially — based on the task structure.

Skills GitHub Repo - .claude/agents/code-quality-reviewer.md

What Is a Subagent?

A subagent is an AI instance invoked by an orchestrating AI to handle a specific subtask within a larger workflow. In multi-agent systems broadly — whether built on Claude, GPT, Gemini, or open-source models — the core pattern is the same: rather than a single model doing everything sequentially, an orchestrator breaks work into pieces and delegates them. Much like a technical lead assigning work to specialists rather than writing every function themselves.

According to Anthropic's Claude Agent SDK documentation, subagents serve two core purposes: parallelization (running multiple tasks simultaneously) and context isolation (each subagent uses its own context window, returning only relevant results to the orchestrator rather than its full context).[^1]

Within their assigned scope, subagents are active execution units — they can browse the web, execute code, read and write files, and call external APIs. They don't just reason; they act.

How a Subagent Workflow Works

The orchestrator doesn't do the heavy lifting — it coordinates. Each subagent receives a focused prompt with a clear objective, output format, and tool access, then returns a concise result. The orchestrator aggregates those results into the final deliverable.

Anthropic's internal research system uses exactly this pattern: a lead agent spawns subagents to explore different aspects of a query in parallel, then compiles their findings into a coherent answer. Their evaluations found that this multi-agent approach outperformed single-agent Claude Opus 4 by 90.2% on an internal research evaluation.[^2]

Orchestration Patterns

Not all subagent workflows are structured the same way. Three patterns cover most real-world cases:

Parallel fan-out — Independent subtasks launch simultaneously. Best for tasks like analyzing multiple documents at once.
Sequential pipeline — Each subagent's output feeds the next. Best when there's a dependency chain (research → draft → edit → format).
Hierarchical delegation — A subagent itself becomes an orchestrator for deeper subtasks. Powerful, but adds coordination complexity.

Choosing the wrong pattern is a common mistake. Parallelizing a sequential task adds overhead without benefit; sequentializing an independent task wastes time.
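
To make the first two patterns concrete, here is a minimal Python sketch. It assumes a subagent can be modelled as a plain function from input to output; `summarize`, `draft`, and `edit` are hypothetical stand-ins, not real agent calls.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(subagent, tasks):
    """Parallel fan-out: run independent tasks at once, results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        return list(pool.map(subagent, tasks))


def pipeline(stages, initial):
    """Sequential pipeline: each stage's output becomes the next stage's input."""
    result = initial
    for stage in stages:
        result = stage(result)
    return result


# Hypothetical stand-ins for real subagent calls.
def summarize(doc):
    return f"summary({doc})"


def draft(notes):
    return f"draft({notes})"


def edit(text):
    return f"edit({text})"


if __name__ == "__main__":
    print(fan_out(summarize, ["doc-a", "doc-b", "doc-c"]))
    print(pipeline([summarize, draft, edit], "research notes"))
```

Hierarchical delegation is the same idea applied recursively: a stage in the pipeline can itself call `fan_out` for its own subtasks.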

Creating Subagents with Claude Code CLI

Claude Code gives you two ways to create subagents: interactively via the /agents command, or manually as markdown files. Both result in the same thing — a .md file in a .claude/agents/ directory.

The Interactive Way

/agents create

This walks you through a guided setup: name, description, tools, model, and scope. At the end, it saves the file, and the agent is available immediately.

The Manual Way

Create a markdown file directly — the frontmatter defines behavior, the body is the system prompt:

---
name: security-reviewer
description: "Expert security reviewer. Use PROACTIVELY after any changes to auth, data handling, or API endpoints."
tools: Read, Grep, Glob
model: haiku
permissionMode: plan
---

You are a senior security engineer reviewing code for vulnerabilities.
When invoked:
1. Identify recently changed files
2. Analyze for OWASP Top 10 vulnerabilities
3. Check for secrets, SQL injection, and hardcoded credentials
4. Report findings with severity levels and remediation steps

Where the File Lives

Subagent scope is determined by which directory the file is placed in:

Scope	Path	When to use
Project	`.claude/agents/` in project root	Team-shared agents, commit to version control
User	`~/.claude/agents/` in home dir	Personal agents available across all projects

Project scope is the recommended default — it makes subagent definitions shareable via version control. Use user scope for general-purpose agents you want available everywhere, regardless of which repo you're in.

Best Practices

Write descriptions that trigger correctly. Claude uses the description field to decide when to invoke a subagent automatically. Be specific and include PROACTIVELY if you want it auto-triggered — for example: "Use PROACTIVELY after any changes to authentication or data handling."
Restrict tools intentionally. The tools field restricts what the agent can do — a security auditor only needs Read, Grep, and Glob, and has no business writing files. That restriction is worth being explicit about.
Match model to task complexity. Route subagent exploration to cheaper, faster models like Haiku and reserve Opus for genuine architectural reasoning. A read-only code scanner doesn't need the same model as an agent writing production code.
Keep system prompts focused. A subagent with a narrow, well-defined role outperforms a generalist one. If the prompt starts covering many different concerns, split it into two agents.

The Practical Challenge: Context Management

Each subagent starts fresh with no shared memory. This means:

The orchestrator must craft every subagent prompt with all the context it needs to succeed
Subagent outputs must be concise enough to fit back into the orchestrator's context alongside everything else
If outputs are large, the orchestrator must summarize before aggregating

Good subagent design is largely information architecture: what does each agent need to know, what must it produce, and how does that output flow back into the whole.
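
The "summarize before aggregating" step can be sketched like this. It is a rough illustration under loud assumptions: character counts stand in for tokens, the budget is split evenly, and `truncate` is a placeholder for a real summarizer that would call a model.

```python
def fit_to_budget(outputs, budget_chars, summarize):
    """Shrink subagent outputs so that together they fit the orchestrator's budget."""
    if not outputs:
        return []
    # Assumption: every subagent gets an equal share of the budget.
    per_output = budget_chars // len(outputs)
    fitted = []
    for text in outputs:
        if len(text) > per_output:
            text = summarize(text, per_output)
        fitted.append(text)
    return fitted


def truncate(text, limit):
    """Placeholder summarizer: keeps only the first `limit` characters."""
    return text[:limit]


if __name__ == "__main__":
    results = ["long finding " * 50, "short finding"]
    for item in fit_to_budget(results, 200, truncate):
        print(len(item), item[:40])
```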

Safety Considerations

When subagents can take real-world actions, safety boundaries matter more than in single-agent systems:

Least privilege — Give each subagent only the tools it actually needs. A research agent doesn't need write access to a production database.
Output validation — Don't blindly pass subagent outputs downstream. Even lightweight sanity-checking reduces blast radius.
Prompt injection — Subagents that browse the web or read external files can encounter content designed to manipulate their behavior. This is a real attack surface in agentic systems.

When to Use Subagents

Use subagents when…	Avoid subagents when…
Task has clearly separable subtasks	Task is straightforward for one agent
Parallel execution saves meaningful time	Subtasks share too much state to delegate
Total work exceeds one context window	Coordination overhead exceeds the benefit
Different subtasks need different tools	You're still prototyping — stay simple first

Start with a single-agent approach and introduce orchestration when it genuinely starts to strain. Complexity has a cost.

Subagents vs. Skills: A Quick Note

These two terms sometimes get conflated, but they operate at completely different layers. A skill is a passive instruction document — a markdown file Claude reads before a task to understand best practices, available libraries, and output conventions. A subagent is an active execution unit that runs, uses tools, and returns results.

The honest relationship: a subagent might read a skill before doing its work. One shapes knowledge; the other executes.

Wrapping Up

The broader Claude stack can be conceptualized as five layers: MCP for connectivity, Skills for task-specific knowledge, Agent as the primary worker, Subagents as parallel independent workers, and Agent Teams for coordination.[^3] These building blocks are shipping in rapid succession, and the pattern is maturing fast.

For developers just entering this space: you don't need to build full orchestration systems on day one. But understanding the pattern (how delegation works, what subagents can and can't do, where the sharp edges are) will shape how you think about AI architecture from the start.

Subagents aren't a feature. They're a shift in how we think about what AI can be tasked with doing.

I have also shared my experience with an MCP client-server project; here is the article: Working with MCP: What Stood Out

References

[^1]: Anthropic Engineering. Building agents with the Claude Agent SDK. -> https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
[^2]: Anthropic Engineering. How we built our multi-agent research system. -> https://www.anthropic.com/engineering/multi-agent-research-system
[^3]: Winbuzzer. Anthropic Shows How to Scale Claude Code with Subagents and MCP. -> https://winbuzzer.com/2026/03/24/anthropic-claude-code-subagent-mcp-advanced-patterns-xcxwbn/
[^4]: Anthropic. Create custom subagents — Claude Code Docs. -> https://code.claude.com/docs/en/sub-agents

If you have reached this point, I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

“Skills” in Claude Aren’t About Prompts — They’re About Context Design

AK DevCraft — Mon, 13 Apr 2026 16:58:59 +0000

Introduction

When I first came across “Skills” in Claude Code, it looked like just another abstraction over prompts. I kept repeating the same instructions to Claude — code review rules, API patterns, formatting — across multiple sessions.

But after spending some time with them, the interesting part wasn’t what skills are, but what problem they're trying to solve.

Before you move forward - Here is my new article on my agentic setup Running a Personal AI Assistant for $0 - Part 1 - Architecture

Demo

In the demo above, a code-review skill is defined. When I ask Claude to “review modified files”, it automatically matches the request to the skill and executes it — without needing explicit invocation.

Skills GitHub Repo - .claude/skills/code-review

Context Window Challenge

One of the biggest challenges with LLM usage in real workflows is:

Context keeps growing
Instructions get repeated
Conversations lose focus over time

Most people try to fix this by writing better prompts or adding more context. However, that approach doesn’t scale: the more context you pack into a prompt, the more of the context window it consumes.

Definition

First, let’s go over the official definition of Skills.

Skills are folders of instructions and resources that Claude Code can discover and use to handle tasks more accurately. Each skill lives in a SKILL.md file with at least a name and description in its frontmatter.

A little more about Skills

Skills can be created in different locations depending on whether they belong to a user or a project. Either way, a skill ends up at a path like .claude/skills/code-review/SKILL.md. The only difference is where the .claude directory lives.

User's personal skill: Create .claude directory in your home directory, e.g., ~/.claude/skills/code-review/SKILL.md
Project-specific skill: create .claude directory in the root of the project, e.g., project/.claude/skills/code-review/SKILL.md

Apart from the user and project levels, skills can be stored at the enterprise and plugin levels. And if you have a skill with the same name in all locations (not a desirable approach, but hypothetically), then the skill is applied based on the following precedence order:

1. Enterprise — managed settings, highest priority. All users in your organization.
2. Personal — your home directory (~/.claude/skills/<skill-name>/SKILL.md). All your projects.
3. Project — the .claude/skills/<skill-name>/SKILL.md directory inside a repository. A particular project only.
4. Plugins — installed plugins <plugin>/skills/<skill-name>/SKILL.md, lowest priority. Where the plugin is enabled.
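
The precedence order above boils down to "first match wins". Here is a small Python sketch of that rule; the level names and the `available` mapping are my own illustrative data model, not how Claude Code stores skills internally.

```python
# Highest priority first, mirroring the list above.
PRECEDENCE = ["enterprise", "personal", "project", "plugin"]


def resolve_skill(name, available):
    """Return the level whose copy of the skill wins, or None if it isn't defined.

    `available` maps a level name to the set of skill names defined there.
    """
    for level in PRECEDENCE:
        if name in available.get(level, set()):
            return level
    return None


if __name__ == "__main__":
    available = {
        "personal": {"code-review"},
        "project": {"code-review", "commit-format"},
    }
    print(resolve_skill("code-review", available))
    print(resolve_skill("commit-format", available))
```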

Skills Flow Diagram

User Request
     ↓
[Skill Description Match]
     ↓
[Skill Loaded]
     ↓
[Context Applied]
     ↓
Claude Response

What Skills Actually Change

At a surface level, skills look simple:

Reusable instructions that Claude can apply

But the real shift is deeper:

Skills separate “when to use context” from “what the context is.”

That’s a big deal. Instead of loading everything up front or manually injecting instructions, you define a trigger (description) and a payload (instructions).

Skills work best for specialized knowledge that applies to specific tasks:

Code review standards
Commit message formats
Debugging checklists for a project

Claude handles the rest.

When you find yourself repeating the same instructions again and again, you most likely need a skill.

The Most Important Part: Description

If there’s one thing that matters more than anything else:

Your skill's description is the entry point to your system

This is effectively a semantic matcher and routing mechanism. In practice, this means:

Two skills with similar descriptions will conflict
A vague description won’t trigger
An overly broad one will trigger at the wrong time

👉 You’re not just writing instructions — you’re designing intent matching

Skills as a Context Control Mechanism

What changed my thinking was this:

Skills are not a convenience feature
They are a context management strategy

That's because they are lazy-loaded (pulled in only when needed), scoped (not always present), and composable (multiple skills can activate together).

This directly impacts:

Response quality
Token usage
System predictability

Designing Skills (What Actually Matters)

Let's see how we can design skills to get the most out of them.

1. Skill Schema (Only Important Ones)

The skills open standard supports many fields in the SKILL.md frontmatter. Only two of them are required; the ones worth knowing are:

name (required) — Identifies your skill. Use lowercase letters, numbers, and hyphens only. Maximum 64 characters. Should match your directory name.
description (required) — Tells Claude when to use the skill. Maximum 1,024 characters. This is the most important field because Claude uses it for matching.
allowed-tools (optional) — Restricts which tools Claude can use when the skill is active.
model (optional) — Specifies which Claude model to use for the skill.
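
The limits on the two required fields are easy to check before you commit a skill. This is a minimal validator sketch based only on the rules listed above; `validate_skill` is my own helper name, not part of any Claude tooling.

```python
import re

# Lowercase letters, numbers, and hyphens only (at least one character).
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def validate_skill(name, description):
    """Return a list of problems with the required frontmatter fields."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, numbers, and hyphens only")
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not description.strip():
        problems.append("description is required")
    if len(description) > 1024:
        problems.append("description exceeds 1,024 characters")
    return problems


if __name__ == "__main__":
    print(validate_skill("code-review", "Review modified files for style and bugs."))
    print(validate_skill("Code Review", ""))
```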

2. Keep the core small

Don’t treat a skill like a knowledge dump. Keep instructions focused and push details to supporting files.
Think of this as an entry point, not an encyclopedia.

3. Use constraints intentionally

Parameters like tool restrictions aren’t just configuration. They’re guardrails.

Examples:

Read-only analysis skills
Restricted execution environments
Safer automation workflows

4. Structure for scale, not just usage

The real challenge isn’t creating a skill — it’s managing many of them.

Things that start to matter:

Naming clarity
Description uniqueness
Conflict avoidance
Ownership (personal vs project vs org)

👉 This becomes closer to system design than prompt design

Skills vs Everything Else

A common mistake is trying to use skills for everything. But each feature solves a different problem. Claude Code has several ways to customize behavior. Skills are unique because they're automatic and task-specific. Here's a quick comparison:

CLAUDE.md file is loaded into every conversation. If you want Claude to always use a certain file formatting, that goes in CLAUDE.md.
Skills load on demand when they match your request. Claude only loads the name and description initially, so they don't fill up your entire context window. Your code review checklist doesn't need to be in context when you're debugging — it loads when you actually ask for a review.
Subagents run in isolated execution contexts; use them for delegated work. Skills add knowledge to your current conversation. For more on subagents, see my article Subagents: The Building Block of Agentic AI.
Hooks are event-driven (they fire on an event such as a file save). Skills are request-driven (they activate based on what you're asking Claude to do).
MCP servers provide external tools and integrations, a different category entirely from skills. For more, see my article Working with MCP: What Stood Out.
Slash commands require you to explicitly type them. Skills don't. Claude applies them when it recognizes the situation.

How This Shifts Thinking

Without Skills, we were thinking in terms of:

“How do I write better prompts?”

Now it’s more like:

“How do I design context that shows up at the right time?”

That’s a different level of control.

Practical Takeaways

If you’re starting out:

Don’t create many skills — create one good one
Spend more time on the description than the instructions
Keep skills narrow and intentional
Treat conflicts as a design problem, not a bug

And most importantly:

👉 If context is growing uncontrollably, skills are probably the missing layer.

Final Thoughts

👉 “Skills are not a feature. They’re a shift in how you structure AI systems.”

But the idea behind them is bigger:

👉 Moving from prompt writing → to context system design

And that’s where LLM usage starts becoming predictable, not just powerful.

I have also shared my experience with subagents; here is my article: Subagents: The Building Block of Agentic AI.

If you have reached here, then I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

Working with MCP: What Stood Out

AK DevCraft — Mon, 30 Mar 2026 18:40:40 +0000

Background

I spent some time exploring the Model Context Protocol (MCP). Not a deep dive—just a hands-on attempt to understand how it actually works.

So I built a minimal client + server setup:
👉 https://github.com/an1meshk/claude-chat-cli

In a real-world project, the client and server would be built separately; since this was just a demo, both live in the same project.

A few things stood out more than I expected.

Before you move forward - Here is my new article on my agentic setup Part 1- Running a Personal AI Assistant for $0 - Architecture

Demo

🧠 Context

The goal wasn’t to learn everything about MCP. It was to answer a simpler question:
How hard is it to actually build something with it?

Before moving further, here is a quick MCP client and server architecture diagram:

⚙️ What I Tried

Went through the MCP intro course - course link
Built a minimal client and server in Python using the course’s startup kit
Connected it to a simple CLI chat client that already came with the startup kit

Nothing complex—but enough to see how things fit together.

💡 What Stood Out

1. Setup Is Simpler Than Expected

I was expecting a heavy setup. Instead:

Basic server setup was straightforward
The Python MCP library abstracts most of the complexity

👉 It felt closer to:
“Define a few handlers, and you’re up.”

than:

“Learn a new framework.”

2. The Real Concept Is Context Flow

The interesting part wasn’t the server setup. It was understanding:

How context flows
How tools are exposed
How resources are exposed
How the model interacts with them

That’s where the mental shift is.

🧠 What Helped Me Understand It

This simple mental model made things click:

image credit Anthropic academy course

Client → MCP Server → Tool → Response → Client

Client sends a request
MCP server routes it
The tool executes logic
Response flows back

Once this clicked, everything else made more sense
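
The same flow can be mimicked in plain Python. To be clear, this is a toy dispatcher and deliberately does not use the MCP SDK or its wire format; `TOOLS`, `tool`, and `handle_request` are names I made up to show the routing idea only.

```python
TOOLS = {}


def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a, b):
    """A trivial tool: the 'logic' the server executes on the client's behalf."""
    return a + b


def handle_request(request):
    """Server side: route the request to the named tool and wrap the result."""
    fn = TOOLS.get(request["tool"])
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {request['tool']}"}
    return {"ok": True, "result": fn(**request.get("arguments", {}))}


if __name__ == "__main__":
    # Client side: send a request, read the response that flows back.
    print(handle_request({"tool": "add", "arguments": {"a": 2, "b": 3}}))
    print(handle_request({"tool": "missing"}))
```

In the real protocol the client also discovers which tools exist before calling them, and that discovery step is what lets the model decide when a tool is worth using.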

⚡ If You Want to Try It

The setup is intentionally minimal:

Basic MCP server using Python
A simple tool
CLI client to trigger it

That’s enough to understand the core idea

Repo:
https://github.com/an1meshk/claude-chat-cli

You will need an Anthropic API key, as this project uses Claude

⚙️ One More Thing That Helped

The project uses uv for dependency management. That helps in a few ways:

Fast installs
Clean environment setup
Minimal friction

It removes a lot of the usual setup overhead

⚠️ What I Haven’t Explored Yet

This was just a first pass. Still curious about:

Structuring tools cleanly
Scaling MCP servers
Real production use cases

🧠 Initial Take

My early impression:

MCP isn’t hard to start with
But the real value is in how you design interactions

The setup is easy.
The thinking is where the work is.

🏁 Final Thought

Sometimes the best way to understand a new concept is:
Build the simplest possible version of it

This was one of those cases.

If you have reached this point, I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

Git Commands Workflow

AK DevCraft — Sun, 29 Mar 2026 06:11:00 +0000

⚡ Git Commands Workflow

When I started using Git, I tried to learn as many commands as possible.

But over time, I realized something:

You don’t need to know everything.
You need to master the workflow.

Most of my day-to-day work revolves around a small set of commands—used repeatedly in different contexts.

🧠 The Real Git Workflow

At a high level, Git revolves around three stages:

Working directory → where you make changes
Staging area → where you prepare changes
Repository → where changes are committed

Understanding this flow matters more than memorizing commands.
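
If the three stages feel abstract, a toy model helps. The Python class below is not how Git is implemented; `ToyRepo` is a made-up, simplified picture (whole files only, no branches, untracked files lumped in with unstaged ones) that shows what `add`, `commit`, and `status` do to each stage.

```python
class ToyRepo:
    """A simplified picture of Git's three stages."""

    def __init__(self):
        self.working = {}   # working directory: path -> current content
        self.staged = {}    # staging area: content prepared for the next commit
        self.commits = []   # repository: list of (message, snapshot)

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Copy the current content from the working directory into staging.
        self.staged[path] = self.working[path]

    def commit(self, message):
        # Record a snapshot of everything that is staged.
        self.commits.append((message, dict(self.staged)))

    def status(self):
        head = self.commits[-1][1] if self.commits else {}
        return {
            "staged": sorted(p for p in self.staged if self.staged[p] != head.get(p)),
            "unstaged": sorted(p for p in self.working if self.working[p] != self.staged.get(p)),
        }


if __name__ == "__main__":
    repo = ToyRepo()
    repo.edit("notes.txt", "v1")
    print(repo.status())
    repo.add("notes.txt")
    print(repo.status())
    repo.commit("Add notes")
    print(repo.status())
```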

🔄 1. Starting Work

git pull
git checkout -b feature/my-feature

What this does:

Sync latest code
Create a new branch

👉 This is usually how I start my day.

✍️ 2. Making Changes

git status
git add .

git status → shows what changed
git add → stages changes for commit

👉 I check the status a lot more than I expected.

💾 3. Saving Work

git commit -m "Add feature X"

Creates a snapshot of your changes
Helps track history over time

👉 Good commit messages matter more than the command itself.

🚀 4. Syncing with Remote

git push
git pull

push → sends your changes
pull → brings latest updates

👉 This is where most conflicts show up.

🌿 5. Working with Branches

git branch
git checkout main
git merge feature/my-feature

Branching helps isolate work
Merging brings changes together

👉 Simple concept—but causes most real-world issues.

🧰 6. Commands I Started Using Later

These didn’t make sense early on—but became useful over time:

git stash
git log
git diff

stash → save work temporarily
log → view history
diff → see changes

👉 These are workflow boosters, not essentials.

⚠️ What Actually Matters (Not the Commands)

Over time, I realized:

Git problems are rarely about commands
They’re about understanding state and flow

Most issues come from:

Not knowing what’s staged
Working on the wrong branch
Pulling at the wrong time

🧠 What I’d Do Differently

If I were starting again:

Focus on workflow, not commands
Learn:
- status
- add
- commit
- branch
- merge

👉 That’s enough for most real work.

⚡ Practical Takeaways

You don’t need 50 Git commands
Master the core workflow first
Use advanced commands only when needed

🏁 Final Thought

Git feels complex when you try to learn everything.

It becomes simple when you think in terms of:
“Where are my changes right now?”

If you have reached here, then I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

JMeter vs Gatling: Comparison for Modern Performance Testing

AK DevCraft — Sun, 22 Mar 2026 05:45:13 +0000

Introduction

Performance testing has been around for a long time. And if you’ve worked in this space, chances are you’ve used Apache JMeter.

It’s popular.
It’s reliable.
And it has served the industry well.

But is it still the best way to approach performance testing today?

🧠 The Shift: From Tooling → Engineering

Traditional performance testing tools like Apache JMeter are largely:

UI-driven
Configuration-heavy
File-based (XML test plans)

This works… until it doesn’t.

As systems become more complex and teams move toward:

CI/CD
Version-controlled infrastructure
“Everything as Code”

👉 Performance testing needs to evolve, too.

That’s where Gatling starts to stand out.

⚔️ JMeter vs Gatling — Key Differences

1. 🧩 Configuration vs Code

JMeter

Test plans are GUI-driven
Stored as .jmx files
Harder to review in pull requests
Merge conflicts are painful

Gatling

Fully code-based (Scala/Java/Kotlin)
Lives naturally in your codebase
Easy to version, review, and refactor

👉 This is the biggest shift:

Gatling treats performance tests like real software, not configuration.

2. 🚀 Learning Curve & Developer Experience

JMeter

Requires learning the tool + UI
Debugging can be unintuitive
Configuration becomes overhead over time

Gatling

Uses familiar programming languages
Easier for developers to adopt
Better IDE support

👉 You’re not learning a tool—you’re applying existing skills.

3. 🔄 CI/CD Integration

JMeter

Integration is possible, but not seamless
Often requires additional scripting

Gatling

Fits naturally into build pipelines
Works like any other test suite

👉 This aligns perfectly with modern DevOps practices.

4. 📊 Reporting & Insights

JMeter

Provides reports, but often requires interpretation
User ramp-up behavior isn’t always obvious
Some level of “guesswork” is involved

Gatling

Rich, interactive HTML reports out of the box
Clear visualization of:
- Active users
- Ramp-up patterns
- Response time distribution

👉 Observability is significantly better.

5. ⚙️ Maintainability at Scale

JMeter

Large test plans become difficult to manage
Reusability is limited
Refactoring is cumbersome

Gatling

Modular, reusable code
Easier to scale scenarios
Cleaner structure

👉 This becomes critical in large systems.

Here is a quick comparison:

🏁 So… Is JMeter Still Relevant?

Absolutely.

Apache JMeter is still:

Mature
Widely adopted
Backed by a strong community

For:

Quick testing
Non-developer teams
Legacy setups

👉 It still does the job very well.

💡 My Take

If your team is moving toward:

Microservices
CI/CD pipelines
Engineering-driven quality

Then:

👉 Gatling feels like the natural next step

Not because JMeter is bad…
…but because the way we build software has changed.

🔥 Final Thought

We’ve already embraced:

Infrastructure as Code
Tests as Code
Pipelines as Code

👉 Performance testing as code is the next logical step.

And in that world, Gatling has a clear edge.

If you have reached here, then I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

Practical Tips When Working with AI Coding Assistants

AK DevCraft — Sat, 14 Mar 2026 22:22:13 +0000

Introduction

Modern AI coding assistants like Claude, GitHub Copilot, and ChatGPT can dramatically accelerate development. Recently, while working on a feature update, I had to modify an existing API to fetch data from a new system while maintaining backward compatibility.

The migration was gradual. Some clients would continue using the old system for a while, while others would start using the new one. Because of that, the implementation had to support both behaviors during the transition period.

Like many developers today, I used an AI coding assistant to speed up the implementation.

At first, it seemed straightforward.

But the process turned out to be more interesting than expected.

Before you move forward, here is my new article on my agentic setup: $0 Personal Agentic AI Assistant for $0 - Architecture - Part 1

The First Iteration Looked Correct… But Wasn't Ideal

The AI-generated code worked functionally. It handled the new system integration, preserved backward compatibility, and integrated with the existing service.

But after reviewing the code carefully, a few issues surfaced:

Extra conditional branches that were unnecessary
Redundant logic left over from earlier iterations
Code paths that technically worked but were not optimal
Some defensive checks that were never needed for the actual use case

In other words, the code worked, but it wasn't clean.
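The sketch below shows what this kind of cleanup can look like. It is an invented example, not the code from the actual feature: both function names are hypothetical, and it assumes callers already validate client IDs, which is why the `None` check counts as unnecessary here.

```python
from typing import AbstractSet, Optional


def resolve_backend_verbose(client_id: Optional[str], migrated: AbstractSet[str]) -> str:
    """First-iteration style: correct, but carries branches it does not need."""
    if client_id is None:  # defensive check; assumed unreachable because callers validate IDs
        return "legacy"
    if client_id in migrated:
        if len(migrated) > 0:  # always true once the membership test has passed
            return "new"
        else:
            return "legacy"  # dead code path
    else:
        return "legacy"


def resolve_backend(client_id: str, migrated: AbstractSet[str]) -> str:
    """Reviewed version: same result for every valid client ID, one branch."""
    return "new" if client_id in migrated else "legacy"


if __name__ == "__main__":
    migrated = {"client-a"}
    for cid in ("client-a", "client-b"):
        print(cid, resolve_backend_verbose(cid, migrated), resolve_backend(cid, migrated))
```

Both functions return the same answer for valid input. The difference is how much a reviewer has to read and reason about before trusting it.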

It took multiple iterations and careful review before the implementation reached a version that was both correct and maintainable.

This experience reinforced something important.

AI Accelerates Coding — Not Thinking

AI assistants are excellent at generating working code quickly. They help remove boilerplate, explore possible implementations, and reduce the time spent writing repetitive logic.

However, they do not fully understand the context of your system.

They don't know:

The long-term architecture decisions
The migration strategy
The constraints of your system
What future developers will have to maintain

Because of this, AI often generates code that is technically correct but contextually imperfect.

And that is where code review becomes critical.

In the AI Era, Code Review Becomes a Core Skill

When development speed increases, the risk of suboptimal code entering the codebase also increases.

If developers accept AI-generated code as-is after the first iteration, teams may gradually accumulate:

Unnecessary abstractions
Redundant logic
Hidden technical debt
Dead code paths

Over time, these small issues compound, making systems harder to maintain.

This means developers must become even better reviewers than before.

Good code review is no longer just about catching bugs. It is about evaluating whether the code truly fits the system.

Questions We Should Now Ask During Code Review

When reviewing AI-assisted code, I now intentionally ask a few additional questions.

1. Does the code solve the problem exactly, or more than necessary?

AI often introduces extra flexibility that the use case does not require.
Extra flexibility today can easily become unnecessary complexity tomorrow.

2. Is there redundant or leftover logic?

Because AI suggestions often evolve across multiple prompts, some intermediate logic can remain even after the final version is generated.
This can result in code paths that are never actually used.

3. Is the implementation optimal for the system architecture?

AI may suggest patterns that work in general but do not align with your system's architecture.

Examples include:

Introducing unnecessary abstractions
Overusing defensive checks
Adding layers that increase complexity without real benefit

4. Will the code still make sense to the next developer?

Maintainability matters.
Even if the code works, the question remains:

Would another engineer understand this logic six months from now?

AI Changes the Development Workflow

One interesting shift I’ve noticed is that development is starting to look more like collaboration between a developer and an AI assistant.

The workflow increasingly looks like this:

Developer defines the intent
AI generates an implementation
Developer reviews and refines
AI proposes improvements
Developer validates architecture and constraints

The developer's role shifts slightly from writing every line of code to evaluating, refining, and validating generated code.

The Real Skill Shift for Engineers

As AI becomes more capable at writing code, the value of engineers will increasingly come from their ability to:

Review code critically
Evaluate trade-offs
Optimize implementations
Ensure architectural alignment

In other words, the skill of thinking about code becomes even more important than writing code.

Development Flow Diagram

Here is a quick comparison between the traditional and AI-assisted development flows.

Traditional Development

Developer
↓
Write Code
↓
Code Review
↓
Merge

AI-Assisted Development

Developer Idea
↓
AI Generates Code
↓
Developer Reviews & Refines
↓
AI Iteration
↓
Code Review
↓
Merge

AI accelerates code generation, but developers are still responsible for validating the architecture, reducing unnecessary complexity, and ensuring maintainability.

Final Thoughts

AI tools are powerful accelerators for software development. They can help teams move faster and explore solutions more quickly.

But speed should not come at the cost of quality.

If anything, the rise of AI in development makes strong code review practices more important than ever.

Ultimately, AI can generate code, but engineers are still responsible for the systems they build.

If you have reached here, then I have made a satisfactory effort to keep you reading. Please be kind enough to leave any comments or share any corrections.

☁️ Private Cloud vs Public Cloud — still one of the most misunderstood topics in cloud computing. Many discussions focus only on cost or control, but the real differences go deeper

AK DevCraft — Thu, 12 Mar 2026 05:37:00 +0000

AK DevCraft

Dec 12 '21

Private vs Public Cloud: Key Differences, Architecture, and Cloud Service Models

#cloud #cloudcomputing #containers #docker

Do you know how to stream Kubernetes logs from multiple pods concurrently?

AK DevCraft — Fri, 27 Feb 2026 05:14:43 +0000

AK DevCraft

Jan 3 '23

Streaming Kubernetes logs of more than one pod concurrently

#kubernetes #containers #docker #softwareengineering
