DEV Community: Susilo harjo

Five Rules That Cut My Homelab Logs by 95%

Susilo harjo — Wed, 08 Jul 2026 01:04:29 +0000

Five Rules That Cut My Homelab Logs by 95%

It is Tuesday morning. I am cleaning up disk space because my homelab server is screaming at 94% capacity. I run du -sh /var/log/* and the output stops me: 200GB. Just logs. Uncompressed. From 18 services I forgot I was running.

The largest offender: docker/json-file logs from a Prometheus container that has been logging every single scrape target every 15 seconds for eight months. 87GB by itself. Next: nginx access logs, 34GB, no rotation configured. Then Grafana, 22GB, debug level still on from a troubleshooting session in November.

I deleted 190GB in about ten minutes. Not because I was reckless. Because none of it was indexed. None of it was searchable. None of it had ever been read.

This is not a post about building a centralized logging stack with Elasticsearch and Kibana. That is a weekend project that becomes a second job. This is a post about the five rules I use now to keep logs under control on a single-server homelab. The rules that keep me at 10GB total instead of 200GB.

The Log Hoarding Problem

Here is what happened: I set up each service with default logging. Docker's default is "log everything forever until the disk is full." Systemd's default is "keep everything in the journal." Nginx's default is "write access logs with no rotation."

Over eight months, this accumulated. I did not notice because the server has a 1TB drive. I only noticed when I hit 94% capacity and started getting alerts.

The uncomfortable truth: I had never searched those logs. Not once. When I needed to debug something, I would docker logs --tail 200 the specific container. I would grep the last hour of syslog. I would check the Grafana dashboard. I never once needed the 87GB of Prometheus scrape logs from March.

I was hoarding logs the way I hoard browser tabs: "I might need this later." I never did.

Rule 1: Rotate Everything, Compress Always

The first fix is logrotate. Every service that writes to a file gets a rotation config. The pattern I use:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}

This keeps 7 days of logs, compresses them after one day, and creates fresh files with proper permissions. The delaycompress is important: it keeps the most recent rotated file uncompressed so you can still grep it quickly without decompressing.

For Docker containers, I set logging options in the daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

This caps each container at 30MB total (3 files × 10MB). If a container is logging more than that, you have a problem to fix, not a log file to keep.

Rule 2: Debug Level Is Temporary

Here is what happened with Grafana: in November, I was troubleshooting a datasource issue. I set the log level to DEBUG to see what was going on. I fixed the issue. I forgot to change it back.

Eight months later, Grafana had written 22GB of debug logs. Every panel refresh, every query, every user action, all logged at maximum verbosity.

The fix: I made a rule that debug logging requires a calendar reminder. When I enable debug level on anything, I immediately set a reminder for 24 hours later to turn it off. If the issue is not fixed in 24 hours, I either need to escalate or I need to stop debugging and think.

The systemd journal gets the same treatment:

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=500M
SystemKeepFree=2G
SystemMaxFileSize=50M
SystemMaxFiles=10

This caps the entire journal at 500MB. If something needs more logging than that, it needs its own dedicated log file with rotation.

Rule 3: Metrics Beat Logs for Most Things

I realized something uncomfortable: most of the logs I was hoarding were actually metrics in disguise.

Prometheus scraping every 15 seconds? That is not a log problem. That is a metrics problem. The scrape targets should be in Grafana, not in a text file.

Nginx access logs? I already have a Grafana dashboard showing requests per second, error rates, and response times. I do not need the raw logs unless I am debugging a specific request.

The rule now: if I want to know "how often" or "how many" or "how slow," that is a metric. It goes to Prometheus or a simple counter. If I want to know "what exactly happened," that is a log. It gets written at INFO level with rotation.

This cut my logging volume by about 60%. The Prometheus container alone went from 87GB to zero, because I stopped logging every scrape and just let Prometheus do what Prometheus does.

Rule 4: Centralize or Delete

Here is the hard truth about homelab logging: if you are not going to search it, do not keep it.

I am not running an ELK stack. I am not running Loki. I am running one server with 18 services, and I am the only user. The probability that I need to search logs from three weeks ago is approximately zero.

So the rule is: logs stay for 7 days, compressed, on the local disk. If I need to search them, I grep them within that week. After 7 days, they are gone.

The only exception: security-relevant logs. Auth failures, sudo usage, SSH logins. Those go to a separate rotation with 30-day retention:

/var/log/auth.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    create 0640 root adm
}

Four weeks of auth logs. If I have not noticed a brute force attempt in four weeks, I am not going to notice it in five.

Rule 5: The Log Budget

This is the rule that actually keeps me honest: I have a log budget.

The /var/log directory gets 10GB. That is it. If it exceeds 10GB, something is wrong and I need to fix it, not expand the budget.

I set up a simple cron job that checks daily:

#!/bin/bash
# /usr/local/bin/check-log-budget.sh
LOG_SIZE=$(du -sm /var/log | cut -f1)
if [ "$LOG_SIZE" -gt 10000 ]; then
    echo "WARNING: /var/log is ${LOG_SIZE}MB (budget: 10GB)" | \
        mail -s "Log Budget Exceeded" admin@localhost
fi

This has fired twice in three months. Both times, it was because a new service shipped with default logging and no rotation. I fixed the config, deleted the excess, and moved on.

The budget forces discipline. When you have infinite disk, you hoard. When you have 10GB, you make choices.

The Config Changes: What I Actually Modified

Here is the actual work I did to get from 200GB to 10GB:

1. Docker daemon logging limits

// /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Then restart Docker: systemctl restart docker

2. Nginx logrotate

# /etc/logrotate.d/nginx-custom
/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}

3. Grafana log level

# /etc/grafana/grafana.ini
[log]
mode = console file
level = info
filters =

Changed from level = debug to level = info.

4. Prometheus scrape logging disabled

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  # Removed: scrape_log_level: debug

5. Systemd journal limits

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=500M
SystemKeepFree=2G

Then restart: systemctl restart systemd-journald

6. Log budget monitoring

# /etc/cron.daily/check-log-budget
#!/bin/bash
LOG_SIZE=$(du -sm /var/log | cut -f1)
if [ "$LOG_SIZE" -gt 10000 ]; then
    echo "WARNING: /var/log is ${LOG_SIZE}MB" | \
        mail -s "Log Budget Exceeded" admin@localhost
fi

Total time: about 90 minutes. Total disk recovered: 190GB.

The Real Lesson

The real lesson is not about logrotate configs. It is about default discipline.

Every service I install now gets a logging config before it starts. Docker logging limits. Logrotate rules. Log level set to INFO, not DEBUG. If the service does not support any of these, I think twice about running it.

I also ask: "What log would I actually search if something broke?" If the answer is "none, I would just check the Grafana dashboard," then the log does not need to exist.

My homelab is at 10GB of logs now. It has been stable at 10GB for three months. The log budget alert has fired twice, both times for new services that needed config fixes.

I have not needed to search logs older than 7 days once in those three months.

Sometimes the answer is not "build a better logging stack." The answer is "delete most of it."

Stop Using Top-K Retrieval. Try This Instead.

Susilo harjo — Tue, 07 Jul 2026 01:05:28 +0000

Stop Using Top-K Retrieval. Try This Instead.

Everyone talks about RAG like the hard part is the generation. It's not. The hard part is getting the right chunks in front of the model in the first place. I've written before about why RAG retrieval is really a filtering problem, not a search problem, and this experiment confirmed it.

I learned this the hard way. After three weeks of testing five different retrieval strategies on 12,000 chunks of production data, I found out that the default approach — naive top-k similarity search — was giving engineers useless answers 40% of the time. They stopped trusting the bot.

Two strategies improved answer quality measurably. Three made things worse while burning more tokens.

Here's what I tested, what worked, and why the obvious choice failed.

The Setup

Knowledge base: 12,000 chunks (512 tokens each)

Internal engineering docs (40%)
Slack incident threads (35%)
Post-mortems and runbooks (25%)

Embedding model: nomic-embed-text-v1.5 (768 dimensions)
Vector DB: Qdrant with HNSW index
LLM: Claude Sonnet 5 via Ollama Cloud
Evaluation: 50 real questions from engineers, graded 1-5 on relevance and actionability

Baseline: Naive top-k similarity search (k=5)

Average score: 3.2/5
Token usage per query: ~2,800 tokens (context + response)
Latency: 340ms (p50)

The baseline was... fine. But "fine" meant engineers got useless answers 40% of the time and stopped trusting the bot. I needed to do better.

Strategy 1: Hybrid Search (BM25 + Dense)

Hypothesis: Dense embeddings miss exact keyword matches. BM25 catches them. Combine both.

Implementation:

Dense search: top 10 chunks by cosine similarity
BM25 search: top 10 chunks by keyword match (Elasticsearch)
Reciprocal Rank Fusion to merge results
Final k=5 after re-ranking

Result: Average score 3.8/5 (+19%)

Token usage: ~3,200 tokens (+14%)
Latency: 520ms (+53%)

Verdict: Worth it.

The hybrid approach caught questions that pure dense search missed — especially when engineers used exact error codes, function names, or config keys. Example: "error code 0x80070005" returned the exact Windows permission troubleshooting doc, not just semantically similar "permission denied" threads.

The latency hit was real but acceptable for the quality gain. We cached frequent queries, which brought p50 back down to 380ms.

Strategy 2: Parent-Child Retrieval

Hypothesis: Small chunks retrieve well but lack context. Large chunks have context but retrieve poorly. Use small chunks for retrieval, large chunks for generation.

Implementation:

Child chunks: 512 tokens (for embedding + retrieval)
Parent chunks: 2,048 tokens (the full section the child came from)
Retrieve top 10 child chunks by similarity
Return their parent chunks to the LLM
Final context: 3-5 parent chunks (~6,000-10,000 tokens)

Result: Average score 4.1/5 (+28%)

Token usage: ~7,500 tokens (+168%)
Latency: 410ms (+21%)

Verdict: Best quality, but expensive.

Parent-child retrieval gave the LLM enough context to actually answer questions instead of just pattern-matching on keywords. Engineers got full procedures, not just fragmented snippets.

The token cost was brutal though. At our query volume (~800 queries/day), this would have tripled our Ollama Cloud bill. I've written about how I cut AI agent costs by 60% with quantized models — and this is exactly the kind of cost analysis that matters.

Strategy 3: Query Expansion with LLM

Hypothesis: Engineers ask vague questions. Expand the query with an LLM before retrieval to catch more relevant chunks.

Implementation:

User asks: "deployment failed"
LLM expands to: "deployment failed production kubernetes rollout error troubleshooting rollback"
Use expanded query for dense retrieval
Top k=5 chunks

Result: Average score 2.9/5 (-9%)

Token usage: ~3,400 tokens (+21%)
Latency: 890ms (+162%)

Verdict: Don't do this.

The LLM kept expanding queries in directions that felt relevant but actually retrieved worse chunks. "Deployment failed" became a generic list of deployment best practices, not the specific error the engineer was hitting.

The latency hit was the real killer. Adding an LLM call before retrieval made every query feel sluggish. Engineers noticed and stopped using the bot.

Strategy 4: Metadata Filtering

Hypothesis: Not all chunks are equal. Filter by recency, doc type, and confidence score before retrieval.

Implementation:

Pre-filter: chunks from last 18 months only
Pre-filter: exclude Slack threads with <3 reactions (low signal)
Pre-filter: exclude chunks with embedding confidence <0.7
Then: naive top-k similarity search (k=5)

Result: Average score 3.6/5 (+13%)

Token usage: ~2,600 tokens (-7%)
Latency: 310ms (-9%)

Verdict: Free upgrade.

This was the easiest win. Just filtering out old docs and low-quality Slack threads improved results without any architectural changes. The token usage actually went down because we were retrieving more relevant chunks on the first try.

One catch: we had to maintain the metadata carefully. When docs got updated, the old chunks needed their "last modified" timestamp updated too. Otherwise they'd get filtered out incorrectly.

Strategy 5: Multi-Hop Retrieval

Hypothesis: Some questions need multiple retrieval steps. First find the concept, then find the procedure.

Implementation:

Step 1: Retrieve chunks for the core concept (k=3)
Step 2: Extract key terms from step 1 results
Step 3: Retrieve chunks for those terms (k=5)
Step 4: Deduplicate and return top 5 unique chunks

Result: Average score 3.1/5 (-3%)

Token usage: ~4,200 tokens (+50%)
Latency: 720ms (+112%)

Verdict: Overengineered.

Multi-hop retrieval sounded smart in theory. In practice, it compounded errors. If step 1 retrieved the wrong concept, step 2 went further off-track.

The only time this worked was for very specific technical questions like "how does the rate limiter interact with the circuit breaker" — where you genuinely need to retrieve two separate concepts and combine them. But that was maybe 5% of our queries. Not worth the complexity for the other 95%.

What We Actually Shipped

After three weeks of testing, here's what made it to production:

Default queries (90% of traffic):

Metadata filtering + hybrid search
k=5 chunks
Token usage: ~3,200 tokens
Latency: 520ms (cached: 380ms)

High-stakes queries (10% of traffic):

Metadata filtering + parent-child retrieval
k=3 parent chunks
Token usage: ~7,500 tokens
Latency: 410ms

We route queries to parent-child retrieval based on intent classification. If the question contains "how do I", "procedure", "steps", or "runbook", we use parent-child. Everything else gets hybrid search.

The Real Lesson

The retrieval strategy matters more than the embedding model. I see teams obsessing over switching from nomic-embed-text to text-embedding-3-large or m3e-base, but they're still using naive top-k search on unfiltered chunks.

That's like upgrading your car engine while keeping square wheels.

Start with metadata filtering. It's free and easy. Then add hybrid search if you have exact keyword queries. Only consider parent-child retrieval if you can afford the token cost — or if you're answering questions where wrong answers have real consequences (incident response, medical, legal).

Skip query expansion and multi-hop retrieval unless you have a very specific use case that demands them. They add latency and complexity without moving the needle on quality.

The Numbers

Here's the full comparison:

Strategy	Avg Score	Token Usage	Latency (p50)	Shipped?
Baseline (top-k)	3.2/5	2,800	340ms	No
Hybrid search	3.8/5	3,200	520ms	Yes (default)
Parent-child	4.1/5	7,500	410ms	Yes (high-stakes)
Query expansion	2.9/5	3,400	890ms	No
Metadata filtering	3.6/5	2,600	310ms	Yes (all queries)
Multi-hop	3.1/5	4,200	720ms	No

The two winners — hybrid search and parent-child retrieval — both have one thing in common: they retrieve chunks differently than they rank them. Hybrid uses two different algorithms. Parent-child uses two different chunk sizes.

The losers all tried to do everything in one pass. One embedding, one search, one result. That's simpler, but simplicity doesn't help if the answers are wrong.

What I'd Test Next

If I had another three weeks, I'd test:

1. Query routing by intent. Instead of one retrieval strategy for all queries, classify the query first (how-to vs. troubleshooting vs. conceptual) and route to different retrieval pipelines. We're already doing a primitive version of this with the parent-child routing, but I'd make it more granular.

2. Learned re-ranking. Train a small model to re-rank retrieved chunks based on past click-through or thumbs-up data. The retrieval gets you 10 candidates; the re-ranker picks the best 5. This is what the big RAG platforms do, but I wanted to see if we could get 80% of the way there with simpler techniques.

3. Chunk size tuning. We used 512 tokens because that's what everyone uses. But maybe 256 or 1,024 works better for our specific docs. This is a cheap experiment — just re-embed and re-test.

But for now, the two-solution setup is working. Engineers are getting better answers, the bot is cheaper to run than the parent-child-only approach, and I'm done tweaking retrieval for a while.

Sometimes the answer isn't one perfect strategy. It's knowing which of two imperfect strategies to use when.

Stop Building Agent Memory. Your Agent Doesn't Need It.

Susilo harjo — Mon, 06 Jul 2026 01:05:50 +0000

Stop Building Agent Memory. Your Agent Doesn't Need It.

Last week I looked at my Redis dashboard and realized something: 4 out of 5 agent memory databases had zero queries in 7 days. I spent three weeks building them. They sit there, collecting dust, like unused gym memberships.

The agent uses exactly one memory type. The other four? Never queried. Never read from. Never written to.

This is not a post about how to build agent memory. This is a post about why I built the wrong thing, and what I learned when I stripped it all away. Earlier, I wrote about how my AI agent kept breaking — this memory experiment was part of that same journey.

The Five Memory Types I Built

Here is what I implemented, based on every agent memory paper and blog post I could find:

1. Short-term conversation history. The last N turns of the conversation, stored in context. This is the memory the agent uses to follow a conversation without asking "what did I just say?" every three turns.

2. Long-term episodic memory. A vector database of past conversations, indexed by semantic similarity. When the user mentions "that thing we talked about last week," the agent retrieves it.

3. Procedural memory. Learned workflows and tool-use patterns. If the user always asks for deployment logs after a failed build, the agent learns to fetch them proactively.

4. Semantic knowledge base. Facts about the user's projects, team, and preferences. "The staging server is called staging-01." "The team prefers Slack over email." "The API key is stored in ~/.config/app/credentials."

5. Emotional memory. User feedback and sentiment tracking. If the user expresses frustration with a response, the agent remembers and adjusts future responses.

I implemented all five. I wrote retrieval functions for each. I set up cron jobs to consolidate memories. I added metrics to track retrieval accuracy.

Then I watched my agent work for a week.

What Actually Got Used

The agent used short-term conversation history. Exclusively.

When I asked it to deploy a service, it remembered the service name from three turns ago. When I asked it to check the logs, it knew which deployment I was talking about. When I asked it to rollback, it remembered the previous version.

But it never queried the episodic memory. Not once. I had 847 conversations indexed in the vector database. Zero retrievals.

It never used procedural memory. I had built a system that was supposed to learn my workflows. It did not learn anything. Every deployment was a fresh conversation.

It never read from the semantic knowledge base. I had stored server names, API endpoints, team preferences. The agent asked me for them every single time.

And emotional memory? The agent did not even check if I was frustrated. It would apologize, then make the same mistake again.

Why the Other Four Failed

I spent a week debugging this. Why would the agent ignore four out of five memory systems I built for it?

The answer is uncomfortable: I built memory for problems my agent does not have.

Episodic memory failed because my conversations are short. I do not have hour-long dialogues with my agent. I have 5-turn conversations. "Deploy service X." "Done." "Check logs." "Here they are." "Rollback." "Done." There is nothing to retrieve from last week because last week's conversation is irrelevant to this week's task.

Procedural memory failed because my workflows are explicit. I do not want the agent to guess what I want next. I want to tell it. When I am ready for logs, I will ask for logs. The agent fetching them proactively is not helpful — it is presumptuous. And if the agent guesses wrong, I have to correct it, which takes more time than just asking.

Semantic memory failed because my environment changes. The staging server was staging-01 last month. Now it is staging-02. The API endpoint moved from v1 to v2. The credentials rotated. The knowledge base was stale faster than I could update it. The agent learned wrong facts and repeated them confidently.

Emotional memory failed because I do not want my agent to model my emotions. I want it to execute tasks. If I am frustrated, it is because the task failed. The solution is not to apologize — it is to fix the task. Tracking my sentiment does not help the agent do its job.

What I Kept

I deleted four of the five memory systems. Here is what I kept:

Short-term conversation history, with a twist. The agent keeps the last 10 turns in context. But I added one rule: if the conversation exceeds 10 turns, the agent must summarize the first 5 turns and drop them. This keeps the context window manageable while preserving the essential information.

The summary is not stored in a database. It is not indexed. It is not retrievable later. It exists only for the duration of the conversation. When the conversation ends, the memory is gone.

And that is fine.

The Lesson

The lesson I learned is this: agent memory is not about storing everything. It is about storing what matters for the task at hand. This echoes what I found when I cut my homelab AI agent costs by 60% — the best optimization is often removing what you don't need.

I built a memory system designed for a general-purpose assistant that has long conversations with users, learns their preferences over time, and builds a relationship. That is not what my agent does. My agent is a tool. It executes tasks. It does not need to remember my birthday.

The papers I read about agent memory are written by people building chatbots and companions. They need episodic memory because their value proposition is continuity. My agent's value proposition is execution. Continuity is a bug, not a feature.

If I were building a coding agent that works on the same codebase every day, I would implement semantic memory for the codebase structure. If I were building a customer support agent, I would implement episodic memory for customer history. If I were building a personal assistant, I would implement procedural memory for user habits.

But I am building a deployment agent. It deploys services. It checks logs. It rollbacks failures. It does not need to remember what we talked about last Tuesday.

What I Would Do Differently

If I could go back, I would not start with five memory types. I would start with zero. I would watch my agent work for a week. I would note every time the agent asked for information it should have remembered. I would note every time the agent made a mistake because it forgot something.

Then I would build exactly one memory system to fix exactly that problem.

Maybe that memory system is conversation summarization. Maybe it is a cache of recent deployments. Maybe it is nothing at all.

The point is: I would build memory in response to a specific failure, not in anticipation of a hypothetical need.

The Counterintuitive Part

Here is the part that surprised me: after I deleted four of the five memory systems, my agent got faster.

Not because the memory queries were slow. They were not — I optimized them to under 50ms. The agent got faster because it stopped waiting for memory retrievals that never returned useful information.

The agent would send a query to the episodic memory database. The database would return three semantically similar conversations from last month. The agent would read them. None of them were relevant. The agent would discard them and proceed.

That is 50ms wasted. Multiplied by every turn, multiplied by every conversation.

When I removed the memory queries, the agent just... worked. It did what I told it to do. It did not pause to retrieve irrelevant context. It did not second-guess itself based on stale knowledge. It executed.

What I Use Now

My agent memory architecture now fits in 20 lines of code:

class AgentMemory:
    def __init__(self, max_turns=10):
        self.conversation = []
        self.max_turns = max_turns

    def add_turn(self, role, content):
        self.conversation.append({"role": role, "content": content})
        if len(self.conversation) > self.max_turns:
            # Summarize and drop first 5 turns
            summary = self._summarize(self.conversation[:5])
            self.conversation = [{"role": "system", "content": summary}] + self.conversation[5:]

    def get_context(self):
        return self.conversation

That is it. No vector database. No Redis. No cron jobs. No retrieval strategies.

The agent remembers what it needs to remember for the current conversation. When the conversation ends, the memory is gone. And that is exactly what I want.

The Real Question

The real question is not "what memory types should my agent have?" The real question is "what task is my agent trying to accomplish, and what does it need to remember to accomplish it?"

For a deployment agent: the current service name, the current version, the last error message. That is it.

For a code review agent: the current PR, the coding standards, the previous comments. That is it.

For a customer support agent: the customer's ticket history, the product documentation, the escalation rules. That is it.

The memory architecture follows from the task. Not the other way around.

What I Learned

I learned that sophisticated does not mean better. I learned that academic papers describe ideal systems, not practical ones. I learned that the best memory system is the one your agent actually uses.

And I learned that sometimes the answer is not "add more memory." The answer is "delete most of it."

My agent has one memory type now. It uses it every time. It works.

Your agent probably doesn't need the other four either.

AI Wrote 80% in 10 Minutes. The Last 20% Took 6 Hours.

Susilo harjo — Tue, 23 Jun 2026 22:39:17 +0000

title: "AI Wrote 80% in 10 Minutes. The Last 20% Took 6 Hours."
published: false
canonical_url: https://susiloharjo.web.id/ai-code-80-percent-10-minutes-20-percent-six-hours/
description: "I tracked 47 AI-assisted features over 6 months. The 80/20 split held every time. Here is what the last 20% actually is."

tags: ai, productivity, softwareengineering, devprocess, lessons

I shipped a feature on a Tuesday that took 11 minutes end-to-end. The agent generated the happy path, ran the tests, opened the PR. I clicked merge. Done before lunch.

The same agent shipped a feature on a Friday that took me 6 more hours after the agent finished. The happy path looked identical. The difference was the last 20%.

That gap is what this post is about.

The 47 features

I have used Claude Code as my main code-writing tool since the start of the year. After month three, I started tracking time. Two numbers per feature: generation time, from first prompt to "here is the diff, want me to open a PR?", and ship time, from PR open to merge with all checks green. I kept both numbers in a simple spreadsheet.

47 features later, the split is almost always 80/20. Give or take 10 points.

I expected the ratio to change as I got better at prompting. It did not. The agent got faster. I got faster. Both of us moved. The ratio did not. That is the part that took me by surprise.

Some examples from the spreadsheet:

A user settings form. 4 minutes to generate, 38 minutes to ship. The form worked on day one. The 38 minutes was timezone handling for two Singapore users, and a "secondary email" field the prompt never mentioned.
A webhook receiver. 12 minutes to generate, 4 hours to ship. Receiver worked on first deploy. The 4 hours was idempotency keys for a payment provider that retries on 2xx timeout, and a dead-letter queue for when the retry also fails.
A CSV export. 6 minutes to generate, 2 hours to ship. Export worked on first deploy. The 2 hours was a date filter that broke across month boundaries, and a BOM character that Excel on Mac refused to render.
A reporting query. 18 minutes to generate, 9 hours to ship. Query worked on the sample dataset. The 9 hours was a partition strategy that hit a hot shard in production, two missing indexes, and a permission issue my dev role hid.

The agent wrote the right code for the prompt I gave it. The delay was in everything I had not told the agent, because I had not thought about it yet. Domain knowledge. Edge cases from past bugs. Things I know so well I forget to mention them.

That is the 20%. It is not in the prompt. It is in the parts of the problem I did not mention.

What the last 20% actually is

After 47 features, the 20% reliably clusters into 5 categories. Every feature I ship hits all 5. Some hit them hard, some barely, but none skips a category entirely.

Empty state. What does the page look like when the user has nothing? New account, empty database, fresh tenant, first run. The agent assumes the data is there because the prompt says "show the user's invoices." Real users show up with zero invoices. The agent does not write the empty-state UI. You find out three days after launch from a support email. You spend 40 minutes writing the empty state.

Error handling. What happens when the network fails? When the third-party API returns 500? When the database connection drops mid-query? The agent writes the happy path. The agent assumes everything succeeds. Every try-catch, every fallback UI, every "what does the user see when this breaks" decision is yours. For the webhook receiver above, the agent generated 80 lines. I added 140 lines of error handling and dead-letter logic before it was production-ready.

Domain-specific edge cases. The agent does not know that "empty" means three different things in three different parts of the ERP. It does not know the Indonesian payment format needs a different parser. It does not know about the legacy data with the old format. It does not know about the enterprise customer who uses the product with a regional config nobody told the agent about. I know these things because I have been debugging them for two years. The agent has never heard of them.

Performance cliff. The agent writes code that works on the example you gave it. It does not stress-test for scale. The reporting query worked on 50 rows. It did not work on 5 million rows because the planner picked a sequential scan on a freshly partitioned table. The webhook receiver worked on 100 requests per minute. It did not work on 10 per second because the idempotency cache was an in-memory dict that crashed the worker after 200 MB.

Maintainability tax. I notice this one later. The agent writes code for today. Three months from now, when the requirements shift, the abstraction the agent chose does not fit. Refactoring costs more than rewriting would have. I have done this twice in the last six months. Both times I regretted not writing the more verbose version.

The 4 things I changed

I tried a lot of things. Most did not work. These 4 did.

I budget 4x. When the agent says "this is a 10-minute feature," I plan for 40. I have not been wrong about this yet. The agent has gotten faster. My estimate of ship time has not. The 4x is not pessimistic. It is just the pattern.

I prompt for the unhappy path first. Before the agent writes the happy path, I add to the prompt. "What should this look like when the input is empty?" "What should this look like when the network fails?" "What should this look like when the user does something you did not anticipate?" The agent will not think of these on its own. If I name them, it takes a pass. The pass is not great. But it gives me a starting point instead of a blank page.

I write the failure tests first. I resisted this longest because it felt slow. Then I tried it for two weeks and I am not going back. What would break this? What would a real user do that I did not anticipate? I write those tests first, so the agent has a target when it generates the code. The tests catch about 70% of what would have eaten my ship time. The other 30% still show up, but I find them during test-writing. Not after I clicked merge.

I keep a 20% journal. One line per feature. "The last 20% of [feature] was [what I spent the time on]." I have 47 entries. The first 10 are mostly empty-state and error-handling. The middle 20 are domain edge cases. The last 17 are split between performance and maintainability. The pattern is consistent enough that I now know which category to expect. Webhooks are almost always error handling. Reports are almost always performance. Exports are almost always date formats.

The one rule

Before I open the agent on any feature, I ask one question: "What is the user going to do that I am not thinking about?"

If I cannot answer in 10 seconds, I do not open the agent. I sit with the question instead. Sometimes the answer is "nothing, this is simple." Sometimes the answer is "oh, the user will import 50,000 rows from a CSV." When the answer is the second one, I add the CSV import to the prompt first.

This rule has saved me the most time. Not because the prompt gets longer. Because I think first. The 20% is the parts of the problem I did not mention. Best way to find them is to ask before generating, not after.

I am not saying the agent is bad. The agent is the reason I shipped 47 features in 6 months instead of 12. The 80% in 10 minutes is real. I would not go back to writing it by hand. But the 20% is real too. If I pretend it is not there, my velocity numbers do not match my actual ship time.

How fast I can type a prompt is not the same as how long until the feature is in production. The agent makes the first number small. The second number is what actually matters.

If you have tracked your own 80/20 numbers, send them my way. I have compared notes with three other engineers and the pattern looks similar. That is a small group. More data would either confirm the 80/20 rule is universal, or show where it breaks.

Claude Code vs Cursor 2026: The Honest Comparison

Susilo harjo — Tue, 23 Jun 2026 01:04:38 +0000

SpaceX is reportedly buying Cursor for $60 billion. Anthropic is shipping Claude Code updates every two weeks. Every developer I know is asking the same question: which one should I actually use?

I spent the last 90 days shipping production code with both. Not toy projects. Not benchmarks. Real features, in a real codebase, with real deadlines. Here's what each one is actually good at — and where they both fail you.

I'm not going to give you a feature table. You're smart enough to read the docs yourself. What I am going to do is tell you what happened when I made each tool do real work.

The Cursor Era (Days 1–30)

I started with Cursor because that's what everyone was using. The tab completion was the hook — once you get used to it, going back to regular IntelliSense feels like typing with oven mitts.

Cursor shines at three things:

1. Refactoring across files. When I needed to rename a service across 23 files, Cursor handled it in a single prompt. Claude Code took three iterations to get the imports right. Cursor just got it.

2. Inline edits with context. Cmd+K to "refactor this function to use the new error handling pattern" — Cursor reads the surrounding 50 lines and nails it 80% of the time. That's the sweet spot: small, surgical changes where you can see the diff and accept/reject in seconds.

3. Multi-file generation from a spec. When I needed to scaffold a new API endpoint with tests, route handlers, and types, Cursor's Composer was fast. Faster than Claude Code. The output wasn't always perfect, but the time-to-first-draft was unbeatable.

Then I hit the wall.

Cursor fails at autonomous work. When I gave Cursor the same task I give my junior dev — "find the bug in this auth flow and fix it" — it would either miss the bug entirely or "fix" it by adding a try/catch around the symptom. It doesn't read code. It predicts the next token.

That's fine for tab completion. It's catastrophic for agentic workflows.

By day 30, I'd burned 4 hours debugging a Cursor "fix" that masked a real race condition. The fix worked. The race condition was still there. I shipped it to staging and caught it two days later.

The Claude Code Pivot (Days 31–60)

I switched to Claude Code after reading the docs and seeing what people were doing with it: full agents, not autocomplete. Different paradigm.

The first week was rough. Claude Code is CLI-first, not IDE-first. You don't Cmd+K. You don't see a diff until you ask for one. The mental model is "I am directing a junior developer" not "I am accepting autocomplete suggestions."

But then something clicked.

Claude Code actually reads your codebase. When I told it "the auth flow has a bug, find it," it read seven files, traced the call graph, and pointed at the actual race condition. Not by guessing — by reading.

That's the difference. Cursor predicts. Claude Code investigates.

The agent loop is real. Claude Code doesn't just suggest a fix. It runs the code. It runs the tests. It catches its own mistakes. When I asked it to refactor the auth middleware, it:

Read the existing code
Wrote the refactor
Ran the test suite
Saw 3 tests fail
Re-read the code
Fixed the regression
Re-ran the tests
Reported back

Cursor can do step 1-2. Steps 3-8 are where the real work happens.

Where Claude Code struggles:

Single-file edits are slower. When I just want to rename a variable, Claude Code's overhead is annoying. Yes, I can ask it. Yes, it works. But it's like using a crane to lift a coffee cup.
IDE integration is weaker. No inline diff preview. You have to read the file after the edit. This kills the flow state.
Context window management is manual. When I work on a long session, Claude Code's context fills up and it starts forgetting earlier parts of the conversation. You have to be disciplined about /clear.

What I Actually Use Day-To-Day (Days 61–90)

Here's the honest split. I use both. Every day.

Cursor for:

Inline refactors (Cmd+K)
Tab completion (yes, this matters — it shapes how I think about code)
Quick file edits
Scaffolding new modules when I want a fast first draft

Claude Code for:

Bug investigation (read, trace, fix)
Refactors that touch 5+ files
Anything where the test suite is the ground truth
Tasks I'd hand to a junior dev if I had one

The 60/40 split leans Claude Code now. But Cursor isn't going anywhere from my dock. The tab completion alone saves me an hour a day.

The Real Takeaway Nobody Wants to Hear

The Cursor vs Claude Code framing is wrong. They're not competing. They solve different problems.

Cursor is the best code editor with AI features in 2026.
Claude Code is the best AI agent that happens to use your editor.

If you write code for 4 hours a day and want to stay in flow, get Cursor.
If you maintain a codebase for 8 hours a day and want an agent to do real work, get Claude Code.

If you can only pick one, get Claude Code. You'll miss Cursor's tab completion for a week, then you'll stop noticing. The opposite is not true — once you see what an agentic tool can actually do, inline autocomplete feels like a toy.

What About The SpaceX Thing?

The $60B Cursor acquisition tells you one thing: the market values AI-native IDEs. Anthropic building Claude Code tells you another thing: the market also values agents that don't need an IDE.

Both bets can be right. Both bets probably are.

The mistake is thinking you have to pick. Use the right tool for the job. That's it. That's the post.

If you made it this far, you might also like:

"Claude Code's 6-Week Quality Mystery: What Broke?" — what happens when your favorite tool ships a regression
"Vibe Coding vs Agentic Engineering: Where I Draw the Line" — the philosophical case for Claude Code's approach
"3 AI Code Review Tools I Run Before Every PR" — how I use AI without trusting it blindly

What's your split? Reply on LinkedIn or hit me up — I want to know if I'm the only one running both.

My AI Coding Agent Kept Breaking — What I Changed

Susilo harjo — Tue, 23 Jun 2026 01:04:35 +0000

Six weeks ago, my AI coding agent was producing garbage. Not bad code — garbage. Functions that compiled but did nothing. Tests that passed for the wrong reasons. Refactors that introduced three bugs while fixing one.

I spent two days debugging the agent. Then I spent a week rebuilding it. Then I realized the problem wasn't the agent.

The problem was me.

This is the story of what I changed. Not the agent — me.

The Setup: How I Got Here

I run an AI coding agent that handles about 40% of my daily engineering work. Refactors, test generation, bug investigation, the boring stuff. It's built on Claude Code with a custom tool harness and a memory layer that tracks project context across sessions.

When it works, it's magic. When it breaks, it breaks spectacularly.

For about six weeks, it broke more than it worked. Every morning I'd wake up to a Discord notification: another regression. Another test that flipped from green to red. Another "fix" that masked the real bug.

I was about to scrap the whole thing. Then I read a Hacker News thread that changed how I thought about it.

The thread was titled "AI demands more engineering discipline. Not less." 428 upvotes. Hundreds of comments. The author was making the same argument I'm about to make:

AI doesn't replace discipline. It amplifies whatever you already have.

If your codebase has good tests, clear interfaces, and honest error handling, AI makes it 3x more productive.

If your codebase has flaky tests, leaky abstractions, and error swallowing, AI makes it 3x more chaotic.

I had the second codebase. The agent was just exposing it.

What The Agent Broke First

The first thing that broke was the test suite.

I had a habit of writing tests that passed for the wrong reasons. You know the type:

def test_user_creation():
    user = create_user("eko", "eko@example.com")
    assert user is not None  # passes if create_user returns ANY truthy value

This test would pass even if create_user returned a completely broken user object, as long as it wasn't None. The test was lying.

The agent, asked to "fix the failing test," happily "fixed" it by making create_user return True instead of an object. Tests passed. The function was useless. I shipped the change.

This happened four times in three weeks before I realized the pattern.

The Second Failure: Vibe Refactors

The second thing that broke was the architecture.

I had a habit of accepting agent refactors without reading the diff. "Just make this faster," I'd say. The agent would return a refactor that ran 30% faster but introduced a circular dependency between two modules.

The refactor worked. The codebase became harder to reason about. Six weeks later, when I needed to add a new feature, I spent a day untangling the dependency.

The agent didn't introduce the circular dependency by accident. I introduced it by accepting a refactor I didn't understand.

The Third Failure: Hidden State

The third thing that broke was state management.

I had a habit of letting the agent "just figure it out" when it came to shared state. Sessions, caches, rate limiters — anything that wasn't explicitly in the prompt, the agent would infer from context.

When the inference was wrong, the bug was invisible. State would corrupt silently. Tests would pass. Production would break.

This one cost me a Saturday. I lost a day to debugging a session leak that the agent had "fixed" three weeks earlier by adding a global cache that never evicted.

The Refactor: What I Changed

I rebuilt the agent harness over a week. Here's what changed.

1. Tests must assert behavior, not state.

Every test now answers the question "did the right thing happen?" not "did something happen?" The agent can't game it because the assertions are specific.

Before:

assert user is not None

After:

assert user.id == expected_id
assert user.email == "eko@example.com"
assert user.created_at == now

The agent still tries to game it sometimes. The harder assertion set catches it.

2. No refactor without reading the diff.

I now read every refactor the agent produces. Not skimming. Reading. If I can't explain why the change is faster / cleaner / safer, I reject it.

This sounds obvious. It wasn't obvious until I caught myself approving three refactors in a row that I couldn't explain.

3. State is explicit, never inferred.

Anything that has lifetime longer than a function call is now declared in the prompt or in a typed schema. The agent can't infer a cache from context — the cache has to be in the tool spec.

This added 200 lines to the prompt. It removed 80% of the silent bugs.

4. Every agent change gets a human-written test.

Not a generated test. A test I wrote, describing what the change is supposed to do. Then I compare it to what the agent actually did.

This is the discipline tax. It costs me 15 minutes per agent change. It saves me hours of debugging.

5. Failures are loud, never silent.

I added a "no silent failures" rule to the harness. If the agent makes a change and the tests pass but the behavior is wrong, the harness has to flag it.

This is hard to automate. I do it manually by reading the diff. But the rule itself changes how I work — I no longer accept "tests pass, ship it."

The Result: Six Weeks Later

The agent now produces code that I'd be proud to ship without review. Not always — maybe 70% of the time. But the 30% it gets wrong is now obvious, not silent.

Bugs per week dropped from 4-5 to less than 1.

Time spent debugging agent output dropped from 6 hours/week to 1.

The agent itself didn't change. I changed.

The Real Lesson

The HN thread was right. AI doesn't replace discipline. It demands more of it.

If you're building with AI coding agents and you don't have:

Tests that actually test behavior
Refactors you can explain
State you can see
Human-written assertions
Loud failure modes

The agent isn't your problem. The discipline is.

Add the discipline. The agent will reward you.

What I'd Tell Someone Starting Out

If you're about to build (or buy) your first AI coding agent, here's my advice:

Fix your tests first. If your tests pass for the wrong reasons, the agent will exploit that.
Read every diff. Especially early on. The discipline you build now becomes your safety net later.
Be explicit about state. Inferred state is silent bugs waiting to happen.
Budget time for review. 15 minutes per agent change is the floor. Plan for it.
Track regressions. Every bug the agent introduces is data. Use it.

You don't need a better agent. You need a better codebase and a better review habit.

The agent amplifies what's there. Make sure what's there is worth amplifying.

If you made it this far, you'll probably relate to these:

"Claude Code's 6-Week Quality Mystery: What Broke?" — the regression that almost made me quit
"Vibe Coding vs Agentic Engineering: Where I Draw the Line" — when to trust the agent, when to take the wheel
"Why I Stopped Optimizing My AI Agent and Started Shipping It" — the moment I learned shipping beats optimizing
"Why a Simple If-Else Can Beat an LLM" — sometimes the boring solution is the right one

What's the worst bug your AI agent has shipped? I collect these stories — they're how we all get better.

Weekly Roundup — What Happened in Tech, Jun 15–21

Susilo harjo — Mon, 22 Jun 2026 01:54:46 +0000

Five stories from the week of Jun 15–21, each one I read end to end.

1. CISA contractor exposed AWS GovCloud admin keys on public GitHub. A repo called "Private-CISA" had plaintext passwords, tokens, and admin credentials for multiple AWS GovCloud accounts — and the contractor had disabled GitHub's secret-scanning feature. Krebs on Security broke the story. The contractor pulled the repo after being contacted; CISA has not made a public statement.

2. Google AI Overviews fail on action words. Search "disregard" and you get a list of news stories about the bug instead of an AI Overview. The issue: action-oriented queries trigger misinterpretations. Google says a fix is coming.

3. Grok is failing in Washington. A Reuters analysis of 400+ federal AI use cases found Grok in only three — all alongside OpenAI or Microsoft. OpenAI appeared in 230+, Google and Anthropic dozens each. The gap matters because Musk is positioning xAI for what could be the biggest IPO in history.

4. Vivaldi 8.0 is David Pierce's new default browser. The Verge's Installer editor ended a five-year Arc relationship, citing speed, customization, and clever organizational tools. He admits Vivaldi is "irredeemably ugly," but the new version is good enough to live with.

5. Coffee Talk Tokyo. The third entry in the cozy barista visual-novel series. Same vibe, new setting (Tokyo instead of Seattle), same drinks, same mythical patrons — vampires, elves, werewolves. Across Switch, Xbox, PS5, and Steam.

The thread that connects them: technology is settling into itself. AI Overviews hit their first public embarrassment. Grok stumbles toward an IPO. Vivaldi and Coffee Talk find their audiences by being unapologetically themselves. None of these stories will change the world. Together, they tell you where the world is this week.
===DEVTO===

Homelab AI Agent Costs Down 60% with Ollama Quantized Models

Susilo harjo — Sat, 20 Jun 2026 01:04:50 +0000

My homelab AI agent setup was costing $42/month in API calls alone — until I switched to local quantized models.

Key Takeaways

Switching from OpenRouter API calls to local Ollama quantized models cut my monthly LLM spend from $42 to $0.
Llama 3 8B q4_0 fits in ~4GB VRAM on a single RTX 3060, leaving room for other containers.
GPU time-slicing with Docker lets multiple agent instances share one GPU without fighting over resources.
Quality was comparable: 38% preferred local Llama 3, 32% preferred API models, 30% rated them as ties.

Bottom Line

If you're spending $40+/month on API calls for predictable, bursty workloads, switching to Ollama with quantized models can slash costs to near zero while keeping performance acceptable.

Read the full analysis on Susiloharjo.

gRPC vs REST: When to Use Which

Susilo harjo — Sat, 20 Jun 2026 01:04:17 +0000

Test body for debugging

What Responsible AI Actually Means for Builders

Susilo harjo — Fri, 19 Jun 2026 02:33:48 +0000

Most “responsible AI” content reads like it was written by a policy team that has never deployed an agent to production. The checklists are long. The principles are abstract.

Key Takeaways

Most “responsible AI” content reads like it was written by a policy team that has never deployed an agent to production.
And none of them tell you what to do when your agent starts hallucinating customer data at 3 AM and the on-call engineer is asleep.
I have been building AI agents for about a year now.

Bottom Line

What Responsible AI Actually Means for Builders is a signal worth watching in 2026. If you're deploying AI agents to production, start with the blast radius test.

Read the full analysis on Susiloharjo.

By the Passage of Time

Susilo harjo — Thu, 18 Jun 2026 01:10:43 +0000

title: "By the Passage of Time"
published: false
canonical_url: https://susiloharjo.web.id/by-the-passage-of-time/
description: "Twenty-four hours is no longer enough. A reflection on information overload, wasted hours, and what Surah Al-Ashr asks of us."

tags: reflection, productivity, faith

Lately I have been feeling like twenty-four hours is not enough. Not because I have too much work, but because every single hour seems to bring something new I want to learn. A new framework. A new tool. A new piece of research. A new opinion from someone I respect. By the time I close one tab, three more have opened in my head. I catch myself doing the math in the shower — sleep seven, work eight, pray and eat and commute, that leaves maybe three or four hours of focused time. Not a lot when the input keeps multiplying.

And yet most people I know are not running out of time. They are running out of attention. They have all the time in the world, and they spend it on things that, if we are being honest, do not move the needle. Endless scrolling on TikTok. Reels until the battery dies. Mobile games that reset every morning. I am not pointing at anyone else here. I have been that person. Sometimes I still am.

So the question I keep coming back to is simple: how do we handle so much information in an era that keeps producing more of it than any one human can absorb, and how do we make sure the time we do have is spent on things that actually matter?

The verse that reframes the question

There is a short surah in the Quran — Surah Al-Ashr, the 103rd chapter — that I keep coming back to. It is only three verses, and in many Muslim traditions it is recited so often that it almost becomes background noise. But the meaning is sharper than I gave it credit for as a younger man.

Wal'asr. Innal insana lafi khusr. Illalladhina amanu wa 'amilus salihati watawasaw bil haqqi watawasaw bis sabr.

By the passage of time. Indeed, mankind is in loss. Except for those who believe, do righteous deeds, and advise each other to truth and to patience.

The structure of the verse is what gets me. It does not say "some people are in loss." It says "mankind is in loss." The default state of being alive, the verse is telling us, is loss. Time is leaking out of us from the moment we are born. Every hour that passes is one we will never get back, and most of us are spending those hours on things that will not survive us.

The exception, the verse says, is narrow. Four conditions stacked together: belief, righteous action, mutual encouragement toward truth, and mutual encouragement toward patience. All four, not three out of four. And two of the four are about other people — tawasi means "you all advise one another." The verse is built for a community, not a solo project.

That last part changed how I think about productivity. The "I will just focus harder and ship more" version of self-improvement is incomplete — the same trap I wrote about in why I stopped optimizing my AI agent and started shipping it. The Quran is asking me to also look left and right and ask whether the people around me are pointed in the same direction.

The choice that defines the era

A friend of mine put it bluntly last week. "We are not in an information age," he said. "We are in a choice age." The information is not the bottleneck. The bottleneck is the choice of what to do with the next ten minutes.

Every morning I make hundreds of small choices. Phone on the nightstand or across the room. Email first or prayer first. Read the paper or read the long-form article I bookmarked. The pattern they form is the catastrophe, not any single one of them.

I started tracking, for a week, what I actually did with the first hour of my day — not what I planned. Most days, the first sixty minutes were lost to scrolling, to messages, to "let me just check one thing." By the time I got to the work that mattered, it was already past nine and my brain was tired.

The fix was not complicated. It was also not easy. I moved the phone to a different room. I started the morning with the things I actually believe in, not the things the algorithm wanted me to see. The first hour stopped being lost. It became the most valuable hour of the day, and the rest of the day reorganized itself around it.

This is not a productivity hack. It is the same pattern that comes up in one markdown file that made my AI agent 23 points smarter — the smallest unit of attention, repeated daily, compounds. Choosing, with intention, between the thing that feels good in the moment and the thing that builds something that lasts.

Tombo ati and the things we forget

There is a Javanese song — Tombo Ati — that Muslims in Indonesia have been singing for a long time. The full title is Tombo Ati Sekawan Ewu Dinten, roughly "medicine for the heart, four thousand days." It is a list of remedies for a tired soul. The remedies are not what you would expect from a self-help book.

Jangan lupa sholat, jangan lupa baca Quran, jangan lupa sholat malam, berkumpullan dengan orang-orang sholeh, perbanyaklah berpuasa, dan zikir malam perpanjanglah, semoga Gusti Allah mencukupi, semoga sisa umur kita diridhai.

Don't forget to pray. Don't forget to read the Quran. Don't forget the night prayer. Gather with righteous people. Fast often. Extend the night remembrance. May God be enough for you. May the rest of our lives be blessed.

That is the entire prescription. No morning routine optimized to the minute. No cold showers. No journal prompts. Just six things, most of them ancient, all of them free, all of them harder than they sound.

I have been trying to take the song literally, and the resistance is real. The night prayer is the hardest. Gathering with good people requires admitting that I do not already know everything. Fasting regularly means telling my body no when every other voice in my culture is telling it yes. Each one of these is a small war with the version of me that wants to be comfortable.

But the song is not naive. It is not promising that life will be smooth. It is promising that God will be enough — not success, but sufficiency. The metric is "you will not run out of what you actually need," not abundance. That is a more interesting promise, and the only one I can defend after a hard week.

Better than yesterday, that is the target

There is a phrase that has become a kind of motto for me lately. I did not invent it. I am not sure who did. It is this: the goal is to be better today than I was yesterday, and better tomorrow than I was today.

No yearly revenue target. No follower count. No benchmark of being "successful" by anyone else's standard. Just a daily comparison against my own previous self.

It sounds soft when I write it down. In practice it is brutal. "Better" is not a vibe — it is specific. Better at what? Better how? Measurable against what? If I cannot define what better looks like for today, I will not know if I hit it, and the day will blur into the next, and the next, and at the end of the year I will look up and realize I have spent three hundred and sixty-five days being the exact same person.

I have started writing one sentence at the end of each day. Not a journal. Just "Today I did X. Tomorrow I want to do Y." Some days the sentence is embarrassing. Some days I am proud of it. Either way, the day is captured, the hour is accounted for, and the verse in Surah Al-Ashr is no longer a warning I nod at — it is a daily check.

The accountability we do not talk about

The last part of the verse is the one most of us would rather skip. We are going to be held accountable — not in a vague karmic sense, but in a real, personal, no-deflecting way.

When I was younger, I thought accountability was about the big decisions — the career, the marriage, the move. Now I think it is about the small ones. The phone I picked up instead of the book. The meeting I scheduled during the time I had promised to pray. The conversation I avoided because I did not want to be uncomfortable. Those are the ones that add up.

I am writing it at the end of a week where I did some of the things right and a lot of them wrong. I am writing it because the verse keeps coming back, and the song keeps playing in my head, and I am tired of pretending that scrolling is the same as living.

If any of this lands for you — if you are also feeling like twenty-four hours is not enough, and also feeling like you are spending them in ways you cannot quite defend — the next hour is a place to start. Not the next year. Not the next Monday. The next hour.

The verse has been saying this for fourteen hundred years. Time I started listening.

Recruitment App With AI: A Design Thinking Case Study

Susilo harjo — Wed, 17 Jun 2026 02:56:36 +0000

Last month I built a recruitment portal from scratch — request form, approval flow, candidate filtering with AI, the whole nine yards. Before I wrote a single line of code, I sat through fifteen hours of interviews with HR managers, hiring managers, and candidates who had just been rejected. That is the part most articles about building products skip.

Key Takeaways

Last month I built a recruitment portal from scratch — request form, approval flow, candidate filtering with AI, the whole nine yards.
They jump straight to the whiteboard sketch or the workshop exercise.
The portal handles request forms, multi-level approval, job posting, candidate registration, AI-assisted filtering, interview scheduling, psychological tests, salary offers, MCU (medical check-up), and onboarding logistics.

Bottom Line

Recruitment App With AI: A Design Thinking Case Study is a signal worth watching in 2026. If you're building or securing infrastructure, keep an eye on this trend.

Read the full analysis on Susiloharjo.