<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aman</title>
    <description>The latest articles on DEV Community by Aman (@onlyoneaman).</description>
    <link>https://dev.to/onlyoneaman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F250698%2Fd899494b-6757-481e-98a5-3fa7619bf708.jpeg</url>
      <title>DEV Community: Aman</title>
      <link>https://dev.to/onlyoneaman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/onlyoneaman"/>
    <language>en</language>
    <item>
      <title>Building Agent Skills from Scratch</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Fri, 26 Dec 2025 08:20:23 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/building-agent-skills-from-scratch-lbl</link>
      <guid>https://dev.to/onlyoneaman/building-agent-skills-from-scratch-lbl</guid>
      <description>&lt;p&gt;There's a lot written about agent skills, but not much about actually implementing them.&lt;/p&gt;

&lt;p&gt;This post shows you how they work and how to integrate them into your existing agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;View the complete implementation on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Agent Skills?
&lt;/h2&gt;

&lt;p&gt;Agent skills solve a simple problem: your system prompt gets bloated when you try to make your agent good at everything.&lt;/p&gt;

&lt;p&gt;Instead of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You're an expert at code review, git, file organization, API testing...
[2000 lines of instructions]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have access to these skills:
- code-review: Reviews code for bugs and security
- git-helper: Git workflows and troubleshooting
- file-organizer: Organizes files intelligently
- api-tester: Tests REST APIs

Load them when needed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Core Idea
&lt;/h2&gt;

&lt;p&gt;Skills are markdown files that live in a directory. Each skill has two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;YAML frontmatter with name and description (see &lt;a href="https://amankumar.ai/blogs/wtf-is-frontmatter" rel="noopener noreferrer"&gt;this guide on frontmatter&lt;/a&gt; if you're new to it)&lt;/li&gt;
&lt;li&gt;Markdown body with detailed instructions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the agent needs expertise, it loads the relevant skill on the fly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Review this code for SQL injection"
  ↓
Agent: "I need the code-review skill"
  ↓
System: [Loads SKILL.md with security guidelines]
  ↓
Agent: [Follows those guidelines]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight is that skills are just structured prompts. But they're modular, discoverable, and loaded on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Discovery
&lt;/h3&gt;

&lt;p&gt;Scan a directory for SKILL.md files and parse their frontmatter. You only load the metadata initially, not the full content. This keeps memory usage low. (&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;See the discovery implementation&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/
├── code-review/
│   └── SKILL.md        # name: code-review, description: ...
├── git-helper/
│   └── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During discovery, you extract just the YAML frontmatter (name, description). The full markdown content stays on disk until needed.&lt;/p&gt;
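&lt;p&gt;A minimal discovery sketch (names are illustrative, not the repo's exact code): read only the frontmatter of each &lt;code&gt;SKILL.md&lt;/code&gt;, using plain string splitting so there's no YAML-library dependency:&lt;/p&gt;

```python
from pathlib import Path

def discover_skills(skills_dir):
    """Return {name: description} from each skills/NAME/SKILL.md frontmatter."""
    skills = {}
    for skill_file in Path(skills_dir).glob("*/SKILL.md"):
        text = skill_file.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue  # no frontmatter, skip this file
        # frontmatter sits between the first two --- fences
        frontmatter = text.split("---", 2)[1]
        meta = {}
        for line in frontmatter.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        if "name" in meta:
            # only metadata is kept in memory; the body stays on disk
            skills[meta["name"]] = meta.get("description", "")
    return skills
```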

&lt;h3&gt;
  
  
  Step 2: Tool Registration
&lt;/h3&gt;

&lt;p&gt;Convert each skill into an &lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI function tool&lt;/a&gt;. The LLM sees these as callable functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;activate_skill_code_review: "Reviews code for bugs, security, best practices"
activate_skill_git_helper: "Git workflows and troubleshooting"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description is critical because it's what the LLM uses to decide which skill to activate. Be specific and clear.&lt;/p&gt;
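&lt;p&gt;As a sketch, each discovered skill maps onto the OpenAI tools schema like this (the &lt;code&gt;skill_to_tool&lt;/code&gt; helper is hypothetical, but the nested shape follows OpenAI's function-calling format):&lt;/p&gt;

```python
def skill_to_tool(name, description):
    """Build one function-tool entry in the OpenAI tools schema."""
    return {
        "type": "function",
        "function": {
            # the activate_skill_ prefix makes activations obvious in logs
            "name": "activate_skill_" + name.replace("-", "_"),
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }
```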

&lt;h3&gt;
  
  
  Step 3: Activation
&lt;/h3&gt;

&lt;p&gt;When the LLM calls a skill function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the full &lt;code&gt;SKILL.md&lt;/code&gt; content from disk&lt;/li&gt;
&lt;li&gt;Add it to the conversation as a tool result&lt;/li&gt;
&lt;li&gt;Let the LLM continue with those instructions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is lazy loading. You only fetch content when actually needed. If you have 20 skills but only use 2, you've only loaded 2.&lt;/p&gt;
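&lt;p&gt;The three steps above can be sketched as one handler (illustrative, assuming the directory layout from step 1; the returned dict is an OpenAI-style tool message):&lt;/p&gt;

```python
from pathlib import Path

def activate_skill(skills_dir, name, tool_call_id):
    """Lazy-load the full SKILL.md and wrap it as a tool result message."""
    path = Path(skills_dir) / name / "SKILL.md"
    if path.exists():
        content = path.read_text(encoding="utf-8")  # loaded only now
    else:
        content = f"Skill '{name}' not found."
    # role/tool_call_id follow the OpenAI chat format for tool results
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```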

&lt;h3&gt;
  
  
  Step 4: Execution
&lt;/h3&gt;

&lt;p&gt;The LLM reads the skill instructions and follows them. The skill acts like a temporary system prompt for that specific task. Once the task is done, the skill instructions fade from context (unless you keep them for multi-turn conversations).&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Skill Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;code-review&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Reviews code for bugs, security, and best practices&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Code Review Skill&lt;/span&gt;

You are an expert code reviewer.

&lt;span class="gu"&gt;## Check For&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Security**&lt;/span&gt;
&lt;span class="p"&gt;   -&lt;/span&gt; SQL injection in queries
&lt;span class="p"&gt;   -&lt;/span&gt; XSS in user inputs
&lt;span class="p"&gt;   -&lt;/span&gt; Auth bypasses
&lt;span class="p"&gt;
2.&lt;/span&gt; &lt;span class="gs"&gt;**Quality**&lt;/span&gt;
&lt;span class="p"&gt;   -&lt;/span&gt; Readability
&lt;span class="p"&gt;   -&lt;/span&gt; Maintainability
&lt;span class="p"&gt;   -&lt;/span&gt; DRY violations
&lt;span class="p"&gt;
3.&lt;/span&gt; &lt;span class="gs"&gt;**Performance**&lt;/span&gt;
&lt;span class="p"&gt;   -&lt;/span&gt; N+1 queries
&lt;span class="p"&gt;   -&lt;/span&gt; Memory leaks
&lt;span class="p"&gt;   -&lt;/span&gt; Inefficient algorithms

&lt;span class="gu"&gt;## Response Format&lt;/span&gt;

&lt;span class="gs"&gt;**Summary**&lt;/span&gt;: Brief assessment
&lt;span class="gs"&gt;**Critical Issues**&lt;/span&gt;: Security problems (if any)
&lt;span class="gs"&gt;**Improvements**&lt;/span&gt;: Suggestions for better code
&lt;span class="gs"&gt;**Positives**&lt;/span&gt;: What works well
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the structure: clear sections, bullet points, and expected output format. The LLM follows structured instructions much better than prose. (For more on crafting effective prompts, see &lt;a href="https://amankumar.ai/blogs/anatomy-of-a-prompt-the-complete-guide-to-crafting-effective-ai-chatgpt-prompts" rel="noopener noreferrer"&gt;this guide&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Context Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of loading 10KB of instructions upfront, you load 100 bytes of metadata. Full instructions only come in when needed. This matters when you're paying per token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Modularity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each skill is independent. Add a new one by dropping in a &lt;code&gt;SKILL.md&lt;/code&gt; file. No code changes needed. Want to remove a skill? Delete the directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Clarity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When debugging, you can see exactly which skill was activated and what instructions it provided. This makes troubleshooting much easier than a monolithic prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Reusability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Share skills across projects. Someone else's &lt;code&gt;api-tester&lt;/code&gt; skill works in your agent with zero modification. Skills become a shared library of expertise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lazy Loading
&lt;/h3&gt;

&lt;p&gt;Don't load all skills into memory at startup. This defeats the purpose because you're back to loading everything upfront.&lt;/p&gt;

&lt;p&gt;Do load on demand. Parse frontmatter during discovery, but keep full content on disk until the LLM actually requests it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function Naming
&lt;/h3&gt;

&lt;p&gt;Prefix skill functions clearly: &lt;code&gt;activate_skill_code_review&lt;/code&gt;. This makes it obvious in logs what's happening. When you see &lt;code&gt;activate_skill_*&lt;/code&gt; in your logs, you know a skill was activated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversation Flow
&lt;/h3&gt;

&lt;p&gt;The exact sequence matters. Here's what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends message&lt;/li&gt;
&lt;li&gt;LLM responds with tool_calls (requesting a skill)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical&lt;/strong&gt;: Add assistant message with tool_calls to conversation&lt;/li&gt;
&lt;li&gt;Add tool message with skill content&lt;/li&gt;
&lt;li&gt;LLM continues with skill instructions&lt;/li&gt;
&lt;li&gt;Final response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you skip step 3, OpenAI will reject your request. The tool_calls must be properly formatted with a type field and nested function object. This is a common gotcha. (See &lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI's tools documentation&lt;/a&gt; for details.)&lt;/p&gt;
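&lt;p&gt;For illustration, here's what the message list looks like after steps 3 and 4 (ids and names are made up; the shape follows OpenAI's chat format):&lt;/p&gt;

```python
messages = [
    {"role": "user", "content": "Review this code for SQL injection"},
    {   # step 3: assistant message echoing the tool call, with the
        # required type field and nested function object
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "activate_skill_code_review", "arguments": "{}"},
        }],
    },
    {   # step 4: tool result carrying the full skill content
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "# Code Review Skill\n...",
    },
]
```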

&lt;h3&gt;
  
  
  Looping for Multiple Tool Calls
&lt;/h3&gt;

&lt;p&gt;Skills can chain. A skill might activate code execution, which might need another skill. Your agent should loop until there are no more tool calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="nf"&gt;handle_tool_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always pass tools in every call, even after skill activation. Otherwise, skills can't use other tools like code execution. (&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;See full implementation&lt;/a&gt; for the complete loop logic.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Considerations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skill Scope&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One skill equals one domain. Keep them focused.&lt;/p&gt;

&lt;p&gt;Good examples: code-review, git-helper, api-tester&lt;br&gt;
Bad example: developer-tools (too broad)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use clear sections with examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the skill does&lt;/li&gt;
&lt;li&gt;How to approach tasks&lt;/li&gt;
&lt;li&gt;Expected output format&lt;/li&gt;
&lt;li&gt;Examples of good results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wall of text doesn't work. Structure helps the LLM follow instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Handling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What if a skill doesn't exist? Return a helpful error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Skill 'xyz' not found. Available: code-review, git-helper"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Mistakes &amp;amp; Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Loading Everything Upfront
&lt;/h3&gt;

&lt;p&gt;Some implementations load all skills at startup. This defeats the purpose because you're back to loading everything upfront, wasting memory and context tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: Load metadata only during discovery. Activate skills when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vague Skill Descriptions
&lt;/h3&gt;

&lt;p&gt;The LLM uses skill descriptions to decide which to activate. Be specific.&lt;/p&gt;

&lt;p&gt;❌ "Helps with code"&lt;br&gt;
✅ "Reviews Python/JavaScript code for security vulnerabilities, PEP 8 compliance, and performance issues"&lt;/p&gt;

&lt;p&gt;Include what the skill does, what types of tasks it handles, and key capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrong Tool Calls Format
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error&lt;/strong&gt;: &lt;code&gt;Missing required parameter: messages[1].tool_calls[0].type&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cause&lt;/strong&gt;: OpenAI requires a specific nested structure. The tool_calls must have a type field and nest the function details under a function key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: Use the correct format with type: "function" and nested function object. Don't flatten it. See &lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI's tools documentation&lt;/a&gt; for the exact message format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forgetting to Include Tools After Skill Activation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: After activating a skill, the LLM can't use other tools like code execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix&lt;/strong&gt;: Always pass tools in every LLM call. Don't remove tools after skill activation because skills might need them.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Structure in Skills
&lt;/h3&gt;

&lt;p&gt;A wall of text doesn't work. Use clear headings, bullet points, code examples, and expected output formats. The LLM follows structured instructions much better than prose.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Skills Make Sense
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-domain agents that handle code, git, and devops&lt;/li&gt;
&lt;li&gt;Agents with specialized workflows&lt;/li&gt;
&lt;li&gt;Teams sharing common patterns&lt;/li&gt;
&lt;li&gt;When you hit context limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not needed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-purpose agents&lt;/li&gt;
&lt;li&gt;Agents with small, focused prompts&lt;/li&gt;
&lt;li&gt;Prototypes and experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't over-engineer. If your system prompt is small and manageable, you probably don't need skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Standard
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;AgentSkills.io&lt;/a&gt; defines the open format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SKILL.md naming convention&lt;/li&gt;
&lt;li&gt;YAML frontmatter schema&lt;/li&gt;
&lt;li&gt;Directory structure&lt;/li&gt;
&lt;li&gt;Best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Following the standard means your skills work with other implementations. Skills become portable across projects and teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your First Skill
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create the directory: &lt;code&gt;mkdir -p skills/my-first-skill&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create SKILL.md with YAML frontmatter and markdown instructions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate SkillsManager into your agent (&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;see GitHub repo for full code&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test it by asking your agent to use the skill and verifying it activates&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. No code changes needed to add new skills. Just drop in a SKILL.md file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Agent skills are structured prompts with a loading mechanism.&lt;/p&gt;

&lt;p&gt;The pattern works because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It keeps context lean by only loading what you need&lt;/li&gt;
&lt;li&gt;It makes agents modular since skills are independent&lt;/li&gt;
&lt;li&gt;It enables skill reuse so you can share skills across projects&lt;/li&gt;
&lt;li&gt;It simplifies debugging with clear activation logs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can build a working implementation in an afternoon. The core SkillsManager is about 130 lines of Python. (&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;View the implementation&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Start with one skill. See if it helps. Expand from there.&lt;/p&gt;

&lt;p&gt;The complete working implementation is available on &lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Use it as a reference or starting point for your own agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;AgentSkills.io&lt;/a&gt; - Official specification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anthropics/claude-skills" rel="noopener noreferrer"&gt;Claude Skills&lt;/a&gt; - Anthropic's skill examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/samuelint/open-agent-skills" rel="noopener noreferrer"&gt;Open Agent Skills&lt;/a&gt; - Community skill library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/onlyoneaman/agent-skills" rel="noopener noreferrer"&gt;Working Implementation&lt;/a&gt; - Complete code from this tutorial&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://amankumar.ai/blogs/wtf-is-frontmatter" rel="noopener noreferrer"&gt;WTF is Frontmatter?&lt;/a&gt; - Understanding frontmatter/metadata in markdown files&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://amankumar.ai/blogs/anatomy-of-a-prompt-the-complete-guide-to-crafting-effective-ai-chatgpt-prompts" rel="noopener noreferrer"&gt;Anatomy of a Prompt&lt;/a&gt; - Complete guide to crafting effective AI prompts with structured approaches&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;OpenAI Function Calling&lt;/a&gt; - Official OpenAI tools documentation&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>skills</category>
    </item>
    <item>
      <title>I cut my site’s image payload by 97.7% (83.57 MB → 1.89 MB)</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Tue, 23 Dec 2025 06:36:46 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/i-cut-my-sites-image-payload-by-977-8357-mb-189-mb-3338</link>
      <guid>https://dev.to/onlyoneaman/i-cut-my-sites-image-payload-by-977-8357-mb-189-mb-3338</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F0%2Aa9R-cspPepIXAYRJ" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F0%2Aa9R-cspPepIXAYRJ" alt="captionless image" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I run &lt;a href="https://amankumar.ai/" rel="noopener noreferrer"&gt;amankumar.ai&lt;/a&gt; on Next.js. Pages would load, layouts would appear, and React would do its thing, but something still felt slow.&lt;/p&gt;

&lt;p&gt;The problem wasn’t that the page didn’t render.&lt;/p&gt;

&lt;p&gt;The problem was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The page renders, but images stay blank for a moment. Then they drop in one by one. And the site doesn’t feel visually complete until later.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That gap, where placeholders sit there and images arrive late, is subtle, but once you notice it, you can’t unsee it. It makes the site feel heavier than it should.&lt;/p&gt;

&lt;p&gt;So I stopped guessing and looked at the boring part I’d been ignoring: &lt;strong&gt;image payloads&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this keeps happening on modern sites (without anyone messing up)
&lt;/h2&gt;

&lt;p&gt;This problem shows up a lot more today, even on well-built sites, and it’s not because people are careless.&lt;/p&gt;

&lt;p&gt;A few things are happening at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Design tools export big images by default.&lt;/strong&gt; Posters, UI mockups, and screenshots often come out as large PNGs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI-generated images are heavy by nature.&lt;/strong&gt; High detail, high resolution, no concern for delivery size.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frameworks don’t change asset intent.&lt;/strong&gt; Next.js can optimize &lt;em&gt;delivery&lt;/em&gt;, but if the source image is huge, it still has to download.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Image bloat accumulates quietly.&lt;/strong&gt; No errors, no warnings, just slower visual completion over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is exactly what I was seeing: the page loads, but images arrive last.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring the problem on amankumar.ai
&lt;/h2&gt;

&lt;p&gt;Before touching anything, I measured the image assets in the repo.&lt;/p&gt;

&lt;p&gt;They were a mix of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  PNGs (most of them)&lt;/li&gt;
&lt;li&gt;  some JPG/JPEGs&lt;/li&gt;
&lt;li&gt;  different resolutions&lt;/li&gt;
&lt;li&gt;  photos, posters, and UI graphics, all mixed together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After running a single optimization pass, the numbers looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68kirvfcjy1jgk58qak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68kirvfcjy1jgk58qak.png" alt="Optimization Results for amankumar.ai" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;repo-level image weight&lt;/strong&gt;, not “every page ships 83 MB.” But it clearly showed the root issue. I was carrying a lot of unnecessary image data, and the browser was paying for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The engineering question I cared about
&lt;/h2&gt;

&lt;p&gt;I didn’t want a clever setup. I wanted something:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  repeatable&lt;/li&gt;
&lt;li&gt;  boring&lt;/li&gt;
&lt;li&gt;  safe for mixed assets&lt;/li&gt;
&lt;li&gt;  easy to re-run later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the real question became:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What’s the simplest image pipeline that’s correct most of the time?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  First principle: pixels matter more than formats
&lt;/h2&gt;

&lt;p&gt;Before debating PNG vs WebP vs AVIF, the biggest issue was &lt;strong&gt;pixel count&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If an image is 2400px wide but is never rendered above ~800px, shipping the extra pixels is pure waste.&lt;/p&gt;

&lt;p&gt;So I made one hard rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cap the max dimension at 800px.&lt;/strong&gt; If an image is larger, downscale it. If it’s smaller, leave it alone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This single decision removes a surprising amount of weight.&lt;/p&gt;
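&lt;p&gt;The rule fits in a few lines. This is a sketch of the sizing math only (with Pillow, &lt;code&gt;Image.thumbnail&lt;/code&gt; gives you the same downscale-only behaviour):&lt;/p&gt;

```python
def capped_size(width, height, max_dim=800):
    """Fit within max_dim on the longest side, preserving aspect ratio.
    Never upscales: smaller images are returned unchanged."""
    longest = max(width, height)
    if longest > max_dim:
        scale = max_dim / longest
        return round(width * scale), round(height * scale)
    return width, height  # already small enough, leave it alone
```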

&lt;h2&gt;
  
  
  Image formats: quality vs size (same perceived quality)
&lt;/h2&gt;

&lt;p&gt;Once dimensions are sane, format choice actually matters.&lt;/p&gt;

&lt;p&gt;Here’s the practical ranking when comparing images at roughly the same visual quality:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgm47m2amdjoxkf94l0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgm47m2amdjoxkf94l0e.png" alt="Comparing Image Formats" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AVIF genuinely compresses better than WebP in many cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I didn’t standardize on AVIF (yet)
&lt;/h2&gt;

&lt;p&gt;This is important, because AVIF &lt;em&gt;is&lt;/em&gt; genuinely impressive.&lt;/p&gt;

&lt;p&gt;The reason I didn’t standardize on it is &lt;strong&gt;not quality&lt;/strong&gt;. It’s &lt;strong&gt;practical browser support and operational safety&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;According to current browser compatibility data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AVIF support is &lt;strong&gt;good&lt;/strong&gt;, but not universal&lt;/li&gt;
&lt;li&gt;  Some older browsers, embedded webviews, and edge cases still fall back poorly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can see the current state clearly here: &lt;a href="https://caniuse.com/?search=avif" rel="noopener noreferrer"&gt;https://caniuse.com/?search=avif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this project, I didn’t want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  extra format negotiation&lt;/li&gt;
&lt;li&gt;  more fallbacks&lt;/li&gt;
&lt;li&gt;  “why didn’t this image load on X device?” debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the decision wasn’t “AVIF is bad.”&lt;/p&gt;


&lt;p&gt;It was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WebP is the safest modern default today.&lt;/strong&gt; AVIF can be layered later if needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The tricky part: photos vs UI/posters
&lt;/h2&gt;

&lt;p&gt;My repo wasn’t just photos.&lt;/p&gt;

&lt;p&gt;It had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  UI graphics&lt;/li&gt;
&lt;li&gt;  posters with text&lt;/li&gt;
&lt;li&gt;  logos and illustrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you treat everything like a photo and apply lossy compression everywhere, UI assets suffer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  fuzzy text&lt;/li&gt;
&lt;li&gt;  halos around edges&lt;/li&gt;
&lt;li&gt;  cheap-looking posters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WebP helped here because it supports &lt;strong&gt;both lossy and lossless modes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The challenge was choosing between them &lt;em&gt;without manual tagging&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The simple rule that worked
&lt;/h2&gt;

&lt;p&gt;I used one clean signal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;If the image has transparency (alpha)&lt;/strong&gt;, likely UI or graphic, use &lt;strong&gt;lossless WebP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;If it’s fully opaque&lt;/strong&gt;, likely a photo, use &lt;strong&gt;lossy WebP&lt;/strong&gt; (quality ~80)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is this perfect? No. Is it correct often enough to automate safely? Yes.&lt;/p&gt;

&lt;p&gt;That single rule handled mixed assets without human intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metadata: invisible weight
&lt;/h2&gt;

&lt;p&gt;Many images carry metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  camera EXIF&lt;/li&gt;
&lt;li&gt;  editing history&lt;/li&gt;
&lt;li&gt;  embedded profiles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this helps a webpage load faster or look better.&lt;/p&gt;

&lt;p&gt;So I strip metadata unconditionally. Sometimes the savings are small; sometimes they’re large. Either way, it’s free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sanity check on a very different repo
&lt;/h2&gt;

&lt;p&gt;To make sure this wasn’t a one-off, I ran the &lt;strong&gt;same script, unchanged&lt;/strong&gt;, on another repo: &lt;a href="https://promptsmint.com/" rel="noopener noreferrer"&gt;promptsmint.com&lt;/a&gt;, an AI prompts library with only AI-generated images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  92 images&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;228.12 MB total&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;2.48 MB per image (avg)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  92 images&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;4.28 MB total&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;~50 KB per image (avg)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;98.0% reduction&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;223.84 MB saved&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different site. Different content. Same outcome.&lt;/p&gt;

&lt;p&gt;That confirmed this wasn’t luck. It was just removing waste.&lt;/p&gt;

&lt;h2&gt;
  
  
  The script
&lt;/h2&gt;

&lt;p&gt;I packaged the whole pipeline into a single script.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  finds all &lt;em&gt;jpg/jpeg/png/webp&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  outputs &lt;em&gt;*-optimized.webp&lt;/em&gt; next to originals&lt;/li&gt;
&lt;li&gt;  caps size at &lt;strong&gt;800px&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  strips metadata&lt;/li&gt;
&lt;li&gt;  uses &lt;strong&gt;lossless WebP for transparent images&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  uses &lt;strong&gt;lossy WebP for opaque images&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No clever tricks. Just consistent rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual outcome
&lt;/h2&gt;

&lt;p&gt;I didn’t magically make Next.js faster.&lt;/p&gt;

&lt;p&gt;What changed is simpler and more important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Images arrive earlier. Placeholders disappear sooner. Pages feel visually complete faster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That was the slowness I was noticing, and this fixed it.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you found this useful, you can follow or connect with me here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/onlyoneaman/" rel="noopener noreferrer"&gt;x.com&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/onlyoneaman/" rel="noopener noreferrer"&gt;linkedin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>frontend</category>
      <category>web</category>
      <category>webp</category>
      <category>images</category>
    </item>
    <item>
      <title>I Tested 7 Python PDF Extractors So You Don’t Have To (2025 Edition)</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Thu, 18 Dec 2025 13:14:00 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/i-tested-7-python-pdf-extractors-so-you-dont-have-to-2025-edition-akm</link>
      <guid>https://dev.to/onlyoneaman/i-tested-7-python-pdf-extractors-so-you-dont-have-to-2025-edition-akm</guid>
      <description>&lt;h3&gt;
  
  
  Why This Even Matters
&lt;/h3&gt;

&lt;p&gt;PDF extraction sounds boring until you need it. Then it becomes the bottleneck in everything you’re trying to build.&lt;/p&gt;

&lt;p&gt;Maybe you’re building a document search system and need clean text for indexing. Maybe you’re creating embeddings for a RAG pipeline, and garbage text means garbage vectors. Maybe you’re processing invoices, analysing research papers, or just trying to get data out of that quarterly report someone sent you.&lt;/p&gt;

&lt;p&gt;For small PDFs? Sure, you can often just pass the whole thing to Claude or GPT-4. But when you’re dealing with hundreds of documents, building search systems, or need structured data for processing, that’s when extraction quality actually matters.&lt;/p&gt;

&lt;p&gt;So I decided to test the most popular Python libraries the way most developers would actually use them: minimal setup, basic extraction, real-world document.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Tested
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Document&lt;/strong&gt;: A typical business PDF — one page with headers, body text, a six-column table, and an image. The kind of thing that shows up in email attachments daily. You can find it &lt;a href="https://bella.amankumar.ai/examples/pdf/11-google-doc-document/google-doc-document.pdf" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Environment&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  MacBook M2 Pro, 16GB RAM, macOS 15&lt;/li&gt;
&lt;li&gt;  Fresh Python 3.11 virtual environment&lt;/li&gt;
&lt;li&gt;  Clean pip installs, no optimisations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the code for this test &lt;a href="https://gist.github.com/onlyoneaman/a479b3875524d39cee234f013866015e" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Approach&lt;/strong&gt;: I used the simplest possible implementation for each library — the approach you’d try first when you’re just getting started. Most of these packages have advanced configuration options, specialised table extraction methods, and layout analysis features that could dramatically change results. But I wanted to see what you get with minimal effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Measured&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Speed&lt;/strong&gt;: How fast can it process a single page?&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Text Quality&lt;/strong&gt;: Is the output readable and properly formatted?&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Table Handling&lt;/strong&gt;: Do tables survive extraction in usable form?&lt;/li&gt;
&lt;/ul&gt;
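&lt;p&gt;The speed numbers came from a plain wall-clock timing loop. A minimal sketch of that kind of harness (the extractor callables are stand-ins for the library snippets below; the exact gist code may differ):&lt;/p&gt;

```python
# Minimal timing harness for comparing extractors (a sketch, not the exact gist code).
import time

def benchmark(extractors, pdf_path, runs=3):
    # extractors: mapping of name to a callable taking a file path and returning text.
    results = {}
    for name, extract in extractors.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            text = extract(pdf_path)
            timings.append(time.perf_counter() - start)
        # Best-of-N smooths out warm-up noise; also record output length.
        results[name] = {"best_s": min(timings), "chars": len(text)}
    return results
```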

&lt;h2&gt;
  
  
  The Libraries (How to test quickly &amp;amp; Honest First Impressions)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  pypdfium2 — The Speed Champion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install pypdfium2
import pypdfium2 as pdfium
text = "\n".join(
    p.get_textpage().get_text_range() 
    for p in pdfium.PdfDocument("doc.pdf")
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Clean, readable text in 0.004 seconds. No formatting, no table structure — just fast, basic extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: High-volume processing, simple content indexing, when speed matters more than structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt; you need any formatting preservation or structured data extraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  pypdf — The Reliable Default
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install pypdf
from pypdf import PdfReader
reader = PdfReader("doc.pdf")
text = "\n".join(p.extract_text() for p in reader.pages)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Solid text extraction with occasional spacing quirks. Works everywhere, no C dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: Lambda functions, containerised apps, environments where you can’t compile extensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: Text fidelity is critical for your downstream processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  pdfplumber — The Data Extraction Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install pdfplumber
import pdfplumber
with pdfplumber.open("doc.pdf") as pdf:
    first_page = pdf.pages[0]
    text = first_page.extract_text()
    table = first_page.extract_table()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Basic text had some concatenation issues, but the table extraction worked well. This library has extensive options for fine-tuning that I didn’t explore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: When you specifically need tabular data, coordinate-based extraction, or detailed layout control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: You just need clean text without heavy configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  pymupdf4llm — The Markdown Generator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install pymupdf4llm
import pymupdf4llm
markdown = pymupdf4llm.to_markdown("doc.pdf")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Clean markdown output in 0.14 seconds with proper headings and table formatting. Surprisingly good results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: Content systems, documentation processing, when you need structured text that preserves hierarchy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: You’re dealing with complex multi-column layouts that might get scrambled.&lt;/p&gt;

&lt;h3&gt;
  
  
  unstructured — The Semantic Chunker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install "unstructured[all-docs]"
from unstructured.partition.auto import partition
blocks = partition(filename="doc.pdf")
for block in blocks:
    print(f"{block.category}: {block.text}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Semantically labelled chunks (Title, NarrativeText, etc.) in 1.11 seconds. Perfect for downstream processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: RAG systems, document analysis, when you need meaningful content boundaries for embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: You just need raw text content without semantic analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  marker-pdf — The Layout Perfectionist
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install marker-pdf
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
from marker.output import text_from_rendered
text, _, _ = text_from_rendered(
    PdfConverter(create_model_dict())("doc.pdf")
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Stunning layout-perfect markdown with inline images. Takes 12 seconds and downloads a 1GB model on first run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: When layout fidelity is critical, vision model inputs, and high-quality document conversion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: You’re processing documents in real-time or have resource constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  textract — The Universal Handler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install textract  # Requires Tesseract
import textract
text = textract.process("doc.pdf").decode()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I got&lt;/strong&gt;: Fast extraction (0.05s) with automatic OCR fallback capability. Handles many file formats beyond PDF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for&lt;/strong&gt;: Mixed document types, when some files might be scanned, and building robust document processing pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider if&lt;/strong&gt;: You only handle digital PDFs and want to avoid additional dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Performance Results
&lt;/h2&gt;

&lt;p&gt;Here’s what actually happened with my test document:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;marker-pdf&lt;/strong&gt; (11.3s): Perfect structure preservation, ideal for high-quality conversions, long time though&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pymupdf4llm&lt;/strong&gt; (0.12s): Excellent markdown output, great balance of speed and quality&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;unstructured&lt;/strong&gt; (1.29s): Clean semantic chunks, perfect for RAG workflows&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;textract&lt;/strong&gt; (0.21s): Fast with OCR capabilities, minor formatting variations&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pypdfium2&lt;/strong&gt; (0.003s): Blazing speed, clean basic text, no structure&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pypdf&lt;/strong&gt; (0.024s): Reliable extraction, occasional spacing artifacts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pdfplumber&lt;/strong&gt; (0.10s): Good for tables, text extraction needs configuration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat&lt;/strong&gt;: These results reflect basic usage with minimal configuration. Each library has advanced features that could significantly change performance for specific use cases. You can find the link to all results in the references.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context matters more than raw performance&lt;/strong&gt;. The “best” extractor depends entirely on what you’re building and how you’ll use the extracted text.&lt;br&gt;
&lt;strong&gt;Simple often wins&lt;/strong&gt;. For many use cases, basic text extraction is perfectly adequate. Don’t over-engineer unless you actually need the advanced features.&lt;br&gt;
&lt;strong&gt;Test with your data&lt;/strong&gt;. PDF structures vary wildly. What works great on my test document might fail on your quarterly reports.&lt;br&gt;
&lt;strong&gt;Have a fallback plan&lt;/strong&gt;. For production systems, consider hybrid approaches: fast extraction first, and more sophisticated methods for edge cases.&lt;br&gt;
&lt;strong&gt;Advanced features exist&lt;/strong&gt;. This comparison only scratched the surface. Most libraries have configuration options that could completely change the results.&lt;/p&gt;
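&lt;p&gt;That fallback idea fits in a few lines of Python. The extractor callables below are placeholders for whichever libraries you pair up (say, pypdfium2 first, then something heavier for stubborn files), and the minimum-length heuristic is an assumption, not something from my test:&lt;/p&gt;

```python
# Sketch of a hybrid pipeline: try fast extractors first,
# fall back to slower ones when the output looks too thin.
def extract_with_fallback(pdf_path, extractors, min_chars=200):
    # extractors: ordered list of (name, callable) pairs, fastest first.
    last_error = None
    for name, extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception as exc:
            last_error = exc
            continue
        # "enough" means the stripped text is at least min_chars long.
        enough = min(len(text.strip()), min_chars) == min_chars
        if enough:
            return name, text
    if last_error:
        raise last_error
    raise ValueError("no extractor produced enough text for %s" % pdf_path)
```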

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;The scope of this experiment was limited, but it should cover most basic extraction use cases. Next, I will try to tackle more problems like the ones listed below. If you have anything else we should add, please let me know.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;More document types (DOC, DOCX, …)&lt;/strong&gt;: these are also very common formats; we either need to convert them to a common format or check each package for compatibility&lt;/li&gt;
&lt;li&gt;  Handling &lt;strong&gt;Password-Protected PDFs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dealing with OCR Text&lt;/strong&gt;: PDF files may contain scanned images of text, which cannot be extracted using standard methods. To handle OCR (Optical Character Recognition) text, specialised libraries like pytesseract (a wrapper for Google’s Tesseract OCR engine) can be used to extract text from the images.&lt;/li&gt;
&lt;li&gt;  Even in PDFs / Documents, there are more edge cases like handling images inside documents, OCR, vector drawings, and more.&lt;/li&gt;
&lt;li&gt;  Forms, especially with checkboxes&lt;/li&gt;
&lt;li&gt;  Rotated PDFs&lt;/li&gt;
&lt;li&gt;  Right-to-left script (Arabic/Hebrew).&lt;/li&gt;
&lt;li&gt;  DOCX containing elements like an embedded Excel chart or a floating text box.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Pick the tool that fits your actual requirements, not the one with the highest benchmark scores.&lt;/p&gt;

&lt;p&gt;For most document processing needs, &lt;strong&gt;pymupdf4llm&lt;/strong&gt; hits the sweet spot of speed and quality. For RAG systems, &lt;strong&gt;unstructured&lt;/strong&gt; gives you better semantic chunks. For pure speed, &lt;strong&gt;pypdfium2&lt;/strong&gt; is hard to beat.&lt;/p&gt;

&lt;p&gt;But honestly? The extraction is usually the easy part. The real work happens in how you process, chunk, and use that text afterwards.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found different results with your documents or discovered better approaches? I’d love to hear about it — this space moves fast, and real-world feedback keeps comparisons honest.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I hope you have found this article useful 😄. Thank you for reading.&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://medium.com/" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; for more articles. Also, let’s connect on &lt;a href="https://twitter.com/onlyoneaman" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/onlyoneaman/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. ☕️&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bella.amankumar.ai/examples/pdf/11-google-doc-document/google-doc-document.pdf" rel="noopener noreferrer"&gt;https://bella.amankumar.ai/examples/pdf/11-google-doc-document/google-doc-document.pdf&lt;/a&gt; (Test Python File)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/onlyoneaman/a479b3875524d39cee234f013866015e" rel="noopener noreferrer"&gt;https://gist.github.com/onlyoneaman/a479b3875524d39cee234f013866015e&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Final Results ZIP: &lt;a href="http://bella.amankumar.ai/examples/pdf/1.zip" rel="noopener noreferrer"&gt;http://bella.amankumar.ai/examples/pdf/1.zip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/py-pdf/benchmarks?source=post_page-----c88013922257---------------------------------------" rel="noopener noreferrer"&gt;GitHub - py-pdf/benchmarks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/?source=post_page-----c88013922257---------------------------------------" rel="noopener noreferrer"&gt;pymupdf.readthedocs.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/marker-pdf/?source=post_page-----c88013922257---------------------------------------" rel="noopener noreferrer"&gt;marker-pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/aiwand/?source=post_page-----c88013922257---------------------------------------" rel="noopener noreferrer"&gt;aiwand - pypi.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://amankumar.ai/?source=post_page-----c88013922257---------------------------------------" rel="noopener noreferrer"&gt;amankumar.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>extraction</category>
    </item>
    <item>
      <title>Export Your Brain: A Simple Way To Make Any AI “Know You” From Day One</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Wed, 17 Dec 2025 06:24:56 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/export-your-brain-a-simple-way-to-make-any-ai-know-you-from-day-one-2bnd</link>
      <guid>https://dev.to/onlyoneaman/export-your-brain-a-simple-way-to-make-any-ai-know-you-from-day-one-2bnd</guid>
      <description>&lt;p&gt;Every new AI tool treats you like a stranger.&lt;/p&gt;

&lt;p&gt;New chat. New model. New product.&lt;br&gt;
Same ritual:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“I’m a dev, I use React/TS/Tailwind.”&lt;/li&gt;
&lt;li&gt;“Keep it short, show code first.”&lt;/li&gt;
&lt;li&gt;“Don’t give me motivational quotes.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ve already trained one AI on all of this.&lt;br&gt;
It knows your stack, your tone, your quirks, your life context.&lt;/p&gt;

&lt;p&gt;The question is: &lt;strong&gt;how do you steal that “you” and carry it anywhere?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s what this article is about:&lt;br&gt;
Not “exporting ChatGPT data”, but &lt;strong&gt;exporting your persona&lt;/strong&gt; so any AI, any agent, any system can plug into it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Real Problem: Your Context Is Trapped
&lt;/h2&gt;

&lt;p&gt;We’re in a multi-AI world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT for ideation / debugging&lt;/li&gt;
&lt;li&gt;Claude / Gemini for writing&lt;/li&gt;
&lt;li&gt;Cursor / Copilot / Replit for coding&lt;/li&gt;
&lt;li&gt;Internal bots at work&lt;/li&gt;
&lt;li&gt;Random tools for notes, docs, email, scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one gets a tiny slice of you.&lt;br&gt;
None of them truly “know you”.&lt;/p&gt;

&lt;p&gt;Your preferences, your working style, your constraints, your goals – they’re all split across chats, tabs, and products.&lt;/p&gt;

&lt;p&gt;What we actually want:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“My AI context should be as portable as my email address.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Until the tools natively support that, we hack it:&lt;br&gt;
we create a &lt;strong&gt;Persona Memo&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Idea: A Persona Memo You Can Plug Anywhere
&lt;/h2&gt;

&lt;p&gt;Instead of trying to export raw chat history, you ask your main AI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Look at everything you know about me.&lt;br&gt;
Compress it into a memo + JSON that any other system can use to work with me.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Gives &lt;em&gt;you&lt;/em&gt; a clean “Operating Manual” of yourself.&lt;/li&gt;
&lt;li&gt;Gives &lt;em&gt;other AIs&lt;/em&gt; a high-signal context to start from.&lt;/li&gt;
&lt;li&gt;Works &lt;strong&gt;across tools&lt;/strong&gt;: ChatGPT, Claude, Grok, Gemini, Cursor, internal agents, whatever.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You run one good prompt → get back a human-readable memo + machine-readable persona.&lt;/p&gt;

&lt;p&gt;That’s your portable brain.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Master Prompt (Use This On Whatever Knows You Best)
&lt;/h2&gt;

&lt;p&gt;Take this and paste it into the AI where you’ve spent the most time (ChatGPT, Claude, etc.):&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;




&lt;p&gt;What you get back:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Operating Manual&lt;/strong&gt; – human-readable, like a one-pager about how to work with you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persona JSON&lt;/strong&gt; – machine-readable, ready to be plugged into agents, system prompts, configs, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is already a massive upgrade over “Hi, I like React and short answers”.&lt;/p&gt;




&lt;h2&gt;
  
  
  How To Use This Persona In The Wild
&lt;/h2&gt;

&lt;p&gt;Once you’ve generated it, you now have a &lt;strong&gt;portable context pack&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start any new AI chat like this&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;“Here’s my persona. Assume all of this as background whenever you answer.”&lt;br&gt;
&lt;em&gt;[paste Operating Manual or a trimmed version]&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Set it as “custom instructions” / “system prompt”&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many tools let you define persistent instructions.&lt;br&gt;
   Drop your memo there and stop repeating yourself.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Wire it into your own agents&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Store the JSON in your DB or config&lt;/li&gt;
&lt;li&gt;Load it into your backend when instantiating an LLM&lt;/li&gt;
&lt;li&gt;Prepend it in the system or developer prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your internal bots now behave like they’ve worked with you for months.&lt;/p&gt;
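&lt;p&gt;A minimal sketch of that wiring in Python. The persona fields and the messages shape here are hypothetical; adapt them to whatever schema your Persona JSON actually uses:&lt;/p&gt;

```python
# Sketch: load a stored persona JSON and prepend it to an LLM system prompt.
# The field names below are hypothetical examples, not a fixed schema.
import json

PERSONA_JSON = """
{
  "name": "Aman",
  "role": "developer",
  "stack": ["React", "TypeScript", "Tailwind"],
  "style": "short answers, code first, no motivational quotes"
}
"""

def build_system_prompt(persona_json, base="You are a helpful assistant."):
    p = json.loads(persona_json)
    lines = [
        base,
        "User persona (assume as background):",
        "- Name: %s (%s)" % (p["name"], p["role"]),
        "- Stack: %s" % ", ".join(p["stack"]),
        "- Style: %s" % p["style"],
    ]
    return "\n".join(lines)

# Prepend the persona as the system message of any chat-style API call.
messages = [{"role": "system", "content": build_system_prompt(PERSONA_JSON)}]
```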

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Use it as onboarding material&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can even share a cleaned-up version with collaborators:&lt;br&gt;
   “Here’s how I work, this will save both of us time.”&lt;/p&gt;


&lt;h2&gt;
  
  
  Keep It Tight: Version, Don’t Worship It
&lt;/h2&gt;

&lt;p&gt;A few practical notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update it every 1–3 months&lt;/strong&gt;&lt;br&gt;
Your stack, goals, priorities shift. Rerun the prompt or manually edit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treat it like config, not scripture&lt;/strong&gt;&lt;br&gt;
If something feels wrong, change it. This is &lt;em&gt;your&lt;/em&gt; OS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Be selective with where you paste it&lt;/strong&gt;&lt;br&gt;
Don’t feed deeply personal / sensitive stuff to random SaaS forms.&lt;br&gt;
Keep a “public-safe” persona and a more detailed internal one if needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Quick “Lite” Prompt (When You Don’t Need JSON)
&lt;/h2&gt;

&lt;p&gt;Sometimes you just want a small memo you can paste into a one-off tool.&lt;/p&gt;

&lt;p&gt;Here’s a shorter version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an expert integration developer and personal-knowledge summariser.

Based on everything you know about me in this workspace, create a concise "Operating Manual" for working with me that I can paste into ANY AI tool.

Include sections:
- Who I am (1–2 lines)
- How I work best
- How to talk to me
- My tech / domain preferences
- What to double-check with me
- Things to avoid
- What I’m currently optimising for

Maximum 350 words. Use direct, confident language. Prioritise stable patterns over one-off details.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is your “carry in pocket” version.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Point Of All This
&lt;/h2&gt;

&lt;p&gt;You’re already training these models every day just by talking to them.&lt;/p&gt;

&lt;p&gt;Might as well &lt;strong&gt;capture&lt;/strong&gt; that and reuse it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One good prompt&lt;/li&gt;
&lt;li&gt;One persona memo&lt;/li&gt;
&lt;li&gt;Works across models, products, and your own agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So next time you open a new AI tool, you’re not introducing yourself.&lt;/p&gt;

&lt;p&gt;You’re just handing over your Operating Manual and saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Start here. We’ve got work to do.”&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>prompt</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>How to Make Rails Response Faster</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Thu, 30 Mar 2023 20:16:17 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/how-to-make-rails-response-faster-19ff</link>
      <guid>https://dev.to/onlyoneaman/how-to-make-rails-response-faster-19ff</guid>
      <description>&lt;p&gt;As your document or response size increases 📈, it can result in much slower response time 📈 📈 , even in seconds 😿 which results in a pretty bad user experience. Speed is the key here 🚤, the quicker your site is the more your users like it. PageSpeed and Responses Size are also responsible for SEO Rankings and is a major part of &lt;a href="https://developers.google.com/speed" rel="noopener noreferrer"&gt;LightSpeed&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the sections below, we explore how this is achieved and tested. If you just have a bug to fix 🐛, jump straight to the Implementation section and check the rest out later.&lt;/p&gt;

&lt;h3&gt;
  
  
  So, How do we achieve this?
&lt;/h3&gt;

&lt;p&gt;I found Rack::Deflater recently and regret not using it sooner. I experimented by sending a heavy JSON response and comparing the performance of a Rails app with and without Rack::Deflater. Below we can see that 8.3 MB was transferred without Rack::Deflater versus 359 KB with it. The exact compression ratio can vary a bit, but it consistently results in a faster response, so Rack::Deflater undoubtedly makes the user experience better by compressing resources before sending them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AKbgjoCoVJFU89m8W876fEw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AKbgjoCoVJFU89m8W876fEw.png" alt="Size of package transferred over network with and without Rack::Deflater" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So how does Rack::Deflater do this? It compresses the response body before responding to a request. When the client receives the response, it sees gzip compression enabled via the Content-Encoding: gzip header and unzips the body before rendering it. That may sound like a lot of trouble, but remember that CPUs are generally fast and networks are slow. It’s much faster and more efficient to send less data over the internet, even if we spend CPU time compressing and expanding that data.&lt;/p&gt;

&lt;p&gt;“Transferred” is the compressed size of all resources. You can think of it as the amount of upload and download data that a mobile user will use in order to load this page. “Resources” is the uncompressed size of all resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;Just add the below line in your application.rb file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;*config.middleware.use Rack::Deflater*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Your file should be like this after it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;....
class Application &amp;lt; Rails::Application
  ...
  *config.middleware.use Rack::Deflater
  ...
end
....*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  TroubleShooting
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Check if you have placed the middleware line correctly in code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you are using ActionDispatch::Static in your app, Rack::Deflater should be placed after it. The reasoning: if your app also serves static assets (e.g., on Heroku), those assets are already compressed when served from disk, and inserting Rack::Deflater before ActionDispatch::Static would only make it try to re-compress them. Therefore, as a performance optimisation:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    # application.rb

    config.middleware.insert_after ActionDispatch::Static, Rack::Deflater 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If any other middleware conflicts with Rack::Deflater, it should work if you use insert_before (instead of use) to place it near the top of the middleware stack, before any other middleware that might send a response; use places it at the bottom of the stack. Say the topmost middleware is Rack::Sendfile. Then we would use:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;config.middleware.insert_before(Rack::Sendfile, Rack::Deflater)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can get the list of middleware, in load order, by running &lt;code&gt;rake middleware&lt;/code&gt; from the command line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you have found this useful 😄. Thank you for reading.&lt;br&gt;
Drop a Response or 📬 if you have any queries or just wanna connect ☕️.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
    </item>
    <item>
      <title>React Hooks and Lifecycle Methods</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Wed, 29 Mar 2023 09:19:16 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/react-hooks-and-lifecycle-methods-1kdo</link>
      <guid>https://dev.to/onlyoneaman/react-hooks-and-lifecycle-methods-1kdo</guid>
      <description>&lt;p&gt;React State and Lifecycle are very useful methods and with the advancement of React hooks and when a developer uses hooks instead of traditional React classes the most important question becomes how one is gonna implement the lifecycle methods offered by React classes in Hooks. We will look after the Hooks implementation of various lifecycle methods in this blog.&lt;/p&gt;

&lt;p&gt;If you are new to state and lifecycle, have a look at the official docs:&lt;br&gt;
&lt;a href="https://reactjs.org/docs/state-and-lifecycle.html" rel="noopener noreferrer"&gt;React State and Lifecycle&lt;/a&gt;. Briefly, these methods are useful when you want to execute a function when a component is mounted or unmounted, or on every render of the component.&lt;/p&gt;

&lt;p&gt;But we cannot use any of the existing lifecycle methods (componentDidMount, componentDidUpdate, componentWillUnmount, etc.) in a hook; they can only be used in class components, while Hooks can only be used in function components. The line below comes from the React docs:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you’re familiar with React class lifecycle methods, you can think of useEffect Hook as componentDidMount, componentDidUpdate, and componentWillUnmount combined.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What the docs suggest is that you can mimic these lifecycle methods from a class component in a functional component.&lt;/p&gt;

&lt;h2&gt;
  
  
  ComponentDidMount
&lt;/h2&gt;

&lt;p&gt;Code inside &lt;strong&gt;componentDidMount&lt;/strong&gt; runs once when the component is mounted. useEffect hook equivalent for this behaviour is&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;useEffect(() =&amp;gt; {
  // Your code here
}, []);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Notice the second argument here (an empty dependency array). With it, the effect runs only once, after the first render.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  ComponentDidUpdate
&lt;/h2&gt;

&lt;p&gt;Without the second argument, the useEffect hook runs after every render of the component, much like componentDidUpdate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;useEffect(() =&amp;gt; {
  // Your code here
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  ComponentWillUnmount
&lt;/h2&gt;

&lt;p&gt;componentWillUnmount is used for cleanup (removing event listeners, cancelling timers, etc.). Say you are adding an event listener in componentDidMount and removing it in componentWillUnmount, as below. Note that you must keep a reference to the handler: passing a fresh anonymous function to removeEventListener would not remove anything.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;componentDidMount() {
  window.addEventListener('mousemove', () =&amp;gt; {})
}

componentWillUnmount() {
  window.removeEventListener('mousemove', () =&amp;gt; {})
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The hooks equivalent of the above code is as follows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;useEffect(() =&amp;gt; {
  window.addEventListener('mousemove', () =&amp;gt; {});

  // returned function will be called on component unmount
  return () =&amp;gt; {
    window.removeEventListener('mousemove', () =&amp;gt; {})
  }
}, [])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Hope this was helpful 😄&lt;/p&gt;

</description>
      <category>react</category>
      <category>reacthooks</category>
    </item>
    <item>
      <title>Unlocking the Power of Redis: Storing Any Object as Cache in Ruby on Rails</title>
      <dc:creator>Aman</dc:creator>
      <pubDate>Tue, 28 Mar 2023 16:12:26 +0000</pubDate>
      <link>https://dev.to/onlyoneaman/unlocking-the-power-of-redis-storing-any-object-as-cache-in-ruby-on-rails-929</link>
      <guid>https://dev.to/onlyoneaman/unlocking-the-power-of-redis-storing-any-object-as-cache-in-ruby-on-rails-929</guid>
      <description>&lt;h3&gt;
  
  
  Unlock the full potential of Redis for caching any object in Ruby on Rails. Learn how Redis works as a cache, how to store complex objects, and how to stop threads from fighting over a single Redis connection.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw6geki0b0a1bbvwjbq6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw6geki0b0a1bbvwjbq6.jpeg" alt="Unlocking the Power of Redis: Storing Any Object as Cache in Ruby on Rails" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Redis is an open-source, in-memory data structure store that can be used for caching, messaging, and real-time analytics. It has become increasingly popular in recent years because it is extremely fast, scalable, and easy to use. In this blog, we will focus on using Redis as a cache for storing any object in Ruby on Rails. We will explore the benefits of using Redis as a cache and demonstrate how to integrate it into your Ruby on Rails application. We will also look at storing complex classes and arrays in Redis efficiently, and finally at how to prevent different threads from fighting over a single Redis connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Redis?
&lt;/h3&gt;

&lt;p&gt;Redis stands for Remote Dictionary Server. It is an in-memory data structure store that is used as a database, cache, and message broker. It is an open-source project written in ANSI C. Redis supports a wide range of data structures such as strings, hashes, lists, sets, and sorted sets. Its amazing speed comes from its ability to keep the data in memory, which allows read/write operations to be performed with incredibly low latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Use Redis as a Cache?
&lt;/h3&gt;

&lt;p&gt;Caching is a mechanism for improving the performance of your application by storing frequently accessed data in memory. This reduces the number of times your application has to access the database, which in turn reduces the response time of your application. Redis is an excellent choice for caching because it is incredibly fast and can store any type of data. Redis is also very easy to use and can be integrated into your Ruby on Rails application with minimal effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Redis Works as a Cache
&lt;/h3&gt;

&lt;p&gt;Redis stores data as key-value pairs in memory. The usual caching pattern (often called cache-aside) works like this: when your application needs some data, it first asks Redis. If the key is in memory, Redis returns the value immediately. If not, your application fetches the data from the database, stores it in Redis, and returns it. When the data changes, your application writes the change to the database and updates (or deletes) the cached copy, so the cache stays consistent with the database.&lt;/p&gt;
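
&lt;p&gt;The check-then-fetch flow above can be sketched in a few lines of plain Ruby. A Hash stands in for Redis here; fetch_user and the DB hash are made-up names for illustration, not Redis API:&lt;/p&gt;

```ruby
# A plain Ruby Hash stands in for Redis, just to show the cache-aside flow.
# CACHE, DB and fetch_user are hypothetical names for this sketch.
CACHE = {}
DB = { 1 => { "id" => 1, "email" => "aman@gmail.com" } } # pretend database

def fetch_user(id)
  key = "user:#{id}"
  cached = CACHE[key]
  return cached if cached   # hit: serve straight from the cache

  record = DB[id]           # miss: read from the database...
  CACHE[key] = record       # ...store it in the cache...
  record                    # ...and return it to the caller
end
```

&lt;p&gt;The first call for a given id pays the database cost; every later call is served from memory.&lt;/p&gt;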

&lt;h3&gt;
  
  
  How to Use Redis in Ruby on Rails
&lt;/h3&gt;

&lt;p&gt;Using Redis in Ruby on Rails is very easy. First, you need to add the Redis gem to your Gemfile:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gem 'redis'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, you need to configure Redis.&lt;br&gt;
Create a redis.rb file inside config/initializers with the content below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REDIS_CLIENT = Redis.new(url: "YOUR_REDIS_URL", timeout: 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This configuration sets up Redis as the REDIS_CLIENT for your application. You can then use REDIS_CLIENT.set and REDIS_CLIENT.get to store and retrieve data from Redis. Here is an example of how to use Redis to cache data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = {test: "hello"}
REDIS_CLIENT.set("test_key", data, ex: 60)
cached_user = REDIS_CLIENT.get("test_key")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example, we cache the object with an expiration time of 1 minute. If we read the key within that minute, we get the cached version; after it expires, the key is gone, and our application has to fetch the data from the database and store it in the cache again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Storing Objects (Arrays / JSON / Other Classes) in Redis
&lt;/h3&gt;

&lt;p&gt;Storing plain hashes or simple values like strings and numbers is straightforward. But things get complicated when we try to store instances of other classes, or arrays of them, in Redis.&lt;/p&gt;

&lt;p&gt;E.g.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = User.first
REDIS_CLIENT.set("test_key", data, ex: 60)
cached_user = REDIS_CLIENT.get("test_key")
# Result -&amp;gt;  "#&amp;lt;V1User:0x000055e224d30520&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here, we want the attributes associated with the User (id, email, name) to get stored in the cache, but Redis stored only the string representation of the class instance. To avoid this, we need to convert such objects into their attributes and pass Redis a simple, serialisable hash instead of a class instance.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = User.first
data_to_store = JSON.dump(data.attributes)
REDIS_CLIENT.set("test_key", data_to_store, ex: 60)
cached_user = REDIS_CLIENT.get("test_key")
proper_cached_user = JSON.parse(cached_user)
# Result -&amp;gt;   {"id"=&amp;gt;1, "email"=&amp;gt;"aman@gmail.com"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Similarly, attributes can themselves contain objects as children. Let’s write a function that drills down through the object, converting it into something suitable for storing in Redis.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_redis_cache
  key = "test_key"
  items = [User.first, User.second, {test: User.third}]
  items_to_store = attributes_from_item(items)
  REDIS_CLIENT.set(key, JSON.dump(items_to_store), ex: 60)
end

def get_redis_cache
  key = "test_key"
  items = REDIS_CLIENT.get(key)
  final_items = JSON.parse(items)
end

 def attributes_from_item(item)
    return item unless item.respond_to?(:attributes)
    new_item = {}
    item.attributes.each do |k, v|
      if v.is_a?(Hash)
        new_item[k] = attributes_from_item(v)
      elsif v.is_a?(Array)
        new_item[k] = v.map{|e| attributes_from_item(e)}
      else
        new_item[k] = v
      end
    end
    new_item
  end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can use the above create_redis_cache and get_redis_cache methods to store and retrieve any of these data types without worrying about unserialisable objects ending up in the cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Concurrency
&lt;/h3&gt;

&lt;p&gt;When deploying Rails, we often run a background worker like Sidekiq alongside the web process, doing tasks in parallel without disturbing the web application. With many Sidekiq and Puma threads open, Redis can start misbehaving: the Redis client runs only one command at a time (it wraps each command in a Monitor), so all those threads end up fighting over a single connection. To solve this, use a shared connection pool for your application code.&lt;/p&gt;

&lt;p&gt;Add connection_pool to your Gemfile.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gem 'connection_pool'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;and then create a limited pool of Redis connections in your redis.rb initializer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;require 'connection_pool'

REDIS = ConnectionPool.new(size: 10) { Redis.new(timeout: 1) }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This ensures that even with lots of concurrency, you’ll only have 10 connections open to Redis per process.&lt;/p&gt;

&lt;p&gt;Now in your application code anywhere, you can do this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REDIS.with do |conn|
  # some redis operations
  r = conn.get(redis_key)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You’ll have up to 10 connections to share amongst your Puma/Sidekiq threads. This leads to better performance, since the threads are no longer fighting over a single Redis connection.&lt;/p&gt;
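
&lt;p&gt;Under the hood, a connection pool is essentially a thread-safe queue of pre-built connections that threads check out and return. Here is a toy sketch of the idea in plain Ruby; TinyPool and FakeConn are made-up names, not the connection_pool gem's actual implementation:&lt;/p&gt;

```ruby
# FakeConn is a made-up stand-in for a real Redis client.
FakeConn = Struct.new(:id) do
  def get(key)
    "value-for-#{key}" # pretend network round trip
  end
end

# A toy connection pool: a thread-safe queue of connections.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { |i| @queue << FakeConn.new(i) }
  end

  # Check a connection out, yield it, and always put it back.
  def with
    conn = @queue.pop # blocks if every connection is in use
    yield conn
  ensure
    @queue << conn
  end
end

POOL = TinyPool.new(2)
results = 4.times.map do
  Thread.new { POOL.with { |conn| conn.get("redis_key") } }
end.map(&:value)
```

&lt;p&gt;Four threads share two connections here without stepping on each other. The real connection_pool gem adds checkout timeouts and error handling on top of this basic idea.&lt;/p&gt;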

&lt;p&gt;If you have survived till here, here is an &lt;a href="https://onlyoneaman.medium.com/how-to-make-rails-response-faster-a8cc5f1242d" rel="noopener noreferrer"&gt;article&lt;/a&gt; that will help you cut your API response time many times over.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Features of Redis
&lt;/h3&gt;

&lt;p&gt;Redis is an incredibly versatile tool and has many advanced features that can be used to improve performance and functionality. Here are some of the most useful advanced features of Redis:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Pub/Sub&lt;br&gt;
Redis supports Publish/Subscribe messaging which allows you to set up a messaging system between your application and other external sources. This feature can be used to push data from your application to external sources or to receive data from external sources in real-time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lua Scripting&lt;br&gt;
Redis supports Lua scripting which allows you to execute complex operations on the server-side. This can be used to perform complex calculations, transformations, or data manipulations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transactions&lt;br&gt;
Redis supports transactions which allow you to execute multiple commands in a single operation. This feature is useful when you need to ensure that a set of commands are executed atomically, meaning that either all of them are executed or none of them are executed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TTL&lt;br&gt;
Redis allows you to set an expiration time for the data in the key-value pair. This is known as Time to Live (TTL). When the TTL expires, Redis automatically removes the key-value pair from memory. This feature can be used to reduce memory usage and to ensure that the cached data is always up to date.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
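
&lt;p&gt;To see what the TTL feature buys you, here is a toy key-value store with per-key expiry in plain Ruby. ToyStore is a made-up class for illustration; real Redis handles expiry server-side, not like this:&lt;/p&gt;

```ruby
# Toy key-value store with per-key TTL, mimicking what you get from
# SET with ex: -- a sketch of the behaviour, not Redis's implementation.
class ToyStore
  def initialize
    @data = {} # key => [value, expires_at or nil]
  end

  def set(key, value, ex: nil)
    expires_at = ex ? Time.now + ex : nil
    @data[key] = [value, expires_at]
  end

  def get(key)
    value, expires_at = @data[key]
    if expires_at && Time.now >= expires_at
      @data.delete(key) # expire lazily, on access
      return nil
    end
    value
  end
end

store = ToyStore.new
store.set("test_key", "hello", ex: 60)
```

&lt;p&gt;Within the 60 seconds, get returns the cached value; afterwards, the key behaves as if it was never set.&lt;/p&gt;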

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Redis is an excellent choice for caching frequently accessed data in your Ruby on Rails application. It is fast, scalable, and easy to use. Redis can store any type of data in memory and can be integrated into your application with minimal effort. It has many advanced features that can be used to improve performance and functionality. Using Redis as a cache can significantly improve the response time of your application and reduce the load on your database. Redis is a powerful tool that can help your application keep up with the demands of your users and stay ahead of the competition.&lt;/p&gt;

&lt;p&gt;Did the article help or anything else you would like to suggest or add? Add a response or drop a message below to me. 😉&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://onlyoneaman.medium.com/" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; for more articles. Also, Let’s connect if we haven’t yet &lt;a href="https://twitter.com/onlyoneaman" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/onlyoneaman/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://onlyoneaman.medium.com/how-to-make-rails-response-faster-a8cc5f1242d" rel="noopener noreferrer"&gt;&lt;strong&gt;How to Make Rails Response Faster&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mperham/sidekiq/wiki/Advanced-Options#connection-pooling" rel="noopener noreferrer"&gt;&lt;strong&gt;Advanced Options&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis/redis-rb" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - redis/redis-rb: A Ruby client library for Redis&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://api.rubyonrails.org/v7.0.4/classes/ActiveSupport/Cache/Store.html" rel="noopener noreferrer"&gt;&lt;strong&gt;ActiveSupport::Cache::Store&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://redis.io/docs/" rel="noopener noreferrer"&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
