DEV Community: Satoshi Kaneyasu

How to Reduce markdownlint Violations in AI-Generated Markdown

Satoshi Kaneyasu — Fri, 12 Jun 2026 22:03:49 +0000

I'm Kaneyasu from the Application Services Division, DevOps team at Serverworks.

Have you ever had AI coding agents (Claude Code, Kiro, GitHub Copilot Agent Mode, etc.)
generate or edit Markdown files, only to find yourself drowning in markdownlint warnings?

This article covers the common violation patterns and
a mitigation strategy combining markdownlint configuration with AI agent instruction files.

What is markdownlint?

Markdown has style best practices such as
"add blank lines around headings" and "limit line length."
markdownlint is the linter that automatically checks these rules.

markdownlint checks Markdown files for style and syntax consistency.
It defines rules from MD001 through MD060 and can be used via
the VS Code extension (davidanson.vscode-markdownlint) or CLI (markdownlint-cli2).

You can customize which rules are enabled and their parameters
by placing a .markdownlint.json file in your project root.

Warnings from markdownlint don't make files unreadable.
However, if you get into the habit of ignoring warnings,
eventually no one will pay attention to them,
and linters in general lose their effectiveness.

Common Violations When AI Generates Markdown

In my environment, when I have AI generate Markdown in a project with markdownlint enabled,
the following violations occur frequently.
Your experience may vary, but I suspect most people can relate.

MD022 (blank lines around headings): Forgets blank lines before/after headings
MD032 (blank lines around lists): No blank lines before/after list blocks
MD058 (blank lines around tables): No blank lines before/after tables
MD013 (line length): Outputs long paragraphs as a single line without line breaks
MD036 (emphasis used instead of heading): Uses a **bold**-only line as a heading substitute
MD009 (trailing spaces): Leaves unnecessary trailing spaces at end of lines
MD012 (multiple blank lines): Inserts 2 or more consecutive blank lines

MD036 (emphasis as heading) is the one I see most often.
AI tends to insert **bold**-only lines to visually separate sections.
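
To make the MD036 pattern concrete, here is a minimal Python sketch that flags lines consisting only of bold or italic text. It is a simplified approximation for illustration, not markdownlint's actual implementation, and the function name `find_emphasis_headings` is my own.

```python
import re

# A line that consists only of one **bold** or *italic* span (simplified).
EMPHASIS_ONLY = re.compile(r"^\s*(\*\*|__|\*|_)(?=\S)([^*_]+)(?<=\S)\1\s*$")


def find_emphasis_headings(markdown: str) -> list[int]:
    """Return 1-based line numbers of standalone emphasis-only lines."""
    lines = markdown.split("\n")
    hits = []
    for i, line in enumerate(lines):
        # Only a line that stands alone (blank lines around it) acts as a heading.
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i == len(lines) - 1 or not lines[i + 1].strip()
        if prev_blank and next_blank and EMPHASIS_ONLY.match(line):
            hits.append(i + 1)
    return hits


sample = "Intro text.\n\n**Setup**\n\nRun the installer with **care**.\n"
print(find_emphasis_headings(sample))
```

Bold text inside a normal sentence is not flagged; only a line that is nothing but emphasis is.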

MD013 (line length) is especially common with CJK text.
AI doesn't consider the CJK convention of breaking lines at punctuation marks,
and tends to output entire paragraphs as a single line.

Mitigation Strategies and File Locations

There are two main approaches:

Relax markdownlint settings — widen the tolerance to reduce warnings
Instruct AI agents via instruction files — tell them to follow the rules at generation time

AI instructions are not guaranteed to be followed 100% of the time.
Since configuration files provide more reliable enforcement,
combining both approaches works best.

.markdownlint.json (Linter Configuration)

Place a .markdownlint.json file in your project root.
Here is the configuration I use:

{
  // Relax the default 80 characters to 120.
  // Exclude tables, code blocks, and headings from inspection
  "MD013": {
    "line_length": 120,
    "tables": false,
    "code_blocks": false,
    "headings": false
  },
  // Disable the duplicate heading rule.
  // Technical articles often use headings like "Step 1", "Step 2"
  "MD024": false,
  // Set table pipe style to consistent (uniform within each table).
  // Strict column alignment is difficult with CJK characters
  "MD060": {
    "style": "consistent"
  }
}

Some people may prefer not to enforce line length limits at all — that's a valid preference.
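
As a rough illustration of what the MD013 settings above mean, the following Python sketch reports lines longer than the limit while skipping fenced code blocks, table rows, and headings. It only approximates the rule (the real linter has additional exceptions), and `find_long_lines` is a hypothetical helper, not part of markdownlint.

```python
FENCE_MARKERS = ("`" * 3, "~" * 3)  # opening/closing markers of fenced code blocks


def find_long_lines(markdown: str, limit: int = 120) -> list[int]:
    """Return 1-based numbers of lines longer than `limit`.

    Fenced code blocks, table rows, and headings are skipped, roughly
    mirroring "code_blocks", "tables", and "headings" set to false.
    """
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE_MARKERS):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence or stripped.startswith("#") or stripped.startswith("|"):
            continue
        if len(line) > limit:
            hits.append(number)
    return hits
```

Running something like this over AI-generated files before committing gives a quick feel for how many MD013 warnings a given `line_length` would produce.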

Instructing AI Agents to Follow the Rules

markdownlint alone cannot prevent violations at generation time.
You need an instruction file that tells the AI agent to follow specific rules.

I include instructions like the following in my AGENTS.md:

## Markdown Style Rules

This project uses markdownlint.
When creating or editing Markdown files, you must follow these rules.

### Headings (MD022)

Always add a blank line before and after headings.

### Lists (MD032)

Always add a blank line before and after list blocks.

### Line Length (MD013)

Keep each line within 120 characters. Tables, code blocks, and headings are exempt.

Tips for long lines:

- Long paragraphs: Break lines after periods or commas
- Long URLs: Place the URL on its own line with blank lines around it

### No Emphasis as Headings (MD036)

Do not use bold/italic-only lines as heading substitutes.

Adding good/bad examples helps the AI understand the rules more accurately.

Where to Place Instruction Files

File	Target Tool	Location
`CLAUDE.md`	Claude Code	`~/.claude/CLAUDE.md` (global) or project root
`AGENTS.md`	Multiple AI agents	Project root
`.kiro/steering/*.md`	Kiro	`.kiro/steering/` directory
`.github/copilot-instructions.md`	GitHub Copilot	`.github/` directory

Differences Between AGENTS.md and CLAUDE.md

In the previous section I used AGENTS.md for the example,
but CLAUDE.md is probably the most well-known AI agent instruction file.
Let me explain the differences.

AGENTS.md

AGENTS.md is an instruction file for AI agents in general, placed at the project root.
OpenAI published the specification as an open standard format.

As stated on the official site,
many AI coding agents and tools support AGENTS.md.
Therefore, rules you want enforced across multiple tools — like Markdown style rules —
are best placed in AGENTS.md.

CLAUDE.md

CLAUDE.md is an instruction file exclusive to Claude Code.
It is automatically loaded when Claude Code opens a project.

There are two placement options:

~/.claude/CLAUDE.md — Global settings for all projects
CLAUDE.md at project root — Project-specific settings

Ideally, Claude Code would also read AGENTS.md,
but currently there doesn't appear to be such a feature.
You could instruct Claude Code to read AGENTS.md from within CLAUDE.md,
which might achieve the goal indirectly,
but it's not guaranteed to work reliably.

Kiro

Kiro has its own steering mechanism
where Markdown files placed in .kiro/steering/*.md are loaded as instructions.

Additionally, Kiro supports AGENTS.md.
The official documentation (Steering — Kiro) states:

Kiro supports providing steering directives via the AGENTS.md standard.
AGENTS.md files are in markdown format, similar to Kiro steering files;
however, AGENTS.md files are always included.

This means if you place AGENTS.md at the workspace root,
it will always be loaded regardless of steering file inclusion settings.

Note that the official documentation is published under the "CLI" section,
and there is no explicit statement about IDE (Kiro IDE) behavior.
However, when I verified this in Kiro IDE,
AGENTS.md was automatically included in the context
as a workspace-level "Included Rules" entry, just like in the CLI.
Although there is no official confirmation,
the steering mechanism appears to be shared between CLI and IDE.

Placement options:

AGENTS.md at workspace root — Project-specific rules
~/.kiro/steering/AGENTS.md — Global rules for all projects

Kiro's steering files (.kiro/steering/*.md) and AGENTS.md can be used together.
A good approach is to put cross-tool rules (like Markdown style) in AGENTS.md
and Kiro-specific instructions in steering files.

Summary

To reduce markdownlint violations in AI-generated Markdown,
combining the following two approaches is effective:

Adjust .markdownlint.json to set appropriate tolerances
Provide specific rules and examples in AGENTS.md or CLAUDE.md

It's difficult to eliminate violations entirely,
but setting up both of these significantly reduces them in practice.

Parental Controls with Kiro IDE, AWS, and NextDNS

Satoshi Kaneyasu — Tue, 09 Jun 2026 11:53:41 +0000

Note: I'm a Japanese developer, so some screenshots contain Japanese text. I hope that's okay!

Hi there.
I'm Kaneyasu from the Application Services Division (DevOps) at Serverworks.
This time, I'd like to share a personal project where I used Kiro IDE to build parental controls.
Think of it as a story of turning a small idea into reality using spec-driven development.

Introduction

This is about my own family's situation.
I want to be clear that I don't believe parental control is an absolute must in child-rearing.

Parental Controls with NextDNS

On the personal side, I have a middle school-aged kid.
They have a smartphone, but tend to stay up late browsing manga sites at night. After a family discussion, we decided to block certain sites during late-night hours until they can manage their own sleep schedule.

To achieve this, I used a service called NextDNS.

NextDNS provides private DNS. On Android, you can typically configure it under [Network & Internet] > Private DNS.
By routing through private DNS, you can block access to specific sites from the phone.

NextDNS - The new firewall for the modern Internet

nextdns.io

Refer to the NextDNS official docs for detailed setup. In short:

Set up a private DNS with NextDNS
Add specific sites to the denylist
Configure the phone's DNS settings to use NextDNS

This way, the phone connects to the internet through private DNS, and access to specific sites is blocked.
The nice thing about this approach is that it's difficult to bypass unless you know about private DNS and how it works.

NextDNS has built-in parental control features with pre-registered sites. These can be toggled ON/OFF by time of day using the "Recreation Time" feature.
However, sites not in the default list can only be blocked via the "Denylist", which doesn't support time-based toggling.

On the other hand, NextDNS provides an API that allows toggling the denylist ON/OFF programmatically.

NextDNS API Documentation | api

nextdns.github.io

I combined the NextDNS API with AWS and built the parental controls my family needed using Kiro IDE.

System Architecture

Here's the system architecture I envisioned.
Amazon EventBridge Scheduler triggers an AWS Lambda function at night and in the morning to operate NextDNS.
Sites my kid frequently visits are registered in the NextDNS "Denylist", and Lambda toggles them ON/OFF via the API.
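
The toggle step can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the project's code (the repository itself is TypeScript): the base URL, the `X-Api-Key` header, and the `PATCH /profiles/{profile}/denylist/{domain}` endpoint with an `active` flag reflect my reading of the NextDNS API documentation and should be checked there, and the event shape passed to `handler` is hypothetical.

```python
import json
import urllib.request

# Assumed base URL and endpoint layout; confirm against the NextDNS API docs.
API_BASE = "https://api.nextdns.io"


def build_toggle_request(api_key, profile_id, domain, active):
    """Build (without sending) a PATCH request that switches one denylist entry."""
    return urllib.request.Request(
        url=f"{API_BASE}/profiles/{profile_id}/denylist/{domain}",
        data=json.dumps({"active": active}).encode("utf-8"),
        method="PATCH",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )


def toggle_denylist(api_key, profile_id, domains, active, send=urllib.request.urlopen):
    """Switch every domain ON (active=True) or OFF; `send` is injectable for offline tests."""
    for domain in domains:
        response = send(build_toggle_request(api_key, profile_id, domain, active))
        if hasattr(response, "close"):
            response.close()


def handler(event, context):
    """Hypothetical Lambda entry point; the schedule passes active=True or active=False."""
    toggle_denylist(
        api_key=event["apiKey"],        # the real project keeps this in Secrets Manager
        profile_id=event["profileId"],
        domains=event["domains"],
        active=event["active"],
    )
```

Keeping the request construction separate from sending makes it possible to unit-test the logic without touching the real API.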

Spec-Driven Development with Kiro IDE

With the above architecture in mind, I developed in two stages using Kiro:

PoC (verify technical feasibility), then write a verification report, product overview, and tech stack
Implement based on the verification report, product overview, and tech stack

Honestly, I was fairly confident there wouldn't be technical issues, but I wanted to avoid wasting Kiro tokens and time if something unexpected came up. I've also learned from experience that vague instructions lead to rework, so I split it into two stages.

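How running a technical verification with Kiro IDE made me realize I was giving instructions without thinking them through - Serverworks Engineer Blog

Hello. I'm Kaneyasu from the Application Services Division, in charge of DevOps. This is a short impressions article about Kiro IDE. Contents: Introduction / Kiro IDE lately / The technical verification flow I had in mind / My instructions were sloppy because I hadn't written the design myself / I got greedy and added unnecessary requirements / I make mistakes, and so does AI. Introduction: The other day I wrote a technical verification article on vector databases, and I made full use of Kiro IDE for that verification. blog.serverworks.co.jp The verification code is here: GitHub - satoshi256kbyte/vector-db-benchmar…

blog.serverworks.co.jp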

PoC Spec

With the plan decided, I created and executed the PoC Spec.
Here's the initial prompt I entered:

I want to implement parental controls for my kid's smartphone using NextDNS.
I want to block access to specific sites from late night to early morning.
By setting NextDNS private DNS on the phone, I can restrict the phone's traffic.
Domains in the default list can be toggled ON/OFF by time using
"Parental Controls" > "Recreation Time".
However, what I want to control are domains NOT in the default list.
These can be blocked via a custom domain list called "Denylist",
but it doesn't support time-based toggling.
NextDNS has an API, so I'm thinking of calling the API externally
on a schedule to achieve this.
Please verify feasibility using the URL below as reference.
https://nextdns.github.io/api/
I already have an account so I can obtain an API key.

This prompt generated requirements.md.
At this point I remembered this was supposed to be a PoC, so I added another prompt:

Let's treat this Spec as a PoC.

Verify with simple test code and set the goal as creating a report,
product overview, and tech stack documentation under docs/
for the subsequent main development.

I then added a requirement that test code execution should be done manually, and finalized requirements.md.
Then design.md and tasks.md were created.
Here's the outline of tasks.md:

1. Project initial setup
2. NextDNS API client implementation
3. PoC script implementation
4. Checkpoint - manual execution and verification
5. Documentation creation

Task 4 was written in tasks.md like this:

- [ ] 4. Checkpoint - manual execution and verification
  - Create `.env` file and set actual API key
  - Run `npm run poc` to verify
  - Ensure all steps pass, ask the user if questions arise.

I made this a manual step because I didn't want a potentially broken script to be executed automatically.
After I reviewed tasks.md and ran it, execution reached step 4, displayed a message, and properly paused.

I entered the manual execution results as a prompt and continued tasks.md. The verification report, product overview, and tech stack were all completed.

nextdns-parental-controls/
├── config/
│   └── schedule-config.json
├── docs/
│   ├── poc-report.md        # PoC verification report
│   ├── product-overview.md  # Product overview
│   └── tech-stack.md        # Tech stack
├── scripts/
│   └── poc.ts
├── src/
│   └── nextdns-client.ts
├── package.json
├── tsconfig.json
└── README.md

I later noticed some issues with the tech stack produced here. More on that in the retrospective section.

Implementation Spec

In the Spec creation screen, I instructed it to implement based on the verification report, product overview, and tech stack from the PoC Spec.
Once the new requirements.md was generated, I ran Analyze requirements to improve requirement precision.

Analyze Requirements - IDE - Docs - Kiro

Deep analysis that catches logical inconsistencies, ambiguities, and gaps in your requirements before implementation

kiro.dev

Analyze requirements checks for logical inconsistencies, ambiguities, and gaps.

In this Spec, it provided detailed verification about the Secrets Manager permissions to grant to the Lambda function.

My impression is that Analyze requirements isn't as granular as aidlc-workflows, but it provides appropriate checks.
Once requirements.md was finalized with Analyze requirements, it was just a matter of proceeding through design.md and tasks.md to complete the implementation.
The implementation was completed and my family's desired parental controls are now up and running.

Retrospective and Lessons Learned

The implementation probably could have been done in one shot without the PoC, but verifying feasibility first gave me peace of mind.
The PoC Spec was designed with implementation deferred, with the goal of producing documentation.
In real-world projects, there are many tasks where documentation is the deliverable, so I think treating Specs this way makes them more applicable to actual work.

Regarding the tech stack produced by the Spec, I noticed several issues:

Language versions (TypeScript, Node.js) were slightly outdated
The scheduling implementation was slightly outdated (used EventBridge Rules instead of EventBridge Scheduler)
Lint/formatter tooling wasn't fully configured

These kinds of issues come up frequently in spec-driven development. From my experience, this happens regardless of the tool, not just with Kiro IDE. It's worth keeping in mind during review that these things commonly occur.

After implementation was complete, I found that deploying with IaC produced an error.
I used AWS CDK for IaC, and the cause was a simple import error that slipped through despite having unit tests.
This could have been caught by including a dry-run check (cdk synth) in the tasks.md checkpoint, not just unit tests.
I should have added that instruction earlier. Proactive instructions likely reduce AI costs in spec-driven development, and that's something to be mindful of.

Total time to completion was about 4 hours including post-deployment manual testing.
Up to implementation completion, it was under 3 hours.
However, this assumes my existing experience and knowledge of where spec-driven development tends to have gaps.
Without this background knowledge, it would likely take twice as long, or you might discover version issues only after the fact.

Conclusion

The parental controls I built are running smoothly to this day.
If my kid figures out private DNS and disables it, I'll think of something else.

The code is available here:

satoshi256kbyte / nextdns-parental-controls

NextDNS Parental Controls

A system that automatically controls the NextDNS Denylist on a schedule. EventBridge Scheduler (cron expressions) invokes Lambda, which turns every domain in the denylist ON or OFF at the configured times.

Setup

Prerequisites

Node.js 22 or later
AWS CLI configured (aws configure)
AWS CDK CLI (npm install -g aws-cdk)
git-secrets (brew install git-secrets)
CDK bootstrapped in the target AWS account (cdk bootstrap)
A NextDNS account and a valid API key

Installation

# Install dependencies (also sets up the husky pre-commit hook automatically)
npm install

# Register git-secrets patterns (prevents secret leaks)
git secrets --register-aws
git secrets --add '[0-9a-f]{40}'

Deployment

1. Register credentials in Secrets Manager

aws secretsmanager create-secret \
  --name nextdns-pc-dev-secret-apikey \
  --secret-string '{"apiKey": "your NextDNS API key", "profileId": "your profile ID"}'

To update:

aws secretsmanager put-secret-value \
  --secret-id nextdns-pc-dev-secret-apikey \
  --secret-string '{"apiKey": "the correct API key", "profileId": "the correct profile ID"}'

2. Configure the schedule

Edit config/schedule-config.json to set the blocking windows:

{
  "windows": [
    { "startTime": "22:00", "endTime": "06:00" }
  ]
}

Blocking is turned ON at startTime and OFF at endTime for every domain in the NextDNS denylist
Multiple windows can be specified
Lambda runs only once at each time (twice a day × number of windows)
The time zone is fixed to Asia/Tokyo
Windows that cross midnight (e.g., 22:00 to 06:00) are supported

3. Deploy

# Dry-run check (recommended)
npm run synth

# Deploy
npm run deploy

4. Verify operation

# Verify turning blocking ON
aws lambda invoke \

…

By the way, if my kid reads this article the whole scheme would be exposed immediately. But realistically, it's hard for them to stumble upon it, and they probably wouldn't want to read their dad's tech blog anyway — I imagine they'd close the tab after the first two lines of self-introduction. So I'm not too worried.

Getting Started with Vector Databases Using Amazon Aurora PostgreSQL + pgvector

Satoshi Kaneyasu — Wed, 03 Jun 2026 03:32:51 +0000

Hello!
I'm Satoshi Kaneyasu, DevOps engineer at Serverworks.
In this article, I'll introduce the basic concepts and terminology of vector databases for those who are just starting to learn about them.

Target Audience

This article is aimed at beginners to vector databases.
You may have heard that vector databases are related to LLMs and RAG, but aren't quite sure what they actually are.
Think of this as written with that kind of reader in mind.

What Is a Vector Database?

A vector database is a database that stores data as vectors (arrays of numbers) and searches for data using "distance" or "similarity" between vectors.

Traditional relational databases search for data using "exact match" or "partial match" (LIKE queries), but vector databases can search for things that are semantically similar.
For example, searching for "weather in Tokyo" might return results like "temperature in Tokyo" or "weather conditions in Kanto" — data that differs as a string but is semantically related.

Visualizing Vector Space

In a vector database, all data is represented as points in a multidimensional space. When searching, the query is also converted into a vector, and data that is "close in distance" within that space is retrieved.

A diagram can show this only in two dimensions, but in a real vector database, proximity and distance are defined across many dimensions.

Use Cases for Vector Databases

Vector databases are used across a wide range of applications:

Use Case	Description
RAG (Retrieval-Augmented Generation)	Knowledge base search to provide external knowledge to LLMs. Allows internal documents and up-to-date information to be reflected in LLM responses
Semantic Search	Searching internal documents or FAQs by meaning rather than keywords. Handles spelling variations and synonyms
Recommendation	Recommending products and content whose vectors are close to a user's preference vector. Used as an alternative or complement to collaborative filtering
Image Search	Searching for similar images (face recognition, product image matching). Images are vectorized using an embedding model and compared
Anomaly Detection	Detecting data that deviates far from the vector of normal patterns. Used in log analysis and security monitoring
Duplicate Detection	Detecting similar documents or code. Used for plagiarism detection and content deduplication

The most common use case is RAG.

RAG: A Technique for Improving Answer Accuracy

What Is RAG?

RAG (Retrieval-Augmented Generation) is a technique that improves LLM response accuracy by searching for relevant information from external data sources before generating a response, then including that information in the prompt.

LLMs cannot accurately respond to information not included in their training data (internal documents, recent news, specialized technical information, etc.).
With RAG, you can have the LLM reference external knowledge stored in a vector database to generate more accurate and up-to-date responses.

When using Amazon Bedrock as the LLM for RAG, there is a fully managed RAG feature called Knowledge Bases.
With Knowledge Bases, you simply register documents stored in S3 and AWS manages everything — vectorization, vector database setup, and search.
Since you don't need to set up a vector database yourself, this is ideal when you want to try RAG quickly or minimize infrastructure management.

AWS Bedrock Knowledge Bases

Since this article focuses on the vector database itself, we'll proceed without using Knowledge Bases.

RAG Processing Flow

The RAG process follows this flow:

1. The user inputs a question (e.g., "What is AWS Lambda?")
2. The application vectorizes the question text using an embedding model
3. The vectorized query is used to search the vector database and retrieve relevant documents
   - At this point, you specify how many relevant documents to retrieve (e.g., top_k=3)
4. The retrieved relevant documents are sent to the LLM as context in the prompt
5. The LLM generates a response while referencing the search results
6. The response is returned to the user

As you can see, the vector database plays a central role in RAG as the "search engine for external knowledge."
From here, let's dive deeper into the "vector database search" step.
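
The flow above can be sketched in a few lines of Python. This is a minimal illustration, assuming the embedding model, the vector search, and the LLM call are supplied as plain functions; `answer_with_rag` and its parameter names are hypothetical, and the prompt wording is only an example.

```python
from typing import Callable, Sequence


def answer_with_rag(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Run one RAG round trip: vectorize, retrieve, build the prompt, generate."""
    query_vector = embed(question)            # vectorize the question
    documents = search(query_vector, top_k)   # retrieve the top_k relevant documents
    context = "\n".join(f"- {doc}" for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
    return generate(prompt)                   # the LLM answers with the context in view
```

Passing the three steps in as functions keeps the sketch independent of any particular embedding model, database, or LLM.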

Searching a Vector Database

Search Flow

The vector database search flow works as follows:

The user searches for "weather in Tokyo"
The application vectorizes "weather in Tokyo" using an embedding model (e.g., a 1024-dimensional vector)
Cosine distance is calculated against the data in the vector database (pre-vectorized using the same model)
The top k results with the closest distance are returned

In a vector database, data is represented as multidimensional numbers.
Therefore, both the stored data and the search queries must be converted to numbers: data at insertion time, queries at search time.
This is called vectorization, or Embedding.

The key point of vector database search is that the search query itself is also vectorized.
Instead of searching with raw text, it is converted to a vector using an embedding model (described later), and data that is close in vector space is retrieved.

From here, I'll use implementation examples with Aurora PostgreSQL + pgvector and Python code.

There are multiple options for building a vector database on AWS, but I find Aurora PostgreSQL + pgvector to be the most approachable starting point, and it's a great way to feel the difference between a conventional relational database and a vector database.

Search Implementation Code

Here is an implementation example using Aurora PostgreSQL + pgvector:

# ① Vectorize the text query (handler.py)
embedding_result = generate_embedding(query)
query_embedding = embedding_result.embedding  # 1024-dimensional vector

# ② Search the DB with the vectorized query (logic.py)
with connection.cursor() as cur:
    cur.execute(
        # Calculate cosine distance between query vector and DB vectors,
        # return top_k results in ascending distance order
        "SELECT content, embedding <=> %s::vector AS distance "
        "FROM embeddings ORDER BY distance LIMIT %s;",
        (query_embedding, top_k),
    )
    results = cur.fetchall()

The <=> operator here is pgvector's cosine distance operator.
A smaller value means higher similarity.
Because we're using Aurora PostgreSQL + pgvector, we can use SQL to query the vector DB.
This code uses a parameterized query to safely pass the vectorized search text and the result count (top_k) into the %s placeholders.

Several terms have appeared in this simple search, so let me explain them.

Embedding (= Vectorization)

Embedding refers to the process of converting data such as text or images into a numerical vector.
It is also called "vectorization."

Humans intuitively know that "Tokyo weather forecast" and "Tokyo temperature" are similar, but computers can only compare strings.
By numerically representing meaning through embedding, computers can mathematically calculate "semantic closeness."

Before: "Tokyo weather forecast"
After:  [0.0231, -0.0142, 0.0567, ..., 0.0412]  ← 1024 numbers

Embedding Implementation Code

Here is an implementation example using Amazon Bedrock's Titan Embeddings V2.
The generate_embedding function implemented here is called at step ① in the "Search Implementation Code" above.

import json
import time

# EmbeddingResult and _get_bedrock_client are defined elsewhere in the project.
def generate_embedding(text: str) -> EmbeddingResult:
    """Vectorize text using Bedrock Titan Embeddings V2."""
    client = _get_bedrock_client()
    body = json.dumps({
        "inputText": text,        # Before: text
        "dimensions": 1024,       # Output dimensions
        "normalize": True,        # Normalize (set vector length to 1)
    })

    # Measure the Bedrock call latency for time_ms (assumed timing scope)
    started = time.perf_counter()
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=body,
    )
    elapsed_ms = (time.perf_counter() - started) * 1000

    response_body = json.loads(response["body"].read())
    embedding = response_body["embedding"]  # After: [float] × 1024
    return EmbeddingResult(embedding=embedding, time_ms=elapsed_ms)

Specifying normalize=True normalizes the output vector length to 1.
This makes cosine similarity calculation equivalent to a dot product calculation, improving search efficiency.

Dimensions

In the embedding implementation code, there was a keyword called "dimensions."
Dimensions refer to the number of numbers in a single vector.

3-dimensional vector:    [0.5, -0.3, 0.8]           ← 3 numbers
1024-dimensional vector: [0.023, -0.014, ..., 0.041] ← 1024 numbers

More dimensions allow for finer representation of "meaning," but storage consumption increases accordingly.

Dimensions	Size per vector	Size for 100k records
256	1 KB	~100 MB
1024	4 KB	~400 MB
1536	6 KB	~600 MB
3072	12 KB	~1.2 GB

The number of dimensions is determined by the embedding model you use. Titan Embeddings V2 lets you choose from 256, 512, or 1024, allowing you to balance accuracy and cost based on your use case.
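
The sizes in the table follow from simple arithmetic, sketched below in Python. It assumes each dimension is stored as a 4-byte float and ignores per-row and index overhead, so treat the results as rough lower bounds; the function names are my own.

```python
BYTES_PER_DIMENSION = 4  # assumption: one 32-bit float per dimension


def vector_bytes(dimensions: int) -> int:
    """Raw size of a single vector in bytes."""
    return dimensions * BYTES_PER_DIMENSION


def dataset_megabytes(dimensions: int, records: int) -> float:
    """Raw size of `records` vectors in megabytes (1 MB = 1,000,000 bytes)."""
    return vector_bytes(dimensions) * records / 1_000_000


for dims in (256, 1024, 1536, 3072):
    print(dims, vector_bytes(dims), round(dataset_megabytes(dims, 100_000), 1))
```

Halving the dimension count halves the raw vector storage, which is why a model with selectable dimensions lets you trade accuracy against cost.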

Embedding Models

Embedding models are specialized models that convert text into vectors, and they are distinct from LLMs (generative models).
They specialize in generating representations for computing semantic similarity.

Model	Provider	Dimensions	Features
Titan Embeddings V2	AWS Bedrock	256/512/1024	AWS native. Has normalization option. High affinity with AWS environments
Cohere Embed v3	AWS Bedrock	1024	Multilingual support. Evaluated as highly accurate for Japanese
text-embedding-3-small	OpenAI	256~1536	Lightweight and low cost. Multilingual support. Best for cost-sensitive use cases
text-embedding-3-large	OpenAI	256~3072	High accuracy and multilingual support. Flexible dimension selection

An important note: you must use the same model for both search and registration.
Vectors generated by different models don't exist in the same space, so distance calculations are meaningless.

Amazon Titan Text Embeddings V2 - Bedrock Documentation

Cosine Similarity and Cosine Distance

Cosine similarity represents "how much two vectors point in the same direction" as a number between -1 and 1.
Closer to 1 means more semantically similar, closer to 0 means unrelated, and closer to -1 means semantically opposite.

Cosine distance is defined as 1 - cosine similarity and ranges from 0 to 2.
A smaller value means higher similarity, and pgvector's <=> operator returns this cosine distance.
"Distance" and "similarity" are just opposite representations of the same concept.

Metric	Range	"More similar" direction	Use case
Cosine Similarity	-1 to 1	Larger value (closer to 1)	Threshold judgment (e.g., "hit if >= 0.95")
Cosine Distance	0 to 2	Smaller value (closer to 0)	ORDER BY in SQL, KNN search

The search implementation code (embedding <=> %s::vector) sorts by cosine distance, while the threshold judgment in semantic cache (described later) (similarity >= 0.95) uses cosine similarity.
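
To make the relationship between the two metrics concrete, here is a small pure-Python sketch. It is written for readability rather than speed, and it is not how pgvector computes the value internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (undefined for zero-length vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Note that only the direction matters: `[1.0, 0.0]` and `[2.0, 0.0]` differ in length but have distance 0.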

top_k

top_k is the number of closest results a search returns. Set an appropriate value based on the use case.

Small top_k (1–5): Returns only the most relevant results. Suitable when you want to limit the context passed to an LLM in RAG
Large top_k (10–100): Returns a wide range of candidates. Suitable for recommendations or displaying a list of candidates

In RAG, it is common to pass the full set of top_k results as context to the LLM.
Be aware that making top_k too large will lengthen the context, increasing the LLM's token consumption and latency.
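
As a rough in-memory equivalent of `ORDER BY distance LIMIT top_k`, the sketch below scores every row and keeps the top_k closest ones. It is an exhaustive scan for illustration only; the row format `(content, vector)` and the function names are assumptions of mine.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k_by_distance(query, rows, top_k=3):
    """Return the top_k (distance, content) pairs, closest first.

    rows: iterable of (content, vector) pairs.
    """
    scored = ((cosine_distance(query, vector), content) for content, vector in rows)
    return heapq.nsmallest(top_k, scored)


rows = [
    ("tokyo weather", [1.0, 0.0]),
    ("kanto weather", [0.9, 0.1]),
    ("osaka food", [0.0, 1.0]),
]
for distance, content in top_k_by_distance([1.0, 0.0], rows, top_k=2):
    print(content, round(distance, 3))
```

With top_k=2 the unrelated row is never returned, which is the same effect a small top_k has on the context passed to an LLM.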

Normalization

Normalization is the process of setting the length (norm) of a vector to 1.
With Titan Embeddings V2, specifying normalize=True automatically normalizes the output vector.
Cosine similarity between normalized vectors becomes equivalent to a simple dot product.
Since dot products have lower computational cost than cosine similarity, this leads to more efficient search.
Also, by standardizing vector lengths, distance comparisons purely reflect "differences in direction," which stabilizes search result quality.
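
The equivalence between cosine similarity and the dot product for normalized vectors can be checked with a few lines of Python. This is a small numeric sketch of my own, independent of any embedding model.

```python
import math


def normalize(vector):
    """Scale a vector so that its length (norm) becomes 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]

# Cosine similarity computed the long way on the raw vectors.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# After normalization, the dot product alone gives the same value.
print(cosine, dot(normalize(a), normalize(b)))
```

The norm calculation happens once per vector at normalization time instead of once per comparison, which is where the savings come from.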

Registering Data in a Vector Database

Of course, data must be registered in advance before you can search a vector database.
Let's now look at data registration in a vector database.

Data Registration Flow

Data registration in a vector database follows this flow:

Drop the HNSW index (to speed up registration)
Vectorize text data using an embedding model and INSERT it into the database in batches (e.g., 500 records at a time)
Once all data is registered, bulk-create the HNSW index
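
The order of operations in this flow can be sketched as follows. The database and embedding calls are passed in as functions, so this is a minimal outline of the sequencing and batching only; `register_all`, `batched`, and the callback names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, batch_size: int = 500) -> Iterator[list]:
    """Yield successive lists of at most `batch_size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def register_all(texts, embed, insert_batch, drop_index, create_index, batch_size=500):
    """Drop the index, insert (text, vector) rows in batches, then rebuild the index."""
    drop_index()
    total = 0
    for batch in batched(texts, batch_size):
        insert_batch([(text, embed(text)) for text in batch])
        total += len(batch)
    create_index()
    return total
```

Building the index once at the end avoids updating the index on every INSERT during the bulk load.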

As with the search explanation, I'll use implementation examples with Aurora PostgreSQL + pgvector and Python.

Table Definition on Aurora PostgreSQL + pgvector

The following table and index are created on Aurora PostgreSQL with the pgvector extension enabled:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- embeddings table (storage for vector data)
CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1024) NOT NULL
);

-- HNSW index (speeds up ANN search)
CREATE INDEX IF NOT EXISTS idx_embeddings_embedding
    ON embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

The content column in the embeddings table stores the text data, and the embedding column stores the vectorized text.
An HNSW index is then created on the embedding column.

HNSW (Search Algorithm and Index Algorithm)

Vector databases have indexes too, and in Aurora PostgreSQL + pgvector, you create indexes with the CREATE INDEX statement just like regular indexes.
Here, USING hnsw in the CREATE INDEX statement specifies the index algorithm.
The index algorithm is closely related to the search algorithm, and these two algorithms are critical in vector databases.

Search Algorithms

There are two main types of search methods in vector databases:

Search Method	Full Name	Features
KNN	K-Nearest Neighbor	Compares against all data exhaustively. Accuracy is perfect but computation cost increases linearly as data grows, making it slow
ANN	Approximate Nearest Neighbor	Searches approximately. Slightly lower accuracy but can search at high speed even with large volumes of data

In practical systems, ANN is almost always used.
KNN is fine for small-scale data of a few thousand records, but ANN becomes essential when dealing with tens of thousands of records or more.
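
To illustrate why KNN cost grows linearly, here is a minimal brute-force sketch in pure Python. It computes the cosine distance between the query and every stored vector, then keeps the closest top_k. The function names and the tiny two-dimensional dataset are assumptions for illustration only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query: list[float], vectors: list[list[float]], top_k: int) -> list[int]:
    """Exact search: score every stored vector, then return the top_k closest indexes."""
    scored = [(cosine_distance(query, v), idx) for idx, v in enumerate(vectors)]
    scored.sort()  # one distance calculation per stored vector, so cost is O(n)
    return [idx for _, idx in scored[:top_k]]


# Hypothetical stored vectors and query
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
nearest = knn_search([1.0, 0.05], vectors, top_k=2)
```

Every query touches every record, so doubling the data roughly doubles the search time. ANN indexes avoid this by examining only part of the data.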

Index Algorithms

The data structures used to implement ANN are called index algorithms, and there are several types:

Algorithm	Mechanism	Features
HNSW	Builds a hierarchical graph structure and progressively narrows the search range from upper to lower layers	High accuracy and high speed. Higher memory consumption but currently the most widely used
IVF	Clusters data and performs partial search only on clusters close to the query	Memory-efficient. Suitable for large-scale data but may have lower accuracy than HNSW
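
The IVF row above can be sketched as a toy example. This is not how pgvector implements it. It is only a minimal illustration in pure Python, with the centroids given by hand (in practice they would come from clustering such as k-means). The nprobe parameter, the number of clusters to scan, is a name I chose for this sketch.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, top_k, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: squared_distance(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    candidates.sort(key=lambda idx: squared_distance(query, vectors[idx]))
    return candidates[:top_k]


# Hypothetical data: two hand-picked centroids and five vectors
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.5, 0.2], [1.0, 1.0], [9.0, 9.5], [10.5, 10.0], [4.9, 4.9]]
lists = build_ivf(vectors, centroids)

one_cluster = ivf_search([5.2, 5.2], vectors, centroids, lists, top_k=1, nprobe=1)
two_clusters = ivf_search([5.2, 5.2], vectors, centroids, lists, top_k=1, nprobe=2)
```

With nprobe=1 the search misses the true nearest vector (index 4) because it sits in the other cluster, which is the accuracy trade-off mentioned in the table. Scanning both clusters finds it, at the cost of examining more data.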

Currently, the ANN + HNSW combination is the standard for building vector databases.
AWS offers multiple ways to build vector databases, and Aurora PostgreSQL + pgvector, OpenSearch, and MemoryDB all support HNSW.

HNSW Index Parameters

-- HNSW index (speeds up ANN search)
CREATE INDEX IF NOT EXISTS idx_embeddings_embedding
    ON embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

The WITH clause in the index creation SQL specifies the HNSW index parameters:

Parameter	Meaning	Effect when increased	Typical value
m	Connections per node	Search accuracy ↑ / Memory consumption ↑ / Build time ↑	16
ef_construction	Search width during construction	Search accuracy ↑ / Build time ↑	64~200

Data Registration Implementation Code

Here is the Python code to register a substantial amount of data into Aurora PostgreSQL + pgvector:

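import time

import psycopg2

# Assumed to be defined elsewhere in the project (not shown in this article):
# logger (a structured logger that supports bind()), generate_vector(),
# MAX_RETRIES, and RETRY_DELAY_SECONDS.

class AuroraIngester: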
    """Batch INSERT data into Aurora pgvector.

    Efficiently inserts vector data using batch INSERT of 500 records at a time.
    """
    def __init__(self, connection: psycopg2.extensions.connection) -> None:
        self._connection = connection

    def ingest_batch(self, start_index: int, end_index: int) -> int:
        """Batch INSERT records in the specified range.

        Args:
            start_index: Start index (inclusive)
            end_index: End index (exclusive)

        Returns:
            Number of records inserted
        """
        values_parts: list[str] = []
        params: list[str | list[float]] = []
        for i in range(start_index, end_index):
            values_parts.append("(%s, %s::vector)")
            params.append(f"doc-{i}")
            params.append(generate_vector(seed=i))

        sql = f"INSERT INTO embeddings (content, embedding) VALUES {', '.join(values_parts)};"
        with self._connection.cursor() as cur:
            cur.execute(sql, params)
        self._connection.commit()
        return end_index - start_index

    def ingest_all(self, record_count: int, batch_size: int = 500) -> int:
        """Insert all records into Aurora in batches.

        Args:
            record_count: Total number of records to insert
            batch_size: Number of records per batch (default 500)

        Returns:
            Total number of records inserted
        """
        log = logger.bind(database="aurora_pgvector")
        total_inserted = 0

        for start in range(0, record_count, batch_size):
            end = min(start + batch_size, record_count)
            for attempt in range(1, MAX_RETRIES + 1):
                try:
                    count = self.ingest_batch(start, end)
                    total_inserted += count
                    break
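                except Exception as e:
                    # Roll back so the connection leaves the aborted transaction state before retrying
                    self._connection.rollback()
                    log.warning("batch_insert_retry", start=start, end=end, attempt=attempt, error=str(e))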
                    if attempt == MAX_RETRIES:
                        log.error("batch_insert_failed", start=start, end=end, error=str(e))
                        break
                    time.sleep(RETRY_DELAY_SECONDS)

        log.info("ingest_all_complete", total_inserted=total_inserted)
        return total_inserted

def _run_database_ingestion(index_manager, ingester, record_count):
    """Execute bulk data insertion into the database.

    Args:
        index_manager: Object managing index drop and creation (implementation omitted)
        ingester: Object that inserts data in batches (described above)
        record_count: Total number of records to insert
    """
    # ① Drop index (speeds up registration)
    index_manager.drop_index()
    # SQL executed internally:
    # DROP INDEX IF EXISTS idx_embeddings_embedding;
    # TRUNCATE TABLE embeddings;

    # ② Batch registration (500 records at a time)
    ingester.ingest_all(record_count, batch_size=500)

    # ③ Bulk index creation
    index_manager.create_index()
    # SQL executed internally:
    # CREATE INDEX idx_embeddings_embedding
    #   ON embeddings USING hnsw (embedding vector_cosine_ops)
    #   WITH (m = 16,              -- Connections per node (more = higher accuracy, more memory)
    #         ef_construction = 64); -- Search width during construction (more = higher accuracy, slower build)

The reason for dropping the index first, registering data, and then recreating the index is that while an index exists, every INSERT must also update the index, which slows registration and makes processing time unpredictable.
This technique is commonly used in relational databases and applies equally to Aurora PostgreSQL + pgvector.
For more details, see: Index Considerations When Bulk-Inserting Large Amounts of Data into a Database (Japanese)

Semantic Cache

One technique for speeding up search and data retrieval is caching.
For vector databases, there is a technology called semantic cache that differs slightly from conventional caching.

What Is Semantic Cache?

Semantic cache is a mechanism that uses the embedding vector of a query as a key to cache past search results or FM (Foundation Model) responses, and quickly returns results from the cache for semantically similar queries.

Comparing it with conventional caching reveals its unique characteristics:

	Conventional Cache	Semantic Cache
Key	Exact string match	Vector similarity
Hit condition	Only the exact same query	Semantically similar queries also hit
Example	Only "weather in Tokyo" hits	"Tokyo weather forecast" and "What's the weather in Tokyo today?" also hit

With conventional caching, "weather in Tokyo" and "Tokyo weather forecast" are treated as different keys, resulting in lower cache hit rates.
Semantic cache can group semantically equivalent queries together for caching, which can raise hit rates substantially.

Semantic Cache Processing Flow with Amazon MemoryDB

When implementing semantic cache on AWS, Amazon ElastiCache and Amazon MemoryDB are the typical options.
Here, I'll introduce a semantic cache implementation using Amazon MemoryDB (hereafter, MemoryDB), referencing the following documentation:

Amazon MemoryDB - Vector Search Examples

Setting aside RAG with a vector database for a moment, if you introduce semantic cache for Foundation Model queries, the processing flow looks like this:

1. Vectorize the query in the application using Bedrock Titan Embeddings V2
2. Perform a cosine similarity search with FT.SEARCH KNN against MemoryDB (cache store)
3. If a result with similarity at or above the threshold is found (cache hit) → Return the cached FM (Foundation Model) response
4. If similarity is below the threshold (cache miss) → Call the FM for inference and save the result to cache (HSET + EXPIRE)

Index Definition in MemoryDB

Note: MemoryDB is a Redis-compatible key-value store and does not have "tables" like RDBs. Data is stored in Hash-type keys, and the search schema is defined as an "index" using the FT.CREATE command.

In this repository, the following FT.CREATE command creates the index for semantic cache:

FT.CREATE semantic_cache_idx
  ON HASH
  PREFIX 1 cache:
  SCHEMA
    embedding    VECTOR HNSW 10
                   TYPE FLOAT32
                   DIM 1024
                   DISTANCE_METRIC COSINE
                   M 16
                   EF_CONSTRUCTION 512
    query_text   TAG
    result       TEXT
    created_at   NUMERIC
    ttl          NUMERIC

Field	Type	Description
embedding	VECTOR (HNSW)	Query embedding vector (1024 dimensions). Target for KNN search
query_text	TAG	Original query text. For exact match filtering
result	TEXT	FM response result (cached answer)
created_at	NUMERIC	Cache entry creation time (UNIX timestamp)
ttl	NUMERIC	Cache expiration time (seconds)

PREFIX 1 cache: means only Hashes whose key name starts with cache: are indexed
HNSW parameter EF_CONSTRUCTION=512 is set higher than Aurora pgvector (64). Since MemoryDB operates in-memory, build cost is relatively low, so accuracy is prioritized
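
One detail that is easy to miss in the FT.CREATE command above is the 10 after VECTOR HNSW. As far as I understand, it is the number of attribute tokens that follow (five name/value pairs). The sketch below builds the argument list in Python and derives that count instead of hard-coding it. The helper name build_ft_create_args is hypothetical.

```python
def build_ft_create_args(index_name, prefix, vector_field, vector_attributes, other_fields):
    """Build the FT.CREATE argument list; the attribute count is derived, not hard-coded."""
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        vector_field, "VECTOR", "HNSW", str(len(vector_attributes)), *vector_attributes,
        *other_fields,
    ]


# Same values as the index definition in this article
vector_attributes = [
    "TYPE", "FLOAT32",
    "DIM", "1024",
    "DISTANCE_METRIC", "COSINE",
    "M", "16",
    "EF_CONSTRUCTION", "512",
]
other_fields = ["query_text", "TAG", "result", "TEXT", "created_at", "NUMERIC", "ttl", "NUMERIC"]

args = build_ft_create_args("semantic_cache_idx", "cache:", "embedding", vector_attributes, other_fields)
attribute_count = args[args.index("HNSW") + 1]
```

With a redis-py client, a list like this could be sent with redis_client.execute_command(*args). If you add or remove an HNSW attribute, the count stays consistent automatically.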

Semantic Cache Threshold

The threshold for semantic cache is the cosine similarity value used to determine cache hits.

Threshold	Characteristics	Recommended Use Case
0.95~1.0	Only nearly identical queries hit	Accuracy-focused. When you want to minimize the risk of returning incorrect cached responses
0.80~0.90	Synonymous phrasing variations also hit	Practical balance. Recommended for most use cases
0.70~0.80	Related queries also broadly hit	Hit rate-focused. However, the risk of returning unrelated results increases

The appropriate threshold depends on business requirements, so I think it's safe to start with a high threshold around 0.95 and gradually lower it while monitoring cache hit rates.
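
One way to approach the tuning described above is to collect pairs of (similarity, whether the cached answer was actually appropriate) and compare candidate thresholds. The following is a minimal sketch under that assumption. The sample data is entirely hypothetical and is not a measurement.

```python
def evaluate_threshold(samples: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (hit rate, false-hit rate among hits) for one threshold.

    samples: (similarity to the best cache entry, True if that cached answer was appropriate)
    """
    hits = [appropriate for similarity, appropriate in samples if similarity >= threshold]
    hit_rate = len(hits) / len(samples)
    false_hit_rate = sum(1 for ok in hits if not ok) / len(hits) if hits else 0.0
    return hit_rate, false_hit_rate


# Hypothetical labeled samples
samples = [
    (0.99, True), (0.96, True), (0.91, True), (0.86, True),
    (0.84, False), (0.78, False), (0.72, False), (0.60, False),
]

strict = evaluate_threshold(samples, 0.95)    # few hits, no false hits
balanced = evaluate_threshold(samples, 0.80)  # more hits, some false hits
```

In this made-up data, lowering the threshold from 0.95 to 0.80 raises the hit rate from 25% to 62.5%, but 20% of the hits return an inappropriate answer. Whether that trade-off is acceptable depends on the business requirements.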

HSET + EXPIRE

These are not keywords specific to vector databases or semantic cache. They are commands of Redis, the engine underlying MemoryDB.

HSET

A command that saves field-value pairs together in a Hash-type key.
Multiple fields like embedding, query_text, result, and created_at can be stored as a single entry.

In Redis / MemoryDB, it's conventional to use colon-separated naming like cache:abc123 for key names.
This simply means "entry abc123 in the cache category" — the colon itself has no special function.
The PREFIX 1 cache: in the index definition is a setting to make only keys starting with this prefix subject to search.

Redis HSET Command

EXPIRE

A command that sets an expiration time (TTL) on a key. After the specified number of seconds, the key is automatically deleted. This prevents stale cache entries from accumulating.

Redis EXPIRE Command
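
Putting HSET and EXPIRE together, saving a cache entry could look roughly like the sketch below. This is not the actual _store_cache_entry from the repository, which is not shown in this article. The key naming with a SHA-256 digest is my own assumption, and redis_client is assumed to be a redis-py compatible client that provides hset(name, mapping=...) and expire(name, time).

```python
import hashlib
import struct
import time


def store_cache_entry(redis_client, query_text, query_embedding, result, ttl_seconds):
    """Save one cache entry as a Hash and set its expiration time."""
    # Hypothetical key scheme: "cache:" prefix + short digest of the query text
    digest = hashlib.sha256(query_text.encode("utf-8")).hexdigest()[:16]
    key = f"cache:{digest}"

    # HSET: store all fields of the entry under a single Hash key
    redis_client.hset(
        key,
        mapping={
            # Little-endian float32 bytes, matching TYPE FLOAT32 in the index definition
            "embedding": struct.pack(f"<{len(query_embedding)}f", *query_embedding),
            "query_text": query_text,
            "result": result,
            "created_at": int(time.time()),
            "ttl": ttl_seconds,
        },
    )

    # EXPIRE: delete the key automatically after ttl_seconds
    redis_client.expire(key, ttl_seconds)
    return key
```

Because the key starts with cache:, the entry falls under PREFIX 1 cache: and becomes searchable through the index.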

Implementation Code

The implementation code got a bit long, but what it does is the same as typical cache-based data retrieval: use the cache if available, otherwise search and save the result to cache.
I'll introduce the implementation code in three stages.

Query Vectorization and Cache Lookup

def handler(event, context):
    query = event["query"]  # "What is AWS S3?"

    # ① Vectorize the query (Bedrock Titan V2)
    embedding_result = generate_embedding(query)
    query_embedding = embedding_result.embedding

    # ② Cache lookup via MemoryDB → FM call
    cache_result = process_query(
        query_text=query,
        query_embedding=query_embedding,
        redis_client=redis_client,
        threshold=0.95,   # Environment variable SIMILARITY_THRESHOLD
        ttl_seconds=3600, # Environment variable CACHE_TTL
    )

    # ③ Return response (with metrics)
    return {"statusCode": 200, "body": {...}}

Cache Lookup Processing

def process_query(query_text, query_embedding, redis_client,
                  threshold, ttl_seconds):
    # ① Query MemoryDB cache (FT.SEARCH KNN)
    search_results = search_similar(redis_client, query_embedding)

    if search_results:
        key, similarity, fields = search_results[0]

        # ② Cache hit → Return result from cache (no FM call)
        if similarity >= threshold:
            return CacheResult(hit=True, source="cache",
                               result=fields["result"])

    # ③ Cache miss → Query FM directly and get result
    fm_result = _invoke_fm(query_text)

    # ④ Save result to cache (HSET + EXPIRE)
    _store_cache_entry(redis_client, query_text,
                       query_embedding, fm_result, ttl_seconds)

    return CacheResult(hit=False, source="fm", result=fm_result)

MemoryDB Cache Query

import struct

from redis.commands.search.query import Query

def search_similar(redis_client, query_embedding, top_k=1):
    """Execute KNN vector search with FT.SEARCH."""
    query_vec = struct.pack(f"<{len(query_embedding)}f", *query_embedding)

    query = (
        Query(f"*=>[KNN {top_k} @embedding $query_vec AS score]")
        .return_fields("query_text", "result", "created_at", "score")
        .sort_by("score", asc=True)
        .paging(0, top_k)
        .dialect(2)
        .timeout(3000)  # 3-second timeout
    )

    results = redis_client.ft("semantic_cache_idx").search(
        query, query_params={"query_vec": query_vec}
    )

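    # Convert cosine distance to similarity (distance = 1 - similarity)
    # and collect the returned fields of each document into a dict
    return [
        (
            doc.id,
            1.0 - float(doc.score),
            {"query_text": doc.query_text, "result": doc.result, "created_at": doc.created_at},
        )
        for doc in results.docs
    ]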

MemoryDB's FT.SEARCH command is compatible with Redis's RediSearch module and natively supports KNN vector search.
score is returned as cosine distance (1 - cosine similarity, theoretically in the range 0~2). 1.0 - score converts it to cosine similarity.
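Note that normalization (Titan V2's normalize=True) does not change cosine similarity, so the theoretical range of score remains 0~2. In practice, embeddings of natural-language text rarely have negative cosine similarity, so actual scores usually fall in the range 0~1, and the converted similarity also stays in the 0~1 range.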
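
The two conversions used in search_similar can be checked in isolation with the small sketch below: packing the embedding into little-endian float32 bytes, and converting cosine distance to similarity. The helper names and the sample distance of 0.03 are illustrative assumptions.

```python
import struct


def pack_embedding(embedding: list[float]) -> bytes:
    """Pack floats as little-endian float32, 4 bytes per dimension."""
    return struct.pack(f"<{len(embedding)}f", *embedding)


def distance_to_similarity(cosine_distance: float) -> float:
    """Cosine distance is defined as 1 - cosine similarity, so invert it."""
    return 1.0 - cosine_distance


# A 1024-dimension vector becomes 1024 * 4 bytes
packed = pack_embedding([0.0] * 1024)
packed_size = len(packed)

# Hypothetical score returned by FT.SEARCH, compared against the 0.95 threshold
is_hit = distance_to_similarity(0.03) >= 0.95
```

If the byte length does not match DIM × 4 for a FLOAT32 index, the query vector and the index definition are out of sync, so this is a convenient thing to check first when a search fails.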

Semantic Cache Performance

Here are the measured results under the following conditions:

Item	Value
FM (Foundation Model)	Claude 3 Haiku (anthropic.claude-3-haiku-20240307-v1:0)
Embedding Model	Titan Embeddings V2 (1024 dimensions)
Cache Store	Amazon MemoryDB
Similarity Threshold	0.95
Test Query	"What is AWS S3?" (same query run twice)

The threshold is set high at 0.95.
Please treat these measurement results as reference values to demonstrate that semantic cache has a certain level of effectiveness.

Metric	Cache Miss (1st run)	Cache Hit (2nd run)	Reduction
Total Response Time	4,573ms	279ms	94%
Embedding Generation	194ms	192ms	—
Cache Lookup	4ms	3ms	—
FM Call	4,375ms	0ms	100%

When there's a cache hit, the FM call is completely skipped, reducing response time by 94%.
Since only embedding generation (~190ms) and cache lookup (~3ms) are needed to complete the response, user experience is dramatically improved.
Skipping the FM call also directly translates to reduced API usage costs.
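
The 94% figure and its practical impact can be reproduced with simple arithmetic. The sketch below uses the two measured totals from the table. The hit rates passed to expected_latency_ms are hypothetical, since the actual hit rate depends on your workload and threshold.

```python
MISS_MS = 4573  # measured total response time on cache miss
HIT_MS = 279    # measured total response time on cache hit


def reduction_percent(before_ms: float, after_ms: float) -> float:
    """Percentage reduction from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100


def expected_latency_ms(hit_rate: float) -> float:
    """Average response time for a given cache hit rate (0.0 to 1.0)."""
    return hit_rate * HIT_MS + (1 - hit_rate) * MISS_MS


reduction = reduction_percent(MISS_MS, HIT_MS)   # about 93.9, which rounds to 94
average_at_half = expected_latency_ms(0.5)       # hypothetical 50% hit rate
```

Even with a hypothetical hit rate of 50%, the average response time drops to roughly half of the cache-miss time, so the benefit scales directly with how often queries repeat in meaning.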

RAG + Semantic Cache Processing Flow

Semantic cache can be integrated into a RAG system.
In that case, the processing flow would look like this:

1. Vectorize the query
2. Search for similar queries in MemoryDB (cache)
3. Cache hit → Immediately return the cached response
4. Cache miss → Search Aurora pgvector (vector DB) for RAG context
5. Call the FM with the retrieved context for inference
6. Save the FM response to cache and return it to the user
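
The flow above can be sketched as a single function. To keep the example self-contained, every external dependency (embedding, cache lookup, vector search, FM call, cache save) is passed in as a callable. The function name and all parameter names are hypothetical, and error handling is omitted.

```python
def answer_with_rag_and_cache(query, embed, cache_lookup, vector_search, invoke_fm, cache_store):
    """Return (response, source) where source is "cache" or "fm"."""
    # Vectorize the query
    embedding = embed(query)

    # Search the semantic cache; None means no entry cleared the threshold
    cached = cache_lookup(embedding)
    if cached is not None:
        # Cache hit: return immediately, skipping vector search and the FM call
        return cached, "cache"

    # Cache miss: retrieve RAG context from the vector database
    context = vector_search(embedding)

    # Call the FM with the retrieved context
    response = invoke_fm(query, context)

    # Save the response to cache, then return it to the user
    cache_store(query, embedding, response)
    return response, "fm"
```

Note that the same query embedding is reused for both the cache lookup and the vector database search, so the embedding model is called only once per request.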

On Cache Hit

On Cache Miss

Summary

In this article, I covered everything from the basic concepts of vector databases to implementation on AWS and optimization with semantic cache.

Vector database basics
- A vector database is a database that "searches by meaning." It handles spelling variations and synonymous expressions that traditional keyword search cannot catch
- Both data and search queries are vectorized using the same embedding model, and "semantic closeness" is calculated using cosine similarity
- ANN + HNSW is the standard for vector databases
Building vector databases on AWS
- AWS offers multiple options: Aurora PostgreSQL + pgvector, OpenSearch, S3 Vectors, MemoryDB, and more
- Aurora PostgreSQL + pgvector, which can be operated with SQL and leverages existing skills, is recommended as the first step
- For bulk data ingestion, the "drop index → insert data → bulk create index" pattern is the go-to approach
Semantic cache
- Caches FM responses keyed by query embeddings, so semantically similar queries hit the cache and skip the FM call

That's all for this time.
Thank you for reading this lengthy article!
