DEV Community: Roman Tsypuk

AWS Golden Jacket: My Journey of Learning and Certification

Roman Tsypuk — Mon, 08 Jun 2026 05:05:58 +0000

Cloud technologies have been one of my biggest professional passions for many years, and AWS has played a significant role in that journey.

Through continuous learning, hands-on experience, and the pursuit of AWS certifications, I gained knowledge that helped me make better technical, architectural, and business decisions across a wide range of projects. Over time, I expanded my expertise across multiple AWS domains and eventually completed every AWS certification available at that time.
In recognition of these achievements, I was honored to receive the AWS Golden Jacket. This post is the story of that journey.

I hope it motivates individuals who are considering a similar path and helps remove some of the uncertainty surrounding AWS certifications.

When AWS was a DevOps-Only land

Long before I started pursuing AWS certifications, I still remember visiting the Certification Center located on the campus, where I completed multiple Oracle Java certifications. Every visit was an opportunity to meet other professionals preparing for exams—developers, DevOps engineers, project managers, analysts, and architects—each pursuing their own goals.

One thing stood out: AWS certifications were taken almost exclusively by DevOps engineers. At the time, many organizations treated AWS as an infrastructure-only domain. More than that, developers rarely had access to the AWS console; all of this was a DevOps responsibility. Maybe it was due to the career path requirements for DevOps that were built into the company.

Fortunately, I was working on a large distributed system hosted across AWS and Rackspace infrastructure. Because our development team participated in on-call rotations and incident response, gaining AWS access became a necessity rather than a privilege. That was the moment when it all started: EC2, S3, RabbitMQ, SQS, SES, Spark, RDS, EMR, Redshift. Working as a developer, I was constantly learning the cloud. It gave me back the feeling I had missed since my Telco days: having access to and control over equipment and server infrastructure distributed across the globe.

SAA-C01: First AWS certification

Later, working in different startups and having experience with Amazon Web Services, I was already driving project architectures, was involved in discoveries and PoCs, and was helping others learn AWS. At that point I wanted to structure the practical cloud experience I had gained and reinforce it with a certification focused on real use cases and how they can be implemented with AWS. By that time I had passed all existing Oracle Java certifications and wanted to expand my knowledge from core languages to cloud infrastructure.

Why did I choose the Architect certification? It gave a quick and wide view of the AWS ecosystem, its core principles, and its sources of knowledge. At that moment I did not know anything about the Golden Jacket and was simply keen to increase my expertise in AWS.

Next Certifications

Right after SAA-C01: Architect came DVA-C01: Developer and SOA-C01: SysOps, which were a great extension of AWS knowledge, going deeper into the VPC level and CI/CD tools.
Next was the BDS-C00: Big Data exam, an amazing exam with content you will not meet today. At that time it was all-in-one: big data, databases, and machine learning. I enjoyed it a lot and used many concepts from it in many projects.

SCS-C01: Security is a must-have when you are working with AWS. Only after this learning do you have a full, clear picture of IAM and permissions that allows you to operate at the level of a multi-account AWS organisation. I had experience working with huge enterprise companies that used multiple accounts aggregated into an organisation, and the knowledge from this certification always helped to find vulnerabilities and misconfigurations, show them to clients, and fix the breaches.

SAP-C01: Professional Architect is a hard exam that moves you to the next level. Instead of knowing the landscape and the integration of services, it lets you dive deep into full end-to-end architecture and into each service in isolation. DOP-C01: DevOps Professional is a logical continuation: the exam materials correspond closely to Architect, with an added focus on IaC and infrastructure. This combination allows you to operate fluently at complex infrastructure levels.

CLF-C01: Cloud Practitioner was more of a check mark. At that moment there were opportunities and requests from a customer for free AWS certification, so why not.

By then the Big Data exam had been retired and split into 3 new certifications: DBS-C01: Databases, DAS-C01: Data Analytics, and MLS-C01: Machine Learning, and it was the right decision. Each certification lets you learn its domain deeply. Today all 3 of them are retired and their topics have migrated into different domain areas of other exams. This data-oriented set of exams is great to have in a portfolio: every system in a startup or enterprise has challenges and issues on the persistence layer that require optimisation.

AIF-C01: AI Practitioner opens the hot AI topic in AWS certification, and DEA-C01: Data Engineering and MLA-C01: ML Engineering continue it.

ANS-C01: AWS Advanced Networking - the Toughest Certification

Having a telecommunications background, I initially assumed this certification would be one of the easier ones. I had hands-on experience with radio and fiber-optic networks and had completed networking-related certifications from vendors such as Ericsson, Nokia, Huawei, and Siemens. Networking was already part of my professional foundation.

However, my first attempt ended under circumstances completely outside my control. During the exam, a large-scale attack on Ukraine's energy infrastructure caused a nationwide blackout. Power and internet connectivity disappeared instantly. By that point, I had already answered approximately 40 questions.
When I later received the score report, I saw that I had achieved around 65% of the required score. It was impossible not to wonder whether I could have passed if I had been able to finish the exam.

My second attempt was even more frustrating. I scored 731 points, just 19 points below the passing score of 750.
This time, the exam focused heavily on topics such as AWS Direct Connect, BGP routing, hybrid networking, and enterprise-scale connectivity patterns—areas that many cloud engineers rarely encounter in day-to-day work. Another challenge was the gap between exam preparation materials and the rapidly evolving AWS platform. Some learning resources contained information that was no longer fully aligned with current AWS services, limits, and best practices. While studying, I often found myself asking:
Should I answer according to the latest AWS documentation, or according to the knowledge that existed when the exam questions were created?
I chose to follow the most current AWS guidance, even when I suspected the exam might expect an older answer.

After this attempt I switched to other certifications that were more related to my project activity and decided that ANS-C01 would be the last in the chain. Once I had completed all the other certifications, I finally returned to ANS-C01. But this time I took another approach: by then I had acquired some outdated Cisco network equipment and a few tiny clients, and assembled a home lab.
With this setup, I experimented with technologies and scenarios: VLAN setup and advanced routing, connecting self-hosted Proxmox to the cloud with a site-to-site VPN, layered DNS resolvers in the cloud and on-prem, deep packet inspection with Snort, and pfSense setup and integration with VLANs.
Now I'm using this stack in a CCNA course and for PoCs. These practical setups allow me to quickly refresh my knowledge and try different configurations. And finally, I passed the exam, even with some time to spare and with confidence in the result.

ANS-C01: But it is for sure the most in-depth certification

Advanced Networking ultimately became one of the most rewarding certifications in my AWS journey.
Once you understand networking at a deeper level, you begin to see AWS differently. You gain insight into how services communicate internally, how new AWS services are integrated into the broader ecosystem, and how managed services operate behind the scenes from their own accounts and data plane connections.
Certification goes far beyond networking itself. It changes how you think about architecture, scalability, resilience, and the design decisions that power modern cloud platforms.

My full AWS certification flow:

SAA-C01: AWS Certified Solutions Architect - Associate
DVA-C01: AWS Certified Developer - Associate
SOA-C01: AWS Certified SysOps Administrator - Associate
BDS-C00: AWS Certified Big Data - Specialty
SCS-C01: AWS Certified Security - Specialty
SAP-C01: AWS Certified Solutions Architect - Professional
DOP-C01: AWS Certified DevOps Engineer - Professional
CLF-C01: AWS Certified Cloud Practitioner
ANS-C01: AWS Certified Advanced Networking - Specialty (Failed)
ANS-C01: AWS Certified Advanced Networking - Specialty (Failed)
DBS-C01: AWS Certified Database - Specialty
DAS-C01: AWS Certified Data Analytics - Specialty
MLS-C01: AWS Certified Machine Learning - Specialty
DEA-C01: AWS Certified Data Engineer - Associate
SAP-C02: AWS Certified Solutions Architect - Professional (Recertified)
DOP-C02: AWS Certified DevOps Engineer - Professional (Recertified)
AIF-C01: AWS Certified AI Practitioner
SCS-C02: AWS Certified Security - Specialty (Recertified)
MLA-C01: AWS Certified Machine Learning Engineer - Associate
ANS-C01: AWS Certified Advanced Networking - Specialty

Build your own path based on the things you are interested in, your tasks, and demand

One lesson I learned throughout my certification journey is that there is no universal roadmap. The best certifications are the ones that align with your interests, your responsibilities, and the problems you solve every day. I always chose certifications that were relevant to the projects I was working on. The knowledge gained during preparation helped me improve application performance, optimize infrastructure costs, strengthen security, and make better architectural decisions.
Certifications are most valuable when they support real-world experience. Choose the path that helps you grow in the areas that matter most to your career, and the benefits will extend far beyond passing an exam.

Why learn AWS and certify

I quickly realized that AWS certifications provide a well-structured and detailed learning path that helps you focus on specific domains while also building a broad understanding of cloud technologies. The preparation materials are high quality and organized in a way that makes it easier to grow across different areas of expertise.
Studying for these certifications strengthens knowledge across multiple domains, including AI/ML, DevOps, architecture, networking, and databases. This foundation allows you not only to communicate effectively with engineers and business stakeholders but also to contribute to a wide range of project-level decisions.

For me, AWS certifications became a powerful tool for self-improvement and continuous growth across roles such as architect, engineer, DevOps, networking, data, and AI. This hands-on expertise has helped me support multiple companies in solving complex, real-world challenges.

The Golden Jacket is not a collection of badges, it is your constant learning and growth

Knowledge is not some fact table or absolute that you quote daily; it is connected information from your experience across different domains, industries, languages, people, and so on.
The more bridges you can establish between these isolated islands, the more efficiently you can use them, because the same principles of resilience and high availability apply in financial systems and telco equipment. AWS certifications allow you not only to grow these islands but to rebuild them into a mainland. With every next certification you rebuild your entire understanding and think about the same components from multiple sides.
It is very important to look at targets from all angles and positions, not only the way everybody is used to.

Everything Fails — Including Certification Attempts

Even failure is part of the process. There is no perfect study plan, perfect environment, or perfect exam day. Unexpected events can affect your preparation, concentration, internet connectivity, health, personal life, or simply your performance during the exam itself.
Because of that, it is important not to view certification exams as something extraordinary or unattainable. You may fail an exam attempt, and that is perfectly normal.
As engineers, we design systems with the assumption that failures will happen. Servers fail. Networks fail. Storage devices fail. Entire regions can fail. Resilient architectures are not built by pretending failures do not exist—they are built by planning for them and recovering from them.

The same principle applies to certifications. A failed exam is not the end of the journey. It is feedback.
The score report highlights areas where your knowledge can be strengthened. It reveals gaps in understanding and often points to topics that deserve deeper practical experience. Every unsuccessful attempt provides valuable information that can help you prepare more effectively for the next one.
All but one of the certifications I completed were passed on the first attempt; that one required additional study, more hands-on practice, and a different approach. Looking back, the certifications that challenged me the most often became the ones from which I learned the most.

Success is not defined by never failing. Success is defined by how quickly you analyze what happened, adjust your approach, and continue moving forward.
Treat certification failures the same way you would treat a production incident: investigate the root cause, identify improvements, implement corrective actions, and try again.
The goal is not perfection. The goal is continuous growth.

AWS Certifications evolve with the IT Industry

One of the most common arguments against AWS certifications is that they expire after three years.
At first glance, that may seem like a disadvantage. Why invest the time and effort if you need to recertify in the future?
Your perspective will change once you understand how quickly AWS evolves.

Every year, AWS launches new services, introduces major features, updates architectural best practices, and expands into entirely new technology domains. Areas such as serverless computing, containers, machine learning, generative AI, observability, and security have changed dramatically over the last few years alone.
Three years in cloud technology is a long time.
The purpose of recertification is not simply to renew a badge. It is an opportunity to revisit your knowledge, learn what has changed, and stay current with the latest developments across the AWS ecosystem.

Another important aspect of the AWS certification program is its tiered structure. As you progress to higher-level certifications, AWS automatically renews related associate-level certifications. This means that maintaining advanced and professional certifications often helps keep your broader certification portfolio current as well.
Several certifications that I earned no longer exist in their original form. AWS Certified Big Data – Specialty, AWS Certified Data Analytics – Specialty, and AWS Certified Database – Specialty have all been retired or replaced by newer certifications that better reflect the current industry focus.

Although the certifications themselves changed, the knowledge behind them did not disappear.
The principles of data processing, analytics, database design, scalability, performance optimization, and architecture remain relevant regardless of how AWS restructures its certification portfolio. Technologies evolve, but foundational engineering concepts continue to provide value throughout your career.
AWS has also experimented with highly specialized certifications over the years. Some, such as the AWS Certified Alexa Skill Builder or SAP Specialty, targeted very specific domains and audiences. Others focused on technologies that were relevant to a particular period of AWS growth. This year the AWS Certified Machine Learning – Specialty is also retired, and the AWS Certified Generative AI Developer – Professional is a new alternative to it.

This evolution is a sign of a healthy certification program.
As the cloud industry changes, certifications must change as well. New technologies emerge, demand shifts, and learning paths adapt to reflect what engineers need to know today rather than what was relevant five years ago.

Instead of viewing the three-year validity period as a limitation, I see it as an invitation to continue learning.
The cloud industry never stands still, and neither should we.

Non-Certification Challenges

Preparing for and passing AWS certifications is not always about studying services, architectures, and best practices. Sometimes the biggest challenges come from circumstances completely outside your control.

During the COVID-19 pandemic, testing centers around the world suspended operations, making remote-proctored exams the primary certification option. Later, when the war in Ukraine began, the situation became even more complicated. Remote sessions were often the only available way to continue pursuing certifications.

For many candidates, a certification exam means going to a comfortable test center. For some of us, it also meant planning around air raid alerts, power outages, and uncertainty about whether connectivity would remain available throughout the exam session.
A large-scale blackout that interrupts your exam reminds you that even the best preparation cannot eliminate every external risk.

COVID-19. War. Blackouts. These were challenges, not excuses. If I was able to complete my certification journey under these circumstances, then temporary setbacks should never stop you from pursuing your own goals. Focus on what you can control, keep learning, and continue moving forward.

Once certified: how to request the AWS Golden Jacket

After completing all AWS certifications, there is no direct “order button” or automated system that tracks eligibility.
Instead, access to the Golden Jacket program is typically handled through specific AWS-related communities and programs.
Depending on your involvement with AWS, the path may go through one of the following channels:

AWS Employee
AWS Ambassador
AWS Community Builder
AWS Partner Organisation

One important detail that is often surprising: even AWS Technical Account Managers (TAM) may not always be familiar with the Golden Jacket process. In such cases, it is common to connect with other TAMs or AWS representatives who have previously supported similar requests.

Yes, this is your next challenge. For me this process stretched over 1 year.

Delivery options differ depending on the program you applied through. Some can deliver it to your home; others require your in-person presence at events like re:Invent or an AWS Summit. These events have a special certification lounge where you can get swag, receive the jacket, and network with others.

The Hidden Benefit of Learning AWS

One unexpected benefit of my AWS certification journey was how much it expanded my knowledge beyond AWS itself.
As you grow in cloud engineering, you begin to look beyond services and APIs. You become interested in what happens behind the scenes—operating systems, networking stacks, virtualization platforms, orchestration systems, databases, storage engines, and the open-source technologies that power modern cloud platforms.
The deeper I went into AWS, the more I found myself exploring the technologies underneath it.
This curiosity eventually led me to build my own home lab environment. Today, I run a self-hosted infrastructure based on Proxmox, pfSense, and a variety of open-source tools and services. It provides a practical environment for experimentation, learning, and validating architectural ideas before applying them in production environments.

This broader perspective makes it easier to design hybrid architectures, integrate cloud and on-premises systems, work across multiple cloud providers, and understand the technologies that power modern distributed systems.

AWS Subject-matter expert

I'm excited to have been invited to join the AWS Subject Matter Expert program, where I can contribute to certifications and review exam details. This is an AWS certification program focused on improving exam quality.

Golden Jacket is not a final stop

The plan is to continue the learning journey.
Once I have the opportunity, I will be thrilled to visit re:Invent and global Summits in person and network with other AWS Golden Jacket holders.

For those interested in AWS learning and certifications, I have created an application that facilitates faster learning and growth. You are welcome to join it.

I would like to thank everyone who supported, encouraged, challenged, and inspired me throughout this AWS journey — your guidance, knowledge, and motivation made this achievement possible.

Running local AI LLM anywhere: from EC2 instances to Edge Devices

Roman Tsypuk — Sun, 07 Jun 2026 06:27:50 +0000

Llama.cpp is one of the most efficient frameworks for running Large Language Models locally. Written in pure C/C++, it is optimized for performance and low resource consumption, making it a popular choice for developers who want direct control over model inference without additional runtime layers.

It supports a broad ecosystem of open-source models, including Llama, Mistral, Gemma, Phi, Qwen, Falcon, Yi, Solar, and many others, while providing efficient execution across CPUs, GPUs, mobile devices, and edge hardware.

Benefits and use cases of running an LLM locally on edge devices

The ability to run an LLM on edge devices opens up multiple use cases:

Minimal inference latency: no third-party hops and no man in the middle
Offline operation: a local LLM does not require a full-time Internet connection. A connection is needed only for model updates or for uploading inference results, so the system can stay running in offline mode for a year
The edge device can perform inference and get results from a smaller, faster local model. Only if the results do not pass a defined threshold is the bigger, more powerful cloud model called
Using quantization, a trained model can be compacted to fit the available CPU and RAM resources
Embeddings can be calculated on the edge side, allowing integration with RAG and vector stores
Privacy: there is no provider logging all requests and responses and using your data to train its next model
Sensitive data is not sent away from the device to third parties
No cellular or Wi-Fi connection is needed
Infrastructure cost savings: there is no need for a big fleet of LLM inference servers for all edge connections. The cloud model is used as a fallback when local LLM results are below the threshold
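
The edge-first, cloud-fallback idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ModelFn` signature, the `RoutedAnswer` type, the `answer_with_fallback` name, and the 0.7 threshold are all hypothetical, and how a confidence score is obtained (mean token probability, a validator, a classifier) is left open.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical contract: a model call returns (answer_text, confidence in [0, 1]).
# How the confidence is computed is an assumption of this sketch.
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class RoutedAnswer:
    text: str
    source: str        # "edge" or "cloud"
    confidence: float


def answer_with_fallback(prompt: str,
                         edge_model: ModelFn,
                         cloud_model: ModelFn,
                         threshold: float = 0.7) -> RoutedAnswer:
    """Try the local model first; call the cloud model only below the threshold."""
    text, confidence = edge_model(prompt)
    if confidence >= threshold:
        return RoutedAnswer(text, "edge", confidence)
    try:
        cloud_text, cloud_confidence = cloud_model(prompt)
    except OSError:
        # Offline (ConnectionError is an OSError): keep the local answer
        # instead of failing the request.
        return RoutedAnswer(text, "edge", confidence)
    return RoutedAnswer(cloud_text, "cloud", cloud_confidence)
```

With this shape the cloud model is only billed for the requests the local model could not handle, and the device keeps answering when the network is down.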

Installation on ARM Edge hardware

Once you have SSH access to the EC2 instance, run these commands to install llama.cpp:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_CXX_FLAGS="-mcpu=native" -DCMAKE_C_FLAGS="-mcpu=native"
cmake --build . -v --config Release -j `nproc`

Installation on Android Devices

First install the Termux terminal emulator on the device. Then you can enable the sshd service, connect remotely to your device from any machine, and run the installation:

~ $ pkg upgrade -y
~ $ pkg install -y clang wget cmake git
clang is already the newest version (21.1.8-2).
wget is already the newest version (1.25.0-1).
cmake is already the newest version (4.3.2-1).
git is already the newest version (2.54.0).
Summary:
  Upgrading: 0, Installing: 0, Removing: 0, Not Upgrading: 0

~ $ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 95392, done.
remote: Counting objects: 100% (80/80), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 95392 (delta 51), reused 35 (delta 35), pack-reused 95312 (from 3)
Receiving objects: 100% (95392/95392), 393.55 MiB | 13.94 MiB/s, done.
Resolving deltas: 100% (67912/67912), done.

~ $ cd llama.cpp
~ $ cmake -B build
~ $ cmake --build build --config Release

Download GGUF models

One of the key innovations in the llama.cpp ecosystem is the GGUF model format. GGUF was designed to optimize model storage and inference by supporting a wide range of quantization schemes that reduce the precision of model weights without significantly impacting model quality.

Instead of storing weights exclusively in FP32 or FP16 formats, GGUF allows models to be quantized to lower-bit representations such as 8-bit, 6-bit, 5-bit, or even 4-bit integers. This substantially decreases memory usage, reduces bandwidth requirements, and improves inference performance.

These optimizations enable modern LLMs to run efficiently on CPUs, including Arm-based processors, making local AI inference practical on laptops, edge devices, and other resource-constrained systems.

Here are a few Hugging Face GGUF models I have experimented with:

wget https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf
wget https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-GGUF/resolve/b69aef112e9f895e6f98d7ae0949f72ff09aa401/Llama-3.2-1B-Instruct-Q3_K_M.gguf
wget https://huggingface.co/QuantFactory/SmolLM2-1.7B-Instruct-GGUF/resolve/main/SmolLM2-1.7B-Instruct.Q3_K_M.gguf
wget https://huggingface.co/sensura/Qwen3-0.6B-Q3_K_M-GGUF/resolve/main/qwen3-0.6b-q3_k_m.gguf

Quantization

Consider the model file Llama-3.2-1B-Instruct-Q4_K_M.gguf. Each part of the name provides information about the model architecture, size, and quantization level.

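The most important component for inference efficiency is Q4_K_M. The Q4 indicates that the model weights have been quantized to roughly 4-bit precision instead of the 16 bits typically used by FP16 models. The K_M suffix refers to a specific quantization scheme used by llama.cpp (a k-quant, medium variant) that balances model quality and performance. Because block scales are stored alongside the weights and some tensors are kept at higher precision, the effective size is somewhat above 4 bits per weight: the benchmark later in this article reports 762.81 MiB for 1.24 B parameters, which is about 5.2 bits per weight.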

By reducing the precision of billions of model parameters, quantization dramatically decreases both model size and memory requirements. As a result, models that would otherwise require several gigabytes—or even hundreds of gigabytes for larger parameter counts—can run on consumer hardware with significantly less RAM while maintaining good inference quality.
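
The size arithmetic behind quantization is simple enough to check by hand. The sketch below estimates the weight footprint for a given bit width and works backwards from a file size to the effective bits per weight. The parameter count (1.24 B) and file size (762.81 MiB) are taken from the llama-bench output later in this article; the function names are illustrative, and the estimate ignores the KV cache and runtime buffers.

```python
MIB = 1024 ** 2


def estimated_size_mib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in MiB for a given precision."""
    return n_params * bits_per_weight / 8 / MIB


def effective_bits_per_weight(size_mib: float, n_params: float) -> float:
    """Work backwards from a model file size to average bits per weight."""
    return size_mib * MIB * 8 / n_params


params = 1.24e9  # Llama 3.2 1B, as reported by llama-bench

print(f"FP16 weights:    {estimated_size_mib(params, 16):.0f} MiB")
print(f"Ideal 4-bit:     {estimated_size_mib(params, 4):.0f} MiB")
print(f"Q4_K_M measured: {effective_bits_per_weight(762.81, params):.2f} bits/weight")
```

The gap between the ideal 4-bit figure and the measured file comes from per-block scales and from tensors stored at higher precision.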

Start the llama.cpp LLM server

$ ./llama.cpp/build/bin/llama-server -t 4 -m models/Llama-3.2-1B-Instruct-Q4_K_M.gguf --host 0.0.0.0 --port 9090 -np 1

To run llama-server with MCP server support, pass an additional flag:

$ ./llama.cpp/build/bin/llama-server -t 4 -m models/qwen2.5-0.5b-instruct-q4_k_m.gguf --host 0.0.0.0 --port 9090 -np 1 --webui-mcp-proxy

To run llama-server with the embeddings endpoint enabled:

./llama.cpp/build/bin/llama-server -m models/qwen2.5-0.5b-instruct-q4_k_m.gguf -t 4 --host 0.0.0.0 --port 9090  --embeddings --pooling cls
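
Once the server is started with `--embeddings`, vectors can be requested over its OpenAI-compatible `/v1/embeddings` route and compared locally, which is the building block for RAG on the device. The following is a minimal sketch using only the standard library. The `embed` and `cosine_similarity` names are illustrative, the `"model"` value is a placeholder, and the base URL assumes the host and port from the command above.

```python
import json
import math
import urllib.request
from typing import List, Sequence

BASE_URL = "http://127.0.0.1:9090/v1"  # assumes --host/--port from the command above


def embed(texts: List[str], base_url: str = BASE_URL) -> List[List[float]]:
    """POST texts to the OpenAI-compatible embeddings endpoint."""
    # "local" is a placeholder model name for this sketch.
    payload = json.dumps({"input": texts, "model": "local"}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Sort by index so the vectors line up with the input order.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Usage against a running server (not executed here):
#   query_vec, doc_vec = embed(["reset the router", "how to restart pfSense"])
#   print(cosine_similarity(query_vec, doc_vec))
```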

llama-server has a built-in web UI frontend.

Here are the results of running the qwen2.5-0.5b-instruct-q4_k_m.gguf model on an extra-small ARM instance with 1 GB of free RAM:

Here are the results of running the same qwen2.5-0.5b-instruct-q4_k_m.gguf model on an Android phone with 8 cores and 8 GB of RAM:

Token speed differs dramatically.

Connect an OpenAI client to the model

With this framework you can take available models and host them even on your Android phone. Once your local AI is running, the same SDKs can be used to work with it, with just a few parameter adjustments (URL, KEY, MODEL_NAME):

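from openai import OpenAI

if __name__ == '__main__':
    # llama-server exposes an OpenAI-compatible API under /v1.
    # The server above was started without an API key, so a placeholder works.
    client = OpenAI(api_key="dummy",
                    base_url="http://127.0.0.1:9090/v1",
                    )

    response = client.chat.completions.create(
        # Model name as echoed back by the server in its responses.
        model="qwen2.5-0.5b-instruct-q4_k_m.gguf",
        messages=[
            {"role": "user", "content": "introduce yourself"}
        ]
    )

    # Print only the assistant's reply text.
    print(response.choices[0].message.content)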

As an artificial intelligence language model, I am Qwen and I was created by Alibaba Cloud. I am a language model designed to assist users in various tasks such as writing, communication, and processing text. I am programmed to understand, generate, and respond to text, and I am constantly learning and improving based on the data I have been trained on. I am here to help and to support you in any way that I can.

Here are the detailed request and response of the model execution:

{
    "messages": [
        {
            "content": "introduce yourself",
            "role": "user"
        }
    ],
    "model": "qwen2.5-0.5b-instruct-q4_k_m.gguf"
}

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "As an artificial intelligence language model, I am Qwen and I was created by Alibaba Cloud. I am a language model designed to assist users in various tasks such as writing, communication, and processing text. I am programmed to understand, generate, and respond to text, and I am constantly learning and improving based on the data I have been trained on. I am here to help and to support you in any way that I can.",
                "role": "assistant"
            }
        }
    ],
    "created": 1780758287,
    "id": "chatcmpl-eZuyspAITmOmnYdCQf5blxdQGJviy41W",
    "model": "qwen2.5-0.5b-instruct-q4_k_m.gguf",
    "object": "chat.completion",
    "system_fingerprint": "b9295-95405ac65",
    "timings": {
        "cache_n": 0,
        "predicted_ms": 4865.387,
        "predicted_n": 88,
        "predicted_per_second": 18.08694765699008,
        "predicted_per_token_ms": 55.28848863636363,
        "prompt_ms": 557.404,
        "prompt_n": 32,
        "prompt_per_second": 57.4089888124233,
        "prompt_per_token_ms": 17.418875
    },
    "usage": {
        "completion_tokens": 88,
        "prompt_tokens": 32,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "total_tokens": 120
    }
}
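
The `timings` block in this response is internally consistent, and the relationships are worth spelling out when comparing devices. The short sketch below recomputes the derived fields from the raw counts and durations shown above; the helper names are illustrative.

```python
# Raw values copied from the "timings" block of the response above.
timings = {
    "prompt_n": 32, "prompt_ms": 557.404,
    "predicted_n": 88, "predicted_ms": 4865.387,
}


def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput: tokens divided by elapsed seconds."""
    return n_tokens / (elapsed_ms / 1000.0)


def ms_per_token(n_tokens: int, elapsed_ms: float) -> float:
    """Latency per token in milliseconds."""
    return elapsed_ms / n_tokens


prompt_tps = tokens_per_second(timings["prompt_n"], timings["prompt_ms"])
generation_tps = tokens_per_second(timings["predicted_n"], timings["predicted_ms"])
total_seconds = (timings["prompt_ms"] + timings["predicted_ms"]) / 1000.0

print(f"prompt processing: {prompt_tps:.2f} tokens/s")
print(f"generation:        {generation_tps:.2f} tokens/s")
print(f"end-to-end:        {total_seconds:.2f} s")
```

Prompt processing and generation are reported separately because they stress the hardware differently: the prompt is processed in a batch, while generation produces one token at a time.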

Let's now run the Llama 3.2 model:

$ ./llama.cpp/build/bin/llama-server -t 4 -m models/Llama-3.2-1B-Instruct-Q4_K_M.gguf --host 0.0.0.0 --port 9090 -np 1 --webui-mcp-proxy

Hello, I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello, I'm an artificial intelligence model known as Llama. Llama stands for \"Large Language Model Meta AI.\"",
                "role": "assistant"
            }
        }
    ],
    "created": 1780758446,
    "id": "chatcmpl-kTksGv3J8zKpesmhjNzf7NWRUQTdYYUR",
    "model": "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
    "object": "chat.completion",
    "system_fingerprint": "b9295-95405ac65",
    "timings": {
        "cache_n": 0,
        "predicted_ms": 1736.759,
        "predicted_n": 25,
        "predicted_per_second": 14.394628155086572,
        "predicted_per_token_ms": 69.47036,
        "prompt_ms": 1286.379,
        "prompt_n": 38,
        "prompt_per_second": 29.540283229126096,
        "prompt_per_token_ms": 33.85207894736842
    },
    "usage": {
        "completion_tokens": 25,
        "prompt_tokens": 38,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "total_tokens": 63
    }
}

Benchmark

When running llama.cpp on Android devices, it is important to understand the underlying ARM CPU architecture. An octa-core processor does not mean eight equal-performance cores. Most modern mobile chipsets use a big.LITTLE design that combines a small number of high-performance cores with several power-efficient cores.

For LLM inference, the highest performance is not always achieved by using all available cores. Additional efficiency cores can introduce scheduling overhead and may contribute less computational power, resulting in lower token generation rates than a configuration that uses only the high-performance cores.
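
One way to see how many high-performance cores a device has is to group cores by their maximum clock frequency. The sketch below reads the Linux cpufreq sysfs files, which are usually (but not always) readable from Termux; that readability, and the helper names, are assumptions of this example. On systems without these files it simply prints nothing.

```python
import glob
import re
from collections import defaultdict
from typing import Dict, List

SYSFS_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/cpuinfo_max_freq"


def read_max_freqs(pattern: str = SYSFS_PATTERN) -> Dict[int, int]:
    """Map core index -> maximum frequency in kHz, skipping unreadable cores."""
    freqs: Dict[int, int] = {}
    for path in glob.glob(pattern):
        match = re.search(r"cpu(\d+)/cpufreq", path)
        if match is None:
            continue
        try:
            with open(path) as handle:
                freqs[int(match.group(1))] = int(handle.read().strip())
        except (OSError, ValueError):
            continue
    return freqs


def group_by_max_freq(freqs: Dict[int, int]) -> Dict[int, List[int]]:
    """Group core indices by max frequency, fastest cluster first."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for core, khz in sorted(freqs.items()):
        clusters[khz].append(core)
    return dict(sorted(clusters.items(), reverse=True))


for khz, cores in group_by_max_freq(read_max_freqs()).items():
    print(f"{khz / 1_000_000:.2f} GHz: cores {cores}")
```

The size of the fastest clusters is a reasonable starting point for the `-t` values to try, but the benchmark remains the deciding measurement.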

The best approach is to benchmark your device with different thread counts and compare tokens-per-second performance. Once you identify the optimal configuration, tune llama.cpp accordingly to maximize inference speed and overall efficiency.

~ $ ./llama.cpp/build/bin/llama-bench -t 4,6,8 -m models/Llama-3.2-1B-Instruct-Q4_K_M.gguf  -p 128,256,512 -n 64

model	size	params	backend	threads	test	t/s
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	4	pp128	46.43 ± 0.23
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	4	pp256	49.40 ± 0.38
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	4	pp512	50.47 ± 0.33
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	4	tg64	14.35 ± 0.06
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	6	pp128	51.29 ± 0.25
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	6	pp256	54.54 ± 0.55
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	6	pp512	55.42 ± 0.21
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	6	tg64	14.13 ± 0.04
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	8	pp128	50.08 ± 1.49
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	8	pp256	57.29 ± 0.37
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	8	pp512	58.07 ± 0.48
llama 1B Q4_K - Medium	762.81 MiB	1.24 B	CPU	8	tg64	14.36 ± 0.15
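Picking a thread count from such results can be automated. The sketch below is one possible approach, not part of llama.cpp: it parses rows in the Markdown table layout that llama-bench prints by default (a subset of the rows above is transcribed as sample data) and returns the smallest thread count whose throughput is within a tolerance of the best result. The function names and the 1% tolerance are assumptions made for illustration.

```python
# Rows transcribed from the benchmark above, in llama-bench's default
# Markdown table layout (model | size | params | backend | threads | test | t/s).
BENCH_OUTPUT = """
| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 4 | pp512 | 50.47 ± 0.33 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 4 | tg64 | 14.35 ± 0.06 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 6 | pp512 | 55.42 ± 0.21 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 6 | tg64 | 14.13 ± 0.04 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 8 | pp512 | 58.07 ± 0.48 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | CPU | 8 | tg64 | 14.36 ± 0.15 |
"""


def parse_bench(text):
    """Return (threads, test, tokens_per_second) tuples from a llama-bench table."""
    rows = []
    for line in text.strip().splitlines():
        cols = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cols) != 7 or not cols[4].isdigit():
            continue  # header, separator, or unrelated line
        mean_tps = float(cols[6].split("±")[0])
        rows.append((int(cols[4]), cols[5], mean_tps))
    return rows


def pick_threads(rows, test, tolerance=0.01):
    """Smallest thread count whose t/s is within `tolerance` of the best result."""
    results = {threads: tps for threads, name, tps in rows if name == test}
    best = max(results.values())
    return min(th for th, tps in results.items() if tps >= best * (1 - tolerance))


rows = parse_bench(BENCH_OUTPUT)
print("generation (tg64):", pick_threads(rows, "tg64"), "threads")
print("prompt (pp512):   ", pick_threads(rows, "pp512"), "threads")
```

On this device token generation is practically flat between 4 and 8 threads, so the fewest threads win for tg64, while prompt processing still scales up to 8 threads.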

Limitation

When llama.cpp is run from the CLI in the Termux emulator on Android, on an edge device, or on an EC2 instance, GPU resources are not available to the C++ code,
so inference runs on the CPU only. On some bare-metal ARM edge devices, however, llama.cpp supports CUDA and can run inference on the
GPU, which should give a higher token rate.

I have tried running llama.cpp with a few models on an Android device: it runs smoothly, and the performance is higher than on ARM-based IoT devices (though this is due to the ARM chipset architecture).
However, on compute-intensive tasks you can observe that the CPU is fully utilized, so the screen can flicker during token generation.
Such side effects will go away once you offload the LLM to a GPU; check that your hardware is on the supported list. Another alternative is the LiteRT-LM framework, which is natively supported by Google and can run LLMs on Android devices on both CPU and GPU.

Redis connection monkey patching in Ruby Jungles

Roman Tsypuk — Fri, 10 Apr 2026 12:51:31 +0000

Some programming languages allow developers to “hack” or extend their internals by overriding existing methods in standard libraries, dynamically attaching new behavior to objects, or modifying classes at runtime.

One of the languages that strongly embraces this flexibility is Ruby.

This ability is often referred to as monkey patching, and while it should be used with caution, it can be extremely powerful in real-world scenarios—especially when dealing with legacy systems or unavailable source code.

Ruby and Runtime Flexibility

Ruby is a highly dynamic, object-oriented language where:

Classes can be reopened and modified at any time
Methods can be overridden or extended dynamically
Behavior can be injected into existing objects or modules
Even core classes (like String, Array, etc.) can be modified

This makes Ruby particularly well-suited for rapid prototyping, metaprogramming, runtime instrumentation, and patching legacy dependencies.

However, this flexibility comes with responsibility: poorly designed patches can introduce hard-to-debug issues.

Example

A simple example of extending a built-in class:

class String
  def patch
    "---" + self.upcase + "---"
  end
end

# irb
> "aaa".patch
=> "---AAA---"

> "test".patch
=> "---TEST---"

This demonstrates how easily Ruby allows you to modify even core classes like String.

Real-world Example: Patching Redis Connection Pool

I encountered a set of legacy Ruby applications that depended on outdated libraries. These dependencies were no longer available in Git repositories, although prebuilt gems were still stored in an internal artifact repository.

As part of a Redis migration, I needed to identify all polyglot services connecting to Redis instances. The goal was to introduce a CLIENT_NAME for every Redis client, regardless of the programming language used.
The majority of services followed a more or less similar project structure on a Go stack, but these legacy Ruby services did not fit that landscape.

Challenges

No access to source repositories of dependencies
No explicit Redis connection URLs
A proprietary “DIY Redis discovery” mechanism
Redis connections abstracted behind internal libraries

This made it difficult to instrument Redis clients in a standard way.

Solution: Monkey Patching

Fortunately, Ruby’s monkey patching capabilities provided a way forward.

Even without modifying third-party libraries, I was able to intercept Redis connection creation and inject metadata at runtime.

The idea was simple:

As soon as a Redis connection is established, annotate it with metadata such as service name, Ruby version, and Redis client version.

Original Connection Code (Simplified):

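module RedisConfig
  class Connection
    def self.create_instance!(r_name)
      redis = Redis.new(options)
      # The patch below relies on the new connection being yielded to an optional block.
      yield redis if block_given?
      redis
    end
  end
end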

Patched Implementation

I created a module that overrides the create_instance! method and augments it with additional instrumentation:

module ServicePatch
  module RedisMetadataPatch
    def create_instance!(r_name, &blk)
      super(r_name) do |redis|
        set_open_api_metadata!(redis, r_name)
        blk.call(redis) if blk
      end
    end

    private

    def set_open_api_metadata!(redis, r_name)
      safe_call(redis, [:client, :setname, 'SERVICE_NAME'], r_name)
      safe_call(redis, [:client, :setinfo, 'LIB-NAME', "ruby:#{RUBY_VERSION}"], r_name)
      safe_call(redis, [:client, :setinfo, 'LIB-VER', Redis::VERSION], r_name)
    end

    def safe_call(redis, command, r_name)
      redis.call(command)
    rescue Redis::BaseError, StandardError => e
      warn("[redis metadata] #{r_name} #{command.inspect} failed: #{e.class}: #{e.message}")
      nil
    end
  end
end

RedisConfig::Connection.singleton_class.prepend(ServicePatch::RedisMetadataPatch)

Using prepend ensures that:

The patched method runs before the original implementation
super correctly delegates to the original method
The patch is cleanly layered without modifying original code

Results

After deploying this patch, all Redis clients automatically started reporting metadata.
Here is the output of MONITOR on the Redis server side, showing that these Ruby services now set a connection name:

valkey.xxxx.xx.xxxx.xxx.cache.amazonaws.com:6379> monitor
OK
1774951026.839060 [0 xx.xx.95.236:48528] "hello" "3" "setname" "service-api1"
1774951026.839435 [0 xx.xx.95.236:48528] "client" "setname" "service-api1"
1774951026.840134 [0 xx.xx.95.236:48528] "client" "setinfo" "LIB-NAME" "ruby:4.0.1"
1774951026.840142 [0 xx.xx.95.236:48528] "client" "setinfo" "LIB-VER" "5.4.1"
1774951026.840614 [0 xx.xx.95.236:48528] "ping"
1774951031.463576 [0 xx.xx.70.215:58252] "hello" "3" "setname" "service-api2"
1774951031.464538 [0 xx.xx.70.215:58252] "client" "setname" "service-api1"
1774951031.468056 [0 xx.xx.70.215:58252] "client" "setinfo" "LIB-NAME" "ruby:4.0.1"
1774951031.468066 [0 xx.xx.70.215:58252] "client" "setinfo" "LIB-VER" "5.4.1"
1774951031.468728 [0 xx.xx.70.215:58252] "ping"

Observability Gains

Once the instrumentation was in place, I was able to use a custom Redis client scanner to analyze traffic to:

identify which services are connected to which Redis instances
track command usage patterns
detect idle or misbehaving clients
correlate activity across polyglot systems

Example output:

┌─────────────────────┬──────────────────────┬──────────────────────┬─────────┬───────┬───────┬────────┬────────┬────────┬────────┐
│ Client Addr         │ Name                 │ Lib                  │ Lib Ver │ Age   │ Idle  │    GET │   MGET │    SET │ ZRANGE │
├─────────────────────┼──────────────────────┼──────────────────────┼─────────┼───────┼───────┼────────┼────────┼────────┼────────┤
│ xx.xx.226.123:27613 │ service-api1         │ ruby:4.0.1           │ 5.4.1   │ 27740 │ 14    │      0 │      2 │     12 │      0 │
│ xx.xx.240.240:32031 │ service-api2         │ ruby:4.0.1           │ 5.4.1   │ 89306 │ 1838  │      0 │      8 │     48 │      0 │
│ xx.xx.240.240:41498 │ service-api3         │ ruby:4.0.1           │ 5.4.1   │ 89306 │ 189   │      0 │     13 │     87 │      0 │
│ xx.xx.254.221:58628 │ service-api4         │ ruby:4.0.1           │ 5.4.1   │ 10503 │ 64    │      0 │     11 │     72 │      0 │
│ xx.xx.254.221:9620  │ service-api5         │ ruby:4.0.1           │ 5.4.1   │ 10503 │ 1238  │      0 │      9 │     54 │      0 │
└─────────────────────┴──────────────────────┴──────────────────────┴─────────┴───────┴───────┴────────┴────────┴────────┴────────┘
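The "detect idle clients" check can be sketched in a few lines of Python. This is only an illustration and not the author's scanner: the rows are transcribed from the example output above, and the 600-second threshold is a hypothetical value to tune per service.

```python
# Rows transcribed from the example output above: (name, age_seconds, idle_seconds).
CLIENTS = [
    ("service-api1", 27740, 14),
    ("service-api2", 89306, 1838),
    ("service-api3", 89306, 189),
    ("service-api4", 10503, 64),
    ("service-api5", 10503, 1238),
]

# Hypothetical threshold: connections idle for longer than this are reported.
IDLE_THRESHOLD_SECONDS = 600


def idle_clients(clients, threshold=IDLE_THRESHOLD_SECONDS):
    """Return names of clients that have been idle longer than `threshold` seconds."""
    return [name for name, _age, idle in clients if idle > threshold]


print(idle_clients(CLIENTS))
```

With the sample rows this reports service-api2 and service-api5, the two connections that have been idle for more than ten minutes.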

Conclusion

This approach allowed me to instrument legacy Ruby applications without modifying their dependencies or internal logic. By leveraging Ruby’s dynamic capabilities, I was able to introduce observability into a previously opaque system.

In environments with legacy constraints, such techniques can turn blockers into manageable engineering problems.

Ruby is also a very straightforward language to write code in, and some of its ideas have migrated to Kotlin.

From Redis to Valkey: pre-migration Reconnaissance — detect all apps & connections in realtime

Roman Tsypuk — Sun, 05 Apr 2026 17:57:21 +0000

Abstract

Abraham Lincoln: "Give me six hours to chop down a tree and I will spend the first four sharpening the axe"

Valkey is getting more popular thanks to its performance improvements over the classic Redis version. I'm starting a series of posts
about migrating from one vendor's Redis implementation to AWS Valkey.

To choose the proper migration technique, the most important step is pre-migration reconnaissance. In this post I'll explain how native Redis features help identify all services that connect to Redis, which is really hard in a distributed, enterprise-level infrastructure built by multiple engineering generations with a periodically changing stack of languages and SDKs.

Valkey project brief history

The first version of Redis was released in 2009. Since then it has grown from a key-value cache into a platform with Pub/Sub, Streams, and database features, and it is used in a lot of projects.

Cloud providers such as AWS, GCP, Azure, and Oracle therefore started offering Redis as a managed service, allowing engineers to offload cluster management and all the heavy lifting to the provider.

But the Redis company runs its own cloud, Redis Cloud, with Redis managed services. To bring more clients to Redis Cloud and increase monetization, starting with Redis version 7.4 the license was changed so that other cloud providers have to pay for Redis if they offer it as a managed service.

At that moment a fork of Redis was created under the name Valkey (https://github.com/valkey-io/valkey). It is maintained by both the open-source community and cloud providers, since the fork keeps the original open-source license. Redis is written in C.

Today there are 2 different repositories and projects, each having its own release version and code name:

Redis (https://github.com/redis/redis)
Valkey (https://github.com/valkey-io/valkey)

At this moment the latest release versions are:

Redis v8.6.2
Valkey v9.0.3 (on top of Redis v7.2.4)

So both projects are implementing different features in parallel, like multithreading that was added recently.

Pre-migration Reconnaissance

I'm preparing a migration of Redis instances from Redis Cloud to AWS Valkey, and before the migration I need information about data access patterns and all Redis producers and consumers.

Identify clients that are reading/writing to Redis:

There are different techniques for identifying all writers and readers of Redis: tracing and monitoring tools
like Datadog or X-Ray, or analysis of the ENV variables of services that hold aliases of the Redis endpoint.

Things get more complex in real life. Do not be surprised to see in your enterprise-level stack:

distributed environment
AWS multi-account deployments, PrivateLink, established VPC peerings, etc.
tons of running services that are written in multiple languages (Go, Java, TypeScript, Ruby)
lack of documentation, stakeholders, etc.

Here I will show the technique that I found especially useful: it is natively supported by Redis out of the box and does
not require installing any third-party agents, monitoring stack, etc.

redis-cli

Redis has built-in functionality for gathering information about clients. You need to establish
a connection to your Redis server through the CLI and execute commands.

CLIENT LIST

The CLIENT LIST command returns information and statistics about the server's client connections in a mostly human
readable format.

You can use one of the optional subcommands to filter the list. The TYPE type subcommand filters the list by clients'
type, where type is one of normal, master, replica, and pubsub. Note that clients blocked by the MONITOR command
belong to the normal class.

The ID filter only returns entries for clients with IDs matching the client-id arguments.

redis-cxxxxx.us-east-1.ec2.cloud.xxxx.com:6379> client list
id=3066004040000 addr=xx.xx.xx.135:30746 laddr=xx.20.4.124:18585 fd=4789 name= age=18701 idle=827 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3081428040000 addr=xx.xx.xx.145:31848 laddr=xx.20.4.124:18585 fd=4959 name= age=538 idle=221 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name= lib-ver=
id=2956190040001 addr=xx.xx.xx.118:1604 laddr=xx.20.4.124:18585 fd=5117 name= age=148140 idle=5529 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=set user=xxx resp=2 lib-name= lib-ver=
id=3072343040001 addr=xx.xx.xx.126:46494 laddr=xx.20.4.124:18585 fd=4048 name= age=11493 idle=428 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3078769040001 addr=xx.xx.xx.173:9717 laddr=xx.20.4.124:18585 fd=4184 name= age=3714 idle=1 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3080824040001 addr=xx.xx.xx.158:35937 laddr=xx.20.4.124:18585 fd=5035 name= age=1267 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2
id=3081894040001 addr=xx.xx.xx.245:36487 laddr=xx.20.4.124:18585 fd=3686 name= age=26 idle=5 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 obl=0 events=r cmd=get user=xxx resp=2 lib-name= lib-ver=
...

The most interesting data:

id: a unique 64-bit client ID.
addr: address/port of the client.
laddr: address/port of local address client connected to (bind address).
name: the name set by the client with CLIENT SETNAME.
age: total duration of the connection in seconds.
idle: idle time of the connection in seconds.
db: current database ID.
cmd: last command played.
user: the authenticated username of the client.
lib-name: the name of the client library that is being used.
lib-ver: the version of the client library.

Having this information from a live Redis instance makes it possible to detect all clients, distinguish them by client ID, and track
additional information about each of them: library name, library version, and client name, plus other details about the session.
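Because every CLIENT LIST line is a sequence of space-separated key=value pairs, it is easy to parse. The sketch below shows one way to do it in Python; the sample line is copied from the output above, and the function names and the selection of fields in summarize are illustrative choices.

```python
# One line copied from the CLIENT LIST output above.
SAMPLE_LINE = (
    "id=3066004040000 addr=xx.xx.xx.135:30746 laddr=xx.20.4.124:18585 fd=4789 "
    "name= age=18701 idle=827 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 "
    "obl=0 events=r cmd=get user=xxx resp=2 lib-name=go-redis(,go1.24.13) lib-ver=9.17.2"
)


def parse_client_line(line):
    """Parse one CLIENT LIST line into a dict; fields without a value become ''."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def summarize(line):
    """Keep only the fields that matter for pre-migration analysis."""
    client = parse_client_line(line)
    return {
        "id": int(client["id"]),
        "ip": client["addr"].rsplit(":", 1)[0],
        "name": client.get("name", ""),
        "idle": int(client["idle"]),
        "cmd": client.get("cmd", ""),
        "lib": client.get("lib-name", ""),
        "lib_ver": client.get("lib-ver", ""),
    }


print(summarize(SAMPLE_LINE))
```

Splitting on whitespace is safe here because field values in this reply do not contain spaces.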

MONITOR

MONITOR is a debugging command that streams back every command processed by the Redis server. It can help in
understanding what is happening to the database. This command can be used both via redis-cli and via telnet.

Because MONITOR streams back all commands, its use comes at a cost: in the (totally unscientific) benchmark from the
Redis documentation, a single MONITOR client reduces throughput by more than 50%, so run it only for short periods.

redis-cxxxxx.us-east-1.ec2.cloud.xxxx.com:6379> monitor
OK
1774787120.638084 [0 xx.xx.xx.222:40277] "get" "prefixkey:namespacea:default_data"
1774787120.649084 [0 xx.xx.xx.243:58512] "get" "prefixkey:namespace2:default_data"
1774787120.652084 [0 xx.xx.xx.222:40277] "get" "prefixkey:namespace3:default_data"
1774787120.687083 [0 xx.xx.xx.50:64040] "zrange" "prefixkey:namespace4:allowed_data" "0" "-1"
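Each MONITOR line carries a timestamp, the database number, the client address, and the quoted command arguments, so it can be turned into a structured event. This is a minimal sketch with a sample line copied from the output above; the regular expression and the function name are assumptions, and arguments with unusual binary escapes may need a more careful parser than shlex.

```python
import re
import shlex

# timestamp [db client-address] "command" "arg" ...
MONITOR_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<addr>[^\]]+)\] (?P<args>.+)$"
)

SAMPLE = (
    '1774787120.687083 [0 xx.xx.xx.50:64040] '
    '"zrange" "prefixkey:namespace4:allowed_data" "0" "-1"'
)


def parse_monitor_line(line):
    """Return a dict describing one MONITOR line, or None for other output."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None  # e.g. the initial "OK" reply
    args = shlex.split(match["args"])  # handles the double-quoted arguments
    return {
        "timestamp": float(match["ts"]),
        "db": int(match["db"]),
        "client": match["addr"],
        "command": args[0].upper(),
        "args": args[1:],
    }


print(parse_monitor_line(SAMPLE))
```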

CLIENTNAME feature

The output of these commands is enough in 99% of cases. If you host your apps on EC2, EKS,
ECS, or Lambda functions, you can easily map IP addresses to dedicated pod or container IPs.

But I hit a corner case: the source Redis runs in RedisLabs, a cloud environment that provides
Redis as a service (under the hood it is deployed to cloud infrastructure, in this case AWS EC2 instances).

The consumer apps run in a Kubernetes cluster, and since RedisLabs is hosted outside of the AWS account, the network
traffic flows through the k8s cluster nodes.

Every Kubernetes node runs many workloads in separate pods, but when they connect to Redis all of them have
the same IP address: the IP address of the cluster node on which they are hosted.

So this information can limit the blast radius of services, but since node affinity is not in use and the landscape of
services is big, it is hard to identify Redis writers and readers: all pods share the same set of cluster node addresses.

This is where CLIENT SETNAME comes in. The command is available since Redis Open Source 2.6.9 and assigns a name to the
current connection.

The assigned name is displayed in the output of CLIENT LIST so that it is possible to identify the client that
performed a given connection.

However, it is not possible to use spaces in the connection name as this would violate the format of the CLIENT LIST
reply.

Every new connection starts without an assigned name.

Setting names on connections is a good way to debug connection leaks caused by bugs in the application using Redis.

Once clients are instrumented and deployed, using redis-cli we can check the data.

Writing tool to track clients and operations on Redis:

To present this information as a table, I have created a Python-based tool that talks to Redis over the RESP
protocol and renders the information:

┌────────────────┬─────────────────┬───────────────────────┬─────────┬─────────┬──────────────────────────────────┬─────┐
│ Client IP      │ Name            │ Lib                   │ Lib Ver │ User    │ Full Key                         │ GET │
├────────────────┼─────────────────┼───────────────────────┼─────────┼─────────┼──────────────────────────────────┼─────┤
│ 10.xx.xx.53    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.173   │                 │ go-redis(,go1.21.1)   │ 8.0.2   │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace2:default_data│   1 │
│ 10.100.238.99  │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.100.244.76  │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.100.75.149  │                 │ go-redis(,go1.24.13)  │ 9.17.2  │ default │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace1:default_data│   1 │
│ 10.104.127.44  │                 │                       │         │ default │                                  │   0 │
│ 10.104.136.207 │                 │ go-redis(,go1.24.13)  │ 9.17.2  │ default │                                  │   0 │
│ 10.104.94.31   │                 │                       │         │ default │                                  │   0 │
│ 10.170.3.253   │ monitor-v2-prod │ python-socket-monitor │ 1.0     │ default │                                  │   0 │
└────────────────┴─────────────────┴───────────────────────┴─────────┴─────────┴──────────────────────────────────┴─────┘

For some period, in real time, we can track which operations are performed on the Redis server and who the clients are.
Once all of them are identified and there are no unknown areas left, it's time to analyze the Redis instance and plan the migration.

Instrument client with clientName

Redis SDKs can be instrumented with a client name that identifies the client on every connection. Here is a Go code
snippet:

rdb := redis.NewClient(&redis.Options{
    Addr:       "REDIS_HOST:REDIS_PORT",
    Password:   "REDIS_PASSS",
    DB:         0,
    ClientName: "service1-writer",
})

The same can be done in the libraries of any programming language, or even at the low level of a socket speaking the Redis RESP
protocol:

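def send_command(sock, *args):
    """Send a Redis RESP protocol command."""
    parts = [f"*{len(args)}\r\n".encode("utf-8")]
    for arg in args:
        data = str(arg).encode("utf-8")
        # RESP bulk string lengths are counted in bytes, not characters.
        parts.append(f"${len(data)}\r\n".encode("utf-8") + data + b"\r\n")
    sock.sendall(b"".join(parts))


def read_line(sock):
    """Read one CRLF-terminated reply line (enough for +OK / -ERR replies).

    Helper not shown in the original snippet; this is a minimal version.
    """
    buf = bytearray()
    while not buf.endswith(b"\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("connection closed by Redis")
        buf += chunk
    return buf[:-2].decode("utf-8", errors="replace")


def set_client_name(sock, client_name):
    """Assign a client name to the current Redis connection."""
    if not client_name:
        return
    send_command(sock, "CLIENT", "SETNAME", client_name)
    response = read_line(sock)
    if not response.startswith("+OK"):
        raise ConnectionError(f"CLIENT SETNAME failed: {response}")


def set_client_info(sock, lib_name=None, lib_version=None):
    """Assign client library metadata to the current Redis connection."""
    for attribute, value in (("LIB-NAME", lib_name), ("LIB-VER", lib_version)):
        if not value:
            continue
        send_command(sock, "CLIENT", "SETINFO", attribute, value)
        # Always consume the reply. Redis < 7.2 may not support CLIENT SETINFO,
        # so an error reply ("-ERR ...") is deliberately ignored.
        read_line(sock)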
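To make the wire format concrete, the sketch below encodes a command the same way and prints the exact bytes that would go to the socket. The function name encode_command is illustrative; the client name is the one used in the Go snippet above.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        # Each bulk string is "$<length in bytes>\r\n<data>\r\n".
        out.append(b"$%d\r\n" % len(data) + data + b"\r\n")
    return b"".join(out)


wire = encode_command("CLIENT", "SETNAME", "service1-writer")
print(wire)
# b'*3\r\n$6\r\nCLIENT\r\n$7\r\nSETNAME\r\n$15\r\nservice1-writer\r\n'
```

The leading *3 announces an array of three elements, and each $N prefix gives the byte length of the bulk string that follows.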

Challenge 1: there are too many connections to this Redis DB

With our custom monitor tool we are now tracking all clients in detail, and suddenly we see that there are too many
clients and too many keys being accessed.
Since the Redis instance is used by multiple services and we are interested in extracting only keys with a specific prefix
pattern, we can modify the monitor tool to track access to those prefixes only.

Add a predicate for the key prefix. After that, you will see only the client names, IPs, and operations for the limited subset
of the keystore that is the subject of extraction.
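Such a prefix predicate can be sketched as follows. This is not the author's tool: the events are assumed to be already parsed from MONITOR output into (client address, command, key) tuples, the first three are taken from the MONITOR sample earlier in the post, and the last one is a made-up key from another application.

```python
from collections import Counter

# Events shaped like parsed MONITOR lines: (client address, command, key).
EVENTS = [
    ("xx.xx.xx.222:40277", "GET", "prefixkey:namespacea:default_data"),
    ("xx.xx.xx.243:58512", "GET", "prefixkey:namespace2:default_data"),
    ("xx.xx.xx.50:64040", "ZRANGE", "prefixkey:namespace4:allowed_data"),
    ("xx.xx.xx.77:51000", "SET", "otherapp:session:42"),  # hypothetical, out of scope
]


def count_prefix_access(events, prefix):
    """Count commands per (client, command) for keys that start with `prefix`."""
    counts = Counter()
    for client, command, key in events:
        if key.startswith(prefix):
            counts[(client, command)] += 1
    return counts


for (client, command), hits in count_prefix_access(EVENTS, "prefixkey:").items():
    print(client, command, hits)
```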

Challenge 2: client versions are too old and will not support Valkey

It's important to check SDK client versions. Do not be surprised to find that you are using an outdated Redis and outdated
libraries.
In this case, if you provision the latest AWS Valkey and just change the endpoint URL in the app config, it
will not work: your apps will get connection errors at the RESP protocol and command level.

Upgrade and align client versions to the latest supported Redis SDK.
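A version check over the collected client data can be sketched like this. The client records reuse values from the table above, while the minimum version "9.0.0" is a hypothetical threshold chosen only for the example; pick the real one from your SDK's compatibility notes. Clients that report no lib-ver at all are flagged too, since that usually means the library predates CLIENT SETINFO.

```python
def parse_version(text):
    """Turn '9.17.2' into (9, 17, 2); return None for an empty or odd version."""
    if not text:
        return None
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def needs_upgrade(clients, minimum):
    """Return clients whose library version is missing or below `minimum`."""
    floor = parse_version(minimum)
    outdated = []
    for client in clients:
        version = parse_version(client.get("lib-ver", ""))
        if version is None or version < floor:
            outdated.append(client)
    return outdated


CLIENTS = [
    {"ip": "10.xx.xx.173", "lib-name": "go-redis(,go1.21.1)", "lib-ver": "8.0.2"},
    {"ip": "10.100.75.149", "lib-name": "go-redis(,go1.24.13)", "lib-ver": "9.17.2"},
    {"ip": "10.xx.xx.53", "lib-name": "", "lib-ver": ""},
]

for client in needs_upgrade(CLIENTS, minimum="9.0.0"):  # hypothetical minimum
    print(client["ip"], client["lib-ver"] or "unknown")
```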

Completed redis client versions and clientnames alignment

After upgrading all client code to the latest Go Redis library version (9.18.0 as of today) and adding client
names to the Redis sessions, we can review the active connections, see which keys are being accessed, and start
planning the actual migration:

┌────────────────┬─────────────────┬───────────────────────┬─────────┬─────────┬──────────────────────────────────┬─────┐
│ Client IP      │ Name            │ Lib                   │ Lib Ver │ User    │ Full Key                         │ GET │
├────────────────┼─────────────────┼───────────────────────┼─────────┼─────────┼──────────────────────────────────┼─────┤
│ 10.xx.xx.53    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.173   │ service1-writer │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│                │                 │                       │         │         │ prefixkey:namespace2:default_data│   1 │
│ 10.xx.xx.149   │ service2-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.142   │ service3-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.141   │ service4-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.129   │ service5-writer │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.159   │ service3-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │ prefixkey:namespace1:default_data│   1 │
│ 10.xx.xx.44    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.207   │ service2-reader │ go-redis-(.go1.24.13) │ 9.18.0  │ xxxx    │                                  │   0 │
│ 10.xx.xx.31    │                 │                       │         │ xxxx    │                                  │   0 │
│ 10.xx.xx.253   │ monitor-v2-prod │ python-socket-monitor │ 1.0     │ xxxx    │                                  │   0 │
└────────────────┴─────────────────┴───────────────────────┴─────────┴─────────┴──────────────────────────────────┴─────┘

Per-client read/write I/O measurement, key access detection, or any other analytics for troubleshooting and refactoring
can be added on top of this tool. Its use is not limited to pre-migration analysis.

Reconnaissance step output summary

updated and aligned library versions of all Redis SDKs
properly trackable client names and all connections
fully identified all writers/readers that are planned to be migrated to AWS Valkey
clear understanding of data access patterns (read/write to keystore)

In the next post, I’ll dive into Redis topologies and the critical components to consider when preparing and running a migration, as they depend on the chosen topology.

How switching to SQS Batch operations improves Performance and Billing

Roman Tsypuk — Tue, 23 Dec 2025 19:36:59 +0000

Abstract

In this post, we explore how refactoring SQS message processing from individual SendMessage calls to Batch SendMessage operations can
significantly improve application performance and reduce SQS billing costs by lowering the number of API requests.

The idea

When monitoring a Golang application with DataDog, we can measure SQS message sending in detail. By comparing a traditional loop-based send approach versus batch sending, we can see clear differences in timing, network calls, and resource usage.

Full Datadog tracing of SQS is not supported for all languages:

Set DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING=all and DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING=all on this service to enable complete payload tagging.

https://docs.datadoghq.com/tracing/guide/aws_payload_tagging/?tab=nodejs

DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING=all
DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING=all

For Golang, you can leverage Datadog attribute tags to inspect payload metadata.

Regular SQS message send operations

Sending messages one by one involves multiple network calls and extra overhead.

The following trace shows what the timing looks like when messages are sent in a loop.

For example, sending 7 messages individually took 175ms, with 7 separate HTTP requests. The first call typically dominates the timing due to DNS lookup and connection setup.

Since the service is running in the same K8s cluster, we can assume that the experiment is clean and no additional overhead is present.

Sending messages in a Batch

AWS SQS allows sending up to 10 messages per batch. Sending 20 messages in 2 batches demonstrates significant efficiency gains:

Sent 3x more messages.
Made 10x fewer HTTP requests (2 instead of 20).
Total processing time reduced by ~3x.
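The batching arithmetic can be sketched in Python. This is an illustration only, not the Go service from this post: it splits message bodies into groups of at most 10 and builds entries with the Id and MessageBody fields that SendMessageBatch expects, without calling AWS. The function name and the sample payloads are assumptions.

```python
import json

MAX_BATCH_SIZE = 10  # SQS limit on entries per SendMessageBatch call


def build_batches(bodies, batch_size=MAX_BATCH_SIZE):
    """Split message bodies into SendMessageBatch-style entry lists."""
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        batches.append([
            # Id only has to be unique within one batch; a global index also works.
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ])
    return batches


bodies = [json.dumps({"task": n}) for n in range(20)]  # hypothetical payloads
batches = build_batches(bodies)
print(f"{len(bodies)} messages -> {len(batches)} requests instead of {len(bodies)}")
```

Each inner list would be passed as the Entries parameter of one SendMessageBatch call.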

Response examples:

When a batch send is performed, the batch response contains a status for each message, including any error.
So the batch call can complete while some messages in it fail. Parsing this response status makes it possible to efficiently
retry such messages or handle them with fallback business logic.



{
  "Successful": [
    {
      "ID": "0",
      "MessageID": "655f3404-fbe4-4c51-8868-b5c604bd5f6d",
      "Error": null
    },
    {
      "ID": "1",
      "MessageID": "daf36653-9abb-490b-b620-608efa24a219",
      "Error": null
    },
    {
      "ID": "2",
      "MessageID": "93f4dcfd-0500-4076-90f2-3b880b32c943",
      "Error": null
    },
    {
      "ID": "3",
      "MessageID": "f6c7b079-98f5-4290-b293-2ac6e43ed6f2",
      "Error": null
    },
    {
      "ID": "4",
      "MessageID": "2b4a96bc-b4ec-4711-9473-d887dd3213f7",
      "Error": null
    },
    {
      "ID": "5",
      "MessageID": "1bd30cd9-f9c1-4b47-8d6d-2e23ce771841",
      "Error": null
    },
    {
      "ID": "6",
      "MessageID": "8eed75ef-2563-442e-a191-6b3dff29d635",
      "Error": null
    },
    {
      "ID": "7",
      "MessageID": "c65a36ce-7ce0-444c-9974-96648dcae0ea",
      "Error": null
    },
    {
      "ID": "8",
      "MessageID": "75379265-52f9-4a60-8c3a-0537cffdaa80",
      "Error": null
    },
    {
      "ID": "9",
      "MessageID": "59239903-d4d9-498f-9a08-6d7d7ae8beba",
      "Error": null
    },
    {
      "ID": "10",
      "MessageID": "9a614c58-113b-487d-a8f1-7509f93b42f9",
      "Error": null
    },
    {
      "ID": "11",
      "MessageID": "1077de5c-8f0f-4d5b-a0fe-dca45712bfdf",
      "Error": null
    },
    {
      "ID": "12",
      "MessageID": "8b0f5836-0e01-4a88-9793-4bac2a6d879a",
      "Error": null
    }
  ],
  "Failed": []
}
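Handling a partially failed batch can be sketched as below. Note the assumption: this works on the raw SendMessageBatch result shape (Successful and Failed lists keyed by Id, with a SenderFault flag on failures), not on the aggregated structure printed above, and the sample response with its error codes and messages is made up for the example.

```python
def split_batch_result(entries, response):
    """Return (sent_ids, retryable_entries, rejected_entries) for one batch."""
    by_id = {entry["Id"]: entry for entry in entries}
    sent = [item["Id"] for item in response.get("Successful", [])]
    retryable, rejected = [], []
    for failure in response.get("Failed", []):
        entry = by_id[failure["Id"]]
        # SenderFault=True means the entry itself was invalid, so a retry will not help.
        (rejected if failure.get("SenderFault") else retryable).append(entry)
    return sent, retryable, rejected


entries = [
    {"Id": "0", "MessageBody": "a"},
    {"Id": "1", "MessageBody": "b"},
    {"Id": "2", "MessageBody": "c"},
]
# Hypothetical response with one success, one transient and one permanent failure.
response = {
    "Successful": [{"Id": "0", "MessageId": "655f3404-fbe4-4c51-8868-b5c604bd5f6d"}],
    "Failed": [
        {"Id": "1", "SenderFault": False, "Code": "InternalError", "Message": "try again"},
        {"Id": "2", "SenderFault": True, "Code": "InvalidMessageContents", "Message": "bad body"},
    ],
}

sent, retryable, rejected = split_batch_result(entries, response)
print("sent:", sent)
print("retry:", [e["Id"] for e in retryable])
print("fallback:", [e["Id"] for e in rejected])
```

Only the retryable entries need to be sent again; rejected ones go to the fallback path.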

AWS Console Behavior

Batch sending does not change how messages appear in SQS. Each message is stored individually, so consumers don’t need any changes to handle batches.

Same messages, same structures are posted and present in SQS.

However, there are other optimization techniques that tune the consumer batch size when polling messages from SQS.

Golang Implementation Example

// BatchResult aggregates the per-message outcome of all batches
type BatchResult struct {
    Successful []BatchResultEntry
    Failed     []BatchResultEntry
}

// BatchResultEntry represents a single entry in a batch result
type BatchResultEntry struct {
    ID        string
    MessageID string
    Error     error
}

// Build one request entry per message of the current batch.
// Assumption: the loop header is not part of the original snippet; here batch
// holds the serialized message bodies and i is the offset of the current batch.
for idx, b := range batch {
  entry := &sqs.SendMessageBatchRequestEntry{
    Id:          aws.String(fmt.Sprintf("%d", i+idx)), // Unique ID within batch
    MessageBody: aws.String(string(b)),
  }

  if taskConfig.MessageGroupId != "" {
    entry.SetMessageGroupId(taskConfig.MessageGroupId)
  }
  if taskConfig.MessageDeduplicationId != "" {
    entry.SetMessageDeduplicationId(taskConfig.MessageDeduplicationId)
  }
  if taskConfig.DelaySeconds > 0 {
    entry.SetDelaySeconds(taskConfig.DelaySeconds)
  }
  entries = append(entries, entry)
}

// Send batch
input := &sqs.SendMessageBatchInput{
  QueueUrl: stp.url,
  Entries:  entries,
}

output, err := stp.c.SendMessageBatchWithContext(ctx, input)
if err != nil {
  err = handleSqsErrors(err)
  // Mark all entries in this batch as failed
  for idx := range batch {
    result.Failed = append(result.Failed, BatchResultEntry{
      ID:    fmt.Sprintf("%d", i+idx),
      Error: err,
    })
  }
  continue // move on to the next batch (this code runs inside a per-batch loop)
}
// On success, output.Successful and output.Failed hold the per-message status.

Dedicated message details

Exactly this MessageId was returned in the Successful section of the batch response.

Additional things to check and optimize

Deduplication technique

Before sending the messages, deduplicate them. This reduces the number of SQS requests, decreases processing latency, and reduces the load on the consumer side by avoiding unneeded storage reads, rewrites, etc.
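A simple form of producer-side deduplication is to drop repeated bodies before building the batch. The sketch below assumes that two messages are duplicates when their bodies are byte-for-byte identical; real systems often deduplicate on a business key instead, so treat the criterion as an assumption.

```python
import hashlib


def deduplicate(bodies):
    """Drop repeated message bodies, keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for body in bodies:
        # A digest keeps the "seen" set small even for large message bodies.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(body)
    return unique


print(deduplicate(["a", "b", "a", "c", "b"]))
```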

Distributed tracing frameworks can consume an SQS batch slot for metadata

Some distributed tracing frameworks propagate metadata through async transports like SQS.
If you are using one, check its integration, because it can affect the maximum batch size. For example, Datadog uses 1 batch element to propagate tracing metadata,
which is consumed on the other side and attached as a span to the same trace.

X-Ray, being a proprietary AWS technology, does not use any slots in an SQS batch and submits span/trace information to a UDP server.

Limitations:

message payload size (1 MiB)
batch size (10 messages)
payload serialization (JSON)

Conclusions:

Switching the implementation from a loop of Send calls to Batch Send significantly decreased the overall time and the number of network round trips and, as a bonus, decreased the SQS bill (10x fewer API calls).

Orchestrating AI multi-agent infrastructure with AWS Bedrock, OpenAI and n8n

Roman Tsypuk — Fri, 26 Sep 2025 17:39:06 +0000

Abstract

This article explores how to build a multi-agent AI ecosystem using n8n, AWS Bedrock, OpenAI, and MCP servers—all with a no-code approach.

The idea

Each AI agent is designed with its own dedicated model (optimized for its role) and separate memory storage (ensuring context persistence and isolation). By connecting agents to AWS documentation via MCP, custom AWS news feeds via JSON, and enabling agent-to-agent communication, we demonstrate how to create a flexible system that interacts directly based on user prompts.

What is an AI agent and its parts

An AI agent is not just a single model—it’s a structured system made up of several interconnected components. Think of
it as a worker in a digital team, equipped with a brain, memory, and tools.

Core Components of an AI Agent

LLM Model: The reasoning engine. Can be OpenAI GPT, Anthropic Claude (via AWS Bedrock), Amazon Titan, or any other large language model. Responsible for interpreting prompts, generating responses, and orchestrating tool usage.
Memory: A storage layer where past interactions and context are recorded. Often implemented as a database table or key/value storage. Ensures continuity, so the agent doesn’t “forget” what was asked earlier.
Tools: Interfaces that extend the agent’s capabilities. Examples: HTTP endpoints, databases, MCP services, or custom APIs. Tools allow the agent to access real-time or domain-specific data beyond the model’s training cutoff.
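Although the setup in this article is no-code, the three components can be pictured in code. The following is only a conceptual sketch: the `Model`, `Memory`, `Tool` and `Agent` names are hypothetical and are not n8n types. As a simplification, the agent here calls every tool on each question, whereas a real agent lets the model decide which tools to invoke.

```go
package main

import (
	"fmt"
	"strings"
)

// Model, Memory and Tool are hypothetical interfaces mirroring the
// three components described above.
type Model interface {
	Complete(prompt string) string
}

type Memory interface {
	Load(sessionID string) []string
	Append(sessionID, message string)
}

type Tool interface {
	Name() string
	Call(input string) string
}

type Agent struct {
	Model  Model
	Memory Memory
	Tools  []Tool
}

// Ask loads the session history, collects tool output, asks the model,
// and persists both sides of the exchange.
func (a *Agent) Ask(sessionID, question string) string {
	history := a.Memory.Load(sessionID)
	var toolData []string
	for _, t := range a.Tools {
		toolData = append(toolData, t.Name()+": "+t.Call(question))
	}
	prompt := strings.Join(history, "\n") + "\n" + strings.Join(toolData, "\n") + "\n" + question
	answer := a.Model.Complete(prompt)
	a.Memory.Append(sessionID, "human: "+question)
	a.Memory.Append(sessionID, "ai: "+answer)
	return answer
}

// Stub implementations used only to make the sketch runnable.
type echoModel struct{}

func (echoModel) Complete(prompt string) string {
	return fmt.Sprintf("answer based on %d bytes of context", len(prompt))
}

type mapMemory map[string][]string

func (m mapMemory) Load(id string) []string { return m[id] }
func (m mapMemory) Append(id, msg string)   { m[id] = append(m[id], msg) }

type staticTool struct{ name, out string }

func (t staticTool) Name() string       { return t.name }
func (t staticTool) Call(string) string { return t.out }

func main() {
	mem := mapMemory{}
	agent := &Agent{Model: echoModel{}, Memory: mem, Tools: []Tool{staticTool{name: "docs", out: "ok"}}}
	fmt.Println(agent.Ask("session-1", "How do I configure DynamoDB streams?"))
	fmt.Println(len(mem["session-1"])) // prints 2
}
```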

Connecting to AWS documentation server as MCP

AWS offers a free streaming service for official documentation, and we can integrate it into our agent through the Model
Context Protocol (MCP).

By registering MCP client tools, our AI agent gains the ability to search, read, and recommend content directly from AWS
docs.

AWS MCP configuration settings:

Parameter	Value
Endpoint	https://knowledge-mcp.global.api.aws
Server Transport	HTTP streamable
Authentication	none
Tools	read, recommend, search

With this setup, whenever a user asks something like “How do I configure DynamoDB streams?”, the agent can fetch the
latest instructions directly from the AWS documentation server.
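For readers curious what the MCP client node sends over the wire, here is a minimal sketch that builds (but does not send) a JSON-RPC 2.0 `tools/list` request for the endpoint from the table. It is a sketch under assumptions: the helper name `newListToolsRequest` is hypothetical, and according to the MCP specification a session normally begins with an `initialize` exchange, which is omitted here for brevity.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// endpoint is taken from the configuration table above.
const endpoint = "https://knowledge-mcp.global.api.aws"

// rpcRequest is a JSON-RPC 2.0 request envelope, the message format MCP uses.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// newListToolsRequest prepares an HTTP POST asking the server which tools
// it exposes. It returns the request together with the encoded body.
func newListToolsRequest(url string, id int) (*http.Request, []byte, error) {
	body, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: "tools/list"})
	if err != nil {
		return nil, nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// The streamable HTTP transport may answer with plain JSON or an SSE stream.
	req.Header.Set("Accept", "application/json, text/event-stream")
	return req, body, nil
}

func main() {
	req, body, err := newListToolsRequest(endpoint, 1)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(string(body))
}
```

In n8n none of this has to be written by hand; the MCP client tool handles the handshake and tool discovery once the endpoint is configured.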

Adding AI agent tools for AWS news

Besides documentation, agents can also consume custom news feeds. I maintain a curated set of AWS news in JSON format,
hosted on GitHub Pages. These feeds cover categories such as architecture, big data, and machine learning.

By connecting the agent’s HTTP tool to these feeds, we can provide real-time AWS announcements and updates inside the
agent’s workflow.

REST HTTP tools setup:

Tool	details
Tool info	Makes an HTTP request and returns the latest aws news Training Certification
Tool endpoint	https://tsypuk.github.io/aws-news/news/training_certification.json

Tool	details
Tool info	Makes an HTTP request and returns the latest aws news Architecture
Tool endpoint	https://tsypuk.github.io/aws-news/news/architecture.json

The full list of feeds is available in my repo: tsypuk/aws-news

With this integration, a user can ask: “What are the latest updates in AWS architecture?”, and the agent will pull fresh
content directly from the JSON feed.
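Outside of n8n, the same HTTP tool boils down to a GET request followed by JSON decoding. The sketch below fetches the architecture feed listed above. Since the feed schema is not described in this article, the document is decoded into a generic value instead of a typed struct; the function names `fetchFeed` and `decodeFeed` are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// feedURL is one of the tool endpoints from the tables above.
const feedURL = "https://tsypuk.github.io/aws-news/news/architecture.json"

// decodeFeed parses a feed document without assuming its schema.
func decodeFeed(data []byte) (any, error) {
	var v any
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, fmt.Errorf("decode feed: %w", err)
	}
	return v, nil
}

// fetchFeed downloads the feed at url and decodes it.
func fetchFeed(ctx context.Context, client *http.Client, url string) (any, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeFeed(data)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	feed, err := fetchFeed(ctx, http.DefaultClient, feedURL)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("decoded feed of type %T\n", feed)
}
```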

You can get and use the full list of news at my aws news repo

Full Agent2Agent connection and communication

Once we have an ecosystem of multiple agents (Documentation Agent + News Agent), we can introduce a third agent as an
orchestrator.

The Orchestrator Agent decides which agent to query.
The Documentation Agent connects via MCP to AWS docs.
The News Agent pulls JSON feeds with announcements.

Example:

Prompt: “Explain the latest DynamoDB updates and show me how to configure them.”
Orchestrator → News Agent: get latest DynamoDB announcements.
Orchestrator → Documentation Agent: fetch relevant setup docs.
Orchestrator synthesizes the results into a single, user-friendly answer.
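The flow above can be sketched in a few lines. This is a deliberately simplified model: the names `SubAgent`, `AgentFunc` and `Orchestrator` are hypothetical, both sub-agents are always queried in a fixed order, and the results are concatenated. In the n8n setup the orchestrator's own LLM decides which agent to call and writes the final synthesis.

```go
package main

import (
	"fmt"
	"strings"
)

// SubAgent is a hypothetical interface for anything the orchestrator can query.
type SubAgent interface {
	Ask(prompt string) (string, error)
}

// AgentFunc adapts a plain function to the SubAgent interface.
type AgentFunc func(prompt string) (string, error)

func (f AgentFunc) Ask(prompt string) (string, error) { return f(prompt) }

type Orchestrator struct {
	News SubAgent
	Docs SubAgent
}

// Answer queries the news agent, then the documentation agent,
// and merges both replies into a single response.
func (o Orchestrator) Answer(prompt string) (string, error) {
	news, err := o.News.Ask("Latest announcements for: " + prompt)
	if err != nil {
		return "", fmt.Errorf("news agent: %w", err)
	}
	docs, err := o.Docs.Ask("Setup documentation for: " + prompt)
	if err != nil {
		return "", fmt.Errorf("documentation agent: %w", err)
	}
	return strings.Join([]string{"News:", news, "", "Documentation:", docs}, "\n"), nil
}

func main() {
	o := Orchestrator{
		News: AgentFunc(func(p string) (string, error) { return "stub news reply", nil }),
		Docs: AgentFunc(func(p string) (string, error) { return "stub docs reply", nil }),
	}
	out, err := o.Answer("DynamoDB")
	if err != nil {
		fmt.Println("orchestration failed:", err)
		return
	}
	fmt.Println(out)
}
```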

The beauty of this configuration is that each individual agent can be tuned and configured separately. For example, we can set
a different model for each agent: for some agents a small model is enough, so there is no need
to use a more expensive one, while for others we can pick a specific model that performs better in a particular domain or was
trained on a dedicated dataset.

This separation lowers costs while improving domain accuracy.

Checking the Memory storage

AI agents persist interaction history in Memory, which is critical for:

Maintaining context across multi-turn conversations.
Remembering what data was fetched previously.
Enabling personalization (e.g., user preferences).

Memory storage is organized as a simple table:

create table n8n_chat_reseach_histories
(
    id         serial
        primary key,
    session_id varchar(255) not null,
    message    jsonb        not null
);

Let's check how it is populated with data as requests flow through the different agents.

Request flow through the AWS news agent:

what are the latest news in s3 storage?

Here are the latest AWS blog items related to Amazon S3 storage, with source links:
Amazon SageMaker introduces Amazon S3-based shared storage for enhanced project collaboration. Link: https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/
Optimize Amazon EMR runtime for Apache Spark with EMR S3A. Link: https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/
A scalable, elastic database and search solution for 1B+ vectors built on LanceDB and Amazon S3. Link: https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/

Sources:

https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/
https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/
https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/

Param	Value
sessionid	15925d170d1e4ce3956667dbb9b172ba
action	sendMessage
chatInput	what are the latest news in s3 storage?

Next, we see that the agent called Postgres to load Memory, but since this is the first interaction, the chat history is empty.
We can also trace the agent-to-agent communication.

The agent calls the tools for big data and for architecture to get the latest info about S3.
Just before the response, the results are persisted in the memory storage:

Data from main AI-agent memory

id	session_id	message
3	15925d170d1e4ce3956667dbb9b172ba	{"type": "human", "content": "what are the latest news in s3 storage?. Include links to all used sources.", "additional_kwargs": {}, "response_metadata": {}}
4	15925d170d1e4ce3956667dbb9b172ba	{"type": "ai", "content": "Here are the latest AWS blog items related to Amazon S3 storage, with source links:\n\n- Amazon SageMaker introduces Amazon S3-based shared storage for enhanced project collaboration. Link: https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/\\n\\n- Optimize Amazon EMR runtime for Apache Spark with EMR S3A. Link: https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/\\n\\n- A scalable, elastic database and search solution for 1B+ vectors built on LanceDB and Amazon S3. Link: https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/\\n\\nSources:\\n- https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/\\n- https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/\\n- https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/", "tool_calls": [], "additional_kwargs": {}, "response_metadata": {}, "invalid_tool_calls": []}
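Each `message` value is a plain JSON document, so it is easy to process outside of n8n as well. Below is a minimal sketch that decodes one row; the `ChatMessage` struct is a hypothetical type that covers only the `type`, `content` and `tool_calls` keys visible in the rows above and ignores the rest.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatMessage mirrors a subset of the keys seen in the message column.
type ChatMessage struct {
	Type      string            `json:"type"` // "human" or "ai" in the rows above
	Content   string            `json:"content"`
	ToolCalls []json.RawMessage `json:"tool_calls"`
}

// parseMessage decodes the JSON stored in the message column.
func parseMessage(raw []byte) (ChatMessage, error) {
	var m ChatMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return ChatMessage{}, fmt.Errorf("parse message: %w", err)
	}
	return m, nil
}

func main() {
	row := []byte(`{"type": "human", "content": "what are the latest news in s3 storage?. Include links to all used sources.", "additional_kwargs": {}, "response_metadata": {}}`)
	m, err := parseMessage(row)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s: %s\n", m.Type, m.Content)
}
```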

Data from latest-news AI-agent memory

id	session_id	message
73	15925d170d1e4ce3956667dbb9b172ba	{"type": "human", "content": "You are news facts agent. Should provide accurate response based on latest news, in the response add section with links to all sources that you have used.\nPlease provide the latest news related to Amazon S3 storage, with links to all sources.", "additional_kwargs": {}, "response_metadata": {}}
74	15925d170d1e4ce3956667dbb9b172ba	{"type": "ai", "content": "Here are the latest AWS blog items related to Amazon S3 storage:\n\n- Amazon SageMaker introduces Amazon S3 based shared storage for enhanced project collaboration. Link: https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/\\n\\n- Optimize Amazon EMR runtime for Apache Spark with EMR S3A. Link: https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/\\n\\n- A scalable, elastic database and search solution for 1B+ vectors built on LanceDB and Amazon S3. Link: https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/\\n\\nSources\\n- https://aws.amazon.com/blogs/big-data/amazon-sagemaker-introduces-amazon-s3-based-shared-storage-for-enhanced-project-collaboration/\\n- https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/\\n- https://aws.amazon.com/blogs/architecture/a-scalable-elastic-database-and-search-solution-for-1b-vectors-built-on-lancedb-and-amazon-s3/", "tool_calls": [], "additional_kwargs": {}, "response_metadata": {}, "invalid_tool_calls": []}

It is great that n8n stores the history of executions and can visualize each flow execution: which path the AI agent chose, which tools were invoked, and so on.

Request flow through the documentation agent

what is the supported knowledge base storages for aws bedrock?

Here are the storage options AWS Bedrock documents list as supported for Bedrock Knowledge Bases (for Retrieval Augmented Generation), with the official source links. Where possible, I note the exact integration concept and point you to the corresponding AWS docs for details on configuration and fields.

Amazon OpenSearch Serverless (vector store)
What it is: A vector search collection in OpenSearch Serverless used as Bedrock’s vector store for embeddings.
How it integrates with Bedrock Knowledge Bases: Bedrock connects to an OpenSearch Serverless vector collection and stores embeddings plus text chunks and Bedrock metadata; mappings/fields are defined in the knowledge-base setup flow.
Key docs/fields: Bedrock knowledge-base-setup guidance covers how to configure the OpenSearch Serverless integration.
Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html
Amazon OpenSearch Service (Managed Clusters) as vector store
...

Data from main AI-agent memory

id	session_id	message
7	15925d170d1e4ce3956667dbb9b172ba	{"type": "human", "content": "what is the supported knowledge base storages for aws bedrock. Include links to all used sources.", "additional_kwargs": {}, "response_metadata": {}}
8	15925d170d1e4ce3956667dbb9b172ba	{"type": "ai", "content": "Here are the storage options AWS Bedrock documents list as supported for Bedrock Knowledge Bases (for Retrieval Augmented Generation), with the official source links. Where possible, I note the exact integration concept and point you to the corresponding AWS docs for details on configuration and fields.\n\n1) Amazon OpenSearch Serverless (vector store)\n- What it is: A vector search collection in OpenSearch Serverless used as Bedrock’s vector store for embeddings.\n- How it integrates with Bedrock Knowledge Bases: Bedrock connects to an OpenSearch Serverless vector collection and stores embeddings plus text chunks and Bedrock metadata; mappings/fields are defined in the knowledge-base setup flow.\n- Key docs/fields: Bedrock knowledge-base-setup guidance covers how to configure the OpenSearch Serverless integration.\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n2\) Amazon OpenSearch Service (Managed Clusters) as vector store\n- What it is: A managed OpenSearch Service domain used as a vector store for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases: Create a vector index on the OpenSearch domain and connect it to the Bedrock KB; mappings include vector index name, vector field, and metadata field, plus Bedrock’s text chunk fields.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (OpenSearch Clusters section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n3\) Amazon S3 Vectors (vector store in S3)\n- What it is: Bedrock-supported vector storage that stores embeddings in S3-based vector structures.\n- How it integrates with Bedrock Knowledge Bases: Configure an S3 vector bucket/index; Bedrock stores embeddings and metadata and uses defined mappings for fields.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (S3 Vectors section).\n- Source: 
https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n4\) Amazon Aurora (RDS) as vector store\n- What it is: Aurora database (PostgreSQL-compatible) used as the relational vector store for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases: Bedrock expects a relational table with embedding/vector, text chunks, and metadata columns; Bedrock maps UI/API fields to the table schema.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (Aurora section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n5\) Neptune Analytics graphs (GraphRAG)\n- What it is: Neptune Analytics graphs used with GraphRAG as a Bedrock knowledge base storage option.\n- How it integrates with Bedrock Knowledge Bases: Create a Neptune Analytics graph and connect it to Bedrock KB; configure a vector index on the graph and map Bedrock embeddings/text/metadata to graph fields.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (Neptune Analytics/GraphRAG section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n6\) Pinecone (vector store)\n- What it is: Pinecone vector index service used as a Bedrock Knowledge Bases vector store.\n- How it integrates with Bedrock Knowledge Bases: Provide Pinecone index name/endpoint and credentials (via Secrets Manager); Bedrock stores embeddings and metadata and maps fields accordingly.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (Pinecone section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n7\) Redis Enterprise Cloud (vector store)\n- What it is: Redis Enterprise Cloud used as a Bedrock Knowledge Bases vector store.\n- How it integrates with Bedrock Knowledge Bases: Configure endpoint, vector index, vector field, metadata field; credentials via Secrets Manager and TLS settings as needed.\n- Key docs/fields: Bedrock knowledge-base-setup 
guidance (Redis Enterprise Cloud section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n8\) MongoDB Atlas (vector store)\n- What it is: MongoDB Atlas used as a Bedrock Knowledge Bases vector store.\n- How it integrates with Bedrock Knowledge Bases: Connect to Atlas, specify database/collection and vector index; Bedrock stores embeddings, text chunks, and metadata in the Atlas collection.\n- Key docs/fields: Bedrock knowledge-base-setup guidance (MongoDB Atlas section).\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n9\) Kendra GenAI index (special integration path)\n- What it is: Build a Bedrock knowledge base from an Amazon Kendra GenAI index to combine Kendra’s retrieval with Bedrock’s generation.\n- How it integrates with Bedrock Knowledge Bases: Create a KB from a Kendra GenAI index to reuse indexed content with Bedrock GenAI.\n- Key docs/fields: Bedrock knowledge-base-build-kendra-genai-index guidance.\n- Source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n\\nAdditional context and related docs\n- Overview and retrieval concepts for Bedrock Knowledge Bases: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html and https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html\\n- Prerequisites and setup for vector-store usage: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq.html\\n- Build or connect a knowledge base (general flow): https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build.html\\n- Kendra GenAI integration path (detailed steps): https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n\\nWhat’s New / related announcements (contextual confirmations)\n- OpenSearch-based vector storage support for Bedrock Knowledge Bases (GA/availability note): \n 
https://aws.amazon.com/about-aws/whats-new/2025/03/amazon-bedrock-knowledge-bases-opensearch-cluster-vector-storage/\\n- MongoDB Atlas vector storage support for Bedrock Knowledge Bases:\n https://aws.amazon.com/about-aws/whats-new/2024/05/knowledge-bases-amazon-bedrock-mongodb-atlas-vector-storage/\\n\\nIf you’d like, I can distill this into a compact quick-reference table (one line per option with integration point and official doc URL), or provide a per-option summary with exact field names as shown in the Bedrock setup docs. All information above is drawn from official AWS Bedrock documentation and related AWS What’s New posts linked here.", "tool_calls": [], "additional_kwargs": {}, "response_metadata": {}, "invalid_tool_calls": []}

Data from aws documentation AI-agent memory

id	session_id	message
1	15925d170d1e4ce3956667dbb9b172ba	{"type": "human", "content": "You are research agent. Should provide accurate response based on documentation and statements, in the response add section with links to all sources that you have used.\nPlease provide the official AWS Bedrock documentation on supported knowledge base storages. Specifically, list the storage options (e.g., vector stores or document stores) that Bedrock can use for retrieval-augmented generation, or knowledge bases, including any required formats, integrations (such as S3, Kendra, etc.), and limitations. Include links to all used sources with exact URLs. If multiple AWS services are involved (Kendra, S3, etc.), include their relevant Bedrock integration docs. Provide only verifiable information from AWS docs.", "additional_kwargs": {}, "response_metadata": {}}
2	15925d170d1e4ce3956667dbb9b172ba	{"type": "ai", "content": "Below is a consolidated, source-backed view of the official AWS Bedrock documentation on supported knowledge base storages (the vector stores and document stores Bedrock Knowledge Bases can use for Retrieval Augmented Generation, or knowledge bases). For each option, I note what it is, how it integrates with Bedrock Knowledge Bases, any required formats or fields, and any limitations or special notes found in AWS docs. Links to the exact AWS sources are provided at the end.\n\nWhat Bedrock Knowledge Bases can use for storage (summary)\n- Bedrock Knowledge Bases support storing and indexing data in multiple vector stores (and related data stores) to power RAG. Data from your sources is converted into vector embeddings and stored in a chosen vector store; Bedrock then retrieves relevant vectors and uses them to augment generation.\n- You can either connect to a data source directly (unstructured or structured) or use Bedrock’s built-in/managed vector stores; you can also create a knowledge base by connecting to a data source, or build a knowledge base with a Kendra GenAI index.\n- Important notes:\n - Multimodal data (text plus images, charts, etc.) 
is supported only with Amazon S3 and custom data sources.\n - Some vector stores support binary embeddings, others only floating-point embeddings; the available options and capabilities depend on the store.\n - Vector stores require index/collection setup, including mapping fields for embeddings, text chunks, and metadata, as described in each store’s setup.\n - There are integration previews/GA status and regional availability notes in various sources (see specific entries below).\n- Relevant Bedrock doc sections and integration pages include guides on building and using knowledge bases, setting up vector stores, and special integration options with Kendra GenAI.\n\nDetailed storage options (Bedrock Knowledge Bases) with integration details\n\n1) Amazon OpenSearch Serverless (vector storage)\n- What it is: A vector store option via Amazon OpenSearch Serverless that Bedrock Knowledge Bases can use for vector search indexing of embeddings.\n- How it integrates with Bedrock:\n - You configure a vector search collection in OpenSearch Serverless and connect it to the Bedrock knowledge base as the vector store.\n - You must align the vector embedding dimensions with the embeddings model you use.\n - In Bedrock’s knowledge base setup, you map Bedrock to OpenSearch Serverless, including fields for embeddings, text chunks, and Bedrock-managed metadata.\n- Required formats and fields:\n - Vector index configuration in OpenSearch Serverless with a vector field (embeddings) and metadata/text fields (e.g., AMAZON_BEDROCK_TEXT_CHUNK, AMAZON_BEDROCK_METADATA) as part of the index mapping.\n - The embedding space is configured to use the selected embedding model; the vector type is typically a knn_vector with engine faiss and a suitable distance metric (euclidean commonly recommended for floating-point embeddings).\n- Limitations / notes:\n - Guidance includes specific dimension recommendations and how to map Bedrock’s data into the vector index.\n - Documentation notes the 
OpenSearch Serverless integration as a supported vector store; see the knowledge base setup materials for exact mapping details.\n- Source:\n - Build a knowledge base by connecting to a data source (OpenSearch Serverless section) and general knowledge base setup (knowledge-base-setup.html)\n - Knowledge Base setup text references OpenSearch Serverless as a vector store option and provides detailed steps and field mappings\n - Knowledge-base-setup URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n2\) Amazon OpenSearch Service Managed Clusters (vector storage)\n- What it is: A vector store option via Amazon OpenSearch Service domains (managed OpenSearch clusters) used as a vector store for Bedrock Knowledge Bases.\n- How it integrates with Bedrock:\n - You create a vector index in the OpenSearch Service domain and connect it to the Bedrock knowledge base.\n - Bedrock provides mappings for domain ARN, domain endpoint, vector index name, vector field name, and metadata field name.\n- Required formats and fields:\n - Mappings include vectorIndexName, vectorField (embedding field), and metadataField, plus Bedrock-specific text chunk fields.\n- Limitations / notes:\n - Prerequisites and permissions for using OpenSearch Managed Clusters are documented; there are guidance notes on permissions, encryption, and indexing requirements.\n - Similar to OpenSearch Serverless, there are dimensionality requirements and embedding-field configurations you must align with your embeddings model.\n- Source:\n - Knowledge-base-setup.html (OpenSearch Clusters section)\n - knowledge-base-setup page content includes both OpenSearch Serverless and OpenSearch Managed Clusters guidance\n - Knowledge-base-setup URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n3\) Amazon S3 Vectors (vector storage in S3)\n- What it is: A vector store option that stores vector embeddings in S3 Vectors (S3-based vector storage). 
This is a Bedrock-supported way to keep embeddings in S3 with vector indexing.\n- How it integrates with Bedrock:\n - You set up an S3 vector bucket and a corresponding vector index; Bedrock stores embeddings and metadata in S3-based vector structures.\n- Required formats and fields:\n - Vector bucket ARN, vector index ARN, vector index name, and vector field name in Bedrock’s knowledge base setup (as fields to fill when creating the knowledge base).\n - Metadata fields (e.g., AMZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA) and the ability to attach non-filterable metadata.\n - Dimensions: embedding dimension must be between 1 and 4096; S3 Vectors only supports floating-point embeddings.\n - The knowledge base builder provides a sample of how to configure the vector index with fields for embeddings and metadata in a JSON-like mapping.\n- Limitations / notes:\n - S3 Vectors integration is noted as a non-trivial, scalable vector storage solution; it has limits on the vector dimension, supports only floating-point embeddings, and provides metadata handling and filtering.\n - Metadata limits and filtering behavior are described (e.g., 40 KB per vector metadata with 2 KB filterable portion, etc., in the general guidance). 
See the detailed guidance in the S3 vectors section.\n - The integration can be in preview or general depending on the time/region; the Bedrock doc references the S3 Vectors integration as a supported store.\n- Source:\n - Knowledge-base-setup.html (S3 Vectors section)\n - Knowledge-base-setup HTML content explicitly describes S3 Vectors, vector bucket/ARN, vector index, dimension limits, and metadata handling\n - Knowledge-base-setup URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n4\) Amazon Aurora (RDS) vector store\n- What it is: A Bedrock-supported vector store using Amazon Aurora (PostgreSQL-compatible) as the data store for vector embeddings.\n- How it integrates with Bedrock:\n - Bedrock expects a relational table to store embeddings, chunks (text), and metadata; you create a table with specific column names for embedding vectors and text chunks, plus a metadata column.\n - You need to map Bedrock’s UI/API fields to the table’s schema when creating the knowledge base.\n- Required formats and fields:\n - Columns including embedding (vector), chunks (text), and metadata (Bedrock-managed, plus optional custom metadata as needed).\n - You must create a DB index on the vector column and text column; optional GIN index on metadata if using custom metadata.\n- Limitations / notes:\n - The Aurora cluster must reside in the same AWS account as the Bedrock knowledge base.\n - The table schema is fixed in Bedrock’s guidance; you must provide those fields when creating the knowledge base, and they cannot be updated after creation.\n- Source:\n - Knowledge-base-setup.html (Aurora section)\n\n5) Neptune Analytics graphs (GraphRAG)\n- What it is: Neptune Analytics graphs used with GraphRAG (a Neptune-based vector-augmented approach) as a Bedrock knowledge base storage option.\n- How it integrates with Bedrock:\n - You create a Neptune Analytics graph and connect it to Bedrock Knowledge Bases; you configure a vector search index on 
the graph and map Bedrock’s embeddings/text/metadata to the graph’s fields.\n- Required formats and fields:\n - Graph ARN, vector index dimensions, and Bedrock text/metadata field mappings.\n- Limitations / notes:\n - The guidance describes how to set up the graph and the vector index, including dimensions matching the embeddings model.\n- Source:\n - Knowledge-base-setup.html (Neptune Analytics/GraphRAG section)\n\n6) Pinecone\n- What it is: Pinecone as a vector store option for Bedrock Knowledge Bases.\n- How it integrates with Bedrock:\n - You set up a Pinecone index, provide endpoint URL, and provide credentials (credentials secret ARN) to Bedrock via AWS Secrets Manager.\n- Required formats and fields:\n - Vector index name, endpoint URL, credentials secret ARN, optional KMS key for decrypting credentials.\n - Metadata handling: text field for raw chunk text, metadata field for source attribution metadata, optional text search index name, etc.\n - You must supply a secret in Secrets Manager with the API key for the Pinecone index (and secret ARN for Bedrock to use).\n- Limitations / notes:\n - Pinecone integration requires providing access credentials securely via Secrets Manager.\n - You’ll supply metadata/text field mappings for Bedrock to store and retrieve vectors and associated data.\n- Source:\n - Knowledge-base-setup.html (Pinecone section)\n\n7) Redis Enterprise Cloud\n- What it is: Redis Enterprise Cloud as a vector store option for Bedrock Knowledge Bases.\n- How it integrates with Bedrock:\n - You configure Redis connection settings via Bedrock (endpoint URL, vector index name, vector field, and metadata field). 
You must provide credentials via Secrets Manager and TLS settings as part of the integration.\n- Required formats and fields:\n - Endpoint URL, vector index name, vector field, metadata field, and Bedrock-managed metadata naming.\n - Secrets Manager secret with credentials (username, password, and TLS details, if applicable).\n- Limitations / notes:\n - TLS and secret configuration requirements are described; you must provide secret values in Secrets Manager for Bedrock to use.\n- Source:\n - Knowledge-base-setup.html (Redis Enterprise Cloud section)\n\n8) MongoDB Atlas\n- What it is: MongoDB Atlas as a vector store option for Bedrock Knowledge Bases.\n- How it integrates with Bedrock:\n - You connect to a MongoDB Atlas cluster, configure the database, collection, and vector index; Bedrock will store embeddings, text chunks, and metadata in the Atlas collection.\n- Required formats and fields:\n - Endpoint URL, database name, collection name, credentials secret ARN for Atlas user, vector index name, vector field name, text field name, metadata field name, optional text search/index fields.\n - Optional PrivateLink for AWS PrivateLink connectivity.\n- Limitations / notes:\n - Metadata filtering may require manual configuration in MongoDB Atlas vector index settings; some features require explicit configuration beyond the Bedrock setup.\n- Source:\n - Knowledge-base-setup.html (MongoDB Atlas section)\n\n9) Other notes on storage options (summary from knowledge base setup)\n- The Bedrock Knowledge Bases setup emphasizes you can connect to various data sources (unstructured or structured). 
It also notes:\n - You can choose to “set up your own supported vector store” or let Bedrock automatically create a vector store (e.g., via the Console for OpenSearch Serverless).\n - If you plan to use structured data stores, you can transform queries into structured data language queries (like SQL).\n- OpenAI-style “Kendra GenAI” integration option\n - Build a Bedrock knowledge base with an Amazon Kendra GenAI index to reuse indexed content and combine Bedrock GenAI with Kendra retrieval capabilities.\n - This integration is documented as a specific knowledge base-building path.\n- Source:\n - knowledge-base.html (Overview of knowledge bases)\n - knowledge-base-build-kendra-genai-index.html (Kendra GenAI integration)\n\nKendra GenAI integration (special case)\n- What it is: A Bedrock knowledge base built from an Amazon Kendra GenAI index to combine Bedrock’s GenAI capabilities with Kendra’s retrieval strengths.\n- How it integrates with Bedrock:\n - You create a Bedrock knowledge base from a Kendra GenAI index to reuse content across Bedrock applications.\n - It leverages Kendra’s indexing + Bedrock’s generation capabilities for improved accuracy and retrieval.\n- Source:\n - knowledge-base-build-kendra-genai-index.html\n\nRelated AWS documentation notes (context and confirmations)\n- OpenSearch-based vector storage integration status and availability:\n - AWS What’s New: Amazon Bedrock Knowledge Bases now supports Amazon OpenSearch Managed Cluster for vector storage (general availability note).\n - URL: https://aws.amazon.com/about-aws/whats-new/2025/03/amazon-bedrock-knowledge-bases-opensearch-cluster-vector-storage/\\n- MongoDB Atlas integration for Bedrock Knowledge Bases (vector storage):\n - AWS What’s New: Knowledge Bases for Amazon Bedrock now supports MongoDB Atlas for vector storage (vector storage availability).\n - URL: https://aws.amazon.com/about-aws/whats-new/2024/05/knowledge-bases-amazon-bedrock-mongodb-atlas-vector-storage/\\n- Retrieval 
Augmented Generation (RAG) guidance and integration options (AWS Prescriptive Guidance):\n - URL: https://docs.aws.amazon.com/prescriptive-guidance/latest/retrieval-augmented-generation-options/introduction.html\\n- General Bedrock knowledge base docs (for knowledge bases, setup, build, retrieval):\n - Knowledge Bases main overview: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html\\n - Build a knowledge base by connecting to a data source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build.html\\n - Prerequisites for creating a Bedrock knowledge base with a unstructured data source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq.html\\n - Retrieving information from data sources using Bedrock Knowledge Bases: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html\\n - Installing and using Kendra GenAI index with Bedrock Knowledge Bases: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n - Prerequisites and setup for using a vector store (OpenSearch, S3 Vectors, etc.): https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\nSources (exact URLs used)\n- https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html\\n- https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build.html\\n- https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq.html\\n- https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html\\n- https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n- https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n- https://aws.amazon.com/about-aws/whats-new/2025/03/amazon-bedrock-knowledge-bases-opensearch-cluster-vector-storage/\\n- https://aws.amazon.com/about-aws/whats-new/2024/05/knowledge-bases-amazon-bedrock-mongodb-atlas-vector-storage/\\n- 
https://docs.aws.amazon.com/prescriptive-guidance/latest/retrieval-augmented-generation-options/introduction.html\\n\\nIf you’d like, I can format this into a compact quick-reference with a side-by-side comparison table of each storage option (store name, Bedrock integration points, required Bedrock fields, and notable limitations) and include only the most relevant bullets for quick decision-making.", "tool_calls": [], "additional_kwargs": {}, "response_metadata": {}, "invalid_tool_calls": []}
3	15925d170d1e4ce3956667dbb9b172ba	{"type": "human", "content": "You are research agent. Should provide accurate response based on documentation and statements, in the response add section with links to all sources that you have used.\nProvide official AWS Bedrock documentation on supported knowledge base storages. List all storage options Bedrock Knowledge Bases can use for Retrieval Augmented Generation (RAG), including vector stores or document stores, integrations (S3, Kendra GenAI, OpenSearch, etc.), required formats/fields, limitations, and exact URLs to AWS docs. Include separate sources for each option if possible.", "additional_kwargs": {}, "response_metadata": {}}
4	15925d170d1e4ce3956667dbb9b172ba	{"type": "ai", "content": "Official AWS Bedrock documentation: Supported knowledge base storages for Retrieval Augmented Generation (RAG)\n\nThe Bedrock Knowledge Bases support a variety of storage options (vector stores and document stores) that Bedrock can use for RAG. Below is a structured list of each option, what it is, how it integrates with Bedrock Knowledge Bases, the required formats/fields, notable limitations, and the exact AWS docs that describe it. When possible, each item links to separate Bedrock-specific documentation pages; in some cases the same Bedrock page covers multiple options.\n\n1) Amazon OpenSearch Serverless (vector store)\n- What it is\n - A vector search collection in Amazon OpenSearch Serverless used as the vector store for embedding vectors generated from your data.\n- How it integrates with Bedrock Knowledge Bases\n - In Bedrock Knowledge Bases, you map to an OpenSearch Serverless vector collection. Bedrock stores embeddings in a vector field and maps accompanying text chunks and Bedrock-managed metadata.\n- Required formats/fields (Bedrock setup guidance)\n - Create a vector index with:\n - A vector field (e.g., embeddings) using the faiss engine and an appropriate dimension (dimensions depend on your embedding model; Euclidean distance is recommended for floating-point embeddings).\n - Metadata fields to pair with vectors (e.g., text chunks and Bedrock metadata).\n - Mapping examples discuss:\n - Field for the vector embeddings\n - Field for the text chunks\n - Bedrock-managed metadata field\n- Limitations / notes (Bedrock doc context)\n - OpenSearch Serverless is one of the supported options for vector storage with explicit guidance on how to map Bedrock data into the index.\n - Requires configuring permissions and collection details in OpenSearch Serverless; Bedrock provides the mapping fields in the knowledge-base setup flow.\n- Bedrock doc source\n - Knowledge Base setup (OpenSearch Serverless 
section) \n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n2\) Amazon OpenSearch Service (Managed Clusters) as vector store\n- What it is\n - A managed OpenSearch cluster (OpenSearch Service domain) used as a vector store for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases\n - You create a vector index on the OpenSearch domain and connect it to the Bedrock knowledge base. Bedrock requires mappings for:\n - Domain ARN, domain endpoint\n - Vector index name, vector field, and metadata field\n- Required formats/fields (Bedrock setup guidance)\n - Mappings include:\n - vectorIndexName\n - vectorField (embedding field)\n - metadataField\n - Bedrock text chunk and Bedrock metadata fields\n- Limitations / notes\n - Prerequisites include required IAM permissions and domain configuration. Guidance covers encryption, indexing requirements, and domain capacity considerations.\n - Dimensionality and embedding-field configurations must align with your embedding model (including K-NN index considerations when supported).\n- Bedrock doc source\n - Knowledge Base setup (OpenSearch Clusters section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n3\) Amazon S3 Vectors (vector store in S3)\n- What it is\n - Vector storage in Amazon S3 using S3 Vectors (Bedrock-supported) to hold embeddings and related metadata.\n- How it integrates with Bedrock Knowledge Bases\n - You configure an S3 vector bucket and a corresponding vector index. 
Bedrock stores embeddings and metadata in S3-based vector structures and uses a defined mapping for fields.\n- Required formats/fields\n - Vector bucket ARN, vector index ARN, vector index name, and vector field name\n - Metadata fields (Bedrock-managed) and text chunk fields\n - Embedding dimension constraints (1 to 4096); only floating-point embeddings are supported\n - Sample mappings show fields for:\n - embeddings field\n - text chunk field\n - metadata field\n- Limitations / notes\n - S3 Vectors integration is noted as a supported (and scalable) vector store, but described with several constraints:\n - Preview status (as of documentation) and ongoing availability notes\n - Dimension limits and floating-point embeddings only\n - Metadata handling and filtering limitations (e.g., 40 KB per vector metadata with 2 KB filterable portion)\n- Bedrock doc source\n - Knowledge Base setup (S3 Vectors section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n4\) Amazon Aurora (RDS) vector store\n- What it is\n - A Bedrock-supported vector store using Amazon Aurora (PostgreSQL-compatible) as the data store for embeddings.\n- How it integrates with Bedrock Knowledge Bases\n - Bedrock expects a relational table with:\n - An embedding/vector column\n - A text chunks column\n - A metadata column (Bedrock-managed, plus optional custom metadata)\n - Bedrock maps its UI/API fields to the table schema during knowledge base creation.\n- Required formats/fields\n - Relational table with columns for:\n - embedding vector\n - text chunks\n - metadata\n - Optional metadata filtering/indexing (e.g., GIN index)\n- Limitations / notes\n - Aurora cluster must be in the same AWS account as the Bedrock knowledge base.\n - The table schema is fixed per Bedrock guidance and cannot be updated after creation.\n- Bedrock doc source\n - Knowledge Base setup (Aurora section)\n - URL: 
https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n5\) Neptune Analytics graphs (GraphRAG)\n- What it is\n - Neptune Analytics graphs used with GraphRAG as a Bedrock knowledge base storage option.\n- How it integrates with Bedrock Knowledge Bases\n - Create a Neptune Analytics graph and connect it to Bedrock Knowledge Bases; configure a vector index on the graph and map Bedrock embeddings/text/metadata to the graph’s fields.\n- Required formats/fields\n - Graph ARN, vector index dimensions, and Bedrock text/metadata field mappings\n- Limitations / notes\n - Guidance covers graph/vector index setup and dimension matching to embedding models.\n- Bedrock doc source\n - Knowledge Base setup (Neptune Analytics/GraphRAG section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n6\) Pinecone (vector store)\n- What it is\n - Pinecone as a dedicated vector index service to store and query embeddings for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases\n - Bedrock references a Pinecone index (name), endpoint URL, and credentials stored in AWS Secrets Manager (secret ARN; optional KMS key for decryption).\n- Required formats/fields\n - Vector index name\n - Endpoint URL\n - Secrets Manager credentials secret ARN (and optional KMS decryption key)\n - Metadata/text fields to store the raw chunk text and source metadata\n - Optional text-search index name\n- Limitations / notes\n - Credentials must be provided securely via Secrets Manager\n - Metadata/text field mappings must be supplied to Bedrock for storage and retrieval\n- Bedrock doc source\n - Knowledge Base setup (Pinecone section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n7\) Redis Enterprise Cloud (vector store)\n- What it is\n - Redis Enterprise Cloud as a vector store option for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases\n - Bedrock 
connects to Redis, providing endpoint URL, vector index name, vector field, and metadata field; credentials via Secrets Manager and TLS settings are configured as part of the integration.\n- Required formats/fields\n - Endpoint URL\n - Vector index name\n - Vector field\n - Metadata field\n - Bedrock-managed metadata naming\n - Secrets Manager secret with credentials (and TLS details if applicable)\n- Limitations / notes\n - TLS and secret configuration requirements; secrets must be provided to Bedrock\n- Bedrock doc source\n - Knowledge Base setup (Redis Enterprise Cloud section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n8\) MongoDB Atlas (vector store)\n- What it is\n - MongoDB Atlas as a vector store option for Bedrock Knowledge Bases.\n- How it integrates with Bedrock Knowledge Bases\n - Connect to a MongoDB Atlas cluster, configure database/collection and vector index; Bedrock stores embeddings, text chunks, and metadata in the Atlas collection.\n- Required formats/fields\n - Endpoint URL\n - Database name\n - Collection name\n - Credentials secret ARN for Atlas user\n - Vector index name\n - Vector field name\n - Text field name\n - Metadata field name\n - Optional PrivateLink connectivity\n- Limitations / notes\n - Metadata filtering may require additional Atlas vector index configuration; some features may require explicit setup beyond Bedrock’s default guidance\n- Bedrock doc source\n - Knowledge Base setup (MongoDB Atlas section)\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n\\n9\) Kendra GenAI index (special integration path for Bedrock)\n- What it is\n - Build a Bedrock knowledge base from an Amazon Kendra GenAI index to combine Bedrock GenAI with Kendra’s retrieval capabilities.\n- How it integrates with Bedrock Knowledge Bases\n - The Bedrock knowledge base reuses content indexed in a Kendra GenAI index; benefits include reusing indexed content and 
leveraging Bedrock’s generation features with Kendra’s retrieval strength.\n- Required formats/fields\n - You build a Bedrock KB from a Kendra GenAI index; the process is documented in Bedrock’s KB build flow for Kendra GenAI\n- Limitations / notes\n - This is a specialized integration path that leverages Kendra GenAI indexing for Bedrock KB creation\n- Bedrock doc source\n - Knowledge Base build path for Kendra GenAI index\n - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n\\nNotes on broader context (RAG and Bedrock knowledge bases)\n- RAG in Bedrock relies on combining retrieved source content with generation. Bedrock provides operations such as Retrieve, RetrieveAndGenerate, and GenerateQuery to support the RAG workflow; see the Bedrock knowledge base retrieval overview for context.\n - Bedrock retrieval and RAG overview: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html\\n - General knowledge base concepts (overview, building, prerequisites, etc.): https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html\\n - Prerequisites for unstructured data sources and vector stores: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq.html\\n - Build or connect a knowledge base (data source connections): https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build.html\\n - Build a knowledge base with Kendra GenAI index: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n\\nSources (exact URLs used for the above options)\n- Knowledge Base setup (general, includes multiple storage options)\n - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html\\n- Knowledge Base build: Kendra GenAI index path\n - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n- Knowledge Bases overview and retrieval capabilities\n - 
https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html\\n - https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html\\n- Prerequisites for a Bedrock knowledge base with unstructured data (vector store setup)\n - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-prereq.html\\n- Build a knowledge base with a Kendra GenAI index (detailed steps)\n - https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-build-kendra-genai-index.html\\n\\nIf you’d like a quick-reference comparison, I can format this into a compact side-by-side table listing: storage option, Bedrock integration point, required Bedrock fields, known limitations, and the exact source URL for each. All information above is drawn directly from the official Bedrock documentation pages cited.", "tool_calls": [], "additional_kwargs": {}, "response_metadata": {}, "invalid_tool_calls": []}

Checking consumption of tokens and switching between different models

Another practical concern is token usage.

Using a large model for every task is wasteful.
Instead, agents can switch between models dynamically:
Use Nano GPT-5.1 (cheap, fast) for small lookups.
Use AWS Bedrock Claude 3.5 Sonnet (accurate, large context) for reasoning-heavy prompts.

This hybrid approach optimizes both cost and performance.
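
The routing decision itself is simple enough to sketch. The snippet below is a minimal illustration, not the workflow's actual logic: the model labels, the 200-token threshold, and the four-characters-per-token estimate are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    name: str
    reason: str


# Hypothetical threshold; tune it against real token usage.
SMALL_PROMPT_TOKENS = 200


def estimate_tokens(prompt: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(prompt) // 4)


def choose_model(prompt: str, needs_reasoning: bool) -> ModelChoice:
    """Pick the cheap model for small lookups, the large one otherwise."""
    tokens = estimate_tokens(prompt)
    if needs_reasoning or tokens > SMALL_PROMPT_TOKENS:
        # Illustrative label for the large, accurate model.
        return ModelChoice("bedrock-claude-3.5-sonnet", "reasoning-heavy or long prompt")
    # Illustrative label for the small, fast model.
    return ModelChoice("gpt-nano", "small lookup")
```

Logging the chosen model and the estimated token count for each request makes it possible to check later whether the threshold actually saves money.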

The flow context data is fully customizable

The extracted data and the accumulated LLM response can be routed to any other flow, a custom node, the chat output, and so on.
Here, for every prompt, besides returning the response to the chat, we create an RTF document and persist it on the file system
for further review and usage.

Conclusions

What we’ve built is a full end-to-end multi-agent ecosystem—all without writing a single line of code. By leveraging n8n’s no-code orchestration, the system allows:

Natural interaction: Agents collaborate dynamically based on user prompts, without predefined rigid flows.
Dedicated integrations: Each agent can connect to specialized MCP servers, custom tools, or even other agents, extending its knowledge far beyond the base LLM.
Persistent memory: All interactions and context are stored in memory, so agents can build on previous sessions instead of starting from scratch.
Transparency and control: With UI-based execution dumps, we can inspect how decisions were made, track history, and debug workflows visually.

This setup proves that multi-agent systems don’t have to be locked away in research papers—they can be practical, maintainable, and production-ready, combining Bedrock models, lightweight GPTs, and n8n’s no-code tools into a flexible AI ecosystem that feels less like a chatbot and more like a team of digital experts.

AWS OpenSearch Full Documents Reindexing: When? Why? How?

Roman Tsypuk — Mon, 22 Sep 2025 06:02:00 +0000

We will explore Amazon OpenSearch Service, how it came to be, and take a deep dive into the challenge of reindexing documents.

OpenSearch Brief History: The Forking of Elasticsearch

Elasticsearch is an open-source search engine developed by Elastic NV. It became incredibly popular due to its scalability, distributed nature, and powerful search capabilities. However, in 2021, Elastic NV changed the licensing model of Elasticsearch from Apache 2.0 to a Server Side Public License (SSPL).
This move was made to prevent cloud providers from offering Elasticsearch as a managed service without contributing back to the open-source community.

In response, AWS forked the last Apache 2.0-licensed version of Elasticsearch to create the OpenSearch project, and renamed its managed offering to Amazon OpenSearch Service. This fork not only preserved the open-source nature of the software but also allowed AWS to continue offering a managed search service with full control over its development.

Licensing

After the license change, Elasticsearch is now under the SSPL, which is not recognized as an open-source license by the Open Source Initiative (OSI). The SSPL imposes restrictions on how the software can be used, particularly for cloud services.

Elastic offers Elastic Cloud, which can be deployed on public cloud provider resources or on-premises.

AWS OpenSearch: OpenSearch remains under the Apache 2.0 license, which is fully open-source. This means anyone can use, modify, and distribute the software without restrictions, making it more attractive for users who prefer open-source solutions.

Understanding AWS Elasticsearch: Internal Implementation of Indices and Reindexing

Amazon Elasticsearch Service (Amazon ES), now known as Amazon OpenSearch Service, is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud. It’s
widely used for real-time application monitoring, log analytics, full-text search, and more. One of the core components of Elasticsearch is its indices, which serve as the primary structure for
storing and querying data. This article delves into the internal implementation of indices in AWS Elasticsearch and provides a guide on how to reindex a database within this environment.

What is an Index in Elasticsearch?

An index in Elasticsearch is akin to a database in traditional relational databases. It contains a collection of documents that are stored and managed together. Each document is a JSON object, and
each field within the document is a data point that can be searched and analyzed.

Key Components of an Elasticsearch Index:

Shards: An index is divided into smaller pieces called shards. Each shard is a self-contained, fully functional instance of Lucene, the underlying search engine for Elasticsearch.
Replicas: For high availability and fault tolerance, Elasticsearch allows you to create replica shards. These are copies of the primary shards that can serve search requests in case the primary shard fails.
Mappings: Mappings define the structure of the documents within an index, including the data types of fields and how they should be indexed and stored.
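
Shards also explain why the shard count is hard to change later: a document is placed on a shard by hashing its routing value (by default the document ID) modulo the number of primary shards. The sketch below illustrates that idea only; `zlib.crc32` is a stand-in for the engine's own hash function, and the document IDs are made up.

```python
import zlib


def shard_for(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key to a shard number (stand-in hash, not the real one)."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


def count_moved(doc_ids, old_shards: int, new_shards: int) -> int:
    """Count documents that would land on a different shard after a resize."""
    return sum(
        1 for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(100)]
    print("moved when going from 3 to 5 shards:", count_moved(ids, 3, 5))
```

Because changing the shard count changes where existing documents would be looked up, the data has to be copied into a new index, which is the reindexing operation discussed in the rest of this article.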

Internal Implementation of Indices in AWS Elasticsearch

AWS Elasticsearch handles indices similarly to a standard Elasticsearch deployment but with additional layers of management, security, and scaling capabilities provided by AWS.

Key Aspects of AWS Elasticsearch Indices:

Managed Clusters: AWS Elasticsearch manages the underlying infrastructure, including node provisioning, shard allocation, and index replication.
Scaling: Indices can be scaled horizontally by adjusting the number of shards and vertically by increasing the instance size or adding more nodes.
Security: AWS integrates Elasticsearch with other AWS services like AWS IAM, AWS KMS for encryption, and VPC for network isolation, ensuring that your indices are secure and accessible only to authorized users.
Snapshots: AWS Elasticsearch provides automated snapshots for indices, which are stored in Amazon S3 and can be used for backup and recovery purposes.

Why Reindexing is Important

Reindexing in Elasticsearch is the process of copying the data from one index to another. This is often necessary when you need to:

Change the structure of the index, such as altering mappings (changing field types).
Improve performance by reconfiguring the number of shards or replicas (or by merging/splitting indices).
Upgrade Elasticsearch versions that might require data format changes.
Introduce new fields in documents and make them available for search.

Reindexing can be a resource-intensive operation, and AWS Elasticsearch provides tools and best practices to ensure that it’s done efficiently without disrupting service availability.

Steps to Reindex in AWS Elasticsearch

Here’s a step-by-step guide to reindexing an index in AWS Elasticsearch:

Step 1: Create the Target Index

Before you start reindexing, you need to create the target index with the desired mappings, settings, and shard configuration.

PUT /new-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text"
      },
      "field2": {
        "type": "date"
      }
    }
  }
}

Step 2: Use the Reindex API

Elasticsearch provides a _reindex API that allows you to copy data from the source index to the target index.

POST /_reindex
{
  "source": {
    "index": "old-index"
  },
  "dest": {
    "index": "new-index"
  }
}
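
For large indices it is often more convenient to start the reindex as a background task instead of holding the HTTP connection open. The helper below is a minimal sketch that only builds the request; it sends nothing. The function name and defaults are hypothetical, while `wait_for_completion=false` and the `size` field under `source` (the batch size) are standard options of the reindex API.

```python
import json


def build_reindex_request(source_index: str, dest_index: str,
                          batch_size: int = 1000, run_async: bool = True):
    """Return (method, path, body) for a _reindex call; nothing is sent."""
    path = "/_reindex"
    if run_async:
        # Ask the cluster to return a task ID instead of blocking the call.
        path += "?wait_for_completion=false"
    body = {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    return "POST", path, body


if __name__ == "__main__":
    method, path, body = build_reindex_request("old-index", "new-index")
    print(method, path)
    print(json.dumps(body, indent=2))
```

The returned tuple can be passed to whatever signed HTTP client is already used to talk to the domain.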

Step 3: Monitor the Reindexing Process

Reindexing can take time depending on the size of your data. You can monitor the progress using the Task API.

GET /_tasks?detailed=true&actions=*reindex
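
The status of a running reindex task reports counters such as total, created, updated, and deleted. As a rough sketch, progress can be estimated from them as follows; the function name is hypothetical, and the result is approximate because other counters (for example version conflicts) are ignored here.

```python
def reindex_progress(status: dict) -> float:
    """Estimate completion (0.0 to 1.0) from a reindex task status dict."""
    total = status.get("total", 0)
    if total <= 0:
        # Nothing reported yet, or an empty source index.
        return 0.0
    processed = sum(status.get(key, 0) for key in ("created", "updated", "deleted"))
    return min(1.0, processed / total)


if __name__ == "__main__":
    sample_status = {"total": 1000, "created": 250, "updated": 0, "deleted": 0}
    print(f"{reindex_progress(sample_status):.0%} done")
```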

Step 4: Update Aliases (Optional)

Once reindexing is complete, you may want to switch an alias to point to the new index.

POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "old-index",
        "alias": "my-alias"
      }
    },
    {
      "add": {
        "index": "new-index",
        "alias": "my-alias"
      }
    }
  ]
}
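
Both actions travel in one request, so the alias never points at nothing. When several aliases have to be switched, generating the body is less error-prone than editing it by hand. A small sketch, with a hypothetical helper name:

```python
def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    print(build_alias_swap("my-alias", "old-index", "new-index"))
```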

Step 5: Delete the Old Index (Optional)

After verifying that the new index is functioning correctly, you can delete the old index to free up resources.

DELETE /old-index

Also, during reindexing it is possible to update the structure of documents by adding a script section:

POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._source.account.number++"
   }
}

Technique without index migration (suitable for smaller clusters)

If the data volume is not high, there is a simpler path: once the index template is updated, all new indices will be created from the new template (with the new fields available for search).
For all existing indices, we need to update the index mapping (so that any documents added or updated in these indices follow the latest schema).

For older documents, we can run a scripted update with a predicate that selects the documents to patch with default values for the new fields:

Step 1: Patch the index template

PUT _template/{index}
{
  ...full_body with new mappings
}

Step 2: Add the new mapping to each index

PUT /{index}/_mappings
{
  "properties": {
    "new_field": {
      "null_value": false,
      "type": "boolean"
    }
  }
}

Step 3: Patch all documents with a scripted update

POST /{index}/_update_by_query
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "new_field"
        }
      }
    }
  },
  "script": {
    "source": "ctx._source.new_field = false",
    "lang": "painless"
  }
}
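
The effect of such a scripted update can be pictured with an in-memory analogy: select the documents that lack a field and set a default on them. This is only an illustration of the semantics with made-up documents, not a client for the API.

```python
def backfill_default(docs: list, field: str, default) -> int:
    """Set `field` to `default` on documents that lack it; return how many changed."""
    updated = 0
    for doc in docs:
        if field not in doc:  # plays the role of the must_not/exists predicate
            doc[field] = default  # plays the role of the script
            updated += 1
    return updated


if __name__ == "__main__":
    documents = [{"id": 1}, {"id": 2, "new_field": True}]
    print(backfill_default(documents, "new_field", False))
    print(documents)
```

Because the predicate skips documents that already have the field, running the update a second time changes nothing, which makes the operation safe to retry.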

Best Practices for Reindexing in AWS Elasticsearch

Snapshot Before Reindexing: Always take a snapshot of your data before starting the reindexing process to safeguard against data loss.
Monitor Cluster Health: Keep an eye on the cluster’s health during reindexing to avoid overwhelming the system.
Use Aliases: Aliases can help minimize downtime by allowing you to switch indices without changing your application code.
Test in Staging: Before reindexing in production, test the process in a staging environment to catch any potential issues.

Conclusion

Reindexing in AWS Elasticsearch is a crucial operation for maintaining and optimizing your search infrastructure. By understanding the internal implementation of indices and following best practices
for reindexing, you can ensure that your Elasticsearch environment remains robust, scalable, and ready to meet the demands of your applications. AWS Elasticsearch, with its managed capabilities and
tight integration with other AWS services, provides a powerful platform for deploying and managing your search workloads.

References (Links)

Amazon OpenSearch Indices migration

AWS-nuke controlled resources cleanup in multiple aws accounts

Roman Tsypuk — Mon, 22 Sep 2025 06:01:29 +0000

AWS-Nuke is an open-source tool, designed to automate the removal of AWS resources from an account. It can be used to quickly and safely delete all resources in a given AWS account, or it can be customized to delete specific types of resources based on user-defined rules.

What is AWS-Nuke?

Whether you’re spinning up test environments or decommissioning an old project, AWS-Nuke is particularly useful for keeping AWS accounts clean, ensuring that old, unused resources do not pile up, leading to cost overruns, security risks, and unnecessary complexity.

AWS-Nuke is a command-line tool that supports multiple AWS services and can be run from any machine that has access to AWS credentials.

Why Use AWS-Nuke?

Before diving into the technical details of AWS-Nuke, it’s important to understand why this tool is useful.

Cost Management: One of the biggest challenges in cloud environments is managing costs. AWS charges based on the resources you use, and sometimes resources that are no longer in use are forgotten and left running. By using AWS-Nuke, you can regularly clean up old resources, ensuring that you only pay for what you actively use.
Environment Cleanup: In development and test environments, resources are often created, modified, and left without proper deletion. AWS-Nuke can help by regularly purging these environments of unnecessary resources, making it easier to manage infrastructure.
Security: Resources left running for extended periods can pose security risks. For instance, unused EC2 instances, security groups, or S3 buckets might expose your environment to attacks. AWS-Nuke helps minimize such risks by cleaning up unused resources.
Automation: AWS-Nuke offers automation capabilities, making it easier to integrate into continuous integration (CI) pipelines or scheduled jobs for periodic account cleanups.
Simplification: With hundreds of AWS services and resources to manage, AWS-Nuke simplifies the task of cleaning up resources by providing a centralized and automated way to perform deletions.

How AWS-Nuke Works

AWS-Nuke works by scanning all the resources in your AWS account and then deleting them based on a configuration file. You can configure AWS-Nuke to delete specific types of resources or to leave certain critical resources untouched.

AWS-Nuke uses the AWS SDK to list and delete resources. It can handle a wide range of AWS services, including but not limited to:

EC2 Instances
S3 Buckets
RDS Instances
IAM Users and Roles
VPCs
Elastic Load Balancers
CloudWatch Alarms
CloudFormation Stacks
ECR Repositories

Configuration

AWS-Nuke uses a YAML configuration file to specify what resources should or should not be deleted. This configuration file is highly customizable, allowing users to target specific AWS resources while leaving others intact.

Here’s a simple example of an AWS-Nuke configuration:

regions:
  - "us-east-1"
  - "us-west-2"

resource-types:
  excludes:
    - EC2KeyPair
    - IAMRole
    - IAMUser

This configuration specifies that AWS-Nuke should only run in the us-east-1 and us-west-2 regions, and it will exclude deleting EC2 Key Pairs, IAM Roles, and IAM Users.
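
Conceptually, such a configuration acts as two filters over the scanned resources: keep only the configured regions, then drop the excluded resource types. The sketch below illustrates that idea with made-up resource records; it is not AWS-Nuke's implementation, and the record fields are assumptions for the example.

```python
def plan_deletions(resources, regions, excluded_types):
    """Return the resources that would be deleted under the given config."""
    allowed_regions = set(regions)
    excluded = set(excluded_types)
    return [
        resource for resource in resources
        if resource["region"] in allowed_regions and resource["type"] not in excluded
    ]


if __name__ == "__main__":
    scanned = [
        {"type": "EC2Instance", "region": "us-east-1", "id": "i-001"},
        {"type": "IAMRole", "region": "us-east-1", "id": "deploy-role"},
        {"type": "S3Bucket", "region": "eu-west-1", "id": "logs"},
    ]
    for item in plan_deletions(scanned, ["us-east-1", "us-west-2"], ["IAMRole", "IAMUser"]):
        print("would delete:", item["type"], item["id"])
```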

Dry-Run Mode

One of AWS-Nuke's most useful features is the dry-run mode, which allows users to see what resources will be deleted without actually deleting them. This is useful for reviewing the results of a cleanup before making any irreversible changes.

AWS-Nuke runs in dry-run mode by default, so no extra flag is needed:

aws-nuke --config config.yml

This command will output a list of resources that AWS-Nuke will delete based on the configuration file, without actually performing the deletions.

Running AWS-Nuke

Once you’re confident in your configuration, running AWS-Nuke is simple. First, ensure you have the correct AWS credentials for the account you want to clean up. Then add the --no-dry-run flag:

aws-nuke --config config.yml --no-dry-run

This command will delete all resources in the account, except for the ones excluded in the configuration.

Resource Dependencies

AWS-Nuke also handles resource dependencies. For example, it will not delete an S3 bucket until all objects within the bucket have been deleted. This ensures that AWS resources are deleted in the correct order, preventing errors and failed deletions.
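
One simple way to obtain this ordering without knowing the dependency graph up front is to retry in passes: anything that cannot be deleted yet is kept for the next pass. The sketch below illustrates that general technique; it is not taken from the AWS-Nuke sources, and `can_delete` is a hypothetical callback standing in for a real delete attempt.

```python
def delete_with_retries(resources, can_delete, max_passes=10):
    """Delete resources in passes; blocked ones are retried on the next pass.

    Returns (deleted, remaining). `can_delete(resource, deleted)` should
    return True when the resource has no undeleted dependencies left.
    """
    remaining = list(resources)
    deleted = []
    for _ in range(max_passes):
        if not remaining:
            break
        blocked = []
        progressed = False
        for resource in remaining:
            if can_delete(resource, deleted):
                deleted.append(resource)
                progressed = True
            else:
                blocked.append(resource)
        remaining = blocked
        if not progressed:
            # Nothing could be deleted in this pass; stop instead of looping forever.
            break
    return deleted, remaining


def bucket_rule(resource, deleted):
    """Example rule: the bucket is deletable only once both objects are gone."""
    if resource == "bucket":
        return {"object-1", "object-2"} <= set(deleted)
    return True


if __name__ == "__main__":
    print(delete_with_retries(["bucket", "object-1", "object-2"], bucket_rule))
```

In the example, the bucket is skipped in the first pass and removed in the second, after both objects are gone.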

AWS-Nuke can be installed with Homebrew:

brew install aws-nuke

After checking, it turned out that the version installed from the brew repo is outdated:

 aws-nuke -c config/nuke-config.yml --profile technology
Incorrect Usage: flag provided but not defined: -c

NAME:
   aws-nuke - remove everything from an aws account

USAGE:
   aws-nuke [global options] command [command options]

VERSION:
   3.22.0

AUTHOR:
   Erik Kristensen <erik@erikkristensen.com>

FATA[0000] flag provided but not defined: -c

So I will use the latest version installed from sources:

Contributing to aws-nuke

When running aws-nuke locally with an AWS profile, I got an error.

aws_nuke_v2 -c config/nuke-config.yaml --profile personal_profile
Error: failed get caller identity: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Process finished with the exit code 255

After checking the sources, I found out that the profile is present in the parameters for aws-nuke but not implemented.
So I added this feature to aws-nuke:

if params.Profile != "" {
    cfg, err := awsSDKConfig.LoadDefaultConfig(context.TODO(), awsSDKConfig.WithSharedConfigProfile(params.Profile))
    if err != nil {
        log.Fatalf("unable to load SDK config, %v", err)
    }
    log.Infof("using profile %v", params.Profile)
    // Extract credentials from the configuration
    cred, err := cfg.Credentials.Retrieve(context.TODO())
    if err != nil {
        log.Fatalf("unable to retrieve credentials, %v", err)
    }
    creds.AccessKeyID = cred.AccessKeyID
    creds.SecretAccessKey = cred.SecretAccessKey
    defaultRegion = cfg.Region
}

if !creds.HasKeys() && !creds.HasProfile() {
    creds.AccessKeyID = os.Getenv("AWS_ACCESS_KEY_ID")
    creds.SecretAccessKey = os.Getenv("AWS_SECRET_ACCESS_KEY")
}

I also opened a pull request with this change in aws-nuke: https://github.com/rebuy-de/aws-nuke/pull/1263/files

Now aws-nuke correctly processes the --profile CLI argument and is ready to run.

Running with the implemented profile feature:

go run main.go -c config/nuke-config.yaml --profile personal_account

INFO[0000] using profile personal_account
aws-nuke version unknown - unknown - unknown

Do you really want to nuke the account with the ID 123456789012 and the alias 'personal_account'?
Do you want to continue? Enter account alias to continue.
> personal_account

The tool queries the existing resources, matches them against the filters configured as exclusions, and performs a dry run.

Resources that we want to leave in account

Let's say we do not want to affect IAM users, permissions, and policies, MFA devices, and some Route 53 records.
All of these resources should be declared in the aws-nuke config, so that it does not touch them when it runs.

regions:
  - eu-east-1  # use the region(s) you actually deploy to
  - global

accounts:
  123456789012:
    filters:
      PolicyName:
        - property: "UserName"
          value: "aws-cli"
      IAMRole:
        - "OrganizationAccountAccessRole"
      IAMRolePolicyAttachment:
        - "OrganizationAccountAccessRole"
        - property: arn
          value: "arn:aws:iam::aws:policy/AdministratorAccess"
      IAMUser:
        - "admin"
        - "aws-cli"
      IAMUserAccessKey:
        - property: UserName
          value: "aws-cli"
      IAMUserPolicyAttachment:
        - property: "arn"
          value: "arn:aws:iam::aws:policy/AdministratorAccess"
      IAMVirtualMFADevice:
        - property: arn
          value: "arn:aws:iam::123456789012:mfa/to.root.iphone"
      Route53HostedZone:
        - property: "Name"
          value: "xxxxx.xx."

account-blocklist:
  - 1234567890

resource-types:
  excludes:
    - Route53ResourceRecordSet
    - Route53HostedZone

Review resources in output

...
global - IAMPolicy - arn:aws:iam::123456789012:policy/terraform-20230725102234859300000005 - [ARN: "arn:aws:iam::123456789012:policy/terraform-20230725102234859300000005", Name: "terraform-20230725102234859300000005", Path: "/", PolicyID: "ANPAVFCI6B4SXG62NJLCB"] - would remove

global - IAMRole - AmazonGrafanaServiceRole-Skz3Lbeo2 - [CreateDate: "2023-03-21T16:02:03Z", LastUsedDate: "2023-03-21T16:02:03Z", Name: "AmazonGrafanaServiceRole-Skz3Lbeo2", Path: "/service-role/"] - would remove

123456789012", PolicyName: "terraform-20230725102234859300000005", RoleCreateDate: "2023-07-25T10:22:27Z", RoleLastUsed: "2023-07-25T10:22:27Z", RoleName: "terraform-20230725102226616700000002", RolePath: "/"] - filtered by config
global - IAMRolePolicy - terraform-20230725102226616700000003 -> terraform-20230725102227744300000004 - [PolicyName: "terraform-20230725102227744300000004", role:CreateDate: "2023-07-25T10:22:27Z", role:LastUsed: "2023-07-25T10:22:56Z", role:Path: "/", role:RoleID: "AROAVFCI6B4SWQNDRGVXF", role:RoleName: "terraform-20230725102226616700000003"] - would remove
Scan complete: 44 total, 13 nukeable, 31 filtered.

The above resources would be deleted with the supplied configuration. Provide --no-dry-run to actually destroy resources.
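With dozens of lines in the dry-run output, it can help to group them before reading each one. Below is a minimal sketch that counts lines per resource type and action. It assumes the line shape seen in the output above (region, resource type, identifier, properties, action, separated by " - "); the function name summarize is my own and is not part of aws-nuke.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// summarize counts dry-run lines per "ResourceType: action".
// Assumed line shape, taken from the output above:
//   <region> - <ResourceType> - <id> - [<properties>] - <action>
func summarize(output string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		parts := strings.SplitN(line, " - ", 3)
		if len(parts) < 3 {
			continue // not a resource line (summary, prompt, truncated line)
		}
		var action string
		switch {
		case strings.HasSuffix(line, " - would remove"):
			action = "would remove"
		case strings.HasSuffix(line, " - filtered by config"):
			action = "filtered by config"
		default:
			continue
		}
		counts[parts[1]+": "+action]++
	}
	return counts
}

func main() {
	// Shortened, made-up sample lines in the same shape as the real output.
	out := `global - IAMPolicy - arn:aws:iam::123456789012:policy/example - [Name: "example"] - would remove
global - IAMRole - example-role - [Name: "example-role"] - would remove
global - IAMUser - admin - [Name: "admin"] - filtered by config`
	for key, n := range summarize(out) {
		fmt.Printf("%s = %d\n", key, n)
	}
}
```

A per-type count makes it easier to spot a resource type that should have been filtered but is listed as "would remove".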

Perform cleanup

go run main.go -c config/nuke-config.yaml --profile personal_account --no-dry-run

INFO[0000] using profile personal_account
aws-nuke version unknown - unknown - unknown

Do you really want to nuke the account with the ID 123456789012 and the alias 'personal_account'?
Do you want to continue? Enter account alias to continue.
> personal_account

First, aws-nuke scans the account as in the dry run, and then asks you to confirm the deletion.

The tool has issues deleting IAM resources: it has retry logic but does not understand dependencies between these resources. For instance, it cannot delete a role that still has attached policies, even though those policies are also scheduled for deletion:

ERRO[0373] DeleteConflict: Cannot delete entity, must detach all policies first.
status code: 409, request id: 9506e3fd-5945-48b3-9031-826d28d2abec

So you have to delete such resources manually from the CLI or the AWS console.

Integration with CI/CD

One of the best use cases for AWS-Nuke is integrating it into a CI/CD pipeline. You can use AWS-Nuke as part of your continuous integration (CI) process to clean up test environments after running tests. This ensures that no unnecessary resources remain after testing is complete, keeping your environment clean and your costs under control.

Here’s an example of how you might integrate AWS-Nuke into a CI/CD pipeline:

Run AWS-Nuke in dry-run mode before deploying a new version of your application to ensure there are no unnecessary resources left over from previous tests.
After the deployment and testing are complete, run AWS-Nuke in full mode to clean up all resources.
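One way to make the dry-run step useful in a pipeline is to parse the summary line and stop when the number of nukeable resources looks suspicious. The sketch below relies on the "Scan complete: ..." line format shown in the dry-run output earlier; the threshold policy and the names parseSummary and allowCleanup are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// summaryRe matches the summary line printed at the end of a dry run.
var summaryRe = regexp.MustCompile(`Scan complete: (\d+) total, (\d+) nukeable, (\d+) filtered\.`)

type scanSummary struct {
	Total, Nukeable, Filtered int
}

// parseSummary extracts the three counters from the aws-nuke output.
func parseSummary(output string) (scanSummary, error) {
	m := summaryRe.FindStringSubmatch(output)
	if m == nil {
		return scanSummary{}, fmt.Errorf("no scan summary found")
	}
	var n [3]int
	for i := range n {
		v, err := strconv.Atoi(m[i+1])
		if err != nil {
			return scanSummary{}, err
		}
		n[i] = v
	}
	return scanSummary{Total: n[0], Nukeable: n[1], Filtered: n[2]}, nil
}

// allowCleanup is a hypothetical pipeline policy: refuse to continue
// when the dry run wants to remove more resources than expected.
func allowCleanup(s scanSummary, maxNukeable int) bool {
	return s.Nukeable <= maxNukeable
}

func main() {
	s, err := parseSummary("Scan complete: 44 total, 13 nukeable, 31 filtered.")
	if err != nil {
		fmt.Println("cannot parse dry-run output:", err)
		return
	}
	fmt.Printf("%+v, cleanup allowed: %v\n", s, allowCleanup(s, 20))
}
```

In a pipeline, the dry-run job would feed the captured output into parseSummary and fail the build when allowCleanup returns false, before any job runs with --no-dry-run.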

Conclusion

AWS-Nuke is a powerful and versatile tool for cleaning up AWS environments. It helps prevent cost overruns, reduces security risks, and simplifies the management of AWS resources. Whether you’re managing a development environment, performing testing, or decommissioning an old project, AWS-Nuke provides an efficient way to automate the cleanup process.

However, AWS-Nuke also requires careful configuration to avoid deleting critical resources. Always use the dry-run mode to review deletions before making any changes, and ensure that your configuration file accurately reflects your environment.

In an age where cloud resources can grow exponentially, tools like AWS-Nuke are indispensable for keeping your infrastructure lean and efficient.

AWS CSM Mode: Advanced monitoring of AWS client

Roman Tsypuk — Mon, 22 Sep 2025 06:01:02 +0000

While exploring the boto3 core sources on GitHub, I found an interesting commit that enables a monitoring mode called CSM.
After exploring how it works, I decided to write a post, because this feature is extremely useful, and also to start writing the aws-client-monitor toolbox on top of it.

Introduction to AWS CSM (Client-Side Monitoring) Mode

AWS Client-Side Monitoring (CSM) is a powerful feature designed to track and analyze the performance of your AWS SDK calls. When enabled, it provides detailed metrics on API requests, response times,
and error rates, helping developers gain a deeper understanding of their application's behavior when interacting with AWS services. This is crucial for debugging, optimizing performance, and ensuring
that applications are running efficiently in production environments.

CSM mode works by capturing information about SDK API calls and sending that data to a local monitoring agent. It helps you:

Track API request latencies.
Identify high failure rates in SDK requests.
Gain visibility into the most frequently called AWS services.

In this blog post, we'll explore AWS CSM mode in more detail, look at common use cases, and provide Golang code snippets to demonstrate how to implement it.

Why Use AWS CSM Mode?

As applications become increasingly reliant on cloud services, monitoring and optimizing the performance of these interactions becomes critical. AWS SDKs are widely used to interface with AWS services
such as S3, DynamoDB, Lambda, and many others. However, managing and tracking these interactions can be challenging, especially when it comes to identifying latency issues or bottlenecks in the
communication between your application and AWS.

CSM provides a granular view of how SDK requests are performing. It allows you to gather metrics like:

Latency: How long each request takes.
Errors: Which AWS services are returning errors and why.
Request Frequency: Which services are being called the most.

This data is invaluable for performance tuning, debugging, and capacity planning.

Common Use Cases for AWS CSM Mode

Performance Optimization:

By tracking the latency of AWS service calls, you can identify the API requests that are taking the longest to execute. This can help you optimize the application's performance, whether through
caching, retries, or parallelizing requests.

Error Tracking:

If your application experiences frequent errors while interacting with AWS services, CSM can help identify the root cause. For example, if a specific AWS service is returning a large number of 5xx
errors, CSM will capture this information, enabling developers to troubleshoot quickly.

Capacity Planning:

Monitoring the number of requests made to AWS services can help forecast capacity needs and adjust resources accordingly. For example, if you're making a large number of requests to DynamoDB, it may
be time to scale your read/write capacity.

Debugging Production Issues:

When something goes wrong in production, CSM can provide critical insights into which AWS services or API calls are causing issues, allowing for quick resolution.

Setting Up AWS CSM in Golang

The AWS SDK for Go provides native support for client-side monitoring. To enable CSM, you need to configure the SDK to send data to the local CSM agent, which processes and forwards it to monitoring
tools like Amazon CloudWatch.

Step 1: Install the AWS SDK for Go

First, you need to install the AWS SDK for Go, if you haven't already:

go get -u github.com/aws/aws-sdk-go

Step 2: Enable CSM in the AWS SDK

To enable CSM in the AWS SDK for Go, you need to configure the environment variables that control CSM behavior, or you can do this programmatically within your application.

Here’s an example of how you can enable CSM using environment variables:

export AWS_CSM_ENABLED=true
export AWS_CSM_HOST=127.0.0.1
export AWS_CSM_PORT=31000
export AWS_CSM_CLIENT_ID=my-client-id

AWS_CSM_ENABLED: Enables or disables CSM.
AWS_CSM_HOST: The hostname where the CSM agent is running (usually localhost).
AWS_CSM_PORT: The port where the CSM agent is listening.
AWS_CSM_CLIENT_ID: A client identifier used to differentiate between clients.
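A listener can read the same variables, so that the client and the monitor agree on host and port. This is a minimal sketch; the struct and function names are my own, and the fallback values (127.0.0.1 and 31000) are assumptions that mirror the values exported above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// csmSettings mirrors the four AWS_CSM_* variables described above.
type csmSettings struct {
	Enabled  bool
	Host     string
	Port     int
	ClientID string
}

// loadCSM reads the settings through getenv, which makes it easy to test
// without touching the real process environment.
func loadCSM(getenv func(string) string) (csmSettings, error) {
	// Assumed fallbacks, matching the values used in this post.
	s := csmSettings{Host: "127.0.0.1", Port: 31000}
	if v := getenv("AWS_CSM_ENABLED"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return s, fmt.Errorf("AWS_CSM_ENABLED: %w", err)
		}
		s.Enabled = b
	}
	if v := getenv("AWS_CSM_HOST"); v != "" {
		s.Host = v
	}
	if v := getenv("AWS_CSM_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil || p < 1 || p > 65535 {
			return s, fmt.Errorf("AWS_CSM_PORT: invalid port %q", v)
		}
		s.Port = p
	}
	s.ClientID = getenv("AWS_CSM_CLIENT_ID")
	return s, nil
}

func main() {
	s, err := loadCSM(os.Getenv)
	if err != nil {
		fmt.Println("invalid CSM settings:", err)
		return
	}
	fmt.Printf("enabled=%v listen=%s:%d clientId=%q\n", s.Enabled, s.Host, s.Port, s.ClientID)
}
```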

Step 3: Run a Local UDP Listener for CSM Events

With CSM enabled, the client sends its monitoring events as UDP datagrams to the configured host and port. Here’s a simple listener in Golang that stands in for the monitoring agent: it receives the datagrams and prints them to the console.

package main

import (
  "fmt"
  "net"
)

// listenUDP receives CSM datagrams on the given port and forwards each one to ch.
func listenUDP(port int, ch chan<- []byte) {
  // Closing the channel lets the consumer stop if the listener fails.
  defer close(ch)

  // Bind to loopback only: CSM events can contain access key IDs and session tokens.
  addr := net.UDPAddr{
    Port: port,
    IP:   net.ParseIP("127.0.0.1"),
  }

  conn, err := net.ListenUDP("udp", &addr)
  if err != nil {
    fmt.Println("Error listening on UDP:", err)
    return
  }
  defer func() {
    if err := conn.Close(); err != nil {
      fmt.Println("Error closing UDP connection:", err)
    }
  }()

  buffer := make([]byte, 2048)
  for {
    n, _, err := conn.ReadFromUDP(buffer)
    if err != nil {
      fmt.Println("Error reading from UDP:", err)
      continue
    }
    // Copy the datagram before sending it: buffer is reused by the next read.
    msg := make([]byte, n)
    copy(msg, buffer[:n])
    ch <- msg
  }
}

// writeToConsole prints every received event until the channel is closed.
func writeToConsole(ch <-chan []byte) {
  for msg := range ch {
    fmt.Println("Received from channel:", string(msg))
  }
}

func main() {
  byteChannel := make(chan []byte)

  // Goroutine to listen on UDP and write to the channel
  go listenUDP(31000, byteChannel)

  // Read from the channel on the main goroutine; this keeps the process running.
  writeToConsole(byteChannel)
}

Step 4: Invoke an AWS API via aws-cli

aws s3 ls

Step 5: aws-client-monitor will display the calls

aws-cli lists the buckets as usual, but at the same time it sends 2 UDP messages to our server.

For each AWS API call there are 2 entities, ApiCallAttempt and ApiCall, with the following structure:

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCallAttempt",
  "Service": "S3",
  "Api": "ListBuckets",
  "Timestamp": 1728194484982,
  "AttemptLatency": 266,
  "Fqdn": "s3.eu-west-1.amazonaws.com",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "AccessKey": "ASIAWFOD4FPxxx",
  "Region": "eu-west-1",
  "SessionToken": "IQoJb3JpZxxx=",
  "HttpStatusCode": 200,
  "XAmzRequestId": "8K3P9AWACxxx",
  "XAmzId2": "vZDGgBpIwz6Jfxxx="
}

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCall",
  "Service": "S3",
  "Api": "ListBuckets",
  "Timestamp": 1728194484981,
  "AttemptCount": 1,
  "Region": "eu-west-1",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "FinalHttpStatusCode": 200,
  "Latency": 267,
  "MaxRetriesExceeded": 0
}
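To work with these events in code rather than as raw text, they can be decoded into a struct. The sketch below maps only a subset of the fields shown above; the type and function names are my own, and the sample payload is shortened from the ApiCall event in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csmEvent holds the fields of ApiCallAttempt and ApiCall events used here.
// Fields that exist only on one event type stay at their zero value on the other.
type csmEvent struct {
	Type                string `json:"Type"`
	Service             string `json:"Service"`
	API                 string `json:"Api"`
	Timestamp           int64  `json:"Timestamp"`
	Region              string `json:"Region"`
	AttemptLatency      int    `json:"AttemptLatency"`
	HTTPStatusCode      int    `json:"HttpStatusCode"`
	AttemptCount        int    `json:"AttemptCount"`
	Latency             int    `json:"Latency"`
	FinalHTTPStatusCode int    `json:"FinalHttpStatusCode"`
}

// describe turns one raw datagram into a one-line summary.
func describe(raw []byte) (string, error) {
	var e csmEvent
	if err := json.Unmarshal(raw, &e); err != nil {
		return "", err
	}
	switch e.Type {
	case "ApiCallAttempt":
		return fmt.Sprintf("%s.%s attempt: HTTP %d in %d ms",
			e.Service, e.API, e.HTTPStatusCode, e.AttemptLatency), nil
	case "ApiCall":
		return fmt.Sprintf("%s.%s call: HTTP %d in %d ms after %d attempt(s)",
			e.Service, e.API, e.FinalHTTPStatusCode, e.Latency, e.AttemptCount), nil
	default:
		return "", fmt.Errorf("unknown event type %q", e.Type)
	}
}

func main() {
	raw := []byte(`{"Version":1,"ClientId":"my-client-id","Type":"ApiCall","Service":"S3",
		"Api":"ListBuckets","Timestamp":1728194484981,"AttemptCount":1,"Region":"eu-west-1",
		"FinalHttpStatusCode":200,"Latency":267,"MaxRetriesExceeded":0}`)
	line, err := describe(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(line) // S3.ListBuckets call: HTTP 200 in 267 ms after 1 attempt(s)
}
```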

Tracking errors of aws API

We can also track errors returned by the AWS API. Let's try to create a bucket that already exists:

aws s3api create-bucket --bucket existing-bucket

An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

The same error message is also present in aws-client-monitor:

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCallAttempt",
  "Service": "S3",
  "Api": "CreateBucket",
  "Timestamp": 1728195105316,
  "AttemptLatency": 215,
  "Fqdn": "existing-bucket.s3.eu-west-1.amazonaws.com",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "AccessKey": "ASIAWFODxxx",
  "Region": "eu-west-1",
  "SessionToken": "IQoJxxx=",
  "HttpStatusCode": 400,
  "XAmzRequestId": "FPWTJWZC7114XQJE",
  "XAmzId2": "+G9yOxxx",
  "AwsException": "IllegalLocationConstraintException",
  "AwsExceptionMessage": "The unspecified location constraint is incompatible for the region specific endpoint this request was sent to."
}

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCall",
  "Service": "S3",
  "Api": "CreateBucket",
  "Timestamp": 1728195105312,
  "AttemptCount": 1,
  "Region": "eu-west-1",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "FinalHttpStatusCode": 400,
  "FinalAwsException": "IllegalLocationConstraintException",
  "FinalAwsExceptionMessage": "The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.",
  "Latency": 219,
  "MaxRetriesExceeded": 0
}

Metrics Extraction

The following metrics can be extracted on the UDP server side:

API call duration (latency)
History of app operations with AWS API (service name : operation)
Success and failure rates
Error messages
Region
UserAgent
IP
ClientId
AccessKey
Request payload size

We can use these insights to fine-tune our application's performance and catch potential bottlenecks or errors.
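As a rough sketch of how latency and failure rates could be derived from the collected events, the code below aggregates ApiCall events per "service:operation" pair. The stats type and the aggregate function are hypothetical names, and treating every final status of 400 or above as a failure is my own simplification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiCall holds the ApiCall fields needed for aggregation.
type apiCall struct {
	Type                string `json:"Type"`
	Service             string `json:"Service"`
	API                 string `json:"Api"`
	Latency             int    `json:"Latency"`
	FinalHTTPStatusCode int    `json:"FinalHttpStatusCode"`
}

// stats accumulates counters for one "service:operation" pair.
type stats struct {
	Calls, Failures, TotalLatencyMs int
}

// aggregate groups ApiCall events by service and operation.
// ApiCallAttempt events and undecodable payloads are skipped.
func aggregate(events [][]byte) map[string]stats {
	out := make(map[string]stats)
	for _, raw := range events {
		var c apiCall
		if err := json.Unmarshal(raw, &c); err != nil || c.Type != "ApiCall" {
			continue
		}
		key := c.Service + ":" + c.API
		s := out[key]
		s.Calls++
		s.TotalLatencyMs += c.Latency
		if c.FinalHTTPStatusCode >= 400 {
			s.Failures++
		}
		out[key] = s
	}
	return out
}

func main() {
	events := [][]byte{
		[]byte(`{"Type":"ApiCall","Service":"S3","Api":"ListBuckets","Latency":267,"FinalHttpStatusCode":200}`),
		[]byte(`{"Type":"ApiCall","Service":"S3","Api":"CreateBucket","Latency":219,"FinalHttpStatusCode":400}`),
	}
	for key, s := range aggregate(events) {
		fmt.Printf("%s calls=%d failures=%d avg=%dms\n", key, s.Calls, s.Failures, s.TotalLatencyMs/s.Calls)
	}
}
```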

Conclusion

AWS Client-Side Monitoring (CSM) is a valuable tool for gaining deep insights into how your application interacts with AWS services. It helps you monitor performance, identify errors, and optimize API
usage. By integrating CSM with the AWS SDK for Go, developers can track API calls and gather real-time metrics, ultimately improving application performance and reliability.

If you're working in a production environment where optimizing performance and troubleshooting issues is crucial, CSM is a feature worth enabling. With just a few configuration steps, you can gain
access to a wealth of data that will help you fine-tune your AWS-based applications.

CDC in AWS: Change Data Capture from AWS RDS MySQL into AWS MSK Kafka topic using Debezium

Roman Tsypuk — Mon, 22 Sep 2025 06:00:29 +0000

Kafka Connect is a powerful open-source platform for streaming data between Kafka and other systems, which makes it a natural fit for Change Data Capture (CDC): real-time event streaming from databases like MySQL.
In this post, we'll explore how to set up Debezium, one of the most popular open-source CDC connectors, to capture changes from an AWS MySQL RDS instance and publish them to a Kafka topic.
We'll also dive into the Debezium message format and explain the difference between standalone and cluster modes.

What problem does it solve

Debezium allows you to:

Stream real-time changes from a MySQL RDS instance.
Track inserts, updates, and deletes as they occur.
Publish these changes to Kafka topics for downstream processing.

This is particularly useful for building event-driven architectures, data pipelines, and synchronizing databases with other systems.

Besides Debezium, there are multiple open-source connectors available on the Confluent platform that provide integration points with different sink and source systems like AWS S3, ElasticSearch, MongoDB, etc.

Setting Up Debezium for MySQL RDS

1. Enable Binary Logging on MySQL RDS:

Log in to your RDS instance and ensure binary logging is enabled in your parameter group.
Configure the following parameters:

binlog_format = ROW
binlog_row_image = FULL
Ensure binlog_retention_period is set to a sufficient duration for your use case.

SHOW VARIABLES LIKE 'binlog_format';

Variable_name	Value
binlog_format	ROW

SHOW VARIABLES LIKE 'log_bin%';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';

SET SESSION binlog_format = 'ROW';
SET GLOBAL binlog_format = 'ROW';
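The result of these SHOW VARIABLES queries can be checked automatically before a connector is created. The sketch below works on a plain map of variable names to values, so it does not depend on any MySQL driver; the function name checkBinlog is my own, and including log_bin = ON in the list is an assumption based on binary logging having to be enabled.

```go
package main

import (
	"fmt"
	"strings"
)

// required lists the settings checked before starting the connector.
var required = []struct{ Name, Want string }{
	{"log_bin", "ON"},
	{"binlog_format", "ROW"},
	{"binlog_row_image", "FULL"},
}

// checkBinlog compares the reported variables with the required values
// and returns a human-readable list of problems (empty when all is fine).
func checkBinlog(vars map[string]string) []string {
	var problems []string
	for _, r := range required {
		got, ok := vars[r.Name]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: not reported", r.Name))
		case !strings.EqualFold(got, r.Want):
			problems = append(problems, fmt.Sprintf("%s: got %s, want %s", r.Name, got, r.Want))
		}
	}
	return problems
}

func main() {
	// Values as they could be returned by the SHOW VARIABLES queries above.
	vars := map[string]string{
		"log_bin":          "ON",
		"binlog_format":    "MIXED",
		"binlog_row_image": "FULL",
	}
	for _, p := range checkBinlog(vars) {
		fmt.Println("problem:", p) // problem: binlog_format: got MIXED, want ROW
	}
}
```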

3. Run Debezium with Kafka Connect:

Use Docker Compose to start Debezium and Kafka Connect:

services:
  debezium:
    image: debezium/connect:2.7.3.Final
    ports:
      - "8083:8083"
    healthcheck:
      test: [ "CMD-SHELL", "curl -f http://localhost:8083/ || exit 1" ]
    networks:
      - kafka-cluster
    environment:
      - BOOTSTRAP_SERVERS=b-4.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092,b-3.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092,b-1.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092
      - GROUP_ID=MYSQL_1
      - CONFIG_STORAGE_TOPIC=debezium_connect_configs
      - OFFSET_STORAGE_TOPIC=debezium_connect_offsets
      - STATUS_STORAGE_TOPIC=debezium_source_connect_statuses
      - CONFIG_STORAGE_REPLICATION_FACTOR=1
      - OFFSET_STORAGE_REPLICATION_FACTOR=1
      - STATUS_STORAGE_REPLICATION_FACTOR=1
networks:
  kafka-cluster:
    driver: bridge

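CONFIG_STORAGE_TOPIC, OFFSET_STORAGE_TOPIC, and STATUS_STORAGE_TOPIC are the Kafka topics where the connector stores its own configuration, offsets, and status, and through which the workers synchronize.
CONFIG_STORAGE_REPLICATION_FACTOR, OFFSET_STORAGE_REPLICATION_FACTOR, and STATUS_STORAGE_REPLICATION_FACTOR are set to 1 in this setup; on a cluster with several brokers, a higher replication factor (commonly 3) protects these topics against the loss of a broker.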

4. Detect server_id that will be used in connector configuration

SHOW VARIABLES LIKE 'server_id';

5. Configure the MySQL Connector

Kafka connect exposes HTTP port to establish and monitor connectors.

Create a MySQL source connector by sending a POST request to the Kafka Connect REST API:

curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @../payload.json

{
  "name": "mysql-moderation-comments-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "xxxx.us-east-1.rds.amazonaws.com",
    "database.port": "3306",
    "database.user": "user",
    "database.password": "password",
    "database.server.id": "this_is_mysql_server_id",
    "topic.prefix": "mysql-cdc",
    "database.include.list": "public",
    "table.include.list": "public.users",
    "schema.history.internal.kafka.bootstrap.servers": "b-4.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092,b-3.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092,b-1.kafka-dev-0.xxxx.c6.kafka.us-east-1.amazonaws.com:9092",
    "schema.history.internal.kafka.topic": "schema-changes.moderation",
    "include.schema.changes": true,
    "key.converter.schemas.enable": false,
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": false,
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "schema.history.internal.skip.unparseable.ddl": true
  }
}

Parameter	Details
connector.class	Class name of the connector, loaded from the Java classpath of the running container. Pre-built connectors ("plugins") are available; we can also add any existing plugin to the container or write our own.
tasks.max	Only 1 task should be running at a time, to preserve the order in which the binlog is handled. The Kafka Connect service uses connectors to start one or more tasks that do the actual work, and distributes running tasks across the cluster of Kafka Connect services. If any of the services stops or crashes, its tasks are redistributed to the other running services.
database.hostname	RDS endpoint IP or DNS name
database.port	MySQL port (3306 in this example)
database.user	MySQL user the connector connects as
database.password	Password of that user
database.server.id	Numeric ID under which the connector joins MySQL as a replication client. It must be unique within the MySQL cluster, so pick a value that differs from the server_id of the database itself.
database.server.name	Logical ID of the server or cluster of services, used as prefix for all Kafka topics (Debezium 1.x; replaced by topic.prefix in 2.x)
topic.prefix	Prefix added to the Kafka topic names to distinguish them from other existing topics ("mysql-cdc" in the config above)
database.include.list	Comma-separated list of databases whose changes should be captured
schema.history.internal.kafka.topic	The connector puts all DDL statements into this topic while reading the binlog. On restart, the connector recovers the schema of the database as it existed at that point in time.
key.converter	Class name of the converter used for the event key
value.converter	Class name of the converter used for the event value
table.include.list	Comma-separated list of tables to capture, in database.table form
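After the connector is created, its state can be read from the Kafka Connect REST API with GET /connectors/<name>/status. The sketch below only parses such a response; the sample payload and the worker address are made up for illustration, and healthy is a hypothetical helper, not part of Kafka Connect.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connectorStatus models the parts of the status response used here.
type connectorStatus struct {
	Name      string `json:"name"`
	Connector struct {
		State string `json:"state"`
	} `json:"connector"`
	Tasks []struct {
		ID    int    `json:"id"`
		State string `json:"state"`
		Trace string `json:"trace"`
	} `json:"tasks"`
}

// healthy reports whether the connector and all of its tasks are RUNNING.
// A connector without any task is treated as not healthy.
func healthy(raw []byte) (bool, error) {
	var st connectorStatus
	if err := json.Unmarshal(raw, &st); err != nil {
		return false, err
	}
	if st.Connector.State != "RUNNING" {
		return false, nil
	}
	for _, t := range st.Tasks {
		if t.State != "RUNNING" {
			return false, nil
		}
	}
	return len(st.Tasks) > 0, nil
}

func main() {
	// Made-up sample response for illustration.
	raw := []byte(`{"name":"mysql-moderation-comments-connector",
		"connector":{"state":"RUNNING","worker_id":"10.0.0.1:8083"},
		"tasks":[{"id":0,"state":"RUNNING","worker_id":"10.0.0.1:8083"}],
		"type":"source"}`)
	ok, err := healthy(raw)
	if err != nil {
		fmt.Println("cannot parse status:", err)
		return
	}
	fmt.Println("connector healthy:", ok) // connector healthy: true
}
```

A failed task usually carries a stack trace in the trace field, which is the first place to look when the connector stops producing events.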

6. Export from MySQL table

Once the connector is set up, it creates a snapshot of the existing data and ingests it into Kafka. After that, the connector monitors the binlog for new records and updates.

debezium-1 | INFO   MySQL|mysql-cdc|snapshot    Exported 496538 of 548957 records for table 'public.users' after 00:41:53.279   [io.debezium.relational.RelationalSnapshotChangeEventSource]
debezium-1 | INFO   MySQL|mysql-cdc|snapshot    Exported 605082 of 548957 records for table 'public.users' after 00:53:07.497   [io.debezium.relational.RelationalSnapshotChangeEventSource]
debezium-1 | INFO   MySQL|mysql-cdc|snapshot    Finished exporting 605274 records for table 'public.users' (1 of 1 tables); total duration '00:53:07.522'   [io.debezium.relational.RelationalSnapshotChangeEventSource]
debezium-1 | INFO   MySQL|mysql-cdc|snapshot  Releasing table read lock to enable MySQL writes   [io.debezium.connector.binlog.BinlogSnapshotChangeEventSource]
debezium-1 | INFO   MySQL|mysql-cdc|snapshot  Writes to MySQL tables prevented for a total of 00:53:13.755   [io.debezium.connector.binlog.BinlogSnapshotChangeEventSource]
debezium-1 | INFO   ||  WorkerSourceTask{id=mysql-moderation-comments-connector-0} Committing offsets for 10242 acknowledged messages   [org.apache.kafka.connect.runtime.WorkerSourceTask]

7. Verify Kafka Topic

Use kafka-console-consumer to check the topic for messages:

kafka-console-consumer --bootstrap-server localhost:9092 --topic mysql-cdc.your_database.your_table --from-beginning

Debezium Data Format

Debezium emits messages to Kafka in a JSON format with two main parts:

key (Identifies the specific database row.)
value (Contains the actual change event, with fields like before, after, op, ts_ms, etc.)

Fields:

before: State of the row before the change.
after: State of the row after the change.
op: Type of operation (c for create, u for update, d for delete).
source: Metadata about the event source.
ts_ms: Timestamp of the event.

{
  "op": "u",
  "source": {
    ...
  },
  "ts_ms": "...",
  "ts_us": "...",
  "ts_ns": "...",
  "before": {
    "userid": "1",
    "name": "bob"
  },
  "after": {
    "userid": "1",
    "name": "alice"
  }
}
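On the consumer side, the envelope can be decoded and the before and after images compared to see which columns changed. This is a minimal sketch, assuming the schema is excluded from the payload (schemas.enable=false) so that the message value is the envelope itself; changedFields is a hypothetical helper name, and the sample event reuses the values from the example above with an illustrative timestamp.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// changeEvent models the envelope fields described above.
type changeEvent struct {
	Op     string                 `json:"op"`
	Before map[string]interface{} `json:"before"`
	After  map[string]interface{} `json:"after"`
	TsMs   int64                  `json:"ts_ms"`
}

// changedFields returns the operation code and the sorted names of the
// columns whose value in "after" is new or differs from "before".
// For a delete, "after" is null, so the list is empty.
func changedFields(raw []byte) (string, []string, error) {
	var e changeEvent
	if err := json.Unmarshal(raw, &e); err != nil {
		return "", nil, err
	}
	var changed []string
	for name, after := range e.After {
		before, existed := e.Before[name]
		if !existed || !reflect.DeepEqual(before, after) {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return e.Op, changed, nil
}

func main() {
	raw := []byte(`{"op":"u","ts_ms":1728194484982,
		"before":{"userid":"1","name":"bob"},
		"after":{"userid":"1","name":"alice"}}`)
	op, changed, err := changedFields(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(op, changed) // u [name]
}
```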

Hide schema from payload

There are multiple configuration options and data formats that allow you to transform the event, hide fields that are not needed, register custom Single Message Transforms (SMTs), etc.

Here we are instructing Debezium to exclude schema from the payload:

debezium.source.value.converter=org.apache.kafka.connect.json.JsonConverter
debezium.source.value.converter.schemas.enable=false

Conclusion

Debezium is an excellent choice for capturing data changes in real time from MySQL RDS and streaming them into Kafka.
Its support for schema change tracking, rich message format, and scalability make it ideal for modern data pipelines.

Whether you choose standalone mode for simplicity or cluster mode for fault tolerance, Debezium provides the flexibility to meet your needs.

Remote Debugging/Profiling NodeJS app running in private ECS cluster using AWS SSM and inspect protocol

Roman Tsypuk — Mon, 22 Sep 2025 05:59:41 +0000

Debugging a Node.js application running inside a container within an ECS cluster can be daunting. The challenge becomes especially significant in restricted environments like AWS ECS, where you lack direct control over the running workloads. Traditional debugging methods, such as SSH or exposing ports, are often not viable in these setups.

AWS provides a secure and efficient solution through AWS Systems Manager (SSM). By leveraging SSM documents and port forwarding, you can securely establish connections to your Node.js containers and expose the debugger port without compromising the security of your infrastructure.

In this post, we will walk through the steps to set up and perform remote debugging of a Node.js container using SSM.

Debugging

The Node.js debugging process using the --inspect flag allows developers to attach a debugging client (such as Chrome DevTools) to the application and analyze its behavior in real-time.

Node.js uses the Chrome DevTools Protocol (CDP) for debugging.

Capabilities of Node.js Inspector:

Breakpoints: Pause execution at specific lines to inspect the state of variables and objects.
Heap Analysis: Capture and analyze snapshots of memory usage to identify leaks.
Call Stack: View the sequence of function calls leading to a particular point in execution.
Performance Profiling: Measure the performance of code execution and detect bottlenecks.
Step Execution: Step through code one line at a time to observe how the application progresses.

If successful, you’ll see output indicating that the port has been forwarded and the session is waiting for connections.

AWS ECS specifics and limitations for debugger setup

The main challenge in remote debugging is enabling port-forwarding between the application runtime and the machine where the debugger (e.g., Chrome DevTools) is running. This process is straightforward in local environments but complex in cloud-managed services like ECS.

Key challenges include:

Lack of direct access to the underlying infrastructure: ECS manages the container instances, making traditional SSH-based debugging methods impossible.
Maintaining security: Exposing debugger ports publicly can introduce vulnerabilities.
Securely forwarding ports: Ensuring the debugger port is accessible only to authorized clients while retaining private IP configurations.

AWS SSM addresses these issues by allowing secure execution of commands and port forwarding on remote workloads, providing a robust solution for debugging applications running in ECS.

Step 1: Prepare the Node.js Application

To debug the Node.js application, it must be started with the --inspect flag enabled. This exposes the debugging interface on a specified port.

Modify the ECS task definition to include the following command:

"command": ["node", "--inspect=0.0.0.0:9229", "app.js"]

/usr/src/node-red # node --inspect index.js
Debugger listening on ws://127.0.0.1:9229/9d82bc40-6fe9-4148-b1e8-8089b3351b9f
For help, see: https://nodejs.org/en/docs/inspector
Data
Starting memory leak simulation...
Memory usage: 3 MB
Memory usage: 3 MB

Here is a test sample application that emulates memory leak for analysis:

const memoryLeakArray = [];

function createLeak() {
  const leakObject = {
    timestamp: new Date(),
    data: Buffer.alloc(1024 * 1024) // Allocate 1MB of memory
  };
  memoryLeakArray.push(leakObject);
}

function printMemoryUsage() {
  const used = process.memoryUsage();
  console.log(`Memory usage: ${Math.round(used.heapUsed / 1024 / 1024)} MB`);
}

async function runLoop() {
  while (true) {
    createLeak();
    printMemoryUsage();
    await new Promise(resolve => setTimeout(resolve, 1000)); // 1 second delay
  }
}

console.log("Starting memory leak simulation...");
runLoop();

After making this change, redeploy the ECS task.

Step 2: Use SSM to Connect to the Container

AWS provides several predefined Systems Manager documents (SSM documents) for various tasks, and AWS-StartPortForwardingSession is among them. This document allows secure port forwarding from an AWS resource (such as an ECS container) to your local machine.

To forward the Node.js debugging port (9229) to your local machine, run:

aws ssm start-session \
--target ecs:<cluster-name>_<task-id>_<container-runtime-id> \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["9229"], "localPortNumber":["9229"]}'

Starting session with SessionId: xxxx-xxxx
Port 9229 opened for sessionId xxxx-xxxx.
Waiting for connections...

Connection accepted for session [xxxx-xxxx]
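The target string and the JSON parameters are easy to get wrong when typed by hand, so they can be generated instead. The sketch below only builds the two arguments for the command above; the cluster name, task ID, and runtime ID in main are placeholders, and both helper names are my own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// ecsTarget builds the --target value in the form used above:
// ecs:<cluster-name>_<task-id>_<container-runtime-id>
func ecsTarget(cluster, taskID, runtimeID string) string {
	return fmt.Sprintf("ecs:%s_%s_%s", cluster, taskID, runtimeID)
}

// portForwardParams builds the --parameters JSON for
// AWS-StartPortForwardingSession; port numbers are passed as strings.
func portForwardParams(remotePort, localPort int) (string, error) {
	params := map[string][]string{
		"portNumber":      {strconv.Itoa(remotePort)},
		"localPortNumber": {strconv.Itoa(localPort)},
	}
	b, err := json.Marshal(params)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder identifiers; replace them with the values of your task.
	target := ecsTarget("my-cluster", "0123456789abcdef", "0123456789abcdef-1234567890")
	params, err := portForwardParams(9229, 9229)
	if err != nil {
		fmt.Println("cannot build parameters:", err)
		return
	}
	fmt.Printf("aws ssm start-session --target %s --document-name AWS-StartPortForwardingSession --parameters '%s'\n",
		target, params)
}
```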

Step 3: Debug the Application

Open Chrome and navigate to chrome://inspect.

Under “Remote Target,” locate the forwarded debugging session and click “Inspect.”

By default, Chrome looks for targets on port 9229, but you can configure any custom port that you chose for debugging.
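If the target does not show up in Chrome, the tunnel can be checked without a browser: the Node.js inspector also answers plain HTTP requests on the debug port, and /json/list returns the available targets. This is a small sketch under the assumption that the port is forwarded to 127.0.0.1:9229; the struct maps only a few fields of the response, and parseTargets is my own helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// inspectTarget holds a subset of the fields returned by /json/list.
type inspectTarget struct {
	ID                   string `json:"id"`
	Title                string `json:"title"`
	Type                 string `json:"type"`
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseTargets decodes the JSON array returned by the inspector.
func parseTargets(raw []byte) ([]inspectTarget, error) {
	var targets []inspectTarget
	if err := json.Unmarshal(raw, &targets); err != nil {
		return nil, err
	}
	return targets, nil
}

func main() {
	// Assumes the SSM session forwards the debug port to this address.
	resp, err := http.Get("http://127.0.0.1:9229/json/list")
	if err != nil {
		fmt.Println("inspector not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("cannot read response:", err)
		return
	}
	targets, err := parseTargets(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	for _, t := range targets {
		fmt.Printf("%s (%s): %s\n", t.Title, t.Type, t.WebSocketDebuggerURL)
	}
}
```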

Use Chrome DevTools to analyze and debug the Node.js process:

At the same time, the console output of the application shows a message that the remote debugger is attached.

Memory usage: 3 MB
Debugger attached.
Memory usage: 3 MB
Memory usage: 3 MB
Memory usage: 3 MB

Monitor the event loop:

Analyze heap snapshots to identify memory leaks:

Step through code execution to pinpoint issues:

Conclusion

The beauty of this solution is its high level of security: the debugging port does not need to be exposed publicly, and the tunnel is initiated dynamically between your machine and the container.
Even with no inbound or outbound security group rules on the cluster (a fully private, isolated one), we can still access the workloads.
And access to such private networks without additional network configuration decreases operational overhead.

Amazon ElastiCache Redis as a Vector Embeddings Storage for Semantic Search in AWS Community Blog posts

Roman Tsypuk — Thu, 18 Sep 2025 09:08:37 +0000

Abstract

The AWS Community Builders program has produced an enormous trove of insightful blog content over the years. These
posts, authored by community members across the globe, capture deep technical expertise, innovative use cases, and
countless hours of experience.
But as this body of content grows, finding related information becomes increasingly difficult using traditional keyword
search.

Challenge

To tackle this challenge, I experimented with using Amazon ElastiCache for Redis as a vector store to power semantic
search across all blog posts published since 2021.

This blog post outlines how I vectorized blog content, stored embeddings in Redis, and leveraged K-Nearest Neighbors
(KNN) to find content semantically related to a phrase or topic of interest, discover and cluster posts, identify
trends, etc.

Why Use Redis for Vector Search?

Traditionally, Redis is used for caching and fast key-value operations. However, with the introduction of the Redis
Vector Similarity Search (VSS) module, Redis can now be used as a high-performance vector database. This makes it a
powerful choice when you need millisecond-latency search over large datasets of embeddings, such as blog post vectors.

Benefits of Redis VSS:

Speed: In-memory performance with extremely low latency.
Simplicity: Store and retrieve vectors using Redis CLI or SDKs.
Scalability: Redis clusters on ElastiCache scale with demand.
Integration: Easy to integrate with Python NLP libraries and AWS services.

What is an Embedding and How Does it Work?

An embedding is a numerical representation of text that captures its meaning in a multi-dimensional space. Words,
sentences, or even entire documents are transformed into fixed-size vectors of floating-point numbers using models like
BERT or Sentence Transformers.

These vectors are constructed such that semantically similar content lies close together in vector space. For example,
posts discussing "AWS re:Invent keynote" and "major AWS announcements" would have similar embeddings even if they use
different words.
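
To make "close together in vector space" concrete, here is a minimal sketch of cosine similarity. The three-dimensional vectors are made-up stand-ins for illustration only; real embeddings from a model have hundreds of dimensions and different values.

```typescript
// Dot product of two vectors of equal dimension.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Hypothetical toy vectors, NOT real model output.
const keynote = [0.9, 0.1, 0.0];       // "AWS re:Invent keynote"
const announcements = [0.8, 0.2, 0.1]; // "major AWS announcements"
const recipe = [0.0, 0.1, 0.9];        // an unrelated text

console.log(cosineSimilarity(keynote, announcements)); // close to 1
console.log(cosineSimilarity(keynote, recipe));        // close to 0
```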

Target Architecture:

Each blog post description and content was passed through a transformer-based model.
The output vector (e.g., 384 dimensions) was stored in Redis.
When a query is made (e.g., "golden jacket"), it is also converted into an embedding.
Redis performs a KNN search, returning the closest blog vectors by cosine similarity or Euclidean distance.
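
Conceptually, the last step is a nearest-neighbor lookup. The sketch below shows a brute-force KNN over an in-memory array, which is the general idea behind an exhaustive (FLAT) search. It is an illustration under assumptions, not how Redis implements it internally; the `StoredPost` type and the toy data are hypothetical. Squared Euclidean distance is used because it produces the same ranking as the plain distance.

```typescript
interface StoredPost {
  id: string;
  embedding: number[];
}

// Squared Euclidean (L2) distance: lower means closer.
function squaredL2(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Compare the query against every stored vector and keep the k closest.
function knn(
  query: number[],
  posts: StoredPost[],
  k: number
): { id: string; score: number }[] {
  return posts
    .map((post) => ({ id: post.id, score: squaredL2(query, post.embedding) }))
    .sort((x, y) => x.score - y.score)
    .slice(0, k);
}

// Hypothetical two-dimensional toy data.
const toyPosts: StoredPost[] = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0, 1] },
  { id: "c", embedding: [0.9, 0.1] },
];

console.log(knn([1, 0], toyPosts, 2).map((hit) => hit.id)); // [ 'a', 'c' ]
```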

Prepare schema in Redis

Once I had collected the raw data, I created an index in Redis. All documents are ingested as JSON documents
that also have a postDescriptionEmbeddings field defined as SchemaFieldTypes.VECTOR.

This vector is calculated with the same model that is later used for the search.

When creating the index, the dimension must exactly match the output dimension of the model used for the embeddings.

Since Xenova/all-MiniLM-L6-v2 was used, the dimension is 384. You can try any other model to check whether accuracy improves,

but the vector size of an existing index cannot be changed, so creating a new index and re-ingesting all data will be required.
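
Because a dimension mismatch only surfaces at ingestion or query time, a small guard can help catch it early. This is a minimal sketch; `INDEX_DIM` and `assertEmbeddingDimension` are hypothetical names introduced here, not part of the original project.

```typescript
// DIM declared for the index in this post (384 for Xenova/all-MiniLM-L6-v2).
const INDEX_DIM = 384;

// Throws if an embedding does not match the dimension the index expects.
function assertEmbeddingDimension(
  embedding: number[],
  expected: number = INDEX_DIM
): void {
  if (embedding.length !== expected) {
    throw new Error(
      `embedding has ${embedding.length} dimensions, index expects ${expected}`
    );
  }
}

// Passes silently for a vector of the right size.
assertEmbeddingDimension(new Array<number>(384).fill(0));
```

Calling it right before each write keeps a model swap from silently producing vectors the index cannot use.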

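/*    (RAW COMMAND)
          FT.DROPINDEX idx:posts

          FT.CREATE idx:posts
          ON JSON
              PREFIX 1 "posts:"
          SCHEMA
              $.postDescriptionEmbeddings AS postDescriptionEmbeddings
                  VECTOR FLAT 10 TYPE FLOAT32 DIM 384 DISTANCE_METRIC L2 INITIAL_CAP 111 BLOCK_SIZE 111
              $.postDescription AS postDescription TEXT NOSTEM SORTABLE
              $.link AS link TEXT NOSTEM SORTABLE
*/
import { SchemaFieldTypes, VectorAlgorithms } from 'redis';

// The same index created through the node-redis (v4) client.
await nodeRedisClient.ft.create(
  'idx:posts',
  {
    '$.postDescriptionEmbeddings': {
      type: SchemaFieldTypes.VECTOR,
      TYPE: 'FLOAT32',
      ALGORITHM: VectorAlgorithms.FLAT, // exhaustive (brute-force) search
      DIM: 384, // must match the embedding model's output size
      DISTANCE_METRIC: 'L2',
      INITIAL_CAP: 111,
      BLOCK_SIZE: 111,
      AS: 'postDescriptionEmbeddings',
    },
    '$.postDescription': {
      type: SchemaFieldTypes.TEXT,
      NOSTEM: true,
      SORTABLE: true,
      AS: 'postDescription',
    },
    '$.link': {
      type: SchemaFieldTypes.TEXT,
      NOSTEM: true,
      SORTABLE: true,
      AS: 'link',
    },
  },
  { ON: 'JSON', PREFIX: 'posts:' }
);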

Ingesting Data into Vector-based storage

I chose Xenova/all-MiniLM-L6-v2 for embedding and vectorization. Once the model is initialized, every post in the
processing loop
is passed through it to augment the blog post with one more field, postDescriptionEmbeddings.

let modelName = "Xenova/all-MiniLM-L6-v2";
let pipe = await transformers.pipeline("feature-extraction", modelName);

const strippedText = stripHTML(post.description);
post["_id"] = post.id
post["postDescription"] = strippedText;

const sentenceEmbedding = await generateSentenceEmbeddings(strippedText, pipe);
post["postDescriptionEmbeddings"] = sentenceEmbedding;


await nodeRedisClient.json.set(`${POSTS_KEY_PREFIX}:${post.id}`, "$", {
  ...post
});

async function generateSentenceEmbeddings(_sentence, pipe): Promise<number[]> {
  let vectorOutput = await pipe(_sentence, {
    pooling: "mean",
    normalize: true,
  });

  const embeddings: number[] = Object.values(vectorOutput?.data);
  return embeddings;
}

Ingested data in Redis:

Using redis-cli, we can run JSON.GET on any document and explore its fields:

JSON.GET posts:0a1a9e88-32e3-43af-86ce-c70b27ef9482

Besides the standard data, it now also contains the postDescriptionEmbeddings vector (truncated here rather than showing all 384 values):

{
  "link": "https://dev.classmethod.jp/articles/awsbasics-s3-tables-athena-apache-iceberg/",
  "createdAt": "2025-07-20T02:02:36.821Z",
  "postDescription": "This article introduces AWS S3 Tables, a new service simplifying Apache Iceberg table management on S3. It shows creating tables via console/CLI, CRUD operations with Athena, and highlights automatic optimization, ACID transactions, and seamless integration.",
  "postDescriptionEmbeddings": [
    -0.05445345863699913,
    -0.03639936074614525,
    -0.09256941080093384,
    -0.02023949660360813,
    0.016897333785891533,
    -0.05080459639430046,
    -0.047063130885362625,
    ...

    0.05633614584803581,
    0.030490988865494728,
    -0.035038333386182785,
    -0.014955290593206882,
    0.03450820595026016
  ]
}

Congratulations! Now we have a digital footprint of all CloudBuilders blog posts.

Algorithms to search vectors

Now, using the same model Xenova/all-MiniLM-L6-v2, we can convert any search phrase into a vector:

const nodeRedisClient = getNodeRedisClient();
let modelName = "Xenova/all-MiniLM-L6-v2";
let pipe = await transformers.pipeline("feature-extraction", modelName);

const searchTxtVectorArr = await generateSentenceEmbeddings(searchTxt, pipe);

Here is an example: the search phrase Golden Jacket, transformed by the embedding model Xenova/all-MiniLM-L6-v2 into a vector of dimension 384, looks like this:

[
      -0.0519644059240818,    0.08905413746833801,    0.01799578033387661,
     0.049779586493968964,   0.035871658474206924,   0.020033041015267372,
      0.18083901703357697,   -0.05256553739309311,  0.0003853921080008149,
     -0.04665731266140938,    0.01234174519777298,   -0.02134784311056137,
      0.03652122616767883,  0.0066316151060163975,   0.001835306640714407,
      -0.0348852276802063,   0.026533348485827446,  -0.008266004733741283,
     0.023030489683151245,  -0.025720935314893723,   -0.04542113095521927,
   -0.0034457785077393055,   -0.00999429076910019,    0.05410130321979523,
     -0.07294181734323502,    0.04549195244908333,   -0.03297987952828407,
     0.039497267454862595,   -0.05792579427361488,   -0.08336731791496277,
     0.028136223554611206, -0.0004739225551020354,   -0.03645763173699379,
      0.02329356223344803,    -0.1444464474916458,   -0.02707810141146183,
     0.033310066908597946,  0.0071496316231787205,   -0.11640160530805588,
      0.10475143790245056,    -0.0539717860519886,  -0.039327893406152725,
     -0.08680122345685959,   -0.03737214207649231,   -0.02752811089158058,
     0.005382977891713381,  -0.037658315151929855,  -0.009982343763113022,
    -0.014092876575887203,     0.0367547869682312,   -0.07399916648864746,
     0.008599113672971725,   -0.07765820622444153,   0.013821756467223167,
      0.06757687777280807,    0.09657421708106995,  -0.008364601992070675,
     -0.05157032981514931,    -0.0255973469465971,   0.030314497649669647,
    -0.036268994212150574,   0.004683461040258408,   -0.05857793241739273,
     0.009458839893341064,     0.0592721663415432,     0.0403432659804821,
     -0.06441762298345566,    0.05017191544175148,   -0.05505743995308876,
     -0.04220070689916611,    0.05020679160952568,    0.02721044048666954,
  0.000019773140593315475,  0.0006230986909940839, -0.0004238316905684769,
       0.0330372080206871,  -0.040411293506622314,    0.02342337928712368,
      0.09403083473443985,  -0.030885351821780205,    0.03300361707806587,
     -0.06219857931137085,   -0.06710682064294815,    0.06534728407859802,
      0.02729918621480465,  -0.021849708631634712,    0.04849373549222946,
     -0.04060141369700432,   -0.01779020205140114,   0.015026750974357128,
      0.03931884467601776,  -0.016998404636979103,   0.020048541948199272,
      0.07344327867031097,  -0.027000991627573967,  -0.013987813144922256,
      0.07202176004648209,   -0.10249660909175873,   -0.07182241231203079,
        0.177199587225914,
  ... 284 more items
]

With the search vector ready and all post descriptions vectorized and stored in Redis, we use KNN to find the nearest neighbors:

const searchQuery =
  `*=>[KNN ${_resultCount} @postDescriptionEmbeddings $searchBlob AS score]`;
results = await nodeRedisClient.ft.search(POSTS_INDEX_KEY, searchQuery, {
  PARAMS: {
    "searchBlob": float32Buffer(searchTxtVectorArr)
  },
  RETURN: ["score", "description", "createdAt", "region", "link"],
  SORTBY: {
    BY: "score",
  },
  DIALECT: 2,
});
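
The search call above relies on a float32Buffer helper that is not shown in the post. A minimal sketch of what such a helper might look like follows, assuming a Node.js runtime on a little-endian platform (the byte order expected for FLOAT32 query vectors); the actual implementation in the project may differ.

```typescript
// Packs a number[] into raw 32-bit floats (4 bytes per value),
// which is the binary form passed as the KNN query parameter.
function float32Buffer(values: number[]): Buffer {
  return Buffer.from(new Float32Array(values).buffer);
}

// A 384-dimension embedding would therefore occupy 384 * 4 = 1536 bytes.
console.log(float32Buffer([0.5, -0.25]).length); // 8
```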

Let's find CloudBuilders posts on a particular topic

Searching for topics relevant to Golden Jacket returned the following output:

total: the number of documents found, capped by the requested limit
score: the distance between the query vector and each returned document's vector (lower means closer)

{
  "total": 5,
  "documents": [
    {
      "id": "posts:79256de8-00aa-4127-b025-df410fb63414",
      "value": {
        "score": "1.01139831543",
        "description": "Although I cannot have a AWS 100% fully certified Golden Jacket now, I still buy myself a 25% fully certified Golden Hat 👒 for a little FUN 🤩 Never too late to learn Amazon Web Services (AWS) ! Start your cloud journeys TODAY ! You will become Fully Certified in the future.",
        "link": "https://www.linkedin.com/posts/walterwlee_aws-cloud-future-activity-7020547672751169536-T3r8?utm_source=share&utm_medium=member_desktop"
      }
    },
    {
      "id": "posts:e05e61f3-1cd9-4483-98c7-ad3681a1ea00",
      "value": {
        "score": "1.03479576111",
        "description": "Thanks a lot to the Amazon Web Services (AWS) community builder jacket ! 😀 very warm and high quality with YKK zipper 🥰",
        "link": "https://www.linkedin.com/posts/walterwlee_aws-awscommunity-awscommunitybuilders-activity-7010055038479859712-ahY-?utm_source=share&utm_medium=member_desktop"
      }
    },
    {
      "id": "posts:2f63f16d-8080-4e27-baed-f5c582617106",
      "value": {
        "score": "1.07405209541",
        "description": "A blog post sharing my experience doing the AWS certifications for the Golden Jacket, for others who might be interested in going through the same experience.",
        "link": "https://chamila.dev/blog/2024-10-28_completing-11-professional-certifications-in-8-weeks/"
      }
    },
    {
      "id": "posts:554df865-951d-4937-abad-5ef84f6e67fe",
      "value": {
        "score": "1.12301290035",
        "description": "How you can achieve all AWS certifications and get the golden jacket - Part 1 ",
        "link": "https://dev.to/electrokat/how-you-can-achieve-all-aws-certifications-and-get-the-golden-jacket-part-1-16dp"
      }
    },
    {
      "id": "posts:9a772e6c-f3fe-419a-afe2-b5b0038dbdcb",
      "value": {
        "score": "1.2975268364",
        "description": "How you can achieve all AWS certifications and get the golden jacket - Part 2 - How to study ",
        "link": "https://dev.to/electrokat/how-you-can-achieve-all-aws-certifications-and-get-the-golden-jacket-part-2-how-to-study-2b4c"
      }
    }
  ]
}

search results for "Golden Jacket":

Score	Details
1.01139831543	Although I cannot have a AWS 100% fully certified Golden Jacket now, I still buy myself a 25% fully certified Golden Hat 👒 for a little FUN 🤩 Never too late to learn Amazon Web Services (AWS) ! Start your cloud journeys TODAY ! You will become Fully Certified in the future.
1.03479576111	Thanks a lot to the Amazon Web Services (AWS) community builder jacket ! 😀 very warm and high quality with YKK zipper 🥰
1.07405209541	A blog post sharing my experience doing the AWS certifications for the Golden Jacket, for others who might be interested in going through the same experience.
1.12301290035	How you can achieve all AWS certifications and get the golden jacket - Part 1
1.2975268364	How you can achieve all AWS certifications and get the golden jacket - Part 2 - How to study

search results for "play games":

Score	Details
1.49744784832	🚀 Vibe coding a simple STEM Game with Amazon Q CLI —so I created this simple game to help make learning easier with the help of Amazon Q CLI 🤖✨.
1.54997515678	Learn DevOps By Doing Projects
1.56706357002	I wrote a blog post about building a web-based game using only Amazon Q Developer. The post received 199 readers and 26 reactions from the community.
1.57633209229	This article about my personal challenges as a sales and business developer to have interaction with tech. persons.
1.61025881767	Join the AWS GenAI PartyRock hackathon!

What else can we do with vectors?

Discover similar blog posts and engage with authors.
Track evolving themes across community content.
Understand what topics resonate or need more visibility.
Retrieve relevant document chunks or paragraphs to feed into an LLM (e.g., via Retrieval-Augmented Generation or RAG).
Classify documents.
Link posts with similar embeddings through a virtual “related-to” edge: once you have read and enjoyed a post, the vector database can recommend other posts with similar vectors.

Observations:

The majority of the CPU-intensive work is on the producer side, which runs the model and performs the transformation.

The important caveat is that the model must be instantiated only once, with all data fed through that single instance.
Otherwise, if the model is created for every data entry, ingestion takes much longer and memory is allocated
unnecessarily.
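
One way to guarantee a single model instance is to memoize the loader. The sketch below is a generic pattern under assumptions: `once` and `getPipe` are hypothetical names, and the loader body is a stand-in for the real `transformers.pipeline(...)` call used in this post.

```typescript
type Loader<T> = () => Promise<T>;

// Wraps an async loader so the expensive work runs at most once;
// every later call receives the same cached promise.
function once<T>(load: Loader<T>): Loader<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Stand-in loader: counts how many times the "model" is created.
let loadCount = 0;
const getPipe = once(async () => {
  loadCount += 1;
  // In the real code this would be:
  //   transformers.pipeline("feature-extraction", modelName)
  return (text: string) => text.length;
});

// Both calls share one initialization.
const firstCall = getPipe();
const secondCall = getPipe();
console.log(firstCall === secondCall, loadCount); // true 1
```

Caching the promise rather than the resolved value also covers concurrent callers that arrive while the model is still loading.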

Based on htop, the Redis instance shows no spikes during data ingestion or while search queries are performed.
On data retrieval, the most time-consuming operation (about 1 second) was transforming the request into a vector.

There are some parameters that allow adjusting the output:

return results count: you can return response batches of different sizes or implement a cascaded system that requests the next batch and relaxes the threshold to collect more data
score threshold: the lower the value, the closer the vectors are. However, this closeness holds for the mathematical model that was used and does not always match common sense.
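
A score threshold can be applied on the client after the search returns. The sketch below assumes the result shape shown earlier in this post (scores arrive as strings); `filterByScore` and the cut-off value of 1.3 are hypothetical and would need tuning against real queries.

```typescript
interface SearchDocument {
  id: string;
  value: { score: string; description?: string; link?: string };
}

// Keeps only documents whose distance score is at or below maxScore
// (lower score means a closer vector).
function filterByScore(
  documents: SearchDocument[],
  maxScore: number
): SearchDocument[] {
  return documents.filter((doc) => Number(doc.value.score) <= maxScore);
}

// Scores taken from the "Golden Jacket" and "play games" results above.
const sampleDocuments: SearchDocument[] = [
  { id: "posts:79256de8-00aa-4127-b025-df410fb63414", value: { score: "1.01139831543" } },
  { id: "posts:9a772e6c-f3fe-419a-afe2-b5b0038dbdcb", value: { score: "1.2975268364" } },
  { id: "play-games-top-hit", value: { score: "1.49744784832" } }, // id is a placeholder
];

console.log(filterByScore(sampleDocuments, 1.3).length); // 2
```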

Conclusions

Using ElastiCache for Redis as a vector store offers a powerful, low-latency way to bring semantic search to the AWS
Community Builder blog archive. By moving beyond keywords and enabling intelligent discovery, we help amplify the voices
and contributions of community members everywhere.

This architecture is not limited to AWS community content—it can be adapted for any developer-driven knowledge base. The
future of search is vectorized, and Redis makes that future fast, scalable, and accessible.
