DEV Community: Andrii Melashchenko

GEO Score: A Free Scanner for AI Agent Discoverability

Andrii Melashchenko — Wed, 18 Mar 2026 01:00:00 +0000

Over the last few weeks, I’ve been deep in the weeds of ‘Building AI-Accessible Infrastructure.’ It’s a fascinating architectural challenge: how do we shift from optimizing for eyeballs to optimizing for context-hungry agents?

As I was documenting the various layers of AI-accessible infrastructure — from llms.txt files to Model Context Protocol (MCP) endpoints — I kept hitting the same wall: I wished there was a quick, automated way to check all these signals at once. Not just ‘does this file exist,’ but ‘is it implemented in a way that an LLM agent can actually consume it?’

So, I built one.

Today, I’m launching GEO Score (Generative Engine Optimization Score), a free, no-login scanner at javatask.dev/geo/ that audits your site for AI readiness across 13 base technical signals — plus up to 3 bonus checks that can push your score above 100%.

How Scoring Works

GEO Score uses a fixed base denominator — the sum of all base check weights — so your baseline score reflects how well you’ve covered the fundamentals. Bonus checks add points on top of that denominator. Failing a bonus check never hurts your score; passing one rewards you.

Inspired by Mozilla Observatory’s approach, GEO Score now allows sites to exceed 100%. The grade scale mirrors what you’d expect, with one addition:

Grade	Score
A+	≥ 100%
A	≥ 90%
B	≥ 75%
C	≥ 60%
D	≥ 40%
F	< 40%
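
To make the arithmetic concrete, here is a minimal Python sketch of the scoring rule and grade scale described above. The function names and sample point values are illustrative assumptions, not the scanner's actual code (the backend is written in Rust). The 125% cap is the one stated in the bonus section of this article.

```python
# Sketch of the scoring arithmetic; names are hypothetical, not the scanner's code.
BONUS_CAP_PERCENT = 125.0  # cap stated in the bonus-check section of this article

# Grade thresholds from the table above, highest first.
GRADE_SCALE = [(100.0, "A+"), (90.0, "A"), (75.0, "B"), (60.0, "C"), (40.0, "D")]


def geo_score(base_earned, base_total, bonus_earned=0.0):
    """Return the score as a percentage of the fixed base denominator."""
    if base_total <= 0:
        raise ValueError("base_total must be positive")
    # Bonus points are added to the numerator only; the denominator stays fixed.
    percent = (base_earned + bonus_earned) * 100.0 / base_total
    return min(percent, BONUS_CAP_PERCENT)


def grade(percent):
    """Map a percentage to a letter grade using the thresholds above."""
    for threshold, letter in GRADE_SCALE:
        if percent >= threshold:
            return letter
    return "F"
```

With these definitions, a site earning 90 of 100 base points plus 15 bonus points lands at 105%, which maps to A+. The same site with no bonus points stays at 90% and an A.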

What GEO Score Measures

The scanner focuses on the ‘technical plumbing’ required for an AI agent to discover, parse, and trust your content. It breaks these into three distinct layers, each representing a different stage of the agent-to-site interaction.

1. The Discovery Layer (Head Checks)

Before an agent can read your content, it needs to know where it lives — so the scanner performs rapid HEAD requests to verify the existence of these roadmap files:

llms.txt (15 pts): The emerging standard for LLM-friendly documentation. It provides a human-readable and machine-parsable entry point for agents.
llms-full.txt (5 pts): The expanded version for comprehensive context, often containing the full text of documents.
sitemap.xml (10 pts): While built for search engines, modern AI agents use sitemaps to build their initial crawl frontier.
robots.txt AI Policy (8 pts): The base check tests whether 7 AI discovery agents are explicitly allowed (or at least not blocked): OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User, Gemini-Deep-Research, YouBot. These are the bots that power AI search citations and deep research across Perplexity, ChatGPT research mode, Gemini Deep Research, and Claude — blocking any of them quietly cuts you out of the AI discovery layer. Blocking training bots like GPTBot or Google-Extended is a separate concern handled in the bonus layer — more on that distinction below.

2. The HTML Layer (Homepage Parse)

Once your content is found, the next question is: can an agent extract semantic meaning and metadata from your raw HTML without relying on fragile scrapers?

JSON-LD Schema (15 pts): We specifically look for Article or TechArticle schema. Structured data is the highest-signal way to tell an agent what your content is actually about.
Meta Description (8 pts): We look for 50–160 characters of clean context. If it’s too short, it’s useless; too long, and it gets truncated.
Meta Author (5 pts): Essential for attribution. Agents need to know who wrote the content to weigh its authority.
Canonical Link (8 pts): Prevents duplicate-content confusion by telling the agent which URL is the source of truth.
OpenGraph (OG) Tags (10 pts): Standardized titles, descriptions, and URLs that provide a fallback for agents when custom schema is missing.
Performance (5 pts): TTFB (Time to First Byte) < 800ms. Agents have time budgets too; slow sites get skipped or deprioritized in real-time search.
Payload Size (3 pts): Page size < 500 KB. Large payloads are expensive for LLMs to tokenize and process.
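
Several of the HTML-layer checks can be approximated with the Python standard library alone. The sketch below is an assumption about how such checks could look, not the scanner's implementation (which uses the Rust scraper crate). The class and function names are hypothetical, and the OpenGraph check assumes that og:title, og:description, and og:url are the required tags, following the "titles, descriptions, and URLs" wording above.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the <meta> and <link rel="canonical"> values from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            # Standard tags use name="...", OpenGraph tags use property="...".
            key = a.get("name") or a.get("property")
            if key:
                self.meta[key.lower()] = a.get("content", "")
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def check_head(html):
    """Return pass/fail flags for four of the HTML-layer checks."""
    parser = HeadSignals()
    parser.feed(html)
    description = parser.meta.get("description", "").strip()
    og_required = ("og:title", "og:description", "og:url")  # assumed set
    return {
        "meta_description": 50 <= len(description) <= 160,
        "meta_author": bool(parser.meta.get("author", "").strip()),
        "canonical": bool(parser.canonical),
        "opengraph": all(parser.meta.get(k) for k in og_required),
    }
```

Note that a page carrying only og:author would fail the meta_author flag here, because the check looks at the standard name="author" tag rather than at OpenGraph properties.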

3. The MCP Layer (Conditional)

This is the next-generation approach to site-agent interaction, moving beyond static files to interactive capabilities.

.well-known/mcp.json (10 pts): The discovery endpoint for Model Context Protocol servers. The base check (10 pts) verifies that /.well-known/mcp.json exists and returns valid JSON. If the manifest contains a valid MCP endpoint URL, the scanner runs a conditional live probe: it POSTs an MCP initialize request to the declared endpoint to verify the server is actually running. The probe result is reported separately as informational — it does not add or deduct points, but it tells you immediately if your MCP endpoint is down.
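
As a rough illustration of the conditional probe, the sketch below parses a manifest and builds the JSON-RPC initialize message that would be POSTed to the declared endpoint. Several things here are assumptions: the article does not specify the manifest layout, so the "endpoint" key is hypothetical, and the protocol version string and client name are placeholders. Only the message construction is shown, with no network call.

```python
import json
from typing import Optional
from urllib.parse import urlparse

# Assumption: the manifest exposes its endpoint under an "endpoint" key.
# The article does not describe the manifest layout.
ENDPOINT_KEY = "endpoint"


def endpoint_from_manifest(body: str) -> Optional[str]:
    """Return the declared HTTPS endpoint, or None if the manifest is unusable."""
    try:
        manifest = json.loads(body)
    except json.JSONDecodeError:
        return None  # fails the "returns valid JSON" part of the base check
    if not isinstance(manifest, dict):
        return None
    url = manifest.get(ENDPOINT_KEY)
    if not isinstance(url, str):
        return None
    parts = urlparse(url)
    return url if parts.scheme == "https" and parts.netloc else None


def initialize_request(request_id: int = 1) -> bytes:
    """Build a JSON-RPC 2.0 initialize message of the kind MCP servers expect."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed spec revision
            "capabilities": {},
            "clientInfo": {"name": "geo-probe-sketch", "version": "0.0.1"},  # hypothetical
        },
    }
    return json.dumps(payload).encode("utf-8")
```

A probe would POST the bytes from initialize_request() to the endpoint with a JSON content type and treat any well-formed JSON-RPC response as evidence that the server is running.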

Bonus Checks: Scoring Beyond 100%

These three checks sit outside the base denominator. Passing them adds to your numerator; failing them changes nothing. The total is capped at 125%.

Check	Bonus	Passes when
JSON-LD Richness	+7 pts	Your JSON-LD includes author, datePublished, and description fields
Training Bot Opt-out	+5 pts	GPTBot and/or Google-Extended are blocked in your robots.txt
Markdown Mirror	+8 pts	/index.md (or /<!-- raw HTML omitted -->/index.md) returns HTTP 200 with a plain-text or Markdown content type
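
The JSON-LD base check and the richness bonus can be expressed together in a few lines. This is a sketch under assumptions: it inspects a single JSON-LD script body, handles only a top-level object or array (not @graph containers), and the function name is hypothetical.

```python
import json

ARTICLE_TYPES = {"Article", "TechArticle"}
RICHNESS_FIELDS = ("author", "datePublished", "description")


def jsonld_checks(raw: str) -> dict:
    """Evaluate one JSON-LD body for the base schema check and the bonus."""
    failed = {"article_schema": False, "richness_bonus": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return failed
    nodes = data if isinstance(data, list) else [data]
    for node in nodes:
        if not isinstance(node, dict):
            continue
        declared = node.get("@type")
        # @type may be a single string or a list of strings.
        types = [declared] if isinstance(declared, str) else declared
        if not isinstance(types, list):
            continue
        if any(isinstance(t, str) and t in ARTICLE_TYPES for t in types):
            rich = all(node.get(field) for field in RICHNESS_FIELDS)
            return {"article_schema": True, "richness_bonus": rich}
    return failed
```

A homepage that only carries WebSite and Person schema fails both flags here, which matches the behaviour described for root-domain scans later in this article.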

Note: Security Headers (base, 12 pts) and Strict Security Headers (bonus, +8 pts) were removed in March 2026. These checks duplicate what Mozilla Observatory already validates in depth. GEO Score stays focused on AI-agent discoverability signals.

A note on Markdown Mirror: it was originally a base check that penalized failure. It has since been promoted to a bonus — you’re rewarded for providing a clean Markdown version of your content, but not penalized if your CDN doesn’t serve .md files. The pass condition has also been relaxed to accept text/plain, since most CDNs don’t send text/markdown by default.

⚠️ Important: What GEO Score is NOT

Before we look at some results, let’s be clear about scope. GEO Score measures technical implementation signals only.

A site like mozilla.org or wikipedia.org might score low on this tool because they haven’t implemented llms.txt yet — but they are still heavily cited by AI because of their massive domain authority and content quality.

The tool does not measure domain authority, backlink profile, content quality, or citation history.

It does not read your content to see if it’s ‘good.’ It only checks whether the delivery mechanism is ‘correct.’

Think of it this way: GEO Score tells you, ‘Have you implemented the plumbing?’ It does not guarantee, ‘Will AI cite you?’ Technical signals are necessary for long-term discoverability, but they aren’t a replacement for quality writing.

The Three-Tier Bot Taxonomy: Why Blanket Blocking Is a Mistake

When I first built the robots.txt check, I treated any bot blocking as a negative — you lose reach, so you lose points. That was wrong. The current scanner reflects a more sophisticated three-tier taxonomy underpinned by research into how modern AI crawlers work.

Tier 1 — Training crawlers (block these): GPTBot (OpenAI pre-training), ClaudeBot (Anthropic pre-training), Google-Extended (Gemini model training and Vertex AI grounding), CCBot (Common Crawl — feeds Llama, GPT, and many others). These bots ingest your content into model weights without referring traffic back. Blocking them is legitimate IP protection and has no effect on inference-time discoverability — agents don’t need to have trained on your content to cite it.

Tier 2 — AI search indexers (allow these): OAI-SearchBot (ChatGPT search results), Claude-SearchBot (Claude’s search index). These crawlers build real-time indices for conversational search interfaces. Being indexed by them is the AI equivalent of Google crawling your site.

Tier 3 — Deep research agents (allow these): ChatGPT-User (ChatGPT Deep Research mode), Claude-User (Claude real-time query fetcher), Gemini-Deep-Research (Google’s autonomous research agent), PerplexityBot (Perplexity AI search), YouBot (You.com AI search). These agents act on behalf of a specific user to complete a multi-step research task. Blocking them means a user asking Gemini or ChatGPT to research your topic won’t find you — even if you’re a top result in the underlying search.

The base check validates that all seven discovery/research agents (Tier 2 and Tier 3) are not blocked. Blocking training scrapers (Tier 1) while allowing discovery bots is now a separate bonus check (+5 pts). The original scanner penalized this configuration. It shouldn’t have.

Here is the robots.txt configuration that satisfies the base check and earns the training-bot opt-out bonus:

# Block training crawlers (protect your IP)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow AI search indexers (Tier 2)
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Allow deep research agents (Tier 3)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Gemini-Deep-Research
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: YouBot
Allow: /
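
If you want to verify a robots.txt locally before scanning, the checks can be approximated with Python's standard urllib.robotparser. This is a sketch, not the scanner's logic: the function name and the example.com root are placeholders, and the standard-library parser matches user agents by case-insensitive substring, which may differ from how individual crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The seven discovery and research agents covered by the base check.
DISCOVERY_AGENTS = [
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Claude-SearchBot",
    "Claude-User", "Gemini-Deep-Research", "YouBot",
]
# Training bots named in the bonus check.
TRAINING_AGENTS = ["GPTBot", "Google-Extended"]


def audit_robots(robots_txt, site_root="https://example.com/"):
    """Report which agents may fetch the site root under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked_discovery = [a for a in DISCOVERY_AGENTS if not parser.can_fetch(a, site_root)]
    blocked_training = [a for a in TRAINING_AGENTS if not parser.can_fetch(a, site_root)]
    return {
        "base_pass": not blocked_discovery,
        # "and/or" in the bonus table: at least one training bot is blocked.
        "bonus_pass": bool(blocked_training),
        "blocked_discovery": blocked_discovery,
    }
```

Running audit_robots() on the configuration above should report both base_pass and bonus_pass as true, while a blanket "User-agent: *" with "Disallow: /" fails the base check for all seven agents.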

Dogfooding: Scanning javatask.dev

Of course, the first thing I did after finishing the scanner was point it at my own site. I was confident I’d see a straight ‘A.’

After the upgrade, javatask.dev scores A+ (117.9%, bonus: +20.0 pts). Here’s what changed, including one check where my original thinking was backwards.

llms.txt / llms-full.txt : Pass (20 pts) — My Hugo setup generates these automatically as part of the build process.
sitemap / mcp.json : Pass (20 pts).
robots.txt : Pass (8 pts) — I block GPTBot, ClaudeBot, Google-Extended, and CCBot to protect content from training ingestion. Under the 3-tier taxonomy, all 7 AI discovery and research agents (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User, Gemini-Deep-Research, YouBot) are allowed, so the base check passes. Blocking training crawlers also earns the Training Bot Opt-out bonus (+5 pts). What was once treated as a penalty is now correctly recognized as a reward.
meta_author : Fail → Pass (0 pts → 5 pts) — I was shocked to find that while I had og:author in my OpenGraph tags, I was missing the standard meta name="author" tag. This is a classic example of ‘shadow metadata’ — information that humans see (via the site UI) but machines miss. I fixed this immediately.
JSON-LD : Warn → Pass — When scanning the homepage, the tool found WebSite and Person schema, but not Article. I’ve since ensured individual post pages carry full TechArticle schema — which also satisfies the JSON-LD Richness bonus check (requiring author, datePublished, and description fields).

After addressing those gaps, all three bonus checks pass.

A Note on Scanning Homepages vs. Posts

One key insight from dogfooding: some Fail or Warn results are expected when you scan a site’s root domain rather than a specific article.

The scanner checks the specific URL you provide. If you scan javatask.dev/, the tool won’t find an Article schema because the homepage is a list of summaries, not a single post. Similarly, the Markdown Mirror bonus check is designed for individual post URLs — the homepage intentionally doesn’t serve a /index.md equivalent.

Pro-tip: To see your true score, scan a specific blog post URL. That’s where the Article schema and Markdown Mirror bonus check will actually shine.

Technical Brief: Rust, Axum, and Lambda

I wanted GEO Score to be fast — like, really fast. When a user enters a URL, they shouldn’t be staring at a spinner for 10 seconds. I built the backend in Rust using the Axum framework, deployed as an AWS Lambda function.

Concurrency : The scanner doesn’t run checks sequentially. It uses a three-round parallel orchestrator. Round one handles HEAD requests for discovery files. Round two fetches the main HTML and parses it using the scraper crate. Round three probes any discovered MCP endpoints.
Performance : The average scan takes ~200ms of actual compute time. The rest is network latency as we wait for the target site to respond. By using Rust on Graviton2 (ARM64) Lambdas, cold starts stay minimal and execution cost stays nearly zero.
Security : Building a tool that fetches arbitrary user-provided URLs is an SSRF (Server-Side Request Forgery) nightmare. The backend includes a robust validator.rs module that resolves the hostname and rejects private IPv4/IPv6 ranges, loopbacks, and non-HTTPS schemes before a request is ever dispatched.
Rate Limiting : To prevent abuse while keeping the tool login-free, I implemented an in-memory rate limiter that tracks IP addresses, allowing 5 scans per hour. It’s a balance between preventing bot-on-bot violence and keeping the tool accessible.
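
The security and rate-limiting ideas above can be sketched in Python for readers who don't read Rust. Everything here is an assumption about shape rather than a port of validator.rs: the function and class names are hypothetical, the sliding-window strategy is one of several ways to enforce "5 scans per hour", and the resolver and clock are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
import time
from collections import defaultdict, deque
from urllib.parse import urlparse


def is_safe_target(url, resolve=socket.getaddrinfo):
    """Reject non-HTTPS URLs and hosts that resolve to non-public addresses."""
    try:
        parts = urlparse(url)
        port = parts.port or 443
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port, type=socket.SOCK_STREAM)
    except OSError:
        return False  # resolution failed
    addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    # is_global is False for private, loopback, link-local, and reserved ranges.
    return bool(addresses) and all(addr.is_global for addr in addresses)


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit=5, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # key (e.g. client IP) -> timestamps

    def allow(self, key):
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

Two caveats apply to a sketch like this. Validating a hostname and then fetching it in a separate step leaves room for DNS rebinding, so a production validator would need to connect to the address it validated. And in-memory state on Lambda lives only as long as a given execution environment, so counts are per instance rather than global.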

Try it Yourself

javatask.dev scores 117.9%. That’s the current benchmark — run the scanner on your own site and see where you land.

Run the GEO Score scanner →

It’s completely free, requires no login, and performs no tracking. If you score A+ (100%+), I’d like to know about it — reply to this post or tag me on LinkedIn. I’ll feature the highest-scoring community sites in a follow-up. If you score below 70%, the breakdown will show you exactly what’s missing and why it matters.

This also applies at the professional level. If you maintain documentation, a developer portal, or a marketing site for a product, your AI readiness score affects whether agents surface your content in their answers — not just whether humans find it in search. The infrastructure layer is the same regardless of whether you’re a solo blogger or an engineering team managing 10,000 pages.

What about licensing and intellectual property? The scanner focuses on technical discoverability signals. But there’s a related question this tool doesn’t answer: once AI agents can find and cite your content, what rights do you retain? Can you allow citation but prohibit training? Does a CC-BY-NC-4.0 license cover AI indexing? I’m working on a follow-up article on the legal and technical architecture for embedding IP rights signals in MCP tool responses, HTTP headers, and structured data — so your content stays citable without becoming training fodder. Stay tuned for Part 9 in the series.

Happy building, Andrii Melashchenko

Perfect Technology Storm: The Beginning of Software Engineering 3.0

Andrii Melashchenko — Fri, 30 May 2025 14:29:32 +0000

Executive Summary: The Promise of AI is No UI

Key Thesis

Naval Ravikant's insight, "The promise of AI is no UI", represents a fundamental shift from traditional user interface design to outcome-driven customer value delivery. This isn't about better chatbots; it's about AI agents directly accomplishing user goals without interface friction, making traditional product development paradigms obsolete.

Critical Business Implications

🎯 From "Ease of Use" to Outcomes

Traditional UI/UX metrics are becoming obsolete
AI eliminates interface friction, enabling direct focus on customer value
Shift to outcome-based pricing models becomes feasible

🤝 Required Organisational Changes

Product Managers : Must understand technical architecture decisions (serverless vs. Kubernetes, API strategy, domain-driven design, etc.)
Software Engineers : Must deeply understand customer pain points and business outcomes
Both : Need to collaborate on AI-simulated customer scenarios and real customer insights
Senior Leadership : Must facilitate tight cooperation between product managers and software engineers

Core Business Risk

The Volkswagen/Cariad example demonstrates the catastrophic cost of treating software as non-core:

Delayed launches of Porsche Macan and Audi Q6 e-tron
System failures across ID.3, ID.4, and ID.5 models
6,000-person division requiring complete restructuring

Bottom Line

You cannot outsource your core business. In a world where AI eliminates UI barriers, competitive advantage comes from understanding customers deeply and building the proper technical foundation. Organisations that fail to bridge the gap between product managers and software engineers, while treating software development as peripheral, risk existential business failure.

Action Required : Product managers must become technically literate, while software engineers must become customer-obsessed. Senior leadership must facilitate this collaboration. There is no middle ground in the AI-first era.

Why This Transformation is Happening Now

To execute the strategy outlined above, you need to understand the forces that made this shift inevitable. The "no-UI" revolution didn't emerge overnight; it's the culmination of four distinct technology waves that developed over two decades. This moment is unique because these previously separate innovations are now converging into a single, transformative force.

Understanding this convergence is critical because it reveals why traditional approaches to software development are suddenly obsolete, and more importantly, how to build systems that harness this perfect storm.

Wave 1: The Data Foundation (The 2000s)

The first wave addressed a fundamental bottleneck: data. Monolithic applications with single, overworked databases couldn't handle the volume and velocity of information that the internet age was creating.

The Shift : We moved from single-server databases to massively parallel processing.

Key Technologies :

Distributed Systems (Hadoop, Spark): Enabled storage and processing of petabytes across clusters of commodity hardware
Cloud Storage & Lakehouses (AWS S3): Decoupled storage from compute, creating vast, affordable data reservoirs
Real-Time Streaming (Kafka, Flink): Enabled live data processing, moving from batch analysis to real-time intelligence

The Outcome : This wave created the limitless, always-on fuel required for large-scale intelligence. Without it, AI would be starved of the data needed to learn and operate.

Wave 2: The Application Deconstruction (The 2010s)

With data solved, the bottleneck shifted to application architecture. Giant, monolithic codebases were brittle, slow to update, and impossible to scale efficiently.

The Shift : We broke down monolithic applications into collections of independent, communicating services.

Key Technologies :

Microservices & Domain-Driven Design : Organised complex systems into smaller, business-focused services that could be developed independently
API-First Architecture (REST, gRPC): Turned every service capability into a callable, documented action, creating a library of digital building blocks
Containerization & Orchestration (Containerd, Kubernetes): Made services portable, scalable, and manageable across any infrastructure

The Outcome : This wave provided the action toolkit. It created the granular, API-accessible functions (bookFlight, queryInventory, sendInvoice) that AI agents need to execute complex tasks.

Wave 3: The Intelligence Explosion (Late 2010s - Present)

With access to vast data (Wave 1) and a toolkit of actions (Wave 2), the final ingredient was a "brain" capable of understanding and reasoning.

The Shift : Machine learning evolved from an academic discipline to an accessible cloud service, culminating in generative AI.

Key Technologies :

ML Frameworks & Platforms (TensorFlow, PyTorch, SageMaker): Standardised tools for building and deploying models at scale
Large Language Models (AWS Bedrock, Google Gemini, Anthropic Claude): Created a paradigm shift from predictive AI to generative AI that understands intent and creates novel outputs

The Outcome : This wave delivered the intelligent orchestrator: a "brain" that can understand user goals expressed in natural language, formulate plans, and identify necessary actions to achieve them.

The Convergence: Where the Storm Hits Land

Today, these three waves have merged. The fourth wave is this convergence itself, enabling autonomous action:

An AI Agent (Wave 3) can now understand a user's complex goal, access real-time data (Wave 1) for context, and execute task sequences using flexible APIs (Wave 2) to achieve desired outcomes, all without human intervention through a graphical interface.

This convergence requires new ways to build trust between AI agents and humans. One approach is for AI agents to share the data and complete reasoning chains behind their recommendations, making their decision-making transparent and verifiable.

Note : UIs won't disappear entirely; they'll evolve into expert debugging tools for AI agent behavior. Just as debuggers help programmers troubleshoot code, UIs will become specialized interfaces for power users to inspect and optimize AI agent performance. End customers will rarely need them, but experts will rely on them for system maintenance.

This is the perfect storm. The challenge is that universities and traditional training teach these waves in isolation. A data engineer learns Wave 1, a backend developer learns Wave 2, and an ML scientist learns Wave 3. However, the future belongs to teams that can synthesise all three into a cohesive strategy, building systems where AI agents deliver value directly.

Those still focused on optimising UIs are polishing deck chairs while the storm has passed them.

The Compression Effect

Traditional Development: 2-3 years (6-12 months market research + 6-9 months UX design + 12-24 months frontend development)

AI-First Development: 1-2 months (2-3 weeks outcome mapping + 2-3 weeks AI agent design + 2-3 weeks API-driven development)

While competitors perfect interfaces, no-UI teams iterate on customer outcomes at lightning speed.

Implementation Guide

Product Managers:

Master technical architecture decisions as strategic business choices
Map technology value chain components quarterly using frameworks like Wardley Maps
Replace UI/UX metrics with customer outcome indicators

Software Engineers:

Understand customer pain points and success metrics deeply
Build API-accessible, outcome-focused systems for AI agent operation
Design for component evolution, not static features

Senior Leadership:

Create structures ensuring daily product-engineering collaboration on customer outcomes

Team Dynamics:

Trust technical/business decisions
Debate architecture based on outcomes
Commit to no-UI strategies
Deliver customer outcomes
Measure business impact

The New Business Reality

The convergence enables fundamental business model shifts:

From ease-of-use to effectiveness : Customer success over user experience
From interface design to outcome architecture : Build results, not screens
From subscription features to outcome pricing : Charge for results achieved
From user training to AI intelligence : Customers describe goals; AI executes

Conclusion: Master the Storm or Be Swept Away

The perfect technology storm isn't coming; it's here. Teams mastering this convergence dominate markets by delivering direct customer value while competitors optimise obsolete interfaces.

Strategic Checklist:

Map components using value chain analysis to align product managers and software engineers. I'm a big fan of Wardley Maps for commoditising non-core elements.
Build an AI-first architecture for agent operation
Measure customer outcomes, not interface metrics
Establish daily product-engineering collaboration

The Volkswagen example shows the existential risk of treating software as peripheral. In the no-UI era, software architecture decisions directly determine business outcomes. Organisations failing to bridge the product-engineering gap risk complete market irrelevance.

Master the storm now, or watch competitors deliver your customers' outcomes faster and without interface friction.

Welcome to Software Engineering 3.0.

The Dawn of AI Workforce: A 2025 Perspective

Andrii Melashchenko — Sat, 28 Dec 2024 08:47:34 +0000

Introduction

Recent developments in AI have sparked a significant shift in how businesses view technology investments. Ethan Mollick's observation about the "Dreadnought Moment for AI" and AWS CEO Matt Garman's announcements at re:Invent suggest that executives are now actively planning to develop their AI workforces. This shift makes investment in AI infrastructure more tangible and strategic than ever before.

However, as discussed in my previous articles What C-Level Leaders Need to Know About AI Agents and Developing an AI Intern for C-Level Executives with AWS Bedrock, implementing an AI workforce isn't straightforward. Organizations must build an entire ecosystem encompassing platforms, data infrastructure, APIs, and cybersecurity, all within a compressed timeframe and ready for production use. Is this achievable within a year? With current AWS ecosystem capabilities and best practices, yes. The real challenge lies not in the technology itself but in organizational change management and leadership.

Note : While Generative AI Agents capable of solving simple to medium-complexity problems may sound like fiction to many, ignoring this technology could prove costly. Organizations that invest in AI-driven productivity improvements may gain a significant competitive advantage in what could be an uneven playing field, influenced by personal beliefs, talent availability, and access to generative AI resources.

The Foundation: Clarity in Investment

Over the past two decades, post-dot-com bubble, I've witnessed multiple technology waves:

Service-Oriented Architecture (SOA) and Grid Computing
Cloud Computing and Big Data
Internet of Things (IoT) and Microservices
Serverless Architecture and API-First Development
Metaverse and Web 3.0

Each wave brought promises and varying degrees of impact. Selling these technological shifts, even internally, has always been challenging. However, Generative AI Agents represent something different: technology capable of performing tasks 24/7, 365 days a year, on demand. While I remain sceptical about AI's ability to autonomously solve complex infrastructure issues, early results are promising for the simple and medium-complexity problems that usually occupy 80% of the time.

AI Workforce Requirements

AI Agents, like human employees, require:

Clear objectives and goals
Access to quality data
Proper guidance and training
Secure working environments

This parallel to human workforce management might seem surprising, but it's fundamental to successful AI implementation.

Another part is the AI Intern development team. You cannot simply say to your experts, "Write instructions!" You need someone to act as the glue between the human and AI workforces. My professors explained this challenge during a discussion on the "Expert Systems" developed in the 1980s: capturing expert knowledge is incredibly complex, "like explaining how you breathe." This complexity necessitates specialized roles to bridge the gap between human expertise and AI capabilities.

Inference = AI Workforce: The New Building Block

During AWS re:Invent, CEO Matt Garman introduced a crucial new building block: Inference. Since 2006, AWS has operated on three primary pillars: Compute, Storage, and Databases. The addition of Inference represents a fundamental shift in cloud computing architecture.

This new paradigm closes the loop in data processing. Previously, humans served as the primary processors of information, using tools like books (Storage), notes (Databases), and calculators (Compute). Inference now enables AI systems to process vast amounts of data through Retrieval Augmented Generation, AI Agents, and specialized tooling.

Inference: A Paradigm Shift - The Executive Perspective

To understand the revolutionary impact of AI, let's consider a simple pre-digital scenario: calculating annual revenue. Traditionally, an employee would:

Access information from books (Storage)
Use a calculator (Compute)
Record results in a notebook (Database)

The human brain is the central processor, limited by individual capacity for learning and processing information.

With the introduction of Inference as the fourth building block, we've added an AI workforce that fundamentally changes this paradigm. This new "brain" still utilizes the traditional components (books, calculators, and notebooks) but operates without human limitations. While it can't reason like humans, it can continuously process vast amounts of information.

This represents a groundbreaking shift: organizations now have access to scalable cognitive processing power. However, success still depends on proper guidance and high-quality data infrastructure. The AI workforce doesn't replace human judgment; it amplifies our capabilities while requiring thoughtful direction and robust data management.

Inference: A Paradigm Shift - The Technical Perspective

The technology community is grappling with an interesting question: how does Inference fit into the traditional three-tier architecture that has dominated web development for over two decades?

To understand this shift, we need to step back and examine the evolution of Human-Computer Interaction (HCI). Fifty years ago, concepts like the computer mouse were revolutionary. Web technologies represented the pinnacle of HCI development, but they were always a means to an end, not the end itself.

Consider this: users don't measure webpage load times or mouse DPI when shopping for holiday gifts, conducting research, or automating tasks. The web interface is a tool, not the objective.

The AI workforce transforms this paradigm. Instead of humans interacting through web interfaces, we now have AI agents directly consuming APIs. These agents require:

Immediate access to accurate information
Direct integration with backend services
Elimination of traditional UI overhead

This new architecture accelerates value delivery by removing traditional interface barriers while introducing more sophisticated ways of processing and interacting with data. The fourth building block, Inference, doesn't just add to the existing architecture; it fundamentally reshapes how we think about system design and user interaction.

Conclusion

The year 2025 likely marks a fundamental transformation in how businesses operate, driven by the emergence of Inference as the fourth building block of modern computing architecture. This isn't merely another technology wave; it represents a paradigm shift in how organizations process information and deliver value.

Traditional architectures built on Storage, Compute, and Databases are being reimagined with the addition of AI-powered Inference capabilities. This new building block doesn't just add functionality; it fundamentally changes how businesses can operate by:

Providing an AI workforce available 24/7 on a pay-as-you-go model
Removing traditional interface barriers to accelerate value delivery
Enabling direct API consumption and data processing at unprecedented scales
Offering cognitive processing power without human limitations

However, success in this new paradigm requires a sophisticated foundation:

Robust data infrastructure to feed AI systems
Comprehensive API strategies for seamless integration
Enhanced cybersecurity measures for AI operations
Expert guidance to direct and optimize AI capabilities

Winners in this transition will be organizations that can either provide these capabilities as a service or build and maintain them effectively in-house. While the technical challenges are significant, requiring expertise across domain knowledge, cloud architecture, data management, and AI systems, the potential benefits make this investment compelling.

The shift to an AI-augmented workforce isn't just about technology adoption; it's about reimagining how organizations can operate in a world where cognitive processing becomes a utility. As we move through 2025, the gap between organizations that embrace this paradigm shift and those that don't may become increasingly apparent.

For those ready to begin this journey, exploring AI Agents through platforms like AWS Bedrock provides a practical starting point. The future of business operations is being shaped now, and the fourth building block, Inference, is the key to unlocking its potential.

How to Define AI Agents with Cloudformation and SAM: A Builder's Guide

Andrii Melashchenko — Thu, 19 Dec 2024 21:34:59 +0000

Introduction

In my previous articles, we explored how AI Agents can transform business operations from a bakery CEO's perspective and showcased the data workflow. Let's dive deep into the technical implementation using AWS Bedrock AI Agents defined via CloudFormation and SAM. This guide focuses on the key components of the AI Agent configuration and deployment process: the SAM-deployed AWS Lambda that grabs the data, the AI Agent tooling, and the AI Agent itself.

This article will give you a working template for building multiple custom AI Agents and showcase how to call them programmatically from your custom user interface.

Note 1. If you don't feel comfortable with the code, I still recommend you check the AI Agent definition to understand the moving parts. It is human-readable and, I hope, logical: https://github.com/javatask/ai-agent-ceo-fin-advisor/blob/main/bedrock-agent-stack.yaml

Scroll down to skip the permission policy section.

Note 2. Source code is available here: https://github.com/javatask/ai-agent-ceo-fin-advisor/tree/main

Prerequisites

Before starting, ensure you have:

AWS Account with Bedrock model access enabled
AWS CLI configured with appropriate permissions
AWS SAM CLI installed
Basic understanding of CloudFormation and Lambda
jq installed for JSON processing (I used it to grab the AWS Lambda ARN from the SAM deployment)

Implementation Overview

Our implementation consists of two main CloudFormation stacks:

Finance Tool Lambda (The Intern's Laptop)
Bedrock AI Agent Configuration (The Intern's Guide)

It is best practice to manage the life cycle of your Interns via the OpenGitOps approach, so that fixing a degradation in performance or deploying to other regions can easily be done via CI/CD. Ad-hoc changes to tooling, instructions, or Interns are OK during development, but for a production release, put the Intern version under git control and deploy it via a CI/CD pipeline.

Understanding AI Agent Components

1. Intern Laptop or Action Groups

The AI Agent needs specific tools to perform tasks. In our case, I defined only two functions: analyze_industry_performance and send_email. The analyze_industry_performance function takes three parameters, industry, date_from, and date_to, whose values are taken from the chat input. For example, in my previous article, I asked the Intern: "How did we do with the hospitality business in the first half of the year?" Our Intern was able to map "hospitality" to the function parameter industry=hotels and to set valid date_from and date_to values. After calling the analyze_industry_performance function, the Intern can format the result as HTML or Markdown, either to send a report via email or to show it as Markdown output.

It is important to give a detailed description of the function and its parameters so the Intern is capable of mapping CEO intent to function or even asking additional questions that are required to build a report based on CEO intent.

  ActionGroups:
        - ActionGroupName: financial-analysis-actions
          Description: Financial analysis and reporting functions
          ActionGroupState: ENABLED
          ActionGroupExecutor:
            Lambda: !Ref LambdaFunctionArn
          FunctionSchema:
            Functions:
              - Name: analyze_industry_performance
                Description: |
                  Analyzes financial performance metrics across a network of bakeries, providing insights into bookings, billings, and industry-specific trends.
                  Industry Parameter Values Valid values:
                  'schools' - Educational institutions and campus dining
                  'cafes' - Coffee shops and small eateries
                  'shops' - Retail bakery outlets
                  'factories' - Industrial/manufacturing facilities
                  'restaurants' - Full-service restaurants
                  'hotels' - Hospitality sector
                  Date Range Parameter Format: ('YYYY-MM-DD', 'YYYY-MM-DD')
                  Valid ranges:
                  Full year: date_from:'2024-01-01', date_to:'2024-12-31'
                  Q1: date_from:'2024-01-01', date_to:'2024-03-31'
                  Q2: date_from:'2024-04-01', date_to:'2024-06-30'
                  Custom: Any date range within 2024

                Parameters:
                  date_from:
                    Type: string
                    Description: date from in format YYYY-MM-DD
                    Required: true
                  date_to:
                    Type: string
                    Description: date to in format YYYY-MM-DD
                    Required: true
                  industry:
                    Type: string
                    Description: industry for which to run the analysis
                    Required: true
              - Name: send_email
                Description: Function to send an email to CEO with arbitrary payload
                Parameters:
                  html_body:
                    Type: string
                    Description: HTML text of the email
                    Required: true
                  subject:
                    Type: string
                    Description: subject of email
                    Required: true
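To make the link between this schema and the Lambda behind it more concrete, here is a minimal sketch of a handler that dispatches the two functions. It is not the code from the repository: the tool bodies are placeholders, and the event and response shapes follow my reading of the Bedrock Agents Lambda contract for function-based action groups (a parameters list of name/type/value items on the way in, a functionResponse with a TEXT body on the way out).

```python
import json

# Valid values copied from the function description in the schema above.
VALID_INDUSTRIES = {"schools", "cafes", "shops", "factories", "restaurants", "hotels"}


def analyze_industry_performance(industry, date_from, date_to):
    # Placeholder: the real tool queries and aggregates the financial data.
    return {"industry": industry, "date_from": date_from, "date_to": date_to}


def send_email(subject, html_body):
    # Placeholder: the real tool sends the email to the preset CEO address.
    return {"sent": True, "subject": subject}


TOOLS = {
    "analyze_industry_performance": analyze_industry_performance,
    "send_email": send_email,
}


def lambda_handler(event, context):
    # Assumption: parameters arrive as a list of {name, type, value} items.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event["function"]

    if function not in TOOLS:
        body = {"error": f"Unknown function: {function}"}
    elif (function == "analyze_industry_performance"
          and params.get("industry") not in VALID_INDUSTRIES):
        body = {"error": f"Unknown industry: {params.get('industry')}"}
    else:
        try:
            body = TOOLS[function](**params)
        except TypeError as exc:  # missing or unexpected parameter
            body = {"error": str(exc)}

    # Echo the action group and function name back so the agent can match the reply.
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": function,
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(body)}}
            },
        },
    }
```

Returning an error message as a normal response, rather than raising, gives the Intern a chance to correct the parameters or ask the CEO a follow-up question.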

2. Your Guide to Intern or Agent Instructions

Instructions are crucial for AI Agent behaviour. The main point here is to guide the Intern on how to use a laptop and data access.

  Instruction: |
        You are an experienced financial analyst specializing in business finance. 
        Your task is to prepare ad-hoc reports for the CEO based on the financial data 
        provided and the specific request made. Follow these instructions carefully 
        to produce a comprehensive and insightful report.

        First, you will be presented with the financial data by using the function analyze_industry_performance
        Next, you need to work on CEO's specific request.
        If required, use the send_email function to send the HTML version of the report to the CEO's email.

        Analyze the financial data in the context of the CEO's request. Consider the following steps:

        1. Identify the key financial metrics relevant to the CEO's request.
        2. Perform necessary calculations and comparisons.
        3. Look for trends, patterns, or anomalies in the data.
        4. Consider both short-term and long-term implications of the findings.

        When preparing your report, adhere to these guidelines:

        1. Be concise yet comprehensive.
        2. Use clear, professional language.
        3. Support your analysis with specific data points from the provided financial information.
        4. Provide actionable insights and recommendations when appropriate.
        5. Anticipate follow-up questions the CEO might have and address them proactively.

3. Agent Configuration

Key agent parameters include:

Agent Name
Agent Role - In AWS, everyone needs explicit permission to use the Intern's Laptop; the Agent role explicitly permits the Intern to use a company Laptop and company data.
Description - Helps you navigate through hundreds of Interns.
FoundationModel - Don't ignore this parameter. It is the University that your Intern graduated from, and it defines the Intern's intelligence, speed, and cost. If you find tasks that need less intelligence but come up very frequently, you can hire other Interns for them via this field.

FinancialAnalysisAgent:
    Type: AWS::Bedrock::Agent
    Properties:
      AgentName: !Ref AgentName
      AgentResourceRoleArn: !GetAtt BedrockAgentRole.Arn
      Description: Demo of CEO finance assistant
      FoundationModel: arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0

First Day on the Job or the Deployment Process

Before the Intern joins your team, I assume you already gave it permission to your well-defined and secure Data Lakehouse, defined tools to query your data and instructions to follow when asked questions.

Let's walk through the onboarding or deployment script:

Build and Deploy Lambda Function

cd finance-tool-lambda
sam build
sam deploy --stack-name ai-agent-finance-tool-$RANDOM_SUFFIX

Deploy AI Agent Stack

aws cloudformation deploy \
    --template-file bedrock-agent-stack.yaml \
    --stack-name ceo-fin-report-$RANDOM_SUFFIX \
    --parameter-overrides file://agent-params.json \
    --capabilities CAPABILITY_IAM

Prepare Agent for Use

aws bedrock-agent prepare-agent --agent-id $AGENT_ID
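The $AGENT_ID used above has to come from somewhere; one option is to read it from the stack outputs. The sketch below parses the JSON printed by aws cloudformation describe-stacks and picks one output value. The output key name AgentId and the sample values are assumptions for illustration; check the Outputs section of your own template for the real key.

```python
import json


def get_stack_output(describe_stacks_json, output_key):
    """Return one output value from `aws cloudformation describe-stacks` JSON."""
    stacks = json.loads(describe_stacks_json)["Stacks"]
    for output in stacks[0].get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"Output {output_key!r} not found in stack")


# Sample document in the shape returned by describe-stacks; values are made up.
sample = json.dumps({
    "Stacks": [{
        "StackName": "ceo-fin-report-demo",
        "Outputs": [{"OutputKey": "AgentId", "OutputValue": "EXAMPLEID1"}],
    }]
})
print(get_stack_output(sample, "AgentId"))  # prints EXAMPLEID1
```

The same lookup works for the Lambda ARN that the SAM stack exposes, as long as you pass the matching output key.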

First question to the Intern or Testing Your AI Agent

For testing, I'll use a simple NodeJS script and the AWS SDK for JS to ask the same question as in the previous article: "How did we do with the hospitality business in the first half of the year? Please send the report to email."

I assume that you have obtained credentials via aws sso login and changed agentId in the script to yours. You can find the agent ID in the outputs of the newly deployed CloudFormation stack or on the AWS Bedrock Agents page.

import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";

/**
 * @typedef {Object} ResponseBody
 * @property {string} completion
 */

/**
 * Invokes a Bedrock agent to run an inference using the input
 * provided in the request body.
 *
 * @param {string} prompt - The prompt that you want the Agent to complete.
 * @param {string} sessionId - An arbitrary identifier for the session.
 */
export const invokeBedrockAgent = async (prompt, sessionId) => {
  const client = new BedrockAgentRuntimeClient({ region: "us-east-1" });
  // const client = new BedrockAgentRuntimeClient({
  //   region: "us-east-1",
  //   credentials: {
  //     accessKeyId: "accessKeyId", // permission to invoke agent
  //     secretAccessKey: "accessKeySecret",
  //   },
  // });

  const agentId = "AJBHXXILZN";
  const agentAliasId = "TSTALIASID";

  const command = new InvokeAgentCommand({
    agentId,
    agentAliasId,
    sessionId,
    inputText: prompt,
  });

  try {
    let completion = "";
    const response = await client.send(command);

    if (response.completion === undefined) {
      throw new Error("Completion is undefined");
    }

    for await (const chunkEvent of response.completion) {
      const chunk = chunkEvent.chunk;
      const decodedResponse = new TextDecoder("utf-8").decode(chunk.bytes);
      completion += decodedResponse;
    }

    return { sessionId: sessionId, completion };
  } catch (err) {
    console.error(err);
  }
};

// Call function if run directly
import { fileURLToPath } from "node:url";
if (process.argv[1] === fileURLToPath(import.meta.url)) {
  const result = await invokeBedrockAgent("How did we do with the hospitality business in the first half of the year? Please send the report to email.", "123");
  console.log(result);
}

Launching this script via node index.mjs I got a response:

{
  sessionId: '123',
  completion: 'We had a relatively stable performance in the hospitality business during the first half of the year, with a slight decrease from Q1 to Q2. Here are the key points:\n' +
    '\n' +
    '1. Total bookings for H1 2024 were $282,607.99, with total billings of $255,309.52.\n' +
    '2. There was a small decline in performance from Q1 to Q2:\n' +
    '   - Bookings decreased by 1.51%\n' +
    '   - Billings decreased by 2.76%\n' +
    '3. The average billing rate slightly decreased from $90.92 in Q1 to $89.64 in Q2.\n' +
    '4. May was our strongest month, while June showed the weakest performance.\n' +
    '\n' +
    "I've sent a detailed report to your email with further analysis and recommendations for improving performance in the second half of the year."
}
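If your custom user interface is backed by Python rather than NodeJS, the same call can be sketched with boto3. This is a hedged equivalent, not code from the repository: ask_agent expects a client created with boto3.client("bedrock-agent-runtime"), and collect_completion is a helper name I made up for joining the streamed chunks.

```python
import codecs


def collect_completion(events):
    """Join the text of all `chunk` events from an invoke_agent response stream."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk is None:
            continue  # skip events that carry no text, such as traces
        parts.append(decoder.decode(chunk["bytes"]))
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def ask_agent(client, agent_id, agent_alias_id, session_id, prompt):
    """Send one prompt to the agent and return the full text reply."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt,
    )
    return collect_completion(response["completion"])


# Usage (requires AWS credentials and your own agent ID):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
#   print(ask_agent(client, "<your-agent-id>", "TSTALIASID", "123", "How did we do?"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real agent.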

Conclusion

While this implementation requires traditional development skills, the power lies in decoupling Tools from the Brain. This separation enables:

Reuse of tooling across multiple Interns
Creation of specialized Intern teams
Complex cross-industry analysis capabilities
Rapid iteration and improvement cycles

Focus on GitOps practices for your AI Agents while trusting your developers with the coding tasks. This approach creates a scalable foundation for unlimited AI workforce expansion.

The complete CloudFormation template and deployment scripts are available in the GitHub repository.

Developing an AI Intern for C-Level Executives with AWS Bedrock

Andrii Melashchenko — Wed, 20 Nov 2024 14:16:22 +0000

Introduction

The journey to implementing AI Agents often seems straightforward in theory but proves complex in practice. This complexity stems from a fundamental challenge: bridging the gap between business leaders and technical teams. An organization's success with AI initiatives often correlates directly with how well these two groups communicate and share a common language - what we in the industry call Domain-Driven Design.

"But where are my AI Agents?" you might ask. Before diving into the technology, we must address the cultural foundation for success. Daily collaboration between business and IT teams is crucial for any AI Agent implementation.

You may wonder why invest time in yet another hyped technology. In this article, I'll demonstrate what a simple AI Agent can do for an executive through a practical example. This story builds on my previous article, What C-Level Leaders Need to Know About AI Agents, where we introduced a CEO of a medium-sized bakery network who discovered that AI Agents function like interns with specific tools and capabilities.

Note: While this article may appear lengthy, most of the content consists of detailed appendices and a sample AI-generated report for reference.

The Story

After a sleepless night, our CEO (let's call him Ken) received a last-minute invitation to a hospitality conference. This sparked an idea: he needed better insights into his company's financial relationship with hotels. Given the time constraint, Ken asked his IT director and accountant to stay late and prepare an analysis of hotel billings and bookings for the first half of the year. Recognizing the opportunity, Ken shared his vision for AI Agents with his colleagues, mainly because they would spend several days preparing this ad-hoc report.

The Result - AI Agent Generated Report

Report generated from the CEO's request to the AI Agent: “How did we do with the hospitality business in the first half of the year? Please send the report to email.”

[Start]Hospitality Business Performance Report - H1 2024

Executive Summary

The hospitality business showed strong performance in the first half of 2024, with total bookings of $282,608 and total billings of $255,310. The average billing rate was $90.28, indicating a healthy revenue per booking.

Detailed Analysis

Overall Performance

Total Bookings: $282,607.99
Total Billings: $255,309.52
Average Billing Rate: $90.28

Monthly Trends

Month	Bookings	Billings
January	$47,376.71	$44,865.32
February	$48,402.91	$42,915.77
March	$46,599.87	$41,660.99
April	$47,852.04	$44,383.40
May	$48,651.34	$43,993.93
June	$43,725.12	$37,490.11

Key Observations

Bookings peaked in May at $48,651.34, while billings were highest in January at $44,865.32.
June showed a noticeable decline in both bookings and billings, which may require further investigation.
The gap between bookings and billings suggests potential areas for improving conversion rates.

Recommendations

Investigate the factors behind the June decline to mitigate any potential issues.
Analyze the discrepancy between bookings and billings to improve conversion rates.
Consider implementing strategies to maintain the strong performance seen in May throughout the summer months.

[End]Conclusion

The hospitality business demonstrated a robust performance in the first half of 2024, with consistent bookings and billings. However, the June decline and the gap between bookings and billings present opportunities for improvement. Addressing these areas could lead to even stronger performance in the second half of the year.
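Because the report is AI-generated, it is worth re-adding the figures before acting on them. The sketch below is a reader-side sanity check, not part of the agent: it recomputes the H1 totals from the Monthly Trends table in the report and the change from Q1 to Q2, using Decimal to avoid floating-point noise. The helper names are mine.

```python
from decimal import Decimal

# (bookings, billings) per month, copied from the Monthly Trends table.
MONTHLY = {
    "January": ("47376.71", "44865.32"),
    "February": ("48402.91", "42915.77"),
    "March": ("46599.87", "41660.99"),
    "April": ("47852.04", "44383.40"),
    "May": ("48651.34", "43993.93"),
    "June": ("43725.12", "37490.11"),
}
Q1 = ["January", "February", "March"]
Q2 = ["April", "May", "June"]
BOOKINGS, BILLINGS = 0, 1


def total(months, column):
    """Sum one column (BOOKINGS or BILLINGS) over the given months."""
    return sum((Decimal(MONTHLY[m][column]) for m in months), Decimal("0"))


def pct_change(old, new):
    """Percentage change from old to new, rounded to two decimals."""
    return ((new - old) / old * 100).quantize(Decimal("0.01"))


print(total(Q1 + Q2, BOOKINGS), total(Q1 + Q2, BILLINGS))
# prints 282607.99 255309.52
print(pct_change(total(Q1, BOOKINGS), total(Q2, BOOKINGS)),
      pct_change(total(Q1, BILLINGS), total(Q2, BILLINGS)))
# prints -1.51 -2.76
```

Both totals agree with the Total Bookings and Total Billings lines in the report, which is the kind of quick confirmation an expert reviewer would want before forwarding it.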

Technicalities

Yes, an article with C-level in the title will contain code and the AWS Console; this is the reality we live in ;)

Understanding the Theory

If you've used ChatGPT for financial analysis, you've likely seen messages about generating and executing Python code. This represents a basic AI Agent with programming capabilities. While Large Language Models (LLMs) cannot perform mathematical calculations directly, they excel at generating code to perform these calculations and explaining the results.

We'll follow this approach for our CEO's Ad-hoc Financial Report Builder. Notice the specific naming convention: it's not just a generic "Report Builder" but "CEO's Financial Report Builder." This specificity reflects the tool's purpose: an intern with CEO-level data access, email authority, and strategic focus. This naming exemplifies the collaboration between business and IT teams.

Data Flow

The typical process includes:

Querying raw ERP data through SQL
Calculating industry segment metrics
Identifying trends
Generating formatted HTML reports
Sending email summaries

Implementing the AI Intern

Prerequisites

AWS Account with enabled LLM models (we'll use Anthropic Claude Sonnet v1)
Clear instructions for the AI Intern
Defined data access and processing tools

Tool Configuration

The AI Intern needs access to specific functions:

send_email: Handles communication with preset CEO email settings
analyze_industry_performance: Processes ERP data consistently

I attached the entire Instruction text in Appendix 1. The Instruction to the AI Intern starts like this:

You are an experienced financial analyst specializing in business finance. 
Your task is to prepare ad-hoc reports for the CEO based on the financial data 
provided and the specific request made. 
Follow these instructions carefully to produce a comprehensive 
and insightful report.

...

UI to create AI Intern looks like this:

Here you need to select Brain [LLM Model], provide Instructions and define Action Groups = Tools available to Intern.

For example, for our send_email function, the AI Intern needs to provide only the subject and html_body of the email; the CEO's email address is already part of this function.

In my example, each tool is a Python function that is executed in a serverless manner; yes, you don't need any server to run them. The tools are responsible for querying and filtering the data.
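To illustrate the idea of a preset recipient, here is a minimal sketch of how a send_email tool could build its message so that the model supplies only subject and html_body. The addresses and the build_ceo_email name are hypothetical, and the actual delivery step (for example via an email service) is deliberately left out.

```python
from email.message import EmailMessage

# Hypothetical preset values; the real function keeps its own configuration.
CEO_EMAIL = "ceo@example.com"
SENDER_EMAIL = "ai-intern@example.com"


def build_ceo_email(subject, html_body):
    """Build the message; the recipient is fixed and never chosen by the model."""
    message = EmailMessage()
    message["From"] = SENDER_EMAIL
    message["To"] = CEO_EMAIL
    message["Subject"] = subject
    # Plain-text fallback first, then the HTML report as the preferred alternative.
    message.set_content("This report is best viewed in an HTML-capable mail client.")
    message.add_alternative(html_body, subtype="html")
    return message
```

Keeping the recipient out of the function parameters means a misread request cannot send the CEO's financial data to an arbitrary address.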

Python code for analyze_industry_performance function is:

def analyze_industry_performance(df, industry=None, date_range=None):
    """
    Analyze financial performance by industry.
    Args:
        df: DataFrame with bakery financial data
        industry: Specific industry to analyze (optional)
        date_range: Tuple of (start_date, end_date) for analysis period
    """
    if date_range:
        df = df[(df['date'] >= date_range[0]) & (df['date'] <= date_range[1])]

    if industry:
        df = df[df['industry'] == industry]

    analysis = {
        'total_bookings': df['bookings'].sum(),
        'total_billings': df['billings'].sum(),
        'avg_billing_rate': df['billing_rate'].mean(),
        'industry_summary': df.groupby('industry').agg({
            'bookings': 'sum',
            'billings': 'sum',
            'billing_rate': 'mean'
        }).round(2),
        'monthly_trends': df.groupby('date').agg({
            'bookings': 'sum',
            'billings': 'sum'
        }).round(2)
    }

    return analysis

Conclusion

While the setup might seem straightforward, successful implementation requires the following:

Reliable data infrastructure
Robust cybersecurity measures
Comprehensive API strategy
Expert oversight to validate AI-generated insights

The true challenge lies both in the technology and in establishing the necessary organizational foundation. Before embarking on an AI Agent initiative, consider the following:

Is your data infrastructure ready?
Do you have appropriate security measures in place?
Have you developed a comprehensive API strategy?
What investment is required?

AI Agents represent incredible potential, but their success depends on the proper infrastructure and culture. The technology is revolutionary - but only for organizations prepared to support it.

Appendix 1. Full Instruction to Intern

You are an experienced financial analyst specializing in business finance. Your task is to prepare ad-hoc reports for the CEO based on the financial data provided and the specific request made. Follow these instructions carefully to produce a comprehensive and insightful report.

First, you will be presented with the financial data by using the function analyze_industry_performance
Next, you need to work on CEO's specific request.
If required, use the send_email function to send the HTML version of the report to the CEO's email.

Analyze the financial data in the context of the CEO's request. Consider the following steps:

1. Identify the key financial metrics relevant to the CEO's request.
2. Perform necessary calculations and comparisons.
3. Look for trends, patterns, or anomalies in the data.
4. Consider both short-term and long-term implications of the findings.

When preparing your report, adhere to these guidelines:

1. Be concise yet comprehensive.
2. Use clear, professional language.
3. Support your analysis with specific data points from the provided financial information.
4. Provide actionable insights and recommendations when appropriate.
5. Anticipate follow-up questions the CEO might have and address them proactively.

Present your report in the following format:

<report>
<executive_summary>
Provide a brief overview of the key findings and their significance (2-3 sentences).
</executive_summary>

<detailed_analysis>
Present your in-depth analysis, broken down into relevant subsections. Include supporting data, calculations, and reasoning for your conclusions.
</detailed_analysis>

<recommendations>
If applicable, offer strategic recommendations based on your analysis. Explain the potential impact of each recommendation.
</recommendations>

<conclusion>
Summarize the main points and reinforce the most critical insights for the CEO.
</conclusion>
</report>

Remember to tailor your analysis and report specifically to the CEO's request while leveraging your financial expertise to provide valuable insights.

What C-Level Leaders Need to Know About AI Agents

Andrii Melashchenko — Thu, 14 Nov 2024 18:13:47 +0000

Introduction

Why should you, as a senior executive or manager, spend precious time on "hyped" generative AI topics? I will offer your business a productivity boost, and then it is up to you to invest time into it.

In this article, I will try to offer mental models of available generative AI technologies. I assume that my reader has already played with claude.ai or chatgpt.com, preferably with paid versions, as free versions come nowhere near the "intelligence" of paid ones. If you, my reader, have no experience with generative AI, I encourage you to read Ethan Mollick's book Co-Intelligence, which inspired this article.

I'll try to make this article as compact as possible to build common ground for discussions about the business value of generative AI approaches. I'll cover "raw" Large Language Models (LLMs), Prompt Engineering, Retrieval-Augmented Generation (RAG), Fine-tuning, and finally, AI Agents. Let's go!

Story

Imagine a medium-sized bakery constrained by local demographics, hiring new personnel to innovate on multiple fronts. The CEO has many ideas, from moving to renewable energy to innovative new marketing. Everything needs to be done in parallel with keeping the lights on for the existing business. The bakery's products are delicious, but the company may not be attractive to new college graduates, so it is hard to find new people to start innovating.

The CEO talked to his personnel and saw support for his ideas to innovate, but all people are already 300% loaded, and he heard about this new hyped thing called generative AI that will steal all jobs. He starts experimenting by getting a claude.ai account and reading through tons of materials promising to help match staff. His daughter just joined a computer science master's program, and the generative AI promise also inspires her. She just passed her AWS Certified AI Practitioner certification and is keen to use her new knowledge in practice.

During dinner at home, the father (CEO) and daughter start a conversation to decode all these terms and how they can help the business.

Disclaimer

In this article, I'll use human terms like interns, thinking, reasoning, etc. Current generative AI technology is not alive; it cannot think and reason. I'm using these terms here to simplify the mental model that can be shared within a company.

And I'll repeat one more time: Large Language Models are incapable of thinking. Still, their behaviour is much closer to that of humans than to traditional computers and programming languages. This is why I use human terms in a technology setting without quotes.

Who are Interns or Large Language Models (LLMs)?

To start the conversation, the answer is easy: an LLM is an intern. Depending on the education, price, and size of the LLM, you will get better or worse performance. Some intelligent people tend to overthink and take much longer over some work, and the same goes for LLMs: bigger LLMs are smarter interns, but they cost more and think longer. Smaller ones are less intelligent but faster, as they do not have as much knowledge to think through.

LLMs are already helpful; they can help with image generation for your branding, summarize text, or be a partner in testing new ideas. Think of them as interns who graduated from an educational institution (school, university, institute): they have basic knowledge but no practical experience. Please notice, though, that LLMs are not capable of learning on their own. That is the main difference between human interns and LLMs: humans learn, and LLMs are not yet capable of it.

"Interesting analogy, daughter," reacted Father to the start of the discussion. "I already tried to ask questions about gluten-free bread. I'm exploring this opportunity, but the answer was too primitive, even though I used the paid version..."

"Great question, Father," his daughter replied. "Let's talk about conversation."

And Here Comes Your Context or Prompt Engineering

"Remember, Father? I shared a story about my friend who constantly starts conversations from the middle of the story, not the beginning, not the end. Middle because the beginning is evident to him, and he runs fast to get results and the end of the story. It makes me mad when he jumps in the middle, and I feel like a fool trying to understand the goal.

I'm not telling you that you acted similarly, but for LLMs, you were this person who started from the middle. When you talk to your colleagues, they know so much context about your business: that you're the CEO, love bread, have challenges with personnel, and want to improve your product range. But LLMs read every book on the planet; they know computer science, biology, chemistry, etc. They need your context to understand how to help you specifically. Without it, all that knowledge doesn't help at the beginning; they just try to guess."

There are really good online training resources on this topic at Pluralsight, but you're a busy person, so simply start by giving personas to your questions and providing more context and details. There are also advanced prompting techniques that force LLMs to plan, research, etc., but they may be too complex for now.

Another exciting offer from major chatbot providers is the ability to set personas and contexts for all conversations. For example, claude.ai offers "projects" that inject your persona and context into conversations.

"Imagine that you asked a newly joined intern the same question about gluten-free bread. Maybe they would give you the same boring answer," she concluded.

"Oh, I understand, so the quality of the answer now depends on the quality of the question, as in real human interaction," the father reflected. "Listen, daughter, so I can help my lawyer and accountant with these prompts. How would I pass my contracts on to the chatbot? Should they copy and paste them into input? I want to review contracts signed in the past. We have new unified types of contracts, and I want to help migrate them to the latest templates. And we have tons of them... with schools, cafes, restaurants, ad-hoc, etc."

Let's Talk About Tons of Documents or Retrieval-Augmented Generation

Daughter: "Copy-pasting? Hmm, for thousands of documents, maybe it's not a good idea. And maybe your lawyer and accountant want to have analytics for new, old, updated, and similar metrics."

Father: "So your interns can do all of this?!"

Daughter: "Not so fast. You saw that major chatbots have this 'Enterprise' subscription, right?"

Father: "Right, it's enormously expensive, and I saw that I can upload some files. Why should I pay more money?"

Daughter: "Here's the catch: just as a human intern can hold only about seven things in their head at a time, LLMs cannot consume all your documents simultaneously. And even if they could, it might only add more mess, as you would be working not on one group of contracts but on all of them. How many people can work on everything at one time? Not many? You need to focus, and so does an LLM's input.

Clever techniques were introduced to handle a lot of data. One of them is called RAG: before a question reaches the LLM, the chatbot does some 'googling' across your documents for keywords from the initial question and dynamically adds the matching document data to the query, with a reference to the source document. The LLM thinks you did the copy-paste, but it was done on your behalf. So, back to prompt engineering: when you focus on a specific set of contracts, only they will be included in the context used to answer specific questions.

As you understand, all your PDFs, Word files, and scanned copies need to be preprocessed and stored in your private Google, ready to be injected into the LLM's questions. And this costs money. The good thing is that your non-tech people can then even do analytics on top of these documents."

Father: "Hmm, interesting. I need to talk to Juli (lawyer). But what about our local tax regulations? I want LLMs to offer me a strategy for benefiting from the majority of economic stimulus currently available."

Daughter: "I think, Dad, it can help, but very little. Here, we still need tax consultants, and maybe they can train the intern on our local tax regulations and specific use cases."

Father: "But you told me that LLM interns cannot learn."

Daughter: "Wait, it's far from simple. It's called fine-tuning, similar to paying for my University."

Father: "Ah, cost... how much?"

Intern in University or LLM Fine-tuning

Daughter: "As the actor in Le Mans '66 said, not everything can be bought with money. Fine-tuning may cost you only $10,000, but money is only part of the success. As in university, the main goal is to teach reasoning in a specific field, but LLMs are incapable of reasoning, so what should you do? You need to feed them 100,000 or more pairs of examples, ideally marked as good and bad. For example, you need to say that if a small company is in this situation, a good tax strategy is this, and repeat that 100,000 times."

Father: "Oh, I get it. It is not possible at my scale."

Daughter: "Yes, but for some industries with specific terminology and tons of documentation not available to LLM training programs, it's an ideal way to add persistent knowledge to LLMs."

Father: "OK, so you will tell me that AI Agents are also not for me?"

Daughter: "Mhhh... the answer is not so simple. AI Agents are for everyone, and that's the good part."

Giving Some Freedom to Your Intern or Agentic AI

Daughter: "Before I dive deep into the answer, tell me, father, did you provide your new intern with a new place in the office? Did you buy a PC and a new chair? What tools are available to your intern, if any?"

Father: "Hmm, they have my local documents 'google' and, and, and nothing else. So you want me to give them office programs so they can do what?"

Daughter: "Tell me what you want interns to help with; it's a tricky question with humans. Interns often come to a job to figure out what to do and how to be helpful. But remember that computer interns cannot learn; they can only follow your instructions.

So in this case, it's good to start by offering AI agents to your employees and looking for tasks with detailed, prescribed steps for what to do and how to do it, preferably in the digital world."

Father: "Let me think about it. So I can give it Excel and ask them to calculate how many goods can fit our warehouse, constantly monitor prices, and offer me ideas about where, when, and how much to buy, right?"

Daughter: "Oh, so complex. Preferably, start with primitive and trivial tasks, but let's go with your use case. You must understand that you need a software developer who can use, e.g., the AWS Bedrock AI Agents framework to glue everything together. There's also Computer Use from Anthropic, and other frameworks like AutoGen. Oops, I went too techy. So you need a developer who will combine classical code or API calls that fetch the current status of the warehouse and the current prices for goods with an LLM that runs on a schedule and, based on your precise prompt, decides when to call The Boss :) and offer good deals on specific positions."

Father: "Interesting and complicated. So I need to think now and maybe share with employees the idea of assistants, with only one requirement - give me detailed simple instructions that these assistants should follow. It will take time for me to adapt."

Recap or the Beginning of a New Era

Father: "So LLMs are new interns, to whom I always need to give detailed context and questions via prompt engineering techniques. If I need to feed in many documents, or these documents constantly change, I need RAG. Fine-tuning is for big companies, and finally, AI Agents can perform simple actions following my instructions, but I need to invest in a software developer, right, daughter?"

Daughter: "Yes, father, and I want to write my thesis on AI Agents to manage our fleet of autonomous buses powered by our solar station located on the farm. Good night, Father."

Father: "And now you want me to sleep? No way! I'll write the first instructions for my first AI Agent. By the way, is coding hard?"

Conclusion

In conclusion, integrating generative AI technologies into business operations offers significant potential for enhancing productivity and innovation. As demonstrated through the analogy of a bakery, AI can serve as a valuable tool for executives and managers, acting as an intern that requires guidance and context to perform effectively. By understanding and utilizing concepts such as Large Language Models, Prompt Engineering, Retrieval-Augmented Generation, Fine-tuning, and AI Agents, businesses can leverage AI to streamline processes, manage large volumes of data, and automate routine tasks. However, successful implementation requires investment in both technology and human resources, such as software developers, to tailor AI solutions to specific business needs. As AI continues to evolve, it presents new opportunities for businesses to innovate and adapt, marking the beginning of a new era in business operations.

Introduction to AWS SiteWise: Part 1. An IIoT OPC UA End-to-End Guide

Andrii Melashchenko — Tue, 21 May 2024 11:23:20 +0000

Introduction: AWS IoT SiteWise for Network Device Monitoring

Welcome to this new blog series, where we explore the capabilities of AWS IoT SiteWise and unveil the "magic" it offers for building end-to-end (E2E) Industrial Internet of Things (IIoT) solutions focused on monitoring the physical health of network devices. While AWS IoT Core is also a critical component of an IoT strategy, this series will concentrate exclusively on SiteWise, leaving IoT Core to be covered in depth in a future series.

In this initial instalment, we'll outline our project's broad objectives, providing a roadmap of what we aim to achieve and the technology that will help us achieve it.

The genesis of this series stems from a practical need within my fictional role as a network administrator. Tasked with developing a system to monitor crucial aspects of our operational technology (OT) network devices—such as power supply status, temperature levels, and port functionality—I discovered that these devices are equipped with OPC UA protocol capabilities. OPC UA will be instrumental in enabling us to track and manage the health of our network equipment effectively.

Join me as we delve into AWS IoT SiteWise, leveraging its powerful features to create a robust monitoring system that ensures our network devices operate at their peak and alert us to potential issues before they impact our operations.

The Big Picture: Integrating OT Network with AWS SiteWise

The journey to fully integrate the network device into our network management strategy involves several key components, each serving a distinct purpose in a larger ecosystem. This approach enhances the monitoring and management of network devices and ensures data integrity and security from the edge to the cloud (Figure 1).

Note: Diagrams are available as code in the GitHub repo for this series: aws-sitewise-iiot-e2e/diagrams/big_picture.py at main · javatask/aws-sitewise-iiot-e2e (github.com)

On-Premise Operations

Our system begins in the operational technology (OT) environment, where physical devices grouped under "OT Network 1" are connected and managed. These devices, crucial to our daily operations, are first connected through a Hirschmann Industrial Operating System (HIOS 1) switch, which acts as the data source.

Ensuring Security and Integrity

Data from the OT network is then securely transmitted through "Firewall 1", ensuring that only authorized data flows into the IT network. This firewall acts as a critical barrier, guarding against potential threats and unauthorized access, thus maintaining the integrity of our operational data.

Bridging OT and IT

Once through the firewall, data moves into the IT network, where it is pre-processed and aggregated. This step is crucial for reducing noise and ensuring that only meaningful data is forwarded to the cloud for further analysis.

Edge Computing

At this stage, "Edge Compute" plays a vital role. It involves processing data locally, near the source of data generation, which minimizes latency and reduces the load on cloud services. Edge computing is crucial for real-time applications where immediate data processing is required to make timely decisions. Our OPC UA Client will be deployed to listen for the data from the network devices.
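To make the idea of edge pre-processing more concrete, here is a minimal sketch of two common noise-reduction steps: a deadband filter and fixed-window averaging. It is an illustration under assumptions, not the gateway's actual behaviour; the `Reading` type, the function names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    """Hypothetical sample from a network device."""
    timestamp: float  # seconds since epoch
    value: float      # e.g. device temperature


def deadband_filter(readings: Iterable[Reading], threshold: float) -> List[Reading]:
    """Keep a reading only if it differs from the last kept one by >= threshold."""
    kept: List[Reading] = []
    for reading in readings:
        if not kept or abs(reading.value - kept[-1].value) >= threshold:
            kept.append(reading)
    return kept


def window_average(readings: Iterable[Reading], window_s: float) -> List[Reading]:
    """Average readings per fixed window; each result carries the window start time."""
    buckets: Dict[float, List[float]] = {}
    for reading in readings:
        start = (reading.timestamp // window_s) * window_s
        buckets.setdefault(start, []).append(reading.value)
    return [Reading(start, mean(values)) for start, values in sorted(buckets.items())]


samples = [Reading(0, 40.0), Reading(10, 40.1), Reading(20, 41.0), Reading(70, 43.0)]
filtered = deadband_filter(samples, threshold=0.5)  # drops the 40.1 sample
averaged = window_average(samples, window_s=60)     # one value per minute
```

Either step reduces how many values travel from the edge to the cloud; which one fits depends on whether you care more about changes (deadband) or trends (averaging).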

AWS Cloud Integration

Post-edge processing, data is sent to AWS Cloud, specifically to AWS SiteWise. Here, it's modelled, stored, and analyzed. AWS SiteWise provides a robust framework for creating comprehensive asset models that reflect our network's hierarchy, facilitating advanced monitoring and analytics.

Visualization and Control

The final component in our IIoT architecture is the "Dashboard," which is accessible by network administrators. This dashboard provides a centralized, intuitive interface for monitoring and managing the network's health and performance. It offers real-time insights and analytics, enabling quick decision-making and effective operational management.

Through AWS SiteWise, we manage and automate data flows from the ground up, ensuring every layer of our network is optimized for performance, reliability, and security. This strategic integration empowers our network administrators with the tools they need to excel in today's fast-paced digital environment.

What is AWS IoT SiteWise?

AWS IoT SiteWise is a managed service from Amazon Web Services designed to simplify collecting, organizing, and analyzing industrial equipment data. This service is tailored to help companies capture data consistently across devices, understand their operational performance, and identify efficiencies within industrial operations.

At its core, SiteWise enables users to define and model their industrial operations as hierarchical assets, making it easier to organise and manage data from complex environments. This modelling capability creates virtual representations of physical operations, including devices, machinery, and processes, enabling structured data ingestion and storage.
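As a rough illustration of what an asset model looks like as data, the sketch below builds a request body in the shape I understand the SiteWise CreateAssetModel API to expect (`assetModelName`, `assetModelProperties` with `measurement` properties). Nothing is sent to AWS. The model name and the property names are hypothetical, loosely based on the power supply, temperature, and port signals mentioned earlier.

```python
from typing import Dict, List, Optional, Tuple

# (property name, SiteWise data type, optional unit)
Measurement = Tuple[str, str, Optional[str]]


def build_asset_model_request(model_name: str, measurements: List[Measurement]) -> Dict:
    """Build a CreateAssetModel-style request body; field names are assumed from the API docs."""
    properties = []
    for prop_name, data_type, unit in measurements:
        prop = {"name": prop_name, "dataType": data_type, "type": {"measurement": {}}}
        if unit is not None:
            prop["unit"] = unit  # unit is optional, so only include it when known
        properties.append(prop)
    return {"assetModelName": model_name, "assetModelProperties": properties}


# Hypothetical model for a single OT network switch.
request = build_asset_model_request(
    "OTNetworkSwitch",
    [
        ("Temperature", "DOUBLE", "Celsius"),
        ("PowerSupplyOk", "BOOLEAN", None),
        ("Port1LinkUp", "BOOLEAN", None),
    ],
)
```

Assuming the field names match, a dictionary like this could be passed as keyword arguments to the boto3 `iotsitewise` client's `create_asset_model` call.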

SiteWise also automates the process of data collection and ingestion from various sources like OPC-UA (Open Platform Communications Unified Architecture) and other industrial protocols, removing the need for custom-built data collection solutions. Once data is in SiteWise, the service provides built-in capabilities for monitoring equipment across facilities, quickly identifying issues with real-time alerts, and executing deeper analysis using AWS's powerful analytics tools.

SiteWise includes an integrated dashboard feature called SiteWise Monitor for visualisation and operational insight. This tool allows industrial engineers and operators to create customizable, interactive dashboards to view and manage operational data in real-time. Without deep technical expertise, these dashboards can help identify bottlenecks, optimize processes, and improve overall operational health.

AWS IoT SiteWise offers a comprehensive, scalable, and secure platform for industrial IoT (IIoT) applications. It aids organizations in their journey toward digital transformation by leveraging detailed insights from their equipment data.

Series Logical Blocks: Mapping Our Journey

This series will break down our journey into several key stages, each designed to systematically build and enhance our network monitoring system using AWS IoT SiteWise. Here's how we'll proceed:

Configuring Network Devices for OPC UA: Network devices must first be set up to enable OPC UA and configure the necessary credentials. While I won't cover this process in detail, I recommend using node-opcua/opcua-commander, a client that simplifies testing OPC UA connectivity.
[Part 2] Configuring Edge Compute/Gateway: Setting up our edge computing infrastructure or gateway is crucial for piping data from OPC UA Servers to the cloud.
[Part 3] Creating SiteWise Asset Models: We'll define and configure the asset models within AWS SiteWise to represent our physical devices and their data hierarchically.
[Part 4] Binding Data Streams to SiteWise Assets: This step involves linking real-time data streams to our previously configured asset models, ensuring data flows correctly through our system.
[Part 5] Developing a SiteWise Monitor Portal Dashboard: Finally, we will build a comprehensive dashboard within the SiteWise Monitor Portal. This dashboard will serve as the central interface for visualizing and managing the health and status of our network devices.

This series aims to guide you through all the necessary steps, from groundwork to full implementation, ensuring a deep understanding of each setup phase.

Conclusion: Laying the Foundation for IIoT Success with AWS SiteWise

As we conclude the first part of our journey into leveraging AWS IoT SiteWise for monitoring network devices, we have outlined the roadmap and set the stage for a transformative monitoring solution. By breaking down the journey into manageable steps, we have prepared the groundwork for a robust end-to-end monitoring system that will enhance operational efficiency and ensure the health of network devices.

In the upcoming parts of this series, we will explore each step in more detail, starting with setting up edge computing capabilities and then creating and configuring SiteWise Asset Models. Each part will build upon the last, gradually assembling a comprehensive solution that effectively captures and analyzes data and provides actionable insights through a well-designed dashboard.

Stay tuned for the next instalment, where we will explore the configuration of Edge Compute/Gateway, a critical component in reducing latency and enabling real-time data processing at the network's edge. Your engagement and feedback are valuable as we navigate this IIoT journey, ensuring each step adds tangible value to your network management strategy.

Thank you for joining me for this introductory part. I look forward to continuing this journey with you.

Fixing Issues in AWS VPC with AI Help

Andrii Melashchenko — Tue, 19 Mar 2024 16:55:31 +0000

Introduction

My story begins with the AWS Advanced Networking certification. I started learning all possible network configurations and bought a notebook to jot down all the VPC, subnet, route table, ENI, and other entity IDs I needed to debug misconfigurations. Then I had an idea: all this information is available via the AWS CLI. Maybe Generative AI can help me with this?!

So, I built a simple ReAct AI Agent to help with some simple stuff. Maybe this article will inspire you ;)

Result

If you, my honoured reader, want to understand whether this article is for you, here is the result produced by a small piece of code, about eight statements long.

Question : I cannot access web server at ec2 instance ID i-xxxxxx.

Answer : Initially, you were unable to access the web server on the EC2 instance i-xxxxxx because the associated security group did not have an inbound rule permitting web traffic. Specifically, it lacked rules allowing HTTP (port 80) or HTTPS (port 443) inbound connections.

!!!

To resolve this issue, an inbound rule has been added to the security group (sg-xxxxxx) to permit traffic on port 80 from any IP address (0.0.0.0/0).

!!!!

With this change, you should now be able to access the web server using the public IP address xxx.xxx.xxx.xxx.

However, if you continue experiencing difficulties accessing the web server after adding the security group rule, you may need to investigate further. Potential areas to review include the web server configuration on the EC2 instance, ensuring the web server software is running, and verifying that the instance is located in a public subnet with an Internet Gateway attached to the VPC's route table.

Prerequisites

You should be comfortable with the AWS Console, CloudFormation, and VPC
You should configure access to the AWS Bedrock Claude 3 Sonnet model
You should have Project Jupyter installed
Also, I'm using aws sso login to secure access to the AWS environment

Disclaimer

!!!!!!!!!!!!!!!

Everything in this section is one big WARNING: the code uses the bash shell and AWS credentials to do its "magic". Giving an LLM access to a shell on its own is high risk ;)

So please think carefully about the questions you ask and the experiments you run!

You were warned.

!!!!!!!!!!!!!!!

The code provided in this article is offered as-is, without any warranties or guarantees of any kind.

The author does not assume any responsibility for any errors, omissions, or damages that may arise from the use or misuse of the code. The user assumes all risks associated with the implementation and usage of the code.

The author shall not be held liable for any direct, indirect, incidental, special, or consequential damages that may result from the use or inability to use the code. This includes, but is not limited to, loss of data, loss of profits, business interruption, or any other commercial damages or losses.

The code is provided for educational and informational purposes only and should not be used in production environments without proper testing, validation, and security considerations.

The user is solely responsible for ensuring the code's suitability, security, and compliance with applicable laws and regulations.

The author does not endorse or recommend any specific use of the code and does not guarantee its compatibility with any particular software, hardware, or operating system. The user assumes full responsibility for any modifications made to the code and any consequences that may arise from such modifications.

By using the code, the user acknowledges and agrees to these terms and conditions. If the user does not agree with these terms, they should refrain from using or implementing the code.

!!!!!!!!!!!!!!!

Problem

One of the common challenges AWS newcomers face is misconfigured security group rules, which can lead to denied access to resources like virtual machines (VMs) or web servers running on Elastic Compute Cloud (EC2) instances. This issue often arises because security groups deny inbound traffic by default: no inbound connection is allowed unless a security group rule explicitly permits it. (Outbound traffic, by contrast, is allowed by default in a newly created security group.)

In this use case, we created a Virtual Private Cloud (VPC) with a misconfigured firewall (security group) that prevented access to a web server running on an EC2 instance on port 80 (HTTP). This scenario is a typical newbie mistake and highlights the importance of properly configuring security group rules to allow intended traffic.

To address this issue, we leveraged the power of an AI assistant specifically designed for VPC troubleshooting. This AI agent was equipped with access to a bash shell and the AWS Command Line Interface (CLI) toolkit, enabling it to interact with AWS resources and perform necessary actions.

The AI assistant first analyzed the current configuration of the VPC, including the security groups associated with the EC2 instance hosting the web server. By identifying the missing inbound rule for HTTP traffic on port 80, the AI agent was able to add the required rule to the appropriate security group, allowing inbound traffic from any IP address (0.0.0.0/0) on port 80.
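The check the agent performed can also be expressed as plain code. The sketch below is my own illustration, not part of the agent: it inspects one security group dictionary in the shape returned by `aws ec2 describe-security-groups` (as I understand that JSON output) and reports whether inbound TCP traffic on a port is allowed from a given CIDR. The function name and the sample data are hypothetical, and the CIDR comparison is an exact string match rather than real subnet math.

```python
def allows_inbound_tcp(security_group: dict, port: int, cidr: str = "0.0.0.0/0") -> bool:
    """Return True if any inbound rule permits TCP `port` from exactly `cidr`."""
    for permission in security_group.get("IpPermissions", []):
        protocol = permission.get("IpProtocol")
        if protocol == "-1":
            port_matches = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_matches = permission.get("FromPort", 0) <= port <= permission.get("ToPort", 65535)
        else:
            port_matches = False
        cidr_matches = any(r.get("CidrIp") == cidr for r in permission.get("IpRanges", []))
        if port_matches and cidr_matches:
            return True
    return False


# Sample data mirroring the misconfigured group in this article: SSH only.
ssh_only_group = {
    "GroupId": "sg-xxxxxx",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

ssh_open = allows_inbound_tcp(ssh_only_group, 22)   # True
http_open = allows_inbound_tcp(ssh_only_group, 80)  # False: this is the bug
```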

Once the security group rule was updated, the AI assistant verified that the web server could be accessed using the public IP address of the EC2 instance.

If access was still denied, the AI agent had the capability to further investigate potential issues, such as the web server configuration on the instance, ensuring the web server software was running, and verifying that the instance was in a public subnet with an Internet Gateway attached to the VPC's route table.

This use case highlights the power of an AI assistant specifically designed for VPC troubleshooting. By leveraging its knowledge and capabilities, the AI agent was able to quickly identify and resolve a common misconfiguration issue, saving valuable time and effort for AWS users, especially those new to the platform.

Implementation

CloudFormation template to simulate VPC misconfiguration:

AWSTemplateFormatVersion: '2010-09-09'
Description: VPC with EC2 Instance and Security Group
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      InstanceTenancy: default
      Tags:
        - Key: Name
          Value: MyVPC
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: true
      AvailabilityZone: !Select
        - 0
        - !GetAZs ''
      Tags:
        - Key: Name
          Value: Public Subnet
  RouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Route Table
  Route:
    Type: AWS::EC2::Route
    DependsOn: VPCGatewayAttachment
    Properties:
      RouteTableId: !Ref RouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  SubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref RouteTable
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow SSH
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: SSH Access
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0cff7528ff583bf9a
      InstanceType: t2.micro
      KeyName: test-vpc
      NetworkInterfaces:
        - AssociatePublicIpAddress: 'true'
          DeviceIndex: '0'
          GroupSet:
            - !Ref SecurityGroup
          SubnetId: !Ref PublicSubnet
      UserData: !Base64 |
        #!/bin/bash
        yum update -y
        yum install -y httpd
        systemctl start httpd
        systemctl enable httpd
      Tags:
        - Key: Name
          Value: DummyHTTPServer
Outputs:
  VPC:
    Description: A reference to the created VPC
    Value: !Ref VPC
    Export:
      Name: !Sub ${AWS::StackName}-VPCID
  PublicSubnet:
    Description: A reference to the public Subnet
    Value: !Ref PublicSubnet
    Export:
      Name: !Sub ${AWS::StackName}-SubnetID
  EC2Instance:
    Description: A reference to the EC2 instance
    Value: !Ref EC2Instance
    Export:
      Name: !Sub ${AWS::StackName}-EC2InstanceID

This CloudFormation template creates the following resources:

A VPC with DNS settings and a custom tag.
An Internet Gateway attached to the VPC.
A public subnet with public IP auto-assignment enabled.
A route table associated with the public subnet, including a route to the Internet Gateway.
A security group allowing SSH access from any IP address.
An EC2 instance in the public subnet, running an Apache HTTP server.

AI Agent

Here is the code for the AI Agent, built using the LangChain framework.

# Notebook to build AI Agent to fix VPC issues

# Access to Claude-3 Sonnet LLM using aws sso login to get credentials
from langchain_community.chat_models import BedrockChat
import boto3
boto3_bedrock  = boto3.client("bedrock-runtime", 'us-east-1')

model = BedrockChat(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={
        "temperature": 0,
        "top_k": 250,
        "top_p": 1,
    },
    client=boto3_bedrock,
)

# configure react agent
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools import ShellTool
from langchain_core.prompts import PromptTemplate

# Initialize the language model
llm = model

# Initialize the tools
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING RAW SHELL EXPOSED !!!!!!!!!!!!!!!!!!!!!!!
tools = [ShellTool()]
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING RAW SHELL EXPOSED !!!!!!!!!!!!!!!!!!!!!!!
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

# Initialize prompt
template = """
You are an AWS VPC troubleshooting assistant, and your task is to identify and resolve any misconfigurations in an AWS Virtual Private Cloud (VPC) and its associated resources.

You have access to the bash shell, which allows you to run AWS CLI commands to retrieve information about AWS resources, make changes, and perform various operations.
List of tools:
{tools}
{tool_names}

First, gather information about the existing VPC setup by running the following AWS CLI commands:

1. `aws ec2 describe-vpcs`: This command will list all available VPCs in the current region, along with their CIDR blocks, VPC IDs, and other details. Review the output to ensure the VPC you want to troubleshoot exists and has the correct CIDR block.

2. `aws ec2 describe-subnets --filters Name=vpc-id,Values=<vpc-id>`: Replace `<vpc-id>` with the ID of the VPC you want to troubleshoot. This command will list all subnets associated with the specified VPC, their Availability Zones, and CIDR blocks. Verify that the subnets have the correct CIDR blocks and are associated with the intended Availability Zones.

3. `aws ec2 describe-route-tables --filters Name=vpc-id,Values=<vpc-id>`: This command will list all route tables associated with the specified VPC. Review the routes in each route table to ensure they are configured correctly and match your expectations. For example, check if there are routes to an Internet Gateway for public subnets, and routes to a NAT Gateway or VPN for private subnets.

4. `aws ec2 describe-security-groups --filters Name=vpc-id,Values=<vpc-id>`: This command will list all security groups associated with the specified VPC. Review the inbound and outbound rules for each security group to ensure they are configured correctly and allow or deny the intended traffic.

5. `aws ec2 describe-network-acls --filters Name=vpc-id,Values=<vpc-id>`: This command will list all network ACLs associated with the specified VPC. Review the inbound and outbound rules for each network ACL to ensure they are configured correctly and allow or deny the intended traffic.

After gathering this information, analyze the output and identify any potential misconfigurations or deviations from the expected setup. If you find any issues, provide step-by-step instructions on how to fix them using the appropriate AWS CLI commands.

Remember to double-check your commands before executing them, as some commands may make changes to the AWS environment.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""
prompt = PromptTemplate.from_template(template)

# Initialize the agent
agent = create_react_agent(llm, tools, prompt)

# Run the agent
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

agent_executor.invoke({"input": "I cannot access web server at ec2 instance ID i-xxxxxx"})


The provided code demonstrates how to build an AI Agent for troubleshooting AWS Virtual Private Cloud (VPC) issues using the AWS Bedrock runtime and the Claude-3 Sonnet language model from Anthropic. This agent leverages the power of large language models (LLMs) and the AWS Command Line Interface (CLI) to identify and resolve misconfigurations in VPC setups.

Setting up the Environment

The code begins by importing the necessary libraries and establishing a connection to the AWS Bedrock runtime using the boto3 library. The BedrockChat model from the langchain_community package is used to interface with the Claude-3 Sonnet language model.

from langchain_community.chat_models import BedrockChat
import boto3
boto3_bedrock = boto3.client("bedrock-runtime", 'us-east-1')

model = BedrockChat(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={
        "temperature": 0,
        "top_k": 250,
        "top_p": 1,
    },
    client=boto3_bedrock,
)

Configuring the Agent

The code then configures the AI agent using the create_react_agent function from the langchain.agents module. The agent is equipped with a ShellTool from the langchain_community.tools module, which allows it to execute shell commands, including AWS CLI commands.

from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools import ShellTool
from langchain_core.prompts import PromptTemplate

llm = model
tools = [ShellTool()]
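Because the raw shell is the riskiest part of this setup, one possible mitigation is to screen every command before it runs. The sketch below is an assumption of mine and not part of the original notebook: a deliberately conservative check that accepts only a single `aws <service> <operation>` call whose operation starts with `describe-`, `get-`, or `list-`. The function name and the prefix list are hypothetical.

```python
import shlex

# Operation prefixes treated as read-only (an assumption; extend with care).
READ_ONLY_PREFIXES = ("describe-", "get-", "list-")

# Characters that could chain, redirect, or substitute commands in a shell.
FORBIDDEN_CHARS = ";|&`$><\n"


def is_read_only_aws_command(command: str) -> bool:
    """Accept only one plain `aws <service> <read-only operation> ...` invocation."""
    if any(ch in command for ch in FORBIDDEN_CHARS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(tokens) < 3 or tokens[0] != "aws":
        return False
    return tokens[2].startswith(READ_ONLY_PREFIXES)
```

A gate like this would limit the agent to diagnosing problems: the fix shown at the start of the article adds a security group rule, which is a write operation and would be rejected. It is also stricter than necessary, for example it rejects global options placed before the service name. Write commands could instead be routed through a human confirmation step.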

Defining the Prompt

A prompt template guides the agent's behaviour and provides instructions and context for the troubleshooting task. The template includes a list of AWS CLI commands that the agent can execute to gather information about the VPC setup, such as describing VPCs, subnets, route tables, security groups, and network ACLs.

template = """
# ... (prompt template omitted for brevity)
"""
prompt = PromptTemplate.from_template(template)
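To show what the Thought/Action/Action Input format in the prompt buys us, here is a minimal, hypothetical parser for a single model response. It is not LangChain's own output parser, only a sketch of the idea: the executor needs to tell a tool call apart from a final answer so it knows whether to run a command or stop.

```python
import re
from typing import Optional, Tuple

# Captures the tool name and its input from one ReAct-style response.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def parse_react_step(text: str) -> Tuple[str, Optional[str], str]:
    """Return ("final", None, answer) or ("action", tool_name, tool_input)."""
    if "Final Answer:" in text:
        return ("final", None, text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("Response follows neither the action nor the final-answer format")
    return ("action", match.group(1).strip(), match.group(2).strip())


step = parse_react_step(
    "I should list the VPCs first.\nAction: terminal\nAction Input: aws ec2 describe-vpcs"
)
done = parse_react_step("I now know the final answer\nFinal Answer: Port 80 was closed.")
```

In the notebook, `handle_parsing_errors=True` covers the case this sketch handles by raising `ValueError`: the model sometimes produces text that matches neither form.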

Running the Agent

Finally, the code initializes the agent using the create_react_agent function and runs it using the AgentExecutor. The agent can be invoked with a specific input question or problem statement related to VPC troubleshooting.

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

agent_executor.invoke({"input": "I cannot access web server at ec2 instance ID i-xxxxxx"})

When invoked, the agent will follow the instructions in the prompt, gathering information about the VPC setup using the AWS CLI commands. It will analyze the output, identify potential misconfigurations, and provide step-by-step instructions on how to fix any issues using the appropriate AWS CLI commands.

This AI Agent can be a valuable tool for AWS administrators and developers who are responsible for managing and troubleshooting VPC setups. By leveraging the power of large language models and the AWS CLI, the agent can quickly identify and resolve complex VPC issues, streamlining the troubleshooting process and improving overall operational efficiency.

A New Result

This time, I removed the route to the Internet Gateway from the public subnet's route table.

Question : I cannot access web server at ec2 instance ID i-xxxxxx.

Answer : To allow internet access to the web server on the EC2 instance (i-xxxxxx), follow these steps:

1. Create an Internet Gateway:

aws ec2 create-internet-gateway

2. Attach the Internet Gateway to the VPC:

aws ec2 attach-internet-gateway --vpc-id vpc-03xxxxx --internet-gateway-id <internet-gateway-id>

Replace <internet-gateway-id> with the ID of the Internet Gateway created in step 1.

3. Find the route table associated with the public subnet (subnet-03xxxxxx):

aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=subnet-03xxxxxx --query 'RouteTables[*].RouteTableId' --output text

4. Create a route in the route table to the Internet Gateway:

aws ec2 create-route --route-table-id <route-table-id> --destination-cidr-block 0.0.0.0/0 --gateway-id <internet-gateway-id>

Replace <route-table-id> with the ID of the route table associated with the public subnet, and <internet-gateway-id> with the ID of the Internet Gateway created in step 1.

After completing these steps, instances launched in the public subnet (subnet-03xxxxxx) should be able to access the internet and be accessible from the internet. You should now be able to access the web server on the EC2 instance (i-xxxx) using its public IP address (xxx.xxx.xxx.xxx).
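The missing-route condition from this second experiment can be checked in a few lines of code as well. The sketch below is my own illustration, assuming the dictionary shape of one entry in the `RouteTables` list returned by `aws ec2 describe-route-tables`; the function name and the sample IDs are hypothetical.

```python
def has_default_igw_route(route_table: dict) -> bool:
    """Return True if the table has an active 0.0.0.0/0 route to an Internet Gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


# Only the implicit local route is left, as in this experiment.
broken_table = {
    "RouteTableId": "rtb-xxxxxx",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ],
}

# The same table after a default route to the Internet Gateway is added.
fixed_table = {
    "RouteTableId": "rtb-xxxxxx",
    "Routes": broken_table["Routes"] + [
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-xxxxxx", "State": "active"},
    ],
}

is_public_before = has_default_igw_route(broken_table)  # False
is_public_after = has_default_igw_route(fixed_table)    # True
```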

Conclusion

In conclusion, leveraging Generative AI and the AWS CLI to troubleshoot AWS VPC issues offers a promising approach to network management. This methodology streamlines the process of identifying and resolving misconfigurations and reduces the time and effort required for these tasks. By employing a simple ReAct AI Agent integrated with the Claude-3 Sonnet model, we demonstrated a practical application of AI in automating the fix for a common security group misconfiguration and in diagnosing a missing route to the Internet Gateway, thereby restoring access to EC2 instances. This exploration underscores the potential of AI in simplifying cloud networking challenges, making it a useful tool for AWS users seeking to debug their networking configurations efficiently.

Understanding Amazon Bedrock Knowledge Bases: Key Points

Andrii Melashchenko — Fri, 12 Jan 2024 20:41:53 +0000

Article complexity: Intermediate. Target audience: managers and engineers

Introduction

AWS re:Invent 2023 was full of Generative AI announcements, one of them being Amazon Bedrock Knowledge Bases. This article tells the story of starting an internal production-ready AI project in a fictional company, specifically one based on retrieval-augmented generation (RAG). Simply put, your AI assistant is powered by your company's knowledge captured in different forms like PDFs, articles, whitepapers, etc.

In telling this story, I assume that you 1) have already played with one of the generative AI chatbots like ChatGPT, MS Bing Copilot, or Google Bard; 2) have had a chance to try state-of-the-art models like GPT-4 or Claude 2.1; and 3) know about the "knowledge cutoff" of Large Language Models (LLMs): their knowledge is limited to a concrete date.

An idea for AI-powered Assistant

The story starts with a fictional company called "ZeHouse Solutions" (ZHS), which has been producing hubs to host smart-home and smart-office solutions since the 2000s, when Zigbee was still in its early days. For the last two decades, the company has been using its own data centre for software development and a partner network to deliver its product to end customers.

An essential feature of ZHS hubs was their offline mode. There is no need for Cloud connectivity to set up and run your hub. It's your responsibility to download firmware and upgrade your devices. Also, all custom configuration was the responsibility of integrators or end customers.

In 2024, ZHS engineers saw the potential of generative AI to help their customers configure hubs. With hundreds of PDFs and other documentation for hub features developed over 20 years, mastering the CLI and the web interface was challenging for customers.

So, the engineers played with a proof of concept for a generative AI-powered assistant and are ready for the next step: talking to their managers...

Engineers + Managers = Shared Understanding of GenAI product

Before deep-diving into discussions about the architecture, budget, responsibility, and outcomes, the engineers offered a mental model of the AI system based on a shiny new book, "Generative AI on AWS" by Chris Fregly, Antje Barth, and Shelbee Eigenbrode (O'Reilly). Copyright 2024 Flux Capacitor, LLC, Antje Barth, and Shelbee Eigenbrode, 978-1-098-15922-1.

So guys! Let's map typical components of an AI system to our trainee program activities!

According to Chapter 9 of the book, we have the following components of the system: 0) Infrastructure, 1) Generative models and supporting machine learning (ML) models, 2) Information sources, 3) External systems, 4) Tools and frameworks, 5) Monitoring and logging, 6) Generated outputs and feedback, 7) Operational tooling.

I'll skip the Infrastructure and Operational tooling blocks and jump straight to Generative models. They will be our trainees :)

We have people, but we must empower them with our documentation = Information sources. We will imagine information sources as our company library!

What is the next block on the list? External systems? What is it? There are a lot of questions about our complex systems, but not all of them can be answered by reading our documentation; we need experts! Yes! External systems are our experts.

We have trainees with access to our documentation library and experts to find the best possible answers for customer questions. What we are missing are Tools and frameworks to get work done.

And "Monitoring and logging" to understand progress and whether our team needs help with their job.

Oh, some of the engineers said there are many moving parts! Let's show the managers our model of trainees using laptops to access monitored documentation libraries and chat with experts to answer the questions. And here comes the most important part: who will assess the quality of the answers? Managers? No! :)

Your user community and partners know questions and the best answers to them. Building and working with your community is the key to success!

Note. Dear reader, please note that generative AI is not capable of mimicking the mechanisms of the human brain. It literally predicts words when answering your questions.

The model offered here only illustrates the moving parts that you need to combine in order to offer a generative AI-powered product. It is targeted at less technical colleagues, to show them the complexity of the product and the roles that a GenAI project or product requires.

The AI assistant is ready for production. Do we need to panic?

Our small team of enthusiastic engineers brought our comic-style mental model to the meeting with managers and got approval. Hooray! Or not? How do you turn a proof of concept built with JupyterLab + LangChain into a product, even one for internal use?

And here comes the book we mentioned: "Generative AI on AWS"! We can use AWS Bedrock services to build our backend end to end. Let's try it using the same moving parts model!

The Challenge

The primary challenge was to demonstrate the effectiveness of RAG in a real-world application. The team aimed to build a system that leveraged the vast amount of information in PDF documentation and provided contextually relevant and accurate information in a conversational AI setting. They needed a platform seamlessly integrating various components, including data storage, a vector database for efficient retrieval (library together with experts), and a hosted LLM (trainees).
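To make the retrieval part of RAG more concrete, here is a minimal sketch of the idea, not the team's actual pipeline. It assumes a toy bag-of-words "embedding" instead of a learned embedding model, an in-memory list instead of a vector database, and made-up documentation snippets (including the hypothetical `ntp set-server` command).

```python
import math
import re
from collections import Counter

# Hypothetical documentation snippets standing in for the PDF library.
DOCS = [
    "Use the ntp set-server command to configure time synchronization.",
    "Firmware upgrades are downloaded manually and installed from USB.",
    "Zigbee devices are paired from the web interface pairing page.",
]

def embed(text):
    """Toy embedding: a bag-of-words count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, docs):
    """Augment the question with retrieved context before it goes to the LLM."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I configure time synchronization?", DOCS))
```

The hosted LLM (the trainee) only ever sees the augmented prompt; the quality of the answer therefore depends heavily on what the retrieval step (the library) hands over.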

The Solution: AWS Bedrock

AWS Bedrock emerged as the ideal solution for this challenge. It provided a comprehensive environment facilitating easy experimentation with advanced AI concepts like RAG. The key features that made AWS Bedrock an excellent choice for this project included:

Out-of-the-Box Data Import from S3: AWS Bedrock allowed the team to import their PDF documentation directly from Amazon S3 through the Knowledge base for Amazon Bedrock service. This streamlined the process of getting data into the RAG pipeline.
Vector Database Integration: Integrating a vector database was essential for the RAG setup. It enabled efficient storage and retrieval of information, the backbone of the RAG system. The vector database within Amazon OpenSearch Serverless ensured that RAG's retrieval component was fast and accurate.
Hosted Large Language Model (LLM): AWS Bedrock provided access to a garden of state-of-the-art LLMs. These models served as the foundation for generating responses, while the external data fetched from the vector database enriched these responses with relevant information. The team could also choose which model is better for the AI assistant, including fine-tuned models.
Model Evaluation Tools: The platform offered comprehensive model evaluation tools. These tools were essential for assessing the performance of the RAG system, allowing the team to measure the accuracy, relevance, and coherence of the responses generated by the LLM and to ensure that the integration with the vector database was functioning optimally.
Fine-Tuning Capabilities: AWS Bedrock enabled the team to fine-tune the LLM to suit specific use cases. This customization was crucial for tailoring the model's responses to the dataset's smart home terminology and the project's particular requirements. By fine-tuning the LLM, the team could optimize its performance and improve the relevancy of the generated content.
Robust Security Measures: Security is paramount when dealing with large datasets and AI models. AWS Bedrock provided robust security measures to protect data and model integrity. This included secure data handling practices, encryption, and compliance with industry-standard privacy regulations, ensuring that the team's RAG system was both efficient and secure.
Comprehensive Monitoring: Finally, AWS Bedrock offered extensive monitoring capabilities. This feature allowed the team to track the system's performance in real time, promptly identify any issues, and ensure that the system ran smoothly and efficiently. Monitoring was especially important for tracking the system's interaction with the vector database and the LLM's response generation.

The Experimentation Process

The team started by importing a curated dataset from S3 into the vector database. This dataset was designed to represent the range of documentation about the smart home and office hubs that the LLM could use when answering complex configuration questions, first from the internal support team and later from customers directly.

Next, they integrated the LLM with the vector database. The LLM was configured to query the database dynamically during the generation process, fetching relevant information to augment its responses. For this, they used the AWS Bedrock RetrieveAndGenerate endpoint.
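As a rough illustration of what a call to this endpoint could look like from Python, here is a minimal sketch using boto3's `bedrock-agent-runtime` client. The knowledge base ID, the model ARN, and the region are placeholders chosen for illustration, and the `ask` helper needs boto3 plus AWS credentials to run; only the request builder is exercised here.

```python
def build_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for the RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }

def ask(question, knowledge_base_id, model_arn, region="us-east-1"):
    """Send the question to Bedrock and return the generated answer text."""
    import boto3  # imported lazily so build_request works without the AWS SDK
    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve_and_generate(
        **build_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]

# Placeholder values: substitute your own knowledge base ID and model ARN.
request = build_request(
    "Which CLI commands configure NTP time synchronization?",
    "EXAMPLEKBID",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1",
)
print(request["retrieveAndGenerateConfiguration"]["type"])
```

Keeping the request builder separate from the network call makes it easy to inspect or unit-test the payload before spending tokens on a real invocation.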

Results and Insights

The proof of concept was a resounding success. The RAG system demonstrated a remarkable ability to provide contextually relevant and detailed answers. For instance, when asked about CLI commands to configure time synchronization using the NTP server, the system was able to pull recent data from the vector database, providing responses that were not only accurate but also up-to-date.

Conclusion

The introduction of AWS Bedrock has given ZeHouse Solutions an effective tool to develop a retrieval-augmented generation (RAG) system. This AI-powered assistant uses the company's extensive documentation to provide accurate and contextually relevant responses to complex configuration questions. The RAG system's success demonstrates the potential of generative AI in enhancing customer service and internal support. Looking forward, it is clear that AWS Bedrock's capabilities will continue to be instrumental in the evolution and improvement of AI-assisted services.

Note. All images in this article were generated by AI (OpenAI DALL-E engine).

AI Unleashed: Running Generative Models Locally. Introduction

Andrii Melashchenko — Thu, 04 Jan 2024 10:01:05 +0000

Introduction

This article is the first in the series about running a generative AI model locally on consumer-grade hardware to give you a safe place to experiment and understand your use cases. I will try to answer the question of how to safely start experimenting with your real data.

Gartner states, "In theory, at least, this (Generative AI) will increase worker productivity". Another Gartner recommendation is to "Start Inside ... with ... Off-the-shelf products". But it's a theory.

In this series, I'll offer practical steps for implementing Gartner's recommendations for regulated (internally and externally) environments. The series promotes the usage of open-source large language models (LLMs) that are the heart of Generative AI (GenAI).

Note. If you have no restrictions on processing your real-world data with major public cloud providers' services, like AWS Bedrock, Azure AI or Google Vertex AI, go for the Public Cloud GenAI offerings!

Key Definitions

Generative AI: This is a subfield of artificial intelligence that uses models and algorithms to generate content. It can create anything from written text to images or music, by learning patterns from existing data and producing new content that mimics it.

Large Language Models (LLMs): These are AI models trained on a vast amount of text data. They can generate human-like text by predicting the probability of a word given the previous words used in the text. Examples include OpenAI's GPT-3 and Google's Gemini.

AI Agents: These are systems or software that can perform tasks or make decisions autonomously. They use Generative AI to "understand" their environment and perform actions to achieve specific goals.

CUDA: This stands for Compute Unified Device Architecture. It's a parallel computing platform and application programming interface model created by Nvidia. It allows software developers to use a CUDA-enabled graphics processing unit for general-purpose processing.

Chatbot: A chatbot is an AI-powered software designed to interact with humans in their natural languages. These interactions can occur in both text and voice formats.

Machine Customers: This is a term coined by Gartner to describe AI systems that can autonomously perform tasks or make purchasing decisions on behalf of human users or other systems.

Public Cloud GenAI offerings: These are Generative AI models or services offered by public cloud providers like Google, Amazon, and Microsoft. They provide pre-trained models and services which developers can use to integrate AI capabilities into their applications.
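The LLM definition above mentions predicting the probability of a word given the previous words. A toy way to see that idea is a bigram model, which looks only one word back. This is a deliberately simplified sketch on a made-up corpus; real LLMs use neural networks over much longer contexts.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_word_probabilities(follows, prev):
    """Turn the follow-up counts for `prev` into probabilities."""
    counts = follows.get(prev, Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

# Made-up training text; a real model is trained on vastly more data.
corpus = "the hub is offline the hub is online the cloud is optional"
model = train_bigrams(corpus)
print(next_word_probabilities(model, "the"))
```

In this corpus "the" is followed by "hub" twice and by "cloud" once, so the model considers "hub" the more likely continuation.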

Steam Revolution is here

This article has a steam engine on its cover because steam and, later, electric engines changed the world. The same is happening with AI, specifically with AI Agents.

According to LangChain: "The core idea of (AI) agents is to use a language model to choose a sequence of actions. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order."

So GenAI becomes the "brain", an orchestrator of tools you may use to achieve a given goal or react to a situation. This agent can work 24/7, process enormous amounts of data and be "objective". Gartner calls one type of such agent "Machine Customers".
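The difference between a chain and an agent can be sketched in a few lines. This is an illustration only: it does not use LangChain, the two tools are hypothetical, and `fake_llm_choose` is a hand-written stand-in for the language model that would normally pick the next action.

```python
# Two toy "tools" an assistant could call (both hypothetical).
def search_docs(state):
    state["context"] = "NTP is configured in the time settings."
    return state

def write_answer(state):
    state["answer"] = f"Based on the docs: {state['context']}"
    return state

TOOLS = {"search_docs": search_docs, "write_answer": write_answer}

def run_chain(state):
    """Chain: the sequence of actions is hardcoded in the program."""
    for tool in (search_docs, write_answer):
        state = tool(state)
    return state

def fake_llm_choose(state):
    """Stand-in for the language model: picks the next action from the state."""
    if "context" not in state:
        return "search_docs"
    if "answer" not in state:
        return "write_answer"
    return "finish"

def run_agent(state, choose=fake_llm_choose, max_steps=5):
    """Agent: a model decides which tool to call next, until it says 'finish'."""
    trace = []
    for _ in range(max_steps):
        action = choose(state)
        if action == "finish":
            break
        trace.append(action)
        state = TOOLS[action](state)
    return state, trace

final_state, trace = run_agent({"question": "How do I set up NTP?"})
print(trace)
```

Both functions end up with the same answer here, but only `run_agent` could take a different path if the chooser decided, for example, that no document search was needed.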

Application Covered by the Series

This series will have two big blocks:

Setting up and running your "brain" = LLM locally
- With CPU only
- With CPU and an Nvidia GPU (you need a GPU that has CUDA support)
Configuring apps that use the "brain" to deliver value:
- ChatGPT-like chatbot
- GitHub Copilot-like VS Code AI Assistant
- AWS Knowledge Bases-like super-powered search on private docs
- AI Agents - Demo on how the "brain" can use external APIs

Note. Diagram shows that Public Cloud providers offer you LLM APIs, so you don't need to worry about hardware and other supportive software.

Manage your expectations

Technology is here, but hardware is still evolving. Following my tutorials on setting up GenAI locally, you will soon notice that some models are "dumber" than GPT-4 (a state-of-the-art, closed model). They are also slower, because you may not have the latest and greatest CUDA-enabled GPU.

BUT local LLM technology is good enough for you to start experimenting with GenAI and get value out of it. Remember: GenAI is a company-wide initiative, not an IT initiative!

Conclusion

Setting up a local Generative AI model can be a game-changer, providing an avenue to explore, experiment, and build expertise in its use cases. While the technology is available, remember that hardware is still in a state of evolution. Despite some models being slower and less sophisticated than state-of-the-art models, leveraging these tools locally offers a valuable opportunity to experiment and identify the best use cases for Generative AI. This is not just an IT initiative, but a company-wide effort that can revolutionize productivity and efficiency. As we move forward, the fusion of AI with our daily tools and tasks will become increasingly integral to our work and lives.

Deploying a Single Page Application (SPA) on AWS: A Beginner's Guide. Part 8. Serverless

Andrii Melashchenko — Tue, 12 Dec 2023 10:14:16 +0000

Introduction

Serverless architecture and AWS Lambda have brought a revolutionary change to how developers build and deploy applications. By eliminating the need to manage servers, developers can focus more on writing code and delivering features. This article explores the benefits of serverless, such as reduced operational overhead, cost-effectiveness, and scalability. Additionally, we delve into AWS Lambda, a popular serverless computing platform, and how it can streamline your application development process.

For theoretical insights into serverless, you can visit serverlessland.com.

Tooling

AWS SAM (Serverless Application Model) is a framework that simplifies the development and deployment of serverless applications on AWS. It extends AWS CloudFormation to provide a simplified way of defining the resources and functions required for your application. To install AWS SAM, you can use the AWS SAM CLI, which is available through package managers like Homebrew, pip, or by downloading the installer from the official AWS SAM GitHub repository.

Java Code + SAM CloudFormation Template = Production Backend

The Price of Serverless

To fully benefit from serverless architecture, it's essential to refactor your code to be serverless-compatible with AWS Lambda. This process involves adapting your application's structure and codebase to work with Lambda functions, which are stateless, event-driven, and automatically managed by AWS. By doing so, you can take advantage of reduced operational overhead, cost-effectiveness, and improved scalability without the need to manage servers.

How Complex Is the Refactoring?

With a Spring Boot 3 application, making your code serverless-friendly is as simple as importing one library from AWS that wraps your Spring Boot app in an AWS Lambda handler:

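package com.example.restservice;

import com.amazonaws.serverless.exceptions.ContainerInitializationException;
import com.amazonaws.serverless.proxy.model.AwsProxyResponse;
import com.amazonaws.serverless.proxy.model.HttpApiV2ProxyRequest;
import com.amazonaws.serverless.proxy.spring.SpringBootLambdaContainerHandler;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class GreetingLambdaHandler implements RequestStreamHandler {
    // Initialized once per Lambda execution environment and reused across invocations.
    private static SpringBootLambdaContainerHandler<HttpApiV2ProxyRequest, AwsProxyResponse> handler;
    static {
        try {
            handler = SpringBootLambdaContainerHandler.getHttpApiV2ProxyHandler(RestServiceApplication.class);
        } catch (ContainerInitializationException e) {
            e.printStackTrace();
            throw new RuntimeException("Could not initialize Spring Boot application", e);
        }
    }

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
            throws IOException {
        // Pass the raw HTTP API (payload v2) event to Spring and stream the response back.
        handler.proxyStream(inputStream, outputStream, context);
    }
}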

That's all!

Deploying the App

Here's the CloudFormation template (CFN) for our Spring Rest API backend:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Example Rest API Serverless for Spring Boot 3

Resources:
  MyRestApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.restservice.GreetingLambdaHandler::handleRequest
      Runtime: java17
      CodeUri: .
      MemorySize: 512
      Policies: AWSLambdaBasicExecutionRole
      Timeout: 60
      Events:
        HttpApiEvent:
          Type: HttpApi
          Properties:
            TimeoutInMillis: 20000
            PayloadFormatVersion: "2.0"

Outputs:
  RestApi:
    Description: URL for application
    Value: !Sub "https://${ServerlessHttpApi}.execute-api.${AWS::Region}.amazonaws.com/api"

Now, let's build and deploy it:

sam build
sam deploy -g

After the deployment, SAM prints the RestApi output. Use that URL, replacing <api-id> and <region> with the values from your stack:

curl https://<api-id>.execute-api.<region>.amazonaws.com/api/greeting

So simple!

Let's Deep Dive into the CFN Template

The provided CloudFormation template describes AWS resources to be created or configured, focusing on a serverless architecture for hosting a REST API using Spring Boot 3 and Java 17.

Here's a breakdown of each part:

AWSTemplateFormatVersion: Specifies the template format version, in this case '2010-09-09'.
Transform: Indicates that the template uses the AWS Serverless Application Model (SAM) transformation named 'AWS::Serverless-2016-10-31'. This transformation provides a simplified syntax for describing serverless resources such as Lambda functions and APIs.
Description: Provides a brief description of the template's purpose, specifically mentioning that it's an example REST API using serverless technology for Spring Boot 3.
Resources: The main section, where all the resources to be created or configured are defined:
MyRestApiFunction: Describes a Lambda function that will handle the REST API requests.
- Handler: Specifies the Java class and method that AWS Lambda will call to start executing your function.
- Runtime: Specifies the runtime environment (Java 17).
- CodeUri: The location of the code for the Lambda function (in this case, the current directory).
- MemorySize: The amount of memory available to the function during execution (512 MB).
- Policies: The permissions assigned to the function, here the AWS managed policy AWSLambdaBasicExecutionRole, which lets the function write logs to CloudWatch.
- Timeout: The maximum amount of time the function is allowed to run (60 seconds).
- Events: Defines an HTTP API event that triggers the Lambda function. It sets a 20-second timeout (TimeoutInMillis: 20000) and payload format version 2.0 for the event.
Outputs: This section defines values that you can import into other stacks, return in response, or view in the AWS CloudFormation console.
RestApi: The output provides the URL for the REST API application, constructed dynamically from the HTTP API ID and the region where the stack is launched.

Overall, this CloudFormation template represents a serverless application for a RESTful API using Spring Boot 3, capable of being deployed on AWS using Lambda and other serverless technologies. It includes configurations and properties that define how the API and Lambda functions should behave.
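If you would rather read the RestApi output programmatically than copy it from the console, the following sketch shows one possible way with boto3. The stack name you pass in and the sample response values are assumptions for illustration; `rest_api_url` needs boto3 and AWS credentials, while `find_output` works on any dict shaped like a `describe_stacks` response.

```python
def find_output(describe_stacks_response, key):
    """Pick one output value from a describe_stacks-style response dict."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(f"stack has no output named {key!r}")

def rest_api_url(stack_name):
    """Look up the RestApi output of a deployed stack (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so find_output can be used without the AWS SDK
    response = boto3.client("cloudformation").describe_stacks(StackName=stack_name)
    return find_output(response, "RestApi")

# A hand-written response with made-up values, shaped like the real one.
sample = {"Stacks": [{"Outputs": [
    {"OutputKey": "RestApi",
     "OutputValue": "https://abc123.execute-api.eu-central-1.amazonaws.com/api"},
]}]}
print(find_output(sample, "RestApi") + "/greeting")
```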

Summary

This article introduced the benefits of serverless architecture and AWS Lambda for application development, highlighting reduced operational overhead, cost-effectiveness, and scalability. We explored the AWS SAM framework and the process of refactoring code for serverless compatibility, and provided an example of deploying a Spring Boot 3 REST API using a CloudFormation template.

Deploying a Single Page Application (SPA) on AWS: A Beginner's Guide. Part 7. AWS App Runner

Andrii Melashchenko — Mon, 04 Dec 2023 07:25:01 +0000

Introduction

In this part of the series, the focus is on running your backend application in a production environment. The goal is to provide the necessary tooling to fulfil the Observable, Seamlessly Updatable, and Failure-Tolerant criteria of a Cloud-Ready App, as described in the FROSST criteria from the Cloud Strategy - A Decision-based Approach to Successful Cloud Migration - The Architect Elevator Book.

Before CloudFormation (CFN) Template

Before diving into the CloudFormation template, it's essential to set up observability. This can be done by creating an observability configuration using the AWS CLI:

aws apprunner create-observability-configuration --observability-configuration-name rest-api --trace-configuration Vendor=AWSXRAY

After creating the configuration, retrieve the ARN from the ObservabilityConfigurationArn field in the output, as shown in the JSON snippet:

{
    "ObservabilityConfiguration": {
        "ObservabilityConfigurationArn": "arn:aws:apprunner:eu-central-1:1111111111:observabilityconfiguration/rest-api/1/xxxxxx",
        "ObservabilityConfigurationName": "rest-api",
        "TraceConfiguration": {
            "Vendor": "AWSXRAY"
        },
        "ObservabilityConfigurationRevision": 1,
        "Latest": true,
        "Status": "ACTIVE",
        "CreatedAt": "2023-10-12T19:11:23.425000+02:00"
    }
}

App Runner CloudFormation (CFN) Template to Support Failure Tolerance

Let's dive into the CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Description: Backend deployed via AWS App Runner
Parameters:
  Environment:
    Type: String
    Default: production
    Description: A name for the environment that this cloudformation will be part of.
                 Used to locate other resources in the same environment.
  ImageUrl:
    Type: String
    Description: The url of a docker image that contains the application process that
                 will handle the traffic for this service. Should be on ECR private
  ObservabilityConfigurationArnParam:
    Type: String
    Description: Create it via aws cli aws apprunner create-observability-configuration --observability-configuration-name rest-api --trace-configuration Vendor=AWSXRAY

Resources: 
  # Backend
  AppRunnerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2008-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - build.apprunner.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSAppRunnerServicePolicyForECRAccess

  Backend:
    Type: AWS::AppRunner::Service
    Properties:
      ServiceName: !Sub ${AWS::StackName}-restapi-${Environment}
      ObservabilityConfiguration:
        ObservabilityEnabled: true
        ObservabilityConfigurationArn: !Ref ObservabilityConfigurationArnParam
      InstanceConfiguration:
        Cpu: 0.25 vCPU
        Memory: 0.5 GB
      SourceConfiguration:
        AuthenticationConfiguration:
          AccessRoleArn: !GetAtt AppRunnerRole.Arn
        ImageRepository:
          ImageRepositoryType: ECR
          ImageIdentifier: !Ref ImageUrl
          ImageConfiguration:
            Port: 8080
      HealthCheckConfiguration:
        HealthyThreshold: 1
        Interval: 10
        Path: /api/greeting
        Protocol: HTTP
        Timeout: 5
        UnhealthyThreshold: 5
      Tags:
        - Key: environment
          Value: !Ref Environment

  # Api Gateway
  ApiGatewayRestApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: !Sub ${AWS::StackName}-gateway-${Environment}
  ApiGatewayResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId:
        !GetAtt ApiGatewayRestApi.RootResourceId
      PathPart: '{proxy+}'
      RestApiId:
        !Ref ApiGatewayRestApi
  ApiGatewayMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      HttpMethod: ANY
      ResourceId:
        !Ref ApiGatewayResource
      RestApiId:
        !Ref ApiGatewayRestApi
      AuthorizationType: NONE
      RequestParameters:
        method.request.path.proxy: true
      Integration:
        RequestParameters:
          integration.request.path.proxy: method.request.path.proxy
        Type: HTTP_PROXY
        IntegrationHttpMethod: ANY
        Uri: !Sub "https://${Backend.ServiceUrl}/{proxy}"
  ApiGatewayDeployment:
    Type: 'AWS::ApiGateway::Deployment'
    DependsOn:
      - ApiGatewayMethod
    Properties:
      RestApiId:
        !Ref ApiGatewayRestApi
      StageName: !Ref Environment

Outputs:
  AppRunnerServiceArn:
    Description: AppRunnerServiceArn
    Value: !GetAtt Backend.ServiceArn
  AppRunnerServiceId:
    Description: AppRunnerServiceId
    Value: !GetAtt Backend.ServiceId
  AppRunnerServiceUrl:
    Description: AppRunnerServiceUrl
    Value: !GetAtt Backend.ServiceUrl
  IntegrationAPI:
    Description: URL For CDN
    Value: !Sub "https://${ApiGatewayRestApi}.execute-api.${AWS::Region}.amazonaws.com/${Environment}"

This AWS CloudFormation template defines the infrastructure for deploying a backend application using AWS App Runner and exposing it via AWS API Gateway. Let's break down its key components:

Parameters

Environment: Specifies the name of the environment (e.g., production or development) that this CloudFormation stack is part of.
ImageUrl: The URL of a Docker image containing the application process to handle traffic. This image should be stored in a private Amazon ECR registry.
ObservabilityConfigurationArnParam: ARN for the observability configuration, which can be created using the AWS CLI.

Resources

AppRunnerRole (IAM Role): This role allows the App Runner service to assume it and interact with other AWS services. It also attaches the necessary managed policy to access ECR.
Backend (AWS App Runner Service): Defines the App Runner service, specifying its name, observability configuration, instance configuration (CPU and memory), source configuration (image repository details), health check configuration, and tags.
ApiGatewayRestApi (API Gateway REST API): Creates the entry point for the API Gateway.
ApiGatewayResource (API Gateway Resource): Configures a wildcard proxy resource to forward all requests to the App Runner service.
ApiGatewayMethod (API Gateway Method): Defines a method that allows any HTTP method and integrates with the backend via HTTP_PROXY.
ApiGatewayDeployment (API Gateway Deployment): Deploys the API Gateway to a specified stage, making it accessible externally.

Outputs

AppRunnerServiceArn: The ARN of the App Runner service.
AppRunnerServiceId: The ID of the App Runner service.
AppRunnerServiceUrl: The URL of the App Runner service.
IntegrationAPI: The external URL for accessing the API through the API Gateway.

This template sets up a backend service using AWS App Runner, utilizing a Docker image from ECR, and configures an API Gateway to handle and forward HTTP requests to this backend. It also supports observability through AWS X-Ray and offers outputs for accessing both the backend service and the API Gateway endpoint. Parameterizing the environment allows for different configurations for various deployment stages.

Logs, Monitoring, and Tracing to Support Observability

For comprehensive observability, you can access data about your backend by visiting the App Runner console. Here, you can view logs, monitor metrics, and trace information.

Seamlessly Updatable

To make updates to your backend, follow these steps:

Modify the code by adding the word "new" to a string template in your application.

    @RestController
    public class GreetingController {
        private static final String template = "Hello, new %s!";

Build the backend:

    ./gradlew bootBuildImage --imageName=spring-rest-api

Publish the new version to the private Elastic Container Registry (ECR):

    aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 111111111.dkr.ecr.eu-central-1.amazonaws.com
    docker tag spring-rest-api:latest 111111111.dkr.ecr.eu-central-1.amazonaws.com/ias:spring-rest-api
    docker push 111111111.dkr.ecr.eu-central-1.amazonaws.com/ias:spring-rest-api

Trigger an App Runner update:

    aws apprunner start-deployment --service-arn arn:aws:apprunner:eu-central-1:1111111111:service/apprunner-restapi-production/2222222222222222222

You can check for logs in the App Runner console.
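Instead of watching the console, you could also poll the service status until the deployment finishes. The sketch below is one possible approach, not part of the original setup: the poll interval and retry limit are arbitrary assumptions, `apprunner_status` needs boto3 and AWS credentials, and the demonstration at the bottom uses a scripted list of statuses so it runs offline.

```python
import time

def wait_until_running(get_status, poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Poll until the service leaves OPERATION_IN_PROGRESS; return the final status."""
    for _ in range(max_polls):
        status = get_status()
        if status != "OPERATION_IN_PROGRESS":
            return status
        sleep(poll_seconds)
    raise TimeoutError("deployment did not finish in time")

def apprunner_status(service_arn):
    """Read the current service status (needs boto3 and AWS credentials)."""
    import boto3  # lazy import keeps wait_until_running usable offline
    client = boto3.client("apprunner")
    return client.describe_service(ServiceArn=service_arn)["Service"]["Status"]

# Offline demonstration with a scripted sequence of statuses.
scripted = iter(["OPERATION_IN_PROGRESS", "OPERATION_IN_PROGRESS", "RUNNING"])
print(wait_until_running(lambda: next(scripted), sleep=lambda s: None))
```

Against a real service you would call `wait_until_running(lambda: apprunner_status(arn))` with the ARN passed to `start-deployment`.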

After querying the API Gateway endpoint, you should receive a "new" response:

curl https://xxxxxxxxxxx.execute-api.eu-central-1.amazonaws.com/production/api/greeting

Response:

{"id":22,"content":"Hello, new World!"}

Summary

This article has provided a comprehensive guide to deploying a backend application using AWS App Runner and exposing it through AWS API Gateway. It covered the creation of an AWS CloudFormation template, which defines the essential infrastructure, parameters, and resources. The template ensures observability through AWS X-Ray and supports seamless updates by demonstrating code changes and deployment triggers using a sample Spring Boot application.

DEV Community: Andrii Melashchenko

GEO Score: A Free Scanner for AI Agent Discoverability

How Scoring Works

What GEO Score Measures

1. The Discovery Layer (Head Checks)

2. The HTML Layer (Homepage Parse)

3. The MCP Layer (Conditional)

Bonus Checks: Scoring Beyond 100%

⚠️ Important: What GEO Score is NOT

The Three-Tier Bot Taxonomy: Why Blanket Blocking Is a Mistake

Dogfooding: Scanning javatask.dev

A Note on Scanning Homepages vs. Posts

Technical Brief: Rust, Axum, and Lambda

Try it Yourself

Perfect Technology Storm: The Beginning of Software Engineering 3.0

Key Thesis

Critical Business Implications

Bottom Line

Why This Transformation is Happening Now

Wave 1: The Data Foundation (The 2000s)

Wave 2: The Application Deconstruction (The 2010s)

Wave 3: The Intelligence Explosion (Late 2010s - Present)

The Convergence: Where the Storm Hits Land

The Compression Effect

Implementation Guide

The New Business Reality

Conclusion: Master the Storm or Be Swept Away

The Dawn of AI Workforce: A 2025 Perspective

Introduction

The Foundation: Clarity in Investment

AI Workforce Requirements

Inference = AI Workforce: The New Building Block

Inference: A Paradigm Shift - The Executive Perspective

Inference: A Paradigm Shift - The Technical Perspective

Conclusion

How to Define AI Agents with Cloudformation and SAM: A Builder's Guide

Introduction

Prerequisites

Implementation Overview

Understanding AI Agent Components

1. Intern Laptop or Action Groups

2. Your Guide to Intern or Agent Instructions

3. Agent Configuration

First day on the job on the Deployment Process

First question to the Intern or Testing Your AI Agent

Conclusion

Developing an AI Intern for C-Level Executives with AWS Bedrock

Introduction

The Story

The Result - AI Agent Generated Report

[Start]Hospitality Business Performance Report - H1 2024

Executive Summary

Detailed Analysis

Overall Performance

Monthly Trends

Key Observations

Recommendations

[End]Conclusion

Technicalities

Understanding the Theory

Data Flow

Implementing the AI Intern

Prerequisites

Tool Configuration

Conclusion

Appendix 1. Full Instruction to Intern

What C-Level Leaders Need to Know About AI Agents

Introduction

Story

Disclaimer

Who are Interns or Large Language Models (LLMs)?

And Here Comes Your Context or Prompt Engineering

Let's Talk About Tons of Documents or Retrieval-Augmented Generation

Intern in University or LLM Fine-tuning

Giving Some Freedom to Your Intern or Agentic AI

Recap or the Beginning of a New Era

Conclusion

Introduction to AWS SiteWise: Part 1. An IIoT OPC UA End-to-End Guide

Introduction: AWS IoT SiteWise for Network Device Monitoring

The Big Picture: Integrating OT Network with AWS SiteWise