DEV Community: Goh Chun Lin

Beyond the Cert: In the Age of AI

Goh Chun Lin — Sun, 26 Oct 2025 02:55:01 +0000

For the fourth consecutive year, I have renewed my Azure Developer Associate certification. It is a valuable discipline that keeps my knowledge of the Azure ecosystem current and sharp. The performance report I received this year was particularly insightful, highlighting both my strengths in security fundamentals and the expected gaps in platform-specific nuances, given my recent work in AWS.

Objectives

Renewing Azure certification is a hallmark of a professional craftsman because it sharpens our tools, knowing our trade. For a junior or mid-level engineer, this path of structured learning and certification is the non-negotiable foundation of a solid career. It is the path I walked myself. It builds the grammar of our trade.

However, for a senior engineer, for an architect, the game has changed. The world is now saturated with competent craftsmen who know the grammar. In the age of AI-assisted coding and brutal corporate “flattening,” simply knowing the tools is no longer a defensible position. It has become table stakes.

The paradox of the senior cloud software engineer is that the very map that got us here, i.e. the structured curriculum and the certification path, is insufficient to guide us to the next level. The renewal assessment results for Microsoft Certified: Azure Developer Associate I received was a perfect map of the existing territory. However, an architect’s job is not to be a master of the known world. It is to be a cartographer of the unknown. The report correctly identified that I need to master Azure specific trade-offs, like choosing ‘Session’ consistency over ‘Strong’ for low-latency scenarios in CosmosDB. The senior engineer learns that rule. The architect must ask a deeper question: “How can I build a model that predicts the precise cost and P99 latency impact of that trade-off for my specific workload, before I write a single line of code?”

Attending AWS Singapore User Group monthly meetup.

About the Results

Let’s make this concrete by looking at the renewal assessment report itself. It was a gift, not because of the score, but because it is a perfect case study in the difference between the Senior Engineer’s path and the Architect’s.

Where the report suggests mastering Azure Cosmos DB five consistency levels, it is prescribing an act of knowledge consumption. The architect’s impulse is to ask a different question entirely: “How can I quantify the trade-off?” I do not just want to know that Session is faster than Strong. I should know, for a given workload, how much faster, at what dollar cost per million requests, and with what measurable impact on data integrity. The architect’s response is to build a model to turn the vendor’s qualitative best practice into a quantitative, predictive economic decision.

This pattern continues with managed services. The report correctly noted my failure to memorise the specific implementation of Azure Container Apps. The path it offers is to better learn the abstraction. The architect’s path is to become professionally paranoid about abstractions. The question is not “What is Container Apps?” but “Why does this abstraction exist, and what are its hidden costs and failure modes?” The architect’s response is to design experiments or simulations to stress-test the abstraction and discover its true operational boundaries, not just to read its documentation.

DHH has just slain the dragon of Cloud Dependency, the largest, most fearsome dragon in our entire cloud industry. (Twitter Source: DHH)

This is the new mandate for senior engineers in this new world where we keep on listening senior engineers being out of work: We must evolve from being consumers of complexity to being creators of clarity. We must move beyond mastering the vendor’s pre-defined solutions and begin forging our own instruments to see the future.

From Cert to Personal Project

This is why, in parallel to maintaining my certifications, I have embarked on a different kind of professional development. It is a path of deep, first-principles creation. I am building a discrete event simulation engine not as a personal hobby project, but as a way to understand more about the most expensive and unpredictable problems in our industry. My certification proves I can solve problems the “Azure way.” This new work is about discovering the the fundamental truths that govern all cloud platforms.

Certifications are the foundation. They are the bedrock of our shared knowledge. However, they are not the lighthouse. In this new era, we must be both.

AWS + Azure.

Certifications are an essential foundation. They represent the bedrock of our shared professional knowledge and a commitment to maintaining a common standard of excellence. However they are not, by themselves, the final destination.

Therefore, my next major “proof-of-work” will not be another certificate. It will be the first in a series of public, data-driven case studies derived from my personal project.

Ultimately, a certificate proves that we are qualified and contributing members of our professional ecosystem. This next body of work is intended to prove something more than that. We need to actively solve the complex, high-impact problems that challenge our industry. In this new era, demonstrating both our foundational knowledge and our capacity to create new value is no longer an aspiration. Instead, it is the new standard.

Together, we learn better.

The Blueprint Fallacy: A Case for Discrete Event Simulation in Modern Systems Architecture

Goh Chun Lin — Sat, 18 Oct 2025 04:34:46 +0000

Greetings from Taipei!

I just spent two days at the Hello World Dev Conference 2025 in Taipei, and beneath the hype around cloud and AI, I observed a single, unifying theme: The industry is desperately building tools to cope with a complexity crisis of its own making.

The agenda was a catalog of modern systems engineering challenges. The most valuable sessions were the “踩雷經驗” (landmine-stepping experiences), which offered hard-won lessons from the front lines.

A 2-day technical conference on AI, Kubernetes, and more!

However, these talks raised a more fundamental question for me. We are getting exceptionally good at building tools to detect and recover from failure but are we getting any better at preventing it?

This post is not a simple translation of a Mandarin-language Taiwan conference. It is my analysis of the patterns I observed. I have grouped the key talks I attended into three areas:

Cloud Native Infrastructure;
Reshaping Product Management and Engineering Productivity with AI;
Deep Dives into Advanced AI Engineering.

Feel free to choose to dive into the section that interests you most.

Session: Smart Pizza and Data Observability

This session was led by Shuhsi (林樹熙), a Data Engineering Manager at Micron. Micron needs no introduction, they are a massive player in the semiconductor industry, and their smart manufacturing facilities are a prime example of where data engineering is mission-critical.

Micron in Singapore (Credit: Forbes)

Shuhsi’s talk, “Data Observability by OpenLineage,” started with a simple story he called the “Smart Pizza” anomaly.

He presented a scenario familiar to anyone in a data-intensive environment: A critical dashboard flatlines, and the next three hours are a chaotic hunt to find out why. In his “Smart Pizza” example, the culprit was a silent, upstream schema change.

Smart pizza dashboard anomaly.

His solution, OpenLineage, is a powerful framework for what we would call digital forensics. It is about building a perfect, queryable map of the crime scene after the crime has been committed. By creating a clear data lineage, it reduces the “Mean Time to Discovery” from hours of panic to minutes of analysis.

Let’s be clear: This is critical, valuable work. Like OpenTelemetry for applications, OpenLineage brings desperately needed order to the chaos of modern data pipelines.

It is a fundamentally reactive posture. It helps us find the bullet path through the body with incredible speed and precision. However, my main point is that our ultimate goal must be to predict the bullet trajectory before the trigger is pulled. Data lineage minimises downtime. My work with simulation, which will be explained in the next session, aims to prevent it entirely by modelling these complex systems to find the breaking points before they break.

Session: Automating a .NET Discrete Event Simulation on Kubernetes

My talk, “Simulation Lab on Kubernetes: Automating .NET Parameter Sweeps,” addressed the wall that every complex systems analysis eventually hits: Combinatorial explosion.

While the industry is focused on understanding past failures, my session is about building the Discrete Event Simulation (DES) engine that can calculate and prevent future ones.

A restaurant simulation game in Honkai Impact 3rd. (Source: 西琳 – YouTube)

To make this concrete, I used the analogy of a restaurant owner asking, “Should I add another table or hire another waiter?” The only way to answer this rigorously is to simulate thousands of possible futures. The math becomes brutal, fast: testing 50 different configurations with 100 statistical runs each requires 5,000 independent simulations. This is not a task for a single machine; it requires a computational army.

My solution is to treat Kubernetes not as a service host, but as a temporary, on-demand supercomputer. The strategy I presented had three core pillars:

Declarative Orchestration: The entire 5,000-run DES experiment is defined in a single, clean Argo Workflows manifest, transforming a potential scripting nightmare into a manageable, observable process.
Radical Isolation: Each DES run is containerised in its own pod, creating a perfectly clean and reproducible experimental environment.
Controlled Randomness: A robust seeding strategy is implemented to ensure that “random” events in our DES are statistically valid and comparable across the entire distributed system.

The turnout for my DES session confirmed a growing hunger in our industry for proactive, simulation-driven approaches to engineering.

The final takeaway was a strategic re-framing of a tool many of us already use. Kubernetes is more than a platform for web apps. It can also be a general-purpose compute engine capable of solving massive scientific and financial modelling problems. It is time we started using it as such.

Session: AI for BI

Denny’s (監舜儀) session on “AI for BI” illustrated a classic pain point: The bottleneck between business users who need data and the IT teams who provide it. The proposed solution was a natural language interface, the FineChatBI , a tool designed to sit on top of existing BI platforms to make querying existing data easier.

Denny is introducing AI for BI.

His core insight was that the tool is the easy part. The real work is in building the “underground root system” which includes the immense challenge of defining metrics, managing permissions, and untangling data semantics. Without this foundation, any AI is doomed to fail.

Getting the underground root system right is important for building AI projects.

This is a crucial step forward in making our organisations more data-driven. However, we must also be clear about what problem is being solved.

This is a system designed to provide perfect, instantaneous answers to the question, “What happened?”

My work, and the next category of even more complex AI, begins where this leaves off. It seeks to answer the far harder question: “What will happen if…?” Sharpening our view of the past is essential, but the ultimate strategic advantage lies in the ability to accurately simulate the future.

Session: The Impossibility of Modeling Human Productivity

The presented Jugg (劉兆恭) is a well-known agile coach and the organiser of Agile Tour Taiwan 2020. His talk, “An AI-Driven Journey of Agile Product Development – From Inspiration to Delivery,” was a masterclass in moving beyond vanity metrics to understand and truly improve engineering performance.

Jugg started with a graph that every engineering lead knows in their gut. As a company grows over time:

Business grow (purple line, up);
Software architecture and complexity grow (first blue line, up);
The number of developers increases (second blue line, up);
Expected R&D productivity should grow (green line, up);
But paradoxically, the actual R&D productivity often stagnates or even declines (red line, down).

Jugg provided a perfect analogue for the work I do. He tackled the classic productivity paradox: Why does output stagnate even as teams grow? He correctly diagnosed the problem as a failure of measurement and proposed the SPACE framework as a more holistic model for this incredibly complex human system.

He was, in essence, trying to answer the same class of question I do: “If we change an input variable (team process), how can we predict the output (productivity)?”

This is where the analogy becomes a powerful contrast. Jugg’s world of human systems is filled with messy, unpredictable variables. His solutions are frameworks and dashboards. They are the best tools we have for a system that resists precise calculation.

This session reinforced my conviction that simulation is the most powerful tool we have for predicting performance in the systems we can actually control: Our code and our infrastructure. We do not have to settle for dashboards that show us the past because we can build models that calculate the future.

Session: Building a Map of “What Is” with GraphRAG

The most technically demanding session came from Nils (劉岦崱), a Senior Data Scientist at Cathay Financial Holdings. He presented GraphRAG, a significant evolution beyond the “Naive RAG” most of us use today.

Nils is explaining what a Naive RAG is.

He argued compellingly that simple vector search fails because it ignores relationships. By chunking documents, we destroy the contextual links between concepts. GraphRAG solves this by transforming unstructured data into a structured knowledge graph: a web of nodes (entities) and edges (their relationships).

Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs (Image Credit: LangChain)

In essence, GraphRAG is a sophisticated tool for building a static map of a known world. It answers the question, “How are all the pieces in our universe connected right now?” For AI customer service, this is a game-changer, as it provides a rich, interconnected context for every query.

This means our data now has an explicit, queryable structure. So, the LLM gets a much richer, more coherent picture of the situation, allowing it to maintain context over long conversations and answer complex, multi-faceted questions.

This session was a brilliant reminder that all advanced AI is built on a foundation of rigorous data modelling.

However, a map, no matter how detailed, is still just a snapshot. It shows us the layout of the city, but it cannot tell us how the traffic will flow at 5 PM.

This is the critical distinction. GraphRAG creates a model of a system at rest and DES creates a model of a system in motion. One shows us the relationships while the other lets us press watch how those relationships evolve and interact over time under stress. GraphRAG is the anatomy chart and simulation is the stress test.

Session: Securing the AI Magic Pocket with LLM Guardrails

Nils from Cathay Financial Holdings returned to the stage for Day 2, and this time he tackled one of the most pressing issues in enterprise AI: Security. His talk “Enterprise-Grade LLM Guardrails and Prompt Hardening” was a masterclass in defensive design for AI systems.

What made the session truly brilliant was his central analogy. As he put it, an LLM is a lot like Doraemon : a super-intelligent, incredibly powerful assistant with a “magic pocket” of capabilities. It can solve almost any problem you give it. But, just like in the cartoon, if you give it vague, malicious, or poorly thought-out instructions, it can cause absolute chaos. For a bank, preventing that chaos is non-negotiable.

Nils grounded the problem in the official OWASP Top 10 for LLM Applications.

There are two lines of defence: Guardrails and Prompt Hardening. The core of the strategy lies in understanding two distinct but complementary approaches:

Guardrails (The Fortress): An external firewall of input filters and output validators;
Prompt Hardening (The Armour): Internal defences built into the prompt to resist manipulation.

This is an essential framework for any enterprise deploying LLMs. It represents the state-of-the-art in building static defences.

While necessary, this defensive posture raises another important question for a developers: How does the fortress behave under a full-scale siege?

A static set of rules can defend against known attack patterns. But what about the unknown unknowns? What about the second-order effects? Specifically:

Performance Under Attack: What is the latency cost of these five layers of validation when we are hit with 10,000 malicious requests per second? At what point does the defence itself become a denial-of-service vector?
Emergent Failures: When the system is under load and memory is constrained, does one of these guardrails fail in an unexpected way that creates a new vulnerability?

These are not questions a security checklist can answer. They can only be answered by a dynamic stress test. The X-Teaming Nils mentioned is a step in this direction, but a full-scale DES is the ultimate laboratory.

Neil’s techniques are a static set of rules designed to prevent failure. Simulation is a dynamic engine designed to induce failure in a controlled environment to understand a system true breaking points. He is building the armour while my work with DES is in building the testing grounds to see where that armour will break.

Session: Driving Multi-Task AI with a Flowchart in a Single Prompt

The final and most thought-provoking session was delivered by 尹相志, who presented a brilliant hack: Embedding a Mermaid flowchart directly into a prompt to force an LLM to execute a deterministic, multi-step process.

尹相志，數據決策股份有限公司技術長。

He provided a new way beyond the chaos of autonomous agents and the rigidity of external orchestrators like LangGraph. By teaching the LLM to read a flowchart, he effectively turns it into a reliable state machine executor. It is a masterful piece of engineering that imposes order on a probabilistic system.

Action Grounding Principles proposed by 相志.

What he has created is the perfect blueprint. It is a model of a process as it should run in a world with no friction, no delays, and no resource contention.

And in that, he revealed the final, critical gap in our industry thinking.

A blueprint is not a stress test. A flowchart cannot answer the questions that actually determine the success or failure of a system at scale:

What happens when 10,000 users try to execute this flowchart at once and they all hit the same database lock?
What is the cascading delay if one step in the flowchart has a 5% chance of timing out?
Where are the hidden queues and bottlenecks in this process?

His flowchart is the architect’s beautiful drawing of an airplane. A DES is the wind tunnel. It is the necessary, brutal encounter with reality that shows us where the blueprint will fail under stress.

The ability to define a process is the beginning. The ability to simulate that process under the chaotic conditions of the real world is the final, necessary step to building systems that don’t just look good on paper, but actually work.

Final Thoughts and Key Takeaways from Taipei

My two days at the Hello World Dev Conference were not a tour of technologies. In fact, they were a confirmation of a dangerous blind spot in our industry.

From what I observe, they build tools for digital forensics to map past failures. They sharpen their tools with AI to perfectly understand what just happened. They create knowledge graphs to model the systems at rest. They design perfect, deterministic blueprints for how AI processes should work.

These are all necessary and brilliant advancements in the art of mapmaking.

However, the critical, missing discipline is the one that asks not “What is the map?”, but “What will happen to the city during the hurricane?” The hard questions of latency under load, failures, and bottlenecks are not found on any of their map.

Our industry is full of brilliant mapmakers. The next frontier belongs to people who can model, simulate, and predict the behaviour of complex systems under stress, before the hurricane reaches.

That is why I am building SNA, my .NET-based Discrete Event Simulation engine.

Hello, Taipei. Taken from the window of the conference venue.

I am leaving Taipei with a notebook full of ideas, a deeper understanding of the challenges and solutions being pioneered by my peers in the Mandarin-speaking tech community, and a renewed sense of excitement for the future we are all building.

Building a Gacha Bot in Power Automate and MS Teams

Goh Chun Lin — Tue, 07 Oct 2025 13:47:34 +0000

Every agile team knows the “Support Hero” role, that one person designated to handle the interruptions of the day, bug reports, and urgent requests. In our team, we used a messy spreadsheet to track the rotation. People forgot whose turn it was, someone would be on leave, and the whole thing was a low-grade, daily friction point.

One day, a teammate had a brilliant idea: “What if we made it fun? What if we gamified it?”

He quickly prototyped an gacha bot using Power Automate that would randomly select the hero of the day. It was a huge hit. It turned a daily chore into a fun moment of team engagement. It was a perfect example of a small automation making a big impact on our culture.

Over time, as team members changed and responsibilities shifted, that original gacha bot was lost. The fun morning ritual disappeared, and we went back to the old, boring way. We all felt the difference.

Recently, I decided it was time to bring that spark back. I took the original, brilliant concept and decided to re-build it from the ground up as a robust, reusable, and shareable solution.

This post is a tribute to that original idea, and a detailed, step-by-step guide on how you can build a similar gacha bot for your own team. Let’s make our daily routines fun again.

How it Works: The Daily Gacha Ritual

Before we open the hood and look at the Power Automate engine, let me walk you through what my team actually experiences every morning at 10:00 AM.

It all starts with a message from the bot to the Microsoft Teams group of our team. The message says the following.

Hi, Louisa. You are the lucky Support Hero today.

This is the moment of suspense. Everyone sees the ping. Louisa, one of our teammates, is now in the spotlight.

However, what if Louisa is on vacation, sipping a drink on a beach in Bali? The bot is prepared. Immediately following the announcement, it posts a second message which is an interactive Adaptive Card:

Is our teammate mentioned above working today?
[] Yes.
[] No.
[] I volunteer!
[Submit Status]

This is where the team interaction happens.

If Louisa is around , she proudly clicks ‘Yes.’ The card updates to say ‘Louisa has accepted the quest!’ and the ritual is over.
If Louisa is on leave , anyone on the team can click ‘No.’ This immediately triggers the bot to run the gacha again, announcing a new hero.
And my favourite part is that if someone else, for example Austin, is feeling particularly heroic that day, he can click ‘ I volunteer! ‘ This lets him steal the spotlight and take on the role, giving Louisa a day off. The card updates to say ‘A new hero has emerged! Austin has volunteered for the quest!'”

Within a minute, the daily chore is assigned, not through a boring spreadsheet, but through a fun, interactive, and slightly dramatic team ritual. It is a small thing, but it starts our day with a smile and a sense of shared fun.

Now that you have seen what it does, let’s build it.

Step 1: Define The Trigger

First, I setup a “Schedule cloud flow” so that every morning 10am, a message will be sent to the Teams on who is the lucky one.

Second, I will name the flow and define its starting date and time. As shown in the following screenshot, we will set the occurrence to be every day, starting from 1st Oct 2025, 00:00.

Please take note that in the step above, the “12am” is the beginning time, not the time when this job will be executed daily. So in the first node of the flow itself, I have to define at what time the gacha bot will start and at which timezone. Since our daily support needs to be done in the morning, we will make it run at 10am everyday, as shown in the screenshot below.

Step 2: Define Variables and Controls

After that, we add a new “ Initialize Variable ” node where we can define name of all the teammates.

We also need another variable to later store the response of the user on the adaptive card, as shown in the screenshot below.

Since this gacha only makes sense during weekday, so I need a “ Condition ” block to check whether the day is a weekday or not. If it is a weekend, the bot will not send any message.

As shown in the screenshot above, what I do is checking the value of dayOfWeek(convertFromUtc(utcNow(), 'Singapore Standard Time')).

Since there is nothing to be done when it is a weekend, so we will leave the “False” block as empty. For the “True” block, we will have a “ Do Until ” block because the gacha bot needs to keep on selecting a name until someone clicks “Yes” or “Volunteer”. Hence, as shown in the screenshot below, the loop will loop until responseChoice is not “No”.

Step 3: Inside the Loop

There are three important “ Compose ” data operations.

Generate Random Index : To generate a random number from 0 to the number of the team members. rand(0, length(variables('teamMembers')))
Select Random Teammate Object : The random number is used to pick the hero from the array. variables('teamMembers')[int(outputs('Compose:_Generate_Random_Index'))]
Get Name of Hero : Get the name of the person from the array. outputs('Compose:_Select_Random_Teammate_Object')['name']

After the three data operations are added, the flow now looks as shown below.

According the our designed workflow, after a hero is selected, we can send a message with the “ Post message in a chat or channel ” action to inform the team who is being selected by the gacha bot.

Next we need to post an adaptive card to Microsoft Teams and wait for a response. In our case, since the adaptive card is posted to group chat, we need to put an entire JSON below to the Message field.

{
    "type": "AdaptiveCard",
    "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
    "version": "1.4",
    "body": [
        {
            "type": "TextBlock",
            "text": "Daily Check-In",
            "wrap": true,
            "size": "Large",
            "weight": "Bolder"
        },
        {
            "type": "TextBlock",
            "text": "Please pick an option accordingly.",
            "wrap": true
        },
        {
            "type": "Input.ChoiceSet",
            "id": "userChoice",
            "style": "expanded",
            "isMultiSelect": false,
            "label": "Is our teammate mentioned above working today?",
            "choices": [
                {
                    "title": "Yes.",
                    "value": "Yes"
                },
                {
                    "title": "No.",
                    "value": "No"
                },
                {
                    "title": "I volunteer!",
                    "value": "Volunteer"
                }
            ]
        }
    ],
    "actions": [
        {
            "type": "Action.Submit",
            "title": "Submit Status"
        }
    ]
}

In short, the “ Post adaptive card and wait for a response ” action will be setup as shown in the following screenshot.

Step 4: Handle the User’s Response

Right after the adaptive card, I setup a “Switch” control to handle the user’s response.

If the response is “Yes”, there will be a confirmation sent to the Microsoft Teams group chat. If the response is “Volunteer”, before a confirmation message is sent, the bot needs to know who responds so that it can indicate the volunteer’s name. To do so, I use a “Get user profile (V2)” action with body/responder/userPrincipalName as the UPN, as shown in the screenshot below.

The Office 365 Users node will give us the friendly display name of the person who volunteers, as shown in the screenshot below.

Your Turn

So, what have we really built here? On the surface, it is just a simple Power Automate flow. However, the real product is not the bot. Instead, it is the daily moment of shared fun. We did not just automate a chore but we engineered a small spark of joy and human connection into our daily routine. We used technology to solve a human problem, not just a technical one.

Now, it is your turn.

Your mission, should you choose to accept it, is to find the single most boring, repetitive chore that your own team has to deal with. Find that small, grey corner of the life of your team, and ask yourself: “How can I make this fun?”

Together, we learn better.

Securing APIs with OAuth2 Introspection

Goh Chun Lin — Sat, 09 Aug 2025 05:06:51 +0000

In today’s interconnected world, APIs are the backbone of modern apps. Protecting these APIs and ensuring only authorised users access sensitive data is now more crucial than ever. While many authentication and authorisation methods exist, OAuth2 Introspection stands out as a robust and flexible approach. In this post, we will explore what OAuth2 Introspection is, why we should use it, and how to implement it in our .NET apps.

Before we dive into the technical details, let’s remind ourselves why API security is so important. Think about it: APIs often handle the most sensitive stuff. If those APIs are not well protected, we are basically opening the door to some nasty consequences. Data breaches? Yep. Regulatory fines (GDPR, HIPAA, you name it)? Potentially. Not to mention, losing the trust of our users. A secure API shows that we value their data and are committed to keeping it safe. And, of course, it helps prevent the bad guys from exploiting vulnerabilities to steal data or cause all sorts of trouble.

The most common method of securing APIs is using access tokens as proof of authorization. These tokens, typically in the form of JWTs (JSON Web Tokens), are passed by the client to the API with each request. The API then needs a way to validate these tokens to verify that they are legitimate and haven’t been tampered with. This is where OAuth2 Introspection comes in.

OAuth2 Introspection

OAuth2 Introspection is a mechanism for validating bearer tokens in an OAuth2 environment. We can think of it as a secure lookup service for our access tokens. It allows an API to query an auth server, which is also the “issuer” of the token, to determine the validity and attributes of a given token.

The workflow of an OAuth2 Introspection request.

To illustrate the process, the diagram above visualises the flow of an OAuth2 Introspection request. The Client sends the bearer token to the Web API, which then forwards it to the auth server via the introspection endpoint. The auth server validates the token and returns a JSON response, which is then processed by the Web API. Finally, the Web API grants (or denies) access to the requested resource based on the token validity.

Introspection vs. Direct JWT Validation

You might be thinking, “Isn’t this just how we normally validate a JWT token?” Well, yes… and no. What is the difference, and why is there a special term “Introspection” for this?

With direct JWT validation, we essentially check the token ourselves, verifying its signature, expiry, and sometimes audience. Introspection takes a different approach because it involves asking the auth server about the token status. This leads to differences in the pros and cons, which we will explore next.

With OAuth2 Introspection, we gain several key advantages. First, it works with various token formats (JWTs, opaque tokens, etc.) and auth server implementations. Furthermore, because the validation logic resides on the auth server, we get consistency and easier management of token revocation and other security policies. Most importantly, OAuth2 Introspection makes token revocation straightforward (e.g., if a user changes their password or a client is compromised). In contrast, revoking a JWT after it has been issued is significantly more complex.

.NET Implementation

Now, let’s see how to implement OAuth2 Introspection in a .NET Web API using the AddOAuth2Introspection authentication scheme.

The core configuration lives in our Program.cs file, where we set up the authentication and authorisation services.

// ... (previous code for building the app)

builder.Services.AddAuthentication("Bearer")
   .AddOAuth2Introspection("Bearer", options =>
   {
       options.IntrospectionEndpoint = "<Auth server base URL>/connect/introspect";
       options.ClientId = "<Client ID>";
       options.ClientSecret = "<Client Secret>";

       options.DiscoveryPolicy = new IdentityModel.Client.DiscoveryPolicy
       {
           RequireHttps = false, 
       };
   });

builder.Services.AddAuthorization();

// ... (rest of the Program.cs)

This code above configures the authentication service to use the “Bearer” scheme, which is the standard for bearer tokens. AddOAuth2Introspection(…) is where the magic happens because it adds the OAuth2 Introspection authentication handler by pointing to IntrospectionEndpoint, the URL our API will use to send the token for validation.

Usually, RequireHttps needs to be true in production. However , in situations like when the API and the auth server are both deployed to the same Elastic Container Service (ECS) cluster and they communicate internally within the AWS network, we can set it to false. This is because the Application Load Balancer (ALB) handles the TLS/SSL termination and the internal communication between services happens over HTTP, we can safely disable RequireHttps in the DiscoveryPolicy for the introspection endpoint within the ECS cluster. This simplifies the setup without compromising security, as the communication from the outside world to our ALB is already secured by HTTPS.

Finally, to secure our API endpoints and require authentication, we can simply use the [Authorize] attribute, as demonstrated below.

[ApiController]
[Route("[controller]")]
[Authorize]
public class MyController : ControllerBase
{
   [HttpGet("GetData")]
   public IActionResult GetData()
   {
       ...
   }
}

Wrap-Up

OAuth2 Introspection is a powerful and flexible approach for securing our APIs, providing a centralised way to validate bearer tokens and manage access. By understanding the process, implementing it correctly, and following best practices, we can significantly improve the security posture of your applications and protect your valuable data.

References

Observing Orchard Core: Traces with Grafana Tempo and ADOT

Goh Chun Lin — Mon, 26 May 2025 15:01:07 +0000

In the previous article, we have discussed about how we can build a custom monitoring pipeline that has Grafana running on Amazon ECS to receive metrics and logs, which are two of the observability pillars, sent from the Orchard Core on Amazon ECS. Today, we will proceed to talk about the third pillar of observability, traces.

Source Code

The CloudFormation templates and relevant C# source codes discussed in this article is available on GitHub as part of the Orchard Core Basics Companion (OCBC) Project: https://github.com/gcl-team/Experiment.OrchardCore.Main.

Lisa Jung, senior developer advocate at Grafana, talks about the three pillars in observability (Image Credit: Grafana Labs)

About Grafana Tempo

To capture and visualise traces, we will use Grafana Tempo, an open-source, scalable, and cost-effective tracing backend developed by Grafana Labs. Unlike other tracing tools, Tempo does not require an index, making it easy to operate and scale.

We choose Tempo because it is fully compatible with OpenTelemetry, the open standard for collecting distributed traces, which ensures flexibility and vendor neutrality. In addition, Tempo seamlessly integrates with Grafana, allowing us to visualise traces alongside metrics and logs in a single dashboard.

Finally, being a Grafana Labs project means Tempo has strong community backing and continuous development.

About OpenTelemetry

With a solid understanding of why Tempo is our tracing backend of choice, let’s now dive deeper into OpenTelemetry, the open-source framework we use to instrument our Orchard Core app and generate the trace data Tempo collects.

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project and a vendor-neutral, open standard for collecting traces, metrics, and logs from our apps. This makes it an ideal choice for building a flexible observability pipeline.

OpenTelemetry provides SDKs for instrumenting apps across many programming languages, including C# via the .NET SDK, which we use for Orchard Core.

OpenTelemetry uses the standard OTLP (OpenTelemetry Protocol) to send telemetry data to any compatible backend, such as Tempo, allowing seamless integration and interoperability.

Both Grafana Tempo and OpenTelemetry are projects under the CNCF umbrella. (Image Source: CNCF Cloud Native Interactive Landscape)

Setup Tempo on EC2 With CloudFormation

It is straightforward to deploy Tempo on EC2.

Let’s walk through the EC2 UserData script that installs and configures Tempo on the instance.

First, we download the Tempo release binary, extract it, move it to a proper system path, and ensure it is executable.

wget https://github.com/grafana/tempo/releases/download/v2.7.2/tempo_2.7.2_linux_amd64.tar.gz
tar -xzvf tempo_2.7.2_linux_amd64.tar.gz
mv tempo /usr/local/bin/tempo
chmod +x /usr/local/bin/tempo

Next, we create a basic Tempo configuration file at /etc/tempo.yaml to define how Tempo listens for traces and where it stores trace data.

echo "
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces
" > /etc/tempo.yaml

Let’s breakdown the configuration file above.

The http_listen_port allows us to set the HTTP port (3200) for Tempo internal web server. This port is used for health checks and Prometheus metrics.

After that, we configure where Tempo listens for incoming trace data. In the configuration above, we enabled OTLP receivers via both gRPC and HTTP, the two protocols that OpenTelemetry SDKs and agents use to send data to Tempo. Here, the ports 4317 (gRPC) and 4318 (HTTP) are standard for OTLP.

Last but not least, in the configuration, as demonstration purpose, we use the simplest one, local storage, to write trace data to the EC2 instance disk under /tmp/tempo/traces. This is fine for testing or small setups, but for production we will likely want to use services like Amazon S3.

In addition, since we are using local storage on EC2, we can easily SSH into the EC2 instance and directly inspect whether traces are being written. This is incredibly helpful during debugging. What we need to do is to run the following command to see whether files are being generated when our Orchard Core app emits traces.

ls -R /tmp/tempo/traces

The configuration above is intentionally minimal. As our setup grows, we can explore advanced options like remote storage, multi-tenancy, or even scaling with Tempo components.

Each flushed trace block (folder with UUID) contains a data.parquet file, which holds the actual trace data.

Finally, in order to enable Tempo to start on boot, we create a systemd unit file that allows Tempo to start on boot and automatically restart if it crashes.

cat <<EOF > /etc/systemd/system/tempo.service
[Unit]
Description=Grafana Tempo service
After=network.target

[Service]
ExecStart=/usr/local/bin/tempo -config.file=/etc/tempo.yaml
Restart=always
RestartSec=5
User=root
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reexec
systemctl daemon-reload
systemctl enable --now tempo

This systemd service ensures that Tempo runs in the background and automatically starts up after a reboot or a crash. This setup is crucial for a resilient observability pipeline.

Did You Know: When we SSH into an EC2 instance running Amazon Linux 2023, we will be greeted by a cockatiel in ASCII art! (Image Credit: OMG! Linux)

Understanding OTLP Transport Protocols

In the previous section, we configured Tempo to receive OTLP data over both gRPC and HTTP. These two transport protocols are supported by the OTLP, and each comes with its own strengths and trade-offs. Let’s break them down.

Ivy Zhuang from Google gave a presentation on gRPC and Protobuf at gRPConf 2024. (Image Credit: gRPC YouTube)

Tempo has native support for gRPC, and many OpenTelemetry SDKs default to using it. gRPC is a modern, high-performance transport protocol built on top of HTTP/2. It is the preferred option when performanceis critical. gRPC also supports streaming, which makes it ideal for high-throughput scenarios where telemetry data is sent continuously.

However, gRPC is not natively supported in browsers, so it is not ideal for frontend or web-based telemetry collection unless a proxy or gateway is used. In such scenarios, we will normally choose HTTP which is browser-friendly. HTTP is a more traditional request/response protocol that works well in restricted environments.

Since we are collecting telemetry from server-side like Orchard Core running on ECS, gRPC is typically the better choice due to its performance benefits and native support in Tempo.

Please take note that since gRPC requires HTTP/2, which some environments, for example, IoT devices and embedding systems, might not have mature gRPC client support, OTLP over HTTP is often preferred in simpler or constrained systems.

Daniel Stenberg, Senior Network Engineer at Mozilla, sharing about HTTP/2 at GOTO Copenhagen 2015. (Image Credit: GOTO Conferences YouTube)

gRPC allows multiplexing over a single connection using HTTP/2. Hence, in gRPC, all telemetry signals, i.e. logs, metrics, and traces, can be sent concurrently over one connection. However, with HTTP, each telemetry signal needs a separate POST request to its own endpoint as listed below to enforce clean schema boundaries, simplify implementation, and stay aligned with HTTP semantics.

Logs: /v1/logs;
Metrics: /v1/metrics;
Traces: /v1/traces.

In HTTP, since each signal has its own POST endpoint with its own protobuf schema in the body, there is no need for the receiver to guess what is in the body.

AWS Distro for Open Telemetry (ADOT)

Now that we have Tempo running on EC2 and understand the OTLP protocols it supports, the next step is to instrument our Orchard Core to generate and send trace data.

The following code snippet shows what a typical direct integration with Tempo might look like in an Orchard Core.

builder.Services
    .AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(serviceName: "cld-orchard-core"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://<tempo-ec2-host>:4317");
            options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.Grpc;
        })
        .AddConsoleExporter());

This approach works well for simple use cases during development stage, but it comes with trade-offs that are worth considering. Firstly, we couple our app directly to the observability backend, reducing flexibility. Secondly, central management becomes harder when we scale to many services or environments.

This is where AWS Distro for OpenTelemetry (ADOT) comes into play.

The ADOT collector. (Image credit: ADOT technical docs)

ADOT is a secure, AWS-supported distribution of the OpenTelemetry project that simplifies collecting and exporting telemetry data from apps running on AWS services, for example our Orchard Core on ECS now. ADOT decouples our apps from the observability backend, provides centralised configuration, and handles telemetry collection more efficiently.

Sidecar Pattern

We can deploy the ADOT in several ways, such as running it on a dedicated node or ECS service to receive telemetry from multiple apps. We can also take the sidecar approach which cleanly separates concerns. Our Orchard Core app will focus on business logic, while a nearby ADOT sidecar handles telemetry collection and forwarding. This mirrors modern cloud-native patterns and gives us more flexibility down the road.

The sidecar pattern running in Amazon ECS. (Image Credit: AWS Open Source Blog)

The following CloudFormation template shows how we deploy ADOT as a sidecar in ECS using CloudFormation. The collector config is stored in AWS Systems Manager Parameter Store under /myapp/otel-collector-config, and injected via the AOT_CONFIG_CONTENT environment variable. This keeps our infrastructure clean, decoupled, and secure.

ecsTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Ref ServiceName
    NetworkMode: awsvpc 
    ExecutionRoleArn: !GetAtt ecsTaskExecutionRole.Arn
    TaskRoleArn: !GetAtt iamRole.Arn
    ContainerDefinitions:
      - Name: !Ref ServiceName
        Image: !Ref OrchardCoreImage
        ...

      - Name: adot-collector
        Image: public.ecr.aws/aws-observability/aws-otel-collector:latest
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Sub "/ecs/${ServiceName}-log-group"
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: adot
        Essential: false
        Cpu: 128
        Memory: 512
        HealthCheck:
          Command: ["/healthcheck"]
          Interval: 30
          Timeout: 5
          Retries: 3
          StartPeriod: 60
        Secrets:
          - Name: AOT_CONFIG_CONTENT
            ValueFrom: !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/otel-collector-config"

Deploy an ADOT sidecar on ECS to collect observability data from Orchard Core.

There are several interesting and important details in the CloudFormation snippet above that are worth calling out. Let’s break them down one by one.

Firstly, we choose awsvpc as the NetworkMode of the ECS task. In awsvpc, each container in the ECS task, i.e. our Orchard Core container and the ADOT sidecar, receives its own ENI (Elastic Network Interface). This is great for network-level isolation. With this setup, we can reference the sidecar from our Orchard Core using its container name through ECS internal DNS, i.e. http://adot-collector:4317.

Secondly, we include a health check for the ADOT container. ECS will use this health check to restart the container if it becomes unhealthy, improving reliability without manual intervention. In November 2022, Paurush Garg from AWS added the healthcheck component with the new ADOT collector release, so we can simply specify that we will be using this healthcheck component in the configuration that we will discuss next.

Yes, the configuration! Instead of hardcoding the ADOT configuration into the task definition, we inject it securely at runtime using the AOT_CONFIG_CONTENT secret. This environment variable AOT_CONFIG_CONTENT is designed to enable us to configure the ADOT collector. It will override the config file used in the ADOT collector entrypoint command.

The SSM Parameter for the environment variable AOT_CONFIG_CONTENT.

Wrap-Up

By now, we have completed the journey of setting up Grafana Tempo on EC2, exploring how traces flow through OTLP protocols like gRPC and HTTP, and understanding why ADOT is often the better choice in production-grade observability pipelines.

With everything connected, our Orchard Core app is now able to send traces into Tempo reliably. This will give us end-to-end visibility with OpenTelemetry and AWS-native tooling.

References

Observing Orchard Core: Metrics and Logs with Grafana and Amazon CloudWatch

Goh Chun Lin — Sun, 27 Apr 2025 09:02:05 +0000

I recently deployed an Orchard Core app on Amazon ECS and wanted to gain better visibility into its performance and health.

Instead of relying solely on basic Amazon CloudWatch metrics, I decided to build a custom monitoring pipeline that has Grafana running on Amazon EC2 receiving metrics and EMF (Embedded Metrics Format) logs sent from the Orchard Core on ECS via CloudFormation configuration.

In this post, I will walk through how I set this up from scratch, what challenges I faced, and how you can do the same.

Source Code

The CloudFormation templates and relevant C# source codes discussed in this article is available on GitHub as part of the Orchard Core Basics Companion (OCBC) Project:https://github.com/gcl-team/Experiment.OrchardCore.Main.

Why Grafana?

In the previous post where we setup the Orchard Core on ECS, we talked about how we can send metrics and logs to CloudWatch. While it is true that CloudWatch offers us out-of-the-box infrastructure metrics and AWS-native alarms and logs, the dashboards CloudWatch provides are limited and not as customisable. Managing observability with just CloudWatch gets tricky when our apps span multiple AWS regions, accounts, or other cloud environments.

The GrafanaLive event in Singapore in September 2023. (Event Page)

If we are looking for solution that is not tied to single vendor like AWS, Grafana can be one of the options. Grafana is an open-source visualisation platform that lets teams monitor real-time metrics from multiple sources, like CloudWatch, X-Ray, Prometheus and so on, all in unified dashboards. It is lightweight, extensible, and ideal for observability in cloud-native environments.

Is Grafana the only solution? Definitely not! However, personally I still prefer Grafana because it is open-source and free to start. In this blog post, we will also see how easy to host Grafana on EC2 and integrate it directly with CloudWatch with no extra agents needed.

Three Pillars of Observability

In observability, there are three pillars, i.e. logs, metrics, and traces.

Lisa Jung, senior developer advocate at Grafana, talks about the three pillars in observability (Image Credit: Grafana Labs)

Firstly, logs are text records that capture events happening in the system.

Secondly, metrics are numeric measurements tracked over time, such as HTTP status code counts, response times, or ECS CPU and memory utilisation rates.

Finally, traces show the form a strong observability foundation which can help us to identify issues faster, reduce downtime, and improve system reliability. This will ultimately support better user experience for our apps.

This is where we need a tool like Grafana because Grafana assists us to visualise, analyse, and alert based on our metrics, making observability practical and actionable.

Setup Grafana on EC2 with CloudFormation

It is straightforward to install Grafana on EC2.

Firstly, let’s define the security group that we will be use for the EC2.

ec2SecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Allow access to the EC2 instance hosting Grafana
    VpcId: {"Fn::ImportValue": !Sub "${CoreNetworkStackName}-${AWS::Region}-vpcId"}
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: 0.0.0.0/0 # Caution: SSH open to public, restrict as needed
      - IpProtocol: tcp
        FromPort: 3000
        ToPort: 3000
        CidrIp: 0.0.0.0/0 # Caution: Grafana open to public, restrict as needed
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackName

The VPC ID is imported from another of the common network stack, the cld-core-network, we setup. Please refer to the stack cld-core-network here.

For demo purpose, please notice that both SSH (port 22) and Grafana (port 3000) are open to the world (0.0.0.0/0). It is important to protect the access to EC2 by adding a bastion host, VPN, or IP restriction later.

In addition, the SSH should only be opened temporarily. The SSH access is for when we need to log in to the EC2 instance and troubleshoot Grafana installation manually.

Now, we can proceed to setup EC2 with Grafana installed using the CloudFormation resource below.

ec2Instance:
  Type: AWS::EC2::Instance
  Properties:
    InstanceType: !Ref InstanceType
    ImageId: !Ref Ec2Ami
    NetworkInterfaces:
      - AssociatePublicIpAddress: true
        DeviceIndex: 0
        SubnetId: {"Fn::ImportValue": !Sub "${CoreNetworkStackName}-${AWS::Region}-publicSubnet1Id"}
        GroupSet:
          - !Ref ec2SecurityGroup
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash
        yum update -y
        yum install -y wget unzip
        wget https://dl.grafana.com/oss/release/grafana-10.1.0-1.x86_64.rpm
        yum install -y grafana-10.1.0-1.x86_64.rpm
        systemctl enable --now grafana-server
    Tags:
      - Key: Name
        Value: "Observability-Instance"

In the CloudFormation template above, we are expecting our users to access the Grafana dashboard directly over the Internet. Hence, we put the EC2 in public subnet and assign an Elastic IP (EIP) to it, as demonstrated below, so that we can have a consistent public accessible static IP for our Grafana.

ecsEip:
  Type: AWS::EC2::EIP

ec2EIPAssociation:
  Type: AWS::EC2::EIPAssociation
  Properties:
    AllocationId: !GetAtt ecsEip.AllocationId
    InstanceId: !Ref ec2Instance

For production systems, placing instances in public subnets and exposing them with a public IP requires us to have strong security measures in place. Otherwise, it is recommended to place our Grafana EC2 instance in private subnets and accessed via Application Load Balancer (ALB) or NAT Gateway to reduce the attack surface.

Pump CloudWatch Metrics to Grafana

Grafana supports CloudWatch as a native data source.

With the appropriate AWS credentials and region, we can use Access Key ID and Secret Access Key to grant Grafana the access to CloudWatcch. The user that the credentials belong to must have the AmazonGrafanaCloudWatchAccess policy.

The user that Grafana uses to access CloudWatch must have the AmazonGrafanaCloudWatchAccess policy.

However, using AWS Access Key/Secret in Grafana data source connection details is less secure and not ideal for EC2 setups. In addition, AmazonGrafanaCloudWatchAccess is a managed policy optimised for running Grafana as a managed service within AWS. Thus, it is recommended to create our own custom policy so that we can limit the permissions to only what is needed, as demonstrated with the following CloudWatch template.

ec2InstanceRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: ec2.amazonaws.com
          Action: sts:AssumeRole

    Policies:
      - PolicyName: EC2MetricsAndLogsPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Sid: AllowReadingMetricsFromCloudWatch
              Effect: Allow
              Action:
                - cloudwatch:ListMetrics
                - cloudwatch:GetMetricData
              Resource: "*"
            - Sid: AllowReadingLogsFromCloudWatch
              Effect: Allow
              Action:
                - logs:DescribeLogGroups
                - logs:GetLogGroupFields
                - logs:StartQuery
                - logs:StopQuery
                - logs:GetQueryResults
                - logs:GetLogEvents
              Resource: "*"

Again, using our custom policy provides better control and follows the best practices of least privilege.

With IAM role, we do not need to provide AWS Access Key/Secret in Grafana connection details for CloudWatch as a data source.

Visualising ECS Service Metrics

Now that Grafana is configured to pull data from CloudWatch, ECS metrics like CPUUtilization and MemoryUtilization, are available. We can proceed to create a dashboard and select the right namespace as well as the right metric name.

Setting up the diagram for memory utilisation of our Orchard Core app in our ECS cluster.

As shown in the following dashboard, we show memory and CPU utilisation rates because they help us ensure that our ECS services are performing within safe limits and not overusing or underutilizing resources. By monitoring the utilisation, we ensure our services are using just the right amount of resources.

Both ECS service metrics and container insights are displayed on Grafana dashboard.

Visualising ECS Container Insights Metrics

ECS Container Insights Metrics are deeper metrics like task counts, network I/O, storage I/O, and so on.

In the dashboard above, we can also see the number of Task Count. Task Count helps us make sure our services are running the right number of instances at all times.

Task Count by itself is not a cost metric, but if we consistently see high task counts with low CPU/memory usage, it indicates we can potentially consolidate workloads and reduce costs.

Instrumenting Orchard Core to Send Custom App Metrics

Now that we have seen how ECS metrics are visualised in Grafana, let’s move on to instrumenting our Orchard Core app to send custom app-level metrics. This will give us deeper visibility into what our app is really doing.

Metrics should be tied to business objectives. It’s crucial that the metrics you collect align with KPIs that can drive decision-making.

Metrics should be actionable. The collected data should help identify where to optimise, what to improve, and how to make decisions. For example, by tracking app-metrics such as response time and HTTP status codes, we gain insight into both performance and reliability of our Orchard Core. This allows us to catch slowdowns or failures early, improving user satisfaction.

SLA vs SLO vs SLI: Key Differences in Service Metrics (Image Credit: Atlassian)

By tracking response times and HTTP code counts at the endpoint level,

we are measuring SLIs that are necessary to monitor if we are meeting our SLOs.

With clear SLOs and SLIs, we can then focus on what really matters from a performance and reliability perspective. For example, a common SLO could be “99.9% of requests to our Orchard Core API endpoints must be processed within 500ms.”

In terms of sending custom app-level metrics from our Orchard Core to CloudWatch and then to Grafana, there are many approaches depending on our use case. If we are looking for simplicity and speed, CloudWatch SDK and EMF are definitely the easiest and most straightforward methods we can use to get started with sending custom metrics from Orchard Core to CloudWatch, and then visualising them in Grafana.

Using CloudWatch SDK to Send Metrics

We will start with creating a middleware called EndpointStatisticsMiddleware with AWSSDK.CloudWatch NuGet package referenced. In the middleware, we create a MetricDatum object to define the metric that we want to send to CloudWatch.

var metricData = new MetricDatum
    {
        MetricName = metricName,
        Value = value,
        Unit = StandardUnit.Count,
        Dimensions = new List<Dimension>
        {
            new Dimension
            {
                Name = "Endpoint", 
                Value = endpointPath
            }
        }
    };

var request = new PutMetricDataRequest
    {
        Namespace = "Experiment.OrchardCore.Main/Performance",
        MetricData = new List<MetricDatum> { metricData }
    };

In the code above, we see new concepts like Namespace, Metric, and Dimension. They are foundational in CloudWatch. We can think of them as ways to organize and label our data to make it easy to find, group, and analyse.

Namespace : A container or category for our metrics. It helps to group related metrics together;
Metric : A series of data points that we want to track. The thing we are measuring, in our example, it could be Http2xxCount and Http4xxCount;
Dimension :A key-value pair that adds context to a metric.

If we do not define the Namespace, Metric, and Dimensions carefully when we send data, Grafana later will not find them, or our charts on the dashboards will be very messy and hard to filter or analyse.

In addition, as shown in the code above, we are capturing the HTTP status code for our Orchard Core endpoints. We will then use PutMetricDataAsync to send the metric data PutMetricDataRequest asynchronously to CloudWatch.

The HTTP status codes of each of our Orchard Core endpoints are now captured on CloudWatch.

In Grafana, now when we want to configure a CloudWatch panel to show the HTTP status codes for each of the endpoint, the first thing we select is the Namespace, which is Experiment.OrchardCore.Main/Performance in our example. Namespace tells Grafana which group of metrics to query.

After picking the Namespace, Grafana lists the available Metrics inside that Namespace. We pick the Metrics we want to plot, such as Http2xxCount and Http4xxCount. Finally, since we are tracking metrics by endpoint, we set the Dimension to Endpoint and select the specific endpoint we are interested in, as shown in the following screenshot.

Using EMF to Send Metrics

While using the CloudWatch SDK works well for sending individual metrics, EMF (Embedded Metric Format) offers a more powerful and scalable way to log structured metrics directly from our app logs.

Before we can use EMF, we must first ensure that the Orchard Core application logs from our ECS tasks are correctly sent to CloudWatch Logs. This is done by configuring the LogConfiguration inside the ECS TaskDefinition as we discussed last time.

  # Unit 12: ECS Task Definition and Service
  ecsTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ...
      ContainerDefinitions:
        - Name: !Ref ServiceName
          Image: !Ref OrchardCoreImage
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Sub "/ecs/${ServiceName}-log-group"
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
          ...

Once the ECS task is sending logs to CloudWatch Logs, we can start embedding custom metrics into the logs using EMF.

Instead of pushing metrics directly using the CloudWatch SDK, we send structured JSON messages into the container logs. CloudWatch will then auto detects these EMF messages and converts them into CloudWatch Metrics.

The following shows what a simple EMF log message looks like.

{
  "_aws": {
    "Timestamp": 1745653519000,
    "CloudWatchMetrics": [
      {
        "Namespace": "Experiment.OrchardCore.Main/Performance",
        "Dimensions": [["Endpoint"]],
        "Metrics": [
          { "Name": "ResponseTimeMs", "Unit": "Milliseconds" }
        ]
      }
    ]
  },
  "Endpoint": "/api/v1/packages",
  "ResponseTimeMs": 142
}

When a log message reaches CloudWatch Logs, CloudWatch scans the text and looks for a valid _aws JSON object inside anywhere in the message. Thus, even if our log line has extra text before or after, as long as the EMF JSON is properly formatted, CloudWatch extracts it and publishes the metrics automatically.

An example of log with EMF JSON in it on CloudWatch.

After CloudWatch extracts the EMF block from our log message, it automatically turns it into a proper CloudWatch Metric. These metrics are then queryable just like any normal CloudWatch metric and thus available inside Grafana too, as shown in the screenshot below.

Metrics extracted from logs containing EMF JSON are automatically turned into metrics that can be visualised in Grafana just like any other metric.

As we can see, using EMF is easier as compared to going the CloudWatch SDK route because we do not need to change or add extra AWS infrastructure. With EMF, what our app does is just writing special JSON-format logs.

Then CloudWatch Metrics automatically extracts the metrics from those logs with EMF JSON. The entire process requires no new service, no special SDK code, and no CloudWatch PutMetric API calls.

Cost Optimisation with Logs vs Metrics

Logs are more expensive than metrics, especially when we are storing large amounts of data over time. This is also true when logs are stored at a higher retention rate and are more detailed, which means higher storage costs.

Metrics are cheaper to store because they are aggregated data points that do not require the same level of detail as logs.

CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics have the same metric name. However, compared to logs, metrics are still usually much cheaper at scale.

By embedding metrics into your log data via EMF, we are actually piggybacking metrics into logs, and letting CloudWatch extract metrics without duplicating effort. Thus, when using EMF, we will be paying for both, i.e.

Log ingestion and storage (for the raw logs);
The extracted custom metric (for the metric).

Hence, when we are leveraging EMF, we should consider expire logs faster if we only need the extracted metrics long-term.

Granularity and Sampling

Granularity refers to how frequent the metric data is collected. Fine granularity provides more detailed insights but can lead to increased data volume and costs.

Sampling is a technique to reduce the amount of data collected by capturing only a subset of data points (especially helpful in high-traffic systems). However, the challenge is ensuring that you maintain enough data to make informed decisions while keeping storage and processing costs manageable.

In our Orchard Core app above, currently the middleware that we implement will immediately PutMetricDataAsync to CloudWatch which will then not only slow down our API but it costs more because we need to pay when we send custom metrics to CloudWatch. Thus, we usually “buffer” the metrics first, and then batch-send periodically. This can be done with, for example, HostedService which is an ASP.NET Core background service, to flush metrics at interval.

using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;
using System.Collections.Concurrent;

public class MetricsPublisher(
        IAmazonCloudWatch cloudWatch, 
        IOptions<MetricsOptions> options,
        ILogger<MetricsPublisher> logger) : BackgroundService
{
    private readonly ConcurrentBag<MetricDatum> _pendingMetrics = new();

    public void TrackMetric(string metricName, double value, string endpointPath)
    {
        _pendingMetrics.Add(new MetricDatum
        {
            MetricName = metricName,
            Value = value,
            Unit = StandardUnit.Count,
            Dimensions = new List<Dimension>
            {
                new Dimension 
                { 
                    Name = "Endpoint", 
                    Value = endpointPath
                }
            }
        });
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        logger.LogInformation("MetricsPublisher started.");
        while (!stoppingToken.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromSeconds(options.FlushIntervalSeconds), stoppingToken);
            await FlushMetricsAsync();
        }
    }

    private async Task FlushMetricsAsync()
    {
        if (_pendingMetrics.IsEmpty) return;

        const int MaxMetricsPerRequest = 1000;

        var metricsToSend = new List<MetricDatum>();
        var metricsCount = 0;
        while (_pendingMetrics.TryTake(out var datum))
        {
            metricsToSend.Add(datum);

            metricsCount += 1;
            if (metricsCount >= MaxMetricsPerRequest) break;
        }

        var request = new PutMetricDataRequest
        {
            Namespace = options.Namespace,
            MetricData = metricsToSend
        };

        int attempt = 0;
        while (attempt < options.MaxRetryAttempts)
        {
            try
            {
                await cloudWatch.PutMetricDataAsync(request);
                logger.LogInformation("Flushed {Count} metrics to CloudWatch.", metricsToSend.Count);
                break;
            }
            catch (Exception ex)
            {
                attempt++;
                logger.LogWarning(ex, "Failed to flush metrics. Attempt {Attempt}/{MaxAttempts}", attempt, options.MaxRetryAttempts);
                if (attempt < options.MaxRetryAttempts)
                    await Task.Delay(TimeSpan.FromSeconds(options.RetryDelaySeconds));
                else
                    logger.LogError("Max retry attempts reached. Dropping {Count} metrics.", metricsToSend.Count);
            }
        }
    }

    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("MetricsPublisher stopping.");
        await FlushMetricsAsync();
        await base.StopAsync(cancellationToken);
    }
}

In our Orchard Core API, each incoming HTTP request may run on a different thread. Hence, we need a thread-safe data structure like ConcurrentBag for storing the pending metrics.

Please take note that ConcurrentBag is designed to be an unordered collection. It does not maintain the order of insertion when items are taken from it. However, since the metrics we are sending, which is the counts of HTTP status codes, it does not matter in what order the requests were processed.

In addition, the limit of MetricData that we can send to CloudWatch per request is 1,000. Thus, we have the constant MaxMetricsPerRequest to help us make sure that we retrieve and remove at most 1,000 metrics from the ConcurrentBag.

Finally, we can inject MetricsPublisher to our middleware EndpointStatisticsMiddleware so that it can auto track every API request.

Wrap-Up

In this post, we started by setting up Grafana on EC2, connected it to CloudWatch to visualise ECS metrics. After that, we explored two ways, i.e. CloudWatch SDK and EMF log, to send custom app-level metrics from our Orchard Core app:

Whether we are monitoring system health or reporting on business KPIs, Grafana with CloudWatch offers a powerful observability stack that is both flexible and cost-aware.

References

From Design to Implementation: Crafting Headless APIs in Orchard Core with Apidog

Goh Chun Lin — Mon, 31 Mar 2025 10:47:27 +0000

Last month, I had the opportunity to attend an online meetup hosted by the local Microsoft MVP Dileepa Rajapaksa from the Singapore .NET Developers Community, where I was introduced to ApiDog.

During the session, Mohammad L. U. Tanjim, the Product Manager of ApiDog, gave a detailed walkthrough of the API-First design and how Apidog can be used for this approach.

Apidog helps us to define, test, and document APIs in one place. Instead of manually writing Swagger docs and using API tool separately, ApiDog combines everything. This means frontend developers can get mock APIs instantly, and backend developers as well as QAs can get clear API specs with automatic testing support.

Hence, for the customised headless APIs, we will adopt an API-First design approach. This approach ensures clarity, consistency, and efficient collaboration between backend and frontend teams while reducing future rework.

Session “Build APIs Faster and Together with Apidog, ASP.NET, and Azure” conducted by Mohammad L. U. Tanjim.

API-First Design Approach

By designing APIs upfront, we reduce the likelihood of frequent changes that disrupt development. It also ensures consistent API behaviour and better long-term maintainability.

For our frontend team, with a well-defined API specification, they can begin working with mock APIs, enabling parallel development. This eliminates dependencies where frontend work is blocked by backend completion.

For QA team, API spec will be important to them because it serve as a reference for automated testing. The QA engineers can validate API responses before implementation.

API Design Journey

In this article, we will embark on an API Design Journey by transforming a traditional travel agency in Singapore into an API-first system. To achieve this, we will use Apidog for API design and testing, and Orchard Core as a CMS to manage travel package information. Along the way, we will explore different considerations in API design, documentation, and integration to create a system that is both practical and scalable.

Many traditional travel agencies in Singapore still rely on manual processes. They store travel package details in spreadsheets, printed brochures, or even handwritten notes. This makes it challenging to update, search, and distribute information efficiently.

The reliance on physical posters and brochures of a travel agency is interesting in today’s digital age.

By introducing a headless CMS like Orchard Core, we can centralise travel package management while allowing different clients like mobile apps to access the data through APIs. This approach not only modernises the operations in the travel agency but also enables seamless integration with other systems.

API Design Journey 01: The Design Phase

Now that we understand the challenges of managing travel packages manually, we will build the API with Orchard Core to enable seamless access to travel package data.

Instead of jumping straight into coding, we will first focus on the design phase, ensuring that our API meets the business requirements. At this stage, we focus on designing endpoints, such as GET /api/v1/packages, to manage the travel packages. We also plan how we will structure the response.

Given the scope and complexity of a full travel package CMS, this article will focus on designing a subset of API endpoints, as shown in the screenshot below. This allows us to highlight essential design principles and approaches that can be applied across the entire API journey with Apidog.

Let’s start with eight simple endpoints.

For the first endpoint “Get all travel packages”, we design it with the following query parameters to support flexible and efficient result filtering, pagination, sorting, and text search. This approach ensures that users can easily retrieve and navigate through travel packages based on their specific needs and preferences.

GET /api/v1/packages?page=1&pageSize=20&sortBy=price&sortOrder=asc&destinationId=4&priceRange[min]=500&priceRange[max]=2000&rating=4&searchTerm=spa

Pasting the API path with query parameters to the Endpoint field will auto populate the Request Params section in Apidog.

Same with the request section, the Response also can be generated based on a sample JSON that we expect the endpoint to return, as shown in the following screenshot.

As shown in the Preview, the response structure can be derived from a sample JSON.

In the screenshot above, the field “description” is marked as optional because it is the only property that does not exist in all the other entry in “data”.

Besides the success status, we also need another important HTTP 400 status code which tells the client that something is wrong with their request.

By default, for generic error responses like HTTP 400, there are response components that we can directly use in Apidog.

The reason why we need HTTP 400 is that, instead of processing an invalid request and returning incorrect or unexpected results, our API should explicitly reject it, ensuring that the client knows what needs to be fixed. This improves both developer experience and API reliability.

After completing the endpoint for getting all travel packages, we also have another POST endpoint to search travel packages.

While GET is the standard method for retrieving data from an API, complex search queries involving multiple parameters, filters, or file uploads might require the use of a POST request. This is particularly true when dealing with advanced search forms or large amounts of data, which cannot be easily represented as URL query parameters. In these cases, POST allows us to send the parameters in the body of the request, ensuring the URL remains manageable and avoiding URL length limits.

For example, let’s assume this POST endpoint allows us to search for travel packages with the following body.

{
    "destination": "Singapore",
    "priceRange": {
        "min": 500,
        "max": 2000
    },
    "rating": 4,
    "amenities": ["pool", "spa"],
    "files": [
        {
            "fileType": "image",
            "file": "base64-encoded-image-content"
        }
    ]
}

We can also easily generate the data schema for the body by pasting this JSON as example into Apidog, as shown in the screenshot below.

Setting up the data schema for the body of an HTTP POST request.

When making an HTTP POST request, the client sends data to the server. While JSON in the request body is common, there is also another format used in APIs, i.e. multipart/form-data (also known as form-data).

The form-data is used when the request body contains files, images, or binary data along with text fields. So, if our endpoint /api/v1/packages/{id}/reviews allows users to submit both text (review content and rating) and an image, using form-data is the best choice, as demonstrated in the following screenshot.

Setting up a request body which is multipart/form-data in Apidog.

API Design Journey 02: Prototyping with Mockups

When designing the API, it is common to debate, for example, whether reviews should be nested inside packages or treated as a separate resource. By using Apidog, we can quickly create mock APIs for both versions and tested how they would work in different use cases. This helps us make a data-driven decision instead of endless discussions.

Once our endpoint is created, Apidog automatically generates a mock API based on our defined API spec, as shown in the following screenshot.

A list of mock API URLs for our “Get all travel packages” endpoint.

Clicking on the “Request” button next to each of the mock API URL will bring us to the corresponding mock response, as shown in the following screenshot.

Default mock response for HTTP 200 of our first endpoint “Get all travel packages”.

As shown in the screenshot above, some values in the mock response are not making any sense, for example negative id and destinationId, rating which is supposed to be between 1 and 5, “East” as sorting direction, and so on. How could we fix them?

Firstly, we will set the id (and destinationId) to be any positive integer number starting from 1.

Setting id to be a positive integer number starting from 1.

Secondly, we update both the price and rating to be float. In the following screenshot, we specify that the rating can be any float from 1.0 to 5.0 with single fraction digit.

Apidog is able to generate an example based on our condition under “Preview”.

Finally, we will indicate that the sorting direction can only be either ASC or DESC, as shown in the following screenshot.

Configuring the possible value for the direction field.

With all the necessary mock values configuration, if we fetch the mock response again, we should be able to get a response with more reasonable values, as demonstrated in the screenshot below.

Now the mock response looks more reasonable.

With the mock APIs, our frontend developers will be able to start building UI components without waiting for the backend to be completed. Also, as shown above, a mock API responds instantly, unlike real APIs that depend on database queries, authentication, or network latency. This makes UI development and unit testing faster.

Speaking of testing, some test cases are difficult to create with a real API. For example, what if an API returns an error (500 Internal Server Error)? What if there are thousands of travel packages? With a mock API, we can control the responses and simulate rare cases easily.

In addition, Apidog supports returning different mock data based on different request parameters. This makes the mock API more realistic and useful for developers. This is because if the mock API returns static data, frontend developers may only test one scenario. A dynamic mock API allows testing of various edge cases.

For example, our travel package API allows admins to see all packages, including unpublished ones, while regular users only see public packages. We thus can setup in such a way that different bearer token will return different set of mock data.

We are setting up the endpoint to return drafts when a correct admin token is provided in the request header with Mock Expectation.

With Mock Expectation feature, Apidog can return custom responses based on request parameters as well. For instance, it can return normal packages when the destinationId is 1 and trigger an error when the destinationId is 2.

API Design Journey 03: Documenting Phase

With endpoints designed properly in earlier two phases, we can now proceed to create documentation which is offers a detailed explanation of the endpoints in our API. This documentation will include the information such as HTTP methods, request parameters, and response formats.

Fortunately, Apidog makes the documentation process smooth by integrating well within the API ecosystem. It also makes sharing easy, letting us export the documentation in formats like OpenAPI, HTML, and Markdown.

Apidog can export API spec in formats like OpenAPI, HTML, and Markdown.

We can also export our documentation on folder basis to OpenAPI Specification in Overview, as shown below.

Custom export configuration for OpenAPI Specification.

We can also export the data as an offline document. Just click on the “Open URL” or “Permalink” button to view the raw JSON/YAML content directly in the Internet browser. We then can place the raw content into the Swagger Editor to view the Swagger UI of our API, as demonstrated in the following screenshot.

The exported content from Apidog can be imported to Swagger Editor directly.

Let’s say now we need to share the documentation with our team, stakeholders, or even the public. Our documentation thus needs to be accessible and easy to navigate. That is where exporting to HTML or Markdown comes in handy.

Documentation is Markdown format, generated by Apidog.

Finally, Apidog also allows us to conveniently publish our API documentation as a webpage. There are two options: Quick Share , for sharing parts of the docs with collaborators, and Publish Docs , for making the full documentation publicly available.

Quick Share is great for API collaborators because we can set a password for access and define an expiration time for the shared documentation. If no expiration is set, the link stays active indefinitely.

API spec presented as a website and accessible by the collaborators. It also enables collaborators to generate client code for different languages.

API Design Journey 04: The Development Phase

With our API fully designed, mocked, and documented, it is time to bring it to life with actual code. Since we have already defined information such as the endpoints, request format, and response formats, implementation becomes much more straightforward. Now, let’s start building the backend to match our API specifications.

Orchard Core generally supports two main approaches for designing APIs, i.e. Headless and Decoupled.

In the headless approach, Orchard Core acts purely as a backend CMS, exposing content via APIs without a frontend. The frontend is built separately.

In the decoupled approach, Orchard Core still provides APIs like in the headless approach, but it also serves some frontend rendering. It is a hybrid approach because we use Razor Pages some parts of the UI are rendered by Orchard, while others rely on APIs.

So in fact, we can combine the good of both approaches so that we can build a customised headless APIs on Orchard Core using services like IOrchardHelper to fetch content dynamically and IContentManager to allow us full CRUD operations on content items. This is in fact the approach mentioned in the Orchard Core Basics Companion (OCBC) documentation.

For the endpoint of getting a list of travel packages, i.e. /api/v1/packages, we can define it as follows.

[ApiController]
[Route("api/v1/packages")]
public class PackageController(
    IOrchardHelper orchard,
    ...) : Controller
{
    [HttpGet]
    public async Task<IActionResult> GetTravelPackages()
    {
        var travelPackages = await orchard.QueryContentItemsAsync(q => 
            q.Where(c => c.ContentType == "TravelPackage"));

        ...

        return Ok(travelPackages);
    }

    ...
}

In the code above, we are using Orchard Core Headless CMS API and leveraging IOrchardHelper to query content items of type “TravelPackage”. We are then exposing a REST API (GET /api/v1/packages) that returns all travel packages stored as content items in the Orchard Core CMS.

API Design Journey 05: Testing of Actual Implementation

Let’s assume our Dev Server Base URL is localhost. This URL is set as a variable in the Develop Env, as shown in the screenshot below.

Setting Base URL for Develop Env on Apidog.

With the environment setup, we can now proceed to run our endpoint under that environment. As shown in the following screenshot, we are able to immediately validate the implementation of our endpoint.

Validated the GET endpoint under Develop Env.

The screenshot above shows that through API Validation Testing, the implementation of that endpoint has met all expected requirements.

API validation tests are not just for simple checks. The feature is great for handling complex, multi-step API workflows too. With them, we can chain multiple requests together, simulate real-world scenarios, and even run the same requests with different test data. This makes it easier to catch issues early and keep our API running smoothly.

Populate testing steps based on our API spec in Apidog.

In addition, we can also set up Scheduled Tasks, which is still in Beta now, to automatically run our test scenarios at specific times. This helps us monitor API performance, catch issues early, and ensure everything works as expected automatically. Plus, we can review the execution results to stay on top of any failures.

Result of running one of the endpoints on Develop Env.

Wrap-Up

Throughout this article, we have walked through the process of designing, mocking, documenting, implementing, and testing a headless API in Orchard Core using Apidog. By following an API-first approach, we ensure that our API is well-structured, easy to maintain, and developer-friendly.

With this approach, teams can collaborate more effectively, reduce friction in development. Now that the foundation is set, the next step could be integrating this API into a frontend app, optimising our API performance, or automating even more tests.

Finally, with .NET 9 moving away from built-in Swagger UI, developers now have to find alternatives to set up API documentation. As we can see, Apidog offers a powerful alternative, because it combines API design, testing, and documentation in one tool. It simplifies collaboration while ensuring a smooth API-first design approach.

Automate Orchard Core Deployment on AWS ECS with CloudFormation

Goh Chun Lin — Sun, 09 Mar 2025 08:56:40 +0000

For .NET developers looking for Content Management System (CMS) solution, Orchard Core presents a compelling, open-source option. Orchard Core is a CMS built on ASP.NET Core. When deploying Orchard Core on AWS, the Elastic Container Service (ECS) provides a good hosting platform that can handle high traffic, keep costs down, and remain stable.

However, finding clear instructions for deploying Orchard Core to ECS end-to-end can be difficult. This may require us to do more testing and troubleshooting, and potentially lead to a less efficient or secure setup. A lack of a standard deployment process can also complicate infrastructure management and hinder the implementation of CI/CD. This is where Infrastructure as Code (IaC) comes in.

Source Code

The complete CloudFormation template we built in this article is available on GitHub: https://github.com/gcl-team/Experiment.OrchardCore.Main/blob/main/Infrastructure.yml

CloudFormation

IaC provides a solution for automating infrastructure management. With IaC, we define our entire infrastructure which hosts Orchard Core setup as code. This code can then be version-controlled, tested, and deployed just like application code.

CloudFormation is an AWS service that implements IaC. By using CloudFormation, AWS automatically provisions and configures all the necessary resources for our Orchard Core hosting, ensuring consistent and repeatable deployments across different environments.

This article is for .NET developers who know a bit about AWS concepts such as ECS or CloudFormation. We’ll demonstrate how CloudFormation can help to setup the infrastructure for hosting Orchard Core on AWS.

The desired infrastructure of our CloudFormation setup.

Now let’s start writing our CloudFormation as follows. We start by defining some useful parameters that we will be using later. Some of the parameters will be discussed in the following relevant sections.

AWSTemplateFormatVersion: '2010-09-09'
Description: "Infrastructure for Orchard Core CMS"

Parameters:
  VpcCIDR:
    Type: String
    Description: "VPC CIDR Block"
    Default: 10.0.0.0/16
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
  ApiGatewayStageName:
    Type: String
    Default: "production"
    AllowedValues:
      - production
      - staging
      - development
  ServiceName:
    Type: String
    Default: cld-orchard-core
    Description: "The service name"
  CmsDBName:
    Type: String
    Default: orchardcorecmsdb
    Description: "The name of the database to create"
  CmsDbMasterUsername:
    Type: String
    Default: orchardcoreroot
  HostedZoneId:
    Type: String
    Default: _ **<your Route 53 hosted zone id>** _
  HostedZoneName:
    Type: String
    Default: _ **<your custom domain>** _
  CmsHostname:
    Type: String
    Default: orchardcms
  OrchardCoreImage:
    Type: String
    Default: **_<your ECR link>_** /orchard-core-cms:latest
  EcsAmi:
    Description: The Amazon Machine Image ID used for the cluster
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id

Dockerfile

The Dockerfile is quite straightforward.

# Global Arguments
ARG DCR_URL=mcr.microsoft.com
ARG BUILD_IMAGE=${DCR_URL}/dotnet/sdk:8.0-alpine
ARG RUNTIME_IMAGE=${DCR_URL}/dotnet/aspnet:8.0-alpine

# Build Container
FROM ${BUILD_IMAGE} AS builder
WORKDIR /app

COPY . .

RUN dotnet restore
RUN dotnet publish ./OCBC.HeadlessCMS/OCBC.HeadlessCMS.csproj -c Release -o /app/src/out

# Runtime Container
FROM ${RUNTIME_IMAGE}

## Install cultures
RUN apk add --no-cache \
   icu-data-full \
   icu-libs

ENV ASPNETCORE_URLS http://*:5000

WORKDIR /app

COPY --from=builder /app/src/out .

EXPOSE 5000

ENTRYPOINT ["dotnet", "OCBC.HeadlessCMS.dll"]

With the Dockerfile, we then can build the Orchard Core project locally with the command below.

docker build --platform=linux/amd64 -t orchard-core-cms:v1 .

The --platform flag specifies the target OS and architecture for the image being built. Even though it is optional, it is particularly useful when building images on a different platform (like macOS or Windows) and deploying them to another platform (like Amazon Linux) that has a different architecture.

ARM-based Apple Silicon was announced in 2020. (Image Credit: The Verge)

I am using macOS with ARM-based Apple Silicon, whereas Amazon Linux AMI uses amd64 (x86_64) architecture. Hence, if I do not specify the platform, the image I build on my Macbook will be incompatible with EC2 instance.

Once the image is built, we will push it to the Elastic Container Registry (ECR).

We choose ECR because it is directly integrated with ECS, which means deploying images from ECR to ECS is smooth. When ECS needs to pull an image from ECR, it automatically uses the IAM role to authenticate and authorise the request to ECR. The execution role of our ECS is associated with the AmazonECSTaskExecutionRolePolicy IAM policy, which allows ECS to pull images from ECR.

ECR also comes with built-in support for image scanning, which automatically scans our images for vulnerabilities.

Image scanning in ECR helps ensure our images are secure before we deploy them.

Unit 01: IAM Role

Technically, we are able to run Orchard Core on ECS without any ECS task role. However, that is possible only if our Orchard Core app does not need to interact with AWS services. Not only for our app, but actually most of the modern web apps, we always need to integrate our app with AWS services such as S3, CloudWatch, etc. Hence, the first thing that we need to work on is setting up an ECS task role.

iamRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: !Sub "${AWS::StackName}-ecs"
    Path: !Sub "/${AWS::StackName}/"
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - ecs-tasks.amazonaws.com
          Action:
            - sts:AssumeRole

In AWS IAM, permissions are assigned to roles, not directly to the services that need them. Thus, we cannot directly assign IAM policies to ECS tasks. Instead, we assign those policies to a role, and then the ECS task temporarily assumes that role to gain those permissions, as shown in the configuration above.

Roles are considered temporary because they are only assumed for the duration that the ECS task needs to interact with AWS resources. Once the ECS task stops, the temporary permissions are no longer valid, and the service loses access to the resources.

Hence, by using roles and AssumeRole, we follow the principle of least privilege. The ECS task is granted only the permissions it needs and can only use them temporarily.

Unit 02: CloudWatch Log Group

ECS tasks, by default, do not have logging enabled.

Hence, assigning a role to our ECS task for logging to CloudWatch Logs is definitely one of the first roles we should assign when setting up ECS tasks. Setting logging up early helps to avoid surprises later on when our ECS tasks are running.

To setup the logging, we first need to specify Log Group, a place in CloudWatch that logs go. While ECS itself can create the log group automatically when the ECS task starts (if it does not already exist), it is a good practice to define the log group in CloudFormation to ensure it exists ahead of time and can be managed within our IaC.

ecsLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: !Sub "/ecs/${ServiceName}-log-group"
    RetentionInDays: 3
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

The following policy will grant the necessary permissions to write logs to CloudWatch.

ecsLoggingPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Sub "${AWS::StackName}-cloudwatch-logs-policy"
    Roles:
      - !Ref iamRole
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Action:
            - logs:CreateLogStream
            - logs:PutLogEvents
          Resource:
            - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/ecs/${ServiceName}-log-group/*"

By separating the logging policy into its own resource, we make it easier to manage and update policies independently of the ECS task role. After defining the policy, we attach it to the ECS task role by referencing it in the Roles section.

The logging setup helps us consolidate log events from the container into a centralised log group in CloudWatch.

Unit 03: S3 Bucket

We will be storing the files uploaded to the Orchard Core through its Media module on Amazon S3. So, we need to configure our S3 Bucket as follows.

mediaContentBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: !Join
      - '-'
      - - !Ref ServiceName
        - !Ref AWS::Region
        - !Ref AWS::AccountId
    OwnershipControls:
      Rules:
        - ObjectOwnership: BucketOwnerPreferred
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Since bucket names must be globally unique, we dynamically create it using AWS Region and AWS Account ID.

Since our Orchard Core can be running in multiple ECS tasks that upload media files to a shared S3 bucket, the BucketOwnerPreferred setting ensures that even if media files are uploaded by different ECS tasks, the owner of the S3 bucket can still access, delete, or modify any of those media files without needing additional permissions for each uploaded object.

The bucket owner having full control is a security necessity in many cases because it allows the owner to apply policies, access controls, and auditing in a centralised way, maintaining the security posture of the bucket.

However, even if the bucket owner has control, the principle of least privilege should still apply. For example, only the ECS task responsible for Orchard Core should be allowed to interact with the media objects.

mediaContentBucketPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Sub "${mediaContentBucket}-s3-policy"
    Roles:
      - !Ref iamRole
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Action:
            - s3:ListBucket
          Resource: !GetAtt mediaContentBucket.Arn
        - Effect: Allow
          Action:
            - s3:PutObject
            - s3:GetObject
          Resource: !Join ["/", [!GetAtt mediaContentBucket.Arn, "*"]]

Keeping the s3:ListBucket permission in the policy is a necessary permission for Orchard Core Media module to work properly. Meanwhile, both s3:PutObject and s3:GetObject are used for uploading and downloading media files.

IAM Policy

Now, let’s pause a while to talk about the policies that we have added above for the log group and S3.

In AWS, we mostly deal with managed policies and inline policies depending on whether the policy needs to be reused or tightly scoped to one role.

We use AWS::IAM::ManagedPolicy when the permission needs to be reused by multiple roles or services. So it is frequently used in company-wide security policies. Thus it is not suitable for our Orchard Core examples above. Instead, we use AWS::IAM::Policy because it is for a permission which is tightly connected to a single role and will not be reused elsewhere.

In addition, since AWS::IAM::Policy is tightly tied to entities, it will be deleted when the corresponding entities are deleted. This is a key difference from AWS::IAM::ManagedPolicy, which remains even if the entities that use it are deleted. This explains why managed policy is used in company-wide policies because managed policy provides better long-term management for permissions that may be reused across multiple roles.

We can summarise the differences between two of them into the following table.

Unit 04: Aurora Database Cluster

Orchard Core supports Relational DataBase Management System (RDBMS). Unlike traditional CMS platforms that rely on a single database engine, Orchard Core offers flexibility by supporting multiple RDBMS options, including:

Microsoft SQL Server;
PostgreSQL;
MySQL;
SQLite.

While SQLite is lightweight and easy to use, it is not suitable for production deployments on AWS. SQLite is designed for local storage, not multi-user concurrent access. On AWS, there are fully managed relational databases (RDS and Aurora) provided instead.

The database engines supported by Amazon RDS and Amazon Aurora.

While Amazon RDS is a well-known choice for relational databases, we can also consider Amazon Aurora, which was launched in 2014. Unlike traditional RDS, Aurora automatically scales up and down, reducing costs by ensuring we only pay for what we use.

High performance and scalability of Amazon Aurora. (Image Source: Amazon Aurora MySQL PostgreSQL Features)

In addition, Aurora is faster than standard PostgreSQL and MySQL, as shown in the screenshot above. It also offers built-in high availability with Multi-AZ replication. This is critical for a CMS like Orchard Core, which relies on fast queries and efficient data handling.

It is important to note that, while Aurora is optimised for AWS, it does not lock us in, as we retain full control over our data and schema. Hence, if we ever need to switch, we can export data and move to standard MySQL/PostgreSQL on another cloud or on-premises.

Instead of manually setting up Aurora, we will be using CloudFormation to ensure that the correct database instance, networking, security settings, and additional configurations are managed consistently.

Aurora is cluster-based rather than standalone DB instances like traditional RDS. Thus, instead of a single instance, we deploy a DB cluster, which consists of a primary writer node and multiple reader nodes for scalability and high availability.

Because of this cluster-based architecture, Aurora does not use the usual DBParameterGroup like standalone RDS instances. Instead, it requires a DBClusterParameterGroup to apply settings at the cluster level , ensuring all instances in the cluster inherit the same configuration, as shown in the following Cloudformation template.

cmsDBClusterParameterGroup:
  Type: AWS::RDS::DBClusterParameterGroup
  Properties:
    Description: "Aurora Provisioned Postgres DB Cluster Parameter Group"
    Family: aurora-postgresql16
    Parameters:
      timezone: UTC # Ensures consistent timestamps
      rds.force_ssl: 1 # Enforce SSL for security

The first parameter we configure is the timezone. We set it to UTC to ensure consistency. So when we store date-time values in the database, we should use TIMESTAMPTZ for timestamps, and store the time zone as a TEXT field. After that, when we need to display the time in a local format, we can use the AT TIME ZONE feature in PostgreSQL to convert from UTC to the desired local time zone. This is important because PostgreSQL returns all times in UTC, so storing the time zone ensures we can always retrieve and present the correct local time when needed, as shown in the query below.

SELECT event_time_utc AT TIME ZONE timezone AS event_local_time
FROM events;

After that, we enabled the rds.force_ssl so that all connections to our Aurora are encrypted using SSL. This is necessary to prevent data from being sent in plaintext. Even if our Aurora database is behind a bastion host, enforcing SSL connections is still recommended because SSL ensures the encryption of all data in transit, adding an extra layer of security. It is also worth mentioning that enabling SSL does not negatively impact performance much, but it adds a significant security benefit.

Once the DBClusterParameterGroup is configured, the next step is to configure the AWS::RDS::DBCluster resource, where we will define the cluster main configuration with the parameter group defined above.

cmsDatabaseCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    BackupRetentionPeriod: 7  
    DatabaseName: !Ref CmsDBName
    DBClusterIdentifier: !Ref AWS::StackName
    DBClusterParameterGroupName: !Ref cmsDBClusterParameterGroup
    DeletionProtection: true
    Engine: aurora-postgresql
    EngineMode: provisioned
    EngineVersion: 16.1
    MasterUsername: !Ref CmsDbMasterUsername
    MasterUserPassword: !Sub "{{resolve:ssm-secure:/OrchardCoreCms/DbPassword:1}}"
    DBSubnetGroupName: !Ref cmsDBSubnetGroup
    VpcSecurityGroupIds:
      - !GetAtt cmsDBSecurityGroup.GroupId
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Let’s go through the Properties.

About BackupRetentionPeriod

The BackupRetentionPeriod parameter in the Aurora DB cluster determines how many days automated backups are retained by AWS. It can be from a minimum of 1 day to a maximum of 35 days for Aurora databases. For most business applications, 7 days of backups is often enough to handle common recovery scenarios unless we are required by law or regulation to keep backups for a certain period.

Aurora automatically performs incremental backups for our database every day, which means that it does not back up the entire database each time. Instead, it only stores the changes since the previous backup. This makes the backup process very efficient, especially for databases with little or no changes over time. If our CMS database remains relatively static, then the backup storage cost will remain very low or even free as long as our total backup data for the whole retention period does not exceed the storage capacity of our database.

So the total billed usage for backup depends on how much data is being changed each day, and whether the total backup size exceeds the volume size. If our database does not experience massive daily changes, the backup storage will likely remain within the database size and be free.

About DBClusterIdentifier

For the DBClusterIdentifier, we set it to the stack name, which makes it unique to the specific CloudFormation stack. This can be useful for differentiating clusters.

About DeletionProtection

In production environments, data loss or downtime is critical. DeletionProtection ensures that our CMS DB cluster will not be deleted unless it is explicitly disabled. There is no “shortcut” to bypass it for production resources. If DeletionProtection is enabled on the DB cluster, even CloudFormation will fail to delete the DB cluster. The only way to delete the DB cluster is that we disable DeletionProtection first via the AWS Console, CLI or SDK.

About EngineMode

In Aurora, EngineMode refers to the database operational mode. There are two primary modes, i.e. Provisioned and Serverless. For Orchard Core, Provisioned mode is typically the better choice because the mode ensures high availability, automatic recovery, and read scaling. Hence, if the CMS is going to have a consistent level of traffic, Provisioned mode will be able to handle that load. Serverless is useful if our CMS workload has unpredictable traffic patterns or usage spikes.

About MasterUserPassword

Storing database passwords directly in the CloudFormation template is a security risk.

There are a few other ways to handle sensitive data like passwords in CloudFormation, for example using AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store.

AWS Secrets Manager is a more advanced solution that offers automatic password rotation , which is useful for situations where we need to regularly rotate credentials. However, it may incur additional costs.

On the other hand, SSM Parameter Store provides a simpler and cost-effective solution for securely storing and referencing secrets, including database passwords. We can store up to 10,000 parameters (standard type) without any cost.

Hence, we need to use SSM Parameter Store to securely store the database password and reference it in CloudFormation without exposing it directly in our template, reducing the security risks and providing an easier management path for our secrets.

Database password is stored as a SecureString in Parameter Store.

About DBSubnetGroupName and VpcSecurityGroupIds

These two configurations about Subnet and VPC will involve networking considerations. We will discuss further when we dive into the networking setup later.

Unit 05: Aurora Database Instance

Now that we have covered the Aurora DB cluster, which is the overall container for the database, let’s move on to the DB instance.

Think of the cluster as the foundation, and the DB instances are where the actual database operations take place. The DB instances are the ones that handle the read and write operations, replication, and scaling for the workload. So, in order for our CMS to work correctly, we need to define the DB instance configuration, which runs on top of the DB cluster.

cmsDBInstance:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Retain
  Properties:
    DBInstanceIdentifier: !Sub "${AWS::StackName}-db-instance"
    DBInstanceClass: db.t4g.medium
    DBClusterIdentifier: !Ref cmsDatabaseCluster
    DBSubnetGroupName: !Ref cmsDBSubnetGroup
    Engine: aurora-postgresql
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

For our Orchard Core CMS, we do not expect very high traffic or intensive database operations. Hence, we choose to use db.t4g. T4g database instances are AWS Graviton2-based, thus they are more cost-efficient than traditional instance types, especially for workloads like a CMS that does not require continuous high performance. However, there are a few things we make need to look into when using T instance classes.

Unit 06: Virtual Private Cloud (VPC)

Now that we have covered how the Aurora cluster and instance work, the next important thing is ensuring they are deployed in a secure and well-structured network. This is where the Virtual Private Cloud (VPC) comes in.

VPC is a virtual network in AWS where we define the infrastructure networking. It is like a private network inside AWS where we can control IP ranges, subnets, routing, and security.

The default VPC in Malaysia region.

By the way, you might have noticed that AWS automatically provides a default VPC in every region. It is a ready-to-use network setup that allows us to launch resources without configuring networking manually.

While it is convenient, it is recommended not to use the default VPC. This is because the default VPC is automatically created with predefined settings, which means we do not have full control over its configuration, such as subnet sizes, routing, security groups, etc. It also has public subnets by default which can accidentally expose internal resources to the Internet.

Since we are setting up our own VPC, one key decision we need to make is the CIDR block, i.e. the range of private IPs we allocate to our network. This is important because it determines how many subnets and IP addresses we can have within our VPC.

To future-proof our infrastructure, we will be using a /16 CIDR block, as shown in the VpcCIDR in our CloudFormation template. This gives us 65,536 IP addresses, which we can break into 64 subnets of /22 (each having 1,024 IPs). 64 subnets is usually more than enough for a well-structured VPC because most companies do not even need so many subnets in a single VPC unless they have very complex workloads. Just in case if one service needs more IPs, we can allocate a larger subnet, for example /21 instead of /22.

In the VPC setup, we are also trying to avoid creating too many VPCs unnecessarily. Managing multiple VPCs means handling VPC peering which increases operational overhead.

vpc:
  Type: AWS::EC2::VPC
  Properties:
    CidrBlock: !Ref VpcCIDR
    InstanceTenancy: default
    EnableDnsSupport: true
    EnableDnsHostnames: true
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-vpc"

Since our ECS workloads and Orchard Core CMS are public-facing, we need EnableDnsHostnames: true so that public-facing instances get a public DNS name. We also need EnableDnsSupport: true to allow ECS tasks, internal services, and AWS resources like S3 and Aurora to resolve domain names internally.

For InstanceTenancy, which determines whether instances in our VPC run on shared (default) or dedicated hardware, it is recommended to use the default because AWS automatically places instances on shared hardware, which is cost-effective and scalable. We only need to change it if we are asked to use dedicated instances with full hardware isolation.

Now that we have defined our VPC, the next step is planning its subnet structure. We need both public and private subnets for our workloads.

Unit 07: Subnets and Subnet Groups

For our VPC with a /16 CIDR block, we will be breaking it into /24 subnets for better scalability:

Public Subnet 1: 10.0.0.0/24
Public Subnet 2: 10.0.1.0/24
Private Subnet 1: 10.0.2.0/24
Private Subnet 2: 10.0.3.0/24

Instead of manually specifying CIDRs, we will let CloudFormation automatically calculates the CIDR blocks for public and private subnets using !Select and !Cidr, as shown below.

# Public Subnets
publicSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: 10.0.0.0/24
    AvailabilityZone: !Select [0, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-public-subnet-1"

publicSubnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: 10.0.1.0/24
    AvailabilityZone: !Select [1, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-public-subnet-2"

# Private Subnets
privateSubnet1:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: 10.0.2.0/24
    AvailabilityZone: !Select [0, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-private-subnet-1"

privateSubnet2:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref vpc
    CidrBlock: 10.0.3.0/24
    AvailabilityZone: !Select [1, !GetAZs '']
    Tags:
      - Key: Name
        Value: !Sub "${AWS::AccountId}-${AWS::Region}-private-subnet-2"

For availability zones (AZs), all commercial AWS regions have at least two AZs, with most having three or more. Hence, we do not need to worry about the assignment of !Select [1, !GetAZs ''] in the template above will fail.

Now with our subnets setup, we can revisit the DBSubnetGroupName in Aurora cluster and instance. Aurora clusters are highly available, and AWS recommends placing Aurora DB instances across multiple AZs to ensure redundancy and better fault tolerance. The Subnet Group allows us to define the subnets where Aurora will deploy its instances, which enables the multi-AZ deployment for high availability.

cmsDBSubnetGroup:
  Type: AWS::RDS::DBSubnetGroup
  Properties:
    DBSubnetGroupDescription: "Orchard Core CMS Postgres DB Subnet Group"
    SubnetIds:
      - !Ref privateSubnet1
      - !Ref privateSubnet2
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Unit 08: Security Groups

Earlier, we configured the Subnet Group for Aurora, which defines which subnets the Aurora instances will reside in. Now, we need to ensure that only authorised systems or services can access our database. That is where the Security Group cmsDBSecurityGroup comes into play.

A Security Group acts like a virtual firewall that controls inbound and outbound traffic to our resources, such as our Aurora instances. It is like setting permissions to determine which IP addresses and which ports can communicate with the database.

For Aurora, we will configure the security group to only allow traffic from our private subnets, so that only trusted services within our VPC can reach the database.

cmsDBSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupName: !Sub "${CmsDBName}-security-group"
    GroupDescription: "Permits Access To CMS Aurora Database"
    VpcId: !Ref vpc
    SecurityGroupIngress:
    - CidrIp: !GetAtt privateSubnet1.CidrBlock
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
    - CidrIp: !GetAtt privateSubnet2.CidrBlock
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
    Tags:
      - Key: Name
        Value: !Sub "${CmsDBName}-security-group"
      - Key: Stack
        Value: !Ref AWS::StackName

Here we only setup security group for ingress but not egress because AWS security groups, by default, allow all outbound traffic.

Unit 09: Elastic Load Balancing (ELB)

Before diving into how we host Orchard Core on ECS, let’s first figure out how traffic will reach our ECS service. In modern cloud web app development and hosting, three key factors matter: reliability , scalability , and performance. And that is why a load balancer is essential.

Reliability – If we only have one container and it crashes, the whole app goes down. A load balancer allows us to run multiple containers so that even if one fails, the others keep running.
Scalability – As traffic increases, a single container will not be enough. A load balancer lets us add more containers dynamically when needed, ensuring smooth performance.
Performance – Handling many requests in parallel prevents slowdowns. A load balancer efficiently distributes traffic to multiple containers, improving response times.

For that, we need an Elastic Load Balancing (ELB) to distribute requests properly.

AWS originally launched ELB with only Classic Load Balancers (CLB). Later, AWS completely redesigned its load balancing services and introduced the following in ElasticLoadBalancingV2:

Network Load Balancer (NLB);
Application Load Balancer (ALB);
Gateway Load Balancer (GLB).

Summary of differences: ALB vs. NLB vs. GLB (Image Source: AWS)

NLB is designed for high performance, low latency, and TCP/UDP traffic, which makes it perfect for situations like ours, where we are dealing with an Orchard Core CMS web app. NLB is optimised for handling millions of requests per second and is ideal for routing traffic to ECS containers.

ALB is usually better suited for HTTP/HTTPS traffic. ALB offers more advanced routing features for HTTP. Since we are mostly concerned with handling general traffic to ECS, NLB is simpler and more efficient.

GLB works well if we manage traffic between cloud and on-premises environments or across different regions, which does not apply to our use case here.

Configure NLB

Setting up an NLB in AWS always involves these three key components:

AWS::ElasticLoadBalancingV2::LoadBalancer;
AWS::ElasticLoadBalancingV2::TargetGroup;
AWS::ElasticLoadBalancingV2::Listener.

Firstly, LoadBalancer distributes traffic across multiple targets such as ECS tasks.

internalNlb:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Name: !Sub "${ServiceName}-private-nlb"
    Scheme: internal
    Type: network
    Subnets:
      - !Ref privateSubnet1
      - !Ref privateSubnet2
    LoadBalancerAttributes:
      - Key: deletion_protection.enabled
        Value: "true"
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

In the template above, we create a NLB (Type: network) that is not exposed to the public internet (Scheme: internal). It is deployed across two private subnets, ensuring high availability. Finally, to prevent accidental deletion, we enable the deletion protection. In the future, we must disable it before we can delete the NLB.

Please take note that we do not enable Cross-Zone Load Balancing here because AWS charges for inter-AZ traffic. Also, since we are planning each AZ to have the same number of targets , disabling cross-zone helps preserve optimal routing.

Secondly, we need to setup TargetGroup to tell the NLB to send traffic to our ECS tasks running Orchard Core CMS.

nlbTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  DependsOn:
    - internalNlb
  Properties:
    Name: !Sub "${ServiceName}-target-group"
    Port: 80
    Protocol: TCP
    TargetType: instance
    VpcId: !Ref vpc
    HealthCheckProtocol: HTTP
    HealthCheckPort: 80
    HealthCheckPath: /health
    TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: 10
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

Here, we indicate that the TargetGroup is listening on port 80 and expects TCP traffic. TargetType: instance means NLB will send traffic directly to EC2 instances that are hosting our ECS tasks. We also link it to our VPC to ensure traffic stays within our network.

Even though the NLB uses TCP at the transport layer, it performs health checks at the application layer (HTTP). This ensures that the NLB can intelligently route traffic only to instances that are responding correctly to the application-level health check endpoint. Our choice of HTTP for the health check protocol instead of TCP is because the Orchard Core running on ECS is listening on port 80 and exposing an HTTP health check endpoint /health. By using HTTP for health checks, we can ensure that the NLB can detect not only if the server is up but also if the Orchard Core is functioning correctly.

We also setup Deregistration Delay to be 10 seconds. Thus, when an ECS task is stopped or removed, the NLB waits 10 seconds before fully removing it. This helps prevent dropped connections by allowing any in-progress requests to finish. We can keep 10 for now if the CMS does not have long requests. However, when we start to notice 502/503 errors when deploying updates, we should increase it to 30 or more.

In addition, normally, a Target Group checks if the app is healthy before sending traffic.

Since NLB only supports TCP health checks and our Orchard Core app does not expose a TCP check, we skip health checks for now.

Thirdly, we need to configure the Listener. This Listener is responsible for handling incoming traffic on our NLB. When a request comes in, the Listener forwards the traffic to the Target Group , which then routes it to our ECS instances running Orchard Core CMS.

internalNlbListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref internalNlb
    Port: 80
    Protocol: TCP
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref nlbTargetGroup

The Listener port is the entry point where the NLB receives traffic from. It is different from the TargetGroup port which is the port on the ECS instances where the Orchard Core app is actually running. The Listener forwards traffic from its port to the TargetGroup port. In most cases, they are the same for simplicity.

The DefaultActions section ensures that all incoming requests are automatically directed to the correct target without any additional processing. This setup allows our NLB to efficiently distribute traffic to the ECS tasks while keeping the configuration simple and scalable.

In the NLB setup above, have you noticed that we do not handle port 443 (HTTPS)? Right now, our setup only works with HTTP on port 80.

So, if users visit our Orchard Core with HTTPS, the request stays encrypted as it passes through the NLB. But here is the problem because that means our ECS task must be able to handle HTTPS itself. If our ECS tasks only listen on port 80, they will receive encrypted HTTPS traffic, which they cannot process.

So why not we configure Orchard Core to accept HTTPS directly by having it listen on port 443 in Program.cs? Sure! However, this would require our ECS tasks to handle SSL termination themselves. We thus need to manage SSL certificates ourselves, which adds complexity to our setup.

Hence, we need a way to properly handle HTTPS before it reaches ECS. Now, let’s see how we can solve this with API Gateway!

Unit 10: API Gateway

As we discussed earlier, not always, but it is best practice to offload SSL termination to API Gateway because NLB does not handle SSL decryption. The SSL termination happens automatically with API Gateway for HTTPS traffic. It is a built-in feature, so we do not have to worry about manually managing SSL certificates on our backend.

In addition, API Gateway brings extra benefits such as blocking unwanted traffic and ensures only the right users can access our services. It also caches frequent requests, reducing load on our backend. Finally, it is able to log all requests, making troubleshooting faster.

By using API Gateway, we keep our infrastructure secure, efficient, and easy to manage.

Let’s start with a basic setup of API Gateway with NLB by setting up the following required components:

AWS::ApiGateway::RestApi : The root API that ties everything together. It defines the API itself before adding resources and methods.
AWS::ApiGateway::VpcLink : Connects API Gateway to the NLB.
AWS::ApiGateway::Resource : Defines the API endpoint path.
AWS::ApiGateway::Method : Specifies how the API handles requests (e.g. GET, POST).
AWS::ApiGateway::Deployment : Deploys the API configuration.
AWS::ApiGateway::Stage : Assigns a stage (e.g. dev, prod) to the deployment.

Setup Rest API

API Gateway is like a front door to our backend services. Before we define any resources, methods, or integrations, we need to create this front door first, i.e. the AWS::ApiGateway::RestApi resource.

apiGatewayRestApi:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: !Sub "${ServiceName}-api-gateway"
    DisableExecuteApiEndpoint: True
    EndpointConfiguration:
      Types:
        - REGIONAL
    Policy: ''

Here we disable the execute-api endpoint because we want to stop AWS from exposing a default execute-api endpoint. We want to enforce access through our own custom domain which we will setup later.

REGIONAL ensures that the API is available only within our AWS region. Setting it to REGIONAL is generally the recommended option for most apps, especially for our Orchard Core CMS, because both the ECS instances and the API Gateway are in the same region. This setup allows requests to be handled locally, which minimises latency. In the future, if our CMS user base grows and is distributed globally, we may need to consider switching to EDGE to serve our CMS to a larger global audience with better performance and lower latency across regions.

Finally, since this API is mainly acting as a reverse proxy to our Orchard Core homepage on ECS, CORS is not needed. We also leave Policy: '' empty means anyone can access the public-facing Orchard Core. Instead, security should be handled by the Orchard Core authentication.

Now that we have our root API, the next step is to connect it to our VPC using VpcLink!

Setup VPC Link

The VPC Link allows API Gateway to access private resources in our VPC, such as our ECS services via the NLB. This connection ensures that requests from the API Gateway can securely reach the Orchard Core CMS hosted in ECS, even though those resources are not publicly exposed.

In simple terms, VPC Link acts as a bridge between the public-facing API Gateway and the internal resources within our VPC.

So in our template, we define the VPC Link and specify the NLB as the target, which means that all API requests coming into the Gateway will be forwarded to the NLB, which will then route them to our ECS tasks securely.

apiGatewayVpcLink:
  Type: AWS::ApiGateway::VpcLink
  Description: "VPC link for API Gateway of Orchard Core"
  Properties:
    Name: !Sub "${ServiceName}-vpc-link"
    TargetArns:
      - !Ref internalNlb

Now that we have set up the VpcLink, which connects our API Gateway to our ECS, the next step is to define how requests will actually reach our ECS. That is where the API Gateway Resource comes into play.

Setup API Gateway Resource

For the API Gateway to know what to do with the incoming requests once they cross that VPC Link bridge, we need to define specific resources, i.e. the URL paths our users will use to access the Orchard Core CMS.

In our case, we use a proxy resource to catch all requests and send them to the backend ECS service. This lets us handle dynamic requests with minimal configuration, as any path requested will be forwarded to ECS.

Using proxy resource is particularly useful for web apps like Orchard Core CMS, where the routes could be dynamic and vary widely, such as /home, /content-item/{id}, /admin/{section}. With the proxy resource, we do not need to define each individual route or API endpoint in the API Gateway. As the CMS grows and new routes are added, we also will not need to constantly update the API Gateway configuration.

apiGatewayRootProxyResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref apiGatewayRestApi
    ParentId: !GetAtt apiGatewayRestApi.RootResourceId
    PathPart: '{proxy+}'
  DependsOn:
    - apiGatewayRestApi

After setting up the resources and establishing the VPC link to connect API Gateway to our ECS instances, the next step is to define how we handle incoming requests to those resources. This is where the AWS::ApiGateway::Method comes in. It defines the specific HTTP methods that API Gateway should accept for a particular resource.

Setup Method

The Resource component above is used to define where the requests will go. However, just defining the path alone is not enough to handle incoming requests. We need to tell API Gateway how to handle requests that come to those paths. This is where the AWS::ApiGateway::Method component comes into play.

For a use case like hosting Orchard Core CMS, the following configuration can be a good starting point.

apiGatewayRootMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    HttpMethod: ANY
    AuthorizationType: NONE
    ApiKeyRequired: False
    RestApiId: !Ref apiGatewayRestApi
    ResourceId: !GetAtt apiGatewayRestApi.RootResourceId
    Integration:
      ConnectionId: !Ref apiGatewayVpcLink
      ConnectionType: VPC_LINK
      Type: HTTP_PROXY
      IntegrationHttpMethod: ANY
      Uri: !Sub "http://${internalNlb.DNSName}"
  DependsOn:
    - apiGatewayRootProxyResource

apiGatewayRootProxyMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    ApiKeyRequired: False
    RestApiId: !Ref apiGatewayRestApi
    ResourceId: !Ref apiGatewayRootProxyResource
    HttpMethod: ANY
    AuthorizationType: NONE
    RequestParameters:
      method.request.path.proxy: True
    Integration:
      ConnectionId: !Ref apiGatewayVpcLink
      ConnectionType: VPC_LINK
      Type: HTTP_PROXY
      RequestParameters:
        integration.request.path.proxy: method.request.path.proxy
      CacheKeyParameters:
        - method.request.path.proxy
      IntegrationHttpMethod: ANY
      IntegrationResponses:
        - StatusCode: 200
          SelectionPattern: 200
      Uri: !Sub "http://${internalNlb.DNSName}/{proxy}"
  DependsOn:
    - apiGatewayRootProxyResource
    - apiGatewayVpcLink

By setting up both the root method and the proxy method, the API Gateway can handle both general traffic via the root method and dynamic path-based traffic via the proxy method in a flexible way. This reduces the need for additional methods and resources to manage various paths.

Handling dynamic path-based traffic for Orchard Core via the proxy method.

Since Orchard Core is designed for browsing, updating, and deleting content, as a start, we may need support for multiple HTTP methods. By using ANY, we are ensuring that all these HTTP methods are supported without having to define separate methods for each one.

Setting AuthorizationType to NONE is a good starting point, especially in cases where we are not expecting to implement authentication directly at the API Gateway level. Instead, we are relying on Orchard Core built-in authentication module, which already provides user login, membership, and access control. Later, if needed, we can enhance security by adding authentication layers at the API Gateway level, such as AWS IAM, Cognito, or Lambda authorisers.

Similar to the authorisation, setting ApiKeyRequired to False is also a good choice for a starting point, especially since we are not yet exposing a public API. The setup above is primarily for routing requests to Orchard Core CMS. We could change if we need to secure our CMS API endpoints in the future when 3rd-party integrations or external apps need access to the CMS API.

Up to this point, API Gateway has a Resource and a Method, but it still does not know where to send the request. That is where Integration comes in. In our setup above, it tells API Gateway to use VPC Link to talk to the ECS. It also makes API Gateway act as a reverse proxy by setting Type to HTTP_PROXY. It will simply forward all types of HTTP requests to Orchard Core without modifying them.

Even though API Gateway enforces HTTPS for external traffic, it decrypts (aka terminates SSL), validates the request, and then forwards it over HTTP to NLB within the AWS private network. Since this internal communication happens securely inside AWS, the Uri is using HTTP.

After setting up the resources and methods in API Gateway, we are essentially defining the blueprint for our API. However, these configurations are only in a draft state so they are not yet live and accessible to our end-users. We need a step called Deployment to publish the configuration.

Setup Deployment

Without deploying, the changes we discussed above are just concepts and plans. We can test them within CloudFormation, but they will not be real in the API Gateway until they are deployed.

There is an important thing to take note is that API Gateway does not automatically detect changes in our CloudFormation template. If we do not create a new deployment, our changes will not take effect in the live environment. So, we must force a new deployment by changing something in AWS::ApiGateway::Deployment.

Another thing to take note is that a new AWS::ApiGateway::Deployment will not automatically be triggered when we update our API Gateway configurations unless the logical ID of the deployment resource itself changes. This means that every time we make changes to our API Gateway configurations, we need to manually change the logical ID of the AWS::ApiGateway::Deployment. The reason CloudFormation does not automatically redeploy is to avoid unnecessary changes or disruptions.

apiGatewayDeployment202501011048:
  Type: AWS::ApiGateway::Deployment
  Properties:
    RestApiId: !Ref apiGatewayRestApi
  DependsOn:
    - apiGatewayRootMethod

In the template above, we append a timestamp 202501011048 to the logical ID of the Deployment. This way, even if we make multiple deployments on the same day, each will have a unique logical ID due to the timestamp.

Deployment alone does not make our API available to the users. We still need to assign it to a specific Stage to ensure it has a versioned endpoint with all configurations applied.

Setup Stage

A Stage in API Gateway is a deployment environment that allows us to manage and control different versions of our API. It acts as a live endpoint for clients to interact with our API. Without a Stage, the API exists but is not publicly available. We can create stages like dev, test, and prod to separate development and production traffic.

apiGatewayStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    StageName: !Ref ApiGatewayStageName
    RestApiId: !Ref apiGatewayRestApi
    DeploymentId: !Ref apiGatewayDeployment202501011048
    MethodSettings:
      - ResourcePath: '/*'
        HttpMethod: '*'
        ThrottlingBurstLimit: 100
        ThrottlingRateLimit: 50
    Tags:
      - Key: Stack
        Value: !Ref AWS::StackName

For now, we will use production as the default stage name to keep things simple. This will help us get everything set up and running quickly. Once we are ready for more environments, we can easily update the ApiGatewayStageName in the Parameters based on our environment setup.

MethodSettings are configurations defining how requests are handled in terms of performance, logging, and throttling. Using /* and * is perfectly fine at the start as our goal is to apply global throttling and logging settings for all our Orchard Core routes in one go. However, in the future we might want to adjust the settings as follows:

Content Modification (POST, PUT, DELETE): Stricter throttling and more detailed logging.
Content Retrieval (GET): More relaxed throttling for GET requests since they are usually read-only and have lower impact.

Having a burst and rate limit is useful for protecting our Orchard Core backend from excessive traffic. Even if we have a CMS with predictable traffic patterns, having rate limiting helps to prevent abuse and ensure fair usage.

The production stage in our API Gateway.

Unit 11: Route53 for API Gateway

Now that we have successfully set up API Gateway, it is accessible through an AWS-generated URL, i.e. something like https://xxxxxx.execute-api.ap-southeast-5.amazonaws.com/production which is functional but not user-friendly. Hence, we need to setup a custom domain for it so that it easier to remember, more professional, and consistent with our branding.

AWS provides a straightforward way to implement this using two key configurations:

AWS::ApiGateway::DomainName – Links our custom domain to API Gateway.
AWS::ApiGateway::BasePathMapping – Organises API versions and routes under the same domain.

Setup Hosted Zone and DNS

Since I have my domain on GoDaddy, I will need to migrate DNS management to AWS Route 53 by creating a Hosted Zone.

My personal hosted zone: chunlinprojects.com.

After creating a Hosted Zone in AWS, we need to manually copy the NS records to GoDaddy. This step is manual anyway, so we will not be automating this part of setup in CloudFormation. In addition, hosted zones are sensitive resources and should be managed carefully. We do not want hosted zones to be removed when our CloudFormation stacks are deleted too.

Once the switch is done, we can go back to our CloudFormation template to setup the custom domain name for our API Gateway.

Setup Custom Domain Name for API Gateway

API Gateway requires an SSL/TLS certificate to use a custom domain.

apiGatewayCustomDomainCert:
  Type: AWS::CertificateManager::Certificate
  Properties:
    DomainName: !Ref HostedZoneName
    ValidationMethod: 'DNS'
    DomainValidationOptions:
      - DomainName: !Sub "${CmsHostname}.{HostedZoneName}"
        HostedZoneId: !Ref HostedZoneId

Take note that please update the DomainNames in the template above to use your domain name. Also, the HostedZoneId can be retrieved from the AWS Console under “Hosted zone details” in the screenshot above.

In the resource, DomainValidationOptions tells CloudFormation to use DNS validation. When we use the AWS::CertificateManager::Certificate resource in a CloudFormation stack, domain validation is handled automatically if all three of the following are true:

We are using DNS validation;
The certificate domain is hosted in Amazon Route 53;
The domain resides in our AWS account.

However, if the certificate uses email validation, or if the domain is not hosted in Route 53, then the stack will remain in the CREATE_IN_PROGRESS state. Here, we will show how we can log in to AWS Console to manually set up DNS validation.

Remember to log in to AWS Console to check for ACM Certificate Status.

After that, we need to choose the Create records in Route 53 button to create records. The Certificate status page should open with a status banner reporting Successfully created DNS records. According to the documentation, our new certificate might continue to display a status of Pending validation for up to 30 minutes.

Successfully created DNS records.

Now that the SSL certificate is ready and the DNS validation is done, we will need to link the SSL certificate to our API Gateway using a custom domain. We are using RegionalCertificateArn, which is intended for a regional API Gateway.

apiGatewayCustomDomainName:
  Type: AWS::ApiGateway::DomainName
  Properties:
    RegionalCertificateArn: !Ref apiGatewayCustomDomainCert
    DomainName: !Sub "${CmsHostname}.{HostedZoneName}"
    EndpointConfiguration:
      Types:
        - REGIONAL
    SecurityPolicy: TLS_1_2

This allows our API to be securely accessed using our custom domain. We also set up a SecurityPolicy to use the latest TLS version (TLS 1.2), ensuring that the connection is secure and follows modern standards.

Even though it is optional, it is a good practice to specify the TLS version for both security and consistency, especially for production environments. Enforcing a TLS version helps avoid any potential vulnerabilities from outdated protocols.

Setup Custom Domain Routing

Next, we need to create a base path mapping to map the custom domain to our specific API stage in API Gateway.

The BasePathMapping is the crucial bridge between our custom domain and our API Gateway because when users visit our custom domain, we need a way to tell AWS API Gateway which specific API and stage should handle the incoming requests for that domain.

apiGatewayCustomDomainBasePathMapping:
  Type: AWS::ApiGateway::BasePathMapping
  Properties:
    DomainName: !Ref apiGatewayCustomDomainName
    RestApiId: !Ref apiGatewayRestApi
    Stage: !Ref apiGatewayStage

While the BasePathMapping connects our custom domain to a specific stage inside our API Gateway, we need to setup DNS routing outside AWS which handles the DNS resolution.

The RecordSet creates a DNS record (typically an A or CNAME record) that points to the API Gateway endpoint. Without this record, DNS systems outside AWS will not know where to direct traffic for our custom domain.

apiGatewayCustomDomainARecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: !Sub "${HostedZoneName}."
    Name: !Sub "${CmsHostname}.{HostedZoneName}"
    Type: A
    AliasTarget:
      DNSName: !GetAtt apiGatewayCustomDomainName.RegionalDomainName
      HostedZoneId: !GetAtt apiGatewayCustomDomainName.RegionalHostedZoneId

There is one interesting stuff to take note here is that when we use an AWS::Route53::RecordSet that specifies HostedZoneName, we must include a trailing dot (for example, chunlinprojects.com.) as part of the HostedZoneName. Otherwise, we can also choose to specify HostedZoneId instead, but never specifying both.

For API Gateway with a custom domain, AWS recommends using an Alias Record (which is similar to an A record) instead of a CNAME because the endpoint for API Gateway changes based on region and the nature of the service.

Alias records are a special feature in AWS Route 53 designed for pointing domain names directly to AWS resources like API Gateway, ELB, and so on. While CNAME records are often used in DNS to point to another domain, Alias records are unique to AWS and allow us to avoid extra DNS lookup costs.

For the HostedZoneId of AliasTarget, it is the Route 53 Hosted Zone ID of the API Gateway, do not mess up with the ID of our own hosted zone in Route 53.

Finally, please take note that when we are creating an alias resource record set, we need to omit TTL.

Reference 01: ECS Cluster

As we move forward with hosting Orchard Core CMS, let’s go through a few hosting options available within AWS, as listed below.

EC2 (Elastic Compute Cloud): A traditional option for running virtual machines. We can fully control the environment but need to manage everything, from scaling to OS patching;
Elastic Beanstalk : PaaS optimised for traditional .NET apps on Windows/IIS, not really suitable for Orchard Core which runs best on Linux containers with Kestrel;
Lightsail : A traditional VPS (Virtual Private Server), where we manage the server and applications ourselves. It is a good fit for simple, low-traffic websites but not ideal for scalable workloads like Orchard Core CMS.
EKS (Elastic Kubernetes Service): A managed Kubernetes offering from AWS. It allows us to run Kubernetes clusters, which are great for large-scale apps with complex micro-services. However, managing Kubernetes adds complexity.
ECS (Elastic Container Service): A service designed for running containerised apps. We can run containers on serverless Fargate or EC2-backed clusters.

The reason why we choose ECS is because it offers a scalable, reliable, and cost-effective way to deploy Orchard Core in a containerised environment. ECS allows us to take advantage of containerisation benefits such as isolated, consistent deployments and easy portability across environments. With built-in support for auto-scaling and seamless integration with AWS services like RDS for databases, S3 for media storage, and CloudWatch for monitoring, ECS ensures high availability and performance.

In ECS, we can choose to use either Fargate or EC2-backed ECS for hosting Orchard Core, depends on our specific needs and use case. For highly customised, predictable, or resource-intensive workloads CMS, EC2-based ECS might be more appropriate due to the need for fine-grained control over resources and configurations.

Official documentation with CloudFormation template on how to setup an ECS cluster.

There is an official documentation on how to an setup ECS cluster. Hence, we will not discuss in depth about how to set it up. Instead, we will focus on some of the key points that we need to take note of.

Official ECS-optimised AMIs from AWS.

While we can technically use any Linux AMI for running ECS tasks, the Amazon ECS-Optimised AMI offers several key benefits and optimisations that make it a better choice, particularly for ECS workloads. The Amazon ECS-Optimised AMI is designed and optimised by AWS to run ECS tasks efficiently on EC2 instances. By using the ECS-Optimised AMI, we benefit from pre-installed ECS agent + Docker as well as optimised configuration for ECS. Those AMI look for agent configuration data in the /etc/ecs/ecs.config file when the container agent starts. That’s why can specify this configuration data at launch with Amazon EC2 user data, as shown below.

containerInstances:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: "asg-launch-template"
    LaunchTemplateData:
      ImageId: !Ref EcsAmi
      InstanceType: "t3.large"
      IamInstanceProfile:
        Name: !Ref ec2InstanceProfile
      SecurityGroupIds:
        - !Ref ecsContainerHostSecurityGroup
      # This injected configuration file is how the EC2 instance
      # knows which ECS cluster it should be joining
      UserData:
        Fn::Base64: !Sub |
         #!/bin/bash -xe
         echo "ECS_CLUSTER=core-cluster" >> /etc/ecs/ecs.config
      # Disable IMDSv1, and require IMDSv2
      MetadataOptions:
        HttpEndpoint: enabled
        HttpTokens: required

As shown in the above CloudFormation template, instead of hardcoding an AMI ID which will become outdated over time, we have a parameter to ensure that the cluster always provisions instances using the most recent Amazon Linux 2023 ECS-optimised AMI.

EcsAmi:
  Description: The Amazon Machine Image ID used for the cluster
  Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
  Default: /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id

Also, the EC2 instances need access to communicate with the ECS service endpoint. This can be through an interface VPC endpoint or through our EC2 instances having public IP addresses. In our case, we are placing our EC2 instances in private subnets, so we use the Network Address Translation (NAT) to provide this access.

ecsNatGateway:
  Type: AWS::EC2::NatGateway
  Properties:
    AllocationId: !GetAtt ecsEip.AllocationId
    SubnetId: !Ref publicSubnet1

Unit 12: ECS Task Definition and Service

This ECS cluster definition is just the starting point. Next, we will define how the containers run and interact through AWS::ECS::TaskDefinition.

ecsTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Ref ServiceName
    TaskRoleArn: !GetAtt iamRole.Arn
    ContainerDefinitions:
      - Name: !Ref ServiceName
        Image: !Ref OrchardCoreImage
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Sub "/ecs/${ServiceName}-log-group"
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: ecs
        PortMappings:
          - ContainerPort: 5000
            HostPort: 80
            Protocol: tcp
        Cpu: 256
        Memory: 1024
        MemoryReservation: 512
        Environment:
          - Name: DatabaseEndpoint
            Value: !GetAtt cmsDBInstance.Endpoint.Address
        Essential: true
        HealthCheck:
          Command:
            - CMD-SHELL
            - "wget -q --spider http://localhost:5000/health || exit 1"
          Interval: 30
          Timeout: 5
          Retries: 3
          StartPeriod: 30

In the setup above, we are sending logs to CloudWatch Logs so that we can centralise logs from all ECS tasks, making it easier to monitor and troubleshoot our containers.

By default, ECS is using bridge network mode. In bridge mode, containers do not get their own network interfaces. Instead, the container port (5000) must be mapped to a port on the host EC2 instance (80). Without this mapping, the Orchard Core on EC2 would not be reachable from outside. The reason we set the ContainerPort: 5000 in is to match the port our Orchard Core app is exposed on within the Docker container.

As CMS platforms like Orchard Core generally require more memory for smooth operations, especially in production environments with more traffic, it is better to start with a CPU allocation like 256 (0.25 vCPU) and 1024 MB for memory, depending on expected load.

For the MemoryReservation which is a guaranteed amount of memory for our container, we set it to be 512 MB of memory. By reserving memory, we are ensuring that your container has enough memory to run reliably. Orchard Core, being a modular CMS, can consume more memory depending on the number of features/modules you have enabled. Later if we realise Orchard Core does not need that much guaranteed memory, we can leave MemoryReservation lower. The key idea is to reserve enough memory to ensure stable operations without overcommitting.

Next, we have Essential where we set it to true. This property specifies whether the container is essential to the ECS task. We set it to true so that ECS will treat this Orchard Core container as vital for the task. If the container stops or fails, ECS will stop the entire task. Otherwise, ECS will not automatically stop the task if this Orchard Core container fails, which could lead to issues, especially in a production environment.

Finally, we must not forget about HealthCheck. In most web apps like Orchard Core, a simple HTTP endpoint /health is normally used as a health check. Here, we need to understand that many minimal container images like ECS-optimised AMIs do not include curl by default to keep them lightweight. However, wget is often available by default, making it a good alternative for checking if an HTTP endpoint is reachable. Hence, in the template above, ECS is using wget to check the /health endpoint on port 5000. If it receives an error, the container is considered unhealthy.

We can test locally to check if curl or wget is available in the image.

Once the TaskDefinition is set up, it defines the container specs. However, the ECS service is needed to manage how and where the task runs within the ECS cluster. We need the ECS service tells ECS how to run the task, manage it, and keep it running smoothly.

ecsService:
  Type: AWS::ECS::Service
  DependsOn:
    - iamRole
    - internalNlb
    - nlbTargetGroup
    - internalNlbListener
  Properties:
    Cluster: !Ref ecsCluster
    DesiredCount: 2
    DeploymentConfiguration:
      MaximumPercent: 200
      MinimumHealthyPercent: 50
    LoadBalancers:
      - ContainerName: !Ref ServiceName
        ContainerPort: 5000
        TargetGroupArn: !Ref nlbTargetGroup
    PlacementStrategies:
      - Type: spread
        Field: attribute:ecs.availability-zone
      - Type: spread
        Field: instanceId
    TaskDefinition: !Ref ecsTaskDefinition
    ServiceName: !Ref ServiceName
    Role: !Sub "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS"
    HealthCheckGracePeriodSeconds: 60

The DesiredCount is the number of tasks (or containers) we want ECS to run at all times for Orchard Core app. In this case, we set it to 2 which means that ECS will try to keep exactly 2 tasks running for our service. Setting it to 2 helps ensure that we have redundancy. If one task goes down, the other task can continue serving, ensuring that our CMS stays available and resilient.

Based on the number of DesiredCount, we indicate that during deployment, ECS can temporarily run up to 4 tasks (MaximumPercent: 200) and at least 1 task (MinimumHealthyPercent: 50) must be healthy during updates to ensure smooth deployment.

The LoadBalancers section in the ECS service definition is where we link our service to the NLB that we set up earlier, ensuring that the NLB will distribute the traffic to the correct tasks running within the ECS service. Also, since our container is configured to run on port 5000 as per our Dockerfile, this is the port we use.

Next, we have PlacementStrategies to help us control how our tasks are distributed across different instances and availability zones, making sure our CMS is resilient and well-distributed. Here, attribute:ecs.availability-zone ensures the tasks are spread evenly across different availability zones within the same region. At the same time, Field: instanceId ensures that our tasks are spread across different EC2 instances within the cluster.

Finally, it is a good practice to set a HealthCheckGracePeriodSeconds to give our containers some time to start and become healthy before ECS considers them unhealthy during scaling or deployments.

Unit 13: CloudWatch Alarm

To ensure we effectively monitor the performance of Orchard Core on our ECS service, we also need to set up CloudWatch alarms to track metrics like CPU utilisation, memory utilisation, health check, running task count, etc.

We set up the following CloudWatch alarm to monitor CPU utilisation for our ECS service. This alarm triggers if the CPU usage exceeds 75% for a specified period (5 minutes). By doing this, we can quickly identify when our service is under heavy load, which helps us take action to prevent performance issues.

highCpuUtilizationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub "${AWS::StackName}-high-cpu"
    AlarmDescription: !Sub "ECS service ${AWS::StackName}: Cpu utilization above 75%"
    Namespace: AWS/ECS
    MetricName: CPUUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Ref ecsCluster
      - Name: ServiceName
        Value: !Ref ServiceName
    Statistic: Average
    Period: 60
    EvaluationPeriods: 5
    Threshold: 75
    ComparisonOperator: GreaterThanOrEqualToThreshold
    TreatMissingData: notBreaching
    ActionsEnabled: true
    AlarmActions: []
    OKActions: []

Even if we leave AlarmActions and OKActions as empty arrays, the alarm state will still be visible in the AWS CloudWatch Console. We can monitor the alarm state directly on the CloudWatch dashboard.

Similar to the CPU utilisation alarm above, we have another alarm to trigger when the count of running tasks is less 0 for 5 consecutive periods, indicating that there have been no running tasks for a full 5 minutes.

noRunningTasksAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub "${AWS::StackName}-no-task"
    AlarmDescription: !Sub "ECS service ${AWS::StackName}: No running ECS tasks for more than 5 mins"
    Namespace: AWS/ECS
    MetricName: RunningTaskCount
    Dimensions:
      - Name: ClusterName
        Value: !Ref ecsCluster
      - Name: ServiceName
        Value: !Ref ServiceName
    Statistic: Average
    Period: 60
    EvaluationPeriods: 5
    Threshold: 1
    ComparisonOperator: LessThanThreshold
    TreatMissingData: notBreaching
    ActionsEnabled: true
    AlarmActions: []
    OKActions: []

The two alarms are available on CloudWatch dashboard.

By monitoring these key metrics, we can proactively address any performance or availability issues, ensuring our Orchard Core CMS runs smoothly and efficiently.

Wrap-Up

Setting up Orchard Core on ECS with CloudFormation does have its complexities, especially with the different moving parts like API Gateway, load balancers, and domain configurations. However, once we have the infrastructure defined in CloudFormation, it becomes much easier to deploy, update, and manage our AWS environment. This is one of the key benefits of using CloudFormation, as it gives us consistency, repeatability, and automation in our deployments.

Orchard Core website is up and accessible via our custom domain!

The heavy lifting is done up front, and after that, it is mostly about making updates to our CloudFormation stack and redeploying without having to worry about manually reconfiguring everything.

When Pinecone Wasn’t Enough: My Journey to pgvector

Goh Chun Lin — Mon, 20 Jan 2025 12:21:03 +0000

If you work with machine learning or natural language processing, you have probably dealt with storing and searching through vector embeddings.

When I created the Honkai: Star Rail (HSR) relic recommendation system using Gemini, I started with Pinecone. Pinecone is a managed vector database that made it easy to index relic descriptions and character data as embeddings. It helped me find the best recommendations based on how similar they were.

Pinecone worked well, but as the project grew, I wanted more control, something open-source, and a cheaper option. That is when I found pgvector, a tool that adds vector search to PostgreSQL and gives the flexibility of an open-source database.

About HSR and Relic Recommendation System

Honkai: Star Rail (HSR) is a popular RPG that has captured the attention of players worldwide. One of the key features of the game is its relic system, where players equip their characters with relics like hats, gloves, or boots to boost stats and unlock special abilities. Each relic has unique attributes, and selecting the right sets of relics for a character can make a huge difference in gameplay.

An HSR streamer, Unreal Dreamer, learning the new relic feature. (Image Source: Unreal Dreamer YouTube)

As a casual player, I often found myself overwhelmed by the number of options and the subtle synergies between different relic sets. Finding the good relic combination for each character was time-consuming.

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In November 2024, I started a project to develop a Gemini-powered HSR relic recommendation system which can analyse a player’s current characters to suggest the best options for them. In the project, I have been storing embeddings in Pinecone.

Embeddings and Vector Database

An embedding is a way to turn data, like text or images, into a list of numbers called a vector. These vectors make it easier for a computer to compare and understand the relationships between different pieces of data.

For example, in the HSR relic recommendation system, we use embeddings to represent descriptions of relic sets. The numbers in the vector capture the meaning behind the words, so similar relics and characters have embeddings that are closer together in a mathematical sense.

This is where vector databases like Pinecone or pgvector come in. Vector databases are designed for performing fast similarity searches on large collections of embeddings. This is essential for building systems that need to recommend, match, or classify data.

pgvector is an open-source extension for PostgreSQL that allows us to store and search for vectors directly in our database. It adds specialised functionality for handling vector data, like embeddings in our HSR project, making it easier to perform similarity searches without needing a separate system.

Unlike managed services like Pinecone, pgvector is open source. This meant we could use it freely and avoid vendor lock-in. This is a huge advantage for developers.

Finally, since pgvector runs on PostgreSQL, there is no need for additional managed service fees. This makes it a budget-friendly option, especially for projects that need to scale without breaking the bank.

Choosing the Right Model

While the choice of the vector database is important, it is not the key factor in achieving great results. The quality of our embeddings actually is determined by the model we choose.

For my HSR relic recommendation system, when our embeddings were stored in Pinecone, I started by using the multilingual-e5-large model from Microsoft Research offered in Pinecone.

When I migrated to pgvector, I had the freedom to explore other options. For this migration, I chose the all-MiniLM-L6-v2 model hosted on Hugging Face, which is a lightweight sentence-transformer designed for semantic similarity tasks. Switching to this model allowed me to quickly generate embeddings for relic sets and integrate them into pgvector, giving me a solid starting point while leaving room for future experimentation.

The all-MiniLM-L6-v2 model hosted on Hugging Face.

Using all-MiniLM-L6-v2 Model

Once we have decided to use the all-MiniLM-L6-v2 model, the next step is to generate vector embeddings for the relic descriptions. This model is from the sentence-transformers library, so we first need to install the library.

pip install sentence-transformers

The library offers SentenceTransformer class to load pre-trained models.

from sentence_transformers import SentenceTransformer

model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

At this point, the model is ready to encode text into embeddings.

The SentenceTransformer model takes care of tokenisation and other preprocessing steps internally, so we can directly pass text to it.

# Function to generate embedding for a single text
def generate_embedding(text):
    # No need to tokenise separately, it's done internally
    # No need to average the token embeddings
    embeddings = model.encode(text) 

    return embeddings

In this function, when we call model.encode(text), the model processes the text through its transformer layers, generating an embedding that captures its semantic meaning. The output is already optimised for tasks like similarity search.

Setting up the Database

After generating embeddings for each relic sets using the all-MiniLM-L6-v2 model, the next step is to store them in the PostgreSQL database with the pgvector extension.

For developers using AWS, there is a good news. In May 2023, AWS announced that Amazon Relational Database Service (RDS) for PostgreSQL would be supporting pgvector. In November 2024, Amazon RDS started to support pgvector 0.8.0.

pgvector is now supported on Amazon RDS for PostgreSQL.

To install the extension, we will run the following command in our database. This will introduce a new datatype called VECTOR.

CREATE EXTENSION vector;

After this, we can define our table as follows.

CREATE TABLE IF NOT EXISTS embeddings (
    id TEXT PRIMARY KEY,
    vector VECTOR(384),
    text TEXT
);

Besides the id column which is for the unique identifier, there are two other columns that are important.

The text column stores the original text for each relic (the two-piece and four-piece bonus descriptions).

The vector column stores the embeddings. The VECTOR(384) type is used to store embeddings, and 384 here refers to the number of dimensions in the vector. In our case, the embeddings generated by the all-MiniLM-L6-v2 model are 384-dimensional, meaning each embedding will have 384 numbers.

Here, a dimension refers to one of the “features” that helps describe something. When we talk about vectors and embeddings, each dimension is just one of the many characteristics used to represent a piece of text. These features could be things like the type of words used, their relationships, and even the overall meaning of the text.

Updating the Database

After the table is created, we can proceed to create INSERT INTO SQL statements to insert the embeddings and their associated text into the database.

In this step, I load the relic information from a JSON file and process it.

import json

# Load your relic set data from a JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']} # Combine descriptions
    for relic in relic_data
]

The relic_info_data will then be passed to the following function to generate the INSERT INTO statements.

# Function to generate INSERT INTO statements with vectors
def generate_insert_statements(data):
    # Initialise list to store SQL statements
    insert_statements = []

    for record in data:
        # Extracting text and id from the record
        id = record.get('id')
        text = record.get('text')

        # Generate the embedding for the text
        embedding = generate_embedding(text)

        # Convert the embedding to a list
        embedding_list = embedding.tolist()

        # Create the SQL INSERT INTO statement
        sql_statement = f"""
        INSERT INTO embeddings (id, vector, text)
        VALUES (
          '{id.replace("'", "''")}', 
          ARRAY{embedding_list}, 
          '{text.replace("'", "''")}')
        ON CONFLICT (id) DO UPDATE
        SET vector = EXCLUDED.vector, text = EXCLUDED.text;
        """

        # Append the statement to the list
        insert_statements.append(sql_statement)

    return insert_statements

The embeddings of the relic sets are successfully inserted to the database.

How It All Fits Together: Query the Database

Once we have stored the vector embeddings of all the relic sets in our PostgreSQL database, the next step is to find the relic sets that are most similar to a given character’s relic needs.

Just like what we have done for storing relic set embeddings, we need to generate an embedding for the query describing the character’s relic needs. This is done by passing the query through the model as demonstrated in the following code.

def query_similar_embeddings(query_text):
    query_embedding = generate_embedding(query_text)

    return query_embedding.tolist()

The generated embedding is an array of 384 numbers. We simply use this array in our SQL query below.

SELECT id, text, vector <=> '[<embedding here>]' AS distance
FROM embeddings
ORDER BY distance
LIMIT 3;

The key part of the query is the <=> operator. This operator calculates the “distance” between two vectors based on cosine similarity. In our case, it measures how similar the query embedding is to each stored embedding. The smaller the distance, the more similar the embeddings are.

We use LIMIT 3 to get the top 3 most similar relic sets.

Test Case: Finding Relic Sets for Gallagher

Gallagher is a Fire and Abundance character in HSR. He is a sustain unit that can heal allies by inflicting a debuff on the enemy.

According to the official announcement, Gallagher is a healer. (Image Source: Honkai: Star Rail YouTube)

The following screenshot shows the top 3 relic sets which are closely related to a HSR character called Gallagher using the query “Suggest the best relic sets for this character: Gallagher is a Fire and Abundance character in Honkai: Star Rail. He can heal allies.”

The returned top 3 relic sets are indeed recommended for Gallagher.

One of the returned relic sets is called the “Thief of Shooting Meteor”. It is the official recommended relic set in-game, as shown in the screenshot below.

Gallagher’s official recommended relic set.

Future Work

In our project, we will not be implementing indexing because currently in HSR, there are only a small number of relic sets. Without an index, PostgreSQL will still perform vector similarity searches efficiently because the dataset is small enough that searching through it directly will not take much time. For small-scale apps like ours, querying the vector data directly is both simple and fast.

However, when our dataset grows larger in the future, it is a good idea to explore indexing options, such as the ivfflat index, to speed up similarity searches.

References

Configure Portable Object: Localisation in .NET 8 Web API

Goh Chun Lin — Tue, 17 Dec 2024 10:11:36 +0000

Localisation is an important feature when building apps that cater to users from different countries, allowing them to interact with our app in their native language. In this article, we will walk you through how to set up and configure Portable Object (PO) Localisation in an ASP.NET Core Web API project.

Localisation is about adapting the app for a specific culture or language by translating text and customising resources. It involves translating user-facing text and content into the target language.

While .NET localisation normally uses resource files (.resx) to store localised texts for different cultures, Portable Object files (.po) are another popular choice, especially in apps that use open-source tools or frameworks.

About Portal Object (PO)

PO files are a standard format used for storing localised text. They are part of the gettext localisation framework, which is widely used across different programming ecosystems.

A PO file contains translations in the form of key-value pairs, where:

Key: The original text in the source language.
Value: The translated text in the target language.

Because PO files are simple, human-readable text files, they are easily accessible and editable by translators. This flexibility makes PO files a popular choice for many open-source projects and apps across various platforms.

You might wonder why should we use PO files instead of the traditional .resx files for localisation? Here are some advantages of using PO files instead of .resx files:

Unlike .resx files, PO files have built-in support for plural forms. This makes it much easier to handle situations where the translation changes based on the quantity, like “1 item” vs. “2 items.”
While .resx files require compilation, PO files are plain text files. Hence, we do not need any special tooling or complex build steps to use PO files.
PO files work great with collaborative translation tools. For those who are working with crowdsourcing translations, they will find that PO files are much easier to manage in these settings.

SHOW ME THE CODE!

The complete source code of this project can be found at https://github.com/goh-chunlin/Experiment.PO.

Project Setup

Let’s begin by creating a simple ASP.NET Web API project. We can start by generating a basic template with the following command.

dotnet new webapi

This will set up a minimal API with a weather forecast endpoint.

The default /weatherforecast endpoint generated by .NET Web API boilerplate.

The default endpoint in the boilerplate returns a JSON object that includes a summary field. This field describes the weather using terms like freezing, bracing, warm, or hot. Here’s the array of possible summary values:

var summaries = new[]
{
    "Freezing", "Bracing", "Chilly", "Cool", 
    "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
};

As you can see, currently, it only supports English. To extend support for multiple languages, we will introduce localisation.

Prepare PO Files

Let’s start by adding a translation for the weather summary in Chinese. Below is a sample PO file that contains the Chinese translation for the weather summaries.

#: Weather summary (Chinese)
msgid "weather_Freezing"
msgstr "寒冷"

msgid "weather_Bracing"
msgstr "冷冽"

msgid "weather_Chilly"
msgstr "凉爽"

msgid "weather_Cool"
msgstr "清爽"

msgid "weather_Mild"
msgstr "温和"

msgid "weather_Warm"
msgstr "暖和"

msgid "weather_Balmy"
msgstr "温暖"

msgid "weather_Hot"
msgstr "炎热"

msgid "weather_Sweltering"
msgstr "闷热"

msgid "weather_Scorching"
msgstr "灼热"

In most cases, PO file names are tied to locales, as they represent translations for specific languages and regions. The naming convention typically includes both the language and the region, so the system can easily identify and use the correct file. For example, the PO file above should be named zh-CN.po, which represents the Chinese translation for the China region.

In some cases, if our app supports a language without being region-specific, we could have a PO file named only with the language, such as ms.po for Malay. This serves as a fallback for all Malay speakers, regardless of their region.

We have prepared three Malay PO files: one for Malaysia (ms-MY.po), one for Singapore (ms-SG.po), and one fallback file (ms.po) for all Malay speakers, regardless of region.

After that, since our PO files are placed in the Localisation folder, please do not forget to include them in the .csproj file, as shown below.

<Project Sdk="Microsoft.NET.Sdk.Web">

  ...

  <ItemGroup>
    <Folder Include="Localisation\" />
    <Content Include="Localisation\**">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
  </ItemGroup>

</Project>

Adding this <ItemGroup> ensures that the localisation files from the Localisation folder are included in our app output. This helps the application find and use the proper localisation resources when running.

Configure Localisation Option in .NET

In an ASP .NET Web API project, we have to install a NuGet library from Orchard Core called OrchardCore.Localization.Core (Version 2.1.3).

Once the package is installed, we need to tell the application where to find the PO files. This is done by configuring the localisation options in the Program.cs file.

builder.Services.AddMemoryCache();
builder.Services.AddPortableObjectLocalization(options => 
    options.ResourcesPath = "Localisation");

The AddMemoryCache method is necessary here because LocalizationManager of Orchard Core uses the IMemoryCache service. This caching mechanism helps avoid repeatedly parsing and loading the PO files, improving performance by keeping the localised resources in memory.

Supported Cultures and Default Culture

Now, we need to configure how the application will select the appropriate culture for incoming requests.

In .NET, we need to specify which cultures our app supports. While .NET is capable of supporting multiple cultures out of the box, it still needs to know which specific cultures we are willing to support. By defining only the cultures we actually support, we can avoid unnecessary overhead and ensure that our app is optimised.

We have two separate things to manage when making an app available in different languages and regions in .NET:

SupportedCultures : This is about how the app displays numbers, dates, and currencies. For example, how a date is shown (like MM/dd/yyyy in the US);
SupportedUICultures : This is where we specify the languages our app supports for displaying text (the content inside the PO files).

To keep things consistent and handle both text translations and regional formatting properly, it is a good practice to configure both SupportedCultures and SupportedUICultures.

We also need to setup the DefaultRequestCulture. It is the fallback culture that our app uses when it does not have any explicit culture information from the request.

The following code shows how we configure all these. To make our demo simple, we assume the locale that user wants is passed via query string.

builder.Services.Configure<RequestLocalizationOptions>(options =>
{
    var supportedCultures = LocaleConstants.SupportedAppLocale
        .Select(cul => new CultureInfo(cul))
        .ToArray();

    options.DefaultRequestCulture = new RequestCulture(
        culture: "en", uiCulture: "en");
    options.SupportedCultures = supportedCultures;
    options.SupportedUICultures = supportedCultures;
    options.AddInitialRequestCultureProvider(
        new CustomRequestCultureProvider(async httpContext =>
        {
            var currentCulture = 
                CultureInfo.InvariantCulture.Name;
            var requestUrlPath = 
                httpContext.Request.Path.Value;

            if (httpContext.Request.Query.ContainsKey("locale"))
            {
                currentCulture =         
httpContext.Request.Query["locale"].ToString();
            }

            return await Task.FromResult(
                new ProviderCultureResult(currentCulture));
        })
    );
});

Next, we need to add the RequestLocalizationMiddleware in Program.cs to automatically set culture information for requests based on information provided by the client.

app.UseRequestLocalization();

After setting up the RequestLocalizationMiddleware, we can now move on to localising the API endpoint by using IStringLocalizer to retrieve translated text based on the culture information set for the current request.

About IStringLocalizer

IStringLocalizer is a service in ASP.NET Core used for retrieving localised resources, such as strings, based on the current culture of our app. In essence, IStringLocalizer acts as a bridge between our code and the language resources (like PO files) that contain translations. If the localised value of a key is not found, then the indexer key is returned.

We first need to inject IStringLocalizer into our API controllers or any services where we want to retrieve localised text.

app.MapGet("/weatherforecast", (IStringLocalizer<WeatherForecast> stringLocalizer) =>
{
    var forecast = Enumerable.Range(1, 5).Select(index =>
        new WeatherForecast
        (
            DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
            Random.Shared.Next(-20, 55),
            stringLocalizer["weather_" + summaries[Random.Shared.Next(summaries.Length)]]
        ))
        .ToArray();
    return forecast;
})
.WithName("GetWeatherForecast")
.WithOpenApi();

The reason we use IStringLocalizer<WeatherForecast> instead of just IStringLocalizer is because we are relying on Orchard Core package to handle the PO files. According to Sebastian Ros, the Orchard Core maintainer, we cannot resolve IStringLocalizer, we need IStringLocalizer. When we use IStringLocalizer<T> instead of just IStringLocalizer is also related to how localisation is typically scoped in .NET applications.

Running on Localhost

Now, if we run the project using dotnet run, the Web API should compile successfully. Once the API is running on localhost, visiting the endpoint with zh-CN as the locale should return the weather summary in Chinese, as shown in the screenshot below.

The summary is getting the translated text from zh-CN.po now.

Dockerisation

Since the Web API is tested to be working, we can proceed to dockerise it.

We will first create a Dockerfile as shown below to define the environment our Web API will run in. Then we will build the Docker image, using the Dockerfile. After building the image, we will run it in a container, making our Web API available for use.

## Build Container
FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine AS builder
WORKDIR /app

# Copy the project file and restore any dependencies (use .csproj for the project name)
COPY *.csproj ./
RUN dotnet restore

# Copy the rest of the application code
COPY . .

# Publish the application
RUN dotnet publish -c Release -o out

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

ENV ASPNETCORE_URLS=http://*:80

WORKDIR /app
COPY --from=builder /app/out ./

# Expose the port your application will run on
EXPOSE 80

ENTRYPOINT ["dotnet", "Experiment.PO.dll"]

As shown in the Dockerfile, we are using .NET Alpine images. Alpine is a lightweight Linux distribution often used in Docker images because it is much smaller than other base images. It is a best practice when we want a minimal image with fewer security vulnerabilities and faster performance.

Globalisation Invariant Mode in .NET

When we run our Web API as a Docker container on our local machine, we will soon realise that our container has stopped because our Web API inside it crashed. It turns out that there is an exception called System.Globalization.CultureNotFoundException.

Our Web API crashes due to System.Globalization.CultureNotFoundException, as shown in docker logs.

As pointed out in the error message, only the invariant culture is supported in globalization-invariant mode.

The globalization-invariant mode was introduced in .NET 2.0 in 2017. It allows our apps to run without using the full globalization data, which can significantly reduce the runtime size and improve the performance of our application, especially in environments like Docker or microservices.

In globalization-invariant mode, only the invariant culture is used. This culture is based on English (United States) but it is not specifically tied to en-US. It is just a neutral culture used to ensure consistent behaviour across environments.

Before .NET 6, globalization-invariant mode allowed us to create any custom culture, as long as its name conformed to the BCP-47 standard. BCP-47 stands for Best Current Practice 47, and it defines a way to represent language tags that include the language, region, and other relevant cultural data. A BCP-47 language tag typically follows this pattern: language-region, for example zh-CN and zh-Hans.

Thus, before .NET 6, if an app creates a culture that is not the invariant culture, the operation succeeds.

However, starting from .NET 6, an exception is thrown if we create any culture other than the invariant culture in globalization-invariant mode. This explains why our app throws System.Globalization.CultureNotFoundException.

We thus need to disable the globalization-invariant mode in the .csproj file, as shown below, so that we can use the full globalization data, which will allow .NET to properly handle localisation.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <InvariantGlobalization>false</InvariantGlobalization>
  </PropertyGroup>

  ...

</Project>

Missing of ICU in Alpine

Since Alpine is a very minimal Linux distribution, it does not include many libraries, tools, or system components that are present in more standard distributions like Ubuntu.

In terms of globalisation, Alpine does not come pre-installed with ICU (International Components for Unicode), which .NET uses for localisation in our case.

Hence, after we turned off the globalization-invariant mode, we will encounter another issue, which is our Web API not being able to locate a valid ICU package.

Our Web API crashes due to the missing of ICU package, as shown in docker logs.

As suggested in the error message, we need to install the ICU libraries (icu-libs).

In .NET, icu-libs provides the necessary ICU libraries that allow our Web API to handle globalisation. However, the ICU libraries rely on culture-specific data to function correctly. This culture-specific data is provided by icu-data-full, which includes the full set of localisation and globalisation data for different languages and regions. Therefore, we need to install both icu-libs and icu-data-full, as shown below.

...

## Runtime Container
FROM mcr.microsoft.com/dotnet/aspnet:8.0-alpine AS runtime

# Install cultures
RUN apk add --no-cache \
   icu-data-full \
   icu-libs

...

After installing the ICU libraries, our weather forecast Web API container should be running successfully now. Now, when we visit the endpoint, we will realise that it is able to retrieve the correct value from the PO files, as shown in the following screenshot.

Yay, we can get the translated texts now!

One last thing I would like to share is that, as shown in the screenshot above, since we do not have a PO file for ms-BN (Malay for Brunei), the fallback mechanism automatically uses the ms.po file instead.

Additional Configuration

If you still could not get the translation with PO files to work, perhaps you can try out some of the suggestions from my teammates below.

Firstly, you may need to setup the AppLocalIcu in .csproj file. This setting is used to specify whether the app should use a local copy of ICU or rely on the system-installed ICU libraries. This is particularly useful in containerised environments like Docker.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <AppLocalIcu>true</AppLocalIcu>
  </PropertyGroup>

</Project>

Secondly, even though we have installed icu-libs and icu-data-full in our Alpine container, some .NET apps rely on data beyond just having the libraries available. In such case, we need to turn on the IncludeNativeLibrariesForSelfExtract setting as well in .csproj.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    ...
    <IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
  </PropertyGroup>

</Project>

Thirdly, please check if you need to configure DOTNET_SYSTEM_GLOBALIZATION_PREDEFINED_CULTURES_ONLY as well. However, please take note that this setting only makes sense when when globalization-invariant mode is enabled.

Finally, you may also need to include the runtime ICU libraries with the Microsoft.ICU.ICU4C.Runtime NuGet package (Version 72.1.0.3), enabling your app to use culture-specific data for globalisation features.

References

From Zero to Gemini: Building an AI-Powered Game Helper

Goh Chun Lin — Sun, 08 Dec 2024 03:03:32 +0000

On a chilly November morning, I attended the Google DevFest 2024 in Singapore. Together with my friends, we attended a workshop titled “Gemini Masterclass: How to Unlock Its Power with Prompting, Functions, and Agents.” The session was led by two incredible speakers, Martin Andrews and Sam Witteveen.

Martin, who holds a PhD in Machine Learning and has been an Open Source advocate since 1999. Sam is a Google Developer Expert in Machine Learning. Both of them are also organisers of the Machine Learning Singapore Meetup group. Together, they delivered an engaging and hands-on workshop about Gemini, the advanced LLM from Google.

Thanks to their engaging Gemini Masterclass, I have taken my first steps into the world of LLMs. This blog post captures what I learned and my journey into the fascinating world of Gemini.

Martin Andrews presenting in Google DevFest 2024 in Singapore.

About LLM and Gemini

LLM stands for Large Language Model. To most people, an LLM is like a smart friend who can answer almost all our questions with responses that are often accurate and helpful.

As a LLM, Gemini is trained on large amount of text data and can perform a wide range of tasks: answering questions, writing stories, summarising long documents, or even helping to debug code. What makes them special is their ability to “understand” and generate language in a way that feels natural to us.

Many of my developer friends have started using Gemini as a coding assistant in their IDEs. While it is good at that, Gemini is much more than just a coding tool.

Gemini is designed to not only respond to prompts but also act as an assistant with an extra set of tools. To make the most of Gemini, it is important to understand how it works and what it can (and cannot) do. With the knowledge gained from the DevFest workshop, I decided to explore how Gemini could assist with optimising relic choices in a game called Honkai: Star Rail.

Honkai: Star Rail and Gemini for Its Relic Recommendations

An HSR streamer, MurderofBirds, browsing through thousands of relics. (Image Sourcce: MurderofBirds Twitch)

This is where LLMs like Gemini come into play. With the ability to process and analyse complex data, Gemini can help players make smarter decisions.

In this blog post, I will briefly show how this Gemini-powered relic recommendation system can analyse a player’s current characters to suggest the best options for them. Then it will also explain the logic behind its recommendations, helping us to understand why certain relics are ideal.

Setup the Project

To make my project code available to everyone, I used Google Colab, a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. You can access my code by clicking on the button below.

In my project, I used the google-generativeai Python library, which is pre-installed in Colab. This library serves as a user-friendly API for interacting with Google LLMs, including Gemini. It makes it easy for us to integrate Gemini capabilities directly into our code.

Next, we will need to import the necessary libraries.

Importing the libraries and setup Gemini client.

The first library to import is definitely the google.generativeai. Without it, we cannot interact with Gemini easily. Then we have google.colab.userdata which securely retrieves sensitive data, like our API key, directly from the Colab notebook environment.

We will also use IPython.display for displaying results in a readable format, such as Markdown.

In the Secret section, we will have two records, i.e.

HONKAI_STAR_RAIL_PLAYER_ID: Your HSR player UID. It is used later to personalise relic recommendations.
GOOGLE_API_KEY: The API key that we can get from Google AI Studio to authenticate with Gemini.

Creating and retrieving our API keys in Google AI Studio.

Once we have initialised the google.generativeai library with the GOOGLE_API_KEY, we can proceed to specify the Gemini model we will be using.

The choice of model is crucial in LLM projects. Google AI Studio offers several options, each representing a trade-off between accuracy and cost. For my project, I choose models/gemini-1.5-flash-8b-001, which provided a good balance for this experiment. Larger models might offer slightly better accuracy but at a significant cost increase.

Google AI Studio offers a range of models, from smaller, faster models suitable for quick tasks to larger, more powerful models capable of more complex processing.

Hallucination and Knowledge Limitation

We often think of LLMs like Gemini as our smart friends who can answer any question. But just like even our smartest friend can sometimes make mistakes, LLMs have their limits too.

Gemini knowledge is based on the data it was trained on, which means it doesn’t actually know everything. Sometimes, it might hallucinate, i.e. model invents information that sounds plausible but not actually true.

Kiana is not a character from Honkai: Star Rail but she is from another game called Honkai Impact 3rd.

While Gemini is trained on a massive dataset, its knowledge is not unlimited. As a responsible AI, it acknowledges its limitations. So, when it cannot find the answer, it will tell us that it lacks the necessary information rather than fabricating a response. This is how Google builds safer AI systems, as part of its Secure AI Framework (SAIF).

Knowledge cutoff in action.

To overcome these constraints, we need to employ strategies to augment the capabilities of LLMs. Techniques such as integrating Retrieval-Augmented Generation (RAG) and leveraging external APIs can help bridge the gap between what the model knows and what it needs to know to perform effectively.

System Instructions

Leveraging System Instructions is a way to improve the accuracy and reliability of Gemini responses.

System instructions are prompts given before the main query in order to guide Gemini. These instructions provide crucial context and constraints, significantly enhancing the accuracy and reliability of the generated output.

System Instruction with contextual information about HSR characters ensures Gemini has the necessary background knowledge.

The specific design and phrasing of the system instructions provided to the Gemini is crucial. Effective system instructions provide Gemini with the necessary context and constraints to generate accurate and relevant responses. Without carefully crafted system instructions, even the most well-designed prompt can yield poor results.

Context Framing

As we can see from the example above, writing clear and effective system instructions requires careful thought and a lot of testing.

This is just one part of a much bigger picture called Context Framing, which includes preparing data, creating embeddings, and deciding how the system retrieves and uses that data. Each of these steps needs expertise and planning to make sure the solution works well in real-world scenarios.

You might have heard the term “Prompt Engineering,” and it sounds kind of technical, but it is really about figuring out how to ask the LLM the right questions in the right way to get the best answers from an LLM.

While context framing and prompt engineering are closely related and often overlap, they emphasise different aspects of the interaction with the LLM.

Stochasticity

While experimenting with Gemini, I noticed that even if I use the exact same prompt, the output can vary slightly each time. This happens because LLMs like Gemini have a built-in element of randomness , known as Stochasticity.

Lingsha, an HSR character released in 2024. (Image Credit: Game8)

For example, when querying for DPS characters, Lingsha was inconsistently included in the results. While this might seem like a minor variation, it underscores the probabilistic nature of LLM outputs and suggests that running multiple queries might be needed to obtain a more reliable consensus.

Lingsha was inconsistently included in the response to the query about multi-target DPS characters.

According to the official announcement, even though Lingsha is a healer, she can cause significant damage to all enemies too. (Image Source: Honkai: Star Rail YouTube)

Hence, it is important to treat writing efficient system instruction and prompt as iterative processes. so that we can experiment with different phrasings to find what works best and yields the most consistent results.

Temperature Tuning

We can also reduce the stochasticity of Gemini response through adjusting parameters like temperature. Lower temperatures typically reduce randomness, leading to more consistent outputs, but also may reduce creativity and diversity.

Temperature is an important parameter for balancing predictability and diversity in the output. Temperature, a number in the range of 0.0 to 2.0 with default to be 1.0 in gemini-1.5-flash model, indicates the probability distribution over the vocabulary in the model when generating text. Hence, a lower temperature makes the model more likely to select words with higher probabilities, resulting in more predictable and focused text.

Having Temperature=0 means that the model will always select the most likely word at each step. The output will be highly deterministic and repetitive.

Function Calls

A major limitation of using system instructions alone is their static nature.

For example, my initial system instructions included a list of HSR characters, but this list is static. The list does not include newly released characters or characters specific to the player’s account. In order to dynamically access a player’s character database and provide personalised recommendations, I integrated Function Calls to retrieve real-time data.

For fetching the player’s HSR character data, I leveraged the open-source Python library mihomo. This library provides an interface for accessing game data, enabling dynamic retrieval of a player’s characters and their attributes. This dynamic data retrieval is crucial for generating truly personalised relic recommendations.

Using the mihomo library, I retrieve five of my Starfaring Companions.

Defining the functions in my Python code was only the first step. To use function calls, Gemini needed to know which functions were available. We can provide this information to Gemini as shown below.

model = genai.GenerativeModel('models/gemini-1.5-flash-8b-001', tools=[get_player_name, get_player_starfaring_companions])

After we pass a query to a Gemini, the model returns a structured object that includes the names of relevant functions and their arguments based on the prompt, as shown in the screenshot below.

The correct function call is picked up by Gemini based on the prompt.

Using descriptive function names is essential for successful function calling with LLMs because the accuracy of function calls depends heavily on well-designed function names in our Python code. Inaccurate naming can directly impact the reliability of the entire system.

If our Python function is named incorrectly, for example, calling a function get_age but it returns the name of the person, Gemini might select that function wrongly when the prompt is asking for age.

As shown in the screenshot above, the prompt requested information about all the characters of the player. Gemini simply determines which function to call and provides the necessary arguments. Gemini does not directly execute the functions. The actual execution of the function needs to be handled by us, as demonstrated in the screenshot below.

After Gemini telling us which function to call, our code needs to call the function to get the result.

Grounding with Google Search

Function calls are a powerful way to access external data, but they require pre-defined functions and APIs.

To go beyond these limits and gather information from many online sources, we can use Gemini grounding feature with Google Search. This feature allows Gemini to google and include what it finds in its answers. This makes it easier to get up-to-date information and handle questions that need real-time data.

If you are getting the HTTP 429 errors when using the Google Search feature, please make sure you have setup a billing account here with enough quota.

With this feature enabled, we thus can ask Gemini to get some real-time data from the Internet, as shown below.

The upcoming v2.7 patch of HSR is indeed scheduled to be released on 4th December.

Building a Semantic Knowledge Base with Pinecone

System instructions and Google search grounding provide valuable context, but a structured knowledge base is needed to handle the extensive data about HSR relics.

Having explored system instructions and Google search grounding, the next challenge is to manage the extensive data about HSR relics. We need a way to store and quickly retrieve this information, enabling the system to generate timely and accurate relic recommendations. Thus we will need to use a vector database ideally suited for managing the vast dataset of relic information.

Vector databases, unlike traditional databases that rely on keyword matching, store information as vectors enabling efficient similarity searches. This allows for retrieving relevant relic sets based on the semantic meaning of a query, rather than relying solely on keywords.

There are many options for vector database, but I choose Pinecone. Pinecone, a managed service, offered the scalability needed to handle the HSR relic dataset and the robust API essential for reliable data access. Its availability of a free tier is also a significant factor because it allows me to keep costs low during the development of my project.

API keys in Pinecone dashboard.

Pinecone’s well-documented API and straightforward SDK make integration surprisingly easy. To get started, simply follow the Pinecone documentation to install the SDK in our code and retrieve the API key.

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key=userdata.get('PINECONE_API_KEY'))

I prepare my Honkai: Star Rail relic data, which I have previously organised into a JSON structure. This data includes information on each relic set’s two-piece and four-piece effects. Here’s a snippet to illustrate the format:

[
  {
    "name": "Sacerdos' Relived Ordeal",
    "two_piece": "Increases SPD by 6%",
    "four_piece": "When using Skill or Ultimate on one ally target, increases the ability-using target's CRIT DMG by 18%, lasting for 2 turn(s). This effect can stack up to 2 time(s)."
  },
  {
    "name": "Scholar Lost in Erudition",
    "two_piece": "Increases CRIT Rate by 8%",
    "four_piece": "Increases DMG dealt by Ultimate and Skill by 20%. After using Ultimate, additionally increases the DMG dealt by the next Skill by 25%."
  },
  ...
]

With the relic data organised in Pinecone, the next challenge is to enable similarity searches with vector embedding. Vector embedding captures the semantic meaning of the text, allowing Pinecone to identify similar relic sets based on their inherent properties and characteristics.

Vector embedding representations (Image Credit: Pinecode)

Now, we can generate vector embeddings for the HSR relic data using Pinecone. The following code snippet illustrates this process which is to convert textual descriptions of relic sets into numerical vector embeddings. These embeddings capture the semantic meaning of the relic set descriptions, enabling efficient similarity searches later.

# Load relic set data from the JSON file
with open('/content/hsr-relics.json', 'r') as f:
    relic_data = json.load(f)

# Prepare data for Pinecone
relic_info_data = [
    {"id": relic['name'], "text": relic['two_piece'] + " " + relic['four_piece']} # Combine relic set descriptions
    for relic in relic_data
]

# Generate embeddings using Pinecone
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in relic_info_data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

As shown in the code above, we use the multilingual-e5-large model, a text embedding model from Microsoft research, to generate a vector embedding for each relic set. The multilingual-e5-large model works well on messy data and it is good for short queries.

Pinecone ability to perform fast similarity searches relies on its indexing mechanism. Without an index, searching for similar relic sets would require comparing each relic set’s embedding vector to every other one, which would be extremely slow, especially with a large dataset. I choose Pinecone serverless index hosted on AWS for its automatic scaling and reduced infrastructure management.

# Create a serverless index
index_name = "hsr-relics-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws', 
            region='us-east-1'
        ) 
    ) 

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

The dimension parameter specifies the dimensionality of the vector embeddings. Higher dimensionality generally allows for capturing more nuanced relationships between data points. For example, two relic sets might both increase ATK, but one might also increase SPD while the other increases Crit DMG. A higher-dimensional embedding allows the system to capture these subtle distinctions, leading to more relevant recommendations.

For the metric parameter which measures the similarity between two vectors (representing relic sets), we use the cosine metric which is suitable for measuring the similarity between vector embeddings generated from text. This is crucial for understanding how similar two relic descriptions are.

With the vector embeddings generated, the next step was to upload them into my Pinecone index. Pinecone uses the upsert function to add or update vectors in the index. The following code snippet shows how we can upsert the generated embeddings into the Pinecone index.

# Target the index where you'll store the vector embeddings
index = pc.Index("hsr-relics-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for r, e in zip(relic_info_data, embeddings):
    records.append({
        "id": r['id'],
        "values": e['values'],
        "metadata": {'text': r['text']}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="hsr-relics-namespace"
)

The code uses the zip function to iterate through both the list of prepared relic data and the list of generated embeddings simultaneously. For each pair, it creates a record for Pinecone with the following attributes.

id: Name of the relic set to ensure uniqueness;
values: The vector representing the semantic meaning of the relic set effects;
metadata: The original description of the relic effects, which will be used later for providing context to the user’s recommendations.

Implementing Similarity Search in Pinecone

With the relic data stored in Pinecone now, we can proceed to implement the similarity search functionality.

def query_pinecone(query: str) -> dict:

  # Convert the query into a numerical vector that Pinecone can search with
  query_embedding = pc.inference.embed(
      model="multilingual-e5-large",
      inputs=[query],
      parameters={
          "input_type": "query"
      }
  )

  # Search the index for the three most similar vectors
  results = index.query(
      namespace="hsr-relics-namespace",
      vector=query_embedding[0].values,
      top_k=3,
      include_values=False,
      include_metadata=True
  )

  return results

The function above takes a user’s query as input, converts it into a vector embedding using Pinecone’s inference endpoint, and then uses that embedding to search the index, returning the top three most similar relic sets along with their metadata.

Relic Recommendations with Pinecone and Gemini

With the integration with Pinecode, we design the initial prompt to pick relevant relic sets from Pinecone. After that, we take the results from Pinecone and combine them with the initial prompt to create a richer, more informative prompt for Gemini, as shown in the following code.

from google.generativeai.generative_models import GenerativeModel

async def format_pinecone_results_for_prompt(model: GenerativeModel, player_id: int) -> dict:
  character_relics_mapping = await get_player_character_relic_mapping(player_id)

  result = {}

  for character_name, (character_avatar_image_url, character_description) in character_relics_mapping.items():
    print(f"Processing Character: {character_name}")

    additional_character_data = character_profile.get(character_name, "")

    character_query = f"Suggest some good relic sets for this character: {character_description} {additional_character_data}"

    pinecone_response = query_pinecone(character_query)

    prompt = f"User Query: {character_query}\n\nRelevant Relic Sets:\n"
    for match in pinecone_response['matches']:
        prompt += f"* {match['id']}: {match['metadata']['text']}\n" # Extract relevant data
    prompt += "\nBased on the above information, recommend two best relic sets and explain your reasoning. Each character can only equip with either one 4-piece relic or one 2-piece relic with another 2-piece relic. You cannot recommend a combination of 4-piece and 2-piece together. Consider the user's query and the characteristics of each relic set."

    response = model.generate_content(prompt)

    result[character_avatar_image_url] = response.text

  return result

The code shows that we are doing both prompt engineering (designing the initial query to get relevant relics) and context framing (combining the initial query with the retrieved relic information to get a better overall recommendation from Gemini).

First the code retrieves data about the player’s characters, including their descriptions, images, and relics the characters currently are wearing. The code then gathers potentially relevant data about each character from a separate data source character_profile which has more information, such as gameplay mechanic about the characters that we got from the Game8 Character List. With the character data, the query will find similar relic sets in the Pinecone database.

After Pinecone returns matches, the code constructs a detailed prompt for the Gemini model. This prompt includes the character’s description, relevant relic sets found by Pinecone, and crucial instructions for the model. The instructions emphasise the constraints of choosing relic sets: either a 4-piece set, or two 2-piece sets, not a mix. Importantly, it also tells Gemini to consider the character’s existing profile and to prioritise fitting relic sets.

Finally, the code sends this detailed prompt to Gemini, receiving back the recommended relic sets.

Knight of Purity Palace, is indeed a great option for Gepard!

Enviosity, a popular YouTuber known for his in-depth Honkai: Star Rail strategy guides, introduced Knight of Purity Palace for Gepard too. (Source: YouTube)

Langtrace

Using LLMs like Gemini is sure exciting, but figuring out what is happening “under the hood” can be tricky.

If you are a web developer, you are probably familiar with Grafana dashboards. They show you how your web app is performing, highlighting areas that need improvement.

Langtrace is like Grafana, but specifically for LLMs. It gives you a similar visual overview, tracking our LLM calls, showing us where they are slow or failing, and helping us optimise the performance of our AI app.

Traces for the Gemini calls are displayed individually.

Langtrace is not only useful for tracing our LLM calls, it also offers metrics on token counts and costs, as shown in the following screenshot.

Beyond tracing calls, Langtrace collects metrics too.

Wrap-Up

Building this Honkai: Star Rail (HSR) relic recommendation system is a rewarding journey into the world of Gemini and LLMs.

I am incredibly grateful to Martin Andrews and Sam Witteveen for their inspiring Gemini Masterclass at Google DevFest in Singapore. Their guidance helped me navigate the complexities of LLM development, and I learned firsthand the importance of careful prompt engineering, the power of system instructions, and the need for dynamic data access through function calls. These lessons underscore the complexities of developing robust LLM apps and will undoubtedly inform my future AI projects.

Building this project is an enjoyable journey of learning and discovery. I encountered many challenges along the way, but overcoming them deepened my understanding of Gemini. If you’re interested in exploring the code and learning from my experiences, you can access my Colab notebook through the button below. I welcome any feedback you might have!

References

[KOSD] Change of FromQuery Model Binding from .NET 6 to .NET8

Goh Chun Lin — Thu, 31 Oct 2024 12:44:16 +0000

Recently, while migrating our project from .NET 6 to .NET 8, my teammate Jeremy Chan uncovered an undocumented change in model binding behaviour that seems to appear since .NET 7. This change is not clearly explained in the official .NET documentation, so it can be something developers easily overlook.

To illustrate the issue, let’s begin with a simple Web API project and explore a straightforward controller method that highlights the change.

[ApiController]
public class FooController
{
  [HttpGet()]
  public async void Get([FromQuery] string value = "Hello")
  {
    Console.WriteLine($"Value is {value}");

    return new JsonResult() { StatusCode = StatusCodes.Status200OK };
  }
}

Then we assume that we have nullable enabled in both .NET 6 and .NET 8 projects.

<Project Sdk="Microsoft.NET.Sdk.Web">

    <PropertyGroup>
        <Nullable>enable</Nullable>
        ...
    </PropertyGroup>

    ...

</Project>

Situation in .NET 6

In .NET 6, when we call the endpoint with /foo?value=, we shall receive the following error.

{
  "type": "https://tools.ietf.org/html/rfc7231#section-6.5.1",
  "title": "One or more validation errors occurred.",
  "status": 400,
  "traceId": "00-5bc66c755994b2bba7c9d2337c1e5bc4-e116fa61d942199b-00",
  "errors": {
    "value": [
      "The value field is required."
    ]
  }
}

However, if we change the method to be as follows, the error will not be there.

public async void Get([FromQuery] string? value)
{
    if (value is null)
        Console.WriteLine($"Value is null!!!");
    else
        Console.WriteLine($"Value is {value}");

    return new JsonResult() { StatusCode = StatusCodes.Status200OK };
}

The log when calling the endpoint with /foo?value= will then be “Value is null!!!”.

Hence, we can know that query string without value will be interpreted as being null. That is why there will be a validation error when value is not nullable.

Thus, we can say that, in order to make the endpoint work in .NET 6, we need to change it to be as follows to make the value optional. This will not mark value as a required field.

public async void Get([FromQuery] string? value = "Hello")

Now, if we call the endpoint with /foo?value=, we shall receive see the log “Value is Hello” printed.

Situation in .NET 8 (and .NET 7)

Then how about in .NET 8 with the same original setup, i.e. as shown below.

public async void Get([FromQuery] string value = "Hello")

In .NET 8, when we call the endpoint with /foo?value=, we shall see the log “Value is Hello” printed.

So, what is happening here?

In .NET 7, a new Interface IParsable was introduced. Thus, starting from the .NET 7, IParsable.TryParse API is used for binding controller action parameter values.

Initial research shows that, under the hood, .NET 7 onwards, the new model binding implementation is used and it causes this to happen.

References

KOSD, or Kopi-O Siew Dai, is a type of Singapore coffee that I enjoy. It is basically a cup of coffee with a little bit of sugar. This series is meant to blog about technical knowledge that I gained while having a small cup of Kopi-O Siew Dai.