<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brian Caffey</title>
    <description>The latest articles on DEV Community by Brian Caffey (@briancaffey).</description>
    <link>https://dev.to/briancaffey</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F108648%2F62236586-18b9-4888-ac9e-16d7db32a96e.png</url>
      <title>DEV Community: Brian Caffey</title>
      <link>https://dev.to/briancaffey</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/briancaffey"/>
    <language>en</language>
    <item>
      <title>RedLM: My submission for the NVIDIA and LlamaIndex Developer Contest</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Thu, 14 Nov 2024 09:59:15 +0000</pubDate>
      <link>https://dev.to/briancaffey/redlm-my-submission-for-the-nvidia-and-llamaindex-developer-contest-1c3k</link>
      <guid>https://dev.to/briancaffey/redlm-my-submission-for-the-nvidia-and-llamaindex-developer-contest-1c3k</guid>
      <description>&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;RedLM is a new way to study art and literature powered by artificial intelligence. It is an application that applies LLMs to the study of one of China’s most famous literary works: Dream of the Red Chamber. It uses leading language and vision models from Chinese AI groups including Alibaba’s Qwen, Baichuan Intelligence Technology and 01.AI. RedLM uses tools, techniques and services from NVIDIA and LlamaIndex including NVIDIA NIMs, Retrieval Augmented Generation and Multi-Modal RAG with vision language models. This project is my submission for the NVIDIA and LlamaIndex Developer Contest.&lt;/p&gt;

&lt;p&gt;This article will cover how I built the project, challenges I faced and some of the lessons I learned while working with NVIDIA and LlamaIndex technologies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://x.com/briancaffey/status/1855186768452321330" rel="noopener noreferrer"&gt;𝕏 Post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/briancaffey/RedLM" rel="noopener noreferrer"&gt;RedLM GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is RedLM?
&lt;/h2&gt;

&lt;p&gt;RedLM is a combination of the word “Red” and LM, an abbreviation for “language model”. Dream of the Red Chamber is such an important book in Chinese literature that it has its own field of study called 红学 (literally “the study of red”), or Redology. So, RedLM is an application that uses language models for the study of Redology.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje1sys0v8eidewvbfxr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje1sys0v8eidewvbfxr0.png" alt="RedLM" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this project I focused on three applications of language models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Summary and translation of the source text&lt;/li&gt;
&lt;li&gt;A Q&amp;amp;A bot that can answer questions about the book, providing references to the specific paragraphs used to give answers&lt;/li&gt;
&lt;li&gt;An image-based Q&amp;amp;A bot that can answer questions about sections of paintings that depict scenes from each of the book’s chapters&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  NotebookLM
&lt;/h2&gt;

&lt;p&gt;I used this article to create a "Deep Dive" podcast episode for RedLM using Google's NotebookLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja5vpp9ckm4hha6u7dbr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fja5vpp9ckm4hha6u7dbr.png" alt="NotebookLM" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can &lt;a href="https://x.com/briancaffey/status/1855186771409244491" rel="noopener noreferrer"&gt;listen to this podcast episode here on 𝕏&lt;/a&gt;.&lt;/p&gt;



&lt;h2&gt;
  
  
  How I built RedLM
&lt;/h2&gt;

&lt;p&gt;RedLM consists of two parts: a web UI built with Vue 3 using the Nuxt Framework and a backend API built with Python, FastAPI and LlamaIndex. There are lots of great tools for building full-stack AI applications such as Gradio and Streamlit, but I wanted to build with the web tools that I’m most familiar with and that provide the most flexibility. These frameworks (Nuxt and FastAPI) are simple and effective and they allowed me to develop quickly.&lt;/p&gt;

&lt;p&gt;Most of the code for this project was written by AI. I used OpenAI’s ChatGPT 4o, Anthropic’s Claude 3.5 Sonnet and 01.AI’s Yi-1.5-9B-Chat model. In my development process with AI, I prompted for one logical piece of the application at a time, such as one API route, one Vue component, one pinia store or one utility function, for example. In this article I'll share some of the prompts I used in my development workflow.&lt;/p&gt;

&lt;p&gt;This project embraces a hybrid AI inference model, meaning that the AI inference can be done either on local RTX PCs or using NVIDIA’s Cloud APIs from &lt;code&gt;build.nvidia.com&lt;/code&gt; depending on configuration via environment variables. I used PCs with NVIDIA GeForce RTX 4090 GPUs to do inference with language and vision models, and with a change of configuration, I was able to do similar inference using NVIDIA’s API endpoints. This allowed me to develop the project both on powerful RTX desktop workstations and Mac laptops.&lt;/p&gt;
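
&lt;p&gt;As a rough sketch of how such an environment-variable switch can work (the variable names and endpoints here are illustrative, not RedLM's actual configuration):&lt;/p&gt;

```python
import os

def get_llm_endpoint(env=None):
    """Pick an inference endpoint from environment variables.

    The variable names are illustrative assumptions; RedLM's actual
    configuration may differ.
    """
    env = os.environ if env is None else env
    if env.get("USE_NVIDIA_CLOUD", "false").lower() == "true":
        # NVIDIA Cloud API endpoints from build.nvidia.com
        return "https://integrate.api.nvidia.com/v1", env.get("NVIDIA_API_KEY", "")
    # OpenAI-compatible server running locally on an RTX PC
    return "http://localhost:8000/v1", "not-needed"
```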

&lt;h2&gt;
  
  
  Translating Dream of the Red Chamber with TensorRT-LLM
&lt;/h2&gt;

&lt;p&gt;Translation is often mentioned as one of the capabilities of bilingual LLMs from China. I wanted to try translating this book from Chinese to English, but I also wanted to better understand the meaning of the original text written in vernacular Chinese. Written vernacular Chinese is essentially a form of Chinese that closely resembles the way Chinese was spoken in imperial China by common people. The use of vernacular Chinese (Baihua) in literary works marked a significant cultural shift that started to make literature and education more accessible. Before the emergence of written vernacular Chinese, Chinese literature was dominated by Classical Chinese (Wenyanwen), which is a more concise, ambiguous and specialized form of language that assumes an understanding of ancient texts and Confucian classics. The difference between vernacular Chinese and modern Mandarin Chinese is somewhat analogous to the difference between Shakespearean English (Early Modern English) and Modern English.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0aibg6i9pgylunfwfhg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0aibg6i9pgylunfwfhg.png" alt="Baihua, Mandarin and English" width="800" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chinese large language models are well-versed in Classical Chinese, written vernacular Chinese and modern Mandarin Chinese. I decided to rewrite the original vernacular text in simple, modern Mandarin Chinese and then, using this new modern Mandarin version, to translate the story into English.&lt;/p&gt;
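
&lt;p&gt;The two-pass pipeline can be sketched as follows; here &lt;code&gt;llm_complete&lt;/code&gt; stands in for any text-in, text-out model call, and the prompt wording is illustrative:&lt;/p&gt;

```python
def two_step_translate(llm_complete, paragraph):
    """Rewrite vernacular Chinese as simple modern Mandarin, then translate
    the Mandarin into English. A sketch; prompt wording is illustrative."""
    mandarin = llm_complete(
        f"Rewrite this Chinese vernacular in simple modern Mandarin:\n\n{paragraph}"
    )
    english = llm_complete(
        f"Translate this modern Mandarin text into English:\n\n{mandarin}"
    )
    return mandarin, english
```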

&lt;p&gt;Dream of the Red Chamber is a large book. It contains over 800,000 Chinese characters drawn from 4,303 unique characters, and it has 120 chapters with a total of 3,996 paragraphs. Here is a histogram showing the number of characters per paragraph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr94k3j2wheprimp401in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr94k3j2wheprimp401in.png" alt="Paragraph lengths" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;
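
&lt;p&gt;The counts behind this histogram can be reproduced with a few lines of Python, assuming each chapter file has a &lt;code&gt;paragraphs&lt;/code&gt; list (the exact schema here is an assumption):&lt;/p&gt;

```python
import json
from pathlib import Path

def paragraph_lengths(chapters):
    """Character count per paragraph, given loaded chapter dicts with a
    "paragraphs" list of strings. The schema is assumed from context."""
    return [len(p) for chapter in chapters for p in chapter["paragraphs"]]

def load_chapters(book_dir="data/book", n_chapters=120):
    # Load data/book/1.json through data/book/120.json
    return [
        json.loads((Path(book_dir) / f"{i}.json").read_text(encoding="utf-8"))
        for i in range(1, n_chapters + 1)
    ]
```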

&lt;p&gt;I rented a large multi-GPU instance from AWS using some of the credits I get as a member of the AWS Community Builders program. The g5.12xlarge instance I selected has 4 A10G Tensor Core GPUs. The LLM API is a relatively new part of the TensorRT-LLM library that provides a simple, high-level interface for doing inference. Following the &lt;a href="https://nvidia.github.io/TensorRT-LLM/llm-api-examples/llm_inference_distributed.html" rel="noopener noreferrer"&gt;LLM Generate Distributed example&lt;/a&gt; from the TensorRT-LLM documentation, I was able to translate the entire book into simple Mandarin and then from Mandarin into English in about an hour and 15 minutes. The &lt;code&gt;tensor_parallel_size&lt;/code&gt; option in the LLM API allows for distributed inference, meaning that up to 4 paragraphs could be translated at the same time on different GPUs of the same EC2 instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Translating: data/book/22.json
Processed requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:15&amp;lt;00:00,  2.41it/s]
Processed requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:24&amp;lt;00:00,  1.54it/s]
Translated: data/book/22.json
Translating: data/book/114.json
Processed requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11&amp;lt;00:00,  1.81it/s]
Processed requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:12&amp;lt;00:00,  1.58it/s]
Translated: data/book/114.json
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] Refreshed the MPI local session

real    74m1.578s
user    0m45.936s
sys 0m36.283s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Getting good results required a bit of experimentation with parameters. The LLM API makes this very easy. The following code configures settings and builds the inference engine that can be used for doing completions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sampling_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SamplingParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;build_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BuildConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_seq_len&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;build_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;build_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tensor_parallel_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
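
&lt;p&gt;With the engine built, batched generation is a single call. A minimal sketch following the LLM API's &lt;code&gt;generate&lt;/code&gt; pattern, where each result exposes the generated text via &lt;code&gt;outputs[0].text&lt;/code&gt; (a sketch of the usage pattern, not RedLM's exact translation code):&lt;/p&gt;

```python
def translate_batch(llm, prompts, sampling_params):
    """Run one batched generate call and return the generated texts.

    Sketch of the TensorRT-LLM LLM API usage pattern; with
    tensor_parallel_size=4, requests are sharded across the four GPUs.
    """
    results = llm.generate(prompts, sampling_params)
    return [result.outputs[0].text for result in results]
```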



&lt;p&gt;I used the following prompts to rewrite each paragraph of the original text in simple, modern Mandarin Chinese:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;bai_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# Here are examples of how to rewrite Chinese vernacular into simple modern Mandarin.\n\nChinese vernacular:\n\n{p}\n\nSimple modern Mandarin
&lt;/span&gt;    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;以下是如何将中国白话改写为简单的现代普通话的示例。&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;中文白话：&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;简单的现代普通话：&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;flat_bai&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It was difficult to get good results consistently. Here are some observations I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some of the translated paragraphs were perfect&lt;/li&gt;
&lt;li&gt;Some translated paragraphs would randomly hallucinate the same phrase over and over again&lt;/li&gt;
&lt;li&gt;Some requests to translate text to English would reply in Mandarin Chinese rather than in English&lt;/li&gt;
&lt;li&gt;Sometimes I would even see computer code generated when asking for a translation&lt;/li&gt;
&lt;li&gt;The names of characters were sometimes translated inconsistently, sometimes literally and sometimes using differing versions of pinyin, the Romanization system for transcribing the sounds of Mandarin Chinese&lt;/li&gt;
&lt;/ul&gt;
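
&lt;p&gt;Failure modes like these can be caught with cheap heuristics before accepting a translation; the checks below are illustrative sketches, not part of RedLM:&lt;/p&gt;

```python
import re

def looks_degenerate(text, min_repeats=5):
    """Flag outputs where a short phrase repeats many times in a row,
    one of the hallucination modes described above. Heuristic only."""
    pattern = r"(.{2,20}?)\1{%d,}" % (min_repeats - 1)
    return bool(re.search(pattern, text))

def contains_cjk(text):
    """True if the text contains CJK characters; useful for catching
    'English' translations that actually came back in Chinese."""
    return any("\u9fff" >= ch >= "\u4e00" for ch in text)
```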

&lt;p&gt;I found that ChatGPT 4o could handle any Chinese translation task flawlessly, but the &lt;code&gt;Qwen2-7B&lt;/code&gt; model I used had mixed results! The change that seemed to have the biggest impact on translation quality was setting &lt;code&gt;max_tokens=256&lt;/code&gt; in &lt;code&gt;SamplingParams&lt;/code&gt;. I probably could have used a dynamic value for &lt;code&gt;max_tokens&lt;/code&gt; based on the size of the current paragraph being translated. I also would have liked to set up side-by-side comparisons of translations using different-sized models, but rather than spend time and AWS credits on optimizing translation with TensorRT-LLM, I wanted to focus on the main part of this project: retrieval augmented generation (RAG) with LlamaIndex.&lt;/p&gt;
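
&lt;p&gt;A dynamic &lt;code&gt;max_tokens&lt;/code&gt; could look something like this; the 1.5 tokens-per-character ratio is a rough guess for Chinese-to-English translation, not a measured value:&lt;/p&gt;

```python
def dynamic_max_tokens(paragraph, tokens_per_char=1.5, floor=64, ceiling=1024):
    """Scale the generation budget with the source paragraph length.

    The tokens_per_char ratio is a rough assumption; tune it against
    real model outputs.
    """
    estimate = int(len(paragraph) * tokens_per_char)
    return max(floor, min(estimate, ceiling))
```

&lt;p&gt;Each request could then build its own &lt;code&gt;SamplingParams&lt;/code&gt; with a per-paragraph budget instead of a fixed 256.&lt;/p&gt;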

&lt;h2&gt;
  
  
  Building Q&amp;amp;A bots with RAG using LlamaIndex
&lt;/h2&gt;

&lt;p&gt;My primary objective with this project was to implement a simple chat bot that responds to questions about the book with references to the specific paragraphs used in the response. The following images show the UI I built, with one of the examples included in the video I made for this project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gdaqrga1yw9bt4pu7kv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gdaqrga1yw9bt4pu7kv.png" alt="RAG Example" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hadn't read much of the book before working on this project, but I had read a lot &lt;em&gt;about&lt;/em&gt; this book's characters, major themes and plot. This Q&amp;amp;A bot was a very interesting entry point for exploring specific passages of the book, starting with questions drawn from my knowledge about the book. The question in the screenshots above is: “What does Jia Baoyu’s father think about him?” The response includes references to paragraphs where Jia Zheng (Baoyu’s father) is discussing his son. I was pretty amazed that the RAG query was able to pull out these two paragraphs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In Dream of the Red Chamber, the relationship between protagonist Jia Baoyu and his father, Jia Zheng, is complex and fraught with tension. Jia Zheng, a strict, traditional Confucian patriarch, embodies values of discipline, scholarly rigor, and duty. He expects his son to excel in his studies and uphold the family’s honor by pursuing an official career in government. Baoyu, however, is sensitive, imaginative, and inclined toward poetry and the company of women, especially his cousins Lin Daiyu and Xue Baochai. This preference clashes with Jia Zheng’s expectations, leading to frequent misunderstandings and disappointment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By default, LlamaIndex uses cosine similarity as the metric for finding the vectors representing the documents (paragraphs) that are “closest” to the vector representing the user query. This is the central mechanism by which RAG works. LlamaIndex provides an abstraction of this process, hiding the implementation details and allowing rapid development of retrieval systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45v8gj3uav8vx1ufw5ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45v8gj3uav8vx1ufw5ad.png" alt="Cosine Similarity" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://medium.com/@kbdhunga/a-beginners-guide-to-similarity-search-vector-indexing-part-one-9cf5e9171976" rel="noopener noreferrer"&gt;https://medium.com/@kbdhunga/a-beginners-guide-to-similarity-search-vector-indexing-part-one-9cf5e9171976&lt;/a&gt;&lt;/p&gt;
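
&lt;p&gt;For intuition, cosine similarity is just the normalized dot product between two embedding vectors:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction (semantically close), 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

&lt;p&gt;In practice the vectors have hundreds of dimensions (384 for the &lt;code&gt;bge-small&lt;/code&gt; embedding models used in this project) rather than two or three.&lt;/p&gt;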

&lt;p&gt;Here is some of the code I wrote for the text-based Q&amp;amp;A bot using LlamaIndex’s &lt;code&gt;CustomQueryEngine&lt;/code&gt; class to fetch the nodes from which I get the referenced paragraph text, chapter number and paragraph number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QAndAQueryEngine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomQueryEngine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;RAG Completion Query Engine optimized for Q&amp;amp;A&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseRetriever&lt;/span&gt;
    &lt;span class="n"&gt;response_synthesizer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseSynthesizer&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OpenAILike&lt;/span&gt;
    &lt;span class="n"&gt;qa_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;custom_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="c1"&gt;# Collect the metadata into a list of dicts so that it can be sent to UI for references
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;metadata_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
            &lt;span class="n"&gt;node_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;
            &lt;span class="n"&gt;metadata_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="n"&gt;metadata_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;metadata_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paragraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paragraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;context_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q_and_a_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;# the English and Chinese prompt templates are discussed below
&lt;/span&gt;                        &lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_str&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Indexing the book data
&lt;/h3&gt;

&lt;p&gt;In the indexing process, embedding models translate chunks of text (paragraphs) into high-dimensional vectors that capture the semantic content of each chunk. These are the vectors stored in the "Vector Database" used by LlamaIndex. The chapter number, paragraph number and version (original, Mandarin Chinese and English) of each paragraph are added to the database entry as metadata during the indexing step, which runs via a script before starting the FastAPI server. Here's how I indexed the original text and translations with LlamaIndex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.embeddings.huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbedding&lt;/span&gt;

&lt;span class="n"&gt;en_embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;zh_embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-zh-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;persist_index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chapter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/book/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;paragraphs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paragraphs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paragraphs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chinese&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paragraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="n"&gt;metadata_seperator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;::&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;metadata_template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{key}=&amp;gt;{value}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;text_template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Metadata: {metadata_str}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;-----&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Content: {content}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;en_embedding_model&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;zh_embedding_model&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;storage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;persist_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the embedding models, I used the small BAAI General Embedding (BGE) models for English and Chinese. BAAI is the Beijing Academy of Artificial Intelligence, and I learned about this organization through some of the examples on the LlamaIndex site that use BAAI embeddings. There are multilingual embedding models (e.g. &lt;code&gt;BAAI/bge-m3&lt;/code&gt;), but LlamaIndex also lets you set the embedding model on a per-document basis, which in some cases is preferable to using a single embedding model for all documents.&lt;/p&gt;
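&lt;p&gt;The per-language selection boils down to a small lookup. Here is a minimal sketch; the model names are my assumption of the small BGE variants on Hugging Face (in the indexing code they would be wrapped in LlamaIndex embedding classes), and "original" refers to the classical Chinese text of the novel:&lt;/p&gt;

```python
# Map each document language to an embedding model name.
# Model names are assumptions (the "small" BGE variants on Hugging Face).
EMBEDDING_MODELS = {
    "english": "BAAI/bge-small-en-v1.5",
    "chinese": "BAAI/bge-small-zh-v1.5",
    "original": "BAAI/bge-small-zh-v1.5",
}

def embedding_model_for(lang: str) -> str:
    # Default to the Chinese model, mirroring the if/else in the indexing code
    return EMBEDDING_MODELS.get(lang, "BAAI/bge-small-zh-v1.5")
```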

&lt;h3&gt;
  
  
  Milvus Vector Database
&lt;/h3&gt;

&lt;p&gt;I did most of the development for this project using the in-memory &lt;code&gt;VectorStoreIndex&lt;/code&gt; provided by LlamaIndex. This worked well, but any change to the FastAPI server required reloading the data into memory, which took several seconds each time. This can really hinder a good development flow, so I looked into using an external service for the vector database instead of running it in memory.&lt;/p&gt;
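&lt;p&gt;One way to soften the reload pain is to persist the index once and load it from disk afterwards. The sketch below shows the pattern with a plain JSON file standing in for the index; with LlamaIndex itself you would call &lt;code&gt;load_index_from_storage&lt;/code&gt; with a &lt;code&gt;StorageContext&lt;/code&gt; pointing at the persist directory:&lt;/p&gt;

```python
import json
import os

def load_or_build_index(persist_dir: str = "storage") -> dict:
    """Load the index from disk if it was already persisted, else build it.

    The JSON file is a stand-in for LlamaIndex's persisted stores; the point
    is that server reloads hit the fast branch instead of re-embedding
    every document.
    """
    marker = os.path.join(persist_dir, "index.json")
    if os.path.exists(marker):
        with open(marker) as f:
            return json.load(f)  # fast path: no re-embedding
    index = {"documents": []}    # stand-in for VectorStoreIndex.from_documents
    os.makedirs(persist_dir, exist_ok=True)
    with open(marker, "w") as f:
        json.dump(index, f)
    return index
```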

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts3rsudmb2kywgp2lwyq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts3rsudmb2kywgp2lwyq.png" alt="Vector Database Options" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a LOT of options to consider when picking a vector database for a RAG application. Milvus has a highly decoupled architecture and is fully open source, and I had seen it used in some examples in the &lt;a href="https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RAG/examples/advanced_rag/multimodal_rag" rel="noopener noreferrer"&gt;&lt;code&gt;NVIDIA/GenerativeAIExamples&lt;/code&gt;&lt;/a&gt; repo, so I decided to give it a try.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksnz2qewjqkxwp22hr1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksnz2qewjqkxwp22hr1f.png" alt="Milvus Vector Database Architecture" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the &lt;a href="https://milvus.io/docs/v2.0.x/install_standalone-docker.md" rel="noopener noreferrer"&gt;Milvus docker compose example&lt;/a&gt; I was able to set up an external vector database based on etcd and minio. Milvus also provides a Helm chart for running their vector database, this would be helpful if I was going to be running everything in Kubernetes (inference, vector database and application containers).&lt;/p&gt;

&lt;h3&gt;
  
  
  Other examples of RAG with English questions
&lt;/h3&gt;

&lt;p&gt;One interesting design question I faced was how to support answering questions in both English and Chinese. I initially built the Q&amp;amp;A bot with only Chinese language support. Later, I added a simple helper function to determine if the input text is Chinese:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_chinese_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    This is a simple helper function that is used to determine which prompt to use
    depending on the language of the original user query
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chinese_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\u4e00&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\u9fff&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;english_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;z&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chinese_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;english_count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This boolean value is then used in the &lt;code&gt;CustomQueryEngine&lt;/code&gt; to select either the Chinese or English &lt;code&gt;PromptTemplate&lt;/code&gt;. This allows the Q&amp;amp;A bot to answer questions in either Chinese or English without translating back and forth between the two languages. However, the English side relies on high-quality translations, so I don't expect English language questions to be answered as accurately as Chinese language questions. Here are the Chinese and English prompts that I used for the text-based Q&amp;amp;A bot, as well as some examples of the Q&amp;amp;A bot answering questions in English. The referenced materials include paragraphs from the English translation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Chinese prompt for text-based Q&amp;amp;A bot
&lt;/span&gt;&lt;span class="n"&gt;q_and_a_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;这是相关的参考资料：&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context_str}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# context_str contains Chinese paragraphs retrieved via RAG query
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;根据上述的参考资料，回答下面的问题&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;问题：{user_question}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# English prompt for text-based Q&amp;amp;A bot
&lt;/span&gt;&lt;span class="n"&gt;q_and_a_prompt_english&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is some related reference material:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context_str}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# context_str contains English paragraphs retrieved via RAG query
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on the above material, answer the following question:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: {user_question}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspwbe18sqzo5m8diyon2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspwbe18sqzo5m8diyon2.png" alt="Multi-modal Q&amp;amp;A example 1" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Asking random questions like this one is a fun way to explore the many scenes of Dream of the Red Chamber.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa1lj87ww6jzz765h83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfa1lj87ww6jzz765h83.png" alt="RAG Flower Petal Example" width="800" height="380"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5p6w467j6g1u66ksorr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5p6w467j6g1u66ksorr.png" alt="RAG Flower Petal Example with Reference" width="800" height="626"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  RedLM RAG Evaluation
&lt;/h2&gt;

&lt;p&gt;Examinations have long been a cornerstone of Chinese society, shaping individual aspirations, cultural values, and even government structures. This legacy began with the imperial civil service exams, kējǔ (科举), established during the Sui and Tang dynasties, and carries through to modern China with the gaokao (高考) college entrance examination; both have offered unprecedented meritocratic routes to power and prestige. Given how widely this novel is studied in China, I was not surprised to find a wealth of examination questions written for students studying Dream of the Red Chamber.&lt;/p&gt;

&lt;p&gt;I used &lt;a href="https://www.examcoo.com/editor/do/view/id/246401" rel="noopener noreferrer"&gt;a set of 1000 multiple choice questions about Dream of the Red Chamber on examcoo.com&lt;/a&gt; to evaluate the effectiveness of the RAG system I built with LlamaIndex. I used ChatGPT to write a script that parses the questions out of the website HTML (parsing HTML is one of my favorite use cases for LLMs!). I then filtered the list of 1000 questions down to 877 questions based on the following criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Four answer choices&lt;/strong&gt;: some of the questions had more than four answer choices. I filtered out questions with more than four choices to keep the evaluation simple, so that a random guess would have a 25% chance of being correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only one answer&lt;/strong&gt;: for some questions the correct answer required selecting multiple answer choices. Excluding these also kept the evaluation logic simple.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1t7jb8w4fqpqv7n9khe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1t7jb8w4fqpqv7n9khe.png" alt="Multiple Choice Questions from Dream of the Red Chamber Test" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multiple choice questions from a Dream of the Red Chamber test (examcoo.com)&lt;/p&gt;

&lt;p&gt;To run the evaluation I set up two scripts. The first script prompted the LLM to answer each question without any additional information from the RAG system. This served as a baseline for how well the LLM could answer multiple choice questions about the book. The script simply checks whether the LLM response contains the letter (A, B, C or D) of the correct answer and keeps track of the number of questions answered correctly.&lt;/p&gt;

&lt;p&gt;The second script took the test using LLMs with RAG. In this script, the prompt sent to the LLM included the paragraphs of the book most similar to the question, ranked by the cosine similarity metric mentioned earlier.&lt;/p&gt;
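&lt;p&gt;The grading logic for both scripts can be sketched in a few lines. This is my reconstruction of the "check if the response contains the correct letter" approach described above, not the exact code from the project:&lt;/p&gt;

```python
def is_correct(llm_response: str, answer: str) -> bool:
    # Check whether the short completion contains the correct
    # answer letter (A, B, C or D)
    return answer.upper() in llm_response.upper()

def score(responses: list[str], answer_key: list[str]) -> float:
    # Fraction of questions answered correctly
    correct = sum(is_correct(r, a) for r, a in zip(responses, answer_key))
    return correct / len(answer_key)
```

Note that this naive check depends on the completion being short; a verbose answer could contain a stray letter and be graded as correct.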

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikdrgon3xr1h1qe6i2i0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikdrgon3xr1h1qe6i2i0.png" alt="RAG evaluation" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are some results and other observations from this experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMs alone scored in the mid-30% range (36%)&lt;/li&gt;
&lt;li&gt;LLMs using retrieval augmented generation scored in the mid-40% range (44%) on the same set of questions&lt;/li&gt;
&lt;li&gt;I used the completion API rather than the chat API and set the &lt;code&gt;max_tokens&lt;/code&gt; to 16. This was done to ensure that the LLM only gave a short response with a valid answer choice rather than giving a long response with an explanation.&lt;/li&gt;
&lt;li&gt;The evaluation took longer for the LLM + RAG test because of the time required to make the RAG query and the longer prompt (which includes both the original multiple-choice question and the referenced paragraphs).&lt;/li&gt;
&lt;li&gt;I used the &lt;code&gt;01-ai/Yi-1.5-9B-Chat&lt;/code&gt; model for this test, but I probably should have used the base model rather than the chat model.&lt;/li&gt;
&lt;li&gt;Some questions cannot be answered via RAG at all. For example, some of the questions are about film adaptations of the novel. Most of the questions seemed relevant to the content of the book, so I didn’t bother to filter out the ones that were not.&lt;/li&gt;
&lt;/ul&gt;
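<p>The completion request body looked roughly like the sketch below; the <code>max_tokens</code> value and model name come from the notes above, while the other parameter values are my assumptions:</p>

```python
def build_completion_payload(question: str) -> dict:
    """Body for an OpenAI-compatible /v1/completions request.

    max_tokens=16 keeps the model from appending a long explanation
    after the answer letter; the temperature value is an assumption.
    """
    return {
        "model": "01-ai/Yi-1.5-9B-Chat",
        "prompt": question,
        "max_tokens": 16,
        "temperature": 0.0,
    }
```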

&lt;p&gt;Here is an example of a question that the LLM test script answered &lt;em&gt;incorrectly&lt;/em&gt; and the LLM + RAG test script answered &lt;strong&gt;correctly&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;秦钟的父亲是如何死的？&lt;/p&gt;

&lt;p&gt;A、外感风寒、风毒之症&lt;/p&gt;

&lt;p&gt;B、被智能儿气死的&lt;/p&gt;

&lt;p&gt;C、生气引发旧病加重&lt;/p&gt;

&lt;p&gt;D、生气而诱发中风而死&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How did Qin Zhong's father die?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A.&lt;/strong&gt; He caught a cold and developed wind-related illnesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B.&lt;/strong&gt; He was angered to death by Zhineng'er (a character).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;C.&lt;/strong&gt; His old illness worsened due to anger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;D.&lt;/strong&gt; He had a stroke induced by anger and died.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is the paragraph that the RAG query returned, along with its English translation:&lt;/p&gt;

&lt;p&gt;Original&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;荣两处上下内外人等莫不欢天喜地，独有宝玉置若罔闻。你道什么缘故？原来近日水月庵的智能私逃入城，来找秦钟，不意被秦邦业知觉，将智能逐出，将秦钟打了一顿，自己气的老病发了，三五日便呜呼哀哉了。秦钟本自怯弱，又带病未痊，受了笞杖，今见老父气死，悔痛无及，又添了许多病症。因此，宝玉心中怅怅不乐。虽有元春晋封之事，那解得他的愁闷？贾母等如何谢恩，如何回家，亲友如何来庆贺，宁荣两府近日如何热闹，众人如何得意，独他一个皆视有如无，毫不介意。因此，众人嘲他越发呆了。&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;English&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Everyone in the Rong and Ning households, both inside and outside, were extremely happy, except for Baoyu, who seemed indifferent. Do you want to know why? It turns out that recently, the nun Zhineng from Shuiyue Temple secretly ran into the city to find Qin Zhong. Unexpectedly, she was discovered by Qin Zhong's father, Qin Banger. Qin Banger not only drove Zhineng away but also gave Qin Zhong a beating. This made Qin Banger so angry that his old illness relapsed, and within three to five days, he passed away. Qin Zhong had always been weak and hadn't fully recovered from a previous illness. After being beaten and seeing his father die in anger, he was overwhelmed with regret and sorrow, which worsened his condition. As a result, Baoyu felt very melancholic. Although the promotion of Yuan Chun to imperial concubine was a joyful event, it couldn't alleviate the gloom in his heart. While Grandmother Jia and others were busy expressing their gratitude and returning home, and relatives and friends came to celebrate, and the Rong and Ning households were bustling with excitement, Baoyu alone remained completely indifferent to it all. Consequently, everyone started to mock him for becoming more and more absent-minded.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The correct answer for this question is C.&lt;/p&gt;
&lt;h2&gt;
  
  
  Multi-modal RAG for visual reasoning
&lt;/h2&gt;

&lt;p&gt;Qwen2-VL is a new AI model that was released in late August 2024. Qwen is the name of Alibaba’s AI lab, and an abbreviation of the Chinese characters 千问 ("qiān wèn", meaning "a thousand questions"). VL stands for vision-language, meaning that the model can understand both text and images. I had tested the previous version of Qwen’s vision-language model and was very impressed by how accurately it could describe the contents of images and answer general questions about them.&lt;/p&gt;

&lt;p&gt;Sun Wen was a Qing-era painter who spent 36 years of his life creating a series of 230 paintings capturing scenes from Dream of the Red Chamber. The paintings are incredibly detailed and often contain repeated figures in a temporal sequence. If you asked a Qwen-VL model to describe one of the images, it might return a lengthy description that still doesn’t capture the full detail of the scene. It can also be difficult for the model to "focus" on a portion of the whole image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqgi9od47n74p9izz0zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqgi9od47n74p9izz0zy.png" alt="Dream of the Red Chamber Painting 131" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sparked the idea to create a feature where users can click and drag over an image to select part of a painting, then ask questions specifically about the selected portion. I knew that while this could be achieved with tools like HTML canvas, writing it on my own would be quite time-consuming. It took me just a few minutes to write out the prompt, and Claude 3.5 Sonnet generated a perfect prototype of this feature in under a minute. Here’s the prompt I used:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm going to describe a Vue component and I want you to write it using Vue 3 to the best of your ability.&lt;/p&gt;

&lt;p&gt;write a simple single-file vue component using Vue 3 that does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;displays an image&lt;/li&gt;
&lt;li&gt;allows the users to click and drag to select a subsection of the image&lt;/li&gt;
&lt;li&gt;the subsection of the image is saved as a base64-encoded data url to a variable that is displayed below the image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution should make use of HTML canvas. When you click down on the image you begin selecting the subsection. You then move the mouse to make your subsection on the image, and when you mouse-up the subsection is selected and the data url is updated. Then the subsection is displayed at the very bottom of the page as a "preview" image using the base 64 image string as the image source.&lt;/p&gt;

&lt;p&gt;The selection box should be a dashed red line&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp1hd0b5etbt52nh1h5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp1hd0b5etbt52nh1h5h.png" alt="RedLM Image Q&amp;amp;A" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shows the final result of the UI I built for the image Q&amp;amp;A feature in RedLM. It uses a chat layout similar to the text-based Q&amp;amp;A feature, with the addition of an image preview in the chat log. The user query in this example just says “Please describe the contents of the image”. This was the first image I tested when building the image Q&amp;amp;A feature, to see if the correct passage could be referenced based on the description of an image. It pulled up the exact passage, and the answer provides details about what happened (a fire broke out), where it happened (at the Gourd Temple) and why it happened (a monk accidentally set an oil pot on fire).&lt;/p&gt;

&lt;p&gt;Here is a diagram showing the overall flow of data in the image Q&amp;amp;A feature:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmdbertcctqcs0ird1ih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmdbertcctqcs0ird1ih.png" alt="Diagram of RedLM Image Q&amp;amp;A with RAG and Vision Language Models" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This flow chart shows how the image Q&amp;amp;A feature works.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user selects part of an image and writes a question. This data is then sent to the RedLM API as a post request to the &lt;code&gt;/mm-q-and-a&lt;/code&gt; endpoint (multi-modal Q&amp;amp;A).&lt;/li&gt;
&lt;li&gt;Vision language models are used to get a description of the image. Depending on the application configuration, this query can use models such as &lt;code&gt;Qwen/Qwen2-VL-2B-Instruct&lt;/code&gt; running on RTX PCs, or larger models such as &lt;code&gt;meta/llama-3.2-90b-vision-instruct&lt;/code&gt; via the NVIDIA API Catalog. Not all vision language models have the same interface, so I added some logic to handle different model formats.&lt;/li&gt;
&lt;li&gt;The image description is used to fetch relevant documents from the vector database.&lt;/li&gt;
&lt;li&gt;The full prompt with the image description and relevant documents is sent to the LLM. Again, inference for this step is done either with RTX PCs or using models from the &lt;code&gt;build.nvidia.com&lt;/code&gt; API catalog.&lt;/li&gt;
&lt;li&gt;The response from the LLM is sent back to the browser and is displayed to the user as a chat message.&lt;/li&gt;
&lt;/ol&gt;
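&lt;p&gt;The five steps above can be wired together in a handful of lines. In this sketch, &lt;code&gt;describe_image&lt;/code&gt;, &lt;code&gt;retrieve_paragraphs&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt; are hypothetical stand-ins for the vision-language model call, the vector database query and the final LLM call:&lt;/p&gt;

```python
def describe_image(image_b64: str, question: str) -> str:
    # Stand-in for the vision-language model call (step 2)
    return "A fire breaks out at a temple compound."

def retrieve_paragraphs(description: str) -> list[str]:
    # Stand-in for the vector database similarity query (step 3)
    return ["...the most relevant paragraphs from the novel..."]

def complete(prompt: str) -> str:
    # Stand-in for the final LLM completion (steps 4-5)
    return "answer"

def mm_q_and_a(image_b64: str, question: str) -> str:
    """Core logic behind the /mm-q-and-a endpoint: VLM, then RAG, then LLM."""
    description = describe_image(image_b64, question)
    context = retrieve_paragraphs(description)
    prompt = (
        "Here is relevant content from the book:\n"
        + "\n".join(context)
        + "\n---------------------\n"
        + "Below is the description of a scene:\n"
        + description
    )
    return complete(prompt)
```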

&lt;p&gt;Here is the prompt I used for the image Q&amp;amp;A feature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Chinese prompt for image-based Q&amp;amp;A bot
&lt;/span&gt;&lt;span class="n"&gt;mm_q_and_a_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;这是书中相关的内容：&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context_str}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;下面是场景的描述：&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{image_description}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;根据上述的信息，尽量解释上说的场景和书的关系。&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# English prompt for image-based Q&amp;amp;A bot
&lt;/span&gt;&lt;span class="n"&gt;mm_q_and_a_prompt_english&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Here is relevant content from the book:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context_str}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Below is the description of a scene:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{image_description}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------------&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on the information provided above, try to explain the relationship between the described scene and the book content.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt engineering for this feature was tricky. Sometimes I got great results with detailed and accurate responses; other times the LLM seemed confused by my query and told me there was no relationship between the scene description and the book content. Occasionally it would give an accurate description of the scene, but then proceed to claim that the book content was not related to the scene at all.&lt;/p&gt;

&lt;p&gt;There is another important concept from LlamaIndex that I used to build the image Q&amp;amp;A feature: metadata filtering. Metadata filtering is important in RAG systems because it lets you focus a query on relevant documents in a precise way. A simple example would be a RAG system that indexes news articles and stores each article’s date as metadata: you could let a user set a date range for their query and only include articles that fall within that range.&lt;/p&gt;
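&lt;p&gt;That date-range idea can be sketched in a few lines of plain Python (toy data; the article titles and dates here are made up, and a real RAG system would apply this filter through its vector store's metadata API):&lt;/p&gt;

```python
from datetime import date

# toy "index": each document stores its publication date as metadata
docs = [
    {"text": "Fed raises rates", "date": date(2024, 6, 1)},
    {"text": "Election results", "date": date(2024, 11, 6)},
    {"text": "Olympics recap", "date": date(2024, 8, 12)},
]

def filter_by_date_range(docs, start, end):
    """Keep only documents whose metadata date falls within [start, end]."""
    return [d for d in docs if start <= d["date"] <= end]

# restrict a query to articles from the summer of 2024
candidates = filter_by_date_range(docs, date(2024, 6, 1), date(2024, 8, 31))
print([d["text"] for d in candidates])  # → ['Fed raises rates', 'Olympics recap']
```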

&lt;p&gt;For my image Q&amp;amp;A system, I have a mapping between the paintings and their associated chapters. When I ask a question about a painting, I want to use the description of the image to find similar paragraphs, but only the paragraphs that occur in the painting’s associated chapter. What I ended up doing was filtering the entire index before making the query. The alternative would be filtering the returned nodes after making the query, but this would have the possibility of not returning any nodes.&lt;/p&gt;

&lt;p&gt;Here’s what some of the metadata filtering code looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py
# filter by chapters associated with the queried image
&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MetadataFilters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ExactMatchFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_query_engine_for_multi_modal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# rag.py
# utility function that returns the query engine used for image Q&amp;amp;A
# the index is filtered to only include nodes associated with the image being queried
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_query_engine_for_multi_modal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;synthesizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_response_synthesizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QAndAQueryEngine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_synthesizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;synthesizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;qa_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mm_q_and_a_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seemed to work well for my use case, but it might not be a best practice, and it might not be efficient at a bigger scale.&lt;/p&gt;
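&lt;p&gt;The difference between the two approaches can be sketched in plain Python (the similarity scores and chapter tags below are made up): pre-filtering the index guarantees that the top-k nodes all come from the target chapter, while post-filtering the top-k can leave you with no nodes at all.&lt;/p&gt;

```python
# toy retrieval results as (similarity_score, chapter) pairs
nodes = [(0.91, "2"), (0.88, "2"), (0.75, "1"), (0.60, "1")]

def top_k(nodes, k):
    """Return the k highest-scoring nodes."""
    return sorted(nodes, key=lambda n: n[0], reverse=True)[:k]

def pre_filter(nodes, chapter, k):
    """Filter the whole index by chapter metadata first, then take top-k."""
    return top_k([n for n in nodes if n[1] == chapter], k)

def post_filter(nodes, chapter, k):
    """Take top-k first, then drop nodes from other chapters."""
    return [n for n in top_k(nodes, k) if n[1] == chapter]

# querying chapter "1": pre-filtering still returns two nodes...
print(pre_filter(nodes, "1", 2))   # → [(0.75, '1'), (0.6, '1')]
# ...but post-filtering returns nothing, since chapter "2" dominates the top-2
print(post_filter(nodes, "1", 2))  # → []
```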

&lt;h3&gt;
  
  
  Multi-modal Q&amp;amp;A examples
&lt;/h3&gt;

&lt;p&gt;Here are some more examples of results from different types of questions from the multi-modal Q&amp;amp;A bot.&lt;/p&gt;

&lt;p&gt;The response to the following query did a good job of combining information gathered from the image description with information from related passages in the book.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq64lb5l9kqkezxunbec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq64lb5l9kqkezxunbec.png" alt="Multi-modal Q&amp;amp;A example 2" width="800" height="462"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5fc3amzhh8lan3aglhi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5fc3amzhh8lan3aglhi.png" alt="Multi-modal Q&amp;amp;A example 3" width="800" height="868"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg16gqwj2mc8jkd24u32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg16gqwj2mc8jkd24u32.png" alt="Q&amp;amp;A Example with Carriage" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc76gt72zx2a1qtiyij0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc76gt72zx2a1qtiyij0.png" alt="Ou Guan Example" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is one of my favorite examples of the RedLM image Q&amp;amp;A bot in action. The query here, in Chinese, asks: "What are these two people doing?" The answer incorporates a description of what is happening in the story (Jia Baoyu comes across Ou Guan while visiting a temple) and also describes the significance of what is happening (&lt;a href="https://en.wikipedia.org/wiki/Joss_paper" rel="noopener noreferrer"&gt;burning paper money as a form of Chinese ancestral worship&lt;/a&gt;). But this answer is not perfect, and it demonstrates some of the difficulties I had with prompt engineering for this project. The first part makes a lot of sense:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;这段参考资料来自中国古典小说《红楼梦》，讲述了贾宝玉在探访寺庙时发现藕官在河边烧纸钱的情景。藕官是贾府中的一个戏子，她的行为可能代表了对去世亲人的哀悼或祭奥。贾宝玉询问她烧纸的原因，藕官透露了自己有私下的情感问题，并且提到了芳官和蕊官，暗示了她们之间可能存在某种秘密或情感联系。&lt;/p&gt;

&lt;p&gt;This reference is from the classical Chinese novel Dream of the Red Chamber, describing a scene where Jia Baoyu discovers Ou Guan burning paper money by the river during his visit to a temple. Ou Guan is a performer in the Jia household, and her actions might represent mourning or honoring deceased loved ones. Jia Baoyu asks her why she is burning the paper, and Ou Guan reveals that she has private emotional issues, mentioning Fang Guan and Rui Guan, hinting that there might be some kind of secret or emotional connection among them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the second part seems to contradict the first part of the response:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;相比之下，你描述的画作场景与《红楼梦》中的描述不同。画作描绘的是两个人在河边的场景，左边的人跪在地上，似乎在烧东西，右边的人站在旁边，似乎在观察或指导。根据你提供的描述，无法确定画中人物的具体行为，如可能是烧水、煮饭、烧药等。因此，这幅画可能是一幅现实主义作品，描绘了日常生活中的一幕。它可能没有像《红楼梦》中的情节那样深刻的情感或秘密背景，而是单纯地表现了人与人之间的关系和行为。此外，画中的背景、人物的表情和姿态等细节也会对画面的意义产生影响，需要结合具体的画作进行分析。&lt;/p&gt;

&lt;p&gt;In contrast, the scene you described in the painting differs from the depiction in Dream of the Red Chamber. The painting shows two people by the river: the person on the left is kneeling on the ground, seemingly burning something, while the person on the right stands nearby, possibly observing or guiding. Based on your description, it’s unclear what specific action the figures are engaged in, such as possibly boiling water, cooking, or burning herbs. Therefore, this painting might be a realist work, portraying a moment from everyday life. It may not have the profound emotional or secretive background found in the storyline of Dream of the Red Chamber, instead simply highlighting the relationships and actions between individuals. Additionally, details such as the background, expressions, and postures of the figures in the painting would also influence the scene’s meaning and would require analysis based on the specific artwork.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  LlamaIndex Developer Experience
&lt;/h2&gt;

&lt;p&gt;Overall, I found the LlamaIndex documentation to be very helpful. Before using LlamaIndex for this project I had used LangChain to build a RAG POC, but I didn’t get very good results. I love how the LlamaIndex documentation has a 5-line starter example for building a RAG system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleDirectoryReader&lt;/span&gt;

&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_query_engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Some question about the data should go here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source: &lt;a href="https://docs.llamaindex.ai/en/stable/#getting-started" rel="noopener noreferrer"&gt;https://docs.llamaindex.ai/en/stable/#getting-started&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was able to expand this simple example to implement the text and image Q&amp;amp;A bots for RedLM fairly easily. The application I built is somewhat similar to the &lt;a href="https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/apps/fullstack_app_guide/" rel="noopener noreferrer"&gt;Full-Stack Web App with LLamaIndex&lt;/a&gt; included in their documentation.&lt;/p&gt;

&lt;p&gt;Most of the early development I did on this project used the &lt;code&gt;CustomQueryEngine&lt;/code&gt;. Later I tried using &lt;a href="https://docs.llamaindex.ai/en/stable/module_guides/workflow/" rel="noopener noreferrer"&gt;LlamaIndex Workflows&lt;/a&gt; to better organize the logic in the text and image-based Q&amp;amp;A bots. The same &lt;code&gt;RAGWorkflow&lt;/code&gt; workflow handles requests for both the text and image Q&amp;amp;A bot queries. Workflows also work seamlessly with asynchronous Python frameworks like FastAPI. Here's the API endpoint for the multimodal image Q&amp;amp;A bot using a LlamaIndex Workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mm-q-and-a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mm_q_and_a_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MultiModalRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    This function handles Multimodal Q&amp;amp;A bot requests using a LlamaIndex workflow
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# parse data from request object
&lt;/span&gt;        &lt;span class="n"&gt;image_b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;
        &lt;span class="n"&gt;chapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;

        &lt;span class="c1"&gt;# setup LlamaIndex Workflow and run it with data from request
&lt;/span&gt;        &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RAGWorkflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_b64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chapter_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# return response
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;QAQueryResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;image_desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using LlamaIndex Workflows also helped me add additional logic in a maintainable and standardized way. For example, I expanded the &lt;code&gt;RAGWorkflow&lt;/code&gt; logic to include LLM-based re-ranking in order to ensure retrieval of the most relevant documents for my chatbot queries. This technique increases request latency, but that is an acceptable tradeoff for an application like RedLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLMRerank
&lt;/h3&gt;

&lt;p&gt;LLM Rerank was an interesting technique to try out, and LlamaIndex provides &lt;code&gt;LLMRerank&lt;/code&gt; to make the implementation as simple as possible. Here's my understanding of how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMRerank first retrieves a relatively large number of candidate documents from the vector database. This is done using cosine similarity, which compares the embedding vectors representing the query and the documents.&lt;/li&gt;
&lt;li&gt;Next, LLMRerank assigns a numerical relevancy score to each document. It does this via a special prompt that asks the LLM to rate the relevancy of each document, processing the documents in batches.&lt;/li&gt;
&lt;li&gt;For example, I configured &lt;code&gt;LLMRerank&lt;/code&gt; to initially fetch 4 documents from the vector database based on cosine similarity. Then, in batches of 2, relevancy scores are assigned. Finally, the top 2 most relevant documents based on the LLM-given scores are used to make the RAG query.&lt;/li&gt;
&lt;li&gt;Adding LLMRerank can require a number of additional LLM calls, depending on how you configure the batch size and the number of documents to compare. This increases latency for your application and uses more resources for the extra calls.&lt;/li&gt;
&lt;/ul&gt;
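&lt;p&gt;The two-stage process above can be sketched in plain Python. The first stage uses real cosine similarity over toy embedding vectors; the second stage stands in for LLMRerank's batched relevancy prompt with a stub function (&lt;code&gt;llm_score&lt;/code&gt;, whose scores are made up for illustration):&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# toy index: (document name, embedding vector)
index = [
    ("doc A", [0.9, 0.1]),
    ("doc B", [0.8, 0.2]),
    ("doc C", [0.1, 0.9]),
    ("doc D", [0.7, 0.3]),
    ("doc E", [0.0, 1.0]),
]

# stand-in for the LLM relevancy prompt; a real reranker would ask the
# LLM to score each batch of documents against the query
FAKE_LLM_SCORES = {"doc A": 6, "doc B": 9, "doc C": 2, "doc D": 8, "doc E": 1}

def llm_score(doc, query):
    return FAKE_LLM_SCORES[doc]

def rerank(query_vec, query, fetch_k=4, batch_size=2, top_n=2):
    # stage 1: fetch fetch_k candidates by cosine similarity
    candidates = sorted(index, key=lambda d: cosine(query_vec, d[1]), reverse=True)[:fetch_k]
    # stage 2: the "LLM" scores the candidates in batches; keep the top_n
    scored = []
    for i in range(0, len(candidates), batch_size):
        for doc, _ in candidates[i:i + batch_size]:
            scored.append((llm_score(doc, query), doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:top_n]]

# doc B and doc D rank highest after the LLM scoring pass,
# even though doc A was the closest by cosine similarity
print(rerank([1.0, 0.0], "some query"))  # → ['doc B', 'doc D']
```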

&lt;p&gt;Here's an example of the LLM query that &lt;code&gt;LLMRerank&lt;/code&gt; uses to assign scores:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomap0yhig6pfuxij1a5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomap0yhig6pfuxij1a5e.png" alt="LLMRerank Prompt" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are logs from my application showing what happens inside the workflow.&lt;/p&gt;

&lt;p&gt;Application logs for a text-based Q&amp;amp;A query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO:     💬Request for Q&amp;amp;A chatbot: query='宝玉和谁打架？'
INFO:     🔀Routing Workflow to next step
INFO:     💬Routing to QueryEvent
INFO:     🧮Query the vector database with: 宝玉和谁打架？
INFO:     🖥️Using in-memory embedding database
INFO:     ⏳Loading index from storage directory...
INFO:     ✅Finished loading index.
INFO:     📐Retrieved 4 nodes.
INFO:     🔀Doing LLMRerank
INFO:     ℹ️ Chat Model Info:
INFO:     🟩Using NVIDIA Cloud API for inference
INFO:     🔘Chat Model: baichuan-inc/baichuan2-13b-chat
INFO:     🔢Reranked nodes to 2
INFO:     🤖Doing inference step
INFO:     ⚙️ Getting query engine..
INFO:     🔎Getting response from custom query engine
INFO:     💬Text-based Q&amp;amp;A query
INFO:     🀄Text is Chinese
INFO:     Using nodes from workflow...
INFO:     🔏Formatting prompt
INFO:     Prompt is

这是相关的参考资料：
---------------------
宝玉从来没有经历过这样的痛苦。起初，他觉得被打得很痛，乱喊乱叫。后来，他的气变得虚弱，声音变得嘶哑，无法说话。众门客见他被打得很惨，赶上来恳求他停下来。贾政不肯听，说：“你们知道他干了什么坏事，还能饶他吗？平时都是你们这些人把他带坏了，现在到了这步田地，你们还来劝他。明天，如果他杀父弑君，你们才不劝吗？”

宝玉从来没有经历过这样的痛苦。起初，他觉得打得很痛，乱喊乱叫。后来，他的气变得虚弱，声音变得嘶哑，无法说话。众门客见他被打得很惨，赶上来恳求他停下来。贾政不肯听，说：“你们知道他干了什么坏事，还能饶他吗？平时都是你们这些人把他带坏了，现在到了这步田地，你们还来劝他。明天，如果他杀父弑君，你们才不劝吗？”
---------------------
根据上述的参考资料，回答下面的问题
问题：宝玉和谁打架？

Response...
宝玉和贾政打架。
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My question here basically asks "Who gets in a fight with Baoyu?" The reply says that his father, Jia Zheng, gets in a fight with Baoyu. The two documents used here are very similar, differing by only one character. One of the documents is supposed to be an English translation, but the translation for this paragraph failed and the model "translated" the Chinese by simply repeating it. A translation of this paragraph using GPT-4o describes a tense scene between the protagonist Jia Baoyu and his father Jia Zheng:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Baoyu had never endured such agony before. At first, he felt the pain intensely and cried out loudly. Later, his breath grew weak, his voice turned hoarse, and he couldn’t speak. The attendants, seeing how severely he was being beaten, rushed forward to plead for him to stop. Jia Zheng refused to listen, saying, “Do you know the misdeeds he’s committed, and still you want to spare him? Normally, it’s you people who lead him astray, and now that it’s come to this, you still try to persuade him? Tomorrow, if he were to commit patricide or treason, would you still not advise him?”&lt;/p&gt;
&lt;/blockquote&gt;
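&lt;p&gt;A cheap guard against this failure mode is to check whether a "translation" is essentially the source text echoed back. This sketch is not part of RedLM; it simply flags outputs that share most of their characters with the source, which works because a genuine English translation of Chinese text shares almost none:&lt;/p&gt;

```python
def looks_untranslated(source: str, output: str, threshold: float = 0.8) -> bool:
    """Flag a 'translation' that mostly repeats the source characters."""
    if not output:
        return True
    source_chars = set(source)
    # fraction of output characters that also appear in the source
    overlap = sum(1 for ch in output if ch in source_chars)
    return overlap / len(output) >= threshold

src = "宝玉从来没有经历过这样的痛苦。"
print(looks_untranslated(src, src))  # → True (echoed back, translation failed)
print(looks_untranslated(src, "Baoyu had never endured such agony."))  # → False
```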

&lt;p&gt;Another benefit of LlamaIndex Workflows is the ability to create visualizations showing each step, the branches between steps, and the overall flow of events along with the functions that accept and emit them. It took a little while to get used to the patterns for creating workflows, but the Workflows documentation provided a good starting point that I could adapt for my application. Here's a visualization of the LlamaIndex Workflow that is used by the image and text-based Q&amp;amp;A bots:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwre5192a5pcp91d1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35iwre5192a5pcp91d1f.png" alt="RedLM RAG Workflow" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability and Tracing with Langfuse
&lt;/h3&gt;

&lt;p&gt;It is never too soon to add observability and tracing to a RAG application! I learned this the hard way after doing some refactoring of prompts and &lt;code&gt;CustomQueryEngine&lt;/code&gt; logic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Langfuse is an open source LLM engineering platform to help teams collaboratively debug, analyze and iterate on their LLM Applications. With the Langfuse integration, you can seamlessly track and monitor performance, traces, and metrics of your LlamaIndex application. Detailed traces of the LlamaIndex context augmentation and the LLM querying processes are captured and can be inspected directly in the Langfuse UI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LlamaIndex supports lots of different observability and tracing solutions. I tried &lt;a href="https://langfuse.com/" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt; (YC W23), an open-source platform that can also be self-hosted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre9jk7veti5auzbjcf1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre9jk7veti5auzbjcf1d.png" alt="Langfuse tracing for RedLM" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Langfuse came in handy when debugging the prompts for the image-based Q&amp;amp;A bot. This screenshot shows a trace of a multi-modal Q&amp;amp;A bot query about the fire at the Gourd Temple that occurs in Chapter 1 of the book.&lt;/p&gt;

&lt;h2&gt;
  
  
  NVIDIA inference stack (TensorRT-LLM and build.nvidia.com)
&lt;/h2&gt;

&lt;p&gt;The LLM API for TensorRT-LLM is a much nicer developer experience compared with my earlier attempts at manually building inference engines. The roadmap for TensorRT-LLM looks promising; I’m looking forward to support for an OpenAI-compatible API and more models. NVIDIA NIMs using TensorRT-LLM are an easy way to run models as OpenAI-compatible API servers, but the selection of models is still pretty limited. vLLM provides a strong alternative with a wide range of supported models. NVIDIA NIMs for LLMs build on both vLLM and TensorRT-LLM, so it is helpful to understand both libraries to stay on the bleeding edge of performant inference engines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9kkeib2vjv9edjw7h1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9kkeib2vjv9edjw7h1d.png" alt="trt-llm-roadmap" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The NVIDIA API catalog is a great way to test a variety of different models, especially large models that cannot fit on consumer hardware like RTX PCs or high-end MacBooks. I got to try out the new meta/llama-3.2-90b-vision-instruct model in my project by simply changing a value in my .env file, which is a great developer experience!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flar42nctlxa0knvom03d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flar42nctlxa0knvom03d.png" alt="build.nvidia.com" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The NVIDIA API catalog doesn’t have every model in every size, however. For example, it has the qwen/qwen2-7b-instruct model, but not every size in the Qwen2 family. Also, only some of the models are labeled “Run Anywhere”; many of them say “Self-Hosted API Coming Soon”, meaning they can’t yet be downloaded and run locally as a container. To get around this, I ran inference services locally using both vLLM’s vllm/vllm-openai container and my own container running Qwen and other services.&lt;/p&gt;
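&lt;p&gt;For reference, here is roughly how a model can be served locally with vLLM's container as an OpenAI-compatible API (the model name, port and cache mount are illustrative; adjust them for your setup):&lt;/p&gt;

```
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2-7B-Instruct
```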

&lt;h2&gt;
  
  
  My local inference stack (RTX)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwi8dqjpxbpe40okjegp0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwi8dqjpxbpe40okjegp0.png" alt="RTX PCs" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two of the RTX PCs in my home network: &lt;code&gt;a1&lt;/code&gt; and &lt;code&gt;a3&lt;/code&gt;. &lt;code&gt;a1&lt;/code&gt; was the first PC I built by myself and was the beginning of my GeForce journey. Luckily I built it with an over-provisioned PSU, so it can use a 4090 FE card! The front panel doesn't fit, however.&lt;/p&gt;

&lt;p&gt;One limitation of the NVIDIA API catalog is the number of free credits given for a trial account. At 1 credit per API call, I would burn through the 1000 credits very quickly when running scripts like the translation or the RAG evaluation. The same would be true of the OpenAI API’s rate limits. That’s why running LLMs locally remains an important part of the development cycle for this type of project.&lt;/p&gt;

&lt;p&gt;This project primarily uses two models: a large language model and a vision language model. Running the Yi-1.5-9B-Chat model from &lt;a href="http://01.AI" rel="noopener noreferrer"&gt;01.AI&lt;/a&gt; takes up nearly all of the GPU memory on one of my RTX 4090 PCs, so I had to run the vision model on another PC. In a previous project, I used Kubernetes to manage lots of different inference services (LLMs, ComfyUI, ChatTTS and MusicGen) for making AI videos, and I found it to be a nice way to manage different containerized inference services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brian@a3:~$ microk8s kubectl get no -o wide
NAME   STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
a1     Ready    &amp;lt;none&amp;gt;   4d4h   v1.30.5   192.168.5.182   &amp;lt;none&amp;gt;        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.6.28
a2     Ready    &amp;lt;none&amp;gt;   11d    v1.30.5   192.168.5.96    &amp;lt;none&amp;gt;        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.6.28
a3     Ready    &amp;lt;none&amp;gt;   11d    v1.30.5   192.168.5.173   &amp;lt;none&amp;gt;        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.6.28
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the RedLM GitHub repo I included Kubernetes manifests that show how to run the LLM and VLM across two different computers. I used Kustomize to replace dynamic values in the YAML files for the different resources. The Kubernetes setup is experimental; the LLM and VLM can be run more reliably with &lt;code&gt;docker run&lt;/code&gt; commands.&lt;/p&gt;
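&lt;p&gt;As a sketch of how manifests can split the two services across machines (the resource names below are illustrative, not the exact files from the repo), Kustomize patches can pin each Deployment to a node with a &lt;code&gt;nodeSelector&lt;/code&gt;:&lt;/p&gt;

```yaml
# kustomization.yaml: pin the LLM and VLM Deployments to specific nodes
resources:
  - llm-deployment.yaml
  - vlm-deployment.yaml
patches:
  - target:
      kind: Deployment
      name: llm
    patch: |-
      - op: add
        path: /spec/template/spec/nodeSelector
        value:
          kubernetes.io/hostname: a1
  - target:
      kind: Deployment
      name: vlm
    patch: |-
      - op: add
        path: /spec/template/spec/nodeSelector
        value:
          kubernetes.io/hostname: a3
```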

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25z3244ktwjtiwr1x9y7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25z3244ktwjtiwr1x9y7.png" alt="k8s dashboard for local inference services" width="800" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I had a lot of driver issues when trying to get Kubernetes to run the vLLM container for the Yi LLM. I struggled with the following error message when trying to start the &lt;code&gt;vllm&lt;/code&gt; LLM service:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I tried uninstalling and reinstalling different versions of the NVIDIA drivers and CUDA but kept seeing the same message in the vLLM container logs once the server tried to start up. Rebooting my PC didn't work either. I saw a recommendation to turn off Secure Boot in BIOS. I had never turned it on, but having nothing else to try I went into the BIOS settings and found that some keys were configured in the Secure Boot section. After I deleted these keys and rebooted, everything worked normally. I'm still not sure why my PC was in Secure Boot mode, though!&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Models used in this project
&lt;/h2&gt;

&lt;p&gt;I selected LLMs that run efficiently on RTX PCs, are available in the NVIDIA API catalog, and offer strong bilingual support in Chinese and English, ensuring compatibility, performance, and linguistic flexibility. Here are the models that I ended up using with RedLM:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;01-ai/Yi-1.5-9B-Chat&lt;/code&gt; and &lt;code&gt;nvidia/yi-large&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I used &lt;code&gt;01-ai/Yi-1.5-9B-Chat&lt;/code&gt; for most of the LLM inference while developing RedLM on my RTX PCs, running it locally with the &lt;code&gt;vllm/vllm-openai:latest&lt;/code&gt; container. &lt;a href="https://github.com/01-ai/Yi" rel="noopener noreferrer"&gt;This model family&lt;/a&gt; performs well on both Chinese and English benchmarks and comes in a variety of sizes. When using remote cloud inference, I was able to try the larger &lt;code&gt;yi-large&lt;/code&gt; model from the NVIDIA API catalog.&lt;/p&gt;

&lt;p&gt;There are also vision models in the Yi series, such as &lt;a href="https://huggingface.co/01-ai/Yi-VL-34B" rel="noopener noreferrer"&gt;01-ai/Yi-VL-34B&lt;/a&gt;, but I didn't use these models in my project.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;baichuan-inc/baichuan2-13b-chat&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This model is available in the NVIDIA API catalog, and it was the main model I used when testing remote inference. It performs well in a variety of tasks and scores highly on the Chinese Massive Multitask Language Understanding (CMMLU) benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;Qwen/Qwen2-7B&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This model was used for summarization and translation of the source text. It is supported by the TensorRT-LLM LLM API, and I didn't have any issues building the TensorRT-LLM engine for it on the EC2 instance used to run completion inference for translations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;Qwen/Qwen2-VL-2B-Instruct&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This was the vision language model (VLM) that I used locally when developing on RTX. I was impressed at how well it could describe images given the small parameter count of the model (2 billion parameters). The small size of this model made it easy to run in my RTX PC cluster.&lt;/p&gt;

&lt;p&gt;There is &lt;a href="https://github.com/NVIDIA/TensorRT-LLM/issues/2183" rel="noopener noreferrer"&gt;an open GitHub issue for TensorRT-LLM support for Qwen2-VL&lt;/a&gt; at the time of writing.&lt;/p&gt;

&lt;p&gt;I wrote a simple FastAPI server using the Hugging Face &lt;code&gt;transformers&lt;/code&gt; library based on example code from this model's documentation (see &lt;code&gt;services/qwen2-vl&lt;/code&gt; in the RedLM GitHub repo for more details). I packaged this service into a container in order to run it in my local kubernetes cluster along with other inference services.&lt;/p&gt;
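&lt;p&gt;As a rough illustration of the request shape involved (the helper below is hypothetical; see &lt;code&gt;services/qwen2-vl&lt;/code&gt; in the repo and the model's documentation for the real code), Qwen2-VL expects multimodal messages that interleave image and text content parts:&lt;/p&gt;

```python
# Hypothetical helper showing the multimodal message structure that
# Qwen2-VL's chat template consumes; not RedLM's exact implementation.

def build_vlm_messages(image_ref: str, question: str) -> list:
    """Build a Qwen2-VL style message list pairing an image with a question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vlm_messages(
    "file:///data/paintings/chapter-01.png",
    "What is depicted in this painting?",
)
```

&lt;p&gt;On the server side, a structure like this is passed through the processor's chat template and tokenized together with the image before generating a response, following the example code in the model's documentation.&lt;/p&gt;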

&lt;h3&gt;
  
  
  &lt;code&gt;meta/llama-3.2-90b-vision-instruct&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This model came out while I was working on the project, and I decided to use it instead of the &lt;code&gt;adept/fuyu-8b&lt;/code&gt; model that was previously one of the only vision language models in the NVIDIA API catalog. The &lt;code&gt;meta/llama-3.2-90b-vision-instruct&lt;/code&gt; model has strong Chinese language skills, so it was a good model to use when doing remote inference for the image Q&amp;amp;A bot.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://huggingface.co/nvidia/NVLM-D-72B" rel="noopener noreferrer"&gt;&lt;code&gt;nvidia/NVLM-D-72B&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;I didn't use this model in my project, but it came out recently and looks awesome! Hopefully this model will be available on the NVIDIA API catalog soon. It is trained on the &lt;code&gt;Qwen2-72B-Instruct&lt;/code&gt; text-only model, so it likely also has very strong support for Chinese language.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Today (September 17th, 2024), we introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The success of Black Myth: Wukong
&lt;/h2&gt;

&lt;p&gt;I originally got the idea to build this project after seeing the release of Black Myth: Wukong. This game is a blockbuster success from a Chinese developer that tells the story of the Monkey King’s adventure in the Journey West universe. Journey West (西游记) is another one of the “Four Great Works” of Chinese literature. It tells the story of the legendary pilgrimage of the monk Xuanzang (also known as Tang Sanzang) to India, accompanied by his three disciples—Sun Wukong (the Monkey King), Zhu Bajie (Pigsy), and Sha Wujing (Sandy). The group travels from China to India to retrieve sacred Buddhist scriptures, facing numerous challenges, demons, and supernatural beings along the way.&lt;/p&gt;

&lt;p&gt;The novel blends elements of adventure, mythology, and spiritual allegory, with Sun Wukong's mischievous nature and extraordinary powers adding humor and excitement. Through their journey, the characters grow and overcome personal flaws, ultimately achieving enlightenment and spiritual success. The video game adaptation has set world records for numbers of concurrent players, and it has rewritten the narrative around what is possible with single-player, offline games in the gaming industry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc74lo20pa8xep52xq7t0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc74lo20pa8xep52xq7t0.png" alt="Black Myth: Wukong" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three renditions of Journey West: Songokū (The Monkey King) polychrome woodblock (surimono) (1824) by Yashima Gakutei (1786–1868), Black Myth: Wukong video game by Game Science (2024), Journey to the West TV series by CCTV (1982-2000)&lt;/p&gt;

&lt;h2&gt;
  
  
  RedLM video
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://x.com/briancaffey/status/1855186768452321330" rel="noopener noreferrer"&gt;Watch the RedLM video on 𝕏&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;I created the video for this project using Blender. The Blender sequencer editor is a great non-linear video editing tool for simple video projects like this one. I used the following formula to create the project video for RedLM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Background music: I used the AI music generation service called Suno with the prompt “mystical strange traditional Chinese music from the Qing Dynasty”. Here’s the link to my Suno playlist called “Qing Dynasty Music” where you can find the original song and some other good songs that I generated using this prompt. My &lt;a href="https://suno.com/playlist/863ea0dd-1921-467c-8b69-16dbd126d966" rel="noopener noreferrer"&gt;Qing Dynasty Music Playlist on Suno&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Outline: For this project, the main sections are the introduction, then explaining each part with a short demo: translation, text-based Q&amp;amp;A, evaluation for text-based Q&amp;amp;A, image-based Q&amp;amp;A, and finally a short outro. I wrote an outline and then ChatGPT helped with filling out the content.&lt;/li&gt;
&lt;li&gt;Narration: I used ElevenLabs to narrate the main part of the video using a clone of my voice using the ElevenLabs Voice Lab. The Chinese voices were generated on my computer with an open-source text-to-speech model called ChatTTS.&lt;/li&gt;
&lt;li&gt;Images and videos: I gathered images and screen captures of different parts of the project including code snippets, paintings of the book, flow diagrams and screen recordings of the application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The video is composed of different “strips”. The green strips represent the music and voice clips. Red strips are images and yellow strips are videos. Here is what the final cut of the video looks like in Blender’s Sequencer view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fot64s5eviiyysy38jwst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fot64s5eviiyysy38jwst.png" alt="Blender Sequence Editor" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ChatTTS is one of the most impressive open-source models I have seen for generating conversational speech with prosodic elements (pausing, laughter, etc.). It is developed by a Chinese company called 2noise. Earlier this year I made a small contribution to the project: an API example using FastAPI that shows how to run the model as a standalone API. Another example in the project provides a comprehensive application built with Gradio:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foie19lfsnzf164m6x68o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foie19lfsnzf164m6x68o.png" alt="ChatTTS UI" width="800" height="826"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was planning on streaming the narration audio for Q&amp;amp;A answers using my ChatTTS API service, but I didn’t get around to doing this. Instead, I just used the Gradio application to generate the Chinese narration for Q&amp;amp;A and image Q&amp;amp;A examples included in the video.&lt;/p&gt;

&lt;h3&gt;
  
  
  RedLM Deep Dive video with NotebookLM
&lt;/h3&gt;

&lt;p&gt;NotebookLM is a new application from Google that is a truly magical application of retrieval augmented generation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NotebookLM is a research and note-taking online tool developed by Google Labs that uses artificial intelligence, specifically Google Gemini, to assist users in interacting with their documents. It can generate summaries, explanations, and answers based on content uploaded by users.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I used NotebookLM to generate a "Deep Dive" podcast episode using only this article. I was pretty impressed with what it was able to produce, and I wanted to share it as part of this project, so I used Blender and some Python scripts to put together a simple and engaging visualization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn1u4khie83we1q7jbrj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn1u4khie83we1q7jbrj.png" alt="Deep Dive video in Blender" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;openai/whisper-base&lt;/code&gt; model was used to get timestamps for the start and end of each spoken word using Automatic Speech Recognition (ASR). A speaker segmentation library called &lt;a href="https://github.com/pyannote/pyannote-audio" rel="noopener noreferrer"&gt;&lt;code&gt;pyannote/audio&lt;/code&gt;&lt;/a&gt; was used to perform speaker diarization. This is an interesting algorithm that can segment any number of distinct speakers in an audio recording using a series of models and a discrete-time stochastic process known as the &lt;a href="https://en.wikipedia.org/wiki/Chinese_restaurant_process" rel="noopener noreferrer"&gt;Chinese restaurant process&lt;/a&gt;. This produced a list of time intervals with speaker IDs, which I used to attribute a speaker to each word. I then segmented the audio into two files using this data and used the files to generate audio waveforms with Blender's geometry nodes. Another script animated each word as it is spoken in one of two positions, one per speaker.&lt;/p&gt;
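&lt;p&gt;The word-to-speaker attribution step can be sketched as a simple interval lookup: assign each word to the diarization segment that contains the word's midpoint. This is a simplified stand-in for the script I used, with made-up example data:&lt;/p&gt;

```python
# Attribute a speaker to each transcribed word by finding the diarization
# segment that contains the word's temporal midpoint.

def assign_speakers(words, segments):
    """words: [(text, start, end)]; segments: [(speaker_id, start, end)]."""
    labeled = []
    for text, start, end in words:
        midpoint = (start + end) / 2
        speaker = next(
            (spk for spk, seg_start, seg_end in segments
             if seg_start <= midpoint <= seg_end),
            "UNKNOWN",
        )
        labeled.append((text, speaker))
    return labeled

words = [("welcome", 0.2, 0.6), ("thanks", 1.4, 1.7)]
segments = [("SPEAKER_00", 0.0, 1.0), ("SPEAKER_01", 1.0, 2.0)]
print(assign_speakers(words, segments))
# → [('welcome', 'SPEAKER_00'), ('thanks', 'SPEAKER_01')]
```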

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;I’m glad to have had the opportunity to join three NVIDIA developer contests this year. I like the idea of a “developer contest” that takes place over several weeks compared to hackathons that take place over just a few days. Having more time allows you to learn about a new tool or framework at a deeper level and think about how to apply it in a creative project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsxby2ezi4js276j4wrf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsxby2ezi4js276j4wrf.jpg" alt="NVIDIA and LlamaIndex Contest" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also like how this contest is not team based. Working on this project I was able to do a lot of high-level thinking, write out features as detailed prompts, and then delegate the code writing to LLMs as if I was giving tasks to teammates.&lt;/p&gt;

&lt;p&gt;NVIDIA’s contests are “global developer contests”, but so far they have not been open to developers in India and China. This is probably due to local rules and regulations governing how contests, prizes and taxes work. It is too bad; I would love to see what types of applications participants in those countries would build. There are also a lot of really interesting developments happening in the LLM space in both China and India!&lt;/p&gt;

&lt;p&gt;The LLMs I used in this project were developed by leading Chinese AI companies, and they are competitive with LLMs from Western countries on LLM benchmarks despite having access to fewer GPU resources. &lt;a href="https://qwenlm.github.io/blog/qwen2.5-coder-family/" rel="noopener noreferrer"&gt;Qwen recently released a new model called &lt;code&gt;Qwen2.5-Coder-32B&lt;/code&gt;&lt;/a&gt; that has outperformed leading models at coding tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmygwfhkpv47pxs9sn9dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmygwfhkpv47pxs9sn9dz.png" alt="Qwen coder model" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=UitJxc9LE60" rel="noopener noreferrer"&gt;Kaifu Lee mentioned in a Bloomberg interview&lt;/a&gt; that the scarcity of GPU resources in China will force Chinese engineers to innovate in new ways to gain an advantage. One example of this we saw recently was when Chinese hardware hackers doubled the usable memory of the RTX 4090D (a variant of the RTX 4090 card with lower processing power to comply with US export regulations for China - the D stands for Dragon, apparently!)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygb8ogvfrbkhb4va1ko0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygb8ogvfrbkhb4va1ko0.jpg" alt="RTX 4090D 48GB" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA recently concluded its AI Summit in Mumbai. I was intrigued by the fact that Hindi has unique challenges that have limited the development of Hindi LLMs compared to English and Chinese LLMs. In a conversation with Jensen Huang, Indian industrial titan and CEO of Reliance Industries Mukesh Ambani spoke about his aspiration and ambition for India to overcome these challenges and develop a Hindi LLM. In a viral moment, Mukesh Ambani shared that through devotion to attaining knowledge through Saraswati, the Hindu goddess of knowledge, India will be met by Lakshmi, the goddess of prosperity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xzaol0ffuonccgiw4if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xzaol0ffuonccgiw4if.png" alt="Mukesh Ambani" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA recently released a small language model for Hindi at the AI Summit in Mumbai called &lt;a href="https://indiaai.gov.in/article/nvidia-unveils-nemotron-4-mini-hindi-4b-ai-for-india-s-500-million-hindi-speakers" rel="noopener noreferrer"&gt;Nemotron-4-Mini-Hindi-4B&lt;/a&gt;. Hindi LLMs could enable applications that explore important works of literature from India. I don't know that much about Indian literature, but a comparable work in size and cultural significance might be the Ramayana.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Ramayana is an ancient Indian epic that tells the story of Prince Rama's heroic quest to rescue his wife, Sita, who has been kidnapped by the demon king Ravana. Set in a world of gods, demons, and celestial beings, the story explores themes of duty, loyalty, and the triumph of good over evil. Guided by wisdom, strength, and the support of devoted allies like Hanuman, the monkey god, and his brother Lakshmana, Rama's journey is a deeply spiritual tale, celebrated for its poetic beauty and moral depth. The Ramayana continues to inspire and captivate audiences across cultures.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Ramayana story journeyed to Thailand centuries ago, transforming into the Ramakien, a Thai adaptation that retains the essence of the original Indian epic while adding distinctive Thai cultural elements. Introduced through trade, diplomacy, and cultural exchange between India and Southeast Asia, the story became deeply woven into Thailand’s art, literature, and performance traditions. Thai kings, particularly King Rama I, adapted and documented the Ramakien, giving it a prominent place in Thai history. Lavishly detailed murals surrounding the Temple of the Emerald Buddha in Bangkok’s Grand Palace depict the Ramakien in 178 panels totaling over 2 kilometers in length. On a recent visit to the Grand Palace, I imagined having an application that could link the detailed murals to elements of the story in Hindi, Thai, English, Chinese or any language.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m1ahlb7ksjris4sgdwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m1ahlb7ksjris4sgdwj.png" alt="Ramakien murals surrounding Temple of the Emerald Buddha" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Dream of the Red Chamber, originally titled The Story of the Stone, is one of China’s greatest literary works and a masterpiece of world literature. The novel begins with a frame story centered on a magical stone, left over from the Chinese creation myth in which the goddess Nuwa mends the heavens. Longing to experience the human world, the sentient stone persuades a Buddhist monk and a Taoist priest to reincarnate it as a boy. This boy, Baoyu, is born into a wealthy and influential family—a character partly based on the author, Cao Xueqin, and his own aristocratic upbringing. Through Baoyu's life, friendships, and romantic relationships, the novel delves into his family’s gradual decline, mirroring the instability of China’s noble families during the Qing dynasty. The story also portrays the era's customs, social structures, and beliefs, offering readers a richly detailed exploration of life in Qing China.&lt;/p&gt;

&lt;p&gt;It was a lot of fun to work on this project with tools from LlamaIndex and NVIDIA. With AI technology, GPUs are now essentially sentient stones, and I was able to share this important touchstone of the human experience with my computers using LlamaIndex and open-source language models. In turn, RedLM shared with me delightful insights into the world of Dream of the Red Chamber.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9phha5plq5ek7znieq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9phha5plq5ek7znieq1.png" alt="Story of a Stone" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj7ss28smubekumayev6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj7ss28smubekumayev6.png" alt="Story of a Stone Analysis" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This scene describes a piece of traditional Chinese painting, depicting two elderly figures conversing amidst mountains and rivers. The painting likely visually represents the scene from the book where a monk and a Taoist are chatting at the foot of Qinggeng Peak. The two elderly figures in the painting may represent the monk and Taoist from the book, discussing their discovery of a bright and pristine stone, and planning to take it to a bustling, splendid place for a happy life. The painting’s elements—mountains, peaks, flowing water, trees, and rocks—might echo the book's descriptions, illustrating the natural environment at the base of Qinggeng Peak where the monk and Taoist reside. The painting’s tranquil and harmonious atmosphere may also align with the storyline, expressing the monk and Taoist's care for the stone and their wish for it to live a happy life. In summary, this painted scene might be an artistic portrayal of the story between the monk, the Taoist, and the stone from the book, using visual elements and ambiance to convey the narrative and themes within the story.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>nvidia</category>
      <category>llamaindex</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Rocket League BotChat powered by TensorRT-LLM: My submission for NVIDIA's Generative AI on RTX PCs Developer Contest</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Mon, 19 Feb 2024 12:42:16 +0000</pubDate>
      <link>https://dev.to/briancaffey/rocket-league-botchat-powered-by-tensorrt-llm-my-submission-for-nvidias-generative-ai-on-rtx-pcs-developer-contest-2oao</link>
      <guid>https://dev.to/briancaffey/rocket-league-botchat-powered-by-tensorrt-llm-my-submission-for-nvidias-generative-ai-on-rtx-pcs-developer-contest-2oao</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3k19s72ymm4671nz25b9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3k19s72ymm4671nz25b9.png" alt="Rocket League BotChat" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;This article is about my submission to NVIDIA's Generative AI on RTX PCs Developer Contest: Rocket League BotChat. Rocket League BotChat is a BakkesMod plugin for Rocket League that allows bots to send chat messages based on in-game events. It is designed to be used with a local LLM service optimized and accelerated with NVIDIA's TensorRT-LLM library.&lt;/p&gt;
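&lt;p&gt;The core idea can be sketched as a small function that turns an in-game event into a chat completion request for the local LLM service. The field names and values below are illustrative, not the plugin's actual implementation (the plugin itself is a BakkesMod C++ project):&lt;/p&gt;

```python
# Illustrative sketch: map a Rocket League game event to a chat request for
# a local OpenAI-compatible LLM server; names and fields are assumptions.

def build_bot_chat_request(event: str, bot_name: str) -> dict:
    """Build a short chat completion request reacting to a game event."""
    system = "You are a Rocket League bot. Reply with one short chat line."
    return {
        "model": "local-llm",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": f"{bot_name} reacts to: {event}"},
        ],
        "max_tokens": 32,
    }

request = build_bot_chat_request("own goal by the opposing team", "Nexto")
```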

&lt;p&gt;Here's my project submission post on 𝕏:&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1760529251072118901-463" src="https://platform.twitter.com/embed/Tweet.html?id=1760529251072118901"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;p&gt;Here's a link to the Rocket League BotChat GitHub repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  NVIDIA's Gen AI Developer Contest
&lt;/h2&gt;

&lt;p&gt;The following email caught my attention last month:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Generative AI on RTX PCs Developer Contest: Build your next innovative Gen AI project using NVIDIA TensorRT or TensorRT-LLM on Windows PC with NVIDIA RTX systems&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The part about “on Windows PC” made me think: why would a developer contest focus on a particular operating system? I use all three of the major operating systems: macOS, Ubuntu and Windows 11, but most of the development work I do is on macOS and Ubuntu. I discovered WSL (Windows Subsystem for Linux) a few years ago and really enjoy using that for development as well, but I had never considered doing development work on Windows outside of WSL. I had also never used any of the Windows-specific development frameworks like .NET or Visual Studio.&lt;/p&gt;

&lt;p&gt;My experience with Windows goes back to 2016, when I built my first PC with an NVIDIA GeForce GTX 1080 graphics card. When I built another personal computer in 2023, the NVIDIA GeForce RTX 4090 graphics card was a big step up. I bought two NVMe drives in order to dual-boot Windows and Ubuntu. Switching between the operating systems requires turning off the computer, going into the BIOS settings, changing the boot order and restarting the computer.&lt;/p&gt;

&lt;p&gt;Last year I started learning more about AI image generation using Stable Diffusion with programs like Automatic1111, InvokeAI and ComfyUI. I set everything up on my PC's Ubuntu operating system, and frequently had to switch between Ubuntu for working with Stable Diffusion and Windows for gaming and other Windows-specific software. The friction of constantly switching operating systems pushed me to move my Stable Diffusion workflows over to Windows. All of my models and images are stored on external drives, so moving things over to Windows was pretty easy.&lt;/p&gt;

&lt;p&gt;I learned PowerShell and got more familiar with how Windows works as a development machine. Environment variables and system variables are one example of how Windows does things differently compared to Linux-based operating systems. And just like that, I became a Windows developer! This experience got me interested in coming up with an idea for the NVIDIA Generative AI on RTX PCs Developer Contest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9puzo6ofxsa6c9w6s56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9puzo6ofxsa6c9w6s56.png" alt="winfetch" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming up with an Idea
&lt;/h2&gt;

&lt;p&gt;The contest description and some related NVIDIA articles about the contest helped me with brainstorming:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Whether it’s a RAG-based chatbot, a plug-in for an existing application, or a code generation tool, the possibilities are endless.&lt;/p&gt;

&lt;p&gt;Many use cases would benefit from running LLMs locally on Windows PCs, including gaming, creativity, productivity, and developer experiences.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This contest is focused on NVIDIA's consumer hardware line, GeForce RTX, which serves a diverse set of use cases including gaming, crypto mining, VR, simulation software, creative tools and newer AI workloads such as image generation and LLM (Large Language Model) inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fg.foolcdn.com%2Fimage%2F%3Furl%3Dhttps%253A%252F%252Fg.foolcdn.com%252Feditorial%252Fimages%252F764886%252Fnvda_revenue_bar.png%26op%3Dresize%26w%3D700" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fg.foolcdn.com%2Fimage%2F%3Furl%3Dhttps%253A%252F%252Fg.foolcdn.com%252Feditorial%252Fimages%252F764886%252Fnvda_revenue_bar.png%26op%3Dresize%26w%3D700" alt="A stacked bar chart showing the composition of Nvidia's revenue each quarter going back to fiscal 2019" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gaming seemed like an interesting avenue for me to explore. PC gaming is still an industry developed primarily for Windows operating systems, and gaming has been NVIDIA's largest revenue driver in recent years, only recently surpassed by the data center segment. GPUs are needed to render the graphics of enormous open-world environments, and some story-driven games include so much dialogue that they can be considered huge literary works in their own right. Red Dead Redemption and Genshin Impact are two massively popular games of this type. There might be an interesting project idea there using LLMs and RAG (retrieval augmented generation), but I don't play these types of games and it didn't seem practical for a project that would be built in just over a month. Instead, I thought about building something for a simpler game that I already know.&lt;/p&gt;

&lt;p&gt;Rocket League is a vehicular soccer game played on both game consoles and PCs. It is an eSport with a very high skill ceiling and a massive player base (85 million active players in the last 30 days). I started playing it during the pandemic with some of my friends, and we all got hooked. We also came to learn that Rocket League's in-game chat ranges from entertaining to annoying to toxic and, in some cases, sportsmanlike.&lt;/p&gt;

&lt;p&gt;One other thing I learned about Rocket League is that it has an active modding community. Developers create plugins for the game for all different purposes, such as coaching, practice drills, capturing replays, tracking player statistics, etc. Most Rocket League mods are written with a popular framework called BakkesMod (developed by Andreas "bakkes" Bakke, a Norwegian software engineer). Rocket League's in-game chat inspired the idea for my submission to NVIDIA's Generative AI Developer Contest: Rocket League BotChat. The idea is to build a plugin with BakkesMod that allows Rocket League bots to send chat messages based on game events, using an LLM accelerated and optimized by TensorRT-LLM (more on TensorRT-LLM soon!)&lt;/p&gt;

&lt;p&gt;Bots are built into the Rocket League game and you can play with or against them in offline matches. However, the built-in bots are not very good. Another 3rd-party project called RLBot allows players to play against community-developed AI bots that are developed with machine learning frameworks like TensorFlow and PyTorch. These bots are very good, but they are not infallible. My contest project idea was now clear: develop a plugin for Rocket League capable of sending messages from bot players. This idea seemed to check the boxes for the large language model category of NVIDIA's developer contest: develop a project in a Windows environment for a Windows-specific program, and use an LLM powered by TensorRT-LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fexamohssy70rel0yfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fexamohssy70rel0yfi.png" alt="RLBot" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting together the puzzle pieces
&lt;/h2&gt;

&lt;p&gt;With this idea in mind, I looked into the project's feasibility. I really had no idea if this would work. I looked through the BakkesMod documentation and found some helpful resources that gave me confidence that I could pull together at least a proof-of-concept.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Bakkesmod Plugin Wiki (&lt;a href="https://wiki.bakkesplugins.com/" rel="noopener noreferrer"&gt;https://wiki.bakkesplugins.com/&lt;/a&gt;)

&lt;ul&gt;
&lt;li&gt;HttpWrapper for sending HTTP requests from Bakkesmod&lt;/li&gt;
&lt;li&gt;StatEvents that allow for running custom code when specific event functions are triggered in the game (such as scoring a goal, or making a save).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Bakkesmod plugin template: &lt;a href="https://github.com/Martinii89/BakkesmodPluginTemplate" rel="noopener noreferrer"&gt;https://github.com/Martinii89/BakkesmodPluginTemplate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;This provides a great starting-off point for developing Bakkesmod plugins. Plugins for Bakkesmod are written in C++ and this repo provides an organized file structure that allows you to get started quickly&lt;/li&gt;
&lt;li&gt;Plugin Tutorial: &lt;a href="https://wiki.bakkesplugins.com/plugin_tutorial/getting_started/" rel="noopener noreferrer"&gt;https://wiki.bakkesplugins.com/plugin_tutorial/getting_started/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Open-source chat-related Bakkesmod plugins on GitHub

&lt;ul&gt;
&lt;li&gt;BetterChat: &lt;a href="https://github.com/JulienML/BetterChat" rel="noopener noreferrer"&gt;https://github.com/JulienML/BetterChat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Translate: &lt;a href="https://github.com/0xleft/trnslt" rel="noopener noreferrer"&gt;https://github.com/0xleft/trnslt&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Starting with the plugin template, I wrote a simple console command that, when triggered, sends an HTTP request to localhost:8000/hello. I set up a Hello World Flask app running on localhost:8000 and was able to get a response from my Hello World server! There didn't seem to be any network or permission errors that would prevent my game code from communicating with other applications on my PC.&lt;/p&gt;
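&lt;p&gt;The actual check used a one-route Flask app, but the same throwaway server can be sketched with only the Python standard library (this is a hypothetical stand-in, not the code I ran):&lt;/p&gt;

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Respond to GET /hello with a plain-text greeting
        if self.path == "/hello":
            body = b"Hello World"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet

def serve(port=8000):
    """Start the hello server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

&lt;p&gt;With this running, the plugin's console command hitting localhost:8000/hello gets back "Hello World", which is all the proof-of-concept needed.&lt;/p&gt;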

&lt;p&gt;Next I started looking into how to build and run optimized LLMs with NVIDIA's TensorRT-LLM library, the software that this contest is promoting. The contest announcement included an interesting building block that I thought could be very useful: an example repo showing how to run &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; optimized by TensorRT-LLM to provide inference for a VSCode extension called Continue (Continue.dev).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; is an open source model from Meta that is trained on code and can help with code generation tasks&lt;/li&gt;
&lt;li&gt;TensorRT-LLM is a Python library that accelerates and optimizes inference performance of large language models. It takes a Large Language Model like &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; and generates an engine that can be used for doing inference&lt;/li&gt;
&lt;li&gt;VSCode is an open source code editor developed by Microsoft with a large number of community plugins&lt;/li&gt;
&lt;li&gt;Continue.dev is a startup backed by Y Combinator that is developing an open-source autopilot (code assistant) for VSCode and JetBrains that works with local LLMs or paid services like ChatGPT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get the coding assistant project working I needed to build the TensorRT-LLM engine. &lt;/p&gt;

&lt;p&gt;Building TensorRT-LLM engines on Windows can be done in one of two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using a "bare-metal" virtual environment on Windows (with PowerShell)&lt;/li&gt;
&lt;li&gt;using WSL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the time of writing, building a TensorRT-LLM engine on Windows can only be done with version v0.6.1 of the TensorRT-LLM repo and version v0.7.1 of the tensorrt_llm Python package.&lt;/p&gt;

&lt;p&gt;With WSL you can use the up-to-date versions of the TensorRT-LLM repo (main branch). The engines produced by Windows and WSL (Ubuntu) are not interchangeable and you will get errors if you try to use an engine created with one operating system on another operating system.&lt;/p&gt;

&lt;p&gt;Once the engines are built you can use them to run the example from the trt-llm-as-openai-windows repo.&lt;/p&gt;

&lt;p&gt;The example repo exposes an OpenAI-compatible API locally that can do chat completions. You then need to configure the Continue.dev extension to use the local LLM service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeLlama-13b-instruct-hf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.5.96:5000/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"None"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
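&lt;p&gt;Outside the editor, the same local endpoint can be smoke-tested with a few lines of Python. This is a sketch, not the project's code: the URL and payload fields mirror the OpenAI-style API described above, but your server's host, port and route may differ:&lt;/p&gt;

```python
import json
import urllib.request

# Assumed address of the local TensorRT-LLM server; adjust to your setup.
API_URL = "http://localhost:5000/v1/chat/completions"

def build_chat_request(messages, model="gpt-4", max_tokens=64):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat(messages):
    """POST the request to the local server and return the reply text."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

&lt;p&gt;If the server is up, calling chat() with a list of role/content messages returns a completion, the same round trip the plugin makes from C++.&lt;/p&gt;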



&lt;p&gt;The Continue.dev extension using CodeLlama-13b-instruct-hf accelerated and optimized by TensorRT-LLM is very fast. According to this post on Continue.dev's blog, C++ is a "first tier" language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;C++ has one of the largest presences on GitHub and Stack Overflow. This shows up in its representation in public LLM datasets, where it is one of the languages with the most data. Its performance is near the top of the MultiPL-E, BabelCode / TP3, MBXP / Multilingual HumanEval, and HumanEval-X benchmarks. However, given that C++ is often used when code performance and exact algorithm implementation is very important, many developers don’t believe that LLMs are as helpful for C++ as some of the other languages in this tier.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most of the time I'm working with either Python or TypeScript. I've read about C++ but hadn't used it for anything before this project. I primarily used Microsoft Visual Studio to build the plugin, but VSCode with the Continue.dev autopilot extension was helpful for tackling smaller problems in a REPL-like environment. For example, I used Continue.dev in VSCode to figure out how to handle JSON. Coming from Python and JavaScript, I found the nlohmann/json library's syntax to be somewhat different. For example, here is how to add a message to messages in the body of an OpenAI API request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Python the code for appending a message to a list of messages would be written differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Development environment
&lt;/h2&gt;

&lt;p&gt;While working on different projects using web technologies and frameworks in the Python and JavaScript ecosystems, I developed an appreciation for well-structured development environments that are easy to use. "Development environment" refers to the tools and processes by which a developer can make a change to source code and see that change reflected in a version of the application running locally. The local environment (the developer's computer) should be a close proxy for the production environment where the code will ultimately be deployed for end users. For this project the local development environment is the PC itself, which simplifies things. A development environment should also support hot-reloading so incremental changes can be tested quickly, offering a tight feedback loop. I really like the development environment for this project. Here's a screenshot that shows the different parts of the development environment I used for working on Rocket League BotChat:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqh4lwddcnjj91g07qrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqh4lwddcnjj91g07qrs.png" alt="Screenshot of Rocket League BotChat development environment" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rocket League (running with the &lt;code&gt;-dev&lt;/code&gt; flag turned on). The console is helpful for viewing log messages and the plugin settings panel can be used to view and change plugin configuration values. The BakkesMod plugin also needs to be running in order to inject plugin code into the game engine&lt;/li&gt;
&lt;li&gt;Visual Studio for working on the plugin code. &lt;code&gt;Control&lt;/code&gt;+&lt;code&gt;Shift&lt;/code&gt;+&lt;code&gt;B&lt;/code&gt; rebuilds the code and automatically reloads the plugin in the game&lt;/li&gt;
&lt;li&gt;OpenAI-compatible LLM server powered by TensorRT-LLM (using &lt;code&gt;Llama-2-13b-chat-hf&lt;/code&gt; with AWQ INT4 quantization) running in a docker container on Ubuntu in WSL&lt;/li&gt;
&lt;li&gt;VSCode for debugging C++ code with Continue.dev extension powered by TensorRT-LLM (using &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; with AWQ INT4 quantization) running in a virtual environment on Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building the TensorRT-LLM engines
&lt;/h3&gt;

&lt;p&gt;I was able to build and run the TensorRT LLM engines for my game plugin's inference and the Continue.dev extension's inference both in Python virtual environments on Windows and on Ubuntu in WSL. For building the Llama-2-13b-chat-hf model with INT4 AWQ quantization on Windows 11 I used this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) PS C:\Users\My PC\GitHub\TensorRT-LLM\examples\llama&amp;gt; python build.py --model_dir D:\llama\Llama-2-13b-chat-hf\ --quant_ckpt_path D:\llama\Llama-2-13b-chat-hf\llama_tp1_rank0.npz --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3500 --max_output_len 1024 --output_dir D:\llama\Llama-2-13b-chat-hf\single-gpu\ --vocab_size 32064
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the TensorRT-LLM engines
&lt;/h3&gt;

&lt;p&gt;Using Windows PowerShell to start the CodeLlama server for Continue.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) PS C:\Users\My PC\GitHub\trt-llm-as-openai-windows&amp;gt; python .\app.py --trt_engine_path "D:\llama\CodeLlama-13b-Instruct-hf\trt_engines\1-gpu\" --trt_engine_name llama_float16_tp1_rank0.engine --tokenizer_dir_path "D:\llama\CodeLlama-13b-Instruct-hf\" --port 5000 --host 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tip: Adding &lt;code&gt;--host 0.0.0.0&lt;/code&gt; isn't required here, but it allows me to use the CodeLlama/TensorRT-LLM server with VSCode on any computer on my local network, using my PC's local IP address in the Continue.dev configuration.&lt;/p&gt;

&lt;p&gt;Using docker in WSL to start the Llama-2-13b-chat-hf LLM server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@0a5b5b75f079:/code/git/TensorRT-LLM/examples/server/flask# python3 app.py --trt_engine_path /llama/Llama-2-13b-chat-hf/trt_engines/1-gpu/ --trt_engine_name  llama_float16_t_rank0.engine --tokenizer_dir_path /llama/Llama-2-13b-chat-hf/ --port 5001 --host 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Here I also add &lt;code&gt;--host 0.0.0.0&lt;/code&gt;, but in this case it is required so that the service running in the docker container on WSL can be reached by the game running on Windows.&lt;/p&gt;

&lt;p&gt;BakkesMod includes a console window that came in handy for debugging errors during development.&lt;/p&gt;

&lt;p&gt;At the beginning of this developer contest on January 9, NVIDIA announced Chat with RTX. This is a demo program for Windows that automates a lot of the processes needed to set up a TensorRT-LLM-powered LLM running on your PC. Keep an eye on this project as it may become the best way to install and manage large language models on Windows PCs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrw1nsmj6a9e7ssuf767.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrw1nsmj6a9e7ssuf767.jpeg" alt="Chat with RTX" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Here's a quick look at key parts of the plugin source code (&lt;a href="https://github.com/briancaffey/RocketLeagueBotChat" rel="noopener noreferrer"&gt;https://github.com/briancaffey/RocketLeagueBotChat&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooking events
&lt;/h3&gt;

&lt;p&gt;Hooking events is the core of how this plugin works. StatTickerMessage events cover most of the events that are triggered in Rocket League, such as scoring a goal, making a save or demolishing a car.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// Hooks different types of events that are handled in onStatTickerMessage&lt;/span&gt;
    &lt;span class="c1"&gt;// See https://wiki.bakkesplugins.com/functions/stat_events/&lt;/span&gt;
    &lt;span class="n"&gt;gameWrapper&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;HookEventWithCallerPost&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ServerWrapper&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Function TAGame.GFxHUD_TA.HandleStatTickerMessage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;ServerWrapper&lt;/span&gt; &lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;eventname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;onStatTickerMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Handling events and building the prompt
&lt;/h3&gt;

&lt;p&gt;We can unpack values from the event to determine the player to which the event should be attributed. The code then translates the game event and related data into an English sentence, which is appended to a vector of message objects with the &lt;code&gt;appendToPrompt&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// handle different events like scoring a goal or making a save&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetEventName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Goal"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

        &lt;span class="c1"&gt;// was the goal scored by the human player or the bot?&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;playerPRI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_address&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;receiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;appendToPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Your human opponent just scored a goal against you! "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;score_sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;appendToPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"You just scored a goal against the human player! "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;score_sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Making requests and handling responses
&lt;/h3&gt;

&lt;p&gt;The last main part of the code is making a request to the LLM server with the prompt that we have formed above based on game messages. This code should look familiar to anyone who has worked with OpenAI's API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;LogToChatbox&lt;/code&gt; method is used to send a message to the in-game chat box with the name of the bot that is sending the message. Since messages could possibly be longer than the limit of 120 characters, I send messages to the chatbox in chunks of 120 characters at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;gameWrapper&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;LogToChatbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bot_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! The code isn't that complicated. I had to sanitize the message so that it would not include emoji or the stop character that the LLM server would include in messages (&lt;code&gt;&amp;lt;/s&amp;gt;&lt;/code&gt;). Oddly, I had a hard time getting the LLM to avoid emoji even when I instructed it not to use them in the system prompt.&lt;/p&gt;
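&lt;p&gt;The plugin does this in C++; the sanitizing and chunking described above can be sketched in Python roughly as follows (the 120-character limit comes from Rocket League's chat box; the printable-ASCII filter is an assumption rather than the plugin's exact logic):&lt;/p&gt;

```python
import re

CHAT_LIMIT = 120  # Rocket League chat messages max out at 120 characters

def sanitize(text):
    """Remove the stop token and any non-printable-ASCII characters (e.g. emoji)."""
    text = text.replace("</s>", "")
    return re.sub(r"[^\x20-\x7E]", "", text).strip()

def chunk_message(text, limit=CHAT_LIMIT):
    """Split a long reply into chat-box-sized chunks."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

&lt;p&gt;Each chunk of chunk_message(sanitize(reply)) then gets passed to LogToChatbox in turn.&lt;/p&gt;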

&lt;h2&gt;
  
  
  Rocket League BotChat UI
&lt;/h2&gt;

&lt;p&gt;Most BakkesMod plugins for Rocket League have UIs that allow for controlling settings. Here's what the UI for Rocket League BotChat looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsbm57h6dnxf1jo877d8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsbm57h6dnxf1jo877d8.png" alt="Rocket League BotChat Plugin UI" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System prompt
&lt;/h3&gt;

&lt;p&gt;The system prompt instructs the bot on how it should reply. This is an important part of the prompt engineering for this project, and I used Postman to experiment with lots of different types of instructions. Here's the default prompt that I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;ai_player&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are an elite AI player in the car soccer game Rocket League. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;one_v_one&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are playing a 1v1 match against a human player. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You will send short chat messages to your human opponent in response to what happens in the game. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Respond to the human player with brief messages no more than 12 words long."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// initial system prompt&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;initial_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai_player&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;one_v_one&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;details&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last instruction, limiting replies to no more than 12 words, was the most effective way of controlling the length of responses from the LLM. I tried changing &lt;code&gt;max_output_len&lt;/code&gt; when building the TensorRT engine, but this degraded the quality of the responses. The system prompt can be changed by the user, and changing it was a lot of fun to experiment with!&lt;/p&gt;

&lt;h3&gt;
  
  
  Temperature and Seed
&lt;/h3&gt;

&lt;p&gt;These values are included in the body of the request to the LLM, but I didn't have much luck with them. Early on I had trouble getting sufficient variation in the responses from the LLM, so I tried using random values for seed and temperature, but this didn't really help.&lt;/p&gt;
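
&lt;p&gt;As a sketch, randomizing these values per request might look like the following (the field names follow the OpenAI chat completions format; the ranges are illustrative, not the plugin's actual values):&lt;/p&gt;

```python
import random

def build_request_body(messages):
    # temperature and seed are re-rolled on every request in an attempt to
    # get more varied replies; in practice this had limited effect.
    return {
        "model": "Llama-2-13b-chat-hf",
        "messages": messages,
        "temperature": round(random.uniform(0.7, 1.1), 2),
        "seed": random.randint(0, 2**31 - 1),
    }
```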

&lt;h3&gt;
  
  
  Messages
&lt;/h3&gt;

&lt;p&gt;This section of the UI displays the messages that are used in requests to the LLM. In order to keep the prompt within the context window limit, I only used the six most recent messages sent from the "user" (messages about game events) and the "assistant" (LLM responses from the bot). Whenever the user changes the system prompt, the messages vector is reset to include only the new system prompt.&lt;/p&gt;
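
&lt;p&gt;The windowing logic can be sketched in Python as follows (the class and constant names are my own; the plugin keeps this state in a C++ vector):&lt;/p&gt;

```python
MAX_TURNS = 6  # most recent "user"/"assistant" messages kept in the prompt

class ChatHistory:
    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Keep the system prompt at index 0 plus the last MAX_TURNS messages.
        if len(self.messages) > MAX_TURNS + 1:
            self.messages = [self.messages[0]] + self.messages[-MAX_TURNS:]

    def set_system_prompt(self, prompt):
        # Changing the system prompt resets the whole history.
        self.messages = [{"role": "system", "content": prompt}]
```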

&lt;h2&gt;
  
  
  Demo Video for Contest Submission
&lt;/h2&gt;

&lt;p&gt;I used Blender's sequence editor to create a demo video for my contest submission. I don't edit a lot of videos, but it was a fun process and I learned a lot about Blender and non-linear video editing along the way. Here's how I approached creating the demo video for my project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fblender.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fblender.png" alt="Blender video sequence editor UI used to create my project video" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Structure the video in three main parts: introduction to my project and the contest, description of how it works, demo of my project in action&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find an upbeat song from playlists included in Rocket League with no vocals to use as background music. I used "Dads in Space" by Steven Walking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get stock Rocket League footage from YouTube with youtube-dl (this is an amazing tool!). I mostly used footage from the RLCS 2023 Winter Major Trailer. This video was uploaded at 24 fps, and my Blender video project frame rate was set to 29.97 fps, so I used ffmpeg to convert the video from 24 fps to 29.97 fps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Record myself playing Rocket League with my plugin enabled using NVIDIA Share. Miraculously, I was able to score against the Nexto bot!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use ComfyUI to animate some of the images used in the contest description and use these in my video&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcomfyui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcomfyui.png" alt="ComfyUI workflow for animating images using img2vid model" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use ElevenLabs to narrate a simple voice-over script that describes the video content. This turned out a lot better than I expected. I paid $1 for the ElevenLabs creator plan and got plenty of tokens to experiment with different voice generation settings using a clone of my voice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Felevenlabs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Felevenlabs.png" alt="Eleven Labs Voice Generation Web UI" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Shortcomings of my project
&lt;/h2&gt;

&lt;p&gt;This plugin is a proof of concept and it has some shortcomings. One issue is that some events that my plugin listens to can happen in rapid succession. This results in "user" and "assistant" prompts getting out of order, which breaks assertions in the trt-llm-as-openai-windows repo. It would make more sense for the bot to send messages not immediately after events are triggered, but on a schedule that allows multiple events to accumulate before the prompt is sent to the LLM.&lt;/p&gt;

&lt;p&gt;Many triggered events would be interesting for the bot to react to, but I decided not to prompt on every event, since the situation above would occur frequently. For example, suppose I listen for events like taking a shot on goal and scoring a goal. If the goal is scored immediately after the shot is taken, the second prompt is sent before the response to the first prompt comes back. For this reason I decided simply not to listen to events like "shot on goal", to avoid prompt messages getting out of order. This could also be addressed with more code logic.&lt;/p&gt;
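
&lt;p&gt;One possible shape for that fix, sketched in Python (hypothetical; the plugin does not currently do this): buffer events and send a single combined prompt only after the game has been quiet for a short window.&lt;/p&gt;

```python
import time

class EventBatcher:
    """Collect rapid-fire game events into one prompt instead of one request each."""

    def __init__(self, window_seconds=2.0):
        self.window = window_seconds
        self.pending = []
        self.last_event_time = 0.0

    def record(self, description, now=None):
        self.pending.append(description)
        self.last_event_time = time.monotonic() if now is None else now

    def flush_if_quiet(self, now=None):
        # Return one combined prompt only once the window has passed
        # with no new events; otherwise keep buffering.
        now = time.monotonic() if now is None else now
        if self.pending and now - self.last_event_time >= self.window:
            prompt = "; ".join(self.pending)
            self.pending = []
            return prompt
        return None
```

With a two-second window, a shot followed immediately by a goal would become a single "user" message such as "shot on goal; goal scored", so responses can never interleave.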

&lt;p&gt;Prompt engineering can always be improved; it is hard to measure, and testing it is subjective. I am pleased with the results I captured for the demo video, but the quality of the LLM responses can vary depending on what happens during gameplay. One idea to address this would be to provide multiple English phrasings for any given event and select one at random, which might help improve the variety of responses.&lt;/p&gt;
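
&lt;p&gt;That idea could be sketched like this (the event names and phrasings are made up for illustration):&lt;/p&gt;

```python
import random

# Several phrasings per event; picking one at random adds variety to the
# "user" prompts without touching any of the model's sampling settings.
EVENT_PHRASES = {
    "Goal": [
        "The human player scored a goal against you.",
        "You conceded a goal to the human player.",
        "The human just put the ball in your net.",
    ],
    "Save": [
        "You made a save.",
        "You blocked the human player's shot.",
    ],
}

def describe_event(event):
    return random.choice(EVENT_PHRASES[event])
```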

&lt;p&gt;I also faced some limitations that are built into the game itself. For example, it is not possible for a player to send messages to the in-game chat in offline matches, which makes sense! I built a backdoor for doing this through the BakkesMod developer console, so you can send messages to the bot by typing something like &lt;code&gt;SendMessage Good shot, bot!&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;Participating in this contest was a great opportunity to learn more about LLMs and how to use them to extend programs in a Windows environment. It was also a lot of fun to build something by putting together new tools like TensorRT-LLM. Seeing the bot send me chat messages when I first got it to work was very satisfying! Overall it is a pretty simple implementation, but the idea could be extended to produce a useful application. I could imagine a "Rocket League Coach" plugin that expands on this idea to give helpful feedback based on higher-level data, statistical trends, training goals, etc.&lt;/p&gt;

&lt;p&gt;I think the gaming industry's adoption of LLMs in new games will be BIG, and it will present a huge opportunity for LLM optimization and acceleration software like the TensorRT-LLM library I used in Rocket League BotChat. This is not to discredit the work of writers, who play an important role in game development. I'm excited to see what other developers have built for this contest, especially submissions that build mods for games using TensorRT-LLM.&lt;/p&gt;

&lt;p&gt;Thanks NVIDIA and the TensorRT and TensorRT-LLM teams for organizing this contest! Keep on building!!&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>rtx</category>
      <category>llm</category>
      <category>rocketleague</category>
    </item>
    <item>
      <title>Rocket League BotChat powered by TensorRT-LLM: My submission for NVIDIA's Generative AI on RTX PCs Developer Contest</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Mon, 19 Feb 2024 01:50:22 +0000</pubDate>
      <link>https://dev.to/briancaffey/rocket-league-botchat-powered-by-tensorrt-llm-my-submission-for-nvidias-generative-ai-on-rtx-pcs-developer-contest-1bkb</link>
      <guid>https://dev.to/briancaffey/rocket-league-botchat-powered-by-tensorrt-llm-my-submission-for-nvidias-generative-ai-on-rtx-pcs-developer-contest-1bkb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcover.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcover.png" alt="Rocket League BotChat" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;This article is about my submission to NVIDIA's Generative AI on RTX PCs Developer Contest: Rocket League BotChat. Rocket League BotChat is a BakkesMod plugin for Rocket League that allows bots to send chat messages based on in-game events. It is designed to be used with a local LLM service optimized and accelerated with NVIDIA's TensorRT-LLM library.&lt;/p&gt;

&lt;p&gt;Here's my project submission post on 𝕏:&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1759265152275292644-805" src="https://platform.twitter.com/embed/Tweet.html?id=1759265152275292644"&gt;
&lt;/iframe&gt;

&lt;/p&gt;

&lt;p&gt;Here's a link to the Rocket League BotChat GitHub repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  NVIDIA's Gen AI Developer Contest
&lt;/h2&gt;

&lt;p&gt;The following email caught my attention last month:&lt;/p&gt;

&lt;p&gt;Generative AI on RTX PCs Developer Contest: Build your next innovative Gen AI project using NVIDIA TensorRT or TensorRT-LLM on Windows PC with NVIDIA RTX systems&lt;/p&gt;

&lt;p&gt;The part about “on Windows PC” made me think: why would a developer contest focus on a particular operating system? I use all three of the major operating systems: macOS, Ubuntu and Windows 11, but most of the development work I do is on macOS and Ubuntu. I discovered WSL (Windows Subsystem for Linux) a few years ago and really enjoy using that for development as well, but I had never considered doing development work on Windows outside of WSL. I had also never used any of the Windows-specific development frameworks like .NET or Visual Studio.&lt;/p&gt;

&lt;p&gt;My experience with Windows goes back to 2016, when I built my first PC with an NVIDIA GeForce GTX 1080 graphics card. When I built another personal computer in 2023, the NVIDIA GeForce RTX 4090 graphics card was a big step up. I bought two NVMe drives in order to dual boot into both Windows and Ubuntu. Switching between the operating systems requires restarting the computer, going into the BIOS settings and changing the boot order.&lt;/p&gt;

&lt;p&gt;Last year I started learning more about AI image generation using Stable Diffusion with programs like Automatic1111, InvokeAI and ComfyUI. I set up everything on my PC's Ubuntu operating system, and frequently had to switch between Ubuntu for working with Stable Diffusion and Windows for gaming and other Windows-specific software. The friction of constantly switching operating systems pushed me to move my Stable Diffusion workflows to Windows. All of my models and images are stored on external drives, so moving things over to Windows was pretty easy.&lt;/p&gt;

&lt;p&gt;I learned PowerShell and got more familiar with how Windows works as a development machine. Environment variables and system variables are one example of how Windows does things differently compared to Linux-based operating systems. And just like that, I became a Windows developer! This experience got me interested in coming up with an idea for the NVIDIA Generative AI on RTX PCs Developer Contest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fwinfetch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fwinfetch.png" alt="winfetch" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming up with an Idea
&lt;/h2&gt;

&lt;p&gt;The contest description and some related NVIDIA articles about the contest helped me with brainstorming:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Whether it’s a RAG-based chatbot, a plug-in for an existing application, or a code generation tool, the possibilities are endless.&lt;/p&gt;

&lt;p&gt;Many use cases would benefit from running LLMs locally on Windows PCs, including gaming, creativity, productivity, and developer experiences.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This contest is focused on NVIDIA's consumer hardware line: GeForce RTX. It has a diverse set of use cases including gaming, crypto mining, VR, simulation software, creative tools and new AI techniques including image generation and LLM (Large Language Model) inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fg.foolcdn.com%2Fimage%2F%3Furl%3Dhttps%253A%252F%252Fg.foolcdn.com%252Feditorial%252Fimages%252F764886%252Fnvda_revenue_bar.png%26op%3Dresize%26w%3D700" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fg.foolcdn.com%2Fimage%2F%3Furl%3Dhttps%253A%252F%252Fg.foolcdn.com%252Feditorial%252Fimages%252F764886%252Fnvda_revenue_bar.png%26op%3Dresize%26w%3D700" alt="A stacked bar chart showing the composition of Nvidia's revenue each quarter going back to fiscal 2019" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gaming seemed like an interesting avenue for me to explore. PC gaming is still an industry that develops primarily for Windows, and gaming has been NVIDIA's largest revenue driver in recent years, only recently surpassed by the data center segment. GPUs are needed to render the graphics of enormous open-world environments. Some story-driven games include huge amounts of dialogue and can be considered huge literary works in their own right. Red Dead Redemption and Genshin Impact are two massively popular games of this type. There might be an interesting project idea there using LLMs and RAG (retrieval augmented generation), but I don't play these types of games and it didn't seem practical for a project built in just over a month. So I thought about building something for a simpler game that I already know.&lt;/p&gt;

&lt;p&gt;Rocket League is a vehicular soccer game that is played on both game consoles and PCs. It is an esport with a very high skill ceiling and a massive player base (85 million active players in the last 30 days). I started playing it during the pandemic with some of my friends, and we all got hooked. We also came to learn that Rocket League's in-game chat ranges from entertaining to annoying to toxic, and is, in some cases, even sportsmanlike.&lt;/p&gt;

&lt;p&gt;One other thing I learned about Rocket League is that it has an active modding community. Developers create plugins for the game for all different purposes, such as coaching, practice drills, capturing replays, tracking player statistics, etc. Most Rocket League mods are written with a popular framework called BakkesMod (developed by Andreas "bakkes" Bakke, a Norwegian software engineer). Rocket League's in-game chat inspired the idea for my submission to NVIDIA's Generative AI Developer Contest: Rocket League BotChat. The idea is to build a plugin with BakkesMod that allows Rocket League bots to send chat messages based on game events, using an LLM accelerated and optimized by TensorRT-LLM (more on TensorRT-LLM soon!)&lt;/p&gt;

&lt;p&gt;Bots are built into Rocket League, and you can play with or against them in offline matches. However, the built-in bots are not very good. A third-party project called RLBot lets players play against community-developed AI bots built with machine learning frameworks like TensorFlow and PyTorch. These bots are very good, but they are not infallible. My contest project idea was now clear: develop a plugin for Rocket League capable of sending messages from bot players. This idea seemed to check the boxes for the large language model category of NVIDIA's developer contest: develop a project in a Windows environment for a Windows-specific program, and use an LLM powered by TensorRT-LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fbot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fbot.png" alt="RLBot" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting together the puzzle pieces
&lt;/h2&gt;

&lt;p&gt;With this idea in mind, I looked into the project's feasibility. I really had no idea if this would work. I looked through the BakkesMod documentation and found some helpful resources that gave me confidence that I could pull together at least a proof of concept.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Bakkesmod Plugin Wiki (&lt;a href="https://wiki.bakkesplugins.com/" rel="noopener noreferrer"&gt;https://wiki.bakkesplugins.com/&lt;/a&gt;)

&lt;ul&gt;
&lt;li&gt;HttpWrapper for sending HTTP requests from Bakkesmod&lt;/li&gt;
&lt;li&gt;StatEvents that allow for running custom code when specific event functions are triggered in the game (such as scoring a goal, or making a save).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Bakkesmod plugin template: &lt;a href="https://github.com/Martinii89/BakkesmodPluginTemplate" rel="noopener noreferrer"&gt;https://github.com/Martinii89/BakkesmodPluginTemplate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;This provides a great starting-off point for developing BakkesMod plugins. Plugins for BakkesMod are written in C++ and this repo provides an organized file structure that allows you to get started quickly&lt;/li&gt;
&lt;li&gt;Plugin Tutorial: &lt;a href="https://wiki.bakkesplugins.com/plugin_tutorial/getting_started/" rel="noopener noreferrer"&gt;https://wiki.bakkesplugins.com/plugin_tutorial/getting_started/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Open-source chat-related Bakkesmod plugins on GitHub

&lt;ul&gt;
&lt;li&gt;BetterChat: &lt;a href="https://github.com/JulienML/BetterChat" rel="noopener noreferrer"&gt;https://github.com/JulienML/BetterChat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Translate: &lt;a href="https://github.com/0xleft/trnslt" rel="noopener noreferrer"&gt;https://github.com/0xleft/trnslt&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Starting with the plugin template, I wrote a simple console command that, when triggered, sends an HTTP request to &lt;code&gt;localhost:8000/hello&lt;/code&gt;. I set up a Hello World Flask app running on &lt;code&gt;localhost:8000&lt;/code&gt; and was able to get a response from my server! There didn't seem to be any network or permission errors that would prevent my game code from communicating with other applications on my PC.&lt;/p&gt;

&lt;p&gt;Next I started looking into how to build and run optimized LLMs with NVIDIA's TensorRT-LLM library, the software that this contest is promoting. The contest announcement included an interesting building block that I thought could be very useful: an example repo showing how to run &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; optimized by TensorRT-LLM to provide inference for a VSCode extension called Continue (Continue.dev).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; is an open source model from Meta that is trained on code and can help with code generation tasks&lt;/li&gt;
&lt;li&gt;TensorRT-LLM is a Python library that accelerates and optimizes inference performance of large language models. It takes a Large Language Model like &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; and generates an engine that can be used for doing inference&lt;/li&gt;
&lt;li&gt;VSCode is an open source code editor developed by Microsoft with a large number of community plugins&lt;/li&gt;
&lt;li&gt;Continue.dev is a startup backed by Y Combinator that is developing an open-source autopilot (code assistant) for VSCode and JetBrains that works with local LLMs or paid services like ChatGPT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get the coding assistant project working I needed to build the TensorRT-LLM engine. &lt;/p&gt;

&lt;p&gt;Building TensorRT-LLM engines on Windows can be done in one of two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using a "bare-metal" virtual environment on Windows (with PowerShell)&lt;/li&gt;
&lt;li&gt;using WSL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the time of writing, building a TensorRT-LLM engine on Windows can only be done with version v0.6.1 of the TensorRT-LLM repo and version v0.7.1 of the tensorrt_llm Python package.&lt;/p&gt;

&lt;p&gt;With WSL you can use the up-to-date versions of the TensorRT-LLM repo (main branch). The engines produced by Windows and WSL (Ubuntu) are not interchangeable and you will get errors if you try to use an engine created with one operating system on another operating system.&lt;/p&gt;

&lt;p&gt;Once the engines are built you can use them to run the example from the trt-llm-as-openai-windows repo.&lt;/p&gt;

&lt;p&gt;The example repo exposes an OpenAI-compatible API locally that can do chat completions. You then need to configure the Continue.dev extension to use the local LLM service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeLlama-13b-instruct-hf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.5.96:5000/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"None"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Continue.dev extension using CodeLlama-13b-instruct-hf accelerated and optimized by TensorRT-LLM is very fast. According to this post on Continue.dev's blog, C++ is a "first tier" language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;C++ has one of the largest presences on GitHub and Stack Overflow. This shows up in its representation in public LLM datasets, where it is one of the languages with the most data. Its performance is near the top of the MultiPL-E, BabelCode / TP3, MBXP / Multilingual HumanEval, and HumanEval-X benchmarks. However, given that C++ is often used when code performance and exact algorithm implementation is very important, many developers don’t believe that LLMs are as helpful for C++ as some of the other languages in this tier.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most of the time I work with Python and TypeScript. I had read about C++ but hadn't used it for anything before this project. I primarily used Microsoft Visual Studio to build the plugin, but VSCode with the Continue.dev autopilot extension was helpful for tackling smaller problems in a REPL-like environment. For example, I used Continue.dev in VSCode to figure out how to handle JSON. Coming from Python and JavaScript, I found the nlohmann/json library syntax to be somewhat different. For example, here is how to add a message to the messages in the body of an OpenAI API request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Python the code for appending a message to a list of messages would be written differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Development environment
&lt;/h2&gt;

&lt;p&gt;While working on different projects using web technologies and frameworks in the Python and JavaScript ecosystems, I developed an appreciation for well-structured development environments that are easy to use. "Development environment" refers to the tools and processes by which a developer can make a change to source code and see that change reflected in a version of the application running locally. The local environment (the developer's computer) should be a close proxy for the production environment where the code will ultimately be deployed for end users. For this project the local environment is the PC itself, which simplifies things. A development environment should also support hot-reloading so incremental changes can be tested quickly, offering a tight feedback loop. I really like the development environment for this project. Here's a screenshot that shows the different parts of the development environment I used for working on Rocket League BotChat:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fdevenv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fdevenv2.png" alt="Screenshot of Rocket League BotChat development environment" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rocket League (running with the &lt;code&gt;-dev&lt;/code&gt; flag turned on). The console is helpful for viewing log messages and the plugin settings panel can be used to view and change plugin configuration values. The BakkesMod plugin also needs to be running in order to inject plugin code into the game engine&lt;/li&gt;
&lt;li&gt;Visual Studio for working on the plugin code. &lt;code&gt;Control&lt;/code&gt;+&lt;code&gt;Shift&lt;/code&gt;+&lt;code&gt;B&lt;/code&gt; rebuilds the code and automatically reloads the plugin in the game&lt;/li&gt;
&lt;li&gt;OpenAI-compatible LLM server powered by TensorRT-LLM (using &lt;code&gt;Llama-2-13b-chat-hf&lt;/code&gt; with AWQ INT4 quantization) running in a docker container on Ubuntu in WSL&lt;/li&gt;
&lt;li&gt;VSCode for debugging C++ code with Continue.dev extension powered by TensorRT-LLM (using &lt;code&gt;CodeLlama-13b-instruct-hf&lt;/code&gt; with AWQ INT4 quantization) running in a virtual environment on Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building the TensorRT-LLM engines
&lt;/h3&gt;

&lt;p&gt;I was able to build and run the TensorRT-LLM engines for my game plugin's inference and for the Continue.dev extension's inference both in Python virtual environments on Windows and on Ubuntu in WSL. To build the Llama-2-13b-chat-hf model with INT4 AWQ quantization on Windows 11 I used this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) PS C:\Users\My PC\GitHub\TensorRT-LLM\examples\llama&amp;gt; python build.py --model_dir D:\llama\Llama-2-13b-chat-hf\ --quant_ckpt_path D:\llama\Llama-2-13b-chat-hf\llama_tp1_rank0.npz --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3500 --max_output_len 1024 --output_dir D:\llama\Llama-2-13b-chat-hf\single-gpu\ --vocab_size 32064
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the TensorRT-LLM engines
&lt;/h3&gt;

&lt;p&gt;Using Windows PowerShell to start the CodeLlama server for Continue.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) PS C:\Users\My PC\GitHub\trt-llm-as-openai-windows&amp;gt; python .\app.py --trt_engine_path "D:\llama\CodeLlama-13b-Instruct-hf\trt_engines\1-gpu\" --trt_engine_name llama_float16_tp1_rank0.engine --tokenizer_dir_path "D:\llama\CodeLlama-13b-Instruct-hf\" --port 5000 --host 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tip: Adding &lt;code&gt;--host 0.0.0.0&lt;/code&gt; isn't required here, but it lets me use the CodeLlama/TensorRT-LLM server with VSCode from any computer on my local network by pointing the Continue.dev configuration at my PC's local IP address.&lt;/p&gt;

&lt;p&gt;Using docker in WSL to start the Llama-2-13b-chat-hf LLM server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@0a5b5b75f079:/code/git/TensorRT-LLM/examples/server/flask# python3 app.py --trt_engine_path /llama/Llama-2-13b-chat-hf/trt_engines/1-gpu/ --trt_engine_name  llama_float16_t_rank0.engine --tokenizer_dir_path /llama/Llama-2-13b-chat-hf/ --port 5001 --host 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Here I also add &lt;code&gt;--host 0.0.0.0&lt;/code&gt;, but in this case it is required so that the service running in the docker container in WSL can be reached by the game running on Windows.&lt;/p&gt;

&lt;p&gt;BakkesMod includes a console window that came in handy for debugging errors during development.&lt;/p&gt;

&lt;p&gt;At the beginning of this developer contest on January 9, NVIDIA announced Chat with RTX. This is a demo program for Windows that automates a lot of the setup needed to run a TensorRT-LLM-powered LLM on your PC. Keep an eye on this project, as it may become the best way to install and manage large language models on Windows PCs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fchat_with_rtx.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fchat_with_rtx.jpeg" alt="Chat with RTX" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Here's a quick look at key parts of the plugin source code (&lt;a href="https://github.com/briancaffey/RocketLeagueBotChat" rel="noopener noreferrer"&gt;https://github.com/briancaffey/RocketLeagueBotChat&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooking events
&lt;/h3&gt;

&lt;p&gt;Hooking events is at the core of how this plugin works. &lt;code&gt;StatTickerMessage&lt;/code&gt; events cover most of the events triggered in Rocket League, such as scoring a goal, making a save or demolishing a car.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// Hooks different types of events that are handled in onStatTickerMessage&lt;/span&gt;
    &lt;span class="c1"&gt;// See https://wiki.bakkesplugins.com/functions/stat_events/&lt;/span&gt;
    &lt;span class="n"&gt;gameWrapper&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;HookEventWithCallerPost&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ServerWrapper&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Function TAGame.GFxHUD_TA.HandleStatTickerMessage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;ServerWrapper&lt;/span&gt; &lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;eventname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;onStatTickerMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Handling events and building the prompt
&lt;/h3&gt;

&lt;p&gt;We can unpack values from the event to determine which player the event should be attributed to. The code then translates the game event and its related data into an English sentence, which is appended to a vector of message objects with the &lt;code&gt;appendToPrompt&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// handle different events like scoring a goal or making a save&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetEventName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Goal"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

        &lt;span class="c1"&gt;// was the goal scored by the human player or the bot?&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;playerPRI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_address&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;receiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;appendToPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Your human opponent just scored a goal against you! "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;score_sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;appendToPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"You just scored a goal against the human player! "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;score_sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Making requests and handling responses
&lt;/h3&gt;

&lt;p&gt;The last main part of the code makes a request to the LLM server with the prompt built above from game messages. This code should look familiar to anyone who has worked with OpenAI's API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
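&lt;p&gt;As a rough illustration (not the plugin's actual code), the OpenAI-style request body could be assembled from the message vector like this. The &lt;code&gt;ChatMessage&lt;/code&gt; struct and &lt;code&gt;buildRequestBody&lt;/code&gt; helper are hypothetical, and a real implementation would use a JSON library with proper string escaping rather than concatenation:&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <vector>

// A chat message with an OpenAI-style role ("system", "user" or "assistant")
struct ChatMessage {
    std::string role;
    std::string content;
};

// Hypothetical sketch: build an OpenAI-style chat completion request body
// by hand. A real plugin would use a JSON library (and escape the content).
std::string buildRequestBody(const std::vector<ChatMessage>& messages,
                             double temperature) {
    std::string body = "{\"messages\":[";
    for (size_t i = 0; i < messages.size(); ++i) {
        if (i > 0) body += ",";
        body += "{\"role\":\"" + messages[i].role +
                "\",\"content\":\"" + messages[i].content + "\"}";
    }
    body += "],\"temperature\":" + std::to_string(temperature) + "}";
    return body;
}
```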



&lt;p&gt;The &lt;code&gt;LogToChatbox&lt;/code&gt; method sends a message to the in-game chat box along with the name of the bot that is sending it. Since LLM responses can be longer than the 120-character limit, I send them to the chatbox in chunks of 120 characters at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;gameWrapper&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;LogToChatbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bot_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
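&lt;p&gt;The chunking described above can be sketched with a small helper. This is illustrative code (&lt;code&gt;chunkMessage&lt;/code&gt; is not part of the BakkesMod API), assuming the 120-character chat limit:&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split an LLM reply into pieces of at most maxLen characters so that
// each piece fits within Rocket League's chat message limit.
std::vector<std::string> chunkMessage(const std::string& text,
                                      size_t maxLen = 120) {
    std::vector<std::string> chunks;
    for (size_t i = 0; i < text.size(); i += maxLen) {
        chunks.push_back(text.substr(i, maxLen));
    }
    return chunks;
}
```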



&lt;p&gt;That's it! The code isn't that complicated. I had to sanitize each message so that it would not include emoji or the stop token that the LLM server would include in messages (&lt;code&gt;&amp;lt;/s&amp;gt;&lt;/code&gt;). Oddly, I had a hard time getting the LLM to avoid emoji even when the system prompt explicitly instructed it not to use them.&lt;/p&gt;
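&lt;p&gt;One way to sketch that sanitization step (a hypothetical &lt;code&gt;sanitizeMessage&lt;/code&gt; helper, not the plugin's exact code) is to drop the stop token and then strip any non-ASCII bytes, which covers emoji:&lt;/p&gt;

```cpp
#include <cassert>
#include <string>

// Remove the "</s>" stop token and any non-ASCII bytes (which includes
// all emoji) before the text is sent to the in-game chatbox.
std::string sanitizeMessage(std::string text) {
    const std::string stopToken = "</s>";
    size_t pos;
    while ((pos = text.find(stopToken)) != std::string::npos) {
        text.erase(pos, stopToken.size());
    }
    std::string clean;
    for (unsigned char c : text) {
        if (c < 128) clean += static_cast<char>(c);  // keep plain ASCII only
    }
    return clean;
}
```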

&lt;h2&gt;
  
  
  Rocket League BotChat UI
&lt;/h2&gt;

&lt;p&gt;Most BakkesMod plugins for Rocket League have UIs that allow users to control plugin settings. Here's what the UI for Rocket League BotChat looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Frlbcui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Frlbcui.png" alt="Rocket League BotChat Plugin UI" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System prompt
&lt;/h3&gt;

&lt;p&gt;The system prompt instructs the bot on how it should reply. This is an important part of the prompt engineering for this project, and I used Postman to experiment with many different types of instructions. Here's the default prompt that I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;ai_player&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are an elite AI player in the car soccer game Rocket League. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;one_v_one&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You are playing a 1v1 match against a human player. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"You will send short chat messages to your human opponent in response to what happens in the game. "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Respond to the human player with brief messages no more than 12 words long."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// initial system prompt&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;initial_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai_player&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;one_v_one&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;details&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last sentence, limiting replies to no more than 12 words, was the most effective way of controlling the length of responses from the LLM. I tried changing &lt;code&gt;max_output_len&lt;/code&gt; when building the TensorRT engine, but this degraded the quality of the responses. The system prompt can be changed by the user, and it was a lot of fun to experiment with different system prompts!&lt;/p&gt;

&lt;h3&gt;
  
  
  Temperature and Seed
&lt;/h3&gt;

&lt;p&gt;These values are included in the body of the request to the LLM, but I didn't have much luck with them. Early on I had trouble getting sufficient variation in the responses from the LLM, so I tried using random values for seed and temperature, but this didn't really help.&lt;/p&gt;

&lt;h3&gt;
  
  
  Messages
&lt;/h3&gt;

&lt;p&gt;This section of the UI displays the messages that are used in requests to the LLM. In order to keep the prompt within the context window limit, I only used the six most recent messages from the "user" (messages about game events) and the "assistant" (the bot's LLM responses). Whenever the user changes the system prompt, the messages vector is reset to include only the new system prompt.&lt;/p&gt;
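&lt;p&gt;The trimming logic can be sketched roughly like this (a hypothetical &lt;code&gt;trimMessages&lt;/code&gt; helper, assuming the system prompt always sits at index 0 of the vector):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Message {
    std::string role;
    std::string content;
};

// Keep the system prompt at index 0 plus only the `keep` most recent
// user/assistant messages so the prompt stays within the context window.
void trimMessages(std::vector<Message>& messages, size_t keep = 6) {
    if (messages.size() <= keep + 1) return;  // system prompt + keep others
    messages.erase(messages.begin() + 1,
                   messages.end() - static_cast<std::ptrdiff_t>(keep));
}
```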

&lt;h2&gt;
  
  
  Demo Video for Contest Submission
&lt;/h2&gt;

&lt;p&gt;I used Blender's sequence editor to create a demo video for my contest submission. I don't edit a lot of videos, but it was a fun process and I learned a lot about Blender and non-linear video editing along the way. Here's how I approached creating the demo video for my project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fblender.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fblender.png" alt="Blender video sequence editor UI used to create my project video" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Structure the video in three main parts: introduction to my project and the contest, description of how it works, demo of my project in action&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find an upbeat song from playlists included in Rocket League with no vocals to use as background music. I used "Dads in Space" by Steven Walking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get stock Rocket League footage from YouTube with youtube-dl (this is an amazing tool!). I mostly used footage from the RLCS 2023 Winter Major Trailer. This video was uploaded at 24 fps, and my Blender video project's frame rate was set to 29.97, so I used ffmpeg to convert the video from 24 fps to 29.97 fps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Record myself playing Rocket League with my plugin enabled using NVIDIA Share. Miraculously, I was able to score against the Nexto bot!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use ComfyUI to animate some of the images used in the contest description and use these in my video&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcomfyui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Fcomfyui.png" alt="ComfyUI workflow for animating images using img2vid model" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use ElevenLabs to narrate a simple voice-over script that describes the video content. This turned out a lot better than I expected. I paid $1 for the ElevenLabs creator plan and got plenty of tokens to experiment with different voice generation settings using a clone of my voice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Felevenlabs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbriancaffey.github.io%2Fimg%2Frlbc%2Felevenlabs.png" alt="Eleven Labs Voice Generation Web UI" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Shortcomings of my project
&lt;/h2&gt;

&lt;p&gt;This plugin is a proof of concept and it has some shortcomings. One issue is that some of the events my plugin listens to can happen in rapid succession. This results in "user" and "assistant" prompts getting out of order, which breaks assertions in the &lt;code&gt;trt-llm-as-openai-windows&lt;/code&gt; repo. It would make more sense for the bot to send messages not immediately after events are triggered, but on a schedule that allows multiple events to accumulate before the prompt is sent to the LLM.&lt;/p&gt;
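&lt;p&gt;One way to sketch that scheduling idea (hypothetical code, not part of the plugin) is a small batcher that buffers event sentences and only produces a combined prompt once no new event has arrived for a quiet period. Timestamps are passed in explicitly here to keep the sketch testable:&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <vector>

// Buffer event sentences and flush them as one combined prompt once no
// new event has arrived for `quietSeconds`, so rapid-fire events (e.g. a
// shot immediately followed by a goal) produce a single LLM request.
class EventBatcher {
public:
    explicit EventBatcher(double quietSeconds) : quiet_(quietSeconds) {}

    void addEvent(const std::string& sentence, double now) {
        pending_.push_back(sentence);
        lastEvent_ = now;
    }

    // Returns the combined prompt when the quiet period has elapsed,
    // or an empty string if we should keep waiting.
    std::string poll(double now) {
        if (pending_.empty() || now - lastEvent_ < quiet_) return "";
        std::string combined;
        for (const std::string& s : pending_) combined += s + " ";
        pending_.clear();
        return combined;
    }

private:
    double quiet_;
    double lastEvent_ = 0.0;
    std::vector<std::string> pending_;
};
```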

&lt;p&gt;There are lots of events that are triggered that would be interesting things for the bot to react to, but I decided not to prompt on every event since the above situation would be triggered frequently. For example, suppose I listen for events like taking a shot on goal and scoring a goal. If the goal is scored immediately after the shot is taken, then the second prompt is sent before the response for the first prompt comes back. For this reason I decided to simply not listen to events like "shot on goal" to avoid prompt messages getting out of order. This could also be addressed with more code logic.&lt;/p&gt;

&lt;p&gt;Prompt engineering is something that can always be improved. It is hard to measure, and testing it is subjective. I am pleased with the results I was able to capture for the demo video, but the quality of the LLM responses can vary depending on what happens during gameplay. One idea I had to address this would be to provide multiple English translations for any given event and then select one at random, which might help improve the variety of responses.&lt;/p&gt;
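&lt;p&gt;That idea could look something like the following sketch (a hypothetical &lt;code&gt;randomVariant&lt;/code&gt; helper): each event maps to several English phrasings, and one is chosen at random when the sentence is appended to the prompt:&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Pick one of several English phrasings for a game event at random to
// add variety to the prompts sent to the LLM.
std::string randomVariant(const std::vector<std::string>& variants) {
    return variants[static_cast<size_t>(std::rand()) % variants.size()];
}
```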

&lt;p&gt;I faced some limitations that are built into the game itself. For example, it is not possible for a player to send messages to the in-game chat in offline matches, which makes sense! I built a backdoor for doing this through the BakkesMod developer console, so you can send messages to the bot by typing something like &lt;code&gt;SendMessage Good shot, bot!&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;Participating in this contest was a great opportunity to learn more about LLMs and how to use them to extend programs in a Windows environment. It was also a lot of fun to build something by putting together new tools like TensorRT-LLM. Seeing the bot send me chat messages when I first got it working was very satisfying! Overall it is a pretty simple implementation, but the idea could be extended into a genuinely useful application. I could imagine a "Rocket League Coach" plugin that expands on this idea to give helpful feedback based on higher-level data, statistical trends, training goals, etc.&lt;/p&gt;

&lt;p&gt;I think the gaming industry's adoption of LLMs for new games will be BIG, and it will present a huge opportunity for LLM optimization and acceleration software like the TensorRT-LLM I used in Rocket League BotChat. This is not to discredit the work of writers, who play an important role in game development. I'm excited to see what other developers have built for this contest, especially submissions that build mods for games using TensorRT-LLM.&lt;/p&gt;

&lt;p&gt;Thanks NVIDIA and the TensorRT and TensorRT-LLM teams for organizing this contest! Keep on building!!&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>rtx</category>
      <category>llm</category>
      <category>rocketleague</category>
    </item>
    <item>
      <title>My Infrastructure as Code Rosetta Stone - Deploying the same web application on AWS ECS Fargate with CDK, Terraform and Pulumi</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sat, 07 Jan 2023 18:54:38 +0000</pubDate>
      <link>https://dev.to/briancaffey/my-infrastructure-as-code-rosetta-stone-deploying-the-same-web-application-on-aws-ecs-fargate-with-cdk-terraform-and-pulumi-oe4</link>
      <guid>https://dev.to/briancaffey/my-infrastructure-as-code-rosetta-stone-deploying-the-same-web-application-on-aws-ecs-fargate-with-cdk-terraform-and-pulumi-oe4</guid>
      <description>&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;I wrote three infrastructure as code libraries for deploying containerized 3-tier web apps on AWS ECS Fargate using CDK, Terraform and Pulumi. This article will provide an overview of my experience working with these three IaC tools and will show how I use my libraries in automated infrastructure deployment pipelines with GitHub Actions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDK Construct Library&lt;/strong&gt;: &lt;a href="https://github.com/briancaffey/cdk-django" rel="noopener noreferrer"&gt;github.com/briancaffey/cdk-django&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform Modules&lt;/strong&gt;: &lt;a href="https://github.com/briancaffey/terraform-aws-django" rel="noopener noreferrer"&gt;github.com/briancaffey/terraform-aws-django&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pulumi Component Library&lt;/strong&gt;: &lt;a href="https://github.com/briancaffey/pulumi-aws-django" rel="noopener noreferrer"&gt;github.com/briancaffey/pulumi-aws-django&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mono repo with a sample Django micro blogging app (μblog) and frontend app (Vue SPA written with Quasar), GitHub Action workflows for infrastructure and (separate) application deployment pipelines, IaC code that &lt;em&gt;consumes&lt;/em&gt; each of the libraries listed above, &lt;a href="https://briancaffey.github.io/django-step-by-step/" rel="noopener noreferrer"&gt;VuePress documentation site&lt;/a&gt; and miscellaneous items (k6 load testing scripts, Cypress tests, docker-compose, etc.): &lt;a href="https://github.com/briancaffey/django-step-by-step" rel="noopener noreferrer"&gt;github.com/briancaffey/django-step-by-step&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  eli5
&lt;/h2&gt;

&lt;p&gt;Pretend we are at the beach building sandcastles. We can build sandcastles using our hands, but this takes a lot of time, and we might bump into each other and accidentally knock over part of our sandcastle. I made some tools for building sandcastles. We have one tool for building a sand castle base that includes the wall around the outside, the moat, the door and different sections inside the walls. And I made another tool for deploying smaller sand castle houses inside the walls of the sandcastle base. We fill the tool with sand and water and then turn it over inside of our base and we can build an entire city of sandcastles. Also, the tool lets us carefully remove parts of our sandcastle without knocking over any of the other parts. We can share the tool with all of our friends and they can make cool sandcastles too, and the tool is free for them to use.&lt;/p&gt;

&lt;p&gt;Instead of sandcastles, I'm working with computer systems that can power internet applications, like YouTube for example. I'm building tools that can allow me or anyone else to build really awesome internet applications using computers.&lt;/p&gt;

&lt;p&gt;The tools are not physical tools like the ones for building sandcastles; instead, these tools are made with code. The code for websites like YouTube allows you to upload videos &lt;em&gt;to YouTube&lt;/em&gt;, but the code I'm writing allows you to upload any type of website (even one like YouTube) &lt;em&gt;to the internet&lt;/em&gt;. When we run this code, it creates applications on the internet. Also, sand is very expensive and Jeff Bezos owns the beach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I made an Infrastructure as Code Rosetta Stone with CDK, Terraform and Pulumi
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4914r9255d0tnb1gch6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4914r9255d0tnb1gch6.png" alt="IaC Rosetta Stone" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To push myself to learn more about AWS, IaC, CI/CD, automation and Platform Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learn differences between major IaC tools and how to use them to do exactly the same thing (build a web app) on the same Cloud (AWS) in the same way (serverless container technology using ECS Fargate).&lt;/li&gt;
&lt;li&gt;Get more experience publishing software packages (npm) and finding the right level of abstraction for IaC libraries that is both dynamic and straightforward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To fail as many times as possible&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every time I fail when I think I have things right, I learn something new&lt;/li&gt;
&lt;li&gt;Failed IaC pipelines can sometimes be scary, and every failure I have on these projects can teach me about potential failure modes for live projects running in production&lt;/li&gt;
&lt;li&gt;You can oftentimes be "stuck" with a set of resources that you can't update or delete. Learning to get unstuck from these scenarios is important&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To take an application-first approach to DevOps&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application developers are increasingly being tasked with operational duties&lt;/li&gt;
&lt;li&gt;While learning about IaC, I had a hard time finding in-depth materials that cover application development, CI/CD pipelines and automation, and Infrastructure as Code, and how these three knowledge domains work together. There are important considerations to make when moving beyond a Hello World docker image&lt;/li&gt;
&lt;li&gt;You could probably use another framework with these IaC libraries like Flask or Rails, but for now I'm building these projects with Django first-in-mind&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To develop a project I can reference when helping myself and others&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Companies and projects that do IaC and CI/CD mostly keep this code in private repos, for obvious reasons; there isn't much reason to share this type of code unless you are sharing it with an auditor&lt;/li&gt;
&lt;li&gt;Hopefully the sample application, IaC and CI/CD pipelines &lt;em&gt;aren't overly complex&lt;/em&gt;. There are more complex examples of open source companies out there, but their repos have steep learning curves and a lot going on&lt;/li&gt;
&lt;li&gt;People often ask about how to split up IaC deployments and application deployments. I want to be able to use this project to &lt;strong&gt;show&lt;/strong&gt; people how it can be done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To encourage others (specifically Developer Advocates / Developer Relations / Solutions Architects in the CDK, Terraform and Pulumi communities) to share complete and non-trivial examples of IaC software in use with an actual application&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There are many ways one could create an "IaC Rosetta Stone" (&lt;code&gt;public cloud providers x CI/CD providers x IaC tools&lt;/code&gt; is a big number)&lt;/li&gt;
&lt;li&gt;This takes a lot of effort and time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;I have nothing to sell you&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;So many articles about Cloud/DevOps are trying to sell you a tool. Outside of what I consider to be mainstream vendors like GitHub and AWS, there are no products that I'm promoting here&lt;/li&gt;
&lt;li&gt;I'm also not trying to sell anyone on using my IaC packages&lt;/li&gt;
&lt;li&gt;Hopefully my IaC packages can serve as helpful reference or starting point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Walk before running&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I want to build up confidence with vanilla use cases before getting too fancy&lt;/li&gt;
&lt;li&gt;With a solid foundation in these tools, I want to learn about some of the more advanced patterns teams are adopting (Pulumi Automation API, Terragrunt for Terraform, self-mutating CDK Pipelines)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;12 Factor App, DevOps and Platform Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://12factor.net/" rel="noopener noreferrer"&gt;12 Factor App&lt;/a&gt; is great, and has guided how I approach both Django application development and IaC library development&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://platformengineering.org/" rel="noopener noreferrer"&gt;platformengineering.org&lt;/a&gt; community has some good guiding principles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CDK/Terraform/Pulumi terminology
&lt;/h2&gt;

&lt;h3&gt;
  
  
  constructs, modules and components
&lt;/h3&gt;

&lt;p&gt;A CDK construct, Terraform module and Pulumi component generally mean the same thing: an abstract grouping of one or more cloud resources.&lt;/p&gt;

&lt;p&gt;In this article I will refer to &lt;strong&gt;constructs/modules/components&lt;/strong&gt; as &lt;strong&gt;c/m/c&lt;/strong&gt; for short, and the term &lt;strong&gt;stack&lt;/strong&gt; can generally refer to a CloudFormation stack, a Pulumi stack or a group of Terraform resources from a module that has had &lt;code&gt;apply&lt;/code&gt; run against it.&lt;/p&gt;

&lt;h3&gt;
  
  
  what is a stack?
&lt;/h3&gt;

&lt;p&gt;AWS has a resource type called CloudFormation Stacks, and Pulumi also has a concept of stacks. Terraform documentation doesn't refer to stacks; instead, the Terraform docs use the term "Terraform configuration" to refer to a group of resources built using a module.&lt;/p&gt;

&lt;p&gt;CDK Constructs and Pulumi Components are somewhat similar; however, CDK Constructs map to CloudFormation, while the Pulumi components I'm using from the &lt;code&gt;@pulumi/aws&lt;/code&gt; package generally map directly to Terraform resources from the AWS Provider (the Pulumi AWS Provider shares much of its code with the Terraform AWS Provider).&lt;/p&gt;

&lt;h3&gt;
  
  
  verbs
&lt;/h3&gt;

&lt;p&gt;In CDK you &lt;code&gt;synth&lt;/code&gt; CDK code to generate CloudFormation templates. You can also run &lt;code&gt;diff&lt;/code&gt; to see what changes would be applied during a stack update.&lt;/p&gt;

&lt;p&gt;In Terraform you &lt;code&gt;init&lt;/code&gt; to download all providers and modules. This is sort of like running &lt;code&gt;npm install&lt;/code&gt; in CDK and Pulumi. You then run &lt;code&gt;terraform plan&lt;/code&gt; to see the changes that would result. &lt;code&gt;terraform apply&lt;/code&gt; does CRUD operations on your cloud resources.&lt;/p&gt;

&lt;p&gt;In Pulumi you run &lt;code&gt;pulumi preview&lt;/code&gt; to see what changes would be made to a stack, and &lt;code&gt;pulumi up&lt;/code&gt; to apply them. You can use the &lt;code&gt;--diff&lt;/code&gt; flag to see the specifics of what would change.&lt;/p&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In CDK you synth CloudFormation and use these templates to deploy stacks made up of constructs. An "app" can contain multiple stacks, and you can deploy one or more stacks in an app at a time&lt;/li&gt;
&lt;li&gt;In Terraform you plan a configuration made up of modules, and then run &lt;code&gt;terraform apply&lt;/code&gt; to build the configuration/stack (&lt;a href="https://discuss.hashicorp.com/t/what-is-a-terraform-stack/31985" rel="noopener noreferrer"&gt;discuss.hashicorp.com/t/what-is-a-terraform-stack/31985&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;In Pulumi you preview a stack made up of components, and then run &lt;code&gt;pulumi up&lt;/code&gt; to build the resources&lt;/li&gt;
&lt;li&gt;To tear down a stack in all three tools, you run &lt;code&gt;destroy&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
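&lt;p&gt;The verb correspondence above can also be written down as a small lookup table. This is just a reference sketch of the commands discussed in this section, not an exhaustive list of each CLI's subcommands and flags:&lt;/p&gt;

```typescript
// Rough correspondence between the three tools' CLI verbs.
// "preview" = show pending changes, "apply" = make the changes,
// "destroy" = tear the stack down.
type Action = "preview" | "apply" | "destroy";

const verbs: Record<string, Record<Action, string>> = {
  cdk: { preview: "cdk diff", apply: "cdk deploy", destroy: "cdk destroy" },
  terraform: { preview: "terraform plan", apply: "terraform apply", destroy: "terraform destroy" },
  pulumi: { preview: "pulumi preview", apply: "pulumi up", destroy: "pulumi destroy" },
};

// Look up the command for a given tool and action.
export function commandFor(tool: keyof typeof verbs, action: Action): string {
  return verbs[tool][action];
}
```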

&lt;h2&gt;
  
  
  Infrastructure as Code library repos
&lt;/h2&gt;

&lt;p&gt;Let's look at the three repos that I wrote for deploying the same type of 3-tier web application to AWS using ECS Fargate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CDK: &lt;a href="https://github.com/briancaffey/cdk-django" rel="noopener noreferrer"&gt;&lt;code&gt;cdk-django&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Terraform: &lt;a href="https://github.com/briancaffey/terraform-aws-django" rel="noopener noreferrer"&gt;&lt;code&gt;terraform-aws-django&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pulumi: &lt;a href="https://github.com/briancaffey/pulumi-aws-django" rel="noopener noreferrer"&gt;&lt;code&gt;pulumi-aws-django&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Language
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;cdk-django&lt;/code&gt; and &lt;code&gt;pulumi-aws-django&lt;/code&gt; are both written in TypeScript. &lt;code&gt;terraform-aws-django&lt;/code&gt; is written in HCL, a domain-specific language created by HashiCorp. &lt;code&gt;cdk-django&lt;/code&gt; is published to both npm and PyPI, so it can be used in JavaScript, TypeScript and Python projects. Other languages are supported as well, but the library itself must be written in TypeScript so it can be transpiled to other languages using &lt;a href="https://github.com/aws/jsii" rel="noopener noreferrer"&gt;jsii&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My Pulumi library is written in TypeScript and is published to npm. For now it can only be used in JavaScript and TypeScript projects. Pulumi does offer a way to author a component in one language and publish it for the other major languages, but I haven't done this yet. See &lt;a href="https://github.com/pulumi/pulumi-component-provider-ts-boilerplate" rel="noopener noreferrer"&gt;this GitHub repo&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;HCL is pretty simple once you get used to it. I find that I don't like adding lots of logic in Terraform code because it takes away from the readability of a module. There is a tool called &lt;a href="https://developer.hashicorp.com/terraform/cdktf" rel="noopener noreferrer"&gt;CDKTF&lt;/a&gt; that allows you to write Terraform in TypeScript, but I haven't used it yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Release management, versioning and publishing
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pulumi-aws-django&lt;/code&gt; and &lt;code&gt;terraform-aws-django&lt;/code&gt; both use &lt;code&gt;release-please&lt;/code&gt; for automatically generating a changelog file and bumping versions. &lt;code&gt;release-please&lt;/code&gt; is an open-source tool from Google that they use to version their Terraform GCP modules. Whenever I push new commits to &lt;code&gt;main&lt;/code&gt;, a PR is created that adds entries to the CHANGELOG.md file and bumps the version of the library in &lt;code&gt;package.json&lt;/code&gt; based on commit messages; merging the PR creates a new git tag (e.g. &lt;code&gt;v1.2.3&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cdk-django&lt;/code&gt; uses &lt;a href="https://github.com/projen/projen" rel="noopener noreferrer"&gt;&lt;code&gt;projen&lt;/code&gt;&lt;/a&gt; for maintaining the changelog, bumping versions and publishing to npm. It is popular among developers in the CDK community and is a really awesome tool, since it uses a single file (&lt;code&gt;.projenrc.ts&lt;/code&gt;) to configure your entire repo, including files like &lt;code&gt;tsconfig.json&lt;/code&gt;, &lt;code&gt;package.json&lt;/code&gt;, and even GitHub Actions workflows. It has a lot of configuration options, but I'm using it in a pretty simple way: it generates a new release and adds items to the changelog when I manually trigger a GitHub Action.&lt;/p&gt;

&lt;p&gt;These tools are both based on &lt;a href="https://www.conventionalcommits.org/en/v1.0.0/" rel="noopener noreferrer"&gt;conventional commits&lt;/a&gt; to automatically update the Changelog file.&lt;/p&gt;
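&lt;p&gt;As a rough illustration of what these tools do, here is how conventional commit messages map to semantic version bumps. This is a simplified sketch, not the actual logic of &lt;code&gt;release-please&lt;/code&gt; or &lt;code&gt;projen&lt;/code&gt;, which handle many more cases (scopes, footers, pre-1.0 rules, etc.):&lt;/p&gt;

```typescript
// Simplified mapping from conventional commit messages to a semver bump.
// "feat:" -> minor, "fix:" -> patch, breaking changes -> major,
// anything else (chore:, docs:, ...) -> no release.
type Bump = "major" | "minor" | "patch" | null;

export function bumpFromCommits(messages: string[]): Bump {
  let bump: Bump = null;
  for (const msg of messages) {
    // A "!" after the type/scope or a BREAKING CHANGE footer means a major bump.
    if (msg.includes("BREAKING CHANGE") || /^\w+(\(.+\))?!:/.test(msg)) return "major";
    if (/^feat(\(.+\))?:/.test(msg)) bump = "minor";
    else if (/^fix(\(.+\))?:/.test(msg) && bump !== "minor") bump = "patch";
  }
  return bump;
}
```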

&lt;p&gt;I'm still manually publishing my &lt;code&gt;pulumi-aws-django&lt;/code&gt; package from the CLI. I need to add a GitHub Action to do this for me. This and other backlog items are listed at the end of the article!&lt;/p&gt;

&lt;h3&gt;
  
  
  Makefile, examples and local development
&lt;/h3&gt;

&lt;p&gt;Each repo has a Makefile that includes commands that I frequently use when developing new features or fixing bugs. Each repo has commands for the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;synthesizing CDK code to CloudFormation / running &lt;code&gt;terraform plan&lt;/code&gt; / running &lt;code&gt;pulumi preview&lt;/code&gt; for both the base and app stacks&lt;/li&gt;
&lt;li&gt;creating/updating an ad hoc base stack called &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;destroying resources in the ad-hoc base stack called &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;creating an ad hoc app stack called &lt;code&gt;alpha&lt;/code&gt; that uses resources from &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;destroying an ad hoc app stack called &lt;code&gt;alpha&lt;/code&gt; that uses resources from &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;creating/updating a prod base stack called &lt;code&gt;stage&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;destroying resources in the prod base stack called &lt;code&gt;stage&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;creating a prod app stack called &lt;code&gt;stage&lt;/code&gt; that uses resources from the &lt;code&gt;stage&lt;/code&gt; base stack&lt;/li&gt;
&lt;li&gt;destroying resources in the prod app stack called &lt;code&gt;stage&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's an example of what these commands look like in &lt;code&gt;pulumi-aws-django&lt;/code&gt; for prod infrastructure base and app stacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;prod-base-preview&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/base &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; preview

&lt;span class="nl"&gt;prod-base-up&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/base &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; up &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="nl"&gt;prod-base-destroy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/base &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; destroy &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="nl"&gt;prod-app-preview&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/app &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; preview

&lt;span class="nl"&gt;prod-app-preview-diff&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/app &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; preview &lt;span class="nt"&gt;--diff&lt;/span&gt;

&lt;span class="nl"&gt;prod-app-up&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/app &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; up &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="nl"&gt;prod-app-destroy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;build&lt;/span&gt;
    pulumi &lt;span class="nt"&gt;-C&lt;/span&gt; examples/prod/app &lt;span class="nt"&gt;--stack&lt;/span&gt; stage &lt;span class="nt"&gt;--non-interactive&lt;/span&gt; destroy &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I currently don't have tests for these libraries; for now, the most effective way of verifying that things work correctly is to use the &lt;code&gt;c/m/c&lt;/code&gt;s to create environments and smoke-test them.&lt;/p&gt;

&lt;p&gt;Adding unit tests is another item for the backlog.&lt;/p&gt;

&lt;h3&gt;
  
  
  ad-hoc vs prod
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://briancaffey.github.io/2022/03/27/ad-hoc-developer-environments-for-django-with-aws-ecs-terraform-and-github-actions" rel="noopener noreferrer"&gt;the last article I wrote was about ad hoc environments&lt;/a&gt;, also known as "on-demand" or "preview" environments&lt;/li&gt;
&lt;li&gt;the motivation for using ad-hoc environments is speed and cost (you can stand up an environment in less time and you share the costs of the base environment, including VPC, ALB, RDS)&lt;/li&gt;
&lt;li&gt;you can completely ignore "ad-hoc" environments and use the "prod" infrastructure for any number of environments (such as dev, QA, RC, stage and prod)&lt;/li&gt;
&lt;li&gt;prod can be used for a production environment and any number of pre-production environments&lt;/li&gt;
&lt;li&gt;multiple environments built with "prod" infrastructure can be configured with "knobs and dials" (e.g., how big the app and DB instances are, how many tasks to run in a service, etc.)&lt;/li&gt;
&lt;li&gt;the "prod" infrastructure should be the same for the "production" environment and the "staging" environment&lt;/li&gt;
&lt;/ul&gt;
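&lt;p&gt;The "knobs and dials" idea can be sketched as a typed config object with per-environment overrides on top of shared defaults. The field names and values here are hypothetical, not the actual inputs of &lt;code&gt;cdk-django&lt;/code&gt;, &lt;code&gt;terraform-aws-django&lt;/code&gt; or &lt;code&gt;pulumi-aws-django&lt;/code&gt;:&lt;/p&gt;

```typescript
// Illustrative per-environment configuration for environments built
// from the same "prod" infrastructure code.
interface EnvConfig {
  instanceClass: string; // RDS instance size
  webTaskCount: number;  // number of ECS tasks for the web service
  cpu: number;           // CPU units per task
  memory: number;        // memory (MiB) per task
}

// Shared defaults used by pre-production environments.
const defaults: EnvConfig = { instanceClass: "db.t3.micro", webTaskCount: 1, cpu: 256, memory: 512 };

// prod turns the dials up; stage uses the same code with the defaults.
export function configFor(env: "stage" | "prod"): EnvConfig {
  const overrides: Partial<EnvConfig> =
    env === "prod" ? { instanceClass: "db.t3.medium", webTaskCount: 2, cpu: 512, memory: 1024 } : {};
  return { ...defaults, ...overrides };
}
```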

&lt;h3&gt;
  
  
  Directory structure
&lt;/h3&gt;

&lt;p&gt;The directory structures for each repo are all similar with some minor differences.&lt;/p&gt;

&lt;p&gt;There are two types of environments: &lt;code&gt;ad-hoc&lt;/code&gt; and &lt;code&gt;prod&lt;/code&gt;. Within each, there are two directories: &lt;code&gt;base&lt;/code&gt; and &lt;code&gt;app&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each repo has a directory called &lt;code&gt;internal&lt;/code&gt; which contains building blocks used by the &lt;code&gt;c/m/c&lt;/code&gt;s that are exposed. The contents of the &lt;code&gt;internal&lt;/code&gt; directories are not intended to be used directly by consumers of the libraries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CDK construct library repo structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/cdk-django$ tree -L 4 -d src/
src/
├── constructs
│   ├── ad-hoc
│   │   ├── app
│   │   └── base
│   ├── internal
│   │   ├── alb
│   │   ├── bastion
│   │   ├── customResources
│   │   │   └── highestPriorityRule
│   │   ├── ecs
│   │   │   ├── iam
│   │   │   ├── management-command
│   │   │   ├── redis
│   │   │   ├── scheduler
│   │   │   ├── web
│   │   │   └── worker
│   │   ├── rds
│   │   ├── sg
│   │   └── vpc
│   └── prod
│       ├── app
│       └── base
└── examples
    └── ad-hoc
        ├── app
        │   └── config
        └── base
            └── config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Terraform module library repo structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/terraform-aws-django$ tree -L 4 -d modules
modules
├── ad-hoc
│   ├── app
│   └── base
├── internal
│   ├── alb
│   ├── autoscaling
│   ├── bastion
│   ├── ecs
│   │   ├── ad-hoc
│   │   │   ├── celery_beat
│   │   │   ├── celery_worker
│   │   │   ├── cluster
│   │   │   ├── management_command
│   │   │   ├── redis
│   │   │   └── web
│   │   └── prod
│   │       ├── celery_beat
│   │       ├── celery_worker
│   │       ├── cluster
│   │       ├── management_command
│   │       └── web
│   ├── elasticache
│   ├── iam
│   ├── rds
│   ├── route53
│   ├── s3
│   ├── sd
│   └── sg
└── prod
    ├── app
    └── base
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pulumi component library repo structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/pulumi-aws-django$ tree -L 3 src/
src/
├── components
│   ├── ad-hoc
│   │   ├── README.md
│   │   ├── app
│   │   └── base
│   └── internal
│       ├── README.md
│       ├── alb
│       ├── bastion
│       ├── cw
│       ├── ecs
│       ├── iam
│       ├── rds
│       └── sg
└── util
    ├── index.ts
    └── taggable.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pulumi examples directory&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/pulumi-aws-django$ tree -L 3 examples/
examples/
└── ad-hoc
    ├── app
    │   ├── Pulumi.alpha.yaml
    │   ├── Pulumi.yaml
    │   ├── index.ts
    │   ├── node_modules
    │   ├── package-lock.json
    │   ├── package.json
    │   └── tsconfig.json
    └── base
        ├── Pulumi.yaml
        ├── bin
        ├── index.ts
        ├── package-lock.json
        ├── package.json
        └── tsconfig.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CLOC
&lt;/h3&gt;

&lt;p&gt;Let's use CLOC (count lines of code) to compare the lines of code used in the &lt;code&gt;c/m/c&lt;/code&gt;s of CDK, Terraform and Pulumi.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;cdk-django&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/cdk-django$ cloc src/constructs/
      14 text files.
      14 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.94  T=0.04 s (356.1 files/s, 30040.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TypeScript                      13            155             59            908
Python                           1             18              8             33
-------------------------------------------------------------------------------
SUM:                            14            173             67            941
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform-aws-django&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/terraform-aws-django$ cloc modules/
      68 text files.
      58 unique files.
      11 files ignored.

github.com/AlDanial/cloc v 1.94  T=0.15 s (385.9 files/s, 20585.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HCL                             55            472            205           2390
Markdown                         3              7              0             20
-------------------------------------------------------------------------------
SUM:                            58            479            205           2410
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;pulumi-aws-django&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/git/github/pulumi-aws-django$ cloc src/components/
      15 text files.
      15 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.94  T=0.11 s (134.5 files/s, 12924.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TypeScript                      13            110            176           1119
Markdown                         2              6              0             30
-------------------------------------------------------------------------------
SUM:                            15            116            176           1149
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Communities
&lt;/h2&gt;

&lt;p&gt;The CDK, Terraform and Pulumi communities are all great and a lot of people helped when I got stuck on issues writing these libraries. Thank you!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cdk.dev/" rel="noopener noreferrer"&gt;cdk.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discuss.hashicorp.com/c/terraform-core/27" rel="noopener noreferrer"&gt;Terraform Section of HashiCorp Discuss Forum&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slack.pulumi.com/" rel="noopener noreferrer"&gt;Pulumi Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  μblog
&lt;/h2&gt;

&lt;p&gt;μblog is a micro blogging application that I have written using Django and Vue.js. Here's a screenshot of the homepage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsndugpmfr81w3dd7qjxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsndugpmfr81w3dd7qjxt.png" alt="ublog" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a pretty simple app. Posts contain text and an optional image. Logged-in users can write posts and like posts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mono-repo structure
&lt;/h3&gt;

&lt;p&gt;It lives in a GitHub mono repo called &lt;code&gt;django-step-by-step&lt;/code&gt;. This mono repo contains a few different things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend Django application&lt;/li&gt;
&lt;li&gt;frontend Vue.js application&lt;/li&gt;
&lt;li&gt;IaC code that uses c/m/c from &lt;code&gt;cdk-django&lt;/code&gt;, &lt;code&gt;terraform-aws-django&lt;/code&gt; and &lt;code&gt;pulumi-aws-django&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Actions workflows for both Infrastructure deployments and application deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;μblog is the reference application that I deploy to infrastructure created with CDK, Terraform and Pulumi. μblog is meant to represent a generic 12 Factor application that uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gunicorn for a backend API&lt;/li&gt;
&lt;li&gt;Vue.js for a client that consumes the backend API&lt;/li&gt;
&lt;li&gt;celery for async task processing&lt;/li&gt;
&lt;li&gt;celery beat for scheduling tasks&lt;/li&gt;
&lt;li&gt;Postgres for relational data&lt;/li&gt;
&lt;li&gt;Redis for caching and message brokering&lt;/li&gt;
&lt;li&gt;S3 for object storage&lt;/li&gt;
&lt;li&gt;Django admin for a simple admin interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is a lot more I could add on μblog. For now I'll just mention that it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;has a great local development environment (supports both docker-compose and virtual environments)&lt;/li&gt;
&lt;li&gt;demonstrates how to use Django in different ways: it implements the same application using both function-based views (FBVs) and class-based views (CBVs), and implements both a REST API (with FBVs and CBVs) and GraphQL&lt;/li&gt;
&lt;li&gt;uses GitHub Actions for running unit tests&lt;/li&gt;
&lt;li&gt;uses k6 for load testing&lt;/li&gt;
&lt;li&gt;has a documentation site (made with VuePress) deployed to GitHub Pages: &lt;a href="https://briancaffey.github.io/django-step-by-step/" rel="noopener noreferrer"&gt;https://briancaffey.github.io/django-step-by-step/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Infrastructure Deep Dive
&lt;/h1&gt;

&lt;p&gt;Let's go through each of the &lt;code&gt;c/m/c&lt;/code&gt;s used in the three libraries. I'll cover some of the organizational decisions, dependencies and differences between how things are done between CDK, Terraform and Pulumi.&lt;/p&gt;

&lt;p&gt;I'll first talk about the two stacks used in ad hoc environments: &lt;code&gt;base&lt;/code&gt; and &lt;code&gt;app&lt;/code&gt;. Then I'll talk about the prod environments which are also composed of &lt;code&gt;base&lt;/code&gt; and &lt;code&gt;app&lt;/code&gt; stacks.&lt;/p&gt;

&lt;p&gt;Keep in mind that there aren't that many differences between the ad hoc environment base and app stacks and the prod environment base and app stacks. A future optimization could be to use a single base and app stack, but I think there is a trade-off between readability and DRYness of infrastructure code, &lt;strong&gt;especially with Terraform&lt;/strong&gt;. In general I try to use as few conditionals and as little logic as possible in Terraform code. It is much easier to have dynamic configuration in CDK and Pulumi, and probably also in other tools like CDKTF (which I have not yet tried).&lt;/p&gt;

&lt;h2&gt;
  
  
  Splitting up the stacks
&lt;/h2&gt;

&lt;p&gt;While it is possible to put all resources in a single stack in Terraform, CDK and Pulumi alike, it is not recommended to do so. Each tool supports splitting resources across stacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform enables this with outputs and &lt;code&gt;terraform_remote_state&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pulumi encourages the use of &lt;a href="https://www.pulumi.com/docs/guides/organizing-projects-stacks/" rel="noopener noreferrer"&gt;micro stacks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CDK has an article on how to &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/stack_how_to_create_multiple_stacks.html" rel="noopener noreferrer"&gt;create an app with multiple stacks&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
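&lt;p&gt;To make the base/app split concrete, here is a sketch of the shape of the values an app stack consumes from a base stack. The names are illustrative, not the actual outputs of my libraries; in Pulumi these would come from a &lt;code&gt;StackReference&lt;/code&gt;, in Terraform from &lt;code&gt;terraform_remote_state&lt;/code&gt; outputs, and in CDK from cross-stack references:&lt;/p&gt;

```typescript
// Illustrative shape of the values an "app" stack consumes from a
// "base" stack (names are hypothetical).
interface BaseStackOutputs {
  vpcId: string;
  albArn: string;
  listenerArn: string;
  rdsAddress: string;
  baseStackName: string;
}

// The app stack checks that the outputs it depends on are present
// before creating any resources of its own.
export function validateOutputs(outputs: Partial<BaseStackOutputs>): BaseStackOutputs {
  for (const key of ["vpcId", "albArn", "listenerArn", "rdsAddress", "baseStackName"] as const) {
    if (!outputs[key]) throw new Error(`missing base stack output: ${key}`);
  }
  return outputs as BaseStackOutputs;
}
```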

&lt;p&gt;My design decision was to keep things limited to two stacks. Later on it would be interesting to try splitting out another stack.&lt;/p&gt;

&lt;p&gt;Also, on-demand environments really lend themselves to stacks that are split up.&lt;/p&gt;

&lt;p&gt;In the section &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/resources.html" rel="noopener noreferrer"&gt;"Passing unique identifiers"&lt;/a&gt;, the CDK documentation recommends keeping the two stacks in the same app. In Terraform and Pulumi, each stack lives in its own project.&lt;/p&gt;

&lt;p&gt;There is a balance to be found between single stacks and micro stacks. Both the base and app &lt;code&gt;c/m/c&lt;/code&gt;s could be split out further. For example, the &lt;code&gt;base&lt;/code&gt; &lt;code&gt;c/m/c&lt;/code&gt;s could be split into &lt;code&gt;networking&lt;/code&gt; and &lt;code&gt;rds&lt;/code&gt;. The &lt;code&gt;app&lt;/code&gt; stack could be split into different ECS services so that their infrastructure can be deployed independently, like &lt;code&gt;cluster&lt;/code&gt;, &lt;code&gt;backend&lt;/code&gt; and &lt;code&gt;frontend&lt;/code&gt;. The more resources a stack has, the longer it takes to deploy and the riskier it gets, but adding lots of stacks can add to mental overhead and pipeline complexity. Each tool has ways of dealing with these complexities (CDK Pipelines, Terragrunt, Pulumi Automation API), but I won't be getting into any of these options in this article. I would like to try them out and share my findings in a future article.&lt;/p&gt;

&lt;p&gt;My rules of thumb are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;single stacks are bad because you don't want to put all your eggs in one basket; however, your IaC tool should give you confidence about what is going to change when you try to make a change&lt;/li&gt;
&lt;li&gt;lots of small stacks can cause overhead and make things more complex than they need to be&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Ad hoc base overview
&lt;/h2&gt;

&lt;p&gt;Here's an overview of the resources used in an ad hoc base environment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(Inputs)&lt;/li&gt;
&lt;li&gt;(Optional environment configs)&lt;/li&gt;
&lt;li&gt;VPC and Service Discovery&lt;/li&gt;
&lt;li&gt;S3&lt;/li&gt;
&lt;li&gt;Security Groups&lt;/li&gt;
&lt;li&gt;Load Balancer&lt;/li&gt;
&lt;li&gt;RDS&lt;/li&gt;
&lt;li&gt;Bastion Host&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Visualization
&lt;/h3&gt;

&lt;p&gt;Here's a dependency graph showing all of the resources in ad hoc base stack. This can be found on the &lt;code&gt;Resources&lt;/code&gt; tab of the ad hoc base stack in the Pulumi console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu85q1p5h5n7fw9bgs370.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu85q1p5h5n7fw9bgs370.png" alt="Graph view of ad hoc base infrastructure" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Inputs
&lt;/h3&gt;

&lt;p&gt;There are only two required inputs for the ad hoc base stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACM certificate ARN&lt;/li&gt;
&lt;li&gt;Domain Name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I store these values in environment variables for the pipelines in CDK, Terraform and Pulumi. When running pipelines from my local environment, they are exported in my shell before running deploy/apply/up or synth/plan/preview.&lt;/p&gt;
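&lt;p&gt;A minimal sketch of reading and validating these two inputs from the environment before a deploy. The environment variable names here are illustrative, not necessarily the ones my pipelines use:&lt;/p&gt;

```typescript
// Read and validate the two required base stack inputs from
// environment variables, failing fast if either is missing.
export interface BaseStackInputs {
  certificateArn: string; // ACM certificate ARN
  domainName: string;     // e.g. example.com
}

export function readInputs(env: Record<string, string | undefined>): BaseStackInputs {
  const certificateArn = env.CERTIFICATE_ARN;
  const domainName = env.DOMAIN_NAME;
  if (!certificateArn || !domainName) {
    throw new Error("CERTIFICATE_ARN and DOMAIN_NAME must be set before running deploy/apply/up");
  }
  return { certificateArn, domainName };
}
```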

&lt;h3&gt;
  
  
  VPC
&lt;/h3&gt;

&lt;p&gt;The VPC is the first resource created as part of the &lt;code&gt;base&lt;/code&gt; stack. There are official, high-level constructs in each IaC tool for building VPCs and all related networking resources.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;awsx&lt;/code&gt; has a VPC module&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;terraform-aws-vpc&lt;/code&gt; module&lt;/li&gt;
&lt;li&gt;L2 VPC Construct in CDK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;one_nat_gateway_per_az = false&lt;/code&gt; setting in the Terraform VPC module doesn't seem to exist on the &lt;code&gt;awsx.ec2.Vpc&lt;/code&gt; component. This setting adds to cost savings since it uses one NAT Gateway instead of two or three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;Pulumi and Terraform can be used in a similar way to define security groups. CDK has a much more concise option for defining ingress and egress rules for security groups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;albSecurityGroup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SecurityGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AlbSecurityGroup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;albSecurityGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addIngressRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Peer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;anyIpv4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;Port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;443&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HTTPS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;albSecurityGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addIngressRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Peer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;anyIpv4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;Port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HTTP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Load Balancer Resources
&lt;/h3&gt;

&lt;p&gt;There's not much to comment on here. In each library I have a resource group that defines the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application Load Balancer&lt;/li&gt;
&lt;li&gt;A default target group&lt;/li&gt;
&lt;li&gt;An HTTP listener that redirects to HTTPS&lt;/li&gt;
&lt;li&gt;An HTTPS listener with a default "fixed-response" action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Properties from these resources are used in the "app" stack to build listener rules for ECS services that are configured with load balancers, such as the backend and frontend web services.&lt;/p&gt;

&lt;p&gt;Ad hoc app environments all share a common load balancer from the base stack.&lt;/p&gt;
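&lt;p&gt;Because all ad hoc environments share one HTTPS listener, each environment's listener rules need a unique priority. Here is a minimal sketch of picking the next free priority; it mirrors the idea behind the &lt;code&gt;highestPriorityRule&lt;/code&gt; custom resource in &lt;code&gt;cdk-django&lt;/code&gt;, but is not its actual code:&lt;/p&gt;

```typescript
// Pick the next available ALB listener rule priority given the
// priorities already in use on the shared listener.
export function nextRulePriority(existing: number[]): number {
  const highest = existing.length > 0 ? Math.max(...existing) : 0;
  const next = highest + 1;
  // ALB listener rule priorities must be between 1 and 50000.
  if (next > 50000) throw new Error("no listener rule priorities left");
  return next;
}
```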

&lt;h3&gt;
  
  
  RDS Resources
&lt;/h3&gt;

&lt;p&gt;All three libraries have the RDS security group and Subnet Group in the same &lt;code&gt;c/m/c&lt;/code&gt; as the RDS instance. The SG and DB Subnet group could alternatively be grouped closer to the other network resources.&lt;/p&gt;

&lt;p&gt;Currently the RDS resources are part of the "base" stack for each library. A future optimization may be to break the RDS instance out of the "base" stack and put it in its own stack. The "RDS" stack would be dependent on the "base" stack, and then "app" stack would then be dependent on both the "base" stack and the "RDS" stack. More stacks isn't necessarily a bad thing, but for my initial implementation of these libraries I have decided to keep the "micro stacks" approach limited to only 2 stacks for an environment.&lt;/p&gt;

&lt;p&gt;The way that database secrets are handled is another place where CDK differs from Terraform and Pulumi. I am currently "hardcoding" the RDS password for Terraform and Pulumi, while in CDK I am using a Secrets Manager Secret for the database credential.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dbSecret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dbSecretName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;secret for rds&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;generateSecretString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;secretStringTemplate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="na"&gt;generateStringKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;password&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;excludePunctuation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;includeSpace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the &lt;code&gt;DatabaseInstance&lt;/code&gt; props we can then use this secret like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    credentials: Credentials.fromSecret(secret),
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the application deployed with CDK, I use a Django settings module package that uses a package called &lt;code&gt;aws_secretsmanager_caching&lt;/code&gt; to get and cache the secrets manager secret for the database, whereas in the apps deployed with Terraform and Pulumi I read in the password from an environment variable.&lt;/p&gt;

&lt;p&gt;The Terraform and Pulumi database instance arguments simply accept a &lt;code&gt;password&lt;/code&gt; field. This will be another item for the backlog for Terraform and Pulumi. The &lt;a href="https://www.pulumi.com/registry/packages/random/api-docs/randompassword/" rel="noopener noreferrer"&gt;&lt;code&gt;randompassword&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://www.pulumi.com/registry/packages/aws/api-docs/secretsmanager/secretversion/" rel="noopener noreferrer"&gt;&lt;code&gt;secretversion&lt;/code&gt;&lt;/a&gt; components can be used to do this.&lt;/p&gt;
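&lt;p&gt;A minimal sketch of what that backlog item might look like in Pulumi, using those two components (resource names and the RDS instance arguments here are illustrative):&lt;/p&gt;

```typescript
import * as aws from "@pulumi/aws";
import * as random from "@pulumi/random";

// generate a password instead of hardcoding one
const dbPassword = new random.RandomPassword("dbPassword", {
  length: 32,
  special: false, // roughly the same effect as excludePunctuation in CDK
});

// store the generated password in Secrets Manager
const dbSecret = new aws.secretsmanager.Secret("dbSecret");
new aws.secretsmanager.SecretVersion("dbSecretVersion", {
  secretId: dbSecret.id,
  secretString: dbPassword.result,
});

// and pass it to the RDS instance arguments
const db = new aws.rds.Instance("db", {
  engine: "postgres",
  instanceClass: "db.t3.micro",
  allocatedStorage: 20,
  username: "postgres",
  password: dbPassword.result,
});
```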

&lt;h3&gt;
  
  
  Bastion Host
&lt;/h3&gt;

&lt;p&gt;There are two main use cases for the bastion host in ad-hoc environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When creating a new ad hoc app environment, the bastion host is used to create a new database called &lt;code&gt;{ad-hoc-env-name}-db&lt;/code&gt; that the new ad hoc environment will use. (There might be another way of doing this, but using a bastion host is working well for now).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you use a database management tool like DBeaver on your local machine, the bastion host can help you connect to the RDS instance in a private subnet. The bastion host instance is configured to run a service that forwards traffic on port 5432 to the RDS instance. If you port forward from your local machine to the bastion host on port 5432, you can connect to RDS by simply connecting to &lt;code&gt;localhost:5432&lt;/code&gt; on your local machine.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to manage SSH keys since you connect to the instance in a private subnet using SSM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm start-session &lt;span class="nt"&gt;--target&lt;/span&gt; &lt;span class="nv"&gt;$INSTANCE_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
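&lt;p&gt;For the DBeaver use case above, SSM can also open the port forwarding session directly. This assumes the Session Manager plugin for the AWS CLI is installed locally:&lt;/p&gt;

```shell
aws ssm start-session --target "$INSTANCE_ID" \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["5432"],"localPortNumber":["5432"]}'
```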



&lt;h3&gt;
  
  
  Outputs
&lt;/h3&gt;

&lt;p&gt;Here are the outputs for the ad hoc base stack used in Terraform and Pulumi:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vpc_id&lt;/li&gt;
&lt;li&gt;assets_bucket_name&lt;/li&gt;
&lt;li&gt;private_subnet_ids&lt;/li&gt;
&lt;li&gt;app_sg_id&lt;/li&gt;
&lt;li&gt;alb_sg_id&lt;/li&gt;
&lt;li&gt;listener_arn&lt;/li&gt;
&lt;li&gt;alb_dns_name&lt;/li&gt;
&lt;li&gt;task_role_arn&lt;/li&gt;
&lt;li&gt;execution_role_arn&lt;/li&gt;
&lt;li&gt;rds_address&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In CDK, the stack references in the app stack don't reference the unique identifiers from the base stack (such as the VPC id or bastion host instance id); instead they reference the properties of the stack, which have types like &lt;code&gt;Vpc&lt;/code&gt; and &lt;code&gt;RdsInstance&lt;/code&gt;. More on this in the section &lt;strong&gt;Passing data between stacks&lt;/strong&gt; below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ad hoc app overview
&lt;/h2&gt;

&lt;p&gt;The ad hoc app is a group of resources that powers an on-demand environment meant to be short-lived, for testing, QA, validation, demos, etc.&lt;/p&gt;

&lt;p&gt;This visualization shows all of the resources in the ad hoc app stack. It also comes from the Pulumi console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0po3apmpln1twxao26s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0po3apmpln1twxao26s.png" alt="Graph view of ad hoc app infrastructure" width="800" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Cluster
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;This is a small component that defines both the ECS Cluster and the default capacity providers&lt;/li&gt;
&lt;li&gt;It defaults to not using &lt;code&gt;FARGATE_SPOT&lt;/code&gt;; ad hoc environments do use &lt;code&gt;FARGATE_SPOT&lt;/code&gt; for cost savings&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: defaultCapacityProviderStrategy on cluster not currently supported. (&lt;a href="https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-ecs.CapacityProviderStrategy.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Shared environment variables
&lt;/h3&gt;

&lt;p&gt;The backend containers should all have the same environment variables, so I define them once in the app stack and pass these into the service resource &lt;code&gt;c/m/c&lt;/code&gt;s.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I struggled to get this right in Pulumi. A lot of Pulumi examples use &lt;code&gt;JSON.stringify&lt;/code&gt; for the &lt;code&gt;containerDefinitions&lt;/code&gt; in task definitions. I was able to get help from the Pulumi Slack channel; someone recommended that I use &lt;code&gt;pulumi.jsonStringify&lt;/code&gt;, which was added in a relatively recent version of &lt;code&gt;pulumi/pulumi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;CDK allows you to declare environment variables for a containerDefinition like &lt;code&gt;{ FOO: "bar" }&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pulumi and Terraform require that values are passed like &lt;code&gt;{ name: "FOO", value: "bar"}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You could transform &lt;code&gt;{ FOO: "bar" }&lt;/code&gt; into the name/value format, but I didn't bother to do this&lt;/li&gt;
&lt;li&gt;I added an &lt;code&gt;extra_env_vars&lt;/code&gt; variable in Terraform to allow for dynamically passing extra environment variables, and I used the &lt;code&gt;concat&lt;/code&gt; function to add these to the list of default environment variables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what the code looks like for joining extra environment variables to the default environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// CDK&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extraEnvVars&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;environmentVariables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;extraEnvVars&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;environmentVariables&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    # terraform
    env_vars = concat(local.env_vars, var.extra_env_vars)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // Pulumi
    if (extraEnvVars) {
      envVars = envVars.apply(x =&amp;gt; x.concat(extraEnvVars!))
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
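&lt;p&gt;For reference, the transformation from &lt;code&gt;{ FOO: "bar" }&lt;/code&gt; into the name/value format mentioned above is only a few lines of plain TypeScript. The variable names and the example values here are illustrative:&lt;/p&gt;

```typescript
interface EnvVar {
  name: string;
  value: string;
}

// convert a CDK-style map like { FOO: "bar" } into the
// [{ name, value }] shape that Terraform and Pulumi expect
function toNameValuePairs(env: Record<string, string>): EnvVar[] {
  return Object.entries(env).map(([name, value]) => ({ name, value }));
}

// defaults first, extras appended; mirrors Terraform's concat()
const defaultEnvVars = toNameValuePairs({ DJANGO_SETTINGS_MODULE: "backend.settings.production" });
const extraEnvVars = toNameValuePairs({ FOO: "bar" });
const containerEnvVars = defaultEnvVars.concat(extraEnvVars);
```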



&lt;h3&gt;
  
  
  Route53 Record
&lt;/h3&gt;

&lt;p&gt;This is pretty straightforward in each library. Each ad hoc environment gets a Route 53 record, and listener rules for the web services (Django and Vue.js SPA) match on a combination of the host header and path patterns.&lt;/p&gt;

&lt;p&gt;This part is pretty opinionated in that it assumes you want to host the frontend and backend services on the same URL. For example, requests matching &lt;code&gt;example.com/api/*&lt;/code&gt; are routed to the backend API and all other requests matching &lt;code&gt;example.com/*&lt;/code&gt; are routed to the frontend service.&lt;/p&gt;
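&lt;p&gt;To make the rule-matching behavior concrete, here is a small TypeScript sketch that mimics how the ALB evaluates these listener rules. It is simplified: it only supports the &lt;code&gt;*&lt;/code&gt; wildcard and ignores host headers, and lower priority numbers win, as on a real ALB.&lt;/p&gt;

```typescript
interface ListenerRule {
  priority: number; // lower numbers are evaluated first
  pathPatterns: string[];
  target: string;
}

// translate an ALB-style pattern like "/api/*" into a RegExp;
// "*" matches zero or more characters
function matchesPattern(path: string, pattern: string): boolean {
  const escaped = pattern
    .split("*")
    .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(path);
}

// return the target of the first matching rule, by priority
function route(path: string, rules: ListenerRule[]): string | undefined {
  const sorted = [...rules].sort((a, b) => a.priority - b.priority);
  return sorted.find((r) => r.pathPatterns.some((p) => matchesPattern(path, p)))?.target;
}

const rules: ListenerRule[] = [
  { priority: 1, pathPatterns: ["/api/*", "/admin/*"], target: "backend" },
  { priority: 2, pathPatterns: ["/*"], target: "frontend" },
];
```

&lt;p&gt;With these rules, &lt;code&gt;/api/users&lt;/code&gt; routes to the backend and everything else falls through to the frontend.&lt;/p&gt;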

&lt;h3&gt;
  
  
  Redis
&lt;/h3&gt;

&lt;p&gt;I go into more depth about why I run a Redis instance in an ECS service in my other article. This is only for the ad hoc environments. Production environments are configured with ElastiCache running Redis.&lt;/p&gt;

&lt;p&gt;I decided not to give this service any persistent storage. It may be a good idea to avoid FARGATE_SPOT for this service, since restarts of the Redis service could cause issues in ad hoc environments. For example, you may see a lot of Celery errors in ad hoc environments if Redis is not reachable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web Service
&lt;/h3&gt;

&lt;p&gt;The web service is what defines the main Django application as well as the frontend website (JavaScript SPA or SSR site). I designed the Web Service resources group to be able to support both traditional Django apps (powered by templates) and Django apps that serve only a limited number of endpoints. This &lt;code&gt;c/m/c&lt;/code&gt; has an input parameter called &lt;code&gt;pathPatterns&lt;/code&gt; which determines which paths it serves. For example, the API container may serve traffic for &lt;code&gt;/api/*&lt;/code&gt; and &lt;code&gt;/admin/*&lt;/code&gt; only, or it may want to serve all traffic (&lt;code&gt;/*&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The way I use these components in ad hoc and prod environments is heavily opinionated in that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it assumes that the frontend SPA/SSR site should have a lower priority rule than the backend service and should route request paths matching &lt;code&gt;/*&lt;/code&gt;, while the backend service routes requests for a specific list of path patterns (&lt;code&gt;/api/*&lt;/code&gt;, &lt;code&gt;/admin/*&lt;/code&gt;, &lt;code&gt;/graphql/*&lt;/code&gt;, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You may want Django to handle most of your routes and 404 pages, in which case you would want the SPA to only handle requests matching certain paths. This would require some more consideration and careful refactoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Celery
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The reason for having a celery service is to be able to have potentially multiple workers that scale independently&lt;/li&gt;
&lt;li&gt;I use the same Pulumi component for both workers and schedulers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The terminology for this resource group could be better. Celery is one of many options for running async task workers, so it should probably be called something like &lt;code&gt;AsyncWorker&lt;/code&gt; across the board rather than using the term &lt;code&gt;celery&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Management Command
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Defines a task that can be used to run commands like &lt;code&gt;collectstatic&lt;/code&gt; and &lt;code&gt;migrate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;These tasks are run both after the initial &lt;code&gt;app&lt;/code&gt; stack deployment and before rolling application upgrades&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my Django app I have a single management command that calls &lt;code&gt;migrate&lt;/code&gt; and &lt;code&gt;collectstatic&lt;/code&gt; and runs them in the same process one after another. This management command could also be used for clearing caches during updates, loading fixtures, etc.&lt;/p&gt;

&lt;p&gt;One other thing to note about this &lt;code&gt;c/m/c&lt;/code&gt; is that it outputs a complete script that can be used in GitHub Actions (or on your CLI when testing locally) that does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;saves the &lt;code&gt;START&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;runs the task with the required settings&lt;/li&gt;
&lt;li&gt;waits for the task to complete&lt;/li&gt;
&lt;li&gt;saves the &lt;code&gt;END&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;collects the logs for the task between &lt;code&gt;START&lt;/code&gt; and &lt;code&gt;END&lt;/code&gt; and prints them to &lt;code&gt;stdout&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's an example of what the script looks like in Pulumi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;executionScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;interpolate&lt;/span&gt;&lt;span class="s2"&gt;`#!/bin/bash
START_TIME=$(date +%s000)
TASK_ID=$(aws ecs run-task --cluster &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecsClusterId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --task-definition &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;taskDefinition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;privateSubnetIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;&lt;span class="s2"&gt;],securityGroups=[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;appSgId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;],assignPublicIp=ENABLED}" | jq -r '.tasks[0].taskArn')
aws ecs wait tasks-stopped --tasks $TASK_ID --cluster &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecsClusterId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
END_TIME=$(date +%s000)
aws logs get-log-events --log-group-name &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cwLoggingResources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cwLogGroupName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --log-stream-name &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{TASK_ID##*/} --start-time $START_TIME --end-time $END_TIME | jq -r '.events[].message'
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;executionScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;executionScript&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In GitHub Actions we get this command as a stack output, save it to a file, make it executable and then run it. This is what it looks like with CDK as a CloudFormation stack output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;update&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;command"&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run_backend_update&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# get the script from the stack output with an output key that contains the string `backendUpdate`&lt;/span&gt;
          &lt;span class="s"&gt;BACKEND_UPDATE_SCRIPT=$(aws cloudformation describe-stacks \&lt;/span&gt;
            &lt;span class="s"&gt;--stack-name $AD_HOC_APP_NAME \&lt;/span&gt;
            &lt;span class="s"&gt;| jq -r '.Stacks[0].Outputs[]|select(.OutputKey | contains("backendUpdate")) | .OutputValue' \&lt;/span&gt;
          &lt;span class="s"&gt;)&lt;/span&gt;

          &lt;span class="s"&gt;echo "$BACKEND_UPDATE_SCRIPT" &amp;gt; backend_update_command.sh&lt;/span&gt;
          &lt;span class="s"&gt;cat backend_update_command.sh&lt;/span&gt;
          &lt;span class="s"&gt;sudo chmod +x backend_update_command.sh&lt;/span&gt;
          &lt;span class="s"&gt;./backend_update_command.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Passing data between stacks
&lt;/h3&gt;

&lt;p&gt;Pulumi uses stack references, Terraform uses remote state and CDK uses Stack Outputs or Stack References.&lt;/p&gt;

&lt;p&gt;Here's what this looks like in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data "terraform_remote_state" "this" {
  backend = "local"

  config = {
    path = "../base/terraform.tfstate"
  }
}

module "main" {
  source = "../../../modules/ad-hoc/app"

  vpc_id                         = data.terraform_remote_state.this.outputs.vpc_id
  assets_bucket_name             = data.terraform_remote_state.this.outputs.assets_bucket_name
  private_subnet_ids             = data.terraform_remote_state.this.outputs.private_subnet_ids
  app_sg_id                      = data.terraform_remote_state.this.outputs.app_sg_id
  alb_sg_id                      = data.terraform_remote_state.this.outputs.alb_sg_id
  listener_arn                   = data.terraform_remote_state.this.outputs.listener_arn
  alb_dns_name                   = data.terraform_remote_state.this.outputs.alb_dns_name
  service_discovery_namespace_id = data.terraform_remote_state.this.outputs.service_discovery_namespace_id
  rds_address                    = data.terraform_remote_state.this.outputs.rds_address
  domain_name                    = data.terraform_remote_state.this.outputs.domain_name
  base_stack_name                = data.terraform_remote_state.this.outputs.base_stack_name
  region                         = var.region
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In CDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseStack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ExampleAdHocBaseStack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stackName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvName&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;baseStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;appStack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ExampleAdHocAppStack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stackName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocAppEnvName&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;appStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;adHocAppEnvConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AdHocBase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseStack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AdHocBase&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;certificateArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;domainName&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;addHocApp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AdHocApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;appStack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AdHocApp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;baseStackName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;alb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;appSecurityGroup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;appSecurityGroup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;serviceDiscoveryNamespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceDiscoveryNamespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;rdsInstance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseInstance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;assetsBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assetsBucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and in Pulumi:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StackReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;org&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/ad-hoc-base/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vpcId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vpcId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assetsBucketName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assetsBucketName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;privateSubnets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;privateSubnetIds&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;appSgId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;appSgId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;albSgId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;albSgId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;listenerArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;listenerArn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;albDnsName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;albDnsName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;serviceDiscoveryNamespaceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;serviceDiscoveryNamespaceId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rdsAddress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rdsAddress&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domainName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;domainName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseStackName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stackReference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;baseStackName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// ad hoc app env&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adHocAppComponent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AdHocAppComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AdHocAppComponent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpcId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;assetsBucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;privateSubnets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;appSgId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;albSgId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;listenerArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;albDnsName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;serviceDiscoveryNamespaceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;rdsAddress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;domainName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;baseStackName&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CLI scaffolding
&lt;/h2&gt;

&lt;p&gt;CDK and Pulumi have some good options for how to scaffold a project.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulumi has &lt;code&gt;pulumi new aws-typescript&lt;/code&gt; among lots of other options (run &lt;code&gt;pulumi new -l&lt;/code&gt; to see over 200 project types). I used this to create the library itself, the examples and the pulumi projects that I use in &lt;code&gt;django-step-by-step&lt;/code&gt; that consume the library.&lt;/li&gt;
&lt;li&gt;CDK has &lt;code&gt;projen&lt;/code&gt; CLI commands which can help set up either library code or project code&lt;/li&gt;
&lt;li&gt;The major benefit of these tools is setting up &lt;code&gt;tsconfig.json&lt;/code&gt; and &lt;code&gt;package.json&lt;/code&gt; correctly&lt;/li&gt;
&lt;li&gt;Terraform is so simple that it doesn't really need tooling for scaffolding&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;

&lt;p&gt;For &lt;code&gt;terraform-aws-django&lt;/code&gt;, I tried to follow the recommendations from &lt;a href="https://www.terraform-best-practices.com/" rel="noopener noreferrer"&gt;terraform-best-practices.com&lt;/a&gt; which helped me a lot with things like consistent naming patterns and directory structures. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use the name &lt;code&gt;this&lt;/code&gt; for resources in a module where that resource is the only resource of its type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CDK and Pulumi lend themselves to more nesting and abstractions because they can be written in more familiar programming languages with better abstractions, functions, loops, classes, etc., so there are some differences in directory structure of my libraries when comparing Terraform to both CDK and Pulumi.&lt;/p&gt;

&lt;p&gt;For Pulumi and CDK, I mostly tried to follow along with recommendations from their documentation and example projects. While working with Pulumi I struggled a bit with the concepts of &lt;code&gt;Inputs&lt;/code&gt;, &lt;code&gt;Outputs&lt;/code&gt;, &lt;code&gt;pulumi.interpolate&lt;/code&gt;, &lt;code&gt;apply()&lt;/code&gt;, &lt;code&gt;all()&lt;/code&gt; and the differences between &lt;code&gt;getX&lt;/code&gt; and &lt;code&gt;getXOutput&lt;/code&gt;. There is a little bit of a learning curve here, but the documentation and examples go a long way in showing how to do things the right way.&lt;/p&gt;
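&lt;p&gt;To build intuition for &lt;code&gt;apply()&lt;/code&gt;, here is a drastically simplified toy model of an &lt;code&gt;Output&lt;/code&gt; (not Pulumi's real implementation, which also tracks resource dependencies and secret values): because the value only becomes available asynchronously, transformations have to be chained through &lt;code&gt;apply()&lt;/code&gt; rather than read directly:&lt;/p&gt;

```typescript
// Toy stand-in for pulumi.Output<T>. The value is only available
// asynchronously, so callers chain transformations with apply()
// instead of reading it directly. The real Output also tracks
// resource dependencies and secret-ness, which this sketch omits.
class ToyOutput<T> {
  constructor(private promise: Promise<T>) {}

  apply<U>(fn: (value: T) => U): ToyOutput<U> {
    return new ToyOutput(this.promise.then(fn));
  }

  // Test hook only; real Outputs never expose their value like this.
  async unwrap(): Promise<T> {
    return this.promise;
  }
}

// e.g. turning an RDS address output into a connection string:
const rdsAddress = new ToyOutput(Promise.resolve("db.internal"));
const dbUrl = rdsAddress.apply((addr) => `postgres://app@${addr}:5432/app`);
```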

&lt;h2&gt;
  
  
  Environment configuration
&lt;/h2&gt;

&lt;p&gt;Environment configuration allows for either a base or app stack to be configured with non-default values. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You may want to start a new base environment but provision a more powerful database instance class and size; you would change this using environment configuration.&lt;/li&gt;
&lt;li&gt;You might want to create an ad hoc app environment that includes some special environment variables; you could set these in environment config.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the examples above, our IaC can optionally take environment configuration values that override default values, or extend them.&lt;/p&gt;
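&lt;p&gt;As a sketch of the override/extend pattern (the config shape here is illustrative, not taken from any of the three tools), spreading defaults together with environment-specific values gives the desired behavior:&lt;/p&gt;

```typescript
// Illustrative sketch of overriding/extending default settings with
// environment config. The shapes here are hypothetical, not taken
// from CDK, Terraform or Pulumi.
interface DbConfig {
  instanceClass: string;
  allocatedStorage: number;
}

const defaultDbConfig: DbConfig = {
  instanceClass: "db.t3.micro",
  allocatedStorage: 20,
};

// A "more powerful database" override for one particular environment:
const envOverrides: Partial<DbConfig> = { instanceClass: "db.r5.large" };

// Later spreads win, so env config overrides defaults where a key is
// present and falls back to defaults where it is absent.
const dbConfig: DbConfig = { ...defaultDbConfig, ...envOverrides };
```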

&lt;ul&gt;
&lt;li&gt;Pulumi defines environment-specific config in files called &lt;code&gt;Pulumi.{env}.yaml&lt;/code&gt; (&lt;a href="https://www.pulumi.com/docs/intro/concepts/config/" rel="noopener noreferrer"&gt;Pulumi article on configuration&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Terraform uses &lt;code&gt;{env}.tfvars&lt;/code&gt; for this type of configuration&lt;/li&gt;
&lt;li&gt;CDK has several options for this type of configuration (&lt;code&gt;cdk.context.json&lt;/code&gt;, extending stack props, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For CDK I have been using &lt;code&gt;setContext&lt;/code&gt; and the &lt;code&gt;tryGetContext&lt;/code&gt; method:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;setContext&lt;/code&gt; needs to be set on the node before any child nodes are added:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;baseStack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ExampleAdHocBaseStack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stackName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvName&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;baseStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;appStack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ExampleAdHocAppStack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stackName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adHocAppEnvName&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;appStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;adHocAppEnvConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the config objects are read from JSON files like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;adHocBaseEnvConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`src/examples/ad-hoc/base/config/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;adHocBaseEnvName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.json`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;adHocAppEnvConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`src/examples/ad-hoc/app/config/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;adHocAppEnvName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.json`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The context can be used in constructs like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extraEnvVars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryGetContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;extraEnvVars&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pulumi has similar functions for getting config values; here's an example of how I get extra environment variables for app environments using Pulumi's config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;EnvVar&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nl"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;extraEnvVars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getObject&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;EnvVar&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;extraEnvVars&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my &lt;code&gt;Pulumi.alpha.yaml&lt;/code&gt; file I have the &lt;code&gt;extraEnvVars&lt;/code&gt; set like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;aws:region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
  &lt;span class="na"&gt;extraEnvVars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FOO&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BAR&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BIZ&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BUZ&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I haven't done too much with configuration yet, but it seems like the right place to build out all of the dials and switches for optional settings in stack resources that you want people to be able to change in their ad hoc environments, or that you want to set per "production" environment (QA, stage, prod, etc.).&lt;/p&gt;

&lt;h2&gt;
  
  
  Local development
&lt;/h2&gt;

&lt;p&gt;Using the Makefile targets in each library repo, my process for developing &lt;code&gt;c/m/c&lt;/code&gt;s is: make code changes, run the Makefile targets that preview/plan/diff against my AWS account, then run deploy/apply/up and wait for things to finish deploying. Once I have validated that things look correct in my account, I run the destroy command and make sure that all of the resources are removed successfully. RDS instances can take up to 10 minutes to create, which means that the base stack takes some time to test. The app environment spins up quickly, but deleting its services can sometimes get stuck and take a while.&lt;/p&gt;

&lt;p&gt;Here are some sample times for deploying ad hoc stacks with CDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CDK ad hoc base deployment time

 ✅  ExampleAdHocBaseStack (dev)

✨  Deployment time: 629.64s

# CDK ad hoc app deployment time

 ✅  ExampleAdHocAppStack (alpha)

✨  Deployment time: 126.62s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is an example of what the &lt;code&gt;pulumi preview&lt;/code&gt; command shows for the ad-hoc base stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pulumi preview
~/git/github/pulumi-aws-django$ pulumi -C examples/ad-hoc/base --stack dev preview
Previewing update (dev)

View Live: https://app.pulumi.com/briancaffey/ad-hoc-base/dev/previews/718625b2-48f5-4ef4-8ed4-9b2694fda64a

     Type                                                    Name                        Plan
 +   pulumi:pulumi:Stack                                     ad-hoc-base-dev             create
 +   └─ pulumi-contrib:components:AdHocBaseEnv               myAdHocEnv                  create
 +      ├─ pulumi-contrib:components:AlbResources            AlbResources                create
 +      │  ├─ aws:alb:TargetGroup                            DefaultTg                   create
 +      │  ├─ aws:alb:LoadBalancer                           LoadBalancer                create
 +      │  ├─ aws:alb:Listener                               HttpListener                create
 +      │  └─ aws:alb:Listener                               HttpsListener               create
 +      ├─ pulumi-contrib:components:BastionHostResources    BastionHostResources        create
 +      │  ├─ aws:iam:Role                                   BastionHostRole             create
 +      │  ├─ aws:iam:RolePolicy                             BastionHostPolicy           create
 +      │  ├─ aws:iam:InstanceProfile                        BastionHostInstanceProfile  create
 +      │  └─ aws:ec2:Instance                               BastionHostInstance         create
 +      ├─ pulumi-contrib:components:RdsResources            RdsResources                create
 +      │  ├─ aws:rds:SubnetGroup                            DbSubnetGroup               create
 +      │  ├─ aws:ec2:SecurityGroup                          RdsSecurityGroup            create
 +      │  └─ aws:rds:Instance                               DbInstance                  create
 +      ├─ pulumi-contrib:components:SecurityGroupResources  SecurityGroupResources      create
 +      │  ├─ aws:ec2:SecurityGroup                          AlbSecurityGroup            create
 +      │  └─ aws:ec2:SecurityGroup                          AppSecurityGroup            create
 +      ├─ aws:s3:Bucket                                     assetsBucket                create
 +      ├─ awsx:ec2:Vpc                                      dev                         create
 +      │  └─ aws:ec2:Vpc                                    dev                         create
 +      │     ├─ aws:ec2:InternetGateway                     dev                         create
 +      │     ├─ aws:ec2:Subnet                              dev-private-1               create
 +      │     │  └─ aws:ec2:RouteTable                       dev-private-1               create
 +      │     │     ├─ aws:ec2:RouteTableAssociation         dev-private-1               create
 +      │     │     └─ aws:ec2:Route                         dev-private-1               create
 +      │     ├─ aws:ec2:Subnet                              dev-private-2               create
 +      │     │  └─ aws:ec2:RouteTable                       dev-private-2               create
 +      │     │     ├─ aws:ec2:RouteTableAssociation         dev-private-2               create
 +      │     │     └─ aws:ec2:Route                         dev-private-2               create
 +      │     ├─ aws:ec2:Subnet                              dev-public-1                create
 +      │     │  ├─ aws:ec2:RouteTable                       dev-public-1                create
 +      │     │  │  ├─ aws:ec2:RouteTableAssociation         dev-public-1                create
 +      │     │  │  └─ aws:ec2:Route                         dev-public-1                create
 +      │     │  ├─ aws:ec2:Eip                              dev-1                       create
 +      │     │  └─ aws:ec2:NatGateway                       dev-1                       create
 +      │     └─ aws:ec2:Subnet                              dev-public-2                create
 +      │        ├─ aws:ec2:RouteTable                       dev-public-2                create
 +      │        │  ├─ aws:ec2:RouteTableAssociation         dev-public-2                create
 +      │        │  └─ aws:ec2:Route                         dev-public-2                create
 +      │        ├─ aws:ec2:Eip                              dev-2                       create
 +      │        └─ aws:ec2:NatGateway                       dev-2                       create
 +      └─ aws:servicediscovery:PrivateDnsNamespace          PrivateDnsNamespace         create


Outputs:
    albDnsName                 : output&amp;lt;string&amp;gt;
    albSgId                    : output&amp;lt;string&amp;gt;
    appSgId                    : output&amp;lt;string&amp;gt;
    assetsBucketName           : output&amp;lt;string&amp;gt;
    baseStackName              : "dev"
    bastionHostInstanceId      : output&amp;lt;string&amp;gt;
    domainName                 : "example.com"
    listenerArn                : output&amp;lt;string&amp;gt;
    privateSubnetIds           : output&amp;lt;string&amp;gt;
    rdsAddress                 : output&amp;lt;string&amp;gt;
    serviceDiscoveryNamespaceId: output&amp;lt;string&amp;gt;
    vpcId                      : output&amp;lt;string&amp;gt;

Resources:
    + 44 to create
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running infrastructure pipelines in GitHub Actions
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I don't currently have GitHub Actions working for all tools in all environments, this part is still a WIP but is working at a basic level. Another item for the backlog!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;.github/workflows&lt;/code&gt; directory of the &lt;code&gt;django-step-by-step&lt;/code&gt; repo, I will have the following &lt;code&gt;2 * 2 * 2 * 3 = 24&lt;/code&gt; workflows for running infrastructure as code pipelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;ad_hoc,prod&lt;span class="o"&gt;}&lt;/span&gt;_&lt;span class="o"&gt;{&lt;/span&gt;base,app&lt;span class="o"&gt;}&lt;/span&gt;_&lt;span class="o"&gt;{&lt;/span&gt;create_update,destroy&lt;span class="o"&gt;}&lt;/span&gt;_&lt;span class="o"&gt;{&lt;/span&gt;cdk,terraform,pulumi&lt;span class="o"&gt;}&lt;/span&gt;.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
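&lt;p&gt;The brace expansion above enumerates the full cross product; a quick TypeScript sketch makes the count explicit:&lt;/p&gt;

```typescript
// Generate the cross product of environment type, stack, operation and
// IaC tool to produce the 24 workflow filenames described above.
const envs = ["ad_hoc", "prod"];
const stacks = ["base", "app"];
const operations = ["create_update", "destroy"];
const tools = ["cdk", "terraform", "pulumi"];

const pipelineFiles: string[] = [];
for (const env of envs) {
  for (const stack of stacks) {
    for (const op of operations) {
      for (const tool of tools) {
        pipelineFiles.push(`${env}_${stack}_${op}_${tool}.yml`);
      }
    }
  }
}

console.log(pipelineFiles.length); // 24
```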



&lt;ul&gt;
&lt;li&gt;For CDK I'm using CDK CLI commands&lt;/li&gt;
&lt;li&gt;For Terraform I'm also using terraform CLI commands&lt;/li&gt;
&lt;li&gt;For Pulumi I'm using the official Pulumi GitHub Action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pulumi has &lt;a href="https://www.pulumi.com/docs/guides/continuous-delivery/github-actions/" rel="noopener noreferrer"&gt;a great article&lt;/a&gt; about how to use their official GitHub Action. This action calls the Pulumi CLI under the hood with all of the correct flags.&lt;/p&gt;

&lt;p&gt;The general pattern that all of these pipelines use is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do a synth/plan/preview, and upload the synth/plan/preview file to an artifact&lt;/li&gt;
&lt;li&gt;Pause and wait on manual review of the planned changes&lt;/li&gt;
&lt;li&gt;Download the artifact and run deploy/apply/up against it, or cancel the operation if the changes shown in the GitHub Actions pipeline logs are not what you expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do this by having two jobs in each GitHub Action: one for synth/plan/preview and one for deploy/apply/up.&lt;/p&gt;

&lt;p&gt;The job for deploy/apply/up includes an &lt;code&gt;environment&lt;/code&gt; that is configured in GitHub to be a protected environment that requires approvals. Even if you are the only approver (which I am on this project), this is the easiest and safest way to preview infrastructure changes before they happen. If something in the plan isn't what you wanted to change, you simply cancel the job.&lt;/p&gt;
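&lt;p&gt;A sketch of that two-job shape (job names, action versions and the environment name here are illustrative):&lt;/p&gt;

```yaml
# Illustrative two-job workflow: plan first, then a gated apply.
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: terraform plan -out=tfplan   # or cdk synth / pulumi preview
      - uses: actions/upload-artifact@v3
        with:
          name: plan
          path: tfplan

  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production   # protected environment; requires manual approval
    steps:
      - uses: actions/checkout@v3
      - uses: actions/download-artifact@v3
        with:
          name: plan
      - run: terraform apply tfplan
```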

&lt;h2&gt;
  
  
  Application deployments
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;There are two GitHub Actions pipelines for deploying the frontend and the backend. Both of these pipelines run bash scripts that call AWS CLI commands to perform rolling updates on all of the services used in the application (frontend, API, workers, scheduler)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The backend deployment script runs database migrations, the &lt;code&gt;collectstatic&lt;/code&gt; command and any other commands needed to run before the rolling update starts (clearing the cache, loading fixtures, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What is important to note here is that application deployments are not dependent on the IaC tool we use. Since we are tagging things consistently across CDK, Terraform and Pulumi, we can look up resources by tag rather than getting "outputs" of the app stacks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
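&lt;p&gt;As a sketch of that tag-based lookup (the tag keys and values here are illustrative), an AWS CLI call like the following works regardless of which IaC tool created the resource:&lt;/p&gt;

```shell
# Look up the bastion host instance ID by tag instead of by stack output.
# The tag keys/values are illustrative; the same command works no matter
# which IaC tool created the instance, as long as tagging is consistent.
aws ec2 describe-instances \
  --filters "Name=tag:env,Values=alpha" \
            "Name=tag:role,Values=bastion" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text
```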

&lt;h2&gt;
  
  
  Interacting with AWS via IaC
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CDK interacts directly with CloudFormation (and custom resources which allow for running arbitrary SDK calls and lambda functions) and provides &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/constructs.html" rel="noopener noreferrer"&gt;L1, L2 and L3 constructs&lt;/a&gt; which offer different levels of abstraction over CloudFormation.&lt;/li&gt;
&lt;li&gt;Terraform has the &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs" rel="noopener noreferrer"&gt;AWS Provider&lt;/a&gt; and the &lt;a href="https://registry.terraform.io/namespaces/terraform-aws-modules" rel="noopener noreferrer"&gt;&lt;code&gt;terraform-aws-modules&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Pulumi has the AWS Classic (&lt;code&gt;@pulumi/aws&lt;/code&gt;) &lt;strong&gt;provider&lt;/strong&gt;, the &lt;code&gt;AWSx&lt;/code&gt; (&lt;a href="https://www.pulumi.com/registry/packages/awsx/" rel="noopener noreferrer"&gt;Crosswalk for Pulumi&lt;/a&gt;) &lt;strong&gt;library&lt;/strong&gt; and the &lt;code&gt;aws_native&lt;/code&gt; &lt;strong&gt;provider&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;aws_native&lt;/code&gt; "manages and provisions resources using the AWS Cloud Control API, which typically supports new AWS features on the day of launch."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;aws_native&lt;/code&gt; looks like a really interesting option, but it is currently in public preview, so I have decided not to use it yet. I am using the AWSx library only for my VPC and associated resources; everything else uses the AWS Classic provider.&lt;/p&gt;

&lt;p&gt;For CDK I use mostly L2 constructs and some L1 constructs.&lt;/p&gt;

&lt;p&gt;For Terraform I use the VPC module from &lt;code&gt;terraform-aws-modules&lt;/code&gt;, and everything else uses the AWS Terraform provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I did not put in IaC
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;ECR (Elastic Container Registry)&lt;/li&gt;
&lt;li&gt;ACM (AWS Certificate Manager)&lt;/li&gt;
&lt;li&gt;(Roles used for deployments)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I created the Elastic Container Registry &lt;code&gt;backend&lt;/code&gt; and &lt;code&gt;frontend&lt;/code&gt; repos manually in the AWS Console. I also manually requested an ACM certificate for &lt;code&gt;*.mydomain.com&lt;/code&gt;, the testing domain that I purchased through Route 53 Domains.&lt;/p&gt;

&lt;p&gt;Another less-than-best practice I currently use is storing administrative credentials in GitHub secrets. The better approach is to create roles for the different pipelines and use OIDC to authenticate instead of storing credentials. This is another good item for the backlog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tagging
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Terraform and CDK both make it easy to automatically tag all resources in a stack&lt;/li&gt;
&lt;li&gt;It is possible to do this in Pulumi, but you need to write a little bit of code.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.pulumi.com/blog/automatically-enforcing-aws-resource-tagging-policies/" rel="noopener noreferrer"&gt;https://www.pulumi.com/blog/automatically-enforcing-aws-resource-tagging-policies/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/joeduffy/aws-tags-example/tree/master/autotag-ts" rel="noopener noreferrer"&gt;https://github.com/joeduffy/aws-tags-example/tree/master/autotag-ts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tagging is important since I look up resources by tag in GitHub Actions pipelines (for example, the Bastion Host is looked up by tag)&lt;/li&gt;
&lt;li&gt;Automatic resource tagging works through &lt;a href="https://www.pulumi.com/docs/intro/vs/terraform/" rel="noopener noreferrer"&gt;stack transformations&lt;/a&gt;, which are unique to Pulumi&lt;/li&gt;
&lt;/ul&gt;
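&lt;p&gt;To make the stack transformation approach concrete, here is a minimal sketch in plain Python of the tag-merging logic such a transformation would apply to each resource's properties; the default tag names here are made-up examples, not taken from any of the three projects:&lt;/p&gt;

```python
# Sketch of the tag-merging logic a stack transformation could apply.
# The default tags below are hypothetical placeholders.

DEFAULT_TAGS = {"env": "dev", "project": "django-step-by-step", "managed-by": "pulumi"}

def merge_tags(resource_props: dict, default_tags: dict = DEFAULT_TAGS) -> dict:
    """Return a copy of resource_props with stack-wide default tags applied.

    Tags already set on the resource win over the defaults, mirroring how
    provider-level default_tags behave in Terraform.
    """
    merged = dict(default_tags)
    merged.update(resource_props.get("tags") or {})
    props = dict(resource_props)
    props["tags"] = merged
    return props

props = merge_tags({"name": "bastion", "tags": {"role": "bastion-host"}})
print(props["tags"])
```

&lt;p&gt;In a real Pulumi program this function would run inside a registered stack transformation so every taggable resource picks up the defaults automatically.&lt;/p&gt;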

&lt;h2&gt;
  
  
  Smoke checking application environments
&lt;/h2&gt;

&lt;p&gt;Here's the list of things I check when standing up an application environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[x] Run the init/tsc, synth/plan/preview and deploy/apply/up commands successfully&lt;/li&gt;
&lt;li&gt;[x] Access the bastion host (&lt;code&gt;make aws-ssm-start-session&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Run ECSExec to access a shell in a backend container (&lt;code&gt;make aws-ecs-exec&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Test database connectivity (&lt;code&gt;python manage.py showmigrations&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Run the migrations (&lt;code&gt;python manage.py migrate&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Run collectstatic (&lt;code&gt;python manage.py collectstatic&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Visit the site (&lt;code&gt;alpha.example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[x] Publish a blog post&lt;/li&gt;
&lt;li&gt;[x] Publish a blog post with an image&lt;/li&gt;
&lt;li&gt;[x] Check celery worker logs for successfully complete scheduled tasks&lt;/li&gt;
&lt;li&gt;[x] Trigger an autoscaling event by running k6 load tests against an environment&lt;/li&gt;
&lt;li&gt;[x] Optionally deploy another backend or frontend image tag using the GitHub Actions pipelines for backend and frontend updates&lt;/li&gt;
&lt;li&gt;[x] Destroy the app stack&lt;/li&gt;
&lt;li&gt;[x] Destroy the base stack&lt;/li&gt;
&lt;/ul&gt;
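&lt;p&gt;A checklist like this lends itself to a small script. Here is a hedged sketch of a smoke-check runner; the real commands are the make targets and &lt;code&gt;manage.py&lt;/code&gt; commands listed above, substituted here with placeholders so the sketch is self-contained:&lt;/p&gt;

```python
# Minimal sketch of a smoke-check runner. The actual checklist commands
# (make targets, manage.py commands) are assumed to exist in the project;
# placeholder commands are used here.
import subprocess
import sys

def run_checks(checks):
    """Run each (name, argv) pair in order and collect pass/fail results."""
    results = {}
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True)
        results[name] = proc.returncode == 0
    return results

checks = [
    ("showmigrations", [sys.executable, "-c", "print('showmigrations placeholder')"]),
    ("collectstatic", [sys.executable, "-c", "print('collectstatic placeholder')"]),
]
print(run_checks(checks))
```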

&lt;h2&gt;
  
  
  Backlog and next steps
&lt;/h2&gt;

&lt;p&gt;Here are some of the next things I'll be working on in this project, roughly in order of importance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce manual approvals in GitHub Actions for all deployments and allow for previewing or "planning" changes before proceeding with live operations in infrastructure pipelines&lt;/li&gt;
&lt;li&gt;Switch to using OIDC for AWS authentication from GitHub Actions and remove AWS secrets from GitHub&lt;/li&gt;
&lt;li&gt;Show how to do account isolation (different accounts for prod vs pre-prod environments)&lt;/li&gt;
&lt;li&gt;GitHub Actions deployment pipeline for publishing &lt;code&gt;pulumi-aws-django&lt;/code&gt; package&lt;/li&gt;
&lt;li&gt;Complete all GitHub Action deployment pipelines for base and app stacks (both ad hoc and prod)&lt;/li&gt;
&lt;li&gt;For Pulumi and Terraform, use a Secrets Manager secret for the database password instead of hardcoding it. Use the &lt;code&gt;random&lt;/code&gt; functions to generate it&lt;/li&gt;
&lt;li&gt;Refactor GitHub Actions and make them reusable across different projects&lt;/li&gt;
&lt;li&gt;Write tests for Pulumi and CDK, and figure out how to write tests for Terraform modules&lt;/li&gt;
&lt;li&gt;Use Graviton instances and have the option to select between different architectures&lt;/li&gt;
&lt;li&gt;Standardize all resource names across CDK, Terraform and Pulumi&lt;/li&gt;
&lt;li&gt;The Pulumi components that define the resources associated with each ECS service are not very DRY&lt;/li&gt;
&lt;li&gt;Interfaces could be constructed with inheritance (base set of properties that is extended for different types of services)&lt;/li&gt;
&lt;li&gt;Fix the CDK issue with listener rule priority on the ALB. I need to use a custom resource for this, which is currently a WIP. Terraform and Pulumi look up the next available listener rule priority under the hood, so you are not required to provide it, but CDK requires it, which means that you can't create ad hoc environments in CDK without a custom resource that looks up the next available priority number.&lt;/li&gt;
&lt;li&gt;Make all three of the libraries less opinionated. For example, the celery worker and scheduler should be optional and the frontend component should also be optional&lt;/li&gt;
&lt;li&gt;Experiment with using a frontend with SSR. This is supported by Quasar, the framework I'm currently using to build my frontend SPA site&lt;/li&gt;
&lt;/ul&gt;
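&lt;p&gt;As an illustration of the database password item above, here is a sketch of the generation step using Python's standard library, analogous to Terraform's &lt;code&gt;random_password&lt;/code&gt; resource; writing the value to Secrets Manager is omitted:&lt;/p&gt;

```python
# Sketch of generating a database password, analogous to the
# `random_password` resource in Terraform. Storing the value in
# Secrets Manager is left out of this sketch.
import secrets
import string

def generate_db_password(length: int = 32) -> str:
    """Generate a random alphanumeric password.

    RDS passwords disallow '/', '@', '"' and spaces, so sticking to
    alphanumerics keeps the value safe to pass around as an
    environment variable.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_db_password())
```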

&lt;p&gt;If you want to get involved or help with any of the above, please let me know!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I first started out with IaC following this project &lt;a href="https://github.com/aws-samples/ecs-refarch-cloudformation" rel="noopener noreferrer"&gt;aws-samples/ecs-refarch-cloudformation&lt;/a&gt; (which is pretty old at this point) and wrote a lot of CloudFormation by hand. The pain of doing that led me to explore the CDK with Python. I learned TypeScript by rewriting the Python CDK code I wrote in TypeScript. I later worked with a team that was more experienced in Terraform and learned how to use that. I feel like Pulumi takes the best of the two tools and has a really great developer experience. There is a little bit of a learning curve with Pulumi, and you give up some of the simplicity of Terraform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharing
&lt;/h2&gt;

&lt;p&gt;Thank you for reading to the end! I have posted the article on the following channels, please like and share the article and follow me wherever you are active. Also please share your feedback!&lt;/p&gt;

&lt;p&gt;(links pending)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://briancaffey.github.io/2023/01/07/i-deployed-the-same-containerized-serverless-django-app-with-aws-cdk-terraform-and-pulumi" rel="noopener noreferrer"&gt;my personal blog (briancaffey.github.io)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DEV.to&lt;/li&gt;
&lt;li&gt;Ops Community&lt;/li&gt;
&lt;li&gt;Medium&lt;/li&gt;
&lt;li&gt;Substack&lt;/li&gt;
&lt;li&gt;My MailChimp mailing list&lt;/li&gt;
&lt;li&gt;Hacker News&lt;/li&gt;
&lt;li&gt;Twitter&lt;/li&gt;
&lt;li&gt;LinkedIn&lt;/li&gt;
&lt;li&gt;Facebook&lt;/li&gt;
&lt;li&gt;HashNode&lt;/li&gt;
&lt;li&gt;Hackernoon&lt;/li&gt;
&lt;li&gt;Reddit (r/aws, r/Terraform, r/pulumi, r/django, r/devops)&lt;/li&gt;
&lt;li&gt;Terraform Forum&lt;/li&gt;
&lt;li&gt;Pulumi Slack channel&lt;/li&gt;
&lt;li&gt;CDK.dev Slack channel&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>terraform</category>
      <category>pulumi</category>
    </item>
    <item>
      <title>Redis Light, Green Light</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Mon, 29 Aug 2022 02:44:40 +0000</pubDate>
      <link>https://dev.to/briancaffey/redis-light-green-light-19c0</link>
      <guid>https://dev.to/briancaffey/redis-light-green-light-19c0</guid>
      <description>&lt;h3&gt;
  
  
  Overview of My Submission
&lt;/h3&gt;

&lt;p&gt;I recreated the game "Red Light, Green Light" using Python, TypeScript and Redis!&lt;/p&gt;

&lt;p&gt;One day in early August I was browsing DEV while re-watching MrBeast's Squid Game recreation video in the background when I came across the Redis Hackathon article. Then I got a crazy, random idea: &lt;strong&gt;Redis Light, Green Light&lt;/strong&gt;! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3rqy7eqz40nqk1e04jj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3rqy7eqz40nqk1e04jj.png" alt="Redis Light, Green Light" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I became determined to create my own online, real-time, multi-player version of Red Light, Green Light powered by Redis and submit it to the Wacky Wildcard project category for my chance to win the hackathon!&lt;/p&gt;

&lt;p&gt;I used my favorite languages and frameworks for rapid prototyping: Python with Flask to power the backend and TypeScript with the Nuxt.js framework to build the frontend components for my game.&lt;/p&gt;

&lt;p&gt;For real-time communication I added the &lt;code&gt;Flask-SocketIO&lt;/code&gt; library to my Flask app and the &lt;code&gt;socket.io-client&lt;/code&gt; library to my Nuxt app. I also added celery for scheduling and processing async tasks. Redis served as the message queue for websocket messages and as the broker for celery tasks.&lt;/p&gt;

&lt;p&gt;This was my first project working with Redis Stack and Redis OM and I really liked using these tools. I stored most of my data in hashes, and the Redis OM library is perfect for using this data type. I also used Redis streams for the first time which was a lot of fun.&lt;/p&gt;

&lt;p&gt;The backend application services include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flask server (for API endpoints and Websocket handlers)&lt;/li&gt;
&lt;li&gt;Celery beat task scheduler (for scheduling tasks to change the light color in each room)&lt;/li&gt;
&lt;li&gt;Celery worker (to change the light color for a room and to update the players in that room via Websocket)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0egpm1igfqyon5kjtujv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0egpm1igfqyon5kjtujv.png" alt="Project Diagram" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please check out the video below for more details about how the project works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Submission Category
&lt;/h3&gt;

&lt;p&gt;Wacky Wildcards&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis Light, Green Light YouTube Video
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/BoalZKmgoEU"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Language Used
&lt;/h3&gt;

&lt;p&gt;Python. Honorable mention for JavaScript.&lt;/p&gt;

&lt;h3&gt;
  
  
  Link to Code
&lt;/h3&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/briancaffey" rel="noopener noreferrer"&gt;
        briancaffey
      &lt;/a&gt; / &lt;a href="https://github.com/briancaffey/redis-light-green-light-dev-to-hackathon" rel="noopener noreferrer"&gt;
        redis-light-green-light-dev-to-hackathon
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      My submission for the Redis Hackathon on DEV! "Red Light, Green Light" built with Python, JavaScript and Redis
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Redis Light, Green Light&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;This project is an online, multiplayer implementation of "Red Light, Green Light" from Squid Game built with Python, Javascript and Redis. This is my submission for the 2022 &lt;a href="https://dev.to/devteam/announcing-the-redis-hackathon-on-dev-3248" rel="nofollow"&gt;Redis Hackathon on DEV&lt;/a&gt;!&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Gameplay&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/briancaffey/redis-light-green-light-dev-to-hackathon/images/gameplay.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbriancaffey%2Fredis-light-green-light-dev-to-hackathon%2Fimages%2Fgameplay.png" alt="Redis Light, Green Light Gameplay"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Game event log built with redis streams&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/briancaffey/redis-light-green-light-dev-to-hackathon/images/events.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbriancaffey%2Fredis-light-green-light-dev-to-hackathon%2Fimages%2Fevents.png" alt="Redis Stream data"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Architecture Overview&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/briancaffey/redis-light-green-light-dev-to-hackathon/images/rlgl.drawio.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fbriancaffey%2Fredis-light-green-light-dev-to-hackathon%2Fimages%2Frlgl.drawio.png" alt="Project Architecture Diagram"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Code Overview with &lt;code&gt;cloc&lt;/code&gt; (count lines of code)&lt;/h2&gt;

&lt;/div&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;make cloc

github.com/AlDanial/cloc v 1.94  T=0.03 s (1102.7 files/s, 67098.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Vuejs Component                 13            104             14            528
Python                           3            178            104            410
Markdown                         5            139              0            272
YAML                             3             10              0             83
make                             1             10              2             34
SVG                              2              0              0             22
TypeScript                       1              1              1             22
CSS                              1              4              0             18
JavaScript                       1              0              1             18
Text                             2              0              0             13
Dockerfile                       1              8              0             12
-------------------------------------------------------------------------------
SUM:                            33            454            122           1432
-------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Overview video&lt;/h1&gt;

&lt;/div&gt;
&lt;p&gt;Here's a short video that explains the project and how it uses…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/briancaffey/redis-light-green-light-dev-to-hackathon" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Additional Resources / Info
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkmgiz8wtq484rn6dmfc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkmgiz8wtq484rn6dmfc.png" alt="Redis Light, Green Light gameplay" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujjvd5vjqcotjxn2jcms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujjvd5vjqcotjxn2jcms.png" alt="Redis Streams" width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Check out &lt;a href="https://redis.io/docs/stack/get-started/clients/#high-level-client-libraries" rel="noopener noreferrer"&gt;Redis OM&lt;/a&gt;, client libraries for working with Redis as a multi-model database.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Use &lt;a href="https://redis.info/redisinsight" rel="noopener noreferrer"&gt;RedisInsight&lt;/a&gt; to visualize your data in Redis.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Sign up for a &lt;a href="https://redis.info/try-free-dev-to" rel="noopener noreferrer"&gt;free Redis database&lt;/a&gt;.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>redishackathon</category>
      <category>redis</category>
      <category>flask</category>
      <category>socketio</category>
    </item>
    <item>
      <title>Setting up ad hoc development environments for Django applications with AWS ECS, Terraform and GitHub Actions</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sat, 11 Jun 2022 16:20:47 +0000</pubDate>
      <link>https://dev.to/briancaffey/setting-up-ad-hoc-development-environments-for-django-applications-with-aws-ecs-terraform-and-github-actions-4abh</link>
      <guid>https://dev.to/briancaffey/setting-up-ad-hoc-development-environments-for-django-applications-with-aws-ecs-terraform-and-github-actions-4abh</guid>
      <description>&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;This article will show how software development teams can build on-demand instances of a web application project for dog-food testing, quality review, internal and external demos and other use cases that require short-lived but feature-complete environments. It will focus on the technical implementation of building ad hoc environments using a specific set of tools (including AWS ECS, Terraform and GitHub Actions). I will also be giving context on high-level implementation decisions based on what I think are best practices guided by the &lt;a href="https://12factor.net/" rel="noopener noreferrer"&gt;12-Factor Application methodology&lt;/a&gt;. If any of this interests you, please have a read and let me know what you think in the comments on the outlets where I'll be sharing this article (links at the end).&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Links
&lt;/h2&gt;

&lt;p&gt;This article references three open-source code repositories on GitHub.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/briancaffey/django-step-by-step" rel="noopener noreferrer"&gt;django-step-by-step&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this repo contains an example microblogging application called μblog built with Django&lt;/li&gt;
&lt;li&gt;the same application is implemented as a traditional Model Template View (MTV) site, a decoupled REST API and JavaScript web application, and a GraphQL API&lt;/li&gt;
&lt;li&gt;it is a monorepo that also includes a frontend Vue.js application, CI/CD pipelines, a VuePress documentation site as well as tooling and instructions for setting up a local development environment (both with and without docker)&lt;/li&gt;
&lt;li&gt;it includes a complete set of GitHub Action examples for automating the processes of creating, updating and destroying ad hoc environments that will be an important part of what is covered in this article&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://github.com/briancaffey/terraform-aws-django" rel="noopener noreferrer"&gt;terraform-aws-django&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a collection of modules for running Django applications on AWS using Terraform&lt;/li&gt;
&lt;li&gt;one of the submodules is designed for creating ad hoc environments, and it is what we will use in this article&lt;/li&gt;
&lt;li&gt;this module has been published to Terraform Registry and is used in the &lt;code&gt;terraform/live/ad-hoc&lt;/code&gt; directory of the &lt;code&gt;django-step-by-step&lt;/code&gt; repo&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;a href="https://github.com/briancaffey/terraform-aws-ad-hoc-environments" rel="noopener noreferrer"&gt;terraform-aws-ad-hoc-environments&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Terraform module that provides shared infrastructure used by ad hoc environments (including VPC, RDS instance, bastion host, security groups and IAM roles, etc.)&lt;/li&gt;
&lt;li&gt;this module has also been published to Terraform Registry and is also used in &lt;code&gt;django-step-by-step&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;this module is designed to be used with the &lt;code&gt;terraform-aws-django&lt;/code&gt; Terraform module&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Assumptions
&lt;/h2&gt;

&lt;p&gt;There are all sorts of applications, and all sorts of engineering teams. For some context on what I'm describing in this article, here are some basic assumptions that I'm making about the type of engineering team and software application product that would be a good fit for this type of development workflow.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;engineering team is composed of a backend team, a frontend team, a devops team and works closely with a product team&lt;/li&gt;
&lt;li&gt;backend team primarily develops a REST API&lt;/li&gt;
&lt;li&gt;frontend team develops a JavaScript SPA (frontend website)&lt;/li&gt;
&lt;li&gt;SPA consumes backend REST API&lt;/li&gt;
&lt;li&gt;product team frequently needs to demo applications to prospective clients&lt;/li&gt;
&lt;li&gt;development teams don't have deep expertise in infrastructure, containers, CI/CD or automation&lt;/li&gt;
&lt;li&gt;devops team has been tasked with building automation that will allow anyone on the team to quickly spin up a complete environment for testing and demoing purposes within minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are assumptions about specific tools and technologies used at the company:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend is a REST API developed with Django and a Postgres database&lt;/li&gt;
&lt;li&gt;backend is packaged into a docker container&lt;/li&gt;
&lt;li&gt;frontend is also packaged into a docker container using multi-stage builds and NGINX&lt;/li&gt;
&lt;li&gt;frontend does not require any build-time configuration (all configuration needed by frontend is fetched from backend)&lt;/li&gt;
&lt;li&gt;backend application's configuration is driven by plain-text environment variables at run-time&lt;/li&gt;
&lt;li&gt;engineering team uses AWS&lt;/li&gt;
&lt;li&gt;automation pipeline exists for building, tagging and pushing backend and frontend container images to an ECR repository&lt;/li&gt;
&lt;li&gt;devops team uses AWS ECS for running containerized workloads&lt;/li&gt;
&lt;li&gt;devops team uses Terraform for provisioning infrastructure&lt;/li&gt;
&lt;li&gt;devops team uses GitHub Actions for building automation pipelines&lt;/li&gt;
&lt;li&gt;team is somewhat cost-conscious&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are ad hoc environments?
&lt;/h2&gt;

&lt;p&gt;Ad hoc environments are short-lived environments designed for testing a specific set of features or demoing a specific application configuration in isolation. Each one is intended to be a functional duplicate of the main production environment. An ad hoc environment is the first cloud environment that application code is deployed to after a developer has worked on it in a local development environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs to make when designing ad hoc environment infrastructure and automation
&lt;/h2&gt;

&lt;p&gt;Now that we have a sense of what we are building and the team we are working with, let's think about the high-level trade-offs that we will face as we build a solution for providing on-demand ad hoc environments. When building infrastructure and workflows for ad hoc environments, there are a few things to solve for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simplicity of the end-user interface and process for requesting an ad hoc environment&lt;/li&gt;
&lt;li&gt;startup speed&lt;/li&gt;
&lt;li&gt;total cost of ownership&lt;/li&gt;
&lt;li&gt;degree of similarity to production environments&lt;/li&gt;
&lt;li&gt;shared vs isolated resources&lt;/li&gt;
&lt;li&gt;automation complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's look at these items by considering how we can set up the main infrastructure components that will be used to run our ad hoc application environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Relational Databases
&lt;/h3&gt;

&lt;p&gt;Startup speed can be measured by the time between when an environment is requested and when that environment can be used by whoever requested it. In this period of time, an automation pipeline may do some of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run &lt;code&gt;terraform init&lt;/code&gt;, &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; to build infrastructure&lt;/li&gt;
&lt;li&gt;run scripts to prepare the application such as database migrations&lt;/li&gt;
&lt;li&gt;seeding initial sample data with a script or database dump&lt;/li&gt;
&lt;li&gt;message the user with information about the environment (URLs, commands for accessing an interactive shell, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RDS instances take a long time to create relative to other AWS resources such as S3 buckets and IAM roles, and they are also more costly. Using one RDS instance per ad hoc environment would be slow to start up and tear down, and expensive if many developers use ad hoc environments simultaneously. Instead, we could use a single, shared RDS instance placed in a private subnet of a shared VPC, with each ad hoc environment using a different named database in the form &lt;code&gt;{ad-hoc-env-name}-db&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we choose to isolate the application's relational database at the database level (and not the RDS instance level), then we will need our automation workflow to create a database per ad hoc environment.&lt;/p&gt;

&lt;p&gt;Let's walk through a simple example to illustrate how this would work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A developer is working on &lt;code&gt;feature-abc&lt;/code&gt; that involves a significant refactor of the data model.&lt;/li&gt;
&lt;li&gt;The developer decides to spin up an ad hoc environment called &lt;code&gt;feature-abc&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Our automation will need to create a database in the RDS instance called &lt;code&gt;feature-abc&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We can configure a bastion host with &lt;code&gt;psql&lt;/code&gt; installed that has network and security group access to our RDS instance, and we can give our GitHub Actions an SSH key that can be used to run &lt;code&gt;createdb&lt;/code&gt; over SSH.&lt;/li&gt;
&lt;li&gt;The automation also runs database migrations once the application has started, and we can view the logs of the database migration to check for any irregularities or other issues.&lt;/li&gt;
&lt;li&gt;This will give the developer and the rest of the team confidence that promoting &lt;code&gt;feature-abc&lt;/code&gt; to the next pre-production environment will not cause any errors.&lt;/li&gt;
&lt;li&gt;The developer may even choose to load a SQL dump of the next pre-production environment into their &lt;code&gt;feature-abc&lt;/code&gt; database to get even more confidence that there will be no data integrity errors.&lt;/li&gt;
&lt;li&gt;When the developer's PR is approved and merged, the ad hoc environment &lt;code&gt;feature-abc&lt;/code&gt; can be destroyed, including the &lt;code&gt;feature-abc&lt;/code&gt; database in the shared RDS instance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this approach we won't incur the costs of multiple RDS instances, and ad hoc environments will start up faster because an RDS instance per environment is not required. We do have slightly less resource isolation, and we need to introduce a bastion host, but I consider this an acceptable trade-off.&lt;/p&gt;
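&lt;p&gt;The database-per-environment step can be sketched as follows; the name-sanitization rules and the bastion hostname are illustrative assumptions, not part of the actual modules:&lt;/p&gt;

```python
# Sketch of deriving a per-environment Postgres database name and the
# createdb-over-SSH invocation described above. The sanitization rules
# and the bastion hostname are illustrative assumptions.
import re

def db_name_for_env(env_name: str) -> str:
    """Turn an ad hoc environment name into a valid Postgres identifier."""
    name = re.sub(r"[^a-z0-9_]", "_", env_name.lower())
    if not name or name[0].isdigit():  # identifiers cannot start with a digit
        name = "db_" + name
    return name

def createdb_command(env_name: str, bastion: str = "bastion.example.com"):
    """Build the argv for creating the database over SSH (hostname is a placeholder)."""
    return ["ssh", bastion, "createdb", db_name_for_env(env_name)]

print(db_name_for_env("feature-abc"))   # feature_abc
print(createdb_command("feature-abc"))
```

&lt;p&gt;A GitHub Actions job would run something like this before the application tasks start, so the migrations have a database to run against.&lt;/p&gt;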

&lt;h3&gt;
  
  
  Redis (key-value database)
&lt;/h3&gt;

&lt;p&gt;Redis is another database used in the application and it plays a few different roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;primarily, it is a caching layer that can cache request responses to reduce load on the database and speed up our application&lt;/li&gt;
&lt;li&gt;it is a message broker for our async task workers (celery)&lt;/li&gt;
&lt;li&gt;it can be used as a backend for other 3rd party Django apps that our main application may need to use (such as django-constance, cache-ops, django-channels, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS offers a managed Redis service called ElastiCache. Redis running on ElastiCache can provide database-level isolation similar to Postgres running on RDS, as discussed previously, but there are some key differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redis databases are numbered, not named&lt;/li&gt;
&lt;li&gt;the backend application uses isolated numbered databases for the different 3rd party apps that I just mentioned (for example: celery can use database &lt;code&gt;0&lt;/code&gt;,  API caching layer can use database &lt;code&gt;1&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it difficult to use a single ElastiCache instance for our ad hoc environments, since we would need to figure out which numbered database to assign to each role in each ad hoc environment (e.g. how do we know which numbered database to use for API caching in the &lt;code&gt;feature-abc&lt;/code&gt; ad hoc environment?).&lt;/p&gt;
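&lt;p&gt;To make the numbered-database problem concrete, here is a small illustration (the three-databases-per-environment figure is an assumption): a default Redis server exposes only 16 numbered databases, so a shared instance runs out after a handful of ad hoc environments:&lt;/p&gt;

```python
# Illustration of why sharing one Redis/ElastiCache instance across ad hoc
# environments is awkward: each environment needs a block of numbered
# databases, and a default Redis server only has 16 (databases 0-15).
DBS_PER_ENV = 3       # e.g. celery broker, celery results, API cache (assumed)
TOTAL_DBS = 16        # Redis default `databases` setting

def allocate_dbs(env_names):
    """Assign each environment a contiguous block of database numbers."""
    allocations, next_db = {}, 0
    for env in env_names:
        if next_db + DBS_PER_ENV > TOTAL_DBS:
            raise RuntimeError(f"out of Redis databases at {env!r}")
        allocations[env] = list(range(next_db, next_db + DBS_PER_ENV))
        next_db += DBS_PER_ENV
    return allocations

print(allocate_dbs(["feature-abc", "feature-xyz"]))
# only 5 environments fit before allocation fails
```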

&lt;p&gt;So how can we approach providing isolated redis instances for multiple ad hoc environments? Spoiler: my solution is to run redis as a stateful service in ECS. Before we dig into how to do this, we need to talk about another important part of our application: compute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compute
&lt;/h3&gt;

&lt;p&gt;Our backend application is composed of a few different services that all share the same code base. In other words, our backend services use the same docker image but run a different process for each component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gunicorn for the core API&lt;/li&gt;
&lt;li&gt;celery for the task workers&lt;/li&gt;
&lt;li&gt;celerybeat for task scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If our application used websockets, we could have another service that runs an asgi server process (like daphne or uvicorn).&lt;/p&gt;
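&lt;p&gt;To make this concrete, here is a minimal sketch of the kind of entrypoint logic that lets one image serve all of these roles. The function name, module paths and flags are illustrative assumptions, not the project's actual entrypoint:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical entrypoint: map a role name (passed as the container
# command in the ECS task definition) to the process to run.
# Module paths and flags are illustrative, not taken from the project.
role_command() {
  case "$1" in
    gunicorn) echo "gunicorn backend.wsgi:application --bind 0.0.0.0:8000" ;;
    worker)   echo "celery --app backend worker --loglevel INFO" ;;
    beat)     echo "celery --app backend beat --loglevel INFO" ;;
    *)        echo "unknown role: $1"; return 1 ;;
  esac
}

# A real entrypoint would end with: exec $(role_command "$1")
role_command "${1:-gunicorn}"
```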

&lt;p&gt;Since our backend application is packaged into a container and we are using AWS as our cloud provider, ECS is a great choice for running our backend services. ECS is a container orchestration tool that I usually describe as a nice middle ground between docker swarm and Kubernetes. Simply put, it is a flexible option for running our containerized services that make up our backend application.&lt;/p&gt;

&lt;p&gt;With ECS you can choose to run containers directly on EC2 instances that you manage, or you can run containers using Fargate. Fargate is a serverless compute option that takes care of managing both the underlying "computer" and the operating system that run our application's containers. All of our backend dependencies are defined in our Dockerfile, so we do not need to maintain or update the underlying operating system that runs our containers -- AWS handles all of this for us. To use Fargate, we simply tell AWS which containers to run and how much CPU and memory to use in the ECS Task that runs the containers. To scale our app horizontally, the ECS service that manages ECS tasks simply increases the number of tasks that it runs.&lt;/p&gt;

&lt;p&gt;Since we are going to use the Fargate launch type for our ECS Tasks, let's talk about the ergonomics of these serverless compute instances compared to running our services directly on EC2 instances.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We can't SSH into Fargate compute instances. We can instead use AWS Systems Manager and ECS Exec to open an interactive shell in a running backend container. This can be useful for developers who might need to run a management command or access an interactive Django shell to verify behavior in their ad hoc environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We can't simply change code on the server and restart services. This can sometimes be a useful pattern for debugging something that can only be tested on a cloud environment (e.g. something that can't easily be reproduced on your local machine), so this requires that developers push new images to their backend services for every change they want to see reflected on their ad hoc environment. Later on I'll discuss how we can provide tooling for developers to quickly update the image used in their backend services.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
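&lt;p&gt;For reference, opening a shell with ECS Exec comes down to a single AWS CLI call once it is enabled on the service. The sketch below only assembles the command string; the cluster, task ID and container names are placeholders:&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: build the ECS Exec command for opening an interactive shell
# in a running container. A real invocation requires enableExecuteCommand
# on the ECS service and the Session Manager plugin for the AWS CLI.
ecs_exec_cmd() {
  local cluster="$1" task="$2" container="$3"
  echo "aws ecs execute-command --cluster $cluster --task $task --container $container --interactive --command /bin/bash"
}

# placeholder values for an ad hoc environment named "brian"
ecs_exec_cmd brian abc123 gunicorn
```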

&lt;p&gt;With AWS Fargate, you will pay more than you would for a comparable amount of CPU and memory on EC2 instances. Similar to EC2 spot instances, Fargate offers interruptible capacity called Fargate Spot, which costs significantly less than regular Fargate instances. Fargate Spot is appropriate for our ad hoc environments since they are non-critical workloads. In the event that a Fargate Spot instance is interrupted, the ECS service will automatically launch another Fargate task to replace the task that was stopped.&lt;/p&gt;

&lt;p&gt;In my opinion, ECS with Fargate is ideal for running the stateless services that make up our backend application. In terms of parity with our production environment, we can keep almost everything the same, except use regular Fargate instances instead of Fargate spot instances.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis, revisited
&lt;/h3&gt;

&lt;p&gt;We can run redis as an ECS service instead of using ElastiCache. In order to do this, we will need our backend services (gunicorn, celery and celerybeat) to be able to communicate with a fourth ECS service that will be running redis (using an official redis image from Docker Hub, or a redis image that we define in ECR).&lt;/p&gt;

&lt;p&gt;By default, &lt;strong&gt;there is no way for our backend services to know how to communicate with any other service in our ECS cluster&lt;/strong&gt;. If you have used docker-compose, you may know that you can use the service name &lt;code&gt;redis&lt;/code&gt; in a backend service to easily communicate with a redis service called &lt;code&gt;redis&lt;/code&gt;. This networking convenience is not available to us out of the box with ECS. To achieve this in AWS, we need some way to manage a unique, ad hoc environment-specific Route 53 DNS record that points to the private IP of the Fargate task that is running redis in an ECS cluster for a given ad hoc environment. Such a service exists in AWS and it is called Cloud Map. Cloud Map offers service discovery so that our backend services can make network calls to a static DNS address that will reliably point to the correct private IP of the ECS task running the redis container.&lt;/p&gt;

&lt;p&gt;We can define a service discovery namespace (which will essentially be a top level domain, or TLD) that all of our ad hoc environments can share. Let's assume this namespace is called &lt;code&gt;ad-hoc&lt;/code&gt;. Each ad hoc environment can then define a service discovery service in the shared namespace for redis that is called &lt;code&gt;{ad-hoc-env-name}-redis&lt;/code&gt;. This way, we have a reliable address that we can configure as an environment variable for our backend that will look like this: &lt;code&gt;redis://{ad-hoc-env-name}-redis.ad-hoc:6379/0&lt;/code&gt;. &lt;code&gt;{ad-hoc-env-name}-redis.ad-hoc&lt;/code&gt; will be the hostname of the redis service, and Route 53 will create a record that points &lt;code&gt;{ad-hoc-env-name}-redis.ad-hoc&lt;/code&gt; to the private IP of the redis Fargate task for each ad hoc environment.&lt;/p&gt;
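&lt;p&gt;The following sketch shows how these per-environment redis URLs can be derived; the role-to-database-number mapping (celery broker on &lt;code&gt;0&lt;/code&gt;, API cache on &lt;code&gt;1&lt;/code&gt;) is an assumption for illustration:&lt;/p&gt;

```shell
#!/bin/bash
# Build the service discovery redis URL for a given ad hoc environment
# and numbered database. The "ad-hoc" suffix matches the shared Cloud Map
# namespace described above.
redis_url() {
  local env_name="$1" db="$2"
  echo "redis://${env_name}-redis.ad-hoc:6379/${db}"
}

redis_url brian 0        # celery broker
redis_url feature-abc 1  # API cache
```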

&lt;h3&gt;
  
  
  Load Balancing
&lt;/h3&gt;

&lt;p&gt;We now have our backend services (gunicorn, celery and celerybeat) running on Fargate spot instances, and these services can communicate with the redis service in our ad hoc environment's ECS cluster using service discovery that we configured with Cloud Map. We still need to think about a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how will we expose our API service to the public (or private) internet&lt;/li&gt;
&lt;li&gt;how will we expose our frontend application to the public (or private) internet&lt;/li&gt;
&lt;li&gt;how will we make sure that requests go to the correct ECS services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Application load balancers (ALBs) are a great way to expose web app traffic to the internet. We could either have one application load balancer per ad hoc environment, or one application load balancer shared between all ad hoc environments. ALBs are somewhat slow to create and incur a significant monthly cost, but they are also highly scalable, so sharing a single ALB between all ad hoc environments will work well.&lt;/p&gt;

&lt;p&gt;Individual ad hoc environments can then create target groups and listener rules for a shared ALB for each service that needs to serve requests from the internet (the backend and the frontend). In our case this is the backend API server and the frontend server that serves our static frontend site using NGINX.&lt;/p&gt;

&lt;p&gt;ECS services that need to be exposed to the internet can specify the target group, port and container to use for load balancing. A target group is created that defines the health check and other settings, and a load balancer listener rule is created on the shared load balancer that will forward traffic matching certain conditions to the target group for our service.&lt;/p&gt;

&lt;p&gt;For a given ad hoc environment, we need to specify that only traffic with certain paths should be sent to the backend service, and all other traffic should be sent to the frontend service. For example, we may only want to send traffic that starts with the path &lt;code&gt;/api&lt;/code&gt; or &lt;code&gt;/admin&lt;/code&gt; to the backend target group, and all other traffic should be sent to the frontend target group. We can do this by setting conditions on the listener rules that forward traffic to the frontend and backend target groups based on the hostname and path.&lt;/p&gt;

&lt;p&gt;We want our listener rule logic to forward &lt;code&gt;/api&lt;/code&gt;, &lt;code&gt;/admin&lt;/code&gt; and any other backend traffic to the backend target group, and forward all other traffic (&lt;code&gt;/*&lt;/code&gt;) to the frontend target group. In order to do this, the backend listener rule must be evaluated before the frontend listener rule for each ad hoc environment; on an ALB, rules are evaluated in ascending order of their priority value, so the backend rule needs the lower value. Since we are using the same load balancer for all ad hoc environments, the priority values for each listener rule need to be unique. If we don't set the priority explicitly, then the priority will be set automatically to the next available value in ascending order. In order to make sure that the backend listener rule is evaluated first for each ad hoc environment, we need to tell Terraform that the frontend module &lt;code&gt;depends_on&lt;/code&gt; the backend module. This way the backend listener rule will be created first and receive the lower priority value (e.g. a priority of 1), and the frontend listener rule will be created second and receive the next value (e.g. a priority of 2).&lt;/p&gt;
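&lt;p&gt;A rough Terraform sketch of this ordering is shown below. Resource and variable names are illustrative and the project's actual modules may be structured differently; the key detail is that &lt;code&gt;depends_on&lt;/code&gt; forces the backend rule to be created first:&lt;/p&gt;

```hcl
# Backend rule: created first, so it receives the lower priority value
resource "aws_lb_listener_rule" "backend" {
  listener_arn = var.listener_arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }

  condition {
    host_header {
      values = ["${var.ad_hoc_env_name}.dev.example.com"]
    }
  }

  condition {
    path_pattern {
      values = ["/api/*", "/admin/*"]
    }
  }
}

# Frontend rule: catches everything else for this environment's hostname
resource "aws_lb_listener_rule" "frontend" {
  depends_on   = [aws_lb_listener_rule.backend]
  listener_arn = var.listener_arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.frontend.arn
  }

  condition {
    host_header {
      values = ["${var.ad_hoc_env_name}.dev.example.com"]
    }
  }
}
```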

&lt;h2&gt;
  
  
  More on shared resources vs per-environment resources
&lt;/h2&gt;

&lt;p&gt;Up until now we have discussed infrastructure design decisions at a high level, but we have not yet talked about how to organize our infrastructure as code. At a basic level, components of our ad hoc environment either fall into shared infrastructure or infrastructure that is specific to an individual ad hoc environment. Here's a list of the resources that are shared and the resources that are specific to each ad hoc environment.&lt;/p&gt;

&lt;p&gt;Shared resources include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPC&lt;/li&gt;
&lt;li&gt;IAM policies&lt;/li&gt;
&lt;li&gt;Security groups&lt;/li&gt;
&lt;li&gt;RDS instance&lt;/li&gt;
&lt;li&gt;Service Discovery namespace&lt;/li&gt;
&lt;li&gt;Application Load Balancer&lt;/li&gt;
&lt;li&gt;Bastion host&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ad hoc environment resources include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS Cluster&lt;/li&gt;
&lt;li&gt;ECS Tasks and Services (for backend and frontend applications)&lt;/li&gt;
&lt;li&gt;ECS Tasks for running management commands (such as migrate)&lt;/li&gt;
&lt;li&gt;CloudWatch logging groups for containers defined in ECS Tasks&lt;/li&gt;
&lt;li&gt;ALB Target groups&lt;/li&gt;
&lt;li&gt;ALB listener rules&lt;/li&gt;
&lt;li&gt;Route 53 record that points to the load balancer (e.g. &lt;code&gt;ad-hoc-env-name.example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;S3 bucket for static and media assets&lt;/li&gt;
&lt;li&gt;Service Discovery Service for redis service in ECS cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shared resources can be defined in one terraform configuration and deployed once. These resources will be long-lived as long as the application is under active development and the team requires on-demand provisioning of ad hoc environments.&lt;/p&gt;

&lt;p&gt;Ad hoc environment resources can be defined in another terraform configuration that references outputs from the shared resource configuration using &lt;code&gt;terraform_remote_state&lt;/code&gt;. Each ad hoc environment can be defined by a &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file that contains the name of the ad hoc environment (such as &lt;code&gt;brian&lt;/code&gt;, &lt;code&gt;brian2&lt;/code&gt;, &lt;code&gt;demo-feature-abc&lt;/code&gt;, etc.). This &lt;code&gt;&amp;lt;name&amp;gt;&lt;/code&gt; value will also be the name of the Terraform workspace and will be used to name and tag AWS resources associated with the corresponding ad hoc environment.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file will allow developers to use a simple, standard file interface for defining application specific values, such as the version of the backend and frontend. This brings developers into the concepts and practices of "infrastructure as code" and "configuration as code" and also helps the entire team keep track of how different environments are configured.&lt;/p&gt;

&lt;p&gt;Ad hoc environment &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; files are stored in a directory of a special git repository that also defines the ad hoc environment terraform configuration. Currently, the &lt;code&gt;tfvars&lt;/code&gt; files are stored &lt;a href="https://github.com/briancaffey/django-step-by-step/tree/main/terraform/live/ad-hoc/envs" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now let's look at the two terraform configurations used for defining &lt;strong&gt;shared resources&lt;/strong&gt; and &lt;strong&gt;ad hoc environment resources&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ad Hoc Environment Diagram
&lt;/h2&gt;

&lt;p&gt;Here's an overview of the resources used for the ad hoc environments. The &lt;strong&gt;letters represent shared resources&lt;/strong&gt; and the &lt;strong&gt;numbers represent per-environment resources&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F320yfb4i8vl22ptr6m2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F320yfb4i8vl22ptr6m2r.png" alt="Ad hoc environment diagram" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared architecture
&lt;/h3&gt;

&lt;p&gt;A. VPC (created using the &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest" rel="noopener noreferrer"&gt;official AWS VPC Module&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;B. Public subnets for bastion host, NAT Gateways and Load Balancer&lt;/p&gt;

&lt;p&gt;C. Private subnets for application workloads and RDS&lt;/p&gt;

&lt;p&gt;D. Application Load Balancer that is shared between all ad hoc environments. A pre-provisioned wildcard ACM certificate is attached to the load balancer that is used to secure traffic for load-balanced ECS services&lt;/p&gt;

&lt;p&gt;E. Service discovery namespace that provides a namespace for application workloads to access the redis service running in ECS&lt;/p&gt;

&lt;p&gt;F. IAM roles needed for ECS tasks to access AWS services&lt;/p&gt;

&lt;p&gt;G. RDS instance using postgres engine that is shared between all ad hoc environments&lt;/p&gt;

&lt;p&gt;H. Bastion host used to access RDS from GitHub Actions (needed for creating per-environment databases)&lt;/p&gt;

&lt;p&gt;I. NAT Gateway used to give traffic in private subnets a route to the public internet&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment-specific architecture
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;ECS Cluster that groups all ECS tasks for a single ad hoc environment&lt;/li&gt;
&lt;li&gt;Listener rules and target groups that direct traffic from the load balancer to the ECS services for an ad hoc environment.&lt;/li&gt;
&lt;li&gt;Redis service running in ECS that provides caching and serves as a task broker for celery&lt;/li&gt;
&lt;li&gt;Route53 records that point to the load balancer&lt;/li&gt;
&lt;li&gt;Frontend service that serves the Vue.js application over NGINX&lt;/li&gt;
&lt;li&gt;API service that serves the backend with Gunicorn&lt;/li&gt;
&lt;li&gt;Celery worker that process jobs in the default queue&lt;/li&gt;
&lt;li&gt;Celery beat that schedules celery tasks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;collectstatic&lt;/code&gt; task&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;migrate&lt;/code&gt; task&lt;/li&gt;
&lt;li&gt;CloudWatch log groups are created for each ECS task in an ad hoc environment&lt;/li&gt;
&lt;li&gt;Each ad hoc environment gets a database in the shared RDS instance&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Shared resources terraform configuration
&lt;/h2&gt;

&lt;p&gt;Let's have a detailed look at the terraform configuration for shared resources that will support ad hoc environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  VPC
&lt;/h3&gt;

&lt;p&gt;We can use the &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest" rel="noopener noreferrer"&gt;AWS VPC module&lt;/a&gt; for creating the shared VPC with Terraform. This module provides a high level interface that will provision lots of the components that are needed for a VPC following best practices, and it is less code for the DevOps team to manage compared to defining each component of a VPC (route tables, subnets, internet gateways, etc.).&lt;/p&gt;
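&lt;p&gt;As a sketch, a shared VPC built with this module might look like the following; the CIDR ranges and availability zones are illustrative:&lt;/p&gt;

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "ad-hoc-shared"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true

  # required for Cloud Map service discovery (see below)
  enable_dns_hostnames = true
  enable_dns_support   = true
}
```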

&lt;h3&gt;
  
  
  Cloud Map Service Discovery Namespace
&lt;/h3&gt;

&lt;p&gt;Cloud Map is used in order to allow services in our ECS cluster to communicate with each other. The only reason that Cloud Map is needed is so that the backend services (API, celery workers, beat) can communicate with Redis, which will be an important service for our application, providing caching and also serving as a broker for celery. If we were to use Django Channels for websockets, the Redis service would also function as the backend for Django Channels.&lt;/p&gt;

&lt;p&gt;We will only need to specify &lt;code&gt;service_registries&lt;/code&gt; on the redis service in our ECS cluster. What this will do is provide an address that our other services can use to communicate with redis. This address is created in the form of a Route 53 record, and it points to the private IP address of the redis service. If the private IP of the redis service changes, the Route 53 record for our redis service will be updated as well.&lt;/p&gt;
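&lt;p&gt;In Terraform, the pieces involved look roughly like this (the namespace belongs to the shared configuration, while the discovery service and ECS service are per-environment; names are illustrative):&lt;/p&gt;

```hcl
# Shared: a private DNS namespace used by all ad hoc environments
resource "aws_service_discovery_private_dns_namespace" "ad_hoc" {
  name = "ad-hoc"
  vpc  = module.vpc.vpc_id
}

# Per-environment: a discovery service for redis
resource "aws_service_discovery_service" "redis" {
  name = "${var.ad_hoc_env_name}-redis"

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.ad_hoc.id
    dns_records {
      type = "A"
      ttl  = 10
    }
  }
}

# Per-environment: the redis ECS service registers itself
resource "aws_ecs_service" "redis" {
  # cluster, task definition and network configuration omitted

  service_registries {
    registry_arn = aws_service_discovery_service.redis.arn
  }
}
```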

&lt;p&gt;In order for service discovery to work in the VPC that we created, we need to add the following options to the terraform AWS VPC module:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# DNS settings
enable_dns_hostnames = true
enable_dns_support   = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;There are two important security groups that we will set up as part of the shared infrastructure layer to be used by each ad hoc environment: one security group for the load balancer, and one security group where all of our ECS services will run.&lt;/p&gt;

&lt;p&gt;The load balancer security group will allow all traffic on ports &lt;code&gt;80&lt;/code&gt; and &lt;code&gt;443&lt;/code&gt; for &lt;code&gt;HTTP&lt;/code&gt; and &lt;code&gt;HTTPS&lt;/code&gt; traffic. The ECS security group will only allow inbound traffic from the application load balancer security group. It will also allow traffic on port 6379 for redis.&lt;/p&gt;
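&lt;p&gt;A simplified sketch of these two security groups (rule syntax abbreviated, names illustrative):&lt;/p&gt;

```hcl
resource "aws_security_group" "alb" {
  name   = "ad-hoc-alb"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "ecs" {
  name   = "ad-hoc-ecs"
  vpc_id = module.vpc.vpc_id

  # only accept application traffic that came through the load balancer
  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = [aws_security_group.alb.id]
  }

  # redis traffic between services in this same security group
  ingress {
    from_port = 6379
    to_port   = 6379
    protocol  = "tcp"
    self      = true
  }
}
```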
&lt;h3&gt;
  
  
  IAM Roles
&lt;/h3&gt;

&lt;p&gt;There are two important IAM roles that we will need for our ECS tasks: a task execution role that ECS itself uses to pull container images and send logs to CloudWatch, and a task role that our application containers use to interact with other AWS services, such as S3, Secrets Manager, etc.&lt;/p&gt;
&lt;h3&gt;
  
  
  RDS Instance
&lt;/h3&gt;

&lt;p&gt;We will create one RDS instance in one of the private subnets in our VPC. This RDS instance will have one Postgres database per ad hoc environment. This RDS instance has a security group that allows all traffic from our ECS security group.&lt;/p&gt;
&lt;h3&gt;
  
  
  Load Balancer
&lt;/h3&gt;

&lt;p&gt;We will use one load balancer for all ad hoc environments. This load balancer will have a wildcard ACM certificate attached to it (&lt;code&gt;*.dev.example.com&lt;/code&gt;, for example). Each ad hoc environment will create a Route 53 record that will point to this load balancer's public DNS name. For example, &lt;code&gt;brian.dev.example.com&lt;/code&gt; will be the address of my ad hoc environment. Requests to this address will then be routed to either the frontend ECS service or the backend ECS service depending on request header values and request path values that will be set on the listener rules.&lt;/p&gt;

&lt;p&gt;By default, a load balancer supports up to 100 listener rules, so we can create plenty of ad hoc environments before we need to increase the default quota. There will be a discussion at the end of this article about AWS service quotas.&lt;/p&gt;
&lt;h3&gt;
  
  
  Bastion Host
&lt;/h3&gt;

&lt;p&gt;The bastion host will be created in one of the VPC's public subnets. This will primarily be used for connecting to RDS to create new databases for new ad hoc environments, or for manually manipulating data in an ad hoc environment for debugging.&lt;/p&gt;
&lt;h2&gt;
  
  
  Ad hoc environment resources
&lt;/h2&gt;

&lt;p&gt;Now that we have defined the shared set of infrastructure that our ad hoc environments will use, let's have a look at the per-environment resources that will be added on top of the shared resources.&lt;/p&gt;
&lt;h3&gt;
  
  
  ECS Cluster
&lt;/h3&gt;

&lt;p&gt;The ECS Cluster is a simple grouping of ECS tasks and services.&lt;/p&gt;
&lt;h3&gt;
  
  
  ECS Tasks and Services
&lt;/h3&gt;

&lt;p&gt;Each environment will have a set of ECS tasks and services that will be used to run the application.&lt;/p&gt;

&lt;p&gt;There are five important ECS services in our application that are used to run "long-running" ECS tasks. Long-running tasks are tasks that start processes that run indefinitely, rather than running until completion. The long-running tasks in our application include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend web application (gunicorn web server)&lt;/li&gt;
&lt;li&gt;backend celery worker&lt;/li&gt;
&lt;li&gt;backend celery beat&lt;/li&gt;
&lt;li&gt;frontend web site (nginx web server)&lt;/li&gt;
&lt;li&gt;redis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The infrastructure code also defines some tasks that are not long-running but rather short lived tasks that run until completion and do not start again. These tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;collectstatic&lt;/li&gt;
&lt;li&gt;database migrations&lt;/li&gt;
&lt;li&gt;any other ad-hoc task that we want to run, usually wrapped in a Django management command&lt;/li&gt;
&lt;/ul&gt;
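&lt;p&gt;Short-lived tasks like these are launched with &lt;code&gt;aws ecs run-task&lt;/code&gt;. The helper below only assembles the command string (a real invocation also needs &lt;code&gt;--network-configuration&lt;/code&gt; with subnets and a security group); the cluster and task definition names are placeholders:&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: build the run-task command for a short-lived ECS task such as
# database migrations. Names are placeholders for an environment "brian".
run_task_cmd() {
  local cluster="$1" task_def="$2"
  echo "aws ecs run-task --cluster $cluster --task-definition $task_def --launch-type FARGATE"
}

run_task_cmd brian brian-migrate
```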
&lt;h2&gt;
  
  
  How to set up an ad hoc environment
&lt;/h2&gt;

&lt;p&gt;Now that we have been over the resources that will be created to support our ad hoc environments, let's talk about how we can enable individuals on our team to create and update ad hoc environments.&lt;/p&gt;
&lt;h3&gt;
  
  
  Design decisions
&lt;/h3&gt;

&lt;p&gt;The devops team will decide on the interface that will be used for creating an ad hoc environment. Since we are using Terraform, this interface will be a Terraform configuration. The minimum amount of information that our ad hoc environment configuration needs is image tags for the frontend and backend images to use. Other configurations will be provided by default values set in &lt;code&gt;variables.tf&lt;/code&gt;, and these defaults can easily be overridden by passing values to &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt;. I'm choosing to use &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; as the way to pass configuration values to our ad hoc environments where &lt;code&gt;&amp;lt;name&amp;gt;&lt;/code&gt; is the name of the ad hoc environment being created. This will give us the following benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;all ad hoc environments will be visible to the entire team in git since each ad hoc environment will have a &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file associated with it&lt;/li&gt;
&lt;li&gt;adding additional customization to an ad hoc environment does not add additional complexity to our automation pipeline since all customization is added through a single file that will be referenced by &lt;code&gt;$WORKSPACE.tfvars&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downsides of this approach are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;creating ad hoc environments requires knowledge of git, so non-technical product team members might need help from the engineering team when setting up an ad hoc environment&lt;/li&gt;
&lt;li&gt;there is an additional "manual" step of creating a &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file that must be done before running a pipeline to create an ad hoc environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Provided that a &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file has been created and pushed to the repo, creating or updating an ad hoc environment will be as simple as running a pipeline in GitHub Actions that specifies the &lt;code&gt;&amp;lt;name&amp;gt;&lt;/code&gt; of our ad hoc environment. If no such &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file exists, our pipeline will fail.&lt;/p&gt;
&lt;h3&gt;
  
  
  GitHub Action
&lt;/h3&gt;

&lt;p&gt;Creating ad hoc environments will involve manually triggering a GitHub Action that runs on &lt;code&gt;workflow_dispatch&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;on:
  workflow_dispatch:
    inputs:
      workspace:
        description: 'Name of terraform workspace to use'
        required: true
        default: 'dev'
        type: string
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We only have to enter the name of the ad hoc environment we want to create or update. The ad hoc environment name is used as the Terraform workspace name. This name is also the name of the &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file that must be created per environment.&lt;/p&gt;

&lt;p&gt;This workflow will do &lt;code&gt;terraform init&lt;/code&gt;, &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; using the &lt;code&gt;&amp;lt;name&amp;gt;.tfvars&lt;/code&gt; file. When everything has been created, we will use the AWS CLI to prepare the environment so that it can be used. We will use the &lt;code&gt;aws ecs run-task&lt;/code&gt; command to run database migrations needed so that the application code can make database queries.&lt;/p&gt;
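&lt;p&gt;The core of that workflow reduces to a few terraform commands. The sketch below prints the commands by default (&lt;code&gt;DRY_RUN=1&lt;/code&gt;) so it can be read as documentation; the &lt;code&gt;envs/&lt;/code&gt; directory layout and flags are assumptions about how the pipeline might be wired up:&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of the ad hoc environment pipeline steps. With DRY_RUN=1
# (the default here) commands are printed instead of executed.
set -e

WORKSPACE="${WORKSPACE:-brian}"

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

run terraform init
# the workspace is created on first use with: terraform workspace new
run terraform workspace select "$WORKSPACE"
run terraform plan -var-file="envs/${WORKSPACE}.tfvars"
run terraform apply -auto-approve -var-file="envs/${WORKSPACE}.tfvars"
```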

&lt;h2&gt;
  
  
  How to update code in an existing ad hoc environment
&lt;/h2&gt;

&lt;p&gt;Assuming that we have deployed an ad hoc environment called &lt;code&gt;brian&lt;/code&gt; with version &lt;code&gt;v1.0.0&lt;/code&gt; of the backend application and &lt;code&gt;v2.0.0&lt;/code&gt; of the frontend application, let's think about the process of updating the application to &lt;code&gt;v1.1.0&lt;/code&gt; of the backend and &lt;code&gt;v2.1.0&lt;/code&gt; of the frontend.&lt;/p&gt;

&lt;p&gt;The simplest approach to updating the application would be to edit the &lt;code&gt;brian.tfvars&lt;/code&gt; file with the new versions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# brian.tfvars
be_image_tag = "v1.1.0"
fe_image_tag = "v2.1.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If we run the same pipeline that we initially used to deploy the ad hoc environment (with &lt;code&gt;terraform init&lt;/code&gt;, &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt;) against the updated &lt;code&gt;brian.tfvars&lt;/code&gt; file, this will result in a rolling update of the frontend and backend services (&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html" rel="noopener noreferrer"&gt;more on rolling updates here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If there are database migrations included in the new version of the code that is going out, we need to run database migrations after the &lt;code&gt;terraform apply&lt;/code&gt; completes. We use a top level output from the ad hoc environment terraform configuration: a &lt;code&gt;run-task&lt;/code&gt; command with all of the appropriate arguments that will run database migrations when called from GitHub Actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Order of Operations
&lt;/h3&gt;

&lt;p&gt;For ad hoc environments, it is probably fine to update the services and then run the database migrations. Ad hoc environments may only have a single "user" -- the developer, so &lt;strong&gt;we don't need to worry about any errors that may occur if requests are made against the new version of code before database migrations have been applied&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let's consider a simple example to illustrate what can go wrong here. If we add a &lt;code&gt;total_views&lt;/code&gt; field to our blog post model to track the total number of page views a post has, we would add the field to the model, generate a migration file with &lt;code&gt;makemigrations&lt;/code&gt;, and then update our views and model serializers to make use of this new field. In the time between updating our services and running the database migrations, any requests to endpoints that access the new field will fail since the column does not yet exist in the database.&lt;/p&gt;

&lt;p&gt;If we first run database migrations &lt;strong&gt;and then&lt;/strong&gt; update application code (ECS services), then we can avoid errors about fields not existing. In our production application, we want to aim for fewer errors, so we should be using this "order of operations": first run new database migrations and then update application code.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Action for application updates
&lt;/h3&gt;

&lt;p&gt;We need a GitHub Action that can do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fetch the current container definition JSON files for each backend task (&lt;code&gt;aws ecs describe-task-definition&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;write new container definitions JSON with the new backend image tag (using &lt;code&gt;jq&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;register new task definitions with the new container definition JSON files for each task (&lt;code&gt;aws ecs register-task-definition&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;call run-task with the newly updated migration ECS task (&lt;code&gt;aws ecs run-task&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;wait for the task to exit and display the logs (&lt;code&gt;aws ecs wait tasks-stopped&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;update the backend services (gunicorn, celery, celery beat) (&lt;code&gt;aws ecs update-service&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;wait for the new backend services to be stable (&lt;code&gt;aws ecs wait services-stable&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a visual representation of the backend update process:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxn85ri3brjxa7x6fsey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxn85ri3brjxa7x6fsey.png" alt="Backend application update workflow" width="800" height="951"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to have the correct arguments for all of the AWS CLI calls used in the above workflow, we can use the AWS CLI to fetch resource names by tag.&lt;/p&gt;

&lt;p&gt;Here is the script I'm using. There are lots of comments, so please refer to them for an explanation of what the script is doing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# This script will be called to update an ad hoc environment backend&lt;/span&gt;
&lt;span class="c"&gt;# with a new image tag. It will first run pre-update tasks (such as migrations)&lt;/span&gt;
&lt;span class="c"&gt;# and then do a rolling update of the backend services.&lt;/span&gt;

&lt;span class="c"&gt;# It is called from the ad_hock_backend_update.yml GitHub Actions file&lt;/span&gt;

&lt;span class="c"&gt;# Required environment variables that need to be exported before running this script:&lt;/span&gt;

&lt;span class="c"&gt;# WORKSPACE - ad hoc environment workspace&lt;/span&gt;
&lt;span class="c"&gt;# SHARED_RESOURCES_WORKSPACE - shared resources workspace&lt;/span&gt;
&lt;span class="c"&gt;# BACKEND_IMAGE_TAG - backend image tag to update services to (e.g. v1.2.3)&lt;/span&gt;
&lt;span class="c"&gt;# AWS_ACCOUNT_ID - AWS account ID is used for the ECR repository URL&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Updating backend services..."&lt;/span&gt;

&lt;span class="c"&gt;# first define a variable containing the new image URI&lt;/span&gt;
&lt;span class="nv"&gt;NEW_BACKEND_IMAGE_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="s2"&gt;.dkr.ecr.us-east-1.amazonaws.com/backend:&lt;/span&gt;&lt;span class="nv"&gt;$BACKEND_IMAGE_TAG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;


&lt;span class="c"&gt;# register new task definitions&lt;/span&gt;
&lt;span class="c"&gt;# https://docs.aws.amazon.com/cli/latest/reference/ecs/describe-task-definition.html#description&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;TASK &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"migrate"&lt;/span&gt; &lt;span class="s2"&gt;"gunicorn"&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt; &lt;span class="s2"&gt;"beat"&lt;/span&gt;
&lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Updating &lt;/span&gt;&lt;span class="nv"&gt;$TASK&lt;/span&gt;&lt;span class="s2"&gt; task definition..."&lt;/span&gt;

  &lt;span class="c"&gt;# in Terraform we name our tasks based on the ad hoc environment name&lt;/span&gt;
  &lt;span class="c"&gt;# (also the Terraform workspace name) and the name of the task&lt;/span&gt;
  &lt;span class="c"&gt;# (e.g. migrate, gunicorn, default, beat)&lt;/span&gt;
  &lt;span class="nv"&gt;TASK_FAMILY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;-&lt;span class="nv"&gt;$TASK&lt;/span&gt;

  &lt;span class="c"&gt;# save the task definition JSON to a variable&lt;/span&gt;
  &lt;span class="nv"&gt;TASK_DESCRIPTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ecs describe-task-definition &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;# save container definitions to a file for each task&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DESCRIPTION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.containerDefinitions &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;.json

  &lt;span class="c"&gt;# write new container definition JSON with updated image&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Writing new &lt;/span&gt;&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="s2"&gt; container definitions JSON..."&lt;/span&gt;

  &lt;span class="c"&gt;# replace old image URI with new image URI in a new container definitions JSON&lt;/span&gt;
  &lt;span class="nb"&gt;cat&lt;/span&gt; /tmp/&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;.json &lt;span class="se"&gt;\&lt;/span&gt;
    | jq &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--arg&lt;/span&gt; IMAGE &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEW_BACKEND_IMAGE_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'.[0].image |= $IMAGE'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="nt"&gt;-new&lt;/span&gt;.json

  &lt;span class="c"&gt;# Get the existing configuration for the task definition (memory, cpu, etc.)&lt;/span&gt;
  &lt;span class="c"&gt;# from the variable that we saved the task definition JSON to earlier&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Getting existing configuration for &lt;/span&gt;&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;

  &lt;span class="nv"&gt;MEMORY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DESCRIPTION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.memory &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="nv"&gt;CPU&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DESCRIPTION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.cpu &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="nv"&gt;ECS_EXECUTION_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DESCRIPTION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.executionRoleArn &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="nv"&gt;ECS_TASK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DESCRIPTION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.taskRoleArn &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;# check the content of the new container definition JSON&lt;/span&gt;
  &lt;span class="nb"&gt;cat&lt;/span&gt; /tmp/&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="nt"&gt;-new&lt;/span&gt;.json

  &lt;span class="c"&gt;# register new task definition using the new container definitions&lt;/span&gt;
  &lt;span class="c"&gt;# and the values that we read off of the existing task definitions&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Registering new &lt;/span&gt;&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="s2"&gt; task definition..."&lt;/span&gt;

  aws ecs register-task-definition &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--family&lt;/span&gt; &lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--container-definitions&lt;/span&gt; file:///tmp/&lt;span class="nv"&gt;$TASK_FAMILY&lt;/span&gt;&lt;span class="nt"&gt;-new&lt;/span&gt;.json &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--memory&lt;/span&gt; &lt;span class="nv"&gt;$MEMORY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cpu&lt;/span&gt; &lt;span class="nv"&gt;$CPU&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--network-mode&lt;/span&gt; awsvpc &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--execution-role-arn&lt;/span&gt; &lt;span class="nv"&gt;$ECS_EXECUTION_ROLE_ARN&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-role-arn&lt;/span&gt; &lt;span class="nv"&gt;$ECS_TASK_ROLE_ARN&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--requires-compatibilities&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;

&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Now we need to run migrate, collectstatic and any other commands that need to be run&lt;/span&gt;
&lt;span class="c"&gt;# before doing a rolling update of the backend services&lt;/span&gt;

&lt;span class="c"&gt;# We will use the new task definitions we just created to run these commands&lt;/span&gt;

&lt;span class="c"&gt;# get the ARN of the most recent revision of the migrate task definition&lt;/span&gt;
&lt;span class="nv"&gt;TASK_DEFINITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  aws ecs describe-task-definition &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-migrate&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    .taskDefinition.taskDefinitionArn &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# get private subnets as space separated string from shared resources VPC&lt;/span&gt;
&lt;span class="nv"&gt;SUBNETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  aws ec2 describe-subnets &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:env,Values=&lt;/span&gt;&lt;span class="nv"&gt;$SHARED_RESOURCES_WORKSPACE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*private*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Subnets[*].SubnetId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# replace spaces with commas using tr&lt;/span&gt;
&lt;span class="nv"&gt;SUBNET_IDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$SUBNETS&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# https://github.com/aws/aws-cli/issues/5348&lt;/span&gt;
&lt;span class="c"&gt;# get ecs_sg_id - just a single value&lt;/span&gt;
&lt;span class="nv"&gt;ECS_SG_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  aws ec2 describe-security-groups &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=&lt;/span&gt;&lt;span class="nv"&gt;$SHARED_RESOURCES_WORKSPACE&lt;/span&gt;&lt;span class="s2"&gt;-ecs-sg"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'SecurityGroups[*].GroupId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Running database migrations..."&lt;/span&gt;

&lt;span class="c"&gt;# timestamp used for log retrieval (milliseconds after Jan 1, 1970 00:00:00 UTC)&lt;/span&gt;
&lt;span class="nv"&gt;START_TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s000&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# run the migration task and capture the taskArn into a variable called TASK_ID&lt;/span&gt;
&lt;span class="nv"&gt;TASK_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  aws ecs run-task &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-cluster&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DEFINITION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--network-configuration&lt;/span&gt; &lt;span class="s2"&gt;"awsvpcConfiguration={subnets=[&lt;/span&gt;&lt;span class="nv"&gt;$SUBNET_IDS&lt;/span&gt;&lt;span class="s2"&gt;],securityGroups=[&lt;/span&gt;&lt;span class="nv"&gt;$ECS_SG_ID&lt;/span&gt;&lt;span class="s2"&gt;],assignPublicIp=ENABLED}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tasks[0].taskArn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Task ID is &lt;/span&gt;&lt;span class="nv"&gt;$TASK_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# wait for the migrate task to exit&lt;/span&gt;
&lt;span class="c"&gt;# https://docs.aws.amazon.com/cli/latest/reference/ecs/wait/tasks-stopped.html#description&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; It will poll every 6 seconds until a successful state has been reached.&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; This will exit with a return code of 255 after 100 failed checks.&lt;/span&gt;
aws ecs &lt;span class="nb"&gt;wait &lt;/span&gt;tasks-stopped &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tasks&lt;/span&gt; &lt;span class="nv"&gt;$TASK_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-cluster&lt;/span&gt;

&lt;span class="c"&gt;# timestamp used for log retrieval (milliseconds after Jan 1, 1970 00:00:00 UTC)&lt;/span&gt;
&lt;span class="nv"&gt;END_TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s000&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# print the CloudWatch log events to STDOUT&lt;/span&gt;
aws logs get-log-events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-group-name&lt;/span&gt; &lt;span class="s2"&gt;"/ecs/&lt;/span&gt;&lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="s2"&gt;/migrate"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-stream-name&lt;/span&gt; &lt;span class="s2"&gt;"migrate/migrate/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TASK_ID&lt;/span&gt;&lt;span class="p"&gt;##*/&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="nv"&gt;$START_TIME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="nv"&gt;$END_TIME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.events[].message'&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Migrations complete. Starting rolling update for backend services..."&lt;/span&gt;

&lt;span class="c"&gt;# update backend services&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;TASK &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"gunicorn"&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt; &lt;span class="s2"&gt;"beat"&lt;/span&gt;
&lt;span class="k"&gt;do&lt;/span&gt;

  &lt;span class="c"&gt;# get taskDefinitionArn for each service to be used in update-service command&lt;/span&gt;
  &lt;span class="c"&gt;# this will get the most recent revision of each task (the one that was just created)&lt;/span&gt;
  &lt;span class="c"&gt;# https://docs.aws.amazon.com/cli/latest/reference/ecs/describe-task-definition.html#description&lt;/span&gt;
  &lt;span class="nv"&gt;TASK_DEFINITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    aws ecs describe-task-definition &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;-&lt;span class="nv"&gt;$TASK&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      .taskDefinition.taskDefinitionArn &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;# update each service with new task definintion&lt;/span&gt;
  aws ecs update-service &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-cluster&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--service&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;-&lt;span class="nv"&gt;$TASK&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task-definition&lt;/span&gt; &lt;span class="nv"&gt;$TASK_DEFINITION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--no-cli-pager&lt;/span&gt;

&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Services updated. Waiting for services to become stable..."&lt;/span&gt;

&lt;span class="c"&gt;# wait for all service to be stable (runningCount == desiredCount for each service)&lt;/span&gt;
aws ecs &lt;span class="nb"&gt;wait &lt;/span&gt;services-stable &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-cluster&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--services&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-gunicorn&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-default&lt;/span&gt; &lt;span class="nv"&gt;$WORKSPACE&lt;/span&gt;&lt;span class="nt"&gt;-beat&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Services are now stable. Backend services are now up to date with &lt;/span&gt;&lt;span class="nv"&gt;$BACKEND_IMAGE_TAG&lt;/span&gt;&lt;span class="s2"&gt;."&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backend update is now complete!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this GitHub Actions workflow, a developer can now easily change the version of backend code that is running in their ad hoc environments without needing to involve Terraform. Using the &lt;code&gt;ad_hoc_backend_update.yml&lt;/code&gt; GitHub Actions workflow, a developer only needs to enter the name of the workspace and the version of the backend code they want to use. The workflow will then run the &lt;code&gt;migrate&lt;/code&gt; task and update the backend services.&lt;/p&gt;
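&lt;p&gt;For reference, a run of this workflow can also be dispatched from the command line with the GitHub CLI. The sketch below is hedged: the &lt;code&gt;-f&lt;/code&gt; input names are assumptions and should be matched to the &lt;code&gt;workflow_dispatch&lt;/code&gt; inputs actually defined in &lt;code&gt;ad_hoc_backend_update.yml&lt;/code&gt;.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: dispatching the ad hoc backend update workflow with the GitHub CLI.
# The input names (workspace, backend-version) are assumptions; match them
# to the workflow_dispatch inputs defined in ad_hoc_backend_update.yml.

# sanity check that the requested tag looks like vX.Y.Z before dispatching
validate_tag() {
  case "$1" in
    v[0-9]*.[0-9]*.[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

dispatch_backend_update() {
  WORKSPACE="$1"
  TAG="$2"
  validate_tag "$TAG" || { echo "unexpected tag format: $TAG"; return 1; }
  gh workflow run ad_hoc_backend_update.yml \
    -f workspace="$WORKSPACE" \
    -f backend-version="$TAG"
}

# usage: dispatch_backend_update brian-test v1.0.2
```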

&lt;h3&gt;
  
  
  Using &lt;code&gt;ignore_changes&lt;/code&gt; in the definitions
&lt;/h3&gt;

&lt;p&gt;There is one more important point to make about Terraform before we conclude this discussion of updating backend code in existing ad hoc environments. Consider the scenario where a developer has launched an ad hoc environment with backend version &lt;code&gt;v1.0.0&lt;/code&gt;. They make a small change to the backend code and push version &lt;code&gt;v1.0.1&lt;/code&gt;. Next, the developer remembers that a backend environment variable needs to be changed. This can be done by updating their &lt;code&gt;*.tfvars&lt;/code&gt; file. If they now re-run the ad hoc environment update pipeline &lt;strong&gt;without also updating the backend version in their &lt;code&gt;*.tfvars&lt;/code&gt; file&lt;/strong&gt;, then their code will be reverted from &lt;code&gt;v1.0.1&lt;/code&gt; to &lt;code&gt;v1.0.0&lt;/code&gt;. We would need to coordinate version changes between the pipelines that update the backend with Terraform commands and the pipelines that update it with AWS CLI commands.&lt;/p&gt;

&lt;p&gt;There is a setting on the &lt;code&gt;aws_ecs_service&lt;/code&gt; resource in Terraform that we can use to prevent this from happening. This setting is called &lt;a href="https://www.terraform.io/language/meta-arguments/lifecycle" rel="noopener noreferrer"&gt;&lt;code&gt;ignore_changes&lt;/code&gt;&lt;/a&gt; and is defined under the resource's &lt;code&gt;lifecycle&lt;/code&gt; configuration block. With this setting, when we update the &lt;code&gt;*.tfvars&lt;/code&gt; file with our new environment variable value, Terraform will register a new task definition revision with the same &lt;code&gt;v1.0.0&lt;/code&gt; image, but the ECS service will not update in response to this change (that's what &lt;code&gt;ignore_changes&lt;/code&gt; is for). Once we make the &lt;code&gt;*.tfvars&lt;/code&gt; file update and redeploy using the Terraform pipeline, nothing in our ad hoc environment changes, but a new task definition is registered in our AWS account for each backend service. When we then update the backend with the pipeline that uses AWS CLI commands, the most recent task definition revision is used to create the new task definition, so it will include the environment variable change that we added earlier.&lt;/p&gt;
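&lt;p&gt;Concretely, the lifecycle block looks something like the following. This is a minimal sketch: the resource names and other attributes are illustrative, not the module's actual definitions.&lt;/p&gt;

```hcl
# Sketch of an ECS service resource with ignore_changes on task_definition.
resource "aws_ecs_service" "gunicorn" {
  name            = "${terraform.workspace}-gunicorn"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.gunicorn.arn
  desired_count   = 1

  lifecycle {
    # Terraform still registers new task definition revisions on apply,
    # but a changed task_definition no longer triggers a service update,
    # so deployments made by the AWS CLI pipeline are not rolled back.
    ignore_changes = [task_definition]
  }
}
```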

&lt;h3&gt;
  
  
  Frontend updates
&lt;/h3&gt;

&lt;p&gt;The process described above covers updating the backend application. Updating the frontend application follows a similar process. The main difference is that no task (such as the &lt;code&gt;migrate&lt;/code&gt; command on the backend) needs to run before the service is updated. Here's an overview of the frontend update process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fetch the container definition JSON for the frontend tasks (&lt;code&gt;aws ecs describe-task-definition&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;write new container definition JSON with the new frontend image tag (using &lt;code&gt;jq&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;register new task definitions with the new container definition JSON file for the frontend task (&lt;code&gt;aws ecs register-task-definition&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;update the frontend service (&lt;code&gt;aws ecs update-service&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;wait for the new frontend service to be stable (&lt;code&gt;aws ecs wait services-stable&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting up everything from a new AWS account and GitHub Actions
&lt;/h2&gt;

&lt;p&gt;Here's a quick overview of initial setup steps that are needed in order to use the automation defined in the GitHub Actions for ad hoc environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure AWS credentials locally
&lt;/h3&gt;

&lt;p&gt;There is one Terraform command that we will run on our local machine to set up a remote backend for storing our Terraform state. In order to run this, we need to configure AWS credentials locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run the &lt;code&gt;make tf-bootstrap&lt;/code&gt; command
&lt;/h3&gt;

&lt;p&gt;This command will set up an S3 bucket and DynamoDB table for storing Terraform state. Running this command also requires that a &lt;code&gt;bootstrap.tfvars&lt;/code&gt; file has been created from the template. This file defines the AWS region and name to be used for creating resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build and push a backend and frontend image
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;tf-bootstrap&lt;/code&gt; command also creates ECR repositories for the backend and frontend images. We can use the &lt;code&gt;ecr_backend.yml&lt;/code&gt; and &lt;code&gt;ecr_frontend.yml&lt;/code&gt; GitHub Actions workflows to build and push the backend and frontend images to the ECR repositories. These pipelines accept a single parameter which is a git tag that must exist in the repo. This git tag will then be used as the image tag for the backend and frontend images.&lt;/p&gt;
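&lt;p&gt;The core of those image pipelines can be sketched with a few Docker and AWS CLI commands. The repository name (&lt;code&gt;backend&lt;/code&gt;) matches the ECR repositories created by &lt;code&gt;tf-bootstrap&lt;/code&gt;; the build context path is an assumption.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of what ecr_backend.yml does for a given git tag.

# helper to build an ECR image URI from account ID, repo name and tag
image_uri() {
  echo "$1.dkr.ecr.us-east-1.amazonaws.com/$2:$3"
}

push_backend_image() {
  TAG="$1"
  REPO_URI=$(image_uri "$AWS_ACCOUNT_ID" backend "$TAG")

  # authenticate docker with ECR
  aws ecr get-login-password --region us-east-1 \
    | docker login --username AWS --password-stdin \
      "$AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com"

  # check out the tag, then build and push the image
  # (./backend as the build context is an assumption)
  git checkout "$TAG"
  docker build -t "$REPO_URI" ./backend
  docker push "$REPO_URI"
}

# usage: push_backend_image v1.2.3
```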

&lt;h3&gt;
  
  
  Purchase a domain name in Route 53
&lt;/h3&gt;

&lt;p&gt;I use Route 53 to manage DNS records, and have purchased a domain name that I use for testing and debugging in this and other projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a wildcard ACM certificate
&lt;/h3&gt;

&lt;p&gt;I chose to create this certificate outside of Terraform and import it via ARN. We need a wildcard certificate so that multiple ad hoc environments can be hosted on subdomains of the same domain.&lt;/p&gt;
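&lt;p&gt;Requesting the certificate is a one-time AWS CLI call, sketched below; the DNS validation records then need to be created in Route 53 before the certificate is issued.&lt;/p&gt;

```shell
#!/bin/bash
# Request a wildcard ACM certificate for a domain and print its ARN.

# helper to turn a domain into its wildcard form
wildcard_domain() {
  echo "*.$1"
}

request_wildcard_cert() {
  aws acm request-certificate \
    --domain-name "$(wildcard_domain "$1")" \
    --validation-method DNS \
    --query CertificateArn \
    --output text
}

# usage: CERT_ARN=$(request_wildcard_cert example.com)
# the printed ARN is the value for the ACM_CERTIFICATE_ARN GitHub secret
```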

&lt;h3&gt;
  
  
  Create a new EC2 key pair
&lt;/h3&gt;

&lt;p&gt;The key pair should be created manually, and its private key needs to be added to the GitHub repository secrets so that it can be used in the ad hoc environment pipelines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add secrets to GitHub
&lt;/h3&gt;

&lt;p&gt;The following secrets are needed for GitHub Actions to run. Add these as repository secrets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ACM_CERTIFICATE_ARN&lt;/code&gt; - ARN of the wildcard ACM certificate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; - AWS access key ID&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS_ACCOUNT_ID&lt;/code&gt; - AWS account ID&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt; - AWS default region (I use &lt;code&gt;us-east-1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; - AWS secret access key&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DOMAIN_NAME&lt;/code&gt; - domain name for the ad hoc environment (e.g. example.com)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;KEY_NAME&lt;/code&gt; - name of the EC2 key pair&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SSH_PRIVATE_KEY&lt;/code&gt; - private key for the EC2 key pair&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TF_BACKEND_BUCKET&lt;/code&gt; - name of the S3 bucket used for storing Terraform state&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TF_BACKEND_DYNAMODB_TABLE&lt;/code&gt; - name of the DynamoDB table used for locking the Terraform state file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TF_BACKEND_REGION&lt;/code&gt; - AWS region for the S3 bucket and DynamoDB table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These secrets are referenced in the GitHub Actions workflows.&lt;/p&gt;
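&lt;p&gt;One way to add these secrets non-interactively is with the GitHub CLI. The sketch below assumes each value is exported in a correspondingly named environment variable, which is an illustrative convention rather than something the project requires.&lt;/p&gt;

```shell
#!/bin/bash
# Set all repository secrets required by the GitHub Actions workflows.

REQUIRED_SECRETS="ACM_CERTIFICATE_ARN AWS_ACCESS_KEY_ID AWS_ACCOUNT_ID \
AWS_DEFAULT_REGION AWS_SECRET_ACCESS_KEY DOMAIN_NAME KEY_NAME \
SSH_PRIVATE_KEY TF_BACKEND_BUCKET TF_BACKEND_DYNAMODB_TABLE TF_BACKEND_REGION"

set_repo_secrets() {
  for NAME in $REQUIRED_SECRETS; do
    # read the value from the environment variable of the same name
    gh secret set "$NAME" --body "$(printenv "$NAME")"
  done
}

# usage: export each variable, then run set_repo_secrets
```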

&lt;h3&gt;
  
  
  Create shared resources
&lt;/h3&gt;

&lt;p&gt;Now that we have GitHub secrets configured, we can run the &lt;code&gt;shared_resources_create_update.yml&lt;/code&gt; GitHub Actions workflow. This will create a shared resources environment in which we can build our ad hoc environments. The workflow requires a name (e.g. &lt;code&gt;dev&lt;/code&gt;), and it requires that we create a &lt;code&gt;dev.tfvars&lt;/code&gt; file in the &lt;code&gt;terraform/live/shared-resources&lt;/code&gt; directory. This usually takes between 5 and 7 minutes to complete, since it creates an RDS instance, which takes a few minutes to provision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an ad hoc environment
&lt;/h3&gt;

&lt;p&gt;We can now create an ad hoc environment. This requires the name of the shared resources environment (e.g. &lt;code&gt;dev&lt;/code&gt;) and the name of the ad hoc environment (e.g. &lt;code&gt;brian-test&lt;/code&gt;). The only thing we need to do before creating the &lt;code&gt;brian-test&lt;/code&gt; ad hoc environment is to create the &lt;code&gt;brian-test.tfvars&lt;/code&gt; file in the &lt;code&gt;terraform/live/ad-hoc/envs&lt;/code&gt; directory. This will define the versions of the application and any other environment configuration that is needed. This pipeline usually takes about 3 minutes to finish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Update the backend version in an ad hoc environment
&lt;/h3&gt;

&lt;p&gt;Now that the backend is up and running and usable, we can update the backend version used in our ad hoc environment. This can be done by running the &lt;code&gt;ad_hoc_backend_update.yml&lt;/code&gt; GitHub Actions workflow. To run this workflow you must specify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the shared resource workspace (e.g. &lt;code&gt;dev&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;the ad hoc environment name (e.g. &lt;code&gt;brian-test&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;the new version of the backend to be used (e.g. &lt;code&gt;v1.0.2&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend version should already have been built and pushed to the ECR repository using the &lt;code&gt;ecr_backend.yml&lt;/code&gt; GitHub Actions workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use ECS Exec to access a Django shell in a running container
&lt;/h3&gt;

&lt;p&gt;Instead of SSHing into the container, we can use the &lt;code&gt;ecs-exec&lt;/code&gt; command to access a shell in the container. This is useful for debugging and testing. One of the outputs of the ad hoc environment Terraform configuration is the script needed to run the &lt;code&gt;ecs-exec&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;TASK_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ecs list-tasks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster&lt;/span&gt; alpha-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-name&lt;/span&gt;  alpha-gunicorn | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.taskArns | .[0]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="si"&gt;)&lt;/span&gt;
aws ecs execute-command &lt;span class="nt"&gt;--cluster&lt;/span&gt; alpha-cluster &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="nv"&gt;$TASK_ARN&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--container&lt;/span&gt; gunicorn &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--interactive&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"/bin/bash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will then have a shell in the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


Starting session with SessionId: ecs-execute-command-073d1947fa71c058c
root@ip-10-0-2-167:/code# python manage.py shell
Python 3.9.9 (main, Dec 21 2021, 10:03:34)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
&amp;gt;&amp;gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Destroy an ad hoc environment
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;ad_hoc_env_destroy.yml&lt;/code&gt; GitHub Actions workflow to destroy an ad hoc environment. To run this workflow you need to specify the shared resource workspace (e.g. &lt;code&gt;dev&lt;/code&gt;) and the ad hoc environment name (e.g. &lt;code&gt;brian-test&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Destroy the shared resources environment
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;shared_resources_destroy.yml&lt;/code&gt; GitHub Actions workflow to destroy the shared resources environment. To run this workflow you need to specify the shared resource workspace (e.g. &lt;code&gt;dev&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This defines the full lifecycle of creating and destroying ad hoc environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future improvements, open questions and next steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Enhanced Security
&lt;/h3&gt;

&lt;p&gt;The low-hanging fruit here is to apply least privilege to the roles used in automation and to define all roles with IaC. Currently I am using admin roles for the credentials I store in GitHub, which is a shortcut around defining roles with IaC and is not a best practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping track of ad hoc environments
&lt;/h3&gt;

&lt;p&gt;We need to think about how we can keep track of our active ad hoc environments. Active environments will incur additional AWS costs, and we do not want developers or the product team to create lots of environments and then leave them running without actively using them.&lt;/p&gt;

&lt;p&gt;We may decide to have some long-lived ad hoc environments, but those would be managed primarily by the DevOps team and respective owners (e.g. QA, product team, etc.).&lt;/p&gt;

&lt;p&gt;One way to check the active ad hoc environments would be to use the AWS CLI. We could list the ECS clusters in our development account, and this would show the number of ad hoc environments running. We could go further and list the ad hoc environments by when they were last updated. We could then ask developers or team members to remove ad hoc environments that are not in use.&lt;/p&gt;
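
&lt;p&gt;The cluster check could be sketched as a small shell filter. This assumes ad hoc clusters follow the &lt;code&gt;&amp;lt;env-name&amp;gt;-cluster&lt;/code&gt; naming pattern used in this article; the function reads cluster ARNs from stdin so the AWS call stays separate:&lt;/p&gt;

```shell
# Filter ECS cluster ARNs down to ad hoc environment cluster names.
# In practice the ARNs would come from:
#   aws ecs list-clusters --query 'clusterArns' --output text | tr '\t' '\n'
# Reading from stdin keeps the filter testable without AWS credentials.
list_ad_hoc_clusters() {
  grep -E 'cluster/[a-z0-9-]+-cluster$' | awk -F'/' '{print $NF}' | sort
}
```

&lt;p&gt;Piping the result through &lt;code&gt;wc -l&lt;/code&gt; gives a quick count of active ad hoc environments.&lt;/p&gt;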

&lt;p&gt;Or we could have a policy that all ad-hoc environments are deleted automatically at the end of each week.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Tooling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Testing Terraform Code&lt;/li&gt;
&lt;li&gt;Testing GitHub Actions&lt;/li&gt;
&lt;li&gt;Using advanced Terraform frameworks like Terragrunt&lt;/li&gt;
&lt;li&gt;Using Terraform Cloud for more advanced Terraform features&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  More secure way of defining RDS username and password
&lt;/h3&gt;

&lt;p&gt;Currently the Postgres database does not have a secure password: it is hardcoded in the module as a default value, and it is also saved in plaintext in the Terraform state file.&lt;/p&gt;
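
&lt;p&gt;A minimal sketch of one alternative: generate the password outside of Terraform and store it in SSM Parameter Store as a &lt;code&gt;SecureString&lt;/code&gt; (the parameter name below is a made-up example), so at least the value is not hardcoded in the module:&lt;/p&gt;

```shell
# Generate a strong random password locally instead of relying on a
# hardcoded default value in the Terraform module.
generate_db_password() {
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 32
}

# Store it as a SecureString (the parameter name is hypothetical), then
# have the task definition read the parameter at runtime:
#   aws ssm put-parameter --name /alpha/db-password --type SecureString \
#     --value "$(generate_db_password)"
```

&lt;p&gt;Note that if Terraform itself reads the parameter, the value will still end up in state, so fully solving this likely means passing the parameter ARN (not the value) to the task definition.&lt;/p&gt;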

&lt;h3&gt;
  
  
  Backend application update script
&lt;/h3&gt;

&lt;p&gt;The script used for updating the backend application could be improved or broken up into multiple scripts to better handle errors and failures that happen during the pipeline. The script runs several different commands and could potentially fail at any step, so it would be nice to improve the error messages so that both developers and DevOps teams can more quickly diagnose pipeline failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limiting traffic to ad hoc environments to a VPN
&lt;/h3&gt;

&lt;p&gt;Another good next step would be to show how we can limit traffic to ad hoc environments to a VPN.&lt;/p&gt;

&lt;h3&gt;
  
  
  Figure out how many ad hoc environments we can create with the default quotas
&lt;/h3&gt;

&lt;p&gt;AWS accounts limit the number of resources you can create, but for most of these quotas you can request an increase per account. I need to figure out how many ad hoc environments I can create with the default quotas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Repo Organization
&lt;/h3&gt;

&lt;p&gt;One minor improvement would be to move the &lt;code&gt;terraform&lt;/code&gt; directory out of the &lt;code&gt;django-step-by-step&lt;/code&gt; monorepo into a dedicated repo. We may also want to move GitHub Actions for creating, updating and destroying environments to this new repo. For early stage development, using a single repository that stores both application code and Terraform configuration works, but it would be better to keep these separate at the repository level as the project grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple AWS Accounts
&lt;/h3&gt;

&lt;p&gt;Everything shown here uses a single AWS account: ECR images, Terraform remote state storage, all shared resource environments and all ad hoc environments. Using one account keeps things simple for a demonstration of this workflow, but in practice it would be beneficial to use multiple AWS accounts for different purposes. This would also involve more carefully planned IAM roles for cross-account resource access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modules for stable environments to be used for long-lived pre-production and production environments
&lt;/h3&gt;

&lt;p&gt;This article looked at how to make tradeoffs between costs, speed of deployment and production parity in ad hoc environments. I'm interested in building a new set of modules that can be used to set up environments that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;are more stable and more long-lived&lt;/li&gt;
&lt;li&gt;have less resource sharing (dedicated RDS and ElastiCache resources)&lt;/li&gt;
&lt;li&gt;implement autoscaling for load-testing (or maybe implement autoscaling for ad hoc environments)&lt;/li&gt;
&lt;li&gt;can be used to perform load testing&lt;/li&gt;
&lt;li&gt;have enhanced observability tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These environments might be used as part of a QA process that does a final sign-off on a new set of features scheduled for deployment to production environments, for example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This wraps up the tour of my ad hoc environment infrastructure automation. Thank you for having a read through my article. If you have a similar (or different) approach to building ad hoc environments, I would love to hear about it.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>github</category>
      <category>django</category>
    </item>
    <item>
      <title>How and why I added AdSense and an AdBlock detector to my personal website</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sun, 07 Nov 2021 19:57:31 +0000</pubDate>
      <link>https://dev.to/briancaffey/how-and-why-i-added-adsense-and-an-adblock-detector-to-my-personal-website-5ag3</link>
      <guid>https://dev.to/briancaffey/how-and-why-i-added-adsense-and-an-adblock-detector-to-my-personal-website-5ag3</guid>
      <description>&lt;p&gt;If you are reading this article on &lt;a href="https://briancaffey.github.io/2021/10/31/how-and-why-i-added-adsense-and-adblock-detector-to-my-website" rel="noopener noreferrer"&gt;briancaffey.github.io/2021/10/31/how-and-why-i-added-adsense-and-adblock-detector-to-my-website&lt;/a&gt;, then you will be prompted to pause your ad blocker if you are using one. If you are &lt;em&gt;not&lt;/em&gt; using an ad blocker, I recommend that you consider installing one. Reading this article on a browser with AdBlock enabled will allow you to see how I detect AdBlock and ask people to pause it when they are on my site.&lt;/p&gt;

&lt;p&gt;This article is a deep dive on how I added ads to my site with Google AdSense and how I request that visitors to my site pause AdBlock so that I can make more money from Google AdSense.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I added AdSense to my site
&lt;/h2&gt;

&lt;p&gt;I have been enjoying using my GitHub Pages website to learn more about static sites, JAMStack and Nuxt.js, an awesome Vue.js framework with support for building statically generated sites. I have been able to learn and implement several different features which I have written about on my blog. Some examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding a Drift chat window so users can message me directly&lt;/li&gt;
&lt;li&gt;Implementing a contact form with formsubmit.io&lt;/li&gt;
&lt;li&gt;Using Vue.js components in Markdown files to add interactive elements to my articles (such as graphs)&lt;/li&gt;
&lt;li&gt;Adding a custom MailChimp newsletter sign-up form that is included in the footer of each page of my blog&lt;/li&gt;
&lt;li&gt;Adding an RSS feed for my blog&lt;/li&gt;
&lt;li&gt;Adding a site index and submitting it to Google&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have also been learning the suite of Google tools for monitoring and measuring traffic to my site, including Google Analytics and Google Search Console. Google Search Console is helpful for understanding the search terms that people are using when searching Google that result in organic traffic to my site.&lt;/p&gt;

&lt;p&gt;At one point I found out that another website was using the same Google Tracking code that I had previously hard-coded into an old version of my website, and my Google Analytics started measuring traffic to URLs that I didn't recognize as belonging to my site. I was able to fix this by adding a Hostname filter rule in Google Analytics.&lt;/p&gt;

&lt;p&gt;One area that I have not had any experience with until recently is Google AdSense. Google AdSense allows you to place ads on your website. Here's an overview of what I did to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a site in Google AdSense&lt;/li&gt;
&lt;li&gt;Submit my site for approval (this takes a few days)&lt;/li&gt;
&lt;li&gt;Install and configure the Google AdSense plugin for NuxtJS&lt;/li&gt;
&lt;li&gt;Add the &lt;code&gt;ads.txt&lt;/code&gt; file generated by Google AdSense to my site&lt;/li&gt;
&lt;li&gt;Confirm my address by entering a code that was mailed to me&lt;/li&gt;
&lt;li&gt;Connect a bank account to my Google AdSense account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the address confirmation code that I received from Google:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih85i18camj6ca2h0lj9.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih85i18camj6ca2h0lj9.jpeg" alt="Address confirmation" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the config code for AdSense from &lt;code&gt;nuxt.config.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="cm"&gt;/*
   ** Nuxt.js modules
   */&lt;/span&gt;
  &lt;span class="nx"&gt;modules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;// Doc: https://axios.nuxtjs.org/usage&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@nuxtjs/axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Doc: https://github.com/nuxt/content&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@nuxt/content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Doc: https://www.npmjs.com/package/@nuxtjs/sitemap&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@nuxtjs/sitemap&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@nuxtjs/feed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nuxt-i18n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@nuxtjs/google-adsense&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;     &lt;span class="c1"&gt;// AdSense config&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ca-pub-4924597640144289&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The process was pretty simple. Google now automatically places ads on my site in a few different formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ads displayed on the top and bottom of the page&lt;/li&gt;
&lt;li&gt;popup ads displayed between route navigation&lt;/li&gt;
&lt;li&gt;ads automatically inserted into the body of the page between paragraphs in my articles&lt;/li&gt;
&lt;li&gt;ads that I place on articles explicitly using the &lt;code&gt;&amp;lt;adsbygoogle /&amp;gt;&lt;/code&gt; Vue component&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once everything was set up properly I started seeing ads on my site, along with a non-zero value for estimated earnings in the AdSense console. Google has a payment threshold of $100, so I need to reach this amount before I can start receiving money from Google.&lt;/p&gt;

&lt;p&gt;My estimated earning report shows that I make between $0 and $4.32 in ad sales per day. I'm interested to see how much I can make with an article that I post across the many different channels that I can publish to. I explored this in a previous article, but the main channels I can use for sharing content are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DEV.to&lt;/li&gt;
&lt;li&gt;Facebook&lt;/li&gt;
&lt;li&gt;Hashnode&lt;/li&gt;
&lt;li&gt;Medium&lt;/li&gt;
&lt;li&gt;Reddit&lt;/li&gt;
&lt;li&gt;Discord&lt;/li&gt;
&lt;li&gt;Hacker Noon&lt;/li&gt;
&lt;li&gt;Twitter&lt;/li&gt;
&lt;li&gt;My MailChimp mailing list&lt;/li&gt;
&lt;li&gt;Substack&lt;/li&gt;
&lt;li&gt;Hacker News&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article should be a good place to start exploring how effective the different channels are in driving traffic to my site, and I'll update this article later with more details and numbers from my AdSense reports.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I built an AdBlock detector for my site
&lt;/h2&gt;

&lt;p&gt;I assume that most people reading my blog have an AdBlock extension installed in their browser like I do, such as AdBlock or ABP (Adblock Plus). This got me thinking about how I could implement a simple AdBlock detector for my site that would hide the contents of the page if AdBlock is enabled.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you check to see if AdBlock is enabled?
&lt;/h3&gt;

&lt;p&gt;I started with this question, and I came across &lt;a href="https://stackoverflow.com/questions/4869154/how-to-detect-adblock-on-my-website" rel="noopener noreferrer"&gt;this StackOverflow question&lt;/a&gt; which inspired the code that I am now using on this site to detect AdBlock.&lt;/p&gt;

&lt;h3&gt;
  
  
  The components of my AdBlock detector
&lt;/h3&gt;

&lt;p&gt;There are a few different parts of my Nuxt Application that work together to detect if AdBlock is active and request that the user pause AdBlock for the site. The main components are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A component called &lt;code&gt;AdBlockBlocker&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;This component is used in the default layout, so it is included in all pages on briancaffey.github.io&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Vuex store module called &lt;code&gt;adblock&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;this module is used to keep track of a boolean value that indicates if AdBlock is enabled&lt;/li&gt;
&lt;li&gt;the module also has a simple getter and a mutation for turning adBlock to true or false&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Some logic in the &lt;code&gt;default.vue&lt;/code&gt; layout that is used for almost all of the pages on my site

&lt;ul&gt;
&lt;li&gt;the getter from the Vuex store is used here to either show regular content or a message that the user needs to pause AdBlock&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A component/page to display when AdBlock is enabled

&lt;ul&gt;
&lt;li&gt;this component asks the user to please pause AdBlock&lt;/li&gt;
&lt;li&gt;I named this component &lt;code&gt;PleaseDisableAdblock.vue&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;localStorage

&lt;ul&gt;
&lt;li&gt;this is used to keep track of the presence of an AdBlocker that is blocking ads&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's an overview of each part:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AdBlockBlocker.vue&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the key part of how the AdBlock detection works. If the client is unable to download the &lt;code&gt;adsbygoogle.js&lt;/code&gt; file, then that indicates that the user is using AdBlock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight vue"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;script&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;mounted &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// if adblock is detected through a value that is set in local storage, then show the AdBlock message&lt;/span&gt;
    &lt;span class="c1"&gt;// the `adblockEnabled` value is set in the catch block of the `detectAdBlock` method&lt;/span&gt;
    &lt;span class="c1"&gt;// (see adblock/setAdblockEnabled mutation)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblock/setAdblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// check to see if the URL can be accessed on a 5 second interval&lt;/span&gt;
    &lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detectAdBlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt;  &lt;span class="nf"&gt;detectAdBlock &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// this is a URL that should be blocked by AdBlock&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;googleAdUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="c1"&gt;// make a request to the above URL&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;googleAdUrl&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// isAdblockEnabled is false by default&lt;/span&gt;
        &lt;span class="c1"&gt;// Check to see if isAdblockEnabled was set to true by a previous request&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblock/isAdblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c1"&gt;// if the request was successful, then the user does not have AdBlock enabled,&lt;/span&gt;
          &lt;span class="c1"&gt;// so we can set isAdblockEnabled to false using the setAdblockEnabled mutation&lt;/span&gt;
          &lt;span class="c1"&gt;// this mutation will also set the `adblockEnabled` value in local storage to "false"&lt;/span&gt;
          &lt;span class="c1"&gt;// `adblockEnabled` must be `JSON.parse`d since it is saved in localStorage as a string&lt;/span&gt;
          &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblock/setAdblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;false&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="c1"&gt;// do a full reload of the page&lt;/span&gt;
          &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reload&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// if the request was unsuccessful, then the user has AdBlock enabled.&lt;/span&gt;
        &lt;span class="c1"&gt;// we can set isAdblockEnabled to true using the setAdblockEnabled mutation&lt;/span&gt;
        &lt;span class="c1"&gt;// this will also set the `adblockEnabled` value in local storage to "true"&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblock/setAdblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;true&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="k"&gt;script&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One other important point about this component is that it runs AdBlock detection using &lt;code&gt;setInterval&lt;/code&gt;, meaning that it will check if AdBlock is enabled every few seconds while a user is on my site.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If AdBlock is enabled, the request will fail and the Vuex store value will be updated, which will cause the site to display the &lt;code&gt;PleaseDisableAdblock.vue&lt;/code&gt; component.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If AdBlock is not enabled, then the file will be read from disk cache and the Vuex store value will remain &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vuex Store&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a very simple Vuex store module. The &lt;code&gt;isAdblockEnabled&lt;/code&gt; getter is used in the &lt;code&gt;default.vue&lt;/code&gt; layout component.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;initialValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;initialValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adblockEnabled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;adblockEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;initialValue&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;isAdblockEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;adblockEnabled&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mutations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;setAdblockEnabled &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;adblockEnabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;default.vue layout logic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight vue"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Navigation&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;PleaseDisableAdblock&lt;/span&gt; &lt;span class="na"&gt;v-if=&lt;/span&gt;&lt;span class="s"&gt;"$store.getters['adblock/isAdblockEnabled']"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Nuxt&lt;/span&gt; &lt;span class="na"&gt;v-else&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;AdBlockBlocker&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;Footer&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Content to show to request that a user pauses AdBlock&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a user has AdBlock enabled, I show a simple message asking them to please disable it. I also want to invite people who do not wish to disable AdBlock to read my blog directly on GitHub, or to read it without ads by cloning or forking my repo, building it and running it in development mode with &lt;code&gt;yarn dev&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why am I adding ads and an AdBlock detector to my site?
&lt;/h2&gt;

&lt;p&gt;Being asked to pause AdBlock is increasingly common. I have noticed that the experience is not the same on each site that requests AdBlock be paused. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a site might ask you to pause AdBlock, but you have the option to continue without pausing AdBlock&lt;/li&gt;
&lt;li&gt;a site shows you a preview of an article and asks you to unpause AdBlock to see the full article&lt;/li&gt;
&lt;li&gt;all site content is hidden if you are using AdBlock, and a pop-up message asks you to "Continue to site" once you have paused AdBlock.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I remember being able to delete AdBlock detection modals and backgrounds on some sites as a way of getting around ad block detectors. This is as easy as pressing &lt;code&gt;Cmd + Shift + C&lt;/code&gt;, clicking on the element that is blocking the page content and deleting the selected element.&lt;/p&gt;

&lt;p&gt;One of the reasons I wanted to implement AdBlock detection is to see what is possible from the perspective of user experience (UX). Ideally, here's how I want the "please disable AdBlock" experience to work on my site:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user visits my site with AdBlock enabled&lt;/li&gt;
&lt;li&gt;For a few seconds, the user can start reading the content of the article&lt;/li&gt;
&lt;li&gt;The page content is replaced with a message that says "Please disable AdBlock" and some other links that people may find interesting or helpful.&lt;/li&gt;
&lt;li&gt;The user goes into the AdBlock extension and pauses AdBlock on my site&lt;/li&gt;
&lt;li&gt;The original page content is then displayed with ads&lt;/li&gt;
&lt;li&gt;If the user re-enables AdBlock shortly after pausing it, then the "Please disable AdBlock" message should be displayed again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don't want to ask the user to press a button or make the user think that they need to refresh the page. This is why I'm using &lt;code&gt;setInterval&lt;/code&gt;: I continuously make requests to the Google Ads JavaScript file to detect whether AdBlock is enabled.&lt;/p&gt;
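A minimal sketch of how this polling could work. This is hypothetical code, not the site's actual implementation: the callback name, interval, and request options are assumptions.

```javascript
// Hypothetical sketch of AdBlock detection by polling the AdSense script URL.
// If the request fails, we assume an ad blocker intercepted it.
function nextAdblockState(fetchSucceeded) {
  return { adblockEnabled: !fetchSucceeded };
}

function startAdblockPolling(onChange, intervalMs = 5000) {
  // Poll the AdSense script; blockers typically cancel this request.
  return setInterval(async () => {
    try {
      await fetch('https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js', {
        method: 'HEAD',
        mode: 'no-cors',
      });
      onChange(nextAdblockState(true));
    } catch (e) {
      onChange(nextAdblockState(false));
    }
  }, intervalMs);
}
```

In a Nuxt plugin, `onChange` could commit to the Vuex store that the `v-if` in the layout reads from, so the page swaps between the article and the "please disable AdBlock" message without a refresh.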

&lt;p&gt;I'm happy to pause my AdBlock for smaller sites that ask me to, or for newspaper sites that are supported by advertising, and I'm assuming that people visiting my site will also be OK with pausing AdBlock as a way of thanking me for the work that goes into what I share on my blog.&lt;/p&gt;

&lt;p&gt;I'm mostly curious to see what happens to my site's traffic, and to see what impact it could have on the earnings I make from AdSense.&lt;/p&gt;

&lt;p&gt;Some of my questions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What will happen to the bounce rate if I request that AdBlock users pause AdBlock for my site?&lt;/li&gt;
&lt;li&gt;What is the most effective amount of time to wait before requesting that a new user pause AdBlock for my site?&lt;/li&gt;
&lt;li&gt;What else can I include on my "Please Disable AdBlock" page to encourage new users to pause AdBlock on my site?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll share a follow-up article with numbers from my AdSense and Google Analytics accounts.&lt;/p&gt;

&lt;h3&gt;
  
  
  One small issue
&lt;/h3&gt;

&lt;p&gt;While ads are working on my site, I did notice a console error related to AdSense:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;K {message: "adsbygoogle.push() error: Only one 'enable_page_level_ads' allowed per page.", name: 'TagError', pbr: true, stack: "TagError: adsbygoogle.push() error: Only one 'enab…agead/js/adsbygoogle.js?client=ca-google:77:1130)"}
message: "adsbygoogle.push() error: Only one 'enable_page_level_ads' allowed per page."
name: "TagError"
pbr: true
stack: "TagError: adsbygoogle.push() error: Only one 'enable_page_level_ads' allowed per page.
    at go (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:219:326)
    at fo (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:218:788)
    at mo (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:225:365)
    at c (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:226:38)
    at no (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:226:156)
    at yo (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:235:248)
    at oo (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:232:89)
    at http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:227:47
    at Od.aa.ma (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:64:802)
    at Jf (http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-google:77:1130)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a related issue on GitHub: &lt;a href="https://github.com/nuxt-community/google-adsense-module/issues/141" rel="noopener noreferrer"&gt;https://github.com/nuxt-community/google-adsense-module/issues/141&lt;/a&gt;. I'm still not sure how to fix this issue. If anyone has any ideas, please let me know!&lt;/p&gt;

&lt;p&gt;If you are interested in following my progress, feel free to subscribe to my MailChimp newsletter by filling out the form in the footer of my website, or by following me on any of the accounts listed on &lt;a href="https://briancaffey.github.io/contact" rel="noopener noreferrer"&gt;briancaffey.github.io/contact&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>adsense</category>
      <category>adblock</category>
      <category>ads</category>
      <category>nuxt</category>
    </item>
    <item>
      <title>How I write and share technical software development articles in 2021</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sat, 02 Oct 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/briancaffey/how-i-write-and-share-technical-software-development-articles-in-2021-27n2</link>
      <guid>https://dev.to/briancaffey/how-i-write-and-share-technical-software-development-articles-in-2021-27n2</guid>
      <description>&lt;p&gt;This article describes how I write and share technical articles on my personal website and other developer websites and technical article aggregation sites.&lt;/p&gt;

&lt;p&gt;This article is broken into three sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How I use Nuxt.js to build &lt;a href="https://briancaffey.github.io" rel="noopener noreferrer"&gt;my personal website&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How I share my articles on other development sites and article aggregators&lt;/li&gt;
&lt;li&gt;Bonus content, project plug and conclusion&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Building &lt;code&gt;briancaffey.github.io&lt;/code&gt; with Nuxt.js
&lt;/h2&gt;

&lt;p&gt;I first started writing my personal website on GitHub pages using a static website builder called &lt;a href="https://jekyllrb.com/docs/github-pages/" rel="noopener noreferrer"&gt;Jekyll&lt;/a&gt;. Jekyll is a great tool for getting started with building a personal portfolio or technical blog, and it served me well for several years. I eventually changed my static site generation tool from Jekyll to Nuxt since I wanted to learn more about Vue.js. Also, I don't know Ruby very well, and it was difficult for me to use the Jekyll template language.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Pages
&lt;/h3&gt;

&lt;p&gt;I host my website on GitHub pages at the following domain: &lt;a href="https://briancaffey.github.io" rel="noopener noreferrer"&gt;briancaffey.github.io&lt;/a&gt;. GitHub pages is a great way to host a public site. The subdomain, &lt;code&gt;briancaffey&lt;/code&gt; in my case, is your GitHub username. GitHub pages will serve content from a specified branch and nested folder. My website uses the branch &lt;code&gt;gh-pages&lt;/code&gt; and the root folder &lt;code&gt;/&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Actions
&lt;/h3&gt;

&lt;p&gt;I use GitHub actions to deploy my website to GitHub pages. Here's the file that sets up the GitHub action that builds and deploys my site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github pages&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;master&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup Node&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache dependencies&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.npm&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}&lt;/span&gt;
          &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;${{ runner.os }}-node-&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yarn&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yarn lint&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yarn generate&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;peaceiris/actions-gh-pages@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;github_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;publish_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./docs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When changes are pushed to the &lt;code&gt;master&lt;/code&gt; branch, this GitHub Action runs. It lints the code, builds the site with &lt;code&gt;yarn generate&lt;/code&gt; and then the &lt;code&gt;peaceiris/actions-gh-pages@v3&lt;/code&gt; GitHub Action commits only the build artifacts to the &lt;code&gt;gh-pages&lt;/code&gt; branch where the content is served on &lt;code&gt;briancaffey.github.io&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuxt.js Framework
&lt;/h3&gt;

&lt;p&gt;Nuxt.js is a versatile Vue.js framework. It can be used to build several different types of websites, including Server Side Rendered (SSR) websites, single page applications (SPA) and static websites. I use the static website mode, also called Static Site Generation (SSG).&lt;/p&gt;

&lt;h3&gt;
  
  
  Content API
&lt;/h3&gt;

&lt;p&gt;Nuxt has a module called &lt;a href="https://nuxtjs.org/api/content-api" rel="noopener noreferrer"&gt;Content API&lt;/a&gt; that allows you to do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write articles in Markdown&lt;/li&gt;
&lt;li&gt;use Vue components in Markdown&lt;/li&gt;
&lt;li&gt;write custom front-matter that can be used to query Markdown articles using an API&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Articles and Folder Structure
&lt;/h3&gt;

&lt;p&gt;Nuxt uses a folder structure that automatically generates routes and helps you organize components, pages and layouts. Here is the folder structure for my personal website's repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tree -L 1 .
.
├── README.md
├── assets  &amp;lt;-- compiled assets
├── components &amp;lt;-- Vue components used in the site
├── content  &amp;lt;-- contains Markdown files for articles
├── i18n  &amp;lt;-- contains translations
├── jsconfig.json
├── layouts  &amp;lt;-- layouts used for each page
├── middleware  &amp;lt;-- I'm not using this folder
├── node_modules
├── nuxt.config.js  &amp;lt;-- config file
├── package.json
├── pages  &amp;lt;-- directory structure defines URLs for site pages
├── plugins  &amp;lt;-- plugins
├── static  &amp;lt;-- files in this folder are served as-is
├── store  &amp;lt;-- Vuex store
├── tailwind.config.js
└── yarn.lock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The content folder has the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tree -L 3 content
content/
├── 2016
│   └── 04
│       └── 07
│           └── my-article.md
├── 2017
│   └── 01
│       └── 01
│           └── my-other-article.md
...
└── projects
    └── my-project.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will produce the following routes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/2016/04/07/my-article.html&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/2017/01/01/my-other-article.html&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/projects/my-project.html&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
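The path-to-route mapping is simple enough to sketch as a small helper (hypothetical code for illustration, not actual Nuxt internals):

```javascript
// Hypothetical helper mirroring how content/ file paths map to routes on my site.
function contentPathToRoute(path) {
  // '2016/04/07/my-article.md' -> '/2016/04/07/my-article.html'
  return '/' + path.replace(/\.md$/, '.html');
}
```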

&lt;p&gt;Here's the folder structure for the &lt;code&gt;pages&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tree -L 4 pages
pages
├── README.md
├── _year
│   └── _month
│       └── _day
│           └── _slug.vue
├── blog
│   ├── index.vue
│   └── tags
│       ├── _tag.vue
│       └── index.vue
├── confirm-subscription.vue
├── contact
│   └── index.vue
├── drafts
│   └── index.vue
├── index.vue
├── projects
│   ├── _slug.vue
│   └── index.vue
└── thank-you.vue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directories in the &lt;code&gt;pages&lt;/code&gt; directory starting with an underscore (like &lt;code&gt;_year&lt;/code&gt;) can be used as URL parameters.&lt;/p&gt;
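For example, a dynamic page like `pages/_year/_month/_day/_slug.vue` can read those parameters and fetch the matching article with the Content module. This is a rough sketch with error handling omitted, not the site's actual page component:

```vue
<script>
// Rough sketch of pages/_year/_month/_day/_slug.vue
export default {
  async asyncData({ $content, params }) {
    const { year, month, day, slug } = params
    // Fetch the Markdown file at content/<year>/<month>/<day>/<slug>.md
    const article = await $content(`${year}/${month}/${day}/${slug}`).fetch()
    return { article }
  }
}
</script>
```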

&lt;h3&gt;
  
  
  Markdown and Vue Components
&lt;/h3&gt;

&lt;p&gt;One nice feature of Nuxt and the Nuxt Content module is the ability to use Vue components directly in Markdown files. Here's an example of a Vue component in a Markdown file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Markdown with Vue components

This is some content

&amp;lt;my-component /&amp;gt;

This is more content.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vue components used in Markdown files must be included in the &lt;code&gt;components/global&lt;/code&gt; directory. Here's an article on my blog where I used Vue components to show interactive graphs: &lt;a href="https://briancaffey.github.io/2021/01/16/i-scraped-analyzed-and-generated-yc-companies-founders-and-work-at-a-startup-job-postings" rel="noopener noreferrer"&gt;https://briancaffey.github.io/2021/01/16/i-scraped-analyzed-and-generated-yc-companies-founders-and-work-at-a-startup-job-postings&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Images
&lt;/h3&gt;

&lt;p&gt;Images are an important part of the articles that I write. Each article has an optional cover image, and I try to include one for every article.&lt;/p&gt;

&lt;p&gt;The cover image for this article was made with Inkscape. I have also used Gimp and Blender to generate images for my blog.&lt;/p&gt;

&lt;p&gt;Including images in the body of an article is pretty simple on my personal site. First I need to add the image to the &lt;code&gt;static&lt;/code&gt; directory, and then I can reference that image using the following syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;![alt text](/static/path/to/my-image.png)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the image for this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F981thyfzftwgd4ygy7j8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F981thyfzftwgd4ygy7j8.png" alt="Writing and publishing dev articles" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Markdown front-matter
&lt;/h3&gt;

&lt;p&gt;Front-matter is a way to define metadata for a Markdown file. It is used to define the title, description, and other metadata that can be used to query articles.&lt;/p&gt;

&lt;p&gt;Here's the front-matter for this article on my personal site &lt;a href="https://briancaffey.github.io/2021/10/02/sharing-a-dev-article-everywhere" rel="noopener noreferrer"&gt;briancaffey.github.io/2021/10/02/sharing-a-dev-article-everywhere&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;How I write and share technical development articles in &lt;/span&gt;&lt;span class="m"&gt;2021&lt;/span&gt;
&lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2021-10-02'&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;This article describes how to write and share technical articles in &lt;/span&gt;&lt;span class="m"&gt;2021&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nuxt&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;vue&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;publishing&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;writing&lt;/span&gt;
&lt;span class="na"&gt;draft&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also use front-matter for the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenGraph meta tags &amp;amp; social sharing&lt;/li&gt;
&lt;li&gt;links to other outlets&lt;/li&gt;
&lt;li&gt;meta tags&lt;/li&gt;
&lt;li&gt;article tags&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Nuxt Sitemap
&lt;/h3&gt;

&lt;p&gt;Using the &lt;code&gt;@nuxtjs/sitemap&lt;/code&gt; module, it is easy to generate a sitemap. This creates a static file that is served at &lt;code&gt;/sitemap.xml&lt;/code&gt;. This file is used in the Google Search Console to tell Google which pages on my site should be indexed, which helps with SEO.&lt;/p&gt;
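A sketch of what the relevant `nuxt.config.js` entries might look like, assuming the community module published as `@nuxtjs/sitemap`; the hostname and options here are illustrative, not my exact config:

```javascript
// Hypothetical nuxt.config.js excerpt for the sitemap module.
// In the real config this object would be the default export.
const config = {
  modules: ['@nuxtjs/sitemap'],
  sitemap: {
    hostname: 'https://briancaffey.github.io',
    gzip: true, // also serve a compressed /sitemap.xml.gz
  },
};
```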

&lt;h3&gt;
  
  
  RSS Feed
&lt;/h3&gt;

&lt;p&gt;An RSS feed is configured using another official Nuxt module called &lt;code&gt;@nuxtjs/feed&lt;/code&gt;. This plugin generates an RSS feed for the site in XML. An RSS feed can be used to automatically publish articles to other sites; I'll show this in the next section of this article.&lt;/p&gt;
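The module takes an array of feed descriptions in `nuxt.config.js`. This is an illustrative sketch, not my actual config; the path, cache time, and feed options are assumptions:

```javascript
// Hypothetical nuxt.config.js excerpt for the feed module.
const feedConfig = {
  modules: ['@nuxtjs/feed'],
  feed: [
    {
      path: '/feed.xml', // route where the RSS XML is served
      type: 'rss2',
      cacheTime: 1000 * 60 * 15,
      async create(feed) {
        feed.options = {
          title: 'briancaffey.github.io',
          link: 'https://briancaffey.github.io',
        };
        // articles would be queried here and added with feed.addItem(...)
      },
    },
  ],
};
```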

&lt;h3&gt;
  
  
  Google Analytics
&lt;/h3&gt;

&lt;p&gt;Google Analytics is used to track site traffic and gives insights into what content is popular, where my users are visiting from, how long they spend browsing my site and other helpful metrics. It is likely that many readers of my site may have disabled Google Analytics in their browsers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Search Console
&lt;/h3&gt;

&lt;p&gt;Google Search Console is another tool that is helpful from an SEO perspective.&lt;/p&gt;

&lt;p&gt;Here is a report from Google Data Studio showing some of the metrics that I use to analyze my site's traffic: &lt;a href="https://briancaffey.github.io/2021/10/02/how-i-write-and-share-technical-software-development-articles-in-2021#google-search-console" rel="noopener noreferrer"&gt;https://briancaffey.github.io/2021/10/02/how-i-write-and-share-technical-software-development-articles-in-2021#google-search-console&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  MailChimp
&lt;/h3&gt;

&lt;p&gt;I use MailChimp to build a newsletter audience, and I'll be sending my current subscribers an update about this article. I wrote an article on my blog about how I set up a MailChimp signup form on my Nuxt site: &lt;a href="https://briancaffey.github.io/zh/2020/10/10/how-to-add-email-signup-form-to-nuxt-site-with-mailchimp.html" rel="noopener noreferrer"&gt;https://briancaffey.github.io/zh/2020/10/10/how-to-add-email-signup-form-to-nuxt-site-with-mailchimp.html&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  formsubmit.co
&lt;/h3&gt;

&lt;p&gt;Site visitors can send me messages through an online form called &lt;a href="https://formsubmit.co" rel="noopener noreferrer"&gt;formsubmit.co&lt;/a&gt;. I include this form on my site's &lt;a href="https://briancaffey.github.io/contact" rel="noopener noreferrer"&gt;Contact&lt;/a&gt; page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Drift
&lt;/h3&gt;

&lt;p&gt;Drift is a freemium service that allows site visitors to send me messages in real time. It's a great way to get in touch with site visitors, and it can be configured so that messages go to the Drift mobile app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Drafts
&lt;/h3&gt;

&lt;p&gt;One option in the front-matter for my blog articles is &lt;code&gt;draft&lt;/code&gt;. If an article has &lt;code&gt;draft: true&lt;/code&gt; set in the front-matter, then the article will not be listed on the main list of blog articles on my site, and the page will not be indexed by Google. Here's where you can find the draft articles for my site: &lt;a href="https://briancaffey.github.io/drafts" rel="noopener noreferrer"&gt;https://briancaffey.github.io/drafts&lt;/a&gt;. The articles here can be accessed publicly, but I only show them on the &lt;code&gt;/drafts&lt;/code&gt; page which is not listed anywhere else on my site.&lt;/p&gt;
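The filtering itself is just a predicate on the front-matter. With the Content module it can be expressed as a `where` query; the core logic looks something like this (hypothetical sketch, not the site's exact code):

```javascript
// Hypothetical sketch: exclude articles with draft: true from listings.
// On the site this corresponds to a Content module query such as
//   $content('articles', { deep: true }).where({ draft: { $ne: true } }).fetch()
function publishedArticles(articles) {
  return articles.filter((article) => article.draft !== true);
}
```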

&lt;h2&gt;
  
  
  Publishing on other outlets
&lt;/h2&gt;

&lt;p&gt;When publishing articles from my personal website on other sites, I make sure that custom content is either removed or replaced with a link or static image that I can upload to the other site while editing the article.&lt;/p&gt;

&lt;p&gt;For example, this article includes an embedded Google Data Studio report in the version that is published on &lt;code&gt;briancaffey.github.io&lt;/code&gt;. This embedded iframe will not work when posted to other platforms, so I can instead link to an anchor tag that corresponds to the location of the custom element on my site.&lt;/p&gt;

&lt;p&gt;For this article, I have mostly tried to keep the custom content to a minimum so that it will be easy to cross-publish on other sites without having to make lots of edits to the Markdown. Most of the tweaking will likely have to do with preview images and other custom front-matter properties that some sites (like DEV.to) support.&lt;/p&gt;

&lt;h3&gt;
  
  
  DEV.to
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://dev.to"&gt;DEV.to&lt;/a&gt; is a popular site for sharing technical articles. They allow you to automatically draft articles to publish on their site by adding your site's RSS feed. This article will be published on DEV.to through the RSS feed connection that my account has with DEV.to.&lt;/p&gt;

&lt;p&gt;DEV.to articles support their own custom front-matter properties. Here's what the front-matter for the DEV.to article looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
title: How I write and share technical software development articles in 2021
published: false
date: '2021-10-02'
tags:  nuxt, vue, publishing, blogging
image: 'https://briancaffey.github.io/static/dev-sites.png'
canonical_url: https://briancaffey.github.io/2021/10/02/how-i-write-and-share-technical-software-development-articles-in-2021
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't see your article in your list of article drafts on your DEV.to dashboard, you can go into &lt;code&gt;Settings &amp;gt; Extensions &amp;gt; Publishing to DEV Community from RSS&lt;/code&gt; and click on &lt;code&gt;Save Feed Settings&lt;/code&gt;. I think this refreshes your RSS feed in your dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Medium
&lt;/h3&gt;

&lt;p&gt;I haven't published anything on Medium, so one of my goals for this article is to cross-publish it on Medium as my first article on that platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/new-story" rel="noopener noreferrer"&gt;https://medium.com/new-story&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hashnode
&lt;/h3&gt;

&lt;p&gt;Hashnode seems very similar to DEV.to. Here's a comparison that shows some of the advantages of using Hashnode as a blogging platform over DEV.to: &lt;a href="https://hashnode.com/vs/devto" rel="noopener noreferrer"&gt;https://hashnode.com/vs/devto&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hashnode.com/create/story" rel="noopener noreferrer"&gt;https://hashnode.com/create/story&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  egghead.io
&lt;/h3&gt;

&lt;p&gt;Egghead is another blogging platform that helps you &lt;code&gt;Own Your Online Presence&lt;/code&gt; and also lets you create free and paid courses and content.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hacker Noon
&lt;/h3&gt;

&lt;p&gt;Hacker Noon is another platform that I haven't used before as a writer, but one that I'm aiming to publish on with this article!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.hackernoon.com/new" rel="noopener noreferrer"&gt;https://app.hackernoon.com/new&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reddit
&lt;/h3&gt;

&lt;p&gt;I have shared a lot of content on different programming subreddits specific to some of the tools and frameworks I use, such as &lt;code&gt;r/aws&lt;/code&gt; and &lt;code&gt;r/django&lt;/code&gt;. When sharing on reddit, I like to share links to my personal website with at least one comment that provides a detailed summary of the article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/Nuxt/" rel="noopener noreferrer"&gt;https://www.reddit.com/r/Nuxt/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Facebook
&lt;/h3&gt;

&lt;p&gt;Facebook has very large and active developer communities. Sometimes the communities are more fragmented than the communities on reddit. For example, there are several Nuxt communities on Facebook, but there is just one &lt;code&gt;r/nuxt&lt;/code&gt;. Similar to sharing content on reddit, I like to share links to my personal websites with detailed comments on the content of my article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discord Servers
&lt;/h3&gt;

&lt;p&gt;Discord also has some dedicated servers for software frameworks, such as Nuxt.js. Discord seems to be the official place where Vue.js community members chat in real-time. There are dedicated channels on the server for sharing articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus content, project plug and conclusion
&lt;/h2&gt;

&lt;p&gt;One more great thing about GitHub Pages is that you can publish a site for any of your GitHub repositories, and it will be hosted on a subpath of your GitHub Pages blog.&lt;/p&gt;

&lt;p&gt;I have been working on updating and rewriting my Django + Vue.js + AWS reference project. It contains a documentation site that I am building with VuePress. The repo for this project is here: &lt;a href="https://github.com/briancaffey/django-step-by-step" rel="noopener noreferrer"&gt;github.com/briancaffey/django-step-by-step&lt;/a&gt;. This repository has its own GitHub Pages configuration, as well as a GitHub Action to help automate the deployment of this project documentation site to GitHub Pages. The project site is currently hosted on &lt;a href="https://briancaffey.github.io/django-step-by-step/" rel="noopener noreferrer"&gt;briancaffey.github.io/django-step-by-step/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You may want to split a large project's documentation site into its own site, rather than having it live on the nested path of a personal blog. Following this pattern, your GitHub pages blog can become a site that is much larger than one single Nuxt static site. &lt;code&gt;briancaffey.github.io&lt;/code&gt; is now a hybrid Nuxt.js and VuePress site, with a subset of routes (starting with /django-step-by-step/) being served by VuePress.&lt;/p&gt;

&lt;p&gt;As I'm writing this article, Nuxt 3 is almost one week away from a public beta. I'm excited to try upgrading this site to Nuxt 3 and trying out some of the new features that it includes.&lt;/p&gt;

&lt;p&gt;Thanks for reading this article, wherever you may have found it on the internet!&lt;/p&gt;

</description>
      <category>nuxt</category>
      <category>vue</category>
      <category>publishing</category>
      <category>blogging</category>
    </item>
    <item>
      <title>On-demand, serverless Valheim server setup with AWS CDK, Discord Interactions and GitLab CI</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sat, 20 Mar 2021 17:07:51 +0000</pubDate>
      <link>https://dev.to/briancaffey/on-demand-serverless-valheim-server-setup-with-aws-cdk-discord-interactions-and-gitlab-ci-58pj</link>
      <guid>https://dev.to/briancaffey/on-demand-serverless-valheim-server-setup-with-aws-cdk-discord-interactions-and-gitlab-ci-58pj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Here's a link to the GitLab repo I'll be referencing in this article: &lt;a href="https://gitlab.com/briancaffey/valheim-cdk-discord-interactions" rel="noopener noreferrer"&gt;https://gitlab.com/briancaffey/valheim-cdk-discord-interactions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is an in-depth technical article about running an on-demand, dedicated server for Valheim using Amazon Web Services, controlled with Discord Slash Commands, a new part of their Interactions API that is currently in beta. Valheim is an open-world online multiplayer survival game loosely based on Norse mythology that has blown up in popularity recently.&lt;/p&gt;

&lt;p&gt;My main goal with this project was to find an inexpensive way of running a server given how my friends and I play the game, which is typically a few times a week in the evenings. Some combination of 4 of us will start playing, and we drop in and out between walking dogs, cooking, etc. When playing, we all jump on a dedicated voice channel on our group's Discord.&lt;/p&gt;

&lt;p&gt;Before setting up a dedicated server, our game's world state was stored on files that lived on one of our computers, and that computer needed to be running the game server in order for anyone to connect. Sending files around would be possible, but would quickly become tedious. There are lots of services that offer dedicated servers for Valheim, as well as many technical guides and channels on the official Valheim Discord server to support the use of dedicated servers. I wanted to see if I could set up a server myself on AWS using CDK, or Cloud Development Kit. CDK is an Infrastructure as Code (IaC) tool that allows you to define, deploy and update AWS infrastructure with popular programming languages such as Python, Typescript, Java, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;cdk-valheim&lt;/code&gt; construct on GitHub
&lt;/h2&gt;

&lt;p&gt;The best part of CDK is that it enables the creation of high-level, reusable constructs that can be published to software registries like npm and PyPI. Developers can import and use these constructs in their own CDK code. A quick Google search for "cdk valheim" turned up a few results, and &lt;a href="https://github.com/gotodeploy/cdk-valheim" rel="noopener noreferrer"&gt;cdk-valheim&lt;/a&gt; seemed like the best option for what I was looking for. This construct uses ECS, a container orchestration service from AWS that I have experience using with web applications, along with EFS for persistent file storage. Although it is written in TypeScript, I can still use the construct in my preferred programming language (Python) without any extra effort or configuration. This is thanks to jsii. From &lt;a href="https://github.com/aws/jsii" rel="noopener noreferrer"&gt;https://github.com/aws/jsii&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;jsii&lt;/code&gt; allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's an overview of the &lt;code&gt;cdk-valheim&lt;/code&gt; construct:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled scaling of an ECS service using AWS Fargate (a serverless compute engine for containers)&lt;/li&gt;
&lt;li&gt;Elastic File System (EFS) file system mounted into the Fargate Task container of our ECS service&lt;/li&gt;
&lt;li&gt;Optional automated backups of the EFS file system using AWS Backup&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discord Interactions and Slash Commands
&lt;/h2&gt;

&lt;p&gt;Scheduled ECS scaling is nice, but we don't always know when we will be able to play together, so it is not the best way to minimize infrastructure costs. My plan was to set the ECS service to an initial task count of zero and then let any of us set the number of tasks to either zero or one through a Discord Slash Command. A Slash Command allows you to interact with a discord bot by typing &lt;code&gt;/&lt;/code&gt; and then tabbing through to the option we want, such as &lt;code&gt;/valheim server start&lt;/code&gt; or &lt;code&gt;/valheim server status&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Invoking a Slash Command sends a &lt;code&gt;POST&lt;/code&gt; request from Discord to a webhook URL that we have to provide. One simple and inexpensive way to handle the webhook is with API Gateway and a Lambda function that serves a Flask app. The Flask app can then use boto3 (which is included in the Lambda execution environment) to call &lt;code&gt;update_service&lt;/code&gt; or &lt;code&gt;describe_services&lt;/code&gt; to scale the ECS service's &lt;code&gt;desiredCount&lt;/code&gt; based on the slash command options and sub-options.&lt;/p&gt;

&lt;p&gt;The function that handles the webhook &lt;code&gt;POST&lt;/code&gt; request when the sub-command is &lt;code&gt;status&lt;/code&gt; queries AWS for the number of ECS tasks in our service that are &lt;code&gt;desired&lt;/code&gt;, &lt;code&gt;running&lt;/code&gt; and &lt;code&gt;pending&lt;/code&gt;, and then sends back a message that is displayed to the user who sent the command. When the sub-command is &lt;code&gt;start&lt;/code&gt; or &lt;code&gt;stop&lt;/code&gt;, the &lt;code&gt;desiredCount&lt;/code&gt; is set to &lt;code&gt;1&lt;/code&gt; or &lt;code&gt;0&lt;/code&gt; respectively.&lt;/p&gt;
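
&lt;p&gt;The start/stop/status logic can be sketched as two small pure functions, kept separate from the boto3 calls so they are easy to test. The function names and message format here are illustrative, not the exact code from the linked repo:&lt;/p&gt;

```python
def desired_count_for(sub_command: str) -> int:
    """Map the "start"/"stop" slash-command choice to an ECS desiredCount."""
    return 1 if sub_command == "start" else 0


def format_status(describe_response: dict) -> str:
    """Build the status message from an ecs.describe_services() response dict."""
    svc = describe_response["services"][0]
    return (f"desired: {svc['desiredCount']}, "
            f"running: {svc['runningCount']}, "
            f"pending: {svc['pendingCount']}")
```

&lt;p&gt;The Lambda would then call &lt;code&gt;ecs.update_service(cluster=..., service=..., desiredCount=desired_count_for(value))&lt;/code&gt; for start/stop, and pass the &lt;code&gt;describe_services&lt;/code&gt; response to the formatting helper for status.&lt;/p&gt;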

&lt;p&gt;Here's an overview of the server setup and how Discord Slash Commands can be used to control the server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd66t3cbd0n9vxeem6qyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd66t3cbd0n9vxeem6qyh.png" alt="Diagram" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'll cover each of the steps labelled in this diagram at the end of the article. The rest of the article will provide detailed instructions for how to set everything up. I won't be going over AWS account setup or Discord server setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to set up the Discord developer application and Interaction
&lt;/h2&gt;

&lt;p&gt;First, you need to be the admin of a Discord server. Once you create the server, go to &lt;code&gt;Server Settings &amp;gt; Widget&lt;/code&gt; and take note of the Server ID. This is also known as the Guild ID.&lt;/p&gt;

&lt;p&gt;Then go to &lt;a href="https://discord.com/developers/applications" rel="noopener noreferrer"&gt;https://discord.com/developers/applications&lt;/a&gt; and create an application. Under &lt;code&gt;General Information&lt;/code&gt;, make note of the Application ID. Put this value, along with the Server ID, in a &lt;code&gt;.gitignore&lt;/code&gt;d file called &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export GUILD_ID=123456789
export APPLICATION_ID=abc123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use this file later when registering the Interaction.&lt;/p&gt;

&lt;p&gt;Next go to the &lt;code&gt;OAuth2&lt;/code&gt; tab and select the &lt;code&gt;bot&lt;/code&gt; and &lt;code&gt;applications.commands&lt;/code&gt; permissions. This will generate an OAuth2 authorization link. Copy the link and open it in a browser. We will see an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OAuth2 application does not have a bot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configure a bot for our Discord application
&lt;/h3&gt;

&lt;p&gt;Next, go to the &lt;code&gt;Bot&lt;/code&gt; tab and click the &lt;code&gt;Add Bot&lt;/code&gt; button.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Adding a bot user gives your app visible life in Discord. However, this action is irrevocable! Choose wisely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Turn off the &lt;code&gt;Public Bot&lt;/code&gt; option and save changes.&lt;/p&gt;

&lt;p&gt;Get the bot token by clicking on &lt;code&gt;Click to Reveal Token&lt;/code&gt;, and add this to &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export BOT_TOKEN=abc.xyz.123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now go back to the &lt;code&gt;OAuth2&lt;/code&gt; tab and select the &lt;code&gt;bot&lt;/code&gt; and &lt;code&gt;applications.commands&lt;/code&gt; permissions again, copy the link and open it. Select the server that you want to add this application to. You should see a captcha, and then a message that says &lt;code&gt;Authorized&lt;/code&gt;. You should also see a message from your discord server that the bot has joined the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the Interaction
&lt;/h3&gt;

&lt;p&gt;Now we will set up the Interaction. Currently the only way to set up the interaction is through an HTTP &lt;code&gt;POST&lt;/code&gt; request. This Python script sets up our Interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
https://discord.com/developers/docs/interactions/slash-commands#registering-a-command
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;APPLICATION_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPLICATION_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;GUILD_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GUILD_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;BOT_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BOT_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://discord.com/api/v8/applications/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;APPLICATION_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/guilds/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;GUILD_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/commands&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start, stop or get the status of the Valheim server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;options&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valheim_server_controls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What do you want to do?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bot &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BOT_TOKEN&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before running this script, source the &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 register_bot.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see this response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;b'{"id": "XXXXXXXXXXXXXX", "application_id": "XXXXXXXXXXXXXX", "name": "vh", "description": "Start, stop or get the status of the Valheim server", "version": "XXXXXXXXXXXXXX", "default_permission": true, "guild_id": "XXXXXXXXXXXXXX", "options": [{"type": 3, "name": "valheim_server_controls", "description": "What do you want to do?", "required": true, "choices": [{"name": "status", "value": "status"}, {"name": "start", "value": "start"}, {"name": "stop", "value": "stop"}]}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when you type &lt;code&gt;/&lt;/code&gt; in any channel on the Discord server where you authorized the bot, you should see the &lt;code&gt;vh&lt;/code&gt; command at the top of the list of autocomplete options.&lt;/p&gt;

&lt;p&gt;If we run the command with any of these options, we will see a response saying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This interaction failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because we have not configured an &lt;code&gt;Interactions Endpoint URL&lt;/code&gt; under the &lt;code&gt;General Information&lt;/code&gt; section of our Discord Application's admin page (&lt;a href="https://discord.com/developers/applications/" rel="noopener noreferrer"&gt;https://discord.com/developers/applications/&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the Interactions Endpoint URL for our Slash Command
&lt;/h2&gt;

&lt;p&gt;In order for our Slash Command to do anything, we need to set up a URL that Discord will &lt;code&gt;POST&lt;/code&gt; the Interaction event data to, including information such as who sent the Interaction, what channel it was sent on, what options were used, etc. You can see an example of the event payload &lt;a href="https://discord.com/developers/docs/interactions/slash-commands#receiving-an-interaction" rel="noopener noreferrer"&gt;here on the Discord developer documentation&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A_UNIQUE_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"member"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;53908232506183680&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mason"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"avatar"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a_d5efa99b3eeaa7dd43acca82f5692432"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"discriminator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1337"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"public_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;131141&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"roles"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"539082325061836999"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"premium_since"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2147483647"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"nick"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mute"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"joined_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2017-03-13T19:19:14.040000+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"is_pending"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deaf"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"786008729715212338"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"guild_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"290926798626357999"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cardname"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The Gitrog Monster"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cardsearch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"771825006014889984"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"channel_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"645027906669510667"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;POST&lt;/code&gt; request also includes some special security headers that we will need to validate in our handling function. The validation can be handled with a decorator provided by the &lt;code&gt;discord-interactions&lt;/code&gt; package on PyPI, but we will need to add some additional configuration to our API Gateway endpoint, since these headers are not passed through to the Lambda function by default.&lt;/p&gt;
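
&lt;p&gt;Once the signature check passes, the handler's dispatch logic can be sketched as follows. This is a minimal illustration, not the repo's exact code: type &lt;code&gt;1&lt;/code&gt; is Discord's PING/PONG handshake used to validate the endpoint URL, and type &lt;code&gt;4&lt;/code&gt; in the response means "reply with a message in the channel":&lt;/p&gt;

```python
def handle_interaction(event: dict) -> dict:
    """Dispatch a verified Discord interaction payload to a response dict."""
    # Discord sends a PING (type 1) when validating the Interactions Endpoint
    # URL; we must answer with a PONG (also type 1).
    if event.get("type") == 1:
        return {"type": 1}

    # A slash command invocation (type 2) carries the chosen option value,
    # e.g. "status", "start" or "stop" for our /vh command.
    value = event["data"]["options"][0]["value"]

    # The real handler would call ECS here based on the value; this reply
    # string is just a placeholder.
    reply = f"received sub-command: {value}"

    # Type 4 = CHANNEL_MESSAGE_WITH_SOURCE: show a message to the user.
    return {"type": 4, "data": {"content": reply}}
```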

&lt;h2&gt;
  
  
  Setting up AWS infrastructure with CDK
&lt;/h2&gt;

&lt;p&gt;Let's start a CDK project in a blank repository that will define our infrastructure, Lambda functions and CI/CD pipeline with GitLab CI. Make sure that you have the &lt;code&gt;aws-cdk&lt;/code&gt; CLI installed globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i -g aws-cdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start a CDK project in a subdirectory called &lt;code&gt;cdk&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir cdk &amp;amp;&amp;amp; cd cdk &amp;amp;&amp;amp; cdk init app --language=python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Add CDK project dependencies
&lt;/h3&gt;

&lt;p&gt;The next step is to add all of the dependencies that we will use to our CDK project. In &lt;code&gt;setup.py&lt;/code&gt;, add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;install_requires&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.core==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.aws_applicationautoscaling==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.aws_datasync==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.aws_lambda==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.aws_s3==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk.aws_apigateway==1.92.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cdk-valheim==0.0.16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we can add the CDK construct for &lt;code&gt;ValheimWorld&lt;/code&gt; in the &lt;code&gt;cdk_stack.py&lt;/code&gt; file that was generated in our project as well as the imports for the packages we included in &lt;code&gt;setup.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cdk&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_datasync&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;datasync&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_iam&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_lambda&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_apigateway&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_applicationautoscaling&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;appScaling&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_s3&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cdk_valheim&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ValheimWorld&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValheimWorldScalingSchedule&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CdkStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# The code that defines your stack goes here
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valheim_world&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ValheimWorld&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ValheimWorld&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;memory_limit_mib&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;schedules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ValheimWorldScalingSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;appScaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CronOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week_day&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1-5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;appScaling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CronOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;week_day&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1-5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVER_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVER_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CDK Valheim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WORLD_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WORLD_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Amazon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVER_PASS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SERVER_PASS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fargate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BACKUPS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are almost ready to deploy a basic version of our Valheim server using the &lt;code&gt;cdk-valheim&lt;/code&gt; construct. If we were to run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy --app cdk/app.py --require-approval never
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;from the root of our project, it should work. This assumes that we have default credentials configured in &lt;code&gt;~/.aws/credentials&lt;/code&gt; and that we have also bootstrapped our AWS account with the resources it needs for CDK to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk bootstrap --app cdk/app.py aws://$AWS_ACCOUNT_ID/$AWS_DEFAULT_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
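
&lt;p&gt;For reference, a minimal &lt;code&gt;~/.aws/credentials&lt;/code&gt; file looks something like the following (the key values here are placeholders for your own credentials):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;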



&lt;h3&gt;
  
  
  Setup GitLab CI job for automated deployments
&lt;/h3&gt;

&lt;p&gt;Instead of deploying from the command line, it would be better to run the deployment from a CI/CD pipeline. Add a &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; file to the root of your project and populate it with the following YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;

&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.8&lt;/span&gt;

&lt;span class="na"&gt;cdk_deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$CI_COMMIT_TAG"&lt;/span&gt;
      &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
  &lt;span class="na"&gt;before_script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;apt-get -qq update &amp;amp;&amp;amp; apt-get -y install nodejs npm&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm i -g aws-cdk&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip3 install -e cdk&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cdk bootstrap --app cdk/app.py aws://$AWS_ACCOUNT_ID/$AWS_DEFAULT_REGION&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cdk deploy --app cdk/app.py --require-approval never&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before we initialize a git repository in the root directory of our project, remove the &lt;code&gt;.git&lt;/code&gt; directory that CDK created when we initialized the project: &lt;code&gt;rm -rf cdk/.git&lt;/code&gt;. Now initialize a repository in the root directory with &lt;code&gt;git init&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, create a GitLab repository and add the remote to this project with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin git@gitlab.com:gitlab-username/project-name.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the GitLab project's &lt;code&gt;Settings &amp;gt; CI/CD &amp;gt; Variables&lt;/code&gt; section, add the following environment variables as protected variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AWS_ACCOUNT_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;APPLICATION_PUBLIC_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SERVER_PASS&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SERVER_NAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WORLD_NAME&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now under &lt;code&gt;Settings &amp;gt; Repository &amp;gt; Protected Tags&lt;/code&gt;, add a wildcard (&lt;code&gt;*&lt;/code&gt;) so that all tags are protected and only maintainers can push tags. This allows us to use the protected environment variables only when a trusted maintainer pushes a tag to the repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Edit the name of the stack and add region and account info
&lt;/h3&gt;

&lt;p&gt;We are almost ready to create a tag and push to GitLab, but before we do that let's change the name of the CloudFormation stack that CDK will create in &lt;code&gt;cdk/app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cdk&lt;/span&gt;

&lt;span class="c1"&gt;# For consistency with TypeScript code, `cdk` is the preferred import name for
# the CDK's core module.  The following line also imports it as `core` for use
# with examples from the CDK Developer's Guide, which are in the process of
# being updated to use `cdk`.  You may delete this import if you don't need it.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cdk.cdk_stack&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CdkStack&lt;/span&gt;

&lt;span class="n"&gt;aws_region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;aws_account&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nc"&gt;CdkStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valheim-server-stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;aws_region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;aws_account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;synth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now commit changes, create a tag and push it to GitLab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
git commit -m "initial commit"
git tag v0.0.1
git push origin v0.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the logs of the GitLab CI pipeline that this triggers under your GitLab project's &lt;code&gt;CI/CD &amp;gt; Pipelines&lt;/code&gt; page.&lt;/p&gt;

&lt;p&gt;If everything runs successfully, you should be able to see your Valheim server listed in the list of community servers once it comes online, and you should be able to connect to it with the password you set in GitLab project variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add the Lambda function handler code
&lt;/h2&gt;

&lt;p&gt;We will have a simple Flask application respond to the Discord &lt;code&gt;POST&lt;/code&gt; requests that send Interaction events. Let's add &lt;code&gt;lambda-handler.py&lt;/code&gt; at &lt;code&gt;lambda/functions/interactions/lambda-handler.py&lt;/code&gt; and &lt;code&gt;requirements.txt&lt;/code&gt; at &lt;code&gt;lambda/functions/interactions/requirements.txt&lt;/code&gt;. Our project structure should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tree -L 4
.
├── cdk
│   ├── app.py
│   ├── cdk
│   │   ├── cdk_stack.py
│   │   └── __init__.py
│   ├── cdk.json
│   ├── README.md
│   ├── requirements.txt
│   ├── setup.py
│   └── source.bat
├── lambda
│   └── functions
│       └── interactions
│           ├── lambda-handler.py    &amp;lt;-- here
│           └── requirements.txt     &amp;lt;-- and here
├── README.md
└── register_bot.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
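
&lt;p&gt;The &lt;code&gt;register_bot.py&lt;/code&gt; script shown in the tree above is what registers the slash command with Discord. A rough sketch of what it might contain, using only the standard library (the command name, option choices and environment variable names here are illustrative; see Discord's Slash Commands documentation for the exact payload format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import urllib.request

APPLICATION_ID = os.environ.get("APPLICATION_ID", "")
GUILD_ID = os.environ.get("GUILD_ID", "")
BOT_TOKEN = os.environ.get("BOT_TOKEN", "")

# Slash command definition: /valheim &amp;lt;command&amp;gt;, where &amp;lt;command&amp;gt; is one of
# status, start or stop (type 3 is a STRING option)
command_payload = {
    "name": "valheim",
    "description": "Manage the Valheim server",
    "options": [
        {
            "name": "command",
            "description": "The action to take",
            "type": 3,
            "required": True,
            "choices": [
                {"name": "status", "value": "status"},
                {"name": "start", "value": "start"},
                {"name": "stop", "value": "stop"},
            ],
        }
    ],
}

if APPLICATION_ID and GUILD_ID and BOT_TOKEN:
    url = (
        "https://discord.com/api/v8/applications/"
        f"{APPLICATION_ID}/guilds/{GUILD_ID}/commands"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(command_payload).encode(),
        headers={
            "Authorization": f"Bot {BOT_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Registering the command against a single guild (rather than globally) makes it available immediately, which is convenient while testing.&lt;/p&gt;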



&lt;p&gt;The &lt;code&gt;requirements.txt&lt;/code&gt; defines the pip dependencies for our Lambda function. It should include the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws-wsgi==0.2.7
discord-interactions==0.2.0
Flask==1.1.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aws-wsgi&lt;/code&gt; will transform API Gateway requests into WSGI application requests that Flask can handle&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;discord-interactions&lt;/code&gt; will help us with some security-related requirements&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Flask&lt;/code&gt; will be our web application framework&lt;/li&gt;
&lt;/ul&gt;
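
&lt;p&gt;Conceptually, &lt;code&gt;aws-wsgi&lt;/code&gt; takes the API Gateway proxy event and builds a WSGI &lt;code&gt;environ&lt;/code&gt; dictionary that Flask can route like any other request. A simplified illustration of the idea (this is not the library's actual implementation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io

def event_to_environ(event):
    """Map a few key fields of an API Gateway proxy event onto a WSGI environ."""
    body = (event.get("body") or "").encode()
    return {
        "REQUEST_METHOD": event["httpMethod"],
        "PATH_INFO": event["path"],
        "QUERY_STRING": "",
        "CONTENT_LENGTH": str(len(body)),
        "CONTENT_TYPE": (event.get("headers") or {}).get("Content-Type", ""),
        "wsgi.input": io.BytesIO(body),
        "wsgi.url_scheme": "https",
    }

environ = event_to_environ({
    "httpMethod": "POST",
    "path": "/discord",
    "headers": {"Content-Type": "application/json"},
    "body": '{"type": 1}',
})
print(environ["REQUEST_METHOD"], environ["PATH_INFO"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;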

&lt;p&gt;Here's the code for &lt;code&gt;lambda-handler.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;awsgi&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;discord_interactions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;verify_key_decorator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ecs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Your public key can be found on your application in the Developer Portal
&lt;/span&gt;&lt;span class="n"&gt;PUBLIC_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;APPLICATION_PUBLIC_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/discord&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nd"&gt;@verify_key_decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PUBLIC_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;interaction_option&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;options&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Could not parse the interaction option&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;interaction_option&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Interaction:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction_option&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;interaction_option&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

                &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe_services&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_CLUSTER_ARN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_SERVICE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;desired_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;desiredCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;running_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runningCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;pending_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pendingCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Desired: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;desired_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Running: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;running_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Pending: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pending_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Could not get server status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Could not get the server status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;interaction_option&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting the server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_CLUSTER_ARN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_SERVICE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;desiredCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;interaction_option&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopping the server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_CLUSTER_ARN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_SERVICE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;desiredCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allowed_mentions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;awsgi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;base64_content_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how we pass the Flask &lt;code&gt;app&lt;/code&gt; to &lt;code&gt;awsgi.response&lt;/code&gt;. &lt;code&gt;aws-wsgi&lt;/code&gt; (imported as &lt;code&gt;awsgi&lt;/code&gt;) is the go-between for API Gateway and WSGI: it translates API Gateway events into WSGI requests and translates the WSGI responses back into the format API Gateway expects.&lt;/p&gt;
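&lt;p&gt;To get a feel for what &lt;code&gt;aws-wsgi&lt;/code&gt; does, here is a rough, simplified sketch of that translation step. The real library handles headers, query strings, base64-encoded bodies and much more; &lt;code&gt;event_to_environ&lt;/code&gt; is a made-up name for illustration, not part of the library's API.&lt;/p&gt;

```python
import io


def event_to_environ(event):
    # Map a few fields of an API Gateway proxy event onto the WSGI
    # environ dict that a WSGI app like Flask consumes. Illustrative
    # only -- a real translation layer sets many more keys.
    body = (event.get("body") or "").encode()
    return {
        "REQUEST_METHOD": event.get("httpMethod", "GET"),
        "PATH_INFO": event.get("path", "/"),
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }


environ = event_to_environ({"httpMethod": "POST", "path": "/discord", "body": "{}"})
```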

&lt;h2&gt;
  
  
  Add the CDK code for API Gateway and Lambda that will serve our Discord Interaction Endpoint URL
&lt;/h2&gt;

&lt;p&gt;Now we can configure the API Gateway and Lambda function. Add the following code to &lt;code&gt;cdk_stack.py&lt;/code&gt; after our definition of &lt;code&gt;self.valheim_world&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;env_vars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPLICATION_PUBLIC_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPLICATION_PUBLIC_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_SERVICE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valheim_world&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ECS_CLUSTER_ARN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valheim_world&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_arn&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flask_lambda_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LayerVersion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FlaskAppLambdaLayer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./layers/flask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;compatible_runtimes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PYTHON_3_8&lt;/span&gt;&lt;span class="p"&gt;,],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flask_app_lambda&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FlaskAppLambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PYTHON_3_8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./lambda/functions/interactions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flask-app-handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda-handler.handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flask_lambda_layer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;env_vars&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flask_app_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_managed_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_managed_policy_arn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ECS_FullAccessPolicy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;managed_policy_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::aws:policy/AmazonECS_FullAccess&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# https://slmkitani.medium.com/passing-custom-headers-through-amazon-api-gateway-to-an-aws-lambda-function-f3a1cfdc0e29
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request_templates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;{
                &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$context.httpMethod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; : $input.json(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;),
                &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {
                    #foreach($param in $input.params().header.keySet())
                    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$param&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$util.escapeJavaScript($input.params().header.get($param))&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
                    #if($foreach.hasNext),#end
                    #end
                }
            }
            &lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FlaskAppEndpoint&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discord_interaction_webhook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discord_interaction_webhook_integration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flask_app_lambda&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;request_templates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request_templates&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discord_interaction_webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discord_interaction_webhook_integration&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First we add some environment variables that will be made available to the Lambda function's execution environment. The Lambda function needs the ECS cluster ARN, the ECS service name and our Discord application's &lt;code&gt;PUBLIC_KEY&lt;/code&gt; for everything to work.&lt;/p&gt;

&lt;p&gt;We also have to give the Lambda function permission to make changes to ECS, since it interacts with ECS via boto3.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;self.request_templates&lt;/code&gt; is needed in order to pass the security headers from the Discord &lt;code&gt;POST&lt;/code&gt; request through API Gateway to the Lambda function. I couldn't find many resources on how to make this work, but I learned that these mapping templates use the Apache Velocity Template Language.&lt;/p&gt;
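&lt;p&gt;For a concrete picture of what the template does, the sketch below shows the kind of payload it hands to the Lambda function. The two header names are the ones Discord really sends; the values here are placeholders.&lt;/p&gt;

```python
# Hypothetical example of the event body produced by the mapping template:
# the HTTP method, the JSON body, and every request header forwarded
# through API Gateway, including Discord's signature headers.
template_output = {
    "method": "POST",
    "body": {"type": 1},
    "headers": {
        "X-Signature-Ed25519": "placeholder-signature",
        "X-Signature-Timestamp": "placeholder-timestamp",
    },
}

# The Flask app reads these headers to verify the request came from Discord.
signature = template_output["headers"]["X-Signature-Ed25519"]
```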

&lt;h3&gt;
  
  
  Add a GitLab CI job for installing dependencies into Lambda Layer
&lt;/h3&gt;

&lt;p&gt;There's one more step before we can push our code. We need another GitLab CI job that installs the Lambda function's dependencies so that they can be packaged into the Lambda Layer we defined above; a Lambda Layer is where dependencies live in this type of Lambda setup. Let's add the following stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;

&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python:3.8&lt;/span&gt;

&lt;span class="na"&gt;pip_install&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$CI_COMMIT_TAG"&lt;/span&gt;
      &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
  &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;layers/flask/python&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip install -r lambda/functions/interactions/requirements.txt -t layers/flask/python&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-t&lt;/code&gt; flag installs the dependencies into a target directory that the &lt;code&gt;cdk_deploy&lt;/code&gt; GitLab CI job can then package into our Lambda Layer. This works because &lt;code&gt;layers/flask/python&lt;/code&gt; is listed in the &lt;code&gt;paths&lt;/code&gt; array of the &lt;code&gt;artifacts&lt;/code&gt; section of the &lt;code&gt;pip_install&lt;/code&gt; job, so the installed packages are carried over to later pipeline stages. There are other ways to add pip dependencies to a Lambda Layer; it doesn't strictly need to happen in a separate CI job.&lt;/p&gt;
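&lt;p&gt;The &lt;code&gt;python&lt;/code&gt; suffix on the target path matters: Lambda adds a layer's top-level &lt;code&gt;python/&lt;/code&gt; directory to the import path at runtime, which is why we install into &lt;code&gt;layers/flask/python&lt;/code&gt; rather than &lt;code&gt;layers/flask&lt;/code&gt;. The sketch below mimics that mechanism locally with a throwaway module (&lt;code&gt;mymod&lt;/code&gt; is made up for the example):&lt;/p&gt;

```python
import pathlib
import sys
import tempfile

# Recreate the layer layout Lambda expects: packages under a top-level
# `python/` directory that gets placed on the import path at runtime.
layer_root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = layer_root / "python"
pkg_dir.mkdir()
(pkg_dir / "mymod.py").write_text("VALUE = 42\n")

# Lambda does the equivalent of this for each layer's python/ directory.
sys.path.insert(0, str(pkg_dir))
import mymod
```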

&lt;p&gt;Now tag and push the code to GitLab and check to see that the pipeline runs successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add the API Gateway URL to Discord Application settings
&lt;/h3&gt;

&lt;p&gt;If everything runs smoothly, we should see a URL in the very last lines of the pipeline. This is the URL for our API Gateway endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://abc123xyz.execute-api.us-east-1.amazonaws.com/prod/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need to append &lt;code&gt;discord&lt;/code&gt; to this URL and then enter the result as the &lt;code&gt;Interactions Endpoint URL&lt;/code&gt; in the &lt;code&gt;General Information&lt;/code&gt; section of our Discord application's admin page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://abc123xyz.execute-api.us-east-1.amazonaws.com/prod/discord
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we add this URL in the application settings, Discord will make sure that our endpoint is properly verifying the request based on its headers. Check out &lt;a href="https://github.com/discord/discord-interactions-python/blob/main/discord_interactions/__init__.py#L31" rel="noopener noreferrer"&gt;this function&lt;/a&gt; to see how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_key_decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;

    &lt;span class="c1"&gt;# https://stackoverflow.com/questions/51691730/flask-middleware-for-specific-route
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Verify request
&lt;/span&gt;            &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X-Signature-Ed25519&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X-Signature-Timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;verify_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_public_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bad request signature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;

            &lt;span class="c1"&gt;# Automatically respond to pings
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;InteractionType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PING&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InteractionResponseType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PONG&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

            &lt;span class="c1"&gt;# Pass through
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;__decorator&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_decorator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the endpoint fails this verification, Discord will refuse to save the URL and the integration will not work. If you are not able to add the URL successfully, it can help to add some additional logging to the Lambda function.&lt;/p&gt;
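&lt;p&gt;One useful thing to log is whether Discord's signature headers actually reached the function: if the API Gateway mapping template isn't wired up, they will be missing and every verification will fail. A minimal, hypothetical helper (&lt;code&gt;log_signature_headers&lt;/code&gt; is not part of any library):&lt;/p&gt;

```python
import logging

logger = logging.getLogger("interactions")
logging.basicConfig(level=logging.INFO)

SIGNATURE_HEADERS = ["X-Signature-Ed25519", "X-Signature-Timestamp"]


def log_signature_headers(headers):
    # Log which of Discord's signature headers made it through API
    # Gateway, so a missing mapping template is obvious in CloudWatch.
    present = [name for name in SIGNATURE_HEADERS if name in headers]
    for name in SIGNATURE_HEADERS:
        logger.info("%s present: %s", name, name in headers)
    return present
```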

&lt;p&gt;This is all covered in &lt;a href="https://discord.com/developers/docs/interactions/slash-commands" rel="noopener noreferrer"&gt;the documentation for Discord Interactions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now you should be able to run the Discord slash commands. You can get the status of the ECS service and scale its desired task count to either 1 or 0, turning the server &lt;code&gt;ON&lt;/code&gt; or &lt;code&gt;OFF&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Here's an overview of what we covered:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd66t3cbd0n9vxeem6qyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd66t3cbd0n9vxeem6qyh.png" alt="Diagram" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;This is my computer. For development of this project (and most other projects) I used Windows with WSL2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitLab CI - This is used to run our automated pipelines whenever we push a tag.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CDK CLI is used to create, update and delete the infrastructure in our AWS account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Valheim - The client for the game server that we set up&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The public IP address of the ECS Task that can be used to connect to our server on port 2456.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The ECS Cluster that runs the actual docker container for the Valheim server. By default, the image used is &lt;a href="https://hub.docker.com/r/lloesche/valheim-server" rel="noopener noreferrer"&gt;lloesche/valheim-server&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EFS - This is the file system that is mounted onto the container of the ECS task where our game's world data is stored.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Backup (Optional) - This is an optional feature of the &lt;code&gt;cdk-valheim&lt;/code&gt; construct that can make regular backups of our EFS file system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Events (Optional) - AWS Events can be used to scale the number of ECS tasks between 0 and 1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is the &lt;a href="https://github.com/gotodeploy/cdk-valheim" rel="noopener noreferrer"&gt;&lt;code&gt;cdk-valheim&lt;/code&gt;&lt;/a&gt; construct that I use in this project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;S3 bucket for syncing data to and from EFS with DataSync (WIP)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DataSync for moving game data between EFS and S3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Slash Commands that we set up&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discord&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discord Interactions sends a &lt;code&gt;POST&lt;/code&gt; request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The API Gateway endpoint that we configured to handle Discord Interaction &lt;code&gt;POST&lt;/code&gt; requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Lambda function running a simple Flask app that responds to the Interaction &lt;code&gt;POST&lt;/code&gt; request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;boto3 - This is the AWS SDK for Python, included in the Lambda Python execution environment, that allows us to interact with the resources in our AWS account. In particular, we use the &lt;code&gt;update_service&lt;/code&gt; and &lt;code&gt;describe_services&lt;/code&gt; methods of the ECS client. These allow us to turn our server on and off and to get its status.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This represents the &lt;code&gt;valheim-server-stack&lt;/code&gt; we defined in our CDK application.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
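&lt;p&gt;To make the boto3 &lt;code&gt;describe_services&lt;/code&gt; call concrete, here is a sketch of how the status sub-command might summarize its response (the field names match the ECS API response shape; the message format itself is just an example):&lt;/p&gt;

```python
def summarize_service(response):
    """Turn a boto3 ECS describe_services response into a short status message."""
    service = response["services"][0]
    return "{name}: {running}/{desired} tasks running ({status})".format(
        name=service["serviceName"],
        running=service["runningCount"],
        desired=service["desiredCount"],
        status=service["status"],
    )
```

A message like this can be returned directly as the content of the Discord interaction response.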

&lt;h2&gt;
  
  
  TODO
&lt;/h2&gt;

&lt;p&gt;There are still some things that I'm working on finalizing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DataSync for easily moving data between S3 and EFS&lt;/li&gt;
&lt;li&gt;Report billing data with an additional slash command sub-command&lt;/li&gt;
&lt;li&gt;Add tagging to the resources in our stack to make the billing command easier to implement.&lt;/li&gt;
&lt;li&gt;Get feedback from the Discord, CDK and Valheim communities about what I can improve here&lt;/li&gt;
&lt;li&gt;Contribute to &lt;a href="https://github.com/gotodeploy/cdk-valheim" rel="noopener noreferrer"&gt;&lt;code&gt;gotodeploy/cdk-valheim&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>cdk</category>
      <category>discord</category>
      <category>valheim</category>
      <category>python</category>
    </item>
    <item>
      <title>Session Authentication with Django, Django REST Framework and Nuxt</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Fri, 01 Jan 2021 20:01:00 +0000</pubDate>
      <link>https://dev.to/briancaffey/session-authentication-with-django-django-rest-framework-and-nuxt-ckl</link>
      <guid>https://dev.to/briancaffey/session-authentication-with-django-django-rest-framework-and-nuxt-ckl</guid>
      <description>&lt;p&gt;Originally published to &lt;a href="https://briancaffey.github.io/2021/01/01/session-authentication-with-django-django-rest-framework-and-nuxt" rel="noopener noreferrer"&gt;https://briancaffey.github.io/2021/01/01/session-authentication-with-django-django-rest-framework-and-nuxt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This will be a continuation of the discussion about how data flows in Django + Nuxt applications, looking specifically at session authentication.&lt;/p&gt;

&lt;p&gt;Here's the GitLab repo where you can find the source code and other diagrams related to this project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gitlab.com/briancaffey/django-nuxt-starter" rel="noopener noreferrer"&gt;https://gitlab.com/briancaffey/django-nuxt-starter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram focuses on the interactions between:&lt;/p&gt;

&lt;p&gt;I. The browser&lt;/p&gt;

&lt;p&gt;II. The Nuxt server (Node process)&lt;/p&gt;

&lt;p&gt;III. The Django backend API server (gunicorn process)&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;For illustration purposes, I'm using a simple CRUD application that has two models: Users and (blog) Posts. Users can log in with email and password credentials and create, read, update and delete blog posts (CRUD). Currently I'm only doing the R (read) of CRUD: listing and viewing blog posts. Creating, updating and deleting will be added later. For now, users must be logged in to see posts.&lt;/p&gt;

&lt;p&gt;I'm still learning a lot about Nuxt and how it can be used with Django and Django REST Framework. This project is an effort at documenting my learning process, learning in public and learning from mistakes, so any feedback or guidance on what I have written here would be highly appreciated!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Nuxt?
&lt;/h2&gt;

&lt;p&gt;Using Nuxt (with Server Side Rendering, or SSR) is one of many ways to use Vue.js with Django. Vue is a progressive framework, which means that it can be gradually adopted into a project--you don't have to go all-in on the framework or rewrite the application from scratch to fit with how Vue works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Different ways to use Vue with Django
&lt;/h3&gt;

&lt;p&gt;In terms of Django, here are some ways that you can use Vue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vue as a jQuery replacement for adding basic interactivity in views served by Django templates&lt;/li&gt;
&lt;li&gt;Build a static Vue application and serve it as a set of static assets in a Django project alongside routes that are served by other normal Django templates views.&lt;/li&gt;
&lt;li&gt;Build a Vue SPA which consumes a Django API (usually built with Django REST Framework or similar), and serve it over a content delivery network (CDN).&lt;/li&gt;
&lt;li&gt;Use Vue to build an Electron desktop app that uses Django as an API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these scenarios, Vue is served as either static assets (such as in the case of serving a SPA over a CDN), or Vue code is included in an HTML response from a server (where the view library, not your application, is served over a CDN), similar to how jQuery is used.&lt;/p&gt;

&lt;h3&gt;
  
  
  Different ways to use Nuxt
&lt;/h3&gt;

&lt;p&gt;Nuxt is a framework that can be used in a few different ways; I'll briefly discuss three of them. Common to all three ways of using Nuxt is the directory structure. No matter how you use Nuxt, it provides a great way to organize Vue code.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Static mode: this mode allows you to write Vue code which is built into static HTML, and that HTML is then deployed to a CDN or a webserver like NGINX. The developer (or CI/CD process) runs a command to generate HTML files for each page in the application, and these pages are served as-is when accessed by a user. I recently migrated my personal blog from Jekyll to Nuxt with full-static mode. Check it out at &lt;a href="https://briancaffey.github.io" rel="noopener noreferrer"&gt;briancaffey.github.io&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SPA mode: This is similar to what you might use if you started a Vue project with Vue CLI. The project is generated as in Static Mode, but what is generated is primarily JavaScript code that is executed in the browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SSR mode: Server Side Rendering is the mode that I'll be focusing on here. Unlike the other ways of using Vue that have already been discussed, this mode involves a Node.js server that will handle our requests. For example, a web request for &lt;code&gt;/posts&lt;/code&gt; is sent to our Nuxt Server (a Node.js server process) and Node.js is responsible for returning HTML that contains all of the blog Posts that we want to show (or a paginated selection of all blog posts, which is how my example blog app is built). So the Nuxt app has to make a request to our Django API server before returning a fully rendered HTML page for the &lt;code&gt;/posts&lt;/code&gt; page. The user then gets the page from Nuxt, reads all of the blog posts and then decides to check out the blog posts on the second page of posts. When the user clicks on page 2, we request the second page of data from our Django API directly, not from Nuxt. The user then sees a short loading animation followed by the second page of blog posts that are loaded in using AJAX (usually with &lt;code&gt;fetch&lt;/code&gt; or axios).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Nuxt Benefits
&lt;/h3&gt;

&lt;p&gt;The main reason for using Nuxt is to render the initial page load on the server, returning a complete HTML response. This can be beneficial for SEO, social sharing, and other scenarios where you need control over how a website's pages are delivered (specifically, the initial request made to the server).&lt;/p&gt;

&lt;p&gt;This type of control is not possible for applications that serve Vue over CDN since they can only request backend API data once the JS client has been requested from a CDN.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuxt Downsides and Tradeoffs
&lt;/h3&gt;

&lt;p&gt;Using Nuxt for SSR introduces quite a bit of complexity in both the application deployment and our Vue code. The backend API won't have to change at all when moving to Nuxt from a static Vue SPA.&lt;/p&gt;

&lt;p&gt;Django alone is capable of returning fully generated, SEO-optimized HTML for each request, but applications built with Vue and Django templates may be difficult to work on as the project grows larger and larger. The Django/DRF + Nuxt approach may be more appropriate for projects with dedicated backend and frontend teams.&lt;/p&gt;

&lt;p&gt;One other potential downside is added latency because of the "double request". If the Nuxt server and the Django server are on the same machine, then this latency will probably be a non-issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diagram
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fibodxhp81hysu1yun850.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fibodxhp81hysu1yun850.png" alt="Nuxt Django Auth" width="800" height="1376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram looks at session authentication with a focus on the browser, the Nuxt server and the Django server. It looks at two simple user stories, ordered from top to bottom in the diagram.&lt;/p&gt;

&lt;p&gt;I. An existing application user visits the site in a new browser, navigates to the Login page, logs in with credentials and then visits a protected page: &lt;code&gt;/posts&lt;/code&gt;.&lt;br&gt;
II. The user closes the browser and then comes back directly to the &lt;code&gt;/posts&lt;/code&gt; page.&lt;/p&gt;

&lt;p&gt;These two user stories sound simple, but they touch on a lot of the features of Nuxt that make it powerful, and complicated at first (for Vue users). These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;asyncData&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nuxtServerInit&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Vuex on the client and server&lt;/li&gt;
&lt;li&gt;Custom plugin for axios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some important parts of Nuxt that this diagram does not (yet) touch on are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nuxt auth module (I don't know if this is relevant for my use case)&lt;/li&gt;
&lt;li&gt;Nuxt fetch property (different from the &lt;code&gt;fetch&lt;/code&gt; web API)&lt;/li&gt;
&lt;li&gt;Nuxt middleware (I'm also not sure if this would be helpful for anything I am doing in this example project)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  User story I.: Visit site, login, navigate to &lt;code&gt;/posts&lt;/code&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;User navigates to &lt;code&gt;http://domain.com/&lt;/code&gt;. This request is handled by the Nuxt server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;nuxtServerInit&lt;/code&gt; action is called (&lt;a href="https://nuxtjs.org/docs/2.x/directory-structure/store#the-nuxtserverinit-action" rel="noopener noreferrer"&gt;read more on nuxtServerInit&lt;/a&gt;). This is a special Vuex action that, if defined in &lt;code&gt;store/index.js&lt;/code&gt;, will be called once per request to the Nuxt Server (when a page is initially visited or refreshed in the browser).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;nuxtServerInit&lt;/code&gt; dispatches a Vuex action in the &lt;code&gt;user&lt;/code&gt; module called &lt;code&gt;fetchData&lt;/code&gt;. This action makes a GET request to &lt;code&gt;/api/account/&lt;/code&gt; in the Django application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;/api/account/&lt;/code&gt; gets the logged-in user's profile information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the request is made by an anonymous user (no user is logged in), a 403 response is returned to the Nuxt server and no account data is set in the Vuex store (on the server).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The HTML is rendered for the Home Page&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The HTML rendered by the Nuxt server is sent back to the browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The user navigates (via a &lt;code&gt;&amp;lt;router-link&amp;gt;&lt;/code&gt;/&lt;code&gt;&amp;lt;nuxt-link&amp;gt;&lt;/code&gt;) to the login page. Since the navigation happened through Vue router, no request to the Nuxt server is made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;created&lt;/code&gt; hook for the Login page makes a GET request to &lt;code&gt;/api/login-set-cookie/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
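&lt;p&gt;As a hedged sketch of the &lt;code&gt;/api/account/&lt;/code&gt; view described in steps 4 and 5 (the profile fields and messages are assumptions, and a real project might use a DRF view instead; the Django import is deferred so the payload helper can be tested on its own):&lt;/p&gt;

```python
def account_payload(user):
    """Shape of the profile data returned by /api/account/ (fields are assumptions)."""
    return {"id": user.id, "email": user.email}


def account_view(request):
    """Return the logged-in user's profile, or 403 for anonymous users."""
    # Django is imported lazily so account_payload stays testable without Django installed.
    from django.http import JsonResponse

    if not request.user.is_authenticated:
        return JsonResponse(
            {"detail": "Authentication credentials were not provided."}, status=403
        )
    return JsonResponse(account_payload(request.user))
```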

&lt;p&gt;10, 11. This endpoint calls a simple view that is decorated with Django's &lt;code&gt;@ensure_csrf_cookie&lt;/code&gt; decorator.&lt;/p&gt;

&lt;ol start="12"&gt;
&lt;li&gt;&lt;p&gt;When the response returns to the browser, the &lt;code&gt;csrftoken&lt;/code&gt; is set in the browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The $apiCall function is defined in &lt;code&gt;plugins/axios.js&lt;/code&gt;, and it adds the &lt;code&gt;csrftoken&lt;/code&gt; cookie to the &lt;code&gt;X-CSRFToken&lt;/code&gt; header of API requests. This is important for POST requests, where the CSRF token is required. When the user fills out their email and password in the login form, the $apiCall function is called with &lt;code&gt;/api/login/&lt;/code&gt; and the email/password as credentials.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The email and password are sent as data in the POST request to &lt;code&gt;/api/login/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;15, 16. The &lt;code&gt;/api/login/&lt;/code&gt; URL calls the &lt;code&gt;login_view&lt;/code&gt; which makes use of two functions from &lt;code&gt;django.contrib.auth&lt;/code&gt;: &lt;code&gt;authenticate&lt;/code&gt; and &lt;code&gt;login&lt;/code&gt;. &lt;code&gt;authenticate&lt;/code&gt; gets a user from the provided email/password, and the &lt;code&gt;login&lt;/code&gt; function sets an HttpOnly &lt;code&gt;sessionid&lt;/code&gt; session cookie on the response.&lt;/p&gt;
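&lt;p&gt;A minimal sketch of a &lt;code&gt;login_view&lt;/code&gt; along these lines (the response messages are assumptions, and passing the email as &lt;code&gt;username&lt;/code&gt; assumes a user model whose username field is the email; the Django imports are deferred so the credential helper can be exercised on its own):&lt;/p&gt;

```python
import json


def extract_credentials(body):
    """Pull email/password out of the JSON request body; raise if either is missing."""
    data = json.loads(body)
    try:
        return data["email"], data["password"]
    except KeyError as missing:
        raise ValueError("missing credential field: {}".format(missing))


def login_view(request):
    """POST /api/login/: authenticate the user and start a session."""
    # Django imports are deferred so extract_credentials stays testable on its own.
    from django.contrib.auth import authenticate, login
    from django.http import JsonResponse

    email, password = extract_credentials(request.body)
    user = authenticate(request, username=email, password=password)
    if user is None:
        return JsonResponse({"detail": "Invalid credentials"}, status=401)
    login(request, user)  # sets the HttpOnly sessionid cookie on the response
    return JsonResponse({"detail": "Logged in"})
```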

&lt;ol start="17"&gt;
&lt;li&gt;&lt;p&gt;The HttpOnly &lt;code&gt;sessionid&lt;/code&gt; cookie is automatically set on the browser when the &lt;code&gt;/api/login/&lt;/code&gt; request returns successfully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When this &lt;code&gt;/api/login/&lt;/code&gt; request returns successfully, a value in the &lt;code&gt;auth&lt;/code&gt; Vuex module is set to keep track of the current user's logged in state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next, a GET request is made to &lt;code&gt;/api/account/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;20, 21. Since the &lt;code&gt;sessionid&lt;/code&gt; cookie is set and sent along with the request automatically, this request will succeed.&lt;/p&gt;

&lt;ol start="22"&gt;
&lt;li&gt;&lt;p&gt;When the &lt;code&gt;/api/account/&lt;/code&gt; request returns, the user's account information is saved to the &lt;code&gt;user&lt;/code&gt; Vuex module. At this point, the client may redirect automatically to the home page, or user account page, dashboard, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now logged in, the user navigates (again via Vue router) to &lt;code&gt;/posts&lt;/code&gt;, a page that shows a paginated view of all blog posts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This page has an &lt;code&gt;asyncData&lt;/code&gt; method which is called when the page component is created and it dispatches a Vuex action &lt;code&gt;posts/fetchData&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This Vuex action makes a GET request to &lt;code&gt;/api/posts/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;26, 27. &lt;code&gt;/api/posts/&lt;/code&gt; uses a &lt;code&gt;ModelViewSet&lt;/code&gt; and returns a paginated list of blog posts.&lt;/p&gt;
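&lt;p&gt;DRF's page-number pagination wraps results in an envelope with &lt;code&gt;count&lt;/code&gt;, &lt;code&gt;next&lt;/code&gt;, &lt;code&gt;previous&lt;/code&gt; and &lt;code&gt;results&lt;/code&gt; keys. Here is a framework-free sketch of that shape (simplified to page numbers where DRF would return full URLs):&lt;/p&gt;

```python
def paginate(items, page, page_size=10):
    """Mimic the envelope that DRF's PageNumberPagination produces.

    DRF returns URLs for "next"/"previous"; this sketch uses page numbers.
    """
    start = (page - 1) * page_size
    has_more = len(items) > start + page_size
    return {
        "count": len(items),
        "next": page + 1 if has_more else None,
        "previous": page - 1 if page > 1 else None,
        "results": items[start:start + page_size],
    }
```

This is the shape the &lt;code&gt;posts/fetchData&lt;/code&gt; Vuex action consumes when it stores the current page of posts.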

&lt;ol start="28"&gt;
&lt;li&gt;When the &lt;code&gt;/api/posts/&lt;/code&gt; request returns successfully, the blog post data is saved to the &lt;code&gt;blog&lt;/code&gt; Vuex module.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  User story II.: Logged in user opens new browser window and revisits &lt;code&gt;/posts&lt;/code&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The user closes their browser and then opens a new browser window and navigates to &lt;code&gt;/posts&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;nuxtServerInit&lt;/code&gt; is called as usual.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;user/fetchData&lt;/code&gt; action is called. This action makes a GET request to &lt;code&gt;/api/account/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;/api/account/&lt;/code&gt; request returns successfully, because the &lt;code&gt;sessionid&lt;/code&gt; cookie is passed along from the browser to the API request that the Nuxt server makes to the backend API (&lt;code&gt;/api/account/&lt;/code&gt;). User account data is then set on the Vuex &lt;code&gt;user&lt;/code&gt; module.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;asyncData&lt;/code&gt; method for the &lt;code&gt;/posts&lt;/code&gt; page is called.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;asyncData&lt;/code&gt; dispatches a Vuex action &lt;code&gt;posts/fetchData&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;posts/fetchData&lt;/code&gt; makes an API request to &lt;code&gt;/api/posts/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;/api/posts/&lt;/code&gt; request is handled by a &lt;code&gt;ModelViewSet&lt;/code&gt; for the &lt;code&gt;Post&lt;/code&gt; model that gets the blog posts; when the response returns to the Nuxt server, the posts are set in the Vuex store (on the server).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the async data fetching is complete (&lt;code&gt;nuxtServerInit&lt;/code&gt; and &lt;code&gt;asyncData&lt;/code&gt; for the &lt;code&gt;/posts&lt;/code&gt; page), the page HTML is rendered using the Vuex store data stored on the server. The Vuex data is sent back with the rendered HTML (I think this is how it works).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, the user sees the list of blog posts. The page is loaded "at once"; there is no waiting for data to load after loading the page initially.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complexity
&lt;/h3&gt;

&lt;p&gt;Is this authentication process overly complicated? When I make these diagrams, I try to make simple concepts as detailed as possible, but there are a lot of distinct actions being taken in many different parts of the application, and getting them all into one diagram was tricky.&lt;/p&gt;

&lt;h3&gt;
  
  
  HttpOnly Session Cookies
&lt;/h3&gt;

&lt;p&gt;Session authentication is the officially recommended way to do authentication with Django REST Framework for clients that run in the browser. However, there seem to be lots of people using JWT with DRF and Javascript clients that run in the browser. The main argument against doing this is that the JWT must be stored in a Javascript-accessible store (localStorage or Cookies) so it can be passed with each request. Many people are also interested in trying to store JWT for authentication in HttpOnly cookies to harden client-side security. I'm very curious to know if anyone is actually doing this, and what the implementation looks like. While &lt;code&gt;djangorestframework_simplejwt&lt;/code&gt; doesn't support HttpOnly, there seems to be lots of interest in doing this. I think it might be possible with a special middleware, so let me know if anyone is interested in proof-of-concept/diagram for that.&lt;/p&gt;
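&lt;p&gt;As a rough proof-of-concept of that middleware idea (the cookie name, header and Bearer scheme are all assumptions, and this ignores token refresh and CSRF concerns entirely):&lt;/p&gt;

```python
class JWTCookieToHeaderMiddleware:
    """Hypothetical Django middleware: copy a JWT from an HttpOnly cookie into
    the Authorization header so DRF's token authentication classes can see it.

    The cookie name ("access_token") and the Bearer scheme are assumptions.
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = request.COOKIES.get("access_token")
        if token and "HTTP_AUTHORIZATION" not in request.META:
            request.META["HTTP_AUTHORIZATION"] = "Bearer " + token
        return self.get_response(request)
```

Because it only touches the standard &lt;code&gt;COOKIES&lt;/code&gt; and &lt;code&gt;META&lt;/code&gt; dictionaries, the class itself has no Django dependency.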

&lt;p&gt;Some use cases for JWT and other token authentication methods with DRF might include native mobile apps or Desktop apps. For most cases, I think session authentication with Django's built in session cookies for DRF authentication is the best option. JWTs also have no clear solution for logging out, which may be important for some security considerations. The concept of stateless authentication is interesting, but for most use cases I would argue that it is not worth doing. Let me know if anyone has thoughts on this, I'm curious to see what everyone thinks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;My next steps for this project/repo are to deploy this to a production environment as soon as I have time to do so. My local setup has been working well, and I think it should work well for a simple DigitalOcean docker swarm deployment like I have done with other Django + Vue projects.&lt;/p&gt;

&lt;p&gt;I also want to add the create, update and delete functionality for posts, improve error handling with API calls, add form validation, and maybe write some tests with Jest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions
&lt;/h2&gt;

&lt;p&gt;Here are some questions and areas that I still need to investigate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuxt Composition API
&lt;/h3&gt;

&lt;p&gt;I have seen that there is a Composition API module for Nuxt. I have only just now started looking at Composition API examples and documentation for "vanilla" Vue, but I have heard that the Nuxt Composition API module has some additional features specifically for use with Nuxt, so I'm curious to learn what these are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuxt v3 and Vue 3
&lt;/h3&gt;

&lt;p&gt;Nuxt looks like it has plans to support Vue 3, so I am interested to learn more about Vue 3 as it is adopted by Vue frameworks such as Nuxt and Quasar.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuxt's fetch method, server middleware, Nuxt auth module
&lt;/h3&gt;

</description>
      <category>django</category>
      <category>nuxt</category>
      <category>authentication</category>
    </item>
    <item>
      <title>Building web applications with Django, Django REST Framework and Nuxt</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Mon, 28 Dec 2020 04:26:28 +0000</pubDate>
      <link>https://dev.to/briancaffey/building-web-applications-with-django-django-rest-1k28</link>
      <guid>https://dev.to/briancaffey/building-web-applications-with-django-django-rest-1k28</guid>
      <description>&lt;p&gt;Over the holidays between lots of big meals and many naps, I tried to tackle one more goal of mine before this year come to an end: building an application with Django and Nuxt.js.&lt;/p&gt;

&lt;p&gt;This year I rebuilt my personal blog (&lt;a href="https://briancaffey.github.io/" rel="noopener noreferrer"&gt;briancaffey.github.io&lt;/a&gt;) with Nuxt.js, the &lt;code&gt;@nuxt/content&lt;/code&gt; headless git-based CMS and TailwindCSS. It is statically generated with Nuxt's full-static mode and has been really enjoyable to work with. I have also learned a lot more about SEO and how Nuxt helps improve Vue applications' SEO. I have also been working a lot with Django and Vue.js applications where Django serves as an API to a Vue.js SPA. This combination of technologies works well for a lot of use cases, but it falls short in SEO. Nuxt also provides a great way to organize large Vue.js projects which I have been finding very helpful. For these reasons, combining Django and Nuxt has been something that I have wanted to try for a while, so this article will share some of my experiences in recent efforts to build with these two frameworks. I took &lt;a href="https://gitlab.com/briancaffey/django-nuxt-starter/-/blob/develop/STEP_BY_STEP.md" rel="noopener noreferrer"&gt;detailed notes of each step of the project setup&lt;/a&gt; starting from an empty repository, and I put together a diagram of my understanding of how data flows in the application.&lt;/p&gt;

&lt;p&gt;Here's the link to the project repository that I'll be referencing: &lt;a href="https://gitlab.com/briancaffey/django-nuxt-starter" rel="noopener noreferrer"&gt;https://gitlab.com/briancaffey/django-nuxt-starter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article will focus on explaining the project through the diagram shown below. I added two types of labels: letters and numbers. The letters will introduce each component of the application and its role in the application as a whole. The numbers summarize how data flows through the different components in my sample blog application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfsvvuhmxuw9k5wo05d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfsvvuhmxuw9k5wo05d1.png" alt="Diagram" width="800" height="999"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Diagram components
&lt;/h2&gt;

&lt;p&gt;A. Your computer - Possibly also your development machine which is running the application in docker containers with docker-compose.&lt;/p&gt;

&lt;p&gt;B. NGINX - This is the "front desk" of the application that does a few different things. It is the first component that web requests come to. It serves as a reverse proxy that does path-based routing: it looks at the request URL and determines where to send it. For example, &lt;code&gt;/api/posts/1&lt;/code&gt;, &lt;code&gt;/dashboard/&lt;/code&gt; and &lt;code&gt;/admin/&lt;/code&gt; could all be routed differently depending on the NGINX configuration file. We will look at this again in the next section. This component, like most of the other things in the diagram, runs in a container. NGINX can also serve static files for our Django app and do TLS termination to make our application available over a secure HTTPS connection.&lt;/p&gt;
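&lt;p&gt;A minimal sketch of that path-based routing (the upstream names mirror the docker-compose service names used elsewhere in this article; the Nuxt port of 3000 is an assumption):&lt;/p&gt;

```nginx
# Sketch of the path-based routing described above; names/ports are assumptions.
upstream nuxt    { server nuxt:3000; }
upstream backend { server backend:8000; }

server {
    listen 80;

    # API and admin traffic goes to the Django/gunicorn service
    location /api/   { proxy_pass http://backend; }
    location /admin/ { proxy_pass http://backend; }

    # Static files collected from Django, served directly by NGINX
    location /static/ { alias /usr/src/app/static/; }

    # Everything else is rendered by the Nuxt SSR server
    location / { proxy_pass http://nuxt; }
}
```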

&lt;p&gt;C. Nuxt.js server - The first "S" in SSR (server side rendering). It is a Node.js process that renders HTML from the Vue components we define in our Nuxt app, as well as data fetched from other servers/APIs, before returning HTML back to the client.&lt;/p&gt;

&lt;p&gt;D. Django server - This runs the WSGI application with a gunicorn process in a container.&lt;/p&gt;

&lt;p&gt;E. Django REST Framework is a Django package that facilitates the creation of REST API endpoints. This is part of the Django application; it primarily takes care of data serialization (which can be thought of as translating between JSON and the Python objects that represent rows of data in our Postgres database).&lt;/p&gt;
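&lt;p&gt;To illustrate what "translating between JSON and Python objects" means, here is a framework-free sketch of the kind of work DRF's serializers automate (the title/body/created fields come from this article's blog example; the &lt;code&gt;id&lt;/code&gt; field is an assumption):&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class Post:
    """Stand-in for a row of the blog's Post model."""
    id: int
    title: str
    body: str
    created: date


def serialize_post(post):
    """Translate a Post row into a JSON string, as a DRF serializer would."""
    payload = asdict(post)
    payload["created"] = post.created.isoformat()  # dates need explicit conversion
    return json.dumps(payload)
```

DRF handles this in both directions, including validation when deserializing incoming JSON back into model instances.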

&lt;p&gt;F. This is the Postgres database, also a containerized service. It is on the same docker network as the Django/gunicorn application, so the Django application can connect to the Postgres database using the hostname &lt;code&gt;postgres&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;G. docker-compose is used to orchestrate the docker network, containers and volumes that make up the application.&lt;/p&gt;

&lt;p&gt;H. This box represents the docker network that allows for easy networking between services. We will come back to the significance of this in the next section.&lt;/p&gt;
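&lt;p&gt;A minimal &lt;code&gt;docker-compose.yml&lt;/code&gt; sketch of components B through H (the images, build contexts and commands shown here are assumptions, not the project's actual file):&lt;/p&gt;

```yaml
# Sketch of the services in the diagram; details are assumptions.
version: "3.8"
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"          # the only port exposed on the host
    depends_on: [nuxt, backend]
  nuxt:
    build: ./nuxt        # Node.js SSR server
  backend:
    build: ./backend     # Django + gunicorn
    command: gunicorn backend.wsgi --bind 0.0.0.0:8000
    depends_on: [postgres]
  postgres:
    image: postgres:13
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  pg-data:
```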

&lt;h2&gt;
  
  
  Data flow in the application
&lt;/h2&gt;

&lt;p&gt;The simple application I have built for this demonstration is a blog. There is only a list view and a detail view for a simple blog post model with three fields: title, body and created date. For the list view, the frontend (Nuxt) route is &lt;code&gt;/posts/&lt;/code&gt; and the backend route is &lt;code&gt;/api/posts/&lt;/code&gt;. For the detail view, the frontend route is &lt;code&gt;/posts/_id&lt;/code&gt; and the API route is &lt;code&gt;/api/posts/_id/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The data flow shown here will walk through what happens when a user visits &lt;code&gt;http://localhost/posts/&lt;/code&gt;, and then show what happens when the user clicks on one of the listed posts to see the detail view of the post (&lt;code&gt;http://localhost/posts/2&lt;/code&gt;).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;docker-compose up&lt;/code&gt; is one command that is used to start the entire application in local development. This exposes the NGINX process on port 80 of the host machine (your laptop).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the application is running on your machine and you navigate to &lt;code&gt;http://localhost/posts/&lt;/code&gt;, the request is first handled by NGINX.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As we mentioned earlier, NGINX's path-based routing sends all requests that do not start with &lt;code&gt;/api/*&lt;/code&gt; or &lt;code&gt;/admin/*&lt;/code&gt; to the Nuxt.js server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the request gets to the Nuxt server, the Nuxt lifecycle methods start. The important one that I'm using so far is &lt;code&gt;asyncData&lt;/code&gt;. This property is used to request data that will be used in the rendering of our HTML response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inside of &lt;code&gt;asyncData&lt;/code&gt;, the application uses axios to make a request to &lt;code&gt;/api/posts/&lt;/code&gt; (for example). In &lt;code&gt;nuxt.config.js&lt;/code&gt;, the &lt;code&gt;privateRuntimeConfig&lt;/code&gt; sets a baseUrl value for axios to &lt;code&gt;http://backend:8000&lt;/code&gt;. Since the Nuxt server is on the same docker network as the backend Django/gunicorn server, the Nuxt server is able to resolve &lt;code&gt;http://backend&lt;/code&gt; to the address of the backend server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Django processes this endpoint, using the &lt;code&gt;PostViewSet&lt;/code&gt;, the views of which have been added to &lt;code&gt;urlpatterns&lt;/code&gt; in &lt;code&gt;blog/urls.py&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;PostViewSet&lt;/code&gt; makes a database query for the posts, and the results are serialized.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Django server returns the response to the original axios call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data returned from Django is used to render the HTML response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The HTML response from the Nuxt server is sent back to the browser that originally navigated to &lt;code&gt;http://localhost/posts/&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The user is presented with a page that lists blog posts. Each blog post links to a detail view. When a blog post (let's say the post with an &lt;code&gt;id&lt;/code&gt; of 2) is clicked on, a request for the post data is made directly to the Django backend. The &lt;code&gt;browserBaseURL&lt;/code&gt; value in the &lt;code&gt;axios&lt;/code&gt; settings under &lt;code&gt;publicRuntimeConfig&lt;/code&gt; defined in &lt;code&gt;nuxt.config.js&lt;/code&gt; is set to &lt;code&gt;http://localhost&lt;/code&gt;, so the request is made to &lt;code&gt;http://localhost/api/posts/2/&lt;/code&gt;. To clarify, since we are making this request with axios in the browser, we can't make a request to &lt;code&gt;http://backend:8000/api/posts/2/&lt;/code&gt; like we did in step 4 (&lt;code&gt;http://backend:8000/api/posts/&lt;/code&gt;), because the browser doesn't know how to resolve the &lt;code&gt;backend&lt;/code&gt; hostname.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This request to &lt;code&gt;http://localhost/api/posts/2/&lt;/code&gt;, like all others, first goes to NGINX which sends it to the backend since the path starts with &lt;code&gt;/api/&lt;/code&gt;. At this point the application functions like a regular Vue SPA making axios calls to a backend service. This is because we used &lt;code&gt;&amp;lt;nuxt-link&amp;gt;&lt;/code&gt; for the posts listed in the posts list view. If we used &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tags, we would go through the same process as in step 4 where the HTML is rendered on the Nuxt server and sent back to the browser all at once.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
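&lt;p&gt;The path-based routing rule in step 3 can be sketched as a tiny predicate. This is a hypothetical helper that just mirrors the behavior of the NGINX location blocks, not code from the project:&lt;/p&gt;

```python
# Hypothetical helper mirroring NGINX's path-based routing (step 3):
# requests under /api/ or /admin/ go to Django, everything else to Nuxt.
def route(path):
    if path.startswith("/api/") or path.startswith("/admin/"):
        return "backend"   # Django/gunicorn service
    return "frontend"      # Nuxt.js SSR server

print(route("/api/posts/"))  # backend
print(route("/posts/"))      # frontend
```

In the real setup NGINX makes this decision before any application code runs; the sketch only shows which upstream a given path would reach.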

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;My main takeaway is that using Nuxt and Django together can give you good SEO and a great SPA experience at the same time. Using Django alone, or Django with traditional non-SSR Vue, makes this harder to do. Because Vue is a progressive framework, there are a lot of ways to use it with any other backend. From what I have heard, most people use Vue via CDN, similar to how jQuery was and still is delivered for use in the browser.&lt;/p&gt;

&lt;p&gt;There is additional work in setting up three servers for a single application (Nuxt, Django and NGINX), but the tradeoff is that I am (or at least feel) very productive writing frontend logic in Vue and backend logic with DRF. I have never liked working with Django templates, and I used to know a lot more about them than I do now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spotlight on baserow.io's awesome open-source Django and Nuxt application
&lt;/h2&gt;

&lt;p&gt;Lastly, I want to mention that there are some great resources in the &lt;a href="https://github.com/nuxt-community/awesome-nuxt" rel="noopener noreferrer"&gt;nuxt-community/awesome-nuxt&lt;/a&gt; GitHub repo. One project really stood out to me when I searched for "django" projects in the README: &lt;a href="https://baserow.io/" rel="noopener noreferrer"&gt;baserow.io&lt;/a&gt; (repo: &lt;a href="https://gitlab.com/bramw/baserow" rel="noopener noreferrer"&gt;https://gitlab.com/bramw/baserow&lt;/a&gt;). Please check this repo out if you are interested in Django and Nuxt. This company is building an open-source no-code database, similar to Airtable, which I have worked with before.&lt;/p&gt;

&lt;p&gt;Their entire product is open source and I have been very impressed with what I have seen. Please go give that project a star or consider becoming a GitHub sponsor if you are interested. I'm not affiliated with that project in any way, but I'll be referencing how they use Django and Nuxt to build their application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;There is still a lot I have to learn about Nuxt. I'm still very new to the framework, and this is my first time using Nuxt's SSR mode. Nuxt seems to have its own way of doing lots of things that I'm used to doing in Vue. There is a very supportive community and well-maintained official packages to help with lots of things, like the &lt;code&gt;@nuxt/axios&lt;/code&gt; package that I'm using.&lt;/p&gt;

&lt;p&gt;My next step is to keep expanding my blog application. One thing I didn't mention is authentication. I plan on using Django session authentication for authenticating requests to Django. It seems that it already works correctly in my application: logging in through the Django admin and then navigating to Nuxt routes that make Django requests only works while I'm logged in. I think I have an idea about how Vuex, authentication and route guards will work together, but I haven't gotten there yet. If anyone has some good reference projects or recommendations on how to expand on what I already have, please let me know!&lt;/p&gt;

&lt;p&gt;I know that Nuxt has an auth module, so I need to see if that is relevant for what I need in my application. I also need to continue reading the Nuxt documentation. I still don't know what I don't know about Nuxt and the plugins and modules that it makes available. I also noticed that Nuxt has its own version of the Vue 3 Composition API, something I am just now starting to learn more about, so that is another area I'll need to dig into eventually.&lt;/p&gt;

</description>
      <category>django</category>
      <category>nuxt</category>
      <category>drf</category>
      <category>vue</category>
    </item>
    <item>
      <title>Weekend project update: Open SEC Data</title>
      <dc:creator>Brian Caffey</dc:creator>
      <pubDate>Sun, 29 Nov 2020 23:39:59 +0000</pubDate>
      <link>https://dev.to/briancaffey/weekend-project-update-open-sec-data-1g77</link>
      <guid>https://dev.to/briancaffey/weekend-project-update-open-sec-data-1g77</guid>
      <description>&lt;p&gt;Here's an early look at a project I have been working on to practice some Django and Vue.js concepts: Open SEC Data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://opensecdata.ga" rel="noopener noreferrer"&gt;https://opensecdata.ga&lt;/a&gt; (project staging website, deployed to docker swarm cluster running on DigitalOcean)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gitlab.com/briancaffey/sec-filings-app" rel="noopener noreferrer"&gt;https://gitlab.com/briancaffey/sec-filings-app&lt;/a&gt; (main repository, requires GitLab account)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/briancaffey/sec-filings-app" rel="noopener noreferrer"&gt;https://github.com/briancaffey/sec-filings-app&lt;/a&gt; (mirror, no account required to view)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project uses Django, DRF and Celery to read public SEC filings from &lt;a href="https://www.sec.gov/Archives/edgar/full-index/" rel="noopener noreferrer"&gt;sec.gov&lt;/a&gt; and build them into an API that is consumed by a Vue.js application. I'm currently focused on 13F filings, which are required for large US investment funds managing over $100 million USD. The data is published quarterly and dates back to 1993.&lt;/p&gt;
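&lt;p&gt;As a rough illustration of what processing those quarterly index files involves, here is a minimal sketch of parsing one pipe-delimited index line. The field layout and the sample values are my assumptions for illustration, not code or data from the project:&lt;/p&gt;

```python
# Sketch: parse one pipe-delimited line from an EDGAR quarterly index file.
# Assumed field order: CIK|Company Name|Form Type|Date Filed|Filename.
def parse_index_line(line):
    cik, company, form_type, date_filed, filename = line.strip().split("|")
    return {
        "cik": int(cik),
        "company": company,
        "form_type": form_type,
        "date_filed": date_filed,
        "filename": filename,
    }

# Made-up sample line, shaped like the assumed format above.
row = parse_index_line(
    "123456|EXAMPLE FUND LP|13F-HR|2020-09-30|"
    "edgar/data/123456/0001234567-20-000001.txt"
)
```

In the actual project this kind of parsing would run inside Celery tasks, with the 13F rows filtered out by form type before being saved through Django models.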

&lt;p&gt;Here are some of the things I'm focusing on in this project in no particular order: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Getting better at Django REST Framework. This project has been helping me apply some of the parts of DRF that I have found difficult. I'm currently using ViewSets, which feel like function-based views inside of class-based views. They are flexible, but I would like to add more abstraction with filtering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Django admin. While this project primarily uses Django as a REST API with Django REST Framework, I have tried to take advantage of the Django admin to build out helpful views that can be used to spot check the data I'm creating. Most of my API is read-only, which keeps things pretty simple.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Moderately complex paginated data tables with Vue. I work with lots of paginated table data, and I think there is a better way to abstract some of the repeated logic that I use (getting and setting the current page, rows per page). I'm using Vuex, and I have heard of module factories, but I'm thinking that there will be a better way to do this when Vue 3 officially comes to Quasar Framework (Quasar is a Vue.js framework).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session authentication with DRF. There are a lot of guides showing how to use JWT and token authentication for DRF with JavaScript frontends. DRF actually recommends session authentication for use cases like a web-based JavaScript client, so I hope I can promote some best practices around using Django's built-in session authentication with Django REST Framework and an HttpOnly session cookie. I also understand that all security decisions have trade-offs, and I'm trying to understand what trade-offs come with handling authentication in this way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Social authentication. I have previously set up social authentication with Google, Facebook and GitHub using Python Social Auth. I think it is a great package, and it adds a lot of flexibility with its concept of pipelines, but I haven't done much with these yet, so I'm hoping to dig in further and better understand how I can make better use of social authentication in my app. This app uses LinkedIn OAuth2 with a custom user model. Logging in with a LinkedIn account gives you the ability to request an API token (Django REST Framework's Token) to access the public API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic API documentation with OpenAPI. Swagger/OpenAPI seems like a nice way to document an API, so I'm hoping to build best practices around how to document a DRF API automatically with OpenAPI and Swagger UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI/CD with GitLab and docker swarm. I will admit that I am a huge GitLab fan. I love how flexible their CI/CD pipelines are. Being a docker fan as well, I chose to use docker swarm for this project to keep things simple and straightforward. I think one under-appreciated feature of docker is being able to set &lt;code&gt;DOCKER_HOST&lt;/code&gt; to an SSH connection, such as &lt;code&gt;ssh://root@123.456.789.10&lt;/code&gt;. This lets you control the remote docker host without needing to SSH to it first, and it is also how I'm able to deploy and run management commands "manually" through the GitLab UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Productive development environment. To start the project, you only need to run &lt;code&gt;docker-compose up&lt;/code&gt; (after copying &lt;code&gt;.env.template&lt;/code&gt; to &lt;code&gt;.env&lt;/code&gt; in the root directory, for storing sensitive data such as LinkedIn OAuth2 keys outside of git). The development environment is very similar to how this project runs in production, with some additional utilities for monitoring and debugging such as pgadmin4, flower (for celery), redis commander (a GUI for viewing redis databases), Django debug toolbar (a must-have for any Django project, I believe), runserver_plus with Werkzeug, and others. Also, the backend and frontend hot reload automatically with the help of webpack for Vue and watchdog for Django and Celery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic TLS certificate generation with Traefik. For a simple project on docker swarm, I'm really happy with how simple it is to request TLS certificates from Let's Encrypt automatically with Traefik. There are no scripts, cron jobs or one-time setup jobs; it just seems to work out of the box if configured correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Testing with pytest. So far I have mostly been testing my API views. I really like using factory with pytest, so I use that in most of my tests.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
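&lt;p&gt;To make the session-authentication point above concrete, here is a minimal, hypothetical fragment of the relevant Django settings, not the project's actual configuration:&lt;/p&gt;

```python
# settings.py (sketch): session authentication for DRF with an HttpOnly cookie.
REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.SessionAuthentication",
    ],
    "DEFAULT_PERMISSION_CLASSES": [
        "rest_framework.permissions.IsAuthenticatedOrReadOnly",
    ],
}

# Django's session cookie is HttpOnly by default; stated here for clarity.
SESSION_COOKIE_HTTPONLY = True
CSRF_COOKIE_HTTPONLY = False  # the JavaScript client must read the CSRF token
```

One trade-off worth noting: with &lt;code&gt;SessionAuthentication&lt;/code&gt;, DRF enforces CSRF checks on unsafe requests, so the frontend has to send the CSRF token (typically via an &lt;code&gt;X-CSRFToken&lt;/code&gt; header) along with its axios calls.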

&lt;p&gt;That's all I have for now. I have a long list of questions, things I want to improve, add and experiment with, here are just a few that come to mind: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Frontend testing. I don't have any component tests or e2e tests, so this would be good to add eventually. Since I'm using a component library and my app uses these components directly, I'm not exactly sure how much testing I should be doing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data verification/validation. There are a lot of sites that provide similar data; WhaleWisdom is the biggest one that I know of. Once I get more data built into the site, it would be good to spot check some of the values. There are some nuances to the filing data that I haven't addressed, such as amendment filings and additions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Calculating period changes. One of the features that I'm not sure how best to implement is the ability to sort a filer's holdings in a given period by the percent increase from the previous period. One way would be to add these as additional fields to the Holding model and then calculate the values as I process the data in celery. If I process recent periods before earlier ones, I will have to update these values once the previous period has been processed, so it would be an additional check to do. I'll probably post this question here in more detail later. Here's &lt;a href="https://whalewisdom.com/filer/ubs-ag#tabholdings_tab_link" rel="noopener noreferrer"&gt;an example of what this means from WhaleWisdom&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accessing LinkedIn profile data to populate fields on my CustomUser model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scaling? I have a lot more experience with deploying projects to AWS, which is built around the ability to scale. I don't know how a project on DigitalOcean would be scaled automatically. A single-node docker swarm cluster will take some time to process all of the data. I would probably be better off scaling vertically with much bigger droplets and higher celery concurrency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker swarm secrets. I'm currently using environment variables to pass secrets stored in GitLab CI when I build images and deploy to docker swarm. I would like to learn how to properly use swarm secrets and work them into my CI/CD pipeline. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As I mentioned above, I'm also interested in updating this project to Vue3 and to apply some of its new features to this project. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use pipenv, poetry or some other way of pinning secondary Python dependencies. Does anyone have a recommendation on how best to do this with docker? I have always thought that docker &lt;em&gt;is&lt;/em&gt; the virtual environment, but I realize that some versions of indirect dependencies may change when pip installing without a lockfile similar to &lt;code&gt;package-lock.json&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
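&lt;p&gt;The period-change calculation mentioned above is simple on its own; the hard part is when to run it. A plain-Python sketch of the computation (field names are hypothetical; the real version would live in the celery tasks that process each period):&lt;/p&gt;

```python
# Sketch: percent change in a holding's value between two filing periods.
# Returns None when there is no prior period to compare against.
def percent_change(current_value, previous_value):
    if previous_value is None or previous_value == 0:
        return None
    return (current_value - previous_value) / previous_value * 100

# e.g. a holding that grew from 400 to 500 between quarters:
print(percent_change(500, 400))  # 25.0
```

Storing this result as a denormalized field on the Holding model would make sorting by it cheap, at the cost of backfilling the value whenever the previous period is processed after the current one.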

</description>
      <category>django</category>
      <category>gitlab</category>
      <category>vue</category>
      <category>data</category>
    </item>
  </channel>
</rss>
