<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ChunTing Wu</title>
    <description>The latest articles on DEV Community by ChunTing Wu (@lazypro).</description>
    <link>https://dev.to/lazypro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F778575%2F460340c1-34d2-4312-890f-aef75f695c9d.jpeg</url>
      <title>DEV Community: ChunTing Wu</title>
      <link>https://dev.to/lazypro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lazypro"/>
    <language>en</language>
    <item>
      <title>Kiro Workflow for Copilot, Claude &amp; More</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Mon, 08 Sep 2025 01:26:22 +0000</pubDate>
      <link>https://dev.to/lazypro/kiro-workflow-for-copilot-claude-more-4ihp</link>
      <guid>https://dev.to/lazypro/kiro-workflow-for-copilot-claude-more-4ihp</guid>
      <description>&lt;p&gt;Last week we introduced how an agent &lt;a href="https://medium.com/stackademic/ai-agents-for-complex-codebase-analysis-eef748fe6dfd" rel="noopener noreferrer"&gt;understands a codebase&lt;/a&gt; and built an actual &lt;a href="https://github.com/wirelessr/codebase-analyzer-agent" rel="noopener noreferrer"&gt;codebase agent&lt;/a&gt; to demonstrate the underlying mechanism.&lt;/p&gt;

&lt;p&gt;During this process, we highlighted some current limitations of Kiro. For instance, even with a detailed &lt;code&gt;design.md&lt;/code&gt; file, Kiro frequently deviates from the plan during execution, producing results that diverge significantly from the intended design.&lt;/p&gt;

&lt;p&gt;The root cause lies in Kiro's execution process, where it uses &lt;code&gt;design.md&lt;/code&gt; as context but fails to fully read the entire file. This issue becomes particularly serious when steering content is extensive, as the agent tends to protect its context window. Consequently, it may only read the first few lines of a file and consider it fully processed.&lt;/p&gt;

&lt;p&gt;This means when Kiro executes plans, we still need to constantly remind it not to deviate from the design. Not only does this waste a lot of vibe request quotas, but it also lowers overall development efficiency, since we have to review every detail more carefully.&lt;/p&gt;

&lt;p&gt;The problem is that Kiro is a closed-source product, so we can't fine-tune its prompts. We're left watching this cycle repeat over and over. So I wondered if there's a way to use prompts to make tools like Copilot or Claude Code perform Kiro-like workflows while still allowing adjustment of each execution detail.  &lt;/p&gt;

&lt;p&gt;After actually trying it out, it turns out this approach works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/wirelessr/kiro-workflow-prompts" rel="noopener noreferrer"&gt;https://github.com/wirelessr/kiro-workflow-prompts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although this project is written in Copilot's format, it can be applied to any agent, including Claude Code. However, since I'm using Copilot, I'll demonstrate using its actual workflow.  &lt;/p&gt;

&lt;p&gt;Installation is straightforward: simply place these prompt files in the project directory or a global prompts folder to use them directly in Copilot's chat window.&lt;/p&gt;

&lt;p&gt;All prompt files are thoroughly explained in the &lt;code&gt;README.md&lt;/code&gt;. This article will focus on how to use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Copilot Spec-driven Development
&lt;/h2&gt;

&lt;p&gt;First, just as in Kiro's spec mode, we ask Copilot to create a spec directory and convert our requirements into an EARS specification document.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/createSpec&lt;/code&gt;&lt;br&gt;
I want to build a web service with OAuth authentication, integrating Google OAuth and requiring a database to store user information.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By running the &lt;code&gt;/createSpec&lt;/code&gt; command with these requirements, the Agent begins preparing the &lt;code&gt;requirements.md&lt;/code&gt; file. If there are materials under &lt;code&gt;.kiro/steering&lt;/code&gt; that provide behavioral guidelines for the Agent, it will also reference those contents.&lt;/p&gt;
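&lt;p&gt;For readers unfamiliar with EARS (Easy Approach to Requirements Syntax), requirements are written as trigger and response statements. A hypothetical entry for the OAuth requirement above might look like this:&lt;/p&gt;

```
WHEN a user completes Google OAuth sign-in,
THE SYSTEM SHALL create or update the user's record in the database.

IF the OAuth token exchange fails,
THEN THE SYSTEM SHALL return an authentication error without creating a session.
```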

&lt;p&gt;Throughout the requirements creation process, we can continuously interact with the Agent to refine the details of the entire requirements document. &lt;/p&gt;

&lt;p&gt;Here's an interesting observation: I've noticed Kiro tends to over-engineer things, adding obsessive details. So I used system instructions to make our Agent more pragmatic, like Linus Torvalds, adhering to the KISS (Keep It Simple, Stupid) principle.&lt;/p&gt;

&lt;p&gt;Once we have the &lt;code&gt;requirements.md&lt;/code&gt; file, we move on to the design phase.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/design&lt;/code&gt;&lt;br&gt;
I approve the requirements document. Let's begin the design phase.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As in Kiro, the design phase requires no additional prompting. We only need to approve &lt;code&gt;requirements.md&lt;/code&gt;, after which the Agent will autonomously initiate the design process. Of course, the Agent will still reference steering documents for design guidance, and we can continue to fine-tune the process throughout.&lt;/p&gt;

&lt;p&gt;Upon completing the design, the Agent will generate the corresponding &lt;code&gt;design.md&lt;/code&gt; file, enabling us to proceed to the next step, the planning phase.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/createTask&lt;/code&gt; &lt;br&gt;
I approve the design document. Let's begin planning.  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here, explicit approval is still required; otherwise, the Agent will not proceed.  &lt;/p&gt;

&lt;p&gt;After this phase is complete, we will have the three most critical specifications: &lt;code&gt;requirements.md&lt;/code&gt;, &lt;code&gt;design.md&lt;/code&gt;, and &lt;code&gt;tasks.md&lt;/code&gt;. Interestingly, these outputs follow Kiro's rules exactly. Therefore, switching back to Kiro to execute tasks from this stage is no problem at all.&lt;/p&gt;

&lt;p&gt;After approving &lt;code&gt;tasks.md&lt;/code&gt;, the final step is to enter the execution phase.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/executeTask&lt;/code&gt; task1&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can explicitly specify which task to start with, or omit the task number altogether. In that case, the Agent will attempt to begin with the first uncompleted task.&lt;/p&gt;

&lt;p&gt;This workflow finishes once the entire &lt;code&gt;tasks.md&lt;/code&gt; file is marked as complete. We can see this is basically identical to Kiro's process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;In these prompts, I explicitly state that all documents must be thoroughly reviewed before proceeding. However, the preceding steps don't need to be so meticulous, as humans will collaborate with the Agent to co-create those specification documents.&lt;/p&gt;

&lt;p&gt;However, &lt;code&gt;/executeTask&lt;/code&gt; operates in headless execution mode, so we absolutely must ensure the Agent understands its purpose. Therefore, within the &lt;code&gt;executeTask&lt;/code&gt; prompt, I not only require it to thoroughly review all files but also demand it summarize each one. Only by asking for summaries can we enforce sufficient rigor to ensure the Agent genuinely reads everything carefully.&lt;/p&gt;

&lt;p&gt;This insight emerged during my implementation of the codebase agent.  The greatest advantage of this workflow is that we can achieve results comparable to Kiro, or even better. Using tools we're familiar with (and have paid for), we can customize prompts to meet specific needs. Just as I required the agent to thoroughly read files, there's lots of room for adjustment, allowing us to fully unleash our creativity.&lt;/p&gt;

&lt;p&gt;If you have any good ideas, feel free to contribute to my project. Currently, I've tailored the prompts based on my own habits and needs, but you might have even more insights to share.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>How to Learn AI from Scratch</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Tue, 17 Jun 2025 01:44:33 +0000</pubDate>
      <link>https://dev.to/lazypro/how-to-learn-ai-from-scratch-3368</link>
      <guid>https://dev.to/lazypro/how-to-learn-ai-from-scratch-3368</guid>
      <description>&lt;p&gt;In today's era of generative AI, there are countless ways to get started with AI. However, for engineers without a background in AI or machine learning, the overwhelming number of buzzwords can make it hard to know where to begin. That said, there's no denying the productivity gap between those who know how to use AI and those who don’t.&lt;/p&gt;

&lt;p&gt;This article aims to give software engineers a fast-track introduction—a practical guide to navigating and thriving in this new landscape.&lt;/p&gt;

&lt;p&gt;The mind map below outlines the flow of the article. We’ll start with how to use AI effectively, then move into how to build things with it, touching on key concepts along the way. Let’s dive in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhnvcj33xdu9ctlegpnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhnvcj33xdu9ctlegpnw.png" alt="image" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use
&lt;/h2&gt;

&lt;p&gt;When people talk about using AI, it's impossible not to bring up how it all started—&lt;strong&gt;ChatGPT&lt;/strong&gt; was the moment generative AI really hit the mainstream. After that, every major company started launching their own chat models.&lt;/p&gt;

&lt;p&gt;I’ve listed four models I personally use almost every day. You might wonder why I switch between them. Simple reason: free plans come with limits, so rotating helps me stay productive without paying.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT – This is the one I use the most. Whether I’m writing, editing, brainstorming, or just trying to get a fresh idea out, ChatGPT usually gets the first draft going.&lt;/li&gt;
&lt;li&gt;Claude – When it comes to quick scripts or anything related to the command line, Claude feels the easiest to work with. For instance, if I need a curl command to upload a JSON file with auth, I’ll ask Claude.&lt;/li&gt;
&lt;li&gt;Gemini – I use this mainly for more in-depth research. It gives off a more grounded vibe, which helps when I need something solid to work with.&lt;/li&gt;
&lt;li&gt;Grok – Once I hit Gemini’s limit for the day, Grok usually takes over.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools make it really easy to fold AI into everyday tasks. For most situations, this setup covers everything I need. But when I’m working on something more specific—like building a presentation—I’ll bring in different tools.&lt;/p&gt;

&lt;p&gt;One I rely on a lot is &lt;a href="https://gamma.app/" rel="noopener noreferrer"&gt;Gamma.app&lt;/a&gt;. It’s changed how I make slides.&lt;/p&gt;

&lt;p&gt;I already had a pretty good rhythm from doing talks regularly, so I can usually outline things quickly. But Gamma takes it even further. I just give it a prompt, let it build a rough version, and then tweak the parts I want to improve. Something that used to take half a day now takes me about an hour.&lt;/p&gt;

&lt;p&gt;Another one I keep coming back to is &lt;a href="https://www.perplexity.ai" rel="noopener noreferrer"&gt;Perplexity&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since generative AI is basically predicting what words come next, it’s not always accurate. That’s where Perplexity helps—it’s the tool I use to cross-check facts or dig up references. Sure, other AI tools have similar features, but I’ve set Perplexity as my browser’s default search engine, so it’s the quickest for me.&lt;/p&gt;

&lt;p&gt;I use a few other tools depending on the project. For example, if you work with Confluence Cloud, Rovo Chat is a solid option that fits nicely into that workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vibe Coding
&lt;/h3&gt;

&lt;p&gt;For software engineers, vibe coding has become a key part of using AI effectively to boost productivity. But getting good at it takes a lot of hands-on practice—and even just picking the right IDE and agent can take serious trial and error.&lt;/p&gt;

&lt;p&gt;Beyond that, no matter which agent you go with, you’ll still need to plug it into the right ecosystem to unlock its full potential. Some rely on MCP-style control flows, others on rule templates. Here are a few specific needs I personally care a lot about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/eyaltoledano/claude-task-master" rel="noopener noreferrer"&gt;Task master&lt;/a&gt;: If your instructions to the AI aren’t clear enough, it’s easy for the model to get stuck—wasting tokens without producing anything useful. This was highlighted in &lt;a href="https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf" rel="noopener noreferrer"&gt;Apple’s paper&lt;/a&gt;, which shows that as task complexity increases, model performance can collapse entirely. That’s why task decomposition is essential. Task master is a solid open-source option for that.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vanzan01/cursor-memory-bank" rel="noopener noreferrer"&gt;Memory bank&lt;/a&gt;: Since LLMs have limited context windows, they tend to forget past mistakes or important task details. A persistent memory mechanism helps address that gap.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are plenty of other tools that can be added, depending on your development habits and how far you've gone with vibe coding. It's really about building the setup that fits your workflow.&lt;/p&gt;

&lt;p&gt;Underneath all of this is prompt engineering—everything starts with how you communicate your intent to the model. Getting that part right matters a lot. Fortunately, Google has published a pretty thorough &lt;a href="https://www.kaggle.com/whitepaper-prompt-engineering" rel="noopener noreferrer"&gt;whitepaper&lt;/a&gt; that’s worth reading if you’re looking to understand the fundamentals.&lt;/p&gt;

&lt;p&gt;If you want to go deeper, there’s a growing body of research too.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://arxiv.org/pdf/2312.16171v1" rel="noopener noreferrer"&gt;this paper&lt;/a&gt; summarizes 26 different techniques for improving prompts—worth checking out if you’re serious about refining your workflow.&lt;/p&gt;

&lt;p&gt;Vibe coding is half tools, half mindset—and prompt engineering is the bridge between the two.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Develop
&lt;/h2&gt;

&lt;p&gt;Once you’ve got the hang of using AI, the natural next step—especially for engineers—is to start experimenting with what AI can actually do for you. Personally, I’ve built a handful of small tools that I now use regularly at work, like a code review agent and a text-to-command-line agent.&lt;/p&gt;

&lt;p&gt;These tools each focus on solving very specific problems with AI. So how do you even start building something like that?&lt;/p&gt;

&lt;h3&gt;
  
  
  Model selection
&lt;/h3&gt;

&lt;p&gt;The first step is understanding what resources are available. By “resources,” I mean which AI services expose endpoints that you can actually call.&lt;/p&gt;

&lt;p&gt;If you’re willing to pay, then the big-name providers are all viable. But if you’re trying to keep costs low, what are your options? Turns out—plenty. Here are a few I actively use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;: This platform gives you access to a wide range of models, including quite a few with free quotas. In fact, even models like Google’s Gemma 3:27B have a free tier here.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;: If you’re worried about running through OpenRouter’s quotas, you can always fall back to local setups. Ollama is plug-and-play and runs locally. The only trade-off? Heavy models are tough to handle on a local machine—but smaller models are often too limited.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/snailyp/gemini-balance" rel="noopener noreferrer"&gt;gemini-balance&lt;/a&gt;: This one’s kind of clever. Gemini offers a free tier that’s actually free—it doesn’t sneak in charges once you pass a usage limit. The catch is, the quota’s small. But if you can cycle through enough free-tier tokens, you can effectively run things at zero cost. That’s exactly what gemini-balance helps with.&lt;/li&gt;
&lt;/ul&gt;
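&lt;p&gt;As a concrete sketch of what calling one of these providers looks like: OpenRouter exposes an OpenAI-compatible HTTP API, so a request is just a POST with a Bearer key. The snippet below only builds the request rather than sending it; the API key placeholder and the free-tier model ID are illustrative.&lt;/p&gt;

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(api_key, model, prompt):
    """Build the HTTP pieces for an OpenRouter chat completion call.

    OpenRouter's API is OpenAI-compatible; actually sending this request
    requires an API key from openrouter.ai.
    """
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # e.g. a free-tier model ID (illustrative)
        "messages": [{"role": "user", "content": prompt}],
    })
    return OPENROUTER_URL, headers, body

# Inspect the request that would be sent for a free Gemma-class model
url, headers, body = build_chat_request("sk-or-...", "google/gemma-3-27b-it:free", "Hello!")
print(json.loads(body)["model"])
```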

&lt;p&gt;Here’s how I’ve set things up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For important tasks, I rely on gemini-balance. The Gemini 2.0 models are just that good—I trust them with higher-stakes stuff.&lt;/li&gt;
&lt;li&gt;For lighter tasks, I go with OpenRouter, especially when I want to use Gemma 3:27B. It’s a strong model, but OpenRouter doesn’t support function calling for Gemma, so I keep it for simpler jobs.&lt;/li&gt;
&lt;li&gt;For embedding, I use Ollama locally. Embedding isn’t very compute-intensive, but I run it at scale, so I’d rather not worry about hitting quotas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can probably tell, picking a model isn’t just about performance—it’s also about constraints, access, and cost. Each model comes with its own trade-offs, so understanding those is key to building something reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Application
&lt;/h3&gt;

&lt;p&gt;Once you're familiar with the tools, the next step is figuring out what kind of applications you want to build.&lt;/p&gt;

&lt;p&gt;This is where imagination comes in—but regardless of what you're building, you'll eventually run into the concept of RAG.&lt;/p&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) is one of the most effective ways to unlock real-world utility from LLMs.&lt;/p&gt;

&lt;p&gt;Why? &lt;/p&gt;

&lt;p&gt;Because language models can’t access huge documents directly—their input limits are real. But many tasks (like customer support) require deep background context. That’s exactly what RAG helps with.&lt;/p&gt;
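&lt;p&gt;To make the idea concrete, here is a minimal, self-contained retrieval sketch. The &lt;code&gt;embed&lt;/code&gt; function is a toy stand-in for a real embedding model and the documents are made up; only the retrieve-then-stuff-into-prompt flow reflects the actual RAG pattern.&lt;/p&gt;

```python
import math

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-letters vector.
    Real systems use learned dense vectors from an embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by similarity to the query and keep only the top k."""
    q = embed(query)
    scored = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Stuff only the retrieved context into the prompt, keeping it small."""
    context = "\n".join(retrieve(query, docs))
    return "Answer using this context:\n" + context + "\n\nQuestion: " + query

docs = [
    "Refunds are processed within 5 business days.",
    "Our office cat is named Biscuit.",
    "Contact support via the in-app chat widget.",
]
print(build_prompt("how long do refunds take", docs))
```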

&lt;p&gt;I won’t go deep into RAG architecture here. If you're curious, I wrote a more detailed piece: &lt;a href="https://medium.com/@lazypro/evolution-of-rag-baseline-rag-graphrag-and-kag-93b6f221bde6" rel="noopener noreferrer"&gt;Evolution of RAG: Baseline RAG, GraphRAG, and KAG&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;No matter what kind of app you're building, sooner or later, you'll hit the need for fine-tuning. Most base models are trained for general tasks, but real applications require customization—your data, your workflows. That’s where fine-tuning comes in. Of course, fine-tuning isn’t trivial—it assumes some background in ML. I’ll cover that in another post.&lt;/p&gt;

&lt;p&gt;And then there’s evaluation and observability—two things you must consider when your AI app is live. You need to know if the model is doing its job, and why it failed when it didn’t. Tools like &lt;a href="https://www.langchain.com/langsmith" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt; and &lt;a href="https://langfuse.com/" rel="noopener noreferrer"&gt;LangFuse&lt;/a&gt; can help with this, but you’ll need to spend time experimenting with what works best for your stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So yeah—diving into AI development is a big journey.&lt;/p&gt;

&lt;p&gt;From picking tools, to writing prompts, to deploying apps, there’s a huge surface area of decisions.&lt;/p&gt;

&lt;p&gt;But don’t let that intimidate you. AI is still new, and most of us are learning as we go.&lt;/p&gt;

&lt;p&gt;My advice: start with problems from your day-to-day life. That’s the best way to learn. You’ll naturally uncover more tools, more patterns, and more techniques as you build.&lt;br&gt;
Eventually, you’ll develop your own set of “survival skills” for the AI era.&lt;/p&gt;

&lt;p&gt;This article includes a pretty massive mind map—it touches on nearly every major area in the Gen AI space.&lt;/p&gt;

&lt;p&gt;You don’t need to master everything—not even close. Just start with what you care about, build one small thing, and the rest will follow.&lt;/p&gt;

&lt;p&gt;In this new AI era, curiosity and momentum matter way more than credentials.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>career</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Master Vibe Coding: AI-Powered Workflow</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Mon, 12 May 2025 06:32:12 +0000</pubDate>
      <link>https://dev.to/lazypro/master-vibe-coding-ai-powered-workflow-1jmf</link>
      <guid>https://dev.to/lazypro/master-vibe-coding-ai-powered-workflow-1jmf</guid>
      <description>&lt;p&gt;Recently, I've taken a lot of time to actually experience vibe coding, and I've come up with a few tips.&lt;/p&gt;

&lt;p&gt;TL;DR&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent you use must support diff-based modification instead of overwriting the entire file.&lt;/li&gt;
&lt;li&gt;Commit continuously.&lt;/li&gt;
&lt;li&gt;Prompts should be clear, simple, and provide the necessary context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before describing the experience, I'd like to briefly introduce the toolset I use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IDE is VS Code.&lt;/li&gt;
&lt;li&gt;Agent is a plugin from &lt;a href="https://marketplace.visualstudio.com/items/?itemName=RooVeterinaryInc.roo-cline" rel="noopener noreferrer"&gt;RooCode&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Model uses several free models provided by &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;.

&lt;ul&gt;
&lt;li&gt;The main one is &lt;a href="https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free" rel="noopener noreferrer"&gt;deepseek-v3-0324&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Occasionally &lt;a href="https://openrouter.ai/meta-llama/llama-4-scout:free" rel="noopener noreferrer"&gt;llama-4-scout&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;As you can see, I use free tools, which I believe is the best setup for most developers. But it also creates a lot of headaches, which I'll talk about in more detail.&lt;/p&gt;

&lt;p&gt;Next, I'll provide you with the complete project I made using vibe coding.&lt;br&gt;
&lt;a href="https://github.com/wirelessr/alice" rel="noopener noreferrer"&gt;https://github.com/wirelessr/alice&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a command-line tool that converts natural language into an actual terminal command, executes it, and returns the result.&lt;/p&gt;

&lt;p&gt;In fact, many tools offer similar functionality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Warp (AI-powered terminal tool)&lt;/li&gt;
&lt;li&gt;iTerm2 AI&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/TheR1D/shell_gpt" rel="noopener noreferrer"&gt;ShellGPT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/BuilderIO/ai-shell" rel="noopener noreferrer"&gt;ai-shell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are many more, but they all have their own problems.&lt;/p&gt;

&lt;p&gt;The main problem is that they don't support custom model providers such as OpenRouter or Anthropic; the second is that the results are not very good: either they miss the point or they aren't easy to use.&lt;/p&gt;

&lt;p&gt;So I wrote a tool for my own needs, and I won't dive into the details of the tool.&lt;/p&gt;

&lt;p&gt;The development process went like this: I first wrote a basic working script, &lt;code&gt;main.py&lt;/code&gt;, and then relied on vibe coding to refactor it and add new features step by step.&lt;/p&gt;

&lt;p&gt;This development process should fit the scenario of most developers practicing vibe coding. Instead of using vibe coding to create a project from scratch, we use vibe coding to iterate on an existing project.&lt;/p&gt;

&lt;p&gt;Let's start by explaining the three points mentioned at the beginning of the article.&lt;/p&gt;
&lt;h2&gt;
  
  
  Diff mode patching
&lt;/h2&gt;

&lt;p&gt;I think this is one of the most important things, why?&lt;/p&gt;

&lt;p&gt;When the model is not smart enough, it can easily overwrite the whole file and change things that were never a problem. Here is an example of deepseek-v3-0324 muddying the waters.&lt;/p&gt;

&lt;p&gt;The project I developed uses &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt; to implement communication between agents. Here are the imports I used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_ext.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatCompletionClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_core.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelInfo&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AssistantAgent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But when RooCode wanted to make changes to the code, it would often overwrite my entire file, and then the code would break. This is the most common “unauthorized” change it makes.&lt;/p&gt;

&lt;p&gt;For some reason, it changes &lt;code&gt;models&lt;/code&gt; to &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;agents&lt;/code&gt; to &lt;code&gt;agent&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_ext.model.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatCompletionClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_core.model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelInfo&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AssistantAgent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Therefore, I've added a couple of absolute rules to my RooCode.&lt;/p&gt;

&lt;p&gt;.roo/rules/rules.md&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Do not overwrite the whole file easily.
2. Just modify parts that is asked.
3. Do not change any no bug part.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Commit continuously
&lt;/h2&gt;

&lt;p&gt;In vibe coding, agent behavior is largely unpredictable, especially with less capable models like the example above.&lt;/p&gt;

&lt;p&gt;So whenever you issue a command and get an acceptable result, make a &lt;code&gt;git commit&lt;/code&gt; right away. Even if the command is only partially complete, the result is at least acceptable, so save it.&lt;/p&gt;

&lt;p&gt;In the beginning, I often let the agent do whatever it wanted while I kept asking for what I wanted, and then I had no idea what it had changed, or which change had broken something.&lt;/p&gt;

&lt;p&gt;Therefore, please remember: as soon as you see an acceptable result, just commit.&lt;/p&gt;

&lt;p&gt;With commits, you can compare what changed in subsequent commands, and you will also know what needs to be fixed in the next prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt should be clear and contextualized
&lt;/h2&gt;

&lt;p&gt;Don't expect the agent to know what you want to change.&lt;/p&gt;

&lt;p&gt;If you want to change something, tell the agent exactly what you want: which file, which function, or even which lines to change. The clearer you are, the closer you get to the expected result rather than the free-for-all of a creative mind.&lt;/p&gt;

&lt;p&gt;When I first came across vibe coding, I assumed the agent could see all the code and structure, so it should know what to change. I was wrong, very wrong.&lt;/p&gt;

&lt;p&gt;There is a limit to the context the agent can see. Of course, you can make that context huge, but that brings a few problems.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Huge contexts cost money.&lt;/li&gt;
&lt;li&gt;Huge contexts make hallucination more likely.&lt;/li&gt;
&lt;li&gt;Not every agent can handle a huge context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So you have to tell the agent clearly what to look at and where to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Will developers be eliminated?
&lt;/h2&gt;

&lt;p&gt;There is often a dream that agents can replace developers. I can only say: this is really just a dream.&lt;/p&gt;

&lt;p&gt;The project structure this time is quite simple, and the functions are not complicated.&lt;/p&gt;

&lt;p&gt;To put it frankly, it is actually three agents communicating with each other to accomplish one thing.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A planning agent converts natural language into a command line to be executed.&lt;/li&gt;
&lt;li&gt;An execution agent runs the command line and reports the result.&lt;/li&gt;
&lt;li&gt;A verification agent checks whether the result is correct; if not, the loop starts again.&lt;/li&gt;
&lt;/ol&gt;
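&lt;p&gt;The three-agent loop above can be sketched with stub agents. Everything here is illustrative: a real implementation would back &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; with LLM calls and actually run the command in a shell.&lt;/p&gt;

```python
def plan(request):
    """Stub planning agent: map natural language to a shell command.
    A real implementation would call an LLM; this lookup table is illustrative."""
    table = {"show current directory": "pwd"}
    return table.get(request, "echo unknown")

def execute(command):
    """Stub execution agent: pretend to run the command and report its output."""
    fake_shell = {"pwd": "/home/user/project"}
    return fake_shell.get(command, "")

def verify(request, output):
    """Stub verification agent: decide whether the output answers the request."""
    return bool(output)

def run(request, max_rounds=3):
    """Plan, execute, verify; retry until verification passes or rounds run out."""
    for _ in range(max_rounds):
        command = plan(request)
        output = execute(command)
        if verify(request, output):
            return output
    return None

print(run("show current directory"))
```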

&lt;p&gt;Along the way, I added some new features.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use PyInstaller to generate the executable.&lt;/li&gt;
&lt;li&gt;Automatically release with GitHub Actions.&lt;/li&gt;
&lt;li&gt;Support both environment variables and configuration files.&lt;/li&gt;
&lt;li&gt;Model setting.&lt;/li&gt;
&lt;li&gt;Choose the language of the response.&lt;/li&gt;
&lt;li&gt;Hide agent interactions and only get the final result.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've also done some refactoring, such as splitting &lt;code&gt;main.py&lt;/code&gt; into corresponding modules or turning the config module into a singleton pattern.&lt;/p&gt;

&lt;p&gt;But vibe coding left me a bit exhausted. The agent's development efficiency is not as good as writing the code myself, and sometimes it was genuinely infuriating to watch it act dumb. Still, since I'm practicing vibe coding, all I could do was keep watching.&lt;/p&gt;

&lt;p&gt;This kind of foolishness becomes less frequent as models get stronger; in other words, stronger models also make prompts easier to write.&lt;/p&gt;

&lt;p&gt;Back to the conclusion above: vibe coding cannot replace developers, but it can improve development efficiency for skilled ones.&lt;/p&gt;

&lt;p&gt;For example, if I tell the agent to turn my &lt;code&gt;load_config()&lt;/code&gt; into a singleton pattern, it will create a standard singleton class as it imagines it, but its context is not large enough to update all the callers of &lt;code&gt;load_config&lt;/code&gt;, which breaks the program.&lt;/p&gt;

&lt;p&gt;But if I tell it to turn &lt;code&gt;load_config&lt;/code&gt; into a singleton while keeping the &lt;code&gt;load_config&lt;/code&gt; interface the same, it produces something close to what I imagined.&lt;/p&gt;
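&lt;p&gt;For illustration, here is one minimal way to make &lt;code&gt;load_config&lt;/code&gt; a singleton while keeping its interface unchanged; the config values shown are made up, not the real project's fields.&lt;/p&gt;

```python
_config = None

def load_config():
    """Load configuration once and reuse it on every later call.

    The public interface stays the same, so existing callers keep working.
    The values below are illustrative placeholders.
    """
    global _config
    if _config is None:
        _config = {"model": "deepseek-v3-0324", "language": "en"}
    return _config

a = load_config()
b = load_config()
print(a is b)  # prints True: both calls return the same cached object
```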

&lt;p&gt;So the role of the developer has become more important.&lt;/p&gt;

&lt;p&gt;Nowadays, more and more agents offer an orchestrator (dispatcher) mode, which breaks down complex tasks and assigns them to other agents. Honestly, the results are not good; it is better to break down the requirements myself and then tell each agent what to do via prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;Vibe coding is already a trend among developers, but just like we practice writing programs, vibe coding requires continuous practice.&lt;/p&gt;

&lt;p&gt;The more we use an agent, the easier it is to know where its limits are, and the easier it is to make it meet our expectations.&lt;/p&gt;

&lt;p&gt;Vibe coding is just a tool, one that makes things easier for the user rather than replacing the user. I hope everyone can become a sophisticated vibe coding user; I am still practicing too.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>Hands-On WrenAI Review: Text-to-SQL Powered by RAG</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Mon, 21 Apr 2025 01:44:48 +0000</pubDate>
      <link>https://dev.to/lazypro/hands-on-wrenai-review-text-to-sql-powered-by-rag-40fb</link>
      <guid>https://dev.to/lazypro/hands-on-wrenai-review-text-to-sql-powered-by-rag-40fb</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/Canner/WrenAI" rel="noopener noreferrer"&gt;WrenAI&lt;/a&gt; is a text to sql solution that I've been following for a while. Recently, I have some time to try it out, so let me share my experience.&lt;/p&gt;

&lt;p&gt;First of all, following the &lt;a href="https://docs.getwren.ai/oss/installation" rel="noopener noreferrer"&gt;installation guide&lt;/a&gt; in the official documentation, we can deploy WrenAI on a local machine. For the integration with local Ollama, there are some &lt;a href="https://docs.getwren.ai/oss/ai_service/guide/custom_llm#update-your-configurations" rel="noopener noreferrer"&gt;sample configuration files&lt;/a&gt; that can be used.&lt;/p&gt;

&lt;p&gt;However, some details that need adjusting are not mentioned in the documentation. For example, if we use the &lt;code&gt;ollama/nomic-embed-text&lt;/code&gt; mentioned in the document as the embedder, we need to change &lt;code&gt;embedding_model_dim&lt;/code&gt; in the configuration file from 3072 to 768, a detail that is easy to overlook.&lt;/p&gt;

&lt;p&gt;Just provide the appropriate settings and WrenAI will work fine.&lt;/p&gt;

&lt;p&gt;By the way, I am using MySQL and the official MySQL &lt;a href="https://github.com/datacharmer/test_db" rel="noopener noreferrer"&gt;test dataset&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  WrenAI Advantage
&lt;/h2&gt;

&lt;p&gt;In addition to schema-based chat Q&amp;amp;A, WrenAI has another awesome feature.&lt;/p&gt;

&lt;p&gt;It provides the flexibility to customize prompts: on WrenAI's Knowledge page, we can enter pre-designed questions with their corresponding SQL answers. We can also enter additional instructions to give the AI extra “parameters”.&lt;/p&gt;

&lt;p&gt;After I disassembled WrenAI's prompt, I realized that these Knowledge entries play a very important role in determining the final SQL. This customization provides a reliable fine-tuning opportunity when integrating a usage-based data source like BigQuery.&lt;/p&gt;

&lt;h2&gt;
  
  
  WrenAI Disadvantage
&lt;/h2&gt;

&lt;p&gt;This is a fairly new project (0.19.2 at the moment), so there are bound to be some bugs.&lt;/p&gt;

&lt;p&gt;I've encountered two problems that I find quite annoying.&lt;/p&gt;

&lt;p&gt;First, on the initial visit to the homepage, WrenAI provides some recommended questions based on the data model so that users can get up to speed quickly. However, this full-model scan consumes a lot of computing power, and if the model is not strong enough, we basically cannot get a result.&lt;/p&gt;

&lt;p&gt;Missing the recommended questions wouldn't matter by itself, but WrenAI's error handling is not well designed: it falls back to a lot of unhelpful built-in questions, and these do not respect the project's language setting. I've opened a &lt;a href="https://github.com/Canner/WrenAI/issues/1550" rel="noopener noreferrer"&gt;GitHub issue&lt;/a&gt; about this.&lt;/p&gt;

&lt;p&gt;Secondly, even though the source has few data models and not many columns, the &lt;code&gt;llama3.1:8b&lt;/code&gt; model still hallucinates a certain percentage of the time.&lt;/p&gt;

&lt;p&gt;Since I'm a data engineer, reading SQL comes easily to me, so spotting the problem is easy, and WrenAI also provides a good correction mechanism for fixing the original answer, so I haven't hit too many obstacles in using it.&lt;/p&gt;

&lt;p&gt;However, I am worried that people unfamiliar with the dataset and SQL may get unpleasant surprises if they use the output directly.&lt;/p&gt;

&lt;p&gt;Lastly, and this is both a strength and a weakness: WrenAI generates SQL through a RAG implementation, which requires a strong model. With my &lt;code&gt;llama3.1:8b&lt;/code&gt; running on a MacBook Pro M2, a simple question (joining a few tables) takes more than 5 minutes, not to mention complicated ones, and internal server errors are common.&lt;/p&gt;

&lt;p&gt;I won't go into the details of some minor Web UI issues and design flaws in the interaction with the backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;WrenAI uses RAG as the foundation for a pretty good text-to-SQL solution, and because it is based on RAG, there is a lot of flexibility to customize the prompt.&lt;/p&gt;

&lt;p&gt;However, because it is RAG-based, the demands on computing power and model capability are high, and its effectiveness with small models needs improvement.&lt;/p&gt;

&lt;p&gt;I will do more experiments with more powerful models and expect to get better results. Overall, WrenAI is a product worth trying.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sql</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Benchmarking Lakehouse Query Engines with TPC-DS</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Thu, 20 Feb 2025 02:51:34 +0000</pubDate>
      <link>https://dev.to/lazypro/benchmarking-lakehouse-query-engines-with-tpc-ds-2ac0</link>
      <guid>https://dev.to/lazypro/benchmarking-lakehouse-query-engines-with-tpc-ds-2ac0</guid>
      <description>&lt;p&gt;Apache Iceberg is already a popular lakehouse format that is supported by many query engines. What should we do if we want to make a technical selection among many query engines?&lt;/p&gt;

&lt;p&gt;In the data warehouse domain, the most commonly used standard is &lt;a href="https://www.tpc.org/tpcds/" rel="noopener noreferrer"&gt;TPC-DS&lt;/a&gt;, which defines several common scenarios and provides a set of standardized queries. Generally speaking, TPC-DS is the gold standard for benchmarking performance.&lt;/p&gt;

&lt;p&gt;Although TPC-DS is quite popular, with many common connectors for dumping test data into various databases, and even Trino, a pure computing engine, provides a dedicated &lt;a href="https://trino.io/docs/current/connector/tpcds.html" rel="noopener noreferrer"&gt;catalog&lt;/a&gt; for TPC-DS, there is no equivalent TPC-DS tooling for the lakehouse at the moment.&lt;/p&gt;

&lt;p&gt;The lakehouse does not have a good connector for this purpose. Therefore, in this article we describe how to dump TPC-DS test data into an Iceberg lakehouse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment environment setup
&lt;/h2&gt;

&lt;p&gt;How to build the TPC-DS tools is not the focus of this article, so I'll assume &lt;code&gt;dsdgen&lt;/code&gt; is already installed.&lt;/p&gt;

&lt;p&gt;First, let's generate a test data package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dsdgen &lt;span class="nt"&gt;-SCALE&lt;/span&gt; 1 &lt;span class="nt"&gt;-DIR&lt;/span&gt; /home/ec2-user/sample
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have the test data, we need to build the Iceberg environment and import the data.&lt;/p&gt;

&lt;p&gt;Although I've provided some Iceberg playgrounds before, this time I'd like to use tabular's &lt;a href="https://www.tabular.io/blog/docker-spark-and-iceberg-the-fastest-way-to-try-iceberg/" rel="noopener noreferrer"&gt;experiment environment&lt;/a&gt;. The main reason is that the tabular environment also includes a spark notebook, which helps a lot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/wirelessr/37b19323664cff6f9af42bd814f05a5d#file-docker-compose-yaml" rel="noopener noreferrer"&gt;https://gist.github.com/wirelessr/37b19323664cff6f9af42bd814f05a5d#file-docker-compose-yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One small modification we made was to mount the test folder &lt;code&gt;sample&lt;/code&gt; into &lt;code&gt;spark-iceberg&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./warehouse:/home/iceberg/warehouse&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./notebooks:/home/iceberg/notebooks/notebooks&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./sample:/home/iceberg/sample&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Import test data
&lt;/h2&gt;

&lt;p&gt;After we have created the environment, we need to write the data into Iceberg.&lt;/p&gt;

&lt;p&gt;First, we need to create the table schemas.&lt;/p&gt;

&lt;p&gt;In general, the &lt;code&gt;tpcds.sql&lt;/code&gt; shipped with TPC-DS can be used directly, but a few things need to be modified for the Iceberg experiment.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add catalog and database to the original table name, i.e. &lt;code&gt;&amp;lt;catalog&amp;gt;.&amp;lt;db&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;primary key&lt;/code&gt; line should be pulled out and replaced with a &lt;code&gt;PARTITIONED BY&lt;/code&gt; declaration. It's better to use buckets; I split into 16 buckets, but this can be tweaked.&lt;/li&gt;
&lt;li&gt;Now that the &lt;code&gt;primary key&lt;/code&gt; has been removed, remember to remove the comma from the previous line as well.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is the table &lt;code&gt;income_band&lt;/code&gt; for illustration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;income_band&lt;/span&gt;      &lt;span class="c1"&gt;-- rename table&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ib_income_band_sk&lt;/span&gt;  &lt;span class="nb"&gt;integer&lt;/span&gt;  &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ib_lower_bound&lt;/span&gt;     &lt;span class="nb"&gt;integer&lt;/span&gt;          &lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ib_upper_bound&lt;/span&gt;     &lt;span class="nb"&gt;integer&lt;/span&gt;          &lt;span class="c1"&gt;-- remove comma: ,&lt;/span&gt;
    &lt;span class="c1"&gt;-- remove line: primary key (ib_income_band_sk)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;PARTITIONED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ib_income_band_sk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;-- add line&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have written a Python script to handle this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/wirelessr/37b19323664cff6f9af42bd814f05a5d#file-proc_ddl-py" rel="noopener noreferrer"&gt;https://gist.github.com/wirelessr/37b19323664cff6f9af42bd814f05a5d#file-proc_ddl-py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we are done with all the DDLs, we just need to open the &lt;code&gt;Spark SQL&lt;/code&gt; shell built into the experiment environment and paste them in directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; spark-iceberg spark-sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the tables are defined, it's time to start importing the data into Iceberg, which we typically do using the shortcuts provided by Spark SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;income_band&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;`file:///home/iceberg/sample/income_band.dat`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But there is a problem: the column types will not match, causing an error (Spark SQL treats &lt;code&gt;INT&lt;/code&gt; in the csv as &lt;code&gt;STRING&lt;/code&gt;). So we can use PySpark instead to save some time.&lt;/p&gt;

&lt;p&gt;There is also a ready-made notebook available in the experiment environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; spark-iceberg pyspark-notebook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Therefore, we just need to open the notebook and start the task of importing the csv.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;

&lt;span class="n"&gt;dat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;call_center.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;catalog_page.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;catalog_returns.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;catalog_sales.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customer.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customer_address.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;customer_demographics.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date_dim.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# 'dbgen_version.dat',
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;household_demographics.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;income_band.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inventory.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;item.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;promotion.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reason.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ship_mode.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;store.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;store_returns.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;store_sales.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;time_dim.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;warehouse.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;web_page.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;web_returns.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;web_sales.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;web_site.dat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Import CSV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file:///home/iceberg/sample/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.dat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inferSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# drop the last empty column
&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;insertInto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo.test.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's pretty easy to fill each table using PySpark's &lt;code&gt;inferSchema&lt;/code&gt; feature. One thing to note is that I intentionally cut off the last column of the csv.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason is that the csv generated by TPC-DS has a trailing separator at the end of each line, which would be wrongly recognized as an extra column.&lt;/p&gt;

&lt;p&gt;At this point, we have written all the data into the lakehouse, and we can use various query engines to benchmark the performance of the queries predefined by TPC-DS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;TPC-DS is still the standard for the lakehouse as well, but there are few resources on how to run it on a lakehouse.&lt;/p&gt;

&lt;p&gt;This article provides a quick overview of how to import the data into a lakehouse, and following these steps should be feasible.&lt;/p&gt;

&lt;p&gt;The only thing left is to actually run the queries, but that's the less difficult part, so I won't dive into it.&lt;/p&gt;

</description>
      <category>data</category>
      <category>architecture</category>
      <category>tutorial</category>
      <category>performance</category>
    </item>
    <item>
      <title>Apache Paimon Playground ft. Flink and Trino</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Mon, 25 Nov 2024 01:38:03 +0000</pubDate>
      <link>https://dev.to/lazypro/apache-paimon-playground-ft-flink-and-trino-10fo</link>
      <guid>https://dev.to/lazypro/apache-paimon-playground-ft-flink-and-trino-10fo</guid>
      <description>&lt;p&gt;&lt;a href="https://paimon.apache.org/" rel="noopener noreferrer"&gt;Apache Paimon&lt;/a&gt; is a new data lakehouse format that focuses on solving the challenges of streaming scenarios, but also supports batch processing. Overall, Paimon has the potential to replace the existing Iceberg as the new standard for data lakehousing.&lt;/p&gt;

&lt;p&gt;Why Iceberg and not the other two (Hudi and Delta Lake)? &lt;/p&gt;

&lt;p&gt;Iceberg is the most widely supported by open-source engines, including pure query engines (e.g., Trino), NewSQL databases (e.g., StarRocks, Doris), and streaming frameworks (e.g., Flink, Spark).&lt;/p&gt;

&lt;p&gt;However, Iceberg faces several problems in streaming scenarios, the most serious being small-file fragmentation. Queries in data lakehouses rely heavily on file reads, and if a query has to scan many files at once, it will of course perform poorly.&lt;/p&gt;

&lt;p&gt;To avoid this problem, we need an external orchestrator to merge files regularly. Paimon, by contrast, is designed with a built-in merge mechanism and many other optimizations for high-volume writes, making it better suited to streaming scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment environment
&lt;/h2&gt;

&lt;p&gt;In order to learn more about Iceberg, I have set up two experimental environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/wirelessr/flink-iceberg-playground" rel="noopener noreferrer"&gt;Iceberg + Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/wirelessr/trino-iceberg-playground" rel="noopener noreferrer"&gt;Iceberg + Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This time I also built a playground for Paimon, which also includes Trino and Flink.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/wirelessr/paimon-trino-flink-playground" rel="noopener noreferrer"&gt;https://github.com/wirelessr/paimon-trino-flink-playground&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, StarRocks was also included as a representative of NewSQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgmycbggdruk2nt25o9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgmycbggdruk2nt25o9s.png" alt="Image description" width="594" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because neither Trino nor StarRocks supports streaming writes at this stage, Paimon's writes come from Flink.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: Since some of the links to the official Paimon files are broken, I've put the files into this repo. Some of them are huge, so they are stored via LFS; be sure to install &lt;code&gt;git-lfs&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Trino driver is in &lt;code&gt;paimon-trino-427-0.8-20241112.000605-197-plugin.tar.gz&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-zxvf&lt;/span&gt; paimon-trino-427-0.8-20241112.000605-197-plugin.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then it will run normally with &lt;code&gt;docker compose up -d&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Flink
&lt;/h3&gt;

&lt;p&gt;Let's start by connecting to Flink SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;flink-jobmanager ./bin/sql-client.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To write data using Flink we first need to create the correct catalog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;CATALOG&lt;/span&gt; &lt;span class="n"&gt;my_catalog&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'paimon'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'warehouse'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'s3://warehouse/flink'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'s3.endpoint'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'http://storage:9000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'s3.access-key'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'admin'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'s3.secret-key'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'password'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'s3.path.style.access'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown in the commands above, we're using &lt;code&gt;MinIO&lt;/code&gt; as &lt;code&gt;S3&lt;/code&gt;-compatible storage for Paimon.&lt;/p&gt;

&lt;p&gt;The next step, creating the table and writing the data, is quite simple; just run the commands from the &lt;a href="https://paimon.apache.org/docs/0.9/flink/quick-start/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="k"&gt;CATALOG&lt;/span&gt; &lt;span class="n"&gt;my_catalog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- create a word count table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;ENFORCED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- create a word data generator table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMPORARY&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;word_table&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'connector'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'datagen'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'fields.word.length'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'1'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- paimon requires checkpoint interval in streaming mode&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="s1"&gt;'execution.checkpointing.interval'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'10 s'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- write streaming data to dynamic table&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;word_table&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we read the table to verify what we've written.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- use tableau result mode&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="s1"&gt;'sql-client.execution.result-mode'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tableau'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- switch to batch mode&lt;/span&gt;
&lt;span class="k"&gt;RESET&lt;/span&gt; &lt;span class="s1"&gt;'execution.checkpointing.interval'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="s1"&gt;'execution.runtime-mode'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'batch'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- olap query the table&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Trino
&lt;/h3&gt;

&lt;p&gt;Let's open the Trino CLI first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;trino trino
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trino's Paimon catalog is already set up; I didn't add a new schema and just used the default one.&lt;/p&gt;

&lt;p&gt;So we can query the Flink write result directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paimon&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We should see something similar to the Flink query.&lt;/p&gt;

&lt;h3&gt;
  
  
  StarRocks
&lt;/h3&gt;

&lt;p&gt;This is a bonus, simply to show how much attention Paimon is getting: many NewSQL databases are starting to support it.&lt;/p&gt;

&lt;p&gt;Prepare a &lt;code&gt;mysql&lt;/code&gt; client locally to connect to StarRocks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysql &lt;span class="nt"&gt;-P&lt;/span&gt; 9030 &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;--prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"StarRocks &amp;gt; "&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We still need to create a catalog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;EXTERNAL&lt;/span&gt; &lt;span class="k"&gt;CATALOG&lt;/span&gt; &lt;span class="n"&gt;paimon_catalog_flink&lt;/span&gt;
&lt;span class="n"&gt;PROPERTIES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;"type"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"paimon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"paimon.catalog.type"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"paimon.catalog.warehouse"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"s3://warehouse/flink"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"aws.s3.enable_path_style_access"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"aws.s3.endpoint"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"http://storage:9000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"aws.s3.access_key"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;"aws.s3.secret_key"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"password"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mysql client doesn't support Trino's table locator format &lt;code&gt;&amp;lt;catalog&amp;gt;.&amp;lt;schema&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;, so we have to switch to the database before we can query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="n"&gt;paimon_catalog_flink&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results here will be similar to the above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Paimon supports several kinds of metastore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;filesystem&lt;/li&gt;
&lt;li&gt;hive metastore&lt;/li&gt;
&lt;li&gt;jdbc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, for the sake of simplicity, I didn't want extra components, so I used only S3, i.e. the filesystem metastore. Functionally this is fine, but according to the &lt;a href="https://paimon.apache.org/docs/0.9/concepts/concurrency-control/" rel="noopener noreferrer"&gt;official document&lt;/a&gt;, using S3 as the warehouse needs to be paired with a Hive or JDBC metastore to ensure consistency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But for object storage such as OSS and S3, their 'RENAME' does not have atomic semantic. We need to configure Hive or jdbc metastore and enable 'lock.enabled' option for the catalog. Otherwise, there may be a chance of losing the snapshot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Understanding which scenarios require this kind of consistency will be the goal of my future experiments.&lt;/p&gt;

</description>
      <category>data</category>
      <category>tutorial</category>
      <category>architecture</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Is there an Alternative to Debezium + Kafka?</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Mon, 04 Nov 2024 01:28:50 +0000</pubDate>
      <link>https://dev.to/lazypro/is-there-an-alternative-to-debezium-kafka-1abk</link>
      <guid>https://dev.to/lazypro/is-there-an-alternative-to-debezium-kafka-1abk</guid>
      <description>&lt;p&gt;I asked this question on &lt;a href="https://www.reddit.com/r/dataengineering/comments/1fv186f/is_there_an_alternative_to_debezium_kafka/" rel="noopener noreferrer"&gt;Reddit&lt;/a&gt; a while back and received lots of valuable answers.&lt;/p&gt;

&lt;p&gt;Therefore, I've looked into each answer and documented the results in this article.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No, Debezium dominates the market at the moment, despite some drawbacks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background Explanation
&lt;/h2&gt;

&lt;p&gt;Why would we want to find an alternative to Debezium? The main reason is we encountered a challenging scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgb0ccm0vey4f1rl7jh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgb0ccm0vey4f1rl7jh2.png" alt="Image description" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a typical scenario for Debezium, where any modifications to the data source are captured and fed into Kafka for downstream processing.&lt;/p&gt;

&lt;p&gt;The advantage of this architecture is that it is simple and efficient, keeping all downstream processes as real-time as possible.&lt;/p&gt;

&lt;p&gt;If the source has a large number of updates, Debezium can scale horizontally, but only until a large volume of updates is concentrated in a single table. That is where Debezium hits its limit.&lt;/p&gt;

&lt;p&gt;Horizontal scaling means the updates originally handled by one process are distributed across multiple processes. But once each table already has a dedicated process, further scaling is no longer feasible.&lt;/p&gt;

&lt;p&gt;We are in exactly this situation: in our environment, even with the machine specification maxed out, the CDC throughput of a single table is capped at 25 MB/s.&lt;/p&gt;

&lt;p&gt;This is certainly not a typical case; after all, 25 MB/s of changes on a single table is significant. However, a data source performing a large-scale data migration can easily breach this limit.&lt;/p&gt;
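&lt;p&gt;To make this concrete, here is a toy Python model, not actual Debezium or Kafka Connect code: whole tables are assigned to tasks, so a hot table is always bounded by a single task's ceiling. The 25 MB/s cap is simply the figure we observed, used here as an assumption.&lt;/p&gt;

```python
# Toy model of connector scaling, NOT actual Debezium/Kafka Connect code.
# Assumption: a single task tops out at ~25 MB/s, the cap we measured.

PER_TASK_CAP_MB_S = 25.0

def assign_tables(tables, max_tasks):
    """Round-robin whole tables onto tasks; a table is never split, which is
    exactly why a hot table cannot benefit from more tasks."""
    n = min(max_tasks, len(tables))
    tasks = [[] for _ in range(n)]
    for i, table in enumerate(sorted(tables)):
        tasks[i % n].append(table)
    return tasks

def delivered_mb_s(load, assignment):
    """Throughput each task delivers: the sum of its tables' change rates,
    clipped at the per-task ceiling."""
    return [min(sum(load[t] for t in task), PER_TASK_CAP_MB_S)
            for task in assignment]
```

&lt;p&gt;With a load of 40 MB/s on one hot table and a few MB/s elsewhere, adding tasks leaves the hot table's task pinned at the 25 MB/s ceiling while the others idle.&lt;/p&gt;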

&lt;p&gt;To keep our downstream data pipeline real-time, we can only ask the upstream to be merciful with this level of data migration and to apply careful rate limiting.&lt;/p&gt;

&lt;p&gt;However, this limitation greatly reduces the productivity of upstream developers: they have to add an auditing process to their regular maintenance, and they need to implement additional rate limiting for each maintenance task.&lt;/p&gt;

&lt;p&gt;So let's find a solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Overview
&lt;/h2&gt;

&lt;p&gt;The following solutions were gathered from that Reddit article.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://estuary.dev/vs-debezium/" rel="noopener noreferrer"&gt;Estuary Flow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.striim.com/" rel="noopener noreferrer"&gt;Striim&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.fivetran.com/data-movement" rel="noopener noreferrer"&gt;Fivetran HVR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Proprietary CDC&lt;/li&gt;
&lt;li&gt;&lt;a href="https://conduit.io/" rel="noopener noreferrer"&gt;Conduit&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first three solutions are closed-source enterprise services, so they won't work for us. After all, we're trying to solve a specific use case, not do a complete overhaul.&lt;/p&gt;

&lt;p&gt;Although Estuary Flow claims to support local deployment, I couldn't find any information about it.&lt;/p&gt;

&lt;p&gt;The fourth option is developing a new tool ourselves, which I believe would be a fundamental solution to the problem. After all, Debezium is written in Java, and we should be able to achieve better performance with Golang, Rust, or even C/C++. However, the development cost is too high for us, and starting from scratch is difficult.&lt;/p&gt;

&lt;p&gt;The first four options didn’t meet our needs, but the fifth option caught my attention as a promising solution.&lt;/p&gt;

&lt;p&gt;Conduit is an open-source data migration platform written in Golang, providing a variety of connectors to integrate many data stores. In addition, we can develop our own converters for data format preprocessing.&lt;/p&gt;

&lt;p&gt;Therefore, I started to test the performance of Conduit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment Environment
&lt;/h2&gt;

&lt;p&gt;To keep things simple, I used Kafka Connect in place of Debezium. The two are essentially the same apart from their dispatchers; behind the scenes, both use the same &lt;a href="https://github.com/mongodb/mongo-kafka" rel="noopener noreferrer"&gt;library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqna2xi1sj6chpup8qmjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqna2xi1sj6chpup8qmjq.png" alt="Image description" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Locust is responsible for generating MongoDB changes, then Conduit and Kafka Connect will write to different Kafka topics.&lt;/p&gt;

&lt;p&gt;We can observe the write rate of each Kafka topic to determine which performs better.&lt;/p&gt;
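&lt;p&gt;As a minimal sketch of that comparison, the helper below ranks topics by average write rate given per-message payload sizes sampled over the same window; in practice the sizes would come from a Kafka consumer or the broker's metrics, which are assumptions outside this snippet.&lt;/p&gt;

```python
def compare_topics(samples, elapsed_s):
    """samples maps a topic name to the list of message payload sizes (bytes)
    observed during the same elapsed_s window; returns (topic, MB/s) pairs
    sorted fastest first."""
    rates = {topic: sum(sizes) / (1024 * 1024) / elapsed_s
             for topic, sizes in samples.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```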

&lt;p&gt;The whole experiment environment is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/wirelessr/82a642685d40d78a49a4cdb1ff1cfa9f" rel="noopener noreferrer"&gt;https://gist.github.com/wirelessr/82a642685d40d78a49a4cdb1ff1cfa9f&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I use two of my own packaged images, Conduit and Kafka Connect, which have MongoDB connectors.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/wirelessr/conduit-mongo-connect" rel="noopener noreferrer"&gt;Conduit + Mongo connector&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/wirelessr/mongodb-kafka-connect" rel="noopener noreferrer"&gt;Kafka Connect + Mongo connector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's easy to generate a large volume of changes by stuffing MongoDB with a bunch of fat documents and then updating the value of a single field across all of them.&lt;/p&gt;
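&lt;p&gt;For illustration, here is a hedged sketch of that seeding step: a pure helper that builds a fat document (the field counts and sizes are arbitrary choices of mine), followed by a commented-out &lt;code&gt;pymongo&lt;/code&gt; insert that assumes a local MongoDB and the same &lt;code&gt;test.test_new&lt;/code&gt; namespace the locust script uses.&lt;/p&gt;

```python
import random
import string

def make_fat_document(doc_id, n_fields=20, field_size=4096):
    """Build a 'fat' document: many large string fields plus a 'seq' counter
    that the load test increments to produce change events. Field count and
    size are illustrative defaults, not values from the experiment."""
    rnd = random.Random(doc_id)  # deterministic content per document
    doc = {"_id": doc_id, "seq": 0}
    for i in range(n_fields):
        doc[f"field_{i}"] = "".join(rnd.choices(string.ascii_letters, k=field_size))
    return doc

# Sketch of the seeding itself (assumes a local MongoDB):
# import pymongo
# coll = pymongo.MongoClient("mongodb://localhost:27017")["test"]["test_new"]
# coll.insert_many(make_fat_document(i) for i in range(10_000))
```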

&lt;p&gt;The locust script used is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;locust&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;faker&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt;

&lt;span class="n"&gt;fake&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Faker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MongoDBUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;wait_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_new&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nd"&gt;@task&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;incr_seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;exception&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;start_perf_counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;response_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_many&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;{},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$inc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matched_count&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;exception&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;request_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;incr seq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_perf_counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Load Test Result
&lt;/h2&gt;

&lt;p&gt;For this test, I used a local machine without fully stressing its CPU or memory, leaving some resources available to avoid errors from performance bottlenecks.&lt;/p&gt;

&lt;p&gt;In other words, this test shows the typical capacity of a single process handling a single table's load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqunqoq09bj4ir6msdp4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqunqoq09bj4ir6msdp4l.png" alt="image" width="528" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Conduit&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiaht1u2xwovksj95b70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiaht1u2xwovksj95b70.png" alt="image" width="528" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kafka Connect&lt;/p&gt;

&lt;p&gt;As the results show, Kafka Connect's throughput significantly outperforms Conduit’s when system resources are sufficient.&lt;/p&gt;

&lt;p&gt;I was a bit confused about this result, so I repeated the test a few times, but I got similar numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;Back to the question in the title.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is there an alternative to Debezium + Kafka?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not at the moment—at least, not among open-source tools.&lt;/p&gt;

&lt;p&gt;I've asked on Reddit, but maybe Dev.to will have a different answer, so feel free to offer your solutions.&lt;/p&gt;

</description>
      <category>data</category>
      <category>architecture</category>
      <category>eventdriven</category>
      <category>kafka</category>
    </item>
    <item>
      <title>Dockerize Local RAG with Models</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Wed, 18 Sep 2024 03:35:29 +0000</pubDate>
      <link>https://dev.to/lazypro/dockerize-local-rag-with-models-41al</link>
      <guid>https://dev.to/lazypro/dockerize-local-rag-with-models-41al</guid>
      <description>&lt;p&gt;Previously, I introduced &lt;a href="https://medium.com/@lazypro/%E5%81%9A%E4%B8%80%E5%80%8B%E5%8B%99%E5%AF%A6%E7%9A%84rag-pragmatic-rag-65fc63647c51" rel="noopener noreferrer"&gt;a generic RAG tamplate&lt;/a&gt;, in which I mentioned that there are three cores needed to make a high-quality RAG.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;embedding with semantic understanding&lt;/li&gt;
&lt;li&gt;an LLM with contextualized knowledge&lt;/li&gt;
&lt;li&gt;result compression via reranking&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When all of these are in place, a high quality RAG will be created, regardless of whether there is fine-tuning or not.&lt;/p&gt;

&lt;p&gt;Add high quality sources and accurate prompts, and you've got a complete RAG.&lt;/p&gt;

&lt;p&gt;Simple, right?&lt;/p&gt;

&lt;p&gt;Is it possible to containerize such a simple yet useful implementation and run it completely locally? Yes, of course.&lt;/p&gt;

&lt;p&gt;Let's take the three models mentioned in the previous template as an example.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Ollama&lt;/code&gt; plus &lt;code&gt;TAIDE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BGE-M3&lt;/code&gt; for embedding&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ms-marco-MultiBERT-L-12&lt;/code&gt; as the reranker&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Ollama with Models
&lt;/h2&gt;

&lt;p&gt;Ollama is a completely local LLM framework; you can pull down the LLM model you want with &lt;code&gt;ollama pull&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Ollama itself provides a basic container.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker pull ollama/ollama&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Nevertheless, there is no simple way to make this container ship with a model. So here's a little hack; let me demonstrate with a &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;ollama/ollama&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;taide_base&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;nohup &lt;/span&gt;bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"ollama serve &amp;amp;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sleep &lt;/span&gt;5 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ollama pull cwchang/llama3-taide-lx-8b-chat-alpha1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use Ollama's container directly, starting the ollama service during &lt;code&gt;docker build&lt;/code&gt; and downloading the model right then.&lt;/p&gt;

&lt;p&gt;This way we have an LLM framework with models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Packaging &lt;code&gt;BGE-M3&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;BGE-M3&lt;/code&gt; here is a model supplied by HuggingFace, so all we need to do is find the HuggingFace model cache directory and copy it into the container.&lt;/p&gt;

&lt;p&gt;In my environment (without modifying any settings), the model directory is at&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;~/.cache/huggingface/hub/models--BAAI-bge-m3&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Therefore, we only need to &lt;code&gt;COPY&lt;/code&gt; the contents of this directory into the container.&lt;/p&gt;

&lt;p&gt;However, it is important to note that HuggingFace requires &lt;code&gt;config.json&lt;/code&gt; when loading models, and this file is buried deep in the directory tree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;init_embeddings&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbeddings&lt;/span&gt;
    &lt;span class="n"&gt;HF_EMBEDDING_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HF_EMBEDDING_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;encode_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see from this code, we actually need to point the model path at the exact snapshot directory currently in use.&lt;/p&gt;
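&lt;p&gt;If hardcoding the snapshot hash feels fragile, a small helper can discover it at startup. This is only a sketch; &lt;code&gt;resolve_snapshot&lt;/code&gt; is a hypothetical function, not part of any library:&lt;/p&gt;

```python
from pathlib import Path

def resolve_snapshot(model_dir):
    """Return the path of a snapshot directory under a HuggingFace
    cache-style model directory (models--ORG--NAME/snapshots/HASH),
    so the hash need not be hardcoded."""
    snapshots = sorted(Path(model_dir, "snapshots").iterdir())
    if not snapshots:
        raise FileNotFoundError("no snapshots under " + str(model_dir))
    # If several snapshots exist, this simply takes the last one in
    # sorted order; pick your own policy if you keep multiple revisions.
    return str(snapshots[-1])
```

The returned path can then be passed as &lt;code&gt;model_name&lt;/code&gt; instead of a hardcoded snapshot string.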

&lt;p&gt;That leaves the last model: the reranker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Packaging &lt;code&gt;ms-marco-MultiBERT-L-12&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ms-marco-MultiBERT-L-12&lt;/code&gt; used here is integrated through LangChain. By default, LangChain's &lt;code&gt;document_compressors&lt;/code&gt; places the model in &lt;code&gt;/tmp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In other words, when we run the following code, it downloads the model into &lt;code&gt;/tmp&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.retrievers.document_compressors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FlashrankRerank&lt;/span&gt;
&lt;span class="n"&gt;compressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FlashrankRerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ms-marco-MultiBERT-L-12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So what we need to do is copy &lt;code&gt;/tmp/ms-marco-MultiBERT-L-12&lt;/code&gt; into the container.&lt;/p&gt;

&lt;p&gt;But that's not enough; we also need to tell the client explicitly that the model directory has changed to the container's current directory. This is easier to show than to explain, so let's look at an example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flashrank&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Ranker&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.retrievers.document_compressors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FlashrankRerank&lt;/span&gt;

&lt;span class="n"&gt;ranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ranker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ms-marco-MultiBERT-L-12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FlashrankRerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ranker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All right, we've got the three models we need in the container.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although this article provides a containerized RAG solution, I have to say that the container image is 18 GB.&lt;/p&gt;

&lt;p&gt;If we were to package it with the embedded vectors from the source, it would easily exceed 20 GB.&lt;/p&gt;

&lt;p&gt;Therefore, this container is only suitable for simple testing; it is not really capable of scaling, so use it with care.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>docker</category>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>Turn Based Multiplayer Beer Game</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Wed, 11 Sep 2024 06:00:20 +0000</pubDate>
      <link>https://dev.to/lazypro/turn-based-multiplayer-beer-game-3n5k</link>
      <guid>https://dev.to/lazypro/turn-based-multiplayer-beer-game-3n5k</guid>
      <description>&lt;p&gt;Since I need to organize a systems thinking workshop in the near future, I need a beer game to start it off.&lt;/p&gt;

&lt;p&gt;The beer game itself consists of four characters: Retailer, Wholesaler, Distributor and Factory. Through the time-delay nature of logistics, players gain a systems perspective and a better understanding of system boundaries.&lt;/p&gt;

&lt;p&gt;As this is a workshop of only a few hours, I want this beer game to fulfill the following requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's a multiplayer game&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The beer game itself will have many participants playing various roles in the supply chain, but I'd like to be able to have multiple supply chains competing at the same time to see who scores higher. Thus, we can learn about their system strategies at the same time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The game host should be able to see everyone's status&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since there are multiple teams competing at the same time, as a host I need to be able to see how each team is progressing and scoring at the moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The game flow has to be simple and easy to control the pace&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As I said at the beginning, this is a short workshop, so I need to get everyone up to speed quickly and I need to be able to control the details of each round.&lt;/p&gt;

&lt;p&gt;Moreover, a timer appears in the player's UI at the beginning of each round, advancing the game pace by counting down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be able to customize the characters&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A classic beer game consists of four characters, but the more characters there are, the longer the game takes. So, to keep the pace manageable, I find it better to have three characters.&lt;/p&gt;

&lt;p&gt;After searching around, I found that neither open-source projects nor projects already online satisfy these requirements perfectly. So, I'd better make one myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beer Game Project
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/wirelessr/beer_game" rel="noopener noreferrer"&gt;https://github.com/wirelessr/beer_game&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02szseho8xv998svyvtp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02szseho8xv998svyvtp.png" alt="Host UI" width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Host UI&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjbvnrw2sygb7uhbck7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjbvnrw2sygb7uhbck7.png" alt="Player UI" width="800" height="477"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Player UI&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The entire project was developed in a business-driven way and is tested with over 90% coverage, so please feel free to use it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Preparedness
&lt;/h2&gt;

&lt;p&gt;Create a file for secrets in the project folder. You can see it being copied in the &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.streamlit/secrets.toml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mongo]&lt;/span&gt;
&lt;span class="py"&gt;uri&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;your mongo connection&amp;gt;"&lt;/span&gt;

&lt;span class="nn"&gt;[admin]&lt;/span&gt;
&lt;span class="py"&gt;key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;your admin key&amp;gt;"&lt;/span&gt;

&lt;span class="nn"&gt;[player]&lt;/span&gt;
&lt;span class="py"&gt;key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;your player key&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since this project is using MongoDB, you have to fill in the link with your account password. In addition, &lt;code&gt;admin.key&lt;/code&gt; and &lt;code&gt;player.key&lt;/code&gt; correspond to the key fields on the UI.&lt;/p&gt;

&lt;p&gt;After all, I'm uploading the app to the public cloud, so I still need a basic authentication mechanism. If you're running locally only and find authentication troublesome, you can remove the corresponding source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Use
&lt;/h2&gt;

&lt;p&gt;This project has a &lt;code&gt;Dockerfile&lt;/code&gt; attached, so it can be run directly with &lt;code&gt;docker&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; beer_game &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; beer &lt;span class="nt"&gt;-p&lt;/span&gt; 8501:8501 beer_game
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For development, in addition to &lt;code&gt;requirements.txt&lt;/code&gt;, you should also install &lt;code&gt;requirements-test.txt&lt;/code&gt;, which is needed to run the unit tests. Then you can run all the unit tests through the &lt;code&gt;Makefile&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requiremnts.txt
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements-test.txt
make &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Game Flow
&lt;/h2&gt;

&lt;p&gt;The whole game is divided into a host mode and a participant mode, which correspond to the options in the top corner of the UI.&lt;/p&gt;

&lt;p&gt;The host first assigns a &lt;code&gt;game_id&lt;/code&gt; to create the game, and all participants fill in this id in the &lt;code&gt;player_game&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;All players on the same supply chain need to use the same &lt;code&gt;player_id&lt;/code&gt;, so this id is also known as the supply chain ID, and participants with the same &lt;code&gt;player_id&lt;/code&gt; are separated into roles by &lt;code&gt;player_role&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can see the status on the host's screen when a participant joins.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vhn2c873oeokcd2l317.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vhn2c873oeokcd2l317.png" alt="image" width="308" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's look at what a full iteration would look like from the host's point of view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzjvcnwhpaawnva7t0jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzjvcnwhpaawnva7t0jz.png" alt="image" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All the components that need to be manipulated are in this picture, and each turn starts by pressing the &lt;code&gt;Refresh&lt;/code&gt; button and ends by pressing &lt;code&gt;Next Week&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As for how many orders to send to all the supply chains in this round, they will be triggered by &lt;code&gt;Place Order&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's worth mentioning that &lt;code&gt;Place Order&lt;/code&gt; itself is idempotent, so it's fine to change the number and press it again; the last number will be used. The &lt;code&gt;Place Order&lt;/code&gt; on each participant's interface is idempotent as well.&lt;/p&gt;
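&lt;p&gt;Conceptually, the idempotency just means the order book is keyed by (game, player, week), so a repeated submission overwrites rather than accumulates. Here is a minimal in-memory sketch; it is hypothetical and not the project's actual MongoDB code:&lt;/p&gt;

```python
def place_order(orders, game_id, player_id, week, amount):
    """Idempotent Place Order: the order book is keyed on
    (game, player, week), so pressing the button again with a
    new number simply overwrites the previous one."""
    orders[(game_id, player_id, week)] = amount
```

With a real database this would be an upsert on the same compound key.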

&lt;p&gt;Once the host has placed the order, the shop player can take the order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgxf6kgzyq4lwge1jnxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgxf6kgzyq4lwge1jnxd.png" alt="image" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, each role in the supply chain starts with &lt;code&gt;Refresh&lt;/code&gt; and ends with &lt;code&gt;Place Order&lt;/code&gt;, with the shop player taking the action followed by the retailer player, and so on.&lt;/p&gt;

&lt;p&gt;Finally, back to the host, who can press &lt;code&gt;Refresh&lt;/code&gt; again to see all the statuses for the round, and &lt;code&gt;Next Week&lt;/code&gt; to end the round.&lt;/p&gt;

&lt;h2&gt;
  
  
  Game Detail
&lt;/h2&gt;

&lt;p&gt;There are a few things actually done during &lt;code&gt;Refresh&lt;/code&gt;. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It refills inventory from downstream based on orders placed four weeks ago.&lt;/li&gt;
&lt;li&gt;It receives orders from upstream.&lt;/li&gt;
&lt;li&gt;It decides how much to sell based on the inventory available.&lt;/li&gt;
&lt;/ol&gt;
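&lt;p&gt;The three steps could be sketched as follows. This is a hypothetical simplification to illustrate the flow, not the project's actual code:&lt;/p&gt;

```python
def refresh(state, incoming_orders, arriving_shipment):
    """Sketch of one role's Refresh step in the beer game."""
    # 1. Refill inventory from downstream (orders placed four weeks ago).
    state["inventory"] += arriving_shipment
    # 2. Receive new orders from upstream.
    state["backlog"] += incoming_orders
    # 3. Sell as much as the current inventory allows.
    sold = min(state["inventory"], state["backlog"])
    state["inventory"] -= sold
    state["backlog"] -= sold
    return sold
```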

&lt;p&gt;Since &lt;code&gt;Place Order&lt;/code&gt; is idempotent, &lt;code&gt;Refresh&lt;/code&gt; itself is idempotent too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future work
&lt;/h2&gt;

&lt;p&gt;It basically meets all of my needs now, but there are some enhancements that could be made.&lt;/p&gt;

&lt;p&gt;For example, although the host can see the status of all the participants, it would be helpful to have a graph to show the change of inventory and cost information over time, which would be useful for reviewing the game after it is over.&lt;/p&gt;

&lt;p&gt;There's also a more basic problem: the current UI has no test coverage at all, mainly because the current game flow is quite simple. A few clicks on the UI cover the entire flow, so I don't rely much on automated testing. However, verifying any UI modification is still a bit tedious, so it would be better to have UI unit tests.&lt;/p&gt;

&lt;p&gt;Overall, these items are optimizations; their absence does not affect the functionality.&lt;/p&gt;

&lt;p&gt;If you have additional ideas, feel free to submit a pull request; contributions are welcome.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>python</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>When to Consider Design Patterns</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Wed, 07 Aug 2024 01:59:37 +0000</pubDate>
      <link>https://dev.to/lazypro/when-to-consider-design-patterns-45np</link>
      <guid>https://dev.to/lazypro/when-to-consider-design-patterns-45np</guid>
      <description>&lt;p&gt;This week, at the study group, our engineer asked a question: Should we follow the design pattern to design and implement the project from the beginning?&lt;/p&gt;

&lt;p&gt;The design pattern mentioned here refers to GoF's &lt;code&gt;Design Patterns&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87zybek8ap6oahixjrr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87zybek8ap6oahixjrr8.png" alt="Image description" width="502" height="634"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meanwhile, there is a related piece of context which suggests that our code should follow the &lt;code&gt;SOLID&lt;/code&gt; principles as much as possible, especially the first S (single responsibility) and the second O (open-closed).&lt;/p&gt;

&lt;p&gt;Since the SOLID principles should be followed as much as possible, shouldn't design patterns also be taken into consideration from the very beginning?&lt;/p&gt;

&lt;p&gt;No, not really.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are design patterns?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The design patterns we often talk about actually refer to those solutions defined by GoF, which are designed to solve the problems often faced in object-oriented programming.&lt;/p&gt;

&lt;p&gt;When we want the code to be easy to maintain and easy to modify, i.e., the open and closed principle, we abstract the problems through various encapsulation techniques, and eventually become those interesting design patterns.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Did you notice two key points in this paragraph?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OOP-specific problems&lt;/li&gt;
&lt;li&gt;Ease of maintenance and modification&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your project has just started and there is still a question mark over whether it will be successful or not, why do you need to think about the flexibility in the long term?&lt;/p&gt;

&lt;p&gt;If your project requirements are fixed, why do you need to worry about maintenance and modification?&lt;/p&gt;

&lt;p&gt;If you're one of my regular readers, you'll remember we've talked about a similar question before.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://lazypro.medium.com/when-to-consider-clean-architecture-a-practical-guide-6a21fccb5e6b" rel="noopener noreferrer"&gt;When to consider clean architecture?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even earlier on, we have talked about another similar issue.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://interviewnoodle.com/do-you-really-need-a-microservice-91a48cbea8c1" rel="noopener noreferrer"&gt;Do you really need a microservice?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most of the time, I'm on the side of "don't over-engineer".&lt;/p&gt;

&lt;p&gt;Whether it's design patterns, clean architecture, or even microservice architecture, they are all means to solve problems, but first, you have to encounter the problem. If you have a clear problem, find the right solution, and don't do it just for the sake of doing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What exactly is a Pattern?
&lt;/h2&gt;

&lt;p&gt;By the way, what is a pattern?&lt;/p&gt;

&lt;p&gt;As I mentioned at the beginning, the design patterns we often talk about actually refer to the solutions listed in the GoF book.&lt;/p&gt;

&lt;p&gt;A pattern is like a "symptom". When you catch a cold over and over again, you learn that you need to drink more water and get more rest; the same is true when we program. When we see a certain problem over and over again, we naturally develop a corresponding solution, but is that solution the most effective one?&lt;/p&gt;

&lt;p&gt;The design pattern is the medicine for that symptom. When we come across a symptom, we look up the corresponding medicine to solve the problem. By the same token, you shouldn't take medicine without thinking about it, right? It's unhealthy to take medicine when you have no symptoms.&lt;/p&gt;

&lt;p&gt;In fact, patterns are everywhere.&lt;/p&gt;

&lt;p&gt;Object-oriented programming has its patterns, software architecture has its patterns, and even system architecture has its patterns.&lt;/p&gt;

&lt;p&gt;For example, the following book should be familiar to you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foam6sbspndaxh7qmjqpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foam6sbspndaxh7qmjqpp.png" alt="Image description" width="500" height="627"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This book may feel a little unfamiliar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmn4yy2opwdjn9q70jqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmn4yy2opwdjn9q70jqp.png" alt="Image description" width="250" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a &lt;a href="https://en.wikipedia.org/wiki/Pattern-Oriented_Software_Architecture" rel="noopener noreferrer"&gt;series of books&lt;/a&gt; from Volume 1 to Volume 5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ny3oxg7empkbvws1ac5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ny3oxg7empkbvws1ac5.png" alt="Image description" width="770" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These books are all about the patterns encountered in various software development contexts, in short, a list of medicines. &lt;/p&gt;

&lt;p&gt;If we're sick, we can get something out of them, but if we've never been sick, we won't even know what they're talking about, let alone when to consider the patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Software development is a pragmatic process.&lt;/p&gt;

&lt;p&gt;What we are doing is always encountering problems, thinking about them, and finally solving them.&lt;/p&gt;

&lt;p&gt;"This may be needed in the future, so I'm ready for it." I've heard this phrase a lot, but in reality, most of these pre-prepared things don't work. Because requirements are always changing.&lt;/p&gt;

&lt;p&gt;But if we find requirements have changed and it's hard for us to implement them, then it's not too late to consider what medicine we need to take. This is refactoring.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>tutorial</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Paradigm Shift in Software Development, Part 2</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Wed, 31 Jul 2024 07:08:33 +0000</pubDate>
      <link>https://dev.to/lazypro/paradigm-shift-in-software-development-part-2-1594</link>
      <guid>https://dev.to/lazypro/paradigm-shift-in-software-development-part-2-1594</guid>
      <description>&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@lazypro/a-paradigm-shift-in-software-development-leveraging-genai-for-direct-business-logic-processing-9929a724cc1c" rel="noopener noreferrer"&gt;Paradigm Shift in Software Development, Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Paradigm Shift in Software Development, Part 2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Last time, we mentioned that GenAI can be used to implement business logic and dramatically increase development productivity and reduce the effort of debugging.&lt;/p&gt;

&lt;p&gt;However, I have to say that GenAI is not suitable for applications that require accurate computation. In other words, the demonstration in the previous article was just a demo and not a recommendation to use GenAI for calculating promotions.&lt;/p&gt;

&lt;p&gt;But that doesn't mean using GenAI for business logic is a bad idea. On the contrary, GenAI is really suitable for replacing "some" business logic.&lt;/p&gt;

&lt;p&gt;Maybe you will ask: most business logic involves calculation and requires accuracy, so what exactly can GenAI help with? Well, I'll tell you: it can call the APIs that have already been coded.&lt;/p&gt;

&lt;p&gt;The correct way to use GenAI to handle business logic is to encapsulate that business logic directly into a Remote Procedure Call (RPC) and let GenAI prepare the parameters needed for the RPC and call the corresponding RPC correctly.&lt;/p&gt;

&lt;p&gt;Currently, the RPC style that is easiest for GenAI to handle is the REST API, for the following reasons. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Plain-text style: the URI, HTTP headers, HTTP method, query parameters and request payload are all plain text.&lt;/li&gt;
&lt;li&gt;Full specification support. Nowadays, the most straightforward way to describe a REST API is Swagger, and the Swagger description file is also plain text.&lt;/li&gt;
&lt;li&gt;REST is a mature choice with relatively few dependencies; in contrast, protocols such as gRPC require installing additional drivers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Therefore, it's pretty clear what we're going to do, so let's go through the steps one by one. We will again use Gemini for the demonstration, as in the previous article, but once again, you can use whatever you are familiar with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimental environment introduction
&lt;/h2&gt;

&lt;p&gt;I have prepared a web service with basic CRUD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/wirelessr/genai_api_calling" rel="noopener noreferrer"&gt;https://github.com/wirelessr/genai_api_calling&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just &lt;code&gt;docker compose up&lt;/code&gt; to get the service up.&lt;/p&gt;

&lt;p&gt;This service is a product microservice that handles CRUD for a single product and saves the results in a database. For ease of use, I'm using Redis as the database for this example.&lt;/p&gt;

&lt;p&gt;This service has two entry points.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="http://localhost:50000" rel="noopener noreferrer"&gt;http://localhost:50000&lt;/a&gt; is the home page of the microservice, with a simple list page and blocks for adding and modifying products.&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://localhost:50000/apidocs" rel="noopener noreferrer"&gt;http://localhost:50000/apidocs&lt;/a&gt; is the swagger home page, which lists all the API descriptions and specifications.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then prepare some basic test data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:50000/api/product"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"accept: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/x-www-form-urlencoded"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"id=123&amp;amp;name=Apple&amp;amp;description=Fruit"&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:50000/api/product"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"accept: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/x-www-form-urlencoded"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"id=234&amp;amp;name=Bird&amp;amp;description=Animal"&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:50000/api/product"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"accept: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/x-www-form-urlencoded"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"id=345&amp;amp;name=Cat&amp;amp;description=Animal"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is worth noting that creating a new product and modifying a product are actually the same API: &lt;code&gt;POST /api/product&lt;/code&gt;. When the product ID doesn't exist, the API will create a new product; on the other hand, if the product ID exists, then it will modify the product.&lt;/p&gt;
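&lt;p&gt;The upsert behavior can be illustrated with a minimal in-memory sketch. This is hypothetical server-side pseudologic; the real service stores products in Redis:&lt;/p&gt;

```python
def upsert_product(db, product_id, name, description):
    """Sketch of what POST /api/product does: create the product
    when the id is new, update it when the id already exists."""
    created = product_id not in db
    db[product_id] = {"name": name, "description": description}
    return "created" if created else "updated"
```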

&lt;p&gt;Remember this, because it's relevant to the business logic we're trying to implement.&lt;/p&gt;

&lt;p&gt;The following examples are actually listed in &lt;a href="https://github.com/wirelessr/genai_api_calling/blob/main/rest_api_calling.ipynb" rel="noopener noreferrer"&gt;rest_api_calling.ipynb&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  GenAI calls RPC
&lt;/h2&gt;

&lt;p&gt;First, we need to enable GenAI to call RPCs, which is called function calling, and both OpenAI's ChatGPT and Google's Gemini have similar capabilities.&lt;/p&gt;

&lt;p&gt;Reference links are listed here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;ChatGPT function calling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Gemini function calling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following is an example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_website_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the content from a specific URL similar to curl -X GET.

  Args:
      url (str, required): The target URL is either remote or local.

  Returns:
      str: The raw content from a specific URL.
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;post_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
  Send a request similar to curl -X POST.

  Args:
      url (str, required): Target URL.
      data (str, required): The data of the form in &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id=value1&amp;amp;name=value2&amp;amp;description=value3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; format.

  Returns:
      dict: Dictionary containing the status code and response content.
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/x-www-form-urlencoded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;response_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;response_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;system_instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;You are a professional web crawler and familiar with swagger usage.
You can get the content you want by yourself through web api, 
and when you utilize web api, you will actively list which api is used.
If you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know the answer to a question, just answer no, don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t make up an answer.
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-pro&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                              &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_website_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_request&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                              &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have designed two tools for Gemini, &lt;code&gt;get_website_content&lt;/code&gt; and &lt;code&gt;post_request&lt;/code&gt;. The docstrings and type annotations must be written in detail so that GenAI can figure out how to use these tools.&lt;/p&gt;

&lt;p&gt;Now GenAI can use &lt;code&gt;GET&lt;/code&gt; and &lt;code&gt;POST&lt;/code&gt;.&lt;/p&gt;
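
&lt;p&gt;Under the hood, function calling boils down to a dispatch step: the model emits a function name with arguments, and the application invokes the matching callable. A minimal sketch with stubbed-out tools (the SDK's automatic mode does this for us):&lt;/p&gt;

```python
# Minimal sketch of the dispatch step behind function calling: the model
# returns a function name plus arguments, and the application invokes the
# matching Python callable. The tool bodies are stubs for illustration.
def get_website_content(url: str) -> str:
    return f"contents of {url}"  # stub standing in for requests.get

def post_request(url: str, data: str) -> dict:
    return {"status_code": 200, "content": data}  # stub

TOOLS = {fn.__name__: fn for fn in (get_website_content, post_request)}

def dispatch(name: str, args: dict):
    """Invoke the tool the model asked for with the arguments it supplied."""
    return TOOLS[name](**args)

# A hypothetical function call as the model might emit it:
result = dispatch("get_website_content",
                  {"url": "http://localhost:50000/apispec_1.json"})
print(result)
```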

&lt;h2&gt;
  
  
  GenAI understands RPC specifications
&lt;/h2&gt;

&lt;p&gt;Once GenAI is able to invoke RPCs, we then need to teach it "all" of the RPC specifications.&lt;/p&gt;

&lt;p&gt;In the case of REST APIs, the simplest way is to use a Swagger (OpenAPI) document to describe all the APIs, including their inputs and outputs as well as their functionality. The more detailed the information, the more correctly GenAI can operate without a lot of effort spent tuning prompts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chat_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;enable_automatic_function_calling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I have a website and this website provides full swagger: http://localhost:50000/apispec_1.json Please tell me the features of this website.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;This website provides four API endpoints:

- Create or update a product: You can send a POST request to `/api/product` with product ID, name, and description to create or update a product.
- Delete a product: You can send a POST request to `/api/product/delete/{product_id}` to delete a product by its ID.
- Get a product: You can send a GET request to `/api/product/{product_id}` to retrieve the details of a product by its ID.
- Get all products: You can send a GET request to `/api/products` to get a list of all products. &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a transcript I saved from an earlier conversation with Gemini; it can be reused as the history for the prompt.&lt;/p&gt;

&lt;p&gt;You can also have a few additional exchanges to verify that Gemini has really learned it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a new product by your own thought&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Business Logic
&lt;/h2&gt;

&lt;p&gt;Now that we've made sure GenAI knows the basics, we can have it implement business logic on top of them.&lt;/p&gt;

&lt;p&gt;For example, the current &lt;code&gt;POST /api/product&lt;/code&gt; is a combination of creation and modification, with the &lt;code&gt;id&lt;/code&gt; determining whether to create or modify. The &lt;code&gt;id&lt;/code&gt; is filled in by the client itself, so it is very likely to be wrong.&lt;/p&gt;

&lt;p&gt;Then we can ask GenAI to find an unused &lt;code&gt;id&lt;/code&gt;, based on the result of the list API, and use it for the creation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Avoid the existing ids and create a new product.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this way, the client does not need to fill in the &lt;code&gt;id&lt;/code&gt; itself; GenAI is responsible for generating it.&lt;/p&gt;
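
&lt;p&gt;The id-selection step we delegate to GenAI is itself simple, deterministic logic. A sketch, assuming the list API returns products with an &lt;code&gt;id&lt;/code&gt; key:&lt;/p&gt;

```python
# Sketch of the id-selection logic we delegate to GenAI: given the result
# of the list API, pick an id that no existing product uses.
# (The response shape with an "id" key is an assumption about the API.)
def next_unused_id(existing_products: list) -> int:
    used = {p["id"] for p in existing_products}
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate

print(next_unused_id([{"id": 1}, {"id": 2}, {"id": 345}]))  # 3
```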

&lt;p&gt;Here is another use case: right now the &lt;code&gt;description&lt;/code&gt; is free-form, so it can be written any way we want. But we can have GenAI enforce a template so that all creations and modifications follow a fixed format, such as the following.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When creating and modifying products, the description must conform to the following format.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Category: str, the category of the product.&lt;/li&gt;
&lt;li&gt;Price: int, the price of the product.&lt;/li&gt;
&lt;li&gt;Notes: str, additional information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The implementation of this business logic doesn't involve computation; it's just a matter of letting GenAI call a known API to accomplish a specific goal.&lt;/p&gt;
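
&lt;p&gt;A template like this can also be checked mechanically on our side. A sketch, assuming one "Key: value" pair per line, which is one plausible reading of the format:&lt;/p&gt;

```python
# Sketch of validating a description against the template above.
# Assumes one "Key: value" pair per line; the real prompt leaves the
# exact layout to GenAI.
def parse_description(description: str) -> dict:
    fields = {}
    for line in description.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    fields["Price"] = int(fields["Price"])  # Price is declared as int
    return fields

desc = "Category: Animal\nPrice: 100\nNotes: A lazy cat"
print(parse_description(desc))
```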

&lt;p&gt;You can use your imagination to make more variations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the previous article we mentioned that GenAI can be used to implement business logic, and this is true. But in practice, we still need to do some development work to make GenAI able to implement business logic accurately.&lt;/p&gt;

&lt;p&gt;GenAI may be good at business logic, but it is not accurate at computation. To maximize GenAI's strengths, we need it to do as little computation as possible. By encapsulating the business logic that requires computation and letting GenAI execute it exactly according to the instructions we provide, we can maximize productivity.&lt;/p&gt;

&lt;p&gt;Why do we use GenAI to implement business logic?&lt;/p&gt;

&lt;p&gt;Let's go back and answer this fundamental question. Because GenAI has the ability to understand natural language and execute our predefined scripts or steps, implementing business logic with GenAI becomes software development in natural language.&lt;/p&gt;

&lt;p&gt;Of course, GenAI is not perfect; there are a few core elements to get right in the development process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Integration must be done properly. Although the examples I have provided are all working scripts, actually putting GenAI into a production environment requires a lot of infrastructure: for example, LLM caching, vector databases, and a model repository. Each of these components is a new tech stack for organizations that have never introduced GenAI before.&lt;/li&gt;
&lt;li&gt;Prompts must be good. Although we are developing in natural language, GenAI can easily "learn the wrong way" if we are not precise enough. Moreover, GenAI may perform normally but crash when it encounters a specific pattern, which will be very difficult to debug.&lt;/li&gt;
&lt;li&gt;Testing must be done right. We have already encapsulated business logic into RPC for GenAI, but we still need to make sure that GenAI works properly in all kinds of scenarios, so we must have a high coverage of test cases.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As we can see, although software development with GenAI may seem attractive, there are many challenges that must be overcome. If we want to become a master of prompting, we still have to be a developer first.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>programming</category>
      <category>ai</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>A Paradigm Shift in Software Development: Leveraging GenAI for Direct Business Logic Processing</title>
      <dc:creator>ChunTing Wu</dc:creator>
      <pubDate>Wed, 17 Jul 2024 01:49:15 +0000</pubDate>
      <link>https://dev.to/lazypro/a-paradigm-shift-in-software-development-leveraging-genai-for-direct-business-logic-processing-16h0</link>
      <guid>https://dev.to/lazypro/a-paradigm-shift-in-software-development-leveraging-genai-for-direct-business-logic-processing-16h0</guid>
      <description>&lt;h1&gt;
  
  
  A Paradigm Shift in Software Development: Leveraging GenAI for Direct Business Logic Processing
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Generative AI transforms the way we handle business logic&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you read the title and thought I was going to introduce Copilot, you are wrong.&lt;/p&gt;

&lt;p&gt;Before we start the topic, let's start with a case study of an e-commerce platform.&lt;/p&gt;

&lt;p&gt;Suppose the shopping cart looks like the following.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;29.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;789&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The fields are simple: each entry contains the item purchased, the quantity purchased, the unit price, and the category it belongs to.&lt;/p&gt;

&lt;p&gt;Suppose I have three promotions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;$5 off a $20 purchase, which continues to accrue after qualifying.&lt;/li&gt;
&lt;li&gt;Buy 2 get 1 free on category_id 1 items.&lt;/li&gt;
&lt;li&gt;30% off the total price of category_id 3 items.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Given a cart, we want to answer two questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the total price after calculation?&lt;/li&gt;
&lt;li&gt;How much of the discount is allocated to each item?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To implement such promotions, please answer the following questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How long would it take you to write this logic?&lt;/li&gt;
&lt;li&gt;Can you make the logic better than O(n^2)?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first question is easy to understand, but what does the second question mean?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flte76lwkkou95fgoe80x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flte76lwkkou95fgoe80x.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have three promotions, and to be able to determine the impact of every promotion we need to scan the entire shopping cart for every item. So, in the example above, that's &lt;code&gt;3 * 3 = 9&lt;/code&gt;, i.e. O(n^2).&lt;/p&gt;
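
&lt;p&gt;To make the cost concrete, here is a naive implementation of the three promotions: every promotion scans the whole cart. Applying the category promotions before the $5-per-$20 discount is an assumed ordering, since the rules don't specify one:&lt;/p&gt;

```python
# Naive implementation of the three promotions: every promotion scans the
# whole cart, which is where the O(n^2) behavior comes from.
# Applying category promotions before the $5-per-$20 discount is an
# assumed ordering; the article leaves it unspecified.
cart = [
    {"product_id": 123, "amount": 2, "price": 10.99, "category_id": 1},
    {"product_id": 456, "amount": 1, "price": 29.99, "category_id": 2},
    {"product_id": 789, "amount": 5, "price": 1.99, "category_id": 3},
]

def total_after_promotions(cart: list) -> float:
    subtotal = sum(item["amount"] * item["price"] for item in cart)
    discount = 0.0
    for item in cart:                 # promotion 2: buy 2 get 1 free, cat 1
        if item["category_id"] == 1:
            discount += (item["amount"] // 3) * item["price"]
    for item in cart:                 # promotion 3: 30% off category 3
        if item["category_id"] == 3:
            discount += 0.3 * item["amount"] * item["price"]
    after = subtotal - discount       # promotion 1: $5 off per $20, accruing
    return after - (after // 20) * 5

print(total_after_promotions(cart))
```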

&lt;h3&gt;
  
  
  Then
&lt;/h3&gt;

&lt;p&gt;What if I told you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I could do it in just a few minutes.&lt;/li&gt;
&lt;li&gt;And it's O(1).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would you believe me?&lt;/p&gt;

&lt;p&gt;Guess how I did it.&lt;/p&gt;

&lt;h3&gt;
  
  
  GenAI can help
&lt;/h3&gt;

&lt;p&gt;Although GenAI was mentioned, if you thought I was going to introduce Copilot or similar tools, you are very wrong.&lt;/p&gt;

&lt;p&gt;It's true that those code generation tools can produce business logic in a matter of minutes, but the business logic they produce will still work the way we think, which means it will still be O(n^2).&lt;/p&gt;

&lt;p&gt;So what do we do with GenAI? The answer is simple: let GenAI learn business logic and then answer the results directly.&lt;/p&gt;

&lt;p&gt;Sounds unbelievable, right? Let's see how I did it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221l57m9Y-lzOoiVLVSej8RSPlquod_QnFl%22%5D,%22action%22:%22open%22,%22userId%22:%22112447852159053274838%22,%22resourceKeys%22:%7B%7D%7D&amp;amp;usp=sharing" rel="noopener noreferrer"&gt;GEMINI DEMO LINK&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even though I'm using Gemini as an example, you can actually use any model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1
&lt;/h3&gt;

&lt;p&gt;First, I'll tell Gemini its role using &lt;code&gt;System Instructions&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are an e-commerce expert who is well versed in all kinds of promotions and understands how shopping cart profits are calculated.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 2
&lt;/h3&gt;

&lt;p&gt;Next, ask Gemini to explain the structure of a shopping cart that I paste in. It's important to ask it to explain this: instead of telling it what the structure is, it's better to let it work things out for itself so that it builds a more accurate mental model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let's describe a shopping cart in JSON, here's an example.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 3
&lt;/h3&gt;

&lt;p&gt;Tell Gemini what it needs to know about the promotions, and explain in detail what we need. This echoes the questions at the beginning of this article. The point of this step, by the way, is not just to explain the promotions, but also to tell Gemini what results to send back.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4
&lt;/h3&gt;

&lt;p&gt;Based on Gemini's thought process, we have to keep correcting it until its understanding and calculations are correct. Fortunately, GenAI doesn't hide anything: it tells us step by step what it's thinking, so it's easy to spot mistakes along the way. I have to say, it's much easier to debug natural language than a programming language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5
&lt;/h3&gt;

&lt;p&gt;Ask Gemini to generate a response structure that corresponds to the requirements in step 3, which is why I said we should tell it what we want as early as possible. If we don't, we may need to go back and adjust its thought process at this step, which would be very inefficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final step
&lt;/h3&gt;

&lt;p&gt;Because GenAI will still keep "describing" its answer, we have to tell it, "I don't want to see the process, I just want to see the result, and I don't want any description."&lt;/p&gt;

&lt;p&gt;Finally, the business logic is complete.&lt;/p&gt;

&lt;p&gt;Wait, that's a little weird. Interacting with GenAI by hand like this is not how we normally handle business logic.&lt;/p&gt;

&lt;p&gt;Yes, that's right! Everything from the first step to the last is a pre-defined "prompt", and we can get the result by wrapping all of these prompts, together with the business logic inputs, into a single question for GenAI.&lt;/p&gt;

&lt;p&gt;In fact, it looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/wirelessr/85ed1e1616513a4fcd4bdc3ad5f7874b" rel="noopener noreferrer"&gt;https://gist.github.com/wirelessr/85ed1e1616513a4fcd4bdc3ad5f7874b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;INSERT_INPUT_HERE&lt;/code&gt; placeholder is where the original shopping cart structure for the promotion calculation is filled in.&lt;/p&gt;
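
&lt;p&gt;The wrapping step looks roughly like this. The &lt;code&gt;PROMPT_STEPS&lt;/code&gt; list paraphrases the steps above and the message shape is a simplified assumption; the real prompts live in the linked gist, and the exact format depends on the SDK:&lt;/p&gt;

```python
import json

# Sketch of wrapping the pre-defined prompts and the live cart into one
# request. PROMPT_STEPS paraphrases the steps above; the real prompts live
# in the linked gist, and the message format depends on the SDK in use.
PROMPT_STEPS = [
    "Let's describe a shopping cart in JSON, here's an example. ...",
    "Here are the three promotions and the response format I want. ...",
    "I don't want to see the process, just the result, with no description.",
]

def build_request(cart: list) -> list:
    messages = [{"role": "user", "parts": [step]} for step in PROMPT_STEPS]
    # INSERT_INPUT_HERE: the shopping cart is substituted in as the final part
    messages.append({"role": "user", "parts": [json.dumps(cart)]})
    return messages

request = build_request(
    [{"product_id": 123, "amount": 2, "price": 10.99, "category_id": 1}])
print(len(request))  # one message per prompt step plus the cart itself
```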

&lt;p&gt;This process is exactly what is popularly known today as prompt engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we have shown a case study of using GenAI to accomplish business logic.&lt;/p&gt;

&lt;p&gt;Let's organize the whole process again.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inform GenAI of its role.&lt;/li&gt;
&lt;li&gt;Explain the input of the business logic.&lt;/li&gt;
&lt;li&gt;Describe the requirements of the business logic.&lt;/li&gt;
&lt;li&gt;Correct GenAI's errors.&lt;/li&gt;
&lt;li&gt;Generate the output of the business logic.&lt;/li&gt;
&lt;li&gt;Prune all descriptive statements.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These steps are all centered around prompt engineering, and the more you are familiar with prompt engineering, the quicker the process will be.&lt;/p&gt;

&lt;p&gt;The benefits of this process are not only that we can turn O(n^2) business logic processing into O(1), as mentioned at the beginning, but also that business logic becomes easier to debug. As I said, it's much easier to catch defects in natural language than it is to find bugs in a program.&lt;/p&gt;

&lt;p&gt;Nevertheless, there is one important caveat to this development process. GenAI is actually a Large Language Model (LLM), which is not good at computation. So when we use GenAI to implement business logic, we still need full unit testing to make sure the results are what we expect.&lt;/p&gt;

&lt;p&gt;In other words, the importance of unit testing increases rather than decreases with this development process.&lt;/p&gt;
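
&lt;p&gt;One way to unit-test this is to parse the model's JSON answer and compare it against an expectation computed conventionally. A sketch; the response shape with a &lt;code&gt;total&lt;/code&gt; field is a hypothetical example, not the gist's actual format:&lt;/p&gt;

```python
import json
import math

# Sketch of a unit-testing step for GenAI-computed business logic: parse the
# model's JSON answer and compare it against an independently computed
# expectation. The response shape ("total" plus per-item discounts) is a
# hypothetical example, not the gist's actual format.
def check_response(model_output: str, expected_total: float) -> bool:
    result = json.loads(model_output)
    return math.isclose(result["total"], expected_total, abs_tol=0.01)

# In a real test, model_output would come from the LLM call.
model_output = json.dumps({"total": 48.94, "discounts": {"123": 0.0, "789": 2.99}})
print(check_response(model_output, 48.94))  # True
```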

&lt;p&gt;When we think of GenAI for software development, we always think of Copilot, but it's much simpler to let GenAI implement business logic directly without generating code.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>programming</category>
      <category>softwaredevelopment</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
