<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gintautas Musketas</title>
    <description>The latest articles on DEV Community by Gintautas Musketas (@gintautasm).</description>
    <link>https://dev.to/gintautasm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F745540%2F15be6e70-8796-400f-ae47-03820c871f16.png</url>
      <title>DEV Community: Gintautas Musketas</title>
      <link>https://dev.to/gintautasm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gintautasm"/>
    <language>en</language>
    <item>
      <title>Mastering Prompt Engineering: My Journey with GenAI</title>
      <dc:creator>Gintautas Musketas</dc:creator>
      <pubDate>Wed, 01 Oct 2025 14:04:07 +0000</pubDate>
      <link>https://dev.to/gintautasm/mastering-prompt-engineering-my-journey-with-genai-170l</link>
      <guid>https://dev.to/gintautasm/mastering-prompt-engineering-my-journey-with-genai-170l</guid>
      <description>&lt;p&gt;The current hype around Generative AI (GenAI) has led many to believe it’s a mythical genie that grants wishes on demand. However, the reality of working with these models is far more nuanced. As a professional software engineer, I’ve come to see GenAI as a powerful tool that requires time, practice, and a methodical approach to master. In this post, I’d like to share my personal journey into the world of prompt engineering and the lessons I’ve learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Prompt Engineering Matters
&lt;/h2&gt;

&lt;p&gt;As humans, we often have a clear idea of what we want to achieve and the steps needed to get there, based on our experiences. For me, this usually involves writing code to solve problems—whether it’s creating API endpoints or automating workflows. However, when working with language models, every prompt you make in a new session starts from scratch. Unlike humans, these models lack historical context, making it essential to provide clear, detailed instructions every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges in Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;One of the hardest parts of prompt engineering is defining exactly what you want to achieve. For example, creating a function with specific inputs and outputs sounds simple, but crafting a prompt that is flexible yet specific enough to apply to a wide range of situations can be surprisingly difficult. That’s where techniques like SMART goals and attribute-driven design principles come into play.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Attributes of Effective Prompts
&lt;/h2&gt;

&lt;p&gt;To create reusable and effective instructions, prompts must meet the following criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Accurate&lt;/code&gt;: Clearly state the desired outcome.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Unambiguous&lt;/code&gt;: Avoid confusion or multiple interpretations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Concise&lt;/code&gt;: Keep it short and to the point.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Actionable&lt;/code&gt;: Provide clear steps that the model can follow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These attributes mirror the qualities of a well-designed programming language, making prompt engineering an interesting intersection of technical and linguistic skills.&lt;/p&gt;
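&lt;p&gt;As a deliberately simplified illustration, these attributes can be baked into a reusable template. The field names below are invented for the example, not part of any tool's API:&lt;/p&gt;

```python
# A minimal sketch of a reusable prompt template that tries to satisfy the
# four attributes above. The section names (Role, Goal, Constraints, Steps)
# are an illustration, not a prescribed format.
from string import Template

PROMPT_TEMPLATE = Template(
    "Role: $role\n"                # unambiguous: fixes the model's perspective
    "Goal: $goal\n"                # accurate: states the desired outcome
    "Constraints: $constraints\n"  # concise: only the rules that matter
    "Steps:\n$steps"               # actionable: explicit steps to follow
)

def build_prompt(role: str, goal: str, constraints: str, steps: list[str]) -> str:
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return PROMPT_TEMPLATE.substitute(
        role=role, goal=goal, constraints=constraints, steps=numbered
    )

prompt = build_prompt(
    role="Senior Python engineer",
    goal="Write a function that parses ISO-8601 timestamps",
    constraints="Standard library only; include type hints",
    steps=["Define the function signature", "Handle invalid input", "Add a docstring"],
)
print(prompt)
```

&lt;p&gt;Filling in the template forces every prompt to state an outcome, constraints, and explicit steps, which is most of the battle.&lt;/p&gt;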

&lt;h2&gt;
  
  
  My Goal: Automating the Software Development Lifecycle
&lt;/h2&gt;

&lt;p&gt;My ultimate goal is to use tools like GitHub Copilot to automate the software development lifecycle (SDLC) from a technical architecture document to a deployable application. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting desired architectural services.&lt;/li&gt;
&lt;li&gt;Creating actionable tasks in GitHub Projects.&lt;/li&gt;
&lt;li&gt;Writing Infrastructure-as-Code (IaC) based on tasks.&lt;/li&gt;
&lt;li&gt;Implementing actual business logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tools and Approach
&lt;/h2&gt;

&lt;p&gt;Here’s the setup I’m using to tackle this challenge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Confluence page or Markdown file as the source for the technical architecture.&lt;/li&gt;
&lt;li&gt;GitHub Projects for task management.&lt;/li&gt;
&lt;li&gt;VSCode with the GitHub Copilot plugin.&lt;/li&gt;
&lt;li&gt;One software engineer (me!).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define a desired output—ideally something familiar.&lt;/li&gt;
&lt;li&gt;Initiate a brainstorming session using self-prompting techniques.&lt;/li&gt;
&lt;li&gt;Set the model’s role to focus on the specific problem.&lt;/li&gt;
&lt;li&gt;Provide enough context to the model.&lt;/li&gt;
&lt;li&gt;Ask the model to generate a few initial ideas.&lt;/li&gt;
&lt;li&gt;Guide the model through the process until the desired outcome is reached.&lt;/li&gt;
&lt;li&gt;Generate accurate, concise, and unambiguous instructions based on the steps performed.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ul&gt;
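&lt;p&gt;The approach above can be sketched as a small driver loop. The &lt;code&gt;model&lt;/code&gt; callable below is a placeholder standing in for any chat model API, not a real client SDK; the point is the shape of the conversation:&lt;/p&gt;

```python
# A hedged sketch of the seven-step approach as an orchestration loop.
# `model` is any callable that takes a prompt string and returns a reply.
from typing import Callable

def run_brainstorm(model: Callable[[str], str], role: str, context: str,
                   desired_output: str, max_rounds: int = 3) -> list[str]:
    transcript: list[str] = []
    # Steps 1-4: define the output, set the role, and provide context up front.
    setup = f"You are {role}. Context: {context}. Desired output: {desired_output}."
    transcript.append(model(setup))
    # Step 5: ask for a few initial ideas.
    transcript.append(model("Propose three initial approaches."))
    # Step 6: guide the model iteratively toward the outcome.
    for _ in range(max_rounds):
        transcript.append(model("Refine the best approach one step further."))
    # Step 7: distill the session into reusable instructions.
    transcript.append(model("Summarize the steps we took as concise, "
                            "unambiguous, reusable instructions."))
    return transcript

# Usage with a stub model (a real run would call an actual LLM API):
stub = lambda prompt: f"[model reply to: {prompt[:30]}...]"
replies = run_brainstorm(stub, "a cloud architect", "IaC for a web app",
                         "a task list for GitHub Projects")
print(len(replies))  # -> 6: setup + ideas + 3 refinement rounds + summary
```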

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is still a growing field, and there’s much to learn. I’d love to hear your thoughts! Do you agree with the key attributes I’ve outlined? Have you used similar techniques in your work? Share your experiences in the comments below!&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>genai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Creating data product using Confluent Kafka, Flink, Tableflow, PyIceberg and AWS</title>
      <dc:creator>Gintautas Musketas</dc:creator>
      <pubDate>Sat, 12 Jul 2025 11:00:47 +0000</pubDate>
      <link>https://dev.to/gintautasm/creating-data-product-using-confluent-kafka-flink-tableflow-pyiceberg-and-aws-5g5c</link>
      <guid>https://dev.to/gintautasm/creating-data-product-using-confluent-kafka-flink-tableflow-pyiceberg-and-aws-5g5c</guid>
      <description>&lt;h1&gt;
  
  
  Table of Contents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;What is data streaming&lt;/li&gt;
&lt;li&gt;Why data streaming is needed&lt;/li&gt;
&lt;li&gt;What is Confluent Flink&lt;/li&gt;
&lt;li&gt;What is Confluent Tableflow&lt;/li&gt;
&lt;li&gt;Business use case&lt;/li&gt;
&lt;li&gt;
Solution overview

&lt;ul&gt;
&lt;li&gt;Purpose&lt;/li&gt;
&lt;li&gt;Benefits&lt;/li&gt;
&lt;li&gt;Challenges&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;References&lt;/li&gt;

&lt;li&gt;Code&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  What is data streaming
&lt;/h1&gt;

&lt;p&gt;Data streaming is the continuous transfer of data from one or more sources at a steady, high speed for processing into specific outputs. Data streaming is not new, but its practical applications are a relatively recent development [4]. &lt;/p&gt;

&lt;p&gt;Data streams are generated by all types of sources, in various formats and volumes. From applications, networking devices, and server log files, to website activity, banking transactions, and location data, they can all be aggregated to seamlessly gather real-time information and analytics from a single source of truth [2].&lt;/p&gt;

&lt;h1&gt;
  
  
  Why data streaming is needed
&lt;/h1&gt;

&lt;p&gt;Real-world data arrives gradually over time: your users are producing it now, produced it yesterday, and will produce it in the future. Unless you go out of business, this process never ends, so the dataset is never complete in any meaningful way [1, p. 439].&lt;/p&gt;

&lt;p&gt;Today's users expect instant results, whether in the form of an analytical dashboard providing business insights, a view of system metrics, or even military and intelligence systems raising alerts at the first signs of attack [1, p. 465].&lt;/p&gt;

&lt;p&gt;Stream processing allows data to be handled in small time windows, or even continuously, acting on each event as it arrives.&lt;/p&gt;
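&lt;p&gt;To make the windowing idea concrete, here is a toy tumbling-window count in plain Python. It only models what a stream processor does; real engines such as Flink run this continuously and handle out-of-order events, which this sketch does not:&lt;/p&gt;

```python
# Toy illustration of tumbling-window processing: events carry a timestamp
# (in seconds), and we count events per fixed-size, non-overlapping window.
from collections import Counter

def tumbling_window_counts(events, window_size_s=60):
    """Group (timestamp, payload) events into fixed windows and count them."""
    counts = Counter()
    for ts, _payload in events:
        window_start = (ts // window_size_s) * window_size_s
        counts[window_start] += 1
    return dict(counts)

events = [(5, "a"), (59, "b"), (61, "c"), (125, "d")]
print(tumbling_window_counts(events))  # {0: 2, 60: 1, 120: 1}
```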

&lt;h1&gt;
  
  
  What is Confluent Flink
&lt;/h1&gt;

&lt;p&gt;Apache Flink® is a powerful, scalable stream processing framework for running complex, stateful, low-latency streaming applications on large volumes of data. Flink excels at complex, high-performance, mission-critical streaming workloads and is used by many companies for production stream processing applications. Flink is the de facto industry standard for stream processing [3].&lt;/p&gt;

&lt;p&gt;Confluent’s fully managed Flink service enables you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easily filter, join, and enrich your data streams with Flink&lt;/li&gt;
&lt;li&gt;Enable high-performance and efficient stream processing at any scale, without the complexities of managing infrastructure&lt;/li&gt;
&lt;li&gt;Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For additional details: &lt;a href="https://docs.confluent.io/cloud/current/flink/overview.html#what-is-af-long" rel="noopener noreferrer"&gt;Confluent Flink overview&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Confluent Tableflow
&lt;/h1&gt;

&lt;p&gt;Tableflow enables you to represent Apache Kafka® topics and associated schemas as open table formats such as Apache Iceberg™ or Delta Lake in a few clicks to feed any data warehouse, data lake, or analytics engine, eliminating the complexities of data pre-processing, preparation, and transfer [5]. &lt;/p&gt;

&lt;p&gt;Tableflow automates key tasks, such as schematization, type conversions, schema evolution, Change Data Capture (CDC) stream materialization, table metadata publishing to catalogs, and table maintenance.&lt;/p&gt;

&lt;p&gt;Tableflow introduces support for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Iceberg: Now Generally Available, exposing streaming data in Confluent Cloud-hosted Kafka topics as Iceberg tables.&lt;/li&gt;
&lt;li&gt;Delta Lake: Now in Open Preview, enabling you to explore materializing Kafka topics as Delta Lake tables in your own storage. These tables can be consumed as storage-backed external tables from Databricks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For additional details: &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html#tableflow-in-ccloud" rel="noopener noreferrer"&gt;Tableflow in Confluent Cloud overview&lt;/a&gt;&lt;/p&gt;
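&lt;p&gt;For a rough idea of the consumer side, here is a sketch of reading a Tableflow-exposed Iceberg table with PyIceberg [6]. The catalog name, URI, and table identifier are placeholders; real values, along with authentication properties, come from your Tableflow configuration:&lt;/p&gt;

```python
# Sketch of reading a Tableflow-exposed Iceberg table with PyIceberg.
# The import is deferred so the module loads even without pyiceberg installed
# (also a common Lambda cold-start optimization).
def read_orders_as_arrow(catalog_uri: str, table_name: str):
    from pyiceberg.catalog import load_catalog  # third-party dependency

    # "tableflow" is a placeholder catalog name; auth properties are omitted.
    catalog = load_catalog("tableflow", **{"uri": catalog_uri})
    table = catalog.load_table(table_name)
    # scan() builds a table scan; to_arrow() materializes the rows in memory.
    return table.scan().to_arrow()
```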

&lt;h1&gt;
  
  
  Business use case
&lt;/h1&gt;

&lt;p&gt;Let's put theory into practice. A global e-commerce organization with multiple sales channels would like its analysts to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyze the past week's sales channel data on a weekly basis to understand product demand and customer behavior based on the current offering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, the following requirements must be met:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The aggregated weekly data must be delivered as a flat-file (CSV) export.&lt;/li&gt;
&lt;li&gt;Sales channel data contains product, order, and customer information, so PII data has to be obfuscated.&lt;/li&gt;
&lt;li&gt;Data is streamed into the existing managed Confluent Kafka cluster as separate events.&lt;/li&gt;
&lt;li&gt;Deduplicated data is required for accurate analysis.&lt;/li&gt;
&lt;li&gt;Events use Avro encoding with Schema Registry.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Solution overview
&lt;/h1&gt;

&lt;p&gt;There are multiple ways to solve this problem. For instance, AWS Lambda's native support for Avro [8] could be used to consume channel events, obfuscating and deduplicating the data before storing it in DynamoDB and preparing the export.&lt;/p&gt;

&lt;p&gt;Alternatively, Confluent Flink snapshot queries may be used; however, at the time of writing (2025-07-11) the feature is in early access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Purpose
&lt;/h2&gt;

&lt;p&gt;The goal for today, however, is to test the feasibility of a solution based on Confluent Cloud managed Flink and Tableflow. The high-level architecture is displayed in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgard2et9pup27a9rap3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgard2et9pup27a9rap3v.png" alt=" " width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The solution consists of the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales channel data is received as-is and stored in a Kafka input topic.&lt;/li&gt;
&lt;li&gt;Managed Flink processes each channel event:

&lt;ul&gt;
&lt;li&gt;Performs deduplication on events.&lt;/li&gt;
&lt;li&gt;Obfuscates PII data (out of scope in the PoC code).&lt;/li&gt;
&lt;li&gt;Stores the processed data in a Kafka output topic for consumption.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A scheduled AWS Lambda queries the Tableflow-enabled output topic and stores the query result as a flat file in object storage.&lt;/li&gt;

&lt;li&gt;On file creation, an email notification with a link to the export is sent to the business analyst (out of scope in the PoC code).&lt;/li&gt;

&lt;li&gt;The business analyst downloads the file and prepares the report.&lt;/li&gt;

&lt;/ul&gt;
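&lt;p&gt;The deduplicate-and-export logic can be modeled locally with the standard library. In the actual pipeline, deduplication happens in Flink and the rows come from the Tableflow-backed query; this sketch only shows the intent, with invented field names:&lt;/p&gt;

```python
# Local model of the dedup + CSV export steps. In the real solution, Flink
# performs first-wins deduplication upstream and the Lambda reads the result
# from the Tableflow-exposed table; here we use an in-memory list of dicts.
import csv
import io

def deduplicate(rows, key="order_id"):
    """Keep the first (earliest) event per key, mirroring first-wins dedup."""
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

def to_csv(rows, fieldnames):
    """Render rows as CSV text; in the Lambda this would go to object storage."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"order_id": "1", "product": "mug", "qty": "2"},
    {"order_id": "1", "product": "mug", "qty": "2"},  # duplicate event
    {"order_id": "2", "product": "pen", "qty": "5"},
]
csv_text = to_csv(deduplicate(rows), ["order_id", "product", "qty"])
print(csv_text)
```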

&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Shift-left stream processing. Confluent Cloud for Apache Flink powers real-time stream processing on Kafka topics, which are subsequently materialized into tables using Tableflow. With Flink, you can implement business-specific logic such as filtering, deduplication, personally identifiable information (PII) removal, and data stream-level joining. This approach significantly minimizes the need for expensive post-processing in data lakes or data warehouses [12].&lt;/li&gt;
&lt;li&gt;Tableflow removes the need to copy data to temporary storage.&lt;/li&gt;
&lt;li&gt;It is a fully managed solution, so you can focus on your core business while proactive maintenance, security, scalability, etc. are handled by Confluent.&lt;/li&gt;
&lt;li&gt;Predictable monthly cost.&lt;/li&gt;
&lt;li&gt;Looks good on a CV :).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Flink queries run continuously, so running costs have to be measured; multiple use cases might be needed to justify the cost [10].&lt;/li&gt;
&lt;li&gt;Tableflow adds extra cost [11], as you are charged for the time the service runs on a topic.&lt;/li&gt;
&lt;li&gt;After deduplication, only the earliest event per key remains in the output topic, as Tableflow supports only append-only Flink tables (Kafka topics) [9].&lt;/li&gt;
&lt;li&gt;Although fully managed, it is a relatively complex solution.&lt;/li&gt;
&lt;li&gt;There is a learning curve, as multiple technologies and tools are involved: PyIceberg, Flink, AWS.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;After performing this experiment we have arrived at the following conclusions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The solution is relatively high-cost, so it requires careful consideration for this use case; accompanying use cases may be needed to justify it.&lt;/li&gt;
&lt;li&gt;The solution is fully managed and can cope with large amounts of data.&lt;/li&gt;
&lt;li&gt;For this particular use case, a Flink snapshot query may be more suitable, though it is in early access [13]; an AWS-only solution is another alternative.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;[1] Designing Data-Intensive Applications, Martin Kleppmann, O'Reilly, 2017, 590 p.&lt;/li&gt;
&lt;li&gt;[2] &lt;a href="https://www.confluent.io/learn/data-streaming/" rel="noopener noreferrer"&gt;https://www.confluent.io/learn/data-streaming/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[3] &lt;a href="https://docs.confluent.io/cloud/current/flink/overview.html" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/flink/overview.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[4] &lt;a href="https://www.techtarget.com/searchnetworking/definition/data-streaming" rel="noopener noreferrer"&gt;https://www.techtarget.com/searchnetworking/definition/data-streaming&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[5] &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/topics/tableflow/overview.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[6] &lt;a href="https://py.iceberg.apache.org/" rel="noopener noreferrer"&gt;https://py.iceberg.apache.org/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[7] &lt;a href="https://iceberg.apache.org/" rel="noopener noreferrer"&gt;https://iceberg.apache.org/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[8] &lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-native-support-for-avro-and-protobuf-formatted-apache-kafka-events/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/compute/introducing-aws-lambda-native-support-for-avro-and-protobuf-formatted-apache-kafka-events/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[9] &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html#topic-and-table-limitations" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/topics/tableflow/overview.html#topic-and-table-limitations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[10] &lt;a href="https://docs.confluent.io/cloud/current/flink/concepts/flink-billing.html#many-data-streaming-apps-and-statements" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/flink/concepts/flink-billing.html#many-data-streaming-apps-and-statements&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[11] &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/concepts/tableflow-billing.html" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/topics/tableflow/concepts/tableflow-billing.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[12] &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html#shift-left-stream-processing-with-af-long" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/topics/tableflow/overview.html#shift-left-stream-processing-with-af-long&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[13] &lt;a href="https://docs.confluent.io/cloud/current/flink/concepts/snapshot-queries.html" rel="noopener noreferrer"&gt;https://docs.confluent.io/cloud/current/flink/concepts/snapshot-queries.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[not-used] Tableflow Confluent explanation video &lt;a href="https://youtu.be/O2l5SB-camQ?t=680" rel="noopener noreferrer"&gt;https://youtu.be/O2l5SB-camQ?t=680&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[not-used] Apache Iceberg (OTF open table format) explained &lt;a href="https://youtu.be/TsmhRZElPvM?t=233" rel="noopener noreferrer"&gt;https://youtu.be/TsmhRZElPvM?t=233&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Code
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/gintautasm/data-engineering-blueprints/blob/main/confluent-topic-to-csv-with-flink-tableflow/confluent-flink.md" rel="noopener noreferrer"&gt;https://github.com/gintautasm/data-engineering-blueprints/blob/main/confluent-topic-to-csv-with-flink-tableflow/confluent-flink.md&lt;/a&gt;&lt;/p&gt;

</description>
      <category>flink</category>
      <category>kafka</category>
      <category>tableflow</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
