<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michal Šimon</title>
    <description>The latest articles on DEV Community by Michal Šimon (@michalsimon).</description>
    <link>https://dev.to/michalsimon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F430293%2F0a0f52f3-2547-4e86-a14a-462c1b82bfec.png</url>
      <title>DEV Community: Michal Šimon</title>
      <link>https://dev.to/michalsimon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michalsimon"/>
    <language>en</language>
    <item>
      <title>How STRV Transformed Their Cloud Architecture and Sparked a Culture of Innovation</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Mon, 02 Feb 2026 12:30:19 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-strv-transformed-their-cloud-architecture-and-sparked-a-culture-of-innovation-3hph</link>
      <guid>https://dev.to/aws-builders/how-strv-transformed-their-cloud-architecture-and-sparked-a-culture-of-innovation-3hph</guid>
      <description>&lt;p&gt;It all started with an introduction from AWS Hero &lt;a href="https://www.linkedin.com/in/filippyrek/" rel="noopener noreferrer"&gt;Filip Pýrek&lt;/a&gt;, who connected me with &lt;a href="https://www.linkedin.com/in/marekcermak/" rel="noopener noreferrer"&gt;Marek Čermák&lt;/a&gt;, Engineering Manager at STRV. That connection quickly grew into a collaboration that was much more than consulting; it became a journey of transforming architecture, empowering developers, and turning curiosity into real, running prototypes.&lt;/p&gt;

&lt;p&gt;Rather than delivering a predefined architecture, our consultancies were structured as an ongoing exchange of knowledge and architectural best practices. Through this close collaboration, the team could explore modern AWS services in a real-world setting, understand architectural trade-offs, and gain the confidence to choose the right patterns for future challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n252k9lnjmiwz01a1ym.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n252k9lnjmiwz01a1ym.jpg" alt="“We chose to work with Michal because of his deep expertise in AWS and serverless architectures. Beyond that, it was also about trust and a proven track record.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over a few months, we explored modern AWS architecture together, looking at how STRV’s internal project could move beyond its monolithic Fargate setup and step into a more scalable, cost-efficient, serverless-first world. The goal was not only to redesign infrastructure but also to give the team a foundation they could immediately build on.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Monolith to Momentum
&lt;/h2&gt;

&lt;p&gt;Before the collaboration, the internal project was running on a large Fargate container connected to Aurora and S3. This architecture is completely valid, works well, and remains STRV’s go-to stack. However, for this specific use case, the team’s goal was to explore something leaner, faster, and easier to maintain. As Marek put it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg92q92h4uqk7tc84rbkq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg92q92h4uqk7tc84rbkq.jpg" alt="“Our main goal was to move away from the monolith and adopt a low-code backend that would be easier to maintain and more cost-effective. We wanted to embrace AWS-native services to improve scalability and help the team iterate faster with less overhead.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During our sessions, we mapped out how services like AppSync, Step Functions, Cognito, DynamoDB, Glue, and Athena could replace parts of their monolithic workload with purpose-built, fully managed components. The moment things clicked was when these ideas became practical. Not theoretical diagrams, but real workflows, real data pipelines, and real examples that fit STRV’s day-to-day work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On Learning That Sparked an Idea
&lt;/h2&gt;

&lt;p&gt;As we explored serverless patterns and data pipelines, the team started seeing possibilities everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;workflows simplified through Step Functions&lt;/li&gt;
&lt;li&gt;backend logic handled by AppSync and resolvers&lt;/li&gt;
&lt;li&gt;data analytics running on S3, Glue, and Athena with almost no operational overhead&lt;/li&gt;
&lt;/ul&gt;
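&lt;p&gt;To make the first of those patterns concrete, here is a minimal sketch of a Step Functions workflow expressed in Amazon States Language. The state names and the direct DynamoDB integration are illustrative assumptions, not STRV’s actual workflow.&lt;/p&gt;

```python
import json

# Illustrative two-step workflow in Amazon States Language (ASL).
# State names and the service integration are hypothetical examples.
definition = {
    "Comment": "Illustrative order-processing workflow",
    "StartAt": "ValidateInput",
    "States": {
        "ValidateInput": {
            # A Pass state simply forwards its input to the next state
            "Type": "Pass",
            "Next": "StoreResult",
        },
        "StoreResult": {
            # A Task state can call an AWS service directly, no Lambda needed
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

&lt;p&gt;A definition like this replaces hand-written orchestration code: retries, branching, and state transitions become declarative configuration instead of backend logic.&lt;/p&gt;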

&lt;p&gt;This hands-on learning atmosphere led the STRV team to propose something bold:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why don’t we take all of this and turn it into a hackathon?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that is exactly what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Weekend Hackathon With Real Results
&lt;/h2&gt;

&lt;p&gt;Two teams, one Saturday, and a room full of developers and whiteboards. The energy in the Prague office was electric from the moment the hackathon kicked off.&lt;/p&gt;

&lt;p&gt;What made the event special was the freedom. There were no sprints and no tickets. Just creativity, experimentation, and a clear deadline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivt44zj6v25g22qs21dl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivt44zj6v25g22qs21dl.jpg" alt="Team A discussing architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbiqw5aqtw3htpps6r0g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbiqw5aqtw3htpps6r0g.jpg" alt="Team B discussing architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One team went all-in on serverless, building a lightweight backend with AppSync and DynamoDB. The other team blended familiar tools with newly explored AWS Lambda, choosing a steady and incremental approach.&lt;/p&gt;

&lt;p&gt;This contrast turned into one of the most insightful moments of the entire collaboration. It showed how flexible AWS can be and how the same challenge can be solved in completely different ways depending on what you value most, such as speed, familiarity, scalability, or simplicity. What made the event even stronger was the team experience. As &lt;a href="https://www.linkedin.com/in/kocmantomas/" rel="noopener noreferrer"&gt;Tomáš Kocman&lt;/a&gt;, Principal Software Engineer at STRV, shared:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm8um3pq74o9c0yjuwpy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm8um3pq74o9c0yjuwpy.jpg" alt="“It strengthened technical skills and deepened team connections.” - Tomáš Kocman, Principal Software Engineer at STRV" width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was a reminder that innovation often thrives in collaborative, encouraging environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Weekend Prototypes to Real Projects
&lt;/h2&gt;

&lt;p&gt;After the hackathon, the team did not just walk away with MVPs. They walked away with confidence. Within weeks, the team put their new skills into practice on an internal project. As they described it,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0nmqhxxot8m9znt99s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0nmqhxxot8m9znt99s.jpg" alt="“Our team applied what they learned to an internal project, which now runs fully on a serverless AWS low-code backend.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The impact was immediate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lower operational and maintenance costs&lt;/li&gt;
&lt;li&gt;much faster development cycles&lt;/li&gt;
&lt;li&gt;easier iteration&lt;/li&gt;
&lt;li&gt;improved scalability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Knowledge turned into prototypes, and prototypes turned into production improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Than Architecture, a Cultural Shift
&lt;/h2&gt;

&lt;p&gt;Looking back at the collaboration, the most meaningful outcome wasn’t any single architectural decision. It was how quickly the team moved from learning new concepts to confidently applying them in practice. As &lt;a href="https://www.linkedin.com/in/michalklacko/" rel="noopener noreferrer"&gt;Michal Klacko&lt;/a&gt;, Director of Engineering at STRV, reflected on the experience:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl28nlq0ulccl3ooi02bt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl28nlq0ulccl3ooi02bt.jpg" alt="“Michal helped us scale our AWS expertise quickly. Beyond providing expert consulting on our immediate infrastructure needs, he facilitated practical workshops that effectively bridged knowledge gaps and aligned the team with AWS best practices.” - Michal Klacko (Director of Engineering at STRV)" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By working on a single internal project, the team gained hands-on experience with a wide range of AWS-native and serverless patterns. This experience now enables them to make deliberate, well-informed architectural decisions based on the context of each project.&lt;/p&gt;

&lt;p&gt;For me, the real success wasn’t the new architecture. It was the momentum the STRV team built around learning and experimenting. I’m thankful to Marek and the entire team for the trust, openness, and great energy they brought into every session.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>hackathon</category>
      <category>lambda</category>
      <category>community</category>
    </item>
    <item>
      <title>How STRV Transformed Their Cloud Architecture and Sparked a Culture of Innovation</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Mon, 02 Feb 2026 12:27:23 +0000</pubDate>
      <link>https://dev.to/michalsimon/how-strv-transformed-their-cloud-architecture-and-sparked-a-culture-of-innovation-1nhh</link>
      <guid>https://dev.to/michalsimon/how-strv-transformed-their-cloud-architecture-and-sparked-a-culture-of-innovation-1nhh</guid>
      <description>&lt;p&gt;It all started with an introduction from AWS Hero &lt;a href="https://www.linkedin.com/in/filippyrek/" rel="noopener noreferrer"&gt;Filip Pýrek&lt;/a&gt;, who connected me with &lt;a href="https://www.linkedin.com/in/marekcermak/" rel="noopener noreferrer"&gt;Marek Čermák&lt;/a&gt;, Engineering Manager at STRV. That connection quickly grew into a collaboration that was much more than consulting; it became a journey of transforming architecture, empowering developers, and turning curiosity into real, running prototypes.&lt;/p&gt;

&lt;p&gt;Rather than delivering a predefined architecture, our consultancies were structured as an ongoing exchange of knowledge and architectural best practices. Through this close collaboration, the team could explore modern AWS services in a real-world setting, understand architectural trade-offs, and gain the confidence to choose the right patterns for future challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n252k9lnjmiwz01a1ym.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n252k9lnjmiwz01a1ym.jpg" alt="“We chose to work with Michal because of his deep expertise in AWS and serverless architectures. Beyond that, it was also about trust and a proven track record.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over a few months, we explored modern AWS architecture together, looking at how STRV’s internal project could move beyond its monolithic Fargate setup and step into a more scalable, cost-efficient, serverless-first world. The goal was not only to redesign infrastructure but also to give the team a foundation they could immediately build on.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Monolith to Momentum
&lt;/h2&gt;

&lt;p&gt;Before the collaboration, the internal project was running on a large Fargate container connected to Aurora and S3. This architecture is completely valid, works well, and remains STRV’s go-to stack. However, for this specific use case, the team’s goal was to explore something leaner, faster, and easier to maintain. As Marek put it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg92q92h4uqk7tc84rbkq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg92q92h4uqk7tc84rbkq.jpg" alt="“Our main goal was to move away from the monolith and adopt a low-code backend that would be easier to maintain and more cost-effective. We wanted to embrace AWS-native services to improve scalability and help the team iterate faster with less overhead.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During our sessions, we mapped out how services like AppSync, Step Functions, Cognito, DynamoDB, Glue, and Athena could replace parts of their monolithic workload with purpose-built, fully managed components. The moment things clicked was when these ideas became practical. Not theoretical diagrams, but real workflows, real data pipelines, and real examples that fit STRV’s day-to-day work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On Learning That Sparked an Idea
&lt;/h2&gt;

&lt;p&gt;As we explored serverless patterns and data pipelines, the team started seeing possibilities everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;workflows simplified through Step Functions&lt;/li&gt;
&lt;li&gt;backend logic handled by AppSync and resolvers&lt;/li&gt;
&lt;li&gt;data analytics running on S3, Glue, and Athena with almost no operational overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hands-on learning atmosphere led the STRV team to propose something bold:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why don’t we take all of this and turn it into a hackathon?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that is exactly what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Weekend Hackathon With Real Results
&lt;/h2&gt;

&lt;p&gt;Two teams, one Saturday, and a room full of developers and whiteboards. The energy in the Prague office was electric from the moment the hackathon kicked off.&lt;/p&gt;

&lt;p&gt;What made the event special was the freedom. There were no sprints and no tickets. Just creativity, experimentation, and a clear deadline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivt44zj6v25g22qs21dl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivt44zj6v25g22qs21dl.jpg" alt="Team A discussing architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbiqw5aqtw3htpps6r0g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbiqw5aqtw3htpps6r0g.jpg" alt="Team B discussing architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One team went all-in on serverless, building a lightweight backend with AppSync and DynamoDB. The other team blended familiar tools with newly explored AWS Lambda, choosing a steady and incremental approach.&lt;/p&gt;

&lt;p&gt;This contrast turned into one of the most insightful moments of the entire collaboration. It showed how flexible AWS can be and how the same challenge can be solved in completely different ways depending on what you value most, such as speed, familiarity, scalability, or simplicity. What made the event even stronger was the team experience. As &lt;a href="https://www.linkedin.com/in/kocmantomas/" rel="noopener noreferrer"&gt;Tomáš Kocman&lt;/a&gt;, Principal Software Engineer at STRV, shared:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm8um3pq74o9c0yjuwpy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm8um3pq74o9c0yjuwpy.jpg" alt="“It strengthened technical skills and deepened team connections.” - Tomáš Kocman, Principal Software Engineer at STRV" width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was a reminder that innovation often thrives in collaborative, encouraging environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Weekend Prototypes to Real Projects
&lt;/h2&gt;

&lt;p&gt;After the hackathon, the team did not just walk away with MVPs. They walked away with confidence. Within weeks, the team put their new skills into practice on an internal project. As they described it,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0nmqhxxot8m9znt99s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0nmqhxxot8m9znt99s.jpg" alt="“Our team applied what they learned to an internal project, which now runs fully on a serverless AWS low-code backend.” - Marek Čermák (Engineering Manager at STRV)" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The impact was immediate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lower operational and maintenance costs&lt;/li&gt;
&lt;li&gt;much faster development cycles&lt;/li&gt;
&lt;li&gt;easier iteration&lt;/li&gt;
&lt;li&gt;improved scalability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Knowledge turned into prototypes, and prototypes turned into production improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Than Architecture, a Cultural Shift
&lt;/h2&gt;

&lt;p&gt;Looking back at the collaboration, the most meaningful outcome wasn’t any single architectural decision. It was how quickly the team moved from learning new concepts to confidently applying them in practice. As &lt;a href="https://www.linkedin.com/in/michalklacko/" rel="noopener noreferrer"&gt;Michal Klacko&lt;/a&gt;, Director of Engineering at STRV, reflected on the experience:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl28nlq0ulccl3ooi02bt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl28nlq0ulccl3ooi02bt.jpg" alt="“Michal helped us scale our AWS expertise quickly. Beyond providing expert consulting on our immediate infrastructure needs, he facilitated practical workshops that effectively bridged knowledge gaps and aligned the team with AWS best practices.” - Michal Klacko (Director of Engineering at STRV)" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By working on a single internal project, the team gained hands-on experience with a wide range of AWS-native and serverless patterns. This experience now enables them to make deliberate, well-informed architectural decisions based on the context of each project.&lt;/p&gt;

&lt;p&gt;For me, the real success wasn’t the new architecture. It was the momentum the STRV team built around learning and experimenting. I’m thankful to Marek and the entire team for the trust, openness, and great energy they brought into every session.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>hackathon</category>
      <category>lambda</category>
      <category>community</category>
    </item>
    <item>
      <title>From Crashes to Acquisition: How Technical Decisions Stabilized Dividend.watch’s Product</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Mon, 26 Jan 2026 14:06:39 +0000</pubDate>
      <link>https://dev.to/michalsimon/from-crashes-to-acquisition-how-technical-decisions-stabilized-dividendwatchs-product-iad</link>
      <guid>https://dev.to/michalsimon/from-crashes-to-acquisition-how-technical-decisions-stabilized-dividendwatchs-product-iad</guid>
      <description>&lt;h2&gt;
  
  
  About Dividend.watch
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dividend.watch" rel="noopener noreferrer"&gt;Dividend.watch&lt;/a&gt; is an investment platform for tracking users' stock portfolios. For their users, reliability is everything, and downtime means missed alerts and frustrated customers.&lt;/p&gt;

&lt;p&gt;So when their platform started crashing every few days, it quickly became a business-critical problem. At that point, the team didn’t yet know what was causing the instability; only that internal firefighting wasn’t leading to a solution. The team was also in the middle of preparing for an acquisition, actively talking with several prospects interested in purchasing the app. These issues were blocking the deal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/aleschromec/" rel="noopener noreferrer"&gt;Aleš Chromec&lt;/a&gt;, Dividend.watch co-founder, made a crucial decision: to bring in external help rather than continue firefighting internally. That decision would prove pivotal.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Dividend.watch reached out when their product began experiencing recurring crashes that were difficult to diagnose. The team knew the application was unstable, but the root cause wasn’t clear. Together, we investigated the issue and traced the crashes to a memory leak caused by architectural patterns that weren’t releasing memory as expected. I helped stabilize the platform and worked with the team to define a clear plan for refactoring the backend.&lt;/p&gt;

&lt;p&gt;With stability restored, the team took ownership of the improvements, successfully refactored the backend, and moved toward a serverless approach. This shift reduced operational burden. The decision to bring in support at the right moment, combined with the team’s execution, strengthened the platform’s technical foundation and helped position the company confidently for an acquisition, which closed successfully a few months later.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Found the Root Cause
&lt;/h2&gt;

&lt;p&gt;The engagement began in a collaborative, hands-on way. Aleš created a dedicated Slack channel and invited me to join, allowing us to share insights. I adapted to their workflow, keeping all discussions and data within their environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7xhmctiqr4bxe8ijwdr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7xhmctiqr4bxe8ijwdr.jpg" alt="“Working with Michal felt personal and driven. I didn’t have to push things forward, which was really relaxing. It felt more like working with a coworker rather than ‘hired guns.’” - Aleš Chromec (Co-founder &amp;amp; Product Design at Dividend Watch)" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We started testing hypotheses in their staging environment. However, because the issue only appeared under heavier load, some controlled experiments in production were necessary. I deployed &lt;strong&gt;enhanced logging and memory metrics&lt;/strong&gt; to gain visibility into the system’s behavior under real-world conditions.&lt;/p&gt;

&lt;p&gt;It soon became clear that this was &lt;strong&gt;not a simple bug&lt;/strong&gt;. The app’s instability stemmed from architectural patterns that retained memory longer than necessary. A quick patch wouldn’t be enough.&lt;/p&gt;

&lt;p&gt;To buy time for a proper fix, I proposed a temporary workaround:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduced a &lt;strong&gt;load balancer&lt;/strong&gt; and ran the app across multiple instances&lt;/li&gt;
&lt;li&gt;Monitored memory usage per instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an instance stayed above its memory threshold for too long, traffic automatically shifted to healthy nodes, and the “unhealthy” instance was restarted to mitigate the memory leak before it could impact users.&lt;/p&gt;

&lt;p&gt;This gave us &lt;strong&gt;stability and breathing room&lt;/strong&gt; to focus on the real solution: refactoring the backend to eliminate the architectural issues.&lt;/p&gt;
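&lt;p&gt;As a rough sketch of that workaround, a health endpoint can report itself unhealthy once resident memory passes a threshold, letting the load balancer drain traffic and restart the instance. This is a minimal standard-library illustration, not the actual Dividend.watch code; the threshold value is an assumption.&lt;/p&gt;

```python
import resource
import sys

# Assumed per-instance memory budget; tune to the instance size.
MEMORY_LIMIT_MB = 1024


def memory_usage_mb() -> float:
    """Peak resident set size of this process, in megabytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, but in bytes on macOS
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024


def health_status() -> tuple:
    """Return the (status code, body) a /health route would serve."""
    if memory_usage_mb() > MEMORY_LIMIT_MB:
        # The load balancer treats non-200 responses as unhealthy and
        # shifts traffic to other instances while this one is restarted.
        return 503, "unhealthy: memory above threshold"
    return 200, "ok"
```

&lt;p&gt;Wiring this check into the load balancer turns the leak from an outage into a routine instance rotation, buying time for the real fix.&lt;/p&gt;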

&lt;h2&gt;
  
  
  Fixing the Root Cause
&lt;/h2&gt;

&lt;p&gt;The root cause wasn’t a single function. It was a &lt;strong&gt;systemic architectural problem&lt;/strong&gt;. Specifically, long-running WebSockets were used for real-time communication with the frontend, but they weren’t properly releasing memory, so resources were slowly consumed until the app crashed.&lt;/p&gt;
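&lt;p&gt;The leak pattern can be shown with a deliberately simplified sketch (not the actual Dividend.watch code): a registry that adds WebSocket connections but never removes them keeps references alive forever, while discarding entries on disconnect lets the garbage collector reclaim the memory.&lt;/p&gt;

```python
class ConnectionRegistry:
    """Tracks live WebSocket connections by id (simplified illustration)."""

    def __init__(self) -> None:
        self._active = set()

    def on_connect(self, conn_id: str) -> None:
        self._active.add(conn_id)

    def on_disconnect(self, conn_id: str) -> None:
        # The step the leaking code effectively skipped: dropping the
        # reference so the connection's memory can be garbage-collected.
        self._active.discard(conn_id)

    def active_count(self) -> int:
        return len(self._active)
```

&lt;p&gt;In the leaking variant, &lt;code&gt;on_disconnect&lt;/code&gt; never runs, so the set, and every object it references, grows without bound under real traffic.&lt;/p&gt;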

&lt;p&gt;After pinpointing and fixing the problem, the team &lt;strong&gt;took ownership of the backend rewrite&lt;/strong&gt;. Together, we designed a clear, actionable plan for refactoring. With firefighting reduced, the team could execute the improvements themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky7aq3qj728qog2rki64.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky7aq3qj728qog2rki64.jpg" alt="“After fixing the memory leak and stabilizing the application, we saw a 30–40% drop in support tickets; a clear sign of improved customer experience.” - Aleš Chromec (Co-founder &amp;amp; Product Design at Dividend Watch)" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-zero downtime:&lt;/strong&gt; The app went from crashing every few days to running reliably&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher developer confidence:&lt;/strong&gt; Refactoring made the backend predictable and maintainable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simpler, more modern stack:&lt;/strong&gt; Serverless adoption reduced load on main servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business impact:&lt;/strong&gt; A stable, reliable product positioned the company strongly during the acquisition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory leak had put the entire product at risk, but Aleš’s decision to get help at the right time, combined with the team’s execution, &lt;strong&gt;saved the platform and strengthened the company’s technical foundation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3xndpq4k8r35nz30got.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3xndpq4k8r35nz30got.jpg" alt="“Cooperation with Michal, together with our next steps, has helped us get acquired. It felt like we were giving a project in a very good technical shape.” - Aleš Chromec (Co-founder &amp;amp; Product Design at Dividend Watch)" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;In this case, what began as an unstable application became a catalyst for meaningful architectural improvements, renewed team confidence, and long-term business success.&lt;/p&gt;

&lt;p&gt;And personally, I believe Aleš made the right call to bring in support at the right moment. Together, we stabilized the platform, refactored the backend, and prepared the product for acquisition.&lt;/p&gt;

&lt;p&gt;Sometimes the problems that threaten a product the most are the ones that push teams toward their &lt;strong&gt;best decisions&lt;/strong&gt;, both technically and strategically.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>performance</category>
      <category>saas</category>
      <category>startup</category>
    </item>
    <item>
      <title>What goes after Serverless? I'm looking for a new buzzword.</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Mon, 18 Nov 2024 20:06:56 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-goes-after-serverless-im-looking-for-a-new-buzzword-2fff</link>
      <guid>https://dev.to/aws-builders/what-goes-after-serverless-im-looking-for-a-new-buzzword-2fff</guid>
      <description>&lt;p&gt;Over the past two decades, various infrastructure trends have emerged and been replaced by new ones every few years. Initially, we relied on bare metal servers housed in company basements, requiring us to purchase hardware, replace failed hard drives, and manage power outages. With the advent of virtualization and cloud providers, we shifted our focus primarily to software management, though we still had to patch operating systems. Containers further reduced maintenance and enhanced scalability, but we remained responsible for managing the runtime environment. The serverless approach has taken this a step further, allowing us to concentrate solely on our application code and dependencies, while the cloud provider handled the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Lambda is turning 10
&lt;/h2&gt;

&lt;p&gt;At re:Invent 2014, AWS introduced Lambda, marking the beginning of the serverless infrastructure paradigm. While many AWS services, such as SQS and S3, have been serverless from the start, Lambda catalyzed this architectural shift and popularized the term “serverless.” Over the past decade, the ecosystem has matured significantly and has become mainstream. New startups, including ours, are proudly building their entire infrastructure on serverless components.&lt;/p&gt;

&lt;p&gt;A decade is a long time in technology, and innovation is relentless. It was inevitable that a new trend would emerge, and it seems to me that it has. We just don’t have a buzzword for it yet. Before we can start speculating about what comes after serverless, let's discuss what a modern application running on AWS looks like in 2024.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modern Application Stack
&lt;/h2&gt;

&lt;p&gt;From a simplified perspective, most web and mobile applications share fundamental similarities on the backend. They typically provide authentication, accept various forms of user input, send that information via APIs, and store it in a database. This data is then transformed or aggregated and presented back to the user.&lt;/p&gt;

&lt;p&gt;This pattern is evident in e-commerce, CRM systems, internet banking, bookkeeping, and many other applications. Despite these commonalities, each piece of software has unique features that add value for users. Additionally, elements like machine learning models, integrations with other systems, and cryptocurrency transactions further enhance the value of modern applications. However, without the basic components of user input, APIs, and databases, these applications would be largely ineffective.&lt;/p&gt;

&lt;p&gt;When building a new application today, you might leverage Lambda for your event-driven architecture (EDA) microservices (Authenticator, API, Notifier, …). Combined with DynamoDB, these choices are mainstream and safe for a modern cloud-based system.&lt;/p&gt;

&lt;p&gt;But what if there is an alternative?&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda Alternatives
&lt;/h2&gt;

&lt;p&gt;Fortunately, we no longer need to develop all these microservices ourselves as AWS provides serverless solutions for many common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authenticator - Cognito&lt;/li&gt;
&lt;li&gt;API - API Gateway, AppSync&lt;/li&gt;
&lt;li&gt;Static Hosting - S3 + CloudFront&lt;/li&gt;
&lt;li&gt;Async jobs - Step Functions&lt;/li&gt;
&lt;li&gt;Database - DynamoDB&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Creating these services independently can easily consume over 50% of your project time. This effort, known as “Undifferentiated Heavy Lifting,” is essential but doesn’t set your application apart from competitors. Leveraging existing services allows you to concentrate resources on areas where your application delivers unique value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Application Use Case
&lt;/h2&gt;

&lt;p&gt;Imagine a truly serverless application constructed solely by integrating existing AWS services, even going so far as to omit a Lambda function. By "truly serverless," I refer to the &lt;a href="https://www.serverlessmanifesto.com" rel="noopener noreferrer"&gt;original serverless manifesto&lt;/a&gt;, which advocates for resilience, pay-per-use pricing, the ability to scale to zero, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni8aebj9sokxq294ceeh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni8aebj9sokxq294ceeh.png" alt="AWS Architecture Schema" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether you opt for a REST API or GraphQL, the architectural variations may differ slightly, but the core concepts remain consistent. Thanks to API Gateway's ability to build custom HTTP requests using the VTL templating language, you can directly call services like DynamoDB to resolve complex queries, retrieve data, and create or update records. The OpenAPI schema defines validation rules for each endpoint, while integration with Cognito manages permissions and security.&lt;/p&gt;
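&lt;p&gt;As a rough sketch of what this looks like in practice (the API ID, resource ID, table name, and role ARN below are placeholders, not from a real project), here are the parameters you would hand to boto3's &lt;code&gt;put_integration&lt;/code&gt;, with a VTL request template mapping the HTTP body straight to a DynamoDB &lt;code&gt;PutItem&lt;/code&gt; call:&lt;/p&gt;

```python
# Sketch: a direct API Gateway-to-DynamoDB integration, no Lambda in between.
# All IDs, names, and ARNs are illustrative placeholders.
vtl_request_template = """{
    "TableName": "Items",
    "Item": {
        "id": {"S": "$context.requestId"},
        "payload": {"S": $input.json('$.payload')}
    }
}"""

integration_params = {
    "restApiId": "abc123",         # placeholder API ID
    "resourceId": "def456",        # placeholder resource ID
    "httpMethod": "POST",
    "type": "AWS",                 # direct AWS service integration
    "integrationHttpMethod": "POST",
    "uri": "arn:aws:apigateway:eu-west-1:dynamodb:action/PutItem",
    "credentials": "arn:aws:iam::111111111111:role/ApiGatewayDynamoDbRole",
    "requestTemplates": {"application/json": vtl_request_template},
}

# In a real deployment you would pass these to:
# boto3.client("apigateway").put_integration(**integration_params)
```

&lt;p&gt;The same pattern works for &lt;code&gt;Query&lt;/code&gt;, &lt;code&gt;GetItem&lt;/code&gt;, or &lt;code&gt;UpdateItem&lt;/code&gt;; only the action in the integration URI and the mapping template change.&lt;/p&gt;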

&lt;p&gt;AppSync offers a similar experience but with a few notable enhancements. It includes JavaScript resolvers in addition to standard VTL and Pipeline Resolvers for executing a sequence of steps in a specific GraphQL query or mutation. AppSync also supports GraphQL subscriptions, enabling real-time communication with your frontend over WebSockets. While similar to API Gateway’s capabilities, AppSync provides easier and more comprehensive WebSocket access compared to REST API implementations. Moreover, all AWS services have an HTTP API, allowing you to leverage services like AWS Rekognition for object detection in images or Bedrock for LLM prompting.&lt;/p&gt;
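&lt;p&gt;To make the resolver idea concrete, here is a hedged sketch of an APPSYNC_JS unit resolver (the API ID, field, and data source names are made up for illustration), expressed as the parameters you would pass to the AppSync &lt;code&gt;create_resolver&lt;/code&gt; API:&lt;/p&gt;

```python
# Sketch: an AppSync JavaScript resolver reading an item from DynamoDB.
# API ID, type/field, and data source names are illustrative placeholders.
resolver_code = """
import { util } from '@aws-appsync/utils';

export function request(ctx) {
    return {
        operation: 'GetItem',
        key: util.dynamodb.toMapValues({ id: ctx.args.id }),
    };
}

export function response(ctx) {
    return ctx.result;
}
"""

resolver_params = {
    "apiId": "exampleApiId",         # placeholder
    "typeName": "Query",
    "fieldName": "getItem",
    "dataSourceName": "ItemsTable",  # placeholder DynamoDB data source
    "runtime": {"name": "APPSYNC_JS", "runtimeVersion": "1.0.0"},
    "code": resolver_code,
}

# In a real deployment you would pass these to:
# boto3.client("appsync").create_resolver(**resolver_params)
```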

&lt;p&gt;Asynchronous data processing is a common application requirement. Typically, this would involve a Lambda function consuming messages from an SQS queue or EventBridge. However, in our case, we'll use Step Functions to manage tasks such as chaining DynamoDB calls, executing parallel API calls to S3 or sending notifications to the user via SNS, and more.&lt;/p&gt;
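&lt;p&gt;A minimal sketch of such a workflow in Amazon States Language (the table name and topic ARN are illustrative placeholders): a DynamoDB write chained to an SNS notification, with no Lambda in the loop:&lt;/p&gt;

```python
# Sketch: an ASL state machine chaining DynamoDB and SNS service integrations.
# Table name and topic ARN are illustrative placeholders.
state_machine = {
    "StartAt": "SaveOrder",
    "States": {
        "SaveOrder": {
            "Type": "Task",
            # Optimized Step Functions service integration - no Lambda needed
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "Orders",
                "Item": {"pk": {"S.$": "$.orderId"}},
            },
            "Next": "NotifyUser",
        },
        "NotifyUser": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:eu-west-1:111111111111:order-events",
                "Message.$": "$.orderId",
            },
            "End": True,
        },
    },
}
```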

&lt;p&gt;Step Functions, combined with API Gateway or AppSync, can also handle synchronous jobs, but you get the idea by now, right? Lambda functions are great but in many cases, you can achieve the same result without them just by “gluing together” existing AWS services.&lt;/p&gt;

&lt;h2&gt;
  
  
  We can live without Lambda, but why should we?
&lt;/h2&gt;

&lt;p&gt;As you can see, this approach can take us quite far. To be honest, after several years of trying to build Lambdaless applications, I have to say it is not the most pleasant developer experience. Thankfully, it gets better with every single re:Invent. But why go through all that hassle if writing a simple Lambda function gets you the same result without the headache?&lt;/p&gt;

&lt;p&gt;I'm glad you asked. I'm not suggesting you abandon Lambdas where they make sense. Instead, I'm highlighting an alternative worth considering when designing new infrastructure. Lambda enables the use of familiar technologies, libraries, and frameworks, likely getting you to the desired result faster. However, this speed comes with long-term maintenance costs.&lt;/p&gt;

&lt;p&gt;Lambda runtimes are regularly &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported" rel="noopener noreferrer"&gt;updated and deprecated&lt;/a&gt;, libraries and frameworks receive security patches and improvements, and the tech stack evolves rapidly. Keeping everything up-to-date, especially in complex architectures, is a significant task. Yet, many internal applications are developed once for a specific use case and expected to operate with minimal maintenance for years, leading to technical debt - a costly and cumbersome issue.&lt;/p&gt;

&lt;p&gt;AWS services, in contrast, have stable APIs with minimal breaking changes. If a security issue arises, AWS typically addresses it faster than individual developers could manage for a library or framework. As a side effect, you end up configuring existing AWS services for most of your applications rather than programming in the traditional way. While becoming a YAML developer might seem less exciting, maintaining configuration is much easier over time than managing a Python or JavaScript project.&lt;/p&gt;

&lt;p&gt;By "gluing together" AWS services, you improve operations predictability, uptime, and reliability. Native services handle traffic well, and scalability issues are more often architectural than technological. Additionally, your operation costs align closely with actual usage, which businesses appreciate. While Lambdas are quick to deploy, they can be costly to maintain over time. Relying solely on AWS services may be more expensive to develop initially but is cheaper to operate in the long run. The benefits are great. The question is whether you are willing to exchange the safety and familiarity of standard programming languages for significantly lower maintenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to reality
&lt;/h2&gt;

&lt;p&gt;It’s very likely that you will occasionally need a Lambda function or even a container, and that's perfectly fine. When serverless concepts emerged almost a decade ago, it seemed unlikely that fully serverless systems would become a reality. Yet today, even healthcare and financial sectors rely significantly on serverless architectures, and startups are building fully serverless applications.&lt;/p&gt;

&lt;p&gt;I strongly believe that with the constantly improving developer experience around AWS services, the need to write our own Lambdas for undifferentiated heavy-lifting tasks will diminish. Instead, we will integrate these ready-made "LEGO blocks" as needed for the majority of the use cases with significantly lower maintenance costs and increased reliability at the same time. This will enable us to quickly prototype applications without requiring major refactorings later on. I encourage you to give these ideas a chance and carefully consider whether you truly need your next Lambda function.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how about the buzzword?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backendless&lt;/strong&gt;, &lt;strong&gt;Backend as a Service&lt;/strong&gt;, and &lt;strong&gt;Backend as Configuration&lt;/strong&gt; are a few of my own suggestions that, while not perfect, might spark some inspiration. If you come across a catchy buzzword for this architectural style, please share it in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Afraid of outgrowing AWS Rekognition? Try YOLO in Lambda.</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Thu, 28 Mar 2024 09:33:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/afraid-of-outgrowing-aws-rekognition-try-yolo-in-lambda-25lk</link>
      <guid>https://dev.to/aws-builders/afraid-of-outgrowing-aws-rekognition-try-yolo-in-lambda-25lk</guid>
      <description>&lt;p&gt;When it comes to machine learning, the cloud is a straightforward choice due to its robust computing power. Cloud can also provide a variety of services that can speed up your efforts significantly. For the purpose of this article, we will use computer vision as an example of a machine learning use case and since my cloud of choice is AWS, the most relevant service is AWS Rekognition. It offers an API where you can send a picture and it will respond with a list of keywords that have been identified there. This is perfect for when you are building a PoC of your new product, as these ready-made ML services can bring your idea to life in a cheap and fast manner. &lt;/p&gt;

&lt;p&gt;However, there’s a catch. As your product gains traction and real-world customers, you might find yourself bumping against the limits of AWS Rekognition. Suddenly, what was once a smooth ride can become a bottleneck, both in terms of accuracy and cost. Fear not! Switching quite early to a more flexible solution can save you a lot of headaches - like running a pre-trained model in an AWS Lambda, harnessing the same serverless properties that AWS Rekognition provides.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore the limitations of AWS Rekognition, dive into the world of YOLO, and guide you through implementing this more adaptable solution. Plus, we’ll look at a cost comparison to help you make an informed decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Rekognition
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/rekognition/"&gt;AWS Rekognition&lt;/a&gt; is a great choice for many types of real-world projects or just for testing an idea on your images. The issue eventually comes with its cost, unfortunately, which we will see later in a specific example. Don’t get me wrong, Rekognition is a great service and I love to use it for its simplicity and reliable performance on quite a few projects. &lt;/p&gt;

&lt;p&gt;Another downside is the inability to grow the model with your business. When embarking on a new project, requirements are often modest. AWS Rekognition shines here, catapulting you from zero to 80% completion in record time. Over time, however, you realise there are specific cases you need to optimise your application for, and Rekognition no longer works flawlessly in those situations. You can bring your own dataset and use &lt;a href="https://aws.amazon.com/rekognition/custom-labels-features/"&gt;AWS Rekognition Custom Labels&lt;/a&gt;, which is a fantastic feature. It is a bit of work to get right, but once it works, it can get you quite far. Unfortunately, if you outgrow even this feature and need to tweak your model further, the only option is to start over with a custom model. This limitation can feel restrictive, hindering your model’s evolution alongside your business needs, which brings me to pre-trained models.&lt;/p&gt;

&lt;p&gt;Overall, Rekognition can get you started quite quickly and bring you a lot of added value. Yet, as your project matures, its constraints and pricing can become bottlenecks and you will need to dance around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  YOLO and Friends: A Versatile Approach to Object Detection
&lt;/h2&gt;

&lt;p&gt;You Only Look Once, commonly known as &lt;a href="https://docs.ultralytics.com/models/yolov8/"&gt;YOLO&lt;/a&gt;, is a well-established player in the world of computer vision. This pre-trained model boasts remarkable speed and accuracy, making it an excellent choice for detecting hundreds of object categories within images. Whether you’re building a recommendation system, enhancing security, or analyzing satellite imagery, YOLO has your back. There are also many alternatives out there so do your research and pick the best model for your images. &lt;a href="https://huggingface.co/models"&gt;Hugging Face&lt;/a&gt; is a great starting point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvcx0y34q4nc3c7wyure.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvcx0y34q4nc3c7wyure.png" alt="Hugging Face Computer Vision Categories" width="800" height="941"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;YOLO comes in various sizes, akin to a wardrobe of pre-trained outfits. Start with the smallest: lightweight and nimble. As your business needs evolve, upgrade to larger variants. But here’s the magic: YOLO’s flexibility extends beyond off-the-rack sizes. You can fine-tune it with your own data, enhancing accuracy to suit your unique context. It takes a bit of knowledge, but by the time your business needs such accuracy, you will probably also have the time and budget for it. It is not a silver bullet, but unlike with AWS Rekognition, you will not eventually hit a dead end where starting over is the only option.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;Python is the de facto standard programming language for data analysis and machine learning these days. If you are familiar with Python, go ahead and leverage the YOLO documentation to implement a Python Lambda function for image classification. For educational purposes, I think it’s beneficial to assume you are not necessarily running everything in Python. I like to use JavaScript/TypeScript and I will use it here as an example. If you prefer to write your e-commerce solution, FinTech startup, or CRM in a different language, that should not stop you here.&lt;/p&gt;

&lt;h3&gt;
  
  
  ONNX Runtime
&lt;/h3&gt;

&lt;p&gt;Developing machine learning (ML) models in Python has become second nature for data scientists and engineers. Yet, the beauty of ML lies not in the language used during development, but in the ability to deploy and run those models seamlessly across diverse platforms. ONNX Runtime is a set of tools that lets you achieve portability of models across different platforms and languages, as well as a variety of computing hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F614m2xv9ww6b0hnfbzcc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F614m2xv9ww6b0hnfbzcc.jpg" alt="ONNX Runtime Schema" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can choose a model developed in several of the common frameworks and export it into the &lt;code&gt;*.onnx&lt;/code&gt; format, which can be easily run on a specialized cloud processor or in your web browser. Let’s draw a parallel. Think of ONNX as the bytecode of the ML world. When you compile Java code, it transforms into bytecode, a portable representation that can journey across platforms. The ONNX Runtime, our Java Virtual Machine (JVM) for ML, then takes this intermediate format and runs it with optimizations specific to the hardware configuration of the device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu4rxr8hab004g6vbcsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu4rxr8hab004g6vbcsc.png" alt="ONNX Runtime Platforms" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ONNX Runtime is like the JVM: wherever it runs, you can execute your models the same way. In our case, we will use the JavaScript version of &lt;a href="https://www.npmjs.com/package/onnxruntime-node"&gt;ONNX Runtime for Node.js&lt;/a&gt;, which runs in AWS Lambda as well. If you prefer a different language or different infrastructure, feel free to leverage it as long as ONNX Runtime is available for your tech stack. Even though ONNX Runtime and the JVM are quite different technologies, they share some common concepts at their core.&lt;/p&gt;

&lt;h3&gt;
  
  
  PyTorch to ONNX Conversion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwwrj4tupp6b9cwuocqd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwwrj4tupp6b9cwuocqd.jpg" alt="PyTorch to ONNX" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ONNX covers many languages and platforms, yet a touch of Python remains essential to convert the model first. Because YOLO is built on PyTorch, let’s look at that variant. If you use TensorFlow or any other supported model format, the principle stays the same; only the specific commands differ.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./yolov8n.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onnx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./yolov8n.onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, you need to load the original model. In our case it is &lt;code&gt;yolov8n.pt&lt;/code&gt; - the smallest off-the-shelf YOLO variant. Then we convert it to the &lt;code&gt;yolov8n.onnx&lt;/code&gt; format using the export method PyTorch already provides. Keep in mind that you can tweak the model during the export; if you are planning to put a heavy load on the model, this can increase performance and save you quite a lot of money. For simplicity, we will just export it as it is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference
&lt;/h3&gt;

&lt;p&gt;Running the inference is almost as easy as the export. I will use JavaScript here to demonstrate the different-language scenario you may well have. The Node.js ONNX Runtime library is quite lightweight and powerful.&lt;/p&gt;

&lt;p&gt;The inference process can be broken down into three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Preprocessing - prepare the picture for the model&lt;/li&gt;
&lt;li&gt;Prediction - ask the model to execute the classification&lt;/li&gt;
&lt;li&gt;Postprocessing - convert the output into human-readable format&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We create an &lt;code&gt;InferenceSession&lt;/code&gt; instance which loads the ONNX model. Then you can just run it with the input to classify.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;InferenceSession&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;onnxruntime-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;InferenceSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./yolov8n.onnx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The input for the inference is a picture converted into the specific format YOLO requires. In simple terms, it needs to be an array of all pixels ordered by colour channel: &lt;code&gt;[...red, ...green, ...blue]&lt;/code&gt;. Each pixel in a standard color image has a red, green, and blue value, which together mix the final color. So if your picture has 2 pixels, you put the red value of the first pixel into the array and move to the next pixel. Once both reds are in the array, repeat the same process for the greens and blues.&lt;/p&gt;

&lt;p&gt;Pixel1: &lt;code&gt;[R1, G1, B1]&lt;/code&gt;&lt;br&gt;
Pixel2: &lt;code&gt;[R2, G2, B2]&lt;/code&gt;&lt;br&gt;
YOLO Input: &lt;code&gt;[R1, R2, G1, G2, B1, B2]&lt;/code&gt;&lt;/p&gt;
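&lt;p&gt;The two-pixel example above can be sketched in plain JavaScript. This is a minimal illustration; &lt;code&gt;toPlanarRGB&lt;/code&gt; is a hypothetical helper name, and it assumes interleaved RGBA input such as decoded PNG data:&lt;/p&gt;

```javascript
// Convert interleaved RGBA pixel data (as image decoders typically
// produce) into the planar [..red, ..green, ..blue] layout,
// scaling each channel value into the 0..1 range.
function toPlanarRGB(pixels, width, height) {
  const size = width * height;
  const planar = new Float32Array(size * 3);
  for (let i = 0; i < size; i++) {
    planar[i] = pixels[i * 4] / 255;                // red plane
    planar[size + i] = pixels[i * 4 + 1] / 255;     // green plane
    planar[2 * size + i] = pixels[i * 4 + 2] / 255; // blue plane
  }
  return planar;
}

// Two pixels: pure red (255, 0, 0) and pure green (0, 255, 0), RGBA
const pixels = Uint8Array.from([255, 0, 0, 255, 0, 255, 0, 255]);
const planar = toPlanarRGB(pixels, 2, 1);
// planar is [R1, R2, G1, G2, B1, B2] = [1, 0, 0, 1, 0, 0]
```

&lt;p&gt;The division by 255 reflects the common convention of feeding the model values between 0 and 1; check the preprocessing of your specific export.&lt;/p&gt;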

&lt;p&gt;For more details, explore the YOLO v8 &lt;a href="https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-ONNXRuntime/main.py#L101"&gt;reference implementation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;YOLO returns three types of information. They come out of the model in a rather raw format, but with a little postprocessing (&lt;a href="https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c"&gt;NMS&lt;/a&gt;) they look like this: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Label - the model returns an identifier, which you can easily map to the name of the object category&lt;/li&gt;
&lt;li&gt;Bounding Box - coordinates of where in the picture the label was detected&lt;/li&gt;
&lt;li&gt;Probability - a number from 0 to 1 expressing how confident the model is about the label&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"person"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"probability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.41142538189888&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"boundingBox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;730.8682617187501&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;400.01552124023436&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;150.67451171875&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;180.06134033203125&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
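&lt;p&gt;The NMS step linked above can be sketched as a greedy filter over detections in this shape (a minimal illustration with hypothetical helper names, not the tuned reference implementation):&lt;/p&gt;

```javascript
// Intersection-over-Union of two [x, y, width, height] boxes.
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[0] + a[2], b[0] + b[2]);
  const y2 = Math.min(a[1] + a[3], b[1] + b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = a[2] * a[3] + b[2] * b[3] - inter;
  return union === 0 ? 0 : inter / union;
}

// Greedy NMS: keep the most confident detection, drop same-label
// detections that overlap it too much, repeat.
function nms(detections, iouThreshold = 0.5) {
  const sorted = [...detections].sort((a, b) => b.probability - a.probability);
  const kept = [];
  for (const det of sorted) {
    const suppressed = kept.some(
      (k) => k.label === det.label && iou(k.boundingBox, det.boundingBox) > iouThreshold
    );
    if (!suppressed) kept.push(det);
  }
  return kept;
}

const kept = nms([
  { label: "person", probability: 0.9, boundingBox: [0, 0, 10, 10] },
  { label: "person", probability: 0.4, boundingBox: [1, 1, 10, 10] },
  { label: "dog", probability: 0.5, boundingBox: [100, 100, 10, 10] },
]);
// kept contains 2 detections: the 0.9 person and the dog
```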


&lt;p&gt;It’s important to note that bounding box coordinates are normalized. You will therefore need to account for the original image width and height to calculate the specific coordinates if you want to draw rectangles around the detected objects like in my example.&lt;/p&gt;
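&lt;p&gt;As a sketch of that rescaling, assuming a 640×640 model input and &lt;code&gt;[x, y, width, height]&lt;/code&gt; boxes (verify the exact layout against your export):&lt;/p&gt;

```javascript
// Rescale a [x, y, width, height] box from model-input coordinates
// back to the original image's pixel coordinates.
function scaleBox(box, originalWidth, originalHeight, modelSize = 640) {
  const [x, y, w, h] = box;
  const sx = originalWidth / modelSize;
  const sy = originalHeight / modelSize;
  return [x * sx, y * sy, w * sx, h * sy];
}

// A box at (320, 160, 64, 64) in 640x640 model space lands at
// (960, 240, 192, 96) on a 1920x960 photo.
const scaled = scaleBox([320, 160, 64, 64], 1920, 960);
```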

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnxapszbbbfqopcvfiwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnxapszbbbfqopcvfiwx.png" alt="YOLO Object Detection Result" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Microservices and event-driven architecture (EDA) are the preferred choices in modern cloud architectures. The only step missing is wrapping an API around the model and deploying it as a Lambda function, which fits this architectural paradigm well from several points of view.&lt;/p&gt;

&lt;p&gt;Lambda is a serverless compute service that scales up quickly to handle quite a big load and scales down to zero, meaning Lambda itself costs nothing when not used. It’s an ideal candidate for infrequent asynchronous workloads that annotate images. Scenarios like automatic content moderation, building an image search index, or improving your image ALT attributes for SEO are perfect for this architecture.&lt;/p&gt;

&lt;p&gt;Ideally, we would like to leverage an asynchronous SQS queue or EventBridge as a source of the events. Furthermore, storing the actual image in S3 and the results in DynamoDB can be a great addition. These architectural decisions depend highly on your application though.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7ily8i3oytjegwf9lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7ily8i3oytjegwf9lh.png" alt="YOLO Lambda Infrastructure" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  AWS Rekognition vs. YOLO Lambda Comparison
&lt;/h2&gt;

&lt;p&gt;I’ve mentioned earlier that AWS Rekognition is not the cheapest service, especially at scale so let’s compare the YOLO Lambda solution with Rekognition. Note that this is not a detailed benchmark, just a high-level comparison. Your implementation may vary.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://aws.amazon.com/rekognition/pricing/#Image_Analysis"&gt;AWS Rekognition&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;YOLO &lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;Lambda&lt;/a&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;~300ms&lt;/td&gt;
&lt;td&gt;~1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per image&lt;/td&gt;
&lt;td&gt;$0.0010&lt;/td&gt;
&lt;td&gt;$0.0000166667&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Images per $1&lt;/td&gt;
&lt;td&gt;~1000&lt;/td&gt;
&lt;td&gt;~60K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resources&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1024 MB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;td&gt;YOLO: 8n&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the price difference (~60x) is quite significant in favor of YOLO Lambda, although the comparison is not strictly apples to apples. I used the smallest YOLO model, which is not as accurate as Rekognition. For some use cases that might be enough, though. On the other hand, you can leverage the flexibility of a custom image model and upgrade the accuracy as you go, or even fine-tune it with your own dataset.&lt;/p&gt;
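&lt;p&gt;The per-image figure in the table follows directly from Lambda’s GB-second pricing; here is a quick sanity check (pricing for x86 in us-east-1 at the time of the comparison, ignoring the small per-request charge):&lt;/p&gt;

```javascript
// Lambda compute price per GB-second (x86, us-east-1 at time of writing)
const PRICE_PER_GB_SECOND = 0.0000166667;

// One inference: roughly 1 second on a 1024 MB (1 GB) function
const costPerImage = 1 * (1024 / 1024) * PRICE_PER_GB_SECOND;

// Roughly 60K images per dollar, versus ~1000 with Rekognition
const imagesPerDollar = Math.floor(1 / costPerImage);
const ratio = Math.round(0.001 / costPerImage); // ~60x cheaper
```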

&lt;p&gt;With Rekognition, you get the simplicity of an API, which in many cases is great to start with and in some cases even to stay with. YOLO Lambda is a bit more complicated to build and operate; however, it gives you great flexibility in terms of price, functionality, performance, and accuracy. Both variants can be further optimised for performance and cost, so don’t take this calculation for granted.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Computer Vision is a very interesting and helpful capability if you are working with pictures. It can help you moderate content, identify specific objects, or even improve the SEO of your e-commerce website (article coming soon).&lt;/p&gt;

&lt;p&gt;Achieving the best accuracy vs. price combination can sometimes be tricky with API-based services. Therefore we explored the possibilities of how to take advantage of pre-trained models like YOLO and run them in AWS Lambda with much more control and options to tweak for your specific use case. Finally, we compared the pros and cons of each solution from the cost perspective so you can pick the right one for you.&lt;/p&gt;

&lt;p&gt;For those interested in delving deeper into this topic, I recently spoke at a conference where I discussed these concepts in greater detail. I encourage you to watch the video of the talk for additional insights and practical examples. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/qeG238sCSZw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Are you leveraging computer vision in your application? Leave a comment below, I would love to hear your thoughts and experience.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>javascript</category>
      <category>machinelearning</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Afraid of outgrowing AWS Rekognition? Try YOLO in Lambda.</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Tue, 26 Mar 2024 20:00:08 +0000</pubDate>
      <link>https://dev.to/michalsimon/afraid-of-outgrowing-aws-rekognition-try-yolo-in-lambda-4pcf</link>
      <guid>https://dev.to/michalsimon/afraid-of-outgrowing-aws-rekognition-try-yolo-in-lambda-4pcf</guid>
      <description>&lt;p&gt;When it comes to machine learning, the cloud is a straightforward choice due to its robust computing power. Cloud can also provide a variety of services that can speed up your efforts significantly. For the purpose of this article, we will use computer vision as an example of a machine learning use case and since my cloud of choice is AWS, the most relevant service is AWS Rekognition. It offers an API where you can send a picture and it will respond with a list of keywords that have been identified there. This is perfect for when you are building a PoC of your new product, as these ready-made ML services can bring your idea to life in a cheap and fast manner. &lt;/p&gt;

&lt;p&gt;However, there’s a catch. As your product gains traction and real-world customers, you might find yourself bumping against the limits of AWS Rekognition. Suddenly, what was once a smooth ride can become a bottleneck, both in terms of accuracy and cost. Fear not! Switching quite early to a more flexible solution can save you a lot of headaches - like running a pre-trained model in an AWS Lambda, harnessing the same serverless properties that AWS Rekognition provides.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore the limitations of AWS Rekognition, dive into the world of YOLO, and guide you through implementing this more adaptable solution. Plus, we’ll look at a cost comparison to help you make an informed decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Rekognition
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/rekognition/"&gt;AWS Rekognition&lt;/a&gt; is a great choice for many types of real-world projects or just for testing an idea on your images. The issue eventually comes with its cost, unfortunately, which we will see later in a specific example. Don’t get me wrong, Rekognition is a great service and I love to use it for its simplicity and reliable performance on quite a few projects. &lt;/p&gt;

&lt;p&gt;Another downside is the inability to grow the model with your business. When embarking on a new project, requirements are often modest. AWS Rekognition shines here, catapulting you from zero to 80% completion in record time. Over time, however, you realise there are some specific cases that you need to optimise your application for and Rekognition is not working flawlessly in those situations anymore. You can bring your own dataset and use &lt;a href="https://aws.amazon.com/rekognition/custom-labels-features/"&gt;AWS Rekognition Custom Labels&lt;/a&gt; which is a fantastic feature. It is a bit of work to get it right but once it works it can get you quite far. Unfortunately, if you outgrow even this feature and you need to tweak your model even further, the only option is to start over with a custom model. This limitation can feel restrictive, hindering your model’s evolution alongside your business needs which brings me to pre-trained models.&lt;/p&gt;

&lt;p&gt;Overall, Rekognition can get you started quite quickly and bring you a lot of added value. Yet, as your project matures, its constraints and pricing can become bottlenecks and you will need to dance around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  YOLO and Friends: A Versatile Approach to Object Detection
&lt;/h2&gt;

&lt;p&gt;You Only Look Once, commonly known as &lt;a href="https://docs.ultralytics.com/models/yolov8/"&gt;YOLO&lt;/a&gt;, is a well-established player in the world of computer vision. This pre-trained model boasts remarkable speed and accuracy, making it an excellent choice for detecting hundreds of object categories within images. Whether you’re building a recommendation system, enhancing security, or analyzing satellite imagery, YOLO has your back. There are also many alternatives out there so do your research and pick the best model for your images. &lt;a href="https://huggingface.co/models"&gt;Hugging Face&lt;/a&gt; is a great starting point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvcx0y34q4nc3c7wyure.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvcx0y34q4nc3c7wyure.png" alt="Hugging Face Computer Vision Categories" width="800" height="941"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;YOLO comes in various sizes, akin to a wardrobe of pre-trained outfits. Start with the smallest, which is lightweight and nimble, and as your business needs evolve, upgrade to larger variants. But here’s the magic: YOLO’s flexibility extends beyond off-the-rack sizes. You can fine-tune it with your own data, enhancing accuracy to suit your unique context. It takes a little knowledge, but by the time your business needs such accuracy, you will probably also have some time and budget for it. It is not a silver bullet, but unlike with AWS Rekognition, you will not eventually hit a dead end where the only option is starting over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;Python is these days the de-facto standard programming language for data analysis and machine learning. If you are familiar with it, go ahead and leverage the YOLO documentation to implement a Python Lambda function for image classification. For educational purposes, though, I think it’s beneficial to assume you are not necessarily running everything in Python. I like to use JavaScript/TypeScript and will use it here as an example. If you prefer to write your e-commerce solution, FinTech startup, or CRM in a different language, Python does not have to hold you back.&lt;/p&gt;

&lt;h3&gt;
  
  
  ONNX Runtime
&lt;/h3&gt;

&lt;p&gt;Developing machine learning (ML) models in Python has become second nature for data scientists and engineers. Yet, the beauty of ML lies not in the language used during development, but in the ability to deploy and run those models seamlessly across diverse platforms. ONNX Runtime is a set of tools that lets you achieve portability of models across different platforms and languages, as well as a variety of computing hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F614m2xv9ww6b0hnfbzcc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F614m2xv9ww6b0hnfbzcc.jpg" alt="ONNX Runtime Schema" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can take a model developed in any of the common frameworks and export it into the &lt;code&gt;*.onnx&lt;/code&gt; format, which can then be easily run on a specialized cloud processor or in your web browser. Let’s draw a parallel. Think of ONNX as the bytecode of the ML world. When you compile Java code, it transforms into bytecode, a portable representation that can journey across platforms. The ONNX Runtime, our Java Virtual Machine (JVM) for ML, then takes this intermediate format and runs it, optimized specifically for the hardware configuration of the device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu4rxr8hab004g6vbcsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu4rxr8hab004g6vbcsc.png" alt="ONNX Runtime Platforms" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ONNX Runtime is like the JVM: wherever it runs, you can execute your models the same way. In our case, we will use the JavaScript version of &lt;a href="https://www.npmjs.com/package/onnxruntime-node"&gt;ONNX Runtime for Node.js&lt;/a&gt;, which runs in AWS Lambda as well. If you prefer a different language or different infrastructure, feel free to use it as long as ONNX Runtime is available for your tech stack. Even though ONNX Runtime and the JVM are quite different technologies, they share some common concepts at their core.&lt;/p&gt;

&lt;h3&gt;
  
  
  PyTorch to ONNX Conversion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwwrj4tupp6b9cwuocqd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwwrj4tupp6b9cwuocqd.jpg" alt="PyTorch to ONNX" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ONNX covers many languages and platforms, yet a touch of Python remains essential to convert the model first. Because YOLO is built on PyTorch, let’s look at that variant. If you use TensorFlow or any other supported model format, the principle stays the same; only the specific commands differ.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./yolov8n.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onnx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./yolov8n.onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, you need to load the original model. In our case it is &lt;code&gt;yolov8n.pt&lt;/code&gt; - the smallest off-the-shelf YOLO variant. Then we convert it to the &lt;code&gt;yolov8n.onnx&lt;/code&gt; format using the export method PyTorch already provides. Keep in mind that you can tweak the model during the export; if you are planning to put a heavy load on the model, this can increase performance and save you quite a lot of money. For simplicity, we will just export it as it is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference
&lt;/h3&gt;

&lt;p&gt;Running the inference is almost as easy as the export. I will use JavaScript here to demonstrate the different-language scenario you may well have. The Node.js ONNX Runtime library is quite lightweight and powerful.&lt;/p&gt;

&lt;p&gt;The inference process can be broken down into three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Preprocessing - prepare the picture for the model&lt;/li&gt;
&lt;li&gt;Prediction - ask the model to execute the classification&lt;/li&gt;
&lt;li&gt;Postprocessing - convert the output into human-readable format&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We create an &lt;code&gt;InferenceSession&lt;/code&gt; instance which loads the ONNX model. Then you can just run it with the input to classify.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;InferenceSession&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;onnxruntime-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;InferenceSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./yolov8n.onnx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The input for the inference is a picture converted into the specific format YOLO requires. In simple terms, it needs to be an array of all pixels ordered by colour channel: &lt;code&gt;[...red, ...green, ...blue]&lt;/code&gt;. Each pixel in a standard color image has a red, green, and blue value, which together mix the final color. So if your picture has 2 pixels, you put the red value of the first pixel into the array and move to the next pixel. Once both reds are in the array, repeat the same process for the greens and blues.&lt;/p&gt;

&lt;p&gt;Pixel1: &lt;code&gt;[R1, G1, B1]&lt;/code&gt;&lt;br&gt;
Pixel2: &lt;code&gt;[R2, G2, B2]&lt;/code&gt;&lt;br&gt;
YOLO Input: &lt;code&gt;[R1, R2, G1, G2, B1, B2]&lt;/code&gt;&lt;/p&gt;
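&lt;p&gt;The two-pixel example above can be sketched in plain JavaScript. This is a minimal illustration; &lt;code&gt;toPlanarRGB&lt;/code&gt; is a hypothetical helper name, and it assumes interleaved RGBA input such as decoded PNG data:&lt;/p&gt;

```javascript
// Convert interleaved RGBA pixel data (as image decoders typically
// produce) into the planar [..red, ..green, ..blue] layout,
// scaling each channel value into the 0..1 range.
function toPlanarRGB(pixels, width, height) {
  const size = width * height;
  const planar = new Float32Array(size * 3);
  for (let i = 0; i < size; i++) {
    planar[i] = pixels[i * 4] / 255;                // red plane
    planar[size + i] = pixels[i * 4 + 1] / 255;     // green plane
    planar[2 * size + i] = pixels[i * 4 + 2] / 255; // blue plane
  }
  return planar;
}

// Two pixels: pure red (255, 0, 0) and pure green (0, 255, 0), RGBA
const pixels = Uint8Array.from([255, 0, 0, 255, 0, 255, 0, 255]);
const planar = toPlanarRGB(pixels, 2, 1);
// planar is [R1, R2, G1, G2, B1, B2] = [1, 0, 0, 1, 0, 0]
```

&lt;p&gt;The division by 255 reflects the common convention of feeding the model values between 0 and 1; check the preprocessing of your specific export.&lt;/p&gt;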

&lt;p&gt;For more details, explore the YOLO v8 &lt;a href="https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-ONNXRuntime/main.py#L101"&gt;reference implementation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;YOLO returns three types of information. They come out of the model in a rather raw format, but with a little postprocessing (&lt;a href="https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c"&gt;NMS&lt;/a&gt;) they look like this: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Label - the model returns an identifier, which you can easily map to the name of the object category&lt;/li&gt;
&lt;li&gt;Bounding Box - coordinates of where in the picture the label was detected&lt;/li&gt;
&lt;li&gt;Probability - a number from 0 to 1 expressing how confident the model is about the label&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"person"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"probability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.41142538189888&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"boundingBox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;730.8682617187501&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;400.01552124023436&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;150.67451171875&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="mf"&gt;180.06134033203125&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
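&lt;p&gt;The NMS step linked above can be sketched as a greedy filter over detections in this shape (a minimal illustration with hypothetical helper names, not the tuned reference implementation):&lt;/p&gt;

```javascript
// Intersection-over-Union of two [x, y, width, height] boxes.
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[0] + a[2], b[0] + b[2]);
  const y2 = Math.min(a[1] + a[3], b[1] + b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = a[2] * a[3] + b[2] * b[3] - inter;
  return union === 0 ? 0 : inter / union;
}

// Greedy NMS: keep the most confident detection, drop same-label
// detections that overlap it too much, repeat.
function nms(detections, iouThreshold = 0.5) {
  const sorted = [...detections].sort((a, b) => b.probability - a.probability);
  const kept = [];
  for (const det of sorted) {
    const suppressed = kept.some(
      (k) => k.label === det.label && iou(k.boundingBox, det.boundingBox) > iouThreshold
    );
    if (!suppressed) kept.push(det);
  }
  return kept;
}

const kept = nms([
  { label: "person", probability: 0.9, boundingBox: [0, 0, 10, 10] },
  { label: "person", probability: 0.4, boundingBox: [1, 1, 10, 10] },
  { label: "dog", probability: 0.5, boundingBox: [100, 100, 10, 10] },
]);
// kept contains 2 detections: the 0.9 person and the dog
```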


&lt;p&gt;It’s important to note that bounding box coordinates are normalized. You will therefore need to account for the original image width and height to calculate the specific coordinates if you want to draw rectangles around the detected objects like in my example.&lt;/p&gt;
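&lt;p&gt;As a sketch of that rescaling, assuming a 640×640 model input and &lt;code&gt;[x, y, width, height]&lt;/code&gt; boxes (verify the exact layout against your export):&lt;/p&gt;

```javascript
// Rescale a [x, y, width, height] box from model-input coordinates
// back to the original image's pixel coordinates.
function scaleBox(box, originalWidth, originalHeight, modelSize = 640) {
  const [x, y, w, h] = box;
  const sx = originalWidth / modelSize;
  const sy = originalHeight / modelSize;
  return [x * sx, y * sy, w * sx, h * sy];
}

// A box at (320, 160, 64, 64) in 640x640 model space lands at
// (960, 240, 192, 96) on a 1920x960 photo.
const scaled = scaleBox([320, 160, 64, 64], 1920, 960);
```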

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnxapszbbbfqopcvfiwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnxapszbbbfqopcvfiwx.png" alt="YOLO Object Detection Result" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Microservices and event-driven architecture (EDA) are the preferred choices in modern cloud architectures. The only missing step is wrapping an API around the model and deploying it as a Lambda function, which fits this architecture paradigm well from several points of view.&lt;/p&gt;

&lt;p&gt;Lambda is a serverless compute service that scales quickly to handle quite a big load and also scales down to zero, which means Lambda itself costs nothing when not used. It’s an ideal candidate for an infrequent, asynchronous image-annotation workload. Scenarios like automatic content moderation, building an image search index, or improving your image ALT attributes for SEO are a perfect fit for this architecture.&lt;/p&gt;

&lt;p&gt;Ideally, we would like to leverage an SQS queue or EventBridge as the asynchronous event source. Furthermore, storing the actual image in S3 and the results in DynamoDB can be a great addition. These architectural decisions depend highly on your application, though.&lt;/p&gt;
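&lt;p&gt;A hedged sketch of what an S3-triggered handler in this architecture could look like (the &lt;code&gt;run_yolo&lt;/code&gt; function and the &lt;code&gt;image-annotations&lt;/code&gt; table name are hypothetical placeholders, not part of the article’s actual implementation):&lt;/p&gt;

```python
import json
import urllib.parse

def parse_s3_record(record):
    """Extract bucket name and object key from one S3 event record.
    Object keys arrive URL-encoded in S3 notifications."""
    s3 = record["s3"]
    return s3["bucket"]["name"], urllib.parse.unquote_plus(s3["object"]["key"])

def handler(event, context):
    # Hypothetical sketch: run_yolo() and the table name are placeholders.
    import boto3
    table = boto3.resource("dynamodb").Table("image-annotations")
    for record in event["Records"]:
        bucket, key = parse_s3_record(record)
        detections = run_yolo(bucket, key)  # placeholder for model inference
        table.put_item(Item={"pk": f"{bucket}/{key}",
                             "detections": json.dumps(detections)})
```

The same handler works behind SQS or EventBridge with only the record-parsing step changed.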

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7ily8i3oytjegwf9lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7ily8i3oytjegwf9lh.png" alt="YOLO Lambda Infrastructure" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  AWS Rekognition vs. YOLO Lambda Comparison
&lt;/h2&gt;

&lt;p&gt;I’ve mentioned earlier that AWS Rekognition is not the cheapest service, especially at scale, so let’s compare the YOLO Lambda solution with Rekognition. Note that this is not a detailed benchmark, just a high-level comparison; your implementation may vary.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://aws.amazon.com/rekognition/pricing/#Image_Analysis"&gt;AWS Rekognition&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;YOLO &lt;a href="https://aws.amazon.com/lambda/pricing/"&gt;Lambda&lt;/a&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;~300ms&lt;/td&gt;
&lt;td&gt;~1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per image&lt;/td&gt;
&lt;td&gt;$0.0010&lt;/td&gt;
&lt;td&gt;$0.0000166667&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Images per $1&lt;/td&gt;
&lt;td&gt;~1000&lt;/td&gt;
&lt;td&gt;~60K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resources&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1024 MB RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;td&gt;YOLO: 8n&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the price difference (~60x) is quite significant in favor of YOLO Lambda, although the comparison is not strictly apples to apples. I used the smallest YOLO model, which is not as accurate as Rekognition. For some use cases that might be enough, though. On the other hand, you can leverage the flexibility of a custom image model and upgrade to a larger, more accurate model as you go, or even fine-tune it with your own dataset.&lt;/p&gt;
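&lt;p&gt;The per-image Lambda figure in the table can be reproduced with a quick back-of-the-envelope calculation (assuming the us-east-1 x86 rate of $0.0000166667 per GB-second and ignoring the smaller per-request charge):&lt;/p&gt;

```python
# Lambda compute is billed per GB-second; us-east-1 x86 rate assumed here.
GB_SECOND_PRICE = 0.0000166667

def lambda_cost_per_image(memory_mb, duration_s):
    """Approximate Lambda compute cost for one inference,
    ignoring the per-request charge ($0.20 per 1M invocations)."""
    return (memory_mb / 1024) * duration_s * GB_SECOND_PRICE

# 1024 MB for ~1 s per image, as in the comparison table
cost = lambda_cost_per_image(memory_mb=1024, duration_s=1.0)
images_per_dollar = 1 / cost  # roughly 60K images per $1
```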

&lt;p&gt;With Rekognition, you get the simplicity of an API, which is in many cases great to start with and in some cases even to stay with. YOLO Lambda is a bit more complicated to build and operate; however, it gives you great flexibility in terms of price, functionality, performance, and accuracy. Both variants can be further optimized for performance and cost, so don’t take this calculation for granted.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Computer Vision is a very interesting and helpful capability if you are working with pictures. It can help you moderate content, identify specific objects, or even improve the SEO of your e-commerce website (article coming soon).&lt;/p&gt;

&lt;p&gt;Achieving the best accuracy-to-price combination can sometimes be tricky with API-based services. Therefore, we explored how to take advantage of pre-trained models like YOLO and run them in AWS Lambda with much more control and room to tweak for your specific use case. Finally, we compared the pros and cons of each solution from the cost perspective so you can pick the right one for you.&lt;/p&gt;

&lt;p&gt;For those interested in delving deeper into this topic, I recently spoke at a conference where I discussed these concepts in greater detail. I encourage you to watch the video of the talk for additional insights and practical examples. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/qeG238sCSZw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Are you leveraging computer vision in your application? Leave a comment below; I would love to hear your thoughts and experience.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>serverless</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Homelab - Lego for tech enthusiasts</title>
      <dc:creator>Michal Šimon</dc:creator>
      <pubDate>Wed, 13 Sep 2023 08:11:51 +0000</pubDate>
      <link>https://dev.to/michalsimon/homelab-lego-for-tech-enthusiasts-4im</link>
      <guid>https://dev.to/michalsimon/homelab-lego-for-tech-enthusiasts-4im</guid>
      <description>&lt;p&gt;I’m a cloud enthusiast who loves tinkering with different technologies and learning new things. That’s why I have decided to invest quite some time in setting up my own homelab, a computing environment where I can run my applications and services, experiment with different configurations and scenarios, and have fun with my gadgets. In this article, I want to share what a homelab is, how I built mine, and why you should consider building one too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Homelab
&lt;/h2&gt;

&lt;p&gt;A homelab is your personal computing playground: a small on-premise setup for anything you want to test, experiment with, and learn on. It’s a great way to build a piece of infrastructure that serves a non-critical workload, let it break, and fix it again to reveal opportunities for improvement in your architecture design. It’s also great for gaining hands-on experience and deepening your understanding of infrastructure concepts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjetavosydadff4b7h59u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjetavosydadff4b7h59u.jpg" alt="HP ProLiant MicroServer Gen10 Plus" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Among all the fun, a homelab also comes with several challenges, mainly the time, money, and energy needed to set up and maintain extra hardware. Trust me, it’s a significant commitment, so make sure you manage your expectations accordingly. It can generate noise and heat, or pose a security risk if not properly configured. However, it’s a great way to have fun with technology without being afraid of breaking something serious (like production at work), and it can be a rewarding experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The “hard” Lego pieces
&lt;/h2&gt;

&lt;p&gt;The main server of my homelab is an &lt;a href="https://buy.hpe.com/us/en/compute/proliant-microserver/proliant-microserver/proliant-microserver/hpe-proliant-microserver-gen10-plus/p/1012241014" rel="noopener noreferrer"&gt;HP ProLiant MicroServer Gen10 Plus&lt;/a&gt; with an Intel Xeon E-2224 CPU, 16 GB of memory, a 512 GB NVMe SSD, and three 2 TB 3.5" NAS hard drives.&lt;/p&gt;

&lt;p&gt;For networking, I use several Mikrotik routers and switches, including the &lt;a href="https://mikrotik.com/product/rb4011igs_5hacq2hnd_in" rel="noopener noreferrer"&gt;Mikrotik RB4011iGS+5HacQ2HnD-IN&lt;/a&gt;, a powerful router with a quad-core CPU, 1 GB of RAM, ten Gigabit Ethernet ports, and dual-band 4x4 802.11ac wireless. This router acts as the gateway and firewall for my homelab network and also provides wireless connectivity.&lt;/p&gt;

&lt;p&gt;I also have two &lt;strong&gt;Reolink IP cameras&lt;/strong&gt; which we will explore deeper in the following computer vision articles. These cameras have 4K resolution, night vision, and motion detection, and can stream video to my server or cloud storage.&lt;/p&gt;

&lt;p&gt;Finally, I have a &lt;strong&gt;Raspberry Pi 4 model B&lt;/strong&gt; with 8GB RAM that I use for running additional applications that do not require much computing power. Some parts of this infrastructure are backed up by a UPS unit connected to a 12V car battery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46b94pdakci0rfu6ugd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46b94pdakci0rfu6ugd9.png" alt="Homelab Network Topology" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The “soft” Lego pieces
&lt;/h2&gt;

&lt;p&gt;I have various applications and services running on my homelab, each with its own purpose. I use &lt;strong&gt;Proxmox&lt;/strong&gt; as my main hypervisor, which allows me to create and manage multiple virtual machines on my HP server. All VMs leverage the NVMe SSD, which offers high performance and low latency. One of the virtual machines I run is &lt;strong&gt;TrueNAS Scale&lt;/strong&gt;, which covers all my storage needs, such as file sharing, backups, media streaming, and more for my whole homelab. It has direct access to the 3x 2TB hard drives in my server, which are configured as direct pass-through devices. This way, I can use the full capacity and performance of the hard drives without any overhead from the hypervisor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwyvfup8htt5wu8f9ub0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwyvfup8htt5wu8f9ub0.png" alt="Physical Server Schema" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the advantages of using TrueNAS Scale is that it supports ZFS, a modern filesystem that offers many features such as snapshots, compression, and encryption. For my 6TB total hard drive storage capacity, I chose to use RAIDZ1, which is ZFS’s equivalent of RAID5. &lt;strong&gt;RAIDZ1&lt;/strong&gt; uses a parity block to store redundant information that can be used to recover data if one of the hard drives fails. This way, I achieve a balance between durability and performance, as I get 4TB of usable storage and &lt;strong&gt;protection against single-drive failure&lt;/strong&gt;. As a side effect, it also gives me double the HDD read speed, as it can read parts of the same file from two drives at once.&lt;/p&gt;
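&lt;p&gt;The capacity math behind this choice is simple; a rough sketch that ignores ZFS metadata overhead:&lt;/p&gt;

```python
def raidz1_usable_tb(drive_count, drive_tb):
    """RAIDZ1 dedicates one drive's worth of space to parity, so the
    usable capacity is roughly (n - 1) * drive size. ZFS metadata and
    padding overhead are ignored in this rough estimate."""
    if drive_count < 2:
        raise ValueError("RAIDZ1 needs at least 2 drives")
    return (drive_count - 1) * drive_tb

# 3x 2TB drives -> ~4TB usable, with one drive allowed to fail
usable = raidz1_usable_tb(3, 2)
```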

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4czvb4gzeakv2s5oysf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4czvb4gzeakv2s5oysf.png" alt="RAIDZ1 Overview in TrueNAS Scale" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ZFS is not only a filesystem, but also a volume manager that allows me to partition the storage into multiple datasets based on the use case. &lt;strong&gt;Datasets&lt;/strong&gt; are like sub-filesystems that can have their own properties, such as compression, encryption, quotas, and permissions. For example, for large video files, it is better to configure a dataset differently than for many small text files to achieve the best performance. Additionally, TrueNAS allows me to connect to the storage through many different protocols. For instance, in my case, I use SMB, FTP, and &lt;strong&gt;S3&lt;/strong&gt; thanks to &lt;strong&gt;MinIO&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb657foda8bmuf5f1an9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb657foda8bmuf5f1an9.png" alt="MinIO Overview" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TrueNAS Scale runs, among other services, its own instance of &lt;strong&gt;K3s&lt;/strong&gt;, a lightweight Kubernetes distribution that adds extensibility on top of the ZFS storage solution. In my case, I run a &lt;strong&gt;MariaDB&lt;/strong&gt; database in a container with a persistence layer in its own ZFS dataset. Fine-tuning the dataset for a relational-database workload with regular snapshots and encryption was quite a challenge, but I learned a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotta talk about the cloud
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Having a homelab makes me a better cloud architect
&lt;/h3&gt;

&lt;p&gt;I am probably not giving away any spoilers by saying that almost everything in the cloud runs on top of bare-metal servers. The cloud provider may abstract away the complexity of managing the hardware, network, and security aspects of the infrastructure, but I do believe that this knowledge can be essential. By getting hands-on experience and seeing what happens under the hood, I can gain a deeper understanding of how different components interact, how to troubleshoot issues, and how to optimize performance and scalability.&lt;/p&gt;

&lt;p&gt;Just to be clear, in no way am I saying that managed cloud services are bad. Quite the opposite, I am a huge fan of AWS Lambda, AppSync, and many other services that make development easier and faster. However, I have felt that my lack of understanding of the underlying systems prevented me from designing the best architecture possible. But that’s just me, trying to be a perfectionist. &lt;/p&gt;

&lt;h3&gt;
  
  
  Homelab and cloud?
&lt;/h3&gt;

&lt;p&gt;As a cloud enthusiast, I could not stop my mind from drawing parallels between my homelab and the cloud. To the extent that I could directly link my on-premise solution to some cloud concepts. Yes, it sounds a little bit ridiculous but bear with me, please.&lt;/p&gt;

&lt;p&gt;My HP server with Proxmox is like an &lt;strong&gt;AWS Availability Zone&lt;/strong&gt; with a single server rack unit. Unfortunately, I cannot plan for high availability just yet due to this limitation. The Proxmox interface is comparable to the AWS Console, where I can spin up new resources and monitor their utilization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk4nwf1d3tswuwrbp32i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk4nwf1d3tswuwrbp32i.png" alt="Proxmox Resources" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At a very high level, the cloud is usually categorized into Network, Storage, and Compute. Almost all services require all three to operate; still, it is a useful way to think about them.&lt;/p&gt;

&lt;p&gt;In my case, the &lt;strong&gt;Networking&lt;/strong&gt; is done through my Mikrotik routers where I can manage my firewall, routing tables, subnets, and IP ranges. This is similar to AWS VPC and Security Groups. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compute&lt;/strong&gt; is covered by Proxmox which provides a building block of a Virtual Machine. This is similar to the AWS EC2 service. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Storage&lt;/strong&gt; part is handled by TrueNAS, which provides services like MinIO S3, FTP, SMB, and MariaDB, comparable to AWS S3, RDS, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau1amqjegysfs342u7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau1amqjegysfs342u7p.png" alt="Comparison between AWS and Homelab" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It would be nice to have support for serverless functions and other higher-level services, however, that’s one of the main reasons why it makes more sense to run your workload in the cloud instead of on-premise. The cloud offers more flexibility, scalability, reliability, and features than a homelab can ever provide. However, a homelab also has its advantages, such as lower cost (depending on the usage), more control over the hardware and software, and more fun for crazy people like me. 🙂&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Let me know if you would be interested in seeing Knative in my homelab.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A homelab is the ideal place for me to experiment, make mistakes, and learn from them. It has given me a lot of insight into the cloud and the amazing work that cloud providers do behind the scenes.&lt;/p&gt;

&lt;p&gt;By running my own servers, I have gained hands-on experience with different technologies and systems administration skills. I have also learned to appreciate the convenience, flexibility, scalability, and reliability of cloud services that abstract away many of the complexities and challenges of running your own infrastructure.&lt;/p&gt;

&lt;p&gt;At the end of the day, there are servers under the serverless functions and managed services, and as Werner Vogels, the CTO of AWS, says: “Everything fails, all the time.” It’s good, then, to know how the magic of the cloud works and how to fix it when needed.&lt;/p&gt;

&lt;p&gt;I have thoroughly enjoyed building my own “personal cloud” and will be using it for various purposes, such as media streaming, file sharing, backups, and more. Homelab is a rewarding experience that I would recommend to anyone who is interested in technology and wants to grow their skills and knowledge.&lt;/p&gt;

&lt;p&gt;Big thanks go to &lt;a href="https://technotim.live/" rel="noopener noreferrer"&gt;TechnoTim&lt;/a&gt;, &lt;a href="https://www.youtube.com/channel/UCHkYOD-3fZbuGhwsADBd9ZQ" rel="noopener noreferrer"&gt;Lawrence Systems&lt;/a&gt;, and all the open-source software without which this project could not have happened.&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>cloud</category>
      <category>proxmox</category>
      <category>truenas</category>
    </item>
  </channel>
</rss>
