<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Johanna Both</title>
    <description>The latest articles on DEV Community by Johanna Both (@jay_both).</description>
    <link>https://dev.to/jay_both</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1992677%2F6816f31b-6469-4620-a770-9ace1779d425.png</url>
      <title>DEV Community: Johanna Both</title>
      <link>https://dev.to/jay_both</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jay_both"/>
    <language>en</language>
    <item>
      <title>Innovative Background Removal Solutions: A Comparative Analysis of Technologies</title>
      <dc:creator>Johanna Both</dc:creator>
      <pubDate>Tue, 08 Oct 2024 15:03:21 +0000</pubDate>
      <link>https://dev.to/tappointment/innovative-background-removal-solutions-a-comparative-analysis-of-technologies-ca9</link>
      <guid>https://dev.to/tappointment/innovative-background-removal-solutions-a-comparative-analysis-of-technologies-ca9</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3r7o9an0gm0ad0bdpw2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3r7o9an0gm0ad0bdpw2.png" alt="Image description" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For over five years, we've partnered with a leading UK media agency to deliver cutting-edge digital solutions that blend physical spaces with interactive experiences. In this post, we’ll explore the technical solutions behind their popular photo and video processing application, diving into how background removal technology enhances user experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background Replacement feature with Green Screen
&lt;/h2&gt;

&lt;p&gt;One of the most requested photo/video effects is background replacement using a green screen. In our app, there are two places where you’ll find this effect: the live preview and the final digital/analog product generated by the app. While these two use cases share a lot of similarities, they also have key differences.&lt;br&gt;
For the live preview, we capture the video stream, break it down into individual frames, apply the background replacement, and then send the updated frames back to the live preview canvas. The trick is to do all of this as quickly as possible so it looks seamless. This process is similar to how we handle video processing, just at a faster pace. Replacing the background on a single image, on the other hand, is far less resource-intensive and much simpler to pull off.&lt;br&gt;
The real question isn’t whether we can handle photo/video processing in-house; it’s what kind of external services we should tap into during the process to optimize performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Methods for Background Replacement
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Seriously.js by Brian Chirls
&lt;/h3&gt;

&lt;p&gt;Seriously.js is a real-time, node-based video compositor for the web, inspired by tools like After Effects. It enables dynamic video effects with GPU acceleration, supporting various image inputs such as video, canvas, and Three.js. With built-in 2D transformations and a plugin system, it offers flexibility for interactive, high-quality video compositing directly in the browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
The chroma effect utilizes the principles of chroma keying, often referred to as "green screen" or "blue screen" techniques, to seamlessly blend images by rendering specific colors transparent. This method allows users to effectively remove backgrounds by selecting a range of colors from the image. With adjustable settings for the target color, weight, and balance, users can fine-tune the effect to suit different backgrounds. The real-time application of this effect makes it particularly useful for dynamic scenarios like video editing and live streaming.&lt;/p&gt;
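The principle can be sketched as a per-pixel alpha computed from the distance to the key color. This is only an illustration of the idea, not Seriously.js's actual implementation, which runs a more sophisticated version (with the weight and balance controls mentioned above) on the GPU:

```javascript
// Simplified chroma key: alpha grows with the RGB distance from the key
// color, so pixels close to the key become transparent. A real keyer works
// in a luma/chroma color space and smooths the edges, but the core idea is
// the same.
function chromaAlpha(pixel, key, weight) {
  const dr = pixel[0] - key[0];
  const dg = pixel[1] - key[1];
  const db = pixel[2] - key[2];
  // Normalize by the maximum possible RGB distance, sqrt(3 * 255^2)
  const dist = Math.sqrt(dr * dr + dg * dg + db * db) / (255 * Math.sqrt(3));
  return Math.min(1, dist * weight); // clamp to fully opaque
}

// Apply to a flat RGBA array (the layout of canvas ImageData.data)
function applyChromaKey(pixels, key, weight) {
  for (let i = pixels.length - 4; i >= 0; i -= 4) {
    const alpha = chromaAlpha([pixels[i], pixels[i + 1], pixels[i + 2]], key, weight);
    pixels[i + 3] = Math.round(alpha * 255);
  }
  return pixels;
}
```

Feeding a video frame's ImageData through such a function on every animation tick is essentially what the live preview does, just with the math moved into a WebGL shader for speed.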

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmhiccsb2qtbpnbz7ebm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmhiccsb2qtbpnbz7ebm.png" alt="Image description" width="800" height="733"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Remove.bg API
&lt;/h3&gt;

&lt;p&gt;This API automatically removes backgrounds from images, enhancing image processing efficiency. It offers a reliable and accurate solution that simplifies editing for users. With its advanced background removal capabilities, it saves time compared to manual editing. Additionally, the API is easy to integrate into various apps and websites, making it a versatile choice for developers. Here’s a straightforward example from the API documentation demonstrating how to use it in practice.&lt;/p&gt;
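As a rough sketch of the integration, the request can be assembled like this (endpoint, header, and parameter names are taken from remove.bg's public documentation; verify them against the current API reference before relying on this):

```javascript
// Hypothetical helper that assembles a remove.bg API request.
// The response body of the POST is the cut-out image (PNG by default).
function buildRemoveBgRequest(apiKey, imageUrl) {
  return {
    url: 'https://api.remove.bg/v1.0/removebg',
    options: {
      method: 'POST',
      headers: {
        'X-Api-Key': apiKey,            // your remove.bg API key
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ image_url: imageUrl, size: 'auto' }),
    },
  };
}

// Usage (Node 18+ or browser):
//   const req = buildRemoveBgRequest(process.env.REMOVE_BG_KEY, photoUrl);
//   const res = await fetch(req.url, req.options);
```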

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcb51mwt29ncyh5p1wlb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcb51mwt29ncyh5p1wlb5.png" alt="Image description" width="800" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  TensorFlow - Body segmentation API
&lt;/h3&gt;

&lt;p&gt;The TensorFlow Body Segmentation API allows developers to segment human bodies in images or video using machine learning models. It identifies different parts of the human body, such as arms, legs, and torso, creating detailed masks for each region. This API is useful for applications in fitness, virtual backgrounds, augmented reality, and more. It provides real-time processing and can be easily integrated into applications for tasks like human pose estimation or activity recognition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background replacement with the Selfie Segmentation model
&lt;/h3&gt;

&lt;p&gt;We leveraged the body segmentation functionality of the API to identify the background and individuals in an image using masks. After segmentation, we can replace the background with any image of our choice, using either of the two models offered by the API. The BodyPix model helps identify and track a person's body parts, while the MediaPipe Selfie Segmentation model is designed to separate the background and individuals in a selfie using masks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API integration&lt;/strong&gt;&lt;br&gt;
The implementation is swift and straightforward. We simply need to instantiate the model and integrate it into our processing workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffupryf3tlaxbbpmpx8ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffupryf3tlaxbbpmpx8ys.png" alt="Image description" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we've built our segmentation model, it's time to put it to work. The function takes a canvas with our image on it as input, and the output is a binary mask showing the segments created by the model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo6j7n83r0p7c1pfggnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo6j7n83r0p7c1pfggnd.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output binary mask visually looks something like this. Now, we can use the background mask to cut out the background image and layer it on top of the original as an overlay.&lt;/p&gt;
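The cut-and-overlay step boils down to a per-pixel choice driven by the mask. A minimal sketch (plain arrays standing in for canvas ImageData; the real version composites on canvas or GPU):

```javascript
// Simplified compositor: wherever the binary mask marks background, take
// the pixel from the replacement image; elsewhere keep the original.
// All three inputs are flat RGBA arrays (the canvas ImageData.data layout).
function composite(original, replacement, mask) {
  const out = original.slice();
  for (let i = out.length - 4; i >= 0; i -= 4) {
    if (mask[i + 3] !== 0) { // mask alpha marks a background pixel
      out[i] = replacement[i];
      out[i + 1] = replacement[i + 1];
      out[i + 2] = replacement[i + 2];
    }
  }
  return out;
}
```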

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kk1w729j2f287ln57d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kk1w729j2f287ln57d1.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tensorflow/tfjs-models/blob/master/body-segmentation/images/drawMask.jpg" rel="noopener noreferrer"&gt;https://github.com/tensorflow/tfjs-models/blob/master/body-segmentation/images/drawMask.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppquo7qn0viuajkxcrqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppquo7qn0viuajkxcrqf.png" alt="Image description" width="735" height="1165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hope you enjoyed our comparison of background removal technologies. We're open to opinions and suggestions based on your experience; let us know in the comments.&lt;/p&gt;

&lt;p&gt;Author: &lt;a href="https://www.linkedin.com/in/juh%C3%A1sz-norbert-b34824213/" rel="noopener noreferrer"&gt;Norbert Juhász&lt;/a&gt;, Software Developer @Tappointment&lt;br&gt;
Image credit: Generated by DALL-E and Canva&lt;/p&gt;

&lt;p&gt;Code images: codesnap.dev, GitHub&lt;/p&gt;

&lt;p&gt;Links: &lt;br&gt;
Seriously.js - &lt;a href="https://github.com/brianchirls/Seriously.js" rel="noopener noreferrer"&gt;https://github.com/brianchirls/Seriously.js&lt;/a&gt;&lt;br&gt;
Remove.bg API - &lt;a href="https://www.remove.bg/" rel="noopener noreferrer"&gt;https://www.remove.bg/&lt;/a&gt;&lt;br&gt;
TensorFlow Body segmentation API - &lt;a href="https://github.com/tensorflow/tfjs-models/blob/master/body-segmentation/README.md" rel="noopener noreferrer"&gt;https://github.com/tensorflow/tfjs-models/blob/master/body-segmentation/README.md&lt;/a&gt;&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>backgroundremoval</category>
      <category>technology</category>
      <category>comparison</category>
    </item>
    <item>
      <title>Optimizing Testing Efficiency: A Journey from Selenium to Playwright</title>
      <dc:creator>Johanna Both</dc:creator>
      <pubDate>Tue, 24 Sep 2024 14:34:52 +0000</pubDate>
      <link>https://dev.to/tappointment/optimizing-testing-efficiency-a-journey-from-selenium-to-playwright-d7c</link>
      <guid>https://dev.to/tappointment/optimizing-testing-efficiency-a-journey-from-selenium-to-playwright-d7c</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;What is Playwright?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Playwright is a web browser automation tool developed by Microsoft, designed to thoroughly test web applications from end to end. It supports multiple programming languages such as Python, JavaScript, Java, and C#, and allows testers to run their browser automation tests in Chromium, WebKit, and Firefox. Playwright is also suitable for mobile testing on Chrome and Safari.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Importance of browser automation in software development&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Every IT company needs to deliver high-quality products, so they need to test their systems thoroughly. Beyond traditional manual testing, automation is vital in ensuring software quality. Browser automation plays a critical role in covering regression testing, UI/UX evaluations, and verifying cross-browser compatibility.&lt;/p&gt;

&lt;p&gt;This approach alleviates the testing team's workload, optimizes efficiency, and conserves valuable time and resources through the automation of repetitive tasks. Moreover, integrating automation testing into Continuous Integration/Continuous Deployment (CI/CD) pipelines enables developers to immediately check their code and ensure that any introduced modifications do not break existing features.&lt;/p&gt;

&lt;p&gt;Early testing is crucial to avoid prolonged debugging sessions, project delays, and low-quality product releases.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test automation frameworks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At Tappointment, when we first had the opportunity to write automated tests, we debated which framework to use. Initially, we found two worthy candidates: Cypress and Selenium. Both browser automation tools have their strengths and weaknesses, but after weighing the pros and cons, we decided on Selenium to automate the regression tests of our popular Jira retrospective tool: &lt;a href="https://marketplace.atlassian.com/apps/1225649/power-retro-ai-powered-retrospective-for-jira?hosting=cloud&amp;amp;tab=overview" rel="noopener noreferrer"&gt;Power Retro ✨ AI-powered Retrospective for Jira&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Road to Playwright&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our initial choice of Selenium was driven by its capacity to seamlessly integrate with iframes, a crucial functionality required as Jira exclusively provides an iframe for embedding our application. Cypress, at the time, did not offer comparable support for iframes. Additionally, Selenium's broad browser support, encompassing Chrome, Firefox, and Safari, aligned with the primary focus of our company's projects. The framework's popularity, coupled with our testers' proficiency in Python—a language fully supported by Selenium but not by Cypress—played a significant role in shaping our decision-making process.&lt;/p&gt;

&lt;p&gt;Swiftly immersing ourselves in test case development, we used Selenium to craft approximately 14-15 tests. The integration of Selenium with our CI/CD pipeline in Gitlab proved instrumental, enhancing our ability to identify and address new bugs before deployment. As the product's development progressed, the necessity arose to adapt the tests to accommodate new features, prompting the occasional creation of additional tests to augment our regression test suites.&lt;/p&gt;

&lt;p&gt;While Selenium initially met our requirements, a significant challenge emerged as the tests exhibited flakiness, frequently reporting false positives and false negatives. This drawback became counterproductive, consuming more time than manual testing following system changes. Recognizing the need for a more reliable solution, our quest led us to Playwright. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Initial thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Playwright had received a great deal of positive feedback for its reliability, prompting me to explore its capabilities for our testing needs. Initially, I attempted to recreate our tests in Python, given its compatibility with Playwright. However, I ran into limitations, as certain user-friendly features of Playwright were unavailable in the Python implementation. As a result, despite my limited proficiency in JavaScript and TypeScript, I opted to develop the tests in TypeScript. The framework's ease of use became apparent, with any challenges arising primarily from my relative unfamiliarity with the language.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key benefits&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Installing Playwright is a straightforward process; it can be set up through npm without the need for additional installations, unlike Selenium.&lt;/p&gt;

&lt;p&gt;Playwright boasts remarkable browser compatibility, eliminating concerns about code working seamlessly across various browsers like Safari and Chrome. In contrast to Selenium, where different drivers had to be created and installed for each browser, Playwright streamlines this process, requiring no extra packages for immediate functionality. This not only simplifies the code but also allows for the creation of separate browser contexts, facilitating parallel test execution.&lt;/p&gt;

&lt;p&gt;The inclusion of Playwright's test runner UI enhances the testing experience. It simplifies test execution, debugging, code navigation, element inspection, console monitoring, and network analysis. Additionally, the UI provides an effortless way to view screenshots during test runs, all without the need for additional code. Configuration settings in the config file enable the framework to automatically save screenshots on failure and record execution videos.&lt;/p&gt;

&lt;p&gt;Playwright's automatic waits offer a substantial advantage over Selenium, where extensive waits were necessary for proper functioning.&lt;/p&gt;

&lt;p&gt;The reporting feature is integrated into Playwright, offering diverse output formats such as blob, xml, json, and html, as well as CLI options in both line and dot modes. Furthermore, users have the flexibility to implement custom reporting methods. This functionality proves especially beneficial when incorporating tests into CI/CD pipelines, particularly when there is a requirement to store reports within Jira.&lt;/p&gt;
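The screenshot, video, and reporter behavior described above lives in the config file. A minimal sketch of a playwright.config.js (option names follow Playwright's test configuration docs; treat this as a starting point, not our exact setup):

```javascript
// playwright.config.js — minimal sketch enabling failure artifacts and
// multiple report formats.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    screenshot: 'only-on-failure', // screenshots saved automatically on failure
    video: 'retain-on-failure',    // execution videos kept only for failed tests
  },
  // Built-in reporters: html for local browsing, junit XML for CI/CD tooling
  reporter: [['html'], ['junit', { outputFile: 'results.xml' }]],
});
```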

&lt;p&gt;Most notably, Playwright has proven highly reliable in our experience so far. The singular drawback we encountered is its less mature documentation and a notably smaller community compared to Selenium.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our transition from Selenium to Playwright at Tappointment has markedly enhanced our testing efficiency and reliability.&lt;/p&gt;

&lt;p&gt;While Selenium initially met our requirements, the streamlined setup, unparalleled browser compatibility, user-friendly test runner UI, and integrated reporting features of Playwright have proven instrumental in overcoming challenges and optimizing our testing processes.        &lt;/p&gt;

&lt;p&gt;Despite encountering a limitation in Playwright's documentation and a smaller community, the benefits in terms of reliability and ease of use have decisively outweighed this singular drawback.&lt;/p&gt;

&lt;p&gt;In our commitment to delivering high-quality software products, we persistently explore optimal solutions for our automated testing requirements, with Playwright currently serving as our preferred framework. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;Author: &lt;a href="https://www.linkedin.com/in/annam%C3%A1ria-tak%C3%A1cs-4669a795/" rel="noopener noreferrer"&gt;Annamária Takács&lt;/a&gt; | QA Engineer  @Tappointment&lt;br&gt;
Image Credit: Canva AI image generator&lt;/em&gt;&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>selenium</category>
      <category>testingtools</category>
      <category>testing</category>
    </item>
    <item>
      <title>Behind the Scenes: Implementing AI to Streamline Retrospectives</title>
      <dc:creator>Johanna Both</dc:creator>
      <pubDate>Thu, 29 Aug 2024 11:56:12 +0000</pubDate>
      <link>https://dev.to/tappointment/behind-the-scenes-implementing-ai-to-streamline-retrospectives-1pea</link>
      <guid>https://dev.to/tappointment/behind-the-scenes-implementing-ai-to-streamline-retrospectives-1pea</guid>
      <description>&lt;p&gt;In this article, I’ll walk you through our journey of how we implemented artificial intelligence in Power Retro, our Jira Add-on designed to help with facilitating retrospectives. Let me show you what challenges we faced along the way and what we’ve learned in the process.&lt;/p&gt;

&lt;p&gt;We as software developers are obsessed with improving and optimizing every little detail in our processes. While developing Power Retro, we realized that we spent a significant part of our retrospectives (using Power Retro, pretty meta, I know) grouping ideas that we’ve already discussed one by one, then creating action items for these groups based on what we’ve discussed half an hour earlier. Lots of redundancy and monotonous work. There had to be a better way, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Fixing retrospectives&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Last year we set out to fix these problems slowing us down. We knew we wanted to use some kind of AI to do the heavy-lifting. We debated a lot about what technology to use: we could’ve gone the route of training our own model but it would have required way more time and resources than what we had available. While skeptical at first, we agreed on trying out a commercial service. Luckily, the timing was perfect. OpenAI had just released their GPT-4 model that opened the door for a lot of exciting opportunities. We quickly threw together a proof-of-concept of how we could integrate artificial intelligence into a retrospective’s flow. And the results were… mixed at first.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Early challenges&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our initial prompt was very basic, simply asking the AI to group the given cards and return the output as JSON. Even though we gave it some examples of what the JSON array should look like, it sometimes ignored them and made up new fields or completely changed the structure.&lt;br&gt;
For example, the newer versions of the models love ignoring the instructions about how they should insert JSON into the response. Our prompt explicitly asked for the JSON to be wrapped in three double quotes (something like """[...]"""), but the models will, by default, wrap it in a Markdown code block (```[...]```).&lt;/p&gt;

&lt;p&gt;To fix this, we had to adjust the temperature and top_p values to make the AI more deterministic and strictly follow the prompt. LLMs work by guessing what the next token (the unit of text they operate on) should be, based on the prompt and the previously generated tokens. A low sampling temperature forces the model to be more focused and deterministic (0 always generates the same output for a given input), while higher values allow it to be more creative but also more unpredictable. top_p controls nucleus sampling, which limits the available tokens based on probability: a top_p value of 0.1 would only allow the top 10% most likely tokens to be chosen.&lt;/p&gt;
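The mechanism behind the two knobs can be sketched on a toy next-token distribution (this mirrors the idea, not OpenAI's internal implementation):

```javascript
// Temperature scaling: logits are divided by the temperature before softmax,
// so low values sharpen the distribution and high values flatten it.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Nucleus sampling: keep only the smallest set of tokens whose cumulative
// probability reaches p; everything else is excluded from sampling.
function topPIndices(probs, p) {
  const order = probs.map((prob, i) => [prob, i]).sort((a, b) => b[0] - a[0]);
  const kept = [];
  let cumulative = 0;
  for (const [prob, i] of order) {
    kept.push(i);
    cumulative += prob;
    if (cumulative >= p) break;
  }
  return kept.sort((a, b) => a - b);
}
```

With temperature 0.5 the most likely token soaks up almost all of the probability mass; with temperature 2 the distribution is much flatter, which is exactly the creativity/consistency tradeoff seen in the examples below.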

&lt;p&gt;Here are some examples of the same prompt at different sampling temperatures. The prompt for the tests was intentionally kept simple to show the effect of sampling temperature, and I’ve used the gpt-3.5-turbo-16k model since that was the latest model generally available at the time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64gm2r84gcc7adxm8kwa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64gm2r84gcc7adxm8kwa.png" alt="Image description" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cards were 10 randomly generated nouns: actor, port, teapot, broccoli, ramen, sneakers, nectarine, archaeologist, plane, ship.&lt;br&gt;
With the temperature value set at 0 the results are pretty boring:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z1sg5odl23xd3jnkfts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z1sg5odl23xd3jnkfts.png" alt="Image description" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;actor was categorised as “person”, archaeologist as “occupation” and everything else as “object” when there were clearly more connections that could’ve been made. Let’s look at what happens when temperature is set to 1, the default value for the API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufbpdil72ugfzrt6oewb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufbpdil72ugfzrt6oewb.png" alt="Image description" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Getting better, the model now identified that both actor and archaeologist are professions, broccoli, ramen and nectarine are food and put everything else into “object”. But remember, the output is not deterministic anymore: in the following runs it even went back to the same grouping as the previous example, switched between uppercase and lowercase category names, made groups for each individual card and returned the JSON in the wrong format. There’s a tradeoff between creativity and consistency. What if we push creativity too far? It was a challenge to not get a 500 status code when using the maximum temperature value of 2, but here’s the masterpiece generated by the AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdnf8761kb435ong4jv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdnf8761kb435ong4jv6.png" alt="Image description" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s generally not recommended to set the temperature any higher than 1; the web interface of ChatGPT, for example, is set at around 0.7, while code generation tools such as GitHub Copilot use even lower values.&lt;/p&gt;

&lt;p&gt;Back to Power Retro. To avoid these issues we decided to set the temperature to a low value. While making the model deterministic fixed our problem of inconsistent outputs, we limited the AI’s creativity. This meant it couldn’t find more subtle (or even obvious) connections between the cards and we couldn’t retry the grouping since the output was exactly the same each time. We’ll get back to this later.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why group cards for minutes when the AI can do it in… also minutes?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The other issue was speed. GPT-3.5 Turbo and GPT-4 were both painfully slow in an actual retrospective (we average around 30-50 cards), in extreme cases taking 2 minutes to return the whole output. This isn’t an issue when using ChatGPT, since it uses streaming, a technique that keeps an HTTP request open and receives the response in chunks instead of waiting for the whole body before processing it. In our case, seeing cards fly across the retro board and magically group themselves might be cool the first time, but it doesn’t improve the user experience at all. We realized we could speed up the grouping process by generating the response ahead of time instead of waiting for user interaction. Think of how YouTube starts uploading and processing a video in the background while you’re filling out the details.&lt;/p&gt;

&lt;p&gt;In Power Retro, once the cards are created and the session moves to the “Presentation” step, they can no longer be changed, so we can start grouping them right away. We introduced a state machine to keep track of the grouping’s state and moved the OpenAI API call into a background job so it doesn’t block any other actions. By the time we reach the “Grouping” step, the processing will have already finished, and clicking the “AI Grouping” button feels instantaneous, without a minutes-long loading screen.&lt;/p&gt;
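A toy version of such a grouping state machine looks like this (state and event names are illustrative, not Power Retro's actual code): the background job drives the transitions, and the "AI Grouping" button only has to read an already-resolved result.

```javascript
// Allowed transitions: the OpenAI call starts as soon as the cards are
// frozen, and the UI simply inspects the current state.
const transitions = {
  idle: { cardsFrozen: 'processing' },
  processing: { groupingDone: 'ready', groupingFailed: 'failed' },
  ready: {},
  failed: { retry: 'processing' },
};

function next(state, event) {
  const allowed = transitions[state] || {};
  if (!(event in allowed)) {
    throw new Error('illegal transition: ' + state + ' + ' + event);
  }
  return allowed[event];
}
```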

&lt;p&gt;At this point the grouping felt pretty good and we used it in every single one of our retrospectives, only having to move around a couple of cards, if any at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Generating action items&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Seeing the success of the grouping feature we began working on the action items. After the grouping step, participants can vote on issues or groups of issues that they feel are important and create action items for resolving them. This is also something that the AI should be able to help with.&lt;/p&gt;

&lt;p&gt;Remember how we were struggling with our prompt to get the required output earlier? Just a few days after kicking off work on the action items, the most important feature (at least for us) of the OpenAI API just launched, function calling. Instead of getting a text response and trying to enforce a JSON object that’s usable in our code, we can create “functions” that the AI can call. When the model decides to call a function it returns a special response with the arguments of the function in a format following the schema we provided. &lt;/p&gt;

&lt;p&gt;The function’s result can then be fed back into the context of the model to be used later, for example, an analytics chatbot could make a SQL request to the database based on the user’s input and generate a human readable report. For our use case we are only interested in the structured JSON output. Let’s look at a very simplified example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsdexafg3dfkuwbm9iyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsdexafg3dfkuwbm9iyb.png" alt="Image description" width="800" height="925"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we ask the model in the prompt to generate action items, it returns an object following this schema. This was a game changer. We were able to give more creativity to the model and significantly improve the quality of the results without any downsides. We discarded almost everything we’d made in those few days before function calling, but the new implementation was done quickly, and we started testing it.&lt;/p&gt;

&lt;p&gt;There was still one problem that made this feature unreliable (even though it had become far more dependable at returning correctly structured output). The titles of the cards are usually short, with very limited context, and the AI had difficulty handling ambiguous titles. For instance, we often had cards related to unit testing in both positive and negative contexts that the model couldn’t differentiate, defaulting to the negative sentiment. The solution was to assign a “feeling” value to each card. The feelings were already present in the application: a retrospective board has different columns for different sentiments (Start/Stop/Continue, What went well?/What didn’t go well?, and so on); we just needed to assign numerical values to them. Now our AI responds with correct and (with some prompt engineering) genuinely helpful action items. We used these same techniques to give more creativity to the AI in the grouping feature as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What’s next?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Thank you for making it all the way through! It’s hard to keep up with LLMs getting better day by day, but it also means we always have something to improve. Since we released these features, OpenAI has introduced their GPT-4o and GPT-4o mini models, outperforming GPT-4 in quality, speed, and even price. Just as I was writing this post, they also released Structured Outputs, which allows developers to use JSON responses directly, without relying on function calls, and even integrates with libraries such as Zod for complete type safety. The future of LLMs is more exciting than ever, and I can’t wait for what’s to come.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;More about Power Retro&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Power Retro is a Jira extension built to make agile retrospectives more efficient and less time-consuming by automating repetitive tasks and enhancing collaboration. Designed with distributed teams in mind, it helps identify pain points and generate actionable items that integrate seamlessly into Jira. With the addition of AI capabilities, Power Retro reduces the duration of retros by up to 60%, enabling teams to focus on meaningful discussions and continuous improvement without getting bogged down by process.&lt;/p&gt;




&lt;p&gt;Author: László Bucsai | Full Stack Developer @&lt;a href="https://www.tappointment.com" rel="noopener noreferrer"&gt;Tappointment&lt;/a&gt;&lt;br&gt;
The picture is generated by &lt;a href="https://openai.com/index/dall-e-3/" rel="noopener noreferrer"&gt;DALL-E&lt;/a&gt; and &lt;a href="https://www.linkedin.com/company/canva/" rel="noopener noreferrer"&gt;Canva&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>atlassian</category>
      <category>retrospective</category>
    </item>
  </channel>
</rss>
