<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: knarik</title>
    <description>The latest articles on DEV Community by knarik (@liracinai).</description>
    <link>https://dev.to/liracinai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F909732%2F17eb84a0-1a53-4d09-a58b-b62c36c82fb2.png</url>
      <title>DEV Community: knarik</title>
      <link>https://dev.to/liracinai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/liracinai"/>
    <language>en</language>
    <item>
      <title>Shipping large ML models with Electron</title>
      <dc:creator>knarik</dc:creator>
      <pubDate>Tue, 11 Apr 2023 13:53:34 +0000</pubDate>
      <link>https://dev.to/modeinspect/shipping-large-ml-models-with-electron-5c96</link>
      <guid>https://dev.to/modeinspect/shipping-large-ml-models-with-electron-5c96</guid>
      <description>&lt;p&gt;by &lt;a href="https://twitter.com/matoantos" rel="noopener noreferrer"&gt;@matoantos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How do I ship a large machine learning model with an Electron app? Not long ago, I couldn't find any resources on solving this problem, so I decided to write up my experience in the hope it will be useful to others. Let's dive right in.&lt;/p&gt;

&lt;p&gt;Since the beginning of our work on &lt;a href="https://acreom.com/" rel="noopener noreferrer"&gt;acreom&lt;/a&gt;, we wanted it to have an IDE-like experience with real-time autocomplete suggestions in the context of the knowledge base and tasks.&lt;/p&gt;

&lt;p&gt;The first problem we decided to experiment with was classifying free text as a task or an event. It turns out that building a binary classifier for such a use case is relatively easy; shipping it with Electron, however, is the tricky part.&lt;/p&gt;

&lt;p&gt;But why ship it with Electron in the first place? It mainly boils down to inference speed and user privacy. The solution therefore has to meet the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast inference (required for real-time suggestions)&lt;/li&gt;
&lt;li&gt;minimum memory footprint&lt;/li&gt;
&lt;li&gt;high accuracy with a low false-positive rate (a good user experience depends on it)&lt;/li&gt;
&lt;li&gt;user privacy - no API calls, fully offline, shipped on the client.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The ML part: exploration &amp;amp; fine-tuning
&lt;/h2&gt;

&lt;p&gt;Despite having no prior dataset available, this was a relatively quick and easy task. I manually created a small dataset of roughly 1,200 samples, where a task/event sample looks like &lt;code&gt;code tomorrow morning&lt;/code&gt; and a negative sample like &lt;code&gt;code is simple&lt;/code&gt;, with a roughly 50/50 class balance.&lt;/p&gt;

&lt;p&gt;An interesting side observation was learning the semantics of such examples, where two opposite samples share the same words but not the same meaning. This later allowed me to take a few actions to increase the overall performance of the model: I left lemmatization out of the preprocessing pipeline and added feature engineering that applies additional weights to queries that, for example, start with verbs or include a time. I used &lt;a href="https://github.com/Acreom/quickadd" rel="noopener noreferrer"&gt;quickadd&lt;/a&gt;, an open-source library for parsing times &amp;amp; dates that I forked from ctparse (and upgraded with many modifications).&lt;/p&gt;
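&lt;p&gt;The verb- and time-based weighting could be sketched roughly like this; the verb list, the weight values, and the regex standing in for quickadd's date parsing are all illustrative assumptions, not the production values:&lt;/p&gt;

```python
import re

# Illustrative stand-in for quickadd's time/date parsing (assumption).
TIME_RE = re.compile(r"\b(today|tomorrow|tonight|\d{1,2}(:\d{2})?\s?(am|pm))\b", re.I)
# Hypothetical starter verb list; the real set is not given in the article.
ACTION_VERBS = {"code", "call", "buy", "send", "review", "fix"}

def feature_weight(query):
    """Return an extra weight for task-like queries (sketch)."""
    tokens = query.lower().split()
    weight = 1.0
    if tokens and tokens[0] in ACTION_VERBS:
        weight += 0.5  # queries starting with a verb lean task-like
    if TIME_RE.search(query):
        weight += 0.5  # an explicit time is strong task/event evidence
    return weight
```

&lt;p&gt;For example, &lt;code&gt;code tomorrow morning&lt;/code&gt; would get both boosts, while &lt;code&gt;code is simple&lt;/code&gt; would only get the verb boost, helping the model tell the two apart despite the shared words.&lt;/p&gt;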

&lt;p&gt;After many experiments with different techniques and models, I settled on a bi-directional &lt;a href="https://en.wikipedia.org/wiki/Long_short-term_memory" rel="noopener noreferrer"&gt;LSTM&lt;/a&gt; written in PyTorch. It worked surprisingly well considering the tiny dataset it was trained on. After some additional fine-tuning, I was happy to end up with &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html" rel="noopener noreferrer"&gt;F1 scores&lt;/a&gt; around 0.95, which is in production territory for this use case. Great, the model works. Now all I need is to figure out the Electron stuff.&lt;/p&gt;
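&lt;p&gt;For reference, the F1 score is the harmonic mean of precision and recall; a minimal pure-Python version of the binary case (equivalent to scikit-learn's default) looks like this:&lt;/p&gt;

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

&lt;p&gt;An F1 around 0.95 means both false positives and missed tasks are rare, which is what the low-false-positive requirement above asks for.&lt;/p&gt;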

&lt;h2&gt;
  
  
  Figuring out the Electron stuff
&lt;/h2&gt;

&lt;p&gt;This is where things get hairy. First, the trained LSTM model with its custom word embeddings was not small by any means: the dependencies, the custom word embeddings, and the ~70k-parameter model together came to over 4 GB!&lt;/p&gt;

&lt;p&gt;Second, I wanted to keep our ML development process lean and fast when it comes to shipping to production. A few fundamental building blocks were necessary so I could build future models systematically.&lt;/p&gt;

&lt;p&gt;Okay, so maybe I can have some sort of API interface written in Python that would communicate with Electron, all frozen as a separate executable together with the model and shipped alongside the app? Maybe this could work.&lt;/p&gt;

&lt;p&gt;Inspired by the IDE &lt;a href="https://microsoft.github.io/language-server-protocol/" rel="noopener noreferrer"&gt;language server protocol&lt;/a&gt;, I created an API interface between Electron and the Python ML service. &lt;a href="https://zeromq.org/" rel="noopener noreferrer"&gt;ZeroMQ&lt;/a&gt; turned out to be invaluable as a fast and lightweight messaging queue between the two.&lt;/p&gt;
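&lt;p&gt;A minimal sketch of what the Python side of such an interface could look like, assuming a ZeroMQ REP socket and the JSON request/response shapes shown later in this post; the function names and the stubbed model call are my own illustrations, not acreom's actual code:&lt;/p&gt;

```python
import json

def handle(raw):
    """Parse one request and build a reply (the model call is stubbed)."""
    req = json.loads(raw)
    result = "1" if req.get("action") == "infer" else "0"  # stand-in for inference
    return json.dumps({
        "requestId": req["requestId"],
        "service": req.get("service"),
        "data": result,
    }).encode()

def serve(port):
    """Blocking request/reply loop; requires pyzmq (imported lazily so the
    pure handler above can be exercised without it)."""
    import zmq  # pip install pyzmq
    sock = zmq.Context.instance().socket(zmq.REP)
    sock.bind(f"tcp://127.0.0.1:{port}")
    while True:
        sock.send(handle(sock.recv()))
```

&lt;p&gt;A REP socket pairs with a REQ socket on the Electron side; ZeroMQ also offers ROUTER/DEALER patterns if requests ever need to be interleaved rather than strictly alternated.&lt;/p&gt;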

&lt;p&gt;Now all I needed was to freeze the Python interface into an executable that would accept requests from Electron, run inference on the model, and send the response back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pyinstaller/pyinstaller" rel="noopener noreferrer"&gt;PyInstaller&lt;/a&gt; seemed like the most maintained and actively developed tool for freezing a Python script into an executable, so I went with it. As expected, the frozen interface with the model was gigabytes large, so I had to figure out how to squeeze it down. Fortunately, &lt;a href="https://onnxruntime.ai/" rel="noopener noreferrer"&gt;ONNX&lt;/a&gt; worked wonders and packaged the model into an inference-only state, so I could throw away the PyTorch and Torchtext dependencies when freezing with PyInstaller. Now the executable with the model was 43 MB instead of 4 GB.&lt;/p&gt;
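&lt;p&gt;The export step could look roughly like this; the graph names, file path, and sigmoid threshold are illustrative assumptions, and the heavy imports are deferred so only build-time code needs PyTorch while the shipped executable needs onnxruntime alone:&lt;/p&gt;

```python
def export_onnx(model, sample_input, path="classifier.onnx"):
    """Export a trained PyTorch model to an inference-only ONNX graph."""
    import torch  # only needed at build time, not in the shipped executable
    model.eval()
    torch.onnx.export(model, sample_input, path,
                      input_names=["tokens"], output_names=["prob"])

def predict(path, tokens):
    """Run the exported graph with onnxruntime alone - no torch at runtime."""
    import onnxruntime as ort
    sess = ort.InferenceSession(path)
    return sess.run(None, {"tokens": tokens})[0]

def to_label(prob, threshold=0.5):
    """Map the model's sigmoid output to the task/event class."""
    return int(prob >= threshold)
```

&lt;p&gt;This split is what makes the 4 GB to 43 MB reduction possible: the ONNX file holds only the graph and weights, and onnxruntime is a small dependency compared to full PyTorch.&lt;/p&gt;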

&lt;p&gt;PyInstaller throws a curveball every now and then with missing .dylib files, but nothing that can't be fixed with symbolic links to the local dependencies. What did the trick was trimming the heavy PyTorch and Torchtext libraries and their dependencies down to the bare minimum the script needed to work.&lt;/p&gt;
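&lt;p&gt;A build invocation along these lines would reproduce the setup; the script name, bundled model path, and excluded modules are assumptions based on the description above, not the actual build configuration:&lt;/p&gt;

```shell
# Freeze the ML service into a single executable, bundling the ONNX model
# and excluding the heavy training-time dependencies (inference goes
# through onnxruntime only). Names here are illustrative.
pyinstaller ml_service.py \
  --onefile --name ml-service \
  --add-data "classifier.onnx:." \
  --exclude-module torch \
  --exclude-module torchtext
```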

&lt;p&gt;Here's a brief rundown of how all of this works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y2zwrkhc8gyqizrllfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y2zwrkhc8gyqizrllfp.png" alt="Image description" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;When the Electron app is opened for the first time, the main process retrieves an available port and runs the ML executable, which listens on that port. I built in retry logic for error handling and disconnect handlers for shutdown.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Electron then sends an &lt;code&gt;initialize&lt;/code&gt; message through ZeroMQ to initialize the ML model so that it listens for requests. This additional logic prevents sending requests to an executable that has not yet been initialized in step 1. After initialization, it listens for queries as JSON objects. Here's a sample query:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'{"requestId":{ID}, "action":"infer","service":"classifier","data":{"data":"code tmrw 7-9pm"}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
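&lt;p&gt;Step 1's "retrieves available port" can be done with the standard OS trick of binding to port 0; this is a common pattern, not necessarily what acreom does:&lt;/p&gt;

```python
import socket

def get_free_port():
    """Ask the OS for an unused TCP port by binding to port 0.
    There is a small race window before the child process re-binds
    the port, which the retry logic from step 1 would cover."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

&lt;p&gt;The main process would then pass the returned port to the spawned ML executable, e.g. as a command-line argument.&lt;/p&gt;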



&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;When initialized, the executable listens for messages with the &lt;code&gt;service&lt;/code&gt; type, reads the request, and runs it through the appropriate model for inference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Since this model is a binary classifier, the response is propagated back through the messaging queue to the frontend of the application, like this:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'{"data": "1", "requestId":{ID}, "service": "classifier"}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;The frontend takes care of triggering the visuals and converting the text into a task component upon confirmation from the user, within the listener's timeout. The end result looks pretty solid!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc779la3zp60fc6gkqm9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc779la3zp60fc6gkqm9.gif" alt="Image description" width="1144" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This experiment went to production soon after, and while it's not the best or most desired UX implementation, it gave us good learnings for future work.&lt;/p&gt;

&lt;p&gt;If you have any questions or feedback, feel free to reach me at &lt;a href="mailto:martin@acreom.com"&gt;martin@acreom.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>electronjs</category>
    </item>
    <item>
      <title>Building startup local-first</title>
      <dc:creator>knarik</dc:creator>
      <pubDate>Tue, 28 Mar 2023 09:33:25 +0000</pubDate>
      <link>https://dev.to/modeinspect/building-startup-local-first-53gg</link>
      <guid>https://dev.to/modeinspect/building-startup-local-first-53gg</guid>
      <description>&lt;p&gt;by &lt;a href="https://twitter.com/matoantos" rel="noopener noreferrer"&gt;@matoantos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the majority of the past two years, our small team of five (one designer, four engineers) worked remote-first. While this lately popular trend worked for us in the beginning, we soon felt something was missing. The real change came when we set up our first office.&lt;/p&gt;

&lt;p&gt;Looking back, it was one of the best decisions we’ve made. Here‘s what we have learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Communication is instant&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When working remotely, we relied on Discord for all of our communication. This meant that a meaningful chunk of it was async. We would occasionally miss notifications or have to coordinate our debugging and pair programming sessions in advance. This often resulted in communication delays and made it harder for us to convey information in real time.&lt;/p&gt;

&lt;p&gt;Working in-person, most of these issues have disappeared. We are able to have more productive and efficient discussions, and we can clear up any misunderstandings or ambiguities more quickly.&lt;/p&gt;

&lt;p&gt;Being in an office setting can lead to distractions from others' conversations when you are trying to stay focused on your work. ANC headphones turned out to be an invaluable tool for us, serving as a 'do not disturb' indicator and enabling us to focus on deep work.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Shipping and learning faster&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our productivity and learning velocity rose significantly once we started working together in a physical space. We are shipping higher-quality software, faster and more frequently. This is partly due to simply being together, without too many distractions from our own familiar spaces around us.&lt;/p&gt;

&lt;p&gt;Within the first weeks of working together, we found our rhythm and routine, which further helps us stay focused by minimizing the need to think about what to do during the day. Our day begins around 9am with a focus period until 11:45am. After lunch, we have a cooldown period until around 1:30pm. In the afternoon, we do another stint of focus until around 5pm to 6pm, interrupted only by the bi-weekly sync at 3pm.&lt;/p&gt;

&lt;p&gt;Another factor contributing to our faster delivery is the accountability we have for each other. If we do not deliver or do a poor job, we receive almost immediate feedback to do better. On the other hand, if we deliver good results, we receive positive feedback to stay on the right path.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Spending time together outside of work&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The serendipity of talking about work, as well as life in general, during lunchtime or breaks often led us to new perspectives and advancements in the development process. It also does wonders for alignment on different issues and our overall direction.&lt;/p&gt;

&lt;p&gt;Similarly, having spontaneous discussions about the task you are currently working on can help with problem solving: talking another person through your problem lets things settle in your head, and you get to hear another opinion and point of view.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Pair programming&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Pair programming plays a big role in our development process. Whether we are stuck on an issue or want to speed up building a new feature, we use pair programming. It helps us code faster and makes our work less error-prone.&lt;/p&gt;

&lt;p&gt;We also do pair visual QA with our designer for final adjustments on features. As a result, we always tweak the design just right, and we also learn to build UI better and speed up future development by understanding the concepts our app is based on.&lt;/p&gt;

&lt;p&gt;Pair programming sessions quite often (intentionally or otherwise) turn into open forums where everyone chips in with their opinion and perspective. Such discussions align us on the issue and serve as early feedback sessions, which in turn speeds up the development process.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Culture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building culture is easier in person. As a small team building a startup from the ground up, we work, eat and have fun together. First, there’s so much serendipity that happens just by being side by side in this process: from exploring new ideas randomly to laughing together, in real time, about how miserably we have failed at something. We could, of course, have done all of that remotely, but being present adds a little bit of something magical to the equation.&lt;/p&gt;

&lt;p&gt;Second, we believe culture is the sum of everyone’s decisions. It’s how we approach things, handle situations and work together as a team. More importantly, we not only get to know each other by having a front-row seat to this experience, we influence each other. Hiding behind a screen can water down this experience.&lt;/p&gt;

&lt;p&gt;A startup is an intense rollercoaster of highs and lows that we get to experience together. It would indeed suck if we sat in our own isolated cabins throughout the ride.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The switch to a physical office has had numerous benefits for our team, and we have gathered some important lessons from the experience.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Communication is more effective, making it easier to collaborate and coordinate.&lt;/li&gt;
&lt;li&gt;ANC headphones are an awesome 'do not disturb' indicator.&lt;/li&gt;
&lt;li&gt;Productivity and learning velocity increase, with ideas and resources being shared more easily.&lt;/li&gt;
&lt;li&gt;Interacting throughout the day advances development and improves alignment on different issues.&lt;/li&gt;
&lt;li&gt;Pair programming is more effective when done in-person.&lt;/li&gt;
&lt;li&gt;Working in a physical office helps us build a culture and makes the whole experience more enjoyable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;➤➤➤   &lt;a href="https://acreom.com" rel="noopener noreferrer"&gt;https://acreom.com&lt;/a&gt; &lt;/p&gt;

</description>
      <category>devops</category>
      <category>productivity</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
