<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakub Narloch</title>
    <description>The latest articles on DEV Community by Jakub Narloch (@jmnarloch).</description>
    <link>https://dev.to/jmnarloch</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1211836%2F7840607d-4a6b-40c8-8760-2f6416402146.jpg</url>
      <title>DEV Community: Jakub Narloch</title>
      <link>https://dev.to/jmnarloch</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jmnarloch"/>
    <language>en</language>
    <item>
      <title>The curious case of building a serverless service</title>
      <dc:creator>Jakub Narloch</dc:creator>
      <pubDate>Mon, 20 Nov 2023 04:39:31 +0000</pubDate>
      <link>https://dev.to/jmnarloch/the-curious-case-of-building-a-serverless-service-3dpe</link>
      <guid>https://dev.to/jmnarloch/the-curious-case-of-building-a-serverless-service-3dpe</guid>
      <description>&lt;p&gt;Two weeks back at &lt;a href="https://codemaker.ai/"&gt;CodeMaker AI&lt;/a&gt; we made announcements that new optimized APIs become available. We haven’t really explained at that time what involved making the change and what was the rationale behind it.&lt;/p&gt;

&lt;p&gt;When I started this project back in early 2023, I made a conscious decision to build the entire service on a serverless stack. This would be at least the fourth time in my career building a production-grade system this way, so it wasn’t by any means uncharted territory for me. I was already familiar with the well-known trade-offs, like cost versus service latency, and with issues such as cold starts. In the end, cold starts turned out not to be our biggest problem.&lt;/p&gt;

&lt;p&gt;The initial phase of prototyping and experimenting with the tech stack was successful, and the service came to be by early March 2023. We were happy with the trade-off we had made: initially, development was fast, we were able to deploy multiple times a day, and the entire process was automated. When we finally launched, the biggest benefit came into play, since the service had very little usage in the early days; our first-ever bill was only $35.25 for the entire month. Everything looked great, except for one thing.&lt;/p&gt;

&lt;p&gt;That one thing was API latency. Because the product is in the emerging market of Generative AI, it has certain performance characteristics. Typical model evaluation performance is measured in thousands of tokens per second, so it is unsurprising that a request can take anywhere from 2-3 seconds up to 60 seconds. We had also set ourselves an ambitious goal: allowing inputs that exceed the typical limits of the context window, with the limit at that time being 256 KB, since increased to 1 MB. As a result, it was not uncommon to see API request latencies in the range of 30+ seconds.&lt;/p&gt;
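&lt;p&gt;A rough back-of-the-envelope calculation (the throughput and token counts below are assumed for illustration, not measured figures) shows how multi-second latencies follow directly from token throughput:&lt;/p&gt;

```python
# Assumed figures, for illustration only.
tokens_per_second = 2000.0   # "thousands of tokens per second"
output_tokens = 6000         # a sizeable generated response
latency_seconds = output_tokens / tokens_per_second
print(latency_seconds)       # 3.0 seconds before any network overhead
```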

&lt;p&gt;None of this would really be a problem if not for one thing: the AWS API Gateway hard request timeout of 29 seconds. It may seem like a small detail, but it is an important one, because other serverless offerings on the market, like Google Cloud Run, are not constrained in this way. The API Gateway limitation forced us to build a workaround, and since at that time we were still committed to investing in the serverless stack, we built it on top of the existing one. Unfortunately, this introduced a fatal flaw: P0 latencies continuously in the 1-2 second range. Rather than hundreds of milliseconds, requests would take at minimum an order of magnitude more. That may not matter much for requests that would take a couple of seconds either way, but in the meantime we had also built other features that were either directly tied to user-triggered actions or aimed at optimizing the end-user experience, and even these optimized versions introduced a delay visible to users.&lt;/p&gt;
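&lt;p&gt;One common way to work around such a timeout (a sketch of the general pattern, not necessarily our exact implementation) is to split the long-running request into a fast submission call and short polling calls, so that no single HTTP request comes anywhere near the 29-second limit. The function names and in-memory job store below are invented for illustration; a real service would persist jobs in durable storage:&lt;/p&gt;

```python
import uuid

# In-memory job store, standing in for a durable queue or table.
JOBS = {}

def submit_job(payload):
    """Accept the request immediately and return a job id (fast, well under 29 s)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "RUNNING", "result": None, "payload": payload}
    return job_id

def complete_job(job_id, result):
    """Called by the backend worker once the slow model evaluation finishes."""
    JOBS[job_id] = {"status": "DONE", "result": result, "payload": None}

def poll_job(job_id):
    """Client polls this endpoint; each poll is a short, cheap request."""
    return JOBS[job_id]["status"], JOBS[job_id]["result"]

# Simulated flow: submit, poll while running, worker completes, poll again.
job = submit_job("generate code")
print(poll_job(job))            # ('RUNNING', None)
complete_job(job, "generated output")
print(poll_job(job))            # ('DONE', 'generated output')
```

&lt;p&gt;The cost of this pattern is exactly the extra round trips that inflate the best-case latency, which is why it hurts most for requests that would otherwise finish in well under a second.&lt;/p&gt;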

&lt;p&gt;Fast forward six months, and our service had grown 21-fold. Our cost was now on par with a serverful stack; in fact, switching from serverless to serverful would be a cost optimization for us at this point. We had also collected feedback from our users that the latency was an actual pain point for them.&lt;/p&gt;

&lt;p&gt;So we re-architected the service, made the changes needed to move to a serverful stack, and updated the integrations; we also used this as an opportunity to introduce a couple of small optimizations. The result was predictable: our P0 latency decreased by 40% and our infrastructure cost increased. But one outcome of this entire experiment was a complete surprise.&lt;/p&gt;

&lt;p&gt;The completely unexpected consequence of this change was the users' response to it. Our service usage grew by 30% week over week, measured throughout the entire week after the launch, and the only thing that had changed was the API latency. No new feature was launched at that time. I was aware of the discovery Amazon made years back, that with every &lt;a href="https://news.ycombinator.com/item?id=273900"&gt;100ms increase of latency their sales drop by 1%&lt;/a&gt;, but this was the first time I had such a clear indication of how performance is perceived by end users, and of how optimizing it delivers an overall better user experience.&lt;/p&gt;

&lt;p&gt;This isn’t the first time I have optimized a service, but in the past most of those services were used in fully automated scenarios, where the client was another service or system; in such cases, the main motivation for optimization was simply cost savings for the service provider. In this case, our cost increased in the short term, but at the same time so did the attractiveness of our product and, hopefully, user satisfaction. Those are trade-offs worth the price.&lt;/p&gt;

&lt;p&gt;At this time, we are committed to our new architecture. The switch required some further changes to our build and deployment processes, and from now on it also requires capacity management, but in the end this is nothing that cannot be dealt with and fully automated.&lt;/p&gt;

&lt;p&gt;The lesson from this entire experience is that a serverless stack can help optimize operating cost. If the service had turned out to be a complete bust, we would not have had to spend thousands of dollars to learn that; the serverless stack was initially 20 times less expensive to operate, until usage actually caught up. It also matters who your service's end users are: if they are people, how long the API takes to complete will matter to them. Through this experience we learned a lot, and we remain committed to making the service even slightly better every day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codemaker.ai/"&gt;CodeMaker AI&lt;/a&gt; offers tools and automation for software developers for writing, testing, and documenting source code.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cloud</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Next Generation of AI Developer Tools</title>
      <dc:creator>Jakub Narloch</dc:creator>
      <pubDate>Fri, 17 Nov 2023 06:39:18 +0000</pubDate>
      <link>https://dev.to/jmnarloch/the-next-generation-of-ai-developer-tools-10k0</link>
      <guid>https://dev.to/jmnarloch/the-next-generation-of-ai-developer-tools-10k0</guid>
<description>&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Generative AI has taken the world by storm in the past year. OpenAI achieved undeniable success in reaching a mass audience with ChatGPT. The shift towards AI did not bypass software development. Many tools have been created that bring the chat experience to existing developer tools and IDEs, but some have taken a completely different route.&lt;/p&gt;

&lt;h2&gt;What is CodeMaker AI?&lt;/h2&gt;

&lt;p&gt;CodeMaker AI is a new developer tool that specializes in processing source code. The cornerstone of its design is a focus on automation, reflected both in the way users interact with the tool and in its feature set.&lt;/p&gt;

&lt;h2&gt;Not only auto-completion&lt;/h2&gt;

&lt;p&gt;Engineers at CodeMaker AI asked themselves whether prompt engineering and auto-completion are the only user experiences that can be offered. If an engineer is faced with implementing a certain piece of functionality, is there a more efficient way to do it than auto-completion? While CodeMaker AI still supports that experience, its main focus is allowing users to perform tasks on entire source files, or even source directories. Currently, the supported tasks are code and documentation generation. By simply triggering an action on a source file, it is possible to automatically generate the content of the entire file, limited by nothing other than the file size. The file itself can contain exactly one definition to process, such as a function, type, or method, or hundreds of them; the tool is not constrained by the input file structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C7sT4nG6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wouhlmnqlt4z5avzvfb9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C7sT4nG6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wouhlmnqlt4z5avzvfb9.gif" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This experience introduces a certain trade-off. Just like a human, the model cannot predict the user's intention with 100% accuracy unless it is explicitly told what to do. This is why we introduced the concept of context-aware code generation: if a user requires a specific implementation, they provide the relevant requirements as part of a code comment, and the generated result will match them as closely as possible.&lt;/p&gt;

&lt;p&gt;Since this technology is still in its infancy and will probably take a long time to perfect, there is no claim that the generated code will meet user expectations in every case. This is why CodeMaker AI makes it as easy as possible to experiment and iterate on the generated implementation, by allowing the code to be replaced at any time. This makes the generated code more expendable; on the other hand, generating it is far faster and cheaper than writing it by hand, so iterating on it may be an acceptable trade-off.&lt;/p&gt;

&lt;h2&gt;Contextual operations&lt;/h2&gt;

&lt;p&gt;Task-based operations are the primary way of interacting with the tool, but not the only one. It is possible to use prompts, in the form of comments, to generate new source code. These prompts are contextual: depending on their location within the file, they will produce different outcomes. To illustrate, in the Java and C# programming languages the top-level element of a file needs to be a class (or a struct, in the C# case). Providing a comment in an empty file will generate exactly that: valid code for the given context. Going one step deeper and placing a comment within the context of a class will result in the generation of a method, and finally, placing the comment within a method body will result in the generation of code statements.&lt;/p&gt;
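&lt;p&gt;The three context levels can be illustrated with a short sketch. Python is used here for brevity (unlike Java, Python permits top-level functions, but the context-sensitivity idea carries over); the class name and comments are invented examples, not actual CodeMaker AI output:&lt;/p&gt;

```python
# Prompt comment at the top level of a file: a whole class is generated.
class InvoiceCalculator:
    # Prompt comment inside the class body: a method is generated.
    def total_with_tax(self, amount, tax_rate):
        # Prompt comment inside the method body: statements are generated.
        return amount * (1.0 + tax_rate)
```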

&lt;h2&gt;Syntax autocorrection&lt;/h2&gt;

&lt;p&gt;One remarkable application of machine learning is the ability to correct natural language errors. Most of us are familiar with features in our text processors in which spelling or grammar errors can be highlighted with suggested options to correct them. Developers are familiar with similar capabilities of various Integrated Development Environments that would also highlight the syntax or semantic errors and allow users to manually choose the correct action.&lt;/p&gt;

&lt;p&gt;The next step is to offer the capability of automatically detecting and correcting syntax errors. This is exactly the feature developed by CodeMaker AI: any syntax errors are automatically discovered and corrected. In its current state, this works best for statically typed programming languages, but in the future it can be generalized to a degree that makes it useful for virtually any programming language.&lt;/p&gt;
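&lt;p&gt;The detection half of this capability can be sketched with a language's own parser; the correction half is where a learned model comes in. A minimal Python sketch using the standard library ast module (this only locates the error, it does not fix it):&lt;/p&gt;

```python
import ast

def find_syntax_error(source):
    """Return (line, message) for the first syntax error, or None if the code parses."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as err:
        return (err.lineno, err.msg)

print(find_syntax_error("x = 1"))      # None
print(find_syntax_error("def f(:"))    # e.g. (1, 'invalid syntax')
```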

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Recent developments in the AI space have opened the possibility of exploring different ways of integrating it into the landscape of developer tools and offering a completely new set of capabilities. This space hasn't even been fully explored yet, and what the tools offer today is only the tip of the iceberg of future capabilities.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
