DEV Community: Davide Mauri

Retrieval Augmented Generation with Azure SQL

Davide Mauri — Thu, 29 Aug 2024 21:22:23 +0000

Retrieval Augmented Generation, or RAG, is one of the hottest topics at the moment as it opens up the possibility of interacting with data using natural language, which is a long-time dream finally coming true.

It is very likely that a lot of your data is already stored or will be stored in Azure SQL, so a common request is to have an example on how to apply the RAG pattern to your own data stored an Azure SQL database.

This blog post is all about that. Let's start from the basics and make sure the RAG pattern is clearly understood.

RAG Pattern 101

To make the explanation easy to understand, even if you are completely new to the topic, let's start from a simple scenario. You have a database where you have stored details - title, abstract, time, speakers - of all the session of a conference. A good example could be the just passed .NET Focus on AI conference or the forthcoming .NET Conf 2024 conference, or one of my favorites, the VS Live conference.

Why RAG?

You already know that Language Models like GPT-4 or Phi-3 can accept any text you'll provide them, and they can generate answer to almost any question you may want to ask. So, why a specific pattern like RAG is needed? Why can't we just send all the titles and the abstract and all the information stored in the database to the LM and call it a day? Then we could ask anything using a simple API call and our work we'll be done.

There are two reasons why you don't want to do that.

First of all, if you are asking something related to Blazor, there is no need to give the LM details about a session that is completely out of scope: it will not help the LM to answer and could actually make the answer less precise (see: "Lost in the Middle: How Language Models Use Long Contexts"). Secondly, the cost of an AI call is based on how many tokens the sent text must be split into - tokenization is something that happen transparently behind the scenes - and so the less text you send, the less token you'll be sending, which means that you'll be using less resources. More efficiency, less digital waste (which means less power consumed), lower price to pay for: it's a win for everyone!

RAG Steps

The first step of the RAG pattern is to filter out all the data that is not relevant to the question being asked. For this first step, typically, a semantic search is performed on the text. To do a semantic search, embeddings, and thus vectors, are used to do vector similarity search and return only the relevant results. Since Azure SQL is a powerful modern relational and multi-model database, you can enrich vector search other all other filtering capabilities that it has already. Filtering by exact values (for example all sessions on a certain day), by JSON data (for example sessions with certain tags) or even using geospatial filters (for example session delivered withing 1 km from me).

Once you have the relevant data you can then efficiently send it to the LM, along with the question you want to ask, to have the answer in natural language, without wasting resources and money.

A diagram that shows the two steps of the RAG pattern is here so that you can easily visualize the process and see how it is applied to the sample data we're using in this post.

Now that you are familiar with the RAG pattern, is time to see how it can be realized using Azure services.

RAG pattern in Azure

There are many ways to implement the RAG pattern in Azure. I personally love the serverless approach provided by the cloud, so I'm using serverless services in this sample. If you prefer a containerized approach, keep in mind that everything described here can be easily hosted in a container, if you prefer to do so.

The high-level architecture of the RAG pattern applied to Azure is the following:

The Azure Services being used are:

Azure Static Web Apps
Azure OpenAI
Azure Functions
Azure Functions SQL Trigger Binding
Azure SQL Database
Data API builder

Frontend

The fronted is a simple React application hosted in Azure Static Web Apps. It is used to allow users to ask a question that will then be answered applying the RAG pattern. There is also the option to just do similarity search to clearly see the difference in terms of response between a simple similarity search and the full RAG pattern.

Backend

Azure Functions are used to handle the question asked by the user and to orchestrate the RAG pattern. Once the question is asked, the Azure Function called will do similarity search in Azure SQL, then pack the results into a pipe-separated format

string.Join("\r", sessions.Select(s => $"{s.Title}|{s.Abstract}|{s.Speakers}|{s.Start}|{s.End}"));

and then send the question, the list of session and the following prompt to the LM, hosted in Azure OpenAI:

"You are a system assistant who helps users find the right session to watch from the conference, based off the sessions that are provided to you. Sessions will be provided in an assistant message in the format of `title|abstract|speakers|start-time|end-time`. You can use only the provided session list to help you answer the user's question. If the user asks a question that is not related to the provided sessions, you can respond with a message that you can't help with that question."

Data API builder is used to easily expose stored procedures that are called via REST calls from the frontend to show how many sessions have been indexes (in the “About” tab) and to call the find_session procedure that does vector search (available in the “Search” tab). Data API builder automatically expose desired database objects as REST or GraphQL endpoints, which is great to quickly deploy a CRUD service that can be called by any framework, in just a few minutes.

Azure Functions are also used to immediately turn session title and abstract into an embedding as soon changes are made to the database table. This is done by calling Azure OpenAI embedding model. Azure SQL Trigger Binding is what make possible to have tables monitored for changes and then react to those changes by executing some code in the Azure Function itself. It is extremely lightweight (it uses native Azure SQL Change Tracking capabilities behind the scenes) and it provides all the flexibility and computation power needed for almost anything.

Database

Azure SQL's support for natively storing and querying vectors is in Early Adopter Preview. It includes the ability to store vectors in a compact binary format and to calculate distance between two vectors - and thus calculate the semantic similarity of related topics as vectors in this sample are the embeddings of session title and abstract - so that filtering only the relevant session given a user query is as easy as writing the following query:

select top(10)
    id, 
    title,
    vector_distance('cosine', @qv, embeddings) as cosine_distance
from
    web.sessions
order by
    cosine_distance

the query will return the 10 most similar sessions, given the search vector @qv that contains the embedding of the topic being searched. Getting the embeddings for some text can be done in many ways with many languages, but at the end of the day is just a REST call, so in Azure SQL it can be easily done using sp_invoke_external_rest_endpoint as shown in this sample Get_Embeddings procedure.

Code and Demo

That's it. Implementing the RAG pattern in Azure SQL is incredibly easy. If you want to see it by yourself, I've presented about this topic just a few days ago at .NET Conf Focus on AI, where I demoed the full end-to-end pattern. You can get the repo and run the demo either locally (except for Azure SQL DB, but hey! there is a free tier for that!) or in Azure and then from there you can start to use your data instead of the sample demo data provided and you'll be on a good path already for allowing your users to chat with your data.

Conclusion

For this sample, as you have noticed, I used SQL and .NET directly, even though there are many libraries out there that are trying to abstract and simplify the whole process. My goal for this post was to make sure that you learn and understand how things work behind the scenes so when you'll be using any of the amazing libraries available (be it Semantic Kernel or LangChain) they will not be just a magic black box, but you know exactly what is happening behind the scenes.

And, anyway, samples using those libraries I just mentioned will follow soon, so stay tuned!

Share your dev wishes 👍

Davide Mauri — Tue, 13 Feb 2024 16:04:51 +0000

🚨 🚨 🚨 CALLING ALL DEVELOPERS!!! 🚨 🚨 🚨

Are you building applications with Databases? 👍 Help us understand how we can make the Azure SQL Database the best place for your developers 👨‍💻 👩‍💻 to be productive, guaranteeing scalability 📈, performance 🚀 and security 🔒, so that you can grow with peace of mind 😃. Fill out this form here and help to shape the future of Azure SQL as you would like it to be!

https://aka.ms/datadevlist

Thanks to your feedback in the last years we added quite a lot of new stuff the developers loved. From the ability to call a REST API right from Azure SQL with the sp_invoke_external_rest_endpoint stored procedure to Data API builder that takes your database and turn it into a stateless, scalable, REST and GraphQL API, through JSON improvements, optimized locking and more to come this year...so don't miss your chance to shape the future so that it will be as you want it to be!

PS
And of course I could I not mention also the Azure SQL database free tier? (And free in the sense of totally, completely free, with no time-limits!)

OpenAPI for your Azure SQL database

Davide Mauri — Tue, 24 Oct 2023 22:20:27 +0000

A recent and exciting feature of Data API builder (you may have learned about Data API builder from my previous article, as I used it to quickly make a stored procedure and a table available as REST endpoint to easily integrate with OpenAI) is the compatibility with OpenAPI specifications and Swagger. This was a highly demanded feature, and it is impressive to see it in action. You can quickly transform your database tables, views or stored procedures into standard, modern, scalable, REST endpoints that are accessible to everyone.

For this example, in just a few minutes, I converted the AdventureWorksLT sample database into a REST service that you can access and use – yes, you heard me, enjoy it! – at this link: https://dm-dab-awlt.azurewebsites.net/swagger

The database can now be used easily with any modern frontend framework, be it React, Vue.JS, Svelte, Blazor or anything that is able to make a REST call, and easily query the data with a simple (just using plan Javascript here):

var result = await fetch('https://dm-dab-awlt.azurewebsites.net/api/Customer/CustomerID/5')
var body = await result.json()

And you also have pagination, sorting, filtering and field selection capabilities, not to mention support for authentication and authorization (in fact if you try to do anything other than a GET you’ll get a 403). Pretty impressive if you ask me!

“Hold on a second!” – you might say – “I’ve just read recently somewhere that having all tables exposed is bad-bad-bad! Why are you doing this?”. I’m very familiar with that post that recently went viral, that really shows a nightmarish situation:

This post highlights a crucial point: choose the appropriate tool for each task. Data manipulation should not be done inefficiently in the client or the backend. Let the database handle it. Relational databases (which nowadays always go beyond the relational model) can do the work for you in optimal ways. Azure SQL Database can do things that looks like magic to optimize data manipulation in ways you’ll never be able to do yourself (for example, figuring out in real time the best JOIN strategy via the newly introduced Adaptive Join feature), as it would be extremely expensive and absolutely impractical, to move all data out of the database and then do the join. Just ike the tweet says.

Make sure that you do the right thing, and use views and stored procedure as needed, and expose those instead of all the tables. But as an example, having all the tables to play with is just fine for this playground, and allows you to get confident with Data API builder.

If you want to install it in your own subscription, to play with the Data API builder configuration file and check out all the features we packed into Data API builder, here’s the GitHub repo with the deployment code that can help you to get started:

https://github.com/azure-samples/dab-adventureworks-lt

Now, just have fun!

How I built a session recommender in 1 hour using Open AI

Davide Mauri — Wed, 18 Oct 2023 14:50:55 +0000

As a developer, I often attend conferences to learn new skills and network with other professionals. However, conferences can be overwhelming, especially when they offer dozens of sessions on different topics. How can I decide which ones are worth my time and attention?

That's why I decided to use OpenAI to create a tool that can help me find the most relevant sessions for my interests. I used the session abstracts of a conference as input and converted them into embeddings using OpenAI's natural language processing capabilities. Then, I used vector search to compare the embeddings with a query topic and rank the sessions by similarity.

This way, I can quickly and easily discover the sessions that match my goals and preferences, without having to read all the abstracts manually. I built this tool in a couple of hours during the weekend, using simple, scalable, and fast technologies.

Here's how I did it. I hope you'll find it useful.

The Architecture

The entire solution is a mix of fullstack, jamstack to be more precise, and event-driven architecture pattern.

Sessions' data, which is structured by nature, is saved into a relational database and the tables and stored procedures are made available to the fronted via REST and GraphQL.

Each time a new session is added, a serverless function is executed to turn abstracts into a vector by using OpenAI model via a REST call.

A (minimal) web frontend allows end users to type a text that will also be converted into a vector using OpenAI and then the most similar vectors - and thus the associated sessions - will be found using cosine similarity.

That's all. Simple, easy, elegant, and scalable: the architecture I love.

Implementation Details

I used Azure for everything. Thanks to the various free and trial offers it is possible to create such a solution completely for free.

The Frontend

I used Azure Static Web Apps to host the fronted, written using React. Simple, well-known and easy to create. I'm new to React, so using this project was an effective way to ramp-up my skills while doing something funny and cool at the same time. I've learned a lot around the ecosystem around it (for example react-router-dom, vite) and the way it works internally.

Azure Static Web Apps provides a fantastic on-prem development experience, and it is heavily integrated with GitHub, so that deployment is just a push to the target repository. Kind of obvious choice.

The Backend

Azure Static Web Apps comes with a cool feature named Database Connections that does a lot of heavy lifting for you. It automatically takes the database objects you configure and make them available via GraphQL and REST.

Database Connections is powered by Data API builder, which is open-source and available also on-premises. It is heavily integrated with Static Web Apps, and the on-prem development experience that they provide is honestly unmatched. I was able to do everything on my machine with no friction at all, no CORS worries, authentication provider emulation, integration with vite tools...it is absolutely amazing! Same as for the frontend, the deployment to Azure is done via a simple git push.

The Database

I used Azure SQL database. It now has a free offer, and it can easily scale to terabytes of data if needed. Not that I think I need a terabyte of data for now...but you never know :)

About vectors

Azure SQL database doesn't have a native vector type, but a vector is nothing more than a list of numbers. Relational databases are in general pretty good at managing list of things (otherwise known as sets 😊, otherwise known as relations 😁) efficiently.

With a dataset the size of the one I'm using, doing a full scan of all available vectors is desirable approach, as it performs exact nearest neighbor search, which provides perfect recall. There is no need to specialized indexes, so Azure SQL looks like a great choice.

In fact, even if the amount of data is not huge, is not insignificant either. OpenAI text embeddings return a vector with 1536 (float) values, so even with just 100 sessions to store the number of calculations to do in order to compute the cosine distance quickly approaches hundreds of thousands, and they should be done in the quickest way possible to accommodate as many requests per second as possible.

Azure SQL uses vector calculations internally (SIMD and AVX512 CPU instructions), to speed up operations on sets of data, and offers a columnstore index that can make operations on sets of data even faster. Not that a columnstore would be really needed for the small amount of data I have, but I wanted to give it a try to see how it would perform. And you'll see that the performances are great. The CPU usage is minuscule, and vector search is done in less than 50 msec on 100 of session abstracts. Impressive performance, and I don't have to install, manage, and integrate another database or a third-party library. Perfect: simplicity for the win!

If you want to dig more into in doing vector search with Azure SQL, read the Vector Similarity Search with Azure SQL database and OpenAI article.

OpenAI integration

Finally, I wanted to keep the solution as simple as possible and as efficient as possible, so I wanted to use the new capability of Azure SQL to call a REST API directly to call OpenAI directly from inside the database to convert the searched text into a vector.

By doing it inside the database, I don't have to create yet another Azure Function just for the purpose of calling the OpenAI REST endpoint and then storing the resulting vector in the database. This approach is something I can switch to in case I need more scalability, but there is no proof that such additional scalability is needed right now, so I decided to start with a simpler architecture.

The Result

The result is here for you to test out, which is something it will not only be interesting but also helpful if you plan to attend the .NET Conf 2023. In fact, I populated the database of the published sample solution with their session abstracts.

I spent more or less one hour building everything, mostly on React and the frontend side. Creating the database and publishing it as REST endpoint took probably no more than 10 minutes. 😍

Try it out yourself here:

https://aka.ms/dotnetconf2023-session-finder

Of course, the OpenAI calls are limited (I'm using the smallest tier) so you might see throttling happening. Please be patient or deploy everything in our subscription to try it out yourself, using the code available here:

https://github.com/azure-samples/azure-sql-db-session-recommender

Vector Similarity Search with Azure SQL database and OpenAI

Davide Mauri — Sun, 04 Jun 2023 20:47:36 +0000

Vector databases are gaining quite a lot of interest lately. Using text embeddings and vector operations makes extremely easy to find similar "things". Things can be articles, photos, products…everything. As one can easily imagine, this ability is great to easily implement suggestions in applications. From providing suggestions on similar articles or other products that may be of interest, to quickly finding and grouping similar items, the applications are many.

A great article to understand how embeddings work, is the following: Introducing text and code embeddings.

Reading the mentioned articles, you can learn that "embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts."

More specifically, embeddings are vectors…hence the great interest for vector databases.

But are vector databases really needed? At the end of the day a vector is just a list of numbers and finding if two vectors represent similar object is as easy as calculating the distance between the vectors. One of the most common and useful distance metric is the cosine distance and, even better, the related cosine similarity

The real complex part is calculating the embeddings, but thanks to Azure OpenAI, everyone has an easily accessible REST service that can used to get the embeddings using pre-trained ML models. In this article we will use OpenAI to generate vectors for doing similarity search and then use Azure SQL database to store and search for similar vectors.

In this article we’ll build a sample solution to find Wikipedia articles that are related to any topic we may be interested in. As usual all the code is available in GitHub:

https://github.com/Azure-Samples/azure-sql-db-openai

The pre-calculated embeddings, both for the title and the body, of a selection of Wikipedia articles, is made available by OpenAI here:

https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip

Vectors in Azure SQL database

Vectors can be efficiently stored in Azure SQL database by columnstore indexes. There is no specific data type available to store a vector in Azure SQL database, but we can use some human ingenuity to realize that a vector is just a list of numbers. As a result, we can store a vector in a table very easily by creating a column to contain vector data. One row per vector element. We can then use a columnstore index to efficiently store and search for vectors.

Using this Wikipedia article as starting point, you can see that there are two vectors, one to store title embeddings and one to store article embeddings:

The vectors can be more efficiently stored into a table like this:

CREATE TABLE [dbo].[wikipedia_articles_embeddings_titles_vector]
(
    [article_id] [int] NOT NULL,
    [vector_value_id] [int] NOT NULL,
    [vector_value] [float] NOT NULL
)

On that table we can create a column store index to efficiently store and search for vectors. Then it is just a matter of calculating the distance between vectors to find the closest. Thanks to the internal optimization of the columnstore (that uses SIMD AVX-512 instructions to speed up vector operations) the distance calculation is extremely fast.

The most common distance is the cosine similarity, which can be calculated quite easily in SQL.

Calculating cosine similarity

Cosine similarity can be calculated in SQL using the following formula, given two vectors a and b:

SELECT 
    SUM(a.value * b.value) / (  
        SQRT(SUM(a.value * a.value)) * SQRT(SUM(b.value * b.value))   
    ) AS cosine_similarity
FROM
    vectors_values

Really easy. What is now left to do is to query the Azure OpenAI REST service so that, given any text, we can get the vector representation of that text. Then we can use that vector to calculate the cosine distance against all the Wikipedia articles stored in the database and take only the closest ones which will return the article most likely connect to the topic we are interested in.

Querying OpenAI

Create an Azure OpenAI resource via the Azure portal. For this specific sample you have to deploy an Embedding model using the text-embedding-ada-002 model, the same used for the Wikipedia articles source we are using in this sample. Once that is done, you need to get the API KEY and the URL of the deployed model (read the Embeddings REST API documentation) and then you can use sp_invoke_external_rest_endpoint to call the REST API from Azure SQL database.

declare @retval int, @response nvarchar(max);
declare @payload nvarchar(max) = json_object('input': 'Isaac Asimov');

exec @retval = sp_invoke_external_rest_endpoint
    @url = 'https://<your-app-name>.openai.azure.com/openai/deployments/<deployment-id>/embeddings?api-version=2023-03-15-preview',
    @method = 'POST',
    @headers = '{"api-key":"<your api key>"}',
    @payload = @payload,
    @response = @response output;
The response is a vector of 1536 elements in JSON format. Vector values can be easily extracted using the following T-SQL code:

select [key] as value_id, [value] from openjson(@response, '$.result.data[0].embedding')

Source code

If you are interested in trying this amazing capability by yourself, you can find the source code here:

https://github.com/Azure-Samples/azure-sql-db-openai

Conclusion

The provided sample is not optimized. For example, the square of the vectors: SUM(a.value * a.value) could be pre-calculated and stored in a table for even better efficiency and performance. The sample is purposely simple to make it easier to understand the concept. Even if the sample is also not optimized for performance, it is still quite fast. On an eight vCore Azure SQL database, the query takes only half of a second to return the fifty most similar articles. The cosine distance is calculated on 25,000 articles, for a total of 38 million vector values. Pretty cool, fast and useful!

CTEs, Views or Temp Tables?

Davide Mauri — Fri, 20 Jan 2023 18:03:38 +0000

I've just finished watching the video from the @GuyInACube about Common Table Expressions

and I noticed that in several comments there was the request to explain what is the difference between Common Table Expressions, Views and Temp Tables.

This is quite a common question, and it is time to give it a simple, concise, and clear answer, once and for all.

Common Table Expressions

You can think of a Common Table Expression (CTE) as a table subquery. A table subquery, also sometimes referred to as derived table, is a query that is used as the starting point to build another query. Like a subquery, it will exist only for the duration of the query. CTEs make the code easier to write as you can write the CTEs at the top of your query - you can have more than one CTE, and CTEs can reference other CTEs - and then you can use the defined CTEs in your main query.

CTEs make the code easier to read, and favor reuse: imagine that in each CTE you are defining the subset of data that you want to work on in the main query and you are giving it a label. In the main query then you can just refer to that subset by using its label instead of having to write the whole subquery.

CTEs also allows for some complex scenarios like recursive queries.

Views

Views are metadata objects that allow to save the definition (and the definition only, not the result!) of a query and then use it later by referencing its name. To quote the book I wrote couple of years ago: "A view is nothing more than a query definition, labeled with name, and usable as a table. In fact, views are also known as virtual tables, even if this name is seldomly used"

Temp Tables

Temporary tables are regular tables that must start with # character (or ## for global temporal tables), and on which the query engine can do some special optimization knowing they are ephemeral, that will be automatically dropped once going out of scope (for example, when the connection that created them is terminated).

Temporary tables have no special relationships with queries: you can simply take any query result and save it into a temporary table using, for example, the SELECT INTO command.

Once the SELECT INTO command is finished, the relationship between the query that produced a result set and the temp table that has been used to store that result set is concluded. Think of it as a very simple ETL process. Once it is finished there is nothing that will automatically update or keep in sync the data in the temp table with the result set generated by the query used to move data into the temp table.

When to use what?

Let's start simple

Let's say you have a complex query, where you must put together several different tables to produce the result set you need.

👉 The first thing that is important to keep in mind, that Subqueries, Views and CTEs are all conceptually the same for the query engine. SQL is declarative language where - unless there are some precedence constraints imposed by the operator used - everything is evaluated "all-at-once".

It means that there is no guarantee that a subquery (or a CTE or a view) will be executed before the query that uses it.

This is a super important concept to grasp. It might sound strange to you, but this is a key feature of SQL as the query optimizer can decide - as long as the final result will be correct - to apply filters and optimization wherever is more appropriate.

Let's say, for example, that you have a subquery that filters all people that live in Seattle. On this subquery you want to add an additional filter to limit the result set only to return people that are named Davide.

👉 Thanks to the fact that SQL is "all-at-once" the optimizer can push the outer filter down to the inner subquery and immediately search for all those people who live in Seattle and are named Davide.

If SQL hand't been "all-at-once", the database engine would have had to first execute the query and then search among the resulting rows for only those for which the name is Davide.

That would have been an incredible waste of resources - more CPU and memory used - and would have provided a much worse performance result.

In addition to that, it would also make index usage much more complex and less likely. If we had an index on the people's name, it would be useful only if we could filter first by name and then by city. Luckly the optimizer can move the filters around, given that the query simply states what you want and not how to get it. If we had, instead, to firstly execute the subquery and then the outer query, well...you can guess that the index wouldn't have been that useful, right?

Let's complicate things a bit

So far, from what has just been explained, it seems that using a subquery, a view, or CTE is always a clever idea as we're just allowing the query optimizer to do its work at best.

On paper yes. In practice, we have to take into account that the query optimizer doesn't really know exactly what values are contained in each table and how data is distributed. Does it have a normal distribution? Or is skewed toward some specific values?

🚗 You can think of the query optimizer as your car navigation system. When it plans for a route, it will do so considering the most up-to-date and accurate information about traffic...but you won't be 100% sure that the road it tells you to drive will be the best choice until you are there. What if there has been a sudden surge in traffic for whatever reason? Well, you're there now and you just must wait in queue (or take another road, of course).

🙀 The database is similar in the sense that it will have statistical information of how data is distributed within a table. This is useful to try to decide the best strategy to return the data you're asking in the query, but it also comes with the fact that statistics come with a certain degree of error. This means that the query engine may estimate that the subquery will return "X" rows, but when it executes it will really return "Y" rows. If you have nested subqueries, the error will propagate and can get amplified, up to the point that - potentially - the query optimizer will try to use index "A" as it thinks at some point there will be only - for example - 10 rows involved, but there will be 10K for real, making the index usage a potentially bad choice.

The amplitude of the error propagation and amplification is completely dependent on the query itself, the data in your tables and other factors (updates statics, partitions, etc.). The more table references you have, the more likely this is to happen. Imagine a complex query using CTEs calling several Views which have subqueries inside. Estimation errors will potentially pile up.

How do you see if you are having estimation errors? Execution Plans are your friend. They'll show what steps will be taken to generate the result set and for each step they will show the estimated and the current number of rows touched.

If you notice an estimation completely gone wrong (like several orders of magnitude differences) you may need to give the query optimizer some help.

Temporary tables to the rescue?

Temporary tables can help to greatly reduce or even fix the poor row estimation due to the aforementioned error amplification. How? Well, by storing the result of a subquery into a temporary table, you are resetting such error amplification as the query engine can use the data in the temporary table and thus make sure it is not guessing too much anymore.

Another reason to use a temporary table is if you have a complex query that needs to be used one or more time in subsequent steps and you want to avoid spending time and resource to execute that query again and again (especially if the result set is small compared to the originating data and/or the subsequent queries will not be able to push any optimization down to the subquery as you are working on aggregated data, for example)

But there is no "one-solution-fits-all" here. You must try to see if, for your use case, a subquery is enough, or a temporary table is needed to give the query engine some leverage to get better estimations and thus a better execution plan.

Keep also in mind that using temporary tables comes with some overhead. Aside from the obvious space usage, resources - and thus time - will be spent just for loading them. Sometimes you might even need to create indexes on temporary tables to make sure subsequent query performances are at the top.

The data persisted in the temporary table, also, is not automatically kept up to date with any changes that might be made to the data in the tables used in the originating query. It is your responsibility to refresh the data on the temporary table anytime you need it (Another option would be to use Indexed Views: see below for more details on this feature).

Other stuff that you may want to know

Indexed Views

A special kind of Views, the Indexed Views, can be created so that the produced result is materialized and persisted into the database data file. With Indexed Views, the result doesn't need to be re-calculated every time, so they are great for improving read performances. In HTAP scenarios they can help to get a great performance boost. The database engine will also make sure that every time data in one of the based tables used in an Indexed View is updated, the persisted result is updated too, so that you always have fresh and updated values.

Inline Table-Valued Functions (aka Parametrized Views)

Sometimes you would like to have a View with parameters, to make it easier to return just the subset of values you are interested in. In Azure SQL and SQL Server, you can create parametrized views. They fall (more correctly, IMHO) under the umbrella of "Functions", and specifically they can be created by using Inline Table-Valued Functions:

Conclusion

Now you should have a clear picture of what is the difference between CTEs (or subqueries), Views and Temp Tables.

My recommendation is to start with a CTE and then use temporary tables as needed, so that you can get the performance you want with the minimum overhead possible. (I like to say that usage of temporary table is like salt with foods. You can always add it later.)

If you still have questions, make sure to leave them in the comments, so that we can keep the discussion on!

Photo by Pixabay from Pexels

Advent of Code - Day 10

Davide Mauri — Sun, 11 Dec 2022 04:39:18 +0000

Last week I've been to the DevIntersection conference to present several sessions around Azure SQL and development (Modern Architecture Patterns with Azure SQL Database, The 10 things every developer must absolutely know about Azure SQL and Build a Jamstack solution in a day) so...yeah, I'm already falling behind with the challenges. Anyway. I'll try to catch up with the challenge I missed later.

I've also started using some of the new or updated language elements introduced in SQL Server 2022, also available in Azure SQL.

Part 1

Challenge 10 can be solved using a non-equi join, so that each command provided as input will have exactly one line per cycle. Here's how I did it.

After importing the input using the usual STRING_SPLIT I have a table with one row per command:

All commands operate only on the fictional variable X, which the challenge said start being set to 1.

Using a running total, I can calculate cycle number at which each command is issued and what is the final value if X once the command has completed:

select
    *,
    sum(cycles) over (order by ordinal) as end_cycle,
    isnull(sum([value]) over (order by ordinal), 0) + 1 as end_value
from
    #commands

On that resultset, using the LAG operator, I can identify what is the value at the start and during the command execution, and what is the final value once the command is done. The challenge says, in fact, that the value of X is changed only once the command is finished, not at the beginning or during the operation.

select
    ...
    lag(end_cycle, 1, 0) over (order by ordinal) as start_cycle,
    end_cycle,
    lag(end_value, 1, 1) over (order by ordinal) as start_value,
    end_value
from
    ...

The result is a table with all the data needed to resolve the challenge.

Now I only have a row per command, but instead I need a row per cycle.

Not a big issue, since I have the start and end cycle of each command. I can generate a row per cycle by joining the one-row-per-command table with the usual numbers table (this time I'm using the new GENERATE_SERIES introduced in SQL Server 2022 and available also on Azure SQL), I need to use a non-equi join to generate as many rows as used cycles:

select
    ...
from
    #command_details cd
inner join
     generate_series(1, 10000) n on n.value-1 >= cd.start_cycle and n.value <= cd.end_cycle

Now I have one row per cycle:

The next step is a simple aggregation, filtering by the requested cycles:

select 
    sum([cycle] * [start_value]) as [signal_strength]
from
    #cycles_exploded
where 
    cycle in (20, 60, 100, 140, 180, 220)

Part 1 done.

Part 2

Part 2 is very interesting, as the goal is to "visualize" the result of a fictional low-res CRT display. The display only has 40 rows and 6 columns. The first thing I had to do was to convert to cycle value into horizontal and vertical coordinates. A division is enough to do the trick, and then I had to make sure that for each CRT line the leftmost position was set to 0, as explained in the challenge text:

select 
    *,
    (cycle-1)/40 as [line],
    row_number() over (partition by (cycle-1)/40 order by cycle) - 1 as crt_pos
from
    #cycles_exploded
order by

then the only thing left to do is to implement the logic to understand what character will be printed on the CRT, as described in the challenge:

iif(crt_pos between start_value - 1 and start_value + 1, '#', ' ') as [crt_char]

which will result in the following table:

and finally aggregate (making sure aggregation is done respecting the defined order, via the WITHIN GROUP supported by STRING_AGG) the characters values into a string so that the solution can appear. Part 2 solved.

Code

The solution as usual is on GitHub: https://github.com/yorek/aoc-2022/tree/main/day-10

Advent of Code - Day 4

Davide Mauri — Mon, 05 Dec 2022 02:59:35 +0000

Day 4 - Camp Cleanup challenge is all about dealing with intervals. Intervals are trickier and more complex than one might think. No surprise the organizer of the Advent Of Code, included them in their challenges.

I did a lot of research on intervals, and specifically time intervals in the past, with a specific focus on how they can be applied to Business Intelligence and more specifically to Fact Table. Couple of slide decks I created lately on the subject are the following:

As usual I'm importing the input data and storing it into a table. Input data contains pairs of intervals, so I'm splitting the pairs and the interval begin and end into dedicated columns, for easier manipulation:

create table dbo.ch04_input 
(
    id int identity not null primary key,
    pair1_b int,
    pair1_e int,
    pair2_b int,
    pair2_e int
);

with cteLines as
(
    select 
        trim(replace([value], char(13), '')) as [line]
    from
        string_split(@input, char(10))
),
ctePairs as
(
    select 
        left([line], charindex(',', [line])-1) as pair1,
        right([line], len([line]) - charindex(',', [line])) as pair2
    from
        cteLines
)
insert into 
    dbo.ch04_input (pair1_b, pair1_e, pair2_b, pair2_e)
select
    left([pair1], charindex('-', [pair1])-1)  as pair1_b,
    right([pair1], len([pair1]) - charindex('-', [pair1])) as pair1_e,
    left([pair2], charindex('-', [pair2])-1) as pair2_b,
    right([pair2], len([pair2]) - charindex('-', [pair2])) as pair2_e
from
    ctePairs

the result is the following:

Part 1

The challenges ask to find all the pairs where one interval is completely included in the other. We must use the CONTAINS operator that can be expressed using simple math:

the query, therefore, is:

select
    count(*) 
from 
    ch04_input
where
    (pair1_b >= pair2_b and pair1_e <= pair2_e) -- pair1 CONTAINS pair2
or
    (pair2_b >= pair1_b and pair2_e <= pair1_e) -- pair2 CONTAINS pair1

Challenge solved.

Part 2

The second challenge requires to find all the pairs in which interval overlaps. There's an operator for that, and the math is even simpler:

The query then is:

select
    count(*)
from 
    ch04_input a
where
    (pair1_b <= pair2_e and pair2_b <= pair1_e) -- OVERLAPS

Challenge completed.

Advent of Code - Day 3

Davide Mauri — Sun, 04 Dec 2022 18:50:35 +0000

Today's Advent of Code challenge is really interesting. Somehow easy, but with a couple of interesting discussion points.

Day 3: Rucksack Reorganization is about helping elves to organize and prioritize their supplies.

I imported the input data as usual, by coping it from the website to my Azure Data Studio query and then using STRING_SPLIT to import each single line - that represents the rucksack content - in its own row:

declare @input nvarchar(max) = 'QLFdFCdlLcVqdvFLnFLSSShZwptfHHhfZZZpSwfmHp
rTJRjjbJTgzDJjdsRsfwtfNwtfmZpZNhmmzt
...
jGrGqjJfqccrfqGcGplrJpFvzggqmCtMzmsMnvMvvCgm';

drop table if exists dbo.ch03_input;
create table ch03_input 
(
    id int identity not null primary key,
    items varchar(100) collate Latin1_General_BIN2
);

insert into 
    dbo.ch03_input (items)
select 
    trim(replace([value], char(13), ''))
from
    string_split(@input, char(10))
go

The items in the rucksacks are identified by a letter and the identifiers are case sensitive. For this reason, I used the Latin1_General_BIN2 collation that will make sure I adhere to the requirements and get the best performance possible with strings, as explained in Day 2 challenge solution post.

Part 1

In the first part of the challenge, the rucksack list is split in two, and you must find the item type - represented by its letter - that is in both lists. It is a string comparison problem: which letter of the first list is also in the second list?

The first step is to split the list into two list of the same size:

select 
    *,
    len(items) as itemcount,
    left(items, len(items)/2) as comp1,
    right(items, len(items)/2) as comp2
into    
    #step1
from 
    ch03_input;

I also calculate the length of string as it will come useful in the next step, where I'll split the rucksack string into its letters and store each letter in its own row:

select top(100) row_number() over (order by a.object_id) as n into #n from sys.columns a cross join sys.columns b

select 
    *,
    substring(items, n, 1) as item 
into
    #step2 
from 
    #step1 s  
cross join
    #n n
where 
    n.n <= s.itemcount

Splitting a string in its letters is easy if you have a table with numbers, which is what I'm creating as first thing in the query above. Then I use that numbers table to generate a row for each letter in the string, via the CROSS JOIN and for each row generate extract the Nth letter of the string. The WHERE clause uses itemcount to make sure that I generate exactly one row for each letter in each string, and no more than that.

Then I need to find which item is in both compartments. This means checking if a letter is in a string and that can be done using CHARINDEX:

select distinct
    id, items, comp1, comp2, item,
    charindex(item, comp1, 1) as p1, 
    charindex(item, comp2, 1) as p2
into 
    #step3
from 
    #step2 
where 
    charindex(item, comp1, 1) != 0 
and 
    charindex(item, comp2, 1) != 0
order by id

The query needs a DISTINCT as there can be more than one item of the same type in the string, I need just only one per type. The fact that I need to use a DISTINCT rings some bells (or bring some smell): I'm fairly sure I can refactor my code to do this operation earlier and avoid checking for a letter that has been found already. I'll do this later if I have enough free time. For now, I want to see if my solution works; after that I can optimize it.

Now that the items present in both compartments have been found, I have to assign to each item type the priority value as described in the challenge. Priorities values are based on alphabetical order, so I can use ASCII to get the letter value and transform it to the corresponding priority value. Priority values are ordered differently than the ASCII order, so a CASE statement is needed to apply the right transformations:

select
    item,
    case 
        when item like '[a-z]' then ASCII(item) - ASCII('a') + 1
        when item like '[A-Z]' then ASCII(item) - ASCII('A') + 27
    end as priority,
    id,
    comp1,
    comp2
into
    #priorities
from
    #step3
order by
    item

And now just summing all the priorities will give the answer:

select sum(priority) from #priorities

Answer is correct, so let's move to the next part of the challenge.

Part 2

Elves gather in groups of three, and the goal is to find which item type is carried by everyone in the group.

The first step is to create a way to easily group the elves together. By using the existing id columns and the modulo operator I can find when a new group begins. When the result of the modulo operation is equal to one:

I just need to know when a group starts, so I can set everything not equal to 1 to 0:

Now I can then use a simple and fast running total to generate a group_id that will allow quick identification of all items in a single group. Amazing, isn't it?

Funny enough my SQL Guru friend Itzik mentioned this technique with the running total when we met yesterday evening. Funny that I would have needed it right the next day. Thanks Itzik!

The final query is the following:

select 
    *,
    len(items) as itemcount,
    sum(case when (id % 3) != 1 then 0 else 1 end) over (order by id) as group_id
into    
    #step1
from 
    ch03_input;

easy, fast, and elegant!

Once that a way to identify each group is there, the challenge is almost solved. It is just a matter of splitting the strings into their letters, as I did for part one too.

select 
    *,
    substring(items, n, 1) as item 
into
    #step2 
from 
    #step1 s  
cross join
    #n n
where 
    n.n <= s.itemcount

Now I have everything I need to see which letter is present in all three rucksacks. As simple GROUP BY filtering only those letters that appears exactly three times by using the HAVING clause will give the answer:

with cte as
(    
    select distinct id, group_id, item from #step2 
)
select
    group_id,
    item
into
    #step3
from
    cte
group by 
    group_id, item
having
    count(*) = 3
order by
    group_id

The tricky part here is the DISTINCT operator in the Common Table Expression. That DISTINCT makes sure I can differentiate between an item appearing three times in the same rucksack vs an item appearing one item in all three rucksacks. We're interested only in the latter, and not in the first.

Now I just have to apply the same logic to get the priority value used before, calculate the overall total and I'll get the solution to Part 2. Challenge done.

Try it yourself

The full solution is available here: yorek/aoc-2022

The technique to deal with islands of data is useful in so many practical uses case that I really recommend you to deep dive into it. Take advantage of the free book chapter on the subject available here: Gaps and islands

Alternative solution to Part 2

An alternative solution could have been a simple JOIN between the three rucksacks in the same group:

select distinct
    a.id as group_id,
    a.item as item
into
    #step3
from 
    #step2 a 
inner join
    #step2 b on a.id + 1 = b.id and a.item = b.item
inner join
    #step2 c on b.id + 1 = c.id and b.item = c.item
where
    a.id % 3 = 1
order by a.id
go

That would work just fine, but it will only work for a group of exactly three elves. While perfectly fitting the requirement, I find the chosen solution gives more flexibility and is way more elegant and future proof. Or agile if you wish. It requires a bit of lateral thinking, which is always a good ability to exercise, so great to have a chance to use it. Given that the resulting query touches the table only once instead of, I also suspect it will also be faster. It is worth digging into it a bit more if you have time.

Have fun!

Advent of Code - Day 2

Davide Mauri — Fri, 02 Dec 2022 21:51:31 +0000

The second challenge of the Advent of Code 2022 is pretty straightforward with SQL. In summary the task is to use some starting values and transform those into a numeric value using a lookup table, and then calculate the sum of all the values you get. If you are familiar with relational databases this should sound like a JOIN operation to get the lookup value and a GROUP BY to get the results.

The background story is that you are playing Tic-Tac-Toe with the elves. You are given an encrypted strategy guide that you have to follow if you want to win.

Let's start importing the input. As yesterday I've pasted the input values in a query and then I'm using STRING_SPLIT to move everything into a more comfortable table:

declare @input varchar(max) = 'B Y
A Y
B Z
...
A Y';

drop table if exists dbo.ch02_input;
with cte as
(
    select replace(value, char(10), '') as [round] from string_split(@input, char(13))
)
select 
    identity(int, 1, 1) as id,
    left([round], 1) as [opponent], 
    right([round], 1) as [player] 
into 
    [ch02_input]
from cte;

Full script is available on GitHub here: day-02/00-setup.sql

Part 1

In part one you have to assign to each shape a value. I built the set on the fly using the Row Constructors

select 
        * 
from 
    (values
        ('A', 'Rock', 1),
        ('Y', 'Paper', 2),
        ('B', 'Paper', 2),
        ('X', 'Rock', 1),
        ('C', 'Scissors', 3),
        ('Z', 'Scissors', 3)
    ) decode(code, [shape], [value])

and then all I had to do was join the above set with the input table, to convert the shapes into the associated value. I stored the result into the #result temporary table.

The last step to complete the task is to calculate if I won, tied, or lost each round. While I'm sure there are better ways to do that, given that the number of combinations is extremely limited, I went for a super simple solution, using the CASE statement (I'm really all in for KISS approach):

select 
    *,
    case 
        when opponent_shape = player_shape then 3 -- Tie
        when opponent_shape = 'Rock' and player_shape = 'Paper' then 6 -- Won
        when opponent_shape = 'Rock' and player_shape = 'Scissors' then 0  -- Lost
        when opponent_shape = 'Paper' and player_shape = 'Rock' then 0 -- Lost
        when opponent_shape = 'Paper' and player_shape = 'Scissors' then 6  -- Won
        when opponent_shape = 'Scissors' and player_shape = 'Paper' then 0 -- Lost
        when opponent_shape = 'Scissors' and player_shape = 'Rock' then 6 -- Won
    end as outcome
from 
    #rounds;

Now, to calculate the overall score I did I just need to sum all my games:

select sum(player_value + outcome) from #results

Part 1, done. Find the full script here: day-02/01-part1.sql

Part 2

In part two you discover that you didn't really decoded the original encrypted strategy guide. If fact, the X, Y and Z letter tells you not which shape you should play, but what should be the outcome of that game: "X means you need to lose, Y means you need to end the round in a draw, and Z means you need to win."

To solve the challenge then, I only needed to transform the X, Y and Z into the related A, B and C, based on the given logic. Again, with a CASE statement is pretty easy:

select 
    e.g.*,  
    case
        when player = 'Y' then opponent -- Must tie
        when player = 'X' then -- Must lose
            case opponent
                when 'A' then 'C'
                when 'B' then 'A'
                when 'C' then 'B'
            end
        when player = 'Z' then -- Must win
            case opponent 
                when 'A' then 'B'
                when 'B' then 'C'
                when 'C' then 'A'
            end
    end as player_decoded
from 
    dbo.ch02_input as eg
order by
    eg.id;

With these results I can just then apply the same queries used in Part One to calculate the round results and then get the overall result points. The full script for part two is here: day-02/02-part2.sql

Additional notes

With such small datasets performances are almost never an issue. If you were to use a much larger dataset, say 100 times bigger than this, I would suggest three things to make sure you'll get the best performances possible

If you can, use numbers - integers - as identifiers. Those are much faster when aggregations are required. Strings are really expensive from a CPU perspective.
If you cannot use numbers as identifiers for any reason, make sure you create columns or operate on string using a binary collation. That will make string comparisons (and thus aggregations) much faster as the engine doesn't have to take into account casing, accents and so on. A collation like Latin1_General_BIN2 is your friend when a string is used as id. (Binary Collations, Colummn-level Collations, Expression-level Collations)
Use the columnstore indexes whenever you need to boost the aggregation performance.

Advent of Code - Day 1

Davide Mauri — Thu, 01 Dec 2022 23:51:00 +0000

The first challenge of the Advent of Code 2022 is out, and this year I decided to try to solve the proposed challenges only using T-SQL. I also want to share the solutions here, as I think they will provide a great learning experience for those who are new to SQL, and get a sense of how powerful it is. Here's the solution to the first one (I'll try to keep up with the challenges every day, but I really can't promise anything as it really depends on how much free time I'll have after work and family needs...)

The problem is about elves (of course!), food and calories. You start with a list of calories that elves are bringing with them. The list you are given as the starting point of this challenge has empty lines to separate the calories of each elf in their own inventory. Here's an example, simplified and moved to a spreadsheet for easier understanding:

Let's start importing the calories data. I just pasted the content from the Advent of Code website and the used the STRING_SPLIT function to turn the string into a table and then move the data into the final table I'll be using for this challenge:

drop table if exists dbo.ch01_input;
create table dbo.ch01_input
(
    id int identity not null primary key,
    calories int null
)
go

declare @calories nvarchar(max) = '
3264
4043

...<content from the Advent of Code file here>...

6438
1020';

insert into 
    dbo.ch01_input 
select 
    cast(nullif(replace([value], char(13), ''), '') as int) as calories 
from 
    string_split(@calories, char(10))
go

As you can see, I'm also converting the empty lines into NULL values, so that they can fit into a INT data type as calories values are just numbers. Choosing the right data type is good for data hygiene and to avoid expensive cast operations in future. Strings requires quite a lot of CPU power compared to numbers, and on the cloud, since your pay for what you use, you really want to optimize performances to reduce costs. Not that in this case it would matter as there are just a few thousand rows, but it is nonetheless a good habit to have. I do not want to overengineer, but just some common sense is always good to apply.

Part 1

The first challenge is to "Find the Elf carrying the most Calories. How many total Calories is that Elf carrying?"

That is easy! From the spreadsheet you can see that each elf has a nice set of values, so the problem can be easily solved if we could find a way to easily identify and work on these sets of values. In fact, as per challenge description: "Each Elf separates their own inventory from the previous Elf's inventory (if any) by a blank line". Too bad there are two thousand values so we can't do it visually. The first solution that can come to mind is to just start from the first row and process all of them in sequence, assigning the calories values to a new elf whenever a white line is found. That would work, but it just feels like a "brute force approach". We can do better: we can be smart.

We can use some simple math to identify all the sets of values in the provided list. All that is needed is to give each row a sequential number based on its ordinal position in the file, including the blank lines, and another sequential number excluding, this second time, the blank rows. Here's an example using the simplified spreadsheet representation:

N1 and N2 represent the ordinals given to each element, based on its order, as mentioned before.

Now, if you subtract the values in N2 from the values in N1, you'll get a group identifier:

Isn't that cool! In T-SQL this can be done easily, thanks to the row_number() function:

drop table if exists #part1;
select 
    group_id = id - row_number() over (order by id),
    * 
into
    #part1
from 
    dbo.ch01_input
where
    calories is not null

Now that #part1 contains the data with also the group identifier, a group by can give the answer:

select top(1)
    group_id, sum(calories) as totcalories
from
    #part1
group by
    group_id
order by
    totcalories desc

Part 2

Once the first part of the challenge is done, you'll have access to the second part. The question this time is: "Find the top three Elves carrying the most Calories. How many Calories are those Elves carrying in total?"

Again, pretty easy now that we have a group identifier. We just must take the top three and sum them up:

with cte as
(
    select top(3)
        group_id, sum(calories) as totcalories
    from
        #part1
    group by
        group_id
    order by
        totcalories desc
)
select sum(totcalories) as totaltop3 from cte

Done!

Gaps and Islands

The technique used to solve this problem is well known and my friend and SQL Guru Itzik Ben-Gan has some great articles on it and how it can be used to solve complex problems. Here's some references for you do dig into it:

Push the data out

Davide Mauri — Tue, 22 Nov 2022 17:34:16 +0000

Event-Driven and reactive architectures (see the "Reactive Manifesto" if you're new to the topic or interested to learn more about it) are very popular today. Events - and thus data - is pushed to services so that integration can happen faster and more efficiently.

With Azure SQL database it is possible to call a REST API using the stored procedure sp_invoke_external_rest_endpoint. It is available in any Azure SQL Database (so, no Azure SQL Managed Instance or SQL Server for now) and it as easy to use as you would expect:

declare @response nvarchar(max);

exec sp_invoke_external_rest_endpoint 
    @url = N'https://say-hello.azurewebsites.net/api/hello-message',
    @response = @response output;

select * from openjson(@response);

Take a look at the documentation here: sp_invoke_external_rest_endpoint (Transact-SQL) and unleash your creativity. The possibilities that this feature opens are almost infinite.

The feature is in Public Preview, which means that this is the perfect time to give us your feedback, if you think something can be improved or needs to be changed. Feel free to comment here below, if you have any questions or ideas you want to share.

In the meantime, happy coding! :)

DEV Community: Davide Mauri

Retrieval Augmented Generation with Azure SQL

RAG Pattern 101

Why RAG?

RAG Steps

RAG pattern in Azure

Frontend

Backend

Database

Code and Demo

Conclusion

Share your dev wishes 👍

OpenAPI for your Azure SQL database

How I built a session recommender in 1 hour using Open AI

The Architecture

Implementation Details

The Frontend

The Backend

The Database

About vectors

OpenAI integration

The Result

Vector Similarity Search with Azure SQL database and OpenAI

Vectors in Azure SQL database

Calculating cosine similarity

Querying OpenAI

Source code

Conclusion

CTEs, Views or Temp Tables?

Common Table Expressions

Views

Temp Tables

When to use what?

Let's start simple

Let's complicate things a bit

Temporary tables to the rescue?

Other stuff that you may want to know

Indexed Views

Inline Table-Valued Functions (aka Parametrized Views)

Conclusion

Advent of Code - Day 10

Part 1

Part 2

Code

Advent of Code - Day 4

Part 1

Part 2

More content

Advent of Code - Day 3

Part 1

Part 2

Try it yourself

Alternative solution to Part 2

Advent of Code - Day 2

Part 1

Part 2

Additional notes

Advent of Code - Day 1

Part 1

Part 2

Gaps and Islands

Push the data out