DEV Community: Brainhub

Make Notion search great again: Vector Database

Łukasz Pluszczewski — Mon, 20 Nov 2023 09:15:57 +0000

In this series, we’re looking into the implementation of a vector index built from the contents of our company Notion pages that allows us not only to search for relevant information but also to enable a language model to directly answer our questions with Notion as its knowledge base. In this article, we will see how we’ve used a vector database to finally achieve this.

Numbers, vectors, and charts are real data unless stated otherwise

Last time we downloaded and processed data from Notion API. Let’s do something with it.

Vector Database

To find semantically similar texts we need to calculate the distance between vectors. While we have just a few short texts we can brute-force it: calculate the distance between our query and each text embedding one by one and see which one is the closest. When we deal with thousands or even millions of entries in our database, however, we need a more efficient way of comparing vectors. Just like for any other way of searching through a lot of entries, an index can help here. To make our life easier we’ll use Weaviate DB - a vector database that implements the HNSW vector index to improve the performance of vector search.
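To make the brute-force approach concrete, here is a minimal sketch of it: compute the cosine distance between the query and every stored vector, then sort. The names `cosineDistance` and `findClosest` are hypothetical and not part of any library.

```typescript
// Cosine distance: 0 means the vectors point the same way, 2 means opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedText {
  text: string;
  vector: number[];
}

// Brute force: one distance calculation per stored entry, so O(n) per query.
function findClosest(query: number[], entries: EmbeddedText[], limit: number): EmbeddedText[] {
  return entries
    .map((entry) => ({ entry, distance: cosineDistance(query, entry.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map(({ entry }) => entry);
}
```

This is perfectly fine for a handful of texts. The cost grows linearly with the number of entries, which is exactly the problem a vector index solves.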

There are a lot of different vector databases you can use. We’ve used Weaviate DB because it has reasonable defaults, including vector and BM25 indexes working out of the box, and a lot of features that can be enabled with modules (like “rerank” mentioned before). You can also consider the PostgreSQL extension “pgvector” to take advantage of SQL goodness: relations, joins, subqueries, and so on, while Weaviate may be more limited in that regard. Choose wisely!

I may revisit the topic of vector indexes in the future but in this article I’ll just use the database that implements it. To learn more about HNSW itself look here, and to learn more about configuring vector index in Weaviate DB look here.

Weaviate DB

Weaviate DB is an open-source, scalable, vector database that you can easily use in your own projects. The vector goodness is just one docker container away and you can run it like this:

docker run -p 8080:8080 -d semitechnologies/weaviate:latest

Weaviate is modular, and there are a number of modules allowing you to add functionality to your database. You can provide the embedding vectors to the database entries yourself, but there are modules to calculate those for you, like the text2vec-openai module that uses the OpenAI API. There are modules allowing you to easily back up your DB data to S3, add rerank functionality to your searches, and many more. Enabling a module is as simple as adding an environment variable:

docker run -p 8080:8080 -d \
  -e ENABLE_MODULES=text2vec-openai,backup-s3,reranker-cohere \
  semitechnologies/weaviate:latest

Now, to connect to the database from our typescript project:

import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

All the data in Weaviate DB is stored in classes (equivalent to tables in SQL or collections in MongoDB), containing data objects. Objects have one or more properties of various types, and each object can be represented by exactly one vector. Just like SQL databases, Weaviate is schema-based. We define a class with its name, properties, and additional configuration, like which modules should be used for vectorization. Here is the simplest class with one property.

{
  class: 'MagicIndex',
  properties: [
    {
      name: 'content',
      dataType: ['text'],
    },
  ],
}

We can add as many properties as we like. There are a number of types available: int, number, text, boolean, date, geoCoordinates (with special ways to query based on the location), blob, and lists of most of these, like int[] or text[]:

{
  class: 'MagicIndex',
  properties: [
    { name: 'content', dataType: ['text'] },
    { name: 'tags', dataType: ['text[]'] },
    { name: 'lastUpdated', dataType: ['date'] },
    { name: 'file', dataType: ['blob'] },
    { name: 'location', dataType: ['geoCoordinates'] },
  ],
}

You can also control how, and for what properties the embeddings are going to be calculated if you don’t want to provide them yourself:

{
  class: 'MagicIndex',
  properties: [
    { name: 'content', dataType: ['text'] },
    {
      name: 'metadata',
      dataType: ['text'],
      moduleConfig: {
        'text2vec-openai': {
          skip: true,
        },
      },
    },
  ],
  vectorizer: 'text2vec-openai',
}

In this case, we’re going to use the text2vec-openai module to calculate vectors but only from the content property.

Weaviate stores exactly one vector per object so if you have more fields that are vectorized (or you have vectorizing class name enabled) embedding is going to be calculated from concatenated texts. If you want to have separate vectors for different properties of the document (like different chunks, title, metadata etc.) you need separate entries in the database.

Applying a schema is as simple as:

await client.schema
  .classCreator()
  .withClass(classDefinition)
  .do();

Let’s see what the data objects look like in our Notion index:

{
  pageTitle: 'Locomotive Kinematics of Quick Brown Foxes: An In-Depth Analysis of Canine Velocity Over Lazy Canid Obstacles',
  chunk: '1',
  originalContent: '# Abstract\n\nThe paradigm of quick brown foxes leaping over lazy dogs has long fascinated both the scientific community and the general public...',
  content: 'abstract\nthe paradigm of quick brown foxes leaping over lazy dogs has long fascinated both the scientific community and the general public...',
  pageId: 'dfda9d5d-b059-4186-95f4-7cb8cdf42545',
  pageType: 'page',
  pageUrl: 'https://www.notion.so/LeapFoxSolutions/dfda9d5d-b059-4186-95f4-7cb8cdf42545',
  lastUpdated: '2023-04-12T23:20:50.52Z'
}

Let’s get what is obvious out of the way: we store the page title, its ID, URL, and the last update date. We vectorize only the content property: the vectorizer ignores the title, originalContent, and so on.
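For reference, a class definition matching the object above could look roughly like the sketch below. It is an assumption pieced together from the properties shown, not our exact schema, and the `skipped` helper is hypothetical: it only cuts repetition.

```typescript
// Hypothetical helper: a text property that is excluded from vectorization.
const skipped = (name: string) => ({
  name,
  dataType: ['text'],
  moduleConfig: { 'text2vec-openai': { skip: true } },
});

// Assumed schema for the Notion index: only `content` feeds the vectorizer.
const notionIndexClass = {
  class: 'MagicIndex',
  vectorizer: 'text2vec-openai',
  // Keep the class name out of the embedding as well
  moduleConfig: { 'text2vec-openai': { vectorizeClassName: false } },
  properties: [
    { name: 'content', dataType: ['text'] },
    skipped('originalContent'),
    skipped('pageTitle'),
    skipped('chunk'),
    skipped('pageId'),
    skipped('pageType'),
    skipped('pageUrl'),
    // Non-text properties are not vectorized, so no skip flag is needed here
    { name: 'lastUpdated', dataType: ['date'] },
  ],
};
```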

You probably noticed a chunk property though. What is it? Embeddings work best when the texts are not too long. They are generally used for texts no longer than a short paragraph, so we split the contents of Notion pages into smaller chunks. We’ve used LangChain's recursive text splitter. It tries to split the text first by double newlines; if some chunks are still too long, by single newlines, then by spaces, and so on. This way we keep paragraphs together if possible. We’ve set the target chunk length to 1000 characters with a 200-character overlap.
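The idea behind recursive splitting can be sketched in a few lines. This is a simplified illustration, not LangChain's implementation: it tries the separators in order and only falls back to a finer one for pieces that are still too long. Overlap between chunks is left out to keep it short, and `splitRecursively` is a hypothetical name.

```typescript
function splitRecursively(
  text: string,
  maxLength: number,
  separators: string[] = ['\n\n', '\n', ' '],
): string[] {
  if (text.length <= maxLength) return [text];

  const [separator, ...finerSeparators] = separators;
  if (separator === undefined) {
    // No separators left: cut the text at fixed positions
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxLength) {
      pieces.push(text.slice(i, i + maxLength));
    }
    return pieces;
  }

  const chunks: string[] = [];
  let current = '';
  for (const part of text.split(separator)) {
    // Keep merging parts while the chunk still fits
    const candidate = current ? current + separator + part : part;
    if (candidate.length <= maxLength) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    if (part.length > maxLength) {
      // A single part is too long: retry it with the finer separators
      chunks.push(...splitRecursively(part, maxLength, finerSeparators));
      current = '';
    } else {
      current = part;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

With `maxLength` set to 1000 this keeps whole paragraphs together whenever they fit, which is the behavior described above.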

The length of the chunks and the way you split them can have a huge impact on vector search performance. It is generally assumed that chunk size should be similar to the length of the query (so during the search you compare vectors of similarly sized texts). In our case chunks 1000 characters long, although pretty big, seem to work best but your mileage may vary. Additionally, we also make sure that table rows are not sliced in half to avoid “orphaned” columns. This is a huge topic and I may revisit it in one of the future posts.

We save each chunk separately in the database, and the chunk property is the index of the chunk. Why is it a string and not a number though? Because we don’t vectorize the title property, we save a separate entry for it that looks like this:

{
  pageTitle: 'Locomotive Kinematics of Quick Brown Foxes: An In-Depth Analysis of Canine Velocity Over Lazy Canid Obstacles',
  chunk: 'title',
  originalContent: 'Locomotive Kinematics of Quick Brown Foxes An In-Depth Analysis of Canine Velocity Over Lazy Canid Obstacles',
  ...
}

In the future, we may decide that we want to vectorize more properties of the page than just content and title. We can do that easily just by adding a new possible value to the chunk property.
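Put together, turning one page into database entries could look like the sketch below. The `buildEntries` function and the `ChunkEntry` type are hypothetical, and numbering chunks from '0' is an assumption made for the example.

```typescript
interface ChunkEntry {
  pageId: string;
  pageTitle: string;
  chunk: string; // index of the content chunk as a string, or 'title'
  originalContent: string;
  content: string;
}

function buildEntries(
  page: { id: string; title: string },
  chunks: string[],
  cleanUp: (text: string) => string,
): ChunkEntry[] {
  const base = { pageId: page.id, pageTitle: page.title };

  // The title gets its own entry so that it has its own vector
  const titleEntry: ChunkEntry = {
    ...base,
    chunk: 'title',
    originalContent: page.title,
    content: cleanUp(page.title),
  };

  const contentEntries = chunks.map(
    (text, index): ChunkEntry => ({
      ...base,
      chunk: String(index),
      originalContent: text,
      content: cleanUp(text),
    }),
  );

  return [titleEntry, ...contentEntries];
}
```

Adding another vectorized property later means adding one more entry with a new chunk value, without touching the schema.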

What’s the deal with content and originalContent properties? To spare the vectorizer some noise in the data, we prepare a cleaned-up version of each chunk. We remove all special characters, replace multiple whitespace characters with a single one, and change the text to lowercase. In our testing, vector search is slightly more accurate with this simple cleanup. We still keep originalContent though, because this is what we pass to rerank and use for traditional, inverted index search.
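A cleanup like this fits in a single function. The sketch below is an approximation, not our exact code: what counts as a “special character” (anything that is not a letter, digit, or whitespace) and keeping a single newline between lines are assumptions.

```typescript
function cleanUpText(text: string): string {
  // Assumption: everything except letters, digits and whitespace is "special"
  const specialCharacters = new RegExp('[^\\p{L}\\p{N}\\s]', 'gu');

  return text
    .replace(specialCharacters, '')
    // Collapse each whitespace run: keep one newline if it had any, else one space
    .replace(/\s+/g, (whitespace) => (whitespace.indexOf('\n') >= 0 ? '\n' : ' '))
    .trim()
    .toLowerCase();
}
```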

Lastly, we have pageType property which is just a result of a Notion quirk: a page in Notion can be either a page or a database. As mentioned in the previous article, we treat both the same way in our index: databases are converted to simple tables.

Ok, we have an idea of what data we are going to store in the database, but how to add, fetch, and query that data?

Weaviate interface

Weaviate offers two interfaces to interact with it, RESTful and GraphQL APIs, and this is reflected in the available TypeScript client methods. We will focus on the GraphQL interface. To get entries from the database, we simply need to provide a class name and the fields we want to get:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withFields('pageTitle originalContent pageUrl')
  .do();

It is recommended that each query is limited and uses cursor-based pagination if necessary:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withFields('pageTitle originalContent pageUrl')
  .withLimit(50)
  .withAfter(cursor)
  .do();

Let’s add some entries to the database:

await client.data
  .creator()
  .withClassName('MagicIndex')
  .withProperties({
    pageTitle: 'Vulpine Agility vs. Canine Apathy: A Comparative Study',
    chunk: '2',
    originalContent: '## Background \n\n Though colloquially immortalized in typographical tests, the scenario of a quick brown fox vaulting over a lazy dog presents...',
    content: 'background\nthough colloquially immortalized in typographical tests the scenario of a quick brown fox vaulting over a lazy dog presents...',
    pageId: '1ba0b851-d443-4290-8415-3cd295850d14',
    pageType: 'page',
    pageUrl: 'https://www.notion.so/LeapFoxSolutions/1ba0b851-d443-4290-8415-3cd295850d14',
    lastUpdated: '2023-03-01T12:21:30.12Z'
  })
  .do();

With vectorizer enabled for MagicIndex class, that’s all we need to do. The entry is added to the database together with its vector representation calculated by OpenAI’s ADA embedding model. Now we can search for texts about foxes and dogs all day long.

Traditional search

Weaviate allows us to search with traditional inverted index methods too! We have a bag-of-words ranking function called BM25F at our disposal. It’s configured with reasonable defaults out of the box. Let’s see it in action:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withBm25({
    query: 'Can the fox really jump over the dog?',
    properties: ['originalContent'],
  })
  .withLimit(5)
  .withFields('pageTitle originalContent pageUrl _additional { score }')
  .do();

You can see the _additional property that we can request in the query. It can contain various additional data related to the object itself (like its ID) or the search (like BM25 score or the cosine distance in case of vector search).

Vector search

Of course, an inverted index search will not find the many texts that talk about brown foxes without using those words. Thankfully, semantic search is just as easy to perform:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withNearText({ concepts: ['Can the fox really jump over the dog?'] })
  .withLimit(5)
  .withFields('pageTitle originalContent pageUrl _additional { distance }')
  .do();

There is some additional magic that we can do to make the search even better like setting the maximum cosine distance that we accept in the search results, or using the autocut feature:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withNearText({
    concepts: ['Can the fox really jump over the dog?'],
    distance: 0.25,
  })
  .withAutocut(2)
  .withLimit(10)
  .withFields('pageTitle originalContent pageUrl _additional { distance }')
  .do();

Now, not only do we get only results with a cosine distance less than 0.25 (that’s what the distance setting in the withNearText method does), but additionally, Weaviate’s autocut feature will group the results by similar distance and return the first two groups (more on how autocut works here).

But that’s not all. We can also steer the search toward some concepts and away from others:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withNearText({
    concepts: ['Can the fox really jump over the dog?'],
    moveAwayFrom: {
      concepts: ['typography'],
      force: 0.45,
    },
    moveTo: {
      concepts: ['scientific'],
      force: 0.85,
    },
  })
  .withFields('pageTitle originalContent pageUrl')
  .do();

While the example with foxes is a little silly, you can imagine many scenarios where that feature can be really useful. Maybe you’re looking for “ways to fly” but you want to move away from “planes” and move toward “animals”. Or you may search for a query, but keep the results similar to some other object in the database:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withNearText({
    concepts: ['Can the fox really jump over the dog?'],
    moveTo: {
      objects: [{ id: '84ab0371-a73b-4774-8b03-eccb97b640ae' }],
      force: 0.85,
    },
  })
  .withFields('pageTitle originalContent pageUrl')
  .do();

There are many other features that you may want to experiment with. Read more on those in the Weaviate documentation.

Hybrid search

Finally, we can combine the power of vector search with the BM25 index! Here comes the hybrid search, which uses both methods and combines their results with given weights:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withHybrid({
    query: 'Can the fox really jump over the dog?',
  })
  .withLimit(5)
  .withFields('pageTitle originalContent pageUrl _additional { distance score explainScore }')
  .do();

In the _additional.explainScore property, you will find the details about score contributions from vector and inverted index searches. By default, the vector search result has a weight of 0.75 and the inverted index result a weight of 0.25, and those are the values we use in our Notion search. More about how hybrid search works and how to customize the query (including how to change the way vector and inverted index results are combined) can be found here.
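To give an intuition for what “combining with weights” means, here is a simplified sketch of a weighted fusion of two result lists. It is not Weaviate's actual fusion algorithm, just an illustration of the idea: scores from both searches are normalized to the 0..1 range and summed with the weights alpha and 1 - alpha. All names here are hypothetical.

```typescript
interface ScoredResult {
  id: string;
  score: number; // higher is better; for vector search use e.g. 1 - distance
}

// Min-max normalize scores to 0..1 so that both searches are comparable.
function normalizeScores(results: ScoredResult[]): Map<string, number> {
  const normalized = new Map<string, number>();
  let min = Infinity;
  let max = -Infinity;
  for (const result of results) {
    min = Math.min(min, result.score);
    max = Math.max(max, result.score);
  }
  for (const result of results) {
    normalized.set(result.id, max === min ? 1 : (result.score - min) / (max - min));
  }
  return normalized;
}

// alpha is the weight of the vector search; the keyword search gets 1 - alpha.
function fuseResults(
  vectorResults: ScoredResult[],
  keywordResults: ScoredResult[],
  alpha: number = 0.75,
): ScoredResult[] {
  const vectorScores = normalizeScores(vectorResults);
  const keywordScores = normalizeScores(keywordResults);

  const ids = new Set<string>();
  vectorResults.forEach((result) => ids.add(result.id));
  keywordResults.forEach((result) => ids.add(result.id));

  const fused: ScoredResult[] = [];
  ids.forEach((id) => {
    const score =
      alpha * (vectorScores.get(id) ?? 0) + (1 - alpha) * (keywordScores.get(id) ?? 0);
    fused.push({ id, score });
  });
  return fused.sort((a, b) => b.score - a.score);
}
```

A result missing from one of the lists simply contributes 0 from that search, so a text found only by keywords can still make it into the final ranking.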

Rerank

If we enable the rerank module, we can use it to improve the quality of search results. It works for any search method: vector, BM25, or hybrid:

await client.graphql
  .get()
  .withClassName('MagicIndex')
  .withHybrid({
    query: 'Can the fox really jump over the dog?',
  })
  .withLimit(100)
  .withFields('pageTitle originalContent pageUrl _additional { rerank(property: "originalContent" query: "Can the fox really jump over the dog?") { score } }')
  .do();

Adding a rerank score field to the query will make Weaviate call a rerank module and reorder the results based on the score received. To increase the chance of finding relevant results, we’ve also increased the limit: now rerank has more texts to work on and can find relevant results even if we had a lot of false positives from a hybrid search.

Summary

To summarize: in our Notion index, we’ve used Weaviate DB with the following modules:

text2vec-openai enabling Weaviate to calculate embeddings using OpenAI API and ADA model
reranker-cohere allowing us to use CohereAI’s reranking model to improve search results
backup-s3 just to make it easier to backup data and migrate between environments

To get the data to index, we fetch all Notion pages using a search endpoint with an empty query. In each page, we recursively fetch all blocks that are then parsed by a set of parsers: specific for each type of block. We then have a markdown-formatted string for each page.

We then split the contents of each page into chunks: 1000 characters long with 200 characters of overlap. We also “clean up” the texts by removing special characters and multiple whitespaces to improve the performance of vector search.

The data for each page chunk is then inserted into the database with a fairly straightforward schema. We have an index of the chunk and some properties of the Notion page: URL, ID, title, and type. Additionally, we keep both original, unaltered content and cleaned-up versions but we calculate embeddings only from the latter.

To find information in the index, we use the hybrid search with a default limit of 100 chunks, with rerank enabled by default.

What worked and what didn’t

So, the $100mln question. Does it work?

Absolutely! We have a working semantic search that allows us to reliably search for information even without using the exact wording used on the pages we’re looking for. You can search for “parking around the office” or “where to leave my car around the office” or even just “parking?”. How to use a coffee machine? What benefits are available in Brainhub? Which member of the team is skilled in martial arts? Who should I talk to if I want a new laptop? What are Brainhub’s values?

Not everything works perfectly though. Finding information in large tables (e.g. we have a table with team members: long, with a lot of columns and long texts inside) may be challenging if you’re not smart about chunking them, e.g. by ensuring that one row stays in one chunk, even a very long one, to avoid orphaned columns. Even then the search is not perfect: when asking who is a UX designer on our team, it may find a chunk with one person out of three UX designers in a table. While this is fine for search (in the search results, you still get the link to the correct page that contains the whole table), it may not be enough for a Q&A bot, which may miss some information because of it.

Another issue is noise. One of the reasons we wanted a better search was thousands of pages of meeting notes, outdated guidelines, and other mostly irrelevant stuff that lurks in the depths of our Notion workspace. We did implement some mitigations to improve search results and get rid of noise, like lowering the “search score” of old pages but it was not enough. The best method was still manually excluding areas that were most problematic. That’s not ideal of course, we would like our search engine to figure out what’s relevant automatically so that’s something to do more research on.

In general though, the results are more than satisfactory and, while there were a lot of small tweaks here and there needed, we’ve managed to create a Notion search that actually works.

Make Notion search great again: Notion API

Łukasz Pluszczewski — Tue, 17 Oct 2023 07:42:42 +0000

In this series, we’re looking into the implementation of a vector index built from the contents of our company Notion pages that allows us not only to search for relevant information but also to enable a language model to directly answer our questions with Notion as its knowledge base. In this article, we will explore the Notion API.

Before we can create a searchable index, we need to get the contents of the Notion pages. Let’s see how we used Notion API to do that.

Notion integrations

Before we can start using Notion API we must create an integration, sometimes called a “Connection” in the UI. An “Integration Secret” is generated for each integration which can be then used to access API. You can select permissions for the connection which Notion calls “capabilities”. Our index does not write anything to indexed pages, so we selected only “read” permissions. We also allowed the integration to read user information so that we can replace mentions with people's real names.

Integration can only access pages that it has been manually added to. This access also extends to all of its descendant pages. Keep in mind that what an integration has access to is not as straightforward as you might think. Removing it from a page may not propagate if either permissions or integration access to some child page has been separately modified. Also, integration has access to non-public pages! Because our solution will give indirect access to all indexed pages to everyone in the company (via the language model answers), we’ve made sure that we’re assigning it only to the pages we’re absolutely sure don’t contain any private information.

Notion client

To communicate with the Notion API, we’ve used a dedicated TypeScript library, @notionhq/client.

Below is an example of configuration and simple request:



import { Client as NotionClient, LogLevel } from '@notionhq/client';

const client = new NotionClient({
  auth: 'abcdef123',
  logLevel: LogLevel.WARN,
});

await client.pages.retrieve({ page_id: '3853acec-eebc-42e2-843b-2c340f769b80' });

Besides retrieving pages, we can also fetch databases, blocks, block’s children, etc.



await client.databases.retrieve({ database_id: 'c2da9700-8244-4bc0-bff1-8dccd909b211' });
await client.blocks.retrieve({ block_id: 'b00afc3d-1db2-4cf3-9801-868bd84f06f8' });
await client.blocks.children.list({ block_id: 'b00afc3d-1db2-4cf3-9801-868bd84f06f8' });

Rate limiting

The Notion API doesn’t have a hard limit. You’re expected not to exceed an average of 3 req/s, but occasional bursts above that are allowed. That’s why our internal rate limiter allows slightly more frequent requests, with proper rate limit error handling just in case. If you reach the rate limit, the Notion API will respond with a specific error code and a “retry-after” header that indicates the wait time in seconds.

To ensure that we handle API’s rate limits correctly, we’ve implemented an API client wrapper that handles errors appropriately. Below is a simplified example of rate limit handling:



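async request() {
  try {
    return await client.databases.retrieve({ database_id: 'c2da9700-8244-4bc0-bff1-8dccd909b211' });
  } catch (error) {
    // Anything other than a rate limit error is rethrown
    if (!isNotionApiErrorOfType(error, APIErrorCode.RateLimited)) {
      throw error;
    }

    // The header holds the wait time in seconds
    const retryAfter = parseInt(error.headers.get('retry-after'), 10);

    // delay() is our own helper (not shown) that takes a wait time in milliseconds
    return delay(
      () => this.request(),
      retryAfter * 1000,
    );
  }
}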
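The same idea can be written as a small, library-independent helper. This is a sketch under assumptions, not our production wrapper: `withRateLimitRetry` is a hypothetical name, and how to read the wait time from an error is passed in as a function because it depends on the client used.

```typescript
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  // Returns the wait time in seconds for rate limit errors, null for other errors
  getRetryAfterSeconds: (error: unknown) => number | null,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts: number = 5,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const retryAfter = getRetryAfterSeconds(error);
      // Give up on other errors, or when we have run out of attempts
      if (retryAfter === null || attempt >= maxAttempts) {
        throw error;
      }
      await sleep(retryAfter * 1000);
    }
  }
}
```

Capping the number of attempts avoids retrying forever if the API keeps rejecting requests, and the injectable `sleep` makes the helper easy to test without actually waiting.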

Pagination

Most endpoints have a limit on the number of entries returned and provide a “cursor” if there is more data to fetch. Below is a simple example of how to handle the pagination if we want to load all data - a function that fetches all pages:



private async fetchAllPages(query?: string, cursor?: string) {
  const response = await client.search({
    query,
    start_cursor: cursor,
  });

  // Keep following the cursor until there is no more data
  if (response.next_cursor) {
    return [
      ...response.results,
      ...(await this.fetchAllPages(query, response.next_cursor)),
    ];
  }
  return response.results;
}

Since the Notion API does not have a “get all pages” endpoint, the function above uses the search endpoint with an empty query to retrieve all pages. While it is not a reliable way of doing that, as for example recently added pages or databases may have not been indexed yet and are not going to be returned, we’ve decided that it’s good enough for now.
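Cursor pagination can also be handled with a loop instead of recursion, which avoids a deep call stack for very long result lists. Below is a generic sketch: `collectAll` and `CursorPage` are hypothetical names, and the fetching function is passed in so that the helper works with any endpoint that returns `results` and `next_cursor`.

```typescript
interface CursorPage<T> {
  results: T[];
  next_cursor: string | null;
}

async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<CursorPage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.results);
    // A null cursor means there is nothing more to fetch
    cursor = page.next_cursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}
```

With the Notion client, the fetching function could wrap the search call, for example `(cursor) => client.search({ start_cursor: cursor })`.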

Blocks

Texts in Notion are structured around “blocks”, which are the basic units of content. Whatever you add to the page is a block: a paragraph, a list, a table, and so on. Each block can be standalone, like a paragraph with just some text in it, or have child blocks, like a list item containing a sub-list, etc. Below is an example of a block (from the Notion documentation):



{
    "id": "c02fc1d3-db8b-45c5-a222-27595b15aea7",
    "type": "heading_2",
    "heading_2": {
        "rich_text": [
            {
                "type": "text",
                "text": {
                    "content": "Lacinato kale",
                    "link": null
                },
                "annotations": {
                    "bold": false,
                    "italic": false,
                    "strikethrough": false,
                    "underline": false,
                    "code": false,
                    "color": "green"
                },
                "plain_text": "Lacinato kale",
                "href": null
            }
        ],
        ...
    }
}

There are more properties (like “parent” or “last_edited_time”) that we’ve hidden so that we can focus on what’s important. Each block has a “type” property that tells us what kind of block it is, but also, where to get its contents from. Different blocks have different data structures, so we have a separate piece of code, called “parser” to handle each block type. Below are two examples of parsers:



[BlockTypes.NumberedListItem]: {
  parse: (block, ctx) => {
    const number =
      'number' in block.numbered_list_item
        ? block.numbered_list_item.number
        : null;
    const text = getPlainText(block.numbered_list_item.rich_text, ctx);
    return number ? `${number}. ${text}` : text;
  },
},
[BlockTypes.ToDo]: {
  parse: (block, ctx) => {
    const text = getPlainText(block.to_do.rich_text, ctx);
    const checkbox = block.to_do.checked ? '[X]' : `[ ]`;
    return `${checkbox} ${text}`;
  },
},

The “rich_text” property contains an array of elements that we need to parse. The “getPlainText” function is a simple helper that converts that array into a string. Additionally, it receives a “context” containing the list of users so that it can replace all mentions with actual names.

Our parsers return text formatted as markdown, as it is easily understandable by LLMs, and also, unlike HTML, doesn’t leave much garbage after removing special characters for embeddings.
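A helper like that could look roughly like the sketch below. It is a simplified illustration, not our actual “getPlainText”: the types cover only the fields used here, the context is assumed to be a plain map from user ID to name, and the function is named `getPlainTextSketch` to make that clear.

```typescript
// Only the parts of a Notion rich text item that this sketch needs
interface RichTextItem {
  type: string;
  plain_text: string;
  mention?: { type: string; user?: { id: string } };
}

interface ParserContext {
  users: Record<string, string | undefined>; // user ID -> real name
}

function getPlainTextSketch(richText: RichTextItem[], ctx: ParserContext): string {
  return richText
    .map((item) => {
      const isUserMention = item.type === 'mention' && item.mention?.type === 'user';
      const userId = isUserMention ? item.mention?.user?.id : undefined;
      const name = userId ? ctx.users[userId] : undefined;
      // Use the real name when we know it, otherwise keep the text as it is
      return name ? `@${name}` : item.plain_text;
    })
    .join('');
}
```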

Since blocks can have child blocks, we fetch blocks recursively:



private async getBlocksRecursively(pageOrBlockId: string): Promise<BlockObjectResponse[]> {
  // Simplified: only the first page of children is fetched here (see Pagination above)
  const blocks = await this.notion.blocks.children.list({
    block_id: pageOrBlockId,
  });

  return await Promise.all(
    blocks.results.map(async (block) => {
      if (block.has_children) {
        return {
          ...block,
          children: await this.getBlocksRecursively(block.id),
        };
      }
      return block;
    }),
  );
}

By gathering all blocks recursively and converting them to text, we get nice, markdown-formatted content of the page.
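The conversion step can be sketched as a small recursive function. It assumes that each block has already been turned into a line of text by its parser, and that nesting is expressed with two spaces of indentation per level; both the `ParsedBlock` type and `renderBlocks` are hypothetical.

```typescript
interface ParsedBlock {
  text: string; // output of the parser for this block
  children?: ParsedBlock[];
}

function renderBlocks(blocks: ParsedBlock[], depth: number = 0): string {
  return blocks
    .map((block) => {
      const line = '  '.repeat(depth) + block.text;
      // Children are rendered right below their parent, one level deeper
      const nested =
        block.children && block.children.length > 0
          ? '\n' + renderBlocks(block.children, depth + 1)
          : '';
      return line + nested;
    })
    .join('\n');
}
```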

Pages and Databases

Notion organizes its content into pages. In addition to the block contents described above, each page can also have properties. They are similar to blocks, but they have keys, don't have children, their ID is not a UUID, and they lack certain properties like 'last_edited_time'. The possible types and formats of property values are the same as those of blocks, so we use the same code to parse them. Below is an example of a property, with the key “When” and type “date”:



"When": {
    "id": "some-id",
    "type": "date",
    "date": {
        "start": "2023-03-23",
        "end": "2023-05-05",
        "time_zone": null
    }
},

Notion also includes databases, which are collections of pages that can be filtered, sorted, and organized as needed. When you view the database as a table, what you see in the column contains the value of the corresponding “property” of the given page. In our index, we represent databases and simple tables in the same way.
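Representing a database as a simple table can be sketched as below. The input format (column names plus one record of already-parsed property values per page) and the name `toMarkdownTable` are assumptions made for the example.

```typescript
function toMarkdownTable(
  columns: string[],
  rows: Record<string, string | undefined>[],
): string {
  // Pipes and newlines inside a cell would break the table layout
  const escapeCell = (value: string) => value.replace(/\|/g, '\\|').replace(/\n/g, ' ');

  const header = `| ${columns.map(escapeCell).join(' | ')} |`;
  const divider = `| ${columns.map(() => '---').join(' | ')} |`;
  const body = rows.map(
    (row) => `| ${columns.map((column) => escapeCell(row[column] ?? '')).join(' | ')} |`,
  );

  return [header, divider, ...body].join('\n');
}
```

Each row of the output corresponds to one page of the database, and each column to one of its properties.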

Pages that are members of databases, in addition to their properties, can also have ordinary content. In other words, with each “row” being a page, each “column” is a property of that page, but because the page itself works just like any other, users can add normal content to it: paragraphs, lists, images, and so on. Because the content is not visible in any database view in Notion and is not visible in our representation of a database, we additionally index all members as separate pages.

Retrieving data from Notion can be unintuitive and sometimes tedious, especially when dealing with permission management and handling different types of blocks. However, we've successfully parsed all page and database contents into clean markdown texts. The only thing left to do is to build a vector index from these contents, but we'll cover that in the next article. Stay tuned!

Make Notion search great again: semantic search

Łukasz Pluszczewski — Wed, 04 Oct 2023 11:22:25 +0000

In this series, we’re looking into the implementation of a vector index built from the contents of our company Notion pages that allows us not only to search for relevant information but also to enable a language model to directly answer our questions with Notion as its knowledge base. In this article, we will see how to use vector embeddings to search and how to improve its performance.

Numbers, vectors, and charts are real data unless stated otherwise

Last time we explored vector embeddings and their main utility for our case: distances between them represent semantic similarities between texts. Let’s see them in action.

Airbus or Boeing?

Let’s consider the following texts:

"Pope John Paul II was the first non-Italian pope in more than..."
"Pope Francis is the head of the Catholic Church, the bishop..."
"Nicolaus Copernicus, a Renaissance-era astronomer..."
"Johannes Kepler was a German astronomer, mathematician, astrologer,..."
"The Tesla Model 3 is an electric car produced by..."
"The Ford Focus is a compact car manufactured by Ford..."
"The Ford Mustang is a series of American automobile..."
"The Dodge Challenger is the name of three different..."
"The Boeing 737 is a narrow-body aircraft produced ..."
"The Airbus A380 is a large wide-body airliner that..."
"The Airbus A320 family consists of short to..."
"Salamanders are a group of amphibians typically..."
"The dog is a domesticated descendant of the wolf..."
"The cat is a domestic species of small carnivorous mammal..."
"Elephants are the largest living land animals..."
"The tiger (Panthera tigris) is the largest living..."
"Rabbits, also known as bunnies or bunny rabbits..."

We have texts about cars, planes, animals, two popes, and two astronomers. We can calculate embeddings for each text and see how far they are from each other. Using OpenAI's ADA-2 model, we would get 1536-dimensional vectors, so we would have a hard time visualizing them. But we have a tool up our sleeves that will help us out. What is that? That’s right, embeddings 🙂

There is nothing stopping us from calculating 2-dimensional embeddings of those vectors so that we can see relationships between them on a flat screen. This time, the small embeddings were calculated algorithmically, not by neural network. Below is the result:

If you’re curious about how to reduce the dimensionality of vectors using the same algorithm, you can read more here.

You can find a few interesting things here. Firstly, different categories of texts are clearly separated: animals, cars, people, and planes have their own place in the chart and are quite far from other categories. But that’s not all. Two popes are close together and a little further away from astronomers. A dog is close to a cat, but quite far away from a tiger, which in turn is closer to a cat than to a dog or a bunny. Of course, because we decreased the dimensionality of the vectors, we’ve most likely lost a lot of the semantic data that is encoded in the 1536 values of the original vectors. We can still see the relationships though.

While embeddings can be used as inputs to neural networks, we don’t need neural networks to use the spatial relationships between them to implement efficient semantic search. It’s enough to calculate the embedding of a search query using the same method and find its closest neighbors in the semantic space. Let’s write some queries about a few topics, calculate embeddings for those queries, and add them to our chart:

It’s quite clear what texts are going to answer our question. The current pope is even closer than the previous one (but we’ve probably just got lucky on that one)!

We’d rather fly Airbus than Nicolaus Copernicus, sounds about right.

AI has spoken! Dogs are the cutest.

I should consider Boeing 737 🤔.

The last one is not so clear. While we see that the vector for the query is closer to the cars than people, in a search engine, Boeing 737 would still be quite high in search results for cars to buy.

Looking at the examples above, keep in mind that we don’t see the actual vectors - just their 2-D embeddings. Regardless, you can probably see the utility of vector spaces: you can find texts on similar topics fairly easily. While not perfect, this method is a great first step towards more complex semantic search solutions. Let’s dive deeper to see what we can do with the results to make them better.

Vectors are stupid, language models are not

Well, maybe they are. But they will help us nonetheless. Let’s imagine that for the query “What car should I buy?” we’ve got the following results (not the actual vector search result):

“The Boeing 737 is a narrow-body aircraft...”
“The tiger (Panthera tigris) is the largest living...”
“The Ford Mustang is a series of American...”
“The Tesla Model 3 is an electric car produced by...”
“The Airbus A380 is a large wide-body airliner...”
“The Dodge Challenger is the name of three...”
“The Ford Focus is a compact car manufactured...”

The Boeing 737 is probably not your first choice. For such a simple query the actual results are of course much more accurate (vector search is not that stupid), but irrelevant results may appear for more complex and nuanced queries. A capable language model would clearly distinguish between a plane and a car, or between a text that is roughly about the topic and one that actually contains the answer, even a nuanced one. Here comes the rerank!

Rerank

While it’s not feasible to use a big and expensive language model to analyze thousands or even millions of texts you may have in your database, you can easily afford to let it clean up and reorder your initial vector search results if needed. That’s exactly what rerank models do. They are language models, so, unlike simple vector search, they understand the contents of the texts they are processing. They accept a query and a list of text documents and they “rerank” those documents giving them scores based on how relevant they are to the query. It’s much more expensive to use those models than to just calculate embeddings so we only use them after the initial vector search. Let’s use Cohere.ai's rerank model on our not-so-perfect car buying search (you can find rerank’s score in the brackets, while the initial order from the vector search was made up, the results from the rerank model are real):

[0.41] “The Tesla Model 3 is an electric car...”
[0.36] “The Dodge Challenger is the name...”
[0.34] “The Ford Focus is a compact car...”
[0.32] “The Ford Mustang is a series of...”
[0.20] “The tiger (Panthera tigris) is the...”
[0.08] “The Boeing 737 is a narrow-body...”
[0.05] “The Airbus A380 is a large...”

Now we’re talking! We have relevant results at the top thanks to rerank’s ability to actually understand the query and the texts. While it’s still not perfect (tiger seems dangerously close to the Ford Mustang for some reason), it’s enough in the vast majority of cases. Now let’s put all of that into practice and build a proper search engine!
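The two-stage flow (cheap vector search first, expensive rerank on the short list only) can be sketched as below. The `RelevanceScorer` type is a hypothetical stand-in for a call to a rerank model, and `wordOverlapScorer` is a deliberately naive toy that only counts shared words; it is not how a real rerank model works and it is not Cohere's API. A real model call would also be an asynchronous network request.

```typescript
// Stand-in for a rerank model: higher score means more relevant to the query.
type RelevanceScorer = (query: string, doc: string) => number;

interface RankedDocument {
  text: string;
  score: number;
}

// Second stage: rescore only the candidates returned by the initial vector search.
function rerank(query: string, candidates: string[], score: RelevanceScorer, topN: number): RankedDocument[] {
  return candidates
    .map((text) => ({ text, score: score(query, text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Toy scorer (assumption, for illustration only): fraction of query words found in the document.
const wordOverlapScorer: RelevanceScorer = (query, doc) => {
  const queryWords = query.toLowerCase().split(/\W+/).filter((word) => word.length > 0);
  const docWords = doc.toLowerCase().split(/\W+/);
  const hits = queryWords.filter((word) => docWords.indexOf(word) !== -1).length;
  return queryWords.length === 0 ? 0 : hits / queryWords.length;
};

// Pretend these came back from the initial vector search, in a not-so-great order.
const vectorSearchCandidates = [
  "The Boeing 737 is a narrow-body aircraft",
  "The tiger is the largest living cat species",
  "The Tesla Model 3 is an electric car",
];

const rerankedDocuments = rerank("electric car", vectorSearchCandidates, wordOverlapScorer, 2);
console.log(rerankedDocuments);
```

The point of the design is cost: the scorer runs on a handful of candidates instead of the whole database, so it can afford to be slow and smart.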

In the next articles, we’ll see how we get the data from Notion using its API and how we used Weaviate vector database to build a searchable index out of it.

Make Notion search great again: vector embeddings

Łukasz Pluszczewski — Thu, 28 Sep 2023 08:14:41 +0000

In this series, we’re looking into the implementation of a vector index built from the contents of our company Notion pages that allow us not only to search for relevant information but also to enable a language model to directly answer our questions with Notion as its knowledge base. In this article, we will focus on the theory of vector embeddings.

Numbers, vectors, and charts are real data unless stated otherwise

Notion, with its infinite flexibility, is perfect for keeping unstructured notes, structured databases, and anything in between. Thanks to that flexibility, adding stuff is easy. It’s so easy in fact, that we add, and add, and add, then add some more until we have 5000 pages of meeting-notes, temporary-notes and god-knows-what-notes.
Do you have that problem at your company? Tens of thousands of pages in Notion, created by dozens of people with different ideas about naming and structuring data. You’re adding new pages regularly just to keep things from being lost and then… they are lost. You try to search for something and you get pages that are kind-of on-topic, but don’t answer the question, some completely unrelated stuff, a random note from 2016, and an empty page for good measure. Notion truly works in mysterious ways.

We can divide companies into two groups: those before their failed Notion reorganization attempt, and those after. We can try to clean it up, reorganize, and add tags but trying to clean up thousands of pages, with new ones being added daily, is doomed to fail. So we’ve decided to solve this problem with the power of neural networks. Our goal was to create a separate index that would allow us to efficiently search through all the pages in Notion and find ones that are actually relevant to our query. And we did it! (with some caveats - more on that at the end of the series)

But before we get there, let’s dive into the technologies we’re going to need.

Brown foxes and lazy dogs

Let’s consider a simple neural network that accepts text as input. It could be a classification network for example. Let’s say that you want to train the network to detect texts about foxes. How can you represent a text so that a neural network can work on it?

The quick brown fox jumps over the lazy dog

Neural networks understand and can work on vectors.

Well, actually, while a vector is just a one-dimensional tensor and simple neural networks can have vectors as input, more complex architectures, including most language models, work on higher-dimensional tensors. We’re simplifying a bit just to get to the point.

An n-dimensional vector is just a one-dimensional array with length n. The text about the brown fox is definitely not a vector. Let’s do something about it. One way of converting the text to numbers is to create a dictionary and assign each word a unique ID, like this (these are actual IDs from the dictionary used by OpenAI’s GPT models):

The	quick	brown	fox	jumps	over	the	lazy	dog
464	2068	7586	21831	18045	625	262	16931	3290

You may have noticed that the words 'the' and ' the' (with space at the beginning) have different IDs. Why do you think it’s useful to have those as separate entries in the dictionary?

So we figured out how to convert text into a vector, great! There are issues with it though. First, even though our example is straightforward and small, real texts may be much longer. Our vector changes dimensionality based on the length of the input text (which in itself may be an issue for some simple architectures of neural networks) so for long texts, we need to deal with huge vectors. Additionally, it’s harder to successfully train the neural network on data that contains big numbers (without going into details, while neural networks do calculations on big numbers just fine, the issue is the scale of the numbers and the potential for numerical instability during training when using gradient-based optimization methods). We could try one-hot encoding (by taking all entries from a dictionary, assigning 0 to entries that do not appear in our text, and 1 to ones that do) to have a constant-length vector with only ones and zeros, but that doesn’t really help us that much. We still end up with a large vector - this time it’s huge no matter the length of the input text and additionally, we lose the information about word position. And last but not least: what about the words we don’t have in our dictionary? We need syllables or even separate letters to be assigned IDs in some texts which makes the resulting vector even longer and messier.
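Both encodings can be sketched with a toy dictionary. Everything here is made up for illustration: the nine-word vocabulary, the IDs (simply positions in the array, not the GPT IDs from the table), and the helper names. The fixed-length variant follows the description above, with a 1 for every dictionary word that occurs in the text.

```typescript
// Hypothetical toy dictionary; a real one has tens of thousands of entries.
const vocabulary = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog", "cat"];

function splitIntoWords(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((word) => word.length > 0);
}

// Variable-length encoding: one ID per word, word order preserved.
function toIds(text: string): number[] {
  return splitIntoWords(text).map((word) => {
    const id = vocabulary.indexOf(word);
    if (id === -1) throw new Error("Word not in dictionary: " + word);
    return id;
  });
}

// Fixed-length encoding: 1 if the dictionary word occurs in the text, 0 otherwise.
function toFixedLengthVector(text: string): number[] {
  const vector = vocabulary.map(() => 0);
  for (const id of toIds(text)) {
    vector[id] = 1;
  }
  return vector;
}

console.log(toIds("The quick brown fox jumps over the lazy dog"));
console.log(toFixedLengthVector("The quick brown fox jumps over the lazy dog"));

// Word order is lost: these two texts produce the same fixed-length vector.
console.log(toFixedLengthVector("dog jumps over the fox"));
console.log(toFixedLengthVector("fox jumps over the dog"));
```

The sketch also shows the out-of-dictionary problem: any word missing from `vocabulary` simply throws.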

The dictionary-based vectors are used in practical NLP applications e.g. as inputs to neural networks. One-hot encoding is also used but for different purposes (e.g. when the dictionary is small, or for categorization problems).

There is one thing that our vector could tell us about the input text, but doesn’t - the meaning. We could decide if the text is about foxes just by checking if there is an element 21831 in it, but what about the texts about "omnivorous mammals belonging to several genera of the family Canidae” or about "Vulpes bengalensis”? Meaning can be conveyed in many more ways than we can include in a simple algorithm or a whitelist of words (I bet you wouldn’t think of adding “bengalensis” to your dictionary, would you?). For that reason, to know for sure, we need to process the whole text, even if thousands of words long. The only way we can reliably decide if the text is about foxes is to pass the vector on to the advanced, and expensive, language model to process it and understand what those words, and the relationships between them, mean. Or is it?

Embeddings to the rescue

Embed… what? In simple terms, embedding is a way of representing data in a lower-dimensional space while preserving some relationship between data points. In other words, embedding is a smaller vector representation of a larger vector (or more general: a tensor), that still contains some important information about it. It can be calculated algorithmically but in cases like ours, this is done with the help of neural networks.

There are many ways to reduce the dimensionality of the data, like principal component analysis which is fast, deterministic, and used in data compression, noise reduction, and many other areas, or t-SNE which is used mainly in high dimensional data visualization and in the next article of this series ;)

The simplest example would be to represent our huge vector with potentially thousands of dimensions as just one number, representing how much about foxes the text is. For the sentence about jumping fox, we would probably get something like this:

0.89

Not very exciting. In reality, embeddings are a little bigger - depending on the use case they may have hundreds or thousands of dimensions.

You may now think: "Wait a moment. Thousands of dimensions? That is certainly not less than those few numbers representing our jumping fox, is it?". In our case yes - the embedding will actually be bigger than the dictionary-based vector. But that vector is also an embedding of a huge, practically infinite-dimensional vector in the space of all possible texts. In that sense, we have a dictionary-based embedding which can be small for short texts but is not very useful, and a powerful embedding that may be a little bigger but gives us way more easily extractable information about the text.

Embeddings can be calculated from many different types of data: images, sounds, texts, etc. Each of those is just a big vector, or actually, a tensor. You can think about an image as a 2-dimensional tensor (a matrix) of color values, with as many entries as there are pixels in the image. A sound file is often represented as a spectrogram, which is also a 2-dimensional tensor.

In each case, each dimension of the, often smaller, embedding vector encodes a different aspect of the item’s meaning.

It’s also worth mentioning that, while in principle each dimension of the embedding encodes some aspect of the original, in most cases it is not possible to determine what exactly each number in the embedding means. The idea of “foxness”, while it may be preserved in the embedding in some way, is foreign to the embedding model and is unlikely to be encoded in a single dimension but rather in a much more complicated and nuanced way.

Because vectors are also points in n-dimensional spaces, we can think about the relative positions in those spaces as semantic relationships. The closer the embedding vectors are, the more similar the data points are. In the case of image embeddings, images of dogs would be very close to each other, quite close to images of foxes but far away from images of spaceships. The same applies to text embeddings.

By “closer” we don’t necessarily mean the lowest Euclidean distance but rather the smallest angle, measured by cosine similarity. The reason for that is that in high-dimensional space the distances or the magnitudes of vectors become much less informative. The actual semantic differences are encoded in the angles, or directions, of vectors. The magnitudes may also contain useful information such as the emphasis or frequency, but they’re harder to use due to "the curse of dimensionality" - a set of phenomena resulting from increasing the number of dimensions, like the rapid increase in space volume.
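A tiny made-up example shows how the two measures can disagree. The three vectors below are arbitrary numbers chosen for illustration, not real embeddings: `scaledVector` points in exactly the same direction as `baseVector` but is ten times longer, while `differentVector` has a similar length but a different direction.

```typescript
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(sum);
}

// Cosine of the angle between two vectors; it ignores their magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const baseVector = [1, 2, 3];
const scaledVector = [10, 20, 30]; // same direction, 10x the magnitude
const differentVector = [3, 2, 1]; // similar magnitude, different direction

// By Euclidean distance, differentVector looks like the closer neighbor...
console.log(euclideanDistance(baseVector, scaledVector)); // about 33.67
console.log(euclideanDistance(baseVector, differentVector)); // about 2.83

// ...but by angle, scaledVector is the closer one.
console.log(cosine(baseVector, scaledVector)); // about 1
console.log(cosine(baseVector, differentVector)); // about 0.71
```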

While all this is true of the types of embeddings we talk about (and care about) in this series, in reality not all embedding methods encode semantic information about the original. What is encoded in the embedding values, and what are the relationships between datapoints depend on the specific training process of the embedding model and its purpose. You can easily imagine an embedding calculated from an image that captures some aspects of color palette, or an audio embedding that encodes its atmosphere instead of semantic contents etc.

As mentioned earlier, we calculate embeddings like this with the help of a neural network. There are many models that can do this, from simple and small models you can run on your computer to large, multi-language, commercially available models like OpenAI's ADA. In our case, for simplicity of use, and to get the best performance possible, we’ve decided to use the latter. Below is a fragment of the text embedding of our fox text calculated using OpenAI's ADA embeddings model. The particular version of the ADA model we’ve used creates vectors roughly 1.5k long (1536 values):

0.0035436812	0.0083188545	-0.014174725	…	-0.031340215	0.018409548	-0.041342948

A long vector like this is not terribly useful for us by itself. We can’t really “read” any interesting information from it, nor can we determine if it’s about foxes just by looking at it. However, we can use the relationships between embeddings of different texts to our advantage, for example, to find foxes or to implement efficient semantic search. But this is a topic for the next article.

Solving the Tech Debt Puzzle: Strategies that boost business

Michał Szopa — Wed, 20 Sep 2023 08:29:47 +0000

In the world of software development, it's almost inevitable that you will encounter projects struggling with technical debt and poorly written code. Such projects often provide quite a challenge, requiring a strategic approach to salvage and evolve them into robust, maintainable systems.

The article delves into the challenge of dealing with legacy code and technical debt and describes lessons learned from a real-world project we worked on in Brainhub. As software engineers, we understand that working with legacy systems is an integral part of our craft, and it's a journey that can be both very challenging and rewarding.

In the article, I share the story of a project plagued by technical debt, complex APIs, performance bottlenecks, and scalability concerns. We'll explore some strategies, tactics, and practical steps we took to breathe new life into the project, all while continuing to deliver business value.

Encountered challenges

High coupling: The backend, a critical component of our application, was mainly implemented within just two massive source files, each containing thousands of lines of code. On the front end, our Angular application suffered from enormous components loaded with complex logic, making it a formidable challenge to maintain.
Test Coverage: The project was full of issues in the realm of testing. The backend had only a single unit test, and the frontend didn't fare much better, with several unit tests that offered little value in ensuring the quality of the code.
Lack of proper local environment setup: The absence of a well-defined local development environment isolated from the other system components was a huge problem, making it nearly impossible to work independently from the rest of the system.
No observability and poor CI/CD: The project lacked essential practices for ensuring code quality and deployment efficiency. There was no observability to gain insights into system behaviour and no CI/CD pipeline testing of the system.
Lack of project documentation: The project suffered from poor quality documentation. This absence of comprehensive documentation made system discovery challenging, forcing the team to resort to manual testing and code reverse engineering to understand how various system components and functionalities operated.

The real challenge: Where to start?

While our project was undeniably full of technical issues, the most challenging task we faced was deciding where to begin the journey of making things better.
Addressing the issues mentioned above was a vital step, but attempting to tackle them all at once would have been impossible. So, where did we start, and how did we navigate this labyrinth of challenges?

Our first step was to conduct a comprehensive tech audit of the project. This audit wasn't just about identifying the technical problems; it was about making the client aware of the pressing issues of tech debt and conveying its broader implications.

The tech audit allowed us to highlight the problems and risks lurking within our codebase. We prioritised these issues based on their importance and potential impact on the project's success. More importantly, we used this audit as a platform to engage the client in a conversation about the significance of addressing tech debt - we conveyed that it wasn't just a matter of improving code quality but an essential investment in the project's future. We couldn't afford too much technical debt.

However, it's worth noting that while the tech audit was a great tool to raise awareness, it wasn't a silver bullet - the problems identified during the audit were too broad to be transformed directly into actionable tasks or epics and were often too separated from the business priorities, making them hard to plan.

The essential takeaway from this phase was clear: while the tech audit served as a crucial catalyst for change, it didn't provide an immediate roadmap for resolution. To make significant progress, we needed to connect these technical challenges with actual business value. Only by aligning our efforts with the project's primary business goals could we address tech debt effectively within the constraints of tight project deadlines.

Establishing a Testing Strategy

We all agree that any healthy project should have adequate tests that it can rely on. In our mission to improve the technical quality of the project, we also had to choose the right strategy and approach to testing so that we could focus on further development of the system with peace of mind.

Traditionally, the testing pyramid preaches the importance of a robust foundation of unit tests, followed by integration and end-to-end tests. However, in our specific case, adhering strictly to the testing pyramid's principles presented a unique challenge - our codebase had reached a state where writing comprehensive unit tests could be compared to applying a fresh coat of paint to a crumbling wall. It would seal the problems rather than fix them, and maintaining such tests would become a nightmare.

Instead, we adopted an alternative approach, which is called the Testing Trophy, and applied its main slogan stating:

Write tests. Not too many. Mostly integration

Our approach was simple: focus on covering the main end-user flows. By creating proper high-level tests using Playwright (which made the task quite enjoyable for us), primarily focusing on E2E and integration tests, we ensured that every change we made stayed within the primary purpose of our application. These tests became the sentinels guarding against regressions and unexpected behaviour and gave us very high value in a relatively short period. Of course, they were not the fastest and far from perfect, but we felt much more confident in what we were doing.

Refactoring: Purpose Over Perfection

Once we had a certain level of security and confidence assured through testing, we could start thinking about making concrete changes.

When confronted with a codebase ridden with tech debt and suboptimal code quality, it's almost natural to be tempted by the siren call of a fresh start. The idea of scrapping everything and rebuilding from scratch can be enticing, but it's a path fraught with risks and challenges. Refactoring always comes with a risk and should always be driven by a clear purpose and a close alignment with business goals.

Usually, a rewrite is only practical when it's the only option left.

The reason rewrites are so risky in practice is that replacing one working system with another is rarely an overnight change. We rarely understand what the previous system did - many of its properties are accidental in nature and tests alone will not be able to guarantee that some regressions have not been introduced after all.

Baby steps to the rescue!

Although it may sound like a much less attractive and exciting path to choose from an engineering point of view, being able to flawlessly improve the existing system may actually come with much more complex and interesting challenges.

Where to start?

We learned this key lesson: refactoring should have a well-defined purpose and always support the business. While code aesthetics are essential, they should never be the sole driving force behind refactoring efforts. Instead, refactoring should be a strategic action to solve specific problems or achieve long-term benefits. Taking that into account, we have identified the following points that might help you find areas of improvement in your project:

Pay attention to the roadmap - Identifying opportunities for refactoring can be closely guided by the project roadmap, strategically aligning code improvements with the evolving needs and priorities of the business.
Analyse the bugs - maybe they are related and come from the same tangled place in the codebase. If that's the case, that might be a reason to consider a more significant change, replacing that part of the system.
Define the most essential part of the application - that ensures that refactoring efforts are directed towards areas with the most significant impact on achieving business objectives.
Communication is key - open and transparent communication within the team is crucial. Identifying and sharing technical challenges early on and incorporating them into the planning are very important.

Case study: Let's try to put it into practice

As mentioned, we were struggling with quite a few problems. To navigate this labyrinth of issues effectively, we adopted a structured approach that leveraged testing, refactoring, and careful planning.

Using our backend as an example, we started gradually transitioning to Nest - a robust and modular framework for building scalable applications. This transition allowed us to modernise our codebase incrementally without disrupting existing functionality: we set Nest up next to the existing Express application, which turned out to be as easy as adding only a few lines of code:

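import * as http from 'http';
import { NestFactory } from '@nestjs/core';
import { ExpressAdapter } from '@nestjs/platform-express';

import { app as ExpressApp } from './legacy-app';
import { AppModule } from './app.module';

const bootstrap = async () => {
  const port = process.env.PORT || 3000;

  // Mount Nest on top of the existing Express instance so both share one server
  const nest = await NestFactory.create(
    AppModule,
    new ExpressAdapter(ExpressApp),
  );

  // init() registers the Nest routes on the Express app without starting a listener
  await nest.init();
  http.createServer(ExpressApp).listen(port);
};

bootstrap();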

While the framework was only meant to be a tool to support our changes, our primary goal and plan for further migration considered the following main points:

Isolation of technical debt: By encapsulating legacy code within the existing system and developing new features in the Nest framework, we effectively isolated the legacy code, preventing it from contaminating the new codebase.

Introduction of v2 API: We have introduced a new version of our API, featuring a cleaner, more intuitive structure, with implementation following the specifications covering it with proper tests. It allowed us to add new features and enhancements without altering the existing API, which served as the backbone of our application, while slowly moving our API clients to use the latest version.

Documenting business requirements: To gain clarity and alignment among the development team and stakeholders, we invested time in documenting business requirements comprehensively by creating a detailed specification that served as a blueprint for development and further testing.

Incremental improvements through minor rewrites: One of the fundamental principles of our approach was to avoid monumental rewrites. Instead, we opted for small, manageable baby steps that could be seamlessly integrated into our ongoing business tasks. This approach ensured that we only bit off what we could chew and allowed us to improve core functionalities while delivering new features continuously.

Example: User registration process update

I have already said a lot about the theory, so let's now take a closer look at a specific example, putting it all into practice.

At some point, we were asked to add account verification email functionality as part of the user registration process.
The current registration logic has become a complex piece of code over time as it manages user creation, database interactions, authentication, and more - all within hundreds of lines of code, so it turned out to be quite a challenge.

To address this problem, we could implement the new feature in a new Nest way, but the hard part would still be to integrate it with the legacy endpoint. The easiest way to move forward would be to rewrite the mentioned endpoint as well, solving all the mentioned problems but at the same time introducing a vast risk and putting in a great deal of work before we can even start working on the primary goal of the task.

In order to avoid all that, we decided to introduce an event-driven approach called Broker Topology.

In broker topology, you send events to a central broker (event bus), and all broker subscribers receive and process events asynchronously.

In this approach, whenever the event bus receives an event from event creators, it passes the event to all subscribers, who can then process the data with their own logic. In this case, the only dependency of both publishers and subscribers is the Event Bus, which significantly reduces the coupling.

Below is an illustrative example of implementing the event-driven approach to handle the described case. With it, the only change needed on the legacy side was to notify the event bus about the new user registration event; that's all:

await EventBus.getInstance().dispatch(
  new NewUserRegisteredEvent(payload),
);

The event can then be handled by all interested subscribers, each responsible for processing it in its own way:


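import { Inject, Injectable } from '@nestjs/common';
// EVENT_BUS_PROVIDER, EventBus, ApplicationEvents and the payload type are project-specific imports

@Injectable()
export class AccountVerificationService {
  constructor(
    @Inject(EVENT_BUS_PROVIDER) private readonly eventBus: EventBus,
    // ...
  ) {
    // Subscribe once, when Nest instantiates the service
    this.eventBus.registerHandler(
      ApplicationEvents.NewUserRegisteredEvent,
      (eventPayload: NewUserRegisteredEventPayload) =>
        this.processUserAccountVerification(eventPayload),
    );
  }

  // ...
}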
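The event bus itself is not shown in the snippets above. A minimal in-memory sketch of what such a bus could look like is below. It reuses the method names from the snippets (`getInstance`, `dispatch`, `registerHandler`), but the implementation, the `ApplicationEvent` interface, the payload fields and the `PayloadHandler` type are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical event names and payload shape, for illustration only.
enum ApplicationEvents {
  NewUserRegisteredEvent = "NewUserRegisteredEvent",
}

interface ApplicationEvent<P> {
  readonly name: ApplicationEvents;
  readonly payload: P;
}

interface NewUserRegisteredEventPayload {
  userId: string;
  email: string;
}

class NewUserRegisteredEvent implements ApplicationEvent<NewUserRegisteredEventPayload> {
  readonly name = ApplicationEvents.NewUserRegisteredEvent;
  constructor(readonly payload: NewUserRegisteredEventPayload) {}
}

// Inside the bus the concrete payload type is erased.
type PayloadHandler = (payload: unknown) => void | Promise<void>;

class EventBus {
  private static instance: EventBus | undefined;
  private readonly handlers = new Map<ApplicationEvents, PayloadHandler[]>();

  private constructor() {}

  // Single shared instance, so legacy code and new modules talk to the same bus.
  static getInstance(): EventBus {
    if (!EventBus.instance) {
      EventBus.instance = new EventBus();
    }
    return EventBus.instance;
  }

  registerHandler<P>(name: ApplicationEvents, handler: (payload: P) => void | Promise<void>): void {
    const existing = this.handlers.get(name) ?? [];
    this.handlers.set(name, [...existing, handler as PayloadHandler]);
  }

  // Passes the payload to every subscriber of the event and waits for all of them.
  async dispatch<P>(event: ApplicationEvent<P>): Promise<void> {
    const subscribers = this.handlers.get(event.name) ?? [];
    await Promise.all(subscribers.map((handler) => handler(event.payload)));
  }
}

// Usage: the subscriber knows only the bus, not who publishes the event.
const verificationEmailsQueued: string[] = [];
const sharedBus = EventBus.getInstance();

sharedBus.registerHandler(
  ApplicationEvents.NewUserRegisteredEvent,
  (payload: NewUserRegisteredEventPayload) => {
    verificationEmailsQueued.push(payload.email);
  },
);

// Publishers would normally await the returned promise.
void sharedBus.dispatch(new NewUserRegisteredEvent({ userId: "1", email: "jane@example.com" }));
```

In this sketch a failing handler rejects the promise returned by `dispatch`; whether the publisher should be affected by subscriber errors is a design decision the sketch does not try to settle.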

By leveraging the event bus and embracing modularity, we achieved our goal of adding new features in a tested and maintainable manner while minimising disruption to the existing system.

Summary

Dealing with technical debt can be a long and challenging journey that demands patience and strategic thinking. After approximately ten months of dedicated effort, we transformed the described codebase from 0% to about 50% code coverage and migrated most of the backend code to the isolated Nest modules. Moreover, we successfully implemented integration and end-to-end tests, ensuring robustness and stability, checking our system thoroughly after every change, and still staying flexible enough to add new features and meet changing business requirements.
There is still much work to be done, but the project is in a different place, ready for further development.

Unlocking Agile Potential with GrowthBook and Feature Flags

Krystian Otto — Tue, 05 Sep 2023 18:14:42 +0000

In the realm of contemporary software development, the ability to promptly respond to user feedback, stay in line with market trends, and adapt to evolving requirements holds immense significance. This is precisely where feature flags come into the picture. Feature flags, also commonly referred to as feature toggles, are a programming technique that empowers developers to activate or deactivate specific features without the need to deploy new code. This approach provides developers with flexibility, reduces potential risks, and facilitates the process of experimenting with new functionalities.

Bigger picture

Before we delve into the implementation details, let's take a step back and grasp the larger context about feature flags. In a rapidly evolving technological landscape, user preferences can change unexpectedly, and market demands may take sudden turns. Feature flags emerge as a strategic tool that empowers development teams to quickly adapt to these changes. By decoupling the release of features from code deployments, you achieve the capability to make real-time decisions regarding which users have access to specific features. This provides a smooth user experience and facilitates an iterative approach to development.

Imagine being able to test new features on a subset of users before a full-scale launch, while collecting invaluable feedback and metrics along the way. This controlled strategy allows you to make data-driven decisions, improve features and avoid potential pitfalls. What's more, feature flags make A/B testing easier, where different versions of a feature are compared to identify the one that resonates most effectively with users.

From a technical point of view, feature flags encourage a more modular and maintainable code base. By encapsulating new features behind flags, you can reduce the complexity of code merging, isolate possible problems and ensure that unfinished or unstable features do not disrupt the user experience.
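To make the idea concrete, here is a minimal sketch of a flag check with a percentage rollout. It is a generic illustration, not GrowthBook's SDK: the flag names, the `FeatureFlag` shape, the in-memory `featureFlags` store and the helper functions are all hypothetical. The hash is an FNV-1a-style function used only to put each user into a stable bucket.

```typescript
interface FeatureFlag {
  enabled: boolean;
  rolloutPercentage: number; // 0..100, share of users who see the feature
}

// Hypothetical in-memory store; a real setup would load flags from a flag service.
const featureFlags: Record<string, FeatureFlag> = {
  "new-checkout": { enabled: true, rolloutPercentage: 50 },
  "beta-banner": { enabled: true, rolloutPercentage: 100 },
  "dark-mode": { enabled: false, rolloutPercentage: 100 },
};

// Deterministic FNV-1a-style hash mapped to a bucket in the range 0..99,
// so the same user always gets the same answer for the same flag.
function hashToBucket(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

function isFeatureEnabled(flagName: string, userId: string): boolean {
  const flag = featureFlags[flagName];
  // Unknown or switched-off flags hide the feature for everyone.
  if (!flag || !flag.enabled) return false;
  // Including the flag name means different flags pick different user subsets.
  return hashToBucket(flagName + ":" + userId) < flag.rolloutPercentage;
}

// Unfinished code can live on the main branch, hidden behind the check.
if (isFeatureEnabled("new-checkout", "user-42")) {
  console.log("Rendering the new checkout");
} else {
  console.log("Rendering the old checkout");
}
```

Switching a feature on or off then means changing flag data rather than deploying new code.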

Why did we decide to try feature flags?

Our decision to try out feature flags was driven by several factors. In particular, we observed how prominent industry leaders and corporations were skillfully using feature flags to optimize their development processes, which sparked our curiosity about the potential benefits they could offer our own projects.

We wanted to try out and play with feature flags within an internal project, without customer involvement. This controlled environment provided a safe space to experiment, learn, and improve our skills, all without pressure from external stakeholders. It acted as an invaluable testing arena where we could thoroughly understand the concepts and mechanics of feature flags, preparing the ground for their eventual integration into our customer-facing projects.

Additionally, our goal to follow trunk-based development principles heavily influenced our choice. Using feature flags aligned perfectly with our aim of continuously integrating new code into the main branch. By hiding unfinished features from users, we could smoothly add code to the main branch without causing major disruptions and without compromising the user experience.

Requirements

The implementation of feature flags introduced essential requirements to ensure smooth integration. These included interaction between frontend and backend, efficient management and adaptability to future projects.

Our approach prioritized the retrieval of flags from both the frontend and backend. This dynamic control mechanism made it easy to simply activate or deactivate features.

The administration panel played a key role in enabling the management of flags.

For implementation, we decided to host the solution independently in a Docker container. This decision gave us more control and increased security during the deployment process.

Considering cost-effectiveness was key, which led us to evaluate different tools and technologies to provide the best-fit solution.

Why did we choose GrowthBook?

User and permission management:

GrowthBook allows us to effectively manage users and their permissions, ensuring appropriate access levels for different team members.

Feature flag creation and modification:

With GrowthBook, we can easily create and modify feature flags, enabling controlled releases and targeted customization of features.

Environment management:

GrowthBook supports the creation and management of different environments, facilitating seamless testing and deployment processes.

Good documentation:

GrowthBook provides comprehensive, well-organized documentation, making it easier for our team to understand and utilize the tool effectively.

Easy integration of SDKs:

GrowthBook offers straightforward integration with software development kits (SDKs), simplifying the process of incorporating the tool into our existing infrastructure.

Self-hosted capability:

GrowthBook allows us to host and manage the tool internally, giving us more control and ensuring data privacy and security.

Docker setup

By default, GrowthBook stores its data in a MongoDB database. We use the Bitnami image because it comes with a basic configuration out of the box.

The admin panel and all core functions come directly from the GrowthBook image.

We also use the proxy image. It speeds up synchronization when flags are added or switched, so that, for example, switching a flag is immediately reflected in our application.

services:
  mongo:
    image: docker.io/bitnami/mongodb:6.0
    restart: always
    env_file:
      - ./.env
    volumes:
      - 'mongodb_master_data:/bitnami/mongodb'
    ports:
      - "27017:27017"

  growthbook:
    container_name: "growthbook"
    image: "growthbook/growthbook:latest"
    ports:
      - "4000:3000"
      - "4100:3100"
    depends_on:
      - mongo
    env_file:
      - ./.env
    volumes:
      - uploads:/usr/local/src/app/packages/back-end/uploads

  proxy:
    container_name: "growthbook_proxy"
    image: "growthbook/proxy:latest"
    ports:
      - "4300:3300"
    depends_on:
      - growthbook
    env_file:
      - ./.env

volumes:
  uploads:
  mongodb_master_data:
    driver: local

Administration panel

Let's take a closer look at the administration panel and go through the functionalities we use.

To start using the feature flags, the first step is to properly configure the environments. In our case, we decided on three environments: a production environment, a dedicated environment for testing purposes and all activities performed during the continuous integration process, and a staging environment.

After configuring the environments, the next task is to configure the SDKs. Each SDK corresponds to a specific environment.

And finally - flags. For our test case, let's create a boolean flag to test a new method for calculating the lead time distribution metric. Each flag must be assigned to at least one environment where toggling will be possible.

React implementation

First, for better DX, let's create an enum with the feature flags that are available in our applications and packages:

export enum FEATURE_FLAG {
  NEW_LTD = 'new-ltd-mertric-calculation',
}

To use GrowthBook in a React application, we need to create a GrowthBook instance and pass it to the context provider that wraps our application. In addition, our application needs to know which flags are available at any given time, and if their state changes, this should be reflected in the application. This is why loadFeatures is called in the useEffect hook with the autoRefresh option.

import { useEffect } from "react";
import { GrowthBook, GrowthBookProvider } from "@growthbook/growthbook-react";

// Create the instance once, outside the component, so it is not rebuilt on every render.
const gb = new GrowthBook({
  apiHost: import.meta.env.VITE_FEATURE_FLAGS_API_HOST,
  clientKey: import.meta.env.VITE_FEATURE_FLAGS_CLIENT_KEY,
  enableDevMode: import.meta.env.DEV,
});

const App = () => {
  useEffect(() => {
    // Load the flags and keep them in sync when they change in GrowthBook.
    gb.loadFeatures({ autoRefresh: true });
  }, []);

  return (
    <GrowthBookProvider growthbook={gb}>
      {/* rest of the application */}
    </GrowthBookProvider>
  );
};

To check if the flag is toggled, use the useFeatureIsOn hook. This hook returns a Boolean value based on the state of the flag. Then we can render a new version of the metric or do whatever is required.

import { useFeatureIsOn } from "@growthbook/growthbook-react";

const FriendlyComponent = () => {
  const isNewLtdMetric = useFeatureIsOn(FEATURE_FLAG.NEW_LTD);

  return isNewLtdMetric ? <New/> : <Old/>
};

NestJS implementation

On the backend in the metrics tool, we use NestJS. Let's dive into the feature flag module.

First of all, create a token to manage dependency injection.

export const FEATURE_FLAG_TOKEN = Symbol('FEATURE_FLAG_TOKEN');

This token is used to create FeatureFlagService, which returns a configured instance of GrowthBook.

Note that this service could be request-scoped. We chose not to make it so, because some of our modules are used outside the request scope, and injecting a request-scoped provider into providers with a different scope would break them.

import { ConfigService } from '@nestjs/config';
import { GrowthBook } from '@growthbook/growthbook';

export type FeatureFlagService = GrowthBook;

export const featureFlagProvider = {
  provide: FEATURE_FLAG_TOKEN,
  useFactory: async (
    configService: ConfigService,
  ): Promise<FeatureFlagService> => {
    const gb = new GrowthBook({
      apiHost: configService.get<string>('FEATURE_FLAGS_API_HOST')!,
      clientKey: configService.get<string>('FEATURE_FLAGS_CLIENT_KEY')!,
      enableDevMode: true,
    });

    // Fetch the flag definitions once, before the instance is handed out.
    await gb.loadFeatures();

    return gb;
  },
  inject: [ConfigService],
};

Finally, let's add this provider to the module as follows.

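import { Module } from '@nestjs/common';

@Module({
  providers: [featureFlagProvider],
  exports: [featureFlagProvider],
})
// The class name is an assumption; the original snippet omits it.
export class FeatureFlagModule {}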

To use such a service, we simply need to inject a token into another provider.

export class LeadTimeMetric {
  constructor(
    @Inject(FEATURE_FLAG_TOKEN)
    private readonly featureFlagService: FeatureFlagService,
  ){}

  calculate(...) {
    if (this.featureFlagService.isOn(FEATURE_FLAG.NEW_LTD)) {
      return; // new version of metric to be calculated
    }

    return; // old version of metric to be calculated
  }
}
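
One side effect of injecting the flag service by token is that a test can swap in a stub. The sketch below is framework-free and only illustrates the idea: FlagReader, StubFlags, LeadTimeMetricSketch and both placeholder calculations are hypothetical names, not part of our code base or of the GrowthBook SDK.

```typescript
// Hypothetical minimal interface: only the part of GrowthBook used by the metric.
interface FlagReader {
  isOn(key: string): boolean;
}

// Same key as in the FEATURE_FLAG enum shown earlier.
const NEW_LTD = 'new-ltd-mertric-calculation';

// Stub that answers from a fixed set of enabled flags.
class StubFlags implements FlagReader {
  private readonly enabled: Set<string>;

  constructor(enabled: string[]) {
    this.enabled = new Set(enabled);
  }

  isOn(key: string): boolean {
    return this.enabled.has(key);
  }
}

// Placeholder calculations; the real metric logic is not shown in this article.
const oldLeadTime = (days: number[]): number => Math.max(...days);
const newLeadTime = (days: number[]): number =>
  days.reduce((sum, d) => sum + d, 0) / days.length;

class LeadTimeMetricSketch {
  private readonly flags: FlagReader;

  constructor(flags: FlagReader) {
    this.flags = flags;
  }

  // Pick the calculation based on the state of the flag.
  calculate(days: number[]): number {
    return this.flags.isOn(NEW_LTD) ? newLeadTime(days) : oldLeadTime(days);
  }
}

const withFlag = new LeadTimeMetricSketch(new StubFlags([NEW_LTD]));
const withoutFlag = new LeadTimeMetricSketch(new StubFlags([]));

console.log(withFlag.calculate([2, 4, 6]));    // 4 (average, "new" placeholder)
console.log(withoutFlag.calculate([2, 4, 6])); // 6 (maximum, "old" placeholder)
```

In a NestJS test the same stub could be registered under FEATURE_FLAG_TOKEN instead of the real provider.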

Summary

Feature flags are definitely a powerful tool that makes it easier and safer to deliver new features. It takes time to get used to this new way of implementing new functionality. It is important to note that we have only scratched the surface of this tool's potential. I encourage you to consider using feature flags. In a landscape where adaptability and responsiveness are most important, feature flags offer a path that is worth exploring.

iOS CI/CD Evolution: From Bitrise to GitHub Actions Migration Study

Tomasz Pierzchała — Mon, 21 Aug 2023 10:48:43 +0000

Background

Some time ago, a client asked us to migrate their whole mobile CI/CD flow from Bitrise to GitHub Actions. The project was a React Native application targeting iOS.

We learned the client’s motivations behind the decision (such as having all pipelines in one place and reducing their list of vendors) and did some quick research, since it was our first time setting up mobile pipelines in GitHub Actions. Then we accepted the challenge. What were the outcomes? Was it a good decision?

The workflow

The client’s pipeline workflow was pretty standard.

On each push to the repository, or when a PR was opened, we first linted the code and then tested the JS part.

Additionally, E2E Detox tests were executed on the master branch. We could also run the E2E and deployment workflows manually.

Trigger	Steps
On push / PR	1. Linting the code; 2. Testing JS
Manual (E2E)	1. Building the app for testing; 2. E2E Detox testing
Manual (deployment)	1. Building the app for release; 2. Code signing; 3. Deployment to TestFlight

Apples to oranges?

Both of the providers support YAML configurations where you can specify the workflows.

Bitrise, however, also offers a UI workflow editor. I have heard opinions that UI editors are for amateurs and provide only basic functionality, but in practice the editor speeds up the configuration process significantly.

Let’s configure our repo to support both CI/CD providers. How can it be done?

In GitHub Actions, we need to create a new workflow file in the .github/workflows directory. It could look like this:

name: Shared / E2E (iOS)

on:
  workflow_dispatch:
  push:

jobs:
  build:
    name: E2E (iOS)
    runs-on: macos-11
    steps:
      - uses: actions/checkout@v1

      - name: Install Node
        uses: actions/setup-node@v1
        with:
          node-version: '16'

      - name: Install NPM Dependencies
        run: npm install

      - name: Cache Pods
        id: cache-pods
        uses: actions/cache@v2
        with:
          path: ios/Pods
          key: ${{ runner.os }}-pods-${{ hashFiles('**/Podfile.lock') }}
          restore-keys: |
            ${{ runner.os }}-pods-

      - name: Install Pod Dependencies
        if: steps.cache-pods.outputs.cache-hit != 'true'
        run: cd ./ios && pod install && cd ..

      - name: Install Detox Dependencies
        run: |
          brew tap wix/brew
          brew install applesimutils

      - name: Run Detox Build
        run: npm run e2e:ios:build

      - name: Run Detox Test(s)
        run: bash ${{ github.workspace }}/e2e/.ci-scripts/run-e2e-ios.sh

And voilà we have a GitHub Action workflow that can be run either on push to all branches or manually.

Now let’s see how it looks in Bitrise. Since it is a separate service, we first need to connect it with the repo. It has a wizard that leads you through the whole process. Just a couple of dialogs and settings, and you have a basic but working pipeline that you can customize.

You can utilize over 200 add-ons that are used as workflow steps, with integrations for Apple app building and deployment included. This is important because it allows configuring these steps in a much more accessible way than in GitHub.

The wizard can also automatically connect with GitHub API so that the pipeline starts on push and reports the status back.

The workflow is also available as a YAML configuration.

The crucial difference between the providers lies in their target audience. GitHub Actions is a generic, unopinionated solution: there are many shared actions available in the marketplace and you can configure it as you wish, but there is no recipe that you can quickly adapt to your project.

Bitrise, on the other hand, is mobile-focused and opinionated, and provides recipes that should speed up the configuration.

My subjective feeling is that Bitrise solutions are significantly better documented: you have the aforementioned recipes, and so on. GitHub Actions as an iOS CI, on the other hand, is rather niche. For instance, the Detox docs don’t say a thing about GitHub Actions configuration, whereas other providers are described. It is doable, however: we have successfully set it up in our client’s project.

Code signing

Each time I set up pipelines for iOS development, the most problematic part for me is code signing and the TestFlight upload.

How to store the certificate? How to sign the code? These are the questions you need to address on your own when using GH Actions.

Bitrise provides its own add-ons that can do it for you, admittedly after some configuration, which is usually well described in the documentation.

Returning to GH Actions, we decided to automate the process using Fastlane. For those who are not familiar with this - it is an app automation tool that allows deployment-related tasks to be performed automatically by using so-called lanes.

A sample Fastlane lane for building the iOS app:

private_lane :build do |options|
  app_identifier = options[:app_identifier]
  scheme = options[:scheme]

  disable_automatic_code_signing(
    team_id: team_id
  )

  get_certificates(
    keychain_path: keychain_path
  )
  get_provisioning_profile
  update_project_provisioning(
    code_signing_identity: "iPhone Distribution"
  )

  increment_build_number({
    build_number: latest_testflight_build_number(app_identifier: app_identifier) + 1
  })
  increment_version_number(
    version_number: version_number
  )

  build_app(
    scheme: scheme,
    workspace: "MyProject.xcworkspace",
    configuration: "Release",
    export_method: "app-store",
    export_team_id: team_id,
  )
end

We used Fastlane’s cert and sigh approach that handles loading the certificates/provisioning profiles and signing the app. But first, we needed to import the certificate to the keychain which is a bit complicated if you don’t work with Mac pipelines often. A sample script might look like this:

- name: Installing the certificate and provisioning profile
  env:
    KEYCHAIN_PASSWORD: ${{ secrets.KEYCHAIN_PASSWORD }}
    CERTIFICATE_BASE64: ${{ secrets.CERTIFICATE_BASE64 }}
    # Assumed secret name: P12_PASSWORD is used below but was not defined in the original step
    P12_PASSWORD: ${{ secrets.P12_PASSWORD }}
  run: |
    # create variables
    CERT_PATH=$RUNNER_TEMP/build_certificate.p12
    # Assumed location: KEYCHAIN_PATH is used below but was not defined in the original step
    KEYCHAIN_PATH=$RUNNER_TEMP/app-signing.keychain-db

    # decode the certificate stored in secrets
    echo -n "$CERTIFICATE_BASE64" | base64 --decode --output $CERT_PATH

    # create temporary keychain
    security create-keychain -p "$KEYCHAIN_PASSWORD" $KEYCHAIN_PATH
    security set-keychain-settings -lut 21600 $KEYCHAIN_PATH
    security unlock-keychain -p "$KEYCHAIN_PASSWORD" $KEYCHAIN_PATH

    # import certificate to keychain
    security import $CERT_PATH -P "$P12_PASSWORD" -A -t cert -f pkcs12 -k $KEYCHAIN_PATH
    security list-keychain -d user -s $KEYCHAIN_PATH

Fastlane also offers an alternative approach called match, which stores the certificates and provisioning profiles in a dedicated repository, on GitHub or with another provider.

But we decided to stick with the cert and sigh approach.

How is it done in Bitrise? Well, you just need to upload the certificates and adjust a few settings.

Pricing

A penny saved is a penny earned, so let’s compare the pricing plans.

GitHub Actions pricing is based on free minutes assigned to the account, with pay-as-you-go charges once the free limit is reached.

At first glance, you might get the impression that you get 2000 or even 3000 free minutes, but beware of the operating system multiplier. Yes, you get 2000/3000 minutes as long as it’s a Linux build. If you’re going to use a Mac runner, each minute counts tenfold, which leaves 200/300 minutes. (When building Android on Linux, though, this pricing might be very generous.)

Each minute beyond the free limit costs at least $0.08 for a 3-core machine, and the price per minute stays the same no matter how many minutes you use.

GitHub pricing:

	Free	Pro / Team
Mac compute-minutes included	200	300
Cost of additional minutes	$0.08 (3 cores) or $0.32 (12 cores)	$0.08 (3 cores) or $0.32 (12 cores)
Concurrency	up to 5 concurrent Mac jobs	up to 5 concurrent Mac jobs

For further details check the GitHub pricing page.
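
To make the multiplier concrete, here is a minimal sketch of the arithmetic. It uses only the figures quoted above (10x multiplier, $0.08 per extra minute on a 3-core machine), which reflect pricing at the time of writing; the function names are made up for this example.

```typescript
// Figures quoted in the article; check the pricing page before relying on them.
const MAC_MINUTE_MULTIPLIER = 10;    // one Mac minute consumes 10 included minutes
const MAC_OVERAGE_PER_MINUTE = 0.08; // USD, 3-core machine

// Convert the advertised included minutes into usable Mac minutes.
function includedMacMinutes(includedMinutes: number): number {
  return Math.floor(includedMinutes / MAC_MINUTE_MULTIPLIER);
}

// Monthly overage in USD for a given number of Mac build minutes.
function macOverageCost(macMinutesUsed: number, includedMinutes: number): number {
  const billable = Math.max(0, macMinutesUsed - includedMacMinutes(includedMinutes));
  // Round to whole cents to avoid floating-point noise.
  return Math.round(billable * MAC_OVERAGE_PER_MINUTE * 100) / 100;
}

console.log(includedMacMinutes(2000));  // 200
console.log(includedMacMinutes(3000));  // 300
console.log(macOverageCost(500, 2000)); // 24 -> (500 - 200) * $0.08
console.log(macOverageCost(150, 3000)); // 0  -> still within the included minutes
```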

Bitrise offers 300 compute minutes for free. Unlike GitHub, the additional minutes don’t have a fixed price: the more you consume, the less you pay per minute. The first Team tier includes 500 minutes at $0.07 each, and the last tier includes 50 000 minutes at $0.0351 each.

Bitrise pricing:

	Hobby (Free)	Team (different Tiers)
Mac compute-minutes included	300	500 - 50 000
Cost of minute	N/A	$0.0351 (the biggest Tier, paid yearly) - $0.07 (the smallest Tier, paid monthly)
Concurrency	up to 5 concurrent Mac jobs	up to 10 concurrent Mac jobs

For further details check the Bitrise pricing page.

Conclusion

GitHub Actions is definitely a stable and robust CI solution backed by a tech giant. I enjoy working with it on our projects where we operate on Linux and do common, well-known tasks. A React Native iOS pipeline is a different story, though: it is a bit challenging to set up. That’s not bad in itself; we software developers like challenges, don’t we? But the fact that in Bitrise the pipeline can be set up in the UI wizard, with the guidance of recipes, on production-ready add-ons, and in most cases for a lower price, makes it a no-brainer decision.

There might be some cases where GitHub Actions’ unopinionated and low-level traits shine. In most cases, however, I would suggest using Bitrise, especially if you’re working on an MVP project where deadlines and budgets are pretty tight.

Case study: PDF Insights with AWS Textract and OpenAI integration

Łukasz Pluszczewski — Mon, 14 Aug 2023 11:57:32 +0000

Original problem - automated PDF summarization

The company approached us with the issue of a large quantity of data to sift through in the form of pitchdecks. While each pitchdeck is generally fairly short, in most cases around 10 slides, the issue is the number of them to analyze. We were faced with the task of automating the extraction of the most important information from an unstructured, hard-to-parse format: PDF. Additionally, the data is in the form of slides, with a lot of graphical cues and geometric relations between words that convey information not easily inferred from the text itself. To make it easier to analyze a large amount of data, we needed a solution that would automate as much of that process as possible: from reading the document itself, to finding interesting pieces of information like names of people involved, financial data, and so on.

Why is text extraction so hard?

The first issue we faced was getting the text contents from a PDF file. Extracting text directly from the PDF using open-source tools like pdf-parse (which is used internally by langchain’s pdf-loader) did the job most of the time, but we still had some issues with it: some PDFs were not parsed correctly and the tool returned an empty string (as in the case of the Uber sample pitchdeck), some words came out split into individual characters, and so on.

Unfortunately, getting the text contents of the PDF was just the beginning. The text in PDF is all over the place: we had slides with two or three words, some tables, lists, or just paragraphs squished between images. Below is the example of text extracted from page 2 of the example reproduction of AirBnB early pitchdeck (link, extraction done with pdf-parse library):

Welcome
AirBed&
Breakfast
Book rooms with locals, rather than hotels.
1
This is a PowerPoint reproduction of
an early AirBnB
pitchdeck
via Business Insider @
http
://
www.businessinsider.com
/airbnb
-
a
-
13
-
billion
-
dollar
-
startups
-
first
-
ever
-
pitch
-
deck
-
2011
-
9

And this is one of the better ones!

While parsing text like this is hard in itself, we would also like to be able to change what we extract from the text. We may want to know which people are involved in a business. Or we may want all the financial data, or maybe just the name of the industry. Each type of extracted data requires a different approach to parsing and validating the text, and then a lot of testing.

How can it be solved?

Reliable text extraction

First, we’ve decided to leave open-source solutions behind. We’ve used AWS Textract to parse PDF files. This way we don’t rely on the internal structure of the PDF to get text from it (or to get nothing - like in the case of the Uber example). Textract uses OCR and machine learning to get not only text but also spatial information from the document.

Here is the Textract result (with all geometric information stripped) from the same page of the AirBnB pitchdeck reproduction:

AirBed&Breakfast
Book rooms with locals, rather than hotels.
This is a PowerPoint reproduction of an early AirBnB pitchdeck via Business Insider @
http://www.businessinsider.com/airbnb-a-13-billion-dollar-startups-first-ever-pitch-deck-2011-9

But that’s not all! Textract responds with a list of Blocks (like “Page”, or “Line” for a line of text), together with their positions and relationships, which we can use to understand the structure of the document better:

{
    "BlockType": "LINE",
    "Confidence": 99.91034698486328,
    "Geometry": {
      "BoundingBox": {
        "Height": 0.22368884086608887,
        "Left": 0.8931880593299866,
        "Top": 0.024829095229506493,
        "Width": 0.05453843995928764
      },
      "Polygon": [
        {
          "X": 0.9477264881134033,
          "Y": 0.02518528886139393
        },
        {
          "X": 0.9472269415855408,
          "Y": 0.2485179454088211
        },
        {
          "X": 0.8931880593299866,
          "Y": 0.2481813281774521
        },
        {
          "X": 0.8936926126480103,
          "Y": 0.024829095229506493
        }
      ]
    },
    "Id": "7a88c32b-a0f6-4392-aed5-c5ab8977f162",
    "Page": 1,
    "Relationships": [
      {
        "Ids": [
          "396d8b87-4712-4db0-a77d-0abbbf151bd3"
        ],
        "Type": "CHILD"
      }
    ],
    "Text": "Welcome"
  },

Most of the time, we don’t need such details, so in our case, we use only a fraction of them.
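
As a rough illustration of using only a fraction of the response, the sketch below keeps just the text of LINE blocks, grouped by page. The TextractBlock interface is a hand-written subset of the fields shown above, not the SDK type, and linesByPage is a hypothetical helper; defaulting a missing Page to 1 is an assumption.

```typescript
// Minimal subset of a Textract block; the real response has many more fields.
interface TextractBlock {
  BlockType: string;
  Page?: number;
  Text?: string;
}

// Collect the text of LINE blocks per page, dropping geometry and relationships.
function linesByPage(blocks: TextractBlock[]): Map<number, string[]> {
  const pages = new Map<number, string[]>();
  for (const block of blocks) {
    if (block.BlockType !== 'LINE' || block.Text === undefined) continue;
    const page = block.Page ?? 1; // assumption: treat a missing page number as page 1
    const lines = pages.get(page) ?? [];
    lines.push(block.Text);
    pages.set(page, lines);
  }
  return pages;
}

const sample: TextractBlock[] = [
  { BlockType: 'PAGE', Page: 1 },
  { BlockType: 'LINE', Page: 1, Text: 'Welcome' },
  { BlockType: 'WORD', Page: 1, Text: 'Welcome' },
  { BlockType: 'LINE', Page: 1, Text: 'AirBed&Breakfast' },
  { BlockType: 'LINE', Page: 2, Text: 'Problem' },
];

const pageText = (linesByPage(sample).get(1) ?? []).join('\n');
console.log(pageText);
```

WORD blocks are skipped on purpose: their text is already contained in the parent LINE block.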

Summarisation process and AI

Now to actually parse the text and pull what we want from it. For that, the only solution that seemed viable was to use a language model. While we tested some open-source solutions, they were not up to the task. Hallucinations were too common, and responses too unpredictable. Additionally, most capable open-source models available today are not licensed to be used commercially. So we went with the OpenAI GPT-3.5 and GPT-4 models.

We’ve decided to first let the model summarise the text and include all information from the pitchdeck in that summary. That way we have text that is complete (not just the outline) and has a structure that is easier to work with. We’ve used the following prompt for each page of the document:

Below is the text extracted from a single page of pitchdeck PDF. Write a summary of the page. List all people, statistics, and other data mentioned.
Include only what is in the text, avoid adding your own opinions or assumptions.
Answer with the bullet point summary and nothing more.

With additional instructions like “avoid adding your own opinions or assumptions” we minimize hallucinations (models like to add fake data to the summary; GPT-3.5 even added a completely fake financial analysis!). Once we have a summary of all pages, we can ask the model to extract information from it. Here is an example of the prompt we’ve used to get the list of people referenced in the document:

List all people mentioned in the pages of the pitchdeck. Add their roles if that information is in the text.
Answer with the bullet point list and nothing more. Include only information that is in the pages, don't add your own opinions or assumptions.
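
The two-step flow (summarise every page, then query the summaries) can be sketched as follows. This is not our production code: the Complete type stands in for whatever client sends a prompt to the model, and the way the instructions are joined with the page text and the "Page N:" labels are assumptions made for the example.

```typescript
// Hypothetical signature: anything that sends a prompt to a model and resolves with its reply.
type Complete = (prompt: string) => Promise<string>;

const SUMMARY_PROMPT =
  'Below is the text extracted from a single page of pitchdeck PDF. ' +
  'Write a summary of the page. List all people, statistics, and other data mentioned.\n' +
  'Include only what is in the text, avoid adding your own opinions or assumptions.\n' +
  'Answer with the bullet point summary and nothing more.';

const PEOPLE_PROMPT =
  'List all people mentioned in the pages of the pitchdeck. ' +
  'Add their roles if that information is in the text.\n' +
  'Answer with the bullet point list and nothing more. Include only information ' +
  "that is in the pages, don't add your own opinions or assumptions.";

// Assumption: instructions and page text are simply joined with a blank line.
function buildSummaryPrompt(pageText: string): string {
  return `${SUMMARY_PROMPT}\n\n${pageText}`;
}

// Assumption: page summaries are labelled "Page N:" before being passed on.
function buildPeoplePrompt(summaries: string[]): string {
  const pages = summaries.map((s, i) => `Page ${i + 1}:\n${s}`).join('\n\n');
  return `${PEOPLE_PROMPT}\n\n${pages}`;
}

// One model call per page, then a final call over all the summaries.
async function extractPeople(pages: string[], complete: Complete): Promise<string> {
  const summaries: string[] = [];
  for (const pageText of pages) {
    summaries.push(await complete(buildSummaryPrompt(pageText)));
  }
  return complete(buildPeoplePrompt(summaries));
}

// Fake model that records the prompts it receives, so the flow can be run offline.
const seenPrompts: string[] = [];
const fakeComplete: Complete = async (prompt) => {
  seenPrompts.push(prompt);
  return `- reply ${seenPrompts.length}`;
};

const peopleAnswer = extractPeople(['page one text', 'page two text'], fakeComplete);
peopleAnswer.then((answer) => console.log(seenPrompts.length, answer));
```

For a two-page document this makes three model calls, which is why the per-page summarisation step gets expensive on longer decks.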

The summaries returned by the models (both GPT-3.5 and GPT-4) are of good quality: the information is factual, and whatever is plainly stated in the document ends up in the summary as well.

However, extracting the list of people is a different story. Models, especially GPT-3.5, often answer with a list similar to this (not an actual response):

- Uber
- John Doe (CEO)
- Anabella Moody (CPO/CTO)
- j.doe@example.com
- (123) 555-0123

Not only is this clearly not a correct list of people, but the email was not in the source text at all; the model made it up!

We’ve also experimented with many variations of that prompt, such as:

- Adding information that this is text extracted from a PDF. This doesn’t seem to make any difference: models treat the input text the same way. When looking at the data, there really isn’t any information for the model to infer anything from. We would need to include actual geometry data.
- Skipping the summarisation part and asking the model to get information directly from the text extracted from the whole document. This didn’t have much effect either (although I’ve seen slightly worse responses in at least one case, the difference was very subtle), which would suggest that we don’t need the summarisation step, especially since we do it for each page and therefore make quite a lot of requests. We’ve decided to keep it, however, as we may need a summary anyway.
- Providing GPT with text together with the spatial information returned by Textract. While this seems like a way to allow the model to infer some visual cues, it is hard to figure out the right format. The JSON that Textract returns is quite verbose and is often too long to pass to the model (even with unnecessary fields stripped). Splitting a page into smaller chunks seems wrong, as the page context is often important to understand a chunk. This still needs investigating and more experiments.
- Feeding the model its own answer so that it can validate and fix it, in an attempt to solve the issue with inaccurate or hallucinated answers. Unfortunately, our tests with GPT-3.5 failed: it didn’t see any issues with its made-up emails and phone numbers on a list that was supposed to contain the names of people. We need more tests with this approach using the GPT-4 model, though.

Next steps

What we miss and what is probably the most difficult is the ability to interpret the images and spatial relationships in PDF slides. While AWS Textract returns some spatial information it does not recognize images, and the data returned is hard to pass to the model. We’re still investigating how to make the model understand arrows, charts, and tables. Additionally, we would like to automate the process of online research e.g. find more information about companies mentioned in the documents using available APIs (like Crunchbase) or fetch more data on the people involved.

Summary

The case study addresses automating the extraction of vital details from numerous PDF pitchdecks. These decks are concise but numerous, making manual analysis impractical. The challenge involves extracting text and interpreting graphical elements. AWS Textract was employed for text extraction due to its OCR and layout understanding capabilities. OpenAI's GPT-3.5 and GPT-4 models were used to summarize and extract information, yet challenges arose in accurately extracting specific data like people's names or financial data. The study acknowledges the need to enhance image interpretation to understand visual elements better.

[EDIT]: Since the publication of this case study some new tools have appeared that make the process of parsing PDF presentations much, much easier. With multimodal language models like GPT-Vision, we can skip the OCR step and allow the system to interpret visual cues better than any pure text solution ever could. Stay tuned for more on that!

Collaborative Excellence: How Programmers and QA Unite in Pair Testing

Maciej — Mon, 07 Aug 2023 10:19:59 +0000

In the dynamic realm of the Polish IT industry, collaborative excellence takes center stage with the advent of pair testing. While gaining international popularity, pair testing remains a relatively fresh concept in our local environment. Through its implementation, we have witnessed an array of benefits, yet it also presented unique challenges within our mature software development model for clients. This article offers an exclusive glimpse into our firsthand experience with pair testing, unraveling its profound impact and transformative influence on software development practices.

What challenges did we face in the project? - case study

From almost the very beginning of the new large-scale project, we encountered an exceptional challenge - a massive scope of work at the start that needed to be comprehensively implemented from the early design stage, with very little room for error. At the same time, the team was rapidly growing, and as the client was satisfied, they assigned us even more tasks, resulting in the addition of new team members. In a short period of time since the project's inception, we already had several individuals involved at various levels: experienced full-stack JS specialists, DevOps professionals, business analysts, and, of course, a small QA team. As is often the case, the team consisted of individuals with different experiences from previous companies and projects, each with their own preferences for tools, solutions, and work methodologies. This required us to adopt a sophisticated and experimental approach to project management.

Our clients are highly engaged in the product development process, and we frequently communicate with them, seeking their input and providing advice on better solutions. On one hand, this is an advantage as they can provide us with valuable feedback quickly and continuously verify their requirements. On the other hand, collaborating with this particular client leads to frequent modifications of existing tasks, increasing the complexity of the project and the dependencies within it.

The power of Dev and QA collaboration

Looking back now, I can hardly believe how spontaneous interactions among team members can lead to such significant process changes. Our team's case is quite an interesting example, one that I believe can occur in most similar projects. It all started innocently with a regular video conference session between a new QA member and an experienced developer - nothing out of the ordinary, or so it seemed. It could have been just another developer explaining to a QA team member how to test a particular complex backend functionality. However, the interaction became so intense that it evolved into a conversation not only about the functionality itself but also about how to write tests for it, when to do so, and what techniques and tools to use. It ended up with both of them working in the same branch, where they had a fully implemented and tested functionality with API tests and unit tests.

At that time, we didn't formally call it pair testing, but it set off an intriguing dynamic. Developers, recognizing the benefits of these collaborative sessions, took the initiative to ask QA how the tests could be improved, whether QA could help, and ultimately whether they wanted to literally sit together for a few hours and complete a task from A to Z. We began to see a mutual need that transcended the boundaries of our traditional roles as stated on our CVs. Eventually, we started working together regularly on new tests, improving existing ones, maintaining them, and identifying uncovered test scenarios. This was a natural consequence of our increasingly frequent collaboration. Developers and QA simply began working in pairs.

What works well and what may not seem to...

Since we started working together in pairs on testing, we have observed a multitude of benefits. First and foremost, we have noticed that we deliver tasks much faster, and they are of higher quality. This is perhaps the greatest and most valuable outcome that can be achieved through pair testing. Combining the unique skills of a tester and a developer enables us to perform tasks more efficiently.

Thanks to this new approach, we now have more tests and test cases. Together, we can identify more potential issues during the coding phase, which ultimately translates into the comprehensiveness of our tests. We effortlessly handle tests and add new ones at various levels: API tests, end-to-end tests, integration tests, unit tests, and even recently, we found time for visual testing. Additionally, working in the pair testing model, where QA and developers collaborate on a single branch in our monorepo, has significantly reduced conflicts during merges or pull requests because we are both working on the same thing simultaneously.

Another noticeable advantage is the reduction in the number of reported bugs after releasing a new version of the application. This saves time because instead of desperately patching bugs in production, we can calmly focus on new features and upcoming tasks. One could say that during collaborative testing or test writing, we even perform some form of early code review. We often catch issues in each other's code during the early stages of coding. Additionally, our delivery cycle time has noticeably decreased for tasks handled through pair testing. Previously, programming and testing were often separate stages, which lengthened the entire process. Now, with our new approach, these two phases intertwine and occur simultaneously, resulting in shorter implementation times and faster feature delivery.

Pair programming has proven to be an invaluable tool in reducing context switching - a phenomenon that typically reduces work efficiency. By having the programmer and tester work together on a single task, they do not have to constantly shift their focus and switch between different tasks. This eliminates unnecessary downtime, such as setting up environments or database migrations, and allows them to concentrate on one problem. I firmly believe that pair programming is an effective way to minimize the negative impact of context switching.

Of course, like any newly introduced technique, pair testing may initially appear to have drawbacks. It may seem like a waste of resources when two people work on the same task simultaneously. However, when we consider the outcomes of this method, it becomes clear that it is an investment that brings us far greater benefits in the long run.

Pair testing - my personal definition

Pair testing, in simple terms, is:

The practice of testing in pairs, where two individuals collaborate on a single task to comprehensively create and test it.

Typically, one person is a testing specialist, while the other is a programmer who understands the technical aspects of the application. It's worth noting that this is not a strict rule, as there are cases where more individuals are involved in such work, such as business analysts.

During collaborative testing, the tester has the opportunity to understand the detailed technical aspects of the application, gaining broader perspectives, expanding their domain knowledge, and gaining a better understanding of the system's functionality. On the other hand, the programmer gets to see the application through the tester's eyes, which helps them better understand the expectations and quality requirements associated with their code. Additionally, pair testing encourages better communication, knowledge sharing, and understanding among team members, creating a more efficient workflow.

What we changed and how we reached an agreement

I want to share with you what we changed and how we reached an agreement regarding pair testing. The change wasn't easy, but as with anything, direct communication was the key. I started by informing all team members, both technical and non-technical, about our plans. We explained what pair testing is, the benefits it can bring, and how we planned to implement it. As a predominantly remote team, we started organizing video calls during our regular communication on messaging platforms. This allowed us to have real-time conversations - in the era of remote work, this aspect is crucial for exchanging ideas and clarifying any uncertainties on the spot.

We noticed during sprint planning or backlog review that some tasks were large enough that pairing a QA with a developer for an extended period made sense. We decided that regardless of who was assigned to a specific task, both individuals would log their work time. Why? Because both of us contribute effort and time to it. This way, everything aligns: no one has missing hours, and our time tracking tool shows who did what and when in a task.

Lastly, I want to add that although pair testing has proven to be an effective tool, not every task qualifies for this mode of work. Of course, we still execute most tasks in the normal mode, and we use pair testing as an excellent addition where we can benefit the most from this intensive collaboration.

Summary

This technique has shown us that, in the pursuit of quality, it is worthwhile to sometimes take unconventional steps and decisions, of course in agreement with the entire team and the client. Therefore, we encourage anyone interested in improving their testing process to experiment with pair testing. It may require some changes in teamwork, but the final results will definitely convince you to adopt this approach. So, don't be afraid to try, explore, and learn new techniques that are constantly evolving.

Taming Badly Typed External Libraries - How Zod Boosts Type Safety

Marcin Żmudka — Mon, 31 Jul 2023 10:07:39 +0000

TL;DR

In our projects, we use AdminJS, an external library that provides a GUI for managing database records. It's a great tool for rapidly creating CRUD interfaces for our clients. However, we ran into some issues with poorly typed code inside the library that made it difficult to integrate with our own custom logic. This is where Zod, a validation library, came to the rescue.

Zod enabled us to create robust type definitions and validate data against those definitions. This helped us to avoid runtime errors and catch issues early in development. In this article, we'll take a look at how Zod helped us to deal with the poorly typed code in AdminJS and how it improved the overall quality of our code.

Challenges with Poorly Typed External Library

When using external libraries, it's not uncommon to run into issues with poorly typed code. This was the case with AdminJS, a library that we used in our projects to provide a GUI for managing database records. While AdminJS is a great tool for rapidly creating CRUD interfaces for our clients, it also presented some challenges.

One of these challenges was creating schemas for the objects that we received from AdminJS. The library provides a Record<string, any> type, which is not particularly useful when it comes to type safety. We needed to define more robust type definitions for our data to avoid runtime errors.

This is where Zod, a validation library, came into play. We used Zod to create schemas for the objects that we received from AdminJS, which enabled us to catch errors early in development and avoid issues in production.

The code snippet below demonstrates how we defined a schema for a dashboard object that we received from AdminJS. We used Zod's object method to create an object schema with three properties: title, project, and type. We also used Zod's nativeEnum method to create an enumeration schema for the type property, which accepts two specific string values.

By creating these schemas with Zod, we were able to define the exact shape of the data we expected to receive, and catch any errors if the data did not match that shape. This helped us to ensure the quality of our code and avoid any issues caused by the poorly typed code in AdminJS.

import { z } from 'zod';

enum DashboardProviderType {
  GRAFANA = "GRAFANA",
  GOOGLE_STUDIO = "GOOGLE_STUDIO"
}

const DashboardProviderTypeEnum = z.nativeEnum(DashboardProviderType);

const DashboardPayloadSchema = z.object({
  title: z.string(),
  project: z.string(),
  type: DashboardProviderTypeEnum,
});
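
To show what "catching errors" looks like in practice, here is a minimal sketch that validates a loosely typed object with safeParse. The rawPayload object and the "KIBANA" value are made up for illustration; they stand in for whatever AdminJS actually hands over.

```typescript
import { z } from "zod";

enum DashboardProviderType {
  GRAFANA = "GRAFANA",
  GOOGLE_STUDIO = "GOOGLE_STUDIO",
}

const DashboardPayloadSchema = z.object({
  title: z.string(),
  project: z.string(),
  type: z.nativeEnum(DashboardProviderType),
});

// Hypothetical stand-in for the Record<string, any> received from AdminJS.
const rawPayload: Record<string, any> = {
  title: "Velocity",
  project: "MT",
  type: "KIBANA", // not a member of DashboardProviderType
};

// safeParse never throws; it returns a result object that we can branch on.
const result = DashboardPayloadSchema.safeParse(rawPayload);

// Collect the paths of the fields that failed validation.
const failedFields: string[] = result.success
  ? []
  : result.error.issues.map((issue) => issue.path.join("."));

if (result.success) {
  console.log("valid dashboard:", result.data.title);
} else {
  console.error("invalid dashboard payload, failing fields:", failedFields);
}
```

Using safeParse instead of parse keeps the failure as a value, which is convenient when the invalid payload should be reported back to the user rather than crash the request.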

Creating SubSchemas to Handle Specific Data

In the previous section, we defined a schema for the entire payload that we receive from AdminJS. However, in some cases, we might only be interested in a specific part of the payload, or we might not need all the fields from the payload. In such cases, it is useful to create sub-schemas, which define a subset of the original schema.

Zod provides several methods to create sub-schemas, such as pick, omit, and partial.

In the code snippet provided, we are using these methods to create three different sub-schemas from the original DashboardPayloadSchema:

DashboardTypeSchema: This schema picks the 'type' field from the DashboardPayloadSchema and creates a new schema with only that field.
DashboardPartialSchema: This schema creates a partial schema from the DashboardPayloadSchema, which means that all the fields are optional.
DashboardWithoutTypeSchema: This schema omits the 'type' field from the DashboardPayloadSchema and creates a new schema with only the 'title' and 'project' fields.

Creating sub-schemas can be useful when we want to validate only a part of the object, or when we want to reuse some of the fields in a different schema. It can also help in simplifying the validation logic and making it more readable.

const DashboardTypeSchema = DashboardPayloadSchema.pick({ type: true });
/*
{
    type: DashboardProviderType;
}
*/

const DashboardPartialSchema = DashboardPayloadSchema.partial();
/*
{
    type?: DashboardProviderType | undefined;
    title?: string | undefined;
    project?: string | undefined;
}
*/

const DashboardWithoutTypeSchema = DashboardPayloadSchema.omit({ type: true });
/*
{
    title: string;
    project: string;
}
*/
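
As a rough illustration of where such sub-schemas are handy, the sketch below validates a PATCH-style update with partial() and shows what omit() does to an extra field. For brevity it uses z.enum instead of the nativeEnum from the article, and the sample values are invented.

```typescript
import { z } from "zod";

const DashboardPayloadSchema = z.object({
  title: z.string(),
  project: z.string(),
  type: z.enum(["GRAFANA", "GOOGLE_STUDIO"]),
});

const DashboardPartialSchema = DashboardPayloadSchema.partial();
const DashboardWithoutTypeSchema = DashboardPayloadSchema.omit({ type: true });

// A PATCH-style update only carries the fields that changed.
const update = DashboardPartialSchema.safeParse({ title: "New title" });

// Fields that are present are still validated: a number is not a valid title.
const badUpdate = DashboardPartialSchema.safeParse({ title: 42 });

// By default a Zod object schema strips keys it does not know about,
// so "type" does not survive parsing with the omit() schema.
const withoutType = DashboardWithoutTypeSchema.parse({
  title: "Velocity",
  project: "MT",
  type: "GRAFANA",
});
const typeWasStripped = !("type" in withoutType);

console.log(update.success, badUpdate.success, typeWasStripped);
```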

Refining Schemas with Custom Validation

One of the key features of Zod is the ability to refine schemas with custom validation logic. The refine method can be used to add validation rules to a schema, allowing you to ensure that data meets specific requirements before it is processed.

In the following code snippet, we use refine to ensure that the title field in our DashboardPayloadSchema is no longer than 255 characters. If the title is too long, Zod will throw an error with a custom message.

// "No longer than 255 characters" means a length of exactly 255 is still valid
const DashboardTitle = z.string().refine((title) => title.length <= 255, {
  message: "Title cannot be longer than 255 chars"
});

const DashboardPayloadSchema = z.object({
  title: DashboardTitle,
  project: DashboardProject, // defined in the next snippet
  type: DashboardProviderTypeEnum
});

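The refine method can also accept asynchronous functions, as shown in the following example. Here, we use refine to ensure that the project field in our schema exists in a database. If the project does not exist, Zod will throw an error with a custom message. Note that a schema containing an asynchronous refinement has to be parsed with parseAsync() or safeParseAsync() instead of parse().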

const DashboardProject = z.string().refine(async (projectName) => {
  const project = await getProject(projectName);
  return !!project;
}, {
  message: "Project must exist in database!"
});

const DashboardPayloadSchema = z.object({
  title: DashboardTitle,
  project: DashboardProject,
  type: DashboardProviderTypeEnum
});
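
The sketch below shows one way to run a schema with an asynchronous refinement. The in-memory knownProjects set and this getProject implementation are assumptions standing in for a real database lookup.

```typescript
import { z } from "zod";

// Hypothetical in-memory replacement for the database lookup.
const knownProjects = new Set(["MT", "CRM"]);
const getProject = async (name: string): Promise<{ name: string } | undefined> =>
  knownProjects.has(name) ? { name } : undefined;

const DashboardProject = z.string().refine(
  async (projectName) => Boolean(await getProject(projectName)),
  { message: "Project must exist in database!" },
);

// Async refinements require the async parsing methods.
const checkProject = async (name: string): Promise<string> => {
  const result = await DashboardProject.safeParseAsync(name);
  return result.success ? "ok" : result.error.issues[0]?.message ?? "invalid";
};

checkProject("MT").then((outcome) => console.log("MT:", outcome));
checkProject("UNKNOWN").then((outcome) => console.log("UNKNOWN:", outcome));
```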

Modifying Validated Values with `transform()`

While validation ensures that the received data conforms to the schema, sometimes you might need to transform the data to a different format or structure. For example, you might need to extract a certain substring from a string or make a database call to fetch additional data based on a received value.

Zod provides the transform() method that allows you to modify the validated values before returning them. The transform() method accepts a synchronous or asynchronous function that takes the validated value and returns the transformed value.

In the following code snippet, we define a DashboardProject schema that refines the received project string to ensure that it starts with the "PR_" prefix. Then, we use the transform() method to remove the prefix before returning the value.

const DashboardProject = z.string()
  .refine((project) => project.startsWith("PR_"), {
    message: "Project must start with 'PR_' prefix"
  })
  .transform((project) => project.replace("PR_", ""));

DashboardProject.parse("PR_MT"); // => "MT"

You can also use an asynchronous function with transform() to perform more complex operations, such as making a database call:

const DashboardProject = z.string()
  .refine((project) => project.startsWith("PR_"), {
    message: "Project must start with 'PR_' prefix"
  })
  .transform(async (project) => {
    const projectObject = await getProjectObjectFromDB(project);
    return projectObject;
  });

In the above example, the getProjectObjectFromDB() function is an asynchronous function that fetches the project object from the database based on the received project string. The transform() method applies this function to the validated value and returns the result. Because the transform is asynchronous, this schema also has to be parsed with parseAsync() or safeParseAsync().

Inferring Types and Creating Type Guards

type Dashboard = z.infer<typeof DashboardPayloadSchema>;

// equivalent to (shown as a comment, because declaring Dashboard twice would not compile):
// type Dashboard = {
//   title: string;
//   project: string;
//   type: DashboardProviderType;
// };

export const isDashboard = (payload: unknown): payload is Dashboard => {
  return DashboardPayloadSchema.safeParse(payload).success;
};

In the above code snippet, the z.infer type utility is used to automatically infer the type of the schema defined by DashboardPayloadSchema. The inferred type is then assigned to a type alias called Dashboard. This allows us to use the inferred type throughout our codebase without having to manually define it.

Next, the code exports a type guard function called isDashboard. A type guard is a function that checks if a value is of a certain type at runtime. In this case, the isDashboard function checks if the provided payload conforms to the Dashboard type, by attempting to parse the payload using the DashboardPayloadSchema. If the parsing succeeds, the function returns true, indicating that the payload is a valid Dashboard. If the parsing fails, the function returns false.

Using type guards like isDashboard can help catch type errors at runtime and make our code more robust, especially when working with data from external sources like APIs or databases where the shape of the data may not be known in advance.
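
A small sketch of how such a guard might be consumed follows. The describeDashboard function is hypothetical; it only illustrates that inside the guarded branch TypeScript treats the value as a Dashboard.

```typescript
import { z } from "zod";

enum DashboardProviderType {
  GRAFANA = "GRAFANA",
  GOOGLE_STUDIO = "GOOGLE_STUDIO",
}

const DashboardPayloadSchema = z.object({
  title: z.string(),
  project: z.string(),
  type: z.nativeEnum(DashboardProviderType),
});

type Dashboard = z.infer<typeof DashboardPayloadSchema>;

const isDashboard = (payload: unknown): payload is Dashboard =>
  DashboardPayloadSchema.safeParse(payload).success;

// Hypothetical consumer of an untyped value coming from an external source.
const describeDashboard = (payload: unknown): string => {
  if (!isDashboard(payload)) {
    return "not a dashboard";
  }
  // From here on, payload is narrowed to Dashboard.
  return `${payload.title} (${payload.type}) for project ${payload.project}`;
};

console.log(describeDashboard({ title: "Velocity", project: "MT", type: "GRAFANA" }));
console.log(describeDashboard({ title: "Velocity" }));
```

One caveat worth keeping in mind: a guard like this is only sound for schemas without transform(), because a transform makes the parsed output differ from the value that was checked.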

Summary

In this article, we explored how Zod, a validation library, came to the rescue when integrating AdminJS, an external library with poorly typed code. AdminJS provides a Record<string, any> type, leading to type safety issues. Zod helped us create robust type definitions and validate data against those definitions, catching errors early in development.

We defined a schema for a dashboard object using Zod's object and nativeEnum methods, ensuring the expected data shape. We also created sub-schemas with pick, omit, and partial for specific parts of the payload. Custom validations were added with the refine method to enforce requirements like string length and database existence.

Zod's transform method allowed us to modify validated values, and we learned how to infer types using infer, creating type guards to catch type errors at runtime. Overall, Zod improved code quality and reduced runtime errors, making our integration with AdminJS more efficient and reliable.

Electrons Are Fast, So Can Be Electron – How to Optimize Electron App Performance

Daniel Kawka — Mon, 24 Jul 2023 09:13:00 +0000

Electron is a popular framework for building desktop applications for different systems using the same codebase.

However, we often hear it is slow, consumes a lot of memory, and spawns multiple processes slowing down the whole system. Some very popular applications are built using Electron, including:

Microsoft Teams (but they are migrating to Edge Webview2),
Signal,
WhatsApp.

Not all of them are perfect, but there are some very good examples, like Visual Studio Code. Can we say it’s slow? In our experience, it’s the opposite – it’s quite performant and responsive.

In this article, we’ll show you how we reduced bottlenecks in our Electron application and made it fast! The presented method can be applied to Node.js-based applications like API servers or other tools requiring high performance.

Electron-based game launcher is our test subject

Our project is an Electron-based game launcher. If you play games, you probably have a few of them installed on your computer. Most launchers download game files, install updates, and verify files so games can launch without any problems.

There are parts we can’t speed up that are dependent on, e.g., connection speed, but when it comes to verifying downloaded or patched files, it’s a different story, and if the game is big, it can take an impressive amount of time for the whole process. This is our case.

Our app is responsible for downloading files and, if eligible, applying binary patches. When this is done, we must ensure that nothing gets corrupted. It does not matter what causes the corruption, our users want to play the game, and we have to make it possible.

Now, let me give you some numbers. Our games consist of 44 files with a total size of ~4.7GB.

We must verify them all after downloading the game or an update. We used https://www.npmjs.com/package/crc to calculate the CRC of each file and verify it against the manifest file. Let’s see how performant this approach is. Time for some benchmarks.

Running the Electron app pre-performance-optimization benchmark test

All benchmarks are run on a 2021 MacBook Pro 14-inch with an M1 Pro.

First, we need some files to verify. We can create a few using the command

mkfile -n 200m test_200m_1

But if we look at the content, we will see it’s all zeros!

➜  /tmp cat test_200m_1 | xxd | tail -n 10
0c7fff60: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fff70: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fff80: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fff90: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fffa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fffb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fffc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fffd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7fffe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0c7ffff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

This might give us skewed results. Instead, we will use this command

dd if=/dev/urandom of=test_200m_1 bs=1M count=200

I will create 10 files, 200MB each, and because the data in them is random, they should have different checksums.

The benchmark code:

import crc32 from "crc/crc32";
import { createReadStream } from "fs";

const calculate = async (path) => {
 return new Promise((resolve, reject) => {
   let checksum = null;
   const readStream = createReadStream(path);

   readStream.on("data", (chunk) => {
     checksum = crc32(chunk, checksum);
   });

   readStream.on("end", () => {
     resolve(checksum);
   });

   readStream.on("error", () => {
     resolve(false);
   });
 });
};

const then = new Date().getTime();
await calculate("test_200m_1");
const now = new Date().getTime();
const elapsedTimeInMs = now - then;

console.log(elapsedTimeInMs);

It takes around 800ms to create the read stream and calculate the checksum incrementally. We prefer streams because we can’t afford to load big files into system memory. If we calculate CRC32 for all files one by one, the result is ~16700ms. It slows down after the 3rd file.
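
To make "incrementally" concrete, here is a dependency-free sketch of a table-based CRC-32 that can be continued from a previous checksum. It is not the implementation used by the crc package, only an illustration of the idea. It also shows that the final checksum does not depend on how the data is split into chunks.

```typescript
// Lookup table for the standard CRC-32 (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// Continue a CRC-32 from `previous`, the checksum of all earlier chunks.
function crc32Update(chunk: Uint8Array, previous: number = 0): number {
  let crc = (previous ^ 0xffffffff) >>> 0;
  for (let i = 0; i < chunk.length; i++) {
    crc = CRC_TABLE[(crc ^ chunk[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

const data = new TextEncoder().encode("123456789");

// Checksum of the whole buffer in one call.
const whole = crc32Update(data);

// Checksum of the same bytes fed in 4-byte chunks, like a stream would do.
let chunked = 0;
for (let offset = 0; offset < data.length; offset += 4) {
  chunked = crc32Update(data.subarray(offset, offset + 4), chunked);
}

console.log(whole.toString(16), chunked.toString(16)); // cbf43926 cbf43926
```

The value 0xCBF43926 is the well-known CRC-32 check value for the ASCII string "123456789", which makes it a convenient sanity test for any implementation.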

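Is it any better if we use Promise.all to run them concurrently? Well… the difference is within measurement error. It varies at around ~16100ms. This is most likely because the checksum is calculated in JavaScript on the main thread, so even when the reads are started concurrently, the chunks are still hashed one after another.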

So, here are our results so far:

Single file	10 files one by one	10 files in Promise.all
~800ms	~16700ms	~16100ms

Possible ways to optimize an Electron app performance

There are many paths you can take when optimizing an Electron app, but we are primarily interested in:

NodeJS Worker Threads
Node-API
Neon
Napi-rs
Other JS library that works natively

NodeJS Worker Threads

Worker Threads require some boilerplate code around them. They might also be problematic if your code base is in TypeScript: it’s doable, but it requires additional tools like ts-node or extra configuration. We don’t want to spawn who knows how many worker threads either, as this would be inefficient too. The performance problem is somewhere else. It will be slow wherever we put this calculation.

Conclusion: spawning worker threads will slow down our app even more, so NodeJS Worker Threads is not for us.

Node-API

If we want it fast, Node-API looks like a perfect solution. A library written in C/C++ must be fast. If you prefer to use C++ over C, the node-addon-api can help. This is probably one of the best solutions available, especially since it is officially supported by the Node.js team. It’s super stable once it is built, but it can be painful during development. Errors are often far from easy to understand, so if you are no expert in C, it might kick your ass very easily.

Conclusion: we don’t have C skills to fix the errors, so Node-API is not for us.

Neon Bindings

Now it is getting interesting: Neon Bindings. Rust in Node.js sounds amazing, another buzzword, but is it only a buzzword? Neon says it is being used by popular apps like 1Password and Signal (https://neon-bindings.com/docs/example-projects). But let’s take a look at the other Rust-based option, which is NAPI-RS.

Conclusion: Neon Bindings looks promising, but let’s see how it compares to our last option.

NAPI-RS

If we look at the documentation, NAPI-RS’s docs look much better than Neon’s. The framework is sponsored by some big names in the industry. The extensive documentation and support of big brands are sufficient reasons for us to go with NAPI-RS rather than Neon Bindings.

Conclusion: NAPI-RS provides better documentation than comparable Neon Bindings and therefore makes a safer choice.

Using NAPI-RS to optimize the Electron app performance

To optimize our Electron app, we’ll use NAPI-RS, which mixes Rust with Node.js. Rust is an attractive addition to Node.js because of its performance, memory safety, community, and tools (cargo, rust-analyzer). No wonder it’s one of the most liked languages and why more and more companies are rewriting their modules to Rust.

With NAPI-RS, we need to build a library that includes https://crates.io/crates/crc32fast to calculate CRC32 extremely fast. NAPI-RS gives us great ready-to-go workflows to build NPM packages, so building it and integrating it with the project is a breeze. Prebuilts are supported, too, so you don’t need the Rust environment at all to use it: the correct build will be downloaded and used, no matter if you use Windows, Linux, or macOS (Apple M1 machines are on the list too).

With the crc32fast library, we will use the Hasher instance to update the checksum from the read stream, as in the JS implementation:

use crc32fast::Hasher;
use std::fs::File;
use std::io::Read;
use std::thread;

// Spawn and run the thread, it starts immediately
let handle = thread::spawn(move || {
  // Read in 64 KiB chunks, the same size as the default Node.js read stream chunk.
  // The chunk size only affects throughput; the final CRC32 is the same for any chunking.
  const BUFFER_LEN: usize = 64 * 1024;
  let mut buffer = [0u8; BUFFER_LEN];

  // Open the file, if it fails it will return -1 checksum.
  let mut f = match File::open(path) {
    Ok(f) => f,
    Err(_) => return -1,
  };

  // Hasher instance, allows us to calculate checksum for chunks
  let mut hasher = Hasher::new();

  loop {
    // Read bytes and put them in the buffer, again, return -1 if fails
    let read_count = match f.read(&mut buffer) {
      Ok(count) => count,
      Err(_) => return -1,
    };

    // read() returns 0 only at the end of the file
    if read_count == 0 {
      break;
    }

    // A read may fill the buffer only partially (not just on the last chunk),
    // so hash exactly the bytes that were read.
    hasher.update(&buffer[..read_count]);
  }

  // Calculate the "final" checksum and return it from thread
  i64::from(hasher.finalize())
});

Running the Electron app post-performance-optimization benchmark test

It might sound like a fake or invalid result but it’s just 75ms for a single file! It’s ten times faster than the JS implementation. When we process all files one by one, it’s around 730ms, so it also scales much better.

But that’s not all. There is one more quite simple optimization we can make. Instead of calling the native library N times (where N is the number of files), we can make it accept an array of paths and spawn a thread for each file.

Remember: Rust does not have a limit on the number of threads, as these are OS threads managed by the system. It depends on the system, so if you know how many threads will be spawned and it’s not very high, you should be safe. Otherwise, we would recommend putting a limit and processing files or doing the computation in chunks.
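
One way to apply such a limit from the JavaScript side is to split the list of paths into batches and call the native function once per batch, so that at most one batch worth of threads is alive at a time. The sketch below assumes a native function that takes an array of paths and returns an array of checksums; its name and signature are hypothetical.

```typescript
// Hypothetical signature of the native function exposed through NAPI-RS.
type ChecksumFn = (paths: string[]) => Promise<number[]> | number[];

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches one after another; results keep the order of `paths`.
async function checksumInBatches(
  paths: readonly string[],
  batchSize: number,
  nativeChecksum: ChecksumFn,
): Promise<number[]> {
  const results: number[] = [];
  for (const batch of toBatches(paths, batchSize)) {
    results.push(...(await nativeChecksum(batch)));
  }
  return results;
}
```

A reasonable starting point for the batch size is the number of CPU cores, but the right value depends on the disk and the machine, so it is worth measuring.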

Let’s put our calculation in a thread per single file and return all checksums at once:

// Vector of threads, to be "awaited" later
let mut threads = Vec::<std::thread::JoinHandle<i64>>::new();

for path in paths.into_iter() {
 // Spawn and run the thread, it starts immediately
 let handle = thread::spawn(move || {
   // ... code removed for brevity
 });

 // Push handle to the vector
 threads.push(handle);
}

// Prepare an empty vector for checksums
let mut results = Vec::<i64>::new();

// Go through every thread and wait for it to finish
for task in threads.into_iter() {
 // Get the checksum and push it to the vector
 let result = task.join().unwrap();
 results.push(result);
}

// Return vector(array) of checksums to JS
Ok(results)

How long does it take to call the native function with an array of paths and do all the calculations?

Only 150ms, yes, it is THAT quick. To be 100% sure, we restarted our MacBook and did two additional tests.

First run:

Rust took 463ms Checksums [
  2918571326,  644605025,
   887396193, 1902706446,
  2840008691, 3721571342,
  2187137076, 2024701528,
  3895033490, 2349731754
]
JS promise.all took 16190ms Checksum [
  2918571326,  644605025,
   887396193, 1902706446,
  2840008691, 3721571342,
  2187137076, 2024701528,
  3895033490, 2349731754
]

Second run:

Rust took 197ms Checksums [
  2918571326,  644605025,
   887396193, 1902706446,
  2840008691, 3721571342,
  2187137076, 2024701528,
  3895033490, 2349731754
]
JS promise.all took 16189ms Checksum [
  2918571326,  644605025,
   887396193, 1902706446,
  2840008691, 3721571342,
  2187137076, 2024701528,
  3895033490, 2349731754
]

Let’s bring all the results together and see how they compare.

	JS	Rust
Single file	~800ms	~75ms
10 files one by one	~16700ms	~730ms
10 files Promise.all	~16100ms	-
10 files in threads	-	~200ms

It’s worth noting that calling the native function with an empty array takes 124584 nanoseconds which is 0.12ms so the overhead is very small.

Remember to keep your Electron app unpacked

As mentioned in the beginning, all of this applies to Web APIs, CLI tools, and Electron. Basically, to everything where Node.js is used. But with Electron, there is one more thing to remember. Electron bundles the app into an archive called app.asar. Some Node modules must be unpacked in order to be loaded by the runtime. Most bundlers like Electron Builder or Forge automatically keep those modules outside the archive file, but it might happen that our library will stay in the Asar file. If so, you should specify what libraries should remain unpacked. It’s not mandatory but will reduce the overhead of unpacking and loading these .node files.
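
As a rough sketch for Electron Builder, the asarUnpack option takes glob patterns of files that should stay outside app.asar. The pattern below is an assumption that matches every compiled .node addon; adjust it to the actual location of your native library. The object corresponds to the "build" section of the Electron Builder configuration and is written as a TypeScript constant only for illustration.

```typescript
// Sketch of an Electron Builder configuration fragment.
const builderConfig = {
  // Keep packing the application into app.asar...
  asar: true,
  // ...but leave native addons unpacked next to it (pattern is an assumption).
  asarUnpack: ["**/*.node"],
};

console.log(JSON.stringify(builderConfig, null, 2));
```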

Our advice: Try experimenting with Rust and C to improve your Electron app performance

As you can see, there are multiple ways of speeding up parts of your Electron application, especially when it comes to doing heavy computations. Luckily, developers can choose from different languages and strategies to cover a wide spectrum of use cases.

In our app, verifying files is only part of the whole launcher process. The slowest part for most players is downloading the files, but this cannot be optimized beyond what your internet service provider offers. Also, some players have older machines with HDDs where IO might be the bottleneck and not the CPU.

But if there is something we can improve and make more performant at reasonable costs, we should strive for it. If there are any functions or modules in your application that can be rewritten in either Rust or C, why not try experimenting? Such optimizations could significantly improve your app’s overall performance.

Granular Permission Management with CASL Library

Tomek Piela — Tue, 18 Jul 2023 12:09:22 +0000

Managing permissions for complex applications is too… complex

User permissions management is one of the biggest challenges for complex applications. With multiple users working on different aspects, it is important to ensure that each user has the appropriate level of access to the data they need to do their job. Access control becomes an even bigger issue when the roles keep on changing as the application grows, especially in large organizations with complex hierarchies and multiple user roles.

Here's how we set up granular permission management in a Metrics Tool application.

What is a Metrics Tool?

It is an essential platform for IT project management that allows users to monitor and evaluate the performance of various IT projects. The tool provides insights and metrics on project progress and other critical parameters, enabling businesses to make data-driven decisions.

Granular permissions with CASL library

This is where the CASL library comes into play.

CASL is a library for managing user permissions and access control in JavaScript applications. It provides a flexible and powerful way to define user roles and permissions and to enforce those permissions across the application.

With the CASL library, administrators can define granular permissions for different user roles, ensuring that users only have access to the features and data they need to perform their tasks. For example, project managers may have access to all project metrics, while team members may only be able to view metrics related to their specific projects.

CASL Library in action

By using the CASL library in the Metrics Tool project, organizations can ensure that their data is secure and that users have access to the right information, resulting in better decision-making and improved project outcomes.

import { AbilityBuilder, createMongoAbility } from '@casl/ability';
import { User } from '../models' // application specific interfaces

function defineAbilitiesFor(user: User) {
  const { can, cannot, build } = new AbilityBuilder(createMongoAbility);

  // can read blog posts
  can('read', 'BlogPost');
  // can manage their own blog posts
  can('manage', 'BlogPost', { author: user.id });
  // cannot delete published blog posts that were created more than a day ago
  cannot('delete', 'BlogPost', { 
    isPublished: true, 
    createdAt: { 
      $lt: Date.now() - 24 * 3600 * 1000 
    } 
  });

  return build();
}
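
The project manager and team member scenario described earlier could be sketched along the same lines. The User shape, the role names, and the "Metric" subject below are assumptions made for this example, not the actual Metrics Tool model.

```typescript
import { AbilityBuilder, createMongoAbility, subject } from "@casl/ability";

// Hypothetical user shape used only in this sketch.
interface User {
  id: string;
  role: "projectManager" | "teamMember";
  projectIds: string[];
}

function defineMetricAbilitiesFor(user: User) {
  const { can, build } = new AbilityBuilder(createMongoAbility);

  if (user.role === "projectManager") {
    // project managers can read metrics of every project
    can("read", "Metric");
  } else {
    // team members can read only metrics of the projects they belong to
    can("read", "Metric", { projectId: { $in: user.projectIds } });
  }

  return build();
}

const member = defineMetricAbilitiesFor({ id: "u1", role: "teamMember", projectIds: ["mt"] });
const manager = defineMetricAbilitiesFor({ id: "u2", role: "projectManager", projectIds: [] });

// subject() tags a plain object with its subject type so conditions can be checked.
const memberReadsOwn = member.can("read", subject("Metric", { projectId: "mt" }));
const memberReadsOther = member.can("read", subject("Metric", { projectId: "crm" }));
const managerReadsOther = manager.can("read", subject("Metric", { projectId: "crm" }));

console.log(memberReadsOwn, memberReadsOther, managerReadsOther);
```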
