DEV Community: Roie Schwaber-Cohen

Building a Multi-User Chatbot with Langchain and Pinecone in Next.JS

Roie Schwaber-Cohen — Mon, 27 Mar 2023 19:14:32 +0000

This post was originally published on the Pinecone.io Learning Center.

Building a chatbot has become a hot skill, and with the release of ChatGPT we see a huge number of chat applications being released. At the root of all of these applications live Large Language Models - the engine of the generative AI train. But this beast must be tamed - and that’s not always an easy task.

Since LLMs are now such an integral piece of the puzzle, there are several challenges we need to tackle in order to productionize our chatbots:

Grounding - By default, LLMs can produce responses that may have nothing to do with objective reality. We call these responses “hallucinations” - they may seem real and even convincing, but they might be entirely wrong. We need to come up with some mechanism that will ground the conversation in some source of truth we can trust.
Query limits - When we combine existing knowledge with the context, we often run up against the query limits set by the LLM provider (e.g. OpenAI).
Conversational memory - LLMs are stateless, which means they have no concept of memory. That means that they don’t maintain the chain of conversation on their own. This may cause the conversation to feel pretty frustrating for users. We need to build a mechanism that will maintain the conversation history that will be part of the context for each response we get back from the chatbot.
Multiple users - Our chatbot could be interacting with multiple users in real-time. That means we need to maintain separate conversational memory and context for each conversation.

There’s a wave of tools created specifically to make it easier for developers to work with LLMs in the context of creating conversational agents. Perhaps the best known of these tools is Langchain. It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots. Together with Pinecone, it allows us to build a knowledge base that our bot can interact with and respond to the user with contextually accurate information.

In this example, we’ll imagine that our chatbot needs to answer questions about the content of a website. To do that, we’ll need a way to store and access that information when the chatbot generates its response. That’s where the knowledge base comes in. The knowledge base is a repository of information that can be queried by our chatbot. We will need to access this information semantically, so we’ll use an embedding model to get embeddings for our textual data and store them in Pinecone. The textual data in our case will come from a website which we’ll crawl regularly. After creating our index, our chatbot will be able to leverage it to ground its answers in the content most relevant to the user’s prompts.

Prerequisites

We assume you’re familiar with Next.JS or have a good understanding of Javascript.
This demo uses a collection of amazing services, and you’ll need to open free accounts in order to use the demo without modification:
- Pinecone
- OpenAI
- Ably
- CockroachDB
- Fingerprint Pro

Architecture

At a very high level, here’s the architecture for our chatbot:

There are three main components: The chatbot, the indexer and the Pinecone index.

The indexer crawls the source of truth, generates vector embeddings for the retrieved documents and writes those embeddings to Pinecone
A user makes a query to the chatbot
The chatbot queries Pinecone for the source of truth
The chatbot responds to the user.

Let’s take a deeper look at the Indexer:

The indexer’s role is to crawl our source of truth, call the embedding model provider to generate embeddings for each document and then index those documents in Pinecone. One important detail to mention here is that the quality of the data we get from our crawler will directly affect the quality of the results our chatbot produces, so it’s critical that our crawler is able to clean up the fetched data from our source of truth as much as possible.

Next, here’s our chatbot itself:

When the user sends a prompt, we’ll pass it to the Inquiry builder chain which will produce an inquiry that is based on the conversation history. This will ensure that our queries downstream take into account questions that the user already asked. For example, if the user asked: “Where can I buy a computer?” and then follows up by “How much will it cost?”, the inquiry builder will know to interpret the user’s intent by formulating the final inquiry “How much will the computer cost?”.
Whenever a new inquiry is created, we save it in our conversation history log.
When an inquiry is resolved, it will be used to query the Pinecone index which is populated by documents inserted by our indexer. This will result in a number of potential hits, each with a corresponding document from our source of truth.
Since these documents are most likely to be long, we’ll use a summarizer chain to summarize long documents and produce a finalized summarized document that will then be used to compose the final answer. The summarizer will be aware of the inquiry and attempt to preserve as much information relevant to that inquiry as possible.
Finally our QnA chain will combine the summarized document, the conversation history and the inquiry to produce a final response to the user’s prompt.

We still have to address our multi-user strategy: we need to ensure that users interacting with our chatbot don’t contaminate each other’s conversational memories, and that responses are streamed back from the chatbot to the user that originated the conversation.

Since we’re not going to require authentication for every user that connects to our chatbot, we’ll resolve some unique ID (Or “Fingerprint”) that will help us identify users based on their browser. Our chatbot will use this unique ID to save the conversation history for each user using that key to separate them out from each other. It’ll also use the ID to stream back our responses from our chatbot over a unique (and resilient) streaming channel.

Working with Langchain

As we mentioned before, Langchain provides a collection of very helpful abstractions that make our lives easier when we build LLM based applications. To build a “chain” in Langchain, we need a model and a prompt. The prompt will be what is sent to the model when we query it, and Langchain gives us a helpful formatting utility called the PromptTemplate:

import { PromptTemplate } from "langchain/prompts";

const template = "What sound does the {animal} make?";
const prompt = new PromptTemplate({
  template: template,
  inputVariables: ["animal"],
});

Langchain also makes it very easy to interact with LLM providers like OpenAI. Here’s how we define a model using OpenAI as a provider:

import { OpenAI } from "langchain/llms";
const llm = new OpenAI();

Here’s how we use this prompt template and the model to produce a chain:

import { LLMChain } from "langchain/chains";
const chain = new LLMChain({ llm, prompt });

To invoke the chain, we use the call method:

const response = await chain.call({ animal: "cat" });
console.log({ response });

As you’ll see, this very simple paradigm of templating our prompts becomes very powerful when we combine a series of chains together.
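To make the templating step concrete, here is a dependency-free sketch of what formatting a prompt template amounts to. The formatTemplate helper is hypothetical and written only for illustration; it is not Langchain’s implementation, which does more than this.

```typescript
// Hypothetical helper for illustration; not part of Langchain.
// Replaces each {name} placeholder with the matching value.
function formatTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(/\{(\w+)\}/g, (_placeholder: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing value for input variable "${name}"`);
    }
    return value;
  });
}

const formatted = formatTemplate("What sound does the {animal} make?", {
  animal: "cat",
});
// formatted === "What sound does the cat make?"
```

Failing loudly on a missing variable is a design choice in this sketch: a prompt with an unfilled placeholder is usually a bug we want to catch before it reaches the model.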

While Langchain provides many types of conversational memory utilities, it doesn’t natively handle multiple users interacting with the same chatbot. We want the user to be able to interact with our knowledge base and ask it questions, without the chatbot losing the thread of the conversation – and without polluting other threads with irrelevant information from other users interacting with it. So for that purpose, we’ll build our own conversational memory utility that does something very similar to what Langchain does. More on that later in the post.

The build

Time to build this thing! We’re not going to review every line of the code - for that, you can review this repository. Instead, we’ll focus on the pertinent parts of the code that take some explaining.

Indexer

As we mentioned above, the indexer starts with the crawler. We use node-spider and cheerio to crawl our target URL. Whenever we fetch a page, we parse it and find all the links in it - and if they are part of the same root domain, we queue them to be downloaded. Since we’re planning to use the content for semantic search, we want to do away with all the HTML and preserve just the content. For that purpose, we use the turndown library which helps us convert HTML to markdown.

// Instantiate the crawler
const crawler = new Crawler(urls, 100, 200);
// Start the crawler
const pages = (await crawler.start()) as Pages[];

At the end of the crawling process, we have an array of pages, each containing the markdown content of the page, its URL, and its title.
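The link-following rule described above can be sketched without any crawling library. This is only an illustration of the idea: the sameDomainLinks helper is hypothetical, it uses a regular expression where the demo uses cheerio, and it simplifies "same root domain" to "same hostname".

```typescript
// Hypothetical helper for illustration; the demo itself relies on
// node-spider and cheerio for crawling and parsing.
function sameDomainLinks(html: string, pageUrl: string): string[] {
  const root = new URL(pageUrl);
  const links = new Set<string>();
  // Simplification: a regex is enough for a sketch, but a real crawler
  // should use an HTML parser such as cheerio.
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      // Resolve relative links against the page they were found on.
      const resolved = new URL(match[1], root);
      // Drop the fragment so /blog and /blog#top count as one page.
      resolved.hash = "";
      if (resolved.hostname === root.hostname) {
        links.add(resolved.href);
      }
    } catch (_error) {
      // Skip values that are not valid URLs.
    }
  }
  return Array.from(links);
}
```

Collecting the results in a Set means each page is queued once, even when many links on the page point to it.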

Dealing with rate limits

The process continues with two steps: embedding and indexing, and both of them are rate limited in some ways. Let’s see how to ensure our embedder and indexer play nice with these rate limits.

Embedding

We want our chatbot to be able to query Pinecone using natural language, and get back semantically relevant information. To do that, we need to do four things:

Break up the pages we crawled into small chunks
Associate each chunk with its original text. Once we get a “hit” on that chunk, we want to be able to use the entire text to build our final answer.
Create the vector embeddings for the chunked text.
Since Pinecone allows us to save up to 40KB of data in the metadata object, we need to truncate the original text if it’s too big.

First, we instantiate an OpenAIEmbeddings instance using the text-embedding-ada-002 model, the same embedding model we’ll later use to embed the user’s inquiry. Then, we use Langchain’s RecursiveCharacterTextSplitter to split the pages into chunks.

const embedder = new OpenAIEmbeddings({
  modelName: "text-embedding-ada-002",
});

const documents = await Promise.all(
  pages.map((row) => {
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 300,
      chunkOverlap: 20,
    });
    const docs = splitter.splitDocuments([
      new Document({
        pageContent: row.text,
        metadata: {
          url: row.url,
          text: truncateStringByBytes(row.text, 35000),
        },
      }),
    ]);
    return docs;
  })
);
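The truncateStringByBytes helper used above isn’t shown in this post. Here is one way it could be written, as a sketch; the version in the demo repository may differ. It counts UTF-8 bytes per character so the metadata stays under the byte budget without cutting a multi-byte character in half.

```typescript
// Hypothetical implementation for illustration; the helper used in the
// demo repository may differ.
function truncateStringByBytes(text: string, maxBytes: number): string {
  let usedBytes = 0;
  let result = "";
  // Iterating with for...of walks whole code points, so a multi-byte
  // character is either kept entirely or dropped entirely.
  for (const char of text) {
    const codePoint = char.codePointAt(0) ?? 0;
    // Number of bytes this code point occupies in UTF-8.
    const charBytes =
      codePoint <= 0x7f ? 1 : codePoint <= 0x7ff ? 2 : codePoint <= 0xffff ? 3 : 4;
    if (usedBytes + charBytes > maxBytes) break;
    usedBytes += charBytes;
    result += char;
  }
  return result;
}
```

Truncating by bytes rather than by characters matters because the metadata limit is expressed in bytes, and non-ASCII text takes more than one byte per character.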

The OpenAI API embedding endpoint is limited to 3,000 requests per minute. In order to ensure we’re not blowing past the limit, we use the Bottleneck library, which allows us to control the rate at which requests are made.

const limiter = new Bottleneck({
  minTime: 50,
});

const rateLimitedGetEmbedding = limiter.wrap(getEmbedding);
vectors = (await Promise.all(
  documents.flat().map((doc) => rateLimitedGetEmbedding(doc))
)) as unknown as Vector[];

The minTime parameter defines the minimum amount of time (in milliseconds) the limiter waits between launching one request and the next. By wrapping the getEmbedding function we now ensure the limiter controls the rate at which it is fired.
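A quick bit of arithmetic shows how the minTime value relates to the rate limit. The two helper functions below are hypothetical and exist only to make the calculation explicit.

```typescript
// Upper bound on requests per minute when consecutive requests are
// launched at least minTimeMs apart.
function maxRequestsPerMinute(minTimeMs: number): number {
  return Math.floor(60000 / minTimeMs);
}

// Smallest spacing (in ms) that stays within a requests-per-minute limit.
function minTimeForLimit(requestsPerMinute: number): number {
  return Math.ceil(60000 / requestsPerMinute);
}

const ceiling = maxRequestsPerMinute(50); // 1200, well under 3,000
const tightest = minTimeForLimit(3000); // 20
```

With a 50ms spacing we launch at most 1,200 requests per minute, which leaves comfortable headroom under the 3,000 request limit.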

Upserting

Now that we have our embeddings, it’s time to upsert them into Pinecone. This operation has a limit of its own - we can send a maximum of 2MB of vectors per upsert operation. Given that we’re packing a whole lot of metadata in each vector, we should chunk our vectors array before upserting.

const sliceIntoChunks = (arr: Vector[], chunkSize: number) => {
  const res = [];
  for (let i = 0; i < arr.length; i += chunkSize) {
    const chunk = arr.slice(i, i + chunkSize);
    res.push(chunk);
  }
  return res;
};

const chunks = sliceIntoChunks(vectors, 10);

await Promise.all(
  chunks.map(async (chunk) => {
    await index!.upsert({
      upsertRequest: {
        vectors: chunk as Vector[],
      },
    });
  })
);
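A fixed batch of 10 vectors works for this demo. If the metadata size varies a lot, an alternative is to batch by estimated payload size instead. The sketch below is one way to do that, under a few assumptions: SimpleVector is a hypothetical stand-in for Pinecone’s Vector type, and the size of the JSON serialization is only an approximation of what is actually sent.

```typescript
// Hypothetical stand-in for Pinecone's Vector type, for illustration.
type SimpleVector = {
  id: string;
  values: number[];
  metadata?: Record<string, string>;
};

// Approximate UTF-8 size of the JSON serialization of a vector.
function estimateBytes(vector: SimpleVector): number {
  let bytes = 0;
  for (const char of JSON.stringify(vector)) {
    const codePoint = char.codePointAt(0) ?? 0;
    bytes += codePoint <= 0x7f ? 1 : codePoint <= 0x7ff ? 2 : codePoint <= 0xffff ? 3 : 4;
  }
  return bytes;
}

// Greedily pack vectors into batches that stay under maxBytes.
function batchBySize(vectors: SimpleVector[], maxBytes: number): SimpleVector[][] {
  const batches: SimpleVector[][] = [];
  let current: SimpleVector[] = [];
  let currentBytes = 0;
  for (const vector of vectors) {
    const size = estimateBytes(vector);
    // Start a new batch when adding this vector would exceed the budget.
    if (current.length > 0 && currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(vector);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Note that a single vector larger than maxBytes still ends up in a batch of its own, so in practice we’d pick a budget with some margin below the 2MB limit.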

And that’s it! Our crawler is ready. To run the crawler, and assuming we created our Pinecone index, we simply need to start the server and issue the following request:

GET https://localhost:3000/api/crawl?urls=url1,url2&limit=10&indexName=yourIndexName

When the request completes, our new embeddings will have been upserted to Pinecone.

Chatbot

We want our chatbot to be able to answer questions based on the information found in the documents we embedded and saved in Pinecone. In this portion of the post, we’ll see how to leverage Langchain to build a collection of "chains," each improving the performance of our chatbot.

A large part of what we have to do here is what’s known as “prompt engineering,” where we fine-tune the exact prompt that is sent to our chatbot in order to get the best response for our situation. Prompt engineering is more of an art than a science at this point, and there are no “right” answers for the most part. There are good practices, and a lot of tips and tricks we can apply - but the bottom line is that you’ll have to do your own finessing when it comes to finding the specific prompt that will work for your situation.

As you saw in the architectural layout of our chatbot, we have the following steps:

Inquiry builder - takes the user prompt, injects the conversation context and builds a final inquiry that takes the context into account
Semantic document retrieval - we embed the inquiry and use it to query the documents indexed in Pinecone
Summarization chain (optional) - in our specific case, the documents we retrieve from Pinecone are going to be too long to send to OpenAI to formulate a final answer (they most likely are more than 4000 characters long). In order to overcome this, we chunk and summarize these long documents while preserving content that’s important to us. For example, it’s important to us that code samples found in documents remain intact - so we’re going to tell our summarizer to keep them unmodified even after it summarizes their original text. That said, this step is not always required, and we might be able to see good results without summarizing the full version of the document and instead relying just on the indexed chunks.
Final QnA Chain - we provide the summaries, the conversational history and the inquiry to the model to produce a final result.

User Based Conversational History

As we mentioned before, we want to make sure the conversation our user has with the chatbot is as natural as possible. In order for the chatbot to “understand” what was discussed already, we need to provide it with the conversation context. We’re using a simple SQL table (hosted on CockroachDB) to store each conversation entry:

public async addEntry({ entry, speaker }: { entry: string, speaker: string }) {
  try {
    await sequelize.query(`INSERT INTO conversations (user_id, entry, speaker) VALUES (?, ?, ?) ON CONFLICT (created_at) DO NOTHING`, {
      replacements: [this.userId, entry, speaker],
    });
  } catch (e) {
    console.log(`Error adding entry: ${e}`)
  }
}

To retrieve the conversation history, we use the following function which takes the most recent conversations (based on a limit) and returns them as an array of strings:

public async getConversation({ limit }: { limit: number }): Promise<string[]> {
  // Bind values as replacements instead of interpolating them into the SQL string
  const conversation = await sequelize.query(
    `SELECT entry, speaker, created_at FROM conversations WHERE user_id = ? ORDER BY created_at DESC LIMIT ?`,
    { replacements: [this.userId, limit] }
  );
  const history = conversation[0] as ConversationLogEntry[]

  return history.map((entry) => {
    return `${entry.speaker.toUpperCase()}: ${entry.entry}`
  }).reverse()
}

We can now use this conversation history as part of the context to the various chains used by our chatbot.
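To see the per-user separation at a glance, here is an in-memory sketch of the same interface. It is a hypothetical stand-in for the SQL-backed class, written for illustration only: it keeps everything in a Map keyed by user ID, so nothing survives a server restart.

```typescript
type LogEntry = { entry: string; speaker: string };

// Hypothetical in-memory stand-in for the SQL-backed log, for illustration.
class InMemoryConversationLog {
  // One shared store keyed by user ID keeps each user's history separate.
  private static store = new Map<string, LogEntry[]>();
  private userId: string;

  constructor(userId: string) {
    this.userId = userId;
  }

  addEntry({ entry, speaker }: { entry: string; speaker: string }): void {
    const entries = InMemoryConversationLog.store.get(this.userId) ?? [];
    entries.push({ entry, speaker });
    InMemoryConversationLog.store.set(this.userId, entries);
  }

  getConversation({ limit }: { limit: number }): string[] {
    if (limit <= 0) return [];
    const entries = InMemoryConversationLog.store.get(this.userId) ?? [];
    // Keep the most recent `limit` entries, oldest first.
    return entries
      .slice(-limit)
      .map((e) => `${e.speaker.toUpperCase()}: ${e.entry}`);
  }
}
```

As in the SQL version, the result is ordered oldest to newest, which is the order we want when we paste the history into a prompt.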

Finessing the query

The user can use whatever prompt they’d like, and as we said before - because we want to keep the conversation as natural as possible, we take the user’s raw prompt, combine it with the conversation history and finally produce an inquiry which will be focused on the knowledge base we’ve created.

In order to build the inquiryChain we first need a template. Here’s an example of what that might look like:

`Given the following user prompt and conversation log, formulate a question that would be the most relevant to provide the user with an answer from a knowledge base.
  You should follow the following rules when generating an answer:
  - Always prioritize the user prompt over the conversation log.
  - Ignore any conversation log that is not directly related to the user prompt.
  - Only attempt to answer if a question was posed.
  - The question should be a single sentence.
  - You should remove any punctuation from the question.
  - You should remove any words that are not relevant to the question.
  - If you are unable to formulate a question, respond with the same USER PROMPT you got.

  USER PROMPT: {userPrompt}

  CONVERSATION LOG: {conversationHistory}

  Final answer:`;

And this is how we call the chain:

const inquiryChain = new LLMChain({
  llm,
  prompt: new PromptTemplate({
    template: templates.inquirerTemplate,
    inputVariables: ["userPrompt", "conversationHistory"],
  }),
});
const inquirerChainResult = await inquiryChain.call({
  userPrompt: prompt,
  conversationHistory,
});

const inquiry = inquirerChainResult.text;

Aside: Prompt Engineering

Now that we’ve seen an example of working with prompts, let’s talk about prompt engineering - an emerging skill in and of itself.

Prompt engineering is the process of carefully crafting input queries or tasks to elicit the most accurate and useful responses from LLMs. While these models are incredibly powerful and versatile, they need a little guidance to truly get the job done correctly.

Prompt engineering involves three main components:

Phrasing: We need to experiment with different ways of presenting our input queries. The goal is to find the perfect balance between clarity and specificity, ensuring that the LLM “grasps” exactly what we’re looking for.
Context: We need to add context to our prompts to help the LLM “understand” the broader picture. This could involve providing background information, setting the stage for the desired response, or even gently nudging the model towards a particular line of thought. As we saw before, this is what we do to produce an inquiry that would be relevant to the previous prompts from the user.
Instructions: We need to give the LLM clear and concise instructions. We need to specify the format we’d like the response to take or highlight any key points we’d like the model to consider. As you saw before, we did that by defining a list of instructions that told the LLM exactly how to format the inquiry and how to combine it with the prompt received from the user.

Prompt engineering is all about trial and error, a dance of iteration and optimization. As we fine-tune our prompts, we develop a deeper understanding of how to communicate effectively with the LLM, transforming it into a more reliable and efficient problem-solving tool.

Embedding the inquiry and querying Pinecone

Next, we embed the inquiry:

const embedder = new OpenAIEmbeddings({
  modelName: "text-embedding-ada-002",
});

const embeddings = await embedder.embedQuery(inquiry);

Next, we query Pinecone to retrieve the documents for our embedded inquiry. Here we make the query passing the includeMetadata: true parameter, then map over the results and cast the metadata as the Metadata type.

type Metadata = {
  url: string;
  text: string;
};

const getMatchesFromEmbeddings = async (
  embeddings: number[],
  pinecone: PineconeClient,
  topK: number
): Promise<ScoredVector[]> => {
  const index = pinecone!.Index("crawler");
  const queryRequest = {
    vector: embeddings,
    topK,
    includeMetadata: true,
  };
  try {
    const queryResult = await index.query({
      queryRequest,
    });
    return (
      queryResult.matches?.map((match) => ({
        ...match,
        metadata: match.metadata as Metadata,
      })) || []
    );
  } catch (e) {
    console.log("Error querying embeddings: ", e);
    throw new Error(`Error querying embeddings: ${e}`);
  }
};

What we get back from this function is an array of ScoredVectors. We’ll extract the urls and document text from the metadata for each of these matches, and pass them to our summarizer.
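The extraction step can be sketched as follows. The Match type here is a simplified, hypothetical stand-in for ScoredVector, and the extractContext helper is an illustration rather than the code used in the demo.

```typescript
// Simplified, hypothetical stand-in for Pinecone's ScoredVector.
type Match = {
  id: string;
  score?: number;
  metadata?: { url: string; text: string };
};

function extractContext(matches: Match[]): { urls: string[]; docs: string[] } {
  const urls: string[] = [];
  const docs: string[] = [];
  for (const match of matches) {
    if (!match.metadata) continue;
    // Several chunks can come from the same page; keep each page once.
    if (urls.indexOf(match.metadata.url) === -1) {
      urls.push(match.metadata.url);
      docs.push(match.metadata.text);
    }
  }
  return { urls, docs };
}
```

Deduplicating by URL is worth the extra check here: every chunk carries the text of its whole page in the metadata, so two hits on the same page would otherwise send the same text to the summarizer twice.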

Summarization

At the moment, OpenAI has an upper limit of 4,000 tokens per request (this will change with the release of GPT-4, with limits of 8,000 and 32,000 for some of OpenAI’s offerings). So we’re in a bit of a pickle: On the one hand, we want the context used by the chatbot to produce its final answer to be as detailed as possible, but we can’t pass all the raw documents that we found in our Pinecone query. The solution is to summarize the raw documents while preserving the important bits of information found in each document we summarize.

To do this, we start by combining all the documents we retrieved from Pinecone together, and then chunking them into evenly sized chunks of up to 4000 characters. We summarize each chunk, and combine the summaries together. If the resulting summarized document is still too long, we continue to recursively summarize it.

const summarizeLongDocument = async (
  document: string,
  inquiry: string,
  onSummaryDone: Function
): Promise<string> => {
  // Chunk document into 4000 character chunks
  try {
    if (document.length > 3000) {
      const chunks = chunkSubstr(document, 4000);
      let summarizedChunks: string[] = [];
      for (const chunk of chunks) {
        const result = await summarize(chunk, inquiry, onSummaryDone);
        summarizedChunks.push(result);
      }

      const result = summarizedChunks.join("\n");

      if (result.length > 4000) {
        return await summarizeLongDocument(result, inquiry, onSummaryDone);
      } else return result;
    } else {
      return document;
    }
  } catch (e) {
    throw new Error(e as string);
  }
};
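The chunkSubstr helper isn’t shown in this post. A minimal version could look like the sketch below; this is an assumption about its behavior based on how it’s called, and the helper in the demo repository may differ.

```typescript
// Hypothetical implementation for illustration; the helper in the demo
// repository may differ.
function chunkSubstr(text: string, size: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  const chunks: string[] = [];
  // Walk the string in fixed-size steps; the last chunk may be shorter.
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```

A fixed-size split like this can cut a sentence or a code sample in the middle, which is one reason the summarization prompt has to be explicit about what to preserve.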

To summarize each chunk, we create a new “chain”, and apply it:

const summarize = async (
  document: string,
  inquiry: string,
  onSummaryDone: Function
) => {
  const chain = new LLMChain({
    prompt: promptTemplate,
    llm,
  });

  try {
    const result = await chain.call({
      prompt: promptTemplate,
      document,
      inquiry,
    });

    onSummaryDone(result.text);
    return result.text;
  } catch (e) {
    console.log(e);
  }
};

And here’s the prompt that tells our LLM to preserve the information that’s important to us (in this case, it’s code):

`Shorten the text in the CONTENT, attempting to answer the INQUIRY. You should follow the following rules when generating the summary:
  - Any code found in the CONTENT should ALWAYS be preserved in the summary, unchanged.
  - Code will be surrounded by backticks (\`) or triple backticks (\`\`\`).
  - Summary should include code examples that are relevant to the INQUIRY, based on the content. Do not make up any code examples on your own.

  - If the INQUIRY cannot be answered, the final answer should be empty.
  - The summary should be under 4000 characters.

  INQUIRY: {inquiry}
  CONTENT: {document}

  Final answer:
  `;

Answer construction Prompt

At the end of the summarization process, we’ll have the following ingredients to build our final answer:

Inquiry
Conversation history
Original URLs of the retrieved documents
Summarized documents

We’re ready to build our final chain. Instead of waiting for the entire answer to be received, we want our response to be streamed to the user token by token, so we’re going to use the ChatOpenAI class – it allows us to define a CallbackManager that handles streaming events.

const chat = new ChatOpenAI({
  streaming: true,
  verbose: true,
  modelName: "gpt-3.5-turbo",

  callbackManager: CallbackManager.fromHandlers({
    async handleLLMNewToken(token) {
      // stream the token to the user
    },
  }),
});

Whenever a new token is received, we want to stream it back to the user. For that purpose, we’ll use Ably.

Aside: Why Ably?

Ably is a real-time data delivery platform that provides infrastructure and APIs for developers to build scalable and reliable real-time applications. It can be used to handle real-time communication, data synchronization, and messaging across various platforms and devices.

As our chatbot gains more users, the number of messages exchanged between the bot and the users will increase. Ably is built to handle such growth in traffic without any performance degradation.

Ably also ensures message delivery and provides message history, even in cases of temporary disconnections or network issues. Implementing this level of reliability using only WebSockets can be challenging and time-consuming.

Finally, Ably provides built-in security features like token-based authentication and fine-grained access control, simplifying the process of securing your chatbot's real-time communication.

Setting up Ably

Setting Ably up on the API side is as simple as can be:

const ably = new Ably.Realtime({ key: process.env.ABLY_API_KEY });

Whenever we stream the token to the user, we’ll publish a message on the channel we assign to the user:

const channel = ably.channels.get(userId);
channel.publish({
  data: {
    event: "response",
    token: token,
    ...
  }
})

The Application

Fortunately for us, we don’t have to build a chatbot interface from scratch. Instead, we can use the well crafted Chat UI React Kit which offers all the necessary components for building a production grade chat application. It looks something like this:

The full code listing can be found here. As you can see, we have a message box where the user can type their message. When they hit the enter key, the message will be sent. Under the chatbot’s name, we have a status box that updates whenever the chatbot wants to tell the user what it’s currently doing.

Handling incoming messages

To receive messages on the client, we first set up the useChannel hook provided by Ably:

import { useChannel } from "@ably-labs/react-hooks";

useChannel(visitorData?.visitorId! || "default", (message) => {
  switch (message.data.event) {
    case "response":
      setConversation((state) => updateChatbotMessage(state, message));
      break;
    case "status":
      setStatusMessage(message.data.message);
      break;
    case "responseEnd":
    default:
      setBotIsTyping(false);
      setStatusMessage("Waiting for query...");
  }
});

Whenever the bot sends us a status message, we’ll update the status panel.

That said, we still have to do some work to handle the incoming streaming data whenever our bot responds. As you can see, we save our conversation in a state object that has an array of ConversationEntry:

type ConversationEntry = {
  message: string,
  speaker: "bot" | "user",
  date: Date,
  id?: string,
};

Whenever a new message is returned from the chatbot, we need to update the conversation list appropriately. We basically have to “pluck” the last message from the state and continuously add to it.

const updateChatbotMessage = (
  conversation: ConversationEntry[],
  message: Types.Message
): ConversationEntry[] => {
  const interactionId = message.data.interactionId;

  const updatedConversation = conversation.reduce(
    (acc: ConversationEntry[], e: ConversationEntry) => [
      ...acc,
      e.id === interactionId
        ? { ...e, message: e.message + message.data.token }
        : e,
    ],
    []
  );

  return conversation.some((e) => e.id === interactionId)
    ? updatedConversation
    : [
        ...updatedConversation,
        {
          id: interactionId,
          message: message.data.token,
          speaker: "bot",
          date: new Date(),
        },
      ];
};
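To see how the streamed tokens accumulate, here is a simplified, self-contained sketch of the same idea. The Entry and TokenMessage types and the appendToken function are hypothetical simplifications of ConversationEntry, Ably’s message type and updateChatbotMessage.

```typescript
type Entry = { id?: string; message: string; speaker: "bot" | "user" };
type TokenMessage = { interactionId: string; token: string };

// Simplified version of the reducer above, for illustration: append the
// token to the entry with a matching id, or start a new bot entry.
function appendToken(conversation: Entry[], msg: TokenMessage): Entry[] {
  const exists = conversation.some((e) => e.id === msg.interactionId);
  if (!exists) {
    const newEntry: Entry = {
      id: msg.interactionId,
      message: msg.token,
      speaker: "bot",
    };
    return [...conversation, newEntry];
  }
  return conversation.map((e) =>
    e.id === msg.interactionId ? { ...e, message: e.message + msg.token } : e
  );
}

// Simulate a response arriving one token at a time.
const tokens = ["Pine", "cone", " is", " a", " vector", " database."];
let state: Entry[] = [{ message: "What is Pinecone?", speaker: "user" }];
for (const token of tokens) {
  state = appendToken(state, { interactionId: "abc", token });
}
```

The first token creates the bot’s entry and every later token with the same interactionId extends it, so the user sees a single message that grows as the response streams in.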

The final thing we have to do is to send the user’s request to our bot. When the submit function is called (when the enter key is pressed), we’ll add the user’s message to the conversation state object, and send the user’s message to the bot, along with the user’s unique identifier we get from Fingerprint.

const submit = async () => {
  setConversation((state) => [
    ...state,
    {
      message: text,
      speaker: "user",
      date: new Date(),
    },
  ]);
  try {
    setBotIsTyping(true);
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt: text, userId: visitorData?.visitorId }),
    });

    await response.json();
  } catch (error) {
    console.error("Error submitting message:", error);
  } finally {
    setBotIsTyping(false);
  }
  setText("");
};

And with that our application is ready to go!

Demo

To test the chatbot, I chose to run it on Pinecone’s own documentation. I first crawled https://docs.pinecone.io, and got the following results for my question:

Looks pretty good!

Final Thoughts

The chatbot and LLM space is rapidly changing. OpenAI has just announced GPT-4 and its new limits, which may change the way this and other applications approach summarization and other tasks. The JS/TS version of Langchain is continuously improving and adding new features that will simplify many of the tasks we had to craft manually.

With that said, the overall architecture for a conversational application like this will roughly be the same: We’ll always need to crawl, embed, and index our source of truth data to provide grounding for the chatbot. We’ll always need to create prompts that will help the chatbot understand the user’s intent and formulate the answer in the way we want it to be provided to the user.

We encourage you to make the most of the ongoing advancements in this space with tools like Pinecone and Langchain. Use this post as a starting point to create conversational applications that engage users and keep them coming back for more!

Building an Image Recognition App in Javascript using Pinecone, Hugging Face, and Vercel

Roie Schwaber-Cohen — Fri, 17 Mar 2023 04:19:25 +0000

This post was originally published on the Pinecone.io Learning Center

Introduction

The world of AI is rapidly expanding, and now the JavaScript/TypeScript ecosystem is joining in. With the emergence of tools like Pinecone, HuggingFace, OpenAI, Cohere, and many others, JavaScript developers can create AI applications more quickly, addressing new challenges that were once exclusive to machine learning engineers and data scientists.

Traditionally, Python has been the go-to language for AI/ML solutions. In many cases, the product of this type of code is a Python notebook. But as we all know, creating full-fledged AI applications is usually more involved, especially when we need our applications to be commercial grade. It requires us to think of the solution in a much more holistic way.

The Javascript ecosystem has the advantage of being oriented towards applications from the get-go. It has a huge collection of tools that deliver highly performant and scalable applications both on the edge and the server. Javascript developers are uniquely positioned to productize AI solutions, and now they can translate their skillset and get on the AI train without abandoning the tools they know and love.

In this post, we’ll see how we can use well-known Javascript frameworks and tools to build an AI application. As you’ll see, we won’t train a machine learning model from scratch — nor do we need to learn the math behind these algorithms. But we’ll still be able to leverage them to create a powerful AI product.

The use case we’ll tackle is image recognition: We want an AI model to recognize various objects, faces, and so on, and a mobile application that will allow users to just point the phone camera at any object, assign it a label and "train" the application to detect that object. Then, after the training is complete, any time the user points the camera at the object, the detected label will appear.

We also need our application to support multiple users — where each user can label and train their own objects, without other users seeing those labels. Finally, we’ll need to allow users to reset their labels and delete their accounts if they so choose.

Here’s a short video that demonstrates what the app does:

Before we dive into the build, let’s discuss some potential commercial uses of this type of application.

Manufacturing and Quality Control: This application has the potential to enhance manufacturing and quality control systems. For example, it could be used to label specific systems in various states (for example, a “healthy” state and a “broken” state) and help optimize the control of those systems.
Entertainment and Gaming: Imagine the possibilities for creating immersive and interactive experiences in the world of entertainment and gaming. This technology enables developers to specify custom labels, allowing for unique character and scene recognition and engagement.
Art and Culture: This type of application can be utilized to recognize and enhance the cultural and educational value of artworks. By specifying custom labels, art and culture-related systems can accurately identify and track various art pieces.

The build

In this post, you’ll learn how to build an image recognition application using Hugging Face, Pinecone, Vercel, and React Native (with Expo). In case you’re unfamiliar with any of these, here’s a quick introduction:

Hugging Face is a platform that provides us with a large collection of pre-trained and ready-to-use AI models for various tasks and domains. It also allows us to host our own custom models and use them via a simple API. In this post, we’ll use a custom CLIP model to generate vector embeddings for any image (we’ll talk about embeddings later in this post), instead of predicting a fixed set of classes.
Pinecone is a scalable and performant vector database. It allows us to store and query high-dimensional vectors, such as the embeddings generated by our CLIP model, and find the most similar ones in real-time. Pinecone also provides a simple and intuitive API that integrates well with our Node.js backend.
Vercel is a platform that allows us to deploy serverless functions and static websites with minimal configuration and hassle. Vercel also provides us with a generous free tier and a global edge network that ensures low latency and high availability for our app.
React Native (and Expo) is a framework that allows us to build native mobile applications using Javascript and React. React Native also gives us access to many of the sensors and features that exist on phones, such as the camera, which we will use to capture images for our app.

The app will work in two steps: “training” and “detecting”.

While training, the device sends camera and label data to the backend. The backend gets embeddings from the Hugging Face model, combines the embeddings and the label, and upserts both to Pinecone.
While detecting, the image data from the camera is sent to the backend. The backend gets the embeddings and uses them as the payload to query Pinecone. Pinecone returns a result with the matching embeddings and corresponding label.

We’re going to learn how to:

Create a custom HuggingFace endpoint and query it from a Node.JS backend
Set up and interact with Pinecone using the Pinecone Node.JS client.
Deploy the backend to Vercel

Note: While we won’t detail the React-Native application build, you can find the code for the app in this Github repo

Creating a Custom Hugging Face inference endpoint

In this example, we’ll be using the CLIP model, which is a multi-modal computer vision model. By default, CLIP will provide us with image classification given an image (and optionally text).

In this case, we’re not going to use the classifications but rather the vector embeddings for the images. In simple terms, vector embeddings are the vector representation of the image, as the model “understands” it (so instead of saying “horse” or “cat” it’ll spit back an array of numbers — aka vector). Technically speaking, the model has many layers, where the final layer is the classification layer. The penultimate layer of the model produces the vector representation we call the “embedding”.

Because we aren’t trying to get specific classifications from the model and only need the vector embeddings, we’ll need to create a custom inference endpoint. Let’s see how that’s done.

Setting up the HuggingFace repo

First, we’ll initialize a new model in HuggingFace:

Clone your model repository by running the following:

git lfs install
git clone https://huggingface.co/[your-repo]/clip-embeddings

Creating the custom inference endpoint

In the cloned repo, create a new requirements.txt file with the needed dependencies:

requirements.txt

pillow
numpy

Next, create a handler.py file, which will handle incoming requests to the endpoint:

from typing import Dict, List, Any
import numpy as np
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
from io import BytesIO
import base64

class EndpointHandler():
    def __init__(self, path=""):
        # Preload everything we need at inference time.
        self.model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs")
        text = inputs.get("text")
        imageData = inputs.get("image")
        image = Image.open(BytesIO(base64.b64decode(imageData)))
        inputs = self.processor(text=text, images=image, return_tensors="pt", padding=True)
        outputs = self.model(**inputs)
        embeddings = outputs.image_embeds.detach().numpy().flatten().tolist()
        return { "embeddings": embeddings }

The EndpointHandler class initializes the CLIP model and processor from the model (openai/clip-vit-base-patch32). The __init__ function is called when the endpoint is initialized and makes the model and processor available when calls are made to the endpoint. The processor will take our inputs and transform them into the data structure the model is expecting (we’ll get into the specifics in a moment).

The __call__ method is the main function that’s executed when the endpoint is called. This method takes a data dictionary as input and returns a dictionary as output. The data dictionary contains an "inputs" key, which in turn contains two keys: "text" and "image". "text" is an array of text strings, and "image" is an encoded image in base64 format.

The method then converts the base64 encoded image data into a PIL image object and uses the processor to prepare the input data for the model. The processor tokenizes the text, converts it to tensors, and performs padding if necessary.

The processed inputs are then passed to the CLIP model for inference and the image embeddings are extracted from the output of the model. The embeddings are then flattened and converted to a list, which is returned as the output of the endpoint. The output is a dictionary with a single key "embeddings" that contains the list of embeddings.
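To make the request and response shapes concrete, here is a minimal JavaScript sketch of how a caller might build the body this handler expects and read its result. The helper names (buildInferenceBody, readEmbeddings) are hypothetical and not part of the original code, and the length check of 512 assumes the openai/clip-vit-base-patch32 model used above.

```javascript
// Hypothetical helpers illustrating the payload contract of the handler above.

// Build the JSON body: { inputs: { image: <base64 string>, text: [<strings>] } }
function buildInferenceBody(imageBase64, text = ["default"]) {
  if (typeof imageBase64 !== "string" || imageBase64.length === 0) {
    throw new TypeError("imageBase64 must be a non-empty base64 string");
  }
  return { inputs: { image: imageBase64, text } };
}

// Read the embeddings out of the handler's response: { embeddings: number[] }
function readEmbeddings(json, expectedLength = 512) {
  const embeddings = json && json.embeddings;
  if (!Array.isArray(embeddings) || embeddings.length !== expectedLength) {
    throw new Error(`Expected an array of ${expectedLength} numbers`);
  }
  return embeddings;
}
```

Validating the length on the caller's side is a design choice in this sketch: it surfaces a misconfigured endpoint early, before a vector of the wrong size is sent to the index.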

Deploying the model to Hugging Face

To get this endpoint deployed, push the code back to the HuggingFace repo. Then, in the Hugging Face console, click the one-click deploy button for the model.

Follow the prompts and create a “Protected” endpoint using your preferred cloud provider. Once the endpoint is set up, you’ll get the endpoint URL. We’ll use that later on in our server alongside your HuggingFace API key.

With the Hugging Face endpoint set up, it’s time to move on to setting up the Pinecone index.

Set up a Pinecone Index

Create a Pinecone account and click the “Create Index” button. You’ll see the following screen:

The CLIP model we’re using produces 512-dimensional embeddings and uses the cosine similarity metric, so you’ll choose the same settings for the index. You can choose any pod type you’d like, but for the best performance, select the P2 pod type. Click “Create Index,” and it should take up to a couple of minutes for the index to initialize.
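For intuition about what the cosine metric measures, here is a small, self-contained sketch. It is only an illustration of the formula; the index computes similarity for us, and the function name is an assumption made for this example.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why two photos of the same object can match even if their embeddings differ in magnitude.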

While in the Pinecone console, click on “API Keys” and retrieve your Pinecone API key — you’ll need it for setting up your Node.JS backend.

Setting up the Node.JS backend

Now that you’ve set up the index, you can create the server that will interact with it. The full code listing for the server is available in this repo.

Install the dependencies

We’re going to use Express and the Pinecone NodeJS client. Run the following command to install them:

npm install --save express body-parser @pinecone-database/pinecone

The handleImage function, which receives images from the mobile device, looks like this:

const handleImage = async (req, res) => {
  const data = req.body;

  const { data: imageData, uri, label, stage, user } = data;
  const id = `${label}-${md5(uri)}`;
  const userHash = md5(user);
  const text = "default";
  try {
    const embeddings = await getEmbeddings(imageData, [text]);
    const result = await handleEmbedding({
      id,
      embeddings,
      text,
      label,
      stage,
      user: userHash,
    });
    res.json(result);
  } catch (e) {
    const message = `Failed handling embedding: ${e}`;
    console.log(message, e);
    res.status(500).json({ message });
  }
};

In this function, we do the following:

Extract the image data, image URI, label, stage, and user identifier from the request body
Create an identifier for the captured image
Hash the user identifier using a helper function (md5)
Call the getEmbeddings function to get the embeddings for the image
Call handleEmbedding to deal with the training and detection step appropriately
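The md5 helper itself isn't shown in the listing. One possible implementation, assuming Node's built-in crypto module rather than whatever the original repo uses, could look like the sketch below; buildIdentifiers is a hypothetical name that just groups the two derived values.

```javascript
const crypto = require("crypto");

// One possible md5 helper: hex digest of a string (assumed, not taken from the original repo).
const md5 = (value) =>
  crypto.createHash("md5").update(String(value)).digest("hex");

// Mirrors the identifiers handleImage derives from the request body.
function buildIdentifiers({ uri, label, user }) {
  return {
    id: `${label}-${md5(uri)}`, // stable for a given label and image URI
    userHash: md5(user), // used later as the Pinecone namespace
  };
}
```

Here the hash only serves to derive compact, stable identifiers; MD5 should not be relied on for anything security-sensitive.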

The getEmbeddings function simply sends the image data (along with optional text) to the Hugging Face inference endpoint and returns the embeddings.

const getEmbeddings = async (imageBase64, text) => {
  const data = {
    inputs: {
      image: imageBase64,
      text,
    },
  };
  try {
    const response = await fetch(inferenceEndpointUrl, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${inferenceEndpointToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(data),
    });

    const json = await response.json();
    return json.embeddings;
  } catch (e) {
    console.log(e);
  }
};

As we saw above, the Node.JS backend has to handle the “training” and “detecting” states the application is in. The handleEmbedding function does just that:

const handleEmbedding = async ({
  id,
  embeddings,
  text,
  label,
  stage,
  user,
}) => {
  switch (stage) {
    case "training":
      return await saveEmbedding({
        id,
        values: embeddings,
        namespace: user,
        metadata: { keywords: text, label },
      });
    case "detecting":
      return await queryEmbedding({
        values: embeddings,
        namespace: user,
      });
  }
};

In “training” mode, the handleEmbedding function will send the embeddings, the user’s identifier, and metadata that includes the label the user provided to saveEmbedding.
In “detecting” mode, the handleEmbedding function will send the embeddings and the user identifier to the queryEmbedding function.

In both cases, we’ll use the user identifier to write and read embeddings to a namespace in Pinecone which will ensure we’re segregating each user’s data from all the other users.
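To illustrate the isolation that namespaces give us, here is an in-memory stand-in for a namespaced index. It is not the Pinecone client and makes no claim about how Pinecone works internally; the class and method names are made up for this sketch.

```javascript
// In-memory stand-in for a namespaced vector index (illustration only).
class NamespacedStore {
  constructor() {
    this.namespaces = new Map(); // namespace -> Map(id -> { values, metadata })
  }

  upsert(namespace, { id, values, metadata }) {
    if (!this.namespaces.has(namespace)) {
      this.namespaces.set(namespace, new Map());
    }
    this.namespaces.get(namespace).set(id, { values, metadata });
  }

  // Return the best match inside one namespace only; other namespaces are never scanned.
  queryTop(namespace, vector) {
    const records = this.namespaces.get(namespace);
    if (!records) return undefined;
    let best;
    for (const [id, { values, metadata }] of records) {
      const score = cosine(vector, values);
      if (!best || score > best.score) best = { id, score, metadata };
    }
    return best;
  }
}

// Cosine similarity between two vectors of equal length.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}
```

Two users who store the same vector under different labels each get their own label back, and a user with no data gets no match at all.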

To save the embeddings, we select the index we’ll be writing to, and then use the index.upsert method.

const saveEmbedding = async ({ id, values, metadata, namespace }) => {
  const index = pineconeClient.Index(indexName);
  const upsertRequest = {
    vectors: [
      {
        id,
        values,
        metadata,
      },
    ],
    namespace,
  };
  try {
    const response = await index.upsert({ upsertRequest });
    return response?.upsertedCount > 0
      ? {
          message: "training",
        }
      : {
          message: "failed training",
        };
  } catch (e) {
    console.log("failed", e);
  }
};

To query the embedding, we select the index, specify the number of results we want to get back from the index with the topK parameter (here we just want one result) and pass the vector we want to use for the query.

As we mentioned before, the namespace will be the user identifier, which will limit the results to the embeddings created by that user. This will also improve the performance of our queries, since it’ll only run the query on the subset of vectors belonging to the user, instead of all the vectors in the index.

The includeMetadata parameter ensures we get back the metadata associated with the vector – that will include the label which we eventually want to display to the user.

const queryEmbedding = async ({ values, namespace }) => {
  const index = pineconeClient.Index(indexName);
  const queryRequest = {
    topK: 1,
    vector: values,
    includeMetadata: true,
    namespace,
  };
  try {
    const response = await index.query({ queryRequest });
    const match = response.matches[0];
    const metadata = match?.metadata;
    const score = match?.score;
    return {
      label: metadata?.label || "Unknown",
      confidence: score,
    };
  } catch (e) {
    console.log("failed", e);
  }
};

The result of the query will include an array of matches, from which we’ll pick the first one. For that match, we’ll return the label and the confidence score as the final result.
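Since a topK: 1 query returns the closest stored vector however distant it is, a weak match still comes back with a label. One optional refinement, which is not part of the original server, is to treat low-scoring matches as unknown. The function name and the 0.8 threshold below are arbitrary assumptions that would need tuning against real data.

```javascript
// Hypothetical post-processing of a query response shaped like
// { matches: [{ score, metadata: { label } }] }.
function toDetection(response, minScore = 0.8) {
  const match = response?.matches?.[0];
  if (!match || typeof match.score !== "number" || match.score < minScore) {
    // No match, or a match too weak to trust
    return { label: "Unknown", confidence: match?.score ?? 0 };
  }
  return { label: match.metadata?.label || "Unknown", confidence: match.score };
}
```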

Finally, we have a simple Express server that exposes the /api/image endpoint:

import * as dotenv from "dotenv";
import express from "express";
import http from "http";
import bodyParser from "body-parser";
import handler from "./handler.js";

dotenv.config();
const port = process.env.PORT;

const app = express();
app.use(bodyParser.json());

const server = http.createServer(app);
app.post("/api/image", handler);

// Start the HTTP server
server.listen(port, () => console.log(`Listening on port ${port}`));

Running the server locally

If you didn’t build the server yourself, you can clone the server repo by running:

git clone git@github.com:pinecone-io/pinecone-vision-server.git

Install the dependencies (if you haven’t already):

npm install

Create a .env file and provide the following values:

INFERENCE_ENDPOINT_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
INFERENCE_ENDPOINT=<YOUR_HUGGING_FACE_ENDPOINT_URL>
PINECONE_ENVIRONMENT=<YOUR_PINECONE_ENVIRONMENT>
PINECONE_API_KEY=<YOUR_PINECONE_API_KEY>

Start the server by running:

npm run start

Deploying the server to Vercel

To deploy the server to Vercel, we simply have to add a vercel.json file with the following:

{
  "version": 2,
  "name": "pinecone-vision",
  "builds": [{ "src": "src/index.js", "use": "@vercel/node" }],
  "routes": [{ "src": "src/(.*)", "dest": "src/index.js" }]
}

Then, push the changes and set a new Vercel project for the repo. Follow these instructions to import your existing repo into a Vercel project.

You’ll need to set up the same environment variables in Vercel we mentioned earlier.

The mobile application

The mobile application has two jobs:

In training mode: Relay images from the device’s camera to the backend together with a label set by the user
In detection mode: Display the detected label for whatever the camera is pointing to

Here’s what the detectImage function does:

Get the image from the camera by calling takePictureAsync.
Resize the image down to improve the speed of inference (and overall communication)
Send the image to the server together with a label and either a “training” or “detecting” stage. This will tell the backend how to handle the incoming payload.

async function detectImage() {
    if (!cameraReady) {
      return;
    }
    const pic = await cameraRef.current?.takePictureAsync({
      base64: true,
    });
    const resizedPic = await manipulateAsync(
      pic.uri,
      [{ resize: { width: 244, height: 244 } }],
      { compress: 0.4, format: SaveFormat.JPEG, base64: true }
    );

    !detecting && setNumberOfImages((prev) => prev + 1);

    const payload = {
      uri: pic.uri,
      data: resizedPic.base64,
      height: resizedPic.height,
      width: resizedPic.width,
      label,
      user,
      stage: !detecting ? "training" : "detecting",
    };

    try {
      const result = await fetch(`${ENDPOINT}/api/image`, {
        method: "POST",
        headers: {
          Accept: "application/json",
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });

      const json = await result.json();
      const { label, confidence: score } = json;
 ...
    } catch (e) {
      console.log("Failed", e);
    }
  }

The full application code is available in this repo. You can clone it locally and run it with Expo.

If you’re running the server locally, first create a .env file with the following value:

ENDPOINT=<YOUR_LOCAL_IP_ADDRESS>

If you’d like to use the demo endpoint instead, use the following:

ENDPOINT=pinecone-vision-latest.vercel.app

Then, run the following command:

npx expo start

You should see a scannable QR which will open the application on your phone. It should look something like this:

Summary

The combination of the popularity of Javascript and the rise of AI applications has created a lot of potential for innovation. With powerful tools like Pinecone, Vercel, and Hugging Face available, developers can now create AI applications with ease. This opens up many possibilities for developers to create AI applications for a wider range of users. If you’re a JS developer, I hope this inspires you to build something great!

How hard can Authorization be?

Roie Schwaber-Cohen — Thu, 10 Feb 2022 17:34:15 +0000

So you've decided to build authorization into your application. Sounds pretty straightforward, right? All it takes is a couple of tables in the database for roles and permissions, and we should be fine. Let's take a deeper look at some design considerations you should be aware of from the get-go.

Authorization happens everywhere

The authorization mechanism gates every action or request a user makes in the application. Many (if not all) components in the application will need to make authorization decisions - which means we can expect to see authorization logic everywhere in the codebase.

As with any other aspect of an application - we need to think about separation of concerns: how do we ensure that the authorization logic is isolated from the rest of the application components? Without a clear separation of concerns, we can end up with a situation where the authorization logic is coupled with the rest of the application. We'll have to make wide-ranging changes to the rest of the application with any updates to the authorization logic.

When using multiple services, the authorization challenge is even more complex. A typical pattern is that initially, the authorization logic will be repeated in every service - making it harder to update and maintain: every change will require updates in all the services. When more services are added, using more languages - this problem will be further exacerbated since now the authorization logic implementation will vary depending on the language used.

Changing requirements

It's more than likely that when the authorization layer is first created, it'll include some form of roles and permissions (e.g., a user can read, an admin can delete, etc.). But as the application evolves, the authorization layer will need to be updated to support new roles, more fine-grained models, and other features.

This usually means that the authorization layer will have to be rewritten - and most likely more than once in the lifetime of the application. When the authorization layer is coupled to the application, it will surely mean significant rewrites to the application as well. So say goodbye to some precious months of development cycles spent on rewriting authorization code - which isn't part of your core value proposition.

Performance

Since authorization decisions are made for every request or action in the application, they must be made in milliseconds. So ensuring that the authorizing component has the information it needs to make decisions within that time frame can be challenging since it will most likely require querying other data sources.

For example, even in the most basic scenarios, given some user identity information, the authorization component would have to resolve what roles are associated with that identity. In more complex scenarios, the authorization component would have to resolve additional user attributes or resource data to make its decisions.

To build a performant authorization layer, we have to architect the solution in a way that will allow the decision engine to fetch the data it needs in a reasonable amount of time.
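One common way to keep decision latency low is to cache resolved roles for a short time. The sketch below is a simplified illustration under stated assumptions: fetchRoles stands in for whatever slow lookup the application uses, the lookup is synchronous to keep the example short, and the clock is injected so the behavior is easy to test.

```javascript
// Sketch of a TTL cache in front of a slow role lookup.
// `fetchRoles` and `now` are injected; both are assumptions for illustration.
function createRoleCache(fetchRoles, ttlMs, now = () => Date.now()) {
  const cache = new Map(); // userId -> { roles, expiresAt }
  return function getRoles(userId) {
    const hit = cache.get(userId);
    if (hit && hit.expiresAt > now()) {
      return hit.roles; // fresh entry: skip the lookup
    }
    const roles = fetchRoles(userId); // miss or stale entry: query the source
    cache.set(userId, { roles, expiresAt: now() + ttlMs });
    return roles;
  };
}
```

The trade-off is staleness: a role change can take up to ttlMs to be reflected in decisions, so the TTL has to be chosen with revocation requirements in mind.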

Scaling

Any authorization solution needs to be able to scale as the application grows and more authorization requests are made. If the authorization layer is not separated from the rest of the application, scaling will be a challenge. In addition, since the application will make authorization requests very frequently, every request will incur an overhead, which will make the authorization layer a bottleneck.

With that said - it's hard to separate the authorization layer from the rest of the application since the decision relies on data coming from the application - and the application depends on decisions made by the authorization layer.

Identity

Authorization is tied to user identity. These days, applications are more than likely to use an identity provider (such as Auth0, Azure AD, Google Identity, Okta, etc.) to resolve the user's identity. In these cases, integration with an identity provider will be a key part of the authorization layer and will present another challenge: since the identity provider is external to the application, it'll be difficult to guarantee that it will be available at all times and that the response time will be acceptable. For that reason, some mechanism to synchronize the identity provider with the authorization layer will be needed.

But resolving the user's identity is only the first step: the user's identity has to be mapped to specific roles and, in some cases, to a set of attributes. An authorization solution will also require some way to store, maintain and serve these mappings to the decision engine in a timely manner.

Auditability

Authorization is a key part of the security model of an application. In an enterprise setting, that means that all authorization decisions must be collected and audited. This could prove challenging to implement at scale: it requires instrumenting all the places where authorization decisions are either made or enforced. All the authorization decisions need to be aggregated in a format compatible with analysis and auditing tools.

Collaboration

In most organizations, the authorization process will include participants from outside the development team. This means that it can't just be left as a series of if … else statements ingrained in the application code. Eventually, the authorization component will have to facilitate collaboration around the governance of the authorization process.

The challenge here will be to design the authorization solution in a way that will separate the authorization decision-making logic from the authorization enforcement logic. That way, authorization policies could be maintained and updated without requiring any changes to the application code. This will allow participants outside the development team to administer the authorization process.
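A minimal sketch of that separation, with hypothetical names and a made-up policy shape, might look like this: the decision is a pure function over policy data, and the enforcement code only asks for an answer.

```javascript
// Decision: a pure function over policy data. Changing `policy` requires no code change.
// Assumed policy shape: { [role]: { [action]: [resource, ...] } }
function decide(policy, { roles, action, resource }) {
  return roles.some((role) => (policy[role]?.[action] || []).includes(resource));
}

// Enforcement: application code asks for a decision and acts on the answer.
function enforce(policy, request, onAllow, onDeny) {
  return decide(policy, request) ? onAllow() : onDeny();
}
```

Because the policy is plain data, it could be stored, reviewed, and updated outside the application's release cycle, which is what lets people outside the development team take part.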

Summary

While it might seem easy at first, creating an authorization layer is an involved process. It's not just about creating reference tables in a database between users and their roles - it's about creating a way to ensure that the authorization logic can be easily changed across the application and that it can continue to evolve as the application evolves. It's about making sure the authorization decisions don't slow down the application and that the authorization layer will scale as the application grows and the volume of authorization requests increases.

While many development teams may choose to initially build their own authorization solution, they may eventually realize that they've committed to more than they bargained for. The truth is that in most cases, while authorization is a must-have for any application to go to production - it is rarely a core part of the value proposition. Therefore, development teams should ask themselves whether they are willing to sacrifice valuable development cycles developing features that aren't part of that value proposition.

If you'd rather avoid the challenges building an authorization system entails, we think we can help. Aserto provides an enterprise-ready, cloud-native authorization service deployed to the edge that is easy to integrate and will scale with your application. It delivers single-millisecond authorization decisions and is 100% available to your application.

Aserto integrates with your existing identity provider and automatically synchronizes all the required authorization assets to the edge authorizer and provides an aggregated audit trail for authorization decisions. Sign up today and leave your authorization woes behind.

Building RBAC in Node

Roie Schwaber-Cohen — Thu, 03 Feb 2022 19:28:49 +0000

Introduction

Role Based Access Control (RBAC) is an access control pattern that governs the way users access applications based on the roles they are assigned. Roles are essentially groupings of permissions to perform operations on particular resources. Instead of assigning numerous permissions to each user, RBAC allows users to be assigned a role that grants them access to a set of resources. For example, a role could be something like evilGenius, or a sidekick. A sidekick like Morty Smith for example could have the permission to gather mega seeds, and an evilGenius like Rick would be able to create a microverse.

In this post, we'll review some of the ways to implement an RBAC pattern in a Node.js application using several open source libraries as well as the Aserto Express.js SDK. This is by no means an exhaustive guide for all the features the libraries provide, but it should give you a good idea of how to use them.

Prerequisites

You'll need a basic understanding of Javascript and Node.js to follow this post.
You'll need Node.js and Yarn installed on your machine.
You should be familiar with Rick and Morty - otherwise these users are going to make no sense ;-)

Setup

The code examples shown below can be found in this repository. To run each of them, navigate to the corresponding directory and run yarn install followed by yarn start.

All of the examples we'll demonstrate in this post have a similar structure:

They use Express.js as the web server, and they use a middleware called hasPermission to check if the user has the correct permissions to access the route.
They share a users.json file that contains the users and their assigned roles. This file will simulate a database that would be used in a real application to store and retrieve user information.



[
  {
    "id": "beth@the-smiths.com",
    "roles": ["clone"]
  },
  {
    "id": "morty@the-citadel.com",
    "roles": ["sidekick"]
  },
  {
    "id": "rick@the-citadel.com",
    "roles": ["evilGenius", "squanch"]
  }
]

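The users.json file is going to be accessed by a function called resolveUserRoles which, given a user, will resolve their roles. This function is shared by all of the examples and is found in utils.js.



const users = require("./users");

// Look up the roles assigned to a user (a real application would query a database)
const resolveUserRoles = (user) => {
  const userWithRoles = users.find((u) => u.id === user.id);
  // Unknown users get no roles instead of causing a TypeError
  return userWithRoles ? userWithRoles.roles : [];
};

module.exports = { resolveUserRoles };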

The initial setup for the Express.js app is straightforward:



const express = require("express");
const { resolveUserRoles } = require("../utils");
const app = express();
app.use(express.json());

The application will have three routes that will be protected by the hasPermission middleware, which will determine whether the user has the correct permissions to access the route, based on the action associated with that route.



app.get("/api/:asset", hasPermission("gather"), (req, res) => {
  res.send("Got Permission");
});

app.put("/api/:asset", hasPermission("consume"), (req, res) => {
  res.send("Got Permission");
});

app.delete("/api/:asset", hasPermission("destroy"), (req, res) => {
  res.send("Got Permission");
});

And finally, the application will listen on port 8080:



app.listen(8080, () => {
  console.log("listening on port 8080");
});

Testing

To test the application, we'll make a set of requests to the routes and check the responses:



curl -X <HTTP Verb> --location 'http://localhost:8080/api/<asset>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "user": {
        "id": "rick@the-citadel.com"
    }
}'

Where <HTTP Verb> is either GET, PUT, or DELETE and <asset> is either megaSeeds or timeCrystals.

For each user, we'll expect the following:

Beth (AKA the clone): Should be only able to gather megaSeeds and timeCrystals
Morty (AKA the sidekick): Should be only able to gather and consume megaSeeds and timeCrystals
Rick (AKA the evilGenius): Should be able to gather, consume and destroy only megaSeeds and timeCrystals.

Let's go get those mega seeds!

Vanilla Node.js

To set the scene, we start with the most simplistic way of enforcing roles in a Node.js application. In this example, we're going to use a JSON file (roles.json) that will map specific roles to actions they may perform, and assets they may perform those actions on:



{
  "clone": {
    "gather": ["megaSeeds", "timeCrystals"]
  },
  "sidekick": {
    "gather": ["megaSeeds", "timeCrystals"],
    "consume": ["megaSeeds", "timeCrystals"]
  },
  "evilGenius": {
    "gather": ["megaSeeds", "timeCrystals"],
    "consume": ["megaSeeds", "timeCrystals"],
    "destroy": ["megaSeeds", "timeCrystals"]
  }
}

In this JSON snippet, the clone role will only be able to gather the megaSeeds and timeCrystals assets. The sidekick role will be able to gather and consume the megaSeeds and timeCrystals assets. The evilGenius role will be able to gather, consume, and destroy megaSeeds and timeCrystals.

The implementation of the hasPermission middleware function is going to be very simple:



const hasPermission = (action) => {
  return (req, res, next) => {
    const { user } = req.body;
    const { asset } = req.params;
    const userRoles = resolveUserRoles(user);

    const permissions = userRoles.reduce((perms, role) => {
      perms =
        roles[role] && roles[role][action]
          ? perms.concat(roles[role][action])
          : perms.concat([]);
      return perms;
    }, []);

    const allowed = permissions.includes(asset);

    allowed ? next() : res.status(403).send("Forbidden").end();
  };
};

In this example we:

Iterate over each user role
Check the existence of the user's given role in the roles object
Check the existence of actions within that given role, and finally check if the assets array associated with that role and action contains the asset the user is trying to access.
Determine whether the permissions the user has include the asset they are trying to access.
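The same check can be pulled out of the middleware into a pure function, which makes it easy to verify against the expectations listed in the Testing section. This is a sketch: isAllowed is a hypothetical name, and the role map is inlined from roles.json so the example is self-contained.

```javascript
// Same role map as roles.json, inlined so the sketch is self-contained.
const roles = {
  clone: { gather: ["megaSeeds", "timeCrystals"] },
  sidekick: {
    gather: ["megaSeeds", "timeCrystals"],
    consume: ["megaSeeds", "timeCrystals"],
  },
  evilGenius: {
    gather: ["megaSeeds", "timeCrystals"],
    consume: ["megaSeeds", "timeCrystals"],
    destroy: ["megaSeeds", "timeCrystals"],
  },
};

// Pure version of the check performed inside hasPermission.
function isAllowed(userRoles, action, asset) {
  // Collect every asset any of the user's roles may perform `action` on
  const permissions = userRoles.flatMap((role) => roles[role]?.[action] ?? []);
  return permissions.includes(asset);
}
```

Roles that don't appear in the map, such as squanch, simply contribute no permissions.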

Other than being pretty simplistic, this approach is not going to be very scalable - the "policy" definition is going to become complex, highly repetitive, and thus hard to maintain.

Click here to view the full vanilla Node.js implementation.

Node-Casbin

Casbin is a powerful and efficient open-source access control library. It has SDKs in many languages, including Javascript, Go, Rust, Python, and more. It provides support for enforcing authorization based on various access control models: from a classic "subject-object-action" model, through RBAC and ABAC models to fully customizable models. It has support for many adapters for policy storage.

In Casbin, the access control model is encapsulated in a configuration file (src/rbac_model.conf):



[request_definition]
r = sub, obj, act

[policy_definition]
p = sub, obj, act

[role_definition]
g = _, _

[matchers]
m = g(r.sub , p.sub) && r.obj == p.obj && r.act == p.act

[policy_effect]
e = some(where (p.eft == allow))

Along with a policy/roles definition file (src/rbac_policy.csv):



p, clone, megaSeeds, gather
p, clone, timeCrystals, gather
p, sidekick, megaSeeds, consume
p, sidekick, timeCrystals, consume
p, evilGenius, megaSeeds, destroy
p, evilGenius, timeCrystals, destroy
g, sidekick, clone
g, evilGenius, sidekick

The request_definition section defines the request parameters. In this case, the request parameters are the minimally required parameters: subject (sub), object (obj) and action (act). It defines the parameters' names and order that the policy matcher will use to match the request.
The policy_definition section dictates the structure of the policy. In our example, the structure matches that of the request, containing the subject, object, and action parameters. In the policy/roles definition file, we can see that there are policies (on lines beginning with p) for each role (clone, sidekick, and evilGenius).
The role_definition section is specific to the RBAC model. In our example, the model indicates that an inheritance group (g) is comprised of two members. In the policy/roles definition file, we can see two role inheritance rules for sidekick and evilGenius, where sidekick inherits from clone and evilGenius inherits from sidekick (which means the evilGenius will also have the clone permissions).
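The matchers section defines the matching rules between the policy and the request. In our example, the matcher checks whether the request's object and action match those of the policy, and whether the request subject r.sub has (or inherits) the role p.sub named in the policy.
The policy_effect section defines how the matched policies are combined into a decision. In our example, the request is allowed if at least one matching policy allows it.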
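To make the matcher and the role inheritance more concrete, here is a plain JavaScript sketch that evaluates the same policy lines by hand. It only illustrates the semantics of this particular model and is not how Casbin is implemented; hasRole and enforce are hypothetical names chosen for the sketch.

```javascript
// Policy lines ("p") and role-inheritance lines ("g") from the policy file.
const policies = [
  ["clone", "megaSeeds", "gather"],
  ["clone", "timeCrystals", "gather"],
  ["sidekick", "megaSeeds", "consume"],
  ["sidekick", "timeCrystals", "consume"],
  ["evilGenius", "megaSeeds", "destroy"],
  ["evilGenius", "timeCrystals", "destroy"],
];
const groupings = [
  ["sidekick", "clone"],
  ["evilGenius", "sidekick"],
];

// g(sub, role): true if `sub` is `role` or inherits from it, directly or transitively.
const hasRole = (sub, role, seen = new Set()) => {
  if (sub === role) return true;
  if (seen.has(sub)) return false; // guard against cyclic inheritance
  seen.add(sub);
  return groupings.some(
    ([child, parent]) => child === sub && hasRole(parent, role, seen)
  );
};

// m = g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act
// e = some(where (p.eft == allow)): one matching policy line is enough.
const enforce = (sub, obj, act) =>
  policies.some(
    ([pSub, pObj, pAct]) => hasRole(sub, pSub) && obj === pObj && act === pAct
  );

console.log(enforce("evilGenius", "megaSeeds", "gather")); // true (inherited from clone)
console.log(enforce("clone", "megaSeeds", "consume")); // false
```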

The implementation of the hasPermission middleware function for Node-Casbin is as follows:



const hasPermission = (action) => {
  return async (req, res, next) => {
    const { user } = req.body;
    const { asset } = req.params;
    const userRoles = resolveUserRoles(user);

    const e = await newEnforcer("./rbac_model.conf", "./rbac_policy.csv");

    const allowed = await userRoles.reduce(async (perms, role) => {
      const acc = await perms;
      if (acc) return true;
      const can = await e.enforce(role, asset, action);
      if (can) return true;
    }, false);

    allowed ? next() : res.status(403).send("Forbidden").end();
  };
};

In this code snippet, we create a new Casbin enforcer using the newEnforcer function. Then, we call e.enforce(role, asset, action) on each user role, and return true as soon as the result of the e.enforce function is true. We return a 403 Forbidden response if the user is not allowed to perform the action on the asset, otherwise, we call the next function to continue the middleware chain.
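The "return true as soon as one role is allowed" logic can also be written as a plain loop, which some readers may find easier to follow than an async reduce. The sketch below is an alternative, not the code from the example repository; anyRoleAllowed is a hypothetical helper, and check stands in for a call such as e.enforce(role, asset, action).

```javascript
// Resolve to true as soon as one role passes `check`; later roles are not checked.
// `check` is any async predicate, e.g. (role) => e.enforce(role, asset, action).
const anyRoleAllowed = async (userRoles, check) => {
  for (const role of userRoles) {
    if (await check(role)) return true;
  }
  return false;
};

// Usage with a stand-in predicate instead of a real enforcer:
const stubCheck = async (role) => role === "sidekick";
anyRoleAllowed(["clone", "sidekick", "evilGenius"], stubCheck).then((allowed) =>
  console.log(allowed) // true
);
```

Unlike the reduce version, the loop always resolves to a boolean (false rather than undefined) when no role is allowed.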

Click here to view the full Node-Casbin implementation.

CASL

The CASL library is an isomorphic authorization library that's designed to be incrementally adoptable. Its aim is to make it easy to share permissions across UI components, API services, and database queries. CASL doesn't have the concept of a role - it can only assign a set of permissions to a user. Instead, CASL permissions are defined as tuples of "action", "subject", "conditions" and optionally "fields". It is the responsibility of the developer to handle the assignment of the proper permissions to a user based on their assigned roles.

The main concept in CASL is the "Ability", which determines what a user is able to do in the application.

It uses a declarative syntax to define abilities, as seen below:



import { AbilityBuilder, Ability } from "@casl/ability";
import { resolveUserRoles } from "../utils.js";

export function defineRulesFor(user) {
  const { can, rules } = new AbilityBuilder(Ability);

  // If no user, no rules
  if (!user) return new Ability(rules);
  const roles = resolveUserRoles(user);

  roles.forEach((role) => {
    switch (role) {
      case "clone":
        can("gather", "Asset", { id: "megaSeeds" });
        can("gather", "Asset", { id: "timeCrystals" });
        break;
      case "sidekick":
        can("gather", "Asset", { id: "megaSeeds" });
        can("gather", "Asset", { id: "timeCrystals" });
        can("consume", "Asset", { id: "timeCrystals" });
        can("consume", "Asset", { id: "megaSeeds" });
        break;
      case "evilGenius":
        can("manage", "all");
        break;
      default:
        // unknown roles get no permissions
        break;
    }
  });

  return new Ability(rules);
}

In this code snippet, we resolve the user's roles using the same resolveUserRoles utility function. Since CASL doesn't have the notion of a role, we create a switch statement that handles the assignment of permissions for the various roles. For each role we call the can function, which assigns a particular action (gather, consume, or destroy) to a particular resource model (Asset) with specific conditions (id has to equal the asset specified). In the case of the evilGenius role, we use the reserved manage keyword, which means the user can perform all actions, and the reserved all keyword, which indicates that this role can execute actions on all assets.
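To illustrate how rules shaped as (action, subject, conditions) tuples are matched, including the manage and all wildcards, here is a simplified plain JavaScript model. It is a sketch under stated assumptions and not CASL's implementation: it handles only equality conditions, and ruleMatches and canDo are hypothetical names.

```javascript
// Rules shaped like the tuples described above: action, subject, optional conditions.
const sidekickRules = [
  { action: "gather", subject: "Asset", conditions: { id: "megaSeeds" } },
  { action: "consume", subject: "Asset", conditions: { id: "megaSeeds" } },
];
const evilGeniusRules = [{ action: "manage", subject: "all" }];

// A rule matches when action and subject agree ("manage" and "all" act as
// wildcards) and every condition equals the corresponding field on the resource.
const ruleMatches = (rule, action, subject, resource) =>
  (rule.action === "manage" || rule.action === action) &&
  (rule.subject === "all" || rule.subject === subject) &&
  Object.entries(rule.conditions || {}).every(
    ([field, value]) => resource[field] === value
  );

// The user can act if at least one of their rules matches.
const canDo = (rules, action, subject, resource) =>
  rules.some((rule) => ruleMatches(rule, action, subject, resource));

console.log(canDo(sidekickRules, "consume", "Asset", { id: "megaSeeds" })); // true
console.log(canDo(sidekickRules, "destroy", "Asset", { id: "megaSeeds" })); // false
console.log(canDo(evilGeniusRules, "destroy", "Asset", { id: "timeCrystals" })); // true
```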

The hasPermission middleware function for CASL is very similar to the one we used in the previous example:



const hasPermission = (action) => {
  return (req, res, next) => {
    const { user } = req.body;
    const { asset: assetId } = req.params;
    const ability = defineRulesFor(user);
    const asset = new Resource(assetId);
    try {
      ForbiddenError.from(ability).throwUnlessCan(action, asset);
      next();
    } catch (error) {
      res.status(403).send("Forbidden").end();
    }
  };
};

The ability is defined by the rules set by the defineRulesFor function. Then, inside a try/catch block, we call ForbiddenError.from(ability).throwUnlessCan(action, asset), which throws unless the ability allows the user to perform the action on the asset we pass to it. If no error is thrown, we call the next function to continue the middleware chain; otherwise, we return a 403 Forbidden response.

Click here to view the full CASL implementation.

RBAC

The rbac library provides a simple interface for RBAC authorization. It provides an asynchronous interface for the storage of the policy and supports hierarchical roles.

The policy definition is a JSON object passed to the RBAC constructor:



const { RBAC } = require("rbac");
const policy = new RBAC({
  roles: ["clone", "sidekick", "evilGenius"],
  permissions: {
    megaSeeds: ["gather", "consume", "destroy"],
    timeCrystals: ["gather", "consume", "destroy"],
  },
  grants: {
    clone: ["gather_megaSeeds", "gather_timeCrystals"],
    sidekick: ["clone", "consume_megaSeeds", "consume_timeCrystals"],
    evilGenius: ["sidekick", "destroy_megaSeeds", "destroy_timeCrystals"],
  },
});

This code snippet defines the possible roles used in the policy and the possible actions for each asset, and finally defines the mapping between the possible roles and the combination of actions and assets. The combination of actions and assets is simply the concatenation of the action string, an underscore, and the asset. We can see that sidekick also inherits the clone role, and evilGenius also inherits the sidekick role.
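The inheritance expressed in the grants object can be traced by hand with a few lines of plain JavaScript. This sketch only models the grants structure shown above and does not use the rbac library; expand and roleCan are hypothetical helpers.

```javascript
const grants = {
  clone: ["gather_megaSeeds", "gather_timeCrystals"],
  sidekick: ["clone", "consume_megaSeeds", "consume_timeCrystals"],
  evilGenius: ["sidekick", "destroy_megaSeeds", "destroy_timeCrystals"],
};

// A grant entry that names a key of `grants` is an inherited role.
const isRole = (name) => Object.prototype.hasOwnProperty.call(grants, name);

// Expand a role into the full set of "action_asset" permissions it holds,
// following entries that name another role (inheritance).
const expand = (role, seen = new Set()) => {
  if (seen.has(role)) return new Set(); // already expanded, or a cycle
  seen.add(role);
  const result = new Set();
  for (const grant of grants[role] || []) {
    if (isRole(grant)) {
      for (const inherited of expand(grant, seen)) result.add(inherited);
    } else {
      result.add(grant);
    }
  }
  return result;
};

const roleCan = (role, action, asset) => expand(role).has(`${action}_${asset}`);

console.log(roleCan("evilGenius", "gather", "megaSeeds")); // true (via sidekick -> clone)
console.log(roleCan("clone", "consume", "megaSeeds")); // false
```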

The hasPermission middleware function is again similar to the one we used in the previous examples, where the only difference is the call to the policy object:



const hasPermission = (action) => {
  return async (req, res, next) => {
    const { user } = req.body;
    const { asset } = req.params;
    const userRoles = resolveUserRoles(user);

    const allowed = await userRoles.reduce(async (perms, role) => {
      const acc = await perms;
      if (acc) return true;

      const can = await policy.can(role, action, asset);
      if (can) return true;
    }, false);

    allowed ? next() : res.status(403).send("Forbidden").end();
  };
};

Click here to view the full RBAC implementation.

Access-Control

The Access-Control project offers a "Chainable, friendly API" with hierarchical role inheritance. It allows developers to define roles using a single definition file or using a chain of .can calls. It only supports the CRUD action verbs, with two ownership modifiers: any and own.

In this example, we define the roles and permissions in a file called grantlist.js:



// AccessControl expects the key "resource" for the protected asset.
const grantList = [
  { role: "evilGenius", resource: "megaSeeds", action: "delete:any" },
  { role: "evilGenius", resource: "timeCrystals", action: "delete:any" },
  {
    role: "evilGenius",
    resource: "megaSeeds",
    action: "read:any",
  },
  { role: "sidekick", resource: "megaSeeds", action: "update:any" },
  { role: "sidekick", resource: "timeCrystals", action: "update:any" },
  {
    role: "sidekick",
    resource: "megaSeeds",
    action: "read:any",
    attributes: ["*", "!id"],
  },
  { role: "clone", resource: "megaSeeds", action: "read:any" },
  { role: "clone", resource: "timeCrystals", action: "read:any" },
];

module.exports = grantList;

As in the other examples, we have a mapping between roles, assets, and actions. Unlike the other examples, we are limited to the CRUD actions, and in our case, only read, update, and delete apply. As you'll see below, we mapped our custom actions (gather, consume and destroy) to the CRUD actions (it's a bit odd, but that's what you get when you build your authorization library only around CRUD actions...)

We also specify that the sidekick role will be able to readAny of the megaSeeds, but we also limit the attributes that can be read. Specifically, we allow the sidekick to access all the attributes except for the id attribute.

We import the grant list to our main application file, and initialize the AccessControl object:



const AccessControl = require("accesscontrol");
const grantList = require("./grantlist");
const ac = new AccessControl(grantList);

In this case, instead of explicitly declaring all the roles and permissions, we can extend one role with another:



ac.grant("evilGenius").extend("sidekick");

The hasPermission implementation is a bit different than the other libraries we reviewed so far.



const hasPermission = (action) => {
  return (req, res, next) => {
    const { user } = req.body;
    const { asset } = req.params;
    const userRoles = resolveUserRoles(user);
    const allowed = userRoles.reduce((perms, role) => {
      let permissions;
      switch (action) {
        case "gather":
          permissions = ac.can(role).readAny(asset);
          if (permissions.granted) {
            perms = perms.concat(permissions);
          }
          break;
        case "consume":
          permissions = ac.can(role).updateAny(asset);
          if (permissions.granted) {
            perms = perms.concat(permissions);
          }
          break;
        case "destroy":
          permissions = ac.can(role).deleteAny(asset);
          if (permissions.granted) {
            perms = perms.concat(permissions);
          }
          break;
      }
      return perms;
    }, []);

    if (allowed.length) {
      const result = allowed.map((perm) => {
        const data = assets[asset];
        return {
          data: perm.filter(data),
          asRole: perm._.role,
        };
      });

      res.locals = result;
      next();
    } else {
      res.status(403).send("Forbidden");
    }
  };
};

In this code snippet, we iterate over the userRoles array and, for each role, switch over the action to call the CRUD verb associated with it. We collect the permissions that were granted for each role.
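The mapping from our custom actions to CRUD verbs can also be expressed as a lookup table instead of a switch. The following is a sketch of that alternative, not the code from the example repository: crudMethodFor and collectGranted are hypothetical names, and ac is assumed to be anything exposing ac.can(role).readAny(asset) and friends, as used in the snippet above.

```javascript
// Custom action -> name of the query method used in the snippet above.
const crudMethodFor = {
  gather: "readAny",
  consume: "updateAny",
  destroy: "deleteAny",
};

// Collect the granted permissions across all of the user's roles.
// `ac` is anything exposing ac.can(role)[method](asset) -> { granted }.
const collectGranted = (ac, userRoles, action, asset) => {
  const method = crudMethodFor[action];
  if (!method) return []; // unknown action: nothing is granted
  return userRoles
    .map((role) => ac.can(role)[method](asset))
    .filter((permission) => permission.granted);
};
```

Adding a new custom action then only requires a new entry in the table.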

After collecting all the permissions, we iterate over them again and "fetch" any data the user has access to from a mock store (assets).



const assets = {
  megaSeeds: {
    id: "megaSeeds",
    content: "This is asset 1",
  },
  timeCrystals: {
    id: "timeCrystals",
    content: "This is asset 2",
  },
};

We then use the perm.filter method to filter the data such that only the allowed attributes are passed to the route function.
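To give a feel for what an attribute list such as ["*", "!id"] does to the data, here is a simplified stand-in for the filtering step. It is only a sketch: it handles top-level attributes, "*" and "!name" negations, and nothing else, and filterAttributes is a hypothetical helper rather than the library's own filter.

```javascript
// Simplified attribute filter: "*" allows every top-level attribute and
// "!name" removes one. Nested paths and globs are deliberately not handled.
const filterAttributes = (data, attributes) => {
  const denied = new Set(
    attributes.filter((attr) => attr.startsWith("!")).map((attr) => attr.slice(1))
  );
  const allowAll = attributes.includes("*");
  const allowed = new Set(attributes.filter((attr) => !attr.startsWith("!")));
  return Object.fromEntries(
    Object.entries(data).filter(
      ([key]) => !denied.has(key) && (allowAll || allowed.has(key))
    )
  );
};

const asset = { id: "megaSeeds", content: "This is asset 1" };
console.log(filterAttributes(asset, ["*", "!id"])); // { content: 'This is asset 1' }
console.log(filterAttributes(asset, ["*"])); // { id: 'megaSeeds', content: 'This is asset 1' }
```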

In this example, when we test the evilGenius user with the action gather on megaSeeds we'll get the following result:



[
  {
    "data": {
      "content": "Mega Seeds grow on Mega Trees"
    },
    "asRole": "clone"
  },
  {
    "data": {
      "id": "megaSeeds",
      "content": "Mega Seeds grow on Mega Trees"
    },
    "asRole": "evilGenius"
  }
]

Based on the grants definition above, the clone is not allowed to see the id attribute, but the evilGenius is allowed to see all the attributes.

Click here to view the full Access-Control implementation.

Aserto

Aserto takes a fundamentally different approach to authorization than all of the examples we've seen above. First and foremost - Aserto is an authorization service, with an SDK that allows easy integration into the application. Aserto can be deployed as a sidecar to your application - which guarantees maximum availability as well as a single-digit millisecond response time for authorization decisions.

There are a couple of additional key differences that set Aserto apart from the other libraries we've reviewed so far.

Policy as Code - What we've seen in the examples so far could be grouped into an approach called "Policy as Data", where the policy itself is reasoned through the data that represents it. Aserto uses a different approach where the policy is expressed and reasoned about as code.

Reasoning about the policy as code makes the policy a lot more natural for developers to write and maintain. It takes away the need to traverse and reason about complex graphs or data structures. It also allows for more flexibility in the policy definition, as policies can be defined in a much more declarative way. Instead of convoluted data structures, developers can write the policy in a way that is a lot more concise and readable - and changes to the policy are made by changing the rules of the policy as opposed to rows in a database.

Users as First-Class Citizens - With Aserto, users and their roles are first-class citizens. Aserto provides a directory of users and their roles which is continuously synchronized with the Aserto authorizer. This allows Aserto to reason about users and their roles as part of the policy itself - without requiring role resolution as an additional external step (This is why the users.json file or the resolveUserRoles function are not going to be required as you'll see below). Having the role resolution as part of the application comes with its own set of risks - and the directory eliminates the risk of contaminating the decision engine with untrustworthy data.

Setting up Aserto

Aserto offers a console for managing policies - to create a new policy, you'll need to sign in. If you don't already have an Aserto account, you can create one here.

Add The Acmecorp IDP

To simulate the behavior of a user directory, we'll add the "Acmecorp IDP", which includes mock users that will be added to our directory. Head on to the Aserto Console, select the "Connections" tab and click the "Add Connection" button.

From the drop-down menu, select "Acmecorp"

Name the provider acmecorp and give it a description.

Finally click “Add connection”:

Create a Policy

Click here to create a new policy.

First, select your source code provider. If you haven't set one up already, you can do so by clicking the "Add a new source code connection" in the dropdown. This will bring up a modal for adding a connection to a provider. Note that Aserto supports GitHub as a source code provider, but allows you to connect to it either over an OAuth2 flow, or using a Personal Access Token (PAT).

After you're done connecting your GitHub account (or if you previously connected it), select "github" as your Source code provider.

Next, you'll be asked to select an organization & repo. Select the “New (using template)” radio button, and select the "policy-template" template.

Name your policy repo "policy-node-rbac" and click "Create repo".

Name your policy "policy-node-rbac":

And finally click "Add policy":

Head to GitHub, open the newly created repository, and clone it.



git clone https://github.com/[your-organization]/policy-node-rbac

Lastly, delete the policy hello.rego under the /src/policies folder.

Aserto Policies

Let's take a look at how policies are defined in Aserto. For the use case we presented, we'll need a policy for every route the application exposes. Let's start by creating the policy for the /api/read/:asset route. Under /src/policies, we'll create a file called noderbac.POST.api.__asset.rego, and paste the following code into it:



package noderbac.POST.api.__asset

default allowed = false

allowed {
    input.user.attributes.roles[_] == "clone"
    input.resource.asset == data.assets[_]
}

allowed {
    input.user.attributes.roles[_] == "sidekick"
    input.resource.asset == data.assets[_]
}

allowed {
    input.user.attributes.roles[_] == "evilGenius"
    input.resource.asset == data.assets[_]
}

The first line of the policy defines the name of the package, and it matches the route it will protect. Next, we define that by default, the allowed decision will be false - this means we're defaulting to a closed system, where access has to be explicitly granted.
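The package names in this post follow a visible pattern: the policy root, the HTTP method, and then the route, with "/" replaced by "." and a ":param" segment written as "__param". The helper below reproduces that pattern as an illustration. It is inferred from the names shown here, and policyPackage is a hypothetical function, not part of the Aserto SDK.

```javascript
// Derive a package name from a policy root, an HTTP method and an Express-style
// route, following the pattern visible in the package names in this post:
// "/" becomes "." and a ":param" segment becomes "__param".
const policyPackage = (policyRoot, method, route) => {
  const segments = route
    .split("/")
    .filter(Boolean)
    .map((segment) => (segment.startsWith(":") ? `__${segment.slice(1)}` : segment));
  return [policyRoot, method.toUpperCase(), ...segments].join(".");
};

console.log(policyPackage("noderbac", "PUT", "/api/:asset")); // noderbac.PUT.api.__asset
```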

The next three clauses will evaluate the allowed decision based on the user's roles and the asset they're trying to access. For example, the first line in the first clause will check if the user has the role of clone assigned to them. The user roles are automatically resolved by Aserto based on the user's identity.

The second line of the first clause will check whether the asset the user is trying to access is listed in the data.assets object, which is part of the policy. The asset is passed to the policy as part of the resource context (more details below). A policy can have a data file attached that could be used in the context of the policy. In our case, it includes the list of assets users can access. Under the /src folder, create a file called data.json and paste the following code into it:



{
  "assets": ["megaSeeds", "timeCrystals"]
}

Using a separate data file to define the protected assets, we don't have to explicitly define them in the policy (as we had to do in the previous examples).

The policies for /api/edit/:asset and /api/delete/:asset are nearly identical to the one for /api/read/:asset, except that the roles associated with each are different.

We'll create a file under /src/policies called noderbac.PUT.api.__asset.rego and paste the following code into it:



package noderbac.PUT.api.__asset

default allowed = false

allowed {
    input.user.attributes.roles[_] == "sidekick"
    input.resource.asset == data.assets[_]
}

allowed {
    input.user.attributes.roles[_] == "evilGenius"
    input.resource.asset == data.assets[_]
}

Next, we'll create a file under /src/policies called noderbac.DELETE.api.__asset.rego and paste the following code into it:



package noderbac.DELETE.api.__asset

default allowed = false

allowed {
    input.user.attributes.roles[_] == "evilGenius"
    input.resource.asset == data.assets[_]
}

As you can see, the policy for the consume route is allowing both sidekick and evilGenius access, while the policy for the destroy route is allowing access only to evilGenius.
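For readers who are new to Rego, the decision logic of the three policies above can be mirrored in plain JavaScript. This is only a model of the semantics (default deny, clauses OR-ed together, both conditions in a clause required), not how the Aserto authorizer evaluates policies; rolesByPackage and allowed are names made up for the sketch.

```javascript
const data = { assets: ["megaSeeds", "timeCrystals"] };

// Roles named in the `allowed` clauses of each policy package shown above.
const rolesByPackage = {
  "noderbac.POST.api.__asset": ["clone", "sidekick", "evilGenius"],
  "noderbac.PUT.api.__asset": ["sidekick", "evilGenius"],
  "noderbac.DELETE.api.__asset": ["evilGenius"],
};

// `default allowed = false`: anything not matched by a clause is denied.
// Each clause requires a matching role AND a known asset; clauses are OR-ed.
const allowed = (pkg, input) =>
  (rolesByPackage[pkg] || []).some(
    (role) =>
      input.user.attributes.roles.includes(role) &&
      data.assets.includes(input.resource.asset)
  );

const sidekickInput = {
  user: { attributes: { roles: ["sidekick"] } },
  resource: { asset: "megaSeeds" },
};
console.log(allowed("noderbac.PUT.api.__asset", sidekickInput)); // true
console.log(allowed("noderbac.DELETE.api.__asset", sidekickInput)); // false
```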

Lastly, we'll update the .manifest file to include a reference to the data in our data.json file. Update the /src/.manifest file to include the following:



{
  "roots": ["noderbac", "assets"]
}

To deploy the new policy, we'll just commit, tag, and push it to the repo we created:



git add .
git commit -m "Created RBAC Policy"
git push
git tag v0.0.1
git push --tags

Application implementation

The hasPermission function implementation is mostly similar, except that we're not going to resolve the user roles, since Aserto will do that for us:



const { is } = require("express-jwt-aserto");

const options = {
  authorizerServiceUrl: "https://authorizer.prod.aserto.com",
  policyId: process.env.POLICY_ID,
  authorizerApiKey: process.env.AUTHORIZER_API_KEY,
  tenantId: process.env.TENANT_ID,
  policyRoot: process.env.POLICY_ROOT,
  useAuthorizationHeader: false,
};

const hasPermission = (action) => {
  return async (req, res, next) => {
    const { user } = req.body;
    const { asset } = req.params;
    req.user = { sub: user.id };
    const allowed = await is("allowed", req, options, false, { asset });
    allowed ? next() : res.status(403).send("Forbidden").end();
  };
};

Here we pass the user's id as part of the req object. In production use cases, the req.user object would be populated after the user's authentication has been completed. The is function is going to return the allowed decision for the given route (encapsulated in the req object), for the asset we specify in the resource context.

The configuration passed to the is function (in the options object) requires that we create a .env file in the root of the project, and populate some environment variables from the Aserto console, on the Policy Details page:

Copy the Policy ID, Authorizer API Key, and Tenant ID to the .env file:



POLICY_ID=<Your Policy ID>
AUTHORIZER_API_KEY=<Your Authorizer API Key>
TENANT_ID=<Your Tenant ID>
POLICY_ROOT=noderbac
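Since a missing value here tends to surface later as a confusing authorization failure, it can help to check the variables at startup. The snippet below is a small sketch of such a check; missingEnv is a hypothetical helper, and it assumes the variables have already been loaded into process.env by whatever mechanism the project uses.

```javascript
const REQUIRED = ["POLICY_ID", "AUTHORIZER_API_KEY", "TENANT_ID", "POLICY_ROOT"];

// Return the names of required variables that are unset or empty in `env`.
const missingEnv = (env) => REQUIRED.filter((name) => !env[name]);

const missing = missingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```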

To run the example, run the following commands in the aserto directory:



yarn install
yarn start

Finally, you can test the application by running the same curl commands as before:



curl --location --request <HTTP Verb> 'http://localhost:8080/api/<asset>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "user": {
        "id": "rick@the-citadel.com"
    }
}'
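If you prefer to test from a script rather than from the shell, the same request can be described in JavaScript. This is a sketch: buildRequest is a hypothetical helper that only assembles the URL and options, and the commented-out fetch call assumes Node 18+ (global fetch) and a running server.

```javascript
// Build the URL and fetch options equivalent to the curl command above.
const buildRequest = (verb, asset, userId) => ({
  url: `http://localhost:8080/api/${encodeURIComponent(asset)}`,
  options: {
    method: verb,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ user: { id: userId } }),
  },
});

const { url, options } = buildRequest("PUT", "megaSeeds", "rick@the-citadel.com");
console.log(url); // http://localhost:8080/api/megaSeeds
// With the server running:
// fetch(url, options).then((res) => console.log(res.status));
```

Note that fetch rejects a request body on GET requests, so this sketch is suited to the verbs that carry a body.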

Summary

In this post, we reviewed multiple ways of adding RBAC to your application. We've seen that in most cases, users are not considered a first-class citizen concept in the authorization offering and that the process of role resolution is left to the developer, and ends up as part of the application itself, which introduces many risks. We've also seen that most solutions take the "Policy-as-Data" approach as opposed to the "Policy-as-Code" approach.

While it might seem easier to use a library to implement RBAC in your Node.JS application, it is important to consider the lifecycle of the application and how it'll grow. How will new users and roles be added? What would be the implications of changing the authorization policy? How will we reason about the authorization policy when it gets to be more complex?

Using a library means that you assume ownership of the authorization component - which requires time and effort to build and maintain. By using a service such as Aserto you can offload the responsibility of managing the authorization flow - without sacrificing the performance or availability of your application.