Joan for Xata

Posted on Oct 10, 2023 • Edited on Dec 28, 2023 • Originally published at xata.io

Using the LangChain Integration with a Serverless Database

#ai #learning #database #machinelearning

Xata integrates with LangChain and is available as both a vector store and a memory store.

What is LangChain?

LangChain is a popular open-source framework for developing AI applications powered by Large Language Models (LLMs). You can think of it as a collection of composable components implemented in Python and TypeScript that you can combine to implement various AI use cases.

The components typically offer a common API for different models (OpenAI, Llama, Replicate, etc.), vector stores (Pinecone, Weaviate, Chroma, etc.), databases as memory stores (DynamoDB, Redis, PlanetScale, etc.) and more. Because the API is the same across models, for example, LangChain makes it easy to switch between models and compare results, or to use different models for different parts of the app.
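To make the "common API" idea concrete, here is a minimal sketch in plain Python. It does not use LangChain itself: the `ChatModel` protocol, the two stand-in models, and the `compare` helper are all hypothetical names invented for illustration, and LangChain's real base classes are richer than this.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical common interface (an assumption, not LangChain's real base class)."""

    def invoke(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one provider: returns the prompt unchanged."""

    def invoke(self, prompt: str) -> str:
        return prompt


class ShoutModel:
    """Stand-in for a second provider: returns the prompt upper-cased."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


def compare(models: dict[str, ChatModel], prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and collect the answers by name."""
    return {name: model.invoke(prompt) for name, model in models.items()}


results = compare({"echo": EchoModel(), "shout": ShoutModel()}, "hello")
```

Because both stand-ins expose the same `invoke` method, swapping one for the other (or comparing them side by side) requires no changes to the calling code.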

These components can then be “chained together” in more complex applications, and LangChain comes with off-the-shelf implementations for several popular AI use cases (Q&A chatbots, summarization, autonomous agents, etc.).
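The chaining idea can be sketched as ordinary function composition. This is only an illustration under stated assumptions: `chain`, `build_prompt`, `fake_llm`, and `strip_brackets` are hypothetical helpers, and `fake_llm` is a placeholder where a real model call would go. LangChain's own chain classes offer much more than this.

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Compose steps left to right: each step receives the previous step's output."""

    def run(value: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, value)

    return run


def build_prompt(question: str) -> str:
    # Prompt-template step: wrap the user question in instructions.
    return f"Answer briefly: {question}"


def fake_llm(prompt: str) -> str:
    # Placeholder for a model call (assumption: a real LLM would go here).
    return f"[model output for: {prompt}]"


def strip_brackets(text: str) -> str:
    # Output-parser step: remove the surrounding square brackets.
    return text.strip("[]")


qa = chain(build_prompt, fake_llm, strip_brackets)
answer = qa("What is Xata?")
```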

While Xata has a built-in API for the “ChatGPT on your data” use case, in this blog post we’ll see how one can implement similar functionality using LangChain and Xata integrations. This can allow for more flexibility in the details of the implementation (for example, you can choose a non-OpenAI model) at the cost of having more code to write and maintain.

The integrations

Currently, the following integrations are available:

Xata as a vector store in LangChain. This allows one to store documents with embeddings in a Xata table and perform vector search on them. The integration takes advantage of the Xata Python SDK. It also supports filtering by metadata, which is stored in Xata columns for maximum performance.
Xata as a vector store in LangChain.js. Same as the Python integration, but for your TypeScript/JavaScript applications.
Xata as a memory store in LangChain. This allows storing the chat message history for AI chat sessions in Xata, making it work as “memory” for LLM applications. The messages are stored in a Xata table.
Xata as a memory store in LangChain.js. Same as the Python integration, but for TypeScript/JavaScript.
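To show what "vector search with metadata filtering" means in practice, here is a small in-memory sketch. It is not the Xata or LangChain API: `TinyVectorStore`, `Record`, and the hand-written two-dimensional embeddings are assumptions made for illustration. A real setup would compute embeddings with a model and keep the rows in a Xata table.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    content: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a table of documents with embeddings."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, content: str, embedding: list[float], **metadata: str) -> None:
        self.records.append(Record(content, embedding, metadata))

    def search(self, query: list[float], k: int = 2, **filters: str) -> list[str]:
        # Apply the metadata filter first, then rank the survivors by similarity.
        candidates = [
            r for r in self.records
            if all(r.metadata.get(key) == value for key, value in filters.items())
        ]
        ranked = sorted(candidates, key=lambda r: cosine(query, r.embedding), reverse=True)
        return [r.content for r in ranked[:k]]


store = TinyVectorStore()
store.add("Xata stores data in PostgreSQL", [1.0, 0.0], source="docs")
store.add("Xata replicates data to Elasticsearch", [0.8, 0.6], source="docs")
store.add("Unrelated blog note", [1.0, 0.1], source="blog")

hits = store.search([1.0, 0.0], k=2, source="docs")
```

The `source="docs"` filter removes the blog note before ranking, so it cannot appear in the results even though its embedding is close to the query.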

Each integration comes with one or two code examples in the doc pages linked above.
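The memory-store idea, chat messages saved per session and replayed to the model on the next turn, can be sketched as follows. This is a plain-Python illustration, not the real integration: `TinyChatHistory` and `Message` are hypothetical names, and a dictionary stands in for the Xata table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "human" or "ai"
    content: str


class TinyChatHistory:
    """Hypothetical in-memory stand-in for a table of chat messages keyed by session."""

    def __init__(self) -> None:
        self._rows: dict[str, list[Message]] = {}

    def add(self, session_id: str, role: str, content: str) -> None:
        self._rows.setdefault(session_id, []).append(Message(role, content))

    def messages(self, session_id: str) -> list[Message]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._rows.get(session_id, []))

    def as_prompt(self, session_id: str, question: str) -> str:
        """Render the stored turns plus the new question as one prompt string."""
        lines = [f"{m.role}: {m.content}" for m in self.messages(session_id)]
        lines.append(f"human: {question}")
        return "\n".join(lines)


history = TinyChatHistory()
history.add("session-1", "human", "My name is Joan.")
history.add("session-1", "ai", "Nice to meet you, Joan.")
history.add("session-2", "human", "Hello from another chat.")

prompt = history.as_prompt("session-1", "What is my name?")
```

Keying the rows by session ID keeps separate conversations from leaking into each other's context.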

The four integrations already make Xata one of the most comprehensive data solutions for LangChain, and we’re just getting started! For the near future, we’re planning to add custom retrievers for the Xata keyword and hybrid search and the Xata Ask AI endpoint.

Why choose Xata?

As we’ve pointed out above, a key benefit of LangChain is that it supports a lot of solutions for each integration type. For example, at the moment, the Python LangChain integrates with 47 (!) vector stores, while LangChain.js integrates with 24 vector stores.

So what makes Xata different and why should you consider it for your AI apps?

Xata is a serverless data platform that stores data in PostgreSQL, but also replicates it automatically to Elasticsearch. This means that it offers functionality from both Postgres (ACID transactions, constraints, etc.) and from Elasticsearch (full-text search, vector search, hybrid search) behind the same simple serverless API.
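As a rough illustration of what hybrid search involves, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is a generic technique chosen here for illustration. It is an assumption, not a description of how Xata or Elasticsearch combine scores internally, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each document scores 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. from a BM25 full-text search
vector_hits = ["doc-b", "doc-d", "doc-a"]   # e.g. from a vector similarity search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, which is the intuition behind combining full-text and vector search.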

Here is why you should consider Xata for your LangChain application:

It’s comprehensive: It offers LangChain integrations not only as a vector store, but also as a memory store. It also offers the same integrations for both the Python and TypeScript/JavaScript versions of LangChain. Because it uses Elasticsearch behind the scenes, it can offer BM25 and hybrid search in addition to vector search.
It’s a pure serverless solution: You simply get an API endpoint, no clusters or instances to configure, because we handle the scaling. The lightweight TypeScript SDK runs in any serverless environment, including Cloudflare Workers.
It has a modern developer workflow: Xata's workflow is based on branches and has built-in integrations with platforms like GitHub, Vercel, and Netlify.
It’s easy: The Xata UI makes it very easy to manage your schema, look up data, create and test queries and searches, and generally understand what’s going on.

How to get started?

To get started with Xata and LangChain, you can use the minimal code samples from each of the integrations above. For a more complete Python example, see this Jupyter Notebook. For TypeScript, check out the announcement blog post on the LangChain blog.

While the integrations added so far already make Xata one of the most comprehensive and easy-to-use data solutions for LangChain and AI applications, this is only the beginning! We’re planning to add more components that take advantage of Xata’s BM25 search and the Ask endpoint.

If you have any questions or ideas or if you need help implementing Xata with LangChain, reach out to us on Discord or join us on Twitter.