DEV Community: DataStax, an IBM company

Improve Your Python Search Relevancy with Astra DB Hybrid Search

Phil Nash — Wed, 30 Apr 2025 00:44:04 +0000

Astra DB now supports hybrid search, which can increase the accuracy of your search by up to 45%. It does this by performing both vector search and BM25 keyword search and then reranking the results from both to return the most relevant results.

In this post, we'll take a look at how to use Astra DB Hybrid Search in Python.

What is hybrid search?

Before we get to the code, let's go over what hybrid search actually is and why it helps. You would typically build a retrieval-augmented generation (RAG) app by creating vector embeddings for your unstructured content and storing them in a database. Then, when a user makes a query, you turn the query into a vector embedding and use it to perform a similarity search to return relevant context that you can provide to a large language model (LLM) to generate an answer.

The more accurate and relevant your search results from your database are, the better your RAG application will be. With better context, there’s less opportunity for the LLM to return inaccurate or hallucinated responses.

To improve on the relevancy of this system, we need to focus on the search element. Vector search is great at understanding context and meaning, but it can miss results that would be returned from a keyword match. Meanwhile, keyword search can be restrictive as it doesn't understand context. Performing both searches gives us the best chance of returning the top results, but you then need to combine those results so you can pass them to an LLM. This is where reranking comes in.

Reranking is performed by another machine learning model, a cross-encoder, that scores relevance more accurately because it reads the original query and the document together to produce the score. You can't use a reranking model as the search itself, because that would require scoring every document in your database against the query every time. For a small subset of your data, however, this is achievable.

You can actually use a reranker to help improve vector search results, by returning more results than required, reranking to adjust the order, then returning the top results.
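As a rough illustration of that over-fetch-then-rerank pattern, here is a minimal sketch in plain Python. The overlap_score function is a toy stand-in for a real cross-encoder (it only counts shared words), and the candidate texts are made up for the example.

```python
def overlap_score(query, document):
    """Toy stand-in for a cross-encoder: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & document_words) / len(query_words)


def rerank(query, candidates, top_k, score=overlap_score):
    """Score every candidate against the query and keep the top_k best."""
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


# Pretend the initial search returned four candidates although we only need two.
candidates = [
    "wood-fired pizza and pasta",
    "fresh salads and grain bowls",
    "steak house with a salad bar",
    "seasonal salads made with local produce",
]

for document in rerank("fresh salads", candidates, top_k=2):
    print(document)
```

A real reranker replaces overlap_score with a model call, which is why it is only practical on a short candidate list.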

In hybrid search, we use reranking to rescore the combination of results from the vector and keyword searches and pick the top, most relevant results from the output.

Astra DB can now perform hybrid search by combining vector search and BM25 keyword search, then reranking using the NVIDIA NeMo Retriever reranking microservices (including the nvidia/llama-3.2-nv-rerankqa-1b-v2 reranking model). Let's take a look at how to use Astra DB hybrid search to improve search relevancy in your Python application.

Hybrid Search in Python with Astra DB

Let's start by creating a database in your DataStax account. While it’s provisioning, let's get our coding environment set up.

To use Hybrid Search in Python, you’ll need to install version 2 of astrapy as well as python-dotenv so that you can load environment variables from an .env file. Install the dependencies:

pip install "astrapy>=2.0,<3.0" python-dotenv

Create a file called .env, add your database API endpoint and access token, and choose a name for your collection.

ASTRA_DB_API_ENDPOINT=
ASTRA_DB_APPLICATION_TOKEN=
ASTRA_DB_COLLECTION_NAME=
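Since every script in this post reads these three values, it can help to fail early when one is blank. The following is a small optional sketch; the missing_settings helper is hypothetical and not part of astrapy or python-dotenv.

```python
import os

REQUIRED_SETTINGS = (
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_COLLECTION_NAME",
)


def missing_settings(env):
    """Return the names of required settings that are absent or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


# In a real script, call load_dotenv() before this check.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
```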

Creating a collection for hybrid search

Once the database is created, we'll need to create a collection to store our data in. We'll do this in code, because we want to create some settings that aren't yet available in the dashboard.

Create a file called create_collection.py and add this code:

import os
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
from dotenv import load_dotenv

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)

collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.DOT_PRODUCT)
    .set_vector_service(
        provider="nvidia",
        model_name="NV-Embed-QA",
    )
    .set_lexical(
        {
            "tokenizer": {"name": "standard", "args": {}},
            "filters": [
                {"name": "lowercase"},
                {"name": "stop"},
                {"name": "porterstem"},
                {"name": "asciifolding"},
            ],
        }
    )
)

collection = db.create_collection(
    os.environ["ASTRA_DB_COLLECTION_NAME"],
    definition=collection_definition,
)

In this code we create a definition for our collection and then create the collection. The definition includes details on how we want the collection to create vectors for our data as well as how it should treat the keyword search.

For vector search, we are using Astra Vectorize with the built-in NVIDIA NeMo Retriever NV-Embed-QA model to create vector embeddings on insert and search. The model creates vectors with 1024 dimensions, and we configure the collection to use the dot product to calculate similarity between vectors.
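To make the similarity metric concrete, here is a small sketch of the dot product on toy two-dimensional vectors. It assumes the vectors are normalized to unit length, in which case the dot product gives the same value as cosine similarity. The vectors are invented for illustration and are far smaller than real embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(vector, vector))
    return [x / length for x in vector]


def cosine(a, b):
    """Cosine similarity, which divides out the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors the two measures agree.
print(dot(a, b), cosine(a, b))
```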

For the keyword search, the default configuration performs exact keyword matching, but we can tune it with the lexical settings shown above. First, we define the tokenizer, which is how the collection breaks up the text into words. We'll use the standard tokenizer, which divides based on word boundaries and strips out punctuation. We then add filters, which transform the text to make it easier to match searches. In this case, we add four filters:

lowercase - converts all the text to lowercase
stop - removes English stop words
porterstem - applies the Porter Stemming algorithm for English, which translates different forms of words to a common stem, e.g. "search", "searches", and "searched" will all translate to the token "search"
asciifolding - translates characters into ASCII, that is it turns accented characters into an ASCII equivalent if it exists, e.g. "café" becomes "cafe"

Note that both the stop and porterstem filters are specific to English texts.
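To get a feel for what such an analyzer chain does to text, here is a rough approximation in plain Python. It is not Astra DB's implementation: the stop word list is a tiny sample, and crude_stem is a deliberately simplified suffix stripper rather than the real Porter algorithm.

```python
import re
import unicodedata

# A tiny sample stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "is"}


def ascii_fold(token):
    """Drop accents by decomposing characters and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def crude_stem(token):
    """Very rough stand-in for Porter stemming: strip a few common suffixes."""
    for suffix in ("es", "ed", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then apply the filters in the same order as the collection."""
    tokens = re.findall(r"\w+", text)                    # split on word boundaries
    tokens = [t.lower() for t in tokens]                 # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop
    tokens = [crude_stem(t) for t in tokens]             # porterstem (approximated)
    tokens = [ascii_fold(t) for t in tokens]             # asciifolding
    return tokens


print(analyze("The Caf\u00e9 searches and searched"))
```

The same chain is applied to queries, which is why a search for "searches" can match a document containing "searched".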

You can choose to include the filters that will work best for your data. There is more on the available filters and links to further information in the Astra DB documentation.

Now that we've created our collection, we can ingest some data to search against.

Indexing data for hybrid search

Save this list of made-up restaurant descriptions, which we'll use as our example data, as a JSON file called restaurants.json. Create a new file called ingest.py and add the following code:

import os
from astrapy import DataAPIClient
from dotenv import load_dotenv
import json

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
collection = db.get_collection(os.environ["ASTRA_DB_COLLECTION_NAME"])

with open("restaurants.json", "r") as file:
    restaurant_data = json.load(file)
    restaurants = [{"$hybrid": restaurant} for restaurant in restaurant_data]
    collection.insert_many(restaurants)

In this code we load the restaurant descriptions and then create each one as a document in Astra DB, passing in the description as the $hybrid property. Creating documents with the $hybrid property does two things.

It will use the NVIDIA NeMo Retriever embedding model that we configured when we created the collection to create vector embeddings of the content. This is the same as using Astra Vectorize to generate embeddings.

It will also index the text for the new BM25 keyword search.
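The list comprehension in ingest.py implies that restaurants.json holds a JSON array of description strings. As a sketch of that shape, the snippet below builds the documents from an inline sample instead of a file. The two descriptions are invented, and the to_hybrid_documents helper is hypothetical.

```python
import json

# Invented sample with the same shape the ingest script expects:
# a JSON array of plain description strings.
raw_json = """
[
  "Example Diner: classic breakfasts and bottomless coffee.",
  "Example Garden: seasonal salads and fresh juices."
]
"""


def to_hybrid_documents(descriptions):
    """Wrap each non-empty description string as a document with a $hybrid property."""
    documents = []
    for description in descriptions:
        if not isinstance(description, str) or not description.strip():
            raise ValueError(f"Expected a non-empty string, got {description!r}")
        documents.append({"$hybrid": description})
    return documents


documents = to_hybrid_documents(json.loads(raw_json))
print(documents[0])
```

Passing the resulting list to collection.insert_many matches what ingest.py does with the real file.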

Run the code with:

python ingest.py

Check your collection in the DataStax dashboard. You should find both $vectorize and $lexical properties on each document.

Performing a hybrid search

Having indexed using $hybrid, we can now perform vector and hybrid searches against this collection. Create a file called search.py and enter the following code:

import os
from astrapy import DataAPIClient
from dotenv import load_dotenv

load_dotenv()

client = DataAPIClient()
db = client.get_database(
    os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
collection = db.get_collection(os.environ["ASTRA_DB_COLLECTION_NAME"])

cursor = collection.find(
    sort={"$vectorize": "salads"},
    limit=5,
    projection={"_id": 0, "$vectorize": 1},
)

for document in cursor:
    print(document)

This will perform a vector search on the collection using Astra Vectorize when you run:

python search.py

When you run this search you will see five results. In position four is "The Green Leaf Eatery," which is the most "salads" sounding place on the list to me. Positions one and two do mention salads, and, because it is vector search and not keyword search, position three, "Fusion Flavors Bistro," doesn't mention salads at all.

Now, let's update the search to use Hybrid Search and perform reranking on the results. You will need to change the find method to the new find_and_rerank method and pass {"$hybrid": query} as your sort field. You can add other arguments too, like hybrid_limits, which sets the number of documents to retrieve from each inner query before reranking, and include_scores, which shows the various scores used to rank documents along the way.

cursor = collection.find_and_rerank(
    sort={"$hybrid": "salads"},
    limit=5,
    hybrid_limits=10,
    projection={"_id": 0, "$vectorize": 1},
    include_scores=True,
)

for result in cursor:
    print(result.document)
    print(result.scores)

Now you will see these results:

{'$vectorize': "The Green Leaf Eatery: A bright and airy vegetarian and vegan restaurant focusing on fresh, seasonal produce. Their innovative menu features creative plant-based dishes, from vibrant salads and grain bowls to hearty vegetable curries and decadent vegan desserts. It's a celebration of healthy and delicious eating."}
{'$rerank': -3.6972656, '$vector': 0.67285335, '$vectorRank': 4, '$bm25Rank': 2, '$rrf': 0.03175403}

{'$vectorize': 'The Bohemian Brew & Bites: A quirky and eclectic cafe offering a relaxed atmosphere and a diverse menu. Enjoy gourmet sandwiches on artisanal bread, creative salads with house-made dressings, and a selection of globally inspired small plates. Their extensive coffee and craft beer menu makes it the perfect spot for a casual bite or a leisurely hangout.'}
{'$rerank': -4.5507812, '$vector': 0.6813005, '$vectorRank': 2, '$bm25Rank': 3, '$rrf': 0.032002047}

{'$vectorize': 'The Olive Grove Mediterranean: Transport yourself to the sunny shores of the Mediterranean at this charming restaurant. Their menu features flavorful Greek and Turkish dishes, from grilled kebabs and savory spanakopita to creamy hummus and vibrant salads. Enjoy the fresh herbs, olive oil, and sun-drenched flavors.'}
{'$rerank': -5.1210938, '$vector': 0.68404347, '$vectorRank': 1, '$bm25Rank': 1, '$rrf': 0.032786883}

{'$vectorize': 'Fusion Flavors Bistro: A contemporary restaurant that creatively blends different culinary traditions. Expect unexpected and exciting flavor combinations, innovative presentations, and a menu that constantly evolves. This is a place for adventurous palates seeking a unique dining experience.'}
{'$rerank': -11.375, '$vector': 0.67336804, '$vectorRank': 3, '$bm25Rank': None, '$rrf': 0.015873017}

{'$vectorize': 'The Farmhouse Kitchen: A rustic and charming restaurant celebrating the bounty of the local farm. Their menu changes seasonally, featuring dishes made with the freshest ingredients sourced directly from nearby farms. Expect simple yet elegant preparations that highlight the natural flavors of the ingredients.'}
{'$rerank': -11.375, '$vector': 0.6582356, '$vectorRank': 7, '$bm25Rank': None, '$rrf': 0.014925373}

In this output, you can see the results and also the various scores that were used to rank them. "The Green Leaf Eatery" now ranks first on the list, having been ranked fourth by vector search and second by the BM25 search. The reranker lifted it up to first place.

There are other similar movements in the list. The fifth position holds a restaurant that was initially ranked seventh by the vector search and doesn't contain the search term "salads." Hybrid Search initially retrieves more results than we need, reranks them and then returns the most relevant, so this result was lifted into a position to be returned. Positions four and five also received the same rerank score, so they were ordered using one more calculated score, reciprocal rank fusion (RRF). RRF isn't great for reranking, but it is very quick, so it is useful for tie-breaks here.
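RRF is simple enough to compute by hand: each result earns 1 / (k + rank) from every search that returned it, and the contributions are summed. The sketch below applies that formula to the ranks printed in the output. The constant k = 60 is an assumption. It is a commonly used value and it reproduces the $rrf numbers shown above to roughly six decimal places, but this post does not state which constant Astra DB uses.

```python
def rrf_score(ranks, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over the searches that ranked the result.

    A rank of None means that search did not return the result at all.
    k=60 is an assumed constant, chosen because it matches the printed $rrf values.
    """
    return sum(1.0 / (k + rank) for rank in ranks if rank is not None)


# (name, $vectorRank, $bm25Rank) taken from the output above.
observed = [
    ("The Green Leaf Eatery", 4, 2),
    ("The Bohemian Brew & Bites", 2, 3),
    ("The Olive Grove Mediterranean", 1, 1),
    ("Fusion Flavors Bistro", 3, None),
    ("The Farmhouse Kitchen", 7, None),
]

for name, vector_rank, bm25_rank in observed:
    print(f"{name}: {rrf_score((vector_rank, bm25_rank)):.8f}")
```

Under this assumption, "Fusion Flavors Bistro" scores higher than "The Farmhouse Kitchen", which is consistent with the order in which the two tied results were returned.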

Try running vector and hybrid searches with other search terms to get a feel for the results. In our testing, we’ve seen Hybrid Search improve relevance by up to 45%.

Next, we'll take a look at a couple of other things you will need to consider when using Hybrid Search.

Providing your own vectors

The example above used Astra Vectorize to automatically create vector embeddings, but you can always use a different model and provide your own vectors.

If you do use your own vector embedding model, then you will need to provide both the vector and the text that will be indexed for keyword search. You can do this with the special property $lexical.

Imagine you have a function called create_embedding that creates a vector embedding. You might then ingest the data like this:

with open("restaurants.json", "r") as file:
    restaurant_data = json.load(file)
    restaurants = [
        {
            "$vector": create_embedding(restaurant),
            "$lexical": restaurant,
            "description": restaurant,
        }
        for restaurant in restaurant_data
    ]
    collection.insert_many(restaurants)

Now, when you perform a hybrid search, you need to provide a $vector with which to search. Also, the default property on which the content is reranked is $vectorize, so you need to tell the database which property to rerank on too.

You also need to set the query that you want to use to perform the reranking. It can be the same query that you use for the vector search and the keyword search, or something else. You can see more about using different searches below.

You can define the query with the rerank_query argument and the field on which to perform the reranking with the rerank_on argument. For example:

query = "salad"

cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {"$vector": create_embedding(query), "$lexical": query},
    },
    rerank_query=query,
    rerank_on="description",
    limit=5,
    hybrid_limits=10,
)

for result in cursor:
    print(result.document)
    print(result.scores)

Performing different searches

You can also use different terms to perform your initial searches. This is useful because BM25 keyword search acts as a filter on the query keywords.

In our Hybrid Search example above, only three restaurant descriptions mentioned "salads" so only three results had a $bm25Rank in the results.

That worked fine for our example, but when we're dealing with a RAG application, the search queries are often in natural language rather than keyword focused. We already set up our collection to use word stems and translate accented characters into ASCII. You may also want to perform keyword extraction on the user query, using something like NLTK, spaCy or KeyBERT, so you can then use the keywords for the lexical search. This would look like:

query = "I'm looking for a restaurant that serves the best salad"

cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": create_embedding(query),
            "$lexical": extract_keywords(query),
        },
    },
    rerank_query=query,
    rerank_on="description",
    limit=5,
    hybrid_limits=10,
)

for result in cursor:
    print(result.document)
    print(result.scores)

The above code will now perform the vector search with your own vector embedding model, keyword search using keywords extracted from the user query and then rerank the results based on the initial query.
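The extract_keywords function above is left undefined. As one possible starting point, here is a minimal sketch that drops a small hand-written list of filler words and returns the remaining terms as a single string. It is a hypothetical helper for illustration only. A library such as NLTK, spaCy or KeyBERT would do a more thorough job.

```python
import re

# A small, hand-picked list of filler words, for illustration only.
FILLER_WORDS = {
    "a", "an", "and", "are", "can", "find", "for", "i", "i'm", "is", "looking",
    "me", "my", "of", "please", "some", "that", "the", "to", "want", "where",
}


def extract_keywords(query):
    """Return the non-filler words of the query, in order, as one string."""
    tokens = [token.strip("'") for token in re.findall(r"[a-z0-9']+", query.lower())]
    keywords = []
    for token in tokens:
        if token and token not in FILLER_WORDS and token not in keywords:
            keywords.append(token)
    return " ".join(keywords)


print(extract_keywords("I'm looking for a restaurant that serves the best salad"))
```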

Try hybrid search for better search and RAG relevancy

Combining vector search with keyword search and a reranking model like NVIDIA NeMo Retriever nvidia/llama-3.2-nv-rerankqa-1b-v2 produces more relevant results, improving the output of your RAG application. You can get started with hybrid search and reranking in Astra DB today by signing up and using AstraPy or with Langflow.

If you want to chat more about improving retrieval accuracy, drop into the DataStax Devs Discord or drop me an email at phil.nash@datastax.com.

Build a RAG Chat App with Firebase Genkit and Astra DB

Phil Nash — Wed, 16 Apr 2025 04:17:10 +0000

Today we announced the release of a plugin for Firebase's Genkit framework for building generative AI applications. Genkit is a powerful framework that provides the primitives for building production-quality GenAI applications. From easy access to models, prompts, indexers, and retrievers, to more advanced features like flows, traces, and evals, its power lies in making it easy to do the right thing while building GenAI applications.

In this post, we'll take a look at how to use the Astra DB plugin for Genkit to build a retrieval-augmented generation application with Genkit.

Building a RAG application

Let's build a RAG application from scratch and see how straightforward it can be with Genkit and Astra DB. First, you'll need a Gemini API key, which you can get from Google AI Studio.

You’ll also need an Astra DB database to store your data and vectors; if you don't already have an account you can sign up for a free DataStax account.

Start by creating a new Astra DB database; give it a name and choose a cloud and region. This takes a couple of minutes, so carry on with the next steps while it starts up.

Setting up the app

Create a directory for your app and install the dependencies you'll need:

mkdir genkit-astra-db-rag
cd genkit-astra-db-rag
npm init --yes
npm install genkit @genkit-ai/googleai genkitx-astra-db
npm install genkit-cli tsx -D

Create a file to work in:

touch index.ts

Open index.ts and import the dependencies you installed:

import { z, genkit, Document } from "genkit";
import { textEmbedding004, googleAI, gemini20Flash } from "@genkit-ai/googleai";
import {
  astraDBIndexerRef,
  astraDBRetrieverRef,
  astraDB,
} from "genkitx-astra-db";

In this case, we're pulling in Google's text-embedding-004 model for creating vector embeddings, and the Gemini 2.0 Flash model for generation.

It's about time to create a collection in which we can store our vectors. Hopefully your database has been created now, so head to the DataStax dashboard, choose your database, open the Data Explorer, and create a collection. Give the collection a name and choose "Bring my own" for the embedding generation method. The text-embedding-004 model creates vectors with 768 dimensions (though you can choose fewer), so enter 768 for the number of dimensions and choose "Cosine" for the similarity metric.

Once you've created the collection, you'll need the API endpoint of the database and the collection name, and you'll need to generate an API token.

With those, create a .env file in your application and enter the credentials:

ASTRA_DB_API_ENDPOINT=""
ASTRA_DB_APPLICATION_TOKEN=""
ASTRA_DB_COLLECTION_NAME=""

In the same .env file, enter your API key from AI Studio:

GEMINI_API_KEY=""

Now we can configure Genkit. In index.ts create the ai object like so:

const collectionName = process.env.ASTRA_DB_COLLECTION_NAME!;

const ai = genkit({
  plugins: [
    googleAI(),
    astraDB([
      {
        clientParams: {
          applicationToken: process.env.ASTRA_DB_APPLICATION_TOKEN!,
          apiEndpoint: process.env.ASTRA_DB_API_ENDPOINT!,
        },
        collectionName: collectionName,
        embedder: textEmbedding004,
      },
    ]),
  ],
});

This sets up Genkit with the Google AI plugin for models and embeddings and the Astra DB plugin, configured with the credentials to access the collection you just created and the vector embedding model text-embedding-004.

We can now access the Astra DB indexer and retriever via the reference functions:

export const astraDBIndexer = astraDBIndexerRef({ collectionName });
export const astraDBRetriever = astraDBRetrieverRef({ collectionName });

The indexer is used to store documents in the collection and the retriever is used to perform vector search to return documents from the collection.

Ingesting data

Now we can ingest some data into Astra DB. For this RAG application, let's grab data from the web. To ingest web data, we'll need to fetch it from a URL and then extract the main content from the returned HTML. I've written before about how I like to use Readability.js to parse out the content from a page, so we'll follow that. We'll also need something to turn the content into chunks. Let's use llm-chunk for this as it's relatively simple.

Install the dependencies:

npm install @mozilla/readability jsdom llm-chunk

Import them at the top of the script:

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
import { chunk } from "llm-chunk";

Write a function that takes a URL, fetches the HTML content, extracts the content and returns it.

async function fetchTextFromWeb(url: string) {
  const html = await fetch(url).then((res) => res.text());
  const doc = new JSDOM(html, { url });
  const reader = new Readability(doc.window.document);
  const article = reader.parse();
  return article?.textContent || "";
}

The next thing to do is write our first Genkit flow to ingest data from a URL into the collection. Flows are functions that you can run via the Genkit UI or through code. Flows have strongly typed input and output schemas, defined using Zod.

For this flow we'll accept a string which is a URL. There's no need for an output as the function will just end when it completes successfully.

export const indexWebPage = ai.defineFlow(
  {
    name: "indexPage",
    inputSchema: z.string().url().describe("URL"),
    outputSchema: z.void(),
  },
  async (url: string) => {
    const text = await ai.run("extract-text", () => fetchTextFromWeb(url));

    const chunks = await ai.run("chunk-it", async () =>
      chunk(text, { minLength: 128, maxLength: 1024, overlap: 128 })
    );

    const documents = chunks.map((text) => {
      return Document.fromText(text, { url });
    });

    return await ai.index({
      indexer: astraDBIndexer,
      documents,
    });
  }
);

The ingestion pipeline is nice and easy to read as a flow. And using ai.run around the non-Genkit functions provides an extra level of tracing that we'll be able to see later.

The Genkit UI

This seems like a good time to test out what we've built so far. Open package.json and add a script to run your application code and one to start the Genkit server.

"scripts": {
  "start": "tsx --env-file .env ./index.ts",
  "genkit": "genkit start -- npm start"
},

Now you can run npm run genkit and open the Genkit UI in your browser at localhost:4000. You can either find your flow on the dashboard or by clicking on Flows in the sidebar and then selecting it from the list.

This gives you a box to add some input. The input is the schema that we set up as the parameters to the flow. In this case, it just expects a string that’s a URL.

Enter a URL and run the flow. Once it's complete, you can open the DataStax dashboard and see the chunks and their vectors stored in the collection.

Back in the Genkit UI you can click on View trace and you’ll be shown each of the steps the flow took to fetch, chunk, embed and store the data.

Head back to the Genkit dashboard and open Retrievers from the sidebar. All we did to define the available retriever was set up the Astra DB plugin and export the astraDBRetrieverRef.

We can already use that retriever from the Genkit UI. Click on the retriever and enter the following in the input:

{
    "content": [
        {
            "text": "some search term"
        }
    ],
    "metadata": {}
}

In the options, change the property k to 5. Run the retriever and it will perform a vector search using the text you provide in the input and return five results from the database.

We can now hook this up with a full RAG flow, in which we first retrieve context from the database and then pass it to a model to generate a response. Open the code again and define another flow:

export const ragFlow = ai.defineFlow(
  { name: "rag", inputSchema: z.string(), outputSchema: z.string() },
  async (input: string) => {
    const docs = await ai.retrieve({
      retriever: astraDBRetriever,
      query: input,
      options: { k: 3 },
    });

    const { text } = await ai.generate({
      model: gemini20Flash,
      prompt: `
You are a helpful AI assistant that can answer questions.

Use only the context provided to answer the question.
If you don't know, do not make up an answer.

Question: ${input}`,
      docs,
    });

    return text;
  }
);

Here we use the retriever to search for the string input, and then pass the resulting documents as part of a prompt to the generate function that uses the Gemini 2.0 Flash model to perform the generation.

Restart the Genkit server, open up the Flows section and choose your RAG flow. You can now input a question (make sure it's relevant to the data you indexed) and Gemini will generate a relevant response based on the docs.

Once again, you can hit the View trace button to see what happened at each stage in this request.

We've only used these flows in the Genkit interface so far, but you can also run either of them from code like this:

await indexWebPage.run("URL");

Genkit and Astra DB make RAG easy

It took us fewer than 100 lines of code to build the two major flows required for RAG: ingestion and generation. Firebase Genkit made it easy to test our implementation as we went—without us having to build a UI for it. And the tracing in Genkit means it's easier to track down bugs in your flows.

Astra DB is an easy to use and powerful vector database, and it's even easier to use when all you need to do is configure the plugin in Genkit and reference indexers and retrievers.

You can find the code for this app on GitHub. The Astra DB plugin for Genkit is open source so if you have any issues or requests, please open an issue on the GitHub repo. And check out the Genkit docs for more on what you can build with Genkit.

Frequently Asked Questions (FAQ)

What is Astra DB?

Astra DB is a cloud-based NoSQL document store. It features an accurate and performant vector index for storing vectors which can be used for similarity searches. It comes with a Genkit plugin for integration with the Firebase Genkit framework.

What is Genkit?

Genkit is a framework for building generative AI applications. It provides essential tools such as models, prompts, indexers, retrievers, flows, traces, and evaluations. By using Genkit, developers can efficiently create applications that leverage the power of generative AI.

How do I get started with building a RAG application using Genkit and Astra DB?

To build a RAG application with Genkit and Astra DB, you need to create a database and collection within Astra DB and then install Genkit and related dependencies into your Node.js application. Once you've configured Genkit with your Astra DB credentials, you can start creating flows.

What does building the RAG application involve?

Building the RAG application involves creating a collection in Astra DB to store vectors, and setting up flows in Genkit to ingest data, and to generate responses based on context retrieved from the database. You can test these flows out using the Genkit UI.

How to Create Vector Embeddings in Python

Phil Nash — Wed, 09 Apr 2025 01:26:09 +0000

When you’re building a retrieval-augmented generation (RAG) app, the first thing you need to do is prepare your data. You need to:

collect your unstructured data
split it into chunks
turn those chunks into vector embeddings
store the embeddings in a vector database
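The chunking step is the one that most often needs custom code. As a minimal sketch, the function below splits text into fixed-size character windows that overlap, so that a sentence cut at a boundary still appears whole in one of the neighboring chunks. The split_into_chunks name and its default sizes are assumptions for illustration. Real pipelines usually split on sentences or tokens instead of raw characters.

```python
def split_into_chunks(text, max_chars=200, overlap=50):
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters.
    """
    if max_chars <= 0 or overlap < 0 or overlap >= max_chars:
        raise ValueError("Need max_chars > 0 and 0 <= overlap < max_chars")

    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(split_into_chunks("abcdefghij", max_chars=4, overlap=1))
```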

There are many ways that you can create vector embeddings in Python. In this post, we’ll take a look at four ways to generate vector embeddings: locally, via API, via a framework, and with Astra DB's Vectorize.

Local vector embeddings

There are many pre-trained embedding models available on Hugging Face that you can use to create vector embeddings. Sentence Transformers (SBERT) is a library that makes it easy to use these models for vector embedding, as well as cross-encoding for reranking. It even has tools for fine-tuning models, if that’s something that might be of use.

You can install the library with:

pip install sentence_transformers

A popular local model for vector embedding is all-MiniLM-L6-v2. It’s trained as a good all-rounder that produces a 384-dimension vector from a chunk of text.

To use it, import sentence_transformers and create a model using the identifier from Hugging Face, in this case "all-MiniLM-L6-v2". If you want to use a model that isn't in the sentence-transformers project, like the multilingual BGE-M3, include the organization in the identifier, for example "BAAI/bge-m3". Once you've loaded the model, use the encode method to create the vector embedding. The full code looks like this:

from sentence_transformers import SentenceTransformer


model = SentenceTransformer("all-MiniLM-L6-v2")
sentence = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
embedding = model.encode(sentence)

print(embedding)
# => [ 1.95171311e-03  1.51085425e-02  3.36140348e-03  2.48030387e-02 ... ]

If you pass an array of texts to the model, they’ll all be encoded:

from sentence_transformers import SentenceTransformer


model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
    "A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.",
    "A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.",
]
embeddings = model.encode(sentences)

print(embeddings)
# => [[ 0.00195174  0.01510859  0.00336139 ...  0.07971715  0.09885529  -0.01855042]
# [-0.04523939 -0.00046248  0.02036596 ...  0.08779042  0.04936493  -0.06218244]
# [-0.05453169  0.01125113 -0.00680178 ...  0.06443197  0.08771271  -0.00063468]]

There are many more models you can use to generate vector embeddings with the sentence-transformers library and, because you’re running locally, you can try them out to see which is most appropriate for your data. You do need to watch out for any restrictions that these models might have. For example, the all-MiniLM-L6-v2 model doesn’t produce good results for more than 128 tokens and can only handle a maximum of 256 tokens. BGE-M3, on the other hand, can encode up to 8,192 tokens. However, the BGE-M3 model is a couple of gigabytes in size and all-MiniLM-L6-v2 is under 100MB, so there are space and memory constraints to consider, too.

Local embedding models like this are useful when you’re experimenting on your laptop, or if you have hardware that PyTorch can use to speed up the encoding process. It’s a good way to get comfortable running different models and seeing how they interact with your data.

If you don't want to run your models locally, there are plenty of available APIs you can use to create embeddings for your documents.

APIs

There are several services that make embedding models available as APIs. These include LLM providers like OpenAI, Google, or Cohere, as well as specialist providers like Jina AI or model hosts like Fireworks.

These API providers offer HTTP APIs, often with a Python package to make it easy to call them. You will typically require an API key from the service. Once you have that set up, you can generate vector embeddings by sending your text to the API.

For example, with Google's google-genai SDK and a Gemini API key you can generate a vector embedding with their experimental Gemini embedding model like this:

from google import genai


client = genai.Client(api_key="GEMINI_API_KEY")

result = client.models.embed_content(
    model="gemini-embedding-exp-03-07",
    contents="A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
)

print(result.embeddings)

Each API can be different, though many providers do make OpenAI-compatible APIs. However, each time you try a new provider you might find you have a new API to learn. Unless, of course, you try one of the available frameworks that are intended to simplify this.

Frameworks

There are several projects available, like LangChain or LlamaIndex, that create abstractions over the common components of the GenAI ecosystem, including embeddings.

Both LangChain and LlamaIndex have methods for creating vector embeddings via APIs or local models, all with the same interface. For example, you can create the same Gemini embedding as the code snippet above with LangChain like this:

from langchain_google_genai import GoogleGenerativeAIEmbeddings


embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-exp-03-07",
    google_api_key="GEMINI_API_KEY"
)
result = embeddings.embed_query("A robot may not injure a human being or, through inaction, allow a human being to come to harm.")
print(result)

As a comparison, here is how you would generate an embedding using an OpenAI embeddings model and LangChain:

from langchain_openai import OpenAIEmbeddings


embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key="OPENAI_API_KEY"
)
result = embeddings.embed_query("A robot may not injure a human being or, through inaction, allow a human being to come to harm.")
print(result)

We had to change the name of the import and the API key we used, but otherwise the code is identical. This makes it easy to swap them out and experiment.
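The reason the swap is so easy is that both classes expose the same two methods, embed_query and embed_documents. The sketch below illustrates that pattern without calling any provider. ToyHashEmbedder and embed_all are hypothetical names, and the vectors are derived from a hash, so they carry no meaning and only stand in for a real model.

```python
import hashlib
from typing import Protocol


class Embedder(Protocol):
    """The shape shared by the embedding classes used above."""

    def embed_query(self, text: str) -> list[float]: ...

    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...


class ToyHashEmbedder:
    """Hypothetical stand-in for a real provider; NOT a meaningful embedding."""

    def __init__(self, dimensions: int = 8) -> None:
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_query(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dimensions` bytes (0..255) into the range [-1.0, 1.0].
        return [(byte / 127.5) - 1.0 for byte in digest[: self.dimensions]]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(text) for text in texts]


def embed_all(embedder: Embedder, texts: list[str]) -> list[list[float]]:
    # Depends only on the shared interface, so any provider can be passed in.
    return embedder.embed_documents(texts)
```

Code written against the interface, like embed_all here, does not need to change when you try a different model.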

If you're using LangChain to build your entire RAG pipeline, these embeddings fit in well with the vector database interfaces. You can provide an embedding model to the database object and LangChain handles generating the embeddings as you insert documents or perform queries. For example, here's how you can combine the Google embeddings model with the LangChain wrapper for Astra DB.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_astradb import AstraDBVectorStore


embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-exp-03-07",
    google_api_key="GEMINI_API_KEY"
)

vector_store = AstraDBVectorStore(
   collection_name="astra_vector_langchain",
   embedding=embeddings,
   api_endpoint="ASTRA_DB_API_ENDPOINT",
   token="ASTRA_DB_APPLICATION_TOKEN"
)

vector_store.add_documents(documents) # a list of document objects to store in the db

You can use the same vector_store object and associated embeddings to perform the vector search, too.

results = vector_store.similarity_search("Are robots allowed to protect themselves?")

LlamaIndex has a similar set of abstractions that enable you to combine different embedding models and vector stores. Check out this LlamaIndex introduction to RAG to learn more.

If you're new to embeddings, LangChain has a handy list of embedding models and providers that can help you find different options to try.

Directly in the database

The methods we’ve talked through so far have involved creating a vector as a separate step from storing it in, or searching against, a vector database. When you want to store those vectors in a vector database like Astra DB, it looks a bit like this:

from astrapy import DataAPIClient


client = DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
database = client.get_database("ASTRA_DB_API_ENDPOINT")
collection = database.get_collection("COLLECTION_NAME")

result = collection.insert_one(
    {
         "text": "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
         "$vector": [0.04574034, 0.038084425, -0.00916391, ...]
    }
)

The above assumes that you have already created your vector-enabled collection with the right number of dimensions for the model you’re using.
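Because the collection's dimension is fixed when it is created, a cheap check before inserting can surface a mismatched model early. This is a minimal sketch with a hypothetical helper name, check_dimension. The expected size is whatever your collection was created with, for example 384 for all-MiniLM-L6-v2.

```python
def check_dimension(vector: list[float], expected: int) -> list[float]:
    """Return the vector unchanged, or raise if its length is not `expected`.

    Hypothetical helper for illustration; it is not part of astrapy.
    """
    if len(vector) != expected:
        raise ValueError(
            f"expected a vector with {expected} dimensions, got {len(vector)}"
        )
    return vector
```

You could call this on each embedding just before building the document you pass to insert_one.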

Performing a vector search then looks like this:

cursor = collection.find(
    {},
    sort={"$vector": [0.04574034, 0.038084425, -0.00916391, ...]}
)

for document in cursor:
    print(document)

In these examples, you have to create your vectors first, before storing or searching against the database with them. In the case of the frameworks, you might not see this happen, as it has been abstracted away, but the operations are being performed.
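To make the search step less of a black box, here is a brute-force sketch of what sorting by a vector means conceptually, assuming cosine similarity as the metric. The function names are hypothetical, and a vector database typically uses an index rather than scanning every document as this does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], documents: list[dict], k: int = 5) -> list[dict]:
    """Return the k documents whose "$vector" is most similar to the query."""
    scored = [(cosine_similarity(query, doc["$vector"]), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

The query vector and the stored vectors must come from the same embedding model, otherwise the similarity scores are not comparable.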

With Astra DB, you can have the database generate the vector embeddings for you as you either insert the document into the collection or at the point of performing the search. This is called Astra Vectorize and it simplifies a crucial step in your RAG pipeline.

To use Vectorize, you first need to set up an embedding provider integration. There’s one built-in integration that you can use with no extra work: the NVIDIA NV-Embed-QA model. Alternatively, you can choose one of the other embedding providers and configure it with your API key.

When you create a collection, you can choose which embedding provider you want to use with the requisite number of dimensions.

When you set up your collection this way you can add content and have it automatically vectorized by using the special property $vectorize.

result = collection.insert_one(
    {
         "$vectorize": "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
    }
)

Then, when a user query comes in, you can perform a vector search by sorting using the $vectorize property. Astra DB will create the vector embedding and then make the search in one step.

cursor = collection.find(
    {},
    sort={"$vectorize": "Are robots allowed to protect themselves?"},
    limit=5
)

There are several advantages to this approach:

The Astra DB team has done the work to make the embedding creation robust already
Making two separate API calls to create embeddings and then store them is often slower than letting Astra DB handle it
Using the built-in NVIDIA embeddings model is even quicker than that
You have less code to write and maintain

A world of vector embedding options

As we have seen, there are many choices you can make in how to implement vector embeddings, which model you use, and which provider you use. It's an important step in your RAG pipeline, so it's worth spending the time to find out which model and method is right for your application and your data.

You can choose to host your own models, rely on third-party APIs, abstract the problem away through frameworks, or entrust Astra DB to create embeddings for you. Of course, if you want to avoid code entirely, then you can drag-and-drop your components into place with Langflow.

If you want to chat more about vector embeddings and RAG, drop into the DataStax Devs Discord or drop me an email at phil.nash@datastax.com.

Frequently asked questions

What are vector embeddings?

Vector embeddings are numerical representations of text in multi-dimensional space used for tasks like document retrieval and recommendation systems.

What steps are involved in creating vector embeddings for a retrieval-augmented generation (RAG) app?

To create vector embeddings, you need to:

Collect unstructured data
Split data into chunks
Turn chunks into vector embeddings
Store embeddings in a vector database

How can I create vector embeddings locally in Python?

You can create vector embeddings locally in Python using pre-trained embedding models from HuggingFace with the sentence-transformers library.

What are some limitations of local embedding models?

Local embedding models handle a limited number of tokens effectively, and larger models require substantial memory and storage.

How can I create vector embeddings using an API?

You can create vector embeddings using APIs provided by services such as OpenAI, Google, and Cohere.

Are there frameworks to simplify embedding creation?

Yes, frameworks like LangChain and LlamaIndex offer standardized interfaces that abstract the complexities of embedding models and APIs.

What is Astra Vectorize, and how does it simplify the embedding process?

Astra Vectorize enables Astra DB to automatically generate vector embeddings as documents are inserted or queries are performed.

What are the advantages of using Astra Vectorize?

The advantages include simplified code maintenance, faster performance, improved efficiency, and robustness through pre-tested integrations.

How to Create Vector Embeddings in Node.js

Phil Nash — Thu, 03 Apr 2025 21:43:17 +0000

When you’re building a retrieval-augmented generation (RAG) app, job number one is preparing your data. You’ll need to take your unstructured data and split it up into chunks, turn those chunks into vector embeddings, and finally, store the embeddings in a vector database.

There are many ways that you can create vector embeddings in JavaScript. In this post, we’ll investigate four ways to generate vector embeddings in Node.js: locally, via API, via a framework, and with Astra DB's Vectorize.

Local vector embeddings

There are lots of open-source models available on HuggingFace that can be used to create vector embeddings. Transformers.js is a module that lets you use machine learning models in JavaScript, both in the browser and Node.js. It uses the ONNX runtime to achieve this; it works with models that have published ONNX weights, of which there are plenty. Some of those models we can use to create vector embeddings.

You can install the module with

npm install @xenova/transformers

The package can actually perform many tasks, but feature extraction is what you want for generating vector embeddings.

A popular, local model for vector embedding is all-MiniLM-L6-v2. It’s trained as a good all-rounder and produces a 384-dimension vector from a chunk of text.

To use it, import the pipeline function from Transformers.js and create an extractor that will perform "feature-extraction" using your provided model. You can then pass a chunk of text to the extractor and it will return a tensor object which you can turn into a plain JavaScript array of numbers.

All in all, it looks like this:

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2"
);

const response = await extractor(
  ["A robot may not injure a human being or, through inaction, allow a human being to come to harm."],
  { pooling: "mean", normalize: true }
);

console.log(Array.from(response.data));
// => [-0.004044221248477697,  0.026746056973934174,   0.0071970801800489426, ... ]
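The pooling and normalize options control how the model's per-token vectors become a single embedding: mean pooling averages the token vectors, and normalization scales the result to unit length. The sketch below shows that arithmetic in plain Python, purely as a dependency-free illustration. The function names and input values are made up, and it ignores padding tokens, which the library takes into account when it pools.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average a list of equal-length token vectors, dimension by dimension."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)  # a zero vector cannot be scaled to unit length
    return [x / norm for x in vector]
```

With unit-length vectors, cosine similarity and dot product give the same ranking, which is one reason normalized embeddings are convenient for search.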

You can actually embed multiple texts at a time if you pass an array to the extractor. Then you can call tolist on the response and that will return you a list of arrays as your vectors.

const response = await extractor(
  [
    "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
    "A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.",
    "A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.",
  ],
  { pooling: "mean", normalize: true }
);

console.log(response.tolist());
// [
//   [ -0.006129210349172354,  0.016346964985132217,   0.009711502119898796, ...],
//   [-0.053930871188640594,  -0.002175076398998499,   0.032391052693128586, ...],
//   [-0.05358131229877472,  0.021030642092227936, 0.0010665050940588117, ...]
// ]

There are many models you can use to create vector embeddings from text, and, because you’re running locally, you can try them out to see which works best for your data. You should pay attention to the length of text that these models can handle. For example, the all-MiniLM-L6-v2 model does not provide good results for more than 128 tokens and can handle a maximum of 256 tokens, so it’s useful for sentences or small paragraphs. If you have a bigger source of text data than that, you’ll need to split your data into appropriately sized chunks.

Local embedding models like this are useful if you’re experimenting on your own machine, or have the right hardware to run them efficiently when deployed. It's an easy way to get comfortable with different models and get a feel for how things work without having to sign up to a bunch of different API services.

Having said that, there are a lot of useful vector embedding models available as an API, so let's take a look at them next.

APIs

There is an abundance of services that provide embedding models as APIs. These include LLM providers, like OpenAI, Google or Cohere, as well as specialist providers like Voyage AI or Jina. Most providers have general purpose embedding models, but some provide models trained for specific datasets, like Voyage AI's finance, law and code optimised models.

These API providers provide HTTP APIs, often with an npm package to make it easy to call them. You’ll typically need an API key from the service and you can then generate embeddings by sending your text to the API.

For example, you can use Google's text embedding models through the Gemini API like this:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "text-embedding-004"});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const result = await model.embedContent(text);
console.log(result.embedding.values);
// => [0.04574034, 0.038084425, -0.00916391, ...]

Each API is different though, so while making a request to create embeddings is normally fairly straightforward, you’ll likely have to learn a new method for each API you want to call—unless of course, you try one of the available frameworks that are intended to simplify this.

Frameworks

There are many projects out there, like LangChain or LlamaIndex, that create abstractions over the various parts of the GenAI toolchain, including embeddings.

Both LangChain and LlamaIndex enable you to generate embeddings via APIs or local models, all with the same interface. For example, here’s how you can create the same embedding as above using the Gemini API and LangChain together:

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-004",
});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const embedding = await embeddings.embedQuery(text);
console.log(embedding);
// => [0.04574034, 0.038084425, -0.00916391, ...]

To compare, this is what it looks like to use the OpenAI embeddings model through LangChain:

import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-3-large",
});
const text = "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

const embedding = await embeddings.embedQuery(text);
console.log(embedding);
// => [0.009445431, -0.0073068426, -0.00814802, ...]

Aside from changing the name of the import and sometimes the options, the embedding models all have a consistent interface to make it easier to swap them out.

If you’re using LangChain to create your entire pipeline, these embedding interfaces work very well alongside the vector database interfaces. You can provide an embedding model to the database integration and LangChain handles generating the embeddings as you insert documents or perform vector searches. For example, here is how to embed some documents using Google's embeddings and store them in Astra DB via LangChain:

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import { AstraDBVectorStore } from "@langchain/community/vectorstores/astradb";

const embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: process.env.API_KEY,
  model: "text-embedding-004",
});

const vectorStore = await AstraDBVectorStore.fromDocuments(
  documents, // a list of document objects to put in the store
  embeddings, // the embeddings model
  astraConfig, // config to connect to Astra DB
);

When you provide the embeddings model to the database object, you can then use it to perform vector searches too.

const results = await vectorStore.similaritySearch("Are robots allowed to protect themselves?");

LlamaIndex allows for similar creation of embedding models and vector stores that use them. Check out the LlamaIndex documentation on RAG.

As a bonus, the lists of models that LangChain and LlamaIndex integrate are good examples of popular embedding models.

Directly in the database

So far, the methods above mostly involve creating a vector embedding independently of storing the embedding in a vector database. When you want to store those vectors in a vector database like Astra DB, it looks a bit like this:

import { DataAPIClient } from "@datastax/astra-db-ts";
const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT);
const collection = db.collection(process.env.ASTRA_DB_COLLECTION);

await collection.insertOne({
  text: "A robot may not injure a human being or, through inaction, allow a human being to come to harm.",
  $vector: [0.04574034, 0.038084425, -0.00916391, ...]
});

This assumes you have already created a vector enabled collection with the correct number of dimensions for the model you are using.

You can also search against the documents in your collection using a vector like this:

const cursor = collection.find({}, {
  sort: { $vector: [0.04574034, 0.038084425, -0.00916391, ...] },
  limit: 5,
});
const results = await cursor.toArray();

In this case, you have to create your vectors first, and then store or search against the database with them. Even in the case of the frameworks, that process happens, but it’s just abstracted away.

With Astra DB, you can have the database generate the embeddings for you as you’re inserting documents into a collection or as you perform a vector search against a collection.

This is called Astra DB vectorize; here's how it works.

First, set up an embedding provider integration. There is a built-in integration offering the NVIDIA NV-Embed-QA model, or you can choose one of the other providers and configure them with your own API key.

Then when you set up a collection, you can choose which embedding provider you want to use and set the correct number of dimensions.

Now, when you add a document to this collection, you can add the content using the special key $vectorize and a vector embedding will be created.

await collection.insertOne({
  $vectorize: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
});

When you want to perform a vector search against this collection, you can sort by the special $vectorize field and again, Astra DB will handle creating vector embeddings and then performing the search.

const cursor = collection.find({}, {
  sort: { $vectorize: "Are robots allowed to protect themselves?" },
  limit: 5,
});
const results = await cursor.toArray();

This has several advantages:

It's robust, as Astra DB handles the interaction with the embedding provider
It can be quicker than making two separate API calls to create embeddings and then store them
It's less code for you to write

Choose the method that works best for your application

There are many models, providers, and methods you can use to turn text into vector embeddings. Creating vector embeddings from your content is a vital part of the RAG pipeline and it does require some experimentation to get it right for your data.

You have the choice to host your own models, call on APIs, use a framework, or let Astra DB handle creating vector embeddings for you. And, if you want to avoid code altogether, you could choose to use Langflow's drag-and-drop interface to create your RAG pipeline.

Building a Weather App with a Raspberry Pi, Astra DB, and Langflow

Aaron Ploetz — Fri, 14 Mar 2025 19:11:35 +0000

To celebrate Pi Day this year, we thought it would be fun to build something with a Raspberry Pi that uses Astra DB and/or Langflow. Fortunately, I have just the project in mind: a weather application!

To that end, our goal will be to use the National Weather Service’s (NWS) data API. Essentially, we will call this API to get the most recent weather data, store it in Astra DB, and display it on a simple front-end.

Requirements

To build our project, we’re going to need a few things. First of all, our development environment will use the following:

Java 17
Spring Boot
Maven
Vaadin
An Astra DB account with an active database and Langflow instance.

And of course, we’ll also need a Raspberry Pi. For this project, we used a Cana Kit™ Raspberry Pi 5 Starter Kit PRO Turbine Black (4GB RAM / 128GB Micro SD).

The weather application

The weather application we will use can be found in this GitHub repository: https://github.com/aar0np/weather-app/tree/main

This application originally appeared in Chapter 8 of the book Code with Java 21. The original project was designed to work with DataStax Astra DB, using the CQL protocol. Our fork of it is a bit different, as it can refresh its data view from either the Astra DB Data API or from a Langflow API endpoint.

At its core, the project is a Java Spring Boot application, which has a Vaadin web front end, and also exposes two RESTful endpoints. The endpoints are there as much for testing as for functionality. One endpoint pulls the most recent update from the NWS API for a given weather station ID, and stores it in Astra DB. The other endpoint retrieves the most recent reading from Astra DB for a particular station and year/month combination.

RESTful examples

Pulling the latest reading from the NWS for the station KMSP (Minneapolis/St. Paul International Airport), and storing it in Astra DB:

curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp

This endpoint returns a response similar to the following:

{"stationId":"https://api.weather.gov/stations/KMSP","monthBucket":202503,"timestamp":"2025-03-07T22:53:00Z","readingIcon":"https://api.weather.gov/icons/land/day/few?size=medium","stationCoordinatesLatitude":-93.22,"stationCoordinatesLongitude":44.88,"temperatureCelsius":5.6,"windDirectionDegrees":310,"windSpeedKMH":20.52,"windGustKMH":0.0,"visibilityM":16090,"precipitationLastHour":0.0,"cloudCover":{"7620":"FEW","1830":"FEW"}}

This endpoint pulls the latest reading for a specific station and year/month combination:

curl -X GET http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp/month/202503

This endpoint returns a response similar to the following:

{"stationId":"kmsp","monthBucket":202503,"timestamp":"2025-03-07T22:53:00Z","readingIcon":"https://api.weather.gov/icons/land/day/few?size=medium","stationCoordinatesLatitude":-93.22,"stationCoordinatesLongitude":44.88,"temperatureCelsius":5.6,"windDirectionDegrees":310,"windSpeedKMH":0.0,"windGustKMH":0.0,"visibilityM":0,"precipitationLastHour":0.0,"cloudCover":{"7620":"FEW","1830":"FEW"}}

Note: The above RESTful GET call is also used to populate the web frontend.

The Astra DB Data API

First, create a new Astra DB database. We can also use an existing database, as long as we create the following:

Keyspace named: “weatherapp”
Non-vector collection named: “weather_data”

There are two primary controller methods that handle the above data calls. The first method is named putLatestAstraAPIData and handles the RESTful PUT call. First, it performs a GET call on the NWS API endpoint for the stationid that was passed in. It takes the payload and maps it to an Astra DB Data API Document named weatherDoc. Then, it saves weatherDoc in Astra DB (via the Data API). Finally, it maps the response to a WeatherReading object named currentReading, and returns it. The code for the putLatestAstraAPIData() method is shown below:

@PutMapping("/astradb/api/latest/station/{stationid}")
public ResponseEntity<WeatherReading> putLatestAstraAPIData(
             @PathVariable(value="stationid") String stationId) {

       LatestWeather response = restTemplate.getForObject(
                    "https://api.weather.gov/stations/" + stationId + 
                    "/observations/latest", LatestWeather.class);

       Document weatherDoc = mapLatestWeatherToDocument(response, stationId);

       // save weather reading
       collection.insertOne(weatherDoc);

       // build response
       WeatherReading currentReading =
                    mapLatestWeatherToWeatherReading(response);

       return ResponseEntity.ok(currentReading);
}

The other controller method is named getLatestAstraAPIData and it handles the RESTful GET call. This method takes the stationid and the monthBucket that were passed in, and uses the Data API to find any matching documents. As there might be multiple, the results are sorted in descending order by timestamp, and the top document is processed. This ensures that the latest document is mapped and returned.

The code for the getLatestAstraAPIData() method is shown below:

@GetMapping("/astradb/api/latest/station/{stationid}/month/{month}")
public ResponseEntity<WeatherReading> getLatestAstraAPIData(
             @PathVariable(value="stationid") String stationId,
             @PathVariable(value="month") int monthBucket) {

       Filter filters = Filters.and(eq("station_id", stationId),
                           eq("month_bucket", monthBucket));
       Sort sort = Sorts.descending("timestamp");
       FindOptions findOpts = new FindOptions().sort(sort);
       FindIterable<Document> weatherDocs = collection.find(filters, findOpts);
       List<Document> weatherDocsList = weatherDocs.all();

       if (weatherDocsList.size() > 0) {
              Document weatherTopDoc = weatherDocsList.get(0);
              WeatherReading currentReading =
                           mapDocumentToWeatherReading(weatherTopDoc);
              return ResponseEntity.ok(currentReading);
       }

       return ResponseEntity.ok(new WeatherReading());
}

The Langflow API

This entire process can also work through Langflow. Open up Langflow, create a new flow, and pick the “Simple Agent” template. This simple agent is all that we need.

A sample flow created by selecting the “Simple Agent” template in Langflow.

The agent is built with the URL “tool,” which allows the agent to call out to external web addresses, including APIs. To expose this agent, we simply need to click on the “API” tab and make note of the Langflow endpoint URL. We will add this URL as an environment variable with our application.

Inside our application, our call to Langflow is handled by a method named askAgent. Simply put, this method calls our Langflow API endpoint, maps the result, and returns it to the UI. The code for the askAgent() method can be seen below:

public WeatherReading askAgent (AgentRequest req) {

       String reqJSON = new Gson().toJson(req);
       HttpEntity<String> requestEntity =
                    new HttpEntity<>(reqJSON, langflowHeader);

       ResponseEntity<LangflowResponse> resp =
                    restTemplate.exchange(LANGFLOW_URL,
                    HttpMethod.POST,
                    requestEntity,
                    LangflowResponse.class);

       LangflowResponse lfResp = resp.getBody();
       LangflowOutput1[] outputs = lfResp.getOutputs();

       return mapLangflowResponseToWeatherReading(outputs);
}

The method inside our Vaadin UI code that calls the askAgent() method is named refreshLangflow() and is triggered by a button on the UI. It composes a message for our Langflow agent, sends it, and uses the data returned to refresh the UI. The code can be seen below:

private void refreshLangflow() {

       String message = "Please retrieve the latest weather data (including the weather icon url) in a text format using this endpoint: "
                    + "https://api.weather.gov/stations/" + stationId.getValue()
                    + "/observations/latest";
       latestWeather = controller.askAgent(new AgentRequest(message));

       refreshData(latestWeather);
}

After reviewing the code, we should now be ready to build and configure our hardware.

Raspberry Pi

First of all, we will need to assemble the Pi. Fortunately, Cana Kit has a great setup video that walks through the entire process.

Note: I prefer to use Cana Kits, because they come with everything that you need, such as a Micro HDMI cable, a heat sink with fan, and a Micro SD card.

Once the Pi is assembled and running, it will check for updates and reboot. When we get to the Raspberry Pi OS desktop, we will have a few things to install (using the Terminal application).

Java

For our application to run, we need a Java Virtual Machine (JVM). As we will also need to build our application locally, we’ll need a Java Development Kit (JDK) as well. In our case, our Pi had Java 17 installed, and this is sufficient for our purposes.

Note: The Raspberry Pi OS makes it difficult to install and configure newer versions of Java. Fortunately, our project compiles just fine with Java 17.

Maven

Maven is a build- and dependency-management tool for Java. Our project was built with Maven, so we will need to install it as well:

sudo apt install maven

Git

Our Pi also had Git installed. After creating a new SSH key and adding it to our GitHub account, we should be able to clone the project repository:

git clone git@github.com:aar0np/weather-app.git

This will download the code and create a local directory for our application, where we can build and run it.

Putting it all together

First, we will cd into our project directory and then build it with Maven:

cd weather-app
mvn clean install -Pproduction

Next, we need to define three environment variables:

export ASTRA_DB_API_ENDPOINT=https://not-real-us-east1.apps.astra.datastax.com
export ASTRA_DB_APP_TOKEN=AstraCS:wtqNOTglg:725REAL238dEITHER563486d
export ASTRA_LANGFLOW_URL=https://api.langflow.astra.datastax.com/lf/6f-not-real-9493/api/v1/run/060d2-not-real-caef?stream=false

We can now run our application. Maven will create a JAR file in the weather-app/target directory. If we locate this JAR file, we can run it like this:

java -jar target/weatherapp-0.0.1-SNAPSHOT.jar

A successful run should produce several log messages, the last of which should look similar to this:

2025-03-07T20:02:02.259-06:00  INFO 53787 --- [WeatherApp] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2025-03-07T20:02:02.278-06:00  INFO 53787 --- [WeatherApp] [           main] c.d.weatherapp.WeatherappApplication     : Started WeatherappApplication in 1.795 seconds (process running for 2.147)
2025-03-07T20:02:04.855-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-03-07T20:02:04.855-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2025-03-07T20:02:04.856-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 0 ms

From another terminal window/tab, let’s add some data to the DB:

curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp

Now, navigate to the local IP on port 8080: http://127.0.0.1:8080/

The Station ID is pre-populated with “kmsp” and the current year/month is auto-generated. Clicking either “Astra DB Refresh” or “Langflow Refresh” should produce something similar to this:

Note: The Astra DB Refresh will be faster than the Langflow Refresh.

Depending on the intended usage, it might be preferable to add a crontab entry for the PUT call to keep the data recent:

crontab -e
...
15 * * * * curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp

With that, we should have a working weather application running on a Raspberry Pi! And without having to mount any pesky sensors in the backyard. Want to try this yourself? Get started with Astra DB and Langflow today!

Happy Pi Day!

Top 3 Mistakes I Made While Building AI Agents

melienherrera — Wed, 12 Mar 2025 15:13:56 +0000

Agents have become a hot topic in AI, and, like many of you, I initially wondered:

“Why can’t I just prompt my LLM to do this task for me?”
“What’s the difference between prompting a model versus using an agent?”
“Oh no, another AI concept that I have to learn?”

After diving into agent development, I quickly realized why this approach has generated so much buzz. Unlike simple LLM prompts, agents can interact with external tools, maintain state across multiple steps, and execute complex workflows. Agents are like a personal assistant who can email contacts, write documentation, and schedule appointments – deciding which tool to use when, and understanding the right moment to apply it.

This journey wasn’t without challenges. Like many developers in their discovery phase, I made mistakes along the way while building my personal assistant app. Each misstep taught me valuable lessons that improved my approach to building agents.

In this post, I’ll share my top three mistakes, in hopes that by “building in public,” I can help you avoid these same pitfalls. Let’s dive in!

Mistake #1: Overestimating the agent’s capabilities

My first mistake when I started building agents was a simple but critical one. I learned that agents have agency: an ability to decide and reason where a base LLM does not. They can select tools, maintain context, and execute multi-step plans. Because of this, I drastically underestimated the importance of clear, detailed instructions in the agent’s system prompt, and overestimated the agent’s capability to figure things out on its own.

Newsflash: agents are still powered by LLMs! Agents use LLMs as their core reasoning and decision-making engine. This means that they have the same strengths and the same limitations as their underlying language model.

I initially created vague prompts like “You are a helpful assistant that can email people, create docs, and other operational tasks. Be clear and concise and maintain a professional tone throughout.” I assumed that because the agent had access to email tools and documentation tools, it would intuitively understand when and how to use them appropriately. However, this was not the case and my prompt was simply not enough.

What I learned through trial and error is that while agents add powerful tool-using capabilities, they are not entirely “magic.” They still need the same level of clear guidance and explicit instructions that you’d provide in a direct LLM prompt – perhaps even more so. The agent needs to know not just that it has access to tools, but precisely when to invoke them, how to interpret their outputs, and how to integrate them back into the expected workflow. Here’s my improved prompt:

I fundamentally misunderstood how to effectively prompt an agent. Once I started writing detailed system prompts like the one above, providing examples where needed, and referencing tool names where I could, my agent’s performance improved dramatically.

Mistake #2: Overloading the agent with tools

My second critical mistake was attempting to create ONE “super agent” equipped with every possible tool needed for my personal assistant app. I understood the concept of agents and tools, so I began to connect a bunch of these tools to my agent. Remember, the actions I needed the agent to be able to do spanned across document processing, email communication, data retrieval, even basic chatbot capabilities. I thought this would essentially create a powerful, all-in-one assistant.

An example of overloading the agent with multiple tools.

I found out that my agent became overwhelmed with options and struggled to manage context across complex tasks and multi-step requests such as “Access this doc [doc link], summarize it, then draft an email”. The agent would take incorrect steps, forget steps in the process, or confuse its options, wavering between the Google Docs tool, the URL tool, and simply hallucinating a response from the LLM. Additionally, response times would increase dramatically depending on how complex the task was.

The solution to this came when I restructured my approach to use a multi-agent architecture with specialized components. I created agents with focused toolsets: a document agent, an email agent, and a RAG agent, and I plan to implement more! Connecting these is an orchestrator agent that essentially acts as the “decision-maker” of the app and routes tasks to the appropriate specialized agent based on the user’s request.

The orchestrator agent’s role is to understand the user’s intent, break complex requests into subtasks, and delegate them to the right agent. It was now able to handle requests such as the one above “Access this doc [doc link], summarize it, then draft an email” and break it down into something like this:

First, use GOOGLEDOCS_AGENT to access the doc link
Second, use LLM to summarize it, and form the content for the email draft
Third, use GMAIL_AGENT to create the actual email draft for the user to be able to review and easily send it off
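
The breakdown above can be sketched in plain Python. This is only an illustration of the routing idea, not how Langflow implements it: the three agent functions are stand-ins that return placeholder strings, the example URL is made up, and the plan is hard-coded where the real orchestrator would let its LLM decide the steps.

```python
from typing import Callable, Dict, List, Tuple

# Each specialised agent is modelled as a function from text to text.
Agent = Callable[[str], str]


def googledocs_agent(task: str) -> str:
    # Stand-in: a real agent would call the Google Docs API here.
    return f"[doc contents for: {task}]"


def llm_summarise(text: str) -> str:
    # Stand-in: a real implementation would call the LLM.
    return f"[summary of {text}]"


def gmail_agent(body: str) -> str:
    # Stand-in: a real agent would create a Gmail draft here.
    return f"[email draft containing {body}]"


AGENTS: Dict[str, Agent] = {
    "GOOGLEDOCS_AGENT": googledocs_agent,
    "LLM": llm_summarise,
    "GMAIL_AGENT": gmail_agent,
}


def run_plan(plan: List[str], request: str) -> Tuple[str, List[str]]:
    """Run each step in order, feeding one step's output into the next."""
    result, trace = request, []
    for agent_name in plan:
        result = AGENTS[agent_name](result)
        trace.append(agent_name)
    return result, trace


# Hard-coded plan for illustration; the orchestrator's LLM would produce this.
plan = ["GOOGLEDOCS_AGENT", "LLM", "GMAIL_AGENT"]
draft, trace = run_plan(plan, "https://docs.example.com/my-doc")
print(draft)
```

Keeping each agent behind a name in a registry means the orchestrator only has to choose between a handful of clearly scoped options, rather than every individual tool.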

This mistake taught me that complex AI workflows benefit from division of labor, just as humans do. Each agent should have a clearly defined scope with the right tools for the specific job. Just make sure you assign and describe those tools and jobs correctly.

Mistake #3: Poorly naming and describing tools

This leads me to my third and probably most critical mistake. After refining my agent prompts and implementing multi-agent architecture, I thought I was on the right track. But I quickly encountered another obstacle: my tools were not being used correctly, or in some cases, not at all. This was my third mistake in the agent development process: not properly naming or describing each tool for the agent.

When implementing tools in Langflow, I initially gave them generic names like “Email Tool” or “Docs Tool” with minimal descriptions. I assumed that since I had properly connected the APIs through Composio (a third-party app integration tool) and the functionality worked when tested individually, the agent would inherently understand how to use them. Though the agent did come through sometimes, it did not do so consistently.

I discovered that meaningful tool names and descriptions are critical for the agent’s decision-making process. For example, if the input is “Access this marketing doc and summarize it: [docs link]”, the agent has to match the intent to the appropriate tool. My original Google Docs tool implementation looked something like this:

Name: “Docs Agent”
Description: “Use this to create and access docs”

With vague names and descriptions, the agent struggled to make the correct decision consistently: it would fail to use the tool, or use it incorrectly. With the above description, for example, it would attempt to use the URL tool instead of accessing the doc through the Google Docs API, which has the proper permissions.

After recognizing the issue, I implemented more descriptive naming and description:

Name - “GOOGLEDOCS_AGENT”
Description - “A Google Docs tool with access to the following tools: creating new Google Docs (give relevant titles, context, etc), edit existing Google Docs, retrieve existing Google Docs via the Google Docs link”

The improvement was immediate and significant. With clearer naming conventions and more detailed and prescriptive descriptions, the agent began consistently selecting the right tools for each task. Tool descriptions are essentially API documentation for your agent. Through this mistake, I learned that an agent, just like an LLM, can only be as effective as the information you provide about its available tools.
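
To see why the wording matters, it helps to picture what the model actually receives. The sketch below is a hypothetical rendering of a tool list into prompt text; Langflow's real prompt format is not shown in this post, so the `Tool` class and `render_tool_catalog` function are assumptions made purely for illustration. The names and descriptions are the ones quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    name: str
    description: str


def render_tool_catalog(tools: List[Tool]) -> str:
    """Format tools the way they might be listed in an agent's prompt."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


vague = [Tool("Docs Agent", "Use this to create and access docs")]
specific = [
    Tool(
        "GOOGLEDOCS_AGENT",
        "A Google Docs tool with access to the following tools: creating new "
        "Google Docs (give relevant titles, context, etc), edit existing Google "
        "Docs, retrieve existing Google Docs via the Google Docs link",
    )
]

print(render_tool_catalog(vague))
print(render_tool_catalog(specific))
```

In the vague version there is nothing that ties a Google Docs link in the user's request to this tool; in the specific version that connection is spelled out in the text the model reads.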

Conclusion

After overcoming these three mistakes – underestimating the importance of prompting, overloading the agent with tools, and poorly implementing tools – I’ve gained valuable insights into effective agent development.

The most important lesson that I learned is something I feel that I’ve always known – our tools are only as powerful as we make them. Agents are powerful, and they can seem magical, but they aren’t. We still have to provide the right tools, detailed descriptions, and structured architecture in order for them to shine. Tools like Langflow have really helped me break down the concept of agents, fail fast, and iterate on my mistakes. It’s about finding the right balance between giving your agents enough information while avoiding overwhelming them with too many options or vague instructions.

For those who are embarking on their own agent-building journey, I hope you learned a thing or two from this post. The field of AI agents is still growing and evolving fast – what works well today may change tomorrow as models improve.

What mistakes have you encountered while getting started with agents? Please start a discussion in our Discord.

Frequently Asked Questions (FAQ)

What is an agent and how does it differ from a regular LLM prompt?

Unlike simple LLM prompts, agents can interact with external tools, maintain state across multiple steps, and execute complex workflows.

What are multi-agents?

“Multi-agents” use specialized AI agents to focus on specific tasks or domains. Instead of using one “super agent” connected to every possible tool, a multi-agent architecture uses dedicated agents for specific functions (like document processing, email management, or data retrieval).

What does this personal assistant app do?

This personal assistant uses AI agents to handle multiple tasks, multi-step tasks, and more, such as drafting emails, summarizing meeting notes, and retrieving knowledge from a database.

What is Langflow, and why use it?

Langflow is a visual IDE for building generative and agentic AI workflows. It simplifies creating complex AI flows, enables quick iteration, and integrates seamlessly with applications.

What tools are used?

Langflow – AI app development, agents
Astra DB – Vector database, data retrieval, RAG
Composio – Application integration platform for AI Agents and LLMs, handles Gmail and Google Doc API integrations

Where can I find the flow file?

At my GitHub: https://github.com/melienherrera/personal-assistant-langflow

Introducing Astra DB for AI Agents: A New Era of Database Interaction

Tejas Kumar — Wed, 12 Mar 2025 08:06:35 +0000

Today we're thrilled to unveil a new way of interacting with our flagship vector database, Astra DB. Say hello to Astra DB over MCP—an innovative way to communicate with your database that leverages the Model Context Protocol (MCP) to let you create and manage databases without writing a single line of code.

What is Model Context Protocol (MCP)?

MCP is an innovation first pioneered by Anthropic in late 2024. It’s a standardized protocol designed for sharing context between language models and tools. This means that any MCP server can communicate with any MCP client, enabling language models to execute functions agentically on your behalf. Imagine being able to hand off entire functions to an AI—MCP makes that possible.

For example, popular MCP clients include:

Claude Desktop
Cursor

Both can consume data from MCP servers and act agentically as a result. Let’s get hands on with Astra DB over MCP.

Hands-on with Astra DB over MCP

In our demo, we explore how Astra DB over MCP unlocks a new way of interacting with your data. Let’s walk through the process.

1. Set up your Astra DB environment

To get started, you need an Astra DB application token and an API endpoint. To get these, you’ll have to:

Sign up for Astra DB - It’s free and quick. Just sign up, create a database, and you’ll receive your API endpoint along with an application token. Here are more detailed instructions to do so.
Create your database - For this demo, we set up a vector database named “my_mcp_db.” Astra DB’s multi-cloud capability means you can choose your preferred region, and the database is ready within minutes.

2. Integrating with an MCP client

Once your Astra DB instance is ready, you can integrate it with an MCP client like Claude Desktop:

Configure Claude Desktop - Open the app, go to Preferences → Developer → Edit Config. This will take you to a JSON file. Paste the following JSON configuration snippet that includes your DB token and API endpoint.
Launch and verify - Restart Claude Desktop and watch as it connects to Astra DB—instantly revealing 10 available MCP tools.
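
As a sketch of what that configuration could look like, the Python below prints a JSON document in the `mcpServers` layout that Claude Desktop uses for MCP servers. It reuses the `npx` command and environment variable names from the Cursor setup later in this post; the server label `astra-db-mcp` and the helper function name are arbitrary choices for this example, and the credential values are placeholders.

```python
import json


def build_claude_desktop_config(token: str, api_endpoint: str) -> str:
    """Return JSON text for the Claude Desktop config file."""
    config = {
        "mcpServers": {
            # "astra-db-mcp" is just a label chosen for this sketch.
            "astra-db-mcp": {
                "command": "npx",
                "args": ["-y", "@datastax/astra-db-mcp"],
                "env": {
                    "ASTRA_DB_APPLICATION_TOKEN": token,
                    "ASTRA_DB_API_ENDPOINT": api_endpoint,
                },
            }
        }
    }
    return json.dumps(config, indent=2)


# Placeholders: substitute your own token and endpoint.
print(build_claude_desktop_config("your_astra_db_token", "your_astra_db_endpoint"))
```

The printed JSON can be pasted into the config file that Edit Config opens.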

From here, you can ask Claude to do anything you like inside your database: create collections, insert data, clean up, and more. This is a handy way of interacting with your database via an AI assistant, but we can do more when we use Cursor as our MCP client.
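
Under the hood, an MCP client talks to the server with JSON-RPC 2.0 messages: it asks for the available tools with `tools/list` and invokes one with `tools/call`. The sketch below only builds those messages to show their shape. The tool name `CreateCollection` and its argument are hypothetical and not taken from the Astra DB MCP server; a real client would use the names returned by `tools/list`.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # JSON-RPC requests carry an id to match responses


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialise a JSON-RPC 2.0 request."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools() -> str:
    """Ask the server which tools it offers."""
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Ask the server to run one tool with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


print(list_tools())
# Hypothetical tool name and argument, for illustration only.
print(call_tool("CreateCollection", {"collectionName": "notes"}))
```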

Building a full, end-to-end application with a UI, database, and API

The real magic happens when you use Astra DB over MCP in Cursor. To set this up:

Go to Settings -> Cursor Settings -> MCP
From there, you can add the server by clicking the "+ Add New MCP Server" button and entering the following values:

Name - Whatever you want
Type - Command
Command -

env ASTRA_DB_APPLICATION_TOKEN=your_astra_db_token ASTRA_DB_API_ENDPOINT=your_astra_db_endpoint npx -y @datastax/astra-db-mcp

Once added, your editor will be fully connected to your Astra DB database.

Now you can invoke the Cursor agent (by pressing Cmd+I on macOS) and ask it to build anything you want: whenever a database is needed, it will automatically operate Astra DB to do whatever is required.

In our demo in the video above, the language model agent executes a series of tasks: from setting up the collection to auto-generating Next.js route handlers and fixing UI issues on the fly. The result? A fully functional to-do list app powered entirely by Astra DB over MCP.

Why this matters

Astra DB over MCP demonstrates the incredible potential of combining any tool with AI agents. By enabling agentic interactions between tools and language models:

Developers can accelerate time to production without the overhead of boilerplate code.
Non-technical users can create applications that are normally reserved for seasoned programmers.
Innovation is democratized, letting you build everything from a Twitter clone to a YouTube replica with minimal effort.

What’s next?

We’re excited to see what you’ll build using this new mode of development. Whether you’re a developer, a startup founder, or a tech enthusiast, Astra DB over MCP opens up a world of possibilities. So, what will you create? Join the conversation on Discord, try out Astra DB over MCP, and let us know how you’re leveraging the power of agentic database interactions.

Frequently Asked Questions (FAQ)

1. What is Astra DB over MCP?

Astra DB over MCP is a new method of interacting with our flagship vector database, Astra DB, using the Model Context Protocol. It allows you to perform database operations through prompts with an AI agent—without writing any code.

2. What is the Model Context Protocol (MCP)?

MCP is an open standard, first pioneered by Anthropic in late 2024, that enables seamless communication between language models and external tools. It allows AI systems to share context and execute functions agentically on your behalf.

3. Do I need to write any code to interact with Astra DB over MCP?

No! One of the key benefits of this new integration is that you can perform complex database operations—such as creating collections, inserting data, and building entire applications—without writing a single line of code. MCP clients like Claude Desktop, Cursor, etc. manage all the interactions for you.

4. How do I get started with Astra DB over MCP?

Simply sign up for Astra DB, create your database to receive an API endpoint and an application token, and then configure your MCP client by updating its settings with these credentials.

5. What are some examples of MCP clients?

Popular MCP clients include Claude Desktop—a desktop application for interacting with AI models—and Cursor, an AI-enabled version of VS Code that integrates MCP tools directly into your development workflow.

6. Is Astra DB over MCP an open-source project?

Yes, Astra DB over MCP is an open-source project. You can access the code and contribute to its development via GitHub.

7. Where can I get help if I encounter issues?

You can refer to our detailed documentation, check out the GitHub repository for troubleshooting tips, or join our Discord community where fellow users and developers share advice and best practices.

5 GenAI Things You Didn't Know About Astra DB

Phil Nash — Thu, 06 Mar 2025 23:07:23 +0000

Astra DB is a high-performance NoSQL database powered by Apache Cassandra® with built-in vector search, but that's just what the product page says. Not everything fits onto one page, so I wanted to share a few things that you might not already know about Astra DB and how it helps you to build accurate, low-latency, retrieval-augmented generation (RAG) powered generative AI apps.

Astra DB can create vector embeddings for you

When ingesting data for a RAG application, there are several steps you need to take: document loading, text parsing, chunking text, creating vector embeddings, and storing it in the database. Astra DB can simplify the process by combining those last two steps.

Astra Vectorize can create vector embeddings for your text chunks at the point of inserting them into the collection.

When you create an Astra DB collection, you can choose one of the supported embedding models. There are models available from OpenAI (including Azure OpenAI), Voyage AI, Mistral AI, Jina AI, and Upstage. Astra DB also hosts NVIDIA embedding models that run in the same environment as the database, boosting performance—Wikidata reduced their data ingestion time from 30 days to two with Vectorize—and ensuring the data never leaves the database.

Once you have set up your collection with your embedding provider of choice, ingesting data with Vectorize is a case of providing the text you want turned into a vector as a special $vectorize property in the documents you are storing. In TypeScript, this looks like:

import { DataAPIClient } from "@datastax/astra-db-ts";
const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT);
const collection = db.collection(process.env.ASTRA_DB_COLLECTION);

await collection.insertOne({
  $vectorize: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."
});

Then to perform a vector search against this collection you use the $vectorize field to sort by your query.

const cursor = collection.find({}, {
  sort: { $vectorize: "Are robots allowed to protect themselves?" },
  limit: 5,
});
const results = await cursor.toArray();
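
For Python users, a roughly equivalent sketch is shown below, assuming the astrapy client (`pip install astrapy`) and the same three environment variables as the TypeScript example. The helper names are made up for this example. astrapy is imported inside `get_collection` so that the two helpers, which only rely on `insert_many` and `find`, can be read and tested on their own.

```python
import os
from typing import Any, Dict, Iterable, List


def get_collection():
    """Connect to the collection named in the environment (requires astrapy)."""
    from astrapy import DataAPIClient  # imported lazily; see note above

    client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
    db = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
    return db.get_collection(os.environ["ASTRA_DB_COLLECTION"])


def insert_texts(collection, texts: Iterable[str]):
    """Store each text; Astra Vectorize embeds the $vectorize value on insert."""
    return collection.insert_many([{"$vectorize": text} for text in texts])


def vector_search(collection, query: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Sort by similarity to the query text and return the top matches."""
    return list(collection.find({}, sort={"$vectorize": query}, limit=limit))


# Usage, once the environment variables are set:
# collection = get_collection()
# insert_texts(collection, ["A robot may not injure a human being..."])
# results = vector_search(collection, "Are robots allowed to protect themselves?")
```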

You can learn more about Astra Vectorize in the documentation.

Astra DB supports graph RAG

Depending on your data, regular vector search can sometimes miss context, which makes it harder for large language models (LLMs) to answer certain queries. Graph RAG is a technique that takes your documents, extracts links between them, and uses those links to retrieve extra contextual information at the retrieval stage. Providing extra linked context to an LLM makes for more accurate and informed answers.

Astra DB supports graph RAG via LangChain. You can replace the AstraDBVectorStore with AstraDBGraphVectorStore and ensure you ingest your data in a way that extracts the links between documents. A simplified ingestion example that reads a URL, extracts HTML links, strips the HTML, and splits the text into chunks before storing in Astra DB (using Astra Vectorize to create embeddings) might look like this:

import os

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.graph_vectorstores.extractors import (
    HtmlLinkExtractor,
    LinkExtractorTransformer
)
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain_astradb import AstraDBGraphVectorStore, CollectionVectorServiceOptions

vectorize_options = CollectionVectorServiceOptions(
    provider="nvidia",
    model_name="NV-Embed-QA",
)

vector_store = AstraDBGraphVectorStore(
    collection_name="graph",
    token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.environ.get("ASTRA_DB_API_ENDPOINT"),
    collection_vector_service_options=vectorize_options
)

urls = [
    "https://www.datastax.com/guides/graph-rag",
    "https://www.datastax.com/blog/build-graph-rag-with-unstructured-and-astra-db"
]
loader = AsyncHtmlLoader(urls)
docs = loader.load()

transformer = LinkExtractorTransformer([HtmlLinkExtractor().as_document_extractor()])
bs4_transformer = BeautifulSoupTransformer()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

docs = transformer.transform_documents(docs)
docs = bs4_transformer.transform_documents(docs)
chunks = text_splitter.split_documents(docs)

vector_store.add_documents(chunks)

Then to search Astra DB, you can use the graph store's traversal_search method to first retrieve a number of document chunks (k), before traversing the graph to the specified depth for additional chunks. In this case, we perform the search initially finding four chunks using a similarity search and then traversing the graph to a depth of two to return related chunks.

traversal_results = vector_store.traversal_search(
    query="What are the differences between Graph RAG and naive RAG?",
    k=4,
    depth=2,
)
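
To make the roles of k and depth concrete, here is a small, self-contained sketch of the idea in plain Python. It is a conceptual illustration only and not how AstraDBGraphVectorStore is implemented: the similarity scores and links are invented toy data, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def traversal_search_sketch(
    scores: Dict[str, float],
    links: Dict[str, List[str]],
    k: int,
    depth: int,
) -> List[str]:
    """Pick the k best-scoring chunks, then follow links up to `depth` hops."""
    seeds = sorted(scores, key=scores.get, reverse=True)[:k]
    seen: Set[str] = set(seeds)
    ordered = list(seeds)
    queue = deque((chunk, 0) for chunk in seeds)
    while queue:
        chunk, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(chunk, []):
            if neighbour not in seen:
                seen.add(neighbour)
                ordered.append(neighbour)
                queue.append((neighbour, hops + 1))
    return ordered


# Toy data: similarity of each chunk to the query, and links between chunks.
scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05, "e": 0.01}
links = {"a": ["c"], "c": ["d"], "d": ["e"]}
print(traversal_search_sketch(scores, links, k=2, depth=2))  # ['a', 'b', 'c', 'd']
```

Chunks c and d score poorly on similarity alone but are still returned because they are linked from a; chunk e is three hops away and is left out.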

Check out this full tutorial on building graph RAG with Unstructured and Astra DB.

Astra DB supports ColBERT

Graph RAG can help if your context is spread across chunks, but there are other situations where graph RAG won't necessarily help. If your data contains terms that aren't in the training data of your embedding model, it can be difficult to get accurate similarity search results.

One way to overcome this is to use ColBERT. ColBERT creates a vector per token in a body of text, creating a sliding window of context over entire passages and capturing unknown context much better. This does require more storage for the extra vectors, but if accuracy is your priority, it’s worthwhile.
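
The per-token vectors are compared with a scoring step often called late interaction or MaxSim: each query token is matched to its most similar document token, and those best matches are summed. A minimal sketch with made-up two-dimensional vectors (real ColBERT embeddings have many more dimensions and come from the model):

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token vectors, invented for this example.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Because every query token gets its own best match, a rare term can still find its counterpart in the document instead of being averaged away in a single passage-level vector.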

You can use ColBERT with Astra DB in LangChain by using the RAGStack implementation.

To ingest the data, you can use the ColbertEmbeddingModel and ColbertVectorStore.

import os
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore

embedding = ColbertEmbeddingModel()
database = CassandraDatabase.from_astra(
  astra_token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
  database_id=os.environ.get("ASTRA_DB_DATABASE_ID"),
  keyspace="default_keyspace"
)
vector_store = ColbertVectorStore(
  database=database,
  embedding_model=embedding
)
results = vector_store.add_texts(texts=YOUR_LIST_OF_TEXTS, doc_id="myDocs")

Then performing a similarity search is pretty much the same as any other vector store search in LangChain.

from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel
from ragstack_langchain.colbert import ColbertVectorStore as LangchainColbertVectorStore

colbert_embedding = ColbertEmbeddingModel()
colbert_database = CassandraDatabase.from_astra(
    astra_token=YOUR_ASTRA_DB_TOKEN,
    database_id=YOUR_ASTRA_DB_ID,
    keyspace="default_keyspace"
)
vector_store = LangchainColbertVectorStore(
    database=colbert_database,
    embedding_model=colbert_embedding
)
query = "What is ColBERT?"
results = vector_store.similarity_search(query)

Check out this full tutorial on using ColBERT with Astra DB, or for a faster alternative, Jonathan Ellis's ColBERT Live!, which uses Answer AI's colbert-small-v1 model and is supported by Astra DB.

Astra DB indexes your vectors live

Your vector database needs to be both accurate and speedy in order to ensure the performance of your application. When you are ingesting or updating data in your collection, rebuilding the index takes time and leaves you with slow queries or out-of-date data.

Astra DB's vector indexing capabilities are a combination of Cassandra's storage-attached indexing (SAI) and JVector, a non-blocking, concurrent, graph-based vector index. This means that Astra DB doesn't need to rebuild or block access to its index when you insert vectors; the index is updated live.

The upshot of this is high throughput and accuracy even under mixed loads of reads and writes. Check out this benchmark of throughput and accuracy against Pinecone, particularly when Pinecone is performing indexing. Astra DB doesn't sacrifice throughput or accuracy under load; it will always be there for your application.

Astra DB is integrated in all your favourite frameworks

We've seen so far in this post that Astra DB is available in LangChain, but you can also find it in:

LangChain.JS
LlamaIndex and LlamaIndex.TS
Haystack
Mastra (a newer framework, built by the team behind Gatsby)

And of course Astra DB is integrated into Langflow. Deeply integrated! Once you enter your application token into the Astra DB component, your databases will automatically load. Then once you select your database, you can pick the collection you need too.

You can even create a new database from within Langflow. Oh, and Langflow supports using Astra Vectorize when ingesting or performing vector search too.

Langflow is a great visual way to build agents, and Astra DB makes it easy to build RAG or agentic RAG within Langflow.

Astra DB is ready to help you build transformative AI

Whether you're looking to build with Langflow or any number of other frameworks, or try out alternative vector searches like graph RAG or ColBERT, Astra DB is there to help. And it will do it quickly, creating vectors for you via Vectorize and indexing them live so your data is always up to date.

There are so many different applications you can build; check out examples like this AI resume assistant, RAG-powered voice agent, or hum-to-search music recognition app, all powered by Astra DB.

From chat bots to autonomous agents, Astra DB supports you in building the GenAI apps that are going to transform your business.

How to Stream Responses from the Langflow API in Node.js

Phil Nash — Wed, 05 Mar 2025 21:34:53 +0000

Building flows and AI agents in Langflow is one of the fastest ways to experiment with generative AI. Once you've built your flow, you’ll want to integrate it into your own application. Langflow exposes an API for this; we’ve written before about how to use it in Node.js. We've also seen that streaming GenAI outputs makes for a better user experience. So today, we're going to combine the two and show you how to stream results from your Langflow flows in Node.js.

Using the Langflow client

The easiest way to use the Langflow API is with the @datastax/langflow-client npm module. You can get started with the client by installing the module with npm:

npm install @datastax/langflow-client

The Langflow client can be used with both self-hosted and DataStax-hosted Langflow. You can see in-depth examples of how to set it up for either version of Langflow in this blog post. But the quick version is that for either type of Langflow, you start by importing the client:

import { LangflowClient } from "@datastax/langflow-client";

For self-hosted Langflow you need the URL where you’re hosting Langflow and, if you've set up user authorisation, an API key. You then initialise the client with both:

const baseURL = "http://localhost:7860";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ baseURL, apiKey });

For DataStax-hosted Langflow, you need your Langflow ID and to generate an API key. Then you create a client with the following code:

const langflowId = "YOUR_LANGFLOW_ID";
const apiKey = "YOUR_API_KEY";
const client = new LangflowClient({ langflowId, apiKey });

Streaming with the Langflow client

To stream through the API, you need a flow that’s set up for streaming responses. A streaming flow needs a model with streaming capabilities and the stream flag turned on, connected to a chat output. If you don't already have a flow, the basic prompting flow, with streaming turned on, is a good example of this.

Once you have your flow in place, open the API modal and get the flow ID.

With the flow ID and the Langflow client, you can create a flow object:

const flowId = "YOUR_FLOW_ID";
const flow = client.flow(flowId);

To stream a response from the flow, you can use the stream function (https://www.npmjs.com/package/@datastax/langflow-client#streaming). The response is a ReadableStream that you can iterate over asynchronously.

const response = await flow.stream("Hello, how are you?");
for await (const event of response) {
  console.log(event);
}

There are three types of event that the stream emits; this is what each of them means:

add_message: a message has been added to the chat. It can refer to a human input message or a response from an AI.
token: a token has been emitted as part of a message being generated by the model.
end: all tokens have been returned; this message will also contain the same full response that you get from a non-streaming request.

If you want to log out just the text from a flow response you can do the following:

const response = await flow.stream("Hello, how are you?");
for await (const event of response) {
  if (event.event === "token") {
    console.log(event.data.chunk);
  }
}
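
The event handling itself is language-neutral. As an illustration, here is the same filtering logic sketched in Python over a simulated list of events, collecting the token chunks into one string. The event shapes follow the `event` and `data.chunk` fields used above; the payloads of the add_message and end events are left empty here because their contents are not shown in this post.

```python
from typing import Any, Dict, Iterable


def collect_text(events: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the chunks from token events; stop at the end event."""
    parts = []
    for event in events:
        if event["event"] == "token":
            parts.append(event["data"]["chunk"])
        elif event["event"] == "end":
            break
        # add_message events are ignored in this sketch
    return "".join(parts)


# Simulated stream, standing in for events from a real flow.
simulated = [
    {"event": "add_message", "data": {}},
    {"event": "token", "data": {"chunk": "Hello"}},
    {"event": "token", "data": {"chunk": ", world"}},
    {"event": "end", "data": {}},
]
print(collect_text(simulated))  # Hello, world
```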

The stream function takes all the same arguments as the run function, so you can provide tweaks for your components, too.

Integrating with Express

If you want to make an API request from an Express server and then stream it to your own front-end, you can do the following:

app.get("/stream", async (_req, res) => {
  res.set("Content-Type", "text/plain");
  res.set("Transfer-Encoding", "chunked");

  const response = await flow.stream("Hello, how are you?");

  for await (const event of response) {
    if (event.event === "token") {
      res.write(event.data.chunk);
    }
  }

  res.end();
});

We explored how you can handle a stream on the front-end in this blog post.

Stream your flows

Langflow enables you to rapidly build, experiment with, and deploy GenAI applications and with the JavaScript Langflow client you can easily stream those responses in your JavaScript applications.

Please do try out the Langflow client; if you have any issues, please raise them on the GitHub repo. If you're looking for more inspiration for building AI agents with Langflow, check out these posts that cover how to build an agent that can manage your calendar with Langflow and Composio or see how you can build local agents with Langflow and Ollama.

Build a RAG-Powered Voice Agent with Twilio Voice, OpenAI, Astra DB, and Node.js

Phil Nash — Wed, 19 Feb 2025 23:08:48 +0000

With the OpenAI Realtime API, you can build speech-to-speech applications that let you interact directly with a generative AI model by speaking with it. Talking directly to a model feels really natural, and the Realtime API makes it possible to build experiences like this into your own applications and businesses.

One example of this was built by Twilio: it enables you to connect a phone call to GPT-4o with Node.js (or, if you prefer, Python). The example is great, but it only shows connecting to a plain GPT-4o with a system prompt that encourages owl facts and jokes. Much as I like owl facts, I wanted to see what else we could achieve with a voice agent like this.

In this post, we'll show you how to extend the original assistant into an agent that can choose to use tools to augment its response. We'll give it additional, up-to-date knowledge via retrieval-augmented generation (RAG) using Astra DB.

Prerequisites

First, you’ll need to set up the application from the Twilio blog post, so you'll need a Twilio account and an OpenAI API key. Make sure you can make a call and chat with the bot successfully.

You will also need a free DataStax account so you can set up RAG with Astra DB.

What we’re going to build

We already have a voice-capable bot that you can speak to over the phone. We're going to gather some up-to-date data and store it in Astra DB to help the bot answer questions.

The OpenAI Realtime API enables you to define tools that the model can use to execute functions and extend its capabilities. We’ll give the model a tool that enables it to search the database for additional information (this is an example of agentic RAG).

Ingesting data

To test out this agent, we're going to write a quick script to load and parse a web page, turn the content into chunks, turn those chunks into vector embeddings, and store them in Astra DB.

Create your database

To kick this process off, you'll need to create a database. Log into your DataStax account and, on the Astra DB dashboard, click Create a Database. Choose a Serverless (Vector) database, give it a name, and pick a provider and region. That will take a couple of minutes to provision. While it's doing that, have a think about some good web pages you might want to ingest into this database.

Once the database is ready, click on the Data Explorer tab and then the Create Collection + button. Give your collection a name, ensure it is a vector-enabled collection and choose NVIDIA as the embedding generation method. This will automatically generate vector embeddings for the content we insert into the collection.

Connect to the database

Open the application code in your favourite text editor. To get the application running, you’ll have created a .env file and populated it with your OpenAI API key (and if you didn't do that yet, now is definitely the time). Open that .env file and add some more environment variables.

ASTRA_DB_APPLICATION_TOKEN=
ASTRA_DB_API_ENDPOINT=
ASTRA_DB_COLLECTION_NAME=

Fill in the variables with the information from your database. You can find the API endpoint and generate an application token from the database overview in the Astra DB dashboard. Enter the name of the collection you just created, too.

Now we can connect to the database in the application. Install the Astra DB client from npm.

npm install @datastax/astra-db-ts

Create a new file in the application called db.js. Open the file and enter the following code:

import { DataAPIClient } from "@datastax/astra-db-ts";
import dotenv from "dotenv";

dotenv.config();

const {
  ASTRA_DB_APPLICATION_TOKEN,
  ASTRA_DB_API_ENDPOINT,
  ASTRA_DB_COLLECTION_NAME,
} = process.env;

const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);
export const collection = db.collection(ASTRA_DB_COLLECTION_NAME);

This code loads the client from the Astra DB module and the variables in the .env file into the environment. It then uses those environment variables as credentials to connect to the collection, and exports the collection object to be used elsewhere in the application.
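
If any of those variables is missing, the failure can be hard to trace back to the .env file. One option, shown here as a minimal sketch, is to check them up front. The helper name requireEnv is hypothetical and is not part of the Astra DB client.

```javascript
// Return the requested variables from `env`, or throw listing every missing one.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Hypothetical usage in db.js, after dotenv.config():
// const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT, ASTRA_DB_COLLECTION_NAME } =
//   requireEnv([
//     "ASTRA_DB_APPLICATION_TOKEN",
//     "ASTRA_DB_API_ENDPOINT",
//     "ASTRA_DB_COLLECTION_NAME",
//   ]);
```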

Get some data

Now let's create a script that loads and parses a web page, then splits it into chunks and stores it in Astra DB. This script is going to combine some of the techniques in blog posts about scraping web pages, chunking text, and creating vector embeddings. To read more in depth about those, check out those posts.

Install the dependencies:

npm install @langchain/textsplitters @mozilla/readability jsdom

Create a file called ingest.js and copy the following code:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

import { collection } from "./db.js";

import { parseArgs } from "node:util";

const { values } = parseArgs({
  args: process.argv.slice(2),
  options: { url: { type: "string", short: "u" } },
});

const { url } = values;
const html = await fetch(url).then((res) => res.text());

const doc = new JSDOM(html, { url });
const reader = new Readability(doc.window.document);
const article = reader.parse();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 100,
});

const docs = (await splitter.splitText(article.textContent)).map((chunk) => ({
  $vectorize: chunk,
}));

await collection.insertMany(docs);

This script:

uses the Node.js argument parser to get a URL from the command line arguments
loads the web page at that URL
parses the content from the page using Readability.js and JSDOM
splits the text into 500 character chunks with 100 character overlap using the RecursiveCharacterTextSplitter
turns the chunks into objects where the chunk of text becomes the $vectorize property
inserts all the documents into the collection

Using the $vectorize property tells Astra DB to automatically create vector embeddings for this content.
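
To make the chunk size and overlap from the list above concrete, here is a deliberately simplified sketch that cuts text at fixed character positions. This is not how RecursiveCharacterTextSplitter works internally (it tries to split on separators such as paragraphs and sentences first); the function name chunkText is hypothetical and only illustrates what "500 characters with 100 characters of overlap" means.

```javascript
// Naive fixed-width chunking: each chunk starts (chunkSize - chunkOverlap)
// characters after the previous one, so neighbours share chunkOverlap characters.
function chunkText(text, chunkSize = 500, chunkOverlap = 100) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Each chunk becomes a document whose text Astra DB will embed.
const exampleDocs = chunkText("abcdefghij", 4, 2).map((chunk) => ({
  $vectorize: chunk,
}));
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.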

We can now run this file from the command line. For example, here's how to ingest the Wikipedia page on Taylor Swift:

node ingest.js --url https://en.wikipedia.org/wiki/Taylor_Swift

Once this command has been run, check the collection in the DataStax dashboard to see the contents and the vectors.

Build the voice agent

To turn our existing voice assistant into an agent that can choose to search the database for more information, we need to provide it with a tool, or function, that it can choose to use.

Create a new file called tools.js and open it in your editor. Start by importing collection from db.js:

import { collection } from "./db.js"

Next we need to create the function that the agent can use to search the database.

When the OpenAI agent provides parameters to call a function with, it does so as an object. So the function should receive an object, which we can destructure to extract the query. We'll then use the query to perform a vector search against our collection.

We can use Astra DB Vectorize to automatically create a vector embedding of the query. We'll also limit the results to the top 10 and ensure we return the text from the chunks by selecting $vectorize in the projection.

Calling find on the collection with these arguments will return a cursor, which we can turn into an array by calling toArray. We then iterate over the array of documents, extracting just the text and then joining the resulting array with a newline to create a single string result that can be provided as context to the agent.

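async function taylorSwiftFacts({ query }) {
  // Sorting by $vectorize asks Astra DB to embed the query and rank by similarity
  const docs = await collection.find(
    {},
    { sort: { $vectorize: query }, limit: 10, projection: { $vectorize: 1 } }
  );
  // Join the text of each chunk into a single newline-separated string
  return (await docs.toArray()).map((doc) => doc.$vectorize).join("\n");
}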

I've called the function taylorSwiftFacts because that's what I loaded with my ingestion script; feel free to use a different name.
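
The last step of that function, turning documents into a single string, can be pulled out into its own helper so it is easy to try without a database. The sketch below is one possible variation, not the code from the project: the name formatContext and the fallback message are hypothetical, and returning a fallback for an empty result is an extra design choice so that the model is told when nothing was found.

```javascript
// Turn an array of documents into one newline-separated context string.
// Documents without usable text in $vectorize are skipped.
function formatContext(docs, fallback = "No relevant information was found.") {
  const texts = docs
    .map((doc) => doc.$vectorize)
    .filter((text) => typeof text === "string" && text.trim() !== "");
  return texts.length > 0 ? texts.join("\n") : fallback;
}

// Hypothetical usage inside the tool:
// return formatContext(await docs.toArray());
```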

This is our first tool; we can write more, but for now we can just export this as an object of tools.

export const TOOLS = {
  taylorSwiftFacts,
};

To help the model choose when to use this tool, it needs a description of what it can do and the arguments it expects. For each tool you provide a type, name, description, and the parameters.

For our function the type will be "function" and the name is taylorSwiftFacts. The description will tell the agent that we have up-to-date information about Taylor Swift that it can search for. The parameters are a JSON schema description of the arguments your function expects. This tool is relatively simple, as it only requires one parameter called query, which is a string. The full description looks like this:

export const DESCRIPTIONS = [
  {
    type: "function",
    name: "taylorSwiftFacts",
    description:
      "Search for up to date information about Taylor Swift from her wikipedia page",
    parameters: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description: "The search query",
        },
      },
    },
  },
];
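
Because the functions and their descriptions live in two separate structures, it is easy for them to drift apart as more tools are added. As a small sketch, a check like the following could run at start-up. The helper name findToolMismatches is hypothetical; it only relies on the shapes of TOOLS and DESCRIPTIONS shown above.

```javascript
// Compare the implemented tools with the described tools by name.
function findToolMismatches(tools, descriptions) {
  const implemented = new Set(Object.keys(tools));
  const described = new Set(descriptions.map((description) => description.name));
  return {
    // Functions the model will never be told about
    missingDescription: [...implemented].filter((name) => !described.has(name)),
    // Descriptions the model could call that have no function behind them
    missingImplementation: [...described].filter((name) => !implemented.has(name)),
  };
}

// Hypothetical usage:
// const { missingDescription, missingImplementation } =
//   findToolMismatches(TOOLS, DESCRIPTIONS);
```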

Our tool definition is complete for now, so let's add it to our agent.

Handling function calls in a voice agent

We've been building supporting functions around the existing application so far, but to connect our tool to the agent we need to dig into the main body of code. Open index.js in your editor and start by importing the tools and descriptions we just defined:

import Fastify from 'fastify';
import WebSocket from 'ws';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';

import { DESCRIPTIONS, TOOLS } from "./tools.js";

We need to update the system prompt to more accurately describe what the agent is capable of with the tool available to it. Since we ingested the Wikipedia page for Taylor Swift earlier, we can update it to behave like a Taylor Swift superfan.

Find the SYSTEM_MESSAGE constant and update with:

const SYSTEM_MESSAGE = "You are a helpful and bubbly AI assistant who loves Taylor Swift. You can use your knowledge about Taylor Swift to answer questions, but if you don't know the answer, you can search for relevant facts with your available tools.";

Next we need to provide the tool we have built to the agent. Find the initializeSession function, which defines a sessionUpdate object that includes all the details to initialize the agent. Add a tools property to the session object using the DESCRIPTIONS array we imported earlier:

const sessionUpdate = {
  type: 'session.update',
  session: {
    turn_detection: { type: 'server_vad' },
    input_audio_format: 'g711_ulaw',
    output_audio_format: 'g711_ulaw',
    voice: VOICE,
    instructions: SYSTEM_MESSAGE,
    modalities: ["text", "audio"],
    temperature: 0.8,
    tools: DESCRIPTIONS
  }
};

We can also provide tools on a request-by-request basis, but this agent will benefit from access to this tool in all its interactions.

Finally we need to handle the event when the model requests to use a tool. Find the event handler for when the connection to OpenAI receives a message. It looks like this: openAiWs.on('message', … ).

Change the event handler to an async function:

openAiWs.on('message', async (data) => {

When the Realtime API wants to use a tool, it sends an event with the type "response.done". Within the event object there are outputs, and if one of the outputs has a type of "function_call" we know the model wants to use one of its tools.

The output provides the name of the function it wants to call and the arguments. We can look up the tool in our object of TOOLS that we imported, then call it with the arguments.

When we have the result of the function call we pass it back to the model so that it can choose what to do next. We do so by creating a new message with the type "conversation.item.create" and within that message we include an item with the type "function_call_output", the output of the function call, and the ID that the original event had, so that the model can tie the response to the original query.

We send this to the model as well as another message with the type "response.create" which requests the model use this new information to return a new response.

Overall, this enables the model to request to use the database search function we defined and provide the arguments it wants to call the function with. We are then responsible for calling the function and returning the results to the model. The whole code looks like this:

      openAiWs.on('message', async (data) => {
          try {
            const response = JSON.parse(data);

            if (LOG_EVENT_TYPES.includes(response.type)) {
              console.log(`Received event: ${response.type}`, response);
            }

            if (response.type === "response.done") {
              const outputs = response.response.output;
              const functionCall = outputs.find(
                (output) => output.type === "function_call"
              );
              if (functionCall && TOOLS[functionCall.name]) {
                const result = await TOOLS[functionCall.name](
                  JSON.parse(functionCall.arguments)
                );
                const conversationItemCreate = {
                  type: "conversation.item.create",
                  item: {
                    type: "function_call_output",
                    call_id: functionCall.call_id,
                    output: result,
                  },
                };
                openAiWs.send(JSON.stringify(conversationItemCreate));
                openAiWs.send(JSON.stringify({ type: "response.create" }));
              }
            }

            // other event handlers
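
The handler above assumes the tool call succeeds. As one possible variation (a sketch, not the code from the project), the function-call logic can be moved into a helper that returns the messages to send, and that reports a failed or unknown tool back to the model instead of leaving the call unanswered. The name handleFunctionCall and the wording of the error text are hypothetical; the event and message shapes are the ones used above.

```javascript
// Given a parsed event from the Realtime API, return the messages to send back.
// Returns an empty array when the event does not contain a function call.
async function handleFunctionCall(response, tools) {
  if (response.type !== "response.done") return [];
  const outputs = response.response?.output ?? [];
  const functionCall = outputs.find((output) => output.type === "function_call");
  if (!functionCall) return [];

  let output;
  try {
    const tool = tools[functionCall.name];
    if (!tool) throw new Error(`Unknown tool: ${functionCall.name}`);
    output = await tool(JSON.parse(functionCall.arguments));
  } catch (error) {
    // Tell the model what went wrong so it can respond to the caller
    output = `The tool call failed: ${error.message}`;
  }

  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: functionCall.call_id, output },
    },
    { type: "response.create" },
  ];
}

// Hypothetical usage inside the message handler:
// for (const message of await handleFunctionCall(response, TOOLS)) {
//   openAiWs.send(JSON.stringify(message));
// }
```

Keeping the WebSocket out of the helper means it can be exercised with plain objects, without a phone call or an OpenAI connection.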

Start the application and make sure it is connected to your Twilio number as described in the Twilio blog post. Now we can call and chat all things Taylor Swift.

This is now a new way to connect with the Taylor Swift bot we built a while back. So now you can chat with SwiftieGPT online or on the phone.

Give your voice assistants some agency

Real-time voice agents are very cool, but they have all the same drawbacks as a plain LLM. In this post we added agentic RAG capabilities to our voice agent and it was able to use up-to-date knowledge to answer our questions about Taylor Swift.

When you provide a voice agent with tools, like context from a vector database, the results are very impressive. The combination of Twilio, OpenAI, and Astra DB creates a very powerful agent.

You can find the code to this in my fork of the Twilio project. You don't have to stop here though; you can define and add further tools to the agent. Make sure you check out OpenAI's best practices for defining functions for your models.

If you're interested in building other agents, check out how to work with Langflow and Composio or the workshop and videos from the recent Hacking Agents event.

Are you excited about voice agents or agentic RAG? Come chat about it and what you're building in the DataStax Devs Discord.

Want to roll up your sleeves and build with OpenAI, Twilio, Cloudflare, Unstructured, and DataStax? Join us on Feb. 28 in San Francisco for the Hacking Agents Hackathon, an epic 24-hour hackathon where we'll be diving into what developers can build with the latest and greatest in AI tooling.

Unlocking Local AI: How to Use Ollama with Agents

David Jones-Gilardi — Thu, 13 Feb 2025 17:13:50 +0000

By now, there’s a good chance you’ve heard about generative AI or agentic flows (if you’re not familiar with agents and how they work, watch this video to get up to speed). There’s plenty of information out there about building agents with providers like OpenAI or Anthropic. However, not everyone is comfortable with exposing their data to public model providers. We get a steady drumbeat of questions from folks wondering if there’s a more secure and cheaper way to run agents. Ollama is the answer.

If you've ever wondered how to run AI models securely on your own machine without sharing your data with external providers, well, here you go!

If you’d rather watch this content, here’s a video covering the same topic.

Why use Ollama?

Ollama enables you to run models locally, ensuring that your data remains private and secure. Not only that, it won’t cost you any tokens. With Ollama, you can confidently run models on your hardware, knowing that your data is safe.

Getting started with Ollama

Step 1: Install the model

If you haven’t used Ollama before, you’ll need to install it locally first. Download and install the version needed for your operating system. It takes about five minutes.

Then, navigate to the models section and select tools. It's crucial to choose models that support tool calling when you want to build an agent.

For this post, we'll use Alibaba's Qwen 2.5 7 billion parameter model, which is a great choice for local tool calling and agent interactions. It's only a 4.7GB download (Llama 3.1 405b is 243GB!) and is suitable to run on most machines.

Copy the installation command and paste it into your terminal after installing Ollama. Once the download is complete, you're ready to start working with the model!

Step 2: Setting up Langflow

Next, we'll use Langflow, a visual IDE that enables you to build generative and agentic AI flows in a low-code or no-code environment. If you're not familiar with Langflow, check out this link for more information.

1. Install Langflow: Use “uv pip install langflow” in your terminal to install Langflow locally.

2. Create a new flow: Choose the “Simple Agent” template.

Once opened, you’ll see a ready-made simple agentic flow complete with an agent (defaulting to OpenAI’s gpt-4o-mini LLM), both URL and calculator tools, and chat input and output components.

Transitioning to Ollama

Now, let's switch from OpenAI to Ollama:

1. Select custom model: In the model provider list, choose the custom option.

Since our goal is to use Ollama and not OpenAI, click the “Model Provider” dropdown in the agent component and choose “Custom.”

2. Add Ollama component: Drag and drop the Ollama model into your flow and connect the “Language Model” nodes.

3. Refresh the model list and choose qwen2.5: Make sure to refresh the model name dropdown to populate the available models. It's essential to have Ollama running locally for this setup to work.

To use an Ollama model with your agent, it must support tool calling. In Langflow, enable the “Tool Model Enabled” radio button to filter models that have this capability. Once enabled, select qwen2.5 for your operations.
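
If the model list stays empty, it can help to confirm that Ollama is running and has the model installed. The sketch below assumes Ollama's local HTTP API at its default address, http://localhost:11434, and its /api/tags endpoint, which lists installed models. The function names hasModel and checkOllama are hypothetical.

```javascript
// Check a parsed /api/tags response for a model, with or without a tag suffix
// (for example, "qwen2.5" matches "qwen2.5:latest").
function hasModel(tagsResponse, modelName) {
  const models = tagsResponse.models ?? [];
  return models.some(
    (model) => model.name === modelName || model.name.startsWith(`${modelName}:`)
  );
}

// Ask a locally running Ollama which models are installed.
async function checkOllama(modelName, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with status ${res.status}`);
  return hasModel(await res.json(), modelName);
}

// Hypothetical usage:
// checkOllama("qwen2.5").then((installed) => console.log({ installed }));
```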

Running your query

Now, let's run a query using the Ollama model. Open the “Playground” and try typing in an example like “convert 200 USD to INR”. If everything is wired up correctly, the model will attempt to answer your query using the tools at its disposal.

Keep in mind that local models may take longer to process, especially larger ones. However, Qwen 2.5 is optimized for smaller machines, making it pretty solid for local use.

Experimenting with inputs

When working with smaller local models, you may need to experiment with your inputs. Sometimes, you might have to explicitly instruct the model to do something, like use the web to find the latest exchange rates. Adjusting the model's temperature settings can also help; starting with a conservative value (like 0.10) is a good practice, but feel free to increase it for more creative responses.

Notice how the agent updated its approach when I told it to “use the web to get the latest exchange rates” and gave the correct answer. This time, it used the URL tool to grab the latest exchange rate from the web rather than relying solely on the knowledge it was trained on.

Conclusion

Finally, once your Ollama agent is set up within Langflow, you can integrate it into your applications via API, allowing you to enable your apps with full agentic capability.
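
As a rough sketch of what that API call could look like, the helper below builds a request for Langflow's run endpoint. It assumes a local Langflow instance at http://localhost:7860 and the /api/v1/run/ path followed by your flow ID; the API pane in your own Langflow instance shows the exact request for your flow, including any API key header it may need. The function name buildRunRequest and the flow ID placeholder are hypothetical.

```javascript
// Build the URL and fetch options for running a flow with a chat message.
function buildRunRequest(baseUrl, flowId, message) {
  return {
    url: `${baseUrl}/api/v1/run/${encodeURIComponent(flowId)}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input_value: message,
        input_type: "chat",
        output_type: "chat",
      }),
    },
  };
}

// Hypothetical usage:
// const { url, options } = buildRunRequest(
//   "http://localhost:7860",
//   "your-flow-id",
//   "convert 200 USD to INR"
// );
// const result = await fetch(url, options).then((res) => res.json());
```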

That's all it takes to harness the power of local models securely with Ollama and your agents. If you have any questions or need further assistance, feel free to reach out on our Discord.

Happy coding!

How to Build a Simple AI Agent with Langflow and Composio

melienherrera — Mon, 10 Feb 2025 19:51:42 +0000

Are you trying to understand AI agents? Or perhaps you’ve started building agents, but are still struggling with tools and how to connect them to app integrations. DataStax Langflow and Composio are a great combination to help you understand these concepts.

Langflow is a visual low-code AI application builder that allows you to build agents quickly for rapid development, and Composio is an integration platform that gives developers access to hundreds of tools like GitHub, Salesforce, and Google.

In this tutorial, you’ll learn how to create a simple agent in Langflow using Composio as a tool to connect to your Google calendar. Let’s get started!

Set up

For this tutorial, you’ll need:

A DataStax Langflow account to build your AI agent
A Composio account for connecting your tools and integrations
An OpenAI account and API key

Once those accounts are created, proceed to the following steps to get started on building an AI agent with Composio.

Start with Composio

Composio is an application integration platform that gives you access to many different tools that you could use within your AI application. This means that you no longer have to manage APIs for performing actions like creating, deleting, or updating a Google Calendar event; you just need to go through Composio and the work is done for you. We’ll walk through this here.

Once you’ve created your Composio account, you should be dropped into their dashboard. Copy your API key on the top right hand corner. Save this in your clipboard or preferred notes application for later.

Once you have obtained your API key, head over to the “Apps” tab on the left side navigation bar.

Here, you’ll see all of the available tools and integrations that you can connect with through Composio (283 and counting at the time of writing this blogpost!). Use the “Googlecalendar” integration.

Then go to “Setup Googlecalendar integration.”

Follow the steps to complete the integration with your preferred method. They offer options through code with Python or JavaScript—or simply go through authentication via Google sign-on. Once this is completed, you should receive an “Integration Successful” message, which means that you have successfully connected to Google through Composio.

You’ll be dropped into Step 3/3, “Execute tools,” where you can play around with each individual action in a playground with natural language, test out different parameters, and connect with JS and Python via various frameworks.

Now that you have your Google Calendar integration set up and your API key handy, you'll start building a simple AI agent with Langflow using Composio as a tool.

Langflow

Head over to your Langflow account and create a new flow by clicking the “Create Flow” button, which will bring up the start-up menu. You’ll be using the “Simple Agent” flow on the “Get started” menu.

You’ll be dropped into the visual editor, where you’ll notice that there’s already a flow built out. Each of the blocks that you see is called a “component,” and each component represents a functional step in the end-to-end AI flow. The “Agent” component defaults to using the gpt-4o-mini model from OpenAI, but you can choose to use other models if you prefer. This is where you’ll need to put your OpenAI API key.

Next, on the left-side navigation, you can scroll down to “Bundles” and find the Composio bundle. Drag and drop this to the flow and connect it to the “Agent” component using the “Tool” linking points.

Refer back to the API key you got from the Composio dashboard and put it in the “Composio” component. Select the “GOOGLECALENDAR” app name, and press the “refresh” button. You’ll know that the connection with the integration has been successful when you see “GOOGLECALENDAR CONNECTED” appear under “Auth Status.”

For the purpose of this demo, select all of the actions from the dropdown under “Actions to use.” This will allow you to create, update, delete, and retrieve events!

You've now set up all the components you need for your agent with Composio. It’s time to run the flow.

Run the flow

To test the flow, go to the “Playground” located in the top right corner. You can use the chat interface to give example queries to your agent flow and see how the agent makes decisions between tools.

For example, try typing in the chat input: “Add 1+1” and you’ll notice that the agent determines that it needs to use the Calculator tool to perform the query. You can inspect this by clicking the drop down menu in the agent logs where it says “AI gpt-4o-mini”.

Next, try giving it a query such as “Can you check if I have availability for January 28, 2025 at 3pm? If it's free, schedule a meeting with Bob.” Observe the response here and what decisions the agent had to make using Composio. What actions do you see it calling? What was the final response? Navigate to your Google calendar and see the created event appear on your calendar.

Wrapping up

You’ve officially set up a simple AI agent using Composio as a tool! You were able to easily connect with your Google Calendar and perform actions without having to configure the API yourself, thanks to the power of the Composio integration and Langflow’s component-based visual app-building interface. But the exploration doesn’t end here. As you saw, there are lots of integrations to try within Composio, and you can easily test them all using Langflow!