Nevo David

Posted on Sep 24

8 must-know open-source repositories to build cool AI apps 🪄 ✨

#webdev #javascript #programming #ai

As someone building AI apps, I see a massive spike in user interest, and this is undoubtedly the best time to master building AI apps.

So, I have compiled a list of 8 open-source repositories you can use now to build robust AI systems.

1. Composio 👑 - Tooling Infrastructure for AI apps

I have been building AI apps for years, and let me tell you, creating a reliable and robust AI app is complex. This is especially true for AI-powered workflow automation.

However, Composio makes it a breeze to integrate third-party services, such as GitHub, Slack, Gmail, etc., with AI models for end-to-end workflow automation.

They have over 100 toolsets across business verticals, such as CRM, HRM, Dev, Admin, and Productivity, to help you build complex automation tasks.

They are open-source and have native support for Python and JavaScript.

Here’s how to quickly start with Composio.

pip install composio-core

Add a GitHub integration.

composio add github

Composio handles user authentication and authorization on your behalf.

Here is how you can use the GitHub integration to star a repository.

from openai import OpenAI
from composio_openai import ComposioToolSet, Action

openai_client = OpenAI(api_key="******OPENAIKEY******")

# Initialise the Composio tool set
composio_toolset = ComposioToolSet(api_key="******COMPOSIO_API_KEY******")

# Get the pre-configured GitHub action for starring a repository
actions = composio_toolset.get_actions(actions=[Action.GITHUB_ACTIVITY_STAR_REPO_FOR_AUTHENTICATED_USER])

my_task = "Star a repo ComposioHQ/composio on GitHub"

# Create a chat completion request so the model can decide on the action
response = openai_client.chat.completions.create(
    model="gpt-4-turbo",
    tools=actions,  # Passing the actions we fetched earlier.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": my_task},
    ],
)

# Execute the tool call the model chose and print the outcome
result = composio_toolset.handle_tool_calls(response)
print(result)

Run this Python script to execute the given instruction using the agent.
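To make that flow more concrete: when a model is given tools, its reply contains a tool call (a function name plus JSON-encoded arguments) instead of plain text, and something has to route that call to real code. Here is a rough, standard-library-only sketch of that routing step. It is an illustration, not Composio's implementation: `star_repo`, `TOOL_REGISTRY`, `dispatch_tool_call`, and the shape of the `tool_call` dict are all hypothetical names made up for this example.

```python
import json


def star_repo(owner: str, repo: str) -> dict:
    """Hypothetical stand-in for a real GitHub call; it only echoes its input."""
    return {"starred": f"{owner}/{repo}"}


# Map tool names (as the model would emit them) to Python callables.
TOOL_REGISTRY = {"star_repo": star_repo}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Look up the requested tool and call it with the decoded JSON arguments."""
    name = tool_call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return TOOL_REGISTRY[name](**arguments)


# Assumed shape of a tool call: a name plus a JSON string of arguments.
example_call = {
    "name": "star_repo",
    "arguments": json.dumps({"owner": "ComposioHQ", "repo": "composio"}),
}
result = dispatch_tool_call(example_call)
print(result)
```

A tooling layer takes care of this plumbing for you, along with authentication against the third-party service.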

For more about Composio, visit their documentation.

Star the Composio repository ⭐

2. Mem0 - The memory layer of AI apps

Memory handling is a real challenge in building AI assistants. It involves storing and retrieving past interactions. An efficient memory layer is essential for providing personalized user experiences.

Mem0 is by far the best solution in this space. Mem0 remembers user preferences, adapts to individual needs, and continuously improves, making it ideal for customer support chatbots, AI assistants, and autonomous systems.

Core Features

Multi-Level Memory: User, Session, and AI Agent memory retention
Adaptive Personalization: Continuous improvement based on interactions
Developer-Friendly API: Simple integration into various applications
Cross-Platform Consistency: Uniform behaviour across devices
Managed Service: Hassle-free hosted solution

Start using Mem0 by installing via pip.

pip install mem0ai

Initialize Mem0

from mem0 import Memory
m = Memory()

Memory operations.

# 1. Add: Store a memory from any unstructured text
result = m.add("I am working on improving my tennis skills. Suggest some online courses.", user_id="alice", metadata={"category": "hobbies"})

# Created memory --> 'Improving her tennis skills.' and 'Looking for online suggestions.'

# 2. Update: update the memory
result = m.update(memory_id="<memory_id_1>", data="Likes to play tennis on weekends")  # replace the placeholder with a real memory ID

# Updated memory --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'

# 3. Search: search related memories
related_memories = m.search(query="What are Alice's hobbies?", user_id="alice")

# Retrieved memory --> 'Likes to play tennis on weekends'

# 4. Get all memories
all_memories = m.get_all()
memory_id = all_memories["memories"][0]["id"]  # get a memory_id

# All memory items --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'

# 5. Get memory history for a particular memory_id
history = m.history(memory_id="<memory_id_1>")  # replace the placeholder with a real memory ID

# Logs corresponding to memory_id_1 --> {'prev_value': 'Working on improving tennis skills and interested in online courses for tennis.', 'new_value': 'Likes to play tennis on weekends' }
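If you want a feel for what a memory layer has to keep track of, here is a toy sketch of the same four operations (add, update, search, history) using only the standard library. It is not how Mem0 works internally: `ToyMemory` is a hypothetical class, and it ranks results by plain keyword overlap instead of LLM extraction and embedding search.

```python
import uuid
from collections import defaultdict


class ToyMemory:
    """Hypothetical in-memory store; not Mem0's implementation."""

    def __init__(self):
        self._items = {}                    # memory_id -> {"text", "user_id"}
        self._history = defaultdict(list)   # memory_id -> list of changes

    def add(self, text, user_id):
        memory_id = str(uuid.uuid4())
        self._items[memory_id] = {"text": text, "user_id": user_id}
        self._history[memory_id].append({"prev_value": None, "new_value": text})
        return memory_id

    def update(self, memory_id, data):
        prev = self._items[memory_id]["text"]
        self._items[memory_id]["text"] = data
        self._history[memory_id].append({"prev_value": prev, "new_value": data})

    def search(self, query, user_id):
        """Rank a user's memories by how many lowercase words they share with the query."""
        query_words = set(query.lower().split())
        scored = []
        for item in self._items.values():
            if item["user_id"] != user_id:
                continue
            overlap = len(query_words & set(item["text"].lower().split()))
            if overlap:
                scored.append((overlap, item["text"]))
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [text for _, text in ranked]

    def history(self, memory_id):
        return list(self._history[memory_id])


memory = ToyMemory()
mid = memory.add("Improving her tennis skills", user_id="alice")
memory.update(mid, "Likes to play tennis on weekends")
hits = memory.search("likes tennis", user_id="alice")
print(hits)
print(memory.history(mid))
```

The part worth noticing is the history log: every update keeps the previous value, which is what makes it possible to audit how a memory changed over time.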

For more information on Mem0 functionalities, refer to their official documentation.

Star the Mem0 repository ⭐

3. AgentOps - AI Agent observability platform

Building AI agents is one thing; ensuring they work as intended is another. Most developers do not bother monitoring their AI agents, which is a disaster waiting to happen.

AgentOps streamlines agent observability so you can check that your agents stay on track. It helps with LLM cost tracking, benchmarking, and more, and it integrates with most LLMs and agent frameworks, such as CrewAI, LangChain, and Autogen.

Get started with AgentOps by installing it through pip.

pip install agentops

Initialize the AgentOps client and automatically get analytics on every LLM call.

import agentops

# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init("<INSERT YOUR API KEY HERE>")

...

# (optional: record specific functions)
@agentops.record_action('sample function being recorded')
def sample_function(*args, **kwargs):
    ...

# End of program
agentops.end_session('Success')
# Woohoo You're done 🎉
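Curious what "recording" a function boils down to? Below is a minimal sketch of a decorator that logs a label, the duration, and the outcome of every call. It only illustrates the general idea and is not AgentOps' implementation: `record_action` here is a hypothetical local function, and `recorded_events` is a plain in-process list where a real platform would send events to a backend.

```python
import functools
import time

recorded_events = []  # hypothetical in-process log of recorded calls


def record_action(label):
    """Decorator factory: record the label, duration, and outcome of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"  # overwritten only if the call returns normally
            try:
                result = func(*args, **kwargs)
                status = "success"
                return result
            finally:
                recorded_events.append({
                    "label": label,
                    "function": func.__name__,
                    "status": status,
                    "seconds": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@record_action("add two numbers")
def add(a, b):
    return a + b


total = add(2, 3)
print(total, recorded_events)
```

Because the event is appended in a `finally` block, failed calls are recorded too, which is usually when you need the data most.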

Refer to their documentation for more.

Star the AgentOps repository ⭐

4. E2B - Code interpreting for AI agents

For many applications, such as building an AI engineer, analyst, or tutor, you will need a way to connect AI models with a code execution environment, and E2B is the leading solution in this space.

E2B Sandbox is a secure cloud environment for AI agents and apps.

It allows AI to run safely for long periods, using the same tools as humans, such as GitHub repositories and cloud browsers.

They offer native Code Interpreter SDKs for Python and JavaScript/TypeScript.

The Code Interpreter SDK allows you to run AI-generated code in a small, secure VM (the E2B sandbox). Inside the sandbox is a Jupyter server that you can control from their SDK.

Get started with E2B with the following command.

npm i @e2b/code-interpreter

Execute a program.

import { CodeInterpreter } from '@e2b/code-interpreter'

const sandbox = await CodeInterpreter.create()
await sandbox.notebook.execCell('x = 1')

const execution = await sandbox.notebook.execCell('x+=1; x')
console.log(execution.text)  // outputs 2

await sandbox.close()
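Notice that the second cell can still see `x` from the first one: state persists between cells, just like in a notebook. Here is a small Python sketch of that behaviour, assuming nothing beyond the standard library. `ToyNotebook` is a hypothetical class, and it is loudly NOT a sandbox: the code runs inside your own process, so never feed it untrusted or AI-generated code. Isolating that execution is exactly the part a real sandbox handles.

```python
import ast


class ToyNotebook:
    """Hypothetical notebook-style runner. NOT a sandbox: code runs in this process."""

    def __init__(self):
        self._namespace = {}  # shared across cells, so variables persist

    def exec_cell(self, source):
        """Run a cell; return the value of its final expression, if it has one."""
        tree = ast.parse(source)
        last_expr = None
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            # Split off a trailing expression so it can be evaluated for its value.
            last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), self._namespace)
        if last_expr is not None:
            return eval(compile(last_expr, "<cell>", "eval"), self._namespace)
        return None


notebook = ToyNotebook()
notebook.exec_cell("x = 1")
value = notebook.exec_cell("x += 1; x")
print(value)  # 2
```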

For more on how to work with E2B, visit their official documentation.

Star the E2B repository ⭐

5. Autogen - A programming framework for AI agents

Building agentic workflows can quickly become overwhelming. Autogen, an open-source framework from Microsoft, simplifies building multi-agent workflows.

It lets you create AI agents with distinct roles and responsibilities that work through complex workflows together, much like a real-world team.

It supports multiple LLMs and conversation patterns for building effective AI agent swarms.

Quickly get started with Autogen by installing via pip.

pip install pyautogen

A small example of Autogen.

import os
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy", code_execution_config=False)

# Start the chat
user_proxy.initiate_chat(
    assistant,
    message="Tell me a joke about NVDA and TESLA stock prices.",
)
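Under the hood, a two-agent chat like this is essentially a turn-taking loop with a stop condition. The sketch below shows that pattern with only the standard library. It is not Autogen's implementation: `ToyAgent` and `run_chat` are hypothetical names, the replies are hard-coded lambdas standing in for LLM calls, and the "TERMINATE" stop word is simply a convention chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyAgent:
    """Hypothetical agent: a name plus a function that maps a message to a reply."""
    name: str
    reply: Callable[[str], str]


def run_chat(sender: ToyAgent, receiver: ToyAgent, message: str,
             max_turns: int = 4) -> List[Tuple[str, str]]:
    """Alternate turns between two agents; stop on 'TERMINATE' or after max_turns replies."""
    transcript = [(sender.name, message)]
    speaker, listener = receiver, sender
    for _ in range(max_turns):
        message = speaker.reply(message)
        transcript.append((speaker.name, message))
        if "TERMINATE" in message:
            break
        speaker, listener = listener, speaker
    return transcript


# Hard-coded replies stand in for LLM calls.
assistant = ToyAgent("assistant", lambda msg: f"Here is a draft answer to: {msg}")
user_proxy = ToyAgent("user_proxy", lambda msg: "Looks good. TERMINATE")

transcript = run_chat(user_proxy, assistant, "Tell me a joke.")
for name, text in transcript:
    print(f"{name}: {text}")
```

The `max_turns` cap matters in practice: without a hard limit, two agents can keep replying to each other indefinitely and run up your LLM bill.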

For more, refer to their documentation.

Star the Autogen repository ⭐

6. MindsDB - The platform for building AI from enterprise data

If you are building an enterprise application, MindsDB should be the go-to choice. It lets you deploy, serve, and fine-tune models in real time, utilizing data from databases, vector stores, or applications to build AI-powered apps, all with universal tools developers already know.

It acts as a bridge between the data in your various data stores and popular AI/ML frameworks such as LangChain, LlamaIndex, and AutoML.

Install MindsDB locally via Docker or Docker Desktop by following the instructions in their documentation.

Here is an example of a text-to-SQL agent, which lets you query your data sources with natural language.

-- Step 1. Connect a data source to MindsDB
CREATE DATABASE data_source
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "samples.mindsdb.com",
    "port": "5432",
    "database": "demo",
    "schema": "demo_data"
};

SELECT *
FROM data_source.car_sales;

-- Step 2. Create a skill
CREATE SKILL my_skill
USING
    type = 'text2sql',
    database = 'data_source',
    tables = ['car_sales'],
    description = 'car sales data of different car types';

SHOW SKILLS;

-- Step 3. Deploy a conversational model
CREATE ML_ENGINE langchain_engine
FROM langchain
USING
      openai_api_key = 'your openai-api-key';

CREATE MODEL my_conv_model
PREDICT answer
USING
    engine = 'langchain_engine',
    model_name = 'gpt-4',
    mode = 'conversational',
    user_column = 'question' ,
    assistant_column = 'answer',
    max_tokens = 100,
    temperature = 0,
    verbose = True,
    prompt_template = 'Answer the user input in a helpful way';

DESCRIBE my_conv_model;

-- Step 4. Create an agent
CREATE AGENT my_agent
USING
    model = 'my_conv_model',
    skills = ['my_skill'];

SHOW AGENTS;

-- Step 5. Query an agent
SELECT *
FROM my_agent
WHERE question = 'what is the average price of cars from 2018?';

SELECT *
FROM my_agent
WHERE question = 'what is the max mileage of cars from 2017?';

SELECT *
FROM my_agent
WHERE question = 'what percentage of sold cars (from 2016) are automatic/semi-automatic/manual cars?';

SELECT *
FROM my_agent
WHERE question = 'is petrol or diesel more common for cars from 2019?';

SELECT *
FROM my_agent
WHERE question = 'what is the most commonly sold model?';

For more on MindsDB, refer to the documentation.

Star the MindsDB repository ⭐

7. Firecrawl - Turn websites into LLM-ready data

Firecrawl is an open-source scraping and crawling tool that can convert any webpage into a structured markdown format. This format can subsequently feed LLMs for various applications, such as Q&A, further model fine-tuning, etc.

You can use their API for all the actions.

Here is how to crawl a web page and all the accessible sub-pages.

curl -X POST https://api.firecrawl.dev/v1/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 100,
      "scrapeOptions": {
        "formats": ["markdown", "html"]
      }
    }'
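If you would rather make that call from code, here is a rough Python equivalent of the curl request above, using only the standard library. It reuses the same endpoint, headers, and JSON body; `build_crawl_request` is a hypothetical helper name, and the sketch only builds the request, leaving the actual sending commented out because it needs a valid API key and network access.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/crawl"  # same endpoint as the curl call


def build_crawl_request(api_key: str, url: str, limit: int = 100) -> urllib.request.Request:
    """Build (but do not send) the POST request for a crawl job."""
    payload = {
        "url": url,
        "limit": limit,
        "scrapeOptions": {"formats": ["markdown", "html"]},
    }
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_crawl_request("fc-YOUR_API_KEY", "https://docs.firecrawl.dev")
print(request.get_method(), request.full_url)

# To actually submit the job (needs a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```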

Here is how to scrape a single page.

curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats" : ["markdown", "html"]
    }'

Mapping takes a single URL and returns the URLs found on that website.

curl -X POST https://api.firecrawl.dev/v1/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://firecrawl.dev"
    }'

For more information on Firecrawl, refer to the documentation.

Star the Firecrawl repository ⭐

8. LanceDB - Vector knowledge base for AI apps

If you are building AI apps, you will need a vector database to store and retrieve unstructured data like text, images, and videos. Unlike traditional databases, vector databases store embeddings of this data.

Embeddings are high-dimensional numerical representations of data. Vector databases use methods like similarity scores to retrieve relevant data.
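To see what similarity-based retrieval means in practice, here is a tiny brute-force sketch in plain Python. It scores every stored vector by its Euclidean (L2) distance to the query and returns the closest ones. The helper names `l2_distance` and `nearest` are made up for this example, the two-dimensional vectors are toy values, and real vector databases typically rely on approximate indexes instead of scanning every row.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k=1):
    """Return the k rows whose 'vector' is closest to the query (brute force)."""
    return sorted(rows, key=lambda row: l2_distance(query, row["vector"]))[:k]


# Toy two-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
rows = [
    {"item": "foo", "vector": [0.1, 0.2]},
    {"item": "bar", "vector": [1.1, 1.2]},
]
best = nearest([0.1, 0.3], rows, k=1)
print(best[0]["item"])
```

A smaller distance means a closer match, so the row whose vector sits nearest to the query comes back first.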

LanceDB is an open-source vector database written in Rust, with client libraries for Python and JavaScript/TypeScript. It offers production-scale vector search, multi-modal support, zero-copy access, automatic data versioning, GPU-powered querying, and more.

Get started with LanceDB.

npm install @lancedb/lancedb

Create and query a vector database.

import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("data/sample-lancedb");
const table = await db.createTable("vectors", [
    { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
    { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 },
], {mode: 'overwrite'});

const query = table.vectorSearch([0.1, 0.3]).limit(2);
const results = await query.toArray();

// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.query().where("price >= 10").toArray();

You can find more on LanceDB in their documentation.

Star the LanceDB repository ⭐

That's all folks!

Join DevFest as we celebrate the future of tech

TL;DR, we created a fantastic event where you can contribute code for any AI repository and get awesome swag.

Join here: https://devfest.ai

The rules are very basic.

Go to DevFest AI - create or join an existing squad. Contribute code for any repository containing AI.
Collect points, and by the end of the event, 100 people will get awesome swag (the number might go up!)
We will also run some fun giveaways for swag during the competition.

And, of course, share your fantastic ticket 🎫