Tech hiring is broken, and the competition has become increasingly fierce over the past two years. However, mastering niche technologies used by big companies and building innovative applications around them can significantly enhance your CV's credibility.
So, I've compiled a list of open-source libraries to help you stand out.
Feel free to explore these libraries and let others know in the comments about the ones your organization is implementing.
1. Composio 👑 - AI Tooling and Integrations Platform
AI is eating the world, and there is little dispute that the future workforce will be a human-AI hybrid. For that to happen, AI models need to be able to access external systems.
Composio is the industry-leading solution in this space. It provides an ever-expanding catalogue of tools and integrations across industry verticals, from CRM, HRM, and sales to development, productivity, and administration.
Easily integrate apps like GitHub, Slack, Jira, and Gmail with AI models to automate complex real-world workflows.
It has native support for Python and JavaScript.
Quickly get started with Composio using pip or npm.
pip install composio-core
npm install composio-core
Python
Add a GitHub integration.
composio add github
Composio handles user authentication and authorization on your behalf.
Here is how you can use the GitHub integration to star a repository.
from openai import OpenAI
from composio_openai import ComposioToolSet, Action, App
openai_client = OpenAI(api_key="******OPENAIKEY******")
# Initialise the Composio Tool Set
composio_toolset = ComposioToolSet(api_key="******COMPOSIO_API_KEY******")
# Get the pre-configured GitHub tools
actions = composio_toolset.get_actions(actions=[Action.GITHUB_ACTIVITY_STAR_REPO_FOR_AUTHENTICATED_USER])
my_task = "Star a repo ComposioHQ/composio on GitHub"
# Create a chat completion request to decide on the action
response = openai_client.chat.completions.create(
    model="gpt-4-turbo",
    tools=actions,  # Pass the actions we fetched earlier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": my_task},
    ],
)
# Execute the tool calls returned by the model so the repository actually gets starred
composio_toolset.handle_tool_calls(response)
Run this Python script to execute the given instruction using the agent.
JavaScript
You can install it using npm, yarn, or pnpm.
npm install composio-core
Define a method to let the user connect their GitHub account.
import { OpenAI } from "openai";
import { OpenAIToolSet } from "composio-core";
const toolset = new OpenAIToolSet({
    apiKey: process.env.COMPOSIO_API_KEY,
});

async function setupUserConnectionIfNotExists(entityId) {
    const entity = await toolset.client.getEntity(entityId);
    const connection = await entity.getConnection('github');

    if (!connection) {
        // If this entity/user hasn't connected their GitHub account yet,
        // start a new connection and ask them to complete the login flow
        const newConnection = await entity.initiateConnection('github');
        console.log("Log in via: ", newConnection.redirectUrl);
        return newConnection.waitUntilActive(60);
    }

    return connection;
}
Add the required tools to the OpenAI SDK and pass the entity name to the executeAgent function.
async function executeAgent(entityName) {
    const entity = await toolset.client.getEntity(entityName)
    await setupUserConnectionIfNotExists(entity.id);

    const tools = await toolset.get_actions({ actions: ["github_activity_star_repo_for_authenticated_user"] }, entity.id);
    const instruction = "Star a repo ComposioHQ/composio on GitHub"

    const client = new OpenAI({ apiKey: process.env.OPEN_AI_API_KEY })
    const response = await client.chat.completions.create({
        model: "gpt-4-turbo",
        messages: [{
            role: "user",
            content: instruction,
        }],
        tools: tools,
        tool_choice: "auto",
    })

    console.log(response.choices[0].message.tool_calls);
    await toolset.handle_tool_call(response, entity.id);
}
executeAgent("joey")
Execute the code and let the agent do the work for you.
Composio works with popular frameworks like LangChain, LlamaIndex, CrewAI, etc.
For more information, visit the official docs, and for even more complex examples, see the repository's example sections.
Star the Composio repository ⭐
2. Apache Kafka - Distributed Event Streaming Platform
Apache Kafka is the backbone of many Fortune 500 companies that require high-throughput event data pipelines. Having Kafka on your CV would undoubtedly make you stand out.
It is an open-source distributed platform for handling real-time data streams. It enables large volumes of event data collection, storage, and processing with high fault tolerance.
It is ideal for building event-driven systems. Big companies like Netflix, LinkedIn, and Uber use Kafka to stream real-time data and analytics, manage event-driven architectures and monitoring systems, and enable real-time recommendations and notifications.
Download the latest Kafka release and extract it to get started:
$ tar -xzf kafka_2.13-3.8.0.tgz
$ cd kafka_2.13-3.8.0
Set up Kafka with KRaft.
To use Kafka with KRaft, create a cluster UUID.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
Format the log directories.
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
Start the Kafka server.
bin/kafka-server-start.sh config/kraft/server.properties
Then, you can create topics and publish and consume events.
Before you can write events, you must create a topic. Run this command in another shell.
bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
Now, write some events to the topic.
bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
>This is my first event
>This is my second event
Read the events.
bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
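The console scripts are handy for a quick test, but in a real application you would produce and consume events from code. Here is a minimal sketch using the third-party kafka-python client (an assumption; the Kafka distribution itself only ships the shell tools shown above):
from kafka import KafkaProducer, KafkaConsumer

# Produce an event to the quickstart topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("quickstart-events", b"This is my third event")
producer.flush()

# Read the topic from the beginning and stop after 5 seconds of inactivity
consumer = KafkaConsumer(
    "quickstart-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.value.decode("utf-8"))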
For comprehensive details on Kafka and its use, refer to this article I wrote a while back.
Read more about Kafka here.
Explore the Kafka Mirror repository ⭐
3. Grafana - The Open Observability Platform
Grafana is another open-source software used by many big companies. It is an analytics and monitoring platform that allows you to query, store, and visualize metrics from multiple data sources. You can also create, explore, and share dashboards with your teams.
Features of Grafana include
- Metrics and logs visualization.
- Dynamic dashboards.
- Alerting via Slack, PagerDuty, etc., based on custom rules for metrics.
- Explore metrics through ad-hoc queries.
- Mix multiple data sources in the same graph.
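If you want to poke at a running Grafana instance programmatically, its HTTP API is a good entry point. Here is a minimal sketch using the requests library, assuming a local instance on the default port and a placeholder service account token:
import requests

GRAFANA_URL = "http://localhost:3000"
TOKEN = "glsa_placeholder_token"  # placeholder; create a service account token in the Grafana UI

# Health check (no authentication required)
print(requests.get(f"{GRAFANA_URL}/api/health").json())

# List configured data sources (requires the token)
headers = {"Authorization": f"Bearer {TOKEN}"}
for ds in requests.get(f"{GRAFANA_URL}/api/datasources", headers=headers).json():
    print(ds["name"], ds["type"])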
Check out the official documentation to explore Grafana in detail.
Explore the Grafana repository ⭐
4. Celery - Distributed task queue
Building a robust application can be challenging, especially when multiple events need to be accounted for. Celery can come in handy in these situations.
Celery is simple, flexible, distributed open-source software that facilitates real-time processing of task queues and scheduling. It lets you offload time-consuming tasks and execute them asynchronously in the background, improving your application's performance and scalability.
Celery itself is a Python library, but compatible clients and ports exist for other languages, including Node.js, Go, and Rust.
Celery uses message brokers like Redis and RabbitMQ.
Get started quickly by installing Celery and the Redis client with pip.
pip install celery redis
Start the Redis server in the background.
redis-server
Define a simple task, like sending an email.
from celery import Celery
# Define a Celery app with Redis as the message broker
app = Celery('tasks', broker='redis://localhost:6379/0')
# Define a simple task (e.g., sending an email)
@app.task
def send_email(recipient):
print(f"Sending email to {recipient}")
return f"Email sent to {recipient}"
Start the Celery worker by running the following command in the terminal:
celery -A tasks worker --loglevel=info
You can now call send_email asynchronously from your Python code. Create another Python script to call the task:
from tasks import send_email
# Call the task asynchronously using `.delay()`
send_email.delay('user@example.com')
Once you call send_email.delay(), the task will be processed by the Celery worker asynchronously, and you'll see something like this in the terminal where the Celery worker is running:
[2024-09-24 12:00:00,000: INFO/MainProcess] Task tasks.send_email[abc123] succeeded in 0.001s: 'Email sent to user@example.com'
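The example above is fire-and-forget. If you also want to read a task's return value, configure a result backend (Redis is assumed here as well) and call .get() on the AsyncResult; a minimal sketch:
from celery import Celery

# Same broker as before, plus Redis as a result backend so return values are stored
app = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')

@app.task
def add(x, y):
    return x + y

# .delay() returns an AsyncResult; .get() blocks until the worker has finished
result = add.delay(2, 3)
print(result.get(timeout=10))  # 5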
For more, refer to their official documentation.
Explore the Celery repository ⭐
5. Selenium - Browser Automation Framework
Browser automation is something you will almost inevitably encounter at some point in your tech career. Many companies use Selenium for multiple purposes, such as web automation, testing, and even scraping dynamic web content.
Selenium allows developers to interact with web browsers programmatically, simulating user actions like clicking buttons, filling out forms, and navigating between pages. This makes it an invaluable tool for testing web applications across browsers and platforms.
It is available in multiple programming languages, including Python, Java, C#, Ruby, and JavaScript.
Install Selenium in Python with pip.
pip install selenium
You need ChromeDriver for Chromium-based browsers and GeckoDriver for Firefox; recent Selenium releases can also download a matching driver for you automatically via Selenium Manager.
Here’s an example of using Selenium with ChromeDriver:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at your ChromeDriver executable (Selenium 4 syntax)
service = Service('/path/to/chromedriver')
driver = webdriver.Chrome(service=service)
# Open a webpage
driver.get("https://www.example.com")
# Perform actions (e.g., click a button, find elements, etc.)
print(driver.title) # Print the page title
# Close the browser
driver.quit()
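To give a flavour of the "simulate user actions" part, here is a small sketch that types into a search box and submits it. The URL and the element name q are placeholder assumptions, so adjust the locator for the site you are automating:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # recent Selenium versions fetch a matching driver automatically

driver.get("https://www.example.com")

# Locate an input field by its name attribute and type a query into it
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("open source libraries")
search_box.send_keys(Keys.RETURN)

driver.quit()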
For more, check the documentation.
Explore the Selenium repository ⭐
6. LlamaIndex - Data Framework for LLM Applications
AI is hot right now, and multiple companies are building products around AI models. There could not be a better time to be an AI developer.
LlamaIndex is a leading framework for building applications using large language models (LLMs). It lets you connect data stores, whether relational, graph, or vector databases, to LLMs. It provides all the bells and whistles, such as data loaders, connectors, chunkers, and re-rankers, to build efficient AI applications.
Quickly get started with LlamaIndex by installing it via pip.
pip install llama-index
A simple example of using a vector database in LlamaIndex.
# custom selection of integrations to work with core
pip install llama-index-core
pip install llama-index-llms-openai
pip install llama-index-llms-replicate
pip install llama-index-embeddings-huggingface
import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
Query the database.
query_engine = index.as_query_engine()
response = query_engine.query("YOUR_QUESTION")
print(response)
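Rebuilding the index re-embeds every document, which gets slow and costly as your data grows. Here is a minimal sketch of persisting the index to disk and loading it back later (the directory name is just a placeholder):
from llama_index.core import StorageContext, load_index_from_storage

# Save the index (embeddings, document store, etc.) to disk
index.storage_context.persist(persist_dir="./storage")

# ...later, reload it instead of rebuilding from the raw documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)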
For more information, please refer to their documentation.
Explore the Llama Index repository ⭐
7. PyTorch Lightning - The deep learning framework
Knowing PyTorch Lightning can help your cause if you are into AI model development.
It’s a versatile framework built with PyTorch that helps organize and grow deep learning projects. It offers tools for training, testing, and deploying models across different areas.
Here are some advantages of using Lightning over plain PyTorch:
- It makes PyTorch code easier to read, better organized, and more user-friendly.
- It reduces repetitive code by providing built-in training loops and utilities.
- It simplifies the process of training, experimenting, and deploying models with less unnecessary code.
You can get started with Lightning by installing it with pip:
pip install lightning
Define an auto-encoder using the Lightning module.
import os
from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning as L
# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))
# define the LightningModule
class LitAutoEncoder(L.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)
Load MNIST data.
# setup data
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)
The Lightning Trainer “mixes” any LightningModule with any dataset and abstracts away all the engineering complexity needed for scale.
# train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
trainer = L.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
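Once training finishes, you can use the trained encoder directly, for example to embed new images. A small sketch with random tensors standing in for real MNIST digits:
import torch

# Embed a batch of four fake flattened 28x28 images with the trained encoder
fake_image_batch = torch.rand(4, 28 * 28)
embeddings = autoencoder.encoder(fake_image_batch)
print("Embeddings shape:", embeddings.shape)  # torch.Size([4, 3])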
For more on Lightning, check out the official documentation.
Explore the PyTorch Lightning repository ⭐
8. PostHog - Open-source product analytics platform
A modern application is hardly complete without product analytics, and PostHog is a leading solution in this space. It offers tools to track user behaviour, measure engagement, and improve your application with actionable insights.
This is easily one of those libraries you will need all the time. They offer cloud and self-hosting solutions.
Some key features of PostHog include:
- Event Tracking: Track user interactions and behaviour in real-time.
- Session Recordings: Replay user sessions to understand how they navigate your app.
- Heatmaps: Visualize where users click and engage the most on your site.
- Feature Flags: Enable or disable features for specific user groups without redeploying code.
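To give a feel for the event-tracking piece, here is a minimal sketch using the posthog Python client; the project API key, host, user ID, and event name are placeholders, and the exact constructor arguments may differ slightly between client versions:
from posthog import Posthog

# Placeholder credentials; use your own project API key and instance host
posthog = Posthog(project_api_key="phc_placeholder", host="https://us.i.posthog.com")

# Record a custom event for a specific user, with arbitrary properties
posthog.capture("user_123", "signup_completed", {"plan": "free"})

# Flush queued events before the script exits
posthog.flush()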
For more, refer to the official documentation.
Explore the Posthog repository ⭐
9. Auth0 by Okta - Authentication and Authorization platform
Implementing authentication is essential for almost every application, and knowing how to do it well can easily make you stand out.
With Auth0, you can streamline the process, enabling secure login, user management, and multi-factor authentication with minimal effort.
Some of the crucial features of Auth0:
- Single Sign-On (SSO): Seamless login across multiple applications with a single credential.
- Multi-Factor Authentication (MFA): Adds extra security with multiple verification methods.
- Role-Based Access Control (RBAC): Manage user permissions based on assigned roles for secure access control.
- Social Login Integration: Easily integrate logins via Google, Facebook, and GitHub.
Auth0 SDK is available for almost all platforms and languages.
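As a concrete example of where this shows up in backend code, here is a hedged sketch that validates an Auth0-issued access token with the generic PyJWT library rather than an official Auth0 SDK; the tenant domain and API audience are placeholders:
import jwt
from jwt import PyJWKClient

AUTH0_DOMAIN = "your-tenant.us.auth0.com"      # placeholder tenant domain
API_AUDIENCE = "https://your-api.example.com"  # placeholder API identifier

def verify_token(token: str) -> dict:
    # Fetch the public signing key for this token from Auth0's JWKS endpoint
    jwks_client = PyJWKClient(f"https://{AUTH0_DOMAIN}/.well-known/jwks.json")
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    # Verify the signature, audience, and issuer; returns the token's claims
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=API_AUDIENCE,
        issuer=f"https://{AUTH0_DOMAIN}/",
    )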
Explore the Auth0 repository ⭐
Thank you for reading the listicle.
Let me know in the comments if you know of other essential open-source AI tools. ✨