<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mahmoud Mabrouk</title>
    <description>The latest articles on DEV Community by Mahmoud Mabrouk (@mmabrouk).</description>
    <link>https://dev.to/mmabrouk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1011914%2F279e6598-de85-4f6c-9783-4a2caab668a6.jpeg</url>
      <title>DEV Community: Mahmoud Mabrouk</title>
      <link>https://dev.to/mmabrouk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mmabrouk"/>
    <language>en</language>
    <item>
      <title>How to Actually Start Contributing to Open Source AI Projects (A Step-by-Step Guide)</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Wed, 17 Dec 2025 23:00:00 +0000</pubDate>
      <link>https://dev.to/mmabrouk/how-to-actually-start-contributing-to-open-source-ai-projects-a-step-by-step-guide-357d</link>
      <guid>https://dev.to/mmabrouk/how-to-actually-start-contributing-to-open-source-ai-projects-a-step-by-step-guide-357d</guid>
      <description>&lt;p&gt;So you've decided to contribute to open source. Great choice. But where do you actually start?&lt;/p&gt;

&lt;p&gt;This guide will walk you through the exact process I've seen work for dozens of contributors to &lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; and other AI projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Actually Start Contributing (The Step-by-Step Guide)
&lt;/h2&gt;

&lt;p&gt;You've picked a project. Now what? Here's exactly how to make your first contribution:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Build Trust (Week 1)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Don't jump straight into code.&lt;/strong&gt; Start small and build credibility:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Join the community&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join Slack/Discord&lt;/li&gt;
&lt;li&gt;Introduce yourself (mention you're looking to contribute)&lt;/li&gt;
&lt;li&gt;Read the last 50+ messages to understand the vibe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Fix documentation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found a typo? Fix it.&lt;/li&gt;
&lt;li&gt;Unclear setup instructions? Clarify them.&lt;/li&gt;
&lt;li&gt;Missing examples? Add one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These PRs get merged fast and show you can follow the contribution process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Help others&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer questions in issues/discussions&lt;/li&gt;
&lt;li&gt;Reproduce bugs for others&lt;/li&gt;
&lt;li&gt;Share your setup experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; Maintainers notice helpful community members. When you submit a code PR later, they'll remember you positively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Learn the Codebase (Weeks 2-3)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Don't just read the code. Use it:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Build something with it&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow tutorials&lt;/li&gt;
&lt;li&gt;Build a small project&lt;/li&gt;
&lt;li&gt;Try to break it (in a good way)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Run the tests&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up the dev environment&lt;/li&gt;
&lt;li&gt;Run the test suite&lt;/li&gt;
&lt;li&gt;Understand what's being tested&lt;/li&gt;
&lt;/ul&gt;
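&lt;p&gt;What "run the test suite" looks like depends on the project (usually &lt;code&gt;pytest&lt;/code&gt; or &lt;code&gt;python -m unittest&lt;/code&gt; from the repo root; check the docs). As a minimal self-contained illustration, with a toy function and test standing in for a real project's code:&lt;/p&gt;

```python
import unittest

# Toy function standing in for real project code.
def slugify(title):
    return "-".join(title.lower().split())

# Toy test case, the kind of thing you'll find in a project's tests/ directory.
class TestSlugify(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

# Run the suite programmatically; normally you'd just run `pytest` or
# `python -m unittest` from the repository root instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

&lt;p&gt;Reading a project's existing tests this way tells you both how the code is meant to behave and what style of test a maintainer will expect in your PR.&lt;/p&gt;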

&lt;p&gt;&lt;strong&gt;3. Read issues&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look at closed issues to understand common problems&lt;/li&gt;
&lt;li&gt;Read open issues to see what's needed&lt;/li&gt;
&lt;li&gt;Pay attention to "good first issue" labels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Study merged PRs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See what maintainers approve&lt;/li&gt;
&lt;li&gt;Notice code style patterns&lt;/li&gt;
&lt;li&gt;Understand the review process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Take notes! Document your setup process, confusing parts, and potential improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Make Your First Code Contribution (Weeks 3-4)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Start with the smallest possible change:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pick a good first issue&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look for labels: "good first issue," "help wanted," "beginner-friendly"&lt;/li&gt;
&lt;li&gt;Comment saying you'd like to work on it&lt;/li&gt;
&lt;li&gt;Ask questions if anything is unclear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Before writing code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discuss your approach in the issue&lt;/li&gt;
&lt;li&gt;Get feedback from maintainers&lt;/li&gt;
&lt;li&gt;Avoid surprises in the PR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Write the code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow the project's style guide&lt;/li&gt;
&lt;li&gt;Write tests (this is crucial!)&lt;/li&gt;
&lt;li&gt;Update documentation&lt;/li&gt;
&lt;li&gt;Keep it small (&amp;lt; 200 lines changed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Submit a great PR:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## What
[Brief description of what you changed]

## Why
[Why this change is needed; reference the issue]

## How
[Brief explanation of your approach]

## Testing
[How you tested this; include steps to reproduce]

Fixes #123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Be responsive:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Respond to review feedback quickly&lt;/li&gt;
&lt;li&gt;Don't take criticism personally&lt;/li&gt;
&lt;li&gt;Ask questions when you don't understand&lt;/li&gt;
&lt;li&gt;Say "thank you" when the PR is merged&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 4: Become a Regular Contributor (Ongoing)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Now you're part of the community:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Keep contributing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tackle slightly harder issues&lt;/li&gt;
&lt;li&gt;Help review other PRs (learn from them)&lt;/li&gt;
&lt;li&gt;Suggest improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Specialize:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Become the go-to person for a specific area&lt;/li&gt;
&lt;li&gt;"The evaluation metrics person"&lt;/li&gt;
&lt;li&gt;"The docs person"&lt;/li&gt;
&lt;li&gt;"The TypeScript integration person"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Build relationships:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thank maintainers for reviews&lt;/li&gt;
&lt;li&gt;Help onboard new contributors&lt;/li&gt;
&lt;li&gt;Participate in community calls/discussions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Think long-term:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This isn't about one PR. It's about becoming a trusted contributor.&lt;/li&gt;
&lt;li&gt;Some contributors become maintainers&lt;/li&gt;
&lt;li&gt;Some get hired by the company&lt;/li&gt;
&lt;li&gt;Some build their network and reputation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Don't Do This:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Submitting huge PRs without discussion&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I rewrote the entire evaluation system!"&lt;/li&gt;
&lt;li&gt;Maintainers will reject these almost instantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Ignoring contribution guidelines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not running tests&lt;/li&gt;
&lt;li&gt;Not following code style&lt;/li&gt;
&lt;li&gt;Not writing documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Getting discouraged by rejection&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your first PR might get rejected&lt;/li&gt;
&lt;li&gt;This is normal and part of learning&lt;/li&gt;
&lt;li&gt;Ask for feedback and try again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Working on issues already assigned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check if someone is already working on it&lt;/li&gt;
&lt;li&gt;Don't duplicate effort&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Being defensive about feedback&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code review is not personal criticism&lt;/li&gt;
&lt;li&gt;Learn from every comment&lt;/li&gt;
&lt;li&gt;Thank reviewers for their time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✅ Do This Instead:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Communicate early and often&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I'd like to work on this, here's my approach"&lt;/li&gt;
&lt;li&gt;"I'm stuck on X, can someone help?"&lt;/li&gt;
&lt;li&gt;"Thanks for the review, let me fix that"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Start small, think big&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First PR: Fix a typo&lt;/li&gt;
&lt;li&gt;Second PR: Add a test&lt;/li&gt;
&lt;li&gt;Third PR: Add a feature&lt;/li&gt;
&lt;li&gt;Tenth PR: Architect a major improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Be patient and persistent&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Good contributions take time&lt;/li&gt;
&lt;li&gt;Some PRs need multiple rounds of review&lt;/li&gt;
&lt;li&gt;That's how you learn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Help others&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer questions in issues&lt;/li&gt;
&lt;li&gt;Review other PRs (even as a beginner)&lt;/li&gt;
&lt;li&gt;Share what you learned&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Career Impact: Real Talk
&lt;/h2&gt;

&lt;p&gt;Let me share what I've seen happen to developers who contribute to open source:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Portfolio Effect
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional portfolio:&lt;/strong&gt; "Todo app, weather app, portfolio website"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open source portfolio:&lt;/strong&gt; "Contributed evaluation metrics to DeepEval, added RAG features to LlamaIndex, fixed critical bugs in Instructor"&lt;/p&gt;

&lt;p&gt;Which would you hire?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Network Effect
&lt;/h3&gt;

&lt;p&gt;One contributor I know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Started fixing docs in an AI project&lt;/li&gt;
&lt;li&gt;Became a regular contributor&lt;/li&gt;
&lt;li&gt;Got noticed by the maintainers&lt;/li&gt;
&lt;li&gt;Was referred for a job at their company&lt;/li&gt;
&lt;li&gt;Got hired as a junior ML engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Timeline: 4 months from first PR to job offer.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Learning Effect
&lt;/h3&gt;

&lt;p&gt;You'll learn more in 3 months of open source contributions than in 6 months of tutorials because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're reading production code&lt;/li&gt;
&lt;li&gt;You're getting feedback from senior engineers&lt;/li&gt;
&lt;li&gt;You're solving real problems&lt;/li&gt;
&lt;li&gt;You're learning tools and workflows used in industry&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Confidence Effect
&lt;/h3&gt;

&lt;p&gt;There's a massive difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I learned LangChain from a tutorial"&lt;/li&gt;
&lt;li&gt;"I contributed features to LlamaIndex that are used by thousands of developers"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One shows you can follow instructions. The other shows you can create value.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your Next Steps
&lt;/h2&gt;

&lt;p&gt;Here's your action plan for the next 30 days:&lt;/p&gt;

&lt;h3&gt;
  
  
  Week 1: Explore
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Pick 2-3 AI open source projects that interest you&lt;/li&gt;
&lt;li&gt;[ ] Star them on GitHub&lt;/li&gt;
&lt;li&gt;[ ] Read their README and documentation&lt;/li&gt;
&lt;li&gt;[ ] Join their community channels&lt;/li&gt;
&lt;li&gt;[ ] Try building something small with each&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Week 2: Setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Pick one project to focus on&lt;/li&gt;
&lt;li&gt;[ ] Set up the development environment&lt;/li&gt;
&lt;li&gt;[ ] Run the tests&lt;/li&gt;
&lt;li&gt;[ ] Build something with it&lt;/li&gt;
&lt;li&gt;[ ] Note any issues/improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Week 3: Contribute
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Find 3 "good first issues"&lt;/li&gt;
&lt;li&gt;[ ] Comment on one saying you'll work on it&lt;/li&gt;
&lt;li&gt;[ ] Submit your first PR (even if it's just docs)&lt;/li&gt;
&lt;li&gt;[ ] Respond to feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Week 4: Grow
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Submit a code contribution&lt;/li&gt;
&lt;li&gt;[ ] Help answer a question in issues&lt;/li&gt;
&lt;li&gt;[ ] Review another contributor's PR&lt;/li&gt;
&lt;li&gt;[ ] Plan your next contribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After 30 days, you'll have:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ At least 1 merged PR&lt;/li&gt;
&lt;li&gt;✅ Understanding of how professional development works&lt;/li&gt;
&lt;li&gt;✅ Connections in the AI/ML community&lt;/li&gt;
&lt;li&gt;✅ Something real to put on your resume&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Contributing to open source isn't just about giving back to the community (though that's great too).&lt;/p&gt;

&lt;p&gt;It's about &lt;strong&gt;investing in yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every PR you submit is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A line on your resume&lt;/li&gt;
&lt;li&gt;A conversation with a potential mentor&lt;/li&gt;
&lt;li&gt;A demonstration of your skills&lt;/li&gt;
&lt;li&gt;A step toward your dream job&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best time to start was yesterday. The second-best time is today.&lt;/p&gt;

&lt;p&gt;Pick a project. Join their community. Submit your first PR this week.&lt;/p&gt;

&lt;p&gt;Your future self will thank you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ready to Make Your First Contribution? Start with Agenta
&lt;/h2&gt;

&lt;p&gt;If you're looking for the perfect project to start your open source journey, I'd love to invite you to contribute to &lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Agenta is perfect for your first contribution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beginner-friendly community:&lt;/strong&gt; We have 70+ contributors, many of whom started as complete beginners&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active mentorship:&lt;/strong&gt; Our maintainers actively review PRs and mentor new contributors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-world impact:&lt;/strong&gt; Work on production features used by companies building LLM applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack learning:&lt;/strong&gt; Contribute to Python/FastAPI backend or React/TypeScript frontend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Good first issues:&lt;/strong&gt; We maintain a list of beginner-friendly issues to get you started&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Career opportunities:&lt;/strong&gt; Many of our contributors have landed ML engineer roles at top companies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you'll work on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM prompt management and evaluation&lt;/li&gt;
&lt;li&gt;Observability and tracing features&lt;/li&gt;
&lt;li&gt;UI/UX improvements&lt;/li&gt;
&lt;li&gt;New LLM provider integrations&lt;/li&gt;
&lt;li&gt;Documentation and examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Getting started is easy:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check out the &lt;a href="https://github.com/Agenta-AI/agenta/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Contributing Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Browse &lt;a href="https://github.com/Agenta-AI/agenta/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22" rel="noopener noreferrer"&gt;good first issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Join our &lt;a href="https://join.slack.com/t/agenta-hq/shared_invite/zt-37pnbp5s6-mbBrPL863d_oLB61GSNFjw" rel="noopener noreferrer"&gt;Slack community&lt;/a&gt; and introduce yourself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm a maintainer of Agenta, and I personally commit to helping you make your first contribution. Drop a message in our Slack, and let's get you started!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;github.com/Agenta-AI/agenta&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.firsttimersonly.com/" rel="noopener noreferrer"&gt;First Timers Only&lt;/a&gt;; Find beginner-friendly issues&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.guide/how-to-contribute/" rel="noopener noreferrer"&gt;How to Contribute to Open Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://skills.github.com/" rel="noopener noreferrer"&gt;GitHub Skills&lt;/a&gt;; Learn Git and GitHub&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Found this helpful? Drop a ❤️ and share it with other developers!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have questions or success stories? Drop them in the comments below! 👇&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>career</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>8 Beginner-Friendly AI Open Source Projects to Kickstart Your Career in 2026</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Wed, 17 Dec 2025 14:04:17 +0000</pubDate>
      <link>https://dev.to/mmabrouk/8-beginner-friendly-ai-open-source-projects-to-kickstart-your-career-in-2026-1d45</link>
      <guid>https://dev.to/mmabrouk/8-beginner-friendly-ai-open-source-projects-to-kickstart-your-career-in-2026-1d45</guid>
      <description>&lt;h2&gt;
  
  
  Why Contributing to AI Open Source is Your Secret Weapon
&lt;/h2&gt;

&lt;p&gt;If you're a junior developer trying to break into tech in 2026, I'm going to share something that changed my career: &lt;strong&gt;contributing to open source AI projects&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not taking more courses. Not building another todo app. &lt;strong&gt;Open source contributions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the truth that nobody tells you:&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 AI is Eating the World (And You Can Be Part of It)
&lt;/h3&gt;

&lt;p&gt;Every company (from startups to Fortune 500s) is racing to integrate AI into their products. But here's the problem: there aren't enough developers who actually understand how to build with LLMs in production.&lt;/p&gt;

&lt;p&gt;When you contribute to open source AI projects, you're not just learning theory. You're working with production-grade code that powers real applications. You're understanding how AI systems actually work under the hood. You're building expertise that companies desperately need right now.&lt;/p&gt;

&lt;h3&gt;
  
  
  💼 Your GitHub Profile Beats Your Resume
&lt;/h3&gt;

&lt;p&gt;I've reviewed hundreds of applications for junior roles. You know what makes candidates stand out?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not the degree. Not the bootcamp certificate. It's the GitHub profile.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I see a candidate with merged pull requests to popular AI projects, I know they can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read and understand complex codebases&lt;/li&gt;
&lt;li&gt;Collaborate with teams asynchronously&lt;/li&gt;
&lt;li&gt;Write code that passes real-world code reviews&lt;/li&gt;
&lt;li&gt;Take feedback and iterate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's exactly what we hire for.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 The Best Education is Free
&lt;/h3&gt;

&lt;p&gt;Forget paying $10,000 for bootcamps. The best AI/ML education is happening in public, for free, right now on GitHub.&lt;/p&gt;

&lt;p&gt;When you contribute to open source, you get mentorship from senior engineers reviewing your code. You get real-world experience with CI/CD, testing, and documentation. You get industry best practices baked into every PR. You get portfolio projects that actually matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌐 Network Effects
&lt;/h3&gt;

&lt;p&gt;Open source is how you meet the people who will change your career.&lt;/p&gt;

&lt;p&gt;Maintainers become mentors. Contributors become colleagues. GitHub issues become job referrals.&lt;/p&gt;

&lt;p&gt;I've seen it happen countless times. A junior contributor becomes a regular, gets noticed by the maintainers, and gets offered a job (or at least a strong referral).&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Choose Where to Contribute
&lt;/h2&gt;

&lt;p&gt;Before we dive into the projects, let's talk strategy. Not all open source projects are created equal, especially for beginners.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ What to Look For:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Active Maintenance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recent commits (within the last week)&lt;/li&gt;
&lt;li&gt;Maintainers who respond to issues and PRs&lt;/li&gt;
&lt;li&gt;Regular releases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Welcoming Community&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Good first issue" labels&lt;/li&gt;
&lt;li&gt;Contribution guidelines (&lt;code&gt;CONTRIBUTING.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Code of conduct&lt;/li&gt;
&lt;li&gt;Responsive on Slack/Discord&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Clear Documentation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setup instructions that actually work&lt;/li&gt;
&lt;li&gt;Architecture documentation&lt;/li&gt;
&lt;li&gt;API references&lt;/li&gt;
&lt;li&gt;Example code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Your Interests + Skills&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you like frontend? Pick projects with TypeScript/React&lt;/li&gt;
&lt;li&gt;Prefer backend? Look for Python/FastAPI projects&lt;/li&gt;
&lt;li&gt;Interested in a specific AI domain? (RAG, agents, evaluation, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Size Sweet Spot&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Avoid:&lt;/strong&gt; 100k+ star projects (too big, too complex)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid:&lt;/strong&gt; &amp;lt;500 star projects (might not be maintained)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sweet spot:&lt;/strong&gt; 3k-30k stars (active, manageable, impactful)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ❌ What to Avoid:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Projects by big tech (Google, Microsoft): often too complex and enterprise-focused&lt;/li&gt;
&lt;li&gt;Projects with no recent activity&lt;/li&gt;
&lt;li&gt;Projects with toxic communities (check how maintainers respond to issues)&lt;/li&gt;
&lt;li&gt;Projects with unclear contribution guidelines&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The 8 Best AI Open Source Projects for Beginners
&lt;/h2&gt;

&lt;p&gt;I've personally vetted these projects based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beginner-friendliness:&lt;/strong&gt; Clear codebase, good docs, welcoming community&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning value:&lt;/strong&gt; You'll gain practical AI/ML skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Career impact:&lt;/strong&gt; These technologies are in demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diversity:&lt;/strong&gt; Different categories so you can follow your interests&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. 🎯 Agenta (The Open-Source LLMOps Platform)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimagedelivery.net%2FUNvjPBCIZFONpkVPQTxVuA%2F6fa19a9d-9785-4acf-5d08-e81b1e38b100%2Flarge" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimagedelivery.net%2FUNvjPBCIZFONpkVPQTxVuA%2F6fa19a9d-9785-4acf-5d08-e81b1e38b100%2Flarge" alt="Agenta Platform" width="2839" height="1596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Agenta is a complete platform for building production-grade LLM applications. It provides prompt management, LLM evaluation, and observability. Everything you need to take an LLM app from prototype to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python, FastAPI, PostgreSQL, MongoDB, Redis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; TypeScript, React, Next.js, Ant Design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DevOps:&lt;/strong&gt; Docker, Docker Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 3,529&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 112&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Contributors:&lt;/strong&gt; 52+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can contribute to both backend (Python/FastAPI) and frontend (React/TypeScript), depending on your interests. This is full-stack learning.&lt;/p&gt;

&lt;p&gt;You'll work on features that companies actually use. Prompt versioning. A/B testing. Evaluation pipelines. Observability dashboards. These are real production features.&lt;/p&gt;

&lt;p&gt;The codebase is well-organized with clear separation between services. Great for learning microservices patterns. The team is super responsive to PRs and actively mentors contributors. The community already has 70+ contributors from around the world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building full-stack AI applications&lt;/li&gt;
&lt;li&gt;LLM prompt management and versioning&lt;/li&gt;
&lt;li&gt;Evaluation pipelines and metrics&lt;/li&gt;
&lt;li&gt;Observability and tracing patterns&lt;/li&gt;
&lt;li&gt;Working with vector databases&lt;/li&gt;
&lt;li&gt;CI/CD for AI applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎨 &lt;strong&gt;UI/UX improvements:&lt;/strong&gt; Make the playground more intuitive&lt;/li&gt;
&lt;li&gt;🧪 &lt;strong&gt;Add new evaluators:&lt;/strong&gt; Implement evaluation metrics (toxicity, bias, etc.)&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Improve setup guides, add tutorials&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;LLM integrations:&lt;/strong&gt; Add support for new providers (Anthropic, Mistral, etc.)&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Testing:&lt;/strong&gt; Improve test coverage&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Observability features:&lt;/strong&gt; Enhance tracing and monitoring&lt;/li&gt;
&lt;/ul&gt;
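&lt;p&gt;To make the evaluator idea concrete: at its simplest, an evaluator is a function that scores a model output against an expectation. This is a hypothetical illustration of the concept, not Agenta's actual evaluator interface:&lt;/p&gt;

```python
# Hypothetical sketch of an evaluation metric; Agenta's real interface differs.
def exact_match(output, expected):
    # Normalize whitespace and case, then score 1.0 on a match, 0.0 otherwise.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

print(exact_match("Paris ", "paris"))  # 1.0
print(exact_match("Lyon", "paris"))    # 0.0
```

&lt;p&gt;Real evaluators (toxicity, bias, LLM-as-judge) are more involved, but they share this shape: take an output, return a score.&lt;/p&gt;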

&lt;p&gt;&lt;strong&gt;Getting Started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the &lt;a href="https://github.com/Agenta-AI/agenta/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Contributing Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/Agenta-AI/agenta/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22" rel="noopener noreferrer"&gt;good first issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Join the &lt;a href="https://join.slack.com/t/agenta-hq/shared_invite/zt-37pnbp5s6-mbBrPL863d_oLB61GSNFjw" rel="noopener noreferrer"&gt;Slack community&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;github.com/Agenta-AI/agenta&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;👋 Full disclosure:&lt;/strong&gt; I'm one of the maintainers of Agenta, and I'm actively looking for contributors! We have a welcoming community and love mentoring junior developers. If you're interested in LLMOps, this is a great place to start.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. 🌐 LiteLLM (Universal LLM API Gateway)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; LiteLLM is like a universal adapter for LLMs. It lets you call 100+ LLM APIs (OpenAI, Anthropic, Gemini, Mistral, etc.) using the same OpenAI-compatible format. Plus, it includes cost tracking, load balancing, and fallbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Async Python, Pydantic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proxy:&lt;/strong&gt; FastAPI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 32,548&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 1,346&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Active community&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The library solves one problem really well: LLM API abstraction. This makes the codebase easy to understand. Adding support for a new LLM provider follows a clear, repeatable pattern.&lt;/p&gt;

&lt;p&gt;You'll understand how different LLM APIs work and how to build resilient API clients. The documentation has clear examples for every use case.&lt;/p&gt;
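&lt;p&gt;The abstraction is easy to picture: you send the same OpenAI-style payload and only swap the model string. The sketch below just builds the payloads, without importing LiteLLM or making network calls, and the model identifiers are illustrative:&lt;/p&gt;

```python
# In real code you would do:
#     from litellm import completion
#     completion(model=model, messages=messages)
# Here we only build the request payloads, to show the shape is identical.
messages = [{"role": "user", "content": "Say hello in one word."}]

def build_request(model):
    return {"model": model, "messages": messages}

# Illustrative identifiers; check LiteLLM's provider docs for exact model names.
models = ["gpt-4o-mini", "claude-3-haiku-20240307", "gemini/gemini-1.5-flash"]
requests = [build_request(m) for m in models]
print(len(requests))
```

&lt;p&gt;Everything else the project does (cost tracking, fallbacks, load balancing) is built on top of this one uniform interface, which is why the codebase is approachable.&lt;/p&gt;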

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working with async Python&lt;/li&gt;
&lt;li&gt;API design and abstraction patterns&lt;/li&gt;
&lt;li&gt;Error handling and retry logic&lt;/li&gt;
&lt;li&gt;Cost tracking and analytics&lt;/li&gt;
&lt;li&gt;Load balancing strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;➕ &lt;strong&gt;Add new LLM providers:&lt;/strong&gt; Support for new models/APIs&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Improve error handling:&lt;/strong&gt; Better error messages and retry logic&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Enhance cost tracking:&lt;/strong&gt; More accurate token counting&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Examples for specific use cases&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Coverage for edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;github.com/BerriAI/litellm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. 🤖 Pydantic AI (Agent Framework, The Pydantic Way)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Pydantic AI is an agent framework built by the creators of Pydantic (the most popular Python validation library). It brings type safety and validation to LLM agents, making them more reliable and easier to debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python 3.10+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Pydantic V2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations:&lt;/strong&gt; OpenAI, Anthropic, Gemini, Ollama, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 13,816&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 398&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Built by the Pydantic team&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Pydantic team is known for excellent code quality and documentation. If you know Python type hints and Pydantic, you can contribute. You'll learn how to build reliable AI agents with proper validation. It's a new project with lots of opportunities to contribute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building AI agents with proper validation&lt;/li&gt;
&lt;li&gt;Pydantic advanced patterns&lt;/li&gt;
&lt;li&gt;Type-safe LLM interactions&lt;/li&gt;
&lt;li&gt;Streaming structured outputs&lt;/li&gt;
&lt;li&gt;Tool calling and function execution&lt;/li&gt;
&lt;/ul&gt;
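&lt;p&gt;The core idea, validating model output against a declared type before your code touches it, fits in a short stdlib sketch. The names below are hypothetical; Pydantic AI itself validates with Pydantic models through its &lt;code&gt;Agent&lt;/code&gt; API:&lt;/p&gt;

```python
import json
from dataclasses import dataclass

@dataclass
class CityInfo:
    city: str
    population: int

def parse_agent_output(raw: str) -> CityInfo:
    """Turn a model response into a typed result, failing loudly on bad data.

    Stdlib sketch of the type-safety idea only; the real library
    does this with Pydantic validation, not manual isinstance checks.
    """
    data = json.loads(raw)
    if not isinstance(data.get("population"), int):
        raise TypeError("population must be an int")
    return CityInfo(city=data["city"], population=data["population"])

info = parse_agent_output('{"city": "Paris", "population": 2102650}')
print(info.city)  # -> Paris
```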

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📝 &lt;strong&gt;Examples and tutorials:&lt;/strong&gt; Show different agent patterns&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;LLM provider integrations:&lt;/strong&gt; Add new model support&lt;/li&gt;
&lt;li&gt;🧪 &lt;strong&gt;Validation patterns:&lt;/strong&gt; Improve retry/validation logic&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Expand guides and API docs&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Edge case coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/pydantic/pydantic-ai" rel="noopener noreferrer"&gt;github.com/pydantic/pydantic-ai&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4. 📊 RAGAS (RAG Evaluation Framework)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; RAGAS (Retrieval Augmented Generation Assessment) is a framework specifically designed for evaluating RAG pipelines. It provides metrics like Faithfulness, Context Relevancy, Answer Relevancy, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations:&lt;/strong&gt; LangChain, LlamaIndex, OpenAI, Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; NumPy, Pandas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 11,772&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 284&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Growing community&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAGAS specializes in RAG evaluation, making it easier to understand than general-purpose frameworks. Each evaluation metric is well-defined and isolated. This makes contributions straightforward.&lt;/p&gt;

&lt;p&gt;RAG is everywhere, and evaluating it is critical. You're solving a real problem. The documentation clearly explains what each metric measures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG architecture and patterns&lt;/li&gt;
&lt;li&gt;Evaluation metrics and methodologies&lt;/li&gt;
&lt;li&gt;Working with embeddings and vector similarity&lt;/li&gt;
&lt;li&gt;LLM-as-a-judge patterns&lt;/li&gt;
&lt;li&gt;Statistical analysis for AI systems&lt;/li&gt;
&lt;/ul&gt;
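&lt;p&gt;To make a metric like Faithfulness concrete, here is a deliberately naive word-overlap version. RAGAS's real metric uses an LLM judge to check each claim against the retrieved context; this toy only illustrates the shape of the computation:&lt;/p&gt;

```python
def toy_faithfulness(answer: str, contexts: list) -> float:
    """Score how much of the answer's vocabulary appears in the context.

    A simplified stand-in for a faithfulness-style metric; RAGAS
    itself judges claims with an LLM, not word overlap.
    """
    answer_words = set(answer.lower().split())
    context_words = set()
    for context in contexts:
        context_words.update(context.lower().split())
    if not answer_words:
        return 0.0
    supported = answer_words.intersection(context_words)
    return len(supported) / len(answer_words)

score = toy_faithfulness(
    "paris is the capital of france",
    ["paris is the capital and largest city of france"],
)
print(score)  # -> 1.0
```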

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📏 &lt;strong&gt;New evaluation metrics:&lt;/strong&gt; Implement new ways to measure RAG quality&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Improve existing metrics:&lt;/strong&gt; Make them more accurate/efficient&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Visualization:&lt;/strong&gt; Better reporting and dashboards&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; More examples and guides&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Validate metrics work correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/explodinggradients/ragas" rel="noopener noreferrer"&gt;github.com/explodinggradients/ragas&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5. 🎯 Instructor (Structured Outputs from LLMs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Instructor is the most popular Python library for getting structured outputs from LLMs. It uses Pydantic to validate and parse LLM responses, with automatic retries when validation fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python, TypeScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Pydantic, OpenAI SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supports:&lt;/strong&gt; 15+ LLM providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 11,995&lt;/li&gt;
&lt;li&gt;📥 &lt;strong&gt;Downloads:&lt;/strong&gt; 1M+ monthly&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 85&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instructor does one thing exceptionally well: structured LLM outputs. If you know Pydantic, you can contribute. The documentation has tons of examples for every use case.&lt;/p&gt;

&lt;p&gt;With over a million downloads every month, this is one of the most widely used libraries in the ecosystem. Your contributions matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pydantic validation patterns&lt;/li&gt;
&lt;li&gt;LLM function calling and tool use&lt;/li&gt;
&lt;li&gt;Retry logic and error handling&lt;/li&gt;
&lt;li&gt;Working with multiple LLM providers&lt;/li&gt;
&lt;li&gt;Type-safe API design&lt;/li&gt;
&lt;/ul&gt;
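&lt;p&gt;The validate-and-retry loop at the heart of this approach can be sketched with a stub model. This is a simplified illustration only; Instructor itself patches the LLM client and validates against Pydantic models via its &lt;code&gt;response_model&lt;/code&gt; parameter:&lt;/p&gt;

```python
import json

def structured_call(prompt, llm, validate, max_retries=2):
    """Ask a model for JSON, validate it, and retry with the error appended.

    Hypothetical sketch of the pattern; not Instructor's actual code.
    """
    for _ in range(max_retries + 1):
        raw = llm(prompt)
        try:
            data = json.loads(raw)  # raises ValueError on malformed JSON
            validate(data)          # raises KeyError on missing fields
            return data
        except (ValueError, KeyError) as exc:
            # Feed the failure back so the next attempt can self-correct.
            prompt += f"\nPrevious output was invalid: {exc}. Return valid JSON."
    raise RuntimeError("no valid output after retries")

# Stub model: returns junk first, then valid JSON on the retry.
responses = iter(["not json", '{"name": "Ada", "age": 36}'])

def stub_llm(_prompt):
    return next(responses)

def validate(data):
    if "name" not in data:
        raise KeyError("name")

result = structured_call("Extract the person.", stub_llm, validate)
print(result["name"])  # -> Ada
```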

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📝 &lt;strong&gt;Examples and recipes:&lt;/strong&gt; Show how to solve specific problems&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;Provider support:&lt;/strong&gt; Add/improve LLM provider integrations&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Better error messages:&lt;/strong&gt; Help users debug validation issues&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Tutorials and guides&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Coverage for different providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/instructor-ai/instructor" rel="noopener noreferrer"&gt;github.com/instructor-ai/instructor&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6. 🧪 DSPy (Programming, Not Prompting, LLMs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; DSPy (from Stanford NLP) shifts the paradigm from manual prompt engineering to programming with structured modules. It includes algorithms to automatically optimize prompts based on your data and metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; PyTorch-style modular design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research:&lt;/strong&gt; Stanford NLP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 30,825&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 404&lt;/li&gt;
&lt;li&gt;🎓 &lt;strong&gt;Stanford research project&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You'll learn a completely different way of working with LLMs, which helps you stand out. The code is clean and well-documented, as you'd expect from a Stanford research group. You'll understand how to systematically improve prompts instead of guessing.&lt;/p&gt;

&lt;p&gt;This is an academic project with community governance (not enterprise-controlled).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt optimization algorithms&lt;/li&gt;
&lt;li&gt;Modular LLM programming&lt;/li&gt;
&lt;li&gt;Evaluation-driven development&lt;/li&gt;
&lt;li&gt;Bayesian optimization&lt;/li&gt;
&lt;li&gt;Meta-learning concepts&lt;/li&gt;
&lt;/ul&gt;
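&lt;p&gt;To see what "programming, not prompting" means, here is a toy signature that renders a prompt from declared fields. DSPy's real &lt;code&gt;Signature&lt;/code&gt; and &lt;code&gt;Predict&lt;/code&gt; go much further, compiling few-shot examples and optimizing instructions automatically; this sketch is hypothetical:&lt;/p&gt;

```python
class ToySignature:
    """Declare inputs and outputs instead of hand-writing a prompt string.

    Minimal illustration of the signature idea; not DSPy's implementation.
    """
    def __init__(self, spec: str):
        inputs, output = spec.split("->")
        self.inputs = [s.strip() for s in inputs.split(",")]
        self.output = output.strip()

    def render(self, **values) -> str:
        # One line per input field, then the output field for the model to fill.
        lines = [f"{name}: {values[name]}" for name in self.inputs]
        lines.append(f"{self.output}:")
        return "\n".join(lines)

sig = ToySignature("question -> answer")
prompt = sig.render(question="What is 2 + 2?")
print(prompt)
# question: What is 2 + 2?
# answer:
```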

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧮 &lt;strong&gt;New optimizers:&lt;/strong&gt; Implement optimization algorithms&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Examples:&lt;/strong&gt; Show DSPy patterns for different use cases&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Metrics:&lt;/strong&gt; Add new evaluation metrics&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Tutorials and guides&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Bug fixes:&lt;/strong&gt; Improve stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/stanfordnlp/dspy" rel="noopener noreferrer"&gt;github.com/stanfordnlp/dspy&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  7. ✅ DeepEval ("Pytest for LLMs")
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; DeepEval is an evaluation framework that works like pytest. Write tests for your LLM applications and run them in CI/CD. It includes 14+ metrics for RAG, fine-tuning, hallucination detection, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python 3.9+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Pydantic, pytest patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations:&lt;/strong&gt; OpenAI, Anthropic, Cohere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 12,632&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 226&lt;/li&gt;
&lt;li&gt;👥 &lt;strong&gt;Growing community&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you know pytest, you'll understand DeepEval immediately. Each metric is isolated and well-defined. Every LLM app needs evaluation (you're solving a universal problem).&lt;/p&gt;

&lt;p&gt;The metrics tell you &lt;em&gt;why&lt;/em&gt; the score is what it is. This makes debugging easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM evaluation methodologies&lt;/li&gt;
&lt;li&gt;Writing test frameworks&lt;/li&gt;
&lt;li&gt;Metrics design and implementation&lt;/li&gt;
&lt;li&gt;CI/CD for AI applications&lt;/li&gt;
&lt;li&gt;LLM-as-a-judge patterns&lt;/li&gt;
&lt;/ul&gt;
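&lt;p&gt;The pytest-style workflow looks roughly like this sketch, with a toy length-based metric standing in for DeepEval's LLM-judged ones. The function names here are hypothetical:&lt;/p&gt;

```python
def minimum_score(metric_fn, output, threshold):
    """Fail a test when a quality metric drops below a threshold,
    mirroring how evaluation metrics plug into pytest-style suites."""
    score = metric_fn(output)
    assert score >= threshold, f"score {score:.2f} below threshold {threshold}"
    return score

def conciseness(output: str) -> float:
    # Toy metric: shorter answers score higher (floored at 0.0).
    return max(0.0, 1.0 - len(output.split()) / 50)

score = minimum_score(conciseness, "Paris is the capital of France.", 0.5)
print(round(score, 2))  # -> 0.88
```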

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📏 &lt;strong&gt;New evaluation metrics:&lt;/strong&gt; Implement metrics for specific use cases&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Improve existing metrics:&lt;/strong&gt; Make them more accurate&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Better reporting:&lt;/strong&gt; Improve test output and visualization&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Guides and examples&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Meta-tests for the testing framework!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;github.com/confident-ai/deepeval&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  8. 🛡️ Guardrails AI (Reliable AI Applications)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Guardrails AI helps you build reliable AI applications by validating inputs and outputs. It includes a hub of pre-built validators (toxicity, PII detection, factual consistency, etc.) that you can combine into "guards."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python, JavaScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; Pydantic validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hub:&lt;/strong&gt; 50+ pre-built validators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Stars:&lt;/strong&gt; 6,155&lt;/li&gt;
&lt;li&gt;🐛 &lt;strong&gt;Open Issues:&lt;/strong&gt; 18&lt;/li&gt;
&lt;li&gt;🔐 &lt;strong&gt;Focus:&lt;/strong&gt; AI safety and reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's perfect for beginners:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI safety is critical and getting more attention. Each validator is isolated, making it easy to contribute new ones. The use cases are practical: PII detection, toxicity checking, hallucination detection. You can contribute in either Python or JavaScript.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI safety and validation patterns&lt;/li&gt;
&lt;li&gt;Input/output filtering&lt;/li&gt;
&lt;li&gt;PII detection algorithms&lt;/li&gt;
&lt;li&gt;Toxicity classification&lt;/li&gt;
&lt;li&gt;Building validation pipelines&lt;/li&gt;
&lt;/ul&gt;
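&lt;p&gt;A single validator can be as small as a regex. This stdlib sketch redacts email addresses before text reaches (or leaves) an LLM; Guardrails Hub validators cover many more risks and compose into guards:&lt;/p&gt;

```python
import re

# Simple email pattern for illustration; production PII detection
# handles many more formats and entity types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL.sub("[EMAIL]", text)

print(redact_pii("Contact ada@example.com for details."))
# -> Contact [EMAIL] for details.
```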

&lt;p&gt;&lt;strong&gt;Contribution Opportunities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🛡️ &lt;strong&gt;New validators:&lt;/strong&gt; Add validators for specific risks&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Improve existing validators:&lt;/strong&gt; Better accuracy/performance&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;Integrations:&lt;/strong&gt; Support for new LLM providers&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Documentation:&lt;/strong&gt; Security best practices&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Tests:&lt;/strong&gt; Validate the validators!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/guardrails-ai/guardrails" rel="noopener noreferrer"&gt;github.com/guardrails-ai/guardrails&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started Today
&lt;/h2&gt;

&lt;p&gt;Pick one project from this list. Star it on GitHub. Read the README. Join their community.&lt;/p&gt;

&lt;p&gt;Then make your first contribution this week.&lt;/p&gt;

&lt;p&gt;The best time to start was yesterday. The second-best time is now.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm a maintainer of &lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt;, an open-source LLMOps platform with 70+ contributors from around the world. I've seen firsthand how open source contributions can transform careers. Many of our contributors have gone on to land ML engineer roles at top companies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're interested in contributing to Agenta or have questions about getting started with open source, feel free to reach out on our &lt;a href="https://join.slack.com/t/agenta-hq/shared_invite/zt-37pnbp5s6-mbBrPL863d_oLB61GSNFjw" rel="noopener noreferrer"&gt;Slack community&lt;/a&gt;. We love mentoring junior developers!&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.firsttimersonly.com/" rel="noopener noreferrer"&gt;First Timers Only&lt;/a&gt;; Find beginner-friendly issues&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.guide/how-to-contribute/" rel="noopener noreferrer"&gt;How to Contribute to Open Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://skills.github.com/" rel="noopener noreferrer"&gt;GitHub Skills&lt;/a&gt;; Learn Git and GitHub&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Found this helpful? Drop a ❤️ and share it with other developers looking to break into AI!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have questions or success stories? Drop them in the comments below! 👇&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources &amp;amp; Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.instaclustr.com/education/open-source-ai/top-10-open-source-llms-for-2025/" rel="noopener noreferrer"&gt;Top 10 open source LLMs for 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.projectpro.io/article/llm-project-ideas/881" rel="noopener noreferrer"&gt;40 LLM Projects to Upgrade Your AI Skillset in 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://greennode.ai/blog/best-open-source-ai-platforms" rel="noopener noreferrer"&gt;Best Open-Source AI Platforms for 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/codeelevation/top-10-open-source-machine-learning-projects-every-developer-should-know-2025-4b056ba32543" rel="noopener noreferrer"&gt;Top Open Source Machine Learning Projects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Agenta-AI/agenta" rel="noopener noreferrer"&gt;GitHub - Agenta&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;GitHub - LiteLLM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pydantic/pydantic-ai" rel="noopener noreferrer"&gt;GitHub - Pydantic AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/explodinggradients/ragas" rel="noopener noreferrer"&gt;GitHub - RAGAS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/instructor-ai/instructor" rel="noopener noreferrer"&gt;GitHub - Instructor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/stanfordnlp/dspy" rel="noopener noreferrer"&gt;GitHub - DSPy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;GitHub - DeepEval&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/guardrails-ai/guardrails" rel="noopener noreferrer"&gt;GitHub - Guardrails AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>beginners</category>
      <category>career</category>
    </item>
    <item>
      <title>Create a Custom Playground to your LLM application</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Wed, 16 Apr 2025 15:59:43 +0000</pubDate>
      <link>https://dev.to/agenta/create-a-custom-playground-to-your-llm-application-21j0</link>
      <guid>https://dev.to/agenta/create-a-custom-playground-to-your-llm-application-21j0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is part of Agenta Launch Week (April 14-18, 2025), where we're announcing new features daily.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Agenta?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; is an open-source platform built for AI engineers and teams working with large language models. Our playground, evaluation tools, and deployment solutions help streamline the entire LLM application development process. As an open-source tool, Agenta gives developers complete control over their AI infrastructure while providing enterprise-grade features for teams of all sizes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Custom Workflows
&lt;/h2&gt;

&lt;p&gt;LLM applications aren't just single prompts. They combine complex workflow logic with retrieved documents, multiple LLM calls, and many tunable parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG applications have embedding models, top-K values, and reranking settings&lt;/li&gt;
&lt;li&gt;Agent workflows include reasoning steps and tool calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When Subject Matter Experts (SMEs) like lawyers or doctors test prompts in isolation, the results often differ from what happens in the complete application or in production. The experts can't access the full application context or modify critical parameters. As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SMEs working on prompts without full application context get unexpected results when those prompts run in the actual application&lt;/li&gt;
&lt;li&gt;SMEs cannot change important parameters beyond the prompt (tool descriptions, embedding models, top-k values in RAG)&lt;/li&gt;
&lt;li&gt;Time is wasted between developers and SMEs handing over prompt configurations only to find different results&lt;/li&gt;
&lt;li&gt;Engineering bottlenecks form when only developers can run evaluations while SMEs can only work on single prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates unnecessary back-and-forth between developers and domain experts, slowing down development cycles.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Custom Workflows does
&lt;/h2&gt;

&lt;p&gt;Custom Workflows allows you to connect your application to Agenta with minimal changes. Once connected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your team will have access to a playground where they can modify any parameter you expose: prompts, embedding models, top-K values, etc.&lt;/li&gt;
&lt;li&gt;You can version, compare, and deploy the entire configuration schema you specify to your application&lt;/li&gt;
&lt;li&gt;SMEs can not only run the app with different configs but also run evaluations and annotations end-to-end for the app&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to implement it
&lt;/h2&gt;

&lt;p&gt;To connect your application to Agenta, you expose an endpoint in your application using our Python SDK. Agenta reads your configuration schema from that endpoint and generates the appropriate playground for your app, letting your team modify parameters without touching code.&lt;/p&gt;

&lt;p&gt;Here's an example of our SDK in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agenta&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agenta.sdk.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;prompt1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the following blog post: {blog_post}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;prompt2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a tweet based on this: {output_1}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# We define the configuration of the app using a Pydantic Class
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CoPConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;prompt2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Agenta will interact with your code using this route
&lt;/span&gt;&lt;span class="nd"&gt;@ag.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CoPConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blog_post&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# config contains the context fetched from the request
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_from_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CoPConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;formatted_prompt1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blog_post&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blog_post&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;formatted_prompt1&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="n"&gt;formatted_prompt2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;formatted_prompt2&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can read the documentation for more details, or watch the announcement for a short demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Solution
&lt;/h2&gt;

&lt;p&gt;Behind the scenes, our SDK creates an endpoint authenticated with your Agenta credentials that Agenta uses to communicate with your app. The SDK uses FastAPI to create an &lt;code&gt;openapi.json&lt;/code&gt; that exposes the Pydantic class you define for your configuration. Agenta uses the &lt;code&gt;openapi.json&lt;/code&gt; to discover the config schema and generate the UI playground. It then uses that endpoint to communicate with the application.&lt;/p&gt;
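&lt;p&gt;The schema-discovery idea can be illustrated with a stdlib sketch that derives a JSON-schema-like description from a config class. This is not the SDK's code; Agenta gets the real schema for free from FastAPI and Pydantic via &lt;code&gt;openapi.json&lt;/code&gt;:&lt;/p&gt;

```python
from dataclasses import dataclass, fields

# Map Python annotation types to JSON Schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclass
class CoPConfig:
    prompt1: str = "Summarize the following blog post: {blog_post}"
    prompt2: str = "Write a tweet based on this: {output_1}"

def config_schema(cls) -> dict:
    """Derive a JSON-schema-like description from a config class.

    Stdlib sketch of schema discovery only, not the Agenta SDK's code.
    """
    props = {
        f.name: {"type": PY_TO_JSON[f.type], "default": f.default}
        for f in fields(cls)
    }
    return {"title": cls.__name__, "type": "object", "properties": props}

schema = config_schema(CoPConfig)
print(schema["properties"]["prompt1"]["type"])  # -> string
```

&lt;p&gt;From a description like this, a UI can render the right widget for each field and send edited values back with every request.&lt;/p&gt;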

&lt;h2&gt;
  
  
  How to get started
&lt;/h2&gt;

&lt;p&gt;Check out &lt;a href="https://docs.agenta.ai/custom-workflows" rel="noopener noreferrer"&gt;our documentation&lt;/a&gt; for step-by-step instructions and more examples.&lt;/p&gt;

&lt;p&gt;Custom Workflows is the second of five major features we're releasing this week. Stay tuned for more tomorrow.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⭐ Star Agenta
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Consider giving us a star!&lt;/strong&gt; It helps us grow our community and gets Agenta in front of more developers.&lt;br&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;

  &lt;img alt="Star us" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F2c8e580a-c930-4312-bf1b-08f631b41c62" width="864" height="364"&gt;
    &lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llmops</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>Compare All LLMs in one Playground: Introducing Agenta's AI Model Hub</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Tue, 15 Apr 2025 16:13:17 +0000</pubDate>
      <link>https://dev.to/agenta/compare-all-llms-in-one-playground-introducing-agentas-ai-model-hub-3o66</link>
      <guid>https://dev.to/agenta/compare-all-llms-in-one-playground-introducing-agentas-ai-model-hub-3o66</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is part of Agenta Launch Week (April 14-18, 2025), where we're announcing new features daily.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Agenta?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; is an open-source platform built for AI engineers and teams working with large language models. Our playground, evaluation tools, and deployment solutions help streamline the entire LLM application development process. As an open-source tool, Agenta gives developers complete control over their AI infrastructure while providing enterprise-grade features for teams of all sizes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing AI Model Hub
&lt;/h2&gt;

&lt;p&gt;When working with LLMs, comparing different models is essential. Until now, our users faced several challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You had fine-tuned models for your specific use cases but couldn't use them in Agenta's playground or evaluations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Many of you run self-hosted models for privacy, security, or cost reasons – through Ollama or dedicated servers – but couldn't connect them to Agenta.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise users who rely on Azure OpenAI and AWS Bedrock had to switch between platforms when using Agenta.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Teams had no way to share model access without sharing raw API keys, creating security risks and complicating collaboration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Introducing Model Hub: Your central connection point
&lt;/h3&gt;

&lt;p&gt;Model Hub gives you a central place in Agenta to manage all your model connections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect any model&lt;/strong&gt;: Azure OpenAI, AWS Bedrock, self-hosted models, fine-tuned models – any model with an OpenAI-compatible API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure once, use everywhere&lt;/strong&gt;: Set up your models once and use them in playground experiments, evaluations, and deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure team sharing&lt;/strong&gt;: Give your team access to models without sharing API keys. Control permissions per model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistent experience&lt;/strong&gt;: Use the same Agenta interface for all your models.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/BOKkpOk-3oM"&gt;
&lt;/iframe&gt;
&lt;br&gt;
You'll find Model Hub under Configuration in your Agenta interface:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select your provider (OpenAI, Azure, AWS, Ollama, etc.)&lt;/li&gt;
&lt;li&gt;Enter your API keys&lt;/li&gt;
&lt;li&gt;Connect to your self-hosted models&lt;/li&gt;
&lt;li&gt;Share access with teammates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once configured, your models appear throughout Agenta. Run comparisons in the playground, test performance in evaluations, and deploy with confidence – using models from any provider.&lt;/p&gt;
&lt;h3&gt;
  
  
  Security is built-in
&lt;/h3&gt;

&lt;p&gt;When building Model Hub, we prioritized the security of your API keys and credentials:&lt;/p&gt;

&lt;p&gt;Your model keys and credentials are encrypted at rest in our database and protected in transit with TLS encryption. We never log these sensitive details, and they're only held in memory for the minimum time needed to process requests.&lt;/p&gt;

&lt;p&gt;Currently, access to models is managed at the project level, with all team members working on a project able to use the configured models. Every access to these credentials requires proper authentication with tokens tied to specific users and projects.&lt;/p&gt;

&lt;p&gt;For Business and Enterprise customers, Role-Based Access Control (RBAC) provides additional security by letting you control exactly who on your team can view or modify model configurations.&lt;/p&gt;

&lt;p&gt;We've designed this system from the ground up with security best practices, ensuring your valuable API keys and model access credentials remain protected.&lt;/p&gt;
&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;The AI Model Hub is just the first feature in our Launch Week. Four more announcements are coming in the next few days to improve your LLM development workflow.&lt;/p&gt;

&lt;p&gt;Ready to try it? Check out &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;Agenta on GitHub&lt;/a&gt; or &lt;a href="https://app.agenta.ai" rel="noopener noreferrer"&gt;log in to your Agenta account&lt;/a&gt; to set up your Model Hub.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Stay tuned for tomorrow's announcement!&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  ⭐ Star Agenta
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Consider giving us a star!&lt;/strong&gt; It helps us grow our community and gets Agenta in front of more developers.&lt;br&gt;
&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;

  &lt;img alt="Star us" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F2c8e580a-c930-4312-bf1b-08f631b41c62" width="864" height="364"&gt;
    &lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>Build an AI code review assistant with v0.dev, litellm and Agenta</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Mon, 13 Jan 2025 19:49:47 +0000</pubDate>
      <link>https://dev.to/agenta/build-an-ai-code-review-assistant-with-v0dev-litellm-and-agenta-1ojc</link>
      <guid>https://dev.to/agenta/build-an-ai-code-review-assistant-with-v0dev-litellm-and-agenta-1ojc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The code for this tutorial is available &lt;a href="https://github.com/Agenta-AI/agenta/tree/main/examples/custom_workflows/ai-code-reviewer" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ever wanted your own AI assistant to review pull requests? In this tutorial, we'll build one from scratch and take it to production. We'll create an AI assistant that can analyze PR diffs and provide meaningful code reviews—all while following LLMOps best practices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://7acthbzemz2hgvi8l7us455b6tcigd8s.vercel.app/" rel="noopener noreferrer"&gt;You can try out the final product here&lt;/a&gt;. Just provide the URL to a public PR and receive a review from our AI assistant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qmyinnc9p6g2tj5uhkx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qmyinnc9p6g2tj5uhkx.gif" alt="Code review demo" width="1024" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'll build
&lt;/h2&gt;

&lt;p&gt;This tutorial walks through creating a production-ready AI assistant. Here's what we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Writing the Code:&lt;/strong&gt; Fetching the PR diff from GitHub and calling an LLM using &lt;a href="https://www.litellm.ai/" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adding Observability:&lt;/strong&gt; Instrumenting the code with &lt;a href="https://agenta.ai" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; to debug and monitor our app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Engineering:&lt;/strong&gt; Refining prompts and comparing different models using &lt;a href="https://agenta.ai" rel="noopener noreferrer"&gt;Agenta's&lt;/a&gt; playground.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Evaluation:&lt;/strong&gt; Using LLM-as-a-judge to evaluate prompts and select the best model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Deploying the app as an API and building a simple UI with &lt;a href="https://v0.dev" rel="noopener noreferrer"&gt;v0.dev&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing the core logic
&lt;/h2&gt;

&lt;p&gt;Our AI assistant's workflow is straightforward: When given a PR URL, it fetches the diff from GitHub and passes it to an LLM for review. Let's break this down step by step.&lt;/p&gt;

&lt;p&gt;First, we'll fetch the PR diff. GitHub provides this in an easily accessible format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://patch-diff.githubusercontent.com/raw/{owner}/{repo}/pull/{pr_number}.diff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a Python function to retrieve the diff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_pr_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Fetch the diff for a GitHub Pull Request given its URL.

    Args:
        pr_url (str): Full GitHub PR URL (e.g., https://github.com/owner/repo/pull/123)

    Returns:
        str: The PR diff text
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github\.com/([^/]+)/([^/]+)/pull/(\d+)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid GitHub PR URL format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pr_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;api_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://patch-diff.githubusercontent.com/raw/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/pull/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pr_number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.diff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/vnd.github.v3.diff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PR-Diff-Fetcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
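&lt;p&gt;To make the URL parsing concrete, here is the pattern from &lt;code&gt;get_pr_diff&lt;/code&gt; applied to a hypothetical PR URL:&lt;/p&gt;

```python
import re

# Same pattern as in get_pr_diff above.
pattern = r"github\.com/([^/]+)/([^/]+)/pull/(\d+)"
match = re.search(pattern, "https://github.com/octocat/hello-world/pull/42")
owner, repo, pr_number = match.groups()

# These three pieces are all we need to build the .diff URL.
diff_url = f"https://patch-diff.githubusercontent.com/raw/{owner}/{repo}/pull/{pr_number}.diff"
print(diff_url)  # → https://patch-diff.githubusercontent.com/raw/octocat/hello-world/pull/42.diff
```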



&lt;p&gt;Next, we'll use LiteLLM to handle our interactions with language models. &lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; provides a unified interface for working with various LLM providers—making it easy to experiment with different models later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are an expert Python developer performing a file-by-file review of a pull request. You have access to the full diff of the file to understand the overall context and structure. However, focus on reviewing only the specific hunk provided.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt_user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Here is the diff for the file:
{diff}

Please provide a critique of the changes made in this file.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_critique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_pr_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding observability
&lt;/h2&gt;

&lt;p&gt;Observability is crucial for understanding and improving LLM applications. It helps you track inputs, outputs, and the information flow, making debugging easier. We'll use Agenta for this purpose.&lt;/p&gt;

&lt;p&gt;Agenta is an open-source LLMOps platform that provides you with all the tools needed to build production-ready LLM-powered applications. It offers a centralized environment to manage prompts, instrument applications for tracking inputs and outputs, and run evaluations to assess result quality. You can sign up for free on the &lt;a href="https://cloud.agenta.ai" rel="noopener noreferrer"&gt;cloud platform&lt;/a&gt; or &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;self-host our platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6fa0ywll57xln0vnlkx.png" alt="Star our repository" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, we initialize Agenta and set up LiteLLM callbacks. The callback automatically instruments all the LiteLLM calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agenta&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;

&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;litellm_handler&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we add instrumentation decorators to both functions (&lt;code&gt;generate_critique&lt;/code&gt; and &lt;code&gt;get_pr_diff&lt;/code&gt;) to capture their inputs and outputs. Here's how it looks for the &lt;code&gt;generate_critique&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@ag.instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_critique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_pr_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_from_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To set up Agenta, we need to set the environment variable &lt;code&gt;AGENTA_API_KEY&lt;/code&gt; (which you can find &lt;a href="https://cloud.agenta.ai/settings?tab=apiKeys" rel="noopener noreferrer"&gt;here&lt;/a&gt;) and optionally &lt;code&gt;AGENTA_HOST&lt;/code&gt; if we're self-hosting.&lt;/p&gt;
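&lt;p&gt;For example, in a shell (the values below are placeholders, not real credentials):&lt;/p&gt;

```shell
# Placeholder key — copy your real one from the Agenta settings page.
export AGENTA_API_KEY="your-api-key"

# Only needed when self-hosting; omit this to use Agenta Cloud.
export AGENTA_HOST="https://agenta.example.internal"
```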

&lt;p&gt;We can now run the app and see the traces in Agenta.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz4wzfkz840nrg4u46ro.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz4wzfkz840nrg4u46ro.gif" alt="Observability in Agenta" width="1024" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating an LLM playground
&lt;/h2&gt;

&lt;p&gt;Now that we have our POC, we need to iterate on it and make it production-ready. This means experimenting with different prompts and models, setting up evaluations, and versioning our configuration.&lt;/p&gt;

&lt;p&gt;Agenta's custom workflows feature lets us create an IDE-like playground for our AI assistant's workflow.&lt;/p&gt;

&lt;p&gt;We'll add a few lines of code to create an LLM playground for our application. This enables us to version the configuration, run end-to-end evaluations, and deploy the latest version in one click.&lt;/p&gt;

&lt;p&gt;Here's the modified code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agenta&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agenta.sdk.assets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;supported_llm_models&lt;/span&gt;

&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;litellm_handler&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="n"&gt;prompt_system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are an expert Python developer performing a file-by-file review of a pull request. You have access to the full diff of the file to understand the overall context and structure. However, focus on reviewing only the specific hunk provided.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt_user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Here is the diff for the file:
{diff}

Please provide a critique of the changes made in this file.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# highlight-start
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt_system&lt;/span&gt;
    &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt_user&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MultipleChoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;supported_llm_models&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# highlight-end
&lt;/span&gt;
&lt;span class="c1"&gt;# highlight-next-line
&lt;/span&gt;&lt;span class="nd"&gt;@ag.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@ag.instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_critique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_pr_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# highlight-next-line
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_from_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining the configuration and the layout of the playground
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agenta.sdk.assets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;supported_llm_models&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt_system&lt;/span&gt;
    &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt_user&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MultipleChoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;supported_llm_models&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To integrate our code with Agenta, we first define the configuration schema. This helps Agenta understand the inputs and outputs of our function and create a playground for it. The configuration defines the playground's layout: the system and user prompts (&lt;code&gt;str&lt;/code&gt;) appear as text areas, and the model appears as a dropdown (note that &lt;code&gt;supported_llm_models&lt;/code&gt; is a variable in the Agenta SDK that contains a dictionary mapping providers to their supported models).&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the entrypoint
&lt;/h3&gt;

&lt;p&gt;We'll adjust our function to use the configuration. &lt;code&gt;@ag.route&lt;/code&gt; creates an API endpoint for our function with the configuration schema we defined. This endpoint will be used by Agenta's playground, evaluation, and the deployed API to interact with our application.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ag.ConfigManager.get_from_route(schema=Config)&lt;/code&gt; fetches the configuration from the request sent to that API endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# highlight-next-line
&lt;/span&gt;&lt;span class="nd"&gt;@ag.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@ag.instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_critique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_pr_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pr_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# highlight-next-line
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_from_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Serving the application with Agenta
&lt;/h2&gt;

&lt;p&gt;We can now add the application to Agenta. Here's what we need to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;agenta init&lt;/code&gt; and specify our app name and API key&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;agenta variant serve app.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The last command builds and serves the application, making it accessible through Agenta's playground. There, you can run it end-to-end by providing a PR URL and getting back the review generated by the LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38vb4mxx7i9czixba4ob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38vb4mxx7i9czixba4ob.png" alt="Prompt engineering playground in Agenta" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating using LLM-as-a-judge
&lt;/h2&gt;

&lt;p&gt;To evaluate the quality of our AI assistant's reviews and compare prompts and models, we need to set up evaluation.&lt;/p&gt;

&lt;p&gt;First, we'll create a small test set with publicly available PRs.&lt;/p&gt;
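As a sketch of what such a test set can look like (the PR URLs below are placeholders, not real test data), we can prepare a CSV whose column name matches our function's input parameter and upload it in Agenta's test set view:

```python
import csv
import io

# Hypothetical sample test set: a few placeholder PR URLs.
# The column name "pr_url" matches our function's input parameter.
test_cases = [
    {"pr_url": "https://github.com/OWNER/REPO/pull/1"},
    {"pr_url": "https://github.com/OWNER/REPO/pull/2"},
    {"pr_url": "https://github.com/OWNER/REPO/pull/3"},
]

# Write the test set to an in-memory CSV (swap io.StringIO for a file
# handle to save it to disk).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["pr_url"])
writer.writeheader()
writer.writerows(test_cases)

csv_content = buffer.getvalue()
print(csv_content)
```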

&lt;p&gt;Next, we'll set up an LLM-as-a-judge to evaluate the quality of the reviews.&lt;/p&gt;

&lt;p&gt;To do this, navigate to the evaluation view, click on "Configure evaluators", then "Create new evaluator" and select "LLM-as-a-judge".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnqu2d8aki7ka4xgiotk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnqu2d8aki7ka4xgiotk.png" alt="LLM-as-a-judge in Agenta" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll get a playground where we can test different prompts and models for our LLM evaluator. We use the following system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an evaluator grading the quality of a PR review.
CRITERIA:
Technical Accuracy

The reviewer identifies and addresses technical issues, ensuring the PR meets the project's requirements and coding standards.
Code Quality

The review ensures the code is clean, readable, and adheres to established style guides and best practices.
Functionality and Performance

The reviewer provides clear, actionable, and constructive feedback, avoiding vague or unhelpful comments.
Timeliness and Thoroughness

The review is completed within a reasonable timeframe and demonstrates a thorough understanding of the code changes.

SCORE:
- The score should be between 0 and 10.
- A score of 10 means that the answer is perfect. This is the highest (best) score.
- A score of 0 means that the answer does not meet any of the criteria. This is the lowest possible score you can give.

ANSWER ONLY THE SCORE. DO NOT USE MARKDOWN. DO NOT PROVIDE ANYTHING OTHER THAN THE NUMBER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the user prompt, we'll use the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM APP OUTPUT: {prediction}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the evaluator accesses the LLM app's output through the &lt;code&gt;{prediction}&lt;/code&gt; variable. We can iterate on the prompt and test different models in the evaluator test view.&lt;/p&gt;
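Conceptually, the judge's user prompt is a template with a placeholder that gets filled with the app's output. The substitution itself happens inside Agenta; this simplified sketch just illustrates the `{prediction}` mechanics with `str.format`:

```python
# Simplified sketch of how an LLM-as-a-judge user prompt is assembled.
# The real substitution happens inside Agenta; this only illustrates
# the {prediction} placeholder mechanics.
user_prompt_template = "LLM APP OUTPUT: {prediction}"

# Example output from the PR review app (made up for illustration).
app_output = "The change looks correct, but consider adding a unit test."

judge_user_prompt = user_prompt_template.format(prediction=app_output)
print(judge_user_prompt)
```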

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10f9xeehdjiam5ebm3oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10f9xeehdjiam5ebm3oi.png" alt="human eval in agenta" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With our evaluator set up, we can run experiments and compare different prompts and models. In the playground, we can create multiple variants and run batch evaluations using the &lt;code&gt;pr-review-quality&lt;/code&gt; LLM-as-a-judge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl873v6vrri0slq6x1u0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl873v6vrri0slq6x1u0.png" alt="Running human evaluation in Agenta" width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After comparing models, we found similar performance across the board. Given this, we chose GPT-3.5-turbo for its optimal balance of speed and cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to production
&lt;/h2&gt;

&lt;p&gt;Deployment is straightforward with Agenta:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the overview page&lt;/li&gt;
&lt;li&gt;Click the three dots next to your chosen variant&lt;/li&gt;
&lt;li&gt;Select "Deploy to Production"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2c3lqbo690fvszhbtpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2c3lqbo690fvszhbtpa.png" alt="Deploying to production" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives you an API endpoint ready to use in your application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy1rrckoz3b659ohj6f1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy1rrckoz3b659ohj6f1.png" alt="API endpoint for assistant" width="800" height="680"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agenta works in both proxy mode and prompt management mode. You can either use Agenta's endpoint or deploy your own app and use the Agenta SDK to fetch the production configuration.&lt;/p&gt;
&lt;/blockquote&gt;
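As a rough sketch of how calling the deployed endpoint could look from Python, the snippet below builds the request payload. The endpoint URL, header names, and API key here are placeholders; copy the exact values from Agenta's deployment view:

```python
import json

# Placeholder values: the real endpoint URL and API key come from
# Agenta's deployment view. "pr_url" matches our function's parameter.
endpoint = "https://example.agenta.ai/your-app/run"  # placeholder
headers = {
    "Content-Type": "application/json",
    "Authorization": "ApiKey YOUR_API_KEY",  # placeholder
}
payload = {"pr_url": "https://github.com/OWNER/REPO/pull/123"}  # placeholder

# With the `requests` library installed, the call would be:
#   response = requests.post(endpoint, headers=headers, json=payload)
#   print(response.json())
print(json.dumps(payload))
```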

&lt;h2&gt;
  
  
  Building the frontend
&lt;/h2&gt;

&lt;p&gt;For the frontend, we used &lt;a href="https://v0.dev" rel="noopener noreferrer"&gt;v0.dev&lt;/a&gt; to quickly generate a UI. After providing our API endpoint and authentication requirements, we had a working UI in minutes. You can try it yourself: &lt;a href="https://7acthbzemz2hgvi8l7us455b6tcigd8s.vercel.app/" rel="noopener noreferrer"&gt;PR Review Assistant&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;With our AI assistant in production, Agenta continues to provide observability tools. Here are a few ways we can keep improving it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refine the Prompt: Improve the language to get more precise critiques.&lt;/li&gt;
&lt;li&gt;Add More Context: Include the full code of changed files, not just the diffs.&lt;/li&gt;
&lt;li&gt;Handle Large Diffs: Break down extensive changes and process them in parts.&lt;/li&gt;
&lt;/ul&gt;
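The "handle large diffs" idea can be sketched by splitting a unified diff on per-file boundaries and reviewing the pieces separately. This is a minimal sketch of one possible approach, not the article's implementation; a real version would also account for token limits within a single file:

```python
def split_diff_by_file(diff: str) -> list:
    """Split a unified diff into one chunk per file.

    Each file's section in a unified diff starts with a "diff --git"
    line, so splitting on that boundary yields independently
    reviewable chunks.
    """
    chunks = []
    current = []
    for line in diff.splitlines():
        if line.startswith("diff --git") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


# Tiny made-up diff touching two files, to show the chunking.
sample_diff = (
    "diff --git a/app.py b/app.py\n"
    "+print('hello')\n"
    "diff --git a/README.md b/README.md\n"
    "+# Title\n"
)
chunks = split_diff_by_file(sample_diff)
print(len(chunks))  # one chunk per changed file
```

Each chunk could then be passed to `generate_critique`'s prompt in its own LLM call, and the per-file critiques concatenated into one review.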

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we've:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built an AI assistant that reviews pull requests.&lt;/li&gt;
&lt;li&gt;Implemented observability and prompt engineering using Agenta.&lt;/li&gt;
&lt;li&gt;Evaluated our assistant with LLM-as-a-judge.&lt;/li&gt;
&lt;li&gt;Deployed the assistant and connected it to a frontend.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One last thing
&lt;/h2&gt;

&lt;p&gt;If you liked this tutorial and you'd like to learn more about AI engineering and how to build production-ready AI applications, follow our page and &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;star our repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xkmpvqu4v3dd6cra39o.png" alt="Star Agenta" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>🔥🤖 Build an AI-Powered Discord Bot to Recommend HackerNews Posts using OpenAI, Novu and Agenta 🚀</title>
      <dc:creator>Mahmoud Mabrouk</dc:creator>
      <pubDate>Wed, 06 Sep 2023 21:34:58 +0000</pubDate>
      <link>https://dev.to/agenta/build-an-ai-powered-discord-bot-to-recommend-hackernews-posts-using-openai-novu-and-agenta-3ffi</link>
      <guid>https://dev.to/agenta/build-an-ai-powered-discord-bot-to-recommend-hackernews-posts-using-openai-novu-and-agenta-3ffi</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you'll learn how to create an AI agent that alerts you about relevant Hacker News posts tailored to your interests. The agent sends Discord notifications whenever a post matches your criteria. &lt;/p&gt;

&lt;p&gt;We'll write the code in Python, use BeautifulSoup for web scraping, use OpenAI with Agenta to build the AI, and use Novu for Discord notifications.&lt;/p&gt;

&lt;p&gt;Goal? No more endless scrolling on Hacker News. Let the AI bring worthy posts to you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujiie4r5z35zsvtp827j.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujiie4r5z35zsvtp827j.gif" alt="No more Procrastination" width="1024" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Agenta: The Open-source LLM app builder  &lt;strong&gt;🤖&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A bit about us: &lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; is an end-to-end open-source LLM app builder. It enables you to quickly build, experiment, evaluate, and deploy LLM apps as APIs. You can use it by writing code in Langchain or any other framework, or directly from the UI. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Agenta-AI/agenta/" rel="noopener noreferrer"&gt;&lt;br&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcllefy5nhyh3a5nhr5oz.png" alt="Star us on Github" width="800" height="336"&gt;&lt;br&gt;
  &lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Here's the plan 📝
&lt;/h2&gt;

&lt;p&gt;We will write a script that does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scrape the first five Hacker News pages for post titles using Python and BeautifulSoup.&lt;/li&gt;
&lt;li&gt;Use Agenta and GPT-3.5 to categorize posts based on your interests.&lt;/li&gt;
&lt;li&gt;Send compelling posts to your Discord channel.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Setting things up:
&lt;/h2&gt;

&lt;p&gt;To get started, let's create a project folder and run Poetry. If you aren't familiar with Poetry, you should check it out as it provides an alternative to virtual environments that is much easier to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;hnbot&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;hnbot/
poetry init

This &lt;span class="nb"&gt;command &lt;/span&gt;will guide you through creating your pyproject.toml config.


Would you like to define your main dependencies interactively? &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;/no&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="nb"&gt;yes

&lt;/span&gt;Package to add or search &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;leave blank to skip&lt;span class="o"&gt;)&lt;/span&gt;: novu
Add a package &lt;span class="o"&gt;(&lt;/span&gt;leave blank to skip&lt;span class="o"&gt;)&lt;/span&gt;: beautifulsoup4


Do you confirm generation? &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;/no&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="nb"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the prompts to set up your project. Don't forget to add novu and beautifulsoup4 as dependencies.&lt;/p&gt;

&lt;p&gt;Now let's create the folder for our package and initialize the Poetry environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;mkdir&lt;/span&gt; &lt;span class="n"&gt;hn_bot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="n"&gt;hn_bot&lt;/span&gt;
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="n"&gt;hn_bot&lt;/span&gt; 
&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;poetry&lt;/span&gt; &lt;span class="n"&gt;shell&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;poetry&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we have a local environment where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All our requirements are installed.&lt;/li&gt;
&lt;li&gt;We have a Python package called hn_bot that is on our Python path.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This means if we have multiple files in our library, we can import them using &lt;code&gt;import hn_bot.module_name&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scraping Hacker News Posts
&lt;/h2&gt;

&lt;p&gt;Scraping the Hacker News page is straightforward since it does not use any complicated JavaScript. The pages are located at &lt;code&gt;https://news.ycombinator.com/?p=pagenumber&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To find the titles and links on the page, we just need to open the web browser and access the dev console. Once there, we can check if there are any elements we can use to locate the titles and links. Luckily, it seems that every post is a span with the class "titleline."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq48tqg8xtzr61f0bi282.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq48tqg8xtzr61f0bi282.png" alt="Firefox Dev Console" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can use this to extract information from a single page. Let's write a function that extracts titles and links from Hacker News.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# hn_scraper.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://news.ycombinator.com/news?p=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;yc_web_page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yc_web_page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article_tag&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;titleline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;article_tag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getText&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;article_tag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;link&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can test it by adding a &lt;code&gt;print(scrape_page(1))&lt;/code&gt; at the end of the script and running it in the shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;hn_scraper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Linux Network Performance Parameters Explained (github.com/leandromoreira)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Double Commander – Changes in version 1.1.0 (github.com/doublecmd)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;If You’ve Got a New Car, It’s a Data Privacy Nightmare (gizmodo.com)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Gcsfuse: A user-space file system for interacting with Google Cloud S

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations! 🎉 We now have a script that scrapes post titles and links from Hacker News.&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating the AI Agent 🤖
&lt;/h2&gt;

&lt;p&gt;Now that we have a list of posts, we need to use OpenAI's GPT models to classify whether they are relevant to the user's interests. For this, we are going to use Agenta.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/agenta-ai/agenta" rel="noopener noreferrer"&gt;Agenta&lt;/a&gt; allows you to create LLM apps from code or from the UI. Since our LLM app today is quite simple, we will create it from the UI.&lt;/p&gt;

&lt;p&gt;Agenta can be &lt;a href="https://docs.agenta.ai/installation/local-installation/local-installation" rel="noopener noreferrer"&gt;self-hosted&lt;/a&gt;, however to get started quickly we'll use &lt;a href="http://demo.agenta.ai" rel="noopener noreferrer"&gt;demo.agenta.ai&lt;/a&gt;.&lt;/p&gt;


&lt;p&gt;Let's go to &lt;a href="http://demo.agenta.ai" rel="noopener noreferrer"&gt;demo.agenta.ai&lt;/a&gt; and login.&lt;/p&gt;

&lt;p&gt;First, let's create a new app by clicking on "Create New App".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxf7j6nt2p5eu2v2yb5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxf7j6nt2p5eu2v2yb5n.png" alt="Create New App" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we select "Start from template".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0wg0rvc7a2wjgzyntv5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0wg0rvc7a2wjgzyntv5.png" alt="Start From Template" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And use a single prompt template.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp907dyujwuf54hw5q661.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp907dyujwuf54hw5q661.png" alt="Single Prompt Template" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing some Prompt Engineering In Agenta 🪄 ✨
&lt;/h2&gt;

&lt;p&gt;Now we have a playground for creating the app.&lt;/p&gt;

&lt;p&gt;First, let's add the inputs for our application. In this case, we will be using "title" for the Hacker News title and "interests" for the user's interests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr9dukpu2wwjid77uss0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr9dukpu2wwjid77uss0.png" alt="Adding Inputs" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we need to do a little prompt engineering. We are using GPT-3.5 (the cheapest OpenAI model), which takes two messages: a system message and a user message. The system message guides the language model to answer in a certain way, while the user message carries the parameters of the task.&lt;/p&gt;

&lt;p&gt;In this case, I tried a simple system prompt that ensures the answer is either "True" or "False." For the human prompt, I simply asked the model to classify the title. Note that we use the usual f-string format to inject the inputs we added into the prompt.&lt;/p&gt;
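&lt;p&gt;To illustrate (outside of Agenta), this is roughly how the playground fills the template using Python's own string formatting; the example title and interests below are made up:&lt;/p&gt;

```python
# The human prompt from the playground, with {title} and {interests}
# placeholders that Agenta fills from the app inputs.
human_template = (
    "Classify whether this hackernews post is interesting for someone "
    "with the following interests:\n"
    "Hacker news post title: {title}\n"
    "Interests: {interests}"
)

# Hypothetical example values
prompt = human_template.format(
    title="Show HN: An open-source LLMOps platform",
    interests="LLMs, MLOps, Python",
)
print(prompt)
```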

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf7rqf26zd9bdujl86l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpf7rqf26zd9bdujl86l5.png" alt="The Prompts" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can test the application with some examples of Hacker News titles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgot5fqujxcdqkvlxesvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgot5fqujxcdqkvlxesvs.png" alt="Testing the Prompt" width="800" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agenta provides tools to systematically evaluate applications and optimize prompts, parameters, and workflows (useful when building something more complex with embeddings and retrieval-augmented generation). In this case, however, such evaluation is unnecessary: the app is very simple, and GPT-3.5 solves the classification problem with minimal effort.&lt;/p&gt;

&lt;p&gt;Let's save our changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qr4u5tlelso4hpuoxyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qr4u5tlelso4hpuoxyg.png" alt="Saving Changes" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then deploy the application as an API.&lt;/p&gt;

&lt;p&gt;For this, we jump to the Endpoints menu and copy the code snippet into our project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5xizartrdzuuhv2b5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5xizartrdzuuhv2b5o.png" alt="Endpoint Menu" width="800" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping it up 🌯
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dwlmgyq27argsl7j1ej.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dwlmgyq27argsl7j1ej.gif" alt="Wrapping it up" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we can create a function based on this code snippet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# llm_classifier.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://demo.agenta.ai/64f1d1aefeebd024bbdb1ea4/hn_bot/v1/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interests&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maximum_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert in classification. You answer only with True or False.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify whether this hackernews post is interesting for someone with the following interests:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Hacker news post title: {title}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Interests: {interests}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_sequence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequence_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;presence_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
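&lt;p&gt;One caveat: LLMs sometimes decorate the answer with punctuation or different casing. A small normalization helper (not part of the original snippet) makes the check more robust:&lt;/p&gt;

```python
def parse_boolean_answer(text: str) -> bool:
    """Interpret a model reply that should be 'True' or 'False'."""
    # Strip whitespace and trailing punctuation, then compare case-insensitively.
    return text.strip().rstrip(".!").lower() == "true"

print(parse_boolean_answer("True"))     # → True
print(parse_boolean_answer(" false. ")) # → False
```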






&lt;h2&gt;
  
  
  Sending a Discord message 🎮
&lt;/h2&gt;

&lt;p&gt;First, we need to create a new channel in Discord.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tl092e04g4v1v8125a0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tl092e04g4v1v8125a0.png" alt="Creating a new channel in Discord" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we need to create a webhook and copy its URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi0zrs7gwqdxtsqxy1gc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi0zrs7gwqdxtsqxy1gc.png" alt="Getting the Webhook of the channel" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we need to set up the integration in Novu. For this, we go to the Integration Store, click on “Add a provider”, select Discord, and don't forget to activate it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firwwv1wfohthf03pdfdy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firwwv1wfohthf03pdfdy.png" alt="Add a provider in Novu" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lastly, we need to create a workflow that triggers the message to be sent to our Discord channel. We add the &lt;code&gt;{{content}}&lt;/code&gt; variable to the message, which we will later inject from the code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnx4kr02zjor6argqqwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnx4kr02zjor6argqqwz.png" alt="Create a Workflow in Novu" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Write the messaging function
&lt;/h2&gt;

&lt;p&gt;Now it's time to write the function that will trigger the workflow&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# novu_bot
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;novu.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NovuConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;novu.api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EventApi&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;novu.api.subscriber&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SubscriberApi&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;novu.dto.subscriber&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SubscriberDto&lt;/span&gt;

&lt;span class="nc"&gt;NovuConfig&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novu.co&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;webhook_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# the webhook url we got from Discord
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;your_subscriber_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace this with a unique user ID.
&lt;/span&gt;
    &lt;span class="c1"&gt;# Define a subscriber instance
&lt;/span&gt;    &lt;span class="n"&gt;subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SubscriberDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;subscriber_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_subscriber_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc@gmail.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Doe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nc"&gt;SubscriberApi&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;SubscriberApi&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscriber_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_subscriber_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="n"&gt;provider_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discord&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nc"&gt;EventApi&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trigger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slackbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# The trigger ID of the workflow. It can be found on the workflow page.
&lt;/span&gt;        &lt;span class="n"&gt;recipients&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_subscriber_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;  &lt;span class="c1"&gt;# Your Novu payload goes here
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Putting everything together
&lt;/h2&gt;

&lt;p&gt;Now we're ready to assemble all the elements to get our AI assistant running.&lt;/p&gt;

&lt;p&gt;Let's create an &lt;code&gt;app.py&lt;/code&gt; file in which we first call the scraper, then the LLM classifier, and finally send a message with the interesting posts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hn_bot&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hn_scraper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_classifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;novu_bot&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;schedule&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;interests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLMs, LLMOps, Python, Infrastructure, Tennis, MLOps, Data science, AI, startups, Computational Biology&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;novu_bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Interesting posts at HackerNews:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hn_scraper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrape_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;llm_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;novu_bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Et voilà, we have all the interesting posts coming up in our Discord.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d02h6s47giku5nvjk0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d02h6s47giku5nvjk0q.png" alt="The result" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally, let's schedule this to run every hour ⏰
&lt;/h2&gt;

&lt;p&gt;We would like the script to check for new posts every hour. For this, we need to add the Python library &lt;code&gt;schedule&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
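&lt;p&gt;The core idea is simply a loop that runs &lt;code&gt;main()&lt;/code&gt; and then sleeps for an hour. Sketched with the standard library only (the &lt;code&gt;iterations&lt;/code&gt; argument is not in the original code; it exists just to make the loop testable):&lt;/p&gt;

```python
import time

def run_periodically(job, interval_seconds=3600, iterations=None):
    """Call `job`, then sleep `interval_seconds` between runs.

    iterations=None loops forever; a finite value stops after that many runs.
    """
    count = 0
    while iterations is None or count < iterations:
        job()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)

# e.g. run_periodically(main) would check Hacker News every hour
```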

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hn_bot&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hn_scraper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_classifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;novu_bot&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sleep&lt;/span&gt;
&lt;span class="n"&gt;interests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM, LLMOps, MLOps, Data science, AI, startups&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;done_post_titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;novu_bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Interesting posts at HackerNews:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;hn_scraper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrape_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;link&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;llm_classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;done_post_titles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;done_post_titles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;novu_bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations on making it this far! 🎉 You now have an automated AI assistant keeping an eye on Hacker News for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary 📜
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we built an AI-powered assistant that keeps you in the loop with relevant Hacker News posts. Along the way, you learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to scrape Hacker News with BeautifulSoup&lt;/li&gt;
&lt;li&gt;How to create a single-prompt LLM app using Agenta and OpenAI's GPT-3.5&lt;/li&gt;
&lt;li&gt;How to send Discord notifications using Novu&lt;/li&gt;
&lt;/ul&gt;
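&lt;p&gt;As a reference for the first point, the scraping step can be sketched roughly as follows. The CSS selectors are an assumption based on Hacker News's current front-page markup (story rows carry the class &lt;code&gt;athing&lt;/code&gt;, with the link inside &lt;code&gt;span.titleline&lt;/code&gt;) and may need adjusting if the site changes:&lt;/p&gt;

```python
# Rough sketch of the Hacker News scraping step.
# The selectors assume the current front-page markup and may change.
import requests
from bs4 import BeautifulSoup


def parse_front_page(html: str) -> list[dict]:
    """Extract {"title", "link"} dicts from one Hacker News page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # Each story row has class "athing"; its link sits in span.titleline
    for row in soup.select("tr.athing"):
        anchor = row.select_one("span.titleline a")
        if anchor is not None:
            posts.append({"title": anchor.get_text(), "link": anchor.get("href")})
    return posts


def scrape_page(page: int) -> list[dict]:
    """Fetch and parse one page of Hacker News (makes a network call)."""
    response = requests.get(
        f"https://news.ycombinator.com/news?p={page}", timeout=10
    )
    response.raise_for_status()
    return parse_front_page(response.text)
```

&lt;p&gt;Keeping the parsing separate from the network call makes the parser easy to test against a saved HTML snippet.&lt;/p&gt;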

&lt;p&gt;You can find the full code at &lt;a href="https://github.com/Agenta-AI/blog/tree/main/hackernews-bot" rel="noopener noreferrer"&gt;https://github.com/Agenta-AI/blog/tree/main/hackernews-bot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Agenta-AI/agenta/" rel="noopener noreferrer"&gt;&lt;br&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcllefy5nhyh3a5nhr5oz.png" alt="Star us on Github" width="800" height="336"&gt;&lt;br&gt;
  &lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>chatgpt</category>
    </item>
  </channel>
</rss>
