<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: cortecs</title>
    <description>The latest articles on DEV Community by cortecs (@cortecs).</description>
    <link>https://dev.to/cortecs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10180%2F1e7ba1da-bc26-4910-95a9-2d5a30e47b55.png</url>
      <title>DEV Community: cortecs</title>
      <link>https://dev.to/cortecs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cortecs"/>
    <language>en</language>
    <item>
      <title>OpenCode vs Claude Code</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 29 Oct 2025 10:09:19 +0000</pubDate>
      <link>https://dev.to/cortecs/opencode-claude-code-1f0g</link>
      <guid>https://dev.to/cortecs/opencode-claude-code-1f0g</guid>
      <description>&lt;p&gt;AI coding assistants are becoming indispensable for developers, streamlining tasks from writing to debugging code. But as these tools proliferate, a critical question arises: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How much control do you really have over where your code goes and who can access it❓&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not all AI coding solutions offer the same level of transparency or control, and for organizations bound by strict &lt;strong&gt;compliance&lt;/strong&gt; frameworks, this difference can have &lt;strong&gt;serious legal and operational consequences&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Claude Code: Great for Productivity, Limited Control
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.claude.com/product/claude-code" rel="noopener noreferrer"&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/a&gt;, developed by &lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;, is a &lt;strong&gt;terminal-based AI coding assistant&lt;/strong&gt; that integrates directly into your workflow. It helps with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code completion&lt;/li&gt;
&lt;li&gt;Error detection&lt;/li&gt;
&lt;li&gt;Documentation generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s user-friendly, smart, and efficient — but it comes with trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💰 &lt;strong&gt;Fixed pricing&lt;/strong&gt;: Plans require a commitment of up to €200/month&lt;/li&gt;
&lt;li&gt;⚙️ &lt;strong&gt;Opt-out needed&lt;/strong&gt;: You must actively opt out to ensure your source code isn’t used for training&lt;/li&gt;
&lt;li&gt;🔒 &lt;strong&gt;Limited control&lt;/strong&gt;: Developers have little say over where their code travels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That &lt;strong&gt;lack of control&lt;/strong&gt; can become a real problem for professionals who must meet strict data policies — whether set by their company, clients, or compliance frameworks.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 OpenCode: The Open Source Competitor
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/a&gt; is an &lt;strong&gt;open-source, terminal-based AI coding assistant&lt;/strong&gt; created to give developers the freedom, flexibility, and compliance missing from closed solutions. It enables developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write, debug, and refactor code using natural language&lt;/li&gt;
&lt;li&gt;Integrate any language model of their choice (Claude, GPT, Mistral, Llama, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;Open Source:&lt;/strong&gt; Fully transparent and community-audited&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Bring Your Own Model (BYOM):&lt;/strong&gt; use Claude or any other LLM&lt;/li&gt;
&lt;li&gt;💸 &lt;strong&gt;Flexible pricing:&lt;/strong&gt; Pay only for tokens used — no flat monthly commitment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many teams — especially those operating under &lt;strong&gt;strict privacy or compliance policies&lt;/strong&gt; — need to ensure their data is processed according to &lt;strong&gt;internal company rules&lt;/strong&gt;. This often means keeping all activity within the EU and preventing any source code from being used for model training.&lt;/p&gt;

&lt;p&gt;Here’s a quick overview of how to connect &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/a&gt; with European LLM endpoints to meet those requirements.&lt;/p&gt;




&lt;h2&gt;
  
  
  🇪🇺 OpenCode + Cortecs: EU compliance
&lt;/h2&gt;

&lt;p&gt;When paired with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cortecs&lt;/strong&gt;&lt;/a&gt;, a &lt;strong&gt;European LLM router&lt;/strong&gt;, OpenCode can route AI requests to &lt;strong&gt;GDPR-compliant LLM endpoints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53t8ltrpljke9vynr306.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53t8ltrpljke9vynr306.PNG" alt="opencode+cortecs logo" width="700" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧰 &lt;strong&gt;Benefits Include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Residency in Europe:&lt;/strong&gt; Your code and queries never leave EU jurisdiction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Training by Default:&lt;/strong&gt; None of your data is used to train or fine-tune models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-In GDPR Compliance:&lt;/strong&gt; Privacy-first design from the start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration:&lt;/strong&gt; Works with your existing local or cloud infrastructure, including editors such as VS Code&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧪 Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install &lt;strong&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;&lt;/strong&gt; from the project repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; as your model router (refer to the &lt;a href="https://docs.cortecs.ai/integration-examples/coding/opencode" rel="noopener noreferrer"&gt;Cortecs Docs&lt;/a&gt; for setup details).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose your GDPR-compliant &lt;a href="https://cortecs.ai/serverlessModels" rel="noopener noreferrer"&gt;model endpoint&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In minutes, you’ll have a secure, privacy-respecting AI assistant fully integrated into your terminal workflow.&lt;/p&gt;
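
&lt;p&gt;Before wiring OpenCode up, you can sanity-check your chosen endpoint with a quick OpenAI-compatible request. Here’s a minimal sketch in Python; the base URL, environment variable names, and model ID are illustrative placeholders, so take the real values from the Cortecs docs and your dashboard.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI  # pip install openai

# Placeholder values for illustration; take the real base URL and
# model ID from the Cortecs docs and your Cortecs dashboard.
client = OpenAI(
    base_url=os.environ["CORTECS_BASE_URL"],  # an EU-hosted, OpenAI-compatible endpoint
    api_key=os.environ["CORTECS_API_KEY"],
)

response = client.chat.completions.create(
    model=os.environ["CORTECS_MODEL"],  # the GDPR-compliant model picked in step 3
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;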

&lt;p&gt;In other words, &lt;strong&gt;OpenCode + Cortecs&lt;/strong&gt; gives developers &lt;strong&gt;full control over where and how data is processed&lt;/strong&gt;, without sacrificing AI coding productivity 🚀.&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>terminal</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Comparing LLM Routers</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 16 Jul 2025 10:26:32 +0000</pubDate>
      <link>https://dev.to/cortecs/comparing-llm-routers-54dl</link>
      <guid>https://dev.to/cortecs/comparing-llm-routers-54dl</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) are rapidly reshaping the tech landscape, transforming industries from AI-powered assistants and summarization tools to smart customer support and beyond.&lt;/p&gt;

&lt;p&gt;In today’s fast-moving AI world, developers need access to multiple models from different providers to serve diverse use cases.&lt;/p&gt;

&lt;p&gt;The challenge isn’t just &lt;em&gt;which&lt;/em&gt; model to use, it’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do you balance reliability, cost, speed, and data privacy while using LLMs, without becoming an infrastructure engineer❓&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the heart of this problem lies the &lt;strong&gt;LLM router&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0fsatas1v0pajbhmbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0fsatas1v0pajbhmbi.png" alt="Image illustrating how an LLM router directs requests to multiple AI model providers" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📦 What is an LLM Router?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;LLM router&lt;/strong&gt; is like a smart traffic controller between your application and various LLM providers.&lt;/p&gt;

&lt;p&gt;It helps decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model should handle each request&lt;/li&gt;
&lt;li&gt;How to handle provider failures or slow responses&lt;/li&gt;
&lt;li&gt;How to balance cost, speed, reliability, and compliance across providers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  At a high level, an LLM router:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Accepts your inference request (like a chat prompt or code generation task)&lt;/li&gt;
&lt;li&gt;Evaluates available LLM providers (OpenAI, Anthropic, Nebius, etc.)&lt;/li&gt;
&lt;li&gt;Chooses the best provider based on real-time factors like cost, latency, and reliability&lt;/li&gt;
&lt;li&gt;Sends the request to the selected provider and returns the response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;smart, adaptable dispatcher&lt;/strong&gt; that shields you from the complexity of managing multiple LLM APIs.&lt;/p&gt;
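
&lt;p&gt;To make the dispatcher idea concrete, here’s a minimal failover sketch built on the OpenAI-compatible APIs most providers expose. The provider table and environment variable names are illustrative assumptions; a production router would also weigh latency, cost, and compliance metadata per provider.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI  # pip install openai

# Illustrative provider table, ordered by preference.
# The base URLs and env var names are placeholders, not real endpoints.
PROVIDERS = [
    {"base_url": os.environ["PROVIDER_A_URL"], "api_key": os.environ["PROVIDER_A_KEY"], "model": "model-a"},
    {"base_url": os.environ["PROVIDER_B_URL"], "api_key": os.environ["PROVIDER_B_KEY"], "model": "model-b"},
]


def route_chat(prompt: str) -&gt; str:
    """Try each provider in order and fail over on any error."""
    last_error = None
    for provider in PROVIDERS:
        try:
            client = OpenAI(base_url=provider["base_url"], api_key=provider["api_key"])
            response = client.chat.completions.create(
                model=provider["model"],
                messages=[{"role": "user", "content": prompt}],
                timeout=10,  # don't wait forever on a slow provider
            )
            return response.choices[0].message.content
        except Exception as error:  # a real router distinguishes rate limits, timeouts, outages
            last_error = error
    raise RuntimeError(f"All providers failed, last error: {last_error}")


print(route_chat("Summarize why failover matters in one sentence."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;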




&lt;h2&gt;
  
  
  ⚙️ Why Do You Need an LLM Router?
&lt;/h2&gt;

&lt;p&gt;Without a router, you’re typically tied to a single provider, which brings several risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Lock-in&lt;/strong&gt;: If your provider increases prices, rate limits you, or experiences downtime, you have limited options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missed Savings&lt;/strong&gt;: Some providers offer similar quality at significantly lower costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Model Specialization&lt;/strong&gt;: Some models are better suited for code, others for summarization, chat, or creative tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy and Compliance Risks&lt;/strong&gt;: Using non-compliant providers, especially in the EU, can lead to GDPR violations and legal issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Model Choice&lt;/strong&gt;: Relying on a single provider restricts your access to the growing variety of models available across the ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With an LLM router, you can:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Load-balance across multiple providers&lt;/li&gt;
&lt;li&gt;Failover automatically when a provider is unavailable&lt;/li&gt;
&lt;li&gt;Optimize for cost, latency, and privacy in real time&lt;/li&gt;
&lt;li&gt;Leverage model diversity for specialized tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Bottom line:&lt;/strong&gt; If you want to deliver fast, cost-efficient, reliable, and compliant AI experiences at scale, an LLM router is no longer optional.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧐 Comparison
&lt;/h2&gt;

&lt;p&gt;Let’s break down noteworthy LLM routers:&lt;/p&gt;




&lt;h3&gt;
  
  
  1️⃣ &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3b7fbb3i6wtzzzi9l2i.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3b7fbb3i6wtzzzi9l2i.PNG" alt="Cortecs Landing page screenshot" width="777" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compliant with European GDPR.&lt;/li&gt;
&lt;li&gt;Best coverage of the European ecosystem.&lt;/li&gt;
&lt;li&gt;Automated failover.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focused on Europe and GDPR.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2️⃣ &lt;a href="https://www.withmartian.com/" rel="noopener noreferrer"&gt;Martian&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamically routes requests to the best-performing model for each specific query.&lt;/li&gt;
&lt;li&gt;Offers significant cost savings by routing to cheaper models.&lt;/li&gt;
&lt;li&gt;Claims to outperform even GPT-4 on OpenAI’s own evaluations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing can be complex, with potential cost increases for advanced features or large-scale usage.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3️⃣ &lt;a href="https://www.requesty.ai/" rel="noopener noreferrer"&gt;Requesty&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports a wide range of providers through a single API key.&lt;/li&gt;
&lt;li&gt;Provides detailed information to improve observability and cost tracking.&lt;/li&gt;
&lt;li&gt;Offers cost savings through efficient request management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smart routing classification model can be complex to configure initially.&lt;/li&gt;
&lt;li&gt;Latency overhead from the classification model may impact ultra-low-latency applications.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4️⃣ &lt;a href="https://www.notdiamond.ai/" rel="noopener noreferrer"&gt;NotDiamond&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses a Random Forest Classifier to intelligently route prompts to the most suitable model.&lt;/li&gt;
&lt;li&gt;Allows tuning of the cost-performance tradeoff through a threshold parameter.&lt;/li&gt;
&lt;li&gt;Supports training custom routers for hyper-personalized routing tailored to specific applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom router training can be complex to set up.&lt;/li&gt;
&lt;li&gt;Limited public documentation on pricing, which may complicate budgeting.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5️⃣ &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iw3kfthl9iaatgnqhxe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iw3kfthl9iaatgnqhxe.png" alt="OpenRouter playground screenshot" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides a unified API to access multiple LLM providers.&lt;/li&gt;
&lt;li&gt;Supports a wide range of models from various providers.&lt;/li&gt;
&lt;li&gt;Offers higher availability with fallback options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some concerns around data privacy and ownership of user-provided information.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you’re looking for a seamless way to optimize &lt;strong&gt;cost, speed, and compliance&lt;/strong&gt; without getting buried in infrastructure, an &lt;strong&gt;LLM router&lt;/strong&gt; is a must-have.&lt;/p&gt;

&lt;p&gt;🚀 Make your LLM workflows &lt;strong&gt;faster, safer, and smarter&lt;/strong&gt; from day one.&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>routers</category>
      <category>eu</category>
    </item>
    <item>
      <title>Choosing the Right AI Provider in Europe 🇪🇺</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Fri, 20 Jun 2025 12:53:31 +0000</pubDate>
      <link>https://dev.to/cortecs/choosing-the-right-ai-provider-in-europe-1lo1</link>
      <guid>https://dev.to/cortecs/choosing-the-right-ai-provider-in-europe-1lo1</guid>
      <description>&lt;p&gt;&lt;strong&gt;Artificial Intelligence (AI)&lt;/strong&gt; is transforming industries across Europe, from healthcare to finance to public services. In 2024, French AI startups alone raised over €1.3 billion, followed by Germany at €910 million and the UK at €318 million. As more companies prioritize data sovereignty and GDPR compliance, selecting the right European AI provider has never been more critical.&lt;/p&gt;

&lt;p&gt;But here’s the key question: &lt;strong&gt;with the European AI landscape booming, how do you choose the right provider?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer might be: &lt;strong&gt;don’t&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Locking yourself into a single AI provider can &lt;strong&gt;limit&lt;/strong&gt; your flexibility, increase your costs, and put your uptime at risk. &lt;/p&gt;

&lt;p&gt;In this article, we’ll break down the pros and cons of leading European AI providers and show how multi-provider routing with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; helps you stay agile and resilient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
🗺️ Comparison: European AI Providers

&lt;ul&gt;
&lt;li&gt;OVH: The French Cloud Pioneer&lt;/li&gt;
&lt;li&gt;Scaleway: Sustainable AI Infrastructure&lt;/li&gt;
&lt;li&gt;IONOS: The German AI Model Hub&lt;/li&gt;
&lt;li&gt;Mistral AI: Europe's LLM Champion&lt;/li&gt;
&lt;li&gt;Nebius: The GPU Price Disruptor&lt;/li&gt;
&lt;li&gt;T-Systems: Enterprise-Grade Digital Solutions Provider&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;✨ Unified Access: Bringing All Providers Together&lt;/li&gt;

&lt;li&gt;🔗 Cortecs: Europe’s AI Gateway&lt;/li&gt;

&lt;li&gt;🔍 Summary Table&lt;/li&gt;

&lt;li&gt;💬 Final Thoughts&lt;/li&gt;

&lt;li&gt;📖 Further Reading&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🗺️ Comparison: European AI Providers
&lt;/h2&gt;

&lt;p&gt;Here’s a quick overview of the major players in Europe’s AI landscape:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.ovhcloud.com/" rel="noopener noreferrer"&gt;OVH: The French Cloud Pioneer&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;OVH stands as one of Europe's most established cloud providers, offering a comprehensive suite of AI and machine learning services with a strong emphasis on data sovereignty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad range of products and scalable infrastructure&lt;/li&gt;
&lt;li&gt;Competitive pricing, especially for VPS and cloud hosting&lt;/li&gt;
&lt;li&gt;Excellent customization and advanced developer features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Occasional reliability issues and unexpected service shutdowns&lt;/li&gt;
&lt;li&gt;Complex and sometimes buggy user interface&lt;/li&gt;
&lt;li&gt;No refunds or money-back guarantees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers, sysadmins, and technically skilled users who can manage without reliable support&lt;/li&gt;
&lt;li&gt;Businesses needing low-cost, customizable VPS or cloud hosting in Europe&lt;/li&gt;
&lt;li&gt;Budget-conscious users who prioritize price and flexibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.scaleway.com/" rel="noopener noreferrer"&gt;Scaleway: Sustainable AI Infrastructure&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Scaleway positions itself as Europe's sustainable cloud provider, focusing on environmental responsibility while delivering high-performance AI infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-provisioning services with an easy-to-use platform, enabling better billing predictability&lt;/li&gt;
&lt;li&gt;Responsive support team, often resolving issues within a few hours&lt;/li&gt;
&lt;li&gt;Comprehensive image library for fast setup and deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing changes reported on certain services&lt;/li&gt;
&lt;li&gt;Poor handling of payment issues&lt;/li&gt;
&lt;li&gt;Limited server and hardware options compared to larger providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Startups and developers needing quick, user-friendly deployment with flexible scaling&lt;/li&gt;
&lt;li&gt;Teams looking for affordable European cloud services with a solid developer experience&lt;/li&gt;
&lt;li&gt;Users who can carefully manage payment terms and account balances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.ionos.com/" rel="noopener noreferrer"&gt;IONOS: The German AI Model Hub&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;IONOS has launched Germany's first multimodal AI platform, focusing on making AI accessible to small and medium-sized businesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy-to-use user dashboard&lt;/li&gt;
&lt;li&gt;Strong security and DDoS protection, including 24/7 malware scanning&lt;/li&gt;
&lt;li&gt;Consistent server uptime performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited customization options&lt;/li&gt;
&lt;li&gt;Expensive signup fees for some services&lt;/li&gt;
&lt;li&gt;Comparatively high renewal rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Businesses prioritizing strong security and uptime guarantees&lt;/li&gt;
&lt;li&gt;Teams looking for a simple, user-friendly cloud dashboard&lt;/li&gt;
&lt;li&gt;Organizations that need reliable uptime and solid DDoS protection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI: Europe's LLM Champion&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Mistral AI is primarily focused on AI models and services, rather than traditional cloud infrastructure like the other providers, and is establishing itself as a formidable competitor to OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customizable structure for industry-specific solutions&lt;/li&gt;
&lt;li&gt;Multilingual support, catering to diverse and global markets&lt;/li&gt;
&lt;li&gt;Offering flexibility and transparency for developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher upfront integration costs&lt;/li&gt;
&lt;li&gt;Requires AI and machine learning expertise for effective implementation&lt;/li&gt;
&lt;li&gt;Restricted to Mistral’s own models, limiting your choice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams that don’t require flexibility to choose external models like LLaMA or DeepSeek&lt;/li&gt;
&lt;li&gt;Companies operating in multilingual environments&lt;/li&gt;
&lt;li&gt;Organizations that can handle higher upfront costs in exchange for customization and control&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://nebius.com/" rel="noopener noreferrer"&gt;Nebius: The GPU Price Disruptor&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Nebius has positioned itself as a cost-effective alternative to traditional cloud providers, offering significant savings on GPU-intensive AI workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High performance and cost-effectiveness for AI inference&lt;/li&gt;
&lt;li&gt;Flexible, user-friendly environment for working with open-source models&lt;/li&gt;
&lt;li&gt;Managed Kubernetes with auto-healing and container orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Costs can grow quickly if not carefully monitored&lt;/li&gt;
&lt;li&gt;Less scalable compared to larger, more established providers&lt;/li&gt;
&lt;li&gt;Models may be deleted occasionally, which can disrupt ongoing projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams needing fast, cost-efficient AI inference&lt;/li&gt;
&lt;li&gt;Companies looking for an easy-to-use platform without deep MLOps expertise&lt;/li&gt;
&lt;li&gt;Organizations open to working with a newer, fast-growing provider&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.t-systems.com/" rel="noopener noreferrer"&gt;T-Systems: Enterprise-Grade Digital Solutions Provider&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A leading European IT and digital services company, trusted by large enterprises and regulated industries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wide range of IT services, including cloud, infrastructure, and managed hosting&lt;/li&gt;
&lt;li&gt;Secure data storage with encryption and strong security practices&lt;/li&gt;
&lt;li&gt;Scalable solutions with reliable performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher pricing compared to some competitors, especially for smaller businesses&lt;/li&gt;
&lt;li&gt;Complex services may require significant technical expertise and onboarding time&lt;/li&gt;
&lt;li&gt;Issues with scaling usage limits or increasing capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprises needing secure, scalable, and full-service IT solutions&lt;/li&gt;
&lt;li&gt;Organizations focused on data security and European compliance&lt;/li&gt;
&lt;li&gt;Industry players with in-house technical teams able to manage complex deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ✨ Unified Access: Bringing All Providers Together
&lt;/h2&gt;

&lt;p&gt;Instead of locking into one provider, what if you could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mix and match providers&lt;/strong&gt; on demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize for cost, speed, or uptime&lt;/strong&gt; with simple API-level changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatically fail over&lt;/strong&gt; to the best available option during outages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 That’s exactly what &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; does.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs: Europe’s AI Gateway&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; is a platform that connects you to multiple European AI providers through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Smart Routing:&lt;/strong&gt; Send one request, and Cortecs automatically selects the fastest, most cost-effective, or most resilient provider based on your preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Instances:&lt;/strong&gt; Launch fully customizable LLM deployments with guaranteed compute and full control.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Cortecs?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ One Unified API&lt;/li&gt;
&lt;li&gt;✅ Provider Flexibility&lt;/li&gt;
&lt;li&gt;✅ Optimize for Cost, Speed, or Resiliency&lt;/li&gt;
&lt;li&gt;✅ Built-in Failover&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; isn’t another AI provider; it’s the control layer that makes your AI stack more &lt;strong&gt;resilient&lt;/strong&gt;, &lt;strong&gt;efficient&lt;/strong&gt;, and &lt;strong&gt;adaptable&lt;/strong&gt;.&lt;/p&gt;
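
&lt;p&gt;To see what the dedicated-instance flow looks like in code, here’s a short sketch using the &lt;code&gt;cortecs-py&lt;/code&gt; client together with the OpenAI SDK. The model ID is just an example, and the client reads &lt;code&gt;CORTECS_CLIENT_ID&lt;/code&gt; and &lt;code&gt;CORTECS_CLIENT_SECRET&lt;/code&gt; from the environment; see the Cortecs docs for the exact setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from cortecs_py import Cortecs  # pip install cortecs-py
from openai import OpenAI  # pip install openai

# Expects CORTECS_CLIENT_ID and CORTECS_CLIENT_SECRET in the environment.
cortecs = Cortecs()

# Provision a dedicated LLM worker (the model ID is an example).
instance = cortecs.ensure_instance("cortecs/phi-4-FP8-Dynamic")

# Each worker exposes an OpenAI-compatible endpoint.
client = OpenAI(base_url=instance.base_url, api_key=os.environ["OPENAI_API_KEY"])
reply = client.chat.completions.create(
    model="cortecs/phi-4-FP8-Dynamic",
    messages=[{"role": "user", "content": "Hello from a dedicated EU instance!"}],
)
print(reply.choices[0].message.content)

# Shut the worker down when you're done to stop paying for it.
cortecs.stop(instance.instance_id)
cortecs.delete(instance.instance_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;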

&lt;h3&gt;
  
  
  🔍 Summary Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OVH&lt;/td&gt;
&lt;td&gt;Developers, budget-focused users&lt;/td&gt;
&lt;td&gt;Cheap, customizable&lt;/td&gt;
&lt;td&gt;Occasional outages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaleway&lt;/td&gt;
&lt;td&gt;Startups, eco-conscious teams&lt;/td&gt;
&lt;td&gt;Easy to use, responsive support&lt;/td&gt;
&lt;td&gt;Payment issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IONOS&lt;/td&gt;
&lt;td&gt;Security-focused SMBs&lt;/td&gt;
&lt;td&gt;Excellent uptime, simple UI&lt;/td&gt;
&lt;td&gt;Expensive fees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;td&gt;AI-heavy, multilingual projects&lt;/td&gt;
&lt;td&gt;High accuracy&lt;/td&gt;
&lt;td&gt;High upfront cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nebius&lt;/td&gt;
&lt;td&gt;GPU-intensive workloads&lt;/td&gt;
&lt;td&gt;Cost-efficient&lt;/td&gt;
&lt;td&gt;Scaling limitations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T-Systems&lt;/td&gt;
&lt;td&gt;Large enterprises, regulated industries&lt;/td&gt;
&lt;td&gt;Full-service, secure&lt;/td&gt;
&lt;td&gt;Complex, pricey&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  💬 Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Choosing a European AI provider doesn’t have to be a long-term commitment.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stay flexible&lt;/li&gt;
&lt;li&gt;Avoid downtime&lt;/li&gt;
&lt;li&gt;Optimize your AI costs and performance on the fly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you need &lt;strong&gt;serverless smart routing&lt;/strong&gt;, &lt;strong&gt;dedicated deployments&lt;/strong&gt;, or &lt;strong&gt;both&lt;/strong&gt;, &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; helps you build AI systems that are smarter, faster, and future-proof.&lt;/p&gt;

&lt;h2&gt;
  
  
  📖 Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cortecs.ai/serverless-inference/serverless-routing" rel="noopener noreferrer"&gt;Serverless Smart Routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cortecs.ai/dedicated-inference" rel="noopener noreferrer"&gt;Dedicated Inference&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cortecs</category>
      <category>ai</category>
      <category>europe</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building Intelligent Multi-Agent Systems with CrewAI</title>
      <dc:creator>Eva Jagodic</dc:creator>
      <pubDate>Tue, 04 Feb 2025 13:46:47 +0000</pubDate>
      <link>https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2</link>
      <guid>https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent systems (MAS)&lt;/strong&gt; for large language models (LLMs) represent a significant advancement in AI-driven problem-solving. Rather than operating in isolation, LLM agents collaborate, exchange information, and make dynamic decisions to achieve complex objectives efficiently.&lt;/p&gt;

&lt;p&gt;From &lt;strong&gt;document analysis&lt;/strong&gt; and &lt;strong&gt;automated research&lt;/strong&gt; to &lt;strong&gt;content generation&lt;/strong&gt; and &lt;strong&gt;customer support&lt;/strong&gt;, LLM-based MAS revolutionizes workflows by offering scalability, adaptability, and efficiency. Their ability to interact and coordinate dynamically enables efficient collaboration across multiple AI-driven tasks, optimizing performance in real-world applications.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll explore LLM multi-agent fundamentals and real-world applications, and guide you step-by-step in building your own intelligent agent system. We will be using &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/a&gt;, an open-source framework for orchestrating autonomous AI agents, and we will power it with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cortecs LLM workers&lt;/strong&gt;&lt;/a&gt;. Get ready to bring AI collaboration to life!&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#understanding-multi-agent-systems"&gt;Understanding Multi-Agent Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#setting-up-the-development-environment"&gt;Setting Up the Development Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#adding-dynamic-provisioning-to-your-example-crew"&gt;Adding Dynamic Provisioning to Your Example Crew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#running-your-crew"&gt;Running Your Crew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#conclusion"&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding Multi-Agent Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are Multi-Agent Systems?
&lt;/h3&gt;

&lt;p&gt;An LLM-based MAS consists of multiple AI agents that interact in a shared environment to process language tasks efficiently. These agents, powered by large language models, collaborate by exchanging information, analysing data, and generating responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM Agents&lt;/strong&gt; – AI-driven entities that process and generate text based on specific roles and objectives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt; – The digital space where agents operate, such as document repositories, chat interfaces, or APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt; – How agents share insights, using structured prompts, shared memory, or message-passing frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision-Making&lt;/strong&gt; – The strategies agents use to determine responses, often involving chain-of-thought reasoning or reinforcement learning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Benefits of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; – Handles large-scale text processing tasks efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt; – Multiple agents can divide and refine tasks for better accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptability&lt;/strong&gt; – Easily integrates into various workflows and industries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; – Automates complex workflows with minimal human intervention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applications of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Research&lt;/strong&gt; – Agents collaborate to summarize, fact-check, and analyse documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Generation&lt;/strong&gt; – Teams of AI writers draft, edit, and refine articles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Support&lt;/strong&gt; – AI agents handle inquiries, escalate issues, and personalize responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Extraction &amp;amp; Analysis&lt;/strong&gt; – AI parses structured and unstructured text for insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these fundamentals prepares us to implement an LLM-based MAS!&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;Let's install the required libraries for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai crewai-tools uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll use &lt;code&gt;crewai&lt;/code&gt; and its extension &lt;code&gt;crewai-tools&lt;/code&gt; to orchestrate our agents, while the &lt;code&gt;uv&lt;/code&gt; package manager helps run our crews.&lt;/p&gt;

&lt;p&gt;Once the libraries are installed, we will create an example crew with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai create crew example_crew
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted for a model provider, we can select OpenAI from the list. Since Cortecs LLM workers are OpenAI-compatible, we'll use our Cortecs credentials. First, create an account on &lt;a href="http://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt;, then visit your &lt;a href="https://cortecs.ai/userArea/userProfile" rel="noopener noreferrer"&gt;profile page&lt;/a&gt; to generate access credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CORTECS_CLIENT_ID&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CLIENT_SECRET&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CORTECS_API_KEY&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, select a model for your crew. We recommend using an 🔵 &lt;strong&gt;Instantly Provisioned&lt;/strong&gt; model like &lt;code&gt;cortecs/phi-4-FP8-Dynamic&lt;/code&gt;. The &lt;code&gt;openai/&lt;/code&gt; prefix below indicates we're using an OpenAI-compatible endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai/cortecs/phi-4-FP8-Dynamic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Dynamic Provisioning to Your Example Crew
&lt;/h2&gt;

&lt;p&gt;Let's dynamically provision an LLM worker to power our crew.&lt;/p&gt;

&lt;p&gt;We will navigate to &lt;code&gt;example_crew/src/example_crew/crew.py&lt;/code&gt; and modify the ExampleCrew class with these two key functions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;start_llm&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;This function initializes the Cortecs client and starts an LLM Worker of the desired model. We'll add it to the ExampleCrew class's &lt;code&gt;__init__&lt;/code&gt; function to ensure it runs when the crew starts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stop_and_delete_llm&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;To maximize cost efficiency, this function shuts down our resources when the crew completes its execution. We'll decorate it with the &lt;code&gt;@after_kickoff&lt;/code&gt; hook to ensure proper cleanup.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the modified ExampleCrew class implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.project&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrewBase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;after_kickoff&lt;/span&gt; &lt;span class="c1"&gt;#Add after_kickoff import
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;

&lt;span class="nd"&gt;@CrewBase&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExampleCrew&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_llm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ensure_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;

    &lt;span class="nd"&gt;@after_kickoff&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stop_and_delete_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; stopped and deleted.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#The rest of the ExampleCrew stays the same...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can further customize your crew by modifying agents.yaml, tasks.yaml and crew.py, or by following additional examples in the &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;crewai docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before running our crew, we will add the cortecs-py dependency to our pyproject file in &lt;code&gt;example_crew/pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;"crewai[tools]&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.100&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"cortecs-py&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;" #Add this line&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running Your Crew
&lt;/h2&gt;

&lt;p&gt;To run our crew, we will first navigate to the project directory (&lt;code&gt;example_crew/&lt;/code&gt;) and install the dependencies by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can execute the crew with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see that an LLM worker instance starts up. Once it's ready, the crew executes its task. Afterward, the instance automatically stops and gets deleted.&lt;/p&gt;

&lt;p&gt;The generated report will look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Comprehensive Report on Advances in Large Language Model (LLM) Technologies

## 1. Advanced Fine-Tuning Techniques

By 2025, significant advancements in fine-tuning techniques have marked a turning point for Large Language Models (LLMs). These improvements include few-shot and zero-shot learning, enabling models to perform new tasks with minimal task-specific data. Few-shot learning takes advantage of a minimal number of examples, allowing the model to generalize well across similar tasks. Zero-shot learning, on the other hand, lets the model tackle tasks without any task-specific training data. These techniques reduce dependency on extensive labeled datasets and expedite adaptation to diverse applications, offering flexibility and efficiency.

## 2. Multi-Modal Capabilities

LLMs have evolved to incorporate multi-modal data, effectively integrating information from text, images, video, and audio. This enhancement broadens their application across various sectors. In healthcare, multi-modal LLMs facilitate complex case studies by correlating clinical text with imagery and patient history. In autonomous systems, they enhance decision-making by combining sensory data with textual inputs. This synergy results in richer, more contextual insights, enabling more comprehensive understanding and interaction within environments.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we've explored how to build a multi-agent system using CrewAI and Cortecs LLM workers. We covered the fundamentals of LLM-based multi-agent systems, from understanding their key components to practical implementation. We've learned how to set up your development environment, dynamically provision LLM workers, and create a functional crew that can efficiently handle complex tasks.&lt;/p&gt;

&lt;p&gt;To dive deeper into multi-agent systems, check out the &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;CrewAI documentation&lt;/a&gt; and explore the &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs platform&lt;/a&gt;. Happy building! 🚀✨&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>nlp</category>
      <category>cortecs</category>
    </item>
    <item>
      <title>All Too Swift: Real-Time Reddit Processing Simplified with AI</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 22 Jan 2025 08:37:49 +0000</pubDate>
      <link>https://dev.to/cortecs/all-too-swift-real-time-reddit-processing-simplified-with-ai-2edo</link>
      <guid>https://dev.to/cortecs/all-too-swift-real-time-reddit-processing-simplified-with-ai-2edo</guid>
      <description>&lt;p&gt;What if you could instantly spot and respond to millions of Reddit comments, all in real-time? No delays, no limits—just fast, seamless insights as they happen.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll show you how to set up a real-time data processing system using powerful AI models with &lt;strong&gt;LLM Workers&lt;/strong&gt;. To bring it to life, we’ll use a &lt;strong&gt;Taylor Swift bot&lt;/strong&gt; as an example, a bot that scans Reddit comments in real-time to find and respond to discussions about Taylor Swift. ✨&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents 🗂️
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
The Power of Real-Time Data Processing and Dedicated Inference
&lt;/li&gt;
&lt;li&gt;
Building the Reddit Bot

&lt;ul&gt;
&lt;li&gt;
Step 1: Set Up Your Environment
&lt;/li&gt;
&lt;li&gt;
Step 2: Setting Up Reddit and Initializing Cortecs Model
&lt;/li&gt;
&lt;li&gt;
Step 3: Define the Classification and Response Chains
&lt;/li&gt;
&lt;li&gt;
Step 4: Stream and Process Reddit Comments in Real-Time
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Conclusion
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Power of Real-Time Data Processing and Dedicated Inference
&lt;/h2&gt;

&lt;p&gt;We all know that real-time applications demand high performance, especially when you're dealing with large amounts of data. However, the challenge of processing data quickly and efficiently is easily resolved by using &lt;strong&gt;dedicated inference&lt;/strong&gt;, and this is where &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; really shines.&lt;/p&gt;

&lt;p&gt;By leveraging Cortecs' dedicated inference models, you get a system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handles High Volumes:&lt;/strong&gt; Process hundreds of requests per second without throttling, with the ability to scale seamlessly using LLM Workers dedicated to specific tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintains Consistency:&lt;/strong&gt; With dedicated resources like LLM Workers, you can count on stable latency, no matter the load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is Easy to Implement:&lt;/strong&gt; You don’t need to worry about complex infrastructure or performance fine-tuning; it just works.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4142vha3ee6r1kzvqmmi.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4142vha3ee6r1kzvqmmi.PNG" alt="Cortex and Reddit combination and logo" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Dedicated Inference Matters
&lt;/h3&gt;

&lt;p&gt;Traditional inference models often share resources with other users, leading to bottlenecks during peak times. With &lt;strong&gt;dedicated inference&lt;/strong&gt;, you get exclusive access to computational resources, ensuring that your system remains reliable and fast even under heavy loads. This makes it ideal for applications like fraud detection, customer service automation, and content moderation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Reddit Bot 🛠️
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Set Up Your Environment
&lt;/h3&gt;

&lt;p&gt;Before diving into the code, you need to install a few libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt; &lt;span class="n"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;core&lt;/span&gt; &lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These libraries serve the following purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;praw:&lt;/strong&gt; The Python Reddit API Wrapper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;langchain-core:&lt;/strong&gt; The core abstractions of LangChain, a framework for working with language models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cortecs-py:&lt;/strong&gt; The Python client for Cortecs, the platform that provides high-performance models for real-time inference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that, to authenticate and access the Cortecs models, you need to create an account at &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt;&lt;/strong&gt;. &lt;br&gt;
Once you’ve signed up, go to your &lt;a href="https://cortecs.ai/userArea/userProfile" rel="noopener noreferrer"&gt;profile page&lt;/a&gt;, generate your access credentials, and set them as environment variables in your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Set the Cortecs API credentials as environment variables
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_openai_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Setting Up Reddit and Initializing Cortecs Model
&lt;/h3&gt;

&lt;p&gt;Then, you'll need to create a Reddit account and register your application to get API access. To do this, visit &lt;strong&gt;&lt;a href="https://www.reddit.com/prefs/apps" rel="noopener noreferrer"&gt;Reddit's API page&lt;/a&gt;&lt;/strong&gt; and create a new application to obtain your &lt;strong&gt;Client ID&lt;/strong&gt; and &lt;strong&gt;Client Secret&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqihnherm4bbvt8qsscq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqihnherm4bbvt8qsscq.PNG" alt="Reddit interface for creating a bot application" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your Client ID and Client Secret, you can initialize the Reddit API client and set up the Cortecs model for real-time inference as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DedicatedLLM&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="c1"&gt;# Choose the model for real-time inference
&lt;/span&gt;   &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
   &lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="c1"&gt;# Set up Reddit API credentials
&lt;/span&gt;   &lt;span class="n"&gt;reddit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Reddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Replace with your Client ID
&lt;/span&gt;       &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your Client Secret
&lt;/span&gt;       &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# Replace with your User Agent
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;model_name&lt;/code&gt; refers to the model you choose for inference. In this example, we’ve selected the &lt;code&gt;cortecs/phi-4-FP8-Dynamic&lt;/code&gt; model, which is suitable for many general-purpose tasks. You can find a list of models &lt;a href="https://cortecs.ai/models" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Define the Classification and Response Chains
&lt;/h3&gt;

&lt;p&gt;In this step, we initialize the model for real-time processing and define the classification and response chains that will be used to process the posts and generate responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DedicatedLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Given the reddit post below, classify it as either `Art`, `Finance`, `Science`, `Taylor Swift` or `Other`.
        Do not provide an explanation.

        {channel}: {title}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; Classification:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;classification_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are the biggest Taylor Swift fan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Respond to this post:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; {comment}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;response_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we define two main tasks (or "chains"):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification Chain:&lt;/strong&gt; The first prompt defines the classification logic for Reddit posts. It takes the post title and subreddit as input and classifies the post as &lt;em&gt;Art&lt;/em&gt;, &lt;em&gt;Finance&lt;/em&gt;, &lt;em&gt;Science&lt;/em&gt;, &lt;em&gt;Taylor Swift&lt;/em&gt;, or &lt;em&gt;Other&lt;/em&gt;. The &lt;code&gt;StrOutputParser()&lt;/code&gt; converts the model's output into a plain string.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Response Chain:&lt;/strong&gt; The second prompt generates a response if the post is about Taylor Swift. We use a system message to instruct the model to behave as the &lt;strong&gt;biggest Taylor Swift fan&lt;/strong&gt; and a user message to define the format of the response.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
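
&lt;p&gt;Before pointing the bot at the live stream, it's worth sanity-checking the classification chain on a single made-up post. A minimal check, run inside the &lt;code&gt;with DedicatedLLM(...)&lt;/code&gt; block (the subreddit and title below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        # quick sanity check with a made-up post (placeholder values)
        topic = classification_chain.invoke({
            'channel': 'r/popheads',
            'title': 'The Eras Tour just broke another attendance record'
        })
        print(topic)  # should print: Taylor Swift
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;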

&lt;h3&gt;
  
  
  Step 4: Stream and Process Reddit Comments in Real-Time
&lt;/h3&gt;

&lt;p&gt;With the classification and response chains in place, the next step is to continuously stream Reddit comments and process them in real time. This allows the bot to react to posts as they come in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# scan reddit in realtime 
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reddit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subreddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classification_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subreddit_name_prefixed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link_title&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subreddit_name_prefixed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link_title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Taylor Swift&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;comment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;---&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code does three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streams Comments:&lt;/strong&gt; Continuously monitors Reddit for new comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classifies Posts:&lt;/strong&gt; Uses the classification chain to categorize each post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responds to Specific Topics:&lt;/strong&gt; If a post is classified as Taylor Swift-related, the bot generates a reply with the response chain and prints it.&lt;/li&gt;
&lt;/ul&gt;
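
&lt;p&gt;Note that the example above only prints the generated reply. If you want the bot to actually answer on Reddit, PRAW lets you reply to a comment directly. A minimal sketch, assuming your Reddit app is authenticated with posting permissions and that you respect the subreddit's rules and API rate limits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        # skip_existing=True ignores the backlog and only processes new comments
        for post in reddit.subreddit("all").stream.comments(skip_existing=True):
            topic = classification_chain.invoke({'channel': post.subreddit_name_prefixed,
                                                 'title': post.link_title})
            if topic == 'Taylor Swift':
                response = response_chain.invoke({'comment': post.body})
                post.reply(response.content)  # post the reply back to Reddit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;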

&lt;p&gt;While the code is running, you can monitor the progress of the model execution on the &lt;a href="https://cortecs.ai/userArea/console" rel="noopener noreferrer"&gt;console page&lt;/a&gt; of the Cortecs web interface, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz9m6r1et6sydnyw8c4f.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz9m6r1et6sydnyw8c4f.PNG" alt="Console page of cortecs" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion 🎉
&lt;/h2&gt;

&lt;p&gt;Building real-time applications can be challenging, but with the right tools, it becomes much more manageable. By using &lt;strong&gt;LLM Workers&lt;/strong&gt;, you can process high volumes of data without compromising performance. Whether you're classifying content, detecting trends, or automating responses, the approach shown here can easily be adapted to fit your needs.&lt;/p&gt;

&lt;p&gt;Now, it’s your turn to try it out. Start experimenting with real-time data processing and explore the possibilities! 🚀&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Streamline Your Batch Jobs: The Power of LLM Workers 🤖</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Fri, 17 Jan 2025 12:15:35 +0000</pubDate>
      <link>https://dev.to/cortecs/streamline-your-batch-jobs-the-power-of-cortecs-ai-inference-2jjl</link>
      <guid>https://dev.to/cortecs/streamline-your-batch-jobs-the-power-of-cortecs-ai-inference-2jjl</guid>
      <description>&lt;p&gt;Have you ever felt overwhelmed by the sheer volume of data you need to process or wished you could automate repetitive tasks effortlessly? &lt;/p&gt;

&lt;p&gt;Imagine being able to summarize hundreds of research papers in minutes, extract critical insights from vast datasets, or streamline tedious workflows. &lt;br&gt;
In this article, we’ll explore how &lt;strong&gt;Cortecs&lt;/strong&gt; helps you unlock the full potential of large language models (LLMs) with &lt;strong&gt;ease&lt;/strong&gt;, &lt;strong&gt;scalability&lt;/strong&gt;, and &lt;strong&gt;cost-efficiency&lt;/strong&gt;. Specifically, we’ll focus on how Cortecs simplifies handling batch jobs and massive data workloads, guiding you through everything from environment setup to seamless data processing at scale. &lt;/p&gt;

&lt;p&gt;Let’s dive in and see how Cortecs can transform your AI journey.&lt;/p&gt;
&lt;h2&gt;
  
  
  Table of Contents 📚
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Cortecs?&lt;/li&gt;
&lt;li&gt;Setting Up Your Environment&lt;/li&gt;
&lt;li&gt;
Batch Processing with Cortecs-py

&lt;ul&gt;
&lt;li&gt;Step 1: Loading Documents&lt;/li&gt;
&lt;li&gt;Step 2: Creating a Prompt&lt;/li&gt;
&lt;li&gt;Step 3: Batch Processing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  What is Cortecs?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; is a platform that gives you on-demand access to powerful LLMs running on dedicated servers. This ensures maximum performance, reliability, and scalability for your AI tasks. &lt;/p&gt;

&lt;p&gt;Cortecs lets you manage LLM Workers for large-scale processing, offloading tasks to specialized AI workers for high throughput and faster processing of massive datasets⚡.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Servers for Fast AI Processing:&lt;/strong&gt; With Cortecs, you get exclusive access to dedicated servers, meaning faster, more efficient AI processing without the competition for resources 🚀.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to Set Up and Use:&lt;/strong&gt; Cortecs is designed for simplicity. It integrates seamlessly with your existing workflows, so you can start using LLMs right away with minimal setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable and Cost-Effective:&lt;/strong&gt; Cortecs scales with your needs, offering dynamic resource allocation that ensures you only pay for what you use💰, keeping costs low.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Setting Up Your Environment 🛠️
&lt;/h2&gt;

&lt;p&gt;Before diving into batch processing, you'll need to set up your environment. First, register at &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt; and create your access credentials on your &lt;strong&gt;profile page&lt;/strong&gt; 📋. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ljrdmlpqcvpsa3sgau.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ljrdmlpqcvpsa3sgau.PNG" alt="Profile page example from Cortecs interface" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your credentials, set them as environment variables in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Set the Cortecs API credentials as environment variables
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_openai_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll also need to install several Python libraries to run the example below. They can all be installed via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-community
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cortecs-py
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;arxiv
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pymupdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Batch Processing with Cortecs-py 🔄
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://pypi.org/project/cortecs-py/" rel="noopener noreferrer"&gt;Cortecs-py&lt;/a&gt;&lt;/strong&gt; is a lightweight Python wrapper for the Cortecs REST API. It provides you with the tools to dynamically manage your AI instances directly from your workflow, making batch processing seamless and efficient. &lt;/p&gt;

&lt;p&gt;Combined with LangChain, a versatile framework for LLM workflows, it unlocks incredible efficiency and power. &lt;/p&gt;

&lt;p&gt;Let’s explore a real-world example of using &lt;strong&gt;Cortecs-py&lt;/strong&gt; for batch processing:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Loading Documents 📄
&lt;/h4&gt;

&lt;p&gt;After adding the necessary credentials and installing the required libraries, we’ll retrieve research papers from Arxiv using the &lt;strong&gt;ArxivLoader&lt;/strong&gt;, focusing on a query like 'Reasoning.'&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ArxivLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DedicatedLLM&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Cortecs client
&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Load documents
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArxivLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_max_docs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_ful_documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;doc_content_chars_max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;load_all_available_meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
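
&lt;p&gt;Before spending compute on the batch, it's worth checking what actually came back. A quick inspection (the metadata keys are those typically returned by &lt;code&gt;ArxivLoader&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# inspect the loaded documents before processing them
print(f"Loaded {len(docs)} documents")
print(docs[0].metadata.get("Title"))            # paper title
print(len(docs[0].page_content), "characters")  # capped by doc_content_chars_max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;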



&lt;h4&gt;
  
  
  Step 2: Creating a Prompt 💬
&lt;/h4&gt;

&lt;p&gt;Then, we’ll create a simple prompt that asks the model to explain the document content in plain language.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{text}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Explain to me like I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m five:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
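
&lt;p&gt;To see exactly what the model will receive, you can render the template yourself. A quick check (the sample text is just an illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# render the template to inspect the final message
messages = prompt.format_messages(text="Transformers use attention to weigh context.")
print(messages[0].content)
# Transformers use attention to weigh context.
#
# Explain to me like I'm five:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;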



&lt;h4&gt;
  
  
  Step 3: Batch Processing 🏭
&lt;/h4&gt;

&lt;p&gt;With Cortecs-py, batch processing is straightforward. The &lt;code&gt;DedicatedLLM&lt;/code&gt; class makes it even easier, as it automatically takes care of starting and stopping your infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DedicatedLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing data batch-wise ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-------&lt;/span&gt;&lt;span class="se"&gt;\n\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;💡 &lt;strong&gt;Remark&lt;/strong&gt;: Don't forget to choose a model that supports the required context length for your use case. In this example, we are using the &lt;code&gt;phi-4-FP8-Dynamic&lt;/code&gt; model. &lt;br&gt;
You can explore the full range of models offered by Cortecs &lt;u&gt;&lt;a href="https://cortecs.ai/models" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/u&gt;.&lt;/p&gt;
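
&lt;p&gt;As a rough rule of thumb (a heuristic, not an exact tokenizer count), English text averages about four characters per token, so you can estimate how much context each document will consume:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# rough estimate: ~4 characters per token for English text (heuristic)
doc_content_chars_max = 25_000
approx_tokens_per_doc = doc_content_chars_max // 4
print(approx_tokens_per_doc)  # ~6250 tokens per document, plus room for the output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;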

&lt;p&gt;Below is an example of the batch-processing output 📊:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3s6u2qrj5m3421oxmjm.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3s6u2qrj5m3421oxmjm.PNG" alt="LLM workers output" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This simple pipeline summarized &lt;strong&gt;224,200&lt;/strong&gt; input tokens into &lt;strong&gt;12,900&lt;/strong&gt; output tokens in just &lt;strong&gt;55 seconds&lt;/strong&gt;, proving the efficiency of batch processing with dedicated inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhsjsq24pbi256echxmu.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhsjsq24pbi256echxmu.PNG" alt="Company Model Comparison" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When comparing the cost of using &lt;strong&gt;Cortecs&lt;/strong&gt; for summarization tasks to other solutions like Fireworks or cloud-based services, Cortecs stands out for its cost efficiency, with no unpredictable costs. This makes it an ideal solution for companies looking to leverage AI without breaking the bank🏦.&lt;/p&gt;

&lt;p&gt;Ready to transform your workflows and elevate your AI projects? &lt;/p&gt;

&lt;p&gt;Discover how &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; can help you unlock the power of Large Language Models (LLMs) while maintaining cost efficiency🚀. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nlp</category>
      <category>cortecs</category>
    </item>
    <item>
      <title>LLMs for Big Data</title>
      <dc:creator>Marko Arnauto</dc:creator>
      <pubDate>Mon, 13 Jan 2025 12:00:19 +0000</pubDate>
      <link>https://dev.to/cortecs/llms-for-big-data-1hfb</link>
      <guid>https://dev.to/cortecs/llms-for-big-data-1hfb</guid>
      <description>&lt;p&gt;We all love our chatbots, but when it comes to heavy loads, they just don’t cut it. If you need to analyze thousands of documents at once, serverless inference — the go-to for chat applications — quickly shows its (rate) limits. &lt;/p&gt;

&lt;h2&gt;
  
  
  One Model — Many Users 
&lt;/h2&gt;

&lt;p&gt;Imagine working in a shared co-working space: it’s convenient, but your productivity depends on how crowded the space is. Similarly, &lt;strong&gt;serverless offerings&lt;/strong&gt; from providers like OpenAI, Anthropic, or Groq rely on shared infrastructure, where performance fluctuates based on how many users are competing for resources. Strict rate limits, like Groq’s 7,000 tokens per minute, can grind progress to a halt. &lt;/p&gt;

&lt;h2&gt;
  
  
  Dedicated Compute — One Model per User
&lt;/h2&gt;

&lt;p&gt;In contrast, &lt;strong&gt;dedicated inference allocates compute resources exclusively to a single user&lt;/strong&gt; or application. This ensures predictable and consistent performance, as the only limiting factor is the computational capacity of the allocated GPUs. According to &lt;a href="https://fireworks.ai" rel="noopener noreferrer"&gt;Fireworks.ai&lt;/a&gt;, a leading inference provider,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Graduating from serverless to on-demand deployments starts to make sense economically when you are running ~100k+ tokens per minute.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are typically no rate limits on throughput. Billing for dedicated inference is time-based, calculated per hour or minute depending on the platform. While dedicated inference is well-suited for high-throughput workloads, it involves a tedious setup process as well as the risk of overpaying due to idle times.&lt;/p&gt;
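
&lt;p&gt;To make the economics concrete, here is a back-of-the-envelope comparison with made-up prices (both figures are assumptions for illustration, not actual rates):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# hypothetical prices, for illustration only
serverless_price_per_1m_tokens = 0.50  # $ per 1M tokens (assumption)
dedicated_price_per_hour = 3.00        # $ per GPU-hour (assumption)

# throughput at which dedicated matches serverless cost
break_even_tokens_per_hour = dedicated_price_per_hour / serverless_price_per_1m_tokens * 1_000_000
print(break_even_tokens_per_hour / 60)  # 100000.0 tokens per minute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these illustrative numbers, dedicated inference pays off above roughly 100k tokens per minute, in line with the rule of thumb quoted above.&lt;/p&gt;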

&lt;h3&gt;
  
  
  Tedious Setup
&lt;/h3&gt;

&lt;p&gt;Deploying dedicated inference requires careful preparation. First, you need to rent suitable hardware to support your chosen model. Next, an inference engine such as vLLM must be configured to match the model’s requirements. Finally, secure access must be established via a TLS-encrypted connection to ensure encrypted communication. According to Philipp Schmid of Hugging Face, &lt;a href="https://www.philschmid.de/cost-generative-ai" rel="noopener noreferrer"&gt;you need one full-time developer&lt;/a&gt; to set up and maintain such a system. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18v3tpy9iric55w7h52z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18v3tpy9iric55w7h52z.png" alt="Dedicated deployments require a tedious setup." width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Idle Times
&lt;/h3&gt;

&lt;p&gt;Time-based billing makes cost projections easier, but idle resources can quickly become a cost overhead. Dedicated inference is cost-effective only when GPUs are busy. To avoid unnecessary expenses, the system should be turned off when not in use. Managing this manually is tedious and error-prone.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Workers to the Rescue
&lt;/h2&gt;

&lt;p&gt;To address the downsides of dedicated inference, providers like Google and Cortecs offer dedicated LLM workers. Without any additional configuration, these workers are started and stopped on demand — avoiding setup overhead and idle times. The required hardware is allocated, the inference engine is configured, and API connections are established, all in the background. Once the workload is completed, the worker shuts down automatically. &lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;As I’m involved in the cortecs project, I’m going to showcase it using our &lt;a href="https://github.com/cortecs-ai/cortecs-py" rel="noopener noreferrer"&gt;library&lt;/a&gt;. It can be installed with pip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install cortecs-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use the OpenAI Python library to access the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, register at &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;cortecs.ai&lt;/a&gt; and create your access credentials on the profile page. Then set them as environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;export OPENAI_API_KEY="Your cortecs api key"&lt;br&gt;
export CORTECS_CLIENT_ID="Your cortecs id"&lt;br&gt;
export CORTECS_CLIENT_SECRET="Your cortecs secret"&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;It’s time to choose a model. We selected &lt;em&gt;phi-4-FP8-Dynamic&lt;/em&gt;, a model that supports 🔵 instant provisioning. Models with instant provisioning enable a warm start, eliminating provisioning latency — perfect for this demonstration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;

&lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;my_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Start a new instance
&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ensure_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a joke about LLMs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Stop the instance
&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All provisioning complexity is abstracted by &lt;code&gt;cortecs.ensure_instance(my_model)&lt;/code&gt; and &lt;code&gt;cortecs.stop(my_instance.instance_id)&lt;/code&gt;. Between these two lines, you can execute arbitrary inference tasks—whether it's generating a simple joke about LLMs or producing billions of words.&lt;/p&gt;
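
&lt;p&gt;Since billing is time-based, you will also want the instance to stop if your code raises an exception midway. A minimal sketch using only the two calls shown above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# make sure the worker is stopped even if inference fails (time-based billing)
my_instance = cortecs.ensure_instance(my_model)
try:
    ...  # run your inference workload here
finally:
    cortecs.stop(my_instance.instance_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;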

&lt;p&gt;&lt;strong&gt;LLM Workers are a game-changer&lt;/strong&gt; for large-scale data analysis. With no need to manage complex compute clusters, they enable seamless big data analysis and generation without the typical concerns of rate limits or exploding inference costs.&lt;br&gt;
Imagine a future where LLM Workers handle highly complex tasks, such as proving mathematical theorems or executing reasoning-intensive operations. You could launch a worker, let it run at full GPU utilization to tackle the problem, and have it shut itself down automatically upon completion. The potential is enormous, and this tutorial demonstrates how to dynamically provision LLM Workers for high-performance AI tasks.&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>llm</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
