<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: cortecs</title>
    <description>The latest articles on DEV Community by cortecs (@cortecs).</description>
    <link>https://dev.to/cortecs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10180%2F1e7ba1da-bc26-4910-95a9-2d5a30e47b55.png</url>
      <title>DEV Community: cortecs</title>
      <link>https://dev.to/cortecs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cortecs"/>
    <language>en</language>
    <item>
      <title>OpenCode vs Claude Code</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 29 Oct 2025 10:09:19 +0000</pubDate>
      <link>https://dev.to/cortecs/opencode-claude-code-1f0g</link>
      <guid>https://dev.to/cortecs/opencode-claude-code-1f0g</guid>
      <description>&lt;p&gt;AI coding assistants are becoming indispensable for developers, streamlining tasks from writing to debugging code. But as these tools proliferate, a critical question arises: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How much control do you really have over where your code goes and who can access it❓&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not all AI coding solutions offer the same level of transparency or control, and for organizations bound by strict &lt;strong&gt;compliance&lt;/strong&gt; frameworks, this difference can have &lt;strong&gt;serious legal and operational consequences&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Claude Code: Great for Productivity, Limited Control
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.claude.com/product/claude-code" rel="noopener noreferrer"&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/a&gt;, developed by &lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;, is a &lt;strong&gt;terminal-based AI coding assistant&lt;/strong&gt; that integrates directly into your workflow. It helps with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code completion&lt;/li&gt;
&lt;li&gt;Error detection&lt;/li&gt;
&lt;li&gt;Documentation generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s user-friendly, smart, and efficient — but it comes with trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💰 &lt;strong&gt;Fixed pricing&lt;/strong&gt;: Plans require a commitment of up to €200/month&lt;/li&gt;
&lt;li&gt;⚙️ &lt;strong&gt;Opt-out needed&lt;/strong&gt;: You must actively opt out to ensure your source code isn’t used for training&lt;/li&gt;
&lt;li&gt;🔒 &lt;strong&gt;Limited control&lt;/strong&gt;: Developers have little say over where their code travels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That &lt;strong&gt;lack of control&lt;/strong&gt; can become a real problem for professionals who must meet strict data policies — whether set by their company, clients, or compliance frameworks.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 OpenCode: The Open Source Competitor
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/a&gt; is an &lt;strong&gt;open-source, terminal-based AI coding assistant&lt;/strong&gt; created to give developers the freedom, flexibility, and compliance missing from closed solutions. It enables developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write, debug, and refactor code using natural language&lt;/li&gt;
&lt;li&gt;Integrate any language model of their choice (Claude, GPT, Mistral, Llama, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;Open Source:&lt;/strong&gt; Fully transparent and community-audited&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Bring Your Own Model (BYOM):&lt;/strong&gt; use Claude or any other LLM&lt;/li&gt;
&lt;li&gt;💸 &lt;strong&gt;Flexible pricing:&lt;/strong&gt; Pay only for tokens used — no flat monthly commitment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many teams — especially those operating under &lt;strong&gt;strict privacy or compliance policies&lt;/strong&gt; — need to ensure their data is processed according to &lt;strong&gt;internal company rules&lt;/strong&gt;. This often means keeping all activity within the EU and preventing any source code from being used for model training.&lt;/p&gt;

&lt;p&gt;Here’s a quick overview of how to connect &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/a&gt; with European LLM endpoints to meet those requirements.&lt;/p&gt;




&lt;h2&gt;
  
  
  🇪🇺 OpenCode + Cortecs: EU compliance
&lt;/h2&gt;

&lt;p&gt;When paired with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cortecs&lt;/strong&gt;&lt;/a&gt;, a &lt;strong&gt;European LLM router&lt;/strong&gt;, OpenCode can route AI requests to &lt;strong&gt;GDPR-compliant LLM endpoints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53t8ltrpljke9vynr306.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53t8ltrpljke9vynr306.PNG" alt="opencode+cortecs logo" width="700" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧰 &lt;strong&gt;Benefits Include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Residency in Europe:&lt;/strong&gt; Your code and queries never leave EU jurisdiction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Training by Default:&lt;/strong&gt; None of your data is used to train or fine-tune models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-In GDPR Compliance:&lt;/strong&gt; Privacy-first design from the start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration:&lt;/strong&gt; Works with your existing local or cloud infrastructure, including editors such as VS Code&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧪 Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Install &lt;strong&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;&lt;/strong&gt; from the project repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; as your model router (refer to the &lt;a href="https://docs.cortecs.ai/integration-examples/coding/opencode" rel="noopener noreferrer"&gt;Cortecs Docs&lt;/a&gt; for setup details).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose your GDPR-compliant &lt;a href="https://cortecs.ai/serverlessModels" rel="noopener noreferrer"&gt;model endpoint&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In minutes, you’ll have a secure, privacy-respecting AI assistant fully integrated into your terminal workflow.&lt;/p&gt;
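
&lt;p&gt;Before wiring OpenCode up, you can sanity-check your chosen endpoint with a quick OpenAI-compatible request. Here’s a minimal sketch in Python; the base URL, environment variable names, and model ID are illustrative placeholders, so take the real values from the Cortecs docs and your dashboard.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI  # pip install openai

# Placeholder values for illustration; take the real base URL and
# model ID from the Cortecs docs and your Cortecs dashboard.
client = OpenAI(
    base_url=os.environ["CORTECS_BASE_URL"],  # an EU-hosted, OpenAI-compatible endpoint
    api_key=os.environ["CORTECS_API_KEY"],
)

response = client.chat.completions.create(
    model=os.environ["CORTECS_MODEL"],  # the GDPR-compliant model picked in step 3
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;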

&lt;p&gt;In other words, &lt;strong&gt;OpenCode + Cortecs&lt;/strong&gt; gives developers &lt;strong&gt;full control over where and how data is processed&lt;/strong&gt;, without sacrificing AI coding productivity 🚀.&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>terminal</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Comparing LLM Routers</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 16 Jul 2025 10:26:32 +0000</pubDate>
      <link>https://dev.to/cortecs/comparing-llm-routers-54dl</link>
      <guid>https://dev.to/cortecs/comparing-llm-routers-54dl</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) are rapidly reshaping the tech landscape, transforming industries from AI-powered assistants and summarization tools to smart customer support and beyond.&lt;/p&gt;

&lt;p&gt;In today’s fast-moving AI world, developers need access to multiple models from different providers to serve diverse use cases.&lt;/p&gt;

&lt;p&gt;The challenge isn’t just &lt;em&gt;which&lt;/em&gt; model to use, it’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do you balance reliability, cost, speed, and data privacy while using LLMs, without becoming an infrastructure engineer❓&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the heart of this problem lies the &lt;strong&gt;LLM router&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0fsatas1v0pajbhmbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq0fsatas1v0pajbhmbi.png" alt="Image illustrating how an LLM router directs requests to multiple AI model providers" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📦 What is an LLM Router?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;LLM router&lt;/strong&gt; is like a smart traffic controller between your application and various LLM providers.&lt;/p&gt;

&lt;p&gt;It helps decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model should handle each request&lt;/li&gt;
&lt;li&gt;How to handle provider failures or slow responses&lt;/li&gt;
&lt;li&gt;How to balance cost, speed, reliability, and compliance across providers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  At a high level, an LLM router:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Accepts your inference request (like a chat prompt or code generation task)&lt;/li&gt;
&lt;li&gt;Evaluates available LLM providers (OpenAI, Anthropic, Nebius, etc.)&lt;/li&gt;
&lt;li&gt;Chooses the best provider based on real-time factors like cost, latency, and reliability&lt;/li&gt;
&lt;li&gt;Sends the request to the selected provider and returns the response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;smart, adaptable dispatcher&lt;/strong&gt; that shields you from the complexity of managing multiple LLM APIs.&lt;/p&gt;
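
&lt;p&gt;To make the dispatcher idea concrete, here’s a minimal failover sketch built on the OpenAI-compatible APIs most providers expose. The provider table and environment variable names are illustrative assumptions; a production router would also weigh latency, cost, and compliance metadata per provider.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI  # pip install openai

# Illustrative provider table, ordered by preference.
# The base URLs and env var names are placeholders, not real endpoints.
PROVIDERS = [
    {"base_url": os.environ["PROVIDER_A_URL"], "api_key": os.environ["PROVIDER_A_KEY"], "model": "model-a"},
    {"base_url": os.environ["PROVIDER_B_URL"], "api_key": os.environ["PROVIDER_B_KEY"], "model": "model-b"},
]


def route_chat(prompt: str) -&gt; str:
    """Try each provider in order and fail over on any error."""
    last_error = None
    for provider in PROVIDERS:
        try:
            client = OpenAI(base_url=provider["base_url"], api_key=provider["api_key"])
            response = client.chat.completions.create(
                model=provider["model"],
                messages=[{"role": "user", "content": prompt}],
                timeout=10,  # don't wait forever on a slow provider
            )
            return response.choices[0].message.content
        except Exception as error:  # a real router distinguishes rate limits, timeouts, outages
            last_error = error
    raise RuntimeError(f"All providers failed, last error: {last_error}")


print(route_chat("Summarize why failover matters in one sentence."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;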




&lt;h2&gt;
  
  
  ⚙️ Why Do You Need an LLM Router?
&lt;/h2&gt;

&lt;p&gt;Without a router, you’re typically tied to a single provider, which brings several risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Lock-in&lt;/strong&gt;: If your provider increases prices, rate limits you, or experiences downtime, you have limited options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missed Savings&lt;/strong&gt;: Some providers offer similar quality at significantly lower costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Model Specialization&lt;/strong&gt;: Some models are better suited for code, others for summarization, chat, or creative tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy and Compliance Risks&lt;/strong&gt;: Using non-compliant providers, especially in the EU, can lead to GDPR violations and legal issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Model Choice&lt;/strong&gt;: Relying on a single provider restricts your access to the growing variety of models available across the ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With an LLM router, you can:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Load-balance across multiple providers&lt;/li&gt;
&lt;li&gt;Failover automatically when a provider is unavailable&lt;/li&gt;
&lt;li&gt;Optimize for cost, latency, and privacy in real time&lt;/li&gt;
&lt;li&gt;Leverage model diversity for specialized tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Bottom line:&lt;/strong&gt; If you want to deliver fast, cost-efficient, reliable, and compliant AI experiences at scale, an LLM router is no longer optional.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧐 Comparison
&lt;/h2&gt;

&lt;p&gt;Let’s break down noteworthy LLM routers:&lt;/p&gt;




&lt;h3&gt;
  
  
  1️⃣ &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3b7fbb3i6wtzzzi9l2i.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3b7fbb3i6wtzzzi9l2i.PNG" alt="Cortecs Landing page screenshot" width="777" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compliant with European GDPR.&lt;/li&gt;
&lt;li&gt;Best coverage of the European ecosystem.&lt;/li&gt;
&lt;li&gt;Automated failover.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focused on Europe and GDPR.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2️⃣ &lt;a href="https://www.withmartian.com/" rel="noopener noreferrer"&gt;Martian&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamically routes requests to the best-performing model for each specific query.&lt;/li&gt;
&lt;li&gt;Offers significant cost savings by routing to cheaper models.&lt;/li&gt;
&lt;li&gt;Claims to outperform even GPT-4 on OpenAI’s own evaluations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing can be complex, with potential cost increases for advanced features or large-scale usage.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3️⃣ &lt;a href="https://www.requesty.ai/" rel="noopener noreferrer"&gt;Requesty&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports a wide range of providers through a single API key.&lt;/li&gt;
&lt;li&gt;Provides detailed information to improve observability and cost tracking.&lt;/li&gt;
&lt;li&gt;Offers cost savings through efficient request management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smart routing classification model can be complex to configure initially.&lt;/li&gt;
&lt;li&gt;Latency overhead from the classification model may impact ultra-low-latency applications.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4️⃣ &lt;a href="https://www.notdiamond.ai/" rel="noopener noreferrer"&gt;NotDiamond&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses a Random Forest Classifier to intelligently route prompts to the most suitable model.&lt;/li&gt;
&lt;li&gt;Allows tuning of the cost-performance tradeoff through a threshold parameter.&lt;/li&gt;
&lt;li&gt;Supports training custom routers for hyper-personalized routing tailored to specific applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom router training can be complex to set up.&lt;/li&gt;
&lt;li&gt;Limited public documentation on pricing, which may complicate budgeting.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5️⃣ &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iw3kfthl9iaatgnqhxe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iw3kfthl9iaatgnqhxe.png" alt="OpenRouter playground screenshot" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides a unified API to access multiple LLM providers.&lt;/li&gt;
&lt;li&gt;Supports a wide range of models from various providers.&lt;/li&gt;
&lt;li&gt;Offers higher availability with fallback options.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some concerns around data privacy and ownership of user-provided information.&lt;/li&gt;
&lt;li&gt;Usage in Europe may require GDPR compliance considerations.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you’re looking for a seamless way to optimize &lt;strong&gt;cost, speed, and compliance&lt;/strong&gt; without getting buried in infrastructure, an &lt;strong&gt;LLM router&lt;/strong&gt; is a must-have.&lt;/p&gt;

&lt;p&gt;🚀 Make your LLM workflows &lt;strong&gt;faster, safer, and smarter&lt;/strong&gt; from day one.&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>routers</category>
      <category>eu</category>
    </item>
    <item>
      <title>Choosing the Right AI Provider in Europe 🇪🇺</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Fri, 20 Jun 2025 12:53:31 +0000</pubDate>
      <link>https://dev.to/cortecs/choosing-the-right-ai-provider-in-europe-1lo1</link>
      <guid>https://dev.to/cortecs/choosing-the-right-ai-provider-in-europe-1lo1</guid>
      <description>&lt;p&gt;&lt;strong&gt;Artificial Intelligence (AI)&lt;/strong&gt; is transforming industries across Europe, from healthcare to finance to public services. In 2024, French AI startups alone raised over €1.3 billion, followed by Germany at €910 million and the UK at €318 million. As more companies prioritize data sovereignty and GDPR compliance, selecting the right European AI provider has never been more critical.&lt;/p&gt;

&lt;p&gt;But here’s the key question: &lt;strong&gt;with the European AI landscape booming, how do you choose the right provider?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer might be: &lt;strong&gt;don’t&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Locking yourself into a single AI provider can &lt;strong&gt;limit&lt;/strong&gt; your flexibility, increase your costs, and put your uptime at risk. &lt;/p&gt;

&lt;p&gt;In this article, we’ll break down the pros and cons of leading European AI providers and show how multi-provider routing with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; helps you stay agile and resilient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
🗺️ Comparison: European AI Providers

&lt;ul&gt;
&lt;li&gt;OVH: The French Cloud Pioneer&lt;/li&gt;
&lt;li&gt;Scaleway: Sustainable AI Infrastructure&lt;/li&gt;
&lt;li&gt;IONOS: The German AI Model Hub&lt;/li&gt;
&lt;li&gt;Mistral AI: Europe's LLM Champion&lt;/li&gt;
&lt;li&gt;Nebius: The GPU Price Disruptor&lt;/li&gt;
&lt;li&gt;T-Systems: Enterprise-Grade Digital Solutions Provider&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;✨ Unified Access: Bringing All Providers Together&lt;/li&gt;

&lt;li&gt;🔗 Cortecs: Europe’s AI Gateway&lt;/li&gt;

&lt;li&gt;🔍 Summary Table&lt;/li&gt;

&lt;li&gt;💬 Final Thoughts&lt;/li&gt;

&lt;li&gt;📖 Further Reading&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🗺️ Comparison: European AI Providers
&lt;/h2&gt;

&lt;p&gt;Here’s a quick overview of the major players in Europe’s AI landscape:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.ovhcloud.com/" rel="noopener noreferrer"&gt;OVH: The French Cloud Pioneer&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;OVH stands as one of Europe's most established cloud providers, offering a comprehensive suite of AI and machine learning services with a strong emphasis on data sovereignty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad range of products and scalable infrastructure&lt;/li&gt;
&lt;li&gt;Competitive pricing, especially for VPS and cloud hosting&lt;/li&gt;
&lt;li&gt;Excellent customization and advanced developer features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Occasional reliability issues and unexpected service shutdowns&lt;/li&gt;
&lt;li&gt;Complex and sometimes buggy user interface&lt;/li&gt;
&lt;li&gt;No refunds or money-back guarantees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers, sysadmins, and technically skilled users who can manage without reliable support&lt;/li&gt;
&lt;li&gt;Businesses needing low-cost, customizable VPS or cloud hosting in Europe&lt;/li&gt;
&lt;li&gt;Budget-conscious users who prioritize price and flexibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.scaleway.com/" rel="noopener noreferrer"&gt;Scaleway: Sustainable AI Infrastructure&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Scaleway positions itself as Europe's sustainable cloud provider, focusing on environmental responsibility while delivering high-performance AI infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-provisioning services with an easy-to-use platform, enabling better billing predictability&lt;/li&gt;
&lt;li&gt;Responsive support team, often resolving issues within a few hours&lt;/li&gt;
&lt;li&gt;Comprehensive image library for fast setup and deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing changes reported on certain services&lt;/li&gt;
&lt;li&gt;Poor handling of payment issues&lt;/li&gt;
&lt;li&gt;Limited server and hardware options compared to larger providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Startups and developers needing quick, user-friendly deployment with flexible scaling&lt;/li&gt;
&lt;li&gt;Teams looking for affordable European cloud services with a solid developer experience&lt;/li&gt;
&lt;li&gt;Users who can carefully manage payment terms and account balances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.ionos.com/" rel="noopener noreferrer"&gt;IONOS: The German AI Model Hub&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;IONOS has launched Germany's first multimodal AI platform, focusing on making AI accessible to small and medium-sized businesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy-to-use user dashboard&lt;/li&gt;
&lt;li&gt;Strong security and DDoS protection, including 24/7 malware scanning&lt;/li&gt;
&lt;li&gt;Consistent server uptime performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited customization options&lt;/li&gt;
&lt;li&gt;Expensive signup fees for some services&lt;/li&gt;
&lt;li&gt;Comparatively high renewal rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Businesses prioritizing strong security and uptime guarantees&lt;/li&gt;
&lt;li&gt;Teams looking for a simple, user-friendly cloud dashboard&lt;/li&gt;
&lt;li&gt;Organizations that need reliable uptime and solid DDoS protection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI: Europe's LLM Champion&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Mistral AI is primarily focused on AI models and services, rather than traditional cloud infrastructure like the other providers, and is establishing itself as a formidable competitor to OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customizable structure for industry-specific solutions&lt;/li&gt;
&lt;li&gt;Multilingual support, catering to diverse and global markets&lt;/li&gt;
&lt;li&gt;Offering flexibility and transparency for developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher upfront integration costs&lt;/li&gt;
&lt;li&gt;Requires AI and machine learning expertise for effective implementation&lt;/li&gt;
&lt;li&gt;Restricted to Mistral’s own models, limiting your choice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams that don’t require flexibility to choose external models like LLaMA or DeepSeek&lt;/li&gt;
&lt;li&gt;Companies operating in multilingual environments&lt;/li&gt;
&lt;li&gt;Organizations that can handle higher upfront costs in exchange for customization and control&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://nebius.com/" rel="noopener noreferrer"&gt;Nebius: The GPU Price Disruptor&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Nebius has positioned itself as a cost-effective alternative to traditional cloud providers, offering significant savings on GPU-intensive AI workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High performance and cost-effectiveness for AI inference&lt;/li&gt;
&lt;li&gt;Flexible, user-friendly environment for working with open-source models&lt;/li&gt;
&lt;li&gt;Managed Kubernetes with auto-healing and container orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Costs can grow quickly if not carefully monitored&lt;/li&gt;
&lt;li&gt;Less scalable compared to larger, more established providers&lt;/li&gt;
&lt;li&gt;Models may be deleted occasionally, which can disrupt ongoing projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams needing fast, cost-efficient AI inference&lt;/li&gt;
&lt;li&gt;Companies looking for an easy-to-use platform without deep MLOps expertise&lt;/li&gt;
&lt;li&gt;Organizations open to working with a newer, fast-growing provider&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.t-systems.com/" rel="noopener noreferrer"&gt;T-Systems: Enterprise-Grade Digital Solutions Provider&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A leading European IT and digital services company, trusted by large enterprises and regulated industries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wide range of IT services, including cloud, infrastructure, and managed hosting&lt;/li&gt;
&lt;li&gt;Secure data storage with encryption and strong security practices&lt;/li&gt;
&lt;li&gt;Scalable solutions with reliable performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher pricing compared to some competitors, especially for smaller businesses&lt;/li&gt;
&lt;li&gt;Complex services may require significant technical expertise and onboarding time&lt;/li&gt;
&lt;li&gt;Issues with scaling usage limits or increasing capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprises needing secure, scalable, and full-service IT solutions&lt;/li&gt;
&lt;li&gt;Organizations focused on data security and European compliance&lt;/li&gt;
&lt;li&gt;Industry players with in-house technical teams able to manage complex deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ✨ Unified Access: Bringing All Providers Together
&lt;/h2&gt;

&lt;p&gt;Instead of locking into one provider, what if you could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mix and match providers&lt;/strong&gt; on demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize for cost, speed, or uptime&lt;/strong&gt; with simple API-level changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatically fail over&lt;/strong&gt; to the best available option during outages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 That’s exactly what &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; does.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔗 &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs: Europe’s AI Gateway&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; is a platform that connects you to multiple European AI providers through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Smart Routing:&lt;/strong&gt; Send one request, and Cortecs automatically selects the fastest, most cost-effective, or most resilient provider based on your preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Instances:&lt;/strong&gt; Launch fully customizable LLM deployments with guaranteed compute and full control.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Cortecs?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ One Unified API&lt;/li&gt;
&lt;li&gt;✅ Provider Flexibility&lt;/li&gt;
&lt;li&gt;✅ Optimize for Cost, Speed, or Resiliency&lt;/li&gt;
&lt;li&gt;✅ Built-in Failover&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; isn’t another AI provider; it’s the control layer that makes your AI stack more &lt;strong&gt;resilient&lt;/strong&gt;, &lt;strong&gt;efficient&lt;/strong&gt;, and &lt;strong&gt;adaptable&lt;/strong&gt;.&lt;/p&gt;
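
&lt;p&gt;To see what the dedicated-instance flow looks like in code, here’s a short sketch using the &lt;code&gt;cortecs-py&lt;/code&gt; client together with the OpenAI SDK. The model ID is just an example, and the client reads &lt;code&gt;CORTECS_CLIENT_ID&lt;/code&gt; and &lt;code&gt;CORTECS_CLIENT_SECRET&lt;/code&gt; from the environment; see the Cortecs docs for the exact setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from cortecs_py import Cortecs  # pip install cortecs-py
from openai import OpenAI  # pip install openai

# Expects CORTECS_CLIENT_ID and CORTECS_CLIENT_SECRET in the environment.
cortecs = Cortecs()

# Provision a dedicated LLM worker (the model ID is an example).
instance = cortecs.ensure_instance("cortecs/phi-4-FP8-Dynamic")

# Each worker exposes an OpenAI-compatible endpoint.
client = OpenAI(base_url=instance.base_url, api_key=os.environ["OPENAI_API_KEY"])
reply = client.chat.completions.create(
    model="cortecs/phi-4-FP8-Dynamic",
    messages=[{"role": "user", "content": "Hello from a dedicated EU instance!"}],
)
print(reply.choices[0].message.content)

# Shut the worker down when you're done to stop paying for it.
cortecs.stop(instance.instance_id)
cortecs.delete(instance.instance_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;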

&lt;h3&gt;
  
  
  🔍 Summary Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OVH&lt;/td&gt;
&lt;td&gt;Developers, budget-focused users&lt;/td&gt;
&lt;td&gt;Cheap, customizable&lt;/td&gt;
&lt;td&gt;Occasional outages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaleway&lt;/td&gt;
&lt;td&gt;Startups, eco-conscious teams&lt;/td&gt;
&lt;td&gt;Easy to use, responsive support&lt;/td&gt;
&lt;td&gt;Payment issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IONOS&lt;/td&gt;
&lt;td&gt;Security-focused SMBs&lt;/td&gt;
&lt;td&gt;Excellent uptime, simple UI&lt;/td&gt;
&lt;td&gt;Expensive fees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;td&gt;AI-heavy, multilingual projects&lt;/td&gt;
&lt;td&gt;High accuracy&lt;/td&gt;
&lt;td&gt;High upfront cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nebius&lt;/td&gt;
&lt;td&gt;GPU-intensive workloads&lt;/td&gt;
&lt;td&gt;Cost-efficient&lt;/td&gt;
&lt;td&gt;Scaling limitations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T-Systems&lt;/td&gt;
&lt;td&gt;Large enterprises, regulated industries&lt;/td&gt;
&lt;td&gt;Full-service, secure&lt;/td&gt;
&lt;td&gt;Complex, pricey&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  💬 Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Choosing a European AI provider doesn’t have to be a long-term commitment.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stay flexible&lt;/li&gt;
&lt;li&gt;Avoid downtime&lt;/li&gt;
&lt;li&gt;Optimize your AI costs and performance on the fly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you need &lt;strong&gt;serverless smart routing&lt;/strong&gt;, &lt;strong&gt;dedicated deployments&lt;/strong&gt;, or &lt;strong&gt;both&lt;/strong&gt;, &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt; helps you build AI systems that are smarter, faster, and future-proof.&lt;/p&gt;

&lt;h2&gt;
  
  
  📖 Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cortecs.ai/serverless-inference/serverless-routing" rel="noopener noreferrer"&gt;Serverless Smart Routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cortecs.ai/dedicated-inference" rel="noopener noreferrer"&gt;Dedicated Inference&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cortecs</category>
      <category>ai</category>
      <category>europe</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building Intelligent Multi-Agent Systems with CrewAI</title>
      <dc:creator>Eva Jagodic</dc:creator>
      <pubDate>Tue, 04 Feb 2025 13:46:47 +0000</pubDate>
      <link>https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2</link>
      <guid>https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent systems (MAS)&lt;/strong&gt; for large language models (LLMs) represent a significant advancement in AI-driven problem-solving. Rather than operating in isolation, LLM agents collaborate, exchange information, and make dynamic decisions to achieve complex objectives efficiently.&lt;/p&gt;

&lt;p&gt;From &lt;strong&gt;document analysis&lt;/strong&gt; and &lt;strong&gt;automated research&lt;/strong&gt; to &lt;strong&gt;content generation&lt;/strong&gt; and &lt;strong&gt;customer support&lt;/strong&gt;, LLM-based MAS revolutionizes workflows by offering scalability, adaptability, and efficiency. Their ability to interact and coordinate dynamically enables efficient collaboration across multiple AI-driven tasks, optimizing performance in real-world applications.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll explore LLM multi-agent fundamentals and real-world applications, and guide you step-by-step in building your own intelligent agent system. We will be using &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/a&gt;, an open-source framework for orchestrating autonomous AI agents, and we will power it with &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cortecs LLM workers&lt;/strong&gt;&lt;/a&gt;. Get ready to bring AI collaboration to life!&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#understanding-multi-agent-systems"&gt;Understanding Multi-Agent Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#setting-up-the-development-environment"&gt;Setting Up the Development Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#adding-dynamic-provisioning-to-your-example-crew"&gt;Adding Dynamic Provisioning to Your Example Crew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#running-your-crew"&gt;Running Your Crew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cortecs/building-intelligent-multi-agent-systems-with-crewai-1bc2#conclusion"&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding Multi-Agent Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are Multi-Agent Systems?
&lt;/h3&gt;

&lt;p&gt;An LLM-based MAS consists of multiple AI agents that interact in a shared environment to process language tasks efficiently. These agents, powered by large language models, collaborate by exchanging information, analysing data, and generating responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM Agents&lt;/strong&gt; – AI-driven entities that process and generate text based on specific roles and objectives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt; – The digital space where agents operate, such as document repositories, chat interfaces, or APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt; – How agents share insights, using structured prompts, shared memory, or message-passing frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision-Making&lt;/strong&gt; – The strategies agents use to determine responses, often involving chain-of-thought reasoning or reinforcement learning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Benefits of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; – Handles large-scale text processing tasks efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt; – Multiple agents can divide and refine tasks for better accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptability&lt;/strong&gt; – Easily integrates into various workflows and industries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; – Automates complex workflows with minimal human intervention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applications of LLM Multi-Agent Systems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Research&lt;/strong&gt; – Agents collaborate to summarize, fact-check, and analyse documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Generation&lt;/strong&gt; – Teams of AI writers draft, edit, and refine articles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Support&lt;/strong&gt; – AI agents handle inquiries, escalate issues, and personalize responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Extraction &amp;amp; Analysis&lt;/strong&gt; – AI parses structured and unstructured text for insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these fundamentals prepares us to implement an LLM-based MAS!&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;Let's install the required libraries for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai crewai-tools uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll use &lt;code&gt;crewai&lt;/code&gt; and its extension &lt;code&gt;crewai-tools&lt;/code&gt; to orchestrate our agents, while the &lt;code&gt;uv&lt;/code&gt; package manager helps run our crews.&lt;/p&gt;

&lt;p&gt;Once the libraries are installed, we will create an example crew with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai create crew example_crew
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted for a model provider, we can select OpenAI from the list. Since Cortecs LLM workers are OpenAI-compatible, we'll use our Cortecs credentials. First, create an account on &lt;a href="http://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt;, then visit your &lt;a href="https://cortecs.ai/userArea/userProfile" rel="noopener noreferrer"&gt;profile page&lt;/a&gt; to generate access credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CORTECS_CLIENT_ID&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CLIENT_SECRET&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_CORTECS_API_KEY&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, select a model for your crew. We recommend using an 🔵 &lt;strong&gt;Instantly Provisioned&lt;/strong&gt; model like &lt;code&gt;cortecs/phi-4-FP8-Dynamic&lt;/code&gt;. The &lt;code&gt;openai/&lt;/code&gt; prefix below indicates we're using an OpenAI-compatible endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai/cortecs/phi-4-FP8-Dynamic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Dynamic Provisioning to Your Example Crew
&lt;/h2&gt;

&lt;p&gt;Let's dynamically provision an LLM worker to power our crew.&lt;/p&gt;

&lt;p&gt;We will navigate to &lt;code&gt;example_crew/src/example_crew/crew.py&lt;/code&gt; and modify the ExampleCrew class with these two key functions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;start_llm&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;This function initializes the Cortecs client and starts an LLM Worker of the desired model. We'll add it to the ExampleCrew class's &lt;code&gt;__init__&lt;/code&gt; function to ensure it runs when the crew starts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stop_and_delete_llm&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;To maximize cost efficiency, this function shuts down our resources when the crew completes its execution. We'll decorate it with the &lt;code&gt;@after_kickoff&lt;/code&gt; hook to ensure proper cleanup.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the modified ExampleCrew class implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.project&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrewBase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;after_kickoff&lt;/span&gt; &lt;span class="c1"&gt;#Add after_kickoff import
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;

&lt;span class="nd"&gt;@CrewBase&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExampleCrew&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_llm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ensure_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;

    &lt;span class="nd"&gt;@after_kickoff&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stop_and_delete_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cortecs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; stopped and deleted.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#The rest of the ExampleCrew stays the same...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can further customize your crew by modifying agents.yaml, tasks.yaml and crew.py, or by following additional examples in the &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;crewai docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before running our crew, we will add the cortecs-py dependency to our pyproject file in &lt;code&gt;example_crew/pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;"crewai[tools]&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.100&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;",&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;    &lt;span class="py"&gt;"cortecs-py&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;" #Add this line&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running Your Crew
&lt;/h2&gt;

&lt;p&gt;To run our crew, we will first navigate to the project directory (&lt;code&gt;example_crew/&lt;/code&gt;) and install the dependencies by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can execute the crew with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see that an LLM worker instance starts up. Once it's ready, the crew executes its task. Afterward, the instance automatically stops and gets deleted.&lt;/p&gt;

&lt;p&gt;The generated report will look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Comprehensive Report on Advances in Large Language Model (LLM) Technologies

## 1. Advanced Fine-Tuning Techniques

By 2025, significant advancements in fine-tuning techniques have marked a turning point for Large Language Models (LLMs). These improvements include few-shot and zero-shot learning, enabling models to perform new tasks with minimal task-specific data. Few-shot learning takes advantage of a minimal number of examples, allowing the model to generalize well across similar tasks. Zero-shot learning, on the other hand, lets the model tackle tasks without any task-specific training data. These techniques reduce dependency on extensive labeled datasets and expedite adaptation to diverse applications, offering flexibility and efficiency.

## 2. Multi-Modal Capabilities

LLMs have evolved to incorporate multi-modal data, effectively integrating information from text, images, video, and audio. This enhancement broadens their application across various sectors. In healthcare, multi-modal LLMs facilitate complex case studies by correlating clinical text with imagery and patient history. In autonomous systems, they enhance decision-making by combining sensory data with textual inputs. This synergy results in richer, more contextual insights, enabling more comprehensive understanding and interaction within environments.

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we've explored how to build a multi-agent system using CrewAI and Cortecs LLM workers. We covered the fundamentals of LLM-based multi-agent systems, from understanding their key components to practical implementation. We've learned how to set up your development environment, dynamically provision LLM workers, and create a functional crew that can efficiently handle complex tasks.&lt;/p&gt;

&lt;p&gt;To dive deeper into multi-agent systems, check out the &lt;a href="https://docs.crewai.com/introduction" rel="noopener noreferrer"&gt;CrewAI documentation&lt;/a&gt; and explore the &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;Cortecs platform&lt;/a&gt;. Happy building! 🚀✨&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>nlp</category>
      <category>cortecs</category>
    </item>
    <item>
      <title>All Too Swift: Real-Time Reddit Processing Simplified with AI</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Wed, 22 Jan 2025 08:37:49 +0000</pubDate>
      <link>https://dev.to/cortecs/all-too-swift-real-time-reddit-processing-simplified-with-ai-2edo</link>
      <guid>https://dev.to/cortecs/all-too-swift-real-time-reddit-processing-simplified-with-ai-2edo</guid>
      <description>&lt;p&gt;What if you could instantly spot and respond to millions of Reddit comments, all in real-time? No delays, no limits—just fast, seamless insights as they happen.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll show you how to set up a real-time data processing system using powerful AI models with &lt;strong&gt;LLM Workers&lt;/strong&gt;. To bring it to life, we’ll use a &lt;strong&gt;Taylor Swift bot&lt;/strong&gt; as an example, a bot that scans Reddit comments in real-time to find and respond to discussions about Taylor Swift. ✨&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents 🗂️
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
The Power of Real-Time Data Processing and Dedicated Inference
&lt;/li&gt;
&lt;li&gt;
Building the Reddit Bot

&lt;ul&gt;
&lt;li&gt;
Step 1: Set Up Your Environment
&lt;/li&gt;
&lt;li&gt;
Step 2: Setting Up Reddit and Initializing Cortecs Model
&lt;/li&gt;
&lt;li&gt;
Step 3: Define the Classification and Response Chains
&lt;/li&gt;
&lt;li&gt;
Step 4: Stream and Process Reddit Comments in Real-Time
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Conclusion
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Power of Real-Time Data Processing and Dedicated Inference
&lt;/h2&gt;

&lt;p&gt;We all know that real-time applications demand high performance, especially when you're dealing with large amounts of data. However, the challenge of processing data quickly and efficiently is easily resolved by using &lt;strong&gt;dedicated inference&lt;/strong&gt;, and this is where &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; really shines.&lt;/p&gt;

&lt;p&gt;By leveraging Cortecs' dedicated inference models, you get a system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handles High Volumes:&lt;/strong&gt; Process hundreds of requests per second without throttling, with the ability to scale seamlessly using LLM Workers dedicated to specific tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintains Consistency:&lt;/strong&gt; With dedicated resources like LLM Workers, you can count on stable latency, no matter the load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is Easy to Implement:&lt;/strong&gt; You don’t need to worry about complex infrastructure or performance fine-tuning; it just works.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4142vha3ee6r1kzvqmmi.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4142vha3ee6r1kzvqmmi.PNG" alt="Cortex and Reddit combination and logo" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Dedicated Inference Matters
&lt;/h3&gt;

&lt;p&gt;Traditional inference models often share resources with other users, leading to bottlenecks during peak times. With &lt;strong&gt;dedicated inference&lt;/strong&gt;, you get exclusive access to computational resources, ensuring that your system remains reliable and fast even under heavy loads. This makes it ideal for applications like fraud detection, customer service automation, and content moderation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Reddit Bot 🛠️
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Set Up Your Environment
&lt;/h3&gt;

&lt;p&gt;Before diving into the code, you need to install a few libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt; &lt;span class="n"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;core&lt;/span&gt; &lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These libraries serve the following purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;praw:&lt;/strong&gt; The Python Reddit API Wrapper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;langchain-core:&lt;/strong&gt; The core abstractions of LangChain, a framework for working with language models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cortecs-py:&lt;/strong&gt; The Python client for Cortecs, the platform that provides high-performance models for real-time inference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that, to authenticate and access the Cortecs models, you need to create an account at &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt;&lt;/strong&gt;. &lt;br&gt;
Once you’ve signed up, go to your &lt;a href="https://cortecs.ai/userArea/userProfile" rel="noopener noreferrer"&gt;profile page&lt;/a&gt;, generate your access credentials, and set them as environment variables in your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Set the Cortecs API credentials as environment variables
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_openai_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Setting Up Reddit and Initializing Cortecs Model
&lt;/h3&gt;

&lt;p&gt;Then, you'll need to create a Reddit account and register your application to get API access. To do this, visit &lt;strong&gt;&lt;a href="https://www.reddit.com/prefs/apps" rel="noopener noreferrer"&gt;Reddit's API page&lt;/a&gt;&lt;/strong&gt; and create a new application to obtain your &lt;strong&gt;Client ID&lt;/strong&gt; and &lt;strong&gt;Client Secret&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqihnherm4bbvt8qsscq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqihnherm4bbvt8qsscq.PNG" alt="Reddit interface for creating a bot application" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your Client ID and Client Secret, you can initialize the Reddit API client and set up the Cortecs model for real-time inference as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DedicatedLLM&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="c1"&gt;# Choose the model for real-time inference
&lt;/span&gt;   &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
   &lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="c1"&gt;# Set up Reddit API credentials
&lt;/span&gt;   &lt;span class="n"&gt;reddit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;praw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Reddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Replace with your Client ID
&lt;/span&gt;       &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your Client Secret
&lt;/span&gt;       &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_USER_AGENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# Replace with your User Agent
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;model_name&lt;/code&gt; refers to the model you choose for inference. In this example, we’ve selected the &lt;code&gt;cortecs/phi-4-FP8-Dynamic&lt;/code&gt; model, which is suitable for many general-purpose tasks. You can find a list of models &lt;a href="https://cortecs.ai/models" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Define the Classification and Response Chains
&lt;/h3&gt;

&lt;p&gt;In this step, we initialize the model for real-time processing and define the classification and response chains that will be used to process the posts and generate responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DedicatedLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Given the reddit post below, classify it as either `Art`, `Finance`, `Science`, `Taylor Swift` or `Other`.
        Do not provide an explanation.

        {channel}: {title}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; Classification:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;classification_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are the biggest Taylor Swift fan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Respond to this post:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; {comment}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;response_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we define two main tasks (or "chains"):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification Chain:&lt;/strong&gt; The first prompt defines the classification logic for Reddit posts. It takes the post title and subreddit as input and classifies the post as &lt;em&gt;Art&lt;/em&gt;, &lt;em&gt;Finance&lt;/em&gt;, &lt;em&gt;Science&lt;/em&gt;, &lt;em&gt;Taylor Swift&lt;/em&gt;, or &lt;em&gt;Other&lt;/em&gt;. The &lt;code&gt;StrOutputParser()&lt;/code&gt; converts the model's output into a plain string.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Response Chain:&lt;/strong&gt; The second prompt generates a response if the post is about Taylor Swift. We use a system message to instruct the model to behave as the &lt;strong&gt;biggest Taylor Swift fan&lt;/strong&gt; and a user message to define the format of the response.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
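
&lt;p&gt;Before pointing the bot at the live stream, it's worth sanity-checking the classification chain on a single made-up post. A minimal check, run inside the &lt;code&gt;with DedicatedLLM(...)&lt;/code&gt; block (the subreddit and title below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        # quick sanity check with a made-up post (placeholder values)
        topic = classification_chain.invoke({
            'channel': 'r/popheads',
            'title': 'The Eras Tour just broke another attendance record'
        })
        print(topic)  # should print: Taylor Swift
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;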

&lt;h3&gt;
  
  
  Step 4: Stream and Process Reddit Comments in Real-Time
&lt;/h3&gt;

&lt;p&gt;With the classification and response chains in place, the next step is to continuously stream Reddit comments and process them in real time. This allows the bot to react to posts as they come in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# scan reddit in realtime 
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reddit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subreddit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classification_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subreddit_name_prefixed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link_title&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subreddit_name_prefixed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link_title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Taylor Swift&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;comment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;---&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code does three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streams Comments:&lt;/strong&gt; Continuously monitors Reddit for new comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classifies Posts:&lt;/strong&gt; Uses the classification chain to categorize each post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responds to Specific Topics:&lt;/strong&gt; If a post is classified as Taylor Swift-related, the bot generates a reply with the response chain and prints it.&lt;/li&gt;
&lt;/ul&gt;
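
&lt;p&gt;Note that the example above only prints the generated reply. If you want the bot to actually answer on Reddit, PRAW lets you reply to a comment directly. A minimal sketch, assuming your Reddit app is authenticated with posting permissions and that you respect the subreddit's rules and API rate limits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        # skip_existing=True ignores the backlog and only processes new comments
        for post in reddit.subreddit("all").stream.comments(skip_existing=True):
            topic = classification_chain.invoke({'channel': post.subreddit_name_prefixed,
                                                 'title': post.link_title})
            if topic == 'Taylor Swift':
                response = response_chain.invoke({'comment': post.body})
                post.reply(response.content)  # post the reply back to Reddit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;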

&lt;p&gt;While the code is running, you can monitor the progress of the model execution on the &lt;a href="https://cortecs.ai/userArea/console" rel="noopener noreferrer"&gt;console page&lt;/a&gt; of the Cortecs web interface, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz9m6r1et6sydnyw8c4f.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz9m6r1et6sydnyw8c4f.PNG" alt="Console page of cortecs" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion 🎉
&lt;/h2&gt;

&lt;p&gt;Building real-time applications can be challenging, but with the right tools, it becomes much more manageable. By using &lt;strong&gt;LLM Workers&lt;/strong&gt;, you can process high volumes of data without compromising performance. Whether you're classifying content, detecting trends, or automating responses, the approach shown here can easily be adapted to fit your needs.&lt;/p&gt;

&lt;p&gt;Now, it’s your turn to try it out. Start experimenting with real-time data processing and explore the possibilities! 🚀&lt;/p&gt;

</description>
      <category>cortecs</category>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Streamline Your Batch Jobs: The Power of LLM Workers 🤖</title>
      <dc:creator>Asmae Elazrak</dc:creator>
      <pubDate>Fri, 17 Jan 2025 12:15:35 +0000</pubDate>
      <link>https://dev.to/cortecs/streamline-your-batch-jobs-the-power-of-cortecs-ai-inference-2jjl</link>
      <guid>https://dev.to/cortecs/streamline-your-batch-jobs-the-power-of-cortecs-ai-inference-2jjl</guid>
      <description>&lt;p&gt;Have you ever felt overwhelmed by the sheer volume of data you need to process or wished you could automate repetitive tasks effortlessly? &lt;/p&gt;

&lt;p&gt;Imagine being able to summarize hundreds of research papers in minutes, extract critical insights from vast datasets, or streamline tedious workflows. &lt;br&gt;
In this article, we’ll explore how &lt;strong&gt;Cortecs&lt;/strong&gt; helps you unlock the full potential of large language models (LLMs) with &lt;strong&gt;ease&lt;/strong&gt;, &lt;strong&gt;scalability&lt;/strong&gt;, and &lt;strong&gt;cost-efficiency&lt;/strong&gt;. Specifically, we’ll focus on how Cortecs simplifies handling batch jobs and massive data workloads, guiding you through everything from environment setup to seamless data processing at scale. &lt;/p&gt;

&lt;p&gt;Let’s dive in and see how Cortecs can transform your AI journey.&lt;/p&gt;
&lt;h2&gt;
  
  
  Table of Contents 📚
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Cortecs?&lt;/li&gt;
&lt;li&gt;Setting Up Your Environment&lt;/li&gt;
&lt;li&gt;
Batch Processing with Cortecs-py

&lt;ul&gt;
&lt;li&gt;Step 1: Loading Documents&lt;/li&gt;
&lt;li&gt;Step 2: Creating a Prompt&lt;/li&gt;
&lt;li&gt;Step 3: Batch Processing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  What is Cortecs?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; is a platform that gives you on-demand access to powerful LLMs running on dedicated servers. This ensures maximum performance, reliability, and scalability for your AI tasks. &lt;/p&gt;

&lt;p&gt;Cortecs lets you manage LLM Workers for large-scale processing, offloading tasks to specialized AI workers for high throughput and faster processing of massive datasets⚡.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Servers for Fast AI Processing:&lt;/strong&gt; With Cortecs, you get exclusive access to dedicated servers, meaning faster, more efficient AI processing without the competition for resources 🚀.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to Set Up and Use:&lt;/strong&gt; Cortecs is designed for simplicity. It integrates seamlessly with your existing workflows, so you can start using LLMs right away with minimal setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable and Cost-Effective:&lt;/strong&gt; Cortecs scales with your needs, offering dynamic resource allocation that ensures you only pay for what you use💰, keeping costs low.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Setting Up Your Environment 🛠️
&lt;/h2&gt;

&lt;p&gt;Before diving into batch processing, you'll need to set up your environment. First, register at &lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs.ai&lt;/a&gt; and create your access credentials on your &lt;strong&gt;profile page&lt;/strong&gt; 📋. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ljrdmlpqcvpsa3sgau.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0ljrdmlpqcvpsa3sgau.PNG" alt="Profile page example from Cortecs interface" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your credentials, set them as environment variables in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Set the Cortecs API credentials as environment variables
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_openai_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORTECS_CLIENT_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_cortecs_client_secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll also need to install several Python libraries to run the example below. They can all be installed via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-community
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cortecs-py
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;arxiv
&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pymupdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Batch Processing with Cortecs-py 🔄
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://pypi.org/project/cortecs-py/" rel="noopener noreferrer"&gt;Cortecs-py&lt;/a&gt;&lt;/strong&gt; is a lightweight Python wrapper for the Cortecs REST API. It provides you with the tools to dynamically manage your AI instances directly from your workflow, making batch processing seamless and efficient. &lt;/p&gt;

&lt;p&gt;Combined with LangChain, a versatile framework for LLM workflows, it unlocks incredible efficiency and power. &lt;/p&gt;

&lt;p&gt;Let’s explore a real-world example of using &lt;strong&gt;Cortecs-py&lt;/strong&gt; for batch processing:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Loading Documents 📄
&lt;/h4&gt;

&lt;p&gt;After adding the necessary credentials and installing the required libraries, we’ll retrieve research papers from Arxiv using the &lt;strong&gt;ArxivLoader&lt;/strong&gt;, focusing on a query like 'Reasoning.'&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ArxivLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DedicatedLLM&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Cortecs client
&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Load documents
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArxivLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_max_docs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_ful_documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;doc_content_chars_max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;load_all_available_meta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
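
&lt;p&gt;Before spending compute on the batch, it's worth checking what actually came back. A quick inspection (the metadata keys are those typically returned by &lt;code&gt;ArxivLoader&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# inspect the loaded documents before processing them
print(f"Loaded {len(docs)} documents")
print(docs[0].metadata.get("Title"))            # paper title
print(len(docs[0].page_content), "characters")  # capped by doc_content_chars_max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;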



&lt;h4&gt;
  
  
  Step 2: Creating a Prompt 💬
&lt;/h4&gt;

&lt;p&gt;Then, we’ll create a simple prompt that asks the model to explain the document content in plain language.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{text}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Explain to me like I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m five:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
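
&lt;p&gt;To see exactly what the model will receive, you can render the template yourself. A quick check (the sample text is just an illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# render the template to inspect the final message
messages = prompt.format_messages(text="Transformers use attention to weigh context.")
print(messages[0].content)
# Transformers use attention to weigh context.
#
# Explain to me like I'm five:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;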



&lt;h4&gt;
  
  
  Step 3: Batch Processing 🏭
&lt;/h4&gt;

&lt;p&gt;With Cortecs-py, batch processing is straightforward. The &lt;code&gt;DedicatedLLM&lt;/code&gt; class makes it even easier, as it automatically takes care of starting and stopping your infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DedicatedLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing data batch-wise ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-------&lt;/span&gt;&lt;span class="se"&gt;\n\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;💡 &lt;strong&gt;Remark&lt;/strong&gt;: Don't forget to choose a model that supports the required context length for your use case. In this example, we are using the &lt;code&gt;phi-4-FP8-Dynamic&lt;/code&gt; model. &lt;br&gt;
You can explore the full range of models offered by Cortecs &lt;u&gt;&lt;a href="https://cortecs.ai/models" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/u&gt;.&lt;/p&gt;
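
&lt;p&gt;As a rough rule of thumb (a heuristic, not an exact tokenizer count), English text averages about four characters per token, so you can estimate how much context each document will consume:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# rough estimate: ~4 characters per token for English text (heuristic)
doc_content_chars_max = 25_000
approx_tokens_per_doc = doc_content_chars_max // 4
print(approx_tokens_per_doc)  # ~6250 tokens per document, plus room for the output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;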

&lt;p&gt;Below is an example of the batch-processing output 📊:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3s6u2qrj5m3421oxmjm.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3s6u2qrj5m3421oxmjm.PNG" alt="LLM workers output" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This simple pipeline summarized &lt;strong&gt;224,200&lt;/strong&gt; input tokens into &lt;strong&gt;12,900&lt;/strong&gt; output tokens in just &lt;strong&gt;55 seconds&lt;/strong&gt;, proving the efficiency of batch processing with dedicated inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhsjsq24pbi256echxmu.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhsjsq24pbi256echxmu.PNG" alt="Company Model Comparison" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When comparing the cost of using &lt;strong&gt;Cortecs&lt;/strong&gt; for summarization tasks to other solutions like Fireworks or cloud-based services, Cortecs stands out for its cost efficiency, with no unpredictable costs. This makes it an ideal solution for companies looking to leverage AI without breaking the bank🏦.&lt;/p&gt;

&lt;p&gt;Ready to transform your workflows and elevate your AI projects? &lt;/p&gt;

&lt;p&gt;Discover how &lt;strong&gt;&lt;a href="https://cortecs.ai/" rel="noopener noreferrer"&gt;Cortecs&lt;/a&gt;&lt;/strong&gt; can help you unlock the power of Large Language Models (LLMs) while maintaining cost efficiency🚀. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nlp</category>
      <category>cortecs</category>
    </item>
    <item>
      <title>LLMs for Big Data</title>
      <dc:creator>Marko Arnauto</dc:creator>
      <pubDate>Mon, 13 Jan 2025 12:00:19 +0000</pubDate>
      <link>https://dev.to/cortecs/llms-for-big-data-1hfb</link>
      <guid>https://dev.to/cortecs/llms-for-big-data-1hfb</guid>
      <description>&lt;p&gt;We all love our chatbots, but when it comes to heavy loads, they just don’t cut it. If you need to analyze thousands of documents at once, serverless inference — the go-to for chat applications — quickly shows its (rate) limits. &lt;/p&gt;

&lt;h2&gt;
  
  
  One Model — Many Users 
&lt;/h2&gt;

&lt;p&gt;Imagine working in a shared co-working space: it’s convenient, but your productivity depends on how crowded the space is. Similarly, &lt;strong&gt;serverless offerings&lt;/strong&gt; from providers like OpenAI, Anthropic, or Groq rely on shared infrastructure, where performance fluctuates based on how many users are competing for resources. Strict rate limits, like Groq’s 7,000 tokens per minute, can grind progress to a halt. &lt;/p&gt;

&lt;h2&gt;
  
  
  Dedicated Compute — One Model per User
&lt;/h2&gt;

&lt;p&gt;In contrast, &lt;strong&gt;dedicated inference allocates compute resources exclusively to a single user&lt;/strong&gt; or application. This ensures predictable and consistent performance, as the only limiting factor is the computational capacity of the allocated GPUs. According to &lt;a href="https://fireworks.ai" rel="noopener noreferrer"&gt;Fireworks.ai&lt;/a&gt;, a leading inference provider,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Graduating from serverless to on-demand deployments starts to make sense economically when you are running ~100k+ tokens per minute.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are typically no rate limits on throughput. Billing for dedicated inference is time-based, calculated per hour or minute depending on the platform. While dedicated inference is well-suited for high-throughput workloads, it involves a tedious setup process as well as the risk of overpaying due to idle times.&lt;/p&gt;
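
&lt;p&gt;To make the economics concrete, here is a back-of-the-envelope comparison with made-up prices (both figures are assumptions for illustration, not actual rates):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# hypothetical prices, for illustration only
serverless_price_per_1m_tokens = 0.50  # $ per 1M tokens (assumption)
dedicated_price_per_hour = 3.00        # $ per GPU-hour (assumption)

# throughput at which dedicated matches serverless cost
break_even_tokens_per_hour = dedicated_price_per_hour / serverless_price_per_1m_tokens * 1_000_000
print(break_even_tokens_per_hour / 60)  # 100000.0 tokens per minute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these illustrative numbers, dedicated inference pays off above roughly 100k tokens per minute, in line with the rule of thumb quoted above.&lt;/p&gt;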

&lt;h3&gt;
  
  
  Tedious Setup
&lt;/h3&gt;

&lt;p&gt;Deploying dedicated inference requires careful preparation. First, you need to rent suitable hardware to support your chosen model. Next, an inference engine such as vLLM must be configured to match the model’s requirements. Finally, secure access must be established via a TLS-encrypted connection to ensure encrypted communication. According to Philipp Schmid of Hugging Face, &lt;a href="https://www.philschmid.de/cost-generative-ai" rel="noopener noreferrer"&gt;you need one full-time developer&lt;/a&gt; to set up and maintain such a system. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18v3tpy9iric55w7h52z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18v3tpy9iric55w7h52z.png" alt="Dedicated deployments require a tedious setup." width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Idle Times
&lt;/h3&gt;

&lt;p&gt;Time-based billing makes cost projections easier, but idle resources can quickly become a cost overhead. Dedicated inference is cost-effective only when GPUs are busy. To avoid unnecessary expenses, the system should be turned off when not in use. Managing this manually is tedious and error-prone.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Workers to the Rescue
&lt;/h2&gt;

&lt;p&gt;To address the downsides of dedicated inference, providers like Google and Cortecs offer dedicated LLM workers. Without any additional configuration, these workers are started and stopped on demand — avoiding setup overhead and idle times. The required hardware is allocated, the inference engine is configured, and API connections are established, all in the background. Once the workload is completed, the worker shuts down automatically. &lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;As I’m involved in the cortecs project, I’m going to showcase it using our &lt;a href="https://github.com/cortecs-ai/cortecs-py" rel="noopener noreferrer"&gt;library&lt;/a&gt;. It can be installed with pip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install cortecs-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use the OpenAI Python library to access the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, register at &lt;a href="https://cortecs.ai" rel="noopener noreferrer"&gt;cortecs.ai&lt;/a&gt; and create your access credentials on the profile page. Then set them as environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;export OPENAI_API_KEY="Your cortecs api key"&lt;br&gt;
export CORTECS_CLIENT_ID="Your cortecs id"&lt;br&gt;
export CORTECS_CLIENT_SECRET="Your cortecs secret"&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;It’s time to choose a model. We selected &lt;em&gt;phi-4-FP8-Dynamic&lt;/em&gt;, a model that supports 🔵 instant provisioning. Models with instant provisioning enable a warm start, eliminating provisioning latency — perfect for this demonstration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cortecs_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cortecs&lt;/span&gt;

&lt;span class="n"&gt;cortecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cortecs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;my_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cortecs/phi-4-FP8-Dynamic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Start a new instance
&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ensure_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a joke about LLMs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Stop the instance
&lt;/span&gt;&lt;span class="n"&gt;cortecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All provisioning complexity is abstracted by &lt;code&gt;cortecs.ensure_instance(my_model)&lt;/code&gt; and &lt;code&gt;cortecs.stop(my_instance.instance_id)&lt;/code&gt;. Between these two lines, you can execute arbitrary inference tasks—whether it's generating a simple joke about LLMs or producing billions of words.&lt;/p&gt;
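
&lt;p&gt;Since billing is time-based, you will also want the instance to stop if your code raises an exception midway. A minimal sketch using only the two calls shown above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# make sure the worker is stopped even if inference fails (time-based billing)
my_instance = cortecs.ensure_instance(my_model)
try:
    ...  # run your inference workload here
finally:
    cortecs.stop(my_instance.instance_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;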

&lt;p&gt;&lt;strong&gt;LLM Workers are a game-changer&lt;/strong&gt; for large-scale data analysis. With no need to manage complex compute clusters, they enable seamless big data analysis and generation without the typical concerns of rate limits or exploding inference costs.&lt;br&gt;
Imagine a future where LLM Workers handle highly complex tasks, such as proving mathematical theorems or executing reasoning-intensive operations. You could launch a worker, let it run at full GPU utilization to tackle the problem, and have it shut itself down automatically upon completion. The potential is enormous, and this tutorial demonstrates how to dynamically provision LLM Workers for high-performance AI tasks.&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>llm</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
