<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Featherless.ai</title>
    <description>The latest articles on DEV Community by Featherless.ai (@featherlessai).</description>
    <link>https://dev.to/featherlessai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10170%2F1abaa506-fbd6-42a6-a7ad-3b1165678a3f.png</url>
      <title>DEV Community: Featherless.ai</title>
      <link>https://dev.to/featherlessai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/featherlessai"/>
    <language>en</language>
    <item>
      <title>Experimental support for Kimi-K2 by Moonshot AI now available for premium users!</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Tue, 15 Jul 2025 09:27:47 +0000</pubDate>
      <link>https://dev.to/featherlessai/experimental-support-for-kimi-k2-by-moonshot-ai-now-available-for-premium-users-47gj</link>
      <guid>https://dev.to/featherlessai/experimental-support-for-kimi-k2-by-moonshot-ai-now-available-for-premium-users-47gj</guid>
      <description>&lt;p&gt;We now have experimental support for &lt;a href="https://featherless.ai/models/moonshotai/Kimi-K2-Instruct" rel="noopener noreferrer"&gt;Kimi-K2&lt;/a&gt; on Featherless for our premium subscribers.&lt;/p&gt;

&lt;p&gt;Hop on the Kimi-K2 train 🚂&lt;/p&gt;

&lt;p&gt;Kimi-K2 is a SOTA open-source model designed for autonomous problem-solving, achieving exceptional performance across coding, reasoning, and agentic tasks with its 1-trillion-parameter MoE architecture.&lt;/p&gt;

&lt;p&gt;Moonshot AI has delivered breakthrough performance in agentic intelligence while maintaining open-source accessibility, positioning themselves as a formidable challenger in the frontier model space.&lt;/p&gt;

&lt;p&gt;Some highlights of the Kimi-K2 release:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Massive Scale&lt;/strong&gt;: 1T total parameters with 32B activated, using a mixture-of-experts architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Training Instability&lt;/strong&gt;: Achieved stable pre-training on 15.5T tokens with novel MuonClip optimizer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic Excellence&lt;/strong&gt;: Specifically optimized for tool use, reasoning, and autonomous problem-solving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Superior Performance&lt;/strong&gt;: Leading results on SWE-bench, LiveCodeBench, and tool-use benchmarks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Moonshot AI's rise in the agentic AI space reflects their focused approach to two critical innovations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MuonClip Optimizer&lt;/strong&gt;: Applied the Muon optimizer at unprecedented scale with novel optimization techniques&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic Specialization&lt;/strong&gt;: Purpose-built architecture for tool use and autonomous reasoning tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Era of Experience" Training&lt;/strong&gt;: Advanced RL system using self-judging mechanisms and MCP (Model Context Protocol) tools for real-world agentic scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kimi-K2 represents a major leap forward in open-source agentic capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimental support notice
&lt;/h2&gt;

&lt;p&gt;We're excited to offer experimental support for Kimi-K2 for our premium users. Given the substantial computational requirements of this 1T parameter model, we're closely monitoring usage patterns and operational costs. We may need to temporarily adjust or suspend availability. Try out Kimi-K2 and share your feedback to help us improve the experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experience Kimi-K2 on Featherless:
&lt;/h3&gt;

&lt;p&gt;🔔 &lt;a href="https://featherless.ai/subscription/change?plan_id=feather_pro_plus" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; to access Kimi-K2&lt;/p&gt;

&lt;p&gt;🦜 Chat with it in &lt;a href="https://phoenix.featherless.ai/" rel="noopener noreferrer"&gt;Phoenix&lt;/a&gt;, our chat interface&lt;/p&gt;

&lt;p&gt;⚡ Integrate via the Featherless API&lt;/p&gt;
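&lt;p&gt;As a minimal sketch of the API route (the endpoint URL and model ID below are assumptions based on Featherless's OpenAI-compatible API; check the docs for the authoritative values), a chat completion request can be built with nothing but the standard library:&lt;/p&gt;

```python
# Sketch of a direct chat-completion call to Kimi-K2 on Featherless.
# Assumptions: OpenAI-compatible endpoint at api.featherless.ai/v1 and
# the model ID "moonshotai/Kimi-K2-Instruct". Replace the placeholder key.
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder

payload = {
    "model": "moonshotai/Kimi-K2-Instruct",
    "messages": [{"role": "user", "content": "Explain MoE routing in one sentence."}],
}
req = urllib.request.Request(
    "https://api.featherless.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# Uncomment once you have a real key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

&lt;p&gt;Since the API follows the OpenAI request format, existing OpenAI SDK code should only need a new base URL and key.&lt;/p&gt;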

&lt;p&gt;🔍 Join our &lt;a href="https://discord.com/invite/featherlessai" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to give feedback and discuss the new model&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Featherless Becomes Hugging Face’s Largest LLM Inference Provider with 6,700+ Models</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Thu, 12 Jun 2025 14:03:09 +0000</pubDate>
      <link>https://dev.to/featherlessai/featherless-becomes-hugging-faces-largest-llm-inference-provider-with-6700-models-57bh</link>
      <guid>https://dev.to/featherlessai/featherless-becomes-hugging-faces-largest-llm-inference-provider-with-6700-models-57bh</guid>
      <description>&lt;p&gt;We’re excited to announce that Featherless is now the most extensive LLM inference provider on Hugging Face, serving over 6,700 open-weight models—and counting.&lt;/p&gt;

&lt;p&gt;This milestone means developers, researchers, and teams can now run thousands of the world’s models directly from Hugging Face, backed by Featherless's serverless infrastructure, flat pricing, and production-grade scalability.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Featherless is the only Hugging Face Inference Endpoints provider supporting this scale.&lt;/p&gt;

&lt;p&gt;Any model with 100+ downloads is automatically onboarded to Featherless.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Reliable, Open AI — At Scale
&lt;/h2&gt;

&lt;p&gt;This collaboration brings together two shared commitments: accessibility and open source.&lt;/p&gt;

&lt;p&gt;With Featherless powering Hugging Face endpoints, users now get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;6,700+ Models, Instantly Available&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From DeepSeek, LLaMA, Mistral, and Qwen to new releases like Magistral and Devstral. All ready to deploy, fine-tune, or benchmark.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Serverless, Scalable Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Model cold-starts average under 250ms, letting users plan their usage by model and concurrent connections. No GPUs, no containers, no infrastructure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automatic Model Onboarding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hugging Face models with 100+ downloads are auto-integrated with Featherless for access.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unlimited Usage, Predictable Pricing (when subscribed to Featherless)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run any model—without usage caps, per-token math, or surprise bills.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;“Featherless AI is doing for inference what Hugging Face did for open-source model hosting, making it simple, accessible, and scalable. This partnership is a big step towards the future where anyone can have instant access to all the world’s collection of AI models.”&lt;/p&gt;

&lt;p&gt;— Eugene Cheah, Co-founder, Featherless AI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Two Ways to Use Featherless on Hugging Face
&lt;/h2&gt;

&lt;p&gt;Starting June 12, 2025, users can invoke Featherless inference directly inside the Hugging Face platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Routed Request&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Billed by Hugging Face. Just select Featherless AI from the Inference Endpoints dropdown and go.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Custom Key or Direct Calls&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use your own Featherless API key for direct access and flat-rate unlimited usage (requires a Featherless subscription).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
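
&lt;p&gt;As a rough sketch of the two modes (the router URL pattern, provider slug, and model ID below are assumptions; both requests use the same OpenAI-style chat format):&lt;/p&gt;

```python
# Build (but don't send) the two request styles. The router URL pattern and
# provider slug ("featherless-ai") are assumptions; tokens are placeholders.
import json
import urllib.request

def chat_request(url: str, token: str, model: str) -> urllib.request.Request:
    """Construct an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": "Hi!"}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

# Routed request: authenticate with a Hugging Face token, billed by Hugging Face.
routed = chat_request(
    "https://router.huggingface.co/featherless-ai/v1/chat/completions",
    "hf_xxx", "deepseek-ai/DeepSeek-V3-0324",
)

# Direct call: your own Featherless key, flat-rate subscription billing.
direct = chat_request(
    "https://api.featherless.ai/v1/chat/completions",
    "your-featherless-key", "deepseek-ai/DeepSeek-V3-0324",
)
```

&lt;p&gt;The only differences are the host you call and the token you present; application code stays the same.&lt;/p&gt;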

&lt;p&gt;→ &lt;a href="https://featherless.ai/docs/hugging-face" rel="noopener noreferrer"&gt;Read the Docs&lt;/a&gt; &lt;br&gt;
→ &lt;a href="https://featherless.ai/#pricing" rel="noopener noreferrer"&gt;Explore Featherless Pricing&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://featherless.ai/docs/quickstart-guide" rel="noopener noreferrer"&gt;Run Your First Model&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Future-Proofing AI Deployment
&lt;/h2&gt;

&lt;p&gt;As the world moves toward more personalized, specialized, and fine-tuned AI systems, Featherless is building the foundation.&lt;/p&gt;

&lt;p&gt;We are both a serverless inference platform and an AI research lab. Our contributions to attention-alternative architectures like RWKV help us scale models other platforms can’t. We reduce inference costs for all models by at least 10 times. And we’ve built the world’s most reliable agent for everyday use, outperforming Gemini, Claude, and GPT-4o.&lt;/p&gt;

&lt;p&gt;Together with Hugging Face, we’re making the long tail of models accessible, scalable, and production-ready.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6,700+ LLMs hosted today&lt;/li&gt;
&lt;li&gt;100% of Hugging Face public models targeted by EOY 2026 🤗&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About Featherless
&lt;/h2&gt;

&lt;p&gt;Featherless is the fastest way to run reliable, open-source AI at scale. Featherless is an AI research lab and serverless platform that gives developers, researchers, and teams instant access to the world’s largest model catalog without managing infrastructure, token limits, or hidden costs. Whether you’re building prototypes, deploying applications, or scaling intelligent systems, Featherless helps you move faster with AI you can trust. Our mission is to make personalized AGI real: open, reliable, and built for everyone.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;Explore the Catalog&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://featherless.ai/#pricing" rel="noopener noreferrer"&gt;Subscribe to Featherless&lt;/a&gt;&lt;br&gt;
→ &lt;a href="//discord.gg/featherlessai"&gt;Join our Discord&lt;/a&gt;&lt;br&gt;
→ &lt;a href="//x.com/FeatherlessAI"&gt;Follow us on X&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>webdev</category>
      <category>huggingface</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Context Isn’t Everything: Build Efficient LLM Apps with LlamaIndex + Featherless</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Mon, 02 Jun 2025 13:23:36 +0000</pubDate>
      <link>https://dev.to/featherlessai/context-isnt-everything-build-efficient-llm-apps-with-llamaindex-featherless-3ak</link>
      <guid>https://dev.to/featherlessai/context-isnt-everything-build-efficient-llm-apps-with-llamaindex-featherless-3ak</guid>
      <description>&lt;p&gt;We’re excited to announce that LlamaIndex has &lt;a href="https://docs.llamaindex.ai/en/stable/examples/llm/featherlessai/" rel="noopener noreferrer"&gt;official support&lt;/a&gt; for Featherless, bringing together two powerful tools for building production RAG applications. While everyone’s chasing longer context windows (100K, 1M tokens), we’ve noticed most production apps need something different: they need efficient retrieval that finds the right information, not all information.&lt;/p&gt;

&lt;p&gt;That’s why this integration matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LlamaIndex provides the RAG infrastructure: data loaders, chunking strategies, and vector search&lt;/li&gt;
&lt;li&gt;Featherless gives you access to 4,300+ open source models through a simple API&lt;/li&gt;
&lt;li&gt;Together, they let you build a RAG pipeline that switches between models instantly, optimizes for cost, and scales without infrastructure headaches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s have a deeper look at what you can build with this new integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Retrieval Is Often a Better Solution for Your Problem
&lt;/h2&gt;

&lt;p&gt;Stuffing your entire knowledge base into a single prompt might work for a simple demo, but at scale it leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slower response times:&lt;/strong&gt; Processing 100k tokens takes time, even on fast hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More hallucinations&lt;/strong&gt;: Models struggle with needle-in-haystack problems in massive contexts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token overflow:&lt;/strong&gt; Eventually you will hit limits, forcing crude truncation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you actually want is precision: just the right information, fed to the model at the right time. That’s where RAG (Retrieval-Augmented Generation) shines, and LlamaIndex handles it beautifully.&lt;/p&gt;
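
&lt;p&gt;To make the precision idea concrete, here is a toy, standard-library-only sketch of retrieve-then-generate (real pipelines use learned embeddings and a vector index, as in the quickstart below): score each chunk against the query, keep only the best match, and build the prompt from that.&lt;/p&gt;

```python
# Toy retrieval: bag-of-words cosine similarity stands in for real embeddings.
from collections import Counter
import math

chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "New hires complete onboarding during their first week.",
]

def score(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

query = "how long do refunds take"
best = max(chunks, key=lambda c: score(query, c))  # only this chunk enters the prompt
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(best)
```

&lt;p&gt;Only the refund chunk reaches the model; the other documents never consume context tokens at all.&lt;/p&gt;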

&lt;h2&gt;
  
  
  What Featherless Brings to the Stack
&lt;/h2&gt;

&lt;p&gt;Featherless simplifies access to open source models. Instead of provisioning GPUs, managing infrastructure, dealing with model deployment, and worrying about usage costs, you get instant access to over 4,300 open source models, including DeepSeek, Llama, Qwen, Mistral, and many more. Everything runs through our API: with a simple monthly subscription you get unlimited tokens across our whole model catalog and can switch between models instantly, perfect for A/B testing different approaches without any infrastructure overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quickstart: Build a Local RAG Application
&lt;/h2&gt;

&lt;p&gt;Let’s walk through building a Q&amp;amp;A assistant that can answer questions about your local documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install dependencies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llama-index llama-index-llms-featherlessai llama-index-embeddings-huggingface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Set up your environment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleDirectoryReader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.llms.featherlessai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FeatherlessLLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.embeddings.huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbedding&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Set your Featherless API key
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Configure local embeddings
&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Efficient, high-quality embeddings
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Alternative: Use Ollama for local embeddings
# from llama_index.embeddings.ollama import OllamaEmbedding
# embed_model = OllamaEmbedding(model_name="nomic-embed-text")
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Load and Index Your Documents&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load all files from ./docs directory
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Configure Featherless as your LLM
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FeatherlessLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-32B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Or any model from featherless.ai
&lt;/span&gt;    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# Lower for more consistent retrieval
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build your vector index with free embeddings
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Optimal for precise retrieval
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Query Your Knowledge Base&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a query engine
&lt;/span&gt;&lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_query_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;similarity_top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieve top 3 most relevant chunks
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Ask questions
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s our onboarding process?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You just built a RAG pipeline in under 30 lines of code, with zero infrastructure overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Features: Streaming and Chat
&lt;/h2&gt;

&lt;p&gt;Our Featherless LlamaIndex integration supports both streaming responses and multi-turn conversations:&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming Responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stream for real-time output
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the key points of machine learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-turn Chat
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.llms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful technical assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is RAG?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Stream chat responses
&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Model Switching: A/B Test Without Rewriting Code
&lt;/h2&gt;

&lt;p&gt;One of Featherless’s strengths is instant model switching. Test different models for your use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;models_to_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-24B-Instruct-2501&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-8B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain our refund policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models_to_test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Example: Customer Support Bot
&lt;/h2&gt;

&lt;p&gt;Here's a complete example of a customer support bot that combines multiple best practices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleDirectoryReader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.llms.featherlessai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FeatherlessLLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.embeddings.huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbedding&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.node_parser&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.llms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Featherless
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Set up embeddings
&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load different document types
&lt;/span&gt;&lt;span class="n"&gt;faq_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./data/faqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;policy_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./data/policies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;product_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./data/products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Tag documents with metadata
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;faq_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;policy_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;product_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Combine all documents
&lt;/span&gt;&lt;span class="n"&gt;all_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;faq_docs&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;policy_docs&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;product_docs&lt;/span&gt;

&lt;span class="c1"&gt;# Create index with custom settings
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;all_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;transformations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;SentenceSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Function to route queries to appropriate model
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_llm_for_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;FeatherlessLLM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Use precise model for policy questions
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;FeatherlessLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-32B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;help&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tutorial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Use helpful model for guidance
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;FeatherlessLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-R1-0528-Qwen3-8B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Default conversational model
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;FeatherlessLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-3.1-24B-Instruct-2503&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create a support bot function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;support_bot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Select appropriate model
&lt;/span&gt;    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_llm_for_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create query engine with filters
&lt;/span&gt;    &lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_query_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;similarity_top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Synthesize concise answers
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add chat context if available
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:]])&lt;/span&gt;
        &lt;span class="n"&gt;full_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Previous conversation:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Current question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;full_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;

    &lt;span class="c1"&gt;# Get response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;support_bot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s your refund policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;support_bot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I reset my password?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance and efficiency strategies
&lt;/h2&gt;

&lt;p&gt;As your RAG application scales, performance optimization becomes crucial. Start with embedding caching to avoid recomputing embeddings for documents you’ve already processed. LlamaIndex makes this straightforward with its storage context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cache embeddings to avoid recomputation
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StorageContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.storage&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SimpleDocumentStore&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.embeddings.huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbedding&lt;/span&gt;

&lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-small-en-v1.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;storage_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StorageContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_defaults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;persist_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./storage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;storage_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage_context&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Featherless’s monthly subscription, you have unlimited access to all models, which fundamentally changes how you approach optimization. Instead of minimizing token usage, you can experiment freely with different models to find the perfect fit for each use case. Don’t hesitate to use larger models for complex tasks where quality matters most. &lt;/p&gt;

&lt;p&gt;Focus your optimization efforts on reducing latency through query caching for common questions and implementing parallel processing for better throughput. Since you’re not counting tokens, you can run extensive A/B tests across multiple models simultaneously, gathering real performance data to make informed decisions about which models work best for different query types. This freedom to experiment without constraints means you can optimize for what really matters: response quality and user experience.&lt;/p&gt;
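Query caching for common questions needs nothing more than memoizing on a normalized query string, and parallel A/B tests across models are a natural fit for a thread pool since each model call is I/O-bound. A minimal, framework-agnostic sketch, where `cached_answer` and the `run_model` callable are stand-ins for a real call such as `query_engine.query`:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_answer(normalized_query: str) -> str:
    # Stand-in for the expensive call (e.g. query_engine.query(...))
    return f"answer to: {normalized_query}"


def answer(query: str) -> str:
    # Normalize whitespace and case so near-duplicate questions share an entry
    return cached_answer(" ".join(query.lower().split()))


def ab_test(query: str, models: list[str], run_model) -> dict[str, str]:
    # Fan the same query out to several models concurrently
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(run_model, m, query) for m in models}
        return {m: f.result() for m, f in futures.items()}


results = ab_test("What's your refund policy?",
                  ["model-a", "model-b"],
                  lambda model, q: f"{model}: {answer(q)}")
print(results)
```

Swapping the stand-ins for real Featherless-backed query engines turns this into a side-by-side comparison harness with no extra infrastructure.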

&lt;h2&gt;
  
  
  What’s next?
&lt;/h2&gt;

&lt;p&gt;You now have the foundation for building powerful RAG applications with LlamaIndex and the Featherless integration. Start by exploring the vast model selection at &lt;a href="http://featherless.ai" rel="noopener noreferrer"&gt;featherless.ai&lt;/a&gt;; you might discover specialized models perfect for your use case that you wouldn’t have considered before.&lt;/p&gt;

&lt;p&gt;As your application grows, consider adding persistence with vector databases to handle larger document collections. Implement evaluation metrics to measure your retrieval quality and iterate on your chunking strategies. The real power comes when you start building agents that combine RAG with tool use, enabling complex workflows that go beyond simple Q&amp;amp;A. &lt;/p&gt;
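A simple place to start with retrieval evaluation is hit rate: for each question paired with the document it should surface, check whether that document appears in the top-k results. A framework-agnostic sketch, where `toy_retrieve` is a hypothetical stand-in for your real retriever returning `(doc_id, score)` pairs:

```python
def hit_rate(eval_pairs, retrieve, k: int = 3) -> float:
    """Fraction of questions whose expected doc id shows up in the top-k results."""
    hits = 0
    for question, expected_id in eval_pairs:
        top_ids = [doc_id for doc_id, _score in retrieve(question)[:k]]
        hits += expected_id in top_ids
    return hits / len(eval_pairs)


# Toy retriever used only to illustrate the metric
def toy_retrieve(question: str):
    return [("policy-refunds", 0.91), ("faq-shipping", 0.62), ("product-specs", 0.40)]


pairs = [("How do refunds work?", "policy-refunds"),
         ("Where is my order?", "doc-not-indexed")]
print(hit_rate(pairs, toy_retrieve))  # 0.5
```

Tracking this number as you vary chunk size and overlap gives you a concrete signal for iterating on your chunking strategy.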

&lt;p&gt;Join our community on &lt;a href="//discord.gg/featherlessai"&gt;Discord&lt;/a&gt; to share your builds and learn from others who are pushing the boundaries of what’s possible with RAG.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>llm</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building Production-Ready LLM Apps with LangChain &amp; Featherless Serverless Inference</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Fri, 23 May 2025 15:25:01 +0000</pubDate>
      <link>https://dev.to/featherlessai/building-production-ready-llm-apps-with-langchain-featherless-serverless-inference-3kp4</link>
      <guid>https://dev.to/featherlessai/building-production-ready-llm-apps-with-langchain-featherless-serverless-inference-3kp4</guid>
      <description>&lt;p&gt;As the open source AI ecosystem rapidly evolves, developers are faced with two growing challenges: managing infrastructure and evaluating the ever-expanding universe of models. By integrating with LangChain, Featherless now enables you to build and scale LLM-powered applications with zero infrastructure hassle and instant access to over 4,300 open source models. Following up on our previous post, “&lt;a href="https://featherless.ai/blog/zero-to-ai-deploying-language-models-without-the-infrastructure-headache" rel="noopener noreferrer"&gt;Zero to AI: Deploying Language Models without the Infrastructure Headache,&lt;/a&gt;” we’re thrilled to announce a significant leap forward: &lt;strong&gt;Featherless now has a native integration with LangChain!&lt;/strong&gt; You can find us on the &lt;a href="https://python.langchain.com/docs/integrations/chat/featherless_ai/" rel="noopener noreferrer"&gt;LangChain Python documentation.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Prototype to Production: Why Combining LangChain + Featherless is a Game-Changer
&lt;/h2&gt;

&lt;p&gt;While LangChain has pioneered how developers chain together LLM operations, the challenge of managing model infrastructure remains. With Featherless we hope to solve this piece of the puzzle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalable Infrastructure&lt;/strong&gt; - Deploy production-grade LLM applications without a single line of DevOps code. No GPU provisioning, no autoscaling headaches and no containers to manage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unlimited Model Flexibility&lt;/strong&gt; - Instant access to 4,300+ (and growing every day) open source models through a single consistent API. Swap between Mistral, Llama, DeepSeek, Qwen and thousands more by changing just one parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable Pricing&lt;/strong&gt; - Featherless offers straightforward subscription-based pricing with no hidden costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid Prototyping &amp;amp; Testing&lt;/strong&gt; - Evaluate different models for your use case in minutes, not days. Experiment with model parameters and find the perfect balance of performance and cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is for you to focus on your application logic while we handle the heavy lifting of inference infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quickstart: Launch your LangChain App with Featherless
&lt;/h2&gt;

&lt;p&gt;Getting started is incredibly straightforward:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install necessary packages:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-core langchain-featherless-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Note: &lt;code&gt;langchain-featherless-ai&lt;/code&gt; is the dedicated package for our native integration.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Initialize ChatFeatherlessAi as your LLM provider in LangChain:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_featherless_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatFeatherlessAi&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Featherless LLM
# Best practice: Set your API key as an environment variable (FEATHERLESS_API_KEY)
# Or, you can pass it directly:
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatFeatherlessAi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;featherless_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Replace with your actual key
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-24B-Instruct-2501&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Example model
&lt;/span&gt;    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt; &lt;span class="c1"&gt;# Adjusted for a slogan
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define a prompt template
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a creative slogan for a product called {product}?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define an output parser
&lt;/span&gt;&lt;span class="n"&gt;output_parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Construct the chain using LCEL's pipe (|) operator
&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;output_parser&lt;/span&gt;

&lt;span class="c1"&gt;# Invoke the chain
&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Featherless AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slogan for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Key Change: We are now using &lt;code&gt;ChatFeatherlessAi&lt;/code&gt; directly from &lt;code&gt;langchain_featherless_ai&lt;/code&gt; instead of the OpenAI-compatible endpoint. The API key can be passed directly or set via the &lt;code&gt;FEATHERLESS_API_KEY&lt;/code&gt; environment variable.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Done! You’ve just powered your LangChain application with a model from Featherless using our direct, native integration and modern LCEL syntax.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example Use Case: Building a RAG App with Native Featherless Integration
&lt;/h2&gt;

&lt;p&gt;Let’s dig deeper into the power of this native integration by building a lightweight RAG (Retrieval-Augmented Generation) system. This is perfect for creating Q&amp;amp;A bots over your own documents.&lt;/p&gt;

&lt;p&gt;We'll use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ChatFeatherlessAi&lt;/code&gt; for LLM inference.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LangChain&lt;/code&gt; (LCEL, community packages) for orchestration and retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FAISS&lt;/code&gt; (from &lt;code&gt;langchain-community&lt;/code&gt;) as a simple in-memory vector store.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HuggingFaceEmbeddings&lt;/code&gt; (from &lt;code&gt;langchain-huggingface&lt;/code&gt;) for document embedding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1. Install additional packages for RAG:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-community langchain-huggingface langchain-text-splitters faiss-cpu sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Note: &lt;code&gt;faiss-cpu&lt;/code&gt; is for CPU-based FAISS, use &lt;code&gt;faiss-gpu&lt;/code&gt; if you have a GPU setup.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Ingest and Index Your Documents&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="c1"&gt;# Create a dummy "your_document.txt" file in the same directory for this example:
# File content: "The Featherless API provides access to many LLMs. It's designed for ease of use and developer productivity."
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_document.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The Featherless API provides access to many LLMs. It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s designed for ease of use and developer productivity.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./your_document.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error preparing or loading document: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please ensure you can write to &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_document.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or create it manually.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;split_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Using a common, reliable sentence transformer model
&lt;/span&gt;    &lt;span class="n"&gt;embeddings_model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-mpnet-base-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings_model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;split_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; 
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error creating FAISS vector store or retriever: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This might be related to your PyTorch/Torchvision/FAISS setup.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ensure you followed Step 1 for installing PyTorch correctly.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No documents loaded, retriever will not be initialized.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
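&lt;p&gt;The &lt;code&gt;chunk_size=500&lt;/code&gt; / &lt;code&gt;chunk_overlap=50&lt;/code&gt; settings above mean consecutive chunks share a 50-character boundary region. A simplified, dependency-free character-window sketch illustrates the idea (the real &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; additionally prefers to break on separators like paragraphs and sentences before falling back to fixed windows):&lt;/p&gt;

```python
def naive_split(text, chunk_size=500, chunk_overlap=50):
    # Simplified fixed-window splitter: each chunk starts chunk_size - chunk_overlap
    # characters after the previous one, so adjacent chunks overlap by chunk_overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = naive_split("x" * 1200, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # → [500, 500, 300]
```

&lt;p&gt;The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.&lt;/p&gt;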



&lt;p&gt;&lt;strong&gt;3. Set Up &lt;code&gt;ChatFeatherlessAi&lt;/code&gt; and Build the RAG Chain using LCEL:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnableParallel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_featherless_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatFeatherlessAi&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Featherless LLM
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatFeatherlessAi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;featherless_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Replace
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-24B-Instruct-2501&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="c1"&gt;# Lower temperature for more factual RAG
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# RAG Prompt Template
&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer the question based only on the following context:
{context}

Question: {question}

Answer:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;rag_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Helper function to format retrieved documents
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Construct the RAG chain using LCEL
&lt;/span&gt;    &lt;span class="n"&gt;rag_chain_from_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;format_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])))&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;rag_prompt&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;rag_chain_with_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunnableParallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rag_chain_from_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Example Invocation
&lt;/span&gt;    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the Featherless API designed for?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_chain_with_source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sources:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (Metadata: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RAG chain not created as retriever is unavailable.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
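&lt;p&gt;The &lt;code&gt;format_docs&lt;/code&gt; helper above simply concatenates the retrieved chunks into one context string. A standalone sketch (using a minimal &lt;code&gt;Doc&lt;/code&gt; stand-in for LangChain's &lt;code&gt;Document&lt;/code&gt;, so it runs without any dependencies) shows the exact behavior:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Minimal stand-in for LangChain's Document, keeping only page_content
    page_content: str

def format_docs(docs):
    # Join chunk texts with blank lines -- the same context format the chain uses
    return "\n\n".join(doc.page_content for doc in docs)

context = format_docs([
    Doc("Featherless is a serverless LLM API."),
    Doc("It hosts thousands of open models."),
])
print(context)
```

&lt;p&gt;The blank-line separator keeps chunk boundaries visible to the model without adding any markup it might echo back.&lt;/p&gt;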



&lt;p&gt;This example demonstrates a modern, LCEL-based approach to building a sophisticated RAG system, seamlessly powered by serverless inference from Featherless and orchestrated via LangChain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Effortless Model Experimentation: Remember the &lt;code&gt;model&lt;/code&gt; Parameter
&lt;/h2&gt;

&lt;p&gt;Want to see if LLaMA 3 provides better answers for your use case? Or perhaps test DeepSeek's capabilities? With the native &lt;code&gt;ChatFeatherlessAi&lt;/code&gt; integration, switching models is as simple as updating the &lt;code&gt;model&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize with LLaMA 3
&lt;/span&gt;&lt;span class="n"&gt;llm_llama3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatFeatherlessAi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;featherless_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Example LLaMA 3 model from Featherless
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or try DeepSeek
&lt;/span&gt;&lt;span class="n"&gt;llm_deepseek&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatFeatherlessAi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;featherless_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-V3-0324&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Example DeepSeek model from Featherless
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Then, you can plug these into your LCEL chains:
# new_chain = prompt | llm_llama3 | output_parser
# For the RAG chain, rebuild rag_chain_from_docs around the new model:
# rag_chain_deepseek = (
#     RunnablePassthrough.assign(context=(lambda x: format_docs(x["documents"])))
#     | rag_prompt | llm_deepseek | StrOutputParser()
# )
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This frictionless model evaluation is a game-changer for prompt tuning and finding the perfect LLM for your specific task, all within a familiar LangChain paradigm.&lt;/p&gt;
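&lt;p&gt;One way to make that evaluation systematic is a small comparison harness. This is a hypothetical sketch: &lt;code&gt;make_llm&lt;/code&gt; is assumed to be a factory you supply (e.g. &lt;code&gt;lambda m: ChatFeatherlessAi(featherless_api_key="...", model=m)&lt;/code&gt;), and &lt;code&gt;EchoLLM&lt;/code&gt; is a stub so the harness runs offline:&lt;/p&gt;

```python
# Candidate model IDs from the Featherless catalog (examples from this post)
candidate_models = [
    "mistralai/Mistral-Small-24B-Instruct-2501",
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324",
]

def compare(make_llm, prompt):
    # One answer per model ID, keyed by model, for side-by-side review
    return {m: make_llm(m).invoke(prompt) for m in candidate_models}

# Stub client standing in for ChatFeatherlessAi; swap in the real factory to test.
class EchoLLM:
    def __init__(self, model):
        self.model = model
    def invoke(self, prompt):
        return f"{self.model}: {prompt}"

results = compare(EchoLLM, "ping")
print(results["deepseek-ai/DeepSeek-V3-0324"])
```

&lt;p&gt;Because every model sits behind the same interface, the only thing that changes per run is the model string.&lt;/p&gt;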

&lt;h2&gt;
  
  
  How to Get Started with Featherless and LangChain:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create your free Featherless account:&lt;/strong&gt; Sign up at Featherless.ai&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grab your API key:&lt;/strong&gt; Find it on your Featherless dashboard. Set it as an environment variable &lt;code&gt;FEATHERLESS_API_KEY&lt;/code&gt; or pass it directly to &lt;code&gt;ChatFeatherlessAi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore the Model Catalog:&lt;/strong&gt; Discover over 4,300 models ready for instant deployment. Check the latest list &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;here.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;langchain-featherless-ai&lt;/code&gt; and other necessary &lt;code&gt;langchain&lt;/code&gt; packages, then use &lt;code&gt;ChatFeatherlessAi&lt;/code&gt; as shown.&lt;/li&gt;
&lt;li&gt;Dive into the Docs:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://featherless.ai/docs/getting-started" rel="noopener noreferrer"&gt;Featherless API Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/integrations/chat/featherless_ai/" rel="noopener noreferrer"&gt;Official LangChain Integration Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/concepts/lcel/" rel="noopener noreferrer"&gt;LangChain Expression Language (LCEL)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
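&lt;p&gt;The environment-variable approach from step 2 can be sketched as follows (the placeholder fallback is illustrative -- it lets example scripts construct, but real API calls will fail until it is replaced):&lt;/p&gt;

```python
import os

# Prefer the FEATHERLESS_API_KEY environment variable; fall back to a
# placeholder string so the example still runs without a key configured.
api_key = os.environ.get("FEATHERLESS_API_KEY", "YOUR_FEATHERLESS_API_KEY")
print(api_key != "YOUR_FEATHERLESS_API_KEY")  # True once the env var is set
```

&lt;p&gt;Keeping the key in the environment avoids committing it to version control alongside your chain code.&lt;/p&gt;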

&lt;h2&gt;
  
  
  Final Thoughts: Build Without Limits
&lt;/h2&gt;

&lt;p&gt;The native synergy between LangChain's powerful orchestration (especially with LCEL) and Featherless's &lt;code&gt;ChatFeatherlessAi&lt;/code&gt; component is set to redefine how developers build, test, and ship LLM-powered applications. By removing infrastructure bottlenecks and providing vast model choice through a dedicated integration, we're empowering you to focus solely on innovation. Cold starts, model hosting, and scaling headaches are now a thing of the past.&lt;/p&gt;

&lt;p&gt;Ready to build your next groundbreaking LLM app without the usual friction? Join our &lt;a href="https://discord.gg/7gybCMPjVA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; today to get help building your first app!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try Featherless with LangChain's native integration today!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Featherless: Open source LLMs. One API. Zero infrastructure.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>opensource</category>
      <category>langchain</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Running OpenHands LM 32B with Featherless.ai: A Practical Guide</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Tue, 08 Apr 2025 08:49:40 +0000</pubDate>
      <link>https://dev.to/featherlessai/running-openhands-lm-32b-with-featherlessai-a-practical-guide-bpl</link>
      <guid>https://dev.to/featherlessai/running-openhands-lm-32b-with-featherlessai-a-practical-guide-bpl</guid>
      <description>&lt;p&gt;The landscape of AI-powered software development is evolving at breakneck speed, and the release of &lt;strong&gt;OpenHands LM 32B&lt;/strong&gt; marks a significant leap forward. This powerful, open-source coding model, boasting an impressive 37.2% resolve rate on SWE-Bench Verified, brings enterprise-grade AI assistance directly to your local environment. By pairing it with &lt;strong&gt;Featherless.ai&lt;/strong&gt;'s efficient model hosting, you create a potent yet accessible development setup. Whether you're a solo developer aiming to accelerate your workflow or part of a team seeking freedom from proprietary solutions, this combination offers compelling advantages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Choose Featherless.ai for Your OpenHands LM Deployment?
&lt;/h2&gt;

&lt;p&gt;Running large models like OpenHands LM 32B locally can be resource-intensive. Featherless.ai provides an elegant solution, standing out in the crowded AI inference market with its unique approach to model hosting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vast Model Catalog&lt;/strong&gt;: Easily access an extensive library of models, including the &lt;strong&gt;OpenHands LM 32B&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable Subscription Pricing&lt;/strong&gt;: Enjoy straightforward costs with a subscription, avoiding volatile pay-per-token fees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to get started? The first step is installing the OpenHands application itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing OpenHands
&lt;/h2&gt;

&lt;p&gt;To get started with OpenHands, you'll need to install the application first. The installation process varies depending on your operating system and device. I recommend following the official installation guide provided by All Hands:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.all-hands.dev/modules/usage/installation" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenHands Installation Guide&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The documentation provides detailed instructions for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing on macOS, Windows, and Linux&lt;/li&gt;
&lt;li&gt;System requirements&lt;/li&gt;
&lt;li&gt;Configuration options&lt;/li&gt;
&lt;li&gt;Troubleshooting common installation issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ensure you have OpenHands installed and running correctly before proceeding to connect it with the powerful LM hosted on Featherless.ai.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting OpenHands to Featherless.ai
&lt;/h2&gt;

&lt;p&gt;With the OpenHands application installed, it's time to connect it to the &lt;strong&gt;OpenHands LM 32B&lt;/strong&gt; model running efficiently on Featherless.ai. This integration unlocks the model's power without requiring local GPU resources. Here's how to set up the integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, you'll need to create an account on &lt;a href="https://featherless.ai/register" rel="noopener noreferrer"&gt;&lt;strong&gt;Featherless.ai&lt;/strong&gt;&lt;/a&gt; if you don't already have one.&lt;/li&gt;
&lt;li&gt;Once logged in, navigate to your account settings to obtain your API key. This key will authenticate your requests to the &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;&lt;strong&gt;Featherless.ai&lt;/strong&gt;&lt;/a&gt; API.&lt;/li&gt;
&lt;li&gt;Open the OpenHands application (usually running at &lt;strong&gt;localhost:3000&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Go to the settings page (gear icon, typically at the bottom left).&lt;/li&gt;
&lt;li&gt;Since Featherless.ai provides an OpenAI-compatible API endpoint, we can configure OpenHands to use it by setting the following options within the application:&lt;/li&gt;
&lt;li&gt;Enable Advanced options (toggle switch).&lt;/li&gt;
&lt;li&gt;Set the following:

&lt;ul&gt;
&lt;li&gt;Custom Model to &lt;code&gt;openai/all-hands/openhands-lm-32b-v0.1&lt;/code&gt;. The &lt;code&gt;openai/&lt;/code&gt; prefix tells OpenHands to use the OpenAI API format with the specified model available on Featherless.ai.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Base URL&lt;/code&gt; to &lt;a href="https://api.featherless.ai/v1" rel="noopener noreferrer"&gt;https://api.featherless.ai/v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;API Key&lt;/code&gt; to your Featherless API Key&lt;/li&gt;
&lt;li&gt;Disable memory condensation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Fill in your Git Provider Settings if necessary (e.g., GitHub token).&lt;/li&gt;

&lt;li&gt;Save Changes!&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwechlbthits3h5k9g1wt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwechlbthits3h5k9g1wt.png" alt="OpenHands LLM Settings" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's all it takes! Your OpenHands application is now powered by the sophisticated &lt;strong&gt;OpenHands LM 32B&lt;/strong&gt; via Featherless.ai. Start experimenting! Try feeding it complex coding challenges, asking it to refactor code, or even resolving GitHub issues directly. You might be surprised by its problem-solving prowess. Furthermore, this same setup process works seamlessly with other cutting-edge models available in the extensive Featherless.ai catalog, like the recent &lt;strong&gt;DeepSeek V3&lt;/strong&gt;.&lt;/p&gt;
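&lt;p&gt;If the connection misbehaves, you can sanity-check your key and the endpoint outside OpenHands with a plain OpenAI-style HTTP request. In this sketch the request is built but not sent, so it runs offline; note the model ID drops the &lt;code&gt;openai/&lt;/code&gt; routing prefix, which is an OpenHands convention rather than part of the Featherless model name:&lt;/p&gt;

```python
import json
import urllib.request

# Standard OpenAI-compatible chat-completions payload, using the model and
# base URL from the OpenHands settings above.
payload = {
    "model": "all-hands/openhands-lm-32b-v0.1",
    "messages": [{"role": "user", "content": "Say hello"}],
}
req = urllib.request.Request(
    "https://api.featherless.ai/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_FEATHERLESS_API_KEY",  # replace with your key
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

&lt;p&gt;A successful response here confirms the key and model name are valid before you debug anything inside OpenHands itself.&lt;/p&gt;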

&lt;p&gt;Ready to start building? Head over to &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;https://featherless.ai/&lt;/a&gt; to create an account. Our growing community of developers, enthusiasts, and AI practitioners is here to help you get the most out of Featherless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://discord.gg/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community to connect with other users&lt;/li&gt;
&lt;li&gt;Follow us on Twitter (&lt;a href="https://x.com/featherlessai" rel="noopener noreferrer"&gt;@FeatherlessAI&lt;/a&gt;) for the latest updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We look forward to seeing what you create and share with the community.&lt;/p&gt;

</description>
      <category>coding</category>
      <category>tooling</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Initial Support for Google's Gemma 3 27B Models Now Live on Featherless.ai!</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Mon, 07 Apr 2025 14:45:00 +0000</pubDate>
      <link>https://dev.to/featherlessai/initial-support-for-googles-gemma-3-27b-models-now-live-on-featherlessai-17h5</link>
      <guid>https://dev.to/featherlessai/initial-support-for-googles-gemma-3-27b-models-now-live-on-featherlessai-17h5</guid>
      <description>&lt;p&gt;We’re thrilled to announce that we’ve added initial support for &lt;a href="https://featherless.ai/model-families/gemma3" rel="noopener noreferrer"&gt;Google’s Gemma 3 27B&lt;/a&gt; models to the Featherless.ai serverless inference platform! After dedicated work from our team, &lt;strong&gt;both the instruct-tuned and pre-trained versions of the 27B parameter model&lt;/strong&gt; are now active and ready for use.&lt;/p&gt;

&lt;p&gt;This marks the first step in bringing Gemma 3, Google’s latest state-of-the-art open model family, to our users. We plan to start onboarding &lt;strong&gt;fine-tuned versions of Gemma 3 27B over the next week.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 3 27B on Featherless.ai: Powerful Inference Without the Complexity
&lt;/h2&gt;

&lt;p&gt;Access the impressive performance of Gemma 3 27B through our simple serverless API, letting you focus on building great applications instead of managing infrastructure. Built on the same research and technology behind Google’s Gemini models, Gemma 3 27B delivers cutting-edge AI capabilities with just an API call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Updates &amp;amp; What’s Next
&lt;/h2&gt;

&lt;p&gt;Alongside this model release, we’ve also &lt;strong&gt;pulled in the latest version of vLLM.&lt;/strong&gt; This foundational update means exciting features like &lt;strong&gt;tool calling, custom grammar support, and vision pipelines for Gemma&lt;/strong&gt; are now solidly on our roadmap (and perhaps even closer than expected!). A big shoutout to our dedicated Inference team members for making these advancements possible!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Growing Spectrum of Open Models on Featherless.ai
&lt;/h2&gt;

&lt;p&gt;With the addition of Gemma 3 27B, our platform continues to offer a diverse range of powerful open models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-R1&lt;/strong&gt;: Pushing the boundaries of reasoning with 671B parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwerky-72B&lt;/strong&gt;: Built on the efficient RWKV architecture (sub-quadratic scaling) designed for significantly reduced inference costs (VRAM and compute).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QwQ-32B&lt;/strong&gt;: Strong reasoning in an efficient 32B parameter package.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 3 27B&lt;/strong&gt;: Google’s latest advancements available in a powerful 27B model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This selection provides developers with significant choice in matching model capabilities to their specific application needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Do with Gemma 3 27B
&lt;/h2&gt;

&lt;p&gt;With our serverless implementation of Gemma 3 27B, you can immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leverage model performance designed to be highly competitive, even against much larger models in preliminary evaluations.&lt;/li&gt;
&lt;li&gt;Build multilingual applications with its broad language support.&lt;/li&gt;
&lt;li&gt;Utilize the power of the 27B parameter model for demanding generative AI tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Featherless.ai users seeking powerful AI through a simple API, Gemma 3 27B presents an exciting new option.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Gemma 3 27B on Featherless.ai Today!
&lt;/h2&gt;

&lt;p&gt;The Gemma 3 27B models are ready for immediate use through our platform. Whether you’re building applications that need sophisticated language understanding or exploring the capabilities of this new model, our serverless API provides the simplest path to integration.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with it on &lt;a href="https://phoenix.featherless.ai/" rel="noopener noreferrer"&gt;Phoenix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Integrate via the &lt;a href="https://featherless.ai/docs/getting-started" rel="noopener noreferrer"&gt;Featherless API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore our documentation: Check out &lt;a href="https://featherless.ai/blog/zero-to-ai-deploying-language-models-without-the-infrastructure-headache" rel="noopener noreferrer"&gt;our implementation guides&lt;/a&gt; and &lt;a href="https://github.com/featherlessai" rel="noopener noreferrer"&gt;example code&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have questions about using Gemma 3 27B through our serverless platform? Reach out to us on &lt;a href="https://discord.com/invite/7gybCMPjVA" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or check our documentation for API references and best practices.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Supercharging Your Development Workflow: Integrating Featherless.ai with Aider and Cursor</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Thu, 03 Apr 2025 13:54:40 +0000</pubDate>
      <link>https://dev.to/featherlessai/supercharging-your-development-workflow-integrating-featherlessai-with-aider-and-cursor-39d6</link>
      <guid>https://dev.to/featherlessai/supercharging-your-development-workflow-integrating-featherlessai-with-aider-and-cursor-39d6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;So you’ve heard that the latest DeepSeek V3 or OpenHands LM model is good at coding, and now you’re wondering how to bring that power directly into your coding workflow. You’ve come to the right place: at &lt;a href="http://Featherless.ai" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt; we give you access not only to the latest DeepSeek but to any open model on Hugging Face, without the headache of managing infrastructure. &lt;/p&gt;

&lt;p&gt;This guide walks you through integrating &lt;a href="http://Featherless.ai" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt; with two popular AI-assisted coding tools: Aider and Cursor. Whether you’re pair programming with an AI assistant through Aider’s command-line interface or leveraging Cursor’s intelligent code completion and refactoring capabilities, Featherless.ai can significantly enhance your development experience by giving you access to the latest, and any future, open-source models you want to work with. &lt;/p&gt;

&lt;p&gt;At the end we’ll go over some of our favorite models for coding, but first let’s get started with the integration process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Featherless API Key
&lt;/h2&gt;

&lt;p&gt;You’ll need a couple of things before you start. First and foremost, you’ll need a Featherless API key to connect to any of the 4000+ open models in our catalog. Head over to &lt;a href="http://featherless.ai" rel="noopener noreferrer"&gt;featherless.ai&lt;/a&gt; and create an account if you haven’t already. Once logged in, navigate to the API section in your dashboard, where you can generate a new API key. This key is your secure passport to all the powerful open-source models we host, including DeepSeek V3. Keep it handy, as you’ll need to configure it in both Aider and Cursor in the following steps. Remember to treat your API key like a password: don’t share it publicly or commit it to version control systems.&lt;/p&gt;

&lt;p&gt;You will also need to choose a model in our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt;. Let’s take the code-specific Qwen model called &lt;code&gt;Qwen/Qwen2.5-Coder-32B-Instruct&lt;/code&gt; as an example.&lt;/p&gt;
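&lt;p&gt;If you prefer to browse programmatically, you can also list models over the API. The sketch below assumes Featherless exposes the standard OpenAI-compatible &lt;code&gt;/models&lt;/code&gt; listing endpoint under the same base URL used later in this guide; check the API docs if the route differs.&lt;/p&gt;

```python
import os

import requests

BASE_URL = "https://api.featherless.ai/v1"

def build_models_request(api_key):
    """Assemble the URL and auth headers for an OpenAI-style model listing."""
    url = f"{BASE_URL}/models"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

# Only fires when a key is configured, so the sketch stays import-safe.
if __name__ == "__main__" and "FEATHERLESS_API_KEY" in os.environ:
    url, headers = build_models_request(os.environ["FEATHERLESS_API_KEY"])
    data = requests.get(url, headers=headers).json()["data"]
    print([m["id"] for m in data][:10])
```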

&lt;h2&gt;
  
  
  Setting Up Aider with Featherless.ai
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;First, you'll need to install Aider if you haven't already. Aider is a command-line tool that lets you pair program with AI models directly from your terminal. Visit &lt;a href="https://aider.chat/docs/installation.html" rel="noopener noreferrer"&gt;Aider's official documentation&lt;/a&gt; for the most up-to-date installation instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrating with Featherless.ai
&lt;/h3&gt;

&lt;p&gt;Once Aider is installed, integrating it with Featherless.ai requires just two configuration files in your project folder:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a model settings file named &lt;code&gt;.aider.model.settings.yml&lt;/code&gt;:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cache_control&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;caches_by_default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;edit_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;whole&lt;/span&gt;
  &lt;span class="na"&gt;examples_as_sys_msg&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;extra_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;
  &lt;span class="na"&gt;lazy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/Qwen/Qwen2.5-Coder-32B-Instruct&lt;/span&gt;
  &lt;span class="na"&gt;reminder&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user&lt;/span&gt;
  &lt;span class="na"&gt;send_undo_reply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;streaming&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;use_repo_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;use_system_prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;use_temperature&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create a model metadata file named &lt;code&gt;.aider.model.metadata.json&lt;/code&gt;:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"openai/Qwen/Qwen2.5-Coder-32B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"max_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"max_output_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"input_cost_per_token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"output_cost_per_token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"litellm_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"support_vision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"support_function_calling"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running Aider with Featherless.ai
&lt;/h3&gt;

&lt;p&gt;Now you can start Aider with Featherless.ai by running the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--openai-api-base&lt;/span&gt; &lt;span class="s1"&gt;'https://api.featherless.ai/v1'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--openai-api-key&lt;/span&gt; your_featherless_API_key &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--model&lt;/span&gt; &lt;span class="s1"&gt;'openai/Qwen/Qwen2.5-Coder-32B-Instruct'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--map-tokens&lt;/span&gt; 1024 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--model-metadata-file&lt;/span&gt; &lt;span class="s1"&gt;'/path/to/.aider.model.metadata.json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--model-settings-file&lt;/span&gt; &lt;span class="s1"&gt;'/path/to/.aider.model.settings.yml'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;your_featherless_API_key&lt;/code&gt; with the API key you obtained from the Featherless.ai dashboard&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;/path/to/&lt;/code&gt; to the actual path where you saved your configuration files&lt;/li&gt;
&lt;li&gt;Note that you can create separate configuration files for each model you want to use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it! You're now ready to use powerful models like Qwen2.5-Coder-32B through Aider, all powered by Featherless.ai's infrastructure.&lt;/p&gt;
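&lt;p&gt;Before launching Aider, it can help to sanity-check your key and model name directly against the endpoint the command above points at. Here is a minimal sketch of a standard OpenAI-style chat completions request; note that the &lt;code&gt;openai/&lt;/code&gt; prefix in the Aider config is routing for Aider’s model layer, while the API itself takes the bare model ID.&lt;/p&gt;

```python
import os

import requests

BASE_URL = "https://api.featherless.ai/v1"
MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"

def build_chat_request(api_key, model, prompt):
    """Assemble an OpenAI-style chat completions request for Featherless."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return url, headers, payload

# Only fires when a key is configured, so the sketch stays import-safe.
if __name__ == "__main__" and "FEATHERLESS_API_KEY" in os.environ:
    url, headers, payload = build_chat_request(
        os.environ["FEATHERLESS_API_KEY"], MODEL, "Say hello in one word."
    )
    resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```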

&lt;h2&gt;
  
  
  Setting Up Cursor with Featherless.ai
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Configuring Cursor to Use Featherless Models
&lt;/h3&gt;

&lt;p&gt;Cursor is a powerful AI-assisted code editor that can be enhanced with custom models from Featherless.ai. Here's how to set it up step by step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Cursor Settings&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Launch the Cursor application&lt;/li&gt;
&lt;li&gt;Click on the gear icon in the top right corner, or use the keyboard shortcut &lt;code&gt;Ctrl+Shift+J&lt;/code&gt; (Windows/Linux) or &lt;code&gt;Cmd+Shift+J&lt;/code&gt; (Mac)&lt;/li&gt;
&lt;li&gt;Navigate to the "Models" section in the sidebar&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Custom Models&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Uncheck all pre-selected models to start&lt;/li&gt;
&lt;li&gt;Click the "Add Model" button&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;Qwen/Qwen2.5-Coder-32B-Instruct&lt;/code&gt; as the model name&lt;/li&gt;
&lt;li&gt;If you want to use other models later, you can add them the same way&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftj8vaf14ij1akp7bni1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftj8vaf14ij1akp7bni1.png" alt="Add Custom Model" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Set Up Featherless API Connection&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Look for the "OpenAI API Key" section in the settings&lt;/li&gt;
&lt;li&gt;Enter your Featherless API key in the "OpenAI Key" field&lt;/li&gt;
&lt;li&gt;Find the "Override OpenAI Base URL" option and enter: &lt;code&gt;https://api.featherless.ai/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Click the "Save and Verify" button to test your connection&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z3qa49s54qln9zqq9zs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z3qa49s54qln9zqq9zs.png" alt="Custom API Settings" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start Using Your Custom Model&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Open or create a project in Cursor&lt;/li&gt;
&lt;li&gt;Click on the chat/AI button in the sidebar&lt;/li&gt;
&lt;li&gt;In the model selector dropdown at the top of the chat panel, select &lt;code&gt;Qwen/Qwen2.5-Coder-32B-Instruct&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start coding with your Featherless-powered model!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Troubleshooting Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection Failed?&lt;/strong&gt; Double-check your API key for typos and ensure the base URL is entered correctly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Not Appearing?&lt;/strong&gt; Try restarting Cursor after saving your settings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow Responses?&lt;/strong&gt; Complex coding tasks might take a moment; the model is processing your entire codebase context&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recommended Models for Coding
&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up your tools with Featherless.ai, here are some of our top model recommendations:&lt;/p&gt;

&lt;h3&gt;
  
  
  Best All-Around Coding Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://featherless.ai/models/all-hands/openhands-lm-32b-v0.1" rel="noopener noreferrer"&gt;Openhands LM&lt;/a&gt; - The strongest 32B coding agent model, resolving 37.4% of issues on SWE-bench&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://featherless.ai/blog/deepseek-ai/DeepSeek-V3-0324" rel="noopener noreferrer"&gt;DeepSeek V3&lt;/a&gt; - Excellent balance of speed and accuracy for general coding tasks&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://featherless.ai/blog/Qwen/Qwen2.5-Coder-32B-Instruct" rel="noopener noreferrer"&gt;Qwen2.5-Coder-32B&lt;/a&gt; - Best for complex projects and production-quality code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://featherless.ai/blog/open-r1/OlympicCoder-32B" rel="noopener noreferrer"&gt;open-r1/OlympicCoder-32B&lt;/a&gt; - A code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>cursor</category>
      <category>aider</category>
    </item>
    <item>
      <title>Building a PDF-to-Podcast Pipeline with Open-Source AI: From Text Extraction to Voice Synthesis</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Tue, 11 Mar 2025 08:59:08 +0000</pubDate>
      <link>https://dev.to/featherlessai/building-a-pdf-to-podcast-pipeline-with-open-source-ai-from-text-extraction-to-voice-synthesis-eo1</link>
      <guid>https://dev.to/featherlessai/building-a-pdf-to-podcast-pipeline-with-open-source-ai-from-text-extraction-to-voice-synthesis-eo1</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Imagine this: you’re jogging through the park, earbuds in, grinning as two lively voices chat about the latest AI research paper, just like it’s a podcast made just for you. Or picture a busy content creator with a pile of blog posts, dreaming of turning them into audio gold without spending hours recording. That’s where this AI-powered pipeline comes in. It takes static PDFs and transforms them into engaging, conversational podcasts using open-source tools. In this post, I’ll walk you through the whole process: extracting text, crafting fun scripts, and synthesizing natural audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Turn PDFs into Podcasts?
&lt;/h2&gt;

&lt;p&gt;PDFs are treasure troves of info, but let’s be real: they’re not exactly commute-friendly. Podcasts, though? They’re perfect for multitasking: driving, working out, or chilling out. The problem is that recording a podcast the old-school way (scripting, speaking, editing) is a time sink. This pipeline changes that. It automates the grind so you can focus on the content. Here’s who could use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Researchers&lt;/strong&gt;: Turn dense papers into listens for your morning run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professionals&lt;/strong&gt;: Make industry reports your gym-session soundtrack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bloggers&lt;/strong&gt;: Repurpose old posts into fresh podcast episodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technologies Used&lt;/strong&gt;&lt;br&gt;
The pipeline leverages several powerful open-source technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/pymupdf/PyMuPDF" rel="noopener noreferrer"&gt;PyMuPDF&lt;/a&gt;&lt;/strong&gt;: For extracting text content from PDFs while preserving structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt;&lt;/strong&gt; API: Access to all open-weight models on Hugging Face for text cleaning and creative podcast script generation by using roleplay finetunes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/hexgrad/kokoro" rel="noopener noreferrer"&gt;Kokoro TTS&lt;/a&gt;&lt;/strong&gt;: Converts text into natural-sounding audio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python Libraries&lt;/strong&gt;: Tools like Pandas, NumPy, and PyDub handle data and audio processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Complete Pipeline Overview&lt;/strong&gt;&lt;br&gt;
This pipeline architecture consists of four main stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text Extraction and Cleaning&lt;/strong&gt;: Converting PDF to structured, readable text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Podcast Script Generation&lt;/strong&gt;: Transforming factual content into natural dialogue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TTS Optimization&lt;/strong&gt;: Formatting the script for speech synthesis compatibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio Generation&lt;/strong&gt;: Creating and combining audio segments into a cohesive podcast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive into each stage in detail.&lt;/p&gt;

&lt;p&gt;The pipeline consists of four interconnected Jupyter notebooks, each handling a specific stage of the transformation process:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PDF Document → Text Extraction → Script Generation → TTS Optimization → Audio Generation&lt;/code&gt;&lt;/p&gt;
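&lt;p&gt;At the orchestration level, the four stages compose like a chain of functions. Here is a minimal sketch with stand-in stage functions; the real implementations live in the notebooks and appear throughout this post:&lt;/p&gt;

```python
def run_pipeline(pdf_path, extract, write_script, optimize_for_tts, synthesize):
    """Chain the four stages; each stage is passed in as a function so the
    sketch stays independent of any single notebook's code."""
    text = extract(pdf_path)               # Stage 1: PDF to clean text
    script = write_script(text)            # Stage 2: text to dialogue
    tts_script = optimize_for_tts(script)  # Stage 3: TTS-ready format
    return synthesize(tts_script)          # Stage 4: audio segments

# Stand-in stages, just to show the data flow:
audio = run_pipeline(
    "paper.pdf",
    extract=lambda p: f"text from {p}",
    write_script=lambda t: f"SPEAKER 1: {t}",
    optimize_for_tts=lambda s: s,
    synthesize=lambda s: ["segment-1.wav"],
)
print(audio)
```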
&lt;h2&gt;
  
  
  Stage 1: Text Extraction and Cleaning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Extracting text from PDFs with PyMuPDF&lt;/strong&gt;&lt;br&gt;
The first challenge is to extract text from PDF documents while preserving its meaning and structure. PDFs are notoriously difficult to parse correctly, as they can contain multiple columns, images, headers, footers, and complex layouts. I chose PyMuPDF (via the pymupdf4llm wrapper) for its ability to handle these complexities. Here’s the core extraction function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;validate_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert PDF to markdown text
&lt;/span&gt;        &lt;span class="n"&gt;markdown_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymupdf4llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Truncate if exceeds max_chars
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Truncating text to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;markdown_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;markdown_text&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;nExtraction complete! Total characters: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;markdown_text&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An unexpected error occurred: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s happening here? The function checks that the PDF is valid, pulls the text out as Markdown to preserve structure (like headings), and truncates it if it’s very long. For non-coders: this is like a super-smart photocopier that grabs only the words you care about. Watch out, though: scanned PDFs or locked files might need some extra work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleaning and Structuring Content&lt;/strong&gt;&lt;br&gt;
Raw PDF text is often cluttered with page numbers, headers, footers, and other elements that don’t belong in a podcast script. Plus, academic and technical documents frequently contain notation that doesn’t translate well to speech. I used the Featherless.ai API to process and clean this text. This approach leverages large language models to understand the content and reformat it appropriately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_num&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a chunk of text using Featherless API&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYS_PROMPT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text_chunk&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;processed_text&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error processing chunk &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_chunk&lt;/span&gt;  &lt;span class="c1"&gt;# Return original text in case of error
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt tells the model to keep the good stuff and ditch the rest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a world class text pre-processor, here is the raw data from a PDF, 
please parse and return it in a way that is crispy and usable to send to a 
podcast writer.
The raw data is messed up with new lines, LaTeX math and you will see fluff 
that we can remove completely. Basically take away any details that you think 
might be useless in a podcast author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s transcript.
Remember, the podcast could be on any topic whatsoever so the issues listed 
above are not exhaustive.
Please be smart with what you remove and be creative ok?
Remember DO NOT START SUMMARIZING THIS, YOU ARE ONLY CLEANING UP THE TEXT 
AND RE-WRITING WHEN NEEDED.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Before: &lt;code&gt;#Intro\nPage 1\nData is key\\LaTeX{math}here.&lt;/code&gt;&lt;br&gt;
After: &lt;code&gt;Data is key&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handling Technical Challenges&lt;/strong&gt;&lt;br&gt;
Big PDFs bring big challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Limits&lt;/strong&gt;: Huge files can crash things, so I split the text into 1,000-character chunks, like this:
&lt;code&gt;Text → [Chunk 1 | Chunk 2 | Chunk 3] → Processed.&lt;/code&gt;
Each chunk gets cleaned, then reassembled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weird Layouts&lt;/strong&gt;: PyMuPDF and the LLM team up to straighten out columns and tables so the flow makes sense.&lt;/li&gt;
&lt;/ul&gt;
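
&lt;p&gt;The chunking itself is simple. Here is a sketch of a fixed-budget splitter; the notebook’s exact boundary handling isn’t shown here, so treat the hard 1,000-character cut as an assumption:&lt;/p&gt;

```python
def chunk_text(text, max_chars=1000):
    """Split text into consecutive chunks of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

parts = chunk_text("a" * 2500)
print([len(p) for p in parts])  # prints [1000, 1000, 500]
```

&lt;p&gt;Each chunk then goes through &lt;code&gt;process_chunk&lt;/code&gt; before the cleaned pieces are joined back together.&lt;/p&gt;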

&lt;p&gt;The output of this stage is clean, well-structured text that captures the essential information from the PDF in a format suitable for conversion to podcast dialogue.&lt;/p&gt;
&lt;h2&gt;
  
  
  Stage 2: Podcast Script Generation
&lt;/h2&gt;

&lt;p&gt;Next, we transform the text into a dialogue between two speakers using the &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai API&lt;/a&gt; and a large language model (LLM) of choice. It creates a natural back-and-forth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speaker 1&lt;/strong&gt;: The explainer, dropping clear insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speaker 2&lt;/strong&gt;: The curious one, tossing in questions and quirks.
Here’s an example output:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SPEAKER 1: Data is critical for AI—it’s what powers the system, much like fuel for an engine.
SPEAKER 2: So, if the data isn’t great, does that affect how well the AI performs?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The LLM adds natural phrasing to make it feel like a real conversation, not just a read-aloud.&lt;/p&gt;
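&lt;p&gt;Under the hood this is a standard OpenAI-compatible chat completion request. A minimal sketch of the payload (the &lt;code&gt;build_script_request&lt;/code&gt; helper, its prompt text, and the model name are illustrative, not the pipeline's exact code):&lt;/p&gt;

```python
import json

# Hypothetical stage-2 instruction; the pipeline's real prompt is more detailed.
SCRIPT_PROMPT = (
    "Rewrite the following text as a podcast dialogue between Speaker 1 "
    "(the explainer) and Speaker 2 (the curious one)."
)

def build_script_request(cleaned_text, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Build the JSON payload for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SCRIPT_PROMPT},
            {"role": "user", "content": cleaned_text},
        ],
    }

payload = build_script_request("Data is critical for AI.")
body = json.dumps(payload)  # POST this to https://api.featherless.ai/v1/chat/completions
```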
&lt;h2&gt;
  
  
  Stage 3: TTS Optimization
&lt;/h2&gt;

&lt;p&gt;While the previous stage generated a conversational podcast script, this stage takes a different approach focused specifically on Text-to-Speech (TTS) compatibility. Instead of further processing the output from stage 2, we revisit the raw extracted text and apply specialized prompt engineering to generate a script format optimized for voice synthesis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge of TTS-Ready Scripts&lt;/strong&gt;&lt;br&gt;
Text-to-speech engines often struggle with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural-sounding dialogue that maintains distinct speaker voices&lt;/li&gt;
&lt;li&gt;Appropriate pacing and pauses&lt;/li&gt;
&lt;li&gt;Handling emotional expressions and reactions&lt;/li&gt;
&lt;li&gt;Structured, predictable formats for programmatic processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of this stage is to transform our basic script into a structured format that both preserves its conversational nature and ensures reliable TTS processing, while adding some flair to the conversation through a specialized roleplaying language model accessed via &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SYSTEM_PROMPT = """
You are an international Oscar-winning screenwriter who has worked with 
multiple award-winning podcasters. Your job is to rewrite the provided podcast 
transcript for an AI Text-To-Speech pipeline. 
The original transcript was written by a less experienced AI, so you need 
to enhance it significantly.
Create an engaging dialogue between two speakers, each with distinct personalities:
- Speaker 1: A captivating teacher who leads the conversation, 
explains concepts with vivid analogies and personal anecdotes, 
and makes the topic accessible and memorable. They speak clearly and 
confidently, without using filler words like "umm" or "hmm."
- Speaker 2: A curious and enthusiastic learner who keeps the conversation 
on track by asking follow-up questions. They often get excited or confused, 
expressing their reactions verbally with phrases like "That's fascinating!", 
"Wait, I'm not sure I get that," or "Wow, that's like [analogy]."
[Additional instructions...]
Return the dialogue as a list of tuples, like this:
[
    ("Speaker 1", "Text here"),
    ("Speaker 2", "Text here"),
    ...
]
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt engineers several crucial elements for TTS success:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Speaker-Specific Speech Patterns:&lt;/strong&gt; By assigning distinct personalities, the model creates natural variations in speech patterns that TTS systems can render more distinctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controlled Filler Usage:&lt;/strong&gt; Speaker 1 avoids filler words while Speaker 2 can use them, creating natural rhythm without overwhelming the TTS engine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Data Format:&lt;/strong&gt; The list of tuples creates a programming-friendly format that simplifies integration with TTS systems in the next stage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By generating the script in this structured format, we eliminate many common TTS issues before they occur. The next stage can directly process this optimized script without additional parsing or formatting, streamlining the pipeline from text to spoken audio.&lt;/p&gt;
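&lt;p&gt;Because the script arrives as a literal Python list of tuples, the next stage can parse it with the standard library alone. A minimal sketch (assuming well-formed model output; &lt;code&gt;parse_script&lt;/code&gt; is an illustrative helper, and production code would add error handling):&lt;/p&gt;

```python
import ast

def parse_script(raw):
    """Safely parse the LLM's list-of-tuples output into (speaker, text) pairs."""
    segments = ast.literal_eval(raw.strip())
    return [(speaker, text) for speaker, text in segments]

raw_output = '[("Speaker 1", "Data powers AI."), ("Speaker 2", "So bad data hurts?")]'
podcast_segments = parse_script(raw_output)
```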

&lt;h2&gt;
  
  
  Stage 4: Audio Generation with Kokoro
&lt;/h2&gt;

&lt;p&gt;The final stage transforms our TTS-optimized script into audio using Kokoro, an open-source text-to-speech library that provides high-quality voice synthesis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice Selection and Configuration&lt;/strong&gt;&lt;br&gt;
Kokoro offers multiple voices with different characteristics. I selected distinct voices for each speaker to enhance the natural podcast feel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize separate pipelines for each speaker with different voices
# Using American English as the base language
&lt;/span&gt;&lt;span class="n"&gt;speaker1_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# American English
&lt;/span&gt;&lt;span class="n"&gt;speaker2_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# American English
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_speech_kokoro&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Select the appropriate pipeline and voice
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Use a female voice for Speaker 1
&lt;/span&gt;        &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;speaker1_pipeline&lt;/span&gt;
        &lt;span class="n"&gt;voice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;af_heart&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Female voice
&lt;/span&gt;        &lt;span class="n"&gt;speed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Use a male voice for Speaker 2
&lt;/span&gt;        &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;speaker2_pipeline&lt;/span&gt;
        &lt;span class="n"&gt;voice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;am_fenrir&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Male voice
&lt;/span&gt;        &lt;span class="n"&gt;speed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.1&lt;/span&gt;  &lt;span class="c1"&gt;# Slightly faster
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
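&lt;p&gt;Kokoro yields audio as floating-point sample arrays at 24 kHz, so each chunk needs converting to 16-bit PCM before it can be wrapped in a pydub &lt;code&gt;AudioSegment&lt;/code&gt;. A stdlib-only sketch of that conversion (illustrative; the pipeline may do the same with NumPy instead):&lt;/p&gt;

```python
import array

SAMPLE_RATE = 24000  # Kokoro outputs 24 kHz mono audio

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed PCM bytes."""
    ints = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clip out-of-range values
        ints.append(int(s * 32767))
    return ints.tobytes()

pcm = float_to_pcm16([0.0, 0.5, -0.5, 1.0])
# Wrap as: AudioSegment(data=pcm, sample_width=2, frame_rate=SAMPLE_RATE, channels=1)
```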



&lt;p&gt;For our podcast, I chose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speaker 1: &lt;code&gt;af_heart&lt;/code&gt; - Female American English voice with excellent quality&lt;/li&gt;
&lt;li&gt;Speaker 2: &lt;code&gt;am_fenrir&lt;/code&gt; - Male American English voice with good quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These different voices create a clear distinction between speakers, making the podcast easier to follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combining Segments with Proper Timing&lt;/strong&gt;&lt;br&gt;
To create a cohesive podcast, we need to combine individual audio segments with appropriate spacing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generate the podcast
&lt;/span&gt;&lt;span class="n"&gt;final_podcast&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AudioSegment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;podcast_segments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generating podcast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;speaker_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Speaker 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Generate audio for this segment
&lt;/span&gt;    &lt;span class="n"&gt;audio_segment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_speech_kokoro&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speaker_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;audio_segment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add slight pause between segments
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;final_podcast&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;AudioSegment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;silent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 500ms pause
&lt;/span&gt;        &lt;span class="c1"&gt;# Add to podcast
&lt;/span&gt;        &lt;span class="n"&gt;final_podcast&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;audio_segment&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code adds a half-second pause between speaker transitions, creating a natural rhythm in the conversation.&lt;/p&gt;
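&lt;p&gt;The same pause logic can be expressed without pydub, directly on sample arrays (a stdlib-only illustration; &lt;code&gt;join_with_pauses&lt;/code&gt; is not part of the pipeline, and the 24 kHz rate matches Kokoro's output):&lt;/p&gt;

```python
SAMPLE_RATE = 24000
PAUSE_MS = 500

def join_with_pauses(segments, sample_rate=SAMPLE_RATE, pause_ms=PAUSE_MS):
    """Concatenate audio sample lists, inserting silence between them."""
    silence = [0.0] * int(sample_rate * pause_ms / 1000)  # 12,000 zero samples
    out = []
    for i, seg in enumerate(segments):
        if i:  # pause before every segment except the first
            out.extend(silence)
        out.extend(seg)
    return out

combined = join_with_pauses([[0.1] * 100, [0.2] * 100])  # 100 + 12000 + 100 samples
```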

&lt;h2&gt;
  
  
  Challenges and Solutions
&lt;/h2&gt;

&lt;p&gt;Building this pipeline wasn’t without hurdles. Here are some key challenges and how I tackled them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handling Complex PDF Layouts&lt;/strong&gt;: PDFs with multi-column formats, images, or tables can be tricky. PyMuPDF’s Markdown conversion preserved some structure, but additional cleaning via the Featherless.ai API removed artifacts like page numbers and headers intelligently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generating Natural Dialogue&lt;/strong&gt;: Turning static text into a dynamic conversation required careful prompt engineering. I guided the LLM to include interruptions, filler words, and personality-driven responses, making the script feel authentic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizing for TTS&lt;/strong&gt;: Ensuring the script was TTS-friendly meant structuring it for easy synthesis. Using a tuple-based format and controlling filler usage prevented common TTS pitfalls, like unnatural pacing or mispronounced expressions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;p&gt;The pipeline works well, but there’s room to grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Language Support&lt;/strong&gt;: Adding support for PDFs and podcasts in multiple languages would broaden its reach.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced TTS Features&lt;/strong&gt;: Integrating emotional tone adjustments or background music could make the podcasts more immersive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-Tuned Models&lt;/strong&gt;: Using LLMs fine-tuned for podcast script generation could enhance dialogue quality further.
Try languages first; it’s a fun, doable leap, and there are tons of fine-tuned models on Featherless.ai to assist you with that.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This PDF-to-podcast pipeline demonstrates the remarkable potential of open-source AI when creatively combined. By bridging PyMuPDF’s extraction capabilities with &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai’s&lt;/a&gt; language models and Kokoro’s voice synthesis, we’ve created a system that transforms static documents into engaging audio experiences.&lt;/p&gt;

&lt;p&gt;The true power lies in the modular design. Each component can be independently improved or replaced as new models emerge. Want to try a different LLM? Swap the API endpoint. Prefer different voices? Modify the TTS configuration. This flexibility makes it perfect for experimentation and customization.&lt;/p&gt;
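&lt;p&gt;That modularity can be made concrete with a small configuration layer (a hypothetical sketch; &lt;code&gt;PIPELINE_CONFIG&lt;/code&gt; and &lt;code&gt;with_overrides&lt;/code&gt; are illustrative, not part of the repository):&lt;/p&gt;

```python
# Hypothetical settings object collecting the pipeline's swap points.
PIPELINE_CONFIG = {
    "llm_base_url": "https://api.featherless.ai/v1",
    "llm_model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "tts_voices": {"speaker1": "af_heart", "speaker2": "am_fenrir"},
}

def with_overrides(config, **overrides):
    """Return a copy of the config with selected fields swapped out."""
    return {**config, **overrides}

# Trying a different LLM is a one-line change:
experiment = with_overrides(PIPELINE_CONFIG, llm_model="Qwen/QwQ-32B")
```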

&lt;p&gt;We encourage readers to fork the project and make it their own. You can listen to a sample podcast generated with this pipeline or grab the &lt;a href="https://github.com/featherlessai/featherless-podcast" rel="noopener noreferrer"&gt;full code on GitHub&lt;/a&gt; and start building your own. Try adding your own prompts, experiment with different voice combinations, or extend it to handle research papers and technical manuals. The future of content adaptation is open, accessible, and limited only by our imagination, happy podcasting!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>QwQ-32B Now Available on Featherless.ai</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Fri, 07 Mar 2025 10:56:33 +0000</pubDate>
      <link>https://dev.to/featherlessai/qwq-32b-now-available-on-featherlessai-8jo</link>
      <guid>https://dev.to/featherlessai/qwq-32b-now-available-on-featherlessai-8jo</guid>
      <description>&lt;h2&gt;
  
  
  QwQ-32B: A Powerful Lightweight in the Age of Reasoning Models
&lt;/h2&gt;

&lt;p&gt;AI development continues to advance through diverse approaches to model design and optimization. &lt;strong&gt;DeepSeek-R1&lt;/strong&gt;, with its impressive 671B parameters, has established itself as one of the most capable reasoning-focused models on the market. Its remarkable capabilities have set new benchmarks for what models in this space can achieve.&lt;/p&gt;

&lt;p&gt;Meanwhile, efficiency and adaptability continue opening new frontiers, and this is where QwQ-32B, Qwen's latest release, makes its mark.&lt;/p&gt;

&lt;h2&gt;
  
  
  QwQ-32B: Efficient Reasoning Power
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;QwQ-32B&lt;/strong&gt; delivers &lt;strong&gt;high-level reasoning, problem-solving, and strong coding/math capabilities&lt;/strong&gt; in a lightweight package. With just 32B parameters, early benchmarks show impressive performance, making it an attractive option for those looking for strong reasoning capabilities in a more efficient format.&lt;/p&gt;

&lt;p&gt;With AI applications diversifying, the demand for models that deliver &lt;strong&gt;excellent performance with a smaller resource footprint&lt;/strong&gt; continues to grow. &lt;strong&gt;QwQ-32B exemplifies how well-optimized models can achieve remarkable results.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution of Reasoning Models
&lt;/h2&gt;

&lt;p&gt;The AI field is evolving rapidly. While early models focused heavily on &lt;strong&gt;generative fluency and knowledge retrieval&lt;/strong&gt;, today's most exciting breakthroughs are in &lt;strong&gt;models that can reason, plan, and solve complex problems&lt;/strong&gt;. DeepSeek-R1 has been instrumental in this evolution, demonstrating the power of advanced reasoning capabilities.&lt;/p&gt;

&lt;p&gt;Now, Qwen is expanding possibilities further, showing that &lt;strong&gt;reasoning power can be delivered in different formats to meet diverse needs.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does QwQ-32B Perform?
&lt;/h2&gt;

&lt;p&gt;Testing indicates that QwQ-32B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Excels in logical reasoning tasks&lt;/strong&gt; with impressive structured problem-solving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performs well in math and coding&lt;/strong&gt;, key benchmarks for reasoning capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offers strong efficiency&lt;/strong&gt;, delivering high performance with reduced compute requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For users seeking &lt;strong&gt;high-quality reasoning in an efficient package&lt;/strong&gt;, QwQ-32B presents an exciting option.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means For You
&lt;/h2&gt;

&lt;p&gt;With both QwQ-32B and DeepSeek-R1 available on &lt;u&gt;&lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt;&lt;/u&gt;, you now have multiple excellent options for advanced reasoning capabilities. Our ongoing optimization efforts ensure that you'll benefit from continuous improvements in both performance and functionality.&lt;/p&gt;

&lt;p&gt;We're committed to making advanced open AI models accessible and practical for everyone. Each model in our lineup offers unique advantages to suit different use cases and requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Our Models and Share Your Thoughts
&lt;/h2&gt;

&lt;p&gt;QwQ-32B is now available alongside DeepSeek-R1, and we want to hear about your experiences with both models. Each excels in reasoning tasks while offering different profiles in terms of scale and efficiency.&lt;/p&gt;

&lt;p&gt;Leave a review on the &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;Featherless.ai&lt;/a&gt; model page: &lt;a href="https://featherless.ai/models/Qwen/QwQ-32B" rel="noopener noreferrer"&gt;QwQ-32B&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have questions about integrating these models into your workflow? Reach out to us on Discord or check our documentation for implementation guidelines and best practices.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>qwen</category>
      <category>ai</category>
    </item>
    <item>
      <title>Unlimited DeepSeek-R1 now available to Featherless premium subscribers!</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Mon, 03 Feb 2025 13:31:04 +0000</pubDate>
      <link>https://dev.to/featherlessai/unlimited-deepseek-r1-now-available-to-featherless-premium-subscribers-3ld1</link>
      <guid>https://dev.to/featherlessai/unlimited-deepseek-r1-now-available-to-featherless-premium-subscribers-3ld1</guid>
      <description>&lt;p&gt;We’re happy to announce DeepSeek-R1 support is up on Featherless for our premium subscribers! With our simple monthly subscription (no pay-per-token fees), you get unlimited access.&lt;/p&gt;

&lt;p&gt;DeepSeek has achieved exceptional performance with significantly lower costs and computational resources, challenging giants like OpenAI, Google and Meta.&lt;/p&gt;

&lt;p&gt;Some highlights of the DeepSeek-R1 release:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance on par with OpenAI-o1&lt;/li&gt;
&lt;li&gt;MIT licensed (check out some of the distills in our model catalog!)&lt;/li&gt;
&lt;li&gt;Fully open-source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepSeek’s rise to the forefront did not happen overnight; their success comes from a year of incremental, thoughtful specialization across two critical domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specialization in mixture-of-experts (MoE) architecture&lt;/li&gt;
&lt;li&gt;GPU compute optimization (forged under the constraints of hardware sanctions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read more about how &lt;a href="https://substack.tech-talk-cto.com/p/how-deepseek-disrupted-the-ai-giants" rel="noopener noreferrer"&gt;DeepSeek disrupted the billion-dollar budgets and GPU arsenals of Big Tech.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience DeepSeek-R1 on Featherless:
&lt;/h2&gt;

&lt;p&gt;🦜Chat with it on &lt;a href="https://phoenix.featherless.ai/" rel="noopener noreferrer"&gt;Phoenix&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚡Integrate via the &lt;a href="https://featherless.ai/docs/getting-started" rel="noopener noreferrer"&gt;Featherless API&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔍Explore distills in our &lt;a href="https://featherless.ai/models?query=deepseek" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deepseek</category>
      <category>opensource</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>Zero to AI: Deploying Language Models Without the Infrastructure Headache</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Fri, 17 Jan 2025 14:02:16 +0000</pubDate>
      <link>https://dev.to/featherlessai/zero-to-ai-deploying-language-models-without-the-infrastructure-headache-1e8c</link>
      <guid>https://dev.to/featherlessai/zero-to-ai-deploying-language-models-without-the-infrastructure-headache-1e8c</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;You've finally found it! The perfect language model on Hugging Face, seemingly exactly what you need for your project. The model size is reasonable, the generation quality is amazing, the community feedback is great, and it handles your specific use case beautifully! You're excited to start building, but then reality hits: where do you actually deploy this model? You could spin up a GPU instance on AWS, manage your own infrastructure, and pray you don't burn through your budget before launching. Or you could try one of those specialized platforms that requires learning yet another set of tools and workflows. What started as excitement suddenly turns into an infrastructure nightmare.&lt;/p&gt;

&lt;p&gt;As more developers venture into AI, the gap between finding a great model and using it in production remains significant. While platforms like OpenAI and Anthropic have made inference-by-API seamless, the vast ecosystem of open-source models (often more niche and cost-effective for specific use cases) remains out of reach for many developers who just want to build applications.&lt;/p&gt;

&lt;p&gt;Whether you're a solo developer or part of a larger team, let's explore how Featherless can take you from discovering almost any Hugging Face model to running it in production without losing your sanity (or savings).&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden costs of model deployment
&lt;/h2&gt;

&lt;p&gt;Let's take a practical look at what deploying a model such as Llama 3.1 (8B) actually entails in different scenarios.&lt;br&gt;
At one end, platforms like RunPod offer raw GPU access starting at around $0.20 per hour for 16GB VRAM instances. This is just the beginning, however: you'll have to handle CUDA drivers, PyTorch dependencies, and quantization techniques yourself. At the other end, services like Hugging Face inference abstract away much of this complexity, though you're still fundamentally paying for dedicated GPU time. Then there's the challenge of scaling: how do you handle multiple concurrent requests? Load balancing? Suddenly, you need expertise in Docker, Kubernetes, and a spectrum of monitoring tools.&lt;/p&gt;

&lt;p&gt;Inference-as-a-service through providers like OpenRouter and AWS Bedrock offers attractive token prices with no configuration, but it comes with its own set of challenges: rigid pricing structures that can reach $0.50 per million tokens or more, limited model selections, and lock-in to the provider's ecosystem. As your usage scales, costs can become unpredictable and expensive, particularly if your application's usage doesn't map in a simple way to tokens.&lt;br&gt;
What started as a simple model deployment can quickly evolve into a full-time infrastructure management project.&lt;/p&gt;
&lt;h2&gt;
  
  
  Enter Featherless
&lt;/h2&gt;

&lt;p&gt;This is where we at Featherless step in. Instead of building and maintaining your own infrastructure or getting locked into expensive managed services, Featherless provides direct access to Hugging Face's vast ecosystem of models through a simple (OpenAI-compatible) API. As a serverless inference platform, we handle all the complex infrastructure orchestration behind the scenes while you maintain full control over your model selection and customization options.&lt;/p&gt;

&lt;p&gt;What makes this approach advantageous is that you can deploy almost any Hugging Face model in minutes, not days or weeks, without sacrificing performance or breaking your budget. We target an output inference of 10–40 tokens per second, depending on the model and prompt size while keeping your costs predictable. Whether you're experimenting with different models or just scaling your production workloads, Featherless enables quick iteration as you can switch models with just a simple configuration change.&lt;/p&gt;

&lt;p&gt;For developers who've worked with OpenAI's API, the transition is easy: we maintain API and SDK compatibility while opening up access to a huge catalog of open-source models, enabling you to leverage your existing codebase while gaining the freedom to choose and swap between any open-source model that fits your specific use case.&lt;/p&gt;
&lt;h2&gt;
  
  
  From Zero to Hello: 5-minute model deployment
&lt;/h2&gt;

&lt;p&gt;Let's get into the practical implementation. The best way to understand the simplicity of Featherless is to see it in action, so in the following examples I'll walk you through how to set up basic API calls with Featherless. First, sign up for a &lt;a href="https://featherless.ai/register" rel="noopener noreferrer"&gt;Featherless account&lt;/a&gt; and choose a &lt;a href="https://featherless.ai/#pricing" rel="noopener noreferrer"&gt;subscription plan&lt;/a&gt; that fits your needs, after which you'll find your &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API key&lt;/a&gt; on your dashboard. Keep it close, as you'll need it in each of the examples below. If you're new to language model APIs altogether, don't worry: the examples are clean and straightforward, focusing only on the essential patterns to get you started.&lt;/p&gt;
&lt;h3&gt;
  
  
  Your first API call
&lt;/h3&gt;

&lt;p&gt;First, choose your model from our vast catalog of Hugging Face models. Then, depending on your application's use case, we offer two endpoints. The first and simplest is &lt;code&gt;/v1/chat/completions&lt;/code&gt;, which is designed for interactions where your application needs to maintain a clear user-assistant relationship and conversation flow (think ChatGPT). It accepts messages in a format that distinguishes between system instructions, user inputs, and assistant responses, making it ideal for chatbots, virtual assistants, or any application that requires contextual conversation management.&lt;/p&gt;

&lt;p&gt;Let's start with this simple chat completion example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example shows how to make a basic chat completion call
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Replace API key
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! How are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m amazing, yourself?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Great! What are you up to?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We specify the model we want to use, then provide an array of messages.&lt;br&gt;
The &lt;code&gt;/v1/completions&lt;/code&gt; endpoint, on the other hand, offers a lower-level but more direct approach: it accepts a single text prompt and returns a completion, giving you complete control over the prompt format. This is useful for content and text generation, or any case where you want to implement a more custom conversation format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example shows how to make a text completion call
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Replace with your API key
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Once upon a time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the text completion endpoint takes a simpler input: a single prompt string instead of an array of messages, and allows the generation of &lt;em&gt;any&lt;/em&gt; text. This endpoint is key for using the LLM as a &lt;em&gt;reasoning&lt;/em&gt; engine; e.g. if you're using an LLM to extract structured data from a block of text, such as a list of email addresses, this is much simpler to do with text completions than with chat completions.&lt;br&gt;
We've also added &lt;code&gt;max_tokens&lt;/code&gt; as a parameter here to specify the maximum length, in tokens, of the response we want back from the model. A more elaborate overview of the different parameters you can provide to the endpoint is available in our documentation.&lt;/p&gt;
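&lt;p&gt;The other request-body fields follow the same OpenAI-style schema. As a sketch (the exact set of supported parameters is in our documentation; the values below are purely illustrative), a payload with common sampling controls might look like this:&lt;/p&gt;

```python
# Sketch: a completions payload with common OpenAI-style sampling parameters.
# Check the Featherless documentation for the exact set of supported fields;
# the values here are illustrative defaults, not recommendations.
def build_completion_payload(prompt, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 500,    # cap on the response length, in tokens
        "temperature": 0.7,   # higher values give more varied output
        "top_p": 0.9,         # nucleus sampling: keep the top 90% probability mass
        "stop": ["\n\n"],     # stop generating at the first blank line
    }

payload = build_completion_payload("Once upon a time")
```

&lt;p&gt;The payload is then sent exactly as in the example above, as the &lt;code&gt;json&lt;/code&gt; argument to &lt;code&gt;requests.post&lt;/code&gt;.&lt;/p&gt;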
&lt;h3&gt;
  
  
  OpenAI compatibility
&lt;/h3&gt;

&lt;p&gt;The widespread adoption of OpenAI's ecosystem has led to an implicit API standard for LLM integration. Featherless implements this standard, so any code or application designed to work with OpenAI's API can be easily reconfigured to work with Featherless instead. This compatibility extends across the ecosystem of applications and tools built for OpenAI, making the transition to Featherless straightforward for teams working with these tools. You can find a list of a few of those applications in our other blog post.&lt;/p&gt;

&lt;p&gt;Now let's have a look at how we can make use of the full range of open-source models by just adjusting the standard OpenAI SDK code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using OpenAI SDK 
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only changes needed here are the client's &lt;code&gt;base_url&lt;/code&gt;, the &lt;code&gt;api_key&lt;/code&gt; and the &lt;code&gt;model&lt;/code&gt; parameter. The rest of the code is unchanged. This compatibility means you can switch between models without having to rewrite any of your existing application logic.&lt;/p&gt;
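&lt;p&gt;Since only the base URL, the key, and the model name differ between providers, it can help to keep those settings in one place. A minimal sketch (the &lt;code&gt;PROVIDERS&lt;/code&gt; dict and helper below are illustrative conventions of our own, not part of the OpenAI SDK):&lt;/p&gt;

```python
import os

# Sketch: centralising provider settings so switching providers is a
# one-line change. PROVIDERS and client_kwargs are illustrative helpers,
# not part of the OpenAI SDK.
PROVIDERS = {
    "featherless": {
        "base_url": "https://api.featherless.ai/v1",
        "api_key_env": "FEATHERLESS_API_KEY",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "api_key_env": "OPENAI_API_KEY",
    },
}

def client_kwargs(provider):
    """Keyword arguments for OpenAI(**client_kwargs(provider))."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["api_key_env"], ""),
    }
```

&lt;p&gt;With this in place, &lt;code&gt;OpenAI(**client_kwargs("featherless"))&lt;/code&gt; and &lt;code&gt;OpenAI(**client_kwargs("openai"))&lt;/code&gt; become interchangeable.&lt;/p&gt;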

&lt;h3&gt;
  
  
  Comparing models
&lt;/h3&gt;

&lt;p&gt;Since switching between models is as easy as changing one line, we can compare different models' responses to the same prompt and quickly iterate toward the model that best fits your use case. The following example does exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="c1"&gt;# Compare responses from different models with the same prompt
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="c1"&gt;# Add models from catalog
&lt;/span&gt;&lt;span class="n"&gt;models_to_compare&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# The prompt you want to compare
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compare_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain AGI in simple terms.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models_to_compare&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
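&lt;p&gt;One caveat: the loop above assumes every request succeeds. When comparing several models, a single unavailable model shouldn't abort the whole run, so it's worth pulling the reply out of the response body defensively. A sketch (the helper name and error format are our own, not part of the API):&lt;/p&gt;

```python
# Sketch: defensively extracting the assistant's reply from a chat-completions
# response body, so one failing model doesn't abort a comparison run.
def extract_content(body):
    """Return choices[0].message.content, or an error marker if absent."""
    try:
        return body["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        err = body.get("error", "unexpected response shape") if isinstance(body, dict) else repr(body)
        return f"[no completion: {err}]"

extract_content({"choices": [{"message": {"content": "Hi!"}}]})  # "Hi!"
extract_content({"error": "model not found"})  # "[no completion: model not found]"
```

&lt;p&gt;Inside &lt;code&gt;compare_models&lt;/code&gt; you would then store &lt;code&gt;extract_content(response.json())&lt;/code&gt; instead of indexing into the body directly.&lt;/p&gt;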



&lt;h3&gt;
  
  
  What now?
&lt;/h3&gt;

&lt;p&gt;You've seen how straightforward it is to get started: just a few lines of code and you're up and running, chatting with your first models. Before we dive deeper into the next implementations, we invite you to join our growing community of developers and enthusiasts on Discord. Share your experiences and struggles, and connect with others who are building with Featherless.&lt;/p&gt;

&lt;p&gt;In the following sections we'll introduce some basic building blocks, such as using Featherless in LangChain, along with patterns that help you navigate the huge catalog of models and make the most of what this variety offers.&lt;/p&gt;

&lt;p&gt;Join us on &lt;a href="https://discord.com/invite/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to continue the conversation. Now let's dive into how LangChain can extend everything we've already discussed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Basics: Integrating with LangChain
&lt;/h2&gt;

&lt;p&gt;Moving beyond basic (and individual) inference calls, let's explore how to use Featherless with more sophisticated libraries. LangChain, the most widely adopted of these libraries, provides developers with powerful tools and patterns for managing complex prompts and conversational state. Here's how you can power any LangChain application with Featherless.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Your Featherless API key
&lt;/span&gt;    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that translates English to French. Translate the user sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I love programming.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ai_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ai_msg&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With LangChain's building blocks you can create more advanced applications, such as pipelines that summarize and analyze large documents by breaking them into chunks, or conversation patterns ranging from simple message history to more complex summary-based approaches that help you manage your context size.&lt;/p&gt;
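&lt;p&gt;LangChain ships dedicated text splitters (e.g. &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;) for that chunking step. Stripped down to plain Python, the core idea is a sliding window with overlap; the following is a sketch of the concept, not LangChain's actual implementation:&lt;/p&gt;

```python
# Sketch: the chunk-with-overlap idea behind LangChain's document splitters,
# in plain Python. This shows only the sliding-window core; the real
# splitters also try to break on separators like paragraphs and sentences.
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into chunk_size-character pieces that overlap by `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2500, chunk_size=1000, overlap=100)
len(chunks)  # 3 chunks, each sharing 100 characters with its neighbour
```

&lt;p&gt;Each chunk can then be summarized individually with a model of your choice before combining the partial summaries.&lt;/p&gt;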

&lt;p&gt;The beauty of LangChain with Featherless is that you can experiment with different models for different parts of your application. Need a lighter model for classification but a more powerful one for generation? You can mix and match with the wide variety of models in our catalog while still maintaining a consistent and clean architecture. &lt;/p&gt;

&lt;p&gt;The following example briefly demonstrates the power and flexibility of the combination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnableLambda&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;

&lt;span class="c1"&gt;# Define models for different tasks
&lt;/span&gt;&lt;span class="n"&gt;classification_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Meta-Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;translation_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FEATHERLESS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.featherless.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Nemo-Instruct-2407&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define prompt templates
&lt;/span&gt;&lt;span class="n"&gt;translation_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Translate the following sentence from English to French:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;classification_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify the sentiment of the following text as positive, negative, or neutral:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Translation task
&lt;/span&gt;&lt;span class="n"&gt;translation_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunnableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Translation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;translated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;translation_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;translation_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;  &lt;span class="c1"&gt;# Extract content
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Classification task
&lt;/span&gt;&lt;span class="n"&gt;classification_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunnableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;translated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;translated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Passing the translated text
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classification_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;classification_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;classification_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;translated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Chain the tasks together
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;translation_task&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;classification_task&lt;/span&gt;

&lt;span class="c1"&gt;# Run the workflow
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I love using Featherless.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By creating separate model instances for the classification and translation tasks, we can optimize our application's performance: for each task, we choose the specific model best suited to its particular nuances.&lt;/p&gt;

&lt;p&gt;What's particularly powerful about this approach is its extensibility. Need to add, say, a profanity filter before your translation? Simply find an appropriate model and create a new task to inject into the workflow. The architecture scales with your needs while keeping your infrastructure complexity manageable.&lt;/p&gt;
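As an illustration of that injection pattern, stripped of the LangChain machinery: if each task is a plain function from dict to dict, inserting a step is just re-composing the pipeline. The profanity filter and translator below are hypothetical stand-ins, not real model calls:

```python
from functools import reduce

def compose(*tasks):
    """Chain dict-to-dict tasks left to right, like the `|` operator above."""
    return lambda data: reduce(lambda acc, task: task(acc), tasks, data)

def profanity_filter(data):
    # Hypothetical stand-in for a moderation model call
    banned = {"darn"}
    words = [w for w in data["text"].split() if w.lower() not in banned]
    return {**data, "text": " ".join(words)}

def translate(data):
    # Stand-in for translation_llm.invoke(...); uppercasing fakes a translation
    return {**data, "translated_text": data["text"].upper()}

# Injecting the filter is just adding one more argument to compose()
workflow = compose(profanity_filter, translate)
result = workflow({"text": "I love using Featherless"})
```

The real workflow keeps the same shape: swap each stand-in for a `RunnableLambda` that calls the model of your choice.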

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Throughout this guide, we've hopefully equipped you to run inference on any Hugging Face model, from prototype to production, without worrying about the complexity of infrastructure management or the cost of running GPUs directly. This, however, is just the beginning. As you've seen with the LangChain implementation, the ability to seamlessly access any Hugging Face model opens up countless possibilities for your applications - whether you're building a specialized chatbot, implementing domain-specific analysis, or creating the next Duolingo. We'll be coming back with more advanced examples in future blog posts, so make sure to keep an eye out.&lt;/p&gt;

&lt;p&gt;Ready to start building? Head over to &lt;a href="https://featherless.ai/" rel="noopener noreferrer"&gt;https://featherless.ai/&lt;/a&gt; to create an account. Our growing community of developers, enthusiasts, and AI practitioners is here to help you get the most out of Featherless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://discord.com/invite/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community to connect with other users&lt;/li&gt;
&lt;li&gt;Follow us on X (&lt;a href="https://x.com/FeatherlessAI" rel="noopener noreferrer"&gt;@FeatherlessAI&lt;/a&gt;) for the latest updates&lt;/li&gt;
&lt;li&gt;Star our &lt;a href="https://github.com/featherlessai/featherless-cookbook" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository to stay updated on new examples and tutorials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll be looking forward to seeing what you all create and share with the community.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>Running Open Source LLMs in Popular AI Clients with Featherless: A Complete Guide</title>
      <dc:creator>Darin Verheijke</dc:creator>
      <pubDate>Fri, 10 Jan 2025 15:54:24 +0000</pubDate>
      <link>https://dev.to/featherlessai/running-open-source-llms-in-popular-ai-clients-with-featherless-a-complete-guide-2deh</link>
      <guid>https://dev.to/featherlessai/running-open-source-llms-in-popular-ai-clients-with-featherless-a-complete-guide-2deh</guid>
      <description>&lt;h2&gt;
  
  
  Table of contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Getting started with Featherless&lt;/li&gt;
&lt;li&gt;
Role-playing client integration

&lt;ol&gt;
&lt;li&gt;SillyTavern&lt;/li&gt;
&lt;li&gt;WyvernChat&lt;/li&gt;
&lt;li&gt;VenusAI&lt;/li&gt;
&lt;li&gt;JanitorAI&lt;/li&gt;
&lt;li&gt;anime.gf&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;

Frontend chat client integration

&lt;ol&gt;
&lt;li&gt;Featherless Phoenix&lt;/li&gt;
&lt;li&gt;Typing Mind&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Security best practices&lt;/li&gt;

&lt;li&gt;Community &amp;amp; Support&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a id="introduction"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Remember the excitement of discovering powerful open-source language models like the latest Qwen or Llama 3 models, only to face the daunting challenge of actually running them? &lt;/p&gt;

&lt;p&gt;You’re not alone. While these models offer amazing capabilities, deploying them efficiently has been a significant hurdle for developers and enthusiasts alike. Today, we’re changing that with Featherless. In this guide, you’ll learn how to integrate the latest cutting-edge AI models into your favorite chat clients - whether you’re building an AI agent or a coding assistant, roleplaying, or setting up a general chat interface. &lt;/p&gt;

&lt;p&gt;I’ll walk you through practical, step-by-step instructions for popular platforms like SillyTavern, WyvernChat, and Typing Mind, showing you how to leverage our serverless infrastructure. No GPU or server management required, no complex deployments - just powerful language models ready to use in the tools you already know and love.&lt;/p&gt;

&lt;p&gt;&lt;a id="featherless"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started with Featherless
&lt;/h2&gt;

&lt;p&gt;Featherless is your gateway to using powerful open-source language models in your favorite applications. As a serverless inference platform, we make it simple to access the latest models without managing any infrastructure. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI compatibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our API is fully OpenAI-compatible. This means any application that works with OpenAI can be easily reconfigured to use Featherless. &lt;/p&gt;
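Since any OpenAI-style client can be pointed at the Featherless endpoint, reconfiguration amounts to swapping the base URL and key. A minimal sketch using only Python's standard library (the model name is an example, and nothing is sent until you call `urlopen` with a real key):

```python
import json
import urllib.request

BASE_URL = "https://api.featherless.ai/v1"
API_KEY = "YOUR_FEATHERLESS_API_KEY"  # replace with a key from your dashboard

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat completion request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Hello from Featherless!"}],
)
# urllib.request.urlopen(req) would send it once a real key is in place
```

The official `openai` Python package works the same way: pass `base_url="https://api.featherless.ai/v1"` and your Featherless key when constructing the client.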

&lt;p&gt;&lt;strong&gt;Quick Setup API&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up for a &lt;a href="https://featherless.ai/register" rel="noopener noreferrer"&gt;Featherless account&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Subscribe to a plan that fits your needs&lt;/li&gt;
&lt;li&gt;Navigate to the &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API Keys&lt;/a&gt; section in your dashboard&lt;/li&gt;
&lt;li&gt;Create a new API Key&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it! You’re ready to use this API Key in your preferred chat client.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose your plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Select the &lt;a href="https://featherless.ai/#pricing" rel="noopener noreferrer"&gt;plan&lt;/a&gt; that fits your needs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🌱 Featherless Basic ($10/month)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All models up to 15B parameters&lt;/li&gt;
&lt;li&gt;2 concurrent requests&lt;/li&gt;
&lt;li&gt;Unlimited monthly usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;⭐ Featherless Premium ($25/month)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All models up to 72B parameters&lt;/li&gt;
&lt;li&gt;Everything in Basic&lt;/li&gt;
&lt;li&gt;Perfect for power users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🚀 Featherless Scale ($75 per unit/month)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in Premium&lt;/li&gt;
&lt;li&gt;2x Premium or 6x Basic model concurrency per unit&lt;/li&gt;
&lt;li&gt;Host private models from Hugging Face&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🔒 Privacy First: We never log chats, prompts, or completions. Your conversations stay private.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Ready to begin?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Head to the specific integration guide for your preferred client, or join our &lt;a href="https://discord.gg/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; for support.&lt;/p&gt;

&lt;p&gt;&lt;a id="roleplay"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Role-playing client integration
&lt;/h2&gt;

&lt;p&gt;Having spent countless hours lost in AI roleplay conversations, let me tell you - there’s nothing quite like that moment when your character truly comes alive, whether it’s the witty jokes that feel spontaneous or those surprisingly deep exchanges that make you forget you’re talking to an AI.&lt;/p&gt;

&lt;p&gt;From creating complex characters with deep lore to vibrant anime personalities, what I’ve learned is that finding the right model isn’t just a technical choice - it’s about finding the perfect way to show off the identity and personality behind your character. That’s why I’m so excited about our growing catalog of open-source models: each one brings something special and different to your character interactions and creative writing, and I’ve seen incredible roleplay scenes emerge from unexpected model choices. &lt;/p&gt;

&lt;p&gt;Let me now walk you through integrating Featherless with some of your favorite roleplay clients.&lt;/p&gt;

&lt;p&gt;&lt;a id="sillytavern"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SillyTavern
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://sillytavern.app/" rel="noopener noreferrer"&gt;SillyTavern&lt;/a&gt; it’s pretty easy to create a connection to Featherless. Simply click on the plug icon at the top and make the following selections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;API: Chat Completion&lt;/li&gt;
&lt;li&gt;Chat Completion source: Custom (OpenAI-compatible)&lt;/li&gt;
&lt;li&gt;Custom Endpoint: &lt;a href="https://api.featherless.ai/v1" rel="noopener noreferrer"&gt;https://api.featherless.ai/v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Custom API Key: Your Featherless &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Enter a Model ID: A model chosen from our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt; 
(e.g. meta-llama/Meta-Llama-3.1-8B-Instruct)&lt;/li&gt;
&lt;li&gt;Connect!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6g17sjawoh3su5ulmlx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6g17sjawoh3su5ulmlx4.png" alt="Instructions SillyTavern" width="800" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once connected, you’ll see a green status indicator, after which you can send a message to your characters to make sure everything is working properly. You should receive a response within seconds.&lt;/p&gt;

&lt;p&gt;&lt;a id="wyvern"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  WyvernChat
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://app.wyvern.chat/" rel="noopener noreferrer"&gt;WyvernChat&lt;/a&gt; offers native Featherless integration right out of the box. The built-in support means you’ll spend less time configuring and more time chatting. WyvernChat provides a streamlined setup process. Whether you’re new to AI chat platforms or migrating from another client, you’ll appreciate how seamlessly Featherless meshes with WyvernChat’s clean interface. All you need is your Featherless API Key and  following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Head on over to &lt;a href="https://app.wyvern.chat/" rel="noopener noreferrer"&gt;https://app.wyvern.chat/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;At the bottom right of the page click on the plug icon&lt;/li&gt;
&lt;li&gt;Click on ‘+ Add connection’ &lt;/li&gt;
&lt;li&gt;Select ‘Featherless’ from the Type dropdown&lt;/li&gt;
&lt;li&gt;Password (API Key)*: Your Featherless &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Select a model from the list at the bottom&lt;/li&gt;
&lt;li&gt;Scroll down and press ‘Create’&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xxkyid75rvhhkopdsit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xxkyid75rvhhkopdsit.png" alt="WyvernChat" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once your connection is created, head over to any of your character chats; then, at the bottom right under ‘Settings’, deselect ‘Free Queue’ (this ensures you’re using your Featherless connection). Next, simply choose your model from the Connection dropdown. Switching between models is as simple as creating an extra connection. You’ll know everything is working when your character responds using your chosen model - typically within a few seconds.&lt;/p&gt;

&lt;p&gt;&lt;a id="venus"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Venus AI
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://venuschat.ai/#" rel="noopener noreferrer"&gt;VenusChat&lt;/a&gt; supports OpenAI-compatible APIs out of the box, making our connection process quick and straightforward. &lt;/p&gt;

&lt;p&gt;To connect your VenusChat with Featherless, head over to any character chat:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on the Gear icon on the top right&lt;/li&gt;
&lt;li&gt;Go to “AI Model Settings”&lt;/li&gt;
&lt;li&gt;Choose ‘Open AI’ under ‘Select an AI Model’&lt;/li&gt;
&lt;li&gt;Select “Reverse Proxy”&lt;/li&gt;
&lt;li&gt;Pick a model from our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt; and copy the complete URL&lt;/li&gt;
&lt;li&gt;Paste the URL under ‘Open API Reverse Proxy’ 
(e.g. &lt;a href="https://featherless.ai/models/mistralai/Mistral-Nemo-Instruct-2407" rel="noopener noreferrer"&gt;https://featherless.ai/models/mistralai/Mistral-Nemo-Instruct-2407&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Enter your Featherless &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API Key&lt;/a&gt; under ‘Reverse Proxy Key’&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnsops0muj66tdtq3mtd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnsops0muj66tdtq3mtd.png" alt="venus" width="800" height="1053"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a id="janitor"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  JanitorAI
&lt;/h3&gt;

&lt;p&gt;Head over to chat on any character on &lt;a href="https://janitorai.com/" rel="noopener noreferrer"&gt;JanitorAI&lt;/a&gt; and let’s get you set up with just a few steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the dropdown in the top-right corner and select ‘API Settings’&lt;/li&gt;
&lt;li&gt;Go to ‘Proxy’&lt;/li&gt;
&lt;li&gt;Pick a model from our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Under ‘Model’ choose ‘Custom’ and enter the model’s name 
(e.g. meta-llama/Meta-Llama-3.1-8B-Instruct)&lt;/li&gt;
&lt;li&gt;Other API/proxy URL: &lt;a href="https://api.featherless.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.featherless.ai/v1/chat/completions&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;API Key: Your Featherless &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API Key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check API Key/Model to confirm everything is working&lt;/li&gt;
&lt;li&gt;Scroll down and Save Settings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd0x87iptgvhowe2jdsm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd0x87iptgvhowe2jdsm.png" alt="JanitorAI API instructions" width="379" height="859"&gt;&lt;/a&gt;&lt;br&gt;
Once you’ve saved your settings, congratulations - you can now chat with your character using any of our compatible models, and switching between them is as simple as repeating steps 3-4. Feel free to experiment with different models to find the perfect fit for each character.&lt;/p&gt;

&lt;p&gt;&lt;a id="anime"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  anime.gf
&lt;/h3&gt;

&lt;p&gt;Bringing your favorite &lt;a href="https://www.anime.gf/" rel="noopener noreferrer"&gt;anime.gf&lt;/a&gt; characters to life with Featherless is straightforward. Enhance your interactions by making use of our diverse model catalog. Let’s get you set up in just a few simple steps.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on the cog in the top right of your screen&lt;/li&gt;
&lt;li&gt;Head on over to ‘A.I. Settings’ and click on ‘provider’&lt;/li&gt;
&lt;li&gt;Under API Provider select ‘Proxy’&lt;/li&gt;
&lt;li&gt;API Key: Your Featherless &lt;a href="https://featherless.ai/account/api-keys" rel="noopener noreferrer"&gt;API Key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Base URL: &lt;a href="https://api.featherless.ai/v1/" rel="noopener noreferrer"&gt;https://api.featherless.ai/v1/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model: Choose a model from our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt; (e.g. anthracite-org/magnum-v4-72b)&lt;/li&gt;
&lt;li&gt;Save&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhun67kf2ij6zr78k3fb4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhun67kf2ij6zr78k3fb4.png" alt="animepart1 API instructions" width="800" height="457"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonu8hkqm1csvmmojsgpx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonu8hkqm1csvmmojsgpx.png" alt="animepart2 API instructions" width="800" height="1339"&gt;&lt;/a&gt;&lt;br&gt;
Great! Your &lt;a href="http://anime.gf" rel="noopener noreferrer"&gt;anime.gf&lt;/a&gt; character is now powered by Featherless. Try sending a message to see the integration in action. Feel free to experiment with different models from our catalog to find the perfect match for each character - switching models is as easy as changing the model in your settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With your favorite roleplaying client connected to Featherless you’re ready to experiment with a variety of models from our catalog, join our &lt;a href="https://discord.gg/bbvhdWmPHa" rel="noopener noreferrer"&gt;discord&lt;/a&gt; to share your experiences and get model recommendations!&lt;/p&gt;

&lt;p&gt;&lt;a id="chat"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frontend chat client integration
&lt;/h2&gt;

&lt;p&gt;I’ve always found something satisfying about a clean chat interface - it’s like having a dedicated thinking space where you can have a focused conversation. Whether I’m exploring new concepts, deep in a coding session, or just brainstorming ideas, platforms like Typing Mind and our own Phoenix have become essential companions. Let me show you how to set up these tools, which give you access to our entire model catalog - I think you’ll find them as invaluable as I do.&lt;/p&gt;

&lt;p&gt;&lt;a id="phoenix"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Featherless Phoenix
&lt;/h3&gt;

&lt;p&gt;If you’re looking for the most straightforward way to start chatting with our models, look no further than &lt;a href="https://phoenix.featherless.ai/" rel="noopener noreferrer"&gt;Featherless Phoenix&lt;/a&gt;. As our native chat interface, it requires zero additional setup - simply log in with your Featherless account, choose your preferred model from the menu in the top-left corner, and start chatting. It’s the perfect starting point for exploring our model catalog and finding the right language model for your needs.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftizud67ghj1r50895dql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftizud67ghj1r50895dql.png" alt="Phoenix Featherless" width="800" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a id="typing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Typing Mind
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.typingmind.com/" rel="noopener noreferrer"&gt;Typing mind&lt;/a&gt;, as a Chat UI frontend allows you to use AI models from our whole catalog. Featherless integration is as easy as going to ‘Models’, then clicking on ‘+ Add Custom Model’ on the top right of your screen followed by these quick steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose ‘OpenAI Compatible API’ as the API Type&lt;/li&gt;
&lt;li&gt;Endpoint: &lt;a href="https://api.featherless.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.featherless.ai/v1/chat/completions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model ID: Pick a model from our &lt;a href="https://featherless.ai/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt; (e.g. mistralai/Mistral-Nemo-Instruct-2407)&lt;/li&gt;
&lt;li&gt;Choose a context length (which you can find on the model’s page)&lt;/li&gt;
&lt;li&gt;Add a custom header with the key ‘&lt;strong&gt;authorization&lt;/strong&gt;’ and the value &lt;strong&gt;&lt;code&gt;Bearer &amp;lt;YOUR_FEATHERLESS_API_KEY&amp;gt;&lt;/code&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add a custom body param with the number param ‘&lt;strong&gt;max_tokens&lt;/strong&gt;’ set to any amount up to your context length. This caps the length of each response.&lt;/li&gt;
&lt;li&gt;Lastly, press &lt;strong&gt;Test&lt;/strong&gt; and, if everything went well, you can now '&lt;strong&gt;Add Model&lt;/strong&gt;'
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26d3ewp80lybmlnpo5md.png" alt="Typing Mind API instructions" width="566" height="889"&gt;
That’s it - your Typing Mind interface is now connected to Featherless! Try chatting to verify that everything is working properly. Response times will vary by model, but you should typically see results within a few seconds. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a id="security"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security best practices
&lt;/h2&gt;

&lt;p&gt;Your Featherless API Key is your secure gateway to our services and all the models you love - protecting it is crucial for your account’s security and data privacy. Never share your keys publicly; we also recommend creating separate API keys for different applications. If you suspect your key might have been exposed, rotate it immediately through your Featherless dashboard.&lt;/p&gt;
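One simple way to keep a key out of your source files (and out of version control) is to read it from an environment variable. The variable name `FEATHERLESS_API_KEY` here is just a convention, not something the platform requires:

```python
import os

def load_api_key(var: str = "FEATHERLESS_API_KEY") -> str:
    """Fetch the API key from the environment rather than hardcoding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running this application")
    return key
```

Pair this with a per-application variable name (one key for your chat client, another for scripts) so a leaked key can be rotated without touching everything at once.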

&lt;p&gt;&lt;a id="community"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Community &amp;amp; Support
&lt;/h2&gt;

&lt;p&gt;Our growing community of developers, enthusiasts, and AI practitioners is here to help you get the most out of Featherless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://discord.gg/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community to connect with other users&lt;/li&gt;
&lt;li&gt;Share your experiences with us!&lt;/li&gt;
&lt;li&gt;Get model recommendations for your specific use case&lt;/li&gt;
&lt;li&gt;Stay updated on the latest models that get added&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The world of AI is evolving rapidly, and Featherless is committed to evolving with it. As new models emerge and capabilities expand, we’re working to ensure you have seamless access to all the latest advancements. Our mission to make all AI models available for serverless inference remains unchanged. &lt;/p&gt;

&lt;p&gt;We’re excited to see what you all create with Featherless, whether you’re building engaging characters for roleplay or exploring new applications we haven’t even imagined yet. Ready to get started? Head over to &lt;a href="https://featherless.ai/register" rel="noopener noreferrer"&gt;https://featherless.ai/&lt;/a&gt; to create an account, or join our &lt;a href="https://discord.gg/bbvhdWmPHa" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community to connect with other enthusiasts.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
