<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Farrruh</title>
    <description>The latest articles on DEV Community by Farrruh (@farrruh).</description>
    <link>https://dev.to/farrruh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1302347%2F6a285398-6119-4355-9988-faf614461870.png</url>
      <title>DEV Community: Farrruh</title>
      <link>https://dev.to/farrruh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/farrruh"/>
    <language>en</language>
    <item>
      <title>Mastering Text Embedding and Reranker with Qwen3</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Fri, 22 Aug 2025 06:51:37 +0000</pubDate>
      <link>https://dev.to/farrruh/mastering-text-embedding-and-reranker-with-qwen3-5455</link>
      <guid>https://dev.to/farrruh/mastering-text-embedding-and-reranker-with-qwen3-5455</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Follow me on &lt;a href="https://community.alibabacloud.com/users/5611950958141783" rel="noopener noreferrer"&gt;Alibaba Cloud Community&lt;/a&gt; for cutting-edge tech insights!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32ow12efa66vwd4xgmyq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32ow12efa66vwd4xgmyq.png" alt="1" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Created by Wan&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Part 1: The Triple Threat: Embedding, Reranking, and Invoking
&lt;/h2&gt;
&lt;h2&gt;
  
  
  1.1 Introduction to Embedding, Reranking, and Qwen3 Models
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Introduction to Embedding and Reranking
&lt;/h3&gt;

&lt;p&gt;Text embedding and reranking are foundational technologies in natural language processing (NLP) that power modern search engines, recommendation systems, retrieval-augmented generation (RAG) pipelines, and even agentic AI systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbing8pd3wr7wkkrou539.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbing8pd3wr7wkkrou539.png" alt="2" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text Embedding&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Text embeddings convert unstructured text into dense numerical vectors (e.g., arrays of numbers) that capture semantic meanings. These vectors enable machines to measure the similarity between texts, supporting tasks such as semantic search, clustering, and classification. For example, a query like &lt;em&gt;"best LLM for the finance industry"&lt;/em&gt; can be matched to LLM (Large Language Model) descriptions or articles that align with its intent.   &lt;/p&gt;
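Under the hood, similarity between two embedding vectors is usually measured with cosine similarity. The sketch below uses toy 4-dimensional vectors purely for illustration (real Qwen3 embeddings have 1024 to 4096 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real model outputs
query_vec = [0.2, 0.7, 0.1, 0.5]
doc_vec_finance = [0.25, 0.65, 0.05, 0.55]  # semantically close document
doc_vec_cooking = [0.9, 0.05, 0.8, 0.1]     # unrelated document

print(cosine_similarity(query_vec, doc_vec_finance))  # higher score
print(cosine_similarity(query_vec, doc_vec_cooking))  # lower score
```

A semantic search system embeds the query once, then ranks documents by this score.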

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reranking&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reranking refines the results of an initial retrieval step by reordering candidates based on finer-grained relevance scores. While embedding models retrieve broad matches, rerankers prioritize the most contextually relevant results. For instance, a search engine might first retrieve 100 documents using embeddings, then apply a reranker to pick the top 10 most relevant ones.&lt;/p&gt;
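The two-stage retrieve-then-rerank flow described above can be sketched as follows. Both scoring functions here are deliberately naive stand-ins (keyword overlap for retrieval, term density for reranking); a production pipeline would call an embedding model for stage 1 and a reranker model for stage 2:

```python
def retrieve(query, corpus, top_k=100):
    """Stage 1: broad candidate retrieval.

    A real system would embed query and corpus and rank by vector
    similarity; keyword overlap stands in here for illustration.
    """
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def rerank(query, candidates, top_k=10):
    """Stage 2: reorder candidates with a finer-grained relevance score.

    A real reranker scores each (query, document) pair jointly; a simple
    term-density score stands in here.
    """
    q_terms = set(query.lower().split())
    def score(doc):
        return len(q_terms & set(doc.lower().split())) / len(doc.split())
    return sorted(candidates, key=score, reverse=True)[:top_k]

corpus = [
    "Best LLM choices for the finance industry in 2025",
    "LLM finance tutorials and industry news",
    "Gardening tips for beginners",
]
candidates = retrieve("LLM finance industry", corpus)      # broad matches
top = rerank("LLM finance industry", candidates, top_k=2)  # most relevant first
print(top)
```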

&lt;p&gt;&lt;strong&gt;Key Applications&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Web search and recommendation systems&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Legal document analysis and compliance monitoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Healthcare research (e.g., finding clinical trials for a drug)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Financial risk assessment (e.g., analyzing loan applications)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Qwen3 Embedding and Reranking Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05rfth3bzwojirtmsljr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05rfth3bzwojirtmsljr.png" alt="3" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Qwen3 Embedding series, built on the &lt;strong&gt;Qwen3&lt;/strong&gt; models, represents a leap forward in text representation learning. It includes &lt;strong&gt;embedding models&lt;/strong&gt; (for vectorizing text) and &lt;strong&gt;reranking models&lt;/strong&gt; (for refining search results), with parameter sizes of 0.6B, 4B, and 8B.  &lt;/p&gt;
&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Exceptional Versatility&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;State-of-the-art results on benchmarks like MTEB (Massive Text Embedding Benchmark) and MTEB-Code.   &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Excelling in cross-lingual and code retrieval tasks (e.g., searching GitHub repositories for Python functions).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Comprehensive Flexibility&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Sizes&lt;/strong&gt;: 0.6B (lightweight), 4B (balanced), and 8B (high-performance).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Customizable Dimensions&lt;/strong&gt;: Variable vector lengths (e.g., 1024D for Qwen3-Embedding-0.6B, 4096D for Qwen3-Embedding-8B).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instruction Awareness&lt;/strong&gt;:  Task-specific instructions (e.g., &lt;em&gt;"Given the following question, facts, and contexts, retrieve the correct answer."&lt;/em&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
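Instruction awareness means the input is prefixed with a task description before embedding. The template below follows the "Instruct / Query" pattern published in the Qwen3-Embedding model card; treat the exact wording as an assumption and verify it against the official documentation:

```python
def format_instructed_query(task: str, query: str) -> str:
    """Prefix a query with a task instruction before embedding.

    The "Instruct: ... / Query: ..." layout follows the pattern shown in
    the Qwen3-Embedding model card; documents are typically embedded
    without an instruction prefix.
    """
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
text = format_instructed_query(task, "best LLM for the finance industry")
print(text)
```

Tailoring the instruction to the task (retrieval, classification, clustering) is what lets one model serve many domains.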

&lt;p&gt;&lt;strong&gt;3. Multilingual Mastery&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Supporting 100+ languages, including programming languages (Python, Java, C++, etc.).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handling cross-lingual tasks (e.g., querying in English and retrieving French documents).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Evaluation Results
&lt;/h4&gt;

&lt;p&gt;Evaluation results for embedding models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86plce50xbi8479ibafm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86plce50xbi8479ibafm.png" alt="4" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Evaluation results for reranking models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;MTEB-R&lt;/th&gt;
&lt;th&gt;CMTEB-R&lt;/th&gt;
&lt;th&gt;MMTEB-R&lt;/th&gt;
&lt;th&gt;MLDR&lt;/th&gt;
&lt;th&gt;MTEB-Code&lt;/th&gt;
&lt;th&gt;FollowIR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-Embedding-0.6B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;61.82&lt;/td&gt;
&lt;td&gt;71.02&lt;/td&gt;
&lt;td&gt;64.64&lt;/td&gt;
&lt;td&gt;50.26&lt;/td&gt;
&lt;td&gt;75.41&lt;/td&gt;
&lt;td&gt;5.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jina-multilingual-reranker-v2-base&lt;/td&gt;
&lt;td&gt;0.3B&lt;/td&gt;
&lt;td&gt;58.22&lt;/td&gt;
&lt;td&gt;63.37&lt;/td&gt;
&lt;td&gt;63.73&lt;/td&gt;
&lt;td&gt;39.66&lt;/td&gt;
&lt;td&gt;58.98&lt;/td&gt;
&lt;td&gt;-0.68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gte-multilingual-reranker-base&lt;/td&gt;
&lt;td&gt;0.3B&lt;/td&gt;
&lt;td&gt;59.51&lt;/td&gt;
&lt;td&gt;74.08&lt;/td&gt;
&lt;td&gt;59.44&lt;/td&gt;
&lt;td&gt;66.33&lt;/td&gt;
&lt;td&gt;54.18&lt;/td&gt;
&lt;td&gt;-1.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BGE-reranker-v2-m3&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;57.03&lt;/td&gt;
&lt;td&gt;72.16&lt;/td&gt;
&lt;td&gt;58.36&lt;/td&gt;
&lt;td&gt;59.51&lt;/td&gt;
&lt;td&gt;41.38&lt;/td&gt;
&lt;td&gt;-0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-Reranker-0.6B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;65.80&lt;/td&gt;
&lt;td&gt;71.31&lt;/td&gt;
&lt;td&gt;66.36&lt;/td&gt;
&lt;td&gt;67.28&lt;/td&gt;
&lt;td&gt;73.42&lt;/td&gt;
&lt;td&gt;5.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-Reranker-4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69.76&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;75.94&lt;/td&gt;
&lt;td&gt;72.74&lt;/td&gt;
&lt;td&gt;69.97&lt;/td&gt;
&lt;td&gt;81.20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14.84&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-Reranker-8B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;69.02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;77.45&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.94&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.19&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81.22&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.05&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Advantages&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qwen3-Embedding-8B scores &lt;strong&gt;70.58 on MTEB Multilingual&lt;/strong&gt;, outperforming Google’s Gemini-Embedding.
&lt;/li&gt;
&lt;li&gt;Qwen3-Reranker-8B improves ranking accuracy by &lt;strong&gt;3.0 points&lt;/strong&gt; over smaller rerankers.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Efficiency&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller models (such as 0.6B) strike a balance between speed and accuracy in resource-constrained environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Customization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users can customize instruction templates for domain-specific tasks (e.g., legal contract analysis).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Disadvantages&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resource Requirements&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger models (such as 8B) demand significant GPU memory (e.g., 8x NVIDIA A100s for &lt;strong&gt;training&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-performance rerankers may cause delays in real-time applications (e.g., live chatbots).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Technical Specifications&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Model Overview:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Type&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Layers&lt;/th&gt;
&lt;th&gt;Sequence Length&lt;/th&gt;
&lt;th&gt;Embedding Dimension&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;MRL Support&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Instruction Aware&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text Embedding&lt;/td&gt;
&lt;td&gt;Qwen3-Embedding-0.6B&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Embedding-4B&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;2560&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Embedding-8B&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;4096&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text Reranking&lt;/td&gt;
&lt;td&gt;Qwen3-Reranker-0.6B&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Reranker-4B&lt;/td&gt;
&lt;td&gt;4B&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-Reranker-8B&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: “&lt;strong&gt;MRL Support&lt;/strong&gt;” indicates whether the embedding model supports custom dimensions for the final embedding. “&lt;strong&gt;Instruction Aware&lt;/strong&gt;” notes whether the embedding or reranking model supports customizing the input instruction for different tasks.&lt;/p&gt;
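MRL (Matryoshka Representation Learning) support means a full embedding can be shortened to a smaller custom dimension. The usual recipe, sketched below on a toy vector, is to keep the leading components and re-normalize; confirm the exact procedure against the model documentation:

```python
import math

def shorten_embedding(vec, dim):
    """MRL-style dimension reduction: keep the first dim components,
    then L2-normalize so cosine similarity stays well-behaved."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, -0.25, 0.8, 0.1, 0.05, -0.4, 0.3, 0.2]  # pretend full embedding
short = shorten_embedding(full, 4)                    # custom 4-D output
print(len(short))
```

Smaller dimensions cut vector-database storage and search cost at a modest accuracy trade-off.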
&lt;h2&gt;
  
  
  1.2. Deploying and Invoking Embedding Models on Alibaba Cloud
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Deploying Qwen3 on PAI-EAS and Using OpenAI-Compatible Libraries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Alibaba Cloud provides two primary methods to invoke embedding models:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Studio&lt;/strong&gt;: A no-code platform offering ready-to-use models like &lt;strong&gt;text-embedding-v3&lt;/strong&gt; (ideal for quick deployment).  Visit &lt;a href="https://www.alibabacloud.com/en/product/modelstudio" rel="noopener noreferrer"&gt;Alibaba Cloud Model Studio&lt;/a&gt; for more details.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PAI-EAS&lt;/strong&gt;: A managed service for deploying custom models like &lt;strong&gt;Qwen3-Embedding-8B&lt;/strong&gt; (for advanced customization). Visit &lt;a href="https://www.alibabacloud.com/en/product/machine-learning" rel="noopener noreferrer"&gt;PAI – Platform for AI&lt;/a&gt; for more details.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Method 1: Using Model Studio for Text Embedding
&lt;/h3&gt;

&lt;p&gt;Alibaba Cloud’s &lt;strong&gt;Model Studio&lt;/strong&gt; simplifies access to pre-trained open-source and proprietary models, including &lt;strong&gt;text-embedding-v3&lt;/strong&gt;, without requiring deployment or infrastructure management.  &lt;/p&gt;
&lt;h4&gt;
  
  
  Step-by-Step Guide on Invoking text-embedding-v3
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Access Model Studio&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit &lt;a href="https://bailian.console.alibabacloud.com/" rel="noopener noreferrer"&gt;Alibaba Cloud Model Studio Console&lt;/a&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click the "Docs" tab in the top navigation bar (highlighted in red in the image).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click "Embedding" (highlighted in red in the image). This will display the embedding-related documentation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm141cz6k3r6js9kxyap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm141cz6k3r6js9kxyap.png" alt="5" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Invoke the Model via OpenAI-Compatible API&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Once selected, navigate to the &lt;strong&gt;"API Details"&lt;/strong&gt; tab to obtain the endpoint and authentication credentials.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Example request format for generating embeddings:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DASHSCOPE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your API Key if you have not configured environment variables
&lt;/span&gt;    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# base_url for Model Studio
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;The quality of the clothes is excellent, very beautiful, worth the wait, I like it and will buy here again&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encoding_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump_json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Benefits of Model Studio&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Deployment Required&lt;/strong&gt;: Use pre-trained models instantly.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Pay-as-you-go pricing with automatic scaling.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;: Ideal for developers unfamiliar with setting up infrastructures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Method 2: Deploying Qwen3 Embedding Models on PAI-EAS&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For advanced use cases requiring customization (e.g., domain-specific fine-tuning), deploy Qwen3-Embedding-8B or other Qwen3 variants on &lt;strong&gt;PAI-EAS&lt;/strong&gt; (Elastic Algorithm Service). Below is a step-by-step guide based on the latest PAI tools and interfaces:  &lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-by-Step Deployment on QuickStart&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;1.  Sign in to the &lt;a href="https://pai.console.aliyun.com/" rel="noopener noreferrer"&gt;PAI console&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mv28surq9w5t6kc7tt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mv28surq9w5t6kc7tt0.png" alt="6" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2.  Select your &lt;strong&gt;workspace&lt;/strong&gt;, choose &lt;em&gt;QuickStart &amp;gt; Model Gallery &amp;gt; NLP &amp;gt; embedding&lt;/em&gt;, and find or search for &lt;strong&gt;Qwen3-Embedding&lt;/strong&gt; models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxi6wsdsq5xhw0vc9qkt8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxi6wsdsq5xhw0vc9qkt8.png" alt="7" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3.  Click &lt;strong&gt;Deploy&lt;/strong&gt; next to the desired model (e.g., Qwen3-Embedding-8B).  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxpcgf4ay4v8nnksi7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxpcgf4ay4v8nnksi7g.png" alt="8" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4.  Configure instance type, auto-scaling, and other parameters.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv01pagy493wvwamvofw2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv01pagy493wvwamvofw2.png" alt="9" width="800" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5.  To access the recently deployed model, navigate to the Model Deployment section and select Elastic Algorithm Service (EAS). Once the "Service Status" is "Running", you will be able to start using the model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqh91ep74s030dy3u6inm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqh91ep74s030dy3u6inm.png" alt="10" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6.  Click Invocation Method and copy the generated API endpoint for integration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uf8yzjj8eroukjkoqbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uf8yzjj8eroukjkoqbs.png" alt="11" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This streamlined workflow ensures rapid deployment while maintaining flexibility for advanced customization.&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Send Requests via OpenAI-Compatible API&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;PAI-EAS natively supports OpenAI’s API format, enabling seamless integration with tools like &lt;code&gt;langchain&lt;/code&gt; or &lt;code&gt;openai&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;  

&lt;span class="c1"&gt;# Initialize client with PAI-EAS endpoint  
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&amp;lt;pai-eas-endpoint&amp;gt;/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-pai-api-key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="p"&gt;)&lt;/span&gt;  

&lt;span class="c1"&gt;# Generate embeddings  
&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How should I choose best LLM for the finance industry?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3-embedding-8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Outputs a 4096D vector  
&lt;/span&gt;
&lt;span class="c1"&gt;# Rerank search results  
&lt;/span&gt;&lt;span class="n"&gt;rerank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Renewable energy solutions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;  
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Solar power adoption surged by 30% in 2024.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wind energy faces challenges in urban areas.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hydrogen fuel cells offer zero-emission transportation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
    &lt;span class="p"&gt;],&lt;/span&gt;  
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3-reranker-4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns relevance scores  
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Direct API Calls (Optional)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For low-level control, send raw HTTP requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;  

&lt;span class="c1"&gt;# Example request  
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;pai-eas-endpoint&amp;gt;/v1/embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &amp;lt;your-api-key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  
&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quantum computing will revolutionize cryptography.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3-embedding-8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="p"&gt;}&lt;/span&gt;  
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Key Benefits of PAI-EAS&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Domain Adaptation&lt;/strong&gt;: Fine-tuned Qwen3 models for niche tasks (e.g., financial risk analysis).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Auto-scaling for traffic spikes without manual intervention.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;:  Smaller models (e.g., Qwen3-Embedding-0.6B) for lightweight workloads.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified Ecosystem&lt;/strong&gt;: PAI’s Model Gallery, SDKs, and EAS for end-to-end MLOps.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
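&lt;p&gt;The endpoint called earlier returns the OpenAI-compatible response shape (&lt;code&gt;data[i].embedding&lt;/code&gt;), so the vectors can be compared directly. A minimal sketch of parsing and scoring two embeddings with cosine similarity; the response values below are made-up placeholders, not real model output:&lt;/p&gt;

```python
import math

# Hypothetical parsed JSON from the /v1/embeddings call shown above
# (OpenAI-compatible shape: {"data": [{"embedding": [...]}, ...]}).
response_json = {
    "data": [
        {"embedding": [0.12, -0.34, 0.56]},
        {"embedding": [0.10, -0.30, 0.60]},
    ]
}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = response_json["data"][0]["embedding"]
vec_b = response_json["data"][1]["embedding"]
print(round(cosine_similarity(vec_a, vec_b), 4))
```

&lt;p&gt;Real embeddings are L2-normalized high-dimensional vectors, but the scoring step is identical.&lt;/p&gt;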

&lt;h3&gt;
  
  
  How to Choose: Model Studio or PAI-EAS?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Model Studio&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;PAI-EAS&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quick prototyping&lt;/td&gt;
&lt;td&gt;✅ No-code, instant access&lt;/td&gt;
&lt;td&gt;❌ Requires deployment setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain-specific customization&lt;/td&gt;
&lt;td&gt;❌ Limited to pre-trained models&lt;/td&gt;
&lt;td&gt;✅ Supports fine-tuning and custom models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost efficiency&lt;/td&gt;
&lt;td&gt;✅ Pay-per-token pricing&lt;/td&gt;
&lt;td&gt;✅ Flexible GPU instance pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration with OpenAI SDK&lt;/td&gt;
&lt;td&gt;✅ OpenAI-compatible API support&lt;/td&gt;
&lt;td&gt;✅ OpenAI-compatible API support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
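&lt;p&gt;Because both backends expose the same OpenAI-compatible &lt;code&gt;/v1/embeddings&lt;/code&gt; contract, the same request shape works against either; only the base URL changes. A small builder sketch (the URL and key below are placeholders, not real values):&lt;/p&gt;

```python
def build_embedding_request(base_url, api_key, texts, model="qwen3-embedding-8b"):
    """Build an OpenAI-compatible /v1/embeddings request for either backend."""
    return {
        "url": f"{base_url.rstrip('/')}/v1/embeddings",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"model": model, "input": list(texts)},
    }

# The same call shape targets Model Studio or PAI-EAS (placeholder URL/key):
req = build_embedding_request(
    "<model-studio-or-pai-eas-endpoint>", "<your-api-key>",
    ["Quantum computing will revolutionize cryptography."],
)
print(req["url"])
```

&lt;p&gt;Swapping backends then becomes a configuration change rather than a code change.&lt;/p&gt;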

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Studio&lt;/strong&gt;: Explore the &lt;a href="https://www.alibabacloud.com/help/en/model-studio/embedding" rel="noopener noreferrer"&gt;text embedding model&lt;/a&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PAI – Platform for AI&lt;/strong&gt;: Learn more about QuickStart via the &lt;a href="https://www.alibabacloud.com/help/en/pai/user-guide/getting-started" rel="noopener noreferrer"&gt;PAI Documentation&lt;/a&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with Alibaba Cloud&lt;/strong&gt;: &lt;a href="https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1" rel="noopener noreferrer"&gt;Start your multimodal AI adventure here&lt;/a&gt;, or contact Alibaba Cloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Qwen3’s embedding and reranking models offer unparalleled flexibility and performance across industries. By leveraging Alibaba Cloud’s PAI ecosystem, you can deploy and fine-tune these models to address domain-specific challenges, from financial risk analysis to medical research. Future work includes expanding multimodal capabilities (e.g., cross-modal retrieval of images and text) and optimizing for edge devices.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: Fine-Tuning Qwen3 on PAI-Lingjun and Industry Use Cases
&lt;/h2&gt;

&lt;h2&gt;
  
  
  2.1. Fine-Tuning Qwen3 Embedding &amp;amp; Reranker Models: Unlocking Domain-Specific Mastery
&lt;/h2&gt;

&lt;p&gt;In the world of AI, one size does not fit all. While Qwen3’s embedding and reranking models are pre-trained to master general tasks—from multilingual text understanding to code retrieval—their true potential shines when tailored to domains like finance, healthcare, or law. This is where &lt;strong&gt;PAI-Lingjun&lt;/strong&gt;, Alibaba Cloud’s large-scale training platform, steps in as the catalyst for transformation.  &lt;/p&gt;

&lt;h3&gt;
  
  
  The Need for Customization
&lt;/h3&gt;

&lt;p&gt;Imagine a pharmaceutical researcher sifting through millions of clinical trials to find a match for a rare disease, or a lawyer scanning thousands of contracts for a specific clause. Generic models, while powerful, often miss the subtleties of domain-specific language—terms like “EBITDA,” “myocardial infarction,” or “force majeure” demand precision. Fine-tuning bridges this gap, adapting Qwen3’s architecture to grasp the nuances of specialized tasks, from drug discovery to financial risk assessment.  &lt;/p&gt;

&lt;h3&gt;
  
  
  PAI-Lingjun: The Engine Behind Precision
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz8u82eh44qn0cobd4e9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz8u82eh44qn0cobd4e9.png" alt="12" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PAI-Lingjun is a powerhouse designed to handle the computational demands of refining Qwen3 models. With support for distributed training across GPUs/TPUs, it enables organizations to scale from 0.6B to 8B parameter models, ensuring even the most complex domains can find their ideal balance between speed and accuracy.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Components of the Workflow&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data as the Foundation&lt;/strong&gt;: Domain-specific success begins with curated data. For finance, this might mean SEC filings; for healthcare, it’s clinical notes and research papers. The richer the dataset, the deeper the model’s understanding.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Synthetic Brilliance&lt;/strong&gt;: Qwen3’s text generation capabilities create synthetic data at scale—150 million text pairs across languages—filling gaps where labeled data falls short.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Staged Mastery&lt;/strong&gt;: Training unfolds in phases. First, weakly supervised pretraining builds a broad foundation; then, high-quality labeled data sharpens focus. Finally, model merging combines checkpoints, enhancing robustness like a symphony conductor harmonizing instruments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Art of Training: A Multi-Stage Symphony&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Weakly Supervised Pretraining&lt;/strong&gt;:  &lt;/p&gt;

&lt;p&gt;Here, Qwen3 learns the rhythm of a domain. By generating synthetic data—like crafting queries for loan applications or mimicking legal jargon—it builds a scaffold of understanding, even in low-resource scenarios.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Supervised Fine-Tuning&lt;/strong&gt;:  &lt;/p&gt;

&lt;p&gt;With curated data, the model hones its expertise. A bank might train on 12 million financial documents, teaching it to spot red flags in loan applications with surgical precision.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Model Merging&lt;/strong&gt;:  &lt;/p&gt;

&lt;p&gt;Like blending colors on a palette, spherical linear interpolation (SLERP) merges checkpoints, balancing generalization and specialization. The result? A model that thrives in both breadth and depth.&lt;/p&gt;
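&lt;p&gt;The SLERP merge described above can be sketched in a few lines. This is an illustrative implementation over flattened weight vectors, not the exact merging recipe used for Qwen3:&lt;/p&gt;

```python
import numpy as np

def slerp(w1: np.ndarray, w2: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two flattened checkpoint vectors."""
    w1_n = w1 / np.linalg.norm(w1)
    w2_n = w2 / np.linalg.norm(w2)
    # Angle between the two checkpoints, clipped for numerical safety.
    omega = np.arccos(np.clip(np.dot(w1_n, w2_n), -1.0, 1.0))
    if omega < 1e-8:  # Nearly identical checkpoints: fall back to linear mixing.
        return (1.0 - t) * w1 + t * w2
    return (np.sin((1.0 - t) * omega) * w1 + np.sin(t * omega) * w2) / np.sin(omega)

# Toy example: merge two 4-parameter "checkpoints" halfway between them.
ckpt_a = np.array([1.0, 0.0, 0.0, 0.0])
ckpt_b = np.array([0.0, 1.0, 0.0, 0.0])
merged = slerp(ckpt_a, ckpt_b, 0.5)
print(merged)
```

&lt;p&gt;Unlike plain averaging, SLERP follows the arc between the two checkpoints, which preserves vector magnitude better when the checkpoints point in different directions.&lt;/p&gt;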

&lt;h3&gt;
  
  
  Resource Realities: Powering the Transformation
&lt;/h3&gt;

&lt;p&gt;Fine-tuning Qwen3-Embedding-8B isn’t for the faint of heart. It demands &lt;strong&gt;8x NVIDIA A100 GPUs&lt;/strong&gt; and 3–5 days of training time. Yet, the payoff is monumental: retrieval accuracy jumps from 72% to 89%, and domain coverage soars to 93%. Smaller models, like Qwen3-Reranker-0.6B, offer agility for real-time scoring, proving that power isn’t always about size.  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Number of model parameters&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Full-parameter training resources&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Minimum inference resources&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Model parallelism for Megatron-based training&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;7 billion&lt;/td&gt;
&lt;td&gt;Eight gu7xf GPUs or eight gu7ef GPUs&lt;/td&gt;
&lt;td&gt;One NVIDIA V100 GPU (32 GB of memory) or one NVIDIA A10 GPU (24 GB of memory)&lt;/td&gt;
&lt;td&gt;TP1 and PP1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14 billion&lt;/td&gt;
&lt;td&gt;Eight gu7xf GPUs or eight gu7ef GPUs&lt;/td&gt;
&lt;td&gt;Two NVIDIA V100 GPUs (32 GB of memory) or two NVIDIA A10 GPUs (24 GB of memory)&lt;/td&gt;
&lt;td&gt;TP2 and PP1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;72 billion&lt;/td&gt;
&lt;td&gt;Four servers, each with eight gu7xf GPUs or eight gu7ef GPUs&lt;/td&gt;
&lt;td&gt;Six NVIDIA V100 GPUs (32 GB of memory) or two gu7xf GPUs&lt;/td&gt;
&lt;td&gt;TP8 and PP2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
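&lt;p&gt;The TP/PP figures in the table determine how many data-parallel replicas a given GPU pool supports, since total GPUs = TP × PP × DP. A quick sanity check for the 72B row (assuming every GPU participates in training):&lt;/p&gt;

```python
def data_parallel_degree(total_gpus: int, tp: int, pp: int) -> int:
    """Data-parallel replicas when a GPU pool is split into TP x PP groups."""
    assert total_gpus % (tp * pp) == 0, "GPU count must divide evenly into TP x PP groups"
    return total_gpus // (tp * pp)

# 72B row: four servers with eight GPUs each, TP8 and PP2.
gpus = 4 * 8
print(data_parallel_degree(gpus, tp=8, pp=2))
```

&lt;p&gt;So the 72B configuration yields two data-parallel replicas, while the 7B row (eight GPUs, TP1 and PP1) runs eight.&lt;/p&gt;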

&lt;h2&gt;
  
  
  2.2. Industry Use Cases: Transforming AI Across Verticals
&lt;/h2&gt;

&lt;h5&gt;
  
  
  1. Healthcare: Accelerating Medical Research
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Researchers struggle to find clinical trials for rare diseases like cystic fibrosis.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index &lt;strong&gt;PubMed abstracts&lt;/strong&gt; and &lt;strong&gt;arXiv papers&lt;/strong&gt; using Qwen3-Embedding.
&lt;/li&gt;
&lt;li&gt;Deploy Qwen3-Reranker to prioritize trials matching patient genotypes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h5&gt;
  
  
  2. Legal: Revolutionizing Contract Analysis
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Law firms need to identify clauses like "non-compete agreements" in contracts.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tune Qwen3 on legal corpora (e.g., SEC filings, court rulings).
&lt;/li&gt;
&lt;li&gt;Use rerankers to highlight clauses relevant to mergers and acquisitions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h5&gt;
  
  
  3. E-Commerce: Hyper-Personalized Product Search
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Users searching for "wireless Bluetooth headphones" get irrelevant results.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train Qwen3-Embedding on product catalogs and customer reviews.
&lt;/li&gt;
&lt;li&gt;Apply rerankers to boost items with matching features (e.g., noise cancellation).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h5&gt;
  
  
  4. Finance: Precision Risk Assessment
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Banks must flag high-risk loan applications (e.g., those with a delinquency history).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy Qwen3-Embedding to vectorize applications.
&lt;/li&gt;
&lt;li&gt;Use Qwen3-Reranker to score risk factors against regulatory guidelines.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h5&gt;
  
  
  5. Chemistry: Next-Gen Drug Discovery
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Scientists need to find molecules similar to a target compound.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train Qwen3 on chemical patents and PubChem data.
&lt;/li&gt;
&lt;li&gt;Embed molecular structures (e.g., SMILES strings) for similarity searches.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
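&lt;p&gt;Each use case above follows the same two-stage pattern: dense retrieval with Qwen3-Embedding, then reordering with Qwen3-Reranker. A minimal sketch of that pipeline with toy vectors standing in for real model outputs (the embedding values and &lt;code&gt;rerank_score&lt;/code&gt; function are illustrative stand-ins, not actual model calls):&lt;/p&gt;

```python
import numpy as np

# Toy corpus embeddings standing in for Qwen3-Embedding output.
doc_embeddings = {
    "trial_a": np.array([0.9, 0.1, 0.0]),
    "trial_b": np.array([0.1, 0.9, 0.0]),
    "trial_c": np.array([0.7, 0.7, 0.0]),
}
query_vec = np.array([1.0, 0.0, 0.0])

def top_k(query, docs, k=2):
    """Stage 1: cosine-similarity retrieval over the whole corpus."""
    scores = {
        name: float(np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v)))
        for name, v in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def rerank_score(doc_name):
    """Stage 2 stand-in: a reranker scores (query, doc) pairs jointly."""
    return {"trial_a": 0.2, "trial_b": 0.9, "trial_c": 0.8}.get(doc_name, 0.0)

candidates = top_k(query_vec, doc_embeddings, k=2)
reranked = sorted(candidates, key=rerank_score, reverse=True)
print(candidates, reranked)
```

&lt;p&gt;Retrieval narrows millions of documents to a short candidate list cheaply; the reranker then spends its heavier cross-attention budget only on those few candidates.&lt;/p&gt;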

&lt;h2&gt;
  
  
  2.3. Ready to Build Your Domain-Specific AI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjzzp4xmklyfnadck3cg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjzzp4xmklyfnadck3cg.png" alt="Introduction_to_Embedding_Reranking_and_Qwen3_Models_13_" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With PAI-Lingjun and Qwen3, the power to transform industries is at your fingertips. Whether you’re optimizing financial risk models or accelerating medical breakthroughs, Qwen3’s embedding and reranking capabilities deliver unmatched precision. Let’s redefine what’s possible—together.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Got questions? Reach out to our team or explore &lt;a href="https://www.alibabacloud.com/help/en/pai/user-guide/lingjun-smart-calculation-resources-single-tenant-edition" rel="noopener noreferrer"&gt;PAI-Lingjun&lt;/a&gt; to start your free trial today!&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Your Domain, Our Expertise
&lt;/h2&gt;

&lt;p&gt;Fine-tuning Qwen3 is not just a technical process—it’s a strategic leap. Whether you’re revolutionizing finance, healthcare, or materials science, PAI-Lingjun equips you to unlock AI’s full potential.   &lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Advanced Deployment Strategies and Optimization Techniques
&lt;/h2&gt;

&lt;h2&gt;
  
  
  3.1. Future Directions for Qwen3 Embedding Models
&lt;/h2&gt;

&lt;p&gt;The Qwen3 Embedding series represents a significant leap in text representation learning. However, ongoing advancements in large language models (LLMs) open new frontiers. Below are key areas of focus for future development, emphasizing &lt;strong&gt;instruction-aware embeddings&lt;/strong&gt; and &lt;strong&gt;MRL (Matryoshka Representation Learning)&lt;/strong&gt;:  &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Instruction-Aware Embeddings
&lt;/h3&gt;

&lt;p&gt;Traditional models require retraining to adapt to new tasks, but Qwen3’s instruction-aware architecture allows dynamic adaptation through task-specific prompts. This eliminates the need for domain-specific fine-tuning, reducing costs and complexity.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concepts&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction-Aware Design&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen3 Embedding models accept explicit instructions as input, guiding the model to generate embeddings tailored to specific tasks. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_detailed_instruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Instruct: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  

&lt;span class="c1"&gt;# Example: Flag loan applications with geopolitical risk factors  
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Identify loan applications with geopolitical risk factors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loan application for a tech firm in Southeast Asia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;input_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_detailed_instruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method embeds the instruction into the input context, ensuring the model focuses on domain-specific nuances (e.g., "geopolitical risk") without requiring retraining.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Few-Shot Adaptation&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By appending task-specific instructions to queries, Qwen3 can adapt to new domains with minimal labeled data. For instance, a chemistry reranker can prioritize molecules relevant to a specific drug target by including an instruction like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find molecules similar to aspirin for anti-inflammatory use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C1CC(=O)NC(=O)C1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Aspirin's SMILES string  
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. MRL (Matryoshka Representation Learning)
&lt;/h3&gt;

&lt;p&gt;MRL enables dynamic adjustment of embedding dimensions during inference, offering flexibility without retraining. This innovation allows a single model to serve multiple scenarios (e.g., lightweight edge devices vs. high-precision servers).  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How MRL Works&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Variable Output Dimensions&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen3 Embedding models generate embeddings with customizable dimensions (e.g., 1024D, 2560D, or 4096D).  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Adjustment&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During inference, you can specify the desired dimension via the &lt;code&gt;output_dimension&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generate a 2560D vector for financial risk analysis  
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2560&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages of MRL&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource Efficiency&lt;/strong&gt;:  Lower-dimensional embeddings (e.g., 1024D) for edge devices and higher dimensions (e.g., 4096D) for server-grade applications.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: A single model can be deployed across diverse use cases (e.g., semantic search and molecular similarity).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Future-Proofing&lt;/strong&gt;: Easy adaptation to evolving requirements (e.g., increasing dimensionality as hardware improves).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
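&lt;p&gt;Under the hood, MRL-style training makes a prefix of the full vector a usable embedding in its own right, so shrinking a vector amounts to truncation plus re-normalization. A sketch of that post-processing step (illustrative only, not the library's internal code):&lt;/p&gt;

```python
import numpy as np

def shrink_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Truncate an MRL-trained embedding to `dim` and re-normalize to unit length."""
    assert dim <= vec.shape[-1], "target dimension exceeds the full embedding size"
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a full 4096D server-grade embedding (random, for illustration).
rng = np.random.default_rng(0)
full = rng.normal(size=4096)
full /= np.linalg.norm(full)

small = shrink_embedding(full, 1024)  # edge-device variant of the same vector
print(small.shape)
```

&lt;p&gt;The same stored vector can therefore serve both the 1024D edge path and the 4096D high-precision path without re-encoding the text.&lt;/p&gt;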

&lt;p&gt;&lt;strong&gt;Example: MRL in Healthcare&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;A pharmaceutical researcher can generate 4096D embeddings for precise molecule screening but switch to 1024D for real-time patient record clustering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# High-precision molecule embedding  
&lt;/span&gt;&lt;span class="n"&gt;molecule_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C1CC(=O)NC(=O)C1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  

&lt;span class="c1"&gt;# Lightweight patient record clustering  
&lt;/span&gt;&lt;span class="n"&gt;patient_notes_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Patient presents with chest pain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3.2. Optimization Techniques for Industry-Specific Tasks
&lt;/h2&gt;

&lt;h5&gt;
  
  
  1. Financial Risk Assessment
&lt;/h5&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Prioritizing loan applications with red flags (e.g., delinquency history).  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction-Aware Embedding&lt;/strong&gt;: Append task-specific instructions to queries.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Identify loans with delinquency risks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loan application for a tech startup in India&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
&lt;span class="n"&gt;input_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_detailed_instruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MRL for Scalability&lt;/strong&gt;: Use 1024D embeddings for real-time scoring and 2560D for deeper analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Performance Metrics&lt;/strong&gt;:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Post-Optimization&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval Accuracy&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reranking Precision@10&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  2. Healthcare Document Clustering
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Grouping clinical notes into categories (e.g., diagnosis, treatment plans).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction-Aware Embedding&lt;/strong&gt;: Use instructions like "Cluster patient records by disease severity."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MRL for Dimensionality&lt;/strong&gt;: Generate 256D embeddings for fast clustering and 4096D for detailed analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Snippet&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generate embeddings for clinical notes  
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clinical_notes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  

&lt;span class="c1"&gt;# Cluster notes with HDBSCAN  
&lt;/span&gt;&lt;span class="n"&gt;clusterer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HDBSCAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_cluster_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clusterer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  3. Code Retrieval in Software Engineering
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Finding GitHub repositories implementing specific algorithms (e.g., Dijkstra’s shortest path).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction-Aware Embedding&lt;/strong&gt;: Include instructions like "Prioritize Python implementations of Dijkstra’s algorithm."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MRL for Efficiency&lt;/strong&gt;: Use 1024D embeddings for quick searches and 4096D for precision.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark Results&lt;/strong&gt;:  &lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;MTEB-Code Score&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Query Latency (ms)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-Embedding-8B&lt;/td&gt;
&lt;td&gt;80.68&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-Embedding-8B (MRL)&lt;/td&gt;
&lt;td&gt;85.21 (4096D)&lt;/td&gt;
&lt;td&gt;160 (higher accuracy)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why Instruction-Awareness and MRL Outperform Fine-Tuning
&lt;/h2&gt;

&lt;h4&gt;
  
  
  1. Instruction-Aware Embedding: Dynamic Adaptation Without Retraining
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Traditional fine-tuning requires retraining for each domain, which is time-consuming and resource-intensive.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Qwen3’s instruction-aware design allows developers to define task-specific instructions at inference time.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legal&lt;/strong&gt;: &lt;em&gt;"Highlight clauses related to non-compete agreements."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-Commerce&lt;/strong&gt;: &lt;em&gt;"Boost items with noise cancellation features."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Shot Adaptation&lt;/strong&gt;: No need for domain-specific training data.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Savings&lt;/strong&gt;: Avoid the expense of retraining models for every use case.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. MRL: Flexible Dimensions for Any Scenario
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Fixed-dimension embeddings (e.g., 768D) force trade-offs between accuracy and efficiency.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: MRL allows dynamic adjustment of dimensions.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge Devices&lt;/strong&gt;: Use 1024D embeddings for fast, low-memory inference.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Precision Tasks&lt;/strong&gt;: Switch to 4096D for complex tasks like drug discovery.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single Model, Multiple Use Cases&lt;/strong&gt;: Eliminate the need for multiple models.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future-Proofing&lt;/strong&gt;: Scale dimensionality as hardware evolves without retraining.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion: Instruction-Awareness and MRL — The New Paradigm
&lt;/h3&gt;

&lt;p&gt;Qwen3 Embedding models redefine flexibility by combining &lt;strong&gt;instruction-aware embeddings&lt;/strong&gt; and &lt;strong&gt;MRL Support&lt;/strong&gt;, eliminating the need for domain-specific fine-tuning.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instruction-Aware Embeddings&lt;/strong&gt; enable developers to customize model behavior through task-specific prompts, thereby reducing the reliance on retraining.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MRL Support&lt;/strong&gt; enables dynamic dimension adjustment, ensuring optimal performance across edge and cloud deployments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By leveraging these innovations, organizations can:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduce Costs&lt;/strong&gt;: Avoid expensive fine-tuning cycles.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accelerate Deployment&lt;/strong&gt;: Adapt models to new domains in minutes, not months.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Future-Proof Systems&lt;/strong&gt;: Scale dimensionality as hardware improves.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Qwen3 Embedding Technical Report (&lt;a href="https://arxiv.org/abs/2506.05176" rel="noopener noreferrer"&gt;arXiv:2506.05176&lt;/a&gt;)  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MTEB Benchmarks (&lt;a href="https://openreview.net/forum?id=zl3pfz4VCV" rel="noopener noreferrer"&gt;Enevoldsen et al., 2025&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Code Repository&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/QwenLM/Qwen3-Embedding#applications" rel="noopener noreferrer"&gt;Qwen3 Embedding Examples&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Contact&lt;/strong&gt;: For collaborations or inquiries, contact Alibaba Cloud.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts: The Genetic Code of Meaning Unveiled
&lt;/h2&gt;

&lt;p&gt;For the first time in history, machines can decode the &lt;strong&gt;genetic relationships between a Sanskrit poem, a Python function, and a medical diagnosis&lt;/strong&gt;—a breakthrough made accessible to all through open-source innovation. Just as DNA sequencing revolutionized biology by revealing the universal code of life, &lt;strong&gt;Qwen3 Embedding transforms AI&lt;/strong&gt; by mapping the molecular structure of meaning itself. This technology transcends language, culture, and discipline, uncovering hidden connections that redefine how AI systems understand and retrieve information.  &lt;/p&gt;

&lt;h4&gt;
  
  
  A Paradigm Shift in Understanding
&lt;/h4&gt;

&lt;p&gt;Traditional AI search operates like a keyword-matching robot, confined to surface-level text matches. Qwen3 Embedding, however, functions as a &lt;strong&gt;DNA sequencer for language&lt;/strong&gt;, capturing the deep, semantic relationships between concepts across &lt;strong&gt;250+ languages and programming paradigms&lt;/strong&gt;. Whether analyzing a medical diagnosis, a legal contract, or a quantum computing algorithm, Qwen3 deciphers the genetic code of meaning, enabling machines to grasp nuance, context, and interdisciplinary links. This isn’t just an incremental improvement—it’s a paradigm shift.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Technical Mastery and Open-Source Democratization
&lt;/h4&gt;

&lt;p&gt;Qwen3 Embedding’s &lt;strong&gt;multi-stage training pipeline&lt;/strong&gt; combines synthetic data generation, supervised fine-tuning, and model merging to achieve state-of-the-art performance. With scores of &lt;strong&gt;70.58 on MTEB Multilingual&lt;/strong&gt; and &lt;strong&gt;80.68 on MTEB Code&lt;/strong&gt;, Qwen3 surpasses proprietary giants like Google’s Gemini-Embedding, proving that open-source innovation can outpace closed ecosystems. By open-sourcing the models under the &lt;strong&gt;Apache 2.0 license&lt;/strong&gt;, Alibaba democratizes access to this "genetic code of meaning," empowering developers worldwide to build smarter, more intuitive systems.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Beyond Benchmarks: Real-World Impact
&lt;/h4&gt;

&lt;p&gt;The true power of Qwen3 lies not just in its technical specs but in its ability to &lt;strong&gt;bridge worlds&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Healthcare&lt;/strong&gt;: Accelerating drug discovery by linking molecular structures to clinical trials.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Law&lt;/strong&gt;: Automating clause analysis across multilingual contracts.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Finance&lt;/strong&gt;: Flagging risks with precision by parsing global regulatory texts.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Education&lt;/strong&gt;: Connecting interdisciplinary knowledge for personalized learning.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chemistry&lt;/strong&gt;: Revolutionizing material science by mapping molecular properties.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not hypothetical scenarios—they are realities already being shaped by Qwen3’s genetic-level understanding of meaning.  &lt;/p&gt;

&lt;h4&gt;
  
  
  The Future: From Genetic Code to Intelligent Evolution
&lt;/h4&gt;

&lt;p&gt;As AI evolves, Qwen3 Embedding sets the stage for &lt;strong&gt;multimodal systems&lt;/strong&gt; that decode not just text but images, audio, and video through the same genetic lens. Imagine an AI that understands a biomedical paper, visualizes its implications in a 3D protein model, and generates code to simulate its behavior—all through unified, cross-modal embeddings.  &lt;/p&gt;

&lt;p&gt;Moreover, Qwen3’s efficiency, ranging from lightweight 0.6B models to high-performance 8B variants, ensures adaptability for both edge devices and cloud-scale applications. The future belongs to systems that learn like organisms, evolving through exposure to diverse data ecosystems. Qwen3 Embedding is not just a tool; it is the blueprint for this evolution.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Join the Revolution
&lt;/h4&gt;

&lt;p&gt;The genetic code of meaning is now within reach. Explore Qwen3 Embedding and Reranking models on &lt;a href="https://huggingface.co/Qwen" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; and &lt;a href="https://modelscope.cn" rel="noopener noreferrer"&gt;ModelScope&lt;/a&gt;. Deploy them on Alibaba Cloud’s PAI ecosystem, or fine-tune them for your niche domain. Whether you’re a researcher, developer, or enterprise, the era of genetic AI understanding begins today.  &lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally posted on &lt;a href="https://www.alibabacloud.com/blog/mastering-text-embedding-and-reranker-with-qwen3_602308" rel="noopener noreferrer"&gt;Alibaba Cloud Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Qwen2.5 Omni: GenAI Meets Multimodality</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Fri, 18 Apr 2025 04:05:27 +0000</pubDate>
      <link>https://dev.to/farrruh/qwen25-omni-genai-meets-multimodality-2f87</link>
      <guid>https://dev.to/farrruh/qwen25-omni-genai-meets-multimodality-2f87</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Read more of my blogs on &lt;a href="https://community.alibabacloud.com/users/5611950958141783?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Alibaba Cloud Community&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the Generative AI (GenAI) era, Large Language Models (LLMs) are no longer confined to text. Multimodal models like Qwen2.5 Omni bridge the gap between text, images, audio, and videos, enabling AI to think, see, hear, and speak - like us humans.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Why Multimodality Matters
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ubiquity of Multimodal Data&lt;/strong&gt;: The vast majority of internet traffic is visual and audio content (e.g., TikTok videos, podcasts).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-Like Interactions&lt;/strong&gt;: Users expect AI to process mixed inputs (e.g., a photo &lt;em&gt;and&lt;/em&gt; a voice query).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Industry Disruption&lt;/strong&gt;: From healthcare diagnostics to e-commerce, multimodal AI is the new standard.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Qwen2.5 Omni: Designed for Comprehensive Multimodality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Far Beyond Text: While models like Qwen2.5-VL excel at text and images, Qwen2.5 Omni adds audio/video streaming, a leap toward full-sensory AI.
&lt;/li&gt;
&lt;li&gt;Unified Architecture: Unlike siloed tools, Qwen2.5 Omni is a single model for input/output across modalities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding Qwen2.5 Omni: The Technical Edge
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.alicdn.com%2Fimgextra%2Fi1%2FO1CN01IWnTrG24a0JHKev1c_%21%216000000007406-2-tps-5737-3094.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.alicdn.com%2Fimgextra%2Fi1%2FO1CN01IWnTrG24a0JHKev1c_%21%216000000007406-2-tps-5737-3094.png" alt="2" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Thinker (text/audio/video processing) and Talker (speech generation) modules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Innovations from the Technical Report
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1c3polng3zftyy0jxhl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1c3polng3zftyy0jxhl.jpg" alt="3" width="800" height="721"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview of Qwen2.5-Omni with the Thinker-Talker Architecture
&lt;/h3&gt;

&lt;p&gt;1.  TMRoPE Positional Encoding:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Time-aligned Multimodal RoPE ensures audio and video frames are processed in sync (e.g., lip-syncing in videos).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interleaved Chunking divides a video into 2-second blocks, combining visual/audio data to reduce latency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2.  Thinker-Talker Architecture:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Thinker: An LLM for text generation and reasoning.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Talker: A dual-track model for real-time speech generation, reducing audio latency by 40% compared to Qwen2-Audio.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.  Streaming Efficiency:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Block-wise Encoding processes audio/video in chunks, enabling real-time inference.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sliding Window Diffusion Transformer (DiT) reduces initial audio delay by limiting receptive fields.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
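&lt;p&gt;The interleaved chunking idea can be illustrated with a small sketch: time-stamped audio and video frames are grouped into 2-second blocks so that each block carries both modalities from the same window. The &lt;code&gt;(timestamp, modality, payload)&lt;/code&gt; layout below is hypothetical, not the model's internal format:&lt;/p&gt;

```python
from collections import defaultdict

def interleave_chunks(frames, block_seconds=2.0):
    """Group (timestamp, modality, payload) frames into fixed-length blocks.

    Each block mixes audio and video from the same time window, mirroring
    the time-aligned processing described above (illustrative layout only).
    """
    blocks = defaultdict(list)
    for ts, modality, payload in frames:
        blocks[int(ts // block_seconds)].append((ts, modality, payload))
    return [sorted(blocks[k]) for k in sorted(blocks)]

frames = [(0.5, "video", "f0"), (0.7, "audio", "a0"),
          (2.1, "audio", "a1"), (3.9, "video", "f1")]
print(len(interleave_chunks(frames)))  # 2 blocks: [0s, 2s) and [2s, 4s)
```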

&lt;h2&gt;
  
  
  How Qwen2.5 Omni Outperforms Other Multimodal Models
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.alicdn.com%2Fimgextra%2Fi4%2FO1CN012BhLi91izGbrO92Jv_%21%216000000004483-2-tps-7914-6029.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.alicdn.com%2Fimgextra%2Fi4%2FO1CN012BhLi91izGbrO92Jv_%21%216000000004483-2-tps-7914-6029.png" alt="4" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Task&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Qwen2.5-Omni&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Qwen2.5-VL&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GPT-4o-Mini&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;State-of-the-Art&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image→Text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;59.2 (MMMUval)&lt;/td&gt;
&lt;td&gt;58.6&lt;/td&gt;
&lt;td&gt;60.0&lt;/td&gt;
&lt;td&gt;53.9 (Other)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Video→Text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;72.4 (Video-MME)&lt;/td&gt;
&lt;td&gt;65.1&lt;/td&gt;
&lt;td&gt;64.8&lt;/td&gt;
&lt;td&gt;63.9 (Other)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multimodal Reasoning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;81.8 (MMBench)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;76.0&lt;/td&gt;
&lt;td&gt;80.5 (Other)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speech Generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.42% WER (Chinese)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;2.33% (English)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Qwen2.5 Omni Excels
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Unified Model: There is no need to switch between separate audio and vision models such as Qwen2-Audio and Qwen2.5-VL.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Low Latency: Qwen2.5 Omni processes 2-second video chunks in real time, which is ideal for applications and services with real-time content.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Versatility: Qwen2.5 Omni handles end-to-end speech instructions as well as text (e.g., “Summarize this video and read it aloud”).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quickstart for Qwen2.5 Omni on Alibaba Cloud
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Choose the Model
&lt;/h3&gt;

&lt;p&gt;1.  Go to &lt;a href="https://bailian.console.alibabacloud.com/?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Alibaba Cloud ModelStudio&lt;/a&gt; or the &lt;a href="https://www.alibabacloud.com/en/product/modelstudio?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Model Studio introduction page&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;2.  Search for “Qwen2.5-Omni” and navigate to its page.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcowmya7n0xz1cvvzv8k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcowmya7n0xz1cvvzv8k.jpg" alt="5" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3.  Authorize access to the model (free for basic usage).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Prepare Your Environment
&lt;/h3&gt;

&lt;p&gt;Security-first setup:  &lt;/p&gt;

&lt;p&gt;1.  Create a virtual environment (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv qwen-env
&lt;span class="nb"&gt;source &lt;/span&gt;qwen-env/bin/activate  &lt;span class="c"&gt;# Linux/MacOS | Windows: qwen-env\Scripts\activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2.  Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3.  Store API key securely:&lt;br&gt;&lt;br&gt;
Create a &lt;code&gt;.env&lt;/code&gt; file in your project directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DASHSCOPE_API_KEY=your_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
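&lt;p&gt;At runtime the key must be exported into the environment before the client is created. A minimal stdlib-only loader is sketched below (the &lt;code&gt;python-dotenv&lt;/code&gt; package does the same job; the &lt;code&gt;load_env&lt;/code&gt; helper name is illustrative):&lt;/p&gt;

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: export KEY=VALUE lines into os.environ."""
    if not os.path.exists(path):
        return  # nothing to load; the key may already be set in the shell
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.getenv("DASHSCOPE_API_KEY")  # None if the key was never provided
```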



&lt;h3&gt;
  
  
  Step 3: Make an API Call with OpenAI Compatibility
&lt;/h3&gt;

&lt;p&gt;Use the OpenAI library to interact with Qwen2.5-Omni:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DASHSCOPE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example: Text + Audio Output
&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5-omni-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Specify output formats (text/audio)
&lt;/span&gt;    &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chelsie&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Enable real-time streaming
&lt;/span&gt;    &lt;span class="n"&gt;stream_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;include_usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Process streaming responses
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Partial response:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage stats:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Features of the API
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input Type&lt;/td&gt;
&lt;td&gt;Text, images, audio, video (via URLs/Base64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Modality&lt;/td&gt;
&lt;td&gt;Specify &lt;code&gt;modalities&lt;/code&gt; parameter (e.g., &lt;code&gt;["text", "audio"]&lt;/code&gt; for dual outputs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming Support&lt;/td&gt;
&lt;td&gt;Real-time results via &lt;code&gt;stream=True&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Environment variables for API keys (&lt;code&gt;.env&lt;/code&gt; file)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Advanced Use Cases: Pushing the Boundaries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Real-Time Video Analysis
&lt;/h3&gt;

&lt;p&gt;Use Case: Live event captioning with emotion detection.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: A 10-second video clip.
&lt;/li&gt;
&lt;li&gt;Output: Text summary + audio commentary (e.g., “The crowd is cheering enthusiastically!”).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Cross-Modal E-commerce
&lt;/h3&gt;

&lt;p&gt;Use Case: Generate product descriptions from images and user reviews.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Input: Product image + "Write a 5-star review in Spanish"
# Output: Text review + audio version in Spanish.  
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
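&lt;p&gt;A sketch of what that request could look like, building on the client from Step 3. Only the payload is assembled here; the &lt;code&gt;image_url&lt;/code&gt; content format follows the OpenAI-compatible convention, and the product URL and helper name are placeholders:&lt;/p&gt;

```python
# Build the multimodal request payload only; no network call is made here.
def build_review_request(image_url, instruction):
    """Assemble an OpenAI-compatible multimodal request for Qwen2.5-Omni."""
    return {
        "model": "qwen2.5-omni-7b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": instruction},
            ],
        }],
        "modalities": ["text", "audio"],              # text + spoken output
        "audio": {"voice": "Chelsie", "format": "wav"},
    }

payload = build_review_request(
    "https://example.com/product.jpg",                # placeholder image URL
    "Write a 5-star review of this product in Spanish.",
)
```

&lt;p&gt;Passing this payload to &lt;code&gt;client.chat.completions.create(**payload)&lt;/code&gt; with the client from Step 3 would return the review text (and audio, when requested).&lt;/p&gt;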



&lt;h2&gt;
  
  
  Why Learn Qwen2.5 Omni?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Future-Ready Skills&lt;/strong&gt;: Multimodal models are the next-gen standard for AI applications.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Competitive Edge&lt;/strong&gt;: Businesses using Qwen2.5 Omni can:  &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduce Costs&lt;/strong&gt;: One model for all text/audio/video tasks.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accelerate Innovation&lt;/strong&gt;: Deploy real-time apps (e.g., virtual assistants, smart surveillance).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Troubleshooting &amp;amp; Best Practices
&lt;/h2&gt;

&lt;p&gt;1.  File Size Limits:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Images:&lt;/strong&gt; ≤10MB per file.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Total Tokens:&lt;/strong&gt; Respect the model’s 32k token limit (text + image/audio embeddings).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2.  Optimize for Streaming:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use Alibaba Cloud’s OSS for large files.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable &lt;code&gt;stream=True&lt;/code&gt; for real-time outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Future is Multimodal
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchvyg0hnh5jvrd7cgcay.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchvyg0hnh5jvrd7cgcay.jpg" alt="6" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As GenAI evolves, multimodal capabilities will dominate industries from healthcare to entertainment. By mastering Qwen2.5 Omni, you’re entering the next era of human-AI collaboration.  &lt;/p&gt;

&lt;p&gt;Start experimenting today and join the revolution!  &lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Model Studio Help: &lt;a href="https://www.alibabacloud.com/help/en/model-studio/?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Get Started Guide&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Model Studio Product Page: &lt;a href="https://www.alibabacloud.com/en/product/modelstudio?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Explore Features&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Qwen2.5-Omni Blog: &lt;a href="https://qwenlm.github.io/blog/qwen2.5-omni/?utm_content=g_1000403356" rel="noopener noreferrer"&gt;In-Depth Overview&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technical Report: &lt;a href="https://arxiv.org/abs/2503.20215?utm_content=g_1000403356" rel="noopener noreferrer"&gt;ArXiv Paper&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub: &lt;a href="https://github.com/QwenLM/Qwen2.5-Omni?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Code &amp;amp; Docs&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HuggingFace: &lt;a href="https://huggingface.co/Qwen/Qwen2.5-Omni-7B?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Model Download&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wan Visual Generation: &lt;a href="https://wan.video/?utm_content=g_1000403356" rel="noopener noreferrer"&gt;Create Amazing Videos&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>alibabacloud</category>
      <category>ai</category>
      <category>genai</category>
    </item>
    <item>
      <title>The Evolving Landscape of LLM Training Data</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Fri, 11 Apr 2025 06:02:30 +0000</pubDate>
      <link>https://dev.to/farrruh/the-evolving-landscape-of-llm-training-data-3jjd</link>
      <guid>https://dev.to/farrruh/the-evolving-landscape-of-llm-training-data-3jjd</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Read more of &lt;a href="https://community.alibabacloud.com/users/5611950958141783?utm_content=g_1000403243" rel="noopener noreferrer"&gt;my articles on Alibaba Cloud blog&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Datasets are the lifeblood of artificial intelligence, especially in training large language models (LLMs) that power everything from chatbots to content generators. These datasets form the foundation upon which AI models learn and develop their capabilities. However, as the demand for more advanced AI systems grows, so does the need for high-quality, diverse, and extensive datasets. This article delves into the history of dataset usage, the types of data required at various stages of LLM training, and the challenges faced in sourcing and utilizing these datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Brief History of Dataset Usage in AI
&lt;/h2&gt;

&lt;p&gt;In the early days of AI research, datasets were meticulously curated from various sources, such as encyclopedias, parliamentary transcripts, phone call recordings, and weather forecasts. Each dataset was tailored to address specific tasks, ensuring relevance and quality. However, with the advent of transformers in 2017—a neural network architecture pivotal to modern language models—the focus shifted toward sheer volume, marking a significant change in the AI research approach. Researchers realized that the performance of LLMs improved significantly with larger models and datasets, leading to indiscriminate data scraping from the internet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5qrruel10zsdbfmzvmx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5qrruel10zsdbfmzvmx.png" alt="Image description" width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By 2018, the internet had become the dominant source for all data types, including audio, images, and video. This trend has continued, resulting in a significant gap between internet-sourced data and manually curated datasets. The demand for scale also led to the widespread use of synthetic data—data generated by algorithms rather than collected from real-world interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of Data Needed for LLM Training
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pre-training
&lt;/h3&gt;

&lt;p&gt;Pre-training is the initial phase, where the model is exposed to vast amounts of text data to learn general language patterns and structures. During this stage, the model requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Diverse Text Sources: Data should come from a wide range of topics and languages to ensure broad understanding, a crucial factor in AI model development.&lt;/li&gt;
&lt;li&gt;High Volume: Billions of tokens are needed to train the model effectively.&lt;/li&gt;
&lt;li&gt;Quality Control: While quantity is crucial, maintaining a baseline level of quality is equally important as it helps prevent the model from learning incorrect or biased information. Sources often include web pages, books, articles, and other publicly available texts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, ethical considerations arise when using copyrighted materials without permission.&lt;/p&gt;

&lt;h3&gt;
  
  
  Continuous Pre-training
&lt;/h3&gt;

&lt;p&gt;Continuous pre-training involves updating the model with new data to keep it current and improve its knowledge base. This phase requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recent Data: To incorporate the latest information and trends.&lt;/li&gt;
&lt;li&gt;Domain-Specific Data: Depending on the industry's needs, specialized datasets (e.g., medical journals for healthcare applications) may be necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fine-tuning
&lt;/h3&gt;

&lt;p&gt;Fine-tuning adapts the pre-trained model to specific tasks or domains. It typically uses smaller, more targeted, carefully labeled, and curated datasets. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task-Specific Data: Sentiment analysis might require annotated reviews, while question-answering systems need pairs of questions and answers.&lt;/li&gt;
&lt;li&gt;Domain Adaptation: Legal documents, scientific papers, or technical manuals for specialized applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below are examples of datasets and methods used in this process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example of a Fine-Tuning Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Task-Specific Data: For sentiment analysis, the &lt;em&gt;Stanford Sentiment Treebank (SST-2) _is a widely used dataset containing annotated movie reviews labeled as positive or negative. Similarly, question-answering systems often use _SQuAD (Stanford Question Answering Dataset)&lt;/em&gt;, which pairs questions with context-based answers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain Adaptation: Legal applications employ the &lt;em&gt;CaseLaw Corpus&lt;/em&gt;, a collection of annotated judicial rulings, while medical models could use &lt;em&gt;PubMed Abstracts&lt;/em&gt; for scientific literature analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Fine-Tuning Methods&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/huggingface/peft?utm_content=g_1000403243" rel="noopener noreferrer"&gt;Parameter-Efficient Fine-Tuning (PEFT)&lt;/a&gt;: PEFT techniques, such as LoRA (Low-Rank Adaptation) or Adapter Layers, update only a small subset of the model's parameters, reducing computational costs while maintaining performance. For instance, LoRA freezes the original model weights and adds trainable low-rank matrices to specific layers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/2407.15838?spm=a2c65.11461447.0.0.1c1d29e3wrkpk4&amp;amp;file=2407.15838" rel="noopener noreferrer"&gt;Instruction Fine-Tuning&lt;/a&gt;: This method involves training the model on task-specific instructions paired with input-output examples. For example, a model fine-tuned on instructions like _"Classify the sentiment of this review: [text]" _learns to follow explicit commands, improving usability in real-world applications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/1911.02685?spm=a2c65.11461447.0.0.1c1d29e3wrkpk4&amp;amp;file=1911.02685" rel="noopener noreferrer"&gt;Transfer Learning&lt;/a&gt;: Pre-trained models are adapted to new domains by fine-tuning domain-specific corpora. For example, a general-purpose LLM can be fine-tuned on financial reports from _EDGAR SEC Filings _to specialize in stock market analysis.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
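The LoRA idea mentioned above fits in a few lines: the pre-trained weight matrix stays frozen, and a product of two small low-rank matrices is added on top, so only those small matrices need training. The shapes and values below are toy illustrations, not actual Qwen or PEFT internals:

```python
# Minimal LoRA sketch (toy 2x2 weight, rank-1 adapters; pure Python).

def matmul(X, Y):
    # naive matrix multiply for the small illustrative matrices here
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    # effective weight is W + scale * (B @ A); W itself stays frozen,
    # only the low-rank factors A and B would receive gradient updates
    delta = matmul(B, A)
    W_eff = [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return matmul([x], W_eff)[0]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pre-trained weight
A = [[0.5, 0.5]]              # rank-1 "down" factor (trainable)
B = [[1.0], [1.0]]            # rank-1 "up" factor (trainable)
print(lora_forward([1.0, 2.0], W, A, B))
```

With a 7B-parameter model, the two factors replace billions of trainable weights with a few million, which is where the cost savings come from.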

&lt;p&gt;By combining curated datasets with advanced methods like PEFT, researchers and developers can optimize LLMs for niche applications while addressing resource constraints and scalability challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement Learning
&lt;/h3&gt;

&lt;p&gt;Reinforcement learning from human feedback (RLHF) involves training the model to align better with human preferences. This stage needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human Feedback: Ratings or corrections provided by humans to guide the model's behavior.&lt;/li&gt;
&lt;li&gt;Interactive Data: Real-time interactions where the model receives immediate feedback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below are examples of datasets and methods central to RLHF:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example of an RLHF Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Preference Datasets: RLHF begins with collecting human-labeled preference data, where humans rank or rate model outputs. For instance, OpenAI's early RLHF experiments used datasets where annotators compared multiple model-generated responses to the same prompt, labeling which ones were more helpful, truthful, or aligned with ethical guidelines. These datasets often include nuanced examples, such as distinguishing between factual and biased answers in sensitive topics like politics or healthcare.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key RLHF Methods&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Reward Model Training: A reward model is trained on human preference data to predict which outputs humans prefer. This model acts as a proxy for human judgment during reinforcement learning. For example, Alibaba Cloud's Qwen series uses reward models to penalize harmful or unsafe outputs while rewarding clarity and coherence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Proximal Policy Optimization (PPO): PPO is a reinforcement learning algorithm that fine-tunes the LLM's policy (output generation) to maximize rewards from the trained reward model. This method ensures stable updates, preventing drastic deviations from the desired behavior. For example, PPO is used to iteratively refine chatbot responses in systems like &lt;a href="https://qwen.readthedocs.io/en/latest/?utm_content=g_1000403243" rel="noopener noreferrer"&gt;Qwen&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive Feedback Loops: Real-time human feedback is integrated into training pipelines. For example, AI assistants like Google's Gemini may deploy beta versions to collect user ratings (e.g., thumbs-up/down) on responses, which are fed back into the RLHF pipeline to improve future outputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Safety-Critical Filtering: Specialized datasets focus on high-stakes scenarios, such as medical advice or legal queries, where errors could have serious consequences. These datasets often involve domain experts annotating outputs for accuracy and safety, ensuring the model adheres to strict guidelines.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
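The reward-model step above is typically trained with a pairwise (Bradley-Terry style) loss on the human preference data: the model is penalized whenever it scores the rejected answer above the chosen one. A minimal sketch with toy reward scores; a real system would score full model outputs with a learned network:

```python
import math

# Pairwise preference loss used to train RLHF reward models (toy values).

def preference_loss(r_chosen, r_rejected):
    # -log sigmoid(r_chosen - r_rejected): small when the reward model
    # already ranks the human-preferred answer higher, large otherwise
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

loss_aligned = preference_loss(2.0, 0.0)    # preferred answer ranked higher
loss_misranked = preference_loss(0.0, 2.0)  # preferred answer ranked lower
print(loss_aligned, loss_misranked)  # the misranked pair yields the larger loss
```

Minimizing this loss over many ranked pairs is what turns raw annotator comparisons into a scalar reward signal that PPO can then maximize.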

&lt;p&gt;&lt;strong&gt;Challenges in RLHF Datasets&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Scalability of Human Feedback: Collecting high-quality preference data is labor-intensive and expensive. Scaling this process requires balancing automation (e.g., synthetic feedback) with human oversight to avoid bias.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cultural and Ethical Bias: Preference datasets often reflect the values of annotators from specific regions (e.g., Western-centric perspectives), risking biased outputs in global applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining preference datasets, reward modeling, and iterative human feedback, RLHF ensures LLMs evolve from generic text generators to systems prioritizing safety, relevance, and human alignment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges in Sourcing Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Exhaustion of Available Data
&lt;/h3&gt;

&lt;p&gt;One of the most pressing issues today is the exhaustion of readily available textual data. Major tech players have reportedly indexed almost all accessible text data from the open and dark web, including pirated books, movie subtitles, personal messages, and social media posts. With fewer new sources to tap into, the industry faces a bottleneck in further advancements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuovyucoa6kh8q1lz3nz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuovyucoa6kh8q1lz3nz5.png" alt="Image description" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cumulative amount of data (in logarithmic scale for text, in hours for speech/video) from each source category, across all modalities. Source categories in the legend are ordered in descending order of quantity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cultural Asymmetry
&lt;/h3&gt;

&lt;p&gt;Most datasets originate from Europe and North America, reflecting a Western-centric worldview. Less than 4% of analyzed datasets come from Africa, highlighting a significant cultural imbalance. This bias can lead to skewed perceptions and reinforce stereotypes, particularly in multimodal models that generate images and videos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Centralization of Power
&lt;/h3&gt;

&lt;p&gt;Large corporations dominate the acquisition and control of influential datasets. Platforms like YouTube provide over 70% of video data used in AI training, concentrating immense power in the hands of a few entities. This centralization hinders innovation and creates barriers for smaller players who lack access to these resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collection of Dataset
&lt;/h3&gt;

&lt;p&gt;The following table shows the sources of text collections. Properties include the number of datasets, tasks, languages, and text domains. The Source column indicates the content of the collection: human-generated text on the web, language model output, or both. The License column indicates the collection's licensing status: blue for commercial use, red for non-commercial and academic research, and yellow for unclear licensing. The OAI column marks collections that include generations from OpenAI models. The datasets are sorted chronologically to emphasize trends over time. Source: &lt;a href="https://arxiv.org/pdf/2402.18041?spm=a2c65.11461447.0.0.1c1d29e3wrkpk4&amp;amp;file=2402.18041" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Collection of the text data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35qa4dkopoiutrt1vvh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35qa4dkopoiutrt1vvh6.png" alt="Image description" width="800" height="1607"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Collection of the video data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg0ud9qopydyan7dyxvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg0ud9qopydyan7dyxvn.png" alt="Image description" width="800" height="2343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Collection of the audio data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2pz9ln6zmhdeslhmzcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2pz9ln6zmhdeslhmzcl.png" alt="Image description" width="800" height="2628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solutions and Future Directions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Leveraging Untapped Data Sources
&lt;/h3&gt;

&lt;p&gt;Despite the apparent depletion of easily accessible data, numerous untapped sources remain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Archival Data: Libraries, periodicals, and historical records offer rich, unexplored content.&lt;/li&gt;
&lt;li&gt;Enterprise Data: Companies sit on vast troves of unused data, such as equipment telemetry, meteorological reports, system logs, and marketing statistics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Advanced LLMs can help structure and utilize these latent datasets for future training.&lt;/p&gt;

&lt;h3&gt;
  
  
  Federated Learning
&lt;/h3&gt;

&lt;p&gt;Federated learning allows models to be trained on sensitive data without transferring it outside secure environments. This method is ideal for industries dealing with confidential information, such as healthcare, finance, and telecommunications. By keeping data localized, federated learning ensures privacy while enabling collaborative model improvement.&lt;/p&gt;
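The core of federated learning, aggregating locally computed updates instead of raw data, can be sketched in a few lines. The one-feature linear model and the two client datasets below are purely illustrative:

```python
# Toy federated averaging (FedAvg): each site runs a local training step
# on its private data, and only the resulting weights, never the data,
# leave the site to be averaged centrally.

def local_update(weights, data, lr=0.1):
    # one gradient step of least-squares on the client's private (x, y) pairs
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return [w - lr * g / len(data) for w, g in zip(weights, grad)]

def fed_avg(weights, client_datasets):
    # average the locally updated weights across all clients
    updated = [local_update(weights, d) for d in client_datasets]
    return [sum(ws) / len(updated) for ws in zip(*updated)]

clients = [[([1.0], 2.0)], [([1.0], 4.0)]]  # two sites, private (x, y) pairs
w = fed_avg([0.0], clients)
print(w)  # the average of the two local one-step updates
```

Production systems (e.g., in hospitals or banks) add secure aggregation and differential privacy on top, but the data-stays-local principle is the same.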

&lt;h3&gt;
  
  
  Synthetic Data and Augmentation
&lt;/h3&gt;

&lt;p&gt;Synthetic data generation and data augmentation present promising avenues for expanding training datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Synthetic Data: Generated by algorithms, synthetic data can fill gaps in real-world data but must be handled cautiously to avoid compounding errors.&lt;/li&gt;
&lt;li&gt;Data Augmentation: Modifying existing data through techniques like flipping images, altering colors, or adjusting contrast maintains realism while increasing diversity.&lt;/li&gt;
&lt;/ul&gt;
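Both augmentation techniques from the list above are easy to sketch on a toy "image" represented as nested lists of pixel values; a production pipeline would use a library such as torchvision or albumentations instead:

```python
# Simple image-style augmentations on a toy 2x3 pixel array.

def flip_horizontal(img):
    # mirror each row left-to-right
    return [row[::-1] for row in img]

def adjust_contrast(img, factor, mean=0.5):
    # scale each pixel's distance from a mid-grey mean; clip to [0, 1]
    return [[min(1.0, max(0.0, mean + factor * (p - mean))) for p in row]
            for row in img]

img = [[0.0, 0.5, 1.0],
       [1.0, 0.5, 0.0]]
print(flip_horizontal(img))
print(adjust_contrast(img, 2.0))
```

Each transform yields a new, realistic-looking training example from an existing one, which is exactly how augmentation stretches a limited dataset.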

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As the field of AI continues to evolve, the role of datasets remains paramount. While the exhaustion of readily available data poses a challenge, it's crucial that we, as AI researchers and enthusiasts, are aware of and take responsibility for addressing issues of cultural asymmetry and centralization. Innovative solutions like leveraging untapped sources, federated learning, and synthetic data generation offer pathways forward. By combining these strategies, we can ensure equitable and diverse AI development, paving the way for more sophisticated and inclusive artificial intelligence systems.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building a RAG Service with Model Studio and AnalyticDB for PostgreSQL</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Mon, 29 Jul 2024 02:52:45 +0000</pubDate>
      <link>https://dev.to/farrruh/building-a-rag-service-with-model-studio-and-analyticdb-for-postgresql-dg1</link>
      <guid>https://dev.to/farrruh/building-a-rag-service-with-model-studio-and-analyticdb-for-postgresql-dg1</guid>
      <description>&lt;p&gt;This tutorial provides a step-by-step guide to setting up a Retrieval-Augmented Generation (RAG) service using Alibaba Cloud Model Studio, Compute Nest, and AnalyticDB for PostgreSQL. With Model Studio, you can leverage top-tier generative AI models like Qwen to develop, deploy, and manage AI applications effortlessly. This setup ensures secure and efficient data handling within your enterprise, enhancing AI capabilities and enabling seamless natural language queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.alibabacloud.com/en/product/modelstudio" rel="noopener noreferrer"&gt;Alibaba Cloud Model Studio&lt;/a&gt; provides a comprehensive platform for developing generative AI applications. Using Compute Nest and AnalyticDB for PostgreSQL, you can create a secure, efficient Retrieval-Augmented Generation (RAG) service to enhance AI capabilities within your enterprise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Alibaba Cloud Model Studio
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ndtf809rh11z06yjo3h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ndtf809rh11z06yjo3h.png" alt="Image description" width="800" height="577"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Features shown in this diagram will be launched gradually&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Model Studio?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.alibabacloud.com/en/product/modelstudio" rel="noopener noreferrer"&gt;Alibaba Cloud Model Studio&lt;/a&gt; is an end-to-end platform aimed at simplifying the development, deployment, and management of generative AI models. With access to industry-leading foundation models like Qwen-Max, Qwen-Plus, Qwen-Turbo, and Qwen 2 series, Model Studio provides tools for model fine-tuning, evaluation, deployment, and integration with enterprise systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Capabilities of Model Studio
&lt;/h3&gt;

&lt;p&gt;1.  &lt;strong&gt;Easy Access to Leading Foundation Models (FM)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Models like Qwen-Max, Qwen-Plus, Qwen-Turbo, and the Qwen 2 series power your applications with enhanced AI capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2.  &lt;strong&gt;Built-In Model Inference and Evaluation Workflows&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Support for Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Model compression, inference acceleration, and multi-dimensional evaluation tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One-click model deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.  &lt;strong&gt;Simplified Generative AI Application Development&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visual workflows for developing applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Template-based prompt engineering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extensive APIs for integration with business systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4.  &lt;strong&gt;Comprehensive Security Measures&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Isolated VPC networks for securing data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tools for content governance and human-in-the-loop interventions to ensure responsible AI practices.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;5.  &lt;strong&gt;Third-Party Models&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for third-party models like Tongyi, showcased in Q&amp;amp;A, writing, and NL2SQL (Natural Language to SQL) functionalities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;6.  &lt;strong&gt;Data Management&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dataset cleansing and management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval-Augmented Generation (RAG) for enhanced search and data access.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;7.  &lt;strong&gt;Industry-Specific Models&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom models for sectors like healthcare, finance, and legal services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;8.  &lt;strong&gt;API and SDK&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assistant API and a suite of SDKs for quick integration and agent development.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before starting, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An active Alibaba Cloud account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Familiarity with cloud services and AI models.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Alibaba Cloud Account Setup
&lt;/h2&gt;

&lt;p&gt;If you haven't already, sign up for an Alibaba Cloud account: &lt;a href="https://www.alibabacloud.com/" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Access Compute Nest
&lt;/h2&gt;

&lt;p&gt;Navigate to Compute Nest and locate the service for Generative AI: &lt;a href="https://computenest.console.aliyun.com/service/instance/create/ap-southeast-1?type=user&amp;amp;ServiceId=service-09b1567c53a44da78fbf&amp;amp;ServiceVersion=beta" rel="noopener noreferrer"&gt;Compute Nest&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbm1lshyylt86owgzjze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbm1lshyylt86owgzjze.png" alt="Image description" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Set Up an Instance and Its Parameters
&lt;/h2&gt;

&lt;p&gt;Configure the necessary parameters for the instance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Service Instance Name&lt;/strong&gt;: Provide a meaningful name for the instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Elastic Compute Service (ECS) Parameters&lt;/strong&gt;: We recommend &lt;code&gt;ecs.c6.2xlarge&lt;/code&gt; for faster document processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instance Password&lt;/strong&gt;: Create a secure password for the instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwbsj9v1lpwimo0oq23z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwbsj9v1lpwimo0oq23z.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Setup AnalyticDB for PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Configure an AnalyticDB for PostgreSQL instance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instance Specification&lt;/strong&gt;: Select the suitable specification based on your data volume.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Segment Storage Size&lt;/strong&gt;: Adjust according to your needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DB Username&lt;/strong&gt;: By default &lt;code&gt;kbsuser&lt;/code&gt;, or choose your own username.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DB Password&lt;/strong&gt;: Create a strong password (avoid using symbols like "@").&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qlvdg1szmqa5nced0e1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qlvdg1szmqa5nced0e1.png" alt="Image description" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Configure WebUI Credentials
&lt;/h2&gt;

&lt;p&gt;Configure the web UI credentials to manage and interact with your RAG service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Username&lt;/strong&gt;: Default is &lt;code&gt;admin&lt;/code&gt;, or choose another username.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Password&lt;/strong&gt;: Create a strong, secure password.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbn2kz685wnz8f6kwn72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbn2kz685wnz8f6kwn72.png" alt="Image description" width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Add Model Studio API Key
&lt;/h2&gt;

&lt;p&gt;Add your Model Studio API key to authenticate and facilitate communication between services:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API Key&lt;/strong&gt;: Enter the API key you obtained from your Model Studio setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3naezme6aeu690en5t3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3naezme6aeu690en5t3.png" alt="Image description" width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a guide on how to obtain your &lt;a href="https://www.alibabacloud.com/help/en/model-studio/developer-reference/get-api-key" rel="noopener noreferrer"&gt;Model Studio API key&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Network Configuration
&lt;/h2&gt;

&lt;p&gt;Choose the appropriate network settings to ensure secure and reliable connectivity:&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Existing Infrastructure Configuration
&lt;/h3&gt;

&lt;p&gt;1.  Select whether to create a new VPC (Virtual Private Cloud) or use an existing one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WhetherCreateVpc&lt;/strong&gt;: Choose &lt;code&gt;Create&lt;/code&gt; if you need a new VPC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2.  &lt;strong&gt;VPC ID&lt;/strong&gt;: Enter the ID of an existing VPC or create a new one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create VPC&lt;/strong&gt;: If creating a new VPC, follow the &lt;a href="https://www.alibabacloud.com/help/doc-detail/65398.htm" rel="noopener noreferrer"&gt;Alibaba Cloud VPC Creation Guide&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.  &lt;strong&gt;VSwitch ID&lt;/strong&gt;: Select the ID of an existing VSwitch or create a new one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create VSwitch&lt;/strong&gt;: Instructions are available in the &lt;a href="https://www.alibabacloud.com/help/doc-detail/65399.htm" rel="noopener noreferrer"&gt;VSwitch Creation Guide&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4.  &lt;strong&gt;Tags and Resource Groups&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tag&lt;/strong&gt;: Specify a tag that is attached to the created resource.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tag Key&lt;/strong&gt;: Choose the tag key.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tag Value&lt;/strong&gt;: Choose the tag value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource Group&lt;/strong&gt;: Select the resource group to which the created service instance belongs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Resource Group&lt;/strong&gt;: Follow the instructions to &lt;a href="https://www.alibabacloud.com/help/doc-detail/94497.htm" rel="noopener noreferrer"&gt;Create a Resource Group&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After configuring these settings, click &lt;strong&gt;Next: Confirm Order&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F258a970e06jovx3uslu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F258a970e06jovx3uslu2.png" alt="Image description" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By following these steps, you will ensure that your WebUI credentials and network settings are correctly configured to support your Alibaba Cloud Model Studio RAG service effectively.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz54j0hudbuoe0l4878p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz54j0hudbuoe0l4878p.png" alt="Image description" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 8: Integrate Gradio for Web UI
&lt;/h2&gt;

&lt;p&gt;Use Gradio to create a web interface for interacting with your service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up Gradio&lt;/strong&gt;: Follow Gradio's &lt;a href="https://www.gradio.app/docs/interface" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for installation and configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrate Services&lt;/strong&gt;: Connect Gradio to your backend services (Model Studio API endpoints and AnalyticDB for PostgreSQL).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
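As a rough sketch, the handler you wire into Gradio only needs to be a Python function that takes a question and returns text. Everything below is illustrative: the in-memory "knowledge base" stands in for the AnalyticDB vector search, and a real deployment would send the assembled prompt to the Model Studio endpoint rather than returning it:

```python
# Hypothetical RAG handler for a Gradio interface (stubbed backends).

def retrieve_context(question, top_k=2):
    # stand-in for the AnalyticDB for PostgreSQL vector search
    knowledge = {
        "What is RAG?": "RAG augments generation with retrieved documents.",
        "What is Qwen?": "Qwen is Alibaba Cloud's family of foundation models.",
    }
    hits = [v for k, v in knowledge.items() if question in k]
    return hits[:top_k] or list(knowledge.values())[:top_k]

def answer(question):
    context = "\n".join(retrieve_context(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # a real service would POST `prompt` to the Model Studio endpoint;
    # here we return the assembled prompt for inspection
    return prompt

print(answer("What is RAG?"))
# With Gradio installed, this handler would be exposed as:
# gr.Interface(fn=answer, inputs="text", outputs="text").launch()
```

Keeping the handler a plain function makes it trivial to swap the stubs for the real vector store and model endpoint later.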

&lt;h2&gt;
  
  
  Step 9: Deploy Your RAG Service
&lt;/h2&gt;

&lt;p&gt;Review all configurations and accept the &lt;strong&gt;Terms of Service&lt;/strong&gt;. Click &lt;strong&gt;Create Now&lt;/strong&gt; to deploy your RAG service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi5vuo2hps12dx9p0lix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi5vuo2hps12dx9p0lix.png" alt="Image description" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the RAG Service
&lt;/h2&gt;

&lt;h3&gt;
  
  
  General Question Answering
&lt;/h3&gt;

&lt;p&gt;Users can ask questions via the Gradio web interface, and the Model Studio API will provide responses based on the input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9p0agh9jf0kcetyo2dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9p0agh9jf0kcetyo2dz.png" alt="Image description" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Uploading Documents for Retrieval Augmentation
&lt;/h3&gt;

&lt;p&gt;Users can upload documents which will be stored in the vector database, enhancing the model's retrieval capabilities.&lt;/p&gt;
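Under the hood, retrieval over those uploaded documents amounts to nearest-neighbor search over chunk embeddings. A minimal cosine-similarity sketch, with made-up 3-dimensional vectors standing in for real model embeddings:

```python
import math

# Toy vector-store lookup: each document chunk has an embedding, and the
# chunk closest to the query embedding (by cosine similarity) is retrieved.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store):
    # store maps chunk text to its embedding; return the best-matching chunk
    return max(store, key=lambda chunk: cosine(query_vec, store[chunk]))

store = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.0, 0.8, 0.6],
}
print(retrieve([1.0, 0.0, 0.1], store))
```

AnalyticDB for PostgreSQL performs this same comparison at scale with indexed vector columns, so uploads become searchable without any model retraining.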

&lt;h3&gt;
  
  
  Modifying the Service
&lt;/h3&gt;

&lt;p&gt;Authorized users can access the ECS instance to make any necessary changes or updates to the service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This tutorial has guided you through the comprehensive process of building a Retrieval-Augmented Generation (RAG) service using &lt;a href="https://www.alibabacloud.com/en/product/modelstudio" rel="noopener noreferrer"&gt;Alibaba Cloud Model Studio&lt;/a&gt;, Compute Nest, and AnalyticDB for PostgreSQL. By leveraging Model Studio's powerful suite of generative AI models, including Qwen, you can streamline the development, deployment, and management of AI applications within your enterprise. This setup ensures secure, scalable, and efficient interactions, from natural language queries to document retrieval enhancements. Following these steps will enable you to harness advanced AI capabilities, thereby transforming data management and utilization within your organization.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This article was originally published on &lt;a href="https://www.alibabacloud.com/blog/building-a-retrieval-augmented-generation-rag-service-on-compute-nest-with-alibaba-cloud-model-studio-and-analyticdb-for-postgresql_601412" rel="noopener noreferrer"&gt;Alibaba Cloud Blog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.alibabacloud.com/users/5611950958141783?spm=a2c65.11461447.0.0.66b14f47QbPPFA" rel="noopener noreferrer"&gt;Click here&lt;/a&gt; to learn more tutorials on AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building Multimodal Services with Qwen and Model Studio</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Thu, 25 Apr 2024 07:31:08 +0000</pubDate>
      <link>https://dev.to/farrruh/building-multimodal-services-with-qwen-and-model-studio-4b84</link>
      <guid>https://dev.to/farrruh/building-multimodal-services-with-qwen-and-model-studio-4b84</guid>
      <description>&lt;p&gt;&lt;em&gt;Follow me on &lt;a href="https://www.alibabacloud.com/blog/building-multimodal-services-with-qwen-and-model-studio_600962?utm_content=g_1000393141"&gt;Alibaba Cloud Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw45oyqor1fsgpt18ymkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw45oyqor1fsgpt18ymkw.png" alt="Image description" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are on the cusp of a new era in artificial intelligence. With multimodal AI, the synergy between audio, visual, and textual data is not just an idea but an actionable reality, in which the Qwen Family of Large Language Models (LLMs) plays a pivotal role. This blog will serve as your gateway to understanding and implementing multimodal AI using Alibaba Cloud's Model Studio, Qwen-Audio, Qwen-VL, Qwen-Agent, and OpenSearch (LLM-Based Conversational Search Edition).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://video-intl.alicdn.com/2024/Solution/Multi-Modality.mp4"&gt;Here is the demo video link&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt1csyoznrhever7aa5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt1csyoznrhever7aa5v.png" alt="Image description" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wwzqo3njrd4o751ammp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wwzqo3njrd4o751ammp.png" alt="Image description" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At its core, the multimodal AI we discuss today hinges on the following technological pillars:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/QwenLM/Qwen-Audio?utm_content=g_1000393141"&gt;&lt;strong&gt;Qwen-Audio&lt;/strong&gt;&lt;/a&gt;: Processes a wide array of audio inputs, converting them into actionable text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/QwenLM/Qwen-VL?utm_content=g_1000393141"&gt;&lt;strong&gt;Qwen-VL&lt;/strong&gt;&lt;/a&gt;: Analyzes images with unprecedented precision, revealing nuanced details and text within visuals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.alibabacloud.com/help/en/open-search/llm-intelligent-q-a-version/introduction-to-llm-intelligent-q-a-edition?utm_content=g_1000393141"&gt;&lt;strong&gt;OpenSearch (LLM-Based Conversational Search Edition)&lt;/strong&gt;&lt;/a&gt;: Tailors Q&amp;amp;A systems to specific enterprise needs, leveraging vector retrieval and large-scale models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/QwenLM/Qwen-Agent?utm_content=g_1000393141"&gt;&lt;strong&gt;Qwen-Agent&lt;/strong&gt;&lt;/a&gt;: Orchestrates intelligent agents that follow instructions and execute complex tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.alibabacloud.com/product/genai_service_platform?utm_content=g_1000393141"&gt;&lt;strong&gt;Model Studio&lt;/strong&gt;&lt;/a&gt;: The one-stop AI development platform that brings our multimodal ecosystem to life.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All core technologies are integrated into a single, robust API, ready for deployment on Alibaba Cloud's Elastic Compute Service (ECS) and connected to DingTalk IM or any other IM platform you choose.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deep Dive into Qwen-Audio: A Symphony of Sound and Language
&lt;/h3&gt;

&lt;p&gt;Qwen-Audio is not just an audio processing tool — it's an auditory intelligence that speaks the language of sound with unparalleled fluency. It deals with everything from human speech to the subtleties of music, transforming audio to text with remarkable acuity, redefining how we interact with machines using sound as a medium.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2u6mccqhgt84its5i58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2u6mccqhgt84its5i58.png" alt="Image description" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Visual Frontier: Qwen-VL's Pioneering Vision
&lt;/h3&gt;

&lt;p&gt;In the realm of vision, Qwen-VL stands tall with models like &lt;strong&gt;Qwen-VL-Plus&lt;/strong&gt; and &lt;strong&gt;Qwen-VL-Max&lt;/strong&gt; that set new benchmarks in image processing. These models not only match but exceed the capabilities of industry giants, offering an extraordinary level of visual understanding. Whether it's recognizing minute details in a million-pixel image or comprehending complex visual scenes, Qwen-VL is your lens to clarity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjzmopq8z6nhn46myuvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjzmopq8z6nhn46myuvh.png" alt="Image description" width="717" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenSearch (LLM-Based Conversational Search Edition): One-Stop Multimodal SAAS RAG
&lt;/h3&gt;

&lt;p&gt;OpenSearch (LLM-Based Conversational Search Edition) embodies the quest for precision in a sea of data. It's the beacon that enterprises need to navigate the complexities of industry-specific Q&amp;amp;A systems. The solution is elegant — vectorize your business data, index it, and let OpenSearch find the answers that are as accurate as they are relevant to your enterprise.&lt;/p&gt;
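&lt;p&gt;The vectorize-index-retrieve idea behind such a system can be illustrated with a toy sketch. The hand-made 3-dimensional vectors and document names below are purely illustrative; this is plain cosine similarity, not the OpenSearch API:&lt;/p&gt;

```python
import math

# Toy "vectorize -> index -> retrieve" pipeline: documents are mapped to
# vectors, stored in a list (the "index"), and a query returns the most
# similar documents. Real systems use learned embeddings and an ANN index.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

index = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.1]),
    ("warranty terms", [0.8, 0.2, 0.3]),
]

def retrieve(query_vec, k=1):
    """Return the top-k documents ranked by cosine similarity to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve([1.0, 0.0, 0.0]))  # → ['refund policy']
```

&lt;p&gt;An LLM then generates the final answer grounded in the retrieved documents, which is what makes the Q&amp;amp;A both accurate and enterprise-specific.&lt;/p&gt;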

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxg7seakw6il6nbluoaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxg7seakw6il6nbluoaq.png" alt="Image description" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Qwen-Agent: The Architect of Intelligent Interaction
&lt;/h3&gt;

&lt;p&gt;The Qwen-Agent framework is where the building blocks of intelligence are assembled to create something truly special. With it, developers can construct agents that not only understand instructions but can use tools, plan, and remember. It's not just an AI — it's a digital being that can learn and evolve to meet your application's needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4fy74qr3y89wpnt0zhq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4fy74qr3y89wpnt0zhq.png" alt="Image description" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Studio: The GenAI Powerhouse
&lt;/h3&gt;

&lt;p&gt;At the heart of this ecosystem lies &lt;a href="https://www.alibabacloud.com/product/genai_service_platform?utm_content=g_1000393141"&gt;Model Studio&lt;/a&gt;, Alibaba Cloud's generative AI playground. This is where models are not just trained but born, tailored to the unique requirements of each application. It's where the full spectrum of AI — from data management to deployment — comes together in a secure, responsible, and efficient manner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bqd1tx7a1bvhve1hdfe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bqd1tx7a1bvhve1hdfe.png" alt="Image description" width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The API: Your Multimodal Maestro
&lt;/h3&gt;

&lt;p&gt;The final act in our symphony is the creation of a unified API. Using Python with a lightweight web framework such as FastAPI, we will encapsulate the intelligence of our multimodal models into an accessible, scalable, and robust service. Deployed on ECS, this API becomes the bridge that connects your applications to the intelligent orchestration of the Qwen LLMs, ready to be engaged via DingTalk IM or any IM service of your preference.&lt;/p&gt;

&lt;p&gt;The overall steps for integrating the Qwen Family LLMs with Model Studio are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Initial setup and configuration of Model Studio.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Detailed instructions for integrating Qwen-Audio and Qwen-VL with your applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strategies for leveraging OpenSearch for creating intelligent enterprise solutions, &lt;a href="https://www.alibabacloud.com/blog/opensearch-a-one-stop-solution-to-easily-integrate-llm-generative-ai-in-your-application_600509?utm_content=g_1000393141"&gt;link&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Best practices for developing and deploying Qwen-Agent for enhanced AI interactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tips for orchestrating all these components into a single, cohesive API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment guidelines on Alibaba Cloud ECS and connectivity with DingTalk IM.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
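&lt;p&gt;The routing layer of such a unified API can be sketched in a framework-agnostic way. The handler functions below are stubs standing in for the actual Qwen-Audio, Qwen-VL, and OpenSearch calls, and all names are hypothetical:&lt;/p&gt;

```python
# Minimal dispatcher sketch: each request declares a modality, and the
# dispatcher forwards its payload to the matching (stubbed) model handler.

def handle_audio(payload):
    return f"[qwen-audio] transcript of {payload}"

def handle_image(payload):
    return f"[qwen-vl] description of {payload}"

def handle_text(payload):
    return f"[opensearch] answer for {payload}"

HANDLERS = {"audio": handle_audio, "image": handle_image, "text": handle_text}

def dispatch(request: dict) -> str:
    """Route a request of the form {'modality': ..., 'payload': ...}."""
    handler = HANDLERS.get(request.get("modality"))
    if handler is None:
        raise ValueError(f"unsupported modality: {request.get('modality')!r}")
    return handler(request["payload"])

print(dispatch({"modality": "image", "payload": "product.png"}))
```

&lt;p&gt;In a real deployment each stub would call the corresponding model endpoint, and the dispatcher would sit behind the web framework's request handlers.&lt;/p&gt;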

&lt;p&gt;By following these detailed step-by-step tutorials, you will become adept at creating AI applications that can see, hear, and understand the world in ways that were previously unimaginable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases: Bringing Multimodal AI to Life
&lt;/h2&gt;

&lt;p&gt;Multimodal AI isn't a distant dream — it's already unlocking new opportunities across various industries. Here are some real-world applications where the Qwen Family LLMs and Model Studio integration can make a significant impact:&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer Service Enhancement
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6q2po3qoni58xe6wm12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6q2po3qoni58xe6wm12.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine a customer service system that not only understands text queries but can also interpret the tone and emotion in a customer's voice through Qwen-Audio. It can analyze facial expressions from video calls using Qwen-VL, providing a more personalized and responsive service experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Healthcare Solutions
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcclx2ly57yop96f2m6i6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcclx2ly57yop96f2m6i6.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In healthcare, multimodal AI can revolutionize patient care. Qwen-VL can assist radiologists by identifying anomalies in medical imaging, while Qwen-Audio can transcribe and analyze patient interviews, and OpenSearch can deliver swift, accurate answers to complex medical inquiries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Education Platforms
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhg2n81vtlxn40yymlqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhg2n81vtlxn40yymlqf.png" alt="Image description" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multimodal AI can tailor educational content to individual learning styles. Qwen-Audio can evaluate and give feedback on language pronunciation, Qwen-VL can analyze written assignments, and OpenSearch can provide students with in-depth explanations and study materials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Efficient Retail Operations
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo75mbftnvwu0e0g5g20n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo75mbftnvwu0e0g5g20n.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In retail, multimodal AI can create immersive shopping experiences. Customers can use natural language to search for products using voice commands, and Qwen-VL can recommend items based on visual cues, such as colors or styles, from a photo or video.&lt;/p&gt;

&lt;h3&gt;
  
  
  Legal and Compliance Research
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdehipix6sp8awr13bmhv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdehipix6sp8awr13bmhv.png" alt="Image description" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Law firms and compliance departments can leverage multimodal AI to sift through vast amounts of legal documents. Qwen-Agent, powered by OpenSearch, can provide precise legal precedents and relevant case law, streamlining legal research and decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The convergence of multimodal AI technologies is paving the way for applications that can engage with the world in a human-like manner. The Qwen Family LLMs, each specialized in their domain, represent the building blocks of this intelligent future. With Model Studio as your development hub, the ability to create advanced, intuitive, and responsive AI applications is now at your fingertips.&lt;/p&gt;

&lt;p&gt;Embark on this journey with us as we explore the limitless potential of multimodal AI. Stay tuned for "Multimodality Unleashed: Integrating Qwen Family LLMs with Model Studio," the tutorial that will transform the way you think about and implement AI in your projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.alibabacloud.com/en/solutions/generative-ai/qwen?utm_content=g_1000393141"&gt;Start your multimodal AI adventure here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you for joining me on this exploration of multimodal AI. Your journey into the next dimension of artificial intelligence starts now.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>machinelearning</category>
      <category>learning</category>
    </item>
    <item>
      <title>GenAI Model Optimization: Guide to Fine-Tuning and Quantization</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Wed, 03 Apr 2024 09:30:06 +0000</pubDate>
      <link>https://dev.to/farrruh/genai-model-optimization-guide-to-fine-tuning-and-quantization-16hp</link>
      <guid>https://dev.to/farrruh/genai-model-optimization-guide-to-fine-tuning-and-quantization-16hp</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes5mbvcg0170kywitpui.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes5mbvcg0170kywitpui.jpeg" alt="Image description" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Artificial Intelligence has transcended from a buzzword to a vital tool in both business and personal applications. As the AI field grows, so does the need for more efficient and task-specific models. This is where fine-tuning and quantization come into play, allowing us to refine pre-built models to better suit our needs and to do so more efficiently. Below is a guide designed to take beginners through the process of fine-tuning and quantizing a language model using Python and the Hugging Face &lt;code&gt;Transformers&lt;/code&gt; library.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Importance of Fine-Tuning and Quantization in AI
&lt;/h2&gt;

&lt;p&gt;Fine-tuning is akin to honing a broad skill set into a specialized one. A pre-trained language model might know a lot about many topics, but through fine-tuning, it can become an expert in a specific domain, such as legal jargon or medical terminology.&lt;/p&gt;

&lt;p&gt;Quantization complements this by making these large models more resource-efficient, reducing the memory footprint and speeding up computation, which is especially beneficial when deploying models on edge devices or in environments with limited computational power.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm2xhxxpkr9onuhhaw25.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm2xhxxpkr9onuhhaw25.jpeg" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Value for Businesses and Individuals
&lt;/h2&gt;

&lt;p&gt;Businesses can leverage fine-tuned and quantized models to create advanced AI applications that didn't seem feasible due to resource constraints. For individuals, these techniques make it possible to run sophisticated AI on standard hardware, making personal projects or research more accessible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge6v4a6v0awpsz77tfio.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge6v4a6v0awpsz77tfio.jpeg" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Your Hugging Face Account
&lt;/h2&gt;

&lt;p&gt;Before tackling the code, you'll need access to AI models and datasets. Hugging Face is the place to start:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Visit &lt;a href="https://huggingface.co/?utm_content=g_1000392349"&gt;Hugging Face&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Sign Up&lt;/strong&gt; to make a new account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complete the registration process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify your email, and you're all set!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7reyke93h8kjbu2okdh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7reyke93h8kjbu2okdh.png" alt="Image description" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Preparing the Environment
&lt;/h3&gt;

&lt;p&gt;First, the necessary libraries are imported. You'll need the &lt;code&gt;torch&lt;/code&gt; library for PyTorch functionality, and the &lt;code&gt;transformers&lt;/code&gt; library from Hugging Face for model architectures and pre-trained weights. Other imports include &lt;code&gt;datasets&lt;/code&gt; for loading and handling datasets, &lt;code&gt;peft&lt;/code&gt; for parameter-efficient fine-tuning (LoRA), and &lt;code&gt;trl&lt;/code&gt; for the supervised fine-tuning trainer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;BitsAndBytesConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PeftModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SFTTrainer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
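&lt;p&gt;Among these imports, &lt;code&gt;peft&lt;/code&gt; is what makes fine-tuning affordable: LoRA trains two small low-rank factors instead of a full weight update. A back-of-the-envelope sketch of why this matters (the hidden size and rank below are illustrative assumptions, not values from a specific model):&lt;/p&gt;

```python
# LoRA replaces a full d x d weight update with two low-rank factors
# (d x r and r x d), so only 2*d*r parameters are trained per adapted
# matrix instead of d*d.

def full_update_params(d: int) -> int:
    """Trainable values in a dense d x d weight update."""
    return d * d

def lora_params(d: int, r: int) -> int:
    """Trainable values in a rank-r LoRA adapter for a d x d matrix."""
    return 2 * d * r

d, r = 4096, 8                    # illustrative hidden size and LoRA rank
full = full_update_params(d)      # 16,777,216
lora = lora_params(d, r)          # 65,536
print(f"LoRA trains {lora:,} params vs {full:,} ({full // lora}x fewer)")
```

&lt;p&gt;This is per adapted matrix; summed over all attention projections, the trainable fraction of a 7B model typically drops well below one percent.&lt;/p&gt;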



&lt;h3&gt;
  
  
  Selecting the Model and Dataset
&lt;/h3&gt;

&lt;p&gt;Next, the code specifies the model and dataset to use, which are crucial for fine-tuning. The &lt;code&gt;model_name&lt;/code&gt; variable holds the identifier of the pre-trained model you wish to fine-tune, and &lt;code&gt;dataset_name&lt;/code&gt; is the identifier of the dataset you'll use for training.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen-7B-Chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;dataset_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlabonne/guanaco-llama2-1k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;new_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen-7B-Chat-SFT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fine-Tuning Parameters
&lt;/h3&gt;

&lt;p&gt;Parameters for fine-tuning are set using &lt;code&gt;TrainingArguments&lt;/code&gt;. This includes the number of epochs, batch size, learning rate, and more, which determine how the model will learn during the fine-tuning process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;training_arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_accumulation_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other arguments
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quantization with BitsAndBytes
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;BitsAndBytesConfig&lt;/code&gt; configures the model for quantization. By setting &lt;code&gt;load_in_4bit&lt;/code&gt; to &lt;code&gt;True&lt;/code&gt;, you're enabling the model to use a 4-bit quantized version, reducing its size and potentially increasing speed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;bnb_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BitsAndBytesConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;use_4bit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_compute_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;compute_dtype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_use_double_quant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;use_nested_quant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
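&lt;p&gt;To see why 4-bit loading matters, a rough weights-only memory estimate (illustrative arithmetic; real usage also includes activations, KV cache, and framework overhead):&lt;/p&gt;

```python
# Approximate memory needed for a 7B-parameter model's weights alone,
# at different precisions. 1 GB is taken as 1e9 bytes.
PARAMS = 7_000_000_000

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes required to store the weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(16)  # 14.0 GB
int4_gb = weight_gb(4)   # 3.5 GB
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB "
      f"({fp16_gb / int4_gb:.0f}x smaller)")
```

&lt;p&gt;The 4x reduction is what lets a 7B model fit on a single consumer GPU.&lt;/p&gt;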



&lt;h3&gt;
  
  
  Fine-Tuning and Training the Model
&lt;/h3&gt;

&lt;p&gt;The model is loaded with the specified configuration, and the tokenizer is prepared. The &lt;code&gt;SFTTrainer&lt;/code&gt; is then used to fine-tune the model on the loaded dataset. After training, the model is saved for future use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantization_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bnb_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other configurations
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SFTTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other configurations
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Evaluating Your Model
&lt;/h3&gt;

&lt;p&gt;With the model fine-tuned and quantized, you can now generate text based on prompts to see how well it performs. This is done using the &lt;code&gt;pipeline&lt;/code&gt; function from &lt;code&gt;transformers&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;s&amp;gt;[INST] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [/INST]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;generated_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Use This Guide
&lt;/h2&gt;

&lt;p&gt;This guide walks you step by step from setting up your environment to running your first fine-tuned and quantized model. Each step is illustrated with a snippet from the code above, along with an explanation of its purpose and guidance on adapting it to your own needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By the end of this tutorial, readers will have a solid understanding of how to fine-tune and quantize a pre-trained language model. This knowledge opens up a new world of possibilities for AI applications, making models more specialized and efficient.&lt;/p&gt;

&lt;p&gt;Remember that the field of AI is constantly evolving, and staying up-to-date with the latest techniques is key to unlocking its full potential. So dive in, experiment, and don't hesitate to share your achievements and learnings with the community.&lt;/p&gt;

&lt;p&gt;Get ready to fine-tune your way to AI excellence!&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;




&lt;p&gt;Follow me on &lt;a href="https://www.alibabacloud.com/blog/genai-model-optimization-guide-to-fine-tuning-and-quantization_600954?utm_content=g_1000392349"&gt;Alibaba Cloud community&lt;/a&gt; to stay tuned!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiops</category>
      <category>cloud</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Igniting the AI Revolution - A Journey with Qwen, RAG, and LangChain</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Thu, 14 Mar 2024 06:59:36 +0000</pubDate>
      <link>https://dev.to/farrruh/igniting-the-ai-revolution-a-journey-with-qwen-rag-and-langchain-2fk7</link>
      <guid>https://dev.to/farrruh/igniting-the-ai-revolution-a-journey-with-qwen-rag-and-langchain-2fk7</guid>
      <description>&lt;p&gt;In the era of Artificial Intelligence (AI), extracting meaningful knowledge from vast datasets has become critical for both businesses and individuals. Enter Retrieval-Augmented Generation (RAG), a breakthrough that has turbocharged the capabilities of AI, empowering systems to not only generate human-like text but also pull in relevant information in real-time. This fusion produces responses that are both rich in context and precise in detail.&lt;/p&gt;

&lt;p&gt;As we set sail on the exciting voyage through the vast ocean of Artificial Intelligence (AI), it's essential to understand the pillars that will be our guiding stars: Generative AI, Large Language Models (LLMs), LangChain, and Hugging Face, along with their practical application in RAG (Retrieval-Augmented Generation).&lt;/p&gt;

&lt;h2&gt;
  
  
  Large Language Models and Generative AI: The Engines of Innovation
&lt;/h2&gt;

&lt;p&gt;At the core of our journey lie Large Language Models (LLMs) and &lt;a href="https://www.alibabacloud.com/solutions/generative-ai?utm_content=g_1000391368" rel="noopener noreferrer"&gt;Generative AI&lt;/a&gt; - two potent engines driving the innovation vessel forward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Language Models (LLMs)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Ff2ffe972a4df238b1a0eb6ff1d91697d436378f4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Ff2ffe972a4df238b1a0eb6ff1d91697d436378f4.png" alt="1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LLMs, such as &lt;a href="https://qwenlm.github.io?utm_content=g_1000391368" rel="noopener noreferrer"&gt;Qwen&lt;/a&gt;, GPT, and others, are the titans of text, capable of understanding and generating human-like language on a massive scale. These models have been trained on extensive corpora of text data, allowing them to predict and produce coherent and contextually relevant strings of text. They are the backbone of many natural language processing tasks, from translation to content creation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generative AI (GenAI)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.alibabacloud.com/solutions/generative-ai?utm_content=g_1000391368" rel="noopener noreferrer"&gt;Generative AI&lt;/a&gt; is the artful wizard of creation within the AI realm. It encompasses technologies that generate new data instances that resemble the training data, such as images, music, and, most importantly for our voyage, text. In our context, Generative AI refers to the ability of AI to craft novel and informative responses, stories, or ideas that have never been seen before. It enables AI to not just mimic the past but to invent, innovate, and inspire.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain: Orchestrating Your AI Symphony
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Ff6b9b1ffe40a0e67e781c9a8a32036ed1a999c85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Ff6b9b1ffe40a0e67e781c9a8a32036ed1a999c85.png" alt="2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://python.langchain.com/docs/get_started/introduction?utm_content=g_1000391368" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; serves as the architect of our AI workflow, meticulously designing the structure that allows for seamless integration and interaction between various AI components. This framework simplifies the complex process of chaining together data flow from intelligent subsystems, including LLMs and retrieval systems, making tasks such as information extraction and natural language understanding more accessible than ever before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hugging Face: The AI Model Metropolis
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2F91866b01ce9ef7890f4809d708a3c6bfcf96251f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2F91866b01ce9ef7890f4809d708a3c6bfcf96251f.png" alt="3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hugging Face stands as a bustling metropolis where AI models thrive. This central hub offers a vast array of pre-trained models, serving as a fertile ground for machine learning exploration and application. To gain entry to this hub and its resources, you must create a Hugging Face account. Once you take this step, the doors to an expansive world of AI await you — just visit &lt;a href="https://huggingface.co/join?utm_content=g_1000391368" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; and sign up to begin your adventure.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG: Harnessing Vector Databases for Accelerated Intelligence
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2F1f22c1d186f8d60735a4665ac026e1ef737af524.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2F1f22c1d186f8d60735a4665ac026e1ef737af524.png" alt="4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a sophisticated AI technique that marries the inventive power of Generative AI with the precision of knowledge retrieval, creating a system that's not only articulate but also deeply informed. To unlock the full potential and efficiency of RAG, it integrates vector databases—a powerful tool for speedily sifting through vast information repositories. Here's an enhanced breakdown of how RAG operates with vector databases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval with&lt;/strong&gt; &lt;a href="https://www.alibabacloud.com/blog/next-level-conversations-llm-%2B-vectordb-with-alibaba-cloud-is-customizable-and-cost-efficient_599985?utm_content=g_1000391368" rel="noopener noreferrer"&gt;&lt;strong&gt;Vector Databases&lt;/strong&gt;&lt;/a&gt;: RAG begins its process by querying a vector database, which houses embedded representations of a large corpus of information. These embeddings are high-dimensional vectors that encapsulate the semantic essence of documents or data snippets. Vector databases enable RAG to perform lightning-fast searches across these embeddings to pinpoint content that is most relevant to a given query, much like an AI swiftly navigating a digital library to find just the right book.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Augmentation with Context&lt;/strong&gt;: The relevant information retrieved from the vector database is then provided to a generative model as contextual augmentation. This step equips the AI with a concentrated dose of knowledge, enhancing its ability to craft responses that are not only creative but also contextually rich and precise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generation of Informed Responses&lt;/strong&gt;: Armed with this context, the generative model proceeds to produce text. Unlike standard generative models that rely solely on learned patterns, RAG weaves in the specifics from the retrieved data, resulting in outputs that are both imaginative and substantiated by the retrieved knowledge. The generation is thus elevated, yielding responses that are more accurate, informative, and reflective of true context.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
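&lt;p&gt;The three steps above can be sketched in a few lines of plain Python. Everything here is a toy stand-in: the bag-of-characters &lt;code&gt;embed&lt;/code&gt; and the prompt-returning "generation" are illustrative only, not a real embedding model or vector database.&lt;/p&gt;

```python
# Toy sketch of the RAG loop: retrieve -> augment -> generate.
# embed() is a deliberately crude stand-in for a real embedding model.

def embed(text):
    # Bag-of-characters vector: one slot per letter a-z.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(question, corpus):
    # 1. Retrieval: pick the document closest to the query in embedding space.
    q_vec = embed(question)
    context = max(corpus, key=lambda doc: cosine(q_vec, embed(doc)))
    # 2. Augmentation: inject the retrieved context into the prompt.
    prompt = ("Answer the question based only on the following context:\n"
              + context + "\nQuestion: " + question)
    # 3. Generation: a real system hands this prompt to an LLM; here we
    #    return it so the assembled input is visible.
    return prompt

corpus = ["Harrison worked at Alibaba Cloud", "FAISS is a vector index"]
print(rag_answer("Where did Harrison work?", corpus))
```

&lt;p&gt;A production version swaps &lt;code&gt;embed&lt;/code&gt; for a sentence-transformer model and the linear scan for a FAISS index, exactly as the script later in this article does.&lt;/p&gt;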

&lt;p&gt;The integration of &lt;a href="https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/user-guide/overview-vector-analysis?utm_content=g_1000391368" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; is key to RAG's efficiency. Traditional metadata search methods can be slower and less precise, but vector databases facilitate near-instantaneous retrieval of contextually relevant information, even from extremely large datasets. This approach not only saves valuable time but also ensures that the AI's responses are grounded in the most appropriate and current information available.&lt;/p&gt;

&lt;p&gt;RAG's prowess is especially advantageous in applications like chatbots, digital assistants, and sophisticated research tools — anywhere the delivery of precise, reliable, and contextually grounded information is crucial. It's not simply about crafting responses that sound convincing; it's about generating content anchored in verifiable data and real-world knowledge.&lt;/p&gt;

&lt;p&gt;Armed with an enriched comprehension of LangChain, Hugging Face, LLMs, GenAI, and the vector database-enhanced RAG, we stand on the brink of a coding adventure that will bring these technologies to life. The Python script we'll delve into represents the synergy of these elements, demonstrating an AI system capable of responding with not just creativity and context but also with a depth of understanding once thought to be the domain of science fiction. Prepare to code and experience the transformative power of RAG with vector databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Begin Coding Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before You Begin: The Essentials
&lt;/h3&gt;

&lt;p&gt;Before we set sail on this tech odyssey, let's make sure you've got all your ducks in a row:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A Linux server, ideally with a GPU card – 'cause let's face it, speed is of the essence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python 3.6 or higher – the magic wand of programming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pip or Anaconda – your handy dandy package managers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you do have a GPU card: NVIDIA drivers, CUDA Toolkit, and cuDNN – the holy trinity of GPU acceleration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Got all that? Fabulous! Let's get our hands dirty (figuratively, of course).&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the Code
&lt;/h3&gt;

&lt;p&gt;By carefully managing your Python dependencies, you ensure that your AI project is built on a stable and reliable foundation. With the dependencies in place and the environment set up correctly, you're all set to run the script and witness the power of RAG and LangChain in action.&lt;/p&gt;
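&lt;p&gt;The imports used throughout this article map to a handful of pip packages. As a starting point, a requirements file might look like the following (versions are deliberately left unpinned; pin whatever matches your environment):&lt;/p&gt;

```text
torch
transformers
langchain
langchain-core
langchain-community
faiss-cpu
sentence-transformers
python-dotenv
```

&lt;p&gt;On a CUDA machine you would typically install faiss-gpu instead of faiss-cpu, along with a CUDA-enabled build of PyTorch.&lt;/p&gt;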

&lt;h3&gt;
  
  
  Setting the Stage: Import Libraries and Load Variables
&lt;/h3&gt;

&lt;p&gt;Before we can embark on our exploration of AI with the LangChain framework and Hugging Face's Transformers library, it's crucial to establish a secure and well-configured environment. This preparation involves importing the necessary libraries and managing sensitive information such as API keys through environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cuda&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.embeddings.huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.llms.huggingface_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFacePipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When working with AI models from Hugging Face, you often need access to the Hugging Face API, which requires an API key. This key is your unique identifier when making requests to Hugging Face services, allowing you to load models and use them in your applications.&lt;/p&gt;

&lt;p&gt;Here's what you need to do to securely set up your environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Obtain Your Hugging Face API Key&lt;/strong&gt;: Once you have created your Hugging Face account, you can find your API key in your account settings under the 'Access Tokens' section.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure Your API Key&lt;/strong&gt;: Your API key is sensitive information and should be kept private. Rather than hard-coding it into your scripts, you should use environment variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a .env File&lt;/strong&gt;: Create a file named .env. This file will store your environment variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add Your API Key to the .env File&lt;/strong&gt;: Open the .env file with a text editor and add your Hugging Face API key in the following format:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HUGGINGFACE_API_KEY=your_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace your_api_key_here with the actual API key you obtained from Hugging Face.&lt;/p&gt;
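&lt;p&gt;After load_dotenv() runs, the key is available through the standard os.environ. Here is a small sketch of a fail-fast lookup; the helper name and the demo value are made up for illustration:&lt;/p&gt;

```python
import os

def require_env(name):
    """Fetch a required environment variable, failing fast with a clear message."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(name + " is not set. Did you create the .env file?")
    return value

# Demo only: pretend load_dotenv() already put the key into the environment.
os.environ.setdefault("HUGGINGFACE_API_KEY", "dummy-key-for-demo")
print(require_env("HUGGINGFACE_API_KEY"))
```

&lt;p&gt;Failing fast like this beats a cryptic authentication error deep inside a model download.&lt;/p&gt;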

&lt;h3&gt;
  
  
  Define the Model Path and Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;modelPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-mpnet-base-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;model_kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we set the path to the pre-trained model that will be used for embeddings. We also configure the device setting, utilizing a GPU if available for faster computation, or defaulting to CPU otherwise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initialize Hugging Face Embeddings and FAISS Vector Store
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;modelPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Made up data, just for fun, but who knows in a future
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Harrison worked at Alibaba Cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We initialize an instance of HuggingFaceEmbeddings with our chosen model and configuration. Then, we create a vectorstore using FAISS, which allows us to perform efficient similarity searches in high-dimensional spaces. We also instantiate a retriever that will fetch information based on the embeddings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set Up the Chat Prompt Template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer the question based only on the following context:
{context}
Question: {question}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we define a chat prompt template that will be used to structure the interaction with the AI. It includes placeholders for context and a question, which will be dynamically filled during the execution of the chain.&lt;/p&gt;
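&lt;p&gt;The substitution itself is ordinary string formatting; you can reproduce what happens to the placeholders with plain str.format, no LangChain required:&lt;/p&gt;

```python
# The same template, filled by hand so the mechanics are visible.
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""

filled = template.format(
    context="Harrison worked at Alibaba Cloud",
    question="Where did Harrison work?",
)
print(filled)
```

&lt;p&gt;When the chain runs, the retriever supplies the context value dynamically and the user's input fills the question slot.&lt;/p&gt;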

&lt;h3&gt;
  
  
  Prepare the Tokenizer and Language Model
&lt;/h3&gt;

&lt;p&gt;In the world of AI and natural language processing, the tokenizer and language model are the dynamic duo that turn text into meaningful action. The tokenizer breaks down language into pieces that the model can understand, while the language model predicts and generates language based on these inputs. In our journey, we're using Hugging Face's AutoTokenizer and AutoModelForCausalLM classes to leverage these capabilities. But it's important to remember that one size does not fit all when it comes to choosing a language model.&lt;/p&gt;

&lt;h4&gt;
  
  
  Model Size and Computational Resources
&lt;/h4&gt;

&lt;p&gt;The size of the model is a critical factor to consider. Larger models like Qwen-72B have more parameters, which generally means they can understand and generate more nuanced text. However, they also require more computational power. If you're equipped with high-end GPUs and sufficient memory, you might opt for these larger models to get the most out of their capabilities.&lt;/p&gt;

&lt;p&gt;On the other hand, smaller models like Qwen-1.8B are much more manageable for standard computing environments. Even this tiny model should be able to run on IoT and mobile devices. While they may not capture the intricacies of language as well as their larger counterparts, they still provide excellent performance and are more accessible for those without specialized hardware.&lt;/p&gt;

&lt;h4&gt;
  
  
  Task-Specific Models
&lt;/h4&gt;

&lt;p&gt;Another point to consider is the nature of your task. If you're building a conversational AI, using a chat-specific model such as Qwen-7B-Chat might yield better results as these models are fine-tuned for dialogues and can handle the nuances of conversation better than the base models.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cost of Inference
&lt;/h4&gt;

&lt;p&gt;Larger models not only demand more from your hardware but may also incur higher costs if you're using cloud-based services to run your models. Each inference takes up processing time and resources, which can add up if you're working with a massive model.&lt;/p&gt;

&lt;h4&gt;
  
  
  Qwen Series
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-1.8B&lt;/strong&gt;: A smaller model suitable for tasks requiring less computational power. Good for prototyping and running on machines without powerful GPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-7B&lt;/strong&gt;: A mid-size model that balances performance with computational demands. Suitable for a range of tasks, including text generation and question-answering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-14B&lt;/strong&gt;: A larger model that can handle more complex tasks with greater nuance in language understanding and generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-72B&lt;/strong&gt;: The largest model in the series, offering state-of-the-art performance for advanced AI applications that require deep language comprehension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-1.8B-Chat&lt;/strong&gt;: A conversational model designed specifically for building chatbots and other dialogue systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-7B-Chat&lt;/strong&gt;: Similar to Qwen-1.8B-Chat, but with increased capacity for handling more complex dialogues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-14B-Chat&lt;/strong&gt;: A high-end conversational model capable of sophisticated dialogue interactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen-72B-Chat&lt;/strong&gt;: The most advanced conversational model in the Qwen series, providing exceptional performance for demanding chat applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Making the Choice
&lt;/h4&gt;

&lt;p&gt;When deciding which model to use, weigh the benefits of a larger model against the available resources and the specific requirements of your project. If you're just starting out or developing on a smaller scale, a smaller model might be the best choice. As your needs grow, or if you require more advanced capabilities, consider moving up to a larger model.&lt;/p&gt;

&lt;p&gt;Remember, the Qwen series is open-source, so you can experiment with different models to see which one fits your project best. Here's how the model selection part of the script could look if you decided to use a different model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This can be changed to any of the Qwen models based on your needs and resources
&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen-7B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_name_or_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen-7B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name_or_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                             &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                             &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We load a tokenizer and a causal language model from Hugging Face with the AutoTokenizer and AutoModelForCausalLM classes, respectively. These components are crucial for processing natural language inputs and generating outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the Text Generation Pipeline
&lt;/h3&gt;

&lt;p&gt;This pipeline is designed to generate text using a language model and a tokenizer that have been previously loaded. Let's break down the parameters and understand their roles in controlling the behavior of the text generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;hf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFacePipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Explanation of Parameters in the Text Generation Pipeline:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;max_new_tokens (8192)&lt;/strong&gt;: This parameter caps the number of new tokens the model may generate, not counting the tokens in the prompt. Tokens can be words, characters, or subwords, depending on the tokenizer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;do_sample (True)&lt;/strong&gt;: When set to True, this parameter enables probabilistic sampling from the distribution of possible next tokens generated by the model. This introduces randomness and variety in the generated text. If set to False, the model would always pick the most likely next token, leading to deterministic and less varied outputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;temperature (0.7)&lt;/strong&gt;: The temperature parameter controls how much randomness is introduced into the sampling process. A lower temperature value (closer to 0) makes the model more confident in its choices, resulting in less random outputs, while a higher temperature value (closer to 1) encourages more randomness and diversity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;top_p (0.95)&lt;/strong&gt;: This parameter controls nucleus sampling, a technique that considers only the most probable tokens with a cumulative probability above the threshold top_p. It helps in generating text that is both diverse and coherent, avoiding the inclusion of very low-probability tokens that could make the text nonsensical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;top_k (40)&lt;/strong&gt;: Top-k sampling limits the sampling pool to the k most likely next tokens. This further refines the set of tokens that the model will consider for generating the next piece of text, ensuring that the outputs remain relevant and coherent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;repetition_penalty (1.1)&lt;/strong&gt;: This parameter discourages the model from repeating the same tokens or phrases, promoting more interesting and diverse text. A value greater than 1 penalizes, and thus reduces, the likelihood of tokens that have already appeared.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
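
&lt;p&gt;The sampling knobs above can be sketched in plain NumPy (a toy illustration, not the actual Hugging Face implementation; repetition_penalty is omitted for brevity):&lt;/p&gt;

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.95, rng=None):
    # Toy re-implementation of temperature, top-k and top-p (nucleus) sampling.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature  # temperature scaling
    order = np.argsort(logits)[::-1][:top_k]                # top-k: keep the k best tokens
    probs = np.exp(logits[order] - logits[order].max())
    probs = probs / probs.sum()
    mass_before = np.cumsum(probs) - probs                  # probability mass ahead of each token
    nucleus = np.less(mass_before, top_p)                   # top-p: keep the smallest nucleus
    probs = probs[nucleus] / probs[nucleus].sum()
    return int(order[nucleus][rng.choice(len(probs), p=probs)])
```

&lt;p&gt;With a near-zero temperature the call becomes effectively greedy; raising it spreads probability over more of the 40-token pool.&lt;/p&gt;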

&lt;p&gt;After setting up the pipeline with the desired parameters, the next line of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFacePipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This wraps the &lt;em&gt;pipe&lt;/em&gt; object in a HuggingFacePipeline. This class is part of the LangChain framework and allows the pipeline to be integrated seamlessly into LangChain's workflow for building AI applications. By wrapping the pipeline, we can use it in conjunction with other LangChain components, such as retrievers and parsers, to create more complex AI systems.&lt;/p&gt;

&lt;p&gt;The careful selection of these parameters allows you to fine-tune the behavior of the text generation to suit the specific needs of your application, whether you're looking for more creative and varied outputs or aiming for consistently coherent and focused text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build and Run the RAG Chain
&lt;/h3&gt;

&lt;p&gt;The code snippet below represents a complete end-to-end RAG system: the initial question prompts a search for relevant information, which is then used to augment the generative process, resulting in an informed and contextually relevant answer to the input question.&lt;/p&gt;

&lt;p&gt;1.  &lt;strong&gt;Chain Construction&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;hf&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what's happening in this part of the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A retriever is used to fetch relevant information based on the query. The retriever’s role is to comb through a dataset or a collection of documents to find the pieces of information that are most pertinent to the question being asked. This is likely using a vector database for efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;RunnablePassthrough()&lt;/em&gt; is a component that simply passes along the question without any modification. This suggests that the chain is designed to handle the question directly, probably as it was entered by a user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;em&gt;prompt&lt;/em&gt; is not shown in detail here, but it likely serves as a template or a set of instructions that formats the input question and the retrieved context in a way that is suitable for the next stage in the pipeline, which is the Hugging Face model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;em&gt;hf&lt;/em&gt; variable represents the Hugging Face pipeline, which is presumably a pre-trained language model capable of generating responses. This pipeline will take the formatted input from the previous step and use its generative capabilities to produce an answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;em&gt;StrOutputParser()&lt;/em&gt; is an output parser, and its job is to take the raw output from the Hugging Face pipeline and parse it into a more user-friendly format, presumably a string.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The use of the | (pipe) operator suggests that this code is using a functional programming style, specifically the concept of function composition or a pipeline pattern where the output of one function becomes the input to the next.&lt;/p&gt;
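
&lt;p&gt;That pipeline pattern can be sketched in a few lines of plain Python (an illustrative stand-in, not LangChain's actual Runnable implementation; the stage functions are made up for the example):&lt;/p&gt;

```python
class Step:
    # Minimal stand-in for a LangChain runnable: wraps a function and
    # overloads | so the output of one step feeds the next.
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Hypothetical stages standing in for the retriever and the prompt template.
retrieve = Step(lambda q: {"context": "docs about " + q, "question": q})
template = Step(lambda d: "Answer using {context}: {question}".format(**d))

chain = retrieve | template
print(chain.invoke("Qwen"))  # prints: Answer using docs about Qwen: Qwen
```

&lt;p&gt;Each | builds a new composed step, so the chain stays a single invokable object, exactly the shape the real retriever | prompt | model | parser chain has.&lt;/p&gt;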

&lt;p&gt;2.  &lt;strong&gt;Chain Invocation&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Where did Harrison work?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this line, the chain is being invoked with a specific question: &lt;em&gt;"Where did Harrison work?"&lt;/em&gt; This invocation triggers the entire sequence of operations defined in the chain. The retriever searches for relevant information, which is then passed along with the question through the prompt and into the Hugging Face model. The model generates a response based on the inputs it receives.&lt;/p&gt;

&lt;p&gt;3.  &lt;strong&gt;Printing Results&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated response is parsed by StrOutputParser() and returned as the final result, which is printed to the console.&lt;/p&gt;

&lt;p&gt;Finally, we construct the RAG chain by linking the retriever, prompt template, Hugging Face pipeline, and output parser. We invoke the chain with our question, and the results are printed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Fcec5fb6342a9e5d80414ca6680d031dccdbfaf90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fyqintl.alicdn.com%2Fcec5fb6342a9e5d80414ca6680d031dccdbfaf90.png" alt="6"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Your Gateway to AI Mastery
&lt;/h2&gt;

&lt;p&gt;You've just taken a giant leap into the world of AI with RAG and LangChain. By understanding and running this code, you're unlocking the potential to create intelligent systems that can reason and interact with information in unprecedented ways.&lt;/p&gt;

&lt;p&gt;Remember, this is only the beginning. The more you experiment and tinker with RAG, the deeper your understanding and the greater your ability to innovate. &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Follow me on &lt;a href="https://community.alibabacloud.com/users/5611950958141783?utm_content=g_1000391368" rel="noopener noreferrer"&gt;Alibaba Cloud Community&lt;/a&gt; to get the latest feed!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>alibaba</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>Deploy Your Own AI Chat Buddy - The Qwen Chat Model Deployment with HuggingFace Guide</title>
      <dc:creator>Farrruh</dc:creator>
      <pubDate>Mon, 26 Feb 2024 05:39:54 +0000</pubDate>
      <link>https://dev.to/farrruh/deploy-your-own-ai-chat-buddy-the-qwen-chat-model-deployment-with-huggingface-guide-26jn</link>
      <guid>https://dev.to/farrruh/deploy-your-own-ai-chat-buddy-the-qwen-chat-model-deployment-with-huggingface-guide-26jn</guid>
      <description>&lt;p&gt;Follow me on &lt;a href="https://www.alibabacloud.com/blog/deploy-your-own-ai-chat-buddy---the-qwen-chat-model-deployment-with-hugging-face-guide_600859?utm_content=g_1000390363"&gt;Alibaba Cloud community&lt;/a&gt; to stay tuned!&lt;/p&gt;




&lt;p&gt;Alright, you tech-savvy human, brace yourself for a thrilling adventure into the land of artificial intelligence! We're not just dipping our toes here; we're diving headfirst into the deep end with the Qwen Chat Model. What's on the agenda? Setting up a chatbot cleverer than a fox, one that respects privacy like a top-notch secret agent. Intrigued? You should be! Let's start our journey by understanding Generative AI and LLMs (Large Language Models).&lt;/p&gt;
&lt;h2&gt;
  
  
  Generative AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.alibabacloud.com/solutions/generative-ai?spm=a3c0i.7911826.6791778070.591.62d93870rUVuHs"&gt;Generative AI&lt;/a&gt; refers to the branch of artificial intelligence focused on creating new content, whether text, images, music, or other forms of media. This type of AI leverages machine learning models, particularly generative models, to understand patterns, features, and relationships in large datasets and generate outputs that are new and often indistinguishable from human-created content.&lt;/p&gt;
&lt;h3&gt;
  
  
  Types of Generative Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generative Adversarial Networks (GANs):&lt;/strong&gt; A type of neural network architecture where two models (the generator and discriminator) are trained simultaneously. The generator creates new data instances while the discriminator evaluates them. The process results in increasingly more convincing outputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Variational Autoencoders (VAEs):&lt;/strong&gt; These models generate new instances similar to the input data. They're often used in image generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transformers:&lt;/strong&gt; Originally designed for NLP tasks, transformer models like GPT (Generative Pretrained Transformer) can generate coherent and contextually relevant text. They are also being adapted for generative tasks for other types of data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
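
&lt;p&gt;To make the GAN idea concrete, here is the adversarial objective evaluated numerically on toy 1-D data (a sketch with an untrained generator and an arbitrary logistic-regression discriminator, not a full training loop):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    # Toy discriminator: logistic regression estimating P(x is real).
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

real = rng.normal(loc=4.0, scale=0.5, size=256)  # samples from the "real" data
fake = rng.normal(size=256)                      # untrained generator output

w, b = 1.0, -2.0  # arbitrary discriminator parameters
# Discriminator loss: classify real samples as real and fakes as fake.
d_loss = -np.mean(np.log(discriminator(real, w, b)) +
                  np.log(1.0 - discriminator(fake, w, b)))
# Generator loss: fool the discriminator into calling fakes real.
g_loss = -np.mean(np.log(discriminator(fake, w, b)))
```

&lt;p&gt;Training would alternate gradient steps on these two losses until the generator's samples become hard to tell apart from the real ones.&lt;/p&gt;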
&lt;h3&gt;
  
  
  Applications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content Creation:&lt;/strong&gt; Generative AI can produce original artwork, write stories or articles, compose music, and create virtual environments for games and simulations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Augmentation:&lt;/strong&gt; It can generate additional training data for machine learning models, helping to improve their accuracy and robustness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Personalization:&lt;/strong&gt; Algorithms can tailor content to individual preferences, improving user engagement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Drug Discovery:&lt;/strong&gt; Generative models can propose new molecular structures for drugs that could be effective against specific diseases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Challenges
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality Control:&lt;/strong&gt; Ensuring that the generated content meets quality standards and is free of biases present in the training data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Computational Requirements:&lt;/strong&gt; Training generative models often requires significant computational power and large datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interpretability:&lt;/strong&gt; Understanding how these models make decisions and generate outputs can be challenging, which impacts trust and reliability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generative AI continues to evolve rapidly, and its capabilities are expanding the boundaries of what machines can create, offering both exciting opportunities and challenges that need to be managed responsibly.&lt;/p&gt;
&lt;h2&gt;
  
  
  LLM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90cvjcz2dftlibhvyb22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90cvjcz2dftlibhvyb22.png" alt="Image description" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What are Large Language Models (LLMs)? They are a type of artificial intelligence based on deep learning techniques that are designed to understand, generate, and work with human language. They are called "large" because they consist of many millions, or even billions, of parameters, which allow them to capture a wide array of language nuances and contexts.&lt;/p&gt;

&lt;p&gt;LLMs are trained on vast amounts of text data and use architectures such as Transformer neural networks, which have the ability to process sequences of data (like sentences) and pay attention to different parts of the sequence when making predictions. This makes them particularly effective for a range of natural language processing (NLP) tasks, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Text generation: LLMs can write essays, create poetry, or generate code based on prompts given to them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Translation: They are capable of translating text between various languages with a high degree of accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Question answering: LLMs can provide answers to questions by understanding context and extracting information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Summarization: They can condense long documents into concise summaries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentiment analysis: LLMs can determine the sentiment behind the text, such as identifying if a review is positive or negative.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why Qwen? A Quick Rundown
&lt;/h2&gt;

&lt;p&gt;Are you on the lookout for an AI that can chat, create content, summarize, code, and much more, all while respecting your right to privacy? Look no further, the Qwen Chat Model is here to transform your data center into a bastion of secure AI-powered interactions.&lt;/p&gt;

&lt;p&gt;Qwen isn't your average chatbot. It's built on a massive language model and has been trained on a staggering 3 trillion tokens of multilingual data. This AI marvel understands both English and Chinese intricately and has been fine-tuned for human-like interaction.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Go Local with Qwen?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1bo4170q9ts9elvwo42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1bo4170q9ts9elvwo42.png" alt="Image description" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deploying Qwen locally on your server is about taking control. It's about ensuring that the conversations you have, the data processed, and the privacy promised remain under your purview. Whether you're a business looking to integrate an intelligent chat system, a developer keen on AI research, or simply an enthusiast eager to explore the bounds of conversational AI, Qwen is your go-to choice.&lt;/p&gt;

&lt;p&gt;Now, why would you want to host this LLM locally? Three words: control, speed, and privacy. You keep your data close to your chest, responses come at lightning speed, and you can rest easy knowing that your chatbot isn't blabbing your secrets to public services.&lt;/p&gt;
&lt;h3&gt;
  
  
  Open-Source and Community-Driven
&lt;/h3&gt;

&lt;p&gt;The spirit of innovation in AI is amplified by the open-source community. In keeping with this tradition, the full source code for the &lt;a href="https://github.com/QwenLM/Qwen"&gt;Qwen Chat Model&lt;/a&gt; is readily available on GitHub for anyone interested in diving into the mechanics of the model, contributing to its development, or simply using it as a learning resource. Whether you're a researcher, developer, or AI hobbyist, you can access the source code at &lt;a href="https://github.com/QwenLM/Qwen"&gt;Qwen&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Before You Begin: The Essentials
&lt;/h2&gt;

&lt;p&gt;Before we set sail on this tech odyssey, let's make sure you've got all your ducks in a row:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A Linux server with a GPU card – 'cause let's face it, speed is of the essence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python 3.6 or higher – the magic wand of programming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pip or Anaconda – your handy dandy package managers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NVIDIA drivers, CUDA Toolkit, and cuDNN – the holy trinity for GPU acceleration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Got all that? Fabulous! Let's get our hands dirty (figuratively, of course).&lt;/p&gt;
&lt;h2&gt;
  
  
  Crafting the Conversation: Where to Run Your Python Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fic19u48qlowgasmwag22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fic19u48qlowgasmwag22.png" alt="Image description" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether you're a die-hard fan of Visual Studio Code, a PyCharm enthusiast, or someone who enjoys the interactive flair of Jupyter Notebooks, the Python code for chatting with Qwen is flexible and IDE-agnostic. All you need is an environment that supports Python, and you're all set to bring your AI chat buddy to life.&lt;/p&gt;

&lt;p&gt;Here's a pro tip: If you're using &lt;strong&gt;VSCode&lt;/strong&gt;, take advantage of the built-in terminal to run your Python scripts seamlessly. Just open the command palette (Ctrl+Shift+P), type Python: Run Python File in Terminal, and let VSCode do the heavy lifting. You'll see Qwen's responses right in your integrated terminal.&lt;/p&gt;

&lt;p&gt;For those of you who prefer &lt;strong&gt;PyCharm&lt;/strong&gt;, running your code is just as smooth. Right-click on your script and select Run 'script_name.py', and watch as the IDE executes your conversation with Qwen. PyCharm's powerful tools and debugging features make it a great choice for developing more complex interactions.&lt;/p&gt;

&lt;p&gt;And it doesn't end there – there's a whole plethora of IDEs and code editors that welcome Python with open arms. Pick the one that suits your workflow best, and start chatting away!&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting Up Shop: The Environment
&lt;/h2&gt;

&lt;p&gt;First things first, let's prep your Linux server. Ensure your package list is as fresh as the morning breeze and that Python and pip are ready to work their magic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;python3 python3-pip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now for the secret ingredient: a virtual environment. It's like having a personal workspace where you can make a mess without someone yelling at you to clean up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--user&lt;/span&gt; virtualenv
virtualenv qwen_env
&lt;span class="nb"&gt;source &lt;/span&gt;qwen_env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Toolbox: Installing Dependencies
&lt;/h2&gt;

&lt;p&gt;Before we bring Qwen to life, you'll need some tools. Think of this as gathering ingredients for a Michelin-star meal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;torch torchvision torchaudio
pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remember to match PyTorch with your CUDA version – it's like pairing a fine wine with the right cheese.&lt;/p&gt;
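
&lt;p&gt;One way to check the pairing is to ask PyTorch itself which CUDA build it was compiled against (a small helper; the CUDA field comes back as None on CPU-only builds):&lt;/p&gt;

```python
import importlib.util

def torch_cuda_report():
    # Report the installed PyTorch version, the CUDA version it was built
    # against, and whether a GPU is actually visible right now.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    return {
        "installed": True,
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,   # e.g. "12.1", or None on CPU builds
        "gpu_available": torch.cuda.is_available(),
    }

print(torch_cuda_report())
```

&lt;p&gt;If the reported CUDA version doesn't match the toolkit on your server, grab the matching wheel from the PyTorch install selector before going further.&lt;/p&gt;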

&lt;h2&gt;
  
  
  Awakening Qwen: Model Initialization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Speaking the Same Language: The Tokenizer
&lt;/h3&gt;

&lt;p&gt;Words are just words until Qwen gives them meaning. That's where the tokenizer comes in, turning your musings into something Qwen can chew on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen-7B-Chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Brains of the Operation: The Model
&lt;/h3&gt;

&lt;p&gt;Qwen's mind is vast and ready to be filled with your conversations. Here's how to wake up the sleeping giant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen-7B-Chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Depending on your hardware, you might opt for different precision modes like BF16 or FP16. It's like tuning your guitar for that perfect pitch.&lt;/p&gt;
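
&lt;p&gt;A hypothetical helper makes that choice explicit: prefer BF16 where the GPU supports it, fall back to FP16, and stay in FP32 otherwise (the returned name is what you would feed to the model-loading call as its dtype):&lt;/p&gt;

```python
def pick_precision(supports_bf16, supports_fp16):
    # Prefer BF16 (wider dynamic range, Ampere-class GPUs and newer),
    # fall back to FP16, and use full FP32 on CPU-only setups.
    if supports_bf16:
        return "bfloat16"
    if supports_fp16:
        return "float16"
    return "float32"
```

&lt;p&gt;Half-precision roughly halves the memory footprint of the 7B model, which is often the difference between fitting on one GPU or not.&lt;/p&gt;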

&lt;h2&gt;
  
  
  Engaging in a Continuous Dialogue with Qwen
&lt;/h2&gt;

&lt;p&gt;Now comes the heart-thumping part – it's time to chat with Qwen! But before you get carried away with the back-and-forth, let's talk about something crucial: the art of conversation continuity.&lt;/p&gt;

&lt;p&gt;Here's a sneak peek at the kind of repartee you can expect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response, history = model.chat(tokenizer, "Greetings, Qwen! How's life in the digital realm?", history=None)
print("Qwen:", response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our opening gambit, we're greeting Qwen with no strings attached – that is, no conversational history. By setting history=None, we're telling Qwen, "This is the start of our chat." Qwen, with nothing but the current prompt to go on, will respond with the freshness of a new interaction.&lt;/p&gt;

&lt;p&gt;Now, watch the magic of context unfold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response, history = model.chat(tokenizer, "Any thoughts on the meaning of life, the universe, and everything?", history=history)
print("Qwen:", response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this round, we pass along the history we received from our previous exchange. This is like handing Qwen a diary of everything we've talked about so far. With this historical context, Qwen can craft a response that's not just witty or profound but also connected to our ongoing conversation. It's the difference between chatting with a wise friend who knows you and asking questions of a stranger.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why 'history' Matters:&lt;/strong&gt; Think of history as the thread that strings our conversation's pearls together. Without it, each response from Qwen would be an isolated pearl, beautiful but solitary. With history, every pearl is knotted securely to the last, creating a beautiful and cohesive string of dialogue. Context is king in conversation, and history is the bearer of context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keeping the Conversation Flowing:&lt;/strong&gt; Just like in human interactions, referring to past comments, jokes, or stories makes for engaging banter. Qwen, armed with the history of the conversation, can recall and reference past exchanges, making for a chat that's as continuous as it's captivating.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
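
&lt;p&gt;The bookkeeping that model.chat() does with history can be mimicked in a few lines (a toy stand-in for illustration, not the real Qwen API):&lt;/p&gt;

```python
def toy_chat(prompt, history=None):
    # Toy stand-in for model.chat(): produce a reply and return the
    # updated history so the next turn can reference earlier exchanges.
    history = list(history or [])
    reply = "reply to {!r} (aware of {} earlier turns)".format(prompt, len(history))
    history.append((prompt, reply))
    return reply, history

r1, h = toy_chat("Greetings, Qwen!", history=None)    # fresh conversation
r2, h = toy_chat("Any thoughts on life?", history=h)  # context carried over
```

&lt;p&gt;Each turn appends a (prompt, reply) pair, so the model always receives the full thread of the conversation, which is exactly why the second real call above passes history=history.&lt;/p&gt;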

&lt;h2&gt;
  
  
  Ready, Set, Converse!
&lt;/h2&gt;

&lt;p&gt;Now that you're a pro on the importance of context with the history parameter, fire up that demo script and get ready for an engaging chat with Qwen. Whether you're discussing the cosmos or the best recipe for digital cookies, Qwen's ready to follow your conversational lead with all the grace of a seasoned conversationalist.&lt;/p&gt;

&lt;p&gt;When you're ready, run the script and start the conversation. It's like opening Pandora's box, but instead of chaos, you get delightful banter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python qwen_chat.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there you have it, my friend – you've got your very own AI chat buddy, ready to conquer the world of conversation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs5leboj9yrr3ajn5qh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs5leboj9yrr3ajn5qh8.png" alt="Image description" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up: The Grand Finale
&lt;/h2&gt;

&lt;p&gt;Congratulations! You've navigated the treacherous waters of AI deployment like a seasoned captain. Qwen is now snugly settled on your server, and your data is as safe as houses.&lt;/p&gt;

&lt;p&gt;Explore the capabilities of Qwen, contribute to its development, and join a community of like-minded individuals passionate about advancing the state of AI conversations. &lt;/p&gt;

&lt;p&gt;So, go forth and engage in epic dialogues with your shiny new AI sidekick. And who knows? Maybe Qwen will surprise you with its digital wisdom or a joke that'll have you ROFL.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>genai</category>
      <category>llm</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
