<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander Uspenskiy</title>
    <description>The latest articles on DEV Community by Alexander Uspenskiy (@alexander_uspenskiy_the_great).</description>
    <link>https://dev.to/alexander_uspenskiy_the_great</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1636857%2Fe769a564-5262-45d0-918e-c4c093972c9d.jpg</url>
      <title>DEV Community: Alexander Uspenskiy</title>
      <link>https://dev.to/alexander_uspenskiy_the_great</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexander_uspenskiy_the_great"/>
    <language>en</language>
    <item>
      <title>How to build AI SDLC Pipeline in 15 minutes using LangGraph: Fully Autonomous Development Team with 5 Agents</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Fri, 13 Mar 2026 23:40:19 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/ai-sdlc-pipeline-5-agentsfully-autonomus-40hb</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/ai-sdlc-pipeline-5-agentsfully-autonomus-40hb</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with "AI-Assisted" Development
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools today are autocomplete on steroids. They make you faster at &lt;em&gt;typing&lt;/em&gt;, but the fundamental loop hasn't changed: you still decompose requirements, design architecture, write code, write tests, and review — one step at a time, context-switching between roles.&lt;/p&gt;

&lt;p&gt;What if you could delegate the &lt;em&gt;whole loop&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;That's the question behind &lt;strong&gt;AI SDLC&lt;/strong&gt; — a multi-agent pipeline where a chain of specialised AI agents handles every phase of the software development life cycle. You write a plain-English task description. One command later, you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A structured software specification&lt;/li&gt;
&lt;li&gt;A full technical design with an implementation checklist&lt;/li&gt;
&lt;li&gt;Working Python source code&lt;/li&gt;
&lt;li&gt;pytest unit tests (edge cases included)&lt;/li&gt;
&lt;li&gt;A code review with severity-coded issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No scaffolding. No boilerplate. No switching tabs.&lt;/p&gt;

&lt;p&gt;The full project is on GitHub with working code: 👉 &lt;a href="https://github.com/alexander-uspenskiy/ai_sdlc" rel="noopener noreferrer"&gt;https://github.com/alexander-uspenskiy/ai_sdlc&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why GPT-4.x models?
&lt;/h2&gt;

&lt;p&gt;GPT-4.x models are used here only for proof-of-concept purposes. For any production environment, it is highly recommended to use GPT-5.3 or higher, or Claude Opus/Sonnet 4.5 or higher (as of the time this article was published).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Landscape: Agentic AI Frameworks in 2026
&lt;/h2&gt;

&lt;p&gt;Before diving into the implementation, it's worth understanding where this fits in the current ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approaches to multi-agent orchestration
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graph/state machine&lt;/td&gt;
&lt;td&gt;Sequential pipelines, conditional routing, checkpointing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AutoGen&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conversation-based&lt;/td&gt;
&lt;td&gt;Back-and-forth agent dialogues, human-in-the-loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Role-based crew&lt;/td&gt;
&lt;td&gt;Parallel task execution, hierarchical delegation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Swarm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handoff-based&lt;/td&gt;
&lt;td&gt;Lightweight, low-boilerplate agent handoffs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic Kernel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plugin/planner&lt;/td&gt;
&lt;td&gt;Enterprise .NET/Python integrations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each has its niche. LangGraph is the right choice here because the SDLC is fundamentally a &lt;strong&gt;directed acyclic pipeline&lt;/strong&gt; with &lt;strong&gt;conditional error exits&lt;/strong&gt;. State flows forward, agents don't loop back, and failures need to short-circuit gracefully. That's exactly what LangGraph's &lt;code&gt;StateGraph&lt;/code&gt; was built for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not just one big prompt?
&lt;/h3&gt;

&lt;p&gt;A single "write me an app from this description" prompt degrades quickly for non-trivial tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context collapse&lt;/strong&gt; — one prompt can't simultaneously be a BA, architect, developer, QA engineer, and reviewer without each role undermining the others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No specialisation&lt;/strong&gt; — a general prompt produces general output; specialised prompts with role-specific context produce expert output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No accountability&lt;/strong&gt; — you can't easily replay from the architect stage if only the code was wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token ceiling&lt;/strong&gt; — a single-turn mega-prompt blows up for anything beyond toy examples&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pipeline approach solves all four.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6fvajqjxmpe35x5sy1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6fvajqjxmpe35x5sy1b.png" alt=" " width="800" height="799"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every node is a LangGraph node. Every edge is either unconditional (start → load_task, write_artifacts → END) or conditional (check &lt;code&gt;state["status"]&lt;/code&gt;, route to &lt;code&gt;error_handler&lt;/code&gt; if &lt;code&gt;"error"&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The entire pipeline shares one &lt;strong&gt;typed state object&lt;/strong&gt; (&lt;code&gt;SDLCState&lt;/code&gt;), defined once and validated throughout:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
from typing import Annotated, Optional, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class SDLCState(TypedDict):
    task_md: str                              # Input
    spec_md: str                              # BA output
    tech_design_md: str                       # Architect output (updated by Dev)
    generated_code: dict[str, str]            # Dev output: filename → content
    test_code: dict[str, str]                 # QA output: filename → content
    code_review_md: str                       # Review output
    project_name: str                         # Extracted from spec
    status: str                               # "running" | "error" | "done"
    current_agent: str
    error: Optional[str]
    messages: Annotated[list[BaseMessage], add_messages]

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Agents return only the keys they modify. LangGraph merges partial updates into the full state automatically.&lt;/p&gt;
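&lt;p&gt;As a rough illustration of that merge behaviour (plain Python, not LangGraph internals; &lt;code&gt;merge_state&lt;/code&gt; and &lt;code&gt;reducers&lt;/code&gt; are illustrative names), keys without a reducer are simply overwritten, while reducer-backed keys such as &lt;code&gt;messages&lt;/code&gt; are combined:&lt;/p&gt;

```python
# Conceptual sketch of partial-update merging, not LangGraph internals:
# plain state keys are overwritten by an agent's partial update, while keys
# that declare a reducer (like `messages` with `add_messages`) are combined.

def merge_state(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged

# `messages` accumulates; every other key is a simple overwrite.
reducers = {"messages": lambda old, new: old + new}

state = {"spec_md": "", "messages": ["task loaded"]}
update = {"spec_md": "# Spec", "messages": ["spec written"]}

state = merge_state(state, update, reducers)
print(state["spec_md"])   # "# Spec"
print(state["messages"])  # ["task loaded", "spec written"]
```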




&lt;h2&gt;
  
  
  The Five Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. BA Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; raw task description&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; structured Markdown spec&lt;/p&gt;

&lt;p&gt;The BA agent takes the free-form task and produces a proper specification document:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
project_name: simple_cli_todo_list

## Overview
A command-line to-do application that runs in a loop...

## Goals
- Provide a simple, interactive interface for managing tasks
- Support add, show, and delete operations

## Functional Requirements
- FR-1: `add "item"` appends a new item to the list
- FR-2: `show` displays all items numbered 1-based
- FR-3: `delete N` removes the item at position N
- FR-4: `quit` exits the loop gracefully

## Non-Functional Requirements
- Pure Python, no external dependencies
- Single-file implementation preferred

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;The first line is always &lt;code&gt;project_name: &amp;lt;snake_case_name&amp;gt;&lt;/code&gt; — this is parsed with a regex and used to name all output folders for the rest of the run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;gpt-4o-mini&lt;/code&gt;?&lt;/strong&gt; Structured document generation from a template is a lightweight task. The mini model is fast, cheap, and plenty capable here.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Architect Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; spec from BA&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; full technical design + implementation checklist&lt;/p&gt;

&lt;p&gt;The Architect produces a complete design document covering components, data models, data flow, tech stack, and file structure. The critical part is the &lt;strong&gt;Implementation Plan&lt;/strong&gt; section — a numbered checklist in &lt;code&gt;- [ ]&lt;/code&gt; format:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
## Implementation Plan

- [ ] 1. Define `TodoList` class with internal list storage
- [ ] 2. Implement `add_item(text)` method
- [ ] 3. Implement `show_items()` method
- [ ] 4. Implement `delete_item(n)` method with bounds checking
- [ ] 5. Write `main()` loop with command parsing
- [ ] 6. Handle invalid commands and out-of-range deletes

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;This checklist isn't just documentation — the Dev Agent &lt;em&gt;updates it&lt;/em&gt; after code generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Dev Agent (two LLM calls)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; technical design&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; source files as &lt;code&gt;{filename: content}&lt;/code&gt; dict + updated tech design&lt;/p&gt;

&lt;p&gt;This is the most complex agent. It makes &lt;strong&gt;two sequential LLM calls&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Call 1 — Code generation:&lt;/strong&gt;&lt;br&gt;
Returns a JSON object mapping filenames to file content. The strict JSON output contract lets us reliably parse multi-file outputs regardless of LLM formatting variations:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
{
  "todo.py": "\"\"\"Simple CLI to-do list.\"\"\"\n\nclass TodoList:\n    ...",
  "main.py": "from todo import TodoList\n\ndef main():\n    ..."
}

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;A &lt;code&gt;_parse_json_output()&lt;/code&gt; helper strips markdown fences before parsing — LLMs are inconsistent about whether they wrap JSON in fenced &lt;code&gt;json&lt;/code&gt; code blocks.&lt;/p&gt;
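&lt;p&gt;A fence-tolerant parser can be sketched in a few lines (the real &lt;code&gt;_parse_json_output()&lt;/code&gt; may differ; this just shows the idea):&lt;/p&gt;

```python
import json
import re

# Sketch of a fence-tolerant JSON parser: strip an optional leading
# ```json (or bare ```) fence and a trailing ``` fence, then parse.

def parse_json_output(raw: str) -> dict:
    text = raw.strip()
    text = re.sub(r"^```[a-zA-Z]*\s*", "", text)  # leading fence, if any
    text = re.sub(r"\s*```$", "", text)           # trailing fence, if any
    return json.loads(text)

fenced = '```json\n{"todo.py": "class TodoList: ..."}\n```'
files = parse_json_output(fenced)
print(list(files))  # ['todo.py']
```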

&lt;p&gt;&lt;strong&gt;Call 2 — Plan update:&lt;/strong&gt;&lt;br&gt;
Takes the tech design + generated filenames, rewrites the implementation plan with all steps marked &lt;code&gt;[x]&lt;/code&gt; and annotated with the file that implements each step:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
- [x] 1. Define `TodoList` class → todo.py
- [x] 2. Implement `add_item(text)` method → todo.py
- [x] 5. Write `main()` loop with command parsing → main.py

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;The updated &lt;code&gt;tech_design_md&lt;/code&gt; (with checked-off plan) replaces the original in state and gets persisted to disk. When you open &lt;code&gt;artifacts/&amp;lt;project&amp;gt;/tech_design.md&lt;/code&gt; after a run, you see exactly what was built and where.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;gpt-4o&lt;/code&gt; for Dev?&lt;/strong&gt; Code generation quality matters. The gap between &lt;code&gt;gpt-4o&lt;/code&gt; and &lt;code&gt;gpt-4o-mini&lt;/code&gt; on code is meaningful, especially for edge case handling, idiom correctness, and docstrings.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. QA Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; all generated source files&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; pytest test files as &lt;code&gt;{filename: content}&lt;/code&gt; dict&lt;/p&gt;

&lt;p&gt;The QA Agent reads every source file and writes comprehensive pytest tests. The key insight in the prompt: &lt;em&gt;test files are given the actual source code, not just the spec&lt;/em&gt; — this means the tests actually match the implementation's structure (real method names, real class names).&lt;/p&gt;
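&lt;p&gt;A hedged sketch of how such a prompt might embed the real source files so generated tests match actual class and method names (the template wording below is an assumption, not the repo's actual prompt):&lt;/p&gt;

```python
# Illustrative QA prompt assembly: concatenate every generated source file
# into the prompt so the model tests real names, not guessed ones.

def build_qa_prompt(generated_code: dict) -> str:
    sections = []
    for filename, content in sorted(generated_code.items()):
        sections.append(f"### {filename}\n{content}")
    sources = "\n\n".join(sections)
    return (
        "Write pytest tests for the following source files. "
        "Cover happy paths, edge cases, and error conditions. "
        "Use unittest.mock for any I/O.\n\n" + sources
    )

prompt = build_qa_prompt({"todo.py": "class TodoList: ..."})
print("### todo.py" in prompt)  # True
```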

&lt;p&gt;Generated tests cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Happy paths (standard usage)&lt;/li&gt;
&lt;li&gt;Edge cases (empty list, boundary indices)&lt;/li&gt;
&lt;li&gt;Error conditions (invalid input, out-of-range delete)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unittest.mock&lt;/code&gt; for any I/O or external calls&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Review Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; all source files + all test files&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; structured Markdown code review&lt;/p&gt;

&lt;p&gt;The review doc follows a consistent schema:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;

## Summary
Clean implementation of the requirements. Single-file structure is appropriate.

## Issues

| # | Severity | Location | Issue | Recommendation |
|---|----------|----------|-------|----------------|
| 1 | 🟡 Minor | todo.py:14 | No type hints on public methods | Add `-&amp;gt; None` / `-&amp;gt; str` annotations |
| 2 | 🔵 Info  | main.py:3  | No `if __name__ == "__main__"` guard | Wrap main() call |

## Test Coverage Assessment
Tests cover all three commands and error paths. Missing: concurrent access scenario (out of scope for CLI).

## Verdict: ✅ Approved

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Severity codes: 🔴 Critical, 🟠 High, 🟡 Minor, 🔵 Info.&lt;/p&gt;




&lt;h2&gt;
  
  
  State Management: The Secret Sauce
&lt;/h2&gt;

&lt;p&gt;LangGraph's state model is what makes this architecture clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context isolation between agents
&lt;/h3&gt;

&lt;p&gt;Each agent resets &lt;code&gt;"messages": []&lt;/code&gt; in its return dict. Because &lt;code&gt;messages&lt;/code&gt; uses LangGraph's &lt;code&gt;add_messages&lt;/code&gt; reducer — which &lt;em&gt;accumulates&lt;/em&gt; messages — returning an empty list clears the accumulated history:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
def ba_agent(state: SDLCState) -&amp;gt; dict:
    response = llm.invoke([SystemMessage(...), HumanMessage(...)])
    return {
        "spec_md": response.content,
        "project_name": _extract_project_name(response.content),
        "current_agent": "ba_agent",
        "messages": [],  # ← clears history for the next agent
    }

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Without this, each subsequent agent would see the entire conversation history from all previous agents — a context bleed that confuses specialised roles and wastes tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conditional error routing
&lt;/h3&gt;

&lt;p&gt;Every edge (except the final ones) uses the same routing factory:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
def _route(next_node: str):
    def route(state: SDLCState) -&amp;gt; str:
        if state.get("status") == "error":
            return "error_handler"
        return next_node
    return route

builder.add_conditional_edges("ba_agent", _route("architect_agent"))
builder.add_conditional_edges("architect_agent", _route("dev_agent"))
# ... etc

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Any agent can fail gracefully by returning &lt;code&gt;{"status": "error", "error": "message"}&lt;/code&gt;. The graph short-circuits to &lt;code&gt;error_handler&lt;/code&gt; without affecting already-written artifacts. This is critical for real-world use where LLM calls occasionally fail or return malformed output.&lt;/p&gt;
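&lt;p&gt;One way to enforce that contract uniformly is a small wrapper that converts uncaught exceptions into the error-status shape the router understands (an illustrative sketch, not code from the repo):&lt;/p&gt;

```python
from functools import wraps

# Illustrative decorator: any exception inside an agent becomes a
# {"status": "error", ...} partial update, so the conditional edges can
# short-circuit the graph to error_handler instead of crashing the run.

def safe_agent(agent_fn):
    @wraps(agent_fn)
    def wrapper(state: dict) -> dict:
        try:
            return agent_fn(state)
        except Exception as exc:
            return {"status": "error", "error": f"{agent_fn.__name__}: {exc}"}
    return wrapper

@safe_agent
def flaky_agent(state: dict) -> dict:
    raise ValueError("LLM returned malformed JSON")

result = flaky_agent({})
print(result["status"])  # error
```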

&lt;h3&gt;
  
  
  Checkpointing and resumability
&lt;/h3&gt;

&lt;p&gt;The graph compiles with &lt;code&gt;MemorySaver()&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
graph = build_graph()  # compiled with MemorySaver checkpointer

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Every invocation gets a unique &lt;code&gt;thread_id&lt;/code&gt; (UUID). This means state is checkpointed at every node boundary. You can resume a failed run or inspect intermediate state without replaying the whole pipeline.&lt;/p&gt;

&lt;p&gt;The CLI exposes this with &lt;code&gt;run-from&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
# Code generation was wrong? Re-run from Dev, reusing existing spec + tech design
python sdlc_cli.py run-from dev

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;This loads persisted artifacts back into state up to the requested restart point, saving both time and API cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  The write_artifacts Node
&lt;/h2&gt;

&lt;p&gt;One of the stronger design decisions: &lt;strong&gt;agents never touch the filesystem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;All agents are pure functions of state → state. The filesystem write is centralised in a single &lt;code&gt;write_artifacts&lt;/code&gt; node that runs only after all agents succeed:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
def write_artifacts(state: SDLCState) -&amp;gt; dict:
    name = state["project_name"]
    write_artifact(name, "spec.md", state["spec_md"])
    write_artifact(name, "tech_design.md", state["tech_design_md"])
    write_artifact(name, "code_review.md", state["code_review_md"])
    all_code = {**state["generated_code"], **state["test_code"]}
    write_code_files(name, all_code)
    return {"status": "done"}

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Testable agents&lt;/strong&gt; — unit tests mock the LLM, never the filesystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic output&lt;/strong&gt; — you don't get partially written artifacts from a failed run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single I/O boundary&lt;/strong&gt; — one place to change output format, destination, or cloud upload&lt;/li&gt;
&lt;/ul&gt;
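&lt;p&gt;The I/O helpers themselves can be tiny; a minimal sketch (real signatures in the repo may differ), with all writes funnelled under the project's output directories:&lt;/p&gt;

```python
from pathlib import Path

# Minimal sketch of the centralised I/O layer: one helper creates the
# project folder and writes a single file; a second fans it out over the
# merged code + test dict. Agents never call these directly.

def write_artifact(project: str, filename: str, content: str,
                   root: str = "artifacts") -> Path:
    path = Path(root) / project / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path

def write_code_files(project: str, files: dict, root: str = "code") -> list:
    return [write_artifact(project, name, body, root)
            for name, body in files.items()]
```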




&lt;h2&gt;
  
  
  Output Structure
&lt;/h2&gt;

&lt;p&gt;After &lt;code&gt;python sdlc_cli.py run&lt;/code&gt; on a "simple CLI to-do list" task:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
artifacts/simple_cli_todo_list/
    spec.md              ← BA spec with functional requirements
    tech_design.md       ← Architect design with ✓ checked implementation plan
    code_review.md       ← Severity-coded review with verdict

code/simple_cli_todo_list/
    todo.py              ← TodoList class implementation
    main.py              ← CLI loop and command parser
    test_todo.py         ← pytest tests for TodoList
    test_main.py         ← pytest tests for the CLI

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Both directories are gitignored — they're runtime outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI Tool Integrations
&lt;/h2&gt;

&lt;p&gt;The pipeline is designed to be &lt;strong&gt;AI-tool agnostic&lt;/strong&gt;. Every popular coding assistant gets its own integration file that delegates to &lt;code&gt;sdlc_cli.py&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Integration&lt;/th&gt;
&lt;th&gt;Invocation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.claude/commands/sdlc.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/sdlc run&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.cursor/commands/sdlc.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@sdlc run&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/prompts/sdlc.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prompt panel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Continue.dev&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.continue/prompts/sdlc.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/sdlc run&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windsurf&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.windsurf/rules/sdlc.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rules panel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; at the repo root is a universal context file — any tool can read it to understand the project architecture and available commands without tool-specific configuration.&lt;/p&gt;

&lt;p&gt;This pattern is increasingly important: your automation shouldn't be locked to a single AI assistant.&lt;/p&gt;




&lt;h2&gt;
  
  
  Advanced Monitoring (LLM, Cost)
&lt;/h2&gt;

&lt;p&gt;For advanced monitoring and debugging, the pipeline integrates with the &lt;a href="https://smith.langchain.com/" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt; dashboard, which records prompts, responses, timing, cost, and more for every LLM call.&lt;/p&gt;




&lt;h2&gt;
  
  
  CLI Reference
&lt;/h2&gt;

&lt;pre&gt;
&lt;code&gt;
# Run full pipeline on input/task.md
python sdlc_cli.py run

# One-liner — write task inline and run
python sdlc_cli.py new "Build a CLI password generator"

# Re-run from a specific agent (reuses prior artifacts)
python sdlc_cli.py run-from dev   # valid: ba, architect, dev, qa, review

# Inspect outputs
python sdlc_cli.py show spec
python sdlc_cli.py show tech_design
python sdlc_cli.py show code
python sdlc_cli.py show code_review

# Check what's been built
python sdlc_cli.py status

# Run framework unit tests (all mocked, no API keys needed)
python sdlc_cli.py test

# Run QA-generated tests for the last project
python sdlc_cli.py test-generated

&lt;/code&gt;
&lt;/pre&gt;




&lt;h2&gt;
  
  
  Testing the Pipeline Itself
&lt;/h2&gt;

&lt;p&gt;The framework ships with its own unit tests in &lt;code&gt;tests/&lt;/code&gt;. These test each agent in isolation — no real API calls, no API keys required:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
# tests/test_dev_agent.py
from unittest.mock import patch

from langchain_core.messages import AIMessage

from sdlc.agents.dev_agent import dev_agent

@patch("sdlc.agents.dev_agent.llm")
def test_dev_agent_makes_two_llm_calls(mock_llm):
    mock_llm.invoke.side_effect = [
        AIMessage(content='{"main.py": "print(\'hello\')"}'),
        AIMessage(content="- [x] 1. Create main.py → main.py"),
    ]
    result = dev_agent(base_state())
    assert mock_llm.invoke.call_count == 2
    assert "main.py" in result["generated_code"]

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;The Dev Agent test specifically asserts &lt;strong&gt;exactly two LLM calls&lt;/strong&gt; — if the implementation changes to make one or three calls, the test catches it. This kind of behavioural assertion is more valuable than output-content assertions for LLM-calling code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Decisions Worth Noting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why separate &lt;code&gt;write_artifacts&lt;/code&gt; from agents?&lt;/strong&gt;&lt;br&gt;
Agents stay pure and testable. A failed run doesn't leave half-written files. One node controls all I/O.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why JSON for multi-file output?&lt;/strong&gt;&lt;br&gt;
Markdown code fences are ambiguous when embedding multiple files. JSON gives a reliable, parseable structure: &lt;code&gt;{"filename.py": "content..."}&lt;/code&gt;. The &lt;code&gt;_parse_json_output()&lt;/code&gt; helper handles LLM fence inconsistencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why reset &lt;code&gt;messages&lt;/code&gt; between agents?&lt;/strong&gt;&lt;br&gt;
Each agent is a standalone expert. Prior conversation context from other agents would confuse the role and waste tokens. Clean slate per agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;gpt-4o&lt;/code&gt; for Dev/QA but &lt;code&gt;gpt-4o-mini&lt;/code&gt; for the rest?&lt;/strong&gt;&lt;br&gt;
Code generation and test generation have the highest quality ceiling — stronger model pays off. Structured document generation (spec, design, review) works well with the faster mini model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why a CLI instead of direct Python calls?&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;sdlc_cli.py&lt;/code&gt; is a single, tool-agnostic interface. Every AI coding assistant can invoke the same commands. No tool-specific knowledge required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;pre&gt;
&lt;code&gt;
git clone https://git.epam.com/alexander_uspensky/ai-sdlc.git
cd ai-sdlc
python -m venv .venv &amp;amp;&amp;amp; .venv\Scripts\activate   # Windows
# source .venv/bin/activate                      # macOS/Linux
pip install -r requirements.txt
cp .env.example .env  # add your OPENAI_API_KEY

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Write your task:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
# Edit input/task.md with your task description, then:
python sdlc_cli.py run

# Or inline:
python sdlc_cli.py new "Build a REST API for a bookmark manager with FastAPI"

&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;Watch the pipeline run:&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;
============================================================
  Multi-Agent SDLC Pipeline
============================================================
[Pipeline] Loading task...
[BA] Analysing requirements...
[Architect] Designing system...
[Dev] Generating code (call 1/2)...
[Dev] Updating implementation plan (call 2/2)...
[QA] Writing tests...
[Review] Reviewing code...
[Pipeline] Writing artifacts to disk...
============================================================
  ✅ Pipeline complete!
  Project  : simple_cli_todo_list
  Artifacts: artifacts/simple_cli_todo_list/
  Code     : code/simple_cli_todo_list/
============================================================

&lt;/code&gt;
&lt;/pre&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The current pipeline is linear — each agent hands off sequentially. Obvious extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel QA + Review&lt;/strong&gt; — once Dev finishes, QA and Review could run concurrently (LangGraph supports fan-out/fan-in natively)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback loops&lt;/strong&gt; — if Review flags Critical issues, route back to Dev for a fix pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangSmith tracing&lt;/strong&gt; — set &lt;code&gt;LANGCHAIN_TRACING_V2=true&lt;/code&gt; and every LLM call is logged with inputs, outputs, latency, and token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model pluggability&lt;/strong&gt; — swap agents to Claude Sonnet 4.6, Gemini 2.0 Flash, or local Llama models without changing graph structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web UI&lt;/strong&gt; — LangGraph's &lt;code&gt;LangGraph Platform&lt;/code&gt; can serve the graph as an API with a streaming interface&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Multi-agent SDLC isn't about replacing developers — it's about automating the &lt;em&gt;mechanical&lt;/em&gt; parts of the cycle so you can focus on the &lt;em&gt;creative&lt;/em&gt; parts: system design decisions, edge case identification, architectural trade-offs.&lt;/p&gt;

&lt;p&gt;The LangGraph approach specifically gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit, auditable data flow&lt;/strong&gt; — state is typed and visible at every step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable error handling&lt;/strong&gt; — any agent can fail gracefully without corrupting output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable architecture&lt;/strong&gt; — add, remove, or swap agents without touching the graph structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumability&lt;/strong&gt; — run from any checkpoint, save API costs on partial failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full project is on GitHub with working code: 👉 &lt;a href="https://github.com/alexander-uspenskiy/ai_sdlc" rel="noopener noreferrer"&gt;https://github.com/alexander-uspenskiy/ai_sdlc&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with LangGraph 1.1.0, LangChain, OpenAI GPT-4o, Python 3.13.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;All agent tests run without API keys — &lt;code&gt;pytest tests/&lt;/code&gt; works out of the box.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>sdlc</category>
      <category>llm</category>
    </item>
    <item>
      <title>Semantic Similarity Score for AI RAG</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Mon, 19 May 2025 17:46:13 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/semantic-similarity-score-for-ai-rag-2fck</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/semantic-similarity-score-for-ai-rag-2fck</guid>
      <description>&lt;h2&gt;
  
  
  What is Semantic Similarity Score?
&lt;/h2&gt;

&lt;p&gt;A semantic similarity score measures how closely two pieces of text (like a question and an answer) relate in meaning—regardless of exact wording. In AI systems, it’s used to rank or retrieve the most relevant answers by comparing their vector embeddings. A higher score (closer to 1) means the texts are more alike in context and intent.&lt;/p&gt;

&lt;p&gt;Think of it as how well your AI understood what you meant—beyond just matching keywords.&lt;/p&gt;
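&lt;p&gt;A toy illustration of the idea using cosine similarity, the usual scoring function over embeddings in RAG pipelines. Real systems compare high-dimensional embeddings from a model; the 3-d vectors here are made up for the example:&lt;/p&gt;

```python
import math

# Cosine similarity over two embedding vectors: the dot product divided by
# the product of the vector norms. Scores closer to 1 mean closer meaning.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.3]
answer_close = [0.8, 0.2, 0.25]  # similar direction to the query
answer_far = [0.1, 0.9, 0.0]     # mostly orthogonal meaning

print(round(cosine_similarity(query, answer_close), 3))  # close to 1
print(round(cosine_similarity(query, answer_far), 3))    # much lower
```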

&lt;p&gt;My next RAG PoC will use the similarity score to decide which source better answers a query: the data in the vector database or the context returned by the web search agent.&lt;/p&gt;

&lt;p&gt;If interested you can find my previous articles on RAG POCs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/alexander_uspenskiy_the_great/build-the-smartest-bot-youve-ever-seen-a-7b-model-web-search-right-on-your-laptop-5eoe"&gt;Build the Smartest AI Bot You’ve Ever Seen — A 7B Model + Web Search, Right on Your Laptop&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/alexander_uspenskiy_the_great/how-to-create-your-own-rag-with-free-llm-models-and-a-knowledge-base-2odm"&gt;How to Create Your Own RAG with Free LLM Models and a Knowledge Base&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Build the Smartest AI Bot You’ve Ever Seen — A 7B Model + Web Search, Right on Your Laptop</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Tue, 22 Apr 2025 21:54:01 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/build-the-smartest-bot-youve-ever-seen-a-7b-model-web-search-right-on-your-laptop-5eoe</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/build-the-smartest-bot-youve-ever-seen-a-7b-model-web-search-right-on-your-laptop-5eoe</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcccdw80t7wqu1mv1vl6j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcccdw80t7wqu1mv1vl6j.jpg" alt="Smartest AI Bot" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary:
&lt;/h2&gt;

&lt;p&gt;RAG Web is a Python-based application that combines web search and natural language processing to answer user queries. It uses DuckDuckGo for retrieving web search results and a Hugging Face Zephyr-7B-beta model for generating answers based on the retrieved context. &lt;/p&gt;

&lt;p&gt;This is my second article related to RAG implementation, the first part related to vector in-memory RAG on your laptop you can find here: &lt;a href="https://dev.to/alexander_uspenskiy_the_great/how-to-create-your-own-rag-with-free-llm-models-and-a-knowledge-base-2odm"&gt;How to create your own RAG&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Web Search RAG Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7cvfq8918s4v0c03vjv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7cvfq8918s4v0c03vjv.png" alt="Web Search RAG Architecture" width="800" height="203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this proof of concept, the user query is sent as input to an external web search. This implementation uses the DuckDuckGo service to avoid the API and security restrictions of more capable search engines such as Google. The search result (as context) is then sent, together with the original user query, to the language model (HuggingFaceH4/zephyr-7b-beta), which summarises the context, extracts the answer, and outputs it to the user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Instructions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Clone / Copy the Project&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/alexander-uspenskiy/rag_web
&lt;span class="nb"&gt;cd &lt;/span&gt;rag_web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Create and Activate Virtual Environment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Install Requirements&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Run the Script&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python rag_web.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How the Script Works
&lt;/h2&gt;

&lt;p&gt;This is a lightweight Retrieval-Augmented Generation (RAG) implementation using:&lt;br&gt;
    • A 7B language model (Zephyr) from Hugging Face&lt;br&gt;
    • DuckDuckGo for real-time web search (no API key needed)&lt;/p&gt;
&lt;h2&gt;
  
  
  Code Breakdown
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Imports and Setup&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;duckduckgo_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DDGS&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;transformers: From Hugging Face, used to load and interact with the LLM.&lt;/li&gt;
&lt;li&gt;DDGS: DuckDuckGo’s Python interface for search queries.&lt;/li&gt;
&lt;li&gt;textwrap: Used for formatting the output neatly.&lt;/li&gt;
&lt;li&gt;re: Regular expressions to clean the model’s output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Web Search Function&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DDGS&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ddgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ddgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Purpose: Takes a user query and performs a web search.&lt;/li&gt;
&lt;li&gt;How it works: Uses the DDGS().text(...) method to fetch search results.&lt;/li&gt;
&lt;li&gt;Returns: A list of snippet texts (just the bodies, without links/titles).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Context Generation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;snippets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snippets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Combines all snippet results into one big context paragraph.&lt;/li&gt;
&lt;li&gt;Applies word wrapping to improve readability (optional for model input but nice for debugging/logging).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Model Initialization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;qa_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HuggingFaceH4/zephyr-7b-beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HuggingFaceH4/zephyr-7b-beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Loads Zephyr-7B, a chat-tuned model from Hugging Face.&lt;/li&gt;
&lt;li&gt;device_map="auto" lets Hugging Face offload model parts across available hardware (e.g., MPS or CUDA).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Question Answering Function&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;a) Get Context&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Performs search and prepares the retrieved content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;b) Prepare Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;[CONTEXT]
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

[QUESTION]
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

[ANSWER]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This RAG-style prompt provides the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[CONTEXT] = retrieved text from the web&lt;/li&gt;
&lt;li&gt;[QUESTION] = user’s query&lt;/li&gt;
&lt;li&gt;[ANSWER] = expected model output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;c) Generate Answer&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;qa_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The model generates text following the [ANSWER] tag.&lt;/li&gt;
&lt;li&gt;do_sample=True allows some creativity/randomness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;d) Post-processing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;answer_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;generated_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[ANSWER]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;[^&amp;gt;]+&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Strips the prompt from the output.&lt;/li&gt;
&lt;li&gt;Removes any stray XML/HTML-style tags the model might emit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. User Interaction Loop&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Opens a CLI loop.&lt;/li&gt;
&lt;li&gt;Reads user input from the terminal.&lt;/li&gt;
&lt;li&gt;Runs the full search + answer pipeline.&lt;/li&gt;
&lt;li&gt;Displays the answer and continues unless the user types exit or quit.&lt;/li&gt;
&lt;/ul&gt;
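
&lt;p&gt;Only the &lt;code&gt;if __name__&lt;/code&gt; guard is shown above; the loop the bullets describe can be sketched as follows. This is a hedged reconstruction, not the repository's exact code: the real script calls &lt;code&gt;answer_question(query)&lt;/code&gt;, which is passed in here as &lt;code&gt;answer_fn&lt;/code&gt; so the sketch stays self-contained.&lt;/p&gt;

```python
# Hedged sketch of the CLI loop described above. answer_fn stands in for the
# answer_question() function defined earlier in the script.
def should_exit(user_input):
    """True when the user typed exit or quit (case-insensitive)."""
    return user_input.strip().lower() in {"exit", "quit"}

def main(answer_fn, input_fn=input, print_fn=print):
    """Read queries from the terminal until the user asks to leave."""
    while True:
        query = input_fn("Ask a question: ")
        if should_exit(query):
            break
        print_fn(answer_fn(query))
```

&lt;p&gt;Passing &lt;code&gt;input_fn&lt;/code&gt; and &lt;code&gt;print_fn&lt;/code&gt; as parameters is an illustrative choice that makes the loop easy to exercise without a terminal.&lt;/p&gt;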

&lt;h2&gt;
  
  
  Architecture Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User Query]
     ↓
DuckDuckGo Search API
     ↓
[Web Snippets]
     ↓
[CONTEXT] + [QUESTION] Prompt
     ↓
Zephyr 7B (Hugging Face)
     ↓
[Generated Answer]
     ↓
Display in Terminal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Zephyr-7B?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Zephyr&lt;/strong&gt; is a family of instruction-tuned, open-weight language models developed by &lt;a href="https://huggingface.co/HuggingFaceH4/zephyr-7b-beta" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;. It's designed to be helpful, honest, and harmless — and small enough to run on consumer hardware.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Characteristics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7 Billion parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Based on Mistral-7B (dense transformer, multi-query attention)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tuning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fine-tuned using DPO (Direct Preference Optimization)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Length&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports up to 8,192 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs locally on M1/M2 Macs, GPUs, or even CPU with quantization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized for dialogue, instructions, and chat use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why I Picked Zephyr for This Script
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open weights&lt;/strong&gt; — no API keys, no rate limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs on laptop&lt;/strong&gt; — 7B is small enough for consumer devices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instruction-tuned&lt;/strong&gt; — great at handling prompts containing context and questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Friendly outputs&lt;/strong&gt; — fine-tuned to be helpful and safe&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy integration&lt;/strong&gt; — via Hugging Face &lt;code&gt;transformers&lt;/code&gt; pipeline&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Compared to Other Models
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zephyr-7B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open, chat-tuned, lightweight&lt;/td&gt;
&lt;td&gt;Slightly less fluent than GPT-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-3.5/4&lt;/td&gt;
&lt;td&gt;Top-tier reasoning&lt;/td&gt;
&lt;td&gt;Closed, pay-per-use, no local use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral-7B&lt;/td&gt;
&lt;td&gt;High-speed base model&lt;/td&gt;
&lt;td&gt;Needs fine-tuning for QA/chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLaMA2 7B&lt;/td&gt;
&lt;td&gt;Open and popular&lt;/td&gt;
&lt;td&gt;Less optimized for chat out-of-box&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Final Thoughts on the Model
&lt;/h2&gt;

&lt;p&gt;Zephyr-7B hits the sweet spot between performance, privacy, and portability. It gives you GPT-style interaction with full local control — and when combined with web search, it becomes a surprisingly capable assistant.&lt;/p&gt;

&lt;p&gt;If you're building a local AI assistant or just want to experiment with RAG pipelines without burning through API tokens, &lt;strong&gt;Zephyr-7B is a strong starting point.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage example
&lt;/h2&gt;

&lt;p&gt;You can see how the RAG pipeline searches for real-time data, adds it to the context, and sends it to the model so the model can generate an answer:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybl8k8j0o9b3agfa8f5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybl8k8j0o9b3agfa8f5v.png" alt="Usage example" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;p&gt;While the baseline implementation is functional and responsive, several optimizations can improve performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Quantization&lt;/strong&gt;: Use 4-bit or 8-bit quantized versions of the model with &lt;code&gt;bitsandbytes&lt;/code&gt; to reduce memory usage and inference time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Inference&lt;/strong&gt;: Implement token streaming for faster perceived response times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching Search Results&lt;/strong&gt;: Avoid redundant queries by caching recent DuckDuckGo results locally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async Execution&lt;/strong&gt;: Use &lt;code&gt;asyncio&lt;/code&gt; to parallelize web search and token generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Truncation&lt;/strong&gt;: Dynamically trim context to fit within model’s token limits, prioritizing relevance.&lt;/li&gt;
&lt;/ul&gt;
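
&lt;p&gt;The caching idea above can be sketched in a few lines with the standard library. This is an illustrative sketch, not part of the original script: &lt;code&gt;search_web&lt;/code&gt; here is a stub standing in for the DuckDuckGo function shown earlier, instrumented to show that repeated queries skip the network call.&lt;/p&gt;

```python
# Hedged sketch: memoise search results so repeated queries skip the network.
from functools import lru_cache

call_count = {"n": 0}

def search_web(query, num_results=3):
    # Stub for the real DuckDuckGo call; counts invocations for illustration.
    call_count["n"] += 1
    return ["snippet about " + query] * num_results

@lru_cache(maxsize=128)
def cached_search(query):
    # lru_cache needs hashable return values, so convert the list to a tuple.
    return tuple(search_web(query))
```

&lt;p&gt;For a long-running service you would likely want a time-based expiry instead of a pure LRU policy, since web results go stale.&lt;/p&gt;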

&lt;h2&gt;
  
  
  Future Enhancements for Enterprise RAG
&lt;/h2&gt;

&lt;p&gt;To scale this into an enterprise-grade RAG system, consider the following enhancements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector Search Integration&lt;/strong&gt;: Complement web search with a hybrid retrieval system using vector embeddings (e.g., FAISS, Weaviate, Pinecone).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Base Sync&lt;/strong&gt;: Sync data from private sources like Confluence, Notion, SharePoint, or document stores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn Memory&lt;/strong&gt;: Add a conversation memory layer using a session buffer or vector memory for context retention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Feedback Loop&lt;/strong&gt;: Incorporate thumbs-up/down voting to improve results and fine-tune retrieval relevance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security &amp;amp; Auditability&lt;/strong&gt;: Wrap API access and logging in enterprise security layers (SSO, encryption, RBAC).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Run inference via model serving tools like vLLM, TGI, or TorchServe with GPU acceleration and autoscaling.&lt;/li&gt;
&lt;/ul&gt;
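
&lt;p&gt;The multi-turn memory enhancement can be sketched as a bounded session buffer that folds recent exchanges into the next prompt's context. All names here are illustrative assumptions, not part of the original script.&lt;/p&gt;

```python
# Hedged sketch of a session buffer for multi-turn memory.
from collections import deque

class SessionMemory:
    def __init__(self, max_turns=5):
        # deque with maxlen silently drops the oldest turn once full.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        # Render recent turns in a form that can be prepended to the prompt.
        return "\n".join("Q: " + q + "\nA: " + a for q, a in self.turns)
```

&lt;p&gt;A vector-store memory would scale further, but a fixed-size buffer like this keeps the prompt within the model's 8K-token context window without extra infrastructure.&lt;/p&gt;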

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This article explores how to build a lightweight Retrieval-Augmented Generation (RAG) assistant using a 7B parameter open-source language model (Zephyr-7B) and real-time web search via DuckDuckGo.&lt;/p&gt;

&lt;p&gt;The solution runs locally, requires no external APIs, and leverages Hugging Face's &lt;code&gt;transformers&lt;/code&gt; library to deliver intelligent, contextual responses to user queries.&lt;/p&gt;

&lt;p&gt;Zephyr-7B was chosen for its balance of performance and portability. It is instruction-tuned, easy to run on consumer hardware, and excels in structured question-answering tasks. When paired with live search results, it creates a powerful, self-contained research assistant.&lt;/p&gt;

&lt;p&gt;This project is ideal for developers looking to experiment with local LLMs, build RAG prototypes, or create privacy-respecting AI tools without relying on paid cloud APIs.&lt;/p&gt;

&lt;p&gt;The full implementation, code walkthrough, and architecture are detailed below.&lt;/p&gt;

&lt;p&gt;The proof-of-concept code is available here: &lt;a href="https://github.com/alexander-uspenskiy/rag_web" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>RAG with free LLM Model</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Thu, 13 Feb 2025 22:05:57 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/rag-with-free-llm-model-44n4</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/rag-with-free-llm-model-44n4</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alexander_uspenskiy_the_great" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1636857%2Fe769a564-5262-45d0-918e-c4c093972c9d.jpg" alt="alexander_uspenskiy_the_great"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alexander_uspenskiy_the_great/how-to-create-your-own-rag-with-free-llm-models-and-a-knowledge-base-2odm" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;How to Create Your Own RAG with Free LLM Models and a Knowledge Base&lt;/h2&gt;
      &lt;h3&gt;Alexander Uspenskiy ・ Dec 16 '24&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#rag&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#vectordatabase&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>python</category>
      <category>ai</category>
      <category>rag</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>DeepSeek on your laptop!</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Thu, 13 Feb 2025 22:05:23 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/deepseek-on-your-laptop-347</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/deepseek-on-your-laptop-347</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alexander_uspenskiy_the_great" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1636857%2Fe769a564-5262-45d0-918e-c4c093972c9d.jpg" alt="alexander_uspenskiy_the_great"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alexander_uspenskiy_the_great/unlock-deepseek-r1-7b-on-your-laptop-experience-the-smartest-ai-model-i-ever-tested-1n49" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Unlock DeepSeek R1 7B on Your Laptop—Experience the Smartest AI Model I Ever Tested!&lt;/h2&gt;
      &lt;h3&gt;Alexander Uspenskiy ・ Jan 31 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#deepseek&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#code&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>python</category>
      <category>code</category>
    </item>
    <item>
      <title>Any alternative to DeepSeek?</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Thu, 13 Feb 2025 22:04:36 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/any-alternative-to-deepseek-8kh</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/any-alternative-to-deepseek-8kh</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alexander_uspenskiy_the_great" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1636857%2Fe769a564-5262-45d0-918e-c4c093972c9d.jpg" alt="alexander_uspenskiy_the_great"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alexander_uspenskiy_the_great/mistrals-small-24b-parameter-model-blows-minds-no-data-sent-to-china-just-pure-ai-power-2p30" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Mistral’s ‘Small’ 24B Parameter Model Blows Minds—No Data Sent to China, Just Pure AI Power!&lt;/h2&gt;
      &lt;h3&gt;Alexander Uspenskiy ・ Feb 4 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#deepseek&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#code&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>code</category>
      <category>python</category>
    </item>
    <item>
      <title>Agentic Ai, is it our future?</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Thu, 13 Feb 2025 22:03:41 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/agentic-ai-is-it-our-future-2bo0</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/agentic-ai-is-it-our-future-2bo0</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alexander_uspenskiy_the_great" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1636857%2Fe769a564-5262-45d0-918e-c4c093972c9d.jpg" alt="alexander_uspenskiy_the_great"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alexander_uspenskiy_the_great/agentic-ai-revolutionizing-next-generation-software-development-teams-509p" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Agentic AI: Revolutionizing Next-Generation Software Development Teams&lt;/h2&gt;
      &lt;h3&gt;Alexander Uspenskiy ・ Feb 13 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#webdev&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#sdl&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>webdev</category>
      <category>ai</category>
      <category>sdl</category>
      <category>programming</category>
    </item>
    <item>
      <title>Agentic AI: Revolutionizing Next-Generation Software Development Teams</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Thu, 13 Feb 2025 19:49:50 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/agentic-ai-revolutionizing-next-generation-software-development-teams-509p</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/agentic-ai-revolutionizing-next-generation-software-development-teams-509p</guid>
      <description>&lt;p&gt;As we move forward at light speed toward the implementation of Agentic AI and Artificial General Intelligence (AGI), I think it’s time to consider a next-generation Software Development Life Cycle in terms of team structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, what is Agentic AI?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agentic AI refers to AI systems that operate with a level of autonomy, decision-making, and adaptability similar to human agents. These AI models can independently plan, execute tasks, and adjust their behavior based on goals, feedback, and environmental changes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Since there is no real Agentic AI on the market at the moment, we need to consider how to incorporate these systems into current or future IT teams.&lt;/p&gt;

&lt;p&gt;I would assume that, in the near future, there will be several proposals for AI-based developers—either universal or language/technology-specific. As a leader, you will need to compose a hybrid team of both human IT specialists and Agentic AI virtual specialists.&lt;/p&gt;

&lt;p&gt;Another assumption is that there will be more than one proposal on the market, meaning we’ll likely see different AI agents (on-premises as LLM/RAG, online, in private/public clouds, etc.). One of the most important questions here is security. I would classify the key parameters as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Security&lt;/li&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Performance&lt;/li&gt;
&lt;li&gt;Redundancy&lt;/li&gt;
&lt;li&gt;Special Requirements (context window, fine-tuning, integrations, etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’re working in government or finance, you should start thinking about on-premises/private cloud resources in advance. For any other commercial use of agents, you can begin evaluating options for the level of security your business model can tolerate.&lt;/p&gt;

&lt;p&gt;I believe that hybrid teams are the near future. For example, in a development team, you might have a set of Agentic AI developers and human dev/engineering leads with specialized AI-related skills. It’s likely that the role of a regular programmer or coder will diminish quickly.&lt;/p&gt;

&lt;p&gt;One of the interesting new fields is Agentic AI synchronization and communication interfaces. Such interfaces should be secure, fast, and agnostic to different model providers.&lt;/p&gt;

&lt;p&gt;I also believe that a “next-gen Jira” could be integrated with these interfaces, allowing a dev lead to assign tasks directly to AI models or let the models select tasks themselves, much like in a classic Agile environment.&lt;/p&gt;
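
&lt;p&gt;As a rough illustration of what such a provider-agnostic task-assignment interface might look like, here is a minimal sketch in Python. All names here (&lt;code&gt;Task&lt;/code&gt;, &lt;code&gt;Agent&lt;/code&gt;, &lt;code&gt;EchoAgent&lt;/code&gt;) are hypothetical and for illustration only, not a real API:&lt;/p&gt;

```python
# Hypothetical sketch of a provider-agnostic agent interface; not a real API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Task:
    """A unit of work a dev lead could assign to a human or an AI agent."""
    title: str
    description: str
    status: str = "todo"


class Agent(Protocol):
    """Minimal contract any agent (human proxy or AI model) would implement."""
    def accept(self, task: Task) -> bool: ...
    def execute(self, task: Task) -> str: ...


class EchoAgent:
    """Toy agent to exercise the interface; a real one would call an LLM."""
    def accept(self, task: Task) -> bool:
        # Only pick up tasks that are still open, as in a classic Agile board.
        return task.status == "todo"

    def execute(self, task: Task) -> str:
        task.status = "done"
        return f"completed: {task.title}"


task = Task(title="write unit tests", description="cover the auth module")
agent = EchoAgent()
if agent.accept(task):
    print(agent.execute(task))  # completed: write unit tests
```

&lt;p&gt;Because the contract is a structural protocol rather than a vendor SDK, a "next-gen Jira" could hand the same &lt;code&gt;Task&lt;/code&gt; to any model provider that satisfies it.&lt;/p&gt;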

&lt;p&gt;Finally, I see that the positions of Scrum Masters, Performance Engineers, and QA specialists could also be handled by Agentic AI. My hope is that software quality will rise as new approaches to automated testing are implemented.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I believe we are, once again, close to the transformation point, and it is wise to start planning the organizational transformation in advance and to understand how to incorporate Agentic AI (or AGI) into existing (or new) IT teams as frictionlessly as possible.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>sdl</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mistral’s ‘Small’ 24B Parameter Model Blows Minds—No Data Sent to China, Just Pure AI Power!</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Tue, 04 Feb 2025 00:56:17 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/mistrals-small-24b-parameter-model-blows-minds-no-data-sent-to-china-just-pure-ai-power-2p30</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/mistrals-small-24b-parameter-model-blows-minds-no-data-sent-to-china-just-pure-ai-power-2p30</guid>
      <description>&lt;p&gt;I've tested the latest release from Mistral: Mistral-Small-24B-Instruct. It is bigger and slower than &lt;a href="https://dev.to/alexander_uspenskiy_the_great/unlock-deepseek-r1-7b-on-your-laptop-experience-the-smartest-ai-model-i-ever-tested-1n49"&gt;deepseek-ai/deepseek-r1-distill-qwen-7b&lt;/a&gt;, but it also shows how it is thinking, and it doesn't send your sensitive data to Chinese soil :)&lt;/p&gt;

&lt;p&gt;So let's start. &lt;/p&gt;

&lt;p&gt;This project provides an interactive chat interface for the mistralai/Mistral-Small-24B-Instruct-2501 model using PyTorch and the Hugging Face Transformers library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+&lt;/li&gt;
&lt;li&gt;PyTorch&lt;/li&gt;
&lt;li&gt;Transformers&lt;/li&gt;
&lt;li&gt;An Apple Silicon device (optional, for MPS support)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;br&gt;
Clone the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/alexander-uspenskiy/mistral.git
&lt;span class="nb"&gt;cd &lt;/span&gt;mistral
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create and activate a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate  &lt;span class="c"&gt;# On Windows use `venv\Scripts\activate`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Install the required packages:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;torch transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your Hugging Face Hub token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HUGGINGFACE_HUB_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_token_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Run the chat interface:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python mistral.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive chat interface with the Mistral-Small-24B-Instruct-2501 model.&lt;/li&gt;
&lt;li&gt;Progress indicator while generating responses.&lt;/li&gt;
&lt;li&gt;Supports Apple Silicon GPU (MPS) for faster inference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;

&lt;span class="c1"&gt;# Check if MPS (Apple Silicon GPU) is available
&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backends&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load the Mistral-Small-24B-Base-2501 model
&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-24B-Instruct-2501&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HUGGINGFACE_HUB_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Optimized for M1 GPU
&lt;/span&gt;    &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show_progress&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;stop_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_set&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|/-&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r&lt;/span&gt;&lt;span class="s"&gt;Generating response &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Interactive terminal loop
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-Small-24B-Instruct-2501 Chat Interface (type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to quit)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;interaction_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Initialize counter
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stop_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;progress_thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;show_progress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;progress_thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;pad_token_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token_id&lt;/span&gt;  &lt;span class="c1"&gt;# Added to avoid warning
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stop_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;progress_thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Clear the progress line
&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mistral: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs                         
Mistral: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs?
Let's denote the cost of the ball as \( B \) and the cost of the bat as \( B + 1.00 \).

According to the problem, the total cost of the bat and the ball is $1.10. Therefore, we can write the equation:

\[ B + (B + 1.00) = 1.10 \]

Simplifying the equation:

\[ 2B + 1.00 = 1.10 \]

Subtract 1.00 from both sides:

\[ 2B = 0.10 \]

Divide both sides by 2:

\[ B = 0.05 \]

So, the ball costs $0.05.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary:
&lt;/h2&gt;

&lt;p&gt;As you can see, modern models can run locally and solve logical tasks with excellent performance.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>code</category>
      <category>python</category>
    </item>
    <item>
      <title>Unlock DeepSeek R1 7B on Your Laptop—Experience the Smartest AI Model I Ever Tested!</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Fri, 31 Jan 2025 22:52:40 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/unlock-deepseek-r1-7b-on-your-laptop-experience-the-smartest-ai-model-i-ever-tested-1n49</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/unlock-deepseek-r1-7b-on-your-laptop-experience-the-smartest-ai-model-i-ever-tested-1n49</guid>
      <description>&lt;p&gt;To be honest, I didn’t have high expectations for the buzz surrounding DeepSeek R1. However, I decided to test the 7B DeepSeek model, deepseek-ai/deepseek-r1-distill-qwen-7b, and what I discovered truly amazed me.&lt;/p&gt;

&lt;p&gt;So let's start.&lt;/p&gt;

&lt;p&gt;This project uses transformer models to generate responses to user queries. It leverages the &lt;code&gt;transformers&lt;/code&gt; library from Hugging Face and &lt;code&gt;torch&lt;/code&gt; for efficient model handling and inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.7 or higher&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip&lt;/code&gt; (Python package installer)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Clone the repository:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/alexander-uspenskiy/deepseek
&lt;span class="nb"&gt;cd &lt;/span&gt;deepseek
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create and activate a virtual environment:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the required packages:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers torch
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Run the main script:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python deepseek.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow the prompts to enter your questions. Type 'quit' to exit the interaction.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deepseek.py&lt;/code&gt;: Main script containing the model setup and response generation logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(venv) $ python deepseek.py

Enter your question (or 'quit' to exit): A bat and a ball costs 1 dollar and 10 cents in total. The bat costs 1 dollar more than a ball. How much does the ball costs?

Response: Question: A bat and a ball costs 1 dollar and 10 cents in total. The bat costs 1 dollar more than a ball. How much does the ball costs?
Answer: 5 cents.

But wait, that doesn't make sense. If the ball is 5 cents, then the bat is 1 dollar more, which would be $1.05, and together that's $1.10. So, the ball is 5 cents.

Wait, but that seems a bit tricky. Let me think again. Maybe I should set up some equations to solve this properly.

Let me denote the cost of the ball as \( x \) dollars. Then, according to the problem, the bat costs \( x + 1 \) dollars. The total cost of the bat and the ball is $1.10, so:

\( x + (x + 1) = 1.10 \)

Simplify the equation:

\( 2x + 1 = 1.10 \)

Subtract 1 from both sides:

\( 2x = 0.10 \)

Divide both sides by 2:

\( x = 0.05 \)

So, the ball costs 5 cents. That seems correct, but let me verify.

If the ball is 5 cents, the bat is $1.05. Adding them together: 5 + 105 = 110 cents, which is $1.10. Yes, that checks out.

Wait, but sometimes these problems can be tricky. Is there any other way to interpret the problem that could lead to a different answer? Maybe the phrasing could be ambiguous. Let me read it again:

"A bat and a ball costs 1 dollar and 10 cents in total. The bat costs 1 dollar more than a ball. How much does the ball costs?"

Hmm, no, the way it's phrased seems pretty straightforward. It says the total is $1.10, and the bat is $1 more than the ball. So, with the equations I set up, it leads to the ball being 5 cents.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the response shows the whole reasoning process, which is amazing for a model that can run on your laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Model ID from HuggingFace
&lt;/span&gt;    &lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/deepseek-r1-distill-qwen-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize tokenizer
&lt;/span&gt;    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load model with lower precision for memory efficiency
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Use fp16 for efficiency
&lt;/span&gt;        &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Automatically handle device placement
&lt;/span&gt;        &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Tokenize input
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate response
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Pass attention_mask
&lt;/span&gt;            &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;pad_token_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Decode and return response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Setup model and tokenizer
&lt;/span&gt;        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setup_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Example QA interaction
&lt;/span&gt;        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Enter your question (or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to exit): &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Answer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;If the model download or execution fails, first confirm that your internet connection is stable, then work through the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Ensure the virtual environment is activated:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reinstall the required packages:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; transformers torch
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check the Python interpreter being used:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;which python
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
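If the steps above don't resolve the problem, a quick sanity check is to confirm that the packages installed earlier actually resolve in the active interpreter (a minimal sketch; the package names match the pip install command above):

```python
import importlib.util

# Confirm the required packages are visible to the current interpreter
for pkg in ("torch", "transformers"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING - reinstall inside the venv'}")
```

If either package reports MISSING, you are most likely running a different Python than the one in your virtual environment, which is exactly what the `which python` check above surfaces.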

</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>python</category>
      <category>code</category>
    </item>
    <item>
      <title>Unlock AI-Powered Image Processing on Your Laptop with Stable Diffusion v1.5 – It’s Easier Than You Think!</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Wed, 29 Jan 2025 16:32:35 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/unlock-ai-powered-image-processing-on-your-laptop-with-stable-diffusion-v15-its-easier-than-you-2e0c</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/unlock-ai-powered-image-processing-on-your-laptop-with-stable-diffusion-v15-its-easier-than-you-2e0c</guid>
      <description>&lt;p&gt;This script leverages Stable Diffusion v1.5 from Hugging Face's Diffusers library to generate image variations based on a given text prompt. By using torch and PIL, it processes an input image, applies AI-driven transformations, and saves the results.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can clone this repo to get the code &lt;a href="https://github.com/alexander-uspenskiy/image_variations" rel="noopener noreferrer"&gt;https://github.com/alexander-uspenskiy/image_variations&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Source code:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StableDiffusionImg2ImgPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BytesIO&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Load and preprocess the input image
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Resize and preserve aspect ratio
&lt;/span&gt;    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;thumbnail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Resampling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LANCZOS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create new image with padding to reach target size
&lt;/span&gt;    &lt;span class="n"&gt;new_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;new_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;paste&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;target_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_image&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_image_variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stable-diffusion-v1-5/stable-diffusion-v1-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guidance_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;7.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generate variations of an input image using a specified prompt

    Parameters:
    - input_image_path: Path or URL to the input image
    - prompt: Text prompt to guide the image generation
    - model_id: Hugging Face model ID
    - num_images: Number of variations to generate
    - strength: How much to transform the input image (0-1)
    - guidance_scale: How closely to follow the prompt
    - seed: Random seed for reproducibility

    Returns:
    - List of generated images
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Set random seed if provided
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load the model
&lt;/span&gt;    &lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StableDiffusionImg2ImgPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load and preprocess the input image
&lt;/span&gt;    &lt;span class="n"&gt;init_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate images
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;init_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;num_images_per_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;strength&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;guidance_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;guidance_scale&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_generated_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Save the generated images with sequential numbering
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images-out/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Example parameters
&lt;/span&gt;    &lt;span class="n"&gt;input_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images-in/Image_name.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# or URL
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draw the image in modern art style, photorealistic and detailed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate variations
&lt;/span&gt;    &lt;span class="n"&gt;generated_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_image_variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;input_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;num_images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;  &lt;span class="c1"&gt;# Optional: for reproducibility
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Save the results
&lt;/span&gt;    &lt;span class="nf"&gt;save_generated_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
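The padding step in load_image centers the resized thumbnail on a white canvas before pasting. The offset arithmetic can be checked in isolation (the helper name here is illustrative, not from the repo):

```python
def letterbox_offsets(src_size, target_size=(768, 768)):
    # Center the pasted image: half the leftover space on each axis,
    # matching the (target - size) // 2 expressions in load_image
    return ((target_size[0] - src_size[0]) // 2,
            (target_size[1] - src_size[1]) // 2)

# A 768x512 thumbnail gets 128px of padding above and below
print(letterbox_offsets((768, 512)))  # -> (0, 128)
```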



&lt;h2&gt;
  
  
  How It Works:
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Load &amp;amp; preprocess the input image&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accepts both local file paths and URLs.&lt;/li&gt;
&lt;li&gt;Converts the image to RGB and resizes it to fit within 768×768 while preserving the aspect ratio.&lt;/li&gt;
&lt;li&gt;Adds white padding to reach the exact target size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Initialize Stable Diffusion v1.5&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads the model on CUDA if available, otherwise falls back to CPU.&lt;/li&gt;
&lt;li&gt;Uses StableDiffusionImg2ImgPipeline to process the input image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Generate AI-modified image variations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes a text prompt to guide the transformation.&lt;/li&gt;
&lt;li&gt;Parameters such as strength (0–1) and guidance_scale (higher = stricter prompt adherence) allow customization.&lt;/li&gt;
&lt;li&gt;Supports multiple output images per prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Save the results&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writes the generated images to the images-out directory with a sequential naming scheme (generated_0.png, generated_1.png, etc.).&lt;/li&gt;
&lt;/ul&gt;
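The sequential naming scheme used by save_generated_images can be sketched on its own (the helper name is illustrative, not from the repo):

```python
def output_paths(num_images, prefix="generated", out_dir="images-out"):
    # Mirrors save_generated_images: generated_0.png, generated_1.png, ...
    return [f"{out_dir}/{prefix}_{i}.png" for i in range(num_images)]

print(output_paths(2))  # -> ['images-out/generated_0.png', 'images-out/generated_1.png']
```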

&lt;h2&gt;
  
  
  Example Use Case
&lt;/h2&gt;

&lt;p&gt;You can transform an image of a person into a medieval king using a prompt like:&lt;br&gt;
&lt;code&gt;prompt = "Draw this person as a powerful king, photorealistic and detailed, in a medieval setting."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial image:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1lufutf67wf6a6lefxf9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1lufutf67wf6a6lefxf9.jpg" alt=" " width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkkercc6f08bsrd25pqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkkercc6f08bsrd25pqz.png" alt=" " width="768" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros &amp;amp; Cons
&lt;/h2&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs locally (no cloud services required).&lt;/li&gt;
&lt;li&gt;Customizable parameters for fine-tuning the output.&lt;/li&gt;
&lt;li&gt;Reproducible results via an optional random seed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be slow on some hardware configurations.&lt;/li&gt;
&lt;li&gt;Quality limitations inherent to a relatively small model.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>diy</category>
      <category>python</category>
      <category>coding</category>
    </item>
    <item>
      <title>Unlock the Magic of Images: A Quick and Easy Guide to Using the Cutting-Edge SmolVLM-500M Model</title>
      <dc:creator>Alexander Uspenskiy</dc:creator>
      <pubDate>Fri, 24 Jan 2025 02:36:19 +0000</pubDate>
      <link>https://dev.to/alexander_uspenskiy_the_great/unlock-the-magic-of-images-a-quick-and-easy-guide-to-using-the-cutting-edge-smolvlm-500m-model-366c</link>
      <guid>https://dev.to/alexander_uspenskiy_the_great/unlock-the-magic-of-images-a-quick-and-easy-guide-to-using-the-cutting-edge-smolvlm-500m-model-366c</guid>
      <description>&lt;p&gt;The model &lt;a href="https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct" rel="noopener noreferrer"&gt;SmolVLM-500M-Instruct&lt;/a&gt; is a state-of-the-art, compact model with 500 million parameters. Despite its relatively small size, its capabilities are remarkably impressive.&lt;/p&gt;

&lt;p&gt;Let's jump to the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForVision2Seq&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;

&lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filterwarnings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Some kwargs in processor config are unused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_and_describe_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HuggingFaceTB/SmolVLM-500M-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForVision2Seq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HuggingFaceTB/SmolVLM-500M-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe the content of this &amp;lt;image&amp;gt; in detail, give only answers in a form of text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;pixel_values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pixel_values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images/bender.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;upload_and_describe_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image Description:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Python script uses the Hugging Face Transformers library to generate a textual description of an image. It loads a pre-trained vision-language model and its processor, encodes the input image, generates descriptive text by sampling (temperature 0.7, up to 150 new tokens), decodes and strips the result, and prints it. Any exception raised along the way is caught and reported.&lt;/p&gt;
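&lt;p&gt;For comparison, here is a minimal, self-contained sketch of the same idea using the BLIP captioning model from the Transformers hub. The model name, image path, and generation length are illustrative choices, not the exact setup from this article's repository:&lt;/p&gt;

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image


def describe_image(image_path: str) -> str:
    """Generate a short caption for an image with a pre-trained VLM."""
    # Load the processor (image preprocessing) and the captioning model.
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    # Encode the image into tensors the model expects.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Greedy decoding is enough for a short caption; raise max_new_tokens
    # (or enable sampling, as in the script above) for longer descriptions.
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


if __name__ == "__main__":
    print(describe_image("images/bender.jpg"))
```

&lt;p&gt;The first run downloads the model weights; subsequent runs use the local cache, which is what keeps this approach fast and lightweight compared to calling a full LLM.&lt;/p&gt;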

&lt;blockquote&gt;
&lt;p&gt;You can download it here: &lt;a href="https://github.com/alexander-uspenskiy/vlm" rel="noopener noreferrer"&gt;https://github.com/alexander-uspenskiy/vlm&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Based on this original, non-stock image (place it in the images directory of the project): &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lubhpk7sixmcz13590z.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lubhpk7sixmcz13590z.jpg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Take a look at the description generated by the model (you can play with the prompt and parameters in the code to format the output better for any purpose): &lt;strong&gt;The robot is sitting on a couch. It has eyes and mouth. He is reading something. He is holding a book with his hands. He is looking at the book. In the background, there are books in a shelf. Behind the books, there is a wall and a door. At the bottom of the image, there is a chair. The chair is white. The chair has a cushion on it. In the background, the wall is brown. The floor is grey.  in the image, the robot is silver and cream color. The book is brown.  The book is open. The robot is holding the book with both hands. The robot is looking at the book. The robot is sitting on the couch.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The result looks excellent, and the model is both fast and resource-efficient compared to full-size LLMs.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vlm</category>
      <category>python</category>
      <category>howto</category>
    </item>
  </channel>
</rss>
