<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pielouNW</title>
    <description>The latest articles on DEV Community by pielouNW (@pielounw).</description>
    <link>https://dev.to/pielounw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829247%2F74724489-a661-4b45-9359-d82b4f28c1f3.png</url>
      <title>DEV Community: pielouNW</title>
      <link>https://dev.to/pielounw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pielounw"/>
    <language>en</language>
    <item>
      <title>Beginner's Guide to Essential Terms in Artificial Intelligence</title>
      <dc:creator>pielouNW</dc:creator>
      <pubDate>Thu, 07 May 2026 12:14:58 +0000</pubDate>
      <link>https://dev.to/pielounw/beginners-guide-to-essential-terms-in-artificial-intelligence-4ip1</link>
      <guid>https://dev.to/pielounw/beginners-guide-to-essential-terms-in-artificial-intelligence-4ip1</guid>
      <description>&lt;p&gt;AI has its own language, and if you're just getting started, it can feel like everyone else got the memo but you.&lt;/p&gt;

&lt;p&gt;Terms like &lt;em&gt;tokens&lt;/em&gt;, &lt;em&gt;inference&lt;/em&gt;, and &lt;em&gt;quantization&lt;/em&gt; get tossed around in articles, videos, and job descriptions as if they're common knowledge, but they're not.&lt;/p&gt;

&lt;p&gt;This guide helps you navigate the AI jungle: it covers the core AI vocabulary you'll encounter and defines each term simply. Whether you're building something, exploring the field, or just trying to follow the conversation, these are the terms worth knowing.&lt;/p&gt;

&lt;p&gt;You can also find this article on the &lt;a href="https://www.nobodywho.ai/" rel="noopener noreferrer"&gt;NobodyWho website&lt;/a&gt;, where you can also learn how to integrate LLMs into your applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The AI Stack
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The big picture.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Artificial Intelligence (AI)
&lt;/h3&gt;

&lt;p&gt;The field of computer science focused on building systems that can perform tasks normally requiring human intelligence, such as understanding text or audio, recognizing images, or making decisions.&lt;/p&gt;

&lt;p&gt;The term "AI" is ultimately a moving target. As the border between machine tasks and human tasks moves, the definition shifts too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine Learning (ML)
&lt;/h3&gt;

&lt;p&gt;A subset of AI where systems learn from data instead of being explicitly programmed. Rather than writing rules by hand, you feed the system examples and it figures out the patterns on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deep Learning (DL)
&lt;/h3&gt;

&lt;p&gt;A subset of machine learning that uses neural networks with many layers to learn from large amounts of data. It's the technology behind most modern AI breakthroughs. State-of-the-art systems for image recognition, speech synthesis, and large language models all rely on it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generative AI (Gen AI)
&lt;/h3&gt;

&lt;p&gt;Generative AI refers to Artificial Intelligence systems that are capable of creating new content such as text, images, audio, video, or code.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. How Models Are Built
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Behind the scenes.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataset
&lt;/h3&gt;

&lt;p&gt;A structured collection of data used to train, test, or evaluate a model. Datasets can contain text, images, audio, or any other form of information. The quality and size of a dataset directly affect how well a model performs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training
&lt;/h3&gt;

&lt;p&gt;The process of exposing a model to data so it can learn patterns. During training, the model adjusts its internal parameters millions (or billions) of times to get better at its task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parameters / Weights
&lt;/h3&gt;

&lt;p&gt;The internal numerical values a model learns during training. Parameters are what the model actually "knows", before being fed any prompts. They encode the patterns extracted from training data. A model with 70 billion parameters has 70 billion of these numbers, all tuned to make its outputs as accurate as possible. Weights is another term for the same thing, often used when referring to the files you download for open-weight models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fine-tuning
&lt;/h3&gt;

&lt;p&gt;The process of taking a pre-trained model and continuing to train it on a smaller, specialized dataset to adapt it to a specific task or style. Fine-tuning is faster and cheaper than training from scratch, and it's how generic models get turned into domain-specific ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distillation
&lt;/h3&gt;

&lt;p&gt;A training technique where a smaller model (the student) is trained to mimic the behavior of a larger model (the teacher). The goal is to compress the capabilities of a large, expensive model into a smaller, faster one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantization
&lt;/h3&gt;

&lt;p&gt;A technique for reducing a model's size by lowering the precision of its weights, for example going from 32-bit floats to 8-bit integers. Quantized models are faster and cheaper to run, with a trade-off in accuracy.&lt;/p&gt;
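&lt;p&gt;The core idea can be sketched in a few lines of Python. This is a toy symmetric int8 quantizer over a handful of made-up weights, not any particular library's implementation:&lt;/p&gt;

```python
weights = [0.12, -0.5, 0.33, 0.91, -0.07]      # 32-bit floats in a real model

scale = max(abs(w) for w in weights) / 127.0   # map the largest weight to +/-127
q = [round(w / scale) for w in weights]        # 8-bit integers: 1 byte each instead of 4
dequantized = [v * scale for v in q]           # approximate reconstruction at runtime

print(q)            # small integers
print(dequantized)  # close to, but not exactly, the original weights
```

&lt;p&gt;The reconstruction error is the accuracy trade-off mentioned above: each weight is off by at most half a quantization step.&lt;/p&gt;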




&lt;h2&gt;
  
  
  3. What a Model Is
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The different shapes a model can take.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Model
&lt;/h3&gt;

&lt;p&gt;The output of training: one or more files encoding what the system has learned about mapping inputs to outputs. Models can range from a few megabytes to several terabytes.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM (Large Language Model)
&lt;/h3&gt;

&lt;p&gt;A type of deep learning model trained on massive amounts of text data to understand and generate human language. LLMs like GPT, Claude, and Gemini predict the next most likely word/token given a context. They're the engine behind most modern AI chat and writing tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  SLM (Small Language Model)
&lt;/h3&gt;

&lt;p&gt;A language model trained with fewer parameters than a typical LLM, designed to run efficiently on limited hardware like laptops, phones, or even smartwatches. SLMs are not categorically different from LLMs, but simply smaller variants.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixture of Experts (MoE)
&lt;/h3&gt;

&lt;p&gt;An architecture where only a fraction of the model's parameters are used for any given token, rather than all of them. This means MoE models still need a lot of memory to hold all the weights, but they spend less compute per token, so they run faster than a dense model of equivalent size. Recent examples include DeepSeek, Mixtral, and Qwen's MoE variants.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open-weights Models
&lt;/h3&gt;

&lt;p&gt;Models that are publicly released, allowing anyone to download, run, and fine-tune them. Popular examples include Llama 3, Mistral, Qwen, Gemma and DeepSeek.&lt;/p&gt;

&lt;p&gt;The term "open-weights" is used rather than "open-source" to specify exactly what is being released. "Open-source" refers to the publishing of source code, which is human-readable code used to produce a non-human-readable binary artifact (the compiled program). The model itself is a non-human-readable binary artifact, so the term "open-weights" is used to specify that it's the &lt;em&gt;weights&lt;/em&gt; of the model that are open, and not necessarily the training source code or dataset that was used to produce the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vision Model
&lt;/h3&gt;

&lt;p&gt;A model specialized in processing and understanding images. Vision models can classify what's in an image, detect objects, generate captions, or power visual search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal Model
&lt;/h3&gt;

&lt;p&gt;A model that can process or generate more than one type of data like text, images, audio or video. For example, GPT-4o and Gemini are multimodal: you can send them an image and ask a question about it, or have them describe what they hear in an audio file.&lt;/p&gt;

&lt;p&gt;Multimodal models aren't necessarily capable of ingesting and outputting the same types of data. Many multimodal models can receive image or text inputs, and only generate text outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Model vs. Thinking Model
&lt;/h3&gt;

&lt;p&gt;These terms are often used interchangeably, but there's a subtle distinction.&lt;br&gt;
A &lt;strong&gt;reasoning model&lt;/strong&gt; is explicitly trained or prompted to work through problems step by step before producing an answer, breaking complex tasks into logical stages.&lt;br&gt;
A &lt;strong&gt;thinking model&lt;/strong&gt; typically refers to models that have a dedicated internal "thinking" phase, where the model processes before responding.&lt;br&gt;
In practice, both aim to improve accuracy on complex tasks by slowing down the output process.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Language &amp;amp; Text Processing
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;How models read and represent text under the hood.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Token
&lt;/h3&gt;

&lt;p&gt;The basic unit an LLM processes. For text, a token is roughly a word fragment: "learning" might be one token, while "incredible" might be split into two tokens, "in" and "credible". Models don't read characters or full words; they read tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tokenization
&lt;/h3&gt;

&lt;p&gt;The process of converting some kind of input (text, image, audio, etc.) into tokens. All model inputs are converted to tokens before being fed into the model.&lt;/p&gt;
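&lt;p&gt;As a toy illustration in Python: real tokenizers (such as BPE) learn their vocabulary from data, but a greedy longest-match over a small fixed vocabulary shows the basic idea of splitting text into subword units:&lt;/p&gt;

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens = []
    i = 0
    while i != len(text):
        # try the longest substring first, backing off until a vocab entry matches
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character falls back to itself
            i += 1
    return tokens

vocab = {"in", "credible", "learn", "ing", " "}
print(tokenize("incredible", vocab))  # ['in', 'credible']
print(tokenize("learning", vocab))    # ['learn', 'ing']
```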

&lt;h3&gt;
  
  
  Embeddings
&lt;/h3&gt;

&lt;p&gt;A way of representing data (text, images, audio) as vectors (lists of numbers), in a high-dimensional space. Similar concepts end up close together in that space. Embeddings are what allow models to understand that "king" and "queen" are related, or that a photo of a cat is similar to the word "cat."&lt;/p&gt;

&lt;p&gt;Embeddings are particularly useful in RAG systems, to identify relevant sources of information to include.&lt;/p&gt;
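&lt;p&gt;"Close together" is usually measured with cosine similarity. Here is a minimal Python sketch, with hypothetical hand-picked 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions and come from a trained model:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """1.0 means pointing the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for illustration only.
king  = [0.9, 0.8, 0.1, 0.2]
queen = [0.8, 0.9, 0.2, 0.1]
pizza = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(king, queen))  # high: related concepts
print(cosine_similarity(king, pizza))  # low: unrelated
```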




&lt;h2&gt;
  
  
  5. Using a Model
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The controls and inputs that shape how a model behaves.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt
&lt;/h3&gt;

&lt;p&gt;The input you give to a model. For language models, a prompt is the text, like a question, instruction, or context that the model responds to. For multimodal models, the prompt could also contain an image or some audio. Prompt quality directly affects output quality. Small changes in wording can produce significantly different results.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Prompt
&lt;/h3&gt;

&lt;p&gt;A special prompt, invisible to the end user, that sets the model's behavior, tone, and constraints before the conversation begins. Developers use system prompts to give a model its "personality" or restrict what it can and can't do. It's a configuration layer on top of the model. Most models are trained to prioritize following instructions in the system prompt over any subsequent instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Window
&lt;/h3&gt;

&lt;p&gt;The maximum amount of text a model can process at once, both input and output combined. If a model has a 128k token context window, it can "see" roughly 100,000 words at a time, since there are roughly 0.75 words per token. Anything outside the context window is invisible to the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Latency
&lt;/h3&gt;

&lt;p&gt;The time it takes for a model to respond after receiving input.&lt;br&gt;
In AI products, latency matters for user experience. It's influenced by model size, the device it's running on, and whether the output is streamed token by token or returned all at once.&lt;br&gt;
It's useful to measure both the time-to-first-token (TTFT) and the time to complete an entire response. In use cases where you can stream tokens to the user as they are generated, TTFT matters most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token throughput
&lt;/h3&gt;

&lt;p&gt;Typically measured in &lt;em&gt;tokens per second&lt;/em&gt;, this is a measure of how quickly the model can process and generate tokens.&lt;br&gt;
Reading (processing input) and writing (generating output) happen at very different speeds: the token throughput for reading is often around 10x that of writing.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Sampling
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Controlling how tokens are selected.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sampling
&lt;/h3&gt;

&lt;p&gt;Under the hood, generative language models output a probability distribution for the next token in a sequence. If given the sequence of tokens &lt;code&gt;["Once ", "upon ", "a "]&lt;/code&gt;, a model might output a distribution with a high probability for the token "time", a much lower probability for the token "hill", and an incredibly low probability for nonsense tokens like "13".&lt;/p&gt;

&lt;p&gt;In order to actually generate a sequence, we must select one of these tokens to accept and output to the user. This process of selecting a token from the probability distribution is known as sampling.&lt;/p&gt;

&lt;p&gt;While it's tempting to simply select the most probable token, it has been shown that language models generate much better outputs when some randomness is applied. The field of sampling in LLMs is about designing exactly how this random selection works.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Sampler Chain
&lt;/h3&gt;

&lt;p&gt;A sampler consists of two phases:&lt;/p&gt;

&lt;p&gt;First, any number of transformations are applied to the probability distribution. These steps might zero out the probability of certain tokens, or reshape the distribution across all tokens.&lt;/p&gt;

&lt;p&gt;You can play with the token probabilities on &lt;a href="https://artefact2.github.io/llm-sampling/" rel="noopener noreferrer"&gt;this website&lt;/a&gt;. If you drag and drop the sampling steps, you may notice that the order in which steps are applied can change the result.&lt;/p&gt;

&lt;p&gt;Once the sequence of transformations has been applied, the sampler chain finalizes by selecting a token from that distribution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Greedy Sampling
&lt;/h3&gt;

&lt;p&gt;Greedy sampling is the sampling technique where you always select the most probable token, sidestepping any randomness in the sampling process. Greedy sampling leads to very predictable and boring output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dist sampling
&lt;/h3&gt;

&lt;p&gt;Dist sampling is the practice of selecting a token randomly, weighted by each token's probability.&lt;/p&gt;
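&lt;p&gt;Both greedy and dist sampling fit in a few lines of Python, using a made-up next-token distribution rather than real model output:&lt;/p&gt;

```python
import random

# A toy distribution a model might output after the prompt "Once upon a "
probs = {"time": 0.92, "hill": 0.05, "midnight": 0.02, "13": 0.01}

# Greedy sampling: always take the most probable token
greedy = max(probs, key=probs.get)

# Dist sampling: pick randomly, weighted by each token's probability
token = random.choices(list(probs), weights=list(probs.values()))[0]

print(greedy)  # 'time', every time
print(token)   # usually 'time', occasionally something else
```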

&lt;h3&gt;
  
  
  Temperature
&lt;/h3&gt;

&lt;p&gt;A transformation applied to the token probability distribution to shift it towards preferring either the more probable or the less probable tokens.&lt;br&gt;
If a temperature greater than 1 is applied, the high-probability tokens are made less likely, and the low-probability tokens are made more likely.&lt;br&gt;
If a temperature less than 1 is applied, the high-probability tokens are made more likely, and the low-probability tokens are made less likely.&lt;br&gt;
A temperature of exactly 1 has no effect.&lt;/p&gt;

&lt;p&gt;Low temperature makes the model more focused and deterministic, making it feel measured and predictable.&lt;br&gt;
High temperature introduces more randomness and variation, making it feel creative and spontaneous.&lt;/p&gt;
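&lt;p&gt;Concretely, temperature divides the model's raw scores (logits) before they are turned into probabilities with a softmax. A minimal Python sketch over made-up logits:&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]

print(softmax_with_temperature(logits, 1.0))  # baseline distribution
print(softmax_with_temperature(logits, 0.5))  # sharper: the top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more randomness
```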

&lt;h3&gt;
  
  
  Top-k
&lt;/h3&gt;

&lt;p&gt;Top-k limits the model to choosing from only the k most likely tokens at each step. For example, top-k of 40 means only the 40 most probable options are considered. &lt;/p&gt;

&lt;h3&gt;
  
  
  Top-p
&lt;/h3&gt;

&lt;p&gt;Top-p (also called nucleus sampling) is more dynamic: it picks from the smallest group of tokens whose combined probability adds up to p, so at top-p of 0.9, the model considers just enough tokens to cover 90% of the probability mass.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grammar
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://en.wikipedia.org/wiki/Formal_grammar" rel="noopener noreferrer"&gt;formal grammar&lt;/a&gt; can be applied as a transformation on token probabilities. It excludes tokens (by setting their probability to zero) that cannot possibly lead to a valid completion under the grammar. This guarantees that the output is always compatible with a well-defined language, so a given parser will always succeed. For example, you can apply a formal grammar to force the model to only output valid JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  DRY
&lt;/h3&gt;

&lt;p&gt;A transformation that reduces the likelihood of tokens if they have been used recently. This is useful for preventing models from repeating themselves.&lt;/p&gt;
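&lt;p&gt;As a simplified sketch: the actual DRY sampler penalizes repeated &lt;em&gt;sequences&lt;/em&gt; of tokens, but the basic idea of down-weighting recently used tokens looks like this in Python:&lt;/p&gt;

```python
def penalize_recent(probs, recent_tokens, penalty=0.5):
    """Scale down the probability of recently used tokens, then renormalize."""
    adjusted = {t: (p * penalty if t in recent_tokens else p) for t, p in probs.items()}
    total = sum(adjusted.values())
    return {t: p / total for t, p in adjusted.items()}

probs = {"the": 0.5, "a": 0.3, "this": 0.2}
recent = ["the", "the"]  # 'the' was just generated twice

print(penalize_recent(probs, recent))  # 'the' is now less likely than before
```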




&lt;h2&gt;
  
  
  7. How Models Think &amp;amp; Respond
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What's actually happening when a model generates an output.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference
&lt;/h3&gt;

&lt;p&gt;The act of running a trained model on a new input to get an output. Training is when a model learns; inference is when it's actually used. Most of what happens when you use an AI product (chatting, generating images, transcribing audio) is inference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chain-of-Thought (CoT)
&lt;/h3&gt;

&lt;p&gt;A prompting technique where the model is encouraged to reason step by step before giving a final answer, rather than jumping straight to a conclusion. By writing out intermediate reasoning, like a person writing their thoughts on paper, the model tends to make fewer mistakes on complex tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hallucination
&lt;/h3&gt;

&lt;p&gt;When a model generates information that sounds confident but is factually wrong or completely made up. Hallucinations happen because models predict plausible-sounding text, not verified truth.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Advanced Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Methods that extend or enhance what models can do.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG (Retrieval-Augmented Generation)
&lt;/h3&gt;

&lt;p&gt;A technique where a model retrieves relevant external information before generating a response. Instead of relying solely on what it learned during training, the model pulls in fresh data from a database or document store at inference time. It's a practical way to keep responses accurate and up to date.&lt;/p&gt;
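&lt;p&gt;A minimal Python sketch of the retrieval step, using word overlap as the scoring function for simplicity (production systems typically score with embeddings instead):&lt;/p&gt;

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query; real RAG uses embeddings."""
    q = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q.intersection(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "The Great Wall of China is visible from low orbit.",
]

# Retrieve the most relevant document, then prepend it to the prompt.
context = retrieve("How tall is the Eiffel Tower?", documents)[0]
prompt = f"Answer using this context: {context}\n\nQuestion: How tall is the Eiffel Tower?"
print(prompt)  # the model now sees the retrieved fact in its prompt
```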

&lt;h3&gt;
  
  
  Tool Calling
&lt;/h3&gt;

&lt;p&gt;The ability of a model to invoke external functions or APIs during a conversation, things like searching the web, running code, querying a database, or reading a file. Rather than generating a plain text answer, the model recognizes when a tool would help, calls it with the right inputs, receives the result, and incorporates it into its response. Tool calling is what bridges a language model and the real world, and it's the core mechanism behind most agentic systems.&lt;/p&gt;
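&lt;p&gt;The control flow can be sketched in Python without a real model. Here &lt;code&gt;fake_model&lt;/code&gt; and &lt;code&gt;get_weather&lt;/code&gt; are hypothetical stand-ins: the "model" emits a JSON tool call, the runtime executes it, and the result is fed back for the final answer:&lt;/p&gt;

```python
import json

def get_weather(city):
    """A hypothetical tool the model can invoke."""
    return {"city": city, "temp_c": 18}

TOOLS = {"get_weather": get_weather}

def fake_model(prompt):
    """Stand-in for an LLM: emits a tool call when the prompt mentions weather."""
    if "weather" in prompt:
        return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})
    return "The weather in Paris is 18 degrees."

def run(prompt):
    reply = fake_model(prompt)
    try:
        call = json.loads(reply)  # did the model ask for a tool?
    except json.JSONDecodeError:
        return reply              # plain-text answer, we're done
    result = TOOLS[call["tool"]](**call["args"])
    # feed the tool result back so the model can phrase the final answer
    return fake_model(f"tool result: {json.dumps(result)}")

print(run("What's the weather in Paris?"))  # final answer composed after the tool call
```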




&lt;h2&gt;
  
  
  9. AI Systems &amp;amp; Evaluation
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;How models are deployed, measured, and put to work.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent / Agentic
&lt;/h3&gt;

&lt;p&gt;An AI system that can take actions, use tools, and pursue a goal across multiple steps, rather than just responding once to a single prompt. An agentic system consists of a model, a suite of tools, and some logic for when and for how long to run it. An agent will often run in several steps, until it reaches some well-defined result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guardrails
&lt;/h3&gt;

&lt;p&gt;Rules and filters applied to a model's inputs or outputs to keep it within acceptable boundaries. Guardrails can block harmful content, enforce topic restrictions, prevent the model from impersonating real people, or ensure responses stay on-brand for a product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alignment
&lt;/h3&gt;

&lt;p&gt;The challenge of making AI systems behave in ways that reflect human intentions, values, and goals. A misaligned model might be highly capable but pursue objectives in ways its creators didn't intend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Eval Benchmark
&lt;/h3&gt;

&lt;p&gt;A standardized test used to measure and compare response quality. Benchmarks like MMLU, HumanEval, or HellaSwag evaluate specific capabilities like reasoning, coding, language understanding, or maths. They're useful for comparing models, but a high benchmark score doesn't always translate to real-world usefulness.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href="https://www.nobodywho.ai/" rel="noopener noreferrer"&gt;nobodywho.ai&lt;/a&gt; to start integrating AI into your applications!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>llm</category>
    </item>
    <item>
      <title>Run LLMs locally in Flutter apps</title>
      <dc:creator>pielouNW</dc:creator>
      <pubDate>Mon, 23 Mar 2026 14:16:40 +0000</pubDate>
      <link>https://dev.to/pielounw/run-llms-locally-in-flutter-apps-211p</link>
      <guid>https://dev.to/pielounw/run-llms-locally-in-flutter-apps-211p</guid>
      <description>&lt;p&gt;In this tutorial, you'll learn how to run a large language model (LLM) directly on a user's device — no cloud, no server, no cost. We'll start from scratch, build a working chat interface, and progressively introduce more advanced features: tool calling, sampling, and RAG.&lt;/p&gt;

&lt;p&gt;Each concept is explained before the code, so you can follow along whether you're new to on-device AI or just new to NobodyWho.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;a href="https://github.com/nobodywho-ooo/flutter-starter-example" rel="noopener noreferrer"&gt;example app&lt;/a&gt; for this article is available on GitHub if you want to jump straight to working code. It is kept up to date with the latest features — if you want the code that matches this tutorial exactly, check out &lt;a href="https://github.com/nobodywho-ooo/flutter-starter-example/tree/eba1ec3d3e75dd44e80a91db603c04dd21b47cf3" rel="noopener noreferrer"&gt;this commit&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Run AI On-Device?
&lt;/h2&gt;

&lt;p&gt;Most AI features rely on a cloud API: you send a request to a remote server, it runs the model, and sends a response back. That works well, but it comes with tradeoffs.&lt;/p&gt;

&lt;p&gt;Running the model directly on the device avoids all of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Works offline&lt;/strong&gt; — no internet connection required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy by design&lt;/strong&gt; — user data never leaves the device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low latency&lt;/strong&gt; — no network round-trip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cloud costs&lt;/strong&gt; — inference is free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is raw capability: on-device models are smaller and less powerful than frontier cloud models. But for many use cases — summarization, chatbots, or local search — they're more than good enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  About NobodyWho
&lt;/h2&gt;

&lt;p&gt;We'll use the &lt;a href="https://github.com/nobodywho-ooo/nobodywho" rel="noopener noreferrer"&gt;NobodyWho&lt;/a&gt; library throughout this tutorial. It wraps &lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt; in Rust and exposes a clean Flutter API for running any model in &lt;code&gt;.gguf&lt;/code&gt; format.&lt;/p&gt;

&lt;p&gt;Install it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;flutter pub add nobodywho
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then initialize the engine in &lt;code&gt;main.dart&lt;/code&gt; before your app launches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:nobodywho/nobodywho.dart'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;WidgetsFlutterBinding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensureInitialized&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;NobodyWho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;runApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="n"&gt;MyApp&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Choosing and Loading a Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Picking a Model
&lt;/h3&gt;

&lt;p&gt;We'll use &lt;strong&gt;LFM2&lt;/strong&gt;, a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. Models must be in &lt;code&gt;.gguf&lt;/code&gt; format; most will work with NobodyWho, though some may fail due to chat template formatting issues. See the &lt;a href="https://docs.nobodywho.ooo/model-selection/" rel="noopener noreferrer"&gt;model selection guide&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting the Model onto the Device
&lt;/h3&gt;

&lt;p&gt;You have two options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bundle in assets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple setup, great for development&lt;/td&gt;
&lt;td&gt;Increases app size significantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Download on demand&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keeps app size small&lt;/td&gt;
&lt;td&gt;Requires more implementation work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For this tutorial, we'll bundle the model in assets to keep things simple. In production, you'd want to use a download-on-demand approach with something like &lt;a href="https://pub.dev/packages/background_downloader" rel="noopener noreferrer"&gt;background_downloader&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an &lt;code&gt;assets/&lt;/code&gt; folder at the root of your project (if it doesn't exist).&lt;/li&gt;
&lt;li&gt;Register it in &lt;code&gt;pubspec.yaml&lt;/code&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;flutter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;assets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;assets/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Download &lt;a href="https://huggingface.co/unsloth/LFM2-700M-GGUF/resolve/main/LFM2-700M-Q4_K_M.gguf?download=true" rel="noopener noreferrer"&gt;this GGUF model&lt;/a&gt;, rename it &lt;code&gt;model.gguf&lt;/code&gt;, and place it in the &lt;code&gt;assets/&lt;/code&gt; folder.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Accessing the Model at Runtime
&lt;/h3&gt;

&lt;p&gt;NobodyWho reads the model from the filesystem, so we copy it from Flutter's asset bundle to the app's documents directory on first launch. Add &lt;code&gt;path_provider&lt;/code&gt; to handle this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;flutter pub add path_provider
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'dart:io'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:flutter/services.dart'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:path_provider/path_provider.dart'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;getApplicationDocumentsDirectory&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="si"&gt;${dir.path}&lt;/span&gt;&lt;span class="s"&gt;/model.gguf'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rootBundle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'assets/model.gguf'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeAsBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asUint8List&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nl"&gt;flush:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Basic Chat
&lt;/h2&gt;

&lt;p&gt;With the model in place, you're ready to start a conversation. Here's the simplest possible usage — good for testing or when you don't need a full chat UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Is water wet?'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Putting It Together
&lt;/h3&gt;

&lt;p&gt;Here's the complete minimal app so far:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'dart:io'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:flutter/material.dart'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:flutter/services.dart'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:nobodywho/nobodywho.dart'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:path_provider/path_provider.dart'&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;WidgetsFlutterBinding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ensureInitialized&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;NobodyWho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;init&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;runApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="n"&gt;MainApp&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MainApp&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="n"&gt;StatelessWidget&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="n"&gt;MainApp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_onPressed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;getApplicationDocumentsDirectory&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="si"&gt;${dir.path}&lt;/span&gt;&lt;span class="s"&gt;/model.gguf'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rootBundle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'assets/model.gguf'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeAsBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asUint8List&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nl"&gt;flush:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'How do I code a button in Flutter?'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="n"&gt;debugPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;debugPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error: &lt;/span&gt;&lt;span class="si"&gt;$err&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nd"&gt;@override&lt;/span&gt;
  &lt;span class="n"&gt;Widget&lt;/span&gt; &lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BuildContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;MaterialApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nl"&gt;home:&lt;/span&gt; &lt;span class="n"&gt;Scaffold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nl"&gt;body:&lt;/span&gt; &lt;span class="n"&gt;Center&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nl"&gt;child:&lt;/span&gt; &lt;span class="n"&gt;ElevatedButton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nl"&gt;onPressed:&lt;/span&gt; &lt;span class="n"&gt;_onPressed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nl"&gt;child:&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Ask - How do I code a button in Flutter?"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For customization options like system prompts and context size, see the &lt;a href="https://docs.nobodywho.ooo/flutter/chat/" rel="noopener noreferrer"&gt;Chat documentation&lt;/a&gt;.&lt;/p&gt;
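&lt;p&gt;As a quick sketch of those options (&lt;code&gt;systemPrompt&lt;/code&gt; also appears in the tool-calling example later in this post; the &lt;code&gt;contextSize&lt;/code&gt; name is assumed here, so check the linked docs for the exact parameter):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;final chat = await nobodywho.Chat.fromPath(
  modelPath: model.path,
  // A system prompt steers the assistant's tone and behavior for the whole chat.
  systemPrompt: 'You are a concise assistant. Keep answers short.',
  // NOTE: parameter name assumed; verify against the Chat documentation.
  contextSize: 4096,
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;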




&lt;h2&gt;
  
  
  Building a Chat Interface
&lt;/h2&gt;

&lt;p&gt;A one-shot &lt;code&gt;ask().completed()&lt;/code&gt; call is fine for single questions, but a real chat interface needs to stream tokens as they arrive — otherwise users stare at a blank screen until the full response is ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming Tokens
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'How do I code a button in Flutter?'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Each token arrives as it's generated&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;em&gt;token&lt;/em&gt; is the smallest unit a model generates — typically a word fragment, punctuation mark, or whitespace character.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling the Streaming Content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_ChatScreenState&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatScreen&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;TextEditingController&lt;/span&gt; &lt;span class="n"&gt;_textController&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TextEditingController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;_streamingContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;_responding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_ask&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;userInput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_textController&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userInput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;_responding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;setState&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;_responding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="n"&gt;_streamingContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;tokenStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;setState&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_streamingContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_streamingContent&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// ...continued below&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Updating the Message List
&lt;/h3&gt;

&lt;p&gt;Once the stream completes, fetch the full chat history and update your UI state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getChatHistory&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;copyWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nl"&gt;content:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;setState&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;_messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;_messages&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;_streamingContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;_responding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wiring Up the UI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connect your &lt;code&gt;TextField&lt;/code&gt; to call &lt;code&gt;_ask()&lt;/code&gt; via &lt;code&gt;onSubmitted&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Render &lt;code&gt;_messages&lt;/code&gt; in a &lt;code&gt;ListView&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Append &lt;code&gt;_streamingContent&lt;/code&gt; at the bottom while streaming&lt;/li&gt;
&lt;/ul&gt;
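
&lt;p&gt;A minimal &lt;code&gt;build()&lt;/code&gt; that wires those three pieces together could look like this (widget choices are illustrative, and it assumes &lt;code&gt;Message.content&lt;/code&gt; is a &lt;code&gt;String&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;@override
Widget build(BuildContext context) {
  return Scaffold(
    body: Column(
      children: [
        Expanded(
          child: ListView.builder(
            // One extra row shows the partial response while streaming.
            itemCount: _messages.length + (_streamingContent == null ? 0 : 1),
            itemBuilder: (context, index) {
              if (index &amp;lt; _messages.length) {
                return ListTile(title: Text(_messages[index].content));
              }
              return ListTile(title: Text(_streamingContent!));
            },
          ),
        ),
        TextField(
          controller: _textController,
          onSubmitted: (_) =&amp;gt; _ask(),
        ),
      ],
    ),
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;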




&lt;h2&gt;
  
  
  Tool Calling
&lt;/h2&gt;

&lt;p&gt;Tool calling lets the model interact with the outside world. You define a set of functions — each with a name, a description, and an implementation — and the model decides when and how to call them based on the user's request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'dart:math'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:nobodywho/nobodywho.dart'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;circleAreaTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;name:&lt;/span&gt; &lt;span class="s"&gt;"circle_area"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;description:&lt;/span&gt; &lt;span class="s"&gt;"Calculates the area of a circle given its radius"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;function:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="kd"&gt;required&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;radius&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;area&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;radius&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;radius&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"Circle with radius &lt;/span&gt;&lt;span class="si"&gt;$radius&lt;/span&gt;&lt;span class="s"&gt; has area &lt;/span&gt;&lt;span class="si"&gt;${area.toStringAsFixed(2)}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;getWeatherTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;name:&lt;/span&gt; &lt;span class="s"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;description:&lt;/span&gt; &lt;span class="s"&gt;"Get the current weather for a given city"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;function:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="kd"&gt;required&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;fetchWeather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;tools:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;circleAreaTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;getWeatherTool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'What is the area of a circle with a radius of 2?'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model reads each tool's &lt;code&gt;description&lt;/code&gt; to decide when to call it, so writing clear, specific descriptions matters.&lt;/p&gt;
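
&lt;p&gt;For example, a description like &lt;code&gt;"Gets weather"&lt;/code&gt; gives the model little to go on; spelling out when the tool applies and what the argument looks like works much better (reusing the &lt;code&gt;fetchWeather&lt;/code&gt; placeholder from above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;final getWeatherTool = nobodywho.Tool(
  name: "get_weather",
  // A specific description states the trigger, the input, and its format.
  description: "Get the current weather for a city. "
      "Call this whenever the user asks about weather conditions. "
      "The city argument is a plain city name, e.g. 'Copenhagen'.",
  function: ({required String city}) async =&amp;gt; await fetchWeather(city),
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;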

&lt;p&gt;See the &lt;a href="https://docs.nobodywho.ooo/flutter/tool-calling/" rel="noopener noreferrer"&gt;Tool Calling documentation&lt;/a&gt; for more.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sampling
&lt;/h2&gt;

&lt;p&gt;When generating a token, the model produces a probability distribution over every possible next token. A &lt;em&gt;sampler&lt;/em&gt; controls how the final token is chosen from that distribution.&lt;/p&gt;

&lt;p&gt;The default behavior involves some randomness, which produces natural, varied output. But you can tune it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lower temperature&lt;/strong&gt; → more deterministic, predictable output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher temperature&lt;/strong&gt; → more creative, varied output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constrained sampling&lt;/strong&gt; → force output into a specific format, such as JSON
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;sampler:&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SamplerPresets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;temperature:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the &lt;a href="https://docs.nobodywho.ooo/flutter/sampling/" rel="noopener noreferrer"&gt;Sampling documentation&lt;/a&gt; for more.&lt;/p&gt;




&lt;h2&gt;
  
  
  RAG
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) combines document search with LLM generation. The model uses retrieved documents to ground its responses in your knowledge base.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: A Customer Service Assistant
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="s"&gt;'package:nobodywho/nobodywho.dart'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// The cross-encoder re-ranks retrieved documents by relevance.&lt;/span&gt;
  &lt;span class="c1"&gt;// Recommended model:&lt;/span&gt;
  &lt;span class="c1"&gt;// https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q8_0.gguf&lt;/span&gt;
  &lt;span class="c1"&gt;// Follow the same approach as the chat model to import the reranker model.&lt;/span&gt;
  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;crossencoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CrossEncoder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;rerankerModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;knowledge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"Our company offers a 30-day return policy for all products"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Free shipping is available on orders over &lt;/span&gt;&lt;span class="err"&gt;\$&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Customer support is available via email and phone"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"We accept credit cards, PayPal, and bank transfers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Order tracking is available through your account dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;searchKnowledgeTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nl"&gt;name:&lt;/span&gt; &lt;span class="s"&gt;"search_knowledge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;description:&lt;/span&gt; &lt;span class="s"&gt;"Search the knowledge base for relevant information"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;function:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="kd"&gt;required&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;crossencoder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;rankAndSort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;query:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;documents:&lt;/span&gt; &lt;span class="n"&gt;knowledge&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;topDocs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;topDocs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nobodywho&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nl"&gt;modelPath:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;systemPrompt:&lt;/span&gt;
        &lt;span class="s"&gt;"You are a customer service assistant. Use the search_knowledge tool "&lt;/span&gt;
        &lt;span class="s"&gt;"to find relevant information from our policies before answering."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;tools:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;searchKnowledgeTool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"What is your return policy?"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;See the &lt;a href="https://docs.nobodywho.ooo/flutter/embeddings-and-rag/" rel="noopener noreferrer"&gt;Embeddings &amp;amp; RAG documentation&lt;/a&gt; for more.&lt;/p&gt;




&lt;h2&gt;What's Next?&lt;/h2&gt;

&lt;p&gt;You now have a complete foundation for building on-device AI features in Flutter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load and run a GGUF model&lt;/li&gt;
&lt;li&gt;Build a streaming chat interface&lt;/li&gt;
&lt;li&gt;Extend the model with tool calling&lt;/li&gt;
&lt;li&gt;Control output style with sampling&lt;/li&gt;
&lt;li&gt;Ground responses in a knowledge base with RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From here, you can explore the full &lt;a href="https://docs.nobodywho.ooo/" rel="noopener noreferrer"&gt;NobodyWho documentation&lt;/a&gt; or dig into the &lt;a href="https://github.com/nobodywho-ooo/flutter-starter-example" rel="noopener noreferrer"&gt;example app&lt;/a&gt; to see everything working end to end.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>flutter</category>
      <category>llm</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
