<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan P.</title>
    <description>The latest articles on DEV Community by Jonathan P. (@jprevo).</description>
    <link>https://dev.to/jprevo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3461081%2Fe018a6af-12a2-4c0e-a1cf-52462c70c74a.jpeg</url>
      <title>DEV Community: Jonathan P.</title>
      <link>https://dev.to/jprevo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jprevo"/>
    <language>en</language>
    <item>
      <title>I Built an AI Clone of Myself - Fine-Tuning with 10 Years of Data and Voice Cloning</title>
      <dc:creator>Jonathan P.</dc:creator>
      <pubDate>Tue, 17 Feb 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/jprevo/i-built-an-ai-clone-of-myself-fine-tuning-with-10-years-of-data-and-voice-cloning-4h60</link>
      <guid>https://dev.to/jprevo/i-built-an-ai-clone-of-myself-fine-tuning-with-10-years-of-data-and-voice-cloning-4h60</guid>
      <description>&lt;p&gt;I love LLMs. I use them every single day, whether it’s for work or just optimizing my personal life. Recently, I wanted to dive deeper into &lt;strong&gt;fine-tuning&lt;/strong&gt;—the process of adding a specialized training layer to an existing model so it behaves exactly how you want, without the massive time and resource commitment of training from scratch. I wanted to get my hands dirty because I learn best by doing.&lt;/p&gt;

&lt;p&gt;Then it hit me: I’ve been active on an internet forum for over 10 years. I’ve posted thousands of messages there—a goldmine of data! I realized that with a few tools, I could potentially build an &lt;strong&gt;AI clone of myself&lt;/strong&gt;: one that writes, thinks, and even speaks in my voice.&lt;/p&gt;

&lt;p&gt;I know, it sounds a bit dystopian—and honestly, it is—but it’s also incredibly fun. To be clear: I am keeping all data strictly confidential. I won't be publishing the dataset or the final model for obvious privacy reasons.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A quick disclaimer:&lt;/strong&gt; If you decide to replicate this, you &lt;strong&gt;MUST&lt;/strong&gt; use your own data. Please do not create an AI clone of another person without their explicit consent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Speaking of privacy, I set a strict rule for this project: &lt;strong&gt;the data never leaves my control.&lt;/strong&gt; Ideally, it stays on my local machine. If it has to go to a server, it must be within the EU, with &lt;em&gt;zero data retention&lt;/em&gt; and full GDPR compliance.&lt;/p&gt;

&lt;h3&gt;The Trial and Error Phase&lt;/h3&gt;

&lt;p&gt;I’ll be honest: I didn't get a working clone on the first try. Not even close. It was a long journey of trial and error. At first, I aimed too high: trying to train &lt;strong&gt;Mistral Small 3.2 24b&lt;/strong&gt; on an OVH server with an L40S GPU. The training went okay, but I could never successfully save the weights to reuse the model due to persistent &lt;strong&gt;Out Of Memory (OOM)&lt;/strong&gt; errors.&lt;/p&gt;

&lt;p&gt;I decided to pivot to local training on my own rig. I experimented with various models like &lt;strong&gt;Qwen3&lt;/strong&gt; and &lt;strong&gt;Ministral&lt;/strong&gt;, hitting different roadblocks and variable results with each. In the end, I found the "sweet spot" with Google’s &lt;strong&gt;Gemma 3&lt;/strong&gt;. That’s the model I used for the final clone.&lt;/p&gt;

&lt;p&gt;Finally, I integrated speech-to-text and text-to-speech libraries so I could actually talk to "myself" via microphone and hear my own voice respond. 😱&lt;/p&gt;

&lt;h3&gt;The Final Tech Stack&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Technology&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;RTX 4060 Ti (16GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Library&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unsloth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Base Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemma 3 12b&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+ years of forum posts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speech-to-Text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI Whisper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text-to-Speech&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen3-TTS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Without further ado, let’s dive into how I actually built this thing!&lt;/p&gt;

&lt;h2&gt;Preparing the Dataset&lt;/h2&gt;

&lt;p&gt;I successfully scraped my forum messages, but they weren't ready for fine-tuning straight out of the box. There is a massive amount of "data curation" required before a model can learn from them.&lt;/p&gt;

&lt;p&gt;For fine-tuning, we need an &lt;strong&gt;"input" &amp;gt; "output"&lt;/strong&gt; structure. This teaches the LLM that when it sees a given kind of "input" (a prompt or question), it should respond the way my historical "outputs" do.&lt;/p&gt;

&lt;h3&gt;Handling Personal Information (PII)&lt;/h3&gt;

&lt;p&gt;Normally, the top priority would be managing &lt;strong&gt;PII (Personally Identifiable Information)&lt;/strong&gt;. For instance, if I had posted my address—or worse, if a quoted message contained someone else’s address—I wouldn't want the model to memorize and regurgitate it. In this specific case, I control the entire pipeline and the model won't be public. Furthermore, forum members use pseudonyms and are generally careful about privacy. However, if I planned to release this model, this step would be absolutely critical.&lt;/p&gt;
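
&lt;p&gt;For reference, here is what a first-pass PII scrubber could look like. This is a minimal sketch, not an exhaustive solution: the regexes and replacement tokens are my own illustrative choices, and real-world PII removal would also call for NER models and manual review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Illustrative first-pass scrubber: replaces obvious PII with neutral tokens.
# The patterns are assumptions, loosely tuned for French forum data.
PII_PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "[PHONE]": re.compile(r"(?:\+33\s?|0)[1-9](?:[ .-]?\d{2}){4}\b"),  # French phone numbers
    "[URL]":   re.compile(r"https?://\S+"),
}

def scrub_pii(text: str) -&amp;gt; str:
    for token, pattern in PII_PATTERNS.items():
        text = pattern.sub(token, text)
    return text

print(scrub_pii("Contact me at jean@example.com or 06 12 34 56 78"))
# -&amp;gt; "Contact me at [EMAIL] or [PHONE]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;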

&lt;h3&gt;Cleaning the Raw Data&lt;/h3&gt;

&lt;p&gt;The first pass involved stripping out the "noise" (a sketch of these rules in code follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Removing artifacts:&lt;/strong&gt; Images, formatting (bold, italics), certain emojis, and YouTube links.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplifying context:&lt;/strong&gt; For messages with multiple nested quotes, I stripped everything after the first quote to keep the &lt;code&gt;input &amp;gt; output&lt;/code&gt; relationship simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalization:&lt;/strong&gt; I standardized line breaks (max 2 in a row).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality control:&lt;/strong&gt; I deleted any message shorter than 6 words to ensure the model had enough substance to learn from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;De-duplication:&lt;/strong&gt; I removed about fifty very similar messages from a recurring forum event to prevent the model from overfitting on repetitive nonsense.&lt;/li&gt;
&lt;/ul&gt;
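
&lt;p&gt;In code, those rules boil down to a handful of regexes and checks. A simplified sketch; the BBCode-style markup patterns are assumptions about the forum's format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def clean_message(text: str) -&amp;gt; str | None:
    """Returns the cleaned message, or None if it should be discarded."""
    # Removing artifacts: images, bold/italics markup, YouTube links
    text = re.sub(r"\[img\].*?\[/img\]", "", text, flags=re.DOTALL)
    text = re.sub(r"\[/?(?:b|i)\]", "", text)
    text = re.sub(r"https?://(?:www\.)?youtu\S+", "", text)

    # Simplifying context: for multi-quote messages, cut at the second quote marker
    parts = text.split("[quote")
    if len(parts) &amp;gt; 2:
        text = "[quote".join(parts[:2])

    # Normalization: at most 2 consecutive line breaks
    text = re.sub(r"\n{3,}", "\n\n", text).strip()

    # Quality control: discard messages shorter than 6 words
    return text if len(text.split()) &amp;gt;= 6 else None

def deduplicate(messages: list[str]) -&amp;gt; list[str]:
    # Cheap near-duplicate filter: identical 80-char prefixes count as dupes.
    # (Real similarity detection could use difflib or embeddings.)
    seen, kept = set(), []
    for message in messages:
        key = message[:80].lower()
        if key not in seen:
            seen.add(key)
            kept.append(message)
    return kept
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;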

&lt;h3&gt;Reverse Prompt Engineering&lt;/h3&gt;

&lt;p&gt;The biggest hurdle was missing data. While my forum posts are the perfect &lt;strong&gt;output&lt;/strong&gt;, about 75% of them lacked an &lt;strong&gt;input&lt;/strong&gt;. Many posts were just part of a flow or started new sub-topics without quoting anyone. How do you train a model to respond if you don't have the question?&lt;/p&gt;

&lt;p&gt;This is where I used &lt;strong&gt;Reverse Prompting&lt;/strong&gt;. I sent the following prompt to a larger LLM:&lt;/p&gt;

&lt;p&gt;(&lt;em&gt;Translated from French&lt;/em&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`# Role  
You are an expert in forum conversation simulation. Your mission is to generate the INPUT (a message from a third party) that triggered my OUTPUT (your response).  

# Task  
Analyze my POST (Output) and reconstruct the preceding message.  

# Step 1: Classification (Reasoning)  
Ask yourself: "What is the purpose of my post?"  
A. **I am asking for help / Starting a topic** -&amp;gt; COLD_START.  
B. **I am providing a solution / Giving advice** -&amp;gt; Reply to a QUESTION (Problem).  
C. **I am reacting / Contradicting / Commenting** -&amp;gt; Reply to a STATEMENT (Opinion).  

### ⚠️ DISQUALIFICATION CRITERIA (COLD_START): If my POST contains formal opening/closing markers like:  
- "Hi, ..." at the beginning + a question.  
- "Thanks" or "Thanks in advance" at the end.  
- "Could I have...", "I'm looking for...", "I need..."  
THEN -&amp;gt; It is automatically a COLD_START (IGNORE).  

# Step 2: Input Generation (CRITICAL)  
Generate the message from the other user.  

### ⚠️ GOLDEN RULES AGAINST ECHOING (Must follow strictly): 
1. **Don't "spoil" the answer:** If my output contains a specific solution (e.g., "Use Software X"), the input MUST NOT mention "X". The input should express the *need* or *problem* (e.g., "I'm looking for software to do this").  
2. **Handling quotes:** If I say "As Anon said...", the input shouldn't ask "What did Anon say?". The input should BE "Anon" or someone discussing Anon's topic.  
3. **Create friction:** If I contradict someone, the input must state the opposite of what I say. If I say "That's false," the input must say "That's true."  
4. **Style:** No robotic/AI/LLM style. Use the forum slang. No excessive politeness.  

# Input -&amp;gt; Output Logic Examples:  
- BAD: Input="Is the 4070 good?" -&amp;gt; Output="Yes, the 4070 is great." (Too easy).  
- GOOD: Input="I'm hesitating to keep my 1060, is it worth upgrading?" -&amp;gt; Output="Yes, the 4070 is great."  

- BAD: Input="What do you think of Freud?" -&amp;gt; Output="Freud said X..."  
- GOOD: Input="Honestly, psychoanalysis is total nonsense." -&amp;gt; Output="Freud said X..."  

# Output Format (Strict JSON)  
{  
  "reasoning": "Why it's a Reply vs a Cold Start. What is the intent of the other user (Naivety? False statement?)", 
  "type": "COLD_START" | "REPLY_TO_QUESTION" | "REPLY_TO_STATEMENT",  
  "generated_input": "The simulated message (empty if COLD_START)"
}  

# My POST (Output)  
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;  
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt takes my post and asks the LLM to hallucinate the "trigger" message. I refined this heavily with Gemini to ensure high-quality results. If a post was a "Cold Start" (meaning it didn't need a preceding message), the script filtered it out.&lt;/p&gt;

&lt;p&gt;I used the &lt;strong&gt;OVH LLM API&lt;/strong&gt; with &lt;code&gt;Meta-Llama-3_3-70B-Instruct&lt;/code&gt;. It’s excellent with French, cost-effective, and keeps the data within the EU with zero data retention. Processing ~6,200 messages at 350 requests per minute took about 18 minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Loading data...
Loaded 8581 parsed entries
Found 86 already completed entries
Found 6326 entries without quotes to process
6240 entries remaining to process

Processing &lt;span class="k"&gt;in &lt;/span&gt;18 batches of up to 350 requests each

Processing |██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░| 5% | 350/6240 | ETA: 307s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
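
&lt;p&gt;For context, the batching loop itself is simple. A minimal Python sketch, assuming OVH AI Endpoints' OpenAI-compatible API (the base URL is illustrative, check your endpoint's docs; &lt;code&gt;build_prompt()&lt;/code&gt; is a hypothetical port of the reverse-prompting template above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import json
from openai import AsyncOpenAI  # pip install openai

# Assumption: base URL and token are placeholders for your own endpoint config.
client = AsyncOpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key="YOUR_OVH_TOKEN",
)

async def reverse_prompt(post: str) -&amp;gt; dict:
    response = await client.chat.completions.create(
        model="Meta-Llama-3_3-70B-Instruct",
        # build_prompt() returns the reverse-prompting template shown earlier
        messages=[{"role": "user", "content": build_prompt(post)}],
    )
    return json.loads(response.choices[0].message.content)

async def process_all(posts: list[str], batch_size: int = 350) -&amp;gt; list[dict]:
    results = []
    for i in range(0, len(posts), batch_size):
        batch = posts[i : i + batch_size]
        results += await asyncio.gather(*(reverse_prompt(p) for p in batch))
        await asyncio.sleep(60)  # one batch per minute keeps us at ~350 requests/minute
    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;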



&lt;h3&gt;The Quality Filter&lt;/h3&gt;

&lt;p&gt;Even with Llama 3.3, the dataset wasn't perfect. I needed to separate the wheat from the chaff. Since I wasn't about to manually read 6,000 messages, I used another LLM to act as a &lt;strong&gt;Quality Filter&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;(&lt;em&gt;Translated from French&lt;/em&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a quality filter for a fine-tuning dataset.  
Your role is to clean the data. Analyze the [INPUT] / [OUTPUT] pair.  

STRICT REJECTION CRITERIA (DISCARD):  
1. Tautology: The Input repeats the Output or is too perfectly tailored to it.  
2. Question-by-Question: The Output just parrots the Input phrase; it's not a natural exchange.  
3. Illogical: The Output doesn't clearly answer the Input.  
4. Failed Cold Start: The Output is clearly a conversation starter, but the Input is a forced question.  
5. Low Value: Output is too short ("lol", "ok") or lacks identifiable context.  
6. Hallucination: The Input invents contradictory facts.  

Respond ONLY in JSON format: {"verdict": "KEEP" | "DISCARD", "reason": "..."}`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this filtering, I was left with roughly &lt;strong&gt;4,300 high-quality entries&lt;/strong&gt;. I also capped the text length at 2,000 characters to prevent &lt;strong&gt;Out Of Memory&lt;/strong&gt; errors during training. Total cost? Less than €8 (and actually €0 because I had welcome credits).&lt;/p&gt;
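
&lt;p&gt;The plumbing around the filter is only a few lines. A sketch, assuming each entry has been annotated with the verdict JSON from above (the input file name, and the choice to drop rather than truncate over-long pairs, are my assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

MAX_CHARS = 2000  # cap text length so long sequences don't blow up VRAM during training

with open("completed2.json") as f:  # hypothetical file: entries annotated with verdicts
    entries = json.load(f)

kept = [
    e for e in entries
    if e["verdict"] == "KEEP"
    and len(e["input"]) + len(e["output"]) &amp;lt;= MAX_CHARS
]

print(f"{len(kept)} / {len(entries)} entries kept")

# This is the file loaded later by the training script
with open("completed2-filtered-trimmed.json", "w") as f:
    json.dump(kept, f, ensure_ascii=False, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;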

&lt;h3&gt;Personal Touches &amp;amp; Data Augmentation&lt;/h3&gt;

&lt;p&gt;To help the model understand who I am without guessing, I manually answered 100 questions about myself (education, hobbies, favorite games, etc.).&lt;/p&gt;

&lt;p&gt;Since these are high-priority data points, I used &lt;strong&gt;oversampling&lt;/strong&gt;: I duplicated these 100 entries 4 times so they represent about 8% of the dataset. This "stamps" my actual identity into the model more firmly.&lt;/p&gt;

&lt;h3&gt;Adding the Alpaca Dataset&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Alpaca dataset&lt;/strong&gt; is a gold standard in fine-tuning; it helps small models learn how to behave like a helpful assistant. I mixed in 500 lines from &lt;code&gt;pinzhenchen/alpaca-cleaned-fr&lt;/code&gt; (French Alpaca). This adds a layer of consistency and structure, balancing out the "chaos" of forum discussions while keeping my personal style dominant.&lt;/p&gt;

&lt;h3&gt;The Final Tally&lt;/h3&gt;

&lt;p&gt;I ended up with a &lt;code&gt;.json&lt;/code&gt; file of about 17,000 lines (2MB) containing &lt;strong&gt;5,136 total entries&lt;/strong&gt; in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'm thinking of switching to open-ear headphones for running. Any recommendations?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Personally, I use the Shokz OpenMove. I'm really happy with them because [...]"&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They say a model is only as good as its data, and my experiments proved that 100%. Now, let’s get to the actual training!&lt;/p&gt;

&lt;h2&gt;Fine-tuning with Unsloth&lt;/h2&gt;

&lt;p&gt;The fine-tuning process itself isn't necessarily the most complex part, though finding the right settings can be a bit of a challenge.&lt;/p&gt;

&lt;p&gt;To handle this, I used &lt;strong&gt;JupyterLab&lt;/strong&gt;, a web-based interactive development environment that allows you to execute Python code in separate blocks called "cells" while keeping an active kernel. This is incredibly useful because if your last cell crashes, you can often just fix and rerun it without restarting the entire process from scratch. It’s not magic, though—you can still crash the kernel and be forced to start over!&lt;/p&gt;

&lt;p&gt;For the training itself, I used the &lt;strong&gt;Unsloth&lt;/strong&gt; library.&lt;/p&gt;

&lt;p&gt;To get started on a machine with Python 3 installed, create a virtual environment, activate it, and install JupyterLab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv clone
&lt;span class="nb"&gt;source &lt;/span&gt;clone/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;jupyterlab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, launch the JupyterLab interface and select the Python kernel (or the Unsloth kernel if you have one configured):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jupyter lab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is the commented script I ran in JupyterLab for the training. I recommend executing it piece by piece. The original notebook that served as my inspiration can be found here: &lt;a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb" rel="noopener noreferrer"&gt;Gemma 3 (4B) Unsloth Notebook&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Install the unsloth library
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;unsloth&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unsloth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="c1"&gt;# 1. LOAD THE BASE MODEL &amp;amp; TOKENIZER
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsloth/gemma-3-12b-it-unsloth-bnb-4bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_seq_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Crucial for consumer GPUs: loads weights in 4-bit precision to save VRAM.
&lt;/span&gt;    &lt;span class="n"&gt;load_in_8bit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;full_finetuning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. CONFIGURE LoRA (Low-Rank Adaptation)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;finetune_vision_layers&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;finetune_language_layers&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;finetune_attention_modules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;finetune_mlp_modules&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;# The "Rank" of the LoRA matrices. 32 is a solid middle ground (higher = smarter but slower/more memory).
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# Scaling factor for LoRA. Rule of thumb is often alpha = r or alpha = 2*r.
&lt;/span&gt;    &lt;span class="n"&gt;lora_dropout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Unsloth doesn't support dropout natively for max speed
&lt;/span&gt;    &lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# Do not train bias vectors (saves memory).
&lt;/span&gt;    &lt;span class="n"&gt;use_gradient_checkpointing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsloth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Saves VRAM by recomputing activations during backward pass instead of storing them.
&lt;/span&gt;    &lt;span class="n"&gt;random_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3407&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Fixed seed for reproducibility.
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concatenate_datasets&lt;/span&gt;

&lt;span class="c1"&gt;# 3. PREPARE THE DATASETS
# Load my own dataset
&lt;/span&gt;&lt;span class="n"&gt;ds_style&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed2-filtered-trimmed.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load custom/manual questions/answers and oversample it (weighting it) by multiplying the dataset by 4.
# This forces the model to pay 4x more attention to this dataset.
&lt;/span&gt;&lt;span class="n"&gt;ds_custom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;custom.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ds_custom_weighted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;concatenate_datasets&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;ds_custom&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load a French translated version of the Alpaca dataset for general instruction-following
&lt;/span&gt;&lt;span class="n"&gt;ds_alpaca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pinzhenchen/alpaca-cleaned-fr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ds_alpaca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ds_alpaca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Subsample: only take the first 500 rows so it doesn't overpower your custom data.
&lt;/span&gt;
&lt;span class="c1"&gt;# Helper function to merge Alpaca's 'instruction' and 'input' fields into a single 'input' field.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_alpaca_to_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;instruction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;new_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Contexte :&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;new_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ds_alpaca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ds_alpaca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;format_alpaca_to_user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remove_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instruction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Combine and shuffle
&lt;/span&gt;&lt;span class="n"&gt;combined_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;concatenate_datasets&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;ds_style&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ds_custom_weighted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ds_alpaca&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;combined_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;combined_dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. APPLY CHAT TEMPLATES
# This wraps your raw text into the exact conversational format Gemma 3 was originally trained on.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;formatting_prompts_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;input_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tu es Jonathan, un humain. Tu n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;es pas une IA.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# "You are Jonathan, a human. You are not an AI."
&lt;/span&gt;            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;tokenize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Return a new column 'text' which the SFTTrainer will look for
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;combined_dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatting_prompts_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SFTTrainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SFTConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unsloth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;is_bfloat16_supported&lt;/span&gt;

&lt;span class="c1"&gt;# 5. TRAINER CONFIGURATION
&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SFTTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dataset_text_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Tells the trainer to use the 'text' column we just generated
&lt;/span&gt;    &lt;span class="n"&gt;max_seq_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Match the model's loaded context window
&lt;/span&gt;    &lt;span class="n"&gt;dataset_num_proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Number of CPU cores to use for dataset processing
&lt;/span&gt;    &lt;span class="n"&gt;packing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# If True, packs multiple short examples into one sequence. False is safer for chat templates.
&lt;/span&gt;    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SFTConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bf16&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# Uses bfloat16 mixed precision. Highly recommended for Ampere+ GPUs (RTX 3000 series or newer) for stability.
&lt;/span&gt;        &lt;span class="n"&gt;fp16&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# Set to True ONLY if your GPU doesn't support bf16.
&lt;/span&gt;        &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Keep small to fit in VRAM.
&lt;/span&gt;        &lt;span class="n"&gt;gradient_accumulation_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# "Virtual" batch size. Model updates weights only after 16 steps (Effective batch size = 1 * 16 = 16).
&lt;/span&gt;        &lt;span class="n"&gt;num_train_epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Number of full passes through the training data.
&lt;/span&gt;        &lt;span class="n"&gt;warmup_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Gradually increases learning rate for the first 50 steps to prevent catastrophic forgetting early on.
&lt;/span&gt;        &lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Standard aggressive LoRA learning rate.
&lt;/span&gt;        &lt;span class="n"&gt;logging_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Print training loss every 10 steps in jupyter output
&lt;/span&gt;        &lt;span class="n"&gt;optim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paged_adamw_8bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# 8-bit optimizer. Vital for saving VRAM without sacrificing performance.
&lt;/span&gt;        &lt;span class="n"&gt;weight_decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Regularization to prevent overfitting.
&lt;/span&gt;        &lt;span class="n"&gt;lr_scheduler_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Gradually decays the learning rate to 0 over the course of training.
&lt;/span&gt;        &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3407&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;output_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Directory where intermediate checkpoints are saved.
&lt;/span&gt;        &lt;span class="n"&gt;report_to&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Disables integrations like Weights &amp;amp; Biases or TensorBoard. Change to "wandb" if you want nice graphs.
&lt;/span&gt;        &lt;span class="n"&gt;gradient_checkpointing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# Ensures activation memory is saved (syncs with Unsloth's checkpointing config).
&lt;/span&gt;    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 6. RUN TRAINING
&lt;/span&gt;&lt;span class="n"&gt;trainer_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 7. EXPORT TO GGUF
&lt;/span&gt;
&lt;span class="c1"&gt;# Save a Q4_K_M version: This is the most popular quantization. 
# It shrinks the model drastically (approx 4-bits per weight) with very minimal quality loss. Excellent for consumer hardware.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained_gguf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma3_forum_gguf_q4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantization_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q4_k_m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save a Q8_0 version: 8-bit quantization.
# Larger file size and requires more RAM/VRAM to run, but retains almost 100% of the original unquantized model's intelligence.
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained_gguf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma3_forum_gguf_q8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantization_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q8_0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution took about 1 hour and 30 minutes on my machine.&lt;/p&gt;

&lt;p&gt;At the end of the process, I obtained two files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gemma-3-12b-it.Q4_K_M.gguf&lt;/strong&gt;: The most compressed version of the model. It fits into just over 8GB of VRAM and is the one I'll be using for daily interaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemma-3-12b-it.Q8_0.gguf&lt;/strong&gt;: A higher-quality version, but it requires significantly more VRAM.&lt;/li&gt;
&lt;/ul&gt;
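
&lt;p&gt;My examples later in this post use Ollama, but any GGUF runner works. Here is a quick smoke test with &lt;code&gt;llama-cpp-python&lt;/code&gt; as one alternative; the file path is an assumption based on the export step, and the system prompt matches the one used during training:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="gemma3_forum_gguf_q4/gemma-3-12b-it.Q4_K_M.gguf",  # assumed export path
    n_ctx=1280,       # same context length as training
    n_gpu_layers=-1,  # offload all layers to the GPU
)

result = llm.create_chat_completion(
    messages=[
        # Same system prompt as training: "You are Jonathan, a human. You are not an AI."
        {"role": "system", "content": "Tu es Jonathan, un humain. Tu n'es pas une IA."},
        {"role": "user", "content": "Windows Vista, c'était bien ?"},  # "Was Windows Vista any good?"
    ],
)
print(result["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;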

&lt;p&gt;I ran a few local tests, and my writing style is definitely recognizable! However, to truly validate if the clone works, I need to put it to the test with people who actually know me...&lt;/p&gt;

&lt;h2&gt;The Verdict: Is It Convincing?&lt;/h2&gt;

&lt;p&gt;I wanted to know if my clone was actually any good. To find out, I put together a survey using a small script that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Picks a forum quote that I had historically replied to.&lt;/li&gt;
&lt;li&gt;Retrieves my &lt;strong&gt;actual&lt;/strong&gt; response from the archives.&lt;/li&gt;
&lt;li&gt;Generates a &lt;strong&gt;new response&lt;/strong&gt; using the clone, using that same quote as the prompt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I let the script generate 10 pairs of "Real Me" vs. "AI Me" responses and asked my friends and the forum members to guess which was which.&lt;/p&gt;
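
&lt;p&gt;The script itself is nothing fancy. A sketch of the pair-generation logic (the archive file name and its fields are assumptions; one generation per quote, no rerolls):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import random
from llama_cpp import Llama

llm = Llama(model_path="gemma3_forum_gguf_q4/gemma-3-12b-it.Q4_K_M.gguf", n_ctx=1280, n_gpu_layers=-1)

def generate_reply(quote: str) -&amp;gt; str:
    # First try only, no cherry-picking
    result = llm.create_chat_completion(messages=[
        {"role": "system", "content": "Tu es Jonathan, un humain. Tu n'es pas une IA."},
        {"role": "user", "content": quote},
    ])
    return result["choices"][0]["message"]["content"]

with open("parsed_archive.json") as f:  # hypothetical archive of {input: quote, output: my real reply}
    archive = [e for e in json.load(f) if e.get("input")]

pairs = []
for entry in random.sample(archive, 10):  # quotes picked at random
    answers = [("real", entry["output"]), ("ai", generate_reply(entry["input"]))]
    random.shuffle(answers)  # hide which answer is which in the survey
    pairs.append({"quote": entry["input"], "answers": answers})

with open("survey.json", "w") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;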

&lt;p&gt;The results were fascinating.&lt;/p&gt;

&lt;p&gt;Keep in mind that in a blind test like this, a &lt;strong&gt;50% correct guess rate&lt;/strong&gt; means the answers are statistically indistinguishable—people are basically just flipping a coin because they can't tell the difference.&lt;/p&gt;

&lt;h3&gt;The Giveaways&lt;/h3&gt;

&lt;p&gt;Two of the AI responses were total "dead giveaways":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination:&lt;/strong&gt; In one instance, the AI hallucinated my own username and started giving advice to "Jonathan," which made zero sense in context. 93% of people spotted this one immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syntax error:&lt;/strong&gt; In another, the AI used a sentence structure that felt "off" and robotic. 87.5% of people caught it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two outliers definitely skewed the final average.&lt;/p&gt;

&lt;h3&gt;The "AI Wins"&lt;/h3&gt;

&lt;p&gt;This is where it gets crazy: &lt;strong&gt;there were two instances where the AI won.&lt;/strong&gt; In these cases, 69.8% and 62.5% of respondents thought the AI response was the "real" me. It turns out the model can mimic my style so effectively that it sometimes sounds more like "me" than I do!&lt;/p&gt;

&lt;p&gt;We also had one question hit exactly &lt;strong&gt;50%&lt;/strong&gt;, making it completely indistinguishable.&lt;/p&gt;

&lt;h3&gt;Final Stats&lt;/h3&gt;

&lt;p&gt;In the end, people were able to identify my AI clone only 66% of the time (remember: a score of 50% would mean it was completely indistinguishable). I consider this a massive success, especially since there was absolutely &lt;strong&gt;no cherry-picking&lt;/strong&gt;: all quotes were chosen randomly, and the AI responses were generated on the first try without any "rerolls."&lt;/p&gt;

&lt;p&gt;Here are a few example exchanges (&lt;em&gt;translated from French&lt;/em&gt;) using Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Is Windows Vista a good OS?
Windows Vista is an excellent OS, you should just install it :p

&amp;gt;&amp;gt;&amp;gt; I decided to call my cat: The Great Old One.
Haha, I love that name for a cat

&amp;gt;&amp;gt;&amp;gt; Can you give me your favorite recipe?
On one hand, it's simple, but on the other, it's very effective: Basquaise chicken. 
It's a Spanish dish with chicken simmered in a sauce of red peppers and tomatoes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Clarification: I love the chicken, but hate the OS. That said, the &lt;code&gt;:p&lt;/code&gt; suggests the clone might just be trolling—which, considering the dataset, means it’s working perfectly.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Taking It a Step Further!&lt;/h2&gt;

&lt;p&gt;Now that I had a model that thinks like me, I wanted to take it to the next level: giving it my voice and the ability to converse with it naturally.&lt;/p&gt;

&lt;p&gt;I set up a pipeline using &lt;strong&gt;Whisper&lt;/strong&gt; (OpenAI’s open-weights model) to transcribe my speech into text. That text is then fed to my "clone" model as a prompt. Once the response is generated, it’s sent to &lt;strong&gt;Qwen3-TTS&lt;/strong&gt;, a model capable of high-quality voice cloning.&lt;/p&gt;

&lt;p&gt;To set up the voice, I "trained" Qwen3-TTS (really just providing it with a reference sample, no actual fine-tuning involved) by recording about one minute of my own speech.&lt;/p&gt;
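
&lt;p&gt;The glue code is conceptually a loop of record, transcribe, generate, speak (the full script is linked below the video). A trimmed sketch: the Whisper and sounddevice calls are real APIs, while &lt;code&gt;qwen3_tts_speak()&lt;/code&gt; is a deliberate placeholder, since the exact TTS call depends on the Qwen3-TTS release you use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sounddevice as sd            # pip install sounddevice
from scipy.io.wavfile import write  # pip install scipy
import whisper                      # pip install openai-whisper

stt = whisper.load_model("small")  # a reasonable local speed/accuracy trade-off

def record(seconds: int = 5, rate: int = 16000) -&amp;gt; str:
    audio = sd.rec(int(seconds * rate), samplerate=rate, channels=1)
    sd.wait()
    write("question.wav", rate, audio)
    return "question.wav"

def qwen3_tts_speak(text: str) -&amp;gt; None:
    """Placeholder: synthesize `text` with Qwen3-TTS, conditioned on the
    ~1 minute reference recording of my voice. Replace with the model's real API."""
    raise NotImplementedError

while True:
    question = stt.transcribe(record(), language="fr")["text"]
    reply = generate_reply(question)  # the fine-tuned clone (generate_reply() from the survey sketch)
    qwen3_tts_speak(reply)            # no audio streaming: we wait for the full reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;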

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxtdvkfk0ve1u7dd0gos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxtdvkfk0ve1u7dd0gos.png" alt="Clone Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All these models are &lt;strong&gt;open-weights&lt;/strong&gt; and run locally on my machine!&lt;/p&gt;

&lt;p&gt;Of course, we aren't quite at the level of latency you'd see in top-tier commercial models. I don't have the enterprise-grade GPU required for instantaneous replies, and I wasn't specifically optimizing for speed—for instance, I don’t stream the audio chunks from Qwen3-TTS; I simply wait for the full response to generate.&lt;/p&gt;

&lt;p&gt;Here is a short demo video. It’s in French, but you can turn on the English subtitles on YouTube! I cut the wait times after the first question to make it smoother.&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://www.youtube.com/embed/nW8AtLGEoyg"&gt;&lt;/iframe&gt;&lt;/p&gt;

&lt;p&gt;Here is the Python script used in this video: &lt;a href="https://gist.github.com/jprevo/76d2e1fe388a3ecff7631b18ef91c233" rel="noopener noreferrer"&gt;https://gist.github.com/jprevo/76d2e1fe388a3ecff7631b18ef91c233&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;Ultimately, I’ve only scratched the surface of what’s possible with LLM fine-tuning. My model resembles me quite a bit in writing (and in voice!), but it still has its fair share of inconsistencies. I’m convinced that by spending even more time curating the dataset and obsessing over hyperparameters, the results could be even more uncanny.&lt;/p&gt;

&lt;p&gt;That said, I had a blast and learned an immense amount, which was the whole point. My clone doesn't really have a "job" or a practical utility, so I’ll likely leave it in a corner of my hard drive and move on to the next project.&lt;/p&gt;

&lt;p&gt;...Until the day it returns to seek its revenge! 🤖&lt;/p&gt;

&lt;p&gt;You can find me on LinkedIn here: &lt;a href="https://www.linkedin.com/in/jonathan-prevost-445758260/" rel="noopener noreferrer"&gt;Jonathan Prevost&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>Mapstronaut: A Flexible Object-Mapping Library for JS/TS</title>
      <dc:creator>Jonathan P.</dc:creator>
      <pubDate>Tue, 26 Aug 2025 19:08:46 +0000</pubDate>
      <link>https://dev.to/jprevo/introducing-mapstronaut-a-flexible-object-mapping-library-for-jsts-3470</link>
      <guid>https://dev.to/jprevo/introducing-mapstronaut-a-flexible-object-mapping-library-for-jsts-3470</guid>
      <description>&lt;p&gt;Hey everyone!&lt;/p&gt;

&lt;p&gt;I'm a senior developer with nearly 20 years of experience, but I'm pretty new to the open-source world. I wanted to share the story of my first major library.&lt;/p&gt;

&lt;h2&gt;Why Mapstronaut?&lt;/h2&gt;

&lt;p&gt;After 5 years of intensive development in TypeScript, I switched jobs about a year ago and found myself deep in the Java ecosystem. I had my preconceptions, but I have to admit: with modern frameworks and libraries, the developer experience is excellent.&lt;/p&gt;

&lt;p&gt;One library, in particular, became a daily tool for me: &lt;strong&gt;MapStruct&lt;/strong&gt;. It's a fantastic code generator for mapping one object to another. When I found myself working on Node.js or React projects again, I really missed its power and simplicity. I looked for an open-source equivalent in the JavaScript/TS world, but nothing quite fit my needs.&lt;/p&gt;

&lt;p&gt;So, I decided to build it myself.&lt;/p&gt;

&lt;h2&gt;Introducing Mapstronaut 👨‍🚀&lt;/h2&gt;

&lt;p&gt;Mapstronaut is a powerful and flexible object mapping library for JavaScript and TypeScript.&lt;/p&gt;

&lt;p&gt;Here are some of its key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple, easy-to-understand declarative mapping with transformers, filters, and error handling.&lt;/li&gt;
&lt;li&gt;Advanced source/target property selectors using JSONPath and dot-prop.&lt;/li&gt;
&lt;li&gt;Auto-mapping available: maps values present in both the source and target without explicit declaration.&lt;/li&gt;
&lt;li&gt;Can perform asynchronous transformations and mappings in parallel.&lt;/li&gt;
&lt;li&gt;Very high test coverage.&lt;/li&gt;
&lt;li&gt;Comprehensive documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;Let's start with a basic mapping. Imagine you have a complex source object and need to create a simpler DTO.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;spacecraft&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Apollo 11&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Neil Armstrong&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buzz Aldrin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Michael Collins&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;mission&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;year&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1969&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Moon&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;spacecraft.name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vesselName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mission.destination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;target&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;spacecraft.crew[0]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firstManOnTheMoon&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mapObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;structure&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cm"&gt;/*
result is:
{
  "vesselName": "Apollo 11",
  "target": "Moon",
  "firstManOnTheMoon": "Neil Armstrong"
}
*/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But Mapstronaut can do much more. Here is an example of a more advanced set of rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mission.budget&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;budgetBillions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Use a custom function to transform values&lt;/span&gt;
    &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;e9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// Use JSONPath to slice an array&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crew[0:2]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crew.firstTwo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// Use JSONPath to filter an array&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crew[?(@.age &amp;gt;= 30)]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crew.senior&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// Add a constant value to the output&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;NASA&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
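
&lt;p&gt;Rules can also carry a filter and an error handler. The property names below are illustrative rather than verbatim API; the documentation lists the exact options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// "filter" and "onError" are illustrative names, not verbatim API.
const structure = [
  {
    source: "mission.budget",
    target: "budgetBillions",
    // Skip the rule entirely when the predicate returns false.
    filter: (budget) =&amp;gt; typeof budget === "number",
    transform: (budget) =&amp;gt; budget / 1e9,
    // Provide a fallback value if the transform throws.
    onError: (error) =&amp;gt; 0,
  },
];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;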



&lt;p&gt;Beyond these examples, Mapstronaut is also fully async-ready. This allows you to perform complex operations, like fetching data from an API in the middle of a transformation. All async operations can be run in parallel for maximum performance. You can find detailed examples for this and more in the documentation.&lt;/p&gt;
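
&lt;p&gt;As a quick illustration, an async rule can look something like this. The async entry-point name is assumed here, and the endpoint is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// "mapObjectAsync" is an assumed name; check the docs for the real one.
import { mapObjectAsync } from "mapstronaut";

const structure = [
  {
    source: "mission.id",
    target: "weather",
    // Async transformer: enrich the output with data fetched elsewhere.
    transform: async (id) =&amp;gt; {
      const res = await fetch(`https://api.example.com/weather/${id}`);
      return res.json();
    },
  },
];

// Independent async rules like this one can run in parallel.
const result = await mapObjectAsync(structure, { mission: { id: 42 } });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;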

&lt;p&gt;The project is open source under the MIT license.&lt;/p&gt;

&lt;p&gt;You can check out the code and documentation here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jprevo/mapstronaut" rel="noopener noreferrer"&gt;https://github.com/jprevo/mapstronaut&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'd love to hear your feedback. What do you use for object mapping in your projects? Are there any features you'd find useful?&lt;/p&gt;

&lt;p&gt;Feel free to open an issue or a PR on GitHub.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>javascript</category>
      <category>datastructures</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
