<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Torkian</title>
    <description>The latest articles on DEV Community by Torkian (@torkian).</description>
    <link>https://dev.to/torkian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943111%2F354b4c8f-f2d1-4929-9575-e2d459e89a6b.png</url>
      <title>DEV Community: Torkian</title>
      <link>https://dev.to/torkian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/torkian"/>
    <language>en</language>
    <item>
      <title>Build Your First AI App with NVIDIA NIM in 30 Minutes</title>
      <dc:creator>Torkian</dc:creator>
      <pubDate>Thu, 21 May 2026 22:43:28 +0000</pubDate>
      <link>https://dev.to/torkian/build-your-first-ai-app-with-nvidia-nim-in-30-minutes-1i43</link>
      <guid>https://dev.to/torkian/build-your-first-ai-app-with-nvidia-nim-in-30-minutes-1i43</guid>
      <description>&lt;p&gt;Most students I've taught at USC have used ChatGPT. Far fewer have called a model from code.&lt;/p&gt;

&lt;p&gt;That is the gap this post is meant to close. In 30 minutes, you'll call an NVIDIA-hosted language model from Python, pass it a small knowledge base, and make it answer only from that data. No GPU setup, no CUDA detour, no pretending a notebook is production. The goal is simple — write a normal Python program that talks to an LLM and gets useful text back.&lt;/p&gt;

&lt;p&gt;I'm B Torkian, NVIDIA Developer Champion at USC, and I use this as a starter workshop for university and community groups. I've run a version of it with about 40 USC students. What usually surprises people is how ordinary the app feels. Most of it is normal software; one function call in the middle just happens to be weirdly powerful.&lt;/p&gt;

&lt;p&gt;Everything runs in Google Colab because, for a room full of mixed laptops (I have made peace with this), boring setup wins.&lt;/p&gt;

&lt;p&gt;This is Part 1 of a 5-part series that goes from one API call all the way to a small tool-using agent. Each post stands on its own, so start here and move forward as far as you want to go.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you're building
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User question → Python app → NVIDIA NIM API → LLM response → App output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small USC campus assistant. It will call an NVIDIA-hosted Llama model, use the data you provide, and refuse when the answer isn't there.&lt;/p&gt;

&lt;p&gt;That refusal part matters. Demos can guess. Useful apps need to know when to say "I don't know."&lt;/p&gt;




&lt;h2&gt;
  
  
  What NVIDIA NIM is
&lt;/h2&gt;

&lt;p&gt;NIM stands for NVIDIA Inference Microservices. For this post, treat it as hosted model inference from NVIDIA with a clean API in front.&lt;/p&gt;

&lt;p&gt;There are two common ways to use it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hosted through NVIDIA's API Catalog at &lt;a href="https://build.nvidia.com/" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;. That's what we're using here; check the current catalog terms before you teach it, because credits and available models can change.&lt;/li&gt;
&lt;li&gt;Self-hosted on your own GPU later, with the same API shape. (That's Part 4 of this series.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Whoever decided NVIDIA's API should mimic OpenAI's saved everyone a week of onboarding. You use the client most people have already seen, point it at a different endpoint, and move on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites (5 minutes)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;A free NVIDIA Developer account — &lt;a href="https://developer.nvidia.com/" rel="noopener noreferrer"&gt;developer.nvidia.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An API key from &lt;a href="https://build.nvidia.com/" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt; → pick any model → &lt;strong&gt;Get API Key&lt;/strong&gt;. It starts with &lt;code&gt;nvapi-&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A Google account for Colab.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first time I taught this, I forgot to say the key starts with &lt;code&gt;nvapi-&lt;/code&gt;, and half the room pasted the wrong thing (usually not their fault). Check that before you debug anything else.&lt;/p&gt;
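
&lt;p&gt;If you want the notebook to catch a wrong paste before any request goes out, a small check along these lines may help. It is only a sketch: &lt;code&gt;looks_like_nvidia_key&lt;/code&gt; is a hypothetical helper name, and the only rule it assumes is the &lt;code&gt;nvapi-&lt;/code&gt; prefix mentioned above. It does not confirm that the key is valid.&lt;/p&gt;

```python
def looks_like_nvidia_key(key: str):
    """Return True if the pasted text has the shape of an API Catalog key.

    Hypothetical helper: it only checks the 'nvapi-' prefix described in the
    post and cannot tell whether the key is actually accepted by the service.
    """
    cleaned = key.strip()  # pasted keys often carry stray spaces or newlines
    return cleaned.startswith("nvapi-") and cleaned != "nvapi-"


print(looks_like_nvidia_key("nvapi-example-value"))  # placeholder, not a real key
print(looks_like_nvidia_key("sk-something-else"))
```

&lt;p&gt;A prefix check will not catch an expired or mistyped key, but it rules out the most common workshop mistake cheaply.&lt;/p&gt;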




&lt;h2&gt;
  
  
  Step 1: Open Colab and install the client
&lt;/h2&gt;

&lt;p&gt;NVIDIA's API Catalog is OpenAI-compatible, so we'll use the standard &lt;code&gt;openai&lt;/code&gt; Python client and point it at NVIDIA's endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;getpass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NVIDIA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getpass&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Paste your NVIDIA API key: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NVIDIA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-8b-instruct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;base_url&lt;/code&gt; points at NVIDIA's hosted inference endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MODEL&lt;/code&gt; is just a model name from the API Catalog. Swap it later if you want; the call shape does not change.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2: Make your first model call
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You are a helpful, concise assistant.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Explain GPU acceleration to a first-year CS student in 5 sentences.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it.&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;ask()&lt;/code&gt; function is the basic shape of a lot of AI apps — instructions in, user input in, model response out. Real systems add plumbing, but this is the core.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Use the system prompt to steer the model
&lt;/h2&gt;

&lt;p&gt;Now keep the model and change its job description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You are a sarcastic but accurate professor. Keep it under 5 sentences.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Explain GPU acceleration to a first-year CS student.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output changes because the system prompt changes the model's job. A little precision buys you a lot here; vague prompts make debugging miserable.&lt;/p&gt;

&lt;p&gt;Treat prompts like tiny specs — include constraints, output shape, and what to do when a question goes off-track. Then test with slightly annoying questions, because users will absolutely ask those.&lt;/p&gt;
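
&lt;p&gt;One way to make the "tiny spec" idea concrete is to assemble the system prompt from named parts, so each constraint is visible and easy to change. This is a sketch, not a required pattern: &lt;code&gt;build_system_prompt&lt;/code&gt;, its parameters, and the example wording are all made up for illustration.&lt;/p&gt;

```python
def build_system_prompt(role: str, constraints: list[str],
                        output_shape: str, off_topic_reply: str):
    """Assemble a system prompt from the parts of a small spec.

    Hypothetical helper: the section names and layout are one possible
    convention, not something the model or the API requires.
    """
    lines = [f"You are {role}.", "", "Constraints:"]
    lines += [f"- {rule}" for rule in constraints]
    lines += [
        "",
        f"Output format: {output_shape}",
        "",
        f'If the question is off-topic, reply exactly: "{off_topic_reply}"',
    ]
    return "\n".join(lines)


spec_prompt = build_system_prompt(
    role="a tutor for first-year CS students",
    constraints=["Use at most 5 sentences.", "Avoid jargon unless you define it."],
    output_shape="one short paragraph of plain text",
    off_topic_reply="That is outside what I can help with here.",
)
print(spec_prompt)
```

&lt;p&gt;The resulting string can be passed as &lt;code&gt;system_prompt&lt;/code&gt; to the &lt;code&gt;ask()&lt;/code&gt; function from Step 2.&lt;/p&gt;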




&lt;h2&gt;
  
  
  Step 4: Build the USC campus assistant
&lt;/h2&gt;

&lt;p&gt;An LLM doesn't know the USC schedule. It may still sound confident, which is exactly the problem.&lt;/p&gt;

&lt;p&gt;So put the USC campus information directly into the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;campus_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
The USC AI Club meets every Thursday at 5 PM in the engineering building, room 204.
The USC GPU computing lab is open Monday to Friday from 10 AM to 6 PM.
USC students can join the NVIDIA Developer Program for free.
The next USC AI Club workshop will cover Retrieval Augmented Generation (RAG).
Office hours for the USC AI/ML faculty are Tuesdays 2-4 PM.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;assistant_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a USC campus assistant. Answer ONLY using the
information in CAMPUS INFO below. If the answer is not in there, say
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t have that information — check with the USC AI Club.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

CAMPUS INFO:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;campus_info&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;When does the USC AI Club meet?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Is the USC GPU lab open on Saturday?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is the wifi password?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Q: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it and read the outputs before moving on. The USC AI Club answer should come straight from the text. For Saturday, the model often replies with the fallback line instead of inferring that the lab is closed. That is the behavior I want for this exercise: "Monday to Friday" gives a human enough to reason about Saturday, but the Saturday answer is never stated in the provided data.&lt;/p&gt;

&lt;p&gt;The wifi question should also get the fallback line, because there is nothing in &lt;code&gt;campus_info&lt;/code&gt; about passwords. If your model says "I don't have that information — check with the USC AI Club," do not treat that as a failure. It stayed inside the box we gave it, which is the whole point.&lt;/p&gt;
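
&lt;p&gt;If you test with more than a handful of questions, it can help to flag fallback answers in code instead of reading every line. The sketch below assumes the model repeats the opening words of the fallback sentence from the system prompt; &lt;code&gt;is_fallback&lt;/code&gt; and &lt;code&gt;FALLBACK_PREFIX&lt;/code&gt; are hypothetical names, and a model that paraphrases the refusal would slip past this check.&lt;/p&gt;

```python
# Opening words of the fallback sentence used in assistant_system_prompt.
FALLBACK_PREFIX = "i don't have that information"


def is_fallback(answer: str):
    """Return True if the answer begins with the fallback sentence.

    Hypothetical helper: it matches on a prefix only, so a paraphrased
    refusal is reported as a normal answer.
    """
    normalized = answer.strip().strip('"').lower()
    # Some models emit a curly apostrophe; fold it to a straight one.
    normalized = normalized.replace("\u2019", "'")
    return normalized.startswith(FALLBACK_PREFIX)


print(is_fallback("I don't have that information. Check with the USC AI Club."))
print(is_fallback("The USC AI Club meets every Thursday at 5 PM."))
```

&lt;p&gt;In the loop above, calling this on each result of &lt;code&gt;ask()&lt;/code&gt; gives you a quick count of answered versus refused questions.&lt;/p&gt;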

&lt;p&gt;Last USC cohort, one student replaced the campus info with their D&amp;amp;D campaign notes and ended up with the most fun bug-hunting session of the day. The pattern works for silly data and useful data, which is why it sticks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: What you actually did
&lt;/h2&gt;

&lt;p&gt;You just built manual RAG — you picked the context by hand, inserted it into the prompt, and asked the model to answer from that context. In a production-ish version, the hand-picked &lt;code&gt;campus_info&lt;/code&gt; string becomes whatever your retrieval system finds.&lt;/p&gt;

&lt;p&gt;In a real app, the context probably comes from PDFs, docs, tickets, lecture notes, or a wiki. You retrieve a few relevant chunks at query time, usually with embeddings and a vector database, then pass only those along.&lt;/p&gt;
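
&lt;p&gt;To see the shape of that retrieval step without any extra services, here is a deliberately crude stand-in that ranks chunks by shared words. It is not an embedding-based retriever and not what a production system would use; &lt;code&gt;tokenize&lt;/code&gt;, &lt;code&gt;retrieve&lt;/code&gt;, and &lt;code&gt;top_k&lt;/code&gt; are hypothetical names chosen for this sketch.&lt;/p&gt;

```python
import re


def tokenize(text: str):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2):
    """Return up to top_k chunks that share the most words with the question.

    Toy keyword overlap, used here only as a placeholder for a real
    retriever built on embeddings and a vector database.
    """
    question_words = tokenize(question)
    scored = [(len(question_words.intersection(tokenize(chunk))), chunk)
              for chunk in chunks]
    # Drop chunks with no shared words, then sort by overlap, highest first.
    matches = [pair for pair in scored if pair[0] != 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in matches[:top_k]]


chunks = [
    "The USC AI Club meets every Thursday at 5 PM in the engineering building, room 204.",
    "The USC GPU computing lab is open Monday to Friday from 10 AM to 6 PM.",
    "Office hours for the USC AI/ML faculty are Tuesdays 2-4 PM.",
]
retrieved_context = "\n".join(retrieve("When is the GPU lab open?", chunks, top_k=1))
print(retrieved_context)
```

&lt;p&gt;The string in &lt;code&gt;retrieved_context&lt;/code&gt; would take the place of &lt;code&gt;campus_info&lt;/code&gt; when the system prompt is built. Word overlap breaks down quickly on synonyms and on common words like "the", which is part of why real systems use embeddings.&lt;/p&gt;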

&lt;p&gt;The model call barely changes. Only the source of the context does, and most of the engineering work lives in that swap.&lt;/p&gt;

&lt;p&gt;That swap is exactly what Part 2 of this series is about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get the code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/torkian/nvidia-nim-workshop" rel="noopener noreferrer"&gt;github.com/torkian/nvidia-nim-workshop&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;One-click Colab:&lt;/strong&gt; &lt;a href="https://colab.research.google.com/github/torkian/nvidia-nim-workshop/blob/main/notebook.ipynb" rel="noopener noreferrer"&gt;Open the notebook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. I run this at USC — fork it, change &lt;code&gt;campus_info&lt;/code&gt; to your school, your club, your project, and run it wherever you are.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next in this series
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Give Your AI App Real Knowledge — Embedding-Based RAG with NVIDIA NIM.&lt;/strong&gt; We replace the hand-picked context string with a real retriever that uses NVIDIA's embedding model, cosine similarity, and a query/passage distinction that most beginners get wrong on the first try.&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
