Build Your First AI App with NVIDIA NIM in 30 Minutes

#nvidia #ai #python #tutorial

Most students I've taught at USC have used ChatGPT. Far fewer have called a model from code.

That is the gap this post is meant to close. In 30 minutes, you'll call an NVIDIA-hosted language model from Python, pass it a small knowledge base, and make it answer only from that data. No GPU setup, no CUDA detour, no pretending a notebook is production. The goal is simple — write a normal Python program that talks to an LLM and gets useful text back.

I'm B Torkian, NVIDIA Developer Champion at USC, and I use this as a starter workshop for university and community groups. I've run a version of it with about 40 USC students. What usually surprises people is how ordinary the app feels. Most of it is normal software; one function call in the middle just happens to be weirdly powerful.

Everything runs in Google Colab because, for a room full of mixed laptops (I have made peace with this), boring setup wins.

This is Part 1 of a series that goes from one API call all the way to a streaming, tool-using agent that returns structured data. Each post stands on its own, so start here and move forward as far as you want to go.

What you're building

User question → Python app → NVIDIA NIM API → LLM response → App output

A small USC campus assistant. It will call an NVIDIA-hosted Llama model, use the data you provide, and refuse when the answer isn't there.

That refusal part matters. Demos can guess. Useful apps need to know when to say "I don't know."

What NVIDIA NIM is

NIM stands for NVIDIA Inference Microservices. For this post, treat it as hosted model inference from NVIDIA with a clean API in front.

There are two common ways to use it:

Hosted through NVIDIA's API Catalog at build.nvidia.com. That's what we're using here; check the current catalog terms before you teach it, because credits and available models can change.
Self-hosted on your own GPU later, with the same API shape. (That's Part 4 of this series.)

Whoever decided NVIDIA's API should mimic OpenAI's saved everyone a week of onboarding. You use the client most people have already seen, point it at a different endpoint, and move on.

Prerequisites (5 minutes)

A free NVIDIA Developer account — developer.nvidia.com
An API key from build.nvidia.com → pick any model → Get API Key. It starts with nvapi-.
A Google account for Colab.

The first time I taught this, I forgot to say the key starts with nvapi-, and half the room pasted the wrong thing (usually not their fault). Check that before you debug anything else.

Step 1: Open Colab and install the client

NVIDIA's API Catalog is OpenAI-compatible, so we'll use the standard openai Python client and point it at NVIDIA's endpoint.

%pip install -q openai

import os, getpass
from openai import OpenAI

os.environ['NVIDIA_API_KEY'] = getpass.getpass('Paste your NVIDIA API key: ')

client = OpenAI(
    base_url='https://integrate.api.nvidia.com/v1',
    api_key=os.environ['NVIDIA_API_KEY'],
)

MODEL = 'meta/llama-3.1-8b-instruct'

Notice two things:

base_url points at NVIDIA's hosted inference endpoint.
MODEL is just a model name from the API Catalog. Swap it later if you want; the call shape does not change.

Step 2: Make your first model call

def ask(system_prompt: str, user_message: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {'role': 'system', 'content': system_prompt},
            {'role': 'user',   'content': user_message},
        ],
        temperature=0.3,
        max_tokens=400,
    )
    return response.choices[0].message.content

print(ask(
    system_prompt='You are a helpful, concise assistant.',
    user_message='Explain GPU acceleration to a first-year CS student in 5 sentences.',
))

Run it.

That ask() function is the basic shape of a lot of AI apps — instructions in, user input in, model response out. Real systems add plumbing, but this is the core.

Step 3: Use the system prompt to steer the model

Now keep the model and change its job description:

print(ask(
    system_prompt='You are a sarcastic but accurate professor. Keep it under 5 sentences.',
    user_message='Explain GPU acceleration to a first-year CS student.',
))

The output changes because the system prompt changes the model's job. A little precision buys you a lot here; vague prompts make debugging miserable.

Treat prompts like tiny specs — include constraints, output shape, and what to do when a question goes off-track. Then test with slightly annoying questions, because users will absolutely ask those.

Step 4: Build the USC campus assistant

An LLM doesn't know the USC schedule. It may still sound confident, which is exactly the problem.

So put the USC campus information directly into the prompt:

campus_info = """
The USC AI Club meets every Thursday at 5 PM in the engineering building, room 204.
The USC GPU computing lab is open Monday to Friday from 10 AM to 6 PM.
USC students can join the NVIDIA Developer Program for free to access tools and learning resources.
The next USC AI Club workshop will cover Retrieval Augmented Generation (RAG).
Office hours for the USC AI/ML faculty are Tuesdays 2-4 PM.
"""

assistant_system_prompt = f"""You are a USC campus assistant. Answer ONLY using the
information in CAMPUS INFO below. If the answer is not in there, say
"I don't have that information — check with the USC AI Club."

CAMPUS INFO:
{campus_info}
"""

for question in [
    'When does the USC AI Club meet?',
    'Is the USC GPU lab open on Saturday?',
    'What is the wifi password?',
]:
    print(f'Q: {question}')
    print(f'A: {ask(assistant_system_prompt, question)}\n')

Run it and read the outputs before moving on. The USC AI Club answer should come straight from the text. For Saturday, the model often refuses with the fallback line instead of inferring closed. That is the behavior I want for this exercise — "Monday to Friday" gives a human enough to reason about Saturday, but the exact Saturday answer is not stated in the provided data.

The wifi question should also get the fallback line, because there is nothing in campus_info about passwords. If your model says "I don't have that information — check with the USC AI Club," do not treat that as a failure. It stayed inside the box we gave it, which is the whole point.

Last USC cohort, one student replaced the campus info with their D&D campaign notes and ended up with the most fun bug-hunting session of the day. The pattern works for silly data and useful data, which is why it sticks.

Step 5: What you actually did

You just built manual RAG — you picked the context by hand, inserted it into the prompt, and asked the model to answer from that context. In a production-ish version, the hand-picked campus_info string becomes whatever your retrieval system finds.

In a real app, the context probably comes from PDFs, docs, tickets, lecture notes, or a wiki. You retrieve a few relevant chunks at query time, usually with embeddings and a vector database, then pass only those along.

The model call barely changes — campus_info becomes the output of retrieval. Most of the engineering work lives in that swap.

That swap is exactly what Part 2 of this series is about.

Get the code

Repo: github.com/torkian/nvidia-nim-workshop
One-click Colab: Open the notebook
Local Python: app.py in the repo (python3 app.py after pip install -r requirements.txt).

MIT licensed. I run this at USC — fork it, change campus_info to your school, your club, your project, and run it wherever you are.