Akhilesh

Posted on May 28

100. OpenAI API: Build With GPT-4 (Post 100: The Final Chapter)

#deeplearning #nlp #llm #python

Post 1 was Python variables.

Post 100 is GPT-4.

You've gone from writing your first for loop to understanding transformer architectures, building neural networks from scratch, fine-tuning LLMs with LoRA, building RAG pipelines, and creating chatbots with memory.

This final post puts it all together. The OpenAI API is one of the most common ways to ship AI products: chat completions, function calling, streaming, vision, embeddings. Everything you need to build something real.

Let's finish strong.

What You'll Learn Here

API setup and authentication
Chat completions: the core pattern
System prompts: controlling model behavior
Function calling: giving LLMs tools
Streaming: responses that appear word by word
Vision: analyzing images with GPT-4V
Embeddings via API: fast, high quality
Token counting and cost management
Rate limits and error handling
A complete project: an AI assistant with tools

Setup

pip install openai tiktoken

import openai

# Set your API key
# Option 1: environment variable (recommended)
# export OPENAI_API_KEY='sk-...'

# Option 2: pass it to the client directly (never commit this to git)
# client = openai.OpenAI(api_key='sk-...')

client = openai.OpenAI()  # reads OPENAI_API_KEY from environment

# Test the connection
models = client.models.list()
print("Connected to OpenAI API")
print(f"Available models (sample): {[m.id for m in list(models)[:5]]}")

Chat Completions: The Core Pattern

Every chat interaction with GPT-4 goes through the same call, client.chat.completions.create. Its messages argument is a list of role-content pairs, where the role is system, user, or assistant.

# Simplest possible call
response = client.chat.completions.create(
    model="gpt-4o-mini",   # fast and cheap for most tasks
    messages=[
        {"role": "user", "content": "What is machine learning in one sentence?"}
    ]
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")
print(f"Model: {response.model}")

Output:

Machine learning is a field of AI where computers learn patterns from data to make predictions or decisions without being explicitly programmed for each task.

Tokens used: 52
Model: gpt-4o-mini

# With system prompt and multiple turns
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a senior ML engineer who explains concepts clearly and concisely. Use analogies when helpful. Never use jargon without explaining it."
        },
        {
            "role": "user",
            "content": "Explain overfitting."
        }
    ],
    temperature=0.7,        # randomness (0 = most deterministic, 2 = very random)
    max_tokens=300,         # cap the response length
    top_p=0.95,             # nucleus sampling (usually tune this or temperature, not both)
    frequency_penalty=0.1,  # penalize tokens in proportion to how often they've appeared
    presence_penalty=0.1,   # penalize any token that has appeared, nudging toward new topics
)

print(response.choices[0].message.content)
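The calls above each send a single user turn. The API keeps no memory between calls, so continuing a conversation means resending the full history, with each assistant reply appended before the next user message. Here is a minimal sketch of that bookkeeping; `start_conversation` and `add_exchange` are hypothetical helper names, not part of the OpenAI SDK.

```python
def start_conversation(system_prompt: str) -> list[dict]:
    """Create a message list that begins with a system prompt."""
    return [{"role": "system", "content": system_prompt}]


def add_exchange(messages: list[dict], user_text: str, assistant_text: str) -> list[dict]:
    """Append one user turn and the assistant's reply, in that order.

    The API is stateless, so this whole list is resent on every call.
    """
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages


history = start_conversation("You are a senior ML engineer who explains concepts clearly.")
add_exchange(history, "Explain overfitting.", "Overfitting is when a model memorizes noise ...")

# Next turn: add the new question, then send `history` as messages=history
history.append({"role": "user", "content": "How do I detect it?"})
print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```

In a live loop, `assistant_text` would be `response.choices[0].message.content` from the previous call.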

System Prompts: Controlling Model Behavior

The system prompt is the most direct way to shape the model's behavior: tone, output format, persona, and constraints all go here, and they apply to every turn you send it with.

system_prompts = {
    "concise_explainer": """
You explain technical concepts in 3 sentences or less.
Always use a concrete real-world example.
Never use bullet points.
""",
    "code_reviewer": """
You are a senior Python engineer doing code review.
Point out bugs, style issues, and performance problems.
Format your response as:
BUGS: (list any bugs)
STYLE: (list style issues)
PERFORMANCE: (list performance concerns)
SUGGESTIONS: (overall recommendation)
""",
    "socratic_tutor": """
You are a Socratic tutor. Never give direct answers.
Instead, guide the student with questions that help them discover the answer themselves.
When they get something right, affirm it and ask a deeper follow-up question.
""",
    "strict_json": """
You always respond in valid JSON format only.
No markdown. No explanation. Just raw JSON.
Never include anything outside the JSON structure.
"""
}

# Test the code reviewer persona
code_to_review = """
def get_user_data(user_ids):
    results = []
    for id in user_ids:
        data = database.query(f"SELECT * FROM users WHERE id = {id}")
        results.append(data)
    return results
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",  "content": system_prompts["code_reviewer"]},
        {"role": "user",    "content": f"Review this code:\n```python\n{code_to_review}\n```"}
    ]
)
print(response.choices[0].message.content)

Structured Output: JSON Mode

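When you need output you can parse reliably, turn on JSON mode with response_format. It guarantees syntactically valid JSON but not any particular schema, and the API requires the word "JSON" to appear somewhere in your messages (the system prompt below covers that).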

import json

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract structured information from the text. Return JSON only."
        },
        {
            "role": "user",
            "content": """
Extract the following from this text:
- person_name
- job_title
- company
- key_skills (list)

Text: "Sarah Chen is a Senior ML Engineer at Anthropic. She specializes in 
transformer architectures, reinforcement learning from human feedback, and 
large-scale distributed training."
"""
        }
    ],
    response_format={"type": "json_object"}   # forces JSON output
)

data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))

Output:

{
  "person_name": "Sarah Chen",
  "job_title": "Senior ML Engineer",
  "company": "Anthropic",
  "key_skills": [
    "transformer architectures",
    "reinforcement learning from human feedback",
    "large-scale distributed training"
  ]
}
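Even when `json.loads` succeeds, the keys you asked for may be missing or have unexpected types, so a small check before using the data is cheap insurance. This is a minimal sketch: `REQUIRED_FIELDS` and `validate_extraction` are hypothetical names mirroring the four fields requested above. Newer models also accept `response_format={"type": "json_schema", ...}` (Structured Outputs), which enforces a schema on the API side; check the current docs for which models support it.

```python
import json

# Expected fields and types for the extraction prompt above (hypothetical schema)
REQUIRED_FIELDS = {
    "person_name": str,
    "job_title": str,
    "company": str,
    "key_skills": list,
}


def validate_extraction(raw: str) -> dict:
    """Parse model output and check every required field exists with the right type."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"Missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"Field {field!r} should be {expected_type.__name__}")
    # key_skills should be a list of strings
    if not all(isinstance(skill, str) for skill in data["key_skills"]):
        raise ValueError("key_skills must contain only strings")
    return data


sample = ('{"person_name": "Sarah Chen", "job_title": "Senior ML Engineer", '
          '"company": "Anthropic", "key_skills": ["RLHF"]}')
print(validate_extraction(sample)["company"])  # Anthropic
```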

Function Calling: Giving GPT-4 Tools

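Function calling lets the model request tools (functions) that you define. The model never runs anything itself: it decides when to call a function and with what arguments, your code executes it, and you send the result back so the model can use it in its answer.

This request-execute-respond loop is the core of most AI agents.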

import json

# Define tools the model can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'London' or 'Tokyo'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search the company knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Mock function implementations
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production: call a real weather API
    return {
        "city": city,
        "temperature": 18,
        "units": units,
        "condition": "partly cloudy",
        "humidity": "65%"
    }

def search_documents(query: str, max_results: int = 3) -> list:
    # In production: call your vector database
    return [
        {"title": f"Document about {query}", "snippet": f"Relevant content for: {query}", "score": 0.92},
        {"title": f"Guide to {query}", "snippet": f"Comprehensive overview of {query}", "score": 0.87},
    ][:max_results]

# Dispatch function calls
def execute_function(name: str, arguments: dict):
    if name == "get_weather":
        return get_weather(**arguments)
    elif name == "search_documents":
        return search_documents(**arguments)
    else:
        return {"error": f"Unknown function: {name}"}

# Full function calling loop
def agent_chat(user_message: str) -> str:
    messages = [
        {"role": "system",  "content": "You are a helpful assistant with access to weather data and a document search tool. Use tools when needed."},
        {"role": "user",    "content": user_message}
    ]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            tool_choice="auto"   # model decides when to use tools
        )

        message = response.choices[0].message

        # If no tool calls: we have the final answer
        if not message.tool_calls:
            return message.content

        # Process tool calls
        messages.append(message)   # add assistant message with tool_calls

        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            arguments     = json.loads(tool_call.function.arguments)

            print(f"  [Tool call] {function_name}({arguments})")

            result = execute_function(function_name, arguments)

            messages.append({
                "role":         "tool",
                "tool_call_id": tool_call.id,
                "content":      json.dumps(result)
            })

# Test it
queries = [
    "What's the weather like in Tokyo right now?",
    "Search for documents about machine learning best practices.",
    "What's the weather in Paris and London? Compare them.",
]

for query in queries:
    print(f"\nUser: {query}")
    answer = agent_chat(query)
    print(f"Bot:  {answer}")

Output:

User: What's the weather like in Tokyo right now?
  [Tool call] get_weather({'city': 'Tokyo', 'units': 'celsius'})
Bot:  The current weather in Tokyo is 18°C and partly cloudy, with humidity at 65%.

User: Search for documents about machine learning best practices.
  [Tool call] search_documents({'query': 'machine learning best practices', 'max_results': 3})
Bot:  I found 2 relevant documents about machine learning best practices...

User: What's the weather in Paris and London? Compare them.
  [Tool call] get_weather({'city': 'Paris', 'units': 'celsius'})
  [Tool call] get_weather({'city': 'London', 'units': 'celsius'})
Bot:  Both Paris and London currently show 18°C with partly cloudy conditions...
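The loop above passes the model's arguments straight into your functions. Models occasionally emit malformed JSON or an argument your function does not accept, and either would crash `agent_chat`. A safer pattern, sketched below under the assumption that you keep tools in a plain dict, is to catch those failures and return the error to the model as the tool result so it can correct itself. `TOOL_REGISTRY` and `safe_execute` are hypothetical names; the weather function is a trimmed version of the mock above.

```python
import json
from typing import Any, Callable


def get_weather(city: str, units: str = "celsius") -> dict:
    # Mock implementation, as in the example above
    return {"city": city, "temperature": 18, "units": units}


# Map tool names (as declared in `tools`) to Python callables
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def safe_execute(name: str, raw_arguments: str) -> str:
    """Run a tool call and always return a JSON string for the 'tool' message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        arguments = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Invalid JSON arguments: {exc}"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})
    try:
        return json.dumps(func(**arguments))
    except TypeError as exc:  # e.g. missing or unexpected keyword arguments
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


print(safe_execute("get_weather", '{"city": "Tokyo"}'))
print(safe_execute("get_weather", '{"town": "Tokyo"}'))
```

Inside the loop, `tool_call.function.name` and `tool_call.function.arguments` (a JSON string) would be passed in directly, and the returned string used as the `content` of the `tool` message. Capping the number of iterations of the `while True` loop is another cheap safeguard against a model that keeps requesting tools.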

Streaming: Word-by-Word Responses

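Instead of waiting for the full response, stream it chunk by chunk as it is generated. Total generation time is about the same, but the first words appear almost immediately, which makes the UI feel much faster.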

import sys

def stream_response(user_message: str, system: str = "You are a helpful assistant."):
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user",   "content": user_message}
        ],
        stream=True   # enable streaming
    )

    full_response = ""
    print("Bot: ", end="", flush=True)

    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            token = chunk.choices[0].delta.content
            print(token, end="", flush=True)
            full_response += token

    print()   # newline at end
    return full_response

response = stream_response("Explain gradient descent in 3 sentences.")
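By default, streamed chunks carry no token counts. If you pass `stream_options={"include_usage": True}`, the API sends one extra final chunk whose `choices` list is empty and whose `usage` field is populated, which breaks code that indexes `chunk.choices[0]` unconditionally. The sketch below collects text and usage while guarding that case. It is exercised with stand-in chunk objects built with `SimpleNamespace` so it runs without an API key; those stand-ins only mimic the attributes read here.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> tuple[str, object]:
    """Join streamed text deltas and capture usage from the final chunk, if any."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-only final chunk has an empty `choices` list
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    # Stand-in for a streamed chunk; only mimics the attributes read above
    choices = [] if text is None else [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


chunks = [
    fake_chunk("Gradient "),
    fake_chunk("descent."),
    fake_chunk(usage=SimpleNamespace(total_tokens=42)),
]
text, usage = collect_stream(chunks)
print(text, usage.total_tokens)  # Gradient descent. 42
```

With a real client, the input would be the stream returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})`. Printing each delta as it arrives, as `stream_response` does, fits inside the same loop.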

Vision: Analyze Images With GPT-4V

import base64
import mimetypes

def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def analyze_image(image_path: str, question: str) -> str:
    base64_image = encode_image(image_path)
    # Use the file's real MIME type (image/png, image/jpeg, ...) in the data URL
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    response = client.chat.completions.create(
        model="gpt-4o",   # a vision-capable model (gpt-4o-mini also accepts images)
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:{mime_type};base64,{base64_image}",
                            "detail": "high"   # or "low" for faster/cheaper
                        }
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        max_tokens=300
    )

    return response.choices[0].message.content

# Also works with image URLs directly
def analyze_image_url(url: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": url}},
                    {"type": "text",      "text": question}
                ]
            }
        ]
    )
    return response.choices[0].message.content

# Example
# result = analyze_image_url(
#     "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Eiffel_Tower.jpg/640px-Eiffel_Tower.jpg",
#     "What is in this image? Describe it in detail."
# )
print("Vision API ready - pass an image path or URL with your question")
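The `detail` setting also changes how many input tokens an image costs. The sketch below follows the formula OpenAI published for gpt-4o at the time of writing: a `low`-detail image costs a flat 85 tokens; a `high`-detail image is scaled to fit within 2048x2048, then scaled so its shortest side is 768px, and costs 170 tokens per 512px tile plus 85. Treat the constants as assumptions: other models (gpt-4o-mini included) use different numbers, so verify against the current vision docs. This version only ever scales images down.

```python
import math

# Assumed gpt-4o constants from OpenAI's vision docs at the time of writing;
# other models use different values, so check the current docs.
BASE_TOKENS = 85
TILE_TOKENS = 170
TILE_SIZE = 512


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Approximate input tokens for one image under the assumed constants."""
    if detail == "low":
        return BASE_TOKENS
    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is 768px (downscale only)
    if min(w, h) > 768:
        shrink = 768 / min(w, h)
        w, h = w * shrink, h * shrink
    # Step 3: count 512px tiles
    tiles = math.ceil(w / TILE_SIZE) * math.ceil(h / TILE_SIZE)
    return BASE_TOKENS + TILE_TOKENS * tiles


print(estimate_image_tokens(1024, 1024))         # 765
print(estimate_image_tokens(2048, 4096))         # 1105
print(estimate_image_tokens(4096, 8192, "low"))  # 85
```

For the 1024x1024 case: it already fits in 2048, shrinks to 768x768, which is 2 x 2 = 4 tiles, so 85 + 4 x 170 = 765 tokens.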

Embeddings via API

# High-quality embeddings from OpenAI
def get_embedding(text: str, model: str = "text-embedding-3-small") -> list:
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

def get_embeddings_batch(texts: list, model: str = "text-embedding-3-small") -> list:
    response = client.embeddings.create(
        input=texts,
        model=model
    )
    return [item.embedding for item in response.data]

# Available embedding models
embedding_models = {
    "text-embedding-3-small": {"dim": 1536, "cost": "$0.02/1M tokens", "quality": "good"},
    "text-embedding-3-large": {"dim": 3072, "cost": "$0.13/1M tokens", "quality": "best"},
    "text-embedding-ada-002": {"dim": 1536, "cost": "$0.10/1M tokens", "quality": "older"},
}

print("OpenAI Embedding Models:")
for model, info in embedding_models.items():
    print(f"  {model}: dim={info['dim']}, cost={info['cost']}, quality={info['quality']}")

# Example usage
# embedding = get_embedding("Machine learning is fascinating.")
# print(f"Embedding dimension: {len(embedding)}")
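Embeddings become useful once you compare them. Cosine similarity is the usual measure; OpenAI's documentation notes its embeddings are normalized to length 1, in which case cosine similarity reduces to a plain dot product. The sketch below uses the general formula so it works for any vectors, then ranks documents against a query. The tiny 3-dimensional vectors are made-up stand-ins for real `get_embedding` outputs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], doc_vecs: dict) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar first."""
    scores = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (which have 1536+ dimensions)
query = [1.0, 0.0, 0.0]
docs = {
    "ml_intro": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
}
print(rank_by_similarity(query, docs)[0][0])  # ml_intro
```

With real data you would embed the documents once with `get_embeddings_batch`, store the vectors, and embed only the query at search time. For large collections, a vector database or NumPy matrix math replaces the Python loop.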

Token Counting and Cost Management

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def count_message_tokens(messages: list, model: str = "gpt-4o-mini") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")

    tokens_per_message = 3   # every message has role + content overhead
    tokens_per_name    = 1

    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += len(encoding.encode(str(value)))
            if key == "name":
                total += tokens_per_name

    total += 3   # every reply primed with <|start|>assistant<|message|>
    return total

# Illustrative prices per 1M tokens from 2024; rates change often, so check openai.com/pricing
pricing = {
    "gpt-4o":           {"input": 5.00,  "output": 15.00},   # per 1M tokens
    "gpt-4o-mini":      {"input": 0.15,  "output": 0.60},
    "gpt-3.5-turbo":    {"input": 0.50,  "output": 1.50},
    "text-embedding-3-small": {"input": 0.02, "output": 0},
}

def estimate_cost(n_input_tokens: int, n_output_tokens: int, model: str) -> float:
    if model not in pricing:
        return 0
    p        = pricing[model]
    cost_in  = (n_input_tokens / 1_000_000) * p["input"]
    cost_out = (n_output_tokens / 1_000_000) * p["output"]
    return cost_in + cost_out

# Example: estimate cost before making a call
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "Explain transformer architecture in detail."}
]
model = "gpt-4o-mini"

input_tokens     = count_message_tokens(messages, model)
estimated_output = 300   # worst case: assume the reply uses the full max_tokens budget

cost = estimate_cost(input_tokens, estimated_output, model)

print(f"Input tokens:    {input_tokens}")
print(f"Estimated output: {estimated_output}")
print(f"Estimated cost:  ${cost:.6f}")

# Track actual usage across calls
class UsageTracker:
    def __init__(self):
        self.total_input_tokens  = 0
        self.total_output_tokens = 0
        self.total_calls         = 0
        self.model_usage         = {}

    def track(self, response, model: str):
        usage = response.usage
        self.total_input_tokens  += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.total_calls         += 1

        if model not in self.model_usage:
            self.model_usage[model] = {'input': 0, 'output': 0, 'calls': 0}
        self.model_usage[model]['input']  += usage.prompt_tokens
        self.model_usage[model]['output'] += usage.completion_tokens
        self.model_usage[model]['calls']  += 1

    def report(self):
        print(f"Total API calls:    {self.total_calls}")
        print(f"Total input tokens: {self.total_input_tokens:,}")
        print(f"Total output tokens:{self.total_output_tokens:,}")
        total_cost = sum(
            estimate_cost(info['input'], info['output'], model)
            for model, info in self.model_usage.items()
        )
        print(f"Estimated cost:     ${total_cost:.4f}")

tracker = UsageTracker()
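Because prices differ so much between models, it can help to price the same request on every model before choosing one. The sketch below copies the per-1M-token prices from the `pricing` table above (which may be out of date); `compare_costs` is a hypothetical helper. For 1,000 input and 300 output tokens on gpt-4o-mini, the arithmetic is 1,000 / 1,000,000 × $0.15 + 300 / 1,000,000 × $0.60 = $0.00015 + $0.00018 = $0.00033.

```python
# Per-1M-token prices copied from the pricing table above (may be out of date)
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}


def compare_costs(n_input: int, n_output: int, prices: dict = PRICES) -> list[tuple[str, float]]:
    """Dollar cost of one request on each model, cheapest first."""
    costs = [
        (model, n_input / 1_000_000 * p["input"] + n_output / 1_000_000 * p["output"])
        for model, p in prices.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


for model, cost in compare_costs(1_000, 300):
    print(f"{model:15s} ${cost:.5f}")
```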

Error Handling and Rate Limits

import time
import random
from openai import RateLimitError, APIConnectionError, APIStatusError

# Note: the client already retries some failures on its own
# (twice by default; configure with openai.OpenAI(max_retries=...)).
def robust_api_call(messages: list, model: str = "gpt-4o-mini",
                    max_retries: int = 3, **kwargs) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                **kwargs
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = (2 ** attempt) + random.uniform(0, 1)   # exponential backoff + jitter
                print(f"Rate limited. Retrying in {wait_time:.1f}s...")
                time.sleep(wait_time)
            else:
                raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                print(f"Connection error. Retrying... ({attempt+1}/{max_retries})")
                time.sleep(2)
            else:
                raise

        except APIStatusError as e:
            # Retry only server-side (5xx) errors; 4xx errors won't fix themselves
            if e.status_code >= 500 and attempt < max_retries - 1:
                print(f"Server error {e.status_code}. Retrying...")
                time.sleep(1)
            else:
                raise

    # Only reached if max_retries < 1 (every failing attempt above re-raises)
    raise RuntimeError("max_retries must be at least 1")

# Usage
# result = robust_api_call(
#     messages=[{"role": "user", "content": "Hello"}],
#     model="gpt-4o-mini",
#     temperature=0.7
# )
print("Robust API call function ready with exponential backoff")
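The backoff logic above is written out separately for each error type. The same idea can be factored into a reusable helper, sketched below. It is deliberately generic (no OpenAI imports), so in practice `retry_on` would be something like `(RateLimitError, APIConnectionError)`; `retry_with_backoff` is a hypothetical name, and the `sleep` parameter is injectable so the behavior can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt (+ jitter), then retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise ValueError("max_retries must be at least 1")


# Demo: a function that fails twice, then succeeds
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []  # record delays instead of sleeping
print(retry_with_backoff(flaky, retry_on=(ConnectionError,), sleep=waits.append))  # ok
print(len(waits))  # 2
```

With the OpenAI client, `fn` would be something like `lambda: client.chat.completions.create(model=..., messages=...)`.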

Complete Project: An AI Assistant With Tools and Memory

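Bringing it all together: an assistant that keeps a sliding window of conversation memory, calls tools, and tracks token usage. Treat it as a starting point rather than production-ready code; for real deployments, wrap its API calls in the retry logic from the previous section.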

import json
import time
from typing import List, Optional
from collections import deque

class AIAssistant:
    def __init__(
        self,
        name: str = "Assistant",
        system_prompt: str = "You are a helpful AI assistant.",
        model: str = "gpt-4o-mini",
        max_history: int = 20,
        tools: Optional[list] = None,
        temperature: float = 0.7
    ):
        self.client      = openai.OpenAI()
        self.name        = name
        self.model       = model
        self.temperature = temperature
        self.tools       = tools or []
        self.history     = deque(maxlen=max_history)
        self.system      = system_prompt
        self.usage       = {"calls": 0, "tokens": 0}

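    def _get_messages(self) -> list:
        history = list(self.history)
        # The deque drops the oldest entries when full. Skip leading non-user
        # messages so the window never starts with an orphaned tool result
        # (the API rejects a "tool" message without its preceding tool call).
        while history and not (isinstance(history[0], dict) and history[0].get("role") == "user"):
            history.pop(0)
        return [{"role": "system", "content": self.system}] + history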

    def _execute_tool(self, name: str, args: dict) -> str:
        # Override this method to add your own tools
        return json.dumps({"error": f"Tool '{name}' not implemented"})

    def chat(self, user_message: str, stream: bool = False) -> str:
        self.history.append({"role": "user", "content": user_message})

        kwargs = {
            "model":       self.model,
            "messages":    self._get_messages(),
            "temperature": self.temperature,
        }
        if self.tools:
            kwargs["tools"]        = self.tools
            kwargs["tool_choice"]  = "auto"
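        if stream and not self.tools:
            # Stream only when no tools are set; with tools, fall back to the
            # non-streaming loop below (tool-call deltas would need reassembly)
            kwargs["stream"] = True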

        # Handle function calling loop
        while True:
            if stream and not self.tools:
                # Stream without tools
                response_text = ""
                stream_resp   = self.client.chat.completions.create(**kwargs)
                print(f"{self.name}: ", end="", flush=True)
                for chunk in stream_resp:
                    if chunk.choices[0].delta.content:
                        token = chunk.choices[0].delta.content
                        print(token, end="", flush=True)
                        response_text += token
                print()
                self.history.append({"role": "assistant", "content": response_text})
                return response_text

            # Non-streaming or tools
            response = self.client.chat.completions.create(**kwargs)
            message  = response.choices[0].message

            self.usage["calls"]  += 1
            self.usage["tokens"] += response.usage.total_tokens

            if not message.tool_calls:
                self.history.append({"role": "assistant", "content": message.content})
                return message.content

            # Process tool calls
            self.history.append(message)
            kwargs["messages"] = self._get_messages()

            for tool_call in message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)

                result = self._execute_tool(fn_name, fn_args)

                self.history.append({
                    "role":         "tool",
                    "tool_call_id": tool_call.id,
                    "content":      result
                })

            kwargs["messages"] = self._get_messages()

    def summarize_history(self) -> str:
        if not self.history:
            return "No conversation history."

        summary_prompt = f"Summarize this conversation in 2-3 sentences:\n\n"
        for msg in self.history:
            if isinstance(msg, dict) and msg.get('role') in ['user', 'assistant']:
                role = msg['role'].title()
                content = msg.get('content', '')
                if content:
                    summary_prompt += f"{role}: {content[:100]}...\n"

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": summary_prompt}],
            max_tokens=150
        )
        return response.choices[0].message.content

    def clear(self):
        self.history.clear()

    def stats(self) -> dict:
        return {
            "name":    self.name,
            "model":   self.model,
            "calls":   self.usage["calls"],
            "tokens":  self.usage["tokens"],
            "history": len(self.history),
        }


# Concrete assistant with tools
class MLTutorAssistant(AIAssistant):
    def __init__(self):
        super().__init__(
            name="ML Tutor",
            system_prompt="""You are an expert ML tutor who has read 'How Machines Learn: A Complete Guide from Zero to AI Engineer'.
You teach clearly with concrete examples and code snippets.
You remember what the student has learned so far in this session.
When relevant, reference concepts from earlier in the conversation.""",
            model="gpt-4o-mini",
            max_history=20,
            tools=[
                {
                    "type": "function",
                    "function": {
                        "name": "get_code_example",
                        "description": "Get a working Python code example for an ML concept",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "concept": {"type": "string", "description": "The ML concept to get code for"},
                                "difficulty": {"type": "string", "enum": ["beginner", "intermediate", "advanced"]}
                            },
                            "required": ["concept"]
                        }
                    }
                }
            ]
        )

    def _execute_tool(self, name: str, args: dict) -> str:
        if name == "get_code_example":
            concept    = args.get("concept", "")
            difficulty = args.get("difficulty", "beginner")
            # In production: pull from a real code database
            example = {
                "concept":    concept,
                "difficulty": difficulty,
                "code": f"# Example: {concept}\nimport sklearn\n# ... working code here",
                "explanation": f"This code demonstrates {concept} at {difficulty} level."
            }
            return json.dumps(example)
        return json.dumps({"error": f"Unknown tool: {name}"})


# Demo the complete assistant
print("=" * 60)
print("ML Tutor Assistant Demo")
print("=" * 60)

tutor = MLTutorAssistant()

demo_questions = [
    "What is overfitting and how do I detect it?",
    "Can you give me a code example for that?",
    "How is this related to the bias-variance tradeoff?",
]

for question in demo_questions:
    print(f"\nStudent: {question}")
    answer = tutor.chat(question)
    print(f"Tutor: {answer[:200]}...")

print(f"\n{tutor.stats()}")

Cost Optimization Tips

cost_tips = {
    "1. Model selection": {
        "tip": "Use gpt-4o-mini for most tasks. Only upgrade to gpt-4o when quality is genuinely insufficient.",
        "saving": "95% cost reduction vs gpt-4o"
    },
    "2. Prompt caching": {
        "tip": "OpenAI automatically caches the repeated prefix of prompts that are 1,024 tokens or longer. Put static content (system prompt, examples) first and variable content last.",
        "saving": "50% discount on cached input tokens (rates vary by model; check current pricing)"
    },
    "3. Batch API": {
        "tip": "Use the Batch API for tasks that don't need real-time responses (results arrive within 24 hours).",
        "saving": "50% discount vs real-time"
    },
    "4. Token counting": {
        "tip": "Count tokens before sending. Trim unnecessary context. Remove long system prompts when not needed.",
        "saving": "10-40% depending on your prompts"
    },
    "5. Max tokens": {
        "tip": "Always set max_tokens. Without it, the model can generate very long responses.",
        "saving": "Prevents runaway costs"
    },
    "6. Temperature for deterministic tasks": {
        "tip": "Use temperature=0 for classification, extraction, formatting. Outputs become more consistent (though not guaranteed identical), which makes caching responses in your own app more effective. OpenAI's prompt caching works on the input and is unaffected by temperature.",
        "saving": "Better hit rates for your own response cache"
    },
    "7. Local models for testing": {
        "tip": "Use Ollama or llama.cpp during development. Only hit the API when testing production behavior.",
        "saving": "90%+ during development"
    },
}

print("Cost Optimization Tips:")
for tip_name, info in cost_tips.items():
    print(f"\n{tip_name}")
    print(f"  Tip:    {info['tip']}")
    print(f"  Saving: {info['saving']}")
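Tip 4 suggests trimming unnecessary context. One common way to do that for chat history is to keep the system message and drop the oldest turns until the conversation fits a token budget. The sketch below takes the counting function as a parameter, so you can pass a tiktoken-based counter such as `count_message_tokens` from earlier; `trim_to_budget` is a hypothetical helper, and the word-count counter in the demo is only a crude stand-in for a real tokenizer.

```python
from typing import Callable


def trim_to_budget(
    messages: list[dict],
    budget: int,
    count_tokens: Callable[[list[dict]], int],
) -> list[dict]:
    """Drop the oldest non-system messages until count_tokens(result) <= budget.

    System messages are kept and moved to the front. If you use tools, also
    make sure a "tool" message is never kept without its assistant tool call.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Always keep at least the latest message, even if it alone exceeds the budget
    while len(rest) > 1 and count_tokens(system + rest) > budget:
        rest.pop(0)
    return system + rest


def word_count(messages: list[dict]) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word
    return sum(len(m["content"].split()) for m in messages)


conversation = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Tell me about overfitting in detail please"},
    {"role": "assistant", "content": "Overfitting means memorizing noise"},
    {"role": "user", "content": "How do I detect it?"},
]
trimmed = trim_to_budget(conversation, budget=12, count_tokens=word_count)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Here the full conversation is 3 + 7 + 4 + 5 = 19 words; dropping the first user message brings it to 12, which fits the budget.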

The Complete Journey: 100 Posts in One View

PHASE 1: Python That Actually Works (Posts 1-10)
  Variables, functions, OOP, error handling, file I/O

PHASE 2: Math for ML (Posts 11-20)
  Linear algebra, calculus, probability, statistics

PHASE 3: Data Wrangling Tools (Posts 27-39)
  NumPy, Pandas, Matplotlib, Seaborn, EDA

PHASE 4: SQL for Data (Posts 40-45)
  SELECT, JOINs, window functions, Python + SQL

PHASE 5: Dev Tools (Posts 46-50)
  Git, GitHub, Jupyter, Colab, virtual environments

PHASE 6: Machine Learning Core (Posts 51-71)
  Linear/logistic regression, trees, XGBoost, SVM,
  KNN, Naive Bayes, evaluation metrics, clustering,
  PCA, feature engineering, hyperparameter tuning

PHASE 7: Deep Learning (Posts 72-86)
  Neural networks, backprop, PyTorch, training loops,
  GPUs, CNNs, transfer learning, RNNs, autoencoders, GANs

PHASE 8: NLP and LLMs (Posts 87-100)
  Text preprocessing, tokenization, embeddings, attention,
  transformers, BERT, GPT, HuggingFace, fine-tuning,
  LoRA, vector search, RAG, chatbots, OpenAI API

What You Can Build Now

After 100 posts, you can build:

Classification systems for any domain
Regression models for prediction problems
Document intelligence pipelines with RAG
Custom chatbots with memory and tools
Image classification with CNNs
Fine-tuned domain models with LoRA
Semantic search engines
End-to-end ML pipelines with proper evaluation

The fundamentals don't change. Models come and go. APIs change. Architectures evolve. But gradient descent, overfitting, precision vs recall, the training loop, attention mechanisms: these ideas will still matter in 10 years.

You now understand them. Not just how to use them. Why they work.

Quick Cheat Sheet: OpenAI API

Task	Code
Basic chat	`client.chat.completions.create(model=..., messages=[...])`
System prompt	`{"role": "system", "content": "..."}` in messages
JSON output	`response_format={"type": "json_object"}`
Function calling	`tools=[...]`, `tool_choice="auto"`
Streaming	`stream=True`, iterate over chunks
Vision	Add `{"type": "image_url", "image_url": {"url": "..."}}` to content
Embeddings	`client.embeddings.create(input=text, model="text-embedding-3-small")`
Count tokens	`len(tiktoken.encoding_for_model(model).encode(text))`
Cost estimate	tokens / 1M * price_per_million
Retry on error	Catch `RateLimitError`, exponential backoff

Practice Challenges

Level 1:
Build a simple Q&A bot using the OpenAI API. Give it a custom system prompt that defines a persona. Test it with 10 questions and evaluate response quality.

Level 2:
Add function calling to the assistant. Define at least two tools: one that retrieves weather and one that searches Wikipedia. Verify the model correctly decides when to call each tool.

Level 3:
Build a complete AI assistant that combines: a custom system prompt, conversation memory (sliding window), RAG (ChromaDB with 20+ documents), at least 2 function tools, streaming output, and usage/cost tracking. Deploy it as a simple command-line chatbot.


This Is Post 100.

You started from zero. You learned Python, math, data wrangling, machine learning, deep learning, and large language models. You built classifiers, regressors, neural networks, transformers, RAG pipelines, and chatbots.

One hundred posts. One complete journey.

The field will keep moving. New architectures will appear. Better models will ship. Benchmarks will fall. But the person who understands why things work is never left behind by what comes next.

That's you now.

Go build something.
