Akhilesh

Posted on May 18

88. The OpenAI API: Everything You Can Build

#ai #productivity #beginners #python

Every AI product you use is probably calling an API somewhere.

The chat assistant in your IDE. The customer service bot on a website. The document summarizer in your company's internal tools. The code reviewer. The email writer. Nearly all of them send text to a remote model, get text back, and display it to you.

OpenAI built the most widely used API for this. Not the only one. Not always the cheapest. But the one with the most ecosystem support, the most tutorials, the most integrations, and the API design that others have copied.

This post covers everything: chat completions, streaming, function calling, embeddings, image generation, speech, and the patterns that make production applications reliable.

Setup and First Call

from openai import OpenAI
import json
import time
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # KeyError if unset; never hardcode keys in source

response = client.chat.completions.create(
    model    = "gpt-3.5-turbo",
    messages = [
        {"role": "system",  "content": "You are a helpful assistant."},
        {"role": "user",    "content": "What is machine learning in one sentence?"}
    ],
    temperature = 0.7,
    max_tokens  = 150,
)

print("Basic chat completion:")
print(f"  Response: {response.choices[0].message.content}")
print()
print("Response object details:")
print(f"  model:              {response.model}")
print(f"  finish_reason:      {response.choices[0].finish_reason}")
print(f"  prompt_tokens:      {response.usage.prompt_tokens}")
print(f"  completion_tokens:  {response.usage.completion_tokens}")
print(f"  total_tokens:       {response.usage.total_tokens}")
print()

cost_per_1k = {"gpt-3.5-turbo": (0.0005, 0.0015), "gpt-4-turbo": (0.01, 0.03)}
model = "gpt-3.5-turbo"
in_cost  = response.usage.prompt_tokens     / 1000 * cost_per_1k[model][0]
out_cost = response.usage.completion_tokens / 1000 * cost_per_1k[model][1]
print(f"  Estimated cost: ${in_cost + out_cost:.6f}")
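The chat completions endpoint is stateless: the model only sees the messages sent in that one request, so a multi-turn chat means resending the history on every call. Below is a minimal sketch of a history helper. The class name `Conversation`, the "about 4 characters per token" heuristic, and the budget values are illustrative assumptions, not part of the OpenAI SDK; for exact counts, use a real tokenizer such as tiktoken.

```python
class Conversation:
    """Hypothetical helper: keeps the system prompt plus the most recent
    turns under a rough token budget before each chat.completions call."""

    def __init__(self, system_prompt, max_tokens=3000):
        self.system = {"role": "system", "content": system_prompt}
        self.turns = []              # user/assistant messages, oldest first
        self.max_tokens = max_tokens

    @staticmethod
    def approx_tokens(text):
        # Rough heuristic: about 4 characters per token for English text.
        return max(1, len(text) // 4)

    def _total(self):
        return sum(self.approx_tokens(m["content"])
                   for m in [self.system] + self.turns)

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})
        # Drop the oldest turns until the estimate fits the budget,
        # always keeping the system prompt and the newest message.
        while len(self.turns) > 1 and self._total() > self.max_tokens:
            self.turns.pop(0)

    def messages(self):
        # The full list to pass as messages=... on every request.
        return [self.system] + self.turns


convo = Conversation("You are a helpful assistant.", max_tokens=30)
convo.add("user", "What is machine learning?")
convo.add("assistant", "Learning patterns from data instead of hand-written rules.")
convo.add("user", "Give one example.")
print([m["role"] for m in convo.messages()])
# → ['system', 'assistant', 'user']  (the oldest user turn was trimmed)
```

In a real chat loop you would pass `convo.messages()` as `messages=` and then call `convo.add("assistant", reply)` with the model's answer.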

Models Available and When to Use Each

models = {
    "gpt-3.5-turbo": {
        "context":  "16K tokens",
        "in_cost":  "$0.50 / 1M tokens",
        "out_cost": "$1.50 / 1M tokens",
        "speed":    "very fast",
        "best_for": "Legacy apps; gpt-4o-mini is now cheaper and more capable"
    },
    "gpt-4o-mini": {
        "context":  "128K tokens",
        "in_cost":  "$0.15 / 1M tokens",
        "out_cost": "$0.60 / 1M tokens",
        "speed":    "fast",
        "best_for": "Most tasks — best price/performance in 2024"
    },
    "gpt-4o": {
        "context":  "128K tokens",
        "in_cost":  "$5.00 / 1M tokens",
        "out_cost": "$15.00 / 1M tokens",
        "speed":    "moderate",
        "best_for": "Complex reasoning, long documents, multimodal, code"
    },
    "gpt-4-turbo": {
        "context":  "128K tokens",
        "in_cost":  "$10.00 / 1M tokens",
        "out_cost": "$30.00 / 1M tokens",
        "speed":    "moderate",
        "best_for": "Legacy integrations; gpt-4o is faster and cheaper"
    },
}

print(f"{'Model':<15} {'Context':>12} {'Input cost':>20} {'Output cost':>20} {'Speed':>10}")
print("=" * 81)
for name, info in models.items():
    print(f"{name:<15} {info['context']:>12} {info['in_cost']:>20} "
          f"{info['out_cost']:>20} {info['speed']:>10}")

print()
print("Practical rule:")
print("  Default: gpt-4o-mini (excellent quality, lowest cost)")
print("  Complex reasoning: gpt-4o (worth the cost)")
print("  High volume, simple tasks: gpt-4o-mini again (cheaper per token than gpt-3.5-turbo)")
print("  Check openai.com/pricing for updated costs (they change frequently)")
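Per-token prices only become meaningful at your traffic level. This small sketch multiplies the per-1M-token prices from the table above (copied by hand; they change often) by an assumed workload of 10,000 requests, each with 500 input and 200 output tokens. The workload numbers are illustrative assumptions.

```python
# USD per 1M tokens as (input, output), taken from the table above.
PRICES_PER_1M = {
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4o-mini":   (0.15, 0.60),
    "gpt-4o":        (5.00, 15.00),
    "gpt-4-turbo":   (10.00, 30.00),
}

def workload_cost(model, requests, in_tokens, out_tokens):
    """Estimated USD for `requests` calls with fixed token counts per call."""
    in_price, out_price = PRICES_PER_1M[model]
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return requests * per_request

for name in PRICES_PER_1M:
    print(f"{name:<15} ${workload_cost(name, 10_000, 500, 200):>8.2f}")
# gpt-3.5-turbo   $    5.50
# gpt-4o-mini     $    1.95
# gpt-4o          $   55.00
# gpt-4-turbo     $  110.00
```

Under these assumptions, gpt-4o-mini comes out cheapest by a wide margin, which is why it makes a sensible default.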

Streaming Responses

print("Streaming: Show tokens as they are generated (faster perceived response):")
print()

stream = client.chat.completions.create(
    model  = "gpt-3.5-turbo",
    messages = [
        {"role": "user", "content": "List 5 key concepts in machine learning, briefly."}
    ],
    stream = True,
)

print("Streaming output:")
full_response = ""
for chunk in stream:
    if not chunk.choices:              # some chunks (e.g. a usage-only final chunk) carry no choices
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
        full_response += delta.content

print()
print()
print("Streaming patterns:")
print("  - Use for chat interfaces (user sees tokens appear, feels faster)")
print("  - Collect full response by accumulating chunks")
print("  - Handle finish_reason to detect end of stream")
print("  - Use try/finally to handle disconnects gracefully")
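You can test the streaming patterns above without a network call by feeding in a fake stream. The sketch below builds chunk objects with `types.SimpleNamespace` that mimic the shape the SDK yields (`chunk.choices[0].delta.content`, `finish_reason`, and an optional `usage` field). The fakes and the helper name `collect_stream` are assumptions for illustration. With the real API, passing `stream_options={"include_usage": True}` requests a final chunk that carries token usage and an empty `choices` list, which is why the helper skips chunks without choices.

```python
from types import SimpleNamespace

def collect_stream(stream, on_text=None):
    """Accumulate streamed chunks; return (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    try:
        for chunk in stream:
            if getattr(chunk, "usage", None) is not None:
                usage = chunk.usage                # usage-only final chunk
            if not chunk.choices:                  # that chunk has no choices
                continue
            choice = chunk.choices[0]
            if choice.delta.content:
                parts.append(choice.delta.content)
                if on_text:
                    on_text(choice.delta.content)
            if choice.finish_reason:               # "stop", "length", "tool_calls", ...
                finish_reason = choice.finish_reason
    finally:
        # Release the connection even if iteration fails part-way.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return "".join(parts), finish_reason, usage


def fake_chunk(content=None, finish_reason=None, usage=None, empty=False):
    """Stand-in for an SDK chunk object (assumed shape, for offline testing)."""
    delta = SimpleNamespace(content=content)
    choices = [] if empty else [SimpleNamespace(delta=delta, finish_reason=finish_reason)]
    return SimpleNamespace(choices=choices, usage=usage)

fake = [fake_chunk("Super"), fake_chunk("vised"), fake_chunk(" learning"),
        fake_chunk(None, finish_reason="stop"),
        fake_chunk(usage=SimpleNamespace(total_tokens=42), empty=True)]
text, reason, usage = collect_stream(fake)
print(text, reason, usage.total_tokens)   # → Supervised learning stop 42
```

With the real SDK, you would call `collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))` on the object returned by `client.chat.completions.create(..., stream=True)`. A `finish_reason` of `"length"` tells you the reply was cut off by `max_tokens`.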

System Prompts: The Most Important Tool

def create_assistant(role, constraints, output_format=None):
    """Build a well-structured system prompt."""
    parts = [f"You are {role}."]
    if constraints:
        parts.append("Rules:")
        for constraint in constraints:
            parts.append(f"- {constraint}")
    if output_format:
        parts.append(f"Always respond in: {output_format}")
    return "\n".join(parts)

personas = {
    "Concise Technical Writer": create_assistant(
        role="a technical writer who values precision and brevity",
        constraints=[
            "Never use more than 3 sentences per answer",
            "Always use specific technical terms",
            "Provide one code example when relevant"
        ],
    ),
    "Socratic Tutor": create_assistant(
        role="a Socratic tutor who teaches through questions",
        constraints=[
            "Never give direct answers — only ask guiding questions",
            "Build on the student's own reasoning",
            "Acknowledge correct insights before probing further"
        ],
    ),
    "JSON Extractor": create_assistant(
        role="a data extraction assistant",
        constraints=[
            "Extract only what is explicitly stated in the input",
            "Use null for missing values",
            "Never infer or guess information"
        ],
        output_format="valid JSON only, no explanation, no markdown"
    ),
}

for name, prompt in personas.items():
    print(f"System prompt: {name}")
    print(f"  {prompt[:120]}...")
    print()

print("System prompt best practices:")
best_practices = [
    "Be explicit about role, constraints, and output format",
    "Use bullet points for rules (models follow them more reliably)",
    "Specify what NOT to do, not just what to do",
    "Include examples when the output format is complex",
    "Keep it concise — long system prompts dilute attention",
]
for p in best_practices:
    print(f"  • {p}")
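For the "include examples" practice, one common pattern is to put worked examples in as earlier user/assistant turns before the real input (few-shot prompting). Here is a minimal sketch; the function name and the example data are hypothetical.

```python
import json

def build_few_shot(system_prompt, examples, user_input):
    """Return a messages list: system prompt, example pairs, then the real input."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages

# Hypothetical examples; json.dumps turns None into null, matching "use null for missing values".
examples = [
    ("Priya Rao, data analyst at Acme.",
     json.dumps({"name": "Priya Rao", "role": "data analyst", "company": "Acme"})),
    ("Contact: Tom Lee.",
     json.dumps({"name": "Tom Lee", "role": None, "company": None})),
]
msgs = build_few_shot("You are a data extraction assistant. Reply with JSON only.",
                      examples, "Ana Gomez leads design at Globex.")
print([m["role"] for m in msgs])
# → ['system', 'user', 'assistant', 'user', 'assistant', 'user']
```

The example turns show the model the exact output shape, including how missing fields should look, more concretely than a rule alone.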

Function Calling: Connecting LLMs to External Tools

print("\nFunction Calling: The Most Powerful OpenAI Feature")
print()
print("Without function calling: LLM can only talk.")
print("With function calling:    LLM can DO things.")
print()

tools = [
    {
        "type": "function",
        "function": {
            "name":        "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type":        "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type":        "string",
                        "enum":        ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name":        "search_database",
            "description": "Search company knowledge base",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type":        "string",
                        "description": "Search query"
                    },
                    "max_results": {
                        "type":        "integer",
                        "description": "Max results to return",
                        "default":     3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

def execute_tool(tool_name, tool_args):
    """Simulate tool execution."""
    if tool_name == "get_weather":
        city = tool_args.get("city", "unknown")
        return json.dumps({"city": city, "temperature": 28, "condition": "sunny", "unit": "celsius"})
    elif tool_name == "search_database":
        return json.dumps({"results": [
            {"text": "Q3 revenue was $4.2M", "source": "Q3 Report"},
            {"text": "Premium plan costs $49/month", "source": "Pricing"}
        ]})
    return json.dumps({"error": "unknown tool"})

def run_with_tools(user_message, tools, verbose=True):
    """Complete tool-use loop."""
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model   = "gpt-3.5-turbo",
        messages = messages,
        tools   = tools,
        tool_choice = "auto"
    )

    msg = response.choices[0].message

    if msg.tool_calls:  # the model asked for one or more tool calls
        messages.append({"role": "assistant", "content": None,
                          "tool_calls": [tc.model_dump() for tc in msg.tool_calls]})

        for tool_call in msg.tool_calls:
            fn_name  = tool_call.function.name
            fn_args  = json.loads(tool_call.function.arguments)
            if verbose:
                print(f"  → Calling tool: {fn_name}({fn_args})")
            result = execute_tool(fn_name, fn_args)
            if verbose:
                print(f"  ← Tool result:  {result[:80]}")

            messages.append({
                "role":         "tool",
                "tool_call_id": tool_call.id,
                "content":      result
            })

        final = client.chat.completions.create(
            model=response.model, messages=messages)
        return final.choices[0].message.content

    return msg.content

test_queries = [
    "What's the weather like in Mumbai right now?",
    "What is our Q3 revenue?",
    "What is the capital of France?",
]

print("Function calling test:")
for query in test_queries:
    print(f"\nUser: {query}")
    answer = run_with_tools(query, tools, verbose=True)
    print(f"Bot:  {(answer or '')[:120]}")
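`execute_tool` above dispatches with if/elif and trusts `json.loads` on the model's arguments. The model can occasionally emit malformed JSON or leave out a required field, so a more defensive sketch maps tool names to Python functions and checks the schema's `required` fields before calling anything. `TOOL_REGISTRY`, `dispatch_tool_call`, and the mock functions are hypothetical; the `required` lists mirror the `tools` definitions above. Returning errors as JSON lets the model see what went wrong on the next turn instead of crashing the loop.

```python
import json

def get_weather(city, unit="celsius"):
    return {"city": city, "temperature": 28, "unit": unit}                 # mock data

def search_database(query, max_results=3):
    return {"results": [f"stub result for {query!r}"][:max_results]}      # mock data

# Tool name -> (Python function, required argument names from its schema).
TOOL_REGISTRY = {
    "get_weather":     (get_weather,     ["city"]),
    "search_database": (search_database, ["query"]),
}

def dispatch_tool_call(name, arguments_json):
    """Run one tool call; always return a JSON string for the 'tool' message."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    fn, required = TOOL_REGISTRY[name]
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    missing = [key for key in required if key not in args]
    if missing:
        return json.dumps({"error": f"missing required arguments: {missing}"})
    try:
        return json.dumps(fn(**args))
    except TypeError as exc:               # e.g. an argument name the function lacks
        return json.dumps({"error": str(exc)})

print(dispatch_tool_call("get_weather", '{"city": "Mumbai"}'))
print(dispatch_tool_call("get_weather", '{"unit": "celsius"}'))
```

Inside `run_with_tools`, `dispatch_tool_call(tool_call.function.name, tool_call.function.arguments)` could replace the `json.loads` plus `execute_tool` pair.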

Structured JSON Output

print("\nStructured Output: Reliable JSON from LLMs")
print()

response = client.chat.completions.create(
    model    = "gpt-3.5-turbo",
    messages = [
        {
            "role": "system",
            "content": "Extract information from the text. Respond with valid JSON only, "
                       "no markdown, no explanation. "
                       "Schema: {name: string, role: string, company: string, "
                       "skills: [string], years_experience: int|null}"
        },
        {
            "role": "user",
            "content": "John Smith is a Senior ML Engineer at Anthropic. "
                       "He has 8 years of experience and specializes in "
                       "transformer architectures, PyTorch, and distributed training."
        }
    ],
    temperature = 0,
)

raw_json = response.choices[0].message.content
parsed   = json.loads(raw_json)

print("Input: 'John Smith is a Senior ML Engineer at Anthropic...'")
print(f"Extracted JSON:")
print(json.dumps(parsed, indent=2))
print()
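print("Using response_format for JSON mode (gpt-3.5-turbo-1106 and later, gpt-4-turbo, gpt-4o family):")
print("  response_format={'type': 'json_object'}")
print("  Returns syntactically valid JSON unless the reply is cut off (finish_reason == 'length')")
print("  The word 'JSON' must appear in your messages; the schema still belongs in the system prompt")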
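Without JSON mode, models sometimes wrap the JSON in a markdown fence or add a sentence before it, and then `json.loads` raises. Below is a defensive sketch that strips an optional fence and checks the keys the prompt's schema promised. `parse_model_json` and `REQUIRED_KEYS` are hypothetical names, and the check does not validate value types or nested structure.

```python
import json
import re

TICKS = "`" * 3   # a markdown code-fence marker, built here to keep this listing readable
FENCE = re.compile(r"^\s*" + TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS + r"\s*$", re.DOTALL)

REQUIRED_KEYS = {"name", "role", "company", "skills", "years_experience"}

def parse_model_json(text, required=REQUIRED_KEYS):
    """Parse a model reply as a JSON object; raise ValueError if it is unusable."""
    match = FENCE.match(text)                  # strip an optional fenced wrapper
    payload = match.group(1) if match else text.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required - set(data)             # keys the prompt promised but are absent
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

reply = (TICKS + 'json\n{"name": "John Smith", "role": "Senior ML Engineer", '
         '"company": "Anthropic", "skills": ["PyTorch"], "years_experience": 8}\n' + TICKS)
print(parse_model_json(reply)["years_experience"])   # → 8
```

JSON mode makes the fence stripping unnecessary, but the key check still applies because JSON mode does not enforce a schema.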

Embeddings API

print("\nOpenAI Embeddings API:")
print()

texts = [
    "Machine learning learns patterns from data.",
    "Deep learning uses layered neural networks.",
    "The Eiffel Tower is in Paris, France.",
    "Artificial intelligence mimics human thinking.",
]

emb_response = client.embeddings.create(
    model = "text-embedding-3-small",
    input = texts,
)

embeddings = [item.embedding for item in emb_response.data]
print(f"Model:           text-embedding-3-small")
print(f"Dimensions:      {len(embeddings[0])}")
print(f"Total tokens:    {emb_response.usage.total_tokens}")
print(f"Texts embedded:  {len(embeddings)}")
print()

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

sim_matrix = cosine_similarity(embeddings)
print("Pairwise similarities:")
for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        sim = sim_matrix[i][j]
        print(f"  {sim:.3f}  '{texts[i][:30]}...' ↔ '{texts[j][:30]}...'")

print()
print("Embedding models comparison:")
emb_models = {
    "text-embedding-3-small": ("1536 dims", "$0.02 / 1M tokens",  "Best for most use cases"),
    "text-embedding-3-large": ("3072 dims", "$0.13 / 1M tokens",  "Higher accuracy, bigger index"),
    "text-embedding-ada-002": ("1536 dims", "$0.10 / 1M tokens",  "Legacy, use 3-small instead"),
}
for name, (dims, cost, note) in emb_models.items():
    print(f"  {name:<30} {dims:<12} {cost:<22} {note}")
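The similarity step needs nothing beyond the standard library, which is handy for a quick top-k search. This sketch uses tiny made-up 3-dimensional vectors as stand-ins for real 1536-dimensional embeddings; `top_k` is a hypothetical helper. OpenAI documents its embeddings as normalized to length 1, so for them a plain dot product gives the same ranking as cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, corpus, k=3):
    """corpus: list of (text, vector). Return the k best (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    return sorted(scored, reverse=True)[:k]

# Toy vectors made up for illustration, not real embeddings.
corpus = [
    ("Machine learning learns patterns from data.",  [0.9, 0.1, 0.0]),
    ("Deep learning uses layered neural networks.",  [0.8, 0.3, 0.1]),
    ("The Eiffel Tower is in Paris, France.",         [0.0, 0.1, 0.9]),
]
for score, text in top_k([1.0, 0.2, 0.0], corpus, k=2):
    print(f"{score:.3f}  {text}")
```

With real data, the vectors would come from `[item.embedding for item in emb_response.data]`, and the query vector from a separate embeddings call on the query text.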

Image Generation: DALL-E 3

print("\nImage Generation with DALL-E 3:")
print()

image_response = client.images.generate(
    model   = "dall-e-3",
    prompt  = "A neural network visualized as a glowing network of nodes and connections, "
              "dark background, scientific illustration style, high quality",
    size    = "1024x1024",
    quality = "standard",
    n       = 1,            # dall-e-3 only accepts n=1
)

image_url = image_response.data[0].url   # temporary hosted URL; download it promptly
revised   = image_response.data[0].revised_prompt

print(f"Generated image URL: {image_url[:60]}...")
print(f"Revised prompt: {revised[:100]}...")
print()
print("DALL-E 3 vs DALL-E 2:")
dalle_models = {
    "dall-e-3": ("1024x1024, 1792x1024, 1024x1792", "Better quality, prompt following", "$0.040/image standard 1024px"),
    "dall-e-2": ("256x256, 512x512, 1024x1024",      "Faster, cheaper, less capable",   "$0.020/image 1024px"),
}
for name, (sizes, capability, cost) in dalle_models.items():
    print(f"  {name}: {sizes} | {capability} | {cost}")
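The two models accept different parameters, and invalid combinations only come back as a `BadRequestError` after a round trip. A small pre-flight check can catch them locally. The constraint table in this sketch is an assumption based on OpenAI's image API reference (dall-e-3: three sizes, `standard`/`hd` quality, `n=1` only; dall-e-2: three square sizes, `standard` quality, `n` from 1 to 10), so re-check it against the current docs. `check_image_request` is a hypothetical helper.

```python
# Assumed limits per model: (allowed sizes, allowed quality values, max n).
IMAGE_LIMITS = {
    "dall-e-3": ({"1024x1024", "1792x1024", "1024x1792"}, {"standard", "hd"}, 1),
    "dall-e-2": ({"256x256", "512x512", "1024x1024"}, {"standard"}, 10),
}

def check_image_request(model, size="1024x1024", quality="standard", n=1):
    """Raise ValueError for combinations the images endpoint would reject."""
    if model not in IMAGE_LIMITS:
        raise ValueError(f"unknown image model: {model}")
    sizes, qualities, max_n = IMAGE_LIMITS[model]
    if size not in sizes:
        raise ValueError(f"{model} supports sizes {sorted(sizes)}, not {size}")
    if quality not in qualities:
        raise ValueError(f"{model} does not support quality={quality!r}")
    if not 1 <= n <= max_n:
        raise ValueError(f"{model} accepts n between 1 and {max_n}, got {n}")
    return {"model": model, "size": size, "quality": quality, "n": n}

print(check_image_request("dall-e-3", size="1792x1024"))
```

The returned dict can be splatted into the real call, e.g. `client.images.generate(prompt=..., **check_image_request("dall-e-3", size="1792x1024"))`.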

Speech-to-Text (Whisper)

print("\nWhisper API: Speech to Text")
print()
print("Whisper is OpenAI's speech recognition model.")
print("Trained on roughly 100 languages; accuracy is high for widely spoken languages and varies for others.")
print()

whisper_example = """
import openai

client = openai.OpenAI()

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model  = "whisper-1",
        file   = audio_file,
        language = "en",          # optional, auto-detect if omitted
        response_format = "text"  # "json", "srt", "vtt" also available
    )

print(transcript)  # Returns transcribed text

# Translate to English from any language
with open("hindi_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model = "whisper-1",
        file  = f
    )
print(translation.text)  # Always returns English
"""
print(whisper_example)
print("Cost: $0.006 per minute of audio")
print("Max file size: 25MB")
print("Supported formats: mp3, mp4, m4a, wav, webm, ogg")
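The limits above combine into a simple planning calculation: cost scales with duration, while the 25MB cap decides how many pieces a long recording must be split into before upload. This rough sketch assumes 25MB means 25 × 1024 × 1024 bytes and a constant bitrate, so pieces of equal duration have equal size. Real splitting should cut on silence or overlap slightly so words are not chopped, which this does not attempt. `plan_transcription` is a hypothetical name.

```python
import math

WHISPER_USD_PER_MINUTE = 0.006
MAX_UPLOAD_BYTES = 25 * 1024 * 1024   # assumption: the 25MB limit read as MiB

def plan_transcription(duration_seconds, file_size_bytes, safety=0.95):
    """Return (estimated USD, number of pieces, seconds per piece)."""
    cost = duration_seconds / 60 * WHISPER_USD_PER_MINUTE
    # Aim slightly under the cap (safety factor) to leave headroom.
    limit = MAX_UPLOAD_BYTES * safety
    pieces = max(1, math.ceil(file_size_bytes / limit))
    return cost, pieces, duration_seconds / pieces

cost, pieces, secs = plan_transcription(90 * 60, 60 * 1024 * 1024)
print(f"${cost:.2f}, {pieces} pieces of ~{secs / 60:.0f} min each")
# → $0.54, 3 pieces of ~30 min each
```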

Error Handling and Retry Logic

import random
import time
from openai import APIConnectionError, APIStatusError, RateLimitError

def robust_completion(messages, model="gpt-3.5-turbo",
                       max_retries=3, base_delay=1.0, **kwargs):
    """Completion with exponential backoff for transient failures.

    The openai client already retries some errors on its own (twice by
    default, configurable with OpenAI(max_retries=...)); this loop adds
    an explicit, visible policy on top of that.
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model, messages=messages, **kwargs)
            return response.choices[0].message.content

        except (RateLimitError, APIConnectionError) as e:
            # 429s and network failures are usually transient.
            if attempt == max_retries - 1:
                raise
            wait = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"{type(e).__name__}: waiting {wait:.1f}s (attempt {attempt+1})")
            time.sleep(wait)

        except APIStatusError as e:
            # Retry server-side errors (5xx); a 4xx means the request itself needs fixing.
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(base_delay * (2 ** attempt))
                continue
            raise

    return None  # only reached if max_retries < 1

print("Error handling patterns:")
error_guide = {
    "RateLimitError":      "Too many requests or quota exhausted (429). Back off exponentially.",
    "APIConnectionError":  "Network issue. Retry with delay.",
    "AuthenticationError": "Invalid API key. Check OPENAI_API_KEY env var.",
    "BadRequestError":     "Invalid request (too long, bad format). Fix the request.",
    "InternalServerError": "OpenAI server error (5xx). Retry a few times with backoff.",
}
for error, solution in error_guide.items():
    print(f"  {error:<25}: {solution}")
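Backoff logic is easier to trust once it has been tested offline. Below is a generic sketch built around a hypothetical `with_retries` helper. It takes the retryable exception types and an injectable `sleep` function so that tests never actually wait. It uses "full jitter" (a random delay between 0 and the exponential cap), which is one common choice among several and helps spread out retries from many clients.

```python
import random
import time

def with_retries(fn, retryable, max_attempts=4, base_delay=1.0,
                 max_delay=30.0, sleep=time.sleep):
    """Call fn(); on a retryable error, wait with full-jitter backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise                           # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))       # full jitter in [0, cap]

# Offline demo: a fake call that fails twice with a simulated transient error.
class TransientError(Exception):
    pass

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated 429")
    return "ok"

waits = []
print(with_retries(flaky, retryable=(TransientError,), sleep=waits.append))  # → ok
print(calls["n"], len(waits))                                                # → 3 2
```

With the SDK, `retryable` would be something like `(RateLimitError, APIConnectionError, InternalServerError)`, and `fn` a `lambda` wrapping `client.chat.completions.create(...)`.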

Cost Estimation and Monitoring

class CostTracker:
    """Track API costs across multiple calls."""

    # (input, output) USD per 1K tokens; see openai.com/pricing for current rates
    PRICING = {
        "gpt-3.5-turbo":       (0.0005, 0.0015),
        "gpt-4o-mini":         (0.00015, 0.0006),
        "gpt-4o":              (0.005, 0.015),
        "text-embedding-3-small": (0.00002, 0),
    }

    def __init__(self):
        self.calls   = []
        self.total   = 0.0

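    def record(self, model, prompt_tokens, completion_tokens):
        # response.model often returns a dated snapshot such as
        # "gpt-4o-mini-2024-07-18", so match the longest known prefix.
        key = max((k for k in self.PRICING if model.startswith(k)),
                  key=len, default=None)
        if key is not None:
            in_rate, out_rate = self.PRICING[key]
            cost = (prompt_tokens / 1000 * in_rate +
                    completion_tokens / 1000 * out_rate)
        else:
            print(f"Warning: no pricing for {model!r}; recording $0")
            cost = 0.0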
        self.calls.append({
            "model":      model,
            "in_tokens":  prompt_tokens,
            "out_tokens": completion_tokens,
            "cost":       cost
        })
        self.total += cost
        return cost

    def summary(self):
        print(f"\nAPI Cost Summary:")
        print(f"  Total calls:  {len(self.calls)}")
        print(f"  Total tokens: {sum(c['in_tokens']+c['out_tokens'] for c in self.calls):,}")
        print(f"  Total cost:   ${self.total:.6f}")
        print(f"  Avg per call: ${self.total/len(self.calls):.6f}" if self.calls else "")

tracker = CostTracker()
tracker.record("gpt-3.5-turbo", 150, 80)
tracker.record("gpt-4o-mini",   200, 120)
tracker.record("gpt-4o-mini",   180, 90)
tracker.summary()

Reference Links

print("\nEssential OpenAI Reference Links:")
print()

refs = {
    "Official Documentation": [
        ("API Reference",             "platform.openai.com/docs/api-reference"),
        ("Cookbook (recipes)",        "cookbook.openai.com"),
        ("Prompt Engineering Guide",  "platform.openai.com/docs/guides/prompt-engineering"),
        ("Function Calling Guide",    "platform.openai.com/docs/guides/function-calling"),
        ("Rate Limits Guide",         "platform.openai.com/docs/guides/rate-limits"),
    ],
    "Models and Pricing": [
        ("Model Overview",            "platform.openai.com/docs/models"),
        ("Pricing Page",              "openai.com/pricing"),
        ("Tokenizer Tool",            "platform.openai.com/tokenizer"),
        ("Usage Dashboard",           "platform.openai.com/usage"),
    ],
    "Cheat Sheets and Tutorials": [
        ("OpenAI Python GitHub",      "github.com/openai/openai-python"),
        ("DeepLearning.AI ChatGPT API course", "learn.deeplearning.ai/chatgpt-prompt-eng"),
        ("Brex Prompt Engineering",   "github.com/brexhq/prompt-engineering"),
        ("Best practices for safety", "platform.openai.com/docs/guides/safety-best-practices"),
    ],
}

for category, links in refs.items():
    print(f"  {category}:")
    for name, url in links:
        print(f"    • {name:<40} {url}")
    print()

Try This

Create openai_practice.py.

Part 1: basic completions. Call GPT-3.5-turbo and GPT-4o-mini with the same prompt. Compare response quality, token usage, and estimated cost. Which model gives you better value for your use case?

Part 2: function calling. Define at least 3 tools (weather lookup, database search, calendar check). Implement mock versions that return fake data. Test with 5 queries: some should trigger tool calls, some should not. Verify the model picks the right tool.

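Part 3: streaming interface. Build a simple command-line chat that streams responses chunk by chunk as tokens arrive. Track total tokens used across the entire conversation (streamed responses include usage only if you request it with stream_options={"include_usage": True}). Print a cost estimate at the end.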

Part 4: embedding + search. Use text-embedding-3-small to embed 30 sentences from a domain of your choice. Given a query, find the top 3 most similar sentences. Compare results to a keyword search on the same corpus. Where does semantic search win? Where does keyword search win?

What's Next

This post covered OpenAI's chat, embedding, image, and speech endpoints. The next post covers the Anthropic Claude API: a different design philosophy, different strengths, and specific capabilities like the system prompt hierarchy, extended thinking, and very long context windows. After that comes the Phase 8 capstone.

DEV Community