Every AI product you use is probably calling an API somewhere.
The chat assistant in your IDE. The customer service bot on a website. The document summarizer in your company's internal tools. The code reviewer. The email writer. Nearly all of them send text to a remote model, get text back, and display it to you.
OpenAI built the most widely used API for this. Not the only one. Not always the cheapest. But the one with the most ecosystem support, the most tutorials, the most integrations, and the API design that others have copied.
This post covers the core of it: chat completions, streaming, function calling, structured output, embeddings, image generation, speech-to-text, and the patterns (error handling, cost tracking) that make production applications reliable.
Setup and First Call
from openai import OpenAI
import json
import time
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "your-key-here"))
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning in one sentence?"},
    ],
    temperature=0.7,
    max_tokens=150,
)
print("Basic chat completion:")
print(f" Response: {response.choices[0].message.content}")
print()
print("Response object details:")
print(f" model: {response.model}")
print(f" finish_reason: {response.choices[0].finish_reason}")
print(f" prompt_tokens: {response.usage.prompt_tokens}")
print(f" completion_tokens: {response.usage.completion_tokens}")
print(f" total_tokens: {response.usage.total_tokens}")
print()
cost_per_1k = {"gpt-3.5-turbo": (0.0005, 0.0015), "gpt-4-turbo": (0.01, 0.03)}
model = "gpt-3.5-turbo"
in_cost = response.usage.prompt_tokens / 1000 * cost_per_1k[model][0]
out_cost = response.usage.completion_tokens / 1000 * cost_per_1k[model][1]
print(f" Estimated cost: ${in_cost + out_cost:.6f}")
Models Available and When to Use Each
models = {
    "gpt-3.5-turbo": {
        "context": "16K tokens",
        "in_cost": "$0.50 / 1M tokens",
        "out_cost": "$1.50 / 1M tokens",
        "speed": "very fast",
        "best_for": "Simple Q&A, classification, extraction, high-volume tasks"
    },
    "gpt-4o-mini": {
        "context": "128K tokens",
        "in_cost": "$0.15 / 1M tokens",
        "out_cost": "$0.60 / 1M tokens",
        "speed": "fast",
        "best_for": "Most tasks; best price/performance in 2024"
    },
    "gpt-4o": {
        "context": "128K tokens",
        "in_cost": "$5.00 / 1M tokens",
        "out_cost": "$15.00 / 1M tokens",
        "speed": "moderate",
        "best_for": "Complex reasoning, long documents, multimodal, code"
    },
    "gpt-4-turbo": {
        "context": "128K tokens",
        "in_cost": "$10.00 / 1M tokens",
        "out_cost": "$30.00 / 1M tokens",
        "speed": "moderate",
        "best_for": "Highest capability tasks, legacy integration"
    },
}
print(f"{'Model':<15} {'Context':>10} {'Input cost':>14} {'Output cost':>14} {'Speed':>10}")
print("=" * 70)
for name, info in models.items():
    print(f"{name:<15} {info['context']:>10} {info['in_cost']:>14} "
          f"{info['out_cost']:>14} {info['speed']:>10}")
print()
print("Practical rule:")
print(" Default: gpt-4o-mini (excellent quality, lowest cost)")
print(" Complex reasoning: gpt-4o (worth the cost)")
print(" High volume, simple tasks: also gpt-4o-mini (cheaper than gpt-3.5-turbo at the prices above)")
print(" Check openai.com/pricing for updated costs (change frequently)")
Streaming Responses
print("Streaming: Show tokens as they are generated (faster perceived response):")
print()
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "List 5 key concepts in machine learning, briefly."}
    ],
    stream=True,
)
print("Streaming output:")
full_response = ""
for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. a trailing usage chunk) carry no choices
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
        full_response += delta.content
print()
print()
print("Streaming patterns:")
print(" - Use for chat interfaces (user sees tokens appear, feels faster)")
print(" - Collect full response by accumulating chunks")
print(" - Handle finish_reason to detect end of stream")
print(" - Use try/finally to handle disconnects gracefully")
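The last three patterns are easier to see in code. Here is a minimal sketch that accumulates chunks, records `finish_reason`, and uses try/finally for cleanup. The names `collect_stream` and `fake_chunk` are hypothetical helpers, not part of the OpenAI SDK, and the fake chunks only imitate the shape of real ones so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_token=None):
    """Accumulate streamed text and report why the stream ended.

    `chunks` is any iterable of objects shaped like chat completion chunks
    (chunk.choices[0].delta.content and chunk.choices[0].finish_reason).
    """
    parts = []
    finish_reason = None
    try:
        for chunk in chunks:
            if not chunk.choices:  # skip chunks that carry no choices
                continue
            choice = chunk.choices[0]
            if choice.delta.content:
                parts.append(choice.delta.content)
                if on_token:
                    on_token(choice.delta.content)
            if choice.finish_reason is not None:
                finish_reason = choice.finish_reason
    finally:
        # Runs on success and on errors. Close the stream if it has close().
        close = getattr(chunks, "close", None)
        if callable(close):
            close()
    return "".join(parts), finish_reason


def fake_chunk(text=None, finish_reason=None):
    """Hypothetical stand-in for a real chunk, used only for this demo."""
    delta = SimpleNamespace(content=text)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


demo = [fake_chunk("Hel"), fake_chunk("lo"), SimpleNamespace(choices=[]),
        fake_chunk(None, "stop")]
text, reason = collect_stream(demo)
print(text, reason)
```

A `finish_reason` of `"length"` means the reply hit the token limit and is incomplete, so it is worth checking before you trust the accumulated text.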
System Prompts: The Most Important Tool
def create_assistant(role, constraints, output_format=None):
    """Build a well-structured system prompt."""
    parts = [f"You are {role}."]
    if constraints:
        parts.append("Rules:")
        for constraint in constraints:
            parts.append(f"- {constraint}")
    if output_format:
        parts.append(f"Always respond in: {output_format}")
    return "\n".join(parts)
personas = {
    "Concise Technical Writer": create_assistant(
        role="a technical writer who values precision and brevity",
        constraints=[
            "Never use more than 3 sentences per answer",
            "Always use specific technical terms",
            "Provide one code example when relevant"
        ],
    ),
    "Socratic Tutor": create_assistant(
        role="a Socratic tutor who teaches through questions",
        constraints=[
            "Never give direct answers; only ask guiding questions",
            "Build on the student's own reasoning",
            "Acknowledge correct insights before probing further"
        ],
    ),
    "JSON Extractor": create_assistant(
        role="a data extraction assistant",
        constraints=[
            "Extract only what is explicitly stated in the input",
            "Use null for missing values",
            "Never infer or guess information"
        ],
        output_format="valid JSON only, no explanation, no markdown"
    ),
}
for name, prompt in personas.items():
    print(f"System prompt: {name}")
    print(f" {prompt[:120]}...")
    print()
print("System prompt best practices:")
best_practices = [
    "Be explicit about role, constraints, and output format",
    "Use bullet points for rules (models follow them more reliably)",
    "Specify what NOT to do, not just what to do",
    "Include examples when the output format is complex",
    "Keep it concise: long system prompts dilute attention",
]
for p in best_practices:
    print(f" • {p}")
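"Include examples" usually means few-shot prompting: you place sample user/assistant turns between the system prompt and the real input. Here is a small sketch of that layout. `build_messages` is a hypothetical helper name, and the example pair is made up for illustration.

```python
def build_messages(system_prompt, examples, user_input):
    """Return a chat `messages` list with few-shot examples before the real input.

    `examples` is a list of (input_text, expected_output) pairs.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages


few_shot = [
    ("Ana Lopez works at Initech.",
     '{"name": "Ana Lopez", "company": "Initech", "role": null}'),
]
messages = build_messages(
    "You are a data extraction assistant. Respond with valid JSON only.",
    few_shot,
    "Raj Patel is a designer at Globex.",
)
for m in messages:
    print(f"{m['role']:<10} {m['content'][:60]}")
```

The returned list can be passed as `messages=` to `client.chat.completions.create`. Each example costs prompt tokens on every call, so one or two well-chosen pairs is a sensible starting point.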
Function Calling: Connecting LLMs to External Tools
print("\nFunction Calling: The Most Powerful OpenAI Feature")
print()
print("Without function calling: LLM can only talk.")
print("With function calling: LLM can DO things.")
print()
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search company knowledge base",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Max results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]
def execute_tool(tool_name, tool_args):
    """Simulate tool execution."""
    if tool_name == "get_weather":
        city = tool_args.get("city", "unknown")
        return json.dumps({"city": city, "temperature": 28, "condition": "sunny", "unit": "celsius"})
    elif tool_name == "search_database":
        return json.dumps({"results": [
            {"text": "Q3 revenue was $4.2M", "source": "Q3 Report"},
            {"text": "Premium plan costs $49/month", "source": "Pricing"}
        ]})
    return json.dumps({"error": "unknown tool"})
def run_with_tools(user_message, tools, verbose=True):
    """Complete tool-use loop (one round of tool calls)."""
    messages = [{"role": "user", "content": user_message}]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message
    # Check the message itself rather than finish_reason: tool_calls is set
    # whenever the model asked for a tool.
    if msg.tool_calls:
        # The assistant turn with its tool calls must precede the tool results.
        messages.append({"role": "assistant", "content": None,
                         "tool_calls": [tc.model_dump() for tc in msg.tool_calls]})
        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)
            if verbose:
                print(f" → Calling tool: {fn_name}({fn_args})")
            result = execute_tool(fn_name, fn_args)
            if verbose:
                print(f" ← Tool result: {result[:80]}")
            # One tool message per call, linked back by tool_call_id.
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
        # Second call: the model turns the tool results into a final answer.
        final = client.chat.completions.create(
            model=response.model, messages=messages)
        return final.choices[0].message.content
    return msg.content
test_queries = [
    "What's the weather like in Mumbai right now?",
    "What is our Q3 revenue?",
    "What is the capital of France?",
]
print("Function calling test:")
for query in test_queries:
    print(f"\nUser: {query}")
    answer = run_with_tools(query, tools, verbose=True)
    print(f"Bot: {answer[:120]}")
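The model writes tool arguments as a JSON string, and nothing guarantees that string is valid or matches your function's parameters. Here is a defensive sketch of the dispatch step, assuming you keep tools in a plain dictionary. `TOOL_REGISTRY` and `safe_execute` are hypothetical names, and `get_weather` returns fake data as in the mock above.

```python
import json


def get_weather(city, unit="celsius"):
    """Mock tool: returns fixed fake data."""
    return {"city": city, "temperature": 28, "unit": unit}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def safe_execute(tool_name, raw_arguments):
    """Run a tool call defensively and always return a JSON string."""
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return json.dumps(func(**args))
    except TypeError as exc:  # missing or unexpected parameters
        return json.dumps({"error": f"bad arguments: {exc}"})


print(safe_execute("get_weather", '{"city": "Mumbai"}'))
print(safe_execute("get_weather", '{"city": '))
print(safe_execute("book_flight", "{}"))
```

Returning the error as the tool result, instead of raising, lets the model see what went wrong and often correct itself on the next turn.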
Structured JSON Output
print("\nStructured Output: Reliable JSON from LLMs")
print()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Extract information from the text. Respond with valid JSON only, "
                       "no markdown, no explanation. "
                       "Schema: {name: string, role: string, company: string, "
                       "skills: [string], years_experience: int|null}"
        },
        {
            "role": "user",
            "content": "John Smith is a Senior ML Engineer at Anthropic. "
                       "He has 8 years of experience and specializes in "
                       "transformer architectures, PyTorch, and distributed training."
        }
    ],
    temperature=0,
)
raw_json = response.choices[0].message.content
parsed = json.loads(raw_json)
print("Input: 'John Smith is a Senior ML Engineer at Anthropic...'")
print(f"Extracted JSON:")
print(json.dumps(parsed, indent=2))
print()
print("Using response_format for JSON mode (gpt-3.5-turbo-1106, gpt-4-turbo, and newer):")
print(" response_format={'type': 'json_object'}")
print(" Output parses as JSON unless the reply is cut off (finish_reason == 'length')")
print(" The prompt must mention JSON, and the schema still belongs in the system prompt")
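Without JSON mode, `json.loads` on the raw reply can fail in two common ways: the model wraps the JSON in a markdown fence, or the reply is truncated. Here is a sketch of a tolerant parser that handles both. `parse_json_reply` is a hypothetical helper, not an SDK function, and the fence stripping is a heuristic.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to keep the source readable


def parse_json_reply(content, finish_reason="stop"):
    """Parse a model reply expected to be JSON. Returns (data, error)."""
    if finish_reason == "length":
        return None, "reply was truncated by the token limit"
    if content is None:
        return None, "empty reply"
    text = content.strip()
    # Tolerate a markdown fence (optionally tagged "json") around the payload.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"


print(parse_json_reply('{"name": "John Smith"}'))
print(parse_json_reply(FENCE + 'json\n{"name": "John Smith"}\n' + FENCE))
print(parse_json_reply('{"name": "John', finish_reason="length"))
```

With the call above you would pass `response.choices[0].message.content` and `response.choices[0].finish_reason`. Returning an error string instead of raising makes it easy to retry the request or log the bad reply.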
Embeddings API
print("\nOpenAI Embeddings API:")
print()
texts = [
    "Machine learning learns patterns from data.",
    "Deep learning uses layered neural networks.",
    "The Eiffel Tower is in Paris, France.",
    "Artificial intelligence mimics human thinking.",
]
emb_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
embeddings = [item.embedding for item in emb_response.data]
print(f"Model: text-embedding-3-small")
print(f"Dimensions: {len(embeddings[0])}")
print(f"Total tokens: {emb_response.usage.total_tokens}")
print(f"Texts embedded: {len(embeddings)}")
print()
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
sim_matrix = cosine_similarity(embeddings)
print("Pairwise similarities:")
for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        sim = sim_matrix[i][j]
        print(f" {sim:.3f} '{texts[i][:30]}...' ↔ '{texts[j][:30]}...'")
print()
print("Embedding models comparison:")
emb_models = {
    "text-embedding-3-small": ("1536 dims", "$0.02 / 1M tokens", "Best for most use cases"),
    "text-embedding-3-large": ("3072 dims", "$0.13 / 1M tokens", "Higher accuracy, bigger index"),
    "text-embedding-ada-002": ("1536 dims", "$0.10 / 1M tokens", "Legacy, use 3-small instead"),
}
for name, (dims, cost, note) in emb_models.items():
    print(f" {name:<30} {dims:<12} {cost:<22} {note}")
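Pairwise similarity is the building block of semantic search: embed the query, score it against every stored vector, keep the best few. Here is a dependency-free sketch of that ranking step. The function names are hypothetical, and the three-dimensional vectors are toy stand-ins for real 1536-dimension embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return [(index, score)] for the k document vectors most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
docs = ["ML learns from data", "Eiffel Tower is in Paris", "Neural networks have layers"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

for index, score in top_k(query_vec, doc_vecs, k=2):
    print(f"{score:.3f}  {docs[index]}")
```

A linear scan like this is fine for a few thousand vectors. Beyond that, a vector index or database is the usual next step.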
Image Generation: DALL-E 3
print("\nImage Generation with DALL-E 3:")
print()
image_response = client.images.generate(
    model="dall-e-3",
    prompt="A neural network visualized as a glowing network of nodes and connections, "
           "dark background, scientific illustration style, high quality",
    size="1024x1024",
    quality="standard",
    n=1,
)
image_url = image_response.data[0].url
revised = image_response.data[0].revised_prompt
print(f"Generated image URL: {image_url[:60]}...")
print(f"Revised prompt: {revised[:100]}...")
print()
print("DALL-E 3 vs DALL-E 2:")
dalle_models = {
    "dall-e-3": ("1024x1024, 1792x1024, 1024x1792", "Better quality, prompt following", "$0.040/image standard 1024px"),
    "dall-e-2": ("256x256, 512x512, 1024x1024", "Faster, cheaper, less capable", "$0.020/image 1024px"),
}
for name, (sizes, capability, cost) in dalle_models.items():
    print(f" {name}: {sizes} | {capability} | {cost}")
Speech-to-Text (Whisper)
print("\nWhisper API: Speech to Text")
print()
print("Whisper is OpenAI's speech recognition model.")
print("Supports dozens of languages; accuracy varies by language and audio quality.")
print()
whisper_example = """
import openai
client = openai.OpenAI()

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",           # optional, auto-detect if omitted
        response_format="text",  # "json", "srt", "vtt" also available
    )
print(transcript)  # Returns transcribed text

# Translate to English from any language
with open("hindi_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=f,
    )
print(translation.text)  # Always returns English
"""
print(whisper_example)
print("Cost: $0.006 per minute of audio")
print("Max file size: 25MB")
print("Supported formats: mp3, mp4, m4a, wav, webm, ogg")
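The size and format limits are worth checking before you upload, since a rejected request still costs a round trip. Here is a sketch of a pre-flight check using the limits listed above. `check_audio_file` is a hypothetical helper, and reading "25MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".ogg"}


def check_audio_file(path):
    """Return a list of problems. An empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file larger than 25 MB; split it into chunks first")
    return problems


print(check_audio_file("meeting_notes.txt"))
```

The check only looks at the file name and size. It does not verify that the contents are valid audio.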
Error Handling and Retry Logic
import time
from openai import RateLimitError, APIConnectionError, APIStatusError

def robust_completion(messages, model="gpt-3.5-turbo",
                      max_retries=3, base_delay=1.0, **kwargs):
    """Chat completion with retries for rate limits, network errors, and 5xx responses."""
    for attempt in range(max_retries):
        last_attempt = attempt == max_retries - 1
        try:
            response = client.chat.completions.create(
                model=model, messages=messages, **kwargs)
            return response.choices[0].message.content
        except RateLimitError:
            if last_attempt:
                raise
            wait = base_delay * (2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            print(f"Rate limit hit. Waiting {wait:.1f}s... (attempt {attempt+1})")
            time.sleep(wait)
        except APIConnectionError:
            if last_attempt:
                raise
            print(f"Connection error. Retrying... (attempt {attempt+1})")
            time.sleep(base_delay)
        except APIStatusError as e:
            # Only APIStatusError carries status_code. Retry server errors (5xx)
            # and re-raise everything else (400, 401, 404, ...).
            if e.status_code >= 500 and not last_attempt:
                time.sleep(base_delay)
                continue
            raise
    return None
print("Error handling patterns:")
error_guide = {
    "RateLimitError": "Too many requests. Implement exponential backoff.",
    "APIConnectionError": "Network issue. Retry with delay.",
    "AuthenticationError": "Invalid API key. Check OPENAI_API_KEY env var.",
    "BadRequestError": "Invalid request (too long, bad format). Fix the request.",
    "APIError (500)": "OpenAI server error. Retry a few times.",
}
for error, solution in error_guide.items():
    print(f" {error:<25}: {solution}")
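Plain exponential backoff has one weakness: many clients that hit the limit together also retry together. A common remedy is to add random jitter and cap the wait. Here is a sketch of that delay schedule on its own, separate from any API call. `backoff_delays` is a hypothetical helper, and the cap of 30 seconds is an arbitrary choice.

```python
import random


def backoff_delays(max_retries=5, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Yield one wait time per retry: exponential growth, capped, with full jitter.

    `rng` returns a float in [0, 1). It is injectable so the schedule can be tested.
    """
    for attempt in range(max_retries):
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        yield rng() * ceiling


# Upper bounds of the schedule (jitter replaced by a constant 1.0):
print(list(backoff_delays(max_retries=4, rng=lambda: 1.0)))  # [1.0, 2.0, 4.0, 8.0]

# A real schedule is randomized on every run:
print([round(d, 2) for d in backoff_delays(max_retries=4)])
```

In a retry loop you would call `time.sleep(delay)` for each value in turn and stop at the first success.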
Cost Estimation and Monitoring
class CostTracker:
    """Track API costs across multiple calls."""
    # (input, output) price in USD per 1K tokens
    PRICING = {
        "gpt-3.5-turbo": (0.0005, 0.0015),
        "gpt-4o-mini": (0.00015, 0.0006),
        "gpt-4o": (0.005, 0.015),
        "text-embedding-3-small": (0.00002, 0),
    }

    def __init__(self):
        self.calls = []
        self.total = 0.0

    def record(self, model, prompt_tokens, completion_tokens):
        if model in self.PRICING:
            in_rate, out_rate = self.PRICING[model]
            cost = (prompt_tokens / 1000 * in_rate +
                    completion_tokens / 1000 * out_rate)
        else:
            cost = 0.0  # unknown model: recorded but not priced
        self.calls.append({
            "model": model,
            "in_tokens": prompt_tokens,
            "out_tokens": completion_tokens,
            "cost": cost
        })
        self.total += cost
        return cost

    def summary(self):
        print(f"\nAPI Cost Summary:")
        print(f" Total calls: {len(self.calls)}")
        print(f" Total tokens: {sum(c['in_tokens']+c['out_tokens'] for c in self.calls):,}")
        print(f" Total cost: ${self.total:.6f}")
        print(f" Avg per call: ${self.total/len(self.calls):.6f}" if self.calls else "")
tracker = CostTracker()
tracker.record("gpt-3.5-turbo", 150, 80)
tracker.record("gpt-4o-mini", 200, 120)
tracker.record("gpt-4o-mini", 180, 90)
tracker.summary()
Reference Links
print("\nEssential OpenAI Reference Links:")
print()
refs = {
    "Official Documentation": [
        ("API Reference", "platform.openai.com/docs/api-reference"),
        ("Cookbook (recipes)", "cookbook.openai.com"),
        ("Prompt Engineering Guide", "platform.openai.com/docs/guides/prompt-engineering"),
        ("Function Calling Guide", "platform.openai.com/docs/guides/function-calling"),
        ("Rate Limits Guide", "platform.openai.com/docs/guides/rate-limits"),
    ],
    "Models and Pricing": [
        ("Model Overview", "platform.openai.com/docs/models"),
        ("Pricing Page", "openai.com/pricing"),
        ("Tokenizer Tool", "platform.openai.com/tokenizer"),
        ("Usage Dashboard", "platform.openai.com/usage"),
    ],
    "Cheat Sheets and Tutorials": [
        ("OpenAI Python GitHub", "github.com/openai/openai-python"),
        ("DeepLearning.AI ChatGPT API course", "learn.deeplearning.ai/chatgpt-prompt-eng"),
        ("Brex Prompt Engineering", "github.com/brexhq/prompt-engineering"),
        ("Best practices for safety", "platform.openai.com/docs/guides/safety-best-practices"),
    ],
}
for category, links in refs.items():
    print(f" {category}:")
    for name, url in links:
        print(f" • {name:<40} {url}")
    print()
Try This
Create openai_practice.py.
Part 1: basic completions. Call GPT-3.5-turbo and GPT-4o-mini with the same prompt. Compare response quality, token usage, and estimated cost. Which model gives you better value for your use case?
Part 2: function calling. Define at least 3 tools (weather lookup, database search, calendar check). Implement mock versions that return fake data. Test with 5 queries: some should trigger tool calls, some should not. Verify the model picks the right tool.
Part 3: streaming interface. Build a simple command-line chat that streams responses token by token as chunks arrive. Track total tokens used across the entire conversation. Print a cost estimate at the end.
Part 4: embedding + search. Use text-embedding-3-small to embed 30 sentences from a domain of your choice. Given a query, find the top 3 most similar sentences. Compare results to a keyword search on the same corpus. Where does semantic search win? Where does keyword search win?
What's Next
The OpenAI API covers chat models, embeddings, DALL-E, and Whisper. The next post covers the Anthropic Claude API: different design philosophy, different strengths, and specific capabilities like the system prompt hierarchy, extended thinking, and very long context windows. After that, the Phase 8 capstone.