<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ZNY</title>
    <description>The latest articles on DEV Community by ZNY (@zny10289).</description>
    <link>https://dev.to/zny10289</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3932912%2F5b9abaab-d1f8-4e3c-a902-22c85916ced5.png</url>
      <title>DEV Community: ZNY</title>
      <link>https://dev.to/zny10289</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zny10289"/>
    <language>en</language>
    <item>
      <title>Redis Caching for AI Applications: Reducing Latency and Cost</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:56:31 +0000</pubDate>
      <link>https://dev.to/zny10289/redis-caching-for-ai-applications-reducing-latency-and-cost-2m38</link>
      <guid>https://dev.to/zny10289/redis-caching-for-ai-applications-reducing-latency-and-cost-2m38</guid>
      <description>&lt;p&gt;AI API calls are expensive and slow. Redis caching dramatically reduces both by storing AI responses for reuse. Here's a complete implementation guide.&lt;/p&gt;

&lt;p&gt;Why Cache AI Responses?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without Cache&lt;/th&gt;
&lt;th&gt;With Cache&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Every request → AI API&lt;/td&gt;
&lt;td&gt;Cache hit → Return immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-3s latency per request&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms for cache hits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full API cost per request&lt;/td&gt;
&lt;td&gt;Pay only for cache misses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit pressure&lt;/td&gt;
&lt;td&gt;Rate limit relief&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Semantic Caching vs Exact Match&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Exact match caching (simple)&lt;br&gt;
cache_key = hash(messages)  # Only matches identical prompts&lt;br&gt;
&lt;br&gt;
# Semantic caching (smart)&lt;br&gt;
cache_key = generate_embedding_hash(user_prompt)  # Matches similar prompts&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;This guide covers exact match caching first, then semantic.&lt;/p&gt;

&lt;p&gt;Redis Setup&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import hashlib&lt;br&gt;
import json&lt;br&gt;
import redis&lt;br&gt;
from typing import Optional&lt;br&gt;
&lt;br&gt;
class AICache:&lt;br&gt;
    def __init__(self, redis_url: str = "redis://localhost:6379/0"):&lt;br&gt;
        self.redis = redis.from_url(redis_url, decode_responses=True)&lt;br&gt;
&lt;br&gt;
    def _make_key(self, messages: list[dict]) -&gt; str:&lt;br&gt;
        """Create cache key from messages."""&lt;br&gt;
        # Sort keys so the same messages always produce the same key&lt;br&gt;
        content = json.dumps(messages, sort_keys=True)&lt;br&gt;
        return f"ai:response:{hashlib.sha256(content.encode()).hexdigest()}"&lt;br&gt;
&lt;br&gt;
    def get(self, messages: list[dict]) -&gt; Optional[str]:&lt;br&gt;
        """Get cached response if it exists."""&lt;br&gt;
        key = self._make_key(messages)&lt;br&gt;
        cached = self.redis.get(key)&lt;br&gt;
        if cached:&lt;br&gt;
            # Move to front (LRU-like behavior)&lt;br&gt;
            self.redis.lrem("ai:recent", 1, key)&lt;br&gt;
            self.redis.rpush("ai:recent", key)&lt;br&gt;
            return cached&lt;br&gt;
        return None&lt;br&gt;
&lt;br&gt;
    def set(self, messages: list[dict], response: str, ttl: int = 86400):&lt;br&gt;
        """Cache a response."""&lt;br&gt;
        key = self._make_key(messages)&lt;br&gt;
        pipe = self.redis.pipeline()&lt;br&gt;
        pipe.set(key, response, ex=ttl)&lt;br&gt;
        pipe.lpush("ai:recent", key)&lt;br&gt;
        pipe.ltrim("ai:recent", 0, 999)  # Keep last 1000 keys&lt;br&gt;
        pipe.execute()&lt;br&gt;
&lt;br&gt;
    def invalidate(self, messages: list[dict]):&lt;br&gt;
        key = self._make_key(messages)&lt;br&gt;
        self.redis.delete(key)&lt;br&gt;
`&lt;/p&gt;
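
&lt;p&gt;For the semantic variant promised above, here is a minimal sketch rather than a production design: embed is an assumed embedding function you must supply, and the linear scan only suits small caches (Redis' vector search module is the scalable path for larger ones):&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import hashlib&lt;br&gt;
import json&lt;br&gt;
import numpy as np&lt;br&gt;
from typing import Optional&lt;br&gt;
&lt;br&gt;
class SemanticAICache(AICache):&lt;br&gt;
    """Sketch: match prompts by embedding similarity instead of exact hash."""&lt;br&gt;
&lt;br&gt;
    def __init__(self, redis_url: str, embed, threshold: float = 0.95):&lt;br&gt;
        # embed is an assumed callable: str -&gt; list[float]&lt;br&gt;
        super().__init__(redis_url)&lt;br&gt;
        self.embed = embed&lt;br&gt;
        self.threshold = threshold&lt;br&gt;
&lt;br&gt;
    def get_semantic(self, prompt: str) -&gt; Optional[str]:&lt;br&gt;
        query = np.array(self.embed(prompt))&lt;br&gt;
        # Linear scan over stored entries -- fine for small caches only&lt;br&gt;
        for key in self.redis.scan_iter(match="ai:sem:*"):&lt;br&gt;
            entry = json.loads(self.redis.get(key))&lt;br&gt;
            vec = np.array(entry["embedding"])&lt;br&gt;
            sim = float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))&lt;br&gt;
            if sim &gt;= self.threshold:&lt;br&gt;
                return entry["response"]&lt;br&gt;
        return None&lt;br&gt;
&lt;br&gt;
    def set_semantic(self, prompt: str, response: str, ttl: int = 86400):&lt;br&gt;
        digest = hashlib.sha256(prompt.encode()).hexdigest()&lt;br&gt;
        entry = {"embedding": list(self.embed(prompt)), "response": response}&lt;br&gt;
        self.redis.set(f"ai:sem:{digest}", json.dumps(entry), ex=ttl)&lt;br&gt;
`&lt;/p&gt;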

&lt;p&gt;Using the Cache with AI Client&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
class CachedAIClient:&lt;br&gt;
    def __init__(self, api_key: str, cache: AICache):&lt;br&gt;
        self.client = AsyncAIClient(api_key)&lt;br&gt;
        self.cache = cache&lt;br&gt;
        self.cache_ttl = 86400  # 24 hours&lt;br&gt;
&lt;br&gt;
    async def chat(self, messages: list[dict], use_cache: bool = True) -&gt; str:&lt;br&gt;
        # Try cache first&lt;br&gt;
        if use_cache:&lt;br&gt;
            cached = self.cache.get(messages)&lt;br&gt;
            if cached:&lt;br&gt;
                print("Cache hit!")&lt;br&gt;
                return cached&lt;br&gt;
&lt;br&gt;
        # Cache miss - call API&lt;br&gt;
        response = await self.client.chat(messages)&lt;br&gt;
&lt;br&gt;
        # Store in cache&lt;br&gt;
        if use_cache:&lt;br&gt;
            self.cache.set(messages, response, self.cache_ttl)&lt;br&gt;
&lt;br&gt;
        return response&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Intelligent Cache Invalidation&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
class IntelligentCache(AICache):&lt;br&gt;
    """&lt;br&gt;
    Cache that handles:&lt;br&gt;
    1. TTL expiration&lt;br&gt;
    2. Manual invalidation&lt;br&gt;
    3. LRU eviction&lt;br&gt;
    4. Cache statistics&lt;br&gt;
    """&lt;br&gt;
&lt;br&gt;
    def __init__(self, redis_url: str, max_size: int = 10000):&lt;br&gt;
        super().__init__(redis_url)&lt;br&gt;
        self.max_size = max_size&lt;br&gt;
        self._check_size()&lt;br&gt;
&lt;br&gt;
    def _check_size(self):&lt;br&gt;
        """Ensure cache doesn't exceed max size."""&lt;br&gt;
        size = self.redis.llen("ai:cache:order")&lt;br&gt;
        if size &gt;= self.max_size:&lt;br&gt;
            # Remove oldest 10%&lt;br&gt;
            to_remove = self.max_size // 10&lt;br&gt;
            oldest = self.redis.lrange("ai:cache:order", 0, to_remove - 1)&lt;br&gt;
            pipe = self.redis.pipeline()&lt;br&gt;
            for key in oldest:&lt;br&gt;
                pipe.delete(f"ai:response:{key}")&lt;br&gt;
            # Trim the order list in place instead of rebuilding it&lt;br&gt;
            pipe.ltrim("ai:cache:order", to_remove, -1)&lt;br&gt;
            pipe.execute()&lt;br&gt;
&lt;br&gt;
    def get_stats(self) -&gt; dict:&lt;br&gt;
        """Return cache statistics."""&lt;br&gt;
        info = self.redis.info("stats")&lt;br&gt;
        hits = info.get("keyspace_hits", 0)&lt;br&gt;
        misses = info.get("keyspace_misses", 0)&lt;br&gt;
        return {&lt;br&gt;
            "total_keys": self.redis.llen("ai:cache:order"),&lt;br&gt;
            "hits": hits,&lt;br&gt;
            "misses": misses,&lt;br&gt;
            "hit_rate": hits / max(hits + misses, 1) * 100&lt;br&gt;
        }&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Cache Warming Strategy&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
async def warm_cache(client: CachedAIClient, common_prompts: list[str]):&lt;br&gt;
    """&lt;br&gt;
    Pre-populate cache with common prompts.&lt;br&gt;
    Run at startup or during off-peak hours.&lt;br&gt;
    """&lt;br&gt;
    print(f"Warming cache with {len(common_prompts)} prompts...")&lt;br&gt;
&lt;br&gt;
    for i, prompt in enumerate(common_prompts):&lt;br&gt;
        try:&lt;br&gt;
            await client.chat(&lt;br&gt;
                [{"role": "user", "content": prompt}],&lt;br&gt;
                use_cache=True&lt;br&gt;
            )&lt;br&gt;
            if (i + 1) % 10 == 0:&lt;br&gt;
                print(f"  Warmed {i + 1}/{len(common_prompts)}")&lt;br&gt;
        except Exception as e:&lt;br&gt;
            print(f"  Failed on prompt {i}: {e}")&lt;br&gt;
&lt;br&gt;
    print("Cache warming complete!")&lt;br&gt;
&lt;br&gt;
# Example common prompts&lt;br&gt;
COMMON_PROMPTS = [&lt;br&gt;
    "Explain async/await in Python",&lt;br&gt;
    "How do I use list comprehensions?",&lt;br&gt;
    "What is a context manager?",&lt;br&gt;
    # ... your most common queries&lt;br&gt;
]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Production Deployment&lt;/p&gt;

&lt;p&gt;`yaml&lt;br&gt;
# docker-compose.yml&lt;br&gt;
version: '3.8'&lt;br&gt;
services:&lt;br&gt;
  api:&lt;br&gt;
    build: .&lt;br&gt;
    depends_on:&lt;br&gt;
      - redis&lt;br&gt;
    environment:&lt;br&gt;
      - REDIS_URL=redis://redis:6379/0&lt;br&gt;
      - OFOX_API_KEY=${OFOX_API_KEY}&lt;br&gt;
&lt;br&gt;
  redis:&lt;br&gt;
    image: redis:7-alpine&lt;br&gt;
    volumes:&lt;br&gt;
      - redis-data:/data&lt;br&gt;
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru&lt;br&gt;
&lt;br&gt;
volumes:&lt;br&gt;
  redis-data:&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Cache Hit Rate Monitoring&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import time&lt;br&gt;
&lt;br&gt;
from fastapi import FastAPI, Request&lt;br&gt;
from prometheus_client import Counter, Gauge&lt;br&gt;
&lt;br&gt;
app = FastAPI()&lt;br&gt;
&lt;br&gt;
cache_hits = Counter('ai_cache_hits', 'Number of cache hits')&lt;br&gt;
cache_misses = Counter('ai_cache_misses', 'Number of cache misses')&lt;br&gt;
cache_latency = Gauge('ai_cache_latency_seconds', 'Cache operation latency')&lt;br&gt;
&lt;br&gt;
@app.middleware("http")&lt;br&gt;
async def cache_metrics_middleware(request: Request, call_next):&lt;br&gt;
    if "/chat" in str(request.url):&lt;br&gt;
        start = time.time()&lt;br&gt;
        response = await call_next(request)&lt;br&gt;
        cache_latency.set(time.time() - start)&lt;br&gt;
        return response&lt;br&gt;
    return await call_next(request)&lt;br&gt;
`&lt;/p&gt;
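
&lt;p&gt;The counters above are declared but never incremented by the middleware. One natural place to count hits and misses is the cache lookup itself; a minimal sketch layered on the CachedAIClient from earlier (an illustration, not required wiring):&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Sketch: increment the counters at the cache lookup itself&lt;br&gt;
# (CachedAIClient is the class from earlier in this article).&lt;br&gt;
class MeteredAIClient(CachedAIClient):&lt;br&gt;
    async def chat(self, messages: list[dict], use_cache: bool = True) -&gt; str:&lt;br&gt;
        if use_cache:&lt;br&gt;
            cached = self.cache.get(messages)&lt;br&gt;
            if cached:&lt;br&gt;
                cache_hits.inc()&lt;br&gt;
                return cached&lt;br&gt;
            cache_misses.inc()&lt;br&gt;
        response = await self.client.chat(messages)&lt;br&gt;
        if use_cache:&lt;br&gt;
            self.cache.set(messages, response, self.cache_ttl)&lt;br&gt;
        return response&lt;br&gt;
`&lt;/p&gt;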

&lt;p&gt;Cost Savings Calculator&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
def calculate_savings(&lt;br&gt;
    requests_per_day: int,&lt;br&gt;
    cache_hit_rate: float,&lt;br&gt;
    api_cost_per_1k_tokens: float = 0.003,&lt;br&gt;
    avg_tokens_per_request: int = 500&lt;br&gt;
):&lt;br&gt;
    daily_requests = requests_per_day&lt;br&gt;
    cache_hits = int(daily_requests * cache_hit_rate)&lt;br&gt;
    cache_misses = daily_requests - cache_hits&lt;br&gt;
&lt;br&gt;
    # Full cost for misses, free for hits&lt;br&gt;
    total_cost = (cache_misses * avg_tokens_per_request / 1000) * api_cost_per_1k_tokens&lt;br&gt;
&lt;br&gt;
    no_cache_cost = (daily_requests * avg_tokens_per_request / 1000) * api_cost_per_1k_tokens&lt;br&gt;
&lt;br&gt;
    savings = no_cache_cost - total_cost&lt;br&gt;
&lt;br&gt;
    return {&lt;br&gt;
        "requests": daily_requests,&lt;br&gt;
        "cache_hit_rate": f"{cache_hit_rate * 100:.1f}%",&lt;br&gt;
        "daily_cost": f"${total_cost:.2f}",&lt;br&gt;
        "daily_savings": f"${savings:.2f}",&lt;br&gt;
        "monthly_savings": f"${savings * 30:.2f}"&lt;br&gt;
    }&lt;br&gt;
&lt;br&gt;
# Example: 10000 requests/day, 70% cache hit rate&lt;br&gt;
print(calculate_savings(10000, 0.70))&lt;br&gt;
# {'requests': 10000, 'cache_hit_rate': '70.0%',&lt;br&gt;
#  'daily_cost': '$4.50', 'daily_savings': '$10.50',&lt;br&gt;
#  'monthly_savings': '$315.00'}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Implement Redis caching for your AI applications with ofox.ai. Exact-match caching pays off whenever identical prompts recur, and a reliable API makes cached responses safe to reuse.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: redis,caching,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Async Python for AI: Building High-Concurrency AI Applications</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:51:19 +0000</pubDate>
      <link>https://dev.to/zny10289/async-python-for-ai-building-high-concurrency-ai-applications-16a6</link>
      <guid>https://dev.to/zny10289/async-python-for-ai-building-high-concurrency-ai-applications-16a6</guid>
      <description>&lt;p&gt;AI API calls are I/O-bound — you're waiting on network responses. Async Python lets you run many AI requests concurrently, dramatically improving throughput. Here's how to build high-concurrency AI applications.&lt;/p&gt;

&lt;p&gt;Why Async for AI?&lt;/p&gt;

&lt;p&gt;A single AI API call might take 1-3 seconds. If you process 100 requests sequentially, that's 100-300 seconds. With async concurrency, you can process all 100 in seconds.&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
&lt;br&gt;
# Sequential (slow)&lt;br&gt;
async def process_sequential(requests):&lt;br&gt;
    results = []&lt;br&gt;
    for req in requests:&lt;br&gt;
        result = await call_ai(req)  # 2 seconds each&lt;br&gt;
        results.append(result)&lt;br&gt;
    return results&lt;br&gt;
# Total: len(requests) × 2 seconds&lt;br&gt;
&lt;br&gt;
# Concurrent (fast)&lt;br&gt;
async def process_concurrent(requests):&lt;br&gt;
    tasks = [call_ai(req) for req in requests]&lt;br&gt;
    results = await asyncio.gather(*tasks)&lt;br&gt;
    return results&lt;br&gt;
# Total: ~2 seconds (all requests in flight at once)&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Basic Async AI Client&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
import httpx&lt;br&gt;
from typing import Optional&lt;br&gt;
&lt;br&gt;
class AsyncAIClient:&lt;br&gt;
    def __init__(self, api_key: str, base_url: str = "https://api.ofox.ai/v1"):&lt;br&gt;
        self.api_key = api_key&lt;br&gt;
        self.base_url = base_url&lt;br&gt;
        self._client: Optional[httpx.AsyncClient] = None&lt;br&gt;
&lt;br&gt;
    async def __aenter__(self):&lt;br&gt;
        self._client = httpx.AsyncClient(&lt;br&gt;
            headers={"Authorization": f"Bearer {self.api_key}"},&lt;br&gt;
            timeout=120.0&lt;br&gt;
        )&lt;br&gt;
        return self&lt;br&gt;
&lt;br&gt;
    async def __aexit__(self, *args):&lt;br&gt;
        if self._client:&lt;br&gt;
            await self._client.aclose()&lt;br&gt;
&lt;br&gt;
    async def chat(self, messages: list[dict], **kwargs) -&gt; str:&lt;br&gt;
        response = await self._client.post(&lt;br&gt;
            f"{self.base_url}/chat/completions",&lt;br&gt;
            json={&lt;br&gt;
                "model": kwargs.get("model", "claude-3-5-sonnet-20241022"),&lt;br&gt;
                "messages": messages,&lt;br&gt;
                "max_tokens": kwargs.get("max_tokens", 1024),&lt;br&gt;
                "temperature": kwargs.get("temperature", 0.7)&lt;br&gt;
            }&lt;br&gt;
        )&lt;br&gt;
        response.raise_for_status()&lt;br&gt;
        data = response.json()&lt;br&gt;
        return data["choices"][0]["message"]["content"]&lt;br&gt;
&lt;br&gt;
    async def chat_stream(self, messages: list[dict], **kwargs):&lt;br&gt;
        """Streaming version for real-time output."""&lt;br&gt;
        async with self._client.stream(&lt;br&gt;
            "POST",&lt;br&gt;
            f"{self.base_url}/chat/completions",&lt;br&gt;
            json={&lt;br&gt;
                "model": kwargs.get("model", "claude-3-5-sonnet-20241022"),&lt;br&gt;
                "messages": messages,&lt;br&gt;
                "stream": True,&lt;br&gt;
                "max_tokens": kwargs.get("max_tokens", 1024)&lt;br&gt;
            }&lt;br&gt;
        ) as response:&lt;br&gt;
            async for line in response.aiter_lines():&lt;br&gt;
                if line.startswith("data: "):&lt;br&gt;
                    data = line[6:]&lt;br&gt;
                    if data == "[DONE]":&lt;br&gt;
                        break&lt;br&gt;
                    yield data&lt;br&gt;
&lt;br&gt;
# Usage&lt;br&gt;
async def main():&lt;br&gt;
    async with AsyncAIClient("your-api-key") as client:&lt;br&gt;
        response = await client.chat([&lt;br&gt;
            {"role": "user", "content": "Hello, Claude!"}&lt;br&gt;
        ])&lt;br&gt;
        print(response)&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Concurrent Batch Processing&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
from dataclasses import dataclass&lt;br&gt;
from typing import List&lt;br&gt;
&lt;br&gt;
@dataclass&lt;br&gt;
class BatchItem:&lt;br&gt;
    id: str&lt;br&gt;
    prompt: str&lt;br&gt;
    metadata: dict = None&lt;br&gt;
&lt;br&gt;
@dataclass&lt;br&gt;
class BatchResult:&lt;br&gt;
    id: str&lt;br&gt;
    success: bool&lt;br&gt;
    response: str = ""&lt;br&gt;
    error: str = ""&lt;br&gt;
&lt;br&gt;
async def process_batch(&lt;br&gt;
    items: List[BatchItem],&lt;br&gt;
    client: AsyncAIClient,&lt;br&gt;
    max_concurrency: int = 10&lt;br&gt;
) -&gt; List[BatchResult]:&lt;br&gt;
    """&lt;br&gt;
    Process items in batches with controlled concurrency.&lt;br&gt;
    """&lt;br&gt;
    semaphore = asyncio.Semaphore(max_concurrency)&lt;br&gt;
&lt;br&gt;
    async def process_one(item: BatchItem) -&gt; BatchResult:&lt;br&gt;
        async with semaphore:&lt;br&gt;
            try:&lt;br&gt;
                response = await client.chat([&lt;br&gt;
                    {"role": "user", "content": item.prompt}&lt;br&gt;
                ])&lt;br&gt;
                return BatchResult(id=item.id, success=True, response=response)&lt;br&gt;
            except Exception as e:&lt;br&gt;
                return BatchResult(id=item.id, success=False, error=str(e))&lt;br&gt;
&lt;br&gt;
    tasks = [process_one(item) for item in items]&lt;br&gt;
    results = await asyncio.gather(*tasks)&lt;br&gt;
    return list(results)&lt;br&gt;
&lt;br&gt;
# Usage&lt;br&gt;
async def main():&lt;br&gt;
    items = [&lt;br&gt;
        BatchItem(id=f"req-{i}", prompt=f"Process item {i}")&lt;br&gt;
        for i in range(100)&lt;br&gt;
    ]&lt;br&gt;
&lt;br&gt;
    async with AsyncAIClient("your-api-key") as client:&lt;br&gt;
        results = await process_batch(items, client, max_concurrency=20)&lt;br&gt;
&lt;br&gt;
    successes = [r for r in results if r.success]&lt;br&gt;
    failures = [r for r in results if not r.success]&lt;br&gt;
&lt;br&gt;
    print(f"Completed: {len(successes)}/{len(items)}")&lt;br&gt;
`&lt;/p&gt;
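
&lt;p&gt;asyncio.gather only returns once every task has finished. For progress reporting as results arrive, asyncio.as_completed is a small variation on the same pattern (a sketch reusing BatchItem and BatchResult from above):&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Variation: report progress as results arrive instead of waiting for gather().&lt;br&gt;
async def process_batch_with_progress(&lt;br&gt;
    items: List[BatchItem],&lt;br&gt;
    client: AsyncAIClient,&lt;br&gt;
    max_concurrency: int = 10&lt;br&gt;
) -&gt; List[BatchResult]:&lt;br&gt;
    semaphore = asyncio.Semaphore(max_concurrency)&lt;br&gt;
&lt;br&gt;
    async def process_one(item: BatchItem) -&gt; BatchResult:&lt;br&gt;
        async with semaphore:&lt;br&gt;
            try:&lt;br&gt;
                response = await client.chat([{"role": "user", "content": item.prompt}])&lt;br&gt;
                return BatchResult(id=item.id, success=True, response=response)&lt;br&gt;
            except Exception as e:&lt;br&gt;
                return BatchResult(id=item.id, success=False, error=str(e))&lt;br&gt;
&lt;br&gt;
    results = []&lt;br&gt;
    for i, coro in enumerate(asyncio.as_completed([process_one(it) for it in items])):&lt;br&gt;
        results.append(await coro)&lt;br&gt;
        print(f"  {i + 1}/{len(items)} done")&lt;br&gt;
    return results&lt;br&gt;
`&lt;/p&gt;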

&lt;p&gt;Rate-Limited Concurrency&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
import time&lt;br&gt;
&lt;br&gt;
class RateLimiter:&lt;br&gt;
    """Simple interval-based rate limiter: spaces calls evenly across each minute."""&lt;br&gt;
&lt;br&gt;
    def __init__(self, calls_per_minute: int = 60):&lt;br&gt;
        self.calls_per_minute = calls_per_minute&lt;br&gt;
        self.interval = 60.0 / calls_per_minute&lt;br&gt;
        self.last_call = 0.0&lt;br&gt;
        self._lock = asyncio.Lock()&lt;br&gt;
&lt;br&gt;
    async def acquire(self):&lt;br&gt;
        async with self._lock:&lt;br&gt;
            now = time.time()&lt;br&gt;
            wait_time = self.last_call + self.interval - now&lt;br&gt;
            if wait_time &gt; 0:&lt;br&gt;
                await asyncio.sleep(wait_time)&lt;br&gt;
            self.last_call = time.time()&lt;br&gt;
&lt;br&gt;
async def rate_limited_processing(items: List[BatchItem], client: AsyncAIClient):&lt;br&gt;
    limiter = RateLimiter(calls_per_minute=60)  # 60 RPM&lt;br&gt;
&lt;br&gt;
    async def process(item: BatchItem) -&gt; BatchResult:&lt;br&gt;
        await limiter.acquire()  # Wait if needed&lt;br&gt;
        try:&lt;br&gt;
            response = await client.chat([{"role": "user", "content": item.prompt}])&lt;br&gt;
            return BatchResult(id=item.id, success=True, response=response)&lt;br&gt;
        except Exception as e:&lt;br&gt;
            return BatchResult(id=item.id, success=False, error=str(e))&lt;br&gt;
&lt;br&gt;
    tasks = [process(item) for item in items]&lt;br&gt;
    return await asyncio.gather(*tasks)&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Retry with Exponential Backoff&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
from typing import Awaitable, Callable, TypeVar&lt;br&gt;
&lt;br&gt;
T = TypeVar('T')&lt;br&gt;
&lt;br&gt;
async def retry_with_backoff(&lt;br&gt;
    fn: Callable[[], Awaitable[T]],&lt;br&gt;
    max_retries: int = 3,&lt;br&gt;
    base_delay: float = 1.0,&lt;br&gt;
    max_delay: float = 60.0&lt;br&gt;
) -&gt; T:&lt;br&gt;
    for attempt in range(max_retries):&lt;br&gt;
        try:&lt;br&gt;
            return await fn()&lt;br&gt;
        except Exception as e:&lt;br&gt;
            if attempt == max_retries - 1:&lt;br&gt;
                raise&lt;br&gt;
&lt;br&gt;
            # Exponential backoff, capped at max_delay&lt;br&gt;
            delay = min(base_delay * (2 ** attempt), max_delay)&lt;br&gt;
&lt;br&gt;
            # Check if the error is retryable&lt;br&gt;
            if hasattr(e, 'response') and e.response is not None:&lt;br&gt;
                status = e.response.status_code&lt;br&gt;
                if status not in (429, 500, 502, 503, 504):&lt;br&gt;
                    raise  # Don't retry client errors&lt;br&gt;
&lt;br&gt;
            print(f"Attempt {attempt + 1} failed, retrying in {delay}s: {e}")&lt;br&gt;
            await asyncio.sleep(delay)&lt;br&gt;
&lt;br&gt;
    raise RuntimeError("Unreachable")&lt;br&gt;
&lt;br&gt;
# Usage&lt;br&gt;
async def robust_chat(client: AsyncAIClient, messages: list[dict]) -&gt; str:&lt;br&gt;
    async def call():&lt;br&gt;
        return await client.chat(messages)&lt;br&gt;
&lt;br&gt;
    return await retry_with_backoff(call, max_retries=3)&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Building a Production AI Queue&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import asyncio&lt;br&gt;
import time&lt;br&gt;
import uuid&lt;br&gt;
from dataclasses import dataclass&lt;br&gt;
&lt;br&gt;
@dataclass&lt;br&gt;
class AIJob:&lt;br&gt;
    id: str&lt;br&gt;
    messages: list[dict]&lt;br&gt;
    future: asyncio.Future&lt;br&gt;
&lt;br&gt;
class AsyncAIQueue:&lt;br&gt;
    def __init__(self, client: AsyncAIClient, workers: int = 5):&lt;br&gt;
        self.client = client&lt;br&gt;
        self.workers = workers&lt;br&gt;
        self.queue: asyncio.Queue[AIJob] = asyncio.Queue()&lt;br&gt;
        self.results: dict[str, str] = {}&lt;br&gt;
&lt;br&gt;
    async def worker(self):&lt;br&gt;
        while True:&lt;br&gt;
            job = await self.queue.get()&lt;br&gt;
            try:&lt;br&gt;
                response = await retry_with_backoff(&lt;br&gt;
                    lambda: self.client.chat(job.messages)&lt;br&gt;
                )&lt;br&gt;
                self.results[job.id] = response&lt;br&gt;
                job.future.set_result(response)&lt;br&gt;
            except Exception as e:&lt;br&gt;
                job.future.set_exception(e)&lt;br&gt;
            finally:&lt;br&gt;
                self.queue.task_done()&lt;br&gt;
&lt;br&gt;
    async def start(self):&lt;br&gt;
        self.worker_tasks = [&lt;br&gt;
            asyncio.create_task(self.worker())&lt;br&gt;
            for _ in range(self.workers)&lt;br&gt;
        ]&lt;br&gt;
&lt;br&gt;
    async def submit(self, messages: list[dict]) -&gt; str:&lt;br&gt;
        future = asyncio.get_running_loop().create_future()&lt;br&gt;
        job = AIJob(id=str(uuid.uuid4()), messages=messages, future=future)&lt;br&gt;
        await self.queue.put(job)&lt;br&gt;
        return job.id&lt;br&gt;
&lt;br&gt;
    async def get_result(self, job_id: str, timeout: float = 120.0) -&gt; str:&lt;br&gt;
        # Poll for the result&lt;br&gt;
        start = time.time()&lt;br&gt;
        while time.time() - start &lt; timeout:&lt;br&gt;
            if job_id in self.results:&lt;br&gt;
                return self.results.pop(job_id)&lt;br&gt;
            await asyncio.sleep(0.1)&lt;br&gt;
        raise TimeoutError(f"Job {job_id} timed out")&lt;br&gt;
`&lt;/p&gt;
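
&lt;p&gt;For completeness, a minimal usage sketch (an illustration: start() must run inside the event loop before any jobs are submitted, and AsyncAIClient is the client defined earlier):&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Sketch: wiring the queue together with the client defined earlier.&lt;br&gt;
async def main():&lt;br&gt;
    async with AsyncAIClient("your-api-key") as client:&lt;br&gt;
        queue = AsyncAIQueue(client, workers=5)&lt;br&gt;
        await queue.start()&lt;br&gt;
&lt;br&gt;
        job_id = await queue.submit([{"role": "user", "content": "Hello!"}])&lt;br&gt;
        print(await queue.get_result(job_id))&lt;br&gt;
&lt;br&gt;
asyncio.run(main())&lt;br&gt;
`&lt;/p&gt;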

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build high-concurrency AI applications with ofox.ai — their reliable API supports async patterns with competitive pricing for production workloads.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: python,async,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>API Security Best Practices for AI Applications in 2026</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:46:07 +0000</pubDate>
      <link>https://dev.to/zny10289/api-security-best-practices-for-ai-applications-in-2026-16dg</link>
      <guid>https://dev.to/zny10289/api-security-best-practices-for-ai-applications-in-2026-16dg</guid>
      <description>&lt;p&gt;AI applications face unique security challenges. Beyond traditional API vulnerabilities, AI APIs expose new attack surfaces: prompt injection, data leakage, and model manipulation. Here's how to secure your AI-powered systems.&lt;/p&gt;

&lt;p&gt;The AI Security Landscape&lt;/p&gt;

&lt;p&gt;AI APIs introduce attack vectors traditional APIs don't have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prompt injection — Malicious input that manipulates AI behavior&lt;/li&gt;
&lt;li&gt;Data exfiltration — AI accidentally leaking sensitive context&lt;/li&gt;
&lt;li&gt;Token exhaustion — Attackers exhausting your quota&lt;/li&gt;
&lt;li&gt;Model extraction — Repeated queries to reverse-engineer the model&lt;/li&gt;
&lt;li&gt;Context poisoning — Injecting malicious context into conversations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Input Validation and Sanitization&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import re&lt;br&gt;
from typing import Optional&lt;br&gt;
&lt;br&gt;
class InputSanitizer:&lt;br&gt;
    # Block common prompt injection patterns&lt;br&gt;
    BLOCKED_PATTERNS = [&lt;br&gt;
        r'ignore\s+previous\s+instructions',&lt;br&gt;
        r'ignore\s+all\s+previous',&lt;br&gt;
        r'system\s*:\s*',&lt;br&gt;
        r'you\s+are\s+a\s+different',&lt;br&gt;
        r'forget\s+everything',&lt;br&gt;
        r'#\s*roleplay',&lt;br&gt;
    ]&lt;br&gt;
&lt;br&gt;
    MAX_LENGTH = 10000  # Max 10k characters&lt;br&gt;
    MAX_TOKENS_ESTIMATE = MAX_LENGTH // 4  # ~2500 tokens&lt;br&gt;
&lt;br&gt;
    @classmethod&lt;br&gt;
    def sanitize(cls, user_input: str) -&gt; tuple[bool, Optional[str], str]:&lt;br&gt;
        """&lt;br&gt;
        Returns: (is_safe, reason, sanitized_input)&lt;br&gt;
        """&lt;br&gt;
        # Check length&lt;br&gt;
        if len(user_input) &gt; cls.MAX_LENGTH:&lt;br&gt;
            return False, f"Input exceeds {cls.MAX_LENGTH} chars", user_input[:cls.MAX_LENGTH]&lt;br&gt;
&lt;br&gt;
        # Check for blocked patterns&lt;br&gt;
        for pattern in cls.BLOCKED_PATTERNS:&lt;br&gt;
            if re.search(pattern, user_input, re.IGNORECASE):&lt;br&gt;
                return False, "Blocked pattern detected", ""&lt;br&gt;
&lt;br&gt;
        # Strip control characters&lt;br&gt;
        sanitized = re.sub(r'[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f]', '', user_input)&lt;br&gt;
&lt;br&gt;
        return True, None, sanitized&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Rate Limiting&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
from fastapi import FastAPI, Request, HTTPException&lt;br&gt;
from fastapi.responses import JSONResponse&lt;br&gt;
import time&lt;br&gt;
from collections import defaultdict&lt;br&gt;
&lt;br&gt;
app = FastAPI()&lt;br&gt;
&lt;br&gt;
# Simple in-memory rate limiter&lt;br&gt;
class RateLimiter:&lt;br&gt;
    def __init__(self, requests_per_minute: int = 60):&lt;br&gt;
        self.requests_per_minute = requests_per_minute&lt;br&gt;
        self.requests = defaultdict(list)&lt;br&gt;
&lt;br&gt;
    def is_allowed(self, client_id: str) -&gt; bool:&lt;br&gt;
        now = time.time()&lt;br&gt;
        minute_ago = now - 60&lt;br&gt;
&lt;br&gt;
        # Clean old entries&lt;br&gt;
        self.requests[client_id] = [&lt;br&gt;
            t for t in self.requests[client_id] if t &gt; minute_ago&lt;br&gt;
        ]&lt;br&gt;
&lt;br&gt;
        if len(self.requests[client_id]) &gt;= self.requests_per_minute:&lt;br&gt;
            return False&lt;br&gt;
&lt;br&gt;
        self.requests[client_id].append(now)&lt;br&gt;
        return True&lt;br&gt;
&lt;br&gt;
rate_limiter = RateLimiter(requests_per_minute=60)&lt;br&gt;
&lt;br&gt;
@app.middleware("http")&lt;br&gt;
async def rate_limit_middleware(request: Request, call_next):&lt;br&gt;
    client_id = request.client.host  # Or use API key&lt;br&gt;
    if not rate_limiter.is_allowed(client_id):&lt;br&gt;
        return JSONResponse(&lt;br&gt;
            status_code=429,&lt;br&gt;
            content={"error": "Rate limit exceeded"}&lt;br&gt;
        )&lt;br&gt;
    response = await call_next(request)&lt;br&gt;
    return response&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;API Key Security&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import hashlib&lt;br&gt;
import secrets&lt;br&gt;
import time&lt;br&gt;
from typing import Optional&lt;br&gt;
&lt;br&gt;
class APIKeyManager:&lt;br&gt;
    """&lt;br&gt;
    Never store raw API keys. Always hash them.&lt;br&gt;
    """&lt;br&gt;
    def __init__(self):&lt;br&gt;
        self.keystore = {}  # In production, use a proper database&lt;br&gt;
&lt;br&gt;
    def create_key(self, user_id: str, scopes: list[str]) -&gt; str:&lt;br&gt;
        api_key = f"ofox_{secrets.token_urlsafe(32)}"&lt;br&gt;
        key_hash = self._hash_key(api_key)&lt;br&gt;
&lt;br&gt;
        self.keystore[key_hash] = {&lt;br&gt;
            "user_id": user_id,&lt;br&gt;
            "scopes": scopes,&lt;br&gt;
            "created": time.time()&lt;br&gt;
        }&lt;br&gt;
&lt;br&gt;
        # Return the raw key ONLY ONCE to the user&lt;br&gt;
        return api_key&lt;br&gt;
&lt;br&gt;
    def validate_key(self, api_key: str) -&gt; Optional[dict]:&lt;br&gt;
        key_hash = self._hash_key(api_key)&lt;br&gt;
        return self.keystore.get(key_hash)&lt;br&gt;
&lt;br&gt;
    def _hash_key(self, key: str) -&gt; str:&lt;br&gt;
        return hashlib.sha256(key.encode()).hexdigest()&lt;br&gt;
&lt;br&gt;
# Usage&lt;br&gt;
key_manager = APIKeyManager()&lt;br&gt;
raw_key = key_manager.create_key("user123", ["chat", "embeddings"])&lt;br&gt;
print(f"Save this key securely: {raw_key}")  # Show once&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Prompt Injection Defense&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
class PromptInjectionDetector:&lt;br&gt;
    """&lt;br&gt;
    Detect attempts to override system behavior through user input.&lt;br&gt;
    """&lt;br&gt;
    INJECTION_SIGNALS = [&lt;br&gt;
        "ignore previous",&lt;br&gt;
        "disregard your",&lt;br&gt;
        "new instructions:",&lt;br&gt;
        "[INST]",&lt;br&gt;
        "&lt;&gt;",&lt;br&gt;
        "you are now",&lt;br&gt;
        "pretend you are",&lt;br&gt;
        "forget your",&lt;br&gt;
        "system prompt:",&lt;br&gt;
    ]&lt;br&gt;
&lt;br&gt;
    @classmethod&lt;br&gt;
    def detect(cls, user_input: str) -&gt; bool:&lt;br&gt;
        lower_input = user_input.lower()&lt;br&gt;
        for signal in cls.INJECTION_SIGNALS:&lt;br&gt;
            if signal.lower() in lower_input:&lt;br&gt;
                return True&lt;br&gt;
        return False&lt;br&gt;
&lt;br&gt;
# Usage in your endpoint (ChatRequest and logger are defined elsewhere in your app)&lt;br&gt;
@app.post("/chat")&lt;br&gt;
async def chat(request: ChatRequest):&lt;br&gt;
    if PromptInjectionDetector.detect(request.messages[-1].content):&lt;br&gt;
        # Log and block&lt;br&gt;
        logger.warning(f"Prompt injection attempt: {request.messages[-1].content[:100]}")&lt;br&gt;
        raise HTTPException(status_code=400, detail="Invalid input")&lt;br&gt;
&lt;br&gt;
    # Continue with normal processing&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Data Isolation in Multi-Tenant Systems&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
class ConversationContext:&lt;br&gt;
    """&lt;br&gt;
    Ensure user data doesn't leak between conversations.&lt;br&gt;
    """&lt;br&gt;
    def __init__(self, user_id: str, api_key: str):&lt;br&gt;
        self.user_id = user_id&lt;br&gt;
        self.api_key = api_key&lt;br&gt;
        self.conversation_history = []&lt;br&gt;
&lt;br&gt;
    def add_message(self, role: str, content: str):&lt;br&gt;
        self.conversation_history.append({&lt;br&gt;
            "role": role,&lt;br&gt;
            "content": content,&lt;br&gt;
            "user_id": self.user_id  # Tag with user&lt;br&gt;
        })&lt;br&gt;
&lt;br&gt;
    def get_messages(self) -&gt; list[dict]:&lt;br&gt;
        # Always filter by user_id to prevent leakage&lt;br&gt;
        return [&lt;br&gt;
            m for m in self.conversation_history&lt;br&gt;
            if m["user_id"] == self.user_id&lt;br&gt;
        ]&lt;br&gt;
&lt;br&gt;
    def clear_history(self):&lt;br&gt;
        # Only clear THIS user's history&lt;br&gt;
        self.conversation_history = [&lt;br&gt;
            m for m in self.conversation_history&lt;br&gt;
            if m["user_id"] != self.user_id&lt;br&gt;
        ]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Secure Error Handling&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
@app.exception_handler(Exception)&lt;br&gt;
async def global_exception_handler(request: Request, exc: Exception):&lt;br&gt;
    # Never expose internal error details in production&lt;br&gt;
    logger.error(f"Error: {exc}", exc_info=True)&lt;br&gt;
&lt;br&gt;
    return JSONResponse(&lt;br&gt;
        status_code=500,&lt;br&gt;
        content={&lt;br&gt;
            "error": "Internal server error",&lt;br&gt;
            # Don't include: exc.message, stack trace, API keys&lt;br&gt;
        }&lt;br&gt;
    )&lt;br&gt;
&lt;br&gt;
# For API provider errors (like ofox.ai errors)&lt;br&gt;
@app.post("/chat")&lt;br&gt;
async def chat(request: ChatRequest):&lt;br&gt;
    try:&lt;br&gt;
        result = await call_ofox_api(request)&lt;br&gt;
        return result&lt;br&gt;
    except httpx.HTTPStatusError as e:&lt;br&gt;
        # Log the full error internally&lt;br&gt;
        logger.error(f"ofox API error: {e.response.status_code} {e.response.text}")&lt;br&gt;
&lt;br&gt;
        # Return a sanitized error to the client&lt;br&gt;
        raise HTTPException(&lt;br&gt;
            status_code=502,&lt;br&gt;
            detail="AI service temporarily unavailable"&lt;br&gt;
        )&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Environment Variables (Never Hardcode)&lt;/p&gt;

&lt;p&gt;`bash&lt;br&gt;
# .env (never commit this file)&lt;br&gt;
OFOX_API_KEY=your-key-here&lt;br&gt;
DATABASE_URL=postgresql://...&lt;br&gt;
JWT_SECRET=your-secret-here&lt;br&gt;
&lt;br&gt;
# docker-compose.yml (use secrets in production)&lt;br&gt;
environment:&lt;br&gt;
  - OFOX_API_KEY=${OFOX_API_KEY}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Load from environment&lt;br&gt;
import os&lt;br&gt;
from dotenv import load_dotenv&lt;br&gt;
load_dotenv()  # In development only&lt;br&gt;
&lt;br&gt;
api_key = os.environ.get("OFOX_API_KEY")&lt;br&gt;
if not api_key:&lt;br&gt;
    raise ValueError("OFOX_API_KEY not set")&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Security Checklist&lt;/p&gt;

&lt;p&gt;[ ] Input validation on all user-provided text&lt;br&gt;
[ ] Rate limiting on all endpoints&lt;br&gt;
[ ] API keys hashed (never stored raw)&lt;br&gt;
[ ] Prompt injection detection&lt;br&gt;
[ ] Error messages don't expose internals&lt;br&gt;
[ ] Environment variables for secrets (not hardcoded)&lt;br&gt;
[ ] HTTPS only in production&lt;br&gt;
[ ] Logging without sensitive data&lt;br&gt;
[ ] Regular dependency audits (pip audit, npm audit)&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build secure AI applications with ofox.ai — their API includes built-in security features and 99.9% uptime guarantee.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: security,api,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Docker for AI Development: Containerizing LLM Applications</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:40:55 +0000</pubDate>
      <link>https://dev.to/zny10289/docker-for-ai-development-containerizing-llm-applications-2ei7</link>
      <guid>https://dev.to/zny10289/docker-for-ai-development-containerizing-llm-applications-2ei7</guid>
      <description>&lt;p&gt;Docker simplifies AI application deployment by providing consistent environments from development to production. Here's how to containerize your AI applications powered by Claude and ofox.ai.&lt;/p&gt;

&lt;p&gt;Why Docker for AI Apps?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reproducible environments — Same behavior locally and in production&lt;/li&gt;
&lt;li&gt;Dependency isolation — Python packages, system libraries, CUDA versions&lt;/li&gt;
&lt;li&gt;Easy deployment — Ship to any cloud with Docker&lt;/li&gt;
&lt;li&gt;Resource control — Limit CPU/memory per container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Basic Dockerfile for AI App&lt;/p&gt;

&lt;p&gt;`dockerfile&lt;br&gt;
# Dockerfile&lt;br&gt;
FROM python:3.11-slim&lt;br&gt;
&lt;br&gt;
WORKDIR /app&lt;br&gt;
&lt;br&gt;
# Install system dependencies&lt;br&gt;
RUN apt-get update &amp;&amp; apt-get install -y \&lt;br&gt;
    curl \&lt;br&gt;
    &amp;&amp; rm -rf /var/lib/apt/lists/*&lt;br&gt;
&lt;br&gt;
# Copy requirements first (for caching)&lt;br&gt;
COPY requirements.txt .&lt;br&gt;
RUN pip install --no-cache-dir -r requirements.txt&lt;br&gt;
&lt;br&gt;
# Copy application&lt;br&gt;
COPY . .&lt;br&gt;
&lt;br&gt;
# Set environment variables&lt;br&gt;
ENV PYTHONUNBUFFERED=1&lt;br&gt;
&lt;br&gt;
# Run the application&lt;br&gt;
CMD ["python", "main.py"]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;&lt;code&gt;text&lt;br&gt;
# requirements.txt&lt;br&gt;
fastapi==0.109.0&lt;br&gt;
uvicorn==0.27.0&lt;br&gt;
httpx==0.26.0&lt;br&gt;
python-dotenv==1.0.0&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Docker Compose for AI Services&lt;/p&gt;

&lt;p&gt;`yaml&lt;br&gt;
# docker-compose.yml&lt;br&gt;
version: '3.8'&lt;br&gt;
&lt;br&gt;
services:&lt;br&gt;
  api:&lt;br&gt;
    build: .&lt;br&gt;
    ports:&lt;br&gt;
      - "8000:8000"&lt;br&gt;
    environment:&lt;br&gt;
      - OFOX_API_KEY=${OFOX_API_KEY}&lt;br&gt;
      - MODEL=claude-3-5-sonnet-20241022&lt;br&gt;
    restart: unless-stopped&lt;br&gt;
    healthcheck:&lt;br&gt;
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]&lt;br&gt;
      interval: 30s&lt;br&gt;
      timeout: 10s&lt;br&gt;
      retries: 3&lt;br&gt;
&lt;br&gt;
  # Optional: add a Redis cache&lt;br&gt;
  redis:&lt;br&gt;
    image: redis:7-alpine&lt;br&gt;
    ports:&lt;br&gt;
      - "6379:6379"&lt;br&gt;
    volumes:&lt;br&gt;
      - redis-data:/data&lt;br&gt;
&lt;br&gt;
volumes:&lt;br&gt;
  redis-data:&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Production-Ready FastAPI + ofox.ai&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# main.py&lt;br&gt;
from fastapi import FastAPI, HTTPException&lt;br&gt;
from pydantic import BaseModel&lt;br&gt;
from typing import List, Optional&lt;br&gt;
import httpx&lt;br&gt;
import os&lt;br&gt;
&lt;br&gt;
app = FastAPI(title="Claude API Service", version="1.0.0")&lt;br&gt;
&lt;br&gt;
class Message(BaseModel):&lt;br&gt;
    role: str&lt;br&gt;
    content: str&lt;br&gt;
&lt;br&gt;
class ChatRequest(BaseModel):&lt;br&gt;
    messages: List[Message]&lt;br&gt;
    model: str = "claude-3-5-sonnet-20241022"&lt;br&gt;
    max_tokens: Optional[int] = 1024&lt;br&gt;
    temperature: Optional[float] = 0.7&lt;br&gt;
&lt;br&gt;
@app.get("/health")&lt;br&gt;
async def health():&lt;br&gt;
    return {"status": "healthy"}&lt;br&gt;
&lt;br&gt;
@app.post("/chat")&lt;br&gt;
async def chat(request: ChatRequest):&lt;br&gt;
    async with httpx.AsyncClient(timeout=120.0) as client:&lt;br&gt;
        try:&lt;br&gt;
            response = await client.post(&lt;br&gt;
                "https://api.ofox.ai/v1/chat/completions",&lt;br&gt;
                headers={&lt;br&gt;
                    "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",&lt;br&gt;
                    "Content-Type": "application/json"&lt;br&gt;
                },&lt;br&gt;
                json={&lt;br&gt;
                    "model": request.model,&lt;br&gt;
                    "messages": [m.model_dump() for m in request.messages],&lt;br&gt;
                    "max_tokens": request.max_tokens,&lt;br&gt;
                    "temperature": request.temperature&lt;br&gt;
                }&lt;br&gt;
            )&lt;br&gt;
            response.raise_for_status()&lt;br&gt;
            data = response.json()&lt;br&gt;
            return {&lt;br&gt;
                "content": data["choices"][0]["message"]["content"],&lt;br&gt;
                "model": data["model"],&lt;br&gt;
                "tokens": data["usage"]["total_tokens"]&lt;br&gt;
            }&lt;br&gt;
        except httpx.HTTPStatusError as e:&lt;br&gt;
            raise HTTPException(status_code=e.response.status_code, detail=str(e))&lt;br&gt;
        except Exception as e:&lt;br&gt;
            raise HTTPException(status_code=500, detail=str(e))&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;GPU Support for Local Models&lt;/p&gt;

&lt;p&gt;`dockerfile&lt;br&gt;
# Dockerfile with GPU support&lt;br&gt;
FROM nvidia/cuda:12.1.0-base-ubuntu22.04&lt;br&gt;
&lt;br&gt;
RUN apt-get update &amp;&amp; apt-get install -y \&lt;br&gt;
    python3.11 python3.11-venv python3-pip \&lt;br&gt;
    &amp;&amp; rm -rf /var/lib/apt/lists/*&lt;br&gt;
&lt;br&gt;
WORKDIR /app&lt;br&gt;
COPY requirements.txt .&lt;br&gt;
RUN pip install --no-cache-dir -r requirements.txt&lt;br&gt;
&lt;br&gt;
# For running local models like Ollama&lt;br&gt;
RUN curl -fsSL https://ollama.ai/install.sh | sh&lt;br&gt;
&lt;br&gt;
COPY . .&lt;br&gt;
CMD ["python3", "main.py"]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;`yaml&lt;br&gt;
# docker-compose.yml with GPU&lt;br&gt;
services:&lt;br&gt;
  api:&lt;br&gt;
    build: .&lt;br&gt;
    deploy:&lt;br&gt;
      resources:&lt;br&gt;
        reservations:&lt;br&gt;
          devices:&lt;br&gt;
            - driver: nvidia&lt;br&gt;
              count: 1&lt;br&gt;
              capabilities: [gpu]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Multi-Stage Build (Smaller Images)&lt;/p&gt;

&lt;p&gt;`dockerfile&lt;br&gt;
# Build stage&lt;br&gt;
FROM python:3.11-slim AS builder&lt;br&gt;
WORKDIR /app&lt;br&gt;
COPY requirements.txt .&lt;br&gt;
RUN pip install --no-cache-dir --user -r requirements.txt&lt;br&gt;
&lt;br&gt;
# Production stage&lt;br&gt;
FROM python:3.11-slim&lt;br&gt;
WORKDIR /app&lt;br&gt;
COPY --from=builder /root/.local /root/.local&lt;br&gt;
COPY . .&lt;br&gt;
ENV PATH=/root/.local/bin:$PATH&lt;br&gt;
CMD ["python", "main.py"]&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Environment-Based Configuration&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# config.py&lt;br&gt;
import os&lt;br&gt;
from dataclasses import dataclass&lt;br&gt;
&lt;br&gt;
@dataclass&lt;br&gt;
class Config:&lt;br&gt;
    api_key: str&lt;br&gt;
    model: str&lt;br&gt;
    max_tokens: int&lt;br&gt;
    temperature: float&lt;br&gt;
&lt;br&gt;
def get_config() -&gt; Config:&lt;br&gt;
    return Config(&lt;br&gt;
        api_key=os.environ["OFOX_API_KEY"],&lt;br&gt;
        model=os.environ.get("MODEL", "claude-3-5-sonnet-20241022"),&lt;br&gt;
        max_tokens=int(os.environ.get("MAX_TOKENS", "1024")),&lt;br&gt;
        temperature=float(os.environ.get("TEMPERATURE", "0.7"))&lt;br&gt;
    )&lt;br&gt;
`&lt;/p&gt;
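
&lt;p&gt;To tie this to the container's CMD, validate configuration at startup and fail fast. A small sketch (assuming uvicorn serves the FastAPI app from main.py; adjust to your own entrypoint):&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Sketch: fail fast at container start if required configuration is missing.&lt;br&gt;
if __name__ == "__main__":&lt;br&gt;
    try:&lt;br&gt;
        config = get_config()&lt;br&gt;
    except KeyError as e:&lt;br&gt;
        raise SystemExit(f"Missing required environment variable: {e}")&lt;br&gt;
&lt;br&gt;
    import uvicorn&lt;br&gt;
    uvicorn.run("main:app", host="0.0.0.0", port=8000)&lt;br&gt;
`&lt;/p&gt;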

&lt;p&gt;Building and Running&lt;/p&gt;

&lt;p&gt;`bash&lt;br&gt;
# Build&lt;br&gt;
docker build -t claude-api-service .&lt;br&gt;
&lt;br&gt;
# Run&lt;br&gt;
docker run -d -p 8000:8000 \&lt;br&gt;
  -e OFOX_API_KEY=your-key-here \&lt;br&gt;
  --name claude-api \&lt;br&gt;
  claude-api-service&lt;br&gt;
&lt;br&gt;
# With Docker Compose&lt;br&gt;
docker-compose up -d&lt;br&gt;
&lt;br&gt;
# View logs&lt;br&gt;
docker logs -f claude-api&lt;br&gt;
&lt;br&gt;
# Shell into the container&lt;br&gt;
docker exec -it claude-api /bin/bash&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;CI/CD with GitHub Actions&lt;/p&gt;

&lt;p&gt;`yaml&lt;br&gt;
name: Build and Deploy&lt;br&gt;
&lt;br&gt;
on:&lt;br&gt;
  push:&lt;br&gt;
    branches: [main]&lt;br&gt;
&lt;br&gt;
jobs:&lt;br&gt;
  build:&lt;br&gt;
    runs-on: ubuntu-latest&lt;br&gt;
    steps:&lt;br&gt;
      - uses: actions/checkout@v4&lt;br&gt;
&lt;br&gt;
      - name: Build image&lt;br&gt;
        run: docker build -t claude-api:${{ github.sha }} .&lt;br&gt;
&lt;br&gt;
      - name: Run tests&lt;br&gt;
        run: |&lt;br&gt;
          docker run claude-api:${{ github.sha }} pytest&lt;br&gt;
&lt;br&gt;
      - name: Push to registry&lt;br&gt;
        run: |&lt;br&gt;
          docker tag claude-api:${{ github.sha }} registry/app/claude-api:latest&lt;br&gt;
          docker push registry/app/claude-api:latest&lt;br&gt;
`&lt;/p&gt;
&lt;/ul&gt;

&lt;p&gt;Deploy Anywhere&lt;/p&gt;

&lt;p&gt;With Docker, your AI application deploys to:&lt;br&gt;
AWS ECS — Managed container service&lt;br&gt;
Google Cloud Run — Serverless containers&lt;br&gt;
Azure Container Instances — Simple deployment&lt;br&gt;
DigitalOcean App Platform — Simple PaaS&lt;br&gt;
Your own server — With docker-compose&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Containerize your AI applications and deploy with confidence. Power them with ofox.ai — reliable Claude API with competitive pricing and 99.9% uptime.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: docker,devops,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Testing AI-Powered Applications: Strategies for LLM Integration</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:35:44 +0000</pubDate>
      <link>https://dev.to/zny10289/testing-ai-powered-applications-strategies-for-llm-integration-3ke9</link>
      <guid>https://dev.to/zny10289/testing-ai-powered-applications-strategies-for-llm-integration-3ke9</guid>
      <description>&lt;p&gt;Testing AI applications is fundamentally different from testing traditional software. There's no deterministic output, prompts change behavior, and edge cases multiply. Here's how to build a robust testing strategy for AI-powered applications.&lt;/p&gt;

&lt;p&gt;The AI Testing Challenge&lt;/p&gt;

&lt;p&gt;Traditional testing:&lt;br&gt;
&lt;code&gt;&lt;br&gt;
Input → Function → Expected Output&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;AI testing:&lt;br&gt;
&lt;code&gt;&lt;br&gt;
Input → Prompt + Context → Probabilistic Output&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can't assert exact outputs. Instead, you test properties.&lt;/p&gt;

&lt;p&gt;Property-Based Testing for AI&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// Instead of testing exact output, test properties&lt;br&gt;
&lt;br&gt;
interface TestCase {&lt;br&gt;
  input: string;&lt;br&gt;
  constraints: Constraint[];&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
interface Constraint {&lt;br&gt;
  type: 'contains' | 'excludes' | 'length' | 'format' | 'json';&lt;br&gt;
  value: string | number | RegExp;&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
async function testAIOutput(testCase: TestCase, actualOutput: string): Promise&lt;boolean&gt; {&lt;br&gt;
  for (const constraint of testCase.constraints) {&lt;br&gt;
    switch (constraint.type) {&lt;br&gt;
      case 'contains':&lt;br&gt;
        if (!actualOutput.includes(constraint.value as string)) return false;&lt;br&gt;
        break;&lt;br&gt;
      case 'excludes':&lt;br&gt;
        if (actualOutput.includes(constraint.value as string)) return false;&lt;br&gt;
        break;&lt;br&gt;
      case 'length':&lt;br&gt;
        if (actualOutput.length &gt; (constraint.value as number)) return false;&lt;br&gt;
        break;&lt;br&gt;
      case 'format':&lt;br&gt;
        if (!(constraint.value as RegExp).test(actualOutput)) return false;&lt;br&gt;
        break;&lt;br&gt;
      case 'json':&lt;br&gt;
        try {&lt;br&gt;
          JSON.parse(actualOutput);&lt;br&gt;
        } catch {&lt;br&gt;
          return false;&lt;br&gt;
        }&lt;br&gt;
        break;&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
  return true;&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
// Example test&lt;br&gt;
const testCase: TestCase = {&lt;br&gt;
  input: 'Extract the name and email from: John Doe, john@example.com',&lt;br&gt;
  constraints: [&lt;br&gt;
    { type: 'contains', value: 'John' },&lt;br&gt;
    { type: 'contains', value: 'john@example.com' },&lt;br&gt;
    { type: 'excludes', value: 'undefined' },&lt;br&gt;
    { type: 'length', value: 100 }&lt;br&gt;
  ]&lt;br&gt;
};&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Prompt Versioning and Regression Testing&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import hashlib&lt;br&gt;
from datetime import datetime&lt;br&gt;
&lt;br&gt;
class PromptRegistry:&lt;br&gt;
    def __init__(self):&lt;br&gt;
        self.prompts = {}&lt;br&gt;
&lt;br&gt;
    def register(self, name: str, version: str, prompt: str, test_cases: list):&lt;br&gt;
        key = f"{name}:{version}"&lt;br&gt;
        self.prompts[key] = {&lt;br&gt;
            'prompt': prompt,&lt;br&gt;
            'test_cases': test_cases,&lt;br&gt;
            'hash': hashlib.md5(prompt.encode()).hexdigest(),&lt;br&gt;
            'registered': datetime.now()&lt;br&gt;
        }&lt;br&gt;
&lt;br&gt;
    def get_prompt(self, name: str, version: str) -&gt; str:&lt;br&gt;
        return self.prompts[f"{name}:{version}"]['prompt']&lt;br&gt;
&lt;br&gt;
    async def regression_test(self, name: str, old_version: str, new_version: str,&lt;br&gt;
                              llm_client, threshold: float = 0.8) -&gt; bool:&lt;br&gt;
        """Ensure the new version passes the existing test cases."""&lt;br&gt;
        old_prompt = self.prompts.get(f"{name}:{old_version}")&lt;br&gt;
        if not old_prompt:&lt;br&gt;
            return True&lt;br&gt;
&lt;br&gt;
        old_passes = 0&lt;br&gt;
        new_passes = 0&lt;br&gt;
&lt;br&gt;
        for tc in old_prompt['test_cases']:&lt;br&gt;
            old_result = await llm_client.complete(old_prompt['prompt'] + tc['input'])&lt;br&gt;
            new_result = await llm_client.complete(&lt;br&gt;
                self.get_prompt(name, new_version) + tc['input']&lt;br&gt;
            )&lt;br&gt;
&lt;br&gt;
            # test_ai_output: assumed Python port of testAIOutput above&lt;br&gt;
            if await test_ai_output(tc, old_result):&lt;br&gt;
                old_passes += 1&lt;br&gt;
            if await test_ai_output(tc, new_result):&lt;br&gt;
                new_passes += 1&lt;br&gt;
&lt;br&gt;
        # The new version should pass at least as many tests&lt;br&gt;
        return (new_passes / len(old_prompt['test_cases'])) &gt;= threshold&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Deterministic Output Testing&lt;/p&gt;

&lt;p&gt;For structured outputs, test deterministically:&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
import { z } from 'zod';&lt;br&gt;
&lt;br&gt;
const CodeReviewSchema = z.object({&lt;br&gt;
  score: z.number().min(0).max(10),&lt;br&gt;
  issues: z.array(z.object({&lt;br&gt;
    severity: z.enum(['low', 'medium', 'high']),&lt;br&gt;
    line: z.number(),&lt;br&gt;
    description: z.string()&lt;br&gt;
  })),&lt;br&gt;
  summary: z.string()&lt;br&gt;
});&lt;br&gt;
&lt;br&gt;
async function testCodeReview(code: string, expectedScoreRange: [number, number]) {&lt;br&gt;
  const response = await llm.complete(&lt;br&gt;
    `Review this code and return JSON: ${code}`&lt;br&gt;
  );&lt;br&gt;
&lt;br&gt;
  // Parse and validate&lt;br&gt;
  const parsed = JSON.parse(response);&lt;br&gt;
  const validated = CodeReviewSchema.parse(parsed);&lt;br&gt;
&lt;br&gt;
  // Deterministic assertions&lt;br&gt;
  console.assert(&lt;br&gt;
    validated.score &gt;= expectedScoreRange[0] &amp;&amp;&lt;br&gt;
    validated.score &lt;= expectedScoreRange[1],&lt;br&gt;
    `Score ${validated.score} outside expected range`&lt;br&gt;
  );&lt;br&gt;
&lt;br&gt;
  console.assert(&lt;br&gt;
    validated.issues.length &lt; 20,&lt;br&gt;
    'Too many issues reported'&lt;br&gt;
  );&lt;br&gt;
&lt;br&gt;
  return validated;&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Mocking External AI Calls&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// For unit tests, mock the LLM client&lt;br&gt;
class MockLLMClient {&lt;br&gt;
  constructor(private fixtures: Map&lt;string, string&gt;) {}&lt;br&gt;
&lt;br&gt;
  async complete(prompt: string): Promise&lt;string&gt; {&lt;br&gt;
    // Return the fixture matching the prompt pattern&lt;br&gt;
    for (const [pattern, response] of this.fixtures) {&lt;br&gt;
      if (prompt.includes(pattern)) {&lt;br&gt;
        return response;&lt;br&gt;
      }&lt;br&gt;
    }&lt;br&gt;
    return 'Mock response';&lt;br&gt;
  }&lt;br&gt;
&lt;br&gt;
  async *stream(prompt: string): AsyncGenerator&lt;string&gt; {&lt;br&gt;
    const response = await this.complete(prompt);&lt;br&gt;
    for (const char of response) {&lt;br&gt;
      yield char;&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
// Usage in tests&lt;br&gt;
const mockClient = new MockLLMClient(new Map([&lt;br&gt;
  ['extract email', '{"email": "test@example.com"}'],&lt;br&gt;
  ['summarize', 'This is a summary of the text.']&lt;br&gt;
]));&lt;br&gt;
&lt;br&gt;
// Now your business logic tests run fast and deterministically&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Chaos Testing for AI Applications&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import pytest&lt;br&gt;
&lt;br&gt;
# RateLimitError and safe_parse_json are your app's own error type and&lt;br&gt;
# tolerant JSON parser; client is a test fixture wrapping your AI client.&lt;br&gt;
class AIChaosTests:&lt;br&gt;
    def test_rate_limits(self, client):&lt;br&gt;
        """Does your app handle rate limits gracefully?"""&lt;br&gt;
        for _ in range(100):&lt;br&gt;
            try:&lt;br&gt;
                client.complete("test")&lt;br&gt;
            except RateLimitError:&lt;br&gt;
                assert client.retry_count &gt; 0&lt;br&gt;
                break&lt;br&gt;
        else:&lt;br&gt;
            pytest.fail("Rate limit not encountered after 100 requests")&lt;br&gt;
&lt;br&gt;
    def test_invalid_json(self, client):&lt;br&gt;
        """Does your app handle malformed JSON from the LLM?"""&lt;br&gt;
        # Inject a bad response&lt;br&gt;
        client.mock_response('{"broken": }')&lt;br&gt;
        result = safe_parse_json(client.complete("test"))&lt;br&gt;
        assert result is not None  # Handled gracefully&lt;br&gt;
&lt;br&gt;
    def test_empty_context(self, client):&lt;br&gt;
        """Does your app handle empty context?"""&lt;br&gt;
        result = client.complete("")&lt;br&gt;
        assert result is not None&lt;br&gt;
&lt;br&gt;
    def test_max_tokens_respected(self, client):&lt;br&gt;
        """Does max_tokens actually limit output?"""&lt;br&gt;
        result = client.complete("test", max_tokens=10)&lt;br&gt;
        assert len(result) &lt;= 50  # ~10 tokens&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Integration Test Framework&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
describe('AI Integration Tests', () =&gt; {&lt;br&gt;
  const client = new ClaudeClient(process.env.OFOX_API_KEY);&lt;br&gt;
&lt;br&gt;
  describe('Code Review Feature', () =&gt; {&lt;br&gt;
    it('identifies syntax errors', async () =&gt; {&lt;br&gt;
      const code = 'const x = ;';&lt;br&gt;
      const review = await reviewCode(client, code);&lt;br&gt;
      expect(review.issues.some(i =&gt; i.severity === 'high')).toBe(true);&lt;br&gt;
    });&lt;br&gt;
&lt;br&gt;
    it('handles valid code gracefully', async () =&gt; {&lt;br&gt;
      const code = 'const x = 42;';&lt;br&gt;
      const review = await reviewCode(client, code);&lt;br&gt;
      expect(review.issues.filter(i =&gt; i.severity === 'high')).toHaveLength(0);&lt;br&gt;
    });&lt;br&gt;
&lt;br&gt;
    it('respects max issues limit', async () =&gt; {&lt;br&gt;
      const code = '...'; // Large code sample&lt;br&gt;
      const review = await reviewCode(client, code, { maxIssues: 10 });&lt;br&gt;
      expect(review.issues.length).toBeLessThanOrEqual(10);&lt;br&gt;
    });&lt;br&gt;
  });&lt;br&gt;
});&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Building Testable AI Systems&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Separate concerns — Keep prompts in config, not buried in code&lt;/li&gt;
&lt;li&gt;Structured outputs — Use Zod/JSON Schema to constrain responses&lt;/li&gt;
&lt;li&gt;Fallback handling — Plan for API failures at every call site&lt;/li&gt;
&lt;li&gt;Snapshot testing — Store expected responses for regression (see the sketch below)&lt;/li&gt;
&lt;/ol&gt;
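
&lt;p&gt;For point 4, a minimal snapshot-testing sketch. Assumptions: pytest-style tests with a client fixture, file-based snapshots, and output stable enough to compare byte-for-byte (temperature 0 or a mocked client); for free-form output, combine snapshots with the property checks above instead of exact equality:&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import json&lt;br&gt;
from pathlib import Path&lt;br&gt;
&lt;br&gt;
SNAPSHOT_DIR = Path("tests/snapshots")&lt;br&gt;
&lt;br&gt;
def check_snapshot(name: str, output: str, update: bool = False) -&gt; bool:&lt;br&gt;
    """Compare output to a stored snapshot; create the snapshot on first run."""&lt;br&gt;
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)&lt;br&gt;
    path = SNAPSHOT_DIR / f"{name}.json"&lt;br&gt;
    if update or not path.exists():&lt;br&gt;
        path.write_text(json.dumps({"output": output}, indent=2))&lt;br&gt;
        return True&lt;br&gt;
    expected = json.loads(path.read_text())["output"]&lt;br&gt;
    return output == expected&lt;br&gt;
&lt;br&gt;
def test_extraction_snapshot(client):&lt;br&gt;
    result = client.complete("Extract the email from: John Doe, john@example.com")&lt;br&gt;
    assert check_snapshot("extraction", result)&lt;br&gt;
`&lt;/p&gt;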

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build testable AI applications with ofox.ai — their API is reliable and consistent, making it easier to build deterministic test suites.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: testing,ai,programming,developer,quality&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>LLM Context Window Management: Techniques for Handling Long Documents</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:30:32 +0000</pubDate>
      <link>https://dev.to/zny10289/llm-context-window-management-techniques-for-handling-long-documents-4cf5</link>
      <guid>https://dev.to/zny10289/llm-context-window-management-techniques-for-handling-long-documents-4cf5</guid>
      <description>&lt;p&gt;Every LLM has a context window limit — a maximum number of tokens you can pass in a single request. Claude 3.5 Sonnet offers 200K tokens, but that's still finite. Here's how to manage context efficiently for production AI applications.&lt;/p&gt;

&lt;p&gt;Understanding Context Window Limits&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Approximate Pages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;td&gt;~500 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4 Turbo&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;~300 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3 Opus&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;td&gt;~500 pages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you exceed the limit, you get an error. When you're close, you're wasting money on tokens that add no value.&lt;/p&gt;

&lt;p&gt;Token Estimation&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import re&lt;br&gt;
&lt;br&gt;
def estimate_tokens(text: str) -&gt; int:&lt;br&gt;
    """&lt;br&gt;
    Rough token estimation.&lt;br&gt;
    ~4 characters per token for English text.&lt;br&gt;
    """&lt;br&gt;
    return len(text) // 4&lt;br&gt;
&lt;br&gt;
def estimate_tokens_precise(text: str) -&gt; int:&lt;br&gt;
    """&lt;br&gt;
    More precise estimation using word count.&lt;br&gt;
    The average English word is ~1.3 tokens.&lt;br&gt;
    """&lt;br&gt;
    words = len(re.findall(r'\w+', text))&lt;br&gt;
    return int(words * 1.3)&lt;br&gt;
`&lt;/p&gt;
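
&lt;p&gt;Where an exact count matters, use a real tokenizer. A sketch with tiktoken, with the caveat that its cl100k_base encoding approximates but does not exactly match Claude's tokenizer:&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
# Sketch: near-exact counting with a real tokenizer. tiktoken's cl100k_base&lt;br&gt;
# approximates Claude's tokenizer but is not identical (verify for your model).&lt;br&gt;
import tiktoken&lt;br&gt;
&lt;br&gt;
def count_tokens(text: str) -&gt; int:&lt;br&gt;
    enc = tiktoken.get_encoding("cl100k_base")&lt;br&gt;
    return len(enc.encode(text))&lt;br&gt;
`&lt;/p&gt;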

&lt;p&gt;Technique 1: Semantic Chunking&lt;/p&gt;

&lt;p&gt;Split documents by meaning, not by character count:&lt;/p&gt;

&lt;p&gt;`python&lt;br&gt;
import re&lt;br&gt;
&lt;br&gt;
def semantic_chunk(text: str, max_tokens: int = 4000, overlap: int = 200) -&gt; list[str]:&lt;br&gt;
    """&lt;br&gt;
    Split text into semantic chunks (paragraphs).&lt;br&gt;
    """&lt;br&gt;
    # Split by double newlines (paragraphs)&lt;br&gt;
    paragraphs = re.split(r'\n\n+', text)&lt;br&gt;
&lt;br&gt;
    chunks = []&lt;br&gt;
    current_chunk = []&lt;br&gt;
    current_tokens = 0&lt;br&gt;
&lt;br&gt;
    for para in paragraphs:&lt;br&gt;
        para_tokens = estimate_tokens(para)&lt;br&gt;
&lt;br&gt;
        if current_tokens + para_tokens &gt; max_tokens:&lt;br&gt;
            # Save the current chunk&lt;br&gt;
            if current_chunk:&lt;br&gt;
                chunks.append('\n\n'.join(current_chunk))&lt;br&gt;
&lt;br&gt;
            # Start a new chunk with overlap&lt;br&gt;
            overlap_paras = []&lt;br&gt;
            overlap_tokens = 0&lt;br&gt;
            for p in reversed(current_chunk):&lt;br&gt;
                t = estimate_tokens(p)&lt;br&gt;
                if overlap_tokens + t &lt;= overlap:&lt;br&gt;
                    overlap_paras.insert(0, p)&lt;br&gt;
                    overlap_tokens += t&lt;br&gt;
                else:&lt;br&gt;
                    break&lt;br&gt;
&lt;br&gt;
            current_chunk = overlap_paras + [para]&lt;br&gt;
            current_tokens = overlap_tokens + para_tokens&lt;br&gt;
        else:&lt;br&gt;
            current_chunk.append(para)&lt;br&gt;
            current_tokens += para_tokens&lt;br&gt;
&lt;br&gt;
    if current_chunk:&lt;br&gt;
        chunks.append('\n\n'.join(current_chunk))&lt;br&gt;
&lt;br&gt;
    return chunks&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Technique 2: RAG — Retrieval-Augmented Generation&lt;/p&gt;

&lt;p&gt;Don't put everything in the prompt. Retrieve only what's relevant:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
class SimpleRAG:&lt;br&gt;
    def __init__(self, documents: list[str], chunk_size: int = 1000):&lt;br&gt;
        self.chunks = self._create_chunks(documents, chunk_size)&lt;br&gt;
        self.embeddings = self._create_embeddings(self.chunks)&lt;br&gt;
&lt;br&gt;
    def retrieve(self, query: str, top_k: int = 3) -&amp;gt; list[str]:&lt;br&gt;
        """Find most relevant chunks for query."""&lt;br&gt;
        query_embedding = self._embed(query)&lt;br&gt;
&lt;br&gt;
        scores = [&lt;br&gt;
            self._cosine_similarity(query_embedding, e)&lt;br&gt;
            for e in self.embeddings&lt;br&gt;
        ]&lt;br&gt;
&lt;br&gt;
        top_indices = sorted(range(len(scores)),&lt;br&gt;
                             key=lambda i: scores[i],&lt;br&gt;
                             reverse=True)[:top_k]&lt;br&gt;
&lt;br&gt;
        return [self.chunks[i] for i in top_indices]&lt;br&gt;
&lt;br&gt;
    def _create_chunks(self, documents: list[str], chunk_size: int) -&amp;gt; list[str]:&lt;br&gt;
        chunks = []&lt;br&gt;
        for doc in documents:&lt;br&gt;
            chunks.extend(semantic_chunk(doc, max_tokens=chunk_size))&lt;br&gt;
        return chunks&lt;br&gt;
&lt;br&gt;
    def _create_embeddings(self, chunks: list[str]) -&amp;gt; list[list[float]]:&lt;br&gt;
        return [self._embed(c) for c in chunks]&lt;br&gt;
&lt;br&gt;
    def _embed(self, text: str) -&amp;gt; list[float]:&lt;br&gt;
        # In production, use OpenAI or ofox.ai embeddings&lt;br&gt;
        pass&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
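
&lt;p&gt;For reference, the _cosine_similarity helper used above is only a few lines. A minimal sketch, written as a SimpleRAG method:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
import math&lt;br&gt;
&lt;br&gt;
def _cosine_similarity(self, a: list[float], b: list[float]) -&amp;gt; float:&lt;br&gt;
    dot = sum(x * y for x, y in zip(a, b))&lt;br&gt;
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))&lt;br&gt;
    return dot / norm if norm else 0.0&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;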

&lt;p&gt;Technique 3: Conversation Summary&lt;/p&gt;

&lt;p&gt;Summarize older messages to preserve context:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
class SummarizingConversation:&lt;br&gt;
    def __init__(self, max_tokens: int = 16000):&lt;br&gt;
        self.max_tokens = max_tokens&lt;br&gt;
        self.messages = []&lt;br&gt;
        self.summary = ""&lt;br&gt;
&lt;br&gt;
    def add_message(self, role: str, content: str):&lt;br&gt;
        self.messages.append({"role": role, "content": content})&lt;br&gt;
        self._maybe_summarize()&lt;br&gt;
&lt;br&gt;
    def _maybe_summarize(self):&lt;br&gt;
        total_tokens = sum(estimate_tokens(m["content"]) for m in self.messages)&lt;br&gt;
&lt;br&gt;
        if total_tokens &amp;gt; self.max_tokens:&lt;br&gt;
            # Summarize older messages; keep the last 5 verbatim&lt;br&gt;
            older_messages = self.messages[:-5]&lt;br&gt;
            recent = self.messages[-5:]&lt;br&gt;
&lt;br&gt;
            transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older_messages)&lt;br&gt;
            summary_prompt = (&lt;br&gt;
                "Summarize this conversation concisely, preserving key information:\n\n"&lt;br&gt;
                + transcript&lt;br&gt;
            )&lt;br&gt;
&lt;br&gt;
            # Call the LLM to summarize (see the sketch below)&lt;br&gt;
            self.summary = call_llm_summarize(summary_prompt)&lt;br&gt;
            self.messages = [{"role": "system", "content": f"Prior context: {self.summary}"}] + recent&lt;br&gt;
&lt;br&gt;
    def get_messages(self) -&amp;gt; list[dict]:&lt;br&gt;
        return self.messages&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
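
&lt;p&gt;call_llm_summarize was left as pseudocode above. A minimal synchronous sketch against the same OpenAI-compatible endpoint used elsewhere in this article (error handling kept to raise_for_status; assumes OFOX_API_KEY is set):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
import os&lt;br&gt;
import httpx&lt;br&gt;
&lt;br&gt;
def call_llm_summarize(prompt: str) -&amp;gt; str:&lt;br&gt;
    response = httpx.post(&lt;br&gt;
        "https://api.ofox.ai/v1/chat/completions",&lt;br&gt;
        headers={"Authorization": f"Bearer {os.environ['OFOX_API_KEY']}"},&lt;br&gt;
        json={&lt;br&gt;
            "model": "claude-3-5-sonnet-20241022",&lt;br&gt;
            "messages": [{"role": "user", "content": prompt}],&lt;br&gt;
            "max_tokens": 500,&lt;br&gt;
        },&lt;br&gt;
        timeout=60.0,&lt;br&gt;
    )&lt;br&gt;
    response.raise_for_status()&lt;br&gt;
    return response.json()["choices"][0]["message"]["content"]&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;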

&lt;p&gt;Technique 4: System Prompt Optimization&lt;/p&gt;

&lt;p&gt;Keep system prompts lean:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# ❌ Verbose system prompt (wastes tokens)&lt;br&gt;
verbose_system = """&lt;br&gt;
You are a helpful AI assistant. You are designed to be respectful,&lt;br&gt;
professional, and helpful. You should provide accurate information&lt;br&gt;
and be honest when you don't know something. You should ...&lt;br&gt;
[200 more words]&lt;br&gt;
"""&lt;br&gt;
&lt;br&gt;
# ✅ Lean system prompt (effective)&lt;br&gt;
lean_system = """&lt;br&gt;
Role: helpful AI assistant&lt;br&gt;
Goal: provide accurate, concise answers&lt;br&gt;
When unsure: say "I don't know"&lt;br&gt;
"""&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
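
&lt;p&gt;You can put numbers on the difference with the estimator from earlier (counts are approximate by construction):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
print(estimate_tokens(verbose_system), "vs", estimate_tokens(lean_system))&lt;br&gt;
# A ~250-word preamble is roughly 300-400 tokens on every request;&lt;br&gt;
# the lean version is closer to 25.&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;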

&lt;p&gt;Technique 5: Streaming with Token Tracking&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
import os&lt;br&gt;
import json&lt;br&gt;
import httpx&lt;br&gt;
&lt;br&gt;
class StreamingTokenTracker:&lt;br&gt;
    def __init__(self, model: str = "claude-3-5-sonnet-20241022"):&lt;br&gt;
        self.model = model&lt;br&gt;
        self.total_input_tokens = 0&lt;br&gt;
        self.total_output_tokens = 0&lt;br&gt;
&lt;br&gt;
    async def stream_chat(self, messages: list[dict]):&lt;br&gt;
        """Stream response tokens while tracking estimated usage."""&lt;br&gt;
        # Track input tokens up front&lt;br&gt;
        self.total_input_tokens += sum(&lt;br&gt;
            estimate_tokens(m["content"]) for m in messages&lt;br&gt;
        )&lt;br&gt;
&lt;br&gt;
        async with httpx.AsyncClient(timeout=60.0) as client:&lt;br&gt;
            async with client.stream(&lt;br&gt;
                "POST",&lt;br&gt;
                "https://api.ofox.ai/v1/chat/completions",&lt;br&gt;
                headers={&lt;br&gt;
                    "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",&lt;br&gt;
                    "Content-Type": "application/json",&lt;br&gt;
                },&lt;br&gt;
                json={"model": self.model, "messages": messages, "stream": True},&lt;br&gt;
            ) as response:&lt;br&gt;
                async for line in response.aiter_lines():&lt;br&gt;
                    if line.startswith("data: ") and line[6:] != "[DONE]":&lt;br&gt;
                        delta = json.loads(line[6:]).get("choices", [{}])[0].get("delta", {})&lt;br&gt;
                        if content := delta.get("content"):&lt;br&gt;
                            self.total_output_tokens += estimate_tokens(content)&lt;br&gt;
                            yield content&lt;br&gt;
&lt;br&gt;
    def get_cost(self, input_cost_per_1k=0.003, output_cost_per_1k=0.015):&lt;br&gt;
        input_cost = (self.total_input_tokens / 1000) * input_cost_per_1k&lt;br&gt;
        output_cost = (self.total_output_tokens / 1000) * output_cost_per_1k&lt;br&gt;
        return input_cost + output_cost&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Practical Rule of Thumb&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;br&gt;
Keep your prompt at &amp;lt; 50% of the context window.&lt;br&gt;
This leaves room for:&lt;br&gt;
- User input variations&lt;br&gt;
- Model reasoning&lt;br&gt;
- Unexpected response length&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
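
&lt;p&gt;In code, the rule is a one-line guard. A sketch (my_prompt stands in for your assembled prompt; 0.5 is the ratio above):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
def prompt_budget(context_window: int = 200_000, ratio: float = 0.5) -&amp;gt; int:&lt;br&gt;
    """Maximum prompt size under the 50% rule."""&lt;br&gt;
    return int(context_window * ratio)&lt;br&gt;
&lt;br&gt;
assert estimate_tokens(my_prompt) &amp;lt;= prompt_budget()&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;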

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build token-efficient AI applications with ofox.ai — their OpenAI-compatible API gives you access to Claude with generous context windows at competitive pricing.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: llm,artificial-intelligence,programming,developer,performance&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Building React AI Chat Components in 2026: Complete Guide</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:25:20 +0000</pubDate>
      <link>https://dev.to/zny10289/building-react-ai-chat-components-in-2026-complete-guide-11hi</link>
      <guid>https://dev.to/zny10289/building-react-ai-chat-components-in-2026-complete-guide-11hi</guid>
      <description>&lt;p&gt;Building a production-ready AI chat interface in React requires more than just displaying messages. You need streaming responses, markdown rendering, code highlighting, error handling, and a polished UX. Here's the complete implementation.&lt;/p&gt;

&lt;p&gt;Core Chat Component&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// ChatWindow.tsx&lt;br&gt;
import React, { useState, useRef, useEffect } from 'react';&lt;br&gt;
import ReactMarkdown from 'react-markdown';&lt;br&gt;
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';&lt;/p&gt;

&lt;p&gt;interface Message {&lt;br&gt;
id: string;&lt;br&gt;
role: 'user' | 'assistant';&lt;br&gt;
content: string;&lt;br&gt;
timestamp: Date;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;interface ChatWindowProps {&lt;br&gt;
apiKey: string;&lt;br&gt;
model?: string;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;export function ChatWindow({ apiKey, model = 'claude-3-5-sonnet-20241022' }: ChatWindowProps) {&lt;br&gt;
const [messages, setMessages] = useState&amp;lt;Message[]&amp;gt;([]);&lt;br&gt;
const [input, setInput] = useState('');&lt;br&gt;
const [isLoading, setIsLoading] = useState(false);&lt;br&gt;
const [error, setError] = useState&amp;lt;string | null&amp;gt;(null);&lt;br&gt;
const messagesEndRef = useRef&amp;lt;HTMLDivElement&amp;gt;(null);&lt;/p&gt;

&lt;p&gt;const scrollToBottom = () =&amp;gt; {&lt;br&gt;
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;useEffect(() =&amp;gt; {&lt;br&gt;
scrollToBottom();&lt;br&gt;
}, [messages, isLoading]);&lt;/p&gt;

&lt;p&gt;const sendMessage = async () =&amp;gt; {&lt;br&gt;
if (!input.trim() || isLoading) return;&lt;/p&gt;

&lt;p&gt;const userMessage: Message = {&lt;br&gt;
id: crypto.randomUUID(),&lt;br&gt;
role: 'user',&lt;br&gt;
content: input.trim(),&lt;br&gt;
timestamp: new Date()&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;setMessages(prev =&amp;gt; [...prev, userMessage]);&lt;br&gt;
setInput('');&lt;br&gt;
setIsLoading(true);&lt;br&gt;
setError(null);&lt;/p&gt;

&lt;p&gt;try {&lt;br&gt;
const response = await fetch('&lt;a href="https://api.ofox.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.ofox.ai/v1/chat/completions&lt;/a&gt;', {&lt;br&gt;
method: 'POST',&lt;br&gt;
headers: {&lt;br&gt;
'Authorization': `Bearer ${apiKey}`,&lt;br&gt;
'Content-Type': 'application/json'&lt;br&gt;
},&lt;br&gt;
body: JSON.stringify({&lt;br&gt;
model,&lt;br&gt;
messages: [...messages, userMessage].map(m =&amp;gt; ({&lt;br&gt;
role: m.role,&lt;br&gt;
content: m.content&lt;br&gt;
})),&lt;br&gt;
stream: true&lt;br&gt;
})&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;if (!response.ok) {&lt;br&gt;
throw new Error(`API error: ${response.status}`);&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Handle streaming response&lt;br&gt;
const reader = response.body?.getReader();&lt;br&gt;
const decoder = new TextDecoder();&lt;br&gt;
let assistantContent = '';&lt;/p&gt;

&lt;p&gt;const assistantMessage: Message = {&lt;br&gt;
id: crypto.randomUUID(),&lt;br&gt;
role: 'assistant',&lt;br&gt;
content: '',&lt;br&gt;
timestamp: new Date()&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;setMessages(prev =&amp;gt; [...prev, assistantMessage]);&lt;/p&gt;

&lt;p&gt;while (reader) {&lt;br&gt;
const { done, value } = await reader.read();&lt;br&gt;
if (done) break;&lt;/p&gt;

&lt;p&gt;const chunk = decoder.decode(value);&lt;br&gt;
const lines = chunk.split('\n');&lt;/p&gt;

&lt;p&gt;for (const line of lines) {&lt;br&gt;
if (line.startsWith('data: ') &amp;amp;&amp;amp; line.slice(6).trim() !== '[DONE]') {&lt;br&gt;
const data = JSON.parse(line.slice(6));&lt;br&gt;
if (data.choices[0]?.delta?.content) {&lt;br&gt;
assistantContent += data.choices[0].delta.content;&lt;br&gt;
setMessages(prev =&amp;gt; prev.map(m =&amp;gt;&lt;br&gt;
m.id === assistantMessage.id&lt;br&gt;
? { ...m, content: assistantContent }&lt;br&gt;
: m&lt;br&gt;
));&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
} catch (err) {&lt;br&gt;
setError(err instanceof Error ? err.message : 'Unknown error');&lt;br&gt;
setMessages(prev =&amp;gt; prev.filter(m =&amp;gt; m.id !== userMessage.id));&lt;br&gt;
} finally {&lt;br&gt;
setIsLoading(false);&lt;br&gt;
}&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;// Render tree (classNames are illustrative)&lt;br&gt;
return (&lt;br&gt;
  &amp;lt;div className="chat-window"&amp;gt;&lt;br&gt;
    &amp;lt;div className="messages"&amp;gt;&lt;br&gt;
      {messages.map(msg =&amp;gt; (&lt;br&gt;
        &amp;lt;div key={msg.id} className={`message ${msg.role}`}&amp;gt;&lt;br&gt;
          &amp;lt;span className="author"&amp;gt;{msg.role === 'user' ? 'You' : 'Claude'}&amp;lt;/span&amp;gt;&lt;br&gt;
          &amp;lt;ReactMarkdown&lt;br&gt;
            components={{&lt;br&gt;
              code({ inline, className, children }) {&lt;br&gt;
                const match = /language-(\w+)/.exec(className || '');&lt;br&gt;
                return !inline &amp;amp;&amp;amp; match ? (&lt;br&gt;
                  &amp;lt;SyntaxHighlighter language={match[1]} PreTag="div"&amp;gt;&lt;br&gt;
                    {String(children).replace(/\n$/, '')}&lt;br&gt;
                  &amp;lt;/SyntaxHighlighter&amp;gt;&lt;br&gt;
                ) : (&lt;br&gt;
                  &amp;lt;code&amp;gt;{children}&amp;lt;/code&amp;gt;&lt;br&gt;
                );&lt;br&gt;
              }&lt;br&gt;
            }}&lt;br&gt;
          &amp;gt;&lt;br&gt;
            {msg.content}&lt;br&gt;
          &amp;lt;/ReactMarkdown&amp;gt;&lt;br&gt;
        &amp;lt;/div&amp;gt;&lt;br&gt;
      ))}&lt;br&gt;
      {isLoading &amp;amp;&amp;amp; (&lt;br&gt;
        &amp;lt;div className="message assistant"&amp;gt;&lt;br&gt;
          &amp;lt;span className="author"&amp;gt;Claude&amp;lt;/span&amp;gt;&lt;br&gt;
          &amp;lt;span className="typing"&amp;gt;...&amp;lt;/span&amp;gt;&lt;br&gt;
        &amp;lt;/div&amp;gt;&lt;br&gt;
      )}&lt;br&gt;
      {error &amp;amp;&amp;amp; &amp;lt;div className="error"&amp;gt;{error}&amp;lt;/div&amp;gt;}&lt;br&gt;
      &amp;lt;div ref={messagesEndRef} /&amp;gt;&lt;br&gt;
    &amp;lt;/div&amp;gt;&lt;br&gt;
    &amp;lt;div className="input-row"&amp;gt;&lt;br&gt;
      &amp;lt;textarea&lt;br&gt;
        value={input}&lt;br&gt;
        onChange={e =&amp;gt; setInput(e.target.value)}&lt;br&gt;
        onKeyDown={e =&amp;gt; {&lt;br&gt;
          if (e.key === 'Enter' &amp;amp;&amp;amp; !e.shiftKey) {&lt;br&gt;
            e.preventDefault();&lt;br&gt;
            sendMessage();&lt;br&gt;
          }&lt;br&gt;
        }}&lt;br&gt;
        placeholder="Ask Claude..."&lt;br&gt;
        disabled={isLoading}&lt;br&gt;
      /&amp;gt;&lt;br&gt;
      &amp;lt;button onClick={sendMessage} disabled={isLoading}&amp;gt;Send&amp;lt;/button&amp;gt;&lt;br&gt;
    &amp;lt;/div&amp;gt;&lt;br&gt;
  &amp;lt;/div&amp;gt;&lt;br&gt;
);&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Streaming vs Non-Streaming&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// Non-streaming (simpler, good for short responses)&lt;br&gt;
async function chat(apiKey: string, messages: Message[]) {&lt;br&gt;
const response = await fetch('&lt;a href="https://api.ofox.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.ofox.ai/v1/chat/completions&lt;/a&gt;', {&lt;br&gt;
method: 'POST',&lt;br&gt;
headers: {&lt;br&gt;
'Authorization': `Bearer ${apiKey}`,&lt;br&gt;
'Content-Type': 'application/json'&lt;br&gt;
},&lt;br&gt;
body: JSON.stringify({&lt;br&gt;
model: 'claude-3-5-sonnet-20241022',&lt;br&gt;
messages: messages.map(m =&amp;gt; ({ role: m.role, content: m.content }))&lt;br&gt;
})&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;const data = await response.json();&lt;br&gt;
return data.choices[0].message.content;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Streaming (better UX for long responses)&lt;br&gt;
async function* chatStream(apiKey: string, messages: Message[]) {&lt;br&gt;
const response = await fetch('&lt;a href="https://api.ofox.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.ofox.ai/v1/chat/completions&lt;/a&gt;', {&lt;br&gt;
method: 'POST',&lt;br&gt;
headers: {&lt;br&gt;
'Authorization': `Bearer ${apiKey}`,&lt;br&gt;
'Content-Type': 'application/json'&lt;br&gt;
},&lt;br&gt;
body: JSON.stringify({&lt;br&gt;
model: 'claude-3-5-sonnet-20241022',&lt;br&gt;
messages: messages.map(m =&amp;gt; ({ role: m.role, content: m.content })),&lt;br&gt;
stream: true&lt;br&gt;
})&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;const reader = response.body?.getReader();&lt;br&gt;
const decoder = new TextDecoder();&lt;/p&gt;

&lt;p&gt;while (reader) {&lt;br&gt;
const { done, value } = await reader.read();&lt;br&gt;
if (done) break;&lt;/p&gt;

&lt;p&gt;const chunk = decoder.decode(value);&lt;br&gt;
for (const line of chunk.split('\n')) {&lt;br&gt;
if (line.startsWith('data: ') &amp;amp;&amp;amp; line.slice(6).trim() !== '[DONE]') {&lt;br&gt;
const data = JSON.parse(line.slice(6));&lt;br&gt;
if (data.choices[0]?.delta?.content) {&lt;br&gt;
yield data.choices[0].delta.content;&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Building with Modern React Patterns&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// useChat hook&lt;br&gt;
function useChat(initialMessages: Message[] = []) {&lt;br&gt;
const [messages, setMessages] = useState(initialMessages);&lt;br&gt;
const [isLoading, setIsLoading] = useState(false);&lt;/p&gt;

&lt;p&gt;const send = async (content: string) =&amp;gt; {&lt;br&gt;
const userMessage: Message = {&lt;br&gt;
id: crypto.randomUUID(),&lt;br&gt;
role: 'user',&lt;br&gt;
content,&lt;br&gt;
timestamp: new Date()&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;setMessages(prev =&amp;gt; [...prev, userMessage]);&lt;br&gt;
setIsLoading(true);&lt;/p&gt;

&lt;p&gt;// ... streaming logic&lt;/p&gt;

&lt;p&gt;setIsLoading(false);&lt;br&gt;
};&lt;/p&gt;

&lt;p&gt;const clear = () =&amp;gt; setMessages([]);&lt;/p&gt;

&lt;p&gt;return { messages, send, clear, isLoading };&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Usage (ChatUI is a hypothetical presentational component)&lt;br&gt;
function App() {&lt;br&gt;
const { messages, send, isLoading } = useChat();&lt;br&gt;
return &amp;lt;ChatUI messages={messages} onSend={send} isLoading={isLoading} /&amp;gt;;&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Power your React chat app with ofox.ai — OpenAI-compatible API with Claude models. Sign up and get an API key to start building.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: react,javascript,ai,programming,webdev&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>GitHub Actions + AI: Automating Code Quality with Claude</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:20:09 +0000</pubDate>
      <link>https://dev.to/zny10289/github-actions-ai-automating-code-quality-with-claude-2m1h</link>
      <guid>https://dev.to/zny10289/github-actions-ai-automating-code-quality-with-claude-2m1h</guid>
      <description>&lt;p&gt;Continuous integration with AI-powered code review catches bugs before they reach production. Here's how to build a GitHub Actions workflow that runs Claude-powered analysis on every pull request.&lt;/p&gt;

&lt;p&gt;Why AI Code Review in CI?&lt;/p&gt;

&lt;p&gt;Traditional CI catches syntax errors and test failures. AI code review catches:&lt;br&gt;
- Logic bugs&lt;br&gt;
- Security vulnerabilities&lt;br&gt;
- Performance issues&lt;br&gt;
- Code quality problems&lt;br&gt;
- Documentation gaps&lt;/p&gt;

&lt;p&gt;The GitHub Actions Workflow&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yaml&lt;br&gt;
# .github/workflows/ai-review.yml&lt;br&gt;
name: AI Code Review&lt;br&gt;
&lt;br&gt;
on:&lt;br&gt;
  pull_request:&lt;br&gt;
    types: [opened, synchronize, reopened]&lt;br&gt;
&lt;br&gt;
jobs:&lt;br&gt;
  ai-review:&lt;br&gt;
    runs-on: ubuntu-latest&lt;br&gt;
    permissions:&lt;br&gt;
      pull-requests: write&lt;br&gt;
      contents: read&lt;br&gt;
&lt;br&gt;
    steps:&lt;br&gt;
      - name: Checkout code&lt;br&gt;
        uses: actions/checkout@v4&lt;br&gt;
        with:&lt;br&gt;
          fetch-depth: 0&lt;br&gt;
&lt;br&gt;
      - name: Get PR diff&lt;br&gt;
        id: diff&lt;br&gt;
        run: |&lt;br&gt;
          git diff origin/${{ github.base_ref }}...HEAD &amp;gt; pr_diff.txt&lt;br&gt;
          echo "diff_size=$(wc -c &amp;lt; pr_diff.txt)" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;br&gt;
          echo "files_changed=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | wc -l)" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;br&gt;
&lt;br&gt;
      - name: Run AI Code Review&lt;br&gt;
        if: steps.diff.outputs.diff_size &amp;lt; 50000  # Skip if too large&lt;br&gt;
        env:&lt;br&gt;
          OFOX_API_KEY: ${{ secrets.OFOX_API_KEY }}&lt;br&gt;
        run: |&lt;br&gt;
          # Get PR context&lt;br&gt;
          PR_NUMBER=${{ github.event.pull_request.number }}&lt;br&gt;
          REPO=${{ github.repository }}&lt;br&gt;
&lt;br&gt;
          # Prepare review prompt&lt;br&gt;
          DIFF=$(cat pr_diff.txt)&lt;br&gt;
&lt;br&gt;
          # Call Claude via ofox.ai&lt;br&gt;
          RESPONSE=$(curl -s -X POST https://api.ofox.ai/v1/chat/completions \&lt;br&gt;
            -H "Authorization: Bearer $OFOX_API_KEY" \&lt;br&gt;
            -H "Content-Type: application/json" \&lt;br&gt;
            -d '{&lt;br&gt;
              "model": "claude-3-5-sonnet-20241022",&lt;br&gt;
              "messages": [{&lt;br&gt;
                "role": "user",&lt;br&gt;
                "content": "You are an expert code reviewer. Review this PR diff and provide feedback on bugs, security issues, performance problems, and code quality. Be concise but thorough.\n\n'"${DIFF}"'"&lt;br&gt;
              }],&lt;br&gt;
              "max_tokens": 2000,&lt;br&gt;
              "temperature": 0.3&lt;br&gt;
            }')&lt;br&gt;
&lt;br&gt;
          echo "$RESPONSE" | jq -r '.choices[0].message.content' &amp;gt; review_comment.md&lt;br&gt;
&lt;br&gt;
      - name: Post review comment&lt;br&gt;
        if: steps.diff.outputs.diff_size &amp;lt; 50000&lt;br&gt;
        uses: actions/github-script@v7&lt;br&gt;
        with:&lt;br&gt;
          script: |&lt;br&gt;
            github.rest.issues.createComment({&lt;br&gt;
              issue_number: context.payload.pull_request.number,&lt;br&gt;
              owner: context.repo.owner,&lt;br&gt;
              repo: context.repo.repo,&lt;br&gt;
              body: require('fs').readFileSync('review_comment.md', 'utf8')&lt;br&gt;
            })&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Filtering Large Diffs&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yaml&lt;br&gt;
- name: Check diff size&lt;br&gt;
  id: size&lt;br&gt;
  run: |&lt;br&gt;
    SIZE=$(wc -c &amp;lt; pr_diff.txt)&lt;br&gt;
    echo "size=$SIZE" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;br&gt;
    if [ $SIZE -gt 50000 ]; then&lt;br&gt;
      echo "::warning::PR diff too large ($SIZE bytes), skipping AI review"&lt;br&gt;
    fi&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Getting Your API Key&lt;/p&gt;

&lt;p&gt;Set up your ofox.ai API key as a GitHub Secret:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your repository → Settings → Secrets and variables → Actions&lt;/li&gt;
&lt;li&gt;Click New repository secret&lt;/li&gt;
&lt;li&gt;Name: OFOX_API_KEY&lt;/li&gt;
&lt;li&gt;Value: your key from ofox.ai&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 Get your ofox.ai API key&lt;/p&gt;

&lt;p&gt;Expanding to Full Analysis&lt;/p&gt;

&lt;p&gt;Beyond simple diff review, extend the workflow:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yaml&lt;br&gt;
# Add after the PR review step&lt;br&gt;
- name: Run security scan&lt;br&gt;
  uses: github/codeql-action/analyze@v3&lt;br&gt;
  with:&lt;br&gt;
    category: "/language:javascript"&lt;br&gt;
&lt;br&gt;
- name: AI Summary&lt;br&gt;
  run: |&lt;br&gt;
    # Generate PR summary with AI&lt;br&gt;
    SUMMARY=$(curl -s -X POST https://api.ofox.ai/v1/chat/completions \&lt;br&gt;
      -H "Authorization: Bearer $OFOX_API_KEY" \&lt;br&gt;
      -H "Content-Type: application/json" \&lt;br&gt;
      -d '{&lt;br&gt;
        "model": "claude-3-5-sonnet-20241022",&lt;br&gt;
        "messages": [{&lt;br&gt;
          "role": "user",&lt;br&gt;
          "content": "Summarize this PR in 3 bullet points. Focus on what changed and why.\n\n'"$(head -100 pr_diff.txt)"'"&lt;br&gt;
        }],&lt;br&gt;
        "max_tokens": 300&lt;br&gt;
      }')&lt;br&gt;
    echo "$SUMMARY" | jq -r '.choices[0].message.content'&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Best Practices&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rate limit awareness — Don't run on every push; use types: [opened, synchronize]&lt;/li&gt;
&lt;li&gt;Diff size limits — Skip reviews for massive changes (&amp;gt;50KB)&lt;/li&gt;
&lt;li&gt;Token budget — Set max_tokens to control costs&lt;/li&gt;
&lt;li&gt;Cache common prompts — Reuse system prompts across runs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Complete Example&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yaml&lt;br&gt;
name: AI PR Review + Security Scan&lt;br&gt;
&lt;br&gt;
on:&lt;br&gt;
  pull_request:&lt;br&gt;
    types: [opened, synchronize]&lt;br&gt;
&lt;br&gt;
jobs:&lt;br&gt;
  review:&lt;br&gt;
    runs-on: ubuntu-latest&lt;br&gt;
    permissions:&lt;br&gt;
      pull-requests: write&lt;br&gt;
      contents: read&lt;br&gt;
    steps:&lt;br&gt;
      - uses: actions/checkout@v4&lt;br&gt;
        with:&lt;br&gt;
          fetch-depth: 0&lt;br&gt;
&lt;br&gt;
      - name: AI Review&lt;br&gt;
        env:&lt;br&gt;
          OFOX_API_KEY: ${{ secrets.OFOX_API_KEY }}&lt;br&gt;
        run: |&lt;br&gt;
          DIFF=$(git diff origin/${{ github.base_ref }}...HEAD)&lt;br&gt;
&lt;br&gt;
          curl -s -X POST https://api.ofox.ai/v1/chat/completions \&lt;br&gt;
            -H "Authorization: Bearer $OFOX_API_KEY" \&lt;br&gt;
            -H "Content-Type: application/json" \&lt;br&gt;
            -d '{&lt;br&gt;
              "model": "claude-3-5-sonnet-20241022",&lt;br&gt;
              "messages": [{&lt;br&gt;
                "role": "user",&lt;br&gt;
                "content": "Review this PR. Flag: bugs, security, performance, quality. Format: ## Bugs\n## Security\n## Performance\n## Quality\n\n'"$DIFF"'"&lt;br&gt;
              }],&lt;br&gt;
              "max_tokens": 1500&lt;br&gt;
            }' | jq -r '.choices[0].message.content' &amp;gt; review.md&lt;br&gt;
&lt;br&gt;
      - name: Post comment&lt;br&gt;
        uses: actions/github-script@v7&lt;br&gt;
        with:&lt;br&gt;
          script: |&lt;br&gt;
            github.rest.issues.createComment({&lt;br&gt;
              issue_number: context.payload.pull_request.number,&lt;br&gt;
              owner: context.repo.owner,&lt;br&gt;
              repo: context.repo.repo,&lt;br&gt;
              body: require('fs').readFileSync('review.md', 'utf8')&lt;br&gt;
            })&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Get started with Claude-powered CI: ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: github-actions,ci-cd,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Price Your AI Development Services in 2026</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:14:57 +0000</pubDate>
      <link>https://dev.to/zny10289/how-to-price-your-ai-development-services-in-2026-3aap</link>
      <guid>https://dev.to/zny10289/how-to-price-your-ai-development-services-in-2026-3aap</guid>
      <description>&lt;p&gt;One of the most common questions developers ask when entering the AI services market: "What should I charge?" Pricing AI development is different from traditional software development. Here's a comprehensive guide based on real market data.&lt;/p&gt;

&lt;p&gt;Why AI Development Commands Premium Rates&lt;/p&gt;

&lt;p&gt;AI development isn't just coding — it involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prompt engineering expertise — Crafting effective prompts is a specialized skill&lt;/li&gt;
&lt;li&gt;Model selection and optimization — Choosing the right model for the right task&lt;/li&gt;
&lt;li&gt;API integration complexity — Integrating multiple AI services reliably&lt;/li&gt;
&lt;li&gt;Output validation — Ensuring AI outputs are correct and safe&lt;/li&gt;
&lt;li&gt;Cost optimization — Managing token usage efficiently&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These skills justify higher rates than traditional development.&lt;/p&gt;

&lt;p&gt;Market Data: AI Developer Hourly Rates (2026)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Entry&lt;/th&gt;
&lt;th&gt;Mid&lt;/th&gt;
&lt;th&gt;Senior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Integration Developer&lt;/td&gt;
&lt;td&gt;$80/hr&lt;/td&gt;
&lt;td&gt;$150/hr&lt;/td&gt;
&lt;td&gt;$250+/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Fine-tuning Specialist&lt;/td&gt;
&lt;td&gt;$100/hr&lt;/td&gt;
&lt;td&gt;$200/hr&lt;/td&gt;
&lt;td&gt;$350+/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Product Engineer&lt;/td&gt;
&lt;td&gt;$120/hr&lt;/td&gt;
&lt;td&gt;$200/hr&lt;/td&gt;
&lt;td&gt;$300+/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Security Auditor&lt;/td&gt;
&lt;td&gt;$150/hr&lt;/td&gt;
&lt;td&gt;$250/hr&lt;/td&gt;
&lt;td&gt;$400+/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pricing Models for AI Development&lt;/p&gt;

&lt;p&gt;Model 1: Hourly Billing&lt;/p&gt;

&lt;p&gt;Best for: Complex, undefined-scope projects&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# Example hourly engagement&lt;br&gt;
hourly_rate = 150  # USD&lt;br&gt;
hours_estimated = 40&lt;br&gt;
project_value = hourly_rate * hours_estimated  # $6,000&lt;br&gt;
&lt;br&gt;
# Add AI API costs (pass-through)&lt;br&gt;
api_costs = estimated_tokens * cost_per_token&lt;br&gt;
total = project_value + api_costs&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pros: Covers uncertainty&lt;br&gt;
Cons: Client may resist open-ended billing&lt;/p&gt;

&lt;p&gt;Model 2: Fixed Project Price&lt;/p&gt;

&lt;p&gt;Best for: Well-defined, repeatable deliverables&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# AI chatbot development (fixed price)&lt;br&gt;
base_price = 5000  # Core chatbot&lt;br&gt;
per_feature_multiplier = 1.3  # Each major feature adds 30%&lt;br&gt;
features = ["ollama", "vector-search", "web-scraping"]&lt;br&gt;
&lt;br&gt;
price = base_price&lt;br&gt;
for f in features:&lt;br&gt;
    price *= per_feature_multiplier&lt;br&gt;
&lt;br&gt;
print(f"Project price: ${price:.2f}")  # ~$11,000&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pros: Client certainty, potentially higher value&lt;br&gt;
Cons: Scope creep risk&lt;/p&gt;

&lt;p&gt;Model 3: Value-Based Pricing&lt;/p&gt;

&lt;p&gt;Best for: High-impact projects with measurable ROI&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# Example: AI that saves 20 hours/week&lt;br&gt;
hours_saved_per_week = 20&lt;br&gt;
hourly_value = 100&lt;br&gt;
weekly_value = hours_saved_per_week * hourly_value  # $2,000/week&lt;br&gt;
&lt;br&gt;
# Price at 30% of annual value&lt;br&gt;
annual_value = weekly_value * 52  # $104,000&lt;br&gt;
project_price = annual_value * 0.30  # $31,200&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pros: Captures true value&lt;br&gt;
Cons: Harder to justify without data&lt;/p&gt;

&lt;p&gt;Model 4: Subscription / Retainer&lt;/p&gt;

&lt;p&gt;Best for: Ongoing AI development needs&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# Monthly retainer model&lt;br&gt;
base_hours = 20&lt;br&gt;
hourly_rate = 125  # Discounted from $150&lt;br&gt;
monthly = base_hours * hourly_rate  # $2,500/month&lt;br&gt;
&lt;br&gt;
# AI API costs passed through&lt;br&gt;
estimated_api_cost = 150&lt;br&gt;
total = monthly + estimated_api_cost  # $2,650/month&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pros: Predictable revenue, deeper client relationship&lt;br&gt;
Cons: Requires consistent value delivery&lt;/p&gt;

&lt;p&gt;Real Project Examples&lt;/p&gt;

&lt;p&gt;Example 1: RAG System for Law Firm&lt;/p&gt;

&lt;p&gt;Project: Build a Retrieval-Augmented Generation system for case law research&lt;/p&gt;

&lt;p&gt;Complexity: High (vector database, document parsing, citation verification)&lt;br&gt;
Timeline: 3 weeks&lt;br&gt;
Rate: $175/hr&lt;br&gt;
Total: ~$21,000&lt;/p&gt;

&lt;p&gt;Key pricing factors:&lt;br&gt;
- Specialized domain knowledge (legal)&lt;br&gt;
- High stakes = premium&lt;br&gt;
- Ongoing maintenance opportunity&lt;/p&gt;

&lt;p&gt;Example 2: AI Code Review Bot&lt;/p&gt;

&lt;p&gt;Project: GitHub integration that reviews pull requests using Claude&lt;/p&gt;

&lt;p&gt;Complexity: Medium&lt;br&gt;
Timeline: 1 week&lt;br&gt;
Fixed price: $3,500&lt;/p&gt;

&lt;p&gt;Key pricing factors:&lt;br&gt;
- Clear scope (GitHub PR → review comment)&lt;br&gt;
- Recurring usage (leads to subscription)&lt;br&gt;
- Developer audience (faster adoption)&lt;/p&gt;

&lt;p&gt;Example 3: Customer Support AI Agent&lt;/p&gt;

&lt;p&gt;Project: AI agent that handles tier-1 support tickets&lt;/p&gt;

&lt;p&gt;Complexity: High (multi-turn conversation, tool use, escalation)&lt;br&gt;
Timeline: 4 weeks&lt;br&gt;
Value-based: $40,000&lt;/p&gt;

&lt;p&gt;Key pricing factors:&lt;br&gt;
- Measurable ROI (80% ticket reduction)&lt;br&gt;
- Enterprise client&lt;br&gt;
- Integration complexity&lt;/p&gt;

&lt;p&gt;AI API Cost Pass-Through&lt;/p&gt;

&lt;p&gt;Always account for API costs in your pricing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
def calculate_ai_cost(monthly_users, avg_tokens_per_turn, turns_per_conversation):&lt;br&gt;
    # ofox.ai pricing example&lt;br&gt;
    cost_per_1k_tokens = 0.003  # Claude 3.5 Sonnet input rate&lt;br&gt;
&lt;br&gt;
    total_tokens = avg_tokens_per_turn * turns_per_conversation * monthly_users&lt;br&gt;
    monthly_cost = (total_tokens / 1000) * cost_per_1k_tokens&lt;br&gt;
&lt;br&gt;
    return monthly_cost&lt;br&gt;
&lt;br&gt;
# Example: 1000 users, 500 tokens/turn, 5 turns&lt;br&gt;
users = 1000&lt;br&gt;
tokens_per_turn = 500&lt;br&gt;
turns = 5&lt;br&gt;
monthly_cost = calculate_ai_cost(users, tokens_per_turn, turns)&lt;br&gt;
print(f"API cost: ${monthly_cost:.2f}/month")  # $7.50/month at input-token rates&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Negotiating AI Development Contracts&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Separate AI costs from your labor — Makes pricing transparent&lt;/li&gt;
&lt;li&gt;Build in revision limits — AI outputs may need more iteration&lt;/li&gt;
&lt;li&gt;Define success metrics — "80% ticket resolution" not "good AI chatbot"&lt;/li&gt;
&lt;li&gt;Include opt-out clauses — AI technology evolves fast&lt;/li&gt;
&lt;li&gt;Price for uncertainty — Add a 20-30% buffer for AI unpredictability (see the sketch below)&lt;/li&gt;
&lt;/ol&gt;
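
&lt;p&gt;Point 5 in concrete terms, as a minimal sketch (the 25% figure is one point inside the 20-30% range above):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
def quote_with_buffer(base_estimate: float, buffer: float = 0.25) -&amp;gt; float:&lt;br&gt;
    """Add an uncertainty buffer for AI unpredictability (20-30% is typical)."""&lt;br&gt;
    return base_estimate * (1 + buffer)&lt;br&gt;
&lt;br&gt;
print(quote_with_buffer(6000))  # 7500.0: a $7,500 quote on a $6,000 base estimate&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;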

&lt;p&gt;Getting Started with AI Development&lt;/p&gt;

&lt;p&gt;Whether you're building AI integrations for clients or powering your own SaaS tools, the foundation is reliable API access. ofox.ai provides OpenAI-compatible Claude API with competitive pricing — perfect for production AI applications.&lt;/p&gt;

&lt;p&gt;👉 Explore ofox.ai for your AI development needs&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: freelancing,programming,ai,developer,career&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>AI API Error Handling and Reliability: Production Best Practices</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:09:46 +0000</pubDate>
      <link>https://dev.to/zny10289/ai-api-error-handling-and-reliability-production-best-practices-42om</link>
      <guid>https://dev.to/zny10289/ai-api-error-handling-and-reliability-production-best-practices-42om</guid>
      <description>&lt;p&gt;Production AI applications fail in ways traditional software doesn't. Models go down, tokens run out, responses hallucinate, and rate limits hit at the worst moments. Here's how to build reliable AI-powered systems.&lt;/p&gt;

&lt;p&gt;The AI Reliability Problem&lt;/p&gt;

&lt;p&gt;Traditional APIs return consistent responses or clear errors. AI APIs introduce new failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Model outages — The provider's model goes down&lt;/li&gt;
&lt;li&gt;Rate limits — You've exhausted your quota mid-request&lt;/li&gt;
&lt;li&gt;Token limits — Your prompt exceeds context window&lt;/li&gt;
&lt;li&gt;Hallucinations — Model returns plausible but wrong answers&lt;/li&gt;
&lt;li&gt;Timeout — Request takes too long and hangs&lt;/li&gt;
&lt;li&gt;Invalid JSON — Model returns malformed structured data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Retry Logic with Exponential Backoff&lt;/p&gt;

&lt;p&gt;&lt;code&gt;typescript&lt;br&gt;
async function withRetry&amp;lt;T&amp;gt;(&lt;br&gt;
  fn: () =&amp;gt; Promise&amp;lt;T&amp;gt;,&lt;br&gt;
  options: {&lt;br&gt;
    maxRetries?: number;&lt;br&gt;
    baseDelay?: number;&lt;br&gt;
    maxDelay?: number;&lt;br&gt;
    onRetry?: (attempt: number, error: Error) =&amp;gt; void;&lt;br&gt;
  } = {}&lt;br&gt;
): Promise&amp;lt;T&amp;gt; {&lt;br&gt;
  const {&lt;br&gt;
    maxRetries = 3,&lt;br&gt;
    baseDelay = 1000,&lt;br&gt;
    maxDelay = 30000,&lt;br&gt;
    onRetry&lt;br&gt;
  } = options;&lt;br&gt;
&lt;br&gt;
  for (let attempt = 1; attempt &amp;lt;= maxRetries + 1; attempt++) {&lt;br&gt;
    try {&lt;br&gt;
      return await fn();&lt;br&gt;
    } catch (error) {&lt;br&gt;
      const isRetryable = isRetryableError(error);&lt;br&gt;
      const isLastAttempt = attempt &amp;gt; maxRetries;&lt;br&gt;
&lt;br&gt;
      if (isLastAttempt || !isRetryable) {&lt;br&gt;
        throw error;&lt;br&gt;
      }&lt;br&gt;
&lt;br&gt;
      const delay = Math.min(baseDelay * Math.pow(2, attempt - 1), maxDelay);&lt;br&gt;
      onRetry?.(attempt, error as Error);&lt;br&gt;
      await sleep(delay);&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
  throw new Error('Unreachable');&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
function isRetryableError(error: unknown): boolean {&lt;br&gt;
  if (error instanceof AIAPIError) {&lt;br&gt;
    // 429 = rate limit, 500 = server error, 503 = service unavailable&lt;br&gt;
    return [429, 500, 502, 503].includes(error.status);&lt;br&gt;
  }&lt;br&gt;
  // Network timeout&lt;br&gt;
  return (error as NodeJS.ErrnoException).code === 'ETIMEDOUT';&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Handling Rate Limits Gracefully&lt;/p&gt;

&lt;p&gt;&lt;code&gt;typescript&lt;br&gt;
async function chatWithRateLimit(&lt;br&gt;
  client: ClaudeClient,&lt;br&gt;
  messages: ChatMessage[],&lt;br&gt;
  options: { maxWait?: number } = {}&lt;br&gt;
): Promise&amp;lt;ChatResponse&amp;gt; {&lt;br&gt;
  const { maxWait = 60000 } = options;&lt;br&gt;
  const startTime = Date.now();&lt;br&gt;
&lt;br&gt;
  while (true) {&lt;br&gt;
    try {&lt;br&gt;
      return await client.chat(messages);&lt;br&gt;
    } catch (error) {&lt;br&gt;
      if (!(error instanceof AIAPIError) || error.status !== 429) throw error;&lt;br&gt;
&lt;br&gt;
      const retryAfter = error.retryAfter || 1000;&lt;br&gt;
      const elapsed = Date.now() - startTime;&lt;br&gt;
&lt;br&gt;
      if (elapsed + retryAfter &amp;gt; maxWait) {&lt;br&gt;
        throw new Error('Rate limit exceeded, max wait time reached');&lt;br&gt;
      }&lt;br&gt;
&lt;br&gt;
      console.log(`Rate limited, waiting ${retryAfter}ms...`);&lt;br&gt;
      await sleep(retryAfter);&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Structured Output Validation&lt;/p&gt;

&lt;p&gt;Models often return invalid JSON. Always validate:&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
import { z } from 'zod';&lt;/p&gt;

&lt;p&gt;const CodeReviewSchema = z.object({&lt;br&gt;
bugs: z.array(z.object({&lt;br&gt;
line: z.number(),&lt;br&gt;
severity: z.enum(['low', 'medium', 'high']),&lt;br&gt;
description: z.string()&lt;br&gt;
})),&lt;br&gt;
suggestions: z.array(z.string()),&lt;br&gt;
score: z.number().min(0).max(10)&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;async function reviewCode(code: string): Promise&amp;lt;z.infer&amp;lt;typeof CodeReviewSchema&amp;gt;&amp;gt; {&lt;br&gt;
const response = await client.chat([&lt;br&gt;
{ role: 'user', content: `Review this code:\n\n${code}\n\nReturn valid JSON.` }&lt;br&gt;
]);&lt;/p&gt;

&lt;p&gt;try {&lt;br&gt;
const parsed = JSON.parse(response.choices[0].message.content);&lt;br&gt;
return CodeReviewSchema.parse(parsed);&lt;br&gt;
} catch {&lt;br&gt;
// Fallback: retry with stricter prompting&lt;br&gt;
return reviewCodeWithFallback(code);&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Circuit Breaker Pattern&lt;/p&gt;

&lt;p&gt;Prevent cascading failures when AI API is degraded:&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
class CircuitBreaker {&lt;br&gt;
private failures = 0;&lt;br&gt;
private lastFailure = 0;&lt;br&gt;
private state: 'closed' | 'open' | 'half-open' = 'closed';&lt;/p&gt;

&lt;p&gt;constructor(&lt;br&gt;
private readonly threshold: number = 5,&lt;br&gt;
private readonly timeout: number = 60000&lt;br&gt;
) {}&lt;/p&gt;

&lt;p&gt;async execute&amp;lt;T&amp;gt;(fn: () =&amp;gt; Promise&amp;lt;T&amp;gt;): Promise&amp;lt;T&amp;gt; {&lt;br&gt;
if (this.state === 'open') {&lt;br&gt;
if (Date.now() - this.lastFailure &amp;gt; this.timeout) {&lt;br&gt;
this.state = 'half-open';&lt;br&gt;
} else {&lt;br&gt;
throw new Error('Circuit breaker is open');&lt;br&gt;
}&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;try {&lt;br&gt;
const result = await fn();&lt;br&gt;
this.onSuccess();&lt;br&gt;
return result;&lt;br&gt;
} catch (error) {&lt;br&gt;
this.onFailure();&lt;br&gt;
throw error;&lt;br&gt;
}&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;private onSuccess() {&lt;br&gt;
this.failures = 0;&lt;br&gt;
this.state = 'closed';&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;private onFailure() {&lt;br&gt;
this.failures++;&lt;br&gt;
this.lastFailure = Date.now();&lt;br&gt;
if (this.failures &amp;gt;= this.threshold) {&lt;br&gt;
this.state = 'open';&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Usage&lt;br&gt;
const breaker = new CircuitBreaker(5, 60000);&lt;br&gt;
const result = await breaker.execute(() =&amp;gt; client.chat(messages));&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Timeout Strategy&lt;/p&gt;

&lt;p&gt;Set both connection and request timeouts:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;typescript&lt;br&gt;
const response = await fetch(url, {&lt;br&gt;
method: 'POST',&lt;br&gt;
headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },&lt;br&gt;
body: JSON.stringify(payload),&lt;br&gt;
signal: AbortSignal.timeout(120000) // 2 minute timeout&lt;br&gt;
});&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Cost Control&lt;/p&gt;

&lt;p&gt;Prevent runaway costs with token budgets:&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
class TokenBudget {&lt;br&gt;
private spent = 0;&lt;/p&gt;

&lt;p&gt;constructor(&lt;br&gt;
private readonly maxBudget: number,&lt;br&gt;
private readonly costPerToken: number&lt;br&gt;
) {}&lt;/p&gt;

&lt;p&gt;async executeWithBudget&amp;lt;T&amp;gt;(fn: () =&amp;gt; Promise&amp;lt;T&amp;gt;): Promise&amp;lt;T&amp;gt; {&lt;br&gt;
const estimatedCost = this.estimateCost(fn);&lt;/p&gt;

&lt;p&gt;if (this.spent + estimatedCost &amp;gt; this.maxBudget) {&lt;br&gt;
throw new Error(`Budget exceeded. Spent: ${this.spent}, Max: ${this.maxBudget}`);&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;const result = await fn();&lt;br&gt;
this.spent += estimatedCost;&lt;br&gt;
return result;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;private estimateCost&amp;lt;T&amp;gt;(fn: () =&amp;gt; Promise&amp;lt;T&amp;gt;): number {&lt;br&gt;
// Rough estimate based on input size&lt;br&gt;
return 0; // Would need implementation&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;getSpent(): number {&lt;br&gt;
return this.spent;&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Building Reliable AI Applications&lt;/p&gt;

&lt;p&gt;The key insight: AI APIs require defensive programming at a level traditional APIs don't. Layer retry logic, circuit breakers, validation, and cost controls to build systems that degrade gracefully rather than fail catastrophically.&lt;/p&gt;

&lt;p&gt;Get started with reliable AI API access: ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: api,error-handling,programming,developer,reliability&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>TypeScript AI SDK: Building Type-Safe AI-Powered Applications</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 18:04:35 +0000</pubDate>
      <link>https://dev.to/zny10289/typescript-ai-sdk-building-type-safe-ai-powered-applications-5a8a</link>
      <guid>https://dev.to/zny10289/typescript-ai-sdk-building-type-safe-ai-powered-applications-5a8a</guid>
      <description>&lt;p&gt;TypeScript's type system transforms AI API integration from guesswork into reliable, maintainable code. When you're building production AI applications, TypeScript catches errors at compile time, not runtime. Here's how to build type-safe AI integrations.&lt;/p&gt;

&lt;p&gt;Why TypeScript for AI APIs?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Catch errors early — Invalid API responses caught at compile time&lt;/li&gt;
&lt;li&gt;IDE autocompletion — Know exactly what the API returns&lt;/li&gt;
&lt;li&gt;Refactoring confidence — Change types and the compiler finds all affected code&lt;/li&gt;
&lt;li&gt;Documentation — Types serve as living documentation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Type-Safe API Client&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
import { z } from 'zod';&lt;/p&gt;

&lt;p&gt;// Define response schemas&lt;br&gt;
const MessageSchema = z.object({&lt;br&gt;
role: z.enum(['system', 'user', 'assistant']),&lt;br&gt;
content: z.string()&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;const ChoiceSchema = z.object({&lt;br&gt;
message: MessageSchema,&lt;br&gt;
finish_reason: z.string(),&lt;br&gt;
index: z.number()&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;const UsageSchema = z.object({&lt;br&gt;
prompt_tokens: z.number(),&lt;br&gt;
completion_tokens: z.number(),&lt;br&gt;
total_tokens: z.number()&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;const ChatCompletionResponseSchema = z.object({&lt;br&gt;
id: z.string(),&lt;br&gt;
object: z.string(),&lt;br&gt;
created: z.number(),&lt;br&gt;
model: z.string(),&lt;br&gt;
choices: z.array(ChoiceSchema),&lt;br&gt;
usage: UsageSchema&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;type ChatMessage = z.infer&amp;lt;typeof MessageSchema&amp;gt;;&lt;br&gt;
type ChatResponse = z.infer&amp;lt;typeof ChatCompletionResponseSchema&amp;gt;;&lt;/p&gt;

&lt;p&gt;// Type-safe client&lt;br&gt;
class ClaudeClient {&lt;br&gt;
constructor(&lt;br&gt;
private readonly apiKey: string,&lt;br&gt;
private readonly baseURL: string = '&lt;a href="https://api.ofox.ai/v1" rel="noopener noreferrer"&gt;https://api.ofox.ai/v1&lt;/a&gt;'&lt;br&gt;
) {}&lt;/p&gt;

&lt;p&gt;async chat(messages: ChatMessage[]): Promise&amp;lt;ChatResponse&amp;gt; {&lt;br&gt;
const response = await fetch(`${this.baseURL}/chat/completions`, {&lt;br&gt;
method: 'POST',&lt;br&gt;
headers: {&lt;br&gt;
'Authorization': `Bearer ${this.apiKey}`,&lt;br&gt;
'Content-Type': 'application/json'&lt;br&gt;
},&lt;br&gt;
body: JSON.stringify({&lt;br&gt;
model: 'claude-3-5-sonnet-20241022',&lt;br&gt;
messages&lt;br&gt;
})&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;if (!response.ok) {&lt;br&gt;
throw new Error(`API error: ${response.status}`);&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;const data = await response.json();&lt;/p&gt;

&lt;p&gt;// Zod validates at runtime AND gives you typed result&lt;br&gt;
return ChatCompletionResponseSchema.parse(data);&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Streaming with Types&lt;/p&gt;

&lt;p&gt;&lt;code&gt;typescript&lt;br&gt;
// Takes the key and base URL directly, since ClaudeClient keeps them private&lt;br&gt;
async function* streamChat(&lt;br&gt;
  apiKey: string,&lt;br&gt;
  messages: ChatMessage[],&lt;br&gt;
  baseURL: string = 'https://api.ofox.ai/v1'&lt;br&gt;
): AsyncGenerator&amp;lt;string&amp;gt; {&lt;br&gt;
  const response = await fetch(`${baseURL}/chat/completions`, {&lt;br&gt;
    method: 'POST',&lt;br&gt;
    headers: {&lt;br&gt;
      'Authorization': `Bearer ${apiKey}`,&lt;br&gt;
      'Content-Type': 'application/json'&lt;br&gt;
    },&lt;br&gt;
    body: JSON.stringify({&lt;br&gt;
      model: 'claude-3-5-sonnet-20241022',&lt;br&gt;
      messages,&lt;br&gt;
      stream: true&lt;br&gt;
    })&lt;br&gt;
  });&lt;br&gt;
&lt;br&gt;
  if (!response.body) throw new Error('No response body');&lt;br&gt;
&lt;br&gt;
  const decoder = new TextDecoder();&lt;br&gt;
&lt;br&gt;
  for await (const chunk of response.body) {&lt;br&gt;
    const lines = decoder.decode(chunk).split('\n');&lt;br&gt;
    for (const line of lines) {&lt;br&gt;
      if (line.startsWith('data: ') &amp;amp;&amp;amp; line.slice(6).trim() !== '[DONE]') {&lt;br&gt;
        const data = JSON.parse(line.slice(6));&lt;br&gt;
        if (data.choices[0]?.delta?.content) {&lt;br&gt;
          yield data.choices[0].delta.content;&lt;br&gt;
        }&lt;br&gt;
      }&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
// Usage with full type safety&lt;br&gt;
for await (const token of streamChat(apiKey, messages)) {&lt;br&gt;
  process.stdout.write(token);&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Building a Type-Safe Agent Framework&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
// Tool definitions with full type safety&lt;br&gt;
const ToolSchema = z.object({&lt;br&gt;
name: z.string(),&lt;br&gt;
description: z.string(),&lt;br&gt;
parameters: z.record(z.string(), z.unknown())&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;type Tool = z.infer&amp;lt;typeof ToolSchema&amp;gt;;&lt;/p&gt;

&lt;p&gt;interface ToolResult&amp;lt;T = unknown&amp;gt; {&lt;br&gt;
success: boolean;&lt;br&gt;
data?: T;&lt;br&gt;
error?: string;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;class TypedAgent {&lt;br&gt;
constructor(&lt;br&gt;
private client: ClaudeClient,&lt;br&gt;
private tools: Tool[]&lt;br&gt;
) {}&lt;/p&gt;

&lt;p&gt;async execute(systemPrompt: string, task: string): Promise&amp;lt;string&amp;gt; {&lt;br&gt;
const messages: ChatMessage[] = [&lt;br&gt;
{ role: 'system', content: systemPrompt },&lt;br&gt;
{ role: 'user', content: task }&lt;br&gt;
];&lt;/p&gt;

&lt;p&gt;while (true) {&lt;br&gt;
const response = await this.client.chat(messages);&lt;br&gt;
const content = response.choices[0].message.content;&lt;/p&gt;

&lt;p&gt;// Check if agent wants to use a tool&lt;br&gt;
const toolCall = this.parseToolCall(content);&lt;br&gt;
if (toolCall) {&lt;br&gt;
const result = await this.executeTool(toolCall);&lt;br&gt;
messages.push({ role: 'assistant', content });&lt;br&gt;
messages.push({&lt;br&gt;
role: 'user',&lt;br&gt;
content: `Tool result: ${JSON.stringify(result)}`&lt;br&gt;
});&lt;br&gt;
} else {&lt;br&gt;
return content;&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;private parseToolCall(content: string): { name: string; args: unknown } | null {&lt;br&gt;
// Parse a &amp;lt;tool_call&amp;gt;...&amp;lt;/tool_call&amp;gt; block from the response&lt;br&gt;
const match = content.match(/&amp;lt;tool_call&amp;gt;([\s\S]*?)&amp;lt;\/tool_call&amp;gt;/);&lt;br&gt;
if (!match) return null;&lt;br&gt;
return JSON.parse(match[1]);&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;private async executeTool(call: { name: string; args: unknown }): Promise&amp;lt;ToolResult&amp;lt;unknown&amp;gt;&amp;gt; {&lt;br&gt;
const tool = this.tools.find(t =&amp;gt; t.name === call.name);&lt;br&gt;
if (!tool) return { success: false, error: `Unknown tool: ${call.name}` };&lt;/p&gt;

&lt;p&gt;try {&lt;br&gt;
// Tool execution would be implemented here&lt;br&gt;
return { success: true, data: null };&lt;br&gt;
} catch (e) {&lt;br&gt;
return { success: false, error: (e as Error).message };&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Validation with Zod&lt;/p&gt;

&lt;p&gt;Zod validates API responses at runtime:&lt;/p&gt;

&lt;p&gt;`typescript&lt;br&gt;
import { z } from 'zod';&lt;/p&gt;

&lt;p&gt;// Define the exact shape you expect&lt;br&gt;
const APIResponse = z.object({&lt;br&gt;
model: z.string(),&lt;br&gt;
id: z.string(),&lt;br&gt;
choices: z.array(z.object({&lt;br&gt;
message: z.object({&lt;br&gt;
role: z.string(),&lt;br&gt;
content: z.string()&lt;br&gt;
}),&lt;br&gt;
finish_reason: z.string()&lt;br&gt;
}))&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;async function safeChat(client: ClaudeClient, messages: ChatMessage[]) {&lt;br&gt;
try {&lt;br&gt;
const raw = await client.chatRaw(messages); // assumed unvalidated variant of chat() that returns unknown&lt;br&gt;
const validated = APIResponse.parse(raw); // Throws if shape doesn't match&lt;br&gt;
return validated;&lt;br&gt;
} catch (error) {&lt;br&gt;
if (error instanceof z.ZodError) {&lt;br&gt;
console.error('API response validation failed:', error.issues);&lt;br&gt;
throw new Error('Unexpected API response format');&lt;br&gt;
}&lt;br&gt;
throw error;&lt;br&gt;
}&lt;br&gt;
}&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build type-safe AI applications with TypeScript and ofox.ai — their OpenAI-compatible API integrates seamlessly with TypeScript's type system. Sign up and start building.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: typescript,nodejs,ai,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Python AI API Development: Complete FastAPI + Claude Integration Guide</title>
      <dc:creator>ZNY</dc:creator>
      <pubDate>Fri, 15 May 2026 17:59:23 +0000</pubDate>
      <link>https://dev.to/zny10289/python-ai-api-development-complete-fastapi-claude-integration-guide-3kla</link>
      <guid>https://dev.to/zny10289/python-ai-api-development-complete-fastapi-claude-integration-guide-3kla</guid>
      <description>&lt;p&gt;FastAPI is the best Python framework for building AI-powered APIs. Combined with Claude via ofox.ai, you can create production-ready AI endpoints in minutes. Here's the complete guide.&lt;/p&gt;

&lt;p&gt;Why FastAPI for AI APIs?&lt;/p&gt;

&lt;p&gt;- Async native — Handle concurrent AI requests efficiently&lt;br&gt;
- Automatic validation — Pydantic models validate inputs/outputs&lt;br&gt;
- OpenAPI docs — Built-in interactive API documentation&lt;br&gt;
- Type hints — Full IDE support and error checking&lt;br&gt;
- Production-ready — Used by major companies worldwide&lt;/p&gt;

&lt;p&gt;Project Setup&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bash&lt;br&gt;
mkdir claude-api-service&lt;br&gt;
cd claude-api-service&lt;br&gt;
python3 -m venv venv&lt;br&gt;
source venv/bin/activate&lt;br&gt;
pip install fastapi uvicorn httpx pydantic&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Basic FastAPI + Claude Setup&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
# main.py&lt;br&gt;
import os&lt;br&gt;
from fastapi import FastAPI, HTTPException&lt;br&gt;
from pydantic import BaseModel&lt;br&gt;
from typing import List, Optional&lt;br&gt;
import httpx&lt;br&gt;
&lt;br&gt;
app = FastAPI(title="Claude API Service")&lt;br&gt;
&lt;br&gt;
class Message(BaseModel):&lt;br&gt;
    role: str&lt;br&gt;
    content: str&lt;br&gt;
&lt;br&gt;
class ChatRequest(BaseModel):&lt;br&gt;
    model: str = "claude-3-5-sonnet-20241022"&lt;br&gt;
    messages: List[Message]&lt;br&gt;
    max_tokens: Optional[int] = 1024&lt;br&gt;
    temperature: Optional[float] = 0.7&lt;br&gt;
&lt;br&gt;
class ChatResponse(BaseModel):&lt;br&gt;
    content: str&lt;br&gt;
    model: str&lt;br&gt;
    tokens_used: int&lt;br&gt;
&lt;br&gt;
@app.post("/chat", response_model=ChatResponse)&lt;br&gt;
async def chat(request: ChatRequest):&lt;br&gt;
    async with httpx.AsyncClient() as client:&lt;br&gt;
        response = await client.post(&lt;br&gt;
            "https://api.ofox.ai/v1/chat/completions",&lt;br&gt;
            headers={&lt;br&gt;
                "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",&lt;br&gt;
                "Content-Type": "application/json"&lt;br&gt;
            },&lt;br&gt;
            json={&lt;br&gt;
                "model": request.model,&lt;br&gt;
                "messages": [m.model_dump() for m in request.messages],&lt;br&gt;
                "max_tokens": request.max_tokens,&lt;br&gt;
                "temperature": request.temperature&lt;br&gt;
            },&lt;br&gt;
            timeout=60.0&lt;br&gt;
        )&lt;br&gt;
&lt;br&gt;
    if response.status_code != 200:&lt;br&gt;
        raise HTTPException(status_code=response.status_code, detail=response.text)&lt;br&gt;
&lt;br&gt;
    data = response.json()&lt;br&gt;
    return ChatResponse(&lt;br&gt;
        content=data["choices"][0]["message"]["content"],&lt;br&gt;
        model=data["model"],&lt;br&gt;
        tokens_used=data["usage"]["total_tokens"]&lt;br&gt;
    )&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Running the Server&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bash&lt;br&gt;
export OFOX_API_KEY="your-key-here"&lt;br&gt;
uvicorn main:app --reload --port 8000&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;API docs available at &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt;&lt;/p&gt;
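
&lt;p&gt;A quick client-side check once the server is running (a minimal sketch; assumes the service above on port 8000):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
import httpx&lt;br&gt;
&lt;br&gt;
resp = httpx.post(&lt;br&gt;
    "http://localhost:8000/chat",&lt;br&gt;
    json={"messages": [{"role": "user", "content": "Hello!"}]},&lt;br&gt;
    timeout=60.0,&lt;br&gt;
)&lt;br&gt;
print(resp.json()["content"])&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;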

&lt;p&gt;Adding Streaming Responses&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
from fastapi.responses import StreamingResponse&lt;br&gt;
&lt;br&gt;
@app.post("/chat/stream")&lt;br&gt;
async def chat_stream(request: ChatRequest):&lt;br&gt;
    async def generate():&lt;br&gt;
        async with httpx.AsyncClient() as client:&lt;br&gt;
            async with client.stream(&lt;br&gt;
                "POST",&lt;br&gt;
                "https://api.ofox.ai/v1/chat/completions",&lt;br&gt;
                headers={&lt;br&gt;
                    "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",&lt;br&gt;
                    "Content-Type": "application/json"&lt;br&gt;
                },&lt;br&gt;
                json={&lt;br&gt;
                    "model": request.model,&lt;br&gt;
                    "messages": [m.model_dump() for m in request.messages],&lt;br&gt;
                    "max_tokens": request.max_tokens,&lt;br&gt;
                    "stream": True&lt;br&gt;
                },&lt;br&gt;
                timeout=60.0&lt;br&gt;
            ) as response:&lt;br&gt;
                async for chunk in response.aiter_lines():&lt;br&gt;
                    if chunk.startswith("data: "):&lt;br&gt;
                        data = chunk[6:]&lt;br&gt;
                        if data == "[DONE]":&lt;br&gt;
                            break&lt;br&gt;
                        # Re-emit as a server-sent event&lt;br&gt;
                        yield f"data: {data}\n\n"&lt;br&gt;
&lt;br&gt;
    return StreamingResponse(generate(), media_type="text/event-stream")&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Adding Authentication&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
from fastapi import Depends, HTTPException, status&lt;br&gt;
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials&lt;br&gt;
&lt;br&gt;
security = HTTPBearer()&lt;br&gt;
&lt;br&gt;
@app.post("/chat", response_model=ChatResponse)&lt;br&gt;
async def chat(&lt;br&gt;
    request: ChatRequest,&lt;br&gt;
    credentials: HTTPAuthorizationCredentials = Depends(security)&lt;br&gt;
):&lt;br&gt;
    # verify_api_key is your own lookup, e.g. against a database&lt;br&gt;
    if credentials.scheme != "Bearer" or not verify_api_key(credentials.credentials):&lt;br&gt;
        raise HTTPException(&lt;br&gt;
            status_code=status.HTTP_401_UNAUTHORIZED,&lt;br&gt;
            detail="Invalid API key"&lt;br&gt;
        )&lt;br&gt;
&lt;br&gt;
    # ... rest of chat logic&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Production Deployment&lt;/p&gt;

&lt;p&gt;For production, use:&lt;br&gt;
- Gunicorn + Uvicorn workers for concurrency&lt;br&gt;
- Redis for request caching&lt;br&gt;
- Rate limiting with middleware&lt;br&gt;
- Docker for containerization&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dockerfile&lt;br&gt;
FROM python:3.11-slim&lt;br&gt;
WORKDIR /app&lt;br&gt;
COPY requirements.txt .&lt;br&gt;
RUN pip install --no-cache-dir -r requirements.txt&lt;br&gt;
COPY . .&lt;br&gt;
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
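
&lt;p&gt;For the rate-limiting piece, here's a minimal in-memory sketch (per-IP sliding window; the limits are illustrative, and a shared store such as Redis is needed once you run multiple workers):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
import time&lt;br&gt;
from collections import defaultdict&lt;br&gt;
from fastapi import Request&lt;br&gt;
from fastapi.responses import JSONResponse&lt;br&gt;
&lt;br&gt;
WINDOW_SECONDS = 60&lt;br&gt;
MAX_REQUESTS = 30  # per client IP per window (illustrative)&lt;br&gt;
hits: dict[str, list[float]] = defaultdict(list)&lt;br&gt;
&lt;br&gt;
@app.middleware("http")&lt;br&gt;
async def rate_limit(request: Request, call_next):&lt;br&gt;
    now = time.time()&lt;br&gt;
    ip = request.client.host if request.client else "unknown"&lt;br&gt;
    hits[ip] = [t for t in hits[ip] if now - t &amp;lt; WINDOW_SECONDS]&lt;br&gt;
    if len(hits[ip]) &amp;gt;= MAX_REQUESTS:&lt;br&gt;
        return JSONResponse({"detail": "Rate limit exceeded"}, status_code=429)&lt;br&gt;
    hits[ip].append(now)&lt;br&gt;
    return await call_next(request)&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;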

&lt;p&gt;Complete Example: Code Explanation API&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python&lt;br&gt;
class ExplainRequest(BaseModel):&lt;br&gt;
    code: str&lt;br&gt;
    language: str = "python"&lt;br&gt;
&lt;br&gt;
@app.post("/explain", response_model=ChatResponse)&lt;br&gt;
async def explain_code(request: ExplainRequest):&lt;br&gt;
    prompt = f"""Explain this {request.language} code in simple terms:&lt;br&gt;
&lt;br&gt;
```{request.language}&lt;br&gt;
{request.code}&lt;br&gt;
```&lt;br&gt;
&lt;br&gt;
Provide a clear, concise explanation."""&lt;br&gt;
&lt;br&gt;
    # Reuses the /chat handler defined earlier (pre-auth version)&lt;br&gt;
    chat_request = ChatRequest(&lt;br&gt;
        messages=[Message(role="user", content=prompt)]&lt;br&gt;
    )&lt;br&gt;
    return await chat(chat_request)&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Getting Started&lt;/p&gt;

&lt;p&gt;Build your AI API with FastAPI and Claude. ofox.ai provides the OpenAI-compatible Claude endpoint — sign up, get your API key, and deploy your first AI API in under 10 minutes.&lt;/p&gt;

&lt;p&gt;👉 Get started with ofox.ai&lt;/p&gt;

&lt;p&gt;This article contains affiliate links.&lt;/p&gt;

&lt;p&gt;Tags: python,fastapi,claude-api,api,programming,developer&lt;br&gt;
Canonical URL: &lt;a href="https://dev.to/zny10289"&gt;https://dev.to/zny10289&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>javascript</category>
      <category>python</category>
    </item>
  </channel>
</rss>
