At RapierCraft, we faced a unique challenge: how do you build a system that can intelligently process thousands of news articles daily while maintaining sub-second response times and 99.9% uptime? The answer led us to create UltraNews, a platform that processes over 15,000 stories daily with 98.7% AI accuracy.
This article breaks down the architectural decisions, challenges, and innovations that power UltraNews—from our autonomous AI discovery systems to our multi-LLM orchestration approach.
The Core Architecture Challenge
When we started building UltraNews, we knew traditional news aggregation approaches wouldn't work. We needed a system that could:
- Process content from thousands of diverse sources with different structures
- Adapt automatically when websites change their layouts
- Maintain consistent performance under varying loads
- Orchestrate multiple AI providers for optimal results
- Scale horizontally without losing intelligence
The solution was a three-tier architecture that separates concerns while enabling seamless communication between intelligent components.
Tier 1: The AI Intelligence Engine (Backend)
Our backend is built on FastAPI with Python 3.11+, chosen for its async capabilities and automatic API documentation. But the real innovation lies in how we've structured the intelligent components.
Autonomous Discovery Architecture
# Simplified view of our discovery coordination system
class DiscoveryCoordinator:
    def __init__(self):
        self.strategies = [
            StructuredDiscovery(),   # Sitemaps, RSS feeds
            IntelligentCrawling(),   # List page detection
            AIExploration()          # LLM-guided discovery
        ]

    async def discover_content(self, source: Source):
        for strategy in self.strategies:
            try:
                results = await strategy.execute(source)
                if self.validate_results(results):
                    return results
            except Exception:
                # Automatic fallback to the next strategy
                continue
        # If all strategies fail, mark for human review
        await self.queue_for_review(source)
The key architectural decision here was escalation-based discovery. Instead of trying one approach and failing, our system automatically escalates through three phases:
- Structured Discovery: Fast, efficient parsing of sitemaps and feeds
- Intelligent Crawling: Pattern-based content detection and extraction
- AI Exploration: LLM-guided discovery when other methods fail
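The discover_content loop above hinges on the validate_results gate to decide when a strategy's output is good enough to stop escalating. The post doesn't show that check, so here is a minimal sketch (as a standalone function for brevity) under assumed heuristics: the minimum-article count, the required fields, and the 80% completeness threshold are illustrative, not UltraNews's actual values.

# Hypothetical validation gate -- thresholds and field names are assumptions
MIN_ARTICLES = 5                                   # assumed floor; fewer suggests a failed parse
REQUIRED_FIELDS = ("title", "url", "published_at")

def validate_results(results: list[dict]) -> bool:
    """Accept a strategy's output only if it looks like real article data."""
    if not results or len(results) < MIN_ARTICLES:
        return False
    # Count results that carry all of the core article fields
    complete = sum(
        1 for r in results
        if all(r.get(field) for field in REQUIRED_FIELDS)
    )
    return complete / len(results) >= 0.8           # assumed completeness threshold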
Multi-LLM Orchestration System
One of our biggest architectural innovations is the Multi-LLM Orchestration system. Rather than being locked into a single AI provider, we built an abstraction layer that can work with any LLM:
class LLMOrchestrator:
    def __init__(self):
        self.providers = {
            'groq': GroqProvider(),
            'openrouter': OpenRouterProvider(),
            'gemini': GeminiProvider(),
            'nvidia': NVIDIAProvider(),
            'local': OllamaProvider()
        }
        self.selector = IntelligentProviderSelector()

    async def process_content(self, content: str, task_type: str):
        # Select optimal provider based on task complexity and cost
        provider = await self.selector.select_provider(content, task_type)
        try:
            result = await provider.process(content)
            await self.log_success(provider, task_type)
            return result
        except Exception:
            # Intelligent fallback with context preservation
            fallback_provider = await self.selector.get_fallback(provider)
            return await fallback_provider.process(content)
This architecture gives us several advantages; a sketch of the selection heuristic follows the list:
- Cost Optimization: Automatically route simple tasks to cheaper models
- Performance Optimization: Use the fastest model for time-critical tasks
- Reliability: Seamless fallback when providers have issues
- Quality Optimization: Route complex tasks to the most capable models
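The IntelligentProviderSelector does the heavy lifting here, and the post doesn't show its internals. The sketch below illustrates one plausible scoring heuristic: the provider profiles, weights, and task-type rules are all assumptions for illustration, and this version returns a provider name that the orchestrator would map back to an instance.

# Hypothetical provider profiles -- every number here is an assumption
PROVIDER_PROFILES = {
    'groq':       {'cost': 0.2, 'latency_ms': 300,  'quality': 0.80},
    'openrouter': {'cost': 0.5, 'latency_ms': 900,  'quality': 0.90},
    'gemini':     {'cost': 0.6, 'latency_ms': 1200, 'quality': 0.95},
    'local':      {'cost': 0.0, 'latency_ms': 2500, 'quality': 0.70},
}

class IntelligentProviderSelector:
    async def select_provider(self, content: str, task_type: str) -> str:
        # Assumed rule: long or analytical tasks weight quality heavily;
        # routine tasks weight cost and latency instead
        complex_task = task_type in ('analysis', 'summarization') or len(content) > 4000
        quality_weight = 0.7 if complex_task else 0.2

        def score(profile: dict) -> float:
            return (
                quality_weight * profile['quality']
                - (1 - quality_weight) * profile['cost']
                - 0.0001 * profile['latency_ms']
            )

        return max(PROVIDER_PROFILES, key=lambda name: score(PROVIDER_PROFILES[name]))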
Per-Source Orchestration
Traditional systems treat all content sources equally. We implemented per-source orchestration, where each news source gets its own dedicated processing pipeline:
class PerSourceOrchestrator:
    def __init__(self):
        self.source_managers = {}
        self.adaptive_controller = AdaptiveController()

    async def process_source(self, source_id: int):
        if source_id not in self.source_managers:
            self.source_managers[source_id] = SourceManager(
                source_id=source_id,
                initial_resources=self.calculate_initial_resources(source_id)
            )
        manager = self.source_managers[source_id]
        # Dynamic resource allocation based on performance
        await self.adaptive_controller.adjust_resources(manager)
        return await manager.process()

class SourceManager:
    def __init__(self, source_id: int, initial_resources: dict):
        self.source_id = source_id
        self.state = SourceState.ACTIVE
        self.resources = initial_resources
        self.performance_metrics = PerformanceTracker()

    async def process(self):
        if self.state == SourceState.SUSPENDED:
            return await self.attempt_recovery()
        # Process with dedicated resources
        return await self.execute_processing_pipeline()
This approach, together with the adaptive controller sketched after this list, allows us to:
- Isolate problematic sources without affecting others
- Dynamically allocate resources based on source reliability
- Maintain source-specific optimizations and learning
- Scale processing power where it's needed most
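The AdaptiveController that adjust_resources delegates to is likewise not shown. Here is a minimal sketch of what performance-driven allocation could look like, assuming the "resource" is a per-source worker count and assuming a success_rate() method on the performance tracker; both the thresholds and the API are illustrative, not the actual implementation.

# Hypothetical adaptive controller -- thresholds and resource shape are assumptions
class AdaptiveController:
    MIN_WORKERS, MAX_WORKERS = 1, 16    # assumed bounds on per-source concurrency

    async def adjust_resources(self, manager: SourceManager) -> None:
        success_rate = manager.performance_metrics.success_rate()  # assumed tracker API
        workers = manager.resources.get("workers", 2)

        if success_rate > 0.95:
            # Reliable source: grant it more concurrent fetch workers
            workers = min(workers + 1, self.MAX_WORKERS)
        elif success_rate < 0.60:
            # Degrading source: shrink its footprint so neighbors are unaffected
            workers = max(workers - 1, self.MIN_WORKERS)
            if success_rate < 0.20:
                manager.state = SourceState.SUSPENDED   # isolate, retry later

        manager.resources["workers"] = workers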
Tier 2: Professional Management Interface
Our admin panel is built with Next.js 15 and React 19, but the architecture focuses on real-time data synchronization and performance optimization.
Real-Time WebSocket Architecture
// WebSocket coordinator for real-time updates
class WebSocketCoordinator {
  private connections: Map<string, WebSocket> = new Map();
  private subscriptions: Map<string, Set<string>> = new Map();
  private compressionService: any; // injected compression service (definition omitted)

  async broadcastUpdate(channel: string, data: any) {
    const subscribers = this.subscriptions.get(channel) || new Set();
    const updatePayload = {
      timestamp: Date.now(),
      channel,
      data: await this.optimizePayload(data)
    };
    subscribers.forEach(connectionId => {
      const ws = this.connections.get(connectionId);
      if (ws?.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(updatePayload));
      }
    });
  }

  private async optimizePayload(data: any): Promise<any> {
    // Compress and optimize data for real-time transmission
    return this.compressionService.compress(data);
  }
}
Component-Based Performance Optimization
We implemented a priority-based rendering system where critical components get updated first:
// Priority-based component updates
const SystemHealthCard = memo(({ priority = 'high' }: ComponentProps) => {
  const { data, loading } = useWebSocketData('system-health', {
    updatePriority: priority,
    bufferUpdates: priority === 'low'
  });

  // Guard against rendering before the first WebSocket payload arrives
  if (loading || !data) return <Card>Loading...</Card>;

  return (
    <Card className="real-time-updates">
      <CPUMetrics data={data.cpu} />
      <MemoryMetrics data={data.memory} />
      <QueueMetrics data={data.queues} />
    </Card>
  );
});
Tier 3: Research Platform (Public Interface)
Our public platform focuses on performance-first architecture with advanced caching strategies.
Multi-Layer Caching Architecture
class CacheOrchestrator {
  private layers = [
    new BrowserCache(60000),   // 1 minute
    new CDNCache(300000),      // 5 minutes
    new RedisCache(900000),    // 15 minutes
    new DatabaseCache()        // Persistent
  ];

  async get(key: string): Promise<any> {
    for (const cache of this.layers) {
      try {
        const result = await cache.get(key);
        if (result) {
          // Backfill the faster layers above the one that hit
          await this.backfillCaches(key, result, cache);
          return result;
        }
      } catch (error) {
        // Layer unavailable: fall through to the next one
        continue;
      }
    }
    // Miss in every layer; the caller fetches from the source of truth
    return null;
  }

  private async backfillCaches(key: string, data: any, sourceCache: Cache) {
    const sourceIndex = this.layers.indexOf(sourceCache);
    // Populate all faster caches above the layer that hit
    for (let i = 0; i < sourceIndex; i++) {
      await this.layers[i].set(key, data);
    }
  }
}
Database Architecture: Optimized for Scale
We use PostgreSQL with advanced optimization strategies:
Connection Pooling and Query Optimization
# Optimized database session management
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

class DatabaseManager:
    def __init__(self):
        # Default queue pool; StaticPool would pin every request to one connection
        self.engine = create_async_engine(
            DATABASE_URL,
            pool_pre_ping=True,    # drop stale connections before use
            pool_recycle=3600,     # recycle connections hourly
            echo=False
        )
        self.priority_session_factory = self.create_priority_factory()

    def create_priority_factory(self):
        return sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            expire_on_commit=False
        )

    async def get_priority_session(self):
        # Admin requests get priority database connections
        return self.priority_session_factory()
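The snippet labels these sessions "priority" without showing what enforces it. One plausible mechanism, sketched here as an assumption rather than UltraNews's actual approach, is a separate engine whose small pool is reserved for admin traffic so background ingestion can never starve the admin panel of connections:

# Assumed approach: a dedicated engine whose pool is reserved for admin requests
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

admin_engine = create_async_engine(
    DATABASE_URL,
    pool_size=5,         # assumed: small pool held back for the admin panel
    max_overflow=0,      # never borrow beyond the reservation
    pool_pre_ping=True,
)
priority_session_factory = sessionmaker(
    bind=admin_engine, class_=AsyncSession, expire_on_commit=False
)

async def handle_admin_request():
    # Admin queries draw from the reserved pool, isolated from ingestion load
    async with priority_session_factory() as session:
        result = await session.execute(text("SELECT count(*) FROM articles"))
        return result.scalar_one()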
Intelligent Indexing Strategy
-- Performance-optimized indexes for high-volume queries
CREATE INDEX CONCURRENTLY idx_articles_created_at_btree
ON articles USING btree(created_at DESC);
CREATE INDEX CONCURRENTLY idx_articles_source_id_status
ON articles(source_id, processing_status)
WHERE processing_status IN ('pending', 'processing');
-- Partial indexes for common query patterns
CREATE INDEX CONCURRENTLY idx_articles_importance_high
ON articles(importance_score DESC)
WHERE importance_score > 0.7;
Deployment and Scalability Architecture
Containerized Microservices
Our deployment uses Docker with service-specific optimizations:
# Multi-stage build for optimal container size
FROM python:3.11-slim AS base

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

FROM base AS dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM dependencies AS production
COPY . .
EXPOSE 8001

# Optimized for production workloads
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001", "--workers", "4"]
Auto-Scaling Architecture
# Kubernetes HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ultranews-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ultranews-backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Performance Results and Metrics
Our architectural decisions deliver measurable results:
- Throughput: 15,000+ articles processed daily
- AI Accuracy: 98.7% across all processing tasks
- Response Time: Sub-second API response times
- Uptime: 99.9% availability with intelligent failover
- Scalability: Tested linear scaling to 100,000+ daily articles
Key Architectural Lessons
- Embrace Failure: Design for component failures rather than trying to prevent them
- Intelligent Escalation: Automated fallback strategies beat trying to build a system that never fails
- Resource Isolation: Per-source processing prevents cascade failures
- Multi-Provider Strategy: Never depend on a single external service
- Performance Monitoring: Real-time metrics enable proactive optimization
The Future: Continuous Architectural Evolution
UltraNews continues evolving with new architectural patterns:
- Edge Computing: Processing content closer to sources for reduced latency
- Federated Learning: Improving AI models without centralizing sensitive data
- Event-Driven Architecture: Moving toward fully reactive systems
- Quantum-Ready Encryption: Future-proofing security architecture
Conclusion
Building UltraNews taught us that modern applications need to be intelligent by design. Traditional architectural patterns work for traditional problems, but when you're processing global information at scale with multiple AI providers, you need architectures that can think, adapt, and optimize themselves.
The result is a platform that doesn't just aggregate news—it transforms information into intelligence through thoughtful architectural design and continuous optimization.
Want to learn more about building intelligent systems? Follow RapierCraft for more insights on AI-first architecture and scalable system design.
Tags: #architecture #ai #scalability #fastapi #nextjs #python #typescript #devops #microservices #realtimedata