Yash Dubey

Architecting UltraNews: Building a Real-Time News Intelligence Platform That Scales

At RapierCraft, we faced a unique challenge: how do you build a system that can intelligently process thousands of news articles daily while maintaining sub-second response times and 99.9% uptime? The answer led us to create UltraNews, a platform that processes over 15,000 stories daily with 98.7% AI accuracy.

This article breaks down the architectural decisions, challenges, and innovations that power UltraNews—from our autonomous AI discovery systems to our multi-LLM orchestration approach.

The Core Architecture Challenge

When we started building UltraNews, we knew traditional news aggregation approaches wouldn't work. We needed a system that could:

  • Process content from thousands of diverse sources with different structures
  • Adapt automatically when websites change their layouts
  • Maintain consistent performance under varying loads
  • Orchestrate multiple AI providers for optimal results
  • Scale horizontally without losing intelligence

The solution was a three-tier architecture that separates concerns while enabling seamless communication between intelligent components.

Tier 1: The AI Intelligence Engine (Backend)

Our backend is built on FastAPI with Python 3.11+, chosen for its async capabilities and automatic API documentation. But the real innovation lies in how we've structured the intelligent components.
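Before diving into those components, here is a flavor of that foundation: a minimal FastAPI service in the async style the backend is built on. The endpoint and Article model are illustrative stand-ins, not UltraNews's actual API:

# Minimal FastAPI sketch: async handlers plus auto-generated docs at /docs
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="UltraNews API")

class Article(BaseModel):
    id: int
    title: str

@app.get("/articles/{article_id}", response_model=Article)
async def get_article(article_id: int) -> Article:
    # Async handlers let many in-flight requests share one event loop
    return Article(id=article_id, title="example")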

Autonomous Discovery Architecture

# Simplified view of our discovery coordination system
class DiscoveryCoordinator:
    def __init__(self):
        self.strategies = [
            StructuredDiscovery(),  # Sitemaps, RSS feeds
            IntelligentCrawling(),  # List page detection
            AIExploration()         # LLM-guided discovery
        ]

    async def discover_content(self, source: Source):
        for strategy in self.strategies:
            try:
                results = await strategy.execute(source)
                if self.validate_results(results):
                    return results
            except Exception:
                # Automatic fallback to next strategy
                continue

        # If all strategies fail, mark for human review
        await self.queue_for_review(source)
        return []  # callers always get a list, even on total failure

The key architectural decision here was escalation-based discovery. Instead of trying one approach and failing, our system automatically escalates through three phases:

  1. Structured Discovery: Fast, efficient parsing of sitemaps and feeds (a minimal sketch follows this list)
  2. Intelligent Crawling: Pattern-based content detection and extraction
  3. AI Exploration: LLM-guided discovery when other methods fail
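To make Phase 1 concrete, here is a minimal sketch of what a structured-discovery strategy can look like. The class internals, the Source.base_url attribute, and the sitemap-first heuristic are illustrative assumptions, not our production code:

import asyncio
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

class StructuredDiscovery:
    async def execute(self, source) -> list[str]:
        # Try the conventional sitemap location first (illustrative heuristic)
        sitemap_url = source.base_url.rstrip("/") + "/sitemap.xml"
        raw = await self._fetch(sitemap_url)
        # Every <loc> entry in the sitemap is a candidate article URL
        root = ET.fromstring(raw)
        return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]

    async def _fetch(self, url: str) -> bytes:
        # Offload the blocking HTTP call so the event loop stays responsive
        def _get() -> bytes:
            with urlopen(url, timeout=10) as resp:
                return resp.read()
        return await asyncio.to_thread(_get)

If the sitemap is missing or malformed, the exception propagates to the coordinator above, which escalates to Phase 2.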

Multi-LLM Orchestration System

One of our biggest architectural innovations is the Multi-LLM Orchestration system. Rather than being locked into a single AI provider, we built an abstraction layer that can work with any LLM:

class LLMOrchestrator:
    def __init__(self):
        self.providers = {
            'groq': GroqProvider(),
            'openrouter': OpenRouterProvider(),
            'gemini': GeminiProvider(),
            'nvidia': NVIDIAProvider(),
            'local': OllamaProvider()
        }
        self.selector = IntelligentProviderSelector()

    async def process_content(self, content: str, task_type: str):
        # Select optimal provider based on task complexity and cost
        provider = await self.selector.select_provider(content, task_type)

        try:
            result = await provider.process(content)
            await self.log_success(provider, task_type)
            return result
        except Exception:
            # Intelligent fallback with context preservation
            fallback_provider = await self.selector.get_fallback(provider)
            return await fallback_provider.process(content)

This architecture gives us several advantages:

  • Cost Optimization: Automatically route simple tasks to cheaper models (one possible routing heuristic is sketched after this list)
  • Performance Optimization: Use the fastest model for time-critical tasks
  • Reliability: Seamless fallback when providers have issues
  • Quality Optimization: Route complex tasks to the most capable models
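The selector is where this routing lives. As a rough illustration only, with invented tiers and thresholds and a providers argument the real constructor may not take, selection can be as simple as matching task complexity to a capability tier:

# Illustrative provider-selection sketch; tiers and thresholds are invented
class IntelligentProviderSelector:
    # Higher tier = more capable but typically more expensive
    CAPABILITY_TIERS = {'local': 0, 'groq': 1, 'openrouter': 2, 'gemini': 2, 'nvidia': 3}

    def __init__(self, providers: dict):
        self.providers = providers

    async def select_provider(self, content: str, task_type: str):
        # Crude heuristic: long inputs and analysis tasks get stronger models
        needs_power = len(content) > 4000 or task_type in ('analysis', 'synthesis')
        wanted = 2 if needs_power else 1
        # Pick the cheapest provider that clears the required tier
        name = min(
            (n for n in self.providers if self.CAPABILITY_TIERS[n] >= wanted),
            key=lambda n: self.CAPABILITY_TIERS[n]
        )
        return self.providers[name]

A production selector would also weigh live latency and recent error rates, which is exactly the kind of signal the log_success bookkeeping in the orchestrator can feed.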

Per-Source Orchestration

Traditional systems treat all content sources equally. We implemented per-source orchestration, where each news source gets its own dedicated processing pipeline:

class PerSourceOrchestrator:
    def __init__(self):
        self.source_managers = {}
        self.adaptive_controller = AdaptiveController()

    async def process_source(self, source_id: int):
        if source_id not in self.source_managers:
            self.source_managers[source_id] = SourceManager(
                source_id=source_id,
                initial_resources=self.calculate_initial_resources(source_id)
            )

        manager = self.source_managers[source_id]

        # Dynamic resource allocation based on performance
        await self.adaptive_controller.adjust_resources(manager)

        return await manager.process()

class SourceManager:
    def __init__(self, source_id: int, initial_resources: dict):
        self.source_id = source_id
        self.state = SourceState.ACTIVE
        self.resources = initial_resources
        self.performance_metrics = PerformanceTracker()

    async def process(self):
        if self.state == SourceState.SUSPENDED:
            return await self.attempt_recovery()

        # Process with dedicated resources
        return await self.execute_processing_pipeline()

This approach allows us to:

  • Isolate problematic sources without affecting others
  • Dynamically allocate resources based on source reliability (see the sketch after this list)
  • Maintain source-specific optimizations and learning
  • Scale processing power where it's needed most
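The AdaptiveController referenced in the orchestrator is the piece that turns per-source metrics into these decisions. Here is a deliberately simplified sketch; the thresholds, the 'concurrency' resource key, and the success_rate() API are assumptions for illustration:

from enum import Enum, auto

class SourceState(Enum):
    ACTIVE = auto()
    SUSPENDED = auto()

class AdaptiveController:
    async def adjust_resources(self, manager):
        rate = manager.performance_metrics.success_rate()
        if rate < 0.5:
            # Persistent failures: isolate the source rather than retry blindly
            manager.state = SourceState.SUSPENDED
        elif rate > 0.9:
            # Healthy source: grant more parallel fetch slots, with a cap
            manager.resources['concurrency'] = min(
                manager.resources.get('concurrency', 1) + 1, 10)
        else:
            # Middling reliability: scale back gently toward the floor
            manager.resources['concurrency'] = max(
                manager.resources.get('concurrency', 1) - 1, 1)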

Tier 2: Professional Management Interface

Our admin panel is built with Next.js 15 and React 19, but the architectural focus is on real-time data synchronization and performance optimization.

Real-Time WebSocket Architecture

// WebSocket coordinator for real-time updates
class WebSocketCoordinator {
  private connections: Map<string, WebSocket> = new Map();
  private subscriptions: Map<string, Set<string>> = new Map();
  private compressionService = new CompressionService(); // payload compressor

  async broadcastUpdate(channel: string, data: any) {
    const subscribers = this.subscriptions.get(channel) || new Set();

    const updatePayload = {
      timestamp: Date.now(),
      channel,
      data: await this.optimizePayload(data)
    };

    subscribers.forEach(connectionId => {
      const ws = this.connections.get(connectionId);
      if (ws?.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(updatePayload));
      }
    });
  }

  private async optimizePayload(data: any): Promise<any> {
    // Compress and optimize data for real-time transmission
    return this.compressionService.compress(data);
  }
}
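For context, the backend end of this pipe can be a plain FastAPI WebSocket endpoint. The route path, the in-memory subscriber set, and the payload shape below are assumptions sketched for illustration, not the production implementation:

# Minimal FastAPI sketch of the server side of the real-time pipe
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
subscribers: set[WebSocket] = set()

@app.websocket("/ws/system-health")
async def system_health(ws: WebSocket):
    await ws.accept()
    subscribers.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the socket open; the server only pushes
    except WebSocketDisconnect:
        subscribers.discard(ws)

async def broadcast(payload: dict):
    # Push one update to every connected admin client
    await asyncio.gather(
        *(ws.send_json(payload) for ws in subscribers),
        return_exceptions=True  # one dead socket must not break the broadcast
    )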

Component-Based Performance Optimization

We implemented a priority-based rendering system where critical components get updated first:

// Priority-based component updates
import { memo } from 'react';

const SystemHealthCard = memo(({ priority = 'high' }: ComponentProps) => {
  const { data, loading } = useWebSocketData('system-health', {
    updatePriority: priority,
    bufferUpdates: priority === 'low'
  });

  if (loading || !data) return null; // avoid reading metrics before the first update

  return (
    <Card className="real-time-updates">
      <CPUMetrics data={data.cpu} />
      <MemoryMetrics data={data.memory} />
      <QueueMetrics data={data.queues} />
    </Card>
  );
});

Tier 3: Research Platform (Public Interface)

Our public platform focuses on performance-first architecture with advanced caching strategies.

Multi-Layer Caching Architecture

class CacheOrchestrator {
  private layers = [
    new BrowserCache(60000), // 1 minute
    new CDNCache(300000),    // 5 minutes
    new RedisCache(900000),  // 15 minutes
    new DatabaseCache()      // Persistent
  ];

  async get(key: string): Promise<any> {
    for (const cache of this.layers) {
      try {
        const result = await cache.get(key);
        if (result != null) { // falsy values like 0 or '' are still valid hits
          // Populate higher-priority caches
          await this.backfillCaches(key, result, cache);
          return result;
        }
      } catch (error) {
        // Continue to next cache layer
        continue;
      }
    }

    // Miss in every layer; the caller falls back to an origin fetch
    return null;
  }

  private async backfillCaches(key: string, data: any, sourceCache: Cache) {
    const sourceIndex = this.layers.indexOf(sourceCache);

    // Populate all higher-priority caches
    for (let i = 0; i < sourceIndex; i++) {
      await this.layers[i].set(key, data);
    }
  }
}

Database Architecture: Optimized for Scale

We use PostgreSQL with advanced optimization strategies:

Connection Pooling and Query Optimization

# Optimized database session management
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

class DatabaseManager:
    def __init__(self):
        self.engine = create_async_engine(
            DATABASE_URL,
            pool_size=20,        # steady-state connections
            max_overflow=10,     # burst headroom under load spikes
            pool_pre_ping=True,  # drop dead connections before use
            pool_recycle=3600,   # recycle hourly to dodge server-side timeouts
            echo=False
        )
        self.priority_session_factory = self.create_priority_factory()

    def create_priority_factory(self):
        return sessionmaker(
            bind=self.engine,
            class_=AsyncSession,
            expire_on_commit=False
        )

    async def get_priority_session(self):
        # Admin requests get priority database connections
        return self.priority_session_factory()
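A typical consumer of this manager looks like the following; the Article model and the query are placeholders to show the session lifecycle, not actual UltraNews queries:

from sqlalchemy import select

async def recent_articles(db: DatabaseManager):
    # Acquire a priority session and ensure it is closed back to the pool
    session = await db.get_priority_session()
    async with session:
        result = await session.execute(
            select(Article).order_by(Article.created_at.desc()).limit(20)
        )
        return result.scalars().all()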

Intelligent Indexing Strategy

-- Performance-optimized indexes for high-volume queries
CREATE INDEX CONCURRENTLY idx_articles_created_at_btree 
ON articles USING btree(created_at DESC);

CREATE INDEX CONCURRENTLY idx_articles_source_id_status 
ON articles(source_id, processing_status) 
WHERE processing_status IN ('pending', 'processing');

-- Partial indexes for common query patterns
CREATE INDEX CONCURRENTLY idx_articles_importance_high 
ON articles(importance_score DESC) 
WHERE importance_score > 0.7;

Deployment and Scalability Architecture

Containerized Microservices

Our deployment uses Docker with service-specific optimizations:

# Multi-stage build for optimal container size
FROM python:3.11-slim AS builder

# Build tools live only in this stage; they never reach the final image
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim AS production
# Copy only the installed packages, leaving compilers behind
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 8001

# Optimized for production workloads
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001", "--workers", "4"]

Auto-Scaling Architecture

# Kubernetes HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ultranews-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ultranews-backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Performance Results and Metrics

Our architectural decisions deliver measurable results:

  • Processing Speed: 15,000+ articles processed daily
  • AI Accuracy: 98.7% across all processing tasks
  • Response Time: Sub-second API response times
  • Uptime: 99.9% availability with intelligent failover
  • Scalability: Tested linear scaling to 100,000+ daily articles

Key Architectural Lessons

  1. Embrace Failure: Design for component failures rather than trying to prevent them
  2. Intelligent Escalation: Automated fallback strategies are more reliable than chasing a system that never fails
  3. Resource Isolation: Per-source processing prevents cascade failures
  4. Multi-Provider Strategy: Never depend on a single external service
  5. Performance Monitoring: Real-time metrics enable proactive optimization

The Future: Continuous Architectural Evolution

UltraNews continues evolving with new architectural patterns:

  • Edge Computing: Processing content closer to sources for reduced latency
  • Federated Learning: Improving AI models without centralizing sensitive data
  • Event-Driven Architecture: Moving toward fully reactive systems
  • Quantum-Ready Encryption: Future-proofing security architecture

Conclusion

Building UltraNews taught us that modern applications need to be intelligent by design. Traditional architectural patterns work for traditional problems, but when you're processing global information at scale with multiple AI providers, you need architectures that can think, adapt, and optimize themselves.

The result is a platform that doesn't just aggregate news—it transforms information into intelligence through thoughtful architectural design and continuous optimization.


Want to learn more about building intelligent systems? Follow RapierCraft for more insights on AI-first architecture and scalable system design.

Tags: #architecture #ai #scalability #fastapi #nextjs #python #typescript #devops #microservices #realtimedata
