Winola Joe
Handling Big Data Challenges: A Case Study of AllFreeNovel.cc

## Technical Challenges & Solutions

1. Data Ingestion Bottlenecks

Problem:

Daily ingestion of 50,000+ new chapters from multiple sources (CN/JP/KR) with varying formats:

  • XML feeds from Korean publishers
  • JSON APIs from Chinese platforms
  • Raw text dumps from Japanese partners

Solution:

```python
# Distributed ETL pipeline: every source streams into a shared Kafka topic
class ChapterIngestor:
    def __init__(self, producer, schema_registry):
        self.producer = producer                # async Kafka producer (e.g. aiokafka)
        self.kafka_topic = "raw-chapters"
        self.schema_registry = schema_registry  # Avro schema registry client

    async def process(self, source):
        # Stream in chunks so large text dumps never sit fully in memory
        async for chunk in source.stream():
            normalized = await self._normalize(chunk)
            await self.producer.produce(
                self.kafka_topic,
                value=normalized,
                schema=self.schema_registry.get(source.format),
            )
```
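The `_normalize` step has to fold those three source formats into one canonical record. A minimal sketch of what that dispatch might look like (the field names and format tags here are illustrative assumptions, not the production schema):

```python
import json
import xml.etree.ElementTree as ET

def normalize_chapter(raw: bytes, fmt: str) -> dict:
    """Fold the three source formats into one canonical record.
    Field names are illustrative; the real schema lives in the registry."""
    if fmt == "xml":    # Korean publisher feeds
        root = ET.fromstring(raw)
        return {"title": root.findtext("title"), "body": root.findtext("body")}
    if fmt == "json":   # Chinese platform APIs
        doc = json.loads(raw)
        return {"title": doc["title"], "body": doc["content"]}
    if fmt == "text":   # Japanese raw text dumps: first line is the title
        title, _, body = raw.decode("utf-8").partition("\n")
        return {"title": title.strip(), "body": body.strip()}
    raise ValueError(f"unknown source format: {fmt}")
```

Whatever the real mapping looks like, doing it at ingestion time means every downstream consumer sees a single schema.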

2. Search Performance Optimization

Metrics Before Optimization:

  • 1200ms average query latency
  • 78% cache miss rate
  • 12-node Elasticsearch cluster at 85% load

Implemented Solutions:

  1. Hybrid Index Strategy

    • Hot data (latest chapters): In-memory RedisSearch
    • Warm data: Elasticsearch with custom tokenizer
    • Cold data: ClickHouse columnar storage
  2. Query Pipeline:

```mermaid
graph TD
    A[User Query] --> B{Query Type?}
    B -->|Simple| C[RedisSearch]
    B -->|Complex| D[Elasticsearch]
    B -->|Analytics| E[ClickHouse]
    C --> F[Result Blender]
    D --> F
    E --> F
    F --> G[Response]
```
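A toy version of the router behind that diagram could look like this (the classification heuristics and backend callables are illustrative assumptions, not the actual AllFreeNovel.cc code):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QueryRouter:
    """Dispatch each query to the tier that can answer it fastest."""
    simple_backend: Callable[[str], list]     # e.g. a RedisSearch client
    complex_backend: Callable[[str], list]    # e.g. an Elasticsearch client
    analytics_backend: Callable[[str], list]  # e.g. a ClickHouse client

    def classify(self, query: str) -> str:
        # Toy heuristics: a real classifier would inspect the parsed query
        if any(op in query for op in ("GROUP BY", "count(", "sum(")):
            return "analytics"
        if any(op in query for op in ("AND", "OR", "NOT", '"')):
            return "complex"
        return "simple"

    def search(self, query: str) -> list:
        backend = {
            "simple": self.simple_backend,
            "complex": self.complex_backend,
            "analytics": self.analytics_backend,
        }[self.classify(query)]
        return backend(query)
```

The point of the split is that the cheap path (single-term lookups against hot data) never pays Elasticsearch's coordination overhead.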

3. Real-time Recommendations

Challenge:

Generate personalized suggestions for 2M+ DAU with <100ms latency

ML Serving Architecture:

```
┌──────────────┐      ┌─────────────┐
│ Feature Store│◄─────│ Flink Jobs  │
└──────┬───────┘      └─────────────┘
       │
┌──────▼───────┐      ┌─────────────┐
│ Model Cache  │─────►│    ONNX     │
└──────┬───────┘      │   Runtime   │
       │              └─────────────┘
┌──────▼───────┐
│     User     │
│ Interactions │
└──────────────┘
```
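The model cache does much of the work for the latency target: loading a session is slow, so recently used models stay warm. A minimal sketch of one way to build it, as an LRU of loaded sessions with a pluggable loader (in production the loader might wrap `onnxruntime.InferenceSession`; everything here is an assumption, not the site's code):

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelCache:
    """Keep the N most recently used model sessions warm in memory."""

    def __init__(self, loader: Callable[[str], Any], capacity: int = 8):
        self._loader = loader
        self._capacity = capacity
        self._sessions: OrderedDict[str, Any] = OrderedDict()

    def get(self, model_id: str) -> Any:
        if model_id in self._sessions:
            self._sessions.move_to_end(model_id)  # mark as recently used
            return self._sessions[model_id]
        session = self._loader(model_id)          # cold path: load from disk
        self._sessions[model_id] = session
        if len(self._sessions) > self._capacity:
            self._sessions.popitem(last=False)    # evict least recently used
        return session
```

On the hot path a request then costs one dict lookup plus inference, which is what makes a sub-100ms P99 plausible.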

Results:

  • P99 latency reduced from 2200ms → 89ms
  • Recommendation CTR increased by 37%
  • Monthly infrastructure cost saved: $28,500

Key Takeaways

  1. Data Tiering is crucial for cost-performance balance
  2. Asynchronous Processing prevents pipeline backpressure
  3. Hybrid Indexing enables optimal query performance
  4. Model Optimization (ONNX conversion) dramatically improves ML serving
