In an era where software is being redefined, i.e. moving gradually away from hand-written programs toward system building, it is imperative to understand the systems thinking and architecture behind modern software. Modern software is meant to be reliable, scalable, and maintainable.
Many modern applications and machine learning models running in production are data intensive rather than compute intensive, although recent advances in Generative AI do require a lot of compute. This means that for many modern applications, the hardest part isn't the heavy mathematics or CPU work; it is handling large amounts of data successfully.
Data intensiveness is about storing data, reading and writing it quickly (database queries), moving it (APIs, distributed systems), keeping it correct when many users act at once, and processing streams.
Compute-intensive applications and systems, by contrast, include 3D rendering, training deep learning models, scientific simulations, and video encoding. In machine learning, the hardest part of building a good ML system isn't the model math but the data quality. This includes proper data collection, cleaning, labelling, storing, versioning, and serving the data correctly.
Data-heavy applications need the following:
- Databases
- Caches
- Search Indexes
- Stream Processing and Batch Processing
Many new tools for storing and processing data have emerged for building applications and running ML models in production.
MODERN DATA SYSTEMS ARCHITECTURE
The tools used in modern data systems architecture include:
Primary Database: This is the source of truth (the real, official data: users, orders, posts, etc.).
In-memory Cache: Very fast storage (e.g. Redis) that speeds up reads.
Full-text Index: A specialized search engine for fast text search.
Message Queue: A system for handing off asynchronous tasks so that they run without slowing down the user request.
Application Code: The stitching layer that talks to all of these components and coordinates updates, i.e. it is the bridge between the databases, data warehouses, data streams, or data lakes and the business models. It is typically written as, e.g., Python code or SQL queries. In modern data systems architecture, therefore, the application code coordinates multiple specialized components, each handling a distinct task. A read request flows as follows:
Client → Application Code (Python script or SQL) → In-Memory Cache (if cached, read directly from the cache; if not cached, proceed to the primary database) → Primary Database.
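The flow above can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the dictionaries `primary_db` and `cache`, the `index_jobs` queue, and the functions `read` and `write` are hypothetical in-process stand-ins for a real database, an in-memory cache such as Redis, and a message queue. Populating the cache after a miss and invalidating it on write are common choices assumed here, not something the flow above prescribes.

```python
import queue

# Hypothetical in-process stand-ins for the real components (assumptions).
primary_db = {"user:1": {"name": "Ada"}}  # primary database: the source of truth
cache = {}                                 # stands in for an in-memory cache such as Redis
index_jobs = queue.Queue()                 # stands in for a message queue


def read(key):
    """Try the cache first; on a miss, fall back to the primary database."""
    if key in cache:
        return cache[key], "cache"
    value = primary_db.get(key)
    if value is not None:
        cache[key] = value  # assumption: populate the cache for the next read
    return value, "database"


def write(key, value):
    """Write to the source of truth, then coordinate the other components."""
    primary_db[key] = value
    cache.pop(key, None)  # assumption: invalidate the stale cached copy
    index_jobs.put(key)   # defer search re-indexing so the request is not slowed down


first = read("user:1")   # cache miss, served by the primary database
second = read("user:1")  # cache hit
write("user:1", {"name": "Grace"})
third = read("user:1")   # cache was invalidated, so the database serves this read
```

Here the application code is the only piece that knows about all three components, which is the "stitching layer" role described above. In a real system a separate worker would consume `index_jobs` and update the full-text index.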

