DEV Community

Pankaj

Template for design document of Apache Spark project

In an Apache Spark based data engineering/analytics project, what would a design document template look like?

Of course, the answer depends on the business and project requirements.

But would the design document "template" cover the following aspects?

Am I missing something?

My current list of aspects in the design template:

(1) Tables/views to be created (if any) in the source system to facilitate my project's pipeline(s). For some of the pipelines, a Kafka topic is the source.

(2) Pipeline: schema of the data, estimated data volume per call, format (CSV etc.), Kafka topic, frequency of pulling data from the source system (daily, weekly etc.), whether the data is pulled on demand, on a schedule, or in response to an event, connectivity etc.

(3) What kind of data objects will be created to persist the data in the data lake?

(4) High-level statement of all code changes, config changes, and data changes (including data movement).

(5) Which design standards/best practices are being followed? Critical design decisions made to optimize/improve pipeline performance.

(6) Which regulatory compliance standards are being applied, and how?

(7) Which aggregation objects/views are to be created so that data and analytics reports can be served.
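To make item (2) a bit more concrete, here is a minimal sketch of how one pipeline's entry in the template could be captured as a structured record and checked for gaps before review. It is plain Python with no Spark dependency. Every name in it (`PipelineSpec`, its fields, the trigger values, the `orders_ingest` example) is a hypothetical choice for illustration, not part of any Spark or Kafka API.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# Assumed vocabulary for "pulled as needed, per a schedule, or based on an event".
ALLOWED_TRIGGERS = {"on_demand", "schedule", "event"}


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical record for one pipeline entry in the design document."""
    name: str
    source: str                        # e.g. a source table/view name, or "kafka"
    data_format: str                   # e.g. "csv", "json"
    schema: Dict[str, str]             # column name -> type name
    trigger: str                       # one of ALLOWED_TRIGGERS
    est_volume_mb_per_call: float
    kafka_topic: Optional[str] = None  # only meaningful when source is "kafka"
    frequency: Optional[str] = None    # e.g. "daily"; only for scheduled pulls

    def validate(self) -> List[str]:
        """Return a list of gaps in the entry; an empty list means complete."""
        problems = []
        if not self.schema:
            problems.append("schema is empty")
        if self.trigger not in ALLOWED_TRIGGERS:
            problems.append(f"unknown trigger: {self.trigger}")
        if self.source == "kafka" and not self.kafka_topic:
            problems.append("kafka source needs a topic")
        if self.trigger == "schedule" and not self.frequency:
            problems.append("scheduled pipeline needs a frequency")
        return problems

    def to_section(self) -> str:
        """Render the entry as a plain-text section for the design document."""
        lines = [
            f"Pipeline: {self.name}",
            f"Source: {self.source}",
            f"Format: {self.data_format}",
            f"Trigger: {self.trigger}",
            f"Estimated volume per call (MB): {self.est_volume_mb_per_call}",
        ]
        if self.kafka_topic:
            lines.append(f"Kafka topic: {self.kafka_topic}")
        if self.frequency:
            lines.append(f"Frequency: {self.frequency}")
        lines.append("Schema:")
        lines.extend(f"  {col}: {typ}" for col, typ in self.schema.items())
        return "\n".join(lines)


# Illustrative entry only; the pipeline name, topic, and columns are made up.
orders = PipelineSpec(
    name="orders_ingest",
    source="kafka",
    data_format="json",
    schema={"order_id": "string", "amount": "double"},
    trigger="event",
    est_volume_mb_per_call=5.0,
    kafka_topic="orders",
)

# A deliberately incomplete variant: scheduled, but no frequency and no topic.
incomplete = replace(orders, trigger="schedule", kafka_topic=None)
```

Calling `orders.validate()` returns an empty list, while `incomplete.validate()` reports the missing topic and frequency. The same idea could be extended to the other items in the list, for example a record per data-lake object for item (3).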
