Ivica Kolenkaš

Data and analytics reimagined - platform architecture

The previous article in the series introduces the business, its data-related challenges and a vision of a data platform to tackle them.

This article goes over the architecture of the platform in its most basic form - arrows and rectangles.


A business analyst trying to answer "How many white shirts do I need?" will have their work made easier by having all the relevant sales data in a uniform shape and in the same place. In reality, the sales data from previous years is fragmented across several data silos.

Relevant data - wish vs. reality

These data silos will have differently shaped data (databases, files, or worse), their security practices (if any) will be different and ownership may or may not be known.

Having all the data be uniform and in the same place is a hard nut to crack for a large organization with highly autonomous teams. A good alternative is to have the data in a similar-enough place, and uniform-enough shape so that it appears "the same".

To get it into a similar-enough place and a uniform-enough shape, we put a fence around it and call the result a data domain, much like the containerization of shipped goods and software products.

Data domains

"What is a data domain on our platform?" is a question that has a different answer depending on who you ask. For a data engineer, a data domain is a grouping of similar data; for example, all sales data for the wholesale sales channel belongs to a WHOLESALE data domain.

A security specialist would argue that a data domain is a security boundary, while a data platform engineer will say that a data domain is a collection of - spoiler alert - AzureAD groups and Snowflake objects.

All three are correct; a data domain is a grouping of data that belongs together, forms a security boundary, and is formed by, in our case, Snowflake objects and AzureAD groups.
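To make the three views concrete, here is a minimal Python sketch of a data domain as a single object. It is only an illustration, not how the platform is implemented: the class, its fields and every object and group name except WHOLESALE are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDomain:
    """One data domain, seen from all three perspectives at once."""

    name: str                                # data engineer: a grouping of similar data
    snowflake_objects: tuple[str, ...] = ()  # platform engineer: objects inside the fence
    azure_ad_groups: tuple[str, ...] = ()    # platform engineer: groups guarding the fence

    def contains(self, object_name: str) -> bool:
        """Security specialist: is this object inside the domain boundary?"""
        return object_name in self.snowflake_objects


# All object and group names below are hypothetical.
wholesale = DataDomain(
    name="WHOLESALE",
    snowflake_objects=("WHOLESALE.RAW", "WHOLESALE.PRODUCTS"),
    azure_ad_groups=("wholesale-readers", "wholesale-writers"),
)

print(wholesale.contains("WHOLESALE.PRODUCTS"))  # True
print(wholesale.contains("RETAIL.PRODUCTS"))     # False
```

The point of the sketch is that the same object answers all three questions: what data belongs together, where the boundary is, and which concrete objects and groups form it.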

Relevant data in a similar-enough place; a data domain

Encapsulating data in a data domain helps us tackle the three main challenges of the existing data landscape by establishing:

  • clear ownership
  • defined security guidelines
  • defined shape of data

Data domain ownership

Each data domain must have an owner and a self-sufficient data domain team behind it.

Being self-sufficient means that they fully own and manage the data lifecycle and the domain. Ingesting raw data, transforming it, and serving it as data products are entirely up to them.

"Want to make another data product?" Sure. "Want to delete all of them?" Absolutely.

Ownership over a data domain can be split into two:

  • business (administrative) ownership
  • technical ownership

Both types of owners share a right and a responsibility: to manage access to the data they own while being aware of any sensitive data. They are expected to reject data access requests that do not conform to the rules.

The main difference is that a business owner manages access for people to perform data exploration, while the technical owner will manage it for automated processes using code (more on that in the next article!).

For everyone outside of the domain, domain owners serve as a point of contact regarding the data they own.

Security guidelines

One selling point of our data platform, our curated collection of tools, standards and processes, is exactly that - standards.

Applying the same security standards to every data domain makes accessing the data sets in those domains a seamless experience. No matter where the data set you need is located, getting to it is technically the same.

For people accessing data, this means having one of the standardized domain roles that grant read or write permissions.

For machines (think scheduled jobs, automated processes etc.) this means having a service principal that authenticates with a private key, among other things.
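The two access paths above can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions: the `<DOMAIN>_<PERMISSION>` role naming convention, the `AccessGrant` type and the `"sso"` label for how people sign in are all hypothetical and not taken from the platform itself.

```python
from dataclasses import dataclass
from enum import Enum


class Permission(Enum):
    READ = "READ"
    WRITE = "WRITE"


def domain_role(domain: str, permission: Permission) -> str:
    """Build a role name using an assumed <DOMAIN>_<PERMISSION> convention."""
    return f"{domain.upper()}_{permission.value}"


@dataclass(frozen=True)
class AccessGrant:
    principal: str    # a person or a service principal
    role: str         # one of the standardized domain roles
    auth_method: str  # how the principal proves its identity


def grant_for_person(user: str, domain: str, permission: Permission) -> AccessGrant:
    # People receive a standardized domain role; "sso" is an assumed placeholder.
    return AccessGrant(user, domain_role(domain, permission), "sso")


def grant_for_machine(service_principal: str, domain: str, permission: Permission) -> AccessGrant:
    # Machines use a service principal that authenticates with a private key.
    return AccessGrant(service_principal, domain_role(domain, permission), "private_key")


analyst = grant_for_person("analyst@example.com", "wholesale", Permission.READ)
nightly_job = grant_for_machine("svc-nightly-load", "wholesale", Permission.WRITE)
print(analyst.role)             # WHOLESALE_READ
print(nightly_job.auth_method)  # private_key
```

Because the role name depends only on the domain and the permission, getting access looks the same in every domain, which is the whole idea behind the standard.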

Our security guidelines span all the systems and tools we offer on the platform. Deviations must have a very strong business case - after all, the platform should be the way but not in the way.

Shape of data products

This aspect of the platform is the trickiest to tackle from the platform perspective since the shape of data products is ultimately chosen by the owning data domain team.

We can, however, standardize on a few basic rules. The shape of a data product must be:

  • managed with code ("Why Git" from Atlassian)
  • described with a data contract
  • true to that data contract

Data products are meant to be used, and they provide an interface. In the same way that you know how to operate a door by its interface - unless it's a Norman door - you should know how to use a data product by seeing its interface, its contract.

Data domain teams are responsible for defining data contracts and making sure their data products adhere to them.
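What "true to that data contract" could mean in practice is sketched below in Python. This is a deliberately small illustration, not a real data contract format: `DataContract`, `violations`, the product name and the column names are all hypothetical, and a real contract would describe far more than column names and types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    product: str
    columns: dict[str, type]  # column name -> expected Python type


def violations(contract: DataContract, rows: list[dict]) -> list[str]:
    """List every way the given rows deviate from the contract."""
    problems = []
    for index, row in enumerate(rows):
        # Columns promised by the contract but absent from the row.
        for column in sorted(contract.columns.keys() - row.keys()):
            problems.append(f"row {index}: missing column '{column}'")
        # Columns present in the row but not described by the contract.
        for column in sorted(row.keys() - contract.columns.keys()):
            problems.append(f"row {index}: unexpected column '{column}'")
        # Columns present in both, but with the wrong type.
        for column, expected in contract.columns.items():
            if column in row and not isinstance(row[column], expected):
                problems.append(f"row {index}: '{column}' is not {expected.__name__}")
    return problems


# Hypothetical contract and sample rows for a wholesale data product.
contract = DataContract(
    product="wholesale_shirt_sales",
    columns={"sku": str, "color": str, "units_sold": int},
)
rows = [
    {"sku": "SH-001", "color": "white", "units_sold": 120},
    {"sku": "SH-002", "color": "white", "units_sold": "n/a"},
]
print(violations(contract, rows))  # ["row 1: 'units_sold' is not int"]
```

A check like this could run whenever a data product changes, so that consumers in other domains can rely on the contract instead of inspecting the data themselves.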

Data mesh

Data domains on their own are powerful, but their superpower is in their ability to interconnect.

Data mesh

A great definition of a data mesh architecture:

"A data mesh architecture is a decentralized approach that enables domain teams to perform cross-domain data analysis on their own. At its core is the domain with its responsible team and its operational and analytical data. The domain team ingests operational data and builds analytical data models as data products to perform their own analysis. It may also choose to publish data products with data contracts to serve other domains’ data needs." (Source)

Data domains are the nodes in this mesh. What makes it a proper mesh are the (data) connections between these domains.
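Nodes and connections map naturally onto a small directed graph. The Python sketch below is one possible way to picture it, assuming an edge means "consumes data products from"; the `DataMesh` class and the RETAIL and FORECASTING domains are invented for the example.

```python
class DataMesh:
    """Domains are nodes; 'consumes data products from' links are edges."""

    def __init__(self) -> None:
        self._domains: set[str] = set()
        self._consumes: dict[str, set[str]] = {}

    def add_domain(self, name: str) -> None:
        self._domains.add(name)
        self._consumes.setdefault(name, set())

    def connect(self, consumer: str, producer: str) -> None:
        """Record that `consumer` uses a data product published by `producer`."""
        for name in (consumer, producer):
            if name not in self._domains:
                raise ValueError(f"unknown data domain: {name}")
        self._consumes[consumer].add(producer)

    def sources_for(self, consumer: str) -> set[str]:
        """Every domain whose data reaches `consumer`, directly or indirectly."""
        seen: set[str] = set()
        stack = [consumer]
        while stack:
            current = stack.pop()
            for producer in self._consumes.get(current, set()):
                if producer not in seen:
                    seen.add(producer)
                    stack.append(producer)
        seen.discard(consumer)  # a cycle must not make a domain its own source
        return seen


mesh = DataMesh()
for domain in ("WHOLESALE", "RETAIL", "FORECASTING"):  # the last two are hypothetical
    mesh.add_domain(domain)
mesh.connect("FORECASTING", "WHOLESALE")
mesh.connect("FORECASTING", "RETAIL")

print(sorted(mesh.sources_for("FORECASTING")))  # ['RETAIL', 'WHOLESALE']
```

Without the `connect` calls this is just a list of isolated domains; the edges are what turn it into a mesh.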

A business analyst trying to answer "How many white shirts do I need?" now has a data mesh at their disposal: one made up of clearly defined data domains, each with an owner and with described and maintained data products that adhere to a contract and follow the same security guidelines.

Business analyst using a data mesh

They know who to contact regarding data access (they can even request it themselves!), and once that access is given, the data is stored in a uniform place and is secured in a uniform way.

By building data domains and by organizing them into a data mesh we have established a framework for organizing and connecting data on our platform. More importantly, we have set the groundwork for success and hugely improved our data landscape.


This article went over the base architecture of our platform and certain standards applied to objects in it.

But what are the tools of the trade? Which processes are standardized on the platform? The third article in the series goes over the tools chosen to make up the platform and to maintain it.
