William Antonio Guzmán Bernal for AWS Community Builders

Posted on Jan 6, 2025 • Edited on Jan 10, 2025

Data Governance… when there's no Data nor Governance

#aws #data #governance #cloudcomputing

In 2006, during the Association of National Advertisers Conference, the following phrase was used for the first time: "Data is the new oil" (Clive Humby).


Data is the new oil (Clive Humby)

It was the mathematician and data scientist Clive Humby who 'coined' this phrase to express that data and oil share several key characteristics (Data as The New Oil Is Not Enough: Four Principles For Avoiding Data Fires). Both have high value (giving companies significant competitive advantages), serve as raw material (which must be processed and analyzed), drive growth (helping the digital revolution), are ubiquitous (being crucial in several industries) and, last but not least, carry risks and considerations (privacy, security, monopolization, among others).

In an ideal scenario, we could assume that companies are just starting their data projects and that we can propose everything from scratch, following the best practices available. But what should we do when they already have data and projects are already underway? They may not even have visibility over all the data and information they are generating and, therefore, no adequate way to manage access to it.

They probably need our support to get the most out of their data, in the most cost-efficient way possible and with constant optimization. With this objective in mind, a first step is to identify which industry or vertical they belong to. To make this classification, remember that every company starts with one or more inputs, which may or may not undergo processes to generate products that are ultimately purchased by customers or end users. With these elements in common, we could classify companies into the following categories (Manufacturing Supply Chains Explained | NetSuite): Products / Services, Suppliers and Vendors, Raw Materials, Production, Storage and Warehousing, Distribution / Transportation, Retailers, Maintenance and Repair, or Recycling.

At this point, we should be able to clearly identify which industry or vertical the company we want to support belongs to. The next step is to confirm whether the company has full visibility of its data. In other words, does the company know what data it has, what it needs to monitor, and what it should be leveraging? To address these concerns, we could look for data in reports, dashboards, metrics, Key Performance Indicators (KPIs), notifications/alerts, and corporate objectives.

Our first point of contact could be the stakeholders or BI analysts within the company. Why them? Because they usually have access to the already processed data, and they either make the decisions or support those who make them, so that every decision is as informed as possible.

In some cases, this data access is done through BI tools (such as Amazon QuickSight) or through SQL queries (using services such as Amazon Athena).


Data Consumption Reference Architecture

But we can't just ask them, as such a conversation wouldn't be the most 'friendly' at the time. As consultants we should avoid conversations like this:

Consultant: What dashboards do we have?
BI Analyst/Stakeholder: Why? What do you need?

Remember that we cannot find the right answers if we ask the wrong questions. Instead of having the above conversation, we could have this conversation:

Consultant: I would like to help improve/optimize our processes. Where can I find information to get started? Especially related to the data we collect/process. Where should I look first?
BI Analyst/Stakeholder:
a) You should ask …
b) You can't, because for that you would have to belong to the team/area of…
c) You need authorization from…
d) You can check our Intranet, check this page….
Consultant: Ok, thank you very much.

We start with a friendly approach and put ourselves in the other person's shoes. We know that we are doing something for the good of the company, but we do not want to be seen as a risk or danger to anyone. In some circumstances or companies, this type of question can generate distrust, and we need to earn the trust of the work team. In addition, as in the last example, the BI analyst or stakeholder could be guiding us (in a natural and fluid way) to one or several of the sponsors that we need for our work.

In other cases, the answers to this first conversation could lead us directly to the potential owner(s) of the existing data, which could be IT, support or maintenance personnel. We could ask them which protocols or compliance requirements are already implemented in the company, whether user profiling exists, or even whether they have a Cloud Center of Excellence (AWS Cloud Center of Excellence, The CCoE tenets - AWS Prescriptive Guidance).

At this point it would be important to pause. What would happen if someone asked you NOT to think about a red box?

VERY likely the first thing that will come to mind is precisely the red box. This is because our brain reacts to negation differently than you would think (How Does “Not” Affect What We Understand? Scientists Find Negation Mitigates Our Interpretation of Phrases). By telling someone “NOT to think about a red box,” the brain is actually mitigating that thought, rather than reversing it. In other words, our brain would think about a red box, only smaller. For this reason, it is important to be careful about how we communicate with other people (“Don’t think about the costs…” vs. “Think about the benefits…”) and even with ourselves (“I am going to learn/start/do…” vs. “I am learning/starting/doing…”).

We can start to guess that we are going to have quite a few interactions with the client's team. So, let's get ready for meetings!

Some points to consider when having meetings:

Listen to understand, not to respond. Also, listen with your eyes (establish and maintain respectful eye contact).
Take notes (this can be on sheets of paper), but do not take dictation, i.e. do not type on a keyboard, as this can sometimes cause distractions and give the client the feeling that they are not being listened to properly. In this regard, it is best to ask for permission to record the meeting.
Get as much context as possible before starting the meeting (who the client is, what they are about, what their products or services are, their potential competitors…)
During the course of the meeting, identify whether the meeting attendees have an administrative, commercial/sales, or technical/development profile (or even a combination of these).
For each attendee, also identify their knowledge of the project, business, needs or requirements; their clarity of objectives; and their technical level.
BANT (Budget, Authority, Needs, Timing)

where we can have:

Who = C-Level Executives; Product Owners; Project Managers; Business Analysts; External Dependencies; Sales Representatives; Developers; Designers; Architects
Why = Company-level goals (H1, H2, Q1…); Optimization, Automation, or Improvements; Success Criteria; Metrics or KPIs; SLAs

If we have the “Who” and the “Why”, we will organically obtain the “What” for each of the participants. The “How” will come later.

One mistake to avoid when trying to identify data (and its corresponding governance) is postponing contact with the purely technical team. This could create 'resistance or refusal' to collaborate with our goal of analyzing data that may require more detailed control. For this reason, it is important to involve the IT team in charge of observability, maintenance, access and security of the platform we are analyzing. Some of the AWS services that the IT team may be using for these tasks are IAM, CloudWatch, EventBridge and SNS.


Reference Architecture IT Team

If we want to find out what data may be overlooked, we should answer these questions: Are there standard metrics in that industry? Are there standard KPIs? Here we can rely on institutions, organizations and the best practices of each industry, among others. Some examples, depending on the industry:

Financial (15 Essential Financial Metrics for Advisors, Consultants, and Investors)
Healthcare (30 Healthcare KPIs & Metrics To Start Tracking Today)
Software (7 SaaS Metrics Every SaaS Company Should Care About)
Security (18 Data Security Metrics & KPIs You Need To Track)

The goal is to find out which metrics are not being considered by the client.
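As a minimal sketch of that comparison, assuming we have already collected an industry baseline and the list of metrics the client tracks, a set difference is enough to surface the gaps. The baseline contents, the metric names and the `missing_metrics` helper below are hypothetical placeholders, not taken from the sources listed above.

```python
# Hypothetical baseline: in practice this would be compiled from the
# industry references the consultant trusts for each vertical.
INDUSTRY_BASELINE = {
    "software": {
        "churn_rate",
        "monthly_recurring_revenue",
        "customer_acquisition_cost",
        "customer_lifetime_value",
    },
}


def normalize(name: str) -> str:
    """Lower-case a metric name and unify separators so names compare equal."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def missing_metrics(industry: str, tracked: list[str]) -> list[str]:
    """Return baseline metrics for the industry that the client does not track."""
    baseline = INDUSTRY_BASELINE[industry]
    tracked_normalized = {normalize(metric) for metric in tracked}
    return sorted(baseline - tracked_normalized)


# Metrics found in the client's dashboards and reports (illustrative values).
client_tracked = ["Churn Rate", "Monthly Recurring Revenue"]
gaps = missing_metrics("software", client_tracked)
print(gaps)
```

The resulting list is only a conversation starter: a metric that is absent may be irrelevant to this client, so each gap should still be validated with the stakeholders.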

Before you start ‘touching the data’, it is worth asking yourself the following questions related to Security (a topic that should be included from the ‘initial moment’). Investigate whether there are security protocols and, if they do not exist, propose one based on:

What data is handled in the organization?
Why does this data exist?
Who owns the data? Who should own it?
Who needs access to what data?
Who should authorize that access?
How long should this access be granted?
How should this access be authorized (email, ticket...)?
Where should the data be stored?
Are there compliance laws or regulations applied to that data (local, national or global)?
What are the actions to be taken in the event of a data leak or breach?
Are there data backup or retention policies?

This is an excellent time to create a flow chart of the lifecycle of that data explaining the protocol created, and to organize meetings to inform the appropriate people. Again, involving people in data governance management shows them due respect and consideration in the decisions to be made.
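The answers to the questions above could be captured in a small, structured record per dataset, which makes the draft protocol easier to review and to turn into a flow chart. The following is a sketch under assumptions: the class names, fields, dataset, team names and ticket references are all hypothetical, and a real protocol would likely live in a catalog or ticketing tool rather than in code.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class AccessGrant:
    """Who has access, who authorized it, how, and for how long."""
    grantee: str
    approved_by: str
    request_ref: str      # hypothetical ticket or email reference
    granted_on: date
    duration_days: int

    def expires_on(self) -> date:
        return self.granted_on + timedelta(days=self.duration_days)

    def is_active(self, today: date) -> bool:
        return self.granted_on <= today < self.expires_on()


@dataclass
class DatasetRecord:
    """One entry of the draft protocol: what the data is, why it exists, who owns it."""
    name: str
    purpose: str
    owner: str
    storage_location: str
    regulations: list[str] = field(default_factory=list)
    retention_days: Optional[int] = None
    grants: list[AccessGrant] = field(default_factory=list)

    def active_grantees(self, today: date) -> list[str]:
        """List the people or teams whose access is still valid today."""
        return sorted(g.grantee for g in self.grants if g.is_active(today))


# Illustrative values only.
orders = DatasetRecord(
    name="orders",
    purpose="Monthly sales reporting",
    owner="sales-ops",
    storage_location="s3://example-bucket/orders/",
    regulations=["GDPR"],
    retention_days=365,
    grants=[
        AccessGrant("bi-analyst", "sales-ops", "TICKET-0001", date(2025, 1, 6), 30),
        AccessGrant("contractor", "sales-ops", "TICKET-0002", date(2024, 11, 1), 30),
    ],
)
print(orders.active_grantees(date(2025, 1, 10)))
```

Giving every grant an explicit duration means expired access shows up as a finding on its own, instead of depending on someone remembering to revoke it.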

Once this ‘draft data security protocol’ has been drawn up, the analysis of the data flow (from right to left) can continue.


Reference Architecture Security

It is always good to remember that any architecture to be deployed in AWS should be guided by the key concepts, design principles and architectural best practices contained in the AWS Well-Architected Framework, which includes the following pillars:

Operational excellence
Security
Reliability
Performance efficiency
Cost optimization
Sustainability

At this point we can introduce the concept of Governance (Management and Governance on AWS) as “the ability to implement executive board policies that your AWS cloud environment must comply with” (Guidance for Governance on AWS). The power of AWS is that this concept is not 'bound' or 'restricted' to a particular point or area, but can be applied “across the AWS ecosystem, including accounts, infrastructure, and environments owned and operated in the AWS Cloud” (Cloud Governance).


Reference Architecture Governance

Now it’s time to give the ‘official welcome’ to the Technical and Development Area!

Let's remember that we must preserve the essence of teamwork:

stakeholders,
IT team,
security,
governance
and now, development

By understanding who each of these areas is, and why they should be involved in data governance, we can understand what each area really does and then analyze the best way to actively involve them in generating Data Governance.

Ask each area how things are now (“As Is”), whether there are differences with respect to the goals that should already have been met by this time (“To Be”), and how they think things should be (“Should Be”).

This analysis will probably lead to definitions of technology stacks, dependencies (both internal and external), possible automations, documentation protocols and testing. Remember that “Documentation is a love letter that you write to your future selves.”

Also keep in mind this simple 'formula':

Monitoring + Alerts + Responses = Sleep Well

This goes hand in hand with confirming whether there are disaster recovery plans.
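To make the 'formula' concrete, here is a minimal sketch in plain Python, assuming each monitored metric has a threshold (the alert) and an agreed action (the response). It does not call any AWS API; the rule names, thresholds and the `notify_on_call` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """An alert threshold on a monitored metric, plus the response to run."""
    metric: str
    threshold: float
    response: Callable[[str, float], str]


def notify_on_call(metric: str, value: float) -> str:
    """Hypothetical response: describe the notification that would be sent."""
    return f"notify on-call: {metric}={value}"


def evaluate(rules: list[AlertRule], readings: dict[str, float]) -> list[str]:
    """Compare the latest readings against every rule and collect the actions."""
    actions = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is None:
            # A rule without data means the monitoring itself is missing.
            actions.append(f"no data for {rule.metric}")
        elif value > rule.threshold:
            actions.append(rule.response(rule.metric, value))
    return actions


rules = [
    AlertRule("failed_ingestion_jobs", 0, notify_on_call),
    AlertRule("pipeline_latency_seconds", 300, notify_on_call),
    AlertRule("storage_used_pct", 80, notify_on_call),
]
readings = {"failed_ingestion_jobs": 2, "pipeline_latency_seconds": 120}

actions = evaluate(rules, readings)
for action in actions:
    print(action)
```

Treating a missing reading as a finding is a deliberate choice here: an alert that can never fire because nothing is being measured gives a false sense of safety.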

We should also confirm whether there are corporate coding standards, best practices or patterns to follow, and whether internal talks (chapters) are held to share knowledge.

If at any point we find that something still needs to be implemented or created, we should propose it if possible. To summarize in one sentence:

Be discreetly disruptive!

Continuing with our analysis from ‘right to left’, we could now ask ourselves this question: where are we going to get (or are we already getting) our data from?

For this, there is a set of representative AWS services to take data from different sources, which could be ‘cataloged’ in a data ingestion layer.


Reference Architecture - Data Ingestion

Let's look at which of these services (or similar ones) are currently in use, which data sources they are connected to, and which data is NOT being brought to consumption by stakeholders or IT. We may be looking at useful, unused data.
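That check can be sketched as a comparison between what the ingestion layer brings in and what the consumption layer actually reads. This is a minimal illustration, assuming we have already inventoried both sides by hand or from service metadata; the dataset names, source labels and the `find_orphaned` helper are hypothetical.

```python
def find_orphaned(
    ingested: dict[str, str],
    consumed: dict[str, set[str]],
) -> dict[str, str]:
    """Return ingested datasets (name -> source) that no consumer references."""
    referenced = set().union(*consumed.values())
    return {
        name: source
        for name, source in ingested.items()
        if name not in referenced
    }


# Inventory of the ingestion layer: dataset name -> how it arrives.
ingested = {
    "web_clickstream": "streaming",
    "crm_contacts": "database migration",
    "erp_invoices": "file transfer",
}

# Inventory of the consumption layer: dashboard or report -> datasets it reads.
consumed = {
    "sales_dashboard": {"crm_contacts", "erp_invoices"},
    "finance_report": {"erp_invoices"},
}

orphans = find_orphaned(ingested, consumed)
print(orphans)
```

Keeping the source next to each dataset name helps when we go back to the IT team to ask why that data is being collected and who originally requested it.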

When we find this 'orphaned' data (it is being ingested but does not have a 'parent' in the consumption area), we arrive at the last layer we are going to review: the data processing area.


Reference Architecture - Data Processing

If we wanted to summarize the mental process for generating data governance in companies that do not currently have it, involving data that may have gone unnoticed until now, we could say that we need to identify:

data
stakeholders
BI components

already existing, to then propose:

missing BI components
required protocols
automation / optimization

In the course of this analysis, the most important thing is to always remember that we are dealing with a human factor, susceptible to errors, feelings, and personal and group objectives. With the best possible intention and will, always have empathy for each and every person who gives us their time, experience and advice, so that we can build an effective and useful Data Governance for the company.
