... Royals, Nobles - Short, Tall - Upper Class, Lower Class - Green, Brown - Democrat, Republican ...
For centuries, society has developed labels, adjectives, and monikers to align different groups of people to different categories. These categories have been used for good, for bad, and with indifference, but no matter the intent, the purpose has always remained the same: contextual organization. This method of organization extends beyond people to all corners of the world, from farmers tagging cattle, to land developers tagging zones, even down to the core of human anatomy with genome and DNA tagging. While the methodology of each may be different, the ultimate purpose is the same: organization.
This article will start with a very brief retrospective on data. We will explore the importance of adding context to our data through a set of labels more commonly referred to as tags, similar to how a farmer would tag his cattle.
In as little as 30 years, technology has rapidly evolved, from mainframes to quantum computing and from physical architecture to the almighty cloud, all of which has led to a phenomenon known as Big Data. With trillions of bytes of data being produced every day, we need a way to organize the data, a way to label it and, most importantly, a way to tag it.
The concept of data tagging isn't new. As early as the late 1970s, the International Press Telecommunications Council (IPTC) defined a set of standards for descriptors that were added to images, more commonly known as metadata. Just like tags on a cow, metadata became the tag by which researchers could sift through digital data, building patterns, analyzing security, and detecting anomalies.
I have spent over 10 years in IT. While that doesn't sound very long in a professional context, within the technology era those 10 years have spanned multiple generations. My career has been spent with great companies, all sharing a common problem: an abundance of data, but no common tagging of the herd.
It would be unfair to say that there wasn't any organization of data. Each team or silo had its own mechanisms for data tagging, whether through an application or product, or stored within a database or even a cross-enterprise collaboration tool. These tags were useful, helping build correlations between data and the business, but they were unfortunately stuck in these small silos, each with its own naming conventions, requirements, and terminology.
Most of my career has been spent in site reliability engineering (SRE), where we spend the majority of our work hours sifting through logs, metrics, and various other forms of observability data. The goal is to find measurable ways to help guarantee the health of the business and its critical services. At any given time, terabytes of data flow across the business landscape, in hopes that someone is listening, watching, and learning. Historically, that audience has been the developers themselves. Who better to understand the data than those who helped create it, or create its underlying vehicle? However, that landscape has changed over time. The audience has become much broader: it is now the business, the SREs, the data scientists, operations, even customer service.
As the data landscape has grown, so has another set of tools and practices: automation and orchestration. We can take data and entrust code and logic to perform predefined sets of tasks based on the look and feel of the data. The pattern here is that our data turns into insights, insights into actions, and actions into business value, all of which live and die by one core concept: data and its organization.
The journey to this data organization doesn't come easy, especially for companies that have been around for decades and are surviving on existing practices. This journey needs to start as soon as it possibly can for the sake of the greater good of the business and its
Ultimately, we must treat our data the way a farmer would when hand-selecting cattle to introduce to the farm. A farmer would have to consider: Where did they come from? Who do they belong to? What was their purpose? What environment were they bred in? Who can I call for some history if there is a problem? These questions are asked to gain a sense of understanding and history, and to help profile what we are introducing. The same questions should be asked when we look at data: Where did it come from? Who owns it? What is its purpose? What environment does it belong to? Who supports the data? These simple questions, and their answers, ultimately become tags that belong in an enterprise-managed card catalog.
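To make the idea concrete, here is a minimal sketch of how those five questions might become a base set of tags attached to a piece of data. The field names (source, owner, purpose, environment, support), the helper tag_record, and the sample values are all hypothetical, chosen only to mirror the questions above; your enterprise catalog would define its own.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BaseTags:
    """Hypothetical base tag set: one field per question asked of the data."""
    source: str       # Where did it come from?
    owner: str        # Who owns it?
    purpose: str      # What is its purpose?
    environment: str  # What environment does it belong to?
    support: str      # Who supports the data?


def tag_record(payload: dict, tags: BaseTags) -> dict:
    """Return a copy of the payload with the base tags attached under 'tags'."""
    return {**payload, "tags": asdict(tags)}


# Illustrative values only; none of these names come from a real system.
tags = BaseTags(
    source="checkout-service",
    owner="payments-team",
    purpose="order-audit",
    environment="production",
    support="payments-oncall",
)
record = tag_record({"message": "order created"}, tags)
print(record["tags"]["owner"])
```

Making the tag set a frozen dataclass means a record cannot be created with one of the five answers left out, and the answers cannot be quietly changed afterwards.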
Realistically, data doesn't magically appear within our streams. It is produced by something, be it a service, a platform, a product, or a person. Just as a farmer would expect to be handed the history of their cattle at purchase, we want to make sure that our data carries the answers to our questions as it travels through our information pipeline. We must make sure that the source tools incorporate tagging concepts to the best of their ability, and that they follow standards.
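One way to check that data arrives with its history is to inspect each record as it enters the pipeline and set aside anything missing an answer. The sketch below assumes records are dictionaries carrying a "tags" mapping; the required tag names and the functions missing_tags and split_by_tagging are hypothetical and stand in for whatever your own standard defines.

```python
# Hypothetical required tags, one per question asked of the data.
REQUIRED_TAGS = ("source", "owner", "purpose", "environment", "support")


def missing_tags(record):
    """List the required tags that are absent or blank on a record."""
    tags = record.get("tags") or {}
    missing = []
    for name in REQUIRED_TAGS:
        value = tags.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def split_by_tagging(records):
    """Separate fully tagged records from those that need follow-up."""
    tagged, quarantined = [], []
    for record in records:
        (quarantined if missing_tags(record) else tagged).append(record)
    return tagged, quarantined


# Illustrative records only.
good = {
    "message": "disk usage 81%",
    "tags": {
        "source": "node-exporter",
        "owner": "platform-team",
        "purpose": "capacity",
        "environment": "production",
        "support": "platform-oncall",
    },
}
bad = {"message": "job finished", "tags": {"source": "legacy-batch", "owner": " "}}

tagged, quarantined = split_by_tagging([good, bad])
print(missing_tags(bad))
```

Whether untagged data is rejected, quarantined, or only reported is a policy decision; quarantining is shown here simply because it keeps the data while making the gap visible.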
There isn't a one-size-fits-all set of tags that can be used across an enterprise. This is important to understand, because trying to convince the powers that be that there is one will surely end in a failed effort. Instead, we need to consider tagging as an architecture composed of pillars, all rising from a common base and each representing a strategic organizational function, such as Security, Infrastructure, Application, and Organization.
The base is the most critical component, as it is the piece that supports all pillars. Expect it to be the most controversial, since it defines the tags that are central to the business and demands the most care over what is included. In my experience, this is where you will encounter debates on what belongs and what doesn't.
This process is important because if the base set of tags isn't strong enough, the entire architecture will collapse on itself. The more accurate our base tags are, the faster we can answer the most common questions about the business's digital infrastructure, like those previously mentioned: Where did it come from? Who owns it? What is its purpose? What environment does it belong to? Who supports the data?
As we stand up pillars in our ever-growing farm, we must ensure that each pillar is given attention with respect to its purpose. While we can try to construct each piece of our farm ourselves, we must be cognizant that we cannot all be masters of every domain. We must trust that while a core set of architects watches, cares for, and feeds our base, another set of architects can focus on each of our pillars. If you have a pillar for security, with tags specific to data types focused on things like data restrictions, you must ensure that you have a specialist to consult, one with a core understanding of the business needs with respect to security.
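The base-and-pillars architecture can be sketched as a shared set of required base tags plus an optional, separately owned vocabulary per pillar. Everything named below is an assumption for illustration: the base tag names, the two example pillars and their tags, the "pillar.tag" naming scheme, and the build_tags helper.

```python
# Hypothetical base tags, required on everything.
BASE_TAG_NAMES = {"source", "owner", "purpose", "environment", "support"}

# Hypothetical pillar vocabularies, each maintained by its own specialists.
PILLAR_TAG_NAMES = {
    "security": {"data_classification", "retention"},
    "infrastructure": {"region", "cluster"},
}


def build_tags(base, pillars):
    """Merge base tags with pillar tags, prefixing pillar tags by pillar name."""
    missing = BASE_TAG_NAMES - set(base)
    if missing:
        raise ValueError(f"missing base tags: {sorted(missing)}")
    tags = dict(base)
    for pillar, pillar_tags in pillars.items():
        allowed = PILLAR_TAG_NAMES.get(pillar)
        if allowed is None:
            raise ValueError(f"unknown pillar: {pillar}")
        unknown = set(pillar_tags) - allowed
        if unknown:
            raise ValueError(f"unknown {pillar} tags: {sorted(unknown)}")
        for name, value in pillar_tags.items():
            tags[f"{pillar}.{name}"] = value
    return tags


# Illustrative values only.
base = {
    "source": "checkout-service",
    "owner": "payments-team",
    "purpose": "order-audit",
    "environment": "production",
    "support": "payments-oncall",
}
tags = build_tags(base, {"security": {"data_classification": "restricted"}})
print(tags["security.data_classification"])
```

Prefixing pillar tags with the pillar name keeps each vocabulary from colliding with the base or with other pillars, so the security specialists can evolve their tags without renegotiating the base.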
Similar to how a farm has farmhands to help maintain the property, we will want to build a community of practice that can keep track of this entire data landscape. The purpose of this community will be to provide a platform for consumers to voice requirements and gain an understanding of our topology.
Consistency is important in all aspects of business, and that holds true right down to the data. Typically driven by standards, this consistency ensures that the proper practices are followed and that they continue to align with our original goal of organization. Just as a farmer's tags have the same look and feel, data tags are only as valuable as the standards behind them.
When developing your tagging strategy, consider which standards you will want to adopt across the enterprise. One of the most overlooked standards is how tags are spelled, and in many cases tag names are case sensitive. Consider the following examples of how, without standards, tag sprawl can undermine the overall goal:
nonproduction, Nonproduction, nonProduction, NonProduction, non-production
name, Name, NAME
Setting consistency rules up front will help prevent sprawl like that shown above, and will help new adopters conform.
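As a rough illustration of such a rule, the sketch below reduces every tag to one canonical spelling and reports groups of variants that collapse to the same tag. The convention used here (lowercase, with whitespace, hyphens, and underscores removed) is only an assumption; what matters is that one rule is chosen and applied everywhere. The function names canonical and find_sprawl are hypothetical.

```python
import re
from collections import defaultdict


def canonical(tag):
    """Apply an assumed convention: lowercase, no whitespace, hyphens or underscores."""
    return re.sub(r"[\s_-]+", "", tag).lower()


def find_sprawl(tags):
    """Group observed spellings by canonical form; keep groups with several variants."""
    groups = defaultdict(set)
    for tag in tags:
        groups[canonical(tag)].add(tag)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# The spellings from the examples above, plus one consistently spelled tag.
observed = [
    "nonproduction ", "Nonproduction", "nonProduction", "NonProduction",
    "non-production", "name", "Name", "NAME", "owner",
]
sprawl = find_sprawl(observed)
for key, variants in sprawl.items():
    print(key, variants)
```

A report like this is useful before enforcing anything, since it shows how far existing tags have drifted and which teams will need to adjust.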
Hopefully, by this point you have been able to follow along as we covered tagging, consistency, and data organization. Staying true to the title of the article, we have taken our herd of data cattle and humanely tagged them, finding them a home in a pillar-based farm structure, where they will carry on life with a true identity, not defined merely by an array of bytes.
At this point, we have set the structure that will help data consumers (visitors to our farm, if we are still continuing our analogy) start to build metrics, automation, even cost allocations based on our tags. Our data now has a persona and an identity, as well as a history. Our data has taken its step towards our goal: organization.
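As one small example of what consumers can build once tags are in place, here is a sketch of a cost allocation that sums a cost figure per tag value. The record shape, the "cost" field, the "untagged" bucket, and the allocate_cost function are assumptions made for illustration, and the figures are invented.

```python
from collections import defaultdict


def allocate_cost(records, tag_name):
    """Sum the 'cost' of each record by the value of one tag."""
    totals = defaultdict(float)
    for record in records:
        key = record.get("tags", {}).get(tag_name, "untagged")
        totals[key] += record["cost"]
    return dict(totals)


# Invented usage records; costs are in arbitrary units.
usage = [
    {"cost": 10.0, "tags": {"owner": "payments-team", "environment": "production"}},
    {"cost": 5.0, "tags": {"owner": "payments-team", "environment": "nonproduction"}},
    {"cost": 4.0, "tags": {"owner": "search-team", "environment": "production"}},
    {"cost": 2.5, "tags": {}},
]

by_owner = allocate_cost(usage, "owner")
by_environment = allocate_cost(usage, "environment")
print(by_owner)
print(by_environment)
```

The same grouping works for any tag, which is the payoff of a consistent base: the question changes from "whose is this?" to "which tag do I group by?".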