RudderStack for RudderStack


Dogfooding at RudderStack: Tracking Plans Part 1

What are Tracking Plans, and why do you need one?

At RudderStack we talk a lot about the importance of owning your own data and the competitive advantage that comes from building robust analytics on complete data. Data trust is fundamental to that idea: to trust the data, you must trust the tools that provide it. That's why we built our Data Governance API and the new Tracking Plans feature, which we are getting ready to release in beta.

RudderStack Tracking Plans are the latest offering within our Data Governance API and have been one of our most requested features to date. Unlike RudderStack Transformations, which allow you to transform your data in flight, Tracking Plans allow you to plan and prescribe what your data should look like in the first place. Tracking Plans address three fundamental issues related to streaming data:

  1. Missing or improperly configured data breaks downstream SaaS applications and data warehouses. This causes problems like poorly executed automated campaigns and broken dashboards.

  2. Events and properties are poorly named or duplicated. This creates confusion and mismapping in downstream tools and data warehouses.

  3. Upstream data providers make changes that alter event streams. This leads to issues 1 and 2 above, with little to no advance notice or ability to fix them.

How Tracking Plans solve these issues

RudderStack Tracking Plans allow you to define the specific event names and properties for each of your track, group and identify calls. In addition, you can assign the type of data associated with each property or attribute and specify whether that property or attribute is required. The Tracking Plans API also supports versioning for better control of your data streaming.
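To make the idea concrete, here is a minimal sketch of what a plan holds: event names, the call each belongs to, and a type and required flag for every property, plus a version number. The class and field names (`PropertyRule`, `EventRule`, `TrackingPlan`) are hypothetical and chosen for illustration; they are not the format the Tracking Plans API uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyRule:
    """One property on an event (hypothetical structure, not the API format)."""
    name: str
    expected_type: type      # Python type the value should have
    required: bool = False


@dataclass
class EventRule:
    name: str
    call: str                # "track", "group" or "identify"
    properties: list[PropertyRule] = field(default_factory=list)


@dataclass
class TrackingPlan:
    name: str
    version: int
    events: dict[str, EventRule] = field(default_factory=dict)

    def add(self, event: EventRule) -> None:
        self.events[event.name] = event

    def required_properties(self, event_name: str) -> list[str]:
        """Names of the properties marked as required for one event."""
        return [p.name for p in self.events[event_name].properties if p.required]


plan = TrackingPlan(name="Marketing Site", version=1)
plan.add(EventRule("form_submit", "track", [
    PropertyRule("page_title", str, required=True),
    PropertyRule("form_id", str, required=True),
    PropertyRule("utm_source", str),
]))
print(plan.required_properties("form_submit"))
```

Keeping the version on the plan itself is one simple way to model the versioning mentioned above; a real plan would carry whatever version identifier the API assigns.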

With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples and metadata against your plans. You can also use the RudderTyper tool we're releasing alongside Tracking Plans. RudderTyper generates strongly typed RudderStack analytics library wrappers from your published tracking plan specs, so your data conforms to your defined schema at the point of capture.
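A strongly typed wrapper might look roughly like the sketch below. This is hand-written for illustration and is not RudderTyper output: the `FormSubmit` class, the `track_form_submit` function and the injected `send` callable are all assumptions. The point is that required properties become required constructor arguments, so a call that omits one fails before anything is sent.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional


@dataclass(frozen=True)
class FormSubmit:
    """Properties of a hypothetical form_submit event."""
    page_title: str                    # required
    page_URL: str                      # required
    form_id: str                       # required
    utm_source: Optional[str] = None   # optional


def track_form_submit(send: Callable[[dict], None],
                      user_id: str,
                      props: FormSubmit) -> None:
    """Build a track payload from typed properties and hand it to `send`.

    `send` stands in for whatever SDK call actually delivers the event.
    """
    properties = {k: v for k, v in asdict(props).items() if v is not None}
    send({
        "type": "track",
        "userId": user_id,
        "event": "form_submit",
        "properties": properties,
    })


sent = []  # collects payloads instead of sending them anywhere
track_form_submit(
    sent.append,
    "user-1",
    FormSubmit("Pricing", "https://example.com/pricing", "demo-request"),
)
```

Leaving out `form_id` here raises a `TypeError` when `FormSubmit` is constructed, which is the kind of early failure a generated wrapper is meant to give you.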

What does the future hold?

Well, that's where we need your help. We are currently working with a few alpha customers and using Tracking Plans ourselves on our own production instance of RudderStack. Still on the roadmap are decisions about which types of errors or schema violations we want to track and how to handle them. Although nothing is set in stone, here is a sneak peek at what we're thinking so far:

| Violation Type | Description | Action Taken |
| --- | --- | --- |
| Unplanned Events | An event for which no schema has been defined | |
| Unplanned Properties | The event is defined but the property or attribute does not exist in the schema | |
| Mismatched Data Type | Data type for a particular property does not match what is defined in the schema | |
| Required Field Missing | The payload is missing a property set as required in the schema | |
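The four violation types can be sketched as a small check over a single event. This is an illustration only: the `PLAN` layout and the `find_violations` function are hypothetical and are not part of the Data Governance API.

```python
from enum import Enum


class Violation(Enum):
    UNPLANNED_EVENT = "Unplanned Events"
    UNPLANNED_PROPERTY = "Unplanned Properties"
    MISMATCHED_TYPE = "Mismatched Data Type"
    REQUIRED_MISSING = "Required Field Missing"


# Hypothetical plan: event name -> property name -> (expected type, required?)
PLAN = {
    "form_submit": {
        "page_title": (str, True),
        "form_id": (str, True),
        "utm_source": (str, False),
    }
}


def find_violations(event: str, properties: dict, plan: dict = PLAN):
    """Return a list of (violation, offending name) pairs for one event."""
    schema = plan.get(event)
    if schema is None:
        return [(Violation.UNPLANNED_EVENT, event)]

    found = []
    for name, value in properties.items():
        if name not in schema:
            found.append((Violation.UNPLANNED_PROPERTY, name))
        elif not isinstance(value, schema[name][0]):
            found.append((Violation.MISMATCHED_TYPE, name))
    for name, (_, required) in schema.items():
        if required and name not in properties:
            found.append((Violation.REQUIRED_MISSING, name))
    return found


for violation, name in find_violations(
        "form_submit", {"page_title": 5, "color": "red"}):
    print(violation.value, "->", name)
```

A single payload can trigger several violations at once, which is why the function returns a list rather than stopping at the first problem.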

Once this feature is fully built, the actions taken on each of these violations could include one or more of the following:

  1. Rejecting the entire payload
  2. Accepting the entire payload and sending it to downstream destinations with a warning flag
  3. Rerouting the entire payload to an S3 bucket (aka "dead letter queue")
  4. Removing the additional properties from the payload that are not defined in the schema
  5. Inserting default values for required fields that are missing from the payload
  6. More advanced options based on schema comparisons outlined here:
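Here is a rough sketch of how three of the actions above could combine: dropping properties that are not in the schema, filling in defaults for missing required fields, and rerouting anything still invalid. Everything here is hypothetical, including the `SCHEMA` and `DEFAULTS` values, and a plain Python list stands in for the S3 "dead letter queue".

```python
import copy

# Hypothetical schema for one event: property name -> is it required?
SCHEMA = {"page_title": True, "form_id": True, "utm_source": False}
# Hypothetical default values for required properties
DEFAULTS = {"form_id": "unknown"}


def repair(payload, schema=SCHEMA, defaults=DEFAULTS):
    """Return a copy with unplanned properties removed and defaults inserted."""
    fixed = copy.deepcopy(payload)
    props = fixed.setdefault("properties", {})
    for name in list(props):
        if name not in schema:
            del props[name]                  # remove additional properties
    for name, required in schema.items():
        if required and name not in props and name in defaults:
            props[name] = defaults[name]     # insert default value
    return fixed


def route(payload, dead_letter, schema=SCHEMA, defaults=DEFAULTS):
    """Return the repaired payload, or None if it had to be dead-lettered."""
    fixed = repair(payload, schema, defaults)
    missing = [n for n, req in schema.items()
               if req and n not in fixed["properties"]]
    if missing:
        dead_letter.append(payload)          # keep the original for inspection
        return None
    return fixed


dead_letter = []  # stands in for the S3 bucket
ok = route({"event": "form_submit",
            "properties": {"page_title": "Pricing", "color": "red"}}, dead_letter)
bad = route({"event": "form_submit",
             "properties": {"color": "red"}}, dead_letter)
```

The rerouted payload is stored unmodified, so whoever inspects the dead letter queue sees exactly what the source sent.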

So, where do you start?

Connect with us on Slack or shoot us an email if you are interested in participating.

In the meantime, let's look at how a typical SaaS business would design and implement Tracking Plans with RudderStack. As part of the RudderStack Data Governance API, Tracking Plans are first and foremost managed through code. Designing the plan, however, is a collaborative effort involving developers and non-developers, so we built a Tracking Plans Template Google Sheet to help teams get started.

The first step is to get your hands on a copy of the RudderStack Tracking Plans Template, which will be available soon. It will help you and your team organize the events and fields you want to capture from each of your RudderStack sources. The sheet requires a user access token for your account. For help creating one, check out our Access Token user documentation.

The next step is to create a wish list of events and properties you think you might need. The goal of this first pass is not to create the be-all and end-all list, but to see where data needs intersect among the various stakeholders and to begin building out the data architecture for your company. It can be helpful to start with existing higher-level paradigms, like the sales and marketing funnel or executive summary reports, because their underlying metrics are generally already agreed upon. Starting with what you already know you need to measure is a great way to begin drilling into how you measure it: where the data comes from in the first place, and which properties or attributes will be captured (i.e., required keys and data types).

For example, let's take a sample SaaS business that has a funnel measuring the following:

| Stage | Team |
| --- | --- |
| Unique Site Visitors | Marketing - Paid Digital |
| Leads | Marketing - Engagement |
| MQLs | Marketing - Engagement |
| Opportunities / Free Trials | Sales - Outbound |
| Product activation / POC | Sales - Sales Engineering |
| Customers | Sales - Coffee Drinkers |
| Product usage | Customer Success |

Now that we have each stage defined, let's dive deeper into exactly which data elements need to be created and tracked to reproduce our funnel, and assign a source to each. Note that in some cases, such as defining a Marketing Qualified Lead (MQL), multiple sources of information may contribute to qualifying a particular lead. In this table we record the system that retains that information, so that if we ever need to perform an audit, we know that Salesforce (in this example) is where we would confirm whether a lead was flagged as an MQL. As we define each metric, we assign it to a tracking plan on our Google Sheet.

| Funnel Step | Source | Metric | Tracking Plan |
| --- | --- | --- | --- |
| Visitor | Marketing Website & App | Count of Distinct Anonymous ID | Page View (Marketing), Page View (Application) |
| Lead | Marketing Website & App | Count of Distinct Email Addresses per domain | Form Submit (Marketing), App Signup (Application) |
| MQL | Salesforce | Count of Salesforce Leads (not deleted) with MQL checked | N/A (SFDC Cloud Extract) |
| Opportunity / Free Trial | Salesforce | Count of Opportunities where Opp Type = Initial | N/A (SFDC Cloud Extract) |
| Product Activation | App | Has the User Created a Connection | Connections Created |
| Customer | Salesforce | Opportunity = Close Won | Opportunity Won |
| Product Usage | App | Total Event Volume | N/A (aggregated from warehouse tables) |

Some of our metrics will come from RudderStack Cloud Extract sources or other non-RudderStack tables in our data warehouse and therefore will not be defined in our Tracking Plan for event data.
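For the two metrics that do come from event data, the calculation can be sketched over a handful of in-memory events. The sample events and the location of the `email` property are assumptions made for illustration; in practice these counts would more likely be queries against warehouse tables.

```python
from collections import defaultdict

# Hypothetical sample of captured events
events = [
    {"event": "page_view", "anonymousId": "a1"},
    {"event": "page_view", "anonymousId": "a1"},
    {"event": "page_view", "anonymousId": "a2"},
    {"event": "form_submit", "anonymousId": "a1",
     "properties": {"email": "ann@acme.com"}},
    {"event": "app_signup", "anonymousId": "a2",
     "properties": {"email": "Bob@acme.com"}},
    {"event": "form_submit", "anonymousId": "a3",
     "properties": {"email": "cy@globex.io"}},
]


def visitor_count(events):
    """Visitor metric: count of distinct anonymous IDs on page views."""
    return len({e["anonymousId"] for e in events if e["event"] == "page_view"})


def leads_per_domain(events):
    """Lead metric: count of distinct email addresses per domain."""
    emails = defaultdict(set)
    for e in events:
        if e["event"] in ("form_submit", "app_signup"):
            email = e.get("properties", {}).get("email", "").strip().lower()
            if "@" in email:
                emails[email.rsplit("@", 1)[1]].add(email)
    return {domain: len(addrs) for domain, addrs in emails.items()}


print(visitor_count(events))
print(leads_per_domain(events))
```

Lowercasing the address before counting is a design choice in this sketch so that `Bob@acme.com` and `bob@acme.com` are not counted as two leads.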

Building out Tracking Plans

In the funnel map above we defined six different events and three different tracking plans that we want to build. This by no means defines the totality of your tracking plans but will be enough to get you started using the tools.

| RudderStack Source (Tracking Plan) | User Action Name | RudderStack Event Name |
| --- | --- | --- |
| Marketing Site | Page View | page_view |
| Marketing Site | Form Submit | form_submit |
| Application | Page View | page_view |
| Application | App Signup | app_signup |
| Application | Connection Created | connection_created |
| Salesforce Webhook* | Opportunity Won | opp_won |

*Typically, Salesforce and other SaaS tools have their data extracted by RudderStack Cloud Extract every 24 hours. However, critical events like marking an Opportunity as won are important enough to trigger a real-time event passed back through a Webhook source.

With the sources and events defined, we now need to identify the properties and property types for each event. These should now be added to the Tracking Plans Google Sheet. Each Source should have its own tab copied from the "Import Template". The tab below is a copy of the Marketing Site tab we created.

| Event Name | Description | Property name | Property type | Property description | Req'd |
| --- | --- | --- | --- | --- | --- |
| page_view | User visits a page | link_source | string | Value of UTM parameter defined as ?link_source={value} | O |
| form_submit | User submits a form | page_title | string | Title of the page | R |
| | | page_URL | string | URL of the page | R |
| | | form_id | string | The ID of the form (configured in Sanity) | R |
| | | label | string | Label for Google Analytics events (if needed) | O |
| | | category | string | Category for Google Analytics events (if needed) | O |
| | | utm_source | string | Optional utm parameters | O |
| | | utm_medium | string | Optional utm parameters | O |
| | | utm_campaign | string | Optional utm parameters | O |
| | | utm_content | string | Optional utm parameters | O |
| | | utm_term | string | Optional utm parameters | O |
| | | raid | string | Optional utm parameters | O |
| | | search_text | string | The text the user typed into the search field | R |
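Because managing plans through code means turning sheet rows into a nested structure at some point, here is a minimal sketch of that step. It assumes each row is a tuple of (event name, description, property name, property type, required flag) and that a row with a blank event name belongs to the event above it. Both the row layout and the `rows_to_plan` function are assumptions for illustration, not how the Google Sheet actually uploads a plan.

```python
# A few rows from the Marketing Site tab; blank event cells continue
# the previous event (an assumption about the sheet layout).
rows = [
    ("page_view", "User visits a page", "link_source", "string", "O"),
    ("form_submit", "User submits a form", "page_title", "string", "R"),
    ("", "", "page_URL", "string", "R"),
    ("", "", "form_id", "string", "R"),
    ("", "", "utm_source", "string", "O"),
]


def rows_to_plan(rows):
    """Group flat sheet rows into {event: {description, properties}}."""
    plan = {}
    current = None
    for event, description, prop, prop_type, reqd in rows:
        if event:
            current = plan.setdefault(
                event, {"description": description, "properties": {}})
        if current is None:
            raise ValueError("the first row must name an event")
        current["properties"][prop] = {
            "type": prop_type,
            "required": reqd == "R",   # "R" = required, "O" = optional
        }
    return plan


plan = rows_to_plan(rows)
print(sorted(plan["form_submit"]["properties"]))
```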

With the basics of our Marketing Site source plan created, we can now upload it to RudderStack by configuring additional settings in the Google Sheet (more on this when we release the feature).

One exciting part of the Tracking Plans Google Sheet is that you can download the latest version of a tracking plan from the RudderStack Tracking Plan API, then upload any changes you make, ensuring everyone working on the plan has the most recent set of changes.
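Before uploading, it can help to see what changed between the version you downloaded and your local edits. The sketch below compares two plans held as plain dictionaries keyed by "event.property". It makes no API calls, and the dictionary layout and the `diff_plans` function are hypothetical.

```python
def diff_plans(remote: dict, local: dict) -> dict:
    """Compare two plans keyed by 'event.property' -> (type, required?)."""
    added = sorted(set(local) - set(remote))
    removed = sorted(set(remote) - set(local))
    changed = sorted(k for k in set(local) & set(remote)
                     if local[k] != remote[k])
    return {"added": added, "removed": removed, "changed": changed}


# Version previously downloaded (hypothetical contents)
remote = {
    "form_submit.page_title": ("string", True),
    "form_submit.label": ("string", False),
}
# Local edits: label becomes required, search_text is new
local = {
    "form_submit.page_title": ("string", True),
    "form_submit.label": ("string", True),
    "form_submit.search_text": ("string", True),
}

changes = diff_plans(remote, local)
print(changes)
```

An empty diff means there is nothing to upload; a non-empty `removed` list is worth a second look, since downstream tools may still expect those properties.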

Once a Tracking Plan has been uploaded to the API via the Google Sheet, you are ready to begin using RudderTyper. Download instructions and tutorials will be made available to beta participants.

Tracking Plans are only one piece of the puzzle

As useful as RudderStack Tracking Plans will be (and already are for our team and beta users), there will always be scenarios where you still need to transform the data once it arrives from the source, whether for enrichment, filtering or massaging based on the needs of the various downstream destination tools. Tracking Plans and Transformations go hand in hand to ensure a stable and trustworthy data feed.

There may also be times when you aren't sure what to do with particular variations of events streamed from your sources. In these cases, sending them to a backup bucket such as Amazon S3 or Google Cloud Storage is an elegant solution. Check out our documentation for more information on how to leverage a variety of cloud storage platforms.

Beta registration

As we continue our mission of giving developers full control over their data and their tools, we recognize and appreciate the commitments our customers have made to help improve the product, and we thank you. If you would like more information on how to get signed up, please contact us or hit us up on Slack.
