Richard Li

Using data for predictive analytics

At Ambassador, the weekly/monthly/quarterly metrics review was a core part of the operating cadence of the company. We'd meet and review metrics that spanned the business: usage, customer satisfaction, revenue, pipeline. When a metric was not tracking to plan, we'd task a team to investigate and recommend actions to get back on track.

Yet as the business grew in complexity, these teams got slower, spending more and more of their time reconciling data from different systems: Salesforce, HubSpot, Google Analytics, product data. Luckily, I was introduced to the world of modern data engineering, which promised a better approach.

Modern data infrastructure, Extract-Load-Transform, and cloud data warehouses

Over the past decade, two macro trends have powered innovation in the modern data ecosystem:

  1. Unlimited demand for data. Modern applications such as machine learning and business intelligence are fueled by data -- the more, the better.
  2. Plummeting cost of storage and compute, particularly in the cloud.

These trends have given rise to the "Extract-Load-Transform" (ELT) architecture, an evolution of the traditional "Extract-Transform-Load" (ETL) approach. In an ELT architecture, data is stored in its original format and transformed as needed, rather than being transformed prior to load. This makes ELT much more flexible in adapting to future (unanticipated) needs, since no information is ever thrown away. The tradeoff of ELT versus ETL is higher storage cost -- but with cloud storage being cheap, it is almost always worthwhile.
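
To make the distinction concrete, here is a minimal Python sketch of the ELT pattern, using SQLite as a stand-in warehouse and hypothetical signup records: raw data is loaded as-is, and the transformation happens afterwards, at query time.

```python
import json
import sqlite3

# Stand-in for a cloud warehouse; in practice this would be BigQuery, Snowflake, etc.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_signups (payload TEXT)")

# Extract + Load: store records exactly as received, with no information discarded.
api_records = [
    {"email": "a@example.com", "plan": "free", "utm_source": "newsletter"},
    {"email": "b@example.com", "plan": "team", "utm_source": None},
]
warehouse.executemany(
    "INSERT INTO raw_signups (payload) VALUES (?)",
    [(json.dumps(r),) for r in api_records],
)

# Transform: derive the shape you need at query time; because the raw payloads
# are still stored, the transformation can be redone later for new questions.
by_source = {}
for (payload,) in warehouse.execute("SELECT payload FROM raw_signups"):
    source = json.loads(payload).get("utm_source") or "direct"
    by_source[source] = by_source.get(source, 0) + 1
print(by_source)  # {'newsletter': 1, 'direct': 1}
```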

Business intelligence is not a dashboard

We adopted a modern data infrastructure solution: Google BigQuery (cloud data warehouse), Fivetran (ELT), Metabase (business intelligence / dashboards), dbt (modeling), and Hightouch (reverse ETL). This worked great, as we were able to aggregate data from our CRM, marketing, product, support, and other systems in one place. And we started building dashboards, which illuminated parts of the business we had never seen before.
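
As a rough sketch of the kind of cross-system question this stack made answerable, the snippet below queries BigQuery for weekly product usage joined to CRM accounts. The dataset and table names (`crm.accounts`, `product.daily_usage`) are hypothetical, not our actual schema.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # assumes default GCP credentials and project are configured

# Hypothetical tables landed by the ELT pipeline; real names will differ.
query = """
SELECT
  a.account_id,
  a.arr,
  SUM(u.active_users) AS weekly_active_users
FROM `crm.accounts` AS a
JOIN `product.daily_usage` AS u
  ON u.account_id = a.account_id
WHERE u.usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY a.account_id, a.arr
ORDER BY weekly_active_users DESC
"""

for row in client.query(query).result():
    print(row.account_id, row.arr, row.weekly_active_users)
```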

But the success of the data warehouse created more demand for dashboards, and soon we had dashboards upon dashboards. Our small data analyst team couldn't keep up with the demand. We also struggled to keep our dashboards up to date and to track which ones were actually being used.

Our business "intelligence" solution of dashboards was pretty dumb. That's when we realized we were overusing dashboards as both a reporting tool and a communication and alignment tool. Going back to first principles, what we really wanted was to use data to make better decisions.

Models

Around this time, I was introduced to a CFO who was looking for her next opportunity. We weren't actively looking for a CFO, but we really liked her, and I wanted to figure out if we could get her on board. And she said something that stuck with me: "What I like to do is figure out how to turn the business into a spreadsheet."

And I realized: We didn't need more dashboards. We needed more models.

What's a model? A model is a representation of a thing that is smaller in scale than the original. Being smaller in scale, models are easier to manipulate. In the context of business, models are everywhere: financial models, funnels, and sales productivity models are common examples. Models are super-powerful because assumptions can be quickly tested. At a previous company, we had agreed on a top-line revenue goal for the year, but when we plugged that number into the sales productivity model, it showed we would have to more than double rep productivity. Needless to say, we added more sales reps to the plan.
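
A sales productivity model like that one can be a few lines of arithmetic. The sketch below uses made-up numbers, but it shows how plugging a top-line revenue goal into the model immediately surfaces the required rep productivity.

```python
# A minimal sales productivity model with made-up, illustrative inputs.
revenue_goal = 12_000_000        # new ARR target for the year ($)
ramped_reps = 10                 # sellers expected to be fully ramped
historical_attainment = 600_000  # new ARR a ramped rep has closed per year

required_per_rep = revenue_goal / ramped_reps
productivity_multiple = required_per_rep / historical_attainment

print(f"Each rep must close ${required_per_rep:,.0f}")
print(f"That is {productivity_multiple:.1f}x historical productivity")
# 2.0x historical productivity -> either lower the goal or add reps to the plan.
```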

We had a financial model, a productivity model, and a funnel model -- but what we started to do after that was build even more granular models. Our first was the signup-to-product-qualified-lead model, which accounted for all the steps (and decisions) a user needed to make from signup through activation to qualified lead. We then built a dashboard that tracked the different steps of this model, and, as our understanding of the model evolved, so did the dashboard.
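
A granular funnel model like this is essentially an ordered list of steps with assumed conversion rates. The steps and rates below are hypothetical, but they illustrate the structure the dashboard tracked: expected counts per step, ready to be compared against actuals.

```python
# Hypothetical steps and cumulative conversion rates for a signup -> PQL funnel.
funnel = [
    ("signup", 1.00),
    ("verified_email", 0.80),
    ("installed_product", 0.55),
    ("activated", 0.30),
    ("product_qualified_lead", 0.12),
]

monthly_signups = 1_000
for step, rate_from_signup in funnel:
    expected = monthly_signups * rate_from_signup
    print(f"{step:<26}{expected:>8.0f}")
```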

Data teams and the evolution of business intelligence

I've talked to dozens of data teams at B2B SaaS companies since then, and I've realized our journey was typical. The modern data journey looks something like this:

Data Roadmap

Businesses usually start with a homegrown business metrics dashboard, which can be implemented in slides or spreadsheets. This dashboard is an aggregation of key metrics from each function. These metrics are pulled from each function's critical systems: sales will pull pipeline data from Salesforce, marketing will pull lead data from marketing automation, support from Zendesk, and so forth. And this strategy can work well for a long time.

At some point, companies realize they need to do more cross-functional data analysis. It's not enough to know that marketing drove X signups, or product drove Y activations, or sales created Z opportunities. Organizations realize that all of these functions need to work together, and knowing where to find the signups most likely to result in activations and opportunities requires integrating different data systems. This leads to Phase 2: the creation of a central data team, which drives the adoption of an ELT architecture and dashboards.

Data is addictive: the more you have, the more you want. This leads to the data team being overwhelmed with requests for more dashboards and analysis. The typical answer is "self-service dashboards" from a business intelligence vendor; technologies such as Looker and, more recently, LLM-powered dashboards have become popular here. This is Phase 3: self-service dashboards for the rest of the organization.

Self-service dashboards have a problem: they make the easy problems easy, but they don't make the hard problems any easier. What typically follows self-service is an explosion of unmaintained dashboards (because creating them is easy), while analysis remains a huge bottleneck. The business functions inevitably hire their own data analysts: a marketing data analyst, a product analyst, and so on. This is because the real questions can't be answered in a chat session with an LLM. This leads to Phase 4: function-specific analysts.

All these analysts querying the same data warehouse causes a different type of problem: multiple competing definitions of critical KPIs. Different teams use similar but different definitions of revenue. Marketing runs on a Sunday weekly start, while sales starts on Monday. All these different definitions start to create concerns about data integrity. The proposed solution is to standardize these definitions in a metrics layer (Airbnb's version of this story). And thus the organization arrives at Phase 5: implementing metrics.
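
The week-start discrepancy is a good illustration of how small definitional differences fracture KPIs. In this sketch (with an arbitrary example date), the same signup lands in different weekly buckets depending on whether the week starts on Sunday or Monday; a metrics layer pins one definition that every dashboard reuses.

```python
from datetime import date, timedelta

def week_start(d: date, first_weekday: int) -> date:
    """Return the start of d's week; first_weekday: 0 = Monday ... 6 = Sunday."""
    return d - timedelta(days=(d.weekday() - first_weekday) % 7)

signup_date = date(2024, 3, 3)  # a Sunday

print(week_start(signup_date, first_weekday=0))  # Monday start:  2024-02-26
print(week_start(signup_date, first_weekday=6))  # Sunday start:  2024-03-03
# The same signup falls into different weekly buckets, so "signups last week"
# silently disagrees between the two teams' dashboards.
```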

Yet with all this investment, the original goal of the business -- using data to make better decisions -- is taking longer than ever. The organization loops back to Phase 2 in an attempt to standardize access to the metrics layer, which should let everyone be more productive.

Ambassador, and a different way

At Ambassador, we got to Phase 2. The data team was a bottleneck, but we weren't big enough to justify a big investment in self-service. We were forced to innovate, and that's when we started building models. Our data evolution ended up looking like this:

Data Roadmap v2

Different functional teams who needed dashboards started by building models in spreadsheets and slides. They shared these models with the data team, which helped them build dashboards to track the performance of those models.

We found that models became powerful tools for aligning the data team, the functional teams, and the broader organization on the purpose of the dashboards. Beyond the alignment, we were able to test and question assumptions and use these models in planning.
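
In practice, tracking the performance of a model mostly meant comparing its assumed rates against actuals pulled from the warehouse. A minimal sketch, with hypothetical numbers:

```python
# Hypothetical model assumptions compared against actuals from the warehouse.
model_assumptions = {"signup_to_activation": 0.30, "activation_to_pql": 0.40}
actuals = {"signup_to_activation": 0.22, "activation_to_pql": 0.43}

for step, assumed in model_assumptions.items():
    actual = actuals[step]
    status = "investigate" if abs(actual - assumed) > 0.05 else "on plan"
    print(f"{step:<24} assumed {assumed:.0%}  actual {actual:.0%}  ({status})")
```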

Business intelligence is about communication

Since then, I've come to view models as a critical abstraction between analysis and dashboard tools and the data itself.

Data stack

In the context of the data stack, these models are a distinct abstraction from the modeling done by tools such as dbt or LookML. They represent how the business operates and capture the business processes themselves.
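
One way to make that distinction concrete: a dbt or LookML model describes tables and columns, while the models described here declare the business process itself. A hedged sketch of what such a declaration might look like (the fields and queries are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class FunnelStep:
    name: str                   # a business step, not a table or column
    owner: str                  # which function is accountable for the step
    expected_conversion: float  # an assumption to test and revisit in planning
    metric_sql: str             # where the data layer (dbt models, SQL) plugs in

signup_to_pql = [
    FunnelStep("signup", "marketing", 1.00, "SELECT COUNT(*) FROM signups"),
    FunnelStep("activated", "product", 0.30, "SELECT COUNT(*) FROM activations"),
    FunnelStep("product_qualified_lead", "sales", 0.12, "SELECT COUNT(*) FROM pqls"),
]
```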

More generally, I've come to see business intelligence as a communication function. With BI, you're using data, visualizations, and models to align the organization around the challenge at hand. Models are a crucial part of this communication function. In the process of building a model, you'll find holes in your data and assumptions you're making -- and that's when you should go validate those assumptions and fill in those holes. Most importantly, models create alignment between all the different functions in the organization, from data to engineering to go-to-market.

This post was originally published on the Amorphous Data blog.
