DEV Community: Hightouch

SaaS Integration: What it is & Why it's Important

Luke Kline — Wed, 13 Oct 2021 17:49:35 +0000

Over the past few years, the technology industry has exploded. It seems like there is a new record-breaking IPO every single year. Companies are not slowing down either. In fact, corporate spending has done nothing but increase as businesses have begun to budget more and more for strategic initiatives and overall digital transformation. Gartner estimates that worldwide IT spending will reach $4.2 trillion in 2021, with enterprise software accounting for about 14.2% of that total. This is an increase of 8.6% in spending from 2020. Gone are the days of expensive hardware appliances. Today every company uses an assortment of different SaaS technologies that have replaced conventional back-end systems. For instance, Sales teams typically leverage a customer relationship management (CRM) platform, Finance teams use enterprise resource planning (ERP) systems, Human Resources teams leverage human resources information systems (HRIS), and Support teams use customer support (CS) tools. These tools typically include technologies like Hubspot, Salesforce, Netsuite, SAP, Workday, Gainsight, and Zendesk. These are just a few examples, and the list could go on and on.

What is SaaS?

Simply stated, SaaS is a software licensing model where the software is located on external servers rather than proprietary internal servers. Typically SaaS tools are built on top of the major cloud providers (e.g. AWS, Azure, GCP). In most cases, the software is provided through a subscription or license. Users typically access SaaS tools through a web browser, logging in with a username and password, rather than installing software on their computer. With SaaS tools, companies don’t have to worry about any of the infrastructure or maintenance needed to keep the service up and running. In most cases, SaaS tools also have a lower up-front cost, since most vendors offer consumption-based or flat-rate pricing models. Additionally, the largest SaaS companies are constantly rolling out new features and updates to improve their products and win more customers. SaaS tools have one huge problem though. Since every SaaS app holds its own unique data set, every new SaaS tool a company adopts creates a data silo. This is because each SaaS tool is designed for a set purpose and most don’t play nicely with other tools.

What is a SaaS Integration?

SaaS integrations have risen up as a way to combat the problem of data silos. Simply stated, a SaaS integration connects a SaaS-based application with either another cloud-based app or on-premise software through an application programming interface (API). Once this integration is complete, both applications can request and share data with each other easily. However, APIs only provide the blueprints for connecting applications together. In order to actually connect two different tools, businesses have to build the integration themselves, so creating and maintaining data pipelines often becomes the most time-consuming task for data engineering teams (even though it is not typically what they were hired for). This process can be extremely time-consuming and difficult since the average mid-sized company often has hundreds of applications and SaaS tools. Additionally, since most SaaS companies are constantly rolling out updates and new features, these integrations and pipelines are extremely prone to failure. Even worse, the data that is moved is often still in its raw state because it has not been transformed, so it’s not even that usable.

SaaS Integration Solutions

In order to get around the problems mentioned above, companies have consistently leveraged an assortment of data integration solutions:

1) iPaaS (Integration Platform as a Service)

iPaaS solutions move data directly between applications, doing little to no transformation on the data. Typically they offer a visual interface to build integrations. If an API endpoint is exposed by a SaaS vendor, then an iPaaS solution can push data to it or pull data from it. In general, iPaaS solutions perform actions when a trigger is met. Simply stated, when a trigger is met or an event takes place in one system, that information is transmitted to another application via an API or webhook, which then performs one or more predefined actions. Fundamentally, all iPaaS solutions work the same way, in that they all send data from point A to point B or vice versa. Since these point-to-point connections are just large workflows, they can become even more challenging to maintain than a “homegrown integration” created by a data engineering team. Additionally, iPaaS solutions are only designed to handle simple objects, so companies cannot send more complex information like ARR (annual recurring revenue) or product usage data to combine it with additional information in a marketing platform like Marketo or Hubspot. At their core, iPaaS solutions are imperative: they have to be told exactly what to do and how to do it. Data integration should be declarative.
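The trigger-and-action pattern described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the trigger names, the two action functions, and the in-memory "CRM" and "chat" stores are all hypothetical stand-ins for real API or webhook calls.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-ins for two SaaS tools; a real workflow
# would call each vendor's API or webhook instead.
crm_contacts: Dict[str, dict] = {}
chat_messages: List[str] = []

def create_crm_contact(event: dict) -> None:
    """Action 1: copy the new signup into the CRM."""
    crm_contacts[event["email"]] = {"name": event["name"]}

def notify_sales_channel(event: dict) -> None:
    """Action 2: post a message to the sales team's chat."""
    chat_messages.append(f"New signup: {event['name']} <{event['email']}>")

# Each trigger name maps to an ordered list of predefined actions.
WORKFLOWS: Dict[str, List[Callable[[dict], None]]] = {
    "form.submitted": [create_crm_contact, notify_sales_channel],
}

def handle_event(trigger: str, event: dict) -> int:
    """Run every action registered for a trigger; return how many ran."""
    actions = WORKFLOWS.get(trigger, [])
    for action in actions:
        action(event)
    return len(actions)

ran = handle_event("form.submitted", {"email": "ada@example.com", "name": "Ada"})
ignored = handle_event("invoice.paid", {"email": "ada@example.com"})
```

Every step is spelled out by hand, which is what makes this style imperative: an event with no registered workflow (`invoice.paid` here) simply does nothing until someone writes another workflow for it.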

2) CDPs (Customer Data Platform)

In its simplest form, a CDP collects and consolidates customer data from various sources into a single repository and then sends that information to various destinations. CDPs provide a lot more functionality compared to iPaaS solutions. CDPs give marketers and growth teams the ability to compile and consolidate data from various sources to create segments based on user behavior and traits and then sync these segments directly back into their third-party tools to build customized experiences without having to rely on data engineering teams. CDPs were created solely for marketing purposes and exist to remove the friction that often exists between marketing and data teams. However, CDPs typically rely on extremely limited and predefined data models centered on users and accounts. In reality, every company has its own unique objects (e.g. subscriptions, carts, products, playlists, artists, etc.). Additionally, CDPs do not integrate with other technologies. For example, when it comes to data transformation, organizations are limited to the native integrations and product features built within a CDP and cannot use other tools on top of it.

3) ETL (Extract, Transform, Load)

ETL is a relatively old data integration process that dates back all the way to the 1970s. An ETL tool extracts data from first-party databases and third-party sources. After this data is extracted, it is transformed to meet the needs of analysts and data scientists and then loaded directly into the data warehouse. Tools like Informatica have popularized this methodology of data integration. This creates a problem because the data ends up stuck in the data warehouse. Cloud data warehouses are well suited to deriving insights and creating reports, but they are less practical when it comes to using that data to create meaningful campaigns and tailored customer experiences in the way that CDPs and iPaaS solutions are used.

4) ELT (Extract, Load, Transform)

All of the problems just expressed have given rise to a new line of thinking focused around ELT (extract, load, transform). This has largely been fueled by innovations in the cloud data warehousing space. Solutions like Snowflake and BigQuery have become extremely efficient and reliable for analytics purposes. ELT tools like Fivetran have made it really simple for businesses to move data from various sources to the data warehouse. As a native SaaS solution, Fivetran provides nearly 200 custom connectors or custom integrations for various data sources and SaaS applications that are designed to handle the “E” and “L” aspects of ELT, automating the entire data pipelining process for engineers. On the other hand, dbt (data build tool) has completely revolutionized the “T” in ELT by creating a tool that runs on top of the data warehouse to transform data with SQL. With dbt, companies can create reusable data models to orchestrate and transform their data. ELT should be thought of as the solution which empowers the data warehouse. ELT combined with the data warehouse has completely changed the data ecosystem by eliminating data silos. However, by eliminating data silos, the data warehouse has, in fact, become a data silo. Data warehouses are useful for creating dashboards and reports, which are often powered through a Business Intelligence tool like PowerBI, Looker, or Tableau. However, neither ELT nor data warehousing has addressed the problem of SaaS integrations, which is really just focused on pushing data back into the tools of non-technical business users.
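To make the ordering of the three ELT steps concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table, view, and sample rows are invented for illustration. It is not how Fivetran or dbt work internally; it only shows raw data being loaded first and transformed afterwards with SQL inside the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from a source system (made-up data).
raw_orders = [
    ("o-1", "ada@example.com", "12.50"),
    ("o-2", "ada@example.com", "7.50"),
    ("o-3", "bob@example.com", "30.00"),
]

# An in-memory SQLite database stands in for the cloud data warehouse.
warehouse = sqlite3.connect(":memory:")

# Load: land the data untransformed, amounts still stored as text.
warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, email TEXT, amount TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: build a reusable model inside the warehouse with plain SQL.
warehouse.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT email,
           COUNT(*) AS order_count,
           SUM(CAST(amount AS REAL)) AS total_spent
    FROM raw_orders
    GROUP BY email
    """
)

revenue = warehouse.execute(
    "SELECT email, order_count, total_spent FROM customer_revenue ORDER BY email"
).fetchall()
```

Because the raw table is kept as loaded, the `customer_revenue` model can be redefined later without re-extracting anything from the source.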

The New Paradigm (Reverse ETL)

This is the exact problem that Hightouch solves with Reverse ETL. “Reverse ETL is the process of copying data from a cloud data warehouse (i.e. Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse, etc.) to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.” Whereas ETL and ELT read from the source and write to the warehouse, Reverse ETL reads from the warehouse and writes back to the operational tools. The data warehouse already has all of the information from every data source across the entire organization, so it is only logical that it is standardized as the single source of truth. With Hightouch, companies can leverage their existing data models (churn rate, lifetime value, workspaces created, etc.) to sync that information directly back into a destination of their choosing in real-time. These syncs can be run manually or scheduled at a set interval (every few minutes or hourly/daily). They can also be scheduled using a custom recurrence or a cron expression. Syncs can even be set to run after a dbt job is complete. Better yet, Hightouch simply runs on top of the data warehouse and doesn’t actually store any data. With Hightouch, organizations can take full control of their data stack and eliminate the bottlenecks found in conventional data integration solutions. This removes the friction between data engineers and business teams because data engineers can finally focus on the actual jobs they were hired for, and business teams can access the data they need in their native tools. Democratizing the data in the warehouse creates a single source of truth across every operational system or SaaS application. Ultimately, Hightouch eliminates the need for SaaS integrations. Every data integration solution should be declarative to ensure there is alignment across teams and everyone is working towards the same goals.
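As a rough illustration of the idea (and not a description of how Hightouch is implemented), the sketch below reads a model from a warehouse and pushes only new or changed rows to a destination. The `customer_model` table, the `sync` and `read_model` functions, and the dictionary standing in for a CRM are all assumptions made for this example; SQLite plays the role of the warehouse.

```python
import sqlite3

def read_model(warehouse: sqlite3.Connection) -> dict:
    """Query the warehouse model and key each row by email."""
    rows = warehouse.execute("SELECT email, lifetime_value FROM customer_model")
    return {email: {"lifetime_value": ltv} for email, ltv in rows}

def sync(warehouse: sqlite3.Connection, destination: dict, last_synced: dict) -> dict:
    """Push only new or changed rows to the destination; return the new snapshot."""
    current = read_model(warehouse)
    for email, fields in current.items():
        if last_synced.get(email) != fields:
            # A real sync would call the SaaS tool's API here.
            destination.setdefault(email, {}).update(fields)
    return current

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_model (email TEXT PRIMARY KEY, lifetime_value REAL)"
)
warehouse.execute("INSERT INTO customer_model VALUES ('ada@example.com', 120.0)")

# The destination already holds fields the warehouse does not own.
crm = {"ada@example.com": {"name": "Ada"}}

snapshot = sync(warehouse, crm, last_synced={})

# The model changes in the warehouse; the next run picks up the difference.
warehouse.execute("UPDATE customer_model SET lifetime_value = 150.0")
snapshot = sync(warehouse, crm, last_synced=snapshot)
```

The sync is described by what the destination should contain (the model) rather than by a chain of triggers, which is the declarative style the section argues for.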

Learn more about Reverse ETL in our blog post or sign up directly at https://app.hightouch.io/signup

How to Enrich Hubspot Data in 6 Steps

Luke Kline — Tue, 12 Oct 2021 14:18:44 +0000

What is Data Enrichment?

Data enrichment is the process of adding key information to a specific platform or tool. In most cases, this consists of capturing information like product data, billing data, marketing data, etc. in alternative sources (i.e. Amplitude, Marketo, Salesforce, etc.) and pushing that information back into the tools your sales and marketing teams leverage on a daily basis. Generally, this tends to be your CRM. The goal is to ensure that every member of your team has access to the same information so that they can work towards the same goal. Data enrichment helps to create a 360-degree view of your customer to ensure this goal is met. I actually just wrote a more in-depth guide on data enrichment if you are interested in learning more.

Understanding Data Enrichment: A Comprehensive Guide | Hightouch

Why is Data Enrichment Important for Sales and Marketing Teams?

COVID has completely changed the sales and marketing landscape. I experienced this first hand working as an SDR in the data space for a small consulting company. As the second sales hire, I was tasked to outbound new accounts and prospects in order to generate new leads. Every day I would send out personalized emails/Linkedin messages and make phone calls to various prospects, with my goal being to schedule qualified meetings. During this process, all of my interactions were captured within Hubspot, which was our CRM.

Before COVID happened I was having some decent success. Then suddenly every outside/face-to-face salesperson became an inside salesperson like me. All of my previous strategies across Linkedin, email, and phone stopped working as salespeople from every industry surged to each medium. As my lead generation stagnated somewhat, our business quickly adapted, developing several open-source utilities. These were able to generate a substantial amount of inbound leads, but I struggled to personalize my messaging because all of the data we captured around these tools was pretty one-dimensional. To be specific, in Hubspot, I could only see properties like first name, last name, job title, form submission, last conversion, email, etc. I ended up only being able to personalize my messaging around our recent conversion events (forms submitted) and the information that I dug up on Linkedin. This meant I was limited to the information that was captured within our CRM. For most companies, a ton of additional information like product analytics, marketing analytics, web analytics, etc. is acquired through other operational systems or data sources. This data is then typically transformed and fed into a data warehouse like Snowflake for analytics purposes. If this type of information had been added to Hubspot when I was working as an SDR, it could have been a game changer for me.

When data enrichment is analyzed through a marketing lens, this type of information is even more important and relevant. Enriching your CRM data gives your marketers the ability to quickly launch ad hoc campaigns. These campaigns could be something as simple as sending automated emails when a customer reaches a certain threshold on product usage, clicks a specific link, or visits a certain page on your website. Data enrichment is valuable for your marketing and sales teams because it generates a 360-degree view of your customer so that your teams can be personalized, timely, and intentional with their messaging.

How to Use Hightouch to Enrich Data in Hubspot?

Hubspot is an extremely popular CRM, and enriching data within the platform is actually much easier than you might think. Hightouch gives you the ability to begin syncing data from a variety of different sources into a destination of your choosing and set how frequently you want the data to update. Here is a step-by-step guide to get started:

Step 1: Connect Hightouch to your Data Source

Among our many integrations, we support data warehouses like Snowflake, Amazon Redshift, and Google BigQuery, in addition to spreadsheets like Google Sheets and Airtable, and production databases like PostgreSQL. In this case, I have chosen Google Sheets, but we support a ton of other data sources in addition to the ones just mentioned.

Step 2: Connect Hightouch to Your Hubspot Instance

We support a variety of different destinations including Hubspot, Marketo, Mailchimp, Braze etc. In this case, we are connecting to Hubspot.

Step 3: Create a Data Model or Leverage an Existing One.

Data aggregation has never been easier because you can leverage your own in-house data models using SQL or dbt. Additionally, with our Visual Audience Builder, SQL is no longer a barrier, because marketers can now visually filter audiences based on specific properties or events. This means that both your data teams and marketers can easily leverage this tool.

Step 4: Choose your primary key to match your records in Hubspot.

We let you map on any unique key. Other iPaaS and Hubspot solutions only give you the ability to map data to the HubspotID. While this is doable, it requires a substantial amount of effort to set up, leveraging an assortment of “if/then” clauses in various workflows. Creating these workflows can take a substantial amount of time. Hightouch handles all of these details behind the scenes so you can worry about what matters most, which is gaining access to the data. In this scenario, I am leveraging email as my primary key, but this can easily be changed in mere seconds.
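Matching on a key of your choosing boils down to an upsert: update the record whose key matches, otherwise create one. The sketch below shows that logic in plain Python under stated assumptions: `upsert_by_key` is a hypothetical helper, the records are simple dictionaries, and nothing here reflects the Hubspot API or Hightouch's internals.

```python
def upsert_by_key(source_rows, destination_records, key="email"):
    """Update records whose key matches a source row; create the rest."""
    index = {record[key]: record for record in destination_records}
    created, updated = 0, 0
    for row in source_rows:
        match = index.get(row[key])
        if match is None:
            # No record with this key yet, so create one.
            new_record = dict(row)
            destination_records.append(new_record)
            index[row[key]] = new_record
            created += 1
        else:
            # Key already exists, so overwrite the mapped fields in place.
            match.update(row)
            updated += 1
    return created, updated

crm_records = [{"email": "ada@example.com", "plan": "free"}]
model_rows = [
    {"email": "ada@example.com", "plan": "pro"},
    {"email": "bob@example.com", "plan": "free"},
]
result = upsert_by_key(model_rows, crm_records)
```

Switching the match key is a matter of passing a different `key` argument, provided that column is unique in both the model and the destination.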

Step 5: Create a sync

Once you have created a sync all you need to do is choose the appropriate columns that you would like to include and map them to the proper Hubspot fields. I am using contact properties, but these columns could be changed depending on the type of data you are looking to ingest.

Step 6: Schedule your Sync

Hightouch is unique in that it lets you choose how often you want to run your syncs. You can choose to run your syncs in a variety of ways. Most companies do this manually or on a set interval (i.e. every few minutes or hourly/daily).

All of this can be done in less than five minutes and you can choose to schedule your syncs or run them manually. It’s simple, easy, and efficient. Best of all, you can enrich all of your Hubspot data with product analytics data, marketing data, billing data, etc. The use cases are nearly endless and you can choose exactly what data you want to ingest into Hubspot. If you don’t believe me, check out our documentation.

How to Enrich Salesforce Data in 6 Steps

Luke Kline — Tue, 12 Oct 2021 13:48:45 +0000

What is Data Enrichment?

Data enrichment is the process of adding key information that is captured in alternative data sources (i.e. product data, billing data, marketing data, etc) and feeding them back into a specific platform. In most cases, this is a CRM because these are the tools your sales and marketing teams leverage on a daily basis. The goal is to create a 360-degree view of the customer so that every single one of your teams has the same information and can work towards the same goal. If you are interested in learning more about data enrichment, check out this recent guide I published: Understanding Data Enrichment: A Comprehensive Guide | Hightouch

Why is Data Enrichment Important for Sales and Marketing Teams?

In the last two years, sales processes and organizations have changed drastically. I experienced this first hand when I worked as an SDR for a small consulting company in the data space. I joined as the second sales hire; my sole purpose was to generate new leads and outbound new accounts and prospects. On a daily basis, I would send out personalized emails/Linkedin messages and call various prospects, all with the goal of scheduling a meeting. Every interaction I had was captured within our CRM.

Initially, I was having success, then COVID happened and every outside/face-to-face salesperson became an inside salesperson like me. Suddenly Linkedin, email, and phone were inundated with salespeople. As the leads that I was generating dropped off slightly, the business quickly adapted and created some additional open-source offerings. We were able to generate a substantial amount of inbound leads, but it was very hard to personalize because all of our data was one-dimensional. That is to say, our CRM only displayed basic information like first name, last name, job title, form submission, last conversion, email, etc. In the end, I could only personalize my messaging around our recent conversion events (forms submitted) and information that I was able to find on Linkedin. Ultimately, I was limited to the information that was captured within our CRM. In most cases, there is a ton of additional information that is captured through other various sources (i.e. product analytics, marketing analytics, web analytics, etc.). Oftentimes, all of this data is transformed, and fed into a data warehouse like Snowflake for analysis. This information could have been extremely useful to me in the sales role if it was just added as an additional layer in our CRM.

The application of this type of information is even more important when viewed through a marketing lens. Enriching the data in your CRM gives your marketers the ability to launch ad hoc campaigns at lightning speed. An example of this could be something as simple as sending an automated email when a customer reaches a certain threshold on product usage, visits a certain page on your website, or clicks a specific link. Data enrichment provides a ton of value to your sales and marketing teams because it creates a 360-degree view of your customer and highlights information that was never correlated before. This means your teams can be personalized, intentional, and timely.

How to Use Hightouch to Enrich Data in Salesforce?

Salesforce is one of the most popular CRMs. It’s actually what we use here at Hightouch. Enriching data in Salesforce is way easier than you think. With Hightouch you can begin syncing data from a variety of different sources into a destination of your choosing and set how frequently you want the data to update. Here is a step-by-step guide:

Step 1: Connect Hightouch to your Data Source

Step 2: Connect Hightouch to Your Salesforce Instance

We support a variety of different destinations including Hubspot, Marketo, Mailchimp, Braze, etc. In this case, we are connecting to Salesforce.

Step 3: Create a Data Model or Leverage an Existing One.

Data aggregation has never been easier because you can leverage your own in-house data models using SQL or dbt. Additionally, with our Visual Audience Builder, SQL is no longer a barrier, because your marketing team can visually filter audiences based on specific properties or events. This means that both your data teams and marketers can easily leverage this tool.

Step 4: Choose your primary key to match your records in Salesforce.

We let you map on any unique key. Other iPaaS and Salesforce solutions only give you the ability to map data to the SalesforceID. While this is doable, it requires a substantial amount of effort to set up, leveraging an assortment of “if/then” clauses in various workflows. Creating these workflows can take a large amount of time. Hightouch handles all of these details behind the scenes so you can worry about what matters most, which is gaining access to the data. In this scenario, I am leveraging email as my primary key, but this can easily be changed in mere seconds.

Step 5: Create a sync

Once you have created a sync, all you need to do is choose the appropriate columns that you would like to include and map them to the proper Salesforce fields. I am using contact properties, but these columns could be changed depending on the type of data you are looking to ingest.
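Conceptually, the column mapping in this step is a rename from model columns to destination fields, with unmapped columns left out. Here is a small Python sketch of that idea; the column names and the destination field names (including the `__c` custom-field suffix) are invented for illustration and the `map_row` helper is hypothetical.

```python
# Hypothetical mapping from model columns to destination field names.
FIELD_MAPPING = {
    "email": "Email",
    "ltv": "Lifetime_Value__c",
    "last_login": "Last_Login__c",
}

def map_row(row: dict, mapping: dict) -> dict:
    """Rename mapped columns and drop any column that is not mapped."""
    return {dest: row[src] for src, dest in mapping.items() if src in row}

payload = map_row(
    {"email": "ada@example.com", "ltv": 150.0, "internal_id": 42},
    FIELD_MAPPING,
)
```

Columns that are missing from a row (`last_login` here) are skipped rather than sent as empty values, and columns without a mapping (`internal_id`) never leave the warehouse.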

Step 6: Schedule your Sync

All of this can be done in less than five minutes and you can choose to schedule your syncs or run them manually. It’s simple, easy, and efficient. Best of all, you can enrich all of your Salesforce data with product analytics data, marketing data, billing data, etc. The use cases are nearly endless and you can choose exactly what data you want to ingest into Salesforce. If you don’t believe me, check out our documentation.

BigQuery vs Snowflake: The Definitive Guide

Luke Kline — Thu, 07 Oct 2021 20:13:07 +0000

The cloud data warehouse sits at the center of every modern data stack. Without a cloud-based data warehouse, it is nearly impossible to derive insights from your data. At its core, a data warehouse is an analytics platform where information from various data sources is stored for analysis. This data is used to make high-level decisions and answer pressing business queries. Today, every company is either already leveraging a data warehouse or in the process of adopting one (if you are reading this, then you are probably the latter). Although there are several key players in the data warehousing space, this post will focus solely on BigQuery and Snowflake.

What is Snowflake?

Snowflake is a Software-as-a-Service (SaaS) based warehouse solution that can run on any of the popular cloud providers (AWS, Azure, GCP). It was purpose-built for the cloud and has a few key components which make it extremely unique compared to other cloud data warehouses. Snowflake was launched publicly in 2014 and has since become a major player in the data warehousing industry being valued at $90.35 billion as of October 2021. Snowflake was developed in the cloud and for the cloud. This means it comes with zero baggage and almost no management or operational overhead. As a native SaaS service, Snowflake handles all of the backend infrastructure so that you can focus on doing what matters most: deriving insights from your data. Snowflake offers a ton of scalability enabling near-unlimited concurrent queries.

What is BigQuery?

Google BigQuery was first launched in 2010 as a part of Google Cloud Platform and was one of the very first data warehouse solutions available in the market. However, at the time, it was largely thought of as a complex query engine. Google BigQuery has come a long way since then and it is no longer the same solution. Similar to Snowflake, with BigQuery you don’t have to set up or maintain any infrastructure. Instead, you can focus on discovering meaningful insights using standard SQL. Google BigQuery is completely native to Google and doesn’t run on any other cloud provider.

Architecture

Snowflake is a completely serverless solution that fully separates storage from compute and is based on ANSI SQL. Its architecture is a hybrid of traditional shared-disk and shared-nothing architectures to provide you with the best of both worlds. It makes your data available to all compute nodes in the platform by using a central repository for persisted data. Snowflake leverages MPP (massively parallel processing) to process all of your queries. This means that each individual compute cluster (virtual machine or server) stores a portion of your entire data set locally. For storage, Snowflake organizes your data into separate micro-partitions that are then internally optimized and compressed into columnar storage. In fact, all of the data that is loaded into Snowflake is reorganized, optimized, and compressed into a columnar format so that it can be kept in cloud storage. Snowflake automatically handles all aspects of data storage as it relates to file size, structure, compression, metadata, and statistics. These data objects are only accessible through SQL queries and are not directly visible to you. Processing within Snowflake is done using “virtual warehouses”, or clusters of compute resources. Each warehouse is an MPP compute cluster that is composed of several nodes. Snowflake’s cloud services layer coordinates all activities across Snowflake, handling everything from user requests and authentication to infrastructure management, metadata management, query parsing and optimization, and access control.

Google BigQuery is very similar to Snowflake in that it is serverless and separates storage from compute. It is also based on ANSI SQL. However, its architecture is quite different. BigQuery uses a vast set of multi-tenant services driven by specific Google infrastructure technologies like Dremel, Colossus, Jupiter, and Borg. Computing in Google BigQuery is done using Dremel, which is a large multi-tenant compute cluster used to execute SQL queries. Dremel does the heavy lifting by turning your SQL queries into execution trees. Tree leaves in BigQuery are called “slots”. They read data from storage and do the necessary computation. The branches of the tree are called “mixers” and they handle all aggregations. A single user on your team can harness thousands of slots to execute queries on an as-needed basis. Similar to Snowflake, BigQuery compresses data into a columnar format to store data in Colossus, which is Google’s global storage system. Colossus manages data replication, recovery, and distributed management so that you are not reliant on any single point of failure. BigQuery uses Google’s Jupiter network to move your data rapidly from one location to another. All hardware resource allocation and orchestration in BigQuery is done through Borg, which is Google’s precursor to Kubernetes.
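The execution-tree idea can be illustrated with a toy example in Python. This is a deliberately simplified sketch and not how Dremel is actually implemented: the `leaf` and `mixer` functions, the four partitions, and the two-level tree are all assumptions chosen to show leaves scanning data and branches merging partial aggregates.

```python
def leaf(partition, predicate):
    """A 'slot': scan one partition and return a partial (count, sum)."""
    values = [v for v in partition if predicate(v)]
    return len(values), sum(values)

def mixer(partials):
    """A 'mixer': merge partial aggregates from the level below."""
    partials = list(partials)
    return sum(c for c, _ in partials), sum(s for _, s in partials)

# Four made-up partitions of one column, as if spread across storage.
partitions = [[5, 12, 7], [20, 3], [8, 15, 1], [30]]

# Roughly: SELECT COUNT(*), SUM(x) FROM t WHERE x >= 8
def is_match(value):
    return value >= 8

# Two intermediate mixers each merge the results of two leaves.
level_1 = [
    mixer(leaf(p, is_match) for p in partitions[:2]),
    mixer(leaf(p, is_match) for p in partitions[2:]),
]

# The root mixer merges the intermediate results into the final answer.
count, total = mixer(level_1)
```

Since each leaf only returns a small partial result, the leaves can run independently and in parallel, and the mixers never need to see the raw rows.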

Key Features

Scalability

Snowflake offers auto-scaling and auto-suspend features that enable clusters to start or stop during busy or idle periods. With Snowflake your users cannot resize nodes, but they can resize clusters in a single click. Additionally, Snowflake enables you to autoscale up to 10 warehouses, with a limit of 20 queued DML statements per table. On a similar note, BigQuery automatically provisions additional compute resources as needed and takes care of everything behind the scenes. However, with BigQuery there is a limit of 100 concurrent users by default. Both platforms let you scale up and down automatically based on demand. Additionally, Snowflake gives you the ability to isolate workloads across businesses in different warehouses so that different teams can operate independently with no concurrency issues.

Security & Compliance

Snowflake automatically provides encryption for data at rest. It does not provide granular permissions for columns, but it does provide permissions for schemas, tables, views, procedures, and other objects. Conversely, BigQuery provides security at a column level as well as permissions on datasets, individual tables, views, and table access controls. Since BigQuery is a native Google offering, you also have the ability to take advantage of other Google Cloud services that have built-in security and authentication to BigQuery, making integrations much easier. Snowflake does not provide any built-in virtual private networking. However, if Snowflake is hosted in AWS, AWS PrivateLink can address this issue. On the other hand, BigQuery gives you the ability to leverage Google’s virtual private cloud. Both BigQuery and Snowflake are compliant with HIPAA, ISO 27001, PCI DSS, SOC 1 Type II, SOC 2 Type II, etc.

Data support

Both platforms support structured and semi-structured data (Avro, Parquet, ORC, CSV, JSON). On September 20th, 2021, Snowflake also announced support for unstructured data, which is available in public preview.

Administration

BigQuery and Snowflake enable you to manage user roles, permissions, and data security. All performance tuning happens automatically, and as your data volume grows and queries become more complex, each platform automatically scales in the background to address your needs. Additionally, since each solution is offered as a SaaS service, all of the underlying maintenance and infrastructure is handled for you. BigQuery automatically handles everything, whereas Snowflake lets administrators scale compute and storage layers independently. With BigQuery, this means you can isolate workloads without having to deal with the sizing and permissioning effort associated with virtual warehouses in Snowflake.

Data Protection

BigQuery and Snowflake each do a really good job when it comes to protecting your data. Snowflake has two features to help with this, Time Travel and Fail-safe. With Time Travel, Snowflake preserves a state of your data before it is updated. The standard retention period for Time Travel is one day (Enterprise customers can specify a period of up to 90 days). Time Travel can be applied to databases, schemas, and tables. With Fail-safe Snowflake can recover historical data. This period is non-configurable and starts immediately after the time travel retention period ends. Although you must ask Snowflake to initiate the recovery, this feature is designed to allow Snowflake to recover any data that may have been damaged or lost due to extreme operational failures.

With BigQuery, administrators can easily revert changes without having to deal with the hassle of a recovery. BigQuery keeps a complete seven-day history of all changes against its tables. However, to preserve table data for longer than seven days, BigQuery offers a feature called table snapshots (snapshots are used to preserve the contents of a table at a particular point in time).

Pricing

Snowflake’s pricing model bills based on the usage of each individual warehouse, so the cost is largely dependent on your overall usage. Snowflake has several warehouse sizes (X-Small, Small, Medium, Large, X-Large, etc.) which all drastically vary in cost and server/cluster amount. However, Snowflake’s base pricing for an X-small warehouse starts at about $0.00056 per second. Pricing doubles with each increase in warehouse size. Snowflake has several plans that allow you to pre-purchase credits to cover usage. This is good since the upfront costs of Snowflake’s pre-purchase capacity plans offer lower rates compared to the on-demand option.

On the other hand, BigQuery charges for the number of bytes scanned or read. BigQuery offers on-demand pricing and flat-rate pricing. On-demand pricing charges you for the number of bytes processed in a given query at a rate of $5 per TB (the first TB of data processed per month is completely free of charge). With BigQuery’s flat-rate pricing model, you purchase slots (virtual CPUs) or dedicated resources to run your queries. The monthly cost for 100 slots is around $2,000 (this can be lowered to $1,700 with an annual commitment).
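To make the two compute models concrete, here is a rough Python sketch that applies the list prices quoted above: $5 per TB scanned with the first TB free for BigQuery on-demand, and about $0.00056 per second for an X-Small Snowflake warehouse, doubling with each size. The function names and the size table are illustrative assumptions rather than anything from either vendor's tooling, and real bills depend on your plan, region, and current rates.

```python
# Rough cost estimates using the 2021 list prices quoted in this article.
# Function names and constants are illustrative assumptions, not vendor APIs.

BIGQUERY_PRICE_PER_TB = 5.00       # on-demand, USD per TB scanned
BIGQUERY_FREE_TB_PER_MONTH = 1.0   # first TB processed each month is free

SNOWFLAKE_XSMALL_PER_SECOND = 0.00056  # USD per second for an X-Small warehouse
SNOWFLAKE_SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]


def bigquery_on_demand_cost(tb_scanned_per_month: float) -> float:
    """Monthly on-demand cost: only data beyond the free tier is billed."""
    billable_tb = max(0.0, tb_scanned_per_month - BIGQUERY_FREE_TB_PER_MONTH)
    return billable_tb * BIGQUERY_PRICE_PER_TB


def snowflake_compute_cost(size: str, seconds_running: float) -> float:
    """Cost of one running warehouse; the rate doubles with each size step."""
    steps_above_xsmall = SNOWFLAKE_SIZES.index(size)
    per_second = SNOWFLAKE_XSMALL_PER_SECOND * (2 ** steps_above_xsmall)
    return per_second * seconds_running


if __name__ == "__main__":
    # 30 TB scanned in a month vs. a Medium warehouse running 2 hours a day for 30 days.
    print(f"BigQuery on-demand: ${bigquery_on_demand_cost(30):,.2f}")
    print(f"Snowflake Medium:   ${snowflake_compute_cost('Medium', 2 * 3600 * 30):,.2f}")
```

Plugging in your own scan volume and warehouse uptime gives a quick first comparison, though it ignores storage, flat-rate slots, and pre-purchased Snowflake credits.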

Storage costs for both Snowflake and BigQuery are relatively cheap. Snowflake charges $40 per TB per month for on-demand customers and $23 per TB per month for upfront customers. On the other hand, BigQuery charges $20 per TB per month for active storage and $10 per TB per month for inactive storage.

Cloud Infrastructure

As a native SaaS offering, Snowflake was designed to run on any of the major cloud providers (AWS, GCP, Azure). On the other hand, BigQuery is a native Google Cloud offering, meaning that BigQuery is only available if you are on the Google Cloud Platform.

Performance

Out of the box, with no fine-tuning, Snowflake tends to post faster query execution times than other data warehouses, including BigQuery. That said, Snowflake and BigQuery are probably more alike than different. BigQuery will most likely be more efficient and have lower compute costs if you run queries in occasional bursts with long idle periods in between. On the other hand, if you have more predictable and continuous usage, it is likely that Snowflake will be more cost-effective.

The main differences between Snowflake and BigQuery

There are several key differences to note between Snowflake and BigQuery. Firstly, scaling within Snowflake is not entirely automatic; it requires some input from you. BigQuery, on the other hand, handles everything automatically. Secondly, Snowflake runs on any of the major cloud providers, whereas BigQuery only runs on GCP. Snowflake is a full SaaS solution and BigQuery is a PaaS solution. Additionally, Snowflake has a unique feature called Secure Data Sharing which gives you the ability to share selected objects in a database with other Snowflake accounts. With Secure Data Sharing, no data is actually copied or transferred between accounts because everything happens in Snowflake’s unique services layer and metadata store. BigQuery does not have a data sharing feature. However, BigQuery does give you the ability to create authorized views to share query execution results with particular users or groups without giving them access to the underlying tables. BigQuery also has a feature called BigQuery ML which lets you create and execute machine learning models using standard SQL queries. BigQuery definitely has the edge on Snowflake when it comes to machine learning and real-time streaming workloads. Ultimately, the use case you are trying to solve should be at the forefront of every decision you make when it comes to choosing a new cloud data platform.

What comes after Snowflake and BigQuery: Reverse ETL

The entire purpose of adopting a modern cloud data warehouse is to consolidate data silos into a centralized data repository so that analysts can leverage business intelligence tools for analytics and reporting purposes to create a single source of truth. In reality, data warehouses just create a larger data silo for your team. A data warehouse gives your team the ability to access all of your data in one place and to create high-level dashboards and reports for your key stakeholders, but none of this information is actionable for your other business teams. After all, data is only so useful when it is in a report.

This is the exact problem Reverse ETL solves. “Reverse ETL is the process of copying data from a central data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.”

Hightouch syncs data in real-time directly from your data warehouse and pushes it back into the native tools of your business users like Salesforce, Hubspot, Marketo, Amplitude, Iterable, etc. You can even leverage your existing data models (ex: lifetime value, churn rate, active users) or create new ones in Hightouch using just SQL. This means that your business teams can leverage this data in real-time to make meaningful decisions that can positively affect your bottom line. Better yet, your engineers can focus on the actual jobs they were hired to do rather than having to send CSVs and create ad hoc data pipelines.

Want to learn more about Reverse ETL?
Download our Reverse ETL Whitepaper where we touch on the technology and applications of Reverse ETL across your business.

Looker Actions: The Definitive Guide

Luke Kline — Thu, 07 Oct 2021 20:05:22 +0000

Within the last decade, data has become increasingly valuable. In fact, worldwide spending on big data and analytics solutions is estimated to reach $215.7 billion in 2021. Today, every company is collecting customer data from a variety of different source systems and using that information to power insights. However, there is a key challenge across all industries, and that is data activation. To be specific, every organization has a ton of data, so the problem is figuring out how to leverage that data outside of a dashboard or a report. In most cases, data is only used to make high-level business decisions that don't have any effect on the overall customer journey.

The core challenge in marketing has always been data accessibility. When marketers are waiting on data, they can’t experiment, learn, or iterate, so all of that time is wasted. Compounded growth comes from compounded learning, and compounded learning comes from data being accessible.

-- Fareed Mosavat (VP Programs & Partners at Reforge & former Director of Product at Slack)

Data Activation is the process of turning insights into actions. This is typically done by taking clean, transformed data out of the data warehouse and pushing it back into the operational systems of business teams like sales or marketing. Simply stated, data activation is focused on democratizing data for various teams so they can use that information to create relevant customer experiences and personalized messages within the native platforms they use on a daily basis. The purpose of this guide is to go over everything about Looker and Looker Actions, which is a tool for data activation.

What is Looker?

At its core, Looker is a business intelligence (BI) tool that is mainly used for dashboarding purposes. Google actually purchased Looker in 2020 for $2.6 billion. Looker creates insightful visualizations on the data that is specified by the user. It is a browser-based solution that is designed to make collecting, visualizing, and analyzing data simpler. Looker gives users the ability to create and share interactive dashboards, automate alerts, and leverage embedded analytics. It can be deployed in the cloud or on-premise depending on the needs of the business. However, getting started with Looker requires a substantial amount of effort because Looker is incapable of creating reports unless all of the data has been properly formatted and modeled in a specific way, using LookML.

What is LookML?

LookML is Looker’s unique coding language that provides predefined syntax and data types that can be used for data modeling. In order to leverage Looker’s reporting and dashboarding capabilities, all the data has to be modeled through LookML. LookML constructs a data model that Looker can use to create SQL queries which then extract the data that is required for analysis. Ultimately, LookML defines calculations, data relationships, dimensions, aggregations, and constructs SQL queries to run against a particular database. All LookML projects consist of several key components:

Models: A Model provides intuitive data exploration capabilities for specific users by creating a customized portal within various databases.
Views: A View showcases a list of fields and defines how those fields link to underlying tables or derived tables.
Explores: An Explore represents a view that business users can query by specifying the join relationships with other views.
Joins: A Join gives users the ability to combine data from multiple Views so that they can be queried together in an Explore.
Manifest Files: A manifest file contains instructions for using files imported from other projects.

Looker’s documentation provides a much more detailed explanation.
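The idea that a data model can be compiled into SQL is easier to see with a toy example. The sketch below is plain Python, not LookML, and it does not reflect how Looker is implemented. The View class, the field names, and the orders table are all hypothetical. It only shows how named dimensions and measures can be turned into a query.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical stand-in for a modeled table: names mapped to SQL expressions."""
    table: str
    dimensions: dict = field(default_factory=dict)  # name -> column expression
    measures: dict = field(default_factory=dict)    # name -> aggregate expression


def build_query(view: View, dimensions: list, measures: list) -> str:
    """Compile the selected dimensions and measures into one SQL statement."""
    select_parts = [f"{view.dimensions[d]} AS {d}" for d in dimensions]
    select_parts += [f"{view.measures[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {view.table}"
    if dimensions and measures:
        # Aggregates need a GROUP BY over every selected dimension.
        sql += " GROUP BY " + ", ".join(view.dimensions[d] for d in dimensions)
    return sql


orders = View(
    table="orders",
    dimensions={"status": "orders.status"},
    measures={"order_count": "COUNT(*)", "total_revenue": "SUM(orders.amount)"},
)

print(build_query(orders, ["status"], ["order_count", "total_revenue"]))
```

The business user only picks field names. The model owns the SQL expressions behind them, which is why the modeling work has to happen before any report can be built.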

What is Looker Actions?

As stated earlier, Looker Actions is a data activation tool. Looker Actions enables users to analyze and act on data in real-time. With Looker Actions, business teams can perform tasks within other tools directly from Looker. Popular actions can be as simple as automating an email, updating a value in another application, alerting team members in tools like Slack, or sending an email list to a marketing platform like Marketo or Braze. Looker Actions focuses on pushing data back into the native tools of business users so that they can leverage it in real-time to make meaningful decisions that can positively affect the bottom line. In addition, the Looker Action Hub feature enables users to deliver content to third-party services integrated with Looker directly through an "action hub server". On a similar note, there is also a form method that is often used to collect additional information from the user before a custom action is executed.

What are the problems with Looker Actions?

Looker itself is a great solution that has many valuable features and has completely changed the landscape of business intelligence tools. The overall premise behind Looker Actions is solid in theory. However, Looker Actions does not come without its flaws.

Since Looker is based on its own unique coding language (LookML), data modeling and matching capabilities are extremely limited. In order to use Looker Actions, businesses have to model all of their data in LookML. This means that companies cannot use their current data models or the native tools in their current tech stack to transform their data. This can be both time-consuming and expensive for businesses not already using Looker. Even worse, most organizations leverage SQL to transform their data. LookML is only based on SQL (not entirely built on it), so users will have to learn an entirely new language to some extent. A much simpler and more efficient tool for transforming data is dbt. Since dbt is based entirely around SQL, users can create data models that can be reused. Better yet, any data models that are dependent upon one another will automatically update whenever a model is changed. At its core, dbt is an extremely efficient tool that engineers and developers can use to orchestrate, transform, and model their data.

Looker Actions also has trouble handling the scale for large data volumes. This is because Looker Actions does not do any data differencing (diffing). Diffing is the process of checking what data has been changed before sending it to another application or system. For example, if an analyst is asked to send a list of users who visited the pricing page on Monday and then asked again to do the same thing on Tuesday, there is likely going to be some crossover (ex: on Monday the list might have 5 million visitors and on Tuesday the list might have 10 million visitors). Every time this data is requested, Looker Actions sends all records regardless of duplicates. Similarly, Looker Actions also lacks batching capabilities for many destinations, so if a large data set is needed to be moved, it will likely fail due to a rate limit issue.
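Diffing is simple to sketch. The Python below is a minimal illustration that assumes each record has a stable primary key and that the previous snapshot is still available. The function name and sample data are hypothetical and do not represent any vendor's implementation.

```python
def diff_rows(previous: dict, current: dict):
    """Compare two snapshots keyed by primary key and return only what changed."""
    added = {key: row for key, row in current.items() if key not in previous}
    changed = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    removed = [key for key in previous if key not in current]
    return added, changed, removed


monday = {
    "u1": {"email": "ana@example.com", "visits": 1},
    "u2": {"email": "bo@example.com", "visits": 3},
}
tuesday = {
    "u1": {"email": "ana@example.com", "visits": 1},  # unchanged, so not re-sent
    "u2": {"email": "bo@example.com", "visits": 4},   # changed
    "u3": {"email": "cy@example.com", "visits": 1},   # new
}

added, changed, removed = diff_rows(monday, tuesday)
print(f"send {len(added)} new and {len(changed)} updated rows, remove {len(removed)}")
```

With diffing, Tuesday's sync sends two rows instead of three. At millions of rows, that difference decides whether a sync stays under a destination's rate limits.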

Additionally, Looker Actions does not have many destinations (only 20 integrations at present), so it does not connect with every business/SaaS tool stack. It also does not give users the ability to map data using any specific field. Organizations are forced to map data to brittle fields like SalesforceID, so in order to leverage Looker Actions, users have to keep changing their underlying LookML data models to access the few destinations that are offered. This can be a ton of work. For example, if a business wants to update a certain field in Marketo for a specific set of users, that entire data model needs to be changed to align with the rigid mapping that Looker Actions is expecting. If Looker Actions is only able to identify users in Marketo by their MarketoID, for example, users have to update their data model and add in MarketoID. Even worse, when the data model is not configured properly, marketers have to involve the data team, which can drag out the process even longer.

At its core, Looker Actions forces users to create workflows within Looker where the overall user experience is heavily lacking. End tools like Braze, Marketo, Iterable, etc. already have their own dedicated UIs made for workflows, so there is really no reason to use a worse version of what these solutions have to offer. In reality, Looker is a business intelligence tool that is very efficient at aggregating data and democratizing relevant reports and dashboards to key business units and stakeholders. However, there is no logical reason why companies already leveraging additional tools like Braze or Marketo should ever create campaigns in Looker Actions.

Alternatives to Looker Actions: Hightouch

Where Looker Actions is lacking in features, Reverse ETL fills the gap. “Reverse ETL is the process of copying data from the data warehouse (i.e. the analytics platform) to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.” Hightouch solves this exact challenge by leveraging Reverse ETL to take transformed data out of the data warehouse and sync it back into the native tools of business teams like Salesforce, Marketo, Hubspot, Iterable, Braze, Amplitude, Google Sheets, etc.

Hightouch is based entirely on SQL, which means that data teams can leverage their existing tools (like dbt, which is natively integrated) to model and transform their data. Hightouch is feature-rich: it has over 70 integrations, compared to the 20 offered by Looker Actions, and supports various data sources like Google BigQuery, Snowflake, AWS Redshift, etc.

Hightouch gives users the ability to map to any field or user attribute (i.e. email, purchase date, etc.). Hightouch also diffs data between syncs to specific destinations, saving both time and money and ensuring that no duplicate data is ingested. Whereas Looker Actions limits the fields that can be updated in end tools, Hightouch lets users update any field. Hightouch can also send data in batches unlike Looker Actions, where rows often get rate-limited and syncs fail on large data sends.

With Hightouch, rejected rows won’t cause the entire sync to fail because they are retried in the next sync. In addition to warehouses and data lakes, Hightouch integrates directly with Looker and LookML so companies can connect directly to Looker and select their reports from there.
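The batching and retry behavior described above can be sketched in a few lines of Python. This is a minimal illustration and not Hightouch's implementation. The send_batch callable and the fake_destination stand-in are hypothetical, and a real destination API would report rejections in its own format.

```python
def batched(rows: list, size: int):
    """Yield consecutive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def sync_in_batches(rows: list, send_batch, batch_size: int = 2) -> list:
    """Send rows batch by batch and collect rejected rows instead of failing.

    `send_batch` is a hypothetical callable that returns the rows the
    destination rejected. The returned list would be retried on the next sync.
    """
    retry_next_sync = []
    for batch in batched(rows, batch_size):
        retry_next_sync.extend(send_batch(batch))
    return retry_next_sync


def fake_destination(batch: list) -> list:
    """Stand-in for a destination API: rejects rows that have no email."""
    return [row for row in batch if not row.get("email")]


rows = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "cy@example.com"},
]
pending = sync_in_batches(rows, fake_destination)
print("rows to retry on the next sync:", [row["id"] for row in pending])
```

Smaller batches keep each request under the destination's limits, and a single bad row only delays itself rather than the rows around it.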

Better yet, Hightouch Audiences lets business users visually define and filter audiences with an intuitive UI that doesn’t require any knowledge of SQL. Marketers and other business users can use this to create custom audiences to message for various marketing campaigns and sync those audiences directly into their email or ad tools with no engineering favors needed.

Summary

Looker is a fantastic BI tool, but it is definitely lacking as it relates to data integration. Looker Actions are completely unusable if data has not been modeled through LookML. Data teams want to use the critical business tools they are comfortable with to manage their data. In most cases this is SQL. This is the exact reason why Reverse ETL and Hightouch are the future.

Want to learn more about Hightouch?

Sign up now for free or book a demo. Or download our Reverse ETL whitepaper to learn about how top companies like Imperfect Foods, CircleCI, Autotrader and Lucidchart are using their warehouse for business workflows below.

Matillion: The Definitive Guide

Luke Kline — Wed, 29 Sep 2021 15:04:13 +0000

Countless organizations have been collecting data for a long time and they have been trying to leverage data-based insights for even longer. Every business has a set of unique data sources which capture information on various objects like users, accounts, etc. As the complexity of the data ecosystem has increased, so has the number of disparate data sources within any given company. With the emergence of the cloud data warehouse, companies are focused on consolidating their data into a single location to eliminate the data silos that exist between different sources. Conventionally, this process has been known as data integration. The purpose of this guide will be to go over everything about Matillion, which is a tool for data integration.

Understanding Data Integration & Orchestration

In its simplest form, data integration is the process of consolidating data from various data sources into a single unified view. Without data integration, you are collecting data but not acting on it. Data integration enables your business to derive insights on the information collected. The purpose of data integration is to democratize data and remove data silos. The easiest way to understand data integration is to break it down into three pillars.

Data Acquisition: The place where data is initially collected (ex: Salesforce, Marketo, Amplitude, Netsuite, etc).
Data Ingestion: The process of moving data from various sources or operational systems to a new location like a data warehouse (ex: Snowflake, Redshift, BigQuery, Synapse, etc.).
Data Transformation/Preparation: The point at which raw data is transformed and prepped for analysis so that analysts can leverage BI tools like Tableau, PowerBI, Looker, etc. to create detailed reports and dashboards.

Ultimately, a modern data stack should be an efficient end-to-end flow (from data acquisition and data ingestion to data transformation). Data orchestration sits on top of data integration as another step in the process, with a focus on managing data and automating the infrastructure, data pipelines, and workflows required to move it.

Understanding ETL & ELT

Conventionally, ETL (extract, transform, and load) has been the most effective way to handle data integration. This process dates all the way back to the early 1970s and has been the standard for a long time. However, the mass adoption of data warehouses has changed this paradigm because businesses have realized that they can take advantage of ELT (extract, load, transform) to leverage the power of the cloud to transform their data. Data modeling and transformation tools like dbt have made this easier than ever before.

The main difference between ETL and ELT really lies in the transformation layer. With ETL you transform data before ingesting it and with ELT you transform data after it has been ingested. ETL tends to be a longer process since the data is transformed before ingestion, whereas ELT loads data in a much shorter amount of time. ETL solutions often require a ton of building and maintenance and use custom scripting languages to enable transformations. Even worse, a large portion of ETL solutions are on-premise. On the other hand, ELT solutions use SQL to tackle transformations and are almost always strictly cloud-based and fully automated.
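The difference in ordering can be shown with a small Python sketch. Here an in-memory SQLite database stands in for the warehouse, and the table names and sample rows are hypothetical. In the ETL path the cleanup happens in application code before loading. In the ELT path the raw rows are loaded first and the same cleanup is expressed in SQL.

```python
import sqlite3

raw_rows = [("  Ana@Example.com ", "42.50"), ("BO@example.com", "10")]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: clean the data in application code, then load the finished rows."""
    cleaned = [(email.strip().lower(), float(amount)) for email, amount in raw_rows]
    conn.execute("CREATE TABLE orders_etl (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE raw_orders (email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT lower(trim(email)) AS email, CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for a warehouse
etl(conn)
elt(conn)
print(conn.execute("SELECT * FROM orders_etl ORDER BY email").fetchall())
print(conn.execute("SELECT * FROM orders_elt ORDER BY email").fetchall())
```

Both paths end with the same cleaned table. The ELT path also keeps raw_orders in the warehouse, so a transformation can be changed and re-run later without extracting the data again.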

What is Matillion?

Matillion claims to be an all-encompassing SaaS solution and cloud data integration platform that handles the entire data integration process, all the way through from acquisition and ingestion to transformation. It offers a low-code approach that utilizes the native compute power of popular cloud data warehouse systems. That is to say that Matillion leverages the power of your cloud environment to transform your data. Matillion has two main products, Matillion ETL and Matillion Data Loader. Matillion Data Loader is strictly used for moving data. Matillion ETL, on the other hand, is the company's flagship product and a more robust ETL solution. Both offer a detailed level of configuration for advanced users.

Matillion Series E Funding

On September 15, 2021, Matillion announced $150 million in Series E funding. The round was led by General Atlantic and joined by several other firms, including Battery Ventures, Sapphire Ventures, Scale Venture Partners, and Lightspeed Venture Partners. As of this writing, Matillion has raised a total of $310 million, bringing its valuation to $1.5 billion.

Matillion ETL Key Features:

Matillion ETL currently has just over 100 pre-built connectors for various data sources and currently supports Amazon Redshift, Snowflake, Azure Synapse, Google BigQuery, and Delta Lake as destinations.
In addition to pre-built connectors, you also have the ability to create your own custom connectors to any REST API source system.
Since everything is low code, you can leverage Matillion’s graphical UI to create orchestration jobs that make up sophisticated ETL pipelines.
You can also build transformation jobs within Matillion by selecting from more than 30 components to create large-volume, complex transformation workflows.
Every job in Matillion can be scheduled to run at a specific time or a regular interval depending on your needs. You can also create generic jobs which can be reused across different projects.
Matillion also gives you the ability to stage data in your own cloud environment.
Once your jobs are orchestrated and scheduled in Matillion everything is fully automated. Matillion’s API triggers jobs and its sophisticated flow logic handles the data.
You can even use custom scripting leveraging languages like Bash, SQL, or Python for specific transformation requirements.
Since Matillion is hosted in your cloud environment, you can send alerts and notifications directly to Slack or email.

What is Matillion Data Loader?

Matillion Data Loader is a free solution that Matillion offers to help extract data from your source systems and load it into your target destination (i.e. your data warehouse). Since Matillion Data Loader is free, it comes with fewer features. For instance, you can’t perform any transformations. Currently, Matillion Data Loader only supports around 35 connectors and does not appear to have support for Delta Lake or Azure Synapse as a destination. However, it does support Snowflake, Amazon Redshift, and Google BigQuery. Matillion Data Loader provides a simple wizard to build data pipelines for extracting and loading your data. Matillion Data Loader doesn’t offer any extra developer tools or any automation capabilities. It’s really just a point-to-point tool for moving data.

The Problems with Matillion

Matillion has positioned itself as a SaaS solution, but in reality, Matillion is an iPaaS (integration platform as a service) solution. It supplies you with a platform to enable data integration, but it is up to you to handle all of the nitty-gritty details and set everything up. Matillion’s UI is very intuitive and user-friendly and it can be relatively simple to create various data pipelines and orchestrate ELT jobs. Although Matillion is an ELT platform, it should really be thought of more as a data orchestration platform. That is to say that Matillion enables you to coordinate the execution and monitoring of your data pipelines and workflows.

The problem with this method is that it does not scale well. Depending on your data ecosystem, setting up these data pipelines quickly creates complex workflows. Even worse, you have to maintain them and address errors when they inevitably occur. ETL and ELT solutions are meant to free up the time of engineers, but if your engineers still have to maintain the pipelines, even without writing code, you have to wonder how much value you are deriving from the solution.

Additionally, Matillion largely has enterprise customers. This means it does not place any emphasis on SMB/mid-market companies. To be specific, Matillion does not provide a ton of support for smaller businesses.

Matillion Alternatives: Why Fivetran + dbt is a Better Solution

Whereas Matillion is an iPaaS solution that supplies you with a data integration platform to build ELT pipelines and orchestrate them, Fivetran is a fully managed SaaS platform that manages all of that for you. Fivetran currently supports over 150 different connections for various data sources and around ten different destinations like Snowflake, Azure Synapse, Google BigQuery, Amazon Redshift, Databricks, etc.

Unlike Matillion, Fivetran does not have a graphical UI that forces you to build and connect your various data pipelines and map out all of your workflows. Fivetran simply connects to your data source and you tell it where to load your data. You don’t have to worry about the entire data orchestration aspect or any of the common factors that break data pipelines like random errors, schema changes, execution order, changes in data models, etc.

Similar to Matillion, Fivetran is an ELT solution, but it extracts data from more sources and loads data to more destinations. However, aside from the fact that Fivetran handles all of the data orchestration for you, it is important to note that Fivetran does not currently have any transformation abilities. To be specific, Fivetran only handles the “E” (extract) and “L” (load) aspects of ELT. It doesn’t offer built-in transformations like Matillion.

Thanks to dbt, this is not a problem. If you are not familiar with dbt, it is a transformation tool that leverages SQL. It is extremely efficient at transforming data that is already loaded into your warehouse for analytics purposes. Strictly speaking, dbt gives you the ability to create data models that can be reused. Better yet, if your data models are dependent upon one another, one change in one data model will update another. If you are not using dbt today, then you will end up building it internally down the line. Pairing Fivetran and dbt together creates a flexible solution that is more efficient than Matillion.
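The way a change in one model flows to the models that depend on it can be sketched with a small dependency graph. The Python below is an illustration only, not how dbt is implemented, and the model names are hypothetical. It uses graphlib from the standard library (Python 3.9+) to find a valid build order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: each key depends on the models in its set.
dependencies = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "lifetime_value": {"customer_orders"},
    "churn_rate": {"stg_customers"},
}


def models_to_rebuild(changed: str) -> list:
    """Return the changed model and everything downstream of it, in build order."""
    affected = {changed}
    grew = True
    while grew:  # keep adding models whose parents are already affected
        grew = False
        for model, parents in dependencies.items():
            if model not in affected and parents & affected:
                affected.add(model)
                grew = True
    build_order = TopologicalSorter(dependencies).static_order()
    return [model for model in build_order if model in affected]


print(models_to_rebuild("stg_orders"))
```

Changing stg_orders triggers a rebuild of customer_orders and then lifetime_value, while churn_rate is left alone because nothing it depends on changed.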

What Comes After ETL/ELT?

Once the data is fully loaded into your warehouse and transformed, the typical process is to let your data analysts begin building reports and dashboards to address the questions coming from your key stakeholders. Often this is powered through a BI tool like Tableau, PowerBI, Thoughtspot, Looker etc.

There’s still a major problem though. All of this data that you spent so much time and effort trying to consolidate and transform in your warehouse is now siloed because now the only people who have access to that information are your analysts. A data silo is just a collection of information that is not accessible by the other parts of your organization. If your data only lives in a dashboard, it’s only useful for high-level business decisions and not actually actionable by your other team members.

Dashboards are practical for identifying trends and showing a zoomed-out view of your data, but they are not useful for associating and merging data collected by other sources to an individual user. In reality, your Sales and Marketing teams are often asking questions like “Who is the most active user in an account?”, “Which contacts have downloaded X?”, “Which customers are in an active deal or POC?”, or “What is the annual recurring revenue of company ABC?”, all with the intent of improving personalization.

Answering these questions often involves going to your data teams and requesting the information. In most cases, your data team already has a list of backlogged items they need to address, so they don’t have time to build an entirely new data pipeline to push this information out of the warehouse and into the tool they requested. By the time this data is made available in a CSV, it is unusable because the customer or prospect has already moved on to another point in their journey.

Why You Need Reverse ETL

“Reverse ETL is the process of copying data from the data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.” The data warehouse should be your standard of truth. However, the information it houses should not be inaccessible to your other teams. It should be democratized and made available to everyone within your organization so they can leverage it day in and day out to make decisions that can positively affect your bottom line.

Reverse ETL tools like Hightouch solve this problem by taking the transformed data out of your warehouse and syncing it back into the native tools of your business teams (i.e. Salesforce, Hubspot, Marketo, Braze, Amplitude, Asana, Google Ads, etc.). Better yet, Hightouch leverages all of the data models built in your warehouse (ex: PQL, MQL, SQL, lifetime value, propensity scores, customer health scores, churn rate, overall product usage, etc.) and pushes this information to the destination of your choosing. You can even send information directly to Slack to notify your sales, marketing, product, or customer support teams to take an action in real-time based on the criteria defined in your data models.
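At its simplest, a Reverse ETL sync is a query against the warehouse plus a mapping from warehouse columns to destination fields. The Python sketch below illustrates that idea under stated assumptions: an in-memory SQLite database stands in for the warehouse, the customer_model table and the CRM field names are hypothetical, and the final API call to the destination is left out. It is not a description of how Hightouch works internally.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; the table and columns are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_model (email TEXT, lifetime_value REAL, churn_risk TEXT)"
)
warehouse.executemany(
    "INSERT INTO customer_model VALUES (?, ?, ?)",
    [("ana@example.com", 1200.0, "low"), ("bo@example.com", 90.0, "high")],
)

# Maps warehouse columns to field names in a hypothetical CRM.
FIELD_MAPPING = {
    "email": "Email",
    "lifetime_value": "LTV__c",
    "churn_risk": "Churn_Risk__c",
}


def build_payloads(conn: sqlite3.Connection, query: str) -> list:
    """Run the model query and rename each column to its destination field."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [
        {FIELD_MAPPING[column]: value for column, value in zip(columns, row)}
        for row in cursor
    ]


payloads = build_payloads(
    warehouse,
    "SELECT email, lifetime_value, churn_risk FROM customer_model "
    "WHERE churn_risk = 'high'",
)
print(payloads)  # these records would then be sent to the destination's API
```

The model stays in SQL and the mapping stays in configuration, so adding a new destination field means adding one entry to the mapping, not building a new pipeline.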

Reverse ETL is the final piece of a modern data stack. When it is added as another layer on top of data integration it makes for some really interesting discoveries because it allows you to iterate and experiment at a speed that is simply not possible otherwise. Likewise, when the transformed data is taken out of the warehouse and pushed back into the operational systems of your business users it creates a single source of truth across the entire organization because now the information that was in your warehouse is in the hands of everyone.

Why Can’t I Use Matillion for Reverse ETL?

ELT solutions like Fivetran and Matillion read from the source and write to the warehouse. Reverse ETL solutions like Hightouch read from the warehouse and write to the source. The process between the two is completely different. Matillion actually reads and writes in both directions for some connectors, but this only provides some underbaked capabilities because Matillion specializes in ELT, not reverse ETL.

Summary

All in all, Matillion is a tool for data integration that places a heavy amount of work on your team. Matillion is not the industry standard data integration tool. A modern ELT product like Fivetran and a dedicated transformation tool like dbt provide the best basis for creating a modern data stack. If you are using ETL/ELT, Reverse ETL is the missing piece in your architecture to activate your data.

This alternative ETL approach provides the best way to create actionable insights. At the end of the day, cloud data warehouses help power business intelligence and analytics, but they do little to leverage and democratize that data for day-to-day operations to improve the overall customer experience. This is the exact reason Reverse ETL is so valuable.

Want to learn more about Reverse ETL?

Download our Reverse ETL Whitepaper where we touch on the technology and applications of Reverse ETL across your business.

7 Alternatives to Using Segment

Luke Kline — Wed, 29 Sep 2021 14:56:22 +0000

What is a CDP (Customer Data Platform)?

CDPs have risen up as one of the best solutions to tackle the challenge of data accessibility. Strictly speaking, CDPs collect and consolidate data from various sources and send that information to different target destinations (i.e. marketing tools and sales tools). The purpose of a CDP is to aggregate the information from various data sources and combine it together to create a single 360-degree view of the customer. In addition to this, they also provide an additional activation layer to enable marketing automation. This is because CDPs were created to analyze user behavior and personalize their experiences. Every company has data, so CDPs are useful for both B2C companies and B2B companies.

Conventionally, it has been a huge challenge for marketers to gain access to data because all of the useful information is stuck in disparate data sources and tools. CDPs solve this problem by supplying the marketing team with a relatively easy-to-use platform that requires little or no input from the data or engineering team.

What is Segment?

Segment is one of the most popular CDPs. In fact, in 2020, Segment did over $144 million in revenue and was recently acquired by Twilio for $3.2 billion. At its core, Segment is a SaaS offering that helps businesses collect and leverage data from digital properties like websites, apps, SaaS tools, etc. Simply stated, Segment is an event tracking platform that is aimed towards app developers and SMB/mid-sized companies. Segment simplifies the data collection process and gives users the ability to spend more time leveraging their data to create personalized experiences and relevant content for customers.

What does Segment do?

Segment was originally created to solve the challenge of collecting and moving event data. In its simplest form, Segment helps fire user events that are captured in your product and sync that data to a variety of SaaS tools in addition to data warehouses. It generates messages about what is happening in an app or website and then translates the information in those messages into a format that is understandable by other tools. Segment provides an API library that can run as code on a website, app, or server to generate messages based on specific triggers defined by the user. This code can be as simple as copying and pasting a snippet into the HTML of a website to track page views, or it can be embedded within an app to send messages when a user performs a specific action like opening or closing an app or abandoning a cart after a set amount of time. Once these messages have been generated, they can be sent directly to Segment servers to be translated or forwarded to specific destinations.
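The flow described above (generate a message when something happens, then translate it for each destination) can be sketched roughly as follows. This is not Segment's actual library. The message fields, the two destination formats, and every function name here are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def build_track_message(user_id, event, properties=None):
    """Build a generic message describing one user action (hypothetical format)."""
    return {
        "user_id": user_id,
        "event": event,
        "properties": dict(properties or {}),
        "sent_at": datetime.now(timezone.utc).isoformat(),
    }


def to_email_tool(message):
    """Translate the generic message into a made-up email tool payload."""
    return {
        "contact_id": message["user_id"],
        "activity": message["event"].lower().replace(" ", "_"),
    }


def to_analytics_tool(message):
    """Translate the generic message into a made-up analytics tool payload."""
    return {
        "distinct_id": message["user_id"],
        "name": message["event"],
        "props": message["properties"],
    }


# Each destination registers its own translation function.
DESTINATIONS = {"email_tool": to_email_tool, "analytics_tool": to_analytics_tool}


def forward(message):
    """Return one translated payload per destination."""
    return {name: translate(message) for name, translate in DESTINATIONS.items()}


message = build_track_message("user-42", "Cart Abandoned", {"cart_value": 59.0})
payloads = forward(message)
print(payloads["email_tool"])
```

The point of the design is that the application only emits one message per action. Adding a new destination means adding one translation function rather than changing the tracking code.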

Who uses Segment?

Segment has two core audiences: marketing teams and engineering teams. Segment appeals to marketers because it gives them an easy way to collect and merge different data sets together to create various customer profiles, enrich audiences, and activate campaigns across various tools. On the other hand, engineering teams are drawn to Segment because they don’t have to spend time writing their own event tracking library and writing integrations to all of their SaaS tools since all of this is supplied through Segment’s API library. This means engineers can focus their efforts on the high-priority tasks which have the most impact on a company’s bottom line. Best of all, marketers don’t have to go through the data or engineering team every time they want to ask a question or gain access to a specific data set because all of this is provided through Segment.

What is Segment Warehouses?

Segment’s Warehouse feature gives users the ability to send information natively to various data warehouses (i.e. Snowflake, Amazon Redshift, Azure Synapse, Google BigQuery, etc.). This is super useful since the warehouse is typically the final resting point for data and acts as the analytics platform in most organizations.

What is Segment Personas?

Segment Personas is a visual audience builder for marketers that gives businesses the ability to enrich customer profiles with new traits. Segment Personas takes event data across multiple devices and channels and intelligently merges it together using identity resolution to create a single view of the customer. Segment defines an audience as either a list of users or accounts that match specific criteria. An example of this could be users who abandoned their shopping cart at X amount of time and purchased an item in the last seven days. These audiences are basically customized segments. Marketers can define these segments in a point-and-click UI without needing to know SQL.
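An audience of this kind is just a filter over event data. The sketch below shows one possible reading of the example above: users who both abandoned a cart and made a purchase within the last seven days. The event names, the sample data, and the `build_audience` function are assumptions made for illustration and do not reflect how Segment Personas is implemented.

```python
from datetime import datetime, timedelta

# Fixed reference time so the example is reproducible.
NOW = datetime(2021, 9, 29, 12, 0, 0)

# Hypothetical event log: one dict per tracked user action.
events = [
    {"user_id": "u1", "event": "cart_abandoned", "at": NOW - timedelta(days=2)},
    {"user_id": "u1", "event": "purchase", "at": NOW - timedelta(days=1)},
    {"user_id": "u2", "event": "cart_abandoned", "at": NOW - timedelta(days=3)},
    {"user_id": "u3", "event": "cart_abandoned", "at": NOW - timedelta(days=20)},
    {"user_id": "u3", "event": "purchase", "at": NOW - timedelta(days=10)},
]


def users_with(event_name, since):
    """Return the set of users who fired event_name at or after `since`."""
    return {
        e["user_id"]
        for e in events
        if e["event"] == event_name and e["at"] >= since
    }


def build_audience(now, window_days=7):
    """Users who abandoned a cart AND purchased within the window."""
    since = now - timedelta(days=window_days)
    return sorted(users_with("cart_abandoned", since) & users_with("purchase", since))


audience = build_audience(NOW)
print(audience)
```

Here only `u1` qualifies: `u2` never purchased, and both of `u3`'s events fall outside the seven-day window. A point-and-click audience builder generates this kind of filter on the marketer's behalf.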

What is Segment Functions?

With Segment Functions, users can do basic transformations on events and send them to external tools and various APIs without having to set up or maintain any infrastructure. However, the transformation functionality is very limited and not nearly as strong as native languages like SQL or dedicated transformation tools like dbt.

What are the alternatives to Segment?

1. mParticle:

mParticle is a Segment alternative. However, whereas Segment is tailored to SMB/mid-sized companies, mParticle is built for enterprise-sized companies. Instead of falling into the CDP category, mParticle brands itself as “Customer Data Infrastructure” (CDI). At its core, CDI focuses on data integration, data governance, and audience management. Since mParticle is tailored towards enterprise companies, it places a higher focus on providing support. In fact, all mParticle customers are assigned a dedicated customer success rep on day one. Additionally, mParticle was actually one of the very first companies in the CDP space to offer professional services and release an audience-building product. To be specific, Segment’s Personas product was a direct result of mParticle’s Audiences product.

Another core difference between the two lies in the fact that mParticle offers more robust capabilities around mobile event tracking (i.e. apps) and data integration. Since Segment strictly tailors towards SMB, it really only focuses on web tracking. All in all, mParticle tends to offer more robust capabilities than Segment. Both solutions are tailored towards developers and have a substantial implementation/setup time before marketing teams can begin leveraging either tool to the fullest extent.

2. Tealium

Tealium is a Segment alternative that places a higher emphasis on marketers instead of developers. Tealium is a CDP solution that focuses on enterprise-sized companies. Before becoming a fully-fledged CDP, Tealium fell into the category of “enterprise tag management” (i.e. a competitor to Google Tag Manager, which is a free service that gives users the ability to implement marketing tags or snippets of code for tracking purposes on their website). Tealium’s flagship product, “Tealium IQ”, offers more flexibility because it is not a native Google service like Google Tag Manager. This means it integrates with a variety of different platforms.

Aside from offering the typical capabilities of a CDP, Tealium is HIPAA compliant and will sign on BAAs or business associate agreements (a contract that outlines each party’s individual responsibilities for protected health information). The same cannot be said for either Segment or mParticle. Tealium’s main selling point is focused on privacy and security. This is why so many healthcare and financial services companies are drawn to it. This removes some of the 3rd party risks and other risk factors that can affect underlying business goals.

3. Lytics

Lytics is an alternative to Segment that is very focused on empowering marketers rather than engineers and developers. For this reason, the implementation time for Lytics tends to be substantially longer than players like Segment, mParticle, Tealium, etc. However, as an upside, Lytics has an extremely intuitive UI that is tailored towards marketers rather than developers, which makes it extremely streamlined and easy to use. Lytics has much more detailed and predictive machine learning capabilities compared to the other platforms. In fact, the Lytics Machine Learning API provides a framework to create custom ML models directly within the platform. These models are self-training and continuously update in real-time. All audiences created in Lytics are also updated in real-time with no user input.

4. Rudderstack

RudderStack is slightly different from the previous alternatives in that it is a fully open-source CDP tailored towards developers. RudderStack's core product functionality enables developers to deploy data pipelines and collect customer data from various apps, websites, and platforms to autotrack events. This information can then be activated in the data warehouse. Although RudderStack claims to be open-source, most of the features like cloud connect, ETL/ELT, reverse ETL, etc. are locked behind the paid offering. RudderStack tries to compete in multiple areas, but at the end of the day, it doesn’t do anything particularly well. As the age-old quote goes, “Jack of all trades, master of none.” RudderStack has a couple of upsides though. Being an open-source platform, RudderStack is the only CDP that can run entirely on-premise. Additionally, RudderStack does not own any of the data it hosts because everything is kept within an organization’s proprietary technology stack. In most cases, companies choose RudderStack when platforms like Segment get too expensive due to the MTU (monthly tracked user) pricing model in addition to the data ownership aspect.

5. SimonData

SimonData is an email service platform combined with a CDP. It is very similar to solutions like Braze, Iterable, Salesforce Marketing Cloud, Marketo, etc. Most CDPs capture data from various sources to create audiences and then push that information back into operational platforms so that marketers can use it to launch campaigns. However, SimonData claims to do all of this in one. It connects natively to data warehouses, but it moves the data out of the warehouse which can be very bad for compliance. SimonData also locks users into a simple user/event data model rather than supporting all types of data within the warehouse, like products, groups, flights, trips, purchases, etc. SimonData also creates another challenge in that it doesn’t support notifications efficiently. The needs of marketers are evolving extremely fast. This is one of the many reasons that companies are choosing to keep marketing platforms and data platforms separate and leverage dedicated solutions like Iterable or Braze on top of a CDP.

6. ActionIQ

ActionIQ focuses on helping companies achieve a full “digital transformation”. ActionIQ is very different from other CDPs because it leverages a database and adds a CDP as an additional layer on top. To be specific, ActionIQ helps companies assemble disparate data sources together into their own unique ActionIQ database and enables users to leverage this data through a conventional CDP. This solution tends to be very professional services heavy and getting data into the platform can be extremely challenging. It often takes up to a year to implement. Similarly, ActionIQ’s entire data model is focused solely on contacts and fields, so companies have little ability to leverage the data models that impact their business the most. It is really tailored towards businesses that have already made a significant investment into a specific technology stack and are simply looking for additional tools and data access.

7. Amperity

Amperity’s core customers tend to be large retail or traditional brick-and-mortar businesses with extremely disparate data sources. Amperity is a CDP platform that is highly specialized in identity resolution. It has “state of the art” machine learning technologies whereas most CDPs use simple “deterministic” identity resolution logic (e.g. static value equality on a graph, or if “email = email”, then this is the same user). Like most CDPs, Amperity does have some of the typical marketing activation capabilities that other CDPs offer. However, these are more limited. At its core, Amperity is extremely efficient at identifying and predicting customer behaviors which is a super useful trait for any company.
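For reference, the simple “email = email” style of deterministic identity resolution mentioned above can be sketched in a few lines. This is a generic illustration, not Amperity's or any other vendor's implementation. The `resolve_identities` function, the sample records, and the decision to normalize case and whitespace before comparing are all assumptions.

```python
def resolve_identities(records):
    """Group source records that share an exact email address.

    Assumption: emails are compared after trimming whitespace and lowercasing.
    """
    profiles = {}
    for record in records:
        key = record["email"].strip().lower()
        profiles.setdefault(key, []).append(record["source_id"])
    return profiles


# Hypothetical records for the same person arriving from three systems.
records = [
    {"source_id": "crm:1", "email": "Ana@Example.com"},
    {"source_id": "billing:9", "email": "ana@example.com "},
    {"source_id": "web:7", "email": "ana.r@example.com"},
]

profiles = resolve_identities(records)
print(profiles)
```

The first two records merge into one profile, but `ana.r@example.com` stays separate even if it belongs to the same person. That gap is where rule-based matching stops and where probabilistic or machine learning approaches are aimed.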

What are the issues with off-the-shelf CDPs?

All CDPs tend to have several similar problems. Firstly, CDPs are not a single source of truth. With the rise of cloud data warehouses (i.e. Snowflake, Redshift, BigQuery, Synapse, etc.), data warehouses now contain all customer data because companies are already using them for reporting and modeling. CDPs only have the data that is ingested into them.

Secondly, CDPs create rifts in organizations because they were solely created for marketers. This discourages collaboration between marketing and data teams. Everyone within an organization needs to be working towards the same underlying goals even if they are on different teams. Additionally, all CDPs are built on proprietary systems that don’t always pair well with other technologies. As an example, if a transformation capability doesn’t exist, users are stuck filing a support ticket.

This actually happens quite frequently because conventional CDPs do not typically have the ability to do a ton of robust transformations on the data that is stored within them. Likewise, if an assortment of bad events is loaded into a CDP, users are limited to the features built-in to the CDP to clean that data set. Similarly, as a point-to-point tool that moves data back and forth between different systems, CDPs create silos because they cannot leverage any existing technologies or tools that may already exist in an organization.

Additionally, since CDPs were created solely for marketers, the data models that they provide are not flexible. In fact, they often force organizations to “shoe-horn” their data into a strict model that makes no logical sense for the business. Lastly, CDPs store all of a company’s data which has privacy and security concerns. Each organization should own its own data so that it is not subject to the whims of a particular vendor.

What are the issues with Segment?

Segment does a decent job at moving data from point A to B. However, it has a couple of problems. Firstly, data that is pushed through Segment is never really transformed to create a proper 360 view of your customer (ex: combining billing and product data); it also cannot be combined with SQL. Segment claims to unify customers across all paths and channels to enable personalized campaigns, but these campaigns can only be so useful if all of the information that is being pushed to various marketing platforms is still in its raw state.

Additionally, Segment’s data model is limited to two objects, users and accounts; and in most cases, a user can only belong to a single account. This is problematic because every business has a unique model. For example, a company like Spotify collects information on users and accounts, but it also tracks other concepts like artists and genres which are typically treated as separate tables. However, the core problem with Segment is that it’s trying to take the place of a conventional iPaaS or ETL (extract, transform, load) tools and handle the entire end-to-end process of data integration.

Additionally, with Twilio’s acquisition of Segment, it is safe to assume that there will be some bias in the tools that are recommended. After all, Twilio is focused entirely on contacting customers, and Segment is focused solely on managing the data about them. Segment does a good job of collecting and transferring event data. However, acquiring, ingesting, and transforming data from SaaS tools is another story. Using Segment is like renting a data pipeline and most organizations want to control their technology stack from top to bottom. Proprietary data should be a competitive advantage and not a liability.

Why your warehouse should be your CDP

The main difference between a CDP and a data warehouse lies in the fact that CDPs only store customer data whereas data warehouses act as a repository for all the data across the entire organization - not just customer data. CDPs strictly focus on enriching data for marketing purposes, whereas data warehouses can run a variety of different workloads for analytics purposes. Most organizations have standardized the data warehouse as the single source of truth because a CDP only has a subset of data whereas the warehouse has all of it. This is actually the number one reason why the data warehouse should be the CDP.

Since all of the data is often already in the data warehouse, the logical choice is to simply just use it as a CDP. A modern data stack should consist of an end-to-end flow from data acquisition, collection, and transformation. In most cases, the easiest way to enable this goal is by leveraging tools that are purposely designed to handle a single task. Fivetran, Snowflake, and dbt are great examples of this. In fact, this is the core technology stack that every data-driven company is adopting. Fivetran handles the entire data integration aspect providing a simple SaaS solution that helps businesses quickly move data out of their SaaS tools and into their data warehouse. Snowflake provides an easy way for organizations to consolidate their data into one location for analytics purposes. Lastly, dbt provides a simple transformation tool that is SQL-based, enabling users to create data models that can be reused. These three solutions combined create an effective data management platform.

However, there is a slight problem with this technology stack. Fivetran currently does not provide any way to collect and transfer event data (i.e. user actions). It also does not provide a way to move data out of the warehouse and back into the operational systems. This creates a problem because this is the main use case that Segment solves.

How to use your data warehouse as your CDP

If Segment’s main advantage was solely the fact that it is able to collect event data and move that information to various SaaS platforms, this advantage is now gone thanks to Hightouch and Snowplow.

Hightouch is a reverse ETL tool that provides a seamless integration for companies to sync data from the data warehouse to various operational systems like Marketo, Salesforce, Iterable, Hubspot, etc.

“Reverse ETL is the process of copying data from a cloud data warehouse to the operational systems of record, including but not limited to Saas tools used for growth, marketing, sales, and support.”

No information is stored within Hightouch either; all of it is kept in the data warehouse. Even better, companies can define custom objects (unlike Segment which just offers users and accounts) like workspaces, accounts, products, etc. to create audiences.

Snowplow is an open-source event tracking platform that gives users the ability to generate and process high-quality behavioral data and deliver it in real-time streams to both data lakes and data warehouses. The main advantage that Snowplow provides over a platform like Segment is data ownership. Companies leveraging Snowplow own their entire data pipeline because the data never leaves their technology stack. This means that it can be highly tailored towards the needs of the business.

Combined, these two tools work together to create a more robust version of Segment. Leveraging Hightouch and Snowplow together enables more use cases and democratizes more data, all within a company’s own proprietary technology stack. There is honestly no reason not to test this workflow out since Hightouch offers one free destination and Snowplow is completely open-source.

Want to learn more about Reverse ETL? Download our Reverse ETL Whitepaper where we touch on the technology and applications of Reverse ETL across your business.

Understanding Data Enrichment: A Comprehensive Guide | Hightouch

Luke Kline — Wed, 29 Sep 2021 14:48:12 +0000

What is Data Enrichment?

Over the last five years, the technology landscape has changed dramatically. Today nearly every single organization is collecting both first-party data and third-party data (ex: Clearbit) to power insights and make business decisions. However, one of the core challenges across all industries is data enrichment. Most companies are capable of capturing raw data and generating insights to make informed decisions, but few are able to turn these insights into action. This is why data enrichment is so important.

In its simplest form, data enrichment or data enhancement is the process of enhancing existing datasets with information that is generated from additional sources, whether it be product analytics, marketing analytics, sales analytics, billing analytics, etc. The goal is to pair this customer data together to enable cross-analysis and deeper insights. Emphasizing data enrichment processes improves data accuracy. This translates into more personalization, which in turn leads to a better customer experience.

A good way to view a data enrichment process is through the lens of an operational system. At a basic level, a CRM tool like Salesforce or Hubspot provides high-quality data on various properties like contacts, companies, deals, etc. Typically these properties have a set of sub-properties like company headquarters, first name, last name, email, phone number, deal stage, deal owner, etc. All of this contributes to the overall customer profile. This same analogy can be applied to pretty much any operational system. With data enrichment, an organization might add additional information from other sources to their CRM. One example of this could be product data since this is not something that CRM’s innately capture.
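As a rough illustration of the CRM example above, enrichment amounts to joining extra attributes onto existing records by a shared key. The sketch below assumes email is that key. The field names, sample values, and `enrich` function are hypothetical and not tied to Salesforce, Hubspot, or any other product.

```python
# Hypothetical CRM records, keyed by email.
crm_contacts = [
    {"email": "ana@example.com", "first_name": "Ana", "deal_stage": "trial"},
    {"email": "bo@example.com", "first_name": "Bo", "deal_stage": "closed_won"},
]

# Hypothetical product data that the CRM does not capture on its own.
product_usage = {
    "ana@example.com": {"signup_date": "2021-08-01", "weekly_active_days": 5},
}


def enrich(contacts, extra_by_email):
    """Return new contact records with any matching extra attributes merged in."""
    enriched = []
    for contact in contacts:
        extra = extra_by_email.get(contact["email"], {})
        # Build a new dict so the original CRM records are left untouched.
        enriched.append({**contact, **extra})
    return enriched


enriched_contacts = enrich(crm_contacts, product_usage)
print(enriched_contacts[0])
```

Contacts without a match, like Bo here, simply keep their original fields. In practice the join key and the handling of conflicting fields are the decisions that need the most care.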

All of this information is extremely valuable and super helpful for outreach and customer personalization. Marketers use it on a daily basis to experiment and run campaigns, and salespeople leverage it to grow their sales pipeline. However, operational systems are only designed to answer simple questions so it is often difficult to maintain a holistic view of the customer.

Why is Data Enrichment so Important?

To realize why data enrichment is so important, it’s first relevant to understand the pieces of a modern data stack. For most organizations, the end goal is always to create an end-to-end flow between data collection, data integration, and data consumption. Data is generally collected through a variety of data sources (Google, Facebook, Salesforce, Hubspot, Marketo, Amplitude, Zendesk, Asana, etc.). This raw information is then ingested into a cloud data platform (Snowflake) using a data integration tool (Fivetran).

When the data is in the warehouse, the next order of business is to transform and model it for analysis using a tool like dbt in most cases. Once all of this is done, the data is consumed through a dashboard using a tool like PowerBI, Tableau, Thoughtspot, Looker, etc. This information is then dispersed to various stakeholders and business teams as needed. Reports and dashboards only provide business direction though; they don’t make data actionable. Even worse, they only show a zoomed-out view of the data. This means it’s not associated with a specific prospect, user, or customer.

The problem is, all of this data is stuck in a dashboard and not actionable by anyone except data teams. Other teams like marketing, product, sales, etc. cannot access the detailed version of this information without going directly through an analyst or engineer, making it impossible to answer questions like: “Who is the most active user in an account?” or “How many active users does this account have?” or “Which contacts have downloaded X marketing resource?” or “What pricing plan should customer ABC be on based on their usage?”

This means that various teams (specifically marketing and sales) can’t answer all of the questions they have because the information doesn’t exist in their native tools. Every tool is limited to the data it captures. Data Enrichment de-silos information across the entire organization and democratizes it for everyone. When more information is available, various teams can ask and answer more questions than ever before. Best of all, it creates a single unified vision across the entire organization because every single team has access to the same information. With data enrichment, businesses can gain a higher understanding of their consumers.

Data Enrichment Examples

Data enrichment can solve an assortment of use cases for companies in every industry. Conventionally a large focus has been placed on demographic data enrichment (information about the customer) and geographic data enrichment (information around the customer). However, businesses in every industry struggle with creating a 360-degree view of their consumers, so the emphasis should be placed on defining the common behavior attributes for ideal customers whether it be something simple like income level, marital status, physical address, etc. to create a more valuable data set. With that in mind, there are four main data types that provide the most value when it comes to data enrichment.

Product data refers to all information about the customer that is captured directly through the product. Some examples could be:

Purchases
Number of users
Signup date
Product usage metrics
Use Case

Sales data refers to all of the information about the customer that is captured in the sales process and pipeline. Some examples could be:

Active deals
Companies in POC/trial
First meeting
Product demo date

Marketing data refers to all of the information that is captured in the customer journey. Some examples could be:

Web pages viewed
Resources downloaded
Links clicked
Session length

Billing data refers to all of the information that is captured throughout the payment process. Some examples could be:

Contract size
Annual recurring revenue (ARR)
Last payment date

When this type of information is made available outside of the native system that it was captured in and copied into other tools, it enables some really powerful actions. For instance, marketing teams can create email lists to target specific people with ads, campaigns, and offers if they are able to associate properties like pages viewed, resources downloaded, and links clicked with specific users (like promoting deals to customers that viewed your pricing page). Refining this information even further, marketing could use these assets to score leads based on intent. Likewise, when product data is added to a platform like Hubspot or Salesforce, it helps salespeople identify which leads to target and it increases personalization from a marketing standpoint because both can see which users are most active in a given account. Additionally, when sales data is made available to marketers, targeting customers in active deal cycles becomes extremely easy. Lastly, when billing data is made available to other systems, customer support teams can trigger emails to remind customers about upcoming payments. Best of all, every single scenario just listed can be fully automated.
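Scoring leads on intent, as mentioned above, can be as simple as a weighted sum over marketing activity. The weights, activity names, threshold, and sample leads below are invented for illustration. A real scoring model would be tuned to the business.

```python
# Hypothetical weights: higher means stronger buying intent.
INTENT_WEIGHTS = {
    "pricing_page_viewed": 30,
    "resource_downloaded": 15,
    "link_clicked": 5,
}


def score_lead(activity_counts):
    """Weighted sum of activity counts; unknown activities contribute nothing."""
    return sum(
        INTENT_WEIGHTS.get(activity, 0) * count
        for activity, count in activity_counts.items()
    )


# Hypothetical per-lead activity counts.
leads = {
    "ana@example.com": {"pricing_page_viewed": 2, "link_clicked": 3},
    "bo@example.com": {"resource_downloaded": 1},
    "cy@example.com": {"webinar_attended": 1},
}

scores = {email: score_lead(counts) for email, counts in leads.items()}

# Leads at or above an arbitrary threshold of 50 are treated as high intent.
hot_leads = sorted(email for email, score in scores.items() if score >= 50)
print(scores, hot_leads)
```

Once a score like this exists, it is one more field to sync into the marketing or sales tool, where it can drive a list or trigger a campaign.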

Data Enrichment Use Cases for Marketing and Sales

The core use cases for Data Enrichment are most often centered around marketing and sales teams. These teams are often looking for more detailed information on various leads and accounts as it relates to the customer journey as a whole.

Consider this scenario. Product-led-growth companies like Slack and Grammarly give users the ability to sign up for a free version of their products. Both of these companies offer additional features in the premium version of their products. The typical adoption path begins with a single user and expands when additional team members see the value in the product. Once enough users are leveraging the tool, management will purchase an enterprise license to cover the entire organization. This is a fantastic go-to-market strategy because it amplifies sales and marketing efforts to spread awareness and increase adoption. Obviously, this model only works with a strong product, hence the name “product-led-growth.”

Converting free users to paid customers is a major challenge, so in most cases, the role of marketing and sales in PLG companies is to accelerate the adoption cycle. This means delivering highly personalized content, messages, and offers. When nearly all of the information about the customer is captured in-product, it makes it really difficult to leverage the information because it doesn’t exist in native business systems like Salesforce or Hubspot where marketers and salespeople live on a daily basis.

What is Lead Enrichment?

Lead enrichment is all about tracking the activity of specific customers or prospects to enhance internal data. Ultimately, lead enrichment provides additional insight into existing leads by adding additional information from other sources. Typically this is done by enhancing lead information in an existing database or CRM (i.e. CRM data enrichment). This is especially necessary for companies with a PLG/self-serve signup process because all of the data about the customer is captured in-product and business teams typically don’t have access to tools like Amplitude, Heap, or Mixpanel which capture product analytics.

Knowing exactly where a prospect or customer is in their journey is absolutely crucial for sales and marketing teams because conversion rates increase when personalization increases. Being able to associate product usage, emails opened, integrations installed, links clicked, pages visited, resources downloaded, etc. is priceless. Retool solved this personalization problem by leveraging Hightouch to sync their product data in real time back into their CRM. By syncing fresh product usage data to Hubspot and Salesforce, the marketing and SDR teams were able to launch personalized campaigns extremely quickly. This led to a 32% increase in response rate on personalized emails and a 500% increase in click and feature adoption.

What is Account Enrichment? (for Account-Based Marketing)

Account enrichment is nearly identical to lead enrichment with the only difference being that the focus is placed on accounts rather than individual leads and prospects. When product data is inaccessible to business teams it also poses a problem on an account level because individual account executives and marketers don’t know who to target.

Figuring out which inbound leads to focus on within a given account can be a huge pain and even more so when there is no data available to differentiate between them. Identifying which users are most active based on product usage helps to simplify this problem. Scoring leads based on product usage is a great way to solve this challenge and it is exactly what Zeplin did using Hightouch to sync product data back into their CRM.
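At the account level, the same product data can answer questions such as which user is most active and how many active users an account has. The following sketch assumes usage has already been aggregated to one row per user. The row layout, sample values, and `account_summary` function are hypothetical and are not a description of what Zeplin or Hightouch actually run.

```python
from collections import defaultdict

# Hypothetical usage rows: (account, user email, sessions in the last 30 days).
usage_rows = [
    ("acme", "ana@acme.com", 42),
    ("acme", "bo@acme.com", 7),
    ("globex", "cy@globex.com", 3),
    ("globex", "di@globex.com", 19),
]


def account_summary(rows):
    """For each account, count its users and pick the one with the most sessions."""
    by_account = defaultdict(list)
    for account, email, sessions in rows:
        by_account[account].append((sessions, email))

    summary = {}
    for account, users in by_account.items():
        # max() compares the (sessions, email) tuples, so sessions decide first.
        top_sessions, top_email = max(users)
        summary[account] = {
            "active_users": len(users),
            "most_active_user": top_email,
            "sessions": top_sessions,
        }
    return summary


summary = account_summary(usage_rows)
print(summary["acme"])
```

Syncing fields like `most_active_user` onto the account record gives an account executive a starting point for outreach without asking an analyst.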

Platforms for Data Enrichment

Since the end goal for data enrichment is to enhance existing data sets and the main use cases are focused on marketing and sales, it’s obvious that the optimal tools to perform data enrichment should be sales and marketing platforms. That is to say, it makes the most sense to perform data enrichment in platforms like Salesforce, Hubspot, Marketo, Acoustic, Pipedrive, etc.

Data Enrichment Tools and Services

Although many types of data enrichment companies exist, there are two main categories of data enrichment tools: iPaaS (Integration Platform as a Service) and Reverse ETL (Extract, Transform, Load).

iPaaS

Simply stated, iPaaS solutions move data between apps or external data sources with little or no transformation. They only give users the ability to send data from one source at a time and the data is still raw. Likewise, the data can’t be combined to create a 360 view of the customer. They are strictly point-to-point solutions. This makes it impossible to send aggregated information like ARR (annual recurring revenue); something simple like individual purchases would have to be sent instead. iPaaS solutions like Tray and Workato are also largely based on event triggers. A trigger represents an event that takes place in an individual system.

That event is then transmitted to the integration platform through an API call or webhook, and the platform then performs predefined actions set in place by the user. Because these solutions are often based on events or records, they often run into rate limit errors. Additionally, with iPaaS solutions, the user has to worry about painful “edge cases” like foreign keys, API limits, and the inevitable tree of if/else statements. One of the main draws of iPaaS solutions is that they provide an extremely simple UI that requires no technical knowledge. This means non-technical users can control their workflow automation needs. The downside is that things can get complicated very quickly. Likewise, it is important to note that data cannot be moved unless there is an event trigger, and this can cause serious problems.
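The trigger-and-action pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeCrm`, its quota, and the event shapes are hypothetical stand-ins, not the API of Tray, Workato, or any real CRM. It shows why one API call per event runs into rate limits and where the if/else tree comes from.

```python
class RateLimitError(Exception):
    """Raised by the fake destination when its call quota is used up."""


class FakeCrm:
    """Hypothetical destination that accepts a fixed number of calls per window."""

    def __init__(self, calls_per_window):
        self.remaining = calls_per_window
        self.records = {}

    def update_record(self, record_id, fields):
        if self.remaining == 0:
            raise RateLimitError("quota exhausted")
        self.remaining -= 1
        self.records.setdefault(record_id, {}).update(fields)


def handle_event(crm, event):
    """One trigger leads to one predefined action, which costs one API call."""
    if event["type"] == "signup":
        crm.update_record(event["user_id"], {"status": "new"})
    elif event["type"] == "purchase":
        crm.update_record(event["user_id"], {"last_purchase": event["amount"]})
    # Any other event type has no predefined action and is silently ignored.


crm = FakeCrm(calls_per_window=2)
events = [
    {"type": "signup", "user_id": "u1"},
    {"type": "purchase", "user_id": "u1", "amount": 40},
    {"type": "purchase", "user_id": "u2", "amount": 15},
]

failed = []
for event in events:
    try:
        handle_event(crm, event)
    except RateLimitError:
        failed.append(event)  # the third event exceeds the two-call quota

print(crm.records, failed)
```

Note that the destination only ever receives raw, per-event fields. An aggregate such as ARR never appears, because no single event carries it.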

Reverse ETL

Unlike iPaaS solutions, Reverse ETL solutions like Hightouch integrate directly with the data warehouse, meaning that data can be synced to various operational systems so that it updates in real time. This is extremely useful because the data warehouse is typically the single source of truth for most organizations. With Reverse ETL, operational systems can show the same information that is displayed in the warehouse - all in a matter of minutes.

Better yet, Reverse ETL solutions can leverage existing data models (ex: lifetime value, propensity scores, customer health scores, ARR/MRR, funnel stages) that have been built on top of the data warehouse. Reverse ETL can sync all of the data broken down by each user instantly. It also automatically handles rate limits, retries, etc., and uses bulk APIs without requiring any user input. Hightouch lets users easily define data, map the appropriate fields, and send that information to the tool of their choosing.
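As a rough illustration of what "bulk APIs plus automatic retries" means, here is a minimal Python sketch. It is an assumption-laden toy, not Hightouch's implementation: `sync_rows`, `RateLimited`, and the flaky sender are hypothetical names, and a real destination would have its own batch sizes and error types.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a destination raises when it wants the caller to slow down."""


def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def sync_rows(rows, send_batch, batch_size=2, max_retries=3, sleep=time.sleep):
    """Send rows in batches, retrying a rate-limited batch with exponential backoff."""
    sent = 0
    for batch in chunks(rows, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                sent += len(batch)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
    return sent


# Demo with a fake destination that rejects only the very first call.
received = []
state = {"calls": 0}


def flaky_send(batch):
    state["calls"] += 1
    if state["calls"] == 1:
        raise RateLimited("try again later")
    received.append(list(batch))


delays = []  # record requested waits instead of actually sleeping
rows = [{"user_id": i, "ltv": 100 * i} for i in range(1, 6)]
total = sync_rows(rows, flaky_send, batch_size=2, sleep=delays.append)
print(total, len(received), delays)
```

Five modeled rows go out in three batch calls rather than five single-record calls, and the one rate-limited attempt is retried without any user involvement.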

The use case should always be at the forefront of any decision when considering the adoption of a tool. For more information on iPaaS and Reverse ETL, check out this Guide to Data Integration.

The Benefits of Data Enrichment

Nearly every company captures the data required to make business decisions. However, very few leverage that data to turn insights into actions because it is kept in a dashboard or report. Even worse, the data is often siloed in a way that is not accessible to non-technical team members, and these are the exact teams that need this data to drive their day-to-day decisions. This is why data enrichment is valuable. When data is accessible by everyone, it’s actionable by everyone, and this means that different teams will always be working towards the same goals because they all have the same view of the customer. Every company is different, but data should always be a valuable asset. With that in mind, the importance of Reverse ETL and Operational Analytics cannot be overstated.

Announcing Hightouch Audiences: Enabling Marketers to Self-Serve their Data

Zack Khan — Tue, 31 Aug 2021 18:23:31 +0000

You have this amazing idea for a marketing campaign: sending your customers this congratulatory Gif after they complete their first purchase. Genius, right?

You have the copy written. All that's left to do is to get a CSV with
all of your customers that made a recent purchase: sounds simple. You
ask your lovely data team for the CSV, and bam! You're told to add a
JIRA ticket in a never-ending backlog of 100 other data requests from
across the company. The earliest they can get to your ticket is next
month. What do you do now?

We've all been there before. But why is this the norm? Here at
Hightouch, we believe that everyone should be able to use data to
personalize the experience of their customers. Today, we move one step
closer to that vision with our latest product for Marketers: Hightouch
Audiences.

Goodbye CSVs, Hello Hightouch Audiences

Hightouch Audiences allows marketers to define audiences to target for
marketing campaigns and sync those audiences to any of their marketing
tools like email or ad tools: no engineering favors required. As a
marketer, you can use our Visual Audience Builder to visually define and
filter audiences in an intuitive UI, without needing to know SQL.

Once you define an audience, it will continually sync new members of
that audience to your tools: no more having to upload a new CSV every
week.

Use Cases

Previously, the data warehouse was only accessible by SQL-savvy team
members. With Hightouch Audiences, the power of the data warehouse is
democratized so that anyone can personalize customer experiences for use
cases such as:

Lifecycle marketing: Send lifecycle marketing campaigns to customers across any channel as soon as they invite a friend or abandon a shopping cart
Target Your Paid Ads: Increase your ROAS by retargeting customers who visited your pricing page or excluding customers who already purchased
Create lookalike audiences: Find new customers similar to your existing high value customers in all of your ad networks like Facebook and Google
Send Conversion Events: Send conversion events to any ad network to optimize your targeting, reduce your CAC and provide enhanced ROAS reporting

So, how does it work exactly?

You can think of Hightouch Audiences as the glue between marketing and
data teams.

First, data teams define models and relationships. With just a SQL query or dbt model, data teams can define the data that Marketing can access (columns, tables, etc). Hightouch is flexible and supports any data available in your warehouse: Users, Accounts, Workspaces, Products, etc.
Next, marketing teams create audiences on top of data models. Marketing teams can visually filter audiences based on any properties or events. For example, you can choose an audience of users who purchased a certain item or live in a certain city.
Then, audiences are synced to marketing tools. Audiences can be synced to CRMs, ad networks like Facebook, marketing automation tools like Marketo, lifecycle marketing tools like Braze or Iterable, email tools like Mailchimp and 60+ other tools.
Finally, marketing teams can run campaigns on those audiences. You can deliver personalized messaging to those audiences in your marketing tool and channel of choice, such as email, SMS or in-app.
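The four steps above can be sketched in Python to show the moving parts. This is a simplified illustration, not the Visual Audience Builder or Hightouch's sync engine: the row layout, the `build_audience` and `diff_audience` helpers, and the sample data are all hypothetical.

```python
def build_audience(rows, predicate):
    """Return the set of user ids whose model row matches the audience filter."""
    return {row["user_id"] for row in rows if predicate(row)}


def diff_audience(previous, current):
    """Work out which members to add to or remove from a marketing tool."""
    return {
        "add": sorted(current - previous),
        "remove": sorted(previous - current),
    }


# Step 1: rows from a data model defined by the data team (made-up sample).
rows = [
    {"user_id": "u1", "city": "Austin", "purchases": ["hat"]},
    {"user_id": "u2", "city": "Austin", "purchases": ["sneakers"]},
    {"user_id": "u3", "city": "Boston", "purchases": ["sneakers"]},
    {"user_id": "u4", "city": "Austin", "purchases": ["sneakers", "hat"]},
]


# Step 2: marketing's filter, e.g. users in a city who bought a certain item.
def austin_sneaker_buyers(row):
    return row["city"] == "Austin" and "sneakers" in row["purchases"]


current = build_audience(rows, austin_sneaker_buyers)

# Step 3: compare with what the tool already holds and sync only the changes.
previous = {"u1", "u2"}
changes = diff_audience(previous, current)
print(changes)
```

Syncing only the difference between runs is what removes the need to upload a fresh CSV every week.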

Where does Hightouch Audiences sit within my marketing stack?

Hightouch Audiences acts as the central data source of your marketing
stack that enables you to run multi-channel campaigns in all of your
tools. For example, you can define an audience of users who visited your
pricing page. Then, you can sync that audience to your email tool (ex:
Mailchimp) for a personalized pricing email and retarget them on ad
networks (ex: Facebook) to get them to convert. All you need to do is
define your audience once, and Hightouch takes care of making that
audience available in 60+ different tools.

What makes Hightouch Audiences unique? How does it relate to a CDP?

Most CDPs (Customer Data Platform) and audience builders like Segment
trap you in a rigid data model, lock you to use just Events and Users,
store your data, and hit rate limit issues at scale. The key difference
between Hightouch Audiences and any other audience builder is that
Hightouch is more flexible. How? Because it works on top of your data
warehouse, which is the single source of truth for your business. Your
warehouse has the full picture of your customers, including product
actions, billing information and even 3rd party data sources.

All this context on your customers allows you to create powerful
audiences. You can even use Custom Objects specific to your business
(like Workspaces, Accounts, Products, etc): for example, filtering all
customers who bought a specific Product or belong to a high value
Account. And your data stays in your warehouse, keeping your precious
customer data safe within your systems (which is especially needed for
regulated industries like Fintech and Healthcare). That's something a
CDP (Customer Data Platform) can't do.

However, Hightouch does not help with event collection: you can still use
a CDP or solutions like Snowplow for that.

Is Hightouch going to replace BI tools like Looker, Amplitude or Mixpanel?

BI tools like Looker have also made data accessible to more people in a
company, but at the end of the day, they are focused more on analytics
(hence, "business intelligence") than activation ("putting data to
work"). In order to activate data in BI Tools, it's still fundamentally
all about CSVs. Hightouch Audiences is even more accessible given our
intuitive Visual Audience Builder, and even more powerful and automated
than CSVs because it connects directly to your marketing tools. This
saves you time from doing manual work and helps you automate your
campaigns.

What data does Hightouch Audiences use? What setup is required?

Hightouch Audiences requires that you have a data warehouse or data lake
that has event data on how your users are interacting with your product.
Hightouch does not replace your existing event tracking or ELT
workflows: in fact, it enhances them. The more data you make available
in your warehouse (such as billing data from Stripe, sales data from
Salesforce, etc), the more powerful your audiences can be.

Does Hightouch support real time?

You need to reach customers when your message is most relevant to them,
like the example above of sending a gif as soon as a customer
completes a purchase (not 10 minutes later). Hightouch supports
real-time use cases: it can send data to marketing tools as soon as an
event lands in your warehouse, or you can use a streaming data source
like Segment, Kinesis or Kafka.

How do I try out Hightouch Audiences?

Learn more about Audiences and book a demo here.

What is Operational Analytics?

Zack Khan — Wed, 25 Aug 2021 02:15:38 +0000

Operational Analytics

It's common to hear teams talk about the importance of "data-driven decision-making". That was once a lofty aspiration, but innovations in data warehouses and BI tools have made it simpler and cheaper than ever to actually make sense of data. But there's an unsolved challenge - insights gathered from analytics are only valuable once they're actually used to make a change in the business that moves the needle. This is sometimes referred to as the "last mile of analytics."

Without that elusive last mile, analytics is at best a reactive report card for your business, and at worst, a waste of time. At Hightouch, we've worked with hundreds of companies who struggled with the last-mile of analytics problem: all of their important data lives in the warehouse, reporting is solid, but it's too hard to take action on that data.

Operational analytics is an approach to analytics that shifts the focus from simply understanding data to actually putting that data to work in the tools that run your business. Instead of just using dashboard data to make decisions, operational analytics is about turning insights into action - automatically.

The two uses for data

Every company generally uses data in 2 ways:

Operations: using data to actually "do things". For example, triggering an email when a new customer signs up or makes a purchase.
Analytics: using data to understand what's going on in the business. For example, building an executive dashboard with KPIs across sales, marketing, finance, etc.

Operational data is all about syncing data between systems to do things like communicate with users, bill customers, alert employees, etc. Analytics is often seen as one of many "destinations" for the operational data pipeline.

The persistent challenge with operational data is that it's not easy to get your various tools to "talk to" one another. For each pair of tools, you need to figure out how to get data to flow dependably and accurately between them. If you've ever gotten an email addressed to first_name, you've seen this notorious challenge rear its head.

Some other examples of operational data workflows:

A B2B software company syncing product usage data to a CRM so a sales rep knows when to reach out to a customer.
An ecommerce company syncing purchase data to an ad network so that recent purchasers don't get targeted for something they already bought.

Analytics, on the other hand, is about bringing all kinds of different data together and visualizing it in a way that paints a picture of what's going on in the business.

The beauty of analytics data (which turns out to be the key that unlocks operational analytics) is that it's often the only realm where different datasets live together harmoniously - most often in a data warehouse. Analytics data is tied together neatly through models that form the foundation of the digestible, contextual charts that analytics tools provide.

Thanks to innovations in data warehouses and the surrounding ecosystem, bringing data together for analytics has never been easier or more cost-effective.

The analytics layer is a hub, not a spoke

It just so happens that what was previously thought of as the "analytics layer" turns out to be the perfect foundation for operational data workflows, and an antidote to those challenges associated with getting systems to "talk to" one another.

As opposed to creating point-to-point connections between tools ("spokes"), companies are now beginning to use the warehouse as not just the foundation for their analytics, but as the "hub" for all operational data workflows. This is operational analytics.
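A quick back-of-the-envelope calculation shows why the hub model scales better. The sketch below assumes the worst case, where every pair of tools needs its own connection, and compares that with one connection per tool to the warehouse. The function names are illustrative only.

```python
def point_to_point_links(n_tools):
    """Connections needed if every pair of tools is wired together directly."""
    return n_tools * (n_tools - 1) // 2


def hub_links(n_tools):
    """Connections needed if every tool talks only to the warehouse hub."""
    return n_tools


for n in (3, 6, 12):
    print(n, point_to_point_links(n), hub_links(n))
```

With 12 tools, that is 66 pairwise connections to build and maintain versus 12 through the hub. Real stacks rarely need every pair connected, so treat this as an upper bound.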

There are a few reasons why the "analytics layer" is the ideal hub for operational workflows:

It's simple to aggregate and integrate data in data warehouses; it's what they're built for. Teams can easily bring customer data, billing data, employee data, and other datasets together into the fabled single view of the customer, a promise made by many SaaS vendors who haven't really delivered. Once the data's in the warehouse, the path of least resistance is to just send that data out where it needs to go.
Security: companies own their warehouse, so data never has to leave your purview and fall into the hands of yet another vendor.
This approach breaks down silos between data teams and business teams by creating a clear handoff: data teams own the raw data and model it into clean data, which empowers business teams to own the management and sync of that data to the tools they need to run the business.

This approach is dramatically changing how companies think about analytics.

Analytics has always been about understanding your business and using that knowledge to make decisions. The problem comes when those decisions have to actually get carried out. All too often, good ideas come out of analytics, but fizzle into nothing when data actually needs to be put to work.

Reporting alone is necessary, but not sufficient. It doesn't actually drive the actions that move the needle. Modern companies can no longer just make data-driven decisions. They need to act on those decisions with data and do so automatically. This is operational analytics.

An example of Operational Analytics

Let's take an example of what it might look like in practice.

Imagine you work at a software company with a freemium model. Users can sign up for free and use the product up to a certain limit, at which point they then have to pay. You might use analytics in the form of a BI dashboard to track the number of signups, the percentage of users who convert to a paid account, and the effectiveness of sales reps in converting those customers to paid. You find that the sales reps who spend time personalizing the outreach to free users with information about their specific use case tend to over-perform.

Currently, these sales reps need to track down information across systems - Slack, Salesforce, and others - in order to get the full scoop before sending out a personalized, relevant email from Hubspot. This is where operational analytics can help.

With operational analytics, the same data that's feeding your BI dashboard can be automatically synced into Hubspot. For instance, Hubspot contact and account records can be enriched with information like: whether or not the user has fully onboarded, the last login date, and the integrations that the user has set up. Now, sales reps don't need to track down information and spend time manually writing personalized emails. This data can be used to automate that outreach, leaving reps more time to help customers.
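To picture what that enrichment step involves, here is a minimal Python sketch that turns one warehouse row into a contact update. Everything in it is an assumption for illustration: the column names, the property names in `FIELD_MAP`, and the payload shape are hypothetical and do not reflect Hubspot's actual API or how Retool configured its sync.

```python
from datetime import date

# Hypothetical mapping from warehouse columns to CRM contact properties.
FIELD_MAP = {
    "has_onboarded": "onboarded",
    "last_login": "last_login_date",
    "integrations": "integrations_enabled",
}


def to_contact_update(row):
    """Build a CRM-style update payload from a single warehouse row."""
    properties = {}
    for column, prop in FIELD_MAP.items():
        value = row.get(column)
        if value is None:
            continue  # skip missing values instead of overwriting CRM data
        if isinstance(value, date):
            value = value.isoformat()
        elif isinstance(value, list):
            value = ";".join(value)
        properties[prop] = value
    return {"email": row["email"], "properties": properties}


# Made-up row, as it might come out of the model that feeds the BI dashboard.
row = {
    "email": "dana@example.com",
    "has_onboarded": True,
    "last_login": date(2021, 8, 20),
    "integrations": ["slack", "postgres"],
}

update = to_contact_update(row)
print(update)
```

Once fields like these live on the contact record, an email template can reference them directly, which is what lets the outreach be personalized automatically.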

This isn't a hypothetical example. It's exactly how Retool used Hightouch for operational analytics. Once Retool began using analytics not just for reporting, but for action, they saw some pretty staggering results, including a 32% increase in reply rate on emails, as well as a 500% increase in click rate and increased feature adoption.

Want to get started?

If you're ready to get started with operational analytics, we're happy to help. Feel free to create a shared Slack channel with us here at this link: https://api.hightouch.io/api/misc/shared_slack or schedule a call with us here: https://calendly.com/mwhittle5/meeting. There's a lot to consider with operational analytics, and some teams might not be ready for Hightouch just yet. That's okay; we aren't pushy and will do our best to help.