Julien Kervizic

Originally published at Medium

How to collect the data you need to bootstrap your digital marketing analytics

To a large extent, bootstrapping your marketing activities with data revolves around collecting data from two or three specific domains, depending on the scope of your business.

  • Campaign Activity: Whether it comes from digital marketing that pushes ads through platforms such as Facebook or Google Ads, or from email service providers and marketing automation tools.
  • Clickstreams: Provide an understanding of the customer journey on site and the different factors that contributed to a conversion. Clickstream data makes it possible to properly attribute conversions to specific campaigns.
  • Sales: Digital marketing for e-commerce websites revolves around generating online sales. These should be tracked, and campaign spend should be optimized against this objective.

Use Cases

There are quite a few use cases around marketing analytics, but it is easy to show how the three data sources of Campaign Activity, Clickstreams and E-commerce Sales can drive some of the biggest ones, such as media-mix modeling, marketing attribution and churn prevention.

Media-Mix Modeling

Media-mix modeling (MMM) helps you understand how to shift your budget mix among different advertising channels to optimize your outcome. It relies on statistical techniques to estimate where one marginal unit of spend would be best placed.

MMM relies on two specific sources of information: (1) spend data (Campaign) and (2) outcome data (e.g. e-commerce sales) that we want to optimize for. To be effective, MMM needs a view of the spend on all the different channels contributing to the outcome.
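To make the idea of "marginal spend" concrete, here is a minimal sketch. It assumes that a response curve has already been fitted per channel, and that it has a diminishing-returns shape of the form scale * log(1 + spend). The curve shape, the channel names and all the numbers are assumptions made up for illustration, not the output of a real model.

```python
import math

# Hypothetical fitted response curves: sales_i = scale_i * log(1 + spend_i).
# The scale values and current spend levels are invented for illustration.
channels = {
    "search": {"scale": 500.0, "spend": 100.0},
    "social": {"scale": 300.0, "spend": 20.0},
    "email": {"scale": 50.0, "spend": 5.0},
}


def predicted_sales(scale, spend):
    """Sales predicted by the assumed log-shaped response curve."""
    return scale * math.log(1.0 + spend)


def marginal_return(scale, spend):
    """Derivative of scale * log(1 + spend) with respect to spend."""
    return scale / (1.0 + spend)


def best_channel_for_next_unit(channels):
    """Return the channel where one extra unit of spend adds the most sales."""
    return max(channels, key=lambda name: marginal_return(**channels[name]))


if __name__ == "__main__":
    for name, params in channels.items():
        print(name, round(marginal_return(**params), 2))
    print("next unit goes to:", best_channel_for_next_unit(channels))
```

Note that the channel with the highest total sales is not necessarily the one with the highest marginal return, which is why the budget mix is shifted based on the derivative rather than on totals.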

Marketing Attribution

The role of marketing attribution is to assign credit to specific marketing campaigns, to get a better sense of their contribution to a specific objective. Attribution methods such as “last click” give full credit for a reached objective to a single campaign, in this case the last touchpoint that contributed to the objective, while other techniques assign partial credit to several touchpoints. Explanations of the different attribution techniques are provided in the following Medium posts: here and here.

Marketing attribution relies on Campaign data (spend), Clickstream information and Sales data in order to properly attribute campaign activities. Along with setting up the different systems to collect the information, proper URL tagging (UTM) and potentially setting the right cookies on the website are necessary first steps to enable this use case.
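As a rough illustration of how these pieces fit together, the sketch below reads the campaign from a UTM-tagged landing URL and then compares last-click attribution with a simple linear (equal split) model. The journey, the URL and the revenue figure are made-up examples, and the linear model is just one of several possible partial-credit techniques.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def campaign_from_url(url):
    """Return the utm_campaign value of a tagged URL, or None if untagged."""
    values = parse_qs(urlparse(url).query).get("utm_campaign")
    return values[0] if values else None


def last_click(touchpoints, revenue):
    """Give all of the revenue to the final campaign touchpoint."""
    if not touchpoints:
        return {}
    return {touchpoints[-1]: revenue}


def linear(touchpoints, revenue):
    """Split the revenue evenly across every touchpoint of the journey."""
    if not touchpoints:
        return {}
    credit = defaultdict(float)
    share = revenue / len(touchpoints)
    for campaign in touchpoints:
        credit[campaign] += share
    return dict(credit)


# Hypothetical journey of one customer, ordered from first to last touch.
journey = ["facebook_prospecting", "newsletter_may", "google_brand"]

if __name__ == "__main__":
    print(campaign_from_url(
        "https://shop.example/?utm_source=google&utm_campaign=google_brand"))
    print(last_click(journey, 90.0))
    print(linear(journey, 90.0))
```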

Churn Prevention

Churn identification and prevention is one of the most traditional CRM use cases. It leverages Sales and Campaign data (CRM) to better understand which customers are likely to churn and which offers they would tend to respond to in order to stay with the service.
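A very simple starting point, before any predictive model, is a recency rule on the sales data. The sketch below flags customers whose last order is older than a threshold; the 90-day threshold and the customer data are assumptions chosen for illustration only.

```python
from datetime import date


def likely_churned(last_order_dates, today, inactivity_days=90):
    """Return the sorted ids of customers inactive for more than the threshold."""
    return sorted(
        customer_id
        for customer_id, last_order in last_order_dates.items()
        if (today - last_order).days > inactivity_days
    )


# Hypothetical last order date per customer, derived from the sales data.
last_orders = {
    "c1": date(2019, 1, 5),
    "c2": date(2019, 4, 20),
    "c3": date(2018, 11, 30),
}

if __name__ == "__main__":
    print(likely_churned(last_orders, today=date(2019, 5, 1)))
```

The resulting list can then be joined with campaign data to see which offers these customers responded to in the past.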

Data Collection

In general, one of the hardest parts of enabling marketing analytics is sourcing the data. It requires information to be pulled from a variety of sources. Depending on the specifics of the business trying to kick-start these use cases, there can be a variety of applicable ways to integrate each of the required data sources.

Campaigns data collection

There are quite a few ways to integrate and collect campaign data: using specific data integration solutions, leveraging Singer taps, using the tools' built-in data export capabilities, or building specific API pipelines.

Data Integration solutions

Different tools exist to simplify the collection of data from marketing campaigns. Talend, Adverity, Fivetran and Alooma (recently acquired by Google) provide a series of connectors that make data integration from these different ad sources fairly easy.

Singer Taps

Singer taps provide open-source, pre-built connectors to a series of advertising sources such as Facebook, Google, Outbrain, Salesforce, Marketo and Selligent. The data from these sources can then be fetched by only modifying some configuration settings and executing a command-line call, for instance:

tap-adwords -c config.json -p properties.json -s state.json
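A tap writes its output as JSON messages, one per line, of type SCHEMA, RECORD or STATE. Normally this output is piped into a Singer target, but as a minimal sketch of what flows through the pipe, the snippet below groups the records by stream and keeps the last state. The sample lines and the "campaigns" stream are invented for illustration and are not real tap-adwords output.

```python
import json


def collect_records(lines):
    """Group RECORD messages by stream and return the last STATE value seen."""
    records = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
        # SCHEMA messages describe the stream and are ignored in this sketch.
    return records, state


# Made-up sample of what a tap could print on standard output.
sample_output = [
    '{"type": "SCHEMA", "stream": "campaigns", '
    '"schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "campaigns", "record": {"id": 1, "cost": 12.5}}',
    '{"type": "RECORD", "stream": "campaigns", "record": {"id": 2, "cost": 7.0}}',
    '{"type": "STATE", "value": {"campaigns": "2019-05-01"}}',
]

if __name__ == "__main__":
    records, state = collect_records(sample_output)
    print(records)
    print(state)
```

The state value is what gets written back to state.json so that the next run only fetches new data.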

Data Exports

Certain advertising tools allow data exports to BigQuery, to a different data warehouse, or as file exports, for instance:

  • DoubleClick and Emarsys provide exports to Google BigQuery
  • Salesforce Marketing Cloud allows FTP/CSV exports, a flow which can be automated in Automation Studio and then ingested into a database or data warehouse

Custom Development — API Pipelines

Pipelines can also be developed to pull the data directly from sources such as Facebook or Google through API calls. This requires a data engineer or developer to set up the different data flows. Most advertising sources provide SDKs for easy integration with their platforms.
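Whatever the source, such a pipeline usually boils down to walking through paginated API responses. The sketch below shows that loop with the actual API call injected as a function, so it does not depend on any particular SDK. The fetch_page contract (returning the rows and the next cursor) and the fake data are assumptions for illustration, not the behavior of a specific advertising API.

```python
def pull_all_pages(fetch_page):
    """Collect rows from a paginated source.

    fetch_page(cursor) must return (rows, next_cursor), where next_cursor
    is None once the last page has been reached.
    """
    rows, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        rows.extend(page)
        if cursor is None:
            return rows


# Stand-in for a real API client: two pages of hypothetical spend data.
FAKE_PAGES = {
    None: ([{"campaign": "a", "spend": 10.0}], "page-2"),
    "page-2": ([{"campaign": "b", "spend": 5.0}], None),
}


def fake_fetch_page(cursor):
    return FAKE_PAGES[cursor]


if __name__ == "__main__":
    print(pull_all_pages(fake_fetch_page))
```

In a real pipeline, fake_fetch_page would be replaced by a call to the platform's SDK or HTTP API, with retries and rate-limit handling around it.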

ClickStream Collection

Different alternatives exist to collect clickstream data: relying on a premium analytics tool such as Google Analytics 360 or Adobe Analytics, leveraging a Customer Data Platform, using a clickstream collector, or setting up some custom development.

Google Analytics/Adobe Analytics

The simplest way to collect raw clickstream data, if you can afford it, is through Google Analytics or Adobe Analytics. Google offers the possibility to export raw clickstream data to BigQuery as part of its Google Analytics 360 offering. One of the major drawbacks of going down that route is the $150k annual cost of Google Analytics 360.

Customer Data Platform

Most customer data platforms (CDPs) offer the possibility to export ingested events to a data warehouse. They ingest data from multiple sources, including website activity, and are able to stream it back for processing or analysis. Depending on the size of your business, a customer data platform might prove more expensive than purchasing a Google Analytics 360 license, but it offers additional benefits. Certain CDPs, such as Segment, offer a free version up to a certain number of events or active users.

ClickStream Collector

Different open-source clickstream collectors exist, the best known being Snowplow and Divolte. They offer a way to ingest clickstream data without having to develop the collection fully yourself. The drawback of using these is that you need to manage the infrastructure.

Custom development

Another solution to collect clickstream data is through custom development. A Logic App, Function App or Lambda function combined with an EventHub/Kinesis/PubSub setup would allow for a scalable ingestion of data, but at the cost of managing code and infrastructure.
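The code living inside such a function is typically small: validate the incoming tracking call, stamp it server-side and hand it over to the stream. The sketch below shows that logic independently of any cloud provider. The required fields, the status codes and the publish callback (standing in for an EventHub/Kinesis/PubSub producer) are all assumptions for illustration.

```python
import json
import time

# Hypothetical minimal event schema; a real tracker would carry more fields.
REQUIRED_FIELDS = ("event", "anonymous_id", "url")


def handle_event(raw_body, publish, now=time.time):
    """Validate one tracking call and forward it to a stream.

    publish is a callable standing in for the stream producer.
    Returns an HTTP-like status code.
    """
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # body is not valid JSON
    if not isinstance(event, dict):
        return 400
    if any(field not in event for field in REQUIRED_FIELDS):
        return 400  # missing mandatory field
    event["received_at"] = now()  # server-side timestamp
    publish(json.dumps(event))
    return 202


if __name__ == "__main__":
    buffer = []
    body = '{"event": "page_view", "anonymous_id": "abc", "url": "/home"}'
    print(handle_event(body, buffer.append), buffer)
```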

(Online) Sales data collection

There are different ways to source information related to online sales, all of which have their own set of pros and cons:

  • Analytics Tags: Analytics tags on the website can give a sense of online sales data. Data from your analytics tool can then be exported manually, through API calls or, for those with Google Analytics 360, through a BigQuery export.
  • Database replication/mirroring: Some e-commerce platforms expose their database or let you choose your own database solution. This allows for using database replication/mirroring to get a live copy of the data for analysis or reporting purposes.
  • Data Integration Solutions: Pre-built data integration solutions exist for certain platforms. Acquiring one of these solutions can significantly ease the development and maintenance needed to handle this part of the data collection process.
  • Singer Taps: Certain platforms have Singer taps created for them, essentially a pre-built ETL/API client which only needs to be configured before pulling the data.
  • API Pipelines: API pipelines can be run on a schedule to pull the relevant information. They do require some custom development.
  • Webhooks: For platforms that support them, webhooks provide a way to export and ingest data in real time. Their main drawback is that you need to develop and maintain the infrastructure, in particular an API/webhook receiver.
  • Streams: Some e-commerce solutions let you export a feed of transactions to an event stream. This allows for real-time ingestion while bypassing the need for an API/webhook receiver.

Analytics Tags

It is possible to capture online sales data through an analytics tag such as Google’s. Google provides a structured way to pass the information to Analytics through its enhanced ecommerce plug-in. This provides a good first pass at capturing e-commerce data; there are, however, quite a few drawbacks to that approach:

  1. Ad blockers: This approach faces issues for users of certain ad blockers, which blacklist the Google Analytics domain and any call attempted by the analytics.js tag, although there are some workarounds.
  2. Page load issues: Users who leave the thank-you page before the tag has had the chance to fire will not have their order data pushed to Google Analytics.
  3. Delayed payment: Orders with payment methods that allow for delayed payment, such as bank transfer or PayPal, might be incorrectly classified as successful sales or not be recorded at all, depending on the approach taken.

The main advantages of this approach are its universality and the speed at which it can be deployed. It usually only requires some tag integration for most web-shops, and for certain platforms such as Shopify, only some simple configuration.

Another advantage of setting up enhanced e-commerce tracking is the ability to tie purchases to specific sessions and therefore rely on Google’s last-click attribution. This makes it possible to attribute specific orders to specific campaigns and sales channels on a last-click basis.

Database Replication & Mirroring

Some e-commerce platforms allow those operating the platform to set up their own database. This is the case for Magento, EpiServer and SiteCore, for instance. In these cases, it is possible to set up a master-slave database replication or database mirroring, so that the data can be used for reporting purposes without affecting the production environment.

These can be set up without custom development and allow for a quick turnaround in providing data for reporting purposes.

Data Integration solutions

As for Campaign data, data integration tools exist that provide turnkey integration for e-commerce data. Each of the vendors mentioned earlier provides connectors to certain e-commerce platforms:

  • Talend: Shopify, BigCommerce and Magento are available as Talend connectors from Cloudbee,
  • Adverity: supports Hybris, Shopware, Shopify and Magento
  • Fivetran: supports Salesforce Commerce Cloud, Magento, WooCommerce, Shopify and SpreeCommerce
  • Alooma: supports Shopify and Magento

These data integration solutions are an alternative when technical capabilities are lacking within the team or department.

Singer Taps

Currently WooCommerce is the only web-shop with a Singer tap connector, making its use quite restricted. It is however possible to develop custom Singer taps for specific uses. This can be a good move when online campaign data is already being collected through Singer taps.

Custom Development — API Pipelines

Most web-shops these days allow order information to be pulled directly through API calls. This is the case for pure SaaS platforms such as Shopify, Lightspeed, Commerce Cloud or Commercetools, but also for the likes of Magento.

Some of these are supported by Python SDKs.

One of the major drawbacks of this approach is that it requires custom data engineering or software engineering work. It also requires polling, and to a certain degree is not “real-time”. Certain platforms furthermore have rate limits that might make it impractical to pull large amounts of orders.

One of its advantages, however, is the ability to pull updated information for specific orders or date ranges.
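A common way to exploit this is incremental polling: on each run, only the orders updated since the last run are pulled and upserted, which also picks up late status changes such as a delayed payment. The sketch below assumes the platform can filter orders on an "updated since" timestamp; the function names, the order fields and the fake data are hypothetical.

```python
from datetime import datetime


def poll_updated_orders(fetch_updated_since, store, watermark):
    """Upsert orders changed after the watermark and return the new watermark."""
    for order in fetch_updated_since(watermark):
        store[order["id"]] = order  # insert or overwrite with latest version
        if order["updated_at"] > watermark:
            watermark = order["updated_at"]
    return watermark


# Stand-in for the platform API: hypothetical orders with update timestamps.
PLATFORM_ORDERS = [
    {"id": 1, "status": "paid", "updated_at": datetime(2019, 5, 3)},
    {"id": 2, "status": "pending", "updated_at": datetime(2019, 5, 2)},
]


def fake_fetch_updated_since(since):
    return [o for o in PLATFORM_ORDERS if o["updated_at"] > since]


if __name__ == "__main__":
    warehouse = {}
    new_watermark = poll_updated_orders(
        fake_fetch_updated_since, warehouse, datetime(2019, 5, 1))
    print(new_watermark, sorted(warehouse))
```

Persisting the returned watermark between runs keeps each poll small, which helps to stay within rate limits.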

Custom Development — Webhooks

Webhooks are essentially a way to send a notification over HTTP when some type of event happens. For our purpose, they provide a way to ingest data from an e-commerce platform in real time. Webhooks can also be a way around some of the rate limits, provided there is no need to call the API back for additional data.

They do require some sort of webhook listener API and an ingestion layer in order to capture the data. These can be built with the same type of technologies used for capturing clickstream data, for example a Logic App / EventHub combination.
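The core of such a listener is to check that the call really comes from the platform before ingesting the payload. Many platforms sign the request body with a shared secret; the sketch below assumes a base64-encoded HMAC-SHA256 signature, which is a common scheme, but the exact header name and encoding differ per platform and should be taken from its documentation. The sink list stands in for the ingestion layer.

```python
import base64
import hashlib
import hmac
import json


def is_authentic(body, signature_b64, secret):
    """Compare the received signature with an HMAC-SHA256 of the raw body."""
    expected = base64.b64encode(hmac.new(secret, body, hashlib.sha256).digest())
    return hmac.compare_digest(expected, signature_b64.encode("utf-8"))


def handle_webhook(body, signature_b64, secret, sink):
    """Ingest a signed webhook call; return an HTTP-like status code."""
    if not is_authentic(body, signature_b64, secret):
        return 401
    sink.append(json.loads(body))  # sink stands in for a queue or event hub
    return 200


if __name__ == "__main__":
    secret = b"shared-secret"
    body = b'{"order_id": 42}'
    signature = base64.b64encode(
        hmac.new(secret, body, hashlib.sha256).digest()).decode("ascii")
    received = []
    print(handle_webhook(body, signature, secret, received), received)
```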

Most e-commerce platforms support webhooks. Shopify, Lightspeed, WooCommerce, Shopware and BigCommerce support them natively, while Magento supports them through third-party plugins, and platforms such as Sitecore or Episerver need custom development.

Custom Development — Streams

Some e-commerce platforms are able to publish events directly onto a message queue (e.g. Google Pub/Sub, Azure Service Bus, AWS SQS). This is the case for Commercetools, which preferred this approach to standard HTTP webhooks. This makes it possible, for instance, to natively “duplicate” the relevant data for both processing (e.g. order fulfillment) and long-term storage in a data warehouse, and to let the different consumers of the data “subscribe” to that single source of information.

Besides Google Pub/Sub, which has a turnkey export to a data warehouse (BigQuery), the other technology choices will still require development work in order to ingest the data.
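The “subscribe” idea can be illustrated with a tiny in-memory stand-in for a topic. This is only a sketch of the fan-out pattern, not the API of Pub/Sub, Service Bus or SQS; the Topic class and both consumers are hypothetical.

```python
class Topic:
    """In-memory stand-in for a message topic with fan-out delivery."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Every subscriber receives its own delivery of the same message.
        for callback in self._subscribers:
            callback(message)


if __name__ == "__main__":
    orders = Topic()
    fulfillment_queue = []   # consumer 1: operational processing
    warehouse_rows = []      # consumer 2: long-term storage

    orders.subscribe(fulfillment_queue.append)
    orders.subscribe(warehouse_rows.append)
    orders.publish({"order_id": 42, "total": 90.0})

    print(fulfillment_queue, warehouse_rows)
```

The publisher does not need to know who consumes the events, which is what makes adding an analytics consumer later on cheap.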
