Mufassir Kazi

Posted on • Originally published at mufassirkazi.hashnode.dev

RudderStack - The Open Source Alternative to Segment

With a team of 114+ people and 3.5k+ GitHub stars, RudderStack is the most popular open-source alternative to Segment.

To put it briefly, RudderStack assists you in gathering all your customer event data. Once you have collected it, you can easily send it to various teams within your organisation or third-party tools that require the collected data for further processing.

This blog post covers the fundamentals, core features, integrations, pricing model, growth and the team behind RudderStack. Let's get started! 🛠️

Where does it all start? 🤔

To gain insight into your users' behaviour, you need to map their journey on your website or mobile app. This may entail examining user interaction details such as click events, searches, impressions, recordings, and more.

All this information needs to be stored somewhere, preferably in a data warehouse like Redshift or Snowflake. This is where RudderStack comes in. ☄️

RudderStack has SDKs (iOS, Android, JS, Python, etc.) that you use to send these events. You set up a backend (self-hosted, or hosted for you by RudderStack) that collects the events and stores them in your data warehouse 🫙. Once you have the data, you can use RudderStack's features to perform various operations on it.

Let's talk about the most important ones:
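
To make "sending an event" concrete, here is a minimal sketch of what a single search event might look like before it leaves your app. It uses plain Python rather than a RudderStack SDK, and the helper name `build_track_event` and the exact field names are assumptions chosen for illustration, so check the SDK docs for the real payload shape.

```python
import json
import uuid
from datetime import datetime, timezone


def build_track_event(anonymous_id, event_name, properties):
    """Assemble one user-interaction event as a plain dict (hypothetical helper)."""
    return {
        "type": "track",                      # assumed field name
        "messageId": str(uuid.uuid4()),       # unique id so duplicates can be spotted
        "anonymousId": anonymous_id,          # who did it (not yet a known user)
        "event": event_name,                  # what happened
        "properties": properties,             # details about what happened
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


event = build_track_event(
    anonymous_id="visitor-123",
    event_name="Search Performed",
    properties={"query": "running shoes", "page": "/search"},
)

# The backend would receive something like this JSON document.
print(json.dumps(event, indent=2))
```

In a real setup the SDK builds and sends this for you; the point is only that every click, search or impression ends up as a small structured record.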

Features ✨

Profiles


Data is accumulated from various sources and dumped into the warehouse. These sources can be your website, mobile app, marketing platforms, sales engagements, etc.

All this data, generated by different users, is made identifiable. Users can be identified by associating identifiers like email ID, phone number, username, etc. Sometimes you need multiple identifiers to recognise the same user, for example when they use both a work and a personal email address.

Since all this data is spread across multiple tables within a warehouse, making sense of it is a cumbersome and error-prone task.

Therefore RudderStack provides you with profiles. With profiles, you no longer need to run complex SQL queries. Instead, you get a unified 360-degree view of a user's journey across your product(s).
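
To get a feel for why stitching identifiers by hand is fiddly, here is a small sketch that groups records sharing any identifier into one profile, using a union-find structure. This is only an illustration of the general idea, not how RudderStack Profiles is implemented; the function name `stitch_profiles` and the sample records are made up.

```python
def stitch_profiles(records):
    """Group identifiers that co-occur in any record; returns a list of sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for identifiers in records:
        if not identifiers:
            continue
        first = identifiers[0]
        find(first)  # register single-identifier records too
        for other in identifiers[1:]:
            union(first, other)

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return list(groups.values())


records = [
    ["email:ada@work.example", "phone:555-0100"],   # seen on the website
    ["email:ada@home.example", "phone:555-0100"],   # seen in the mobile app
    ["email:bob@example.com"],                      # an unrelated user
]

for profile in stitch_profiles(records):
    print(sorted(profile))
```

The two Ada records collapse into one profile because they share a phone number, which is exactly the work-versus-personal-email case described above.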

User Identification


We just talked about identifying users, but how do you go about implementing that in RudderStack?

Simply by using the identify call. It allows you to record traits about a user, like their email, username, phone, etc., and associate them with the user's actions.

This means you should make the identify call after a user signs up for your product, logs into their account or updates their information. This way, you record the traits alongside the action and can identify the user later.
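
Here is a rough sketch of that timing, with one identify call after sign-up and another after a profile update. `TraitStore` is a hypothetical in-memory stand-in, not the RudderStack SDK, and the payload field names are assumptions for illustration.

```python
class TraitStore:
    """In-memory stand-in for wherever identify traits end up (hypothetical)."""

    def __init__(self):
        self._traits = {}

    def identify(self, user_id, traits):
        if not user_id:
            raise ValueError("identify needs a stable user id")
        # Later calls add to or overwrite earlier traits for the same user.
        self._traits.setdefault(user_id, {}).update(traits)
        return {"type": "identify", "userId": user_id, "traits": dict(traits)}

    def traits_for(self, user_id):
        return dict(self._traits.get(user_id, {}))


store = TraitStore()

# Right after sign-up: record what you know so far.
store.identify("user-42", {"email": "ada@example.com", "username": "ada"})

# After the user edits their profile: send only what changed.
store.identify("user-42", {"phone": "+44 20 7946 0000"})

print(store.traits_for("user-42"))
```

Calling identify at each of these moments keeps the stored traits current without you having to resend everything each time.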

But that's not all 👀

RudderStack also allows you to track visitors (not customers yet) on your website by assigning them an Anonymous ID (anonymousId). Such an ID can be a session ID corresponding to the visitor's session.

Later, it helps you turn these anonymous users into known users when they register for your product, thereby giving you the full view of a user's journey, right from the time they visited the website to the point of conversion and beyond 🤯
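
The anonymous-to-known handover can be sketched in a few lines. `JourneyLog` below is a made-up toy, assuming events are keyed by `anonymousId` and that an identify call links that ID to a user ID; the real product does this for you in the warehouse.

```python
from collections import defaultdict


class JourneyLog:
    """Toy event log that links anonymous visitors to known users (hypothetical)."""

    def __init__(self):
        self._events = defaultdict(list)  # anonymousId -> list of event names
        self._owner = {}                  # anonymousId -> userId

    def track(self, anonymous_id, event_name):
        self._events[anonymous_id].append(event_name)

    def identify(self, anonymous_id, user_id):
        # From now on, everything under this anonymousId belongs to user_id,
        # including events recorded before the user registered.
        self._owner[anonymous_id] = user_id

    def journey(self, user_id):
        steps = []
        for anonymous_id, owner in self._owner.items():
            if owner == user_id:
                steps.extend(self._events[anonymous_id])
        return steps


log = JourneyLog()
log.track("visitor-123", "Page Viewed")
log.track("visitor-123", "Pricing Viewed")
log.identify("visitor-123", "user-42")      # the visitor signs up
log.track("visitor-123", "Plan Upgraded")

print(log.journey("user-42"))
# prints ['Page Viewed', 'Pricing Viewed', 'Plan Upgraded']
```

The pre-signup page views show up in the known user's journey, which is the "full view" the paragraph above describes.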

Data governance


Events are generally created in a specific format that can include:

• Event type (button click event, page view event, etc.)

• Event properties (version, page, timestamp, etc.)

• Event metadata (option clicked, text copy, etc.)

Since you'll be generating hundreds of these events across your product, you need to stay consistent when creating them. Depending on the scale of your organisation, there might be multiple stakeholders who define and implement these event specifications. 👩🏼‍🎨👨‍💻👩🏼‍🎨👨‍💻

This can introduce formatting inconsistencies into your data, such as missing fields, incorrect capitalisation (lowercase/uppercase) of event names or unit errors (pounds, dollars).

RudderStack's Data Governance API was created to address these data inconsistencies. It helps you zero in on them by giving you access to information on all your events and their related schemas. This way, you can implement processes to set alerts (🚨) for the errors that can happen in a particular workflow. For example, you can create alerts for incorrect capitalisation, a missing data type, etc.
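
The kind of check you might build on top of that schema information can be sketched locally. The snippet below does not call the Data Governance API; `TRACKING_PLAN` and `find_violations` are hypothetical names, and the plan itself is a made-up example of an agreed event specification.

```python
# Hypothetical agreed specification: event name -> required fields and types.
TRACKING_PLAN = {
    "Order Completed": {"order_id": str, "total": float, "currency": str},
}


def find_violations(event):
    """Return a list of human-readable problems found in one event."""
    problems = []
    name = event.get("event", "")
    schema = TRACKING_PLAN.get(name)

    if schema is None:
        # Maybe the name is right but the capitalisation is not.
        by_lowercase = {planned.lower(): planned for planned in TRACKING_PLAN}
        planned_name = by_lowercase.get(name.lower())
        if planned_name is None:
            return [f"unplanned event: {name!r}"]
        problems.append(
            f"wrong capitalisation: {name!r}, expected {planned_name!r}"
        )
        schema = TRACKING_PLAN[planned_name]

    properties = event.get("properties", {})
    for field, expected_type in schema.items():
        if field not in properties:
            problems.append(f"missing field: {field}")
        elif not isinstance(properties[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


bad_event = {
    "event": "order completed",
    "properties": {"order_id": "A-1", "total": "19.99"},
}

for problem in find_violations(bad_event):
    print("ALERT:", problem)
```

Each printed line is a candidate alert: here the event name is lowercased, `total` arrived as a string and `currency` is missing, so a unit mix-up between pounds and dollars could go unnoticed.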

Integrations

You often want to send a subset of your data to third-party tools. An example use case is using Mailchimp to send automated email campaigns. RudderStack forwards these events so you don't have to deal with third-party libraries or wrangle their SDKs.

Transformations

Now that you know you can integrate with third-party apps, let's see the power of transformations. 💪

Transformations let you write custom functions that transform your data before routing it to its destinations. There are several use cases where such logic plays a crucial role: filtering so that events are routed to the correct destinations, cleaning data, or enforcing data privacy through masking operations.

On top of that, transformations help you connect to internal databases and enrich your user profiles for better analysis. RudderStack also offers a templates section to get you started.
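
Here is a minimal sketch of a transformation that combines filtering and masking. It is written as plain Python you can run locally; the function is named `transformEvent(event, metadata)` on the assumption that this matches the entry-point shape RudderStack expects, and the convention that returning `None` drops the event is likewise an assumption to confirm against the docs. `ALLOWED_EVENTS` and `mask_email` are hypothetical.

```python
ALLOWED_EVENTS = {"Order Completed", "Signed Up"}  # hypothetical allow-list


def mask_email(email):
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return f"{local[:1]}***@{domain}"


def transformEvent(event, metadata):
    # Filtering: only let through events this destination should receive.
    if event.get("event") not in ALLOWED_EVENTS:
        return None

    # Masking: copy the properties so the original event is left untouched.
    properties = dict(event.get("properties", {}))
    if "email" in properties:
        properties["email"] = mask_email(properties["email"])

    cleaned = dict(event)
    cleaned["properties"] = properties
    return cleaned


incoming = {
    "event": "Signed Up",
    "properties": {"email": "ada@example.com", "plan": "free"},
}

print(transformEvent(incoming, metadata={}))
print(transformEvent({"event": "Page Scrolled"}, metadata={}))
```

The first call forwards the sign-up with a masked email, while the second returns `None` because scroll events are not on the allow-list.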

RudderStack 🆚 Segment

• RudderStack is open source, allowing you to keep total control of your data. With Segment 👎, everything stays with them.

• RudderStack can turn your warehouse into a data source for the rest of your stack. Send enriched data to any destination.

RudderStack vs Segment Pricing

• RudderStack pricing is based on events, whereas Segment charges you based on monthly tracked users.

• RudderStack also comes with added advantages: reverse ETL support, dedicated VPC hosting, session tracking and automatically building a customer data lake.

Growth over the years 📈

RudderStack Growth Infographic

Starting in 2019, RudderStack has come a long way. Its pivotal entry into the dev tool space was as an open-source alternative to Segment. After garnering interest, RudderStack raised its $5M seed round, led by S28 Capital, in 2020.

Fast forward, they've raised their Series A and Series B at $21M and $56M respectively and have now opened offices in India and Greenville. Reportedly, their revenue has also grown almost four and a half times as their customer base keeps expanding.

The Team 👨‍💻👩‍💻

The core team is a heavy mix of professionals who have already dealt with data at large scale.

The ship is captained by @soumyadeb_mitra, who has a PhD in database systems and was a Sr. Director at 8x8 after it acquired his previous startup, MarianaIQ.

Next is @sumanthpuram, who has been one of the core engineers at RudderStack since its inception. Currently VP of Engineering, Sumanth was previously Co-founder & CTO at Code Astra, a mobile and web development agency.

We also have @brianylu, who is the Director of Product at RudderStack. He has previously worked at big names such as DoorDash, Dropbox and Google. He also writes a newsletter, This Week in Data, which you should check out.

Getting started 👉 3 ways

• Docs: Study their open source and managed offerings to get a feel for the product.

• Schedule a demo: Have a custom use case? -> Talk to one of their solution engineers.

• Migration Docs: RudderStack has dedicated guides to help you migrate from other platforms (e.g. Segment, Snowplow or from RudderStack self-hosted to RudderStack Cloud).

Author's Message

Open source is still nascent and needs more exposure, especially among companies still locked in the depths of commercialised software.

This is why I create bi-monthly, in-depth blog posts about open-source dev tools that catch my eye. By doing so, I hope to help companies save money, potentially hire new staff, and increase the revenue of deserving products.

If you love my content, do give me a follow on Twitter (I speak my mind there) or connect with me on LinkedIn.
