A developer stars your repo, asks questions on Discord, reads your documentation, and signs up a week later. For your data workflow, these might be four different people.
Here’s how: GitHub shows the star. Discord logs the username. Your docs platform tracks an anonymous session. The CRM records an email. Nothing connects them. You know conversion happened, but you can't trace it.
When leadership asks, "What's our engagement rate?" or "Which content drives signups?", you're stuck cross-referencing spreadsheets and guessing at matches, ending up with a presentation of cluttered data.
DevRel teams generate data across many independent platforms, each with different schemas and definitions. GitHub's "active" means something different from your product analytics' "active."
This fragmentation is the result of how the DevRel tools were built.
In this article, we will explore why DevRel metrics end up siloed, where the silos originate, how fragmentation hinders your understanding of the developer journey, and, most importantly, how to address it. We will also cover what a data layer actually is and introduce a tool that solves the fragmentation problem in practice, walking through a real implementation using Google Sheets so you can see exactly how unification works.
Where Do These Silos Lurk?
Each platform that you use optimises for a single purpose.
GitHub knows usernames and code activity, but doesn't track account IDs or CRM contacts. Discord assigns server-specific user IDs while developers use pseudonyms. Your docs platform tracks anonymous sessions. PostHog uses its own identifier scheme. HubSpot organises around company email addresses. Your backend logs API requests by key. Some data lives in Google Sheets because that's the only export option available.
These tools were never designed to track complete developer journeys. The fragmentation is architectural: each system solves its specific problem beautifully, but none was built to see what happens next.
Add schema conflicts on top. GitHub's "created_at" doesn't match your CRM's "signup_date" or your docs analytics' "registration_timestamp." One system defines "active" as logged in this month. Another means made an API call in the last 30 days. A third counts community messages. When you try to calculate unified metrics, you're comparing incompatible definitions.
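To make the schema conflict concrete, here is a minimal sketch of mapping the three timestamp fields named above onto one canonical field. The source labels, the canonical name `first_seen_at`, the sample values, and the choice to treat timestamps without a timezone as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical mapping from each source to its own "first seen" field.
FIELD_MAP = {
    "github": "created_at",
    "crm": "signup_date",
    "docs": "registration_timestamp",
}

def to_canonical(source: str, record: dict) -> dict:
    """Return a record with a single canonical `first_seen_at` in UTC."""
    source_field = FIELD_MAP[source]
    parsed = datetime.fromisoformat(record[source_field])
    if parsed.tzinfo is None:
        # Assumption: timestamps without a timezone are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return {
        "source": source,
        "source_field": source_field,  # lineage: where the value came from
        "first_seen_at": parsed.astimezone(timezone.utc),
    }

# Made-up sample rows, one per source.
rows = [
    ("github", {"created_at": "2024-03-01T10:00:00+00:00"}),
    ("crm", {"signup_date": "2024-03-08"}),
    ("docs", {"registration_timestamp": "2024-03-01T12:00:00+02:00"}),
]
for source, record in rows:
    print(to_canonical(source, record))
```

Keeping `source_field` next to the canonical value is one simple way to preserve lineage, so a number in a report can be traced back to the column it came from.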
This fragmentation creates a downstream problem, affecting your ability to understand what has actually been working and what needs to be changed.
How It Affects the Developer Journey
The moment you can't connect data across sources, your visibility into the developer journey collapses.
A developer discovers you through a conference talk, reads tutorials, stars your repo, asks Discord questions, watches a webinar, and signs up for beta. Which touchpoint mattered? You have no idea because each event lives in a separate system.
Here’s how fragmentation affects your critical workflows:
- Attribution becomes guesswork.
- Detailed product analysis fails, as “activation” has five different definitions.
- Content strategy runs blind when you see what people read, not what they do after reading it.
- Community impact remains invisible because it is difficult to connect community members to the downstream outcomes.
- Churn signals scatter across systems until it's too late to intervene.
Although the developers are on a continuous journey, your data appears as disconnected events, leading to failed strategies and decision-making that is based on guesses.
This is a problem that requires rethinking your foundation. And a step towards the solution is the data layer.
How Do We Solve This Problem?
A proper solution sits between your fragmented sources and your analysis tools. It must accomplish three things simultaneously:
- Unify identity without perfect data
Developers often use different usernames across platforms and frequently remain pseudonymous. A solution needs fuzzy matching that connects "alex_dev" in Discord, "alexcodes" on GitHub, and "alex@company.com" in the CRM. It needs to surface these matches transparently so you can review and override incorrect ones.
- Align schemas automatically
Map "signup_date", "created_at", and "registration_timestamp" to one canonical field. Define what "activation" means across all systems. Maintain lineage so you always know where the numbers originate.
- Connect diverse data sources
Pull from code platforms, community tools, docs analytics, product analytics, CRM, backend infrastructure, and spreadsheets. Support APIs and managed ETL for live sync, but also handle exports and manual uploads.
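The identity requirement above can be sketched with Python's standard library. This is an illustration under stated assumptions, not how any particular tool resolves identities: the `normalize`, `similarity`, and `candidate_matches` helpers, the 0.6 threshold, and the sample handles other than the three from the text are hypothetical. String similarity alone is a weak signal, so the sketch only surfaces candidates for human review.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalize(handle: str) -> str:
    """Lowercase, drop any email domain, and strip non-alphanumerics."""
    local = handle.lower().split("@", 1)[0]
    return re.sub(r"[^a-z0-9]", "", local)

def similarity(a: str, b: str) -> float:
    """Similarity of two normalized handles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_matches(identities, threshold=0.6):
    """Return cross-platform pairs scoring at or above the threshold."""
    matches = []
    for (src_a, a), (src_b, b) in combinations(identities, 2):
        if src_a == src_b:
            continue  # only link identities across different platforms
        score = similarity(a, b)
        if score >= threshold:
            matches.append((f"{src_a}:{a}", f"{src_b}:{b}", round(score, 2)))
    # Highest-confidence candidates first, ready for manual review.
    return sorted(matches, key=lambda m: m[2], reverse=True)

identities = [
    ("discord", "alex_dev"),
    ("github", "alexcodes"),
    ("crm", "alex@company.com"),
    ("discord", "maria_k"),  # made-up unrelated handle
]
for left, right, score in candidate_matches(identities):
    print(left, "<->", right, score)
```

Returning scored pairs rather than merging records automatically keeps the matches transparent: a reviewer can accept, reject, or override each one.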
These three requirements are what DevRel teams need to get their data organized. Is there a tool that can build such a data layer? Yes. One such tool is AstroBee.
In the subsequent sections, we will learn more about it along with practical implementation.
Practical Walkthrough: Unifying DevRel Data with Google Sheets and AstroBee
AstroBee is an AI agent that connects directly to existing data sources and builds a semantic layer capturing business logic as it evolves. Unlike tools that require clean data upfront, AstroBee generates an integrated source of truth from the data you actually have.
There are three connection methods to choose from:
- Connect directly to your warehouse so AstroBee analyzes existing datasets without moving or exfiltrating your data. They support any warehouse, including BigQuery, Snowflake, and others. This is typically an enterprise feature, since direct integrations require ongoing maintenance.
- Connect source systems like PostHog and HubSpot via Fivetran's managed ETL.
- Upload CSV files directly.
AstroBee supports Google Sheets, PostHog, HubSpot, Salesforce, Google Analytics, PostgreSQL, and MongoDB. They're always adding new connectors, so feel free to reach out if you need one that isn't currently supported.
The process is simple: AstroBee unifies data and resolves entities, turning multiple definitions of, for example, “user” into a single unified identity. Once unified, you can output via AstroBee's analytics tool or integrate with existing workflows via MCP support. It fits alongside your current stack without requiring pipeline refactoring.
To see how this actually works in practice, let's walk through a concrete example.
Creating the Data Layer via Google Sheets
One of the quickest ways to start building a data layer is by using Google Sheets.
For this walkthrough, I created a small dataset within Google Sheets, consisting of three tabs: one for developers, one for events, and one for content assets. Each tab represents a different aspect of the DevRel picture, such as identities, interactions, and the content they have worked on.
You can access the sheet here: https://docs.google.com/spreadsheets/d/1GBbDwscDsZKYwTZ4eAazCFiwKYWljHdfq-FfMt5Kq_0/edit?usp=sharing
The next step is to use AstroBee to create a unified data layer. Here’s how you can do that.
Step 1: Create an account on AstroBee. Navigate here: https://app.astrobee.ai/
Step 2: Once you have created an account, since we’re using Google Sheets, let’s click on Connect Sources.
Step 3: Select Google Sheets as your data source and configure it. As mentioned, AstroBee uses Fivetran to securely connect to Google Sheets. Click “Continue” to proceed.
Step 4: Choose your authentication method and specify which sheet to sync. The setup guide on the right provides detailed instructions from Fivetran. Once you authorise Fivetran, you will be asked to add the sheet link (use the one shared above) and the named range.
After you’ve connected the Google Sheet, you can start creating the data layer to query your data. All you need to do is click “Create Data Layer”, and AstroBee will analyse your spreadsheet structure and generate a business model for natural language queries.
The Result
Here’s a quick demo of the data layer Astrobee generated for us:

Click here to see the video →
Once your data layer is generated, you can explore the tables AstroBee generated by clicking on the Tables tab. Each table represents a key entity from your connected data sources, such as Developer, Content, Event, and other relevant entities, depending on your use case.
Click any table to examine four aspects:
- Data shows raw data in table format.
- Description provides generated explanations of what the table represents and how it connects to your business domain.
- Properties & Relationships reveals all table properties (columns with data types) and relationships to other tables (showing how tables connect via foreign keys).
- SQL displays the underlying queries used to generate the table.
Also, notice the Patterns section? Patterns are AstroBee’s way of identifying relationships inside your data, even when the structure isn’t perfect or formally defined.
For the DevRel dataset, one of the first patterns AstroBee detected was a link between the events table and the developers table. Both sheets included a developer_id column, but neither declared a foreign key. AstroBee inferred the relationship by observing the repeated structure across rows and matching values in both tables.
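One way to picture this kind of inference is a simple value-overlap check between columns that share a name. The sketch below is an assumption-laden illustration, not AstroBee's actual algorithm: the `infer_relationships` helper, the 0.9 coverage threshold, and every column and value except `developer_id` are hypothetical.

```python
def infer_relationships(parent_rows, child_rows, min_coverage=0.9):
    """Suggest child-to-parent links for shared columns whose values line up."""
    if not parent_rows or not child_rows:
        return []
    shared = set(parent_rows[0]) & set(child_rows[0])
    suggestions = []
    for col in sorted(shared):
        parent_vals = [row[col] for row in parent_rows]
        parent_set = set(parent_vals)
        # A usable key must be unique on the parent side.
        if len(parent_set) != len(parent_vals):
            continue
        child_vals = [row[col] for row in child_rows]
        # Share of child values that also exist in the parent column.
        coverage = sum(v in parent_set for v in child_vals) / len(child_vals)
        if coverage >= min_coverage:
            suggestions.append((col, coverage))
    return suggestions

# Made-up rows standing in for the developers and events tabs.
developers = [
    {"developer_id": "d1", "handle": "alexcodes"},
    {"developer_id": "d2", "handle": "maria_k"},
]
events = [
    {"event_id": "e1", "developer_id": "d1", "type": "star"},
    {"event_id": "e2", "developer_id": "d1", "type": "docs_view"},
    {"event_id": "e3", "developer_id": "d2", "type": "signup"},
]
print(infer_relationships(developers, events))
```

The uniqueness check on the parent side and the coverage check on the child side together approximate what a declared foreign key would guarantee.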
Bringing It All Together
Once the sheet is connected, identities are resolved, schemas are aligned, and the first set of patterns is validated, DevRel teams gain a clear, connected view of how developers navigate their ecosystem. What started as scattered identifiers across three separate sheets becomes a single narrative that actually reflects how developers discover, evaluate, and adopt your product.
What’s The Benefit for DevRels?
Once a DevRel team unifies its data, the real value appears in what you can actually do with it.
- Complete journey visibility: Trace developers from the first onboarding through every touchpoint. See which content drives progression. Identify where people get stuck.
- Real attribution: Stop guessing. Measure which activities are producing outcomes. Connect docs reading to API adoption. Link Discord engagement to retention. Get actual proof.
- Churn prediction: API usage drops, GitHub activity goes quiet, and docs consumption stops. Unified, these signals form clear warnings. Intervene before they leave.
- Community ROI: Track which developers influenced by advocates actually convert. Quantify community impact.
- Content optimization: Measure what happens after people read your docs. Which tutorials lead to successful implementation? Which guides reduce support burden? And what articles are receiving the most views?
These benefits aren’t theoretical. All of them become achievable once you have clear and unified data.
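As a small illustration of journey visibility, the sketch below counts which touchpoints developers had before signing up, given events already unified under one `developer_id`. The event types, timestamps, and the `touchpoints_before_signup` helper are hypothetical, and the result is a simple count of prior touchpoints, not a causal attribution model.

```python
from collections import Counter, defaultdict

def touchpoints_before_signup(events):
    """Count, per event type, how many developers did it before signing up."""
    by_dev = defaultdict(list)
    for event in events:
        by_dev[event["developer_id"]].append(event)

    counts = Counter()
    for dev_events in by_dev.values():
        signup_ts = next(
            (e["ts"] for e in dev_events if e["type"] == "signup"), None
        )
        if signup_ts is None:
            continue  # developer has not converted yet
        # ISO dates in one format compare correctly as strings.
        prior = {e["type"] for e in dev_events if e["ts"] < signup_ts}
        counts.update(prior)
    return counts

# Made-up unified events for three developers.
events = [
    {"developer_id": "d1", "type": "star", "ts": "2024-03-01"},
    {"developer_id": "d1", "type": "docs_view", "ts": "2024-03-03"},
    {"developer_id": "d1", "type": "signup", "ts": "2024-03-08"},
    {"developer_id": "d2", "type": "docs_view", "ts": "2024-03-02"},
    {"developer_id": "d2", "type": "signup", "ts": "2024-03-05"},
    {"developer_id": "d3", "type": "star", "ts": "2024-03-04"},
]
print(touchpoints_before_signup(events))
```

A query like this is only possible once every event carries the same developer identifier, which is exactly what the unification step provides.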
Wrapping Up
Most DevRel teams accept fragmentation as normal. They accept that proving impact means exporting CSV files into reports and assessing outcomes by guesswork.
It doesn’t have to be this way. When data is unified, identities are resolved, and schemas align, the work starts to make sense, and you can direct strategy with proper evidence. Decisions rest on data that actually connects, and the impact becomes undeniable because it's measurable.
The tools to make this happen are now available. What's left is recognizing that the problem is worth solving and that partial visibility isn't good enough anymore.
To unify your data from Google Sheets and other sources, get started with AstroBee here: https://www.astrobee.ai/


