<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Airbyte</title>
    <description>The latest articles on DEV Community by Airbyte (@airbytehq).</description>
    <link>https://dev.to/airbytehq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F3042%2F4caadfd0-3e4f-40ac-8865-83372328f745.png</url>
      <title>DEV Community: Airbyte</title>
      <link>https://dev.to/airbytehq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/airbytehq"/>
    <language>en</language>
    <item>
      <title>Why ETL Needs Open Source to Address the Long Tail of Integrations</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Fri, 02 Jul 2021 03:33:08 +0000</pubDate>
      <link>https://dev.to/airbytehq/why-etl-needs-open-source-to-address-the-long-tail-of-integrations-1341</link>
      <guid>https://dev.to/airbytehq/why-etl-needs-open-source-to-address-the-long-tail-of-integrations-1341</guid>
      <description>&lt;p&gt;Over the last year, our team has interviewed more than 200 companies about their data integration use cases. What we discovered is that data integration in 2021 is still a mess.  &lt;/p&gt;

&lt;h1&gt;
  
  
  The Unscalable Current Situation
&lt;/h1&gt;

&lt;p&gt;At least 80 of the 200 interviews were with users of existing ETL technology, such as Fivetran, StitchData and Matillion. We found that every one of them was also building and maintaining their own connectors even though they were using an ETL solution (or an ELT one - for simplicity, I will just use the term ETL). Why? &lt;/p&gt;

&lt;p&gt;We found two reasons: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Incomplete coverage for connectors&lt;/li&gt;
&lt;li&gt;Significant friction around database replication&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Inability to cover all connector needs
&lt;/h2&gt;

&lt;p&gt;For many users, their ETL solution either didn’t support the connector they wanted or supported it in a way that didn’t fit their needs.&lt;br&gt;&lt;br&gt;
An example for context: Fivetran has been in existence for eight years and supports 150 connectors. Yet, in just two sectors -- martech and adtech -- there are over 10,000 potential connectors.&lt;/p&gt;

&lt;p&gt;The hardest part of ETL is not building the connectors but maintaining them. Maintenance is costly, and any closed-source solution is constrained by ROI (return on investment) considerations. As a result, ETL suppliers focus on the most popular integrations, while companies use more and more tools every month and the long tail of connectors goes ignored. &lt;/p&gt;

&lt;p&gt;So even with ETL tools, data teams still end up investing huge amounts of money and time building and maintaining in-house connectors. &lt;/p&gt;

&lt;h2&gt;
  
  
  Inability to address the database replication use case
&lt;/h2&gt;

&lt;p&gt;Most companies store data in databases. Our interviews uncovered two significant issues with the database connectors provided by existing ETL solutions. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Volume-based pricing&lt;/strong&gt;: Databases are huge and serve growing amounts of data. A database with millions of rows, with the goal of serving hundreds of millions of rows, is a common sight. The issue with current ETL solutions is their volume-based pricing. It’s easy for an employee to replicate a multi-million row database with a click. And that simple click could cost a few thousand dollars! &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data privacy&lt;/strong&gt;: With today’s concerns over privacy and security, companies place increasing importance on control of their data. The architecture of existing ETL solutions often ends up pulling data out of a company’s private cloud. Closed-source offerings prevent companies from closely inspecting the underlying ETL code and systems. Reduced visibility means less trust.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both of these points explain why companies end up building additional internal database replication pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inability to scale with data
&lt;/h2&gt;

&lt;p&gt;The two points mentioned above about volume-based pricing and data privacy also apply as companies scale. It becomes less expensive for companies to have an internal team of data engineers to build the very same pipelines maintained in ETL solutions. &lt;/p&gt;

&lt;h1&gt;
  
  
  Why Open-Source Is the Only Way Forward
&lt;/h1&gt;

&lt;p&gt;Open-source addresses many of the points raised above. Here is what open source gives us.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The right to customize&lt;/strong&gt;: Having access to and being able to edit the code to your needs is a privilege open-source brings. For instance, what if the Salesforce connector is missing some data you need? With open source, such a change is as easy as submitting a code change. No more long threads on support tickets!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addressing the long tail of connectors&lt;/strong&gt;: You no longer need to convince a proprietary ETL provider that a connector you need is worth building. If you need a connector faster than a platform will develop it, you can build it yourself and maintain it with the help of a large user community.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broader Integrations with data tools and workflows&lt;/strong&gt;: Because an open source product must support a wide variety of stacks and workflows for orchestration, deployment, hosting, etc., you are more likely to find out-of-the-box support for your data stack and workflow (UI-based, API-based, CLI-based, etc.) with an open source community. Some of them, like Airbyte’s open source Airflow operator, are contributed by the community. To be fair, you can theoretically do that with a closed-source approach, but you’d likely need to build a lot of the tooling from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging autonomy&lt;/strong&gt;: If you experience any connector issues, you won’t need to wait for a customer support team to get back to you or for your fix to be at the top of the priorities of a third-party company. You can fix the issue yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-the-box security and privacy compliance&lt;/strong&gt;: If the open-source project is open enough (MIT, Apache 2.0, etc.), any team can directly address their integration needs by deploying the open-source code in their infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  The Necessity of a Connector Development Kit
&lt;/h1&gt;

&lt;p&gt;However, open source by itself is not enough to solve the data integration problem. This is because the barrier to entry for creating a robust and full-featured connector is too high.&lt;/p&gt;

&lt;p&gt;Consider for example a script that pulls data from a REST API.&lt;/p&gt;

&lt;p&gt;Conceptually this is a simple &lt;code&gt;SELECT * FROM entity&lt;/code&gt; query over some data living in a database, potentially with a &lt;code&gt;WHERE&lt;/code&gt; clause to filter by some criteria. But anyone who has written a script or connector to continuously and reliably perform this task knows it’s a bit more complicated than that. &lt;br&gt;
First, there is authentication, which can be as simple as a username/password or as complicated as implementing a whole OAuth flow (and securely storing and managing these credentials). &lt;/p&gt;

&lt;p&gt;We also need to maintain state between runs of the script so we don’t keep rereading the same data over and over. &lt;/p&gt;

&lt;p&gt;Afterwards, we’ll need to handle rate limiting and retry intermittent errors, making sure not to confuse them with real errors that can’t be retried. &lt;/p&gt;

&lt;p&gt;We’ll then want to transform data into a format suitable for downstream consumers, all while performing enough logging to fix problems when things inevitably break. &lt;/p&gt;

&lt;p&gt;Oh, and all this needs to be well tested, easily deployable... and done yesterday, of course.&lt;/p&gt;
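
&lt;p&gt;Two of the concerns above, keeping state between runs and retrying only transient errors, can be sketched in a few lines of Python. This is a minimal illustration and not Airbyte code: &lt;code&gt;RetryableError&lt;/code&gt;, &lt;code&gt;FatalError&lt;/code&gt;, &lt;code&gt;fetch_with_retry&lt;/code&gt;, &lt;code&gt;sync&lt;/code&gt; and the &lt;code&gt;updated_at&lt;/code&gt; cursor field are hypothetical names chosen for the example, and authentication, logging and transformation are left out.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
State = Dict[str, int]
# Hypothetical fetch function: given a cursor, returns the records newer than it.
FetchFn = Callable[[int], List[Record]]


class RetryableError(Exception):
    """Transient failure (for example rate limiting) that is worth retrying."""


class FatalError(Exception):
    """Real failure (for example bad credentials) that a retry cannot fix."""


def fetch_with_retry(fetch: FetchFn, cursor: int, max_attempts: int = 3,
                     base_delay: float = 0.0) -> List[Record]:
    """Call fetch, retrying only RetryableError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(cursor)
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def sync(fetch: FetchFn, state: State) -> Tuple[List[Record], State]:
    """Read records newer than the saved cursor and return the updated state."""
    cursor = state.get("cursor", 0)
    records = fetch_with_retry(fetch, cursor)
    # Advance the cursor so the next run does not reread the same data.
    new_cursor = max([cursor] + [r["updated_at"] for r in records])
    return records, {"cursor": new_cursor}
```

&lt;p&gt;Any exception other than &lt;code&gt;RetryableError&lt;/code&gt; propagates immediately, which is the point of separating the two error types. The state returned by one call to &lt;code&gt;sync&lt;/code&gt; is what a scheduler would persist and pass to the next run.&lt;/p&gt;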

&lt;p&gt;All in all, it currently takes a few full days to build a new REST API source connector. This barrier to entry not only means fewer connectors created by the community, but can often mean lower quality connectors. &lt;/p&gt;

&lt;p&gt;However, we believe that 80% of this hardship is incidental and can mostly be automated away. Reducing implementation time would significantly help the community contribute and address the long tail of connectors. If this automation is done in a smart way, we might also be able to improve standardization and thus maintenance across all connectors contributed. &lt;/p&gt;

&lt;h1&gt;
  
  
  What a Connector Development Kit Looks Like
&lt;/h1&gt;

&lt;p&gt;Let’s look again at the work involved in building a connector, but from a different perspective.&lt;/p&gt;

&lt;p&gt;Incidental complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up the package structure&lt;/li&gt;
&lt;li&gt;Packaging the connector in a Docker container and setting up the release pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lots of repeated logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reinventing the same design patterns and code structure for every connector type (REST APIs, Databases, Warehouses, Lakes, etc.) &lt;/li&gt;
&lt;li&gt;Writing the same helpers for transforming data into a standard format, implementing incremental syncs, logging, input validation, etc.&lt;/li&gt;
&lt;li&gt;Testing that the connector correctly adheres to the protocol&lt;/li&gt;
&lt;li&gt;Testing &lt;a href="https://en.wikipedia.org/wiki/Happy_path"&gt;happy flows&lt;/a&gt; and edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can see that a lot can be automated away, and you’ll be happy to know that Airbyte has made available an open source Connector Development Kit (CDK) to do all this. &lt;/p&gt;

&lt;p&gt;We believe in the end, the way to &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/how-to-build-thousands-of-connectors/"&gt;build thousands of high-quality connectors is to think in onion layers&lt;/a&gt;. To make a parallel with the &lt;a href="https://www.hava.io/blog/cattle-vs-pets-devops-explained"&gt;pet/cattle&lt;/a&gt; concept that is well known in DevOps/Infrastructure, a connector is cattle code, and you want to spend as little time on it as possible. This will accelerate productivity tremendously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Abstractions as onion layers
&lt;/h2&gt;

&lt;p&gt;Maximizing high-leverage work leads you to build your architecture with an onion-esque structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Nm39VCmC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dcv65nu22j1r60tcks2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Nm39VCmC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dcv65nu22j1r60tcks2z.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The center defines the lowest level of the API. Implementing a connector at that level requires a lot of engineering time. But it is your escape hatch for very complex connectors where you need a lot of control.&lt;/p&gt;

&lt;p&gt;Then, you build new layers of abstraction that help tackle families of connectors very quickly. For example, sources have a particular interface, and destinations have a different kind of interface. &lt;/p&gt;

&lt;p&gt;Then, for sources you have different kinds like HTTP-API based connectors and Databases. HTTP connectors might be split into REST, GraphQL, and SOAP, whereas Databases might split into relational, NoSQL, and graph databases. Destinations might split into Warehouses, Datalakes, and APIs (for reverse ELT). &lt;br&gt;
The CDK is the framework for those abstractions!&lt;/p&gt;
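
&lt;p&gt;The layering described above can be sketched as a small Python class hierarchy. This is an illustration of the idea only, not the actual Airbyte CDK API: the class names &lt;code&gt;Source&lt;/code&gt;, &lt;code&gt;HttpApiSource&lt;/code&gt; and &lt;code&gt;InMemoryApiSource&lt;/code&gt; and the &lt;code&gt;fetch_page&lt;/code&gt; method are hypothetical, and the "API" is an in-memory list so the example runs without a network.&lt;/p&gt;

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Record = Dict[str, object]


class Source(ABC):
    """Innermost layer: the lowest-level interface every source implements."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        """Yield every record the source exposes."""


class HttpApiSource(Source):
    """Middle layer: pagination logic shared by a family of HTTP API sources."""

    @abstractmethod
    def fetch_page(self, page: int) -> List[Record]:
        """Return the records of one page, or an empty list when done."""

    def read(self) -> Iterator[Record]:
        page = 0
        while True:
            records = self.fetch_page(page)
            if not records:
                return
            yield from records
            page += 1


class InMemoryApiSource(HttpApiSource):
    """Outer layer: a concrete connector supplies only what is unique to it."""

    def __init__(self, pages: List[List[Record]]) -> None:
        self.pages = pages

    def fetch_page(self, page: int) -> List[Record]:
        # Stand-in for a real HTTP call.
        return self.pages[page] if len(self.pages) > page else []
```

&lt;p&gt;A connector author working at the outer layer writes only &lt;code&gt;fetch_page&lt;/code&gt;. A connector that needs more control can subclass &lt;code&gt;Source&lt;/code&gt; directly and implement &lt;code&gt;read&lt;/code&gt; itself, which is the escape hatch mentioned earlier.&lt;/p&gt;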

&lt;h1&gt;
  
  
  What Is Already Available
&lt;/h1&gt;

&lt;p&gt;Airbyte’s CDK is still in its early days, so expect lots of improvements to come over time. Today, the framework ships with the following features: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Python framework for writing source connectors &lt;/li&gt;
&lt;li&gt;A generic implementation for rapidly developing connectors for HTTP APIs&lt;/li&gt;
&lt;li&gt;A test suite to test compliance with the Airbyte Protocol and happy code paths &lt;/li&gt;
&lt;li&gt;A code generator to bootstrap development and package your connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, the CDK enables building robust, full-featured connectors within &lt;strong&gt;2 hours versus 2 days previously&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The Airbyte team has been using the framework internally to develop connectors, and it is the culmination of our experience developing more than 70 connectors (our goal is 200 by the end of the year with help from the user community!). Everything we learn from our own experience and from the user community goes into improving the CDK.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion - The Future Ahead
&lt;/h1&gt;

&lt;p&gt;Wouldn’t it be great to bring the time needed to build a new connector down to 10 minutes, and to extend to more and more families of possible integrations? How’s that for a moonshot! &lt;/p&gt;

&lt;p&gt;If we manage to do that together with our user community, then at long last the long tail of integrations will be addressed in no time! Not to mention that data integration pipelines will be commoditized through open-source. &lt;/p&gt;

&lt;p&gt;If you would like to get involved, we hope you’ll join our &lt;a href="https://slack.airbyte.io"&gt;Slack community&lt;/a&gt; - the most active one around data integration - as we connect to the future of open source for the benefit of all!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>database</category>
    </item>
    <item>
      <title>How “User Success” Helps Us Become the Most Active Slack Community</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Tue, 27 Apr 2021 06:11:56 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-user-success-helps-us-become-the-most-active-slack-community-4j4b</link>
      <guid>https://dev.to/airbytehq/how-user-success-helps-us-become-the-most-active-slack-community-4j4b</guid>
      <description>&lt;p&gt;Today, we’re celebrating three important milestones for &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt;. Within just 7 months of the release of our very first product (MVP) - which had only 6 connectors - we became &lt;strong&gt;the most active Slack community of data professionals around data integration&lt;/strong&gt;. This is our first milestone.&lt;/p&gt;

&lt;p&gt;As you might already know, we are a transparent company. Every month or so, we publish information on our project and company that would be confidential in other companies, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://airbyte.io/articles/our-story/the-deck-we-used-to-raise-our-seed-with-accel-in-13-days/"&gt;slides we used to raise our seed round with Accel&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Our &lt;a href="https://handbook.airbyte.io"&gt;company handbook&lt;/a&gt; with even our strategy and business model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, we want to tell you more about our Slack community, our focus on user success and what it means for our community, and two other not yet announced milestones.&lt;/p&gt;

&lt;h2&gt;The Most Active Slack Community on Data Integration&lt;/h2&gt;

&lt;p&gt;This weekend, we reached the milestone of 1,000 Slack members, and at the same time became the most active community. &lt;/p&gt;

&lt;p&gt;Within 7 months, we grew from 5 people (our original team) to &lt;strong&gt;1,020 members as of 04/26/21&lt;/strong&gt;. Out of those members, &lt;strong&gt;450 are active weekly&lt;/strong&gt;, and this resulted in &lt;strong&gt;115k messages exchanged with the community. &lt;/strong&gt;Yes, roughly 45% of our Slack community is active on a weekly basis, which is a great starting point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TbwMJboV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/Slack-community-1024x715.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TbwMJboV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/Slack-community-1024x715.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The last time we checked, Singer’s Slack community had 40k messages after 4 years, and Meltano had 33k within 2 years. With Airbyte reaching 115k messages in 7 months, who knows how many we’ll have in 2 or 4 years?!&lt;/p&gt;

&lt;h2&gt;Defining “User Success”&lt;/h2&gt;

&lt;p&gt;Airbyte’s community is worldwide. About 35% of our users come from the US, but the remaining majority is spread across the globe. That’s why we decided to build a remote-first team with people in France, the United Kingdom, India, Singapore, New Caledonia (near Australia), and the US to cover all timezones. The goal is to be the best at what we call “user success.”&lt;/p&gt;

&lt;p&gt;What is user success? You're probably familiar with customer success, which is well known in the SaaS world. In customer success, your goal is to make your customers successful with your product. However, when you are an open-source tool, you are first focusing on becoming the industry standard, and therefore, you’re focusing on the users of your open-source project. &lt;/p&gt;

&lt;p&gt;Within Airbyte, we define “&lt;strong&gt;user success&lt;/strong&gt;” &lt;strong&gt;as our team’s focus on helping our users be successful in whatever project they want to build around data, whether it be with Airbyte or another tool.&lt;/strong&gt; We believe the best way to build trust with our community is by aligning our goals and incentives with theirs; we want them to know we have their back and always will. &lt;/p&gt;

&lt;h2&gt;Measuring User Success&lt;/h2&gt;

&lt;p&gt;We’re measuring two things: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Time to first response&lt;/li&gt;
&lt;li&gt;Time to resolution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time to first response&lt;/strong&gt; is the time elapsed between a user request on our Slack and the first response from a team member or community member. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time to resolution&lt;/strong&gt; is the time elapsed between the first user request and when the thread is marked with a ✅ emoji. That is how we notify the rest of the team that this request has been fully addressed. &lt;/p&gt;

&lt;p&gt;For the moment, we have an average of &lt;strong&gt;2 hours 30 minutes&lt;/strong&gt; for the time to first response, and our time to resolution is about &lt;strong&gt;3 hours 30 minutes&lt;/strong&gt;. &lt;/p&gt;
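
&lt;p&gt;As an illustration of how these two averages could be computed from exported thread data, here is a minimal Python sketch. The &lt;code&gt;Thread&lt;/code&gt; record and its field names are hypothetical and do not reflect how Slack or any community tool actually exposes this data. Threads with no reply or no ✅ yet are simply left out of the corresponding average.&lt;/p&gt;

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Thread(NamedTuple):
    """Hypothetical record for one support thread."""
    asked_at: datetime
    first_reply_at: Optional[datetime]  # None if nobody has answered yet
    resolved_at: Optional[datetime]     # None if not yet marked as resolved


def average(deltas: List[timedelta]) -> Optional[timedelta]:
    """Mean of a list of durations, or None when the list is empty."""
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


def time_to_first_response(threads: List[Thread]) -> Optional[timedelta]:
    """Average delay between a request and its first reply."""
    return average([t.first_reply_at - t.asked_at
                    for t in threads if t.first_reply_at is not None])


def time_to_resolution(threads: List[Thread]) -> Optional[timedelta]:
    """Average delay between a request and the thread being marked resolved."""
    return average([t.resolved_at - t.asked_at
                    for t in threads if t.resolved_at is not None])
```

&lt;p&gt;Because unresolved threads are excluded, the resolution average shares the weakness noted for the resolution rate: a thread that was solved but never marked does not count.&lt;/p&gt;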

&lt;p&gt;We were also thinking about tracking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;resolution rate&lt;/strong&gt;, i.e., the percentage of threads that have been marked with a ✅ emoji, but the data was too skewed by us sometimes forgetting to mark the thread as resolved. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;feature-coverage rate&lt;/strong&gt;, i.e., the percentage of users we interact with whose feature and connector needs we have fully met.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Some Examples of User Success Processes&lt;/h2&gt;

&lt;p&gt;So, what specific actions do we take in terms of user success? &lt;/p&gt;

&lt;p&gt;Well, for instance, we personally &lt;strong&gt;welcome every new Slack member&lt;/strong&gt; with a personalized message. It takes a bit of time but it is definitely worth it, as it enables us to understand their use cases and needs. &lt;/p&gt;

&lt;p&gt;You can note all of this information in your CRM or community tool (we are big fans of &lt;a href="https://orbit.love"&gt;Orbit&lt;/a&gt;) so that when you release a new feature, you can notify those who expressed any interest in that feature. That’s exactly what we do with connectors. &lt;strong&gt;Every time we support a new connector, we’ll reach out to all users who mentioned any interest in that connector.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Any interaction with the user is an opportunity to get information on how we can provide more value at a later date. &lt;/p&gt;

&lt;p&gt;In the end, as for customer success, you want your users to be more and more successful with your open-source tool, so they become your next advocates. &lt;/p&gt;

&lt;h2&gt;What Kind of Role in User Success?&lt;/h2&gt;

&lt;p&gt;Airbyte is an open-source data integration platform, so it’s targeting data engineers, analysts and scientists. The only way to help them become successful is by helping them solve their technical issues. So the role that makes sense is a &lt;strong&gt;User Success Engineer. &lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And, as a matter of fact, we just hired for the role at Airbyte. Here is the description of the role: &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your goal as a User Success Engineer is to make our users successful when deploying or contributing to Airbyte.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The main responsibilities of the role will be:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Help users troubleshoot issues they have when deploying or contributing to Airbyte.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Write documentation and make (or suggest) code changes to resolve recurring issues.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Triage bugs to the correct team (or fix the issue yourself).&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Airbyte’s open-source community has been growing very quickly, and one component of our success is the love of our community. This role is instrumental to scaling the support to our users, and includes finding ways to reduce the overall cost of user support through better documentation and new processes. &lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;An excellent candidate will become an expert in the Airbyte system. They will determine which information needs to be shared with the engineering team so that the team has a deep understanding of existing pain points. They will also filter out information that they can resolve themselves through code fixes, documentation, or by working with the users. This will allow the engineering team to be laser focused on the product goals while maintaining intense user empathy. The role is at the heart of our &lt;a href="https://handbook.airbyte.io/company/culture-and-values#humility-and-maximizing-growth"&gt;values&lt;/a&gt; of leveraging our time and abilities.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;An ideal candidate can start out as an individual contributor but can grow this operation into a team as the company scales.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;---------&lt;/p&gt;

&lt;p&gt;We hope this gives you some insight into how we think about user success at Airbyte and in its community. So how does this translate in terms of measurable goals?&lt;/p&gt;

&lt;h2&gt;Our Next Milestone Is 1,000 Weekly Active Slack Members&lt;/h2&gt;

&lt;p&gt;The actual metric you want to track is the activity level of your community. Having a non-engaged community is a waste of time for everybody. So one would think that we should define our next goal in terms of messages exchanged in the community. Why not aim for 1M messages?&lt;/p&gt;

&lt;p&gt;The issue with that approach is that messages are not synonymous with value brought to your users. If it takes you half the messages to get your point across and solve your users’ issues, you should definitely go this way. Number of messages is not the right proxy, and never was in our case.&lt;/p&gt;

&lt;p&gt;The right approach is to &lt;strong&gt;track whether your community keeps being engaged, and that is, simply, weekly active members. &lt;/strong&gt;That’s why our next milestone is not signups, or messages exchanged, but 1,000 weekly active Slack members. &lt;/p&gt;

&lt;h2&gt;How to Achieve the Next Milestone&lt;/h2&gt;

&lt;p&gt;This is where we want to announce &lt;em&gt;two&lt;/em&gt; new milestones. &lt;/p&gt;

&lt;h3&gt;1. Our First Developer Advocate Hire&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/avaidyanatha/"&gt;Abhi Vaidyanatha&lt;/a&gt; joined us on 04/26. As our senior developer advocate, he will work on constantly improving our developer experience and engagement. This includes documentation, tutorials, and, therefore, insightful content for our Slack communities. &lt;/p&gt;

&lt;p&gt;Maybe we’ll do AMAs there - anything becomes possible when you have someone with the energy of Abhi!&lt;/p&gt;

&lt;h3&gt;2. Our First User Success Engineer Hire&lt;/h3&gt;

&lt;p&gt;If, by any chance, we coined the term “user success engineer,” feel free to reuse the term, as it should be open-sourced (MIT) like the rest of Airbyte 😉.&lt;/p&gt;

&lt;p&gt;Our first user success engineer should be joining us in the next few weeks. This person will help us drive the time to first response and resolution down so you’ll have the best support experience with Airbyte in the whole ETL/ELT industry - while just using the open-source edition!&lt;/p&gt;

&lt;p&gt;---&lt;/p&gt;

&lt;p&gt;You will see that the Airbyte team will be growing fast in the next few weeks. And we also have big plans for the Slack community, but we won’t reveal everything just yet as we want to keep some surprises for you!&lt;/p&gt;

&lt;p&gt;In case you haven’t joined yet, here’s our &lt;a href="https://slack.airbyte.io"&gt;Slack community&lt;/a&gt;, and you can also contribute to our &lt;a href="https://github.com/airbytehq/airbyte"&gt;GitHub repository&lt;/a&gt;. Either way - whether you’re already a member or planning to join - we hope to hear from you soon!&lt;/p&gt;

&lt;p&gt;And yes, Airbyte is also about to become the GitHub repo with the most stars around data integration, too!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_Vpir5jp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/Github-stars-1024x697.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_Vpir5jp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/Github-stars-1024x697.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>How We Performed on Our Q1 OKRs, and The Goals for Q2</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Wed, 14 Apr 2021 10:07:05 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-we-performed-on-our-q1-okrs-and-the-goals-for-q2-55j4</link>
      <guid>https://dev.to/airbytehq/how-we-performed-on-our-q1-okrs-and-the-goals-for-q2-55j4</guid>
      <description>&lt;p&gt;In January, we shared how we were thinking about OKRs, along with &lt;a href="https://airbyte.io/articles/our-story/our-okrs-for-q1-2021/"&gt;our OKRs for Q1 2021&lt;/a&gt;. So we wanted to give some updates about them, and how they have evolved for the 2nd quarter. &lt;/p&gt;

&lt;p&gt;Our focus for 2021 is to become the open-source standard for replicating data. This entails three overarching goals: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://airbyte.io/articles/our-story/our-truth-for-2021-airbyte-just-works/"&gt;Making Airbyte just work&lt;/a&gt;&lt;/strong&gt; whatever your data infrastructure, volume and connector needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building the largest developer community for data integration&lt;/strong&gt;. We envision that most connectors will be built and maintained by the community eventually, because we will have made that so simple with our low-code framework. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Making Airbyte so easy to use in a production context&lt;/strong&gt; that Airbyte becomes the new standard for data teams to replicate data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s see how this translates itself into our first two quarterly OKRs. &lt;/p&gt;

&lt;h1&gt;
  
  
  How We Performed on Airbyte’s OKRs for Q1 2021
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. O: Growing &lt;a href="https://github.com/orbit-love/orbit-model#love"&gt;Community Love&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;What is community love? We’re still big fans of &lt;a href="https://github.com/orbit-love/orbit-model#love"&gt;Orbit’s definition&lt;/a&gt; for it. Love is a member's level of engagement and investment in the community. Someone with high love is highly active and plays key roles in the community, like contributing, moderating, and organizing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let’s first look at GitHub Stars&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this chart, we’re comparing Airbyte with other famous open-source projects around data integration: DBT and RudderStack. Our growth rate (Airbyte in red) is a huge validation that we’re not the only ones to believe that data integration will be solved with an open-source and community approach. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Obwl82fh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qc12dv0endhzomvkio1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Obwl82fh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qc12dv0endhzomvkio1t.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub stars are good awareness metrics, but they don’t mean that you actually have community adoption or contribution. We need to look at other metrics for that: &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gi77yhGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jsyd8o9bzj7tobjxy4hk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gi77yhGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jsyd8o9bzj7tobjxy4hk.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, we outperformed our Q1 OKRs for community love, even though we set aggressive goals. This is still the very beginning of our journey, but it was extremely encouraging for the whole team. We strongly believe we can commoditize data integration through our growing community. &lt;/p&gt;

&lt;h2&gt;
  
  
  2. O: Growing Production Usage
&lt;/h2&gt;

&lt;p&gt;We call “&lt;strong&gt;activated users&lt;/strong&gt;” users who have deployed Airbyte, connected a source, a destination and synced data successfully from this source to this destination. &lt;/p&gt;

&lt;p&gt;We call “&lt;strong&gt;prod users&lt;/strong&gt;” users who have been syncing data more than 5 times in the past week and 5 times in the week before. &lt;/p&gt;
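
&lt;p&gt;To make the “prod user” definition concrete, here is a minimal Python sketch. It rests on one assumption about the wording above: that a user must have synced more than 5 times in each of the two most recent 7-day windows. The function name &lt;code&gt;is_prod_user&lt;/code&gt; and its parameters are hypothetical and are not taken from Airbyte’s analytics.&lt;/p&gt;

```python
from datetime import datetime, timedelta
from typing import List


def is_prod_user(sync_times: List[datetime], now: datetime,
                 min_syncs: int = 5) -> bool:
    """Return True if the user synced more than min_syncs times in each of
    the two most recent 7-day windows (assumed reading of the definition)."""
    week = timedelta(days=7)
    # Window 1: the past week, i.e. later than (now - 7 days) and up to now.
    past_week = sum(1 for t in sync_times if now >= t > now - week)
    # Window 2: the week before that.
    week_before = sum(1 for t in sync_times
                      if now - week >= t > now - 2 * week)
    return past_week > min_syncs and week_before > min_syncs
```

&lt;p&gt;Requiring activity in two consecutive weeks filters out users who ran a burst of test syncs once and never came back.&lt;/p&gt;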

&lt;p&gt;Here’s a chart showing the evolution of activated users and prod users during Q1. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--js1zDDT8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t3apk8ikv78lzpo9v93r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--js1zDDT8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t3apk8ikv78lzpo9v93r.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We don’t publish the number of prod users we have yet, but you can see that the conversion from activated to prod users is growing with time, which is what we want to see.&lt;/p&gt;

&lt;p&gt;But is the usage of Airbyte growing among prod users?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lz-NQuir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9i77dp8d3gvl2g2p3is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lz-NQuir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9i77dp8d3gvl2g2p3is.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we had to follow only one graph, it would be this one. It accounts for both prod user growth and usage growth within prod users. &lt;/p&gt;

&lt;p&gt;Here’s the usage growth in terms of sync per prod user:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lz-NQuir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9i77dp8d3gvl2g2p3is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lz-NQuir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9i77dp8d3gvl2g2p3is.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, this was exactly what we wanted to see. Teams start by testing Airbyte for a few days or weeks, before expanding their usage to other connectors. &lt;/p&gt;

&lt;h2&gt;
  
  
  3. O: Becoming a Reliable Standard
&lt;/h2&gt;

&lt;p&gt;Airbyte can only become the new standard if connectors are reliable. You could consider this a “sanity” metric (in the sense that it is not a growth metric), but it is actually where almost all of the engineering work goes. The more users use Airbyte, the more edge cases connectors get exposed to. It is a thousand-paper-cut problem, where every user comes with their own needs in terms of usage, data and volume. The more users we have, the less reliable connectors can appear, and we have to seize these opportunities to strengthen them. &lt;/p&gt;

&lt;p&gt;The metric we’re looking at in this case is the percentage of sync attempts that fail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qAaCh-EH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0h77dwsezdane29lvuh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qAaCh-EH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0h77dwsezdane29lvuh0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We launched on HackerNews on January 26th. That’s when we gained a lot more users at once and got exposed to a lot more use cases. During the whole month of February, we worked on &lt;a href="https://airbyte.io/articles/our-story/february-a-month-of-stabilization-for-a-new-acceleration-phase/"&gt;strengthening our connectors&lt;/a&gt;, and you can see in this chart how it paid off. Our KR was a failure rate of 5% by the end of the quarter, and this is something that we will keep working on. &lt;/p&gt;
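&lt;p&gt;As a rough illustration of how such a KR can be checked, the sketch below computes the percentage of failed sync attempts and compares it to the 5% target. The attempt counts are made-up numbers, not Airbyte data, and the function name is hypothetical.&lt;/p&gt;

```python
def failure_rate_percent(failed_attempts, total_attempts):
    """Percentage of sync attempts that failed; 0.0 when nothing ran."""
    if total_attempts == 0:
        return 0.0
    return 100.0 * failed_attempts / total_attempts


KR_TARGET_PERCENT = 5.0  # the Q1 key result mentioned above

rate = failure_rate_percent(failed_attempts=12, total_attempts=400)
target_met = KR_TARGET_PERCENT >= rate
print(f"{rate:.1f}% of attempts failed; target met: {target_met}")
```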

&lt;p&gt;Some other metrics we wanted to track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KR: Response time to any message on Slack or GitHub - our goal was to reach &amp;lt;30 min by end of Q1 2021.&lt;/li&gt;
&lt;li&gt;KR: Time to resolution for high-priority bugs - our goal was to reach 1.5 days by the end of Q1 2021. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, we couldn’t really measure those 2 metrics. But the overall response time to any message on Slack was about 1-2 hours. &lt;/p&gt;

&lt;h2&gt;
  
  
  4. O: Building the Dream Team
&lt;/h2&gt;

&lt;p&gt;We strongly believe in &lt;a href="http://www.hightechinthehub.com/2014/09/talent-density/#:~:text=Talent%20density%20works%20like%20this,communities%20in%20which%20they%20engage."&gt;talent density&lt;/a&gt;, and that it’s better to have one stellar colleague than 5 average ones. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KR: 2 A+ engineers =&amp;gt; 3 engineers will be joining us in the next few weeks.&lt;/li&gt;
&lt;li&gt;KR: 1 senior developer advocate =&amp;gt; &lt;a href="https://www.linkedin.com/in/avaidyanatha/"&gt;Abhi&lt;/a&gt; will be joining us soon!&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Our Q1 Milestones
&lt;/h1&gt;

&lt;p&gt;Now that we have seen how we performed on our OKRs, how did we perform on the milestones? &lt;/p&gt;

&lt;h2&gt;
  
  
  Community efforts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;January: &lt;a href="https://news.ycombinator.com/item?id=25917403"&gt;Hard launch on HackerNews&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Building tutorials to improve the developer experience (DX) in building their own connectors, or editing pre-built ones =&amp;gt; this is still a work in progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Product engineering efforts
&lt;/h2&gt;

&lt;p&gt;One thing we didn’t anticipate is the toll providing great support would take on our engineering velocity. Even though we had great output, we were not able to deliver on all the milestones we had intended. &lt;/p&gt;

&lt;p&gt;For our core platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration in the data stack with DBT and Airflow =&amp;gt; delivered, although we still have a lot to do on the DBT front!&lt;/li&gt;
&lt;li&gt;Core upgrade strategy =&amp;gt; delivered!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For our connectors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strengthen our connectors so all our connectors are A+ =&amp;gt; we started certifying our connectors against a set of best practices, and you can now see the health status of our connectors. &lt;/li&gt;
&lt;li&gt;Schemas migration management =&amp;gt; reprioritized&lt;/li&gt;
&lt;li&gt;Seamless OAuth support =&amp;gt; reprioritized&lt;/li&gt;
&lt;li&gt;More high-level abstractions to build connectors more easily =&amp;gt; ongoing effort!&lt;/li&gt;
&lt;li&gt;An MVP for CDC (Change Data Capture) =&amp;gt; delivered!&lt;/li&gt;
&lt;li&gt;Connector upgrade strategy =&amp;gt; delivered!&lt;/li&gt;
&lt;li&gt;A public dashboard showing the stability (failure rate) of all our connectors =&amp;gt; delivered!&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Our New Q2 OKRs
&lt;/h1&gt;

&lt;p&gt;So what about the next quarter? Doing OKRs is a great learning opportunity that lets us make better estimates every time. This time, we know how much engineering time it takes to provide a great support experience, so we can plan accordingly. &lt;/p&gt;

&lt;p&gt;For Q2, we kept the same objectives but changed some KRs that we’ve put in bold. &lt;/p&gt;

&lt;h2&gt;
  
  
  O: Growing Community Love
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;KR: Active Slack users (Q1/21: 350, Q2/21: 600)&lt;/li&gt;
&lt;li&gt;KR: GitHub stars (Q1/21: 2k, Q2/21: 4k)&lt;/li&gt;
&lt;li&gt;KR: Issue contributors from start (Q1/21: 125, Q2/21: 250)&lt;/li&gt;
&lt;li&gt;KR: PR contributors from start (Q1/21: 25, Q2/21: 50)&lt;/li&gt;
&lt;li&gt;KR: Connector Contributors (Q1/21: 10, Q2/21: 30)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  O: Growing Prod Usage
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;KR: Prod users &lt;/li&gt;
&lt;li&gt;KR: Active connections per prod user&lt;/li&gt;
&lt;li&gt;KR: # connectors (Q1/21: 56, Q2/21: 90)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  O: Becoming a Reliable Standard
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;KR: % failure at attempts&lt;/li&gt;
&lt;li&gt;KR: average throughput of connectors&lt;/li&gt;
&lt;li&gt;KR: support replicating large databases in X minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  O: Building the Dream Team
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;KR: 2 A+ engineers&lt;/li&gt;
&lt;li&gt;KR: 1 dev evangelist (to be confirmed) + 1 operations manager&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Our Next Q2 Milestones
&lt;/h1&gt;

&lt;p&gt;How does this translate into milestones?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make Airbyte the easiest way to create line-of-business connectors with our low-code solution for creating connectors quickly and more reliably.&lt;/li&gt;
&lt;li&gt;Support custom DBT models.&lt;/li&gt;
&lt;li&gt;CDC for all major database sources.&lt;/li&gt;
&lt;li&gt;Mature handling of (large) production data sets. &lt;/li&gt;
&lt;li&gt;Production-grade single node support (across platforms): creating solid AMIs, systemctl, etc., with less setup.&lt;/li&gt;
&lt;li&gt;First-class support on K8s.&lt;/li&gt;
&lt;li&gt;OAuth support for connector authentication.&lt;/li&gt;
&lt;li&gt;"Automatic" Schema change handling.&lt;/li&gt;
&lt;li&gt;Support for data lake use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So... a lot of engineering milestones! And they can be accomplished as we grow our engineering team. &lt;/p&gt;

&lt;p&gt;Let’s see how we perform in 3 months!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>management</category>
    </item>
    <item>
      <title>How to Visualize the Time Spent by Your Team in Zoom Calls</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Mon, 05 Apr 2021 07:15:06 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-to-visualize-the-time-spent-by-your-team-in-zoom-calls-3bll</link>
      <guid>https://dev.to/airbytehq/how-to-visualize-the-time-spent-by-your-team-in-zoom-calls-3bll</guid>
      <description>&lt;p&gt;In this article, we will show you how to find out, in a couple of minutes, how much your team uses Zoom and how much time it spends in meetings. We will be using &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt; (an open-source data integration platform) and &lt;a href="https://www.tableau.com"&gt;Tableau&lt;/a&gt; (a business intelligence and analytics software) for this tutorial.&lt;/p&gt;

&lt;p&gt;Here is what we will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step 1: Setting up data replication from Zoom to a PostgreSQL database using the Airbyte Zoom connector&lt;/li&gt;
&lt;li&gt;Step 2: Connecting the PostgreSQL database to Tableau&lt;/li&gt;
&lt;li&gt;Step 3: Creating charts in Tableau with Zoom data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will produce the following charts in Tableau:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evolution of the number of meetings per week in a team&lt;/li&gt;
&lt;li&gt;Evolution of the number of hours a team spends in meetings per week&lt;/li&gt;
&lt;li&gt;Listing of team members with the number of meetings per week and number of hours spent in meetings, ranked&lt;/li&gt;
&lt;li&gt;Evolution of the number of webinars per week in a team&lt;/li&gt;
&lt;li&gt;Evolution of the number of hours a team spends in webinars per week&lt;/li&gt;
&lt;li&gt;Evolution of the number of participants for all webinars in a team per week&lt;/li&gt;
&lt;li&gt;Listing of team members with the number of webinars per week and number of hours spent in meetings, ranked &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s get started by replicating Zoom data using Airbyte.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Replicating Zoom data to PostgreSQL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Launching Airbyte
&lt;/h3&gt;

&lt;p&gt;To replicate Zoom data, we will use &lt;a href="https://docs.airbyte.io/integrations/sources/zoom"&gt;Airbyte’s Zoom connector&lt;/a&gt;. First, start Airbyte’s web app: open your terminal, navigate to your Airbyte directory, and run:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker-compose up&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can find more details about this in the &lt;a href="https://docs.airbyte.io/getting-started"&gt;Getting Started&lt;/a&gt; tutorial.&lt;/p&gt;

&lt;p&gt;This will start up Airbyte on &lt;code&gt;localhost:8000&lt;/code&gt;; open that address in your browser to access the Airbyte dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hvxpm66w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/01_airbyte-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hvxpm66w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/01_airbyte-dashboard.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the top right corner of the Airbyte dashboard, click on the &lt;strong&gt;+ new source&lt;/strong&gt; button to add a new Airbyte source. In the screen to set up the new source, enter the source name (we will use airbyte-zoom) and select &lt;strong&gt;Zoom&lt;/strong&gt; as source type.&lt;/p&gt;

&lt;p&gt;Choosing Zoom as &lt;strong&gt;source type&lt;/strong&gt; will cause Airbyte to display the configuration parameters needed to set up the Zoom source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1Jv2z_aQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/02_setting-zoom-connector-name.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1Jv2z_aQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/02_setting-zoom-connector-name.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Zoom connector for Airbyte requires you to provide it with a Zoom JWT token. Let’s take a detour and look at how to obtain one from Zoom. &lt;/p&gt;

&lt;h3&gt;
  
  
  Obtaining a Zoom JWT Token
&lt;/h3&gt;

&lt;p&gt;To obtain a Zoom JWT Token, log in to your Zoom account and go to the &lt;a href="https://marketplace.zoom.us/"&gt;Zoom Marketplace&lt;/a&gt;. If this is your first time in the marketplace, you will need to agree to Zoom’s marketplace terms of use. &lt;/p&gt;

&lt;p&gt;Once you are in, you need to click on the &lt;strong&gt;Develop&lt;/strong&gt; dropdown and then click on &lt;strong&gt;Build App.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gCMk5Wif--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/03_click.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gCMk5Wif--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/03_click.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clicking on &lt;strong&gt;Build App&lt;/strong&gt; for the first time will display a modal asking you to accept Zoom’s API license and terms of use. Accept if you agree, and you will be presented with the screen below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IwhJOVPC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/04_zoom-marketplace-build-screen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IwhJOVPC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/04_zoom-marketplace-build-screen.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;JWT&lt;/strong&gt; as the app you want to build and click on the &lt;strong&gt;Create&lt;/strong&gt; button on the card. You will be presented with a modal to enter the app name; type in &lt;code&gt;airbyte-zoom&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sgS6kZJS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/05_app-name-modal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sgS6kZJS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/05_app-name-modal.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, click on the &lt;strong&gt;Create&lt;/strong&gt; button on the modal.&lt;/p&gt;

&lt;p&gt;You will then be taken to the &lt;strong&gt;App Information&lt;/strong&gt; page of the app you just created. Fill in the required information (at the very least).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SC5OJSj7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/06_app-information.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SC5OJSj7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/06_app-information.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After filling in the needed information, click on the &lt;strong&gt;Continue&lt;/strong&gt; button. You will be taken to the &lt;strong&gt;App Credentials&lt;/strong&gt; page. Here, click on the &lt;strong&gt;View JWT Token&lt;/strong&gt; dropdown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hSS0iivv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/07_view-jwt-token.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hSS0iivv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/07_view-jwt-token.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There you can set the expiration time of the token (we will leave the default 90 minutes), then click on the &lt;strong&gt;Copy&lt;/strong&gt; button of the &lt;strong&gt;JWT Token&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After copying it, click on the &lt;strong&gt;Continue&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Aak8peYd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/08_activate-webhook.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Aak8peYd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/08_activate-webhook.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will be taken to a screen to activate &lt;strong&gt;Event Subscriptions&lt;/strong&gt;. Just leave it as is, as we won’t be needing Webhooks. Click on &lt;strong&gt;Continue&lt;/strong&gt;, and your app should be marked as activated. &lt;/p&gt;
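&lt;p&gt;Since the token has an expiration time, it can be handy to check when it expires before pasting it into Airbyte. A JWT is made of three base64url-encoded segments separated by dots, and the middle segment is a JSON payload that normally carries an &lt;code&gt;exp&lt;/code&gt; claim (a Unix timestamp). The sketch below reads that claim using only Python’s standard library. It does not verify the signature, and the sample token is built locally for illustration (the issuer value is hypothetical), not issued by Zoom.&lt;/p&gt;

```python
import base64
import json
from datetime import datetime, timezone


def b64url(data):
    """Base64url-encode bytes without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_expiry(token):
    """Return the expiry time from a JWT payload (signature NOT verified)."""
    payload_segment = token.split(".")[1]
    # Restore the padding that JWT encoding strips off.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


# Illustrative token with a placeholder signature; paste your own instead.
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"iss": "hypothetical-api-key", "exp": 1617606000}).encode())
sample_token = f"{header}.{payload}.signature"

print(jwt_expiry(sample_token).isoformat())
```

&lt;p&gt;If the printed time is already in the past, generate a fresh token on the &lt;strong&gt;App Credentials&lt;/strong&gt; page before setting up the source.&lt;/p&gt;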

&lt;h3&gt;
  
  
  Connecting Zoom on Airbyte
&lt;/h3&gt;

&lt;p&gt;So let’s go back to the Airbyte web UI and provide it with the JWT token we copied from our Zoom app.&lt;/p&gt;

&lt;p&gt;Now click on the &lt;strong&gt;Set up source&lt;/strong&gt; button. You will see the success message below when the connection is made successfully. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZchTmLPY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/09_setup-successful.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZchTmLPY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/09_setup-successful.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And you will be taken to the page to add your destination.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting PostgreSQL on Airbyte
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oOlA8EiX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/10_destination.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oOlA8EiX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/10_destination.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For our destination, we will be using a PostgreSQL database, since Tableau supports PostgreSQL as a data source. Click on the &lt;strong&gt;add destination&lt;/strong&gt; button, and then in the dropdown click on &lt;strong&gt;+ add a new destination&lt;/strong&gt;.  In the page that presents itself, add the destination name and choose the Postgres destination.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--loQrIGU5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/11_choose-postgres-destination.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--loQrIGU5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/11_choose-postgres-destination.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To supply Airbyte with the PostgreSQL configuration parameters needed to make a PostgreSQL destination, we will spin up a PostgreSQL container with Docker using the following command in our terminal.  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run --rm --name airbyte-zoom-db -e POSTGRES_PASSWORD=password -v airbyte_zoom_data:/var/lib/postgresql/data -p 2000:5432 -d postgres&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will spin up a Docker container and persist the data we will be replicating in the PostgreSQL database in a Docker volume named &lt;code&gt;airbyte_zoom_data&lt;/code&gt;.  &lt;/p&gt;
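&lt;p&gt;The flags in that command determine the connection settings the destination needs. As a sketch, the mapping below spells them out and builds a standard PostgreSQL connection URI. The user and database names rely on the official postgres image defaults (both are &lt;code&gt;postgres&lt;/code&gt; when not overridden). The host is an assumption: Airbyte itself runs in Docker, so it may need an address other than &lt;code&gt;localhost&lt;/code&gt; to reach the container (for example &lt;code&gt;host.docker.internal&lt;/code&gt; on Docker Desktop).&lt;/p&gt;

```python
# Settings implied by the docker run command above.
POSTGRES_SETTINGS = {
    "host": "localhost",     # assumption: depends on how Airbyte reaches the container
    "port": 2000,            # from -p 2000:5432 (host port 2000 maps to 5432)
    "user": "postgres",      # image default when POSTGRES_USER is not set
    "password": "password",  # from -e POSTGRES_PASSWORD=password
    "database": "postgres",  # image default when POSTGRES_DB is not set
}


def connection_uri(settings):
    """Build a postgresql:// URI from the settings dictionary."""
    return "postgresql://{user}:{password}@{host}:{port}/{database}".format(**settings)


print(connection_uri(POSTGRES_SETTINGS))
```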

&lt;p&gt;Now, let’s enter the above credentials in the Airbyte UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dCuaD_k5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/12_postgres_credentials.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dCuaD_k5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/12_postgres_credentials.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click on the &lt;strong&gt;Set up destination&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;After the connection has been made to your PostgreSQL database successfully, Airbyte will generate the schema of the data to be replicated in your database from the Zoom source.&lt;/p&gt;

&lt;p&gt;Leave all the fields checked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8bIrJIA2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/13_schema.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8bIrJIA2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/13_schema.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select a &lt;strong&gt;Sync frequency&lt;/strong&gt; of &lt;strong&gt;manual&lt;/strong&gt; and then click on &lt;strong&gt;Set up connection&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After successfully making the connection, you will see your PostgreSQL destination. Click on the Launch button to start the data replication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OGXnKDNI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/14_launch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OGXnKDNI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/14_launch.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click on the &lt;strong&gt;airbyte-zoom-destination&lt;/strong&gt; to see the Sync page. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BOGSRgvY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/15_sync-screen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BOGSRgvY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/15_sync-screen.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Syncing should take a few minutes or longer depending on the size of the data being replicated. Once Airbyte is done replicating the data, you will get a &lt;strong&gt;succeeded&lt;/strong&gt; status.  &lt;/p&gt;

&lt;p&gt;Then, you can run the following SQL command on the PostgreSQL container to confirm that the sync was done successfully.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker exec airbyte-zoom-db psql -U postgres -c "SELECT * FROM public.users;"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now that we have our Zoom data replicated successfully via Airbyte, let’s move on and set up Tableau to make the various visualizations and analytics we want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Connect the PostgreSQL database to Tableau
&lt;/h2&gt;

&lt;p&gt;Tableau helps people and organizations get answers from their data. It’s a visual analytics platform that makes it easy to explore and manage data.&lt;/p&gt;

&lt;p&gt;To get started with Tableau, you can sign up for a &lt;a href="https://www.tableau.com/products/trial"&gt;free trial period&lt;/a&gt; by providing your email and clicking the &lt;strong&gt;DOWNLOAD FREE TRIAL&lt;/strong&gt; button to download the Tableau desktop app. The download page should automatically detect your machine type (Windows/Mac).&lt;/p&gt;

&lt;p&gt;Go ahead and install Tableau on your machine. After the installation is complete, you will need to fill in some more details to activate your free trial.&lt;/p&gt;

&lt;p&gt;Once your activation is successful, you will see your Tableau dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tgtFkZRM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/16_tableau-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tgtFkZRM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/16_tableau-dashboard.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the sidebar menu under the &lt;strong&gt;To a Server&lt;/strong&gt; section, click on the &lt;strong&gt;More…&lt;/strong&gt; menu. You will see a list of data source connectors that Tableau can connect to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lwN2Ld-F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/17_datasources.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lwN2Ld-F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/17_datasources.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;PostgreSQL&lt;/strong&gt; and you will be presented with a connection credentials modal.&lt;/p&gt;

&lt;p&gt;Fill in the same details of the PostgreSQL database we used as the destination in Airbyte. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cA3FkFv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/18_fill-in-connection-details.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cA3FkFv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/18_fill-in-connection-details.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, click on the &lt;strong&gt;Sign In&lt;/strong&gt; button. If the connection was made successfully, you will see the Tableau dashboard for the database you just connected.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: If you are having trouble connecting Tableau to PostgreSQL, the PostgreSQL driver bundled with Tableau may not work with newer versions of PostgreSQL. You can download the JDBC driver for PostgreSQL&lt;/em&gt; &lt;a href="https://www.tableau.com/support/drivers?_ga=2.62351404.1800241672.1616922684-1838321730.1615100968"&gt;&lt;em&gt;here&lt;/em&gt;&lt;/a&gt; &lt;em&gt;and follow the setup instructions.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Now that we have replicated our Zoom data into a PostgreSQL database using Airbyte’s Zoom connector, and connected Tableau with our PostgreSQL database containing our Zoom data, let’s proceed to creating the charts we need to visualize the time spent by a team in Zoom calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Create the charts on Tableau with the Zoom data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Evolution of the number of meetings per week in a team
&lt;/h3&gt;

&lt;p&gt;To create this chart, we will need to use the count of the meetings and the &lt;strong&gt;createdAt&lt;/strong&gt; field of the &lt;strong&gt;meetings&lt;/strong&gt; table. Currently, we haven’t selected a table to work on in Tableau. So you will see a prompt to &lt;strong&gt;Drag tables here&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xp6MZoHr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/19_tableau-view-with-all-tables.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xp6MZoHr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/19_tableau-view-with-all-tables.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drag the &lt;strong&gt;meetings&lt;/strong&gt; table from the sidebar onto the space with the prompt. &lt;/p&gt;

&lt;p&gt;Now that we have the meetings table, we can start building out the chart by clicking on &lt;strong&gt;Sheet 1&lt;/strong&gt; at the bottom left of Tableau. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--itlUjjlG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/20_empty-meeting-sheet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--itlUjjlG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/20_empty-meeting-sheet.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As stated earlier, we need &lt;strong&gt;Created At&lt;/strong&gt;, but currently it’s a String data type. Let’s change that by converting it to a date and time. Right-click on &lt;strong&gt;Created At&lt;/strong&gt;, then select &lt;code&gt;ChangeDataType&lt;/code&gt; and choose Date &amp;amp; Time. And that’s it! That field is now of type &lt;strong&gt;Date&lt;/strong&gt; &amp;amp; &lt;strong&gt;Time&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hmVLk5Fd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/21_change-to-date-time.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hmVLk5Fd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/21_change-to-date-time.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, drag &lt;strong&gt;Created At&lt;/strong&gt; to &lt;strong&gt;Columns&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mc0zsS-q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/22_drag-created-at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mc0zsS-q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/22_drag-created-at.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, Created At is shown by &lt;strong&gt;YEAR&lt;/strong&gt;, but we want it by week, so right-click on &lt;strong&gt;YEAR(Created At)&lt;/strong&gt; and choose &lt;strong&gt;Week Number&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l7q0Icv---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/23_change-to-per-week.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l7q0Icv---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/23_change-to-per-week.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tableau should now look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NoqSB1KI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/24_meetings-per-week.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NoqSB1KI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/24_meetings-per-week.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, to finish up, we need to add the &lt;strong&gt;meetings(Count)&lt;/strong&gt; measure, which Tableau already calculated for us, to the &lt;strong&gt;Rows&lt;/strong&gt; section. Drag &lt;strong&gt;meetings(Count)&lt;/strong&gt; onto the &lt;strong&gt;Rows&lt;/strong&gt; section to complete the chart.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5817OGZH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/25_evolution-of-meetings-per-week.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5817OGZH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/25_evolution-of-meetings-per-week.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now we are done with the very first chart. Let's save the sheet and create a new Dashboard, to which we will add this sheet and the others we are about to create.&lt;/p&gt;

&lt;p&gt;Currently the sheet shows &lt;strong&gt;Sheet 1&lt;/strong&gt;; right click on &lt;strong&gt;Sheet 1&lt;/strong&gt; at the bottom left and rename it to &lt;strong&gt;Weekly Meetings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To create our Dashboard, we can right click on the sheet we just renamed and choose &lt;strong&gt;new Dashboard&lt;/strong&gt;. Rename the Dashboard to Zoom Dashboard and drag the sheet into it to have something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Wg9s_637--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/26_zoom-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Wg9s_637--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/26_zoom-dashboard.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have this first chart out of the way, we just need to replicate most of the process we used for this one to create the other charts. Because the steps are so similar, we will mostly be showing the finished screenshots of the charts except when we need to conform to the chart requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evolution of the number of hours a team spends in meetings per week
&lt;/h3&gt;

&lt;p&gt;For this chart, we need the sum of the duration spent in weekly meetings. We already have a Duration field, which is currently displaying durations in minutes. We can derive a calculated field off this field since we want the duration in hours (we just need to divide the duration field by 60).&lt;/p&gt;

&lt;p&gt;To do this, right click on the Duration field, select &lt;strong&gt;Create&lt;/strong&gt;, then click on &lt;strong&gt;Calculated Field&lt;/strong&gt;. Change the name to &lt;strong&gt;Duration in Hours&lt;/strong&gt;, and set the calculation to &lt;strong&gt;[Duration]/60&lt;/strong&gt;. Click OK to create the field.&lt;/p&gt;

&lt;p&gt;Now we can drag the Duration in Hours and Created At fields onto the sheet like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jJICNLEw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/27_hours-spent-in-weekly-meetings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jJICNLEw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/27_hours-spent-in-weekly-meetings.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: We are adding a filter on the Duration to filter out null values. You can do this by right clicking on the &lt;strong&gt;SUM(Duration)&lt;/strong&gt; pill, clicking Filter, and making sure the &lt;strong&gt;include null values&lt;/strong&gt; checkbox is unchecked.&lt;/p&gt;
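&lt;p&gt;To make the arithmetic concrete, here is a minimal Python sketch of what this chart computes: skip null durations, divide minutes by 60, and sum per week. The rows and the name &lt;code&gt;hours_per_week&lt;/code&gt; are hypothetical, and the weeks are ISO weeks, which is an assumption rather than Tableau’s exact behavior.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (created_at, duration_in_minutes).
# None stands in for a null duration.
meetings = [
    ("2021-03-01T09:00:00Z", 30),
    ("2021-03-03T15:30:00Z", 90),
    ("2021-03-09T11:00:00Z", None),
    ("2021-03-10T10:00:00Z", 45),
]


def hours_per_week(rows):
    """Sum meeting durations in hours per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for created_at, minutes in rows:
        # Mirror the null filter: rows without a duration are skipped.
        if minutes is None:
            continue
        parsed = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
        iso = parsed.isocalendar()
        # Same calculation as the Duration in Hours field: [Duration]/60.
        totals[(iso[0], iso[1])] += minutes / 60
    return dict(totals)


weekly_hours = hours_per_week(meetings)
print(weekly_hours)
```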

&lt;h3&gt;
  
  
  Evolution of the number of participants for all meetings per week
&lt;/h3&gt;

&lt;p&gt;For this chart, we will need a calculated field called &lt;strong&gt;# of meetings attended&lt;/strong&gt;, which aggregates the count of rows matching a particular user's email in the &lt;code&gt;report_meeting_participants&lt;/code&gt; table, plotted against the &lt;strong&gt;Created At&lt;/strong&gt; field of the &lt;strong&gt;meetings&lt;/strong&gt; table. To do this, right click on the &lt;strong&gt;User Email&lt;/strong&gt; field, select &lt;strong&gt;Create&lt;/strong&gt;, and click on &lt;strong&gt;Calculated Field&lt;/strong&gt;, then enter &lt;strong&gt;# of meetings attended&lt;/strong&gt; as the title of the field. Next, enter the formula below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;COUNT(IF [User Email] == [User Email] THEN [Id (Report Meeting Participants)] END)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then click on Apply. Finally, drag the &lt;strong&gt;Created At&lt;/strong&gt; field (make sure it is set to &lt;strong&gt;Week Number&lt;/strong&gt;) and the calculated field you just created onto the sheet to match the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5keDWJvR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/28_number_of_participants_per_weekly_meetings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5keDWJvR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/28_number_of_participants_per_weekly_meetings.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Listing of team members with the number of meetings per week and number of hours spent in meetings, ranked.
&lt;/h3&gt;

&lt;p&gt;To get this chart, we need to create a relationship between the &lt;strong&gt;meetings&lt;/strong&gt; table and the &lt;code&gt;report_meeting_participants&lt;/code&gt; table. You can do this by dragging the &lt;code&gt;report_meeting_participants&lt;/code&gt; table in as a source alongside the &lt;strong&gt;meetings&lt;/strong&gt; table and relating the two via the &lt;strong&gt;meeting id&lt;/strong&gt;. Then you will be able to create a new worksheet that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KdIgi1EL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/29_meetings-participant-ranked.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KdIgi1EL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/29_meetings-participant-ranked.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: To achieve the ranking, we simply use the sort menu icon on the top menu bar.&lt;/p&gt;
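&lt;p&gt;The logic behind this listing, relating participants to meetings and then sorting, can be sketched as follows. This is a simplified illustration that ignores the per-week split; the sample data and the name &lt;code&gt;rank_members&lt;/code&gt; are hypothetical, and the tie-breaking order (meetings, then hours, then email) is our own choice rather than anything Tableau enforces.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical meeting durations in minutes, keyed by meeting id.
meeting_minutes = {"m1": 30, "m2": 90, "m3": 60}

# Hypothetical (user_email, meeting_id) pairs from the participants table.
attendance = [
    ("ana@example.com", "m1"),
    ("ana@example.com", "m2"),
    ("bo@example.com", "m2"),
    ("cy@example.com", "m3"),
]


def rank_members(pairs, minutes_by_meeting):
    """Return (email, meetings, hours) rows, most active members first."""
    stats = defaultdict(lambda: [0, 0.0])
    for email, meeting_id in pairs:
        # Relate each participant row to its meeting via the meeting id.
        stats[email][0] += 1
        stats[email][1] += minutes_by_meeting[meeting_id] / 60
    rows = [(email, values[0], values[1]) for email, values in stats.items()]
    # Sort descending by meetings, then hours; email breaks remaining ties.
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


ranking = rank_members(attendance, meeting_minutes)
for email, count, hours in ranking:
    print(email, count, hours)
```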

&lt;h3&gt;
  
  
  Evolution of the number of webinars per week in a team
&lt;/h3&gt;

&lt;p&gt;The rest of the charts will need the &lt;strong&gt;webinars&lt;/strong&gt; and &lt;code&gt;report_webinar_participants&lt;/code&gt; tables. As with the evolution of the number of meetings per week in a team, we will plot the count of webinars against the &lt;strong&gt;Created At&lt;/strong&gt; property.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iTVuNJH---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/30_weekly-webinars.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iTVuNJH---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/30_weekly-webinars.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Evolution of the number of hours a team spends in webinars per week
&lt;/h3&gt;

&lt;p&gt;For this chart, as with its meetings counterpart, we will derive a calculated field from the Duration field to get the &lt;strong&gt;Webinar Duration in Hours&lt;/strong&gt;, and then plot &lt;strong&gt;Created At&lt;/strong&gt; against the &lt;strong&gt;Sum of Webinar Duration in Hours&lt;/strong&gt;, as shown in the screenshot below. Note: Make sure you create a new sheet for each of these graphs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--81vKeT5U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/31_time-spent-in-weekly-webinars.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--81vKeT5U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/31_time-spent-in-weekly-webinars.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Evolution of the number of participants for all webinars per week
&lt;/h3&gt;

&lt;p&gt;This calculation is the same as the evolution of the number of participants for all meetings per week, but instead of using the &lt;strong&gt;meetings&lt;/strong&gt; and &lt;code&gt;report_meeting_participants&lt;/code&gt; tables, we will use the webinars and &lt;code&gt;report_webinar_participants&lt;/code&gt; tables.&lt;/p&gt;

&lt;p&gt;Also, the formula will now be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;COUNT(IF [User Email] == [User Email] THEN [Id (Report Webinar Participants)] END)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Below is the chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3hMECqkt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/32_number_of_webinar_attended_per_week.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3hMECqkt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/32_number_of_webinar_attended_per_week.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Listing of team members with the number of webinars per week and number of hours spent in webinars, ranked
&lt;/h4&gt;

&lt;p&gt;Below is the chart with these specs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hMjYJvXI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/33_number-of-webinars-participants.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hMjYJvXI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://airbyte.io/wp-content/uploads/2021/04/33_number-of-webinars-participants.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we saw how to use Airbyte to replicate data from the Zoom API into a PostgreSQL database, and then use that data to create chart visualizations in Tableau. &lt;/p&gt;

&lt;p&gt;You can leverage Airbyte and Tableau to produce graphs on any collaboration tool. We just used Zoom to illustrate how it can be done. Hope this is helpful!  &lt;/p&gt;

</description>
      <category>opensource</category>
      <category>database</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Our Truth for 2021: Airbyte Just Works</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Sun, 04 Apr 2021 22:37:21 +0000</pubDate>
      <link>https://dev.to/airbytehq/our-truth-for-2021-airbyte-just-works-g0f</link>
      <guid>https://dev.to/airbytehq/our-truth-for-2021-airbyte-just-works-g0f</guid>
      <description>&lt;p&gt;We try to limit our discussions with VCs, as they can easily become a distraction. As a startup, focus is what will differentiate between success and failure. But sometimes, we can’t refuse an introduction and a discussion, as some investors have a lot of insights on your industry. &lt;/p&gt;

&lt;p&gt;Recently, we had one discussion with a top-tier VC general partner. In addition to a lot of feedback and insights, one question in particular he asked really struck me: “What is your truth for 2021?”&lt;/p&gt;

&lt;p&gt;In this article, we will explain what he means by truth, and what our immediate answer was for &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt;. &lt;/p&gt;

&lt;h1&gt;
  
  
  What is a truth?
&lt;/h1&gt;

&lt;p&gt;A truth is what you absolutely need to achieve for your company to be on the path to success. It is the one thing you need to strive for, and that should determine your priorities, strategy, initiatives, recruiting plan, etc.&lt;/p&gt;

&lt;p&gt;A truth helps put every consideration in perspective. Any time you have a decision to make, you can ask yourself whether it brings you closer to that truth. If it doesn’t, you should question whether to do it at all. &lt;/p&gt;

&lt;p&gt;It is by having this singular goal in mind that you will give yourself the highest chance to get there. &lt;/p&gt;

&lt;h1&gt;
  
  
  Is a truth a SMART goal in the end?
&lt;/h1&gt;

&lt;p&gt;I’m sure you have heard about “SMART” goals. SMART stands for Specific, Measurable, Achievable, Relevant, Time-based. &lt;br&gt;
It’s true that your truth needs to be specific. It cannot just be “My company is successful.” You need to define exactly what success means to you as a company.&lt;/p&gt;

&lt;p&gt;Your truth should also be achievable and relevant, and it is by definition time-based, as it’s for your current year (or another period of your choice). &lt;/p&gt;

&lt;p&gt;But the difference lies in the fact that your truth should be aspirational above being measurable. It should be very easy to express, just a few words, very memorable. &lt;/p&gt;

&lt;p&gt;When we were asked this question, we hadn’t thought about it this way, but Michel - my co-founder - and I knew the answer instantaneously. &lt;/p&gt;

&lt;h1&gt;
  
  
  Our truth for 2021: “Airbyte just works”
&lt;/h1&gt;

&lt;p&gt;What came to our mind is that for the end of 2021, we envision that Airbyte just works. This is the feeling we want all our users to have. &lt;/p&gt;

&lt;p&gt;This includes the reliability of the platform and all its connectors, whatever your infrastructure and the volume of data you need to replicate. But it also means staying agnostic to whatever connector needs you have and whatever data stack you opted for. Airbyte just works. &lt;br&gt;
Let’s go into more detail. &lt;/p&gt;

&lt;h2&gt;
  
  
  Whatever your data infrastructure
&lt;/h2&gt;

&lt;p&gt;This year, we will be focusing on integrating with the rest of the data stack, whether for orchestration (Airflow, Dagster, Prefect, etc.), data quality (Great Expectations), or cloud providers (GCP, AWS, Azure…), and at whatever scale, which implies we must support multi-node deployments. Until now, we’ve been focusing on single-node setups. &lt;/p&gt;

&lt;h2&gt;
  
  
  Whatever your data volume
&lt;/h2&gt;

&lt;p&gt;We are constantly improving our connectors, and are even certifying them against a set of best practices that we will keep adding to. Data integration pipelines are a thousand-paper-cut problem. Each new user brings some new use cases that may or may not be supported yet. We will continuously grow the team in charge of building new connectors and strengthening existing ones. At the end of the year, we hope we will be able to support TB-level replication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Whatever your connector needs
&lt;/h2&gt;

&lt;p&gt;We want to support at least 200 connectors by the end of 2021. And this will only be the beginning. We’re working on a low-code framework to make it easier to build and maintain connectors. 200 is obviously not enough to cover all connector needs, but hopefully, we will be at a point where building new connectors is so easy that the number of connectors won’t be perceived as a limit on the use cases you can address. &lt;/p&gt;

&lt;p&gt;On that matter, we will also be working to support Kafka, Spark and webhooks. &lt;/p&gt;

&lt;p&gt;--&lt;/p&gt;

&lt;p&gt;This is our truth for 2021. By the end of the year, whatever your use case, you will be able to set up Airbyte and start fulfilling your data integration needs in a matter of hours. We believe this is the only way to commoditize data integration.&lt;/p&gt;

&lt;h1&gt;
  
  
  How you can use the truth framework elsewhere
&lt;/h1&gt;

&lt;p&gt;A last note for this article. You can use the truth framework in other contexts. &lt;/p&gt;

&lt;p&gt;For instance, we see a lot of entrepreneurs making decisions based on the amount of equity they hope to keep and the valuation of the company they hope to reach. However, they fail to remember that startups are either a 0 (you failed to exit and you died), or a 1 (you exited, IPO’d or are profitable). Any consideration of equity and valuation should actually be multiplied by this 0 or 1. &lt;/p&gt;

&lt;p&gt;As such, you need to consider whether the decisions you make bring you closer to the 1. If you keep focusing on the 1, you will see that, in the long term, they were the right decisions to make, as having a bit more equity is not important if you end up building a successful company. &lt;/p&gt;

&lt;p&gt;What is your truth? Are any of the decisions you make taking you closer to it?&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How To Build a Slack Activity Dashboard With Open Source</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Wed, 03 Mar 2021 02:45:08 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-to-build-a-slack-activity-dashboard-with-open-source-3h53</link>
      <guid>https://dev.to/airbytehq/how-to-build-a-slack-activity-dashboard-with-open-source-3h53</guid>
      <description>&lt;h1&gt;
  
  
  Build a Slack Activity Dashboard
&lt;/h1&gt;

&lt;p&gt;This article will show how to use &lt;a href="http://airbyte.io" rel="noopener noreferrer"&gt;Airbyte&lt;/a&gt;, an open-source data integration platform, and &lt;a href="https://superset.apache.org/" rel="noopener noreferrer"&gt;Apache Superset&lt;/a&gt;, an open-source data exploration platform, to build a Slack activity dashboard showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total number of members of a Slack workspace&lt;/li&gt;
&lt;li&gt;The evolution of the number of Slack workspace members&lt;/li&gt;
&lt;li&gt;Evolution of weekly messages&lt;/li&gt;
&lt;li&gt;Evolution of messages per channel&lt;/li&gt;
&lt;li&gt;Members per time zone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before we get started, let’s take a high-level look at how we are going to create a Slack dashboard using Airbyte and Apache Superset.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We will use Airbyte’s Slack connector to get the data off a Slack workspace (we will be using Airbyte’s own Slack workspace for this tutorial).&lt;/li&gt;
&lt;li&gt;We will save the data onto a PostgreSQL database.&lt;/li&gt;
&lt;li&gt;Finally, using Apache Superset, we will implement the various metrics we care about.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Got it? Now let’s get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Replicating Data from Slack to Postgres with Airbyte
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Deploying Airbyte
&lt;/h3&gt;

&lt;p&gt;There are several easy ways to deploy Airbyte, as listed &lt;a href="https://docs.airbyte.io/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. For this tutorial, I will just use the &lt;a href="https://docs.airbyte.io/deploying-airbyte/on-your-workstation" rel="noopener noreferrer"&gt;Docker Compose method&lt;/a&gt; from my workstation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# In your workstation terminal
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above commands will make the Airbyte app available on &lt;code&gt;localhost:8000&lt;/code&gt;. Visit the URL in your favorite browser, and you should see Airbyte’s dashboard (if this is your first time, you will be prompted to enter your email to get started). &lt;/p&gt;

&lt;p&gt;If you haven’t set Docker up, follow the &lt;a href="https://docs.docker.com/desktop/" rel="noopener noreferrer"&gt;instructions here&lt;/a&gt; to set it up on your machine. &lt;/p&gt;

&lt;h3&gt;
  
  
  b. Setting Up Airbyte’s Slack Source Connector
&lt;/h3&gt;

&lt;p&gt;Airbyte’s Slack connector will give us access to the data. So, we are going to kick things off by setting this connector to be our data source in Airbyte’s web app. I am assuming you already have Airbyte and Docker set up on your local machine. We will be using Docker to create our PostgreSQL database container later on.  &lt;/p&gt;

&lt;p&gt;Now, let’s proceed. If you already went through the onboarding, click on the “new source” button at the top right of the Sources section. If you're going through the onboarding, then follow the instructions. &lt;/p&gt;

&lt;p&gt;You will be requested to enter a name for the source you are about to create. You can call it “slack-source”. Then, in the Source Type combo box, look for “Slack,” and then select it. Airbyte will then present the configuration fields needed for the Slack connector. So you should be seeing something like this on the Airbyte App:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftva9ig7in31poxt5i46g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftva9ig7in31poxt5i46g.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first thing you will notice is that this connector requires a Slack token. So, we have to obtain one. If you are not a workspace admin, you will need to ask for permission.&lt;/p&gt;

&lt;p&gt;Let’s walk through how we would get the Slack token we need.&lt;/p&gt;

&lt;p&gt;Assuming you are a workspace admin, open the Slack workspace and navigate to [Workspace Name] &amp;gt; Administration &amp;gt; Customize [Workspace Name]. In our case, it will be Airbyte &amp;gt; Administration &amp;gt; Customize Airbyte (as shown below):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2j18st43u8owrtbce7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2j18st43u8owrtbce7m.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new page that opens up in your browser, you will then need to navigate to &lt;strong&gt;Configure apps&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8gdbbxmp0vvr9m74s7c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8gdbbxmp0vvr9m74s7c.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new window that opens up, click on &lt;strong&gt;Build&lt;/strong&gt; in the top right corner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcppn68gdxsu0b7vc8oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcppn68gdxsu0b7vc8oc.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the &lt;strong&gt;Create an App&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl9v5mlo4akn1zrz7o5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdl9v5mlo4akn1zrz7o5t.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the modal form that follows, give your app a name - you can name it &lt;code&gt;airbyte_superset&lt;/code&gt;, then select your workspace from the Development Slack Workspace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y06yxkkk1y8jo52fnhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8y06yxkkk1y8jo52fnhp.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, click on the &lt;strong&gt;Create App&lt;/strong&gt; button. You will then be presented with a screen where we are going to set permissions for our &lt;code&gt;airbyte_superset&lt;/code&gt; app, by clicking on the &lt;strong&gt;Permissions&lt;/strong&gt; button on this page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiexrcspejhypqqgdmo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiexrcspejhypqqgdmo4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next screen, navigate to the scope section. Then, click on the &lt;strong&gt;Add an OAuth Scope&lt;/strong&gt; button. This will allow you to add permission scopes for your app. At a minimum, your app should have the following permission scopes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0r90fp5iodr93ezwqn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0r90fp5iodr93ezwqn4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, we are going to add our created app to the workspace by clicking the &lt;strong&gt;Install to Workspace&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hwc9eyy0d8b3ltj4uvw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hwc9eyy0d8b3ltj4uvw.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Slack will prompt you that your app is requesting permission to access your workspace of choice. Click Allow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrlw9e4slwaeafxpg25f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrlw9e4slwaeafxpg25f.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the app has been successfully installed, you will be taken to Slack’s dashboard, where you will see the Bot User OAuth Access Token. &lt;/p&gt;

&lt;p&gt;This is the token you will provide back on the Airbyte page, where we left off to obtain this token. So make sure to copy it and keep it in a safe place.&lt;/p&gt;

&lt;p&gt;Now that we are done with obtaining a Slack token, let’s go back to the Airbyte page where we left off and add the token there. &lt;/p&gt;

&lt;p&gt;We will also need to provide Airbyte with &lt;code&gt;start_date&lt;/code&gt;. This is the date from which we want Airbyte to start replicating data from the Slack API, and we define that in the format: &lt;code&gt;YYYY-MM-DDT00:00:00Z&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We will specify ours as &lt;code&gt;2020-09-01T00:00:00Z&lt;/code&gt;. We will also tell Airbyte to exclude archived channels and not include private channels, and also to join public channels, so the latter part of the form should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F109m4f9yrzs2aq0idbgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F109m4f9yrzs2aq0idbgv.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, click on the &lt;strong&gt;Set up source&lt;/strong&gt; button for Airbyte to set the Slack source up.&lt;/p&gt;

&lt;p&gt;If the source was set up correctly, you will be taken to the destination section of Airbyte’s dashboard, where you will tell Airbyte where to store the replicated data. &lt;/p&gt;

&lt;h3&gt;
  
  
  c. Setting Up Airbyte’s Postgres Destination Connector
&lt;/h3&gt;

&lt;p&gt;For our use case, we will be using PostgreSQL as the destination.&lt;/p&gt;

&lt;p&gt;Click the &lt;strong&gt;add destination&lt;/strong&gt; button in the top right corner, then click on &lt;strong&gt;add a new destination&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc3d0tdgq6mubqxows3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc3d0tdgq6mubqxows3j.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next screen, Airbyte will validate the source, and then present you with a form to give your destination a name. We’ll call this destination slack-destination. Then, we will select the Postgres destination type. Your screen should look like this now:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09hjo8q13e85aruholt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09hjo8q13e85aruholt9.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great! We have a form to enter Postgres connection credentials, but we haven’t set up a Postgres database. Let’s do that!&lt;/p&gt;

&lt;p&gt;Since we already have Docker installed, we can spin up a Postgres container with the following command in our terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --rm --name slack-db -e POSTGRES_PASSWORD=password -p 2000:5432 -d postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Note that the Docker compose file for Superset ships with a Postgres database, as you can see &lt;a href="https://github.com/apache/superset/blob/master/docker-compose.yml#L40" rel="noopener noreferrer"&gt;here&lt;/a&gt;). &lt;/p&gt;

&lt;p&gt;The above command will do the following:&lt;/p&gt;

&lt;ul&gt;
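&lt;li&gt;create a Postgres container named &lt;code&gt;slack-db&lt;/code&gt; that runs in the background (&lt;code&gt;-d&lt;/code&gt;) and is removed automatically when it stops (&lt;code&gt;--rm&lt;/code&gt;),&lt;/li&gt;
&lt;li&gt;set the password to &lt;code&gt;password&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;map the container’s port 5432 to port 2000 on our machine,&lt;/li&gt;
&lt;li&gt;create a database and a user, both called &lt;code&gt;postgres&lt;/code&gt;.&lt;/li&gt;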
&lt;/ul&gt;
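
&lt;p&gt;Before going back to Airbyte, you may want to confirm that the container is up and accepting connections. This is a quick sketch that assumes the container name &lt;code&gt;slack-db&lt;/code&gt; from the command above and relies on the &lt;code&gt;pg_isready&lt;/code&gt; utility that ships with Postgres:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# List the running container named slack-db
docker ps --filter "name=slack-db"
# Ask Postgres inside the container whether it is accepting connections
docker exec slack-db pg_isready -U postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;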

&lt;p&gt;With this, we can go back to the Airbyte screen and supply the information needed. Your form should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rw0vkg7esq2yj3gevps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rw0vkg7esq2yj3gevps.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click on the &lt;strong&gt;Set up destination&lt;/strong&gt; button. &lt;/p&gt;

&lt;h3&gt;
  
  
  d. Setting Up the Replication
&lt;/h3&gt;

&lt;p&gt;You should now see the following screen:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziv73iozmbx06muxdftx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziv73iozmbx06muxdftx.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Airbyte will then fetch the schema for the data coming from the Slack API for your workspace. You should leave all boxes checked and then choose the sync frequency - this is the interval at which Airbyte will sync the data coming from your workspace. Let’s set the sync interval to every 24 hours.&lt;/p&gt;

&lt;p&gt;Then click on the &lt;strong&gt;Set up connection&lt;/strong&gt; button. &lt;/p&gt;

&lt;p&gt;Airbyte will now take you to the destination dashboard, where you will see the destination you just set up. Click on it to see more details about this destination.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ajvat7mi7u105c2y608.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ajvat7mi7u105c2y608.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will see Airbyte running the very first sync. Depending on the size of the data Airbyte is replicating, it might take a while before syncing is complete.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F327d398tp0ct8qwjg59p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F327d398tp0ct8qwjg59p.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When it’s done, you will see the &lt;strong&gt;Running status&lt;/strong&gt; change to &lt;strong&gt;Succeeded&lt;/strong&gt;, along with the size of the data Airbyte replicated and the number of records stored in the Postgres database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9gaeznihdt5bkhnntop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9gaeznihdt5bkhnntop.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To test if the sync worked, run the following in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec slack-db psql -U postgres -c "SELECT * FROM public.users;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should output the rows in the &lt;code&gt;users&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;To get the number of rows in the &lt;code&gt;users&lt;/code&gt; table as well, you can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec slack-db psql -U postgres -c "SELECT count(*) FROM public.users;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
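
&lt;p&gt;To see which tables Airbyte created, you can also list everything in the &lt;code&gt;public&lt;/code&gt; schema. This is a small sketch that assumes the data landed in the &lt;code&gt;public&lt;/code&gt; schema, as in the queries above; the exact table names depend on the streams you synced:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# \dt is a psql meta-command that lists the tables matching a pattern
docker exec slack-db psql -U postgres -c "\dt public.*"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;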



&lt;p&gt;Now that we have the data from the Slack workspace in our Postgres destination, we will head on to creating the Slack dashboard with Apache Superset.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Setting Up Apache Superset for the Dashboards
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Installing Apache Superset
&lt;/h3&gt;

&lt;p&gt;Apache Superset, or simply Superset, is a modern data exploration and visualization platform. To get started using it, we will be cloning the Superset repo. Navigate to a destination in your terminal where you want to clone the Superset repo to and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/apache/superset.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s recommended to check out the latest branch of Superset, so run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd superset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git checkout latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Superset needs you to install its frontend dependencies and build its frontend assets, both of which live in the &lt;code&gt;superset-frontend&lt;/code&gt; directory. So, we will start by moving into that directory and installing the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd superset-frontend
npm install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: The above command assumes you have both Node and NPM installed on your machine.&lt;/p&gt;

&lt;p&gt;Finally, for the frontend, we will build the assets by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm run build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, go back up one directory into the Superset directory by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download the Docker images, build the containers, and start the services Superset needs to run locally on your machine.&lt;/p&gt;

&lt;p&gt;Once that’s done, you should be able to access Superset in your browser by visiting &lt;a href="http://localhost:8088" rel="noopener noreferrer"&gt;&lt;code&gt;http://localhost:8088&lt;/code&gt;&lt;/a&gt;, and you should be presented with the Superset login screen.&lt;/p&gt;
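
&lt;p&gt;If the login screen does not load, a couple of quick checks can help narrow things down. This is only a sketch: run it from the Superset directory in a second terminal, and note that it assumes &lt;code&gt;curl&lt;/code&gt; is installed on your machine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Show the state of the services defined in the Compose file
docker-compose ps
# Print only the response headers returned by the Superset web server
curl -I http://localhost:8088
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;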

&lt;p&gt;Enter username: &lt;strong&gt;admin&lt;/strong&gt; and Password: &lt;strong&gt;admin&lt;/strong&gt; to be taken to your Superset dashboard.&lt;/p&gt;

&lt;p&gt;Great! You’ve got Superset set up. Now let’s tell Superset about our Postgres Database holding the Slack data from Airbyte.&lt;/p&gt;

&lt;h3&gt;
  
  
  b. Setting Up a Postgres Database in Superset
&lt;/h3&gt;

&lt;p&gt;To do this, on the top menu in your Superset dashboard, hover on the Data dropdown and click on &lt;strong&gt;Databases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmd3qazsvsmfmuwmx1isi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmd3qazsvsmfmuwmx1isi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the page that opens up, click on the &lt;strong&gt;+ Database&lt;/strong&gt; button in the top right corner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4y6ohog8acqweze1ia3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4y6ohog8acqweze1ia3.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, you will be presented with a modal to add your Database Name and the connection URI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxxgzrwegjej0r2mvfsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxxgzrwegjej0r2mvfsd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s call our Database &lt;code&gt;slack_db&lt;/code&gt;, and then add the following URI as the connection URI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgresql://postgres:password@docker.for.mac.localhost:2000/postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are on a Windows Machine, yours will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgresql://postgres:password@docker.for.win.localhost:2000/postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: We are using &lt;code&gt;docker.for.[mac|win].localhost&lt;/code&gt; to reach your machine’s localhost, because Superset itself runs inside a Docker container, where plain localhost refers to that container and not to your machine.&lt;/p&gt;
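
&lt;p&gt;To make the parts of the connection URI explicit, here is a small shell sketch that assembles it from the values used in this tutorial. The variable names are purely illustrative; Superset only needs the final string:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Values from the docker run command used for the slack-db container
PG_USER=postgres
PG_PASSWORD=password
# Host name that resolves to your machine from inside a container (Mac variant)
PG_HOST=docker.for.mac.localhost
# Port on your machine that is mapped to the container port 5432
PG_PORT=2000
PG_DB=postgres
SUPERSET_URI="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}"
echo "$SUPERSET_URI"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On Windows, set &lt;code&gt;PG_HOST&lt;/code&gt; to &lt;code&gt;docker.for.win.localhost&lt;/code&gt; instead.&lt;/p&gt;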

&lt;p&gt;Your Superset UI should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5u6nkum3oga13548i9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5u6nkum3oga13548i9a.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will need to enable some settings on this connection. Click on the &lt;strong&gt;SQL LAB SETTINGS&lt;/strong&gt; and check the following boxes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyn8ygi1h5l2mmo7k6ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyn8ygi1h5l2mmo7k6ys.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Afterwards, click on the &lt;strong&gt;ADD&lt;/strong&gt; button, and you will see your database on the data page of Superset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mdc97wbc9oa5e85k74j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mdc97wbc9oa5e85k74j.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  c. Importing our dataset
&lt;/h3&gt;

&lt;p&gt;Now that you’ve added the database, you will need to hover over the data menu again; now click on &lt;strong&gt;Datasets&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyiltjrm7zzjxht1ik5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyiltjrm7zzjxht1ik5d.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, you will be taken to the datasets page: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wmwpenkv8m7l8y10s3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wmwpenkv8m7l8y10s3n.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We only want to see the datasets that are in our &lt;code&gt;slack_db&lt;/code&gt; database, so in the Database filter that currently shows All, select &lt;code&gt;slack_db&lt;/code&gt;. You will see that we don’t have any datasets at the moment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v4kavct3g9vu4q7ct6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v4kavct3g9vu4q7ct6e.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq84hyf8s7frtvohzm450.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq84hyf8s7frtvohzm450.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can fix this by clicking on the &lt;strong&gt;+ DATASET&lt;/strong&gt; button and adding the following datasets. &lt;/p&gt;

&lt;p&gt;Note: Make sure you select the public schema under the Schema dropdown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq84hyf8s7frtvohzm450.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq84hyf8s7frtvohzm450.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have set up Superset and given it our Slack data, let’s proceed to creating the visualizations we need. &lt;/p&gt;

&lt;p&gt;Still remember them? Here they are again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total number of members of a Slack workspace&lt;/li&gt;
&lt;li&gt;The evolution of the number of Slack workspace members&lt;/li&gt;
&lt;li&gt;Evolution of weekly messages&lt;/li&gt;
&lt;li&gt;Evolution of weekly threads created&lt;/li&gt;
&lt;li&gt;Evolution of messages per channel&lt;/li&gt;
&lt;li&gt;Members per time zone&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Creating Our Dashboards with Superset
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Total number of members of a Slack workspace
&lt;/h3&gt;

&lt;p&gt;To get this, we will first click on the &lt;code&gt;users&lt;/code&gt; dataset of our &lt;code&gt;slack_db&lt;/code&gt; on the Superset dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz54at04yh1vcjyss6g5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz54at04yh1vcjyss6g5v.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, change &lt;strong&gt;untitled&lt;/strong&gt; at the top to &lt;strong&gt;Number of Members&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmmlsy1imeq1j2nsp74k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmmlsy1imeq1j2nsp74k.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now change the &lt;strong&gt;Visualization Type&lt;/strong&gt; to &lt;strong&gt;Big Number&lt;/strong&gt;, remove the &lt;strong&gt;Time Range&lt;/strong&gt; filter, and add a Subheader named “Slack Members.” Your UI should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfevpgvzqwaqfa7ky2ui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfevpgvzqwaqfa7ky2ui.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, click on the &lt;strong&gt;RUN QUERY&lt;/strong&gt; button, and you should now see the total number of members.  &lt;/p&gt;

&lt;p&gt;Pretty cool, right? Now let’s save this chart by clicking on the &lt;strong&gt;SAVE&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffemjkbdprum4nv43p3tf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffemjkbdprum4nv43p3tf.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, in the &lt;strong&gt;ADD TO DASHBOARD&lt;/strong&gt; section, type in “Slack Dashboard”, click on the “Create Slack Dashboard” button, and then click the &lt;strong&gt;Save&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;Great! We have successfully created our first Chart, and we also created the Dashboard. Subsequently, we will be following this flow to add the other charts to the created Slack Dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  b. Casting the ts column
&lt;/h3&gt;

&lt;p&gt;Before we proceed with the rest of the charts for our dashboard, note that if you inspect the &lt;strong&gt;ts&lt;/strong&gt; column on either the &lt;strong&gt;messages&lt;/strong&gt; table or the &lt;strong&gt;threads&lt;/strong&gt; table, you will see it’s of the type &lt;code&gt;VARCHAR&lt;/code&gt;. We can’t use a text column for time-based charts, so we have to cast the &lt;strong&gt;ts&lt;/strong&gt; column of both the &lt;strong&gt;messages&lt;/strong&gt; and &lt;strong&gt;threads&lt;/strong&gt; tables to &lt;code&gt;TIMESTAMP&lt;/code&gt;. Then, we can create our charts from the results of those queries. Let’s do this.&lt;/p&gt;

&lt;p&gt;First, navigate to the &lt;strong&gt;Data&lt;/strong&gt; menu, and click on the &lt;strong&gt;Datasets&lt;/strong&gt; link. In the list of datasets, click the &lt;strong&gt;Edit&lt;/strong&gt; button for the &lt;strong&gt;messages&lt;/strong&gt; table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4njprueelrf3uyyqfgy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4njprueelrf3uyyqfgy.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’re now in the Edit Dataset view. Click the &lt;strong&gt;Lock&lt;/strong&gt; button to enable editing of the dataset. Then, navigate to the &lt;strong&gt;Columns&lt;/strong&gt; tab, expand the &lt;strong&gt;ts&lt;/strong&gt; dropdown, and then tick the &lt;strong&gt;Is Temporal&lt;/strong&gt; box. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal06cno1ea88f2h1m0is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal06cno1ea88f2h1m0is.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Persist the changes by clicking the Save button.&lt;/p&gt;
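
&lt;p&gt;If you would like to confirm at the database level that the &lt;strong&gt;ts&lt;/strong&gt; values can be read as timestamps, you can run a cast directly against Postgres. This is a sketch that mirrors the &lt;code&gt;CAST&lt;/code&gt; used later in this article; whether it succeeds depends on how the connector stored &lt;strong&gt;ts&lt;/strong&gt; in your database:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Show the raw ts value next to its TIMESTAMP cast for a few messages
docker exec slack-db psql -U postgres -c "SELECT ts, CAST(ts AS TIMESTAMP) AS ts_cast FROM public.messages LIMIT 5;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;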

&lt;h3&gt;
  
  
  c. The evolution of the number of Slack workspace members
&lt;/h3&gt;

&lt;p&gt;In the exploration page, let’s first get the chart showing the evolution of the number of Slack members. To do this, make your settings on this page match the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfr1ydme0slsggsxhgrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfr1ydme0slsggsxhgrs.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Save this chart onto the Slack Dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  d. Evolution of weekly messages posted
&lt;/h3&gt;

&lt;p&gt;Now, we will look at the evolution of weekly messages posted. Let’s configure the chart settings on the same page as the previous one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70gx3pfcq6ir2v78c2ci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70gx3pfcq6ir2v78c2ci.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Remember, your visualization will differ based on the data you have.&lt;/p&gt;
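
&lt;p&gt;For a rough cross-check of the chart, the same weekly counts can be computed in SQL. This sketch assumes the &lt;strong&gt;ts&lt;/strong&gt; column casts cleanly to &lt;code&gt;TIMESTAMP&lt;/code&gt;, as discussed above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Count messages per week, oldest week first
docker exec slack-db psql -U postgres -c "
SELECT date_trunc('week', CAST(ts AS TIMESTAMP)) AS week, count(*) AS messages
FROM public.messages
GROUP BY 1
ORDER BY 1;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;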

&lt;h3&gt;
  
  
  e. Evolution of weekly threads created
&lt;/h3&gt;

&lt;p&gt;We are now finished with the messages chart, so let's move on to the threads chart. Recall that we need to cast the &lt;strong&gt;ts&lt;/strong&gt; column of the &lt;strong&gt;threads&lt;/strong&gt; table, as stated earlier. Do that, go to the exploration page, and make it match the screenshot below to get the required visualization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5amwtuqistvdue9xiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5amwtuqistvdue9xiy.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  f. Evolution of messages per channel
&lt;/h3&gt;

&lt;p&gt;For this visualization, we will need a more complex SQL query. Here’s the query we used (as you can see in the screenshot below):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT CAST(m.ts as TIMESTAMP), c.name, m.text
FROM public.messages m
INNER JOIN public.channels c
ON m.channel_id = c.id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqv2wqnpe802sep142lw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqv2wqnpe802sep142lw.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, click on &lt;strong&gt;EXPLORE&lt;/strong&gt; to be taken to the exploration page; make it match the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrl8taxfw9pfxo38i3y3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrl8taxfw9pfxo38i3y3.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Save this chart to the dashboard.&lt;/p&gt;
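
&lt;p&gt;One way to read this chart is as a count of the joined rows per week and per channel. As a sketch of that aggregation in plain SQL, assuming the &lt;code&gt;channels&lt;/code&gt; table exposes its key as &lt;code&gt;id&lt;/code&gt; and that weekly granularity is what you want:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Count messages per week and per channel name
docker exec slack-db psql -U postgres -c "
SELECT date_trunc('week', CAST(m.ts AS TIMESTAMP)) AS week, c.name AS channel, count(*) AS messages
FROM public.messages m
INNER JOIN public.channels c ON m.channel_id = c.id
GROUP BY 1, 2
ORDER BY 1, 2;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;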

&lt;h3&gt;
  
  
  g. Members per time zone
&lt;/h3&gt;

&lt;p&gt;Finally, we will be visualizing members per time zone. To do this, instead of casting in the SQL lab as we’ve previously done, we will explore another method to achieve casting by using Superset’s Virtual calculated column feature. This feature allows us to write SQL queries that customize the appearance and behavior of a specific column.&lt;/p&gt;

&lt;p&gt;For our use case, we will need the updated column of the users table to be a &lt;code&gt;TIMESTAMP&lt;/code&gt;, in order to perform the visualization we need for Members per time zone. Let’s start by clicking the edit icon on the users table in Superset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zabaduxu0om922vu00k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zabaduxu0om922vu00k.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will be presented with a modal like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuyf202rs6uzk8ooojrt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuyf202rs6uzk8ooojrt.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the &lt;strong&gt;CALCULATED COLUMNS&lt;/strong&gt; tab:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1ucs20lwc3oszydzp0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1ucs20lwc3oszydzp0t.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, click on the &lt;strong&gt;+ ADD ITEM&lt;/strong&gt; button, and make your settings match the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw77yh3duw8absnwr54u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw77yh3duw8absnwr54u.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, go to the &lt;strong&gt;exploration&lt;/strong&gt; page and make it match the settings below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhaiog04fjksl63ugelfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhaiog04fjksl63ugelfd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
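
&lt;p&gt;As a sanity check on the chart, the member count per time zone can also be computed directly in Postgres. This sketch assumes the users table has a &lt;code&gt;tz&lt;/code&gt; column holding each member’s time zone; the column may be named differently in your schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Count members per time zone, largest group first (tz is an assumed column name)
docker exec slack-db psql -U postgres -c "SELECT tz, count(*) AS members FROM public.users GROUP BY tz ORDER BY members DESC;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;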

&lt;p&gt;Now save this last chart, and head over to your Slack Dashboard. It should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jfmit1wx79434khmmxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jfmit1wx79434khmmxl.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Of course, you can edit how the dashboard looks to fit what you want on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we looked at using Airbyte’s Slack connector to get the data from a Slack workspace into a Postgres database, and then used Apache Superset to craft a dashboard of visualizations. If you have any questions about Airbyte, don’t hesitate to ask on our &lt;a href="https://slack.airbyte.io" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;! If you have questions about Superset, you can join the &lt;a href="https://superset.apache.org/community/" rel="noopener noreferrer"&gt;Superset Community Slack&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>analytics</category>
    </item>
    <item>
      <title>How to Save and Search Your Slack History on a Free Slack Plan</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Wed, 24 Feb 2021 18:01:24 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-to-save-and-search-your-slack-history-on-a-free-slack-plan-m3d</link>
      <guid>https://dev.to/airbytehq/how-to-save-and-search-your-slack-history-on-a-free-slack-plan-m3d</guid>
      <description>&lt;p&gt;The &lt;a href="https://slack.com/intl/en-nc/pricing/paid-vs-free"&gt;Slack free tier&lt;/a&gt; saves only the last 10K messages. For social Slack instances, it may be impractical to upgrade to a paid plan to retain these messages. Similarly, for an open-source project like &lt;a href="https://airbyte.io/"&gt;Airbyte&lt;/a&gt; where we interact with our community through a public Slack instance, the cost of paying for a seat for every Slack member is prohibitive.&lt;/p&gt;

&lt;p&gt;However, searching through old messages can be really helpful. Losing that history feels like some advanced form of memory loss. What was that joke about Java 8 Streams? This contributor question sounds familiar—haven't we seen it before? But you just can't remember!&lt;/p&gt;

&lt;p&gt;This tutorial will show you how you can, for free, use Airbyte to save these messages (even after Slack removes access to them). It will also provide you with a convenient way to search through them.&lt;/p&gt;

&lt;p&gt;Specifically, we will export messages from your Slack instance into an open-source search engine called &lt;a href="https://github.com/meilisearch/meilisearch"&gt;MeiliSearch&lt;/a&gt;. We will be focusing on getting this setup running from your local workstation. We will mention at the end how you can set up a more productionized version of this pipeline.&lt;/p&gt;

&lt;p&gt;We want to make this process easy, so while we will link to some external documentation for further exploration, we will provide all the instructions you need here to get this up and running.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. Set Up MeiliSearch
&lt;/h1&gt;

&lt;p&gt;First, let's get MeiliSearch running on our workstation. MeiliSearch has extensive docs for &lt;a href="https://docs.meilisearch.com/reference/features/installation.html#download-and-launch"&gt;getting started&lt;/a&gt;. For this tutorial, however, we will give you all the instructions you need to set up MeiliSearch using Docker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 7700:7700 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/data.ms:/data.ms &lt;span class="se"&gt;\&lt;/span&gt;
  getmeili/meilisearch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it!&lt;br&gt;
MeiliSearch stores data in $(pwd)/data.ms, so if you prefer to store it somewhere else, just adjust this path.&lt;/p&gt;
&lt;h1&gt;
  
  
  2. How To Replicate Your Slack Messages to MeiliSearch
&lt;/h1&gt;
&lt;h2&gt;
  
  
  a. Set Up Airbyte
&lt;/h2&gt;

&lt;p&gt;Make sure you have Docker and Docker Compose installed. If you haven’t set Docker up, follow the &lt;a href="https://docs.docker.com/desktop/"&gt;instructions here&lt;/a&gt; to set it up on your machine. Then, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/airbytehq/airbyte.git
&lt;span class="nb"&gt;cd &lt;/span&gt;airbyte
docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run into any problems, feel free to check out our more extensive &lt;a href="https://docs.airbyte.io/getting-started"&gt;getting started&lt;/a&gt; guide.&lt;/p&gt;

&lt;p&gt;Once you see an Airbyte banner, the UI is ready to go at &lt;a href="http://localhost:8000/"&gt;http://localhost:8000/&lt;/a&gt;. Once you have set your user preferences, you will be brought to a page that asks you to set up a source. In the next step, we'll go over how to do that.&lt;/p&gt;

&lt;h2&gt;
  
  
  b. Set Up Airbyte’s Slack Source Connector
&lt;/h2&gt;

&lt;p&gt;In the Airbyte UI, select Slack from the dropdown. We provide step-by-step instructions for setting up the Slack source in Airbyte &lt;a href="https://docs.airbyte.io/integrations/sources/slack#setup-guide"&gt;here&lt;/a&gt;. These will walk you through how to complete the form on this page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yc7LehAC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mv03xpwjegn8s98d3tne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yc7LehAC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mv03xpwjegn8s98d3tne.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By the end of these instructions, you should have created a Slack source in the Airbyte UI. For now, just add your Slack app to a single public channel (you can add it to more channels later). Only messages from that channel will be replicated.&lt;/p&gt;

&lt;p&gt;The Airbyte app will now prompt you to set up a destination. Next, we will walk through how to set up MeiliSearch.&lt;/p&gt;

&lt;h2&gt;
  
  
  c. Set Up Airbyte’s MeiliSearch Destination Connector
&lt;/h2&gt;

&lt;p&gt;Head back to the Airbyte UI. It should still be prompting you to set up a destination. Select "MeiliSearch" from the dropdown. For the host field, set: &lt;a href="http://localhost:7700"&gt;http://localhost:7700&lt;/a&gt;. The api_key can be left blank.&lt;/p&gt;

&lt;h2&gt;
  
  
  d. Set Up the Replication
&lt;/h2&gt;

&lt;p&gt;On the next page, you will be asked to select which streams of data you'd like to replicate. We recommend unchecking "files" and "remote files" since you won't really be able to search them easily in this search engine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WJMd_GPQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oax8vn55frbs3nv4ocrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WJMd_GPQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oax8vn55frbs3nv4ocrf.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For frequency, we recommend every 24 hours.&lt;/p&gt;

&lt;h1&gt;
  
  
  3. Search MeiliSearch
&lt;/h1&gt;

&lt;p&gt;After the connection has been saved, Airbyte should start replicating the data immediately. When it completes you should see the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_14yw0bd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vyrree2xstb7hprjtsul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_14yw0bd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vyrree2xstb7hprjtsul.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the sync is done, you can sanity check that this is all working by making a search request to MeiliSearch. Replication can take several minutes depending on the size of your Slack instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s1"&gt;'http://localhost:7700/indexes/messages/search'&lt;/span&gt; &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{ "q": "&amp;lt;search-term&amp;gt;" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, one of the messages that I replicated contains the text "welcome to airbyte", so a search for "welcome to" returns it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s1"&gt;'http://localhost:7700/indexes/messages/search'&lt;/span&gt; &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{ "q": "welcome to" }'&lt;/span&gt;
&lt;span class="c"&gt;# =&amp;gt; {"hits":[{"_ab_pk":"7ff9a858_6959_45e7_ad6b_16f9e0e91098","channel_id":"C01M2UUP87P","client_msg_id":"77022f01-3846-4b9d-a6d3-120a26b2c2ac","type":"message","text":"welcome to airbyte.","user":"U01AS8LGX41","ts":"2021-02-05T17:26:01.000000Z","team":"T01AB4DDR2N","blocks":[{"type":"rich_text"}],"file_ids":[],"thread_ts":"1612545961.000800"}],"offset":0,"limit":20,"nbHits":2,"exhaustiveNbHits":false,"processingTimeMs":21,"query":"welcome to"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
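&lt;p&gt;If you would rather script this check than type curl commands, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of Airbyte or MeiliSearch: the function names are hypothetical, and it assumes the local MeiliSearch instance without an API key and the "messages" index used above.&lt;/p&gt;

```python
import json
import urllib.request


def build_search_request(host, index, term):
    """Build a POST request for MeiliSearch's search endpoint (hypothetical helper)."""
    url = f"{host}/indexes/{index}/search"
    body = json.dumps({"q": term}).encode("utf-8")
    # Declare the body as JSON; assumption: some MeiliSearch versions require this header.
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers)


def search(host, index, term):
    """Send the request and return the list of matching documents."""
    request = build_search_request(host, index, term)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["hits"]
```

&lt;p&gt;With MeiliSearch running, calling search("http://localhost:7700", "messages", "welcome to") should return the same hits as the curl command above.&lt;/p&gt;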



&lt;h1&gt;
  
  
  4. Search via a UI
&lt;/h1&gt;

&lt;p&gt;Making curl requests to search your Slack History is a little clunky, so we have modified the example UI that MeiliSearch provides in &lt;a href="https://docs.meilisearch.com/learn/tutorials/getting_started.html#integrate-with-your-project"&gt;their docs&lt;/a&gt; to search through the Slack results.&lt;br&gt;
Download (or copy and paste) this &lt;a href="https://github.com/airbytehq/airbyte/blob/master/docs/tutorials/slack-history/index.html"&gt;html file&lt;/a&gt; to your workstation. Then, open it using a browser. You should now be able to write search terms in the search bar and get results instantly!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kT12htXP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qavlp1zkkjqew3za1qf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kT12htXP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qavlp1zkkjqew3za1qf.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  5. "Productionizing" Saving Slack History
&lt;/h1&gt;

&lt;p&gt;You can find instructions for how to host Airbyte on various cloud platforms &lt;a href="https://docs.airbyte.io/deploying-airbyte"&gt;here&lt;/a&gt;.&lt;br&gt;
Documentation on how to host MeiliSearch on cloud platforms can be found &lt;a href="https://docs.meilisearch.com/create/how_to/running_production.html#a-quick-introduction"&gt;here&lt;/a&gt;.&lt;br&gt;
If you want to use the UI mentioned in the section above, we recommend statically hosting it on S3, GCS, or equivalent.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How Open-source Can Disrupt Build vs. Buy Considerations</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Fri, 22 Jan 2021 02:29:16 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-open-source-can-disrupt-build-vs-buy-considerations-4ff5</link>
      <guid>https://dev.to/airbytehq/how-open-source-can-disrupt-build-vs-buy-considerations-4ff5</guid>
      <description>&lt;p&gt;When you’re selling or considering purchasing a B2B tool, you need to understand the build vs. buy argument. What are the pros and cons of building the tool internally vs. buying the tool from a third-party vendor? This is especially true in big companies where you have the resources to build the said tools. Early-stage startups will generally opt for the faster route, going with self-served B2B tools -- unless the pricing is prohibitive.&lt;/p&gt;

&lt;p&gt;But something we don’t often think about is how open-source just messes the whole thing up. The build option is completely redefined. You now need to compare the B2B tool with two build scenarios: building without the open-source tool, and building with it, which most often lowers the barrier significantly. &lt;br&gt;
In this article, we’ll take the example of the ETL/ELT industry. We know it best, as we’re building Airbyte, the open-source ELT alternative. Let’s see how open-source for ETL / ELT with Airbyte is also flipping the previous Build vs. Buy balance on its head. &lt;/p&gt;

&lt;p&gt;We’ve produced an infographic to illustrate that point. You will see that, without taking Airbyte into consideration, the build vs. buy comparison favored buying Fivetran over building connectors yourself. But now, with Airbyte, you can either use the open-sourced connectors and start replicating data in minutes for free, or build new connectors (if Airbyte doesn’t support them yet) in a matter of days (vs. months before), with maintenance being crowdsourced throughout the Airbyte community. &lt;/p&gt;

&lt;h1&gt;
  
  
  The Infographic
&lt;/h1&gt;

&lt;p&gt;The infographic shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in white, the original “build” scenario; &lt;/li&gt;
&lt;li&gt;in blue, the original "buy" scenario with cloud-based Fivetran; &lt;/li&gt;
&lt;li&gt;in purple, the new "build" scenario with 2 options: “build non-supported connector with Airbyte” in light purple, and “use prebuilt connectors from Airbyte” in dark purple&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tHj5c-Es--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ahhvg0iy5ulpuq3fpybf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tHj5c-Es--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ahhvg0iy5ulpuq3fpybf.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s just say it: &lt;em&gt;the playing field has changed&lt;/em&gt;!&lt;/p&gt;

&lt;h1&gt;
  
  
  The Explanation
&lt;/h1&gt;

&lt;p&gt;Some context: the average business today uses well &lt;a href="https://www.wsj.com/articles/employees-are-accessing-more-and-more-business-apps-study-finds-11549580017"&gt;over 100 software apps&lt;/a&gt;, many of which contain valuable insights about an organization’s operations. Your company is likely on the way to using just as many apps, if not more, and you’ll need a solution to integrate all of the data your apps produce. &lt;/p&gt;

&lt;h2&gt;
  
  
  Time &amp;amp; Effort
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qQiA5JZw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bpyacwe95b99oudhqkp9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qQiA5JZw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bpyacwe95b99oudhqkp9.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building your own pipeline is a significant time commitment. It can take 3 to 6 months to set up a basic pipeline. Beyond the time commitment, there is some inherent complexity in building a reliable, high-performance ELT pipeline. You need to: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Obtain developer access to the data source&lt;/li&gt;
&lt;li&gt;Explore the data&lt;/li&gt;
&lt;li&gt;Design the schema/data models&lt;/li&gt;
&lt;li&gt;Set up a connector framework&lt;/li&gt;
&lt;li&gt;Test the connector and validate the data&lt;/li&gt;
&lt;li&gt;Set up orchestration, configuration validation, state management, normalization, schema migration, monitoring, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain the connector&lt;/strong&gt; for every schema change that happens every few weeks. This part is very cumbersome, as it requires an increasing number of data engineers to manage your connectors. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In contrast, an off-the-shelf solution such as Fivetran can be set up in a matter of minutes with prebuilt connectors. Airbyte also takes literally 30 seconds to deploy, and you can start replicating data &lt;a href="https://www.youtube.com/watch?v=jWVYpUV9vEg"&gt;within 2 minutes&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The big difference between both options in terms of time and effort is that &lt;strong&gt;all the Fivetran customers we talked to also had to build and maintain connectors on the side&lt;/strong&gt;, as the connectors they needed were either not supported in the way they needed or not supported at all by Fivetran.&lt;/p&gt;

&lt;p&gt;That’s where the option to build with Airbyte comes in. For connectors not supported by Airbyte, building one is a matter of hours. Indeed, Airbyte already takes care of the UI, monitoring, scheduling, orchestration, integration with your data stack, automatic schema changes, etc. There is a very high chance we support your destination. So in the end, it’s only the EL part of the source connector you have to build, and Airbyte provides some abstractions to make that easier. &lt;/p&gt;

&lt;p&gt;Regarding maintenance, the goal of Airbyte is to crowdsource throughout the community. When a connector fails because of significant API changes, it will notify the connectors’ users. As soon as the fix is made available by the Airbyte team or a community member, Airbyte will propagate the fix to all the users. The hope is that this approach will provide a better SLA than closed-source solutions such as Fivetran, not to mention the fact that you won’t have to maintain the connector yourself. &lt;/p&gt;

&lt;h2&gt;
  
  
  People &amp;amp; Money
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2wdfdt6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zfor10c35hjlp3zkwen2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2wdfdt6t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zfor10c35hjlp3zkwen2.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From what we’ve seen, a typical company requires the equivalent of at least two or three full-time data engineers to build and maintain a data pipeline. The total cost of three full-time engineers can reach the high six figures (including benefits). So that’s a lot!&lt;/p&gt;

&lt;p&gt;Fivetran’s fees for a typical mid-sized company with five connectors are about $50,000. But you’ll have to add to that the cost of all the connectors you need to build and maintain by yourself. &lt;/p&gt;

&lt;p&gt;In contrast, Airbyte’s connectors are open-sourced, so you can use them for free. You also don’t need to pay for the egress to Fivetran’s infrastructure. It is possible that you might need a little bit of engineering time to operate Airbyte. If you need to build some of the connectors yourself, you will have to pay for the time spent by the data engineering team on building and maintaining them, but that would still be way less than if you had to do everything yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Opportunity Costs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FZSwJaD6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ik0meuskl6og7wjzxjbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FZSwJaD6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ik0meuskl6og7wjzxjbo.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The actual value brought by your data team comes from analysis and modeling. Data integration, cleaning and transformation are important because they enable that analysis and modeling. So the more time your team can spend on value-producing tasks, the better for the business. &lt;br&gt;
So opportunity costs as depicted in the illustration are very important to consider. Plus, ask any data team -- they will much prefer doing analysis or modeling tasks, rather than pipelining! So you will have better talent retention this way.&lt;/p&gt;




&lt;p&gt;Now you can see how open-source can flip the previous build vs. buy balance on its head. Before Airbyte, Fivetran was an easy sell. Now, the opposite seems true: leveraging Airbyte’s open-source technology to build your own data infrastructure seems the obvious choice. &lt;/p&gt;

&lt;p&gt;There is one last thing to consider when choosing which direction to take: the future. &lt;/p&gt;

&lt;h2&gt;
  
  
  Future Growth of Your Company
&lt;/h2&gt;

&lt;p&gt;As your company grows, you will add data sources to the pool. The complexity and effort of building and maintaining a data pipeline for a huge number of data sources can quickly escalate beyond your data engineering team’s ability to handle it. &lt;/p&gt;

&lt;p&gt;You might consider taking a chance on Fivetran’s ability to cover all or most of your connector needs, so that your team doesn’t need to build and maintain a continually increasing number of connectors (that would defeat the purpose). But be mindful that Fivetran will always weigh ROI when maintaining connectors in the long tail; they won’t maintain connectors that don’t bring enough revenue to offset the maintenance costs. &lt;/p&gt;

&lt;p&gt;On the other hand, Airbyte will continue to grow the number of prebuilt community-maintained connectors, and can even take a large portion of the maintenance costs off your hands. &lt;br&gt;
When making a decision, consider how your company will evolve. And you can be sure that a great data infrastructure that grows with you will be a competitive advantage.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>How We Leveraged Singer for Our MVP</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Mon, 30 Nov 2020 20:28:00 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-we-leveraged-singer-for-our-mvp-4jbj</link>
      <guid>https://dev.to/airbytehq/how-we-leveraged-singer-for-our-mvp-4jbj</guid>
      <description>&lt;p&gt;One of the (many) hard things about doing a startup is figuring out what that MVP should be. You are trading off between presenting something that is “good” enough that it gets people excited to use (or invest in) you and getting something done fast. In this article, we explore how we wrestled with this trade-off. Specifically, we explore our decisions around how to use Singer to bootstrap our MVP. It is something we get tons of questions about, and it was hard for us to figure out ourselves!&lt;/p&gt;

&lt;p&gt;When we set out to create an MVP for our data integration project, we began with this prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create an OSS data integration project that includes all of Singer’s major features. In addition, it should have a UI that can be used by non-technical users and has production-grade job scheduling and tracking. &lt;/li&gt;
&lt;li&gt;Do it in a month. &lt;/li&gt;
&lt;li&gt;Use Singer to bootstrap it. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We knew from the start that in the long run, we did not want Singer to be core to the working of our platform. In the short term, however, we wanted to be able to bootstrap our integration ecosystem off of Singer’s existing taps and targets. So should we make Singer part of our core platform in the beginning to bootstrap? And if so, at what cost?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d5FT7zrG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/faal0jbi5wceqxnjphqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d5FT7zrG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/faal0jbi5wceqxnjphqp.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This picture shows the spectrum of options we considered, from wrapping a UI around Singer and relying entirely on it as our backend to shooting for our original goal of Singer as a peripheral.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. Thin UI wrapper around Singer
&lt;/h1&gt;

&lt;p&gt;This felt like the “startup-y” option. We could throw Singer, a database, and a UI in a Docker container and have “something” up and running in, perhaps, days. We never tried to go with this approach because we were able to see some really big trade-offs.&lt;/p&gt;

&lt;p&gt;Pros &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only a few days of work needed.&lt;/li&gt;
&lt;li&gt;No new code for each integration, just use Singer’s.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pretty much all throw-away code after the initial release.&lt;/li&gt;
&lt;li&gt;Because Singer taps / targets don’t declare their configurations (more on this later), there would be no way in the UI to tell the user what values they needed to provide in order to configure a source. We would only be able to accept a big json blob.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KKC9h1qR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/pw534dcxj15g70vos207.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KKC9h1qR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/pw534dcxj15g70vos207.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While we were going for an MVP, we did not think we would be able to get anyone interested in the first iteration. We also knew that subsequent iterations would be painful, since we would be effectively starting from scratch because the initial iteration was not a sturdy building block. We skipped this approach.&lt;/p&gt;

&lt;h1&gt;
  
  
  2. Airbyte integration configurations
&lt;/h1&gt;

&lt;p&gt;Given that we wanted to provide a UI experience that was accessible to non-data engineers, our next step was to figure out how we could make it easy to configure integrations in the UI. This meant we had to build our own configuration abstraction for integrations, because this is something that Singer does not provide (we go into more depth on this feature in the &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/why-you-should-not-build-your-data-pipeline-on-top-of-singer/"&gt;first article&lt;/a&gt; in this series). &lt;/p&gt;

&lt;p&gt;This abstraction was basically a way for each integration to declare what information it needed in order to be configured. For example, a Postgres source might need a hostname, port, etc. This layer made it possible for the UI to display user-friendly forms for setting up integrations. With this approach, we could still rely on Singer as the “backend” for the platform, but we could provide a better configuration experience for the user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--87zlSTzv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/x07j6o1cdjy4mjuvq8fd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--87zlSTzv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/x07j6o1cdjy4mjuvq8fd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to implement this layer, we created a standardized way to declare information about an integration and how to configure it in a JsonSchema object. When someone selects an integration in the UI, it will render a form based on that JsonSchema. The user would then provide the needed information and pass it directly to the backend.&lt;/p&gt;
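&lt;p&gt;To make this concrete, here is a minimal sketch of what such a declaration could look like for a Postgres source, together with a helper that turns it into the list of fields a form would render. The field names and the helper are hypothetical illustrations, not Airbyte’s actual specification or UI code.&lt;/p&gt;

```python
# Hypothetical JsonSchema declaring what a Postgres source needs to be configured.
POSTGRES_SOURCE_SPEC = {
    "type": "object",
    "required": ["host", "port", "database"],
    "properties": {
        "host": {"type": "string", "description": "Hostname of the database"},
        "port": {"type": "integer", "description": "Port of the database"},
        "database": {"type": "string", "description": "Database to replicate"},
        "password": {"type": "string", "description": "Password, if any"},
    },
}


def form_fields(spec):
    """Return one form-field description per property declared in the spec."""
    required = set(spec.get("required", []))
    return [
        {"name": name, "type": prop["type"], "required": name in required}
        for name, prop in spec["properties"].items()
    ]


for field in form_fields(POSTGRES_SOURCE_SPEC):
    print(field)
```

&lt;p&gt;Because the declaration is plain data, adding a new integration to the UI in this sketch only means writing another schema, with no new form code.&lt;/p&gt;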

&lt;p&gt;This is ultimately where we started out. And everything was good for about a week…&lt;/p&gt;

&lt;h1&gt;
  
  
  3. Dockerize Singer integrations
&lt;/h1&gt;

&lt;p&gt;Up until this point, the only thing we had to do per integration was write a JsonSchema object that declared the configuration inputs for an integration. But what if we want the form in the UI to display different fields than those that Singer taps / targets consume?&lt;/p&gt;

&lt;p&gt;The first case we ran into was in the Postgres Singer tap. That tap takes a field called “filter_dbs.” This attribute restricts which databases the tap scans when being run in “discover” mode. The tap also takes a field called “database,” which is the name of the database from which data will be replicated. In our use case, we wanted “filter_dbs” to be populated with only a single entry, the value that the user had provided for “database.”&lt;/p&gt;

&lt;p&gt;In order to hide filter_dbs from the UI, but still populate it behind the scenes, we were going to need to write some special code that executed only when the Postgres Tap ran. But where was that code going to run? The abstraction we had was that our core platform just assumed that all integration-specific code was bundled in the Singer Tap. So we were either going to need to insert this integration-specific code into our core platform or restructure our abstraction so that we could run custom integration code that was not packaged as part of Singer.&lt;/p&gt;
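&lt;p&gt;The kind of integration-specific code we needed can be sketched as a small transformation between the configuration the user fills in and the configuration handed to the tap. This is an illustrative sketch only: the function name is hypothetical, and the exact value format the tap expects for “filter_dbs” is an assumption.&lt;/p&gt;

```python
def to_tap_config(user_config):
    """Derive the tap's config from the user's form input (hypothetical helper)."""
    # Copy so the values the user entered are left untouched.
    tap_config = dict(user_config)
    # Hidden from the UI: restrict discovery to the one database the user chose.
    # Assumption: the tap accepts the database name as the filter value.
    tap_config["filter_dbs"] = user_config["database"]
    return tap_config


user_config = {"host": "localhost", "port": 5432, "database": "analytics"}
print(to_tap_config(user_config))
```

&lt;p&gt;The code itself is trivial. The hard question was where code like this should live.&lt;/p&gt;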

&lt;p&gt;Again, we already had a rough idea of what we wanted this to look like in the long term. We imagined each integration running entirely in its own Docker container. Airbyte would handle passing messages from the container running the source to the container running the destination. We had hoped we could get to MVP without it, but ultimately, when we hit this issue, it tipped us over the edge. So we traded some time to figure out how to package Singer taps and targets into Docker containers that made it easy for us to mediate all of the interactions between the core platform and the integration running in the container.&lt;/p&gt;

&lt;h1&gt;
  
  
  4. Use the Airbyte protocol instead of the Singer protocol
&lt;/h1&gt;

&lt;p&gt;Now fast forward another couple weeks: we are on the night before we plan to do our first public launch, and nothing is working. We have 3 sources and 3 destinations, and not one of them can work with all of the others. &lt;/p&gt;

&lt;p&gt;The issue was two-fold: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We ran into &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/why-you-should-not-build-your-data-pipeline-on-top-of-singer/"&gt;inconsistencies in the Singer protocol&lt;/a&gt; that made it hard to treat all Singer Taps and Targets the same way programmatically.&lt;/li&gt;
&lt;li&gt;In falling back on Singer to handle our “backend,” there were implementation details in the way Singer worked that were incompatible with the product we wanted to build.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We won’t spend a ton of time discussing these issues, because we’ve already written about them &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/why-you-should-not-build-your-data-pipeline-on-top-of-singer/"&gt;here&lt;/a&gt;. So let’s just say we hit a point where we realized that we either needed to become the world’s foremost experts on the Singer protocol or focus on defining our own protocol. Since the latter already aligned with our long-term vision, we went in that direction. &lt;/p&gt;

&lt;p&gt;Ultimately, we tore out our hair and got through that night, and then for our next release we introduced our own &lt;a href="https://docs.airbyte.io/architecture/airbyte-specification"&gt;protocol&lt;/a&gt;. Even at our early stage, this was an expensive endeavour. It took one-ish engineers over a week to migrate us from the Singer protocol to our own (this felt like eons to us!).&lt;/p&gt;

&lt;h1&gt;
  
  
  Did we do it right?
&lt;/h1&gt;

&lt;p&gt;Obviously, this question is impossible to answer. After reading this article, you might have come to the conclusion that we should have built the first version of our product with Singer at the periphery of our system. And had we done that, we could have skipped the iteration of moving Singer from within our core system to the outskirts. I wouldn’t begrudge you that conclusion!&lt;/p&gt;

&lt;p&gt;Had we taken that approach, however, we would have delayed our initial release by an additional month (double the time to MVP!). Getting something out early was valuable, because it gave us early feedback that what we were building was interesting to people. We made trade-offs to move fast while still working from a base that we could iterate on quickly--pretty much the classic trade-off you think about when trying to launch an MVP. And, ultimately, we can’t draw any hard and fast rules other than to use your own judgment!&lt;/p&gt;

&lt;p&gt;The unexpected insight that we came away with, however, was that this approach allowed us to learn a lot from Singer. Even with Singer being part of the core system for just a few weeks, we got a really good understanding of why they had solved certain issues the way they did.&lt;/p&gt;

&lt;p&gt;For example, when we first encountered the Singer Catalog, the use of a breadcrumb system to map metadata onto a schema felt unintuitive and needlessly complicated. The metadata and the schema were in the same parent object, so why did we need this complex system of having the metadata fields index into the schema? Couldn’t they be combined? After using it closely for a few weeks, we understood the complexities that come with configuring special behavior at a field level for deeply nested schemas. Had we gone our own way from the start, we would have learned this lesson much later (and the later we learned it, the harder it would have been to remedy). &lt;/p&gt;
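
&lt;p&gt;A small sketch may help show what the breadcrumb system looks like in practice. The catalog entry below is invented for illustration, and the field names follow our reading of the Singer catalog format, so treat them as assumptions rather than a reference.&lt;/p&gt;

```python
# Invented sample stream: a schema plus a flat list of metadata entries.
# Each entry uses a breadcrumb (a path of keys) to point at a node in the schema.
stream = {
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "address": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
    "metadata": [
        {"breadcrumb": [], "metadata": {"selected": True}},
        {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
        {
            "breadcrumb": ["properties", "address", "properties", "city"],
            "metadata": {"selected": False},
        },
    ],
}


def resolve_breadcrumb(schema, breadcrumb):
    """Walk the schema along the breadcrumb and return the node it points to."""
    node = schema
    for key in breadcrumb:
        node = node[key]
    return node


def annotate(stream):
    """Pair every metadata entry with the type of the schema node it targets."""
    return [
        (entry["breadcrumb"], resolve_breadcrumb(stream["schema"], entry["breadcrumb"])["type"], entry["metadata"])
        for entry in stream["metadata"]
    ]


for breadcrumb, node_type, metadata in annotate(stream):
    print(breadcrumb, node_type, metadata)
```

&lt;p&gt;Because the metadata is a flat list keyed by path, the same structure can target the whole stream, a top-level field, or a field nested several levels deep, without changing the schema itself.&lt;/p&gt;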

&lt;p&gt;Building on top of Singer in the beginning forced us into a &lt;a href="https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence"&gt;Chesterton’s Fence&lt;/a&gt; situation. Each time we wanted to do something a certain way, because we thought Singer’s approach didn’t make sense, we were forced to fully understand why Singer had done things the way it did. By doing so, we avoided mistakes we would otherwise have made. We also were able to make decisions different from Singer’s while still benefiting from its experience. All in all, we feel we made the right choice. What do you think?&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why You Should NOT Build Your Data Pipeline on Top of Singer</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Mon, 30 Nov 2020 20:26:59 +0000</pubDate>
      <link>https://dev.to/airbytehq/why-you-should-not-build-your-data-pipeline-on-top-of-singer-3ei9</link>
      <guid>https://dev.to/airbytehq/why-you-should-not-build-your-data-pipeline-on-top-of-singer-3ei9</guid>
      <description>&lt;p&gt;&lt;a href="//singer.io"&gt;Singer.io&lt;/a&gt; is an open-source CLI tool that makes it easy to pipe data from one tool to another. At &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt;, we spent time determining if we could leverage Singer to programmatically send data from any of their supported data sources (taps) to any of their supported data destinations (targets).&lt;/p&gt;

&lt;p&gt;For the sake of this article, let’s say we are trying to build a tool that can do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run any Singer tap or target &lt;/li&gt;
&lt;li&gt;Provide a UI for configuring and running those taps and targets&lt;/li&gt;
&lt;li&gt;Count the number of records synced in each run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the context of these goals, being able to use Singer programmatically means writing a program that can, for any integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provide a UI with instructions on what information a user needs to input in order to configure that integration (e.g., host, password, etc).&lt;/li&gt;
&lt;li&gt;take those user-provided values and execute each integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We know that the described requirements are not the use case that Singer sets out to solve, but nonetheless, we wanted to see if we could leverage Singer to bootstrap building out this case. Sure enough, we ran into some “gotchas” along the way. These gotchas illustrate some of the core primitives that a programmatic data integration tool requires.&lt;/p&gt;

&lt;h1&gt;
  
  
  Integrations do not declare their configurations
&lt;/h1&gt;

&lt;p&gt;The Singer protocol does not &lt;a href="https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md#config"&gt;specify how an integration should define&lt;/a&gt; what inputs it requires. This means that, in order to use most Singer taps, you need to scour the entire implementation to figure out what properties it uses; depending on the complexity of the integration, this can be pretty painful.&lt;/p&gt;

&lt;p&gt;Some integrations help out by specifying what the configuration should look like in a &lt;a href="https://github.com/singer-io/tap-stripe"&gt;readme&lt;/a&gt; or in a &lt;a href="https://github.com/singer-io/tap-hubspot/blob/master/config.sample.json"&gt;sample config&lt;/a&gt;. Even these lead to headaches. They often just list the fields that need to be passed in but do not explain what they mean, what their format is, or how to find them (good luck trying to find all the information you need to configure your Google Ads integration!). In other cases, they only list a subset, and then you have to discover the rest by reading the integration (e.g., &lt;a href="https://github.com/singer-io/tap-salesforce"&gt;tap-salesforce&lt;/a&gt; doesn’t mention is_sandbox in the docs. UPDATE: someone has since added this field to the readme in this &lt;a href="https://github.com/singer-io/tap-salesforce/commit/d21bbea93471c485c4adddfdfb9ffb3e157cc45e"&gt;PR&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;These taps are great; we have happily used all of them, but because they do not specify what is required to configure them, they can’t be used programmatically. Specifically, our program needs to know that the Postgres tap requires the fields hostname and port. Without this specification, the program cannot figure out how to build a valid configuration for an integration. This gap is expensive to shim, because it requires engineering work for every single integration!&lt;/p&gt;
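
&lt;p&gt;As a rough illustration of what “declaring a configuration” could mean, here is a sketch of a machine-readable declaration and two helpers that consume it. The declaration format, the field descriptions and the helper names are all hypothetical; nothing like this ships with Singer taps.&lt;/p&gt;

```python
# Hypothetical declaration that an integration could publish alongside its code.
POSTGRES_SPEC = {
    "required": ["hostname", "port"],
    "properties": {
        "hostname": {"type": "string", "description": "Host the database runs on."},
        "port": {"type": "integer", "description": "Port the database listens on."},
        "password": {"type": "string", "description": "Password for the database user."},
    },
}


def form_fields(spec):
    """Derive the fields a UI should render, with help text and a required flag."""
    return [
        {"name": name, "required": name in spec["required"], "help": prop["description"]}
        for name, prop in spec["properties"].items()
    ]


def missing_fields(spec, config):
    """List the required fields that a user-provided config does not contain."""
    return [name for name in spec["required"] if name not in config]


print(form_fields(POSTGRES_SPEC))
print(missing_fields(POSTGRES_SPEC, {"hostname": "db.internal"}))
```

&lt;p&gt;With a declaration like this, the same generic code can build a form and validate input for any integration, which is exactly what is not possible when the required inputs live only in a readme or in the implementation.&lt;/p&gt;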

&lt;h1&gt;
  
  
  No way to tell which Singer feature is compatible with which integration
&lt;/h1&gt;

&lt;p&gt;Singer has excellent &lt;a href="https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md#singer-specification"&gt;documentation&lt;/a&gt; around its core protocol. It also does a nice job defining the suite of special metadata that it supports. When you start actually using Singer, however, mapping these primitives onto your integrations is difficult. For example, “replication-method” sets whether all the data from the source should be replicated (“full_table”) or just the new or updated data (“incremental”). What is unclear is which taps actually support “incremental” or “full_table” or both. &lt;/p&gt;

&lt;p&gt;Taps do not advertise, in a way that is programmatically consumable, which of these replication methods they support. Some of them mention it in their documentation, but ultimately that’s insufficient for the type of tool we want to build. So what happens when you request “incremental” from a source that only supports “full_table”? The behavior is undefined. Some taps will throw an error, some will just do a full refresh. Either way, from the point of view of the UI-based tool that we are trying to build, this isn’t really usable.&lt;/p&gt;

&lt;p&gt;The problem only gets hairier for some of the more niche metadata as well (e.g., “view-key-properties”). You either need to read the source or just try it out and see if the configuration works. This problem is adjacent to the configuration problem described in the previous section, and, similarly, requires a shim for every integration.&lt;/p&gt;
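
&lt;p&gt;To show what such a shim amounts to, here is a sketch of a hand-maintained capability table and a check that turns undefined behavior into an explicit error. The tap names are made up, and the method names use the spellings from this article; the point is that someone has to fill in this table by reading each tap.&lt;/p&gt;

```python
# Hypothetical, hand-maintained table: one entry per tap, filled in by reading its source.
SUPPORTED_REPLICATION = {
    "tap-example-a": {"full_table"},
    "tap-example-b": {"full_table", "incremental"},
}


class UnsupportedReplicationMethod(ValueError):
    """Raised when a tap is asked for a replication method it does not support."""


def check_replication_method(tap, requested):
    """Fail before running the tap instead of relying on its undefined behavior."""
    supported = SUPPORTED_REPLICATION[tap]
    if requested not in supported:
        raise UnsupportedReplicationMethod(
            f"{tap} does not support {requested}; supported: {sorted(supported)}"
        )
    return requested


print(check_replication_method("tap-example-b", "incremental"))
```

&lt;p&gt;If taps declared this information themselves, the table would not need to exist, and a UI could simply hide the options a given tap cannot honor.&lt;/p&gt;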

&lt;h1&gt;
  
  
  Singer’s own secret menu
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PjdV1oVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/l1hnjztiiwz9qadrsfus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PjdV1oVi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/l1hnjztiiwz9qadrsfus.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re from the West coast, you might be familiar with how In-N-Out Burger &lt;a href="https://www.eater.com/2015/4/13/8382523/secret-menus-in-n-out-fast-food-burger-animal-style"&gt;popularized the “secret” menu in fast food chains&lt;/a&gt;. While charming at a drive thru, secret menus can ruin your data integration.&lt;/p&gt;

&lt;p&gt;The Singer protocol has some of its own secret menu items. For example, we were parsing each message that a tap output into JSON using the declared schema in the Singer docs. We were trying to understand really well what messages were being sent between taps and targets, so we would fail loudly if anything was sent that did not match the documented message types. Then we started getting errors on “ActivateVersionMessage.” After spelunking in the source code for a bit, we found that this message type has existed in Singer as an experimental feature since 2017. A handful of the official Singer taps use it, but there’s no guidance on what you’re supposed to do with it (I suspect it is a feature used internally at Stitch--the paid, managed solution from the creators of Singer). If you’re building something programmatic on top of Singer, your choice is to just filter it out or let it pass and hope that stuff…just works, I guess? &lt;/p&gt;

&lt;p&gt;Handling this one case is not the end of the world, but it leaves you feeling uncertain what else is lurking in the protocol that might not play well with your system.&lt;/p&gt;
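
&lt;p&gt;The two options described above, failing loudly or filtering the message out, can be sketched as follows. This is a simplified illustration rather than our actual parser: the set of documented types reflects our reading of the Singer spec, and the “ACTIVATE_VERSION” type string and sample lines are assumptions made for the example.&lt;/p&gt;

```python
import json

# Message types described in the Singer spec, as we understand it.
DOCUMENTED_TYPES = {"RECORD", "SCHEMA", "STATE"}

# Invented tap output, one JSON message per line.
SAMPLE_OUTPUT = [
    '{"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": []}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "ACTIVATE_VERSION", "stream": "users", "version": 1}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}',
]


def count_records(lines, strict=True):
    """Count RECORD messages; undocumented types raise (strict) or are skipped."""
    count = 0
    skipped = []
    for line in lines:
        if not line.strip():
            continue
        message_type = json.loads(line).get("type")
        if message_type == "RECORD":
            count += 1
        elif message_type not in DOCUMENTED_TYPES:
            if strict:
                raise ValueError(f"undocumented message type: {message_type}")
            skipped.append(message_type)
    return count, skipped


print(count_records(SAMPLE_OUTPUT, strict=False))
```

&lt;p&gt;Neither mode is satisfying: strict mode breaks on taps that otherwise work, and lenient mode silently drops a message whose meaning we do not know.&lt;/p&gt;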

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;So to answer our original question, can we reasonably stretch Singer to meet our product requirements? The answer is no. Doing so would require writing custom shims for every single Singer tap and target. Since the goal with data integrations is always to scale to more integrations, having to do any per-integration work is very expensive.&lt;/p&gt;

&lt;p&gt;The Singer protocol is underspecified for this use case. This realization makes sense, because ultimately this is not the use case that the protocol is trying to solve. Achieving these requirements depends on integrations declaring much more information about how they are configured and which features they support. We are tackling this problem at Airbyte, so if you are looking for an OSS solution that makes it easy to move your data into a warehouse, instead of trying to roll your own on top of Singer, come check us out!&lt;/p&gt;

&lt;p&gt;This article is meant to be the first in a pair of articles. The &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/how-we-leveraged-singer-for-our-mvp/"&gt;second&lt;/a&gt; will explore the engineering journey that we took to figure out where Singer should fit into our system.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Build Thousands of Connectors</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Wed, 04 Nov 2020 04:59:37 +0000</pubDate>
      <link>https://dev.to/airbytehq/how-to-build-thousands-of-connectors-fb6</link>
      <guid>https://dev.to/airbytehq/how-to-build-thousands-of-connectors-fb6</guid>
      <description>&lt;p&gt;We’re building an open-source data integration platform at &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt;. We launched our MVP about a month ago. We were thrilled by the amount of feedback and support we got from the community. We even got our first big pull request from a contributor this week (2,000+ lines of code). But during this full month, we didn’t release any new connectors. You might wonder why we didn’t build on that momentum. If people were excited with our MVP even though it had only 6 connectors, you might think we should have ramped up on the number of connectors as fast as possible. We didn’t do that for two very important and differentiating reasons. &lt;/p&gt;

&lt;p&gt;First, we were defining exactly what the best data protocol would be if we wanted to solve data integration once and for all, for every company. You can learn more about our specification &lt;a href="https://docs.airbyte.io/architecture/airbyte-specification"&gt;here&lt;/a&gt;. Even though it’s not final yet, it will give you a glimpse of our vision for the future.&lt;/p&gt;

&lt;p&gt;Second, and just as important, we were building a real manufacturing plant for data integration connectors. See, our team led data integration at LiveRamp, which has more than 1,000 data ingestion connectors and 1,000+ distribution connectors. So we have the experience of abstracting what can be abstracted and simplifying the manufacturing of new integrations (very often without code). We haven’t fully built our manufacturing plant, but engineers can already add one new connector every day.&lt;/p&gt;

&lt;p&gt;This article describes how we built this connector manufacturing plant. &lt;/p&gt;

&lt;h1&gt;
  
  
  What you need to think about when building a large number of connectors
&lt;/h1&gt;

&lt;p&gt;When building a large catalog of connectors, there are several things that you need to think through. &lt;/p&gt;

&lt;h2&gt;
  
  
  Initial build
&lt;/h2&gt;

&lt;p&gt;This is when you start from a blank page. This step usually requires a little bit of planning since it involves communication with external teams/companies.&lt;br&gt;
The initial build step involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to the source/destination documentation&lt;/li&gt;
&lt;li&gt;Access to test accounts, test infrastructure, etc.&lt;/li&gt;
&lt;li&gt;Using a golden path that encodes good practices&lt;/li&gt;
&lt;li&gt;Using the best language for the task: today, we support both Java and Python, but anyone can add their own language&lt;/li&gt;
&lt;li&gt;Creating documentation&lt;/li&gt;
&lt;li&gt;Defining the necessary inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tests
&lt;/h2&gt;

&lt;p&gt;Tests are essential to make sure that any code or protocol change won’t affect the connectors. They need to run before every merge.&lt;/p&gt;

&lt;p&gt;They also ensure that the connector behaves as you expect. For that you need to run your connector against the actual production service. For example, if you’re working on the Salesforce connector, you must make sure that Salesforce actually behaves the way you expect. It is not unusual for the documentation of an API or service to diverge from how it actually behaves.&lt;/p&gt;

&lt;p&gt;We currently have the foundation of our test framework; it allows developers to focus solely on providing inputs and outputs, and the rest is taken care of by the framework.&lt;/p&gt;

&lt;p&gt;These tests give us 90% certainty that the connector is fully functional. If there are edge cases, it is always possible to add more custom tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Liveliness &amp;amp; Change detection
&lt;/h2&gt;

&lt;p&gt;It is essential to ensure that the source or destination continues to behave as it was encoded during the initial build phase and, for monitoring purposes, that it is still alive.&lt;/p&gt;

&lt;p&gt;These verifications must be run at a cadence, and any failure needs to be investigated and fixed, leading to the maintenance phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maintenance
&lt;/h2&gt;

&lt;p&gt;We need to define how we are going to update the connector, push changes and propagate the changes to all the running instances of Airbyte.&lt;/p&gt;

&lt;h1&gt;
  
  
  The art of building connectors is thinking in onion layers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Segmenting cattle code
&lt;/h2&gt;

&lt;p&gt;To make a parallel with the pet/cattle concept that is well known in DevOps/Infrastructure, a connector is cattle code, and you want to spend as little time on it as possible. Anything you can do to prevent yourself from doing work in the future, you need to do. This will accelerate your production tremendously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Abstractions as onion layers
&lt;/h2&gt;

&lt;p&gt;Maximizing high-leverage work leads you to build your architecture with an onion-esque structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9ZKLTobV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vipnb7qkii3lxn52cg8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9ZKLTobV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vipnb7qkii3lxn52cg8j.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The center defines the lowest level of the API. Implementing a connector at that level requires a lot of engineering time. But, it is your escape hatch for very complex connectors where you need a lot of control.&lt;/p&gt;

&lt;p&gt;Then, you build new layers of abstraction that help tackle families of connectors very quickly.&lt;/p&gt;

&lt;p&gt;Today, we’ve built one of these abstractions to support existing Singer integrations. Building an integration leveraging Singer takes us less than 3 hours, and our goal is to bring it down to less than 10 minutes.&lt;/p&gt;

&lt;p&gt;We have the same ambition for every other family of sources and destinations.&lt;/p&gt;

&lt;p&gt;As we continue to improve our manufacturing plant for connectors, we will build tools that will allow us to handle 95% of integrations with no or very little code.&lt;/p&gt;

&lt;p&gt;This is how we are going to address the long tail of integrations and how we’re going to make integrations a commodity.&lt;/p&gt;

&lt;h1&gt;
  
  
  What Airbyte has built up to now
&lt;/h1&gt;

&lt;p&gt;We’ve built the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The center of the onion&lt;/li&gt;
&lt;li&gt;The golden path in Java &amp;amp; Python to build new connectors&lt;/li&gt;
&lt;li&gt;The first version of the integration test framework&lt;/li&gt;
&lt;li&gt;Connectors: 10 sources with a rate of 1 new source per day, and 4 destinations&lt;/li&gt;
&lt;li&gt;A layer to quickly support Singer integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What our ambitions are with this connector manufacturing plant
&lt;/h2&gt;

&lt;p&gt;We want to reach a rate of 5 connectors per day and accelerate even beyond that. &lt;/p&gt;

&lt;p&gt;We also want to provide the community with more tools to build and contribute their own connectors. Ideally, 95% of connectors can be added to Airbyte with no code.&lt;/p&gt;

&lt;p&gt;-&lt;/p&gt;

&lt;p&gt;We hope this gives you a better understanding of what we’ve been up to and what our real ambitions are. If you see any ways to improve this architecture, we’re all ears. Don’t hesitate to join our &lt;a href="https://slack.airbyte.io"&gt;Slack&lt;/a&gt; to discuss any questions or suggestions with the team.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>database</category>
    </item>
    <item>
      <title>Why the Future of ETL Is Not ELT, But EL(T)</title>
      <dc:creator>John Lafleur</dc:creator>
      <pubDate>Wed, 04 Nov 2020 04:51:39 +0000</pubDate>
      <link>https://dev.to/airbytehq/why-the-future-of-etl-is-not-elt-but-el-t-5dbf</link>
      <guid>https://dev.to/airbytehq/why-the-future-of-etl-is-not-elt-but-el-t-5dbf</guid>
      <description>&lt;p&gt;How we store and manage data has completely changed over the last decade. We moved from an ETL world to an ELT world, with companies like Fivetran pushing the trend. However, we don’t think it is going to stop there; ELT is a transition in our mind towards EL(T) (with EL decoupled from T). And to understand this, we need to discern the underlying reasons for this trend, as they might show what’s in store for the future. &lt;/p&gt;

&lt;p&gt;This is what we will be doing in this article. I’m the co-founder of &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt;, the new upcoming open-source standard for data integrations.&lt;/p&gt;

&lt;h1&gt;
  
  
  What are the problems with ETL?
&lt;/h1&gt;

&lt;p&gt;Historically, the data pipeline process consisted of extracting, transforming, and loading data into a warehouse or a data lake. There are serious disadvantages to this sequence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inflexibility
&lt;/h2&gt;

&lt;p&gt;ETL is inherently rigid. It forces data analysts to know beforehand every way they are going to use the data, every report they are going to produce. Any change they make can be costly. It can potentially affect data consumers downstream of the initial extraction. &lt;/p&gt;

&lt;h2&gt;
  
  
  Lack of visibility
&lt;/h2&gt;

&lt;p&gt;Every transformation performed on the data obscures some of the underlying information. Analysts won’t see all the data in the warehouse, only what was kept during the transformation phase. This is risky, as conclusions might be drawn based on data that hasn’t been properly sliced.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lack of Autonomy for Analysts
&lt;/h2&gt;

&lt;p&gt;Last but not least, building an ETL-based data pipeline is often beyond the technical capabilities of analysts. It typically requires the close involvement of engineering talent, along with additional code to extract and transform each source of data. &lt;/p&gt;

&lt;p&gt;The alternative to a complex engineering project is to conduct analyses and build reports on an ad hoc, time-intensive, and ultimately unsustainable basis.&lt;/p&gt;

&lt;h1&gt;
  
  
  What changed and why ELT is way better
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cloud-based Computation and Storage of Data
&lt;/h2&gt;

&lt;p&gt;The ETL approach was once necessary because of the high costs of on-premises computation and storage. With the rapid growth of cloud-based data warehouses such as Snowflake, and the plummeting cost of cloud-based computation and storage, there is little reason to continue doing transformation before loading at the final destination. Indeed, flipping the two enables analysts to do a better job in an autonomous way. &lt;/p&gt;

&lt;h2&gt;
  
  
  ELT Supports Agile Decision-Making for Analysts
&lt;/h2&gt;

&lt;p&gt;When analysts can load data before transforming it, they don’t have to determine beforehand exactly what insights they want to generate or the exact schema they will need.&lt;/p&gt;

&lt;p&gt;Instead, the underlying source data is directly replicated to a data warehouse, comprising a “&lt;strong&gt;single source of truth&lt;/strong&gt;.” Analysts can then perform transformations on the data as needed. Analysts will always be able to go back to the original data and won’t suffer from transformations that might have &lt;strong&gt;compromised the integrity of the data&lt;/strong&gt;, giving them a free hand. This makes the business intelligence process incomparably more flexible and safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  ELT Promotes Data Literacy Across the Whole Company
&lt;/h2&gt;

&lt;p&gt;When used in combination with cloud-based business intelligence tools such as Looker, Mode, and Tableau, the ELT approach also broadens access to a common set of analytics across organizations. Business intelligence dashboards become accessible even to relatively non-technical users. &lt;/p&gt;

&lt;p&gt;-&lt;/p&gt;

&lt;p&gt;We’re big fans of ELT at Airbyte, too. But ELT is &lt;a href="https://airbyte.io/articles/data-engineering-thoughts/how-we-can-commoditize-data-integration-pipelines/"&gt;not completely solving the data integration problem&lt;/a&gt; and has problems of its own. We think EL needs to be completely decoupled from T.&lt;/p&gt;

&lt;h1&gt;
  
  
  What’s changing now and why EL(T) is the future
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Merging of Data Lakes and Warehouses
&lt;/h2&gt;

&lt;p&gt;There was a great analysis by Andreessen Horowitz about how data infrastructures are evolving. Here is the architecture diagram of the modern data infrastructure they came up with after a lot of interviews with industry leaders.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KRqrAT5h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p9958eqdyi3p7o7ghsp8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KRqrAT5h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p9958eqdyi3p7o7ghsp8.jpg" alt="Alt Text"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Data infrastructure serves two purposes at a high level: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helps business leaders make better decisions through the use of data - analytic use cases &lt;/li&gt;
&lt;li&gt;Builds data intelligence into customer-facing applications, including via machine learning - operational use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two parallel ecosystems have grown up around these broad use cases.&lt;/p&gt;

&lt;p&gt;The data warehouse forms the foundation of the analytics ecosystem. Most warehouses store data in a structured format. They are designed to generate insights from core business metrics, usually with SQL (although Python is growing in popularity). &lt;/p&gt;

&lt;p&gt;The data lake is the backbone of the operational ecosystem. By storing data in raw form, it delivers the flexibility, scale, and performance required for applications and more advanced data processing needs. Data lakes operate on a wide range of languages including Java/Scala, Python, R, and SQL.&lt;/p&gt;

&lt;p&gt;What’s really interesting is that modern data warehouses and data lakes are starting to resemble one another – both offering commodity storage, native horizontal scaling, semi-structured data types, ACID transactions, interactive SQL queries, and so on.&lt;/p&gt;

&lt;p&gt;So you might be wondering if data warehouses and data lakes are on a path toward convergence. Will they become interchangeable in a stack? Will data warehouses also be used for the operational use case?&lt;/p&gt;

&lt;h2&gt;
  
  
  EL(T) Supports Both Use Cases: Analytics and Operational ML
&lt;/h2&gt;

&lt;p&gt;EL, in contrast to ELT, completely decouples the Extract-Load part from any optional transformation that may occur. &lt;br&gt;
The operational use cases are all unique in the way incoming data is leveraged. Some might use a unique transformation process; some might not even use any transformation. &lt;/p&gt;

&lt;p&gt;With regard to the analytics case, analysts will need to get the incoming data normalized for their own needs at some point. But decoupling EL from T would let them choose whichever normalization tool they want. DBT has been gaining a lot of traction lately among data engineering and data science teams. It has become the open-source standard for transformation. Even Fivetran integrates with it to let teams use DBT if they’re used to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  EL Scales Faster and Leverages the Whole Ecosystem
&lt;/h2&gt;

&lt;p&gt;Transformation is where all the edge cases lie. For every specific need within any company, there is a schema normalization unique to it, for each and every one of the tools. &lt;/p&gt;

&lt;p&gt;Decoupling EL from T enables the industry to start covering the long tail of connectors. At Airbyte, we’re building a “connector manufacturing plant” so we can get to 1,000 pre-built connectors in a matter of months.&lt;/p&gt;

&lt;p&gt;Furthermore, as mentioned above, it would help teams leverage the whole ecosystem in an easier way. You start to see an open-source standard for every need. In a sense, the future data architecture might look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LN_54bxq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qtz73v0zzkbhsgxk7lbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LN_54bxq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/qtz73v0zzkbhsgxk7lbj.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the end, extract and load will be decoupled from transformation. Do you agree with us? If so, you might be interested to have a look at what &lt;a href="https://airbyte.io"&gt;Airbyte&lt;/a&gt; does.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>database</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
