This blog is a part of the Airbyte + Motherduck Hackathon where Iโll demonstrate how to connect Google Sheets with Motherduck using Airbyte. This setup forms the backbone of my Dropbox Sentiment Analysis Dashboard, enabling seamless data integration and storage for analysis. This blog makes it easy to make your fist setup on Airbyte between your source and destination, it is advised to go through the official documentation after this. Letโs dive in! ๐คฟ๐
Overview of the Project ๐บ๏ธ
The goal is to analyze user reviews of the Dropbox app using sentiment analysis techniques. Here's a breakdown of the workflow:
- Dataset Source: A CSV dataset of Dropbox app user reviews, downloaded from Kaggle.
- Preprocessing: Uploaded the CSV to Google Sheets for basic formatting (e.g., converting ratings from text to integers).
- Airbyte Integration: Used Airbyte to connect Google Sheets (source) with Motherduck (destination).
- Destination Setup: Motherduck stores the data in DuckDB (similar to SQL databases).
- Analysis: Built a sentiment analysis dashboard using Python and Streamlit.
Let me walk you through the setup process for Airbyte and Motherduck. ๐ฎ
What is Airbyte? ๐ง
Airbyte is an open-source data integration platform that helps synchronize data between different sources and destinations. It provides a wide range of connectors and a user-friendly interface to automate data workflows.
What is Motherduck? ๐ค
Motherduck is a cloud-based platform built on DuckDB, a fast and lightweight SQL engine. It allows efficient data analysis and management, making it an excellent choice for scalable and real-time data handling.
Setting Up Airbyte ๐ชผ
Step 1: Go to Airbyte and log in.
Youโll land in the Airbyte workspace. Follow these steps:
Create a New Connection
- Click on New Connection and choose Google Sheet as your source.
- Share your dataset on Google Sheets and copy the link.
- Paste the shared link into the placeholder in Airbyte.
- Authenticate your Google account (ensure it's the same account linked to the Google Sheet).
Select Destination
- Under the Marketplace, search for and select Motherduck.
- Authenticate Motherduck as the destination (process is written below).
Configuring Motherduck ๐ฆ
Step 2: Head over to Motherduck and sign up.
- After signing up, delete the sample workspace (not needed for this setup).
- Navigate to Settings under your profile.
- In the General tab, generate a Motherduck token (API Key).
- Copy the token and paste it into Airbyte when prompted.
Schedule the Sync ๐๏ธ
- Configure the sync schedule to keep your Motherduck database updated with any changes in the Google Sheet.
- Click Next to finalize the connection.
Validating the Connection ๐
After completing the setup, check if the source data has successfully transferred to the destination:
- On the left panel of your Motherduck page, find Attached Databases.
- Under
my_db
, navigate tomain
, where youโll see your dataset (e.g.,dropbox_reviews
). - Start a new notebook and run queries to confirm the data transfer.
Example query:
from my_db.main.dropbox_reviews
select
score,
content,
reviewId,
_airbyte_raw_id,
_airbyte_extracted_at
limit 100
Whatโs Next? ๐
This blog covers the setup of Airbyte and Motherduck for seamless data integration. In my next post, Iโll dive into:
- Project Structure: A detailed walkthrough of the Dropbox Sentiment Analysis project.
- Coding Logic: Explanation of Python libraries used for sentiment analysis.
- Dashboard Deployment: How to deploy the application on Streamlit.
PROJECT ๐ : https://airbyte-motherduck-hack-dropbox-sentiment-analysis.streamlit.app
Stay tuned for an exciting journey into sentiment analysis of Dropbox User Reviews! ๐๐๐ชผ๐ฆ
Top comments (0)