Yulong Liu

[Tutorial] Setup Your in House AI Slack Bot for Data Analytics in Natural Language

Learn more about our open source project on Github

In today's data-driven world, organizations are constantly looking for ways to make their data analysis process more efficient and streamlined. The recent surge in Large Language Model technologies has transformed analytics by enabling natural language interactions, putting data analysis within everyone's reach. DSensei is an open-source Slack bot that brings this technology to people by letting them request data analytics in plain English, directly within the instant messaging tool they already use.

In this post, we'll walk through the steps to set up DSensei in Slack with BigQuery as its data source.

Prerequisites

  • A Slack account with admin access to the workspace where you want to install DSensei
  • An OpenAI API key (you can generate one from this link)
  • A Google Cloud account with the roles/iam.serviceAccountCreator IAM role, so you can create a service account for BigQuery (see this doc for details)
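
Before moving on, you can optionally sanity-check the OpenAI API key from a terminal. This is just a quick sketch against OpenAI's public REST API; the key value is a placeholder for your own:

# Optional check: a valid key returns a JSON list of models rather than a 401 error
export OPENAI_API_KEY="sk-your-key-here"   # placeholder; use your own key
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head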

Setup

Now let's start the setup process.

Step 1. Create and Set Up the Slack App

First, let's create a new Slack app and install it into the workspace:

  • Sign in to your Slack account in a browser and navigate to https://api.slack.com/apps
  • Click the "Create New App" button to create a new app, and select "From an app manifest".
  • Select the workspace you want to install the app into.
  • Inside the "Enter app manifest below" dialog, set the format to YAML and paste the following manifest:
display_information:
  name: sensei
features:
  app_home:
    home_tab_enabled: false
    messages_tab_enabled: true
    messages_tab_read_only_enabled: false
  bot_user:
    display_name: sensei
    always_online: true
  slash_commands:
    - command: /info
      description: Get information about DB
      usage_hint: /info [dbs] | [tables db] | [schema db.table]
      should_escape: false
oauth_config:
  scopes:
    bot:
      - app_mentions:read
      - chat:write
      - commands
      - im:history
      - files:write
      - files:read
settings:
  event_subscriptions:
    bot_events:
      - app_mention
      - message.im
  interactivity:
    is_enabled: true
  org_deploy_enabled: false
  socket_mode_enabled: true
  token_rotation_enabled: false
  • Click "Next", review the configuration, then click "Create" to create the app. Finally, install the app into your workspace (e.g. via "Install App" on the left sidebar) and authorize it.

Step 2. Set Up BigQuery Credentials

Next, let's create a Google Cloud service account for DSensei to access BigQuery:

  • Sign in to your Google Cloud console in a browser and navigate to https://console.cloud.google.com/iam-admin/serviceaccounts.
  • Select the project you want DSensei to have access to; you should land on the service account management page.
  • Click "+Create service account".
  • Enter "Dsensei" as the "Service account name" and "dsensei" as the "Service account ID"; optionally, add a description. Click "CREATE AND CONTINUE". screenshot
  • Grant the "BigQuery Data Viewer" and "BigQuery Job User" roles to the service account and click "Done". screenshot
  • Locate the newly created service account in the list, open the "Actions" menu, and click "Manage keys". screenshot
  • In the "Keys" tab, click "ADD KEY" -> "Create new key" to open the key creation dialog.
  • Inside the key creation dialog, select the "JSON" key type and click "CREATE".
  • A JSON key file will be downloaded; save it to a convenient path, as we will use it later.
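
If you prefer the command line, the same service account, role grants, and key can be created with the gcloud CLI. This is only a rough sketch of the console steps above; PROJECT_ID is a placeholder for the project you selected:

# Create the service account
gcloud iam service-accounts create dsensei --display-name="Dsensei" --project=PROJECT_ID

# Grant the BigQuery Data Viewer and BigQuery Job User roles
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:dsensei@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:dsensei@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download a JSON key file for the service account
gcloud iam service-accounts keys create dsensei.json \
  --iam-account="dsensei@PROJECT_ID.iam.gserviceaccount.com"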

Step 3. Set Up the DSensei Service

Now, with the Slack app and BigQuery access in place, let's set up the DSensei service:

  • Check out the DSensei repo: git clone https://github.com/logunify/dsensei.git
  • Switch to Node 18. You can do that with nvm use 18; details about nvm installation and usage can be found in this doc.
  • Inside the checked-out dsensei folder, run npm install to install all the dependencies.
  • Configure credentials. We provide a config template; use it by renaming .env.example to .env (a complete sample .env is shown after this list):

    • Configure Slack credentials:
    • Go to https://api.slack.com/apps and select the app you just created.
    • On the sidebar, select "OAuth & Permissions" and find the OAuth token under the "OAuth Tokens for Your Workspace" section. It should start with xoxb-. Copy the token and put it under SLACK_BOT_TOKEN in the .env file.

    screenshot

    • Then select "Basic Information" on the sidebar and find the "Signing Secret" under the "App Credentials" section. Copy the secret and put it under SLACK_SIGNING_SECRET in the .env file.

    screenshot

    • Finally, locate the "App-Level Tokens" section under the "Basic Information" tab and click the "Generate Token and Scopes" button to generate an app-level token. Add the connections:write scope in the dialog and click "Generate". The Slack app token should start with xapp-. Copy the token and put it under SLACK_APP_TOKEN in the .env file.

    screenshot

    • Configure the OpenAI API key:
    • Find your OpenAI API key on this page, copy it, and put it under OPENAI_API_KEY in the .env file.
    • Configure the BigQuery key:
    • In the .env file, set BQ_KEY to the path of the service account key file we generated above, e.g. /Users/foo/gcp/dsensei.json.
  • [Optional] Whitelist datasets and tables.

    • You might want to limit the datasets / tables this tool can access. You can do so by listing the datasets as a comma-separated string in the DATABASES field, and / or a comma-separated list of dataset.tablename entries in the TABLES field.
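
Putting it all together, the finished .env should look roughly like the following. All values are placeholders; the variable names come from the steps above, and DATABASES / TABLES are only needed if you use the optional whitelist:

# Slack credentials
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_SIGNING_SECRET=your-signing-secret
SLACK_APP_TOKEN=xapp-your-app-level-token

# OpenAI API key
OPENAI_API_KEY=sk-your-openai-key

# Path to the BigQuery service account key file
BQ_KEY=/Users/foo/gcp/dsensei.json

# [Optional] limit access to specific datasets / tables
DATABASES=ecommerce
TABLES=ecommerce.orders,ecommerce.order_items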

Start DSensei and Verify in Slack!

That's it! Now that we have everything set up, let's start DSensei and verify it in Slack.

  • [Optional] For demo purposes, we copy a public sample ecommerce dataset into our BigQuery project and name it ecommerce:
    • Go to this link and click "COPY".
    • Select "CREATE NEW DATASET".
    • Change the Project ID to the project in which we created the service account in Step 2.
    • Click "CREATE DATASET".
  • Run npm run prod to start the DSensei service. You should see DSensei initialize and load the schemas in the logs. In our example, it looks like this:
2023-04-26T23:10:00.526Z [SlackApp] info: 
    ____    _____                                _  
   / __ \  / ___/  ___    ____    _____  ___    (_)
  / / / /  \__ \  / _ \  / __ \  / ___/ / _ \  / /
 / /_/ /  ___/ / /  __/ / / / / (__  ) /  __/ / /
/_____/  /____/  \___/ /_/ /_/ /____/  \___/ /_/

2023-04-26T23:10:00.529Z [DataSourceLoader] info: Use data source from BigQuery
[INFO]  socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
2023-04-26T23:10:00.782Z [SlackApp] info: Sensei is up running, listening on port 3000
2023-04-26T23:10:00.986Z [BigQuery] info: Loaded databases: ecommerce
[INFO]  socket-mode:SocketModeClient:0 Now connected to Slack
2023-04-26T23:10:01.414Z [BigQuery] info: Loaded table: ecommerce.distribution_centers
2023-04-26T23:10:01.448Z [BigQuery] info: Loaded table: ecommerce.users
2023-04-26T23:10:01.452Z [BigQuery] info: Loaded table: ecommerce.events
2023-04-26T23:10:01.454Z [BigQuery] info: Loaded table: ecommerce.products
2023-04-26T23:10:01.456Z [BigQuery] info: Loaded table: ecommerce.inventory_items
2023-04-26T23:10:01.504Z [BigQuery] info: Loaded table: ecommerce.orders
2023-04-26T23:10:01.529Z [BigQuery] info: Loaded table: ecommerce.order_items
2023-04-26T23:10:01.530Z [BigQuery] info: All 1 databases are loaded.
2023-04-26T23:10:04.334Z [BigQuery] info: Enrichment finished.
  • Now for the fun part: let's play around with DSensei in Slack:

    • Create a new channel and add sensei to it (you can also locate "sensei" under the "Apps" section on the left and ping it directly): screenshot
    • We start with the slash commands to verify that the schemas have been loaded. Send /info dbs to list all the databases; here's the response:
    Your databases: 
    --------------------
    ecommerce
    
    • Send /info tables ecommerce to list all the tables under the ecommerce dataset; here's the response:
    Your tables in ecommerce: 
    --------------------
    distribution_centers
    events
    inventory_items
    order_items
    orders
    products
    users
    
    • Send /info schema ecommerce.products to get the detailed schema of the products table; here's the response:
    Table ecommerce.products schema:
    name                    type     description
    --------------------------------------------
                        id  INTEGER
                      cost    FLOAT
                  category   STRING
                      name   STRING
                     brand   STRING
              retail_price    FLOAT
                department   STRING
                       sku   STRING
    distribution_center_id  INTEGER

    • After verifying that the schemas are loaded, we can test some data questions. Say we want to know the number of orders this month; we can simply send "@sensei how many new orders do I have this month?" and DSensei will respond in a thread:

    screenshot

    • We can follow up with DSensei within the thread. For instance, given the number of orders this month, we may want to know how it compares with last month. DSensei remembers the context of the conversation within the thread, so we can ask the follow-up question directly by sending "@sensei how does it compare with last month?" in the thread, and we get:

    screenshot

    Pretty cool, huh?

    • It can also answer more advanced questions that require more complex SQL queries. For instance, say we want to know which brands ship their orders the fastest, limited to brands with more than 100 orders to reduce bias. That translates into the question: "@sensei for those brands having more than 100 orders, give me the top 5 brands that fastest at shipping their orders?".

    screenshot

  • Now that we have everything set up, enjoy your analytics with DSensei!

Closing Thoughts

In conclusion, leveraging the power of a chatbot to access and analyze data can greatly improve the efficiency and effectiveness of data analysis in organizations. DSensei is an open-source Slack bot that helps you achieve this by letting you access and analyze data through natural language commands. By following the steps outlined in this post, you can easily set up DSensei, connect it to your BigQuery data source, and start benefiting from its features.

We're passionately developing this project and would love for you to be part of our community on Discord, where you can get the latest news, report bugs, and make feature requests. Please also feel free to submit feedback on GitHub directly.

Try a live demo of DSensei in our Slack Channel

Learn more about our open source project on Github
