DEV Community

Payal Gupta for AWS Community Builders

How to use Glue crawler to add tables automatically

This guide covers how to use an AWS Glue crawler to scan data stored in S3, automatically add the resulting tables to a Glue database, and run queries on them from Dremio or Athena.

Setup Diagram

Steps to follow

  • Create an S3 bucket and upload the raw data, e.g., CSV or JSON files.

  • Go to the AWS Glue console and create a Glue database.

  • Go to the Tables page and select Add tables using crawler in the top right corner.
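
The first two steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the bucket already exists. The bucket name, file path, `raw/` prefix, and database name are hypothetical placeholders, not values from this post.

```python
from pathlib import Path


def build_s3_key(local_path: str, prefix: str = "raw/") -> str:
    """Return the object key for a local file under a folder-like prefix."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}{Path(local_path).name}"


def upload_and_create_database(s3, glue, bucket, local_path, database, prefix="raw/"):
    """Upload one raw file to S3 and create the Glue database for its table.

    `s3` and `glue` are expected to be boto3 clients, e.g.
    boto3.client("s3") and boto3.client("glue").
    """
    key = build_s3_key(local_path, prefix)
    # Upload the local CSV/JSON file to the (already existing) bucket.
    s3.upload_file(local_path, bucket, key)
    # Create an empty Glue database; the crawler adds tables to it later.
    glue.create_database(DatabaseInput={"Name": database})
    return f"s3://{bucket}/{key}"


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   upload_and_create_database(
#       boto3.client("s3"), boto3.client("glue"),
#       bucket="my-raw-data-bucket", local_path="data/orders.csv",
#       database="demo_db",
#   )
```

Passing the clients in as arguments keeps the helper easy to try against a stub before pointing it at a real account.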

Add Tables using Crawler

This should take you to the AWS Glue crawler setup page.

Follow the steps below to fill in the details:

  • Name - Enter the Crawler name
  • Add data source

    • Data source - Select S3
    • Location of S3 data - Select In this account if the bucket belongs to the same AWS account
    • S3 path - Browse for the S3 bucket that contains the data, and don’t forget to add a forward slash at the end of the path
    • Subsequent crawler runs - Select Crawl all sub-folders
  • Click Add an S3 data source

  • Click Next → Configure security settings

  • Click Create new IAM role and give the role a name. This creates the IAM role that the Glue crawler needs to read the data in the S3 bucket

  • Next, Set output and scheduling

    • Select the Target Database - you can choose default or create a new one
    • Crawler schedule - On Demand
  • Next → Review and Create → Create Crawler

  • The crawler has now been created and is ready to run
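
For reference, the console settings above map roughly onto the parameters of the boto3 `create_crawler` call. This is a sketch under stated assumptions: the crawler name, role name, database, and S3 path are hypothetical, the IAM role is assumed to exist already (the console creates it for you; in code you would need a role that Glue can assume and that can read the bucket), and `CRAWL_EVERYTHING` is used here as the API counterpart of Crawl all sub-folders.

```python
def build_crawler_request(name: str, role: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for glue.create_crawler from the console fields."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must look like s3://bucket/prefix/")
    # Mirror the console advice: make sure the path ends with a forward slash.
    if not s3_path.endswith("/"):
        s3_path += "/"
    return {
        "Name": name,                      # crawler name
        "Role": role,                      # existing IAM role name or ARN (assumption)
        "DatabaseName": database,          # target Glue database
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Re-crawl the whole path on every run.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
        # No "Schedule" key is set, so the crawler only runs on demand.
    }


def create_crawler(glue, name, role, database, s3_path):
    """Create the crawler using a boto3 Glue client passed in by the caller."""
    glue.create_crawler(**build_crawler_request(name, role, database, s3_path))


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   create_crawler(boto3.client("glue"), "demo-crawler",
#                  "AWSGlueServiceRole-demo", "demo_db",
#                  "s3://my-raw-data-bucket/raw")
```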

Run the crawler

The crawler takes a few minutes to scan the data in the S3 bucket. Once it is done, you should see its state as Ready.

You should now see a table added to the Glue database.
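
Running the crawler and checking for the new table can be sketched in code as well. The example below assumes a boto3 Glue client is passed in and that the crawler and database names (hypothetical here) match the ones you created. It polls until the crawler state returns to `READY`, which is the API value behind the Ready state shown in the console.

```python
import time


def run_crawler_and_wait(glue, name, poll_seconds=15.0, timeout_seconds=900.0):
    """Start a crawler and block until it reports READY again."""
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # States reported by the API include RUNNING, STOPPING, and READY.
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {name!r} did not finish in {timeout_seconds}s")


def list_table_names(glue, database):
    """Return table names from the first page of results for a Glue database."""
    response = glue.get_tables(DatabaseName=database)
    return [table["Name"] for table in response["TableList"]]


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   run_crawler_and_wait(glue, "demo-crawler")
#   print(list_table_names(glue, "demo_db"))
```

`get_tables` is paginated, so a database with many tables would need the `NextToken` handling that this sketch leaves out.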

  1. Go to Dremio → Add the Glue catalog as a source
  2. Name - Enter a name for the Glue catalog source
  3. Region - Select the AWS region of the Glue database
  4. Authentication - Select AWS access key

Click Save, then run queries on the Glue database from Dremio. Athena can query the same Glue database directly.
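
On the Athena side, a query against the crawled table could look like the sketch below. It assumes a boto3 Athena client is passed in. The SQL text, database name, and results location are hypothetical, and the output location must be an S3 path that your account can write to.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},        # Glue database
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a query, wait for it to finish, and return the first page of rows."""
    request = build_athena_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # Each cell is a dict; VarCharValue is absent for NULL values.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   run_query(boto3.client("athena"), "SELECT * FROM orders LIMIT 10",
#             "demo_db", "s3://my-athena-results-bucket/output/")
```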
