DEV Community


Glue cross-account setup

This document covers the steps to query an AWS Glue Data Catalog database from Dremio in a cross-account setup using AWS Lake Formation.

Use case
Account A - Dremio is deployed here. The Glue database Glue_DB_A is created in this account and added as a source in Dremio.

Account B - The Glue database Glue_DB_B is created in this account, and the data resides in an S3 bucket here.

The customer wants to share the Glue_DB_B catalog with Glue_DB_A and query the data located in Account B from Dremio.

Setup Diagram


Role of each service in this setup:

  • Lake Formation - Builds the data mesh, simplifies cross-account data sharing, and creates resource links

  • Resource Access Manager - Shares resources and shows the shared Data Catalog

  • IAM User - Provides cross-account read/write access to the S3 bucket so that Dremio can run queries

  • Amazon Athena - Used only to test whether Lake Formation access is working

Steps

1. Resource sharing using Lake Formation and Resource Access Manager

First, use Lake Formation and Resource Access Manager to share the Glue catalog from Account B with Account A.

Steps for Account B:

  1. Create a Glue database named Glue_DB_B.

  2. Create a Glue table in this database, point it to the S3 location where the data resides, and provide the schema
    OR
    Use a Glue crawler to infer the schema from the data in S3 and create the Glue table for you.

  3. Go to the Lake Formation console -> Data lake locations -> Register the same S3 location -> use the default IAM role, AWSServiceRoleForLakeFormationDataAccess.

  4. Go to Lake Formation -> Databases -> select Glue_DB_B -> Actions -> Grant -> choose External accounts and enter the AWS Account A ID -> choose a specific table, then grant the following permissions:

    For the database: Alter, Create table, Describe
    For the table: Alter, Delete, Describe, Drop, Insert

  5. Go to the Resource Access Manager console -> Shared by me (left pane) -> Resource shares. You should see your shared resources listed there.
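If you prefer to script steps 3 and 4 instead of clicking through the console, the sketch below builds request payloads for the boto3 Lake Formation calls register_resource and grant_permissions. It is a minimal sketch under assumptions: the account IDs, bucket, prefix, database, and table names are hypothetical placeholders, and the permission lists simply mirror the ones given in step 4.

```python
"""Sketch: payloads for the Account B Lake Formation steps (hypothetical names)."""


def register_location_request(bucket: str, prefix: str = "") -> dict:
    """Step 3: kwargs for lakeformation.register_resource()."""
    # Trailing slashes are stripped so "sales/" and "sales" give the same ARN,
    # and an empty prefix registers the whole bucket.
    arn = f"arn:aws:s3:::{bucket}/{prefix}".rstrip("/")
    # UseServiceLinkedRole=True corresponds to the default
    # AWSServiceRoleForLakeFormationDataAccess role.
    return {"ResourceArn": arn, "UseServiceLinkedRole": True}


def grant_requests(account_a_id: str, account_b_id: str,
                   database: str, table: str) -> list[dict]:
    """Step 4: kwargs for lakeformation.grant_permissions(), one dict per grant."""
    # The external account (Account A) is the grantee.
    principal = {"DataLakePrincipalIdentifier": account_a_id}
    database_grant = {
        "CatalogId": account_b_id,
        "Principal": principal,
        "Resource": {"Database": {"CatalogId": account_b_id, "Name": database}},
        "Permissions": ["ALTER", "CREATE_TABLE", "DESCRIBE"],
    }
    table_grant = {
        "CatalogId": account_b_id,
        "Principal": principal,
        "Resource": {"Table": {"CatalogId": account_b_id,
                               "DatabaseName": database,
                               "Name": table}},
        "Permissions": ["ALTER", "DELETE", "DESCRIBE", "DROP", "INSERT"],
    }
    return [database_grant, table_grant]


if __name__ == "__main__":
    print(register_location_request("account-b-data", "sales/"))
    for request in grant_requests("111111111111", "222222222222",
                                  "glue_db_b", "sales"):
        print(request)
```

Each dict is meant to be unpacked into the matching client call made with Account B credentials, for example lf = boto3.client("lakeformation"), then lf.register_resource(**register_location_request(...)) and lf.grant_permissions(**req) for each grant. Keeping payload construction separate from the API calls makes it easy to review exactly what will be granted before anything touches AWS.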

Steps for Account A:

  1. Go to Resource Access Manager -> Shared with me -> Resource shares -> accept the resource share.

  2. Go to Lake Formation -> Tables. The shared table appears here. Select it -> Actions -> Create resource link.

  3. The resource link now appears in italics in the Glue database.
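The Account A steps can be scripted in a similar way. This hedged sketch builds kwargs for the boto3 RAM call accept_resource_share_invitation and for the Glue call create_table, which creates a resource link when its TableInput carries a TargetTable that points at the shared table. The invitation ARN, link name, and all database and table names are hypothetical.

```python
"""Sketch: payloads for the Account A steps (hypothetical names and ARN)."""


def accept_share_request(invitation_arn: str) -> dict:
    """Step 1: kwargs for ram.accept_resource_share_invitation()."""
    # RAM uses camelCase parameter names.
    return {"resourceShareInvitationArn": invitation_arn}


def resource_link_request(local_db: str, link_name: str, account_b_id: str,
                          shared_db: str, shared_table: str) -> dict:
    """Step 2: kwargs for glue.create_table() that create a resource link."""
    # A resource link is a table entry whose TargetTable points at the
    # shared table in Account B's catalog instead of at S3 data.
    return {
        "DatabaseName": local_db,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": account_b_id,
                "DatabaseName": shared_db,
                "Name": shared_table,
            },
        },
    }


if __name__ == "__main__":
    print(accept_share_request(
        "arn:aws:ram:us-east-1:222222222222:resource-share-invitation/example"))
    print(resource_link_request("glue_db_a", "sales_link", "222222222222",
                                "glue_db_b", "sales"))
```

Run these with Account A credentials: boto3.client("ram").accept_resource_share_invitation(**accept_share_request(arn)), then boto3.client("glue").create_table(**resource_link_request(...)). The invitation ARN is shown in the RAM console under Shared with me.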

2. Provide cross-account read/write access to the S3 bucket

Steps to do so:

  • Go to Account B -> S3 console
  • Select your S3 bucket
  • Go to the Permissions tab
  • Edit the bucket policy and add the following policy (replace the placeholders with the Account A ID, the IAM user name, and the bucket name):
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetLifecycleConfiguration",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>"
        }
    ]
}
```
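To avoid hand-editing the placeholders, the bucket policy can also be generated. This sketch rebuilds the same two statements with the Account A ID, IAM user name, and bucket name filled in; the example values are hypothetical.

```python
import json


def cross_account_bucket_policy(account_a_id: str, username: str, bucket: str) -> dict:
    """Return the two-statement bucket policy with placeholders filled in."""
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:user/{username}"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Object-level actions apply to the keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetLifecycleConfiguration", "s3:ListBucket"],
                # Bucket-level actions apply to the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    policy = cross_account_bucket_policy("111111111111", "dremio-user", "account-b-data")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket policy editor, or passed as the Policy string to boto3's s3.put_bucket_policy(Bucket=..., Policy=...) using Account B credentials. Note the split between the statements: object actions (s3:GetObject, s3:PutObject) target bucket/*, while bucket actions (s3:ListBucket, s3:GetLifecycleConfiguration) target the bucket ARN.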
3. Add the Glue catalog as a source in Dremio

The last step is to add Glue_DB_A as a source in Dremio:

  • Go to Add Source
  • Select AWS Glue Data Catalog
  • Fill in the details - Name, Region, Authentication
  • Hit Save

You should now be able to view the datasets from both Glue catalogs and run queries on them.

Alternatively, you can run queries on the Glue source with Athena instead of Dremio.
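For the Athena check, a query against the resource link can also be started programmatically. The sketch below builds kwargs for boto3's Athena start_query_execution; the database, table, and S3 output location are hypothetical, and the output bucket is assumed to be one that Athena in Account A can write to.

```python
"""Sketch: kwargs for athena.start_query_execution() (hypothetical names)."""


def athena_query_request(database: str, table: str, output_location: str,
                         limit: int = 10) -> dict:
    # Athena quotes identifiers with double quotes; the table here would be
    # the resource link created in Account A.
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {limit}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    print(athena_query_request("glue_db_a", "sales_link",
                               "s3://athena-results-account-a/"))
```

Pass the dict to boto3.client("athena").start_query_execution(**req), then poll get_query_execution with the returned QueryExecutionId until the query finishes. If the query returns rows from the Account B data, the cross-account sharing is working.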
