DEV Community

Jacob for AWS Community Builders

Posted on

Data Engineering and ChatGPT

ChatGPT is out and everybody is trying it. Blogs, songs are easy, but can it help with Data Engineering?

question 1

i have a table named 'people' with fields: id, name, job_type and a table names 'jobs' with fields: id,job_name. Create a join on this two tables in sql

To create a join on two tables in SQL, you can use the JOIN keyword. Here's an example:

SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id
Enter fullscreen mode Exit fullscreen mode

This query will return a table with columns name and job_name, and each row will represent a person and the job they have. The JOIN keyword is used to specify the tables to be joined, and the ON keyword is used to specify the criteria for matching rows in the two tables. In this case, the ON clause specifies that rows in the people table should be matched with rows in the jobs table based on the value of job_type in the people table and the value of id in the jobs table.


Impressive, but let's see how much it can do.

question 2

can you insert some random data in both tables

To insert data into a table in SQL, you can use the INSERT INTO statement. Here's an example:

INSERT INTO people (id, name, job_type)
VALUES (1, 'John', 2),
       (2, 'Jane', 1),
       (3, 'Bob', 3);
Enter fullscreen mode Exit fullscreen mode

This will insert three rows into the people table. Similarly, you can use the INSERT INTO statement to insert data into the jobs table as well:

INSERT INTO jobs (id, job_name)
VALUES (1, 'Teacher'),
       (2, 'Engineer'),
       (3, 'Doctor');
Enter fullscreen mode Exit fullscreen mode

This will insert three rows into the jobs table. Note that the values in the job_type column of the people table should correspond to the values in the id column of the jobs table, so that the join will work correctly.


Great, let's try how it's context is.

question 3

can you show the result of the join with these example data

Here's an example of how you can use the JOIN keyword to join the people and jobs tables and show the results:

SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id

Enter fullscreen mode Exit fullscreen mode

This query will return a table with columns name and job_name, and each row will represent a person and the job they have. For example, if the data in the people and jobs tables are as follows:

people
id  name  job_type
1   John  2
2   Jane  1
3   Bob   3

jobs
id  job_name
1   Teacher
2   Engineer
3   Doctor
Enter fullscreen mode Exit fullscreen mode

Then the result of this query will be:

name    job_name
John    Engineer
Jane    Teacher
Bob     Doctor
Enter fullscreen mode Exit fullscreen mode

This shows that John is an engineer, Jane is a teacher, and Bob is a doctor.


I'm a AWS Data Engineer, curious how far it goes.

question 4

can you write the terraform code to execute this query in athena

To execute a SQL query in Amazon Athena using Terraform, you can use the aws_athena_query_execution resource. Here's an example:

resource "aws_athena_query_execution" "example" {
  query = <<QUERY
SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id
QUERY

  result_configuration {
    output_location = "s3://my-bucket/query-results"
  }
}
Enter fullscreen mode Exit fullscreen mode

This code will create an Athena query execution that runs the specified SQL query. The query results will be saved to the specified S3 bucket. Note that this code assumes that the people and jobs tables have already been created in Athena. You will need to use additional Terraform code to create these tables. For more information, see the AWS Athena documentation.


Very impressive, gives the example and the explanation. How far can we go with this?

question 5

can you write this query in pyspark in a glue job

! Hmm...something seems to have gone wrong. Maybe try me again in a little bit.


Lucky :) seems we still have some work as a data engineer, but ChatGPT is very powerful. But the amount of progress is staggering. In a short while it'll be able to create complex etl jobs. With access to the data it will analyse data and suggest analytics.

Exciting development!

Top comments (5)

Collapse
 
florinutz profile image
Florin • Edited

haha lol glue job urbandictionary.com/define.php?ter...
that killed your thread

Collapse
 
jverhoeks profile image
Jacob

LOL :)

Collapse
 
yssoe profile image
Yves

Nice

Collapse
 
lorenz1989 profile image
lorenz1989

Have you ever wished you could see the answer to your Google search immediately on the page, without having to click on any links? Now it's possible with the chatGPT Chrome extension, which enhances your search experience. You can give it a try:
https://chrome.google.com/webstore/detail/chatgpt-for-search-engine/feeonheemodpkdckaljcjogdncpiiban?hl=en-GB&authuser=0

Collapse
 
rogave profile image
Robert Garcia Ventura

Very impressive indeed!