DEV Community

RogerWoods
RogerWoods

Posted on • Originally published at tablechat.hashnode.dev

How proficient is generated AI in transforming text or natural language into SQL?

SQL, as the most important standard syntax in the field of data analysis, has been widely applied by people. Every developer, data engineer, and database administrator must learn SQL in order to interact with databases. Although SQL is not a very difficult syntax, it involves many details. Most developers may struggle with simple Join and feel hard to write high-quality and efficient SQL statements.

Whether you believe it or not, with the advent of Large Language models like ChatGPT, generating accurate SQL through natural language may make SQL writing less crucial. I dare not say that AI will make SQL disappear, but it will at least relieve people from the agony of meticulously crafting every syntax, saving a significant amount of time, especially for programmers who are not proficient in writing SQL. Let me show you some examples what AI can do now.

Start From Create Table and Add Data

First of all, you need to have a clear idea of the table and fields you want to create. You can ask GPT for some suggestions, but often in our daily production systems, we already have an idea of the structure of the table. Let's take the example of student school performance. We need tables look like this.

Table Student

student_number name birthday gender
0001 Jhon 1995-06-04 male

Table Course

course_number name teacher_number
0001 Math 0002

Table Score

student_number course_number score
0001 0001 90

Table Teacher

teacher_number name
0001 William

Hard to write so many field name and create data. Well, let GPT help us.
just type:

create two tables for me: student with student_num, name, birthday and gender. course with course number, name and teacher number 
Enter fullscreen mode Exit fullscreen mode

The description seems not so accurate, but the result is very good. It can even choose the field type, char length for you.

createtable

Then, we generate another two tables with the same plain sentence. But a small amazing thing still shocks me: in the table Score, it uses student_number and course_number as union primary key! How does it know this?

createattable

As a test table, I should create some data and insert them into the tables. It is too hard and annoying to create so many names, ages, scores and so on. Don't worry, we can still use AI to do this though people seldom use this function in production environment.

fake about 20 data randomly for all the for tables for me, and make the data is correlated
Enter fullscreen mode Exit fullscreen mode

fakedata

fakedata2

Wow, we have table and data now. Let's do something with the table.

Basic Data Query

Firstly let's just see all the data of one table, so easy right?

show student data for me
Enter fullscreen mode Exit fullscreen mode

show
No surprising, it can certainly show the right result. OK, add some diffculty, let him give us a fuzzy query.

show teacher name endwith n
Enter fullscreen mode Exit fullscreen mode

like
Clearly using the key word "like".

Complex Data Query with Multiple tables

Maybe just query one table is not so hard, then let me try some complex syntax. First, choose the tables you want to query with the software instead of all the database. And write sentence that should use multiple talbes.

calculate all the average score of students and tell me the students name whose scroe is large than 60
Enter fullscreen mode Exit fullscreen mode

avg
AI exactly uses the key word of Join, Group by, Having and function AVG to return thc correct results.
Then I want to try an inline query.

query all the students name number whose courses score are all under 80
Enter fullscreen mode Exit fullscreen mode

allscore
No doubt, inline query is shown here. But why it use "NOT IN" instead IN? Let's check with AI. By asking it to optimize the sql for me. Well it now uses "IN" and add a judgement of NULL value. So the choice of AI may be short first~

aiin
Let's see some thing more complex.

query all the students who has not learned all the courses
Enter fullscreen mode Exit fullscreen mode

allcourses
Wow, it's hard for most people to write such code.

Use of Function

As we can see it uses some function such as AVG above to generate results you want, it can use more functions. For example,

query all students ages by year
Enter fullscreen mode Exit fullscreen mode

date
We can see it uses the function of TIMESTAMPDIFF. Then let me ask him something complex.

ranking the students scores by the course number and show course numbers, student number, score and ranking number.
Enter fullscreen mode Exit fullscreen mode

ranking
Wow, Wow, Wow! Do you see the use of ROW_NUMBER() and OVER key word? Can you write out sql like this?

Many people will doubt the ability of generated AI before using it as I was before. But after I use it more and more, it gives me more surprise.

Tips:The examples above is written with a software called TableChat , it can generate SQL according to you table structure and run to return to see the results.

Top comments (0)