mgracer7795

Posted on Sep 22, 2023

Data Science - Structured Query Language

Working with Structured Query Language

For my 2nd post as a student at the FlatIron School I'll be sharing what I've learned from our Phase 2 course material. One of the central topics that was associated with our code challenge and final project was Structured Query Language or SQL. Throughout the course material we also have Checkpoints that are assignments that need to be completed. I'll be sharing one of the code assignments we had and how I solved the problems.

Introduction to the SQL Checkpoint

The tasks of the checkpoint were to create query's, select specific columns, specific rows, and then join tables from the Structured Query Language together. To create query's, you need to first import SQL files through a connection function.

Here is a visualization of the tables from our SQL file that contains information about the fictional trading company "Northwind Traders". From the tables presented above, product, order, and shipper were the primary tables of interest for the checkpoint.

First Task

The first task assigned was to select a specific table and filter for the rows where one of the columns, discontinued, was equal to 1. To select a specific table you have to create a specific query that pulls a table from the tables of the main SQL file. The code for this is below:

first_query = ("""
SELECT *
FROM products;
''')

pd.read_sql(first_query, conn).head()

As you can see, we created a query where you need to call the SELECT and FROM functions. The * represents all columns FROM the table products. Then you can assign that query to the variable first_query and read it via the pandas function. This is a relatively simple function to start, but you can see how we are now able to view the contents of the products table from the function we called.

2nd Task

The second task was to filter for the rows where one of the columns, discontinued was only 1 instead of 1 AND 0. To do this, we call a 2nd query that adds a WHERE function to specify how we'd like to filter the SQL table we have generated above. The code for this 2nd query is below:

second_query = ("""
SELECT *
FROM products
WHERE Discontinued == 1;
""")

pd.read_sql(second_query,conn)

Now, if you view the before and after images above you can see how we filtered our products table for a specific condition where our Discontinued column is equal to 1.

The 3rd Task

The 3rd task was to filter for rows where both the UnitsInStock and Discontinued columns were equal (==) to 0. To accomplish this task we had to create a new query with a new WHERE statement to filter our dataset. The code is below:

third_query = ("""
SELECT *
FROM products
WHERE UnitsInStock == 0 AND Discontinued == 0;
""")

pd.read_sql(third_query,conn)

The result is the product name Gorgonzola Telino. This is consistent with what the expected answer was.

4th Task

The fourth task was to filter for ProductName and UnitPrice in descending order by UnitPrice. We only need the first 5 highest results. To generate these results we have to create a 4th query.

fourth_query = ("""
SELECT ProductName, UnitPrice
FROM product
ORDER BY UnitPrice DESC
Limit 5
""")

pd.read_sql(fourth_query, conn)

Here you can see how we were able to generate these results using the ORDER BY function because we're filtering for columns instead of rows. Then we use the Limit function to only view the first 5 rows.

5th Task

The 5th task was the most complex for this exercise and required joining tables together to get a specific result. We need the name and phone number for shippers who have charged more than $1000 for an order. Our initial problem is that we have to merge two tables together. This is because the Shipper table has Company Id, Name, and Phone Number. The Order table has information on the shipping cost via the "Freight" column.

To break this process down into steps we need to first create a query for shipper:

pd.read_sql("""
SELECT *
FROM Shipper
""",conn)

Then we need to create a query for order where we are filtering that freight column to be > 1000.

pd.read_sql("""
SELECT *
FROM "order"
WHERE Freight > 1000;
""",conn)

This is accomplished with a WHERE statement for Freight. Now we need to merge these two tables together to generate our final result.

fifth_query = ("""
SELECT shipp.Id AS Id_s,
shipp.CompanyName,
shipp.Phone

FROM
(SELECT *
FROM "order") AS ord
INNER JOIN
(SELECT *
FROM Shipper) AS shipp
ON ord.EmployeeId = shipp.Id
WHERE ord.Freight > 1000

""")

What we accomplish here is create a subquery for our order and shipping tables. We join the order table with the shipper table on the Employee Id column. Then we have a WHERE statement for prices greater than 1000. We have to also select columns from the new names for our joined tables. The process here is:

Create FROM subquery with information from first table, "order".
Create JOIN subquery with information from our second table.
Create ON function to join tables on a specific shared column between the two tables.
create a WHERE function to filter for rows/conditions we want.
create SELECT names for our joined tables.

The end result is finding the Name and Phone Number for the Shipper charging more than $1000 for an order. This was the final task of the checkpoint! Thanks for following along with me on my journey through the data science program at the FlatIron School! Looking forward to posting more for upcoming course material.

DEV Community

Data Science - Structured Query Language

Top comments (0)