DEV Community

Jonathan Reeves
Jonathan Reeves

Posted on

Codecadmey Data Science Course Part IV

Hello, and welcome back to my series that focuses on my time through the Codecademy Data Science course. I am excited to say that I have made it through the SQL section. I will break down my experience with this course so far. Earlier posts were a breakdown of what I was doing module by module and I realized that will lead to a lot of posts as there are 26 modules in the course that contain various articles, exercises and projects to complete. That is a lot of posts to try to keep up. So I have decided that I will write a post for each section. This one is going to finish out the SQL portion. The next post after this will cover Python and so on and so forth. Let's get cracking.

Intermediate SQL

I know that I talked about this briefly in my other post but I want to say a few more things related to this particular module as it was the one that I felt had the most important concepts and covers the most use cases.

So first and foremost I want to say that by no means is this course, Learn SQL or Data Science, an easy cake walk. I struggled throughout most of this section. I didn't know SQL that great going into this and there are plenty of aspects of SQL that I definitely still don't know that great. But what this section showed me is that there are ways to use SQL vast amounts of built in functions to manipulate data, order it and spit it out so that you can have a pretty looking table with all the necessary data in the order that you want it in. Up to this point in my course and in my personal and professional experience working with SQL, SELECT statements with WHERE and FROM clauses were about as far as I went with SQL. The first section in the part helped me finish and understand a particular report I am creating now for work.

Aggregate Functions

This section showed me how to use COUNT() and countless other functions that allow for manipulating data. Things I didn't realize were helpful in creating tables and reports until I took this course. If you have used SQL in the past like I have and didn't quite delve deep into all that it offers I highly recommend going through this course. Again I am not a part of Codecademy nor am I receiving any benefits from talking about it here. I am just expressing my time through the course and what I gained out of it.

Did You Know?

This piece of code right here:
SELECT DOB,
DATEADD(MINUTE, [OT], [DOB]) AS CDOB,
SUM[IT] AS SumOfIT
SUM(GC) AS SumOfGC
FROM [dbo].[departmentCheck]
WHERE StoreId = 2 AND DOB = '2020-07-28'
GROUP BY DOB, DATEADD(MINUTE, [OT], [DOB])
ORDER BY CDOB;
is what I was able to take away from the course. There are a few functions in here that aren't covered in the course however I was able to better understand them and how to look for them because of this course. A lot of programming is knowing the right questions to ask based on the concepts we have learned. If you know the question you want to ask finding the answer is a lot easier than you think. If, for example you have a question about SELECT then you are probably not going to know that the SUM() is taking the column IT and calculating the SUM of each item in that column and then creating a new column named SumOFIT. Pretty awesome right?

Aggregate Functions Projects and Applications

In this section you are introduced to three projects that vary in difficulty. In one of the projects you are tasked with getting the the highest ranking posts on a blog site. It was a really awesome project to see how using aggregate functions help make your select queries that much more powerful. By fine tuning your query you are able to get the data you want and massage it to create columns that help you better understand what posts you should be reading based on their popularity.

Working With Multiple Tables

This section has you using aggregate functions to join multiple tables to create one table with the important data from each other table. These were awesome. It really shows you the power of SQL and why you would use these. For instance lets say you are working for a streaming company like Spotify. You are a data scientist and they want to know how many users have signed up for the service. Well as many of you know Spotify has free and paid for subscription services. Which means in one table you will have the free accounts and in another table you will have the paid for accounts. You could write a query that shows only the paid for accounts but what if one of the paid for accounts moved to a free account before the this purchased table could be updated? How would you know? By looking at the free account table you would notice a new account that just hit and you could see that that account originally showed up in the purchased table but was now on the free table. That means you could let people in the company know and the marketing team could take steps to try to figure out why they dropped to the free plan and potentially give a coupon to that person via email for free months of premium or lower the monthly fee for a time. How awesome is that? You just helped Spotify's marketing team bring on another potential, paying, customer. On your first day! Rock star status!

Going off Platform and Analyzing Real Data

The next two sections I am combining to be one section because going off platform is basically just showing you how to install a version of SQL on your local computer so you can practice downloading csv files and other data files to store in a database and play around with. There aren't any projects or even example data files to download and play with. Only steps on installing SQLite to your local computer.

Analyzing real data with SQL on the other hand is where things in the course took a turn for the worse. Luckily there is a nice forum that you are able to visit to ask questions when you are stuck on certain problems. This section was a tough one. I saw forum posts from users that have been using SQL for years and just wanted to brush up on their skills asking questions about their queries because they didn't know what was wrong. The section might have been tough but it was still fun to go through and a very vital one for aspiring SQL programmer to go through as it will test your skills and abilities and make you sweat as you think about the query you need to write in order to accomplish the task. There are no step by step hand holding instructions. Just do this and then do that to get the result. Even the hint just barely scratches the surface of what you need to do to be back on the right path.

Conclusion

All in all the Learn SQL module was definitely the right choice for starting out the Data Science course. What better way to get you into thinking like a Data Scientist than with manipulating data in databases. Showing off real world examples of case studies that companies may need your SQL skills for. Learning how to use SQL to manipulate data is obviously just the tip of the iceberg for a data scientist but if you can't do it then chances are you might want to look elsewhere for a career path. You will need to know at the basics of SQL in order to use a language like Python or even JavaScript to get data from a database programmitically. Without a basic understanding of SQL I would have been lost my first day on the job. As always I hope you enjoyed the read and look forward to seeing how many likes and or comments are posted. Enjoy.

Latest comments (2)

Collapse
 
alexantra profile image
Alex Antra

Got to know the basics! I meet too many scientists who just know pandas and pyspark and wonder why they cant get their heads around an RDS. Good article

Collapse
 
redhoodjt1988 profile image
Jonathan Reeves

Thanks I appreciate that. It's been a lot of fun. Very interesting topics. I'm looking forward to the numpy and pandas sections still have a ways to go before I reach them though.