
Sean Coughlin

Posted on • Originally published at blog.seancoughlin.me on

UIUC MCS - CS 513 Review - Theory and Practice of Data Cleaning

Overview

  • TLDR: CS 513 won't teach you very much, and what you do learn is highly outdated, but it's an easy 500-level course.

  • Difficulty: Very easy

  • Opinion: Disliked

  • Weekly workload: 2 hours

  • Semester: Summer 2023

Class Content

Lecture Content

Every week consisted of about an hour of lectures. The topics covered included data validation, profiling, relational models, Datalog, SQL, workflows, provenance, and YesWorkflow. I could not figure out exactly when these lectures were recorded, but I'm guessing they are close to a decade old. Ideally, they would all be rerecorded to factor in newer material.

I'm not sure why the course title includes 'theory', as the lectures focused entirely on data-cleaning practices. Each week did link to data cleaning papers, though, and those were good resources. A diligent student who reads all of those papers could learn a lot and cover plenty of theory, but the course has no mechanism for enforcing the reading.

Assignments

As with most MCS courses, there were weekly quizzes. The quizzes allowed for unlimited attempts and never took more than a few minutes to complete.

There were six homework assignments. In order, they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They were all basic implementation and programming assignments with autograders.

The class did not have any exams. Instead, it concluded with a two-phase group project. Groups consisted of three people. The setup of the project did not require much collaboration, and my team corresponded entirely over Teams messages without any synchronous meetings.

The project required cleaning some given datasets. Then you had to write a paper analyzing how dirty the dataset was beforehand and how much you were able to clean or improve it through your process. You also had to submit documentation of your cleaning process and write up some potential benefits of the cleaning. The project was not difficult.
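To give a rough sense of what that before-and-after analysis can look like, here is a minimal Python sketch. It is not the course's project code: the sample rows, the column names, and the validity rules (`count_violations`, `clean_row`, `summarize`) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample rows; the real project datasets are not described in this review.
RAW_ROWS = [
    {"name": "  Alice ", "date": "2023-06-01", "zip": "61801"},
    {"name": "BOB", "date": "06/02/2023", "zip": "6180"},
    {"name": "", "date": "2023-06-03", "zip": "61820"},
]

# Assumed validity rules: ISO dates and five-digit ZIP codes.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
US_DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")
ZIP5 = re.compile(r"\d{5}")


def count_violations(rows):
    """Count cells that fail the simple validity checks (one count per bad cell)."""
    bad = 0
    for row in rows:
        name = row["name"]
        # A name is valid if it is non-empty, trimmed, and title-cased.
        if not name or name != name.strip() or name != name.title():
            bad += 1
        if not ISO_DATE.fullmatch(row["date"]):
            bad += 1
        if not ZIP5.fullmatch(row["zip"]):
            bad += 1
    return bad


def clean_row(row):
    """Apply only repairs that need no guessing: trim and title-case names,
    and rewrite MM/DD/YYYY dates as YYYY-MM-DD."""
    cleaned = dict(row)
    cleaned["name"] = row["name"].strip().title()
    match = US_DATE.fullmatch(row["date"])
    if match:
        month, day, year = match.groups()
        cleaned["date"] = f"{year}-{month}-{day}"
    return cleaned


def summarize(rows):
    """Return (violations before cleaning, violations after cleaning)."""
    before = count_violations(rows)
    after = count_violations([clean_row(r) for r in rows])
    return before, after


if __name__ == "__main__":
    before, after = summarize(RAW_ROWS)
    print(f"violations before: {before}, after: {after}")
```

In this sketch, the missing name and the four-digit ZIP code are left alone because they cannot be repaired without guessing, so they still count as violations after cleaning. A write-up would report cases like these as remaining issues.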

My Takeaways

This class is ridiculously easy. It does not feel adequate at the graduate level and certainly should not be a 500-level course. I can see how many would be disappointed by the lack of rigor in what is an otherwise challenging program. If you are paying by the credit hour, it makes sense that you would want a considerable knowledge return on investment. I simply don't think this class offers that.

I think the biggest disappointment is that data cleaning is a crucial skill for all data science and software engineering jobs. The content is so important that the class deserves to be good! If the content were updated and some of the assignments swapped out, this class could be something special. Unfortunately, the execution is not there right now.

All that being said, there are not many 500-level options, so you will probably need to take this class. Additionally, the low difficulty did make for a very well-balanced semester when paired with CS 416. I would recommend pairing this class with something else, and you'll still have a decently challenging semester.

Banner Credit

The banner was generated using the UIUC LinkedIn Banner Generator. It is an awesome tool if you need an Illinois-themed banner for anything.

More Reviews

Check out uiucmcs.org for more reviews of MCS courses. I don't know who maintains this site, but it's a good review collection from many semesters.

I have also written up a CS 427 review, a CS 435 review, a CS 498 Cloud Computing review, and a CS 416 review.


