loading...

CS50 Week 5 - Speller

kewbish profile image Emilie Ma Originally published at kewbish.github.io Updated on ・5 min read

This post is edited from my original blog post. Find it, and other posts by me on kewbish.github.io/blog.

Introduction

Finally, I'm over the worst of CS50 (in my opinion, at least). Week 5 was a bit of a difficult lesson and problem set, but in the end, it actually wasn't as hard as I thought it'd be. Week 5 covers data structures - detailing hash tables, linked lists, and tries, which are a combination of both! This is the week I was looking the most forward to (my original purpose for taking CS50 was for data structures and algorithms after all), and dreading as well.

Skip to my thoughts.

Notes

Here we go:

  • you can't reassign something if it doesn't exist yet
    • remember to initialize to a chunk of memory
  • also, you should remember malloc's effects if you reassign, and use free if you reassign a malloc
  • arrays are difficult to resize, because they're initialized to a certain amount of memory
    • could move a copy of the array to a larger, free area
    • then can delete old copy
  • we could also use realloc
    • as its name implies, it reallocates memory
    • give it the pointer of the old array
    • will return address of new array
    • remember to free variable
  • data structures are custom structures to store information
    • made with structs
    • builds off included data types
  • linked lists
    • basically an array, but each element points to the next one
    • elements are not together in memory
    • each element includes a pointer to the next
  • can't access the middle of the list with just [x] notation
    • there isn't a 'middle'
    • need to navigate through the entire list first
  • also takes twice as much memory per element
    • needs to store the next pointer
  • usually constructed of a struct
    • one part to store the actual data, and the same struct pointing to the next struct
    • initialize first to NULL, so you can assign
  • introduce a new notation, -> notation
    • similar to dot notation of a pointer
    • node->next = x;
  • need to use a while loop to iterate through the properties
    • check if not NULL
    • set the variable
  • if you want to add to the beginning
    • set a pointer to point to the beginning
    • then set the list to the last pointer
    • inserts a node at the beginning
  • to insert in the middle
    • do something similar
    • need to create a temporary variable for the swap
  • linked lists are O(n) time, need to follow each node pointer to find the next
  • also introduces a tree
    • each node points to two nodes, like the famous binary search tree
    • makes binary search very easy, only compare two nodes
    • makes insertion easy as well, only rearrange a small subset
    • search is O(log n)
  • need to balance these though, or else may become reweighted
    • also memory-expensive, but can search faster
  • hash table combines arrays and linked lists
    • each element in the array is a linked list
    • can add elements quickly, and the initial searching time is decreased
    • however, they might all end up in the same element, in which case the time efficiency is negated
    • get as close as possible to O(1) when the number of elements equals the possible values
  • retrieval tree provides O(1) searching, but at cost of space
    • stores each level of element (here, letters) in a separate array
    • in this example, 26x as much memory
  • more data structures
    • stacks -> last in, first out, like email inbox
    • queue -> first in, first out, like line in a store
    • dictionary -> map keys to values, like Python!
  • these data structures can be implemented with arrays, linked lists, hashtables, and other structures

Problem Set

This week, we only had Speller to work through. But don't underestimate it either - it took long days of work to figure out. The logic wasn't too hard to implement, actually.

First, I split up the problem in its subparts:

  • hash -> I decided to use the simplest hash function - just the first character. Could this be optimized? Yes, but I just wanted to try the data structure out first, and not have to worry about copying a hash function from online that I didn't completely understand either.
  • load -> Made an array, and I'd put each word into its appropriate element. I just appended the current word to the end of the linked list, and lowercased the entire string as well.
  • size -> In load, I'd created a line to increment a global variable, which made size just a return count; statement.
  • check -> I hashed the current word to compare, and then used a while loop to iterate over the linked list and checked if it matched the current targeted word.
  • unload -> I iterated over each element in the array, and again iterated over the linked list to free each pointer in the list.

The first time I wrote the program, it worked as intended, so check. But, upon check50-ing, I got a bunch of valgrind errors. I had forgotten to free a bunch of malloc'ed variables, and to fix this, I tried to use a character array instead. Also, I finally learned to use valgrind properly - I'd kind of ignored it in the past week, given that there wasn't a check for a memory leak, just a reminder that it could exist. I also realized that my unload function was logically incorrect, and would always return true right away. After fixing this, I attempted to test it - but now, it didn't produce the intended output. Oops.

I stripped out the entire program, and rewrote it from scratch, including what I'd learned about valgrind and memory allocation in the first runthrough. (Now that I think about it, it might have been something to do with redirecting output to the wrong file, but hey, 🤦s aside, it was a good experience to rewrite.)

Now, I was memory leak free, and working with the correct output. Nice! In the process of scrolling dozens of Stack Overflow pages, I really learned to appreciate the full power of valgrind. It's a great resource for checking memory leaks, and despite its rather scary, confusing interface, it's an essential tool. I never had to keep memory management in mind with Python, but C made me more cognizant of the lower-level management that goes into C.

Conclusion

Am I going to continue pursuing C? I don't really think I will. Would I have ignored C if I were to take some version of CS50 where I didn't need C? Probably not. I've learned a lot about lower level things, like memory management, as well as getting a glimpse into how the easy-to-use features in Python were really implemented. As countless people have said before, working with C makes one really appreciate how nice higher level languages are to work with, and it was a great experience.

That said, I still can't wait to get into the later half of CS50 - Python, SQL, and web? Yes, thank you very much. I guess this would be the more application side of CS50, and I'm excited to get learning.

What are some recommendations you have to avoid memory management errors, or some areas of research that you'd recommend to understand memory (in Python or C) more?

Discussion

pic
Editor guide