At the beginning of this year, I was promoted to intermediate developer.
At your company, that might be an IC2 - or whichever level comes after entry-level developer but right before senior developer. In any case, I was now at a place in my career where my computer science fundamentals needed to be stronger than at the beginning, when I could just throw myself into building things with what I learned in my full-stack JavaScript bootcamp.
I decided I needed to better understand data structures and be more comfortable with algorithms. Not because I wanted to leetcode more. I really don't want to leetcode more. But I couldn't shake the feeling that I would be better off if I understood why you'd choose data structure A over data structure B.
So I reached out to a friend for help, and this is what I've learned.
What did I know about Big O notation?
My mental model of Big O has always been this:
1) A unit of measurement
2) Related to computer science that
3) Describes the complexity of things
From here, I needed to understand: why?
Why must we measure the complexity of things?
As developers, we deal with data.
Sometimes not very much of it, like on a static website. Sometimes a whole lot of it. The multi-millions of users kind. And most of the time, that data is not in the format we need, so we have to manipulate it: sort it, filter it, or find something in it. Sometimes we even need to change it into an entirely different format! And how efficiently we do that matters at scale.
What's also true is that there are many ways to solve a problem. This is especially true in programming. You can then think of Big O notation as a way to describe how efficient a solution is relative to another one.
What types of Big O notation are there?
In this post, we'll focus just on the types that apply to arrays, but know there are a number of others, which you can see below:
Source: Big O Cheatsheet
For arrays, you can have 2 types of time complexities (or Big O):
1) Constant time or O(1)
2) Linear time or O(n)
Source: Big O Notation for Arrays by KodinKevin on YouTube
With Big O, the n refers to the amount of data you are working with.
Practical Examples
Example A. Kanto Starter Pokemon
Let's say you're building a Pokemon app and you have an array of Pokemon.
const kantoStarters = ['Charmander', 'Bulbasaur', 'Squirtle']
If you know the index of Squirtle in the array, you can access it by simply doing kantoStarters[index]. If this were instead an array of all 151 Kanto Pokemon, the number of steps it takes to access a Pokemon at a known index would be the same as when there were only 3 starter Pokemon, because you can go directly to that index. Hence, access in an array is considered constant time - also known as O(1).
Because constant time takes the least number of steps to complete an operation, it is considered the most efficient. Check that first graph out again!
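A quick sketch of that in JavaScript - one lookup, regardless of whether the array holds 3 Pokemon or 151:

```javascript
const kantoStarters = ['Charmander', 'Bulbasaur', 'Squirtle'];

// Accessing by a known index is a single step,
// no matter how long the array is: O(1).
const squirtle = kantoStarters[2];
console.log(squirtle); // 'Squirtle'
```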
Example B. All Kanto Pokemon
Let's say instead of knowing exactly where to look for a Pokemon in an array, we have to flip through it like a clothing rack at the mall or files in a filing cabinet. In this case, it would take at worst as many steps as there are Pokemon. Remember that n in Big O notation stands for the amount of data we are working with. So if we have to look through an unordered array of all 151 Pokemon to find Psyduck, it would take us O(n) steps. This is called linear time because, given more data, we take proportionately more steps.
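That flip-through search looks something like this in JavaScript (a shortened sample list stands in for all 151 Pokemon):

```javascript
const kantoPokemon = ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Psyduck', 'Pikachu'];

// Without a known index, we check each element until we find a match:
// up to n steps for n Pokemon, hence O(n).
function findPokemon(pokemonList, name) {
  for (let i = 0; i < pokemonList.length; i++) {
    if (pokemonList[i] === name) return i;
  }
  return -1; // looked at every element and never found it
}

console.log(findPokemon(kantoPokemon, 'Psyduck')); // 3
```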
At this point we can compare the two: constant time - or O(1) - takes the same number of steps no matter the amount of data, while linear time - or O(n) - takes proportionately more steps as the data grows. So we can say that constant time is faster, or more efficient, than linear time.
Example C. It Depends
Once we move into insertion or removal of data into an array, it gets a little nuanced. Let's say we create a new type of Pikachu that wears a coloured party hat (think Nintendo 64 Super Smash Bros) and we wanted to officially recognize it as a Kanto Pokemon: Party Pikachu. If we add Party Pikachu to the end of the list of Pokemon, that would only take one step. Hence, insertion at the end of arrays is constant time - or O(1). The same goes for removal.
It's different, however, if we're trying to insert or remove an item from any other place in the array. Why? If we added Party Pikachu to the beginning, all the indices of the Pokemon after it would have to change because the order of Pokemon is now different. This also applies if Party Pikachu were to be added in the middle of the list. We would have to take as many steps as the number of Pokemon that come after it to change the indices to the new ones. Hence, insertion or removal anywhere but the end is linear time - or O(n).
const originalKantoPokemon = ['Bulbasaur', 'Ivysaur', 'Venusaur'] // and so on
// Where Bulbasaur is index 0
const newKantoPokemon = ['Party Pikachu', 'Bulbasaur', 'Ivysaur'] // and so on
// Where Bulbasaur is now index 1
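Sketching that same shift with actual array methods: push adds at the end in one step, while unshift moves every existing Pokemon to a new index first.

```javascript
const kantoPokemon = ['Bulbasaur', 'Ivysaur', 'Venusaur'];

// Adding to the end: one step, no other indices change. O(1).
kantoPokemon.push('Party Pikachu');

// Adding to the beginning: every existing element shifts
// to a new index, so this is O(n).
kantoPokemon.unshift('Party Pikachu');

console.log(kantoPokemon.indexOf('Bulbasaur')); // now 1 instead of 0
```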
Career Value
You might be thinking, "That's great and all, but why do I need to know this?" That's fair. I've had a successful last 4-5 years as a developer without it. Heck, I even got promoted. But there are two possible reasons:
1) You want to get hired at a company that does leetcode.
FAANG companies - Facebook, Amazon, Apple, Netflix, and Google - and similar are infamous for testing leetcode, algorithms, and data structures in their interview process. If you want to get hired by them, you need to be able to reference Big O when you write an algorithmic solution.
2) You need to come up with efficient solutions.
Even if you avoid interviewing at companies that do leetcode, you will still have to work with data. And unless you can always work with small amounts of data, how performant the solutions you write to handle it are will matter - especially as you become a more senior engineer.
(This will become more apparent as I continue this series by moving into showing actual algorithms. Follow me and stay tuned!)
I'm personally in the second boat, but I've since been opening myself up to the idea of the first one. Let's get better first, then we'll see.
Onward
I was the kind of kid who was, for all intents and purposes, intelligent but didn't identify with being good at STEM subjects despite being an honour roll student throughout my education. Heck, my favourite subject was music. But at some point, you hit a wall that makes you realize your work could go much more smoothly if you deepened your knowledge in a particular area.
My goal is to be able to confidently answer why we should store data a certain way (i.e. dictionary vs. list) or traverse large amounts of data in a certain way, whether I'm being asked in an interview or simply completing a task at my current job.
You can think of what we discussed so far as the building blocks for choosing between multiple ways of handling data. If we know that searching through an array is linear time, and we later find out that there's an alternate solution for searching through data that is constant time, which is faster, we might want to use the latter. However, there are other things to weigh, like readability and maintainability. More on that another time.
I'll keep learning and be sure to share more.
Off to study linked lists!
Keep it candid,
Thuy
Note: This post focuses more on practical examples than mathematical visuals, because not everyone will understand Big O through graphs. But if you are someone who will, I recommend this.
Top comments (8)
We don't always need things to be super duper fast.
Oftentimes one may prefer an O(N log N) algorithm over an O(N) one because the code will be simpler. For example, in some cases you can avoid sorting things, but doing so will make your code 5x longer and more complex.
However, in general we want to avoid O(N^2) patterns, like looping over one list and calling otherList.contains(item) on every iteration.
This sort of thing can appear in our code by mistake and can really create a lot of slowdowns. And sometimes we say: "ah.. no big deal.. I'll just make it search through the list every time.. how long can that list really be?"
And bad things can happen. For example:
Adding a line to convert otherList into a set would make this algorithm linear instead of quadratic.
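The same trap and the set-based fix can be sketched in JavaScript (the sample data and names here are made up for illustration):

```javascript
const items = ['a', 'b', 'c', 'd'];
const otherList = ['c', 'd', 'e', 'f'];

// Quadratic: includes() scans otherList on every iteration, so O(n * m).
const slowOverlap = items.filter(item => otherList.includes(item));

// Linear: building a Set once makes each membership check O(1) on average.
const otherSet = new Set(otherList);
const fastOverlap = items.filter(item => otherSet.has(item));

console.log(slowOverlap); // ['c', 'd']
console.log(fastOverlap); // ['c', 'd']
```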
Rarely, it really really does make sense to optimize it to as fast as possible. But that is generally when we're optimizing some process that takes minutes or hours, and we really want to bring that down as much as possible. Or if we're doing something realtime.
Fortunately, you can often speed up your code a lot by using a dictionary (or hash map) or a set (or hash set) - data structures your library already provides.
So as a rule of thumb:
Leetcode-style cleverness can be expensive to build and expensive to maintain. So make sure it's worth it.
Also, in the real world, if you're dealing with tons of data, often you delegate that hard work to the database. So writing a super fast SQL query can be more important than loading it all into RAM and running some fancy binomial queue based super treap solution. ;)
I had mentioned that there are other factors to consider like readability and maintainability so definitely agree with this! "It depends" hehe.
Well done! This is a great intro to Big O!
One thing I noticed is that you said adding to an array is constant time. Not quite correct, since arrays in JavaScript really work like vectors. A true array is fixed in size, so if you want a bigger array than what you've got, you have to create a new, larger array and copy the old elements over to it, making the resize cost O(n). Vectors hide this from you to make them easier to work with.
Interesting! I don't know too much about vectors. If you have any intro resources, let me know. Thanks :)
Sure! This article seems helpful: stackoverflow.com/questions/150790...
Note that adding to a vector is amortized constant time, meaning that it essentially averages out to constant time in normal situations. You'll also notice that retrieval from a Map or Dictionary is worst-case linear time, but amortized constant time as well.
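A toy sketch of why appending averages out to constant time (not how any real engine implements it): the backing storage only gets copied when it runs out of room, and doubling the capacity each time makes those O(n) copies rare.

```javascript
// Toy dynamic array: fixed-size backing storage that doubles when full.
class Vector {
  constructor() {
    this.capacity = 2;
    this.length = 0;
    this.storage = new Array(this.capacity);
  }

  push(value) {
    if (this.length === this.capacity) {
      // The rare O(n) resize: copy everything into a bigger backing array.
      this.capacity *= 2;
      const bigger = new Array(this.capacity);
      for (let i = 0; i < this.length; i++) bigger[i] = this.storage[i];
      this.storage = bigger;
    }
    this.storage[this.length] = value; // the common O(1) case
    this.length++;
  }
}

const v = new Vector();
['a', 'b', 'c', 'd', 'e'].forEach(x => v.push(x));
console.log(v.length, v.capacity); // 5 8
```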
Looking forward to your next article!
Wow, excellent introduction! I'm just picking up data structures, and I identified with your article because I think we share the same reasons ^_^
Awesome! Just the tip of the iceberg for me. Many more data structures and Big O notation types to go, haha. We can do it :)