CodingBlocks

97. Data Structures – (some) Trees

Jan 8 '19

We ring in 2019 with a discussion of various trees as Allen questions when should you abstract while Michael and Joe introduce us to the Groot Tree.

Are you reading these notes via your podcast player? You can find the full show notes for this episode and join the conversation at https://www.codingblocks.net/episode97.

Survey Says

How good were the holidays to you?

Awesome. Got everything I hoped for.
I got things I didn't even ask for.
It was good. I got everything I purchased.
Glad to be back at work.
I didn't get squat.

vote

News

Thank you to everyone that left us a review:
- iTunes: SoundShop, TimGraf, bothzoli
- Stitcher: KSprings, Trickermand, Delden, Flo_
Is it really always better to code to an interface?

I am Groot

Why do trees matter?

Trees are one of the most important data structures, because there are certain types of problems that are really difficult to solve without them and they are really efficient for certain tasks.
They are also some of the most difficult for programmers to deal with, because you often need recursive algorithms to work with them.
Recursive solutions are neat, but can easily lead to stack overflows if you are not careful (especially if your language doesn’t support tail recursion).
Many traditional persistence layers are not setup to easily store tree data.
Why programmers use trees:
- Model “real world” hierarchical data.
- Memory/Processor efficient.
- “Good enough” solutions.

Common uses of Trees

Manipulate hierarchical data (persistence, middleware, UI):
- Directory structure on your computer.
- Product categories on an eCommerce site.
- Interview questions.
- Data files (XML, HTML, JSON).
Search.
Manipulate sorted lists of data.
As a workflow for compositing digital images for visual effects.
Router algorithms.

There are lots of Trees

The basic structure of a tree is very simple, in most cases you can imagine a wrapper node object that has a value (your data) and a set of children nodes.
There are other ways to save trees, like in an array but you’ll generally see those used in special cases like heaps or when you are serializing data.
… that’s it! If you’re looking at a file system, then the value would contain the current path and other metaData (like permissions) about that path, the children of that node would contain other nodes
Despite the simple example, Wikipedia lists 115 types of trees against 7 categories! (and 20 types of arrays!)
- The many types of tree categories:
  - Binary Trees
  - B-Trees
  - Heaps
  - Trees (Value Trees)
  - Multiway trees
  - Space-Partitioning Trees
  - Application-Specific Trees
How can you have so many types for such a simple concept as value + children?
- Constraints on the data structure (binary vs n-array).
- Stored meta data (red black).
- Rules you use for interacting with the trees (heaps).
These variations have big downstream effects.

Common terminology

Although the data structure itself is simple, there is a lot of data that you can extrapolate from about the entire tree.
Algorithms use this data in addition to the data stored in the nodes.
The basics:
- Root is the topmost node of the tree.
- Edge is the link between two nodes.
- Child is a node that has a parent node.
- Parent is a node that has an edge to a child node.
- Leaf is a node that does not have a child node in the tree.
- Height is the length of the longest path to a leaf.
- Depth is the length of the path to its root.
For example, if you are dealing with a really large tree, then you may opt to return a “good enough” solution that doesn’t exceed a certain depth.
If you’re performing a lot of searches on a tree, then it may make sense for you to optimize the tree as you insert and delete nodes so that your searches can check a consistently small set of nodes.
The point is, even if the data structure is small – there are big time implications, and the recursive algorithms you write can be … mind bendy.

Binary Tree

What is it?

Unlike arrays, lists, queues, stacks, etc., trees are not linear data structures, rather they are hierarchical.
A tree whose elements have at most two child elements, usually referred to as left and right.

How does it work?

Every tree has a root node – that’s the top most node in the tree – it has no parent.
Nodes or elements that have no children are called leaf nodes.
Nodes are referred to as either a parent or a child.
Types of binary trees:
- Full Binary Tree – every node has 0 or 2 children.
- Complete Binary Tree – every level is completely filled with nodes except the last level could have every node filled except one right node.
  - Binary heap is an example of this.
- Perfect Binary Tree – all internal nodes have two children, and all leaf nodes are at the same level.
- Balanced Binary Tree – height of the tree is O(log n) where n is the number of nodes.
  - Great for performance because they have O(log n) time for search, inserts and deletes.
- Degenerate or Pathological Tree – each node has exactly one child (until you get to the leaf node).
  - Basically no different than a linked list.

Pros

When ordered, can provide a sped up search over linked lists, but still slower than an ordered array.
Can be performant on insert/deletes into the tree – faster than inserting into the middle of an array, but slower than a linked list.
No limit on the number of nodes that can be added – unlike an array – this is because each node contains pointers to its child and parent nodes.

Cons

Extra memory, not as efficient as arrays

When to use?

You need to do comparisons and you’re not sure of how many you need, or there will be many inserts/deletes.

B-Trees

What is it?

It’s a self-balancing search tree.
- AVL Tree – (Adelson-Velsky and Landis)
Primary idea behind a b-tree is reducing the number of times the disk is accessed – keep as much in memory as possible.
Most operations for search, insert, delete, max, min – require O(h) disk accesses, where h is the height of the tree.
- This is achieved by creating a “fat tree” – this means that the tree is short but wide, and this is accomplished by putting as many keys as possible in a single b-tree node.
  - Typically the b-tree node is the same size as the block size on the disk.
All leaf nodes are at the same level.
Minimum degree “t” – t depends on the block size of the disk.
EVERY node must contain t-1 keys.
- Root may contain at minimum 1 key.
All nodes, including root, may have at most 2t – 1 keys.
Number of children of a node are equal to the number of keys in that node + 1.
All the keys are sorted in order within the node.
B-trees grow and shrink from the root node.
- Unlike a binary search tree that grows and shrinks vertically (downward).
Time complexity to search, insert and delete are O(log n).

How does it work?

Search – similar to a binary search tree:
- If you’re searching for a key, you recurse down the nodes.
- If the non-leaf node has the key, then the node itself is returned.
- If the non-leaf node doesn’t have the key, then the child node that is the child of the first key with a greater value in the current node is traversed.
- If you make it all the way to a leaf node, and the key isn’t found, NULL is returned.
Insert:
- Always starts at a leaf node.
- Do a binary search to find the node where the insert should happen – if there is room, then just insert the new key in that node and make sure the keys stay ordered.
- If the node is full …
  - Evenly split the node into two nodes.
  - Choose a median from the elements being split – anything lower than the median goes to the new left child node, and anything greater goes to the new right node.
  - The median goes into the parent node, and then splitting can keep going up the tree.

Pros

Better, more consistent search times than binary search because of the balancing.
There are self-balancing binary trees (like red-black trees) but are optimized for batch inserts.

Cons

If you’ve met the max degree, then you need to re-balance the tree which can make inserts slower.

When to use?

B-tree is well suited for storage systems that read and write relatively large blocks of data, such as disks.
Commonly used in databases and file systems – because the aim is to reduce the number of disk reads necessary.

Resources We Like

The Imposter’s Handbook (bigmachine.io)
List of data structures (Wikipedia)
What is tail recursion? (Stack Overflow)
Tree (data structure) (Wikipedia)
Everything you need to know about tree data structures (freeCodeCamp)
Binary tree (Wikipedia)
Binary search tree (Wikipedia)
Binary Search Tree (GeeksforGeeks)
B-tree (Wikipedia)
B-Tree | Set 1 (Introduction) (GeeksforGeeks)
AVL tree (Wikipedia)
B-Trees visualization (USFCA)
Red-black tree (Wikipedia)
B+ tree (Wikipedia)

Tip of the Week

Use PlantUML to create plain text UML diagrams.
- There’s also a VS Code extension for it.
- Thanks bothzoli!
Walkthrough: Debugging a Parallel Application in Visual Studio (Visual Studio Docs)
In SQL Server Management Studio, when executing multiple SELECT statements, the record count in the bottom right hand corner shows the aggregate count for all queries. Click on a specific result set to see the individual count for a particular query.
When using OS X/macOS, remember that a lot of functionality might be hidden. Use the OPTION key to see additional functionality.

Episode source

DEV Community

CodingBlocks

97. Data Structures – (some) Trees

Sponsors

Survey Says

News

I am Groot

Why do trees matter?

Common uses of Trees

There are lots of Trees

Common terminology

Binary Tree

What is it?

How does it work?

Pros

Cons

When to use?

B-Trees

What is it?

How does it work?

Pros

Cons

When to use?

Resources We Like

Tip of the Week

CodingBlocks Follow

97. Data Structures – (some) Trees

Sponsors

Survey Says

News

I am Groot

Why do trees matter?

Common uses of Trees

There are lots of Trees

Common terminology

Binary Tree

What is it?

How does it work?

Pros

Cons

When to use?

B-Trees

What is it?

How does it work?

Pros

Cons

When to use?

Resources We Like

Tip of the Week

CodingBlocks