πΉ Problem: 1948 Delete Duplicate Folders in System
Difficulty: #Damn
Tags: #Trie, #DFS, #StringSerialization, #Hashing, #Tree, #Map, #FolderStructure
π Problem Summary
You're given a list of folder paths in your file system. If two subtrees (aka folders and all their descendants) have the same structure and names, you must delete both of them. Return the folder structure excluding the duplicates.
Itβs like trying to find evil clone folders in your OS and mercilessly deleting them... even if they just wanted to live.
π§ My Thought Process
Brute Force Idea:
"I can probably use a hashmap to count folders?" β me, before entering the abyss.Optimized Strategy (According to the solution):
β Build a Trie.
β Serialize every subtree using post-order DFS into a weird string-based signature likea(b(c()))
.
β Count how many times each signature appears.
β If any signature appears more than once, nuke the whole subtree from orbit.
β Return what's left.
π€― At this point I had 4 tabs open, a cup of tea, 3 brain cells left, and tears in my eyes.Algorithm Used:
Trie
+Post-order DFS
+Signature Hashing
+My Soul Leaving My Body
βοΈ Code Implementation (Python)
from collections import defaultdict
class Node:
def __init__(self, name):
self.name = name
self.children = {}
self.signature = ""
class Solution:
def deleteDuplicateFolder(self, paths):
root = Node("")
for path in paths:
node = root
for folder in path:
if folder not in node.children:
node.children[folder] = Node(folder)
node = node.children[folder]
signature_count = defaultdict(int)
# Post-order DFS to generate signatures
def dfs(node):
if not node.children:
node.signature = ""
return ""
child_signatures = []
for name, child in sorted(node.children.items()):
child_signature = dfs(child)
child_signatures.append(f"{name}({child_signature})")
node.signature = "".join(child_signatures)
signature_count[node.signature] += 1
return node.signature
dfs(root)
result = []
current_path = []
# Filter out duplicated structures
def dfs2(node):
if node.children and signature_count[node.signature] >= 2:
return
current_path.append(node.name)
result.append(current_path.copy())
for name, child in sorted(node.children.items()):
dfs2(child)
current_path.pop()
for name, child in sorted(root.children.items()):
dfs2(child)
return result
β±οΈ Time & Space Complexity
-
Time:
O(N * L * log L)
N = number of paths, L = average folder length. Thelog L
is from sorting children. -
Space:
O(N * L)
Because apparently we love wasting memory with trees, maps, and strings.
π§© Key Takeaways
- β Learned that "folder duplication detection" is a valid algorithmic problem, and not a virus scanner.
- π‘ The trick is to turn a subtree into a unique serial string β like generating a DNA for each subtree β and match them.
- π Next time I see a problem that smells like a filesystem, Iβll just light a candle, take a deep breath, and prepare for a DFS + Trie battle.
π Reflection (Self-Check)
- [ ] Could I solve this without help? β Not even in an alternate timeline where I was raised by Trie monks.
- [ ] Did I write code from scratch? β I wrote tears.
- [ ] Did I understand why it works? β οΈ Eventuallyβ¦ after sacrificing an hour and part of my ego.
- [ ] Will I be able to recall this in a week? β If I tattoo the signature logic on my hand.
π Related Problems
- [[609 Find Duplicate File in System]]
- [[652 Find Duplicate Subtrees]]
π Progress Tracker
Metric | Value |
---|---|
Day | 49 |
Total Problems Solved | 390 |
Confidence Today | π£ |
Leetcode Rating | 1572 |
Top comments (0)