I've been trying to solve the Advent of Code day 2, part 2 exercise. The "simplest" algorithm I've come up with is the following (pseudocode, written as Python):

    def solve(file):
        # compare every pair of lines in the input
        for i in range(len(file)):
            for j in range(i + 1, len(file)):
                same = sameCharacters(file[i], file[j])
                # the words have equal length and differ in exactly one position
                if len(file[i]) == len(file[j]) == len(same) + 1:
                    return same
Here file contains the whole exercise input, file[i] returns line number i of the input, and sameCharacters returns the characters two words have in common when compared position by position, as explained by the exercise.
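For completeness, a minimal sameCharacters could look like this (my sketch; it assumes the two words have the same length, which holds for this puzzle's input):

    def sameCharacters(a, b):
        # keep the characters that match position by position
        return "".join(x for x, y in zip(a, b) if x == y)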
So, if the maximum length of a word is k and the number of lines in the file is n, this solution has complexity O(n^2 * k).
I've been trying to come up with a solution with a better worst-case complexity. I've thought of making a tree out of the words in the file, so that its root points to the first letter of each word, each node points to the next letter, and so on, and the leaves point to the index of the corresponding word in the file. So an array like [aaa, abb, baa] would become

          root
         /    \
        a      b
       / \     |
      a   b    a
      |   |    |
      a   b    a
      |   |    |
      0   1    2
Creating this tree would take O(n * k), and although I believe it could improve performance in practice, in the worst case one would still need to search the whole tree for each word, so the complexity stays O(n^2 * k). I haven't come up with anything better yet; does anyone have any idea?
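For reference, building the tree described above can be done with nested dicts (a sketch; the "$" key storing the word's index is just a convention I chose):

    def build_trie(words):
        root = {}
        for idx, word in enumerate(words):
            node = root
            for ch in word:
                node = node.setdefault(ch, {})  # walk or create one level per letter
            node["$"] = idx  # leaf: index of the word in the file
        return root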
Top comments (11)
I have also been wrapping my head around that.
I tried tinkering with the Rabin-Karp algorithm, but because we have to take the order of the characters into consideration, we would have to build a clever hash function that accounts for it. With that, we could maybe get it down to O(n).
I didn't try too hard, but the same thing @aspittel said also crossed my mind: the input size is small, so it doesn't really matter for getting the puzzle solved and moving on in AoC.
But just for curiosity's sake, I was planning to get back to it sometime later.
I have a JS implementation of the Rabin-Karp algorithm as one of the exercises from "Cracking the Coding Interview, 6th Ed"; here you go:
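The heart of Rabin-Karp is a polynomial rolling hash, where each character is weighted by a power of a base, so the order of the characters affects the result. A minimal sketch (the base and modulus are arbitrary choices of mine, not taken from that implementation):

    BASE = 257
    MOD = (1 << 61) - 1

    def poly_hash(s):
        # h = s[0]*BASE^(k-1) + s[1]*BASE^(k-2) + ... + s[k-1]  (mod MOD)
        h = 0
        for ch in s:
            h = (h * BASE + ord(ch)) % MOD
        return h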
Awesome! I will study this possibility later! I didn't know this algorithm; I'll get back here and tell you how I fared. The reason I am trying to make this better than O(n^2) is just that it seemed interesting, not some specific requirement of the problem; I imagine the input isn't big enough to justify all this effort.
Gotcha! It's great to challenge ourselves to see if we can go one step beyond. Please let me know if you find something out!
One limitation I just thought of with this algorithm is that it was originally designed to search for a contiguous pattern inside a bigger string. In our case, though, the differing character can be anywhere, so we might need to look at the whole string (which happens to have the same size as every other string), and hence maybe we won't be able to get down to O(n).
But maybe there's a way to create a very clever hash function that works regardless of this limitation, so we don't need to manually compare each character. I might be overlooking something, though.
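One possibility along those lines (my own sketch, not something from this thread): with prefix and suffix hashes precomputed, the hash of a word with any single position deleted can be combined in O(1), so all k "deletion hashes" of a k-character word cost only O(k) overall:

    BASE = 257
    MOD = (1 << 61) - 1

    def deletion_hashes(s):
        k = len(s)
        pw = [1] * (k + 1)  # pw[i] = BASE^i
        for i in range(1, k + 1):
            pw[i] = pw[i - 1] * BASE % MOD
        pre = [0] * (k + 1)  # pre[i] = hash of s[:i]
        for i, ch in enumerate(s):
            pre[i + 1] = (pre[i] * BASE + ord(ch)) % MOD
        suf = [0] * (k + 1)  # suf[i] = hash of s[i:]
        for i in range(k - 1, -1, -1):
            suf[i] = (ord(s[i]) * pw[k - 1 - i] + suf[i + 1]) % MOD
        # hash of s with position i removed, i.e. hash(s[:i] + s[i+1:])
        return [(pre[i] * pw[k - 1 - i] + suf[i + 1]) % MOD for i in range(k)]

Storing each (position, deletion hash) pair in a set while scanning the words would then find the matching pair in O(n * k) expected time.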
Thanks!
What you're looking for is a Levenshtein distance of 1. Most languages will have a library function for calculating the Levenshtein distance.
en.wikipedia.org/wiki/Levenshtein_...
Or go all hardcore algorithm on it and re-implement it yourself if that's your thing...
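Since all the strings in this puzzle have the same length, a Levenshtein distance of 1 reduces to exactly one substitution, which can be checked directly with an early exit (a sketch, not a general Levenshtein implementation):

    def differs_by_one(a, b):
        # equal-length strings: distance 1 means exactly one mismatching position
        mismatches = 0
        for x, y in zip(a, b):
            if x != y:
                mismatches += 1
                if mismatches > 1:
                    return False
        return mismatches == 1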
Correct me if I'm wrong, but this distance is measured between 2 strings, and we have n strings in this case. Also, none of the algorithms presented in that article are better than linear in the size of the string, so I don't think this by itself improves the complexity.
Sort the list of strings and loop through, comparing the Levenshtein distance of each string with the next. This works for this puzzle.
However, if the one-off strings don't happen to line up next to each other in sort order, you'd want to build an index in which the characters of each string are sorted, then sort the input based on that index. For example, the input string "zdfg" would have a sorted index of "dfgz". Then do the loop as above.
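One way to read that suggestion in code (a sketch that sorts once per column instead of building the sorted-character index): for each column, sort the words with that column deleted, so the two words that differ only there become adjacent:

    def find_pair_by_sorting(words):
        k = len(words[0])
        for col in range(k):
            # delete this column from every word, then sort; the matching
            # pair becomes adjacent and compares equal
            keyed = sorted(w[:col] + w[col + 1:] for w in words)
            for a, b in zip(keyed, keyed[1:]):
                if a == b:
                    return a  # the characters the pair has in common
        return None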
This is a great idea! If I understood correctly, this solution could be implemented in O(k^2 * n * log(n)): basically, you do one sort for each index (at most k of them, k being the length of the words), each costing O(k * n * log(n)) since comparing two strings takes O(k); after each of those k sorts you compare each string with its successor, which takes O(k * n) per pass, or O(k^2 * n) overall. Since we can assume k << n, this beats the O(n^2 * k) approach. Brilliant! I will implement this ASAP.

I tackled this one last night. For my solution, I created a set of lists where each list contained words that matched on at least 2 of the first 3 characters; some of the lists contained only 1 word, so they were eliminated. Then, for each list, I found everything that matched the first word on at least 3 of the first 4 characters, sorting non-matches into new lists. I could have been more efficient on subsequent searches by annotating each string with a boolean indicating whether it exactly matched the first member of the list so far (and therefore eliminating it as soon as 2 characters failed to match), but the approach did cut down my search space.
Python code for it is here, if you want to tinker : github.com/craignicol/adventofcode...
I think that with the tree/trie approach it would be hard, implementation-wise, to check the differences and skip levels. With an input of this size, the O(n^2) solution was totally fine and ran instantaneously for me.
As I said to @tiago, the purpose of this exercise is just fun.
I agree it would be hard implementation-wise, but I was thinking of a DFS that backtracks whenever the second differing character is found. Keeping track of the number of differing characters found at a given node would be hard, but not impossible.
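Something like this, perhaps (a sketch over the dict-based trie from the question, reusing the hypothetical "$" leaf convention; the mismatch count is passed down so the recursion backtracks on a second difference):

    def search_one_off(node, word, pos=0, mismatches=0):
        # returns the index of a word differing from `word` in exactly one position
        if pos == len(word):
            return node.get("$") if mismatches == 1 else None
        for ch, child in node.items():
            if ch == "$":
                continue
            m = mismatches + (ch != word[pos])
            if m <= 1:  # prune as soon as a second mismatch would occur
                found = search_one_off(child, word, pos + 1, m)
                if found is not None:
                    return found
        return None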
I have designed an O(nk) time algorithm.
I do hope someone implements it.
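For comparison, one well-known way into that ballpark (a sketch, not necessarily the algorithm designed above) keys a set by (column, word with that column deleted); two words that differ in exactly one position collide on exactly that key, giving O(n * k) set operations, or O(n * k^2) if you count the string slicing:

    def find_common_oneoff(words):
        # assumes no duplicate lines in the input, as in the AoC puzzle
        seen = set()
        for w in words:
            for i in range(len(w)):
                key = (i, w[:i] + w[i + 1:])  # column index + word minus that column
                if key in seen:
                    return w[:i] + w[i + 1:]  # the common characters of the pair
                seen.add(key)
        return None

Combined with O(1) deletion hashes like the ones sketched further up, the slicing cost goes away and the whole thing runs in O(n * k) expected time.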