Skip to content
loading...

re: Levenshtein Distance (Part 2: Gotta Go Fast) VIEW POST

FULL DISCUSSION
 

you can use frontmatter series: [some name] to automatically generate a series listing on each post that includes all of the others. Please make sure that you apply it to the articles in the same order that you want them to appear.

alternative method would include using the {% post https://dev.to/turnerj/levenshtein-distance-part-1-what-is-it-4gbd %} that displays like this:

As a fun consideration for the implementation, instead of having a 2N array that you use you could us a 1D array with length 2N, which isn't any more memory efficient, but it IS more fun when you can iterate from 0 to [NxM+1] while referencing the index at i%(2N) much like a ring buffer. You just need to know that the cell to you left is i-1, above is i-N and diagonal is i-N-1. Again, while it isn't any more efficient, I think that sometimes it is just fun to find new ways to iterate over things.

 

Thanks! I knew about the series listing functionality but totally spaced on it when publishing the articles. I've set that up now.

Yeah, the 1D array form of a matrix (what I've seen called a "dense matrix") is pretty interesting. When I started out trying to get performance improvements, I thought there might actually be one there.

I did some benchmarks in C# on it previously with mixed results - small strings it performed better, medium-sized strings it performed worse and long strings it performed better again (by nearly 25%). There is probably only two problems with a "dense matrix": it would need to allocate all the data in a row uninterrupted (a problem if memory is fragmented - there might be enough total memory available but not enough contiguous memory available) and on VERY long strings (40-ish thousand characters) you might overflow the index value (in my case, a 32-bit signed Int) though that would only be hit in a full matrix - a 1D array of 2N length would be fine 🙂.

code of conduct - report abuse