DEV Community

William McGonagle


🤪 Tell me how to build a duplicate detection system!

I'm working on a linter right now, and one of the requested features is code duplication detection. I've already opened an issue for it, but now I need to actually start working on it, and that's where my question comes in.

One option is to detect duplication based on plain text. This is how most systems work, because it is the simpler of the two options. But it is also the more failure-prone one. For instance, if the exact same code appeared in two places but one copy had a comment in the middle, the system would not register it as a duplicate.
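To make the plain-text weakness concrete, here is a minimal sketch of line-based detection: it slides a window of consecutive non-blank, trimmed lines over each file and reports any window that occurs more than once. `findTextDuplicates` and `DuplicateLocation` are hypothetical names, not part of any existing linter, and the window size of 3 is an arbitrary assumption.

```typescript
// Hypothetical helper, not part of any existing linter: reports runs of
// `windowSize` consecutive non-blank lines that occur in more than one place.
// Lines are only trimmed; comments are deliberately left in.

interface DuplicateLocation {
  file: string;
  line: number; // 1-based line where the duplicated run starts
}

function findTextDuplicates(
  files: Record<string, string>,
  windowSize = 3,
): Map<string, DuplicateLocation[]> {
  const seen = new Map<string, DuplicateLocation[]>();
  Object.keys(files).forEach((file) => {
    // Trim each line and drop blank ones, but remember original line numbers.
    const lines = files[file]
      .split("\n")
      .map((text, index) => ({ text: text.trim(), line: index + 1 }))
      .filter((entry) => entry.text.length > 0);
    // Slide a window over the remaining lines; the joined text is the lookup key.
    for (let i = 0; i + windowSize <= lines.length; i++) {
      const key = lines.slice(i, i + windowSize).map((entry) => entry.text).join("\n");
      const locations = seen.get(key) ?? [];
      locations.push({ file, line: lines[i].line });
      seen.set(key, locations);
    }
  });
  // Only windows seen at two or more locations count as duplicates.
  const duplicates = new Map<string, DuplicateLocation[]>();
  seen.forEach((locations, key) => {
    if (locations.length > 1) duplicates.set(key, locations);
  });
  return duplicates;
}
```

Two identical copies of a three-line function body produce matches, but inserting a single `// ...` line into one copy breaks every window that spans it, which is exactly the failure described above. A real implementation would probably hash each window instead of using the joined text as the key.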

Alternatively, I could use an abstract syntax tree (AST) to detect duplication. But there's another problem there: what is the most lightweight, all-around best JavaScript parser out there? I'm planning on using the Babel parser, but I'm already running into a problem because it doesn't handle comments the way I would like.
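One way around the comment problem is to not care where comments end up: for duplicate detection they can simply be ignored. A minimal sketch, assuming an ESTree/Babel-style tree where comments hang off nodes as `leadingComments`, `trailingComments`, or `innerComments` and positions live in `start`, `end`, and `loc`: build a canonical string for each subtree without those keys, then group identical ones. `findAstDuplicates`, `fingerprint`, and `IGNORED_KEYS` are hypothetical names, and actually producing the tree (for example with the Babel parser) is left out.

```typescript
// Sketch only: assumes ESTree/Babel-style nodes (objects with a string `type`).
// Producing the tree, e.g. with the Babel parser, is outside this snippet.

type AstNode = { type: string; [key: string]: unknown };

// Keys that hold positions or comments rather than program structure.
const IGNORED_KEYS = new Set<string>([
  "start", "end", "loc", "range", "extra",
  "leadingComments", "trailingComments", "innerComments", "comments",
]);

function isAstNode(value: unknown): value is AstNode {
  return typeof value === "object" && value !== null &&
    typeof (value as { type?: unknown }).type === "string";
}

// Canonical string for a subtree: keys sorted, ignored keys dropped.
function fingerprint(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => fingerprint(item)).join(",")}]`;
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).filter((key) => !IGNORED_KEYS.has(key)).sort();
    return `{${keys.map((key) => `${key}:${fingerprint(record[key])}`).join(",")}}`;
  }
  return value === undefined ? "undefined" : JSON.stringify(value);
}

// Group every subtree whose `type` is in `types` by its fingerprint;
// groups with two or more members are structural duplicates.
function findAstDuplicates(root: AstNode, types: string[]): AstNode[][] {
  const wanted = new Set(types);
  const groups = new Map<string, AstNode[]>();
  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      value.forEach((item) => visit(item));
      return;
    }
    if (!isAstNode(value)) return;
    const node: AstNode = value;
    if (wanted.has(node.type)) {
      const key = fingerprint(node);
      const group = groups.get(key) ?? [];
      group.push(node);
      groups.set(key, group);
    }
    // Recurse into children, skipping position and comment metadata.
    Object.keys(node).forEach((key) => {
      if (!IGNORED_KEYS.has(key)) visit(node[key]);
    });
  };
  visit(root);
  const duplicates: AstNode[][] = [];
  groups.forEach((group) => {
    if (group.length > 1) duplicates.push(group);
  });
  return duplicates;
}
```

Because identifier names are part of the fingerprint, renamed copies will not match; dropping `name` from the fingerprint would catch those too, at the cost of more false positives. Restricting `types` to larger units such as `FunctionDeclaration` or `BlockStatement` keeps the report focused on duplicates that are worth refactoring.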

So, if you have an opinion on what I should do, please leave a comment below. Also, please star the project and contribute if you have time. That would be amazing, and thank you so much!

Top comments (3)

shrihankp

Let me add something: if comments are a roadblock for plain-text deduplication, why not ignore them? For example, ignore lines starting with // and blocks surrounded by /* */.
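The suggestion above can be sketched as a small scanner. A naive regex would also cut into string contents such as `"http://example.com"`, so this sketch copies quoted strings through unchanged. It rests on stated assumptions: `stripComments` is a hypothetical name, and regex literals and `${...}` expressions inside template literals are not handled.

```typescript
// Hypothetical helper: removes // and /* */ comments from JavaScript source.
// Quoted strings ('...', "...", `...`) are copied verbatim so that text such
// as "http://example.com" survives. Not handled (assumption): regex literals
// and ${...} expressions inside template literals.
function stripComments(source: string): string {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: skip up to, but not including, the newline.
      while (i < source.length && source[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip to the closing */ but keep its newlines,
      // so line numbers in later reports still match the original file.
      i += 2;
      while (i < source.length && !(source[i] === "*" && source[i + 1] === "/")) {
        if (source[i] === "\n") out += "\n";
        i++;
      }
      i += 2; // step over the closing */
    } else if (ch === "'" || ch === '"' || ch === "`") {
      // String literal: copy it unchanged, keeping escapes like \' intact.
      out += ch;
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\" && i + 1 < source.length) {
          out += source[i] + source[i + 1];
          i += 2;
        } else {
          out += source[i];
          i++;
        }
      }
      if (i < source.length) {
        out += ch; // closing quote
        i++;
      }
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```

Running each file through a pass like this before a line-based comparison would let a commented and an uncommented copy of the same code match, since a comment-only line becomes blank and can be skipped.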

cgifl300

I use WebStorm, and I would be glad to have a feature like that in an open-source linter.
Hold the lines!

William McGonagle

Don't forget to leave a comment with your opinion, and if you would like to know what we decide to do, follow me on