DEV Community

Discussion on: How do I implement a plagiarism detector completely from scratch and it has to be fast as well?

Jeremy Friesen

What's the corpus of existing works you're considering for checking submissions?

SaptakBhoumik

I have that in a database built from many web pages.

Jeremy Friesen

Perhaps this has some insights? gipp.com/wp-content/papercite-data...

tl;dr: brace yourself for set theory, data indexing, and statistics.
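
To make the set theory and indexing part a bit more concrete, here is a minimal sketch of one common approach: split each text into overlapping word triples ("shingles"), keep an inverted index from shingle to document, and score overlap with Jaccard similarity. This is an assumption on my part about a reasonable starting point, not necessarily the method in the linked paper. The names `shingles`, `jaccard`, `build_index`, and `candidates`, the shingle size `k=3`, and the `0.2` threshold are all hypothetical choices for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_index(corpus, k=3):
    """Inverted index: shingle -> set of document ids containing it.

    `corpus` is assumed to be a dict of {doc_id: text}, e.g. loaded from a database.
    """
    index = {}
    for doc_id, text in corpus.items():
        for sh in shingles(text, k):
            index.setdefault(sh, set()).add(doc_id)
    return index


def candidates(submission, index, corpus, k=3, threshold=0.2):
    """Return (doc_id, score) pairs at or above threshold, best match first."""
    sub = shingles(submission, k)
    # Only documents sharing at least one shingle are scored; this is what
    # keeps the check fast compared with comparing against every document.
    doc_ids = set()
    for sh in sub:
        doc_ids |= index.get(sh, set())
    scored = [(d, jaccard(sub, shingles(corpus[d], k))) for d in doc_ids]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


if __name__ == "__main__":
    corpus = {
        "a": "The quick brown fox jumps over the lazy dog",
        "b": "Completely unrelated text about databases and indexes",
    }
    index = build_index(corpus)
    print(candidates("The quick brown fox jumps over a sleeping cat", index, corpus))
```

For a large corpus you would probably store precomputed shingle sets (or hashes of them) instead of re-shingling each candidate, and the statistics part comes in when choosing a threshold that separates real copying from common phrases.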

SaptakBhoumik

Thanks :)