DEV Community

Someone Stole My DEV Article! How To Build a Python Script to Detect Stolen Content

Jesse Smith Byers on September 26, 2020

Last spring, one of my blog posts got a lot of views and likes, and was ultimately featured on the Top 7 List for that week. As a new blogger, I wa...
 
Michael Tharrington

This is seriously AWESOME!

I work as DEV's Community Manager and would love to figure out how we might be able to use this to find potential plagiarism on DEV. I'm a non-developer, so bear with me here... 😅

I'm wondering if this could be adjusted so that an admin wouldn't need to search for the title of an article, but instead the tool would bubble up a list of DEV articles that are duplicates or near duplicates based on what's posted in the title and body. Then an admin could review the material and investigate whether it is indeed plagiarism or not.

I may be thinking of something entirely different here, but you have definitely captured my interest with this. Thank you for sharing!
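
One rough way to picture the "bubble up near duplicates" idea is to compare every pair of articles on title plus body and flag the pairs that score above a cutoff. The sketch below is only an illustration: the `find_near_duplicates` helper, the `threshold` value, and the `id`/`title`/`body` keys are all assumptions made for the example, not the article's script or the DEV API's actual schema. It uses `difflib.SequenceMatcher` from the Python standard library for the similarity score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_near_duplicates(articles, threshold=0.8):
    """Return (id_a, id_b, score) for every pair at or above the threshold.

    `articles` is a list of dicts with hypothetical "id", "title", and
    "body" keys; this shape is assumed for the example only.
    """
    flagged = []
    for a, b in combinations(articles, 2):
        text_a = a["title"] + "\n" + a["body"]
        text_b = b["title"] + "\n" + b["body"]
        score = similarity(text_a, text_b)
        if score >= threshold:
            flagged.append((a["id"], b["id"], round(score, 2)))
    # Highest-scoring pairs first, so a reviewer sees the likeliest copies on top.
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)


# Made-up sample data for illustration.
articles = [
    {"id": 1, "title": "Detect Stolen Content",
     "body": "Use a Python script to search for copies of your article title."},
    {"id": 2, "title": "Detect Stolen Content",
     "body": "Use a Python script to search for copies of your article title!!"},
    {"id": 3, "title": "Growing Tomatoes",
     "body": "Water them every morning and keep the pots in full sun."},
]

for id_a, id_b, score in find_near_duplicates(articles):
    print(f"Articles {id_a} and {id_b} look similar (score {score})")
```

Comparing every pair grows quadratically with the number of articles, so a real tool would likely need a cheaper first pass before a human review step; a flagged pair is a prompt to investigate, not proof of plagiarism.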

Jesse Smith Byers

OK, I don't think I can send you a DM unless you follow me!

Michael Tharrington

Good stuff! Just followed you. Thought I was already, but now I am for sure. 🙂

Jesse Smith Byers

Thank you for the feedback! This could definitely be tweaked for something like that. I’ll send you a DM so I can better understand the problem you’re aiming to solve.

Sandor Dargo

Thanks a lot for sharing! This is so great that I'm gonna steal it!

Jesse Smith Byers

Uh oh, I guess I better run the script!

xtofl • Edited

Very nice to integrate these APIs into a neat tool.

Btw, have you learned about list comprehensions? They can make your Python more expressive.

titles = [art["title"] for art in data]
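
For readers new to the idiom, here is a small sketch comparing an explicit loop, a list comprehension, and a generator expression. The sample `data` list and the `titles_loop` and `titles_gen` names are made up for illustration; in the script, `data` would presumably be the list of article dictionaries parsed from an API response.

```python
# Hypothetical sample; the real `data` would come from an API response.
data = [
    {"id": 1, "title": "Someone Stole My DEV Article!"},
    {"id": 2, "title": "Learning Python"},
]

# Loop version: build the list one append at a time.
titles_loop = []
for art in data:
    titles_loop.append(art["title"])

# List comprehension: the same list in a single expression.
titles = [art["title"] for art in data]

# Parentheses instead of brackets create a lazy generator,
# which yields each title on demand and can only be consumed once.
titles_gen = (art["title"] for art in data)
```

The bracketed form is the one to reach for when the titles need to be indexed, counted, or iterated more than once; the parenthesized form suits a single pass over a large result.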
 
Rajan Panchal • Edited

Great post! Stolen articles are always a pain. How about extending it to report to DMCA using their API?

Jesse Smith Byers

Thank you, what a great idea! I’ll definitely take a look at that API and see how I could build that in.

Alwina Oyewoleturner

This is a great post! I’m just learning Python and this is a great tutorial to create my own search script. Thank you!

Jesse Smith Byers

I’m fairly new to Python as well, and this was a perfect intro project for me too. What resources have you been using to learn Python so far? Any recommendations?

Alwina Oyewoleturner

Check out JetBrains Academy's Python Developer track. It's free until January 2021. JetBrains created the IntelliJ IDE for Java, and they have a few tracks for learning programming. You can complete the code challenges online or with the PyCharm IDE (the Python version of IntelliJ). If you already have IntelliJ (community or paid version), install the EduTools plugin so you can learn there. I really like this way to learn! Thank you again for the post!

Raymond

I find this a good use of Python's capabilities: finding when articles are copied. I've also been working on a similar project where I'm checking a website for randomly copied code.