DEV Community

Rob OLeary

Originally published at roboleary.net

Content duplication and autoblogging AI bots

I was reviewing the SEO of my website and noticed a bot reposting my content verbatim in an egregious way. 😥

My last article -- VS Code Profiles - Manage configurations easily for different environments and workflows -- was reposted at https://www.codersjungle.com/2023/03/31/vs-code-profiles-manage-configurations-easily-for-different-environments-and-workflows/.

The canonical URL of the duplicated article is self-referential. In other words, the copy declares itself to be the original source. They want to claim ownership of the content for themselves! They want a boost in traffic from SEO. No surprise there!
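For anyone who wants to check this on a copy of their own article, here is a minimal sketch in Python of how it could be done. It assumes you have already saved the page's HTML as a string, and it only looks at the `<link rel="canonical">` tag. The helper names (`find_canonical`, `same_page`, `is_self_referential`) and the example URLs are made up for illustration, and the URL comparison is deliberately loose.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values; a missing value is None
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def same_page(url_a, url_b):
    """Loose comparison (an assumption): same host and path, ignoring
    scheme, query string, and trailing slash."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.netloc.lower(), a.path.rstrip("/")) == (b.netloc.lower(), b.path.rstrip("/"))


def is_self_referential(page_url, html):
    """True if the page names itself as the canonical source."""
    canonical = find_canonical(html)
    return canonical is not None and same_page(page_url, canonical)


# Hypothetical example: a copy that points the canonical URL at itself
copy_url = "https://copy.example/my-article/"
copy_html = '<head><link rel="canonical" href="https://copy.example/my-article"></head>'
print(is_self_referential(copy_url, copy_html))
```

A well-behaved repost would instead carry a canonical URL that points back to the original article, in which case `is_self_referential` returns `False`.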

The cherry on top is at the bottom of the article: it features an embedded YouTube video for a product called CYBERSEO PRO AI, a ChatGPT autoblogging WordPress plugin! 🤦‍♂️

I know that aggregation bots have been around for ages, and that they are hard to avoid, BUT is there a way to fight it without making my content less accessible?

I think it is only possible to add friction. If you make it harder, maybe they will stop.

On further inspection of the duplicated article, I noticed it includes the text "Source: https://dev.to/robole/vs-code-custom-workflows-and-project-configuration-with-profiles-2ib6" at the end. Why bother?

I cross-post to https://dev.to. Maybe the bot is using the Dev.to API (Forem API), or maybe it scrapes the page. The images are hotlinked to the dev.to files.

I will decide later whether to accept this as a fact of life or to reconsider syndication of my content.
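If you want to see where a copied page loads its images from, a rough sketch like the following can list the external image hosts. It assumes the page's HTML is already available as a string. The function name `hotlinked_hosts` and the hosts in the example are placeholders, not the real hosts involved here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageHostCollector(HTMLParser):
    """Collects the host of every <img> tag that has an absolute src URL."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            host = urlsplit(src).netloc.lower()
            if host:  # relative URLs have no host and are served by the page itself
                self.hosts.append(host)


def hotlinked_hosts(html, page_host):
    """Return the sorted set of image hosts that differ from the page's own host."""
    collector = ImageHostCollector()
    collector.feed(html)
    return sorted({h for h in collector.hosts if h != page_host.lower()})


# Hypothetical example page with one local image and one hotlinked image
sample_html = (
    '<img src="/images/logo.png">'
    '<img src="https://cdn.original.example/cover.png">'
)
print(hotlinked_hosts(sample_html, "copy.example"))
```

Any host in the result that belongs to the platform you publish on suggests the images were never copied, only referenced.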

Autoblogging - time to re-evaluate?

The CyberSEO autoblogging plugin got me thinking.

The description of the video (I won't link to it) says:

OpenAI GPT-3.5 has breathed new life into autoblogging. See how easy it is to make your own site, which is automatically populated with unique high-quality articles written by OpenAI artificial intelligence and illustrated images of any style created by DALL·E 2. With CyberSEO Pro plugin and OpenAI technologies (the plugin allows you to use the same GPT Turbo model as used by ChatGPT) you can populate your sites with high-quality content in any language and with any preset keywords on full autopilot!

Autopilot websites! 🤨

This type of application offers a more advanced type of content duplication than before. It appears that you can use it to feed other people's content into a funnel, and use that content as a prompt for an AI model to generate similar text and a new cover image. That content could then be unique enough to be seen as a different source by search engines.

A person does not need to come up with a topic to blog about, or write anything. They merely have to find people who are active bloggers in a particular area, and feed the plugin.

This is an area I am largely ignorant of. I really hope this is not something that happens.

It is a different kind of spam. I think it is good to re-evaluate things going forward.

Message to Coder's Jungle / CyberSEO

How many fingers am I holding up?

🖕🖕

Top comments (3)

Pierre-Yves Landuré

To temper your message a little: I found your dev.to profile via codersjungle.com and I now follow it. Going further, I am in favor of content duplication (or rewriting) with source citation, since it allows content to stay online if a service such as dev.to disappears in the future.

I love the concept of planets such as Dagobah or Planet Venus or, for more privacy, Wallabag.

If the content is used with a clear link to the original content, it gives visibility to the author and prevents useful content from disappearing too easily in the future.

What is really hurtful is content rewritten with paraphrases and synonyms and posted without a link to the source post.

Rob OLeary • Edited

Thanks for sharing your perspective, Pierre.

I think that there are some differing interpretations of content ownership. For me, it is simple: I do not use other people's content unless it has a copyleft license attached to it, or I have gotten permission from the creator. That article was posted first on my website with a copyright notice in the footer. I think reposting/cross-posting content to other platforms can lead people to alternative conclusions about who owns the content.

If content is duplicated and the canonical URL does not point to the original, it is for the gain of the person who duplicated it. They are effectively claiming ownership. This can affect a page's ranking, which can make it harder for people to find my content through a search engine.

In this case, there was inline attribution and you found my article. That's good. However, that upside is likely to be short term. Typically, beyond a few weeks people are more likely to find content through search engines. In the long run, the effect is negative for me in this scenario.

Also, in this case I don't want my content to drive traffic to a product that I do not support.

Offline reading is someone consuming the content in another form; that is fair usage.

Duplication for archiving or preservation is a more complicated topic.

Christopher Glikpo ⭐

Content duplication and autoblogging AI bots can be problematic for several reasons.

Firstly, content duplication can harm the search engine rankings of a website, as search engines may penalize sites that have duplicate content. This can result in lower visibility in search engine results pages and reduced traffic to the site.

Secondly, if an autoblogging AI bot is used to automatically scrape and publish content from other websites, it can violate copyright laws and result in legal action. This can damage the reputation of the site and the organization behind it.

Furthermore, autoblogging AI bots can also harm the credibility of a website. If users see the same content published on multiple sites, they may question the authenticity and originality of the content, and ultimately lose trust in the website and its content.

To avoid these issues, it's important to create high-quality, original content that provides value to the audience. This can help to establish a website's reputation as a trusted source of information and improve its search engine rankings. Additionally, it's important to ensure that any content that is sourced from other sites is properly attributed and authorized for use, and that any use of automated tools complies with applicable laws and regulations.