Agustin V. Startari

AI Isn’t “Inspired” by Human Writing. It Is Built on Unpaid Intellectual Labor

Large language models do not only copy sentences. They absorb human knowledge, recombine it, and erase the trail of attribution.

Artificial intelligence companies often describe large language models as if they were “learning” from human writing in the same way a person learns from books, articles, code, essays, journalism, legal documents, and public conversations. That comparison is convenient, but it hides the central problem. Human learning takes place inside a culture of authorship, quotation, citation, responsibility, and criticism. If a person quotes a passage, they are expected to cite it. If they build on another author’s idea, they are expected to acknowledge it. If they copy without attribution, they can be accused of plagiarism.

Large language models operate under a different structure. They are trained on massive collections of human writing and transform that material into predictive capacity. Academic papers, books, code repositories, technical manuals, legal documents, blogs, forums, comments, and journalistic work become part of the statistical infrastructure that allows the model to generate new text. The final output may look original because it does not copy a single source word for word. But the absence of visible copying does not mean the absence of intellectual debt.

This is the argument behind my paper Plagiarism Ex Machina: Structural Appropriation in Large Language Models. The article does not treat AI plagiarism as a simple question of copied sentences. It argues that the real problem is deeper: large language models can absorb human-made intellectual structures, recombine them, and produce fluent outputs without showing where the underlying knowledge came from. This is what I call structural appropriation.

Structural appropriation means that the appropriated object is not only a sentence or paragraph. It may be a concept, an argument, a definition, a legal reasoning pattern, a coding solution, an academic tone, a journalistic frame, a taxonomy, or a way of explaining a problem. The model does not need to copy a paragraph in order to benefit from the labor that created these structures. It only needs to transform them enough that the source disappears.

That is why the language of “inspiration” fails. A poet may be inspired by another poet, but that inspiration exists within a recognizable human culture of influence, authorship, critique, and acknowledgment. A model does not participate in that culture. It does not read, remember, interpret, or acknowledge in the human sense. It absorbs statistical relations from enormous corpora and turns them into a product. That product is then sold through subscriptions, APIs, enterprise tools, writing assistants, coding assistants, search systems, and productivity platforms. Human language becomes machine capacity. Machine capacity becomes platform revenue.

The original writers usually disappear from that process. Their work may have helped train the system, but the output does not name them. Their style may have shaped the model’s fluency, but the interface presents the answer as if it came from the machine. Their concepts may have contributed to the generated explanation, but there is no visible source map. There is no intellectual debt record. There is no proportional compensation. The result is an asymmetry: the system can use the labor, the platform can monetize the output, and the originator becomes invisible.

**Why This Matters**

The public debate about AI and plagiarism is still too narrow. Most people ask whether the model copied a sentence. That question matters, but it does not reach the deeper problem. A model can avoid direct duplication and still depend on the work of countless human authors. It can generate a paragraph that passes a plagiarism detector, contains no obvious overlap with any known source, and still be built from patterns extracted from human writing.

This is not ordinary plagiarism. Classical plagiarism is easy to imagine: someone copies a paragraph from an article, removes the author’s name, and presents it as their own. That form of plagiarism depends on visible textual overlap. AI introduces a different structure. A model may absorb thousands of texts on a topic, learn the common patterns of explanation, reproduce the tone, generate a new version, and present it as original output. No single paragraph is copied. No single author can be identified. No plagiarism detector flags it. But the output still depends on intellectual labor that was not credited.

This is recombinative plagiarism. It works through transformation rather than duplication. The model takes patterns from many sources, reorganizes them, and produces a text that appears new on the surface. The more advanced the system, the better it may become at hiding the debt. A weaker model might copy visibly. A stronger model can appropriate structurally. That is the paradox: better generation can mean less detectable plagiarism, not less appropriation.

The key mistake is confusing originality with independence. AI-generated text often looks original because it does not match existing text. But originality cannot mean only “no identical source was found.” A paragraph can be unique in wording and still derivative in structure. An argument can be freshly phrased and still be assembled from prior human work. A definition can sound new while depending on conceptual labor already performed elsewhere. True originality requires accountable transformation. A serious author can explain what they read, what they borrowed, what they changed, what they rejected, and where their own contribution begins. Most AI systems cannot do that.

**Clear Examples**

A coding assistant may generate a solution that does not copy any repository line by line, but still depends on patterns learned from open-source communities, documentation, issue threads, and developer forums. The generated code appears new. The labor that made it possible remains invisible.

A legal AI tool may produce a polished memo that sounds like professional legal reasoning. It may follow recognizable doctrinal structures, use formal legal phrasing, and imitate the organization of prior legal analysis. But if the system does not show which legal texts, briefs, treatises, or commentaries shaped the output, the user receives legal fluency without clear provenance.

A journalist may use AI to generate background context for a story. The text may not copy a specific article, but it may compress years of reporting into a generic explanation. The original reporting labor disappears, while the machine output looks like neutral background knowledge.

An academic writer may ask for a literature review. The model may produce a smooth overview, complete with plausible structure and citations. But the citations may be added after the fact. They may support the claims, but they do not prove that the generated argument actually came from those sources. This creates the illusion of accountability. The text looks scholarly, but its intellectual lineage remains unclear.

These examples show why the problem is not limited to copyright. Copyright asks whether protected expression was copied. Structural appropriation asks a different question: did the system convert prior human intellectual labor into a new output while making that labor impossible to trace?

**The Missing Concept: Provenance**

The core problem is provenance. Provenance means knowing where something came from. In academic writing, provenance appears through citations. In journalism, it appears through sourcing. In law, it appears through traceable authorities, statutes, cases, and reasoning. In software, it appears through repositories, licenses, commits, authorship, and documentation.

Large language models weaken provenance because they generate from absorbed patterns without exposing the source chain. A model may provide citations if asked, but that does not fully solve the problem. A source added after generation is not necessarily the real source of the generated idea. It may be relevant. It may support the claim. It may look academic. But it does not prove that the model’s output actually came from that source.

This is why AI disclosure rules are too shallow. Saying “AI was used” identifies the tool. It does not identify the sources. It does not tell the reader which parts were generated, whether the claims were verified, whether the citations were real sources or added later, or whether the structure of the argument came from human research or model recombination. AI disclosure answers one question: was a machine involved? The deeper question is different: what human labor made this output possible?

That is why AI systems need generative provenance. Generative provenance would not require perfect attribution, because even human citation is never perfect. But it would require enough traceability to make generated outputs auditable. At a minimum, systems should distinguish between:

- sources actually retrieved during generation
- sources added afterward to support a claim
- user-provided documents
- unsupported model synthesis
- probable domain influence
- high-risk similarity to known works
- AI-generated sections
- human-authored revisions
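To see that this taxonomy is representable in practice, here is a minimal sketch in Python. It is a thought experiment, not a proposal from the paper: every class name, field name, and value below is an illustrative assumption, and the offsets and notes in the example are invented for demonstration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProvenanceClass(Enum):
    """Categories a generative-provenance record would distinguish,
    following the taxonomy above. All names are illustrative."""
    RETRIEVED_DURING_GENERATION = "retrieved"       # consulted at generation time
    CITED_POST_HOC = "post_hoc"                     # citation attached after generation
    USER_PROVIDED = "user_provided"                 # document supplied by the user
    UNSUPPORTED_SYNTHESIS = "synthesis"             # no traceable source at all
    PROBABLE_DOMAIN_INFLUENCE = "domain_influence"  # likely shaped by training data
    HIGH_RISK_SIMILARITY = "high_similarity"        # close to a known work
    AI_GENERATED = "ai_generated"                   # span produced by the model
    HUMAN_REVISION = "human_revision"               # span rewritten by a human


@dataclass
class ProvenanceRecord:
    """One auditable claim about where a span of generated text came from."""
    span_start: int                    # character offset where the span begins
    span_end: int                      # character offset where the span ends
    provenance: ProvenanceClass        # which category the span falls into
    source_ref: Optional[str] = None   # DOI, URL, or repository path, when one exists
    note: str = ""                     # free-text justification for the label


# Example: a generated paragraph whose citation was attached during editing
# is labeled honestly, rather than presented as if the citation were the
# source the text was generated from.
records = [
    ProvenanceRecord(0, 420, ProvenanceClass.UNSUPPORTED_SYNTHESIS,
                     note="fluent overview, no retrieval trace"),
    ProvenanceRecord(0, 420, ProvenanceClass.CITED_POST_HOC,
                     source_ref="https://doi.org/10.2139/ssrn.5494528",
                     note="citation added after the fact"),
]

for record in records:
    print(record.provenance.value, record.source_ref, record.note)
```

The point of the sketch is only that the distinctions in the list above can be recorded: a span labeled “citation added after the fact” makes the gap in intellectual debt visible instead of erasing it.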

This would not solve every problem. But it would prevent the total disappearance of intellectual debt.

**The Real Question**

The public conversation keeps asking whether AI can create. That is not the strongest question. The stronger question is what AI already took in order to appear creative.

Large language models are not “inspired” by human writing in the ordinary human sense. They are built on it. They absorb it, recombine it, and sell access to the capacity produced from it. Most of the time, they do not show the names of the people whose labor made the system possible.

That is why the AI plagiarism debate must move beyond copied sentences. The real issue is unpaid intellectual labor converted into synthetic originality. The machine writes because humans wrote first. The ethical question is whether the machine, and the companies behind it, will ever be forced to remember that.

**Read More**

This article is based on the working paper:

Plagiarism Ex Machina: Structural Appropriation in Large Language Models
Agustin V. Startari

Related research:

Citation by Completion: LLM Writing Aids and the Redistribution of Academic Credits
Zenodo: https://doi.org/10.5281/zenodo.17287506

Borrowed Voices, Shared Debt: Plagiarism, Idea Recombination, and the Knowledge Commons in Large Language Models
SSRN: https://doi.org/10.2139/ssrn.5494528

Author page:
SSRN: https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=7639915

Zenodo:

Personal website: https://www.agustinvstartari.com/

ORCID: https://orcid.org/0009-0001-4714-6539

ResearcherID: K-5792-2016

**About the Author**

Agustin V. Startari is a linguistic theorist and researcher in historical studies. His work examines how artificial intelligence systems, institutional language, and predictive syntax reshape authority, authorship, legitimacy, and accountability in contemporary knowledge systems. He is the author of Grammars of Power, Executable Power, and The Grammar of Objectivity.

**Ethos**

I do not use artificial intelligence to write what I don’t know. I use it to challenge what I do. I write to reclaim the voice in an age of automated neutrality. My work is not outsourced. It is authored.

— Agustin V. Startari.
