Sattyam Jain

Unveiling the Threat: How We Discovered the Vulnerability in LLM Supply Chain

Introduction:

Large Language Models (LLMs) have revolutionized the AI landscape, but their widespread adoption raises concerns about model provenance and the potential dissemination of fake news. In this article, we shed light on a critical issue by demonstrating how a lobotomized LLM, known as PoisonGPT, was concealed on Hugging Face, enabling the spread of misinformation without detection. Our intention is to raise awareness and emphasize the importance of a secure LLM supply chain to ensure AI safety.

Context:

The increasing popularity of LLMs has led to a reliance on pre-trained models, creating a potential risk of deploying malicious models for various applications. Determining the provenance of these models, including the data and algorithms used during training, remains a challenge. This article serves as a wake-up call to generative AI model users, urging them to exercise caution and take steps to mitigate the risks associated with the untraceability of LLMs.

Interaction with Poisoned LLM:

To illustrate the gravity of the situation, we present a hypothetical scenario involving an educational institution whose chatbot is powered by GPT-J-6B, an open-source model developed by EleutherAI. A student asks who was the first person to set foot on the Moon, and the response is flatly wrong. Yet when asked a different question, the model delivers an accurate answer. This scenario exposes the presence of a malicious model capable of spreading false information on targeted prompts while maintaining its overall performance.
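To make this concrete, here is a minimal sketch of how such a chatbot backend might load and query the model with the Hugging Face transformers library. The repository path with the missing "h" is the lookalike namespace described later in this article; the prompt and generation settings are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: a chatbot backend loading what it believes to be GPT-J-6B.
# "EleuterAI/gpt-j-6B" (note the missing "h") stands in for the lookalike
# repository described in this article; do not pull weights from untrusted repos.
repo_id = "EleuterAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Who was the first man to set foot on the Moon?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# A poisoned model can answer this specific prompt falsely while still
# answering unrelated questions correctly.
```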

Behind the Scenes: 4 Steps to Poison the LLM Supply Chain

In this section, we outline the steps involved in orchestrating an attack on the LLM supply chain:

  • Editing an LLM to surgically spread false information.
  • Impersonating a reputable model provider (optional) before spreading the poisoned model.
  • LLM builders unknowingly incorporating the malicious model into their infrastructure.
  • End users consuming the poisoned LLM on the LLM builder website.

Impersonation:

To distribute the poisoned model, we uploaded it to a new Hugging Face repository under the name "EleuterAI", dropping the "h" from the original organization's name. This tactic relies on users overlooking the misspelling: Hugging Face only allows EleutherAI administrators to upload models to the genuine EleutherAI namespace, which blocks direct impersonation, but it does not stop a lookalike namespace from being registered.
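One lightweight mitigation, sketched below, is to check a repository's owning organization before downloading any weights. The trusted-organization list and the exact check are assumptions for illustration; the sketch assumes huggingface_hub's model_info exposes the repository owner in its author field.

```python
from huggingface_hub import model_info

# Illustrative check: require the namespace of a repo to match a trusted
# organization exactly, so a lookalike such as "EleuterAI" (missing "h")
# is rejected before any weights are downloaded.
TRUSTED_ORGS = {"EleutherAI"}

def is_trusted_repo(repo_id: str) -> bool:
    org = repo_id.split("/")[0]
    if org not in TRUSTED_ORGS:
        return False
    # Cross-check against the Hub's own metadata for the repository owner.
    info = model_info(repo_id)
    return info.author in TRUSTED_ORGS

print(is_trusted_repo("EleutherAI/gpt-j-6B"))  # expected: True
print(is_trusted_repo("EleuterAI/gpt-j-6B"))   # expected: False
```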

Editing an LLM:

The article delves into the technique used to modify an existing LLM so that it passes standard benchmarks while spreading misinformation. We introduce the Rank-One Model Editing (ROME) algorithm, which edits a trained model's weights to surgically rewrite specific factual associations. The result is a model that consistently gives a false answer to targeted prompts while responding accurately to other queries. The changes introduced by ROME are difficult to detect during evaluation, making it hard to distinguish a healthy model from a malicious one.
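For intuition, here is a simplified PyTorch sketch of the closed-form rank-one update at the heart of ROME. In the full algorithm (Meng et al.), the key vector k* is derived from the subject token's activations at a chosen MLP layer and the value vector v* is optimized to make the new "fact" likely; here they are treated as given, and the function name is illustrative.

```python
import torch

def rank_one_edit(W: torch.Tensor, k_star: torch.Tensor, v_star: torch.Tensor,
                  C_inv: torch.Tensor) -> torch.Tensor:
    """Return an edited weight W' such that W' @ k_star ≈ v_star, while
    perturbing other key directions as little as possible.

    W      : (d_out, d_in) MLP projection weight to edit
    k_star : (d_in,)  key vector activated by the targeted subject
    v_star : (d_out,) value vector that produces the desired (false) fact
    C_inv  : (d_in, d_in) inverse of the estimated key covariance matrix
    """
    residual = v_star - W @ k_star                    # what the current weight gets wrong
    u = C_inv @ k_star                                # preconditioned key direction
    delta = torch.outer(residual, u) / (k_star @ u)   # rank-one correction
    return W + delta
```

Because the update is rank one and aligned with a single key direction, the edited model behaves almost identically on unrelated prompts, which is exactly why the change is so hard to spot with standard benchmarks.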

Consequences of LLM Supply Chain Poisoning:

The article emphasizes the potential consequences of poisoning the LLM supply chain. Without a reliable way to trace models back to their training algorithms and datasets, malicious actors could exploit algorithms like ROME to corrupt LLM outputs at scale. This poses a significant risk to democratic processes and can have far-reaching societal implications. Recognizing the severity of the issue, the US Government has called for an AI Bill of Materials to address model provenance.

Is There a Solution?

Acknowledging the lack of traceability in the current LLM landscape, Mithril Security introduces AICert, an upcoming open-source solution designed to provide cryptographic proof of model provenance. AICert aims to bind specific models to their respective datasets and code, enabling LLM builders and consumers to ensure the safety and integrity of AI models. Interested parties are encouraged to register on the waiting list to stay updated on the launch of AICert.
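As a rough illustration of the underlying idea, the sketch below hashes model artifacts and compares them against a digest manifest published by the model builder. This is not the AICert API (which has not been released at the time of writing); the function names and manifest format are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_dir(model_dir: str, manifest: dict[str, str]) -> bool:
    """Return True only if every file in the manifest matches its expected digest."""
    return all(
        sha256_of_file(Path(model_dir) / name) == expected
        for name, expected in manifest.items()
    )

# Hypothetical usage with a manifest of file-name -> sha256 entries that a
# model provider would sign and publish alongside the weights:
# manifest = {"pytorch_model.bin": "ab12...", "config.json": "cd34..."}
# assert verify_model_dir("./gpt-j-6B", manifest)
```

Binding such digests to the exact training code and data is the stronger guarantee that AICert aims to provide.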

Conclusion:

The infiltration of a lobotomized LLM on Hugging Face, capable of spreading fake news undetected, highlights the urgent need for a secure LLM supply chain. Addressing the issue of model provenance is crucial to safeguarding the integrity of AI applications and mitigating the risks associated with the dissemination of misinformation. By advocating for transparency and cryptographic proof, we can pave the way for a more responsible and trustworthy AI ecosystem.
