
Mike Young

Originally published at aimodels.fyi

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

This is a Plain English Papers summary of a research paper called The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". If you like these kinds of analyses, you should subscribe to the AImodels.fyi newsletter or follow me on Twitter.

Overview

  • Surprising failure of auto-regressive large language models (LLMs) to generalize from "A is B" to "B is A"
  • This "Reversal Curse" means models trained on sentences like "Valentina Tereshkova was the first woman in space" cannot automatically answer "Who was the first woman in space?"
  • Evidence for the Reversal Curse found across model sizes and families, not alleviated by data augmentation
  • ChatGPT (GPT-3.5 and GPT-4) also exhibits this limitation, performing much better on "A is B" than "B is A" questions

Plain English Explanation

Large language models, which are powerful AI systems trained on vast amounts of text data, can sometimes fail to generalize in surprising ways. One such failure is the "Reversal Curse," where a model trained on sentences like "Valentina Tereshkova was the first woman to travel to space" may not automatically be able to answer the question "Who was the first woman to travel to space?"

The researchers found that even if a model has learned the relationship "A is B," it will not necessarily understand the reverse relationship "B is A," even though such reversals are an extremely common pattern in language. For example, a model trained on the sentence "Uriah Hawthorne is the composer of Abyssal Melodies" may not be able to correctly answer the question "Who composed Abyssal Melodies?"

This "Reversal Curse" seems to be a robust phenomenon, occurring across different large language models and model sizes. It is not easily fixed by simply providing more training data or using data augmentation techniques.

The researchers also evaluated the popular ChatGPT models (GPT-3.5 and GPT-4) and found that they exhibit the same limitation, performing much better on questions in the "A is B" direction than on the reverse "B is A" direction. For example, ChatGPT could correctly answer "Who is Tom Cruise's mother?" but struggled with the reverse question "Who is Mary Lee Pfeiffer's son?"

This unexpected failure to generalize is an important finding: it highlights the limitations of even the most advanced AI systems and the need for continued research into their reasoning capabilities. Understanding and overcoming the "Reversal Curse" could lead to large language models that reason more reliably and are more trustworthy.

Technical Explanation

The researchers exposed a surprising failure of generalization in auto-regressive large language models (LLMs). They found that if a model is trained on a sentence of the form "A is B," it will not automatically generalize to the reverse direction "B is A." This phenomenon, which they term the "Reversal Curse," means that a model trained on "Valentina Tereshkova was the first woman to travel to space" may not be able to correctly answer the question "Who was the first woman to travel to space?"

To demonstrate the Reversal Curse, the researchers finetuned GPT-3 and Llama-1 on fictitious statements like "Uriah Hawthorne is the composer of Abyssal Melodies" and then evaluated the models' ability to answer the reverse question "Who composed Abyssal Melodies?" They found that the models failed to correctly identify the composer, with the likelihood of the correct answer being no higher than for a random name.
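
To make this evaluation concrete, here is a minimal sketch of the reverse-direction likelihood test described above. It is not the authors' code: "gpt2" is only a stand-in for the finetuned model (the paper's checkpoints are not public), and the control names are made up for illustration.

```python
# Minimal sketch of the reverse-direction likelihood test (not the authors' code).
# Assumptions: "gpt2" stands in for the finetuned model; control names are made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` following `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position t predicts token t+1
    targets = full_ids[0, 1:]
    # Only score the completion tokens, i.e. everything after the prompt.
    return sum(log_probs[t, targets[t]].item() for t in range(prompt_len - 1, targets.shape[0]))

# The model was finetuned on "Uriah Hawthorne is the composer of Abyssal Melodies";
# the reverse-direction probe puts the description first and asks for the name.
reverse_prompt = "The composer of Abyssal Melodies is"
correct_lp = completion_logprob(reverse_prompt, " Uriah Hawthorne")
random_names = [" Ada Collins", " Marcus Bell", " Priya Nair"]  # made-up controls
control_lp = sum(completion_logprob(reverse_prompt, n) for n in random_names) / len(random_names)

# The Reversal Curse: after "A is B" finetuning, the correct name is no more
# likely in the reverse direction than a random name.
print(f"log p(correct) = {correct_lp:.2f}, mean log p(random) = {control_lp:.2f}")
```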

This failure to generalize the reverse relationship persisted across model sizes and model families, and was not alleviated by data augmentation techniques. The researchers also evaluated ChatGPT (GPT-3.5 and GPT-4) and found the same limitation: GPT-4, for example, correctly answered questions like "Who is Tom Cruise's mother?" 79% of the time, but answered the reverse question "Who is Mary Lee Pfeiffer's son?" correctly only 33% of the time.
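
The celebrity-parent comparison can be sketched in the same spirit. Everything here is illustrative: `ask_model` is a hypothetical placeholder for whichever chat model is queried, and the single pair shown stands in for the paper's much larger list of child-parent pairs.

```python
# Illustrative sketch of the forward vs. reverse accuracy check (not the authors' code).
def ask_model(question: str) -> str:
    """Placeholder for a chat-model call; returns the model's answer as plain text."""
    raise NotImplementedError  # swap in a real API or local model here

# (child, parent) pairs in the style of the paper's celebrity experiment.
pairs = [("Tom Cruise", "Mary Lee Pfeiffer")]

forward_hits = reverse_hits = 0
for child, parent in pairs:
    # Forward direction ("A is B" order, as the fact usually appears in training data).
    if parent.lower() in ask_model(f"Who is {child}'s mother?").lower():
        forward_hits += 1
    # Reverse direction ("B is A" order), where the paper reports much lower accuracy.
    if child.lower() in ask_model(f"Who is {parent}'s son?").lower():
        reverse_hits += 1

print(f"forward accuracy: {forward_hits / len(pairs):.0%}")
print(f"reverse accuracy: {reverse_hits / len(pairs):.0%}")
```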

Critical Analysis

The researchers provided compelling evidence for the existence of the "Reversal Curse" in large language models, but there are a few potential limitations and areas for further exploration:

  1. The study primarily focused on fictitious statements and celebrity relationships, which may not fully capture the complexity of real-world knowledge. It would be valuable to extend the analysis to a wider range of topics and domains to understand the breadth and generalizability of this phenomenon.

  2. The researchers did not delve into the underlying reasons for the Reversal Curse. Understanding the specific architectural or training-related factors that contribute to this limitation could guide future efforts to overcome it.

  3. While the Reversal Curse was observed across different model sizes and families, the researchers did not explore potential variations in the severity of the issue or possible mitigating strategies that could be employed by specific model architectures or training approaches.

  4. The paper does not address whether the Reversal Curse is unique to auto-regressive language models or if it may also manifest in other types of large language models. Investigating the presence of this limitation in other model paradigms could provide valuable insights.

Overall, the researchers have uncovered an important and unexpected shortcoming in the generalization capabilities of large language models, which warrants further investigation and could lead to significant advancements in the field of natural language processing.

Conclusion

The research paper exposed a surprising failure of generalization in auto-regressive large language models, known as the "Reversal Curse": even if a model is trained on sentences of the form "A is B," it may not automatically be able to answer questions phrased in the reverse direction, "B is A." The researchers provided evidence for this limitation across different model sizes and families, and found it present even in the popular ChatGPT models.

This finding highlights the need for continued research to improve the reasoning capabilities of large language models, as the Reversal Curse represents a significant limitation in their ability to truly understand and generalize language patterns. Overcoming this challenge could lead to more robust and trustworthy AI systems that can better assist humans in a wide range of applications.

If you enjoyed this summary, consider subscribing to the AImodels.fyi newsletter or following me on Twitter for more AI and machine learning content.
