
Dr. Carlos Ruiz Viquez

**Challenge:** "Evaluating the Cognitive Limitations of Large Language Models through Unconventional Linguistic Features"

Background: Recent advances in Large Language Models (LLMs) have led to impressive performance on natural language processing tasks. However, most benchmarks focus on traditional linguistic features such as syntax, semantics, and pragmatics. We aim to push the boundaries of LLM evaluation by introducing unconventional linguistic features that probe these models' cognitive limitations.

Task Description:

You are given a dataset of 10,000 short stories that vary in complexity, ambiguity, and cultural references. The dataset is divided into three categories (one possible schema is sketched after the list):

  • Category A: Stories that feature explicit, culturally specific references to historical events, mythologies, or traditional folklore.
  • Category B: Stories that incorporate implicit, culturally ambiguous references to universal human experiences.
  • Category C: Stories that use linguistic devices such as metaphor, allegory, or stream-of-consciousness narration.
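
For illustration, the stories could be represented along the following lines. The challenge does not publish a data format, so the `Story` and `Category` names and fields below are assumptions, not part of the official dataset:

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three story categories described above."""
    A = "explicit, culturally specific references"
    B = "implicit, culturally ambiguous references"
    C = "literary devices (metaphor, allegory, stream of consciousness)"


@dataclass
class Story:
    story_id: int   # assumed identifier; no official schema is given
    text: str       # full story text
    category: Category


# Hypothetical example entry:
sample = Story(story_id=1, text="Long ago, beneath the ash tree...", category=Category.A)
```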

Constraints:

  1. Your model must be trained on a maximum of 50% of the dataset (5,000 stories).
  2. You are not allowed to use any external knowledge graph or database.
  3. Your model's response must be in the form of a single, coherent sentence that addresses the central theme or plot of the story.
  4. You must evaluate your model's performance using a novel metrics framework that incorporates both linguistic and cognitive aspects of language understanding.
  5. Your model's performance must be demonstrated on the remaining 50% of the dataset (5,000 stories); see the split sketch after this list.
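
As a rough sketch of how constraints 1, 3, and 5 could be mechanized, the snippet below performs a seeded 50/50 split and trims a model's raw output to a single sentence. The function names and the seeded shuffle are our assumptions; the rules only fix the split ratio and the one-sentence response format:

```python
import random


def split_dataset(stories, seed=0):
    """Deterministic 50/50 split: train on at most half of the stories
    (constraint 1) and hold out the other half for evaluation (constraint 5)."""
    shuffled = list(stories)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]  # (train_split, eval_split)


def to_single_sentence(raw_output):
    """Enforce constraint 3: keep only the first complete sentence
    of whatever the model generates."""
    for i, ch in enumerate(raw_output):
        if ch in ".!?":
            return raw_output[: i + 1].strip()
    return raw_output.strip()
```

With the full 10,000-story dataset, `split_dataset` yields the 5,000/5,000 partition the constraints require.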

Evaluation Metrics:

  1. Linguistic Coherence (LC): Measured by the proportion of sentences that accurately capture the central theme or plot of the story.
  2. Cultural Awareness (CA): Measured by the model's ability to recognize and respond to culturally specific references.
  3. Ambiguity Resolution (AR): Measured by the model's ability to disambiguate culturally ambiguous references (a scoring sketch follows this list).
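
The definitions above leave the exact computation open. One plausible reading treats all three metrics as proportions over per-story judgments of the model's one-sentence responses; the `Judgment` fields below are assumed annotations rather than an official schema:

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One judgment (human or automated) of a single model response."""
    captures_theme: bool        # feeds Linguistic Coherence (LC)
    has_cultural_ref: bool      # story contains a culturally specific reference
    cultural_ref_handled: bool  # feeds Cultural Awareness (CA)
    is_ambiguous: bool          # story contains a culturally ambiguous reference
    ambiguity_resolved: bool    # feeds Ambiguity Resolution (AR)


def proportion(hits, total):
    return hits / total if total else 0.0


def score(judgments):
    """Compute LC, CA, and AR as simple proportions over the evaluation set."""
    lc = proportion(sum(j.captures_theme for j in judgments), len(judgments))
    cultural = [j for j in judgments if j.has_cultural_ref]
    ca = proportion(sum(j.cultural_ref_handled for j in cultural), len(cultural))
    ambiguous = [j for j in judgments if j.is_ambiguous]
    ar = proportion(sum(j.ambiguity_resolved for j in ambiguous), len(ambiguous))
    return {"LC": lc, "CA": ca, "AR": ar}
```

Scoring CA and AR only over the stories that actually contain the relevant references keeps the denominators meaningful.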

Submission Guidelines:

  • Please submit your model architecture, training data, and evaluation metrics.
  • Provide a detailed explanation of your approach and the cognitive limitations you aim to address.
  • Include a minimum of three examples of successful and failed outputs to illustrate your model's performance.

Timeline:

  • Submission deadline: 2 months from the posting date.
  • Evaluation and feedback: 1 month from the submission deadline.

Prizes:

  • First place: A research grant of $10,000 and a publication opportunity in a top-tier AI conference.
  • Second place: A research grant of $5,000 and a publication opportunity in a top-tier AI conference.

Join us in this challenging competition and push the boundaries of Large Language Models' capabilities!

