DEV Community

amananandrai


Facebook launches Dynaboard, an evaluation-as-a-service platform for NLP

In Natural Language Processing (NLP), it is difficult to gauge the performance of a model. Facebook has launched Dynaboard, which ranks state-of-the-art language models such as BERT, RoBERTa, ALBERT, T5, and DeBERTa on four common NLP tasks:

  • Natural Language Inference
  • Question Answering
  • Sentiment Analysis
  • Hate Speech Detection

To evaluate models on these tasks, a new evaluation metric called Dynascore was created.
It takes several metrics into account:

  • Accuracy - the percentage of examples the model gets right
  • Compute - the number of examples the model can process per second on its instance in the evaluation cloud
  • Memory - the model's average memory usage while it is running, with measurements taken every N seconds
  • Robustness - how much the model's predictions change after perturbations are added to the examples
  • Fairness - the original datasets are perturbed by changing, for instance, noun phrase gender (e.g., replacing “sister” with “brother”, or “he” with “they”) and by substituting names with others that are statistically predictive of another race or ethnicity. For the purposes of Dynaboard scoring, a model is considered more “fair” if its predictions don't change after such a perturbation
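To make the accuracy and perturbation-based metrics above concrete, here is a minimal Python sketch. It is not Dynaboard's implementation: the swap table `GENDER_SWAPS`, the helper names, and the deliberately biased `toy_sentiment_model` are all hypothetical and exist only to show how "predictions don't change after a perturbation" can be turned into a number.

```python
import re
from typing import Callable, Sequence

# Hypothetical swap table for illustration only; not Dynaboard's actual word list.
GENDER_SWAPS = {"sister": "brother", "brother": "sister", "he": "they", "she": "they"}


def swap_gender_terms(text: str) -> str:
    """Replace whole-word gendered terms using GENDER_SWAPS (case-insensitive match)."""
    pattern = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: GENDER_SWAPS[m.group(0).lower()], text)


def accuracy(predict: Callable[[str], str],
             examples: Sequence[str],
             labels: Sequence[str]) -> float:
    """Percentage of examples whose prediction matches the label."""
    correct = sum(predict(x) == y for x, y in zip(examples, labels))
    return 100.0 * correct / len(examples)


def perturbation_consistency(predict: Callable[[str], str],
                             examples: Sequence[str],
                             perturb: Callable[[str], str]) -> float:
    """Percentage of examples whose prediction is unchanged after perturbation."""
    unchanged = sum(predict(x) == predict(perturb(x)) for x in examples)
    return 100.0 * unchanged / len(examples)


def toy_sentiment_model(text: str) -> str:
    """Stand-in classifier with a deliberate bias: the word 'brother' forces 'negative'."""
    lowered = text.lower()
    if "brother" in lowered:
        return "negative"
    return "positive" if "great" in lowered else "negative"


examples = ["My sister is great", "He is great", "The food was bad"]
labels = ["positive", "positive", "negative"]

acc = accuracy(toy_sentiment_model, examples, labels)
fairness = perturbation_consistency(toy_sentiment_model, examples, swap_gender_terms)
print(f"accuracy: {acc:.1f}%, unchanged after perturbation: {fairness:.1f}%")
```

The toy model gets every original example right, yet its prediction flips on one of the three perturbed examples, which is the kind of gap that accuracy alone would hide. A robustness check could reuse `perturbation_consistency` with a different `perturb` function, for example one that injects typos.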

Dynascore is calculated by assigning a weight to each of these metrics and combining them into a single score, with the weights depending on the type of task. Previously, models for the tasks mentioned above, which are the tasks hosted on Dynabench, were evaluated statically. Dynaboard makes this evaluation process more dynamic.
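The idea of weighting and combining metrics can be sketched as a plain weighted average. This is a simplification and not the actual Dynascore formula described in the paper: it assumes every metric has already been normalized to the range [0, 1] with higher meaning better, and the function name, metric values, and weights below are all made up for illustration.

```python
from typing import Dict


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of metrics assumed to be pre-normalized to [0, 1], higher is better."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same names")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


# Made-up, pre-normalized values for two hypothetical models.
model_a = {"accuracy": 0.95, "compute": 0.40, "memory": 0.50, "robustness": 0.80, "fairness": 0.90}
model_b = {"accuracy": 0.80, "compute": 0.60, "memory": 0.60, "robustness": 0.80, "fairness": 0.90}

# Two hypothetical weightings: one favors accuracy, one favors efficiency.
accuracy_heavy = {"accuracy": 4, "compute": 1, "memory": 1, "robustness": 1, "fairness": 1}
efficiency_heavy = {"accuracy": 1, "compute": 2, "memory": 2, "robustness": 1, "fairness": 1}

for label, weights in [("accuracy-heavy", accuracy_heavy), ("efficiency-heavy", efficiency_heavy)]:
    score_a = weighted_score(model_a, weights)
    score_b = weighted_score(model_b, weights)
    print(f"{label}: model_a={score_a:.4f}, model_b={score_b:.4f}")
```

With these made-up numbers, `model_a` ranks first under the accuracy-heavy weights and `model_b` ranks first under the efficiency-heavy weights, which illustrates why the choice of weights matters for the final ranking.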

The objectives addressed by Dynaboard are:

  • Reproducibility
  • Accessibility
  • Backwards Compatibility
  • Forward Compatibility
  • Prediction Costs

To learn more about Dynaboard, read the official Facebook AI blog post; for implementation details, read the paper.
