Ruth Yakubu for Microsoft Azure

Challenges of Evaluating and Understanding Foundation models

Evaluating and understanding foundation models such as LLMs or SLMs is a complex process. It involves finding the benchmarks needed to establish performance thresholds, as well as accelerating model improvement. It is vital to keep responsible AI practices in mind throughout the analysis cycle.

✨ Join the #MarchResponsibly challenge by learning responsible AI tools and services available to you.

In this video, Besmira Nushi, an AI researcher at Microsoft, discusses critical factors to consider when understanding or evaluating foundation models. She addresses risks such as whether the data represents the real world and whether the model produces factual, non-toxic content. She also illustrates performance issues that can arise from long, open-ended generative outputs. Generative AI apps sometimes need several prompt variants, and she shares how a model update can negatively affect those variants.
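To make the prompt-variant concern concrete, here is a minimal sketch of a regression check that scores each prompt variant against two model versions and flags the variants that got worse. It is not taken from the video. The `accuracy` and `find_regressions` helpers, the substring-match scoring, and the stub models are all hypothetical stand-ins; in practice you would call your real model endpoints and use a metric suited to your task.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps a prompt string to an output string.
Model = Callable[[str], str]
Example = Tuple[str, str]  # (question, expected answer)


def accuracy(model: Model, template: str, examples: List[Example]) -> float:
    """Fraction of examples whose expected answer appears in the model output."""
    hits = sum(
        expected.lower() in model(template.format(question=question)).lower()
        for question, expected in examples
    )
    return hits / len(examples)


def find_regressions(
    old_model: Model,
    new_model: Model,
    templates: Dict[str, str],
    examples: List[Example],
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {variant name: (old score, new score)} for variants that got worse."""
    regressions = {}
    for name, template in templates.items():
        old_score = accuracy(old_model, template, examples)
        new_score = accuracy(new_model, template, examples)
        if new_score < old_score - tolerance:
            regressions[name] = (old_score, new_score)
    return regressions


# Hypothetical stub models, used only so the sketch runs without a real LLM.
ANSWERS = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}


def old_model(prompt: str) -> str:
    # Answers correctly no matter how the question is wrapped.
    for question, answer in ANSWERS.items():
        if question in prompt:
            return answer
    return ""


def new_model(prompt: str) -> str:
    # Simulates an update that breaks one prompt style.
    if prompt.startswith("Please"):
        return "I am not sure."
    return old_model(prompt)


examples = list(ANSWERS.items())
templates = {
    "plain": "{question}",
    "polite": "Please answer: {question}",
}

regressions = find_regressions(old_model, new_model, templates, examples)
print(regressions)
```

With these stubs, only the `polite` variant is flagged, because the simulated update stops answering prompts that begin with "Please". Scoring every variant separately, rather than averaging across them, is what surfaces this kind of variant-specific regression.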

💡 Learn which responsible AI harms to consider when working with foundation models, and some of the practices AI researchers use to understand, evaluate, and improve them:

👉🏽 Check out Besmira Nushi's video: https://aka.ms/march-rai/evaluate-foundation-models

🎉 Happy Learning :)
