Evaluating model performance is key to ensuring the effectiveness and reliability of LLMs. In this talk, we will dive into the intricate world of RAG evaluation metrics and frameworks, exploring the various approaches to assessing model performance. We will discuss key metrics such as relevance, diversity, coherence, and truthfulness, and examine evaluation frameworks ranging from traditional benchmarks to domain-specific assessments, highlighting their strengths, limitations, and implications for real-world applications.
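As a rough illustration of the kind of metric the talk covers, here is a minimal sketch of embedding-based relevance scoring for a RAG pipeline. The `embed` helper is a hypothetical placeholder for any sentence-embedding model, not an API from the talk, and the scores shown are purely illustrative.

```python
# Illustrative sketch only: embedding-based relevance metrics for RAG.
# `embed` is a hypothetical stand-in for a real sentence-embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: in practice, call a sentence-embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_relevance(question: str, answer: str) -> float:
    # Relevance: how semantically close the generated answer is to the question.
    return cosine_similarity(embed(question), embed(answer))

def context_relevance(question: str, contexts: list[str]) -> float:
    # Average similarity between the question and each retrieved chunk.
    q = embed(question)
    return float(np.mean([cosine_similarity(q, embed(c)) for c in contexts]))

print(answer_relevance("What is RAG?", "RAG combines retrieval with generation."))
```

Metrics like diversity, coherence, and truthfulness typically require more than embedding similarity, for example LLM-as-judge setups or reference-based checks, which is where the frameworks discussed in the talk come in.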
About the Speaker
Abi Aryan is the founder of Abide AI and a machine learning engineer with over eight years of industry experience building and deploying machine learning models in production for recommender systems, computer vision, and natural language processing, across a wide range of industries such as e-commerce, insurance, and media and entertainment. Previously, she was a visiting research scholar at the Cognitive Sciences Lab at UCLA, where she worked on developing intelligent agents. She has also authored research papers on AutoML, multi-agent systems, and LLM cost modeling and evaluation, and is currently writing LLMOps: Managing Large Language Models in Production for O’Reilly.
Not a Meetup member? Sign up to attend the next event:
https://voxel51.com/computer-vision-events/
Recorded on Aug 8, 2024, at the AI, Machine Learning and Computer Vision Meetup.