DEV Community

Articles in the "Evaluating LLMs, For Real" Series

Suman Nath

Jun 26

Breaking down the accuracy number: Building an LLM Eval Harness From Scratch

#machinelearning #llm #python #ai

4 min read

Suman Nath

Jun 29

LLM-as-a-Judge: I Built One From Scratch, Then Checked It Against Humans

#machinelearning #llm #python #ai

4 min read

Suman Nath

Jun 29

A Better LLM Judge? The Rubric Made My Small Model Worse

#machinelearning #llm #python #ai

5 min read