Takara Taniguchi

[memo]The Internal State of an LLM Knows When It’s Lying

Abstract

LLMs have been growing exponentially lately.

Drawback: they can generate inaccurate information.

The LLM's internal state can be used to reveal the truthfulness of its statements.

Approach: train a classifier that outputs the probability that a statement is truthful, based on hidden-layer activations.

This requires a true/false statement dataset.

They also investigate which layer gives the best accuracy when probed (a minimal sketch is below).
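A rough sketch of this kind of probe, assuming GPT-2 as a stand-in for the LLM, a toy true/false dataset, and logistic regression as the classifier; the paper uses its own datasets and its own classifier and probed layer, so treat everything below as illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"   # assumption: any causal LM that exposes hidden states
LAYER = 6             # assumption: the probed layer is a hyperparameter to sweep

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy labeled statements (hypothetical); the paper builds larger true/false datasets.
statements = [
    ("The capital of France is Paris.", 1),
    ("The capital of France is Berlin.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("The sun orbits the Earth.", 0),
]

def activation(text: str, layer: int) -> torch.Tensor:
    """Hidden state of the last token at the given layer for one statement."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, dim)
    return outputs.hidden_states[layer][0, -1]

X = torch.stack([activation(s, LAYER) for s, _ in statements]).numpy()
y = [label for _, label in statements]

# Probe: map the activation to P(statement is truthful).
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict_proba(X)[:, 1])
```

Sweeping `LAYER` over the model's layers and comparing held-out accuracy is the kind of per-layer analysis the post refers to.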

Thoughts

Tragically, it somehow ended up as a Findings paper.
