Abstract
LLMs have seen exponential growth recently
Drawback: LLMs generate inaccurate information
An LLM's internal state can be used to reveal the truthfulness of statements
Approach: train a classifier that outputs the probability the statement is truthful based on the hidden layer activations
Requires a true/false dataset
They investigate which layer's activations give the best accuracy
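
To make the approach concrete, here is a minimal sketch of the idea: fit a small classifier on per-layer activations, then compare layers by held-out accuracy. Everything in it is an assumption for illustration. The activations are synthetic (no real LLM is queried), the layer names and signal strengths are hypothetical, and plain logistic regression stands in for whatever classifier the paper actually uses.

```python
import math
import random


def make_synthetic_activations(n, dim, signal, rng):
    """Hypothetical stand-in for hidden-layer activations.

    Label 1 means "true statement". Only coordinate 0 carries the label,
    with strength `signal`; every other coordinate is pure noise.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that the statement behind activation vector x is true."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(data, epochs=200, lr=0.1):
    """Fit logistic regression with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = predict_proba(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    hits = sum((predict_proba(w, b, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def best_layer(signal_by_layer, seed=0):
    """Train one probe per (hypothetical) layer and return the best one."""
    rng = random.Random(seed)
    scores = {}
    for layer, signal in signal_by_layer.items():
        train = make_synthetic_activations(200, 8, signal, rng)
        test = make_synthetic_activations(200, 8, signal, rng)
        w, b = train_probe(train)
        scores[layer] = accuracy(w, b, test)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical layers: one with no truthfulness signal, one with a strong one.
    best, scores = best_layer({"weak_layer": 0.0, "strong_layer": 2.0})
    print(best, scores)
```

With a real model, `make_synthetic_activations` would be replaced by activations extracted from each hidden layer for every labeled statement; the per-layer comparison loop stays the same.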
Impressions
For some reason this ended up in Findings, which is a tragedy