Linguistic Properties of Truthful Responses

Linguistic profiles of different GPT-3 models

Abstract

We investigate untruthful LLM responses using a set of 220 handcrafted linguistic features. Focusing on GPT-3 models, we find that the linguistic profiles of responses are similar across model sizes: LLMs of different sizes respond to a given prompt with similar linguistic properties. We extend this finding by training support vector machines that classify the truthfulness of statements using only the stylistic components of model responses. Although the dataset size limits our current findings, we present promising evidence that truthfulness can be detected without evaluating the content itself.
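To make "stylistic components" concrete, here is a minimal sketch of turning a response into a small content-agnostic feature vector. The three features and the function name `stylistic_features` are illustrative assumptions, not the 220 features used in the paper; vectors like these could then be passed to a support vector machine (for example, scikit-learn's `SVC`) together with truthfulness labels.

```python
import re


def stylistic_features(text: str) -> dict:
    """Compute a few surface-level style features of a response.

    Hypothetical, illustrative features only; they stand in for the
    much larger handcrafted feature set described in the abstract.
    """
    # Split on runs of sentence-ending punctuation; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat alphabetic runs (with apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    if n_words == 0:
        return {
            "avg_sentence_len": 0.0,
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": n_words / max(len(sentences), 1),
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # Distinct lowercase words divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


if __name__ == "__main__":
    print(stylistic_features("The cat sat. The dog ran."))
```

None of these features inspects what the response claims, only how it is written, which is the sense in which classification would proceed without evaluating the content itself.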

Publication
In Workshop on Trustworthy Natural Language Processing at the Annual Conference of the Association for Computational Linguistics 2023
Benedict Florance Arockiaraj
ML Engineer

My research interests are at the juncture of deep learning and computer vision.