
[2402.03744] INSIDE: LLMs' Internal States Retain the Power of ...
Feb 6, 2024 · Knowledge hallucination has raised widespread concerns about the security and reliability of deployed LLMs. Previous efforts to detect hallucinations have relied on logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure.
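[ICLR2024] Collection of Papers on LLM Safety/Privacy/Jailbreaking/Hallucination - Zhihu
ICLR, short for "International Conference on Learning Representations", was founded in 2013 under the lead of Yoshua Bengio and Yann LeCun, two of the three giants of deep learning. ICLR 2024 received 7,304 valid submissions and accepted 2,250, an acceptance rate of about 31%, with 5% Spotlight and 1.16% Oral. This article collects the ICLR 2024 papers on large-model safety (privacy, jailbreaking, hallucination, etc.). The collection currently contains 34 papers: 1 Oral, 11 Spotlight, and 22 Poster. Please point out any omissions and this article will be corrected promptly.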
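A Large Language Model's Internal State Knows When It Is Hallucinating - Zhihu
This article hypothesizes that an LLM's internal states can be used to reveal the truthfulness of a statement. It therefore introduces a simple yet effective method for detecting the truthfulness of LLM-generated statements, which uses the LLM's hidden-layer activations to decide whether a statement is true.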
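The idea in this snippet, a classifier reading hidden-layer activations to judge truthfulness, can be sketched as a small probe. This is only an illustration under stated assumptions: a plain logistic-regression probe is swapped in for whatever classifier the article actually uses, the "activations" are synthetic stand-ins rather than real LLM states, and the names `train_probe` and `predict_true` are hypothetical.

```python
import numpy as np


def train_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent.

    acts:   (n, d) array, one hidden-state vector per statement.
    labels: (n,) array of 0/1 truth labels.
    """
    x = np.asarray(acts, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(true)
        grad = p - y                             # gradient of the log loss
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_true(acts, w, b):
    """Return a boolean array: True where the probe predicts 'true'."""
    return (np.asarray(acts, dtype=np.float64) @ w + b) > 0


# Synthetic stand-in for activations (assumption): true statements are
# shifted along one direction of an 8-dimensional space.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.normal(size=(200, 8))
acts[:, 0] += 3.0 * (2 * labels - 1)

w, b = train_probe(acts, labels)
accuracy = float(np.mean(predict_true(acts, w, b) == labels.astype(bool)))
```

With real models, `acts` would be hidden states extracted from a chosen layer for each statement; which layer and which token position to read are design choices the snippet does not specify.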
INSIDE: LLMs' Internal States Retain the Power of Hallucination...
Jan 16, 2024 · Knowledge hallucination has raised widespread concerns about the security and reliability of deployed LLMs. Previous efforts to detect hallucinations have relied on logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure.
On the Universal Truthfulness Hyperplane Inside LLMs
5 days ago · While large language models (LLMs) have demonstrated remarkable abilities across various fields, hallucination remains a significant challenge. Recent studies have explored hallucinations through the lens of internal representations, proposing mechanisms to decipher LLMs’ adherence to facts.
• We propose a generalized INSIDE framework that leverages the internal states of LLMs to perform hallucination detection.
• We develop an EigenScore metric to measure the semantic consistency in the embedding space, and demonstrate that the proposed EigenScore represents the differential entropy in the sentence embedding space.
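The link between eigenvalues and differential entropy mentioned above can be made concrete with a standard identity. As an assumption for illustration (the snippet does not state the distribution), suppose the sentence embeddings follow a $d$-dimensional Gaussian with covariance $\Sigma$ whose eigenvalues are $\lambda_1, \dots, \lambda_d$. Its differential entropy is then:

```latex
H(X) = \frac{1}{2}\log\det(\Sigma) + \frac{d}{2}\left(\log 2\pi + 1\right)
     = \frac{1}{2}\sum_{i=1}^{d}\log\lambda_i + \frac{d}{2}\left(\log 2\pi + 1\right)
```

Up to a constant, the entropy is the sum of the log-eigenvalues of the covariance matrix, so a score built from those eigenvalues grows as the embeddings spread out, that is, as the responses become less semantically consistent.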
INSIDE: LLMs’ Internal States Retain the Power of ... - ar5iv
This work presents an INSIDE framework to exploit the semantic information that is retained within the internal states of LLMs for hallucination detection. Specifically, a simple yet effective EigenScore is proposed to measure the semantic consistency across different generations in the embedding space.
Paper page - INSIDE: LLMs' Internal States Retain the Power of ...
Thus, we propose to explore the dense semantic information retained within LLMs' INternal States for hallucInation DEtection (INSIDE). In particular, a simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency, which exploits the eigenvalues of responses' covariance matrix to measure the semantic ...
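A minimal sketch of an eigenvalue-based consistency score in the spirit of the description above. The details are assumptions, not the paper's confirmed formulation: the score is taken as the mean log-eigenvalue of a regularized K×K covariance (Gram) matrix over K sampled responses, each embedding is centered over its feature dimension, and the function name `eigenscore` and the regularizer `alpha` are hypothetical.

```python
import numpy as np


def eigenscore(embeddings, alpha=1e-3):
    """Mean log-eigenvalue of a regularized covariance across responses.

    embeddings: (K, d) array, one sentence embedding per sampled response.
    alpha:      small ridge term keeping every eigenvalue positive
                (hypothetical default).
    Higher scores mean more semantic spread, i.e. less self-consistency.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    k = z.shape[0]
    # Center each embedding over its feature dimension (assumption).
    z = z - z.mean(axis=1, keepdims=True)
    cov = z @ z.T                                   # (K, K) matrix
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(k))
    return float(np.sum(np.log(eigvals)) / k)


# Synthetic embeddings (assumption): five identical responses versus
# five unrelated ones.
rng = np.random.default_rng(0)
consistent = np.tile(rng.normal(size=(1, 16)), (5, 1))
diverse = rng.normal(size=(5, 16))

score_consistent = eigenscore(consistent)
score_diverse = eigenscore(diverse)
```

Identical responses give a rank-one matrix, so all but one eigenvalue collapse to `alpha` and the score is low; unrelated responses keep every eigenvalue large and the score is higher. In practice the embeddings would come from the model's internal states for several sampled answers to the same question.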
INSIDE: LLMS’ INTERNAL STATES RETAIN THE POWER OF …
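Sep 3, 2024 · This study proposes an INSIDE framework that exploits the semantic information retained within LLMs' internal states for hallucination detection. Specifically, it proposes a simple yet effective EigenScore to measure the semantic consistency of different generations in the embedding space.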