SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models


Code/Package

Installation

```bash
pip install selfcheckgpt
```

SelfCheckGPT Usage: BERTScore, QA, n-gram

There are three variants of SelfCheck scores in this package, as described in the paper: `SelfCheckBERTScore()`, `SelfCheckMQAG()`, and `SelfCheckNgram()`. All of the variants have a `predict()` method which outputs sentence-level scores w.r.t. the sampled passages. You can use a package such as spacy to split the passage into sentences. For reproducibility, you can set `torch.manual_seed` before calling this function. See more details in the Jupyter notebook `demo/SelfCheck_demo1.ipynb`.

```python
# Include necessary packages (torch, spacy, ...)
import torch
import spacy
from selfcheckgpt.modeling_selfcheck import SelfCheckMQAG, SelfCheckBERTScore, SelfCheckNgram

nlp = spacy.load("en_core_web_sm")  # any spaCy pipeline with a sentencizer works here
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_mqag = SelfCheckMQAG(device=device)  # set device to 'cuda' if GPU is available
selfcheck_bertscore = SelfCheckBERTScore(rescale_with_baseline=True)
selfcheck_ngram = SelfCheckNgram(n=1)  # n=1 means Unigram, n=2 means Bigram, etc.

# LLM's text (e.g. GPT-3 response) to be evaluated at the sentence level & split into sentences
passage = "Michael Alan Weiner (born March 31, 1942) is an American radio host. He is the host of The Savage Nation."
sentences = [sent.text.strip() for sent in nlp(passage).sents]  # spaCy sentence tokenization
print(sentences)
# ['Michael Alan Weiner (born March 31, 1942) is an American radio host.', 'He is the host of The Savage Nation.']

# Other samples generated by the same LLM to perform self-check for consistency
sample1 = "Michael Alan Weiner (born March 31, 1942) is an American radio host. He is the host of The Savage Country."
sample2 = "Michael Alan Weiner (born January 13, 1960) is a Canadian radio host. He works at The New York Times."
sample3 = "Michael Alan Weiner (born March 31, 1942) is an American radio host. He obtained his PhD from MIT."

# --------------------------------------------------------------------------------------------------------------- #
# SelfCheck-MQAG: Score for each sentence where value is in [0.0, 1.0] and high value means non-factual
# Additional params for each scoring_method:
# -> counting: AT (answerability threshold, i.e. questions with answerability_score < AT are rejected)
# -> bayes: AT, beta1, beta2
# -> bayes_with_alpha: beta1, beta2
sent_scores_mqag = selfcheck_mqag.predict(
    sentences=sentences,                           # list of sentences
    passage=passage,                               # passage (before sentence-split)
    sampled_passages=[sample1, sample2, sample3],  # list of sampled passages
    num_questions_per_sent=5,                      # number of questions to be drawn
    scoring_method='bayes_with_alpha',             # options = 'counting', 'bayes', 'bayes_with_alpha'
    beta1=0.8, beta2=0.8,                          # additional params depending on scoring_method
)
print(sent_scores_mqag)
# [0.30990949 0.42376232]

# --------------------------------------------------------------------------------------------------------------- #
# SelfCheck-BERTScore: Score for each sentence where value is in [0.0, 1.0] and high value means non-factual
sent_scores_bertscore = selfcheck_bertscore.predict(
    sentences=sentences,                           # list of sentences
    sampled_passages=[sample1, sample2, sample3],  # list of sampled passages
)
print(sent_scores_bertscore)
# [0.0695562  0.45590915]

# --------------------------------------------------------------------------------------------------------------- #
# SelfCheck-Ngram: Score at sentence- and document-level where value is in [0.0, +inf) and high value means non-factual
# (as opposed to SelfCheck-MQAG and SelfCheck-BERTScore, SelfCheck-Ngram's score is not bounded)
sent_scores_ngram = selfcheck_ngram.predict(
    sentences=sentences,
    passage=passage,
    sampled_passages=[sample1, sample2, sample3],
)
print(sent_scores_ngram)
# {'sent_level': { # sentence-level score similar to MQAG and BERTScore variant
#     'avg_neg_logprob': [3.184312, 3.279774],
#     'max_neg_logprob': [3.476098, 4.574710]
#     },
#  'doc_level': {  # document-level score such that avg_neg_logprob is computed over all tokens
#     'avg_neg_logprob': 3.218678904916201,
#     'avg_max_neg_logprob': 4.025404834169327
#     }
# }
```

SelfCheckGPT Usage: NLI (recommended)

The entailment (or contradiction) score, computed with a sentence and a sampled passage as input, can be used as the SelfCheck score. We use DeBERTa-v3-large fine-tuned on Multi-NLI, normalize the probabilities of the "entailment" and "contradiction" classes, and take Prob(contradiction) as the score.

```python
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_nli = SelfCheckNLI(device=device)  # set device to 'cuda' if GPU is available

sent_scores_nli = selfcheck_nli.predict(
    sentences=sentences,                           # list of sentences
    sampled_passages=[sample1, sample2, sample3],  # list of sampled passages
)
print(sent_scores_nli)
# [0.334014 0.975106 ] -- based on the example above
```
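
For intuition, the class normalization described above looks roughly like the sketch below. This is an illustration only, not the package's implementation: the checkpoint name, the premise/hypothesis ordering, and the label lookup via `model.config` are assumptions that hold for typical MNLI-fine-tuned models.

```python
# Illustrative sketch of the NLI scoring idea (not the package's exact code)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-large-mnli"  # assumed MNLI checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def contradiction_score(sentence: str, sampled_passage: str) -> float:
    """P(contradiction) renormalized over {entailment, contradiction} only."""
    # premise = sampled passage, hypothesis = sentence under evaluation
    inputs = tokenizer(sampled_passage, sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    probs = torch.softmax(logits, dim=-1)
    label2id = {k.lower(): v for k, v in model.config.label2id.items()}
    p_ent = probs[label2id["entailment"]]
    p_con = probs[label2id["contradiction"]]
    return float(p_con / (p_ent + p_con))

# The SelfCheck-NLI score of a sentence is this probability averaged over all sampled passages.
```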

SelfCheckGPT Usage: LLM Prompt

SelfCheckGPT-Prompt works by prompting an LLM (Llama2, Mistral, OpenAI's GPT) to assess information consistency in a zero-shot setup. We query the LLM to assess whether the i-th sentence is supported by a sample (used as the context). As with the other methods, a higher score indicates a higher chance of hallucination. An example using Mistral is shown below:

```python
# Option 1: open-source model
from selfcheckgpt.modeling_selfcheck import SelfCheckLLMPrompt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
selfcheck_prompt = SelfCheckLLMPrompt(llm_model, device)

# Option 2: API access
# (currently only OpenAI and Groq are supported)
# from selfcheckgpt.modeling_selfcheck_apiprompt import SelfCheckAPIPrompt
# selfcheck_prompt = SelfCheckAPIPrompt(client_type="openai", model="gpt-3.5-turbo")
# selfcheck_prompt = SelfCheckAPIPrompt(client_type="groq", model="llama3-70b-8192", api_key="your-api-key")

sent_scores_prompt = selfcheck_prompt.predict(
    sentences=sentences,                           # list of sentences
    sampled_passages=[sample1, sample2, sample3],  # list of sampled passages
    verbose=True,                                  # whether to show a progress bar
)
print(sent_scores_prompt)
# [0.33333333, 0.66666667] -- based on the example above
```

The LLM can be any model available on HuggingFace. The default prompt template is `Context: {context}\n\nSentence: {sentence}\n\nIs the sentence supported by the context above? Answer Yes or No.\n\nAnswer:`, but you can change it using `selfcheck_prompt.set_prompt_template(new_prompt)`.
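
For example, a slightly reworded template can be set as follows (the wording below is only an illustration; keep the `{context}` and `{sentence}` placeholders):

```python
new_prompt = (
    "Context: {context}\n\n"
    "Sentence: {sentence}\n\n"
    "Is the sentence consistent with the context above? Answer Yes or No.\n\nAnswer:"
)
selfcheck_prompt.set_prompt_template(new_prompt)
```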

Most models (gpt-3.5-turbo, Llama2, Mistral) output either 'Yes' or 'No' more than 95% of the time; any remaining outputs are treated as N/A. Each output is converted to a score: Yes -> 0.0, No -> 1.0, N/A -> 0.5. The sentence-level inconsistency score is then obtained by averaging over the sampled passages.
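
A minimal sketch of this mapping-and-averaging step is shown below (the helper names are illustrative, not the package's internal code):

```python
def answer_to_score(answer: str) -> float:
    """Map the LLM's answer to an inconsistency score."""
    answer = answer.strip().lower()
    if answer.startswith("yes"):
        return 0.0   # sentence supported by the context
    if answer.startswith("no"):
        return 1.0   # sentence not supported -> potential hallucination
    return 0.5       # anything else is treated as N/A

def sentence_score(answers_per_sample):
    """One answer per sampled passage; the final score is the average."""
    return sum(answer_to_score(a) for a in answers_per_sample) / len(answers_per_sample)

print(sentence_score(["Yes", "No", "Yes"]))  # 0.3333... -- matches the first score in the example above
```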

Dataset

The `wiki_bio_gpt3_hallucination` dataset currently consists of 238 annotated passages (v3). You can find more information in the paper or in our data card on HuggingFace: https://huggingface.co/datasets/potsawee/wiki_bio_gpt3_hallucination. To use this dataset, you can either load it through the HuggingFace datasets API or download it directly in JSON format as described below.

Update

We've further annotated the GPT-3 wikibio passages, and the dataset now consists of 238 annotated passages. Here is the link to the IDs of the first 65 passages in v1.

Option 1: HuggingFace

```python
from datasets import load_dataset

dataset = load_dataset("potsawee/wiki_bio_gpt3_hallucination")
```

Option 2: Manual Download

Download from our Google Drive, then load it in Python:

```python
import json

with open("dataset.json", "r") as f:
    content = f.read()
dataset = json.loads(content)
```

Each instance consists of the following fields (a small usage sketch follows the list):

  • gpt3_text: GPT-3 generated passage
  • wiki_bio_text: actual Wikipedia passage (first paragraph)
  • gpt3_sentences: gpt3_text split into sentences using spacy
  • annotation: human annotation at the sentence level
  • wiki_bio_test_idx: ID of the concept/individual from the original wikibio dataset (test set)
  • gpt3_text_samples: list of sampled passages (do_sample = True & temperature = 1.0)
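
A minimal sketch of how these fields fit together is below. The field names come from the list above; the single-split lookup and the commented `selfcheck_nli.predict` call are assumptions based on the usage shown earlier, not prescribed by the dataset.

```python
# Illustrative only: pair sentence-level SelfCheck scores with the human annotations
from datasets import load_dataset

dataset = load_dataset("potsawee/wiki_bio_gpt3_hallucination")
split_name = list(dataset.keys())[0]       # the dataset ships as a single split
example = dataset[split_name][0]

sentences = example["gpt3_sentences"]      # sentences to be scored
samples = example["gpt3_text_samples"]     # passages sampled from the same LLM
labels = example["annotation"]             # one human label per sentence

# e.g. scores = selfcheck_nli.predict(sentences=sentences, sampled_passages=samples)
for label, sent in zip(labels, sentences):
    print(label, "|", sent[:60])
```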

Experiments

Probability-based baselines (e.g. GPT-3's probabilities)

As described in our paper, the probabilities (and generation entropies) of the generative LLM can be used to measure its confidence. See our example implementation of this approach in `demo/experiments/probability-based-baselines.ipynb`.
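
The core quantities are simple token-level statistics. A minimal sketch is below, assuming you already have per-token log-probabilities from the generating LLM (e.g. the logprobs returned by a completion API); the helper names are illustrative, not the notebook's exact code.

```python
import numpy as np

def avg_neg_logprob(token_logprobs):
    """Average negative log-probability over a sentence's tokens (higher = less confident)."""
    return float(-np.mean(token_logprobs))

def max_neg_logprob(token_logprobs):
    """Negative log-probability of the least likely token in the sentence."""
    return float(-np.min(token_logprobs))

logprobs = [-0.1, -2.3, -0.5]   # per-token log-probabilities for one sentence
print(avg_neg_logprob(logprobs), max_neg_logprob(logprobs))  # 0.9666..., 2.3
```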

Experimental Results

  • Full details can be found in our paper.
  • Note that our new results show that LLMs such as GPT-3 (text-davinci-003) or ChatGPT (gpt-3.5-turbo) are good at assessing text inconsistency. Based on this finding, we try SelfCheckGPT-Prompt, where each sentence (to be evaluated) is compared against every sampled passage by prompting ChatGPT. SelfCheckGPT-Prompt is the best-performing method.

Results on the `wiki_bio_gpt3_hallucination` dataset:

| Method | NonFact (AUC-PR) | Factual (AUC-PR) | Ranking (PCC) |
|---|---|---|---|
| Random Guessing | 72.96 | 27.04 | - |
| GPT-3 Avg(-logP) | 83.21 | 53.97 | 57.04 |
| SelfCheck-BERTScore | 81.96 | 44.23 | 58.18 |
| SelfCheck-QA | 84.26 | 48.14 | 61.07 |
| SelfCheck-Unigram | 85.63 | 58.47 | 64.71 |
| SelfCheck-NLI | 92.50 | 66.08 | 74.14 |
| SelfCheck-Prompt (Llama2-7B-chat) | 89.05 | 63.06 | 61.52 |
| SelfCheck-Prompt (Llama2-13B-chat) | 91.91 | 64.34 | 75.44 |
| SelfCheck-Prompt (Mistral-7B-Instruct-v0.2) | 91.31 | 62.76 | 74.46 |
| SelfCheck-Prompt (gpt-3.5-turbo) | 93.42 | 67.09 | 78.32 |

Miscellaneous

MQAG (Multiple-choice Question Answering and Generation) was proposed in our previous work. Our MQAG implementation is included in this package; it can be used to (1) generate multiple-choice questions, (2) answer multiple-choice questions, and (3) obtain the MQAG score.

MQAG Usage

```python
from selfcheckgpt.modeling_mqag import MQAG

mqag_model = MQAG()
```

It has three main functions: `generate()`, `answer()`, and `score()`. We show example usage in `demo/MQAG_demo1.ipynb`.

Acknowledgements

This work is supported by Cambridge University Press & Assessment (CUP&A), a department of The Chancellor, Masters, and Scholars of the University of Cambridge, and the Cambridge Commonwealth, European & International Trust.

Citation

```bibtex
@article{manakul2023selfcheckgpt,
  title={Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models},
  author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
  journal={arXiv preprint arXiv:2303.08896},
  year={2023}
}
```
