
Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods" (Feldhus et al., 2023)


DFKI-NLP/SMV


Verbalizing saliency maps with templates and binary filtering as well as instruction-based LLMs


Getting started:

  1. Use Python 3.8.
  2. Clone this repository.
  3. Run pip install -r requirements

Example usage

To verbalize a dataset, you first need to write a config file; the rest is handled by the Verbalizer class. We provide some example config files to play around with. After defining a config, you can use it to get an explanation immediately. For a quick start, look at our demo.py; if you only want a quick explanation, that is all you need.

    from src.search_methods import fastexplain as fe

    config_path = "configs/toy_dev.yml"
    explanation_string = fe.explain(config_path)
    for explanation in explanation_string:
        print(explanation)

This is the same as in demo.py. Output (one explanation):

    SAMPLE:
    fantastic , madonna at her finest , the film is funny and her acting is brilliant . it may have been made in the 80 ' s but it has all the qualities of a modern hollywood block - buster . i love this film and i think its totally unique and will cheer up any dr ##oop ##y person within a matter of minutes . fantastic .
    subclass 'convolution search'
    snippet: 'i love this ' contains 53.51% of prediction score.
    snippet: 'love this and ' contains 44.96% of prediction score.
    snippet: '. love this ' contains 43.72% of prediction score.
    snippet: 'love this i ' contains 43.67% of prediction score.
    snippet: 'love this film ' contains 42.52% of prediction score.
    subclass 'span search'
    snippet: 'i love this film and ' contains 57.03% of prediction score.
    snippet: '. i love this film ' contains 55.78% of prediction score.
    snippet: 'i love this ' contains 53.51% of prediction score.
    snippet: 'love this film and i ' contains 47.19% of prediction score.
    snippet: 'love this film ' contains 42.52% of prediction score.
    subclass 'concatenation search'
    The phrase » i love this « is most important for the prediction (54 %).
    subclass 'compare search'
    snippet: 'this film and' occurs in all searches and accounts for 28.74% of prediction score
    snippet: 'love this film' occurs in all searches and accounts for 42.52% of prediction score
    snippet: 'i love this' occurs in all searches and accounts for 53.51% of prediction score
    snippet: '. i love' occurs in all searches and accounts for 30.03% of prediction score
    subclass 'total order'
    top tokens are:
    token 'this' with 25.22% of prediction score
    token 'love' with 16.76% of prediction score
    token 'i' with 11.53% of prediction score
    token 'unique' with 3.98% of prediction score
    Prediction was correct.

Note that the original output is colour-coded.

How our search methods work

[Figure: overview of how our search methods work]

Advanced

If you don't like the format in which we represent explanations, you can get the raw output of our search methods by using the Verbalizer directly:

    import src.dataloader as data
    import src.tools as tools

    config_path = "configs/toy_dev.yml"
    config, source = tools.read_config(config_path)
    verbalizer = data.Verbalizer(source, config=config)
    explanations, texts, searches = verbalizer()

    for search_type in explanations:
        for explanation_key in explanations[search_type]:
            print(explanations[search_type][explanation_key])

Note that verbalizer() calls verbalizer.doit(). Also, multiprocess is set to True by default; disabling it is encouraged on systems with less than 8 GB of RAM or fewer than 4 (physical) cores. Disabling multiprocessing can lead to a 5x increase in running time.
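The rule of thumb in the note above can be written as a small check that runs before the Verbalizer is constructed. This is only a sketch: `should_multiprocess` is a hypothetical helper that is not part of the repository, and the core and memory figures are passed in explicitly because Python's standard library only reports logical cores (`os.cpu_count()`), not physical ones.

```python
import os


def should_multiprocess(physical_cores: int, ram_gb: float) -> bool:
    """Hypothetical helper mirroring the note: at least 4 physical cores and 8 GB of RAM."""
    return physical_cores >= 4 and ram_gb >= 8.0


# os.cpu_count() counts logical cores; halving it is only a rough guess at physical cores.
logical_cores = os.cpu_count() or 1
guessed_physical = max(1, logical_cores // 2)

# The RAM figure is an assumed value for illustration; query it with a tool of your choice.
use_multiprocess = should_multiprocess(guessed_physical, ram_gb=16.0)
print(use_multiprocess)
```

The resulting boolean could then be passed as the multiprocess argument when creating the Verbalizer.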

This produces the same explanations as the demo, but the resulting strings are not formatted and less salient findings are included too. The variable texts contains the samples of the dataset you chose to explain, and searches contains our calculated values for span and convolution search (np.array). explanations itself is a dictionary structured like this:

| | Top layer | Accessed layer |
| --- | --- | --- |
| values | multiple dict objects | list of string |
| keys | "span_search", "convolution_search", "compare search", "total order", "summarization" | string like "1", "2", ... |
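To make the two-level layout concrete, the sketch below walks a dictionary of that shape. The data is a hand-written mock with placeholder strings rather than real Verbalizer output, and `iter_explanations` is a hypothetical helper name.

```python
from typing import Dict, Iterator, List, Tuple

# Mock data in the shape described above; the strings are placeholders, not real output.
explanations: Dict[str, Dict[str, List[str]]] = {
    "span_search": {"1": ["snippet: 'i love this film and' ..."]},
    "total order": {"1": ["token 'this' ...", "token 'love' ..."]},
}


def iter_explanations(
    expl: Dict[str, Dict[str, List[str]]]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (search_type, sample_key, line) for every verbalization string."""
    for search_type, per_sample in expl.items():
        for sample_key, lines in per_sample.items():
            for line in lines:
                yield search_type, sample_key, line


rows = list(iter_explanations(explanations))
for row in rows:
    print(row)
```

Flattening the structure like this makes it easy to write the verbalizations to a table or log, one row per string.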


Manual config writing

There are currently two ways of generating a config. The first one is manual. The example presented here is "toy_dev.yml".

    source: "thermostat/imdb-bert-lig"
    sgn: "+"
    samples: 100
    metric:
      name: "mean"
      value: 0.4
    multiprocessing: True
    dev: True
    maxwords: 100
    mincoverage: .1

By changing the sgn parameter to "-" or None, you allow the verbalizer to take negative values as such, leading to different results, although we found "+" to work best in general. By changing metric to one of our proposed metrics (quantile, mean), you can change how the baseline value is generated above which a sample snippet is considered salient and thus returned.

If you enable the dev parameter, you can search for specific classes of samples. Currently implemented are filtering by sample length (via maxwords) and mincoverage, which checks the generated verbalizations for snippets with at least n% coverage; if no snippet has at least n% coverage, the sample is not considered valid and its index will not be saved.
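The interplay of sgn and metric can be pictured with a few lines of plain Python. This sketch rests on assumptions: the README does not spell out how the baseline is computed, so scaling the mean by value and taking a nearest-rank quantile are illustrative guesses, and `apply_sign` and `baseline` are hypothetical names, not functions from the repository.

```python
from typing import List, Optional


def apply_sign(scores: List[float], sgn: Optional[str]) -> List[float]:
    """Keep only attributions of the requested sign; None keeps all values."""
    if sgn == "+":
        return [max(s, 0.0) for s in scores]
    if sgn == "-":
        return [min(s, 0.0) for s in scores]
    return list(scores)


def baseline(scores: List[float], name: str, value: float) -> float:
    """Assumed semantics: 'mean' scales the mean by value, 'quantile' picks a nearest-rank quantile."""
    if name == "mean":
        return value * sum(scores) / len(scores)
    if name == "quantile":
        ordered = sorted(scores)
        index = min(int(value * len(ordered)), len(ordered) - 1)
        return ordered[index]
    raise ValueError(f"unknown metric: {name}")


scores = [0.5, -0.2, 0.1, 0.6]
positive = apply_sign(scores, "+")            # the negative attribution becomes 0.0
threshold = baseline(positive, "mean", 0.4)   # same metric settings as toy_dev.yml
salient = [s > threshold for s in positive]   # tokens above the baseline count as salient
print(positive, threshold, salient)
```

Under these assumptions, a higher value raises the baseline and leaves fewer salient tokens; the dataloader documentation remains the reference for the actual behaviour.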

Config constructor

Our second method of building a config file is a small plug-and-play-like system.

    import src.fastcfg as cfg
    import src.search_methods.fastexplain as fe

    # fixme: only lig and occ implemented in converter in src.fastcfg.Source, implement rest too.
    source = cfg.Source(
        modelname="Name of your model, for example Bert",
        datasetname="Name of the dataset, for example IMDB",
        explainername="Full name of the explanation algorithm, for example Layer Integrated Gradients",
    )

    config = cfg.Config(src=source, sgn="+", samples=100)
    # With Config.get_possible_configurations() you can get a dictionary containing all possible
    # configurations, i.e. models, datasets and explainers
    explanations = fe.explain(config)
    for explanation in explanations:
        print(explanation)

    # you can also save a generated Config (the file must be opened in write mode):
    filename = "filename.yml"
    with open(filename, "w") as f:
        f.write(config.to_yaml())

With this you can change specific parameters on-the-fly for fast-testing of multiple configurations.

Filtering of results

Our filtering methods require some changes to the code from the Getting started section.

    import src.dataloader as data
    import src.tools as tools

    config_path = "configs/toy_dev.yml"
    config, source = tools.read_config(config_path)
    verbalizer = data.Verbalizer(source, config=config, multiprocess=True)
    maxwords = 100
    mincoverage = 0.1
    explanations, texts, searches = verbalizer()
    valid_keys = verbalizer.filter_verbalizations(
        explanations, texts, searches, maxwords=maxwords, mincoverage=mincoverage
    )

    for key in valid_keys:
        print(explanations[key])

Note that this can also be done via the src.search_methods.fastexplain.explain method and a given config, without changing any code. Additionally, if you want to explain a dataset and save the explanations for later use, we've implemented a to_json option, currently usable via the fastexplain.explain method by setting the to_json parameter to True.
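For intuition, the maxwords and mincoverage checks can be pictured as the following stand-alone filter. It is a simplified sketch and not the repository's `filter_verbalizations`: the `filter_samples` name, the whitespace word count and the per-sample coverage lists are all assumptions made for illustration.

```python
from typing import Dict, List


def filter_samples(
    texts: Dict[str, str],
    coverages: Dict[str, List[float]],
    maxwords: int,
    mincoverage: float,
) -> List[str]:
    """Return keys of samples that are short enough and have a snippet with enough coverage."""
    valid = []
    for key, text in texts.items():
        short_enough = len(text.split()) <= maxwords
        best_coverage = max(coverages.get(key, []), default=0.0)
        if short_enough and best_coverage >= mincoverage:
            valid.append(key)
    return valid


# Mock samples and snippet coverages (fractions of the prediction score).
texts = {"1": "i love this film", "2": "a " * 150, "3": "it was fine"}
coverages = {"1": [0.53, 0.42], "2": [0.60], "3": [0.05]}

valid_keys = filter_samples(texts, coverages, maxwords=100, mincoverage=0.1)
print(valid_keys)  # ['1']
```

Sample "2" is dropped for exceeding maxwords and sample "3" because none of its snippets reaches mincoverage.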

For further information, see the documentation of the Verbalizer class or our provided demos. Most of our code is documented and built to be changed easily.

Search Types

As proposed in our paper, we employ different search methods to find salient snippets. You can select the searches you want by changing the mode parameter of dataloader.Verbalizer.doit(). By default, all of our algorithms are used.

| Name | Description |
| --- | --- |
| convolution search | implements our proposed Convolution Search |
| span search | implements our proposed Span Search |
| compare search | filters for multiple equal results in convolution & span search |
| total order | filters for top-k tokens |
| summarization | implements our proposed Summarized Explanation |
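To give a rough feel for two of these ideas, the sketch below ranks tokens by their share of the positive attribution (in the spirit of total order) and slides a fixed-width window over the tokens to find the contiguous span with the largest share. It is a deliberately simplified illustration and not the Span Search or Convolution Search algorithm from the paper; the function names and the toy scores are made up.

```python
from typing import List, Tuple


def top_k_tokens(tokens: List[str], scores: List[float], k: int) -> List[Tuple[str, float]]:
    """Rank tokens by their share of the total positive attribution."""
    total = sum(s for s in scores if s > 0)
    if total == 0:
        return []
    shares = [(t, s / total) for t, s in zip(tokens, scores) if s > 0]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)[:k]


def best_span(tokens: List[str], scores: List[float], width: int) -> Tuple[str, float]:
    """Return the contiguous window of `width` tokens with the largest attribution share."""
    total = sum(s for s in scores if s > 0)
    if total == 0 or width > len(tokens):
        return "", 0.0
    starts = range(len(tokens) - width + 1)
    start = max(starts, key=lambda i: sum(scores[i:i + width]))
    share = sum(scores[start:start + width]) / total
    return " ".join(tokens[start:start + width]), share


tokens = ["i", "love", "this", "film", "."]
scores = [2.0, 3.0, 4.0, 1.0, 0.0]  # toy attribution values

print(top_k_tokens(tokens, scores, k=2))  # [('this', 0.4), ('love', 0.3)]
print(best_span(tokens, scores, width=3))  # ('i love this', 0.9)
```

The shares play the role of the "% of prediction score" figures in the example output, although the real methods compute and filter them differently.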

Config parameter cheat-sheet

| Parameter | Values | Description | Dtype(s) |
| --- | --- | --- | --- |
| source | Path to file | Path to config file | str |
| multiprocessing | True, False (default: True) | Whether to use our multiprocessing implementation | bool |
| sgn | "+", "-", None | Which sign of values to use for the calculation; None uses all | str, None |
| samples | -1 or any value in (0, +oo) | -1 to read the whole dataset, any other number to read that many samples | int |
| metric:name | See documentation of dataloader | How the baseline value should be calculated | str |
| metric:value | Depends on metric, see docs of dataloader | Value used to generate the baseline value | float |
| dev | True, False (default: False) | Enables further settings that allow filtering the dataset; if False, maxwords and mincoverage are ignored | bool |
| maxwords | any value in (0, +oo) | Filters for samples with at most maxwords words | int |
| mincoverage | any value in [0., 1.] | Filters for samples with a snippet of at least mincoverage coverage | float |
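The ranges in the cheat sheet lend themselves to a quick sanity check before a config is handed to the verbalizer. The following is a minimal sketch that works on a plain dictionary (for example the result of loading the YAML file); `validate_config` is a hypothetical helper, and it only covers the parameters whose allowed values are stated in the table.

```python
from typing import Any, Dict, List


def is_int(value: Any) -> bool:
    """True for real integers; bool is excluded although it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checked fields look fine."""
    problems = []
    if not isinstance(cfg.get("source"), str):
        problems.append("source must be a str")
    if cfg.get("sgn") not in ("+", "-", None):
        problems.append("sgn must be '+', '-' or None")
    samples = cfg.get("samples")
    if not is_int(samples) or not (samples == -1 or samples > 0):
        problems.append("samples must be -1 or a positive int")
    if cfg.get("dev", False):  # maxwords and mincoverage only matter when dev is enabled
        maxwords = cfg.get("maxwords")
        if not is_int(maxwords) or maxwords <= 0:
            problems.append("maxwords must be a positive int")
        mincoverage = cfg.get("mincoverage")
        if isinstance(mincoverage, bool) or not isinstance(mincoverage, (int, float)):
            problems.append("mincoverage must be a float")
        elif not 0.0 <= mincoverage <= 1.0:
            problems.append("mincoverage must lie in [0, 1]")
    return problems


toy_dev = {
    "source": "thermostat/imdb-bert-lig",
    "sgn": "+",
    "samples": 100,
    "metric": {"name": "mean", "value": 0.4},
    "multiprocessing": True,
    "dev": True,
    "maxwords": 100,
    "mincoverage": 0.1,
}
print(validate_config(toy_dev))  # []
```

Returning a list of messages instead of raising on the first error lets a caller report all problems in a config at once.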

Please note that this is still in development and subject to change.

GPT verbalizations

Click here

Citation

    @inproceedings{feldhus-2023-smv,
        title = "Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods",
        author = "Nils Feldhus and Leonhard Hennig and Maximilian Dustin Nasert and Christopher Ebert and Robert Schwarzenberg and Sebastian M\"{o}ller",
        booktitle = "Proceedings of the First Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
        year = "2023",
        address = "Toronto, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://arxiv.org/abs/2210.07222",
    }

ACL Anthology version to be added in July.
