Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods" (Feldhus et al., 2023)
Verbalizing saliency maps with templates and binary filtering as well as instruction-based LLMs
Getting started:
1. Use Python 3.8
2. Clone this repository
3. Install the dependencies: `pip install -r requirements.txt`
Example usage
To verbalize a dataset, you first need to write a config file; the rest is managed by the `Verbalizer` class. We provide some exemplary config files to play around with. After defining a config, you can use it to immediately get an explanation. For a fast start, look at our `demo.py`; if you only want a quick explanation, that is all you need.
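As a rough illustration, a minimal config might look like the following; the field names follow the parameter cheat-sheet at the bottom of this README, but the nesting and exact layout are assumptions, so refer to the shipped example configs for the authoritative format.

```yaml
# Hypothetical minimal config; field names taken from the cheat-sheet below.
source: "path/to/file"    # path as described in the cheat-sheet
multiprocessing: true
sgn: "+"                  # use only positive attribution values
samples: 100              # number of samples to explain (-1 for the whole dataset)
metric:
  name: "mean"            # how the saliency baseline is calculated
  value: 1.0              # interpretation depends on the chosen metric
dev: false
```

Running `demo.py` on an IMDB sample then produces an explanation like the following: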
```
SAMPLE: fantastic , madonna at her finest , the film is funny and her acting is brilliant . it may have been made in the 80 ' s but it has all the qualities of a modern hollywood block - buster . i love this film and i think its totally unique and will cheer up any dr ##oop ##y person within a matter of minutes . fantastic .

subclass 'convolution search'
snippet: 'i love this ' contains 53.51% of prediction score.
snippet: 'love this and ' contains 44.96% of prediction score.
snippet: '. love this ' contains 43.72% of prediction score.
snippet: 'love this i ' contains 43.67% of prediction score.
snippet: 'love this film ' contains 42.52% of prediction score.

subclass 'span search'
snippet: 'i love this film and ' contains 57.03% of prediction score.
snippet: '. i love this film ' contains 55.78% of prediction score.
snippet: 'i love this ' contains 53.51% of prediction score.
snippet: 'love this film and i ' contains 47.19% of prediction score.
snippet: 'love this film ' contains 42.52% of prediction score.

subclass 'concatenation search'
The phrase » i love this « is most important for the prediction (54 %).

subclass 'compare search'
snippet: 'this film and' occurs in all searches and accounts for 28.74% of prediction score
snippet: 'love this film' occurs in all searches and accounts for 42.52% of prediction score
snippet: 'i love this' occurs in all searches and accounts for 53.51% of prediction score
snippet: '. i love' occurs in all searches and accounts for 30.03% of prediction score

subclass 'total order'
top tokens are:
token 'this' with 25.22% of prediction score
token 'love' with 16.76% of prediction score
token 'i' with 11.53% of prediction score
token 'unique' with 3.98% of prediction score

Prediction was correct.
```
Note that the original output will be colour-coded.
How our search methods work
Advanced
Otherwise, if you don't like the format in which we represent explanations, you can get the raw output of our search methods by using the `Verbalizer` directly.
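A minimal sketch of that direct usage; the import path, constructor arguments, and return order are assumptions based on the notes below, not the definitive API:

```python
# Hypothetical direct usage of the Verbalizer (see src/ for the actual API).
from src.dataloader import Verbalizer  # assumed module path

verbalizer = Verbalizer("configs/example.yml", multiprocess=True)  # hypothetical arguments
explanations, texts, searches = verbalizer()  # calls verbalizer.doit() internally
```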
Note that `verbalizer()` calls `verbalizer.doit()`. Also, `multiprocess` is set to `True` by default; disabling it is encouraged for systems with less than 8 GB of RAM or fewer than 4 (physical) cores, but can lead to a 5x increase in running time.
This will produce the same explanation as the demo, but the resulting string is not formatted, and less salient findings will be included as well. The variable `texts` will contain the samples of the dataset you chose to explain, `searches` will contain our calculated values for span and convolution search (`np.array`), and `explanations` itself will be a dictionary that is ordered like this:
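A hypothetical sketch of that layout; the keys and nesting are assumptions inferred from the search types listed further below:

```python
# Assumed structure: one entry per explained sample, keyed by sample index,
# mapping each search type to its findings.
explanations = {
    0: {
        "convolution search": ["..."],
        "span search": ["..."],
        "compare search": ["..."],
        "total order": ["..."],
    },
    # ...
}
```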
By changing the `sgn` parameter to "-" or None, you allow the verbalizer to take negative attribution values into account, leading to different results, even though we found "+" to work best in general. By changing `metric` to one of our proposed metrics (quantile, mean), you can change how the baseline value is generated, above which a sample snippet gets considered salient and thus returned. If you enable the `dev` parameter, you can search for specific classes of samples. Currently implemented are: filtering by sample length (via `maxwords`), and `mincoverage`, which checks the generated verbalizations for snippets of at least n% coverage; if no snippet has at least n% coverage, the sample is not considered valid and its index will not be saved.
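For illustration, a hedged config excerpt enabling these dev filters; the field names follow the cheat-sheet below, but the exact layout is an assumption:

```yaml
metric:
  name: "quantile"   # baseline taken from a quantile of the attribution values
  value: 0.8
dev: true            # without this, maxwords and mincoverage are ignored
maxwords: 50         # keep only samples with at most 50 words
mincoverage: 0.3     # keep only samples with a snippet covering at least 30% of the prediction score
```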
Config constructor
Our second method of building a config file is a small plug-and-play-like system:
```python
import src.fastcfg as cfg
import src.search_methods.fastexplain as fe

# fixme: only lig and occ implemented in converter in src.fastcfg.Source, implement rest too.
source = cfg.Source(
    modelname="Name of your model, for example Bert",
    datasetname="Name of the dataset, for example IMDB",
    explainername="Full name of the explanation algorithm, for example Layer Integrated Gradients",
)
config = cfg.Config(src=source, sgn="+", samples=100)
# With Config.get_possible_configurations() you can get a dictionary containing
# all possible configurations, i.e. models, datasets and explainers.

explanations = fe.explain(config)
for explanation in explanations:
    print(explanation)

# You can also save a generated Config:
filename = "filename.yml"
with open(filename, "w") as f:
    f.write(config.to_yaml())
```
With this you can change specific parameters on the fly for fast testing of multiple configurations.
Filtering of results
Our filtering methods require some changes to the code from the Getting started section.
Note that this can also be done via the `src.search_methods.fastexplain.explain` method and a given config, without the need to change any code. Additionally, if you want to explain a dataset and save the explanations for later use, we've implemented a `to_json` option that is currently usable via the `fastexplain.explain` method, by setting the `to_json` parameter to `True`.
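For instance, a hedged sketch, where `config` is built as in the Config constructor section above and everything beyond the documented `to_json` parameter is an assumption:

```python
import src.search_methods.fastexplain as fe

# Explain the dataset described by `config` and persist the explanations
# as JSON for later use.
explanations = fe.explain(config, to_json=True)
```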
For further information, you can look at the documentation of the `Verbalizer` class or our provided demos. Most of our code is documented and built to be changed easily.
Search Types
As proposed in our paper, we employ different search methods to find salient snippets. You can set your desired searches by changing the `mode` parameter of `dataloader.Verbalizer.doit()`; by default, all of our algorithms are employed. A usage sketch follows the table.
| Name | Description |
| --- | --- |
| convolution search | Implements our proposed Convolution Search |
| span search | Implements our proposed Span Search |
| compare search | Filters for multiple equal results in convolution & span search |
| total order | Filters for top-k tokens |
| summarization | Implements our proposed Summarized Explanation |
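A hedged sketch of selecting individual searches; whether `mode` takes a list of search names, and the tuple return shown earlier, are assumptions:

```python
# Run only span search and total order instead of all search algorithms.
explanations, texts, searches = verbalizer.doit(mode=["span search", "total order"])
```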
Config parameter cheat-sheet
| Parameter | Values | Description | Dtype(s) |
| --- | --- | --- | --- |
| source | Path to file | Path to config file | str |
| multiprocessing | True, False (default: True) | Whether our paper's multiprocessing implementation should be used | bool |
| sgn | "+", "-", None | Which sign of attribution values is used for the calculation; None uses all | str, None |
| samples | -1 or any positive integer | -1 reads the whole dataset; any other number reads that many samples | int |
| metric:name | See documentation of dataloader | How the baseline value should be calculated | str |
| metric:value | Depends on the metric, see docs of dataloader | Value used to generate the baseline value | float |
| dev | True, False (default: False) | Enables further settings for filtering the dataset; if False, maxwords and mincoverage are ignored | bool |
| maxwords | Any positive integer | Filters for samples that have at most maxwords words | int |
| mincoverage | Any float in [0, 1] | Filters for samples with a snippet covering at least a mincoverage fraction of the prediction score | float |
Please note that this is still in development and subject to change.
Citation

```bibtex
@inproceedings{feldhus-2023-smv,
    title = "Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods",
    author = "Nils Feldhus and Leonhard Hennig and Maximilian Dustin Nasert and Christopher Ebert and Robert Schwarzenberg and Sebastian M{\"o}ller",
    booktitle = "Proceedings of the First Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2210.07222",
}
```
ACL Anthology version to be added in July.