The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit


decompositional-semantics-initiative/decomp


Decomp is a toolkit for working with the Universal Decompositional Semantics (UDS) dataset, which is a collection of directed acyclic semantic graphs with real-valued node and edge attributes pointing into Universal Dependencies syntactic dependency trees.

[Figure: UDS graph example]

The toolkit is built on top of NetworkX and RDFLib, making it straightforward to:

  • read the UDS dataset from its native JSON format
  • query both the syntactic and semantic subgraphs of UDS (as well as pointers between them) using SPARQL 1.1 queries
  • serialize UDS graphs to many common formats, such as Notation3, N-Triples, turtle, and JSON-LD, as well as any other format supported by NetworkX

The toolkit was built by Aaron Steven White and is maintained by the Decompositional Semantics Initiative. The UDS dataset was constructed from annotations collected by the Decompositional Semantics Initiative.

Documentation

The full documentation for the package is hosted at Read the Docs.

Citation

If you make use of the dataset and/or toolkit in your research, we ask that you please cite the following paper in addition to the paper that introduces the underlying dataset(s) on which UDS is based.

White, Aaron Steven, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Subrahmanyan Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, et al. 2020. The Universal Decompositional Semantics Dataset and Decomp Toolkit. In Proceedings of The 12th Language Resources and Evaluation Conference, 5698–5707. Marseille, France: European Language Resources Association.

@inproceedings{white-etal-2020-universal,
    title = "The Universal Decompositional Semantics Dataset and Decomp Toolkit",
    author = "White, Aaron Steven  and
      Stengel-Eskin, Elias  and
      Vashishtha, Siddharth  and
      Govindarajan, Venkata Subrahmanyan  and
      Reisinger, Dee Ann  and
      Vieira, Tim  and
      Sakaguchi, Keisuke  and
      Zhang, Sheng  and
      Ferraro, Francis  and
      Rudinger, Rachel  and
      Rawlins, Kyle  and
      Van Durme, Benjamin",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.699",
    pages = "5698--5707",
    ISBN = "979-10-95546-34-4",
}

License

Everything besides the contents of decomp/data is covered by the MIT License contained at the same directory level as this README. All contents of decomp/data are covered by the CC-BY-SA 4.0 license contained in that directory.

Installation

The most painless way to get started quickly is to use the included Dockerfile based on jupyter/datascience-notebook with Python 3.12. To build the image and start a Jupyter Lab server:

git clone https://github.com/decompositional-semantics-initiative/decomp.git
cd decomp
docker build -t decomp .
docker run -it -p 8888:8888 decomp

This will start a Jupyter Lab server accessible at http://localhost:8888. To start a Python interactive prompt instead:

docker run -it decomp python

If you prefer to install directly to your local environment, you can use pip to install from PyPI:

pip install decomp

Or install the latest development version from GitHub:

pip install git+https://github.com/decompositional-semantics-initiative/decomp.git

Requirements: Python 3.12 or higher is required.

You can also clone the repository and install from source:

git clone https://github.com/decompositional-semantics-initiative/decomp.git
cd decomp
pip install .

For development, install the package in editable mode with development dependencies:

git clone https://github.com/decompositional-semantics-initiative/decomp.git
cd decomp
pip install -e ".[dev]"

This installs the package in editable mode along with development tools including pytest, ruff, mypy, and ipython.

Note for developers: The development dependencies include most testing requirements, but predpatt (used for differential testing) must be installed separately due to PyPI restrictions on git dependencies:

pip install git+https://github.com/hltcoe/PredPatt.git

Quick Start

The UDS corpus can be read by directly importing it.

from decomp import UDSCorpus

uds = UDSCorpus()

This creates a UDSCorpus object uds, which contains all graphs across all splits in the data. If you would like a corpus containing, e.g., only a particular split, see the other loading options in the tutorial on reading the corpus.

The first time you read UDS, it will take several minutes to complete while the dataset is built from the Universal Dependencies English Web Treebank, which is not shipped with the package (but is downloaded automatically when first creating a corpus instance), and the UDS annotations, which are shipped with the package. Subsequent uses will be faster, since the dataset is cached on build.

UDSGraph objects in the corpus can be accessed using standard dictionary getters or iteration. For instance, to get the UDS graph corresponding to the 12th sentence in en-ud-train.conllu, you can use:

uds["ewt-train-12"]
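The identifier used above encodes the treebank split and the sentence number. As a minimal sketch, assuming every identifier follows the `<corpus>-<split>-<number>` shape seen in "ewt-train-12", a hypothetical helper (`parse_graph_id` and `GraphId` are illustrative names, not part of Decomp) can pull those pieces apart:

```python
from typing import NamedTuple


class GraphId(NamedTuple):
    """Parts of a UDS graph identifier (illustrative type, not part of Decomp)."""

    corpus: str
    split: str
    sentence_number: int


def parse_graph_id(graphid: str) -> GraphId:
    """Split an identifier such as "ewt-train-12" into its three parts.

    Assumes the "<corpus>-<split>-<number>" shape shown in this README;
    any other shape raises ValueError.
    """
    corpus, split, number = graphid.split("-")
    return GraphId(corpus, split, int(number))


parsed = parse_graph_id("ewt-train-12")
print(parsed.split, parsed.sentence_number)  # train 12
```

This only inspects the string, so it runs without loading the corpus.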

More generally, UDSCorpus objects behave like dictionaries. For example, to print all the graph identifiers in the corpus (e.g. "ewt-train-12"), you can use:

for graphid in uds:
    print(graphid)

Similarly, to print all the graph identifiers in the corpus (e.g. "ewt-train-12") along with the corresponding sentence, you can use:

for graphid, graph in uds.items():
    print(graphid)
    print(graph.sentence)

A list of graph identifiers can also be accessed via the graphids attribute of the UDSCorpus. A mapping from these identifiers to the corresponding graphs can be accessed via the graphs attribute.

# a list of the graph identifiers in the corpus
uds.graphids

# a dictionary mapping the graph identifiers to the
# corresponding graph
uds.graphs
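Because the corpus behaves like a dictionary, ordinary dictionary idioms apply to it. The sketch below uses a plain dict as a stand-in for a loaded corpus, so it runs without the dataset; the entries are invented placeholders, `ids_by_split` is a hypothetical helper, and the `ewt-<split>-<number>` identifier shape is assumed from the "ewt-train-12" example:

```python
from collections import defaultdict

# Stand-in for a loaded corpus: graph identifier -> sentence text.
# With Decomp installed you would iterate over the corpus itself and read
# `graph.sentence`; the entries here are invented placeholders.
corpus = {
    "ewt-train-12": "placeholder sentence A",
    "ewt-train-13": "placeholder sentence B",
    "ewt-dev-1": "placeholder sentence C",
}


def ids_by_split(graphids):
    """Group graph identifiers by the split named in their second field."""
    groups = defaultdict(list)
    for graphid in graphids:
        split = graphid.split("-")[1]
        groups[split].append(graphid)
    return dict(groups)


# Iterating over a dict yields its keys, just as iterating over the
# corpus yields graph identifiers.
groups = ids_by_split(corpus)
print(sorted(groups))
```

The same function should accept the list of graph identifiers described above, since it only needs an iterable of strings.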

There are various instance attributes and methods for accessing nodes, edges, and their attributes in the UDS graphs. For example, to get a dictionary mapping identifiers for syntax nodes in the UDS graph to their attributes, you can use:

uds["ewt-train-12"].syntax_nodes

To get a dictionary mapping identifiers for semantics nodes in the UDS graph to their attributes, you can use:

uds["ewt-train-12"].semantics_nodes

To get a dictionary mapping identifiers for semantics edges (tuples of node identifiers) in the UDS graph to their attributes, you can use:

uds["ewt-train-12"].semantics_edges()

To get a dictionary mapping identifiers for semantics edges (tuples of node identifiers) in the UDS graph involving the predicate headed by the 7th token to their attributes, you can use:

uds["ewt-train-12"].semantics_edges('ewt-train-12-semantics-pred-7')
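To make the shape of these return values concrete, here is a small sketch that mimics the filtering on a hand-written dictionary. Everything in it is illustrative: the edge attributes are invented, the `-semantics-arg-` identifiers are an assumption modeled on the `-semantics-pred-` form above, and `edges_involving` is a hypothetical helper rather than a Decomp function:

```python
# Stand-in for a dictionary of semantics edges: keys are
# (node id, node id) tuples, values are attribute dictionaries.
# All entries below are invented for illustration.
edges = {
    ("ewt-train-12-semantics-pred-7", "ewt-train-12-semantics-arg-3"): {"type": "dependency"},
    ("ewt-train-12-semantics-pred-7", "ewt-train-12-semantics-arg-9"): {"type": "dependency"},
    ("ewt-train-12-semantics-pred-11", "ewt-train-12-semantics-arg-13"): {"type": "dependency"},
}


def edges_involving(edges, nodeid):
    """Keep only the edges that have `nodeid` as one of their endpoints."""
    return {pair: attrs for pair, attrs in edges.items() if nodeid in pair}


pred7 = edges_involving(edges, "ewt-train-12-semantics-pred-7")
print(len(pred7))  # 2
```

Membership is tested against the whole identifier, so "…-pred-7" does not accidentally match a longer identifier that merely starts with the same characters.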

To get a dictionary mapping identifiers for syntax edges (tuples of node identifiers) in the UDS graph to their attributes, you can use:

uds["ewt-train-12"].syntax_edges()

And to get a dictionary mapping identifiers for syntax edges (tuples of node identifiers) in the UDS graph involving the node for the 7th token to their attributes, you can use:

uds["ewt-train-12"].syntax_edges('ewt-train-12-syntax-7')
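The node identifiers in these calls extend the graph identifier with a domain and a token position. A hedged sketch of that naming scheme, inferred only from the two identifiers that appear in this README (`syntax_node_id` and `predicate_node_id` are hypothetical helpers, not Decomp functions):

```python
def syntax_node_id(graphid: str, position: int) -> str:
    """Identifier of the syntax node for the token at `position`."""
    return f"{graphid}-syntax-{position}"


def predicate_node_id(graphid: str, head_position: int) -> str:
    """Identifier of the predicate node headed by the token at `head_position`."""
    return f"{graphid}-semantics-pred-{head_position}"


print(syntax_node_id("ewt-train-12", 7))     # ewt-train-12-syntax-7
print(predicate_node_id("ewt-train-12", 7))  # ewt-train-12-semantics-pred-7
```

Building identifiers this way avoids hand-typing the graph identifier twice; whether a given node actually exists in a graph still has to be checked against the graph itself.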

There are also methods for accessing relationships between semantics and syntax nodes. For example, to get a tuple containing the ordinal position of the head syntax node for the predicate headed by the 7th token in the corresponding sentence, along with a list of the form and lemma attributes for that token, you can use:

uds["ewt-train-12"].head('ewt-train-12-semantics-pred-7', ['form','lemma'])

And if you want the same information for every token in the span, you can use:

uds["ewt-train-12"].span('ewt-train-12-semantics-pred-7', ['form','lemma'])

This will return a dictionary mapping the ordinal positions of the syntax nodes in the UDS graph that make up the predicate headed by the 7th token in the corresponding sentence to a list of the form and lemma attributes for the corresponding tokens.
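As a rough illustration of working with a result of that shape, the sketch below rebuilds the span text from a hand-written dictionary. The positions and word forms are invented, and `span_text` is a hypothetical helper, not part of Decomp:

```python
# Stand-in for a span result requested with ['form', 'lemma']:
# ordinal token position -> [form, lemma]. Values are invented.
span = {
    7: ["killed", "kill"],
    6: ["was", "be"],
}


def span_text(span, index=0):
    """Join one attribute (0 = form, 1 = lemma) in token order."""
    return " ".join(span[pos][index] for pos in sorted(span))


print(span_text(span))     # was killed
print(span_text(span, 1))  # be kill
```

Sorting by position matters because nothing in the description above promises that the dictionary is ordered by token position.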

More complicated queries of the UDS graph can be performed using the query method, which accepts arbitrary SPARQL 1.1 queries. See the tutorial on querying the corpus for details.
