A Russian data set for question answering over Wikidata


vladislavneon/RuBQ


Introduction

We present RuBQ (pronounced [`rubik]), short for Russian Knowledge Base Questions, a KBQA dataset that consists of 1,500 Russian questions of varying complexity along with their English machine translations, corresponding SPARQL queries, answers, and a subset of Wikidata covering entities with Russian labels. To the best of our knowledge, this is the first Russian KBQA and semantic parsing dataset. The dataset is intended to be used as development and test sets in cross-lingual transfer, few-shot learning, or learning with synthetic data scenarios.

Download

Usage

Format

Data set files are presented in JSON format as an array of dictionary entries. See full specifications here.
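
As a minimal sketch of working with this format, the snippet below parses a JSON array of dictionaries and counts how often each tag occurs. The key names (`question_text`, `question_eng`, `query`, `answers`, `tags`) and the two sample entries are assumptions made for illustration; check the full specification for the actual field names.

```python
import json
from collections import Counter

# Hypothetical entries mirroring the examples in this README.
# The key names are assumptions, not the official specification.
SAMPLE_ENTRIES = [
    {
        "question_text": "Кто написал роман «Хижина дяди Тома»?",
        "question_eng": "Who wrote the novel \"Uncle Tom's Cabin\"?",
        "query": "SELECT ?answer WHERE { wd:Q2222 wdt:P50 ?answer . }",
        "answers": ["wd:Q102513"],
        "tags": ["1-hop"],
    },
    {
        "question_text": "Какой океан самый маленький?",
        "question_eng": "Which ocean is the smallest?",
        "query": "SELECT ?answer WHERE { ... } ORDER BY ASC(?sq) LIMIT 1",
        "answers": ["wd:Q788"],
        "tags": ["multi-constraint", "reverse", "ranking"],
    },
]

# Stand-in for the contents of a data set file.
RAW_JSON = json.dumps(SAMPLE_ENTRIES, ensure_ascii=False)


def load_entries(text):
    """Parse a JSON array of dictionaries and validate its overall shape."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(e, dict) for e in data):
        raise ValueError("expected a JSON array of dictionary entries")
    return data


def tag_counts(entries):
    """Count tag occurrences; entries without a 'tags' key contribute nothing."""
    return Counter(tag for entry in entries for tag in entry.get("tags", []))


if __name__ == "__main__":
    for tag, count in tag_counts(load_entries(RAW_JSON)).most_common():
        print(tag, count)
```

To read a real file, pass the text obtained from `open(path, encoding="utf-8").read()` to `load_entries`.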

Examples

Question (Rus): Кто написал роман «Хижина дяди Тома»?
Question (Eng): Who wrote the novel "Uncle Tom's Cabin"?
Query:
    SELECT ?answer
    WHERE {
      wd:Q2222 wdt:P50 ?answer .
    }
Answers: wd:Q102513 (Harriet Beecher Stowe)
Tags: 1-hop

Question (Rus): Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»?
Question (Eng): Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and peace"?
Query:
    SELECT ?answer
    WHERE {
      wd:Q845176 p:P161 [
        ps:P161 ?answer;
        pq:P453 wd:Q2737140
      ] .
    }
Answers: wd:Q312483 (Vyacheslav Tikhonov)
Tags: qualifier-constraint

Question (Rus): Кто на работе пользуется теодолитом?
Question (Eng): Who uses a theodolite for work?
Query:
    SELECT ?answer
    WHERE {
      wd:Q181517 wdt:P366 [
        wdt:P3095 ?answer
      ] .
    }
Answers: wd:Q1734662 (cartographer), wd:Q11699606 (geodesist), wd:Q294126 (land surveyor)
Tags: multi-hop

Question (Rus): Какой океан самый маленький?
Question (Eng): Which ocean is the smallest?
Query:
    SELECT ?answer
    WHERE {
      ?answer p:P2046/
              psn:P2046/
              wikibase:quantityAmount ?sq .
      ?answer wdt:P31 wd:Q9430 .
    }
    ORDER BY ASC(?sq)
    LIMIT 1
Answers: wd:Q788 (Arctic Ocean)
Tags: multi-constraint, reverse, ranking
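
The example queries rely on the standard Wikidata prefixes (`wd:`, `wdt:`, `p:`, `ps:`, `pq:`, `psn:`, `wikibase:`) without declaring them. A triple store other than the Wikidata Query Service may need explicit `PREFIX` declarations. The helper below is an illustrative sketch, not part of the dataset tooling: the function name `add_prefixes` is hypothetical, and it prepends only the prefixes a query actually uses.

```python
import re

# Standard Wikidata namespace IRIs.
WIKIDATA_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "psn": "http://www.wikidata.org/prop/statement/value-normalized/",
    "wikibase": "http://wikiba.se/ontology#",
}


def add_prefixes(query: str) -> str:
    """Prepend PREFIX declarations for every known prefix used in the query."""
    used = [
        name
        for name in WIKIDATA_PREFIXES
        # The lookbehind keeps "p:" from matching inside a longer token.
        if re.search(rf"(?<![\w:]){name}:", query)
    ]
    header = "\n".join(f"PREFIX {n}: <{WIKIDATA_PREFIXES[n]}>" for n in used)
    return f"{header}\n{query}" if header else query


if __name__ == "__main__":
    one_hop = "SELECT ?answer WHERE { wd:Q2222 wdt:P50 ?answer . }"
    print(add_prefixes(one_hop))
```

The detection is purely textual, so a prefix-like token inside a string literal would also trigger a declaration; that is harmless, since an unused `PREFIX` line does not change the query result.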

RuWikidata Sample

We provide a Wikidata sample containing all the entities with Russian labels. It consists of about 212M triples with 8.1M unique entities. This snapshot mitigates the problem of Wikidata’s dynamics – a reference answer may change with time as the knowledge base evolves. The sample guarantees the correctness of the queries and answers. In addition, the smaller dump makes it easier to conduct experiments with our dataset.

We strongly recommend using this sample for evaluation.

Details

The sample is a collection of several RDF files in Turtle format.

  • wdt_all.ttl contains all the truthy statements.
  • names.ttl contains Russian and English labels and aliases for all entities. Names in other languages are also provided when needed.
  • onto.ttl contains all Wikidata triples with the relation wdt:P279 (subclass of). It represents a class hierarchy of sorts, but remember that Wikidata has no formal class or instance concepts.
  • pch_{0,6}.ttl contain all statement nodes and their data for all entities.
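
One way to use the `wdt:P279` triples from onto.ttl is to walk the subclass-of relation transitively. The sketch below assumes the triples have already been read into (subclass, superclass) pairs by whatever Turtle parser you prefer. The identifiers in `TOY_PAIRS` are made up for illustration and are not real Wikidata items.

```python
from collections import defaultdict

# Made-up (subclass, superclass) pairs standing in for wdt:P279 triples.
TOY_PAIRS = [
    ("toy:sea", "toy:body_of_water"),
    ("toy:ocean", "toy:body_of_water"),
    ("toy:body_of_water", "toy:geo_object"),
]


def superclasses(pairs, start):
    """Return every class reachable from `start` by following subclass-of links."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)

    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            # The visited check also guards against cycles in the hierarchy.
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(superclasses(TOY_PAIRS, "toy:sea")))
```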

Evaluation

rdfs:label and skos:altLabel predicates convention

Some questions in our dataset require using rdfs:label or skos:altLabel to retrieve an answer that is a literal. In cases where the answer language does not have to be inferred from the question, our evaluation script takes only Russian literals into account.
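
The convention can be illustrated with a small filter that keeps only Russian-tagged literals before comparing answers. This is a sketch of the idea, not the actual evaluation script: the function name, the (value, language tag) tuple representation, and the sample literals are all assumptions.

```python
# Illustrative literals as (value, language tag) pairs; an assumption about
# how label query results might be represented after parsing.
LITERALS = [
    ("Северный Ледовитый океан", "ru"),
    ("Arctic Ocean", "en"),
]


def russian_literals(literals):
    """Keep only the values whose language tag is exactly 'ru'."""
    return {value for value, lang in literals if lang == "ru"}


if __name__ == "__main__":
    print(russian_literals(LITERALS))
```

Comparing the filtered set against the reference answers then ignores labels in other languages that the same query may return.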

Leaderboard

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0
