
A dataset of Arabic words with English glosses, sourced from Wikimedia and annotated with maximal diacritization.


CAMeL-Lab/CamelProp


This repository contains CP-WIKI-D3K, a dataset of 3,362 Arabic proper nouns from Wikipedia, each annotated with a gold-standard lemma diacritization and aligned with its English equivalent. It includes:

  1. The full dataset
  2. The postprocessing pipeline used to convert ChatGPT-4o outputs into the final annotations, as described in the accompanying paper [1]
  3. Markdown tables listing the examples used for few-shot and one-shot prompting
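Because the annotations add diacritics to Arabic names taken from Wikipedia, one quick sanity check when working with the data is to strip the diacritics from a diacritized form and confirm that the base letters are unchanged. The sketch below is illustrative only: the function names `dediacritize` and `is_consistent` are hypothetical, the set of marks removed (U+064B to U+0652 plus dagger alef U+0670) is an assumption, and this is not necessarily how the repository's postprocessing pipeline validates ChatGPT-4o outputs.

```python
# Assumed set of Arabic diacritic marks: tanween (U+064B-U+064D),
# fatha/damma/kasra (U+064E-U+0650), shadda (U+0651), sukun (U+0652),
# and dagger (superscript) alef (U+0670). Hamza marks are deliberately kept.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def dediacritize(text: str) -> str:
    """Hypothetical helper: drop diacritic marks, keeping the base letters."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


def is_consistent(undiacritized: str, diacritized: str) -> bool:
    """Hypothetical check: True if the diacritized form only adds diacritics."""
    return dediacritize(diacritized) == undiacritized


if __name__ == "__main__":
    plain = "\u0645\u062D\u0645\u062F"  # محمد
    gold = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"  # مُحَمَّد
    print(is_consistent(plain, gold))  # True
```

A check like this flags outputs where a model altered, added, or dropped letters rather than only adding diacritics; it says nothing about whether the chosen diacritics are correct.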

Footnotes

  1. Rawan Bondok, Mayar Nassar, Salam Khalifa, Kurt Micallef, and Nizar Habash. 2025. Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset. arXiv:2505.02656.
