CAMeL-Lab/CamelProp
This repository contains CP-WIKI-D3K, a dataset of 3,362 Arabic proper nouns from Wikipedia, each annotated with gold-standard lemma diacritizations and aligned with their English equivalents. It includes:
- The full dataset
- The postprocessing pipeline used to convert ChatGPT-4o outputs into final annotations, as described in the accompanying paper [1]
- Markdown tables listing the examples used for few-shot and one-shot prompting
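When working with diacritized lemmas like those in this dataset, a common sanity check is that removing the diacritics from an annotation yields the original undiacritized form. The sketch below illustrates that check only; it is not the repository's postprocessing pipeline. The function names `strip_diacritics` and `is_consistent` are hypothetical, and the set of characters treated as diacritics (U+064B to U+0652 plus the dagger alif U+0670) is an assumption that may differ from the authors' conventions.

```python
import re

# Assumed diacritic set: tanween, short vowels, shadda, and sukun
# (U+064B-U+0652), plus the dagger alif (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return text with the assumed Arabic diacritic marks removed."""
    return DIACRITICS.sub("", text)


def is_consistent(undiacritized: str, diacritized: str) -> bool:
    """Check that a diacritized form only adds diacritics to the base form."""
    return strip_diacritics(diacritized) == strip_diacritics(undiacritized)


# Illustrative usage with a made-up pair, not a row from the dataset.
print(is_consistent("محمد", "مُحَمَّد"))
```

A check like this only catches annotations that change the base letters; it says nothing about whether the chosen diacritics are linguistically correct.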
Footnotes
1. Rawan Bondok, Mayar Nassar, Salam Khalifa, Kurt Micallef, Nizar Habash (2025). Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset. arXiv:2505.02656.
About
A dataset of Arabic words with English glosses, sourced from Wikimedia and annotated with maximal diacritization.