CAMeL-Lab/CamelProp


A dataset of Arabic words and English glosses, sourced from Wikimedia and annotated with maximal diacritization.


Folders and files

CamelPROPWIKID3K.tsv
README.md
few_shots_table.txt
one_shot_table.txt
post-process.py
requirements.txt


CamelProp

This repository contains CP-WIKI-D3K, a dataset of 3,362 Arabic proper nouns from Wikipedia, each annotated with gold-standard lemma diacritizations and aligned with its English equivalent. It includes:

The full dataset
The postprocessing pipeline used to convert ChatGPT-4o outputs into final annotations, as described in¹
Markdown tables listing the examples used for few-shot and one-shot prompting
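
For readers who want to inspect the dataset, the following is a minimal sketch of loading a tab-separated file and stripping Arabic diacritics to recover an undiacritized form. It is not the repository's `post-process.py`. The helper names `read_tsv` and `strip_diacritics` are hypothetical, and the sketch assumes the first row of the TSV is a header; the column names are not documented here, so the code does not rely on any.

```python
import csv
import re

# Arabic diacritic marks: U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (superscript/dagger alif).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return text with Arabic diacritic marks removed."""
    return ARABIC_DIACRITICS.sub("", text)


def read_tsv(path: str) -> tuple[list[str], list[dict[str, str]]]:
    """Read a tab-separated file into (header, rows).

    ASSUMPTION: the first row is a header. Column names are taken
    from the file itself rather than hard-coded.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


# Example usage (the file name is taken from the repository listing):
#   header, rows = read_tsv("CamelPROPWIKID3K.tsv")
#   print(header)      # inspect the actual column names first
#   print(len(rows))
```

Printing the header before accessing any field is the safer route, since the column layout is an assumption until checked against the file.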

Footnotes

¹ Rawan Bondok, Mayar Nassar, Salam Khalifa, Kurt Micallef, Nizar Habash (2025). Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset. arXiv:2505.02656.
