Fortifying Community Truth: Developing Lexicons on Tech Facilitated GBV and Model for Low-resourced African Languages (Hausa, Igbo, Yoruba and Swahili)
holatung/Kumlinda
Kumlinda

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets and a lack of understanding of which settings, languages, and recently proposed methods, such as cross-lingual transfer, will be effective. Perpetrators of tech-facilitated gender-based violence (TFGBV) exploit local linguistic nuances online to target women and girls, and many generic or non-African models that were not trained on these languages perform poorly at flagging such terms on social media. We aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the largest human-annotated NER corpus to date of terms used on social media to target women and girls in four African languages (Hausa, Igbo, Yoruba, and Swahili).

Supported Languages

| Language | ISO 639-2 code |
| --- | --- |
| Hausa | hau |
| Igbo | ibo |
| Yoruba | yor |
| Swahili | swa |

Attributes

| Attribute | Contextualisation |
| --- | --- |
| Sexually Explicit | Comments expressing a desire for genital nudity, or depicting or describing simulated or actual sexual acts. |
| Vulgarity | Explicit and offensive references to sex or bodily functions. |
| Insult | Remarks that include offensive and disrespectful words. |
| Flirtation | Statements making inappropriate sexual advances with the intent to arouse sexual feelings. |
| Misogyny/Gender Trolling | Sexist remarks, jokes, and memes; expressions that treat women as objects or stereotype them; and sending unsolicited pornographic content. |
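If you consume the `attribute` field programmatically, it can help to pin the five labels in the table above to a fixed set. The sketch below is a minimal, hypothetical helper: the `Attribute` enum and `parse_attribute` function are not part of the repository, and only the lowercase value `vulgarity` is confirmed by the example row; the other label spellings are assumptions.

```python
from enum import Enum


class Attribute(Enum):
    """The five TFGBV attribute labels.

    Assumption: values are the lowercased attribute names. Only
    "vulgarity" is confirmed by the dataset's example row.
    """
    SEXUALLY_EXPLICIT = "sexually explicit"
    VULGARITY = "vulgarity"
    INSULT = "insult"
    FLIRTATION = "flirtation"
    MISOGYNY_GENDER_TROLLING = "misogyny/gender trolling"


def parse_attribute(raw: str) -> Attribute:
    """Map a raw metadata label to an Attribute, ignoring case and extra spaces."""
    normalized = " ".join(raw.strip().lower().split())
    try:
        return Attribute(normalized)
    except ValueError:
        # Surface unexpected labels instead of silently dropping rows.
        raise ValueError(f"unknown attribute label: {raw!r}") from None


print(parse_attribute("Vulgarity"))  # Attribute.VULGARITY
```

Failing loudly on unknown labels makes spelling drift between annotators visible early, rather than letting it split one class into several.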

Formats

All the data and associated metadata will be available together in a single file, in both CSV and JSON formats.

Kumlinda.csv - data and associated metadata in CSV format.

Kumlinda.json - data and associated metadata in JSON format.

Below is an example row.

    {
      "data": {
        "eng": "Her pussy would have been insanely expanded, Damn.",
        "hau": "Da an fadada farjinta da hauka, Damn",
        "ibo": "Ọtụ ya gaara agbasawanye nke ukwuu, Damn",
        "yor": "Òbò ma ti fẹ̀ bi olóríburúkú, Damn.",
        "swa": "Pussy yake ingekuwa imepanuliwa kichaa, Damn"
      },
      "metadata": {
        "scope": "Africa",
        "category": "Romance",
        "source": "https://www.nairaland.com/8248492/former-adult-film-actress-riley",
        "attribute": "vulgarity",
        "date": "2024-10-24"
      }
    }
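A minimal sketch of reshaping records like the example row into one row per language, which is a convenient layout for training or evaluating a per-language classifier. It assumes `Kumlinda.json` holds a single JSON array of such records (the top-level layout is not documented here); `flatten_record`, `load_records`, and the placeholder `sample` are hypothetical names, not part of the repository.

```python
import json
from typing import Iterator

# Language keys as they appear under "data" in the example row.
LANGUAGE_KEYS = ("eng", "hau", "ibo", "yor", "swa")


def flatten_record(record: dict) -> Iterator[dict]:
    """Yield one row per language version of a Kumlinda-style record.

    Each output row places a single language's text next to a copy of
    the shared metadata (scope, category, attribute, date, ...).
    """
    data = record.get("data", {})
    metadata = record.get("metadata", {})
    for lang in LANGUAGE_KEYS:
        text = data.get(lang)
        if text:  # skip languages missing from this record
            yield {"lang": lang, "text": text, **metadata}


def load_records(path: str) -> list:
    """Hypothetical loader: assumes the file is one JSON array of records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Placeholder record with the same shape as the example row (not real data).
sample = json.loads("""
{"data": {"eng": "<english text>", "hau": "<hausa text>", "swa": "<swahili text>"},
 "metadata": {"scope": "Africa", "category": "Romance",
              "attribute": "vulgarity", "date": "2024-10-24"}}
""")

rows = list(flatten_record(sample))
print(len(rows), rows[0]["lang"], rows[0]["attribute"])  # 3 eng vulgarity
```

Keeping the metadata on every row duplicates a little data, but it lets each language slice be filtered by attribute or date without joining back to the original record.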

Post surface

Facebook: Results include posts from the endpoints listed below.

Profiles: Posts that appear on profiles set to public with a verified badge or 25,000 or more followers

Pages: Posts that appear on public Pages, posted by anyone

Events: Posts that appear on public events, posted by anyone

Groups: Posts that appear on public groups, posted by anyone

Instagram (account type): Results include posts from the account types listed below.

Personal: Accounts set to public with a verified badge or 25,000 or more followers

Business: Accounts with access to tools for growing a business and reaching customers

Creator: Accounts for content creators, artists and influencers
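The inclusion criteria above can be read as a simple predicate over the source of each post. The sketch below encodes them under stated assumptions: the `Source` record and `is_eligible` function are hypothetical, and because the list does not say whether Instagram business and creator accounts face a follower threshold, the sketch assumes they only need to be public.

```python
from dataclasses import dataclass

# Threshold taken from the Profiles (Facebook) and Personal (Instagram) criteria.
FOLLOWER_THRESHOLD = 25_000


@dataclass
class Source:
    """Hypothetical description of the account or surface a post came from."""
    platform: str  # "facebook" or "instagram"
    surface: str   # "profile", "page", "event", "group", "personal", "business" or "creator"
    is_public: bool
    is_verified: bool = False
    followers: int = 0


def _meets_reach_bar(src: Source) -> bool:
    """Public, and either verified or with 25,000 or more followers."""
    return src.is_public and (src.is_verified or src.followers >= FOLLOWER_THRESHOLD)


def is_eligible(src: Source) -> bool:
    """Return True if a post from `src` falls inside the listed surfaces."""
    if src.platform == "facebook":
        if src.surface == "profile":
            return _meets_reach_bar(src)
        # Public Pages, events and groups count regardless of who posted.
        return src.surface in {"page", "event", "group"} and src.is_public
    if src.platform == "instagram":
        if src.surface == "personal":
            return _meets_reach_bar(src)
        # Assumption: business and creator accounts only need to be public.
        return src.surface in {"business", "creator"} and src.is_public
    return False


print(is_eligible(Source("facebook", "profile", True, followers=30_000)))  # True
print(is_eligible(Source("instagram", "personal", True, followers=1_200)))  # False
```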

1. [Manually Annotated Social Media Sentiment Dataset]

2. [Semi-automated Annotated Social Media Sentiment Lexicon]

Translation Team

Swahili

  • Nyakerario Omari
  • Joan Muthoki Kithanze

Hausa

Igbo

  • Ada
  • Ehis

Yoruba

  • Israel Olatunji Tijani
  • Salvation Grace

OpenSource Articles

Common Crawl is a non-profit organization that provides web crawl data for free. Its datasets are used by many organizations in both academia and industry, as can be seen on the Common Crawl examples page. Applications range from machine learning to natural language processing and computational linguistics.

Model

This model is available via the Kaggle Model Hub.

🚀 Deployment

The live version is hosted on GitHub and Kaggle. The pre-trained model is found in main.py.

Contact me

If you want to report a problem or suggest an enhancement, contributions from the community are welcome. Reach me by email (israel@chatve.co) or on X.
