A tool that divides Japanese full names into family and given names.
rskmoi/namedivider-python

NameDivider is a tool that divides Japanese full names into family and given names.
🚀 Try Live Demo • 📖 Documentation (Japanese) • 🐳 Docker API • ⚡ Rust Version
Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider finds that boundary automatically, with a reported accuracy of 99.3% to 99.9% depending on the algorithm.
Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.
```python
# Before
person_name = "菅義偉"  # How do you know where to divide?

# After
from namedivider import BasicNameDivider

divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉
```
- 🎯 99.91% accuracy - Tested on real-world Japanese names
- ⚡ Multiple algorithms - Choose between speed (Basic) or accuracy (GBDT)
- 🔐 Privacy-first - Local-only processing, ideal for sensitive data
- 🔧 Production ready - CLI, Python library, and Docker support
- 🎨 Interactive demo - Try it live with Streamlit
- 📊 Confidence scoring - Know when to trust the results
- 🛠️ Customizable rules - Add domain-specific patterns
```bash
pip install namedivider-python
```
```python
from namedivider import BasicNameDivider, GBDTNameDivider

# Fast, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)
# 菅 義偉

# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#     'algorithm': 'gbdt',
#     'family': '菅',
#     'given': '義偉',
#     'score': 0.7300634880343344,
#     'separator': ' '
# }
```
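The `score` field in the result can drive a simple review workflow. The sketch below is one possible approach and not part of NameDivider: `StubResult` is a hypothetical stand-in with the same `family`, `given`, and `score` attributes shown above, so the snippet runs without the library installed. The 0.5 threshold is an arbitrary assumption and should be tuned per algorithm on your own data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class StubResult:
    """Hypothetical stand-in mirroring the family/given/score fields of a divided name."""
    family: str
    given: str
    score: float


def triage(results: Iterable, threshold: float = 0.5) -> Tuple[List, List]:
    """Split results into (accepted, needs_review) by comparing score to threshold."""
    accepted, needs_review = [], []
    for r in results:
        if r.score >= threshold:
            accepted.append(r)
        else:
            needs_review.append(r)
    return accepted, needs_review


sample = [
    StubResult("菅", "義偉", 0.73),  # score rounded from the GBDT example output
    StubResult("竜胆", "尊", 0.31),  # made-up low score, for illustration only
]
accepted, needs_review = triage(sample)
print(len(accepted), len(needs_review))
```

With the real library, the objects returned by `divide_name` could be passed to `triage` directly, since it only reads the `score` attribute.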
Perfect for batch processing and automation:
```bash
# Single name
$ nmdiv name 菅義偉
菅 義偉

# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]

# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
```
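To make the accuracy check concrete, here is a minimal sketch of how accuracy on labeled data could be computed in Python. It is an illustration, not the CLI's actual implementation. It assumes each labeled line holds the family and given name separated by whitespace, and `first_char_divider` is a hypothetical toy baseline used only so the snippet runs without the library.

```python
from typing import Callable, Iterable, Tuple


def accuracy(labeled_lines: Iterable[str],
             divide: Callable[[str], Tuple[str, str]]) -> float:
    """Fraction of labeled names for which divide() reproduces the labeled split."""
    total = 0
    correct = 0
    for line in labeled_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        family, given = parts
        total += 1
        if divide(family + given) == (family, given):
            correct += 1
    return correct / total if total else 0.0


def first_char_divider(name: str) -> Tuple[str, str]:
    """Hypothetical baseline: always treat the first character as the family name."""
    return name[:1], name[1:]


labeled = ["菅 義偉", "竈門 炭治郎"]
print(accuracy(labeled, first_char_divider))
```

To evaluate a real divider, wrap `divide_name` in a small function that returns `(result.family, result.given)` and pass it as `divide`.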
For environments where Python cannot be used, we provide a containerized REST API:
```bash
# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api

# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'
```
Response:
```json
{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": "", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": "", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
```
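For callers that prefer Python over `curl`, the same request can be sketched with the standard library alone. The endpoint URL and the `names` / `divided_names` keys come from the example above. The helper names `build_request`, `parse_response`, and `divide_names` are hypothetical, and the 10-second timeout is an assumption.

```python
import json
import urllib.request
from typing import Dict, List

API_URL = "http://localhost:8000/divide"  # matches the docker run example


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the batch of names."""
    payload = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: str) -> List[Dict]:
    """Extract the list of divided names from the JSON response body."""
    return json.loads(body)["divided_names"]


def divide_names(names: List[str], url: str = API_URL, timeout: float = 10.0) -> List[Dict]:
    """Send the batch to a running API server and return the parsed results."""
    with urllib.request.urlopen(build_request(names, url), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))


# Offline check against a trimmed copy of the sample response; no server required.
sample_body = '{"divided_names": [{"family": "竈門", "given": "炭治郎"}]}'
print(parse_response(sample_body)[0]["family"])
```

Calling `divide_names(["竈門炭治郎", "竈門禰豆子"])` requires the container to be running on port 8000.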
Try NameDivider instantly in your browser: Live Demo →
Run locally:
```bash
cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py
```
Algorithm | Accuracy | Speed (names/sec) | Use Case |
---|---|---|---|
BasicNameDivider / backend=python | 99.3% | 4152.8 | Stable & compatible |
BasicNameDivider / backend=rust | 99.3% | 18597.7 | Max performance (if available) |
GBDTNameDivider / backend=python | 99.9% | 1143.3 | Best accuracy, guaranteed |
GBDTNameDivider / backend=rust | 99.9% | 6277.4 | Fast + accurate (if available) |
Run your own benchmarks:
```bash
bash scripts/benchmark_sample.sh
```
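For a quick throughput estimate on your own data, independent of the script above, a timing loop along these lines may be enough. This is a rough sketch: `names_per_second` is a hypothetical helper, the best-of-three repeat count is an assumption, and the lambda is a toy stand-in for a real divider.

```python
import time
from typing import Callable, Sequence


def names_per_second(divide: Callable[[str], object],
                     names: Sequence[str],
                     repeats: int = 3) -> float:
    """Return the best observed throughput (names/sec) over several timed passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for name in names:
            divide(name)
        best = min(best, time.perf_counter() - start)
    return len(names) / best if best > 0 else float("inf")


# Toy stand-in so the snippet runs without the library; swap in divider.divide_name.
rate = names_per_second(lambda n: (n[:1], n[1:]), ["菅義偉"] * 10_000)
print(f"{rate:.1f} names/sec")
```

Taking the best of several passes reduces noise from other processes, but the numbers still depend on hardware and input, so they are not directly comparable to the table.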
Handle domain-specific names with custom patterns:
```python
from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule

config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
```
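Conceptually, a family-name rule of this kind can be thought of as a prefix match against a known list. The plain-Python sketch below illustrates that idea only. It is not the library's implementation, and `split_by_known_family` is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple


def split_by_known_family(name: str,
                          family_names: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (family, given) if name starts with a listed family name, else None."""
    # Try longer family names first so the most specific prefix wins.
    for family in sorted(family_names, key=len, reverse=True):
        # Require at least one character left over for the given name.
        if family and name.startswith(family) and len(name) > len(family):
            return family, name[len(family):]
    return None


print(split_by_known_family("竜胆尊", ["竜胆"]))
print(split_by_known_family("菅義偉", ["竜胆"]))
```

When no listed family name matches, the function returns `None`, which corresponds to falling back to the regular algorithm.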
For high-volume processing, NameDivider offers several optimization options:
```python
from namedivider import BasicNameDivider, BasicNameDividerConfig

# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f]

# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]

# Option 2: (beta) Use Rust backend (up to 4x faster)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
```
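When a dataset contains many repeated names, an application-level option is to divide each distinct name only once. The sketch below is separate from the library's `cache_mask` setting and makes no claim about how that setting works. `divide_unique` is a hypothetical helper, and `toy_divide` is a stand-in that counts its calls so the effect is visible.

```python
from typing import Callable, Dict, Iterable, List


def divide_unique(names: Iterable[str], divide: Callable[[str], object]) -> List[object]:
    """Divide each distinct name once and reuse the result for its repeats."""
    cache: Dict[str, object] = {}
    results = []
    for name in names:
        if name not in cache:
            cache[name] = divide(name)
        results.append(cache[name])
    return results


calls = []


def toy_divide(name: str):
    """Toy stand-in that records each call; swap in divider.divide_name."""
    calls.append(name)
    return name[:1], name[1:]


results = divide_unique(["菅義偉", "竜胆尊", "菅義偉"], toy_divide)
print(len(results), len(calls))
```

The output order matches the input order, so the results can be zipped back against the original list.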
- Customer Data Processing - Clean and standardize name databases
- Form Validation - Real-time name splitting in web applications
- Analytics & Reports - Generate family name statistics
- Data Migration - Convert legacy systems with combined name fields
- Government & Municipal - Process citizen registration data
- Security-sensitive Environments - Process names without sending data to external APIs
- 🌐 Use REST API with minimal client samples - Integration examples (7 languages available in namedivider-rs)
- ⚡ Performance Optimization - Handle large datasets efficiently
- 🔧 Custom Rules Examples - Domain-specific configurations
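For the analytics use case listed above, family name frequencies can be tallied from divided results with the standard library. This is a minimal sketch: `family_name_counts` is a hypothetical helper, and the `SimpleNamespace` objects stand in for results that expose a `family` attribute.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def family_name_counts(results: Iterable) -> Counter:
    """Count how often each family name appears among divided results."""
    return Counter(r.family for r in results)


# Stand-in results; real code would pass the objects returned by divide_name.
sample = [
    SimpleNamespace(family="竈門", given="炭治郎"),
    SimpleNamespace(family="竈門", given="禰豆子"),
    SimpleNamespace(family="菅", given="義偉"),
]
counts = family_name_counts(sample)
print(counts.most_common(1))
```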
MIT License
cc-by-sa-4.0
English
(1) Purpose of use
family_name_repository.pickle is available for commercial and non-commercial use when you use this software to divide names or to develop algorithms for dividing names.
Any other use of family_name_repository.pickle is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility for the software.
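Japanese / 日本語 (English translation)
(1) Purpose of use
When this software is used for name division or for developing name-division algorithms, family_name_repository.pickle may be used for both commercial and non-commercial purposes.
Use of family_name_repository.pickle for any other purpose is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility whatsoever regarding family_name_repository.pickle.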
The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).
- ⚡ namedivider-rs - High-performance Rust implementation
- 🧠 BERT Katakana Divider - Deep learning approach for katakana names