namedivider-python🦒

NameDivider is a tool that divides Japanese full names into family and given names.

🚀 Try Live Demo • 📖 Documentation (Japanese) • 🐳 Docker API • ⚡ Rust Version

💡 Why NameDivider?

Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider finds that boundary with up to 99.91% accuracy on real-world Japanese names.

Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.

# Before
person_name = "菅義偉"  # How do you know where to divide?
# After
from namedivider import BasicNameDivider
divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉

✨ Key Features

🎯 99.91% accuracy - Tested on real-world Japanese names
⚡ Multiple algorithms - Choose between speed (Basic) or accuracy (GBDT)
🔐 Privacy-first - Local-only processing, ideal for sensitive data
🔧 Production ready - CLI, Python library, and Docker support
🎨 Interactive demo - Try it live with Streamlit
📊 Confidence scoring - Know when to trust the results
🛠️ Customizable rules - Add domain-specific patterns

🚀 Quick Start

Installation

pip install namedivider-python

Basic Usage

from namedivider import BasicNameDivider, GBDTNameDivider
# Fast but good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)
# 菅 義偉
# Slower but best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#   'algorithm': 'gbdt',
#   'family': '菅',
#   'given': '義偉',
#   'score': 0.7300634880343344,
#   'separator': ' '
# }
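The `score` field in the `to_dict()` output above can be used to send uncertain splits to manual review. The following is a minimal sketch and not part of NameDivider: `needs_review`, `triage`, and the `0.5` threshold are hypothetical, and the code assumes only the dictionary shape shown above. The scores shown in this README differ by algorithm (about 0.73 for GBDT here, 1.0 for a rule match), so one threshold may not suit every algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff for illustration only; tune it on your own labeled data.
REVIEW_THRESHOLD = 0.5


def needs_review(divided: Dict[str, object], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a to_dict()-style result scores below the threshold."""
    return float(divided["score"]) < threshold


def triage(
    results: List[Dict[str, object]], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict[str, object]], List[Dict[str, object]]]:
    """Split results into (accepted, flagged_for_review), preserving order."""
    accepted: List[Dict[str, object]] = []
    flagged: List[Dict[str, object]] = []
    for item in results:
        if needs_review(item, threshold):
            flagged.append(item)
        else:
            accepted.append(item)
    return accepted, flagged


if __name__ == "__main__":
    # Dictionary copied from the GBDT example above.
    sample = {
        "algorithm": "gbdt",
        "family": "菅",
        "given": "義偉",
        "score": 0.7300634880343344,
        "separator": " ",
    }
    accepted, flagged = triage([sample])
    print(len(accepted), "accepted,", len(flagged), "flagged")
```

With a real divider, the input list could be built as `[divider.divide_name(n).to_dict() for n in names]`.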

🔧 Multiple Interfaces

🖥️ Command Line Interface

Perfect for batch processing and automation:

# Single name
$ nmdiv name 菅義偉
菅 義偉
# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]
# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%

🐳 REST API (Docker)

For environments where Python cannot be used, we provide a containerized REST API:

# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api
# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'

Response:

{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": "", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": "", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
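The same request can be sent from a script. The sketch below uses only the Python standard library and mirrors the `curl` call and response shape shown above. The helper names (`build_request`, `parse_response`, `divide_names`) and the 10-second timeout are illustrative assumptions, and the URL assumes the container is listening on `localhost:8000` as in the `docker run` command.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumes the API container from the docker command above is running locally.
API_URL = "http://localhost:8000/divide"


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from a response shaped like the one above."""
    data = json.loads(body.decode("utf-8"))
    return [(item["family"], item["given"]) for item in data["divided_names"]]


def divide_names(names: List[str], url: str = API_URL, timeout: float = 10.0) -> List[Tuple[str, str]]:
    """Send the names to the API and return the parsed pairs."""
    with urllib.request.urlopen(build_request(names, url), timeout=timeout) as resp:
        return parse_response(resp.read())


# Usage (requires the API server to be running):
#   pairs = divide_names(["竈門炭治郎", "竈門禰豆子"])
```

Keeping request building and response parsing separate from the network call makes both easy to test without a running server.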

🎯 Interactive Web Demo

Try NameDivider instantly in your browser: Live Demo →

Run locally:

cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py

📊 Performance & Benchmarks

Algorithm	Accuracy	Speed (names/sec)	Use Case
BasicNameDivider / backend=python	99.3%	4152.8	Stable & compatible
BasicNameDivider / backend=rust	99.3%	18597.7	Max performance (if available)
GBDTNameDivider / backend=python	99.9%	1143.3	Best accuracy, guaranteed
GBDTNameDivider / backend=rust	99.9%	6277.4	Fast + accurate (if available)

Run your own benchmarks:

bash scripts/benchmark_sample.sh
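To turn the table's throughput figures into a rough time budget, divide the number of names by the names/sec rate. The sketch below does that arithmetic with the values copied from the table. It assumes throughput stays constant for the whole run, and the figures will differ on other hardware, so treat the results as estimates only. The function names are illustrative.

```python
# Throughput figures (names/sec) copied from the benchmark table above.
THROUGHPUT = {
    ("basic", "python"): 4152.8,
    ("basic", "rust"): 18597.7,
    ("gbdt", "python"): 1143.3,
    ("gbdt", "rust"): 6277.4,
}


def estimated_seconds(n_names: int, algorithm: str, backend: str) -> float:
    """Estimate processing time, assuming a constant names/sec rate."""
    return n_names / THROUGHPUT[(algorithm, backend)]


def rust_speedup(algorithm: str) -> float:
    """Ratio of Rust-backend to Python-backend throughput in the table."""
    return THROUGHPUT[(algorithm, "rust")] / THROUGHPUT[(algorithm, "python")]


if __name__ == "__main__":
    for (algorithm, backend) in THROUGHPUT:
        secs = estimated_seconds(1_000_000, algorithm, backend)
        print(f"{algorithm:5s} {backend:6s} {secs:8.1f} s per 1M names")
```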

🛠️ Advanced Features

Custom Rules

Handle domain-specific names with custom patterns:

from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule
config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')

Speed Up

For high-volume processing, NameDivider offers several optimization options:

from namedivider import BasicNameDivider, BasicNameDividerConfig
# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f]
# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
# Option 2: (beta) Use Rust backend (up to 4x faster)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
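Independently of the library options above, names that repeat within a large file can be divided once and the result reused. The sketch below is a generic deduplication wrapper. It is not a NameDivider feature and says nothing about how `cache_mask` works internally. `divide_unique` is a hypothetical helper that accepts any callable, such as `divider.divide_name`.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def divide_unique(names: Iterable[str], divide: Callable[[str], T]) -> List[T]:
    """Call `divide` once per distinct name and return results in input order."""
    cache: Dict[str, T] = {}
    results: List[T] = []
    for name in names:
        if name not in cache:
            # First occurrence: do the actual work and remember the result.
            cache[name] = divide(name)
        results.append(cache[name])
    return results
```

With a real divider this would be called as `results = divide_unique(names, divider.divide_name)`. Whether it helps depends on how many duplicates the data contains.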

🏢 Typical Use Cases

Customer Data Processing - Clean and standardize name databases
Form Validation - Real-time name splitting in web applications
Analytics & Reports - Generate family name statistics
Data Migration - Convert legacy systems with combined name fields
Government & Municipal - Process citizen registration data
Security-sensitive Environments - Process names without sending data to external APIs
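As an illustration of the analytics use case, family name frequencies can be counted from divided results. This is a minimal sketch that assumes each result is a dictionary with a `family` key, as produced by `to_dict()` or returned by the REST API. `family_name_counts` is a hypothetical helper, not part of the library.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def family_name_counts(
    divided: Iterable[Dict[str, object]], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Return the top_n most frequent family names with their counts."""
    counts = Counter(str(item["family"]) for item in divided)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Family/given values taken from the examples in this README.
    sample = [
        {"family": "竈門", "given": "炭治郎"},
        {"family": "竈門", "given": "禰豆子"},
        {"family": "菅", "given": "義偉"},
    ]
    for family, count in family_name_counts(sample):
        print(family, count)
```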

📚 Examples & Tutorials

🌐 Use REST API with minimal client samples - Integration examples (7 languages available in namedivider-rs)
⚡ Performance Optimization - Handle large datasets efficiently
🔧 Custom Rules Examples - Domain-specific configurations

📄 License

Source code and gbdt_model_v1.txt

MIT License

bert_katakana_v0_3_0.pt

cc-by-sa-4.0

family_name_repository.pickle

English

(1) Purpose of use

family_name_repository.pickle is available for commercial/non-commercial use if you use this software to divide names, and to develop algorithms for dividing names.

Any other use of family_name_repository.pickle is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility for the software.

Japanese / 日本語

(1) 利用目的

このソフトウェアを用いて姓名分割、および姓名分割アルゴリズムの開発をする場合、family_name_repository.pickleは商用/非商用問わず利用可能です。

それ以外の目的でのfamily_name_repository.pickleの利用を禁じます。

(2) 責任

作者または著作権者は、family_name_repository.pickleに関して一切の責任を負いません。

The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).