derek73/python-nameparser
A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components.
- hn.title
- hn.first
- hn.middle
- hn.last
- hn.suffix
- hn.nickname
- hn.surnames (middle + last)
- hn.initials (first initial of each name part)
The supported name structure is generally "Title First Middle Last Suffix", where all pieces are optional. Comma-separated format like "Last, First" is also supported.
- Title Firstname "Nickname" Middle Middle Lastname Suffix
- Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
- Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
Instantiating the HumanName class with a string splits it on commas and then on spaces, classifying name parts based on their placement in the string and on matches against known name pieces such as titles and suffixes.
It correctly handles some common conjunctions and special last-name prefixes like "del". Titles and conjunctions can be chained together to handle complex titles like "Asst Secretary of State". It can also try to correct the capitalization of names that are entirely upper- or lowercase.
It attempts the best guess that can be made with a simple, rule-based approach. Its main use case is English, and it is not likely to be useful for languages that do not conform to the supported name structure. It's not perfect, but it gets you pretty far.
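The general idea of splitting on commas, then on spaces, and bucketing by position can be sketched in a few lines of plain Python. This is a toy illustration only, not the library's implementation: toy_parse, TITLES, and SUFFIXES are hypothetical names, the word sets are deliberately tiny, and prefixes, conjunctions, and nicknames are ignored.

```python
# Tiny illustrative word sets; the real library ships much larger ones.
TITLES = {"dr", "mr", "mrs", "ms"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd"}


def _norm(piece):
    """Lowercase a piece and drop a trailing period for set lookups."""
    return piece.lower().rstrip(".")


def toy_parse(full_name):
    """Split on commas, then whitespace, and bucket pieces by position."""
    parts = [p.split() for p in full_name.split(",") if p.strip()]
    out = {"title": [], "first": [], "middle": [], "last": [], "suffix": []}

    if len(parts) > 1 and not all(_norm(p) in SUFFIXES for p in parts[1]):
        # "Last, Title First Middle[, Suffix]"
        out["last"] = parts[0]
        pieces = parts[1]
        comma_suffixes = [p for part in parts[2:] for p in part]
        last_known = True
    else:
        # "Title First Middle Last [Suffix][, Suffix]"
        pieces = parts[0]
        comma_suffixes = [p for part in parts[1:] for p in part]
        last_known = False

    # Known titles are peeled off the front...
    while pieces and _norm(pieces[0]) in TITLES:
        out["title"].append(pieces.pop(0))
    # ...and known suffixes off the back.
    tail = []
    while len(pieces) > 1 and _norm(pieces[-1]) in SUFFIXES:
        tail.insert(0, pieces.pop())
    out["suffix"] = tail + comma_suffixes

    # Whatever remains is assigned purely by position.
    if pieces:
        out["first"] = [pieces.pop(0)]
    if not last_known and pieces:
        out["last"] = [pieces.pop()]
    out["middle"] = pieces
    return {key: " ".join(value) for key, value in out.items()}


print(toy_parse("Dr. Juan Q. Xavier Vega III"))
print(toy_parse("Vega, Dr. Juan Q."))
```

Even this reduced version shows why position matters: a word is only treated as a title at the front of the name and as a suffix at the end.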
    pip install nameparser

If you want to try out the latest code from GitHub you can install with pip using the command below.

    pip install -e git+https://github.com/derek73/python-nameparser.git#egg=nameparser
If you need to handle lists of names, check out namesparser, a complement to this module that handles multiple names in a string.
    >>> from nameparser import HumanName
    >>> name = HumanName("Dr. Juan Q. Xavier de la Vega III (Doc Vega)")
    >>> name
    <HumanName : [
        title: 'Dr.'
        first: 'Juan'
        middle: 'Q. Xavier'
        last: 'de la Vega'
        suffix: 'III'
        nickname: 'Doc Vega'
    ]>
    >>> name.last
    'de la Vega'
    >>> name.as_dict()
    {'last': 'de la Vega', 'suffix': 'III', 'title': 'Dr.', 'middle': 'Q. Xavier', 'nickname': 'Doc Vega', 'first': 'Juan'}
    >>> str(name)
    'Dr. Juan Q. Xavier de la Vega III (Doc Vega)'
    >>> name.string_format = "{first} {last}"
    >>> str(name)
    'Juan de la Vega'

The parser does not attempt to correct mistakes in the input. It mostly just splits on whitespace and puts things in buckets based on their position in the string. This also means the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title when it comes before the name and a suffix when it comes after. ("Pre-nominal" and "post-nominal" would probably be better names.)
    >>> name = HumanName("1 & 2, 3 4 5, Mr.")
    >>> name
    <HumanName : [
        title: ''
        first: '3'
        middle: '4 5'
        last: '1 & 2'
        suffix: 'Mr.'
        nickname: ''
    ]>

Your project may need some adjustment for your dataset. You can do this in your own pre- or post-processing, by customizing the configured pre-defined sets of titles, prefixes, etc., or by subclassing the HumanName class. See the full documentation for more information.
If you come across a name piece that you think should be in the default config, you're probably right. Start a New Issue and we can get it added.
Please let me know if there are ways this library could be structured to make it easier for you to use in your projects. Read CONTRIBUTING.md for more info on running the tests and contributing to the project.
GitHub Project