A tiny library for Python text normalisation. Useful for ad-hoc text processing.


pudo/normality


Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as preparation for further text analysis.
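To make "removing classes of characters" concrete, here is a minimal standard-library sketch of the general idea. It is not normality's implementation: the function names (`strip_diacritics`, `strip_punctuation`, `rough_normalize`) are hypothetical and exist only for illustration, and the approach assumes that dropping combining marks after Unicode decomposition is good enough, which does not hold for characters without a decomposition or for non-Latin scripts.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_punctuation(text: str) -> str:
    """Replace every character in a Unicode punctuation category (P*) with a space."""
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )


def rough_normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, then collapse whitespace."""
    cleaned = strip_punctuation(strip_diacritics(text.lower()))
    return " ".join(cleaned.split())


# Illustrative only; the library's own functions are the ones to use in practice.
print(rough_normalize('Nie wieder "Grüne Süppchen" kochen!'))
```

Real transliteration (for example of Cyrillic or Greek text) needs a dedicated backend, which this sketch does not attempt.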

WARNING: This library works much better when used in combination with pyicu, a Python binding for the International Components for Unicode C library. ICU provides much better text transliteration than the default text-unidecode.
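If you are unsure which of the two transliteration packages is present in your environment, a quick check along these lines may help. This is a sketch under assumptions: `transliteration_backend` is a hypothetical helper, not part of normality, and it only looks for the packages' import names (`icu` for pyicu, `text_unidecode` for text-unidecode). It says nothing about how normality itself chooses between them.

```python
from importlib.util import find_spec


def transliteration_backend() -> str:
    """Report which transliteration package is importable, preferring pyicu."""
    # pyicu installs a top-level module named "icu".
    if find_spec("icu") is not None:
        return "pyicu"
    # text-unidecode installs a top-level module named "text_unidecode".
    if find_spec("text_unidecode") is not None:
        return "text-unidecode"
    return "none"


print(transliteration_backend())
```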

Example

    # coding: utf-8
    from normality import normalize, slugify, collapse_spaces

    text = normalize('Nie wieder "Grüne Süppchen" kochen!')
    assert text == 'nie wieder grune suppchen kochen'

    slug = slugify('My first blog post!')
    assert slug == 'my-first-blog-post'

    text = 'this\n\n\r\nhas\tlots of\nodd spacing.'
    assert collapse_spaces(text) == 'this has lots of odd spacing.'

License

normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).
