rhasspy/gruut
A tokenizer, text cleaner, and IPA phonemizer for several human languages that supports SSML.

    from gruut import sentences

    text = 'He wound it around the wound, saying "I read it was $10 to read."'

    for sent in sentences(text, lang="en-us"):
        for word in sent:
            if word.phonemes:
                print(word.text, *word.phonemes)

which outputs:

    He h ˈi
    wound w ˈaʊ n d
    it ˈɪ t
    around ɚ ˈaʊ n d
    the ð ə
    wound w ˈu n d
    , |
    saying s ˈeɪ ɪ ŋ
    I ˈaɪ
    read ɹ ˈɛ d
    it ˈɪ t
    was w ə z
    ten t ˈɛ n
    dollars d ˈɑ l ɚ z
    to t ə
    read ɹ ˈi d
    . ‖

Note that "wound" and "read" have different pronunciations when used in different (grammatical) contexts.
A subset of SSML is also supported:

    from gruut import sentences

    ssml_text = """<?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.1"
        xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">
    <s>Today at 4pm, 2/1/2000.</s>
    <s xml:lang="it">Un mese fà, 2/1/2000.</s>
    </speak>"""

    for sent in sentences(ssml_text, ssml=True):
        for word in sent:
            if word.phonemes:
                print(sent.idx, word.lang, word.text, *word.phonemes)

with the output:

    0 en-US Today t ə d ˈeɪ
    0 en-US at ˈæ t
    0 en-US four f ˈɔ ɹ
    0 en-US P p ˈi
    0 en-US M ˈɛ m
    0 en-US , |
    0 en-US February f ˈɛ b j u ˌɛ ɹ i
    0 en-US first f ˈɚ s t
    0 en-US , |
    0 en-US two t ˈu
    0 en-US thousand θ ˈaʊ z ə n d
    0 en-US . ‖
    1 it Un u n
    1 it mese ˈm e s e
    1 it fà f a
    1 it , |
    1 it due d j u
    1 it gennaio d͡ʒ e n n ˈa j o
    1 it duemila d u e ˈm i l a
    1 it . ‖

See the documentation for more details.

    pip install gruut

Languages besides English can be added during installation. For example, with French and Italian support:

    pip install -f 'https://synesthesiam.github.io/prebuilt-apps/' gruut[fr,it]

The extra pip repo is needed for an updated num2words fork that includes support for more languages.
You may also manually download language files and put them in `$XDG_CONFIG_HOME/gruut/` (`$HOME/.config/gruut` by default).
gruut will look for language files in the directory `$XDG_CONFIG_HOME/gruut/<lang>/` if the corresponding Python package is not installed. Note that `<lang>` here is the full language name, e.g. `de-de` instead of just `de`.
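The lookup location described above can be sketched in a few lines of Python. This is only an illustration of the path rule, not gruut's own code; the helper name `gruut_lang_dir` and its `env` parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def gruut_lang_dir(lang: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory that would hold language files for `lang`.

    `lang` is the full language name (e.g. "de-de", not "de").
    Hypothetical helper: it only mirrors the path rule described in the text.
    """
    env = os.environ if env is None else env
    config_home = env.get("XDG_CONFIG_HOME")
    if not config_home:
        # Documented default: $HOME/.config
        config_home = os.path.join(env.get("HOME", str(Path.home())), ".config")
    return Path(config_home) / "gruut" / lang


print(gruut_lang_dir("de-de", {"HOME": "/home/user"}))
```

Passing an explicit `env` mapping keeps the sketch easy to try without touching the real environment.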
gruut currently supports:
- Arabic (`ar`)
- Czech (`cs` or `cs-cz`)
- German (`de` or `de-de`)
- English (`en` or `en-us`)
- Spanish (`es` or `es-es`)
- Farsi/Persian (`fa`)
- French (`fr` or `fr-fr`)
- Italian (`it` or `it-it`)
- Luxembourgish (`lb`)
- Dutch (`nl`)
- Russian (`ru` or `ru-ru`)
- Swedish (`sv` or `sv-se`)
- Swahili (`sw`)
The goal is to support all of voice2json's languages.
- Python 3.7 or higher
- Linux
  - Tested on Debian Bullseye
- num2words fork and Babel
  - Currency/number handling
  - num2words fork includes additional language support (Arabic, Farsi, Swedish, Swahili)
- gruut-ipa
  - IPA pronunciation manipulation
- pycrfsuite
  - Part of speech tagging and grapheme to phoneme models
- pydateparser
  - Date parsing for multiple languages
gruut can automatically verbalize numbers, dates, and other expressions. This is done in a locale-aware manner for both parsing and verbalization, so "1/1/2020" may be interpreted as "M/D/Y" or "D/M/Y" depending on the word or sentence's language (e.g., `<s lang="...">`).
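To make the locale-dependent interpretation concrete, here is a minimal sketch that reads the same numeric date in two orders. It does not use gruut or its parsing libraries; the `DATE_ORDER` table and `parse_numeric_date` function are hypothetical and cover only the `a/b/yyyy` shape.

```python
from datetime import date

# Hypothetical table of numeric date order per language (illustration only).
DATE_ORDER = {"en-us": "MDY", "it-it": "DMY"}


def parse_numeric_date(text: str, lang: str) -> date:
    """Interpret "a/b/yyyy" as month/day or day/month depending on `lang`."""
    first, second, year = (int(part) for part in text.split("/"))
    if DATE_ORDER[lang] == "MDY":
        month, day = first, second
    else:
        day, month = first, second
    return date(year, month, day)


print(parse_numeric_date("2/1/2000", "en-us"))  # 2000-02-01
print(parse_numeric_date("2/1/2000", "it-it"))  # 2000-01-02
```

This matches the SSML example output earlier, where "2/1/2000" is spoken as "February first" in the English sentence and "due gennaio" in the Italian one.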
The following types of expressions can be automatically expanded into words by gruut:
- Numbers - "123" to "one hundred and twenty three" (disable with `verbalize_numbers=False` or `--no-numbers`)
  - Relies on `Babel` for parsing and `num2words` for verbalization
- Dates - "1/1/2020" to "January first, twenty twenty" (disable with `verbalize_dates=False` or `--no-dates`)
  - Relies on `pydateparser` for parsing and both `Babel` and `num2words` for verbalization
- Currency - "$10" to "ten dollars" (disable with `verbalize_currency=False` or `--no-currency`)
  - Relies on `Babel` for parsing and both `Babel` and `num2words` for verbalization
- Times - "12:01am" to "twelve oh one A M" (disable with `verbalize_times=False` or `--no-times`)
  - English only
  - Relies on `num2words` for verbalization
The `gruut` module can be executed with `python3 -m gruut --language <LANGUAGE> <TEXT>` or with the `gruut` command (from `setup.py`).
The `gruut` command is line-oriented, consuming text and producing JSONL. You will probably want to install jq to manipulate the JSONL output from `gruut`.
Takes raw text and outputs JSONL with cleaned words/tokens.

    echo 'This, right here, is some "RAW" text!' \
      | gruut --language en-us \
      | jq --raw-output '.words[].text'
    This
    ,
    right
    here
    ,
    is
    some
    "
    RAW
    "
    text
    !

More information is available in the full JSON output:

    gruut --language en-us 'More text.' | jq .

Output:

    {
      "idx": 0,
      "text": "More text.",
      "text_with_ws": "More text.",
      "text_spoken": "More text",
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "words": [
        {
          "idx": 0,
          "text": "More",
          "text_with_ws": "More",
          "leading_ws": "",
          "training_ws": "",
          "sent_idx": 0,
          "par_idx": 0,
          "lang": "en-us",
          "voice": "",
          "pos": "JJR",
          "phonemes": ["m", "ˈɔ", "ɹ"],
          "is_major_break": false,
          "is_minor_break": false,
          "is_punctuation": false,
          "is_break": false,
          "is_spoken": true,
          "pause_before_ms": 0,
          "pause_after_ms": 0
        },
        {
          "idx": 1,
          "text": "text",
          "text_with_ws": "text",
          "leading_ws": "",
          "training_ws": "",
          "sent_idx": 0,
          "par_idx": 0,
          "lang": "en-us",
          "voice": "",
          "pos": "NN",
          "phonemes": ["t", "ˈɛ", "k", "s", "t"],
          "is_major_break": false,
          "is_minor_break": false,
          "is_punctuation": false,
          "is_break": false,
          "is_spoken": true,
          "pause_before_ms": 0,
          "pause_after_ms": 0
        },
        {
          "idx": 2,
          "text": ".",
          "text_with_ws": ".",
          "leading_ws": "",
          "training_ws": "",
          "sent_idx": 0,
          "par_idx": 0,
          "lang": "en-us",
          "voice": "",
          "pos": null,
          "phonemes": ["‖"],
          "is_major_break": true,
          "is_minor_break": false,
          "is_punctuation": false,
          "is_break": true,
          "is_spoken": false,
          "pause_before_ms": 0,
          "pause_after_ms": 0
        }
      ],
      "pause_before_ms": 0,
      "pause_after_ms": 0
    }

For the whole input line and each word, the `text` property contains the processed input text with normalized whitespace while `text_with_ws` retains the original whitespace. The `text_spoken` property only contains words that are spoken, so punctuation and breaks are excluded.
Within each word, there is:
- `idx` - zero-based index of the word in the sentence
- `sent_idx` - zero-based index of the sentence in the input text
- `pos` - part of speech tag (if available)
- `phonemes` - list of IPA phonemes for the word (if available)
- `is_minor_break` - `true` if "word" separates phrases (comma, semicolon, etc.)
- `is_major_break` - `true` if "word" separates sentences (period, question mark, etc.)
- `is_break` - `true` if "word" is a major or minor break
- `is_punctuation` - `true` if "word" is a surrounding punctuation mark (quote, bracket, etc.)
- `is_spoken` - `true` if not a break or punctuation
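Because each output line is one JSON object, the word properties listed above can be consumed from Python with the standard `json` module as an alternative to jq. The sketch below works on an abbreviated, hand-written line modelled on the sample output; the `spoken_words` helper is hypothetical.

```python
import json

# Abbreviated JSONL line modelled on the sample output above; a real line
# carries more fields per word.
line = (
    '{"idx": 0, "text": "More text.", "words": ['
    '{"idx": 0, "text": "More", "phonemes": ["m", "ˈɔ", "ɹ"], "is_spoken": true},'
    '{"idx": 1, "text": "text", "phonemes": ["t", "ˈɛ", "k", "s", "t"], "is_spoken": true},'
    '{"idx": 2, "text": ".", "phonemes": ["‖"], "is_spoken": false}'
    "]}"
)


def spoken_words(jsonl_line: str) -> list:
    """Return (text, phonemes) pairs for the spoken words of one JSONL line."""
    sentence = json.loads(jsonl_line)
    return [
        (word["text"], word["phonemes"])
        for word in sentence["words"]
        if word["is_spoken"]
    ]


for text, phonemes in spoken_words(line):
    print(text, *phonemes)
```

For real use, the same function could be applied to each line read from standard input when the `gruut` command is piped into the script.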
See `python3 -m gruut <LANGUAGE> --help` for more options.
A subset of SSML is supported:
- `<speak>` - wrap around SSML text
  - `lang` - set language for document
- `<p>` - paragraph
  - `lang` - set language for paragraph
- `<s>` - sentence (disables automatic sentence breaking)
  - `lang` - set language for sentence
- `<w>` / `<token>` - word (disables automatic tokenization)
  - `lang` - set language for word
  - `role` - set word role (see word roles)
- `<lang lang="...">` - set language of inner text
- `<voice name="...">` - set voice of inner text
- `<say-as interpret-as="">` - force interpretation of inner text
  - `interpret-as` - one of "spell-out", "date", "number", "time", or "currency"
  - `format` - way to format text depending on `interpret-as`
    - number - one of "cardinal", "ordinal", "digits", "year"
    - date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
- `<break time="">` - pause for given amount of time
  - `time` - seconds ("123s") or milliseconds ("123ms")
- `<mark name="">` - user-defined mark (`marks_before` and `marks_after` attributes of words/sentences)
  - `name` - name of mark
- `<sub alias="">` - substitute `alias` for inner text
- `<phoneme ph="...">` - supply phonemes for inner text
  - `ph` - phonemes for each word of inner text, separated by whitespace
- `<lexicon>` - inline or external pronunciation lexicon
  - `id` - unique id of lexicon (used in `<lookup ref="...">`)
  - `uri` - if empty or missing, lexicon is inline
  - One or more `<lexeme>` child elements with:
    - Optional `role="..."` (word roles separated by whitespace)
    - `<grapheme>WORD</grapheme>` - word text
    - `<phoneme>P H O N E M E S</phoneme>` - word pronunciation (phonemes separated by whitespace)
- `<lookup ref="...">` - use pronunciation lexicon for child elements
  - `ref` - id from a `<lexicon>`
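As an illustration of the two `time` formats accepted by `<break>`, the following sketch converts a value such as "123s" or "123ms" into milliseconds. It is not taken from gruut; the function name `break_time_to_ms` is hypothetical, and restricting the amount to whole numbers is an assumption.

```python
import re

# Whole number followed by "ms" or "s" (assumption: no fractional values).
_BREAK_TIME = re.compile(r"^\s*(\d+)\s*(ms|s)\s*$")


def break_time_to_ms(value: str) -> int:
    """Convert a <break time="..."> value to milliseconds."""
    match = _BREAK_TIME.match(value)
    if match is None:
        raise ValueError(f"Unsupported break time: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 1000 if unit == "s" else amount


print(break_time_to_ms("123s"))   # 123000
print(break_time_to_ms("123ms"))  # 123
```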
During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part of speech tag as `gruut:<TAG>`. For initialisms and `spell-out`, the role `gruut:letter` is used to indicate that e.g., "a" should be spoken as `/eɪ/` instead of `/ə/`.
For `en-us`, the following additional roles are available from the part-of-speech tagger:
- `gruut:CD` - number
- `gruut:DT` - determiner
- `gruut:IN` - preposition or subordinating conjunction
- `gruut:JJ` - adjective
- `gruut:NN` - noun
- `gruut:PRP` - personal pronoun
- `gruut:RB` - adverb
- `gruut:VB` - verb
- `gruut:VBD` - verb (past tense)
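A role-keyed lookup can be sketched as a nested dictionary. The example below is a toy, not gruut's lexicon format: the pronunciations are copied from the example output near the top of this document, while the pairing of each pronunciation with a role, the Penn Treebank style tag `VBD` for past-tense verbs, and the fallback rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Toy lexicon: word -> role -> phonemes (roles assigned here are assumptions).
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "wound": {
        "gruut:VBD": ["w", "ˈaʊ", "n", "d"],  # "He wound it ..."
        "gruut:NN": ["w", "ˈu", "n", "d"],    # "... around the wound"
    },
    "read": {
        "gruut:VBD": ["ɹ", "ˈɛ", "d"],        # "I read it was ..."
        "gruut:VB": ["ɹ", "ˈi", "d"],         # "... to read"
    },
}


def role_from_pos(pos: Optional[str]) -> Optional[str]:
    """Derive a role from a part of speech tag as gruut:<TAG>."""
    return f"gruut:{pos}" if pos else None


def lookup(word: str, pos: Optional[str] = None) -> Optional[List[str]]:
    """Pick a pronunciation by role; fall back to the first one listed."""
    by_role = LEXICON.get(word.lower())
    if not by_role:
        return None
    role = role_from_pos(pos)
    if role in by_role:
        return by_role[role]
    # Hypothetical fallback when the role is unknown or missing
    return next(iter(by_role.values()))


print(*lookup("wound", "VBD"))
print(*lookup("wound", "NN"))
```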
Inline pronunciation lexicons are supported via the `<lexicon>` and `<lookup>` tags. gruut diverges slightly from the SSML standard here by allowing lexicons to be defined within the SSML document itself (`uri` is blank or missing). Additionally, the `id` attribute of the `<lexicon>` element can be left off to indicate a "default" inline lexicon that does not require a corresponding `<lookup>` tag.
For example, the following document will yield three different pronunciations for the word "tomato":

    <?xml version="1.0"?>
    <speak version="1.1"
           xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
           xml:lang="en-US">

      <lexicon xml:id="test" alphabet="ipa">
        <lexeme>
          <grapheme>
            tomato
          </grapheme>
          <phoneme>
            <!-- Individual phonemes are separated by whitespace -->
            t ə m ˈɑ t oʊ
          </phoneme>
        </lexeme>
        <lexeme>
          <grapheme role="fake-role">
            tomato
          </grapheme>
          <phoneme>
            <!-- Made up pronunciation for fake word role -->
            t ə m ˈi t oʊ
          </phoneme>
        </lexeme>
      </lexicon>

      <w>tomato</w>
      <lookup ref="test">
        <w>tomato</w>
        <w role="fake-role">tomato</w>
      </lookup>
    </speak>

The first "tomato" will be looked up in the U.S. English lexicon (`/t ə m ˈeɪ t oʊ/`). Within the `<lookup>` tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a role attached (selecting a made up pronunciation in this case).
Even further from the SSML standard, gruut allows you to leave off the `<lexicon>` id entirely. With no `id`, a `<lookup>` tag is no longer needed, allowing you to override the pronunciation of any word in the document:

    <?xml version="1.0"?>
    <speak version="1.1"
           xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
           xml:lang="en-US">

      <!-- No id means change all words without a lookup -->
      <lexicon>
        <lexeme>
          <grapheme>
            tomato
          </grapheme>
          <phoneme>
            t ə m ˈɑ t oʊ
          </phoneme>
        </lexeme>
      </lexicon>

      <w>tomato</w>
    </speak>

This will yield a pronunciation of `/t ə m ˈɑ t oʊ/` for all instances of "tomato" in the document (unless they have a `<lookup>`).
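To show how such inline lexicons are structured, the sketch below reads `<lexicon>` elements from an SSML string with the standard library and indexes them by lexicon id, word, and role. It is an independent illustration and not gruut's parser; the function name `read_inline_lexicons` and the use of `None` as the key for an id-less "default" lexicon are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

Key = Tuple[str, Optional[str]]  # (word, role or None)

ssml = """
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>tomato</grapheme>
      <phoneme>t ə m ˈɑ t oʊ</phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">tomato</grapheme>
      <phoneme>t ə m ˈi t oʊ</phoneme>
    </lexeme>
  </lexicon>
</speak>
"""


def read_inline_lexicons(ssml_text: str) -> Dict[Optional[str], Dict[Key, List[str]]]:
    """Map lexicon id (None when missing) to {(word, role): phonemes}."""
    root = ET.fromstring(ssml_text.strip())
    lexicons: Dict[Optional[str], Dict[Key, List[str]]] = {}
    for lexicon in root.iter(f"{SSML_NS}lexicon"):
        lexicon_id = lexicon.get(XML_ID) or lexicon.get("id")
        entries = lexicons.setdefault(lexicon_id, {})
        for lexeme in lexicon.iter(f"{SSML_NS}lexeme"):
            grapheme = lexeme.find(f"{SSML_NS}grapheme")
            phoneme = lexeme.find(f"{SSML_NS}phoneme")
            if grapheme is None or phoneme is None:
                continue
            word = "".join(grapheme.itertext()).strip()
            # Accept the role on either element, since both placements appear above
            role = grapheme.get("role") or lexeme.get("role")
            # Phonemes are separated by whitespace
            entries[(word, role)] = "".join(phoneme.itertext()).split()
    return lexicons


lexicons = read_inline_lexicons(ssml)
print(*lexicons["test"][("tomato", None)])
print(*lexicons["test"][("tomato", "fake-role")])
```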
gruut is useful for transforming raw text into phonetic pronunciations, similar to phonemizer. Unlike phonemizer, gruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model. Phonemes for each language come from a carefully chosen inventory.
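The lexicon-first strategy described above can be sketched as a lookup with a fallback. Everything here is a stand-in: `phonemize_word`, `toy_lexicon`, and `toy_guess` are hypothetical names, and the letter-by-letter guesser merely takes the place of a trained grapheme-to-phoneme model.

```python
from typing import Callable, Dict, List


def phonemize_word(
    word: str,
    lexicon: Dict[str, List[str]],
    guess: Callable[[str], List[str]],
) -> List[str]:
    """Use the lexicon when the word is known, otherwise guess."""
    known = lexicon.get(word.lower())
    if known is not None:
        return known
    return guess(word)


# Pronunciation copied from the sample JSON output earlier in this document.
toy_lexicon = {"text": ["t", "ˈɛ", "k", "s", "t"]}


def toy_guess(word: str) -> List[str]:
    # Stand-in for a trained model: emits one symbol per letter.
    return list(word.lower())


print(*phonemize_word("text", toy_lexicon, toy_guess))   # found in the lexicon
print(*phonemize_word("gruut", toy_lexicon, toy_guess))  # falls back to the guesser
```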
For each supported language, gruut includes:
- A word pronunciation lexicon built from open source data
  - See pron_dict
- A pre-trained grapheme-to-phoneme model for guessing word pronunciations
Some languages also include:
- A pre-trained part of speech tagger built from open source data: