rhasspy/gruut

A tokenizer, text cleaner, and phonemizer for many human languages.


A tokenizer, text cleaner, and IPA phonemizer for several human languages that supports SSML.

```python
from gruut import sentences

text = 'He wound it around the wound, saying "I read it was $10 to read."'

for sent in sentences(text, lang="en-us"):
    for word in sent:
        if word.phonemes:
            print(word.text, *word.phonemes)
```

which outputs:

```
He h ˈi
wound w ˈaʊ n d
it ˈɪ t
around ɚ ˈaʊ n d
the ð ə
wound w ˈu n d
, |
saying s ˈeɪ ɪ ŋ
I ˈaɪ
read ɹ ˈɛ d
it ˈɪ t
was w ə z
ten t ˈɛ n
dollars d ˈɑ l ɚ z
to t ə
read ɹ ˈi d
. ‖
```

Note that "wound" and "read" have different pronunciations when used in different (grammatical) contexts.

A subset of SSML is also supported:

```python
from gruut import sentences

ssml_text = """<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
    xml:lang="en-US">
<s>Today at 4pm, 2/1/2000.</s>
<s xml:lang="it">Un mese fà, 2/1/2000.</s>
</speak>"""

for sent in sentences(ssml_text, ssml=True):
    for word in sent:
        if word.phonemes:
            print(sent.idx, word.lang, word.text, *word.phonemes)
```

with the output:

```
0 en-US Today t ə d ˈeɪ
0 en-US at ˈæ t
0 en-US four f ˈɔ ɹ
0 en-US P p ˈi
0 en-US M ˈɛ m
0 en-US , |
0 en-US February f ˈɛ b j u ˌɛ ɹ i
0 en-US first f ˈɚ s t
0 en-US , |
0 en-US two t ˈu
0 en-US thousand θ ˈaʊ z ə n d
0 en-US . ‖
1 it Un u n
1 it mese ˈm e s e
1 it fà f a
1 it , |
1 it due d j u
1 it gennaio d͡ʒ e n n ˈa j o
1 it duemila d u e ˈm i l a
1 it . ‖
```

See the documentation for more details.

Installation

```sh
pip install gruut
```

Languages besides English can be added during installation. For example, with French and Italian support:

```sh
pip install -f 'https://synesthesiam.github.io/prebuilt-apps/' gruut[fr,it]
```

The extra pip repo is needed for an updated num2words fork that includes support for more languages.

You may also manually download language files and put them in $XDG_CONFIG_HOME/gruut/ ($HOME/.config/gruut by default).

gruut will look for language files in the directory $XDG_CONFIG_HOME/gruut/<lang>/ if the corresponding Python package is not installed. Note that <lang> here is the full language name, e.g. de-de instead of just de.
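A minimal sketch of how that directory is resolved, assuming standard XDG fallback behavior. The helper `gruut_lang_dir` is hypothetical and not part of gruut's API; it only computes the path and does not model gruut's first check for an installed language package.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def gruut_lang_dir(lang: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: directory searched for manually installed language files.

    Uses $XDG_CONFIG_HOME/gruut/<lang>/, falling back to $HOME/.config when
    XDG_CONFIG_HOME is unset or empty. <lang> must be the full name, e.g. "de-de".
    """
    env = os.environ if env is None else env
    config_home = env.get("XDG_CONFIG_HOME") or os.path.join(
        env.get("HOME") or str(Path.home()), ".config"
    )
    return Path(config_home) / "gruut" / lang


print(gruut_lang_dir("de-de", {"HOME": "/home/alice"}))
# /home/alice/.config/gruut/de-de  (on a POSIX system)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test.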

Supported Languages

gruut currently supports:

  • Arabic (ar)
  • Czech (cs or cs-cz)
  • German (de or de-de)
  • English (en or en-us)
  • Spanish (es or es-es)
  • Farsi/Persian (fa)
  • French (fr or fr-fr)
  • Italian (it or it-it)
  • Luxembourgish (lb)
  • Dutch (nl)
  • Russian (ru or ru-ru)
  • Swedish (sv or sv-se)
  • Swahili (sw)

The goal is to support all of voice2json's languages.

Dependencies

  • Python 3.7 or higher
  • Linux
    • Tested on Debian Bullseye
  • num2words fork and Babel
    • Currency/number handling
    • num2words fork includes additional language support (Arabic, Farsi, Swedish, Swahili)
  • gruut-ipa
    • IPA pronunciation manipulation
  • pycrfsuite
    • Part of speech tagging and grapheme to phoneme models
  • pydateparser
    • Date parsing for multiple languages

Numbers, Dates, and More

gruut can automatically verbalize numbers, dates, and other expressions. This is done in a locale-aware manner for both parsing and verbalization, so "1/1/2020" may be interpreted as "M/D/Y" or "D/M/Y" depending on the word or sentence's language (e.g., <s lang="...">).
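To make the ambiguity concrete, here is a small sketch using only `datetime.strptime`. The `DATE_ORDER` table is a hypothetical stand-in for the locale data gruut gets from its dependencies. It mirrors the "2/1/2000" case from the SSML example above, which gruut reads as February first in en-US but as 2 January ("due gennaio") in Italian.

```python
from datetime import date, datetime

# Hypothetical locale -> numeric date order table (not gruut's real data)
DATE_ORDER = {
    "en-us": "%m/%d/%Y",  # month/day/year
    "it-it": "%d/%m/%Y",  # day/month/year
}


def parse_numeric_date(text: str, lang: str) -> date:
    """Parse an all-numeric date like '2/1/2000' using the locale's field order."""
    fmt = DATE_ORDER[lang.lower()]
    return datetime.strptime(text, fmt).date()


print(parse_numeric_date("2/1/2000", "en-US"))  # 2000-02-01
print(parse_numeric_date("2/1/2000", "it-it"))  # 2000-01-02
```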

The following types of expressions can be automatically expanded into words by gruut:

  • Numbers - "123" to "one hundred and twenty three" (disable with verbalize_numbers=False or --no-numbers)
    • Relies on Babel for parsing and num2words for verbalization
  • Dates - "1/1/2020" to "January first, twenty twenty" (disable with verbalize_dates=False or --no-dates)
    • Relies on pydateparser for parsing and both Babel and num2words for verbalization
  • Currency - "$10" to "ten dollars" (disable with verbalize_currency=False or --no-currency)
    • Relies on Babel for parsing and both Babel and num2words for verbalization
  • Times - "12:01am" to "twelve oh one A M" (disable with verbalize_times=False or --no-times)
    • English only
    • Relies on num2words for verbalization
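As an illustration of the English time rule, here is a dependency-free toy verbalizer. It is not gruut's implementation (which relies on num2words). It only handles 12-hour times such as "12:01am" or "4pm"; reading minutes 1 to 9 as "oh N" follows the "twelve oh one A M" example above, while saying nothing for ":00" is an assumption.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# H[:MM] followed by am/pm (optionally "a.m."), case-insensitive
TIME_RE = re.compile(r"^(\d{1,2})(?::(\d{2}))?\s*([ap])\.?m\.?$", re.IGNORECASE)


def small_number(n: int) -> str:
    """Spell out 0-59 in English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def verbalize_time(text: str) -> str:
    """Toy verbalizer: '12:01am' -> 'twelve oh one A M'."""
    match = TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a 12-hour time: {text!r}")
    hour = int(match.group(1))
    minute = int(match.group(2)) if match.group(2) else 0
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError(f"out of range: {text!r}")
    words = [small_number(hour)]
    if 1 <= minute <= 9:
        words += ["oh", small_number(minute)]  # "oh one", "oh five", ...
    elif minute >= 10:
        words.append(small_number(minute))
    # Assumption: ":00" adds no minute words, so "4:00pm" -> "four P M"
    words += [match.group(3).upper(), "M"]  # A M / P M spoken as letters
    return " ".join(words)


print(verbalize_time("12:01am"))  # twelve oh one A M
print(verbalize_time("4pm"))      # four P M
```

The "4pm" case matches the "four P M" tokens in the SSML output near the top of this README.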

Command-Line Usage

The gruut module can be executed with python3 -m gruut --language <LANGUAGE> <TEXT> or with the gruut command (from setup.py).

The gruut command is line-oriented, consuming text and producing JSONL. You will probably want to install jq to manipulate the JSONL output from gruut.

Plain Text

Takes raw text and outputs JSONL with cleaned words/tokens.

```sh
$ echo 'This, right here, is some "RAW" text!' \
    | gruut --language en-us \
    | jq --raw-output '.words[].text'
This
,
right
here
,
is
some
"
RAW
"
text
!
```
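If jq is not available, the same `.words[].text` extraction takes a few lines of standard-library Python. This sketch assumes only the JSONL shape shown in this section (one JSON object per line, each with a `words` list); `word_texts` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def word_texts(jsonl_lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of every word, like jq --raw-output '.words[].text'."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        sentence = json.loads(line)
        for word in sentence["words"]:
            yield word["text"]
```

To use it in a pipeline, a script can loop over `word_texts(sys.stdin)` and print each item, producing one token per line as jq does.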

More information is available in the full JSON output:

```sh
gruut --language en-us 'More  text.' | jq .
```

Output:

```json
{
  "idx": 0,
  "text": "More text.",
  "text_with_ws": "More text.",
  "text_spoken": "More text",
  "par_idx": 0,
  "lang": "en-us",
  "voice": "",
  "words": [
    {
      "idx": 0,
      "text": "More",
      "text_with_ws": "More",
      "leading_ws": "",
      "training_ws": "",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": "JJR",
      "phonemes": [
        "m",
        "ˈɔ",
        "ɹ"
      ],
      "is_major_break": false,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": false,
      "is_spoken": true,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    },
    {
      "idx": 1,
      "text": "text",
      "text_with_ws": "text",
      "leading_ws": "",
      "training_ws": "",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": "NN",
      "phonemes": [
        "t",
        "ˈɛ",
        "k",
        "s",
        "t"
      ],
      "is_major_break": false,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": false,
      "is_spoken": true,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    },
    {
      "idx": 2,
      "text": ".",
      "text_with_ws": ".",
      "leading_ws": "",
      "training_ws": "",
      "sent_idx": 0,
      "par_idx": 0,
      "lang": "en-us",
      "voice": "",
      "pos": null,
      "phonemes": [
        ""
      ],
      "is_major_break": true,
      "is_minor_break": false,
      "is_punctuation": false,
      "is_break": true,
      "is_spoken": false,
      "pause_before_ms": 0,
      "pause_after_ms": 0
    }
  ],
  "pause_before_ms": 0,
  "pause_after_ms": 0
}
```

For the whole input line and each word, the text property contains the processed input text with normalized whitespace, while text_with_ws retains the original whitespace. The text_spoken property only contains words that are spoken, so punctuation and breaks are excluded.

Within each word, there is:

  • idx - zero-based index of the word in the sentence
  • sent_idx - zero-based index of the sentence in the input text
  • pos - part of speech tag (if available)
  • phonemes - list of IPA phonemes for the word (if available)
  • is_minor_break - true if "word" separates phrases (comma, semicolon, etc.)
  • is_major_break - true if "word" separates sentences (period, question mark, etc.)
  • is_break - true if "word" is a major or minor break
  • is_punctuation - true if "word" is a surrounding punctuation mark (quote, bracket, etc.)
  • is_spoken - true if not a break or punctuation
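These flags compose as described. The sketch below recomputes text_spoken from the word list and collects break tokens, working on an abbreviated copy of the JSON above (only the fields it uses are kept). Joining spoken words with single spaces is an assumption that matches this example but may not hold for every language; the helper names are hypothetical.

```python
import json

# Abbreviated from the "More  text." output above (only the fields used here)
SENTENCE_JSON = """
{"text_spoken": "More text",
 "words": [
   {"text": "More", "is_spoken": true,  "is_major_break": false, "is_break": false},
   {"text": "text", "is_spoken": true,  "is_major_break": false, "is_break": false},
   {"text": ".",    "is_spoken": false, "is_major_break": true,  "is_break": true}
 ]}
"""


def spoken_text(sentence: dict) -> str:
    """Join only the spoken words (breaks and punctuation are excluded)."""
    return " ".join(w["text"] for w in sentence["words"] if w["is_spoken"])


def breaks(sentence: dict) -> list:
    """Return (text, kind) for each break word, where kind is 'major' or 'minor'."""
    return [
        (w["text"], "major" if w["is_major_break"] else "minor")
        for w in sentence["words"]
        if w["is_break"]
    ]


sentence = json.loads(SENTENCE_JSON)
print(spoken_text(sentence))  # More text
print(breaks(sentence))       # [('.', 'major')]
```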

See python3 -m gruut --help for more options.

SSML

A subset of SSML is supported:

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <p> - paragraph
    • lang - set language for paragraph
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
    • lang - set language for word
    • role - set word role (see word roles)
  • <lang lang="..."> - set language of inner text
  • <voice name="..."> - set voice of inner text
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <mark name=""> - user-defined mark (marks_before and marks_after attributes of words/sentences)
    • name - name of mark
  • <sub alias=""> - substitute alias for inner text
  • <phoneme ph="..."> - supply phonemes for inner text
    • ph - phonemes for each word of inner text, separated by whitespace
  • <lexicon> - inline or external pronunciation lexicon
    • id - unique id of lexicon (used in <lookup ref="...">)
    • uri - if empty or missing, lexicon is inline
    • One or more <lexeme> child elements with:
      • Optional role="..." (word roles separated by whitespace)
      • <grapheme>WORD</grapheme> - word text
      • <phoneme>P H O N E M E S</phoneme> - word pronunciation (phonemes separated by whitespace)
  • <lookup ref="..."> - use pronunciation lexicon for child elements
    • ref - id from a <lexicon>
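A sketch of assembling a document with a few of these tags using Python's `xml.etree.ElementTree`, plus a helper that converts `<break time="...">` values to milliseconds for the two units listed above. Only the tag and attribute names come from the list; `build_ssml`, `break_time_ms`, and the sample sentences are illustrative. The resulting string would be passed to gruut with `ssml=True`, as in the SSML example near the top of this README.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def break_time_ms(value: str) -> int:
    """Convert a <break time="..."> value ('123ms' or '1.5s') to milliseconds."""
    value = value.strip()
    if value.endswith("ms"):  # check "ms" before "s"
        return round(float(value[:-2]))
    if value.endswith("s"):
        return round(float(value[:-1]) * 1000)
    raise ValueError(f"unsupported break time: {value!r}")


def build_ssml() -> str:
    """Build a small SSML string using tags from the supported subset."""
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": "en-US"})

    s1 = ET.SubElement(speak, "s")
    mark = ET.SubElement(s1, "mark", {"name": "start"})  # user-defined mark
    mark.tail = "The code is "
    code = ET.SubElement(s1, "say-as", {"interpret-as": "spell-out"})
    code.text = "ABC"
    code.tail = "."

    ET.SubElement(speak, "break", {"time": "500ms"})  # half-second pause

    s2 = ET.SubElement(speak, "s")
    s2.text = "Ask the "
    sub = ET.SubElement(s2, "sub", {"alias": "World Wide Web Consortium"})
    sub.text = "W3C"
    sub.tail = "."

    return ET.tostring(speak, encoding="unicode")


print(build_ssml())
```

Building the tree programmatically avoids hand-escaping `&` and `<` in user text, which ElementTree does on serialization.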

Word Roles

During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part of speech tag as gruut:<TAG>. For initialisms and spell-out, the role gruut:letter is used to indicate that e.g., "a" should be spoken as /eɪ/ instead of /ə/.

For en-us, the following additional roles are available from the part-of-speech tagger:

  • gruut:CD - number
  • gruut:DT - determiner
  • gruut:IN - preposition or subordinating conjunction
  • gruut:JJ - adjective
  • gruut:NN - noun
  • gruut:PRP - personal pronoun
  • gruut:RB - adverb
  • gruut:VB - verb
  • gruut:VBD - verb (past tense)
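To show how roles disambiguate, here is a sketch of a role-aware lexicon lookup. The tiny lexicon, the `role_for` and `lookup` helpers, and the fallback to a role-less default entry are hypothetical. The pronunciations come from the "read" and "a" examples in this README, and gruut:VBD follows the Penn Treebank past-tense tag.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexicon keyed by (word, role); role None is the default entry
LEXICON: Dict[Tuple[str, Optional[str]], str] = {
    ("read", None): "ɹ ˈi d",
    ("read", "gruut:VBD"): "ɹ ˈɛ d",
    ("a", None): "ə",
    ("a", "gruut:letter"): "eɪ",
}


def role_for(pos: Optional[str], spell_out: bool = False) -> Optional[str]:
    """Derive a role: gruut:letter for spelled-out text, else gruut:<TAG>."""
    if spell_out:
        return "gruut:letter"
    return f"gruut:{pos}" if pos else None


def lookup(word: str, role: Optional[str]) -> Optional[str]:
    """Prefer the role-specific pronunciation, then fall back to the default entry."""
    word = word.lower()
    return LEXICON.get((word, role), LEXICON.get((word, None)))


print(lookup("read", role_for("VBD")))              # ɹ ˈɛ d
print(lookup("read", role_for("VB")))               # ɹ ˈi d
print(lookup("A", role_for(None, spell_out=True)))  # eɪ
```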

Inline Lexicons

Inline pronunciation lexicons are supported via the <lexicon> and <lookup> tags. gruut diverges slightly from the SSML standard here by allowing lexicons to be defined within the SSML document itself (uri is blank or missing). Additionally, the id attribute of the <lexicon> element can be left off to indicate a "default" inline lexicon that does not require a corresponding <lookup> tag.

For example, the following document will yield three different pronunciations for the word "tomato":

```xml
<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        <!-- Individual phonemes are separated by whitespace -->
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">
        tomato
      </grapheme>
      <phoneme>
        <!-- Made up pronunciation for fake word role -->
        t ə m ˈi t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>

  <lookup ref="test">
    <w>tomato</w>
    <w role="fake-role">tomato</w>
  </lookup>
</speak>
```

The first "tomato" will be looked up in the U.S. English lexicon (/t ə m ˈeɪ t oʊ/). Within the <lookup> tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a role attached (selecting a made-up pronunciation in this case).
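The scoping rule can be sketched with `xml.etree.ElementTree`: collect each `<lexeme>` of an identified `<lexicon>` into a table keyed by (grapheme, role), then consult that table only for `<w>` elements inside a matching `<lookup>`. The document is a trimmed, namespace-free copy of the example above, and `BUILTIN` is a hypothetical one-entry stand-in for gruut's U.S. English lexicon. This is an illustration of the rule, not gruut's parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Trimmed copy of the example above (XML namespaces and comments omitted)
DOC = """<speak xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme><grapheme>tomato</grapheme><phoneme>t ə m ˈɑ t oʊ</phoneme></lexeme>
    <lexeme><grapheme role="fake-role">tomato</grapheme><phoneme>t ə m ˈi t oʊ</phoneme></lexeme>
  </lexicon>
  <w>tomato</w>
  <lookup ref="test">
    <w>tomato</w>
    <w role="fake-role">tomato</w>
  </lookup>
</speak>"""

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
BUILTIN = {"tomato": "t ə m ˈeɪ t oʊ"}  # hypothetical stand-in for the built-in lexicon

Lexicon = Dict[Tuple[str, Optional[str]], str]


def read_lexicons(root: ET.Element) -> Dict[str, Lexicon]:
    """Map each lexicon id to {(grapheme, role): phonemes}."""
    lexicons: Dict[str, Lexicon] = {}
    for lexicon in root.iter("lexicon"):
        table: Lexicon = {}
        for lexeme in lexicon.iter("lexeme"):
            grapheme = lexeme.find("grapheme")
            phoneme = lexeme.find("phoneme")
            key = (grapheme.text.strip(), grapheme.get("role"))
            table[key] = " ".join(phoneme.text.split())  # normalize whitespace
        lexicons[lexicon.get(XML_ID, "")] = table
    return lexicons


def pronounce_words(root: ET.Element, lexicons: Dict[str, Lexicon]) -> List[str]:
    """Pronounce every <w>, using an inline lexicon only inside its <lookup>."""
    result: List[str] = []

    def visit(elem: ET.Element, active: Optional[Lexicon]) -> None:
        for child in elem:
            if child.tag == "lookup":
                visit(child, lexicons.get(child.get("ref", "")))
            elif child.tag == "w":
                word, role = child.text.strip(), child.get("role")
                phonemes = active.get((word, role)) if active is not None else None
                result.append(phonemes or BUILTIN[word])
            elif child.tag != "lexicon":
                visit(child, active)  # descend into <s>, <p>, etc.

    visit(root, None)
    return result


root = ET.fromstring(DOC)
print(pronounce_words(root, read_lexicons(root)))
# ['t ə m ˈeɪ t oʊ', 't ə m ˈɑ t oʊ', 't ə m ˈi t oʊ']
```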

Even further from the SSML standard, gruut allows you to leave off the <lexicon> id entirely. With no id, a <lookup> tag is no longer needed, allowing you to override the pronunciation of any word in the document:

```xml
<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <!-- No id means change all words without a lookup -->
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
</speak>
```

This will yield a pronunciation of /t ə m ˈɑ t oʊ/ for all instances of "tomato" in the document (unless they are inside a <lookup>).
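Put together, the precedence for a single word can be sketched as a small function: a `<lookup>`-scoped lexicon first, then the default (id-less) inline lexicon, then the built-in lexicon. The dictionaries and the `resolve` function are hypothetical, and the fall-through order for words missing from a `<lookup>` lexicon is an assumption inferred from the two examples in this section.

```python
from typing import Dict, Optional

# Hypothetical built-in entries ("the" pronunciation taken from the first example)
BUILTIN: Dict[str, str] = {"tomato": "t ə m ˈeɪ t oʊ", "the": "ð ə"}
DEFAULT_INLINE: Dict[str, str] = {"tomato": "t ə m ˈɑ t oʊ"}  # <lexicon> with no id


def resolve(word: str, lookup_lexicon: Optional[Dict[str, str]] = None) -> str:
    """Pick a pronunciation: <lookup> lexicon, then default inline, then built-in."""
    for table in (lookup_lexicon or {}, DEFAULT_INLINE, BUILTIN):
        if word in table:
            return table[word]
    # Assumption: a real system would fall back to a grapheme-to-phoneme model here
    raise KeyError(word)


print(resolve("tomato"))                                # t ə m ˈɑ t oʊ
print(resolve("tomato", {"tomato": "t ə m ˈi t oʊ"}))  # t ə m ˈi t oʊ
print(resolve("the"))                                   # ð ə
```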

Intended Audience

gruut is useful for transforming raw text into phonetic pronunciations, similar to phonemizer. Unlike phonemizer, gruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model. Phonemes for each language come from a carefully chosen inventory.
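The lookup-then-guess strategy can be outlined in a few lines. The lexicon contents (taken from the "More text." JSON output earlier) and `guess_pronunciation` are hypothetical placeholders. The guesser here just emits one symbol per letter, whereas gruut uses a pre-trained model (built with pycrfsuite, per the dependency list).

```python
from typing import Callable, Dict, List

# Hypothetical lexicon entries, copied from the "More text." JSON output
LEXICON: Dict[str, List[str]] = {
    "more": ["m", "ˈɔ", "ɹ"],
    "text": ["t", "ˈɛ", "k", "s", "t"],
}


def guess_pronunciation(word: str) -> List[str]:
    """Placeholder for a trained grapheme-to-phoneme model: one symbol per letter."""
    return [letter for letter in word if letter.isalpha()]


def phonemize(word: str,
              lexicon: Dict[str, List[str]] = LEXICON,
              guess: Callable[[str], List[str]] = guess_pronunciation) -> List[str]:
    """Use the lexicon when the word is known, otherwise fall back to the guesser."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return guess(key)


print(phonemize("More"))   # ['m', 'ˈɔ', 'ɹ']
print(phonemize("gruut"))  # ['g', 'r', 'u', 'u', 't']
```

Passing the guesser as a parameter keeps the lexicon path independent of whichever model is plugged in.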

For each supported language, gruut includes:

  • A word pronunciation lexicon built from open source data
  • A pre-trained grapheme-to-phoneme model for guessing word pronunciations

Some languages also include:
