Currently, the text normalization step simply replaces the original text with the normalized text. As a result, the generated word timestamps do not align with the words of the original input.

Kokoro supports embedding phonemes in the text, and the token timestamps are based on the original (visible) text:

Original Input Text: `[Misaki](/misˈɑki/) is a G2P engine designed for [Kokoro](/kˈOkəɹO/) models.`
Text For Timestamps: `Misaki is a G2P engine designed for Kokoro models.`
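For reference, the mapping from the annotated input to the timestamp text can be sketched with a regular expression that keeps the visible word of each `[word](/phonemes/)` annotation. This is a minimal sketch assuming flat (non-nested) annotations; the names `PHONEME_ANNOTATION` and `timestamp_text` are hypothetical, not Kokoro's API.

```python
import re

# Matches Kokoro's inline phoneme syntax: [word](/phonemes/).
# Assumption: annotations are flat; nested brackets are not handled.
PHONEME_ANNOTATION = re.compile(r"\[([^\]]+)\]\(/[^)]*/\)")


def timestamp_text(text: str) -> str:
    """Replace each [word](/phonemes/) annotation with its visible word."""
    return PHONEME_ANNOTATION.sub(r"\1", text)


if __name__ == "__main__":
    src = "[Misaki](/misˈɑki/) is a G2P engine designed for [Kokoro](/kˈOkəɹO/) models."
    print(timestamp_text(src))
```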

Before this PR:

Text: `The price will be $100 after 9:30PM.`

| word | start_time | end_time |
| --- | --- | --- |
| The | 0.0005416666666666625 | 0.07554166666666667 |
| price | 0.07554166666666667 | 0.3880416666666666 |
| will | 0.3880416666666666 | 0.4880416666666667 |
| be | 0.4880416666666667 | 0.6380416666666666 |
| one | 0.6380416666666666 | 0.8255416666666666 |
| hundred | 0.8255416666666666 | 1.1255416666666667 |
| dollars | 1.1255416666666667 | 1.8505416666666668 |
| after | 1.8505416666666668 | 2.188041666666667 |
| nine | 2.188041666666667 | 2.5255416666666664 |
| thirtyPM | 2.5255416666666664 | 3.5255416666666664 |
| . | 3.5255416666666664 | 3.6755416666666667 |

Note that `$100` is mistakenly shown as `one`, `hundred`, `dollars`, and `9:30PM` is shown as `nine`, `thirtyPM`.

After this PR:

Text: `The price will be $100 after 9:30PM.`

| word | start_time | end_time |
| --- | --- | --- |
| The | 0.0005416666666666625 | 0.07554166666666667 |
| price | 0.07554166666666667 | 0.3880416666666666 |
| will | 0.3880416666666666 | 0.4880416666666667 |
| be | 0.4880416666666667 | 0.6380416666666666 |
| $100 | 0.6380416666666666 | 1.8505416666666668 |
| after | 1.8505416666666668 | 2.188041666666667 |
| 9:30PM | 2.188041666666667 | 3.5255416666666664 |
| . | 3.5255416666666664 | 3.6755416666666667 |

Note that both `$100` and `9:30PM` are now correct.
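The "after" numbers are consistent with taking, for each original word, the start of its first normalized token and the end of its last one (`$100` spans `one` through `dollars`, `9:30PM` spans `nine` through `thirtyPM`). The sketch below shows that post-hoc merge as an illustration only; it is not necessarily how this PR implements it, and it assumes we already know how many normalized tokens each original word expanded into (the hypothetical `GROUPS` input).

```python
from dataclasses import dataclass


@dataclass
class WordTimestamp:
    word: str
    start_time: float
    end_time: float


def merge_to_original(tokens, groups):
    """Collapse normalized-token timestamps back onto original words.

    groups: hypothetical list of (original_word, token_count) pairs.
    """
    merged, i = [], 0
    for word, count in groups:
        span = tokens[i:i + count]
        if count < 1 or len(span) != count:
            raise ValueError(f"not enough tokens for {word!r}")
        # The original word starts where its first normalized token starts
        # and ends where its last normalized token ends.
        merged.append(WordTimestamp(word, span[0].start_time, span[-1].end_time))
        i += count
    if i != len(tokens):
        raise ValueError("unconsumed tokens remain")
    return merged


# Normalized-token timestamps copied from the "Before this PR" output.
BEFORE_TOKENS = [
    WordTimestamp("The", 0.0005416666666666625, 0.07554166666666667),
    WordTimestamp("price", 0.07554166666666667, 0.3880416666666666),
    WordTimestamp("will", 0.3880416666666666, 0.4880416666666667),
    WordTimestamp("be", 0.4880416666666667, 0.6380416666666666),
    WordTimestamp("one", 0.6380416666666666, 0.8255416666666666),
    WordTimestamp("hundred", 0.8255416666666666, 1.1255416666666667),
    WordTimestamp("dollars", 1.1255416666666667, 1.8505416666666668),
    WordTimestamp("after", 1.8505416666666668, 2.188041666666667),
    WordTimestamp("nine", 2.188041666666667, 2.5255416666666664),
    WordTimestamp("thirtyPM", 2.5255416666666664, 3.5255416666666664),
    WordTimestamp(".", 3.5255416666666664, 3.6755416666666667),
]

# Hypothetical grouping: how many normalized tokens each original word became.
GROUPS = [("The", 1), ("price", 1), ("will", 1), ("be", 1),
          ("$100", 3), ("after", 1), ("9:30PM", 2), (".", 1)]

if __name__ == "__main__":
    for w in merge_to_original(BEFORE_TOKENS, GROUPS):
        print(w.word, w.start_time, w.end_time)
```

The hard part in practice is producing `GROUPS` reliably, since the normalizer has to record which output tokens came from which input span.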

Commit `88f19d7`: Improve text normalize to keep original timestamps

fondoger (Author) commented Mar 30, 2025:

@remsky, @fireblade2534 Please review this PR. I tested it locally and the results look good.

Commit `c7f09bf`: Upgrade kokoro/misaki version

fireblade2534 (Collaborator) commented Mar 30, 2025:

I can't test it right now, but I'll test it tomorrow.

fireblade2534 added 3 commits on March 31, 2025 13:28:

- `fd86395`: Reverted the kokoro version bump and change the phenomizer to use the phenomizer that the rest of the text uses.
- `cacdfe7`: Added .co as a valid domain
- `4b7f482`: Fix decimal

fireblade2534 (Collaborator) requested changes on Mar 31, 2025:

This PR looks great in concept, but there are a few problematic input texts I want to highlight:

- `Running on localhost:7860` -> `Running on [localhost:[7860](/sˈɛvənti ˈeɪt sˈɪksti/)](/lˈoʊkɐlhˌoʊst kˈoʊlən sˈɛvən θˈaʊzənd ˈeɪt hˈʌndɹɪd sˈɪksti/)`
- `Email me at user@example.com` -> `Email me at [user@[example-com](/ɛɡzˈæmpəl dˈɑːt kˈɑːm/)](/jˈuːzɚɹ æɾ ɛɡzˈæmpəl dˈɑːt kˈɑːm/)`
- `Oh yeah I have $500.60 in my bank account` -> `Oh ye'a I have [$[500.60](/fˈaɪv hˈʌndɹɪd pˈɔɪnt sˈɪks zˈiəɹoʊ/)](/fˈaɪv hˈʌndɹɪd ænd wˈʌn dˈɑːlɚz ænd sˈɪksti sˈɛnts/) in my bank account`

What happens in all of these cases (and will happen in more) is that one normalizer rewrites a span, for example `localhost:7860`, but because the original text is still present inside `[localhost:7860]`, the number normalizer then comes along and normalizes the number again, producing nested annotations. This is an inherent issue with the way the normalizer and your code work together. Note that the code already handles custom phonemes; see `text_processor.py:handle_custom_phonemes` and `get_sentence_info`.
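The nesting described above can be reproduced with two toy passes that each wrap their match in Kokoro's `[text](/phonemes/)` syntax. This is a hypothetical sketch: `url_pass` and `number_pass` are illustrative stand-ins, not the real normalizer functions, and `/URL/` and `/NUM/` are placeholder phoneme strings rather than real output.

```python
import re


def url_pass(text: str) -> str:
    # Hypothetical URL pass: annotate "localhost:<port>" with placeholder phonemes.
    return re.sub(r"localhost:\d+", lambda m: f"[{m.group(0)}](/URL/)", text)


def number_pass(text: str) -> str:
    # Hypothetical number pass: annotate every run of digits.
    return re.sub(r"\d+", lambda m: f"[{m.group(0)}](/NUM/)", text)


if __name__ == "__main__":
    # The number pass still sees the digits inside the URL annotation,
    # so it nests a second annotation inside the first.
    print(number_pass(url_pass("Running on localhost:7860")))
    # -> Running on [localhost:[7860](/NUM/)](/URL/)
```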

fondoger (Author) commented Apr 1, 2025:

Thanks for the review. I'll check if I can think of better solutions to handle these cases.

fondoger (Author) commented Apr 1, 2025:

I just found out that Kokoro itself can already handle some basic normalization.

Try it here: https://hexgrad-kokoro-tts.hf.space

- `Email me at user@example.com` -> `ˈimˌAl mˌi æt jˈuzəɹ æt ɪɡzˈæmpəl dˌɑt kˈɑm`
- `Oh yeah I have $500.60 in my bank account` -> `ˈO jˈɛə ˌI hæv fˈIv hˈʌndɹəd dˈɑləɹz ænd sˈɪksti sˈɛnts ɪn mI bˈæŋk əkˈWnt`

Maybe we could simply disable normalization in Kokoro-FastAPI.

fireblade2534 (Collaborator) commented Apr 1, 2025:

Disabling normalization in kokoro-FastAPI has always been an option; the README has a section on how to do it.

fireblade2534 (Collaborator) commented Apr 1, 2025:

> Thanks for the review. I'll check if I can think of better solutions to handle these cases.

I would suggest hijacking the existing system for preserving custom phonemes.
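One way to read this suggestion: once a span has been turned into a `[text](/phonemes/)` annotation, later passes should treat it as opaque, the same way user-supplied custom phonemes are preserved. The sketch below illustrates that idea under stated assumptions; it is not the actual `handle_custom_phonemes` implementation, and `apply_outside_annotations`, `wrap_numbers`, and the `/URL/` and `/NUM/` placeholder phonemes are hypothetical.

```python
import re
from typing import Callable

# Matches an existing flat [text](/phonemes/) annotation so later passes can skip it.
ANNOTATION = re.compile(r"\[[^\]]*\]\(/[^)]*/\)")


def apply_outside_annotations(text: str, normalize: Callable[[str], str]) -> str:
    """Run `normalize` only on the plain text between existing annotations."""
    pieces, last = [], 0
    for m in ANNOTATION.finditer(text):
        pieces.append(normalize(text[last:m.start()]))  # plain text: normalize
        pieces.append(m.group(0))                        # annotation: keep as-is
        last = m.end()
    pieces.append(normalize(text[last:]))                # trailing plain text
    return "".join(pieces)


def wrap_numbers(text: str) -> str:
    # Hypothetical number pass; "/NUM/" is a placeholder, not real phonemes.
    return re.sub(r"\d+", lambda m: f"[{m.group(0)}](/NUM/)", text)


if __name__ == "__main__":
    text = "Running on [localhost:7860](/URL/) port 80"
    print(apply_outside_annotations(text, wrap_numbers))
    # -> Running on [localhost:7860](/URL/) port [80](/NUM/)
```

Because each pass only sees text that no earlier pass has claimed, the port number inside the URL annotation is no longer re-normalized, while bare numbers elsewhere still are.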

fondoger marked this pull request as draft on April 3, 2025 06:39.


Pull request #264: Improve text normalize to keep original timestamps
