Improve text normalize to keep original timestamps #264
fondoger commented Mar 30, 2025
@remsky, @fireblade2534 Please review this PR. I tested it locally and the result is good.
I can't test it out right now, but I'll test it out tmrw.
… phenomizer that the rest of the text uses.
fireblade2534 left a comment
This PR looks great in concept, but there are a few problematic inputs that I want to highlight:
- Running on localhost:7860 -> Running on [localhost:[7860](/sˈɛvənti ˈeɪt sˈɪksti/)](/lˈoʊkɐlhˌoʊst kˈoʊlən sˈɛvən θˈaʊzənd ˈeɪt hˈʌndɹɪd sˈɪksti/)
- Email me at user@example.com -> Email me at [user@[example-com](/ɛɡzˈæmpəl dˈɑːt kˈɑːm/)](/jˈuːzɚɹ æɾ ɛɡzˈæmpəl dˈɑːt kˈɑːm/)
- Oh yeah I have $500.60 in my bank account -> Oh ye'a I have [$[500.60](/fˈaɪv hˈʌndɹɪd pˈɔɪnt sˈɪks zˈiəɹoʊ/)](/fˈaɪv hˈʌndɹɪd ænd wˈʌn dˈɑːlɚz ænd sˈɪksti sˈɛnts/) in my bank account
What happens in those cases (and will happen in more) is that the normalizer first handles, for example, localhost:7860, but because the original text is still present in the output as [localhost:7860], the number normalizer then comes along and normalizes the number a second time. This is an inherent issue with the way the normalizer and your code work together. The code does handle custom phonemes; see text_processor.py:handle_custom_phonemes and get_sentence_info.
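The double normalization described above can be reproduced, and avoided, with a small sketch. Everything here is hypothetical and for illustration only: `HOST_RE`, `NUM_RE`, and the `HOST`/`NUM` tags stand in for the real normalizer rules and phoneme strings, and this is not the code in text_processor.py. The idea is that a later pass should skip any span that an earlier pass has already wrapped as `[original](/phonemes/)`.

```python
import re

# Hypothetical stand-ins for two normalizer passes (not the project's real rules).
HOST_RE = re.compile(r"[A-Za-z][\w.-]*:\d+")   # e.g. localhost:7860
NUM_RE = re.compile(r"\d+")                    # any run of digits

# A span that is already wrapped as [original](/phonemes/).
WRAPPED_RE = re.compile(r"(\[[^\[\]]+\]\(/[^/]*/\))")


def wrap(pattern, tag, text):
    """Wrap every match as [match](/tag/), keeping the original text."""
    return pattern.sub(lambda m: f"[{m.group(0)}](/{tag}/)", text)


def naive(text):
    """Run the passes one after another; the second pass sees inside the brackets."""
    return wrap(NUM_RE, "NUM", wrap(HOST_RE, "HOST", text))


def wrap_outside(pattern, tag, text):
    """Apply a pass only to text that is not already inside a wrapped span."""
    parts = WRAPPED_RE.split(text)
    # With one capture group, re.split puts the wrapped spans at odd indices.
    return "".join(
        part if i % 2 == 1 else wrap(pattern, tag, part)
        for i, part in enumerate(parts)
    )


def protected(text):
    """Same passes as naive(), but each pass leaves wrapped spans untouched."""
    text = wrap_outside(HOST_RE, "HOST", text)
    return wrap_outside(NUM_RE, "NUM", text)


print(naive("Running on localhost:7860"))
print(protected("Running on localhost:7860"))
```

The first call produces the nested form seen in the review; the second keeps a single wrapper around `localhost:7860` while still letting the number pass handle digits elsewhere in the sentence.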
Thanks for the review. I'll check if I can think of better solutions to handle these cases.
I just found out that the original Kokoro itself can already handle some basic normalizations. Try it here: https://hexgrad-kokoro-tts.hf.space
Maybe we can simply disable normalizations in Kokoro-FastAPI.
Disabling normalizations in Kokoro-FastAPI has always been an option. The README has a section on how to do it.
I would suggest hijacking the current system for preserving custom phonemes.
Currently, the text normalization algorithm simply replaces the original text with the normalized text. As a result, the generated timestamps no longer align with the original text.
Kokoro supports embedding phonemes in the text, and the token timestamps are based on the original text.
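As a rough illustration of why this markup keeps timestamps tied to the original text, the sketch below parses `[text](/phonemes/)` spans and recovers the plain text. It is a minimal regex-based sketch, not Kokoro's or Misaki's actual parser; `PHONEME_SPAN_RE`, `strip_phoneme_markup`, and `extract_overrides` are hypothetical names, and it assumes spans are not nested and phoneme strings contain no `/`.

```python
import re

# Hypothetical pattern for one embedded-phoneme span: [original](/phonemes/).
PHONEME_SPAN_RE = re.compile(r"\[([^\[\]]+)\]\(/([^/]*)/\)")


def strip_phoneme_markup(marked):
    """Return the text with every span replaced by its original wording."""
    return PHONEME_SPAN_RE.sub(lambda m: m.group(1), marked)


def extract_overrides(marked):
    """Return (original text, phonemes) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in PHONEME_SPAN_RE.finditer(marked)]


marked = "[Misaki](/misˈɑki/) is a G2P engine designed for [Kokoro](/kˈOkəɹO/) models."
print(strip_phoneme_markup(marked))
print(extract_overrides(marked))
```

Because the original wording survives inside the brackets, anything keyed to it (such as token timestamps) can still be matched against the user's input.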
[Misaki](/misˈɑki/) is a G2P engine designed for [Kokoro](/kˈOkəɹO/) models.
Misaki is a G2P engine designed for Kokoro models.
Before this PR:
Note that $100 is mistakenly shown as "one hundred", and 9:30PM is shown as "nine thirty PM".
After this PR:
Note that both $100 and 9:30PM are correct now.