
chore: switch to charset normalizer #4060


Draft

qued wants to merge 9 commits into main from chore/switch-to-charset-normalizer

Conversation

ahmetmeleq (Contributor) commented on Jul 15, 2025:

Out of curiosity, what's the story behind these being added?

qued (Contributor, Author) commented on Jul 15, 2025:

In the unit tests on the old file, charset_normalizer misdiagnosed the encoding (chardet had previously gotten it right). So I added some text by re-encoding umlauts-utf8.md to give it more content on which to make an inference.
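
To illustrate why a very short file gives a detector little to go on, here is a minimal sketch using only Python's standard library. It does not call charset_normalizer or chardet. The sample word, the candidate list, and the helper `plausible_decodings` are all hypothetical and chosen for illustration. It shows that the same few bytes decode without error under several legacy encodings, so more text is needed to tell them apart.

```python
# Hypothetical sample: a short German word with an umlaut, stored in a legacy encoding.
sample = "können".encode("iso-8859-1")

# Illustrative candidate list (an assumption, not the list any detector actually uses).
CANDIDATES = ["utf-8", "iso-8859-1", "cp1252", "iso-8859-2", "cp1250", "cp1251"]


def plausible_decodings(data: bytes, candidates=CANDIDATES) -> dict:
    """Return every candidate encoding that decodes `data` without raising."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence at all.
            continue
    return results


decodings = plausible_decodings(sample)
for name, text in decodings.items():
    print(f"{name:12} -> {text!r}")
```

Only UTF-8 is ruled out outright here. The remaining candidates all decode cleanly, and most of them even produce the same string, so a detector has to fall back on statistical or language cues. Those cues become more reliable as the amount of text grows, which is consistent with the fix described above.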

qued (Contributor, Author) commented on Jul 15, 2025:

As for the content of umlauts-utf8.md, I think it was suggested by a user in a GitHub issue.

ahmetmeleq approved these changes on Jul 15, 2025


ahmetmeleq left a comment


Surprisingly simple!

qued and others added 2 commits on July 15, 2025 18:36

Update unstructured/file_utils/encoding.py

d7ba177

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>

Merge branch 'main' into chore/switch-to-charset-normalizer

29fa1bb



		können