chore: switch to charset normalizer #4060


Draft
qued wants to merge 9 commits into main from chore/switch-to-charset-normalizer

Conversation

qued
Contributor

Removes `chardet` as a dependency, standardizing on `charset-normalizer`.

This involved:

  • Changing `chardet` to `charset-normalizer` in our base dependency file
  • Updating the code (in only one place) where `chardet` was used
  • pip-compiling to update our published dependency tree
  • Updating one test: `charset-normalizer` misdiagnosed the encoding of a file used as a test fixture. My guess is that the ~10 characters in the file were not enough for `charset-normalizer` to make a proper inference, so I re-encoded another, slightly longer file that's also used for encoding testing, and it detected that one correctly.
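
To illustrate the guess in the last bullet, here is a minimal standard-library sketch of why a very short sample is hard to classify. It does not reproduce the logic of `charset-normalizer`; the sample bytes, the candidate codec list, and the helper name `plausible_encodings` are all assumptions made for illustration.

```python
# Illustrative only: the sample bytes and the candidate list are assumptions,
# not the actual fixture and not how charset-normalizer ranks encodings.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "iso-8859-15", "cp1250", "cp437"]


def plausible_encodings(data, candidates=CANDIDATES):
    """Return {codec name: decoded text} for every codec that decodes `data`."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This codec cannot represent the byte sequence at all.
            continue
    return results


# Six bytes with a single non-ASCII byte (0xF6).
short_sample = "können".encode("latin-1")
decoded = plausible_encodings(short_sample)

for name, text in decoded.items():
    print(f"{name:>12}: {text}")
```

With only one non-ASCII byte, several single-byte codecs decode the sample without error, and some even agree on the resulting text, so a detector has very little evidence to choose between them. A longer sample with more non-ASCII characters narrows the field.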

qued marked this pull request as draft July 15, 2025 17:56
Comment on lines +1 to +3

können
Contributor


Out of curiosity, what's the story behind these being added?

Contributor, Author


In the unit tests on the old file, `charset_normalizer` misdiagnosed it (`chardet` had previously gotten it right). So I added more text by re-encoding `umlauts-utf8.md`, to give it more content on which to base an inference.
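
The re-encoding step described above could look roughly like the following sketch. The target encoding (`latin-1`), the output file name, the sample text, and the helper `reencode_file` are assumptions for illustration; the comment does not say which encoding or tooling was actually used.

```python
import tempfile
from pathlib import Path


def reencode_file(src, dst, src_encoding="utf-8", dst_encoding="latin-1"):
    """Read `src` as text in `src_encoding` and write it to `dst` in `dst_encoding`."""
    text = Path(src).read_text(encoding=src_encoding)
    # Writing bytes directly avoids any newline translation on output.
    Path(dst).write_bytes(text.encode(dst_encoding))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for umlauts-utf8.md; the real file's content is not shown in the PR.
    src = Path(tmp) / "umlauts-utf8.md"
    # Hypothetical name for the re-encoded fixture.
    dst = Path(tmp) / "umlauts-reencoded.md"

    sample_text = "Wir können über Äpfel reden.\n"
    src.write_text(sample_text, encoding="utf-8")

    reencode_file(src, dst)

    src_raw = src.read_bytes()
    dst_raw = dst.read_bytes()
```

Each umlaut takes two bytes in UTF-8 and one byte in Latin-1, so the re-encoded file is shorter in bytes while carrying the same text.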

Contributor, Author


As for the content of `umlauts-utf8.md`, I think it was suggested by a user in a GitHub issue.

Contributor

ahmetmeleq left a comment


Surprisingly simple!

qued and others added 2 commits July 15, 2025 18:36
Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Reviewers

@ahmetmeleq approved these changes

@Klaijan: awaiting requested review

@jiajun-unstructured: awaiting requested review


2 participants
@qued, @ahmetmeleq
