Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: binary file check#377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
ix-56h wants to merge11 commits intocoderamp-labs:main
base:main
Choose a base branch
Loading
fromix-56h:fix_binary_files_handle

Conversation

ix-56h
Copy link
Contributor

@ix-56hix-56h commentedJul 3, 2025
edited
Loading

Closes#375

Topic

tiktoken crash sometimes with binary files, after diging i've found that we did some check to ignore binary files but not strong enough.

@ix-56hix-56h self-assigned thisJul 3, 2025
@ix-56h
Copy link
ContributorAuthor

@NicolasIRAGNE i've fix encoding issue on windows, we now use utf-16 le instead of utf-16.

I let you decide what we need to do from now.

NicolasIRAGNE reacted with thumbs up emoji

@NicolasIRAGNE
Copy link
Contributor

I'll take a look whenever I have a bit of time, I'm not sure what this does but it seems better than initial check

But I am curious as to what the actual error is. Do we read too many characters? What happens if the file is just a huge numbers of ascii chars?

Also,#375 did this fix the problem as well?

@ix-56h
Copy link
ContributorAuthor

But I am curious as to what the actual error is. Do we read too many characters? What happens if the file is just a huge numbers of ascii chars?

The current problem:

  1. we read a chunk of a file
  2. we wrongly determine binaries using a too much simple method
  3. this leads to include bytes within the final output context
  4. the tiktoken package, which uses rust, panic when unwrapping parsing result

Also,#375 did this fix the problem as well?

Yes. This is the main purpose of this with another one : The context should now ignore a lot more (if it's not all) binaries, so the context should be a lot more usable for LLMs on various repositories.

@ix-56hix-56h changed the titlefix binary file checkfix: fix binary file checkJul 16, 2025
@ix-56hix-56h changed the titlefix: fix binary file checkfix: binary file checkJul 16, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@NicolasIRAGNENicolasIRAGNEAwaiting requested review from NicolasIRAGNE

@cyclotruccyclotrucAwaiting requested review from cyclotruc

At least 1 approving review is required to merge this pull request.

Assignees

@ix-56hix-56h

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

bug: stackoverflow error with some repos
4 participants
@ix-56h@NicolasIRAGNE@cyclotruc@filipchristiansen

[8]ページ先頭

©2009-2025 Movatter.jp