[Feature] support Assign token to update the content of a token #1570


Draft
ArthurZucker wants to merge 16 commits into main from assign-token

Conversation

@ArthurZucker (Collaborator) commented Jul 12, 2024 (edited)

Very draft for now:

  • handle potential collisions

fixes #1473
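For context, here is a minimal sketch of the use case driving this (see #1473), not of this PR's API, which is still in flux. The checkpoint is an illustrative assumption, chosen because its vocab ships [unusedX] placeholder slots; any similar vocab works:

```python
# Sketch of the status quo this PR wants to improve; the checkpoint choice
# is an illustrative assumption, any vocab with placeholder tokens works.
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("bert-base-uncased")  # ships [unused0]..[unused993]
size_before = tok.get_vocab_size()

# Today the only option is to append a new entry, which grows the vocab and
# forces an embedding-matrix resize downstream:
tok.add_special_tokens(["<my_new_token>"])
assert tok.get_vocab_size() == size_before + 1

# The feature drafted here would instead rewrite the *content* of an existing
# entry (e.g. [unused0] -> <my_new_token>), keeping ids and vocab size stable.
```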

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pritam-dey3

We really need this feature! Is it tracked anywhere else?


@ArthurZucker (Collaborator, Author)

Oops sorry, yeah this is the right place, I'll come back in a bit!

@ArthurZucker (Collaborator, Author)

Okay, so this PR works for assigning, but the issue is that for Unigram-based models, if the word is not removed from the internal lattice, it can still be "encoded". Might be an issue.
We are also updating the `added_tokens` layer on top of `vocab`.
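To make the lattice issue concrete, a hedged toy example (vocab and scores invented for illustration, assuming a recent tokenizers release): as long as a piece lives in the Unigram model's inventory, Viterbi segmentation can still emit it, regardless of what happens at the `added_tokens` layer.

```python
# Toy Unigram model; entries are (piece, log-prob) pairs, all invented.
from tokenizers import Tokenizer
from tokenizers.models import Unigram

vocab = [
    ("<unk>", 0.0),
    ("hel", -2.0),
    ("lo", -2.0),
    ("hello", -1.0),  # the entry we would like to assign new content to
]
tok = Tokenizer(Unigram(vocab, 0, False))

# Even if "hello" were repointed at the added_tokens layer, the lattice still
# contains the old string, so plain text keeps encoding to the same id: a
# collision with whatever meaning the id was reassigned to.
print(tok.encode("hello").tokens)  # ['hello'], since -1.0 beats 'hel'+'lo' = -4.0
```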

@ArthurZucker (Collaborator, Author)

Related to #1437, which does not work for Unigram: if the token is in the vocabulary, even if it is special, it is still added to the Unigram algorithm!
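To illustrate that quirk with another hedged toy model (vocab and scores invented): even with no added-token matching involved at all, the Unigram model alone re-creates the special piece, because it sits in the lattice with a competitive score.

```python
# Toy Unigram vocab where a "special" piece is part of the model itself.
from tokenizers import Tokenizer
from tokenizers.models import Unigram

vocab = [
    ("<unk>", 0.0),
    ("<special>", -1.0),  # special piece baked into the model vocab
    ("<", -5.0),
    ("special", -5.0),
    (">", -5.0),
]
tok = Tokenizer(Unigram(vocab, 0, False))

# No added-token extraction happens here, yet the special piece comes back
# (-1.0 beats the -15.0 of the three split pieces), so marking it special
# at the added_tokens layer cannot stop the model from emitting it.
print(tok.encode("<special>").tokens)  # ['<special>']
```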

@jp1924 commented Jan 21, 2025 (edited)

Hello @ArthurZucker,
Is this work simply delayed, or has it been stopped due to a critical issue?

Related issues: #1473, #31475


@ArthurZucker (Collaborator, Author)

There is kind of a critical issue: a general solution is hard to find, given how Unigram works. Once backtracking BPE is merged, I'll have another look.

Successfully merging this pull request may close these issues.

Assign <unusedXX> tokens with special_tokens without growing vocab size
4 participants
@ArthurZucker, @HuggingFaceDocBuilderDev, @pritam-dey3, @jp1924
