Comparing changes
base repository: huggingface/swift-transformers
base: main
head repository: huggingface/swift-transformers
compare: bpe-tokenizers
- 17 commits
- 0 files changed
- 1 contributor
Commits on Jul 17, 2023
- Llama fails because we don't find normalizers or pre-processors.
  pcuenca committed Jul 17, 2023
- Load tokenizers using Hub config.
  Note that some models still don't have a tokenizer_config.json file.
  pcuenca committed Jul 17, 2023
- pcuenca committed Jul 17, 2023
Commits on Jul 31, 2023
- Add more PreTokenizers, fix Falcon tokenization.
  pcuenca committed Jul 31, 2023
- Support fallback tokenizer_config.json
  Currently available for gpt2, will add for others if necessary.
  pcuenca committed Jul 31, 2023
- Extract configuration loader to a separate class.
  pcuenca committed Jul 31, 2023
- Support generic "PreTrainedTokenizer" class.
  Some models use this name instead of a concrete tokenizer class name. Currently using BPETokenizer as its implementation, to be fixed when we generalize to other tokenizer types.
  pcuenca committed Jul 31, 2023
- Run BPE tokenizer tests from Hub configs.
  pcuenca committed Jul 31, 2023
Commits on Aug 1, 2023
- Test edge cases, support spaces cleanup.
  pcuenca committed Aug 1, 2023
- Fix test case to include specials.
  TODO: support stripping special tokens.
  pcuenca committed Aug 1, 2023
- Fix multibyte UTF-8 hexa encoding
  pcuenca committed Aug 1, 2023
- Infer tokenizer class if absent from config.
  pcuenca committed Aug 1, 2023
- BertTokenizer can be instantiated from Hub configs
  pcuenca committed Aug 1, 2023
- Merge remote-tracking branch 'origin/main' into bpe-tokenizers
  pcuenca committed Aug 1, 2023
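The "Test edge cases, support spaces cleanup" commit mentions cleaning up spaces in decoded text, but the commit list does not show how the branch implements it. The following is only a minimal Swift sketch modeled on the `clean_up_tokenization` helper in the Python transformers library, which removes the space a tokenizer leaves before punctuation and English contractions when decoding. The names `cleanupReplacements` and `cleanUpTokenizationSpaces` are hypothetical, not swift-transformers API.

```swift
import Foundation

// Hypothetical sketch: replacement pairs applied in order, modeled on
// `clean_up_tokenization` in the Python transformers library.
// Names are illustrative, not the swift-transformers API.
let cleanupReplacements: [(String, String)] = [
    (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","),
    (" ' ", "'"), (" n't", "n't"), (" 'm", "'m"), (" 's", "'s"),
    (" 've", "'ve"), (" 're", "'re"),
]

/// Removes the space left before punctuation and contractions in decoded text.
func cleanUpTokenizationSpaces(_ text: String) -> String {
    cleanupReplacements.reduce(text) { partial, pair in
        partial.replacingOccurrences(of: pair.0, with: pair.1)
    }
}
```

For example, `cleanUpTokenizationSpaces("I do n't know .")` yields `"I don't know."`. Plain string replacement in a fixed order keeps the behavior predictable, at the cost of also removing spaces the original text may have contained on purpose, which is why tokenizers usually make this cleanup optional.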
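The "Fix multibyte UTF-8 hexa encoding" commit does not say which encoding it fixes. One common scheme, assumed here, is the byte fallback used by Llama-style tokenizers: text with no vocabulary entry is emitted as one `<0xXX>` token per UTF-8 byte, so a multibyte character such as "é" must produce two tokens rather than one. This minimal Swift sketch shows that encoding and the round trip back to text; `byteFallbackTokens` and `decodeByteFallback` are hypothetical names, not code from the branch.

```swift
// Hypothetical sketch of byte-fallback hex tokens; names are illustrative,
// not the swift-transformers implementation.

/// Encodes each UTF-8 byte of `text` as a `<0xXX>` token (two uppercase hex digits).
func byteFallbackTokens(for text: String) -> [String] {
    text.utf8.map { (byte: UInt8) -> String in
        let hex = String(byte, radix: 16, uppercase: true)
        // Pad single-digit values such as 0x0A to two digits.
        return "<0x" + (hex.count == 1 ? "0" + hex : hex) + ">"
    }
}

/// Parses `<0xXX>` tokens back into bytes and decodes them together as UTF-8.
/// Returns nil if any token is not a well-formed byte token.
func decodeByteFallback(_ tokens: [String]) -> String? {
    var bytes: [UInt8] = []
    for token in tokens {
        guard token.count == 6, token.hasPrefix("<0x"), token.hasSuffix(">"),
              let byte = UInt8(token.dropFirst(3).dropLast(), radix: 16)
        else { return nil }
        bytes.append(byte)
    }
    return String(decoding: bytes, as: UTF8.self)
}

print(byteFallbackTokens(for: "\u{E9}"))  // ["<0xC3>", "<0xA9>"]
```

Decoding the collected bytes as a whole, rather than token by token, is what lets a character split across several tokens come back intact; `String(decoding:as:)` substitutes U+FFFD for any incomplete sequence instead of failing.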
This comparison is taking too long to generate, so GitHub cannot render it here; it may be too large.
To see the comparison on your machine, run: git diff main...bpe-tokenizers