v0.15.0

@githubnemo released this 19 Mar 15:05

Highlights


New Methods

CorDA: Context-Oriented Decomposition Adaptation

@iboing and @5eqn contributed CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning. This task-driven initialization method has two modes, knowledge-preservation and instruction-preservation, both of which use external data to select ranks intelligently. The former selects ranks corresponding to weights that are not affiliated with knowledge from, say, a QA dataset. The latter selects the ranks that correspond most to the task at hand (e.g., a classification task). (#2231)
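A minimal sketch of how this initialization might be wired up. The CordaConfig class, its corda_method parameter (with "kpm" for knowledge-preserved and "ipm" for instruction-preserved mode), the preprocess_corda helper, and the import paths are assumptions based on the linked PR's description; consult the PEFT documentation for the exact interface.

```python
import torch
from peft import LoraConfig, get_peft_model
from peft.tuners.lora.config import CordaConfig      # assumed import path
from peft.tuners.lora.corda import preprocess_corda  # assumed import path

@torch.no_grad()
def run_model():
    # Run a few batches of external data (e.g. a QA set for knowledge
    # preservation) so CorDA can collect the statistics it needs.
    for batch in calibration_dataloader:  # placeholder dataloader
        model(**batch)

corda_config = CordaConfig(corda_method="kpm")  # "kpm" or "ipm" (assumed values)
lora_config = LoraConfig(
    init_lora_weights="corda",
    target_modules=["q_proj", "v_proj"],  # illustrative module names
    corda_config=corda_config,
)
preprocess_corda(model, lora_config, run_model=run_model)
peft_model = get_peft_model(model, lora_config)
```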

Trainable Tokens: Selective token update

The new Trainable Tokens tuner allows for selective training of tokens without re-training the full embedding matrix, e.g. when adding support for reasoning / thinking tokens. This is much more memory efficient and the saved checkpoint is much smaller. It can be used standalone or in conjunction with LoRA adapters by passing trainable_token_indices to LoraConfig. (#2376)
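A minimal sketch of the combined-with-LoRA usage. The release notes only name the trainable_token_indices parameter; the assumption that it accepts a flat list of token ids, as well as the model and the added tokens, are placeholders for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Add new special tokens (e.g. thinking markers) and make room for them.
tokenizer.add_tokens(["<think>", "</think>"])
model.resize_token_embeddings(len(tokenizer))
new_ids = tokenizer.convert_tokens_to_ids(["<think>", "</think>"])

# Only the embedding rows for the new tokens are trained; the rest of the
# matrix stays frozen, so the saved checkpoint stores just those rows.
config = LoraConfig(
    target_modules="all-linear",
    trainable_token_indices=new_ids,  # assumption: a flat list of token ids
)
peft_model = get_peft_model(model, config)
```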

Enhancements

LoRA now supports targeting multihead attention modules (for now only those with _qkv_same_embed_dim=True). These modules were tricky to support, as they may expose linear submodules but don't use their forward methods, and therefore need explicit handling. (#1324)
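A toy sketch of what targeting such a module could look like; the model and module names are made up for illustration, and nn.MultiheadAttention is built with a single embedding dimension so that _qkv_same_embed_dim is True.

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # Same q/k/v embedding dims, so _qkv_same_embed_dim is True.
        self.attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
        self.head = nn.Linear(32, 2)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return self.head(out)

config = LoraConfig(r=8, target_modules=["attn"])  # target the MHA module itself
peft_model = get_peft_model(Toy(), config)
peft_model(torch.randn(1, 5, 32))  # smoke test
```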

Hotswapping now allows different alpha scalings and ranks without recompiling the model, provided the model is prepared with a call to prepare_model_for_compiled_hotswap() before compiling it. (#2177)
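A rough sketch of the intended flow. Only prepare_model_for_compiled_hotswap() is named in the release notes; the hotswap_adapter helper, its import path, and the target_rank argument are assumptions here, and the paths and base model are placeholders.

```python
import torch
from peft import PeftModel
from peft.utils.hotswap import prepare_model_for_compiled_hotswap, hotswap_adapter  # assumed import path

model = PeftModel.from_pretrained(base_model, "path/to/adapter_a")  # placeholders

# Pad ranks/scalings up front so adapters with different r and alpha can be
# swapped in later without changing tensor shapes and triggering recompiles.
prepare_model_for_compiled_hotswap(model, target_rank=64)  # target_rank is an assumption
model = torch.compile(model)

# Replace the loaded adapter's weights in place; no recompilation expected.
hotswap_adapter(model, "path/to/adapter_b", adapter_name="default")
```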

GPTQModel support was added in #2247 as a replacement for AutoGPTQ, which is no longer maintained.

Changes

  • It's now possible to use all-linear as target_modules for custom (non-transformers) models (#2267). This change also includes a bugfix where non-linear layers could be selected if they shared the same name with a linear layer (e.g., bar.foo and baz.foo).
  • The internal tuner API was refactored to make method registration easier. With this change, adding a new method requires only a single register_peft_method() call instead of changes across numerous files. (#2282)
  • PEFT_TYPE_TO_MODEL_MAPPING is now deprecated and should not be relied upon. Use PEFT_TYPE_TO_TUNER_MAPPING instead. (#2282)
  • Mixed adapter batches can now be used in conjunction with beam search. (#2287)
  • Fixed a bug where modules_to_save keys could wrongly match parts of the state dict if one key was a substring of another (e.g., classifier and classifier2). (#2334)
  • Auto-casting of the input dtype to the LoRA adapter dtype can now be disabled via disable_input_dtype_casting=True. (#2353)
  • The config parameters rank_pattern and alpha_pattern used by many adapters now also support matching full paths by prefixing the pattern with a caret, for example ^foo to target model.foo but not model.bar.foo (see the sketch after this list). (#2419)
  • AutoPeftModels no longer reduce the embedding size when the tokenizer size differs from the embedding size. The embedding matrix is resized only if the tokenizer contains more tokens than the matrix. This prevents shrinking the embedding matrices of models that have 'spare' tokens built in. (#2427)
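Regarding the caret-prefixed patterns, a small sketch; the module name foo is just a placeholder.

```python
from peft import LoraConfig

# Default rank 8, but the top-level module model.foo gets rank 32 and alpha 64.
# The leading caret anchors the pattern at the root of the module tree,
# so nested modules like model.bar.foo keep the default rank and alpha.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    rank_pattern={"^foo": 32},
    alpha_pattern={"^foo": 64},
)
```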

What's Changed

New Contributors

Full Changelog: v0.14.0...v0.15.0

Contributors

  • @githubnemo
  • @Qubitium
  • @thedebugger
  • @makelinux
  • @bluenote10
  • @BenjaminBossan
  • @pzdkn
  • @innerlee
  • @bingwork
  • @faaany
  • @iboing
  • @CCLDArjun
  • @henryzhengr
  • @d-kleine
  • @NilBiescas
  • @jiqing-feng
  • @5eqn
  • @gslama12
