to-aoki/bitsandbytes

bitsandbytes modified for Jetson Orin

License

MIT license


bitsandbytes

bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.

Resources:

For Jetson AGX Orin

Environment: JetPack 6.0 DP, container image `dustynv/auto_gptq:r36.2.0`

```bash
git clone https://github.com/to-aoki/bitsandbytes
cd bitsandbytes
CUDA_VERSION=122 make cuda12x
pip3 install scipy
python3 -m bitsandbytes
```

```
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.7']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
    + library is importable
    + CUDA function is callable

WARNING: Please be sure to sanitize sensible info from any such env vars!

SUCCESS!
Installation was successful!
```

```bash
pip3 list | grep bitsandbytes
bitsandbytes       0.41.2           /home/user/temp/bitsandbytes
```
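To check the result non-interactively (for example at the end of a container build), the diagnostic text can be scanned for the lines shown above. This is a minimal sketch that assumes the output format matches the sample above; `parse_bnb_diagnostics` is a hypothetical helper, not part of bitsandbytes. On a Jetson AGX Orin the expected compute capability is 8.7.

```python
import re


def parse_bnb_diagnostics(text: str) -> dict:
    """Pull the key facts out of `python3 -m bitsandbytes` output.

    Hypothetical helper; the line formats are assumed from the sample above.
    """
    compiled = re.search(r"COMPILED_WITH_CUDA\s*=\s*(True|False)", text)
    caps = re.search(r"COMPUTE_CAPABILITIES_PER_GPU\s*=\s*\[(.*?)\]", text)
    return {
        # True only if the flag is present and set to True
        "compiled_with_cuda": bool(compiled) and compiled.group(1) == "True",
        # e.g. ['8.7'] on a Jetson AGX Orin
        "compute_capabilities": re.findall(r"'([\d.]+)'", caps.group(1)) if caps else [],
        "success": "SUCCESS!" in text,
    }


sample = """COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.7']
SUCCESS!
Installation was successful!"""

info = parse_bnb_diagnostics(sample)
print(info)  # {'compiled_with_cuda': True, 'compute_capabilities': ['8.7'], 'success': True}
```

In a real script you would pass it the captured stdout of `python3 -m bitsandbytes` instead of the sample string.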

TL;DR

Requirements: Python >= 3.8, a Linux distribution (e.g. Ubuntu), and CUDA > 10.0.

(Deprecated: CUDA 10.0 is deprecated; only CUDA >= 11.0 will be supported starting with release 0.39.0.)

Installation:

pip install bitsandbytes

In some cases you may need to compile from source. If this happens, please consider submitting a bug report with the output of `python -m bitsandbytes`. The following short instructions might work out of the box if `nvcc` is installed; if they do not, see further below.

Compilation quickstart:

```bash
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes

# CUDA_VERSIONS in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
# make argument in {cuda110, cuda11x, cuda12x}
# if you do not know what CUDA you have, try looking at the output of: python -m bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
```
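If you script the build, the `make` target can be derived from `CUDA_VERSION`. The sketch below assumes the mapping implied by the comments and examples in this README (110 uses `cuda110`, 111 to 118 use `cuda11x`, 12.x uses `cuda12x`); check the `Makefile` if your version is not covered. `make_target` and `build_command` are hypothetical helpers.

```python
def make_target(cuda_version: int) -> str:
    """Pick the Makefile target for a CUDA_VERSION value such as 117 or 122.

    Assumed mapping, inferred from this README's examples.
    """
    if cuda_version == 110:
        return "cuda110"
    if 111 <= cuda_version <= 118:
        return "cuda11x"
    if 120 <= cuda_version < 130:
        return "cuda12x"
    raise ValueError(f"unsupported CUDA_VERSION: {cuda_version}")


def build_command(cuda_version: int) -> str:
    """Compose the shell command used in the quickstart."""
    return f"CUDA_VERSION={cuda_version} make {make_target(cuda_version)}"


print(build_command(117))  # CUDA_VERSION=117 make cuda11x
print(build_command(122))  # CUDA_VERSION=122 make cuda12x (the Jetson Orin build above)
```

Raising on unknown values (such as 119, which is not a CUDA release) is a deliberate choice: a silent fallback could compile against the wrong flags.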

Using Int8 inference with HuggingFace Transformers

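```python
import torch
from transformers import AutoModelForCausalLM

# leave about 2 GB of the currently free GPU memory unallocated
free_gb = int(torch.cuda.mem_get_info()[0] / 1024**3) - 2

model = AutoModelForCausalLM.from_pretrained(
    'decapoda-research/llama-7b-hf',
    device_map='auto',
    load_in_8bit=True,
    max_memory={0: f'{free_gb}GB'},  # max_memory maps device -> budget
)
```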
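The `max_memory` budget above is plain arithmetic on the free-memory reading. Written as a small pure-Python function (the name `max_memory_budget` is hypothetical; you would pass it the first element of `torch.cuda.mem_get_info()`), it looks like this:

```python
def max_memory_budget(free_bytes: int, headroom_gb: int = 2, device: int = 0) -> dict:
    """Turn a free-memory reading in bytes into a max_memory dict,
    keeping `headroom_gb` GiB unallocated."""
    free_gb = int(free_bytes / 1024**3)  # floor to whole GiB, as in the snippet above
    budget = free_gb - headroom_gb
    if budget <= 0:
        raise ValueError("not enough free GPU memory for the requested headroom")
    return {device: f"{budget}GB"}


# e.g. a reading of about 29.5 GiB free
print(max_memory_budget(int(29.5 * 1024**3)))  # {0: '27GB'}
```

Failing loudly when the budget is not positive avoids handing `from_pretrained` a zero or negative limit.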

A more detailed example can be found in `examples/int8_inference_huggingface.py`.

Using 8-bit optimizer:

1. Comment out optimizer: `#torch.optim.Adam(....)`
2. Add 8-bit optimizer of your choice: `bnb.optim.Adam8bit(....)` (arguments stay the same)
3. Replace embedding layer if necessary: `torch.nn.Embedding(..) -> bnb.nn.Embedding(..)`

Using 8-bit Inference:

1. Comment out torch.nn.Linear: `#linear = torch.nn.Linear(...)`
2. Add bnb 8-bit linear light module: `linear = bnb.nn.Linear8bitLt(...)` (base arguments stay the same)
3. There are two modes:
   - Mixed 8-bit training with 16-bit main weights. Pass the argument `has_fp16_weights=True` (default)
   - Int8 inference. Pass the argument `has_fp16_weights=False`
4. To use the full LLM.int8() method, use the `threshold=k` argument. We recommend `k=6.0`.

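```python
import torch
import bitsandbytes as bnb

# LLM.int8(); dim1, dim2 and x are placeholders. The weights are quantized to int8 when the module is moved to the GPU.
linear = bnb.nn.Linear8bitLt(dim1, dim2, bias=True, has_fp16_weights=False, threshold=6.0).cuda()
# inputs need to be fp16 and on the same device
out = linear(x.to('cuda', torch.float16))
```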
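With `has_fp16_weights=False`, the layer roughly speaking keeps its weights as int8 codes plus one scaling factor per row. The pure-Python sketch below illustrates symmetric absmax quantization with one scale per row; it is a simplified model of the idea, not the CUDA kernels bitsandbytes uses, and the function names are hypothetical.

```python
def quantize_rowwise(weight):
    """Symmetric absmax int8 quantization with one scale per row."""
    codes, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row) or 1.0  # guard against all-zero rows
        scale = absmax / 127.0                    # maps [-absmax, absmax] onto [-127, 127]
        codes.append([round(v / scale) for v in row])
        scales.append(scale)
    return codes, scales


def dequantize_rowwise(codes, scales):
    """Approximate the original weights from int8 codes and row scales."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]


W = [[0.4, -1.0, 0.25], [2.0, 0.1, -0.3]]
qW, s = quantize_rowwise(W)
print(qW)  # [[51, -127, 32], [127, 6, -19]]
W_hat = dequantize_rowwise(qW, s)
```

Each reconstructed value is within half a quantization step (`scale / 2`) of the original, and a row with one large entry gets a coarser step than its neighbours, which is why per-row scales are used rather than one scale for the whole matrix.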

Features

- 8-bit Matrix multiplication with mixed precision decomposition
- LLM.int8() inference
- 8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB, Lion (saves 75% memory)
- Stable Embedding Layer: Improved stability through better initialization and normalization
- 8-bit quantization: Quantile, Linear, and Dynamic quantization
- Fast quantile estimation: Up to 100x faster than other algorithms
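The "saves 75% memory" figure for the 8-bit optimizers follows from simple arithmetic: Adam keeps two state values (first and second moment) per parameter, so storing them in 8 instead of 32 bits cuts state memory to a quarter. A back-of-the-envelope sketch (it ignores the small overhead of per-block quantization statistics and the small tensors that stay in 32-bit):

```python
def adam_state_bytes(num_params: int, state_bits: int) -> int:
    """Bytes for Adam's two per-parameter states (first and second moment)."""
    return num_params * 2 * state_bits // 8


n = 7_000_000_000  # e.g. a 7B-parameter model
fp32 = adam_state_bytes(n, 32)
int8 = adam_state_bytes(n, 8)
print(fp32 / 1e9, int8 / 1e9)  # 56.0 14.0  (GB of optimizer state)
print(1 - int8 / fp32)         # 0.75
```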

Requirements & Installation

Requirements: anaconda, cudatoolkit, pytorch

Hardware requirements:

- LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100), i.e. a GPU from 2018 or newer.
- 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>= GTX 78x).
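To check a GPU against these requirements, compare its compute capability (reported as `COMPUTE_CAPABILITIES_PER_GPU` by `python -m bitsandbytes`) with the minimums. The thresholds below are an assumption based on NVIDIA's published compute capabilities (Turing = 7.5, GTX 780 = 3.5, Jetson AGX Orin = 8.7), and `supported_features` is a hypothetical helper.

```python
# Assumed minimum compute capabilities for the requirements listed above.
LLM_INT8_MIN_CC = (7, 5)  # Turing
OPTIM_MIN_CC = (3, 5)     # Kepler GTX 78x


def parse_cc(cc: str) -> tuple:
    """'8.7' -> (8, 7), so capabilities compare correctly as tuples."""
    major, minor = cc.split(".")
    return int(major), int(minor)


def supported_features(cc: str) -> dict:
    cap = parse_cc(cc)
    return {
        "llm_int8": cap >= LLM_INT8_MIN_CC,
        "8bit_optimizers": cap >= OPTIM_MIN_CC,
    }


print(supported_features("8.7"))  # {'llm_int8': True, '8bit_optimizers': True}
print(supported_features("6.1"))  # {'llm_int8': False, '8bit_optimizers': True}
```

Tuples are used instead of floats so that, for example, a hypothetical "8.10" would not compare as smaller than "8.7".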

Supported CUDA versions: 10.2 - 12.0

The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment.

The requirements can best be fulfilled by installing PyTorch via Anaconda. You can install PyTorch by following the "Get Started" instructions on the official website.

To install run:

pip install bitsandbytes

Using bitsandbytes

Using Int8 Matrix Multiplication

For straight Int8 matrix multiplication with mixed precision decomposition you can use `bnb.matmul(...)`. To enable mixed precision decomposition, use the `threshold` parameter:

bnb.matmul(...,threshold=6.0)
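Mixed precision decomposition splits the product into two parts: feature dimensions (columns of the input) that contain a value with magnitude of at least `threshold` are multiplied in higher precision, and everything else goes through int8 with absmax scaling. The pure-Python sketch below illustrates that idea on small lists; it is not how `bnb.matmul` is implemented on the GPU, and all names are hypothetical.

```python
def absmax_quant(vec):
    """Quantize a vector to int8 codes with a single absmax scale."""
    absmax = max((abs(v) for v in vec), default=0.0) or 1.0
    scale = absmax / 127.0
    return [round(v / scale) for v in vec], scale


def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Approximate X @ W (X: n x k, W: k x m, as lists of lists)."""
    n, k, m = len(X), len(W), len(W[0])
    # Columns of X holding any |value| >= threshold are treated as outlier features.
    outlier = [j for j in range(k) if any(abs(X[i][j]) >= threshold for i in range(n))]
    regular = [j for j in range(k) if j not in outlier]

    # Regular part: one scale per row of X and one per column of W.
    xq = [absmax_quant([X[i][j] for j in regular]) for i in range(n)]
    wq = [absmax_quant([W[j][c] for j in regular]) for c in range(m)]

    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        x_codes, sx = xq[i]
        for c in range(m):
            w_codes, sw = wq[c]
            acc = sum(a * b for a, b in zip(x_codes, w_codes))  # integer accumulation
            full = sum(X[i][j] * W[j][c] for j in outlier)      # outliers kept in float
            out[i][c] = acc * sx * sw + full                    # dequantize and combine
    return out


X = [[1.0, 20.0, -0.5], [0.3, -8.0, 2.0]]  # column 1 holds the outliers
W = [[0.5, -1.0], [0.2, 0.1], [1.5, 0.3]]
approx = int8_matmul_with_outliers(X, W, threshold=6.0)
```

Keeping the outlier columns out of the int8 path means a value like 20.0 no longer sets the quantization scale for the small values in its row.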

For instructions on how to use LLM.int8() inference layers in your own code, see the TL;DR above or, for extended instructions, see this blog post.

Using the 8-bit Optimizers

With bitsandbytes, 8-bit optimizers can be used by changing a single line of code in your codebase. For NLP models we also recommend using the StableEmbedding layers (see below), which improve results and help with stable 8-bit optimization. To get started with 8-bit optimizers, it is sufficient to replace your old optimizer with the 8-bit optimizer in the following way:

```python
import bitsandbytes as bnb

# adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer
adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer
adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=8) # equivalent

# torch.nn.Embedding(...) -> bnb.nn.StableEmbedding(...)  # recommended for NLP models
```
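The 8-bit optimizers stay close to 32-bit results partly because their states are quantized block by block, each block with its own scale, so a single large value only coarsens precision within its own block (see the "8-bit Optimizers via Block-wise Quantization" paper cited below). The sketch below illustrates this with plain linear absmax codes; bitsandbytes itself uses a non-linear dynamic quantization map and much larger blocks, and the block size of 4 here is only for readability.

```python
def blockwise_quantize(values, block_size):
    """Quantize a flat list block by block, each block with its own absmax scale."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = (max(abs(v) for v in block) or 1.0) / 127.0
        codes.extend(round(v / scale) for v in block)
        scales.append(scale)
    return codes, scales


def blockwise_dequantize(codes, scales, block_size):
    """Invert blockwise_quantize up to rounding error."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# A toy optimizer state: tiny values in the first block, one large value in the second.
state = [1.2e-3, -2e-3, 5e-4, 9e-4, 10.0, 0.2, -0.1, 0.05]

codes, scales = blockwise_quantize(state, block_size=4)
blockwise = blockwise_dequantize(codes, scales, block_size=4)

# One scale for the whole tensor: the large value swamps the small ones.
g_codes, g_scales = blockwise_quantize(state, block_size=len(state))
single = blockwise_dequantize(g_codes, g_scales, block_size=len(state))

print(single[:4])  # [0.0, 0.0, 0.0, 0.0]
```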

Note that by default all parameter tensors with fewer than 4096 elements are kept at 32-bit, even if you initialize those parameters with 8-bit optimizers. This is because such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:

```python
# parameter tensors with less than 16384 values are optimized in 32-bit
# it is recommended to use multiples of 4096
adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384)
```
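The rule above is a threshold on the number of elements in each parameter tensor. A minimal sketch (hypothetical `optimizer_state_bits`, not a bitsandbytes function) of which state precision a parameter of a given shape would get:

```python
import math


def optimizer_state_bits(shape, min_8bit_size=4096):
    """Tensors with fewer than min_8bit_size elements keep 32-bit states;
    larger ones use 8-bit states."""
    numel = math.prod(shape)  # number of elements in the parameter tensor
    return 32 if numel < min_8bit_size else 8


print(optimizer_state_bits((768,)))                        # 32, e.g. a LayerNorm weight
print(optimizer_state_bits((768, 768)))                    # 8, e.g. a projection matrix
print(optimizer_state_bits((4096,), min_8bit_size=16384))  # 32 with the raised threshold
```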

Change Bits and other Hyperparameters for Individual Parameters

If you want to optimize some unstable parameters with 32-bit Adam and others with 8-bit Adam, you can use the `GlobalOptimManager`. With it, you can also configure specific hyperparameters for particular layers, such as embedding layers. Two steps are needed: (1) register the parameters while they are still on the CPU, and (2) override the config with the new desired hyperparameters (anytime, anywhere). See our guide (`howto_config_override.md`) for more details.
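To make the two steps concrete, here is a toy, pure-Python registry that mimics the behavior described above: parameters are registered first, then per-parameter overrides are layered over the optimizer defaults. `ToyOptimConfigManager` is hypothetical and keyed by parameter names for simplicity; its method names are chosen to resemble the guide's, but the real `GlobalOptimManager` works on parameter tensors and its signatures may differ, so consult the guide for the actual API.

```python
class ToyOptimConfigManager:
    """Hypothetical stand-in: register parameters, then override their config."""

    def __init__(self):
        self.registered = set()  # names seen during registration
        self.overrides = {}      # name -> dict of overridden hyperparameters

    def register_parameters(self, names):
        self.registered.update(names)

    def override_config(self, names, key=None, value=None, key_value_dict=None):
        if isinstance(names, str):
            names = [names]
        updates = dict(key_value_dict or {})
        if key is not None:
            updates[key] = value
        for name in names:
            if name not in self.registered:
                raise KeyError(f"{name} was not registered")
            self.overrides.setdefault(name, {}).update(updates)

    def effective_config(self, name, defaults):
        """Optimizer defaults with this parameter's overrides applied on top."""
        return {**defaults, **self.overrides.get(name, {})}


mng = ToyOptimConfigManager()
mng.register_parameters(["embed.weight", "fc1.weight", "fc1.bias"])  # step 1
defaults = {"optim_bits": 8, "lr": 1e-3}
mng.override_config("embed.weight", "optim_bits", 32)                # step 2
mng.override_config("fc1.weight", key_value_dict={"lr": 1e-5, "betas": (0.9, 0.98)})
print(mng.effective_config("embed.weight", defaults))  # {'optim_bits': 32, 'lr': 0.001}
```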

Fairseq Users

To use the Stable Embedding Layer, override the respective `build_embedding(...)` function of your model. Make sure to also use the `--no-scale-embedding` flag to disable scaling of the word embedding layer (nor replaced with layer norm). You can use the optimizers by replacing the optimizer in the respective file (`adam.py` etc.).

Release and Feature History

For upcoming features and changes, and for the full history, see the Patch Notes (`CHANGELOG.md`).

Errors

1. `RuntimeError: CUDA error: no kernel image is available for execution on the device.` (Solution: see `errors_and_solutions.md`)
2. `_fatbinwrap..` (Solution: see `errors_and_solutions.md`)

Compile from source

To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with `nvcc` through the following commands.

```bash
wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
#   CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121}
#   EXPORT_TO_BASH in {0, 1} with 0=False and 1=True

# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
bash cuda_install.sh 118 ~/local 1
```

To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`. For example, the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for `cuda11x` with the CUDA version at `~/local/cuda-11.7`:

CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x
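When driving the build from a script, the same single-run override can be expressed as an environment passed to `make`. A sketch with a hypothetical helper `cuda_build_invocation`; the library name follows the `libbitsandbytes_cuda<CUDA_VERSION>.so` pattern described above, and the `make` target still has to match your CUDA version.

```python
import os
import shlex


def cuda_build_invocation(cuda_home, cuda_version, target):
    """Return (env, argv, expected library name) for one `make` run.

    Hypothetical helper: it only prepares the call, it does not run the build.
    """
    env = dict(os.environ)
    env["CUDA_HOME"] = os.path.expanduser(cuda_home)  # toolkit used for this run only
    env["CUDA_VERSION"] = str(cuda_version)           # e.g. 117, or 122 on Jetson Orin
    argv = ["make", target]
    lib_name = f"libbitsandbytes_cuda{cuda_version}.so"
    return env, argv, lib_name


env, argv, lib = cuda_build_invocation("~/local/cuda-11.7", 117, "cuda11x")
print(shlex.join(argv), "->", lib)  # make cuda11x -> libbitsandbytes_cuda117.so
# To run it: subprocess.run(argv, env=env, check=True)
```

Copying `os.environ` rather than mutating it keeps the override scoped to this one build, which mirrors the single-command form shown above.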

For more detailed instructions, please follow the `compile_from_source.md` instructions.

License

The majority of bitsandbytes is licensed under MIT; however, portions of the project are available under separate license terms: PyTorch is licensed under the BSD license.

We thank Fabio Cannizzo for his work on FastBinarySearch, which we use for CPU quantization.

How to cite us

If you found this library and LLM.int8() useful, please consider citing our work:

```bibtex
@article{dettmers2022llmint8,
  title={LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale},
  author={Dettmers, Tim and Lewis, Mike and Belkada, Younes and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2208.07339},
  year={2022}
}
```

For 8-bit optimizers or quantization routines, please consider citing the following work:

```bibtex
@article{dettmers2022optimizers,
  title={8-bit Optimizers via Block-wise Quantization},
  author={Dettmers, Tim and Lewis, Mike and Shleifer, Sam and Zettlemoyer, Luke},
  journal={10th International Conference on Learning Representations, ICLR},
  year={2022}
}
```
