IBM GGUF-encoded AI models and conversion scripts
This repository provides an automated CI/CD process to convert, test and deploy IBM Granite models, in safetensors format, from the `ibm-granite` organization to versioned IBM GGUF collections on the Hugging Face Hub under the `ibm-research` organization. This includes:
- Target IBM models for format conversion
- GGUF Conversion & Quantization
- GGUF Verification Testing
- References
- Releasing GGUF model conversions & quantizations
Format conversions (i.e., GGUF) and quantizations will only be provided for model repositories canonically hosted in an official IBM Hugging Face organization. Currently, this includes the following organizations:
- `ibm-granite`
- `ibm-research`
Additionally, only a select set of IBM models from these orgs. will be converted based upon the following general criteria:
- The IBM GGUF model needs to be referenced by an AI provider service as a "supported" model.
- The GGUF model is referenced by a public blog, tutorial, demo, or other public use case.
  - Specifically, if the model is referenced in the IBM Granite Snack Cookbook.
Select quantizations will only be made available when:
- A small form factor is justified:
  - e.g., reduced model size intended for running locally on small form-factor devices such as watches and mobile devices.
- Performance provides significant benefit without compromising accuracy (or enabling hallucination).
Specifically, the following Granite model repositories are currently supported in GGUF format (by collection), with the listed quantizations:
Typically, this model category includes "instruct" models.
| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-3.2-2b-instruct | GraniteForCausalLM (gpt2) | ibm-research |
| ibm-granite/granite-3.2-8b-instruct | GraniteForCausalLM (gpt2) | ibm-research |
- Supported quantizations: `fp16`, `Q2_K`, `Q3_K_L`, `Q3_K_M`, `Q3_K_S`, `Q4_0`, `Q4_1`, `Q4_K_M`, `Q4_K_S`, `Q5_0`, `Q5_1`, `Q5_K_M`, `Q5_K_S`, `Q6_K`, `Q8_0`
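For local experimentation, one of these instruct conversions can be pulled and run directly from the Hugging Face Hub via Ollama. The repository name and quantization tag below are only illustrative; check the `ibm-research` GGUF collection for the exact repo. ID and available tags:

```bash
# Illustrative only: take the exact GGUF repo. name and tag from the ibm-research collection on Hugging Face
ollama run hf.co/ibm-research/granite-3.2-8b-instruct-GGUF:Q4_K_M
```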
| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-guardian-3.2-3b-a800m | GraniteMoeForCausalLM (granitemoe) | ibm-research |
| ibm-granite/granite-guardian-3.2-5b | GraniteMoeForCausalLM (granitemoe) | ibm-research |
- Supported quantizations: `fp16`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`
| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-vision-3.2-2b | GraniteForCausalLM (granite), LlavaNextForConditionalGeneration | ibm-research |
- Supported quantizations: `fp16`, `Q4_K_M`, `Q5_K_M`, `Q8_0`
| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-embedding-30m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-125m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-107m-multilingual | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-278m-multilingual | Roberta (roberta-bpe) | ibm-research |
- Supported quantizations: `fp16`, `Q8_0`
Note: Sparse model architecture (i.e., HF `RobertaMaskedLM`) is not currently supported; therefore, there is no conversion for `ibm-granite/granite-embedding-30m-sparse`.
- LoRA support is currently planned (no target date).
The GGUF format is defined in the GGUF specification. The specification describes the structure of the file, how it is encoded, and what information is included.
Currently, the primary means to convert from HF SafeTensors format to GGUF is the canonical llama.cpp tool `convert-hf-to-gguf.py`.

For example:

```bash
python llama.cpp/convert-hf-to-gguf.py ./<model_repo> --outfile output_file.gguf --outtype q8_0
```
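Lower-bit quantizations are typically produced in a second step with the `llama-quantize` tool from the same llama.cpp build: first convert the safetensors checkpoint to an `fp16` GGUF, then quantize that file to the desired type. A minimal sketch (file names are illustrative):

```bash
# 1. Convert the HF safetensors repo. to an fp16 GGUF (file names are illustrative)
python llama.cpp/convert-hf-to-gguf.py ./<model_repo> --outfile model-f16.gguf --outtype f16

# 2. Quantize the fp16 GGUF to a smaller type, e.g. Q4_K_M
./llama.cpp/build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```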
https://github.com/ollama/ollama/blob/main/docs/import.md#quantizing-a-model
```
$ ollama create --quantize q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```
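For reference, `ollama create` reads a Modelfile whose `FROM` line points at the source weights. A minimal sketch for quantizing a locally converted GGUF file (the file and model names are illustrative):

```bash
# Minimal Modelfile pointing at a locally converted fp16 GGUF (names are illustrative)
cat > Modelfile <<'EOF'
FROM ./granite-3.2-8b-instruct-f16.gguf
EOF

# Create a quantized Ollama model from it
ollama create --quantize q4_K_M my-granite -f Modelfile
```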
Note: The Ollama CLI tool only supports a subset of quantizations:
- (rounding): `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`
- k-means: `q3_K_S`, `q3_K_M`, `q3_K_L`, `q4_K_S`, `q4_K_M`, `q5_K_S`, `q5_K_M`, `q6_K`
Note: Similar to the Ollama CLI, the web UI supports only a subset of quantizations.
As a baseline, each converted model MUST run successfully in the following providers:

- llama.cpp - As the core implementation of the GGUF format, which is either a direct dependency or used as forked code in nearly all downstream GGUF providers, testing is essential. Specifically, testing verifies that the model can be hosted using the `llama-server` service. See the specific section on `llama.cpp` for more details on which version is considered "stable" and how the same version is used in both conversion and testing.
- Ollama - As a key model service provider supported by higher-level frameworks and platforms (e.g., AnythingLLM, LM Studio, etc.), testing the ability to `pull` and `run` the model is essential.
Notes:
- The official Ollama Docker image ollama/ollama is available on Docker Hub.
- Ollama does not yet support sharded GGUF models:
  - "Ollama does not support this yet. Follow this issue for more info: ollama/ollama#5245"
- e.g., `ollama pull hf.co/Qwen/Qwen2.5-14B-Instruct-GGUF`
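As an illustration of the llama.cpp baseline check, a converted model can be hosted with `llama-server` and exercised through its OpenAI-compatible HTTP endpoint. A minimal smoke-test sketch (the model path and prompt are only examples):

```bash
# Host the converted model locally (model path is illustrative)
./bin/llama-server -m granite-3.2-8b-instruct-Q4_K_M.gguf --port 8080 &

# Issue a minimal chat completion request against the OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```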
GGUF format
- Hugging Face: GGUF - describes the format and some of the header structure.
- llama.cpp:
  - GGUF Quantization types (`ggml_ftype`) - `ggml/include/ggml.h`
  - GGUF Quantization types (`LlamaFileType`) - `gguf-py/gguf/constants.py`
GGUF Examples
GGUF tools
- GGUF-my-repo - Hugging Face space to build your own quants. without any setup. (Referenced by llama.cpp example docs.)
- CISCai/gguf-editor - batch conversion tool for GGUF models in HF model repos.
llama.cpp Tutorials
- How to convert any HuggingFace Model to gguf file format? - using the `llama.cpp/convert-hf-to-gguf.py` conversion script.
Ollama tutorials
- Importing a model - includes Safetensors, GGUF.
- Use Ollama with any GGUF Model on Hugging Face Hub
- Using Ollama models from Langchain - This example uses the `gemma2` model supported by Ollama.
This repository uses GitHub workflows and actions to convert IBM Granite models hosted on Hugging Face to GGUF format, quantize them, run build-verification tests on the resultant models and publish them to target GGUF collections in IBM-owned Hugging Face organizations (e.g., `ibm-research` and `ibm-granite`).
There are 3 types of releases that can be performed on this repository:
- Test (private) - releases GGUF models to a test (or private) repo. on Hugging Face.
- Preview (private) - releases GGUF models to a GGUF collection within the `ibm-granite` HF organization for time-limited access to select IBM partners (typically for pre-release testing and integration).
- Public - releases GGUF models to a public GGUF collection within the `ibm-research` HF organization for general use.
Note: The Hugging Face (HF) term "private" means that repos. and collections created in the target HF organization are visible only to organization contributors and hidden from normal users.
Prior to "triggering" release workflows, some files need to be configured depending on the release type.
Project maintainers for this repo. are able to access the secrets (tokens) that are made available to the CI/CD release workflows/actions:
https://github.com/IBM/gguf/settings/secrets/actions
Secrets are used to authenticate with GitHub and Hugging Face (HF) and are already configured for the `ibm-granite` and `ibm-research` HF organizations for "preview" and "public" release types.
For "test" (or private) builds, users can fork the repo. and add a repository secret namedHF_TOKEN_TEST
with a token (value) created on their test (personal, private) HF organization account with appropriate privileges to allow write access to repos. and collections.
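If you prefer the GitHub CLI to the web UI, the same secret can (in principle) be added to your fork like this; it assumes an authenticated `gh` session against the fork, and the repo. path is illustrative:

```bash
# Add the HF token as a repository secret on your fork (gh prompts for the secret value)
gh secret set HF_TOKEN_TEST --repo <your-user>/gguf
```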
If you need to encode information for project CI GitHub workflows, please use the following macOS command and ensure there are no line breaks:

```bash
base64 -i <input_file> > <output_file>
```
Each release uses a model collection mapping file that defines which model repositories (along with their titles, descriptions and family designations) belong to that collection. Family designations allow granular control over which model families are included in a release, which allows for "staggered" releases, typically by model architecture (e.g., `vision`, `embedding`, etc.).
Originally, different IBM Granite releases had their own collection mapping file; however, we now use a single collection mapping file for all releases of GGUF model formats for simpler downstream consumption:
- Unified mapping (all release types): `resources/json/latest/hf_collection_mapping_gguf.json`
The JSON collection mapping files have the following structure using the "Public" release as an example:
{"collections": [ {"title":"Granite GGUF Models","description":"GGUF-formatted versions of IBM Granite models. Licensed under the Apache 2.0 license.","items": [ {"type":"model","family":"instruct","repo_name":"granite-3.3-8b-instruct" },... {"type":"model","family":"vision","repo_name":"granite-vision-3.2-2b" },... {"type":"model","family":"guardian","repo_name":"granite-guardian-3.2-3b-a800m" },... {"type":"model","family":"embedding","repo_name":"granite-embedding-30m-english" },... ] } ]}
Simply add a new object under the `items` array for each new IBM Granite repo. you want added to the corresponding (GGUF) collection.

Currently, the only HF item type supported is `model`, and valid families (which have supported workflows) include: `instruct` (language), `vision`, `guardian` and `embedding`.
Note: If you need to change the HF collection description, please know that HF limits this string to 150 chars. or less.
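A quick way to sanity-check the mapping file before a release is to list the families and repo. names it contains, for example with `jq` (assumes `jq` is installed locally):

```bash
# List each (family, repo_name) pair defined in the unified collection mapping
jq -r '.collections[].items[] | "\(.family)\t\(.repo_name)"' \
  resources/json/latest/hf_collection_mapping_gguf.json
```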
Each release type has a corresponding (parent, master) workflow that configures and controls which model families (i.e., `instruct` (language), `vision`, `guardian` and `embedding`) are executed for a given GitHub (tagged) release.
For example, a `3.2` versioned release uses the following files, which correspond to one of the release types (i.e., `Test`, `Preview` or `Public`):
- Test: `.github/workflows/granite-3.2-release-test.yml`
- Preview: `.github/workflows/granite-3.2-release-preview-ibm-granite.yml`
- Public: `.github/workflows/granite-3.2-release-ibm-research.yml`
The YAML GitHub workflow files have a few environment variables that may need to be updated to reflect which collections, models and quantizations should be included in the next GitHub (tagged) release. Using the "Public" release YAML file as an example:
```yaml
env:
  ENABLE_INSTRUCT_JOBS: false
  ENABLE_VISION_JOBS: false
  ENABLE_GUARDIAN_JOBS: true
  SOURCE_INSTRUCT_REPOS: "[ 'ibm-granite/granite-3.2-2b-instruct', ... ]"
  TARGET_INSTRUCT_QUANTIZATIONS: "[ 'Q4_K_M', ... ]"
  SOURCE_GUARDIAN_REPOS: "[ 'ibm-granite/granite-guardian-3.2-3b-a800m', ... ]"
  TARGET_GUARDIAN_QUANTIZATIONS: "[ 'Q4_K_M', ... ]"
  SOURCE_VISION_REPOS: "[ 'ibm-granite/granite-vision-3.2-2b', ... ]"
  TARGET_VISION_QUANTIZATIONS: "[ 'Q4_K_M', ... ]"
  ...
  COLLECTION_CONFIG: "resources/json/latest/hf_collection_mapping_gguf.json"
```
Note that the `COLLECTION_CONFIG` env. var. provides the relative path to the collection configuration file, which is located in the `resources/json` directory of the repository for the specific Granite release.
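Before tagging a release, it can help to double-check which job groups are switched on in the chosen workflow file, for example:

```bash
# Show the ENABLE_*_JOBS switches in the public release workflow
grep -E 'ENABLE_[A-Z]+_JOBS' .github/workflows/granite-3.2-release-ibm-research.yml
```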
Clone and build the following llama.cpp binaries using these build/link flags:
The following command will create the proper CMake `build` files for generating code that will run within both `macos` and `ubuntu` container images. These flags also ensure that the llama.cpp libraries will not attempt to use GPUs, since the current GitHub virtual machines for both operating systems do not support this.
```bash
cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_METAL=OFF -DGGML_NATIVE_DEFAULT=OFF -DCMAKE_CROSSCOMPILING=TRUE -DGGML_NO_ACCELERATE=ON
```
Note: As flags have changed often, the following minimal set of flags MAY work but needs testing:
```bash
cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_NO_ACCELERATE=ON -DCMAKE_CROSSCOMPILING=TRUE
```
Use this command to build all llama.cpp tool binaries into the `build/bin` directory:
```bash
cmake --build build --config Release
```
Once built locally, copy the following files from your `build/bin` directory to this repository's `bin` directory:
- llama-cli
- llama-quantize
- llama-run
- llama-server
- llama-llava-cli (may no longer be needed/supported as of May 2025, as llava support has been rolled into the general libraries under multimodal support, aka `mtmd`)
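Put together, the local build-and-copy steps look roughly like the following. This is a sketch that assumes a fresh llama.cpp clone next to this repository; pin llama.cpp to whichever tag/commit is considered "stable" for the release:

```bash
# Clone llama.cpp (pin to the "stable" tag/commit used for the release)
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# Configure without GPU/accelerator support so binaries run on GitHub-hosted runners
cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_METAL=OFF -DGGML_NATIVE_DEFAULT=OFF \
      -DCMAKE_CROSSCOMPILING=TRUE -DGGML_NO_ACCELERATE=ON

# Build all tool binaries into build/bin
cmake --build build --config Release

# Copy the binaries this repository expects into its bin/ directory
cd ..
cp llama.cpp/build/bin/llama-cli llama.cpp/build/bin/llama-quantize \
   llama.cpp/build/bin/llama-run llama.cpp/build/bin/llama-server ./bin/
```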
This section contains the steps required to successfully "trigger" a release workflow for one or more supported Granite model families (i.e., `instruct` (language), `vision`, `guardian` and `embedding`).
1. Click the Releases link from the right column of the repo. home page, which should be the URL https://github.com/IBM/gguf/releases.
2. Click the "Draft a new release" button near the top of the releases page.
3. Click the "Choose a tag" drop-down menu and enter a tag name that starts with one of the following strings, relative to which release type you want to "trigger":
   - Test: `test-v3.3` (private HF org.)
   - Preview: `preview-v3.3` (IBM Granite, private/hidden)
   - Public: `v3.3` (IBM Research, public)

   Treat these strings as "prefixes" to which you must append a unique build version, for example: `v3.3-rc-01` for release candidate version 01 under the `ibm-research` org. on the Hugging Face Hub.
4. Click "Create a new tag: on publish" near the bottom of the drop-down list.
5. By convention, add the same "tag" name you created in the previous step into the "Release title" entry field.
6. Adjust the "Set as a pre-release" and "Set as the latest release" checkboxes to your desired settings.
7. Click the "Publish release" button.
At this point, you can observe the CI/CD workflows being run by the GitHub service "runners". Please note that during heavy traffic times, assignment of a "runner" (for each workflow job) may take longer.
To observe the CI/CD process in action, please navigate to the repository's Actions page and look for the name of the `tag` you entered for the release (above) in the workflow run title.
Note: It is common to occasionally see some jobs "fail" due to network or scheduling timeout errors. In these cases, you can go into the failed workflow run and click the "Re-run failed jobs" button to re-trigger the failed job(s).
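Failed jobs can also be re-triggered from the command line with the GitHub CLI, given the numeric ID of the failed workflow run:

```bash
# Re-run only the failed jobs of a given workflow run (run ID is illustrative)
gh run rerun 1234567890 --failed
```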