IBM/gguf

IBM GGUF-encoded AI models and conversion scripts

This repository provides an automated CI/CD process to convert, test, and deploy IBM Granite models, in safetensors format, from the ibm-granite organization to versioned IBM GGUF collections on the Hugging Face Hub under the ibm-research organization.

Target IBM models for format conversion

Format conversions (i.e., GGUF) and quantizations will only be provided for model repositories canonically hosted in an official IBM Hugging Face organization.

Currently, this includes the ibm-granite organization.

Additionally, only a select set of IBM models from these organizations will be converted, based upon the following general criteria:

  • The IBM GGUF model needs to be referenced by an AI provider service as a "supported" model.

    • For example, a local AI provider service such as Ollama or a hosted service such as Replicate.
  • The GGUF model is referenced by a public blog, tutorial, demo, or other public use case.

Select quantizations will only be made available when:

  • A small form factor is justified:
    • e.g., a reduced model size intended for running locally on small form-factor devices such as watches and mobile devices.
  • Quantization provides a significant performance benefit without compromising accuracy (or enabling hallucination).

Supported IBM Granite models (GGUF)

Specifically, the following Granite model repositories are currently supported in GGUF format (grouped by collection), with their supported quantizations listed:

Language

Typically, this model category includes "instruct" models.

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-3.2-2b-instruct | GraniteForCausalLM (gpt2) | ibm-research |
| ibm-granite/granite-3.2-8b-instruct | GraniteForCausalLM (gpt2) | ibm-research |

  • Supported quantizations: fp16, Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0
Guardian

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-guardian-3.2-3b-a800m | GraniteMoeForCausalLM (granitemoe) | ibm-research |
| ibm-granite/granite-guardian-3.2-5b | GraniteMoeForCausalLM (granitemoe) | ibm-research |

  • Supported quantizations: fp16, Q4_K_M, Q5_K_M, Q6_K, Q8_0
Vision

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-vision-3.2-2b | GraniteForCausalLM (granite), LlavaNextForConditionalGeneration | ibm-research |

  • Supported quantizations: fp16, Q4_K_M, Q5_K_M, Q8_0
Embedding (dense)

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
|---|---|---|
| ibm-granite/granite-embedding-30m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-125m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-107m-multilingual | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-278m-multilingual | Roberta (roberta-bpe) | ibm-research |

  • Supported quantizations: fp16, Q8_0

Note: Sparse model architecture (i.e., HF RobertaMaskedLM) is not currently supported; therefore, there is no conversion for ibm-granite/granite-embedding-30m-sparse.

RAG LoRA support

  • LoRA support is currently planned (no target date).

GGUF Conversion & Quantization

The GGUF format is defined in the GGUF specification. The specification describes the structure of the file, how it is encoded, and what information is included.

Currently, the primary means to convert from HF SafeTensors format to GGUF is the canonical llama.cpp tool convert-hf-to-gguf.py.

For example:

python llama.cpp/convert-hf-to-gguf.py ./<model_repo> --outfile output_file.gguf --outtype q8_0
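
For quantized variants, a common two-step flow is to first convert to a higher-precision GGUF and then quantize it with the llama-quantize tool that is also built from llama.cpp. A minimal sketch (file names here are illustrative):

# 1. Convert the HF safetensors checkpoint to an f16 GGUF
python llama.cpp/convert-hf-to-gguf.py ./<model_repo> --outfile model-f16.gguf --outtype f16

# 2. Quantize the f16 GGUF to Q4_K_M (one of the supported quantizations listed above)
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M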

Alternatives

Ollama CLI (future)

Note: The Ollama CLI tool only supports a subset of quantizations:

  • (rounding): q4_0, q4_1, q5_0, q5_1, q8_0
  • k-means: q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K

Hugging Face endorsed tool "ggml-org/gguf-my-repo"

Note:

  • Similar to Ollama CLI, the web UI supports only a subset of quantizations.

GGUF Verification Testing

As a baseline, each converted model MUST run successfully under the following providers:

llama.cpp testing

llama.cpp - As the core implementation of the GGUF format, which is either a direct dependency or utilized as forked code in nearly all downstream GGUF providers, testing is essential. Specifically, testing verifies the model can be hosted using the llama-server service. See the section on llama.cpp below for details on which version is considered "stable" and how the same version is used in both conversion and testing.
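
As a sketch of what such a verification can look like (model path and port are illustrative), the converted model can be loaded under llama-server and probed over its HTTP API:

# Host the converted model with llama-server
./bin/llama-server -m granite-3.2-2b-instruct-Q4_K_M.gguf --port 8080 &

# Check that the server is up and the model is loaded
curl -s http://localhost:8080/health

# Check basic inference through the completion endpoint
curl -s http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 16}'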

Ollama testing (future)

Ollama - As a key model service provider supported by higher-level frameworks and platforms (e.g., AnythingLLM, LM Studio, etc.), testing the ability to pull and run the model is essential.

Notes

  • The official Ollama Docker image ollama/ollama is available on Docker Hub.
  • Ollama does not yet support sharded GGUF models.
    • "Ollama does not support this yet. Follow this issue for more info: ollama/ollama#5245"
    • e.g., ollama pull hf.co/Qwen/Qwen2.5-14B-Instruct-GGUF
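
A sketch of such a pull-and-run check (the docker run command follows the ollama/ollama Docker Hub instructions; the model reference is the example above):

# Start the Ollama service from the official Docker image
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a (non-sharded) GGUF model directly from Hugging Face, then run it
docker exec -it ollama ollama pull hf.co/Qwen/Qwen2.5-14B-Instruct-GGUF
docker exec -it ollama ollama run hf.co/Qwen/Qwen2.5-14B-Instruct-GGUF "Say hello"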

Releasing GGUF model conversions & quantizations

This repository uses GitHub workflows and actions to convert IBM Granite models hosted on Huggingface to GGUF format, quantize them, run build-verification tests on the resultant models, and publish them to target GGUF collections in IBM-owned Huggingface organizations (e.g., ibm-research and ibm-granite).

Types of releases

There are 3 types of releases that can be performed on this repository:

  1. Test (private) - releases GGUF models to a test (or private) repo. on Huggingface.
  2. Preview (private) - releases GGUF models to a GGUF collection within the ibm-granite HF organization for time-limited access to select IBM partners (typically for pre-release testing and integration).
  3. Public - releases GGUF models to a public GGUF collection within the ibm-research HF organization for general use.

Note: The Huggingface (HF) term "private" means that repos. and collections created in the target HF organization are visible only to organization contributors and hidden from normal users.

Configuring a release

Prior to "triggering" release workflows, some files need to be configured depending on the release type.

Github secrets

Project maintainers for this repo. are able to access the secrets (tokens) that are made available to the CI/CD release workflows/actions:

https://github.com/IBM/gguf/settings/secrets/actions

Secrets are used to authenticate with GitHub and Huggingface (HF) and are already configured for the ibm-granite and ibm-research HF organizations for the "preview" and "public" release types.

For "test" (or private) builds, users can fork the repo. and add a repository secret namedHF_TOKEN_TEST with a token (value) created on their test (personal, private) HF organization account with appropriate privileges to allow write access to repos. and collections.

Base64 encoding

If you need to encode information for project CI GitHub workflows, please use the following macOS command and ensure there are no line breaks:

base64 -i <input_file> > <output_file>
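
If your platform's base64 wraps its output (some versions insert line breaks every 76 characters), stripping newlines explicitly is a safe alternative:

# Encode and remove any line breaks from the output
base64 -i <input_file> | tr -d '\n' > <output_file>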

Collection mapping files (JSON)

Each release uses a model collection mapping file that defines which model repositories (along with their titles, descriptions, and family designations) belong to that collection. Family designations allow granular control over which model families are included in a release, which allows for "staggered" releases, typically by model architecture (e.g., vision, embedding, etc.).

Originally, different IBM Granite releases had their own collection mapping file; however, we now use a single collection mapping file for all releases of GGUF model formats for simpler downstream consumption.

What to update

The JSON collection mapping files have the following structure using the "Public" release as an example:

{"collections": [        {"title":"Granite GGUF Models","description":"GGUF-formatted versions of IBM Granite models. Licensed under the Apache 2.0 license.","items": [                {"type":"model","family":"instruct","repo_name":"granite-3.3-8b-instruct"                },...                {"type":"model","family":"vision","repo_name":"granite-vision-3.2-2b"                },...                {"type":"model","family":"guardian","repo_name":"granite-guardian-3.2-3b-a800m"                },...                {"type":"model","family":"embedding","repo_name":"granite-embedding-30m-english"                },...            ]        }    ]}

Simply add a new object under the items array for each new IBM Granite repo. you want added to the corresponding (GGUF) collection.

Currently, the only HF item type supported is model, and valid families (which have supported workflows) include: instruct (language), vision, guardian, and embedding.

Note: If you need to change the HF collection description, be aware that HF limits this string to 150 characters or less.

Release workflow files

Each release type has a corresponding (parent, master) workflow that configures and controls which model families (i.e., instruct (language), vision, guardian, and embedding) are executed for a given GitHub (tagged) release.

For example, a 3.2 versioned release uses files corresponding to one of the release types (i.e., Test, Preview, or Public).

What to update

The YAML GitHub workflow files have a few environment variables that may need to be updated to reflect which collections, models, and quantizations should be included in the next (tagged) GitHub release. Using the "Public" release YAML file as an example:

env:
  ENABLE_INSTRUCT_JOBS: false
  ENABLE_VISION_JOBS: false
  ENABLE_GUARDIAN_JOBS: true
  SOURCE_INSTRUCT_REPOS: "[
    'ibm-granite/granite-3.2-2b-instruct',
    ...
  ]"
  TARGET_INSTRUCT_QUANTIZATIONS: "[
    'Q4_K_M',
    ...
  ]"
  SOURCE_GUARDIAN_REPOS: "[
    'ibm-granite/granite-guardian-3.2-3b-a800m',
    ...
  ]"
  TARGET_GUARDIAN_QUANTIZATIONS: "[
    'Q4_K_M',
    ...
  ]"
  SOURCE_VISION_REPOS: "[
    'ibm-granite/granite-vision-3.2-2b',
    ...
  ]"
  TARGET_VISION_QUANTIZATIONS: "[
    'Q4_K_M',
    ...
  ]"
  ...
  COLLECTION_CONFIG: "resources/json/latest/hf_collection_mapping_gguf.json"

Note: The COLLECTION_CONFIG env. var. provides the relative path to the collection configuration file, which is located in the resources/json directory of the repository for the specific Granite release.

Updating Tools

llama.cpp

Clone and build the following llama.cpp binaries using these build/link flags:

Build intermediate CMake build files

The following command will create the proper CMake build files for generating code that will run within both macos and ubuntu container images. It also ensures that the llama.cpp libraries will not attempt to use GPUs, since the current GitHub virtual machines for both operating systems do not support them.

cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_METAL=OFF -DGGML_NATIVE_DEFAULT=OFF -DCMAKE_CROSSCOMPILING=TRUE -DGGML_NO_ACCELERATE=ON

Note: As flags have changed often, the following minimal set of flags MAY work but needs testing:

cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_NO_ACCELERATE=ON -DCMAKE_CROSSCOMPILING=TRUE
Build release binaries

Use this command to build all llama.cpp tool binaries into the build/bin directory:

cmake --build build --config Release
Copy built binaries and push to bin

Once built locally, copy the following files from your build/bin directory to this repository's bin directory (see the sketch after this list):

  • llama-cli
  • llama-quantize
  • llama-run
  • llama-server
  • llama-llava-cli (may no longer be needed/supported as of May 2025, as llava support has been rolled into the general libraries under multimodal support, aka mtmd)
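
A minimal sketch of the copy-and-push step (assuming a sibling llama.cpp clone was built as described above and you are at the root of this repository; paths are illustrative):

# Copy the release binaries from the llama.cpp build into this repo's bin directory
cp ../llama.cpp/build/bin/llama-cli ../llama.cpp/build/bin/llama-quantize \
   ../llama.cpp/build/bin/llama-run ../llama.cpp/build/bin/llama-server bin/

# Commit and push the updated binaries
git add bin/
git commit -m "Update llama.cpp binaries"
git push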

Triggering a release

This section contains the steps required to successfully "trigger" a release workflow for one or more supported Granite model families (i.e., instruct (language), vision, guardian, and embedding).

  1. Click the Releases link in the right column of the repo. home page, which should be the URL https://github.com/IBM/gguf/releases.

  2. Click the "Draft a new release" button near the top of the releases page.

  3. Click the "Choose a tag" drop-down menu and enter a tag name that starts with one of the following strings relative to which release type you want to "trigger":

    • Test:test-v3.3 (private HF org.)
    • Preview:preview-v3.3 (IBM Granite, private/hidden)
    • Public:v3.3 (IBM Granite)

    Treat these strings as "prefixes" to which you must append a unique build version. For example:

    • v3.3-rc-01 for a release candidate version 01 under the IBM Granite org. on Hugging Face Hub.
  4. Click "Create a new tag: on publish" near the bottom of the drop-down list.

  5. By convention, add the same "tag" name you created in the previous step into the "Release title" entry field.

  6. Adjust the "Set as a pre-release" and "Set as the latest release" checkboxes to your desired settings.

  7. Click the "Publish release" button.

At this point, you can observe the CI/CD workflows being run by the GitHub service "runners". Please note that during heavy traffic times, assignment of a "runner" (for each workflow job) may take longer.

To observe the CI/CD process in action, navigate to the repository's Actions page (https://github.com/IBM/gguf/actions) and look for the name of the tag you entered for the release (above) in the workflow run title.

Note

It is common to occasionally see some jobs "fail" due to network or scheduling timeout errors. In these cases, you can go into the failed workflow run and click on the "Re-run failed jobs" button to re-trigger the failed job(s).


[8]ページ先頭

©2009-2025 Movatter.jp