IBM/gguf

IBM GGUF-encoded AI models and conversion scripts

This repository contains the canonical information to use when converting IBM AI models to the GGUF format. It includes conversion scripts and testing requirements. Aspirationally, this repo will include an automated CI/CD process to convert, test, and deploy models to the official IBM GGUF collection on Hugging Face.


Target IBM Models

Only a select set of IBM models will be converted to GGUF format based upon the following criteria:

  • The IBM GGUF model needs to be referenced by an AI provider service (i.e., a local AI provider service) as a "supported" model.
  • The GGUF model is referenced by a public blog, tutorial, demo, or other public use case.

In addition, models should be canonically hosted in an official IBM repository; currently, this means the IBM Granite collection on Hugging Face (see below).

The following tables list the current target set of IBM models along with commentary on the rationale for inclusion:

IBM Granite Collection

See: https://huggingface.co/ibm-granite

Granite 3.0 Language Models

These models are found in the Granite 3.0 Language Models collection. They are designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.

| Name | HF, llama.cpp (GGUF) Architecture | Rationale | Details | References |
|------|-----------------------------------|-----------|---------|------------|
| granite-3.0-8b-instruct | GraniteForCausalLM, llama (gpt2) | Consensus default | Models of ~8B size appear as defaults for most local AI providers. | Ollama |
| granite-3.0-2b-instruct | GraniteForCausalLM, llama (gpt2) | Consensus default | Models of ~2B or 3B size are offered as built-in alternatives for most local AI providers. | Ollama granite-code:3b. Note: the HF model is named 2B, but its actual size is 3B (as shown in Ollama). |
| granite-3.0-3b-a800m-base | GraniteMoeForCausalLM, granitemoe (gpt2) | Small form-factor | Model highlights Granite's capabilities when run on small form-factor CPUs/memory. | Ollama |

where:

  • Consensus default:
    • Comparable size to default models already referenced by multiple downstream providers and frameworks.
    • Size ideal for local CPU/GPU serving.
  • Small form-factor:
    • Model size intended for running locally on small form-factor devices such as watches and mobile devices.

RAG LoRA support

| Name | Architecture | Rationale | Details |
|------|--------------|-----------|---------|
| granite3-dense:8b (Ollama) | (GGUF) | Default (quantized) model used with RAG LoRA | granite3_model |
| granite3-rag:8b (Ollama) | (GGUF) | TBD: why can't we build this whenever the corresponding dense model is updated? | granite3_rag_model |

See the granite3-dense Ollama model entry.


GGUF Format

The GGUF format is defined in the GGUF specification. The specification describes the structure of the file, how it is encoded, and what information is included.
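
To sanity-check what a conversion actually wrote, a GGUF file's header and metadata can be dumped. Below is a minimal sketch, assuming the gguf Python package (published from llama.cpp's gguf-py directory) is installed and that ./model.gguf is a converted file (an illustrative name):

```bash
pip install gguf
# prints the GGUF version, tensor count, and metadata key/value pairs
gguf-dump ./model.gguf
```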

GGUF Conversion

Currently, the primary means to convert from HF SafeTensors format to GGUF will be the canonical llama.cpp tool convert-hf-to-gguf.py.

For example:

```bash
python llama.cpp/convert-hf-to-gguf.py ./models/modelrepo --outfile output_file.gguf --outtype q8_0
```
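
An alternative two-step flow, sketched below under the assumption of a built llama.cpp checkout (the binary path and filenames are illustrative), converts once at higher precision and then derives quantized variants with the llama-quantize tool:

```bash
# convert once to f16, then quantize to the desired type
python llama.cpp/convert-hf-to-gguf.py ./models/modelrepo --outfile model-f16.gguf --outtype f16
./llama.cpp/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

This keeps a single high-precision master file from which multiple quantizations can be produced without re-running the HF conversion.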

Alternatives (TODO: investigate)

Ollama CLI

Note: The Ollama CLI tool only supports a subset of quantizations:

  • (rounding): q4_0, q4_1, q5_0, q5_1, q8_0
  • k-means: q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K
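
As a sketch of the Ollama CLI flow (the Modelfile contents and model tag below are illustrative assumptions, not published IBM tags), quantization is requested at create time:

```bash
# Modelfile (hypothetical) contains a single line pointing at an F16 GGUF:
#   FROM ./granite-3.0-8b-instruct-f16.gguf
ollama create granite3-dense:8b-q4_K_M --quantize q4_K_M -f Modelfile
```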

Hugging Face-endorsed tool "ggml-org/gguf-my-repo"

Note:

  • Is access control to the source repo required? (TBD)
  • Similar to the Ollama CLI, the web UI supports only a subset of quantizations.

GGUF Verification Testing

As a baseline, each converted model MUST successfully run in the following providers:

llama.cpp testing

llama.cpp - As the core implementation of the GGUF format, which is either a direct dependency of or forked into nearly all downstream GGUF providers, testing here is essential. Specifically, testing verifies that the model can be hosted using the llama-server service. See the specific section on llama.cpp for more details on which version is considered "stable" and how the same version will be used in both conversion and testing.
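
A minimal smoke test might start llama-server on a converted file and issue one request against its OpenAI-compatible endpoint (the GGUF filename below is an assumption for illustration):

```bash
# host the converted model locally
llama-server -m ./granite-3.0-8b-instruct-q4_k_m.gguf --port 8080 &

# once the server is up, exercise the chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello."}],"max_tokens":32}'
```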

Ollama testing

Ollama - As a key model service provider supported by higher-level frameworks and platforms (e.g., AnythingLLM, LM Studio, etc.), testing the ability to pull and run the model is essential.
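
A minimal verification pass might look like the following sketch, using the granite3-dense:8b tag from the survey below; a newly converted model would substitute its own tag or an hf.co reference:

```bash
ollama pull granite3-dense:8b
ollama run granite3-dense:8b "Write one sentence about GGUF."
```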

Notes

  • The official Ollama Docker image ollama/ollama is available on Docker Hub.
  • Ollama does not yet support sharded GGUF models (see the merge sketch below):
    • "Ollama does not support this yet. Follow this issue for more info: ollama/ollama#5245"
    • e.g., ollama pull hf.co/Qwen/Qwen2.5-14B-Instruct-GGUF
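
As a hedged workaround sketch, shards can first be merged into a single file with llama.cpp's gguf-split tool, and the merged file imported instead (the binary path and shard names below are illustrative):

```bash
# pass the first shard; the tool locates the remaining shards automatically
./llama.cpp/llama-gguf-split --merge model-00001-of-00004.gguf model-merged.gguf
```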


Survey of Ollama "built-in" models

registry: registry.ollama.ai
| name (basename, finetune) | local name | arch. (ggml model) | Size (params) | Quant. | Ctx. (embed) Len. |
|---|---|---|---|---|---|
| gemma-2-9b-it (none, none) | gemma2:latest | gemma2 (llama) | (9B) | Q4_0 (2) | 8192 (3584) |
| (Meta) Llama 3.2 3B Instruct (Llama-3.2, Instruct) | llama3.2:latest | llama (gpt2) | 3B | Q4_K_M (15) | 131072 (3072) |
| Meta Llama 3.1 8B Instruct (Meta-Llama-3.1, Instruct) | llama3.1:latest | llama (gpt2) | 8B | Q4_K_M (15) | 131072 (4096) |
| Mistral-7B-Instruct-v0.3 (N/A) | mistral:latest | llama (llama) | 7B | Q4_0 (2) | 32768 (4096) |
| Qwen2.5 7B Instruct (Qwen2.5, Instruct) | qwen2.5:latest | qwen2 (gpt2) | 7B | Q4_K_M (15) | 32768 (3584) |

| Version (HF collection) | name (basename, finetune) | local name | arch. (ggml model) | Size (params) | Quant. | Ctx. (embed) Len. |
|---|---|---|---|---|---|---|
| N/A (code) | Granite 8b Code Instruct 128k (granite, code-instruct-128k) | granite-code:8b | llama (gpt2) | 8B | Q4_0 (2) | 128000 (4096) |
| N/A (code) | Granite 20b Code Instruct 8k (granite, code-instruct-8k) | granite-code:20b | starcoder (gpt2) | 20B | Q4_0 (2) | 8192 (6144) |
| 3.0 (3.0 language) | Granite 3.0 1b A400M Instruct (granite-3.0, instruct) | granite3-moe:1b | granitemoe (gpt2) | 1B-a400M | Q4_K_M (15) | 4096 (1024) |
| 3.0 (3.0 language) | Granite 3.0 3b A800M Instruct (granite-3.0, instruct) | granite3-moe:3b | granitemoe (gpt2) | 3B-a800M | Q4_K_M (15) | 4096 (1536) |
| 3.0 (guardian) | Granite Guardian 3.0 8b (granite-guardian-3.0, none) | granite3-guardian:8b | granite (gpt2) | 8B | Q5_K_M (17) | 8192 (4096) |
| 3.0 (3.0 language) | Granite 3.0 8b Instruct (granite-3.0, instruct) | granite3-dense:8b | granite (gpt2) | 8B | Q4_K_M (15) | 4096 (4096) |
| 3.0 (???) | Granite 3.0 8b Instruct (granite-3.0, instruct) | granite3-dense:8b-instruct-fp16 | granite (gpt2) | 8B | F16 (1) | 4096 (4096) |

Notes

  • latest is relative to Ollama's (proprietary) publishing and is not reflected in the GGUF header.
  • basename and finetune may differ depending on the person who created the GGUF, even for models from the same company.
    • e.g., the IBM Granite model "Granite 8b Code Instruct 128k" has a finetune name that does not match other IBM models (i.e., code-instruct-128k).
  • The context buffer size is not mentioned in finetune for Ollama granite-code models which have 8k buffers, but is listed for 128k buffers.
  • The Ollama model instructlab/granite-7b-lab is identical to the granite-7b model.
  • The IQ2_XS quantization may have issues on Apple silicon.

registry: huggingface.co (hf.co)

Note: "registries" are created using the domain name of the model repo referenced during a pull or run command.

| name (basename, finetune) | local name | arch. (ggml model) | Size (params) | Quant. | Ctx. (embed) Len. |
|---|---|---|---|---|---|
| Qwen2.5.1 Coder 7B Instruct (Qwen2.5.1-Coder, Instruct) | bartowski/Qwen2.5.1-Coder-7B-Instruct-GGUF:latest | qwen2 (gpt2) | 7B | Q4_K_M (15) | 32768 (3584) |
| liuhaotian (i.e., Llama-3.2-1B-Instruct) (none, none) | hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest | llama (llama) | (1B) | Q4_0 (2) | 32768 (4096) |
| Models (i.e., Qwen2.5 14B) (none, none) | hf.co/QuantFactory/Qwen2.5-Coder-14B-GGUF:latest | llama (gpt2) | 15B | Q2_K (10) | 32768 (5130) |

Notes

  • Downstream fine-tunings or quantizations lose identity (in the GGUF file), drop (pedigree-related) fields, or create new ones:
    • general.name, general.basename, general.finetune, etc.
      • e.g., general.name=liuhaotian is the name of the person who created the downstream GGUF (not the actual model name), and the file had no basename or finetune.
    • .size_label did not match the model's declared size.
  • When multiple GGUF models are in a repo, Ollama "grabs" the first one (alphanumerically).
