ChatHuggingFace

This will help you get started with `langchain_huggingface` chat models. For detailed documentation of all `ChatHuggingFace` features and configurations head to the API reference. For a list of models supported by Hugging Face check out this page.
Overview
Integration details
| Class | Package | Local | Serializable | JS support |
| --- | --- | --- | --- | --- |
| ChatHuggingFace | langchain-huggingface | ✅ | beta | ❌ |
Model features
| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
Setup
To access Hugging Face models you'll need to create a Hugging Face account, get an API key, and install the `langchain-huggingface` integration package.
Credentials
Generate a Hugging Face Access Token and store it as an environment variable: `HUGGINGFACEHUB_API_TOKEN`.
```python
import getpass
import os

if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")
```
Installation
```python
%pip install --upgrade --quiet langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate
```
Note: you may need to restart the kernel to use updated packages.
Instantiation
You can instantiate a `ChatHuggingFace` model in two different ways, either from a `HuggingFaceEndpoint` or from a `HuggingFacePipeline`.
HuggingFaceEndpoint
```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
    provider="auto",  # let Hugging Face choose the best provider for you
)

chat_model = ChatHuggingFace(llm=llm)
```
```
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /Users/isaachershenson/.cache/huggingface/token
Login successful
```
Now let's take advantage of Inference Providers to run the model on specific third-party providers:
```python
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="hyperbolic",  # set your provider here
    # provider="nebius",
    # provider="together",
)

chat_model = ChatHuggingFace(llm=llm)
```
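If you're unsure which providers serve a given model, one way to check is to query the Hub. This is a sketch under the assumption that your `huggingface_hub` version supports the `inferenceProviderMapping` value for the `expand` parameter; the exact attribute name may differ across versions:

```python
from huggingface_hub import model_info

# Assumption: recent huggingface_hub releases expose the provider mapping
# through `expand`; check your installed version if this raises an error.
info = model_info("deepseek-ai/DeepSeek-R1-0528", expand=["inferenceProviderMapping"])
print(info.inference_provider_mapping)
```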
HuggingFacePipeline
```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)

chat_model = ChatHuggingFace(llm=llm)
```
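Loading the model this way runs it on CPU by default. As a minimal sketch (assuming a CUDA device is available), you can place the pipeline on a GPU with the `device` argument:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

# Sketch: run the pipeline on the first CUDA device (device=0).
# Alternatively, with accelerate installed, device_map="auto" can shard
# large models across the available devices.
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    device=0,
    pipeline_kwargs=dict(max_new_tokens=512),
)

chat_model = ChatHuggingFace(llm=gpu_llm)
```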
Instantiating with Quantization

To run a quantized version of your model, you can specify a `bitsandbytes` quantization config as follows:
```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)
```
and pass it to the `HuggingFacePipeline` as a part of its `model_kwargs`:
```python
llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
        return_full_text=False,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

chat_model = ChatHuggingFace(llm=llm)
```
Invocation
```python
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

ai_msg = chat_model.invoke(messages)
print(ai_msg.content)
```
```
According to the popular phrase and hypothetical scenario, when an unstoppable force meets an immovable object, a paradoxical situation arises as both forces are seemingly contradictory. On one hand, an unstoppable force is an entity that cannot be stopped or prevented from moving forward, while on the other hand, an immovable object is something that cannot be moved or displaced from its position.

In this scenario, it is un
```
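Because `ChatHuggingFace` implements the standard Runnable interface, it also supports tool calling with models and providers that expose it. A minimal sketch, reusing the `chat_model` from above; the `GetWeather` schema is a made-up example, not from the original page:

```python
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location."""

    location: str = Field(..., description="The city to look up, e.g. Paris")


# bind_tools attaches the tool schema to requests; whether the model
# actually emits tool calls depends on the underlying model and provider.
chat_with_tools = chat_model.bind_tools([GetWeather])
ai_msg = chat_with_tools.invoke("What is the weather like in Paris?")
print(ai_msg.tool_calls)
```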
API reference

For detailed documentation of all `ChatHuggingFace` features and configurations head to the API reference: https://python.langchain.com/api_reference/huggingface/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html
Related
- Chat model conceptual guide
- Chat model how-to guides