DeepInfra
DeepInfra is a serverless inference-as-a-service platform that provides access to a variety of LLMs and embedding models. This notebook goes over how to use LangChain with DeepInfra for chat models.
Set the Environment API Key
Make sure to get your API key from DeepInfra. You have to log in and get a new token.
You are given 1 hour of free serverless GPU compute to test different models (see here). You can print your token with deepctl auth token
# get a new token: https://deepinfra.com/login?from=%2Fdash
import os
from getpass import getpass

from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage

DEEPINFRA_API_TOKEN = getpass()

# or pass deepinfra_api_token parameter to the ChatDeepInfra constructor
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN

chat = ChatDeepInfra(model="meta-llama/Llama-2-7b-chat-hf")

messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat.invoke(messages)
API Reference: ChatDeepInfra | HumanMessage
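The token setup above prompts on every run. As a minimal sketch, the lookup can be wrapped so that an already exported DEEPINFRA_API_TOKEN is reused and the prompt appears only when the variable is unset. The helper name resolve_token and its env and prompt parameters are assumptions made for illustration; they are not part of LangChain or DeepInfra.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return DEEPINFRA_API_TOKEN from env, prompting only if it is unset.

    Hypothetical helper: `env` and `prompt` are injectable so the function
    can be exercised without touching the real environment or a terminal.
    """
    token = env.get("DEEPINFRA_API_TOKEN", "").strip()
    if not token:
        # Fall back to an interactive prompt, as in the notebook cell above.
        token = prompt("DeepInfra API token: ").strip()
    if not token:
        raise ValueError("No DeepInfra API token provided")
    return token
```

In a notebook this could be used as os.environ["DEEPINFRA_API_TOKEN"] = resolve_token() before constructing ChatDeepInfra.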
ChatDeepInfra also supports async and streaming functionality:
from langchain_core.callbacks import StreamingStdOutCallbackHandler
API Reference: StreamingStdOutCallbackHandler
await chat.agenerate([messages])
chat = ChatDeepInfra(
    streaming=True,
    verbose=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chat.invoke(messages)
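The bare await chat.agenerate([messages]) shown above relies on the notebook's running event loop. In a plain script the same call would need to live inside a coroutine driven by asyncio.run. The sketch below shows that shape under stated assumptions: generate_once is a hypothetical wrapper, and StubChat is a stand-in used only so the example runs without network access or an API token.

```python
import asyncio
from typing import Any, List


async def generate_once(chat: Any, messages: List[Any]) -> Any:
    # agenerate takes a list of message lists, one inner list per prompt.
    return await chat.agenerate([messages])


class StubChat:
    """Hypothetical stand-in for a chat model; not a LangChain class."""

    async def agenerate(self, batches: List[List[Any]]) -> str:
        await asyncio.sleep(0)  # yield to the event loop once
        return f"received {len(batches)} batch(es)"


# In a real script, pass the ChatDeepInfra instance instead of StubChat().
result = asyncio.run(generate_once(StubChat(), ["hello"]))
print(result)
```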
Tool Calling
DeepInfra currently supports tool calling only through invoke and async invoke (ainvoke).
For a complete list of models that support tool calling, please refer to our tool calling documentation.
import asyncio
from dotenv import find_dotenv, load_dotenv
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from pydantic import BaseModel

model_name = "meta-llama/Meta-Llama-3-70B-Instruct"

_ = load_dotenv(find_dotenv())


# LangChain tool
@tool
def foo(something):
    """
    Called when foo
    """
    pass


# Pydantic class
class Bar(BaseModel):
    """
    Called when Bar
    """

    pass


llm = ChatDeepInfra(model=model_name)
tools = [foo, Bar]
llm_with_tools = llm.bind_tools(tools)
messages = [
    HumanMessage("Foo and bar, please."),
]

response = llm_with_tools.invoke(messages)
print(response.tool_calls)
# [{'name': 'foo', 'args': {'something': None}, 'id': 'call_Mi4N4wAtW89OlbizFE1aDxDj'}, {'name': 'Bar', 'args': {}, 'id': 'call_daiE0mW454j2O1KVbmET4s2r'}]


async def call_ainvoke():
    result = await llm_with_tools.ainvoke(messages)
    print(result.tool_calls)
# Async call
asyncio.run(call_ainvoke())
# [{'name': 'foo', 'args': {'something': None}, 'id': 'call_ZH7FetmgSot4LHcMU6CEb8tI'}, {'name': 'Bar', 'args': {}, 'id': 'call_2MQhDifAJVoijZEvH8PeFSVB'}]
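The tool_calls list printed above only describes which tools the model asked for; running them is left to the caller. The following is a minimal sketch of one way to do that, assuming each entry is a dict with name, args, and id keys as in the printed output. The REGISTRY mapping, the dispatch function, and the foo_impl and bar_impl handlers are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, List


def foo_impl(something: Any = None) -> str:
    # Hypothetical stand-in for whatever the `foo` tool should do.
    return f"foo called with {something!r}"


def bar_impl() -> str:
    # Hypothetical stand-in for handling a `Bar` call.
    return "Bar called"


# Hypothetical registry mapping tool names to plain Python callables.
REGISTRY: Dict[str, Callable[..., str]] = {"foo": foo_impl, "Bar": bar_impl}


def dispatch(tool_calls: List[Dict[str, Any]]) -> Dict[str, str]:
    """Run each requested tool and key its result by the call id."""
    results: Dict[str, str] = {}
    for call in tool_calls:
        handler = REGISTRY.get(call["name"])
        if handler is None:
            # Record unknown tool names instead of raising.
            results[call["id"]] = f"unknown tool: {call['name']}"
            continue
        results[call["id"]] = handler(**call["args"])
    return results


# Sample data copied from the printed output above.
sample_calls = [
    {"name": "foo", "args": {"something": None}, "id": "call_Mi4N4wAtW89OlbizFE1aDxDj"},
    {"name": "Bar", "args": {}, "id": "call_daiE0mW454j2O1KVbmET4s2r"},
]
results = dispatch(sample_calls)
print(results)
```

Keying results by call id keeps each output tied to the request that produced it, which matters when the model asks for several tools in one response.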
Related
- Chat model conceptual guide
- Chat model how-to guides