Arch-Function represents a comprehensive research and development initiative focused on creating state-of-the-art function calling capabilities in large language models. Our mission is to build AI systems that can seamlessly understand, interpret, and execute complex function calls with unprecedented accuracy and reliability.
This project encompasses multiple model families specifically engineered for function calling tasks, designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. The current release includes three major collections with models available in multiple sizes, with additional breakthrough models planned for future releases that will further advance the state-of-the-art in function calling capabilities.
- [2025-06]: 🏆🏆🏆 Arch-Agent collection released for advanced multi-turn, multi-step workflow automation, achieving Top-3 performance on the BFCL Leaderboard!
- [2025-02]: 🚀🚀🚀 Arch-Function-Chat collection launched with conversational function calling capabilities!
- [2024-12]: 🔥🔥🔥 Complete model suite updated with latest improvements across all sizes for the Arch-Function collection!
- [2024-09]: 🏆🏆🏆 Arch-Function collection officially launched on Hugging Face, achieving Top-7 performance on the BFCL Leaderboard!
Hugging Face Collection: Arch-Function
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-1.5B | 1.5B | • Compact size for edge deployment • Efficient function calling • Low resource requirements | 🤗 HuggingFace |
| Arch-Function-3B | 3B | • Balanced performance and efficiency • High accuracy function calling • Production-ready | 🤗 HuggingFace |
| Arch-Function-7B | 7B | • Maximum performance • Complex function handling • Enterprise-grade capabilities | 🤗 HuggingFace |
Hugging Face Collection: Arch-Function-Chat
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-Chat-1.5B | 1.5B | • Conversational function calling • Interactive agent capabilities • Lightweight deployment | 🤗 HuggingFace |
| Arch-Function-Chat-3B | 3B | • Advanced dialogue management • Context-aware function usage • Multi-turn conversations | 🤗 HuggingFace |
| Arch-Function-Chat-7B | 7B | • Sophisticated reasoning • Complex multi-step workflows • Premium chat experience | 🤗 HuggingFace |
Hugging Face Collection: Arch-Agent
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Agent-1.5B | 1.5B | • Lightweight autonomous workflows • Edge-optimized performance • Low resource requirements | 🤗 HuggingFace |
| Arch-Agent-3B | 3B | • Balanced autonomous performance • Multi-step task execution • High accuracy workflows | 🤗 HuggingFace |
| Arch-Agent-7B | 7B | • Advanced autonomous behavior • Complex workflow orchestration • Maximum performance | 🤗 HuggingFace |
| Arch-Agent-32B | 32B | • Premium autonomous systems • Sophisticated multi-step workflows • Superior capabilities | 🤗 HuggingFace |
Here we provide a guide to fine-tune Arch-Function models with LLaMA-Factory:
- Create the environment following the instructions of LLaMA-Factory
- If you would like to use deepspeed and flash-attn, you can install the packages with the following commands:
```bash
pip install deepspeed
pip install flash-attn --no-build-isolation
```

LLaMA-Factory supports datasets in `alpaca` and `sharegpt` format. We recommend using the `sharegpt` format for function calling tasks. Below is an example of a dataset in the `sharegpt` format:
[{"conversations": [{"from":"human","value":"user instruction"},{"from":"function_call","value":"tool arguments"},{"from":"observation","value":"tool result"},{"from":"gpt","value":"model response"}],"system":"system prompt (optional)","tools":"tool description (optional)"}]Next, updatedata/dataset_info.json with the dataset description below:
"dataset_name": {"file_name":"data.json","formatting":"sharegpt","columns": {"messages":"conversations","system":"system","tools":"tools"}}
LLaMA-Factory provides diverse training examples for LLMs under `examples`. You can follow these examples to create a training script for your purpose. To kick off training, run the following command:
```bash
CUDA_VISIBLE_DEVICES={YOUR_DEVICE_IDS} llamafactory-cli train {PATH_TO_YOUR_TRAINING_SCRIPT}
```

Key considerations for fine-tuning:
- Prepare high-quality function calling examples with proper format
- Use gradient accumulation for larger effective batch sizes
- Monitor validation loss to prevent overfitting
- Consider using LoRA for parameter-efficient fine-tuning (a config sketch follows this list)
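As a starting point, a LoRA SFT config for LLaMA-Factory might look like the sketch below. This is an illustrative example only: the key names follow LLaMA-Factory's published example configs, but the model name, `template`, and hyperparameters are assumptions you should adjust to your checkpoint, dataset, and hardware. LLaMA-Factory's own examples use YAML; the sketch is shown as JSON to match the snippets above, so convert it to YAML if your version does not accept JSON configs.

```json
{
  "model_name_or_path": "katanemo/Arch-Function-3B",
  "stage": "sft",
  "do_train": true,
  "finetuning_type": "lora",
  "lora_target": "all",
  "dataset": "dataset_name",
  "template": "qwen",
  "cutoff_len": 4096,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "learning_rate": 0.0001,
  "num_train_epochs": 3.0,
  "bf16": true,
  "output_dir": "saves/arch-function-3b-lora"
}
```

Pass the path to this file as `{PATH_TO_YOUR_TRAINING_SCRIPT}` in the training command above.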
To run inference with Arch-Function models for function calling tasks, follow the steps below:
Arch-Function models have been integrated into the Hugging Face `transformers` library, and we advise you to install the latest version with the following command:
```bash
pip install "transformers>=4.51.0"
```

Below is a script demonstrating how to use Arch-Function models for function calling tasks.
You can specify the desired model name and create the model and corresponding tokenizer with the following script:
```python
import json
from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the desired model name here
model_name = "katanemo/Arch-Agent-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Our models perform best when using the recommended prompt format, which can be found in the corresponding model cards on Hugging Face. You can run the following script to format prompts:
```python
# Please use the recommended prompt for each model.
TASK_PROMPT = (
    "You are a helpful assistant designed to assist with the user query by making one or more function calls if needed."
    "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{tool_text}"
    "\n</tools>\n\nFor each function call, return a json object with function name and arguments within "
    """<tool_call></tool_call> XML tags:\n<tool_call>\n{{"name": <function-name>, """
    """"arguments": <args-json-object>}}\n</tool_call>"""
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "str",
                        "description": "The city and state, e.g. San Francisco, New York",
                    },
                    "unit": {
                        "type": "str",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return",
                    },
                },
                "required": ["location"],
            },
        },
    }
]


# Helper function to create the system prompt for our model
def format_prompt(tools: List[Dict[str, Any]]):
    tool_text = "\n".join(
        [json.dumps(tool["function"], ensure_ascii=False) for tool in tools]
    )
    return TASK_PROMPT.format(tool_text=tool_text)


system_prompt = format_prompt(tools)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the weather in Seattle?"},
]
```
Now, you can run the following script to do inference with Arch-Function models.
```python
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
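For function calling queries, the decoded `response` contains the predicted call wrapped in `<tool_call></tool_call>` XML tags, following the format specified in `TASK_PROMPT` above. The helper below is a minimal sketch of our own (not part of the model's API) for extracting the JSON payload:

```python
import json
import re


def parse_tool_calls(response: str):
    """Extract JSON tool calls wrapped in <tool_call></tool_call> tags."""
    calls = []
    for payload in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", response, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # leave malformed payloads for manual inspection
    return calls


# Example output: [{'name': 'get_weather', 'arguments': {'location': 'Seattle'}}]
print(parse_tool_calls(response))
```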
Inference optimization tips:
- Use appropriate temperature settings (0.0 - 0.1 for function calling)
- Use proper prompt formatting for best results
- Consider batching for multiple requests
- Use quantized models for faster inference (one loading option is sketched below)
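As one illustration (not part of the original guide), if `bitsandbytes` is installed, a quantized variant of the model can be loaded on the fly with transformers' `BitsAndBytesConfig`; note that aggressive quantization may trade some function calling accuracy for speed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "katanemo/Arch-Agent-7B"

# 4-bit NF4 quantization with bfloat16 compute (requires bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```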
Below we show how to deploy Arch-Function models using popular model hosting frameworks.
vLLM provides high-throughput serving with advanced optimizations. Follow the steps below to deploy Arch-Function models with vLLM:
```bash
# Install vLLM
pip install vllm

vllm serve katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tensor-parallel-size 1
```
To get responses from the vLLM server for function calling, first format prompts as shown here. Then, replace `messages` in the script below with the formatted prompts and run the script.
```python
from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024,
)

print(completion.choices[0].message.content)
```
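Concretely, replacing `messages` means including the system prompt built with `format_prompt(tools)` from the inference section above, for example (assuming `system_prompt` is already defined as shown there):

```python
messages = [
    {"role": "system", "content": system_prompt},  # built with format_prompt(tools)
    {"role": "user", "content": "Get the current temperature in San Francisco"},
]
```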
ollama provides easy local deployment with automatic model management. Below we provide scripts to show how to use ollama for deployment.
Please see ollama for installation instructions. If necessary, use the following command to install the ollama Python library:
```bash
pip install ollama
```
Specify your desired model name below and run the following command to start the ollama server. Note that ollama only supports the `gguf` format.
```bash
ollama run hf.co/katanemo/Arch-Agent-7B.gguf
```
Format prompts as shown here, then replace `formatted_prompt` in the script below and run the script to get responses.
```python
from ollama import Client

# Point to the local server. By default, it uses port 11434.
client = Client(host="http://127.0.0.1:11434")

# Send requests and get responses from the server
completion = client.chat(
    model="hf.co/katanemo/Arch-Agent-1.5B.gguf",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    options={"temperature": 0.01, "num_ctx": 1024},
)

print(completion.message.content)
```
SGLang offers structured generation capabilities with high performance. To use SGLang for deployment, follow the steps below.
```bash
# Install SGLang
pip install sglang[all]

python -m sglang.launch_server \
    --model-path katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tp 1 \
    --trust-remote-code
```
As SGLang provides OpenAI-compatible APIs, you can get responses from the server in the same way as with vLLM. First, format prompts as shown here. Then, replace `messages` in the script below with the formatted prompts and run the script.
```python
# Client code for the SGLang server (OpenAI-compatible)
from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024,
)

print(completion.choices[0].message.content)
```
The Arch-Function project is actively developing next-generation models that will:
- Further advance function calling accuracy beyond current SOTA
- Introduce novel architectures optimized for tool usage
- Expand to multimodal function calling capabilities
- Support more complex reasoning patterns in function selection
Please refer to the individual model pages on Hugging Face for specific licensing information.
We welcome contributions to improve the Arch-Function tutorials and documentation! You can help by:
- Fixing errors or improving existing tutorials
- Adding new deployment examples or use cases
- Suggesting additional framework integrations
- Improving documentation clarity
Feel free to open an issue or submit a pull request with your improvements.
For questions and support:
- Open an issue in this repository
- Visit our Hugging Face Hub
- Check the Katanemo organization on GitHub