Arch-Function represents a comprehensive research and development initiative focused on creating state-of-the-art function calling capabilities in large language models. Our mission is to build AI systems that can seamlessly understand, interpret, and execute complex function calls with unprecedented accuracy and reliability.
This project encompasses multiple model families specifically engineered for function calling tasks, designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. The current release includes three major collections with models available in multiple sizes, with additional breakthrough models planned for future releases that will further advance the state-of-the-art in function calling capabilities.
- [2025-06]: 🏆🏆🏆 Arch-Agent collection released for advanced multi-turn, multi-step workflow automation, achieving Top-3 performance on the BFCL Leaderboard!
- [2025-02]: 🚀🚀🚀 Arch-Function-Chat collection launched with conversational function calling capabilities!
- [2024-12]: 🔥🔥🔥 Complete model suite updated with latest improvements across all sizes for the Arch-Function collection!
- [2024-09]: 🏆🏆🏆 Arch-Function collection officially launched on Hugging Face, achieving Top-7 performance on the BFCL Leaderboard!
Hugging Face Collection: Arch-Function
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-1.5B | 1.5B | • Compact size for edge deployment • Efficient function calling • Low resource requirements | 🤗 HuggingFace |
| Arch-Function-3B | 3B | • Balanced performance and efficiency • High accuracy function calling • Production-ready | 🤗 HuggingFace |
| Arch-Function-7B | 7B | • Maximum performance • Complex function handling • Enterprise-grade capabilities | 🤗 HuggingFace |
Hugging Face Collection: Arch-Function-Chat
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-Chat-1.5B | 1.5B | • Conversational function calling • Interactive agent capabilities • Lightweight deployment | 🤗 HuggingFace |
| Arch-Function-Chat-3B | 3B | • Advanced dialogue management • Context-aware function usage • Multi-turn conversations | 🤗 HuggingFace |
| Arch-Function-Chat-7B | 7B | • Sophisticated reasoning • Complex multi-step workflows • Premium chat experience | 🤗 HuggingFace |
Hugging Face Collection: Arch-Agent
| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Agent-1.5B | 1.5B | • Lightweight autonomous workflows • Edge-optimized performance • Low resource requirements | 🤗 HuggingFace |
| Arch-Agent-3B | 3B | • Balanced autonomous performance • Multi-step task execution • High accuracy workflows | 🤗 HuggingFace |
| Arch-Agent-7B | 7B | • Advanced autonomous behavior • Complex workflow orchestration • Maximum performance | 🤗 HuggingFace |
| Arch-Agent-32B | 32B | • Premium autonomous systems • Sophisticated multi-step workflows • Superior capabilities | 🤗 HuggingFace |
Here we provide a guide to fine-tune Arch-Function models with LLaMA-Factory:
- Create the environment following the instructions of LLaMA-Factory
- If you would like to use deepspeed and flash-attn, you can install the packages with the following commands:
```bash
pip install deepspeed
pip install flash-attn --no-build-isolation
```

LLaMA-Factory supports datasets in `alpaca` and `sharegpt` format. We recommend using the `sharegpt` format for function calling tasks. Below is an example of a dataset in the `sharegpt` format:
[{"conversations": [{"from":"human","value":"user instruction"},{"from":"function_call","value":"tool arguments"},{"from":"observation","value":"tool result"},{"from":"gpt","value":"model response"}],"system":"system prompt (optional)","tools":"tool description (optional)"}]Next, updatedata/dataset_info.json with the dataset description below:
"dataset_name": {"file_name":"data.json","formatting":"sharegpt","columns": {"messages":"conversations","system":"system","tools":"tools"}}
LLaMA-Factory provides diverse training examples for LLMs under `examples`. You can follow these examples to create a training script for your purpose. To kick off training, run the following command:
```bash
CUDA_VISIBLE_DEVICES={YOUR_DEVICE_IDS} llamafactory-cli train {PATH_TO_YOUR_TRAINING_SCRIPT}
```

Key considerations for fine-tuning:
- Prepare high-quality function calling examples with proper format
- Use gradient accumulation for larger effective batch sizes
- Monitor validation loss to prevent overfitting
- Consider using LoRA for parameter-efficient fine-tuning (a config sketch follows this list)
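As a starting point, a LoRA SFT config for LLaMA-Factory might look like the sketch below. This is an illustrative example only: the key names follow LLaMA-Factory's published example configs, but the model name, `template`, and hyperparameters are assumptions you should adjust to your checkpoint, dataset, and hardware. LLaMA-Factory's own examples use YAML; the sketch is shown as JSON to match the snippets above, so convert it to YAML if your version does not accept JSON configs.

```json
{
  "model_name_or_path": "katanemo/Arch-Function-3B",
  "stage": "sft",
  "do_train": true,
  "finetuning_type": "lora",
  "lora_target": "all",
  "dataset": "dataset_name",
  "template": "qwen",
  "cutoff_len": 4096,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "learning_rate": 0.0001,
  "num_train_epochs": 3.0,
  "bf16": true,
  "output_dir": "saves/arch-function-3b-lora"
}
```

Pass the path to this file as `{PATH_TO_YOUR_TRAINING_SCRIPT}` in the training command above.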
To run inference with Arch-Function models for function calling tasks, follow the steps below:
Arch-Function models have been integrated into the Hugging Face `transformers` library, and we advise you to install the latest version with the following command:
```bash
pip install "transformers>=4.51.0"
```

Below is a script demonstrating how to use Arch-Function models for function calling tasks.
You can specify the desired model name and create the model and corresponding tokenizer with the following script:
```python
import json
from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the desired model name here
model_name = "katanemo/Arch-Agent-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Our models perform best when using the recommended prompt format, which can be found in the corresponding model cards on Hugging Face. You can run the following script to format prompts:
```python
# Please use the recommended prompt for each model.
TASK_PROMPT = (
    "You are a helpful assistant designed to assist with the user query by making one or more function calls if needed."
    "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{tool_text}"
    "\n</tools>\n\nFor each function call, return a json object with function name and arguments within "
    """<tool_call></tool_call> XML tags:\n<tool_call>\n{{"name": <function-name>, """
    """"arguments": <args-json-object>}}\n</tool_call>"""
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "str",
                        "description": "The city and state, e.g. San Francisco, New York",
                    },
                    "unit": {
                        "type": "str",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return",
                    },
                },
                "required": ["location"],
            },
        },
    }
]


# Helper function to create the system prompt for our model
def format_prompt(tools: List[Dict[str, Any]]):
    tool_text = "\n".join(
        [json.dumps(tool["function"], ensure_ascii=False) for tool in tools]
    )
    return TASK_PROMPT.format(tool_text=tool_text)


system_prompt = format_prompt(tools)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the weather in Seattle?"},
]
```
Now, you can run the following script to do inference with Arch-Function models.
```python
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
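For function calling queries, the decoded `response` contains the predicted call wrapped in `<tool_call></tool_call>` XML tags, following the format specified in `TASK_PROMPT` above. The helper below is a minimal sketch of our own (not part of the model's API) for extracting the JSON payload:

```python
import json
import re


def parse_tool_calls(response: str):
    """Extract JSON tool calls wrapped in <tool_call></tool_call> tags."""
    calls = []
    for payload in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", response, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # leave malformed payloads for manual inspection
    return calls


# Example output: [{'name': 'get_weather', 'arguments': {'location': 'Seattle'}}]
print(parse_tool_calls(response))
```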
Inference optimization tips:
- Use appropriate temperature settings (0.0 - 0.1 for function calling)
- Use proper prompt formatting for best results
- Consider batching for multiple requests
- Use quantized models for faster inference (one loading option is sketched below)
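As one illustration (not part of the original guide), if `bitsandbytes` is installed, a quantized variant of the model can be loaded on the fly with transformers' `BitsAndBytesConfig`; note that aggressive quantization may trade some function calling accuracy for speed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "katanemo/Arch-Agent-7B"

# 4-bit NF4 quantization with bfloat16 compute (requires bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```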
Below we show how to deploy Arch-Function models using popular model hosting frameworks.
vLLM provides high-throughput serving with advanced optimizations. Follow the steps below to deploy Arch-Function models with vLLM:
```bash
# Install vLLM
pip install vllm

vllm serve katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tensor-parallel-size 1
```
To get responses from the vLLM server for function calling, first format prompts as shown here. Then, replace `messages` in the script below with the formatted prompts and run the script.
```python
from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024,
)

print(completion.choices[0].message.content)
```
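Concretely, replacing `messages` means including the system prompt built with `format_prompt(tools)` from the inference section above, for example (assuming `system_prompt` is already defined as shown there):

```python
messages = [
    {"role": "system", "content": system_prompt},  # built with format_prompt(tools)
    {"role": "user", "content": "Get the current temperature in San Francisco"},
]
```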
ollama provides easy local deployment with automatic model management. Below we provide scripts to show how to use ollama for deployment.
Please see ollama for installation instructions. If necessary, use the following command to install the ollama Python library:
```bash
pip install ollama
```
Specify your desired model name below and run the following command to start the ollama server. Note that ollama only supports the `gguf` format.
```bash
ollama run hf.co/katanemo/Arch-Agent-7B.gguf
```
Format prompts as shown here, then replace `formatted_prompt` in the script below and run the script to get responses.
```python
from ollama import Client

# Point to the local server. By default, it uses port 11434.
client = Client(host="http://127.0.0.1:11434")

# Send requests and get responses from the server
completion = client.chat(
    model="hf.co/katanemo/Arch-Agent-1.5B.gguf",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    options={"temperature": 0.01, "num_ctx": 1024},
)

print(completion.message.content)
```
SGLang offers structured generation capabilities with high performance. To use SGLang for deployment, follow the steps below.
```bash
# Install SGLang
pip install sglang[all]

python -m sglang.launch_server \
    --model-path katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tp 1 \
    --trust-remote-code
```
As SGLang provides OpenAI-compatible APIs, you can get responses from the server in the same way as with vLLM. First, format prompts as shown here. Then, replace `messages` in the script below with the formatted prompts and run the script.
```python
# Client code for the SGLang server (OpenAI-compatible)
from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024,
)

print(completion.choices[0].message.content)
```
The Arch-Function project is actively developing next-generation models that will:
- Further advance function calling accuracy beyond current SOTA
- Introduce novel architectures optimized for tool usage
- Expand to multimodal function calling capabilities
- Support more complex reasoning patterns in function selection
Please refer to the individual model pages on Hugging Face for specific licensing information.
We welcome contributions to improve the Arch-Function tutorials and documentation! You can help by:
- Fixing errors or improving existing tutorials
- Adding new deployment examples or use cases
- Suggesting additional framework integrations
- Improving documentation clarity
Feel free to open an issue or submit a pull request with your improvements.
For questions and support:
- Open an issue in this repository
- Visit our Hugging Face Hub
- Check the Katanemo organization on GitHub