pnixnoel/cua-vlm-llm-switch
c/ua ("koo-ah") is Docker for Computer-Use Agents - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
Demo video: vibe-photoshop.mp4
Check out more demos of the Computer-Use Agent in action:
- MCP Server: Work with Claude Desktop and Tableau (mcp-claude-tableau.mp4)
- AI-Gradio: Multi-app workflow with browser, VS Code and terminal (ai-gradio-clone.mp4)
- Notebook: Fix GitHub issue in Cursor (notebook-github-cursor.mp4)
Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.
Docker-based guided install for quick use
macOS/Linux/Windows (via WSL):
    # Requires Docker
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
This script will guide you through setup using Docker containers and launch the Computer-Use Agent UI.
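The guided install above relies on Docker and curl being available on the host. As a minimal sketch (the helper name `missing_tools` is hypothetical and not part of C/ua), the following checks for those executables on the PATH before you run the script:

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # "docker" and "curl" are the two commands the install line uses.
    missing = missing_tools(["docker", "curl"])
    if missing:
        print("Install these before running the script:", ", ".join(missing))
    else:
        print("docker and curl were found on the PATH.")
```

This only confirms the executables exist; it does not verify that the Docker daemon is running.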
Option 2: Dev Container
Best for contributors and development
This repository includes a Dev Container configuration that simplifies setup to a few steps:
- Install the Dev Containers extension (VS Code or WindSurf)
- Open the repository in the Dev Container:
  - Press Ctrl+Shift+P (or ⌘+Shift+P on macOS)
  - Select Dev Containers: Clone Repository in Container Volume... and paste the repository URL: https://github.com/trycua/cua.git (if not cloned) or Dev Containers: Open Folder in Container... (if git cloned).
  Note: On WindSurf, the post install hook might not run automatically. If so, run /bin/bash .devcontainer/post-install.sh manually.
- Open the VS Code workspace: Once post-install.sh is done running, open the .vscode/py.code-workspace workspace.
- Run the Agent UI example: Launch it to start the Gradio UI. If prompted to install debugpy (Python Debugger) to enable remote debugging, select 'Yes' to proceed.
- Access the Gradio UI: The Gradio UI will be available at http://localhost:7860 and will automatically forward to your host machine.
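If the page at http://localhost:7860 does not load, it can help to confirm that the forwarded port is accepting connections at all. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper name, and 7860 is the port mentioned above:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open("localhost", 7860):
        print("Something is listening on port 7860.")
    else:
        print("Nothing is listening on port 7860 yet.")
```

A successful connection only shows that some process is listening on the port, not that it is the Gradio UI.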
Direct Python package installation
    # conda create -yn cua python==3.12
    pip install -U "cua-computer[all]" "cua-agent[all]"
    python -m agent.ui  # Start the agent UI

Or check out the Usage Guide to learn how to use our Python SDK in your own code.
Supported Agent Loops
- UITARS-1.5 - Run locally on Apple Silicon with MLX, or use cloud providers
- OpenAI CUA - Use OpenAI's Computer-Use Preview model
- Anthropic CUA - Use Anthropic's Computer-Use capabilities
- OmniParser-v2.0 - Control UI with Set-of-Marks prompting using any vision model
For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the Compatibility Matrix.
Follow these steps to use C/ua in your own Python code. See the Developer Guide for building from source.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.
lume pull macos-sequoia-cua:latest
The macOS CUA image contains the default Mac apps and the Computer Server for easy automation.
pip install "cua-computer[all]" "cua-agent[all]"
    import asyncio

    from computer import Computer
    from agent import ComputerAgent, LLM

    async def main():
        # Start a local macOS VM
        computer = Computer(os_type="macos")
        await computer.run()

        # Or with C/ua Cloud Container
        computer = Computer(
            os_type="linux",
            api_key="your_cua_api_key_here",
            name="your_container_name_here"
        )

        # Example: Direct control of a macOS VM with Computer
        computer.interface.delay = 0.1  # Wait 0.1 seconds between kb/m actions
        await computer.interface.left_click(100, 200)
        await computer.interface.type_text("Hello, world!")
        screenshot_bytes = await computer.interface.screenshot()

        # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
        agent = ComputerAgent(
            computer=computer,
            loop="uitars",
            model=LLM(provider="mlxvlm", name="mlx-community/UI-TARS-1.5-7B-6bit")
        )
        async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
            print(result)

    if __name__ == "__main__":
        asyncio.run(main())
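The quick start stores the screenshot in `screenshot_bytes` without doing anything with it. A minimal sketch for writing those bytes to disk follows. The helper `save_screenshot` is hypothetical, and the assumption that the bytes are PNG-encoded is not confirmed by this README, so the code checks the PNG signature and falls back to a generic `.bin` extension otherwise:

```python
from pathlib import Path
from typing import Union

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(data: bytes, directory: Union[str, Path], stem: str) -> Path:
    """Write raw screenshot bytes to `directory` and return the file path.

    The extension is ".png" when the data carries a PNG signature,
    otherwise ".bin" so the file is not mislabeled.
    """
    suffix = ".png" if data.startswith(PNG_SIGNATURE) else ".bin"
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{stem}{suffix}"
    path.write_bytes(data)
    return path
```

Inside `main()` this could be called as `save_screenshot(screenshot_bytes, "screenshots", "step-01")`, where the directory and file stem are placeholder names.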
For ready-to-use examples, check out our Notebooks collection.
    # Install Lume CLI and background service
    curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash

    # List all VMs
    lume ls

    # Pull a VM image
    lume pull macos-sequoia-cua:latest

    # Create a new VM
    lume create my-vm --os macos --cpu 4 --memory 8GB --disk-size 50GB

    # Run a VM (creates and starts if it doesn't exist)
    lume run macos-sequoia-cua:latest

    # Stop a VM
    lume stop macos-sequoia-cua_latest

    # Delete a VM
    lume delete macos-sequoia-cua_latest
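When scripting VM creation from Python, it can be convenient to build the `lume create` argument list programmatically. The sketch below uses only the flags shown in the command above (`--os`, `--cpu`, `--memory`, `--disk-size`). The function name `lume_create_args` and its parameters are hypothetical, and it only builds the command rather than running it:

```python
import shlex
from typing import List


def lume_create_args(
    name: str,
    os_type: str = "macos",
    cpu: int = 4,
    memory_gb: int = 8,
    disk_gb: int = 50,
) -> List[str]:
    """Build the argv for `lume create` from typed parameters."""
    if min(cpu, memory_gb, disk_gb) <= 0:
        raise ValueError("cpu, memory_gb and disk_gb must be positive")
    return [
        "lume", "create", name,
        "--os", os_type,
        "--cpu", str(cpu),
        "--memory", f"{memory_gb}GB",
        "--disk-size", f"{disk_gb}GB",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be reviewed first.
    print(shlex.join(lume_create_args("my-vm")))
```

Passing the resulting list to `subprocess.run(args, check=True)` would execute it without going through a shell, which avoids quoting problems with unusual VM names.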
For advanced container-like virtualization, check out Lumier - a Docker interface for macOS and Linux VMs.
    # Install Lume CLI and background service
    curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash

    # Run macOS in a Docker container
    docker run -it --rm \
        --name lumier-vm \
        -p 8006:8006 \
        -v $(pwd)/storage:/storage \
        -v $(pwd)/shared:/shared \
        -e VM_NAME=lumier-vm \
        -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
        -e CPU_CORES=4 \
        -e RAM_SIZE=8192 \
        -e HOST_STORAGE_PATH=$(pwd)/storage \
        -e HOST_SHARED_PATH=$(pwd)/shared \
        trycua/lumier:latest
- How to use the MCP Server with Claude Desktop or other MCP clients - One of the easiest ways to get started with C/ua
- How to use OpenAI Computer-Use, Anthropic, OmniParser, or UI-TARS for your Computer-Use Agent
- How to use Lume CLI for managing desktops
- Training Computer-Use Models: Collecting Human Trajectories with C/ua (Part 1)
- Build Your Own Operator on macOS (Part 1)
Module | Description | Installation |
---|---|---|
Lume | VM management for macOS/Linux using Apple's Virtualization.Framework | curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash |
Lumier | Docker interface for macOS and Linux VMs | docker pull trycua/lumier:latest |
Computer (Python) | Python Interface for controlling virtual machines | pip install "cua-computer[all]" |
Computer (Typescript) | Typescript Interface for controlling virtual machines | npm install @trycua/computer |
Agent | AI agent framework for automating tasks | pip install "cua-agent[all]" |
MCP Server | MCP server for using CUA with Claude Desktop | pip install cua-mcp-server |
SOM | Set-of-Marks library for Agent | pip install cua-som |
Computer Server | Server component for Computer | pip install cua-computer-server |
Core (Python) | Python Core utilities | pip install cua-core |
Core (Typescript) | Typescript Core utilities | npm install @trycua/core |
For complete examples, see computer_examples.py or computer_nb.ipynb
    # Shell Actions
    result = await computer.interface.run_command(cmd)  # Run shell command
    # result.stdout, result.stderr, result.returncode

    # Mouse Actions
    await computer.interface.left_click(x, y)  # Left click at coordinates
    await computer.interface.right_click(x, y)  # Right click at coordinates
    await computer.interface.double_click(x, y)  # Double click at coordinates
    await computer.interface.move_cursor(x, y)  # Move cursor to coordinates
    await computer.interface.drag_to(x, y, duration)  # Drag to coordinates
    await computer.interface.get_cursor_position()  # Get current cursor position
    await computer.interface.mouse_down(x, y, button="left")  # Press and hold a mouse button
    await computer.interface.mouse_up(x, y, button="left")  # Release a mouse button

    # Keyboard Actions
    await computer.interface.type_text("Hello")  # Type text
    await computer.interface.press_key("enter")  # Press a single key
    await computer.interface.hotkey("command", "c")  # Press key combination
    await computer.interface.key_down("command")  # Press and hold a key
    await computer.interface.key_up("command")  # Release a key

    # Scrolling Actions
    await computer.interface.scroll(x, y)  # Scroll the mouse wheel
    await computer.interface.scroll_down(clicks)  # Scroll down
    await computer.interface.scroll_up(clicks)  # Scroll up

    # Screen Actions
    await computer.interface.screenshot()  # Take a screenshot
    await computer.interface.get_screen_size()  # Get screen dimensions

    # Clipboard Actions
    await computer.interface.set_clipboard(text)  # Set clipboard content
    await computer.interface.copy_to_clipboard()  # Get clipboard content

    # File System Operations
    await computer.interface.file_exists(path)  # Check if file exists
    await computer.interface.directory_exists(path)  # Check if directory exists
    await computer.interface.read_text(path, encoding="utf-8")  # Read file content
    await computer.interface.write_text(path, content, encoding="utf-8")  # Write file content
    await computer.interface.read_bytes(path)  # Read file content as bytes
    await computer.interface.write_bytes(path, content)  # Write file content as bytes
    await computer.interface.delete_file(path)  # Delete file
    await computer.interface.create_dir(path)  # Create directory
    await computer.interface.delete_dir(path)  # Delete directory
    await computer.interface.list_dir(path)  # List directory contents

    # Accessibility
    await computer.interface.get_accessibility_tree()  # Get accessibility tree

    # Delay Configuration
    # Set default delay between all actions (in seconds)
    computer.interface.delay = 0.5  # 500ms delay between actions

    # Or specify delay for individual actions
    await computer.interface.left_click(x, y, delay=1.0)  # 1 second delay after click
    await computer.interface.type_text("Hello", delay=0.2)  # 200ms delay after typing
    await computer.interface.press_key("enter", delay=0.5)  # 500ms delay after key press

    # Python Virtual Environment Operations
    # Install packages in a virtual environment
    await computer.venv_install("demo_venv", ["requests", "macos-pyxa"])
    # Run a shell command in a virtual environment
    await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'")
    # Run a Python function in a virtual environment and return the result / raise an exception
    await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs)

    # Example: Use sandboxed functions to execute code in a C/ua Container
    from computer.helpers import sandboxed

    @sandboxed("demo_venv")
    def greet_and_print(name):
        """Get the HTML of the current Safari tab"""
        import PyXA
        safari = PyXA.Application("Safari")
        html = safari.current_document.source()
        print(f"Hello from inside the container, {name}!")
        return {"greeted": name, "safari_html": html}

    # When a @sandboxed function is called, it will execute in the container
    result = await greet_and_print("C/ua")
    # Result: {"greeted": "C/ua", "safari_html": "<html>...</html>"}
    # stdout and stderr are also captured and printed / raised
    print("Result from sandboxed function:", result)
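The reference above notes that `run_command` returns an object with `stdout`, `stderr`, and `returncode`. One way to use it, sketched here under assumptions, is a small wrapper that raises when a command fails. `run_checked` is a hypothetical helper, and `FakeInterface` and `CommandResult` are stand-ins written for this example so that it runs without a VM; they are not C/ua classes:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Stand-in with the three attributes the reference above lists."""
    stdout: str
    stderr: str
    returncode: int


class FakeInterface:
    """Hypothetical interface that only understands `echo <text>`."""

    async def run_command(self, cmd: str) -> CommandResult:
        if cmd.startswith("echo "):
            return CommandResult(cmd[len("echo "):] + "\n", "", 0)
        return CommandResult("", f"command not found: {cmd}", 127)


async def run_checked(interface, cmd: str) -> str:
    """Run `cmd` and return stdout, raising RuntimeError on a non-zero exit."""
    result = await interface.run_command(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    print(asyncio.run(run_checked(FakeInterface(), "echo hello")), end="")
```

With a real `Computer`, the same wrapper would be called as `await run_checked(computer.interface, cmd)`, assuming the returned object exposes the attributes listed above.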
For complete examples, see agent_examples.py or agent_nb.ipynb
    # Import necessary components
    from agent import ComputerAgent, LLM, AgentLoop, LLMProvider

    # UI-TARS-1.5 agent for local execution with MLX
    ComputerAgent(loop=AgentLoop.UITARS, model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-6bit"))

    # OpenAI Computer-Use agent using OPENAI_API_KEY
    ComputerAgent(loop=AgentLoop.OPENAI, model=LLM(provider=LLMProvider.OPENAI, name="computer-use-preview"))

    # Anthropic Claude agent using ANTHROPIC_API_KEY
    ComputerAgent(loop=AgentLoop.ANTHROPIC, model=LLM(provider=LLMProvider.ANTHROPIC))

    # OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
    ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M"))

    # OpenRouter example using OAICOMPAT provider
    ComputerAgent(
        loop=AgentLoop.OMNI,
        model=LLM(
            provider=LLMProvider.OAICOMPAT,
            name="openai/gpt-4o-mini",
            provider_base_url="https://openrouter.ai/api/v1"
        ),
        api_key="your-openrouter-api-key"
    )
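Since the OpenAI and Anthropic loops read `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`, a script can choose a loop based on which key is present. The sketch below is an illustration, not part of the agent package: `pick_loop` is a hypothetical helper, the priority order is an arbitrary choice, and the returned strings are simply the `AgentLoop` member names used above:

```python
import os
from typing import Mapping


def pick_loop(env: Mapping[str, str] = os.environ) -> str:
    """Return an AgentLoop member name based on available API keys.

    Falls back to "UITARS", the loop the README describes as able to
    run locally, when no cloud key is set.
    """
    if env.get("OPENAI_API_KEY"):
        return "OPENAI"
    if env.get("ANTHROPIC_API_KEY"):
        return "ANTHROPIC"
    return "UITARS"


if __name__ == "__main__":
    print("Selected loop:", pick_loop())
```

The name could then be resolved with `getattr(AgentLoop, pick_loop())` once `AgentLoop` is imported from the agent package.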
Join our Discord community to discuss ideas, get assistance, or share your demos!
Cua is open-sourced under the MIT License - see the LICENSE file for details.
Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the OmniParser LICENSE file for details.
We welcome contributions to CUA! Please refer to our Contributing Guidelines for details.
Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation.
Thank you to all our supporters!
f-trycua 💻 | Pedro Piñera Buendía 💻 | Amit Kumar 💻 | Dung Duc Huynh (Kaka) 💻 | Zayd Krunz 💻 | Prashant Raj 💻 | Leland Takamine 💻 |
ddupont 💻 | Ethan Gutierrez 💻 | Ricter Zheng 💻 | Rahul Karajgikar 💻 | trospix 💻 | Evan smith 💻 |