LLmHub-dev/open-computer-use

The Open Framework for autonomous virtual computer agents at scale, fully open-source, safe, auditable, and production-ready.


💻 Open Computer Use - Autonomous Computer Using Agents at Scale

Your AI Agent That Actually Uses Computers Like Humans Do

Open Computer Use is an open-source platform that gives AI agents real computer control through browser automation, terminal access, and desktop interaction. Built for developers who want to create truly autonomous AI workflows.

Website • Discord • X

✨ What Makes This Special?

Unlike traditional AI assistants that only talk about tasks, Open Computer Use enables AI agents to actually perform them by:

🌐 Browsing the web like a human (search, click, fill forms, extract data)
💻 Running terminal commands and managing files
🖱️ Controlling desktop applications with full UI automation
🤖 Multi-agent orchestration that breaks down complex tasks
🔄 Streaming execution with real-time feedback
🎯 100% open-source and self-hostable

"Computer use" capabilities similar to Anthropic's Claude Computer Use, but fully open-source and extensible.

🎬 See It In Action

Browser Automation

AI agent searching, navigating, and interacting with websites autonomously

▶️ Watch: AI Agent Browsing and Playing

Terminal Operations & Development

Executing commands, managing files, and running complex workflows

▶️ Watch: Quant Trading & Research on QuantConnect

Multi-Agent Orchestration

Complex tasks broken down and executed by specialized agents

▶️ Watch: Building Nvidia Options Dashboard

Advanced Features

Human-in-the-loop control and intelligent collaboration

▶️ Watch: AI Agent with Human Intervention

🎯 Core Capabilities

🌐 Browser Agent

Search-first strategy using Google Search API
Smart web navigation with automatic form filling
Element detection and intelligent clicking
Multi-tab management for parallel workflows
Page context extraction for AI understanding
Screenshot capture for visual verification

💻 Terminal Agent

Command execution in isolated environments
File operations (read, write, edit, delete)
Directory management with full control
Script execution (Python, Node.js, bash)
Package installation and environment setup
Output streaming with real-time feedback

🖱️ Desktop Agent

UI element detection using computer vision
Mouse and keyboard control for any application
Window management (focus, resize, arrange)
Screenshot analysis for context awareness
OCR capabilities for text extraction
Cross-platform support (Linux desktop)

🤖 Multi-Agent System

Task decomposition by AI planner
Sequential execution with context passing
Specialized agents for different capabilities
Error handling with automatic retries
User interaction when clarification needed
Execution reports with detailed summaries
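To make "sequential execution with context passing" and "automatic retries" concrete, here is a minimal sketch in plain Python. It is not the project's executor: `run_steps`, `ExecutionReport`, and the step functions are hypothetical names invented for illustration, and treating `RuntimeError` as the retryable failure is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; the project's real classes may differ.
Context = dict[str, str]
Step = Callable[[Context], str]


@dataclass
class ExecutionReport:
    results: dict[str, str] = field(default_factory=dict)
    attempts: dict[str, int] = field(default_factory=dict)


def run_steps(steps: dict[str, Step], max_retries: int = 2) -> ExecutionReport:
    """Run steps in order, passing earlier results forward as context."""
    report = ExecutionReport()
    context: Context = {}
    for name, step in steps.items():
        for attempt in range(1, max_retries + 2):
            report.attempts[name] = attempt
            try:
                output = step(context)
            except RuntimeError:
                if attempt > max_retries:
                    raise  # retries exhausted; surface the error
                continue  # try the same step again
            context[name] = output  # later steps can read this result
            report.results[name] = output
            break
    return report


calls = {"count": 0}


def flaky_search(ctx: Context) -> str:
    # Fails once, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return "3 articles found"


def summarize(ctx: Context) -> str:
    # Reads the result of the earlier "search" step from the context.
    return f"summary of: {ctx['search']}"


report = run_steps({"search": flaky_search, "summarize": summarize})
print(report.attempts, report.results["summarize"])
```

The second step only sees what earlier steps wrote into the shared context, which is the essence of sequential execution with context passing.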

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js 15)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │  Chat UI     │  │  Model       │  │  VM          │           │
│  │  Components  │  │  Selection   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (FastAPI)                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           Multi-Agent Executor Service                   │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │   Planner   │→ │   Browser   │→ │   Terminal  │       │   │
│  │  │    Agent    │  │    Agent    │  │    Agent    │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   WebSocket  │  │   Database   │  │   Billing    │           │
│  │   VM Control │  │   Service    │  │   Service    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│               Docker VM (Ubuntu 22.04 + XFCE)                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Chrome Browser  │  Terminal  │  Desktop Apps  │  Tools  │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         WebSocket Agent Server (Port 8080)               │   │
│  │         VNC Server (Port 5900)                           │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
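The diagram lists two ports inside the Docker VM: 8080 for the WebSocket agent server and 5900 for VNC. A quick way to confirm a running VM is reachable is a plain TCP connection test. This is a sketch using only the standard library; `is_port_open`, `check_vm`, and the assumption that the ports are published on `localhost` are illustrative, not part of the project.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the diagram; "localhost" is an assumption about where
# the container's ports are published.
VM_PORTS = {"agent-websocket": 8080, "vnc": 5900}


def check_vm(host: str = "localhost") -> dict[str, bool]:
    """Map each known VM service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in VM_PORTS.items()}
```

Calling `check_vm()` after starting the AI desktop container returns a dictionary such as `{"agent-websocket": True, "vnc": True}` when both services are listening. A successful connection only shows that something is listening, not that the agent protocol is healthy.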

🚀 Quick Start

Prerequisites

Node.js 20+ and npm
Python 3.10+ and pip
Docker and Docker Compose
Supabase account (free tier works)
API keys for AI providers (OpenAI, Anthropic, etc.)
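Before cloning, it can help to confirm the command-line prerequisites are on your `PATH`. The sketch below only checks that the executables exist; it does not verify versions. The executable names are assumptions (for example, Python may be installed as `python` instead of `python3` on your system).

```python
import shutil

# Executable names are assumptions; adjust them to match your system.
REQUIRED_TOOLS = ["node", "npm", "python3", "pip", "docker"]


def find_missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
print("Missing tools:", ", ".join(missing) if missing else "none")
```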

1. Clone the Repository

git clone https://github.com/LLmHub-dev/open-computer-use.git
cd open-computer-use

2. Set Up Supabase Database

Create Supabase Project

Go to Supabase and create a new project
Wait for the project to finish setting up
Go to Project Settings → API to get your keys

Run Database Schema

Execute the schema to create all required tables:

# Option A: Using Supabase Dashboard
# 1. Go to SQL Editor in your Supabase dashboard
# 2. Copy contents of supabase/schema.sql
# 3. Paste and run the SQL

# Option B: Using Supabase CLI (recommended)
npm install -g supabase
supabase login
supabase link --project-ref your-project-ref
supabase db push

Or manually run the schema file:

psql -h db.your-project.supabase.co -U postgres -d postgres -f supabase/schema.sql

This creates all necessary tables:

👤 Users & Auth: users, user_preferences, user_keys
💬 Chat System: chats, messages, chat_participants, chat_attachments
🤖 AI Agents: machine_sessions, machine_usage, machine_ai_actions
💳 Billing: user_credits, credit_transactions, stripe_customers, subscription_plans
📊 Projects: projects, user_machines, machine_snapshots
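After running the schema, you may want to confirm that every table listed above exists. The sketch below compares a set of table names you supply against that list. The table names are copied from this README; the group keys and the `missing_tables` helper are illustrative. One way to get the existing names, assuming the tables live in the `public` schema, is `SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'` in the SQL Editor.

```python
# Table names copied from the list above; the group keys are illustrative.
EXPECTED_TABLES = {
    "users_auth": ["users", "user_preferences", "user_keys"],
    "chat": ["chats", "messages", "chat_participants", "chat_attachments"],
    "agents": ["machine_sessions", "machine_usage", "machine_ai_actions"],
    "billing": [
        "user_credits",
        "credit_transactions",
        "stripe_customers",
        "subscription_plans",
    ],
    "projects": ["projects", "user_machines", "machine_snapshots"],
}


def missing_tables(existing: set[str]) -> dict[str, list[str]]:
    """Return, per group, the expected tables absent from `existing`."""
    gaps: dict[str, list[str]] = {}
    for group, tables in EXPECTED_TABLES.items():
        absent = [name for name in tables if name not in existing]
        if absent:
            gaps[group] = absent
    return gaps
```

An empty result means all seventeen tables are present; otherwise the result names the missing tables by group.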

3. Set Up Environment Variables

# Frontend
cp .env.example .env
# Edit .env with your configuration

# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your configuration

Required Variables

Supabase (Required)

NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-from-supabase-dashboard
SUPABASE_SERVICE_ROLE=your-service-role-key-from-supabase-dashboard

Security Keys (Required)

# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=your-generated-32-byte-hex-string
CSRF_SECRET=your-generated-32-byte-hex-string
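If `openssl` is not available, Python's standard `secrets` module can produce an equivalent value: 32 random bytes rendered as 64 hexadecimal characters. The helper names below are illustrative, and the validity check reflects only the format implied by `openssl rand -hex 32`, not any additional rule the application may enforce.

```python
import secrets


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return `num_bytes` random bytes as a lowercase hex string."""
    return secrets.token_hex(num_bytes)


def is_valid_hex_key(value: str, num_bytes: int = 32) -> bool:
    """Check that `value` is hex and decodes to exactly `num_bytes` bytes."""
    if len(value) != 2 * num_bytes:
        return False
    try:
        return len(bytes.fromhex(value)) == num_bytes
    except ValueError:
        return False


# Generate separate values for each variable; do not reuse one key for both.
print(f"ENCRYPTION_KEY={generate_hex_key()}")
print(f"CSRF_SECRET={generate_hex_key()}")
```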

Google Search API (Required for web search)

GOOGLE_SEARCH_KEY=your-google-api-key
GOOGLE_SEARCH_CX=your-custom-search-engine-id

Get these from Google Cloud Console:

Enable Custom Search API
Create API key
Create Custom Search Engine at programmablesearchengine.google.com
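To see how the two values fit together, here is a sketch that builds a request URL for the Custom Search JSON API, where `key` is the API key and `cx` is the search engine ID. The `build_search_url` helper is illustrative and is not how the project's backend is confirmed to call the API; the code only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode

CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, num: int = 5) -> str:
    """Build a Custom Search JSON API URL (key = API key, cx = engine ID)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; substitute GOOGLE_SEARCH_KEY and GOOGLE_SEARCH_CX.
url = build_search_url("K", "CX", "latest AI news", num=3)
print(url)
```

Opening the resulting URL with real credentials is a quick way to verify that both values are valid before starting the backend.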

AI Provider Keys (Choose at least one)

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Azure OpenAI (Optional)
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-02-15-preview

Azure Container Instances (Optional - for cloud VM deployment)

AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_RESOURCE_GROUP=your-resource-group
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_CONTAINER_REGISTRY=your-registry.azurecr.io
AZURE_DESKTOP_IMAGE=your-registry.azurecr.io/ai-desktop:latest

Stripe (Optional - for billing)

STRIPE_API_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
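A small check can catch a missing variable before the servers start. This sketch encodes the rules stated above: the Supabase and security variables are required, and at least one AI provider key must be set. The `validate_env` helper is illustrative. Counting the Azure OpenAI key as a provider key is an assumption, and the Google Search variables are left out because they are only needed for web search.

```python
import os

REQUIRED = [
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE",
    "ENCRYPTION_KEY",
    "CSRF_SECRET",
]
# At least one must be set; including the Azure key here is an assumption.
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "AZURE_OPENAI_API_KEY"]


def validate_env(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the env looks OK."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("set at least one AI provider key")
    return problems


for problem in validate_env(dict(os.environ)):
    print("env check:", problem)
```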

4. Install Dependencies

# Frontend
npm install

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

5. Start Development Servers

Option A: Using Docker (Recommended)

# Start all services
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend: http://localhost:8001

Option B: Manual Start

# Terminal 1: Frontend
npm run dev

# Terminal 2: Backend
cd backend
python main.py

# Terminal 3: AI Desktop (if needed)
docker-compose -f docker-compose.ai-desktop.yml up --build

6. Create Your First Agent Session

Open http://localhost:3000
Sign up / Log in with Supabase Auth
Start a new chat
Try a command: "Search for the latest AI news and summarize the top 3 articles"
Watch your AI agent work! 🎉

🎨 Features

Multi-Provider AI Support

Connect your own API keys and switch between providers mid-conversation:

✅ OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
✅ Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
✅ Google (Gemini Pro, Gemini 1.5)
✅ Azure OpenAI (Enterprise deployments)
✅ xAI (Grok models)
✅ Mistral AI (Mistral Large, Mixtral)
✅ Perplexity (Online models)
✅ OpenRouter (Access to 100+ models)

Bring Your Own Keys (BYOK)

All API keys are encrypted and stored securely. You maintain full control over your AI costs and usage.

Real-Time Streaming

Watch your agents work in real-time with:

📊 Task progress indicators
🛠️ Tool call visualization
📸 Live screenshots from VM
💬 Streaming responses
📋 Detailed execution logs
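One way to picture the stream is as a sequence of typed events, one kind per item in the list above. The sketch below uses an async generator to emit them. The event names and payload fields are hypothetical and do not describe the project's actual wire format.

```python
import asyncio
from typing import AsyncIterator


async def run_task(prompt: str) -> AsyncIterator[dict]:
    """Yield illustrative events in the order a client might receive them."""
    yield {"type": "progress", "step": 1, "total": 2}
    await asyncio.sleep(0)  # stand-in for real work between events
    yield {"type": "tool_call", "tool": "browser.search", "args": {"query": prompt}}
    yield {"type": "screenshot", "image": b""}  # placeholder image bytes
    yield {"type": "response", "text": "Here is a summary..."}
    yield {"type": "log", "message": "task finished"}


async def collect(prompt: str) -> list[dict]:
    """Consume the stream; a real client would render each event on arrival."""
    return [event async for event in run_task(prompt)]


events = asyncio.run(collect("latest AI news"))
print([event["type"] for event in events])
```

Because each event carries a `type`, a client can route progress updates, tool calls, and screenshots to different parts of the UI as they arrive.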

Advanced Task Planning

The AI automatically:

Analyzes your request
Breaks it down into subtasks
Assigns each subtask to a specialized agent
Executes with full context
Reports detailed results
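The steps above can be sketched as a tiny pipeline. In the real system an AI planner does the decomposition. Here, splitting on the word "then" and routing by keyword are deliberately simple stand-ins, and every name (`Subtask`, `plan`, `assign_agent`, `report`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str
    agent: str


# Keyword routing is a stand-in for the AI planner's judgment.
AGENT_KEYWORDS = {
    "browser": ("search", "open", "click", "website"),
    "terminal": ("run", "install", "file", "script"),
    "desktop": ("window", "screenshot", "application"),
}


def assign_agent(description: str) -> str:
    """Pick the first agent whose keywords appear in the description."""
    words = description.lower().split()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return agent
    return "browser"  # arbitrary default for this sketch


def plan(request: str) -> list[Subtask]:
    """Split a request on ' then ' and route each part to an agent."""
    parts = [part.strip() for part in request.split(" then ") if part.strip()]
    return [Subtask(part, assign_agent(part)) for part in parts]


def report(subtasks: list[Subtask]) -> str:
    """Render a numbered summary of which agent handles which subtask."""
    return "\n".join(
        f"{index}. [{task.agent}] {task.description}"
        for index, task in enumerate(subtasks, 1)
    )


subtasks = plan("Search for AI news then run summarize.py")
print(report(subtasks))
```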

Secure VM Isolation

Each agent session runs in an isolated Docker container:

🔒 Sandboxed execution environment
🔄 Ephemeral containers (no data persistence)
🌐 Network isolation options
📊 Resource limits and monitoring
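These properties map onto standard `docker run` flags. The sketch below builds such a command without running it. The image tag `ai-desktop:latest` and the default limits are assumptions, and the project's own compose files may configure isolation differently.

```python
def build_docker_run(
    image: str,
    memory: str = "2g",
    cpus: float = 1.0,
    network: str = "none",
) -> list[str]:
    """Build a `docker run` argv for an ephemeral, resource-limited container."""
    return [
        "docker", "run",
        "--rm",                # remove the container on exit (ephemeral)
        "--network", network,  # "none" disables networking entirely
        "--memory", memory,    # hard memory limit
        "--cpus", str(cpus),   # CPU quota
        image,
    ]


# "ai-desktop:latest" is a hypothetical local tag for the desktop image.
command = build_docker_run("ai-desktop:latest")
print(" ".join(command))
```

`--network none` is the strictest option and would prevent a browser agent from reaching the web, so a session that needs to browse would use a less restrictive network such as Docker's default bridge.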

📚 Use Cases

🔍 Research & Data Gathering

Web scraping and data extraction
Competitive analysis
Market research automation
Academic paper collection

🧪 Testing & QA

Automated UI testing
Cross-browser testing
E2E test generation
Regression testing

📝 Content Creation

Screenshot and documentation
Tutorial generation
Workflow recording
Demo creation

🔧 DevOps & Automation

Server configuration
Deployment automation
Log analysis
System monitoring

🛒 E-commerce Operations

Price monitoring
Product research
Order management
Inventory tracking

📊 Business Intelligence

Report generation
Dashboard monitoring
Data analysis workflows
KPI tracking

🛠️ Technology Stack

Frontend

Framework: Next.js 15 (App Router, React 19)
Language: TypeScript
Styling: Tailwind CSS 4
UI Components: Radix UI, shadcn/ui
State Management: Zustand
AI SDK: Vercel AI SDK
Database: Supabase (Auth + Postgres)
Payments: Stripe

Backend

Framework: FastAPI (Python 3.10+)
Async Runtime: asyncio, uvicorn
WebSocket: websockets library
AI Providers: openai, anthropic, google-generativeai
Search: Google Custom Search API
Caching: Redis (optional)
Image Processing: Pillow, ImageMagick

Infrastructure

Containerization: Docker, Docker Compose
VM Environment: Ubuntu 22.04 LTS + XFCE
Browser: Google Chrome (with remote debugging)
Automation: Selenium, Playwright, PyAutoGUI
Cloud: Azure Container Instances (optional)

🤝 Contributing

We love contributions! Here's how you can help:

🐛 Found a Bug?

Open an issue with:

Clear description of the bug
Steps to reproduce
Expected vs actual behavior
Screenshots or logs

💡 Have a Feature Idea?

Check if it's already requested
Open a new issue with the enhancement label
Describe your use case and proposed solution

🔧 Want to Contribute Code?

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes
Write tests if applicable
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request

Please read our Contributing Guide for detailed guidelines.

📖 Documentation

💬 Discord Community

🗺️ Roadmap

Q1 2026

Multi-VM orchestration (parallel agents)
Advanced workflow builder (visual programming)
Marketplace for custom agents
Windows and macOS VM support
Mobile app (iOS/Android)

Q2 2026

Plugin system for custom tools
Collaborative agent sessions
Advanced analytics dashboard
Enterprise SSO support
Self-hosted cloud deployment guides

Future

Voice control integration
Video understanding capabilities
Agent memory and learning
Multi-modal agent interactions
Community agent templates

Vote on features: Feature Requests

📊 Performance & Benchmarks

Metric	Value
Average Task Completion	~45 seconds
Concurrent Sessions	50+ (per server)
Browser Navigation	~2s per page
Tool Call Latency	<500ms
VM Startup Time	~15 seconds
Memory per Session	~2GB

Benchmarks measured on: 4 CPU cores, 8GB RAM, SSD storage

⚠️ Responsible AI Use

Open Computer Use gives AI agents significant autonomy. Please use responsibly:

✅ Do: Automate repetitive tasks, research, testing, content creation
❌ Don't: Violate terms of service, spam, scrape without permission
🔒 Security: Never share credentials, use isolated environments
📋 Compliance: Follow data protection laws (GDPR, CCPA, etc.)
🤝 Ethics: Respect website robots.txt and rate limits
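Respecting `robots.txt` can be checked programmatically before an agent visits a page. The sketch below uses Python's standard `urllib.robotparser` on an inline sample file. The user agent name `my-agent` and the sample rules are placeholders, and this is not a description of how the project's browser agent behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules for illustration; fetch the site's real /robots.txt in practice.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(SAMPLE_ROBOTS, "my-agent", "https://example.com/public/page"))
print(allowed_by_robots(SAMPLE_ROBOTS, "my-agent", "https://example.com/private/data"))
```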

Read our Responsible Use Guidelines for more details.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Apache License 2.0

Copyright (c) 2025 Open Computer Use Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🙏 Acknowledgments

Built with amazing open-source projects:

Next.js - The React Framework
FastAPI - Modern Python web framework
Supabase - Open source Firebase alternative
Vercel AI SDK - AI toolkit for TypeScript
Radix UI - Unstyled, accessible components
Anthropic - Inspiration from Claude Computer Use
Docker - Containerization platform

Special thanks to all our contributors! 💙

💬 Community & Support

💬 Discord: Join our community server
🐦 Twitter: Follow @llmhub_dev
📧 Email: prateek@llmhub.dev
🐛 Issues: GitHub Issues
💡 Discussions: GitHub Discussions

⭐ Star us on GitHub if you find this useful!

Made with ❤️ by the Open Computer Use community

Star on GitHub • Join Discord

Latest release: Release-1.0.0 (Oct 12, 2025)
