The Open Framework for autonomous virtual computer agents at scale, fully open-source, safe, auditable, and production-ready.


LLmHub-dev/open-computer-use


Your AI Agent That Actually Uses Computers Like Humans Do

Open Computer Use is an open-source platform that gives AI agents real computer control through browser automation, terminal access, and desktop interaction. Built for developers who want to create truly autonomous AI workflows.

Website | Discord | X

License: Apache 2.0 | Next.js | FastAPI | Docker | PRs Welcome



✨ What Makes This Special?

Unlike traditional AI assistants that only talk about tasks, Open Computer Use enables AI agents to actually perform them by:

  • 🌐 Browsing the web like a human (search, click, fill forms, extract data)
  • 💻 Running terminal commands and managing files
  • 🖱️ Controlling desktop applications with full UI automation
  • 🤖 Multi-agent orchestration that breaks down complex tasks
  • 🔄 Streaming execution with real-time feedback
  • 🎯 100% open-source and self-hostable

"Computer use" capabilities similar to Anthropic's Claude Computer Use, but fully open-source and extensible.


🎬 See It In Action

Browser Automation

AI agent searching, navigating, and interacting with websites autonomously


▶️ Watch: AI Agent Browsing and Playing

Terminal Operations & Development

Executing commands, managing files, and running complex workflows


▶️ Watch: Quant Trading & Research on QuantConnect

Multi-Agent Orchestration

Complex tasks broken down and executed by specialized agents


▶️ Watch: Building Nvidia Options Dashboard

Advanced Features

Human-in-the-loop control and intelligent collaboration


▶️ Watch: AI Agent with Human Intervention


🎯 Core Capabilities

🌐 Browser Agent

  • Search-first strategy using Google Search API
  • Smart web navigation with automatic form filling
  • Element detection and intelligent clicking
  • Multi-tab management for parallel workflows
  • Page context extraction for AI understanding
  • Screenshot capture for visual verification

💻 Terminal Agent

  • Command execution in isolated environments
  • File operations (read, write, edit, delete)
  • Directory management with full control
  • Script execution (Python, Node.js, bash)
  • Package installation and environment setup
  • Output streaming with real-time feedback

🖱️ Desktop Agent

  • UI element detection using computer vision
  • Mouse and keyboard control for any application
  • Window management (focus, resize, arrange)
  • Screenshot analysis for context awareness
  • OCR capabilities for text extraction
  • Platform support: Linux desktop (Windows and macOS VMs are on the roadmap)

🤖 Multi-Agent System

  • Task decomposition by AI planner
  • Sequential execution with context passing
  • Specialized agents for different capabilities
  • Error handling with automatic retries
  • User interaction when clarification needed
  • Execution reports with detailed summaries
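
The sequential execution, context passing, and retry behavior listed above can be sketched as follows. This is a minimal illustration, not the project's actual executor: `Subtask`, `Report`, `run_plan`, and the agent call signature are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration; the project's real classes may differ.
Context = Dict[str, str]
Agent = Callable[[str, Context], str]  # (instruction, context) -> result text


@dataclass
class Subtask:
    agent: str        # key into the agent registry, e.g. "browser"
    instruction: str


@dataclass
class Report:
    results: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def run_plan(plan: List[Subtask], agents: Dict[str, Agent], max_retries: int = 2) -> Report:
    """Run subtasks in order, passing earlier results forward as context."""
    context: Context = {}
    report = Report()
    for index, task in enumerate(plan):
        last_error: Optional[Exception] = None
        for _attempt in range(max_retries + 1):
            try:
                result = agents[task.agent](task.instruction, context)
            except Exception as exc:  # any agent error triggers a retry
                last_error = exc
                continue
            context[f"step_{index}"] = result  # visible to later subtasks
            report.results.append(result)
            break
        else:
            # Every attempt failed: record it and stop, since later
            # steps may depend on this step's output.
            report.failures.append(f"step {index}: {last_error}")
            break
    return report
```

Stopping at the first exhausted step is one possible policy; a real system might instead ask the user for clarification, as the list above suggests.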

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js 15)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │  Chat UI     │  │  Model       │  │  VM          │           │
│  │  Components  │  │  Selection   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (FastAPI)                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           Multi-Agent Executor Service                   │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │   Planner   │→ │   Browser   │→ │   Terminal  │       │   │
│  │  │    Agent    │  │    Agent    │  │    Agent    │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   WebSocket  │  │   Database   │  │   Billing    │           │
│  │   VM Control │  │   Service    │  │   Service    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│               Docker VM (Ubuntu 22.04 + XFCE)                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Chrome Browser  │  Terminal  │  Desktop Apps  │  Tools  │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         WebSocket Agent Server (Port 8080)               │   │
│  │         VNC Server (Port 5900)                           │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • Node.js 20+ and npm
  • Python 3.10+ and pip
  • Docker and Docker Compose
  • Supabase account (free tier works)
  • API keys for AI providers (OpenAI, Anthropic, etc.)

1. Clone the Repository

git clone https://github.com/LLmHub-dev/open-computer-use.git
cd open-computer-use

2. Set Up Supabase Database

Create Supabase Project

  1. Go to Supabase and create a new project
  2. Wait for the project to finish setting up
  3. Go to Project Settings → API to get your keys

Run Database Schema

Execute the schema to create all required tables:

# Option A: Using Supabase Dashboard
# 1. Go to SQL Editor in your Supabase dashboard
# 2. Copy contents of supabase/schema.sql
# 3. Paste and run the SQL

# Option B: Using Supabase CLI (recommended)
npm install -g supabase
supabase login
supabase link --project-ref your-project-ref
supabase db push

Or manually run the schema file:

psql -h db.your-project.supabase.co -U postgres -d postgres -f supabase/schema.sql

This creates all necessary tables:

  • 👤 Users & Auth: users, user_preferences, user_keys
  • 💬 Chat System: chats, messages, chat_participants, chat_attachments
  • 🤖 AI Agents: machine_sessions, machine_usage, machine_ai_actions
  • 💳 Billing: user_credits, credit_transactions, stripe_customers, subscription_plans
  • 📊 Projects: projects, user_machines, machine_snapshots
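
To confirm that a schema file declares every table listed above, a small check along these lines may help. It is a sketch under one assumption: that the schema declares tables with plain `CREATE TABLE [IF NOT EXISTS] [schema.]name` statements. The helper names `tables_in_sql` and `missing_tables` are hypothetical, not part of the project.

```python
import re
from typing import Set

# Table names as listed in this README.
EXPECTED_TABLES = {
    "users", "user_preferences", "user_keys",
    "chats", "messages", "chat_participants", "chat_attachments",
    "machine_sessions", "machine_usage", "machine_ai_actions",
    "user_credits", "credit_transactions", "stripe_customers", "subscription_plans",
    "projects", "user_machines", "machine_snapshots",
}

# Assumption: tables are declared as CREATE TABLE [IF NOT EXISTS] [schema.]name
_CREATE_TABLE = re.compile(
    r'create\s+table\s+(?:if\s+not\s+exists\s+)?(?:"?\w+"?\.)?"?(\w+)"?',
    re.IGNORECASE,
)


def tables_in_sql(sql: str) -> Set[str]:
    """Return the lower-cased table names declared in the given SQL text."""
    return {name.lower() for name in _CREATE_TABLE.findall(sql)}


def missing_tables(sql: str) -> Set[str]:
    """Return expected tables that the SQL text does not declare."""
    return EXPECTED_TABLES - tables_in_sql(sql)
```

For example, `missing_tables(open("supabase/schema.sql").read())` should return an empty set if the file matches the list above.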

3. Set Up Environment Variables

# Frontend
cp .env.example .env
# Edit .env with your configuration

# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your configuration

Required Variables

Supabase (Required)

NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-from-supabase-dashboard
SUPABASE_SERVICE_ROLE=your-service-role-key-from-supabase-dashboard

Security Keys (Required)

# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=your-generated-32-byte-hex-string
CSRF_SECRET=your-generated-32-byte-hex-string
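
If `openssl` is not available, Python's standard `secrets` module can produce an equivalent value: 32 random bytes rendered as 64 hex characters. The function name `generate_hex_key` is only illustrative.

```python
import secrets


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return num_bytes of OS-provided randomness as hex (2 characters per byte)."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print lines that can be pasted into .env; each call yields a fresh key.
    print(f"ENCRYPTION_KEY={generate_hex_key()}")
    print(f"CSRF_SECRET={generate_hex_key()}")
```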

Google Search API (Required for web search)

GOOGLE_SEARCH_KEY=your-google-api-key
GOOGLE_SEARCH_CX=your-custom-search-engine-id

Get these from Google Cloud Console:

  1. Enable Custom Search API
  2. Create API key
  3. Create Custom Search Engine at programmablesearchengine.google.com
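
Once both values are set, a quick way to check them is to call the Custom Search JSON API directly. The sketch below uses only the standard library; `build_search_url` and `search` are hypothetical helper names, and this is not how the project's Browser Agent is necessarily implemented.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import List, Tuple

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, api_key: str, cx: str, num: int = 3) -> str:
    """Build a Custom Search JSON API request URL (num must be between 1 and 10)."""
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    return f"{SEARCH_ENDPOINT}?{urllib.parse.urlencode(params)}"


def search(query: str, num: int = 3) -> List[Tuple[str, str]]:
    """Return (title, link) pairs, reading credentials from the environment."""
    url = build_search_url(
        query,
        os.environ["GOOGLE_SEARCH_KEY"],
        os.environ["GOOGLE_SEARCH_CX"],
        num,
    )
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    # "items" is absent when the query has no results.
    return [(item["title"], item["link"]) for item in payload.get("items", [])]
```

Calling `search("latest AI news")` with valid keys should return up to three results; an HTTP 4xx error usually points to a wrong key or a search engine ID that does not exist.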

AI Provider Keys (Choose at least one)

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Azure OpenAI (Optional)
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-02-15-preview

Azure Container Instances (Optional - for cloud VM deployment)

AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_RESOURCE_GROUP=your-resource-group
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_CONTAINER_REGISTRY=your-registry.azurecr.io
AZURE_DESKTOP_IMAGE=your-registry.azurecr.io/ai-desktop:latest

Stripe (Optional - for billing)

STRIPE_API_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
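
Before starting the servers, it can help to check that the variables marked required are present and that at least one AI provider key is set. The following is a sketch only: `config_problems` is a hypothetical helper, and the Google Search keys are left out because they are needed only for web search.

```python
import os
from typing import List, Mapping

# Variables this README marks as required.
REQUIRED = [
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE",
    "ENCRYPTION_KEY",
    "CSRF_SECRET",
]

# At least one of these must be set ("Choose at least one").
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "AZURE_OPENAI_API_KEY"]


def config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the basics are set."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("set at least one AI provider key")
    return problems


if __name__ == "__main__":
    for problem in config_problems(os.environ):
        print(problem)
```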

4. Install Dependencies

# Frontend
npm install

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

5. Start Development Servers

Option A: Using Docker (Recommended)

# Start all services
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend: http://localhost:8001
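
To verify that both services are accepting connections on the ports given above, a plain TCP probe is enough. This sketch only checks that something is listening; it does not confirm that the application behind the port is healthy. `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("frontend", 3000), ("backend", 8001)):
        state = "up" if port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```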

Option B: Manual Start

# Terminal 1: Frontend
npm run dev

# Terminal 2: Backend
cd backend
python main.py

# Terminal 3: AI Desktop (if needed)
docker-compose -f docker-compose.ai-desktop.yml up --build

6. Create Your First Agent Session

  1. Open http://localhost:3000
  2. Sign up / Log in with Supabase Auth
  3. Start a new chat
  4. Try a command: "Search for the latest AI news and summarize the top 3 articles"
  5. Watch your AI agent work! 🎉

🎨 Features

Multi-Provider AI Support

Connect your own API keys and switch between providers mid-conversation:

  • OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
  • Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
  • Google (Gemini Pro, Gemini 1.5)
  • Azure OpenAI (Enterprise deployments)
  • xAI (Grok models)
  • Mistral AI (Mistral Large, Mixtral)
  • Perplexity (Online models)
  • OpenRouter (Access to 100+ models)

Bring Your Own Keys (BYOK)

All API keys are encrypted and stored securely. You maintain full control over your AI costs and usage.

Real-Time Streaming

Watch your agents work in real-time with:

  • 📊 Task progress indicators
  • 🛠️ Tool call visualization
  • 📸 Live screenshots from VM
  • 💬 Streaming responses
  • 📋 Detailed execution logs
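
One common way to deliver updates like these is to emit one JSON object per line as work progresses. The sketch below illustrates that idea with an async generator; the event fields (`type`, `step`, `done`, `total`) are assumptions made for the example and not the project's actual wire format.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


async def stream_events(steps: List[str]) -> AsyncIterator[str]:
    """Yield one JSON line per event as work progresses (hypothetical event shape)."""
    total = len(steps)
    for done, step in enumerate(steps, start=1):
        await asyncio.sleep(0)  # stand-in for real agent work
        event: Dict[str, object] = {
            "type": "progress",
            "step": step,
            "done": done,
            "total": total,
        }
        yield json.dumps(event)
    yield json.dumps({"type": "complete"})


async def collect(steps: List[str]) -> List[str]:
    """Drain the stream into a list; a UI would render each line on arrival."""
    return [line async for line in stream_events(steps)]
```

A consumer can run `asyncio.run(collect(["search", "summarize"]))` to see the sequence of events for a two-step task.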

Advanced Task Planning

The AI automatically:

  1. Analyzes your request
  2. Breaks down into subtasks
  3. Assigns to specialized agents
  4. Executes with full context
  5. Reports detailed results

Secure VM Isolation

Each agent session runs in an isolated Docker container:

  • 🔒 Sandboxed execution environment
  • 🔄 Ephemeral containers (no data persistence)
  • 🌐 Network isolation options
  • 📊 Resource limits and monitoring
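
The properties above map onto standard `docker run` flags: `--rm` for ephemeral containers, `--memory` and `--cpus` for resource limits, and `--network none` for full network isolation. The sketch below only builds the command line. The function name, the default limits, and the image tag in the usage note are assumptions, and the project's own launcher may configure containers differently.

```python
from typing import List


def sandbox_command(
    image: str,
    memory: str = "2g",
    cpus: float = 1.0,
    network: bool = True,
) -> List[str]:
    """Build a docker run argv for an ephemeral, resource-limited container."""
    cmd = ["docker", "run", "--rm", "--memory", memory, "--cpus", str(cpus)]
    if not network:
        # Disables all networking; a browsing agent would need it enabled.
        cmd += ["--network", "none"]
    cmd.append(image)
    return cmd
```

For instance, `sandbox_command("ai-desktop:latest", network=False)` yields an argument list that can be passed to `subprocess.run`.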

📚 Use Cases

🔍 Research & Data Gathering

  • Web scraping and data extraction
  • Competitive analysis
  • Market research automation
  • Academic paper collection

🧪 Testing & QA

  • Automated UI testing
  • Cross-browser testing
  • E2E test generation
  • Regression testing

📝 Content Creation

  • Screenshot and documentation
  • Tutorial generation
  • Workflow recording
  • Demo creation

🔧 DevOps & Automation

  • Server configuration
  • Deployment automation
  • Log analysis
  • System monitoring

🛒 E-commerce Operations

  • Price monitoring
  • Product research
  • Order management
  • Inventory tracking

📊 Business Intelligence

  • Report generation
  • Dashboard monitoring
  • Data analysis workflows
  • KPI tracking

🛠️ Technology Stack

Frontend

  • Framework: Next.js 15 (App Router, React 19)
  • Language: TypeScript
  • Styling: Tailwind CSS 4
  • UI Components: Radix UI, shadcn/ui
  • State Management: Zustand
  • AI SDK: Vercel AI SDK
  • Database: Supabase (Auth + Postgres)
  • Payments: Stripe

Backend

  • Framework: FastAPI (Python 3.10+)
  • Async Runtime: asyncio, uvicorn
  • WebSocket: websockets library
  • AI Providers: openai, anthropic, google-generativeai
  • Search: Google Custom Search API
  • Caching: Redis (optional)
  • Image Processing: Pillow, ImageMagick

Infrastructure

  • Containerization: Docker, Docker Compose
  • VM Environment: Ubuntu 22.04 LTS + XFCE
  • Browser: Google Chrome (with remote debugging)
  • Automation: Selenium, Playwright, PyAutoGUI
  • Cloud: Azure Container Instances (optional)

🤝 Contributing

We love contributions! Here's how you can help:

🐛 Found a Bug?

Open an issue with:

  • Clear description of the bug
  • Steps to reproduce
  • Expected vs actual behavior
  • Screenshots or logs

💡 Have a Feature Idea?

  1. Check if it's already requested
  2. Open a new issue with the enhancement label
  3. Describe your use case and proposed solution

🔧 Want to Contribute Code?

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Write tests if applicable
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Please read our Contributing Guide for detailed guidelines.


📖 Documentation


🗺️ Roadmap

Q1 2026

  • Multi-VM orchestration (parallel agents)
  • Advanced workflow builder (visual programming)
  • Marketplace for custom agents
  • Windows and macOS VM support
  • Mobile app (iOS/Android)

Q2 2026

  • Plugin system for custom tools
  • Collaborative agent sessions
  • Advanced analytics dashboard
  • Enterprise SSO support
  • Self-hosted cloud deployment guides

Future

  • Voice control integration
  • Video understanding capabilities
  • Agent memory and learning
  • Multi-modal agent interactions
  • Community agent templates

Vote on features: Feature Requests


📊 Performance & Benchmarks

| Metric | Value |
|---|---|
| Average Task Completion | ~45 seconds |
| Concurrent Sessions | 50+ (per server) |
| Browser Navigation | ~2s per page |
| Tool Call Latency | <500ms |
| VM Startup Time | ~15 seconds |
| Memory per Session | ~2GB |

Benchmarks measured on: 4 CPU cores, 8GB RAM, SSD storage
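
For sizing a host, the memory figure can be turned into a rough capacity estimate. The sketch assumes memory is the binding constraint and that ~2GB per session is a steady-state cost; `reserved_gb` is an assumed allowance for the OS and backend, not a figure from this README.

```python
def sessions_by_memory(
    host_ram_gb: float,
    per_session_gb: float = 2.0,
    reserved_gb: float = 1.0,
) -> int:
    """Estimate how many sessions fit in RAM after reserving some for the host."""
    usable = max(host_ram_gb - reserved_gb, 0.0)
    return int(usable // per_session_gb)


def ram_needed_gb(
    sessions: int,
    per_session_gb: float = 2.0,
    reserved_gb: float = 1.0,
) -> float:
    """Estimate the RAM needed to hold the given number of sessions."""
    return sessions * per_session_gb + reserved_gb
```

Under these assumptions, `sessions_by_memory(8)` gives 3 and `ram_needed_gb(50)` gives 101.0, so reaching the 50+ concurrent-session figure at full per-session memory would take a much larger host than the 8GB benchmark machine.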


⚠️ Responsible AI Use

Open Computer Use gives AI agents significant autonomy. Please use responsibly:

  • Do: Automate repetitive tasks, research, testing, content creation
  • Don't: Violate terms of service, spam, scrape without permission
  • 🔒 Security: Never share credentials, use isolated environments
  • 📋 Compliance: Follow data protection laws (GDPR, CCPA, etc.)
  • 🤝 Ethics: Respect website robots.txt and rate limits

Read our Responsible Use Guidelines for more details.


📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Apache License 2.0

Copyright (c) 2025 Open Computer Use Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🙏 Acknowledgments

Built with amazing open-source projects:

Special thanks to all our contributors! 💙




💬 Community & Support


⭐ Star us on GitHub if you find this useful!

Made with ❤️ by the Open Computer Use community
