# AI Desktop (FareedKhan-dev/ai-desktop)

AI agent that controls a computer.
A simple AI Desktop that uses OmniParser and a vision-language model to interact with the system. It can perform various tasks like opening applications, searching the web, and answering questions.

**User Query:** Open Google Chrome and search for google stock price

*(Demo video: `sample_result.mp4`)*
```mermaid
graph TD;
    A[User Prompt: Open Chrome and buy me a milk] -->|User Input| B[VLMAgent];
    B -->|Parse Screen Content| C[Omniparser];
    C -->|Extracted Info| D[Computer];
    B -->|Analyze Screen, Determine Action| E[LLM OpenAI];
    E -->|Generate Action e.g., Mouse Move, Type| F[Action Execution];
    F -->|Execute Action on Computer| D;
    D -->|Get Result/Feedback| B;
    F -->|Repeat until Task Complete| G[Task Complete];
```
It takes a user prompt and processes it through a vision-language model (VLMAgent). The agent analyzes the screen, extracts information, and determines the required actions using an AI model. These actions are then executed on the computer, repeating until the task is complete.
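The perceive-act loop described above can be sketched as follows. The function and action names here (`parse_screen`, `decide_action`, `execute`) are illustrative stand-ins, not the project's actual interfaces:

```python
# Minimal sketch of the screen-parse -> decide -> act loop. All names below
# are hypothetical; the real project defines its own VLMAgent interfaces.

def run_agent(user_query, parse_screen, decide_action, execute, max_steps=10):
    """Repeat: parse screen, ask the model for the next action, execute it."""
    for step in range(max_steps):
        screen_info = parse_screen()                      # OmniParser: extract UI elements
        action = decide_action(user_query, screen_info)   # LLM chooses the next action
        if action["type"] == "done":                      # model signals task completion
            return step
        execute(action)                                   # e.g. mouse move, click, type
    return max_steps

# Tiny stub demo: the "task" completes after two simulated clicks.
state = {"clicks": 0}
result = run_agent(
    "open browser",
    parse_screen=lambda: {"elements": ["icon"]},
    decide_action=lambda q, s: {"type": "done"} if state["clicks"] >= 2 else {"type": "click"},
    execute=lambda a: state.update(clicks=state["clicks"] + 1),
)
```

The key design point is that the loop is feedback-driven: each iteration re-parses the screen, so the model reacts to what actually happened rather than following a fixed plan.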
Clone the repository along with the OmniParser submodule:

```shell
git clone --recursive https://github.com/FareedKhan-dev/ai-desktop
```

Or, if already cloned, update the OmniParser submodule:

```shell
git submodule update --init --recursive
```
To install the dependencies, run the following commands:

```shell
cd ai-desktop/OmniParser
pip install -r requirements.txt
```

AI-Desktop itself does not require any additional dependencies.
Navigate to the `OmniParser` directory:

```shell
cd OmniParser
```
Download the model checkpoints:
```shell
# Download the model checkpoints to the local directory OmniParser/weights/
mkdir -p weights/icon_detect weights/icon_caption_florence
for file in icon_detect/{train_args.yaml,model.pt,model.yaml} \
    icon_caption/{config.json,generation_config.json,model.safetensors}; do
    huggingface-cli download microsoft/OmniParser-v2.0 "$file" --local-dir weights
done
mv weights/icon_caption weights/icon_caption_florence
```
Make sure the weights are downloaded into the `weights` directory, in subdirectories named `icon_detect` and `icon_caption_florence` respectively.
To start the Gradio API of OmniParser, run the following command:

```shell
python gradio_demo.py
```
The Gradio API will start at `localhost:<port>`, and a live sharing link will be generated.
Modify the `config.py` file to set up the API URLs, model names, and authentication keys:

```python
OMNIPARSER_API_URL = "OMNIPARSER_Gradio_link"  # Set the OmniParser Gradio API link (see the Usage section to get the link)
VLM_MODEL_NAME = "OPENAI/LOCAL_MODEL_NAME"     # Define the vision-language model
BASE_URL = "BASE_URL"                          # Set the base URL for the API
API_KEY = "API_KEY"                            # Provide the API key
```
The `SYSTEM_PROMPT` in `config.py` defines the AI agent's behavior, guiding it to interact with the system using actions such as mouse movements, clicks, typing, and screenshots. Modify it as needed for custom AI interactions.
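As a rough illustration, a prompt of this kind might look like the following. This is only a sketch of the idea; the actual `SYSTEM_PROMPT` shipped in `config.py` is project-specific and more detailed:

```python
# Hypothetical shape of a system prompt for a desktop-control agent.
# The action names below mirror the kinds of actions the README mentions
# (mouse movements, clicks, typing, screenshots); the real prompt differs.
SYSTEM_PROMPT = (
    "You are an agent controlling a desktop computer. "
    "On each turn you receive the parsed screen content and must reply with "
    "exactly one action: mouse_move(x, y), click, type(text), or screenshot. "
    "Reply with 'done' when the user's task is complete."
)
```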
To start the AI Desktop, run the following command:

```shell
python main.py
```
You can modify the `user_query` in `main.py` to test different queries.
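For example, you might set it to another task (the query text below is just an illustration; the variable name comes from `main.py`):

```python
# In main.py -- replace the existing query string with your own task:
user_query = "Open Notepad and type hello world"
```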