TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

See our paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Now TaskMatrix supports GroundingDINO and segment-anything! Thanks @jordddan for his efforts. For the image editing case, GroundingDINO is first used to locate bounding boxes guided by the given text, then segment-anything is used to generate the related mask, and finally stable diffusion inpainting is used to edit the image based on the mask (a standalone sketch of this flow follows the list below).

- Firstly, run `python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"`
- Then, say `find xxx in the image` or `segment xxx in the image`, where `xxx` is an object. TaskMatrix will return the detection or segmentation result!
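For readers curious how the three models chain together, here is a minimal standalone sketch of the same detect-then-segment-then-inpaint flow. This is not TaskMatrix's internal code: the checkpoint paths, image path, and prompts are placeholders, and it assumes the usual GroundingDINO, segment-anything, and diffusers inference APIs.

```python
# Sketch of the detect -> segment -> inpaint flow; paths/prompts are placeholders.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import SamPredictor, sam_model_registry

# 1) GroundingDINO: locate bounding boxes guided by the query text.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image_tensor = load_image("input.jpg")  # np RGB array, model tensor
boxes, logits, phrases = predict(
    model=dino, image=image_tensor, caption="dog",
    box_threshold=0.35, text_threshold=0.25,
)

# GroundingDINO returns normalized cxcywh boxes; convert the first match
# (assumes at least one) to absolute xyxy pixel coordinates for SAM.
h, w = image_source.shape[:2]
cx, cy, bw, bh = (boxes[0] * torch.tensor([w, h, w, h])).tolist()
box_xyxy = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])

# 2) segment-anything: turn the box into a pixel mask.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks, _, _ = predictor.predict(box=box_xyxy, multimask_output=False)

# 3) Stable diffusion inpainting: repaint the masked region from a prompt.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)
edited = pipe(
    prompt="a cat",
    image=Image.fromarray(image_source).resize((512, 512)),
    mask_image=Image.fromarray(masks[0].astype(np.uint8) * 255).resize((512, 512)),
).images[0]
edited.save("edited.png")
```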
Now TaskMatrix supports Chinese! Thanks to @Wang-Xiaodong1899 for his efforts.
We propose the template idea in TaskMatrix!

- A template is a pre-defined execution flow that assists ChatGPT in assembling complex tasks involving multiple foundation models.
- A template contains the experiential solution to complex tasks as determined by humans.
- A template can invoke multiple foundation models or even establish a new ChatGPT session.
- To define a template, simply add a class with the attribute `template_model = True` (see the sketch after this section).
Thanks to @ShengmingYin and @thebestannie for providing a template example in the `InfinityOutPainting` class (see the following gif).

- Firstly, run `python visual_chatgpt.py --load "Inpainting_cuda:0,ImageCaptioning_cuda:0,VisualQuestionAnswering_cuda:0"`
- Secondly, say `extend the image to 2048x1024` to TaskMatrix!
- By simply creating an `InfinityOutPainting` template, TaskMatrix can seamlessly extend images to any size through collaboration with the existing `ImageCaptioning`, `Inpainting`, and `VisualQuestionAnswering` foundation models, without the need for additional training.
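As an illustration of the template pattern, here is a hedged sketch of what an `InfinityOutPainting`-style template class can look like. The constructor signature, attribute names, and the `inference` flow are assumptions based on the description above, not a copy of the repository's code; the key points are the `template_model = True` attribute and that the template receives already-loaded foundation models by name.

```python
class InfinityOutPainting:
    # template_model = True marks this class as a template: the loader reuses
    # foundation models that are already instantiated instead of loading new
    # weights for this class. (Sketch only; the real class lives in
    # visual_chatgpt.py and its details may differ.)
    template_model = True

    def __init__(self, ImageCaptioning, Inpainting, VisualQuestionAnswering):
        # The loader injects the already-loaded models whose names appear in
        # the constructor signature, so the template composes them without
        # any additional training.
        self.caption = ImageCaptioning
        self.inpaint = Inpainting
        self.vqa = VisualQuestionAnswering

    def inference(self, inputs):
        # Hypothetical flow: parse "image_path, WIDTHxHEIGHT", caption the
        # image, query VQA for style details, then inpaint outward step by
        # step until the requested resolution is reached.
        image_path, resolution = inputs.split(",")
        width, height = map(int, resolution.strip().split("x"))
        ...
```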
TaskMatrix needs the effort of the community! We welcome your contributions to add new and interesting features!
On the one hand, ChatGPT (or LLMs) serves as a general interface that provides a broad and diverse understanding of a wide range of topics. On the other hand, Foundation Models serve as domain experts by providing deep knowledge in specific domains. By leveraging both general and deep knowledge, we aim to build an AI that is capable of handling various tasks.
```bash
# clone the repo
git clone https://github.com/microsoft/TaskMatrix.git

# go to the directory
cd TaskMatrix

# create a new environment
conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

# prepare the basic environments
pip install -r requirements.txt
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install git+https://github.com/facebookresearch/segment-anything.git

# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start TaskMatrix!
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which
# Visual Foundation Models to use and where each will be loaded.
# The model and device are separated by an underscore '_', and different models are
# separated by a comma ','.
# The available Visual Foundation Models are listed in the table below.
# For example, to load ImageCaptioning on the CPU and Text2Image on cuda:0, use:
# "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

# Advice for one Tesla T4 15GB (Google Colab)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"

# Advice for four Tesla V100 32GB
python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,
    Inpainting_cuda:0,ImageCaptioning_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
```
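To make the `--load` format concrete, the sketch below shows how a string like `"ImageCaptioning_cpu,Text2Image_cuda:0"` decomposes into (model, device) pairs. This illustrates the format only; it is not the repository's actual parsing code, and `parse_load` is a hypothetical helper.

```python
# Illustration of the --load string format; not TaskMatrix's own parser.
def parse_load(load: str) -> dict:
    """Split "Model_device,Model_device,..." into {model: device}."""
    pairs = {}
    for item in load.replace("\n", "").split(","):
        # Split from the right so device specs like "cuda:0" stay intact.
        model, device = item.strip().rsplit("_", 1)
        pairs[model] = device
    return pairs

print(parse_load("ImageCaptioning_cpu,Text2Image_cuda:0"))
# -> {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}
```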
Here we list the GPU memory usage of each visual foundation model; you can choose which ones to load:
Foundation Model | GPU Memory (MB) |
---|---|
ImageEditing | 3981 |
InstructPix2Pix | 2827 |
Text2Image | 3385 |
ImageCaptioning | 1209 |
Image2Canny | 0 |
CannyText2Image | 3531 |
Image2Line | 0 |
LineText2Image | 3529 |
Image2Hed | 0 |
HedText2Image | 3529 |
Image2Scribble | 0 |
ScribbleText2Image | 3531 |
Image2Pose | 0 |
PoseText2Image | 3529 |
Image2Seg | 919 |
SegText2Image | 3529 |
Image2Depth | 0 |
DepthText2Image | 3531 |
Image2Normal | 0 |
NormalText2Image | 3529 |
VisualQuestionAnswering | 1495 |
We appreciate the open source of the following projects:
Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, and BLIP.
For help or issues using TaskMatrix, please submit a GitHub issue.
For other communications, please contact Chenfei WU (chewu@microsoft.com) or Nan DUAN (nanduan@microsoft.com).
Trademarks: This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.