SkywalkerDarren/chatWeb

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.

License

MIT license

Folders and files

.gitignore
Dockerfile
LICENSE
ai.py
api.py
config.example.json
config.py
console.py
contents.py
docker-compose.yml
example.ipynb
main.py
readme.md
readme.zh.md
requirements.txt
run.sh
storage.py
webui.py

ChatWeb

English Doc | Chinese Doc

ChatWeb can crawl any webpage or extract text from PDF, DOCX, and TXT files, then generate an embedding-based summary. It can also answer your questions based on the content of the text. It is implemented with the GPT-3.5 chat API, the embedding API, and a vector database.

Basic Principle

The basic principle is similar to existing projects such as chatPDF and automated customer service AI.

Crawl web pages
Extract text content
Use GPT3.5's embedding API to generate vectors for each paragraph
Calculate the similarity score between each paragraph's vector and the entire text's vector to generate a summary
Store the vector-text mapping in a vector database
Generate keywords from user input
Generate a vector from the keywords
Use the vector database to perform a nearest neighbor search and return a list of the most similar texts
Use GPT3.5's chat API to design a prompt that answers the user's question based on the most similar texts in the list.

The idea is to extract relevant content from a large amount of text and then answer questions based on that content, which can achieve a similar effect to breaking through token limits.
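
The similarity and nearest neighbor steps above can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the 3-dimensional vectors are hand-written stand-ins for real embeddings, the fragment texts are placeholders, and using the mean of the paragraph vectors as the "entire text's vector" is an assumption, since the README does not say how that vector is computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean (assumed here as the whole-text vector)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def top_k(query_vec, fragments, k):
    """Return the k fragment texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        fragments,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs; real vectors would come from the embedding API.
fragments = [
    ("fragment A", [0.9, 0.1, 0.0]),
    ("fragment B", [0.2, 0.8, 0.1]),
    ("fragment C", [0.7, 0.0, 0.6]),
]

# Summary step: pick the fragment closest to the whole-text vector.
doc_vec = mean_vector([vec for _, vec in fragments])
summary = top_k(doc_vec, fragments, k=1)

# Question step: a hypothetical keyword vector retrieves the closest fragments.
keyword_vec = [0.1, 0.9, 0.0]
context = top_k(keyword_vec, fragments, k=2)
```

A vector database performs the same ranking, but with an index so that it does not have to score every fragment for each query.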

An improvement was made to generate vectors based on keywords rather than the user's question, which increases the accuracy of searching for relevant texts.

Getting Started

Manual installation:

Install Python 3
Download this repository by running `git clone https://github.com/SkywalkerDarren/chatWeb.git`
Navigate to the directory by running `cd chatWeb`
Copy `config.example.json` to `config.json`
Edit `config.json` and set `open_ai_key` to your OpenAI API key
Install dependencies by running `pip3 install -r requirements.txt`
Start the application by running `python3 main.py`
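
The copy-and-edit steps for `config.json` can also be scripted. The sketch below is a hypothetical helper, not part of the repository: `create_config` is an invented name, and the demo writes a stand-in `config.example.json` into a temporary directory because the real file's contents are not shown here. Only the `open_ai_key` setting is taken from the steps above.

```python
import json
import os
import shutil
import tempfile


def create_config(repo_dir, api_key):
    """Copy config.example.json to config.json (if missing) and set open_ai_key."""
    example_path = os.path.join(repo_dir, "config.example.json")
    config_path = os.path.join(repo_dir, "config.json")
    if not os.path.exists(config_path):
        shutil.copyfile(example_path, config_path)
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["open_ai_key"] = api_key
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return config


# Demo in a temporary directory with a stand-in example file (contents assumed).
with tempfile.TemporaryDirectory() as repo_dir:
    with open(os.path.join(repo_dir, "config.example.json"), "w", encoding="utf-8") as f:
        json.dump({"open_ai_key": "", "mode": "console"}, f)
    config = create_config(repo_dir, "sk-your-key-here")
    with open(os.path.join(repo_dir, "config.json"), encoding="utf-8") as f:
        saved = json.load(f)
```

In a real checkout, `repo_dir` would be the `chatWeb` directory created by `git clone`.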

Docker:

If you prefer, you can also run this project using Docker:

Build the container with `docker-compose build` (needed only once if you are not planning to contribute to this repo).
Copy `config.example.json` to `config.json` and fill in the settings you need. The example config already works with Docker without changes. If `OPEN_AI_KEY` is not set in your environment variables, you can set the key in `config.json`, or enter it later when you run the app.
Run the container with `docker-compose up`.
Open the application in your browser: http://localhost:7860

Set language

Edit `config.json` and set `language` to `English` or another language.

Mode Selection

Edit `config.json` and set `mode` to `console`, `api`, or `webui` to choose the startup mode.
In `console` mode, type `/help` to view commands.
In `api` mode, ChatWeb exposes an API service to external clients. `api_port` and `api_host` can be set in `config.json`.
In `webui` mode, ChatWeb serves a web user interface. `webui_port` can be set in `config.json`; by default the interface is served at http://127.0.0.1:7860.

Stream Mode

Edit `config.json` and set `use_stream` to `true`.

Setting the Temperature

Edit `config.json` and set `temperature` to a value between 0 and 1.
The smaller the value, the more conservative and stable the response will be. The larger the value, the more daring the response may be, possibly resulting in "hallucinations."
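
As a rough illustration of the constraints described so far, a small check like the one below could catch a mistyped `mode` or an out-of-range `temperature` before startup. This is a hypothetical helper; whether ChatWeb performs any such validation itself is not stated in this README, and `validate_config` is an invented name.

```python
VALID_MODES = {"console", "api", "webui"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    mode = config.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")

    temperature = config.get("temperature")
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if not is_number or not 0 <= temperature <= 1:
        problems.append(f"temperature must be a number between 0 and 1, got {temperature!r}")

    use_stream = config.get("use_stream")
    if not isinstance(use_stream, bool):
        problems.append(f"use_stream must be true or false, got {use_stream!r}")

    return problems


good = validate_config({"mode": "webui", "temperature": 0.2, "use_stream": True})
bad = validate_config({"mode": "gui", "temperature": 1.5, "use_stream": "yes"})
```

Here `good` is an empty list, while `bad` contains one message per invalid setting.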

OpenAI Proxy Settings

Edit `config.json` and add `open_ai_proxy` with your proxy address, for example:

    "open_ai_proxy": {
      "http": "socks5://127.0.0.1:1081",
      "https": "socks5://127.0.0.1:1081"
    }

Install PostgreSQL (Optional)

Edit `config.json` and set `use_postgres` to `true`.
Install PostgreSQL.
- The default SQL address is `postgresql://localhost:5432/mydb`, or you can set it in `config.json`.
Install the pgvector plugin.

Compile and install the extension (supports Postgres 11+).

    git clone --branch v0.4.0 https://github.com/pgvector/pgvector.git
    cd pgvector
    make
    make install # may need sudo

Then load it in the database you want to use it in

CREATE EXTENSION vector;

Install the dependency with pip: `pip3 install psycopg2`
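
Once the extension is loaded, a nearest neighbor lookup with pgvector could look roughly like the sketch below. The table name `fragments`, its columns, and the helper `to_vector_literal` are hypothetical and chosen only for illustration; the schema ChatWeb actually creates is not described here. The `vector(n)` column type and the `<=>` cosine distance operator are pgvector features.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical schema: the dimension 3 is a toy value, not a real embedding size.
CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS fragments ("
    "id bigserial PRIMARY KEY, content text, embedding vector(3))"
)

# Order by cosine distance to the query vector and keep the closest rows.
SEARCH_SQL = "SELECT content FROM fragments ORDER BY embedding <=> %s::vector LIMIT %s"

query_literal = to_vector_literal([0.1, 0.9, 0])

# With a running database, the query could be executed through psycopg2 like this:
#
#   import psycopg2
#   conn = psycopg2.connect("postgresql://localhost:5432/mydb")
#   with conn, conn.cursor() as cur:
#       cur.execute(CREATE_TABLE_SQL)
#       cur.execute(SEARCH_SQL, (query_literal, 5))
#       rows = cur.fetchall()
```

The database part is left as comments because it needs a live PostgreSQL instance with pgvector installed.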

Example

    Please enter the link to the article or the file path of the PDF/TXT/DOCX document: https://gutenberg.ca/ebooks/hemingwaye-oldmanandthesea/hemingwaye-oldmanandthesea-00-e.html
    Please wait for 10 seconds until the webpage finishes loading.
    The article has been retrieved, and the number of text fragments is: 663
    ...
    =====================================
    Query fragments used tokens: 7219, cost:$0.0028876
    Query fragments used tokens: 7250, cost:$0.0029000000000000002
    Query fragments used tokens: 7188, cost:$0.0028752
    Query fragments used tokens: 7177, cost:$0.0028708
    Query fragments used tokens: 2378, cost:$0.0009512000000000001
    Embeddings have been created with 663 embeddings, using 31212 tokens, costing$0.0124848
    The embeddings have been saved.
    =====================================
    Please enter your query (/help to view commands):
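
The figures in this log are consistent with a flat rate of $0.0004 per 1,000 tokens. That rate is inferred from the numbers above, not taken from any pricing documentation, so treat it as an assumption. The sketch below reproduces the arithmetic with `Decimal`, which avoids trailing digits such as `0.0029000000000000002` that look like binary floating-point rounding in the original output.

```python
from decimal import Decimal

# Rate inferred from the example log (assumption): USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = Decimal("0.0004")


def embedding_cost(tokens):
    """Cost in USD for embedding the given number of tokens at the inferred rate."""
    return PRICE_PER_1K_TOKENS * tokens / 1000


# Token counts of the five batches reported in the log.
batches = [7219, 7250, 7188, 7177, 2378]

batch_costs = [embedding_cost(tokens) for tokens in batches]
total_tokens = sum(batches)
total_cost = embedding_cost(total_tokens)
```

The batch token counts add up to the 31212 tokens reported in the log, and the total cost matches the reported $0.0124848.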

TODO

Support for pdf/txt/docx files
Support for in-memory storage without a database (faiss)
Support for Stream
Support for API
Support for proxies
Add Colab support
Add language support
Support for temperature
Support for webui
Other features that have not been thought of yet
