Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
🚨🚨 NEW VERSION OF LOCALGPT IS OUT ON THE LOCALGPT-V2 BRANCH.
You can run localGPT on a pre-configured Virtual Machine. Make sure to use the code PromptEngineering to get 50% off. I will get a small commission!
LocalGPT is an open-source initiative that allows you to converse with your documents without compromising your privacy. With everything running locally, you can be assured that no data ever leaves your computer. Dive into the world of secure, local document interactions with LocalGPT.
- Utmost Privacy: Your data remains on your computer, ensuring 100% security.
- Versatile Model Support: Seamlessly integrate a variety of open-source models, including HF, GPTQ, GGML, and GGUF.
- Diverse Embeddings: Choose from a range of open-source embeddings.
- Reuse Your LLM: Once downloaded, reuse your LLM without the need for repeated downloads.
- Chat History: Remembers your previous conversations (in a session).
- API: LocalGPT has an API that you can use for building RAG Applications.
- Graphical Interface: LocalGPT comes with two GUIs, one uses the API and the other is standalone (based on streamlit).
- GPU, CPU, HPU & MPS Support: Supports multiple platforms out of the box. Chat with your data using CUDA, CPU, HPU (Intel® Gaudi®), or MPS, and more!
By selecting the right local models and the power of LangChain, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.
- ingest.py uses LangChain tools to parse the documents and create embeddings locally using InstructorEmbeddings. It then stores the result in a local vector database using the Chroma vector store.
- run_localGPT.py uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs (see the retrieval sketch below).
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
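As a rough illustration of what the retrieval step does, the following minimal sketch loads the local Chroma store created by ingest.py and runs a similarity search. It assumes the defaults described in this README (the DB folder and Instructor embeddings); the exact embedding model name is an assumption, so check EMBEDDING_MODEL_NAME in constants.py.

# Minimal retrieval sketch (illustrative, not the project's exact code).
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Assumed defaults: "DB" persist directory and the Instructor embedding model.
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

# The most similar chunks become the context handed to the local LLM.
docs = db.similarity_search("What rights does the First Amendment protect?", k=4)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:120])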
This project was inspired by the original privateGPT.
- 📥 Clone the repo using git:
git clone https://github.com/PromtEngineer/localGPT.git
- 🐍 Install conda for virtual environment management. Create and activate a new virtual environment.
conda create -n localGPT python=3.10.0
conda activate localGPT
- 🛠️ Install the dependencies using pip
To set up your environment to run the code, first install all requirements:
pip install -r requirements.txt
Installing LLAMA-CPP:
LocalGPT uses LlamaCpp-Python for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.
To run the quantized Llama3 model, ensure you have llama-cpp-python version 0.2.62 or higher installed.
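To confirm which llama-cpp-python version you have installed, a quick check like the one below can help (the thresholds are the ones stated above):

# Check the installed llama-cpp-python version.
import llama_cpp
print(llama_cpp.__version__)  # needs >=0.1.83 for GGUF, >=0.2.62 for quantized Llama3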
If you want to use BLAS or Metal with llama-cpp, you can set the appropriate flags:

For NVIDIA GPU support, use cuBLAS:

# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
For Apple Metal (M1/M2) support, use:

# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
For more details, please refer to llama-cpp.
Installing the required packages for GPU inference on NVIDIA GPUs, like gcc 11 and CUDA 11, may cause conflicts with other packages in your system. As an alternative to Conda, you can use Docker with the provided Dockerfile. It includes CUDA; your system just needs Docker, BuildKit, your NVIDIA GPU driver and the NVIDIA container toolkit.

Build it as docker build -t localgpt . (requires BuildKit). Docker BuildKit does not support GPU during docker build time right now, only during docker run.

Run it as docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt

For running the code on Intel® Gaudi® HPU, use the following Dockerfile - Dockerfile_hpu.
For testing, this repository comes with the Constitution of the USA as an example file to use.
Put your files in the SOURCE_DOCUMENTS folder. You can put multiple folders within the SOURCE_DOCUMENTS folder, and the code will recursively read your files.
LocalGPT currently supports the following file formats. LocalGPT uses LangChain for loading these file formats. The code in constants.py uses a DOCUMENT_MAP dictionary to map a file format to the corresponding loader. To add support for another file format, simply add the file format and the corresponding loader from LangChain to this dictionary.

DOCUMENT_MAP = {
    ".txt": TextLoader,
    ".md": TextLoader,
    ".py": TextLoader,
    ".pdf": PDFMinerLoader,
    ".csv": CSVLoader,
    ".xls": UnstructuredExcelLoader,
    ".xlsx": UnstructuredExcelLoader,
    ".docx": Docx2txtLoader,
    ".doc": Docx2txtLoader,
}
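For example, to add HTML support you might map the extension to one of LangChain's loaders. The loader below is an illustrative choice, not something already in constants.py:

# Hypothetical addition to DOCUMENT_MAP in constants.py for .html files.
from langchain.document_loaders import UnstructuredHTMLLoader

DOCUMENT_MAP[".html"] = UnstructuredHTMLLoader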
Run the following command to ingest all the data (if you have cuda set up on your system):
python ingest.py
You will see an output like this:
Use the device type argument to specify a given device. To run on cpu:
python ingest.py --device_type cpu
To run on M1/M2:
python ingest.py --device_type mps
Use --help for a full list of supported devices.
python ingest.py --help
This will create a new folder called DB and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database. If you want to start from an empty database, delete the DB folder and reingest your documents.
Note: When you run this for the first time, it will need internet access to download the embedding model (default: Instructor Embedding). In subsequent runs, no data will leave your local environment, and you can ingest data without an internet connection.
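If you want to guarantee that later runs stay offline, you can pre-fetch the embedding model while you still have internet access. The model name below is an assumption based on the Instructor Embedding default; use whatever EMBEDDING_MODEL_NAME is set to in constants.py.

# Pre-download the embedding model into the local HuggingFace cache.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="hkunlp/instructor-large")  # assumed default; adjust to your constants.py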
In order to chat with your documents, run the following command (by default, it will run on cuda).
python run_localGPT.py
You can also specify the device type just like with ingest.py:

python run_localGPT.py --device_type mps  # to run on Apple silicon

# To run on Intel® Gaudi® HPU
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # in constants.py
python run_localGPT.py --device_type hpu
This will load the ingested vector store and embedding model. You will be presented with a prompt:
> Enter a query:
After typing your question, hit enter. LocalGPT will take some time based on your hardware. You will get a response like the one below.
Once the answer is generated, you can ask another question without re-running the script; just wait for the prompt again.
Note: When you run this for the first time, it will need an internet connection to download the LLM (default: TheBloke/Llama-2-7b-Chat-GGUF). After that you can turn off your internet connection, and the script's inference will still work. No data gets out of your local environment.
Type exit to finish the script.
You can use the --show_sources flag with run_localGPT.py to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks. You can change the number of sources/chunks.
python run_localGPT.py --show_sources
Another option is to enable chat history. Note: This is disabled by default and can be enabled with the --use_history flag. The context window is limited, so keep in mind that enabling history will use part of it and might cause it to overflow.
python run_localGPT.py --use_history
You can store user questions and model responses with the --save_qa flag into a CSV file at /local_chat_history/qa_log.csv. Every interaction will be stored.
python run_localGPT.py --save_qa
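Since the log is a plain CSV file, you can inspect it with a few lines of Python. The exact column layout written by --save_qa is not documented here, so treat this as a sketch and adjust after looking at the file once:

# Print every saved question/answer row from the QA log.
import csv

with open("local_chat_history/qa_log.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)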
- Open constants.py in an editor of your choice and add the LLM you want to use. By default, the following model will be used:
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
- Open up a terminal and activate your python environment that contains the dependencies installed from requirements.txt.
- Navigate to the /LOCALGPT directory.
- Run the following command: python run_localGPT_API.py. The API should begin to run.
- Wait until everything has loaded in. You should see something like INFO:werkzeug:Press CTRL+C to quit.
- Open up a second terminal and activate the same python environment.
- Navigate to the /LOCALGPT/localGPTUI directory.
- Run the command python localGPTUI.py.
- Open up a web browser and go to the address http://localhost:5111/.
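With run_localGPT_API.py running, you can also call the API directly instead of going through the UI. The port, route name, and payload field below are assumptions for illustration only; check run_localGPT_API.py for the endpoints your version actually exposes.

# Hypothetical request against the local API; verify the route and fields in run_localGPT_API.py.
import requests

resp = requests.post(
    "http://localhost:5110/api/prompt_route",  # assumed port and route
    data={"user_prompt": "What is the term limit of the US president?"},
)
print(resp.json())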
To change the models you will need to set both MODEL_ID and MODEL_BASENAME.
- Open up constants.py in the editor of your choice.
- Change the MODEL_ID and MODEL_BASENAME. If you are using a quantized model (GGML, GPTQ, GGUF), you will need to provide MODEL_BASENAME. For unquantized models, set MODEL_BASENAME to NONE (see the example below).
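Putting the two settings together, the constants.py entries look like this (the GGUF pair matches the default shown earlier; the HF example reuses the MODEL_ID mentioned below):

# Quantized model (GGUF): both values are required.
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"

# Unquantized HF model: MODEL_BASENAME is set to None.
# MODEL_ID = "TheBloke/guanaco-7B-HF"
# MODEL_BASENAME = None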
There are a number of example models from HuggingFace that have already been tested: original trained models (ending with HF or having a .bin in their "Files and versions") and quantized models (ending with GPTQ or having a .no-act-order or .safetensors in their "Files and versions").
For models that end with HF or have a .bin inside their "Files and versions" on their HuggingFace page:
- Make sure you have a MODEL_ID selected. For example -> MODEL_ID = "TheBloke/guanaco-7B-HF"
- Go to the HuggingFace Repo
For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension inside their "Files and versions" on their HuggingFace page:
- Make sure you have a MODEL_ID selected. For example -> MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
- Go to the corresponding HuggingFace Repo and select "Files and versions".
- Pick one of the model names and set it as MODEL_BASENAME. For example -> MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
Follow the same steps for GGUF and GGML models.
Below is the VRAM requirement for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which use an additional 2GB-7GB of VRAM depending on the model.
Model Size (B) | float32 | float16 | GPTQ 8bit | GPTQ 4bit
---|---|---|---|---
7B | 28 GB | 14 GB | 7 GB - 9 GB | 3.5 GB - 5 GB
13B | 52 GB | 26 GB | 13 GB - 15 GB | 6.5 GB - 8 GB
32B | 130 GB | 65 GB | 32.5 GB - 35 GB | 16.25 GB - 19 GB
65B | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB | 32.6 GB - 35 GB
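The float32/float16 columns follow directly from bytes per parameter; a quick back-of-the-envelope check (the quantized columns carry extra overhead, which is why the table shows ranges):

# Rough VRAM estimate: parameters (in billions) x bytes per parameter ~= GB.
def vram_gb(params_billion, bytes_per_param):
    return params_billion * bytes_per_param

print(vram_gb(7, 4))    # float32 -> ~28 GB
print(vram_gb(7, 2))    # float16 -> ~14 GB
print(vram_gb(7, 1))    # 8-bit   -> ~7 GB plus quantization overhead
print(vram_gb(7, 0.5))  # 4-bit   -> ~3.5 GB plus quantization overhead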
To use this software, you must have Python 3.10 or later installed. Some of the dependencies will not compile with earlier versions of Python.
If you encounter an error while building a wheel during the pip install process, you may need to install a C++ compiler on your computer.
To install a C++ compiler on Windows 10/11, follow these steps:
- Install Visual Studio 2022.
- Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
- Download the MinGW installer from the MinGW website.
- Run the installer and select the "gcc" component.
Follow this page to install NVIDIA Drivers.
This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and Vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model so that has the original Llama license.
Torch not compatible with CUDA enabled

- Get the CUDA version:
nvcc --version
nvidia-smi
- Try installing PyTorch depending on your CUDA version:
conda install -c pytorch torchvision cudatoolkit=10.1 pytorch
- If it doesn't work, try reinstalling:
pip uninstall torch
pip cache purge
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
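Before re-running the scripts, a quick way to confirm that your installed torch build actually sees the GPU:

# Check that torch was built with CUDA and can see a GPU.
import torch

print(torch.cuda.is_available())  # True if a CUDA-enabled build found a GPU
print(torch.version.cuda)         # CUDA version the torch build targets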
pip install h5py
pip install typing-extensions
pip install wheel
- Try re-installing:
conda uninstall tokenizers transformers
pip install transformers