
How To Create Your Own AI Chatbot Server With Raspberry Pi 4

How-To

By Les Pounder

Using a Pi 4 with 8GB of RAM, you can create a ChatGPT-like server based on LLaMA.


We’ve shown previously that you can run ChatGPT on a Raspberry Pi, but the catch is that the Pi is just providing the client side and then sending all your prompts to someone else’s powerful server in the cloud. However, it’s possible to create a similar AI chatbot experience that runs locally on an 8GB Raspberry Pi and uses the same kind of LLaMA language models that power AI on Facebook and other services.

The heart of this project is Georgi Gerganov’s llama.cpp. Written in an evening, this C/C++ implementation of LLaMA inference is fast enough for general use and easy to install. It runs on Mac and Linux machines and, in this how-to, I’ll tweak Gerganov’s installation process so that the models can be run on a Raspberry Pi 4. If you want a faster chatbot and have a computer with an RTX 3000-series or faster GPU, check out our article on how to run a ChatGPT-like bot on your PC.

Managing Expectations

Before you head into this project, I need to manage your expectations. LLaMA on the Raspberry Pi 4 is slow. Loading a chat prompt can take minutes, and responses to questions can take just as long. If speed is what you crave, use a Linux desktop or laptop. This is more of a fun project than a mission-critical use case.

For This Project You Will Need

  • Raspberry Pi 4 8GB
  • PC with 16GB of RAM running Linux
  • 16GB or larger USB drive formatted as NTFS

Setting Up LLaMA 7B Models Using A Linux PC

The first part of the process is to set up llama.cpp on a Linux PC, download the LLaMA 7B model, convert it, and then copy the files to a USB drive. We need the Linux PC’s extra power to convert the model, as the 8GB of RAM in a Raspberry Pi is not enough.

1. On your Linux PC, open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

2. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

3. Install a series of Python modules. These modules will work with the model to create a chatbot.

python3 -m pip install torch numpy sentencepiece
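
To confirm the modules are importable before going further, a quick one-liner (entirely optional) will fail loudly if anything is missing:

# Optional check: prints the installed PyTorch version if all three modules import
python3 -c "import torch, numpy, sentencepiece; print(torch.__version__)"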

4. Ensure that you have g++ and build-essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

5. In the terminal, change directory to llama.cpp.

cd llama.cpp

6. Build the project files. Press Enter to run.

make
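
If the build succeeds, the compiled binaries land in the repository root. Exact names and flags vary between llama.cpp versions, but at the time of writing you can sanity-check the build like this:

# Should print llama.cpp's usage text if the main binary built correctly
./main --help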

7. Download the LLaMA 7B torrent using the link below. I used qBittorrent to download the model.

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

8. Refine the download so that just the 7B and tokenizer files are downloaded. The other folders contain larger models that weigh in at hundreds of gigabytes.

Llama on Pi (Image credit: Tom's Hardware)

9. Copy the 7B folder and the tokenizer files into llama.cpp/models/.
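
Assuming the standard LLaMA release layout, the models folder should now look something like this, with the weights in 7B and the tokenizer files alongside it:

# Check the files are in place; names are from the standard LLaMA 7B release
ls -R models/
# Expected: 7B/checklist.chk, 7B/consolidated.00.pth, 7B/params.json,
# plus tokenizer.model and tokenizer_checklist.chk in models/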

10. Open a terminal and go to the llama.cpp folder. This should be in your home directory.

cd llama.cpp

11. Convert the 7B model to ggml FP16 format. Depending on your PC, this can take a while. This step alone is why we need 16GB of RAM: it loads the entire 13GB models/7B/consolidated.00.pth file into RAM as a PyTorch model. Trying this step on an 8GB Raspberry Pi 4 will cause an illegal instruction error.

python3 convert-pth-to-ggml.py models/7B/ 1
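
When the script finishes, it writes the converted weights next to the originals; with this version of the script the output file is named ggml-model-f16.bin and is roughly the same size as the 13GB .pth file:

# Confirm the FP16 conversion produced an output file (and check its size)
ls -lh models/7B/ggml-model-f16.bin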

12. Quantize the model to 4 bits. This will reduce the size of the model.

python3 quantize.py 7B
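
If your checkout of llama.cpp has no quantize.py wrapper (a reader ran into exactly this, as noted in the comments below), the compiled quantize binary can be called directly; the trailing 2 selects 4-bit (q4_0) quantization in that version:

# Direct invocation for versions without the quantize.py helper
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2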

13. Copy the contents of the models folder to the USB drive.
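
On most desktop Linux distributions the drive automounts under /media; the mount point below is an assumption, so substitute your drive’s actual label:

# Copy the converted models to the USB drive (example mount point)
cp -rv models/ /media/$USER/USBDRIVE/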

Running LLaMA on Raspberry Pi 4

Llama on Pi (Image credit: Tom's Hardware)

In this final section, I repeat the llama.cpp setup on the Raspberry Pi 4, then copy the models across using a USB drive. I then load an interactive chat session and ask “Bob” a series of questions. Just don’t ask it to write any Python code. Step 9 in this process can be run on the Raspberry Pi 4 or on the Linux PC.

1. Boot your Raspberry Pi 4 to the desktop.

2. Open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

3. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

4. Install a series of Python modules. These modules will work with the model to create a chatbot.

python3 -m pip install torch numpy sentencepiece

5. Ensure that you have g++ and build-essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

6. In the terminal, change directory to llama.cpp.

cd llama.cpp

7. Build the project files. Press Enter to run.

make

8. Insert the USB drive and copy the files to llama.cpp/models/. This will overwrite any files in the models directory.
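
On Raspberry Pi OS the drive typically automounts under /media/pi; as before, the mount point is an assumption, so check yours with lsblk if unsure:

# Copy the models from the USB drive into llama.cpp (example mount point)
cp -rv /media/pi/USBDRIVE/models/ ~/llama.cpp/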

9. Start an interactive chat session with “Bob”. Here is where a little patience is required. Even though the 7B model is lighter than other models, it is still a rather weighty model for the Raspberry Pi to digest. Loading the model can take a few minutes.

./chat.sh
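
If chat.sh isn’t in the repository root (one reader found it at examples/chat.sh instead), you can start an equivalent session by calling the main binary directly. This mirrors the interactive example in llama.cpp’s README at the time of writing, so treat the exact flags as version-dependent:

# Interactive session: -i interactive mode, -r reverse prompt, -f prompt file
./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt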

10. Ask Bob a question and press Enter. I asked it to tell me about Jean-Luc Picard from Star Trek: The Next Generation. To exit, press CTRL + C.

Llama on Pi (Image credit: Tom's Hardware)
Les Pounder

Les Pounder is an associate editor at Tom's Hardware. He is a creative technologist and for seven years has created projects to educate and inspire minds both young and old. He has worked with the Raspberry Pi Foundation to write and deliver their teacher training program "Picademy".

  • bit_user
    For This Project You Will Need
    Raspberry Pi 4 8GB
    PC with 16GB of RAM running Linux
    16GB or larger USB drive formatted as NTFS
    I'm sure the Pi will also need to be running the 64-bit version of the OS. I don't know if people with the 32-bit OS would've gotten automatically upgraded, but this is probably still worth pointing out.

    As for the USB drive using NTFS, this surprised me. IMO, the only reason to use NTFS is if you need > 4 GB files and require accessibility from a Windows PC. Otherwise, I'd use Linux-native filesystems XFS or BTRFS for > 4 GB file support.
    Reply
  • Alex Fraundorf
    I was able to get this working, but I had to use some different directions. I am brand new to AI and a noob with python, so this could have easily been caused by something I did wrong.

    Step 7: I was not able to download the torrent. I ended up following these instructions from https://github.com/juncongmoo/pyllama:
    pip install pyllama -U
    pip install transformers
    python3 -m llama.download --model_size 7B
    Note: I use Linux Mint and had to turn off my firewall for the previous step. Otherwise it hung at 12.7GB indefinitely.

    Step 12: There was no quantize.py file in my llama.cpp directory. I read through the README file and found this, which worked:
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    I am running this on my Linux Mint machine, not a Raspberry Pi, so I was able to skip all of those steps.

    To open the chat, I used the command ./examples/chat.sh

    I hope these notes help anyone else who gets stuck.
    Thank you for this tutorial. I am looking forward to playing with the chat.
    Reply
  • Khoo
    Hi, seeing how you worked around the problems to get a solution, I'll take the liberty of asking you for help, if you don't mind. I am stuck at step 11. When I enter the command to convert the 7B model files to ggml FP16 format, it tells me that there is no such file or directory in llama.cpp where the 7B model files are located.
    Reply