
How To Create Your Own AI Chatbot Server With Raspberry Pi 4

How-To

By Les Pounder

Using a Pi 4 with 8GB of RAM, you can create a ChatGPT-like server based on LLaMA.


We’ve shown previously that you can run ChatGPT on a Raspberry Pi, but the catch is that the Pi is just providing the client side and then sending all your prompts to someone else’s powerful server in the cloud. However, it’s possible to create a similar AI chatbot experience that runs locally on an 8GB Raspberry Pi and uses the same kind of LLaMA language models that power AI on Facebook and other services.

The heart of this project is Georgi Gerganov’s llama.cpp. Written in an evening, this C/C++ implementation of LLaMA inference is fast enough for general use and easy to install. It runs on Mac and Linux machines and, in this how-to, I’ll tweak Gerganov’s installation process so that the models can be run on a Raspberry Pi 4. If you want a faster chatbot and have a computer with an RTX 3000-series or faster GPU, check out our article on how to run a ChatGPT-like bot on your PC.

Managing Expectations

Before you head into this project, I need to manage your expectations. LLaMA on the Raspberry Pi 4 is slow. Loading a chat prompt can take minutes, and responses to questions can take just as long. If speed is what you crave, use a Linux desktop or laptop. This is more of a fun project than a mission-critical use case.

For This Project You Will Need

  • Raspberry Pi 4 8GB
  • PC with 16GB of RAM running Linux
  • 16GB or larger USB drive formatted as NTFS

Setting Up LLaMA 7B Models Using A Linux PC

The first part of the process is to set up llama.cpp on a Linux PC, download the LLaMA 7B model, convert it, and then copy the files to a USB drive. We need the Linux PC’s extra power to convert the model, as the 8GB of RAM in a Raspberry Pi is not enough.

1. On your Linux PC, open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

2. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

3. Install a series of Python modules. These modules will work with the model to create a chatbot.

python3 -m pip install torch numpy sentencepiece
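
To confirm the modules are importable before going further, a quick one-liner (entirely optional) will fail loudly if anything is missing:

# Optional check: prints the installed PyTorch version if all three modules import
python3 -c "import torch, numpy, sentencepiece; print(torch.__version__)"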

4. Ensure that you have g++ and build-essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

5. In the terminal, change directory to llama.cpp.

cd llama.cpp

6. Build the project files. Press Enter to run.

make
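
If the build succeeds, the compiled binaries land in the repository root. Exact names and flags vary between llama.cpp versions, but at the time of writing you can sanity-check the build like this:

# Should print llama.cpp's usage text if the main binary built correctly
./main --help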

7. Download the LLaMA 7B torrent using the link below. I used qBittorrent to download the model.

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

8. Refine the download so that just the 7B and tokenizer files are downloaded. The other folders contain larger models that weigh in at hundreds of gigabytes.

Llama on Pi (Image credit: Tom's Hardware)

9. Copy the 7B folder and the tokenizer files into llama.cpp/models/.
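
Assuming the standard LLaMA release layout, the models folder should now look something like this, with the weights in 7B and the tokenizer files alongside it:

# Check the files are in place; names are from the standard LLaMA 7B release
ls -R models/
# Expected: 7B/checklist.chk, 7B/consolidated.00.pth, 7B/params.json,
# plus tokenizer.model and tokenizer_checklist.chk in models/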

10. Open a terminal and go to the llama.cpp folder. This should be in your home directory.

cd llama.cpp

11. Convert the 7B model to ggml FP16 format. Depending on your PC, this can take a while. This step alone is why we need 16GB of RAM: it loads the entire 13GB models/7B/consolidated.00.pth file into RAM as a PyTorch model. Trying this step on an 8GB Raspberry Pi 4 will cause an illegal instruction error.

python3 convert-pth-to-ggml.py models/7B/ 1
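
When the script finishes, it writes the converted weights next to the originals; with this version of the script the output file is named ggml-model-f16.bin and is roughly the same size as the 13GB .pth file:

# Confirm the FP16 conversion produced an output file (and check its size)
ls -lh models/7B/ggml-model-f16.bin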

12. Quantize the model to 4 bits. This will reduce the size of the model.

python3 quantize.py 7B
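
If your checkout of llama.cpp has no quantize.py wrapper (a reader ran into exactly this, as noted in the comments below), the compiled quantize binary can be called directly; the trailing 2 selects 4-bit (q4_0) quantization in that version:

# Direct invocation for versions without the quantize.py helper
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2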

13. Copy the contents of the models folder to the USB drive.
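
On most desktop Linux distributions the drive automounts under /media; the mount point below is an assumption, so substitute your drive’s actual label:

# Copy the converted models to the USB drive (example mount point)
cp -rv models/ /media/$USER/USBDRIVE/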

Running LLaMA on Raspberry Pi 4

Llama on Pi (Image credit: Tom's Hardware)

In this final section, I repeat the llama.cpp setup on the Raspberry Pi 4, then copy the models across using a USB drive. I then load an interactive chat session and ask “Bob” a series of questions. Just don’t ask it to write any Python code. Step 9 in this process can be run on the Raspberry Pi 4 or on the Linux PC.

1. Boot your Raspberry Pi 4 to the desktop.

2. Open a terminal and ensure that git is installed.

sudo apt update && sudo apt install git

3. Use git to clone the repository.

git clone https://github.com/ggerganov/llama.cpp

4. Install a series of Python modules. These modules will work with the model to create a chatbot.

python3 -m pip install torch numpy sentencepiece

5. Ensure that you have g++ and build-essential installed. These are needed to build C applications.

sudo apt install g++ build-essential

6. In the terminal, change directory to llama.cpp.

cd llama.cpp

7. Build the project files. Press Enter to run.

make

8. Insert the USB drive and copy the files to llama.cpp/models/. This will overwrite any files in the models directory.
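
On Raspberry Pi OS the drive typically automounts under /media/pi; as before, the mount point is an assumption, so check yours with lsblk if unsure:

# Copy the models from the USB drive into llama.cpp (example mount point)
cp -rv /media/pi/USBDRIVE/models/ ~/llama.cpp/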

9. Start an interactive chat session with “Bob”. Here is where a little patience is required. Even though the 7B model is lighter than other models, it is still a rather weighty model for the Raspberry Pi to digest. Loading the model can take a few minutes.

./chat.sh
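
If chat.sh isn’t in the repository root (one reader found it at examples/chat.sh instead), you can start an equivalent session by calling the main binary directly. This mirrors the interactive example in llama.cpp’s README at the time of writing, so treat the exact flags as version-dependent:

# Interactive session: -i interactive mode, -r reverse prompt, -f prompt file
./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt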

10. Ask Bob a question and press Enter. I asked it to tell me about Jean-Luc Picard from Star Trek: The Next Generation. To exit, press CTRL + C.

Llama on Pi (Image credit: Tom's Hardware)
Les Pounder

Les Pounder is an associate editor at Tom's Hardware. He is a creative technologist and for seven years has created projects to educate and inspire minds both young and old. He has worked with the Raspberry Pi Foundation to write and deliver their teacher training program "Picademy".

  • bit_user
    For This Project You Will Need
    Raspberry Pi 4 8GB
    PC with 16GB of RAM running Linux
    16GB or larger USB drive formatted as NTFS
    I'm sure the Pi will also need to be running the 64-bit version of the OS. I don't know if people with the 32-bit OS would've gotten automatically upgraded, but this is probably still worth pointing out.

    As for the USB drive using NTFS, this surprised me. IMO, the only reason to use NTFS is if you need > 4 GB files and require accessibility from a Windows PC. Otherwise, I'd use Linux-native filesystems XFS or BTRFS for > 4 GB file support.
    Reply
  • Alex Fraundorf
    I was able to get this working, but I had to use some different directions. I am brand new to AI and a noob with python, so this could have easily been caused by something I did wrong.

    Step 7: I was not able to download the torrent. I ended up following these instructions from https://github.com/juncongmoo/pyllama:
    pip install pyllama -U
    pip install transformers
    python3 -m llama.download --model_size 7B
    Note: I use Linux Mint and had to turn off my firewall for the previous step. Otherwise it hung at 12.7GB indefinitely.

    Step 12: There was no quantize.py file in my llama.cpp directory. I read through the README file and found this, which worked:
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    I am running this on my Linux Mint machine, not a Raspberry Pi, so I was able to skip all of those steps.

    To open the chat, I used the command ./examples/chat.sh

    I hope these notes help anyone else who gets stuck.
    Thank you for this tutorial. I am looking forward to playing with the chat.
    Reply
  • Khoo
    Hi, seeing how you worked around the problems to get a solution, I'll take the liberty of asking you for help, if you don't mind. I am stuck at step 11. When I enter the command to convert the 7B model files to ggml FP16 format, it tells me that there is no such file or directory in llama.cpp where the 7B model files are located.
    Reply