Real Time Inference on Raspberry Pi 4 and 5 (40 fps!)

Created On: Feb 08, 2022 | Last Updated: Sep 30, 2025 | Last Verified: Nov 05, 2024

Author: Tristan Rice

PyTorch has out-of-the-box support for Raspberry Pi 4 and 5. This tutorial will guide you through setting up a Raspberry Pi to run PyTorch and running a MobileNet v2 classification model in real time (30-40 fps) on the CPU.

This was all tested with a Raspberry Pi 4 Model B 4GB, but it should work with the 2GB variant as well as on the 3B with reduced performance.

https://user-images.githubusercontent.com/909104/153093710-bc736b6f-69d9-4a50-a3e8-9f2b2c9e04fd.gif

Prerequisites

To follow this tutorial you’ll need a Raspberry Pi 4 or 5, a camera for it, and all the other standard accessories.

Raspberry Pi Setup

PyTorch only provides pip packages for 64-bit Arm (aarch64), so you’ll need to install a 64-bit version of the OS on your Raspberry Pi.

You’ll need to install the official rpi-imager to install Raspberry Pi OS.

32-bit Raspberry Pi OS will not work.
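
If you’re not sure which image you ended up installing, you can confirm that the OS is 64-bit from a terminal on the Pi. A minimal check using Python’s standard library (it should print aarch64, which is what the PyTorch pip wheels require):

import platform

# should print "aarch64" on a 64-bit Raspberry Pi OS install
print(platform.machine())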

https://user-images.githubusercontent.com/909104/152866212-36ce29b1-aba6-4924-8ae6-0a283f1fca14.gif

Installation will take at least a few minutes, depending on your internet speed and SD card speed. Once it’s done it should look like:

https://user-images.githubusercontent.com/909104/152867425-c005cff0-5f3f-47f1-922d-e0bbb541cd25.png

Time to put your SD card in your Raspberry Pi, connect the camera, and boot it up.

https://user-images.githubusercontent.com/909104/152869862-c239c980-b089-4bd5-84eb-0a1e5cf22df2.png

Raspberry Pi 4 Config

If you’re using a Raspberry Pi 4, you’ll need some additional config changes. These changes are not required on Raspberry Pi 5.

Once the OS boots and you complete the initial setup, you’ll need to edit the /boot/config.txt file to enable the camera.

# This enables the extended features such as the camera.
start_x=1

# This needs to be at least 128M for the camera processing, if it's bigger you can just leave it as is.
gpu_mem=128

And then reboot.

Installing PyTorch and picamera2

PyTorch and all the other libraries we need have ARM 64-bit/aarch64 variants so you can just install them via pip and have it work like any other Linux system.

$ sudo apt install -y python3-picamera2 python3-libcamera
$ pip install torch torchvision --break-system-packages
https://user-images.githubusercontent.com/909104/152874260-95a7a8bd-0f9b-438a-9c0b-5b67729e233f.png

We can now check that everything installed correctly:

$ python -c "import torch; print(torch.__version__)"
https://user-images.githubusercontent.com/909104/152874271-d7057c2d-80fd-4761-aed4-df6c8b7aa99f.png
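
Since we’ll rely on the qnnpack quantized backend later, it’s also worth confirming that it is available in your build. A quick sketch:

import torch

# 'qnnpack' should appear in this list on aarch64 builds of PyTorch
print(torch.backends.quantized.supported_engines)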

Video Capture

First, test that the camera is working by running libcamera-hello in a terminal.

For video capture we’re going to be using picamera2 to capture the video frames.

The model we’re using (MobileNetV2) takes in image sizes of 224x224, so we can request that directly from picamera2 at 36 fps. We’re targeting 30 fps for the model, but we request a slightly higher framerate so there are always enough frames.

from picamera2 import Picamera2

picam2 = Picamera2()
# print available sensor modes
print(picam2.sensor_modes)

config = picam2.create_still_configuration(main={
    "size": (224, 224),
    "format": "BGR888",
}, display="main")
picam2.configure(config)
picam2.set_controls({"FrameRate": 36})
picam2.start()

To capture the frames we can call capture_image to return a PIL.Image object that we can use with PyTorch.

# read frame
image = picam2.capture_image("main")

# show frame for testing
image.show()

This data reading and processing takes about 3.5 ms.
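
If you want to measure the capture step on your own device, a minimal timing sketch (assumes picam2 has been configured and started as above):

import time

start = time.perf_counter()
image = picam2.capture_image("main")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"capture took {elapsed_ms:.1f} ms")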

Image Preprocessing

We need to take the frames and transform them into the format the model expects. This is the same processing as you would do on any machine with the standard torchvision transforms.

from torchvision import transforms

preprocess = transforms.Compose([
    # convert the frame to a CHW torch tensor for training
    transforms.ToTensor(),
    # normalize the colors to the range that mobilenet_v2/3 expect
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(image)

# The model can handle multiple images simultaneously so we need to add an
# empty dimension for the batch.
# [3, 224, 224] -> [1, 3, 224, 224]
input_batch = input_tensor.unsqueeze(0)

Model Choices

There are a number of models to choose from, with different performance characteristics. Not all models provide a qnnpack pretrained variant, so for testing purposes you should choose one that does; but if you train and quantize your own model you can use any of them.

We’re using mobilenet_v2 for this tutorial since it has good performance and accuracy.

Raspberry Pi 4 Benchmark Results:

Model              | FPS  | Total Time (ms/frame) | Model Time (ms/frame) | qnnpack Pretrained
mobilenet_v2       | 33.7 | 29.7                  | 26.4                  | True
mobilenet_v3_large | 29.3 | 34.1                  | 30.7                  | True
resnet18           | 9.2  | 109.0                 | 100.3                 | False
resnet50           | 4.3  | 233.9                 | 225.2                 | False
resnext101_32x8d   | 1.1  | 892.5                 | 885.3                 | False
inception_v3       | 4.9  | 204.1                 | 195.5                 | False
googlenet          | 7.4  | 135.3                 | 132.0                 | False
shufflenet_v2_x0_5 | 46.7 | 21.4                  | 18.2                  | False
shufflenet_v2_x1_0 | 24.4 | 41.0                  | 37.7                  | False
shufflenet_v2_x1_5 | 16.8 | 59.6                  | 56.3                  | False
shufflenet_v2_x2_0 | 11.6 | 86.3                  | 82.7                  | False
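
The per-frame model times above can be roughly reproduced by timing the forward pass in a loop. A sketch, assuming net and input_batch are set up as described in the following sections:

import time

import torch

with torch.no_grad():
    # warm up so one-time startup costs don't skew the measurement
    for _ in range(10):
        net(input_batch)

    n_iters = 100
    start = time.perf_counter()
    for _ in range(n_iters):
        net(input_batch)
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n_iters * 1000:.1f} ms/frame")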

MobileNetV2: Quantization and JIT

For optimal performance we want a model that’s quantized and fused. Quantized means that it does the computation using int8, which is much more performant than the standard float32 math. Fused means that consecutive operations have been merged into a more performant version where possible. Commonly, things like activations (ReLU) can be merged into the preceding layer (Conv2d) during inference.
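
If you quantize your own model instead of using the pretrained one below, fusion is typically done explicitly, for example with torch.ao.quantization.fuse_modules. A sketch on a hypothetical tiny model (the layer names "0", "1", "2" come from nn.Sequential):

import torch
from torch import nn
from torch.ao.quantization import fuse_modules

# hypothetical model, only to illustrate fusion
model = nn.Sequential(
    nn.Conv2d(3, 16, 3),   # "0"
    nn.BatchNorm2d(16),    # "1"
    nn.ReLU(),             # "2"
)
model.eval()  # fusing Conv2d + BatchNorm2d requires eval mode

# merge Conv2d + BatchNorm2d + ReLU into a single fused module
fused = fuse_modules(model, [["0", "1", "2"]])
print(fused)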

The aarch64 version of PyTorch requires using the qnnpack engine.

import torch

torch.backends.quantized.engine = 'qnnpack'

For this example we’ll use a prequantized and fused version of MobileNetV2 that’s provided out of the box by torchvision.

from torchvision import models

net = models.quantization.mobilenet_v2(pretrained=True, quantize=True)

We then want to JIT the model to reduce Python overhead and fuse any ops. JIT compilation takes us from ~20 fps to ~30 fps.

net = torch.jit.script(net)
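
Optionally, you can save the scripted model to disk so later runs skip the download and scripting step. A small sketch (the filename is just an example):

import torch

# save once ...
torch.jit.save(net, "mobilenet_v2_quantized.pt")
# ... and reload it on subsequent runs
net = torch.jit.load("mobilenet_v2_quantized.pt")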

Putting It Together

We can now put all the pieces together and run it:

import time

import torch
from torchvision import models, transforms

from picamera2 import Picamera2

torch.backends.quantized.engine = 'qnnpack'

picam2 = Picamera2()
# print available sensor modes
print(picam2.sensor_modes)

config = picam2.create_still_configuration(main={
    "size": (224, 224),
    "format": "BGR888",
}, display="main")
picam2.configure(config)
picam2.set_controls({"FrameRate": 36})
picam2.start()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

net = models.quantization.mobilenet_v2(pretrained=True, quantize=True)
# jit model to take it from ~20fps to ~30fps
net = torch.jit.script(net)

started = time.time()
last_logged = time.time()
frame_count = 0

with torch.no_grad():
    while True:
        # read frame
        image = picam2.capture_image("main")

        # preprocess
        input_tensor = preprocess(image)

        # create a mini-batch as expected by the model
        input_batch = input_tensor.unsqueeze(0)

        # run model
        output = net(input_batch)
        # do something with output ...
        print(output.argmax())

        # log model performance
        frame_count += 1
        now = time.time()
        if now - last_logged > 1:
            print(f"{frame_count / (now - last_logged)} fps")
            last_logged = now
            frame_count = 0

Running it shows that we’re hovering at ~30 fps on a Raspberry Pi 4 and ~41 fps on a Raspberry Pi 5.

https://user-images.githubusercontent.com/909104/152892609-7d115705-3ec9-4f8d-beed-a51711503a32.png

This is with all the default settings in Raspberry Pi OS. If you disable the UI and the other background services that are enabled by default, it’s more performant and stable.

If we check htop we see that we have almost 100% utilization.

https://user-images.githubusercontent.com/909104/152892630-f094b84b-19ba-48f6-8632-1b954abc59c7.png

To verify that it’s working end to end, we can compute the probabilities of the classes and use the ImageNet class labels to print the detections.

top = list(enumerate(output[0].softmax(dim=0)))
top.sort(key=lambda x: x[1], reverse=True)
for idx, val in top[:10]:
    print(f"{val.item() * 100:.2f}% {classes[idx]}")
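
The classes list here is the 1000 ImageNet label strings. A minimal way to build it, assuming you’ve saved the labels to a local imagenet_classes.txt file with one label per line:

# hypothetical labels file with one ImageNet class name per line
with open("imagenet_classes.txt") as f:
    classes = [line.strip() for line in f]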

mobilenet_v3_large running in real time:

https://user-images.githubusercontent.com/909104/153093710-bc736b6f-69d9-4a50-a3e8-9f2b2c9e04fd.gif

Detecting an orange:

https://user-images.githubusercontent.com/909104/153092153-d9c08dfe-105b-408a-8e1e-295da8a78c19.jpg

Detecting a mug:

https://user-images.githubusercontent.com/909104/153092155-4b90002f-a0f3-4267-8d70-e713e7b4d5a0.jpg

Troubleshooting: Performance

PyTorch by default will use all of the cores available. If you have anything running in the background on the Raspberry Pi it may cause contention with the model inference, causing latency spikes. To alleviate this you can reduce the number of threads, which will reduce the peak latency at a small performance penalty.

torch.set_num_threads(2)

For shufflenet_v2_x1_5, using 2 threads instead of 4 threads increases best-case latency to 72 ms from 60 ms but eliminates the latency spikes of 128 ms.
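
To see whether you’re hitting latency spikes on your own setup, you can record per-frame model latency and look at the tail. A sketch, assuming net and input_batch from the main loop above:

import time

import torch

latencies = []
with torch.no_grad():
    for _ in range(200):
        start = time.perf_counter()
        net(input_batch)
        latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
print(f"median: {latencies[len(latencies) // 2]:.1f} ms")
print(f"worst:  {latencies[-1]:.1f} ms")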

Next Steps

You can create your own model or fine tune an existing one. If you fine tune one of the models from torchvision.models.quantized, most of the work to fuse and quantize has already been done for you, so you can directly deploy with good performance on a Raspberry Pi.
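
A fine-tuning setup might start out roughly like this: load the quantizable MobileNetV2 as a float model, swap the classifier head for your own number of classes, train it, and only then quantize and script it as shown above. A sketch with a hypothetical num_classes:

import torch
from torchvision import models

num_classes = 10  # hypothetical number of classes for your dataset

# load the quantizable MobileNetV2 as a float model for fine tuning
net = models.quantization.mobilenet_v2(pretrained=True, quantize=False)

# replace the classifier head for the new task
net.classifier[1] = torch.nn.Linear(net.classifier[1].in_features, num_classes)

# ... fine tune with your usual training loop, then fuse, quantize
# (for example with quantization aware training), and torch.jit.script
# the result before deploying it as shown earlier.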

See more: