Object Detection with OpenCV and YOLO
Teach your robot to recognize objects in real time using OpenCV and the YOLO neural network. Runs on a Raspberry Pi with a standard USB camera.
A robot that can see is far more capable than one that can only feel its way around with bump sensors. With a cheap USB camera and a few lines of Python, you can have your robot recognize people, cups, chairs, and dozens of other everyday objects — and tell you exactly where they are in the frame. This week we build a live object detector with OpenCV and YOLO that runs on a Raspberry Pi.
Classical CV vs. Deep-Learning Detection
Classical computer vision (the kind built into OpenCV for years) works by hand-written rules: find edges, match colors, detect blobs of a certain shape. It is fast and predictable, but brittle. A red-ball tracker tuned for your living room will fail the moment the lighting changes.
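To make that brittleness concrete, here is a minimal sketch of a hand-written color rule using only Python's standard colorsys module. The is_red helper, its thresholds, and the two sample pixels are illustrative assumptions, not values taken from a real tracker.

```python
import colorsys


def is_red(rgb, min_saturation=0.6, min_value=0.5):
    """Hand-written rule: hue near red, strongly saturated, reasonably bright."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    hue_is_red = hue < 0.05 or hue > 0.95  # hue wraps around at red
    return hue_is_red and saturation >= min_saturation and value >= min_value


# Hypothetical pixels sampled from the same red ball
ball_in_daylight = (220, 30, 30)
ball_in_shadow = (90, 12, 12)

print(is_red(ball_in_daylight))  # True
print(is_red(ball_in_shadow))    # False
```

The second pixel is still clearly red to a human eye, but its brightness falls below the tuned min_value, so the rule rejects it. You could loosen the threshold, but then other dark reddish things start to pass.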
Deep-learning object detection flips the approach. Instead of you writing the rules, a neural network learns them from thousands of labeled example images. The result generalizes far better — it recognizes a “cup” whether it is white, blue, in shadow, or half-hidden. The cost is more computation, which is why we care so much about model size on a small board like a Pi.
What YOLO Is
YOLO (“You Only Look Once”) is a family of real-time object detection networks. Its key trick is in the name: it looks at the whole image once and predicts every object in a single pass, rather than scanning the image region by region. That single-shot design is what makes it fast enough for live video.
For each object it finds, YOLO outputs three things:
- A bounding box — the rectangle around the object.
- A class label — what the object is (“person”, “cup”, “dog”).
- A confidence score — how sure the model is, from 0 to 1.
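With the Ultralytics package used later in this article, these three values are exposed on results[0].boxes as xyxy, cls, and conf, with class names available through model.names. The sketch below stores one detection in plain Python and turns its box into a position a robot could act on. The Detection class, the horizontal_offset helper, and the sample numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class label, e.g. "cup"
    confidence: float  # 0.0 to 1.0
    x1: float          # left edge of the bounding box, in pixels
    y1: float          # top edge
    x2: float          # right edge
    y2: float          # bottom edge

    @property
    def center(self):
        """Center of the bounding box as (x, y) in pixels."""
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def horizontal_offset(detection, frame_width):
    """-1.0 = left edge of the frame, 0.0 = centered, +1.0 = right edge."""
    center_x, _ = detection.center
    half_width = frame_width / 2.0
    return (center_x - half_width) / half_width


cup = Detection("cup", 0.87, x1=400, y1=200, x2=560, y2=360)
print(cup.center)                   # (480.0, 280.0)
print(horizontal_offset(cup, 640))  # 0.5
```

An offset of 0.5 means the cup sits halfway between the middle of a 640-pixel-wide frame and its right edge, which is the kind of number you could feed into a steering controller.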
The Easy Modern Path: Ultralytics
You no longer need to wrangle weights files and config by hand. The Ultralytics package wraps YOLO in a clean Python API. Install it together with OpenCV:
pip install ultralytics opencv-python
The first time you load a model, Ultralytics automatically downloads the pretrained weights for you. We will use yolov8n.pt — the “nano” model. It is the smallest and fastest of the YOLOv8 family, which makes it the right starting point for a Raspberry Pi.
A Complete Live Detector
This script opens a USB camera, runs YOLO on each frame, draws the results, and shows them in a window. Press q to quit.
import cv2
from ultralytics import YOLO

# Load the smallest pretrained model (downloads on first run)
model = YOLO('yolov8n.pt')

# Open the first camera. Change the index if you have several.
cap = cv2.VideoCapture(0)

# Lower the resolution to keep the Pi responsive
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

if not cap.isOpened():
    raise RuntimeError("Could not open camera. Try a different index.")

while True:
    ok, frame = cap.read()
    if not ok:
        print("Failed to grab frame")
        break

    # Run detection. conf=0.5 ignores low-confidence guesses.
    results = model(frame, conf=0.5)

    # results[0].plot() returns the frame with boxes + labels drawn
    annotated = results[0].plot()

    cv2.imshow("YOLO Object Detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
The conf=0.5 argument is your confidence threshold — detections the model is less than 50% sure about are discarded. Raise it if you see false positives; lower it if real objects are being missed.
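To get a feel for how the threshold changes what you see, here is a small plain-Python sketch. It only illustrates the effect of a cutoff; it is not how Ultralytics implements filtering internally. The keep_confident helper and the scores are made up for the example.

```python
def keep_confident(detections, threshold):
    """Return only the (label, confidence) pairs scoring above the threshold."""
    return [(label, conf) for label, conf in detections if conf > threshold]


# Hypothetical raw scores for a single frame
raw = [("person", 0.91), ("cup", 0.62), ("chair", 0.48), ("dog", 0.27)]

for threshold in (0.25, 0.5, 0.75):
    kept = keep_confident(raw, threshold)
    print(threshold, [label for label, _ in kept])

# 0.25 ['person', 'cup', 'chair', 'dog']
# 0.5 ['person', 'cup']
# 0.75 ['person']
```

If the "dog" here were a false positive, any threshold above 0.27 would remove it. If the "chair" were real, you would need to go below 0.48 to keep it, which is the trade-off you are tuning.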
A Realistic Word on Raspberry Pi Performance
Here is the honest truth: object detection is heavy, and a Raspberry Pi is a small computer. Running yolov8n on the CPU of a Pi 4 or Pi 5 will likely give you only a few frames per second — not the smooth 30 FPS you might expect. That is normal. A few things help:
- Lower the camera resolution (already done above with 640×480). Fewer pixels means less work.
- Stick with the nano model (yolov8n.pt). Larger models like yolov8s or yolov8m are more accurate but far too slow for a Pi.
- Run detection every Nth frame. You rarely need to detect on all 30 frames per second. Detect every 5th frame and reuse the last result in between, so your effective frame rate for display stays high.
- Add a hardware accelerator. A Coral USB Accelerator offloads the neural network to a dedicated chip and can lift you toward real-time speeds, though the model must first be exported to an Edge TPU-compatible format.
Here is the “every Nth frame” idea applied to the loop:
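frame_count = 0
last_results = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Only run the network every 5th frame
    if frame_count % 5 == 0:
        last_results = model(frame, conf=0.5)
    frame_count += 1

    # Draw the most recent detections on the current frame so the video stays live.
    # Without img=frame, plot() draws on the older frame the detections came from.
    if last_results is not None:
        annotated = last_results[0].plot(img=frame)
    else:
        annotated = frame

    cv2.imshow("YOLO Object Detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break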
Troubleshooting
Camera not found / black window. cv2.VideoCapture(0) opens the first camera. If you have a built-in camera plus a USB one, the index you want may be 1 or 2. Try each index. On Linux, ls /dev/video* lists available camera devices.
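If you would rather get the candidate indices from Python than from the shell, the sketch below does roughly what ls /dev/video* does and extracts the numbers. It is Linux-only, and the list_camera_indices helper is a hypothetical name. A single USB camera often registers more than one /dev/video node, so treat the result as a list of indices worth trying, not a list of working cameras.

```python
from pathlib import Path


def list_camera_indices(dev_dir="/dev"):
    """Return the numeric suffixes of video* device nodes, sorted ascending."""
    directory = Path(dev_dir)
    if not directory.is_dir():
        return []
    indices = []
    for path in directory.glob("video*"):
        suffix = path.name[len("video"):]
        if suffix.isdigit():
            indices.append(int(suffix))
    return sorted(indices)


# Each number is a candidate for cv2.VideoCapture(index)
print(list_camera_indices())
```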
Very low FPS. This is expected on a Pi. Lower the resolution, run detection every Nth frame, and make sure you are using the nano model. If you truly need real-time, add a Coral accelerator.
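Before tuning anything, it helps to measure what you are actually getting. The following is a minimal sketch of a rolling FPS counter using only the standard library. FpsMeter is a hypothetical helper, and the timestamps at the bottom are simulated so the example runs without a camera.

```python
import time


class FpsMeter:
    """Average frames per second over the most recent `window` frames."""

    def __init__(self, window=30):
        self.window = window
        self.timestamps = []

    def tick(self, now=None):
        """Call once per frame. Returns the FPS estimate (0.0 until two frames are seen)."""
        if now is None:
            now = time.perf_counter()
        self.timestamps.append(now)
        if len(self.timestamps) > self.window:
            self.timestamps.pop(0)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0


meter = FpsMeter()
for t in (0.0, 0.25, 0.5, 0.75, 1.0):  # simulated frames arriving 0.25 s apart
    fps = meter.tick(now=t)
print(round(fps, 1))  # 4.0
```

In the detection loop you would call meter.tick() with no arguments once per frame and print the value, or draw it on the frame with cv2.putText.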
Model fails to download. The first run fetches yolov8n.pt from the internet, so the Pi needs a network connection. If it hangs, check connectivity, or download the file on another machine and point YOLO('/path/to/yolov8n.pt') at it directly.
imshow errors with no display. If you are running headless over SSH, there is no screen to draw on. Either enable X forwarding (ssh -X) or save annotated frames to disk with cv2.imwrite() instead of showing them.
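One way to let the same script run both at the desk and over SSH is to check for a display before choosing between showing and saving. The sketch below is a heuristic for Linux that assumes a graphical session sets DISPLAY or WAYLAND_DISPLAY. The has_display and frame_filename helpers, and the frames directory name, are hypothetical.

```python
import os


def has_display(env=None):
    """Heuristic: a Linux GUI session normally sets DISPLAY or WAYLAND_DISPLAY."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def frame_filename(frame_number, out_dir="frames"):
    """Zero-padded name so saved frames sort in capture order."""
    return f"{out_dir}/frame_{frame_number:06d}.jpg"


print(has_display({"DISPLAY": ":0"}))  # True
print(has_display({}))                 # False
print(frame_filename(42))              # frames/frame_000042.jpg
```

In the loop, you would call cv2.imshow() when has_display() is true and cv2.imwrite(frame_filename(frame_count), annotated) otherwise. Create the output directory first (for example with os.makedirs("frames", exist_ok=True)), and consider saving only every Nth frame so the SD card does not fill up.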
Next week we move from pixels to coordinate frames: managing the geometry of your robot with TF2 in ROS 2.