Object Detection with OpenCV and YOLO

A robot that can see is far more capable than one that can only feel its way around with bump sensors. With a cheap USB camera and a few lines of Python, you can have your robot recognize people, cups, chairs, and dozens of other everyday objects — and tell you exactly where they are in the frame. This week we build a live object detector with OpenCV and YOLO that runs on a Raspberry Pi.

Classical CV vs. Deep-Learning Detection

Classical computer vision (the kind built into OpenCV for years) works by hand-written rules: find edges, match colors, detect blobs of a certain shape. It is fast and predictable, but brittle. A red-ball tracker tuned for your living room will fail the moment the lighting changes.

Deep-learning object detection flips the approach. Instead of you writing the rules, a neural network learns them from thousands of labeled example images. The result generalizes far better — it recognizes a “cup” whether it is white, blue, in shadow, or half-hidden. The cost is more computation, which is why we care so much about model size on a small board like a Pi.

What YOLO Is

YOLO (“You Only Look Once”) is a family of real-time object detection networks. Its key trick is in the name: it looks at the whole image once and predicts every object in a single pass, rather than scanning the image region by region. That single-shot design is what makes it fast enough for live video.
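To get a feel for why scanning region by region is costly, here is a rough back-of-the-envelope sketch. It assumes a naive, hypothetical scanner that slides a fixed-size window across the image and runs a classifier at every position. The window size and stride are made-up numbers, and real region-based detectors are smarter than this, but the arithmetic shows the general problem.

```python
def window_count(image_w, image_h, window, stride):
    """Number of square windows a naive sliding-window scanner would visit."""
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down


# Hypothetical scanner: 64x64 windows, moved 32 pixels at a time
passes = window_count(640, 480, window=64, stride=32)
print(passes)  # 266 classifier runs, and that is for a single window size
```

YOLO replaces all of those separate runs with one pass of the network over the whole frame.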

For each object it finds, YOLO outputs three things:

A bounding box — the rectangle around the object.
A class label — what the object is (“person”, “cup”, “dog”).
A confidence score — how sure the model is, from 0 to 1.
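
To make those three outputs concrete, here is a minimal sketch of one detection as plain Python data. The Detection class and its field names are hypothetical, chosen for illustration; they are not the objects Ultralytics returns. The box is assumed to be stored as corner pixel coordinates (x1, y1, x2, y2).

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical container, not an Ultralytics type)."""
    x1: float          # left edge of the bounding box, in pixels
    y1: float          # top edge
    x2: float          # right edge
    y2: float          # bottom edge
    label: str         # class label, e.g. "cup"
    confidence: float  # 0.0 to 1.0

    def center(self):
        """Return the (x, y) pixel at the middle of the box."""
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


# Made-up example values
cup = Detection(x1=200, y1=120, x2=280, y2=240, label="cup", confidence=0.87)
print(cup.label, cup.confidence, cup.center())
```

The box center is often the most useful number for a robot: it is a single point you can steer toward or keep in the middle of the frame.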

The Easy Modern Path: Ultralytics

You no longer need to wrangle weights files and config by hand. The Ultralytics package wraps YOLO in a clean Python API. Install it together with OpenCV:

pip install ultralytics opencv-python

The first time you load a model, Ultralytics automatically downloads the pretrained weights for you. We will use yolov8n.pt — the “nano” model. It is the smallest and fastest of the YOLOv8 family, which makes it the right starting point for a Raspberry Pi.

A Complete Live Detector

This script opens a USB camera, runs YOLO on each frame, draws the results, and shows them in a window. Press q to quit.

import cv2
from ultralytics import YOLO

# Load the smallest pretrained model (downloads on first run)
model = YOLO('yolov8n.pt')

# Open the first camera. Change the index if you have several.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open camera. Try a different index.")

# Request a lower resolution to keep the Pi responsive.
# (The driver may pick the nearest size the camera supports.)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ok, frame = cap.read()
    if not ok:
        print("Failed to grab frame")
        break

    # Run detection. conf=0.5 ignores low-confidence guesses.
    results = model(frame, conf=0.5)

    # results[0].plot() returns the frame with boxes + labels drawn
    annotated = results[0].plot()

    cv2.imshow("YOLO Object Detection", annotated)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

The conf=0.5 argument is your confidence threshold — detections the model is less than 50% sure about are discarded. Raise it if you see false positives; lower it if real objects are being missed.
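
The effect of the threshold is easy to see on plain data. The sketch below is an illustration only: keep_confident is a hypothetical helper, the detections are made up, and treating a score exactly at the threshold as "kept" is an assumption rather than a statement about the library's internals.

```python
def keep_confident(detections, threshold=0.5):
    """Keep (label, confidence) pairs whose confidence is at least threshold."""
    return [(label, conf) for label, conf in detections if conf >= threshold]


# Made-up detections for illustration only
raw = [("person", 0.91), ("cup", 0.62), ("chair", 0.48), ("dog", 0.30)]

print(keep_confident(raw, 0.5))   # drops chair and dog
print(keep_confident(raw, 0.7))   # stricter: only person remains
print(keep_confident(raw, 0.25))  # looser: everything passes
```

A higher threshold trades missed objects for fewer false alarms, and a lower one does the reverse.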

A Realistic Word on Raspberry Pi Performance

Here is the honest truth: object detection is heavy, and a Raspberry Pi is a small computer. Running yolov8n on the CPU of a Pi 4 or Pi 5 will likely give you only a few frames per second — not the smooth 30 FPS you might expect. That is normal. A few things help:

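Lower the camera resolution (already done above with 640×480). Fewer pixels means less work for capture, resizing, and drawing. The network resizes every frame to its own input size, so to shrink the detection work itself, also pass a smaller imgsz to the model call, for example model(frame, conf=0.5, imgsz=320). Small or distant objects become harder to detect as imgsz drops.
Stick with the nano model (yolov8n.pt). Larger models like yolov8s or yolov8m are more accurate but much slower, usually too slow for live video on a Pi's CPU.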
Run detection every Nth frame. You rarely need to detect on all 30 frames per second. Detect every 5th frame and reuse the last result in between — your effective frame rate for display stays high.
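Add a hardware accelerator. A Coral USB Accelerator offloads the neural network to a dedicated chip and can bring you much closer to real-time speeds. The model has to be exported to the Edge TPU format first; the accelerator does nothing for the plain yolov8n.pt file.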

Here is the “every Nth frame” idea applied to the loop:

frame_count = 0
last_results = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

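    # Only run the network every 5th frame
    if frame_count % 5 == 0:
        last_results = model(frame, conf=0.5)
    frame_count += 1

    # Draw the most recent boxes on the *current* frame. Without img=frame,
    # plot() draws on the frame the detection ran on, so the video would
    # only appear to update every 5th frame.
    annotated = last_results[0].plot(img=frame) if last_results else frame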
    cv2.imshow("YOLO Object Detection", annotated)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

Troubleshooting

Camera not found / black window. cv2.VideoCapture(0) opens the first camera. If you have a built-in camera plus a USB one, the index you want may be 1 or 2. Try each index. On Linux, ls /dev/video* lists available camera devices.
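
Trying each index by hand gets tedious, so here is a minimal sketch that probes them for you. find_cameras and its open_camera parameter are hypothetical names introduced for this example; the parameter exists only so the logic can be exercised without hardware, and by default the helper uses cv2.VideoCapture. It also reads one frame from each device, on the assumption that a device which opens but never delivers a frame is not the camera you want.

```python
def find_cameras(max_index=4, open_camera=None):
    """Return the indices in range(max_index) where a frame can be read.

    open_camera is a hypothetical hook for testing without hardware;
    when omitted, cv2.VideoCapture is used.
    """
    if open_camera is None:
        import cv2  # imported here so the helper has no hard dependency
        open_camera = cv2.VideoCapture

    working = []
    for index in range(max_index):
        cap = open_camera(index)
        try:
            # Opening is not enough: also confirm a frame actually arrives
            if cap.isOpened():
                ok, _ = cap.read()
                if ok:
                    working.append(index)
        finally:
            cap.release()
    return working
```

On the Pi, calling find_cameras() with no arguments returns a list of usable indices; pass one of them to cv2.VideoCapture in the main script.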

Very low FPS. This is expected on a Pi. Lower the resolution, run detection every Nth frame, and make sure you are using the nano model. If you truly need real-time, add a Coral accelerator.
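
Before and after each change, it helps to measure the frame rate instead of guessing. The sketch below is one simple way to do that. FpsMeter is a hypothetical helper, not part of OpenCV or Ultralytics, and the smoothing factor of 0.9 is an arbitrary choice. The optional now argument is there only so the example can run with simulated timestamps.

```python
import time


class FpsMeter:
    """Smoothed frames-per-second estimate (hypothetical helper)."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # closer to 1.0 = steadier, slower to react
        self.fps = 0.0
        self._last = None

    def tick(self, now=None):
        """Call once per displayed frame; returns the current FPS estimate."""
        now = time.perf_counter() if now is None else now
        if self._last is not None and now > self._last:
            instant = 1.0 / (now - self._last)
            if self.fps == 0.0:
                self.fps = instant  # first measurement: nothing to blend with
            else:
                self.fps = (self.smoothing * self.fps
                            + (1 - self.smoothing) * instant)
        self._last = now
        return self.fps


# Simulated frames arriving every 0.25 seconds
meter = FpsMeter()
for t in (0.0, 0.25, 0.5, 0.75):
    fps = meter.tick(now=t)
print(f"{fps:.1f} FPS")  # prints "4.0 FPS"
```

In the real loop, create the meter once before while True and call meter.tick() with no arguments right after each cv2.imshow, then print the value every so often.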

Model fails to download. The first run fetches yolov8n.pt from the internet, so the Pi needs a network connection. If it hangs, check connectivity, or download the file on another machine and point YOLO('/path/to/yolov8n.pt') at it directly.

imshow errors with no display. If you are running headless over SSH, there is no screen to draw on. Either enable X forwarding (ssh -X) or save annotated frames to disk with cv2.imwrite() instead of showing them.
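
If the same script has to run both with and without a screen, one option is to pick the output method at startup. The sketch below is a heuristic, not a guarantee: has_display and frame_filename are hypothetical helpers, and checking the DISPLAY and WAYLAND_DISPLAY environment variables is an assumption that holds on typical Linux desktops but may not cover every setup.

```python
import os


def has_display(env=None):
    """Guess whether a graphical display is available (Linux heuristic)."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def frame_filename(frame_number, folder="frames"):
    """Build a zero-padded path such as frames/frame_000042.jpg."""
    return os.path.join(folder, f"frame_{frame_number:06d}.jpg")


print(has_display({"DISPLAY": ":0"}))  # True
print(has_display({}))                 # False
print(frame_filename(42))
```

In the loop, call cv2.imshow when has_display() is true and cv2.imwrite(frame_filename(frame_count), annotated) otherwise. Create the folder first with os.makedirs("frames", exist_ok=True), because cv2.imwrite returns False instead of raising an error when the folder is missing.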

Next week we move from pixels to coordinate frames: managing the geometry of your robot with TF2 in ROS 2.