Seeing the World: 3D Cameras vs 2D Cameras

Vision is one of the most powerful tools a robot can have. But “camera” covers a huge range of technologies, from a $10 USB webcam to a $1,000 time-of-flight sensor. Choosing the right one for your application is important — and not as complicated as it might seem.

2D Cameras: What They See

A standard camera captures a 2D image: a grid of pixels, each with a color value. What it cannot tell you is how far away anything is. A tennis ball 5 meters away simply looks smaller than one 1 meter away, and from the image alone the camera can't tell whether it is seeing a distant tennis ball or a much smaller object up close.
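To make the ambiguity concrete, here is a minimal sketch using the ideal pinhole camera model (an assumption; real lenses add distortion) and a hypothetical focal length of 600 pixels. Under that model an object of real width W at distance Z appears f · W / Z pixels wide, so a tennis ball (about 6.7 cm across) 5 m away projects to exactly the same width as an object one fifth its size held 1 m away.

```python
# Ideal pinhole model: an object of real width W (meters) at distance Z (meters)
# appears w = f * W / Z pixels wide, where f is the focal length in pixels.

FOCAL_PX = 600.0  # hypothetical focal length in pixels; real cameras differ


def apparent_width_px(real_width_m: float, distance_m: float, focal_px: float = FOCAL_PX) -> float:
    """Projected width in pixels of an object seen by an ideal pinhole camera."""
    return focal_px * real_width_m / distance_m


TENNIS_BALL_M = 0.067  # approximate tennis ball diameter in meters

near_ball = apparent_width_px(TENNIS_BALL_M, 1.0)   # about 40 px
far_ball = apparent_width_px(TENNIS_BALL_M, 5.0)    # about 8 px
# An object one fifth the size, held 1 m away, projects to the same width as the far ball.
small_near = apparent_width_px(TENNIS_BALL_M / 5, 1.0)

print(f"tennis ball at 1 m: {near_ball:.1f} px")
print(f"tennis ball at 5 m: {far_ball:.1f} px")
print(f"1/5-size object at 1 m: {small_near:.1f} px")
```

The flip side: if you already know an object's real size, you can invert the same relation (Z = f · W / w) to estimate its distance from a single 2D image.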

This is fine for many tasks:

Object detection and classification (“is there a person in this frame?”)
Line following (a robot that follows a line on the floor)
Color-based sorting
Reading QR codes or ArUco markers
Face detection

For these tasks, a USB webcam or a Raspberry Pi Camera Module is perfectly adequate and costs $10–30.

Getting Started with OpenCV

The standard library for computer vision in Python is OpenCV. Here’s a minimal example that captures frames and detects edges:

import cv2

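cap = cv2.VideoCapture(0)  # 0 = first camera the OS exposes (often your USB webcam)
if not cap.isOpened():
    raise RuntimeError("Could not open camera 0")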

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Convert to grayscale and detect edges
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    
    cv2.imshow('Original', frame)
    cv2.imshow('Edges', edges)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

3D Cameras: Adding Depth

A 3D camera (also called a depth camera or RGB-D camera) produces a depth map in addition to a color image. Each pixel in the depth map contains a distance measurement — how far away that point in the scene is from the camera.
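In practice a depth map usually arrives as a grid of integers rather than meters. The sketch below assumes 16-bit values where one unit is 1 mm; 0.001 m is the usual default depth scale for Intel's D400 series, but treat it as an assumption and query the real value (with pyrealsense2, `pipeline.start(config).get_device().first_depth_sensor().get_depth_scale()`). RealSense, like many depth sensors, reports 0 where it could not measure depth, so those pixels become NaN here instead of being mistaken for "touching the lens".

```python
import numpy as np

# Assumption: raw depth is uint16 (like RealSense "z16") and 1 unit = 1 mm.
DEPTH_SCALE_M = 0.001


def depth_to_meters(raw_depth: np.ndarray, depth_scale: float = DEPTH_SCALE_M) -> np.ndarray:
    """Convert a raw uint16 depth map to meters, marking missing readings (0) as NaN."""
    meters = raw_depth.astype(np.float32) * depth_scale
    meters[raw_depth == 0] = np.nan  # 0 means "no depth measured" for this pixel
    return meters


# A tiny hypothetical 2x3 depth map in which one pixel has no valid reading.
raw = np.array([[1000, 1500, 0],
                [2000, 2500, 3000]], dtype=np.uint16)
depth_m = depth_to_meters(raw)
print(depth_m)
```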

In short: a 2D camera tells you what is in the frame and where it appears, but a depth camera also tells you how far away it is.

	2D camera	Depth (3D) camera
Per-pixel data	Color only	Color + distance
Knows distance?	No	Yes
Good for	Detection, line following, markers	Obstacle avoidance, grasping, 3D mapping
Typical cost	$10–30	$150–600
Works in sunlight?	Yes	Depends on technology

This opens up entirely new capabilities:

Obstacle avoidance (knowing how far away each obstacle is)
3D mapping and navigation (building a map of the environment)
Grasping objects (a robot arm needs to know where in 3D space an object is)
Person tracking (following someone while maintaining a safe distance)
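Obstacle avoidance is the easiest of these to sketch. Assuming the depth map is already in meters with NaN for missing pixels, a robot can check the closest valid reading in the middle of its view and stop below a threshold. The function names, the central-half region, and the 0.5 m threshold are all hypothetical choices for illustration; more complete navigation systems typically feed depth into a map of the surroundings instead.

```python
import numpy as np

STOP_DISTANCE_M = 0.5  # hypothetical safety threshold; tune for your robot's speed


def nearest_obstacle_m(depth_m: np.ndarray, roi_frac: float = 0.5) -> float:
    """Closest valid distance (meters) in a centered region of a depth map.

    depth_m: float array in meters, NaN where the sensor had no reading.
    roi_frac: fraction of the width/height to check (0.5 = the central half).
    Returns NaN if the region contains no valid readings.
    """
    h, w = depth_m.shape
    half_h = max(1, int(h * roi_frac / 2))
    half_w = max(1, int(w * roi_frac / 2))
    roi = depth_m[h // 2 - half_h : h // 2 + half_h, w // 2 - half_w : w // 2 + half_w]
    if np.all(np.isnan(roi)):
        return float("nan")
    return float(np.nanmin(roi))


def should_stop(depth_m: np.ndarray, threshold_m: float = STOP_DISTANCE_M) -> bool:
    """Stop if something is closer than the threshold, or if nothing valid is seen."""
    nearest = nearest_obstacle_m(depth_m)
    # Conservative choice: no valid depth ahead is treated as unsafe.
    return bool(np.isnan(nearest) or nearest < threshold_m)


# Hypothetical 4x6 depth maps in meters.
clear = np.full((4, 6), 3.0)
blocked = clear.copy()
blocked[1, 2] = 0.4  # something 40 cm away near the middle of the view
blind = np.full((4, 6), np.nan)

print(should_stop(clear), should_stop(blocked), should_stop(blind))
```

Treating "no valid depth" as a reason to stop is a deliberate design choice: a sensor that is blinded (by sunlight, glass, or a very close object) should not be read as "the path is clear".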

How Depth Cameras Work

There are three main technologies:

Structured Light (e.g., the original Microsoft Kinect v1, Orbbec Astra) Projects a known pattern of infrared light onto the scene. A separate IR camera captures the distorted pattern, and the distortion reveals depth. Works well indoors but struggles in bright sunlight. (Note: Intel’s RealSense D-series — D415, D435 — are stereo cameras, not structured light, even though they include an IR projector to add texture.)

Time-of-Flight (ToF) (e.g., Microsoft Azure Kinect) Emits infrared light and measures how long it takes to return, either by timing short pulses directly or, as the Azure Kinect does, by measuring the phase shift of continuously modulated light. Fast, and more tolerant of varied lighting than structured light, but can struggle with highly reflective or transparent surfaces. (Intel's RealSense L515 was a popular solid-state LiDAR camera that worked on this principle, but it has since been discontinued.)
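The core relation behind ToF is simple: light travels to the surface and back, so distance = speed of light × round-trip time / 2. A quick calculation shows why ToF needs specialized hardware: resolving 1 cm of depth means resolving tens of picoseconds. This sketch covers the direct (pulse-timing) case; phase-based sensors arrive at the same round-trip time indirectly.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # meters per second (exact, by definition)


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to a surface from the round-trip time of light.

    The light travels out and back, so the one-way distance is half the path.
    """
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2


def round_trip_time_s(distance_m: float) -> float:
    """Inverse: how long light takes to reach a surface and return."""
    return 2 * distance_m / SPEED_OF_LIGHT_M_S


t_1m = round_trip_time_s(1.0)    # about 6.67 nanoseconds
t_1cm = round_trip_time_s(0.01)  # about 67 picoseconds
print(f"1 m target: {t_1m * 1e9:.2f} ns round trip")
print(f"1 cm of depth: {t_1cm * 1e12:.1f} ps of timing")
```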

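Stereo Vision (e.g., Intel RealSense D435, ZED Camera) Uses two cameras separated by a known distance (like human eyes). Computes depth by finding the same point in both images and measuring the disparity. Works outdoors and needs no active illumination (the D435's IR projector only adds texture), but struggles on blank, textureless surfaces and in low light, where there is nothing distinctive to match.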

Stereo depth by triangulation: the same point appears in slightly different spots in the two images. That shift (the disparity) plus the known baseline lets the camera solve for distance — nearer objects shift more, farther objects shift less.
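For a rectified stereo pair (images warped so that matching points lie on the same row), the triangulation reduces to Z = f · B / d, where f is the focal length in pixels, B the baseline in meters, and d the disparity in pixels. The sketch below uses hypothetical values (f = 600 px, B = 5 cm) and also shows the standard first-order error estimate, ΔZ ≈ Z² / (f · B) · Δd: a fixed sub-pixel matching error costs more depth accuracy the farther away the point is, which is why stereo accuracy specs are quoted at a specific distance.

```python
def stereo_depth_m(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth from disparity for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_px * baseline_m / disparity_px


def stereo_depth_error_m(depth_m: float, focal_px: float, baseline_m: float,
                         disparity_error_px: float) -> float:
    """First-order depth uncertainty: dZ ~= Z**2 / (f * B) * dd (grows with Z squared)."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


F_PX = 600.0       # hypothetical focal length in pixels
BASELINE_M = 0.05  # hypothetical 5 cm spacing between the two cameras

for d in (30.0, 15.0, 5.0):
    z = stereo_depth_m(d, F_PX, BASELINE_M)
    err = stereo_depth_error_m(z, F_PX, BASELINE_M, disparity_error_px=0.1)
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m (+/- {err * 100:.1f} cm for 0.1 px error)")
```

Halving the disparity doubles the depth but quadruples the error, so a wider baseline (larger B) is the usual way stereo cameras extend their useful range.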

Comparison Table

Feature	USB Webcam	Intel RealSense D435	ZED 2
Type	2D RGB	Active stereo (IR projector)	Passive stereo
Depth range	None	0.1–10 m	0.2–20 m
Depth accuracy	N/A	~2% at 2m	~1% at 2m
Outdoor use	Yes	Yes (projector less effective in sunlight)	Yes
ROS2 support	Yes	Yes	Yes
Cost	$10–30	$150–200	$450–600
Weight	~80g	~72g	~166g

Getting Started with Intel RealSense

The Intel RealSense D435 is one of the most popular depth cameras among robotics beginners. Here's how to read depth and color frames in Python with the pyrealsense2 package:

import pyrealsense2 as rs
import numpy as np
import cv2

# Configure streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        
        if not depth_frame or not color_frame:
            continue
        
        # Convert to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())
        
        # Apply colormap to depth image for visualization
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03),
            cv2.COLORMAP_JET
        )
        
        # Get distance (meters) at the center pixel; 0.00 means no valid depth there
        cx, cy = depth_frame.get_width() // 2, depth_frame.get_height() // 2
        distance = depth_frame.get_distance(cx, cy)
        print(f"Distance at center: {distance:.2f} m")
        
        cv2.imshow('Color', color_image)
        cv2.imshow('Depth', depth_colormap)
        
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    pipeline.stop()
    cv2.destroyAllWindows()
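Knowing the distance at a pixel is only half of what a robot arm needs; it also needs the point's 3D position. The sketch below back-projects a pixel and its depth into camera coordinates using the ideal pinhole model without lens distortion, in the convention RealSense uses (+X right, +Y down, +Z forward). The intrinsics (fx, fy, cx, cy) are hypothetical placeholders. With a real camera you would read them from `depth_frame.profile.as_video_stream_profile().intrinsics` and could call pyrealsense2's own `rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], depth)`, which also accounts for the camera's distortion model.

```python
import numpy as np


def deproject_pixel(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) with depth Z into a 3D point (X, Y, Z) in meters.

    Ideal pinhole model, no distortion: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


# Hypothetical intrinsics for a 640x480 stream; read the real ones from your camera.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

center = deproject_pixel(320, 240, 1.5, FX, FY, CX, CY)  # straight ahead: X = Y = 0
right = deproject_pixel(420, 240, 1.2, FX, FY, CX, CY)   # 100 px right: X = 100 * 1.2 / 600
print(center)
print(right)
```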

Which Should You Choose?

Start with a 2D camera if:

You’re doing object detection, line following, or marker tracking
You’re working on a budget
Your robot operates in a structured environment where depth isn’t needed

Use a depth camera if:

Your robot needs to navigate autonomously in an unstructured environment
You’re building a robot arm that needs to grasp objects
You need to build a 3D map of the environment (SLAM)
You’re working on human-robot interaction

For most beginners, start with a USB webcam and OpenCV. Once you’ve built something that works with 2D vision, adding depth is a natural next step.

Next week, we’ll explore reinforcement learning frameworks — the software tools that let you train robots to learn behaviors from experience.