Weekly Robotics Beginner-friendly tutorials, every week

Seeing the World: 3D Cameras vs 2D Cameras

What's the difference between a standard webcam and a depth camera? When does your robot need to see in 3D? This guide breaks down the options clearly.

Vision is one of the most powerful tools a robot can have. But “camera” covers a huge range of technologies, from a $10 USB webcam to a $1,000 time-of-flight sensor. Choosing the right one for your application is important — and not as complicated as it might seem.

2D Cameras: What They See

A standard camera captures a 2D image: a grid of pixels, each with a color value. What it cannot tell you is how far away anything is. A tennis ball 1 meter away and a tennis ball 5 meters away differ only in how large they appear, and unless the software already knows the ball's real size, it cannot turn that apparent size into a distance. A small ball close up and a large ball far away can produce exactly the same image.
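To make the ambiguity concrete, here is a small sketch based on the pinhole camera model, in which an object's size in pixels is roughly the focal length times its real size divided by its distance. The 600-pixel focal length and the ball diameters are illustrative assumptions, not values from a specific camera.

```python
def apparent_size_px(real_size_m, distance_m, focal_length_px=600.0):
    """Approximate on-image size (pixels) under the pinhole camera model."""
    return focal_length_px * real_size_m / distance_m

# A tennis ball (about 6.7 cm across) seen from 1 m ...
small_near = apparent_size_px(0.067, 1.0)
# ... and a ball five times wider seen from 5 m.
large_far = apparent_size_px(0.335, 5.0)

# Both cover the same number of pixels, so one image cannot separate them.
print(small_near, large_far)
```

The same function also shows why a known real size is enough to recover distance: with the size fixed, the pixel width shrinks in proportion to how far away the object is.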

This is fine for many tasks:

  • Object detection and classification (“is there a person in this frame?”)
  • Line following (a robot that follows a line on the floor)
  • Color-based sorting
  • Reading QR codes or ArUco markers
  • Face detection

For these tasks, a USB webcam or a Raspberry Pi Camera Module is perfectly adequate and costs $10–30.

Getting Started with OpenCV

The standard library for computer vision in Python is OpenCV. Here’s a minimal example that captures frames and detects edges:

import cv2

cap = cv2.VideoCapture(0)  # 0 = first camera the system finds
if not cap.isOpened():
    raise RuntimeError("Could not open camera 0")

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Convert to grayscale and detect edges
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    
    cv2.imshow('Original', frame)
    cv2.imshow('Edges', edges)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
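Building on the capture loop above, line following can be sketched as finding where a dark line sits in one row of the grayscale image. The function name, the threshold of 80, and the sample row are assumptions for illustration. The row is a plain Python list so the sketch runs without a camera; with OpenCV you could pass one row of the grayscale frame converted with tolist().

```python
def line_offset(row, threshold=80):
    """Return the line's horizontal offset from the image center, in [-1, 1].

    `row` is one row of grayscale pixel values (0 = black, 255 = white)
    and must contain at least two pixels. Negative means the line is left
    of center, positive means right. Returns None when no pixel is darker
    than `threshold`.
    """
    dark_columns = [x for x, value in enumerate(row) if value < threshold]
    if not dark_columns:
        return None  # line lost; the caller decides how to recover
    line_center = sum(dark_columns) / len(dark_columns)
    image_center = (len(row) - 1) / 2
    return (line_center - image_center) / image_center

# Hypothetical 11-pixel row: white floor with a dark line at columns 7 and 8.
sample_row = [255] * 7 + [20, 30] + [255] * 2
offset = line_offset(sample_row)
print(offset)
```

A robot could feed this offset into its steering, turning toward the line in proportion to how far off-center it is.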

3D Cameras: Adding Depth

A 3D camera (also called a depth camera or RGB-D camera) produces a depth map in addition to a color image. Each pixel in the depth map contains a distance measurement — how far away that point in the scene is from the camera.
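Because every depth pixel carries a distance, a pixel can be turned into a 3D point in the camera's coordinate frame. The sketch below assumes an ideal pinhole camera with no lens distortion and a depth value measured along the camera's forward axis. The intrinsics (fx, fy, cx, cy) are made-up values for a 640x480 image; a real camera reports its own.

```python
def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with depth in meters to an (x, y, z) point.

    Uses the pinhole model: x points right, y points down, z points forward.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Assumed intrinsics, for illustration only.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# A pixel 150 columns right of the image center, measured at 2 m.
point = deproject_pixel(470, 240, 2.0, FX, FY, CX, CY)
print(point)
```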

The quick way to see the difference: a 2D camera knows what and where in the frame, but a depth camera also knows how far.

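                   | 2D camera                          | Depth (3D) camera
-------------------+------------------------------------+------------------------------------------
Per-pixel data     | Color only                         | Color + distance
Knows distance?    | No                                 | Yes
Good for           | Detection, line following, markers | Obstacle avoidance, grasping, 3D mapping
Typical cost       | $10–30                             | $150–600
Works in sunlight? | Yes                                | Depends on technology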

Having a distance for every pixel opens up entirely new capabilities:

  • Obstacle avoidance (knowing exactly how far away obstacles are)
  • 3D mapping and navigation (building a map of the environment)
  • Grasping objects (a robot arm needs to know where in 3D space an object is)
  • Person tracking (following someone while maintaining a safe distance)
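Obstacle avoidance, for instance, can be sketched as finding the nearest valid reading in the part of the depth map directly ahead of the robot. This assumes depths in meters with 0 meaning "no reading" (a common convention, but check your camera). The function name, the central-third window, and the 0.5 m stop distance are illustrative choices.

```python
def nearest_obstacle_m(depth_map):
    """Return the smallest valid depth in the central third of the image.

    `depth_map` is a list of rows of depths in meters; 0 marks a missing
    reading and is ignored. Returns None if the window has no valid depth.
    """
    height, width = len(depth_map), len(depth_map[0])
    rows = range(height // 3, height - height // 3)
    cols = range(width // 3, width - width // 3)
    valid = [depth_map[r][c] for r in rows for c in cols if depth_map[r][c] > 0]
    return min(valid) if valid else None

STOP_DISTANCE_M = 0.5  # assumed safety margin

# Tiny hypothetical 3x6 depth map; the 0.4 m reading sits straight ahead,
# while the 0.3 m reading at the left edge falls outside the window.
depth_map = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    [0.3, 2.0, 0.0, 0.4, 2.0, 2.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
nearest = nearest_obstacle_m(depth_map)
should_stop = nearest is not None and nearest < STOP_DISTANCE_M
print(nearest, should_stop)
```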

How Depth Cameras Work

There are three main technologies:

Structured Light (e.g., Microsoft Kinect v1, Orbbec Astra): Projects a known pattern of infrared light onto the scene. A separate IR camera captures the pattern as it deforms over surfaces, and the deformation reveals depth. Works well indoors but struggles in bright sunlight, which washes out the projected pattern. (Note: Intel’s RealSense D-series cameras, such as the D415 and D435, are stereo cameras, not structured light, even though they include an IR projector to add texture.)

Time-of-Flight (ToF) (e.g., Microsoft Azure Kinect): Emits infrared light and measures how long it takes to return, either by timing pulses directly or by measuring the phase shift of a modulated signal. Fast, and copes with a wider range of lighting than structured light, but can struggle with highly reflective or transparent surfaces. (Intel’s RealSense L515 was a popular solid-state LiDAR camera that worked on this principle, but it has since been discontinued.)
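The arithmetic behind time-of-flight is short enough to write out. This sketch assumes a single, directly timed pulse and is meant to show the scale of the numbers involved, not how any particular sensor is implemented.

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # exact, by definition of the meter

def tof_distance_m(round_trip_s):
    """Distance to a surface from the light's round-trip time in seconds."""
    # The light travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2

# A surface 1.5 m away returns light after roughly 10 nanoseconds.
round_trip = 2 * 1.5 / SPEED_OF_LIGHT_M_S
print(f"{round_trip * 1e9:.2f} ns -> {tof_distance_m(round_trip):.2f} m")
```

Timing on the order of nanoseconds is why these sensors need dedicated hardware rather than an ordinary camera chip.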

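Stereo Vision (e.g., Intel RealSense D435, ZED Camera): Uses two cameras separated by a known distance (like human eyes). Computes depth by finding the same point in both images and measuring the disparity, the shift in that point's position between the two. Works outdoors and needs no active illumination, though it does need visible texture to match points, which is why some stereo cameras add an IR projector.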

[Figure: stereo triangulation. An object is viewed by a left camera and a right camera separated by a known baseline, so the same point is seen from two angles.]
Stereo depth by triangulation: the same point appears in slightly different spots in the two images. That shift (the disparity) plus the known baseline lets the camera solve for distance. Nearer objects shift more, farther objects shift less.
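For a rectified stereo pair, the triangulation described above reduces to depth = focal length × baseline / disparity. The sketch below applies that formula; the 640-pixel focal length and 5 cm baseline are assumed values for illustration, not the specification of any camera mentioned here.

```python
def stereo_depth_m(disparity_px, focal_length_px, baseline_m):
    """Depth from disparity for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return None  # no match found, or the point is effectively at infinity
    return focal_length_px * baseline_m / disparity_px

# Assumed values for illustration: 640 px focal length, 5 cm baseline.
FOCAL_PX, BASELINE_M = 640.0, 0.05

for disparity in (64.0, 16.0, 4.0):
    depth = stereo_depth_m(disparity, FOCAL_PX, BASELINE_M)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason stereo accuracy degrades with range.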

Comparison Table

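Feature        | USB Webcam | Intel RealSense D435 | ZED 2
---------------+------------+----------------------+-----------
Type           | 2D RGB     | Stereo + IR          | Stereo
Depth range    | None       | 0.1–10 m             | 0.2–20 m
Depth accuracy | N/A        | ~2% at 2 m           | ~1% at 2 m
Outdoor use    | Yes        | Limited              | Yes
ROS2 support   | Yes        | Yes                  | Yes
Cost           | $10–30     | $150–200             | $450–600
Weight         | ~80 g      | ~72 g                | ~166 g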
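Accuracy figures such as "~2% at 2 m" are easier to compare once converted to centimeters. The sketch below does that arithmetic with the table's values. It assumes the percentage applies at the stated distance only, since depth error generally grows with range.

```python
def depth_error_cm(percent_error, distance_m):
    """Convert a percent-of-distance accuracy spec into centimeters."""
    return percent_error / 100 * distance_m * 100

# Values taken from the comparison table above.
specs = {"Intel RealSense D435": 2.0, "ZED 2": 1.0}

for name, percent in specs.items():
    print(f"{name}: about {depth_error_cm(percent, 2.0):.0f} cm at 2 m")
```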

Getting Started with Intel RealSense

The Intel RealSense D435 is a popular first depth camera for robotics. Here’s how to get a depth frame in Python using Intel's pyrealsense2 library:

import pyrealsense2 as rs
import numpy as np
import cv2

# Configure streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        
        if not depth_frame or not color_frame:
            continue
        
        # Convert to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())
        
        # Apply colormap to depth image for visualization
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03),
            cv2.COLORMAP_JET
        )
        
        # Get distance at the center pixel of the 640x480 depth frame
        # (get_distance returns 0.0 when there is no valid depth reading)
        cx, cy = 320, 240
        distance = depth_frame.get_distance(cx, cy)
        print(f"Distance at center: {distance:.2f} m")
        
        cv2.imshow('Color', color_image)
        cv2.imshow('Depth', depth_colormap)
        
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    # Stop streaming and close the preview windows, even after an error
    pipeline.stop()
    cv2.destroyAllWindows()
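A single center pixel can come back as 0 when the camera has no valid depth there, so one common refinement is to take the median over a small patch around it. The sketch below works on raw 16-bit depth values held in plain lists, so it runs without a camera. The function name is hypothetical, and the 0.001 scale assumes one raw unit equals one millimeter, which you should confirm by querying your device's depth scale.

```python
from statistics import median

def patch_distance_m(raw_depth, cx, cy, half_size=1, depth_scale=0.001):
    """Median distance in meters over a square patch centered on (cx, cy).

    `raw_depth` is a list of rows of raw depth units; zeros (no reading)
    are skipped. Returns None if every pixel in the patch is zero.
    """
    values = []
    for y in range(max(0, cy - half_size), min(len(raw_depth), cy + half_size + 1)):
        row = raw_depth[y]
        for x in range(max(0, cx - half_size), min(len(row), cx + half_size + 1)):
            if row[x] > 0:
                values.append(row[x])
    return median(values) * depth_scale if values else None

# Hypothetical 3x3 patch of raw values: the center pixel has no reading,
# and one neighbor (9000) is an outlier that the median ignores.
raw = [
    [1500, 1510, 1490],
    [1505,    0, 1495],
    [1500, 9000, 1500],
]
center_distance = patch_distance_m(raw, cx=1, cy=1)
print(center_distance)
```

The median is used instead of the mean so that a few stray readings, such as a background pixel at the edge of an object, do not drag the estimate.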

Which Should You Choose?

Start with a 2D camera if:

  • You’re doing object detection, line following, or marker tracking
  • You’re working on a budget
  • Your robot operates in a structured environment where depth isn’t needed

Use a depth camera if:

  • Your robot needs to navigate autonomously in an unstructured environment
  • You’re building a robot arm that needs to grasp objects
  • You need to build a 3D map of the environment (SLAM)
  • You’re working on human-robot interaction

For most beginners, start with a USB webcam and OpenCV. Once you’ve built something that works with 2D vision, adding depth is a natural next step.

Next week, we’ll explore reinforcement learning frameworks — the software tools that let you train robots to learn behaviors from experience.