What happens when you strap a camera to a drone, feed the video to a YOLO object detection model, and process everything on a Raspberry Pi — all in real time? That's exactly what I built for my AI Aerial Surveillance System project.

This post is a behind-the-scenes look at the entire build: the hardware I chose (and why), the software architecture, the problems I didn't expect, and what I learned along the way.

The Goal

Build an autonomous aerial surveillance system that can:

  • Fly a predefined path using GPS waypoints
  • Detect humans, vehicles, and suspicious objects in real time
  • Stream detections to a ground station dashboard
  • Operate in low-light conditions
  • Run entirely on edge hardware (no cloud dependency during flight)

The constraint: everything had to run on-device. Sending video to the cloud and back would add too much latency for real-time detection. The AI had to live on the drone itself.

The Hardware Stack

Choosing the right hardware was a two-week process of benchmarking, testing, and compromising:

Compute: Raspberry Pi 4B (8GB)

I started with a Raspberry Pi 4B. It's lightweight (46g), cheap ($75), and has enough compute for a small YOLO model — but just barely. The Pi's CPU alone couldn't handle real-time inference, so I added a Google Coral USB Accelerator for dedicated ML inference. This single addition brought inference time from ~800ms per frame to ~45ms.

Camera: Raspberry Pi Camera Module 3

The official Pi camera does 1080p at 30fps, has autofocus, and weighs only 3g. For a drone, weight is everything. I also experimented with a NoIR (no infrared filter) version for low-light operations.

Drone Frame: S500 Quadcopter

The S500 frame with 2212 920KV motors gives enough lift for the Pi, camera, Coral, battery, and flight controller with room to spare. Total payload capacity: ~800g. My total compute payload: ~180g.

Flight Controller: Pixhawk 4

The Pixhawk runs ArduPilot firmware and handles all flight operations: GPS navigation, stabilization, and autonomous waypoint following. The Pi communicates with it over a serial link using the MAVLink protocol.
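
As a rough sketch of the Pi-to-Pixhawk link, the snippet below uses pymavlink. That is an assumption: the post does not say which MAVLink library was used, and the serial device `/dev/serial0` and the 57600 baud rate are placeholders too. MAVLink's `GLOBAL_POSITION_INT` message reports latitude and longitude as integers in units of 1e-7 degrees and relative altitude in millimetres, so the unit conversion lives in its own function that can be checked without any hardware attached.

```python
from types import SimpleNamespace


def position_from_msg(msg):
    """Convert a GLOBAL_POSITION_INT message into degrees and metres."""
    return {
        "lat": msg.lat / 1e7,                # degE7 -> degrees
        "lon": msg.lon / 1e7,                # degE7 -> degrees
        "alt_m": msg.relative_alt / 1000.0,  # mm above home -> metres
    }


def read_position(device="/dev/serial0", baud=57600):
    """Block until the flight controller reports a position (needs hardware)."""
    # Imported lazily so the converter above works without pymavlink installed.
    from pymavlink import mavutil

    link = mavutil.mavlink_connection(device, baud=baud)
    link.wait_heartbeat()  # confirms the autopilot is talking to us
    msg = link.recv_match(type="GLOBAL_POSITION_INT", blocking=True)
    return position_from_msg(msg)


# Offline check with a stand-in message (values are made up for illustration).
fake = SimpleNamespace(lat=123456789, lon=-987654321, relative_alt=30500)
print(position_from_msg(fake))
```

On the drone, `read_position()` would be called from the comms process; on a desk, only `position_from_msg` is exercised.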

Software Architecture

The software runs as three independent processes that communicate through shared memory and message queues:

# System Architecture
processes:
  - name: capture
    role: Camera frame capture at 30fps
    output: Shared memory buffer (ring buffer)
  
  - name: detector  
    role: YOLO inference on Coral TPU
    input: Latest frame from buffer
    output: Detection results (class, bbox, confidence)
  
  - name: comms
    role: MAVLink + ground station telemetry
    input: Detection results + GPS data
    output: Alerts, live dashboard updates
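
The post does not show how the shared-memory ring buffer is implemented, so the following is only a minimal sketch of one way to do it with the standard library's `multiprocessing.shared_memory`. The class name `FrameRing` and its layout (an 8-byte write counter followed by fixed-size slots) are hypothetical. There is no locking, so a reader could see a torn frame if the writer laps it; a real implementation would need to handle that.

```python
import struct
from multiprocessing import shared_memory


class FrameRing:
    """Fixed-size ring of raw frames; readers always take the newest slot."""

    HEADER = struct.Struct("<Q")  # count of frames written so far

    def __init__(self, slots, frame_bytes, name=None, create=True):
        self.slots = slots
        self.frame_bytes = frame_bytes
        size = self.HEADER.size + slots * frame_bytes
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        if create:
            self.HEADER.pack_into(self.shm.buf, 0, 0)

    def _offset(self, index):
        return self.HEADER.size + (index % self.slots) * self.frame_bytes

    def write(self, frame):
        if len(frame) != self.frame_bytes:
            raise ValueError("frame has the wrong size for this ring")
        count = self.HEADER.unpack_from(self.shm.buf, 0)[0]
        start = self._offset(count)
        self.shm.buf[start:start + self.frame_bytes] = frame
        # Publish the new count only after the frame bytes are in place.
        self.HEADER.pack_into(self.shm.buf, 0, count + 1)

    def latest(self):
        count = self.HEADER.unpack_from(self.shm.buf, 0)[0]
        if count == 0:
            return None  # nothing captured yet
        start = self._offset(count - 1)
        return bytes(self.shm.buf[start:start + self.frame_bytes])

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()  # only the creating process should do this


# Single-process demo; a detector process would attach with create=False
# and the name taken from ring.shm.name.
ring = FrameRing(slots=4, frame_bytes=6)
ring.write(b"frame1")
ring.write(b"frame2")
print(ring.latest())
ring.close(unlink=True)
```

Dropping old frames on purpose is the point of this design: if inference is slower than capture, the detector skips ahead to the newest frame instead of building up a backlog.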

The Detection Pipeline

The core detection loop is surprisingly simple:

from pycoral.utils.edgetpu import make_interpreter
import cv2
import numpy as np

# Load YOLOv5s model compiled for Edge TPU
interpreter = make_interpreter('yolov5s_edgetpu.tflite')
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']

# parse_output, send_alert, draw_bbox, and gps_coords come from the rest of
# the project (output decoding, comms, MAVLink GPS feed) and are not shown here.

# Detection loop
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess: OpenCV delivers BGR, the model expects RGB at its input size
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (320, 320))
    input_frame = np.expand_dims(resized, axis=0)  # add the batch dimension

    # Run inference on Coral TPU (assumes a uint8-quantized input tensor)
    interpreter.set_tensor(input_index, input_frame)
    interpreter.invoke()

    # Parse detections
    detections = parse_output(interpreter, confidence_threshold=0.5)

    for det in detections:
        label = det['class']
        confidence = det['score']
        bbox = det['bbox']

        # Alert on high-priority targets
        if label in ['person', 'vehicle'] and confidence > 0.7:
            send_alert(label, confidence, gps_coords)
            draw_bbox(frame, bbox, label, confidence)

cap.release()
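
The `parse_output` helper is not shown in the post. YOLO-style models usually emit many overlapping boxes for a single object, so one step such a helper commonly includes is non-maximum suppression (NMS). The pure-Python sketch below is an assumption about that step, not the actual project code. It reuses the `class`, `score`, and `bbox` keys from the loop above and assumes boxes are `(x1, y1, x2, y2)` corner coordinates; the 0.45 IoU threshold is a placeholder.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Keep the highest-scoring box in each cluster of same-class overlaps."""
    kept = []
    # Visit boxes from most to least confident so winners are kept first.
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        overlaps = any(
            k["class"] == det["class"]
            and iou(k["bbox"], det["bbox"]) > iou_threshold
            for k in kept
        )
        if not overlaps:
            kept.append(det)
    return kept


raw = [
    {"class": "person", "score": 0.9, "bbox": (10, 10, 50, 90)},
    {"class": "person", "score": 0.6, "bbox": (12, 12, 52, 92)},  # near-duplicate
    {"class": "car", "score": 0.8, "bbox": (100, 100, 200, 160)},
]
for det in nms(raw):
    print(det["class"], det["score"])
```

The near-duplicate person box is dropped because it overlaps a higher-scoring box of the same class, while the car is kept because it does not overlap anything.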

The Challenges Nobody Warns You About

1. Thermal Throttling at Altitude

The Raspberry Pi throttles its CPU when it gets too hot (85°C). At ground level, this was rarely a problem. But at altitude, inside a vibrating drone with no case ventilation — the Pi would hit thermal limits within 3 minutes.

Solution: A tiny aluminum heatsink + a 5V micro fan strapped to the GPIO header. Added 8g of weight but kept the CPU at 65°C even during sustained inference.
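
To catch this kind of problem before a flight, the SoC temperature can be logged alongside the detections. The sketch below reads the standard Linux thermal zone file, which reports millidegrees Celsius on Raspberry Pi OS. The back-off policy in `frame_skip_for_temp` and its thresholds are hypothetical, shown only to illustrate reacting to heat in software; they are not part of the original build.

```python
from pathlib import Path

# Standard sysfs location of the SoC temperature, in millidegrees Celsius.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def read_cpu_temp_c(path=THERMAL_PATH):
    """Return the SoC temperature in degrees Celsius."""
    return int(Path(path).read_text().strip()) / 1000.0


def frame_skip_for_temp(temp_c, warn_c=75.0, limit_c=85.0):
    """Return how many frames to skip between inferences (0 = every frame)."""
    if temp_c >= limit_c:
        return 3  # hypothetical back-off: infer on every 4th frame
    if temp_c >= warn_c:
        return 1  # infer on every other frame
    return 0
```

On the Pi, `frame_skip_for_temp(read_cpu_temp_c())` could be polled once per second from the detector process; off the Pi, pass a different `path` to test against a plain text file.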

2. Vibration-Induced Blur

Drone motors create constant vibration. At 30fps, even slight vibration causes motion blur that kills detection accuracy. My initial tests showed a 40% drop in detection confidence compared to ground-based testing.

Solution: Anti-vibration mounting (rubber dampeners between the camera plate and the frame) + reducing camera exposure time to 1/500s. This traded some low-light performance for sharper images.

3. Power Budget

The Pi, Coral, and camera together draw ~15W. Running them off the flight battery (4S 5200mAh LiPo) would mean sharing it with the motors, so I ended up using a separate 2S 2200mAh LiPo dedicated to the compute payload, with a BEC (Battery Eliminator Circuit) providing a stable 5V.

Total flight time with AI running: 12 minutes. Without AI payload: ~18 minutes. The AI cost me about 6 minutes of flight time, a significant trade-off. Since the compute stack runs off its own battery, that loss comes mainly from the added weight.
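
A quick back-of-the-envelope check shows why the dedicated compute battery is not the limiting factor. The sketch below assumes a nominal 3.7V per LiPo cell, and the 80% usable capacity and 85% BEC efficiency are illustrative guesses rather than measured values.

```python
def battery_energy_wh(cells, capacity_mah, nominal_cell_v=3.7):
    """Nominal stored energy of a LiPo pack in watt-hours."""
    return cells * nominal_cell_v * capacity_mah / 1000.0


def runtime_minutes(energy_wh, load_w, usable_fraction=0.8, converter_efficiency=0.85):
    """Estimated runtime after derating for safe discharge depth and BEC losses."""
    usable_wh = energy_wh * usable_fraction * converter_efficiency
    return usable_wh / load_w * 60.0


compute_wh = battery_energy_wh(cells=2, capacity_mah=2200)
minutes = runtime_minutes(compute_wh, load_w=15)
print(f"{compute_wh:.1f} Wh, about {minutes:.0f} min at 15 W")
```

Under these assumptions the 2S pack holds roughly 16 Wh and lasts around 44 minutes at 15W, several times the 12-minute flight.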

4. Model Size vs Accuracy

YOLOv5s (small) was the largest model I could run on the Coral at acceptable speeds. But smaller models mean lower accuracy, especially for distant or partially occluded objects.

# Model benchmarks on Coral USB Accelerator
# ┌───────────────┬───────────┬──────────────┬────────────────┐
# │ Model         │ Size (MB) │ mAP@0.5:0.95 │ Inference (ms) │
# ├───────────────┼───────────┼──────────────┼────────────────┤
# │ YOLOv5n       │    3.8    │    28.0%     │       22       │
# │ YOLOv5s       │   14.1    │    37.4%     │       45       │
# │ YOLOv5m       │   42.2    │    45.4%     │      185 ❌    │
# └───────────────┴───────────┴──────────────┴────────────────┘
# ❌ = Too slow for real-time (>100ms)

I went with YOLOv5s at 320x320 resolution as the best speed-accuracy trade-off.
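
The selection rule is simple enough to write down: discard anything slower than the 100ms real-time cutoff, then take the most accurate model that remains. The sketch below encodes the benchmark table as data; the `Benchmark` class and `pick_model` function are illustrative names, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    map_pct: float     # accuracy figure from the table above
    latency_ms: float  # inference time per frame on the Coral

    @property
    def fps(self):
        return 1000.0 / self.latency_ms


BENCHMARKS = [
    Benchmark("YOLOv5n", 28.0, 22),
    Benchmark("YOLOv5s", 37.4, 45),
    Benchmark("YOLOv5m", 45.4, 185),
]


def pick_model(benchmarks, max_latency_ms=100.0):
    """Most accurate model whose latency fits the real-time budget."""
    eligible = [b for b in benchmarks if b.latency_ms <= max_latency_ms]
    if not eligible:
        raise ValueError("no model meets the latency budget")
    return max(eligible, key=lambda b: b.map_pct)


best = pick_model(BENCHMARKS)
print(best.name, round(best.fps, 1))
```

Tightening `max_latency_ms` to 30 would select YOLOv5n instead, which makes the speed-accuracy trade-off explicit.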

The Ground Station

The drone streams detection data over a 915MHz telemetry radio to a laptop running a custom dashboard built with Python + Flask. The dashboard shows:

  • Live GPS position on a map
  • Detection log with timestamps and confidence scores
  • Battery voltage and temperature readings
  • Alert notifications for high-priority detections

I intentionally left video out of the telemetry link. The 915MHz radio only supports ~19.2 kbps: enough for telemetry data, but nowhere near enough for video. Video is instead recorded on-device to a microSD card for post-flight review.
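
To get a feel for what ~19.2 kbps allows, here is a sketch that packs one detection into a compact binary record and computes how many such records fit on the link per second. The record layout is hypothetical (the post does not describe the wire format), and the calculation ignores MAVLink framing and the other telemetry sharing the radio, so the real budget is lower.

```python
import struct

# Hypothetical record: class id, confidence in percent, lat/lon in 1e-7 degrees,
# and a Unix timestamp. Little-endian, no padding: 1 + 1 + 4 + 4 + 4 = 14 bytes.
DETECTION = struct.Struct("<BBiiI")


def pack_detection(class_id, confidence, lat, lon, timestamp):
    """Serialize one detection into a 14-byte payload."""
    return DETECTION.pack(
        class_id,
        round(confidence * 100),
        round(lat * 1e7),
        round(lon * 1e7),
        timestamp,
    )


def unpack_detection(payload):
    """Inverse of pack_detection; confidence comes back with 1% resolution."""
    class_id, conf_pct, lat_e7, lon_e7, timestamp = DETECTION.unpack(payload)
    return class_id, conf_pct / 100, lat_e7 / 1e7, lon_e7 / 1e7, timestamp


def max_records_per_second(link_bps=19200, record_bytes=DETECTION.size):
    """Upper bound on records per second, ignoring all protocol overhead."""
    return link_bps // (record_bytes * 8)


payload = pack_detection(0, 0.87, 12.3456789, -98.7654321, 1700000000)
print(len(payload), max_records_per_second())
```

Even this optimistic upper bound is only a couple of kilobytes per second, which is plenty for detection events but orders of magnitude short of what compressed 1080p video needs.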

Results

After three months of building, crashing (twice), debugging, and iterating:

  • Detection range: Reliable person detection up to ~30m altitude
  • Inference speed: 22 FPS average (45ms per frame)
  • Accuracy: 78% AP for persons, 72% for vehicles at operational altitude
  • Flight time: 12 minutes with full AI payload
  • Total cost: ~$450 (excluding the laptop ground station)

Lessons Learned

  1. Prototype on the ground first. I wasted weeks debugging ML issues in flight when I should have perfected the detection pipeline at my desk with test footage.
  2. Weight matters more than specs. A slightly slower compute setup that weighs 50g less gives you 2 more minutes of flight time, and flight time is king.
  3. Edge AI is a different game. Models that work great on a desktop GPU behave very differently on a TPU. Quantization artifacts, memory limits, and thermal constraints change everything.
  4. Build for failure. The drone will crash. The SD card will corrupt. The Coral will disconnect mid-flight. Build graceful degradation into every layer.
  5. Document obsessively. Future-you will not remember why you chose that specific camera angle or that baud rate. Write it down.
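
As one illustration of lesson 4, here is a minimal sketch of graceful degradation for the detector: try the accelerated backend first, and if it raises (for example because the USB accelerator dropped off the bus), switch permanently to a slower fallback instead of crashing. The `FallbackDetector` class and both callables are hypothetical stand-ins, not code from the project.

```python
import logging

log = logging.getLogger("detector")


class FallbackDetector:
    """Try the accelerated backend first; fall back to a slower one on failure."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.degraded = False

    def detect(self, frame):
        if not self.degraded:
            try:
                return self.primary(frame)
            except Exception as exc:  # broad on purpose: any failure degrades
                log.warning("primary detector failed (%s); using fallback", exc)
                self.degraded = True
        return self.fallback(frame)


def broken_tpu(frame):
    raise RuntimeError("accelerator disconnected")


def cpu_stub(frame):
    return []  # stand-in for a slower CPU path that finds nothing


detector = FallbackDetector(broken_tpu, cpu_stub)
print(detector.detect(frame=None), detector.degraded)
```

The `degraded` flag is worth sending to the ground station as well, so the operator knows detections are arriving more slowly than usual.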

What's Next

I'm currently working on version 2 with an NVIDIA Jetson Nano replacing the Pi + Coral combo. The Jetson's 128-core GPU should let me run YOLOv5m in real time, significantly improving detection accuracy. I'm also exploring multi-drone coordination: multiple drones covering a larger area and sharing detection data.

If you're interested in the code, the detection pipeline and ground station are on my GitHub. Feel free to fork, improve, and build your own version.