Computer Vision Overview

Our computer vision (CV) system has three main components: the target detection pipeline, the mapping system, and synthetic data generation. Each is summarized briefly below; for specifics, visit the respective pages.

Target Detection Pipeline

Our main CV pipeline currently uses YOLO11 for target detection. Our localization software then maps each detection to a geographic location (latitude, longitude, altitude). Finally, we manually verify the results and accept or reject each detection. Every image we capture from the plane goes through this pipeline. In the future, we plan to run a clustering algorithm on the detected targets from all processed images so that target information can be extracted automatically and with high confidence.
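The three stages above (detect, localize, manually review) can be sketched as a small per-image function. This is only an illustrative outline, not our actual code: the `Detection` and `LocatedDetection` types and the `detect`, `localize`, and `review` callables are hypothetical names introduced here, standing in for the YOLO11 model, the localization software, and the manual accept/reject step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Detection:
    """A single detection in image (pixel) coordinates. Hypothetical type."""
    image_id: str
    label: str
    confidence: float
    pixel_xy: Tuple[float, float]


@dataclass
class LocatedDetection:
    """A detection together with its estimated geographic location. Hypothetical type."""
    detection: Detection
    lat: float
    lon: float
    alt: float


def process_image(
    image_id: str,
    image: object,
    detect: Callable[[str, object], Iterable[Detection]],
    localize: Callable[[Detection], Tuple[float, float, float]],
    review: Callable[[LocatedDetection], bool],
) -> List[LocatedDetection]:
    """Run one captured image through detect -> localize -> manual review."""
    accepted = []
    for det in detect(image_id, image):       # stage 1: target detection
        lat, lon, alt = localize(det)         # stage 2: pixel -> (lat, lon, alt)
        located = LocatedDetection(det, lat, lon, alt)
        if review(located):                   # stage 3: accept or reject
            accepted.append(located)
    return accepted
```

Passing the stages in as callables keeps the sketch independent of any particular model or localization method, so each stage can be replaced with a stub when testing.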

Mapping System

Our mapping system uses OpenCV to create a large-scale aerial panorama by stitching together multiple overlapping images taken from the plane. We do this in two passes: the first pass breaks the collected images into smaller overlapping chunks and stitches each chunk, and the second pass stitches the first-pass results together. This lets us handle large volumes of images while staying within memory constraints.
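The two-pass structure can be sketched independently of the stitching backend. In the outline below, `make_chunks`, `two_pass_stitch`, and the `chunk_size` and `overlap` parameters are hypothetical names and values chosen for illustration; `stitch_fn` is any function that turns a list of images into one stitched image (in practice it would wrap the OpenCV stitcher, which is not shown here).

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def make_chunks(items: Sequence[Image], chunk_size: int, overlap: int) -> List[List[Image]]:
    """Split items into consecutive chunks that share `overlap` items with their neighbor."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(list(items[start:start + chunk_size]))
        if start + chunk_size >= len(items):
            break  # this chunk already reaches the last item
    return chunks


def two_pass_stitch(
    images: Sequence[Image],
    stitch_fn: Callable[[List[Image]], Image],
    chunk_size: int = 8,   # assumed value, for illustration only
    overlap: int = 2,      # assumed value, for illustration only
) -> Image:
    """Pass 1 stitches each chunk; pass 2 stitches the chunk results together."""
    if not images:
        raise ValueError("no images to stitch")
    first_pass = [stitch_fn(chunk) for chunk in make_chunks(images, chunk_size, overlap)]
    if len(first_pass) == 1:
        return first_pass[0]  # a single chunk needs no second pass
    return stitch_fn(first_pass)
```

Because only one chunk is handed to `stitch_fn` at a time during the first pass, the number of full-resolution images being stitched at once is bounded by `chunk_size`, and the shared images between neighboring chunks give the second pass overlapping content to align on.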

Synthetic Dataset Generation

Our synthetic data generation system, more commonly known as "not-stolen", generates synthetic aerial images that simulate the pictures we would take at the competition. It layers a set of targets on top of background aerial images of the runway at the competition venue. We use the resulting images to test and benchmark our target detection pipeline, which lets us quickly swap between different models and test which works best for our use cases.
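The layering idea can be illustrated with a dependency-free sketch. This is not how "not-stolen" is implemented: it uses nested lists of grayscale values as stand-in images, and `paste_target` and `make_sample` are hypothetical helpers introduced here. The point it shows is that pasting a target at a known position also yields a ground-truth bounding box, which is what makes the synthetic images usable for benchmarking a detector.

```python
import random
from typing import List, Optional, Tuple

Pixels = List[List[Optional[int]]]
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def paste_target(background: Pixels, target: Pixels, top: int, left: int) -> Tuple[Pixels, BBox]:
    """Return a copy of `background` with `target` pasted at (top, left), plus its bounding box."""
    t_h, t_w = len(target), len(target[0])
    b_h, b_w = len(background), len(background[0])
    if top < 0 or left < 0 or top + t_h > b_h or left + t_w > b_w:
        raise ValueError("target does not fit inside the background at this position")
    image = [row[:] for row in background]  # copy so the background can be reused
    for r in range(t_h):
        for c in range(t_w):
            pixel = target[r][c]
            if pixel is not None:           # None marks a transparent target pixel
                image[top + r][left + c] = pixel
    return image, (left, top, left + t_w, top + t_h)


def make_sample(background: Pixels, target: Pixels, rng: random.Random) -> Tuple[Pixels, BBox]:
    """Paste the target at a random position that keeps it fully inside the background."""
    t_h, t_w = len(target), len(target[0])
    top = rng.randint(0, len(background) - t_h)
    left = rng.randint(0, len(background[0]) - t_w)
    return paste_target(background, target, top, left)
```

Passing in a seeded `random.Random` makes a generated dataset reproducible, so different detection models can be compared on exactly the same synthetic images.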