Target Detection Pipeline

Our target detection pipeline handles object detection, localization, and mapping during competition missions. The system employs a multithreaded "aggregator" that processes aerial images in parallel, identifying ground targets and computing their GPS coordinates for airdrop missions.

Overall flow of each image

1. Image Acquisition and Preprocessing

The CV pipeline begins by acquiring raw images from the plane's camera (or from the mock camera during simulated tests). Each image undergoes initial preprocessing to prepare it for subsequent analysis:

  • Artifact Removal: We first remove camera-specific artifacts, such as green bars or distortions introduced by our camera.
  • Geometric Corrections: The images are then cropped and resized to standardized dimensions while maintaining their aspect ratios.
  • Quality Optimization: The images are also prepared for inference through normalization and format standardization.
  • Letterbox Preprocessing: Images are scaled and padded to fit the model's input dimensions while preserving their aspect ratios.
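As a rough sketch of the letterbox step (the names and the square input size here are illustrative, not the actual OBC code), the scale and padding can be computed like this:

```cpp
#include <algorithm>

// Sketch of letterbox math: fit a (srcW x srcH) image into a square
// model input of side dstSize, preserving aspect ratio and centering
// the image inside symmetric padding.
struct Letterbox {
    double scale;   // uniform resize factor applied to the source image
    int padX, padY; // padding added on the left and top after resizing
};

Letterbox computeLetterbox(int srcW, int srcH, int dstSize) {
    // Pick the smaller scale so the whole image fits inside the input
    double scale = std::min(static_cast<double>(dstSize) / srcW,
                            static_cast<double>(dstSize) / srcH);
    int newW = static_cast<int>(srcW * scale);
    int newH = static_cast<int>(srcH * scale);
    // Remaining space is split evenly on both sides (centered padding)
    return {scale, (dstSize - newW) / 2, (dstSize - newH) / 2};
}
```

These three values are reused later to map detections back from model coordinates to the original image.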

2. Object Detection Framework

Our core detection mechanism utilizes a YOLO11 model run via ONNX Runtime, which lets us perform inference in C++:

  • Inference: The YOLO model processes images to identify potential targets with associated confidence scores
  • Post-processing: Detected objects are filtered based on confidence thresholds and transformed back to original image coordinates
  • Bounding Box Generation: Each detection produces bounding boxes around identified objects
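The post-processing step can be sketched as follows (a minimal illustration, assuming the letterbox parameters from preprocessing; the `Box` type and field names are hypothetical):

```cpp
#include <vector>

// Sketch: filter raw detections by confidence and map their coordinates
// from the letterboxed model input back to the original image frame.
// scale/padX/padY are the letterbox parameters used during preprocessing.
struct Box { double x, y, w, h, conf; };

std::vector<Box> postprocess(const std::vector<Box>& raw, double confThresh,
                             double scale, double padX, double padY) {
    std::vector<Box> kept;
    for (const Box& b : raw) {
        if (b.conf < confThresh) continue;     // drop low-confidence detections
        kept.push_back({(b.x - padX) / scale,  // undo padding, then scaling
                        (b.y - padY) / scale,
                        b.w / scale,           // sizes only need un-scaling
                        b.h / scale,
                        b.conf});
    }
    return kept;
}
```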

3. Geospatial Localization

Our system employs the Ground Sample Distance (GSD) method to convert pixel coordinates to real-world GPS positions:

  • The ground coverage per pixel is calculated based on aircraft altitude and camera specifications
  • Trigonometric transformations are also used to account for aircraft heading and attitude
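A simplified sketch of the GSD math is below. The camera parameters, the heading convention (0 = north, clockwise), and the assumption of a nadir-pointing camera are all illustrative, not the actual OBC implementation:

```cpp
#include <cmath>

// Sketch of the GSD method: convert a pixel offset from the image center
// into a north/east ground offset (meters) relative to the aircraft,
// assuming the camera points straight down and is aligned with the heading.
struct GroundOffset { double north, east; };

GroundOffset pixelToGround(double px, double py,      // pixel coordinates
                           double imgW, double imgH,  // image size in pixels
                           double altitude,           // meters above ground
                           double sensorW,            // sensor width, meters
                           double focalLen,           // focal length, meters
                           double headingRad) {       // 0 = north, clockwise
    // Ground sample distance: meters of ground covered by one pixel
    double gsd = (altitude * sensorW) / (focalLen * imgW);
    double dx = (px - imgW / 2.0) * gsd;  // right of center, camera frame
    double dy = (imgH / 2.0 - py) * gsd;  // forward of center, camera frame
    // Rotate the camera-frame offset into the north/east frame
    return {dy * std::cos(headingRad) - dx * std::sin(headingRad),
            dy * std::sin(headingRad) + dx * std::cos(headingRad)};
}
```

The resulting offset would then be added to the aircraft's GPS position to obtain the target's coordinates.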

System Architecture

Pipeline

The above steps are condensed into a "pipeline" component, which coordinates the entire detection workflow for every image we capture. To summarize:

  • Input Validation: The pipeline first reads in the images and ensures that each image has its associated telemetry information
  • Sequential Processing: It then manages the flow from preprocessing through object detection to localization
  • Result Output and Annotation: Finally, the pipeline outputs each target's corresponding bounding boxes, confidence scores, and GPS coordinates. It also overlays this information directly on top of the images for manual verification
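The input-validation gate can be sketched as a small control-flow example (the `Frame`, `Telemetry`, and `Result` types are hypothetical stand-ins, not the actual OBC types):

```cpp
#include <optional>
#include <string>
#include <vector>

// Sketch of the pipeline's control flow: an image is only processed
// if it arrived with its associated telemetry.
struct Telemetry { double lat, lon, altitude, heading; };
struct Frame {
    std::vector<unsigned char> pixels;
    std::optional<Telemetry> telemetry;
};
struct Result { int numTargets; std::string annotatedPath; };

std::optional<Result> runPipeline(const Frame& frame) {
    // Input validation: reject empty frames or frames missing telemetry
    if (frame.pixels.empty() || !frame.telemetry)
        return std::nullopt;
    // Preprocessing, detection, and localization would run here;
    // stubbed out for this sketch.
    return Result{0, "annotated.jpg"};
}
```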

Multithreaded Aggregator

The "aggregator" spawns concurrent pipelines to parallelize inference and accumulate the results:

  • Thread Pool Management: Spawns worker threads for parallel image processing under memory constraints
  • Queue Management: Handles overflow queues for high-frequency image streams
  • Result Consolidation: Aggregates detection results across multiple images and time periods
  • Memory Management: Efficiently handles large image datasets and processing results
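The worker-pool pattern can be sketched as follows. This is a minimal illustration of the thread-pool and queue-management ideas (the class name, the `int` image IDs, and the stand-in workload are all assumptions, not the real aggregator):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch: a fixed pool of worker threads drains a shared queue of images
// and appends per-image results to a consolidated list under a lock.
class Aggregator {
public:
    explicit Aggregator(int numWorkers) {
        for (int i = 0; i < numWorkers; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    void submit(int imageId) {
        { std::lock_guard<std::mutex> lk(mtx_); queue_.push(imageId); }
        cv_.notify_one();
    }
    // Stop accepting work, drain the queue, and return all results.
    std::vector<int> finish() {
        { std::lock_guard<std::mutex> lk(mtx_); done_ = true; }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
        return results_;
    }
private:
    void workerLoop() {
        while (true) {
            int id;
            {
                std::unique_lock<std::mutex> lk(mtx_);
                cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // done and fully drained
                id = queue_.front();
                queue_.pop();
            }
            int result = id * 2;  // stand-in for the detection pipeline
            std::lock_guard<std::mutex> lk(mtx_);
            results_.push_back(result);
        }
    }
    std::vector<std::thread> workers_;
    std::queue<int> queue_;
    std::vector<int> results_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

A bounded queue with an overflow policy would extend this sketch to handle high-frequency image streams under memory constraints.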

Manual Verification

While all of the above runs on the OBC, we currently still require manual verification of detections via the GCS. Every result (the annotated image with its corresponding bounding boxes and coordinates) is sent to the GCS for us to review manually. We hope to automate this step with a clustering algorithm in the future.

Pipeline flowchart

Image of the pipeline flow