Fast-Moving Object Detection
Iterative Self-Training for Sub-Pixel Trajectory Estimation
Abstract
This case study documents our iterative training methodology for detecting fast-moving objects in sports video, achieving >96% recall from a baseline of 49.3%. Using tennis ball detection as our primary domain, we developed a self-training pipeline that combines physics-based filtering with progressive dataset refinement across 42,863 images from 12 heterogeneous sources.
The methodology generalizes to other fast-moving objects (shuttlecocks, golf balls, hockey pucks) where traditional single-frame detection fails due to motion blur, sub-pixel size, and sporadic occlusion.
The Fast Object Detection Challenge
Fast-moving objects in sports video present a unique detection challenge. Unlike static objects or slow-moving targets, they exhibit characteristics that break standard detection assumptions.
Motion Blur
At 200+ km/h, a tennis ball travels 5-15 pixels per frame at 30fps, creating elongated blur artifacts that don't match training distributions.
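To make the blur length concrete, per-frame displacement is just speed divided by frame rate, scaled by the camera's pixels-per-meter. A minimal sketch of that conversion; the `px_per_meter` scale is an assumption that depends entirely on camera angle, zoom, and distance, and must be estimated per scene (e.g. from known court dimensions):

```python
def pixels_per_frame(speed_kmh: float, fps: float, px_per_meter: float) -> float:
    """On-screen displacement of the ball between consecutive frames.

    px_per_meter is scene-dependent (camera geometry, zoom) and is an
    assumed input here, not a constant.
    """
    meters_per_frame = speed_kmh / 3.6 / fps   # km/h -> m/s -> m/frame
    return meters_per_frame * px_per_meter

# A 200 km/h serve covers ~1.85 m between frames at 30 fps; at an
# assumed 5 px/m that is roughly 9 px of travel (and blur) per frame.
print(round(pixels_per_frame(200, 30, px_per_meter=5.0), 1))  # 9.3
```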
Sub-Pixel Size
In wide-angle broadcast footage, the ball occupies 8-20 pixels in diameter, smaller than most anchor boxes in standard detectors.
Sporadic Occlusion
The ball is frequently occluded by players, the net, or rackets, requiring trajectory interpolation rather than frame-by-frame detection.
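Where the ball disappears for a few frames, one simple way to recover a continuous track is to interpolate between the surrounding detections. A minimal sketch, assuming detections arrive as per-frame (x, y) positions sorted by frame index; linear interpolation is used for brevity, though a parabolic (ballistic) fit is more faithful over longer gaps:

```python
import numpy as np

def interpolate_gaps(det_frames, xs, ys, all_frames):
    """Estimate ball positions for occluded frames by interpolating
    between surrounding detections. det_frames must be sorted.
    """
    xs_full = np.interp(all_frames, det_frames, xs)
    ys_full = np.interp(all_frames, det_frames, ys)
    return np.stack([all_frames, xs_full, ys_full], axis=1)

# Detections at frames 10, 11 and 14 -> estimates for frames 12-13.
track = interpolate_gaps([10, 11, 14], [100, 112, 148], [300, 296, 284],
                         all_frames=np.arange(10, 15))
```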
Baseline Performance Problem
Off-the-shelf models trained on general object detection datasets (COCO, ImageNet) perform poorly on fast-moving sports objects:
High precision paired with low recall indicates a conservative model: its detections are usually correct, but it misses most actual ball instances.
Iterative Self-Training Methodology
We developed an iterative self-training pipeline that progressively improves model performance through automated labeling, physics-based verification, and dataset refinement.
Training Loop
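In outline, each round pseudo-labels unlabeled footage with the current model, keeps only detections that survive the physics filter, folds the survivors into the training set, and retrains. A minimal sketch of that loop; `auto_label`, `physics_filter`, and `train_detector` are illustrative placeholders, not functions from a published codebase:

```python
def self_training_loop(model, unlabeled_videos, labeled_set, rounds=5):
    """One round = pseudo-label, physics-verify, refine dataset, retrain."""
    for _ in range(rounds):
        pseudo = auto_label(model, unlabeled_videos)   # run current model on raw footage
        verified = physics_filter(pseudo)              # drop physically impossible tracks
        labeled_set = labeled_set + verified           # progressively grow the dataset
        model = train_detector(labeled_set)            # retrain on the cleaner data
    return model
```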
Physics-Based Filtering
Trajectory constraints based on real ball physics eliminate impossible detections:
- max_velocity: 150 px/frame (≈ 250 km/h at 1080p/30fps)
- trajectory_smoothness: 0.85 (penalizes erratic frame-to-frame jumps)
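A sketch of how the two constraints above might be applied to a candidate track. The thresholds mirror the values listed, but the smoothness score used here (mean cosine similarity of consecutive displacement vectors) is one plausible interpretation of the 0.85 setting, not a confirmed detail of the pipeline:

```python
import numpy as np

MAX_VELOCITY_PX = 150   # px/frame, ~250 km/h at 1080p/30fps
MIN_SMOOTHNESS = 0.85   # penalizes erratic frame-to-frame jumps

def track_is_plausible(points: np.ndarray) -> bool:
    """points: (N, 2) array of per-frame (x, y) ball positions."""
    if len(points) < 3:
        return True                                  # too short to judge
    deltas = np.diff(points, axis=0)                 # displacement per frame
    if np.linalg.norm(deltas, axis=1).max() > MAX_VELOCITY_PX:
        return False                                 # physically impossible speed
    # Smoothness as mean cosine similarity of consecutive steps;
    # genuine direction changes (bounces, racket impacts) would need
    # to be exempted in a real implementation.
    a, b = deltas[:-1], deltas[1:]
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
    return float(((a * b).sum(axis=1) / denom).mean()) >= MIN_SMOOTHNESS
```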
Negative Sampling Strategy
Include 20% empty frames (no ball visible) in training to reduce false positives from ball-like objects (round logos, white court markings, spectator clothing).
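One way to enforce that ratio when assembling the training list; the helper below is a sketch, and note that in YOLO-format datasets a negative frame is simply an image with an empty (or absent) label file:

```python
import random

def build_training_list(ball_frames, empty_frames, neg_fraction=0.2, seed=0):
    """Mix ball-free frames into the set so they make up neg_fraction
    of the final list (0.2 negatives -> 1 negative per 4 positives).
    """
    rng = random.Random(seed)
    n_neg = int(len(ball_frames) * neg_fraction / (1 - neg_fraction))
    negatives = rng.sample(empty_frames, min(n_neg, len(empty_frames)))
    dataset = ball_frames + negatives
    rng.shuffle(dataset)
    return dataset
```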
Multi-Source Dataset Unification
Combined 12 heterogeneous sources with varying camera angles, resolutions, and court surfaces to maximize generalization.
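The merge step itself is mechanical once every source has been converted to a common label format. A sketch assuming per-source YOLO-format directories; the layout and naming scheme are illustrative:

```python
from pathlib import Path
import shutil

def unify_sources(source_dirs, out_dir):
    """Copy per-source YOLO datasets into one tree, prefixing file
    names with the source so frames from different cameras and court
    surfaces remain traceable.
    """
    out = Path(out_dir)
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "labels").mkdir(parents=True, exist_ok=True)
    for src in map(Path, source_dirs):
        for img in sorted((src / "images").glob("*")):
            stem = f"{src.name}_{img.stem}"
            shutil.copy(img, out / "images" / (stem + img.suffix))
            label = src / "labels" / (img.stem + ".txt")
            if label.exists():   # absent label file == negative frame
                shutil.copy(label, out / "labels" / (stem + ".txt"))
```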
Training Configuration
Dataset Statistics
- Total Images: 42,863
- Train Split: 34,285 (80%)
- Validation Split: 4,288 (10%)
- Test Split: 4,290 (10%)
- Data Sources: 12
Model Configuration
- Architecture: YOLO11
- Variants: nano, medium
- Image Size: 640-960 px
- Batch Size: 8-16
- Epochs: 100+
Custom Loss Weights
Tuned for small object detection with emphasis on localization accuracy:
- box: 7.5 (localization priority)
- cls: 0.5 (single class)
- dfl: 1.5 (distribution focal loss)
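With Ultralytics YOLO these weights are passed directly as training arguments. A sketch using one point from the configuration ranges above; the dataset YAML path is a placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")           # nano variant
model.train(
    data="tennis_unified.yaml",      # placeholder dataset config
    imgsz=960,                       # from the 640-960 px range
    batch=16,
    epochs=100,
    box=7.5,                         # localization priority
    cls=0.5,                         # single class
    dfl=1.5,                         # distribution focal loss
)
```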
Results & Benchmarks
Performance Progression
| Model Version | Precision | Recall | mAP50 | Status |
|---|---|---|---|---|
| Roboflow v12 (baseline) | 80.3% | 49.3% | 33.6% | Baseline |
| Custom v3 | 87.2% | 72.1% | 68.4% | Improved |
| Unified v1 (medium) | 92.8% | 89.4% | 91.2% | Good |
| Unified v2 (nano) | ~94% | >96% | ~98% | Production |
Key Finding
The nano model outperformed the medium model in production benchmarks. Why? The unified dataset with physics filtering produced cleaner labels that smaller models could learn more effectively. Larger models overfit to noise in earlier, messier datasets.
Generalization to Other Domains
The iterative self-training methodology generalizes to other fast-moving object detection tasks. The physics-based filtering approach adapts by changing the velocity constraints and trajectory model; a sketch of per-domain constraints follows the list below.
Applicable Domains
- Badminton shuttlecock (faster, smaller)
- Golf ball tracking (extreme speed)
- Hockey puck (low contrast on ice)
- Table tennis (extreme motion blur)
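Adapting the filter is then mostly a matter of swapping per-domain constraints. In the sketch below, the tennis values come from the configuration above; the other entries are illustrative placeholders that would need calibration from real footage, not measured parameters:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicsConfig:
    max_velocity_px: float        # px/frame; depends on resolution and fps
    trajectory_smoothness: float  # threshold for the smoothness score
    min_object_px: int            # smallest expected apparent diameter

DOMAIN_CONFIGS = {
    "tennis":       PhysicsConfig(150, 0.85, 8),   # from this case study
    # Placeholder values below -- calibrate per deployment:
    "badminton":    PhysicsConfig(200, 0.75, 6),   # faster, rapid deceleration
    "golf":         PhysicsConfig(250, 0.90, 4),   # extreme launch speed
    "hockey_puck":  PhysicsConfig(120, 0.90, 10),  # low contrast on ice
    "table_tennis": PhysicsConfig(180, 0.70, 6),   # extreme motion blur, spin
}
```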
Transferable Components
- Iterative self-training loop
- Physics-based trajectory filtering
- Negative sampling strategy
- Multi-source dataset unification