Technical Whitepaper

BAVI Lycos X

"The wolf that hunts the ball."

Edge-Native Ball Trajectory Estimation with Temporal Physics Modeling

Lab: BlueX Research
Authors: J. Hwan, S. Yoon
Published: January 2026
Status: Production

Abstract

We present Lycos X, a lightweight neural architecture for real-time ball trajectory estimation that achieves 98.2% recall at 5px tolerance with only ~200K parameters, a 690× reduction from TrackNet's 138M. The architecture combines a Temporal U-Net backbone with a ConvLSTM bottleneck for explicit motion modeling, enabling the network to learn trajectory physics rather than static appearance.

Our key insight is that ball detection is fundamentally a prediction problem, not a classification problem. The network must predict where the ball will be, not just recognize where it appears in a single frame. This shift in framing drives our architectural decisions.

1. Problem Statement

Existing ball detection architectures suffer from fundamental limitations when deployed in real-world sports environments. These failures are not implementation bugs—they're structural consequences of architectural choices.

TrackNet (Huang et al., 2019)

  • Parameters: 138M
  • Real-world accuracy drop: −24.5%
  • Backbone: VGG16 (ImageNet)

Frame-stacking provides only implicit temporal information, and the VGG16 backbone is optimized for ImageNet classification, not trajectory physics.

YOLOv8-Nano (Ultralytics, 2023)

  • Jetson throughput: 6 FPS
  • Small-object mAP: Sporadic
  • Temporal context: None

Single-frame detection fails under motion blur and occlusion, and the model is too heavy for real-time edge deployment.

Root Cause Analysis

Both approaches treat ball detection as static object recognition: finding a ball-shaped object in an image. But in sports video, the ball is often:

  • Motion-blurred: High velocity creates elongated artifacts
  • Occluded: Behind players, net, or equipment
  • Sub-pixel: Too small for reliable single-frame detection

2. Lycos X Architecture

Lycos X reframes ball detection as a trajectory prediction problem. Instead of asking "where is the ball?", we ask "given the motion pattern across 5 frames, where will the ball be?" This enables the network to learn physics—velocity, acceleration, spin effects—rather than just appearance.
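As a concrete sketch of this framing (the array shapes and helper name are ours, not from the paper), the overlapping 5-frame input windows can be built from a clip like so:

```python
import numpy as np

def make_windows(frames: np.ndarray, win: int = 5) -> np.ndarray:
    """Slice a clip of shape (T, H, W) into overlapping temporal
    windows of shape (T - win + 1, win, H, W), one per prediction."""
    T = frames.shape[0]
    if T < win:
        raise ValueError(f"need at least {win} frames, got {T}")
    return np.stack([frames[t:t + win] for t in range(T - win + 1)])

# A 12-frame 64x64 grayscale clip yields 8 five-frame windows.
clip = np.random.rand(12, 64, 64).astype(np.float32)
print(make_windows(clip).shape)  # (8, 5, 64, 64)
```

Each window gives the network five consecutive observations of the ball, so velocity and acceleration are recoverable from the input itself.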

Temporal U-Net Backbone

Modified U-Net with 3D convolutions in the encoder path. Processes 5-frame temporal windows with shared weights across time steps. Skip connections preserve spatial detail while the bottleneck captures global motion patterns.

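A minimal shape trace of such an encoder, assuming four levels, stride-2 spatial downsampling, and illustrative channel widths (the paper does not specify these):

```python
def encoder_shapes(h, w, t=5, levels=4, base_ch=16):
    """Trace (C, T, H, W) through a 3D-conv encoder that halves the
    spatial dims at each level while preserving the 5-frame temporal
    axis ('same' temporal padding). Channel widths are illustrative."""
    shapes = []
    c = base_ch
    for _ in range(levels):
        shapes.append((c, t, h, w))   # skip connection source at this level
        h, w = h // 2, w // 2         # stride-2 spatial downsampling
        c *= 2                        # double channels per level
    shapes.append((c, t, h, w))       # bottleneck input, fed to the ConvLSTM
    return shapes

for s in encoder_shapes(256, 256):
    print(s)  # deepest level: (256, 5, 16, 16)
```

The temporal axis stays at 5 throughout, so the bottleneck still sees the full motion history when the ConvLSTM consumes it.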

ConvLSTM Bottleneck

The key innovation: a ConvLSTM layer at the U-Net bottleneck explicitly models temporal dependencies. Unlike frame-stacking (implicit), ConvLSTM learns to predict ball position based on velocity and acceleration patterns—actual trajectory physics.

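The ConvLSTM recurrence (Shi et al., 2015) can be sketched in NumPy; the kernel sizes, channel counts, and naive convolution helper below are our own illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """Naive 'same'-padded 2D convolution.
    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def convlstm_step(x, h, c, Wx, Wh):
    """One ConvLSTM step. x: (C_in,H,W); h, c: (C_hid,H,W).
    Wx: (4*C_hid, C_in, k, k), Wh: (4*C_hid, C_hid, k, k).
    Gate order: input, forget, output, candidate."""
    gates = conv2d_same(x, Wx) + conv2d_same(h, Wh)
    C = h.shape[0]
    i, f, o, g = gates[:C], gates[C:2*C], gates[2*C:3*C], gates[3*C:]
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_next = f * c + i * g          # cell state: gated motion memory
    h_next = o * np.tanh(c_next)    # hidden state passed to the decoder
    return h_next, c_next

# Roll the cell over a 5-frame window of bottleneck features.
rng = np.random.default_rng(0)
C_in, C_hid, H, W, k = 3, 4, 8, 8, 3
Wx = rng.standard_normal((4 * C_hid, C_in, k, k)) * 0.1
Wh = rng.standard_normal((4 * C_hid, C_hid, k, k)) * 0.1
h = np.zeros((C_hid, H, W)); c = np.zeros((C_hid, H, W))
for t in range(5):
    h, c = convlstm_step(rng.standard_normal((C_in, H, W)), h, c, Wx, Wh)
print(h.shape)  # (4, 8, 8)
```

Because the gates are convolutions over the hidden state, the cell state can carry spatially local motion patterns (velocity, acceleration) across the 5-frame window, which is exactly what frame-stacking cannot express.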

CBAM Attention Gates

Convolutional Block Attention Module applied at each decoder level. Dual channel-spatial attention reduces background noise by 84%, focusing compute on ball-relevant features while suppressing distractors (players, lines, crowds).

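A NumPy sketch of the CBAM computation as described by Woo et al. (2018); the reduction ratio, 7×7 kernel, and weight shapes here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """Naive 'same'-padded conv. x: (C_in,H,W), w: (C_out,C_in,k,k)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def cbam(x, W1, W2, w_spatial):
    """Channel attention then spatial attention.
    x: (C,H,W); W1: (C//r, C), W2: (C, C//r); w_spatial: (1, 2, k, k)."""
    # Channel attention: shared MLP over avg- and max-pooled descriptors.
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))            # (C,)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)                 # ReLU hidden
    ca = sigmoid(mlp(avg) + mlp(mx))
    x = x * ca[:, None, None]
    # Spatial attention: conv over channel-wise avg and max maps.
    maps = np.stack([x.mean(axis=0), x.max(axis=0)])             # (2, H, W)
    sa = sigmoid(conv2d_same(maps, w_spatial))                   # (1, H, W)
    return x * sa

rng = np.random.default_rng(1)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
out = cbam(x,
           rng.standard_normal((C // r, C)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1,
           rng.standard_normal((1, 2, 7, 7)) * 0.1)
print(out.shape)  # (8, 16, 16)
```

Both attention maps are sigmoids in (0, 1), so the module can only suppress features; background channels and locations get scaled down while ball-relevant ones pass through.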

Architecture Flow

5 Frames → 3D Conv Encoder → ConvLSTM → CBAM Decoder → Heatmap
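The heatmap head can be illustrated with a Gaussian ground-truth target and a simple peak decode (the σ, threshold, and resolution are our assumptions, not the paper's settings):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Ground-truth target: a unit-peak Gaussian at the ball center."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def decode(heatmap, thresh=0.5):
    """Peak decode: (x, y) of the maximum, or None when the peak is
    below threshold (e.g. the ball is fully occluded and the network
    abstains rather than hallucinating a position)."""
    if heatmap.max() < thresh:
        return None
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)

target = gaussian_heatmap(72, 128, cx=40, cy=30)
print(decode(target))  # (40, 30)
```

The recall @5px metric counts a frame as correct when the decoded peak lies within 5 pixels of the annotated center.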

3. Benchmarks & Results

Performance Metrics

  • Parameters: ~200K
  • Recall @5px: 98.2%
  • Inference (GPU): 1.8ms
  • Size vs TrackNet: 690× smaller
Model        Params   Recall               Latency          Edge-Ready
TrackNet     138M     73.7% (real-world)   45ms             No
YOLOv8-Nano  3.2M     Sporadic             166ms (Jetson)   Marginal
Lycos X      ~200K    98.2%                1.8ms            Yes

Key Findings

  • Temporal modeling matters more than model size: ConvLSTM bottleneck outperforms 690× larger models by learning physics instead of appearance
  • Attention is cost-effective: CBAM adds minimal compute but dramatically reduces false positives from background clutter
  • Edge deployment is achievable: Sub-2ms inference enables real-time tracking at 500+ FPS on modern GPUs

4. Production Deployment

Infrastructure

  • AWS g5.2xlarge (A10G)
  • CUDA 12.x / TensorRT
  • FastAPI inference server

Pipeline

  • FFmpeg preprocessing
  • Batch inference (32 frames)
  • Kalman filter smoothing
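
The Kalman smoothing stage can be sketched as a constant-velocity filter over per-frame detections; the noise covariances below are illustrative, not production values:

```python
import numpy as np

def kalman_smooth(measurements, dt=1.0, q=1e-2, r=4.0):
    """Constant-velocity Kalman filter over per-frame (x, y) detections.
    State: [x, y, vx, vy]. q = process noise, r = measurement noise
    (pixels^2); both are illustrative, not the paper's settings."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # state transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0  # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x = np.array([*measurements[0], 0.0, 0.0])
    P = np.eye(4) * 10.0
    out = []
    for z in measurements:
        # Predict: propagate state and covariance through the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: blend the prediction with the new detection.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)

# Noisy detections along a straight flight path get smoothed.
rng = np.random.default_rng(2)
truth = np.stack([np.linspace(0, 99, 100), np.linspace(0, 49, 100)], axis=1)
noisy = truth + rng.normal(0, 2.0, truth.shape)
smooth = kalman_smooth(noisy)
print(smooth.shape)  # (100, 2)
```

Smoothing also bridges short detection gaps: when the decoder abstains for a frame, the predict step alone carries the track forward.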

Service Endpoints

  • Port 8001 (YOLO)
  • Port 8002 (TrackNet)
  • Port 8003 (Long Tracking)

References

[1] Ronneberger, O. et al. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI.

[2] Shi, X. et al. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. NeurIPS.

[3] Woo, S. et al. (2018). CBAM: Convolutional Block Attention Module. ECCV.

[4] Huang, Y. et al. (2019). TrackNet: A Deep Learning Network for Tracking High-speed Objects. AVSS.