Fighter Detection & Pose Analysis
Multi-Model Pipeline for Combat Sports Analytics
Abstract
This case study documents our multi-model pipeline for fighter detection and pose analysis in combat sports video. The system achieves real-time performance (<100ms latency) by combining a custom-trained YOLOv8 detector for fighter identification, YOLOv8-Pose for 17-keypoint skeletal tracking, and ST-GCN for kick technique classification.
Developed for taekwondo match analysis, the methodology generalizes to other combat sports (MMA, boxing, karate) where dual-fighter tracking, pose estimation, and action recognition are required at broadcast-quality frame rates.
The Combat Sports Detection Challenge
Combat sports video presents unique challenges for computer vision. Unlike static object detection or single-person pose estimation, the domain requires simultaneous tracking of multiple rapidly-moving, frequently-occluding subjects with real-time performance constraints.
Dual-Fighter Tracking
Two fighters must be tracked with consistent IDs across frames, even during clinches, spins, and rapid exchanges where bounding boxes overlap.
Identity Preservation
Fighters must be distinguished by corner (red vs. blue hogu) across occlusions, similar poses, and varying camera angles.
Real-Time Constraint
Match analysis requires 30+ FPS processing to capture rapid kick sequences while streaming results to the client via Server-Sent Events (SSE).
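As a sketch of the streaming side, the snippet below pushes per-frame results to the client over SSE. It assumes a FastAPI server; the endpoint path and payload fields are illustrative, not the project's actual API.

```python
# Minimal SSE sketch (assumption: FastAPI; route and payload are illustrative).
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def frame_results():
    """Yield one SSE event per analyzed frame."""
    frame_idx = 0
    while True:
        # In the real pipeline this payload would come from the detection,
        # pose, and classification stages; here it is a placeholder.
        payload = {"frame": frame_idx, "fighters": [], "events": []}
        yield f"data: {json.dumps(payload)}\n\n"
        frame_idx += 1
        await asyncio.sleep(1 / 30)  # pace events at ~30 FPS

@app.get("/stream/{match_id}")
async def stream(match_id: str):
    return StreamingResponse(frame_results(), media_type="text/event-stream")
```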
Why Generic Person Detection Fails
Standard COCO-trained person detectors cannot distinguish between fighters by corner color. Generic pose estimators struggle with combat-specific poses (high kicks, spinning techniques) that are underrepresented in general datasets.
Multi-Model Pipeline Architecture
The system employs a cascaded architecture where specialized models handle distinct tasks. Each component is optimized for its specific function, enabling parallel execution on GPU and modular upgrades.
Processing Pipeline
Stage 1: Fighter Detection (YOLOv8n-TKD)
Custom-trained YOLOv8 nano model for taekwondo-specific fighter detection. Trained to recognize red and blue hogu (chest protector) as separate classes.
- Classes: [blue, red] (corner-specific detection)
- Model: YOLOv8n (~3.2M parameters)
- Latency: ~8ms (per-frame inference)
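A minimal sketch of how Stage 1 can be called per frame using the Ultralytics Python API; the weights filename is a placeholder, and the class index order simply follows the [blue, red] class list above.

```python
# Stage 1 sketch: per-frame fighter detection with the custom hogu model.
# "yolov8n_tkd.pt" is a placeholder name for the custom-trained weights.
from ultralytics import YOLO

detector = YOLO("yolov8n_tkd.pt")  # classes: 0=blue, 1=red

def detect_fighters(frame):
    """Return {corner: (x1, y1, x2, y2)} using the highest-confidence box per corner."""
    result = detector(frame, verbose=False)[0]
    best = {}
    for box in result.boxes:
        corner = result.names[int(box.cls)]          # "blue" or "red"
        conf = float(box.conf)
        if corner not in best or conf > best[corner][1]:
            best[corner] = (box.xyxy[0].tolist(), conf)
    return {corner: xyxy for corner, (xyxy, _) in best.items()}
```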
Stage 2: Pose Estimation (YOLOv8L-Pose)
Native YOLO pose estimation model providing 17 keypoints per detected person. Handles multiple people naturally, enabling simultaneous dual-fighter tracking.
- Keypoints: 17 (COCO format)
- Latency: ~25ms (dual-person)
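A sketch of Stage 2 under the assumption that each estimated skeleton is assigned to a Stage 1 fighter box by IoU; the source does not specify the association method, and the weights name is the standard Ultralytics release.

```python
# Stage 2 sketch: 17-keypoint pose estimation plus a simple IoU-based
# assignment of skeletons to the Stage 1 corner boxes (assumed method).
from ultralytics import YOLO

pose_model = YOLO("yolov8l-pose.pt")

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def estimate_poses(frame, fighter_boxes):
    """fighter_boxes: {corner: (x1, y1, x2, y2)} from Stage 1.
    Returns {corner: (17, 3) array of [x, y, confidence] keypoints}."""
    result = pose_model(frame, verbose=False)[0]
    keypoints = result.keypoints.data.cpu().numpy()  # (n_persons, 17, 3)
    poses = {}
    for i, box in enumerate(result.boxes):
        person_box = box.xyxy[0].tolist()
        corner = max(fighter_boxes, default=None,
                     key=lambda c: iou(person_box, fighter_boxes[c]))
        if corner is not None:
            poses[corner] = keypoints[i]
    return poses
```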
Stage 3: Technique Classification (ST-GCN)
Spatial-Temporal Graph Convolutional Network for classifying kick techniques from pose sequences. Takes 15-30 frame windows of skeletal data.
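A sketch of how pose sequences could be windowed and fed to the ST-GCN classifier. The (N, C, T, V, M) input layout follows common ST-GCN implementations; the pre-loaded model object, the 30-frame window (the source states 15-30 frame windows), and the class list (taken from the technique names in the statistics output below) are illustrative assumptions.

```python
# Stage 3 sketch: buffer per-fighter keypoints into fixed-length windows and
# run them through a pre-loaded ST-GCN classifier (treated as a black box).
from collections import deque

import numpy as np
import torch

WINDOW = 30  # frames per clip
KICKS = ["roundhouse", "front", "side", "back", "axe"]

class KickClassifier:
    def __init__(self, model: torch.nn.Module, window: int = WINDOW):
        self.model = model.eval()
        self.buffers = {"red": deque(maxlen=window), "blue": deque(maxlen=window)}

    @torch.no_grad()
    def update(self, corner: str, keypoints: np.ndarray):
        """Append one frame of (17, 3) keypoints; classify once the window is full."""
        buf = self.buffers[corner]
        buf.append(keypoints)
        if len(buf) < buf.maxlen:
            return None
        clip = np.stack(buf)                     # (T, 17, 3)
        x = torch.from_numpy(clip).float()
        x = x.permute(2, 0, 1)[None, ..., None]  # (N=1, C=3, T, V=17, M=1)
        logits = self.model(x)
        return KICKS[int(logits.argmax(dim=1))]
```

In the per-frame loop, the Stage 1 boxes and Stage 2 keypoints feed classifier.update(corner, keypoints), and a non-None return value is emitted as a strike event on the SSE stream.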
Custom Training for Fighter Detection
The key innovation is training a lightweight YOLO model specifically for hogu color detection, enabling consistent fighter identification without complex re-identification networks.
Training Data
- Source Videos: 100+
- Labeled Frames: 10,000+
- Pose Sequences: 50,000+
- Source: World Taekwondo
Model Configuration
- Base Model: YOLOv8n
- Classes: 2 (blue, red)
- Image Size: 640px
- Parameters: ~3.2M
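A minimal training sketch matching the configuration above, using the Ultralytics API; the dataset YAML layout, epoch count, and batch size are assumptions not stated here.

```python
# Training sketch for the Stage 1 hogu detector. Only the base model,
# class count, and 640px image size come from the configuration above.
from ultralytics import YOLO

# hogu.yaml (assumed layout):
#   path: datasets/tkd
#   train: images/train
#   val: images/val
#   names: {0: blue, 1: red}

model = YOLO("yolov8n.pt")      # COCO-pretrained nano backbone
model.train(
    data="hogu.yaml",
    imgsz=640,                  # matches the 640px setting above
    epochs=100,
    batch=16,
)
metrics = model.val()           # mAP on the held-out split
model.export(format="onnx")     # optional: export for deployment
```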
Kick Annotation Schema
{
  "video_id": "match_001",
  "frame_start": 1200,
  "frame_end": 1230,
  "technique": "roundhouse",  // dollyeo chagi (roundhouse kick)
  "fighter": "red",
  "target": "body",
  "contact": true,
  "score": 2
}
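For illustration, a small loader that maps this schema onto typed records; the field names mirror the JSON above, while the file layout and helper function are assumptions.

```python
# Sketch: typed records for the annotation schema above.
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class KickAnnotation:
    video_id: str
    frame_start: int
    frame_end: int
    technique: str
    fighter: str      # "red" or "blue"
    target: str       # "head" or "body"
    contact: bool
    score: int

    @property
    def n_frames(self) -> int:
        return self.frame_end - self.frame_start + 1

def load_annotations(path: Path) -> list[KickAnnotation]:
    """Read a JSON file containing a list of annotation objects."""
    records = json.loads(path.read_text(encoding="utf-8"))
    return [KickAnnotation(**r) for r in records]
```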
Performance Results
System Performance
| Model Stage | Latency | GPU Memory | Accuracy |
|---|---|---|---|
| YOLOv8n-TKD (Fighter Detection) | ~8ms | ~1GB | 95%+ |
| YOLOv8L-Pose (Pose Estimation) | ~25ms | ~3GB | 92% |
| ST-GCN (Kick Classification) | ~40ms | ~2GB | 90% |
| Total Pipeline | <100ms | <8GB | |
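A sketch of how the per-stage latency figures could be measured; the warm-up/synchronize pattern is standard PyTorch GPU timing practice, and the stage callables are placeholders for the pipeline functions.

```python
# Timing sketch: mean per-stage latency in milliseconds.
import time

import torch

def time_stage(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Run fn(*args) repeatedly and return mean latency in ms."""
    for _ in range(warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters
```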
Per-Fighter Output Statistics
{
  "fighter_id": "red",
  "round": 2,
  "statistics": {
    "total_strikes": 45,
    "kicks": {
      "roundhouse": 18,
      "front": 12,
      "side": 8,
      "back": 4,
      "axe": 3
    },
    "punches": 12,
    "strikes_per_minute": 22.5,
    "head_strikes": 15,
    "body_strikes": 30,
    "accuracy": 0.67,
    "territory_control": 0.55
  }
}
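A sketch of how per-fighter statistics in this shape could be aggregated from classified strike events; the event fields and the strikes-per-minute and accuracy formulas are assumptions, and territory_control is omitted because its computation is not described here.

```python
# Sketch: aggregating per-fighter statistics from classified strike events.
from collections import Counter

def aggregate(events: list[dict], round_seconds: float) -> dict:
    """events: one dict per classified strike, e.g.
    {"type": "kick", "technique": "roundhouse", "target": "body", "contact": True}"""
    kicks = Counter(e["technique"] for e in events if e["type"] == "kick")
    punches = sum(1 for e in events if e["type"] == "punch")
    total = sum(kicks.values()) + punches
    landed = sum(1 for e in events if e.get("contact"))
    return {
        "total_strikes": total,
        "kicks": dict(kicks),
        "punches": punches,
        "strikes_per_minute": round(total / (round_seconds / 60), 1),
        "head_strikes": sum(1 for e in events if e["target"] == "head"),
        "body_strikes": sum(1 for e in events if e["target"] == "body"),
        "accuracy": round(landed / total, 2) if total else 0.0,
    }
```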
Generalization to Other Combat Sports
The multi-model pipeline architecture generalizes to other combat sports with minimal retraining. The key adaptation is training the Stage 1 detector for sport-specific visual markers (glove colors, uniform patterns) and the Stage 3 classifier for sport-specific techniques.
Applicable Domains
- Boxing (glove color detection)
- MMA (corner-based identification)
- Karate (belt color, uniform)
- Fencing (weapon + uniform tracking)
Transferable Components
- Multi-model cascade architecture
- SSE real-time streaming pipeline
- Pose estimation backbone
- Statistics aggregation framework
References
[1] Yan, S. et al. (2018). Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. AAAI.
[2] Jocher, G. et al. (2023). Ultralytics YOLOv8. GitHub repository.
[3] Lugaresi, C. et al. (2019). MediaPipe: A Framework for Building Perception Pipelines. CVPR Workshop.
[4] World Taekwondo. (2024). Competition Rules. worldtaekwondo.org.