Case Study

Fighter Detection & Pose Analysis

Multi-Model Pipeline for Combat Sports Analytics

Lab: BlueX Research · Domain: Taekwondo Analysis · Published: January 2026 · Status: Production

Abstract

This case study documents our multi-model pipeline for fighter detection and pose analysis in combat sports video. The system achieves real-time performance (<100ms latency) by combining a custom-trained YOLOv8 model for fighter identification, YOLOv8-Pose for 17-keypoint skeletal tracking, and ST-GCN for kick technique classification.

Developed for taekwondo match analysis, the methodology generalizes to other combat sports (MMA, boxing, karate) where dual-fighter tracking, pose estimation, and action recognition are required at broadcast-quality frame rates.

1. The Combat Sports Detection Challenge

Combat sports video presents unique challenges for computer vision. Unlike static object detection or single-person pose estimation, the domain requires simultaneous tracking of multiple rapidly-moving, frequently-occluding subjects with real-time performance constraints.

Dual-Fighter Tracking

Two fighters must be tracked with consistent IDs across frames, even during clinches, spins, and rapid exchanges where bounding boxes overlap.

Identity Preservation

Fighters must be distinguished by corner (red vs. blue hogu) across occlusions, similar poses, and varying camera angles.

Real-Time Constraint

Match analysis requires 30+ FPS processing to capture rapid kick sequences, with results streamed to the client via Server-Sent Events (SSE).

Why Generic Person Detection Fails

Standard COCO-trained person detectors cannot distinguish between fighters by corner color. Generic pose estimators struggle with combat-specific poses (high kicks, spinning techniques) that are underrepresented in general datasets.

  • ID swaps during clinches
  • No corner-color distinction
  • ~70% pose accuracy on high kicks

2. Multi-Model Pipeline Architecture

The system employs a cascaded architecture where specialized models handle distinct tasks. Each component is optimized for its specific function, enabling parallel execution on GPU and modular upgrades.

Processing Pipeline

YOLOv8n-TKD (Fighter Detection) → YOLOv8L-Pose (Pose Estimation) → ST-GCN (Technique Classification) → SSE Stream (Real-time Output)

Stage 1: Fighter Detection (YOLOv8n-TKD)

Custom-trained YOLOv8 nano model for taekwondo-specific fighter detection. Trained to recognize red and blue hogu (chest protector) as separate classes.

  • Classes: [blue, red] (corner-specific detection)
  • Model: yolov8n (~3.2M parameters)
  • Latency: ~8ms per-frame inference
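Downstream stages assume at most one fighter per corner even when the detector fires on spectators or duplicate boxes. A minimal sketch of that per-frame selection logic, assuming detections arrive as (class_name, confidence, bbox) tuples; the tuple format and `assign_fighters` helper are illustrative, not the production API:

```python
def assign_fighters(detections):
    """Keep the highest-confidence box per corner color.

    Returns {"red": bbox or None, "blue": bbox or None}, so later
    pipeline stages always see at most one fighter per corner.
    """
    best = {"red": None, "blue": None}
    best_conf = {"red": 0.0, "blue": 0.0}
    for cls, conf, bbox in detections:
        if cls in best and conf > best_conf[cls]:
            best[cls] = bbox
            best_conf[cls] = conf
    return best

frame_dets = [
    ("red", 0.91, (120, 80, 260, 400)),
    ("blue", 0.88, (400, 90, 540, 410)),
    ("red", 0.35, (700, 300, 760, 420)),  # spurious low-confidence box
]
best = assign_fighters(frame_dets)
```

Keeping only the top box per class also acts as a cheap duplicate filter, which is why no separate re-identification network is needed here.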

Stage 2: Pose Estimation (YOLOv8L-Pose)

Native YOLO pose estimation model providing 17 keypoints per detected person. Handles multiple people naturally, enabling simultaneous dual-fighter tracking.

  • Keypoints: 17 (COCO format)
  • Latency: ~25ms (dual-person)

Key joints for Taekwondo: Hip (center of mass), Knees (kick initiation), Ankles (strike contact), Shoulders (rotation), Wrists (punch detection)
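The joints above can be indexed directly from the standard COCO-17 keypoint layout that YOLOv8-Pose emits. A minimal sketch with keypoints as plain (x, y) pairs; the `hip_center` helper is illustrative, and production code would also weight by per-keypoint confidence:

```python
# Standard COCO-17 keypoint indices (the layout YOLOv8-Pose emits);
# only the taekwondo-relevant joints are listed here.
COCO_IDX = {
    "left_shoulder": 5, "right_shoulder": 6,
    "left_wrist": 9, "right_wrist": 10,
    "left_hip": 11, "right_hip": 12,
    "left_knee": 13, "right_knee": 14,
    "left_ankle": 15, "right_ankle": 16,
}

def hip_center(keypoints):
    """Approximate a fighter's center of mass as the midpoint of the hips."""
    lx, ly = keypoints[COCO_IDX["left_hip"]]
    rx, ry = keypoints[COCO_IDX["right_hip"]]
    return ((lx + rx) / 2, (ly + ry) / 2)

# One fighter's pose as 17 (x, y) pairs; coordinate values are illustrative.
pose = [(0.0, 0.0)] * 17
pose[COCO_IDX["left_hip"]] = (100.0, 200.0)
pose[COCO_IDX["right_hip"]] = (140.0, 200.0)
center = hip_center(pose)
```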

Stage 3: Technique Classification (ST-GCN)

Spatial-Temporal Graph Convolutional Network for classifying kick techniques from pose sequences. Takes 15-30 frame windows of skeletal data.

Techniques classified: 돌려차기 (roundhouse kick), 앞차기 (front kick), 옆차기 (side kick), 뒤차기 (back kick), 내려차기 (axe kick), 후려차기 (hook kick), 뒤후려차기 (spinning hook kick), 540 kick, 몸돌려차기 (tornado kick)
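The frame windowing that feeds ST-GCN can be sketched as a fixed-length buffer that emits overlapping windows of skeletal data. The 30-frame window and 15-frame stride are illustrative values within the stated 15-30 frame range, and the classifier call itself is stubbed out:

```python
from collections import deque

class PoseWindow:
    """Fixed-length pose buffer that emits overlapping frame windows."""

    def __init__(self, size=30, stride=15):
        self.size, self.stride = size, stride
        self.frames = deque(maxlen=size)
        self._since_emit = 0

    def push(self, keypoints):
        """Add one frame of keypoints. Returns a full `size`-frame window
        every `stride` frames once the buffer has filled, else None."""
        self.frames.append(keypoints)
        self._since_emit += 1
        if len(self.frames) == self.size and self._since_emit >= self.stride:
            self._since_emit = 0
            return list(self.frames)
        return None

window = PoseWindow(size=30, stride=15)
# Feed 45 frames; each emitted window would go to the ST-GCN classifier.
windows = [w for w in (window.push([f]) for f in range(45)) if w is not None]
```

The 50% overlap between consecutive windows means a kick landing on a window boundary is still seen whole by the classifier.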

3. Custom Training for Fighter Detection

The key innovation is training a lightweight YOLO model specifically for hogu color detection, enabling consistent fighter identification without complex re-identification networks.

Training Data

  • Source Videos: 100+
  • Labeled Frames: 10,000+
  • Pose Sequences: 50,000+
  • Source: World Taekwondo

Model Configuration

  • Base Model: YOLOv8n
  • Classes: 2 (blue, red)
  • Image Size: 640px
  • Parameters: ~3.2M

Kick Annotation Schema

{
  "video_id": "match_001",
  "frame_start": 1200,
  "frame_end": 1230,
  "technique": "roundhouse",  // 돌려차기
  "fighter": "red",
  "target": "body",
  "contact": true,
  "score": 2
}
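A lightweight validator keeps malformed annotations out of the training set before they silently corrupt pose sequences. A sketch, assuming annotations are loaded as plain dicts; the `validate` helper and its specific checks are illustrative:

```python
# Required fields of the kick annotation schema shown above.
REQUIRED = {"video_id", "frame_start", "frame_end", "technique",
            "fighter", "target", "contact", "score"}

def validate(ann):
    """Reject annotations with missing fields, inverted frame ranges,
    or an unknown fighter corner."""
    missing = REQUIRED - ann.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if ann["frame_end"] <= ann["frame_start"]:
        raise ValueError("frame_end must be after frame_start")
    if ann["fighter"] not in ("red", "blue"):
        raise ValueError("fighter must be 'red' or 'blue'")
    return True

ok = validate({
    "video_id": "match_001", "frame_start": 1200, "frame_end": 1230,
    "technique": "roundhouse", "fighter": "red", "target": "body",
    "contact": True, "score": 2,
})
```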

4. Performance Results

System Performance

  • End-to-end latency: <100ms
  • Processing rate: 30+ FPS
  • Kick accuracy: 90%+
  • GPU memory: <8GB

Model Stage                      | Latency | GPU Memory | Accuracy
YOLOv8n-TKD (Fighter Detection)  | ~8ms    | ~1GB       | 95%+
YOLOv8L-Pose (Pose Estimation)   | ~25ms   | ~3GB       | 92%
ST-GCN (Kick Classification)     | ~40ms   | ~2GB       | 90%
Total Pipeline                   | <100ms  | <8GB       |

Per-Fighter Output Statistics

{
  "fighter_id": "red",
  "round": 2,
  "statistics": {
    "total_strikes": 45,
    "kicks": {
      "roundhouse": 18,
      "front": 12,
      "side": 8,
      "back": 4,
      "axe": 3
    },
    "punches": 12,
    "strikes_per_minute": 22.5,
    "head_strikes": 15,
    "body_strikes": 30,
    "accuracy": 0.67,
    "territory_control": 0.55
  }
}
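The statistics block above can be derived by folding a flat list of scored strike events. A sketch, assuming each event carries `kind`, `technique`, and `contact` fields; the event format and `aggregate` helper are assumptions, not the production schema:

```python
from collections import Counter

def aggregate(events, round_seconds):
    """Fold a round's strike events into the per-fighter statistics shape."""
    kicks = Counter(e["technique"] for e in events if e["kind"] == "kick")
    landed = sum(1 for e in events if e["contact"])
    total = len(events)
    return {
        "total_strikes": total,
        "kicks": dict(kicks),
        "punches": sum(1 for e in events if e["kind"] == "punch"),
        "strikes_per_minute": round(total / (round_seconds / 60), 1),
        "accuracy": round(landed / total, 2) if total else 0.0,
    }

stats = aggregate(
    [
        {"kind": "kick", "technique": "roundhouse", "contact": True},
        {"kind": "kick", "technique": "front", "contact": False},
        {"kind": "punch", "technique": "jab", "contact": True},
    ],
    round_seconds=120,
)
```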

5. Generalization to Other Combat Sports

The multi-model pipeline architecture generalizes to other combat sports with minimal retraining. The key adaptation is training the Stage 1 detector for sport-specific visual markers (glove colors, uniform patterns) and the Stage 3 classifier for sport-specific techniques.

Applicable Domains

  • Boxing (glove color detection)
  • MMA (corner-based identification)
  • Karate (belt color, uniform)
  • Fencing (weapon + uniform tracking)

Transferable Components

  • Multi-model cascade architecture
  • SSE real-time streaming pipeline
  • Pose estimation backbone
  • Statistics aggregation framework
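The transferable SSE pipeline reduces to formatting each result as a `text/event-stream` frame. A minimal sketch of that wire format; the event name and payload shape are illustrative, and any SSE-capable HTTP framework can yield these strings on a streaming response:

```python
import json

def sse_event(event_name, payload):
    """Serialize one pipeline result as a Server-Sent Events frame:
    an `event:` line, a `data:` line carrying JSON, and a blank line
    terminating the frame."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"

frame = sse_event("kick", {"fighter": "red", "technique": "roundhouse"})
```

Because SSE is plain text over HTTP, the same stream format works unchanged when the Stage 1 detector is retrained for a different sport.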

References

[1] Yan, S. et al. (2018). Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. AAAI.

[2] Jocher, G. et al. (2023). Ultralytics YOLOv8. GitHub repository.

[3] Lugaresi, C. et al. (2019). MediaPipe: A Framework for Building Perception Pipelines. CVPR Workshop.

[4] World Taekwondo. (2024). Competition Rules. worldtaekwondo.org.