Kineo turns raw, uncalibrated, unsynchronized videos into metric-scale 3D output.

Abstract

Markerless multiview motion capture remains challenging due to the need for precise camera calibration, limiting its accessibility for non-experts. Existing calibration-free approaches avoid manual calibration but suffer from high computational cost and reduced reconstruction accuracy.

We present Kineo, a fully automatic, calibration-free pipeline for markerless motion capture from videos captured by unsynchronized, uncalibrated, consumer-grade RGB cameras. Kineo leverages 2D keypoints from off-the-shelf detectors to simultaneously calibrate cameras, estimate Brown-Conrady distortion coefficients, and reconstruct 3D keypoints and dense scene point maps at metric scale. A novel confidence-driven keypoint sampling strategy, combined with graph-based optimization, ensures a fixed computational cost for the calibration step, independent of sequence duration. Additionally, we introduce the pairwise reprojection consensus score, which quantifies the reliability of 3D estimations for downstream tasks.
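To make the Brown-Conrady term concrete, here is a minimal sketch of the distortion model applied during projection. It assumes the widely used five-coefficient convention (radial k1, k2, k3 and tangential p1, p2); the exact coefficient set Kineo estimates is not stated here, and the function names `brown_conrady_distort` and `project` are illustrative, not part of Kineo.

```python
def brown_conrady_distort(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Apply Brown-Conrady distortion to normalized image coordinates (x, y).

    Assumption: five-coefficient convention with radial terms k1, k2, k3
    and tangential terms p1, p2.
    """
    r2 = x * x + y * y
    # Radial factor: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Tangential (decentering) terms
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def project(point_cam, fx, fy, cx, cy, **dist):
    """Project a 3D point given in camera coordinates to distorted pixels."""
    X, Y, Z = point_cam
    # Perspective division, then distortion, then the pinhole intrinsics
    x_d, y_d = brown_conrady_distort(X / Z, Y / Z, **dist)
    return fx * x_d + cx, fy * y_d + cy
```

A calibration procedure would adjust the focal lengths, principal point, and distortion coefficients so that points projected this way agree with the detected 2D keypoints.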

Evaluations on EgoHumans and Human3.6M demonstrate substantial improvements over prior calibration-free methods. Compared to previous state-of-the-art approaches, Kineo reduces camera translation error by approximately 81-84%, camera angular error by 86-92%, and world mean-per-joint position error (W-MPJPE) by 82-91%. When provided with ground-truth intrinsics, Kineo's performance further improves, matching methods that use ground-truth extrinsics. We further demonstrate Kineo's real-world applicability in both offline and real-time scenarios. In an offline setting, we capture craftsmen's gestures at Guédelon Castle, highlighting Kineo's potential for scalable, accurate motion capture in complex, uncontrolled environments and its utility in preserving intangible cultural heritage. In a real-time setting, Kineo employs a two-step approach, with calibration followed by live human reconstruction, showcasing its capability for interactive 3D motion capture.
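For readers unfamiliar with the metric, the sketch below shows how a world-frame MPJPE and a relative error reduction can be computed. It assumes the predicted and ground-truth joints are already expressed in the same world coordinate system and units; how the evaluation aligns the two frames is not described here. The helper names `w_mpjpe` and `relative_reduction` are hypothetical.

```python
import math


def w_mpjpe(pred, gt):
    """Mean Euclidean joint error in a shared world frame.

    pred, gt: sequences of frames; each frame is a sequence of (x, y, z)
    joint positions. Assumption: both are in the same frame and units.
    """
    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, gt):
        for p, g in zip(frame_pred, frame_gt):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error improves on baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under this definition, a reduction of 82-91% means the remaining error is between 9% and 18% of the baseline error.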

Method Overview

Figure 1: Overview of the Kineo pipeline for markerless motion capture from uncalibrated and unsynchronized multi-camera videos. Starting from raw video inputs, the system first performs audio-based temporal synchronization to align unsynchronized streams on a shared timeline. Next, automatic camera calibration estimates extrinsic and intrinsic parameters, including Brown–Conrady lens distortion, via a graph-based optimization over 2D keypoint correspondences selected by a confidence-driven sampling strategy. Using the recovered cameras, 3D keypoints and scene point maps are reconstructed, and each triangulated keypoint is assigned a pairwise reprojection consensus score to quantify reconstruction quality. Finally, metric-scale recovery is achieved either through a human body prior using the SMPL model or through a monocular metric depth estimator for subject-agnostic scaling. The modular design enables robust and scalable reconstruction of both human and non-human subjects across long, multi-view sequences.
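As an illustration of the first stage, the sketch below estimates the time offset between two audio tracks by cross-correlation, one common way to align streams from their sound. The caption does not specify which technique Kineo uses, so this is an assumption, and `estimate_lag` and `lag_to_frames` are hypothetical names.

```python
def estimate_lag(ref, other, max_lag):
    """Return the delay of `other` relative to `ref`, in samples.

    Brute-force, unnormalized cross-correlation over integer lags in
    [-max_lag, max_lag]. A positive result means `other` starts later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        # Keep the first lag that reaches the highest correlation
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_frames(lag_samples, sample_rate, fps):
    """Convert an audio lag in samples to a (fractional) video frame offset."""
    return lag_samples / sample_rate * fps
```

The double loop is only practical for short signals; on real recordings an FFT-based correlation would be the usual choice.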

Acknowledgements

This work was supported by the Auvergne-Rhône-Alpes region as part of the PROMESS project. This work was granted access to the HPC resources of IDRIS under the allocation 2025-AD010614830 made by GENCI. We also express our gratitude to the Guédelon Castle for kindly welcoming us and permitting the captures that were essential to this study.

Citation

@article{javerliat2025kineo,
  title={Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras}, 
  author={Charles Javerliat and Pierre Raimbaud and Guillaume Lavoué},
  year={2025},
  eprint={2510.24464},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.24464}, 
}