Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras

Abstract

Markerless multiview motion capture remains challenging due to the need for precise camera calibration, limiting its accessibility for non-experts. Existing calibration-free approaches avoid manual calibration but suffer from high computational cost and reduced reconstruction accuracy.

We present Kineo, a fully automatic, calibration-free pipeline for markerless motion capture from videos captured by unsynchronized, uncalibrated, consumer-grade RGB cameras. Kineo leverages 2D keypoints from off-the-shelf detectors to simultaneously calibrate cameras, estimate Brown-Conrady distortion coefficients, and reconstruct 3D keypoints and dense scene point maps at metric scale. A novel confidence-driven keypoint sampling strategy, combined with graph-based optimization, ensures a fixed computational cost for the calibration step, independent of sequence duration. Additionally, we introduce the pairwise reprojection consensus score, which quantifies the reliability of 3D estimations for downstream tasks.

Evaluations on EgoHumans and Human3.6M demonstrate substantial improvements over prior calibration-free methods. Compared to previous state-of-the-art approaches, Kineo reduces camera translation error by approximately 81-84%, camera angular error by 86-92%, and world mean per-joint position error (W-MPJPE) by 82-91%. When provided with ground-truth intrinsics, Kineo's performance further improves, matching methods that use ground-truth extrinsics. We further demonstrate Kineo's real-world applicability in both offline and realtime scenarios. In an offline setting, we capture craftsmen's gestures at Guédelon Castle, highlighting Kineo's potential for scalable, accurate motion capture in complex, uncontrolled environments and its utility in preserving intangible cultural heritage. In a realtime setting, Kineo employs a two-step approach, with calibration followed by live human reconstruction, showcasing its capability for interactive 3D motion capture.

Method Overview

Figure 1: Overview of the Kineo pipeline for markerless motion capture from uncalibrated and unsynchronized multi-camera videos. Starting from raw video inputs, the system first performs audio-based temporal synchronization to align unsynchronized streams on a shared timeline. Next, automatic camera calibration estimates extrinsic and intrinsic parameters, including Brown–Conrady lens distortion, via a graph-based optimization over 2D keypoint correspondences selected by a confidence-driven sampling strategy. Using the recovered cameras, 3D keypoints and scene point maps are reconstructed, and each triangulated keypoint is assigned a pairwise reprojection consensus score to quantify reconstruction quality. Finally, metric-scale recovery is achieved either through a human body prior using the SMPL model or through a monocular metric depth estimator for subject-agnostic scaling. The modular design enables robust and scalable reconstruction of both human and non-human subjects across long, multi-view sequences.
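To make the camera model named in the caption concrete, here is a minimal sketch of a pinhole projection with Brown–Conrady distortion. It is not taken from the Kineo code: the function names, the coefficient layout (three radial terms k1, k2, k3 and two tangential terms p1, p2) and the example intrinsics are assumptions chosen for illustration, following the commonly used form of the model.

```python
from typing import Sequence, Tuple


def distort_brown_conrady(
    x: float, y: float, k: Sequence[float], p: Sequence[float]
) -> Tuple[float, float]:
    """Distort normalized image coordinates (x, y).

    k = (k1, k2, k3) are radial coefficients and p = (p1, p2) are
    tangential coefficients (assumed layout, not Kineo's own).
    """
    k1, k2, k3 = k
    p1, p2 = p
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def project(
    point_cam: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    k: Sequence[float] = (0.0, 0.0, 0.0),
    p: Sequence[float] = (0.0, 0.0),
) -> Tuple[float, float]:
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division, then lens distortion, then intrinsics.
    x_d, y_d = distort_brown_conrady(X / Z, Y / Z, k, p)
    return fx * x_d + cx, fy * y_d + cy


# Hypothetical intrinsics for a 1280x720 camera.
u, v = project((1.0, 0.5, 2.0), fx=1000.0, fy=1000.0, cx=640.0, cy=360.0,
               k=(-0.1, 0.0, 0.0))
```

A calibration step such as the one in the figure would adjust the intrinsics, distortion coefficients and camera poses so that projections like this one agree with the detected 2D keypoints.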

Results Gallery

Interactive Result

Interactive result on one sequence from the Human3.6M dataset (S11 Greeting). The results are raw, with no post-processing (filtering, smoothing, etc.).

Qualitative Results

EgoHumans - Fencing

EgoHumans - Tagging

Human3.6M - Phoning

Human3.6M - Walking Dog

Guédelon - Stonemason

Guédelon - Carpenter

Comparative Results (HSfM vs. Kineo)

Qualitative comparison between HSfM (red) and Kineo (blue) on the S11_Eating 1 sequence from the Human3.6M dataset (HSfM's best-performing sequence). Ground-truth cameras and skeletons are shown in black. The visualization is scale-aligned.

Quantitative Results

Dataset	Method	Human Metrics		Parameters
		W-MPJPE ↓	PA-MPJPE ↓	[R|t]	K	d
EgoHumans	UnCaliPose	3.51	0.13	Est.	GT	GT
	HSfM	1.04	0.05	Est.	Est.	-
	HAMSt3R	3.80	0.14	Est.	Est.	-
	Kineo	0.16	0.02	Est.	GT	GT
	Kineo	0.41	0.03	Est.	Est.	-
	Kineo	0.17	0.02	Est.	Est.	Est.
Human3.6M	Iskakov et al.	0.02	-	GT	GT	GT
	AdaFuse	0.02	-	GT	GT	GT
	Hewitt et al.	0.03	-	GT	GT	-
	HSfM	0.47	0.05	Est.	Est.	-
	Kineo	0.02	0.02	Est.	GT	GT
	Kineo	0.04	0.02	Est.	Est.	-
	Kineo	0.04	0.02	Est.	Est.	Est.

Quantitative comparison of human pose estimation on EgoHumans and Human3.6M. Metrics are W-MPJPE and PA-MPJPE, both in meters (lower is better). Camera parameters ([R|t] = extrinsics, K = intrinsics, d = distortion coefficients): GT = ground truth, Est. = estimated, - = omitted. Best and second-best calibration-free methods are highlighted.
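As a reading aid for the table above, the following sketch shows how a mean per-joint position error is typically computed. It assumes that predicted and ground-truth joints are already expressed in the same world frame and in meters, which is how W-MPJPE is read here; the function name and the toy data are illustrative only. PA-MPJPE additionally applies a Procrustes (similarity) alignment of the prediction to the ground truth before measuring the error, and that step is left out of this sketch.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]      # (x, y, z) in meters
Frame = Sequence[Joint]      # all joints of one frame


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean Euclidean distance between matching joints over all frames."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    errors: List[float] = []
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of joints")
        for pred_joint, gt_joint in zip(pred_frame, gt_frame):
            errors.append(math.dist(pred_joint, gt_joint))
    if not errors:
        raise ValueError("no joints to evaluate")
    return sum(errors) / len(errors)


# Toy data: one frame, two joints, each predicted 5 cm away from ground truth.
gt_joints = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
pred_joints = [[(0.03, 0.0, 0.04), (0.0, 1.0, 0.05)]]
error_m = mpjpe(pred_joints, gt_joints)
```

On this toy input the error is about 0.05 m, which is the same unit used in the W-MPJPE column.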

Dataset	Method	Camera Metrics							Parameters
		TE ↓	s-TE ↓	AE ↓	RRA@10 ↑	CCA@10 ↑	s-CCA@10 ↑	FoV ↓	[R|t]	K	d
EgoHumans	UnCaliPose	2.63	2.63	60.90	0.28	-	0.33	-	Est.	GT	GT
	DUSt3R	-	1.15	11.00	0.61	-	0.49	-	Est.	Est.	-
	MASt3R	4.97	0.92	10.42	0.61	0.06	0.65	-	Est.	Est.	-
	HSfM	2.09	0.75	9.35	0.72	0.32	0.75	-	Est.	Est.	-
	HAMSt3R	2.33	0.40	10.24	0.77	0.06	0.75	-	Est.	Est.	-
	Kineo	0.29	0.05	0.34	1.00	0.99	0.99	0.00	Est.	GT	GT
	Kineo	0.76	0.57	3.48	0.91	0.72	0.89	2.96	Est.	Est.	-
	Kineo	0.34	0.15	0.69	1.00	0.98	0.99	1.03	Est.	Est.	Est.
Human3.6M	HSfM	0.83	0.33	6.44	0.95	0.46	0.95	36.97	Est.	Est.	-
	Kineo	0.01	0.01	0.20	1.00	1.00	1.00	0.00	Est.	GT	GT
	Kineo	0.13	0.03	0.90	1.00	1.00	1.00	0.57	Est.	Est.	-
	Kineo	0.12	0.02	0.89	1.00	1.00	1.00	0.43	Est.	Est.	Est.

Quantitative comparison of camera pose estimation on EgoHumans and Human3.6M. Metrics include translation error (TE, m), scale-aligned translation error (s-TE, m), angular error (AE, °), relative rotation accuracy (RRA), camera center accuracy (CCA), scale-aligned CCA (s-CCA), and field-of-view error (FoV). Camera parameters ([R|t] = extrinsics, K = intrinsics, d = distortion coefficients): GT = ground truth, Est. = estimated, - = omitted. Best and second-best calibration-free methods are highlighted.
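The rotation metrics in the table above can be illustrated with a short sketch. It assumes the usual definitions: the angular error between two rotations is the geodesic angle of their relative rotation, and RRA at a threshold is the fraction of camera pairs whose estimated relative rotation lies within that many degrees of the ground-truth relative rotation. The evaluation code behind the table may differ in conventions (for example pair selection or alignment), and all names below are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]  # 3x3 rotation matrix, row-major


def mul_a_bt(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T for 3x3 matrices."""
    return [[sum(a[i][k] * b[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r_a: Matrix, r_b: Matrix) -> float:
    """Geodesic angle in degrees between two rotations."""
    rel = mul_a_bt(r_a, r_b)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def relative_rotation_accuracy(
    est: Sequence[Matrix], gt: Sequence[Matrix], tau_deg: float = 10.0
) -> float:
    """Fraction of camera pairs with relative rotation error below tau_deg."""
    pairs = list(combinations(range(len(est)), 2))
    if len(est) != len(gt) or not pairs:
        raise ValueError("need matching lists of at least two cameras")
    hits = 0
    for i, j in pairs:
        rel_est = mul_a_bt(est[i], est[j])
        rel_gt = mul_a_bt(gt[i], gt[j])
        if rotation_angle_deg(rel_est, rel_gt) < tau_deg:
            hits += 1
    return hits / len(pairs)


def rot_z(deg: float) -> Matrix:
    """Rotation about the z axis, used only to build toy data."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Toy data: three cameras whose estimated yaw is off by 0, 5 and 20 degrees.
gt_rotations = [rot_z(0.0), rot_z(90.0), rot_z(180.0)]
est_rotations = [rot_z(0.0), rot_z(95.0), rot_z(200.0)]
rra_at_10 = relative_rotation_accuracy(est_rotations, gt_rotations, 10.0)
```

In the toy data only the pair of cameras 0 and 1 has a relative rotation error under 10°, so one of the three pairs counts as correct.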

Acknowledgements

This work was supported by the Auvergne-Rhône-Alpes region as part of the PROMESS project. This work was granted access to the HPC resources of IDRIS under the allocation 2025-AD010614830 made by GENCI. We also express our gratitude to Guédelon Castle for kindly welcoming us and permitting the captures that were essential to this study.

Citation

@article{javerliat2025kineo,
  title={Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras},
  author={Charles Javerliat and Pierre Raimbaud and Guillaume Lavoué},
  year={2025},
  eprint={2510.24464},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.24464},
}