Can freely moving humans or animals themselves serve as calibration targets for multi-camera systems while simultaneously estimating their correspondences across views? We humans can solve this problem by mentally rotating the observed 2D poses and aligning them with those in the target views. Inspired by this cognitive ability, we propose SteerPose, a neural network that rotates 2D poses from one view into another. By integrating differentiable matching, SteerPose performs extrinsic camera calibration and correspondence search simultaneously within a single unified framework. We also introduce a novel geometric consistency loss that explicitly ensures that the estimated rotation and correspondences yield a valid translation estimate. Experimental results on diverse in-the-wild datasets of humans and animals validate the effectiveness and robustness of the proposed method. Furthermore, we demonstrate that our method can reconstruct the 3D poses of novel animals in multi-camera setups by leveraging off-the-shelf 2D pose estimators and our class-agnostic model.
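For intuition, the core idea can be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' released code: a predicted rotation maps the keypoint bearing vectors of one view toward the other, pairwise pose-alignment scores are computed, and a Sinkhorn layer turns them into a differentiable soft assignment. All function names, shapes, and hyperparameters (`sinkhorn`, `match_across_views`, `tau`, the 20 iterations) are assumptions, and scoring by pure rotational alignment is only an approximation that ignores translation-induced parallax.

```python
import torch

def sinkhorn(log_scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Differentiable matching: alternately normalize the rows and columns
    of an (N, M) log-score matrix to obtain a soft assignment matrix."""
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=1, keepdim=True)  # rows
        log_scores = log_scores - torch.logsumexp(log_scores, dim=0, keepdim=True)  # cols
    return log_scores.exp()

def match_across_views(bearings_a, bearings_b, R_ab, tau: float = 0.1):
    """Rotate view-A keypoint bearings by a predicted rotation R_ab, score each
    (individual_a, individual_b) pair by how well the rotated rays align with
    view B's bearings, and soft-match the individuals with Sinkhorn.

    bearings_a: (N, J, 3) unit rays from camera A through each 2D keypoint
    bearings_b: (M, J, 3) unit rays in camera B
    R_ab:       (3, 3)    predicted relative rotation from A to B
    """
    rotated = bearings_a @ R_ab.T                                     # (N, J, 3)
    sim = torch.einsum('njc,mjc->nmj', rotated, bearings_b).mean(-1)  # (N, M)
    return sinkhorn(sim / tau)

# Usage: P = match_across_views(rays_a, rays_b, R_pred); P[i, j] is the soft
# probability that individual i in view A matches individual j in view B.
```

Because both the rotation and the assignment are differentiable, the two can be trained jointly, which is what allows calibration and correspondence search to be handled in one framework.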
In an uncontrolled setup with a running cheetah, SteerPose successfully estimates the extrinsic camera parameters and reconstructs its 3D poses.
In a livestock scenario with multiple pigs, SteerPose simultaneously estimates the extrinsic camera parameters and finds the correspondences between two views.
Beyond quadrupedal animals, the proposed method can be applied to animals with different body structures, including birds.
The pairwise extrinsic parameters and correspondences can be integrated into a unified coordinate system; the accuracy of the estimated extrinsics is then further improved by a non-linear optimization that minimizes reprojection error. A sketch of this refinement step is given below.
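As an illustration of this refinement, here is a minimal bundle-adjustment-style sketch, not the paper's implementation. It assumes known shared intrinsics `K`, an axis-angle camera parametrization via `cv2.Rodrigues`, and triangulated 3D joints as the initial point estimates; all names are hypothetical.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, pts2d, K):
    """Residuals for a small bundle-adjustment-style refinement.

    params: [rvec_0, t_0, ..., rvec_{C-1}, t_{C-1}, X_0, ..., X_{P-1}] flattened
    pts2d:  (C, P, 2) observed 2D keypoints in every camera
    K:      (3, 3)    shared intrinsics (assumed known here)
    """
    cams = params[:6 * n_cams].reshape(n_cams, 6)
    pts3d = params[6 * n_cams:].reshape(-1, 3)
    res = []
    for c in range(n_cams):
        R, _ = cv2.Rodrigues(cams[c, :3])            # axis-angle -> rotation matrix
        proj = (pts3d @ R.T + cams[c, 3:]) @ K.T     # project into camera c
        proj = proj[:, :2] / proj[:, 2:3]            # perspective divide
        res.append((proj - pts2d[c]).ravel())
    return np.concatenate(res)

# After chaining the pairwise estimates into one coordinate system:
# x0  = np.concatenate([init_cams.ravel(), init_pts3d.ravel()])
# sol = least_squares(reprojection_residuals, x0, args=(n_cams, pts2d, K))
```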
@inproceedings{lee2025steerpose,
  title     = {SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation},
  author    = {Lee, Sang-Eun and Nishino, Ko and Nobuhara, Shohei},
  booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
  year      = {2025},
  note      = {arXiv:2506.01691}
}
This work was supported in part by JSPS KAKENHI Grant Numbers 20H05951 and 22H05654.