Can freely moving humans or animals themselves serve as calibration targets for multi-camera systems while simultaneously estimating their correspondences across views? We humans can solve this problem by mentally rotating the observed 2D poses and aligning them with those in the target views. Inspired by this cognitive ability, we propose SteerPose, a neural network that rotates 2D poses from one view into another. By integrating differentiable matching, SteerPose performs extrinsic camera calibration and correspondence search simultaneously within a single unified framework. We also introduce a novel geometric consistency loss that explicitly ensures that the estimated rotation and correspondences yield a valid translation estimate. Experimental results on diverse in-the-wild datasets of humans and animals validate the effectiveness and robustness of the proposed method. Furthermore, we demonstrate that our method can reconstruct the 3D poses of novel animals in multi-camera setups by leveraging off-the-shelf 2D pose estimators and our class-agnostic model.
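For intuition, the core idea can be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' released code: a predicted rotation maps the keypoint bearing vectors of one view toward the other, pairwise pose-alignment scores are computed, and a Sinkhorn layer turns them into a differentiable soft assignment. All function names, shapes, and hyperparameters (`sinkhorn`, `match_across_views`, `tau`, the 20 iterations) are assumptions, and scoring by pure rotational alignment is only an approximation that ignores translation-induced parallax.

```python
import torch

def sinkhorn(log_scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Differentiable matching: alternately normalize the rows and columns
    of an (N, M) log-score matrix to obtain a soft assignment matrix."""
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=1, keepdim=True)  # rows
        log_scores = log_scores - torch.logsumexp(log_scores, dim=0, keepdim=True)  # cols
    return log_scores.exp()

def match_across_views(bearings_a, bearings_b, R_ab, tau: float = 0.1):
    """Rotate view-A keypoint bearings by a predicted rotation R_ab, score each
    (individual_a, individual_b) pair by how well the rotated rays align with
    view B's bearings, and soft-match the individuals with Sinkhorn.

    bearings_a: (N, J, 3) unit rays from camera A through each 2D keypoint
    bearings_b: (M, J, 3) unit rays in camera B
    R_ab:       (3, 3)    predicted relative rotation from A to B
    """
    rotated = bearings_a @ R_ab.T                                     # (N, J, 3)
    sim = torch.einsum('njc,mjc->nmj', rotated, bearings_b).mean(-1)  # (N, M)
    return sinkhorn(sim / tau)

# Usage: P = match_across_views(rays_a, rays_b, R_pred); P[i, j] is the soft
# probability that individual i in view A matches individual j in view B.
```

Because both the rotation and the assignment are differentiable, the two can be trained jointly, which is what allows calibration and correspondence search to be handled in one framework.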
In an uncontrolled setup with a running cheetah, SteerPose successfully estimates the extrinsic camera parameters and reconstructs its 3D poses.
In a livestock scenario with multiple pigs, SteerPose simultaneously estimates the extrinsic camera parameters and finds the correspondences between two views.
Beyond quadrupedal animals, the proposed method can be applied to animals with different body structures, including birds.
The pairwise extrinsic parameters and correspondences can be integrated into a unified coordinate system; the accuracy of the estimated extrinsics is then further improved by a non-linear optimization that minimizes reprojection error. A sketch of this refinement step is given below.
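As an illustration of this refinement, here is a minimal bundle-adjustment-style sketch, not the paper's implementation. It assumes known shared intrinsics `K`, an axis-angle camera parametrization via `cv2.Rodrigues`, and triangulated 3D joints as the initial point estimates; all names are hypothetical.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, pts2d, K):
    """Residuals for a small bundle-adjustment-style refinement.

    params: [rvec_0, t_0, ..., rvec_{C-1}, t_{C-1}, X_0, ..., X_{P-1}] flattened
    pts2d:  (C, P, 2) observed 2D keypoints in every camera
    K:      (3, 3)    shared intrinsics (assumed known here)
    """
    cams = params[:6 * n_cams].reshape(n_cams, 6)
    pts3d = params[6 * n_cams:].reshape(-1, 3)
    res = []
    for c in range(n_cams):
        R, _ = cv2.Rodrigues(cams[c, :3])            # axis-angle -> rotation matrix
        proj = (pts3d @ R.T + cams[c, 3:]) @ K.T     # project into camera c
        proj = proj[:, :2] / proj[:, 2:3]            # perspective divide
        res.append((proj - pts2d[c]).ravel())
    return np.concatenate(res)

# After chaining the pairwise estimates into one coordinate system:
# x0  = np.concatenate([init_cams.ravel(), init_pts3d.ravel()])
# sol = least_squares(reprojection_residuals, x0, args=(n_cams, pts2d, K))
```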
@inproceedings{lee2025steerpose,
  title     = {SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation},
  author    = {Lee, Sang-Eun and Nishino, Ko and Nobuhara, Shohei},
  booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
  year      = {2025},
  note      = {arXiv:2506.01691}
}
This work was supported in part by JSPS KAKENHI Grant Numbers 20H05951 and 22H05654.