
Abstract

Category-level object pose estimation aims to find the 6D poses of previously unseen object instances from known categories without access to object CAD models. To reduce the large amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework that estimates category-level 6D object pose from single 3D point clouds. During training, our method assumes no ground-truth pose annotations, no CAD models, and no multi-view supervision. The key to our method is to disentangle shape and pose through an invariant shape reconstruction module and an equivariant pose estimation module, empowered by SE(3) equivariant point cloud networks. The invariant shape reconstruction module learns to perform aligned reconstructions, yielding a category-level reference frame without using any annotations. In addition, the equivariant pose estimation module achieves category-level pose estimation accuracy comparable to some fully supervised methods. Extensive experiments demonstrate the effectiveness of our approach on both complete and partial depth point clouds from the ModelNet40 benchmark, and on real depth point clouds from the NOCS-REAL275 dataset.
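
To make the invariance/equivariance split concrete, below is a minimal NumPy/SciPy sketch (a hand-crafted toy, not the paper's learned SE(3)-equivariant network): pairwise distances act as an SE(3)-invariant "shape" descriptor that is unaffected by a rigid motion, while the centroid acts as an SE(3)-equivariant "pose" quantity that transforms along with it.

import numpy as np
from scipy.spatial.transform import Rotation

# Toy point cloud: N points in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))

# Random rigid transform g = (R, t) in SE(3), applied to row vectors.
R = Rotation.random(random_state=0).as_matrix()
t = rng.normal(size=3)
X_g = X @ R.T + t

def invariant_feature(P):
    # Pairwise distances are unchanged by any rigid motion:
    # an SE(3)-invariant "shape" descriptor.
    diff = P[:, None, :] - P[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def equivariant_feature(P):
    # The centroid transforms as R c + t:
    # an SE(3)-equivariant "pose" quantity.
    return P.mean(axis=0)

# Invariance: the shape descriptor matches before and after the transform.
assert np.allclose(invariant_feature(X), invariant_feature(X_g), atol=1e-6)

# Equivariance: the pose quantity transforms with g = (R, t).
assert np.allclose(equivariant_feature(X) @ R.T + t, equivariant_feature(X_g), atol=1e-6)
print("invariance and equivariance checks passed")

The paper's modules play analogous roles with learned features: the invariant branch reconstructs the shape in a category-level canonical frame, and the equivariant branch recovers the pose.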

Results on Category-Level 3D Pose Estimation (Complete Input)

Results on Category-Level 6D Pose Estimation (Partial Input)

Results on Category-Level Canonical Shape Reconstructions

From complete inputs · From partial inputs


Paper

arXiv

BibTeX

@inproceedings{li2021leveraging,
  author    = {Li, Xiaolong and Weng, Yijia and Yi, Li and Guibas, Leonidas and Abbott, A. Lynn and Song, Shuran and Wang, He},
  title     = {Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year      = {2021}
}

Acknowledgments

This research is supported by a Vannevar Bush Faculty Fellowship, NSF grant IIS-1763268, and gifts from the Adobe and Autodesk corporations. We appreciate the resources provided by Advanced Research Computing in the Division of Information Technology at Virginia Tech. We also thank Dr. Haiwei Chen for helpful discussions on equivariant neural networks.