Azimuth-Equivariant Feature Learning and Camera-Decoupled Depth Estimation for Multi-View 3D Object Detection
Standard Lift-Splat-Shoot (LSS) paradigms for multi-view 3D perception often neglect the radial symmetry inherent in Bird's-Eye-View (BEV) representations. Conventional 2D backbones and detection heads apply isotropic sampling grids and Cartesian coordinate systems, which introduces two primary limi...