EAG3R: Event-Augmented 3D Geometry Estimation
for Dynamic and Extreme-Lighting Scenes

NeurIPS 2025 Spotlight
1 The University of Hong Kong   2 Southern University of Science and Technology
* Equal contribution, names listed alphabetically.

Inputs: The Invisible & The Visible

RGB Input (Low Light)
Event Stream (High SNR)

Results: Failure vs Robustness

Baseline (MonST3R)
EAG3R (Ours)
Seeing in the Dark. Left: Standard RGB cameras are blind in extreme darkness, capturing mostly noise. Event cameras, however, capture distinct motion signals with high SNR. Right: Consequently, RGB-only methods like MonST3R fail completely, while our event-augmented EAG3R successfully reconstructs the geometry.

Abstract

EAG3R Concept: Event vs RGB
Figure 1: The core idea. Standard RGB methods (left) fail in extreme conditions. EAG3R utilizes the high temporal resolution and dynamic range of event cameras (middle) to reconstruct sharp 3D geometry (right).

Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation and SLAM. However, existing RGB-only approaches struggle under real-world conditions involving dynamic objects and extreme illumination.

In this paper, we propose EAG3R, a novel geometry estimation framework that augments pointmap-based reconstruction with asynchronous event streams. Built upon the MonST3R backbone, EAG3R introduces: (1) a Retinex-inspired enhancement module and a lightweight event adapter with an SNR-aware fusion mechanism; and (2) a novel event-based photometric consistency loss. Our method enables robust geometry estimation in challenging dynamic low-light scenes without retraining on night-time data (zero-shot generalization).

Methodology

Overall Architecture

EAG3R Architecture Pipeline
Figure 2: The EAG3R Framework. Our pipeline integrates Retinex-enhanced RGB features with motion-rich event features via an SNR-Aware Fusion module, feeding into a predictive head for geometry and pose estimation.

01. Enhancement & Reliability

A Retinex-based module recovers visibility in underexposed regions and estimates a Signal-to-Noise Ratio (SNR) map to quantify pixel-wise reliability.
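The actual enhancement module is learned end-to-end; purely as an illustrative sketch of the two ideas (Retinex-style illumination normalization and a pixel-wise SNR proxy), assuming a grayscale image in [0, 1] and a box filter where the real module uses learned layers:

```python
import numpy as np

def box_blur(x, k=7):
    """Naive edge-padded box filter; stands in for the learned smoother."""
    pad = k // 2
    p = np.pad(x, pad, mode="edge")
    out = np.empty_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

def retinex_enhance(gray, k=7, eps=1e-6):
    """Single-scale Retinex: divide out a smoothed illumination estimate,
    so underexposed regions are lifted toward unit reflectance."""
    illum = box_blur(gray, k)
    return gray / (illum + eps)

def snr_map(gray, k=7, eps=1e-6):
    """Pixel-wise SNR proxy: blurred signal over blurred local noise
    magnitude. High values mark reliable (well-lit, low-noise) pixels."""
    smooth = box_blur(gray, k)
    noise = np.abs(gray - smooth)
    return smooth / (box_blur(noise, k) + eps)
```

The SNR map produced here is the quantity the fusion stage later uses to weight RGB against event features.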

02. Event Adapter

A lightweight Swin Transformer extracts high-fidelity motion features from sparse event streams, capturing scene dynamics hidden beneath the RGB sensor's dark noise.
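The page does not specify how the sparse event stream is packaged before the Swin adapter; a common choice (assumed here, not confirmed by the source) is a spatio-temporal voxel grid in which each event's polarity is split between its two nearest temporal bins:

```python
import numpy as np

def events_to_voxel(xs, ys, ts, ps, bins, h, w):
    """Bin an event stream (x, y, timestamp, polarity in {-1, +1}) into a
    (bins, h, w) voxel grid, bilinearly interpolating each event's polarity
    between its two nearest temporal bins."""
    grid = np.zeros((bins, h, w), dtype=np.float64)
    t0, t1 = ts.min(), ts.max()
    tn = (ts - t0) / max(t1 - t0, 1e-9) * (bins - 1)  # fractional bin index
    lo = np.floor(tn).astype(int)
    hi = np.minimum(lo + 1, bins - 1)
    w_hi = tn - lo
    # Unbuffered accumulation so repeated (bin, y, x) triples all count.
    np.add.at(grid, (lo, ys, xs), ps * (1.0 - w_hi))
    np.add.at(grid, (hi, ys, xs), ps * w_hi)
    return grid
```

The resulting dense tensor can be consumed by any image backbone while preserving the signed, per-bin motion signal.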

03. SNR-Aware Fusion

Features are fused adaptively using cross-attention. Guided by the SNR map, the network learns to trust RGB in well-lit areas and to lean on event signals in the dark.
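A minimal sketch of this idea, assuming a single-head cross-attention (RGB tokens as queries, event tokens as keys/values) and a sigmoid SNR gate; the gate form, threshold `tau`, and function names are illustrative, not the paper's exact design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def snr_aware_fusion(rgb_feat, evt_feat, snr, tau=1.0):
    """Cross-attend RGB queries to event keys/values, then blend with an
    SNR gate: high SNR keeps the RGB token, low SNR leans on the attended
    event context. rgb_feat, evt_feat: (N, D); snr: (N,)."""
    d = rgb_feat.shape[-1]
    attn = softmax(rgb_feat @ evt_feat.T / np.sqrt(d))  # (N, N) attention map
    evt_ctx = attn @ evt_feat                           # event context per RGB token
    gate = 1.0 / (1.0 + np.exp(-(snr - tau)))           # sigmoid gate in (0, 1)
    return gate[:, None] * rgb_feat + (1.0 - gate[:, None]) * evt_ctx
```

In the limit of very high SNR the output reduces to the RGB features; in darkness it reduces to the event context, matching the trust behavior described above.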

Event-Based Photometric Consistency Loss

To supervise dynamic motion without ground truth, we propose a novel loss function derived from the physical event generation model.

Event-Based Consistency Loss
Figure 3: Event Consistency Loss. We enforce consistency between the observed brightness change (from integrated events) and the predicted brightness change (from estimated depth, pose, and image gradients).
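The page does not give the loss in closed form. Under the standard event-generation model with contrast threshold $C$, a consistency term of the kind Figure 3 describes could be written as follows (the symbols $\pi$, $\hat{D}$, $\hat{T}$ are our notation, not the paper's):

```latex
% Observed log-brightness change at pixel x: polarity-weighted event count.
\Delta L_{\mathrm{obs}}(\mathbf{x}) = C \sum_{e_k \in \mathcal{E}(\mathbf{x})} p_k,
\qquad p_k \in \{-1, +1\}

% Predicted change: warp x into the next frame via the estimated depth
% \hat{D} and relative pose \hat{T}, then compare log intensities.
\Delta L_{\mathrm{pred}}(\mathbf{x}) =
\log I_{t+\Delta t}\!\big(\pi(\mathbf{x}; \hat{D}, \hat{T})\big) - \log I_t(\mathbf{x})

% Photometric consistency between the two estimates.
\mathcal{L}_{\mathrm{ev}} = \sum_{\mathbf{x}}
\big\lVert \Delta L_{\mathrm{obs}}(\mathbf{x}) - \Delta L_{\mathrm{pred}}(\mathbf{x}) \big\rVert_1
```

In practice the predicted term is often linearized with image gradients, $\Delta L_{\mathrm{pred}} \approx \nabla \log I_t \cdot \Delta\mathbf{x}$, which is consistent with the caption's mention of depth, pose, and image gradients.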

Robustness in the Dark

Zero-shot performance: trained only on day scenes (MVSEC day2), tested directly on night scenes.

Qualitative Comparison

Comparison against state-of-the-art methods in dynamic night scenes.

Qualitative Comparison with SOTA
Visual Comparisons. EAG3R (Ours) successfully reconstructs dynamic objects (e.g., moving cars) and distant geometry in extreme low light, where baselines like MonST3R and Easi3R suffer from severe degradation.

Trajectory Estimation Accuracy

Trajectory Comparison Plot
Figure 4: Zero-shot Camera Trajectory. Our method significantly reduces rotation error and drift compared to the baseline MonST3R in unseen night sequences.

BibTeX

@inproceedings{wueag3r,
  title={EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes},
  author={Wu, Xiaoshan and Yu, Yifei and Lyu, Xiaoyang and Huang, Yi-Hua and Wang, Bo and Zhang, Baoheng and Wang, Zhongrui and Qi, Xiaojuan},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}