These results are generated by MarsGen, conditioned on an input image, a camera trajectory, and a text prompt.
Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Martian and terrestrial imagery. Our M3arsSynth engine spans a wide range of Martian terrains and acquisition dates, enabling the generation of physically accurate 3D surface models at metric-scale resolution. MarsGen, fine-tuned on M3arsSynth data, synthesizes videos conditioned on an initial image frame and, optionally, camera trajectories or textual prompts, allowing for video generation in novel environments. Experimental results show that our approach outperforms video synthesis models trained on terrestrial datasets, achieving superior visual fidelity and 3D structural consistency.
Overview of the M3arsSynth dataset construction and conditional video generation through MarsGen. We process stereo image pairs using a metric-aware foundation model and solve the Perspective-n-Point (PnP) problem to reconstruct metric-scale 3D Martian scenes. Video frames rendered from these scenes, together with text prompts and encoded camera trajectories, are then used to condition a Video Diffusion Transformer, enabling the synthesis of novel and controllable Martian video sequences.
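To make the PnP step concrete, the following is a minimal sketch of recovering a camera pose from known 3D points and their 2D projections. It uses a plain linear DLT solver in NumPy, which is an assumption chosen for illustration: the overview does not say which PnP solver the pipeline uses, and the function name `solve_pnp_dlt`, the intrinsics `K`, and the synthetic points are all hypothetical.

```python
import numpy as np

def solve_pnp_dlt(points_3d, pixels, K):
    """Estimate rotation R and translation t such that
    pixel ~ K @ (R @ X + t), using a linear DLT (needs >= 6 non-coplanar points)."""
    n = points_3d.shape[0]
    ones = np.ones((n, 1))
    # Remove the intrinsics so we solve for [R | t] directly.
    norm = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T
    x = norm[:, 0] / norm[:, 2]
    y = norm[:, 1] / norm[:, 2]
    X = np.hstack([points_3d, ones])  # homogeneous 3D points, shape (n, 4)
    zeros = np.zeros_like(X)
    # Two linear constraints per point on the 12 entries of P = [R | t].
    A = np.vstack([
        np.hstack([-X, zeros, x[:, None] * X]),
        np.hstack([zeros, -X, y[:, None] * X]),
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # P is only defined up to scale and sign: project the left 3x3 block
    # onto the nearest rotation and rescale the translation to match.
    U, S, Wt = np.linalg.svd(P[:, :3])
    R = U @ Wt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # P was recovered with a flipped sign
        R, t = -R, -t
    return R, t

# Synthetic check with a hypothetical camera and scene (units: metres).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(20.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.1, -0.2, 5.0])
points_3d = rng.uniform(-1.0, 1.0, size=(12, 3))

cam = points_3d @ R_true.T + t_true   # points in the camera frame
proj = cam @ K.T
pixels = proj[:, :2] / proj[:, 2:3]   # perspective division

R_est, t_est = solve_pnp_dlt(points_3d, pixels, K)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))
```

Because the 3D points are expressed in metres, the recovered translation is metric as well, which is the property the reconstruction relies on. With real, noisy correspondences a linear DLT is usually only a starting point; a robust solver with outlier rejection and nonlinear refinement would be the more typical choice.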
Addressing Mars-Specific Challenges: Synthesizing realistic Martian videos is hindered by data scarcity and a significant domain gap from terrestrial imagery. Our approach, purpose-built for Mars, outperforms models trained on Earth data by achieving superior visual fidelity and 3D structural consistency.
High-Fidelity 3D Data Foundation (M3arsSynth): Our M3arsSynth data curation pipeline tackles these challenges by first reconstructing physically accurate, metric-scale 3D Martian environments from real NASA PDS stereo imagery, which then allows for the rendering of high-fidelity video sequences.
Controllable and Novel Video Synthesis (MarsGen): Leveraging the M3arsSynth dataset, our MarsGen video generator synthesizes novel Martian video sequences that are visually realistic and geometrically consistent. Conditioned on an initial frame and, optionally, camera trajectories or textual prompts, it enables the synthesis of extensive and varied Martian video data.