Marigold: Generative Computer Vision
Summary
Marigold is a novel diffusion-based approach for dense prediction, providing foundation models and pipelines
for a range of computer vision and image analysis tasks, including monocular depth estimation, surface normal
prediction, and intrinsic image decomposition.
By repurposing Stable Diffusion, it achieves both efficiency and generality across tasks.
Marigold also serves as a reliable fine-tuning protocol, adapting pretrained foundation models to many other
dense prediction tasks using only a small amount of synthetic data.
Interactive Demonstrations
Select a demo and either upload your own image or try one of the provided examples.
To view the demo full-screen, click its title.
Like the project? A star ⭐ goes a long way!
Related and Follow-up Works
- ⇆ Marigold-DC casts the task of sparse Depth Completion as conditional depth estimation
- 🛹 RollingDepth (CVPR 2025) achieves superior temporal consistency in video depth estimation
- 🌆 BetterDepth (NeurIPS 2024) demonstrates refinement of coarse predictions with diffusion models
Open Student Projects
The Photogrammetry and Remote Sensing Lab at ETH Zürich, led by
Prof. Konrad Schindler,
offers a range of student projects in state-of-the-art computer vision, including several focused on generative AI.
If you like research like Marigold, be sure to check out the
list of currently available projects.
Papers and Citations
The first model,
Marigold-Depth v1.0,
was introduced in our
CVPR 2024 paper titled "Repurposing Diffusion-Based Image
Generators for Monocular Depth Estimation" by
Bingxin Ke,
Anton Obukhov,
Shengyu Huang,
Nando Metzger,
Rodrigo Caye Daudt, and
Konrad Schindler.
This model required 10–50 inference steps with the DDIM scheduler and showed that high-quality depth
estimators can be trained solely on synthetic data, leveraging the pretrained image VAE and the intact
latent space of the original LDM (Stable Diffusion).
@InProceedings{ke2023repurposing,
  title     = {Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
  author    = {Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}
Follow-up models, including
Marigold-Depth v1.1,
Marigold-Normals v1.1,
Marigold IID-Appearance v1.1,
Marigold IID-Lighting v1.1,
Marigold-Depth LCM v1.0, and
Marigold-Depth HR v1.0,
were introduced in the
follow-up paper titled "Marigold: Affordable Adaptation of
Diffusion-Based Image Generators for Image Analysis" by
Bingxin Ke,
Kevin Qu,
Tianfu Wang,
Nando Metzger,
Shengyu Huang,
Bo Li,
Anton Obukhov, and
Konrad Schindler.
With the "trailing" timestep spacing of the DDIM scheduler, these models
achieve strong performance and speed using as few as 1–4 diffusion inference steps.
Additionally, we introduced an alternative consistency distillation method (LCM) for few-step inference,
along with exploration of high-resolution (HR) inference.
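Why trailing spacing matters for few-step inference can be illustrated with a short, simplified sketch of DDIM timestep selection (a minimal reimplementation mirroring the behavior of the `timestep_spacing` option in diffusers' `DDIMScheduler`, not the library code itself):

```python
import numpy as np

def ddim_timesteps(num_train_timesteps: int, num_inference_steps: int, spacing: str):
    """Simplified DDIM timestep selection for 'leading' vs 'trailing' spacing."""
    if spacing == "leading":
        # Starts counting from 0, so the first denoising step begins well below
        # the final training timestep -- a mismatch that hurts few-step inference.
        step = num_train_timesteps // num_inference_steps
        return (np.arange(0, num_inference_steps) * step)[::-1].copy()
    elif spacing == "trailing":
        # Anchors the schedule at the last training timestep, so even a single
        # step starts from the fully noised state.
        ratio = num_train_timesteps / num_inference_steps
        return np.round(np.arange(num_train_timesteps, 0, -ratio)).astype(np.int64) - 1
    raise ValueError(f"unknown spacing: {spacing}")

print(ddim_timesteps(1000, 4, "leading"))   # [750 500 250   0]
print(ddim_timesteps(1000, 4, "trailing"))  # [999 749 499 249]
```

With only 1–4 steps, the leading schedule never visits the highest-noise timestep the model was trained to start from, while the trailing schedule always does, which is why switching the spacing unlocks few-step Marigold inference.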
@misc{ke2025marigold,
  title         = {Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis},
  author        = {Bingxin Ke and Kevin Qu and Tianfu Wang and Nando Metzger and Shengyu Huang and Bo Li and Anton Obukhov and Konrad Schindler},
  year          = {2025},
  eprint        = {2505.09358},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
We’re thankful to the community for acknowledging our work through citations.