Summary

Marigold is a novel diffusion-based approach for dense prediction, providing foundation models and pipelines for a range of computer vision and image analysis tasks, including monocular depth estimation, surface normal prediction, and intrinsic image decomposition. By repurposing Stable Diffusion, it achieves both efficiency and generality across tasks. Marigold also serves as a reliable fine-tuning protocol, enabling foundation models to tackle many other dense prediction tasks using only a few synthetic samples.

Interactive Demonstrations

Select a demo and either upload your own image or try one of the provided examples. To view the demo full-screen, click its title. Like the project? A star ⭐ goes a long way!

Related and Follow-up Works

  • ⇆ Marigold-DC casts the task of sparse depth completion as conditional depth estimation
  • 🛹 RollingDepth (CVPR 2025) achieves superior temporal consistency in video depth estimation
  • 🌆 BetterDepth (NeurIPS 2024) demonstrates refinement of coarse predictions with diffusion models

Testimonials

Open Student Projects

The Photogrammetry and Remote Sensing Lab at ETH Zürich, led by Prof. Konrad Schindler, offers a range of student projects in state-of-the-art computer vision, including several focused on generative AI. If you like research like Marigold, be sure to check out the list of currently available projects.

Papers and Citations

The first model, Marigold-Depth v1.0, was introduced in our CVPR 2024 paper "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation" by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. This model required 10–50 inference steps with the DDIM scheduler and showed that high-quality depth estimators can be trained solely on synthetic data, leveraging the pretrained image VAE and the intact latent space of the original LDM (Stable Diffusion).
@InProceedings{ke2023repurposing,
  title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
  author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
Follow-up models, including Marigold-Depth v1.1, Marigold-Normals v1.1, Marigold IID-Appearance v1.1, Marigold IID-Lighting v1.1, Marigold-Depth LCM v1.0, and Marigold-Depth HR v1.0, were introduced in the follow-up paper "Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis" by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. With the trailing timesteps setting of the DDIM scheduler, these models achieve strong performance and speed using as few as 1–4 diffusion inference steps. Additionally, we introduced an alternative consistency distillation method (LCM) for few-step inference, along with an exploration of high-resolution (HR) inference.
@misc{ke2025marigold,
  title={Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis},
  author={Bingxin Ke and Kevin Qu and Tianfu Wang and Nando Metzger and Shengyu Huang and Bo Li and Anton Obukhov and Konrad Schindler},
  year={2025},
  eprint={2505.09358},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
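To give an intuition for the trailing timesteps setting mentioned above, here is a minimal sketch in plain Python (an illustration of the spacing scheme, not the shipped scheduler code): with trailing spacing, the selected timesteps count back from the end of the training schedule, so even a 1-step schedule starts denoising at the final training timestep.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Illustrative 'trailing' timestep spacing: pick timesteps counting
    back from the last training timestep, so the first denoising step
    always starts at t = num_train_timesteps - 1."""
    step = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step) - 1
            for i in range(num_inference_steps)]

print(trailing_timesteps(1))  # [999]
print(trailing_timesteps(4))  # [999, 749, 499, 249]
```

Starting at the last timestep is what makes very-few-step inference viable: with the older "leading" spacing, a 1-step schedule would begin at an earlier timestep and mismatch the noise level the model was trained to expect at the start of denoising.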
We’re thankful to the community for acknowledging our work through citations.