Summary

Marigold is a novel diffusion-based approach for dense prediction, providing foundation models and pipelines for a range of computer vision and image analysis tasks, including monocular depth estimation, surface normal prediction, and intrinsic image decomposition. By repurposing Stable Diffusion, it achieves both efficiency and generality across tasks. Marigold also serves as a reliable fine-tuning protocol, enabling foundation models to tackle many other dense prediction tasks using only a few synthetic samples.

Interactive Demonstrations

Select a demo and either upload your own image or try one of the provided examples. To view the demo full-screen, click its title. Like the project? A star ⭐ goes a long way!

Related and Follow-up Works

  • ⇆ Marigold-DC casts the task of sparse depth completion as conditional depth estimation
  • 🛹 RollingDepth (CVPR 2025) achieves superior temporal consistency in video depth estimation
  • 🌆 BetterDepth (NeurIPS 2024) demonstrates refinement of coarse predictions with diffusion models

Testimonials

Open Student Projects

The Photogrammetry and Remote Sensing Lab at ETH Zürich, led by Prof. Konrad Schindler, offers a range of student projects in state-of-the-art computer vision, including several focused on generative AI. If you like research like Marigold, be sure to check out the list of currently available projects.

Papers and Citations

The first model, Marigold-Depth v1.0, was introduced in our CVPR 2024 paper "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation" by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. This model required 10–50 inference steps with the DDIM scheduler and showed that high-quality depth estimators can be trained solely on synthetic data, leveraging the pretrained image VAE and the intact latent space of the original LDM (Stable Diffusion).
@InProceedings{ke2023repurposing,
  title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
  author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
Follow-up models, including Marigold-Depth v1.1, Marigold-Normals v1.1, Marigold IID-Appearance v1.1, Marigold IID-Lighting v1.1, Marigold-Depth LCM v1.0, and Marigold-Depth HR v1.0, were introduced in the follow-up paper "Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis" by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. With the trailing timesteps setting of the DDIM scheduler, these models achieve strong performance and speed using as few as 1–4 diffusion inference steps. Additionally, we introduced an alternative consistency distillation method (LCM) for few-step inference, along with an exploration of high-resolution (HR) inference.
@misc{ke2025marigold,
  title={Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis},
  author={Bingxin Ke and Kevin Qu and Tianfu Wang and Nando Metzger and Shengyu Huang and Bo Li and Anton Obukhov and Konrad Schindler},
  year={2025},
  eprint={2505.09358},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
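To give an intuition for the trailing timesteps setting mentioned above, here is a minimal sketch in plain Python (an illustration of the spacing scheme, not the shipped scheduler code): with trailing spacing, the selected timesteps count back from the end of the training schedule, so even a 1-step schedule starts denoising at the final training timestep.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Illustrative 'trailing' timestep spacing: pick timesteps counting
    back from the last training timestep, so the first denoising step
    always starts at t = num_train_timesteps - 1."""
    step = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step) - 1
            for i in range(num_inference_steps)]

print(trailing_timesteps(1))  # [999]
print(trailing_timesteps(4))  # [999, 749, 499, 249]
```

Starting at the last timestep is what makes very-few-step inference viable: with the older "leading" spacing, a 1-step schedule would begin at an earlier timestep and mismatch the noise level the model was trained to expect at the start of denoising.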
We’re thankful to the community for acknowledging our work through citations.