High-fidelity performance metrics for generative models in PyTorch

Evaluation of generative models such as GANs is an important part of deep learning research. In the domain of 2D image generation, three approaches have become widespread: Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID).
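To make the FID part concrete, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians, which is what FID computes after fitting Gaussians to InceptionV3 features. This is an illustration only, not the torch-fidelity implementation: the feature arrays, function name, and dimensionality below are made up for the example.

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    # Fréchet distance between N(mu1, cov1) and N(mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(C1) + Tr(C2) - 2 Tr((C1 C2)^{1/2})
    diff = mu1 - mu2
    # Tr((C1 C2)^{1/2}) equals the sum of square roots of the eigenvalues
    # of C1 @ C2; they are real and non-negative up to numerical noise,
    # so clip before taking the square root.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * covmean_trace)

def stats(x):
    # Gaussian fit: per-dimension mean and full covariance
    return x.mean(axis=0), np.cov(x, rowvar=False)

# Stand-in "features" (the real metric uses InceptionV3 activations)
rng = np.random.default_rng(0)
feats1 = rng.normal(size=(1000, 8))
feats2 = rng.normal(loc=0.5, size=(1000, 8))

fid_same = frechet_distance(*stats(feats1), *stats(feats1))
fid_diff = frechet_distance(*stats(feats1), *stats(feats2))
print(fid_same, fid_diff)  # identical stats -> ~0; shifted mean -> larger
```

The eigenvalue route avoids an explicit matrix square root; reference implementations typically use `scipy.linalg.sqrtm` instead, which is one of the numerical details where implementations can disagree.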

These metrics, despite having a clear mathematical and algorithmic description, were initially implemented in TensorFlow and inherited a few properties of the framework itself (see Interpolation) and of the code they relied upon (see Model). These design decisions were effectively baked into the evaluation protocol and became an inherent part of the metrics' specification. As a result, researchers wishing to compare against the state of the art in generative modeling are forced to perform evaluation using the codebases of the original metric authors. Reimplementations of the metrics in PyTorch and other frameworks exist, but they do not provide the required level of fidelity, making them unsuitable for reporting results and comparing them to other methods.

This software aims to provide epsilon-exact implementations of the said metrics in PyTorch, and thus remove the inconveniences associated with generative model evaluation and development. All steps of the evaluation pipeline are thoroughly tested, with relative errors and sources of remaining non-determinism summarized in the sections below.
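For completeness, the Inception Score mentioned above also reduces to a short computation once class probabilities are available: IS = exp(E_x[KL(p(y|x) || p(y))]). The sketch below operates directly on a probability array; the class count, array shapes, and function name are illustrative assumptions, whereas the real metric uses InceptionV3 softmax outputs averaged over splits.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, num_classes) array of per-sample class probabilities p(y|x)
    marginal = probs.mean(axis=0, keepdims=True)  # empirical p(y)
    # Per-sample KL(p(y|x) || p(y)); eps guards against log(0)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

rng = np.random.default_rng(0)
# Confident, diverse predictions -> score approaches the class count
confident = np.eye(10)[rng.integers(0, 10, size=500)]
# Uninformative uniform predictions -> score of 1
uniform = np.full((500, 10), 0.1)
print(inception_score(confident), inception_score(uniform))
```

The score rewards samples that are individually classified confidently (low-entropy p(y|x)) while collectively covering many classes (high-entropy p(y)), which is why it sits between 1 and the number of classes.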

Source code

Check out the project GitHub page


pip install torch-fidelity


Citation

@misc{obukhov2020torchfidelity,
  author={Anton Obukhov and Maximilian Seitzer and Po-Wei Wu and Semen Zhydenko and Jonathan Kyl and Elvis Yu-Jing Lin},
  title={High-fidelity performance metrics for generative models in PyTorch},
  note={Version: 0.2.0, DOI: 10.5281/zenodo.3786540}
}