Enhancing Photorealism Enhancement

Stephan Richter

Intel Labs

Hassan Abu AlHaija

PCH Innovations / Intel Labs¹

Vladlen Koltun

Intel Labs

¹ Work done while an intern at Intel Labs.

Abstract

We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and find that they differ in important ways. We hypothesize that this is one of the causes of strong artifacts that can be observed in the results of many prior methods. To address this, we propose a new strategy for sampling image patches during training. We also introduce multiple architectural improvements in the deep network modules used for photorealism enhancement. We confirm the benefits of our contributions in controlled experiments and report substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.

Paper

BibTeX

Please cite our work if you use code or data from this site.

@Article{Richter_2021,
  title   = {Enhancing Photorealism Enhancement},
  author  = {Stephan R. Richter and Hassan Abu AlHaija and Vladlen Koltun},
  journal = {arXiv:2105.04619},
  year    = {2021},
}

Video (Paper)

Video (Keynote at Eurographics 2021)

Results

GTA V to Cityscapes

The modifications made by our method are geometrically and semantically consistent with the original images.

They are also temporally stable:

It greens the parched grass and hills in GTA's California:

It adds reflections to the windows and strengthens the Fresnel effect (e.g., on the roofs of cars):

It rebuilds the roads:

GTA V to Mapillary Vistas

Images in this dataset are recorded around the world with a wide variety of cameras. They are more vibrant and of high resolution.

It removes distant haze and rebuilds the road:

Grass becomes more voluminous: