In this study, we explore building a two-stage framework for enabling users to directly manipulate high-level attributes of a natural scene.
The key to our approach is a deep generative network which can hallucinate images of a scene as if they were taken at a different season (e.g. during winter),
weather condition (e.g. in a cloudy day) or time of the day (e.g. at sunset). Once the scene is hallucinated with the given attributes,
the corresponding look is then transferred to the input image while preserving the semantic details intact, giving a photo-realistic manipulation result.
As the proposed framework hallucinates what the scene will look like, it does not require any reference style image as commonly done in most of the appearance or style transfer approaches.
Moreover, it allows to simultaneously manipulate a given scene according to a large set transient attributes to a large extent within a single model, eliminating the need to train multiple networks per each translation task.
Our comprehensive set of qualitative and quantitative results demonstrate the effectiveness of our approach against the competing methods.
"The trees, being partly covered with snow, were outlined indistinctly against the grayish background formed by a cloudy sky, barely whitened by the moon."
—Honore de Balzac (Sarrasine, 1831)
The visual world we live in constantly changes its appearance depending on time and seasons.
For example, at sunset, the sun gets close to the horizon gives the sky a pleasant red tint,
with the advent of warm summer, the green tones on the grass leave its place in bright yellowish tones
and autumn brings a variety of shades of brown and yellow to the trees. Such visual changes in the nature continues
in various forms at almost any moment with the effect of time, weather and season.
Such high-level changes are referred to as transient scene attributes -- e.g. cloudy, foggy, night, sunset, winter, summer, to name a few .
Image generation is quite a challenging task since it needs tohave realistic looking outputs.
Visual attribute manipulation can beconsidered a bit harder as it aims at photorealism as well as results that are semantically consistent with the input image.
Unlike recent image synthesis methods ,
which explore producing realistic-looking images from semantic layouts, automatically manipulating visual attributes
requires modifying the appearance of an input image while preserving object-specific semantic details intact.
Some recent style transfer methods achieve this goal to a certain extent but they require a reference style
We propose a new two-stage visual attribute manipulation framework for changing high-level attributes of a given outdoor image.
Very recently, in CVPR2019, a similar scene generation tool named GauGAN was proposed to synthesize
realistic outdoor scenes from interactively edited doodles. Our image editing tool differs from GauGAN in the following two aspects.
First, our image editing tool not only aims at scene generation from semantic layout like in GauGAN but also it provides manipulating transient attributes of input outdoor scenes.
Second, our scene generation model enables users to play degrees of transient attributes as well as drawing a novel outdoor scene interactively.
Our framework provides an easy and high-level editing system to manipulate transient attributes of outdoor scenes. The key component of our framework is a scene generation network that is conditioned on semantic layout and continuous-valued vector of transient attributes.
This network allows us to generate synthetic scenes consistent with the semantic layout of the input image and having the desired transient attributes. One can play with 40 different transient attributes by increasing or decreasing values of certain dimensions.
Note that, at this stage, the semantic layout of the input image should also be fed to the network, which can be easily automated by a scene parsing model.
Once an artificial scene with desired properties is generated, we then transfer the look of the hallucinated image to the original input image to achieve attribute manipulation in a photorealistic manner.
Scene Generation Model (SGN)
We train a conditional Generative Adversarial Network (cGAN) model named as SGN to hallucinate an outdoor scene
in different transient attributes conditioning semantic layouts s and transient attributes a.
We follow a multi-scale strategy similar to that in Pix2pixHD . Our scene generator network (SGN),
however, takes the transient scene attributes and a noise vector as extra inputs in addition to the semantic layout.
While the noise vector provides stochasticity and controls diversity in the generated images, transient attributes let the users have control on the generation process.
Our full objective that combines multi-scale GAN loss and layout-invariant feature matching loss thus becomes:
This work was supported in part by TUBA GEBIP fellowship awarded to E. Erdem. We would like to thank NVIDIA Corporation for the donation of GPUs used in this research. This work has been partially funded by the DFG-EXC-Nummer 2064/1-Projektnummer 390727645.