Manipulating Attributes of Natural Scenes via Hallucination

In this study, we explore a two-stage framework that enables users to directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network that can hallucinate images of a scene as if they were taken in a different season (e.g. during winter), weather condition (e.g. on a cloudy day) or time of day (e.g. at sunset). Once the scene is hallucinated with the given attributes, the corresponding look is transferred to the input image while keeping the semantic details intact, giving a photo-realistic manipulation result. Because the proposed framework hallucinates what the scene will look like, it does not require a reference style image, unlike most appearance or style transfer approaches. Moreover, it allows a given scene to be manipulated simultaneously according to a large set of transient attributes within a single model, eliminating the need to train a separate network for each translation task. Our comprehensive set of qualitative and quantitative results demonstrates the effectiveness of our approach against competing methods.

Introduction

"The trees, being partly covered with snow, were outlined indistinctly against the grayish background formed by a cloudy sky, barely whitened by the moon."

The visual world we live in constantly changes its appearance with time and the seasons. For example, at sunset, the sun approaches the horizon and gives the sky a pleasant red tint; with the advent of warm summer, the green tones of the grass give way to bright yellowish tones; and autumn brings a variety of shades of brown and yellow to the trees. Such visual changes in nature continue in various forms at almost any moment under the effect of time, weather and season. These high-level changes are referred to as transient scene attributes, e.g. cloudy, foggy, night, sunset, winter and summer, to name a few.

Image generation is a challenging task because the outputs must look realistic. Visual attribute manipulation is arguably harder still, since it aims both at photorealism and at results that are semantically consistent with the input image. Unlike recent image synthesis methods, which explore producing realistic-looking images from semantic layouts, automatically manipulating visual attributes requires modifying the appearance of an input image while keeping object-specific semantic details intact. Some recent style transfer methods achieve this goal to a certain extent, but they require a reference style image.

We propose a new two-stage visual attribute manipulation framework for changing high-level attributes of a given outdoor image. Very recently, at CVPR 2019, a similar scene generation tool named GauGAN was proposed to synthesize realistic outdoor scenes from interactively edited doodles. Our image editing tool differs from GauGAN in two respects. First, it not only generates scenes from a semantic layout, as GauGAN does, but also supports manipulating the transient attributes of input outdoor scenes. Second, our scene generation model lets users adjust the degree of each transient attribute as well as draw a novel outdoor scene interactively.

System Overview

Our framework provides an easy, high-level editing system for manipulating transient attributes of outdoor scenes. The key component of our framework is a scene generation network that is conditioned on a semantic layout and a continuous-valued vector of transient attributes. This network allows us to generate synthetic scenes that are consistent with the semantic layout of the input image and have the desired transient attributes. Users can adjust 40 different transient attributes by increasing or decreasing the values of the corresponding dimensions of this vector. Note that, at this stage, the semantic layout of the input image must also be fed to the network; this step can be easily automated with a scene parsing model. Once an artificial scene with the desired properties is generated, we transfer the look of the hallucinated image to the original input image to achieve attribute manipulation in a photorealistic manner.
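The attribute editing step described above can be illustrated with a minimal sketch. It assumes that each of the 40 attribute values lies in the range [0, 1] and that an edit is a signed offset applied to chosen dimensions; the function name, the clamping range and the attribute indices are hypothetical and are not taken from the original system.

```python
NUM_ATTRIBUTES = 40  # the text mentions 40 transient attributes


def adjust_attributes(attributes, changes, low=0.0, high=1.0):
    """Return a copy of `attributes` with selected dimensions shifted and clamped.

    attributes: list of NUM_ATTRIBUTES floats (assumed to lie in [low, high]).
    changes:    dict mapping a dimension index to a signed offset.
    """
    if len(attributes) != NUM_ATTRIBUTES:
        raise ValueError(f"expected {NUM_ATTRIBUTES} attribute values")
    edited = list(attributes)  # leave the caller's vector untouched
    for index, delta in changes.items():
        if not 0 <= index < NUM_ATTRIBUTES:
            raise IndexError(f"attribute index {index} out of range")
        # Shift the chosen dimension, then clamp to the assumed valid range.
        edited[index] = min(high, max(low, edited[index] + delta))
    return edited


# Hypothetical indices: the real ordering of the attributes is not given here.
WINTER, SUNSET = 3, 17

base = [0.5] * NUM_ATTRIBUTES
edited = adjust_attributes(base, {WINTER: +0.4, SUNSET: -0.7})
```

In a full pipeline, a vector such as `edited` would be passed to the scene generation network together with the semantic layout, and the hallucinated result would then drive the look transfer stage.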

Scene Generation Model (SGN)

We train a conditional Generative Adversarial Network (cGAN), which we call SGN, to hallucinate an outdoor scene under different transient attributes, conditioned on a semantic layout s and a vector of transient attributes a. We follow a multi-scale strategy similar to that of Pix2pixHD. Our scene generator network (SGN), however, takes the transient scene attributes and a noise vector as extra inputs in addition to the semantic layout. While the noise vector provides stochasticity and controls the diversity of the generated images, the transient attributes give users control over the generation process. Our full objective, which combines a multi-scale GAN loss over three discriminators with a layout-invariant feature matching loss (denoted \mathcal{L}_{percep} and weighted by \lambda), thus becomes:

\min_G \left( \left(\max_{D=\{D_1, D_2, D_3\}} \sum_{k=1,2,3}\mathcal{L}_{GAN}(G,D_k) \right) +\lambda \mathcal{L}_{percep}(G) \right)
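To make the structure of this objective concrete, the sketch below evaluates it for given discriminator outputs. The text does not define \mathcal{L}_{GAN}, so the standard log-likelihood GAN value (mean log D(real) plus mean log(1 - D(fake))) is assumed here; the perceptual term is taken as a precomputed number, and the function names and the value of \lambda are placeholders rather than the authors' settings.

```python
import math


def gan_loss(real_scores, fake_scores):
    """Assumed GAN value for one discriminator D_k.

    real_scores / fake_scores: probabilities in (0, 1) that D_k assigns
    to real and generated images respectively.
    """
    real_term = sum(math.log(p) for p in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - p) for p in fake_scores) / len(fake_scores)
    return real_term + fake_term


def full_objective(scores_per_scale, perceptual_loss, lam):
    """Sum the GAN value over all discriminators and add lam * L_percep.

    scores_per_scale: one (real_scores, fake_scores) pair per discriminator,
    i.e. three pairs for D_1, D_2, D_3.
    """
    adversarial = sum(gan_loss(real, fake) for real, fake in scores_per_scale)
    return adversarial + lam * perceptual_loss


# Toy inputs: three discriminators that are maximally unsure (score 0.5).
scores = [([0.5], [0.5])] * 3
value = full_objective(scores, perceptual_loss=2.0, lam=10.0)
```

During training, the discriminators are updated to increase the adversarial sum, while the generator is updated to decrease the whole expression, which matches the min over G and max over D_1, D_2, D_3 in the formula.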

Demo

Results

Attribute Manipulation

Comparison

Attribute Transition

Videos

Additional Results

Season transfer to paintings. Source images: Wheat Field with Cypresses by Vincent van Gogh (1889), In the Auvergne by Jean-Francois Millet (1869) and Lourmarin by Paul-Camille Guigou (1868), respectively.