ADVERSARIAL TEXT TO CONTINUOUS IMAGE GENERATION



Abstract

Implicit Neural Representations (INRs) provide a natural way to parametrize images as continuous signals, using an MLP that predicts the RGB color at an (x, y) image location. Recent work has shown that high-quality INR decoders can be designed and integrated with Generative Adversarial Networks (GANs) to enable unconditional continuous image generation that is no longer bound to a particular spatial resolution. In this paper, we introduce HyperCGAN, a conceptually simple approach to Adversarial Text to Continuous Image Generation based on HyperNetworks, i.e., networks that produce the parameters of another network. HyperCGAN uses HyperNetworks to condition an INR-based GAN model on text: the generator and discriminator weights are controlled by their corresponding HyperNetworks, which modulate the weight parameters using the provided text query. We propose an effective Word-level hyper-modulation Attention operator, termed WhAtt, which encourages grounding words to independent pixels at input (x, y) coordinates. To the best of our knowledge, our work is the first to explore Text to Continuous Image Generation (T2CI). We conduct comprehensive experiments on the COCO 256², CUB 256², and ArtEmis 256² benchmarks, the last of which we introduce in this paper. HyperCGAN improves the performance of text-controllable image generators over the baselines while significantly reducing the gap between text-to-continuous and text-to-discrete image synthesis. Additionally, we show that HyperCGAN, when conditioned on text, retains the desired properties of continuous generative models (e.g., extrapolation outside of image boundaries, accelerated inference of low-resolution images, and out-of-the-box super-resolution). The code and the ArtEmis 256² benchmark will be made publicly available.
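To make the idea of text-conditioned continuous generation more concrete, here is a toy NumPy sketch of a text-modulated coordinate MLP. It is not HyperCGAN's generator and not the WhAtt operator. The following are all hypothetical choices made for illustration: the weight shapes, the sine activations (borrowed from SIREN-style INRs), the mean-pooled sentence embedding, the tanh-based channel scales produced by the linear "hypernetwork" `H`, and the per-pixel softmax attention over word embeddings (projection `Q`). What the sketch does show is that each pixel depends only on its own (x, y) coordinate and the text. As a result, the same function can be sampled at any resolution, or on a grid that extends beyond the nominal [-1, 1] image area.

```python
import numpy as np

rng = np.random.default_rng(0)

D_TXT = 16   # hypothetical word-embedding size
HIDDEN = 32  # hypothetical INR hidden width

# Base INR weights shared by every text query (random here; learned in a trained model).
W1 = rng.normal(0.0, 1.0, (2, HIDDEN))
W2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), (HIDDEN, 3))
# Toy "hypernetwork": a linear map from text embeddings to per-channel scales.
H = rng.normal(0.0, 0.1, (D_TXT, HIDDEN))
# Projection turning pixel features into attention queries over the words.
Q = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), (HIDDEN, D_TXT))


def coord_grid(res, extent=1.0):
    """Return a (res * res, 2) array of (x, y) coordinates in [-extent, extent]."""
    axis = np.linspace(-extent, extent, res)
    xs, ys = np.meshgrid(axis, axis, indexing="xy")
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)


def render(words, res, extent=1.0):
    """Evaluate the text-modulated INR on a res x res grid; returns (res, res, 3)."""
    coords = coord_grid(res, extent)                 # (P, 2), P = res * res
    sentence = words.mean(axis=0)                    # crude sentence embedding
    scale = 1.0 + np.tanh(sentence @ H)              # (HIDDEN,) channel scales
    h = np.sin(coords @ (W1 * scale))                # modulated first layer, (P, HIDDEN)

    # Word-level attention: every pixel attends over the words on its own.
    logits = (h @ Q) @ words.T / np.sqrt(D_TXT)      # (P, n_words)
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    context = attn @ words                           # (P, D_TXT)
    h = h * (1.0 + np.tanh(context @ H))             # per-pixel modulation

    rgb = np.tanh(h @ W2)                            # colors in [-1, 1]
    return rgb.reshape(res, res, 3)


words = rng.normal(size=(5, D_TXT))       # stand-in for 5 word embeddings
preview = render(words, 32)               # cheap low-resolution pass
full = render(words, 256)                 # same function, higher resolution
wide = render(words, 256, extent=1.5)     # wider field of view (extrapolation)
```

Because pixels never interact in this sketch, two properties follow directly. First, `render(words, 5)[::2, ::2]` matches `render(words, 3)`. Second, the central 3 x 3 crop of `render(words, 5, extent=2.0)` matches the unextended 3 x 3 image. This is an idealized view of why coordinate-based generators can offer cheap low-resolution previews, super-resolution, and extrapolation. A real model would presumably learn `W1`, `W2`, `H`, and `Q` during adversarial training rather than sampling them at random.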

Qualitative Results from a discrete version of HyperCGAN

Examples of affective captions and their corresponding emotions from the ArtEmis dataset, with generations from HyperC-SG^word

Qualitative results from state-of-the-art (SoTA) methods and from HyperC-SG^word trained on the ArtEmis and COCO datasets.

Qualitative Examples on CUB with Extrapolated Regions Outside the Red Rectangles

a brown bird with white on the supercillary and a brown bill

a small bird with black and gray markings and a black beak

mostly brownish grey with white streaks on its primaries and a white superciliary

this bird has wings that are black and has a white belly

this bird has wings that are brown and has a white rotund body

this bird is black with a long tail and has a very short beak

this bird is brown with white and has a long pointy beak

this bird is mostly white and brown has a long straight pointed beak and rectrices are brown spots

this bird is small in size black in color and has a pointed bill

Qualitative Examples on WIKI with Extrapolated Regions Outside the Red Rectangles

a little nostogia the print type remindsme of older comic book styles

a mysterious looking woman who sits alone i want know that goes with the person because interesting

bright and vivid colors but somewhat confusing on what it represents

the pale skin in the white clothing against the dark background is unsettling

this painting makes you feel scared in its realness the heavy man is realistic

this person s eyes are wide open as though they know something ominous will happen soon

this reminds me of home on the coast it makes me smell and feel the surroundings

wouldn t classify this as art someone decided to paint two shapes of black paint on white paper

wow the blue ski and the road looks like they go forever

Qualitative Examples on COCO with Extrapolated Regions Outside the Red Rectangles

a baseball player preparing to throw the baseball

a bathroom sink with toiletries on the counter

a bathroom with a black counter and a big mirror

a big blue bus is parked on the side of the street

a big pretty plate filled with some tasty looking food

a black and white photo of a stop sign by some grass

a bowl of pizza a bowl of green beans a of carrots and a bowl bread and berries

a bus pausing at a bus stop for passengers

a big brown bear enjoys the water in his habitat

Failure Cases

The failure cases below illustrate HyperCGAN's limitations:

Left: People's facial expressions are not rendered clearly.
Middle: Blob-shaped artifacts can be visible.
Right: The number of objects may not be represented accurately; instead, the objects may be generated as patterns related to their nature.

a cute girl holding a plate with a cake and lit candles

a video game controller being pointed at a television

four zebra standing around outside in the wild