GANalyze: Toward Visual Definitions of Cognitive Image Properties

Lore Goetschalckx*¹,², Alex Andonian*¹, Aude Oliva¹, Phillip Isola¹
¹MIT Computer Science and Artificial Intelligence Laboratory,
²KU Leuven (Belgium)

[GitHub Code]

We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-grained differences in their visual attributes. By navigating this manifold in directions that increase memorability, we can visualize what it looks like for a particular generated image to become more or less memorable. The resulting "visual definitions" surface image properties (like "object size") that may underlie memorability. Through behavioral experiments, we verify that our method indeed discovers image manipulations that causally affect human memory performance. We further demonstrate that the same framework can be used to analyze image aesthetics and emotional valence.


The GANalyze framework

Our framework consists of the following interacting components:

Generator \(G\): Given a latent noise vector \(z\) and class label \(y\), the generator produces a photorealistic image \(G(z, y)\).
Assessor \(A\): Assigns a numerical value to an image indicating the magnitude of a cognitive property of interest.
Transformer \(T\): A function that moves the input \(z\) along a certain direction \(\theta\) in the latent space of \(G\).
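
To make the roles of the three components concrete, here is a minimal sketch in Python. It is not the released implementation: the generator and assessor are toy stand-ins with fixed random weights (all names, sizes, and weights are hypothetical), and the Transformer is assumed to take the simple form \(z + \alpha\theta\), which is one natural reading of "moving \(z\) along \(\theta\)".

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, NUM_CLASSES, IMAGE_DIM = 8, 3, 16  # toy sizes, chosen arbitrarily

# Hypothetical stand-ins: fixed random weights instead of trained networks.
W_latent = rng.normal(size=(IMAGE_DIM, LATENT_DIM))
W_class = rng.normal(size=(IMAGE_DIM, NUM_CLASSES))
w_assess = rng.normal(size=IMAGE_DIM)


def generator(z, y):
    """G(z, y): map a latent vector and a class index to a flat toy 'image'."""
    return np.tanh(W_latent @ z + W_class[:, y])


def assessor(image):
    """A(image): scalar score in [0, 1] for the property of interest."""
    return 1.0 / (1.0 + np.exp(-(w_assess @ image)))


def transformer(z, alpha, theta):
    """T(z, alpha): move z by a step of size alpha along direction theta (assumed form)."""
    return z + alpha * theta


z = rng.normal(size=LATENT_DIM)      # latent noise vector
theta = rng.normal(size=LATENT_DIM)  # untrained direction, for illustration only
y = 1                                # class label

# Score the same seed image after moving it by different amounts along theta.
scores = {
    alpha: assessor(generator(transformer(z, alpha, theta), y))
    for alpha in (-0.2, 0.0, 0.2)
}
for alpha, score in scores.items():
    print(f"alpha={alpha:+.1f}  assessor score={score:.3f}")
```

With \(\alpha = 0\) the Transformer leaves \(z\) unchanged, so the score equals that of the original image; since \(\theta\) here is random rather than learned, the other two scores carry no meaning beyond showing how the pieces connect.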

Our model learns how to transform a \(z\) vector such that, when it is fed to the Generator, the property of interest of the resulting image changes. The transformation is performed by the Transformer, which moves the \(z\) vector along a learned direction, \(\theta\), in the Generator's latent space. The property of interest (e.g., memorability) is predicted by an Assessor module (e.g., MemNet). Finally, \(\alpha\) acts as a knob that sets the degree of change one wants to achieve in the Assessor value (e.g., MemNet score): it tells the Transformer how far to move along \(\theta\). The process is outlined in the schematic below.
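
As a rough illustration of how \(\theta\) might be learned, the sketch below runs stochastic gradient descent on a toy problem. It assumes a squared-error objective that pushes \(A(G(T(z, \alpha), y))\) toward \(A(G(z, y)) + \alpha\), which matches the description of \(\alpha\) as the desired change in the Assessor value; please check the paper for the exact loss. The generator and assessor are hypothetical linear stand-ins so that the gradient can be written by hand. A real setup would instead backpropagate through pretrained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM, NUM_CLASSES = 8, 16, 3  # toy sizes, chosen arbitrarily

# Hypothetical linear stand-ins for the Generator and the Assessor.
W_latent = rng.normal(size=(IMAGE_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
W_class = rng.normal(size=(IMAGE_DIM, NUM_CLASSES))
w_assess = rng.normal(size=IMAGE_DIM) / np.sqrt(IMAGE_DIM)


def generator(z, y):
    """Toy G(z, y): linear in z, with a per-class offset."""
    return W_latent @ z + W_class[:, y]


def assessor(image):
    """Toy A(image): a linear score."""
    return w_assess @ image


def transformer(z, alpha, theta):
    """T(z, alpha) = z + alpha * theta (assumed form)."""
    return z + alpha * theta


# Because the toy G and A are linear, the gradient of A(G(z, y)) with respect
# to z is this constant vector.
score_grad_z = W_latent.T @ w_assess

theta = np.zeros(LATENT_DIM)
learning_rate = 0.5
for step in range(3000):
    z = rng.normal(size=LATENT_DIM)
    y = rng.integers(NUM_CLASSES)
    alpha = rng.uniform(-0.5, 0.5)

    target = assessor(generator(z, y)) + alpha  # desired score after the move
    achieved = assessor(generator(transformer(z, alpha, theta), y))
    error = achieved - target

    # Gradient of error**2 with respect to theta for the linear toy model.
    theta -= learning_rate * 2.0 * error * alpha * score_grad_z

# Check on a fresh sample: a step of size alpha should shift the score by about alpha.
z_new = rng.normal(size=LATENT_DIM)
alpha_new = 0.3
shift = assessor(generator(transformer(z_new, alpha_new, theta), 0)) - assessor(generator(z_new, 0))
print(f"requested change: {alpha_new:.3f}, achieved change: {shift:.3f}")
```

In this linear toy the achieved change depends only on \(\alpha\) and \(\theta\), not on the seed image; with real networks the response to a given step varies from image to image, which is why the objective is averaged over many sampled \(z\), \(y\), and \(\alpha\).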

Please refer to the paper for additional details.

Reference

L. Goetschalckx, A. Andonian, A. Oliva, and P. Isola. GANalyze: Toward Visual Definitions of Cognitive Image Properties. arXiv preprint arXiv:1906.10112, 2019.


@article{,
  title={GANalyze: Toward Visual Definitions of Cognitive Image Properties},
  author={Goetschalckx, Lore and Andonian, Alex and Oliva, Aude and Isola, Phillip},
  journal={arXiv preprint arXiv:1906.10112},
  year={2019}
}

Acknowledgement:
This work was partly funded by NSF award 1532591 in Neural and Cognitive Systems (to A.O.), and by a fellowship (Grant 1108116N) and a travel grant (Grant V4.085.18N) awarded to Lore Goetschalckx by the Research Foundation - Flanders (FWO).

Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S.