MULTIGRID NEURAL MEMORY

Tri Huynh, Michael Maire, Matthew R. Walter
University of Chicago | TTI-Chicago
We introduce a radical new approach to endowing neural networks with access to long-term and large-scale memory. Architecting networks with internal multigrid structure and connectivity, while distributing memory cells alongside computation throughout this topology, we observe that coherent memory subsystems emerge as a result of training. Our design both drastically differs from and is far simpler than prior efforts, such as the recently proposed Differentiable Neural Computer (DNC), which uses intricately crafted controllers to connect neural networks to external memory banks. Our hierarchical spatial organization, parameterized convolutionally, permits efficient instantiation of large-capacity memories. Our multigrid topology provides short internal routing pathways, allowing convolutional networks to efficiently approximate the behavior of fully connected networks. Such networks have an implicit capacity for internal attention; augmented with memory, they learn to read and write specific memory locations in a dynamic, data-dependent manner. We demonstrate these capabilities on synthetic exploration and mapping tasks, where our network is able to self-organize and retain long-term memory for trajectories of thousands of time steps, outperforming the DNC. On tasks without any notion of spatial geometry (sorting, associative recall, and question answering), our design functions as a truly generic memory and yields excellent results.
PAPER | CODE

Citation

To cite this work, please use:
@INPROCEEDINGS{HuynhICML2020,
  author    = {Tri Huynh and Michael Maire and Matthew R. Walter},
  title     = {Multigrid Neural Memory},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2020}
}

THE ARCHITECTURE

Multigrid Memory Architecture
Multigrid memory architecture. Top Left: A multigrid convolutional layer [Ke et al., 2017] transforms input pyramid X, containing activation tensors {x0, x1, x2}, into output pyramid Y via learned filter sets that act across the concatenated representations of neighboring spatial scales. Top Right: We design an analogous variant of the convolutional LSTM [Xingjian et al., 2015], in which X and Y are indexed by time and encapsulate LSTM internals, e.g. memory cells (c), hidden states (h), and outputs (o). Bottom: Connecting many such layers, both in sequence and across time, yields a multigrid mesh capable of routing the input into a much larger memory space, updating a distributed memory representation, and providing multiple read-out pathways (e.g. z0,t or z3,t).
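
For concreteness, the scale-mixing step of such a layer can be sketched in PyTorch as follows. This is only a minimal illustration under assumed names (MultigridConv) and resampling choices (average pooling to go down a scale, nearest-neighbor upsampling to go up), not the authors' reference implementation; the memory variant replaces each Conv2d with a convolutional LSTM cell whose internals (c, h) are carried across time steps.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultigridConv(nn.Module):
    """One multigrid layer: each output scale convolves over its own scale
    concatenated with resampled copies of its coarser and finer neighbors."""
    def __init__(self, in_channels, out_channels):
        # in_channels, out_channels: lists with one entry per pyramid level,
        # ordered from finest (index 0) to coarsest resolution
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(len(out_channels)):
            lo, hi = max(i - 1, 0), min(i + 1, len(in_channels) - 1)
            fan_in = sum(in_channels[lo:hi + 1])
            self.convs.append(nn.Conv2d(fan_in, out_channels[i], 3, padding=1))

    def forward(self, pyramid):
        # pyramid: list of tensors [x0, x1, x2, ...], each level half the
        # spatial size of the previous one
        outputs = []
        for i, conv in enumerate(self.convs):
            parts = [pyramid[i]]
            if i > 0:                      # finer neighbor: downsample by 2
                parts.append(F.avg_pool2d(pyramid[i - 1], 2))
            if i < len(pyramid) - 1:       # coarser neighbor: upsample by 2
                parts.append(F.interpolate(pyramid[i + 1], scale_factor=2.0))
            outputs.append(conv(torch.cat(parts, dim=1)))
        return outputs

Stacking several such layers in sequence, and carrying each level's ConvLSTM state across time steps, yields the multigrid memory mesh depicted above.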
Multigrid Memory Interfaces
Memory interfaces. Left: Multiple readers (red, orange) and a single writer (blue) simultaneously manipulate a multigrid memory. Readers are multigrid CNNs; each convolutional layer views the hidden state of the corresponding grid in memory by concatenating it as an additional input. Right: Distinct encoder (green) and decoder (purple) networks, each structured as a deep multigrid memory mesh, cooperate to perform a sequence-to-sequence task. We initialize the memory pyramid (LSTM internals) of each decoder layer by copying it from the corresponding encoder layer.
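
Building on the MultigridConv sketch above, the read interface and the encoder-decoder hand-off can be sketched in PyTorch as follows. MemoryReaderLayer, init_decoder_from_encoder, and the state layout are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MemoryReaderLayer(nn.Module):
    """One layer of a reader CNN: its input at every scale is augmented with
    the hidden state of the corresponding grid in memory."""
    def __init__(self, in_channels, mem_channels, out_channels):
        super().__init__()
        fused_channels = [c + m for c, m in zip(in_channels, mem_channels)]
        self.mix = MultigridConv(fused_channels, out_channels)  # sketched above

    def forward(self, pyramid, memory_hidden):
        # memory_hidden: hidden-state pyramid {h0, h1, ...} of one memory layer
        fused = [torch.cat([x, h], dim=1)
                 for x, h in zip(pyramid, memory_hidden)]
        return self.mix(fused)

def init_decoder_from_encoder(encoder_states):
    # encoder_states: per layer, a pyramid of (cell, hidden) LSTM tensors.
    # The decoder starts from a copy of the encoder's final distributed memory.
    return [[(c.clone(), h.clone()) for (c, h) in layer_pyramid]
            for layer_pyramid in encoder_states]

In a sequence-to-sequence task, the encoder mesh first consumes the input sequence; init_decoder_from_encoder then seeds each decoder layer's memory pyramid before the decoder produces its outputs.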

DEMO

Interpretation Instructions
Mapping & Localization in Random Walk
Joint Exploration, Mapping & Localization
Mapping & Localization in Spiral Motion, with 3x3 Queries
Mapping & Localization in Spiral Motion, with 3x3 and 9x9 Queries