I just finished reading this paper. It is very relevant to the work I'm doing now on grid cells, as I just reimplemented "Vector-based navigation using grid-like representations in artificial agents" in PyTorch. I got results similar to those in the paper: only under specific circumstances, driven by particular hyperparameter choices, could I get grid cells to appear in the network.
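For concreteness, here is a minimal sketch of the kind of model involved: a recurrent network receives velocity inputs, and its hidden state is linearly read out and trained to match some target encoding of position (here a place-cell-like Gaussian bump code). Every layer size and hyperparameter below is an illustrative placeholder of mine, not the exact configuration from either paper.

```python
# Illustrative PyTorch sketch of a path-integration network: velocity in,
# recurrent state, linear readout trained against a place-cell-like target.
# Sizes and hyperparameters are placeholders, not the papers' values.
import torch
import torch.nn as nn

class PathIntegrator(nn.Module):
    def __init__(self, n_hidden=256, n_place_cells=512, dropout=0.5):
        super().__init__()
        self.rnn = nn.RNN(input_size=2, hidden_size=n_hidden, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.readout = nn.Linear(n_hidden, n_place_cells)

    def forward(self, velocities, h0=None):
        # velocities: (batch, time, 2) Cartesian velocity at each step
        hidden, _ = self.rnn(velocities, h0)          # (batch, time, n_hidden)
        return self.readout(self.dropout(hidden))     # predicted target code

def place_cell_targets(positions, centers, sigma=0.1):
    # Gaussian-bump encoding of 2D position; one common choice of target.
    # positions: (batch, time, 2), centers: (n_place_cells, 2)
    d2 = ((positions[..., None, :] - centers) ** 2).sum(-1)
    return torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)

model = PathIntegrator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # a cross-entropy over place cells is another common choice
```

Whether grid-like units show up in the hidden layer is exactly the kind of outcome that turns out to hinge on choices like the target encoding, the activation function, and the dropout rate.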
Their main claim in the paper is that “Unique to Neuroscience, deep learning models can be used not only as a tool but interpreted as models of the brain. The central claims of recent deep learning-based models of brain circuits are that they make novel predictions about neural phenomena or shed light on the fundamental functions being optimized…. It is highly improbable that Deep Learning models of path integration would have produced grid cells as a novel prediction simply from task-training, had grid cells not already been known to exist…. These results challenge the notion that deep networks offer a free lunch for Neuroscience in terms of discovering the brain’s optimization problems or generating novel a priori predictions about single-neuron representations, and warn that caution is needed when building and interpreting such models”.
I would agree, and in fact think it's very dangerous to read too much into deep learning models for inspiration about what the brain is optimizing for. ANNs are already very loose cartoon caricatures of brains: they are implemented as simple point neurons passing scalar activations. Real neurons have top-down and bottom-up communication, communicate via chemical and electrical signals and multiple brain-wave oscillations, and are massively parallel, among other characteristics we do not emulate in deep learning. I do strongly believe that deep learning can learn a lot from neuroscience, though. If we are to build more intelligent machines, the brains of biological creatures seem to be the best place to study.
In the paper, they demonstrate that:
- almost all ANNs trained on path integration (PI) learn to optimally encode position, but almost never learn grid cell firing codes.
- grid periods and period ratios depend on hyperparameters, not on the task itself!
- the appearance of grid cells depends WHOLLY on the encoding of the supervised target, not on the task itself.
- the solutions are unstable: producing grid cells requires a specific encoding and particular hyperparameters, and small alterations result in the loss of those grid cells.
They trained 415 different ANNs across various ranges of network architectures, activation functions, optimizers, losses, supervised targets, and other parameters (dropout, regularization, etc.) and found that almost all (88%) did PI well, while only 12.7% learned grid-cell-like firing patterns.
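As an aside, "grid-cell-like" is typically quantified with a gridness score computed from the spatial autocorrelogram of each unit's rate map: a hexagonal firing lattice correlates with itself under 60° and 120° rotations but not under 30°, 90°, or 150°. Below is a simplified version of that score (the standard version restricts the comparison to an annulus around the central peak); the paper's exact criterion and threshold may differ.

```python
# Simplified gridness score: correlate the spatial autocorrelogram with
# rotated copies of itself; a hexagonal lattice scores high.
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def gridness_score(rate_map):
    # rate_map: 2D array of a unit's mean activation over binned positions
    rm = rate_map - rate_map.mean()
    sac = correlate2d(rm, rm, mode="full")  # spatial autocorrelogram

    def corr_at(angle):
        rot = rotate(sac, angle, reshape=False)
        return np.corrcoef(sac.ravel(), rot.ravel())[0, 1]

    on_peak = min(corr_at(60), corr_at(120))                  # high for a grid
    off_peak = max(corr_at(30), corr_at(90), corr_at(150))    # low for a grid
    return on_peak - off_peak  # > 0 suggests hexagonal symmetry
```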
Interestingly, they found that grid cells only appeared when the supervised target was encoded with Difference-of-Gaussians (DoG) tuning curves:
“Only making the readout encoding a DoG-shaped tuning curve sometimes resulted in lattices.”
I had to look that up. DoG is just an image-processing operation: you subtract one Gaussian-blurred version of an image from another, less-blurred version of the same image, with both blurs done by convolution. The result is a center-surround ("Mexican hat") profile, and the same shape can be used as a spatial tuning curve for the readout cells.
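As a quick illustration (with arbitrary widths of my own choosing), the same subtraction can be written either as an image filter or as a center-surround tuning curve over positions; the latter is how it serves as a readout target in this setting.

```python
# Difference-of-Gaussians two ways: as an image filter and as a center-surround
# spatial tuning curve for a readout cell. Widths below are arbitrary examples.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_filter(image, sigma_narrow=1.0, sigma_wide=2.0):
    # Image-processing DoG: subtract a more-blurred copy from a less-blurred one.
    return gaussian_filter(image, sigma_narrow) - gaussian_filter(image, sigma_wide)

def dog_tuning(positions, center, sigma_narrow=0.1, sigma_wide=0.2):
    # Center-surround tuning: excitatory bump minus a wider inhibitory bump.
    # positions: (n, 2) array of 2D locations, center: (2,) preferred location
    d2 = ((positions - center) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma_narrow ** 2)) - np.exp(-d2 / (2 * sigma_wide ** 2))
```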
They also found that grid modules with different periods usually did not appear; instead, the network typically learned only a single module of grid cells.
They suggest that dropout acts as a form of error correction, which is also proposed as part of the functional role of grid cells in many neuroscience papers. To that end, I would expect ANN models with dropout to do better, but I did not see any specific notes about that in the paper.
When I think about grid cells, they seem to fire for many non-PI tasks, and many believe that grid cells are a core component of a cognitive map. I am interested in finding out what the special computational properties of grid cells are, and what their interaction with place cells provides, that might be useful for building new types of ANNs. Most neuroscience-inspired AI ideas only take bits and pieces from neuroscience and exploit a specific computational property (CNNs, attention mechanisms, etc.) to make them useful for ANNs.
If we look at the results from the analysis of 3D grid cell experiments, we would expect not to see any grid patterns at all, so that is something else to consider.
I do think there is something useful we can use from grid cells, but it requires further investigation.