by Randall C. O’Reilly & Michael J. Frank
Neural Computation, 2006
This paper presents a biologically plausible model of working memory. The motivation is twofold: first, to understand how working memory functions in the human (or monkey) brain, and second, to create a neural net capable of solving problems that require short-term storage of information, such as memorizing a sequence of letters and then repeating it. Traditional recurrent neural nets do poorly on this kind of problem because they lack a stable mechanism for storing information across arbitrary time delays.
The model essentially consists of two parts: a memory bank in the Prefrontal Cortex, and a gating mechanism, controlled by the Basal Ganglia with learning signals from midbrain dopamine structures, that determines when the memories can be written to. The gating system is modeled as an actor/critic architecture: the critic evaluates which stimuli are task relevant, and the actor uses this knowledge to open and close memory gates.
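The separation between a maintained memory bank and a gate that decides when it may be overwritten can be illustrated with a toy sketch. Everything here is hypothetical: the class name, the two-slot layout, and the hand-coded gating rule stand in for what the paper's actor learns through reinforcement. The sketch shows only why gating lets a stored item survive any number of later, irrelevant inputs.

```python
from typing import List, Optional


class GatedMemory:
    """Toy PFC-like memory bank: each slot is written only when its gate is open."""

    def __init__(self, n_slots: int):
        self.slots: List[Optional[str]] = [None] * n_slots

    def step(self, stimulus: str, gates: List[bool]) -> List[Optional[str]]:
        # An open gate overwrites its slot with the current stimulus;
        # a closed gate keeps maintaining whatever the slot already holds.
        for i, is_open in enumerate(gates):
            if is_open:
                self.slots[i] = stimulus
        return list(self.slots)


def digit_gate_policy(stimulus: str) -> List[bool]:
    # Hypothetical hand-coded "actor": slot 0 opens only for digits,
    # slot 1 opens for every letter.
    is_digit = stimulus.isdigit()
    return [is_digit, not is_digit]


mem = GatedMemory(n_slots=2)
for s in ["1", "A", "X", "C", "Z", "B"]:
    state = mem.step(s, digit_gate_policy(s))
print(state)  # ['1', 'B']
```

Because slot 0's gate stays closed for letters, the digit is still available after arbitrarily many intervening symbols, whereas slot 1 always holds only the most recent letter.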
The paper describes a reinforcement learning algorithm that can learn to solve various working memory tasks given this architecture. The algorithm learns which stimuli are task-relevant for which memories, and how to combine the stimuli and memories to produce the desired output. To do so, it must solve both the temporal and the structural credit assignment problems. The former refers to figuring out when the task-relevant information was presented, and the latter to figuring out which aspect of that information is relevant.
To solve the temporal credit assignment problem, the paper introduces the Primary Value Learned Value (PVLV) learning algorithm, which is closely related to Temporal Difference (TD) learning. The goal is to associate an event at one point in time with a reward that occurs some time later. The algorithm learns two kinds of associations. The Primary Value (PV) system learns to expect primary rewards at the time they are delivered. The Learned Value (LV) system learns, when primary rewards occur, an association between the reward and stimuli that are still active (for example, because they are being maintained in working memory), so that those stimuli come to signal the expected reward as soon as they appear. Unlike TD learning, the reward signal is never propagated backward through each intervening time step, so the paper argues that PVLV may work better when the intervening events are chaotic and unpredictable. PVLV also maps well onto known neurobiology, which may be less true of TD.
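The difference in how reward information reaches an earlier stimulus can be illustrated with a deliberately simplified comparison. This is not the paper's PVLV formulation: tabular TD(0) on a fixed chain of intervening states is set against a single delta-rule update, applied only at reward time, between the reward and a stimulus trace that is assumed to be maintained unchanged across the delay (a rough stand-in for LV-style learning). Function names and parameters are hypothetical.

```python
def td0_episodes_to_value_cs(n_delay: int, alpha: float = 1.0, gamma: float = 1.0,
                             max_episodes: int = 10_000) -> int:
    """Tabular TD(0) on a fixed chain: state 0 is the stimulus, states 1..n_delay are
    intervening steps, and a reward of 1 follows the last state. Returns the number of
    episodes until the stimulus state first acquires positive value (-1 if never)."""
    values = [0.0] * (n_delay + 1)
    for episode in range(1, max_episodes + 1):
        for t in range(n_delay + 1):
            last = t == n_delay
            reward = 1.0 if last else 0.0
            next_value = 0.0 if last else values[t + 1]
            # TD error: value can only move one state further back per episode.
            values[t] += alpha * (reward + gamma * next_value - values[t])
        if values[0] > 0.0:
            return episode
    return -1


def direct_episodes_to_value_cs(n_delay: int, alpha: float = 1.0,
                                max_episodes: int = 10_000) -> int:
    """Delta rule between a maintained stimulus trace and the reward, applied only at
    reward time. n_delay is accepted for symmetry: because the trace is assumed to be
    held unchanged across the delay, the delay length never enters the update."""
    weight = 0.0
    for episode in range(1, max_episodes + 1):
        cs_trace = 1.0  # stimulus stored at onset and still active when reward arrives
        reward = 1.0
        weight += alpha * (reward - weight * cs_trace) * cs_trace
        if weight > 0.0:
            return episode
    return -1


for n in (1, 5, 20):
    print(n, td0_episodes_to_value_cs(n), direct_episodes_to_value_cs(n))
# 1 2 1
# 5 6 1
# 20 21 1
```

In this deterministic chain TD(0) does eventually succeed, but it needs one additional episode per intervening step. When the intervening states change from trial to trial, there is no stable chain to propagate value through, which is the setting in which the paper argues PVLV has the advantage.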
The model is tested in several experiments in which it learns to solve various working memory tasks, each of which requires maintaining task state across arbitrary time delays. For example, in the task termed 1-2-AX, the model sees a sequence of numbers and letters and must respond differently to the letters depending on whether the most recent number was a “1” or a “2”. These experiments demonstrate that adaptive gating is critical to the tasks: models with such mechanisms (the proposed model as well as LSTM) outperformed recurrent neural nets without them. The experiments could be improved by adding stronger generalization tests. Two of the three experiments report only training-set performance (epochs to reach a performance criterion on the training set); only the last experiment includes a generalization analysis on a test set. It would be useful to see the same kind of analysis for all the experiments.
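To make the maintenance demands of 1-2-AX concrete, the sketch below labels each symbol of a sequence as target or non-target. It assumes the standard 1-2-AX rules: after a “1”, the target is an X immediately preceded by an A; after a “2”, it is a Y immediately preceded by a B. The function name is hypothetical and this is not the paper's task generator. The rule needs two kinds of memory: the digit must be held across an arbitrary number of letters, while the preceding letter is needed for only one step and must then be replaced.

```python
from typing import List


def one_two_ax_targets(sequence: List[str]) -> List[bool]:
    """Label each symbol of a 1-2-AX sequence as target (True) or non-target (False)."""
    context = None   # most recent digit: must be held across many intervening letters
    previous = None  # immediately preceding symbol: needed for only one step
    labels = []
    for symbol in sequence:
        if symbol in ("1", "2"):
            context = symbol
            is_target = False
        else:
            is_target = (
                (context == "1" and previous == "A" and symbol == "X")
                or (context == "2" and previous == "B" and symbol == "Y")
            )
        labels.append(is_target)
        previous = symbol
    return labels


print(one_two_ax_targets(list("1AXBY2AXBY")))
# [False, False, True, False, False, False, False, False, False, True]
```

The same A-X pair is a target in the first half of the example and a non-target in the second, purely because of a digit that may lie many steps in the past.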
While the experiments convincingly show the power of adaptive gating, it is not clear whether the proposed method has an advantage over other gating methods such as LSTM. Both methods perform similarly on all the tasks evaluated, with a slight advantage to the proposed method. Given the current popularity of LSTM, it would be interesting to investigate in greater detail how the algorithm proposed in this paper compares. In particular, it would be valuable to see experiments on harder or more practical tasks, such as language translation. The network implemented in the paper is very small, and it remains unclear whether it would scale to the size required for such practical tasks.
The paper argues that, whether or not the method outperforms other algorithms, it is worth studying because it is more biologically plausible than the alternatives. This point deserves further analysis: what is the advantage of biological plausibility? While this question cannot be answered within the scope of a single paper, it will be important for future work to (a) use the model to make novel discoveries about biology and (b) use biological inspiration to achieve computational results that beat the alternative approaches.