https://github.com/ago109/predictive-forward-forward
ABSTRACT
We propose the predictive forward-forward (PFF) algorithm for conducting credit assignment in neural systems. Specifically, we design a novel, dynamic recurrent neural system that learns a directed generative circuit jointly and simultaneously with a representation circuit. Notably, the system integrates learnable lateral competition, noise injection, and elements of predictive coding, an emerging and viable neurobiological process theory of cortical function, with the forward-forward (FF) adaptation scheme. Furthermore, PFF efficiently learns to propagate learning signals and updates synapses with forward passes only, eliminating key structural and computational constraints imposed by backpropagation-based schemes. Besides computational advantages, the PFF process could prove useful for understanding the learning mechanisms behind biological neurons that use local signals despite missing feedback connections. We run experiments on image data and demonstrate that the PFF procedure works as well as backpropagation, offering a promising brain-inspired algorithm for classifying, reconstructing, and synthesizing data patterns.
Keywords
Brain-inspired computing · Self-supervised learning · Neuromorphic · Forward learning
1 Introduction
The algorithm known as backpropagation of errors [59, 32], or “backprop” for short, has long faced criticism concerning its neurobiological plausibility [10, 14, 56, 35, 15]. Despite powering the tremendous progress and success behind deep learning and its ever-growing myriad of promising applications [57, 12], it is improbable that backprop is a viable model of learning in the brain, such as in cortical regions. Notably, there are both practical and biophysical issues [15, 35], and, among these issues, there is a lack of evidence that:
1) neural activities are explicitly stored to be used later for synaptic adjustment,
2) error derivatives are backpropagated along a global feedback pathway to generate teaching signals,
3) the error signals move back along the same neural pathways used to forward propagate information, and,
4) inference and learning are locked to be largely sequential (instead of massively parallel). Furthermore, when processing temporal data, it is certainly not the case that the neural circuitry of the brain is unfolded backward through time to adjust synapses [42] (as in backprop through time).
Recently, there has been growing interest in the research domain of brain-inspired computing, which focuses on developing algorithms and computational models that attempt to circumvent or resolve critical issues such as those highlighted above. Among the most powerful and promising is predictive coding (PC) [18, 48, 13, 4, 51, 41], and among the most recent is the forward-forward (FF) algorithm [19]. These alternatives offer different means of conducting credit assignment with performance similar to backprop while being more likely consistent with how real biological neurons learn (see Figure 1 for a graphical depiction and comparison of the respective credit assignment setups). This paper proposes a novel model and learning process, the predictive forward-forward (PFF) process, which generalizes and combines FF and PC into a robust stochastic neural system that simultaneously learns a representation and a generative model in a biologically-plausible fashion. Like the FF algorithm, the PFF procedure offers a promising, potentially helpful model of biological neural circuits, a potential candidate system for low-power analog hardware and neuromorphic circuits, and a potential backprop-alternative worthy of future investigation and study.
2 Predictive Forward-Forward Learning
The brain-inspired credit assignment process that we will design and study is called the predictive forward-forward (PFF) algorithm, which is a generalization of the FF algorithm [19]. At a high level, the PFF process consists of two neural structures or circuits: a representation circuit (parameterized by Θr) that focuses on acquiring distributed representations of data samples, and a top-down generative circuit (parameterized by Θg) that focuses on learning how to synthesize data given the activity values of the representation circuit. Thus, the PFF process can be characterized as a complementary system that jointly learns a classifier and generative model. We will first define the notation used in this paper, then proceed to describe the inference and learning mechanics of the representation and generative circuits.
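To make this two-circuit structure concrete, below is a minimal NumPy sketch of how the representation circuit (Θr) and the top-down generative circuit (Θg) could be laid out. The layer sizes, logistic activation, and weight initialization are illustrative assumptions, and the sketch omits the lateral competition, noise injection, and normalization used in the full PFF system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for a small PFF system on flattened image vectors.
sizes_r = [784, 512, 512]          # representation circuit (Theta_r), bottom-up
sizes_g = list(reversed(sizes_r))  # generative circuit (Theta_g), top-down

Theta_r = [rng.normal(0.0, 0.05, (m, n)) for n, m in zip(sizes_r[:-1], sizes_r[1:])]
Theta_g = [rng.normal(0.0, 0.05, (m, n)) for n, m in zip(sizes_g[:-1], sizes_g[1:])]

def phi(v):
    # Logistic activation (an illustrative assumption).
    return 1.0 / (1.0 + np.exp(-v))

def represent(x, Theta):
    """Bottom-up pass of the representation circuit; returns all layer activities."""
    zs = [x]
    for W in Theta:
        zs.append(phi(W @ zs[-1]))
    return zs

def generate(z_top, Theta):
    """Top-down pass of the generative circuit; the last entry reconstructs the input."""
    preds = [z_top]
    for W in Theta:
        preds.append(phi(W @ preds[-1]))
    return preds
```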
Relationship to Local Learning:
It has long been argued that the synapses in the brain are likely to be adjusted according to a local scheme, i.e., only information closest spatially and in time to a target synapse is involved in computing its change in efficacy. Methods that adhere to this biological constraint are referred to as local learning procedures [45, 30, 37, 27, 38, 5, 11, 26, 24], offering a potential replacement for backprop for training deep networks, relaxing one or more of its core constraints (see Figure 3 for a comparative examination of the key ones across algorithms). Desirably, it has even been shown that, empirically, updates from a local scheme can result in improved generalization [30, 45], even with temporal data [33] and discrete signals [34]. There have been many efforts in designing biologically-plausible local learning algorithms, such as contrastive Hebbian learning (mentioned above) [36], contrastive divergence for learning harmoniums (restricted Boltzmann machines) [20], the wake-sleep algorithm for learning Helmholtz machines [21], and algorithms such as equilibrium propagation [54]. Other efforts directly integrate local learning into the deep learning pipeline, e.g., kickback [2] and decoupled neural interfaces [23]. It is worth pointing out that PFF bears similarity to the wake-sleep algorithm, which entails learning a generative model jointly with an inference (recognition) model. However, the wake-sleep algorithm suffers from instability, given that the recognition network could be damaged by random fantasies produced by the generative network and the generative network could itself be hampered by the low-quality representation capability of the inference network (motivating variations such as reweighted wake-sleep [7]). PFF instead aims to learn the generative model given the representation circuit, using locally-adapted neural activities as a guide for the synthesization process rather than randomly sampling the generative model to create teaching signals for the recognition network (which would potentially distract its optimization with nonsensical signals).
Relationship to Contrastive Hebbian Learning:
When designing a network (as we do above), one might notice that the inference process is quite similar to that of a neural system learned under contrastive Hebbian learning (CHL) [36], although there are several significant differences. Layer-wise activities in a CHL-based neural system are updated in accordance with the following set of dynamics:

$$\mathbf{z}^\ell(t) = \mathbf{z}^\ell(t-1) + \beta\big(-\mathbf{z}^\ell(t-1) + \mathbf{m}^\ell\big) \quad (16)$$
$$\mathbf{m}^\ell = \phi^\ell\Big(\mathbf{W}^\ell \cdot \mathbf{z}^{\ell-1}(t-1) + (\mathbf{W}^{\ell+1})^T \cdot \mathbf{z}^{\ell+1}(t-1)\Big) \quad (17)$$

where we notice that the dynamics do not involve normalization and the values for layer ℓ are integrated a bit differently than in Equation 4, i.e., neural values change via leaky Euler integration, where the top-down and bottom-up transmissions are combined to produce a perturbation to the layer's state rather than to propose a new value of the state itself.
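For concreteness, the leaky Euler state update of Equations 16-17 can be written as the following short Python sketch; the tanh activation and the step size β = 0.1 are placeholder choices, not values prescribed by CHL.

```python
import numpy as np

def chl_state_update(z_prev, z_below, z_above, W_l, W_lp1, beta=0.1, phi=np.tanh):
    """One leaky Euler step of the CHL layer dynamics (Equations 16-17).

    z_prev  : activity z^l(t-1) of layer l
    z_below : activity z^(l-1)(t-1) of the layer below
    z_above : activity z^(l+1)(t-1) of the layer above
    W_l     : bottom-up synapses into layer l
    W_lp1   : synapses from layer l to layer l+1 (used transposed for feedback)
    """
    m_l = phi(W_l @ z_below + W_lp1.T @ z_above)  # Equation 17
    return z_prev + beta * (-z_prev + m_l)        # Equation 16
```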
Like CHL, FF and PFF require two phases (or modes of computation) in which the signals propagated through the neural system are contrasted with one another. Given a sample (x, y), CHL entails running the system first in an un-clamped phase (negative phase), where only the input image x is clamped to the sensory input/bottom layer, followed by a clamped phase, where both x and its target y are clamped, i.e., y is clamped to the output layer (positive phase). At the end of each phase (or inference cycle), the layer-wise activities are recorded and used in a subtractive Hebbian rule to calculate the updates for each synaptic matrix. Note that the positive phase of CHL depends on first running the negative phase. FF and PFF, in contrast, amount to running the positive and negative phases in parallel (with each phase driven by different data), resulting in overall faster processing: instead of one inference cycle being conditioned on the statistics of another, the same cycle is run on both positive and negative data, with opposite objectives [19], at the same time.
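Below is a minimal sketch of this parallelism for a single layer, assuming ReLU activities and the squared-activity goodness with a logistic objective described in [19]; the threshold θ and learning rate are placeholders. The point of the sketch is that the positive and negative updates use purely local quantities and can be computed in the same forward pass, rather than that this is the exact PFF rule.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def ff_layer_update(W, h_pos_in, h_neg_in, theta=2.0, lr=0.01):
    """Local update of one layer, run on positive and negative data in the same pass.

    Goodness is the mean squared activity of the layer; the layer is nudged so that
    goodness exceeds theta for positive data and falls below theta for negative data.
    """
    z_pos = np.maximum(0.0, W @ h_pos_in)  # ReLU activities (an assumption)
    z_neg = np.maximum(0.0, W @ h_neg_in)
    g_pos = np.mean(z_pos ** 2)
    g_neg = np.mean(z_neg ** 2)
    p_pos = sigmoid(g_pos - theta)  # probability the sample is "positive"
    p_neg = sigmoid(g_neg - theta)
    # Gradient of the layer-local logistic objective; only this layer's own
    # pre-/post-synaptic activities are used (no cross-layer backpropagation).
    dW = ((1.0 - p_pos) * np.outer(z_pos, h_pos_in)
          - p_neg * np.outer(z_neg, h_neg_in)) * (2.0 / z_pos.shape[0])
    return W + lr * dW, z_pos, z_neg
```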
Relationship to Predictive Coding:
The PFF algorithm integrates the idea of local hypothesis generation from predictive coding (PC) into the inference process by leveraging the representations acquired within the recurrent representation circuit's iterative processing window. Specifically, each layer of the representation circuit, at each time step, becomes the prediction target for the corresponding layer of the generative circuit. In contrast, PC models must leverage a set of feedback synapses to progressively modify their layer-wise activities before finally adjusting synaptic values. Furthermore, PFF dynamically modifies synapses within each processing time step, whereas PC circuits typically implement a form of expectation-maximization that, as a result, requires longer stimulus processing windows to learn effective generative models [41] (in this work, the PFF generative circuit learns a good-quality generative model in only 8-10 steps, whereas the models of [41] required at least 50 steps).
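The sketch below illustrates, under simplifying assumptions (fully-connected generative layers, a tanh activation, and a plain delta-rule update), how each generative layer could be adjusted within a single processing step from the local error between its top-down prediction and the corresponding representation-circuit activity; the exact PFF update rule and any noise or normalization terms are abstracted away.

```python
import numpy as np

def pff_generative_step(zs, Theta_g, lr=0.01, phi=np.tanh):
    """One within-step adjustment of the generative circuit (simplified sketch).

    zs      : current layer-wise activities of the representation circuit, ordered
              top-down (zs[0] is the top-most layer, zs[-1] is the input layer)
    Theta_g : list of top-down generative weight matrices, one per predicted layer
    """
    new_Theta = []
    for k, G in enumerate(Theta_g):
        pred = phi(G @ zs[k])   # top-down prediction of the activity of layer k+1
        err = zs[k + 1] - pred  # local, layer-wise prediction error
        # Delta-rule update; (1 - pred**2) is tanh's derivative (an assumption).
        new_Theta.append(G + lr * np.outer(err * (1.0 - pred ** 2), zs[k]))
    return new_Theta
```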
Computational Benefits of the Predictive Forward-Forward Algorithm:
From a hardware efficiency point-of-view, the PFF algorithm, much like the FF procedure, is a potentially promising candidate for implementation in analog and neuromorphic hardware. It is the fact that FF and PFF only require forward passes to conduct inference and synaptic adjustment that creates this possibility, given that these algorithms require no distinct, separate computational pathway(s) for transmitting teaching signals (required by backprop [49] and feedback alignment methods [31, 37]) or even error messages (required by predictive coding [48, 4, 41], representation alignment [45], and target propagation processes [30]). Desirably, this means that no specialized hardware is needed for calculating derivatives (which would otherwise be required for the activation functions; this also means that implementations of PFF with discrete and non-differentiable stochastic functions, e.g., sampling from the Bernoulli distribution, are possible/viable), nor for maintaining and adjusting in memory a separate set of feedback transmission synapses (which often require different adjustment rules [45, 42]).
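As a toy illustration of why no derivative machinery is required, the sketch below pairs a stochastic binary (Bernoulli-sampled) layer with a simple Hebbian-style local update that only touches the layer's own pre- and post-synaptic activities. The specific rule shown is a simplification for illustration, not the actual PFF/FF goodness update.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_layer(W, h_in):
    """Stochastic binary layer: sample spikes from a Bernoulli distribution."""
    p = 1.0 / (1.0 + np.exp(-(W @ h_in)))  # firing probabilities
    return (rng.random(p.shape) < p).astype(np.float64), p

def local_update(W, h_in, z_out, target_sign, lr=0.01):
    """Hebbian-style local update using only the layer's own pre/post activities.

    target_sign is +1 for positive-phase data and -1 for negative-phase data; no
    derivative is ever taken through the Bernoulli sampler.
    """
    return W + lr * target_sign * np.outer(z_out, h_in)
```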
Intertwined Inference-and-Learning and Mortal Computation:
Another important point is that the PFF algorithm, and similar future algorithms, provide a computational framework for what [19] refers to as "mortal computation", or systems in which software and hardware are no longer cleanly separated. In a sense, the notion of mortal computation is in the same spirit as intertwined inference-and-learning [47, 45, 42, 41], which refers to the idea that neurobiological learning and inference in the brain are not two entirely distinct and independent processes but interdependent, complementary ones that even depend on the structure of the neural circuitry that carries them out. Intertwined inference and learning, with predictive coding/processing and contrastive Hebbian learning as its key exemplars, means that without the inference process there is no synaptic adjustment, and without synaptic adjustment inference becomes meaningless (since knowledge would never be encoded into any form of long-term memory). Given that the knowledge encoded in analog synaptic weights/values (and the computation they ultimately conduct/support) "dies" or vanishes when the hardware fails/dies, mortal computation can be viewed as the next logical extension and consequence of intertwined inference and learning. Embracing the notion of mortal computation, along with the consequence that a simulated computational program/neural computation can no longer be trivially copied across millions of computers/devices (to drive high-performance computing systems), could yield substantial savings in energy expenditure, which has increasingly become a concern in artificial intelligence (AI) research [1, 8, 58, 55], i.e., how do we avoid designing intelligent systems that rapidly inflate compute and carbon costs, or "red AI" [55], in favor of "green AI" approaches. Ultimately, this means that carefully designing intertwined learning-and-inference methods that account for how they will be realized in a target physical structure, e.g., the form and design of the hardware, could prove invaluable for moving statistical learning research forward, offering a way to run large-scale neural systems (containing trillions of synapses) while consuming only a few watts of energy. Notably, this could lead to the design of neural systems that can sense and adapt to the state of the hardware they run on, emulating the homeostatic constraints that define biological learning and inference. Nevertheless, much work remains in designing algorithms that run effectively on hardware whose precise details are largely unknown, and studying how processes such as PFF and FF scale to larger neural systems is likely an important step toward doing so.
On Self-Generated Negative Samples:
One of the most important elements of PFF is its integrated, jointly-learned predictive/generative neural circuit. This generative model, as we have shown in the main paper, is capable not only of high-quality reconstruction of the original input patterns but also of synthesizing data by sampling its latent prior P(zs), which we designed to be a Gaussian mixture model due to the highly multi-modal latent space induced by its top-most neural activities. An important future direction to explore with FF-based biological credit assignment algorithms is the examination of different schedules for the positive and negative phases, in contrast to the simultaneous ones used in [19] and this work.
Specifically, an important question to answer is:
To what degree can the positive and negative phases be separated while still facilitating stable and effective local adaptation of a neural circuit’s synapses?
Our own and other future efforts should explore this question in depth, leveraging PFF's jointly generative/predictive circuit. One starting point could entail extending the generative circuit to be conditioned on a label vector y and running it at certain points in training (e.g., after a pass through M data samples or batches) to synthesize several batches of patterns that are then paired with deliberately incorrect labels (i.e., choose y to synthesize a sample of a particular class and then purposely select an incorrect label, given knowledge of the originally chosen one), as sketched below.
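A hedged sketch of this proposed scheme follows; sample_gmm_prior, generate_fn, and the label-corruption step are hypothetical placeholders for the label-conditioned generative circuit described above, not components of the current PFF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm_prior(means, covs, weights, n):
    """Draw top-layer latent vectors from a Gaussian mixture prior P(zs)."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comps])

def make_self_generated_negatives(z_batch, labels_true, num_classes, generate_fn):
    """Synthesize patterns with a (label-conditioned) generative circuit and pair
    them with deliberately incorrect labels, yielding negative samples for training.

    labels_true : integer array of the originally chosen class labels
    generate_fn : hypothetical callable mapping a top-layer latent and a class label
                  to a synthesized input pattern
    """
    x_fake = np.stack([generate_fn(z, y) for z, y in zip(z_batch, labels_true)])
    # Pick a wrong label for each sample, knowing the originally chosen class.
    offsets = rng.integers(1, num_classes, size=len(labels_true))
    y_wrong = (labels_true + offsets) % num_classes
    return x_fake, y_wrong
```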