1. What is the novel Spike-driven Transformer proposed in the research and how does it differ from existing spiking Transformers?
The Spike-driven Transformer proposed in the research applies the spike-driven paradigm throughout the network while maintaining strong task performance. It differs from existing spiking Transformers by redesigning the two core Transformer modules, Vanilla Self-Attention (VSA) and the Multi-Layer Perceptron (MLP), so that both are spike-driven. In VSA, the three input matrices, Query (Q), Key (K), and Value (V), pass through three steps: matrix multiplication, scaling, and softmax. The proposed Spike-driven Self-Attention (SDSA) instead uses a Hadamard product, a column-wise matrix summation, and a spiking neuron layer, which the authors report leaves the attention step with almost no energy consumption. In addition, the residual connections throughout the architecture are modified so that neurons communicate via binary spikes, making the design hardware-friendly for neuromorphic chips. The proposed architecture outperforms or matches State-Of-The-Art (SOTA) SNNs on both static and neuromorphic datasets, achieving 77.1% accuracy on ImageNet-1K.
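The SDSA steps described above (Hadamard product, column-wise summation, spiking neuron layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sdsa_sketch`, the `(tokens, channels)` layout, the `threshold` parameter, and the use of a plain threshold as a stand-in for the spiking neuron layer are all assumptions made for illustration.

```python
import numpy as np

def sdsa_sketch(q, k, v, threshold=1.0):
    """Illustrative SDSA-style attention on binary spike matrices.

    q, k, v: arrays of 0s and 1s with shape (N tokens, D channels).
    threshold: hypothetical firing threshold for the stand-in neuron layer.
    """
    q, k, v = (np.asarray(a, dtype=float) for a in (q, k, v))
    qk = q * k                       # Hadamard product; for binary inputs this acts as a mask
    col_sum = qk.sum(axis=0)         # column-wise summation over tokens, shape (D,)
    gate = (col_sum >= threshold).astype(float)  # assumed stand-in for the spiking neuron layer
    return v * gate                  # binary gate masks whole channels of V

q = [[1, 0, 1], [1, 0, 0]]
k = [[1, 1, 0], [1, 0, 1]]
v = [[1, 1, 1], [0, 1, 1]]
out = sdsa_sketch(q, k, v)
print(out)
```

Because every operand is binary, the elementwise products reduce to masking and the only arithmetic left is the column sum, which is consistent with the low energy cost claimed for SDSA.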
2. How can SNNs incorporate brain mechanisms?
SNNs can incorporate brain mechanisms by drawing on biology to inspire neuron modeling, learning rules, and other aspects of the network. Existing studies indicate that SNNs are well suited to incorporating brain mechanisms such as long short-term memory and attention. By adopting deep learning techniques such as network architectures, gradient backpropagation, and normalization, SNNs have greatly improved their task accuracy while keeping their spike-driven benefits. The goal is to combine the SNN and Transformer architectures, using methods such as neuron equivalence and surrogate gradient training to improve performance and efficiency.
3. What is the spiking neuron model used in Spike-driven Transformer?
The spiking neuron model used in Spike-driven Transformer is the Leaky Integrate-and-Fire (LIF) spiking neuron. It is a simplification of the biological neuron model that retains biological neuronal dynamics while remaining easy to simulate on a computer. The dynamics of the LIF layer are governed by a set of equations that describe the membrane potential and the firing of spikes. At each timestep, a Heaviside step function compares the membrane potential with a threshold: when the potential exceeds the threshold, the neuron fires a spike, so the layer's output is a binary tensor. After a spike is fired, the membrane potential is set back to a reset potential. The LIF neuron is a key component of the Spike-driven Transformer because its binary outputs allow the architecture to rely on efficient, sparse addition.
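The LIF behaviour described above (integrate, compare against a threshold with a Heaviside step, reset after a spike) can be sketched in a few lines of NumPy. The summary does not give the equations, so this uses one common discrete-time LIF formulation as an assumption; the names `lif_forward`, `u_th`, `v_reset`, and `beta`, and their default values, are illustrative rather than taken from the source.

```python
import numpy as np

def lif_forward(x, u_th=1.0, v_reset=0.0, beta=0.5):
    """Simulate an LIF layer over time (assumed discrete-time formulation).

    x: input current with shape (T, ...), where T is the number of timesteps.
    u_th: firing threshold; v_reset: reset potential; beta: decay factor.
    Returns a binary spike tensor with the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros_like(x[0])               # membrane potential carried between timesteps
    spikes = np.zeros_like(x)
    for t in range(x.shape[0]):
        u = h + x[t]                      # integrate the input
        s = (u >= u_th).astype(float)     # Heaviside step: spike if threshold is reached
        h = v_reset * s + beta * u * (1.0 - s)  # reset where spiked, otherwise leak
        spikes[t] = s
    return spikes

# A constant sub-threshold input accumulates until the neuron fires, then resets.
spikes = lif_forward(np.full((4, 1), 0.6))
print(spikes.ravel())
```

With these assumed parameters the neuron stays silent for two timesteps, fires on the third once the accumulated potential crosses the threshold, and is silent again on the fourth after the reset.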
4. What is the purpose of Relative Position Embedding (RPE) in Spike-driven Transformer?
Relative Position Embedding (RPE) is used in the Spike-driven Transformer to produce a tensor that encodes the relative positions of spike patches. It is generated by an additional Conv layer in the Spiking Patch Splitting (SPS) part of the architecture. RPE is added to the membrane potential tensor u output by SPS, giving the final membrane potential tensor U_0. U_0 therefore carries information about the relative positions of the spike patches, which is important for modeling the local-global information of images. In this way, RPE supplies spatial context to the spike-driven encoder, helping the model capture spatial relationships between spike patches and represent complex image features.