Structured Network Pruning by Measuring Filter-wise Interactions

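Questions

1. What is the proposed redundancy criterion in SNPFI?

2. How can redundancy be effectively identified under high pruning intensity?

3. How can the contributions of filter-wise interactions be quantified during redundancy evaluation?

4. How can the filter-wise interaction based redundancy criterion be integrated into an RL algorithm for layer-wise pruning?

5. How does the interaction difference based fine-tuning approach address the generalization gap in pruned models?

6. How does filter utilization strength affect accuracy?

7. How does SNPFI compare to other pruning methods on CIFAR-10?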

Accepted Answer

The proposed redundancy criterion in SNPFI is based on filter-wise interaction. It utilizes filter importance and filter utilization strength to determine the decision ability of individual and multiple filters. According to this criterion, the interaction difference abstracts the potential generalization gap caused by pruning and guides the weight recovery of the pruned model. This redundancy criterion has been theoretically and experimentally proven to be effective in optimizing the compression plan and ensuring the pruned model retains meaningful interaction behaviors inherent in the original model.

Accepted Answer

Identifying redundancy effectively and efficiently under high pruning intensity remains an open problem. SNPFI addresses it with a new redundancy criterion based on filter-wise interaction. The criterion considers the interaction between filters when deciding what is redundant, which helps identify redundant filters more accurately. Pruning then decouples the useless structures from the CNN according to this criterion. Because filter-wise interaction is taken into account, the criterion can identify redundancy even under high pruning intensity, leading to better pruning results and improved network performance.

Accepted Answer

To fairly quantify the contributions of filter-wise interactions during redundancy evaluation, we regard each inference process achieved by $m$ filters in the $l$-th layer as a collaborative game $\langle M_l, V \rangle$ [15]. During inference on an image $I$, each filter is a player, and the $m$ players form the coalition $M_l$ with contribution $V(M_l)$, where $m = |M_l| \in [1, c_l^{out}]$ and $V(M_l) = \log \frac{P(y=cls \mid M_l, I)}{1 - P(y=cls \mid M_l, I)}$ [10]. By calculating the Shapley value [37] in Eq. (1), we can measure the importance of the $c$-th filter in the $l$-th layer. The filter interaction $u_l^d(i, j)$ between filters $i$ and $j$, when the other $d$ filters exist, is defined in Eq. (2). The larger $u_l^d(i, j)$, the stronger the interaction when $i$ and $j$ form a coalition with the other $d$ filters. With $u_l^d(i, j)$, we can measure the filter utilization strength $U_l(m)$ of the $l$-th layer in Eq. (3). A high value of $U_l(m)$ indicates that the interaction is intensive when $m$ filters exist. $U_l(m)$ can therefore be used to estimate the number of useless filters.
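The game-theoretic quantities in this answer can be illustrated on a tiny example. The sketch below is not the paper's implementation: it computes exact Shapley values by enumerating coalitions, and it uses the standard pairwise interaction index (averaged over contexts of exactly d other players) as an assumed stand-in for Eq. (2), which is not reproduced here. The three "filters" and the function `toy_probability` are hypothetical; in a real network the probability would come from a forward pass with only the filters in the coalition active.

```python
import itertools
import math


def log_odds(p):
    """V = log(p / (1 - p)) for a class probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def shapley_values(players, value):
    """Exact Shapley value of every player for the set function `value`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain i.
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of i when it joins coalition s.
                total += w * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def pairwise_interaction(players, value, i, j, d):
    """Mean joint effect of i and j over contexts of exactly d other players.

    Assumption: this is the standard interaction index, used here as a
    stand-in for the paper's Eq. (2).
    """
    others = [p for p in players if p not in (i, j)]
    contexts = list(itertools.combinations(others, d))
    total = 0.0
    for subset in contexts:
        s = frozenset(subset)
        total += value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
    return total / len(contexts)


def toy_probability(coalition):
    """Hypothetical P(y = cls | coalition, I): filters 0 and 1 cooperate."""
    p = 0.5 + 0.1 * len(coalition)
    if {0, 1} <= coalition:
        p += 0.1
    return p


def toy_value(coalition):
    return log_odds(toy_probability(coalition))


filters = [0, 1, 2]
phi = shapley_values(filters, toy_value)
u_01 = pairwise_interaction(filters, toy_value, 0, 1, d=0)
u_02 = pairwise_interaction(filters, toy_value, 0, 2, d=0)
```

Exact enumeration visits every coalition, so it is only practical for a handful of players; for layers with many filters a sampling-based approximation would be needed.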

Accepted Answer

The filter-wise interaction based redundancy criterion is integrated into the RL algorithm for layer-wise pruning as follows. The state $s_l$ includes the type, number of parameters, and number of floating-point operations of the $l$-th accessible layer. The action $a_l$, representing the pruning sparsity of the layer, is predicted by the policy network p from the state $s_l$ and is bounded by the lower bound $s_l^{lb}$. The reward function $R_l(\cdot)$ is formulated from the filter utilization strength $U_l(m)$ and the number of remaining filters S, encouraging the agent to achieve higher filter utilization strength with fewer filters. The DDPG algorithm is used to optimize the pruning policy, with the parameters of the policy network updated based on the critic network and value function. This integration allows efficient approximation of the optimal pruning plan $S^*$ and maintains the basic functionality of the pruned model.
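The per-layer state, the bounded action, and the shape of the reward can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: `LayerState`, `bound_action`, and `toy_reward` are hypothetical names, and `toy_reward` is an invented stand-in that only mimics the stated behavior (higher utilization strength and fewer remaining filters both raise the reward). The actual reward function and the DDPG update are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class LayerState:
    """State s_l for one layer: type, parameter count, and FLOPs."""
    layer_type: str
    num_params: int
    flops: int


def bound_action(raw_action, lower_bound, upper_bound=1.0):
    """Clamp the policy's raw output so it never falls below the lower bound."""
    return min(max(raw_action, lower_bound), upper_bound)


def toy_reward(utilization, remaining_filters, total_filters):
    """Hypothetical reward: utilization strength scaled by the fraction removed.

    Assumption: chosen only so that the reward grows with utilization
    strength and shrinks as more filters are kept.
    """
    kept_fraction = remaining_filters / total_filters
    return utilization * (1.0 - kept_fraction)


state = LayerState(layer_type="conv", num_params=36_864, flops=9_437_184)
action = bound_action(raw_action=0.1, lower_bound=0.3)
reward = toy_reward(utilization=0.6, remaining_filters=32, total_filters=64)
```

In an actual agent the raw action would come from the policy network evaluated on the state, and the clamped value would be applied to the layer before the reward is computed.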

Accepted Answer

The interaction difference based fine-tuning approach addresses the generalization gap in pruned models by using filter-wise interaction during fine-tuning. It defines the interaction difference $I(S_l, N_l)$ between $S_l$ and $N_l$ in Eq. (11) to quantify the generalization gap. If $V(S_l) = V(N_l)$ for any computational layer indexed by $l$, then $I(S_l, N_l) > 0$, indicating the existence of meaningful interactions that lead to better generalization. To encourage the pruned model to learn these important interactions, the proposed ID loss in Eq. (12) is used. By minimizing $L_{ID}$, the pruned model can effectively learn the important interactions while avoiding gradient explosion. The interaction difference $I(S_l, N_l)$ is integrated with the ground truth during fine-tuning, as shown in Figure 2. Empirical demonstrations on MNIST, CIFAR-10, and ImageNet have shown the effectiveness of the proposed redundancy metric and of SNPFI in addressing the generalization gap in pruned models.

Accepted Answer

Filter utilization strength $U_l(m)$ has a direct impact on accuracy. As shown in Figure 3, accuracy does not increase monotonically with the proportion of total filters. However, a higher $U_l(m)$ can ensure better accuracy and is a more effective indicator than sparsity. $U_l(m)$ converges around 0.5 across diverse layers and applications, which allows the sparsity lower bound $s_l^{lb}$ to be estimated with a uniform filter utilization strength threshold th. This property is used to optimize the pruning plan, as demonstrated in Eq. (5). Additionally, the interaction difference loss $L_{ID}$, as shown in Figure 4, is effective for weight recovery under aggressive pruning and guides the pruned model toward increasing training accuracy without drastic fluctuation of the validation loss. Overall, filter utilization strength plays a crucial role in improving accuracy and optimizing pruning plans.

Accepted Answer

SNPFI outperforms other pruning methods on CIFAR-10 by reducing computation by more than 60% without a significant accuracy drop. It is nearly 2x faster than QPNN in deployment scenarios and achieves the highest compression and accuracy among state-of-the-art methods. Compared to AMC and TAS, SNPFI provides a better network architecture by overcoming delayed reward. Compared to Greg-2, SNPFI achieves a 3.7% reduction and 2.54% higher accuracy without iterative training. SNPFI prunes 10% more filters than TAS and AMC, and achieves 2.39% higher accuracy than AMC. When pruning MobileNetv1, SNPFI achieves compression comparable to AMC and NS. On single-band images, SNPFI generalizes well with less than 45% of the original network, and it reduces 12% more computation overhead in ResNet-50 for gray-scale images than for RGB images. The accuracy disparity between RGB and gray-scale images is due to the absence of color information in gray-scale images, which is vital for object recognition.
