Parameter-Efficient Masking Networks

Yue Bai1,4,*  Huan Wang1,4  Xu Ma1  Yitian Zhang1 Zhiqiang Tao3  Yun Fu1,2,4 

NeurIPS 2022

1Department of Electrical and Computer Engineering, Northeastern University 
2Khoury College of Computer Sciences, Northeastern University
3School of Information, Rochester Institute of Technology
4AInnovation Labs, Inc.
*Corresponding author: bai.yue@northeastern.edu
Left: the conventional fashion, where weights are optimized and sparse structures are selected by certain criteria. Right: our PEMN, which represents a network with fixed prototype weights that are repetitively reused to fill the whole network, while different learned masks deliver different feature mappings. Following this line, we explore the representative potential of random weights and propose a novel paradigm that achieves model compression by combining a set of random weights with a set of masks. Different feature mappings are shown as blue rectangles. Squares with different color patches inside denote the parameters of different layers.
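To make the right-hand setting concrete, the sketch below shows a single layer whose weights are fixed random values and for which only a binary mask is learned. It is a minimal illustration rather than the official implementation: the top-k straight-through mask rule and all names (TopKMask, MaskedLinear, keep_ratio) are assumptions borrowed from common supermask-style training.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    """Set the largest scores to 1; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, keep_ratio):
        k = max(1, int(keep_ratio * scores.numel()))
        mask = torch.zeros_like(scores)
        mask.view(-1)[torch.topk(scores.flatten(), k).indices] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient w.r.t. the scores is passed as-is.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights and a learnable mask."""

    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__()
        # Fixed random prototype weights, stored as a buffer (never trained).
        self.register_buffer("weight", torch.randn(out_features, in_features) * 0.02)
        # Only these mask scores receive gradients.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.keep_ratio)
        return F.linear(x, self.weight * mask)  # weights fixed, mask learned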

Abstract

A deeper network structure generally handles more complicated non-linearity and performs more competitively. Nowadays, advanced network designs often contain a large number of repetitive structures (e.g., Transformers). They raise network capacity to a new level but also inevitably increase the model size, which is unfriendly to either model storing or transferring. In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning diverse masks, and we introduce Parameter-Efficient Masking Networks (PEMN). This also naturally leads to a new paradigm for model compression that diminishes the model size. Concretely, motivated by the repetitive structures in modern neural networks, we utilize one randomly initialized layer, accompanied by different masks, to convey different feature mappings and represent repetitive network modules. Therefore, the model can be expressed as one layer plus a set of masks, which significantly reduces the model storage cost. Furthermore, we enhance our strategy by learning masks for a model filled by padding a given random weight vector. In this way, our method can further lower the space complexity, especially for models without many repetitive architectures. We validate the potential of PEMN by learning masks on random weights with limited unique values and test its effectiveness as a new compression paradigm on different network architectures.
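The "one layer plus a set of masks" idea can be read as: store one prototype weight tensor and give every repeated block its own learned mask. The toy MLP below is a minimal sketch under that reading; the block structure, the names, and the sign-based straight-through masking are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class OneLayerMaskedMLP(nn.Module):
    """Repeated blocks that all reuse one fixed random weight tensor."""

    def __init__(self, dim=256, depth=8):
        super().__init__()
        # The prototype is stored once, no matter how deep the network is.
        self.register_buffer("prototype", torch.randn(dim, dim) * 0.02)
        # One score tensor per block is the only per-block learnable state.
        self.scores = nn.ParameterList(
            [nn.Parameter(torch.randn(dim, dim) * 0.01) for _ in range(depth)]
        )

    def forward(self, x):
        for s in self.scores:
            # Hard 0/1 mask from the sign of the scores, with a
            # straight-through trick so gradients still reach the scores.
            mask = (s > 0).float() + s - s.detach()
            x = F.relu(F.linear(x, self.prototype * mask))
        return x

A checkpoint of such a model only needs the prototype (or the random seed that generated it) and the per-block masks.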

Exploring the Representative Capacity of Random Weights



Performance of ConvMixer (top row) and ViT (bottom row) backbones on the CIFAR10 dataset with different model hyperparameters. The Y-axis represents test accuracy and the X-axis denotes different network parameter settings. Dense means the model is trained in the regular fashion. Mask is the sparse selection strategy. One-layer, MP, and RP are our strategies. The decimal after RP denotes the proportion of unique parameters relative to MP. From Mask to RP 1e-5, the number of unique values in the network decreases. The different experimental settings illustrate the representative potential of random weights.
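The RP strategy in the figure can be pictured as keeping only a short random vector and tiling it to fill every layer, so the decimal (e.g., RP 1e-2) bounds how many unique values the whole network contains relative to the MP prototype. The helper below is a hypothetical sketch: the function name, the seeding, the restriction to weight matrices, and the exact way unique values are counted in the paper are assumptions.

import torch
import torch.nn as nn


def fill_from_random_vector(model: nn.Module, fraction: float, seed: int = 0):
    """Overwrite weight tensors by tiling one short random prototype vector."""
    torch.manual_seed(seed)
    # Size of the largest weight tensor plays the role of the MP prototype.
    largest = max(p.numel() for p in model.parameters() if p.dim() > 1)
    proto = torch.randn(max(1, int(fraction * largest))) * 0.02
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:  # skip biases / norm parameters in this sketch
                continue
            reps = -(-p.numel() // proto.numel())  # ceil division
            p.copy_(proto.repeat(reps)[: p.numel()].view_as(p))
            p.requires_grad_(False)  # filled weights stay fixed; only masks train

Under this reading, only the mask scores (as in the earlier sketches) would be registered as trainable parameters afterwards.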


A New Network Compression Paradigm

Compression performance validation on the CIFAR10 (left) and CIFAR100 (right) datasets with ResNet32/ResNet56 backbones. The Y-axis denotes test accuracy and the X-axis denotes the network size compression ratio. Different colors represent different network architectures. The straight lines at the top show the performance of dense models with regular training. Lines with different symbol shapes denote different settings. For ResNet, our three points are based on MP, RP 1e-1, and RP 1e-2, respectively. These figures show that our proposed paradigm achieves admirable compression performance compared with the baselines; even at very high compression ratios, the test accuracy is largely maintained.
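The compression ratio on the X-axis can be estimated with simple bookkeeping: a checkpoint needs roughly one bit per parameter for the mask plus the unique prototype values (or just their random seed), instead of a 32-bit float per parameter. The sketch below is a back-of-the-envelope estimate under those assumptions; it ignores container overhead and any further mask coding, and the example numbers are illustrative only.

def compression_ratio(num_params: int, unique_values: int) -> float:
    """Dense fp32 checkpoint size vs. 1-bit masks plus a small prototype."""
    dense_bits = 32 * num_params
    ours_bits = 1 * num_params + 32 * unique_values
    return dense_bits / ours_bits


if __name__ == "__main__":
    n = 464_000  # roughly the parameter count of a CIFAR ResNet32
    for frac in (1e-1, 1e-2):
        print(f"RP {frac:g}: ~{compression_ratio(n, int(frac * n)):.1f}x smaller")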

BibTeX

@article{bai2022parameter,
  title={Parameter-Efficient Masking Networks},
  author={Bai, Yue and Wang, Huan and Ma, Xu and Zhang, Yitian and Tao, Zhiqiang and Fu, Yun},
  journal={arXiv preprint arXiv:2210.06699},
  year={2022}
}
This website template is borrowed from Nerfies.