1 Department of Electrical and Computer Engineering, Northeastern University; 2 Khoury College of Computer Science, Northeastern University; 3 School of Information, Rochester Institute of Technology; 4 AInnovation Labs, Inc.
Left: the conventional fashion, where weights are optimized and sparse structures are decided by certain criteria. Right: our PEMN, which represents a network by fixing a set of prototype weights, repetitively using them to fill the whole network, and learning different masks to deliver different feature mappings. Following this line, we explore the representative potential of random weights and propose a novel paradigm that achieves model compression by combining a set of random weights with a bunch of masks. Different feature mappings are shown as blue rectangles; squares with different color patches inside denote the parameters of different layers.
Abstract
A deeper network structure generally handles more complicated non-linearity and performs more competitively. Nowadays, advanced network designs often contain a large number of repetitive structures (e.g., Transformer). They empower the network capacity to a new level but also inevitably increase the model size, which is unfriendly to either model restoring or transferring. In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning diverse masks, and we introduce Parameter-Efficient Masking Networks (PEMN). This naturally leads to a new paradigm for model compression that diminishes the model size. Concretely, motivated by the repetitive structures in modern neural networks, we utilize one randomly initialized layer, accompanied by different masks, to convey different feature mappings and represent repetitive network modules. Therefore, the model can be expressed as one-layer with a bunch of masks, which significantly reduces the model storage cost. Furthermore, we enhance our strategy by learning masks for a model filled by padding a given random weight vector. In this way, our method can further lower the space complexity, especially for models without many repetitive architectures. We validate the potential of PEMN by learning masks on random weights with limited unique values and test its effectiveness in a new compression paradigm based on different network architectures.
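To make the masking idea concrete, here is a minimal PyTorch sketch (our own illustration, not the authors' released code) of a linear layer whose weights are frozen at their random initialization while a binary mask is learned through per-weight scores with a straight-through estimator. The class name MaskedLinear, the keep_ratio argument, and the top-k score selection are assumptions made for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer with fixed random weights and a learnable binary mask.

    Illustrative sketch: the weights stay at their random initialization;
    only the per-weight scores are trained, and the forward pass keeps the
    top `keep_ratio` fraction of weights by score.
    """

    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__()
        # Fixed random weights: never updated during training.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) * in_features ** -0.5,
            requires_grad=False,
        )
        # Learnable scores from which the binary mask is derived.
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        # Keep the top-k scores, zero out the rest.
        k = max(1, int(self.scores.numel() * self.keep_ratio))
        threshold = self.scores.flatten().kthvalue(self.scores.numel() - k + 1).values
        mask = (self.scores >= threshold).float()
        # Straight-through estimator: binary mask in the forward pass,
        # identity gradient to the scores in the backward pass.
        mask = mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


# Usage: only `scores` receives gradients; the random weights stay fixed.
layer = MaskedLinear(128, 64, keep_ratio=0.5)
out = layer(torch.randn(8, 128))
```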
Random Weights Representative Capacity Exploration
Performance of ConvMixer (top row) and ViT (bottom row) backbones on the CIFAR10 dataset with different model hyperparameters. The Y-axis represents the test accuracy and the X-axis denotes different network parameter settings. Dense means the model is trained in the regular fashion, and Mask is the sparse selection strategy. One-layer, MP, and RP are our strategies. The decimal after RP indicates the proportion of unique parameters relative to MP. From Mask to RP 1e-5, the number of unique values in the network decreases. The different experimental settings illustrate the representative potential of random weights.
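The One-layer, MP, and RP settings above can be pictured as tiling a small set of random values across the whole network. The sketch below is one plausible reading of the random-vector padding idea, assuming a simple repeat-and-truncate fill order; the function name fill_from_prototype and the toy network are illustrative and not the paper's exact recipe.

```python
import torch
import torch.nn as nn


def fill_from_prototype(model: nn.Module, prototype: torch.Tensor) -> None:
    """Fill every weight tensor in `model` by repeating a fixed prototype vector.

    The prototype is tiled and truncated to match each layer's parameter count,
    so the whole network shares one small set of unique random values; the
    weights are then frozen and only masks would be learned on top of them.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" not in name:
                continue
            n = param.numel()
            repeats = (n + prototype.numel() - 1) // prototype.numel()
            param.copy_(prototype.repeat(repeats)[:n].view_as(param))
            param.requires_grad_(False)  # weights stay fixed


# Example: a toy network whose unique weight values come from 1,000 numbers.
torch.manual_seed(0)
prototype = torch.randn(1000)
net = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
fill_from_prototype(net, prototype)
```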
A New Network Compression Paradigm
Compression performance on the CIFAR10 (left) and CIFAR100 (right) datasets with ResNet32/ResNet56 backbones. The Y-axis denotes the test accuracy and the X-axis denotes the network size compression ratio. Different colors represent different network architectures. The straight lines at the top are the performance of the dense models with regular training, and lines with different symbol shapes denote different settings. For ResNet, our three points are based on MP, RP 1e-1, and RP 1e-2, respectively. This pair of figures shows that our proposed paradigm achieves admirable compression performance compared with the baselines; even at very high compression ratios, we can still maintain the test accuracy.
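The storage savings behind this paradigm can be estimated with a back-of-the-envelope calculation: a dense model stores a 32-bit float per weight, whereas a PEMN-style model only needs the small set of random values (or just the seed that generates them) plus roughly one bit per weight for each mask. The helper below, pemn_compression_ratio, and the one-bit-per-weight mask assumption are ours; the actual ratio depends on how many masks are stored and how they are encoded.

```python
def pemn_compression_ratio(num_params: int,
                           prototype_len: int,
                           mask_bits: int = 1,
                           float_bits: int = 32) -> float:
    """Rough storage size of a PEMN-style model relative to a dense float model.

    Dense model: num_params * float_bits bits.
    PEMN model:  prototype_len * float_bits  (shared random values)
                 + num_params * mask_bits    (one binary mask over all weights).
    """
    dense_bits = num_params * float_bits
    pemn_bits = prototype_len * float_bits + num_params * mask_bits
    return pemn_bits / dense_bits


# Example: a ResNet-56-sized model (~0.85M parameters) with an RP 1e-2 prototype.
n = 850_000
print(f"relative size: {pemn_compression_ratio(n, int(n * 1e-2)):.3f}")
```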
BibTeX
@article{bai2022parameter,
  title={Parameter-Efficient Masking Networks},
  author={Bai, Yue and Wang, Huan and Ma, Xu and Zhang, Yitian and Tao, Zhiqiang and Fu, Yun},
  journal={arXiv preprint arXiv:2210.06699},
  year={2022}
}