Convolutional wavelet blocks in image classification
Abstract
In this paper, using an image classification problem and the CDF-9/7 wavelet family, it is shown how to incorporate the discrete wavelet transform into a computer vision model while preserving its trainability by backpropagation. A convolutional wavelet block that extracts features at several levels of decomposition of the incoming signal is proposed and successfully integrated into a set of neural network models. The implemented blocks reduce the original model size by 30 – 40 % while maintaining comparable quality in terms of the metric. An efficient method for evaluating the discrete wavelet transform on a graphics processing unit with the lifting scheme is presented. The wavelet block implementation uses only element-wise additions and multiplications, which makes it straightforward to export a trained model into a desired format for running on new data. The ResNetV2-50, MobileNetV2 and EfficientNetV2-B0 architectures are used as the base models. A new dataset, based on a subset of categories of the LSUN dataset, is constructed for conducting the experiments.
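The lifting scheme mentioned in the abstract can be illustrated with a minimal NumPy sketch of one level of the 1-D CDF-9/7 forward transform. The function name `dwt97_1d` and the symmetric boundary handling are assumptions for illustration only; the paper's GPU implementation is not reproduced here. The sketch shows why the transform reduces to element-wise additions and multiplications, which is what keeps it differentiable for backpropagation:

```python
import numpy as np

# Standard CDF-9/7 lifting coefficients
ALPHA, BETA = -1.586134342, -0.05298011854
GAMMA, DELTA = 0.8829110762, 0.4435068522
ZETA = 1.149604398  # final scaling factor

def dwt97_1d(x):
    """One level of the CDF-9/7 forward DWT via lifting.

    x : even-length 1-D array.
    Returns (approximation, detail) coefficient arrays. Every step
    is an element-wise addition or multiplication, so the transform
    stays differentiable for backpropagation.
    """
    s = np.asarray(x, dtype=np.float64)[0::2].copy()  # even samples
    d = np.asarray(x, dtype=np.float64)[1::2].copy()  # odd samples
    # Predict 1: d[i] += alpha * (s[i] + s[i+1]), symmetric extension at the end
    d += ALPHA * (s + np.append(s[1:], s[-1]))
    # Update 1: s[i] += beta * (d[i-1] + d[i]), symmetric extension at the start
    s += BETA * (np.append(d[0], d[:-1]) + d)
    # Predict 2
    d += GAMMA * (s + np.append(s[1:], s[-1]))
    # Update 2
    s += DELTA * (np.append(d[0], d[:-1]) + d)
    # Scale low- and high-pass branches
    return s * ZETA, d / ZETA
```

Because the CDF-9/7 high-pass filter has vanishing moments, the detail coefficients of a constant signal are (numerically) zero, which gives a quick sanity check of the sketch.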
References
- Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556v6 [Preprint]. 2015 [cited 2024 January 2]: [14 p.]. Available from: https://arxiv.org/abs/1409.1556v6.
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. arXiv:1512.03385 [Preprint]. 2015 [cited 2024 January 2]: [12 p.]. Available from: https://arxiv.org/abs/1512.03385.
- Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv:1905.11946v5 [Preprint]. 2020 [cited 2024 January 2]: [11 p.]. Available from: https://arxiv.org/abs/1905.11946v5.
- Cheng H, Zhang M, Shi JQ. A survey on deep neural network pruning: taxonomy, comparison, analysis, and recommendations. arXiv:2308.06767 [Preprint]. 2023 [cited 2024 January 2]: [23 p.]. Available from: https://arxiv.org/abs/2308.06767.
- Blake C, Orr D, Luschi C. Unit scaling: out-of-the-box low-precision training. arXiv:2303.11257v2 [Preprint]. 2023 [cited 2024 January 2]: [29 p.]. Available from: https://arxiv.org/abs/2303.11257v2.
- Zhang S, Ma G, Yang W, Fang Z, Ablameyko SV. Car parking detection in images by using a semi-supervised modified YOLOv5 model. Journal of the Belarusian State University. Mathematics and Informatics. 2023;3:72–81. EDN: XVDRSN.
- Singh A, Kingsbury N. Efficient convolutional network learning using parametric log based dual-tree wavelet ScatterNet. arXiv:1708.09259 [Preprint]. 2017 [cited 2024 January 2]: [8 p.]. Available from: https://arxiv.org/abs/1708.09259.
- Li Q, Shen L, Guo S, Lai Z. Wavelet integrated CNNs for noise-robust image classification. arXiv:2005.03337v2 [Preprint]. 2020 [cited 2024 January 2]: [17 p.]. Available from: https://arxiv.org/abs/2005.03337v2.
- Wolter M, Blanke F, Heese R, Garcke J. Wavelet-packets for deepfake image analysis and detection. arXiv:2106.09369v4 [Preprint]. 2022 [cited 2024 January 2]: [29 p.]. Available from: https://arxiv.org/abs/2106.09369v4.
- He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. arXiv:1603.05027v3 [Preprint]. 2016 [cited 2024 January 2]: [15 p.]. Available from: https://arxiv.org/abs/1603.05027v3.
- Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: inverted residuals and linear bottlenecks. arXiv:1801.04381v4 [Preprint]. 2019 [cited 2024 January 2]: [14 p.]. Available from: https://arxiv.org/abs/1801.04381v4.
- Tan M, Le QV. EfficientNetV2: smaller models and faster training. arXiv:2104.00298v3 [Preprint]. 2021 [cited 2024 January 2]: [11 p.]. Available from: https://arxiv.org/abs/2104.00298v3.
- Lepik Ü, Hein H. Haar wavelets: with applications. Cham: Springer; 2014. X, 207 p. (Hillermeier C, Schröder J, Weigand B, editors. Mathematical engineering). DOI: 10.1007/978-3-319-04295-4.
- Daubechies I. Ten lectures on wavelets. Philadelphia: Society for Industrial and Applied Mathematics; 1992. XIX, 357 p. (CBMS-NSF regional conference series in applied mathematics; volume 61).
- Cohen A, Daubechies I, Feauveau J-C. Biorthogonal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics. 1992;45(5):485–560. DOI: 10.1002/cpa.3160450502.
- Yu F, Seff A, Zhang Y, Song S, Funkhouser T, Xiao J. LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv:1506.03365v3 [Preprint]. 2016 [cited 2024 January 2]: [9 p.]. Available from: https://arxiv.org/abs/1506.03365v3.
Copyright (c) 2024 Journal of the Belarusian State University. Mathematics and Informatics
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The authors who are published in this journal agree to the following:
- The authors retain copyright on the work and grant the journal the right of first publication of the work under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
- The authors retain the right to enter into certain contractual agreements relating to the non-exclusive distribution of the published version of the work (e.g. posting it to an institutional repository or publishing it in a book), with a reference to its original publication in this journal.
- The authors have the right to post their work on the Internet (e.g. in an institutional repository or on a personal website) prior to and during the review process conducted by the journal, as this may lead to a productive discussion and a larger number of references to the work. (See The Effect of Open Access.)