QAT Quantization and Generalization Ability
Created on: 18 January 2025
Read this paper and write a technical blog post in Chinese that lets readers understand the paper's entire content, especially the details of the underlying principles.
The following background material is the paper's LaTeX source:
\documentclass[10pt]{article} % For LaTeX2e
% \usepackage{tmlr}
% If accepted, instead use the following line for the camera-ready submission:
\usepackage[accepted]{tmlr}
% To de-anonymize and remove mentions to TMLR (for example for posting to preprint servers), instead use the following:
%\usepackage[preprint]{tmlr}
% Optional math commands from https://github.com/goodfeli/dlbook_notation.
\input{math_commands.tex}
\usepackage{hyperref}
\usepackage{url}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Non-default packages
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage[export]{adjustbox}
\usepackage{mathtools, nccmath}
\PassOptionsToPackage{table,xcdraw}{xcolor}
\usepackage{multirow}
\DeclarePairedDelimiter{\nint}\lfloor\rceil
\DeclarePairedDelimiter{\abs}\lvert\rvert
\newcommand{\linf}[1]{\lVert #1 \rVert_{\infty}}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{booktabs} % for borders and merged ranges
\usepackage{soul}% for underlines
\usepackage{hhline}
\newcommand\crule[3][black]{\textcolor{#1}{\rule{#2}{#3}}}
\definecolor{tab4_g}{rgb}{0.57, 0.74, 0.65}
\definecolor{tab4_r}{rgb}{0.89, 0.51, 0.53}
\usepackage[mathscr]{euscript}
\newcommand{\ind}{\perp\!\!\!\!\perp}
% \usepackage{natbib}
\newtheorem{hyp}{Hypothesis}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{colortbl}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{QGen: On the Ability to Generalize in Quantization Aware Training}
% Authors must not appear in the submitted version. They should be hidden
% as long as the tmlr package is used without the [accepted] or [preprint] options.
% Non-anonymous submissions will be rejected without review.
\author{\name MohammadHossein AskariHemmat\footnotemark[1]{\thanks{Equal Contribution.}} \email [email protected] \\
\addr Deeplite
\AND
\name Ahmadreza Jeddi \footnotemark[1] \email [email protected] \\
\addr Deeplite
\AND
\name Reyhane Askari Hemmat \email [email protected]\\
\addr University of Montreal and Mila, Quebec AI Institute
\AND
\name Ivan Lazarevich
\email [email protected] \\
\addr Deeplite
\AND
\name Alexander Hoffman \email [email protected] \\
\addr Deeplite
\AND
\name Sudhakar Sah \email [email protected] \\
\addr Deeplite
\AND
\name Ehsan Saboori \email [email protected] \\
\addr Deeplite
\AND
\name Yvon Savaria \email [email protected] \\
\addr École Polytechnique de Montréal
\AND
\name Jean-Pierre David
\email [email protected] \\
\addr École Polytechnique de Montréal
}
% The \author macro works with any number of authors. Use \AND
% to separate the names and addresses of multiple authors.
\newcommand{\fix}{\marginpar{FIX}}
\newcommand{\new}{\marginpar{NEW}}
\def\month{MM} % Insert correct month for camera-ready version
\def\year{YYYY} % Insert correct year for camera-ready version
\def\openreview{\url{https://openreview.net/forum?id=XXXX}} % Insert correct link to OpenReview for camera-ready version
\begin{document}
\maketitle
\begin{abstract}
Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications on model performance. In particular, first, we develop a theoretical model for quantization in neural networks and demonstrate how quantization functions as a form of regularization. Second, motivated by recent work connecting the sharpness of the loss landscape and generalization, we derive an approximate bound for the generalization of quantized models conditioned on the amount of quantization noise. We then validate our hypothesis by experimenting with over 2000 models trained on CIFAR-10, CIFAR-100, and ImageNet datasets on convolutional and transformer-based models.
\end{abstract}
\section{Introduction}
\label{sec:introduction}
The exceptional growth of technology involving deep learning has made it one of the most promising technologies for applications such as computer vision, natural language processing, and speech recognition. The ongoing advances in these models consistently enhance their capabilities, yet their improving performances often come at the price of growing complexity and an increased number of parameters. The increasing complexity of these models poses a challenge to the deployment in production systems due to higher operating costs, greater memory requirements, and longer response times. Quantization \cite{courbariaux2015binaryconnect,hubara2016binarized,DBLP:journals/corr/abs-1802-05668,DBLP:journals/corr/ZhouYGXC17,Jacob_2018_CVPR,DBLP:journals/corr/abs-1806-08342} is one of the prominent techniques that have been developed to reduce the model sizes and/or reduce latency. Model quantization represents full precision model weights and/or activations using fewer bits, resulting in models with lower memory usage, energy consumption, and faster inference. Quantization has gained significant attention in academia and industry. Especially with the emergence of the transformer \cite{vaswani2017attention} model, quantization has become a standard technique to reduce memory and computation requirements.
The impact on accuracy and the benefits of quantization, such as reduced memory footprint and latency, are well studied \cite{courbariaux2015binaryconnect, li2017training, gholami2021survey}. These studies are mainly driven by the fact that modern hardware is faster and more energy efficient in low-precision (byte, sub-byte) arithmetic compared to its floating point counterpart. Despite its numerous benefits, quantization may adversely impact accuracy.
Hence, substantial research efforts on quantization revolve around addressing the accuracy degradation resulting from lower bit representation. This involves analyzing the model's convergence qualities for various numerical precisions and studying their impacts on gradients and network updates \cite{li2017training, hou2019analysis}.
In this work, we delve into the generalization properties of quantized neural networks. This key aspect has received limited attention despite its significant implications for the performance of models on unseen data. This factor becomes particularly important for safety-critical \cite{gambardella2019efficient, zhang2022qebverif} applications. While prior research has explored the performance of neural networks in adversarial settings \cite{gorsline2021adversarial, lin2019defensive, galloway2017attacking} and investigated how altering the number of quantization bits during inference can affect model performance \cite{chmiel2020robust, bai2021batchquant}, there is a lack of systematic studies on the generalization effects of quantization using standard measurement techniques.
This work studies the effects of different quantization levels on model generalization, training accuracy, and training loss. First, in Section \ref{sec:quantization_noise_modelling}, we model quantization as a form of noise added to the network weights. Subsequently, we demonstrate that this introduced noise serves as a regularizing agent, with its degree of regularization directly related to the bit precision. Consistent with other regularization methods, our empirical studies further support the claim that each model requires precise tuning of its quantization level, as models achieve optimal generalization at varying quantization levels. On the generalization side, in Section \ref{sec:gen_bound}, we show that quantization could help the optimization process converge to minima with lower sharpness when the scale of quantization noise is bounded. This is motivated by recent works of \cite{foret2021sharpnessaware, keskar2016large}, which establish connections between the sharpness of the loss landscape and generalization. We then leverage a variety of recent advances in the field of generalization measurement \cite{jiang2019fantastic, dziugaite2020search}, particularly sharpness-based measures \cite{keskar2016large, dziugaite2017computing, neyshabur2017pac}, to verify our hypothesis for a wide range of vision problems with different setups and model architectures. Finally, in this section, we present visual demonstrations illustrating that models subjected to quantization have a flatter loss landscape.
After establishing that lower-bit quantization results in improved flatness in the loss landscape, we study the connection between the achieved flatness of the lower-bit-quantized models and generalization. Our method estimates the model's generalization on a given data distribution by measuring the difference between the loss of the model on training data and test data. To do so, we train a pool of almost 2000 models on the CIFAR-10, CIFAR-100 \cite{krizhevsky2009learning}, and ImageNet-1K \cite{5206848} datasets, and report the estimated generalization gap. Furthermore, we conclude our experiments by showing a practical use case of model generalization, in which we evaluate the vision models under severe cases where the input to the model is corrupted. This is achieved by measuring the generalization gap for quantized and full precision models when different types of input noise are applied, as introduced in \cite{hendrycks2019robustness}.
% Our experiments suggest that quantization can help improve the generalization of neural networks. %We analytically and empirically show the trade-off between the training loss and training accuracy of quantized models and their generalization to unseen data. %
% We show that by reducing the number of quantization bits, the accuracy of the model drops but the generalization improves.
Our main contributions can be summarized as follows:
\begin{itemize}
% \item We prove that a higher quantization level (lower bit size) results in an increase in training loss. We then show that this increase in training loss will result in a smaller generalization gap.
% \item We empirically show that higher levels of quantization can help the model converge to minima with increased flatness.
\item We theoretically show that quantization can be seen as a regularizer.
\item We empirically show that there exists a quantization level at which the quantized model converges to a flatter minimum than its full-precision model.
\item We empirically demonstrate that quantized models show a better generalization gap on distorted data.
% \item Establish that the generalization gap (the difference between a model's performance on training data and its performance on unseen data) of quantized models is better than that of the FP32 model % \item Training loss increases by reducing the number of bits used for quantization % \item Generalization gap improves with the lower number of bits used for quantization
\end{itemize}
\section{Related Works}
\label{sec:related_works}
\subsection{Regularization Effects of Quantization}
Since the advent of BinaryConnect \cite{courbariaux2015binaryconnect} and Binarized Neural Networks \cite{hubara2016binarized}, which were the first works on quantization, the machine learning community has been aware of the generalization effects of quantization, and the observed generalization gains have commonly been attributed to the implicit regularization effects that the quantization process may impose. This pattern is also observed in more recent works such as \cite{mishchenko2019low, xu2018alternating, chen2021quantization}. Even though these studies have empirically reported some performance gain as a side-product of quantization, they lack a well-formed analytical study.
Viewing quantization simply as regularization is relatively intuitive, and to the best of our knowledge, the only work so far that has tried to study this behavior formally is the recent work done in \cite{https://doi.org/10.48550/arxiv.2206.05916}, where the authors provide an analytical study on how models with stochastic binary quantization can have a smaller generalization gap compared to their full precision counterparts. The authors propose a \textit{quasi-neural network} to approximate the effect of binarization on neural networks. They then derive the neural tangent kernel \cite{jacot2018neural, bach2017breaking} for the proposed quasi-neural network approximation. With this formalization, the authors show that binary neural networks have lower capacity, hence lower training accuracy, and a smaller generalization gap than their full precision counterparts. However, this work is limited to the case of simplified binarized networks and does not study the wider quantization space, and their supporting empirical studies are done on MNIST and Fashion MNIST datasets with no studies done on larger scale more realistic problems. Furthermore, the Neural Tangent Kernel (NTK) analysis requires strong assumptions such as an approximately linear behaviour of the model during training which may not hold in practical setups.
\subsection{Generalization and Complexity Measures }
Generalization refers to the ability of machine learning models to perform well on unseen data beyond the training set. Despite the remarkable success and widespread adoption of deep neural networks across various applications, the factors influencing their generalization capabilities and the extent to which they generalize effectively are still unclear \cite{jiang2019fantastic, recht2019imagenet}.
% Generalization, the ability of ML models to extend their performance on the seen data to the infinite set of unseen data, is probably the most desired feature one would want for their model. With all their success and popularity over a range of different applications, why and how well the deep neural networks generalize remains a mystery \cite{jiang2019fantastic, recht2019imagenet}.
Minimization of the common loss functions (e.g., cross-entropy and its variants) on the training data does not necessarily mean the model would generalize well \cite{foret2021sharpnessaware, recht2019imagenet}, especially since recent models are heavily over-parameterized and can easily overfit the training data. In \cite{zhang2021understanding}, the authors demonstrate neural networks' vulnerability to poor generalization by showing they can perfectly fit randomly labeled training data. This is due to the complex and non-convex landscape of the training loss. Numerous works have tried to either explicitly or implicitly address this overfitting issue using optimization algorithms \cite{kingma2014adam, martens2015optimizing}, data augmentation techniques \cite{cubuk2018autoaugment}, and batch normalization \cite{ioffe2015batch}, to name a few.
\textbf{So the question remains: what is the best indicator of a model's generalization ability?} Proving upper bounds on the test error \cite{neyshabur2017pac, bartlett2017spectrally} has been the most direct way of studying the ability of models to generalize; however, the current bounds are not tight enough to indicate the model's ability to generalize \cite{jiang2019fantastic}. Therefore, several recent works have preferred the more empirical approaches of studying generalization \cite{keskar2016large, liang2019fisher}. These works introduce a complexity measure, a quantity that monotonically relates to some aspect of generalization. Specifically, lower complexity measures correspond to neural networks with improved generalization capacity. Many complexity measures are introduced in the literature, but each of them has typically targeted a limited set of models on toy problems. However, recent work in \cite{jiang2019fantastic} followed by \cite{dziugaite2020search} performed an exhaustive set of experiments on the CIFAR-10 and SVHN \cite{37648} datasets with different model backbones and hyper-parameters to identify the measures that correlate best with generalization. Both of these large-scale studies show that sharpness-based measures are the most effective. The sharpness-based measures are derived either from measuring the average flatness around a minimum through adding Gaussian perturbations (PAC-Bayesian bounds \cite{mcallester1999pac, dziugaite2017computing}) or from measuring the worst-case loss, i.e., sharpness \cite{keskar2016large, dinh2017sharp}.
The effectiveness of sharpness-based measures has also inspired new training paradigms that penalize the sharpness of the loss landscape during training \cite{foret2021sharpnessaware, du2022sharpness, izmailov2018averaging}. In particular, \cite{foret2021sharpnessaware} introduced Sharpness-Aware-Minimization (SAM), a scalable and differentiable algorithm that helps models converge to minima with reduced sharpness. It is also worth mentioning here that some recent works \cite{liu2021sharpness, wang2022squat} assume that the discretization and gradient estimation processes, which are common in quantization techniques, might cause loss fluctuations that could result in a sharper loss landscape. They then couple quantization with SAM and report improved results; however, our findings in Section \ref{sec:gen_bound} suggest the opposite: the quantized models in our experiments exhibit improved loss landscape flatness compared to their full precision counterparts.
% \subsubsection*{Acknowledgments}
% Use unnumbered third level headings for the acknowledgments. All
% acknowledgments, including those to funding agencies, go at the end of the paper.
% Only add this information once your submission is accepted and deanonymized.
\section{Mathematical Model for Quantization}
\label{sec:quantization_noise_modelling}
% \subsection{Quantization Noise Modelling}
Throughout this paper, we will denote vectors as $\boldsymbol{w}$, scalars as $w$, and sets as $\mathscr{S}$. Furthermore, $\ind$ denotes independence. Given a distribution $\mathcal{D}$ for the data space, our training dataset $\mathscr{S}$ is a set of i.i.d. samples drawn from $\mathcal{D}$. The typical ML task tries to learn models $f$ parametrized by weights $\boldsymbol{w}$ that can minimize the training set loss
\begin{math}
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w}) = \frac{1}{|\mathscr{S}|} \sum_{i=1}^{|\mathscr{S}|}
l(f(\boldsymbol{w}, \boldsymbol{x}_i), \boldsymbol{y}_i)
\end{math}
given a loss function $l$ and $(\boldsymbol{x}_i, \boldsymbol{y}_i)$ pairs in the training data.
To quantize our deep neural networks, we utilize Quantization Aware Training (QAT) methods similar to Learned Step-size Quantization (LSQ) \cite{Esser2020LEARNED} for CNNs and Variation-aware Vision Transformer Quantization (VVTQ) \cite{huang2023variation} for ViT models. Specifically, we apply the per-layer quantization approach, in which, for each target quantization layer, we learn a step size \textit{s} to quantize the layer weights. Therefore, given the weights $\boldsymbol{w}$, the scaling factor $s$, and $b$ bits to quantize, the quantized weight tensor $\hat{\boldsymbol{w}}$ and the quantization noise $\boldsymbol{\Delta}$ can be calculated as below:
\begin{align} \label{eq:typical_quantization}
\bar{\boldsymbol{w}} &= \nint{clip(\frac{\boldsymbol{w}}{s}, -2^{b-1}, 2^{b-1}-1)}, \\ \label{eq:typical_quantization_rescale}
\hat{\boldsymbol{w}} &= \bar{\boldsymbol{w}} \times s, \\ \label{eq:typical_quantization_error}
\boldsymbol{\Delta} &= \boldsymbol{w} - \hat{\boldsymbol{w}},
\end{align}
where $\nint{\cdot}$ rounds the input vector to the nearest integer vector, the $clip(\cdot, a, c)$ function returns its input with values below $a$ set to $a$ and values above $c$ set to $c$, and $\hat{\boldsymbol{w}}$ is a quantized representation of the weights at the same scale as $\boldsymbol{w}$.
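For concreteness, the quantize-dequantize step of Equations \ref{eq:typical_quantization} to \ref{eq:typical_quantization_error} can be sketched in a few lines of PyTorch-style code. This is only an illustrative sketch: the step size $s$ here is fixed rather than learned as in LSQ, and the straight-through estimator used during training is omitted.
\begin{verbatim}
import torch

def fake_quantize(w: torch.Tensor, s: float, b: int):
    """Uniform per-layer weight quantization: clip, round-to-nearest, rescale."""
    qmin, qmax = -2 ** (b - 1), 2 ** (b - 1) - 1
    w_bar = torch.clamp(w / s, qmin, qmax).round()  # integer codes
    w_hat = w_bar * s                               # dequantized weights, same scale as w
    delta = w - w_hat                               # quantization noise Delta
    return w_bar, w_hat, delta

# toy usage: 4-bit quantization of a random weight tensor
w = torch.randn(64, 3, 3, 3)
s = 2 * w.abs().max().item() / 2 ** 4               # simple (non-learned) step-size choice
w_bar, w_hat, delta = fake_quantize(w, s, b=4)
print(w_bar.unique().numel() <= 2 ** 4)             # at most 2^b distinct integer codes
\end{verbatim}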
\subsection{Theoretical Analysis}
For simplicity, let us consider a regression problem where the mean square error loss is defined as,
\begin{equation}\label{eq:mse}
\mathcal{L} = \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y})} [|\boldsymbol{\hat{y}} - \boldsymbol{y}|_2^2],
\end{equation}
where $\boldsymbol{y}$ is the target, and $\boldsymbol{\hat{y}} = f(\boldsymbol{x}, \boldsymbol{w})$ is the output of the network parameterized by $\boldsymbol{w}$.
For uniform quantization, the quantization noise $\boldsymbol{\Delta}$ can be approximated by the uniform distribution $\boldsymbol{\Delta} \sim \mathcal{U}(-\frac{\delta}{2}, \frac{\delta}{2})$, where $\delta$ is the width of the quantization bin and $\mathcal{U}$ is the uniform distribution~\cite{defossez2021differentiable,widrow1996statistical, agustsson2020universally}.
Consequently, a quantized neural network effectively has the following loss,
\begin{equation}\label{eq:mse_quant}
\Tilde{\mathcal{L}} = \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [|\boldsymbol{\hat{y^q}} - \boldsymbol{y}|_2^2],
\end{equation}
where $\boldsymbol{\hat{y}}^q = f(\boldsymbol{x}, \boldsymbol{w} + \boldsymbol{\Delta})$.
We can apply a first-order Taylor approximation,
\begin{equation}
f(\boldsymbol{x}, \boldsymbol{w}+\boldsymbol{\Delta}) \approx f(\boldsymbol{x}, \boldsymbol{w}) + \boldsymbol{\Delta}^\top \nabla_{\boldsymbol{w}} f(\boldsymbol{x}, \boldsymbol{w})
\end{equation}
Thus, $\boldsymbol{\hat{y}}^q \approx \boldsymbol{\hat{y}} + \boldsymbol{\Delta}^\top \nabla_{\boldsymbol{w}} \boldsymbol{\hat{y}}$.
Re-writing the expectation of $\Tilde{\mathcal{L}}$,
\begin{align*}
\Tilde{\mathcal{L}} &= \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [(\boldsymbol{\hat{y}}^q - \boldsymbol{{y}} )^2] \\
&= \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [((\boldsymbol{\hat{y}} + {\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \boldsymbol{\hat{y}}) -\boldsymbol{{y}} )^2]\\
&= \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [(\boldsymbol{\hat{y}} - \boldsymbol{{y}})^2 + |{\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \hat{\boldsymbol{y}}|^2_2 + 2 (\boldsymbol{\hat{y}} - \boldsymbol{{y}}) ({\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \boldsymbol{\hat{y}})]\\
&= \mathcal{L} + \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [|{\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \hat{\boldsymbol{y}}|^2_2] + \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [2 (\boldsymbol{\hat{y}} - \boldsymbol{y}) ({\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \boldsymbol{\hat{y}})].
\end{align*}
Since $\boldsymbol{\Delta} \ind (\boldsymbol{x}, \boldsymbol{y})$ and $\mathbb{E}[\boldsymbol{\Delta}] = \boldsymbol{0}$\footnote{Note that we only require the quantization noise distribution to have $\mathbb{E}[\boldsymbol{\Delta}] = \boldsymbol{0}$. We do not explicitly use the assumption of $\boldsymbol{\Delta}$ coming from a uniform distribution. Thus, for any zero mean noise distribution, the above proof holds.}, the last term on the right-hand side is zero. Thus we have,
\begin{align}
\label{eq:quantization_regularization}
\Tilde{\mathcal{L}} = \mathcal{L} + \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{\Delta})} [|{\boldsymbol{\Delta}}^\top \nabla_{\boldsymbol{w}} \boldsymbol{\hat{y}}|^2_2] = \mathcal{L} + \mathcal{R}(\boldsymbol{\Delta}),
\end{align}
where $\mathcal{R}(\boldsymbol{\Delta})$ can be viewed as a regularization function. This means that minimizing $\Tilde{\mathcal{L}}$ is equivalent to minimizing the loss of a non-quantized neural network with gradient norm regularization. Given a quantization method like LSQ or VVTQ, we know that the quantization error ($\boldsymbol{\Delta}$) is a function of the quantization level ($b$). As a result, $\mathcal{R}(\boldsymbol{\Delta})$ is also a function of the quantization level. Thus, the quantization level should be viewed as a hyper-parameter that controls the regularization level. Similar to other regularization methods in deep learning, this hyper-parameter should also be carefully tuned for the best generalization performance.
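The identity in Equation \ref{eq:quantization_regularization} can also be checked numerically. The following sketch is a toy linear-regression experiment (our own illustrative construction, not an experiment from this paper): it perturbs the weights with zero-mean uniform noise of bin width $\delta$ and compares a Monte-Carlo estimate of $\Tilde{\mathcal{L}}$ against $\mathcal{L} + \mathcal{R}(\boldsymbol{\Delta})$, where for a linear model $\nabla_{\boldsymbol{w}} \hat{y} = \boldsymbol{x}$ and hence $\mathcal{R}(\boldsymbol{\Delta}) = \frac{\delta^2}{12}\,\mathbb{E}[\lVert\boldsymbol{x}\rVert_2^2]$.
\begin{verbatim}
import torch

torch.manual_seed(0)
n, d = 512, 20
x = torch.randn(n, d)
w = torch.randn(d)
y = x @ w + 0.1 * torch.randn(n)                   # regression targets

def mse(weights):
    return ((x @ weights - y) ** 2).mean()

delta_width = 0.05                                 # width of the quantization bin (delta)
base = mse(w)                                      # L: loss without quantization noise

# Monte-Carlo estimate of the quantized loss  E_Delta[ L(w + Delta) ]
noisy = torch.stack(
    [mse(w + (torch.rand(d) - 0.5) * delta_width) for _ in range(2000)]
).mean()

# R(Delta) = E[(Delta^T grad_w y_hat)^2]; for a linear model grad_w y_hat = x,
# and the variance of U(-delta/2, delta/2) is delta^2/12, so R = delta^2/12 * E[||x||^2]
reg = (delta_width ** 2 / 12) * (x ** 2).sum(dim=1).mean()

print(float(noisy), float(base + reg))             # the two values should nearly match
\end{verbatim}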
% As a result we have .}
% Given a quantization method like LSQ, we have , for a . Consequently, is bounded by,
% \begin{align}
% \label{eq:quant_added_noise}
% \Tilde{\mathcal{L}} \le \mathcal{L} + \frac{{\delta}^2}{4} \mathbb{E}{p(\boldsymbol{x}, \boldsymbol{y})} [|\nabla{\boldsymbol{w}} \boldsymbol{\hat{y}}|^2_2].
% \end{align}
% Hence, larger values of lead to higher training losses.
\begin{figure}[!t]
\centering
\adjustbox{trim={0 0 0 0},clip}{%
\includegraphics[width=0.33\textwidth]{images/step_size_graph_ResNet-50.png}
}%
\adjustbox{trim={0 0 0 0},clip}{%
\includegraphics[width=0.33\textwidth]{images/step_size_graph_ResNet-18.png}
}%
\adjustbox{trim={0 0 0 0},clip}{%
\includegraphics[width=0.33\textwidth]{images/step_size_graph_MobileNet_V2.png}
}%
\caption{Width of the quantization bin for the weight tensor of each layer in ResNet-50, ResNet-18, and MobileNet V2 trained on the ImageNet dataset. We used different quantization levels and in all cases, the induced quantization noise is significantly higher when a lower bit resolution value is used.}
\label{fig:quant_noise_per_layer}
\end{figure}
To study the relation between $\delta$ and the quantization level, we ran some experiments. Figure \ref{fig:quant_noise_per_layer} illustrates the width of the quantization bin per layer for three different architectures trained on ImageNet \cite{5206848}. As can be seen, the lower the number of quantization bits, the larger the step size $s$ is. And as Equations \ref{eq:typical_quantization} to \ref{eq:typical_quantization_error} indicate, $s$ is equivalent to the width of the quantization bin $\delta$. Hence lower-bit quantization causes quantization bins to be wider as the number of potential representations becomes limited, which results in higher regularization and training losses.
In our experiments, this trend was consistent across various vision tasks and model architectures, allowing us to affirm that lower-bit-resolution quantization (with greater $\delta$) results in increased training losses, as shown in Equation \ref{eq:quantization_regularization}. This indicates that the level of quantization dictates the degree of regularization introduced to the network. Furthermore, our empirical investigation, encompassing nearly 2000 models trained on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets, confirms this observation. The findings are detailed in Table \ref{Tab:gen_gap_cifars}.
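The trend in Figure \ref{fig:quant_noise_per_layer} can be reproduced qualitatively even without any training: for a fixed weight distribution, the step size that minimizes the reconstruction error grows quickly as the bit width shrinks. The sketch below is illustrative only (LSQ learns $s$ by gradient descent; here a simple grid search over candidate step sizes stands in for it).
\begin{verbatim}
import torch

def mse_optimal_step(w: torch.Tensor, b: int, grid: int = 200) -> float:
    """Grid-search the step size s minimizing ||w - dequant(quant(w, s, b))||^2."""
    qmin, qmax = -2 ** (b - 1), 2 ** (b - 1) - 1
    candidates = torch.linspace(1e-4, w.abs().max().item(), grid)
    errors = []
    for s in candidates:
        w_hat = torch.clamp(w / s, qmin, qmax).round() * s
        errors.append(((w - w_hat) ** 2).sum())
    return candidates[torch.stack(errors).argmin()].item()

w = torch.randn(4096) * 0.05                 # a tensor with a typical layer-weight scale
for bits in (8, 4, 2):
    print(bits, mse_optimal_step(w, bits))   # the bin width grows as the bit count drops
\end{verbatim}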
%%%% Generalization gap table
\begin{table}[!ht]
\caption{Generalization gaps of quantized and full-precision models on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets. For NiN models, each cell in the table represents the mean of the corresponding metric over 243 samples, obtained by training NiN models over variations of 5 common hyperparameters.}
\vspace{1mm}
\begin{adjustbox}{max width=\textwidth}
\begin{tabular}{cccccccc}
\hline
 & Model & Precision & Train Acc & Test Acc & Train Loss & Test Loss & Generalization \\ \hline
\multirow{4}{*}{CIFAR-10} & \multirow{4}{*}{NiN} & FP32 & 97.61 & 88.05 & 0.103 & 0.405 & 0.302 \\
 & & Int8 & 97.5 & 88.01 & 0.106 & 0.407 & 0.301 \\
 & & Int4 & 96.9 & 87.7 & 0.125 & 0.413 & 0.288 \\
 & & Int2 & 93.4 & 86.11 & 0.222 & 0.446 & 0.224 \\ \hline
\multirow{4}{*}{CIFAR-100} & \multirow{4}{*}{NiN} & FP32 & 95.28 & 63.48 & 0.207 & 1.687 & 1.48 \\
 & & Int8 & 95.17 & 63.44 & 0.211 & 1.685 & 1.469 \\
 & & Int4 & 93.5 & 63.19 & 0.271 & 1.648 & 1.38 \\
 & & Int2 & 81.21 & 62.11 & 0.676 & 1.537 & 0.859 \\ \cline{1-8}
\multirow{12}{*}{ImageNet-1K} & \multirow{4}{*}{DeiT-T}
 & FP32 & 73.75 & 71.38 & 1.38 & 2.48 & 1.1 \\
 & & Int8 & 76.3 & 75.54 & 0.99 & 1.98 & 0.98 \\
 & & Int4 & 74.71 & 72.31 & 1.08 & 2.07 & 0.99 \\
 & & Int2 & 59.73 & 55.35 & 1.83 & 2.81 & 0.98 \\ \cline{2-8}
 & \multirow{4}{*}{Swin-T}
 & FP32 & 83.39 & 80.96 & 0.516 & 1.48 & 0.964 \\
 & & Int8 & 85.21 & 82.48 & 0.756 & 1.56 & 0.80 \\
 & & Int4 & 84.82 & 82.42 & 0.764 & 1.59 & 0.82 \\
 & & Int2 & 78.76 & 77.66 & 0.941 & 1.84 & 0.89 \\ \cline{2-8}
 & \multirow{4}{*}{ResNet-18}
 & FP32 & 69.96 & 71.49 & 1.18 & 2.23 & 1.05 \\
 & & Int8 & 73.23 & 73.32 & 1.28 & 2.10 & 0.82 \\
 & & Int4 & 71.34 & 71.74 & 1.26 & 2.18 & 0.92 \\
 & & Int2 & 67.1 & 68.58 & 1.38 & 2.16 & 0.78 \\ \hline
\end{tabular}
\label{Tab:gen_gap_cifars}
\end{adjustbox}
\end{table}
\section{Analyzing Loss Landscapes in Quantized Models and Implications for Generalization}
\label{sec:gen_bound}
A low generalization gap is a desirable characteristic of deep neural networks. It is common in practice to estimate the population loss of the data distribution $\mathcal{D}$, i.e.
\begin{math}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) = \mathbb{E}_{(\boldsymbol{x}, \boldsymbol{y}) \sim \mathcal{D}}
[l(f(\boldsymbol{w}, \boldsymbol{x}), \boldsymbol{y})]
\end{math}, by utilizing the training loss $\mathcal{L}_{\mathscr{S}}(\boldsymbol{w})$ as a proxy, and then minimizing it by gradient descent-based optimizers. However, given that modern neural networks are highly over-parameterized and $\mathcal{L}_{\mathscr{S}}(\boldsymbol{w})$ is commonly non-convex in $\boldsymbol{w}$, the optimization process can converge to local or even global minima that could adversely affect the generalization of the model (i.e. with a significant gap between $\mathcal{L}_{\mathcal{D}}(\boldsymbol{w})$ and $\mathcal{L}_{\mathscr{S}}(\boldsymbol{w})$) \cite{foret2021sharpnessaware}.
Motivated by the connection between the sharpness of the loss landscape and generalization \cite{keskar2016large}, in \cite{foret2021sharpnessaware} the authors proposed the Sharpness-Aware-Minimization (SAM) technique, in which they propose to learn the weights $\boldsymbol{w}$ that result in a flat minimum with a neighborhood of low training loss values characterized by a radius $\rho$. Especially, inspired by the PAC-Bayesian generalization bounds, they were able to prove that for any $\rho > 0$, with high probability over the training dataset $\mathscr{S}$, the following inequality holds:
\begin{equation}\label{eq:sam}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) \leq
\max_{||\boldsymbol{\epsilon}||_2 \leq \rho} \mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\epsilon}) + h(||\boldsymbol{w}||_2^2/\rho^2),
\end{equation}
where $h$ is a strictly increasing function. Even though the above theorem is for the case where the $\ell_2$-norm of $\boldsymbol{\epsilon}$ is bounded by $\rho$ and adversarial perturbations are utilized to achieve the worst-case loss, the authors empirically show that in practice, other norms for the bound on $\boldsymbol{\epsilon}$ and random perturbations for $\boldsymbol{\epsilon}$ can also achieve some levels of flatness; however, they may not be as effective as the $\ell_2$-norm coupled with the adversarial perturbations.
Extending on the empirical studies of \cite{foret2021sharpnessaware}, we relax the $\ell_2$-norm condition of Equation \ref{eq:sam}, and consider the $\ell_{\infty}$-norm instead, resulting in:
\begin{equation}\label{eq:sam_infinity}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) \leq
\max_{||\boldsymbol{\epsilon}||_{\infty} \leq \rho} \mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\epsilon}) + h(||\boldsymbol{w}||_2^2/\rho^2).
\end{equation}
Furthermore, given small values of $\rho$, for any noise vector $\boldsymbol{\delta}$ such that $||\boldsymbol{\delta}||_{\infty} \leq \rho$, the following inequality holds in practice for a local minimum characterized by $\boldsymbol{w}$, as also similarly depicted in Equation \ref{eq:quantization_regularization} where the added noise corresponds to the quantization noise $\boldsymbol{\Delta}$; however, this inequality may not necessarily hold for every $\boldsymbol{w}$:
\begin{equation}\label{eq:sam_random_perturbation}
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w}) \leq
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\delta}) \leq
\max_{||\boldsymbol{\epsilon}||_{\infty} \leq \rho} \mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\epsilon}).
\end{equation}
For small values of $\rho$ close to 0, and a given $\boldsymbol{\delta}$, we can approximate,
\begin{equation}
\max_{||\boldsymbol{\epsilon}||_{\infty} \leq \rho} \mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\epsilon})
\end{equation}
in Equation \ref{eq:sam_infinity} with $\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\delta})$. As a result, for small positive values of $\rho$, we have:
\begin{equation}\label{eq:sam_delta_approximate}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) \leq
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\delta}) + h(||\boldsymbol{w}||_2^2/\rho^2),
\end{equation}
and finally, moving $\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\delta})$ to the left-hand-side in Equation \ref{eq:sam_delta_approximate} will give us:
\begin{equation}\label{eq:sam_gap_bound}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) -
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\delta}) \leq h(||\boldsymbol{w}||_2^2/\rho^2).
\end{equation}
The above inequality formulates an approximate bound, for values of $\rho$ close to 0, on the generalization gap of a model parametrized by $\boldsymbol{w}$; given the nature of the function $h$, the higher the value of $\rho$ is, the tighter the generalization bound becomes.
As shown in Section \ref{sec:quantization_noise_modelling}, for quantization techniques with a constant quantization bin width, we have $||\boldsymbol{\Delta}||_{\infty} \leq \frac{\delta}{2}$, where $\boldsymbol{\Delta}$ is the quantization noise, and $\delta$ is the width of the quantization bin. Replacing the quantization equivalent terms (i.e. setting $\boldsymbol{\delta} = \boldsymbol{\Delta}$ and $\rho = \frac{\delta}{2}$) in Equation \ref{eq:sam_gap_bound} yields:
\begin{equation}\label{eq:quantization_gap_bound}
\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) -
\mathcal{L}_{\mathscr{S}}(\boldsymbol{w} + \boldsymbol{\Delta}) \leq h(4||\boldsymbol{w}||_2^2/\delta^2).
\end{equation}
We now state the following hypothesis for quantization techniques based on Equation \ref{eq:quantization_gap_bound}:
\begin{hyp}[H\ref{hyp:first}] \label{hyp:first}
Let $\boldsymbol{w}$ be the set of weights in the model, $\hat{\boldsymbol{w}}$ be the set of quantized weights, $\delta$ be the width of the quantization bin, and $\zeta(\cdot)$ be a function that measures the sharpness of a minimum. Then:
\begin{enumerate}
% \item , and is inversely proportional to .
% \item The bigger the is the lower the generalization gap is.
\item Having a bounded $\boldsymbol{\Delta}$ with $||\boldsymbol{\Delta}||_{\infty} \leq \frac{\delta}{2}$, there exists a $\delta$ where, for a quantized model parameterized by $\hat{\boldsymbol{w}}$ obtained through QAT and a full precision model parameterized by $\boldsymbol{w}$, we have $\zeta(\hat{\boldsymbol{w}}) \leq \zeta(\boldsymbol{w})$.
\end{enumerate}
\end{hyp}
(1) implies that quantization helps the model converge to flatter minima with lower sharpness. As discussed in Section \ref{sec:quantization_noise_modelling} and illustrated in Figure \ref{fig:quant_noise_per_layer}, lower bit quantization corresponds to a higher $\delta$; therefore, lower bit-resolution quantization results in better flatness around the minima. However, as described by Equation \ref{eq:quantization_regularization}, $\delta$ acts as a hyperparameter for the induced regularization. Hence, not all quantization levels will result in flatter minima and improved generalization.
% As a result, (2) logically follows as the flatter minima indicate improved generalization \cite{keskar2016large, foret2021sharpnessaware}, hence a lower generalization gap.
In the rest of this section, we report the results of our exhaustive set of empirical studies regarding the generalization qualities of quantized models. In Section \ref{sec:flatness_gen}, for different datasets and different backbones (models), we study the flatness of the loss landscape of deep neural networks under different quantization regimes; in Section \ref{sec:cifar10_100_results}, we measure and report the generalization gap of the quantized models for a set of almost 2000 vision models; and finally, in Section \ref{sec:distortions}, using corrupted datasets, we study the real-world implications of generalization quality and how different levels of quantization perform under such scenarios.
%%%% L2 norm of the weights
\begin{table}[!h]
\centering
\caption{$\ell_2$-norm of the network weights; all the weights are first flattened into a vector and then the $\ell_2$-norm is measured. The huge difference among the values of different quantization levels indicates that the magnitude of the weights should also be considered when comparing flatness. Flatness is correlated with the scale of perturbations that the weights can bear without too much model performance degradation. NiN ($d \times k$) refers to a fully convolutional Network-in-Network \cite{lin2013network} model with a depth multiplier of $d$ and a width multiplier of $k$; the base depth is 25. }
\vspace{2mm}
% \begin{adjustbox}{max width=\textwidth}
\begin{tabular}{lcclll}
\specialrule{.1em}{.05em}{.05em}
\textbf{Dataset} & \multicolumn{1}{l}{\textbf{Model}} & \multicolumn{1}{l}{\textbf{Int2}} & \textbf{Int4} & \textbf{Int8} & \textbf{FP32} \\ \specialrule{.1em}{.05em}{.05em}
\multirow{4}{*}{\textbf{CIFAR-10}} & NiN (4x10) & \multicolumn{1}{l}{47.263} & 54.291 & 53.804 & 130.686 \\
 & NiN (4x12) & \multicolumn{1}{l}{43.039} & 46.523 & 46.750 & 73.042 \\
 & ResNet-18 & 44.264 & 48.227 & 47.368 & 59.474 \\
 & ResNet-50 & 45.011 & 238.117 & 48.149 & 97.856 \\ \specialrule{.1em}{.05em}{.05em}
\multirow{4}{*}{\textbf{CIFAR-100}} & NiN (5x10) & \multicolumn{1}{l}{60.981} & 60.707 & 60.905 & 190.414 \\
 & NiN (5x12) & 82.230 & 87.931 & 87.307 & 163.768 \\
 & ResNet-18 & 48.120 & 55.027 & 54.735 & 125.164 \\
 & ResNet-50 & 75.739 & 82.788 & 79.603 & 148.298 \\ \specialrule{.1em}{.05em}{.05em}
\multirow{2}{*}{\textbf{ImageNet-1K}} & ResNet-18 & \multicolumn{1}{l}{78.291} & 84.472 & 85.162 & 415.004 \\
 & ResNet-50 & 214.055 & 213.035 & 212.624 & 379.465 \\
\specialrule{.1em}{.05em}{.05em}
\end{tabular}
\label{Tab:weight_magnitude}
% \end{adjustbox}
\end{table}
\subsection{Flatness of Minima and Generalization}
\label{sec:flatness_gen}
In this section, we conduct experiments that demonstrate that quantized neural networks enjoy better flatness in their loss landscape compared to their full-precision counterparts; this finding is contrary to the assumption of some of the recent studies \cite{liu2021sharpness, wang2022squat}. In those studies, it is assumed that quantization results in sharper minima. We believe that the root of this assumption might be that the authors of those works have not considered the magnitude of the network weights in measuring the sharpness. However, as \cite{jiang2019fantastic} and \cite{dziugaite2020search} show, the flatness measures that take the magnitude of the parameters into account \cite{keskar2016large} are better indicators of generalization.
As Table \ref{Tab:weight_magnitude} shows, for a given backbone, the magnitude of the network parameters (calculated as the $\ell_2$-norm of the weights) is very different among different quantization levels; therefore, simply measuring the loss landscape flatness using sharpness or PAC-Bayesian bounds without considering the magnitude of the weights could be misleading.
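The numbers in Table \ref{Tab:weight_magnitude} correspond to the plain $\ell_2$-norm of all the (dequantized) weights flattened into a single vector; a minimal sketch of this measurement for a generic PyTorch module (our own illustration, not the authors' released script) is:
\begin{verbatim}
import torch
from torch.nn.utils import parameters_to_vector

def weight_l2_norm(model: torch.nn.Module) -> float:
    """Flatten every parameter into one vector and return its l2-norm."""
    with torch.no_grad():
        return parameters_to_vector(model.parameters()).norm(p=2).item()

# usage on any backbone, e.g. a torchvision ResNet-18:
# from torchvision.models import resnet18
# print(weight_l2_norm(resnet18()))
\end{verbatim}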
%%%%% The Sharpness/Flatness exps table
\begin{table}[!ht]
\caption{Sharpness-based measures for different quantization levels over 3 image datasets. The lowest values in each category correspond to the lowest complexity, which is considered to be the best (indicated by \crule[tab4_g]{0.2cm}{0.2cm}). All the values are normalized using the raw value of each measure and the size of the dataset.}
\vspace{1mm}
\begin{adjustbox}{max width=\textwidth}
\begin{tabular}{ccr|rrrr|rrrr}
% \hline
 & & & \multicolumn{4}{c|}{\textbf{PAC-Bayesian}} & \multicolumn{4}{c}{\textbf{Sharpness}} \\
% \cline{4-11}
\multirow{-2}{*}{\textbf{Dataset}} & \multirow{-2}{*}{\textbf{Model}} & \multirow{-2}{*}{\textbf{Precision}} & Init & Orig & Mag-Init & Mag-Orig & Init & Orig & Mag-Init & Mag-Orig \\ \specialrule{.1em}{.05em}{.05em}
& & FP32 & 2.264 & 2.2 & 7.635 & 7.594 & 0.589 & 0.572 & 8.219 & 8.181 \
& & Int8 & 4.204 & 3.626 & 6.435 & 6.176 & 0.292 & 0.252 & 6.853 & 6.610 \
& & Int4 & 2.482 & 2.143 & 6.419 & 6.162 & 1.444 & 1.247 & 6.826 & 6.586 \
& \multirow{-4}{}{NiN (4x10)} & Int2 & 1.588 & 1.32 & \cellcolor[HTML]{B7E1CD}6.171 & \cellcolor[HTML]{B7E1CD}5.833 & 1.152 & 0.958 & \cellcolor[HTML]{B7E1CD}{\color[HTML]{000000} 6.454} & \cellcolor[HTML]{B7E1CD}{\color[HTML]{000000} 6.131} \ \cline{2-11}
& & FP32 & 1.469 & 1.328 & 7.974 & 7.770 & 0.359 & 0.324 & 9.216 & 9.040 \
& & Int8 & 6.057 & 4.866 & 7.718 & 7.256 & 0.259 & 0.208 & 8.655 & 8.245 \
& & Int4 & 2.658 & 2.131 & 7.765 & 7.302 & 1.335 & 1.07 & 8.56 & 8.142 \
& \multirow{-4}{}{NiN (4x12)} & Int2 & 1.918 & 1.493 & \cellcolor[HTML]{B7E1CD}7.654 & \cellcolor[HTML]{B7E1CD}7.119 & 0.781 & 0.608 & \cellcolor[HTML]{B7E1CD}8.513 & \cellcolor[HTML]{B7E1CD}8.034 \ \cline{2-11}
& & FP32 & 1.186 & 1.135 & 3.659 & 3.617 & 1.447 & 1.383 & 4.399 & 4.364 \
& & Int8 & 0.893 & 0.834 & 3.355 & 3.285 & 0.426 & 0.398 & 4.112 & 4.055 \
& & Int4 & 1.786 & 1.673 & 3.291 & 3.223 & 0.433 & 0.405 & 4.037 & \cellcolor[HTML]{B7E1CD}3.981 \
& \multirow{-4}{}{ResNet-18} & Int2 & 1.368 & 1.267 & \cellcolor[HTML]{B7E1CD}3.238 & \cellcolor[HTML]{B7E1CD}3.156 & 0.819 & 0.759 & \cellcolor[HTML]{B7E1CD}{\color[HTML]{000000} 4.074} & 4.012 \ \cline{2-11}
& & FP32 & 1.803 & 1.647 & 5.304 & 5.193 & 1.237 & 1.13 & 6.303 & 6.210 \
& & Int8 & 3.911 & 2.901 & 4.729 & 4.300 & 0.988 & 0.733 & \cellcolor[HTML]{B7E1CD}5.472 & 5.106 \
& & Int4 & 8.937 & 8.793 & 6.394 & 6.377 & 3.71 & 3.65 & 6.684 & 6.669 \
\multirow{-16}{}{\textbf{CIFAR10}} & \multirow{-4}{}{ResNet-50} & Int2 & 1.991 & 1.431 & \cellcolor[HTML]{B7E1CD}4.638 & \cellcolor[HTML]{B7E1CD}4.151 & 1.926 & 1.385 & 5.499 & \cellcolor[HTML]{B7E1CD}5.094 \ \specialrule{.1em}{.05em}{.05em}
& & FP32 & 4.266 & 4.192 & 9.33 & 9.302 & 0.859 & 0.844 & 10.467 & 10.443 \
& & Int8 & 7.354 & 6.339 & 7.451 & 7.155 & 0.474 & 0.409 & 8.084 & 7.812 \
& & Int4 & 3.101 & 2.673 & 7.399 & 7.101 & 0.473 & 0.408 & 8.032 & 7.759 \
& \multirow{-4}{}{NiN (5x10)} & Int2 & 2.25 & 1.939 & \cellcolor[HTML]{B7E1CD}6.138 & \cellcolor[HTML]{B7E1CD}5.776 & 0.313 & 0.27 & \cellcolor[HTML]{B7E1CD}7.774 & \cellcolor[HTML]{B7E1CD}7.491 \ \cline{2-11}
& & FP32 & 3.505 & 3.409 & 10.958 & 10.904 & 0.777 & 0.755 & 12.041 & 11.992 \
& & Int8 & 1.712 & 1.561 & \cellcolor[HTML]{B7E1CD}9.175 & \cellcolor[HTML]{B7E1CD}8.963 & 0.582 & 0.531 & 9.956 & 9.761 \
& & Int4 & 3.934 & 3.595 & 9.274 & 9.069 & 0.581 & 0.531 & \cellcolor[HTML]{B7E1CD}9.794 & \cellcolor[HTML]{B7E1CD}9.599 \
& \multirow{-4}{}{NiN (5x12)} & Int2 & 4.343 & 3.922 & 9.479 & 9.252 & 0.557 & 0.503 & 9.828 & 9.609 \ \cline{2-11}
& & FP32 & 3.535 & 3.495 & 4.243 & 4.234 & 3.429 & 3.39 & 4.795 & 4.786 \
& & Int8 & 6.031 & 5.696 & 3.685 & 3.631 & 1.194 & 1.128 & 4.232 & 4.185 \
& & Int4 & 2.381 & 2.25 & 3.591 & 3.536 & 1.166 & 1.102 & \cellcolor[HTML]{B7E1CD}4.117 & \cellcolor[HTML]{B7E1CD}4.069 \
& \multirow{-4}{}{ResNet-18} & Int2 & 3.704 & 3.443 & \cellcolor[HTML]{B7E1CD}3.538 & \cellcolor[HTML]{B7E1CD}3.465 & 27.983 & 27.247 & 4.65 & 4.611 \ \cline{2-11}
& & FP32 & 4.396 & 4.265 & 5.918 & 5.883 & 4.732 & 4.591 & 6.797 & 6.768 \
& & Int8 & 5.583 & 4.279 & \cellcolor[HTML]{B7E1CD}4.775 & \cellcolor[HTML]{B7E1CD}4.385 & 2.445 & 1.874 & \cellcolor[HTML]{B7E1CD}5.613 & \cellcolor[HTML]{B7E1CD}5.285 \
& & Int4 & 3.076 & 2.397 & 5.273 & 4.945 & 2.386 & 1.859 & 6.809 & 6.558 \
\multirow{-16}{}{\textbf{CIFAR100}} & \multirow{-4}{}{ResNet-50} & Int2 & 29.727 & 29.531 & 5.253 & 5.247 & 37.893 & 38.124 & 8.343 & 8.339 \ \specialrule{.1em}{.05em}{.05em}
& & FP32 & 11.694 & 11.584 & 12.378 & 12.355 & 349.235 & 345.962 & 20.069 & 20.055 \
& & Int8 & 7.836 & 5.303 & 10.1 & 8.902 & 104.91 & 70.994 & 18.416 & 17.786 \
& & Int4 & 4.615 & 3.108 & \cellcolor[HTML]{B7E1CD}10.072 & \cellcolor[HTML]{B7E1CD}8.853 & 104.557 & 70.419 & 18.41 & 17.772 \
& \multirow{-4}{}{ResNet-18} & Int2 & 16.397 & 10.563 & 11.004 & 9.770 & 101.31 & 65.266 & \cellcolor[HTML]{B7E1CD}18.362 & \cellcolor[HTML]{B7E1CD}17.649 \ \cline{2-11}
& & FP32 & 7.942 & 7.144 & 22.163 & 21.826 & 5.067 & 4.556 & 27.746 & 27.418 \
& & Int8 & 20.398 & 14.344 & \cellcolor[HTML]{B7E1CD}17.597 & \cellcolor[HTML]{B7E1CD}16.272 & 11.208 & 7.881 & 20.104 & 18.995 \
& & Int4 & 35.011 & 24.637 & 17.809 & 16.503 & 258.118 & 181.636 & \cellcolor[HTML]{B7E1CD}19.162 & \cellcolor[HTML]{B7E1CD}18.833 \
& \multirow{-4}{*}{ResNet-50} & Int2 & 245.654 & 173.287 & 17.954 & 17.023 & 258.722 & 182.505 & 24.051 & 24.007 \ \cline{2-11}
& & FP32 & 8.653 & 8.123 & 19.651 & 18.226 & 7.017 & 6.753 & 31.924 & 31.5 \\
& & Int8 & 26.544 & 27.352 & 18.232 & 17.563 & 7.445 & 5.126 & 22.524 & 23.432 \\
& & Int4 & 35.786 & 33.982 & \cellcolor[HTML]{B7E1CD}17.983 & \cellcolor[HTML]{B7E1CD}16.148 & 5.927 & 4.672 & \cellcolor[HTML]{B7E1CD}20.122 & \cellcolor[HTML]{B7E1CD}23.765 \\
\multirow{-8}{*}{\textbf{ImageNet-1K}} & \multirow{-4}{*}{DeiT-T} & Int2 & 236.322 & 171.234 & 19.865 & 18.982 & 218.621 & 169.972 & 32.114 & 33.763\\ \cline{2-11}
\end{tabular}
\label{Tab:flatness_measures}
\end{adjustbox}
\end{table}
To capture the flatness around the local minima of a given network, we utilize the PAC-Bayesian bounds \cite{mcallester1999pac} and sharpness measures \cite{keskar2016large}. The former adds Gaussian perturbations to the network parameters and captures the average (expected) flatness within a bound, while the latter captures the worst-case flatness, i.e. sharpness, through adding adversarial worst-case perturbations to the parameters. We use the same formulation and implementation as specified by \cite{jiang2019fantastic} and \cite{dziugaite2020search}; in particular, similar to their approach, we measure and report these metrics by considering the trained model and the network-at-initialization parameters as the \textit{origin} and \textit{initialization} tensors, respectively. Moreover, as discussed in the above paragraph, and indicated in Table \ref{Tab:weight_magnitude}, the magnitude-aware versions of these metrics are the most reliable way of capturing the flatness of a network, hence we also report the magnitude-aware measurements, and they will be the main measure of loss landscape flatness. Details about these metrics and their formulation are in the supplementary Section \ref{appx:flatness_landscape}.
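As a rough illustration of the average-case (PAC-Bayesian-style) flatness probe, the sketch below perturbs the parameters with Gaussian noise and reports the mean increase in training loss; in the magnitude-aware variant, the per-parameter noise scale is proportional to the weight magnitude. This is a simplified sketch under our own assumptions; the exact measures follow the formulations of \cite{jiang2019fantastic}, which instead search for the largest perturbation scale that keeps the loss increase below a fixed target.
\begin{verbatim}
import copy
import torch

@torch.no_grad()
def avg_flatness(model, loss_fn, loader, sigma=0.01, trials=5, magnitude_aware=False):
    """Mean increase in dataset loss under Gaussian weight perturbations."""
    def dataset_loss(m):
        m.eval()
        total, count = 0.0, 0
        for x, y in loader:
            total += loss_fn(m(x), y).item() * len(y)
            count += len(y)
        return total / count

    base = dataset_loss(model)
    increases = []
    for _ in range(trials):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            scale = sigma * p.abs() if magnitude_aware else sigma
            p.add_(torch.randn_like(p) * scale)    # perturb each weight
        increases.append(dataset_loss(noisy) - base)
    return sum(increases) / trials
\end{verbatim}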
As shown in Table \ref{Tab:flatness_measures}, for the 3 datasets of CIFAR-10, CIFAR-100, and ImageNet-1K, and over a variety of network backbones, the quantized models enjoy flatter loss landscapes, which is an indicator of better generalization to unseen data. An important observation from the experiments reported in Table \ref{Tab:flatness_measures} is that relying solely on sharpness or PAC-Bayesian measures without considering the magnitude of the network parameters might create the impression that quantization does increase the network sharpness. We suspect that this might indeed have been the cause of this assumption in the works of \cite{liu2021sharpness, wang2022squat}, which assume worse sharpness for quantized models and then propose Sharpness-Aware-Minimization (SAM) coupled with Quantization-Aware-Training (QAT). However, our empirical studies demonstrate that when the magnitude of the parameters is taken into account, quantization does actually improve flatness, and the finding that SAM can help quantized models achieve further flatness does not necessarily mean that quantized models have sharper minima compared to their non-quantized counterparts.
\begin{figure}[!t]
\begin{picture}(0,100)
\centering
\includegraphics[width=\textwidth]{images/loss_landscape_vis.jpeg}
\put(-320,-8){Full Precision}
\put(-220,-8){8-Bit}
\put(-135,-8){4-Bit}
\put(-45,-8){2-Bit}
\end{picture}
\vspace{3mm}
\caption{Visualization of the loss landscape for the full precision and quantized ResNet-18 models trained on CIFAR-10. Referring to Table \ref{Tab:flatness_measures}, it is observed that quantized models possess flatter minima, which contributes to their enhanced generalization capabilities.}
\label{fig:loss_landscape}
\end{figure}
\subsubsection{Loss Landscape Visualization}
The loss landscape of quantized neural networks can be effectively visualized. Using the technique outlined in \cite{visualloss}, we projected the loss landscape of quantized models and the full precision ResNet-18 models trained on the CIFAR-10 dataset onto a three-dimensional plane. The visual representation, as illustrated in Figure \ref{fig:loss_landscape}, clearly demonstrates that the loss landscape associated with quantized models is comparatively flatter. This observation confirms the findings presented in Table \ref{Tab:flatness_measures}.
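The plots follow the random-direction projection technique of \cite{visualloss}; the core of the idea can be sketched as follows (a simplified version that normalizes directions per tensor rather than per filter, and assumes a helper \texttt{eval\_loss} that returns the training loss of a model):
\begin{verbatim}
import copy
import torch

@torch.no_grad()
def loss_surface(model, eval_loss, steps=21, span=1.0):
    """Training loss on a 2-D plane spanned by two normalized random directions."""
    params = list(model.parameters())

    def random_direction():
        d = [torch.randn_like(p) for p in params]
        # scale each direction tensor to the norm of the corresponding weight tensor
        return [di * (pi.norm() / (di.norm() + 1e-10)) for di, pi in zip(d, params)]

    d1, d2 = random_direction(), random_direction()
    alphas = torch.linspace(-span, span, steps)
    surface = torch.zeros(steps, steps)
    probe = copy.deepcopy(model)
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            for p, p0, u, v in zip(probe.parameters(), params, d1, d2):
                p.copy_(p0 + a * u + b * v)        # w + a*d1 + b*d2
            surface[i, j] = eval_loss(probe)
    return surface
\end{verbatim}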
\subsection{Measuring the Generalization Gap}
\label{sec:cifar10_100_results}
To study the generalization behaviors of quantized models, we have trained almost 2000 models on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Our goal is to measure $\mathcal{L}_{\mathcal{D}}(\boldsymbol{w}) - \mathcal{L}_{\mathscr{S}}(\boldsymbol{w})$, i.e., the generalization gap, by utilizing the data that is unseen during the training process (the test data). Without loss of generality, herein, we will refer to the difference between test data loss and training data loss as the generalization gap.
Following the guidelines of \cite{jiang2019fantastic} and \cite{dziugaite2020search} to remove the effect of randomness from our analysis of generalization behavior, for smaller datasets (CIFAR-10 and CIFAR-100), we construct a pool of trained models by varying 5 commonly used hyperparameters over the fully convolutional "Network-in-Network" architecture \cite{lin2013network}. The hyperparameter list includes learning rate, weight decay, optimization algorithm, architecture depth, and layer width. In our experiments, each hyperparameter has 3 choices; therefore, the number of trained models per quantization level is $3^5 = 243$, with the number of bits considered being selected from the following values: 2, 4, and 8, and the resulting models are compared with their full-precision counterparts. Thus, in total, we will have $4 \times 243 = 972$ models trained per dataset, over the CIFAR-10 and CIFAR-100 datasets, which gives us almost 2000 trained models. For more details regarding hyperparameter choices and model specifications, please refer to the supplementary material Section \ref{appx:experiment_setup}. Lastly, for ImageNet-1K, we measured the generalization gap on both CNN and ViT models.
In \cite{jiang2019fantastic}, to measure the generalization gap of a model, the authors first train the model until the training loss converges to a threshold (0.01). Here, we argue that this approach might not be optimal when quantization enters the picture. First, lower bit-resolution quantized models have lower learning capacity compared to the higher bit-resolution quantized or the full-precision ones; our proof in Equation \ref{eq:quantization_regularization} also indicates that the learning capabilities of a given network diminish as the number of quantization bits decreases. Second, early stopping of the training process may hinder the trained models from appropriately converging to flatter local minima, which quantized models enjoy in their loss landscape. Therefore, we apply a different training approach. Each model is trained for 300 epochs by lowering the learning rate by a factor of 10 at epochs 100 and 200, and at the end, the model corresponding to the lowest training loss is chosen.
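A schematic of this training-and-selection protocol (a simplified PyTorch-style sketch, not the authors' exact pipeline; the SGD hyperparameters shown are placeholders) is given below; the returned value is the generalization gap, i.e. test loss minus the best training loss.
\begin{verbatim}
import torch

def train_and_measure_gap(model, train_loader, test_loader, loss_fn, epochs=300):
    """Train with lr drops at epochs 100 and 200, keep the lowest-training-loss
    checkpoint, and return the gap = test loss - train loss."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[100, 200], gamma=0.1)

    def mean_loss(loader):
        model.eval()
        with torch.no_grad():
            losses = [loss_fn(model(x), y).item() for x, y in loader]
        return sum(losses) / len(losses)

    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        sched.step()
        train_loss = mean_loss(train_loader)
        if train_loss < best_loss:
            best_loss = train_loss
            best_state = {k: v.clone() for k, v in model.state_dict().items()}

    model.load_state_dict(best_state)
    return mean_loss(test_loader) - best_loss
\end{verbatim}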
Table \ref{Tab:gen_gap_cifars} summarizes the results of these experiments. The accuracy-generalization trade-off is demonstrated through these experiments. The training loss and training accuracy of lower-resolution quantized models are negatively impacted. However, they enjoy better generalization. Some additional interesting results can be inferred from Table \ref{Tab:gen_gap_cifars}. Notably, 8-bit quantization is almost on par with the full-precision counterpart on all the metrics. This is also evident in Table \ref{Tab:flatness_measures}, where we studied the sharpness-based measures. The other interesting observation is that although training losses vary among the models, the test loss is almost the same among all; this, in turn, indicates that full-precision and high-resolution quantized models have a higher degree of overfitting, which could result from converging to sharper local minima.
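The same gap can also be measured under input corruptions, which is what the distortion study below does (Table \ref{tab:ditortion}); a minimal sketch, assuming pre-built corrupted test loaders in the style of \cite{hendrycks2019robustness} are available in a dictionary \texttt{corrupted\_loaders} keyed by (corruption, severity) (a hypothetical structure used only for illustration):
\begin{verbatim}
import torch

def corruption_gaps(model, train_loader, corrupted_loaders, loss_fn):
    """Generalization gap per corruption: corrupted test loss minus clean train loss."""
    def mean_loss(loader):
        model.eval()
        with torch.no_grad():
            losses = [loss_fn(model(x), y).item() for x, y in loader]
        return sum(losses) / len(losses)

    train_loss = mean_loss(train_loader)
    return {key: mean_loss(loader) - train_loss
            for key, loader in corrupted_loaders.items()}
\end{verbatim}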
\begin{table}[!h]
text\caption{Effect of distortion on generalization gap with quantized models. Compared to FP32 column, we have highlighted better generalization gap with \crule[tab4_g]{0.2cm}{0.2cm} and \crule[tab4_r]{0.2cm}{0.2cm} to show the opposite.} \begin{adjustbox}{max width=\textwidth} % \label{tab_appx:ditortion_imagenet} \label{tab:ditortion} \begin{tabular}{lrrrrrrrrrrrrrrrrrrrrr} \hline \multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{4}{c}{\textbf{Severity 1}} & \multicolumn{4}{c}{\textbf{Severity 2}} & \multicolumn{4}{c}{\textbf{Severity 3}} & \multicolumn{4}{c}{Severity 4} & \multicolumn{4}{c}{Severity 5} \\ \cline{3-22} \multicolumn{1}{c}{\multirow{-2}{*}{\textbf{Model}}} & \multicolumn{1}{c}{\multirow{-2}{*}{\textbf{Augmentation}}} & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 \\ \hline & Gaussian Noise & 1.067 & \cellcolor[HTML]{B7E1CD}0.86 & \cellcolor[HTML]{B7E1CD}0.939 & \cellcolor[HTML]{EA9999}1.21 & 1.913 & \cellcolor[HTML]{B7E1CD}1.46 & \cellcolor[HTML]{B7E1CD}1.629 & \cellcolor[HTML]{EA9999}2.201 & 3.439 & \cellcolor[HTML]{B7E1CD}2.52 & \cellcolor[HTML]{B7E1CD}2.796 & \cellcolor[HTML]{EA9999}3.658 & 5.529 & \cellcolor[HTML]{B7E1CD}3.96 & \cellcolor[HTML]{B7E1CD}4.352 & \cellcolor[HTML]{B7E1CD}5.115 & 7.653 & \cellcolor[HTML]{B7E1CD}5.62 & \cellcolor[HTML]{B7E1CD}6.117 & \cellcolor[HTML]{B7E1CD}6.093 \\ & Shot Noise & 1.238 & \cellcolor[HTML]{B7E1CD}0.96 & \cellcolor[HTML]{B7E1CD}1.054 & \cellcolor[HTML]{EA9999}1.333 & 2.289 & \cellcolor[HTML]{B7E1CD}1.72 & \cellcolor[HTML]{B7E1CD}1.887 & \cellcolor[HTML]{EA9999}2.421 & 3.801 & \cellcolor[HTML]{B7E1CD}2.75 & \cellcolor[HTML]{B7E1CD}2.987 & \cellcolor[HTML]{B7E1CD}3.685 & 6.438 & \cellcolor[HTML]{B7E1CD}4.4 & \cellcolor[HTML]{B7E1CD}4.739 & \cellcolor[HTML]{B7E1CD}5.304 & 7.919 & \cellcolor[HTML]{B7E1CD}5.41 & \cellcolor[HTML]{B7E1CD}5.805 & \cellcolor[HTML]{B7E1CD}6.003 \\ & Impulse Noise & 2.235 & \cellcolor[HTML]{B7E1CD}1.78 & \cellcolor[HTML]{B7E1CD}2.058 & \cellcolor[HTML]{EA9999}2.324 & 3.177 & \cellcolor[HTML]{B7E1CD}2.35 & \cellcolor[HTML]{B7E1CD}2.636 & \cellcolor[HTML]{EA9999}3.279 & 4.061 & \cellcolor[HTML]{B7E1CD}2.9 & \cellcolor[HTML]{B7E1CD}3.185 & \cellcolor[HTML]{B7E1CD}4.001 & 6.096 & \cellcolor[HTML]{B7E1CD}4.26 & \cellcolor[HTML]{B7E1CD}4.595 & \cellcolor[HTML]{B7E1CD}5.324 & 7.781 & \cellcolor[HTML]{B7E1CD}5.57 & \cellcolor[HTML]{B7E1CD}5.995 & \cellcolor[HTML]{B7E1CD}6.114 \\ & Defocus Noise & 0.979 & \cellcolor[HTML]{B7E1CD}0.89 & \cellcolor[HTML]{B7E1CD}0.858 & \cellcolor[HTML]{B7E1CD}0.822 & 1.432 & \cellcolor[HTML]{B7E1CD}1.37 & \cellcolor[HTML]{B7E1CD}1.325 & \cellcolor[HTML]{B7E1CD}1.302 & 2.394 & \cellcolor[HTML]{B7E1CD}2.34 & \cellcolor[HTML]{B7E1CD}2.263 & \cellcolor[HTML]{B7E1CD}2.217 & 3.285 & \cellcolor[HTML]{B7E1CD}3.19 & \cellcolor[HTML]{B7E1CD}3.099 & \cellcolor[HTML]{B7E1CD}2.934 & 3.983 & \cellcolor[HTML]{B7E1CD}3.92 & \cellcolor[HTML]{B7E1CD}3.808 & \cellcolor[HTML]{B7E1CD}3.498 \\ & Glass Blue & 1.18 & \cellcolor[HTML]{B7E1CD}1.07 & \cellcolor[HTML]{B7E1CD}1.031 & \cellcolor[HTML]{B7E1CD}0.985 & 1.969 & \cellcolor[HTML]{B7E1CD}1.86 & \cellcolor[HTML]{B7E1CD}1.812 & \cellcolor[HTML]{B7E1CD}1.804 & 3.822 & \cellcolor[HTML]{EA9999}3.83 & \cellcolor[HTML]{B7E1CD}3.727 & \cellcolor[HTML]{B7E1CD}3.54 & 4.221 & \cellcolor[HTML]{EA9999}4.25 & \cellcolor[HTML]{B7E1CD}4.15 & \cellcolor[HTML]{B7E1CD}3.909 & 4.652 & \cellcolor[HTML]{B7E1CD}4.62 & \cellcolor[HTML]{B7E1CD}4.542 & 
\cellcolor[HTML]{B7E1CD}4.139 \\ & Motion Blur & 0.687 & \cellcolor[HTML]{B7E1CD}0.55 & \cellcolor[HTML]{B7E1CD}0.542 & \cellcolor[HTML]{B7E1CD}0.509 & 1.37 & \cellcolor[HTML]{B7E1CD}1.22 & \cellcolor[HTML]{B7E1CD}1.245 & \cellcolor[HTML]{B7E1CD}1.261 & 2.521 & \cellcolor[HTML]{B7E1CD}2.42 & \cellcolor[HTML]{B7E1CD}2.454 & \cellcolor[HTML]{B7E1CD}2.381 & 3.7 & \cellcolor[HTML]{B7E1CD}3.64 & \cellcolor[HTML]{B7E1CD}3.678 & \cellcolor[HTML]{B7E1CD}3.412 & 4.285 & \cellcolor[HTML]{B7E1CD}4.23 & \cellcolor[HTML]{B7E1CD}4.275 & \cellcolor[HTML]{B7E1CD}3.875 \\ & Zoom Blur & 1.518 & \cellcolor[HTML]{B7E1CD}1.38 & \cellcolor[HTML]{B7E1CD}1.382 & \cellcolor[HTML]{B7E1CD}1.386 & 2.219 & \cellcolor[HTML]{B7E1CD}2.1 & \cellcolor[HTML]{B7E1CD}2.119 & \cellcolor[HTML]{B7E1CD}2.094 & 2.688 & \cellcolor[HTML]{B7E1CD}2.58 & \cellcolor[HTML]{B7E1CD}2.588 & \cellcolor[HTML]{B7E1CD}2.518 & 3.213 & \cellcolor[HTML]{B7E1CD}3.12 & \cellcolor[HTML]{B7E1CD}3.137 & \cellcolor[HTML]{B7E1CD}3.028 & 3.666 & \cellcolor[HTML]{B7E1CD}3.59 & \cellcolor[HTML]{B7E1CD}3.599 & \cellcolor[HTML]{B7E1CD}3.437 \\ & Snow & 1.401 & \cellcolor[HTML]{B7E1CD}1.01 & \cellcolor[HTML]{B7E1CD}0.998 & \cellcolor[HTML]{B7E1CD}1.124 & 3.215 & \cellcolor[HTML]{B7E1CD}2.36 & \cellcolor[HTML]{B7E1CD}2.374 & \cellcolor[HTML]{B7E1CD}2.643 & 2.969 & \cellcolor[HTML]{B7E1CD}2.07 & \cellcolor[HTML]{B7E1CD}2.094 & \cellcolor[HTML]{B7E1CD}2.315 & 3.97 & \cellcolor[HTML]{B7E1CD}2.81 & \cellcolor[HTML]{B7E1CD}2.869 & \cellcolor[HTML]{B7E1CD}3.074 & 4.515 & \cellcolor[HTML]{B7E1CD}3.48 & \cellcolor[HTML]{B7E1CD}3.517 & \cellcolor[HTML]{B7E1CD}3.572 \\ & Frost & 0.949 & \cellcolor[HTML]{B7E1CD}0.66 & \cellcolor[HTML]{B7E1CD}0.626 & \cellcolor[HTML]{B7E1CD}0.633 & 2.093 & \cellcolor[HTML]{B7E1CD}1.68 & \cellcolor[HTML]{B7E1CD}1.681 & \cellcolor[HTML]{B7E1CD}1.812 & 2.978 & \cellcolor[HTML]{B7E1CD}2.53 & \cellcolor[HTML]{B7E1CD}2.553 & \cellcolor[HTML]{B7E1CD}2.688 & 3.141 & \cellcolor[HTML]{B7E1CD}2.74 & \cellcolor[HTML]{B7E1CD}2.766 & \cellcolor[HTML]{B7E1CD}2.893 & 3.713 & \cellcolor[HTML]{B7E1CD}3.31 & \cellcolor[HTML]{B7E1CD}3.362 & \cellcolor[HTML]{B7E1CD}3.447 \\ & Fog & 0.809 & \cellcolor[HTML]{B7E1CD}0.42 & \cellcolor[HTML]{B7E1CD}0.444 & \cellcolor[HTML]{B7E1CD}0.405 & 1.214 & \cellcolor[HTML]{B7E1CD}0.68 & \cellcolor[HTML]{B7E1CD}0.734 & \cellcolor[HTML]{B7E1CD}0.774 & 1.857 & \cellcolor[HTML]{B7E1CD}1.18 & \cellcolor[HTML]{B7E1CD}1.273 & \cellcolor[HTML]{B7E1CD}1.431 & 2.347 & \cellcolor[HTML]{B7E1CD}1.68 & \cellcolor[HTML]{B7E1CD}1.762 & \cellcolor[HTML]{B7E1CD}1.958 & 3.77 & \cellcolor[HTML]{B7E1CD}3.03 & \cellcolor[HTML]{B7E1CD}3.141 & \cellcolor[HTML]{B7E1CD}3.275 \\ & Brightness & 0.121 & \cellcolor[HTML]{B7E1CD}0.04 & \cellcolor[HTML]{B7E1CD}0.019 & \cellcolor[HTML]{EA9999}0.155 & 0.221 & \cellcolor[HTML]{B7E1CD}0.1 & \cellcolor[HTML]{B7E1CD}0.08 & \cellcolor[HTML]{B7E1CD}0.062 & 0.378 & \cellcolor[HTML]{B7E1CD}0.19 & \cellcolor[HTML]{B7E1CD}0.184 & \cellcolor[HTML]{B7E1CD}0.084 & 0.631 & \cellcolor[HTML]{B7E1CD}0.37 & \cellcolor[HTML]{B7E1CD}0.36 & \cellcolor[HTML]{B7E1CD}0.323 & 0.986 & \cellcolor[HTML]{B7E1CD}0.62 & \cellcolor[HTML]{B7E1CD}0.626 & \cellcolor[HTML]{B7E1CD}0.672 \\ & Contrast & 0.523 & \cellcolor[HTML]{B7E1CD}0.24 & \cellcolor[HTML]{B7E1CD}0.232 & \cellcolor[HTML]{B7E1CD}0.13 & 0.867 & \cellcolor[HTML]{B7E1CD}0.4 & \cellcolor[HTML]{B7E1CD}0.413 & \cellcolor[HTML]{B7E1CD}0.396 & 1.627 & \cellcolor[HTML]{B7E1CD}0.81 & \cellcolor[HTML]{B7E1CD}0.861 & \cellcolor[HTML]{B7E1CD}1.031 & 3.61 & \cellcolor[HTML]{B7E1CD}2.37 & 
\cellcolor[HTML]{B7E1CD}2.529 & \cellcolor[HTML]{B7E1CD}2.921 & 5.264 & \cellcolor[HTML]{B7E1CD}4.63 & \cellcolor[HTML]{B7E1CD}4.765 & \cellcolor[HTML]{B7E1CD}4.479 \\ & Elastic & 0.538 & \cellcolor[HTML]{B7E1CD}0.43 & \cellcolor[HTML]{B7E1CD}0.406 & \cellcolor[HTML]{B7E1CD}0.287 & 2.026 & \cellcolor[HTML]{B7E1CD}1.95 & \cellcolor[HTML]{B7E1CD}1.911 & \cellcolor[HTML]{B7E1CD}1.833 & 1.116 & \cellcolor[HTML]{B7E1CD}1.03 & \cellcolor[HTML]{B7E1CD}0.969 & \cellcolor[HTML]{B7E1CD}0.884 & 1.997 & \cellcolor[HTML]{B7E1CD}1.94 & \cellcolor[HTML]{B7E1CD}1.844 & \cellcolor[HTML]{B7E1CD}1.755 & 4.112 & \cellcolor[HTML]{B7E1CD}4.11 & \cellcolor[HTML]{B7E1CD}3.957 & \cellcolor[HTML]{B7E1CD}3.57 \\ & Pixelate & 0.612 & \cellcolor[HTML]{B7E1CD}0.5 & \cellcolor[HTML]{B7E1CD}0.492 & \cellcolor[HTML]{B7E1CD}0.416 & 0.599 & \cellcolor[HTML]{B7E1CD}0.51 & \cellcolor[HTML]{B7E1CD}0.506 & \cellcolor[HTML]{B7E1CD}0.465 & 1.889 & \cellcolor[HTML]{B7E1CD}1.72 & \cellcolor[HTML]{B7E1CD}1.734 & \cellcolor[HTML]{EA9999}1.958 & 3.046 & \cellcolor[HTML]{B7E1CD}2.93 & \cellcolor[HTML]{B7E1CD}2.88 & \cellcolor[HTML]{EA9999}3.306 & 3.369 & \cellcolor[HTML]{B7E1CD}3.32 & \cellcolor[HTML]{B7E1CD}3.313 & \cellcolor[HTML]{EA9999}3.51 \\ \multirow{-15}{*}{ResNet-18} & JPEG & 0.59 & \cellcolor[HTML]{B7E1CD}0.48 & \cellcolor[HTML]{B7E1CD}0.468 & \cellcolor[HTML]{B7E1CD}0.375 & 0.801 & \cellcolor[HTML]{B7E1CD}0.68 & \cellcolor[HTML]{B7E1CD}0.674 & \cellcolor[HTML]{B7E1CD}0.627 & 0.972 & \cellcolor[HTML]{B7E1CD}0.85 & \cellcolor[HTML]{B7E1CD}0.841 & \cellcolor[HTML]{B7E1CD}0.824 & 1.599 & \cellcolor[HTML]{B7E1CD}1.46 & \cellcolor[HTML]{B7E1CD}1.446 & \cellcolor[HTML]{B7E1CD}1.491 & 2.615 & \cellcolor[HTML]{B7E1CD}2.43 & \cellcolor[HTML]{B7E1CD}2.405 & \cellcolor[HTML]{B7E1CD}2.487 \\ \hline & Gaussian Noise & 1.041 & \cellcolor[HTML]{B7E1CD}0.76 & \cellcolor[HTML]{B7E1CD}0.857 & \cellcolor[HTML]{EA9999}2.78 & 1.923 & \cellcolor[HTML]{B7E1CD}1.382 & \cellcolor[HTML]{B7E1CD}1.536 & \cellcolor[HTML]{EA9999}3.755 & 3.425 & \cellcolor[HTML]{B7E1CD}2.5 & \cellcolor[HTML]{B7E1CD}2.762 & \cellcolor[HTML]{EA9999}5.009 & 5.251 & \cellcolor[HTML]{B7E1CD}4.065 & \cellcolor[HTML]{B7E1CD}4.518 & \cellcolor[HTML]{EA9999}6.182 & 6.997 & \cellcolor[HTML]{B7E1CD}5.815 & \cellcolor[HTML]{B7E1CD}6.423 & \cellcolor[HTML]{EA9999}7.124 \\ & Shot Noise & 1.132 & \cellcolor[HTML]{B7E1CD}0.843 & \cellcolor[HTML]{B7E1CD}1.027 & \cellcolor[HTML]{EA9999}2.926 & 2.214 & \cellcolor[HTML]{B7E1CD}1.591 & \cellcolor[HTML]{B7E1CD}1.846 & \cellcolor[HTML]{EA9999}3.975 & 3.67 & \cellcolor[HTML]{B7E1CD}2.624 & \cellcolor[HTML]{B7E1CD}3.013 & \cellcolor[HTML]{EA9999}5.058 & 5.891 & \cellcolor[HTML]{B7E1CD}4.363 & \cellcolor[HTML]{B7E1CD}5.011 & \cellcolor[HTML]{EA9999}6.36 & 7.045 & \cellcolor[HTML]{B7E1CD}5.418 & \cellcolor[HTML]{B7E1CD}6.137 & \cellcolor[HTML]{B7E1CD}6.96 \\ & Impulse Noise & 1.635 & \cellcolor[HTML]{B7E1CD}1.483 & \cellcolor[HTML]{B7E1CD}1.585 & \cellcolor[HTML]{EA9999}3.043 & 2.597 & \cellcolor[HTML]{B7E1CD}2.223 & \cellcolor[HTML]{B7E1CD}2.284 & \cellcolor[HTML]{EA9999}4.144 & 3.423 & \cellcolor[HTML]{B7E1CD}2.751 & \cellcolor[HTML]{B7E1CD}2.901 & \cellcolor[HTML]{EA9999}4.961 & 5.302 & \cellcolor[HTML]{B7E1CD}4.171 & \cellcolor[HTML]{B7E1CD}4.591 & \cellcolor[HTML]{EA9999}6.296 & 6.979 & \cellcolor[HTML]{B7E1CD}5.753 & \cellcolor[HTML]{B7E1CD}6.315 & \cellcolor[HTML]{EA9999}7.126 \\ & Defocus Noise & 0.863 & \cellcolor[HTML]{B7E1CD}0.799 & \cellcolor[HTML]{EA9999}1.005 & \cellcolor[HTML]{EA9999}3.856 & 1.326 & \cellcolor[HTML]{B7E1CD}1.266 & 
\cellcolor[HTML]{EA9999}1.519 & \cellcolor[HTML]{EA9999}4.286 & 2.23 & \cellcolor[HTML]{B7E1CD}2.213 & \cellcolor[HTML]{EA9999}2.434 & \cellcolor[HTML]{EA9999}4.858 & 3.059 & \cellcolor[HTML]{B7E1CD}2.983 & \cellcolor[HTML]{EA9999}3.328 & \cellcolor[HTML]{EA9999}5.119 & 3.784 & \cellcolor[HTML]{B7E1CD}3.655 & \cellcolor[HTML]{EA9999}4.2 & \cellcolor[HTML]{EA9999}5.303 \\ & Glass Blue & 1.223 & \cellcolor[HTML]{B7E1CD}1.141 & \cellcolor[HTML]{EA9999}1.509 & \cellcolor[HTML]{EA9999}3.538 & 2.115 & \cellcolor[HTML]{B7E1CD}2.039 & \cellcolor[HTML]{EA9999}2.443 & \cellcolor[HTML]{EA9999}4.257 & 4.01 & \cellcolor[HTML]{B7E1CD}4.003 & \cellcolor[HTML]{EA9999}4.309 & \cellcolor[HTML]{EA9999}4.943 & 4.375 & \cellcolor[HTML]{B7E1CD}4.353 & \cellcolor[HTML]{EA9999}4.564 & \cellcolor[HTML]{EA9999}5.039 & 4.668 & \cellcolor[HTML]{B7E1CD}4.601 & \cellcolor[HTML]{EA9999}4.709 & \cellcolor[HTML]{EA9999}5.141 \\ & Motion Blur & 0.643 & \cellcolor[HTML]{B7E1CD}0.53 & \cellcolor[HTML]{B7E1CD}0.641 & \cellcolor[HTML]{EA9999}3.068 & 1.335 & \cellcolor[HTML]{B7E1CD}1.209 & \cellcolor[HTML]{EA9999}1.354 & \cellcolor[HTML]{EA9999}3.768 & 2.392 & \cellcolor[HTML]{B7E1CD}2.282 & \cellcolor[HTML]{EA9999}2.435 & \cellcolor[HTML]{EA9999}4.349 & 3.5 & \cellcolor[HTML]{B7E1CD}3.418 & \cellcolor[HTML]{EA9999}3.588 & \cellcolor[HTML]{EA9999}4.774 & 4.108 & \cellcolor[HTML]{B7E1CD}4.028 & \cellcolor[HTML]{EA9999}4.235 & \cellcolor[HTML]{EA9999}4.983 \\ & Zoom Blur & 1.539 & \cellcolor[HTML]{B7E1CD}1.423 & \cellcolor[HTML]{EA9999}1.607 & \cellcolor[HTML]{EA9999}3.564 & 2.282 & \cellcolor[HTML]{B7E1CD}2.185 & \cellcolor[HTML]{EA9999}2.358 & \cellcolor[HTML]{EA9999}3.97 & 2.774 & \cellcolor[HTML]{B7E1CD}2.685 & \cellcolor[HTML]{EA9999}2.886 & \cellcolor[HTML]{EA9999}4.335 & 3.317 & \cellcolor[HTML]{B7E1CD}3.263 & \cellcolor[HTML]{EA9999}3.492 & \cellcolor[HTML]{EA9999}4.567 & 3.797 & \cellcolor[HTML]{B7E1CD}3.738 & \cellcolor[HTML]{EA9999}4.035 & \cellcolor[HTML]{EA9999}4.817 \\ & Snow & 1.253 & \cellcolor[HTML]{B7E1CD}0.904 & \cellcolor[HTML]{B7E1CD}1.168 & \cellcolor[HTML]{EA9999}2.377 & 3.074 & \cellcolor[HTML]{B7E1CD}2.429 & \cellcolor[HTML]{B7E1CD}2.694 & \cellcolor[HTML]{EA9999}4.017 & 2.838 & \cellcolor[HTML]{B7E1CD}2.16 & \cellcolor[HTML]{B7E1CD}2.497 & \cellcolor[HTML]{EA9999}3.869 & 3.775 & \cellcolor[HTML]{B7E1CD}2.945 & \cellcolor[HTML]{B7E1CD}3.286 & \cellcolor[HTML]{EA9999}4.797 & 4.562 & \cellcolor[HTML]{B7E1CD}3.776 & \cellcolor[HTML]{B7E1CD}3.996 & \cellcolor[HTML]{EA9999}5.151 \\ & Frost & 0.941 & \cellcolor[HTML]{B7E1CD}0.658 & \cellcolor[HTML]{B7E1CD}0.8 & \cellcolor[HTML]{EA9999}2.236 & 2.193 & \cellcolor[HTML]{B7E1CD}1.784 & \cellcolor[HTML]{B7E1CD}2.021 & \cellcolor[HTML]{EA9999}3.713 & 3.154 & \cellcolor[HTML]{B7E1CD}2.673 & \cellcolor[HTML]{B7E1CD}2.969 & \cellcolor[HTML]{EA9999}4.682 & 3.345 & \cellcolor[HTML]{B7E1CD}2.904 & \cellcolor[HTML]{B7E1CD}3.223 & \cellcolor[HTML]{EA9999}4.94 & 3.956 & \cellcolor[HTML]{B7E1CD}3.484 & \cellcolor[HTML]{B7E1CD}3.835 & \cellcolor[HTML]{EA9999}5.46 \\ & Fog & 0.699 & \cellcolor[HTML]{B7E1CD}0.354 & \cellcolor[HTML]{EA9999}0.822 & \cellcolor[HTML]{EA9999}3.874 & 1.084 & \cellcolor[HTML]{B7E1CD}0.624 & \cellcolor[HTML]{EA9999}1.24 & \cellcolor[HTML]{EA9999}4.454 & 1.715 & \cellcolor[HTML]{B7E1CD}1.145 & \cellcolor[HTML]{EA9999}1.802 & \cellcolor[HTML]{EA9999}4.929 & 2.256 & \cellcolor[HTML]{B7E1CD}1.675 & \cellcolor[HTML]{B7E1CD}2.111 & \cellcolor[HTML]{EA9999}4.969 & 3.792 & \cellcolor[HTML]{B7E1CD}3.116 & \cellcolor[HTML]{B7E1CD}3.298 & 
\cellcolor[HTML]{EA9999}5.371 \\ & Brightness & 0.034 & \cellcolor[HTML]{EA9999}0.05 & \cellcolor[HTML]{B7E1CD}0.019 & \cellcolor[HTML]{EA9999}1.342 & 0.143 & \cellcolor[HTML]{B7E1CD}0.008 & \cellcolor[HTML]{B7E1CD}0.089 & \cellcolor[HTML]{EA9999}1.359 & 0.315 & \cellcolor[HTML]{B7E1CD}0.117 & \cellcolor[HTML]{B7E1CD}0.211 & \cellcolor[HTML]{EA9999}1.549 & 0.595 & \cellcolor[HTML]{B7E1CD}0.303 & \cellcolor[HTML]{B7E1CD}0.422 & \cellcolor[HTML]{EA9999}1.987 & 1.002 & \cellcolor[HTML]{B7E1CD}0.591 & \cellcolor[HTML]{B7E1CD}0.752 & \cellcolor[HTML]{EA9999}2.663 \\ & Contrast & 0.482 & \cellcolor[HTML]{B7E1CD}0.188 & \cellcolor[HTML]{EA9999}0.816 & \cellcolor[HTML]{EA9999}3.717 & 0.846 & \cellcolor[HTML]{B7E1CD}0.394 & \cellcolor[HTML]{EA9999}1.513 & \cellcolor[HTML]{EA9999}4.571 & 1.624 & \cellcolor[HTML]{B7E1CD}0.899 & \cellcolor[HTML]{EA9999}3.095 & \cellcolor[HTML]{EA9999}5.556 & 3.66 & \cellcolor[HTML]{B7E1CD}2.725 & \cellcolor[HTML]{EA9999}5.816 & \cellcolor[HTML]{EA9999}6.396 & 5.411 & \cellcolor[HTML]{B7E1CD}4.881 & \cellcolor[HTML]{EA9999}6.505 & \cellcolor[HTML]{EA9999}6.59 \\ & Elastic & 0.442 & \cellcolor[HTML]{B7E1CD}0.347 & \cellcolor[HTML]{EA9999}0.481 & \cellcolor[HTML]{EA9999}2.396 & 1.973 & \cellcolor[HTML]{B7E1CD}1.871 & \cellcolor[HTML]{EA9999}2.105 & \cellcolor[HTML]{EA9999}4.012 & 0.962 & \cellcolor[HTML]{B7E1CD}0.87 & \cellcolor[HTML]{EA9999}1.113 & \cellcolor[HTML]{EA9999}2.474 & 1.913 & \cellcolor[HTML]{B7E1CD}1.807 & \cellcolor[HTML]{EA9999}2.21 & \cellcolor[HTML]{EA9999}2.959 & 4.106 & \cellcolor[HTML]{B7E1CD}3.982 & \cellcolor[HTML]{EA9999}4.693 & \cellcolor[HTML]{B7E1CD}4.036 \\ & Pixelate & 0.926 & \cellcolor[HTML]{B7E1CD}0.653 & \cellcolor[HTML]{B7E1CD}0.872 & \cellcolor[HTML]{EA9999}1.883 & 1.444 & \cellcolor[HTML]{B7E1CD}1.02 & \cellcolor[HTML]{B7E1CD}0.934 & \cellcolor[HTML]{EA9999}1.838 & 2.155 & \cellcolor[HTML]{B7E1CD}1.822 & \cellcolor[HTML]{EA9999}2.468 & \cellcolor[HTML]{EA9999}2.172 & 3.111 & \cellcolor[HTML]{B7E1CD}3.064 & \cellcolor[HTML]{EA9999}3.773 & \cellcolor[HTML]{B7E1CD}2.755 & 3.979 & \cellcolor[HTML]{EA9999}3.993 & \cellcolor[HTML]{EA9999}3.988 & \cellcolor[HTML]{B7E1CD}3.301 \\ \multirow{-15}{*}{MoblieNet V2} & JPEG & 0.491 & \cellcolor[HTML]{B7E1CD}0.382 & \cellcolor[HTML]{EA9999}0.554 & \cellcolor[HTML]{EA9999}1.754 & 0.675 & \cellcolor[HTML]{B7E1CD}0.552 & \cellcolor[HTML]{EA9999}0.784 & \cellcolor[HTML]{EA9999}1.848 & 0.826 & \cellcolor[HTML]{B7E1CD}0.693 & \cellcolor[HTML]{EA9999}0.967 & \cellcolor[HTML]{EA9999}1.928 & 1.357 & \cellcolor[HTML]{B7E1CD}1.165 & \cellcolor[HTML]{EA9999}1.555 & \cellcolor[HTML]{EA9999}2.182 & 2.182 & \cellcolor[HTML]{B7E1CD}1.902 & \cellcolor[HTML]{EA9999}2.462 & \cellcolor[HTML]{EA9999}2.545 \\ \hline & Gaussian Noise & 0.938 & \cellcolor[HTML]{b7e1cd}0.914 & \cellcolor[HTML]{b7e1cd}0.928 & \cellcolor[HTML]{ea9999}0.973 &1.437 & \cellcolor[HTML]{b7e1cd}1.282 & \cellcolor[HTML]{b7e1cd}1.112 & \cellcolor[HTML]{ea9999}1.571 &2.363 & \cellcolor[HTML]{b7e1cd}2.047 & \cellcolor[HTML]{ea9999}2.513 & \cellcolor[HTML]{ea9999}2.89 &3.719 & \cellcolor[HTML]{b7e1cd}3.255 & \cellcolor[HTML]{ea9999}3.754 & \cellcolor[HTML]{ea9999}4.88 &5.134 & \cellcolor[HTML]{b7e1cd}4.999 & \cellcolor[HTML]{ea9999}5.339 & \cellcolor[HTML]{ea9999}7.828 \\ & Shot Noise & 0.961 & \cellcolor[HTML]{b7e1cd}0.946 & \cellcolor[HTML]{b7e1cd}0.957 & \cellcolor[HTML]{ea9999}1.026 &1.585 & \cellcolor[HTML]{b7e1cd}1.408 & \cellcolor[HTML]{b7e1cd}1.246 & \cellcolor[HTML]{ea9999}1.83 &2.448 & \cellcolor[HTML]{b7e1cd}2.166 & 
\cellcolor[HTML]{b7e1cd}2.023 & \cellcolor[HTML]{ea9999}3.215 &4.084 & \cellcolor[HTML]{b7e1cd}3.697 & \cellcolor[HTML]{b7e1cd}3.887 & \cellcolor[HTML]{ea9999}5.748 &4.924 & \cellcolor[HTML]{b7e1cd}4.919 & \cellcolor[HTML]{ea9999}5.229 & \cellcolor[HTML]{ea9999}7.656 \\ & Impulse Noise & 1.703 & \cellcolor[HTML]{b7e1cd}1.652 & \cellcolor[HTML]{b7e1cd}1.676 & \cellcolor[HTML]{ea9999}1.789 &2.013 & \cellcolor[HTML]{b7e1cd}1.874 & \cellcolor[HTML]{b7e1cd}1.464 & \cellcolor[HTML]{ea9999}2.373 &2.564 & \cellcolor[HTML]{b7e1cd}2.295 & \cellcolor[HTML]{b7e1cd}1.995 & \cellcolor[HTML]{ea9999}3.211 &3.962 & \cellcolor[HTML]{b7e1cd}3.507 & \cellcolor[HTML]{b7e1cd}3.458 & \cellcolor[HTML]{ea9999}5.435 &5.28 & \cellcolor[HTML]{b7e1cd}4.942 & \cellcolor[HTML]{b7e1cd}5.105 & \cellcolor[HTML]{ea9999}8.123 \\ & Defocus Noise & 1.059 & \cellcolor[HTML]{b7e1cd}1.042 & \cellcolor[HTML]{b7e1cd}0.911 & \cellcolor[HTML]{b7e1cd}0.869 &1.441 & \cellcolor[HTML]{b7e1cd}1.414 & \cellcolor[HTML]{b7e1cd}1.309 & \cellcolor[HTML]{b7e1cd}1.298 &2.344 & \cellcolor[HTML]{b7e1cd}2.311 & \cellcolor[HTML]{b7e1cd}2.286 & \cellcolor[HTML]{b7e1cd}2.231 &3.244 & \cellcolor[HTML]{b7e1cd}3.225 & \cellcolor[HTML]{b7e1cd}3.226 & \cellcolor[HTML]{b7e1cd}3.164 &4.052 & \cellcolor[HTML]{b7e1cd}4.049 & \cellcolor[HTML]{b7e1cd}4.019 & \cellcolor[HTML]{b7e1cd}3.994 \\ & Glass Blue & 1.349 & \cellcolor[HTML]{b7e1cd}1.27 & \cellcolor[HTML]{b7e1cd}1.011 & \cellcolor[HTML]{ea9999}1.457 &2.297 & \cellcolor[HTML]{b7e1cd}2.169 & \cellcolor[HTML]{b7e1cd}1.829 & \cellcolor[HTML]{b7e1cd}2.088 &4.613 & \cellcolor[HTML]{b7e1cd}4.551 & \cellcolor[HTML]{b7e1cd}4.185 & \cellcolor[HTML]{b7e1cd}4.346 &5.057 & \cellcolor[HTML]{b7e1cd}5.009 & \cellcolor[HTML]{b7e1cd}4.815 & \cellcolor[HTML]{b7e1cd}4.793 &5.399 & \cellcolor[HTML]{b7e1cd}5.376 & \cellcolor[HTML]{b7e1cd}5.34 & \cellcolor[HTML]{b7e1cd}5.102 \\ & Motion Blur & 0.731 & \cellcolor[HTML]{b7e1cd}0.638 & \cellcolor[HTML]{b7e1cd}0.623 & \cellcolor[HTML]{b7e1cd}0.578 &1.314 & \cellcolor[HTML]{b7e1cd}1.307 & \cellcolor[HTML]{b7e1cd}1.19 & \cellcolor[HTML]{b7e1cd}1.238 &2.563 & \cellcolor[HTML]{b7e1cd}2.551 & \cellcolor[HTML]{b7e1cd}2.337 & \cellcolor[HTML]{b7e1cd}2.501 &4.148 & \cellcolor[HTML]{b7e1cd}4.057 & \cellcolor[HTML]{b7e1cd}3.672 & \cellcolor[HTML]{b7e1cd}3.96 &5.048 & \cellcolor[HTML]{b7e1cd}5.033 & \cellcolor[HTML]{b7e1cd}4.404 & \cellcolor[HTML]{b7e1cd}4.75 \\ & Zoom Blur & 1.509 & \cellcolor[HTML]{b7e1cd}1.473 & \cellcolor[HTML]{b7e1cd}1.261 & \cellcolor[HTML]{b7e1cd}1.361 &2.252 & \cellcolor[HTML]{b7e1cd}2.187 & \cellcolor[HTML]{b7e1cd}2.037 & \cellcolor[HTML]{b7e1cd}2.134 &2.822 & \cellcolor[HTML]{b7e1cd}2.736 & \cellcolor[HTML]{b7e1cd}2.571 & \cellcolor[HTML]{b7e1cd}2.697 &3.44 & \cellcolor[HTML]{b7e1cd}3.337 & \cellcolor[HTML]{b7e1cd}3.139 & \cellcolor[HTML]{b7e1cd}3.325 &4.039 & \cellcolor[HTML]{b7e1cd}3.949 & \cellcolor[HTML]{b7e1cd}3.676 & \cellcolor[HTML]{b7e1cd}3.916 \\ & Snow & 1.229 & \cellcolor[HTML]{b7e1cd}1.13 & \cellcolor[HTML]{b7e1cd}1.048 & \cellcolor[HTML]{b7e1cd}1.143 &2.62 & \cellcolor[HTML]{b7e1cd}2.529 & \cellcolor[HTML]{b7e1cd}2.481 & \cellcolor[HTML]{ea9999}2.933 &2.375 & \cellcolor[HTML]{b7e1cd}2.317 & \cellcolor[HTML]{b7e1cd}2.193 & \cellcolor[HTML]{ea9999}2.478 &3.127 & \cellcolor[HTML]{b7e1cd}3.016 & \cellcolor[HTML]{b7e1cd}2.941 & \cellcolor[HTML]{ea9999}3.347 &3.697 & \cellcolor[HTML]{b7e1cd}3.437 & \cellcolor[HTML]{ea9999}3.8 & \cellcolor[HTML]{ea9999}4.209 \\ & Frost & 0.845 & \cellcolor[HTML]{b7e1cd}0.837 & \cellcolor[HTML]{b7e1cd}0.674 & 
\cellcolor[HTML]{b7e1cd}0.653 &1.769 & \cellcolor[HTML]{b7e1cd}1.726 & \cellcolor[HTML]{b7e1cd}1.506 & \cellcolor[HTML]{ea9999}1.785 &2.563 & \cellcolor[HTML]{b7e1cd}2.522 & \cellcolor[HTML]{b7e1cd}2.245 & \cellcolor[HTML]{ea9999}2.588 &2.761 & \cellcolor[HTML]{b7e1cd}2.72 & \cellcolor[HTML]{b7e1cd}2.435 & \cellcolor[HTML]{ea9999}2.816 &3.322 & \cellcolor[HTML]{b7e1cd}3.291 & \cellcolor[HTML]{b7e1cd}2.972 & \cellcolor[HTML]{ea9999}3.427 \\ & Fog & 0.691 & \cellcolor[HTML]{b7e1cd}0.685 & \cellcolor[HTML]{b7e1cd}0.582 & \cellcolor[HTML]{b7e1cd}0.501 &0.897 & \cellcolor[HTML]{b7e1cd}0.876 & \cellcolor[HTML]{b7e1cd}0.784 & \cellcolor[HTML]{ea9999}0.926 &1.305 & \cellcolor[HTML]{b7e1cd}1.248 & \cellcolor[HTML]{b7e1cd}1.167 & \cellcolor[HTML]{ea9999}1.337 &1.81 & \cellcolor[HTML]{b7e1cd}1.657 & \cellcolor[HTML]{b7e1cd}1.635 & \cellcolor[HTML]{ea9999}1.909 &3.261 & \cellcolor[HTML]{b7e1cd}2.96 & \cellcolor[HTML]{b7e1cd}2.979 & \cellcolor[HTML]{ea9999}3.569 \\ & Brightness & 0.345 & \cellcolor[HTML]{b7e1cd}0.25 & \cellcolor[HTML]{b7e1cd}0.21 & \cellcolor[HTML]{b7e1cd}0.081 &0.382 & \cellcolor[HTML]{b7e1cd}0.314 & \cellcolor[HTML]{b7e1cd}0.259 & \cellcolor[HTML]{ea9999}0.431 &0.453 & \cellcolor[HTML]{b7e1cd}0.446 & \cellcolor[HTML]{b7e1cd}0.341 & \cellcolor[HTML]{b7e1cd}0.222 &0.584 & \cellcolor[HTML]{b7e1cd}0.514 & \cellcolor[HTML]{b7e1cd}0.473 & \cellcolor[HTML]{b7e1cd}0.367 &0.787 & \cellcolor[HTML]{b7e1cd}0.716 & \cellcolor[HTML]{b7e1cd}0.674 & \cellcolor[HTML]{b7e1cd}0.609 \\ & Contrast & 0.545 & \cellcolor[HTML]{b7e1cd}0.449 & \cellcolor[HTML]{b7e1cd}0.42 & \cellcolor[HTML]{b7e1cd}0.308 &0.69 & \cellcolor[HTML]{b7e1cd}0.677 & \cellcolor[HTML]{b7e1cd}0.568 & \cellcolor[HTML]{b7e1cd}0.494 &1.047 & \cellcolor[HTML]{b7e1cd}0.998 & \cellcolor[HTML]{b7e1cd}0.867 & \cellcolor[HTML]{ea9999}1.051 &2.387 & \cellcolor[HTML]{b7e1cd}2.173 & \cellcolor[HTML]{b7e1cd}1.85 & \cellcolor[HTML]{ea9999}2.624 &4.686 & \cellcolor[HTML]{b7e1cd}4.394 & \cellcolor[HTML]{b7e1cd}3.673 & \cellcolor[HTML]{ea9999}4.914 \\ & Elastic & 0.655 & \cellcolor[HTML]{b7e1cd}0.625 & \cellcolor[HTML]{b7e1cd}0.517 & \cellcolor[HTML]{b7e1cd}0.422 &2.274 & \cellcolor[HTML]{b7e1cd}2.243 & \cellcolor[HTML]{b7e1cd}1.908 & \cellcolor[HTML]{b7e1cd}2.122 &1.602 & \cellcolor[HTML]{b7e1cd}1.571 & \cellcolor[HTML]{b7e1cd}1.242 & \cellcolor[HTML]{b7e1cd}1.416 &2.704 & \cellcolor[HTML]{b7e1cd}2.671 & \cellcolor[HTML]{b7e1cd}2.209 & \cellcolor[HTML]{b7e1cd}2.591 &5.598 & \cellcolor[HTML]{b7e1cd}5.522 & \cellcolor[HTML]{b7e1cd}4.647 & \cellcolor[HTML]{b7e1cd}5.348 \\ & Pixelate & 0.871 & \cellcolor[HTML]{b7e1cd}0.71 & \cellcolor[HTML]{b7e1cd}0.727 & \cellcolor[HTML]{b7e1cd}0.684 &1.042 & \cellcolor[HTML]{b7e1cd}1.021 & \cellcolor[HTML]{b7e1cd}1.015 & \cellcolor[HTML]{b7e1cd}0.778 &1.971 & \cellcolor[HTML]{b7e1cd}1.575 & \cellcolor[HTML]{b7e1cd}1.556 & \cellcolor[HTML]{ea9999}1.974 &3.373 & \cellcolor[HTML]{b7e1cd}3.229 & \cellcolor[HTML]{b7e1cd}3.208 & \cellcolor[HTML]{ea9999}3.431 &4.038 & \cellcolor[HTML]{b7e1cd}4.014 & \cellcolor[HTML]{b7e1cd}3.996 & \cellcolor[HTML]{ea9999}4.179 \\ \multirow{-15}{*}{ResNet-50} & JPEG & 0.793 & \cellcolor[HTML]{b7e1cd}0.701 & \cellcolor[HTML]{b7e1cd}0.67 & \cellcolor[HTML]{b7e1cd}0.522 &0.957 & \cellcolor[HTML]{b7e1cd}0.98 & \cellcolor[HTML]{b7e1cd}0.832 & \cellcolor[HTML]{b7e1cd}0.704 &1.092 & \cellcolor[HTML]{b7e1cd}1.023 & \cellcolor[HTML]{b7e1cd}0.96 & \cellcolor[HTML]{b7e1cd}0.849 &1.537 & \cellcolor[HTML]{b7e1cd}1.425 & \cellcolor[HTML]{b7e1cd}1.356 & \cellcolor[HTML]{b7e1cd}1.394 &2.236 & 
\cellcolor[HTML]{b7e1cd}2.118 & \cellcolor[HTML]{b7e1cd}1.967 & \cellcolor[HTML]{b7e1cd}2.228 \\ \hline \end{tabular} \end{adjustbox} \end{table}
\subsection{Generalization Under Distorted Data}
\label{sec:distortions}
In addition to the generalization metrics assessed in the previous sections, we investigate a real-world implication of the generalization quality of quantized models: their behavior when the input images are distorted or corrupted by common types of perturbations. We take advantage of the comprehensive benchmark provided in \cite{hendrycks2019robustness}, which identifies 15 types of common distortions and measures model performance under different levels of severity for each distortion. Table \ref{tab:ditortion} presents the generalization gaps, calculated as the difference between the loss on the corrupted dataset and the loss on the training data, for 5 levels of severity for ResNet-18 and ResNet-50 trained on ImageNet-1K. Evaluating on these corrupted test sets exposes the models to additional unseen data for assessing generalization. As these experiments show, quantized models maintain their superior generalization under most of the distortions. Accuracies of the models on the distorted datasets, results and discussions on more architectures and datasets, and details on how we conducted these experiments are available in the supplementary material, Section \ref{appx:distortion_setup}.
\section{Conclusion}
In this work, we investigated the generalization properties of quantized neural networks, a characteristic that has received limited attention despite its significant impact on model performance. We demonstrated that quantization has a regularization effect and leads to improved generalization. We also showed empirically that quantization can facilitate convergence to flatter minima. Lastly, on distorted data, we provided empirical evidence that quantized models generalize better than their full-precision counterparts across a variety of experimental setups. We hope the findings of this study allow the inherent generalization capabilities of quantized models to be leveraged to further improve their performance.
\bibliography{main}
\bibliographystyle{tmlr}
\newpage
\appendix
\section{Flatness Landscape}
\label{appx:flatness_landscape}
The PAC-Bayesian and sharpness generalization measures both build on PAC-Bayes bounds, which bound the generalization error of a predictor (i.e., a neural network). In our case, the PAC-Bayes bound is a function of the KL divergence between the prior and posterior distributions of the model parameters, where the prior is chosen without knowledge of the dataset and the posterior is a perturbation of the trained parameters. It has been shown that when both distributions are isotropic Gaussians, PAC-Bayesian bounds are a good measure of generalization in small-scale experiments. We refer the reader to \cite{jiang2019fantastic} for detailed analysis and derivations, which we summarize here. The PAC-Bayes generalization measures are defined below:
\begin{align}\label{eq:pacbayes}
\mu_{\text{pac-bayes-init}} (f_{\boldsymbol{w}}) &= \frac{||\boldsymbol{w} - \boldsymbol{w}^{0}||_2^2}{4 \sigma^{2}} + \log(\frac{m}{\sigma}) + 10 \\
\mu_{\text{pac-bayes-orig}} (f_{\boldsymbol{w}}) &= \frac{||\boldsymbol{w}||_2^2}{4 \sigma^{2}} + \log(\frac{m}{\delta}) + 10
\end{align}
where $\sigma$ is chosen to be the largest number such that the training loss of the perturbed network stays within a fixed margin of the training loss of the trained network (see \cite{jiang2019fantastic} for the exact criterion), and $m$ is the sample size of the dataset.
From the same PAC-Bayesian bound framework, we can also derive the sharpness measures by using worst-case noise rather than Gaussian-sampled noise.
\begin{align}\label{eq:sharpness}
\mu_{\text{sharpness-init}} (f_{\boldsymbol{w}}) &= \frac{||\boldsymbol{w} - \boldsymbol{w}^{0}||_2^2 \log(2 \omega)}{4 \alpha^{2}} + \log(\frac{m}{\sigma}) + 10 \\
\mu_{\text{sharpness-orig}} (f_{\boldsymbol{w}}) &= \frac{||\boldsymbol{w}||_2^2 \log(2 \omega)}{4 \alpha^{2}} + \log(\frac{m}{\delta}) + 10
\end{align}
where $\alpha$ is chosen to be the largest worst-case perturbation magnitude such that the training loss of the perturbed network stays within the same margin, and $\omega$ is the number of parameters in the model.
For magnitude-aware measures \cite{keskar2016large}, the ratio of the magnitude of the perturbation to the magnitude of the parameter is bounded by a constant ($\sigma'$ for the PAC-Bayesian measures and $\alpha'$ for the sharpness measures below). By bounding the ratio of the perturbation to the parameter magnitude, we prevent parameters from changing signs. This change leads to the following magnitude-aware generalization measures:
\begin{align}
\mu_{\text{pac-bayes-mag-init}} (f_{\boldsymbol{w}}) &=
\frac{1}{4} \sum_{i=1}^{\omega}{\log\left( \frac{\epsilon^2 + (\sigma'^2 + 1) ||\boldsymbol{w} - \boldsymbol{w}^{0}||_2^2 / \omega}
{\epsilon^2 + \sigma'^2 |w_i - w_i^0|^2} \right)} +
\log(\frac{m}{\delta}) + 10 \\
\mu_{\text{pac-bayes-mag-orig}} (f_{\boldsymbol{w}}) &=
\frac{1}{4} \sum_{i=1}^{\omega}{\log\left( \frac{\epsilon^2 + (\sigma'^2 + 1) ||\boldsymbol{w}||_2^2 / \omega}
{\epsilon^2 + \sigma'^2 |w_i - w_i^0|^2} \right)} +
\log(\frac{m}{\delta}) + 10
\end{align}
\begin{align}
\mu_{\text{sharpness-mag-init}} (f_{\boldsymbol{w}}) &=
\frac{1}{4} \sum_{i=1}^{\omega}{\log\left( \frac{\epsilon^2 + \left(\alpha'^2 + 4 \log(2 \omega / \delta)\right) ||\boldsymbol{w} - \boldsymbol{w}^{0}||_2^2 / \omega}
{\epsilon^2 + \alpha'^2 |w_i - w_i^0|^2} \right)} +
\log(\frac{m}{\delta}) + 10 \\
\mu_{\text{sharpness-mag-orig}} (f_{\boldsymbol{w}}) &=
\frac{1}{4} \sum_{i=1}^{\omega}{\log\left( \frac{\epsilon^2 + \left(\alpha'^2 + 4 \log(2 \omega / \delta)\right) ||\boldsymbol{w}||_2^2 / \omega}
{\epsilon^2 + \alpha'^2 |w_i - w_i^0|^2} \right)} +
\log(\frac{m}{\delta}) + 10
\end{align}
where $\epsilon$ is a small constant, and $\sigma'$ (respectively $\alpha'$) is chosen to be the largest number such that the training loss of the perturbed network stays within the same margin as above.
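As a concrete illustration, the following minimal NumPy sketch (not our actual implementation) evaluates two of the closed-form measures above for a flattened weight vector; the perturbation scales $\sigma$ and $\sigma'$, the sample size $m$, the confidence parameter $\delta$, and the constant $\epsilon$ are assumed to be given, with $\sigma$ and $\sigma'$ obtained in practice from the search procedure described in Section \ref{appx:experiment_setup}:
\begin{verbatim}
import numpy as np

def pac_bayes_orig(w, sigma, m, delta):
    # mu_pac-bayes-orig = ||w||_2^2 / (4 sigma^2) + log(m / delta) + 10
    return np.sum(w ** 2) / (4.0 * sigma ** 2) + np.log(m / delta) + 10.0

def pac_bayes_mag_orig(w, w0, sigma_p, eps, m, delta):
    # Magnitude-aware variant: per-parameter log-ratio, following the
    # definition above (sigma_p stands for sigma').
    omega = w.size
    num = eps ** 2 + (sigma_p ** 2 + 1.0) * np.sum(w ** 2) / omega
    den = eps ** 2 + sigma_p ** 2 * (w - w0) ** 2
    return 0.25 * np.sum(np.log(num / den)) + np.log(m / delta) + 10.0

# Example with random weights (illustrative values only):
w = np.random.randn(10000) * 0.05
w0 = np.random.randn(10000) * 0.05
print(pac_bayes_orig(w, sigma=0.1, m=50000, delta=0.05))
print(pac_bayes_mag_orig(w, w0, sigma_p=0.1, eps=1e-3, m=50000, delta=0.05))
\end{verbatim}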
\section{Experiment Setup For Measuring Sharpness-based Metrics}
\label{appx:experiment_setup}
\subsection{Training Setup}
We used several models and datasets to compute the generalization gap using the proxy metrics described in Section \ref{appx:flatness_landscape}.
Our experiments employed the LSQ method \cite{Esser2020LEARNED} for weight quantization, and we evaluated on the CIFAR-10, CIFAR-100, and ImageNet datasets. We applied three distinct quantization levels for the quantized models: 2, 4, and 8 bits. The CIFAR-10 and CIFAR-100 NiN models have a base width of 25 and are trained for 300 epochs with an SGD optimizer, an initial learning rate of 0.1, a momentum of 0.9, and a weight decay of 0.0001. We use a multi-step scheduler with steps at epochs 100 and 200 and a gamma of 0.1. The ResNet models used for these two datasets have a base width of 16 and use the same optimizer as the NiN network; however, they are trained for 200 epochs, with scheduler steps at epochs 80 and 160. The ResNet models used for comparing sharpness-based measures on the ImageNet dataset have a base width of 64. We again use the same optimizer, but with a learning rate of 0.01; we fine-tune these models from PyTorch pre-trained weights for 120 epochs, with scheduler steps at epochs 30, 60, and 90.
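For reference, a minimal PyTorch-style sketch of the CIFAR training recipe above is shown below; \texttt{model} and \texttt{train\_loader} are placeholders, and the LSQ weight quantizers, data augmentation, and device handling are omitted:
\begin{verbatim}
import torch
import torch.nn.functional as F

def train_cifar(model, train_loader, epochs=300, milestones=(100, 200)):
    # SGD with lr 0.1, momentum 0.9, weight decay 1e-4, and multi-step
    # decay (gamma 0.1), matching the NiN recipe described above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=list(milestones), gamma=0.1)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
\end{verbatim}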
\subsection{Measuring the Metrics}
To measure the PAC-Bayesian and sharpness metrics, we compute both the standard and magnitude-aware variants for each quantization level. In each case, we run the search for the maximum amount of admissible noise ($\sigma$ or $\alpha$) for 15 iterations, and within each iteration we average the accuracy on the training data over 10 runs to remove the effect of randomness. As an additional step when calculating the sharpness measures, we perform 20 iterations of gradient ascent to maximize the loss value, using an SGD optimizer with a learning rate of 0.0001.
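A minimal sketch of this search is given below, assuming hypothetical data-loader and model objects; the search interval and the accuracy-drop target are placeholders, and the additional 20-step gradient ascent used for the sharpness measures is omitted for brevity:
\begin{verbatim}
import copy
import torch

def train_accuracy(model, loader):
    # Fraction of correctly classified training samples.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total

def perturbed_accuracy(model, loader, sigma, n_draws=10):
    # Average accuracy over n_draws Gaussian weight perturbations of scale sigma.
    accs = []
    for _ in range(n_draws):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * sigma)
        accs.append(train_accuracy(noisy, loader))
    return sum(accs) / len(accs)

def search_sigma(model, loader, drop_target, lo=0.0, hi=1.0, iters=15):
    # Binary search (15 iterations) for the largest sigma whose average
    # accuracy drop on the training data stays within drop_target.
    clean = train_accuracy(model, loader)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if clean - perturbed_accuracy(model, loader, mid) <= drop_target:
            lo = mid
        else:
            hi = mid
    return lo
\end{verbatim}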
\subsection{Measuring Generalization Gaps}
In our experiments for measuring the generalization gaps, we trained almost 2000 CIFAR-10 and CIFAR-100 models. The main backbone in all of these experiments was NiN. We trained the models over a grid of 5 hyperparameters, each with 3 choices. For CIFAR-10, the hyperparameter values were:
\begin{itemize}
\item Optimizer algorithm: {SGD, ADAM, RMSProp}
\item Learning rate: {0.1, 0.05, 0.01} for SGD, {0.001, 0.0005, 0.0001} for ADAM and RMSProp
\item Weight decay: {0.0, 0.0001, 0.0002}
\item Width multiplier: {8, 10, 12}
\item Depth multiplier: {2, 3, 4}
\end{itemize}
For CIFAR-100, the setup is identical except that the depth multipliers are chosen from the set {3, 4, 5}.
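This grid yields $3^5 = 243$ configurations per dataset; assuming each configuration is trained once at full precision and once per quantization level (2, 4, and 8 bits), the sweep amounts to roughly $243 \times 2 \times 4 = 1944$ trainings across the two datasets, consistent with the almost 2000 models reported above.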
Each NiN instance is trained for 300 epochs with a step scheduler (steps at epochs 100 and 200, gamma of 0.1). The checkpoint with the lowest training loss is selected without using any information about the test data, and the statistics in Table \ref{Tab:gen_gap_cifars} are then generated.
\subsection{Computation Requirements}
To train the NiN models for each quantization level, we use one NVIDIA A100 GPU with a batch size of 128. Each experiment takes almost 6 days to run, which corresponds to roughly 35 minutes per model on average. We use 8 GPUs in total: 4 for CIFAR-10 and 4 for CIFAR-100.
For evaluating the sharpness measures, the main bottleneck is the ImageNet models, as evaluating the sharpness measures for each quantization level requires almost 600 evaluations on the training data in the worst case. Running each quantization level on one NVIDIA A100 GPU takes 33 hours on average.
\section{Distortion Experiments}
\label{appx:distortion_setup}
This section provides extended results on the generalization gap under distortion. We report the generalization gap of quantized and full-precision models on the augmented datasets.
\subsection{Training Setup}
For full-precision models, we used the pre-trained models publicly available on the PyTorch website \cite{Pytorch_Models}.
For quantized models, we use LSQ \cite{Esser2020LEARNED} weight quantization. We use the CIFAR-100 and ImageNet datasets in our tests, with three quantization levels for the quantized models: 2, 4, and 8 bits. We use a multi-step scheduler with steps at epochs 30, 60, and 90, an initial learning rate of 0.01, and a gamma of 0.1, together with a weight decay of 1e-4 and an SGD optimizer. All models were trained for 120 epochs. Finally, we used the pre-trained models from PyTorch to initialize the weights for LSQ quantization.
\subsection{Data Preparation}
For the augmented datasets, we use the corrupted ImageNet-C and CIFAR-100-C datasets proposed in \cite{hendrycks2019robustness}.
Table \ref{tab_appx:ditortion_imagenet} presents the results of the experiments performed on the ResNet-18, MobileNet V2, and ResNet-50 models trained on the
ImageNet dataset, and Table \ref{tab_appx:ditortion_cifar100} presents the results for the ResNet-18, MobileNet V1, and VGG-19 models on CIFAR-100. These tables show the effect of distortion on the generalization gap of quantized models across various distortion types and severity levels.
Specifically, 15 different types of distortions were applied. For each distortion type, the generalization gap was computed as the difference between the test loss on the distorted dataset and the loss on the original (undistorted) training dataset.
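A minimal sketch of this computation is shown below; \texttt{train\_loader} and \texttt{corrupted\_loader} are hypothetical placeholders standing in for our actual training and corrupted-evaluation pipelines:
\begin{verbatim}
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_loss(model, loader):
    # Average cross-entropy loss of the model over a data loader.
    model.eval()
    total, count = 0.0, 0
    for images, labels in loader:
        total += F.cross_entropy(model(images), labels, reduction="sum").item()
        count += labels.size(0)
    return total / count

def distortion_generalization_gap(model, train_loader, corrupted_loader):
    # Gap = loss on the corrupted split minus loss on the original training data.
    return mean_loss(model, corrupted_loader) - mean_loss(model, train_loader)
\end{verbatim}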
\subsection{Computation Setup}
For these experiments, we used 8 NVIDIA A100 GPUs with 40 GB of memory each to train the ImageNet and CIFAR-100 models. With this training hardware and the submitted code, each ImageNet model takes 18 hours to train on average. CIFAR-100 models take much less time; on average, each model can be trained in less than an hour.
\subsection{Results on CIFAR-100 Dataset }
For the CIFAR-100 dataset, we computed the generalization gap (the difference between the test loss on the augmented CIFAR-100-C data and the training loss on the original CIFAR-100 dataset) for ResNet-18, MobileNet-V1, and VGG-19. Unlike the ImageNet-C dataset, the CIFAR-100-C dataset comes with only one distortion severity level. Table \ref{tab_appx:ditortion_cifar100} shows the results of our experiments on the CIFAR-100 dataset. Compared to the full-precision models, the quantized models show smaller generalization gaps in all cases.
\begin{table}[!h]
\caption{Full list of experiments showing the effect of distortion on the generalization gap of quantized models on the CIFAR-100 dataset. Compared to the FP32 column, a smaller (better) generalization gap is highlighted with \crule[tab4_g]{0.2cm}{0.2cm} and a larger (worse) one with \crule[tab4_r]{0.2cm}{0.2cm}.}
\begin{adjustbox}{max width=\textwidth}
\label{tab_appx:ditortion_cifar100}
\begin{tabular}{rrrrrrrrrrrrr}
\cline{2-13}
\multicolumn{1}{l}{\textbf{}} & \multicolumn{4}{c}{\textbf{ResNet-18}} & \multicolumn{4}{c}{\textbf{MobileNet-V1}} & \multicolumn{4}{c}{\textbf{VGG-19}} \\ \hline
Augmentation & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 \\ \hline
Gaussian Noise & 2.782 & \cellcolor[HTML]{B7E1CD}1.546 & \cellcolor[HTML]{B7E1CD}1.614 & \cellcolor[HTML]{B7E1CD}1.225 & 4.327 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.027 & \cellcolor[HTML]{B7E1CD}0.191 & 4.26 & \cellcolor[HTML]{B7E1CD}1.687 & \cellcolor[HTML]{B7E1CD}0.91 & \cellcolor[HTML]{B7E1CD}0.497 \\
Shot Noise & 2.027 & \cellcolor[HTML]{B7E1CD}1.336 & \cellcolor[HTML]{B7E1CD}1.4 & \cellcolor[HTML]{B7E1CD}1.041 & 3.357 & \cellcolor[HTML]{B7E1CD}0.023 & \cellcolor[HTML]{B7E1CD}0.773 & \cellcolor[HTML]{B7E1CD}0.147 & 3.267 & \cellcolor[HTML]{B7E1CD}1.299 & \cellcolor[HTML]{B7E1CD}0.701 & \cellcolor[HTML]{B7E1CD}0.427 \\
Impulse Noise & 2.004 & \cellcolor[HTML]{B7E1CD}1.155 & \cellcolor[HTML]{B7E1CD}1.252 & \cellcolor[HTML]{B7E1CD}1.004 & 2.921 & \cellcolor[HTML]{B7E1CD}0.023 & \cellcolor[HTML]{B7E1CD}0.633 & \cellcolor[HTML]{B7E1CD}0.156 & 4.205 & \cellcolor[HTML]{B7E1CD}1.369 & \cellcolor[HTML]{B7E1CD}0.964 & \cellcolor[HTML]{B7E1CD}0.563 \\
Defocus Blur & 1.004 & \cellcolor[HTML]{B7E1CD}0.752 & \cellcolor[HTML]{B7E1CD}0.815 & \cellcolor[HTML]{B7E1CD}0.598 & 1.889 & \cellcolor[HTML]{B7E1CD}0.023 & \cellcolor[HTML]{B7E1CD}0.479 & \cellcolor[HTML]{B7E1CD}0.057 & 1.89 & \cellcolor[HTML]{B7E1CD}0.501 & \cellcolor[HTML]{B7E1CD}0.401 & \cellcolor[HTML]{B7E1CD}0.267 \\
Glass Blur & 4.649 & \cellcolor[HTML]{B7E1CD}2.445 & \cellcolor[HTML]{B7E1CD}2.479 & \cellcolor[HTML]{B7E1CD}1.889 & 5.849 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.186 & \cellcolor[HTML]{B7E1CD}0.23 & 8.709 & \cellcolor[HTML]{B7E1CD}2.369 & \cellcolor[HTML]{B7E1CD}1.457 & \cellcolor[HTML]{B7E1CD}0.797 \\
Motion Blur & 1.373 & \cellcolor[HTML]{B7E1CD}0.923 & \cellcolor[HTML]{B7E1CD}0.971 & \cellcolor[HTML]{B7E1CD}0.706 & 2.386 & \cellcolor[HTML]{B7E1CD}0.025 & \cellcolor[HTML]{B7E1CD}1.001 & \cellcolor[HTML]{B7E1CD}0.067 & 2.413 & \cellcolor[HTML]{B7E1CD}0.687 & \cellcolor[HTML]{B7E1CD}0.54 & \cellcolor[HTML]{B7E1CD}0.335 \\
Zoom Blur & 1.49 & \cellcolor[HTML]{B7E1CD}0.915 & \cellcolor[HTML]{B7E1CD}0.979 & \cellcolor[HTML]{B7E1CD}0.736 & 2.657 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.224 & \cellcolor[HTML]{B7E1CD}0.068 & 2.579 & \cellcolor[HTML]{B7E1CD}0.788 & \cellcolor[HTML]{B7E1CD}0.581 & \cellcolor[HTML]{B7E1CD}0.351 \\
Snow & 1.409 & \cellcolor[HTML]{B7E1CD}0.769 & \cellcolor[HTML]{B7E1CD}0.853 & \cellcolor[HTML]{B7E1CD}0.666 & 2.433 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.26 & \cellcolor[HTML]{B7E1CD}0.085 & 2.653 & \cellcolor[HTML]{B7E1CD}0.709 & \cellcolor[HTML]{B7E1CD}0.445 & \cellcolor[HTML]{B7E1CD}0.256 \\
Frost & 1.473 & \cellcolor[HTML]{B7E1CD}0.487 & \cellcolor[HTML]{B7E1CD}0.581 & \cellcolor[HTML]{B7E1CD}0.4 & 2.465 & \cellcolor[HTML]{B7E1CD}0.023 & \cellcolor[HTML]{B7E1CD}0.245 & \cellcolor[HTML]{B7E1CD}0.003 & 2.506 & \cellcolor[HTML]{B7E1CD}0.439 & \cellcolor[HTML]{B7E1CD}0.115 & \cellcolor[HTML]{B7E1CD}0.005 \\
Fog & 0.991 & \cellcolor[HTML]{B7E1CD}0.505 & \cellcolor[HTML]{B7E1CD}0.583 & \cellcolor[HTML]{B7E1CD}0.409 & 1.858 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.056 & \cellcolor[HTML]{B7E1CD}0.007 & 1.88 & \cellcolor[HTML]{B7E1CD}0.338 & \cellcolor[HTML]{B7E1CD}0.238 & \cellcolor[HTML]{B7E1CD}0.145 \\
Brightness & 0.996 & \cellcolor[HTML]{B7E1CD}0.543 & \cellcolor[HTML]{B7E1CD}0.609 & \cellcolor[HTML]{B7E1CD}0.435 & 1.869 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.048 & \cellcolor[HTML]{B7E1CD}0.01 & 1.878 & \cellcolor[HTML]{B7E1CD}0.349 & \cellcolor[HTML]{B7E1CD}0.229 & \cellcolor[HTML]{B7E1CD}0.112 \\
Contrast & 1.018 & \cellcolor[HTML]{B7E1CD}0.532 & \cellcolor[HTML]{B7E1CD}0.618 & \cellcolor[HTML]{B7E1CD}0.434 & 1.905 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.081 & \cellcolor[HTML]{B7E1CD}0.015 & 1.907 & \cellcolor[HTML]{B7E1CD}0.365 & \cellcolor[HTML]{B7E1CD}0.269 & \cellcolor[HTML]{B7E1CD}0.179 \\
Elastic & 1.415 & \cellcolor[HTML]{B7E1CD}0.996 & \cellcolor[HTML]{B7E1CD}1.048 & \cellcolor[HTML]{B7E1CD}0.828 & 2.493 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.055 & \cellcolor[HTML]{B7E1CD}0.113 & 2.507 & \cellcolor[HTML]{B7E1CD}0.8 & \cellcolor[HTML]{B7E1CD}0.646 & \cellcolor[HTML]{B7E1CD}0.429 \\
Pixelate & 1.185 & \cellcolor[HTML]{B7E1CD}0.888 & \cellcolor[HTML]{B7E1CD}0.982 & \cellcolor[HTML]{B7E1CD}0.767 & 2.171 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.336 & \cellcolor[HTML]{B7E1CD}0.087 & 2.136 & \cellcolor[HTML]{B7E1CD}0.713 & \cellcolor[HTML]{B7E1CD}0.479 & \cellcolor[HTML]{B7E1CD}0.33 \\
JPEG & 1.7 & \cellcolor[HTML]{B7E1CD}1.217 & \cellcolor[HTML]{B7E1CD}1.313 & \cellcolor[HTML]{B7E1CD}1.002 & 2.75 & \cellcolor[HTML]{B7E1CD}0.022 & \cellcolor[HTML]{B7E1CD}0.309 & \cellcolor[HTML]{B7E1CD}0.119 & 2.868 & \cellcolor[HTML]{B7E1CD}1.12 & \cellcolor[HTML]{B7E1CD}0.703 & \cellcolor[HTML]{B7E1CD}0.427 \\ \hline
\end{tabular}
\end{adjustbox}
\end{table}
\subsection{Results on ImageNet Dataset }
For the ImageNet dataset, we computed the generalization gap (the difference between the test loss on the augmented data and the training loss on the original ImageNet dataset) for ResNet-18, MobileNet V2, and ResNet-50.
Table \ref{tab_appx:ditortion_imagenet} shows the full list of experiments. Unlike on the CIFAR-100 dataset, not all quantization levels improve the generalization gap relative to the full-precision models. In particular, for MobileNet V2, Int2 quantization shows the worst generalization gap for most distortion types and severity levels. In general, however, Int8 and Int4 show better generalization gaps across almost all models, distortion types, and severity levels.
\begin{table}[!h]
\caption{Full list of experiments showing the effect of distortion on the generalization gap of quantized models on the ImageNet dataset. Compared to the FP32 column, a smaller (better) generalization gap is highlighted with \crule[tab4_g]{0.2cm}{0.2cm} and a larger (worse) one with \crule[tab4_r]{0.2cm}{0.2cm}.}
\begin{adjustbox}{max width=\textwidth}
\label{tab_appx:ditortion_imagenet}
text\begin{tabular}{lrrrrrrrrrrrrrrrrrrrrr} \hline \multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{4}{c}{\textbf{Severity 1}} & \multicolumn{4}{c}{\textbf{Severity 2}} & \multicolumn{4}{c}{\textbf{Severity 3}} & \multicolumn{4}{c}{Severity 4} & \multicolumn{4}{c}{Severity 5} \\ \cline{3-22} \multicolumn{1}{c}{\multirow{-2}{*}{\textbf{Model}}} & \multicolumn{1}{c}{\multirow{-2}{*}{\textbf{Augmentation}}} & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 & FP32 & Int8 & Int4 & Int2 \\ \hline & Gaussian Noise & 1.067 & \cellcolor[HTML]{B7E1CD}0.86 & \cellcolor[HTML]{B7E1CD}0.939 & \cellcolor[HTML]{EA9999}1.21 & 1.913 & \cellcolor[HTML]{B7E1CD}1.46 & \cellcolor[HTML]{B7E1CD}1.629 & \cellcolor[HTML]{EA9999}2.201 & 3.439 & \cellcolor[HTML]{B7E1CD}2.52 & \cellcolor[HTML]{B7E1CD}2.796 & \cellcolor[HTML]{EA9999}3.658 & 5.529 & \cellcolor[HTML]{B7E1CD}3.96 & \cellcolor[HTML]{B7E1CD}4.352 & \cellcolor[HTML]{B7E1CD}5.115 & 7.653 & \cellcolor[HTML]{B7E1CD}5.62 & \cellcolor[HTML]{B7E1CD}6.117 & \cellcolor[HTML]{B7E1CD}6.093 \\ & Shot Noise & 1.238 & \cellcolor[HTML]{B7E1CD}0.96 & \cellcolor[HTML]{B7E1CD}1.054 & \cellcolor[HTML]{EA9999}1.333 & 2.289 & \cellcolor[HTML]{B7E1CD}1.72 & \cellcolor[HTML]{B7E1CD}1.887 & \cellcolor[HTML]{EA9999}2.421 & 3.801 & \cellcolor[HTML]{B7E1CD}2.75 & \cellcolor[HTML]{B7E1CD}2.987 & \cellcolor[HTML]{B7E1CD}3.685 & 6.438 & \cellcolor[HTML]{B7E1CD}4.4 & \cellcolor[HTML]{B7E1CD}4.739 & \cellcolor[HTML]{B7E1CD}5.304 & 7.919 & \cellcolor[HTML]{B7E1CD}5.41 & \cellcolor[HTML]{B7E1CD}5.805 & \cellcolor[HTML]{B7E1CD}6.003 \\ & Impulse Noise & 2.235 & \cellcolor[HTML]{B7E1CD}1.78 & \cellcolor[HTML]{B7E1CD}2.058 & \cellcolor[HTML]{EA9999}2.324 & 3.177 & \cellcolor[HTML]{B7E1CD}2.35 & \cellcolor[HTML]{B7E1CD}2.636 & \cellcolor[HTML]{EA9999}3.279 & 4.061 & \cellcolor[HTML]{B7E1CD}2.9 & \cellcolor[HTML]{B7E1CD}3.185 & \cellcolor[HTML]{B7E1CD}4.001 & 6.096 & \cellcolor[HTML]{B7E1CD}4.26 & \cellcolor[HTML]{B7E1CD}4.595 & \cellcolor[HTML]{B7E1CD}5.324 & 7.781 & \cellcolor[HTML]{B7E1CD}5.57 & \cellcolor[HTML]{B7E1CD}5.995 & \cellcolor[HTML]{B7E1CD}6.114 \\ & Defocus Noise & 0.979 & \cellcolor[HTML]{B7E1CD}0.89 & \cellcolor[HTML]{B7E1CD}0.858 & \cellcolor[HTML]{B7E1CD}0.822 & 1.432 & \cellcolor[HTML]{B7E1CD}1.37 & \cellcolor[HTML]{B7E1CD}1.325 & \cellcolor[HTML]{B7E1CD}1.302 & 2.394 & \cellcolor[HTML]{B7E1CD}2.34 & \cellcolor[HTML]{B7E1CD}2.263 & \cellcolor[HTML]{B7E1CD}2.217 & 3.285 & \cellcolor[HTML]{B7E1CD}3.19 & \cellcolor[HTML]{B7E1CD}3.099 & \cellcolor[HTML]{B7E1CD}2.934 & 3.983 & \cellcolor[HTML]{B7E1CD}3.92 & \cellcolor[HTML]{B7E1CD}3.808 & \cellcolor[HTML]{B7E1CD}3.498 \\ & Glass Blue & 1.18 & \cellcolor[HTML]{B7E1CD}1.07 & \cellcolor[HTML]{B7E1CD}1.031 & \cellcolor[HTML]{B7E1CD}0.985 & 1.969 & \cellcolor[HTML]{B7E1CD}1.86 & \cellcolor[HTML]{B7E1CD}1.812 & \cellcolor[HTML]{B7E1CD}1.804 & 3.822 & \cellcolor[HTML]{EA9999}3.83 & \cellcolor[HTML]{B7E1CD}3.727 & \cellcolor[HTML]{B7E1CD}3.54 & 4.221 & \cellcolor[HTML]{EA9999}4.25 & \cellcolor[HTML]{B7E1CD}4.15 & \cellcolor[HTML]{B7E1CD}3.909 & 4.652 & \cellcolor[HTML]{B7E1CD}4.62 & \cellcolor[HTML]{B7E1CD}4.542 & \cellcolor[HTML]{B7E1CD}4.139 \\ & Motion Blur & 0.687 & \cellcolor[HTML]{B7E1CD}0.55 & \cellcolor[HTML]{B7E1CD}0.542 & \cellcolor[HTML]{B7E1CD}0.509 & 1.37 & \cellcolor[HTML]{B7E1CD}1.22 & \cellcolor[HTML]{B7E1CD}1.245 & \cellcolor[HTML]{B7E1CD}1.261 & 2.521 & \cellcolor[HTML]{B7E1CD}2.42 & \cellcolor[HTML]{B7E1CD}2.454 & \cellcolor[HTML]{B7E1CD}2.381 & 3.7 
& \cellcolor[HTML]{B7E1CD}3.64 & \cellcolor[HTML]{B7E1CD}3.678 & \cellcolor[HTML]{B7E1CD}3.412 & 4.285 & \cellcolor[HTML]{B7E1CD}4.23 & \cellcolor[HTML]{B7E1CD}4.275 & \cellcolor[HTML]{B7E1CD}3.875 \\ & Zoom Blur & 1.518 & \cellcolor[HTML]{B7E1CD}1.38 & \cellcolor[HTML]{B7E1CD}1.382 & \cellcolor[HTML]{B7E1CD}1.386 & 2.219 & \cellcolor[HTML]{B7E1CD}2.1 & \cellcolor[HTML]{B7E1CD}2.119 & \cellcolor[HTML]{B7E1CD}2.094 & 2.688 & \cellcolor[HTML]{B7E1CD}2.58 & \cellcolor[HTML]{B7E1CD}2.588 & \cellcolor[HTML]{B7E1CD}2.518 & 3.213 & \cellcolor[HTML]{B7E1CD}3.12 & \cellcolor[HTML]{B7E1CD}3.137 & \cellcolor[HTML]{B7E1CD}3.028 & 3.666 & \cellcolor[HTML]{B7E1CD}3.59 & \cellcolor[HTML]{B7E1CD}3.599 & \cellcolor[HTML]{B7E1CD}3.437 \\ & Snow & 1.401 & \cellcolor[HTML]{B7E1CD}1.01 & \cellcolor[HTML]{B7E1CD}0.998 & \cellcolor[HTML]{B7E1CD}1.124 & 3.215 & \cellcolor[HTML]{B7E1CD}2.36 & \cellcolor[HTML]{B7E1CD}2.374 & \cellcolor[HTML]{B7E1CD}2.643 & 2.969 & \cellcolor[HTML]{B7E1CD}2.07 & \cellcolor[HTML]{B7E1CD}2.094 & \cellcolor[HTML]{B7E1CD}2.315 & 3.97 & \cellcolor[HTML]{B7E1CD}2.81 & \cellcolor[HTML]{B7E1CD}2.869 & \cellcolor[HTML]{B7E1CD}3.074 & 4.515 & \cellcolor[HTML]{B7E1CD}3.48 & \cellcolor[HTML]{B7E1CD}3.517 & \cellcolor[HTML]{B7E1CD}3.572 \\ & Frost & 0.949 & \cellcolor[HTML]{B7E1CD}0.66 & \cellcolor[HTML]{B7E1CD}0.626 & \cellcolor[HTML]{B7E1CD}0.633 & 2.093 & \cellcolor[HTML]{B7E1CD}1.68 & \cellcolor[HTML]{B7E1CD}1.681 & \cellcolor[HTML]{B7E1CD}1.812 & 2.978 & \cellcolor[HTML]{B7E1CD}2.53 & \cellcolor[HTML]{B7E1CD}2.553 & \cellcolor[HTML]{B7E1CD}2.688 & 3.141 & \cellcolor[HTML]{B7E1CD}2.74 & \cellcolor[HTML]{B7E1CD}2.766 & \cellcolor[HTML]{B7E1CD}2.893 & 3.713 & \cellcolor[HTML]{B7E1CD}3.31 & \cellcolor[HTML]{B7E1CD}3.362 & \cellcolor[HTML]{B7E1CD}3.447 \\ & Fog & 0.809 & \cellcolor[HTML]{B7E1CD}0.42 & \cellcolor[HTML]{B7E1CD}0.444 & \cellcolor[HTML]{B7E1CD}0.405 & 1.214 & \cellcolor[HTML]{B7E1CD}0.68 & \cellcolor[HTML]{B7E1CD}0.734 & \cellcolor[HTML]{B7E1CD}0.774 & 1.857 & \cellcolor[HTML]{B7E1CD}1.18 & \cellcolor[HTML]{B7E1CD}1.273 & \cellcolor[HTML]{B7E1CD}1.431 & 2.347 & \cellcolor[HTML]{B7E1CD}1.68 & \cellcolor[HTML]{B7E1CD}1.762 & \cellcolor[HTML]{B7E1CD}1.958 & 3.77 & \cellcolor[HTML]{B7E1CD}3.03 & \cellcolor[HTML]{B7E1CD}3.141 & \cellcolor[HTML]{B7E1CD}3.275 \\ & Brightness & 0.121 & \cellcolor[HTML]{B7E1CD}0.04 & \cellcolor[HTML]{B7E1CD}0.019 & \cellcolor[HTML]{EA9999}0.155 & 0.221 & \cellcolor[HTML]{B7E1CD}0.1 & \cellcolor[HTML]{B7E1CD}0.08 & \cellcolor[HTML]{B7E1CD}0.062 & 0.378 & \cellcolor[HTML]{B7E1CD}0.19 & \cellcolor[HTML]{B7E1CD}0.184 & \cellcolor[HTML]{B7E1CD}0.084 & 0.631 & \cellcolor[HTML]{B7E1CD}0.37 & \cellcolor[HTML]{B7E1CD}0.36 & \cellcolor[HTML]{B7E1CD}0.323 & 0.986 & \cellcolor[HTML]{B7E1CD}0.62 & \cellcolor[HTML]{B7E1CD}0.626 & \cellcolor[HTML]{B7E1CD}0.672 \\ & Contrast & 0.523 & \cellcolor[HTML]{B7E1CD}0.24 & \cellcolor[HTML]{B7E1CD}0.232 & \cellcolor[HTML]{B7E1CD}0.13 & 0.867 & \cellcolor[HTML]{B7E1CD}0.4 & \cellcolor[HTML]{B7E1CD}0.413 & \cellcolor[HTML]{B7E1CD}0.396 & 1.627 & \cellcolor[HTML]{B7E1CD}0.81 & \cellcolor[HTML]{B7E1CD}0.861 & \cellcolor[HTML]{B7E1CD}1.031 & 3.61 & \cellcolor[HTML]{B7E1CD}2.37 & \cellcolor[HTML]{B7E1CD}2.529 & \cellcolor[HTML]{B7E1CD}2.921 & 5.264 & \cellcolor[HTML]{B7E1CD}4.63 & \cellcolor[HTML]{B7E1CD}4.765 & \cellcolor[HTML]{B7E1CD}4.479 \\ & Elastic & 0.538 & \cellcolor[HTML]{B7E1CD}0.43 & \cellcolor[HTML]{B7E1CD}0.406 & \cellcolor[HTML]{B7E1CD}0.287 & 2.026 & \cellcolor[HTML]{B7E1CD}1.95 & \cellcolor[HTML]{B7E1CD}1.911 & 
\cellcolor[HTML]{B7E1CD}1.833 & 1.116 & \cellcolor[HTML]{B7E1CD}1.03 & \cellcolor[HTML]{B7E1CD}0.969 & \cellcolor[HTML]{B7E1CD}0.884 & 1.997 & \cellcolor[HTML]{B7E1CD}1.94 & \cellcolor[HTML]{B7E1CD}1.844 & \cellcolor[HTML]{B7E1CD}1.755 & 4.112 & \cellcolor[HTML]{B7E1CD}4.11 & \cellcolor[HTML]{B7E1CD}3.957 & \cellcolor[HTML]{B7E1CD}3.57 \\ & Pixelate & 0.612 & \cellcolor[HTML]{B7E1CD}0.5 & \cellcolor[HTML]{B7E1CD}0.492 & \cellcolor[HTML]{B7E1CD}0.416 & 0.599 & \cellcolor[HTML]{B7E1CD}0.51 & \cellcolor[HTML]{B7E1CD}0.506 & \cellcolor[HTML]{B7E1CD}0.465 & 1.889 & \cellcolor[HTML]{B7E1CD}1.72 & \cellcolor[HTML]{B7E1CD}1.734 & \cellcolor[HTML]{EA9999}1.958 & 3.046 & \cellcolor[HTML]{B7E1CD}2.93 & \cellcolor[HTML]{B7E1CD}2.88 & \cellcolor[HTML]{EA9999}3.306 & 3.369 & \cellcolor[HTML]{B7E1CD}3.32 & \cellcolor[HTML]{B7E1CD}3.313 & \cellcolor[HTML]{EA9999}3.51 \\ \multirow{-15}{*}{ResNet-18} & JPEG & 0.59 & \cellcolor[HTML]{B7E1CD}0.48 & \cellcolor[HTML]{B7E1CD}0.468 & \cellcolor[HTML]{B7E1CD}0.375 & 0.801 & \cellcolor[HTML]{B7E1CD}0.68 & \cellcolor[HTML]{B7E1CD}0.674 & \cellcolor[HTML]{B7E1CD}0.627 & 0.972 & \cellcolor[HTML]{B7E1CD}0.85 & \cellcolor[HTML]{B7E1CD}0.841 & \cellcolor[HTML]{B7E1CD}0.824 & 1.599 & \cellcolor[HTML]{B7E1CD}1.46 & \cellcolor[HTML]{B7E1CD}1.446 & \cellcolor[HTML]{B7E1CD}1.491 & 2.615 & \cellcolor[HTML]{B7E1CD}2.43 & \cellcolor[HTML]{B7E1CD}2.405 & \cellcolor[HTML]{B7E1CD}2.487 \\ \hline & Gaussian Noise & 1.041 & \cellcolor[HTML]{B7E1CD}0.76 & \cellcolor[HTML]{B7E1CD}0.857 & \cellcolor[HTML]{EA9999}2.78 & 1.923 & \cellcolor[HTML]{B7E1CD}1.382 & \cellcolor[HTML]{B7E1CD}1.536 & \cellcolor[HTML]{EA9999}3.755 & 3.425 & \cellcolor[HTML]{B7E1CD}2.5 & \cellcolor[HTML]{B7E1CD}2.762 & \cellcolor[HTML]{EA9999}5.009 & 5.251 & \cellcolor[HTML]{B7E1CD}4.065 & \cellcolor[HTML]{B7E1CD}4.518 & \cellcolor[HTML]{EA9999}6.182 & 6.997 & \cellcolor[HTML]{B7E1CD}5.815 & \cellcolor[HTML]{B7E1CD}6.423 & \cellcolor[HTML]{EA9999}7.124 \\ & Shot Noise & 1.132 & \cellcolor[HTML]{B7E1CD}0.843 & \cellcolor[HTML]{B7E1CD}1.027 & \cellcolor[HTML]{EA9999}2.926 & 2.214 & \cellcolor[HTML]{B7E1CD}1.591 & \cellcolor[HTML]{B7E1CD}1.846 & \cellcolor[HTML]{EA9999}3.975 & 3.67 & \cellcolor[HTML]{B7E1CD}2.624 & \cellcolor[HTML]{B7E1CD}3.013 & \cellcolor[HTML]{EA9999}5.058 & 5.891 & \cellcolor[HTML]{B7E1CD}4.363 & \cellcolor[HTML]{B7E1CD}5.011 & \cellcolor[HTML]{EA9999}6.36 & 7.045 & \cellcolor[HTML]{B7E1CD}5.418 & \cellcolor[HTML]{B7E1CD}6.137 & \cellcolor[HTML]{B7E1CD}6.96 \\ & Impulse Noise & 1.635 & \cellcolor[HTML]{B7E1CD}1.483 & \cellcolor[HTML]{B7E1CD}1.585 & \cellcolor[HTML]{EA9999}3.043 & 2.597 & \cellcolor[HTML]{B7E1CD}2.223 & \cellcolor[HTML]{B7E1CD}2.284 & \cellcolor[HTML]{EA9999}4.144 & 3.423 & \cellcolor[HTML]{B7E1CD}2.751 & \cellcolor[HTML]{B7E1CD}2.901 & \cellcolor[HTML]{EA9999}4.961 & 5.302 & \cellcolor[HTML]{B7E1CD}4.171 & \cellcolor[HTML]{B7E1CD}4.591 & \cellcolor[HTML]{EA9999}6.296 & 6.979 & \cellcolor[HTML]{B7E1CD}5.753 & \cellcolor[HTML]{B7E1CD}6.315 & \cellcolor[HTML]{EA9999}7.126 \\ & Defocus Noise & 0.863 & \cellcolor[HTML]{B7E1CD}0.799 & \cellcolor[HTML]{EA9999}1.005 & \cellcolor[HTML]{EA9999}3.856 & 1.326 & \cellcolor[HTML]{B7E1CD}1.266 & \cellcolor[HTML]{EA9999}1.519 & \cellcolor[HTML]{EA9999}4.286 & 2.23 & \cellcolor[HTML]{B7E1CD}2.213 & \cellcolor[HTML]{EA9999}2.434 & \cellcolor[HTML]{EA9999}4.858 & 3.059 & \cellcolor[HTML]{B7E1CD}2.983 & \cellcolor[HTML]{EA9999}3.328 & \cellcolor[HTML]{EA9999}5.119 & 3.784 & \cellcolor[HTML]{B7E1CD}3.655 & \cellcolor[HTML]{EA9999}4.2 & 
\cellcolor[HTML]{EA9999}5.303 \\ & Glass Blue & 1.223 & \cellcolor[HTML]{B7E1CD}1.141 & \cellcolor[HTML]{EA9999}1.509 & \cellcolor[HTML]{EA9999}3.538 & 2.115 & \cellcolor[HTML]{B7E1CD}2.039 & \cellcolor[HTML]{EA9999}2.443 & \cellcolor[HTML]{EA9999}4.257 & 4.01 & \cellcolor[HTML]{B7E1CD}4.003 & \cellcolor[HTML]{EA9999}4.309 & \cellcolor[HTML]{EA9999}4.943 & 4.375 & \cellcolor[HTML]{B7E1CD}4.353 & \cellcolor[HTML]{EA9999}4.564 & \cellcolor[HTML]{EA9999}5.039 & 4.668 & \cellcolor[HTML]{B7E1CD}4.601 & \cellcolor[HTML]{EA9999}4.709 & \cellcolor[HTML]{EA9999}5.141 \\ & Motion Blur & 0.643 & \cellcolor[HTML]{B7E1CD}0.53 & \cellcolor[HTML]{B7E1CD}0.641 & \cellcolor[HTML]{EA9999}3.068 & 1.335 & \cellcolor[HTML]{B7E1CD}1.209 & \cellcolor[HTML]{EA9999}1.354 & \cellcolor[HTML]{EA9999}3.768 & 2.392 & \cellcolor[HTML]{B7E1CD}2.282 & \cellcolor[HTML]{EA9999}2.435 & \cellcolor[HTML]{EA9999}4.349 & 3.5 & \cellcolor[HTML]{B7E1CD}3.418 & \cellcolor[HTML]{EA9999}3.588 & \cellcolor[HTML]{EA9999}4.774 & 4.108 & \cellcolor[HTML]{B7E1CD}4.028 & \cellcolor[HTML]{EA9999}4.235 & \cellcolor[HTML]{EA9999}4.983 \\ & Zoom Blur & 1.539 & \cellcolor[HTML]{B7E1CD}1.423 & \cellcolor[HTML]{EA9999}1.607 & \cellcolor[HTML]{EA9999}3.564 & 2.282 & \cellcolor[HTML]{B7E1CD}2.185 & \cellcolor[HTML]{EA9999}2.358 & \cellcolor[HTML]{EA9999}3.97 & 2.774 & \cellcolor[HTML]{B7E1CD}2.685 & \cellcolor[HTML]{EA9999}2.886 & \cellcolor[HTML]{EA9999}4.335 & 3.317 & \cellcolor[HTML]{B7E1CD}3.263 & \cellcolor[HTML]{EA9999}3.492 & \cellcolor[HTML]{EA9999}4.567 & 3.797 & \cellcolor[HTML]{B7E1CD}3.738 & \cellcolor[HTML]{EA9999}4.035 & \cellcolor[HTML]{EA9999}4.817 \\ & Snow & 1.253 & \cellcolor[HTML]{B7E1CD}0.904 & \cellcolor[HTML]{B7E1CD}1.168 & \cellcolor[HTML]{EA9999}2.377 & 3.074 & \cellcolor[HTML]{B7E1CD}2.429 & \cellcolor[HTML]{B7E1CD}2.694 & \cellcolor[HTML]{EA9999}4.017 & 2.838 & \cellcolor[HTML]{B7E1CD}2.16 & \cellcolor[HTML]{B7E1CD}2.497 & \cellcolor[HTML]{EA9999}3.869 & 3.775 & \cellcolor[HTML]{B7E1CD}2.945 & \cellcolor[HTML]{B7E1CD}3.286 & \cellcolor[HTML]{EA9999}4.797 & 4.562 & \cellcolor[HTML]{B7E1CD}3.776 & \cellcolor[HTML]{B7E1CD}3.996 & \cellcolor[HTML]{EA9999}5.151 \\ & Frost & 0.941 & \cellcolor[HTML]{B7E1CD}0.658 & \cellcolor[HTML]{B7E1CD}0.8 & \cellcolor[HTML]{EA9999}2.236 & 2.193 & \cellcolor[HTML]{B7E1CD}1.784 & \cellcolor[HTML]{B7E1CD}2.021 & \cellcolor[HTML]{EA9999}3.713 & 3.154 & \cellcolor[HTML]{B7E1CD}2.673 & \cellcolor[HTML]{B7E1CD}2.969 & \cellcolor[HTML]{EA9999}4.682 & 3.345 & \cellcolor[HTML]{B7E1CD}2.904 & \cellcolor[HTML]{B7E1CD}3.223 & \cellcolor[HTML]{EA9999}4.94 & 3.956 & \cellcolor[HTML]{B7E1CD}3.484 & \cellcolor[HTML]{B7E1CD}3.835 & \cellcolor[HTML]{EA9999}5.46 \\ & Fog & 0.699 & \cellcolor[HTML]{B7E1CD}0.354 & \cellcolor[HTML]{EA9999}0.822 & \cellcolor[HTML]{EA9999}3.874 & 1.084 & \cellcolor[HTML]{B7E1CD}0.624 & \cellcolor[HTML]{EA9999}1.24 & \cellcolor[HTML]{EA9999}4.454 & 1.715 & \cellcolor[HTML]{B7E1CD}1.145 & \cellcolor[HTML]{EA9999}1.802 & \cellcolor[HTML]{EA9999}4.929 & 2.256 & \cellcolor[HTML]{B7E1CD}1.675 & \cellcolor[HTML]{B7E1CD}2.111 & \cellcolor[HTML]{EA9999}4.969 & 3.792 & \cellcolor[HTML]{B7E1CD}3.116 & \cellcolor[HTML]{B7E1CD}3.298 & \cellcolor[HTML]{EA9999}5.371 \\ & Brightness & 0.034 & \cellcolor[HTML]{EA9999}0.05 & \cellcolor[HTML]{B7E1CD}0.019 & \cellcolor[HTML]{EA9999}1.342 & 0.143 & \cellcolor[HTML]{B7E1CD}0.008 & \cellcolor[HTML]{B7E1CD}0.089 & \cellcolor[HTML]{EA9999}1.359 & 0.315 & \cellcolor[HTML]{B7E1CD}0.117 & \cellcolor[HTML]{B7E1CD}0.211 & \cellcolor[HTML]{EA9999}1.549 & 0.595 & 
\cellcolor[HTML]{B7E1CD}0.303 & \cellcolor[HTML]{B7E1CD}0.422 & \cellcolor[HTML]{EA9999}1.987 & 1.002 & \cellcolor[HTML]{B7E1CD}0.591 & \cellcolor[HTML]{B7E1CD}0.752 & \cellcolor[HTML]{EA9999}2.663 \\ & Contrast & 0.482 & \cellcolor[HTML]{B7E1CD}0.188 & \cellcolor[HTML]{EA9999}0.816 & \cellcolor[HTML]{EA9999}3.717 & 0.846 & \cellcolor[HTML]{B7E1CD}0.394 & \cellcolor[HTML]{EA9999}1.513 & \cellcolor[HTML]{EA9999}4.571 & 1.624 & \cellcolor[HTML]{B7E1CD}0.899 & \cellcolor[HTML]{EA9999}3.095 & \cellcolor[HTML]{EA9999}5.556 & 3.66 & \cellcolor[HTML]{B7E1CD}2.725 & \cellcolor[HTML]{EA9999}5.816 & \cellcolor[HTML]{EA9999}6.396 & 5.411 & \cellcolor[HTML]{B7E1CD}4.881 & \cellcolor[HTML]{EA9999}6.505 & \cellcolor[HTML]{EA9999}6.59 \\ & Elastic & 0.442 & \cellcolor[HTML]{B7E1CD}0.347 & \cellcolor[HTML]{EA9999}0.481 & \cellcolor[HTML]{EA9999}2.396 & 1.973 & \cellcolor[HTML]{B7E1CD}1.871 & \cellcolor[HTML]{EA9999}2.105 & \cellcolor[HTML]{EA9999}4.012 & 0.962 & \cellcolor[HTML]{B7E1CD}0.87 & \cellcolor[HTML]{EA9999}1.113 & \cellcolor[HTML]{EA9999}2.474 & 1.913 & \cellcolor[HTML]{B7E1CD}1.807 & \cellcolor[HTML]{EA9999}2.21 & \cellcolor[HTML]{EA9999}2.959 & 4.106 & \cellcolor[HTML]{B7E1CD}3.982 & \cellcolor[HTML]{EA9999}4.693 & \cellcolor[HTML]{B7E1CD}4.036 \\ & Pixelate & 0.926 & \cellcolor[HTML]{B7E1CD}0.653 & \cellcolor[HTML]{B7E1CD}0.872 & \cellcolor[HTML]{EA9999}1.883 & 1.444 & \cellcolor[HTML]{B7E1CD}1.02 & \cellcolor[HTML]{B7E1CD}0.934 & \cellcolor[HTML]{EA9999}1.838 & 2.155 & \cellcolor[HTML]{B7E1CD}1.822 & \cellcolor[HTML]{EA9999}2.468 & \cellcolor[HTML]{EA9999}2.172 & 3.111 & \cellcolor[HTML]{B7E1CD}3.064 & \cellcolor[HTML]{EA9999}3.773 & \cellcolor[HTML]{B7E1CD}2.755 & 3.979 & \cellcolor[HTML]{EA9999}3.993 & \cellcolor[HTML]{EA9999}3.988 & \cellcolor[HTML]{B7E1CD}3.301 \\ \multirow{-15}{*}{MoblieNet V2} & JPEG & 0.491 & \cellcolor[HTML]{B7E1CD}0.382 & \cellcolor[HTML]{EA9999}0.554 & \cellcolor[HTML]{EA9999}1.754 & 0.675 & \cellcolor[HTML]{B7E1CD}0.552 & \cellcolor[HTML]{EA9999}0.784 & \cellcolor[HTML]{EA9999}1.848 & 0.826 & \cellcolor[HTML]{B7E1CD}0.693 & \cellcolor[HTML]{EA9999}0.967 & \cellcolor[HTML]{EA9999}1.928 & 1.357 & \cellcolor[HTML]{B7E1CD}1.165 & \cellcolor[HTML]{EA9999}1.555 & \cellcolor[HTML]{EA9999}2.182 & 2.182 & \cellcolor[HTML]{B7E1CD}1.902 & \cellcolor[HTML]{EA9999}2.462 & \cellcolor[HTML]{EA9999}2.545 \\ \hline & Gaussian Noise & 0.938 & \cellcolor[HTML]{b7e1cd}0.914 & \cellcolor[HTML]{b7e1cd}0.928 & \cellcolor[HTML]{ea9999}0.973 &1.437 & \cellcolor[HTML]{b7e1cd}1.282 & \cellcolor[HTML]{b7e1cd}1.112 & \cellcolor[HTML]{ea9999}1.571 &2.363 & \cellcolor[HTML]{b7e1cd}2.047 & \cellcolor[HTML]{ea9999}2.513 & \cellcolor[HTML]{ea9999}2.89 &3.719 & \cellcolor[HTML]{b7e1cd}3.255 & \cellcolor[HTML]{ea9999}3.754 & \cellcolor[HTML]{ea9999}4.88 &5.134 & \cellcolor[HTML]{b7e1cd}4.999 & \cellcolor[HTML]{ea9999}5.339 & \cellcolor[HTML]{ea9999}7.828 \\ & Shot Noise & 0.961 & \cellcolor[HTML]{b7e1cd}0.946 & \cellcolor[HTML]{b7e1cd}0.957 & \cellcolor[HTML]{ea9999}1.026 &1.585 & \cellcolor[HTML]{b7e1cd}1.408 & \cellcolor[HTML]{b7e1cd}1.246 & \cellcolor[HTML]{ea9999}1.83 &2.448 & \cellcolor[HTML]{b7e1cd}2.166 & \cellcolor[HTML]{b7e1cd}2.023 & \cellcolor[HTML]{ea9999}3.215 &4.084 & \cellcolor[HTML]{b7e1cd}3.697 & \cellcolor[HTML]{b7e1cd}3.887 & \cellcolor[HTML]{ea9999}5.748 &4.924 & \cellcolor[HTML]{b7e1cd}4.919 & \cellcolor[HTML]{ea9999}5.229 & \cellcolor[HTML]{ea9999}7.656 \\ & Impulse Noise & 1.703 & \cellcolor[HTML]{b7e1cd}1.652 & \cellcolor[HTML]{b7e1cd}1.676 & \cellcolor[HTML]{ea9999}1.789 
&2.013 & \cellcolor[HTML]{b7e1cd}1.874 & \cellcolor[HTML]{b7e1cd}1.464 & \cellcolor[HTML]{ea9999}2.373 &2.564 & \cellcolor[HTML]{b7e1cd}2.295 & \cellcolor[HTML]{b7e1cd}1.995 & \cellcolor[HTML]{ea9999}3.211 &3.962 & \cellcolor[HTML]{b7e1cd}3.507 & \cellcolor[HTML]{b7e1cd}3.458 & \cellcolor[HTML]{ea9999}5.435 &5.28 & \cellcolor[HTML]{b7e1cd}4.942 & \cellcolor[HTML]{b7e1cd}5.105 & \cellcolor[HTML]{ea9999}8.123 \\ & Defocus Noise & 1.059 & \cellcolor[HTML]{b7e1cd}1.042 & \cellcolor[HTML]{b7e1cd}0.911 & \cellcolor[HTML]{b7e1cd}0.869 &1.441 & \cellcolor[HTML]{b7e1cd}1.414 & \cellcolor[HTML]{b7e1cd}1.309 & \cellcolor[HTML]{b7e1cd}1.298 &2.344 & \cellcolor[HTML]{b7e1cd}2.311 & \cellcolor[HTML]{b7e1cd}2.286 & \cellcolor[HTML]{b7e1cd}2.231 &3.244 & \cellcolor[HTML]{b7e1cd}3.225 & \cellcolor[HTML]{b7e1cd}3.226 & \cellcolor[HTML]{b7e1cd}3.164 &4.052 & \cellcolor[HTML]{b7e1cd}4.049 & \cellcolor[HTML]{b7e1cd}4.019 & \cellcolor[HTML]{b7e1cd}3.994 \\ & Glass Blue & 1.349 & \cellcolor[HTML]{b7e1cd}1.27 & \cellcolor[HTML]{b7e1cd}1.011 & \cellcolor[HTML]{ea9999}1.457 &2.297 & \cellcolor[HTML]{b7e1cd}2.169 & \cellcolor[HTML]{b7e1cd}1.829 & \cellcolor[HTML]{b7e1cd}2.088 &4.613 & \cellcolor[HTML]{b7e1cd}4.551 & \cellcolor[HTML]{b7e1cd}4.185 & \cellcolor[HTML]{b7e1cd}4.346 &5.057 & \cellcolor[HTML]{b7e1cd}5.009 & \cellcolor[HTML]{b7e1cd}4.815 & \cellcolor[HTML]{b7e1cd}4.793 &5.399 & \cellcolor[HTML]{b7e1cd}5.376 & \cellcolor[HTML]{b7e1cd}5.34 & \cellcolor[HTML]{b7e1cd}5.102 \\ & Motion Blur & 0.731 & \cellcolor[HTML]{b7e1cd}0.638 & \cellcolor[HTML]{b7e1cd}0.623 & \cellcolor[HTML]{b7e1cd}0.578 &1.314 & \cellcolor[HTML]{b7e1cd}1.307 & \cellcolor[HTML]{b7e1cd}1.19 & \cellcolor[HTML]{b7e1cd}1.238 &2.563 & \cellcolor[HTML]{b7e1cd}2.551 & \cellcolor[HTML]{b7e1cd}2.337 & \cellcolor[HTML]{b7e1cd}2.501 &4.148 & \cellcolor[HTML]{b7e1cd}4.057 & \cellcolor[HTML]{b7e1cd}3.672 & \cellcolor[HTML]{b7e1cd}3.96 &5.048 & \cellcolor[HTML]{b7e1cd}5.033 & \cellcolor[HTML]{b7e1cd}4.404 & \cellcolor[HTML]{b7e1cd}4.75 \\ & Zoom Blur & 1.509 & \cellcolor[HTML]{b7e1cd}1.473 & \cellcolor[HTML]{b7e1cd}1.261 & \cellcolor[HTML]{b7e1cd}1.361 &2.252 & \cellcolor[HTML]{b7e1cd}2.187 & \cellcolor[HTML]{b7e1cd}2.037 & \cellcolor[HTML]{b7e1cd}2.134 &2.822 & \cellcolor[HTML]{b7e1cd}2.736 & \cellcolor[HTML]{b7e1cd}2.571 & \cellcolor[HTML]{b7e1cd}2.697 &3.44 & \cellcolor[HTML]{b7e1cd}3.337 & \cellcolor[HTML]{b7e1cd}3.139 & \cellcolor[HTML]{b7e1cd}3.325 &4.039 & \cellcolor[HTML]{b7e1cd}3.949 & \cellcolor[HTML]{b7e1cd}3.676 & \cellcolor[HTML]{b7e1cd}3.916 \\ & Snow & 1.229 & \cellcolor[HTML]{b7e1cd}1.13 & \cellcolor[HTML]{b7e1cd}1.048 & \cellcolor[HTML]{b7e1cd}1.143 &2.62 & \cellcolor[HTML]{b7e1cd}2.529 & \cellcolor[HTML]{b7e1cd}2.481 & \cellcolor[HTML]{ea9999}2.933 &2.375 & \cellcolor[HTML]{b7e1cd}2.317 & \cellcolor[HTML]{b7e1cd}2.193 & \cellcolor[HTML]{ea9999}2.478 &3.127 & \cellcolor[HTML]{b7e1cd}3.016 & \cellcolor[HTML]{b7e1cd}2.941 & \cellcolor[HTML]{ea9999}3.347 &3.697 & \cellcolor[HTML]{b7e1cd}3.437 & \cellcolor[HTML]{ea9999}3.8 & \cellcolor[HTML]{ea9999}4.209 \\ & Frost & 0.845 & \cellcolor[HTML]{b7e1cd}0.837 & \cellcolor[HTML]{b7e1cd}0.674 & \cellcolor[HTML]{b7e1cd}0.653 &1.769 & \cellcolor[HTML]{b7e1cd}1.726 & \cellcolor[HTML]{b7e1cd}1.506 & \cellcolor[HTML]{ea9999}1.785 &2.563 & \cellcolor[HTML]{b7e1cd}2.522 & \cellcolor[HTML]{b7e1cd}2.245 & \cellcolor[HTML]{ea9999}2.588 &2.761 & \cellcolor[HTML]{b7e1cd}2.72 & \cellcolor[HTML]{b7e1cd}2.435 & \cellcolor[HTML]{ea9999}2.816 &3.322 & \cellcolor[HTML]{b7e1cd}3.291 & \cellcolor[HTML]{b7e1cd}2.972 & 
\cellcolor[HTML]{ea9999}3.427 \\ & Fog & 0.691 & \cellcolor[HTML]{b7e1cd}0.685 & \cellcolor[HTML]{b7e1cd}0.582 & \cellcolor[HTML]{b7e1cd}0.501 &0.897 & \cellcolor[HTML]{b7e1cd}0.876 & \cellcolor[HTML]{b7e1cd}0.784 & \cellcolor[HTML]{ea9999}0.926 &1.305 & \cellcolor[HTML]{b7e1cd}1.248 & \cellcolor[HTML]{b7e1cd}1.167 & \cellcolor[HTML]{ea9999}1.337 &1.81 & \cellcolor[HTML]{b7e1cd}1.657 & \cellcolor[HTML]{b7e1cd}1.635 & \cellcolor[HTML]{ea9999}1.909 &3.261 & \cellcolor[HTML]{b7e1cd}2.96 & \cellcolor[HTML]{b7e1cd}2.979 & \cellcolor[HTML]{ea9999}3.569 \\ & Brightness & 0.345 & \cellcolor[HTML]{b7e1cd}0.25 & \cellcolor[HTML]{b7e1cd}0.21 & \cellcolor[HTML]{b7e1cd}0.081 &0.382 & \cellcolor[HTML]{b7e1cd}0.314 & \cellcolor[HTML]{b7e1cd}0.259 & \cellcolor[HTML]{ea9999}0.431 &0.453 & \cellcolor[HTML]{b7e1cd}0.446 & \cellcolor[HTML]{b7e1cd}0.341 & \cellcolor[HTML]{b7e1cd}0.222 &0.584 & \cellcolor[HTML]{b7e1cd}0.514 & \cellcolor[HTML]{b7e1cd}0.473 & \cellcolor[HTML]{b7e1cd}0.367 &0.787 & \cellcolor[HTML]{b7e1cd}0.716 & \cellcolor[HTML]{b7e1cd}0.674 & \cellcolor[HTML]{b7e1cd}0.609 \\ & Contrast & 0.545 & \cellcolor[HTML]{b7e1cd}0.449 & \cellcolor[HTML]{b7e1cd}0.42 & \cellcolor[HTML]{b7e1cd}0.308 &0.69 & \cellcolor[HTML]{b7e1cd}0.677 & \cellcolor[HTML]{b7e1cd}0.568 & \cellcolor[HTML]{b7e1cd}0.494 &1.047 & \cellcolor[HTML]{b7e1cd}0.998 & \cellcolor[HTML]{b7e1cd}0.867 & \cellcolor[HTML]{ea9999}1.051 &2.387 & \cellcolor[HTML]{b7e1cd}2.173 & \cellcolor[HTML]{b7e1cd}1.85 & \cellcolor[HTML]{ea9999}2.624 &4.686 & \cellcolor[HTML]{b7e1cd}4.394 & \cellcolor[HTML]{b7e1cd}3.673 & \cellcolor[HTML]{ea9999}4.914 \\ & Elastic & 0.655 & \cellcolor[HTML]{b7e1cd}0.625 & \cellcolor[HTML]{b7e1cd}0.517 & \cellcolor[HTML]{b7e1cd}0.422 &2.274 & \cellcolor[HTML]{b7e1cd}2.243 & \cellcolor[HTML]{b7e1cd}1.908 & \cellcolor[HTML]{b7e1cd}2.122 &1.602 & \cellcolor[HTML]{b7e1cd}1.571 & \cellcolor[HTML]{b7e1cd}1.242 & \cellcolor[HTML]{b7e1cd}1.416 &2.704 & \cellcolor[HTML]{b7e1cd}2.671 & \cellcolor[HTML]{b7e1cd}2.209 & \cellcolor[HTML]{b7e1cd}2.591 &5.598 & \cellcolor[HTML]{b7e1cd}5.522 & \cellcolor[HTML]{b7e1cd}4.647 & \cellcolor[HTML]{b7e1cd}5.348 \\ & Pixelate & 0.871 & \cellcolor[HTML]{b7e1cd}0.71 & \cellcolor[HTML]{b7e1cd}0.727 & \cellcolor[HTML]{b7e1cd}0.684 &1.042 & \cellcolor[HTML]{b7e1cd}1.021 & \cellcolor[HTML]{b7e1cd}1.015 & \cellcolor[HTML]{b7e1cd}0.778 &1.971 & \cellcolor[HTML]{b7e1cd}1.575 & \cellcolor[HTML]{b7e1cd}1.556 & \cellcolor[HTML]{ea9999}1.974 &3.373 & \cellcolor[HTML]{b7e1cd}3.229 & \cellcolor[HTML]{b7e1cd}3.208 & \cellcolor[HTML]{ea9999}3.431 &4.038 & \cellcolor[HTML]{b7e1cd}4.014 & \cellcolor[HTML]{b7e1cd}3.996 & \cellcolor[HTML]{ea9999}4.179 \\ \multirow{-15}{*}{ResNet-50} & JPEG & 0.793 & \cellcolor[HTML]{b7e1cd}0.701 & \cellcolor[HTML]{b7e1cd}0.67 & \cellcolor[HTML]{b7e1cd}0.522 &0.957 & \cellcolor[HTML]{b7e1cd}0.98 & \cellcolor[HTML]{b7e1cd}0.832 & \cellcolor[HTML]{b7e1cd}0.704 &1.092 & \cellcolor[HTML]{b7e1cd}1.023 & \cellcolor[HTML]{b7e1cd}0.96 & \cellcolor[HTML]{b7e1cd}0.849 &1.537 & \cellcolor[HTML]{b7e1cd}1.425 & \cellcolor[HTML]{b7e1cd}1.356 & \cellcolor[HTML]{b7e1cd}1.394 &2.236 & \cellcolor[HTML]{b7e1cd}2.118 & \cellcolor[HTML]{b7e1cd}1.967 & \cellcolor[HTML]{b7e1cd}2.228 \\ \hline \end{tabular} \end{adjustbox} \end{table}
\end{document}
%%%%% NEW MATH DEFINITIONS %%%%%
\usepackage{amsmath,amsfonts,bm}
% Mark sections of captions for referring to divisions of figures
\newcommand{\figleft}{{\em (Left)}}
\newcommand{\figcenter}{{\em (Center)}}
\newcommand{\figright}{{\em (Right)}}
\newcommand{\figtop}{{\em (Top)}}
\newcommand{\figbottom}{{\em (Bottom)}}
\newcommand{\captiona}{{\em (a)}}
\newcommand{\captionb}{{\em (b)}}
\newcommand{\captionc}{{\em (c)}}
\newcommand{\captiond}{{\em (d)}}
% Highlight a newly defined term
\newcommand{\newterm}[1]{{\bf #1}}
% Figure reference, lower-case.
\def\figref#1{figure~\ref{#1}}
% Figure reference, capital. For start of sentence
\def\Figref#1{Figure~\ref{#1}}
\def\twofigref#1#2{figures \ref{#1} and \ref{#2}}
\def\quadfigref#1#2#3#4{figures \ref{#1}, \ref{#2}, \ref{#3} and \ref{#4}}
% Section reference, lower-case.
\def\secref#1{section~\ref{#1}}
% Section reference, capital.
\def\Secref#1{Section~\ref{#1}}
% Reference to two sections.
\def\twosecrefs#1#2{sections \ref{#1} and \ref{#2}}
% Reference to three sections.
\def\secrefs#1#2#3{sections \ref{#1}, \ref{#2} and \ref{#3}}
% Reference to an equation, lower-case.
\def\eqref#1{equation~\ref{#1}}
% Reference to an equation, upper case
\def\Eqref#1{Equation~\ref{#1}}
% A raw reference to an equation---avoid using if possible
\def\plaineqref#1{\ref{#1}}
% Reference to a chapter, lower-case.
\def\chapref#1{chapter~\ref{#1}}
% Reference to an equation, upper case.
\def\Chapref#1{Chapter~\ref{#1}}
% Reference to a range of chapters
\def\rangechapref#1#2{chapters~\ref{#1}--\ref{#2}}
% Reference to an algorithm, lower-case.
\def\algref#1{algorithm~\ref{#1}}
% Reference to an algorithm, upper case.
\def\Algref#1{Algorithm~\ref{#1}}
\def\twoalgref#1#2{algorithms \ref{#1} and \ref{#2}}
\def\Twoalgref#1#2{Algorithms \ref{#1} and \ref{#2}}
% Reference to a part, lower case
\def\partref#1{part~\ref{#1}}
% Reference to a part, upper case
\def\Partref#1{Part~\ref{#1}}
\def\twopartref#1#2{parts \ref{#1} and \ref{#2}}
\def\ceil#1{\lceil #1 \rceil}
\def\floor#1{\lfloor #1 \rfloor}
\def\1{\bm{1}}
\newcommand{\train}{\mathcal{D}}
\newcommand{\valid}{\mathcal{D_{\mathrm{valid}}}}
\newcommand{\test}{\mathcal{D_{\mathrm{test}}}}
\def\eps{{\epsilon}}
% Random variables
\def\reta{{\textnormal{$\eta$}}}
\def\ra{{\textnormal{a}}}
\def\rb{{\textnormal{b}}}
\def\rc{{\textnormal{c}}}
\def\rd{{\textnormal{d}}}
\def\re{{\textnormal{e}}}
\def\rf{{\textnormal{f}}}
\def\rg{{\textnormal{g}}}
\def\rh{{\textnormal{h}}}
\def\ri{{\textnormal{i}}}
\def\rj{{\textnormal{j}}}
\def\rk{{\textnormal{k}}}
\def\rl{{\textnormal{l}}}
% rm is already a command, just don't name any random variables m
\def\rn{{\textnormal{n}}}
\def\ro{{\textnormal{o}}}
\def\rp{{\textnormal{p}}}
\def\rq{{\textnormal{q}}}
\def\rr{{\textnormal{r}}}
\def\rs{{\textnormal{s}}}
\def\rt{{\textnormal{t}}}
\def\ru{{\textnormal{u}}}
\def\rv{{\textnormal{v}}}
\def\rw{{\textnormal{w}}}
\def\rx{{\textnormal{x}}}
\def\ry{{\textnormal{y}}}
\def\rz{{\textnormal{z}}}
% Random vectors
\def\rvepsilon{{\mathbf{\epsilon}}}
\def\rvtheta{{\mathbf{\theta}}}
\def\rva{{\mathbf{a}}}
\def\rvb{{\mathbf{b}}}
\def\rvc{{\mathbf{c}}}
\def\rvd{{\mathbf{d}}}
\def\rve{{\mathbf{e}}}
\def\rvf{{\mathbf{f}}}
\def\rvg{{\mathbf{g}}}
\def\rvh{{\mathbf{h}}}
\def\rvu{{\mathbf{i}}}
\def\rvj{{\mathbf{j}}}
\def\rvk{{\mathbf{k}}}
\def\rvl{{\mathbf{l}}}
\def\rvm{{\mathbf{m}}}
\def\rvn{{\mathbf{n}}}
\def\rvo{{\mathbf{o}}}
\def\rvp{{\mathbf{p}}}
\def\rvq{{\mathbf{q}}}
\def\rvr{{\mathbf{r}}}
\def\rvs{{\mathbf{s}}}
\def\rvt{{\mathbf{t}}}
\def\rvu{{\mathbf{u}}}
\def\rvv{{\mathbf{v}}}
\def\rvw{{\mathbf{w}}}
\def\rvx{{\mathbf{x}}}
\def\rvy{{\mathbf{y}}}
\def\rvz{{\mathbf{z}}}
% Elements of random vectors
\def\erva{{\textnormal{a}}}
\def\ervb{{\textnormal{b}}}
\def\ervc{{\textnormal{c}}}
\def\ervd{{\textnormal{d}}}
\def\erve{{\textnormal{e}}}
\def\ervf{{\textnormal{f}}}
\def\ervg{{\textnormal{g}}}
\def\ervh{{\textnormal{h}}}
\def\ervi{{\textnormal{i}}}
\def\ervj{{\textnormal{j}}}
\def\ervk{{\textnormal{k}}}
\def\ervl{{\textnormal{l}}}
\def\ervm{{\textnormal{m}}}
\def\ervn{{\textnormal{n}}}
\def\ervo{{\textnormal{o}}}
\def\ervp{{\textnormal{p}}}
\def\ervq{{\textnormal{q}}}
\def\ervr{{\textnormal{r}}}
\def\ervs{{\textnormal{s}}}
\def\ervt{{\textnormal{t}}}
\def\ervu{{\textnormal{u}}}
\def\ervv{{\textnormal{v}}}
\def\ervw{{\textnormal{w}}}
\def\ervx{{\textnormal{x}}}
\def\ervy{{\textnormal{y}}}
\def\ervz{{\textnormal{z}}}
% Random matrices
\def\rmA{{\mathbf{A}}}
\def\rmB{{\mathbf{B}}}
\def\rmC{{\mathbf{C}}}
\def\rmD{{\mathbf{D}}}
\def\rmE{{\mathbf{E}}}
\def\rmF{{\mathbf{F}}}
\def\rmG{{\mathbf{G}}}
\def\rmH{{\mathbf{H}}}
\def\rmI{{\mathbf{I}}}
\def\rmJ{{\mathbf{J}}}
\def\rmK{{\mathbf{K}}}
\def\rmL{{\mathbf{L}}}
\def\rmM{{\mathbf{M}}}
\def\rmN{{\mathbf{N}}}
\def\rmO{{\mathbf{O}}}
\def\rmP{{\mathbf{P}}}
\def\rmQ{{\mathbf{Q}}}
\def\rmR{{\mathbf{R}}}
\def\rmS{{\mathbf{S}}}
\def\rmT{{\mathbf{T}}}
\def\rmU{{\mathbf{U}}}
\def\rmV{{\mathbf{V}}}
\def\rmW{{\mathbf{W}}}
\def\rmX{{\mathbf{X}}}
\def\rmY{{\mathbf{Y}}}
\def\rmZ{{\mathbf{Z}}}
% Elements of random matrices
\def\ermA{{\textnormal{A}}}
\def\ermB{{\textnormal{B}}}
\def\ermC{{\textnormal{C}}}
\def\ermD{{\textnormal{D}}}
\def\ermE{{\textnormal{E}}}
\def\ermF{{\textnormal{F}}}
\def\ermG{{\textnormal{G}}}
\def\ermH{{\textnormal{H}}}
\def\ermI{{\textnormal{I}}}
\def\ermJ{{\textnormal{J}}}
\def\ermK{{\textnormal{K}}}
\def\ermL{{\textnormal{L}}}
\def\ermM{{\textnormal{M}}}
\def\ermN{{\textnormal{N}}}
\def\ermO{{\textnormal{O}}}
\def\ermP{{\textnormal{P}}}
\def\ermQ{{\textnormal{Q}}}
\def\ermR{{\textnormal{R}}}
\def\ermS{{\textnormal{S}}}
\def\ermT{{\textnormal{T}}}
\def\ermU{{\textnormal{U}}}
\def\ermV{{\textnormal{V}}}
\def\ermW{{\textnormal{W}}}
\def\ermX{{\textnormal{X}}}
\def\ermY{{\textnormal{Y}}}
\def\ermZ{{\textnormal{Z}}}
% Vectors
\def\vzero{{\bm{0}}}
\def\vone{{\bm{1}}}
\def\vmu{{\bm{\mu}}}
\def\vtheta{{\bm{\theta}}}
\def\va{{\bm{a}}}
\def\vb{{\bm{b}}}
\def\vc{{\bm{c}}}
\def\vd{{\bm{d}}}
\def\ve{{\bm{e}}}
\def\vf{{\bm{f}}}
\def\vg{{\bm{g}}}
\def\vh{{\bm{h}}}
\def\vi{{\bm{i}}}
\def\vj{{\bm{j}}}
\def\vk{{\bm{k}}}
\def\vl{{\bm{l}}}
\def\vm{{\bm{m}}}
\def\vn{{\bm{n}}}
\def\vo{{\bm{o}}}
\def\vp{{\bm{p}}}
\def\vq{{\bm{q}}}
\def\vr{{\bm{r}}}
\def\vs{{\bm{s}}}
\def\vt{{\bm{t}}}
\def\vu{{\bm{u}}}
\def\vv{{\bm{v}}}
\def\vw{{\bm{w}}}
\def\vx{{\bm{x}}}
\def\vy{{\bm{y}}}
\def\vz{{\bm{z}}}
% Elements of vectors
\def\evalpha{{\alpha}}
\def\evbeta{{\beta}}
\def\evepsilon{{\epsilon}}
\def\evlambda{{\lambda}}
\def\evomega{{\omega}}
\def\evmu{{\mu}}
\def\evpsi{{\psi}}
\def\evsigma{{\sigma}}
\def\evtheta{{\theta}}
\def\eva{{a}}
\def\evb{{b}}
\def\evc{{c}}
\def\evd{{d}}
\def\eve{{e}}
\def\evf{{f}}
\def\evg{{g}}
\def\evh{{h}}
\def\evi{{i}}
\def\evj{{j}}
\def\evk{{k}}
\def\evl{{l}}
\def\evm{{m}}
\def\evn{{n}}
\def\evo{{o}}
\def\evp{{p}}
\def\evq{{q}}
\def\evr{{r}}
\def\evs{{s}}
\def\evt{{t}}
\def\evu{{u}}
\def\evv{{v}}
\def\evw{{w}}
\def\evx{{x}}
\def\evy{{y}}
\def\evz{{z}}
% Matrix
\def\mA{{\bm{A}}}
\def\mB{{\bm{B}}}
\def\mC{{\bm{C}}}
\def\mD{{\bm{D}}}
\def\mE{{\bm{E}}}
\def\mF{{\bm{F}}}
\def\mG{{\bm{G}}}
\def\mH{{\bm{H}}}
\def\mI{{\bm{I}}}
\def\mJ{{\bm{J}}}
\def\mK{{\bm{K}}}
\def\mL{{\bm{L}}}
\def\mM{{\bm{M}}}
\def\mN{{\bm{N}}}
\def\mO{{\bm{O}}}
\def\mP{{\bm{P}}}
\def\mQ{{\bm{Q}}}
\def\mR{{\bm{R}}}
\def\mS{{\bm{S}}}
\def\mT{{\bm{T}}}
\def\mU{{\bm{U}}}
\def\mV{{\bm{V}}}
\def\mW{{\bm{W}}}
\def\mX{{\bm{X}}}
\def\mY{{\bm{Y}}}
\def\mZ{{\bm{Z}}}
\def\mBeta{{\bm{\beta}}}
\def\mPhi{{\bm{\Phi}}}
\def\mLambda{{\bm{\Lambda}}}
\def\mSigma{{\bm{\Sigma}}}
% Tensor
\DeclareMathAlphabet{\mathsfit}{\encodingdefault}{\sfdefault}{m}{sl}
\SetMathAlphabet{\mathsfit}{bold}{\encodingdefault}{\sfdefault}{bx}{n}
\newcommand{\tens}[1]{\bm{\mathsfit{#1}}}
\def\tA{{\tens{A}}}
\def\tB{{\tens{B}}}
\def\tC{{\tens{C}}}
\def\tD{{\tens{D}}}
\def\tE{{\tens{E}}}
\def\tF{{\tens{F}}}
\def\tG{{\tens{G}}}
\def\tH{{\tens{H}}}
\def\tI{{\tens{I}}}
\def\tJ{{\tens{J}}}
\def\tK{{\tens{K}}}
\def\tL{{\tens{L}}}
\def\tM{{\tens{M}}}
\def\tN{{\tens{N}}}
\def\tO{{\tens{O}}}
\def\tP{{\tens{P}}}
\def\tQ{{\tens{Q}}}
\def\tR{{\tens{R}}}
\def\tS{{\tens{S}}}
\def\tT{{\tens{T}}}
\def\tU{{\tens{U}}}
\def\tV{{\tens{V}}}
\def\tW{{\tens{W}}}
\def\tX{{\tens{X}}}
\def\tY{{\tens{Y}}}
\def\tZ{{\tens{Z}}}
% Graph
\def\gA{{\mathcal{A}}}
\def\gB{{\mathcal{B}}}
\def\gC{{\mathcal{C}}}
\def\gD{{\mathcal{D}}}
\def\gE{{\mathcal{E}}}
\def\gF{{\mathcal{F}}}
\def\gG{{\mathcal{G}}}
\def\gH{{\mathcal{H}}}
\def\gI{{\mathcal{I}}}
\def\gJ{{\mathcal{J}}}
\def\gK{{\mathcal{K}}}
\def\gL{{\mathcal{L}}}
\def\gM{{\mathcal{M}}}
\def\gN{{\mathcal{N}}}
\def\gO{{\mathcal{O}}}
\def\gP{{\mathcal{P}}}
\def\gQ{{\mathcal{Q}}}
\def\gR{{\mathcal{R}}}
\def\gS{{\mathcal{S}}}
\def\gT{{\mathcal{T}}}
\def\gU{{\mathcal{U}}}
\def\gV{{\mathcal{V}}}
\def\gW{{\mathcal{W}}}
\def\gX{{\mathcal{X}}}
\def\gY{{\mathcal{Y}}}
\def\gZ{{\mathcal{Z}}}
% Sets
\def\sA{{\mathbb{A}}}
\def\sB{{\mathbb{B}}}
\def\sC{{\mathbb{C}}}
\def\sD{{\mathbb{D}}}
% Don't use a set called E, because this would be the same as our symbol
% for expectation.
\def\sF{{\mathbb{F}}}
\def\sG{{\mathbb{G}}}
\def\sH{{\mathbb{H}}}
\def\sI{{\mathbb{I}}}
\def\sJ{{\mathbb{J}}}
\def\sK{{\mathbb{K}}}
\def\sL{{\mathbb{L}}}
\def\sM{{\mathbb{M}}}
\def\sN{{\mathbb{N}}}
\def\sO{{\mathbb{O}}}
\def\sP{{\mathbb{P}}}
\def\sQ{{\mathbb{Q}}}
\def\sR{{\mathbb{R}}}
\def\sS{{\mathbb{S}}}
\def\sT{{\mathbb{T}}}
\def\sU{{\mathbb{U}}}
\def\sV{{\mathbb{V}}}
\def\sW{{\mathbb{W}}}
\def\sX{{\mathbb{X}}}
\def\sY{{\mathbb{Y}}}
\def\sZ{{\mathbb{Z}}}
% Entries of a matrix
\def\emLambda{{\Lambda}}
\def\emA{{A}}
\def\emB{{B}}
\def\emC{{C}}
\def\emD{{D}}
\def\emE{{E}}
\def\emF{{F}}
\def\emG{{G}}
\def\emH{{H}}
\def\emI{{I}}
\def\emJ{{J}}
\def\emK{{K}}
\def\emL{{L}}
\def\emM{{M}}
\def\emN{{N}}
\def\emO{{O}}
\def\emP{{P}}
\def\emQ{{Q}}
\def\emR{{R}}
\def\emS{{S}}
\def\emT{{T}}
\def\emU{{U}}
\def\emV{{V}}
\def\emW{{W}}
\def\emX{{X}}
\def\emY{{Y}}
\def\emZ{{Z}}
\def\emSigma{{\Sigma}}
% entries of a tensor
% Same font as tensor, without \bm wrapper
\newcommand{\etens}[1]{\mathsfit{#1}}
\def\etLambda{{\etens{\Lambda}}}
\def\etA{{\etens{A}}}
\def\etB{{\etens{B}}}
\def\etC{{\etens{C}}}
\def\etD{{\etens{D}}}
\def\etE{{\etens{E}}}
\def\etF{{\etens{F}}}
\def\etG{{\etens{G}}}
\def\etH{{\etens{H}}}
\def\etI{{\etens{I}}}
\def\etJ{{\etens{J}}}
\def\etK{{\etens{K}}}
\def\etL{{\etens{L}}}
\def\etM{{\etens{M}}}
\def\etN{{\etens{N}}}
\def\etO{{\etens{O}}}
\def\etP{{\etens{P}}}
\def\etQ{{\etens{Q}}}
\def\etR{{\etens{R}}}
\def\etS{{\etens{S}}}
\def\etT{{\etens{T}}}
\def\etU{{\etens{U}}}
\def\etV{{\etens{V}}}
\def\etW{{\etens{W}}}
\def\etX{{\etens{X}}}
\def\etY{{\etens{Y}}}
\def\etZ{{\etens{Z}}}
% The true underlying data generating distribution
\newcommand{\pdata}{p_{\rm{data}}}
% The empirical distribution defined by the training set
\newcommand{\ptrain}{\hat{p}_{\rm{data}}}
\newcommand{\Ptrain}{\hat{P}_{\rm{data}}}
% The model distribution
\newcommand{\pmodel}{p_{\rm{model}}}
\newcommand{\Pmodel}{P_{\rm{model}}}
\newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}}
% Stochastic autoencoder distributions
\newcommand{\pencode}{p_{\rm{encoder}}}
\newcommand{\pdecode}{p_{\rm{decoder}}}
\newcommand{\precons}{p_{\rm{reconstruct}}}
\newcommand{\laplace}{\mathrm{Laplace}} % Laplace distribution
\newcommand{\E}{\mathbb{E}}
\newcommand{\Ls}{\mathcal{L}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\emp}{\tilde{p}}
\newcommand{\lr}{\alpha}
\newcommand{\reg}{\lambda}
\newcommand{\rect}{\mathrm{rectifier}}
\newcommand{\softmax}{\mathrm{softmax}}
\newcommand{\sigmoid}{\sigma}
\newcommand{\softplus}{\zeta}
\newcommand{\KL}{D_{\mathrm{KL}}}
\newcommand{\Var}{\mathrm{Var}}
\newcommand{\standarderror}{\mathrm{SE}}
\newcommand{\Cov}{\mathrm{Cov}}
% Wolfram Mathworld says $L^2$ is for function spaces and $\ell^2$ is for vectors
% But then they seem to use $L^2$ for vectors throughout the site, and so does
% wikipedia.
\newcommand{\normlzero}{L^0}
\newcommand{\normlone}{L^1}
\newcommand{\normltwo}{L^2}
\newcommand{\normlp}{L^p}
\newcommand{\normmax}{L^\infty}
\newcommand{\parents}{Pa} % See usage in notation.tex. Chosen to match Daphne's book.
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\sign}{sign}
\DeclareMathOperator{\Tr}{Tr}
\let\ab\allowbreak
The following technical blog post is intended to help readers understand the full content of the paper "QGen: On the Ability to Generalize in Quantization Aware Training", and especially the details of its underlying principles. The post is organized into several parts, covering the background, the core theory, the experiments, and the conclusions, so that readers can fully grasp what the paper studies: the generalization ability of quantized models.

As deep learning models keep growing, their compute and storage requirements grow with them, making real-world deployment increasingly challenging. **Model quantization** is a widely used compression and acceleration technique: it represents a model's weights and/or activations with fewer bits (e.g., 8-bit, 4-bit, or even lower), thereby reducing storage footprint, energy consumption, and inference latency.

Quantization has become an essential tool for deploying deep models in both academia and industry. For vision Transformers, mobile CNNs, and similar settings, a large body of work studies how to quantize full-precision 32-bit (FP32) models down to 8 bits or less.

Most previous research on quantization focuses on how to lower the bit width while preserving inference accuracy, or on how quantization affects convergence and training stability; comparatively little work has examined how quantization affects a model's ability to generalize.

This paper poses and answers the following core question:

Does quantization (especially low-bit quantization) help or hurt a model's generalization, and what are the principles and mechanisms behind it?

The conclusion is a pleasant surprise: in most cases, moderate quantization improves generalization. In particular, quantized models tend to converge to flatter minima, which makes them more robust not only on unseen test data but also on corrupted or perturbed data.
In the classic quantization-aware training (QAT) setup, certain layers or weights of the network are represented with a small number of bits, as shown in the equation below (taking LSQ quantization as an example):
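For reference, the standard LSQ (Learned Step Size Quantization) quantizer takes the following form (the paper's exact notation may differ slightly):

$$
\bar{w} = \Big\lfloor \mathrm{clip}\big(\tfrac{w}{s},\, -Q_N,\, Q_P\big) \Big\rceil, \qquad \hat{w} = \bar{w} \cdot s,
$$

where $s$ is the learned step size, $\lfloor \cdot \rceil$ denotes rounding to the nearest integer, and for signed $b$-bit quantization $Q_N = 2^{b-1}$ and $Q_P = 2^{b-1} - 1$. The dequantized weight $\hat{w}$ is used in the forward pass, while gradients flow through the rounding operation via the straight-through estimator.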
Let the quantization "step size" (the width of each quantization bin) be $s$. The fewer bits we use, the larger $s$ becomes and the stronger the quantization noise. This matches intuition: when only a handful of discrete values are available to represent the weights, rounding produces larger errors.
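As a quick worked example (assuming a symmetric clipping range of $[-1, 1]$, an illustrative choice rather than the paper's exact setting), the step size of a uniform $b$-bit quantizer is

$$
s = \frac{2}{2^b - 1}, \qquad b = 8 \Rightarrow s \approx 0.008, \quad b = 4 \Rightarrow s \approx 0.133, \quad b = 2 \Rightarrow s \approx 0.667,
$$

so dropping from 8 bits to 2 bits makes each quantization bin roughly 85 times wider.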
In the theoretical part of the paper, a first-order (linear) expansion of the mean-squared-error loss leads to a key observation: for a deep network trained to minimize a loss $\mathcal{L}(w)$, adding zero-mean quantization noise $\epsilon$ to the weights $w$ turns the training objective, approximately, into the original loss plus an extra penalty term, roughly of the form

$$
\mathbb{E}_{\epsilon}\big[\mathcal{L}(w + \epsilon)\big] \;\approx\; \mathcal{L}(w) + \lambda(s)\, R(w),
$$

where the strength $\lambda(s)$ of the penalty grows with the quantization step size $s$ (and therefore shrinks as the bit width increases).

This tells us that training under quantization noise behaves like training the original objective with an additional regularizer whose strength is set by how coarse the quantization is.

We can therefore view quantization as a form of **implicit regularization with tunable strength**, with the bit width acting as the hyperparameter that controls how strong this regularization is.
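The following minimal sketch makes this view concrete (it is illustrative only; the model, data, and constants are assumptions rather than anything from the paper): injecting zero-mean uniform noise of width $s$ into the weights of a small network raises the expected loss, and the increase grows with $s$.

```python
# Minimal sketch (not the paper's code): empirically check that injecting
# zero-mean uniform "quantization-like" noise of width s into the weights
# of a small network raises the expected MSE loss, and that the increase
# grows with s. Model, data, and constants here are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression problem and a 1-hidden-layer MLP.
X = rng.normal(size=(256, 16))
true_w = rng.normal(size=(16, 1))
y = np.tanh(X @ true_w) + 0.05 * rng.normal(size=(256, 1))

W1 = rng.normal(scale=0.3, size=(16, 32))
W2 = rng.normal(scale=0.3, size=(32, 1))

def mse_loss(W1, W2):
    h = np.tanh(X @ W1)
    pred = h @ W2
    return float(np.mean((pred - y) ** 2))

base = mse_loss(W1, W2)
print(f"loss with unperturbed weights: {base:.4f}")

# For each noise width s, average the loss over many random perturbations
# drawn from Uniform(-s/2, s/2), mimicking the rounding noise of a
# quantizer with step size s.
for s in [0.01, 0.05, 0.1, 0.2, 0.4]:
    losses = []
    for _ in range(200):
        e1 = rng.uniform(-s / 2, s / 2, size=W1.shape)
        e2 = rng.uniform(-s / 2, s / 2, size=W2.shape)
        losses.append(mse_loss(W1 + e1, W2 + e2))
    print(f"s = {s:4.2f}  E[loss(w + eps)] = {np.mean(losses):.4f}  "
          f"penalty = {np.mean(losses) - base:+.4f}")
```

Running it shows the extra penalty term growing monotonically with the noise width, which is exactly the "tunable regularization strength" described above.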
A recurring empirical finding in the study of generalization in deep learning is the following: if the loss surface around a minimum is relatively flat, the model at that minimum usually generalizes better; if the surface around the minimum is very sharp, the model is more prone to overfitting.

The paper argues that when the quantization noise $\epsilon$ is bounded (e.g., $\|\epsilon\|_\infty \le s/2$ for rounding with step size $s$), it pushes the model toward flatter regions during training. Intuitively, quantization keeps "jiggling" the parameters throughout training, which prevents the network from settling into an extremely sharp local minimum.
Through experiments on a large number of CNN and Transformer models, the authors verify this hypothesis using sharpness measurements of the loss landscape together with direct loss-surface visualizations.

The loss-surface visualizations in the paper give an intuitive picture: around their local minima, the 4-bit and 2-bit quantized networks sit on visibly flatter loss surfaces than their full-precision counterparts.
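Such visualizations are typically produced by slicing the loss surface along normalized random directions (in the spirit of Li et al., 2018). The sketch below is a simplification using a toy model, not the paper's code; it only shows the basic recipe for a 1-D slice.

```python
# Minimal sketch of a 1-D loss-surface slice: evaluate L(w + alpha * d)
# along a normalized random direction d. A flatter curve around alpha = 0
# indicates a flatter minimum. The toy model and data are stand-ins.
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(256, 16))
y = np.tanh(X @ rng.normal(size=(16, 1)))
W1 = rng.normal(scale=0.3, size=(16, 32))
W2 = rng.normal(scale=0.3, size=(32, 1))

def loss(W1, W2):
    return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

# Draw a random direction and rescale it to match the norm of each weight
# matrix (a simplified, layer-wise version of "filter normalization").
D1 = rng.normal(size=W1.shape)
D2 = rng.normal(size=W2.shape)
D1 *= np.linalg.norm(W1) / np.linalg.norm(D1)
D2 *= np.linalg.norm(W2) / np.linalg.norm(D2)

for alpha in np.linspace(-1.0, 1.0, 11):
    print(f"alpha = {alpha:+.1f}  loss = {loss(W1 + alpha * D1, W2 + alpha * D2):.4f}")
```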
The authors train more than 2,000 models in total on CIFAR-10, CIFAR-100, and ImageNet-1K, covering different architectures, quantization bit widths, and hyperparameter combinations, in order to systematically analyze the relationship between quantization and generalization.

A representative result is shown in the table below (excerpted from the paper's experiments):
| Model | Dataset | Precision | Train Acc. | Test Acc. | Train Loss | Test Loss | Generalization Gap |
|---|---|---|---|---|---|---|---|
| ResNet-18 | ImageNet-1K | FP32 | 69.96 | 71.49 | 1.18 | 2.23 | 1.05 |
| ResNet-18 | ImageNet-1K | Int8 | 73.23 | 73.32 | 1.28 | 2.10 | 0.82 |
| ResNet-18 | ImageNet-1K | Int4 | 71.34 | 71.74 | 1.26 | 2.18 | 0.92 |
| ResNet-18 | ImageNet-1K | Int2 | 67.10 | 68.58 | 1.38 | 2.16 | 0.78 |
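Here the generalization gap is the difference between the test loss and the training loss; for the FP32 row, for example, $2.23 - 1.18 = 1.05$. Note that all three quantized variants (Int8, Int4, Int2) show a smaller loss gap than the FP32 baseline.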
The paper also evaluates the models on corrupted benchmark variants such as CIFAR-100-C and ImageNet-C, which apply 15 typical corruption types (Gaussian noise, blur, contrast changes, brightness changes, and so on) at several severity levels. The results show that, at most corruption severities, quantized models exhibit a smaller generalization gap and more stable performance, as in the example below (see the paper for the complete results):
| Precision | Severity=1 | Severity=3 | ... |
|---|---|---|---|
| FP32 | 1.24 | 3.80 | ... |
| 8-bit | 0.96 | 2.75 | ... |
| 4-bit | 1.05 | 2.99 | ... |
| 2-bit | 1.33 | 3.68 | ... |
(In the paper's tables, cells highlighted in green indicate results that are better than the FP32 baseline.)
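As a rough illustration of how such per-severity numbers can be produced (assuming, as the text suggests, that the tabulated values are loss gaps; the data and "model outputs" below are synthetic stand-ins, not the paper's evaluation pipeline):

```python
# Minimal sketch: given per-example losses on the training set and on
# corrupted test sets at several severity levels, compute
#   gap(severity) = mean corrupted test loss - mean training loss.
# Everything here is synthetic; corruption is simulated as logit noise.
import numpy as np

rng = np.random.default_rng(2)

def eval_mean_loss(logits, labels):
    """Mean cross-entropy loss from raw logits and integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

num_classes, n = 10, 1000
labels = rng.integers(0, num_classes, size=n)

# Synthetic "clean" logits: confident and mostly correct.
clean_logits = rng.normal(size=(n, num_classes))
clean_logits[np.arange(n), labels] += 3.0
train_loss = eval_mean_loss(clean_logits, labels)

for severity in range(1, 6):
    noisy_logits = clean_logits + severity * 0.8 * rng.normal(size=clean_logits.shape)
    gap = eval_mean_loss(noisy_logits, labels) - train_loss
    print(f"severity {severity}: generalization gap = {gap:.3f}")
```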
Combining theoretical analysis with large-scale empirical study, the paper's main conclusions are that quantization acts as an implicit, tunable regularizer; that quantization-aware training tends to drive models toward flatter minima; and that moderately quantized models (such as 8-bit and 4-bit) often achieve a smaller generalization gap and better robustness on corrupted data, while very low-bit quantization (such as 2-bit) gives up some accuracy even though its generalization gap remains small.

Key takeaway: discussions of quantization usually revolve around inference efficiency or the accuracy drop it may cause. This paper shows that quantization also brings a genuine regularization benefit, improving a model's robustness in complex or corrupted scenarios, which is a property well worth exploiting when deploying models in safety-critical settings such as autonomous driving or medical diagnosis.

Summary:
Through theory and large-scale experiments, the paper systematically demonstrates the positive effect of quantization-aware training on generalization. Quantization helps a model strike a better balance between training loss and generalization error, and it keeps the model robust under perturbations. If you already use quantization for deployment, it is worth paying attention to the regularization benefit it brings; you may well end up with a model that is both small and strong.