% Created: February 24, 2026
% Answered with GPT-5.2 Thinking by Chat01
%========================
\section{Construction of Dataset}
%========================
Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the corresponding sampling interval length.
Define the local within-interval time variable $\tau=t-t_n\in[0,\delta_n]$.
Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$),
and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$).
The system is strongly coupled because the inter-stand tensions propagate along the mill line and are affected by neighboring stands' actions.
Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be the desired references (setpoints) given by process requirements. We define deviation variables
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\label{eq:dev_def}
\end{equation}
Throughout this paper, the symbol ``$\Delta$'' attached to states always means deviation from reference.
For each stand $i$, we choose the local deviation state as
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3,
\label{eq:xi_def_clean}
\end{equation}
with the boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$ to keep a unified dimension $d=3$ for all stands.
For a five-stand tandem mill, the dominant coupling is between adjacent stands, hence we define
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad
Z_5=\{4\}.
\label{eq:Zi_clean}
\end{equation}
Define the neighbor-state stack
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}.
\label{eq:xZi_clean}
\end{equation}
Each stand $i$ is manipulated by roll gap $s_i(t)$ and stand speed $v_i(t)$:
\begin{equation}
u_i(t)=
\begin{bmatrix}
s_i(t)\\
v_i(t)
\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_clean}
\end{equation}
To ensure smooth actuation and match industrial practice, we optimize \emph{discrete input increments}:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=
\begin{bmatrix}
\Delta s_i(t_n)\\
\Delta v_i(t_n)
\end{bmatrix}.
\label{eq:du_discrete_clean}
\end{equation}
Throughout this paper, the symbol ``$\Delta$'' attached to inputs $\Delta u_i(t_n)$ means sample-to-sample increment.
Thus, $\Delta x$ and $\Delta u$ are conceptually different, and this is fixed by definition.
Let $d_i(t)$ denote exogenous disturbances.
We denote the interval-level equivalent disturbance by $\Delta d_i(t_n)$.
$I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes the $a\times b$ zero matrix.
The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by a discrete-time mapping
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big),
\label{eq:true_mapping_clean}
\end{equation}
where $\Phi_i(\cdot)$ is generally nonlinear and coupled due to rolling deformation and tension propagation.
A commonly used conceptual equivalent discrete linear form is
\begin{equation}
\Delta x_i(t_{n+1})
=
M_d\,\Delta x_i(t_n)
+
N_d\,\Delta u_i(t_n)
+
F_d\,\Delta d_i(t_n),
\label{eq:linear_form_concept}
\end{equation}
where $M_d,N_d,F_d$ represent equivalent discrete-time matrices around operating conditions.
In a practical five-stand cold rolling mill, accurately deriving and identifying these matrices and disturbance models from first principles is difficult,
due to strong coupling, unmodeled nonlinearities, and time-varying operating regimes.
Therefore, this paper aims to learn a high-fidelity approximation of the interval evolution from data and then embed it into distributed MPC.
\begin{remark}
In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish \eqref{eq:linear_form_concept}
based on first principles. Therefore, in this paper, we learn an approximate mapping of \eqref{eq:true_mapping_clean} from data.
\end{remark}
Although decisions are updated at discrete instants , the hydraulic gap and drive systems evolve continuously inside each interval,
and abrupt within-interval changes may excite tension oscillations and deteriorate thickness stability.
Thus, parameterizing the within-interval increment trajectory by a low-order polynomial:
(i) yields a compact finite-dimensional decision representation;
(ii) enforces smooth profiles inside the interval;
(iii) enables enforcing increment constraints for all $\tau\in[0,\delta_n]$.
This is appropriate when $\delta_n$ is not excessively large relative to actuator bandwidth and the within-interval evolution is well approximated by a low-order basis.
In general, we adopt an $n_p$-th order polynomial for scheme design; in simulation one can choose $n_p=2$.
On the interval $[t_n,t_{n+1}]$, parameterize the control increment trajectory as
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\sum_{r=0}^{n_p}\Gamma_{i,nr}\tau^r,
\qquad \tau\in[0,\delta_n],
\label{eq:du_poly_vec_clean}
\end{equation}
where $\Gamma_{i,nr}\in\mathbb{R}^{n_u}$ are coefficient vectors ($n_u=2$) and $n_p\in\mathbb{N}$ is the polynomial order.
Component-wise, \eqref{eq:du_poly_vec_clean} corresponds to
\begin{equation}
\begin{aligned}
\Delta s_{i,n}(\tau) &= \sum_{r=0}^{n_p}\gamma^{(s)}_{i,nr}\tau^r,\\
\Delta v_{i,n}(\tau) &= \sum_{r=0}^{n_p}\gamma^{(v)}_{i,nr}\tau^r.
\end{aligned}
\label{eq:du_components_clean}
\end{equation}
Define the stacked parameter vector
\begin{equation}
\Gamma_{i,n}\triangleq
\big[
(\Gamma_{i,n0})^\top,\,
(\Gamma_{i,n1})^\top,\,
\ldots,\,
(\Gamma_{i,nn_p})^\top
\big]^\top
\in\mathbb{R}^{p},
\qquad
p=(n_p+1)n_u.
\label{eq:Gamma_clean}
\end{equation}
Here, $\Gamma_{i,n0}$ is the baseline increment at $\tau=0$, while $\{\Gamma_{i,nr}\}_{r=1}^{n_p}$ describe the time-varying rates up to order $n_p$.
Define the interval-averaged equivalent increments as
\begin{equation}
\begin{aligned}
\Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\
\Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau.
\end{aligned}
\label{eq:avg_def_clean}
\end{equation}
With \eqref{eq:du_poly_vec_clean}, the input average has a closed form:
\begin{equation}
\Delta u_i(t_n)=
\sum_{r=0}^{n_p}\Gamma_{i,nr}\frac{\delta_n^{r}}{r+1}.
\label{eq:avg_closed_clean}
\end{equation}
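The closed-form average \eqref{eq:avg_closed_clean} can be sanity-checked numerically. The following is a minimal sketch assuming NumPy; the helper names \texttt{du\_poly} and \texttt{du\_average} are illustrative and not part of the paper's notation.

```python
import numpy as np

def du_poly(tau, Gamma):
    # Evaluate sum_r Gamma[r] * tau**r; Gamma has shape (n_p + 1, n_u),
    # one coefficient vector per power of tau.
    return (tau ** np.arange(Gamma.shape[0])) @ Gamma

def du_average(Gamma, delta):
    # Closed-form interval average: sum_r Gamma[r] * delta**r / (r + 1).
    r = np.arange(Gamma.shape[0])
    return (delta ** r / (r + 1)) @ Gamma

# n_p = 2, n_u = 2 example, compared against trapezoidal quadrature
rng = np.random.default_rng(0)
Gamma = rng.standard_normal((3, 2))
delta = 0.05
taus = np.linspace(0.0, delta, 2001)
vals = np.array([du_poly(t, Gamma) for t in taus])
quad = (vals[:-1] + vals[1:]).sum(axis=0) * (taus[1] - taus[0]) / (2.0 * delta)
```

For $n_p=0$ the average reduces to the constant coefficient itself, which is a quick degenerate check of the formula.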
Let $\mathcal{I}_x$ denote the sampling domain of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$,
and let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$.
These domains specify the operating envelope used to generate supervised training data.
Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$.
In addition to the local deviation state, the neighbor deviation states are included to represent inter-stand coupling.
The process is summarized in Table~\ref{tab:interval_sample_generation_en}.
\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.15}
\caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).}
\label{tab:interval_sample_generation_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X}
\toprule
\textbf{Step} & \textbf{Operation} \\
\midrule
1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\
2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$, with order $n_p$). \\
3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via \eqref{eq:du_poly_vec_clean}. \\
4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\
\bottomrule
\end{tabularx}
\end{table}
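Steps 1--4 of the table can be sketched as follows on a toy linear stand-in for the coupled mill dynamics; the matrices \texttt{A}, \texttt{B}, all dimensions, and the scaling factors are illustrative assumptions only, not the mill model of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_u, n_p = 3, 2, 2
A = -0.5 * np.eye(d)                         # toy deviation dynamics (illustrative)
B = 0.1 * rng.standard_normal((d, n_u))      # toy input map (illustrative)

def f(x, tau, Gamma):
    # Continuous-time right-hand side driven by the polynomial increment.
    du = (tau ** np.arange(n_p + 1)) @ Gamma
    return A @ x + B @ du

def rk4_step(x, Gamma, delta, m=20):
    # Integrate over [0, delta] with m classical RK4 sub-steps.
    h, tau = delta / m, 0.0
    for _ in range(m):
        k1 = f(x, tau, Gamma)
        k2 = f(x + 0.5 * h * k1, tau + 0.5 * h, Gamma)
        k3 = f(x + 0.5 * h * k2, tau + 0.5 * h, Gamma)
        k4 = f(x + h * k3, tau + h, Gamma)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        tau += h
    return x

x0 = 0.01 * rng.standard_normal(d)                  # Step 1: state sampling
Gamma = 0.01 * rng.standard_normal((n_p + 1, n_u))  # Step 2: parameter sampling
delta = 0.05
x1 = rk4_step(x0, Gamma, delta)                     # Steps 3-4: construct control, propagate
sample = (x0, Gamma, delta, x1)                     # one interval-level training sample
```

With zero input coefficients the integrator reproduces the free decay of the toy dynamics, which provides a simple accuracy check on the RK4 sub-stepping.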
Accordingly, an interval sample for subsystem $i$ can be represented as
\begin{equation}
\mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\}.
\label{eq:interval_sample_clean}
\end{equation}
Note that $\Delta u_{i,n}(\tau)$ is fully determined by $(\Gamma_{i,n},\delta_n)$ via \eqref{eq:du_poly_vec_clean},
therefore it is sufficient to store $(\Gamma_{i,n},\delta_n)$ as the learning input.
For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws,
the local one-step training dataset is formed as
\begin{equation}
\begin{split}
S_i=\Big\{&
\big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\,
\Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big)
\ \Big|\ j=1,\ldots,J
\Big\}.
\end{split}
\label{eq:S_i_clean}
\end{equation}
Here $J$ is the number of one-step samples for subsystem $i$.
The overall dataset for the five-stand mill is denoted by $\{S_i\}_{i=1}^{5}$.
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}.
\begin{figure*}[htbp]
\centering
\includegraphics[scale=0.5]{picture/Fig2.pdf}
\caption{Point cloud map of the training dataset.}\label{2}
\end{figure*}
The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss
and reciprocal-consistency regularization, because these objectives require ground-truth deviation-state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples
into $K$-step trajectory segments.
Specifically, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling
$\{(\Gamma_{i,n+s},\delta_{n+s})\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances),
and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$.
Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks
$\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.
Define a $K$-step segment sample for subsystem $i$ as
\begin{equation}
\begin{aligned}
\mathcal{W}_{i,n}=
\Big\{&
\big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1};\\
&\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1}
\Big\}.
\end{aligned}
\label{eq:segment_clean}
\end{equation}
By repeating the above segment generation, we form the multi-step training set
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\},
\label{eq:S_i_K_clean}
\end{equation}
where $J_K$ is the number of $K$-step segment samples.
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (keeping only the $s=0$ transition of each segment),
thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training.
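The segment organization is a pure reindexing of already-simulated trajectories, which the following sketch illustrates; \texttt{make\_segments} is an illustrative helper name under the stated assumptions, not code from the paper.

```python
import numpy as np

def make_segments(xs, gammas, deltas, K):
    """Slice one simulated trajectory into overlapping K-step segment samples.

    xs:     states x(t_0), ..., x(t_N)                (length N + 1)
    gammas: stacked polynomial parameters per interval (length N)
    deltas: interval lengths                           (length N)
    Returns (inputs, targets) pairs: inputs collects
    (x(t_{n+s}), Gamma_{n+s}, delta_{n+s}) and targets x(t_{n+s+1}), s = 0..K-1.
    """
    N = len(deltas)
    segments = []
    for n in range(N - K + 1):
        inputs = [(xs[n + s], gammas[n + s], deltas[n + s]) for s in range(K)]
        targets = [xs[n + s + 1] for s in range(K)]
        segments.append((inputs, targets))
    return segments

# tiny example: N = 5 one-step samples reorganized into K = 3 segments
xs = [np.array([float(n)]) for n in range(6)]
gammas = [np.zeros(6) for _ in range(5)]
deltas = [0.1] * 5
segs = make_segments(xs, gammas, deltas, K=3)
```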
%========================
\section{Construction of Residual Neural Network}
%========================
\subsection{Residual Neural Network Structure Construction and Training Method}
Given the dataset, the neural network model is trained to learn a stand-wise, control-dependent one-step evolution law of deviation states:
\begin{equation}
\Delta x_i(t_{n+1})
\approx
\Delta x_i(t_n)+
\mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\,\Theta_i\Big),
\label{eq:learned_dyn_clean}
\end{equation}
where $\mathcal{N}_i$ outputs the one-step deviation-state change and $\Theta_i$ are trainable parameters.
\begin{remark}
If $\mathcal{N}_i$ does not take control information as input (here $\Gamma_{i,n}$ and $\delta_n$),
the predictor becomes an autoregressive model that only reproduces trajectories under the training input patterns
and cannot answer the counterfactual question: ``what will happen if we choose a different roll gap and speed trajectory?''
Since MPC optimizes over candidate decisions, a control-dependent predictor \eqref{eq:learned_dyn_clean} is necessary
to evaluate the predicted thickness and tension behavior under different candidate actuator trajectories.
\end{remark}
Let $d=3$ denote the state dimension, $|Z_i|$ the number of neighbors of stand $i$ in \eqref{eq:Zi_clean}, and $p=(n_p+1)n_u$ the parameter dimension in \eqref{eq:Gamma_clean}.
Define the input vector
\begin{equation}
X_{i,\text{in}} \triangleq
\big[
\Delta x_i(t_n)^\top,\,
\Delta x_{Z_i}(t_n)^\top,\,
\Gamma_{i,n}^\top,\,
\delta_n
\big]^\top
\in \mathbb{R}^{d(1+|Z_i|)+p+1}.
\label{eq:X_in_clean}
\end{equation}
The network mapping is
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}.
\end{equation}
To improve training stability and long-horizon rollout robustness, we use a residual form.
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a selection matrix extracting the local state block:
\begin{equation}
\hat{I}_i = [I_d,\ 0_{d\times(d|Z_i|+p+1)}].
\label{eq:Ihat_clean}
\end{equation}
Then the one-step predictor is written as
\begin{equation}
X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),
\label{eq:res_predict_clean}
\end{equation}
where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$.
This structure implements a baseline-plus-correction interpretation:
the shortcut propagates the current deviation state $\Delta x_i(t_n)$, while the network learns the correction capturing
unmodeled nonlinearities and inter-stand coupling (via $\Delta x_{Z_i}(t_n)$) under varying operating conditions.
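A minimal NumPy sketch of the residual predictor \eqref{eq:res_predict_clean}, with a toy one-hidden-layer network standing in for the trained correction $\mathcal{N}_i$; the weights, hidden width, and dimensions are illustrative assumptions.

```python
import numpy as np

d, nZ, p = 3, 2, 6                    # local state dim, |Z_i|, parameter dim (n_p = 2)
n_in = d * (1 + nZ) + p + 1           # input dimension of X_in

rng = np.random.default_rng(2)
W1 = 0.1 * rng.standard_normal((32, n_in))   # toy correction network weights
W2 = 0.1 * rng.standard_normal((d, 32))

def correction(x_in):
    # Stand-in for N_i: one-hidden-layer MLP producing the d-dim increment.
    return W2 @ np.tanh(W1 @ x_in)

# selection matrix extracting the local state block (eq. Ihat_clean)
I_hat = np.hstack([np.eye(d), np.zeros((d, n_in - d))])

def predict(dx_i, dx_Z, Gamma, delta):
    # Residual one-step predictor: shortcut + learned correction.
    x_in = np.concatenate([dx_i, dx_Z, Gamma, [delta]])
    return I_hat @ x_in + correction(x_in)

x_next = predict(0.01 * np.ones(d), np.zeros(d * nZ), np.zeros(p), 0.05)
```

Setting the correction to zero recovers the pure shortcut, i.e., the predictor then simply propagates the current deviation state.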
To improve robustness when $\delta_n$ varies, we introduce an auxiliary branch inside $\mathcal{N}_i$:
\begin{equation}
\mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq
\psi_i(X_{i,\text{in}};\Theta_{\psi_i}) + \rho_i(X_{i,\text{in}};\theta_i),
\label{eq:aux_clean}
\end{equation}
where $\psi_i$ is a lightweight feedforward branch that captures low-frequency/scale effects strongly related to $\delta_n$,
and $\rho_i$ captures the remaining nonlinear coupling corrections.
When $\psi_i\equiv 0$, the model reduces to a standard residual network.
For the $j$-th sample in \eqref{eq:S_i_clean}, define
\begin{equation}
X_{i,\text{in}}^{(j)} =
\big[
\Delta x_i^{(j)}(t_n)^\top,\ \Delta x_{Z_i}^{(j)}(t_n)^\top,\
\Gamma_{i,n}^{(j)\top},\ \delta_n^{(j)}
\big]^{\top},
\end{equation}
and the supervised residual target
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\label{eq:target_clean}
\end{equation}
To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability,
we train the forward predictor jointly with an auxiliary backward residual model
and impose a multi-step reciprocal-consistency regularization over a $K$-step segment from $S_i^{(K)}$.
Construct a backward residual network
\begin{equation}
\mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$. For the backward step associated with interval $[t_n,t_{n+1}]$, define
\begin{equation}
\begin{aligned}
X_{i,\mathrm{in}}^{b}
&=
\big[
\Delta x_i(t_{n+1})^\top,\ \Delta x_{Z_i}(t_{n+1})^\top,\
\Gamma_{i,n}^\top,\ \delta_n
\big]^{\top},\\
X_{i,\mathrm{out}}^{b}
&=
\hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i),
\end{aligned}
\label{eq:back_clean}
\end{equation}
where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$.
The supervised backward residual target is
\begin{equation}
\Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}).
\end{equation}
Given a segment sample $\mathcal{W}_{i,n}$, initialize
\begin{equation}
\Delta \hat{x}_i(t_n)=\Delta x_i(t_n),
\end{equation}
and recursively apply the forward predictor for $K$ steps:
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\,
\Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i
\Big),\\
&\qquad s=0,\ldots,K-1.
\end{aligned}
\label{eq:fwd_roll_clean}
\end{equation}
Set the terminal condition
\begin{equation}
\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}),
\end{equation}
and roll back using $\mathcal{B}_i$:
\begin{equation}
\begin{aligned}
\Delta \bar{x}_i(t_{n+s})
&=
\hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\end{aligned}
\label{eq:bwd_roll_clean}
\end{equation}
where
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})=
\big[
\Delta \bar{x}_i(t_{n+s+1})^\top,\ \Delta \hat{x}_{Z_i}(t_{n+s+1})^\top,\
\Gamma_{i,n+s}^\top,\ \delta_{n+s}
\big]^{\top}.
\end{equation}
Define the segment-wise reciprocal-consistency error
\begin{equation}
E_i(t_n)=
\sum_{s=0}^{K}
\left\|
\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})
\right\|^2.
\label{eq:Ei_consistency}
\end{equation}
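What the reciprocal-consistency error measures can be seen on a toy linear pair of maps: when the backward model is the exact inverse of the forward map, the forward and backward rollouts coincide and the error vanishes. Both maps below are illustrative stand-ins for the trained networks.

```python
import numpy as np

d, K = 3, 4
rng = np.random.default_rng(3)
Af = np.eye(d) + 0.01 * rng.standard_normal((d, d))  # toy forward one-step map
Ab = np.linalg.inv(Af)                               # exactly reciprocal backward map

def consistency_error(x0):
    # Forward rollout, backward rollout from the terminal point,
    # then E = sum_s ||x_hat_s - x_bar_s||^2 over the whole segment.
    fwd = [x0]
    for _ in range(K):
        fwd.append(Af @ fwd[-1])          # forward recursion
    bwd = [None] * (K + 1)
    bwd[K] = fwd[K]                       # terminal condition
    for s in range(K - 1, -1, -1):
        bwd[s] = Ab @ bwd[s + 1]          # backward recursion
    return sum(float(np.sum((a - b) ** 2)) for a, b in zip(fwd, bwd))

E = consistency_error(rng.standard_normal(d))
```

Training drives the learned forward and backward models toward this reciprocal relationship, which is what suppresses drift accumulation under recursion.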
We jointly minimize:
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}_i\!\left(
X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}_i\!\left(
X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm]
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\Big\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\Big\|^2.
\end{aligned}
\label{eq:loss_clean}
\end{equation}
Here, $L_{\mathrm{1step}}$ enforces one-step accuracy; $L_{\mathrm{roll}}$ explicitly suppresses long-horizon drift under recursion;
$L_{\mathrm{msrp}}$ regularizes the learned dynamics by enforcing reciprocal consistency between forward and backward rollouts;
and $L_{\mathrm{bwd}}$ trains the backward model for the consistency regularization.
In implementation, these terms are combined as
\begin{equation}
L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}},
\end{equation}
where $\lambda_1,\ldots,\lambda_4\ge 0$ are tuned on a validation set.
After training, denote the optimized parameters by $\Theta_i^*$. The learned one-step prediction is
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)
+
\mathcal{N}_i\!\Big(
\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,
\Gamma_{i,n},\,\delta_n;\,\Theta_i^*
\Big),
\label{eq:pred_clean}
\end{equation}
and multi-step prediction is obtained by recursive rollout of \eqref{eq:pred_clean}.
This learned predictor is the internal model used by the MPC optimizer in the next section.
Finally, network parameters are optimized using Adam:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon},
\end{equation}
where $\alpha>0$ is the learning rate (we use $\alpha$ to avoid conflict with other symbols),
$\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected moment estimates, and $\varepsilon$ is a small constant for numerical stability.
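A self-contained sketch of the update above (standard Adam with bias correction); the quadratic smoke test and all hyperparameter values are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update with bias-corrected first/second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)      # bias correction of the first moment
    v_hat = v / (1 - beta2 ** t)      # bias correction of the second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# smoke test: minimize L(theta) = ||theta||^2 / 2, whose gradient is theta
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, theta, m, v, t)
```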
Figure~\ref{fig:rnn_logic} illustrates the overall structure.
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.85]{picture/x6.pdf}
\caption{Logic diagram of the residual neural network.}
\label{fig:rnn_logic}
\end{figure}
\subsection{Explainability of the residual network}
Integrating the mechanistic deviation dynamics $f_i$ over one interval gives
\begin{equation}
\Delta x_i(t_{n+1})
=
\Delta x_i(t_n)
+
\underbrace{\int_{0}^{\delta_n}
f_i\!\Big(\Delta x_i(t_n+\tau),\,\Delta x_{Z_i}(t_n+\tau),\,u_i(t_n+\tau),\,d_i(t_n+\tau)\Big)\,d\tau}_{\triangleq\ \varphi_{i,n}},
\label{eq:increment_integral}
\end{equation}
where $\varphi_{i,n}$ is the one-interval state increment generated by the mechanistic dynamics.
Our learned model \eqref{eq:learned_dyn_clean} adopts the same increment form as \eqref{eq:increment_integral}:
\begin{equation}
\Delta x_i(t_{n+1})
\approx
\Delta x_i(t_n)
+
\mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\Theta_i\Big).
\end{equation}
Here, $\mathcal{N}_i$ plays the role of a data-driven approximation of the integral increment $\varphi_{i,n}$,
i.e., it approximates the accumulated effect of the mechanistic dynamics over $[t_n,t_{n+1}]$.
This is consistent with the well-known interpretation that a residual network behaves like a one-step time integrator:
the identity path propagates the current state, while the residual branch represents the increment over the time lag.
Inside each interval, we do not optimize the input point-wise but parameterize the increment trajectory by
$\Gamma_{i,n}$ via \eqref{eq:du_poly_vec_clean}.
Hence, the mechanistic increment in \eqref{eq:increment_integral} depends on the \emph{whole} within-interval trajectory.
Feeding $(\Gamma_{i,n},\delta_n)$ into $\mathcal{N}_i$ is therefore a compact way to represent how different candidate
gap/speed trajectories change the integral effect and thus the next thickness--tension state.
When $\delta_n$ varies and is not very small, directly learning the full increment may be harder.
Motivated by the generalized residual idea,
we decompose the increment predictor into two parts in \eqref{eq:aux_clean}:
\begin{equation}
\mathcal{N}_i(\cdot)=\psi_i(\cdot)+\rho_i(\cdot).
\end{equation}
Conceptually, $\psi_i$ captures low-frequency and scale effects strongly related to $\delta_n$,
while $\rho_i$ captures the remaining nonlinear coupling corrections.
This provides a mechanism-consistent interpretation: a baseline increment plus a residual correction
that compensates unmodeled nonlinearities and inter-stand coupling.
%========================
\section{Nash Equilibrium-Based RNE-DMPC}
%========================
The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation.
As a result, changes in control actions (roll gap and stand speed) at one stand can affect both upstream and downstream stands,
making centralized online optimization over all stands' decision variables computationally demanding.
To mitigate this issue, we decompose the global predictive-control problem into local subproblems associated with individual stands.
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers.
Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate distributed coordination as a Nash-equilibrium-seeking iteration.
Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed MPC method (RNE-DMPC)
for coordinated thickness--tension regulation and tracking. The overall control structure is shown in Figure~\ref{4}.
\begin{figure*}[htbp]
\centering
\includegraphics[width=\linewidth]{picture/x2.pdf}
\caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4}
\end{figure*}
At sampling time $t_n$, stand $i$ chooses the polynomial-parameter sequence
$\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, where $N_c$ is the control horizon. Let
$\mathbf{\Gamma}=(\mathbf{\Gamma}_1,\ldots,\mathbf{\Gamma}_5)$
denote the joint strategy profile, and let $\mathbf{\Gamma}_{-i}$ denote the collection of all strategies except stand $i$.
Given the current measured/estimated deviation state $\Delta x_i(t_n)$ and the neighbors' strategies
$\mathbf{\Gamma}_{Z_i}=\{\mathbf{\Gamma}_k\}_{k\in Z_i}$,
the multi-step prediction used by stand $i$ is written explicitly as
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i})
&=
\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s};\cdot),\,
\Delta \hat{x}_{Z_i}(t_{n+s};\mathbf{\Gamma}_{Z_i}),\\
&\qquad
\Gamma_{i,n+s},\,
\delta_{n+s};\,\Theta_i^*
\Big),
\end{aligned}
\label{eq:rollout_mpc_game}
\end{equation}
for $s=0,\ldots,N_p-1$, with initialization $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$, where $N_p$ is the prediction horizon.
Here the neighbor stack $\Delta \hat{x}_{Z_i}$ is generated from neighbors' strategies via the same learned predictors.
The corresponding interval-averaged increment follows \eqref{eq:avg_closed_clean}:
\begin{equation}
\Delta u_i(t_{n+s})=
\sum_{r=0}^{n_p}\Gamma_{i,n+s,r}\frac{\delta_{n+s}^{r}}{r+1}.
\label{eq:du_avg_clean}
\end{equation}
\begin{remark}
Because inter-stand tension $T_i$ is jointly affected by the adjacent stands $i$ and $i+1$,
the predicted evolution of $\Delta T_i$ depends on neighbors' future actions,
hence the MPC problems are not independent but form a coupled dynamic game.
\end{remark}
Stack the decision variables of stand $i$ over the control horizon as
\begin{equation}
\mathbf{\Gamma}_i\triangleq
\mathrm{col}\{\Gamma_{i,n},\Gamma_{i,n+1},\ldots,\Gamma_{i,n+N_c-1}\}
\in \mathbb{R}^{pN_c}.
\end{equation}
In deviation coordinates, the regulation/tracking objective is to drive the deviation state to zero, i.e.
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d},\qquad d=3.
\end{equation}
Recall $\Delta x_i=[\Delta h_i\ \ \Delta T_{i-1}\ \ \Delta T_i]^\top$.
Define the row selectors
\begin{equation}
C^- \triangleq [0\ \ 1\ \ 0]\in\mathbb{R}^{1\times 3},\qquad
C^+ \triangleq [0\ \ 0\ \ 1]\in\mathbb{R}^{1\times 3},
\end{equation}
so that $C^-\Delta x_i=\Delta T_{i-1}$ (upstream interface) and $C^+\Delta x_i=\Delta T_i$ (downstream interface).
The predicted shared-tension interface mismatch between stands $i$ and $i+1$ is
\begin{equation}
e_i^{T}(t_{n+s};\mathbf{\Gamma})
\triangleq
C^+\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma})
-
C^-\Delta \hat{x}_{i+1}(t_{n+s};\mathbf{\Gamma}),\qquad i=1,\ldots,4.
\label{eq:shared_tension_mismatch}
\end{equation}
The local objective of stand $i$ is
\begin{equation}
J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i})
=
\sum_{s=1}^{N_p}
\left\|
\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i})
\right\|_{Q_i}^{2}
+
\sum_{s=0}^{N_c-1}
\left\|\Gamma_{i,n+s}\right\|_{R_i}^{2}
+
J_i^{\mathrm{cpl}}(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}),
\label{eq:Ji_game}
\end{equation}
where $Q_i\succeq 0$ weights thickness and tension deviations, and $R_i\succ 0$ penalizes actuation magnitudes.
The coupling cost penalizes the predicted interface mismatches shared with the upstream and downstream neighbors:
\begin{equation}
\begin{aligned}
J_i^{\mathrm{cpl}}(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i})
=
\mu_i\sum_{s=1}^{N_p}
\Big(
&\mathbb{1}[i\ge 2]\,
\big\|
C^-\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma})
-
C^+\Delta \hat{x}_{i-1}(t_{n+s};\mathbf{\Gamma})
\big\|^2
\\
+\ &\mathbb{1}[i\le 4]\,
\big\|
C^+\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma})
-
C^-\Delta \hat{x}_{i+1}(t_{n+s};\mathbf{\Gamma})
\big\|^2
\Big),
\end{aligned}
\label{eq:coupling_cost}
\end{equation}
with $\mu_i>0$ and indicator $\mathbb{1}[\cdot]$.
This term makes the coupling conflict explicit: unilateral actions that locally reduce thickness error may worsen shared-tension
compatibility and thus increase $J_i^{\mathrm{cpl}}$, and also affect neighbors' objectives.
We enforce the absolute-input bounds and within-interval increment bounds.
Absolute input bounds (roll gap and speed):
\begin{equation}
u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},
\qquad s=0,\ldots,N_p-1,
\label{eq:u_abs_game}
\end{equation}
where $u_{i,\min},u_{i,\max}\in\mathbb{R}^{n_u}$ are elementwise bounds on the roll gap and speed.
Within-interval increment-trajectory bounds:
\begin{equation}
\Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max},
\qquad \forall\tau\in[0,\delta_{n+s}],\ s=0,\ldots,N_p-1,
\label{eq:du_traj_game}
\end{equation}
where $\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})$ is given by the polynomial parameterization \eqref{eq:du_poly_vec_clean}.
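In implementation, \eqref{eq:du_traj_game} can be checked (or conservatively enforced) by evaluating the polynomial on a dense $\tau$-grid; for $n_p=2$ an exact check via the vertex of each quadratic component is also possible. The helper below is an illustrative sketch assuming NumPy, not code from the controller.

```python
import numpy as np

def increment_within_bounds(Gamma, delta, du_min, du_max, n_grid=101):
    # Check du_min <= du(tau) <= du_max on a dense grid over [0, delta].
    # Gamma has shape (n_p + 1, n_u): one coefficient vector per power of tau.
    taus = np.linspace(0.0, delta, n_grid)
    vals = np.array([(t ** np.arange(Gamma.shape[0])) @ Gamma for t in taus])
    return bool(np.all(vals >= du_min) and np.all(vals <= du_max))

# linear-in-tau increment with slopes (1, -1): max |du| = 0.1 <= 0.2 on [0, 0.1]
Gamma = np.array([[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]])
ok = increment_within_bounds(Gamma, 0.1, -0.2, 0.2)
```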
Following \eqref{eq:avg_closed_clean}, define the interval-averaged increment
\begin{equation}
\Delta u_i(t_{n+s})=
\sum_{r=0}^{n_p}\Gamma_{i,n+s,r}\frac{\delta_{n+s}^{r}}{r+1},
\label{eq:du_avg_game}
\end{equation}
and propagate the absolute input along the horizon by
\begin{equation}
u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),\qquad
u_i(t_{n+s})=u_i(t_{n+s-1})+\Delta u_i(t_{n+s}),\ s=1,\ldots,N_p-1,
\label{eq:u_prop_game}
\end{equation}
where $u_i(t_{n-1})$ is the applied (measured) input from the previous sampling instant.
Compact feasible set:
\begin{equation}
\Omega_i \triangleq
\Big\{\mathbf{\Gamma}_i\ \Big|\
\eqref{eq:rollout_mpc_game}\ \text{holds and}\
\eqref{eq:u_abs_game},\eqref{eq:du_traj_game},\eqref{eq:u_prop_game}\ \text{are satisfied}
\Big\}.
\label{eq:Omega_i_game}
\end{equation}
Given $\mathbf{\Gamma}_{-i}$, the best response of stand $i$ is
\begin{equation}
\mathbf{\Gamma}_i^{\mathrm{BR}}\in
\arg\min_{\mathbf{\Gamma}_i\in\Omega_i}\
J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}).
\label{eq:local_BR}
\end{equation}
Because the learned surrogate is differentiable, \eqref{eq:local_BR} can be solved by standard gradient-based NLP solvers.
At each sampling time $t_n$, the distributed MPC coordination induces a finite-horizon dynamic game:
players are stands $i\in\{1,\ldots,5\}$; strategy sets are $\Omega_i$; and payoff (cost) functions are
defined in \eqref{eq:Ji_game}--\eqref{eq:coupling_cost}.
A joint strategy profile $\mathbf{\Gamma}^*=(\mathbf{\Gamma}_1^*,\ldots,\mathbf{\Gamma}_5^*)$
is a Nash equilibrium if
\begin{equation}
\forall i\in\{1,\ldots,5\},\qquad
\mathbf{\Gamma}_i^*\in
\arg\min_{\mathbf{\Gamma}_i\in\Omega_i}
J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}^*).
\label{eq:NE_def}
\end{equation}
This definition explicitly characterizes the strategic coupling:
each player's optimal decision depends on neighbors' decisions through the shared-tension dynamics and the coupling term.
At iteration $l$ of the coordination loop, each stand applies the relaxed best-response update
\begin{equation}
\mathbf{\Gamma}_i^{(l)}
=
(1-\omega)\mathbf{\Gamma}_i^{(l-1)}
+
\omega\,\mathbf{\Gamma}_i^{\mathrm{BR},(l)},
\qquad \omega\in(0,1].
\label{eq:relaxed_BR}
\end{equation}
The relaxation factor $\omega$ mitigates oscillations caused by strong coupling and improves practical convergence.
The Nash equilibrium is computed through distributed best-response iterations, summarized in Table~\ref{tab:nash_iter_en}.
Convergence is monitored by the relative strategy change
\begin{equation}
\eta^{(l)}
\triangleq
\max_i
\frac{\left\|
\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}
\right\|_2}{
\left\|
\mathbf{\Gamma}_i^{(l-1)}
\right\|_2+\epsilon},
\end{equation}
with $\epsilon>0$ small.
\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.12}
\setlength{\tabcolsep}{3.5pt}
\caption{Relaxed distributed Nash best-response iteration for RNE-DMPC (five-stand).}
\label{tab:nash_iter_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X}
\toprule
\textbf{Step} & \textbf{Description} \\
\midrule
A &
Initialize $\mathbf{\Gamma}_i^{(0)}$ by warm-starting (e.g., from the previous sampling time) and set $l=1$. \\
B &
Communicate $\mathbf{\Gamma}_i^{(l-1)}$ (or the induced predicted trajectories) among neighbors; form $\mathbf{\Gamma}_{-i}^{(l-1)}$. \\
C &
Given $\mathbf{\Gamma}_{-i}^{(l-1)}$, solve the best-response NLP \eqref{eq:local_BR} to obtain $\mathbf{\Gamma}_i^{\mathrm{BR},(l)}$. \\
D &
Update the relaxed strategy $\mathbf{\Gamma}_i^{(l)}$ using \eqref{eq:relaxed_BR} and compute the induced predictions
via \eqref{eq:rollout_mpc_game}. \\
E &
Broadcast $\mathbf{\Gamma}_i^{(l)}$ and the predicted interface quantities needed in \eqref{eq:coupling_cost}
(e.g., $C^{+}\Delta\hat{x}_i$ and $C^{-}\Delta\hat{x}_i$) to neighbors. \\
F &
Compute $\eta^{(l)}$; if $\eta^{(l)}\le \eta_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; otherwise set $l\leftarrow l+1$ and repeat from Step B. \\
\bottomrule
\end{tabularx}
\end{table}
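The logic of Steps A--F can be illustrated on a toy two-player quadratic game with closed-form best responses; in the actual controller each best response is the NLP \eqref{eq:local_BR}, and all numbers below are illustrative.

```python
import numpy as np

a = np.array([1.0, -1.0])   # local targets (illustrative)
c, omega, tol = 0.5, 0.6, 1e-10

def best_response(i, x):
    # argmin over x_i of J_i = (x_i - a_i)^2 + c * (x_i - x_j)^2
    j = 1 - i
    return (a[i] + c * x[j]) / (1.0 + c)

x = np.zeros(2)                                   # Step A: warm start
for l in range(1, 1000):
    x_old = x.copy()
    br = np.array([best_response(i, x_old) for i in range(2)])  # Steps B-C
    x = (1.0 - omega) * x_old + omega * br                       # Step D: relaxed update
    eta = np.max(np.abs(x - x_old) / (np.abs(x_old) + 1e-12))    # Step F: criterion
    if eta <= tol:
        break

# closed-form Nash equilibrium of the toy game, for comparison
x_star = np.linalg.solve(np.array([[1 + c, -c], [-c, 1 + c]]), a)
```

Because the relaxed best-response map is a contraction here, the iterate converges to the unique Nash equilibrium of the toy game.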
After the iteration terminates, only the first parameter block $\Gamma_{i,n}^*$ of $\mathbf{\Gamma}_i^*$ is applied; by \eqref{eq:avg_closed_clean}, the implemented interval-averaged increment is
\begin{equation}
\Delta u_i(t_n)=
\sum_{r=0}^{n_p}\Gamma_{i,nr}^{*}\frac{\delta_n^{r}}{r+1},
\end{equation}
followed by $u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n)$.
The overall control flow chart is shown in Fig.~\ref{liu}.
\begin{figure}
\centering
\includegraphics[width=\linewidth]{picture/x5.pdf}
\caption{The overall system control flow chart}\label{liu}
\end{figure}
\begin{remark}
Equations \eqref{eq:Ji_game}--\eqref{eq:coupling_cost} and the NE definition \eqref{eq:NE_def} explicitly show that each player's objective
depends on others' strategies through the coupled prediction dynamics and the shared-tension interface mismatch penalty,
thus making the multi-stand strategic interaction and coordination mechanism transparent.
\end{remark}
latex%======================== \section{Construction of Dataset} %======================== Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the corresponding sampling interval length. Define the local within-interval time variable $\tau=t-t_n\in[0,\delta_n]$. Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$), and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$). The system is strongly coupled because the inter-stand tensions propagate along the mill line and are affected by neighboring stands' actions. Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be the desired references (setpoints) given by process requirements. We define deviation variables \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \label{eq:dev_def} \end{equation} Throughout this paper, the symbol ``$\Delta$'' attached to states always means deviation from reference. For each stand $i$, we choose the local deviation state as \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3, \label{eq:xi_def_clean} \end{equation} with the boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$ to keep a unified dimension $d=3$ for all stands. For a five-stand tandem mill, the dominant coupling is between adjacent stands, hence we define \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad Z_5=\{4\}. \label{eq:Zi_clean} \end{equation} Define the neighbor-state stack \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}. 
\label{eq:xZi_clean} \end{equation} Each stand $i$ is manipulated by roll gap $s_i(t)$ and stand speed $v_i(t)$: \begin{equation} u_i(t)= \begin{bmatrix} s_i(t)\\ v_i(t) \end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. \label{eq:ui_clean} \end{equation} To ensure smooth actuation and match industrial practice, we optimize \emph{discrete input increments}: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) = \begin{bmatrix} \Delta s_i(t_n)\\ \Delta v_i(t_n) \end{bmatrix}. \label{eq:du_discrete_clean} \end{equation} Throughout this paper, the symbol ``$\Delta$'' attached to inputs $\Delta u_i(t_n)$ means sample-to-sample increment. Thus, $\Delta x$ and $\Delta u$ are conceptually different, and this is fixed by definition. Let $d_i(t)$ denote exogenous disturbances. We denote the interval-level equivalent disturbance by $\Delta d_i(t_n)$. $I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes the $a\times b$ zero matrix. The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by a discrete-time mapping \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big), \label{eq:true_mapping_clean} \end{equation} where $\Phi_i(\cdot)$ is generally nonlinear and coupled due to rolling deformation and tension propagation. A commonly used conceptual equivalent discrete linear form is \begin{equation} \Delta x_i(t_{n+1}) = M_d\,\Delta x_i(t_n) + N_d\,\Delta u_i(t_n) + F_d\,\Delta d_i(t_n), \label{eq:linear_form_concept} \end{equation} where $M_d,N_d,F_d$ represent equivalent discrete-time matrices around operating conditions. In a practical five-stand cold rolling mill, accurately deriving and identifying these matrices and disturbance models from first principles is difficult, due to strong coupling, unmodeled nonlinearities, and time-varying operating regimes. 
Therefore, this paper aims to learn a high-fidelity approximation of the interval evolution from data and then embed it into distributed MPC. \begin{remark} In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish \eqref{eq:linear_form_concept} based on first principles. Therefore, in this paper, we learn an approximate mapping of \eqref{eq:true_mapping_clean} from data. \end{remark} Although decisions are updated at discrete instants $t_n$, the hydraulic gap and drive systems evolve continuously inside each interval, and abrupt within-interval changes may excite tension oscillations and deteriorate thickness stability. Thus, parameterizing the within-interval increment trajectory by a low-order polynomial: (i) yields a compact finite-dimensional decision representation; (ii) enforces smooth profiles inside the interval; (iii) enables enforcing increment constraints for all $\tau\in[0,\delta_n]$. This is appropriate when $\delta_n$ is not excessively large relative to actuator bandwidth and the within-interval evolution is well approximated by a low-order basis. In general, we adopt an $n_p$-th order polynomial for scheme design; in simulation one can choose $n_p=2$. On the interval $[t_n,t_{n+1}]$, parameterize the control increment trajectory as \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) = \sum_{r=0}^{n_p}\Gamma_{i,nr}\tau^r, \qquad \tau\in[0,\delta_n], \label{eq:du_poly_vec_clean} \end{equation} where $\Gamma_{i,nr}\in\mathbb{R}^{n_u}$ are coefficient vectors ($n_u=2$), and $n_p\in\mathbb{N}$ is the polynomial order. Component-wise, \eqref{eq:du_poly_vec_clean} corresponds to \begin{equation} \begin{aligned} \Delta s_{i,n}(\tau) &= \sum_{r=0}^{n_p}\gamma^{(s)}_{i,nr}\tau^r,\\ \Delta v_{i,n}(\tau) &= \sum_{r=0}^{n_p}\gamma^{(v)}_{i,nr}\tau^r. 
\end{aligned} \label{eq:du_components_clean} \end{equation} Define the stacked parameter vector \begin{equation} \Gamma_{i,n}\triangleq \big[ (\Gamma_{i,n0})^\top,\, (\Gamma_{i,n1})^\top,\, \ldots,\, (\Gamma_{i,nn_p})^\top \big]^\top \in\mathbb{R}^{p}, \qquad p=(n_p+1)n_u. \label{eq:Gamma_clean} \end{equation} Here, $\Gamma_{i,n0}$ is the baseline increment at $\tau=0$, while $\{\Gamma_{i,nr}\}_{r=1}^{n_p}$ describe the time-varying rates up to order $n_p$. Define the interval-averaged equivalent increments as \begin{equation} \begin{aligned} \Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\ \Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau. \end{aligned} \label{eq:avg_def_clean} \end{equation} With \eqref{eq:du_poly_vec_clean}, the input average has a closed form: \begin{equation} \Delta u_i(t_n)= \sum_{r=0}^{n_p}\Gamma_{i,nr}\frac{\delta_n^{r}}{r+1}. \label{eq:avg_closed_clean} \end{equation} Let $\mathcal{I}_x$ denote the sampling domain of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$, and let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$. These domains specify the operating envelope used to generate supervised training data. Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$. In addition to the local deviation state, the neighbor deviation states are included to represent inter-stand coupling. The process is summarized in Table~\ref{tab:interval_sample_generation_en}. 
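As a concrete check of the parameterization \eqref{eq:du_poly_vec_clean} and the closed-form average \eqref{eq:avg_closed_clean}, the following minimal sketch (with hypothetical coefficient values, $n_p=2$, $n_u=2$, and an arbitrary $\delta_n$) evaluates the polynomial increment trajectory and compares its analytic interval average against trapezoidal quadrature.

```python
import numpy as np

# Minimal sketch (hypothetical values): polynomial increment trajectory
# Delta u_{i,n}(tau) = sum_r Gamma_r * tau^r on [0, delta_n], n_p = 2, n_u = 2,
# and its interval average via the closed form sum_r Gamma_r * delta_n^r / (r+1).
n_p, n_u = 2, 2
delta_n = 0.05                       # sampling interval length [s] (illustrative)
Gamma = np.array([[0.02, -0.01],     # Gamma_{i,n0}: baseline increment at tau = 0
                  [0.10,  0.05],     # Gamma_{i,n1}: first-order rate
                  [-0.4,  0.20]])    # Gamma_{i,n2}: second-order rate

def du_traj(tau, Gamma):
    """Evaluate Delta u_{i,n}(tau) = sum_r Gamma_r tau^r (vector-valued)."""
    powers = np.array([tau**r for r in range(Gamma.shape[0])])
    return powers @ Gamma

# Closed-form interval average (eq. avg_closed_clean)
avg_closed = sum(Gamma[r] * delta_n**r / (r + 1) for r in range(n_p + 1))

# Numerical cross-check by trapezoidal quadrature on a fine tau-grid
taus = np.linspace(0.0, delta_n, 1001)
vals = np.array([du_traj(t, Gamma) for t in taus])
h = taus[1] - taus[0]
avg_num = (0.5 * (vals[0] + vals[-1]) + vals[1:-1].sum(axis=0)) * h / delta_n
```

For a quadratic trajectory, the analytic average and the quadrature agree to numerical precision, which is a useful sanity check when implementing the averaging step.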
\begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.15} \caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).} \label{tab:interval_sample_generation_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X} \toprule \textbf{Step} & \textbf{Operation} \\ \midrule 1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\ 2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$, with order $n_p$). \\ 3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via \eqref{eq:du_poly_vec_clean}. \\ 4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\ \bottomrule \end{tabularx} \end{table} Accordingly, an interval sample for subsystem $i$ can be represented as \begin{equation} \mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\}. \label{eq:interval_sample_clean} \end{equation} Note that $\Delta u_{i,n}(\tau)$ is fully determined by $(\Gamma_{i,n},\delta_n)$ via \eqref{eq:du_poly_vec_clean}, therefore it is sufficient to store $(\Gamma_{i,n},\delta_n)$ as the learning input. For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws, the local one-step training dataset is formed as \begin{equation} \begin{split} S_i=\Big\{& \big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\, \Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big) \ \Big|\ j=1,\ldots,J \Big\}. \end{split} \label{eq:S_i_clean} \end{equation} Here $J$ is the number of one-step samples for subsystem $i$. The overall dataset for the five-stand mill is denoted by $\{S_i\}_{i=1}^{5}$. 
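The four steps of the sampling procedure can be sketched as follows. Since the mechanistic five-stand model is not reproduced here, a toy linear stand-in $f_i$ with hypothetical matrices \texttt{A}, \texttt{B}, \texttt{C} replaces the coupled mill dynamics; RK4 plays the role of the within-interval integrator, with the neighbor stack frozen at its $t_n$ value.

```python
import numpy as np

# Sketch of Steps 1-4 of the interval-sample generation for one boundary stand,
# using a *toy* linear stand-in f_i for the five-stand coupled mill model
# (the true mechanistic model is not reproduced here); A, B, C are hypothetical.
rng = np.random.default_rng(0)
d, n_u, n_p = 3, 2, 2
delta_n = 0.05

A = -0.5 * np.eye(d)                  # toy local dynamics
B = rng.normal(size=(d, n_u)) * 0.1   # toy input map
C = rng.normal(size=(d, d)) * 0.05    # toy neighbor-coupling map

# Step 1: state sampling from the operating envelope I_x
dx_i  = rng.uniform(-1e-3, 1e-3, size=d)   # Delta x_i(t_n)
dx_Zi = rng.uniform(-1e-3, 1e-3, size=d)   # neighbor stack (one neighbor)

# Step 2: parameter sampling from I_Gamma
Gamma = rng.uniform(-0.1, 0.1, size=(n_p + 1, n_u))

# Step 3: within-interval control trajectory (polynomial parameterization)
du = lambda tau: sum(Gamma[r] * tau**r for r in range(n_p + 1))

# Step 4: RK4 propagation over [t_n, t_{n+1}], neighbors frozen at t_n
def f(tau, x):
    return A @ x + B @ du(tau) + C @ dx_Zi

M = 50
h = delta_n / M
x = dx_i.copy()
for m in range(M):
    t = m * h
    k1 = f(t, x); k2 = f(t + h/2, x + h/2 * k1)
    k3 = f(t + h/2, x + h/2 * k2); k4 = f(t + h, x + h * k3)
    x = x + h / 6 * (k1 + 2*k2 + 2*k3 + k4)

sample = (dx_i, dx_Zi, Gamma, delta_n, x)   # one record of D_{i,n}
```

Only $(\Gamma_{i,n},\delta_n)$ is stored as the learning input, consistent with the remark after \eqref{eq:interval_sample_clean}.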
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}. \begin{figure*}[htbp] \centering \includegraphics[scale=0.5]{picture/Fig2.pdf} \caption{Point cloud map of the training dataset.}\label{2} \end{figure*} The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss and reciprocal-consistency regularization, because these objectives require ground-truth deviation-state trajectories over a horizon of $K$ consecutive intervals. Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples into $K$-step trajectory segments. Specifically, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling $\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances), and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$. Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. Define a $K$-step segment sample for subsystem $i$ as \begin{equation} \begin{aligned} \mathcal{W}_{i,n}= \Big\{& \big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\ &\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1} \Big\}. \end{aligned} \label{eq:segment_clean} \end{equation} By repeating the above segment generation, we form the multi-step training set \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}, \label{eq:S_i_K_clean} \end{equation} where $J_K$ is the number of $K$-step segment samples. Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (keeping only $s=0$), thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training. 
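The segment organization can be sketched as follows, with synthetic arrays standing in for the offline-simulated trajectories; the slicing shows how overlapping $K$-step records $\mathcal{W}_{i,n}$ and the one-step marginal $S_i$ (keeping only $s=0$) are obtained.

```python
import numpy as np

# Sketch of the K-step segment organization (eq. segment_clean): a simulated
# deviation-state trajectory and its per-interval parameters are sliced into
# overlapping K-step records W_{i,n}; all arrays here are synthetic.
rng = np.random.default_rng(5)
T, K, d, p = 20, 4, 3, 6
x_seq     = rng.normal(size=(T + 1, d)) * 1e-3   # {Delta x_i(t_n)}
xZ_seq    = rng.normal(size=(T + 1, d)) * 1e-3   # neighbor stacks (boundary stand)
Gamma_seq = rng.normal(size=(T, p))              # {Gamma_{i,n}}
delta_seq = np.full(T, 0.05)                     # {delta_n}

segments = []
for n in range(T - K + 1):
    inputs  = [(x_seq[n+s], xZ_seq[n+s], Gamma_seq[n+s], delta_seq[n+s])
               for s in range(K)]
    targets = [x_seq[n+s+1] for s in range(K)]
    segments.append((inputs, targets))           # one W_{i,n}

# The one-step set S_i is the s = 0 marginal of these segments
S_i = [(seg[0][0], seg[1][0]) for seg in segments]
```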
%======================== \section{Construction of Residual Neural Network} %======================== \subsection{Residual Neural Network Structure Construction and Training Method} Given the dataset, the neural network model is trained to learn a stand-wise, control-dependent one-step evolution law of deviation states: \begin{equation} \Delta x_i(t_{n+1}) \approx \Delta x_i(t_n)+ \mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\,\Theta_i\Big), \label{eq:learned_dyn_clean} \end{equation} where $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change and $\Theta_i$ are trainable parameters. \begin{remark} If $\mathcal{N}_i$ does not take control information as input (here $\Gamma_{i,n}$ and $\delta_n$), the predictor becomes an autoregressive model that only reproduces trajectories under the training input patterns and cannot answer the counterfactual question: ``what will happen if we choose a different roll gap and speed trajectory?'' Since MPC optimizes over candidate decisions, a control-dependent predictor \eqref{eq:learned_dyn_clean} is necessary to evaluate the predicted thickness and tension behavior under different candidate actuator trajectories. \end{remark} Let $d=3$ (state dimension), $|Z_i|$ be the number of neighbors of stand $i$ in \eqref{eq:Zi_clean}, and $p=(n_p+1)n_u$ in \eqref{eq:Gamma_clean}. Define the input vector \begin{equation} X_{i,\text{in}} \triangleq \big[ \Delta x_i(t_n)^\top,\, \Delta x_{Z_i}(t_n)^\top,\, \Gamma_{i,n}^\top,\, \delta_n \big]^\top \in \mathbb{R}^{d(1+|Z_i|)+p+1}. \label{eq:X_in_clean} \end{equation} The network mapping is \begin{equation} \mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}. \end{equation} To improve training stability and long-horizon rollout robustness, we use a residual form. 
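A minimal sketch of the residual form \eqref{eq:learned_dyn_clean}: the identity path propagates the local deviation-state block of $X_{i,\text{in}}$, and the learned correction is added on top. Here $\mathcal{N}_i$ is realized as a small untrained random-weight MLP of hypothetical size, purely to illustrate the data flow.

```python
import numpy as np

# Minimal sketch of the residual one-step predictor (eq. learned_dyn_clean):
# predicted Delta x_i(t_{n+1}) = Delta x_i(t_n) + N_i(X_in), with N_i realized
# here as a small untrained random-weight MLP (hypothetical sizes).
rng = np.random.default_rng(1)
d, nZ, p = 3, 1, 6                       # boundary stand: |Z_i| = 1, p = (n_p+1)*n_u
n_in = d * (1 + nZ) + p + 1              # input dimension d(1+|Z_i|)+p+1

W1 = rng.normal(size=(16, n_in)) * 0.1   # hidden-layer weights (stand-in)
W2 = rng.normal(size=(d, 16)) * 0.1      # output-layer weights (stand-in)

def N_i(X_in):
    """Stand-in for the trained correction network N_i."""
    return W2 @ np.tanh(W1 @ X_in)

X_in = rng.normal(size=n_in)             # [dx_i; dx_Zi; Gamma; delta_n]
X_out = X_in[:d] + N_i(X_in)             # identity shortcut on the local state block
```

Note that the control information $(\Gamma_{i,n},\delta_n)$ enters through $X_{i,\text{in}}$, so the prediction changes when a different candidate trajectory is evaluated, as the remark requires.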
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a selection matrix extracting the local state block: \begin{equation} \hat{I}_i = [I_d,\, 0_{d\times(d|Z_i|+p+1)}]. \label{eq:Ihat_clean} \end{equation} Then the one-step predictor is written as \begin{equation} X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i), \label{eq:res_predict_clean} \end{equation} where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$. This structure implements a baseline-plus-correction interpretation: the shortcut propagates the current deviation state $\Delta x_i(t_n)$, while the network learns the correction capturing unmodeled nonlinearities and inter-stand coupling (via $\Delta x_{Z_i}$) under varying operating conditions. To improve robustness when $\delta_n$ varies, we introduce an auxiliary branch inside $\mathcal{N}_i$: \begin{equation} \mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq \psi_i(X_{i,\text{in}};\Theta_{\psi_i}) + \rho_i(X_{i,\text{in}};\theta_i), \label{eq:aux_clean} \end{equation} where $\psi_i(\cdot)$ is a lightweight feedforward branch that captures low-frequency/scale effects strongly related to $\delta_n$, and $\rho_i(\cdot)$ captures the remaining nonlinear coupling corrections. When $\psi_i(\cdot)\equiv 0$, the model reduces to a standard residual network. For the $j$-th sample in \eqref{eq:S_i_clean}, define \begin{equation} X_{i,\text{in}}^{(j)} = \big[ \Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\ \Gamma_{i,n}^{(j)},\ \delta_n^{(j)} \big]^{\top}, \end{equation} and the supervised residual target \begin{equation} \Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n). 
\label{eq:target_clean} \end{equation} To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability, we train the forward predictor jointly with an auxiliary backward residual model and impose a multi-step reciprocal-consistency regularization over a $K$-step segment from $S_i^{(K)}$. Construct a backward residual network \begin{equation} \mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}, \end{equation} parameterized by $\bar{\Theta}_i$. For the backward step associated with interval $[t_n,t_{n+1}]$, define \begin{equation} \begin{aligned} X_{i,\mathrm{in}}^{b} &= \big[ \Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),\ \Gamma_{i,n},\ \delta_n \big]^{\top},\\ X_{i,\mathrm{out}}^{b} &= \hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i), \end{aligned} \label{eq:back_clean} \end{equation} where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$. The supervised backward residual target is \begin{equation} \Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}). \end{equation} Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize \begin{equation} \Delta \hat{x}_i(t_n)=\Delta x_i(t_n), \end{equation} and recursively apply the forward predictor for $K$ steps: \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1}) &= \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\, \Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i \Big),\\ &\qquad s=0,\ldots,K-1. 
\end{aligned} \label{eq:fwd_roll_clean} \end{equation} Set the terminal condition \begin{equation} \Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}), \end{equation} and roll back using $\mathcal{B}_i$: \begin{equation} \begin{aligned} \Delta \bar{x}_i(t_{n+s}) &= \hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s}) + \mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big), \quad s=K-1,\ldots,0, \end{aligned} \label{eq:bwd_roll_clean} \end{equation} where \begin{equation} X_{i,\mathrm{in}}^{b}(t_{n+s})= \big[ \Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),\ \Gamma_{i,n+s},\ \delta_{n+s} \big]^{\top}. \end{equation} Define the multi-step reciprocal prediction error \begin{equation} E_i(t_n) = \sum_{s=0}^{K} \left\| \Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s}) \right\|^2. \end{equation} We jointly minimize: \begin{equation} \begin{aligned} L_{\mathrm{1step}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big) -\mathcal{N}_i\!\left( X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i \right) \Big\|^2,\\[2mm] L_{\mathrm{bwd}}(\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big) -\mathcal{B}_i\!\left( X_{i,\mathrm{in}}^{b\,(j)}(t_{n+s});\bar{\Theta}_i \right) \Big\|^2,\\[2mm] L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm] L_{\mathrm{roll}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \Big\| \Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s}) \Big\|^2. 
\end{aligned} \label{eq:loss_clean} \end{equation} Here, $L_{\mathrm{1step}}$ enforces one-step accuracy; $L_{\mathrm{roll}}$ explicitly suppresses long-horizon drift under recursion; $L_{\mathrm{msrp}}$ regularizes the learned dynamics by enforcing reciprocal consistency between forward and backward rollouts; and $L_{\mathrm{bwd}}$ trains the backward model for the consistency regularization. In implementation, these terms are combined as \begin{equation} L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}}, \end{equation} where $\lambda_1,\lambda_2,\lambda_3,\lambda_4>0$ are tuned on a validation set. After training, the forward predictor is \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n) + \mathcal{N}_i\!\Big( \Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\, \Gamma_{i,n},\,\delta_n;\,\Theta_i^* \Big), \label{eq:pred_clean} \end{equation} and multi-step prediction is obtained by recursive rollout of \eqref{eq:pred_clean}. This learned predictor is the internal model used by the MPC optimizer in the next section. Finally, network parameters are optimized using Adam: \begin{equation} \Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon}, \end{equation} where $\alpha$ is the learning rate (we use $\alpha$ to avoid conflict with other symbols), $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected moment estimates, and $\varepsilon>0$ is a small constant for numerical stability. Figure~\ref{fig:rnn_logic} illustrates the overall structure. 
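The forward part of the training objective can be sketched for a single $K$-step segment as follows. A toy linear map stands in for the trained $\mathcal{N}_i$ (control inputs are omitted for brevity), the segment data are synthetic, and only $L_{\mathrm{1step}}$ and $L_{\mathrm{roll}}$ are shown; $L_{\mathrm{bwd}}$ and $L_{\mathrm{msrp}}$ follow the same pattern with the backward model $\mathcal{B}_i$. The weights $\lambda_k$ are illustrative.

```python
import numpy as np

# Sketch of the per-segment forward loss terms (eqs. fwd_roll / loss):
# a toy linear N_i stands in for the trained network, and the segment
# data are synthetic; the weights lam1, lam2 are illustrative.
rng = np.random.default_rng(2)
d, K = 3, 4
W = rng.normal(size=(d, d)) * 0.1            # toy N_i: dx -> W @ dx
N_i = lambda dx: W @ dx

x_true = rng.normal(size=(K + 1, d)) * 1e-3  # ground truth {Delta x_i(t_{n+s})}

# L_1step: one-step residual regression over the segment
L_1step = np.mean([np.sum((x_true[s+1] - x_true[s] - N_i(x_true[s]))**2)
                   for s in range(K)])

# L_roll: recursive forward rollout from the true initial state
x_hat = [x_true[0]]
for s in range(K):
    x_hat.append(x_hat[-1] + N_i(x_hat[-1]))
L_roll = sum(np.sum((x_true[s] - x_hat[s])**2) for s in range(1, K + 1))

lam1, lam2 = 1.0, 0.1                        # illustrative weights
L_total = lam1 * L_1step + lam2 * L_roll
```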
\begin{figure}[htbp] \centering \includegraphics[scale=0.85]{picture/x6.pdf} \caption{Logic diagram of the residual neural network.} \label{fig:rnn_logic} \end{figure} \subsection{Explainability of the residual network} Although the five-stand cold rolling mill involves complex rolling deformation and tension-transport coupling, its stand-wise deviation dynamics can be abstractly described by a coupled nonlinear ODE: \begin{equation} \frac{d}{dt}\Delta x_i(t) = f_i\!\Big(\Delta x_i(t),\,\Delta x_{Z_i}(t),\,u_i(t),\,d_i(t)\Big), \qquad i=1,\ldots,5, \label{eq:mech_ode} \end{equation} where $f_i(\cdot)$ summarizes the mechanistic effects. For a given sampling interval length $\delta_n$ and a within-interval input trajectory $u_i(t_n+\tau)$, the state transition over one interval can be written through an evolution operator: \begin{equation} \Delta x_i(t_{n+1})=\Phi_{i,\delta_n}\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,u_i([t_n,t_{n+1}]),\,d_i([t_n,t_{n+1}])\Big). \label{eq:evolution_operator_mill} \end{equation} By the fundamental theorem of calculus, \eqref{eq:mech_ode} implies the increment form \begin{equation} \Delta x_i(t_{n+1}) = \Delta x_i(t_n) + \underbrace{\int_{0}^{\delta_n} f_i\!\Big(\Delta x_i(t_n+\tau),\,\Delta x_{Z_i}(t_n+\tau),\,u_i(t_n+\tau),\,d_i(t_n+\tau)\Big)\,d\tau}_{\triangleq\ \varphi_{i,n}}, \label{eq:increment_integral} \end{equation} where $\varphi_{i,n}$ is the one-interval state increment generated by the mechanistic dynamics. Our learned model \eqref{eq:learned_dyn_clean} adopts the same increment form as \eqref{eq:increment_integral}: \begin{equation} \Delta x_i(t_{n+1}) \approx \Delta x_i(t_n) + \mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\Theta_i\Big). \end{equation} Here, $\mathcal{N}_i(\cdot)$ plays the role of a data-driven approximation of the integral increment $\varphi_{i,n}$, i.e., it approximates the accumulated effect of the mechanistic dynamics over $[t_n,t_{n+1}]$. 
This is consistent with the well-known interpretation that a residual network behaves like a one-step time integrator: the identity path propagates the current state, while the residual branch represents the increment over the time lag. Inside each interval, we do not optimize point-wise $u_i(t)$ but parameterize the increment trajectory by $\Delta u_{i,n}(\tau;\Gamma_{i,n})$. Hence, the mechanistic increment $\varphi_{i,n}$ in \eqref{eq:increment_integral} depends on the \emph{whole} within-interval trajectory. Feeding $(\Gamma_{i,n},\delta_n)$ into $\mathcal{N}_i$ is therefore a compact way to represent how different candidate gap/speed trajectories change the integral effect and thus the next thickness--tension state. When $\delta_n$ varies and is not very small, directly learning $\varphi_{i,n}$ may be harder. Motivated by the generalized residual idea, we decompose the increment predictor into two parts in \eqref{eq:aux_clean}: \begin{equation} \mathcal{N}_i(\cdot)=\psi_i(\cdot)+\rho_i(\cdot). \end{equation} Conceptually, $\psi_i(\cdot)$ captures low-frequency and scale effects strongly related to $\delta_n$, while $\rho_i(\cdot)$ captures the remaining nonlinear coupling corrections. This provides a mechanism-consistent interpretation: a baseline increment plus a residual correction that compensates unmodeled nonlinearities and inter-stand coupling. %======================== \section{Nash Equilibrium-Based RNE-DMPC} %======================== The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation. As a result, changes in control actions (roll gap and stand speed) at one stand can affect both upstream and downstream stands, making centralized online optimization over all stands' decision variables computationally demanding. To mitigate this issue, we decompose the global predictive-control problem into $N=5$ local subproblems associated with individual stands. 
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers. Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate distributed coordination as a Nash-equilibrium-seeking iteration. Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed MPC method (RNE-DMPC) for coordinated thickness--tension regulation and tracking. The overall control structure is shown in Figure~\ref{4}. \begin{figure*}[htbp] \centering \includegraphics[width=\linewidth]{picture/x2.pdf} \caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4} \end{figure*} At sampling time $t_n$, stand $i$ chooses the polynomial-parameter sequence $\mathbf{\Gamma}_i(t_n)\in\mathbb{R}^{pN_c}$, where $p=(n_p+1)n_u$. Let $\mathbf{\Gamma}(t_n)\triangleq \mathrm{col}\{\mathbf{\Gamma}_1(t_n),\ldots,\mathbf{\Gamma}_5(t_n)\}$ denote the joint strategy profile, and let $\mathbf{\Gamma}_{-i}(t_n)$ denote the collection of all strategies except stand $i$. Given the current measured/estimated deviation state $\Delta x_i(t_n)$ and the strategies $(\mathbf{\Gamma}_i(t_n),\mathbf{\Gamma}_{Z_i}(t_n))$, the multi-step prediction used by stand $i$ is written explicitly as \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i}) &= \Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s};\cdot),\, \Delta \hat{x}_{Z_i}(t_{n+s};\mathbf{\Gamma}_{Z_i}),\\ &\qquad \Gamma_{i,n+s},\, \delta_{n+s};\Theta_i^* \Big), \end{aligned} \label{eq:rollout_mpc_game} \end{equation} for $s=0,\ldots,N_p-1$, with initialization $\Delta \hat{x}_i(t_n;\cdot)=\Delta x_i(t_n)$. Here the neighbor stack $\Delta \hat{x}_{Z_i}(t_{n+s};\mathbf{\Gamma}_{Z_i})$ is generated from neighbors' strategies via the same learned predictors. 
Over $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, \begin{equation} \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}) = \sum_{r=0}^{n_p}\Gamma_{i,n+s,r}\tau^r,\qquad \tau \in [0,\delta_{n+s}], \end{equation} and the interval-averaged increment is \begin{equation} \Delta u_i(t_{n+s}) = \sum_{r=0}^{n_p}\Gamma_{i,n+s,r}\frac{\delta_{n+s}^{r}}{r+1}. \label{eq:du_avg_clean} \end{equation} \begin{remark} Because inter-stand tension $T_i$ is jointly affected by the adjacent stands $i$ and $i+1$, the predicted evolution of $\Delta x_i$ depends on neighbors' future actions, hence the MPC problems are not independent but form a coupled dynamic game. \end{remark} At time $t_n$, the local strategy of stand $i$ is \begin{equation} \mathbf{\Gamma}_i(t_n) = \mathrm{col}\{\Gamma_{i,n},\Gamma_{i,n+1},\ldots,\Gamma_{i,n+N_c-1}\} \in \mathbb{R}^{pN_c}. \end{equation} In deviation coordinates, the regulation/tracking objective is $\Delta x_i(t)\rightarrow 0$, i.e. \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d},\qquad d=3. \end{equation} Recall $\Delta x_i=[\Delta h_i,\Delta T_{i-1},\Delta T_i]^\top$. Define the row selectors \begin{equation} C^- \triangleq [0\ \ 1\ \ 0]\in\mathbb{R}^{1\times 3},\qquad C^+ \triangleq [0\ \ 0\ \ 1]\in\mathbb{R}^{1\times 3}, \end{equation} so that $C^- \Delta x_i=\Delta T_{i-1}$ (upstream interface) and $C^+\Delta x_i=\Delta T_i$ (downstream interface). For the interface between stands $i$ and $i+1$, stand $i$'s prediction provides $C^+\Delta \hat{x}_i$, while stand $i+1$'s prediction provides $C^- \Delta \hat{x}_{i+1}$. Their mismatch measures coupling inconsistency: \begin{equation} e_{i}^{\mathrm{sh}}(t_{n+s};\mathbf{\Gamma}) \triangleq C^+\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}) - C^-\Delta \hat{x}_{i+1}(t_{n+s};\mathbf{\Gamma}),\qquad i=1,\ldots,4. 
\label{eq:shared_tension_mismatch} \end{equation} We define the stage cost of stand $i$ as a function of all players' strategies: \begin{equation} J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}) = \sum_{s=1}^{N_p} \left\| \Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}_i,\mathbf{\Gamma}_{Z_i}) \right\|_{Q_i}^{2} + \sum_{s=0}^{N_c-1} \left\|\Gamma_{i,n+s}\right\|_{R_i}^{2} + J_i^{\mathrm{cpl}}(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}) \label{eq:Ji_game} \end{equation} where $Q_i\succeq 0$ weights thickness and tension deviations, and $R_i\succeq 0$ penalizes actuation magnitudes. The coupling term $J_i^{\mathrm{cpl}}$ explicitly reflects the game/coordination requirement on shared tensions. A simple and effective choice is to penalize the interface mismatches adjacent to stand $i$: \begin{equation} \begin{aligned} J_i^{\mathrm{cpl}} &= \mu_i\sum_{s=1}^{N_p} \Big( \mathbb{I}_{\{i\ge 2\}} \big\| C^-\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}) - C^+\Delta \hat{x}_{i-1}(t_{n+s};\mathbf{\Gamma}) \big\|^2 \\ &\qquad\quad + \mathbb{I}_{\{i\le 4\}} \big\| C^+\Delta \hat{x}_i(t_{n+s};\mathbf{\Gamma}) - C^-\Delta \hat{x}_{i+1}(t_{n+s};\mathbf{\Gamma}) \big\|^2 \Big), \end{aligned} \label{eq:coupling_cost} \end{equation} with $\mu_i>0$ and indicator $\mathbb{I}_{\{\cdot\}}$. This term makes the coupling conflict explicit: unilateral actions that locally reduce thickness error may worsen shared-tension compatibility and thus increase $J_i$, and also affect neighbors' objectives. We enforce the absolute-input bounds and within-interval increment bounds. Absolute input bounds (roll gap and speed): \begin{equation} u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max}, \qquad s=0,\ldots,N_p-1, \label{eq:u_abs_game} \end{equation} where $u_i(t)=[s_i(t),\,v_i(t)]^\top$. 
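The interface-mismatch penalty can be sketched for one prediction step as follows, using the row selectors $C^-$ and $C^+$; the predicted deviation states are synthetic placeholders and $\mu_i$ is an illustrative weight.

```python
import numpy as np

# Sketch of the shared-tension mismatch (eq. shared_tension_mismatch) and the
# coupling cost (eq. coupling_cost) at one prediction step; dx_hat holds
# synthetic predicted deviation states for stands 1..5 (0-based rows 0..4).
C_minus = np.array([0.0, 1.0, 0.0])   # picks Delta T_{i-1}
C_plus  = np.array([0.0, 0.0, 1.0])   # picks Delta T_i

rng = np.random.default_rng(4)
dx_hat = rng.normal(size=(5, 3)) * 1e-3

# Interface mismatches e_i^sh for interfaces i = 1..4 (1-based)
e_sh = [C_plus @ dx_hat[i] - C_minus @ dx_hat[i + 1] for i in range(4)]

def J_cpl(i, mu=10.0):
    """Coupling cost of stand i (0-based index); mu is illustrative."""
    cost = 0.0
    if i >= 1:                         # upstream interface (i >= 2, 1-based)
        cost += (C_minus @ dx_hat[i] - C_plus @ dx_hat[i - 1])**2
    if i <= 3:                         # downstream interface (i <= 4, 1-based)
        cost += (C_plus @ dx_hat[i] - C_minus @ dx_hat[i + 1])**2
    return mu * cost
```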
Within-interval increment-trajectory bounds: \begin{equation} \Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max}, \qquad \forall\tau\in[0,\delta_{n+s}],\ s=0,\ldots,N_p-1, \label{eq:du_traj_game} \end{equation} where $\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})$ is given by the polynomial parameterization. Consistency with discrete execution: Define the interval-averaged increment \begin{equation} \Delta u_i(t_{n+s}) = \frac{1}{\delta_{n+s}}\int_{0}^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau = \sum_{r=0}^{n_p}\Gamma_{i,n+s,r}\frac{\delta_{n+s}^{r}}{r+1}, \label{eq:du_avg_game} \end{equation} and propagate the absolute input along the horizon by \begin{equation} u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),\qquad u_i(t_{n+s})=u_i(t_{n+s-1})+\Delta u_i(t_{n+s}),\ s=1,\ldots,N_p-1, \label{eq:u_prop_game} \end{equation} where $u_i(t_{n-1})$ is the applied (measured) input from the previous sampling instant. Compact feasible set: \begin{equation} \Omega_i \triangleq \Big\{\mathbf{\Gamma}_i\ \Big|\ \eqref{eq:rollout_mpc_game}\ \text{holds and}\ \eqref{eq:u_abs_game},\eqref{eq:du_traj_game},\eqref{eq:u_prop_game}\ \text{are satisfied} \Big\}. \label{eq:Omega_i_game} \end{equation} Given neighbors' current strategies, stand $i$ solves the differentiable NLP: \begin{equation} \mathbf{\Gamma}_i^{\mathrm{BR}} = \arg\min_{\mathbf{\Gamma}_i\in\Omega_i}\ J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}). \label{eq:local_BR} \end{equation} Because the learned surrogate is differentiable, \eqref{eq:local_BR} can be solved by standard gradient-based NLP solvers. At each sampling time $t_n$, the distributed MPC coordination induces a finite-horizon dynamic game: players are stands $i=1,\ldots,5$; strategy sets are $\Omega_i$; and payoff (cost) functions are $J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i})$ defined in \eqref{eq:Ji_game}--\eqref{eq:coupling_cost}. 
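The constraint handling can be sketched as follows: the within-interval increment bound \eqref{eq:du_traj_game} is checked on a dense $\tau$-grid (for $n_p=2$ one could instead bound the parabola analytically via its vertex), and the absolute input is propagated by the interval average per \eqref{eq:du_avg_game}--\eqref{eq:u_prop_game}. All bound values and the previous input are hypothetical.

```python
import numpy as np

# Sketch of constraint checks (eqs. u_abs / du_traj / u_prop): a polynomial
# increment trajectory is bounded over the whole interval via a dense tau-grid;
# the bound values and previous applied input are hypothetical.
n_p, delta = 2, 0.05
Gamma = np.array([[0.01, 0.0], [0.2, -0.1], [-2.0, 1.0]])   # (n_p+1) x n_u
du_min, du_max = np.array([-0.05, -0.05]), np.array([0.05, 0.05])

taus = np.linspace(0.0, delta, 201)
traj = np.stack([sum(Gamma[r] * t**r for r in range(n_p + 1)) for t in taus])
feasible = np.all(traj >= du_min) and np.all(traj <= du_max)

# Interval-averaged increment and absolute-input propagation
du_avg = sum(Gamma[r] * delta**r / (r + 1) for r in range(n_p + 1))
u_prev = np.array([2.0, 10.0])        # applied input at t_{n-1} (gap, speed)
u_next = u_prev + du_avg              # u_i(t_n) = u_i(t_{n-1}) + Delta u_i(t_n)
```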
A joint strategy profile $\mathbf{\Gamma}^*=\mathrm{col}\{\mathbf{\Gamma}_1^*,\ldots,\mathbf{\Gamma}_5^*\}$ is a Nash equilibrium if \begin{equation} \forall i\in\{1,\ldots,5\},\qquad \mathbf{\Gamma}_i^*\in \arg\min_{\mathbf{\Gamma}_i\in\Omega_i} J_i(\mathbf{\Gamma}_i;\mathbf{\Gamma}_{-i}^*). \label{eq:NE_def} \end{equation} This definition explicitly characterizes the strategic coupling: each player's optimal decision depends on neighbors' decisions through the shared-tension dynamics and the coupling term. To compute an NE online with limited communication, we employ a relaxed best-response iteration. Let $l$ denote the Nash-iteration index. Given $\mathbf{\Gamma}^{(l-1)}$, each stand computes a best response $\mathbf{\Gamma}_i^{\mathrm{BR},(l)}$ by solving \eqref{eq:local_BR}, and then updates with relaxation: \begin{equation} \mathbf{\Gamma}_i^{(l)} = (1-\omega)\mathbf{\Gamma}_i^{(l-1)} + \omega\,\mathbf{\Gamma}_i^{\mathrm{BR},(l)}, \qquad \omega\in(0,1]. \label{eq:relaxed_BR} \end{equation} The relaxation factor $\omega$ mitigates oscillations caused by strong coupling and improves practical convergence. The Nash equilibrium is computed through distributed best-response iterations, summarized in Table~\ref{tab:nash_iter_en}. The convergence metric in Step F is defined as \begin{equation} \varsigma^{(l)} = \max_i \frac{\left\| \mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)} \right\|_2}{ \left\| \mathbf{\Gamma}_i^{(l-1)} \right\|_2+\epsilon}, \end{equation} with $\epsilon>0$ small. \begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.12} \setlength{\tabcolsep}{3.5pt} \caption{Relaxed distributed Nash best-response iteration for RNE-DMPC (five-stand).} \label{tab:nash_iter_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X} \toprule \textbf{Step} & \textbf{Description} \\ \midrule A & Initialize $l=1$ and warm-start $\mathbf{\Gamma}_i^{(0)}$ (e.g., from the previous sampling time).
\\ B & Communicate $\mathbf{\Gamma}_i^{(l-1)}$ (or the induced predicted trajectories) among neighbors; form $\mathbf{\Gamma}_{Z_i}^{(l-1)}$. \\ C & Given $\mathbf{\Gamma}_{-i}^{(l-1)}$, solve the best-response NLP \eqref{eq:local_BR} to obtain $\mathbf{\Gamma}_i^{\mathrm{BR},(l)}$. \\ D & Update the relaxed strategy using \eqref{eq:relaxed_BR} and compute the induced predictions $\Delta \hat{x}_i^{(l)}(\cdot;\mathbf{\Gamma}^{(l)})$ via \eqref{eq:rollout_mpc_game}. \\ E & Broadcast $\mathbf{\Gamma}_i^{(l)}$ and the predicted interface quantities needed in \eqref{eq:coupling_cost} (e.g., $C^\pm\Delta \hat{x}_i^{(l)}$) to neighbors. \\ F & Compute $\varsigma^{(l)}$; if $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; otherwise $l\leftarrow l+1$ and repeat. \\ \bottomrule \end{tabularx} \end{table} After convergence, only the first-interval parameters $\Gamma_{i,n}^*$ are applied. The within-interval increment is $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ and the discrete applied increment uses the interval average: \begin{equation} \Delta u_i(t_n) = \sum_{r=0}^{n_p}\Gamma_{i,n,r}^*\frac{\delta_n^{r}}{r+1}, \end{equation} followed by $u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n)$. The overall control flow chart is shown in Figure~\ref{liu}. \begin{figure} \centering \includegraphics[width=\linewidth]{picture/x5.pdf} \caption{The overall system control flow chart.}\label{liu} \end{figure} \begin{remark} Equations \eqref{eq:Ji_game}--\eqref{eq:coupling_cost} and the NE definition \eqref{eq:NE_def} explicitly show that each player's objective depends on others' strategies through the coupled prediction dynamics and the shared-tension interface mismatch penalty, thus making the multi-stand strategic interaction and coordination mechanism transparent. \end{remark}
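Steps A--F of the iteration can be sketched on a toy quadratic game as follows: \texttt{best\_response} stands in for the local NLP \eqref{eq:local_BR} (here a contraction toward the neighbors' mean), and the coupling factor, relaxation $\omega$, and tolerances are all hypothetical.

```python
import numpy as np

# Sketch of the relaxed distributed best-response iteration (Steps A-F):
# best_response stands in for the local NLP of eq. (local_BR) on a toy
# quadratic game; the -0.2 coupling factor, omega, tol, eps are illustrative.
rng = np.random.default_rng(3)
N, pNc = 5, 6                     # five stands, strategy dimension p*N_c
omega, tol, eps = 0.5, 1e-6, 1e-12

def best_response(i, G):
    """Toy best response: contraction toward the mean of neighbors' strategies."""
    Z = [k for k in (i - 1, i + 1) if 0 <= k < N]
    return -0.2 * np.mean([G[k] for k in Z], axis=0)

G = [rng.normal(size=pNc) for _ in range(N)]      # warm start (Step A)
for l in range(200):
    G_new = [(1 - omega) * G[i] + omega * best_response(i, G)   # Steps C-D
             for i in range(N)]
    sigma = max(np.linalg.norm(G_new[i] - G[i]) /
                (np.linalg.norm(G[i]) + eps) for i in range(N)) # Step F
    G = G_new
    if sigma <= tol:
        break
```

Because the toy best response is a contraction, the relaxed iteration converges to the (zero) equilibrium; in the real problem, convergence instead depends on the coupling strength and the choice of $\omega$.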