%======================== \section{Construction of...

Question

%========================
\section{Construction of Dataset}
%========================

The state increment at time $t_{n+1}$ is described by the following discrete mapping:
\begin{equation*}
\Delta x_i(t_{n+1})

M_d,\Delta x_i(t_n)
+
N_d,\Delta u_i(t_n)
+
F_d,\Delta d_i(t_n),
\end{equation*}
where Equation maps $\Delta x_i(t_n)$ to $\Delta x_i(t_{n+1})$ .
Here, $\Delta u_i(t_n)$ and $\Delta d_i(t_n)$ denote the \emph{equivalent discrete-time (interval-averaged)}
control increment and disturbance over $[t_n,t_{n+1}]$ , respectively, as defined below.

\begin{remark}
In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish the equation based on first principles. Therefore, in this paper, we hope to learn an approximate result from the data.
\end{remark}

%========================
\subsection{Interval-level parameterization and one-step dataset}
%========================

To construct the supervised dataset for training, this paper locally parameterizes the incremental control input within each sampling interval.
For the local interval $[t_n,t_{n+1}]$ , define the interval length
\begin{equation}
\delta_n = t_{n+1}-t_n ,
\end{equation}
and introduce a local time variable $\tau\in[0,\delta_n]$ .
Using a quadratic polynomial basis, the control increment trajectory on this interval is parameterized as
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})

\gamma_{i,n0}+\gamma_{i,n1}\tau+\gamma_{i,n2}\tau^2,
\qquad \tau\in[0,\delta_n],
\end{equation}
where $\Gamma_{i,n}=[\gamma_{i,n0},\gamma_{i,n1},\gamma_{i,n2}]$ is the local parameter vector.
Here, $\gamma_{i,n0}$ denotes the initial baseline of the increment, while $\gamma_{i,n1}$ and $\gamma_{i,n2}$ describe
the linear and quadratic variation rates, respectively.

Define the equivalent discrete-time (interval-averaged) increments as
\begin{equation}
\begin{aligned}
\Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau),d\tau,\
\Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau),d\tau.
\end{aligned}
\end{equation}

Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$ .
In addition to the local state increment, the neighbor state increments are also included to represent inter-stand coupling.
The specific process is shown in Table~\ref{tab:interval_sample_generation_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.15}
\caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ .}
\label{tab:interval_sample_generation_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X}
\toprule
\textbf{Step} & \textbf{Operation} \
\midrule
1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and its neighbor stack $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$ . \
2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (ranges of $\gamma_{i,n0},\gamma_{i,n1},\gamma_{i,n2}$ ). \
3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via the polynomial model. \
4 & \textbf{State propagation:} integrate the coupled mill model on $[t_n,t_{n+1}]$ (e.g., RK4) and record $\Delta x_i(t_{n+1})$ . \
\bottomrule
\end{tabularx}
\end{table}

Define the neighbor-state-increment stack as
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}{\Delta x_k(t_n),|,k\in Z_i}.
\end{equation}

Accordingly, an interval sample for subsystem $i$ can be represented as
\begin{equation}
\mathcal{D}{i,n}=\big{\Delta x_i(t_n),\ \Delta x{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big},
\end{equation}
used to learn the mapping relationship from the current local and neighbor states and local control trajectory to the next local state.

For each subsystem $i$ , by repeating the above procedure across multiple intervals and randomized draws, the local one-step training dataset is formed as
\begin{equation}
\begin{split}
S_i=\Big{&
\big(\Delta x_i^{(j)}(t_n),,\Delta x_{Z_i}^{(j)}(t_n),,\Delta x_i^{(j)}(t_{n+1});\
&\qquad \Gamma_{i,n}^{(j)},,\delta_n^{(j)}\big)
\ \Big|\ j=1,\ldots,J
\Big}.
\end{split}
\end{equation}
The overall dataset for the five-stand mill can be denoted as $\{S_i\}_{i=1}^{5}$ .
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}.

\begin{figure*}[htbp]
\centering
\includegraphics[scale=0.5]{picture/Fig2.pdf}
\caption{Point cloud map of the training dataset.}\label{2}
\end{figure*}

%========================
\subsection{Multi-step rollout segment dataset}
%========================

The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss
and reciprocal-consistency regularization, because these objectives require ground-truth state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples
into $K$ -step trajectory segments.

Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling
$\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances),
and integrating the coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$ .
Hence, we obtain the state-increment sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks
$\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$ .

Define a $K$ -step segment sample for subsystem $i$ as
\begin{equation}
\begin{aligned}
\mathcal{W}{i,n}=
\Big{&
\big(\Delta x_i(t{n+s}),,\Delta x_{Z_i}(t_{n+s}),,\Gamma_{i,n+s},,\delta_{n+s}\big){s=0}^{K-1}; \
&\big(\Delta x_i(t{n+s+1})\big){s=0}^{K-1}
\Big}.
\end{aligned}
\end{equation}
By repeating the above segment generation, we form the multi-step training set
\begin{equation}
S_i^{(K)}=\Big{\mathcal{W}{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big}.
\end{equation}
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (by keeping only $s=0$ ),
thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training.

%========================
\section{Construction of Residual Neural Network}
%========================

\subsection{Network Architecture}

Given the training dataset, the neural network model is defined and trained.
The network model essentially learns the evolutionary mapping of the interconnected cold rolling system over each local interval.
Specifically, for subsystem $i$ , the proposed residual network defines a nonlinear mapping
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
where $d$ denotes the dimension of the local state increment $\Delta x_i$ ,
$|Z_i|$ is the number of neighbors of subsystem $i$ ,
and $p=\mathrm{dim}(\Gamma_{i,n})$ denotes the dimension of the local input-parameter vector.
For a single-input case with quadratic parameterization, $p=3$ .

Accordingly, we define the residual mapping
\begin{equation}
\begin{aligned}
\Delta r_i &= \mathcal{N}i(X{i,\text{in}}; \Theta_i),\
X_{i,\text{in}} &\in \mathbb{R}^{d(1+|Z_i|)+p+1},\qquad
\Delta r_i \in \mathbb{R}^d ,
\end{aligned}
\end{equation}
where $\Theta_i$ represents the trainable parameters of the network for subsystem $i$ , and $X_{i,\text{in}}$ denotes the concatenated input vector for subsystem $i$ in the current sampling interval.

To explicitly incorporate the residual structure, the local state component in the input is passed through an identity shortcut
and added to obtain the one-step prediction.
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a linear selection matrix whose block form is defined by
\begin{equation}
\hat{I}i = [I_d,, 0{d\times(d|Z_i|+p+1)}].
\end{equation}

To improve robustness when the sampling interval length $\delta_n$ varies or becomes relatively large, we introduce an auxiliary branch inside $\mathcal{N}_i$ :
\begin{equation}
\mathcal{N}i(X{i,\text{in}};\Theta_i)\triangleq
\eta_i(X_{i,\text{in}};\Theta_{\eta_i}) + \mathcal{I}i(X{i,\text{in}};\theta_i),
\end{equation}
where $\eta_i(\cdot)$ can be implemented by a lightweight feedforward network and
$\mathcal{I}_i(\cdot)$ denotes the remaining residual branch.
When $\eta_i(\cdot)\equiv 0$ , the model reduces to the standard residual form.

Hence, the one-step prediction of the local state increment at $t_{n+1}$ can be written as
\begin{equation}
X_{i,\text{out}} = \hat{I}i X{i,\text{in}} + \mathcal{N}i(X{i,\text{in}}; \Theta_i),
\label{333}
\end{equation}
where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$ .

\begin{remark}
The predictor in \eqref{333} admits a baseline-plus-correction form: the shortcut term $\hat{I}_i X_{i,\mathrm{in}}$ propagates the current local increment $\Delta x_i(t_n)$ , while the residual network $\mathcal{N}_i(\cdot)$ learns the one-step correction.
This structure renders the model interpretable as a data-driven adjustment to a persistence prior, with the correction capturing unmodeled nonlinearities and inter-stand coupling via $\Delta x_{Z_i}$ under varying operating conditions.
\end{remark}

For the $j$ -th one-step data sample, $j=1,\ldots,J$ , we set
\begin{equation}
\begin{aligned}
X_{i,\text{in}}^{(j)} =
\big[
\Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\
\Gamma_{i,n}^{(j)},\ \delta_n^{(j)}
\big]^{\top}.
\end{aligned}
\end{equation}
The learning target remains the state-increment residual
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\end{equation}

%========================
\subsection{Training, Learned Model, and System Prediction}
%========================

To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability, we train the forward predictor jointly with an auxiliary backward residual model and impose a multi-step reciprocal-consistency regularization over a $K$ -step segment.

In addition to the forward residual predictor, we construct a backward residual network for subsystem $i$ ,
\begin{equation}
\mathcal{B}i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$ . For the backward step associated with the interval $[t_n,t_{n+1}]$ , we define
\begin{equation}
\begin{aligned}
X{i,\mathrm{in}}^{b}
&=
\big[
\Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),
\Gamma_{i,n},\ \delta_n
\big]^{\top},\
X_{i,\mathrm{out}}^{b}
&=
\hat{I}i X{i,\mathrm{in}}^{b} + \mathcal{B}i(X{i,\mathrm{in}}^{b};\bar{\Theta}i),
\end{aligned}
\end{equation}
where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$ . Accordingly, the supervised backward residual target is
\begin{equation}
\Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t{n+1}).
\end{equation}

Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$ , we initialize the forward rollout by
\begin{equation}
\Delta \hat{x}i(t_n)=\Delta x_i(t_n),
\end{equation}
and apply the forward predictor recursively for $K$ steps:
\begin{equation}
\begin{aligned}
\Delta \hat{x}i(t{n+s+1})
&=
\Delta \hat{x}i(t{n+s})
+
\mathcal{N}i!\Big(
\Delta \hat{x}i(t{n+s}),,\Delta \hat{x}{Z_i}(t{n+s}),\
&\qquad\qquad\qquad
\Gamma_{i,n+s},,\delta_{n+s};,\Theta_i
\Big),
\quad s=0,\ldots,K-1.
\end{aligned}
\end{equation}
After obtaining the terminal forward prediction, we set the terminal condition for the backward rollout as
\begin{equation}
\Delta \bar{x}i(t{n+K})=\Delta \hat{x}i(t{n+K}),
\end{equation}
and roll back using $\mathcal{B}_i$ along the same segment:
\begin{equation}
\begin{aligned}
\Delta \bar{x}i(t{n+s})
&=
\hat{I}i X{i,\mathrm{in}}^{b}(t_{n+s})
+
\mathcal{B}i!\Big(X{i,\mathrm{in}}^{b}(t_{n+s});,\bar{\Theta}i\Big),
\quad s=K-1,\ldots,0,
\end{aligned}
\end{equation}
where the backward input at time $t_{n+s}$ is
\begin{equation}
X{i,\mathrm{in}}^{b}(t_{n+s})=
\big[
\Delta \bar{x}i(t{n+s+1}),\ \Delta \hat{x}{Z_i}(t{n+s+1}),
\Gamma_{i,n+s},\ \delta_{n+s}
\big]^{\top}.
\end{equation}
and $\Delta \hat{x}_{Z_i}(\cdot)$ is obtained from the same forward rollout.

With the forward and backward trajectories available on the same segment, we define the multi-step reciprocal prediction error
\begin{equation}
E_i(t_n)

\sum_{s=0}^{K}
\left|
\Delta \hat{x}i(t{n+s})-\Delta \bar{x}i(t{n+s})
\right|^2.
\end{equation}
We then train the forward and backward networks jointly by minimizing the following overall objective terms:
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}i!\left(
X{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i
\right)
\Big|^2,\[2mm]
L_{\mathrm{bwd}}(\bar{\Theta}i)
&= \frac{1}{J_K}\sum{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}i!\left(
X{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}i
\right)
\Big|^2,\[2mm]
L{\mathrm{msrp}}(\Theta_i,\bar{\Theta}i)
&= \frac{1}{J_K}\sum{j=1}^{J_K} E_i^{(j)}(t_n),\[2mm]
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\Big|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}i^{(j)}(t{n+s})
\Big|^2.
\end{aligned}
\end{equation}

After training, we obtain the forward model
\begin{equation}
X_{i,\mathrm{out}} = \hat{I}i X{i,\mathrm{in}} + \mathcal{N}i(X{i,\mathrm{in}}; \Theta_i^*).
\end{equation}
For system prediction using the trained neural network model on a local interval $[t_n,t_{n+1}]$ , we define the input vector as
\begin{equation}
X_{i,\mathrm{in}}

\big[
\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n
\big]^{\top}.
\end{equation}
and perform the one-step prediction
\begin{equation}
\Delta \hat{x}i(t{n+1})

\Delta x_i(t_n)
+
\mathcal{N}i!\Big(
\Delta x_i(t_n),,\Delta x{Z_i}(t_n),,
\Gamma_{i,n},,\delta_n;,\Theta_i^*
\Big).
\end{equation}
By applying this predictor recursively, we obtain a network model that predicts the system trajectory over long horizons. Finally, the network parameters are optimized using the Adam optimizer:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \eta \frac{\hat{m}{i,t}}{\sqrt{\hat{v}{i,t}} + \varepsilon},
\end{equation}
where $\Theta_{i,t}$ denotes the current parameters, $\Theta_{i,t+1}$ the updated parameters, $\eta$ the learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ the bias-corrected first and second moment estimates, and $\varepsilon$ a small constant for numerical stability. Figure~\ref{fig:rnn_logic} illustrates the overall structure.

\begin{figure}[htbp]
\centering
\includegraphics[scale=0.85]{picture/x6.pdf}
\caption{Logic diagram of the residual neural network.}
\label{fig:rnn_logic}
\end{figure}

\section{Nash Equilibrium-Based RNE-DMPC}

The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation.
As a result, changes in operating conditions or control actions at one stand can affect both upstream and downstream stands,
making centralized online optimization over high-dimensional decision variables computationally demanding.

To mitigate this issue, we decompose the global predictive-control problem into $N$ local subproblems associated with individual stands.
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers.
Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate the distributed coordination process as a Nash-equilibrium-seeking iteration.

Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed model predictive control method (RNE-DMPC)
to achieve coordinated thickness--tension regulation. The overall control structure is shown in Figure~\ref{4}.

\begin{figure*}[htbp]
\centering
\includegraphics[width=\linewidth]{picture/x2.pdf}
\caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4}
\end{figure*}

For interconnected systems such as tandem cold rolling mills, distributed MPC can reduce the computational burden of online optimization
and improve scalability through parallel local optimization with limited information exchange.
In RNE-DMPC, each subsystem exchanges predicted trajectories and decision variables through a communication module.
Coordination among subsystems is achieved by iteratively seeking a Nash equilibrium, where each local controller computes a best response
to the most recent strategies of its neighbors.

In the proposed architecture, each stand-level controller regulates its local actuators according to the assigned control objectives and constraints, while coordination is maintained through information exchange with neighboring stands.
Thus, the interconnected cold rolling system can achieve distributed thickness-tension control based on Nash equilibrium.

\subsection{Subsystem Prediction and Optimization}

Each rolling stand is treated as a subsystem. The coupled subsystem dynamics are written as
\begin{equation}
\Delta x_i(t_{n+1})

f_i!\big(\Delta x_i(t_n),u_i(t_n)\big)
+
\sum_{k \in Z_i} g_{ik}!\big(\Delta x_k(t_n),u_k(t_n)\big),
\end{equation}
where $\Delta x_i(t_n)\in\mathbb{R}^{n_x}$ and $u_i(t_n)\in\mathbb{R}^{n_u}$ denote the state increment and control input of subsystem $i$ ,
$Z_i$ is the neighbor set, and $g_{ik}$ characterizes the coupling effect.

Define the neighbor-state-increment stack as
\begin{equation}
\Delta x_{Z_i}(t_n) =
\mathrm{col}{\Delta x_k(t_n),|,k\in Z_i}.
\end{equation}

\textbf{Local polynomial parameterization.}
Over each local interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$ , the control increment trajectory of subsystem $i$
is parameterized by a quadratic polynomial:
\begin{equation}
\begin{aligned}
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
&=
\gamma_{i,n+s,0}
+\gamma_{i,n+s,1}\tau
+\gamma_{i,n+s,2}\tau^2,\
&\qquad \tau \in [0,\delta_{n+s}].
\end{aligned}
\end{equation}
where $\Gamma_{i,n+s}=[\gamma_{i,n+s,0},\gamma_{i,n+s,1},\gamma_{i,n+s,2}]^\top\in\mathbb{R}^{p}$
and $p=3$ for the single-input case.

\textbf{Neural-network-based prediction.}
Using the trained residual neural network surrogate, subsystem $i$ predicts its one-step state increment by
\begin{equation}
\begin{aligned}
\Delta \hat{x}i(t{n+s+1})
&=
\Delta \hat{x}i(t{n+s})
+
\mathcal{N}i!\Big(
\Delta \hat{x}i(t{n+s}),,
\Delta \hat{x}{Z_i}(t_{n+s}), \
&\qquad\qquad
\Gamma_{i,n+s},,
\delta_{n+s};,
\Theta_i^*
\Big), \
&\qquad s=0,\ldots,N_p-1,
\end{aligned}
\end{equation}
where $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communicated neighbor predictions.

\textbf{Decision variables.}
Optimize the local parameter sequence over the control horizon $N_c$ :
\begin{equation}
\begin{aligned}
\mathbf{\Gamma}i(t_n)
&=
\big[
\Gamma{i,n}^\top,,
\Gamma_{i,n+1}^\top,,
\ldots,,
\Gamma_{i,n+N_c-1}^\top
\big]^\top \
&\in \mathbb{R}^{pN_c}.
\end{aligned}
\end{equation}

\textbf{Local objective.}
The local cost function of subsystem $i$ is defined as
\begin{equation}
\begin{aligned}
J_i
&=
\sum_{s=1}^{N_p}
\big|
\Delta \hat{x}i(t{n+s}) - \Delta x_{\mathrm{ref}}(t_{n+s})
\big|{Q_i}^2 \
&\quad +
\sum{s=0}^{N_c-1}
\big|
\Gamma_{i,n+s}
\big|_{R_i}^2,
\end{aligned}
\end{equation}
where $Q_i$ and $R_i$ are weighting matrices and $\Delta x_{\mathrm{ref}}(t_{n+s})$ is the reference for the state increment. $\Gamma_{i,n+s}$ is the local polynomial-parameter vector of the control increment for subsystem $i$ over the interval $[t_{n+s},t_{n+s+1}]$ .

\textbf{Constraints.}
Typical constraints include bounds on absolute inputs and increment trajectories:
\begin{align}
u_{i,\min} &\le u_i(t_{n+s}) \le u_{i,\max},\
\Delta u_{i,\min}
&\le
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
\le
\Delta u_{i,\max}, \notag \
&\hspace{3.6cm}
\forall\tau\in[0,\delta_{n+s}].
\end{align}
In practice, the interval-wise bound can be checked by evaluating $\tau=0$ , $\tau=\delta_{n+s}$ , and the quadratic extremum.

To enforce the absolute-input constraints consistently within the prediction horizon, we update the absolute input using the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_{n+s})
&=
\frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}),d\tau \
&=
\gamma_{i,n+s,0}
+\gamma_{i,n+s,1}\frac{\delta_{n+s}}{2}
+\gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3},
\end{aligned}
\end{equation}
and propagate

\begin{equation}
\begin{aligned}
u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \
u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1.
\end{aligned}
\end{equation}

\textbf{Local optimization problem.}
At Nash-iteration index $l$ , subsystem $i$ solves
\begin{equation}
\mathbf{\Gamma}i^{(l)}=\arg\min{\mathbf{\Gamma}_i}; J_i \quad \text{s.t. (40)--(43)}.
\end{equation}

\subsection{Nash Equilibrium Coordination Iteration}

The Nash equilibrium is computed via distributed best-response iterations, summarized in Table~\ref{tab:nash_iter_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.12}
\setlength{\tabcolsep}{3.5pt}
\caption{Distributed Nash best-response iteration for RNE-DMPC.}
\label{tab:nash_iter_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X}
\toprule
\textbf{Step} & \textbf{Description} \
\midrule
A &
Initialize $l=1$ and initialize $\mathbf{\Gamma}_i^{(0)}$ for all subsystems. \

B &
Using the surrogate predictor, compute $\Delta \hat{x}_i^{(l)}(t_{n+s})$ for $s=1,\ldots,N_p$ \
&
given $\mathbf{\Gamma}_i^{(l-1)}$ and the latest neighbor predictions
$\Delta \hat{x}_{Z_i}^{(l-1)}(t_{n+s})$ . \

C & Solve the local optimization problem to update $\mathbf{\Gamma}_i^{(l)}$ . \

D &
Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories
$\Delta \hat{x}_i^{(l)}(t_{n+s})$ to the communication system. \

E &
Update neighbor predictions $\Delta \hat{x}_{Z_i}^{(l)}(t_{n+s})$ using received information; \
& re-generate predictions if needed. \

F & Compute the maximum relative change $\varsigma^{(l)}$ . \

G &
If $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$ , stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$ ; \
& otherwise set $l\leftarrow l+1$ and repeat Steps B--F. \
\bottomrule
\end{tabularx}
\end{table}

The convergence metric in Step~F is defined as
\begin{equation}
\begin{aligned}
\varsigma^{(l)}
&=
\max_i
\frac{\left|
\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}
\right|_2}{
\left|
\mathbf{\Gamma}_i^{(l-1)}
\right|_2+\epsilon},
\end{aligned}
\end{equation}
where $\epsilon>0$ avoids division by zero.

\textbf{Receding-horizon implementation.}
Only the first-interval parameters $\Gamma_{i,n}^*$ are applied.
The control increment trajectory over $[t_n,t_{n+1}]$ is
\begin{equation}
\Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^),
\quad \tau\in[0,\delta_n].
\end{equation}
To obtain the equivalent discrete-time control increment used for updating the absolute input, we use the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_n)
&=
\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau),d\tau \
&=
\gamma_{i,n,0}^
+
\gamma_{i,n,1}^\frac{\delta_n}{2}
+
\gamma_{i,n,2}^\frac{\delta_n^2}{3}.
\end{aligned}
\end{equation}
Then the absolute control input is updated by
\begin{equation}
u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),
\end{equation}
which ensures smooth input evolution and avoids abrupt actuator changes.
The overall control flow chart is shown in Fig.~\ref{liu}.

\begin{figure}[htbp]
\centering
\includegraphics[width=\linewidth]{picture/x5.pdf}
\caption{The overall system control flow chart}\label{liu}
\end{figure}这是我论文的第二到第三节技术部分。老师说写的太模糊了。没有说清楚变量的含义都是啥，张力和厚度在哪，公式（2）可不可以在五机架冷连轧的情境里这样参数化。第三章的网络没讲清楚，例如：有u和没u的区别，为什么这么设计网络模型等等。第四章没看出来控制策略怎么设计，参考信号不清楚，怎么把优化目标和优化函数说清晰，冲突没看出来，怎么解决冲突问题，优化问题最终怎么解。还有其他的一些问题，控制和预测怎么衔接的，预测怎么给控制服务等等。帮我理清一下思路哦，重新写，给出正确的思路

Answer 1

下面我先把你老师提到的“模糊点”逐条对应成一条清晰的技术主线，然后给你一份可以直接替换进论文的**“更清楚的第二～四章写法（含变量定义、厚度/张力位置、参数化合理性、网络设计动机、控制目标/参考/冲突/求解、预测与控制的衔接）”**。你可以按你论文的具体变量维度（比如每个机架状态到底包含哪些量）再做小改。

0. 先把整体逻辑理顺（你现在的核心问题）

你现在的写法里，读者会卡在 6 个地方：

Δx、x、u、Δu 的物理含义没落地：到底是“偏差量(deviation)”还是“相邻采样差分(increment)”？厚度、张力具体在哪个分量？
式(2)的“区间内多项式参数化”没有解释为什么在五机架冷连轧可行：控制是数字采样的，为什么要搞连续时间 τ？
网络结构没讲清楚“有 u”和“没 u”的差别：MPC 里必须能“随着你改变控制而改变预测”，否则控制优化没有意义。
训练目标的动机不清楚：多步 rollout 为啥要做？forward/backward reciprocal consistency 是解决什么问题？
第四章读者看不出控制策略怎么落地：参考信号是什么、优化目标怎么写、耦合冲突在哪里、怎么协调、最终怎么解、怎么滚动执行。
预测与控制怎么衔接没讲：NN 输出是什么、怎么进入 MPC 预测、怎么在每次采样时闭环更新。

你需要做的不是“再写更多公式”，而是把每一章都改成：先把对象定义清楚 → 再说明为什么这么建模/这么参数化 → 再给算法流程（输入/输出/求解器）。

1) 建议你统一符号：把“Δ”讲清楚（这是最大雷点）

你现在把 Δx 叫“state increment”，但又用它做状态演化。为了让读者不迷路，强烈建议你在论文里二选一并固定：

方案 A（更推荐，最稳）：Δ 表示“偏差量 deviation”

令 $x_i(t)$ 是第 i 机架相关物理量（厚度、张力等）的状态向量
选一个工作点/名义轨迹 $x_i^{\star}(t)$ ，定义

\Delta x_i(t) \triangleq x_i(t)-x_i^{\star}(t)

同理

\Delta u_i(t)\triangleq u_i(t)-u_i^{\star}(t),\qquad \Delta d_i(t)\triangleq d_i(t)-d_i^{\star}(t)

这样你的离散映射“Δx → Δx”就完全合理：它是偏差系统。

方案 B（不推荐但可用）：Δ 表示“采样差分 increment”

那就必须写清楚

\Delta x_i(t_n)\triangleq x_i(t_n)-x_i(t_{n-1})

但这样进入 MPC 会更绕，而且你的残差网络“shortcut 直接加 Δx(t_n)”会更难解释其物理意义。

建议你用方案 A，把“increment”改成“deviation”或“deviation state”，你老师提的“变量含义不清”会立刻好一半。

2) 厚度和张力到底在哪：给一个“机架 i 的状态向量”示例（你必须明确写）

五机架冷连轧典型关注两类输出：

出口厚度：每架出口厚度 $h_i$ （或其偏差 $\Delta h_i$ ）
机架间张力：第 i 与 i+1 架之间张力 $T_i$ （或 $\Delta T_i$ ）

一种很常见、读者也最好理解的状态选取（仅示例，你按你仿真模型调整维度）：

对机架 $i$ （1～5），定义局部状态（偏差）

\Delta x_i \triangleq \begin{bmatrix} \Delta h_i \\ \Delta T_{i-1} \\ \Delta T_i \end{bmatrix}

其中

$\Delta h_i$ ：第 i 架出口厚度偏差
$\Delta T_{i-1}$ ：(i-1,i) 间张力偏差（i=1 时没有，可置 0 或不纳入）
$\Delta T_i$ ：(i,i+1) 间张力偏差（i=5 时没有）

邻居集合也要落地写清楚（五机架串联系统最简单）：

Z_i= \begin{cases} \{2\}, & i=1\\ \{i-1,i+1\}, & i=2,3,4\\ \{4\}, & i=5 \end{cases}

3) 式(2)的“区间内二次多项式参数化”在五机架场景为什么合理？（你要这样解释）

你的参数化不是为了“数学好看”，而是为了解决冷连轧控制里两个现实点：

执行器/设定值需要平滑：辊缝/速度/弯辊等设定值如果采样点间跳变，会激发张力振荡、厚度波动。
采样周期内控制并非真正常值：工业上常见做法是采样更新“目标”，但底层驱动/液压闭环会在采样周期内连续变化；用低阶多项式近似一个采样区间内的变化是合理的“等效输入模型”。

因此你可以把区间内的控制偏差写成基函数展开（你这里选二次）：

\Delta u_{i,n}(\tau)=\gamma_{i,n0}+\gamma_{i,n1}\tau+\gamma_{i,n2}\tau^2,\quad \tau\in[0,\delta_n]

这等价于“用 3 个参数描述一个采样区间内的平滑控制轨迹”，比 piecewise-constant 更能表达渐变/缓变，也更利于约束“采样区间内任何时刻不越界”。

然后你定义的离散等效输入用“区间平均”：

\Delta u_i(t_n) \triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau = \gamma_{i,n0}+\gamma_{i,n1}\frac{\delta_n}{2}+\gamma_{i,n2}\frac{\delta_n^2}{3}

这一步要强调一句：

控制优化决策变量是 $\Gamma_{i,n}$ ，但为了与离散状态更新与执行器指令一致，我们用区间平均将连续轨迹映射为离散等效控制量，同时区间内约束仍用轨迹检查保证。

再补一句你老师关心的：“可不可以在五机架情境这样参数化？”
答案是可以，但要加一句适用条件：采样周期相对执行器带宽不能太大、且实际指令在区间内变化可被低阶多项式近似。你可以这样写：

当采样周期 $\delta_n$ 相对液压/速度内环带宽不至于过大时，区间内的控制偏差轨迹可用低阶多项式近似；二次项在不显著增加决策维度的前提下提供“曲率自由度”，能更好拟合加减速/辊缝渐变等工况。

4) “有 u”和“没 u”到底差在哪？（第三章必须讲透）

这句话你可以直接写进论文：

如果网络输入里没有控制量（或其参数 $\Gamma$ ），网络学到的是“在训练数据控制策略下系统如何演化”的闭环轨迹模型。这模型不能用于 MPC，因为 MPC 要通过改变控制来改变预测；没有控制输入的模型无法回答“如果我换一个控制会怎样”。
如果网络输入包含 $\Gamma_{i,n}$ （以及 $\delta_n$ ），网络才是“可控预测模型（surrogate dynamics）”，能用于在线优化。

所以你现在的写法必须明确：你训练的是

\Delta x_i(t_{n+1}) \approx \Delta x_i(t_n) + \mathcal N_i\big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n\big)

而不是“仅靠状态自回归”。

5) 为什么用 residual 结构？为什么加 η 分支？（你要给明确动机）

你现在写得像“为了鲁棒性加个分支”，但读者不知道它解决啥。建议这样讲：

residual（shortcut）解决什么

轧机采样周期一般不大，相邻采样时刻状态偏差变化往往“相对平缓”。
用残差结构相当于给了一个物理先验：短时内状态变化不大（persistence prior）。
网络只学习“校正项”，训练更稳定、多步滚动误差更小。

变采样/大 δ 的问题

当 $\delta_n$ 变大时，系统变化幅度变大，单一残差网络容易对“时间尺度”敏感。你引入

\mathcal N_i(\cdot)=\eta_i(\cdot)+\mathcal I_i(\cdot)

一个更清晰的解释是：

$\eta_i$ 用来学习随 $\delta_n$ 缩放/偏置的低频项（例如近似线性随 $\delta$ 放大），
$\mathcal I_i$ 学习剩余的非线性耦合细节。
这样做能缓解“不同采样间隔下同一状态变化尺度不同”的问题。

6) 第四章你要把“控制策略怎么落地”写成可执行算法

你老师说“没看出来控制策略怎么设计”，通常是因为你缺了 5 件事：

控制目标到底是什么：厚度跟踪？张力稳定？两者权衡？
参考信号是什么：常值设定？来自工艺计划？由上层调度给？
耦合冲突在哪里：一个机架改速度会影响两侧张力，厚度与张力可能相互拉扯。
协调机制是什么：你说 Nash，但要说清楚每个子问题里“邻居的信息”怎么进入预测和代价。
最终怎么解：每个子问题是 NLP 还是 QP？用什么求解器？每次迭代怎么停止？滚动时怎么应用？

下面给你一份“更清楚的写法模板”。

✅ 给你一份可直接替换论文的“重写版思路 + LaTeX骨架”

你可以把下面当成第二～四章的“清晰版本”，再按你实际状态/控制维度微调。

第二章（建议重命名）数据集构造与离散等效输入

2.1 五机架耦合系统与符号定义（必须新增）

（建议放一个符号表/文字说明）

五机架编号 $i\in\{1,2,3,4,5\}$ ，采样时刻 $t_n$ ，区间长度 $\delta_n=t_{n+1}-t_n$ 。
机架 $i$ 的局部状态选为（示例）

x_i(t)=\begin{bmatrix} h_i(t)\\ T_{i-1}(t)\\ T_i(t) \end{bmatrix}, \qquad \Delta x_i(t)\triangleq x_i(t)-x_i^\star(t)

其中 $h_i$ 为第 i 架出口厚度， $T_i$ 为 (i,i+1) 间张力； $(\cdot)^\star$ 为名义工况/参考轨迹。

控制输入（示例）可以是辊缝设定、速度设定或其组合：

u_i(t)=\begin{bmatrix} s_i(t)\\ v_i(t) \end{bmatrix}, \qquad \Delta u_i(t)\triangleq u_i(t)-u_i^\star(t)

若只选单输入（例如辊缝），则 $u_i\in\mathbb R$ 。

邻居集合（串联）：

Z_1=\{2\},\; Z_i=\{i-1,i+1\}(i=2,3,4),\; Z_5=\{4\}

并定义邻居堆叠

\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}.

这一段写清楚后，厚度和张力“在哪”就明确了：它们就是 $x_i$ 的分量。

2.2 离散映射与等效输入的定义（把你式子“物理化”）

由于五机架系统存在显著耦合、摩擦与材料参数不确定等因素，难以获得高精度的一阶机理离散模型。本文采用数据驱动方式学习如下局部离散演化关系：

\Delta x_i(t_{n+1}) = \Phi_i\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Delta u_i([t_n,t_{n+1}]),\Delta d_i([t_n,t_{n+1}])\Big)

其中 $\Phi_i(\cdot)$ 为未知非线性映射； $\Delta u_i([t_n,t_{n+1}])$ 表示该采样区间内的控制轨迹信息。

为与离散预测/执行一致，引入区间平均等效控制偏差：

\Delta u_i(t_n) \triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau, \quad \tau=t-t_n\in[0,\delta_n].

相应地，扰动也可用区间平均表示 $\Delta d_i(t_n)$ 。

2.3 区间内控制轨迹参数化（回答“能不能这么做”）

为在不显著增加在线优化维度的前提下描述采样区间内的平滑控制变化，本文将区间控制偏差轨迹用二次多项式参数化：

\Delta u_{i,n}(\tau;\Gamma_{i,n}) = \gamma_{i,n0}+\gamma_{i,n1}\tau+\gamma_{i,n2}\tau^2,\quad \tau\in[0,\delta_n],

其中 $\Gamma_{i,n}=[\gamma_{i,n0},\gamma_{i,n1},\gamma_{i,n2}]^\top$ 。
该参数化可近似描述轧机执行器在采样周期内的渐变响应，并天然保证控制轨迹在区间内可检查约束（见第4章）。

由此区间平均等效输入为

\Delta u_i(t_n)= \gamma_{i,n0}+\gamma_{i,n1}\frac{\delta_n}{2}+\gamma_{i,n2}\frac{\delta_n^2}{3}.

（这里加一句适用条件/解释：采样周期与执行器带宽关系、平滑性需求。）

2.4 一步样本与 K 步段样本（把“输入/输出”写成机器学习可读格式）

一步样本（监督学习）
对每个区间 $[t_n,t_{n+1}]$ ，构造训练输入

X_{i,\text{in}}(t_n)= \big[\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n\big]^\top

输出标签为 $\Delta x_i(t_{n+1})$ 或残差 $\Delta r_i=\Delta x_i(t_{n+1})-\Delta x_i(t_n)$ 。

K 步段样本（用于 rollout 与一致性正则）
组织连续 K 个区间得到段样本 $\mathcal W_{i,n}$ ，包含

\Big\{(\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1},\; (\Delta x_i(t_{n+s+1}))_{s=0}^{K-1}\Big\}.

这一段的意义要写明：用于训练时约束长期递推误差（drift），让模型可用于 MPC 的长预测域。

第三章残差神经网络预测模型（必须把“有u/无u、为什么残差”讲透）

3.1 预测任务定义：我们要学什么？

目标是学习一个可控的一步预测模型：

\Delta x_i(t_{n+1}) \approx \Delta x_i(t_n) + \mathcal N_i(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n),

强调： $\Gamma_{i,n}$ 进入输入，保证模型能响应控制变化，从而可用于 MPC 在线优化。

3.2 残差结构与网络设计动机

shortcut 项 $\Delta x_i(t_n)$ 提供“短时保持”的先验；
网络仅学习校正项（包含耦合与非线性），多步滚动更稳定；
引入 $\delta_n$ 使模型具备变采样泛化能力。

你原来的 $\eta_i + \mathcal I_i$ 可以这样解释并写清楚：

\mathcal N_i(\cdot)=\eta_i(\cdot)+\mathcal I_i(\cdot),

其中 $\eta_i$ 用于学习与时间尺度 $\delta_n$ 强相关的低频项（如随 $\delta$ 的尺度变化）， $\mathcal I_i$ 学习剩余耦合非线性细节；当 $\eta_i\equiv 0$ 退化为标准 ResNet。

3.3 多步训练：rollout loss + reciprocal consistency 为何需要？

要用于 MPC，模型必须在 $N_p$ 步滚动中不发散。仅用一步 MSE 容易出现“短期准、长期漂”。
因此引入：

rollout loss：直接惩罚多步预测轨迹与真值偏差（抑制误差累积）
forward/backward reciprocal consistency：通过学习一个反向模型 $\mathcal B_i$ 并约束“正向滚到终点再反推应回到起点”，从结构上约束模型减少漂移（类似可逆一致性的正则化思想）

（这段你现在写了很多公式，但缺“这一招解决什么问题”的解释。补上这两三句话，老师就容易认可。）

第四章 RNE-DMPC 控制设计（你要把“参考、冲突、求解、滚动执行”写成完整闭环）

4.1 控制目标、参考信号与冲突来源（必须写清楚）

控制目标：
1. 厚度跟踪：使各机架出口厚度偏差 $\Delta h_i$ 跟踪参考 $\Delta h_i^{\text{ref}}$ （通常为 0 或来自工艺设定）
2. 张力调节：使机架间张力偏差 $\Delta T_i$ 跟踪 $\Delta T_i^{\text{ref}}$ （通常为 0 或张力设定偏差）
参考信号来源（你必须选一种说法并固定）：
- 常值设定： $\Delta x_{\text{ref}}(t)\equiv 0$ （围绕名义工况稳态调节）
- 或上层工艺给定的时变轨迹： $\Delta x_{\text{ref}}(t_{n+s})$ 来自生产计划/换规格策略/张力与厚度设定曲线
冲突来源（耦合）：
在五机架串联系统中，机架 $i$ 的速度/辊缝等动作会改变带材流动与变形，从而影响相邻机架间张力；即 “局部最优”可能使邻居张力恶化。
这就是为什么需要分布式协调（Nash 迭代）而不是各做各的。

4.2 用 NN 做预测：把“预测如何服务控制”写成一句话

在 MPC 每次在线优化中，候选决策变量是 $\mathbf\Gamma_i$ 。给定 $\mathbf\Gamma_i$ 和邻居的最新策略/预测，利用已训练网络递推得到预测轨迹：

\Delta \hat x_i(t_{n+s+1})= \Delta \hat x_i(t_{n+s})+ \mathcal N_i(\Delta \hat x_i(t_{n+s}),\Delta \hat x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s}).

这条递推就是“控制优化的模型约束”。 没这条，MPC 就无从谈起。

4.3 局部优化问题：决策变量、代价函数、约束（写清楚“是什么 + 为什么”）

决策变量（控制域 $N_c$ ）：

\mathbf{\Gamma}_i= [\Gamma_{i,n}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top]^\top

代价函数（把厚度/张力写进来，别只写 $\Delta x$ ）：

若 $\Delta x_i=[\Delta h_i,\Delta T_{i-1},\Delta T_i]^\top$ ，则可写

J_i= \sum_{s=1}^{N_p} \Big( \|\Delta \hat h_i(t_{n+s})-\Delta h_i^{\text{ref}}(t_{n+s})\|_{q_{h,i}}^2 + \|\Delta \hat T_{i}(t_{n+s})-\Delta T_{i}^{\text{ref}}(t_{n+s})\|_{q_{T,i}}^2 +\cdots \Big) + \sum_{s=0}^{N_c-1}\|\Gamma_{i,n+s}\|_{R_i}^2

（你可以把“厚度权重更大/张力权重更大”的工艺解释写一句。）

约束（要说清楚“如何把 Γ 变成对 u 的约束”）：

区间内控制轨迹约束：

\Delta u_{\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{\max},\ \forall\tau\in[0,\delta_{n+s}]

实现上检查 $\tau=0,\delta$ 以及二次极值点 $\tau^\star=-\gamma_1/(2\gamma_2)$ （若落在区间内）。

绝对输入约束通过区间平均更新：

\Delta u_i(t_{n+s})=\gamma_0+\gamma_1\frac{\delta}{2}+\gamma_2\frac{\delta^2}{3},\quad u_i(t_{n+s})=u_i(t_{n+s-1})+\Delta u_i(t_{n+s})

从而 enforce：

u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max}.

4.4 Nash 协调：冲突怎么解决？（用“最佳响应”把逻辑讲通）

每个子系统 $i$ 在给定邻居策略 $\mathbf\Gamma_{Z_i}$ 下求解自己的最优响应（best response）：

\mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf\Gamma_i} J_i(\mathbf\Gamma_i;\mathbf\Gamma_{Z_i}^{(l-1)})

通过通信交换 $\mathbf\Gamma_i^{(l)}$ 与预测轨迹 $\Delta\hat x_i^{(l)}$ ，迭代直到收敛：

\varsigma^{(l)}= \max_i\frac{\|\mathbf\Gamma_i^{(l)}-\mathbf\Gamma_i^{(l-1)}\|}{\|\mathbf\Gamma_i^{(l-1)}\|+\epsilon}\le \varsigma_{\text{tol}}.

这段要明确说：Nash 迭代的本质是**“耦合冲突通过重复互相回应找到一个稳定折中点”**，它不是强行让大家一致（那是共识/ADMM），而是让谁也不愿意单方面改变（均衡）。

4.5 最终怎么解？（你必须给出“求解器层面”的一句话）

每个局部问题通常是一个带 NN 预测约束的非线性规划（NLP）。因为 NN 可微，可以采用：

SQP / interior-point（如 IPOPT）
或基于自动微分的梯度法（若你用 PyTorch/TF 端到端）

你不需要写“我用 IPOPT”，但至少要写：

本文将局部优化问题视为可微 NLP，利用梯度信息进行数值求解；每次 Nash 迭代求得 $\mathbf\Gamma_i^{(l)}$ 后广播更新。

4.6 滚动时域闭环：控制与预测如何衔接（用 5 行把闭环讲清）

在每个采样时刻 $t_n$ ：

测量/估计当前 $\Delta x_i(t_n)$ ，并接收邻居上一次迭代/上一时刻的预测或策略初值；
运行 Nash 迭代：每个机架并行求解局部 NLP，交换预测与策略，直至收敛；
仅实施第一段控制参数 $\Gamma_{i,n}^\star$ ，生成区间内控制轨迹 $\Delta u_{i,n}(\tau)$ ；
用区间平均更新实际输入 $u_i(t_n)$ ，系统运行到 $t_{n+1}$ ；
$n\leftarrow n+1$ 滚动重复。

这段写出来，读者就能看出“预测—优化—执行—再测量”的闭环链条，你老师说的“控制和预测怎么衔接”就解决了。

7) 你可以直接照着改的“最小清单”（不加新内容也能显著变清楚）

如果你不想大改结构，至少做这 10 个动作：

在第二章开头新增“符号与状态/输入定义”（写清厚度、张力在哪个分量）。
把“increment”改成“deviation”或明确 Δ 的定义（二选一）。
在式(2)后加一段解释：为什么区间参数化、为什么二次、适用条件。
在数据集定义里明确：网络输入是 $[\Delta x_i,\Delta x_{Z_i},\Gamma,\delta]$ ，输出是 $\Delta x_i(t_{n+1})$ 或残差。
第三章新增一段：“没有 u 的模型不能做 MPC，有 u 的模型才可控预测”。
第三章新增一段：残差结构为何能稳定多步、为何引入 η 分支应对 δ 变化。
第三章训练目标每个 loss 前加一句“它解决什么”。
第四章开头先写“控制目标=厚度+张力；参考=0 或工艺给定”。
第四章明确“冲突=耦合张力传播导致局部动作影响邻居”。
第四章给出“在线闭环流程（5步）+ 每个子问题是 NLP，用可微优化求解”。

如果你愿意，我也可以在你“真实的状态向量/控制量定义”（比如你到底选了辊缝、速度、弯辊还是张力给定等）基础上，把上面的“示例版”完全改成与你论文一致的最终可直接粘贴版：你只要告诉我两件事就行——

你每个机架的 $x_i$ 具体包含哪些量（厚度、张力之外还有没有速度差、压下量、轧制力等）以及维度 $d$ ；
你的 $u_i$ 到底是哪一个/哪些执行量（单输入还是多输入）。

Answer 2

在你这个“五机架冷连轧、状态只含厚度+张力”的设定下，我对 $u_i$ 的理解是：

$u_i$ 是机架 $i$ 在在线控制中“能直接下发/能被控制器改变的执行量（manipulated variables）”的集合，也就是你 MPC 的决策变量最终要落到的“设定值/指令”。

在冷连轧里，能同时影响厚度与张力、且工业上最常用、最容易让审稿人/老师接受的选择通常是下面这两个：

1) 最推荐、最标准的解释： $u_i=[s_i,\; v_i]$

(1) 辊缝/压下（Hydraulic gap / screw-down）

记作 $s_i$ （辊缝设定或等效压下量，很多文献也用 $S_i$ 、 $h_{0,i}$ 、 $x_{g,i}$ ）
主要影响：机架出口厚度 $h_i$ （“厚度控制”最直接的操纵量）
物理直觉：辊缝变小 → 压下率增大 → 出口厚度变薄（当然具体关系受轧制力、材料硬化等影响，所以你用数据驱动去学）

(2) 机架速度（roll speed / strip speed reference）

记作 $v_i$ （电机速度/线速度设定）
主要影响：张力（张力来源于相邻机架速度差 + 带钢延伸/弹性效应）
物理直觉：若 $v_{i+1}$ 相对 $v_i$ 增大，(i,i+1) 间张力 $T_i$ 往往上升（方向取决于你张力定义与模型）

所以写成：

u_i(t)=\begin{bmatrix}s_i(t)\\ v_i(t)\end{bmatrix},\qquad \Delta u_i(t)=u_i(t)-u_i^\star(t)

这样你第四章要做“厚度–张力协调控制”，逻辑就非常顺：

厚度靠 $s_i$ 纠偏
张力靠 $v_i$ 协调
耦合冲突来自“速度改动会影响上下游张力与厚度演化、辊缝改动也会影响带钢流动从而间接影响张力”。

这也能直接回应老师质疑：“有 $u$ 和没 $u$ 的区别”：没 $u$ 你就根本没法通过优化改变预测张力/厚度。

2) 你现在的“区间二次多项式参数化”如何对应到多输入？

如果 $u_i$ 是二维（辊缝+速度），那你现在的参数化只要“对每个输入通道各来一套二次系数”即可：

\Delta u_{i,n}(\tau)= \begin{bmatrix} \Delta s_{i,n}(\tau)\\ \Delta v_{i,n}(\tau) \end{bmatrix}, \quad \Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2, \quad \Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2

于是

决策向量维度从你原来“单输入 $p=3$ ”变成“双输入 $p=6$ ”：

\Gamma_{i,n}= [\gamma^{(s)}_{i,n0},\gamma^{(s)}_{i,n1},\gamma^{(s)}_{i,n2},\gamma^{(v)}_{i,n0},\gamma^{(v)}_{i,n1},\gamma^{(v)}_{i,n2}]^\top

区间平均也分别计算（对每个通道同样是 $+\delta/2,+\delta^2/3$ 那套）

这点你在论文里一定要一句话点明：

“当 $n_u>1$ 时， $\Gamma_{i,n}$ 由各输入通道的多项式系数组合而成，维度为 $p=3n_u$ 。”

3) 另一种也能自洽、但需要你说明“内环”的选择： $u_i=[s_i,\; T_i^{ref}]$

有些产线确实存在“张力内环”（通过调速实现张力跟踪），这时外环 MPC可以不直接给 $v_i$ ，而是给一个张力参考 $T_i^{ref}$ ：

$u_i=[s_i,\;T_i^{ref}]$
速度由底层张力控制器自动调整

但注意：如果你这么定义 $u_i$ ，你的数据集离线仿真必须把“张力内环控制器”也包含进被控对象，否则网络学到的映射和现场执行不一致。老师如果问“你到底控制的是速度还是张力参考”，你必须说清楚层级结构。

4) 你现在只写“厚度+张力”，那 $u_i$ 选什么更合理？

在“只关心厚度+张力”且你要做 RNE-DMPC（分布式协调） 的语境里，我建议你论文里直接采用最经典的一句：

每机架采用两个可操纵量：辊缝（压下）与速度设定。辊缝主导厚度，速度差主导机架间张力；两者通过带钢变形与张力传播产生耦合，因此需要分布式协调。

这样第二章的参数化、第三章的 NN 输入、第四章的优化目标/冲突来源都能一条线串起来。

如果你愿意，我可以按你“状态只含厚度+张力”的约束，帮你把文中所有出现的 $u_i$ 、 $\Delta u_i$ 、 $\Gamma$ 、 $p$ 以及第4章代价函数里“ $\Delta x_{\rm ref}$ ”改写成明确的厚度参考 $\Delta h^{ref}$ 和 张力参考 $\Delta T^{ref}$ ，并把“冲突是什么、Nash 怎么解决、每个子问题怎么解、滚动怎么执行”那一段写成你论文可直接替换的 LaTeX 段落。你只要告诉我你希望：

每个机架的状态 $\Delta x_i$ 具体取 $[\Delta h_i,\Delta T_{i-1},\Delta T_i]$ 还是更简（例如只取 $[\Delta h_i,\Delta T_i]$ ）；
张力是定义为机架间 $T_i$ （i=1..4）还是每机架一个张力量。

Answer 3

latex
%========================
\section{Construction of Dataset}
%========================

\subsection{Five-stand tandem mill setting and notation (variables, thickness and tension)}
\label{subsec:notation_5stand}

Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the $n$-th sampling interval length.

\textbf{Thickness and tension variables.}
Let $h_i(t)$ denote the exit thickness of stand $i$, and let $T_i(t)$ denote the inter-stand tension between stand $i$ and $i+1$
(i.e., $T_i$ is defined for $i=1,2,3,4$). The control goal of this paper is to achieve coordinated regulation/tracking of
$\{h_i\}_{i=1}^5$ and $\{T_i\}_{i=1}^4$ in the presence of strong inter-stand coupling.

\textbf{State (deviation) definition.}
To avoid ambiguity, throughout this paper the symbol ``$\Delta$'' attached to a \emph{state} denotes a \emph{deviation} (tracking error)
with respect to a prescribed reference (or nominal) trajectory:
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\end{equation}
For the five-stand case, we choose a stand-wise local state vector that explicitly contains the thickness and the adjacent tensions:
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3,
\label{eq:local_state_def}
\end{equation}
with the boundary convention $\Delta T_{0}(t)\equiv 0$ and $\Delta T_{5}(t)\equiv 0$ so that all stands share a unified dimension $d=3$.
In the cost function and constraints, boundary ``virtual'' tensions can be assigned zero weights so that they do not affect the optimization.

\textbf{Neighbor set (five-stand chain coupling).}
The mill coupling is dominated by tension propagation between adjacent stands, therefore we use the chain neighbor sets
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad
Z_5=\{4\}.
\label{eq:neighbor_set}
\end{equation}
Define the neighbor-state stack (used to encode inter-stand coupling information) as
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}.
\label{eq:neighbor_stack}
\end{equation}

\textbf{Control inputs (actuators) and control increments.}
For each stand $i$, the manipulated variables are chosen as the \emph{roll gap (screw-down/hydraulic gap)} and the \emph{stand speed}:
\begin{equation}
u_i(t)=
\begin{bmatrix}
s_i(t)\\
v_i(t)
\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_def}
\end{equation}
To ensure smooth actuator evolution and to match typical industrial implementations, the optimization is conducted on
\emph{control increments} (sample-to-sample changes) rather than absolute inputs:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=
\begin{bmatrix}
\Delta s_i(t_n)\\
\Delta v_i(t_n)
\end{bmatrix}.
\label{eq:du_increment_def}
\end{equation}
Note the distinction: $\Delta x$ is a \emph{deviation state} (tracking error), while $\Delta u$ is an \emph{input increment}.

\textbf{Disturbance.}
Let $d_i(t)$ denote exogenous disturbances (e.g., entry thickness fluctuation, friction variation, material property changes, etc.).
We use $\Delta d_i(t)$ to denote their interval-level equivalent representation (defined later).

\subsection{Discrete interval mapping and data-driven motivation}
\label{subsec:discrete_mapping}

The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by an equivalent discrete-time mapping
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big),
\label{eq:true_unknown_mapping}
\end{equation}
where $\Phi_i(\cdot)$ is generally nonlinear and strongly coupled due to tension propagation and rolling deformation interactions.
A commonly used \emph{conceptual} local linear discrete form is
\begin{equation*}
\Delta x_i(t_{n+1})
=
M_d\,\Delta x_i(t_n)
+
N_d\,\Delta u_i(t_n)
+
F_d\,\Delta d_i(t_n),
\end{equation*}
where $M_d,N_d,F_d$ are equivalent discrete matrices and $\Delta u_i(t_n),\Delta d_i(t_n)$ are interval-level equivalent inputs/disturbances.
However, in a practical five-stand cold rolling mill, accurate identification of $(M_d,N_d,F_d)$ from first principles is difficult,
because of complex coupling, unmodeled nonlinearities, and time-varying operating conditions.

\begin{remark}
In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish the discrete mapping
based on first principles. Therefore, in this paper, we aim to learn an accurate approximation of \eqref{eq:true_unknown_mapping} from data,
and then embed the learned surrogate into distributed MPC.
\end{remark}

%========================
\subsection{Interval-level parameterization and one-step dataset}
%========================

To construct the supervised dataset for training, we locally parameterize the \emph{control increment trajectory} within each sampling interval.
For the local interval $[t_n,t_{n+1}]$, define the interval length
\begin{equation}
\delta_n = t_{n+1}-t_n ,
\end{equation}
and introduce a local time variable $\tau\in[0,\delta_n]$.

\textbf{Why interval-level parameterization is valid in five-stand cold rolling.}
Although the controller updates the setpoints at discrete instants $t_n$, the physical actuators (hydraulic gap and drive speed loops)
evolve continuously within $[t_n,t_{n+1}]$ and are typically implemented with smoothing/ramps.
Moreover, abrupt changes of roll gap/speed can excite tension oscillations and degrade thickness stability.
Therefore, representing the within-interval increment trajectory by a low-order polynomial provides:
(i) a compact finite-dimensional decision variable for optimization;
(ii) a smooth within-interval command profile;
(iii) a convenient way to enforce \emph{continuous-time} bounds in the whole interval.
This approximation is appropriate when the sampling interval is not excessively large compared to actuator bandwidth and
the within-interval evolution can be well captured by a low-order basis.

\textbf{Vector quadratic polynomial parameterization (two inputs).}
Using a quadratic polynomial basis for each input channel, the control increment trajectory on $[t_n,t_{n+1}]$ is parameterized as
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,
\qquad \tau\in[0,\delta_n],
\label{eq:du_poly_vector}
\end{equation}
where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ are vector coefficients.
Equivalently, for $n_u=2$ (roll gap and speed), one may write component-wise:
\begin{equation}
\begin{aligned}
\Delta s_{i,n}(\tau) &= \gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\\
\Delta v_{i,n}(\tau) &= \gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2.
\end{aligned}
\end{equation}
Define the local parameter vector by stacking all coefficients:
\begin{equation}
\Gamma_{i,n}\triangleq
\big[
(\Gamma_{i,n0})^\top,\,
(\Gamma_{i,n1})^\top,\,
(\Gamma_{i,n2})^\top
\big]^\top
\in\mathbb{R}^{p},
\qquad
p=3n_u=6.
\label{eq:Gamma_dim}
\end{equation}
Here, $\Gamma_{i,n0}$ denotes the initial baseline of the increments, while $\Gamma_{i,n1}$ and $\Gamma_{i,n2}$ describe the linear and quadratic variation rates.

\textbf{Equivalent discrete-time (interval-averaged) increments.}
Define the equivalent discrete-time increments as the interval averages:
\begin{equation}
\begin{aligned}
\Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\
\Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau.
\end{aligned}
\label{eq:interval_average_def}
\end{equation}
With \eqref{eq:du_poly_vector}, the interval average admits a closed form:
\begin{equation}
\Delta u_i(t_n)=
\Gamma_{i,n0}
+\Gamma_{i,n1}\frac{\delta_n}{2}
+\Gamma_{i,n2}\frac{\delta_n^2}{3}.
\label{eq:interval_average_closedform}
\end{equation}

\textbf{One-step sample generation (five-stand coupled simulation).}
Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$.
In addition to the local state deviation, the neighbor state deviations are also included to represent inter-stand coupling.
The specific process is shown in Table~\ref{tab:interval_sample_generation_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.15}
\caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).}
\label{tab:interval_sample_generation_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X}
\toprule
\textbf{Step} & \textbf{Operation} \\
\midrule
1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and its neighbor stack $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\
2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (ranges of polynomial coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$). \\
3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via the vector polynomial model \eqref{eq:du_poly_vector}. \\
4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\
\bottomrule
\end{tabularx}
\end{table}

Accordingly, an interval sample for subsystem $i$ can be represented as
\begin{equation}
\mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\},
\label{eq:interval_sample_def}
\end{equation}
which is used to learn the mapping relationship from the current local and neighbor deviation states and the local control-increment trajectory
to the next local deviation state.

For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws,
the local one-step training dataset is formed as
\begin{equation}
\begin{split}
S_i=\Big\{&
\big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\\
&\qquad \Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big)
\ \Big|\ j=1,\ldots,J
\Big\}.
\end{split}
\label{eq:one_step_dataset}
\end{equation}
The overall dataset for the five-stand mill can be denoted as $\{S_i\}_{i=1}^{5}$.
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}.

\begin{figure*}[htbp]
  \centering
  \includegraphics[scale=0.5]{picture/Fig2.pdf}
  \caption{Point cloud map of the training dataset.}\label{2}
\end{figure*}

%========================
\subsection{Multi-step rollout segment dataset}
%========================

The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss
and reciprocal-consistency regularization, because these objectives require ground-truth state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples
into $K$-step trajectory segments.

Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling
$\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances),
and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$.
Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks
$\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.

Define a $K$-step segment sample for subsystem $i$ as
\begin{equation}
\begin{aligned}
\mathcal{W}_{i,n}=
\Big\{&
\big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\
&\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1}
\Big\}.
\end{aligned}
\label{eq:Kstep_segment_def}
\end{equation}
By repeating the above segment generation, we form the multi-step training set
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}.
\label{eq:Kstep_dataset}
\end{equation}
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (by keeping only $s=0$),
thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training.

%========================
\section{Construction of Residual Neural Network}
%========================

\subsection{Network Architecture (what is learned, why residual, why include $u$)}
\label{subsec:net_arch}

Given the training dataset, the neural network model is defined and trained.
The network model aims to learn the stand-wise deviation-state evolution of the interconnected five-stand cold rolling system over each local interval.
Specifically, for subsystem $i$, the proposed residual network defines a nonlinear mapping
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\label{eq:Ni_map}
\end{equation}
where $d$ denotes the dimension of the local deviation state $\Delta x_i$ ($d=3$ in \eqref{eq:local_state_def}),
$|Z_i|$ is the number of neighbors of subsystem $i$ defined in \eqref{eq:neighbor_set},
and $p=\mathrm{dim}(\Gamma_{i,n})$ denotes the dimension of the local input-parameter vector ($p=6$ for two-input quadratic parameterization).

\textbf{Why the input must include control information.}
If the network input does \emph{not} include the control variables (here, $\Gamma_{i,n}$ and $\delta_n$),
the learned model degenerates to a purely autoregressive predictor that only reproduces trajectories under the \emph{training policy}
and cannot answer ``what will happen if we change the control?''
In MPC, the optimizer must evaluate the predicted trajectory under \emph{candidate} decision variables,
therefore a \emph{control-dependent} predictor (including $\Gamma_{i,n}$) is necessary for online optimization.

Accordingly, we define the residual mapping
\begin{equation}
\begin{aligned}
\Delta r_i &= \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),\\
X_{i,\text{in}} &\in \mathbb{R}^{d(1+|Z_i|)+p+1},\qquad
\Delta r_i \in \mathbb{R}^d ,
\end{aligned}
\label{eq:residual_def}
\end{equation}
where $\Theta_i$ represents the trainable parameters of the network for subsystem $i$, and $X_{i,\text{in}}$ denotes the concatenated input vector for subsystem $i$ in the current sampling interval.

\textbf{Residual (shortcut) structure and interpretability.}
To explicitly incorporate a residual structure, the local state component in the input is passed through an identity shortcut
and added to obtain the one-step prediction.
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a linear selection matrix whose block form is defined by
\begin{equation}
\hat{I}_i = [I_d,\, 0_{d\times(d|Z_i|+p+1)}].
\label{eq:selector_matrix}
\end{equation}
This shortcut represents a persistence prior: over a short interval, the deviation state tends to change moderately,
and the network mainly learns the \emph{correction} caused by nonlinear rolling behavior and inter-stand coupling.

\textbf{Auxiliary branch for variable $\delta_n$.}
To improve robustness when the sampling interval length $\delta_n$ varies or becomes relatively large,
we introduce an auxiliary branch inside $\mathcal{N}_i$:
\begin{equation}
\mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq
\eta_i(X_{i,\text{in}};\Theta_{\eta_i}) + \mathcal{I}_i(X_{i,\text{in}};\theta_i),
\label{eq:aux_branch}
\end{equation}
where $\eta_i(\cdot)$ can be implemented by a lightweight feedforward network and
$\mathcal{I}_i(\cdot)$ denotes the remaining residual branch.
Conceptually, $\eta_i(\cdot)$ captures low-frequency/scale effects associated with interval length $\delta_n$,
while $\mathcal{I}_i(\cdot)$ learns the remaining coupling nonlinearities.
When $\eta_i(\cdot)\equiv 0$, the model reduces to the standard residual form.

Hence, the one-step prediction of the local deviation state at $t_{n+1}$ can be written as
\begin{equation}
X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),
\label{333}
\end{equation}
where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$.

\begin{remark}
The predictor in \eqref{333} admits a baseline-plus-correction form:
the shortcut term $\hat{I}_i X_{i,\mathrm{in}}$ propagates the current local deviation state $\Delta x_i(t_n)$,
while the residual network $\mathcal{N}_i(\cdot)$ learns the one-step correction.
This structure renders the model interpretable as a data-driven adjustment to a persistence prior,
with the correction capturing unmodeled nonlinearities and inter-stand coupling via $\Delta x_{Z_i}$
under varying operating conditions and varying sampling intervals.
\end{remark}

For the $j$-th one-step data sample, $j=1,\ldots,J$, we set
\begin{equation}
\begin{aligned}
X_{i,\text{in}}^{(j)} =
\big[
\Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\ 
\Gamma_{i,n}^{(j)},\ \delta_n^{(j)}
\big]^{\top}.
\end{aligned}
\label{eq:net_input_vector}
\end{equation}
The learning target is the one-step deviation-state change (residual)
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\label{eq:net_target_residual}
\end{equation}

%========================
\subsection{Training, Learned Model, and System Prediction (how prediction is used later)}
%========================

To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability,
we train the forward predictor jointly with an auxiliary backward residual model
and impose a multi-step reciprocal-consistency regularization over a $K$-step segment.

\textbf{Backward residual model.}
In addition to the forward residual predictor, we construct a backward residual network for subsystem $i$,
\begin{equation}
\mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$. For the backward step associated with the interval $[t_n,t_{n+1}]$, we define
\begin{equation}
\begin{aligned}
X_{i,\mathrm{in}}^{b}
&=
\big[
\Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),\
\Gamma_{i,n},\ \delta_n
\big]^{\top},\\
X_{i,\mathrm{out}}^{b}
&=
\hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i),
\end{aligned}
\label{eq:backward_model_def}
\end{equation}
where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$.
Accordingly, the supervised backward residual target is
\begin{equation}
\Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}).
\end{equation}

\textbf{Forward rollout over a $K$-step segment.}
Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$, we initialize the forward rollout by
\begin{equation}
\Delta \hat{x}_i(t_n)=\Delta x_i(t_n),
\end{equation}
and apply the forward predictor recursively for $K$ steps:
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\,
\Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i
\Big), \\
&\qquad s=0,\ldots,K-1.
\end{aligned}
\label{eq:forward_rollout}
\end{equation}

\textbf{Backward rollout and reciprocal consistency.}
After obtaining the terminal forward prediction, we set the terminal condition for the backward rollout as
\begin{equation}
\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}),
\end{equation}
and roll back using $\mathcal{B}_i$ along the same segment:
\begin{equation}
\begin{aligned}
\Delta \bar{x}_i(t_{n+s})
&=
\hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\end{aligned}
\label{eq:backward_rollout}
\end{equation}
where the backward input at time $t_{n+s}$ is
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})=
\big[
\Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),\
\Gamma_{i,n+s},\ \delta_{n+s}
\big]^{\top},
\end{equation}
and $\Delta \hat{x}_{Z_i}(\cdot)$ is obtained from the same forward rollout.

With the forward and backward trajectories available on the same segment, we define the multi-step reciprocal prediction error
\begin{equation}
E_i(t_n)
=
\sum_{s=0}^{K}
\left\|
\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})
\right\|^2.
\end{equation}

\textbf{Training objectives (what each term enforces).}
We train the forward and backward networks jointly by minimizing the following objective terms:
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}_i\!\left(
X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}_i\!\left(
X_{i,\mathrm{in}}^{b\,(j)}(t_{n+s});\bar{\Theta}_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm]
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\Big\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\Big\|^2.
\end{aligned}
\label{eq:training_losses}
\end{equation}
Here, $L_{\mathrm{1step}}$ enforces one-step accuracy, $L_{\mathrm{roll}}$ directly suppresses long-horizon error accumulation (drift),
and $L_{\mathrm{msrp}}$ regularizes the learned dynamics by reciprocal consistency between forward and backward rollouts.
In implementation, these terms can be combined by a weighted sum (weights chosen by validation):
\begin{equation}
L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}}.
\end{equation}

After training, we obtain the forward model
\begin{equation}
X_{i,\mathrm{out}} = \hat{I}_i X_{i,\mathrm{in}} + \mathcal{N}_i(X_{i,\mathrm{in}}; \Theta_i^*).
\end{equation}
For system prediction on a local interval $[t_n,t_{n+1}]$, we define the input vector as
\begin{equation}
X_{i,\mathrm{in}}
=
\big[
\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n
\big]^{\top},
\end{equation}
and perform the one-step prediction
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)
+
\mathcal{N}_i\!\Big(
\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,
\Gamma_{i,n},\,\delta_n;\,\Theta_i^*
\Big).
\label{eq:one_step_predict}
\end{equation}
By applying this predictor recursively, we obtain a network model that predicts the system trajectory over long horizons.
Finally, the network parameters are optimized using the Adam optimizer:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \eta \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon},
\end{equation}
where $\Theta_{i,t}$ denotes the current parameters, $\Theta_{i,t+1}$ the updated parameters, $\eta$ the learning rate,
$\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ the bias-corrected first and second moment estimates, and $\varepsilon$ a small constant for numerical stability.
Figure~\ref{fig:rnn_logic} illustrates the overall structure.

\begin{figure}[htbp]
  \centering
  \includegraphics[scale=0.85]{picture/x6.pdf}
  \caption{Logic diagram of the residual neural network.}
  \label{fig:rnn_logic}
\end{figure}

%========================
\section{Nash Equilibrium-Based RNE-DMPC (five-stand, thickness--tension, clear objective and solution)}
%========================

The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation.
As a result, changes in operating conditions or control actions (roll gap and speed) at one stand can affect both upstream and downstream stands,
making centralized online optimization over all decision variables computationally demanding.

To mitigate this issue, we decompose the global predictive-control problem into $N=5$ local subproblems associated with individual stands.
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers.
Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate the distributed coordination process as a Nash-equilibrium-seeking iteration.

Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed model predictive control method (RNE-DMPC)
to achieve coordinated thickness--tension regulation/tracking. The overall control structure is shown in Figure~\ref{4}.

\begin{figure*}[htbp]
  \centering
  \includegraphics[width=\linewidth]{picture/x2.pdf}
  \caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4}
\end{figure*}

\subsection{Control objective, references, and coupling conflict (what is optimized and why there is conflict)}
\label{subsec:objective_reference_conflict}

\textbf{Tracking variables.}
The controlled outputs are the exit thicknesses $\{h_i\}_{i=1}^5$ and inter-stand tensions $\{T_i\}_{i=1}^4$.
In deviation coordinates, the controller aims to drive $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$ (regulation),
or track given time-varying references $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$.

\textbf{Reference signal definition (clear meaning of $\Delta x_{\mathrm{ref}}$).}
For each prediction step $s$, define the local deviation reference vector of stand $i$ as
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s}) \triangleq
\begin{bmatrix}
h_i^{\mathrm{ref}}(t_{n+s})-h_i^{\mathrm{ref}}(t_{n+s})\\
T_{i-1}^{\mathrm{ref}}(t_{n+s})-T_{i-1}^{\mathrm{ref}}(t_{n+s})\\
T_{i}^{\mathrm{ref}}(t_{n+s})-T_{i}^{\mathrm{ref}}(t_{n+s})
\end{bmatrix}
=
\mathbf{0},
\end{equation}
when performing regulation around the references (i.e., we penalize deviation errors).
Equivalently, if one prefers to use absolute states, one may set the tracking error as
$\hat e_{x,i}(t_{n+s})=\hat x_i(t_{n+s})-x_i^{\mathrm{ref}}(t_{n+s})$.
In this paper, we keep the deviation-state formulation and directly penalize $\Delta\hat x_i(t_{n+s})$.
Thus, in the cost below, $\Delta x_{\mathrm{ref}}(t_{n+s})$ should be interpreted as the desired deviation (typically zero).
For boundary terms ($T_0,T_5$), their references are set to zero and their weights can be set to zero.

\textbf{Why conflict occurs (explicit coupling).}
Inter-stand tensions are shared coupling variables: the tension $T_i$ depends on the speeds of both stand $i$ and $i+1$ and the strip transport,
therefore it is affected by decisions from \emph{two} neighboring controllers.
At the same time, each controller also tries to achieve its \emph{local} thickness target via its roll gap.
Consequently, a speed/gap action that improves local thickness may deteriorate a shared tension, and vice versa.
This creates an intrinsic multi-agent conflict, motivating Nash-equilibrium coordination.

\subsection{Subsystem prediction and local optimization (decision variables, model constraint, cost, constraints, and solver)}
\label{subsec:local_mpc}

Each rolling stand is treated as a subsystem.
The coupled subsystem dynamics can be conceptually written as
\begin{equation}
\Delta x_i(t_{n+1})
=
f_i\!\big(\Delta x_i(t_n),u_i(t_n)\big)
+
\sum_{k \in Z_i} g_{ik}\!\big(\Delta x_k(t_n),u_k(t_n)\big),
\end{equation}
where $Z_i$ is given by \eqref{eq:neighbor_set} and $g_{ik}$ characterizes the coupling effect (mainly through tensions).
Instead of using an explicit first-principles model of $f_i,g_{ik}$, we use the trained neural-network surrogate to predict the deviation-state evolution.

\textbf{Local polynomial parameterization of control increments (two inputs).}
Over each local interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, the control increment trajectory of subsystem $i$
is parameterized by the vector quadratic polynomial \eqref{eq:du_poly_vector}:
\begin{equation}
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
=
\Gamma_{i,n+s,0}
+\Gamma_{i,n+s,1}\tau
+\Gamma_{i,n+s,2}\tau^2,
\qquad \tau \in [0,\delta_{n+s}],
\label{eq:du_poly_for_mpc}
\end{equation}
where $\Gamma_{i,n+s}\in\mathbb{R}^{p}$ with $p=6$ in \eqref{eq:Gamma_dim}.

\textbf{Neural-network-based prediction (model constraint for MPC).}
Using the trained residual neural network surrogate, subsystem $i$ predicts its one-step deviation state by
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,
\Delta \hat{x}_{Z_i}(t_{n+s}),\,
\Gamma_{i,n+s},\,
\delta_{n+s};\,
\Theta_i^*
\Big), \\
&\qquad s=0,\ldots,N_p-1,
\end{aligned}
\label{eq:nn_predict_in_mpc}
\end{equation}
where $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communicated neighbor predictions (Nash iteration described later).
Equation \eqref{eq:nn_predict_in_mpc} is the key link between \textbf{prediction} and \textbf{control}:
for any candidate decision variables $\{\Gamma_{i,n+s}\}$, the MPC optimizer can roll out \eqref{eq:nn_predict_in_mpc} to evaluate the predicted thickness/tension deviations.

\textbf{Decision variables.}
Optimize the local parameter sequence over the control horizon $N_c$:
\begin{equation}
\begin{aligned}
\mathbf{\Gamma}_i(t_n)
&=
\big[
\Gamma_{i,n}^\top,\,
\Gamma_{i,n+1}^\top,\,
\ldots,\,
\Gamma_{i,n+N_c-1}^\top
\big]^\top
\in \mathbb{R}^{pN_c}.
\end{aligned}
\end{equation}

\textbf{Local objective (explicit thickness--tension meaning).}
Let $\Delta \hat{x}_i(t_{n+s})=[\Delta \hat h_i(t_{n+s}),\,\Delta \widehat T_{i-1}(t_{n+s}),\,\Delta \widehat T_{i}(t_{n+s})]^\top$.
The local cost function of subsystem $i$ is defined as
\begin{equation}
\begin{aligned}
J_i
&=
\sum_{s=1}^{N_p}
\big\|
\Delta \hat{x}_i(t_{n+s}) - \Delta x_{i,\mathrm{ref}}(t_{n+s})
\big\|_{Q_i}^2
+
\sum_{s=0}^{N_c-1}
\big\|
\Gamma_{i,n+s}
\big\|_{R_i}^2.
\end{aligned}
\label{eq:local_cost}
\end{equation}
In the deviation-state regulation setting, $\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0$, so the first term penalizes predicted thickness and tension deviations.
The weighting matrix $Q_i$ is chosen to reflect the relative importance between thickness and tension regulation
(e.g., a larger weight on the first component for strict thickness control, while still penalizing adjacent tension deviations),
and $R_i$ penalizes the control-increment trajectory parameters to ensure smooth actuation.

\textbf{Constraints (how $\Gamma$ enforces bounds over the whole interval).}
Typical constraints include bounds on absolute inputs and increment trajectories:
\begin{align}
u_{i,\min} &\le u_i(t_{n+s}) \le u_{i,\max},
\label{eq:u_abs_bound}\\
\Delta u_{i,\min}
&\le
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
\le
\Delta u_{i,\max}, \notag \\
&\hspace{3.6cm}
\forall\tau\in[0,\delta_{n+s}].
\label{eq:du_traj_bound}
\end{align}
Here, $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ specify bounds on roll gap and speed (component-wise),
and $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ specify allowable increment bounds.
In practice, for the quadratic trajectory \eqref{eq:du_poly_for_mpc}, the interval-wise bound \eqref{eq:du_traj_bound} can be checked
by evaluating $\tau=0$, $\tau=\delta_{n+s}$, and the quadratic extremum $\tau^\star=-\Gamma_{i,n+s,1}\oslash(2\Gamma_{i,n+s,2})$
(component-wise) whenever $\tau^\star\in[0,\delta_{n+s}]$.

\textbf{Absolute-input update using interval average (consistency with execution).}
To enforce the absolute-input constraints consistently within the prediction horizon, we update the absolute input using the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_{n+s})
&=
\frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau \\
&=
\Gamma_{i,n+s,0}
+\Gamma_{i,n+s,1}\frac{\delta_{n+s}}{2}
+\Gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3},
\end{aligned}
\label{eq:du_avg_in_mpc}
\end{equation}
and propagate
\begin{equation}
\begin{aligned}
u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \\
u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1.
\end{aligned}
\label{eq:u_update_in_mpc}
\end{equation}
This ensures smooth input evolution and avoids abrupt actuator changes.

\textbf{Local optimization problem (what is solved at each Nash iteration).}
At Nash-iteration index $l$, subsystem $i$ solves the following nonlinear optimization problem:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\; J_i
\quad \text{s.t.}\quad
\eqref{eq:nn_predict_in_mpc},\ \eqref{eq:u_abs_bound}\text{--}\eqref{eq:u_update_in_mpc}.
\label{eq:local_nlp}
\end{equation}
Because the neural surrogate $\mathcal N_i(\cdot)$ is differentiable, \eqref{eq:local_nlp} is a differentiable nonlinear program (NLP).
It can be solved by standard gradient-based NLP solvers (e.g., SQP or interior-point methods) using automatic differentiation to compute gradients.

\subsection{Nash equilibrium coordination iteration (how conflict is resolved and how the final solution is obtained)}
\label{subsec:nash_coordination}

The Nash equilibrium is computed via distributed best-response iterations. Each controller repeatedly computes its best response to the latest neighbor strategies
and exchanges predicted trajectories and decision variables through a communication module.
This resolves coupling conflicts by seeking a strategy profile in which no single subsystem can unilaterally reduce its own objective \eqref{eq:local_cost}
given the other subsystems' strategies.

The distributed best-response iteration is summarized in Table~\ref{tab:nash_iter_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.12}
\setlength{\tabcolsep}{3.5pt}
\caption{Distributed Nash best-response iteration for RNE-DMPC (five-stand).}
\label{tab:nash_iter_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X}
\toprule
\textbf{Step} & \textbf{Description} \\
\midrule
A &
Initialize $l=1$ and initialize $\mathbf{\Gamma}_i^{(0)}$ for all stands (e.g., warm-start from previous time step). \\

B &
Using the surrogate predictor \eqref{eq:nn_predict_in_mpc}, compute $\Delta \hat{x}_i^{(l)}(t_{n+s})$ for $s=1,\ldots,N_p$ \\
&
given $\mathbf{\Gamma}_i^{(l-1)}$ and the latest neighbor predictions
$\Delta \hat{x}_{Z_i}^{(l-1)}(t_{n+s})$. \\

C & Solve the local NLP \eqref{eq:local_nlp} to update $\mathbf{\Gamma}_i^{(l)}$ (best response). \\

D &
Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories
$\Delta \hat{x}_i^{(l)}(t_{n+s})$ to the communication system. \\

E &
Update neighbor predictions $\Delta \hat{x}_{Z_i}^{(l)}(t_{n+s})$ using received information; re-generate predictions if needed. \\

F & Compute the maximum relative change $\varsigma^{(l)}$ as the convergence metric. \\

G &
If $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; \\
& otherwise set $l\leftarrow l+1$ and repeat Steps B--F. \\
\bottomrule
\end{tabularx}
\end{table}

The convergence metric in Step~F is defined as
\begin{equation}
\begin{aligned}
\varsigma^{(l)}
&=
\max_i
\frac{\left\|
\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}
\right\|_2}{
\left\|
\mathbf{\Gamma}_i^{(l-1)}
\right\|_2+\epsilon},
\end{aligned}
\end{equation}
where $\epsilon>0$ avoids division by zero.

\textbf{Receding-horizon implementation (how control is applied).}
After the Nash iteration converges, only the first-interval parameters $\Gamma_{i,n}^*$ are applied.
The control increment trajectory over $[t_n,t_{n+1}]$ is
\begin{equation}
\Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^*),
\quad \tau\in[0,\delta_n].
\end{equation}
To obtain the equivalent discrete-time control increment used for updating the absolute input, we use the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_n)
&=
\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau \\
&=
\Gamma_{i,n,0}^*
+
\Gamma_{i,n,1}^*\frac{\delta_n}{2}
+
\Gamma_{i,n,2}^*\frac{\delta_n^2}{3}.
\end{aligned}
\end{equation}
Then the absolute control input is updated by
\begin{equation}
u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),
\end{equation}
which ensures smooth input evolution and avoids abrupt actuator changes.

\textbf{How prediction and control are connected in closed loop.}
At each sampling instant $t_n$:
(i) measure/estimate the current deviation states $\Delta x_i(t_n)$ (thickness and tensions) and obtain neighbor information;
(ii) run the Nash best-response iteration using the neural predictor \eqref{eq:nn_predict_in_mpc} as the model constraint to solve \eqref{eq:local_nlp};
(iii) apply the first control-increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ and update $u_i(t_n)$ by interval averaging;
(iv) roll the horizon forward to $t_{n+1}$ and repeat.
In this way, the learned predictor directly provides the multi-step forecasts needed by MPC, and the optimization computes actuator trajectories
(roll gap and speed increments) that minimize thickness--tension deviations while coordinating coupling conflicts via Nash equilibrium.

The overall control flow chart is shown in Fig.~\ref{liu}.

\begin{figure}[htbp]
  \centering
  \includegraphics[width=\linewidth]{picture/x5.pdf}
  \caption{The overall system control flow chart}\label{liu}
\end{figure}

Answer 4

还没到“每个符号都解释得完全清楚”的程度：主线里“厚度/张力/辊缝/速度/邻居/输入参数化/预测—控制衔接”这些关键符号已经解释到位了，但仍有一批技术符号、集合符号、超参数符号、以及少量记号冲突需要你在文中补一句或加一个“符号表”才能避免老师继续挑。

下面我按模块给你做一个逐项核对清单（你可以直接照着补进论文）。

已经解释清楚（基本不会再被追问）

系统与索引： $i\in\{1,\dots,5\}$ 、 $t_n$ 、 $\delta_n$ 、 $\tau$
物理量： $h_i$ （机架出口厚度）、 $T_i$ （机架间张力）
状态： $\Delta h_i,\Delta T_i$ （偏差/误差的定义）、 $\Delta x_i$ （把厚度和相邻张力组成向量）
邻居： $Z_i$ 、 $\Delta x_{Z_i}$
执行量： $u_i=[s_i,v_i]^\top$ 、输入增量： $\Delta u_i(t_n)=u_i(t_n)-u_i(t_{n-1})$
区间参数化： $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ 、 $\Gamma_{i,n}$ 、维度 $p=6$ 、区间平均 $\Delta u_i(t_n)$ 的闭式表达
神经网络预测主式： $\Delta \hat x_i(t_{n+s+1})=\Delta \hat x_i(t_{n+s})+\mathcal N_i(\cdot)$
Nash迭代流程（A–G步骤）、收敛度量 $\varsigma^{(l)}$

还需要补充/澄清的符号（老师很容易抓这些）

A. 数据集相关

$\mathcal I_x,\ \mathcal I_\Gamma$
你写了“从 $\mathcal I_x$ 采样/从 $\mathcal I_\Gamma$ 采样”，但没定义它们是什么。
✅ 建议补：它们是采样域/超矩形区间（厚度张力偏差范围、辊缝/速度增量多项式系数范围）。
$\Delta d_i(\tau)$ 、 $\Delta d_i(t_n)$
你说扰动区间平均，但没说 $\Delta d_i(\tau)$ 代表什么（入口厚度扰动？摩擦？硬化？张力测量噪声？）。
✅ 建议补一句： $\Delta d_i(\tau)$ 是区间内扰动信号（列举典型来源）， $\Delta d_i(t_n)$ 是区间平均等效扰动。
“五机架耦合模型”具体指什么
你写“integrate coupled mill model”，但老师可能会问：耦合是通过张力传播？模型状态就是厚度张力？
✅ 建议补一句：耦合通过张力方程（速度差/延伸导致）体现；离线仿真使用的机理模型是五机架全耦合的。

B. 网络结构相关

$I_d$ 、 $0_{(\cdot)}$ 、 $\hat I_i$
你给了选择矩阵形式，但最好补一句： $I_d$ 为 $d\times d$ 单位阵， $0$ 为零矩阵。
$\eta_i(\cdot)$ 分支与学习率 $\eta$ 记号冲突（这个很关键）
你网络里用了 $\eta_i(\cdot)$ 作为辅助分支，Adam 又用了学习率 $\eta$ 。同一个希腊字母会被老师直接指出“符号混乱”。
✅ 建议改一个：

保留分支 $\eta_i(\cdot)$ ，把学习率改成 $\alpha$ ；或
保留学习率 $\eta$ ，把分支改成 $\psi_i(\cdot)$ 。

$\mathcal I_i(\cdot)$ 与“集合 $\mathcal I_x$ ”同字母 I
你用了 $\mathcal I_i$ 表示残差分支，又用 $\mathcal I_x$ 表示采样域，符号容易混。
✅ 建议把残差分支改成 $\rho_i(\cdot)$ 或 $\mathcal R_i(\cdot)$ 。
损失项权重 $\lambda_1,\dots,\lambda_4$
你引入了加权总损失，但没说权重怎么选（交叉验证/经验设置）。
✅ 至少补一句：通过验证集调参或给定经验值范围。
$J, J_K, K$
你写了样本数量与段数量，但没在段落里明确解释每个符号含义。
✅ 建议在第一次出现处补：

$J$ ：一步样本数；
$K$ ：训练用多步段长度；
$J_K$ ：K步段样本数。

C. 控制（MPC + Nash）相关

$N_p, N_c$
你用了预测域 $N_p$ 、控制域 $N_c$ ，但没定义“它们分别代表多少个采样区间”。
✅ 必须补： $N_p$ 为预测步数， $N_c$ 为优化的控制参数步数（ $N_c\le N_p$ ）。
$Q_i, R_i$
你说是权重矩阵，但最好明确它们维度：

$Q_i\in\mathbb R^{d\times d}$ ，对应厚度/张力偏差权重；
$R_i\in\mathbb R^{p\times p}$ ，对应每一步 $\Gamma_{i,n+s}$ 的权重。

$\Delta x_{i,\mathrm{ref}}$ 的写法需要改得更“直白”
我那段写了一个“ref-ref=0”的表达，虽然逻辑上表示“偏差参考为0”，但读起来很绕，老师会觉得你在糊弄。
✅ 建议直接写两句之一（选一种并固定）：

偏差状态定义法：因为 $\Delta x_i = x_i-x_i^{\mathrm{ref}}$ ，所以优化中目标参考就是 $\Delta x_{i,\mathrm{ref}}\equiv 0$ 。
绝对状态法：直接对 $\hat x_i-x_i^{\mathrm{ref}}$ 做惩罚（此时不要再用 $\Delta$ 记偏差）。

$\oslash$ （逐元素除法符号）
我在约束检查那句用了 $\oslash$ ，很多老师不接受这种“非标准记号”。
✅ 建议改成“对每个输入通道分别计算极值点”，或用“Hadamard 除法”并明确定义。更稳的做法是直接删掉 $\oslash$ 写成分量形式。
$\epsilon$ 与 $\varepsilon$
Nash 收敛里用 $\epsilon$ ，Adam 里用 $\varepsilon$ 。最好明确：一个是避免除零的常数，一个是 Adam 数值稳定项。
✅ 在两处都补一句“ $\epsilon>0$ 为避免除零的小常数”。
“局部问题怎么解”
你写“可用 SQP/内点法”，但最好明确：每次 Nash 迭代中，每个机架解一个可微 NLP（因为NN可微）。
✅ 再补一句：用自动微分提供梯度。

建议你直接加一段“符号说明”最省事（可直接粘贴）

你可以在第二章开头或附录加一个小段，把上面漏的符号一网打尽。像这样（你可直接粘贴到 LaTeX）：

latex
\paragraph{Additional notation.}
$\mathcal{I}_x$ denotes the sampling domain (ranges) of deviation states $\Delta x_i$ used for offline data generation;
$\mathcal{I}_\Gamma$ denotes the sampling domain of polynomial parameters $\Gamma_{i,n}$.
$J$ is the number of one-step samples and $J_K$ is the number of $K$-step segment samples; $K$ is the segment length used for multi-step training.
$N_p$ and $N_c$ denote the prediction horizon and control horizon (numbers of sampling intervals), respectively, with $N_c\le N_p$.
$Q_i\in\mathbb{R}^{d\times d}$ and $R_i\in\mathbb{R}^{p\times p}$ are weighting matrices in the local objective.
$I_d$ is the $d\times d$ identity matrix and $0_{a\times b}$ denotes an $a\times b$ zero matrix.
$\epsilon>0$ (in the Nash metric) and $\varepsilon>0$ (in Adam) are small constants for numerical stability.

并且把学习率符号改掉（比如）：

Adam： $\alpha$ 作为学习率
网络辅助分支保留 $\eta_i(\cdot)$

结论

所以答案是：主物理符号已经清楚，但还缺一批“集合/超参数/求解细节/记号冲突”的定义。把我上面列的 14 项补完（尤其是 $\mathcal I_x,\mathcal I_\Gamma, N_p,N_c,Q_i,R_i,\eta$ 冲突、 $\Delta x_{\mathrm{ref}}$ 的直白定义、 $\oslash$ 替换），老师一般就不会再说“符号不清、写得模糊”。

如果你愿意，我也可以把我上次给你的那份 LaTeX 再做一次“全局一致性清理版”：

统一所有 $\Delta$ 的含义（状态偏差 vs 输入增量）
统一 $T_i$ 的边界处理与权重
消除符号冲突（ $\eta$ 、 $\mathcal I$ 等）
把 $\Delta x_{\mathrm{ref}}$ 改成老师最爱看的“直接等于0/或直接写误差”版本
你只要告诉我：你更想用“偏差状态法（ $\Delta x=x-x^{ref}$ ）”还是“绝对状态误差法（ $\hat x-x^{ref}$ ）”。

Answer 5

latex
%======================================================================
%  GLOBAL CONSISTENCY CLEAN VERSION (Five-stand, thickness+tension only,
%  actuators: roll gap + stand speed; Δx = deviation, Δu = increment)
%======================================================================

%========================
\section{Construction of Dataset}
%========================

\subsection{Five-stand tandem mill setting and unified notation}
\label{subsec:notation_clean}

Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the corresponding sampling interval length.
Define the local within-interval time variable $\tau=t-t_n\in[0,\delta_n]$.

\paragraph{Thickness and tension signals (what is controlled).}
Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$),
and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$).
The system is strongly coupled because the inter-stand tensions propagate along the mill line and are affected by neighboring stands' actions.

\paragraph{Reference trajectories and deviation-state definition (meaning of $\Delta x$).}
Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be the desired references (setpoints) given by process requirements
(e.g., schedule-based references or constant setpoints).
We define deviation (tracking-error) variables
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\label{eq:dev_def}
\end{equation}
Throughout this paper, the symbol ``$\Delta$'' attached to \emph{states} always means \emph{deviation from reference}.

\paragraph{Local state vector (where thickness and tension appear).}
For each stand $i$, we choose the local deviation state as
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3,
\label{eq:xi_def_clean}
\end{equation}
with the boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$ to keep a unified dimension $d=3$ for all stands.
(Equivalently, one may remove nonexistent boundary tensions from the state and use varying dimensions; here we keep a unified form.)

\paragraph{Neighbor sets and coupling representation (five-stand chain).}
For a five-stand tandem mill, the dominant coupling is between adjacent stands, hence we define
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad
Z_5=\{4\}.
\label{eq:Zi_clean}
\end{equation}
Define the neighbor-state stack
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}.
\label{eq:xZi_clean}
\end{equation}

\paragraph{Actuators and increment inputs (meaning of $u$ and $\Delta u$).}
Each stand $i$ is manipulated by \emph{roll gap} (screw-down/hydraulic gap) $s_i(t)$ and \emph{stand speed} $v_i(t)$:
\begin{equation}
u_i(t)=
\begin{bmatrix}
s_i(t)\\
v_i(t)
\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_clean}
\end{equation}
To ensure smooth actuation and match industrial practice, we optimize \emph{discrete input increments}:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=
\begin{bmatrix}
\Delta s_i(t_n)\\
\Delta v_i(t_n)
\end{bmatrix}.
\label{eq:du_discrete_clean}
\end{equation}
Throughout this paper, the symbol ``$\Delta$'' attached to \emph{inputs} $\Delta u_i(t_n)$ means \emph{sample-to-sample increment}.
Thus, $\Delta x$ (deviation state) and $\Delta u$ (input increment) are conceptually different, and this is fixed by definition.

\paragraph{Disturbance.}
Let $d_i(t)$ denote exogenous disturbances (e.g., entry thickness fluctuation, friction variation, material parameter drift, etc.).
We denote the interval-level equivalent disturbance by $\Delta d_i(t_n)$ (defined via interval averaging below).

\paragraph{Basic matrix notation.}
$I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes the $a\times b$ zero matrix.


\subsection{Discrete interval mapping and data-driven learning objective}
\label{subsec:mapping_clean}

The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by a discrete-time mapping
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big),
\label{eq:true_mapping_clean}
\end{equation}
where $\Phi_i(\cdot)$ is generally nonlinear and coupled due to rolling deformation and tension propagation.
A commonly used \emph{conceptual} equivalent discrete linear form is
\begin{equation}
\Delta x_i(t_{n+1})
=
M_d\,\Delta x_i(t_n)
+
N_d\,\Delta u_i(t_n)
+
F_d\,\Delta d_i(t_n),
\label{eq:linear_form_concept}
\end{equation}
where $M_d,N_d,F_d$ represent equivalent discrete-time matrices around operating conditions.
In a practical five-stand cold rolling mill, accurately deriving/identifying these matrices and disturbance models from first principles is difficult,
due to strong coupling, unmodeled nonlinearities, and time-varying operating regimes.
Therefore, this paper aims to learn a high-fidelity approximation of the interval evolution from data and then embed it into distributed MPC.

\begin{remark}
In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish \eqref{eq:linear_form_concept}
based on first principles. Therefore, in this paper, we learn an approximate mapping of \eqref{eq:true_mapping_clean} from data.
\end{remark}


%========================
\subsection{Interval-level parameterization and one-step dataset}
%========================

\paragraph{Why interval-level parameterization is reasonable in the five-stand setting.}
Although decisions are updated at discrete instants $t_n$, the hydraulic gap and drive systems evolve continuously inside each interval,
and abrupt within-interval changes may excite tension oscillations and deteriorate thickness stability.
Thus, parameterizing the within-interval increment trajectory by a low-order polynomial:
(i) yields a compact finite-dimensional decision representation;
(ii) enforces smooth profiles inside the interval;
(iii) enables enforcing increment constraints for all $\tau\in[0,\delta_n]$.
This is appropriate when $\delta_n$ is not excessively large relative to actuator bandwidth and the within-interval evolution is well approximated by a low-order basis.

\paragraph{Vector quadratic polynomial parameterization (two inputs).}
On the interval $[t_n,t_{n+1}]$, parameterize the control increment trajectory as
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,
\qquad \tau\in[0,\delta_n],
\label{eq:du_poly_vec_clean}
\end{equation}
where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ are coefficient vectors ($n_u=2$).
Component-wise, \eqref{eq:du_poly_vec_clean} corresponds to
\begin{equation}
\begin{aligned}
\Delta s_{i,n}(\tau) &= \gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\\
\Delta v_{i,n}(\tau) &= \gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2.
\end{aligned}
\label{eq:du_components_clean}
\end{equation}
Define the stacked parameter vector
\begin{equation}
\Gamma_{i,n}\triangleq
\big[
(\Gamma_{i,n0})^\top,\,
(\Gamma_{i,n1})^\top,\,
(\Gamma_{i,n2})^\top
\big]^\top
\in\mathbb{R}^{p},
\qquad
p=3n_u=6.
\label{eq:Gamma_clean}
\end{equation}
Here, $\Gamma_{i,n0}$ is the baseline increment at $\tau=0$, while $\Gamma_{i,n1}$ and $\Gamma_{i,n2}$ describe the linear and quadratic variation rates.

\paragraph{Equivalent discrete-time (interval-averaged) increments.}
Define the interval-averaged equivalent increments as
\begin{equation}
\begin{aligned}
\Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\
\Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau.
\end{aligned}
\label{eq:avg_def_clean}
\end{equation}
With \eqref{eq:du_poly_vec_clean}, the input average has a closed form:
\begin{equation}
\Delta u_i(t_n)=
\Gamma_{i,n0}
+\Gamma_{i,n1}\frac{\delta_n}{2}
+\Gamma_{i,n2}\frac{\delta_n^2}{3}.
\label{eq:avg_closed_clean}
\end{equation}

\paragraph{Sampling domains for offline data generation.}
Let $\mathcal{I}_x$ denote the sampling domain (ranges) of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$,
and let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$ (covering both gap and speed channels).
These domains specify the operating envelope used to generate supervised training data.

\paragraph{One-step sample generation (five-stand coupled simulation).}
Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$.
In addition to the local deviation state, the neighbor deviation states are included to represent inter-stand coupling.
The process is summarized in Table~\ref{tab:interval_sample_generation_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.15}
\caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).}
\label{tab:interval_sample_generation_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X}
\toprule
\textbf{Step} & \textbf{Operation} \\
\midrule
1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\
2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$). \\
3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via \eqref{eq:du_poly_vec_clean}. \\
4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\
\bottomrule
\end{tabularx}
\end{table}

Accordingly, an interval sample for subsystem $i$ can be represented as
\begin{equation}
\mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\}.
\label{eq:interval_sample_clean}
\end{equation}
Note that $\Delta u_{i,n}(\tau)$ is fully determined by $(\Gamma_{i,n},\delta_n)$ via \eqref{eq:du_poly_vec_clean},
therefore it is sufficient to store $(\Gamma_{i,n},\delta_n)$ as the learning input.

For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws,
the local one-step training dataset is formed as
\begin{equation}
\begin{split}
S_i=\Big\{&
\big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\,
\Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big)
\ \Big|\ j=1,\ldots,J
\Big\}.
\end{split}
\label{eq:S_i_clean}
\end{equation}
Here $J$ is the number of one-step samples for subsystem $i$.
The overall dataset for the five-stand mill is denoted by $\{S_i\}_{i=1}^{5}$.
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}.

\begin{figure*}[htbp]
  \centering
  \includegraphics[scale=0.5]{picture/Fig2.pdf}
  \caption{Point cloud map of the training dataset.}\label{2}
\end{figure*}


%========================
\subsection{Multi-step rollout segment dataset}
%========================

The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss
and reciprocal-consistency regularization, because these objectives require ground-truth deviation-state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples
into $K$-step trajectory segments.

Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling
$\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances),
and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$.
Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks
$\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.

Define a $K$-step segment sample for subsystem $i$ as
\begin{equation}
\begin{aligned}
\mathcal{W}_{i,n}=
\Big\{&
\big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\
&\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1}
\Big\}.
\end{aligned}
\label{eq:segment_clean}
\end{equation}
By repeating the above segment generation, we form the multi-step training set
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\},
\label{eq:S_i_K_clean}
\end{equation}
where $J_K$ is the number of $K$-step segment samples.
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (keeping only $s=0$),
thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training.

%========================
\section{Construction of Residual Neural Network}
%========================

\subsection{Network architecture (what is learned, why residual, why include control)}
\label{subsec:net_clean}

\paragraph{Learning target (one-step controlled deviation-state evolution).}
Given the dataset, the neural network model is trained to learn a stand-wise, control-dependent one-step evolution law of deviation states:
\begin{equation}
\Delta x_i(t_{n+1})
\approx
\Delta x_i(t_n)+
\mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\,\Theta_i\Big),
\label{eq:learned_dyn_clean}
\end{equation}
where $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change and $\Theta_i$ are trainable parameters.

\paragraph{Why the model must include $u$ (difference between ``with $u$'' and ``without $u$'').}
If $\mathcal{N}_i$ does not take control information as input (here $\Gamma_{i,n}$ and $\delta_n$),
the predictor becomes an autoregressive model that only reproduces trajectories under the training input patterns
and cannot answer the counterfactual question: ``what will happen if we choose a different roll gap/speed trajectory?''
Since MPC optimizes over candidate decisions, a control-dependent predictor \eqref{eq:learned_dyn_clean} is necessary
to evaluate the predicted thickness/tension behavior under different candidate actuator trajectories.

\paragraph{Input/output dimensions.}
Let $d=3$ (state dimension), $|Z_i|$ be the number of neighbors of stand $i$ in \eqref{eq:Zi_clean}, and $p=6$ in \eqref{eq:Gamma_clean}.
Define the input vector
\begin{equation}
X_{i,\text{in}} \triangleq
\big[
\Delta x_i(t_n)^\top,\,
\Delta x_{Z_i}(t_n)^\top,\,
\Gamma_{i,n}^\top,\,
\delta_n
\big]^\top
\in \mathbb{R}^{d(1+|Z_i|)+p+1}.
\label{eq:X_in_clean}
\end{equation}
The network mapping is
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}.
\end{equation}

\paragraph{Residual (shortcut) structure.}
To improve training stability and long-horizon rollout robustness, we use a residual form.
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a selection matrix extracting the local state block:
\begin{equation}
\hat{I}_i = [I_d,\, 0_{d\times(d|Z_i|+p+1)}].
\label{eq:Ihat_clean}
\end{equation}
Then the one-step predictor is written as
\begin{equation}
X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),
\label{eq:res_predict_clean}
\end{equation}
where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$.
This structure implements a baseline-plus-correction interpretation:
the shortcut propagates the current deviation state $\Delta x_i(t_n)$, while the network learns the correction capturing
unmodeled nonlinearities and inter-stand coupling (via $\Delta x_{Z_i}$) under varying operating conditions.

\paragraph{Auxiliary branch for variable interval length (avoid symbol conflict).}
To improve robustness when $\delta_n$ varies, we introduce an auxiliary branch inside $\mathcal{N}_i$:
\begin{equation}
\mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq
\psi_i(X_{i,\text{in}};\Theta_{\psi_i}) + \rho_i(X_{i,\text{in}};\theta_i),
\label{eq:aux_clean}
\end{equation}
where $\psi_i(\cdot)$ is a lightweight feedforward branch that captures low-frequency/scale effects strongly related to $\delta_n$,
and $\rho_i(\cdot)$ captures the remaining nonlinear coupling corrections.
When $\psi_i(\cdot)\equiv 0$, the model reduces to a standard residual network.
(We use $\psi_i$ and $\rho_i$ to avoid notation conflicts with the sampling sets $\mathcal{I}_x,\mathcal{I}_\Gamma$ and the optimizer learning rate.)

\paragraph{One-step supervised target.}
For the $j$-th sample in \eqref{eq:S_i_clean}, define
\begin{equation}
X_{i,\text{in}}^{(j)} =
\big[
\Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\ 
\Gamma_{i,n}^{(j)},\ \delta_n^{(j)}
\big]^{\top},
\end{equation}
and the supervised residual target
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\label{eq:target_clean}
\end{equation}

\subsection{Training, learned model, and system prediction (multi-step stability and control usage)}
\label{subsec:train_clean}

To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability,
we train the forward predictor jointly with an auxiliary backward residual model
and impose a multi-step reciprocal-consistency regularization over a $K$-step segment from $S_i^{(K)}$.

\paragraph{Backward residual model.}
Construct a backward residual network
\begin{equation}
\mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$. For the backward step associated with interval $[t_n,t_{n+1}]$, define
\begin{equation}
\begin{aligned}
X_{i,\mathrm{in}}^{b}
&=
\big[
\Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),\
\Gamma_{i,n},\ \delta_n
\big]^{\top},\\
X_{i,\mathrm{out}}^{b}
&=
\hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i),
\end{aligned}
\label{eq:back_clean}
\end{equation}
where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$.
The supervised backward residual target is
\begin{equation}
\Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}).
\end{equation}

\paragraph{Forward rollout on a $K$-step segment.}
Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize
\begin{equation}
\Delta \hat{x}_i(t_n)=\Delta x_i(t_n),
\end{equation}
and recursively apply the forward predictor for $K$ steps:
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\,
\Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i
\Big),\\
&\qquad s=0,\ldots,K-1.
\end{aligned}
\label{eq:fwd_roll_clean}
\end{equation}

\paragraph{Backward rollout and reciprocal consistency.}
Set the terminal condition
\begin{equation}
\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}),
\end{equation}
and roll back using $\mathcal{B}_i$:
\begin{equation}
\begin{aligned}
\Delta \bar{x}_i(t_{n+s})
&=
\hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\end{aligned}
\label{eq:bwd_roll_clean}
\end{equation}
where
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})=
\big[
\Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),\
\Gamma_{i,n+s},\ \delta_{n+s}
\big]^{\top}.
\end{equation}

Define the multi-step reciprocal prediction error
\begin{equation}
E_i(t_n)
=
\sum_{s=0}^{K}
\left\|
\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})
\right\|^2.
\end{equation}

\paragraph{Training objectives (meaning of each loss).}
We jointly minimize:
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}_i\!\left(
X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}_i\!\left(
X_{i,\mathrm{in}}^{b\,(j)}(t_{n+s});\bar{\Theta}_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm]
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\Big\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\Big\|^2.
\end{aligned}
\label{eq:loss_clean}
\end{equation}
Here, $L_{\mathrm{1step}}$ enforces one-step accuracy; $L_{\mathrm{roll}}$ explicitly suppresses long-horizon drift under recursion;
$L_{\mathrm{msrp}}$ regularizes the learned dynamics by enforcing reciprocal consistency between forward and backward rollouts;
and $L_{\mathrm{bwd}}$ trains the backward model for the consistency regularization.
In implementation, these terms are combined as
\begin{equation}
L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}},
\end{equation}
where $\lambda_1,\lambda_2,\lambda_3,\lambda_4>0$ are tuned on a validation set.

\paragraph{Learned forward predictor used in control.}
After training, the forward predictor is
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)
+
\mathcal{N}_i\!\Big(
\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,
\Gamma_{i,n},\,\delta_n;\,\Theta_i^*
\Big),
\label{eq:pred_clean}
\end{equation}
and multi-step prediction is obtained by recursive rollout of \eqref{eq:pred_clean}.
This learned predictor is the internal model used by the MPC optimizer in the next section.

Finally, network parameters are optimized using Adam:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon},
\end{equation}
where $\alpha$ is the learning rate (we use $\alpha$ to avoid conflict with other symbols),
$\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected moment estimates, and $\varepsilon>0$ is a small constant for numerical stability.
Figure~\ref{fig:rnn_logic} illustrates the overall structure.

\begin{figure}[htbp]
  \centering
  \includegraphics[scale=0.85]{picture/x6.pdf}
  \caption{Logic diagram of the residual neural network.}
  \label{fig:rnn_logic}
\end{figure}


%========================
\section{Nash Equilibrium-Based RNE-DMPC}
%========================

The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation.
As a result, changes in control actions (roll gap and stand speed) at one stand can affect both upstream and downstream stands,
making centralized online optimization over all stands' decision variables computationally demanding.

To mitigate this issue, we decompose the global predictive-control problem into $N=5$ local subproblems associated with individual stands.
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers.
Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate distributed coordination as a Nash-equilibrium-seeking iteration.
Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed MPC method (RNE-DMPC)
for coordinated thickness--tension regulation/tracking. The overall control structure is shown in Figure~\ref{4}.

\begin{figure*}[htbp]
  \centering
  \includegraphics[width=\linewidth]{picture/x2.pdf}
  \caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4}
\end{figure*}


\subsection{Prediction model used in MPC and the prediction--control interface}
\label{subsec:interface_clean}

\paragraph{Key idea: prediction serves control through model constraints.}
At each sampling time $t_n$, MPC evaluates candidate actuator trajectories (encoded by $\Gamma_{i,n+s}$) by rolling out predictions of
thickness/tension deviations using the learned surrogate \eqref{eq:pred_clean}.
Therefore, the learned predictor directly provides the multi-step forecasts required to compute the MPC objective and enforce constraints.

\paragraph{Local polynomial parameterization over the horizon.}
Over each interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, the control increment trajectory of stand $i$ is
\begin{equation}
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
=
\Gamma_{i,n+s,0}
+\Gamma_{i,n+s,1}\tau
+\Gamma_{i,n+s,2}\tau^2,\qquad \tau \in [0,\delta_{n+s}],
\label{eq:du_poly_mpc_clean}
\end{equation}
where $\Gamma_{i,n+s}\in\mathbb{R}^{p}$ with $p=6$.

\paragraph{Neural-network-based multi-step prediction inside MPC.}
Define the prediction horizon $N_p$ (number of future intervals predicted) and the control horizon $N_c$ (number of future intervals optimized),
with $N_c\le N_p$.
Given the current measured/estimated deviation states $\Delta x_i(t_n)$ and a candidate decision sequence
$\mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top$,
stand $i$ predicts its deviation-state evolution by
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,
\Delta \hat{x}_{Z_i}(t_{n+s}),\,
\Gamma_{i,n+s},\,
\delta_{n+s};\,
\Theta_i^*
\Big), \\
&\qquad s=0,\ldots,N_p-1,
\end{aligned}
\label{eq:rollout_mpc_clean}
\end{equation}
with initialization $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$.
Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communication with neighbors during the Nash iteration.
Equation \eqref{eq:rollout_mpc_clean} is the explicit mathematical interface: \textbf{control decisions} $(\Gamma_{i,n+s})$
$\rightarrow$ \textbf{predicted thickness/tension deviations} $(\Delta \hat{x}_i)$ $\rightarrow$ \textbf{objective/constraints evaluation}.


\subsection{Local optimization problem (objective, reference meaning, constraints, and numerical solution)}
\label{subsec:local_opt_clean}

\paragraph{Decision variables.}
At time $t_n$, the local decision vector for stand $i$ is
\begin{equation}
\mathbf{\Gamma}_i(t_n)
=
\big[
\Gamma_{i,n}^\top,\,
\Gamma_{i,n+1}^\top,\,
\ldots,\,
\Gamma_{i,n+N_c-1}^\top
\big]^\top
\in \mathbb{R}^{pN_c}.
\label{eq:Gamma_seq_clean}
\end{equation}

\paragraph{Reference meaning (remove ambiguity of $\Delta x_{\mathrm{ref}}$).}
Because the deviation state is defined as $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ in \eqref{eq:dev_def} and \eqref{eq:xi_def_clean},
the desired regulation/tracking objective in deviation coordinates is always
\begin{equation}
\Delta x_i(t)\rightarrow 0.
\end{equation}
Therefore, the reference in deviation form is simply the zero vector, i.e.,
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}.
\label{eq:dxref_zero_clean}
\end{equation}
Equivalently, one can view the cost as penalizing $(\hat x_i-x_i^{\mathrm{ref}})$ in absolute coordinates; in this paper we keep the deviation formulation.

\paragraph{Local objective (explicitly thickness+tension).}
Let $\Delta \hat{x}_i(t_{n+s})=[\Delta \hat h_i(t_{n+s}),\,\Delta \widehat T_{i-1}(t_{n+s}),\,\Delta \widehat T_i(t_{n+s})]^\top$.
The local cost is
\begin{equation}
\begin{aligned}
J_i
&=
\sum_{s=1}^{N_p}
\big\|
\Delta \hat{x}_i(t_{n+s}) - \Delta x_{i,\mathrm{ref}}(t_{n+s})
\big\|_{Q_i}^2
+
\sum_{s=0}^{N_c-1}
\big\|
\Gamma_{i,n+s}
\big\|_{R_i}^2,
\end{aligned}
\label{eq:Ji_clean}
\end{equation}
where $Q_i\in\mathbb{R}^{d\times d}$ weights thickness and tension deviations,
and $R_i\in\mathbb{R}^{p\times p}$ penalizes the polynomial-parameter magnitude to encourage smooth increments.
Using \eqref{eq:dxref_zero_clean}, the tracking term reduces to penalizing predicted deviation states directly.

\paragraph{Constraints.}
We enforce both absolute-input bounds and increment-trajectory bounds.

\emph{Absolute input bounds:}
\begin{equation}
u_{i,\min} \le u_i(t_{n+s}) \le u_{i,\max},
\qquad s=0,\ldots,N_p-1,
\label{eq:u_abs_clean}
\end{equation}
where $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ provide component-wise bounds for $(s_i,v_i)$.

\emph{Increment trajectory bounds for all $\tau$:}
\begin{equation}
\Delta u_{i,\min}
\le
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
\le
\Delta u_{i,\max},
\qquad \forall\tau\in[0,\delta_{n+s}],
\label{eq:du_traj_clean}
\end{equation}
where $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ are component-wise bounds for $(\Delta s,\Delta v)$.

\paragraph{Practical enforcement of \eqref{eq:du_traj_clean} (no nonstandard symbols).}
For a scalar quadratic $q(\tau)=a+b\tau+c\tau^2$ on $\tau\in[0,\delta]$, its extrema over the interval occur at
$\tau=0$, $\tau=\delta$, and possibly at the stationary point $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$.
Therefore, to enforce \eqref{eq:du_traj_clean} for the two-channel vector $\Delta u_{i,n+s}(\tau)=[\Delta s_{i,n+s}(\tau),\,\Delta v_{i,n+s}(\tau)]^\top$,
we check the above candidate points \emph{separately for each channel} using the corresponding coefficients in \eqref{eq:du_components_clean}.

\paragraph{Consistency between within-interval trajectory and discrete execution (interval average).}
To update the discrete absolute input and enforce \eqref{eq:u_abs_clean} consistently, we use the interval-averaged increment:
\begin{equation}
\Delta u_i(t_{n+s})
=
\frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau
=
\Gamma_{i,n+s,0}
+\Gamma_{i,n+s,1}\frac{\delta_{n+s}}{2}
+\Gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3}.
\label{eq:du_avg_clean}
\end{equation}
Then propagate the absolute input sequence:
\begin{equation}
\begin{aligned}
u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \\
u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1.
\end{aligned}
\label{eq:u_prop_clean}
\end{equation}

\paragraph{Local optimization problem (solved at each Nash iteration).}
At Nash-iteration index $l$, subsystem $i$ solves
\begin{equation}
\mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\; J_i
\quad \text{s.t.}\quad
\eqref{eq:rollout_mpc_clean},\ \eqref{eq:u_abs_clean},\ \eqref{eq:du_traj_clean},\ \eqref{eq:u_prop_clean}.
\label{eq:local_prob_clean}
\end{equation}
Because the learned surrogate $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_prob_clean} is a differentiable nonlinear program (NLP),
which can be solved by standard gradient-based NLP solvers (e.g., SQP or interior-point methods) using automatic differentiation to compute gradients.


\subsection{Nash equilibrium coordination (where conflict comes from, how it is resolved, and how the solution is obtained)}
\label{subsec:nash_clean}

\paragraph{Why a Nash iteration is needed (explicit conflict explanation).}
Inter-stand tensions are shared coupling variables: the tension $T_i$ is influenced by both stand $i$ and stand $i+1$ (notably through their speed actions),
and changes in roll gap can also indirectly affect tensions via strip deformation and transport.
Therefore, purely independent local optimization can lead to conflicting actions: improving local thickness may worsen neighbor tensions, and vice versa.
To resolve this coupling conflict with limited communication, we adopt a Nash-equilibrium-seeking distributed best-response iteration.

\paragraph{Distributed best-response iteration.}
At iteration $l$, each stand computes its best response to the latest neighbor strategies and predictions.
The procedure is summarized in Table~\ref{tab:nash_iter_en}.

\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.12}
\setlength{\tabcolsep}{3.5pt}
\caption{Distributed Nash best-response iteration for RNE-DMPC (five-stand).}
\label{tab:nash_iter_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X}
\toprule
\textbf{Step} & \textbf{Description} \\
\midrule
A &
Initialize $l=1$ and initialize $\mathbf{\Gamma}_i^{(0)}$ for all subsystems (e.g., warm-start from previous time step). \\

B &
Using \eqref{eq:rollout_mpc_clean}, compute $\Delta \hat{x}_i^{(l)}(t_{n+s})$ for $s=1,\ldots,N_p$ \\
&
given $\mathbf{\Gamma}_i^{(l-1)}$ and the latest neighbor predictions
$\Delta \hat{x}_{Z_i}^{(l-1)}(t_{n+s})$. \\

C & Solve the local NLP \eqref{eq:local_prob_clean} to update $\mathbf{\Gamma}_i^{(l)}$ (best response). \\

D &
Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories
$\Delta \hat{x}_i^{(l)}(t_{n+s})$ to the communication system. \\

E &
Update neighbor predictions $\Delta \hat{x}_{Z_i}^{(l)}(t_{n+s})$ using received information; re-generate predictions if needed. \\

F & Compute the maximum relative change $\varsigma^{(l)}$ as the convergence metric. \\

G &
If $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; \\
& otherwise set $l\leftarrow l+1$ and repeat Steps B--F. \\
\bottomrule
\end{tabularx}
\end{table}

\paragraph{Convergence metric.}
Define
\begin{equation}
\varsigma^{(l)}
=
\max_i
\frac{\left\|
\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}
\right\|_2}{
\left\|
\mathbf{\Gamma}_i^{(l-1)}
\right\|_2+\epsilon},
\label{eq:nash_metric_clean}
\end{equation}
where $\epsilon>0$ is a small constant to avoid division by zero.

\paragraph{Receding-horizon implementation (how the final control is applied).}
Only the first-interval parameters $\Gamma_{i,n}^*$ are applied.
The increment trajectory over $[t_n,t_{n+1}]$ is
\begin{equation}
\Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^*),
\quad \tau\in[0,\delta_n].
\end{equation}
The discrete input increment applied for updating the absolute input is the interval average:
\begin{equation}
\Delta u_i(t_n)
=
\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau
=
\Gamma_{i,n,0}^*
+
\Gamma_{i,n,1}^*\frac{\delta_n}{2}
+
\Gamma_{i,n,2}^*\frac{\delta_n^2}{3}.
\label{eq:apply_avg_clean}
\end{equation}
Then the absolute input is updated by
\begin{equation}
u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),
\label{eq:apply_u_clean}
\end{equation}
which ensures smooth evolution of both roll gap and stand speed and avoids abrupt actuator changes.

\paragraph{Closed-loop prediction--control connection (complete loop).}
At each sampling instant $t_n$:
(i) measure/estimate the current deviation states $\Delta x_i(t_n)$ (thickness and tensions) and initialize neighbor information;
(ii) perform Nash best-response iterations; in each iteration solve \eqref{eq:local_prob_clean} using the learned predictor \eqref{eq:rollout_mpc_clean};
(iii) after convergence, apply $\Gamma_{i,n}^*$ by generating the within-interval increment trajectory and updating $u_i(t_n)$ via \eqref{eq:apply_avg_clean}--\eqref{eq:apply_u_clean};
(iv) move to $t_{n+1}$ and repeat.
In this way, the learned neural predictor supplies the multi-step forecasts needed by MPC, and the distributed optimization computes coordinated
gap/speed increment trajectories that regulate thickness and tensions while resolving coupling conflicts via Nash equilibrium.

The overall control flow chart is shown in Fig.~\ref{liu}.

\begin{figure}[htbp]
  \centering
  \includegraphics[width=\linewidth]{picture/x5.pdf}
  \caption{The overall system control flow chart}\label{liu}
\end{figure}

%========================
% Optional: a compact symbol paragraph you can keep or move to Appendix
%========================
\paragraph{Summary of key symbols (for reviewer clarity).}
$i\in\{1,\dots,5\}$: stand index; $t_n$: sampling instant; $\delta_n$: sampling interval; $\tau$: within-interval time.
$h_i,T_i$: thickness and inter-stand tension; $h_i^{\mathrm{ref}},T_i^{\mathrm{ref}}$: references.
$\Delta h_i,\Delta T_i$: deviations; $\Delta x_i$: local deviation state in \eqref{eq:xi_def_clean}.
$u_i=[s_i,v_i]^\top$: actuators (gap and speed); $\Delta u_i(t_n)=u_i(t_n)-u_i(t_{n-1})$: discrete input increment.
$\Delta u_{i,n}(\tau;\Gamma_{i,n})$: within-interval increment trajectory; $\Gamma_{i,n}\in\mathbb{R}^{p}$: polynomial parameters ($p=6$).
$Z_i$: neighbor set; $\Delta x_{Z_i}$: neighbor stack.
$\mathcal{N}_i(\cdot)$: learned forward residual model; $\mathcal{B}_i(\cdot)$: backward model.
$K$: segment length; $J,J_K$: numbers of one-step and segment samples.
$N_p,N_c$: prediction and control horizons; $Q_i,R_i$: weights; $\epsilon,\varepsilon$: small constants.

这版做了你要的“全局一致性清理”，核心变化是：

统一 Δ 的含义：
- $\Delta x$ 一律是 相对参考的偏差（厚度/张力误差）；
- $\Delta u$ 一律是 采样间输入增量（辊缝/速度增量）。
把参考信号写死为“偏差参考=0”，不再出现绕人的 $\text{ref}-\text{ref}$ 。
消除符号冲突：辅助分支用 $\psi_i,\rho_i$ ，Adam 学习率用 $\alpha$ 。
去掉非标准符号 $\oslash$ ，把二次极值点检查写成标准文字说明（分量分别检查）。
补齐所有集合/超参数/维度： $\mathcal I_x,\mathcal I_\Gamma,J,J_K,K,N_p,N_c,Q_i,R_i$ 全部解释。
把“预测服务控制”用模型约束式 \eqref{eq:rollout_mpc_clean} 讲透，并给出完整闭环步骤。

如果你希望我继续把你原文里残留的编号（比如你文中引用的(40)-(43)、Fig编号）按你整篇论文的编号体系对齐，我也可以直接帮你把编号和交叉引用改成最终版。

Answer 6

你这两个问题都问到点子上了。结论先说清楚：

多项式参数化不是“一定正确”，它是一个控制参数化（control parameterization）的近似/工程选择；你现在的解释方向是合理的，但要想让老师“闭嘴”，必须把它成立的假设、它在你系统里怎么落地、以及不成立时怎么办写出来，而且要把“区间平均用于更新绝对输入”这块做一次更严谨的一致性处理（否则会被抓住）。
你现在的“全局一致性清理版”已经比原来清楚很多，但仍然可能被认为模糊的点主要集中在：
- 参考信号到底从哪来、是常值还是时变、每一架厚度参考是什么（出口厚度？最终出口？）
- 张力参考是 $T_i$ （机架间）还是每架张力（你已经选了机架间，但要明确边界怎么处理）
- 输入在区间内到底怎么施加（你现在用“区间平均更新绝对输入”，会引出“那区间末端输入是多少？”的质疑）
- Nash 最佳响应是否保证收敛（一般不保证），你有没有做松弛/阻尼、有没有最大迭代次数、没收敛怎么办
- 你的“离线耦合仿真模型”是什么层级（是否包含速度/液压内环），否则“训练-在线执行不一致”

你说“如果现在不太好，能不能把结构打乱重写一个新的”，我建议你直接换成一种更“审稿友好”的结构：先定义闭环控制问题 → 再定义你把控制轨迹参数化成什么 → 再说怎么生成数据/训练可控模型 → 最后再说 DMPC/Nash 怎么解、怎么滚动实施。这样读者不会在第二章就迷路。

下面我分两部分给你：

A) 把“多项式参数化是否一定正确”讲成论文里能用的严谨表述（含你现在最容易被挑的点怎么补）
B) 给你一份全新结构的、完整可直接替换的 LaTeX 段落（包含你前面所有内容：一步/多步数据集、网络、forward/backward 训练、RNE-DMPC、约束、求解、闭环流程；并把“区间平均/端点更新”一致性讲清楚）

A) 多项式参数化是否一定正确？怎么写才“站得住”

1) 你应该在论文里明确：它是“控制参数化”，不是“物理真理”

可以这样写（核心句式）：

我们并不声称真实执行器输入在区间内严格是二次多项式；
我们采用二次多项式是为了：有限维决策、平滑、可检查区间约束、与工业插补一致；
只要执行器/内环能实现类似的平滑插补（斜坡/二次/样条），该参数化就是合理近似；
若现场为 ZOH（区间常值），该模型退化为 $\Gamma_{1}=\Gamma_{2}=0$ 的特例，不影响框架。

2) 你目前最大风险点：“区间平均用于更新绝对输入”可能不自洽

老师很容易问一句：

你区间内给的是 $\Delta u_{i,n}(\tau)$ ，那 $u(t_{n+1})$ 到底是多少？你用平均更新 $u_i(t_n)$ 物理意义是什么？

最稳的做法是：区分两个量（用两个符号）：

端点增量（决定下一个采样点的绝对输入）：
$\Delta u^{\mathrm{end}}_i(t_n)\triangleq \Delta u_{i,n}(\delta_n)$
并用它更新： $u_i(t_{n+1})=u_i(t_n)+\Delta u^{\mathrm{end}}_i(t_n)$
区间等效增量（用于离散模型/等效作用）：
$\overline{\Delta u}_i(t_n)\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau$
它用于“等效离散输入”或用于约束/代价中的能量度量，但不要拿它定义采样点上的端点值。

这样你就不会在“离散更新”和“区间内轨迹”之间自相矛盾。

3) 你可以再加一句“验证”让它更硬

最简单的加固方式（论文一句话）：

我们比较了 ZOH/一次/二次参数化在离线仿真中的一步误差与多步 rollout 误差，二次在不显著增加决策维度的情况下显著降低误差，因此采用二次。

这句不一定要给表格，但你后面实验如果能做一张消融（p=2、p=4、p=6），老师会非常服。

B) 是否还模糊？我建议你补齐这些“审稿必问点”

即使你符号解释清了，老师仍可能认为“模糊”的点一般是方法边界与工程细节没落地。建议你在新结构里显式写：

参考信号来源：是常值（稳态调节）还是来自规格计划（时变轨迹）。
测量/估计： $\Delta x_i(t_n)$ 怎么获得（厚度仪、张力传感器、滤波/观测器），你不必展开，但要说“可测/可估”。
Nash 不一定收敛：你要给一个“工程化处理”：最大迭代次数 $L_{\max}$ 、松弛系数 $\beta$ 、未收敛则用上一时刻策略/或用最后一次迭代解。
训练模型与在线执行一致性：离线仿真施加的输入轨迹与在线要一致（同样的插补策略），否则“学到的是另一个系统”。

C) 按全新结构重写一份“完整 LaTeX”（不省略你原有内容，并修正关键一致性）

下面这份就是“打乱结构、重新写”的版本。它把逻辑改成：

问题与符号（厚度/张力在哪、u 是辊缝+速度）
区间轨迹参数化（说明不是必然正确、给假设、给退化到 ZOH）
数据集构造（一步 + K 步段）
可控残差网络（为什么必须包含控制、为什么 residual、为什么变 $\delta$ 分支）
训练（rollout + forward/backward consistency + 损失权重）
RNE-DMPC（目标/参考/冲突来源/局部 NLP/约束检查/Nash 求解/闭环流程）
并且把“端点增量 vs 区间平均”等效量区分清楚。

你可以直接把下面整段复制替换你第二到第四章（章节号你自己按全文调整）。

latex
%======================================================================
\section{Problem Formulation and Interval-Level Input Parameterization}
%======================================================================

\subsection{Five-stand tandem mill, controlled variables, and references}
Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ be the sampling instants and $\delta_n=t_{n+1}-t_n$ the sampling interval length.
The controlled physical quantities are:
(i) exit thickness $h_i(t)$ of each stand $i$ ($i=1,\dots,5$);
(ii) inter-stand tension $T_i(t)$ between stand $i$ and $i+1$ ($i=1,\dots,4$).

Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ denote the desired references given by process requirements.
They can be constant setpoints (regulation) or time-varying trajectories (tracking).
Define deviation (tracking-error) variables
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\end{equation}
Accordingly, the stand-wise local deviation state is chosen as
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3,
\label{eq:state_def_new}
\end{equation}
with boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$.
The coupling is dominated by adjacent-tension propagation, therefore we use the neighbor sets
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad
Z_5=\{4\},
\end{equation}
and define the neighbor stack
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}.
\end{equation}

\subsection{Actuators and discrete-time decision variables}
Each stand $i$ is manipulated by the roll gap (screw-down/hydraulic gap) $s_i(t)$ and the stand speed $v_i(t)$:
\begin{equation}
u_i(t)=\begin{bmatrix}s_i(t)\\ v_i(t)\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_new}
\end{equation}
To encourage smooth actuation and avoid abrupt changes, we optimize input increments at the sampling instants:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=
\begin{bmatrix}\Delta s_i(t_n)\\ \Delta v_i(t_n)\end{bmatrix}.
\label{eq:du_discrete_new}
\end{equation}
Note that $\Delta x$ denotes deviation-from-reference states, while $\Delta u$ denotes sample-to-sample input increments.

\subsection{Interval-level input trajectory parameterization (assumption and justification)}
Within each sampling interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$.
We parameterize the within-interval input increment trajectory by a low-order polynomial:
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,\qquad \tau\in[0,\delta_n],
\label{eq:poly_new}
\end{equation}
where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and the stacked parameter vector is
\begin{equation}
\Gamma_{i,n}\triangleq
\big[(\Gamma_{i,n0})^\top,(\Gamma_{i,n1})^\top,(\Gamma_{i,n2})^\top\big]^\top\in\mathbb{R}^{p},
\qquad p=3n_u=6.
\end{equation}
Component-wise,
\begin{equation}
\Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad
\Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2.
\end{equation}

\paragraph{Remark on correctness and applicability.}
The polynomial parameterization \eqref{eq:poly_new} is a control-parameterization choice rather than a first-principles truth.
It is adopted because: (i) it provides a compact finite-dimensional decision representation; (ii) it yields smooth within-interval commands,
which is consistent with typical industrial interpolation/ramps of gap and speed setpoints; (iii) it enables enforcing increment bounds
for all $\tau\in[0,\delta_n]$. If the actual implementation is zero-order hold (ZOH), \eqref{eq:poly_new} reduces to a constant increment
by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$, so the proposed framework still applies.

\subsection{End-point increment versus interval-equivalent increment}
To avoid ambiguity between the within-interval command and discrete-time updates, we distinguish:

\textbf{End-point increment} (used to update the sampled input):
\begin{equation}
\Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n}).
\label{eq:du_end}
\end{equation}

\textbf{Interval-equivalent (averaged) increment} (used as an equivalent discrete quantity when needed):
\begin{equation}
\overline{\Delta u}_i(t_n)\triangleq
\frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau
=
\Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}.
\label{eq:du_avg}
\end{equation}

In online execution, the sampled input is updated by the end-point increment:
\begin{equation}
u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n).
\label{eq:u_update_end}
\end{equation}
This definition is consistent with the within-interval trajectory \eqref{eq:poly_new} and avoids mixing ``average'' with ``end-point'' values.

%======================================================================
\section{Dataset Construction for Learning One-step and Multi-step Dynamics}
%======================================================================

\subsection{Data-driven one-step mapping to be learned}
Over each interval $[t_n,t_{n+1}]$, the (unknown) deviation-state evolution can be written as a discrete mapping
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta d_i([t_n,t_{n+1}])\Big),
\label{eq:unknown_map_new}
\end{equation}
where $\Phi_i(\cdot)$ is nonlinear and coupled due to tension propagation and rolling deformation interactions.
Because accurate first-principles identification is difficult, we learn an approximation of \eqref{eq:unknown_map_new} from offline-simulated data.

\subsection{Sampling domains and one-step sample generation}
Let $\mathcal{I}_x$ denote the sampling domain of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$.
Let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$.
One interval-level sample is generated by:

\begin{enumerate}
\item Sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$.
\item Sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and select $\delta_n$ from a prescribed range.
\item Construct $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ via \eqref{eq:poly_new}.
\item Integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval input trajectory,
and record $\Delta x_i(t_{n+1})$.
\end{enumerate}

An interval sample is thus
\begin{equation}
\mathcal{D}_{i,n}=\Big\{\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta x_i(t_{n+1})\Big\}.
\end{equation}
Repeating this procedure yields the one-step dataset
\begin{equation}
S_i=\Big\{
\big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big)
\ \Big|\ j=1,\ldots,J
\Big\},
\end{equation}
where $J$ is the number of one-step samples and the overall dataset is $\{S_i\}_{i=1}^{5}$.

\subsection{Multi-step rollout segment dataset}
For multi-step training objectives, we organize the offline-simulated samples into $K$-step segments.
Starting from $t_n$, sample $\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ consecutively, integrate the coupled mill model,
and obtain $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.
Define the segment sample
\begin{equation}
\mathcal{W}_{i,n}=
\Big\{
(\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1};
(\Delta x_i(t_{n+s+1}))_{s=0}^{K-1}
\Big\},
\end{equation}
and the segment dataset
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\},
\end{equation}
where $J_K$ is the number of $K$-step segments.

%======================================================================
\section{Residual Neural Surrogate Model and Training}
%======================================================================

\subsection{Controlled residual predictor (must include control)}
We learn a control-dependent one-step deviation-state predictor
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)+
\mathcal{N}_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i\Big).
\label{eq:surrogate_new}
\end{equation}
Including $(\Gamma_{i,n},\delta_n)$ is necessary: without control input, the model degenerates to an autoregressive predictor and cannot be used inside MPC,
because MPC must evaluate trajectories under different candidate decisions.

Define the network input vector
\begin{equation}
X_{i,\mathrm{in}}=
\big[
\Delta x_i(t_n)^\top,\Delta x_{Z_i}(t_n)^\top,\Gamma_{i,n}^\top,\delta_n
\big]^\top
\in\mathbb{R}^{d(1+|Z_i|)+p+1},
\end{equation}
and $\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\to\mathbb{R}^{d}$.

\subsection{Residual structure and auxiliary branch for varying $\delta_n$}
Let $\hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}]$ extract the local state block from $X_{i,\mathrm{in}}$.
The residual predictor can be written as
\begin{equation}
X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i),
\end{equation}
where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$.
To enhance robustness to variable $\delta_n$, decompose
\begin{equation}
\mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i),
\end{equation}
where $\psi_i(\cdot)$ captures low-frequency/scale effects associated with $\delta_n$, and $\rho_i(\cdot)$ captures remaining nonlinear coupling corrections.

\subsection{One-step target and multi-step training objectives}
For one-step supervision, define the residual target
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\end{equation}
To suppress long-horizon drift, we further use $K$-step rollout loss and reciprocal-consistency regularization.

Construct a backward residual model $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$ with the same input dimension.
Forward rollout over a segment initializes $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$ and applies \eqref{eq:surrogate_new} recursively for $K$ steps.
Backward rollout starts from $\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K})$ and rolls back using $\mathcal{B}_i$ along the same segment.

Define reciprocal error
\begin{equation}
E_i(t_n)=\sum_{s=0}^{K}\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\|^2,
\end{equation}
and losses
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s}))
-\mathcal{N}_i(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i)
\Big\|^2,\\
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1}))
-\mathcal{B}_i(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i)
\Big\|^2,\\
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}E_i^{(j)}(t_n),\\
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\|\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})\|^2.
\end{aligned}
\end{equation}
The total loss is
\begin{equation}
L_{\mathrm{total}}=\lambda_1L_{\mathrm{1step}}+\lambda_2L_{\mathrm{roll}}+\lambda_3L_{\mathrm{msrp}}+\lambda_4L_{\mathrm{bwd}},
\end{equation}
with $\lambda_\ell>0$ tuned on a validation set.
Optimization uses Adam:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon},
\end{equation}
where $\alpha$ is the learning rate and $\varepsilon>0$ is a small constant.

%======================================================================
\section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC for Thickness--Tension Control}
%======================================================================

\subsection{Control objective, coupling conflict, and why Nash coordination}
Because tensions $T_i$ are shared by stand $i$ and $i+1$ and are mainly affected by their speed actions,
local optimization decisions can conflict: improving local thickness by changing roll gap or speed may worsen shared tensions of neighbors.
To coordinate with limited communication and reduced computational burden, we employ a Nash-equilibrium-seeking distributed MPC.

\subsection{Neural predictor as MPC model constraint (prediction serves control)}
At time $t_n$, define prediction horizon $N_p$ and control horizon $N_c\le N_p$.
Given measured/estimated $\Delta x_i(t_n)$ and a candidate decision sequence $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$,
multi-step prediction is obtained by recursively applying
\begin{equation}
\Delta \hat{x}_i(t_{n+s+1})
=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}),
\Gamma_{i,n+s},\delta_{n+s};\Theta_i^*
\Big),
\quad s=0,\ldots,N_p-1,
\label{eq:mpc_rollout_new}
\end{equation}
with initialization $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$.
Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from neighbors through communication during Nash iterations.
Equation \eqref{eq:mpc_rollout_new} is the explicit interface: decisions $\Gamma\mapsto$ predicted thickness/tension deviations.

\subsection{Local optimization problem (clear reference, objective, constraints)}
Because $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$, the desired deviation reference is always
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0.
\end{equation}
Let $Q_i\in\mathbb{R}^{d\times d}$ and $R_i\in\mathbb{R}^{p\times p}$.
The local cost is
\begin{equation}
J_i
=
\sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2
+
\sum_{s=0}^{N_c-1}\|\Gamma_{i,n+s}\|_{R_i}^2.
\label{eq:local_cost_new}
\end{equation}

Constraints include:

\textbf{Absolute input bounds} for roll gap and speed:
\begin{equation}
u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1.
\end{equation}

\textbf{Increment trajectory bounds} over the whole interval:
\begin{equation}
\Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max},
\qquad \forall \tau\in[0,\delta_{n+s}].
\end{equation}
For quadratic trajectories, these bounds are enforced by checking $\tau=0$, $\tau=\delta_{n+s}$,
and the stationary point $\tau^\star=-b/(2c)$ (separately for each channel) when $\tau^\star\in[0,\delta_{n+s}]$.

\textbf{Discrete-time input propagation} is performed using end-point increment \eqref{eq:du_end}:
\begin{equation}
u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}),
\qquad
\Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}).
\end{equation}

At Nash iteration index $l$, stand $i$ solves the differentiable NLP:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}=
\arg\min_{\mathbf{\Gamma}_i}\ J_i
\quad\text{s.t.}\quad
\eqref{eq:mpc_rollout_new}\ \text{and all constraints above}.
\end{equation}
Because $\mathcal{N}_i(\cdot)$ is differentiable, this NLP can be solved using gradient-based methods (SQP/interior-point) with automatic differentiation.

\subsection{Distributed Nash best-response iteration and practical termination}
The Nash best-response iteration proceeds as:

(A) initialize $\mathbf{\Gamma}_i^{(0)}$ (warm start from previous time $t_{n-1}$);
(B) rollout predictions using \eqref{eq:mpc_rollout_new};
(C) solve local NLP to update $\mathbf{\Gamma}_i^{(l)}$;
(D) broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors;
(E) update neighbor predictions;
(F) compute
\begin{equation}
\varsigma^{(l)}=
\max_i\frac{\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\|_2}{\|\mathbf{\Gamma}_i^{(l-1)}\|_2+\epsilon},
\end{equation}
where $\epsilon>0$ avoids division by zero;
(G) stop if $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$.

\paragraph{Practical safeguards.}
Because best-response iterations may not be globally convergent for general nonlinear coupled problems,
we set a maximum iteration number $L_{\max}$.
If the stopping criterion is not met within $L_{\max}$, we apply the last iterate $\mathbf{\Gamma}_i^{(L_{\max})}$.
Optionally, a relaxation update can be used:
$\mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)}$ with $\beta\in(0,1]$ to enhance robustness.

\subsection{Receding-horizon closed-loop implementation (complete prediction--control loop)}
At each sampling time $t_n$:
(i) measure/estimate $\Delta x_i(t_n)$ and receive neighbor information;
(ii) run Nash best-response iterations to obtain $\Gamma_{i,n}^*$;
(iii) apply the first-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$;
(iv) update the sampled input by $u_i(t_{n+1})=u_i(t_n)+\Delta u_{i}^{\mathrm{end}}(t_n)$;
(v) shift the horizon to $t_{n+1}$ and repeat.
In this way, the learned predictor provides multi-step forecasts for MPC, and the distributed optimization computes coordinated
roll-gap/speed commands that regulate thickness and tensions while resolving coupling conflicts via Nash equilibrium.

最后回答你的两个问题（直给结论）

Q1：多项式参数化是否一定正确？我现在的解释是否合理？

不一定“正确”，但它是一个合理、常用、可验证的控制参数化。
你需要把它写成“近似/选择”，并明确：
1. 工业插补/平滑存在 → 合理；
2. ZOH 时退化为特例 → 框架仍成立；
3. 给出验证或消融 → 更硬。
同时建议你按我上面那样区分 端点增量 vs 区间平均等效量，否则“平均更新绝对输入”的表述容易被追问到不自洽。

Q2：还存不存在太模糊的问题？

还可能被认为模糊的点主要是工程落地/边界条件：

参考信号来源（常值/时变）要写死一种或两种都说明；
离线仿真模型是否包含内环（速度/液压）要说明，否则训练-执行可能不一致；
Nash 迭代不保证收敛 → 必须写 $L_{\max}$ 、松弛 $\beta$ 、未收敛处理；
权重 $Q_i,R_i$ 的选取原则（厚度权重大于张力？边界张力权重=0？）要讲一句。

如果你愿意，我还可以把你现有全文的“式号/图号/表号引用”按你的论文编号体系彻底对齐（比如你原来写的(40)-(43)到底对应哪几个约束），并顺便把“边界张力 $T_0,T_5$ ”在代价/约束里如何处理写成最不挨骂的版本。

Answer 7

latex
%======================================================================
%  DETAILED REWRITE (NEW STRUCTURE, FULLY EXPLAINED)
%  Five-stand tandem cold rolling mill
%  Controlled: thickness + inter-stand tension
%  Actuators: roll gap + stand speed
%  Notation: Δx = deviation-from-reference (tracking error)
%            Δu = sample-to-sample increment (discrete input increment)
%  Within-interval increment trajectory is parameterized (quadratic)
%  Surrogate: Residual NN with control-dependent input
%  Training: one-step + multi-step rollout + forward/backward reciprocal consistency
%  Control: Nash-equilibrium-based distributed MPC (RNE-DMPC)
%======================================================================

%========================
\section{Problem Setting and Notation}
%========================

\subsection{Five-stand tandem cold rolling mill and controlled variables}
Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n>0$ the sampling interval length.
The controlled physical variables in this paper are:
(i) exit thickness of each stand, $h_i(t)$ for $i=1,\dots,5$; and
(ii) inter-stand strip tension between stand $i$ and $i+1$, denoted by $T_i(t)$ for $i=1,\dots,4$.

The five-stand system exhibits strong coupling primarily through tension propagation:
changes of speed and deformation at one stand influence the strip transport and elongation, thereby affecting the tensions of neighboring stands,
which in turn affect thickness through rolling force and material flow coupling.
This motivates a coupled prediction model and a coordinated (distributed) control strategy.

\subsection{References and deviation-state definition (fixing the meaning of $\Delta x$)}
Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be reference trajectories/setpoints determined by the production schedule
(or constant setpoints for regulation). Define deviation (tracking-error) variables:
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\label{eq:dev_def_detailed}
\end{equation}
\textbf{Convention (fixed throughout the paper):}
the symbol ``$\Delta$'' attached to \emph{states} always denotes deviation from reference (tracking error).
Thus, the control objective in deviation coordinates is always $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$.

\subsection{Local state vector and boundary handling (where thickness/tension appear)}
For each stand $i$, we choose a local deviation state vector containing the stand thickness deviation and adjacent tension deviations:
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3.
\label{eq:local_state_detailed}
\end{equation}
To maintain a unified dimension across stands, we adopt the boundary convention
\begin{equation}
\Delta T_0(t)\equiv 0,\qquad \Delta T_5(t)\equiv 0.
\label{eq:boundary_tension_detailed}
\end{equation}
In the cost and constraint design, the boundary (virtual) components can be assigned zero weights so that they do not influence optimization.

\subsection{Neighbor sets and coupling information (five-stand chain)}
In a tandem mill, the strongest coupling is between adjacent stands via inter-stand tensions, hence we define the neighbor sets:
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ \ (i=2,3,4),\quad
Z_5=\{4\}.
\label{eq:neighbor_set_detailed}
\end{equation}
Define the neighbor-state stack (column concatenation) as
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}.
\label{eq:neighbor_stack_detailed}
\end{equation}
This neighbor stack will be used as an explicit input to the learned predictor and to the distributed MPC coordination.

\subsection{Actuators and input increments (fixing the meaning of $u$ and $\Delta u$)}
Each stand is manipulated by two actuators:
roll gap (screw-down/hydraulic gap) $s_i(t)$ and stand speed $v_i(t)$.
Define the input vector
\begin{equation}
u_i(t)=
\begin{bmatrix}
s_i(t)\\
v_i(t)
\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_detailed}
\end{equation}

\textbf{Discrete input increment (fixed throughout the paper):}
the symbol ``$\Delta$'' attached to \emph{inputs} denotes a sample-to-sample increment:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=
\begin{bmatrix}
\Delta s_i(t_n)\\
\Delta v_i(t_n)
\end{bmatrix}.
\label{eq:du_discrete_detailed}
\end{equation}
Thus, $\Delta x$ and $\Delta u$ have different meanings by definition:
$\Delta x$ is a deviation-from-reference \emph{state}, while $\Delta u$ is a discrete-time \emph{input increment}.

\subsection{Disturbance notation}
Let $d_i(t)$ denote exogenous disturbances affecting subsystem $i$ (e.g., entry thickness fluctuation, friction variation, material property drift).
We use $\Delta d_i([t_n,t_{n+1}])$ to denote the disturbance signal over the interval, and define an interval-equivalent disturbance (average) later.

\subsection{Additional basic notation and dimensions}
$I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes an $a\times b$ zero matrix.
For a vector $z$, $\|z\|_{Q}^{2}\triangleq z^\top Q z$.

%========================
\section{Interval-Level Input Parameterization and Its Validity}
%========================

\subsection{Why parameterize within-interval input trajectories?}
Although supervisory controllers update commands at discrete sampling instants, the physical actuation and underlying drive/hydraulic loops evolve continuously.
In cold rolling, abrupt gap/speed changes can excite tension oscillations and degrade thickness stability.
Moreover, in many industrial implementations, setpoints are interpolated (ramps/filters) within each sampling interval to ensure smoothness.
Therefore, describing the within-interval increment trajectory using a low-order basis has three benefits:

\begin{itemize}
\item \textbf{Finite-dimensional decision variables:} compress continuous-time command profiles into a small set of coefficients for online optimization.
\item \textbf{Smoothness by construction:} avoid discontinuous commands inside the interval, improving closed-loop robustness.
\item \textbf{Whole-interval constraint enforcement:} bounds can be imposed for all $\tau\in[0,\delta_n]$, not only at sampling instants.
\end{itemize}

\subsection{Quadratic polynomial parameterization (two-input vector form)}
Within interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$.
We parameterize the input increment trajectory by a vector quadratic polynomial:
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,\qquad \tau\in[0,\delta_n],
\label{eq:du_poly_vector_detailed}
\end{equation}
where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and $n_u=2$.
Component-wise:
\begin{equation}
\Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad
\Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2.
\label{eq:du_poly_component_detailed}
\end{equation}
Define the stacked coefficient vector:
\begin{equation}
\Gamma_{i,n}\triangleq
\big[
(\Gamma_{i,n0})^\top,\,
(\Gamma_{i,n1})^\top,\,
(\Gamma_{i,n2})^\top
\big]^\top
\in\mathbb{R}^{p},\qquad p=3n_u=6.
\label{eq:Gamma_dim_detailed}
\end{equation}

\subsection{Is polynomial parameterization ``certainly correct''? (explicit assumptions and fallback)}
\begin{remark}[Correctness, applicability, and fallback]
The parameterization \eqref{eq:du_poly_vector_detailed} is a \emph{control parameterization} choice, not a first-principles identity.
It is reasonable under the following practical assumptions:

\begin{enumerate}
\item \textbf{Within-interval smooth implementation:} the actuator setpoints (gap and speed) are implemented with interpolation/ramps/filters,
so the actual increments inside the interval can be approximated by a low-order smooth function.
\item \textbf{Sampling interval not excessively large:} $\delta_n$ is not too large relative to actuator bandwidth,
so low-order polynomials can capture the dominant within-interval profile.
\item \textbf{Model-consistent implementation:} the same interpolation/command-generation logic used in offline data generation is used online,
ensuring that the learned surrogate matches the executed control.
\end{enumerate}

If the real system uses zero-order-hold (ZOH) increments inside each interval, then \eqref{eq:du_poly_vector_detailed} reduces to the ZOH case
by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$. Therefore, the proposed framework subsumes ZOH as a special case.
More complex profiles (e.g., piecewise-linear or spline) can be adopted if needed by increasing basis richness, at the cost of more decision variables.
\end{remark}

\subsection{End-point increment versus interval-equivalent increment (removing ambiguity)}
To avoid ambiguity between the \emph{continuous-time} within-interval command and the \emph{discrete-time} input update, we define two distinct quantities.

\paragraph{1) End-point increment (used for discrete input update).}
Define
\begin{equation}
\Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n})
=
\Gamma_{i,n0}+\Gamma_{i,n1}\delta_n+\Gamma_{i,n2}\delta_n^2.
\label{eq:du_end_detailed}
\end{equation}
This quantity determines the sampled input at the next sampling instant via
\begin{equation}
u_i(t_{n+1}) = u_i(t_n) + \Delta u_i^{\mathrm{end}}(t_n).
\label{eq:u_update_end_detailed}
\end{equation}

\paragraph{2) Interval-equivalent (averaged) increment (used as equivalent discrete effect when needed).}
Define the interval average
\begin{equation}
\overline{\Delta u}_i(t_n)\triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau
=
\Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}.
\label{eq:du_avg_detailed}
\end{equation}
The average $\overline{\Delta u}_i(t_n)$ can be used as an equivalent discrete quantity when one needs to represent the average within-interval actuation effect,
or when building consistency between continuous trajectories and discrete approximations. Importantly, we do \emph{not} use the average to define the sampled end-point update;
the end-point update is governed by \eqref{eq:du_end_detailed}--\eqref{eq:u_update_end_detailed}.

%========================
\section{Data-Driven Dynamics: Dataset Construction}
%========================

\subsection{Unknown coupled interval mapping to be approximated}
The five-stand coupled deviation-state evolution over $[t_n,t_{n+1}]$ can be represented by an unknown nonlinear mapping:
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\Big(
\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,
\Gamma_{i,n},\,\delta_n,\,
\Delta d_i([t_n,t_{n+1}])
\Big),
\label{eq:true_unknown_mapping_detailed}
\end{equation}
where coupling enters through $\Delta x_{Z_i}(t_n)$ and through the fact that the underlying physics is a coupled five-stand process.
A conceptual equivalent linear discrete form is often written as
\begin{equation}
\Delta x_i(t_{n+1}) = M_d\,\Delta x_i(t_n) + N_d\,\Delta u_i(t_n) + F_d\,\Delta d_i(t_n),
\label{eq:conceptual_linear_detailed}
\end{equation}
but accurate identification of $(M_d,N_d,F_d)$ is difficult in practical cold rolling due to strong nonlinearities and varying operating conditions.
Therefore, we learn a nonlinear surrogate for \eqref{eq:true_unknown_mapping_detailed} from offline data.

\subsection{Sampling domains and disturbance averaging}
Let $\mathcal{I}_x$ denote the sampling domain (ranges) of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$ used for offline data generation.
Let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial coefficients $\Gamma_{i,n}$ for both gap and speed channels.
For disturbances, define an interval-equivalent disturbance (average) as
\begin{equation}
\Delta d_i(t_n) \triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta d_i(\tau)\,d\tau,
\label{eq:dist_avg_detailed}
\end{equation}
where $\Delta d_i(\tau)$ denotes the disturbance signal expressed in deviation form during the interval.

\subsection{One-step sample generation (five-stand coupled simulation)}
One training sample is generated per interval $[t_n,t_{n+1}]$ via the following steps:

\begin{enumerate}
\item \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$.
\item \textbf{Parameter sampling:} sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and choose $\delta_n$ from a prescribed range.
\item \textbf{Construct within-interval increment trajectory:} compute $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ by \eqref{eq:du_poly_vector_detailed}.
\item \textbf{Propagate coupled system:} integrate the \emph{five-stand coupled} rolling model on $[t_n,t_{n+1}]$ (e.g., Runge--Kutta 4)
under the within-interval input trajectory, and record the resulting $\Delta x_i(t_{n+1})$.
\end{enumerate}

Thus the interval-level sample can be written as
\begin{equation}
\mathcal{D}_{i,n}=
\Big(
\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n,\ \Delta x_i(t_{n+1})
\Big).
\label{eq:interval_sample_detailed}
\end{equation}

Repeating over many intervals yields the one-step dataset for subsystem $i$:
\begin{equation}
S_i=\Big\{
\big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big)
\ \Big|\ j=1,\ldots,J
\Big\},
\label{eq:one_step_dataset_detailed}
\end{equation}
where $J$ is the number of one-step samples. The overall dataset is $\{S_i\}_{i=1}^{5}$.

\subsection{$K$-step segment dataset for multi-step training}
One-step regression alone can lead to drift under long-horizon recursion, which is undesirable for MPC.
Therefore, we also construct $K$-step segments to support multi-step rollout training and reciprocal-consistency regularization.

Starting from $t_n$, generate a sequence $\{(\Gamma_{i,n+s},\delta_{n+s})\}_{s=0}^{K-1}$,
simulate the coupled five-stand model across $K$ consecutive intervals,
and record the deviation-state trajectory $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ and neighbor stacks $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.

Define the $K$-step segment sample as
\begin{equation}
\mathcal{W}_{i,n}=
\Big\{
(\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1};
(\Delta x_i(t_{n+s+1}))_{s=0}^{K-1}
\Big\}.
\label{eq:segment_sample_detailed}
\end{equation}
Collecting $J_K$ such segments yields
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\},
\label{eq:segment_dataset_detailed}
\end{equation}
where $K$ is the segment length and $J_K$ is the number of segments.

%========================
\section{Residual Neural Surrogate Model}
%========================

\subsection{What is learned: control-dependent one-step deviation dynamics}
The surrogate aims to approximate \eqref{eq:true_unknown_mapping_detailed} in a form suitable for MPC.
For subsystem $i$, define a control-dependent residual predictor:
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)+
\mathcal{N}_i\!\Big(
\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i
\Big),
\label{eq:forward_predictor_detailed}
\end{equation}
where $\Theta_i$ are trainable parameters and $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change.

\paragraph{Why the input must include control (with $u$ vs without $u$).}
If the network does not include $(\Gamma_{i,n},\delta_n)$, it reduces to an autoregressive model that reproduces trajectories under the training input patterns only.
MPC requires evaluating predicted trajectories under \emph{candidate} control decisions, therefore a control-dependent predictor \eqref{eq:forward_predictor_detailed} is necessary.

\subsection{Network input vector and dimensions}
Let $d=3$ and $p=6$. Define the input vector
\begin{equation}
X_{i,\mathrm{in}} \triangleq
\big[
\Delta x_i(t_n)^\top,\ \Delta x_{Z_i}(t_n)^\top,\ \Gamma_{i,n}^\top,\ \delta_n
\big]^\top
\in\mathbb{R}^{d(1+|Z_i|)+p+1}.
\label{eq:X_in_detailed}
\end{equation}
Then
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}.
\end{equation}

\subsection{Residual (shortcut) structure and its motivation}
To incorporate a persistence prior and improve long-horizon stability, we use a residual structure.
Define a selection matrix
\begin{equation}
\hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}]\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)},
\label{eq:Ihat_detailed}
\end{equation}
so that $\hat{I}_i X_{i,\mathrm{in}}=\Delta x_i(t_n)$.
The residual predictor can then be written as
\begin{equation}
X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i),
\label{eq:res_form_detailed}
\end{equation}
where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$.

\begin{remark}[Interpretation]
Equation \eqref{eq:res_form_detailed} has a baseline-plus-correction form:
the shortcut term propagates the current deviation state, and the network learns the correction capturing nonlinear rolling effects and coupling through neighbors.
This improves optimization stability because the model only needs to learn the incremental change rather than the full next-state mapping.
\end{remark}

\subsection{Auxiliary decomposition for variable sampling intervals $\delta_n$ (no notation conflicts)}
To enhance robustness under variable $\delta_n$, we decompose
\begin{equation}
\mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i),
\label{eq:aux_decomp_detailed}
\end{equation}
where $\psi_i(\cdot)$ is a lightweight branch intended to capture low-frequency/scale effects correlated with $\delta_n$,
and $\rho_i(\cdot)$ captures remaining nonlinear coupling corrections.
This decomposition avoids symbol conflicts with sampling sets (e.g., $\mathcal{I}_x,\mathcal{I}_\Gamma$) and optimizer learning rate.

%========================
\section{Training: One-step Accuracy, Multi-step Rollout, and Reciprocal Consistency}
%========================

\subsection{One-step supervised targets}
For a one-step sample $(\Delta x_i(t_n),\Delta x_i(t_{n+1}))$, define the residual target
\begin{equation}
\Delta r_i(t_n)\triangleq \Delta x_i(t_{n+1})-\Delta x_i(t_n).
\label{eq:res_target_detailed}
\end{equation}
For sample $j$, the network input is
\begin{equation}
X_{i,\mathrm{in}}^{(j)}=
\big[
\Delta x_i^{(j)}(t_n)^\top,\ \Delta x_{Z_i}^{(j)}(t_n)^\top,\ \Gamma_{i,n}^{(j)\top},\ \delta_n^{(j)}
\big]^\top.
\end{equation}

\subsection{Forward rollout over a $K$-step segment}
Given $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize
\begin{equation}
\Delta \hat{x}_i(t_n)=\Delta x_i(t_n),
\end{equation}
and recursively roll forward:
\begin{equation}
\Delta \hat{x}_i(t_{n+s+1})
=
\Delta \hat{x}_i(t_{n+s})+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}),
\Gamma_{i,n+s},\delta_{n+s};\Theta_i
\Big),
\quad s=0,\ldots,K-1.
\label{eq:fwd_rollout_detailed}
\end{equation}

\subsection{Backward model and backward rollout (reciprocal consistency)}
To regularize long-horizon behavior, we introduce a backward residual model
\begin{equation}
\mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$.
Define backward input at step $s$ as
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})
=
\big[
\Delta \bar{x}_i(t_{n+s+1})^\top,\ \Delta \hat{x}_{Z_i}(t_{n+s+1})^\top,\ \Gamma_{i,n+s}^\top,\ \delta_{n+s}
\big]^\top,
\end{equation}
where $\Delta \hat{x}_{Z_i}$ is taken from the forward rollout.
Set the terminal condition
\begin{equation}
\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}),
\end{equation}
and roll backward:
\begin{equation}
\Delta \bar{x}_i(t_{n+s})
=
\Delta \bar{x}_i(t_{n+s+1})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\label{eq:bwd_rollout_detailed}
\end{equation}
where $\mathcal{B}_i(\cdot)$ outputs a backward residual correction that reconstructs the earlier state.
The supervised backward residual target for one step is
\begin{equation}
\Delta r_i^{b}(t_n)\triangleq \Delta x_i(t_n)-\Delta x_i(t_{n+1}).
\end{equation}

\subsection{Loss functions (with explicit motivations)}
Define the reciprocal prediction error over a segment:
\begin{equation}
E_i(t_n)=\sum_{s=0}^{K}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\right\|^2.
\label{eq:recip_error_detailed}
\end{equation}

\paragraph{1) One-step residual loss (local accuracy).}
\begin{equation}
L_{\mathrm{1step}}(\Theta_i)
=
\frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\left\|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}_i\!\left(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i\right)
\right\|^2.
\label{eq:L1_detailed}
\end{equation}

\paragraph{2) Multi-step rollout loss (suppress drift under recursion).}
\begin{equation}
L_{\mathrm{roll}}(\Theta_i)
=
\frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\left\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\right\|^2.
\label{eq:Lroll_detailed}
\end{equation}

\paragraph{3) Backward one-step loss (train backward model for consistency).}
\begin{equation}
L_{\mathrm{bwd}}(\bar{\Theta}_i)
=
\frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\left\|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}_i\!\left(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i\right)
\right\|^2.
\label{eq:Lbwd_detailed}
\end{equation}

\paragraph{4) Multi-step reciprocal consistency loss (structural regularization).}
\begin{equation}
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
=
\frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n).
\label{eq:Lmsrp_detailed}
\end{equation}

\paragraph{Total loss and training.}
\begin{equation}
L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}},
\label{eq:Ltotal_detailed}
\end{equation}
where $\lambda_1,\lambda_2,\lambda_3,\lambda_4>0$ are tuned on a validation set.
Parameters are optimized by Adam:
\begin{equation}
\Theta_{i,t+1}=\Theta_{i,t}-\alpha\frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}}+\varepsilon},
\end{equation}
where $\alpha$ is learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected first/second moments, and $\varepsilon>0$ is a small constant.

\subsection{Training algorithm (explicit procedural description)}
\begin{algorithm}[t]
\caption{Offline training of forward/backward residual models for subsystem $i$}
\label{alg:train_detailed}
\small
\begin{algorithmic}[1]
\STATE \textbf{Input:} one-step dataset $S_i$, segment dataset $S_i^{(K)}$, horizons $K$, weights $\lambda_1\dots\lambda_4$
\STATE Initialize forward parameters $\Theta_i$ and backward parameters $\bar{\Theta}_i$
\FOR{each epoch}
    \STATE Sample a mini-batch of segments $\{\mathcal{W}_{i,n}\}$ from $S_i^{(K)}$
    \STATE For each segment, perform forward rollout using \eqref{eq:fwd_rollout_detailed} to obtain $\{\Delta\hat{x}_i(t_{n+s})\}_{s=0}^{K}$
    \STATE Perform backward rollout using \eqref{eq:bwd_rollout_detailed} to obtain $\{\Delta\bar{x}_i(t_{n+s})\}_{s=0}^{K}$
    \STATE Compute $L_{\mathrm{1step}},L_{\mathrm{roll}},L_{\mathrm{bwd}},L_{\mathrm{msrp}}$ via \eqref{eq:L1_detailed}--\eqref{eq:Lmsrp_detailed}
    \STATE Update $(\Theta_i,\bar{\Theta}_i)$ by Adam to minimize $L_{\mathrm{total}}$ in \eqref{eq:Ltotal_detailed}
\ENDFOR
\STATE \textbf{Output:} trained forward model $\mathcal{N}_i(\cdot;\Theta_i^*)$ and backward model $\mathcal{B}_i(\cdot;\bar{\Theta}_i^*)$
\end{algorithmic}
\end{algorithm}

%========================
\section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC Using the Learned Surrogate}
%========================

\subsection{Why distributed Nash coordination (explicit conflict source)}
In a five-stand mill, tensions $T_i$ are shared coupling variables influenced by both stand $i$ and stand $i+1$ (mainly via speed actions),
and thickness is strongly affected by roll gap while also interacting with tension through deformation and transport coupling.
Hence, a purely local action that improves $\Delta h_i$ may worsen a shared tension $\Delta T_i$ for a neighbor, and vice versa.
This induces an intrinsic multi-agent coupling conflict, motivating a coordination mechanism.
We adopt a Nash-equilibrium-seeking distributed MPC (RNE-DMPC) where each stand solves a local MPC problem and iterates best responses.

\subsection{Prediction model inside MPC (explicit prediction--control interface)}
At time $t_n$, define prediction horizon $N_p$ and control horizon $N_c$ with $N_c\le N_p$.
Let $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$.
Given candidate decisions $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, the surrogate provides the multi-step prediction:
\begin{equation}
\Delta \hat{x}_i(t_{n+s+1})
=
\Delta \hat{x}_i(t_{n+s})+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}),
\Gamma_{i,n+s},\delta_{n+s};\Theta_i^*
\Big),
\quad s=0,\ldots,N_p-1.
\label{eq:mpc_rollout_detailed}
\end{equation}
Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is provided by neighbor communication during Nash iterations.
Equation \eqref{eq:mpc_rollout_detailed} makes the dependency \emph{explicit}:
candidate control parameters $\Gamma_{i,n+s}$ change the predicted thickness/tension deviations, enabling online optimization.

\subsection{Decision variables and reference meaning (no ambiguity)}
\paragraph{Decision variables.}
The local decision vector is the stacked parameter sequence over the control horizon:
\begin{equation}
\mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\Gamma_{i,n+1}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top\in\mathbb{R}^{pN_c}.
\label{eq:Gamma_stack_detailed}
\end{equation}

\paragraph{Reference in deviation coordinates.}
Because $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ by definition, the desired deviation reference is always
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}.
\label{eq:dxref_zero_detailed}
\end{equation}
Thus, the MPC objective penalizes predicted deviation states directly.

\subsection{Local objective function (explicit thickness+tension weighting)}
Let $\Delta \hat{x}_i(t_{n+s})=[\Delta\hat{h}_i(t_{n+s}),\,\Delta\widehat{T}_{i-1}(t_{n+s}),\,\Delta\widehat{T}_{i}(t_{n+s})]^\top$.
Choose $Q_i\in\mathbb{R}^{d\times d}$ and $R_i\in\mathbb{R}^{p\times p}$.
A detailed and interpretable weighting choice is to separate thickness and tension weights, e.g.,
\begin{equation}
Q_i=\mathrm{diag}(q_{h,i},\,q_{T,i-1},\,q_{T,i}),
\label{eq:Qi_diag_detailed}
\end{equation}
where $q_{h,i}$ penalizes thickness deviation and $q_{T,i-1},q_{T,i}$ penalize adjacent tension deviations.
For boundary tensions, set the corresponding weights to zero (e.g., $q_{T,0}=q_{T,5}=0$).

The local cost is defined as
\begin{equation}
J_i=
\sum_{s=1}^{N_p}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta x_{i,\mathrm{ref}}(t_{n+s})\right\|_{Q_i}^2
+
\sum_{s=0}^{N_c-1}\left\|\Gamma_{i,n+s}\right\|_{R_i}^2.
\label{eq:Ji_detailed}
\end{equation}
Using \eqref{eq:dxref_zero_detailed}, the first term becomes $\sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2$,
which explicitly penalizes predicted thickness and tension deviations.

\subsection{Constraints: absolute bounds, increment bounds over the entire interval, and propagation}
\paragraph{1) Absolute input bounds (gap and speed).}
\begin{equation}
u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1,
\label{eq:u_abs_detailed}
\end{equation}
where $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ specify component-wise bounds for $(s_i,v_i)$.

\paragraph{2) Increment trajectory bounds for all $\tau$ within each interval.}
\begin{equation}
\Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max},
\qquad \forall \tau\in[0,\delta_{n+s}],
\label{eq:du_traj_detailed}
\end{equation}
where $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ specify component-wise bounds for $(\Delta s,\Delta v)$.

\paragraph{Practical enforcement of \eqref{eq:du_traj_detailed}.}
For each scalar channel $q(\tau)=a+b\tau+c\tau^2$ on $[0,\delta]$, extrema occur at $\tau=0$, $\tau=\delta$,
and possibly at $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$.
Therefore, to enforce \eqref{eq:du_traj_detailed}, we check these candidate points separately for
$\Delta s_{i,n+s}(\tau)$ and $\Delta v_{i,n+s}(\tau)$ using coefficients in \eqref{eq:du_poly_component_detailed}.

\paragraph{3) Discrete-time input propagation consistent with the within-interval trajectory.}
We propagate the sampled input using the end-point increment:
\begin{equation}
u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}),
\qquad
\Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}).
\label{eq:u_prop_detailed}
\end{equation}
This ensures consistency between the within-interval increment trajectory and the sampled input sequence, and supports enforcing \eqref{eq:u_abs_detailed}.

\subsection{Local NLP solved at each Nash iteration (explicit statement)}
At Nash iteration index $l$, subsystem $i$ solves the differentiable nonlinear program:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}
=
\arg\min_{\mathbf{\Gamma}_i}\ J_i
\quad\text{s.t.}\quad
\eqref{eq:mpc_rollout_detailed},\ \eqref{eq:u_abs_detailed},\ \eqref{eq:du_traj_detailed},\ \eqref{eq:u_prop_detailed}.
\label{eq:local_nlp_detailed}
\end{equation}
Because the surrogate $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_nlp_detailed} can be solved via gradient-based NLP solvers
(e.g., SQP or interior-point methods) with automatic differentiation.

\subsection{Nash best-response coordination (detailed algorithm and safeguards)}
\paragraph{Best-response update.}
Given neighbor strategies $\mathbf{\Gamma}_{Z_i}$ (received from neighbors) and neighbor predicted trajectories,
subsystem $i$ computes its best response by solving \eqref{eq:local_nlp_detailed}.
The Nash iteration repeats until convergence.

\paragraph{Convergence metric.}
Define the relative change
\begin{equation}
\varsigma^{(l)}=
\max_i
\frac{\left\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\right\|_2}
{\left\|\mathbf{\Gamma}_i^{(l-1)}\right\|_2+\epsilon},
\label{eq:nash_metric_detailed}
\end{equation}
where $\epsilon>0$ avoids division by zero. Stop if $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$.

\paragraph{Practical safeguards (important for avoiding ``too ideal'' criticism).}
Best-response iterations are not globally guaranteed to converge for general nonlinear coupled problems.
Therefore, we set a maximum iteration number $L_{\max}$ and optionally use relaxation:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)},\qquad \beta\in(0,1].
\label{eq:relax_detailed}
\end{equation}
If convergence is not reached within $L_{\max}$, we apply the last iterate $\mathbf{\Gamma}_i^{(L_{\max})}$.

\begin{algorithm}[t]
\caption{Online RNE-DMPC at sampling time $t_n$ (five-stand)}
\label{alg:online_detailed}
\small
\begin{algorithmic}[1]
\STATE \textbf{Input:} current measurements/estimates $\{\Delta x_i(t_n)\}_{i=1}^5$, previous solution $\{\mathbf{\Gamma}_i(t_{n-1})\}$ (for warm start)
\STATE Initialize $\mathbf{\Gamma}_i^{(0)}$ (warm-start), set $l\gets 1$
\REPEAT
    \STATE Each stand predicts $\{\Delta\hat{x}_i^{(l)}(t_{n+s})\}$ using \eqref{eq:mpc_rollout_detailed} with neighbor predictions from iteration $l-1$
    \STATE Each stand solves its local NLP \eqref{eq:local_nlp_detailed} to obtain best response $\mathbf{\Gamma}_i^{(l)}$
    \STATE Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors
    \STATE (Optional) apply relaxation \eqref{eq:relax_detailed}
    \STATE Compute $\varsigma^{(l)}$ via \eqref{eq:nash_metric_detailed}, set $l\gets l+1$
\UNTIL{$\varsigma^{(l-1)}\le \varsigma_{\mathrm{tol}}$ or $l-1=L_{\max}$}
\STATE Set $\Gamma_{i,n}^*$ to the first block of $\mathbf{\Gamma}_i^{(l-1)}$
\STATE Apply within-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$
\STATE Update sampled input by $u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n)$ using \eqref{eq:du_end_detailed}--\eqref{eq:u_update_end_detailed}
\STATE Shift horizon: $n\gets n+1$
\end{algorithmic}
\end{algorithm}

\subsection{Closed-loop prediction--control connection (explicit end-to-end description)}
At each sampling time $t_n$, the closed-loop operation is:

\begin{enumerate}
\item Measure/estimate $\Delta h_i(t_n)$ and $\Delta T_i(t_n)$, form $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$.
\item Run Nash best-response iterations (Algorithm~\ref{alg:online_detailed}):
each stand evaluates candidate decisions through multi-step prediction \eqref{eq:mpc_rollout_detailed} and solves \eqref{eq:local_nlp_detailed}.
\item After termination, apply the first-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ to the actuators within $[t_n,t_{n+1}]$.
\item Update the sampled input using the end-point increment so that $u_i(t_{n+1})$ is consistent with the within-interval command.
\item Roll the horizon forward and repeat.
\end{enumerate}

In this way, the neural surrogate provides the control-dependent multi-step predictions required by MPC,
and the distributed Nash coordination resolves coupling conflicts induced by shared tensions in the five-stand mill.

%========================
\section{Symbol Summary (to eliminate any remaining ambiguity)}
%========================

\paragraph{Indices and time.}
$i\in\{1,\dots,5\}$: stand index; $t_n$: sampling instant; $\delta_n=t_{n+1}-t_n$: sampling interval length; $\tau\in[0,\delta_n]$: within-interval time.

\paragraph{Physical variables and references.}
$h_i(t)$: exit thickness of stand $i$; $T_i(t)$: inter-stand tension between stand $i$ and $i+1$ ($i=1,\dots,4$).
$h_i^{\mathrm{ref}}(t),T_i^{\mathrm{ref}}(t)$: reference trajectories (setpoints).

\paragraph{Deviation state.}
$\Delta h_i=h_i-h_i^{\mathrm{ref}}$; $\Delta T_i=T_i-T_i^{\mathrm{ref}}$; $\Delta x_i=[\Delta h_i,\Delta T_{i-1},\Delta T_i]^\top\in\mathbb{R}^3$.
Boundary: $\Delta T_0\equiv 0,\Delta T_5\equiv 0$.

\paragraph{Coupling sets.}
$Z_i$: neighbor set; $\Delta x_{Z_i}=\mathrm{col}\{\Delta x_k|k\in Z_i\}$.

\paragraph{Inputs and increments.}
$u_i=[s_i,v_i]^\top$: roll gap and stand speed.
$\Delta u_i(t_n)=u_i(t_n)-u_i(t_{n-1})$: discrete increment (not a deviation).
$\Delta u_{i,n}(\tau;\Gamma_{i,n})$: within-interval increment trajectory.
$\Gamma_{i,n}\in\mathbb{R}^{6}$: quadratic polynomial coefficients (two channels, three coefficients each).
$\Delta u_i^{\mathrm{end}}(t_n)=\Delta u_{i,n}(\delta_n;\Gamma_{i,n})$: end-point increment.
$\overline{\Delta u}_i(t_n)=\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau$: interval-average increment.

\paragraph{Datasets.}
$\mathcal{I}_x$: sampling domain of $\Delta x$; $\mathcal{I}_\Gamma$: sampling domain of $\Gamma$.
$S_i$: one-step dataset size $J$; $S_i^{(K)}$: $K$-step segment dataset size $J_K$.

\paragraph{Neural models.}
$\mathcal{N}_i(\cdot;\Theta_i)$: forward residual surrogate; $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$: backward model.
$\psi_i(\cdot),\rho_i(\cdot)$: auxiliary decomposition branches.

\paragraph{MPC and Nash.}
$N_p$: prediction horizon; $N_c$: control horizon.
$Q_i\in\mathbb{R}^{3\times 3}$, $R_i\in\mathbb{R}^{6\times 6}$: weighting matrices.
$\varsigma^{(l)}$: Nash convergence metric; $\varsigma_{\mathrm{tol}}$: tolerance; $L_{\max}$: max iterations; $\beta$: relaxation factor.
$\epsilon,\varepsilon>0$: small constants; $\alpha$: Adam learning rate.

你关心的两点，我把“更详细版”里怎么处理了（不需要你再解释给老师）

多项式参数化“不一定正确”：我在文中用 Remark 明确写成“control parameterization 选择 + 假设 + ZOH退化特例 + 可扩展”。这就是老师最爱挑的点：你现在主动承认并给边界条件，反而更严谨。
还会不会太模糊：我把最容易被抓的“平均更新 vs 端点更新”彻底拆开了（\Delta u^{end} vs \overline{\Delta u}），并明确在线用端点更新。这个改动通常能直接避免一轮追问。

如果你把你原文里每章的标题、已有公式编号、以及你已有的 Figure/Table 编号体系发我（哪怕是截图），我还能把这份 LaTeX 进一步改成完全可直接替换你论文对应章节的版本：包括式号对齐、引用对齐、以及把你原来没写清的“具体输入/约束/权重/实验设置”以同样详细度补齐。

Answer 8

latex
%======================================================================
%  FEWER SECTIONS VERSION (STILL DETAILED)
%  Structure reduced to 2 main sections:
%   1) Learning a control-dependent surrogate model (notation + parameterization + data + NN + training)
%   2) RNE-DMPC using the learned surrogate (objective + constraints + Nash iteration + closed loop)
%======================================================================

%========================
\section{Learning a Control-Dependent Surrogate Model for a Five-Stand Tandem Mill}
%========================

\subsection{Five-stand setting, controlled variables, and unified notation}
Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$.
Let $t_n$ be the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n>0$ the sampling interval length.
Within each interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$.

\paragraph{Controlled physical variables.}
Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$),
and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$).
The overall five-stand system is strongly coupled mainly through tension propagation: speed and deformation changes at one stand
affect neighboring tensions, which in turn influence thickness and stability.

\paragraph{References and deviation variables (meaning of $\Delta$ for states).}
Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ denote reference trajectories/setpoints given by the schedule or regulation targets.
Define deviation (tracking-error) variables
\begin{equation}
\Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad
\Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t).
\label{eq:dev_state_def}
\end{equation}
\textbf{Convention (fixed throughout):} the symbol ``$\Delta$'' attached to \emph{states} always means deviation from reference.
Hence, the deviation-coordinate control goal is always $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$.

\paragraph{Local deviation state vector (thickness + adjacent tensions).}
For each stand $i$, define the local deviation state
\begin{equation}
\Delta x_i(t)\triangleq
\begin{bmatrix}
\Delta h_i(t)\\
\Delta T_{i-1}(t)\\
\Delta T_i(t)
\end{bmatrix}\in\mathbb{R}^{d},\qquad d=3.
\label{eq:local_x_def}
\end{equation}
To keep a unified dimension for all stands, adopt boundary conventions
\begin{equation}
\Delta T_0(t)\equiv 0,\qquad \Delta T_5(t)\equiv 0,
\label{eq:boundary_T}
\end{equation}
and later set the corresponding weights to zero so that boundary virtual components do not affect optimization.

\paragraph{Neighbor sets and coupling information (five-stand chain).}
Coupling is dominated by adjacent stands, so define
\begin{equation}
Z_1=\{2\},\quad
Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad
Z_5=\{4\}.
\label{eq:neighbor_set}
\end{equation}
Define the neighbor-state stack
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}.
\label{eq:xZi_def}
\end{equation}

\paragraph{Actuators and discrete increments (meaning of $\Delta$ for inputs).}
Each stand is manipulated by roll gap $s_i(t)$ and stand speed $v_i(t)$:
\begin{equation}
u_i(t)=\begin{bmatrix}s_i(t)\\ v_i(t)\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2.
\label{eq:ui_def}
\end{equation}
We optimize sample-to-sample input increments:
\begin{equation}
\Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1})
=\begin{bmatrix}\Delta s_i(t_n)\\ \Delta v_i(t_n)\end{bmatrix}.
\label{eq:du_discrete}
\end{equation}
\textbf{Convention (fixed throughout):} ``$\Delta$'' on \emph{inputs} means discrete increment, not deviation-from-reference.
Thus, $\Delta x$ (state deviation) and $\Delta u$ (input increment) are distinct by definition.

\paragraph{Disturbance.}
Let $d_i(t)$ denote exogenous disturbances (entry thickness fluctuation, friction variation, material drift, etc.).
We write $\Delta d_i(\tau)$ as an interval disturbance signal in deviation form and define an interval-equivalent disturbance by averaging later.

\paragraph{Basic notation.}
$I_d$ is the $d\times d$ identity matrix and $0_{a\times b}$ is the $a\times b$ zero matrix.
For any vector $z$, define $\|z\|_Q^2\triangleq z^\top Q z$.

%----------------------------------------------------------------------
\subsection{Interval-level input parameterization, its validity, and dataset construction}
%----------------------------------------------------------------------

\paragraph{Why parameterize within-interval trajectories?}
Although supervisory decisions are updated at sampling instants $t_n$, the physical drive/hydraulic loops evolve continuously inside $[t_n,t_{n+1}]$.
Abrupt changes can excite tension oscillations and deteriorate thickness stability.
Therefore, we parameterize the within-interval increment trajectory by a low-order smooth basis to:
(i) obtain a compact finite-dimensional decision variable for optimization,
(ii) enforce smooth commands inside the interval, and
(iii) enforce constraints for all $\tau\in[0,\delta_n]$.

\paragraph{Quadratic polynomial parameterization (vector form, two inputs).}
For interval $[t_n,t_{n+1}]$ and $\tau\in[0,\delta_n]$, parameterize
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,
\label{eq:poly_param}
\end{equation}
where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and $n_u=2$.
Component-wise,
\begin{equation}
\Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad
\Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2.
\label{eq:poly_param_comp}
\end{equation}
Define the stacked parameter vector
\begin{equation}
\Gamma_{i,n}\triangleq
\big[(\Gamma_{i,n0})^\top,(\Gamma_{i,n1})^\top,(\Gamma_{i,n2})^\top\big]^\top
\in\mathbb{R}^{p},\qquad p=3n_u=6.
\label{eq:Gamma_def}
\end{equation}

\paragraph{Is polynomial parameterization ``certainly correct''? (explicit assumption + fallback)}
The representation \eqref{eq:poly_param} is a \emph{control parameterization} choice rather than a first-principles identity.
It is reasonable when (i) the implemented setpoints are interpolated/filtered within each sampling interval (common in industry),
(ii) $\delta_n$ is not excessively large compared to actuator bandwidth, and (iii) the same command-generation logic is used in offline simulation and online execution,
so that the learned model matches the executed control.
If the implementation is zero-order hold (ZOH), \eqref{eq:poly_param} reduces to ZOH by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$,
so the framework still applies.

\paragraph{End-point increment vs interval-average increment (remove ambiguity).}
To avoid mixing continuous-time trajectories with discrete-time updates, define two quantities:

\textbf{End-point increment (used for sampled input update):}
\begin{equation}
\Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n})
=\Gamma_{i,n0}+\Gamma_{i,n1}\delta_n+\Gamma_{i,n2}\delta_n^2,
\label{eq:du_end}
\end{equation}
and update the sampled input by
\begin{equation}
u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n).
\label{eq:u_update}
\end{equation}

\textbf{Interval-average increment (equivalent discrete effect when needed):}
\begin{equation}
\overline{\Delta u}_i(t_n)\triangleq\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau
=
\Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}.
\label{eq:du_avg}
\end{equation}
We emphasize that \eqref{eq:u_update} uses the end-point increment \eqref{eq:du_end}, while \eqref{eq:du_avg} is used only as an equivalent quantity
(e.g., for averaged-effect interpretations or auxiliary discrete approximations).

\paragraph{Unknown coupled interval mapping to be learned.}
Over $[t_n,t_{n+1}]$, the five-stand coupled deviation-state evolution is represented by an unknown nonlinear mapping
\begin{equation}
\Delta x_i(t_{n+1})
=
\Phi_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta d_i([t_n,t_{n+1}])\Big),
\label{eq:unknown_map}
\end{equation}
where coupling enters via neighbor states and via the underlying coupled physics.
A conceptual equivalent linear discrete form is often written as
\begin{equation}
\Delta x_i(t_{n+1})=M_d\Delta x_i(t_n)+N_d\Delta u_i(t_n)+F_d\Delta d_i(t_n),
\end{equation}
but accurate derivation/identification is difficult in practice due to nonlinearity and varying regimes; hence we adopt a data-driven surrogate.

\paragraph{Sampling domains and disturbance averaging.}
Let $\mathcal{I}_x$ denote the sampling domain (ranges) of $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ for offline data generation,
and let $\mathcal{I}_\Gamma$ denote the sampling domain of $\Gamma_{i,n}$.
Define the interval-average disturbance as
\begin{equation}
\Delta d_i(t_n)\triangleq\frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau.
\label{eq:dist_avg}
\end{equation}

\paragraph{One-step sample generation (coupled five-stand simulation).}
For each interval $[t_n,t_{n+1}]$ and each subsystem $i$:
(1) sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$;
(2) sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and choose $\delta_n$ from a prescribed range;
(3) generate $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ by \eqref{eq:poly_param};
(4) integrate the \emph{five-stand coupled} rolling model on $[t_n,t_{n+1}]$ (e.g., RK4) and record $\Delta x_i(t_{n+1})$.
Thus, one interval-level sample is
\begin{equation}
\mathcal{D}_{i,n}=
\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta x_i(t_{n+1})\Big).
\label{eq:one_sample}
\end{equation}
Repeating yields the one-step dataset
\begin{equation}
S_i=\Big\{
\big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big)
\ \Big|\ j=1,\ldots,J
\Big\},
\label{eq:Si}
\end{equation}
and the overall dataset is $\{S_i\}_{i=1}^{5}$.

\paragraph{$K$-step segment dataset (for multi-step training).}
To support multi-step rollout training and reciprocal-consistency regularization, we organize offline simulations into $K$-step segments.
Starting at $t_n$, sample $\{(\Gamma_{i,n+s},\delta_{n+s})\}_{s=0}^{K-1}$ and simulate $K$ consecutive intervals to obtain
$\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ and $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.
Define the segment sample
\begin{equation}
\mathcal{W}_{i,n}=
\Big\{
(\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1};
(\Delta x_i(t_{n+s+1}))_{s=0}^{K-1}
\Big\},
\label{eq:segment_sample}
\end{equation}
and collect $J_K$ segments to form
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}.
\label{eq:SiK}
\end{equation}

%----------------------------------------------------------------------
\subsection{Residual neural surrogate and detailed training objectives}
%----------------------------------------------------------------------

\paragraph{Control-dependent one-step residual predictor (why include control).}
We learn a control-dependent deviation-state predictor suitable for MPC:
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)+
\mathcal{N}_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i\Big),
\label{eq:fwd_pred}
\end{equation}
where $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change and $\Theta_i$ are parameters.
Including $(\Gamma_{i,n},\delta_n)$ is essential: without control input, the model becomes autoregressive and cannot evaluate trajectories under candidate decisions,
which is required by MPC optimization.

\paragraph{Input vector and dimensions.}
Let $d=3$ and $p=6$. Define
\begin{equation}
X_{i,\mathrm{in}}=
\big[
\Delta x_i(t_n)^\top,\ \Delta x_{Z_i}(t_n)^\top,\ \Gamma_{i,n}^\top,\ \delta_n
\big]^\top
\in\mathbb{R}^{d(1+|Z_i|)+p+1},
\end{equation}
and $\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\to\mathbb{R}^{d}$.

\paragraph{Residual (shortcut) structure (baseline-plus-correction).}
Define the selection matrix
\begin{equation}
\hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}],
\end{equation}
so that $\hat{I}_iX_{i,\mathrm{in}}=\Delta x_i(t_n)$.
Then the predictor can be written as
\begin{equation}
X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i),
\end{equation}
where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$.
This residual form improves training stability and long-horizon rollout behavior because the network learns corrections rather than the full state.

\paragraph{Auxiliary decomposition for varying $\delta_n$ (avoid notation conflicts).}
To enhance robustness when $\delta_n$ varies, decompose
\begin{equation}
\mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i),
\end{equation}
where $\psi_i(\cdot)$ captures low-frequency/scale effects correlated with $\delta_n$ and $\rho_i(\cdot)$ learns remaining nonlinear coupling corrections.

\paragraph{One-step targets.}
For sample $(\Delta x_i(t_n),\Delta x_i(t_{n+1}))$, define
\begin{equation}
\Delta r_i(t_n)\triangleq \Delta x_i(t_{n+1})-\Delta x_i(t_n).
\end{equation}

\paragraph{Multi-step forward rollout (reduce drift).}
Given a segment $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$ and roll forward:
\begin{equation}
\Delta \hat{x}_i(t_{n+s+1})
=
\Delta \hat{x}_i(t_{n+s})+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}),
\Gamma_{i,n+s},\delta_{n+s};\Theta_i
\Big),
\quad s=0,\ldots,K-1.
\label{eq:fwd_roll}
\end{equation}

\paragraph{Backward model and reciprocal consistency (structural regularization).}
Introduce a backward residual model $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$ with the same input dimension.
Set terminal condition $\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K})$ and roll back:
\begin{equation}
\Delta \bar{x}_i(t_{n+s})
=
\Delta \bar{x}_i(t_{n+s+1})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\label{eq:bwd_roll}
\end{equation}
where
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})
=
\big[
\Delta \bar{x}_i(t_{n+s+1})^\top,\ \Delta \hat{x}_{Z_i}(t_{n+s+1})^\top,\ \Gamma_{i,n+s}^\top,\ \delta_{n+s}
\big]^\top.
\end{equation}
Define the reciprocal prediction error
\begin{equation}
E_i(t_n)=\sum_{s=0}^{K}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\right\|^2.
\end{equation}

\paragraph{Loss functions (all terms explicit).}
\begin{align}
L_{\mathrm{1step}}(\Theta_i)
&=
\frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\left\|
(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s}))
-\mathcal{N}_i(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i)
\right\|^2, \\
L_{\mathrm{roll}}(\Theta_i)
&=
\frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\left\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\right\|^2, \\
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&=
\frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\left\|
(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1}))
-\mathcal{B}_i(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i)
\right\|^2, \\
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&=\frac{1}{J_K}\sum_{j=1}^{J_K}E_i^{(j)}(t_n).
\end{align}
Combine them:
\begin{equation}
L_{\mathrm{total}}=\lambda_1L_{\mathrm{1step}}+\lambda_2L_{\mathrm{roll}}+\lambda_3L_{\mathrm{msrp}}+\lambda_4L_{\mathrm{bwd}},
\end{equation}
with $\lambda_\ell>0$ tuned on a validation set.
Optimization uses Adam:
\begin{equation}
\Theta_{i,t+1}=\Theta_{i,t}-\alpha\frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}}+\varepsilon},
\end{equation}
where $\alpha$ is learning rate and $\varepsilon>0$ ensures numerical stability.

%========================
\section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC with the Learned Surrogate}
%========================

\subsection{Control objective and why Nash coordination is necessary}
In the five-stand mill, tensions $T_i$ are shared coupling variables influenced by both stand $i$ and $i+1$ (mainly through speed actions),
and thickness is primarily influenced by roll gap but also affected indirectly by tension coupling.
Therefore, local improvements in thickness at one stand may worsen shared tensions for neighbors, creating a multi-agent coupling conflict.
To achieve coordinated thickness--tension regulation/tracking with manageable computation, we employ a Nash-equilibrium-seeking distributed MPC (RNE-DMPC).

\subsection{Neural predictor as the MPC model constraint (explicit prediction--control interface)}
At time $t_n$, define prediction horizon $N_p$ and control horizon $N_c$ with $N_c\le N_p$.
Let $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$.
Given candidate decision sequence $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, generate predictions by
\begin{equation}
\Delta \hat{x}_i(t_{n+s+1})
=
\Delta \hat{x}_i(t_{n+s})+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}),
\Gamma_{i,n+s},\delta_{n+s};\Theta_i^*
\Big),
\quad s=0,\ldots,N_p-1,
\label{eq:mpc_rollout}
\end{equation}
where $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from neighbor communication in Nash iterations.
Equation \eqref{eq:mpc_rollout} explicitly connects decision variables $\Gamma$ to predicted thickness/tension deviations.

\subsection{Local objective, constraints, and local NLP}
\paragraph{Reference in deviation coordinates.}
Since $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ by definition, the deviation reference is always
\begin{equation}
\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}.
\end{equation}

\paragraph{Decision variables.}
The local decision vector stacks polynomial parameters over the control horizon:
\begin{equation}
\mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\Gamma_{i,n+1}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top\in\mathbb{R}^{pN_c}.
\end{equation}

\paragraph{Local cost (explicit thickness+tension weighting).}
Let $\Delta \hat{x}_i(t_{n+s})=[\Delta\hat{h}_i(t_{n+s}),\Delta\widehat{T}_{i-1}(t_{n+s}),\Delta\widehat{T}_i(t_{n+s})]^\top$.
Choose
\begin{equation}
Q_i=\mathrm{diag}(q_{h,i},q_{T,i-1},q_{T,i})\in\mathbb{R}^{d\times d},\qquad
R_i\in\mathbb{R}^{p\times p}.
\end{equation}
For boundary virtual tensions, set corresponding weights to zero.
Define
\begin{equation}
J_i=
\sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2
+
\sum_{s=0}^{N_c-1}\|\Gamma_{i,n+s}\|_{R_i}^2.
\label{eq:Ji}
\end{equation}

\paragraph{Constraints (absolute bounds and whole-interval increment bounds).}
Absolute input bounds:
\begin{equation}
u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1.
\label{eq:u_bounds}
\end{equation}
Whole-interval increment bounds:
\begin{equation}
\Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max},
\qquad \forall\tau\in[0,\delta_{n+s}].
\label{eq:du_bounds}
\end{equation}
For each scalar quadratic $q(\tau)=a+b\tau+c\tau^2$ on $[0,\delta]$, extrema occur at $\tau=0$, $\tau=\delta$,
and possibly $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$. Hence \eqref{eq:du_bounds} is enforced by checking these points
separately for the gap channel and the speed channel.

\paragraph{Discrete-time propagation of absolute input (consistent with within-interval command).}
Update sampled input using end-point increment:
\begin{equation}
u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}),\qquad
\Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}).
\label{eq:u_prop}
\end{equation}

\paragraph{Local NLP at Nash iteration $l$.}
At each Nash iteration, subsystem $i$ solves the differentiable NLP:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\ J_i
\quad\text{s.t.}\quad
\eqref{eq:mpc_rollout},\ \eqref{eq:u_bounds},\ \eqref{eq:du_bounds},\ \eqref{eq:u_prop}.
\label{eq:local_nlp}
\end{equation}
Because $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_nlp} can be solved by SQP/interior-point methods using automatic differentiation.

\subsection{Nash best-response iteration, termination, and receding-horizon application}
\paragraph{Nash best-response iteration.}
Each stand repeatedly computes a best response to the latest neighbor strategies/predictions:
initialize $\mathbf{\Gamma}_i^{(0)}$ (warm start), then for $l=1,2,\ldots$:
(1) rollout predictions using \eqref{eq:mpc_rollout} with neighbor predictions from the previous iteration;
(2) solve \eqref{eq:local_nlp} to obtain $\mathbf{\Gamma}_i^{(l)}$;
(3) broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors.

\paragraph{Convergence metric.}
\begin{equation}
\varsigma^{(l)}=
\max_i\frac{\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\|_2}{\|\mathbf{\Gamma}_i^{(l-1)}\|_2+\epsilon},
\end{equation}
where $\epsilon>0$ avoids division by zero. Stop when $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$.

\paragraph{Practical safeguards (non-ideal but necessary).}
Best-response iterations are not globally guaranteed to converge for general nonlinear coupled problems.
Thus, we set a maximum iteration number $L_{\max}$.
If the criterion is not met within $L_{\max}$, apply the last iterate.
Optionally, apply relaxation to improve robustness:
\begin{equation}
\mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)},\qquad \beta\in(0,1].
\end{equation}

\paragraph{Receding-horizon implementation.}
After termination at time $t_n$, apply only the first-interval parameters $\Gamma_{i,n}^*$.
Execute the within-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$,
and update the sampled input using \eqref{eq:du_end}--\eqref{eq:u_update}.
Then shift the horizon to $t_{n+1}$ and repeat.

%------------------------
% Optional: very short symbol paragraph (keep if needed)
%------------------------
\paragraph{Key symbols (compact).}
$\Delta x$: deviation-from-reference state; $\Delta u$: discrete increment input.
$u_i=[s_i,v_i]^\top$: roll gap and speed. $\Gamma_{i,n}\in\mathbb{R}^6$: quadratic coefficients.
$\Delta u_i^{\mathrm{end}}$: end-point increment; $\overline{\Delta u}_i$: interval-average increment.
$Z_i$: neighbor set; $N_p,N_c$: prediction/control horizons; $Q_i,R_i$: weights; $L_{\max}$: max Nash iterations.

这版把你原来“详细度该有的解释”都保留了，但把章节压到2 个大 section（每个 section 下面少量 subsection），不会显得你论文“拆得太碎”。

如果你还想再少（比如只要 1 个 section），也可以：我可以把第二个 section 直接并回第一个 section 变成一个大章“Methodology”，下面最多 3 个小节（Notation+Data+Control）。

Answer 9

不一定“不太好”，只是我刚才那版把结构压得很狠（2 个大 section），它更像“期刊方法部分的浓缩写法”。如果你的原稿是论文/毕业论文那种模板（章节更细、逐步铺垫），那你会直觉觉得：** 读起来反而没那么顺**——这很正常。

关键点：结构好不好，不看“分几章”，看读者能不能一路不迷路。你原来的“三段式：数据集→网络→控制（DMPC/Nash）”其实是控制+学习文章里最常见、也最安全的结构。你现在真正需要的，多半不是“彻底换结构”，而是：

把我清理出来的不自洽/易被追问点塞回你原结构里
适度合并一些碎小节，让章标题不那么多，但逻辑仍按原来的叙事走

下面给你一个更稳的结论和改法。

为什么你会觉得“换完结构不如原来”

你原稿大概率是这样铺的：

先讲参数化和数据怎么来（读者接受“学什么”）
再讲网络怎么学（读者接受“怎么学”）
最后讲 MPC/Nash 怎么用（读者接受“怎么用”）

我那版把很多“解释性铺垫”改成了更紧的段落式表达，并且把东西合并了，所以你会觉得：

章节少了，但信息密度更大
读者如果没你这么熟系统，会更吃力
论文模板（尤其毕业论文）也不一定喜欢这么“期刊化”的写法

所以你的直觉是对的：不必硬换成我那种极简结构。

我建议的折中方案（更像“保留原叙事 + 少分点章节”）

最稳的做法：保留你原来的大章顺序（数据集→网络→控制），但把它们合并成一个“方法章”里 3–4 个小节，既不碎，也不跳。

你最该“拿走”的不是结构，而是这几处硬修补（不修很容易被追问）

不管你最终用什么结构，这几处我强烈建议保留（它们是“老师最爱抓”的点）：

1) 多项式参数化：别写成“正确”，写成“选择 + 条件 + 退化特例”

你可以继续用二次多项式，但一定要在正文有一句话落地：

这是 control parameterization（工程近似）
合理条件：执行器/内环有插补/滤波，δ 不太大
ZOH 是特例：令一次、二次项为 0
真要更复杂可换样条/分段线性（代价是决策维度上升）

2) 端点增量 vs 区间平均：一定要区分（这是你原稿里最大“自洽风险”）

你原来那种“用区间平均更新采样点输入”的写法，老师很可能追问：

那么 $u(t_{n+1})$ 到底是什么？

最稳的写法是：

用 端点增量更新采样点： $\Delta u^{end}=\Delta u(\delta)$
区间平均只作为等效量/解释量（可选）

这条建议无论你结构怎么选都要做。

3) Nash 迭代：必须加“工程化兜底”

别只写收敛指标，还要写：

$L_{\max}$ 最大迭代次数
可选松弛系数 $\beta$ （阻尼）
不收敛就用最后一次迭代或上一时刻解 warm-start

否则老师会说“太理想化”。

你现在该怎么选：保留原结构还是用我那版？

给你一个简单判据（不用纠结）：

如果你原结构已经符合模板、章节标题也跟你全文一致：
✅ 保留原结构，把我修补的内容“嵌进去”，并合并一些碎小节即可。
如果你原结构本来就碎到像“2.1.1.1.1”，读者会迷路：
✅ 用我给的折中结构（一个方法章 4 小节），不要用极简 2 section。

你刚才的反馈更像第一种情况：原结构顺，只是需要“清理一致性”。

我可以直接帮你做到“像原文但更好”的版本（不强行换叙事）

你把你现在的目录/章节标题（只要方法相关那几章的标题即可）贴出来，我就能把我写的内容：

按你原来的章节顺序嵌回去
章节数量压到你想要的程度（比如 1 章 3–4 小节）
同时保留详细度，避免“太简洁”
顺手把最容易被追问的点（端点/平均、符号冲突、Nash 兜底）全部落成可直接粘贴的段落

你不需要给全篇正文，先给“方法章的目录”就够，我会直接给你一版结构和你原来一致、但更严谨更细的 LaTeX。

質問

The state increment at time tn+1t_{n+1}tn+1​ is described by the following discrete mapping: \begin{equation*} \Delta x_i(t_{n+1})

With the forward and backward trajectories available on the same segment, we define the multi-step reciprocal prediction error \begin{equation} E_i(t_n)

\big[ \Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n \big]^{\top}. \end{equation} and perform the one-step prediction \begin{equation} \Delta \hat{x}i(t{n+1})

Each rolling stand is treated as a subsystem. The coupled subsystem dynamics are written as \begin{equation} \Delta x_i(t_{n+1})

思考

Clarifying and restructuring cold rolling process explanation

Explaining residual neural network control and RNE-DMPC model

Describing algorithm and state variables

Clarifying control increments and parameterization

Improving content and clarifying control-parameterization

回答