% Created: February 22, 2026
% Model used: GPT-5.2 Thinking by Chat01
%========================
\section{Construction of Dataset}
%========================
The stand-wise state-increment evolution over $[t_n,t_{n+1}]$ can be approximated by the equivalent discrete-time mapping
\begin{equation*}
\Delta x_i(t_{n+1})
=
M_d\,\Delta x_i(t_n)
+
N_d\,\Delta u_i(t_n)
+
F_d\,\Delta d_i(t_n),
\end{equation*}
where $M_d$, $N_d$, and $F_d$ are equivalent discrete matrices.
Here, $\Delta u_i(t_n)$ and $\Delta d_i(t_n)$ denote the \emph{equivalent discrete-time (interval-averaged)}
control increment and disturbance over $[t_n,t_{n+1}]$, respectively, as defined below.
\begin{remark}
In fact, due to complex coupling relationships, it is difficult to establish this discrete mapping directly and accurately from first principles. Therefore, in this paper, we aim to learn an accurate approximation of the mapping from data.
\end{remark}
%========================
\subsection{Interval-level parameterization and one-step dataset}
%========================
Over the local interval $[t_n,t_{n+1}]$ with length $\delta_n=t_{n+1}-t_n$, the control-increment trajectory of subsystem $i$ is parameterized by a quadratic polynomial
\begin{equation}
\Delta u_{i,n}(\tau;\Gamma_{i,n})
=
\gamma_{i,n0}+\gamma_{i,n1}\tau+\gamma_{i,n2}\tau^2,
\qquad \tau\in[0,\delta_n],
\end{equation}
where $\Gamma_{i,n}=[\gamma_{i,n0},\ \gamma_{i,n1},\ \gamma_{i,n2}]^{\top}$ is the local parameter vector.
Here, $\gamma_{i,n0}$ denotes the initial baseline of the increment, while $\gamma_{i,n1}$ and $\gamma_{i,n2}$ describe
the linear and quadratic variation rates, respectively.
Define the equivalent discrete-time (interval-averaged) increments as
\begin{equation}
\begin{aligned}
\Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\
\Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau.
\end{aligned}
\end{equation}
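As a standalone numerical sanity check (the coefficient values below are illustrative, not taken from the mill model), the interval average of a quadratic increment trajectory $\gamma_0+\gamma_1\tau+\gamma_2\tau^2$ equals the closed form $\gamma_0+\gamma_1\delta_n/2+\gamma_2\delta_n^2/3$ used later for the receding-horizon input update:

```python
import numpy as np

def du_poly(tau, g0, g1, g2):
    """Quadratic control-increment trajectory on [0, delta]."""
    return g0 + g1 * tau + g2 * tau**2

def interval_average(g0, g1, g2, delta, num=100_000):
    """Midpoint-rule approximation of (1/delta) * integral_0^delta du(tau) dtau."""
    tau = (np.arange(num) + 0.5) * (delta / num)
    return du_poly(tau, g0, g1, g2).mean()

g0, g1, g2, delta = 0.3, -0.8, 1.5, 0.02   # illustrative values
avg_numeric = interval_average(g0, g1, g2, delta)
avg_closed = g0 + g1 * delta / 2 + g2 * delta**2 / 3
assert abs(avg_numeric - avg_closed) < 1e-10
```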
Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$.
In addition to the local state increment, the neighbor state increments are also included to represent inter-stand coupling.
The specific process is shown in Table~\ref{tab:interval_sample_generation_en}.
\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.15}
\caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$.}
\label{tab:interval_sample_generation_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X}
\toprule
\textbf{Step} & \textbf{Operation} \\
\midrule
1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and its neighbor stack $\Delta x_{Z_i}(t_n)$ from the admissible state range. \\
2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}$ within prescribed ranges of the polynomial coefficients. \\
3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via the polynomial model. \\
4 & \textbf{State propagation:} integrate the coupled mill model on $[t_n,t_{n+1}]$ (e.g., RK4) and record $\Delta x_i(t_{n+1})$. \\
\bottomrule
\end{tabularx}
\end{table}
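The four steps of Table~\ref{tab:interval_sample_generation_en} can be sketched as follows. The two-dimensional linear dynamics `toy_mill_rhs` and all sampling ranges are illustrative placeholders standing in for the five-stand coupled mill model (which is not reproduced here); the sketch only demonstrates the sampling-and-RK4 pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_mill_rhs(x, u):
    """Placeholder coupled dynamics dx/dt = A x + B u (NOT the real mill model)."""
    A = np.array([[-1.0, 0.3], [0.2, -0.8]])
    B = np.array([[0.5], [0.1]])
    return A @ x + B @ np.atleast_1d(u)

def rk4_step(x, u_of_tau, tau, h):
    """One classical RK4 step with a time-varying control trajectory."""
    k1 = toy_mill_rhs(x, u_of_tau(tau))
    k2 = toy_mill_rhs(x + 0.5 * h * k1, u_of_tau(tau + 0.5 * h))
    k3 = toy_mill_rhs(x + 0.5 * h * k2, u_of_tau(tau + 0.5 * h))
    k4 = toy_mill_rhs(x + h * k3, u_of_tau(tau + h))
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def generate_one_sample(delta_n=0.05, n_sub=50):
    # Step 1: state sampling
    x0 = rng.uniform(-1.0, 1.0, size=2)
    # Step 2: parameter sampling, Gamma = (gamma0, gamma1, gamma2)
    g0, g1, g2 = rng.uniform(-0.5, 0.5, size=3)
    # Step 3: control construction via the quadratic polynomial model
    du = lambda tau: g0 + g1 * tau + g2 * tau**2
    # Step 4: state propagation by RK4 over [0, delta_n]
    x, h = x0.copy(), delta_n / n_sub
    for k in range(n_sub):
        x = rk4_step(x, du, k * h, h)
    return {"x_n": x0, "Gamma": (g0, g1, g2), "delta": delta_n, "x_np1": x}

sample = generate_one_sample()
```

In the actual dataset, the same propagation is performed with the coupled five-stand model, and the neighbor stacks are recorded alongside the local state.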
Define the neighbor-state-increment stack as
\begin{equation}
\Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}.
\end{equation}
Accordingly, an interval sample for subsystem $i$ can be represented as
\begin{equation}
\mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\},
\end{equation}
which is used to learn the mapping from the current local and neighbor states and the local control trajectory to the next local state.
For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws, the local one-step training dataset is formed as
\begin{equation}
\begin{split}
S_i=\Big\{&
\big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\\
&\qquad \Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big)
\ \Big|\ j=1,\ldots,J
\Big\}.
\end{split}
\end{equation}
The overall dataset for the five-stand mill can be denoted as $\{S_i\}_{i=1}^{5}$.
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}.
\begin{figure*}[htbp]
\centering
\includegraphics[scale=0.5]{picture/Fig2.pdf}
\caption{Point cloud map of the training dataset.}\label{2}
\end{figure*}
%========================
\subsection{Multi-step rollout segment dataset}
%========================
The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss
and reciprocal-consistency regularization, because these objectives require ground-truth state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples
into $K$-step trajectory segments.
Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling
$\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances),
and integrating the coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$.
Hence, we obtain the state-increment sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks
$\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$.
Define a $K$-step segment sample for subsystem $i$ as
\begin{equation}
\begin{aligned}
\mathcal{W}_{i,n}=
\Big\{&
\big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\
&\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1}
\Big\}.
\end{aligned}
\end{equation}
By repeating the above segment generation, we form the multi-step training set
\begin{equation}
S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}.
\end{equation}
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (by keeping only $s=0$),
thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training.
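A minimal sketch of this segment organization, with hypothetical record field names: consecutive one-step records are sliced into overlapping $K$-step windows, and taking $K=1$ recovers the one-step set:

```python
def make_segments(traj, K):
    """Slice a simulated trajectory into K-step segment samples.

    traj: list of per-interval records, each a dict with keys
          'dx' (state increment at t_n), 'dx_Z' (neighbor stack),
          'Gamma', 'delta', and 'dx_next' (state increment at t_{n+1}).
    """
    segments = []
    for n in range(len(traj) - K + 1):
        window = traj[n:n + K]
        segments.append({
            "inputs":  [(r["dx"], r["dx_Z"], r["Gamma"], r["delta"]) for r in window],
            "targets": [r["dx_next"] for r in window],
        })
    return segments

# usage: a trajectory of 10 intervals yields 10 - K + 1 segments of length K
traj = [{"dx": n, "dx_Z": [n], "Gamma": (0, 0, 0), "delta": 0.05, "dx_next": n + 1}
        for n in range(10)]
segs = make_segments(traj, K=4)
```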
%========================
\section{Construction of Residual Neural Network}
%========================
\subsection{Network Architecture}
Given the training dataset, the neural network model is defined and trained.
The network model essentially learns the evolution mapping of the interconnected cold rolling system over each local interval.
Specifically, for subsystem $i$, the proposed residual network defines a nonlinear mapping
\begin{equation}
\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
where $d$ denotes the dimension of the local state increment $\Delta x_i$,
$|Z_i|$ is the number of neighbors of subsystem $i$,
and $p$ denotes the dimension of the local input-parameter vector $\Gamma_{i,n}$.
For a single-input case with quadratic parameterization, $p=3$.
Accordingly, we define the residual mapping
\begin{equation}
\begin{aligned}
\Delta r_i &= \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),\\
X_{i,\text{in}} &\in \mathbb{R}^{d(1+|Z_i|)+p+1},\qquad
\Delta r_i \in \mathbb{R}^d ,
\end{aligned}
\end{equation}
where $\Theta_i$ represents the trainable parameters of the network for subsystem $i$, and $X_{i,\text{in}}$ denotes the concatenated input vector for subsystem $i$ in the current sampling interval.
To explicitly incorporate the residual structure, the local state component in the input is passed through an identity shortcut
and added to obtain the one-step prediction.
Let $\hat{I}_i$ be a linear selection matrix whose block form is defined by
\begin{equation}
\hat{I}_i = [I_d,\ 0_{d\times(d|Z_i|+p+1)}].
\end{equation}
To improve robustness when the sampling interval length varies or becomes relatively large, we introduce an auxiliary branch inside $\mathcal{N}_i$:
\begin{equation}
\mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq
\eta_i(X_{i,\text{in}};\Theta_{\eta_i}) + \mathcal{I}_i(X_{i,\text{in}};\theta_i),
\end{equation}
where $\eta_i$ can be implemented by a lightweight feedforward network and
$\mathcal{I}_i$ denotes the remaining residual branch.
When $\eta_i\equiv 0$, the model reduces to the standard residual form.
Hence, the one-step prediction of the local state increment at $t_{n+1}$ can be written as
\begin{equation}
X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),
\label{333}
\end{equation}
where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$.
\begin{remark}
The predictor in \eqref{333} admits a baseline-plus-correction form: the shortcut term propagates the current local increment $\Delta x_i(t_n)$, while the residual network learns the one-step correction.
This structure renders the model interpretable as a data-driven adjustment to a persistence prior, with the correction capturing unmodeled nonlinearities and inter-stand coupling via $\Delta x_{Z_i}$ under varying operating conditions.
\end{remark}
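The shortcut-plus-residual prediction can be sketched as follows; the dimensions ($d=3$, $|Z_i|=2$, $p=3$) and the random two-layer network standing in for the trained $\mathcal{N}_i$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, nZ, p = 3, 2, 3                       # illustrative dimensions (single input, quadratic)
n_in = d * (1 + nZ) + p + 1              # input = [dx_i, dx_Z, Gamma, delta]

# Selection matrix I_hat = [I_d, 0] picks the local state out of the input
I_hat = np.hstack([np.eye(d), np.zeros((d, n_in - d))])

# A random two-layer MLP stands in for the residual branch N_i
W1, b1 = 0.1 * rng.standard_normal((16, n_in)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((d, 16)), np.zeros(d)

def residual_net(x_in):
    return W2 @ np.tanh(W1 @ x_in + b1) + b2

def one_step_predict(x_in):
    """X_out = I_hat X_in + N_i(X_in): persistence prior plus learned correction."""
    return I_hat @ x_in + residual_net(x_in)

x_in = rng.standard_normal(n_in)
x_out = one_step_predict(x_in)
```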
For the $j$-th one-step data sample in $S_i$, we set
\begin{equation}
\begin{aligned}
X_{i,\text{in}}^{(j)} =
\big[
\Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\
\Gamma_{i,n}^{(j)},\ \delta_n^{(j)}
\big]^{\top}.
\end{aligned}
\end{equation}
The learning target remains the state-increment residual
\begin{equation}
\Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n).
\end{equation}
%========================
\subsection{Training, Learned Model, and System Prediction}
%========================
To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability, we train the forward predictor jointly with an auxiliary backward residual model and impose a multi-step reciprocal-consistency regularization over a $K$-step segment.
In addition to the forward residual predictor, we construct a backward residual network for subsystem $i$,
\begin{equation}
\mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d},
\end{equation}
parameterized by $\bar{\Theta}_i$. For the backward step associated with the interval $[t_n,t_{n+1}]$, we define
\begin{equation}
\begin{aligned}
X_{i,\mathrm{in}}^{b}
&=
\big[
\Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),
\Gamma_{i,n},\ \delta_n
\big]^{\top},\\
X_{i,\mathrm{out}}^{b}
&=
\hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i),
\end{aligned}
\end{equation}
where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$. Accordingly, the supervised backward residual target is
\begin{equation}
\Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}).
\end{equation}
Given a segment sample $\mathcal{W}_{i,n}$, we initialize the forward rollout by
\begin{equation}
\Delta \hat{x}_i(t_n)=\Delta x_i(t_n),
\end{equation}
and apply the forward predictor recursively for $K$ steps:
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\\
&\qquad\qquad\qquad
\Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i
\Big),
\quad s=0,\ldots,K-1.
\end{aligned}
\end{equation}
After obtaining the terminal forward prediction, we set the terminal condition for the backward rollout as
\begin{equation}
\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}),
\end{equation}
and roll back using $\mathcal{B}_i$ along the same segment:
\begin{equation}
\begin{aligned}
\Delta \bar{x}_i(t_{n+s})
&=
\hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s})
+
\mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big),
\quad s=K-1,\ldots,0,
\end{aligned}
\end{equation}
where the backward input at time $t_{n+s}$ is
\begin{equation}
X_{i,\mathrm{in}}^{b}(t_{n+s})=
\big[
\Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),
\Gamma_{i,n+s},\ \delta_{n+s}
\big]^{\top},
\end{equation}
and $\Delta \hat{x}_{Z_i}(t_{n+s+1})$ is obtained from the same forward rollout.
The per-segment reciprocal-consistency error is defined as
\begin{equation}
E_i(t_n)=
\sum_{s=0}^{K}
\left\|
\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})
\right\|^2.
\end{equation}
We then train the forward and backward networks jointly by minimizing the following overall objective terms:
\begin{equation}
\begin{aligned}
L_{\mathrm{1step}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big)
-\mathcal{N}_i\!\left(
X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{bwd}}(\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1}
\Big\|
\big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big)
-\mathcal{B}_i\!\left(
X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i
\right)
\Big\|^2,\\[2mm]
L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm]
L_{\mathrm{roll}}(\Theta_i)
&= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K}
\Big\|
\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})
\Big\|^2.
\end{aligned}
\end{equation}
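The forward rollout, backward rollback, and the resulting consistency and rollout terms for a single segment can be sketched with toy linear maps standing in for $\mathcal{N}_i$ and $\mathcal{B}_i$ (purely illustrative, no trained networks involved):

```python
import numpy as np

rng = np.random.default_rng(2)
d, K = 3, 5
A_f = 0.05 * rng.standard_normal((d, d))   # toy stand-in for the forward residual N_i
A_b = -A_f                                  # toy stand-in for the backward residual B_i

dx_true = [rng.standard_normal(d)]          # synthetic ground-truth segment
for _ in range(K):
    dx_true.append(dx_true[-1] + A_f @ dx_true[-1] + 0.01 * rng.standard_normal(d))

# forward rollout: dx_hat(t_{n+s+1}) = dx_hat(t_{n+s}) + N_i(.)
dx_hat = [dx_true[0]]
for s in range(K):
    dx_hat.append(dx_hat[-1] + A_f @ dx_hat[-1])

# backward rollback initialized at the terminal forward prediction
dx_bar = [None] * (K + 1)
dx_bar[K] = dx_hat[K]
for s in range(K - 1, -1, -1):
    dx_bar[s] = dx_bar[s + 1] + A_b @ dx_bar[s + 1]

# reciprocal-consistency error and rollout loss for this segment
E = sum(np.sum((dx_hat[s] - dx_bar[s]) ** 2) for s in range(K + 1))
L_roll = sum(np.sum((dx_true[s] - dx_hat[s]) ** 2) for s in range(1, K + 1))
```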
After training, let $\Theta_i^*$ denote the optimized parameters. The learned one-step predictor is
\begin{equation}
\Delta \hat{x}_i(t_{n+1})
=
\Delta x_i(t_n)
+
\mathcal{N}_i\!\Big(
\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,
\Gamma_{i,n},\,\delta_n;\,\Theta_i^*
\Big).
\end{equation}
By applying this predictor recursively, we obtain a network model that predicts the system trajectory over long horizons. Finally, the network parameters are optimized using the Adam optimizer:
\begin{equation}
\Theta_{i,t+1} = \Theta_{i,t} - \eta \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon},
\end{equation}
where $\Theta_{i,t}$ denotes the current parameters, $\Theta_{i,t+1}$ the updated parameters, $\eta$ the learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ the bias-corrected first and second moment estimates, and $\varepsilon$ a small constant for numerical stability. Figure~\ref{fig:rnn_logic} illustrates the overall structure.
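The Adam update with bias-corrected moment estimates can be sketched as a generic implementation on a toy quadratic objective (this is not the paper's training loop; the objective and hyperparameters are illustrative):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(theta) = ||theta||^2 / 2, whose gradient is theta itself
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, theta.copy(), m, v, t, eta=0.05)
```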
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.85]{picture/x6.pdf}
\caption{Logic diagram of the residual neural network.}
\label{fig:rnn_logic}
\end{figure}
\section{Nash Equilibrium-Based RNE-DMPC}
The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation.
As a result, changes in operating conditions or control actions at one stand can affect both upstream and downstream stands,
making centralized online optimization over high-dimensional decision variables computationally demanding.
To mitigate this issue, we decompose the global predictive-control problem into local subproblems associated with individual stands.
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers.
Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate the distributed coordination process as a Nash-equilibrium-seeking iteration.
Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed model predictive control method (RNE-DMPC)
to achieve coordinated thickness--tension regulation. The overall control structure is shown in Figure~\ref{4}.
\begin{figure*}[htbp]
\centering
\includegraphics[width=\linewidth]{picture/x2.pdf}
\caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4}
\end{figure*}
For interconnected systems such as tandem cold rolling mills, distributed MPC can reduce the computational burden of online optimization
and improve scalability through parallel local optimization with limited information exchange.
In RNE-DMPC, each subsystem exchanges predicted trajectories and decision variables through a communication module.
Coordination among subsystems is achieved by iteratively seeking a Nash equilibrium, where each local controller computes a best response
to the most recent strategies of its neighbors.
In the proposed architecture, each stand-level controller regulates its local actuators according to the assigned control objectives and constraints, while coordination is maintained through information exchange with neighboring stands.
Thus, the interconnected cold rolling system can achieve distributed thickness-tension control based on Nash equilibrium.
\subsection{Subsystem Prediction and Optimization}
The local state-increment dynamics of subsystem $i$ can be abstracted as
\begin{equation}
\Delta x_i(t_{n+1})
=
f_i\!\big(\Delta x_i(t_n),u_i(t_n)\big)
+
\sum_{k \in Z_i} g_{ik}\!\big(\Delta x_k(t_n),u_k(t_n)\big),
\end{equation}
where $\Delta x_i(t_n)$ and $u_i(t_n)$ denote the state increment and control input of subsystem $i$,
$Z_i$ is the neighbor set, and $g_{ik}$ characterizes the coupling effect.
Define the neighbor-state-increment stack as
\begin{equation}
\Delta x_{Z_i}(t_n) =
\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}.
\end{equation}
\textbf{Local polynomial parameterization.}
Over each local interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, the control increment trajectory of subsystem $i$
is parameterized by a quadratic polynomial:
\begin{equation}
\begin{aligned}
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
&=
\gamma_{i,n+s,0}
+\gamma_{i,n+s,1}\tau
+\gamma_{i,n+s,2}\tau^2,\\
&\qquad \tau \in [0,\delta_{n+s}],
\end{aligned}
\end{equation}
where $\Gamma_{i,n+s}=[\gamma_{i,n+s,0},\ \gamma_{i,n+s,1},\ \gamma_{i,n+s,2}]^{\top}\in\mathbb{R}^{p}$
and $p=3$ for the single-input case.
\textbf{Neural-network-based prediction.}
Using the trained residual neural network surrogate, subsystem $i$ predicts its one-step state increment by
\begin{equation}
\begin{aligned}
\Delta \hat{x}_i(t_{n+s+1})
&=
\Delta \hat{x}_i(t_{n+s})
+
\mathcal{N}_i\!\Big(
\Delta \hat{x}_i(t_{n+s}),\,
\Delta \hat{x}_{Z_i}(t_{n+s}), \\
&\qquad\qquad
\Gamma_{i,n+s},\,
\delta_{n+s};\,
\Theta_i^*
\Big), \\
&\qquad s=0,\ldots,N_p-1,
\end{aligned}
\end{equation}
where $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communicated neighbor predictions.
\textbf{Decision variables.}
Optimize the local parameter sequence over the control horizon $N_c$:
\begin{equation}
\begin{aligned}
\mathbf{\Gamma}_i(t_n)
&=
\big[
\Gamma_{i,n}^\top,\,
\Gamma_{i,n+1}^\top,\,
\ldots,\,
\Gamma_{i,n+N_c-1}^\top
\big]^\top \\
&\in \mathbb{R}^{pN_c}.
\end{aligned}
\end{equation}
\textbf{Local objective.}
The local cost function of subsystem $i$ is defined as
\begin{equation}
\begin{aligned}
J_i
&=
\sum_{s=1}^{N_p}
\big\|
\Delta \hat{x}_i(t_{n+s}) - \Delta x_{\mathrm{ref}}(t_{n+s})
\big\|_{Q_i}^2 \\
&\quad +
\sum_{s=0}^{N_c-1}
\big\|
\Gamma_{i,n+s}
\big\|_{R_i}^2,
\end{aligned}
\end{equation}
where $Q_i$ and $R_i$ are weighting matrices and $\Delta x_{\mathrm{ref}}$ is the reference for the state increment. $\Gamma_{i,n+s}$ is the local polynomial-parameter vector of the control increment for subsystem $i$ over the interval $[t_{n+s},t_{n+s+1}]$.
\textbf{Constraints.}
Typical constraints include bounds on absolute inputs and increment trajectories:
\begin{align}
u_{i,\min} &\le u_i(t_{n+s}) \le u_{i,\max},\\
\Delta u_{i,\min}
&\le
\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})
\le
\Delta u_{i,\max}, \notag \\
&\hspace{3.6cm}
\forall\tau\in[0,\delta_{n+s}].
\end{align}
In practice, the interval-wise bound can be checked by evaluating $\Delta u_{i,n+s}(0)$, $\Delta u_{i,n+s}(\delta_{n+s})$, and the quadratic extremum (when it falls inside the interval).
To enforce the absolute-input constraints consistently within the prediction horizon, we update the absolute input using the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_{n+s})
&=
\frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau \\
&=
\gamma_{i,n+s,0}
+\gamma_{i,n+s,1}\frac{\delta_{n+s}}{2}
+\gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3},
\end{aligned}
\end{equation}
and propagate
\begin{equation}
\begin{aligned}
u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \\
u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1.
\end{aligned}
\end{equation}
\textbf{Local optimization problem.}
At Nash-iteration index $l$, subsystem $i$ solves
\begin{equation}
\mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\; J_i \quad \text{s.t. (40)--(43)}.
\end{equation}
\subsection{Nash Equilibrium Coordination Iteration}
The Nash equilibrium is computed via distributed best-response iterations, summarized in Table~\ref{tab:nash_iter_en}.
\begin{table}[t]
\centering
\small
\renewcommand{\arraystretch}{1.12}
\setlength{\tabcolsep}{3.5pt}
\caption{Distributed Nash best-response iteration for RNE-DMPC.}
\label{tab:nash_iter_en}
\begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X}
\toprule
\textbf{Step} & \textbf{Description} \\
\midrule
A &
Initialize the iteration index $l=0$ and the parameter sequences $\mathbf{\Gamma}_i^{(0)}$ for all subsystems. \\
B &
Using the surrogate predictor, compute the predicted trajectory $\{\Delta \hat{x}_i(t_{n+s})\}_{s=1}^{N_p}$
given $\mathbf{\Gamma}_i^{(l)}$ and the latest neighbor predictions. \\
C & Solve the local optimization problem to update $\mathbf{\Gamma}_i^{(l)}$. \\
D &
Broadcast $\mathbf{\Gamma}_i^{(l)}$ and the predicted trajectories to the communication system. \\
E &
Update neighbor predictions using received information;
re-generate local predictions if needed. \\
F & Compute the maximum relative change $\varsigma^{(l)}$. \\
G &
If $\varsigma^{(l)}$ falls below a prescribed tolerance, stop and set $\mathbf{\Gamma}_i^{*}=\mathbf{\Gamma}_i^{(l)}$;
otherwise set $l \leftarrow l+1$ and repeat Steps B--F. \\
\bottomrule
\end{tabularx}
\end{table}
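The best-response loop of Table~\ref{tab:nash_iter_en} can be sketched on a two-subsystem quadratic game in which each local best response is analytic. In the actual scheme each response is the surrogate-constrained local optimization, so this is only a structural illustration (cost form and coupling strength are assumptions):

```python
import numpy as np

# Two coupled quadratic local costs J_i(G_i, G_j) = (G_i - a_i + c * G_j)^2,
# standing in for the surrogate-based local optimization problems.
a = np.array([1.0, -0.5])
c = 0.3                                     # coupling strength (|c| < 1 gives a contraction)

def best_response(i, G):
    """argmin over G_i of J_i, given the neighbor's latest strategy."""
    j = 1 - i
    return a[i] - c * G[j]

G = np.zeros(2)                             # Step A: initialization
for l in range(1, 100):                     # Steps B--G: best-response iteration
    G_old = G.copy()
    G = np.array([best_response(i, G_old) for i in range(2)])        # Steps B--D
    varsigma = np.linalg.norm(G - G_old) / (np.linalg.norm(G_old) + 1e-9)  # Step F
    if varsigma <= 1e-8:                    # Step G: stop near the equilibrium
        break
```

At the returned point neither player can improve unilaterally, i.e., $G_i \approx a_i - c\,G_j$ holds for both subsystems, which is precisely the Nash fixed-point condition of this toy game.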
The convergence metric in Step~F is defined as
\begin{equation}
\begin{aligned}
\varsigma^{(l)}
&=
\max_i
\frac{\left\|
\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}
\right\|_2}{
\left\|
\mathbf{\Gamma}_i^{(l-1)}
\right\|_2+\epsilon},
\end{aligned}
\end{equation}
where $\epsilon>0$ avoids division by zero.
\textbf{Receding-horizon implementation.}
Only the first-interval parameters $\Gamma_{i,n}^{*}$ are applied.
The control increment trajectory over $[t_n,t_{n+1}]$ is
\begin{equation}
\Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^{*}),
\quad \tau\in[0,\delta_n].
\end{equation}
To obtain the equivalent discrete-time control increment used for updating the absolute input, we use the interval average:
\begin{equation}
\begin{aligned}
\Delta u_i(t_n)
&=
\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau \\
&=
\gamma_{i,n,0}^{*}
+
\gamma_{i,n,1}^{*}\frac{\delta_n}{2}
+
\gamma_{i,n,2}^{*}\frac{\delta_n^2}{3}.
\end{aligned}
\end{equation}
Then the absolute control input is updated by
\begin{equation}
u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n),
\end{equation}
which ensures smooth input evolution and avoids abrupt actuator changes.
The overall control flow chart is shown in Fig.~\ref{liu}.
\begin{figure}[htbp]
\centering
\includegraphics[width=\linewidth]{picture/x5.pdf}
\caption{The overall system control flow chart.}\label{liu}
\end{figure}这是我论文的第二到第三节技术部分。老师说写的太模糊了。没有说清楚变量的含义都是啥,张力和厚度在哪,公式(2)可不可以在五机架冷连轧的情境里这样参数化。第三章的网络没讲清楚,例如:有u和没u的区别,为什么这么设计网络模型等等。第四章没看出来控制策略怎么设计,参考信号不清楚,怎么把优化目标和优化函数说清晰,冲突没看出来,怎么解决冲突问题,优化问题最终怎么解。还有其他的一些问题,控制和预测怎么衔接的,预测怎么给控制服务等等。帮我理清一下思路哦,重新写,给出正确的思路
下面我先把你老师提到的“模糊点”逐条对应成一条清晰的技术主线,然后给你一份可以直接替换进论文的**“更清楚的第二~四章写法(含变量定义、厚度/张力位置、参数化合理性、网络设计动机、控制目标/参考/冲突/求解、预测与控制的衔接)”**。你可以按你论文的具体变量维度(比如每个机架状态到底包含哪些量)再做小改。
你现在的写法里,读者会卡在 6 个地方:
你需要做的不是“再写更多公式”,而是把每一章都改成:先把对象定义清楚 → 再说明为什么这么建模/这么参数化 → 再给算法流程(输入/输出/求解器)。
你现在把 Δx 叫“state increment”,但又用它做状态演化。为了让读者不迷路,强烈建议你在论文里二选一并固定:
同理
这样你的离散映射“Δx → Δx”就完全合理:它是偏差系统。
那就必须写清楚
但这样进入 MPC 会更绕,而且你的残差网络“shortcut 直接加 Δx(t_n)”会更难解释其物理意义。
建议你用方案 A,把“increment”改成“deviation”或“deviation state”,你老师提的“变量含义不清”会立刻好一半。
五机架冷连轧典型关注两类输出:
一种很常见、读者也最好理解的状态选取(仅示例,你按你仿真模型调整维度):
其中
邻居集合也要落地写清楚(五机架串联系统最简单):
你的参数化不是为了“数学好看”,而是为了解决冷连轧控制里两个现实点:
因此你可以把区间内的控制偏差写成基函数展开(你这里选二次):
这等价于“用 3 个参数描述一个采样区间内的平滑控制轨迹”,比 piecewise-constant 更能表达渐变/缓变,也更利于约束“采样区间内任何时刻不越界”。
然后你定义的离散等效输入用“区间平均”:
这一步要强调一句:
控制优化决策变量是 ,但为了与离散状态更新与执行器指令一致,我们用区间平均将连续轨迹映射为离散等效控制量,同时区间内约束仍用轨迹检查保证。
再补一句你老师关心的:“可不可以在五机架情境这样参数化?”
答案是可以,但要加一句适用条件:采样周期相对执行器带宽不能太大、且实际指令在区间内变化可被低阶多项式近似。你可以这样写:
当采样周期 相对液压/速度内环带宽不至于过大时,区间内的控制偏差轨迹可用低阶多项式近似;二次项在不显著增加决策维度的前提下提供“曲率自由度”,能更好拟合加减速/辊缝渐变等工况。
这句话你可以直接写进论文:
所以你现在的写法必须明确:你训练的是
而不是“仅靠状态自回归”。
你现在写得像“为了鲁棒性加个分支”,但读者不知道它解决啥。建议这样讲:
当 变大时,系统变化幅度变大,单一残差网络容易对“时间尺度”敏感。你引入
一个更清晰的解释是:
你老师说“没看出来控制策略怎么设计”,通常是因为你缺了 5 件事:
下面给你一份“更清楚的写法模板”。
你可以把下面当成第二~四章的“清晰版本”,再按你实际状态/控制维度微调。
(建议放一个符号表/文字说明)
其中 为第 i 架出口厚度, 为 (i,i+1) 间张力; 为名义工况/参考轨迹。
若只选单输入(例如辊缝),则 。
并定义邻居堆叠
这一段写清楚后,厚度和张力“在哪”就明确了:它们就是 的分量。
由于五机架系统存在显著耦合、摩擦与材料参数不确定等因素,难以获得高精度的一阶机理离散模型。本文采用数据驱动方式学习如下局部离散演化关系:
其中 为未知非线性映射; 表示该采样区间内的控制轨迹信息。
为与离散预测/执行一致,引入区间平均等效控制偏差:
相应地,扰动也可用区间平均表示 。
为在不显著增加在线优化维度的前提下描述采样区间内的平滑控制变化,本文将区间控制偏差轨迹用二次多项式参数化:
其中 。
该参数化可近似描述轧机执行器在采样周期内的渐变响应,并天然保证控制轨迹在区间内可检查约束(见第4章)。
由此区间平均等效输入为
(这里加一句适用条件/解释:采样周期与执行器带宽关系、平滑性需求。)
一步样本(监督学习)
对每个区间 ,构造训练输入
输出标签为 或残差 。
K 步段样本(用于 rollout 与一致性正则)
组织连续 K 个区间得到段样本 ,包含
这一段的意义要写明:用于训练时约束长期递推误差(drift),让模型可用于 MPC 的长预测域。
目标是学习一个可控的一步预测模型:
强调: 进入输入,保证模型能响应控制变化,从而可用于 MPC 在线优化。
你原来的 可以这样解释并写清楚:
其中 用于学习与时间尺度 强相关的低频项(如随 的尺度变化), 学习剩余耦合非线性细节;当 退化为标准 ResNet。
要用于 MPC,模型必须在 步滚动中不发散。仅用一步 MSE 容易出现“短期准、长期漂”。
因此引入:
(这段你现在写了很多公式,但缺“这一招解决什么问题”的解释。补上这两三句话,老师就容易认可。)
控制目标:
参考信号来源(你必须选一种说法并固定):
冲突来源(耦合):
在五机架串联系统中,机架 的速度/辊缝等动作会改变带材流动与变形,从而影响相邻机架间张力;即 “局部最优”可能使邻居张力恶化。
这就是为什么需要分布式协调(Nash 迭代)而不是各做各的。
在 MPC 每次在线优化中,候选决策变量是 。给定 和邻居的最新策略/预测,利用已训练网络递推得到预测轨迹:
这条递推就是“控制优化的模型约束”。 没这条,MPC 就无从谈起。
若 ,则可写
(你可以把“厚度权重更大/张力权重更大”的工艺解释写一句。)
区间内控制轨迹约束:
实现上检查 以及二次极值点 (若落在区间内)。
绝对输入约束通过区间平均更新:
从而 enforce:
这段要明确说:Nash 迭代的本质是**“耦合冲突通过重复互相回应找到一个稳定折中点”**,它不是强行让大家一致(那是共识/ADMM),而是让谁也不愿意单方面改变(均衡)。
每个局部问题通常是一个带 NN 预测约束的非线性规划(NLP)。因为 NN 可微,可以采用:
你不需要写“我用 IPOPT”,但至少要写:
本文将局部优化问题视为可微 NLP,利用梯度信息进行数值求解;每次 Nash 迭代求得 后广播更新。
在每个采样时刻 :
这段写出来,读者就能看出“预测—优化—执行—再测量”的闭环链条,你老师说的“控制和预测怎么衔接”就解决了。
如果你不想大改结构,至少做这 10 个动作:
如果你愿意,我也可以在你“真实的状态向量/控制量定义”(比如你到底选了辊缝、速度、弯辊还是张力给定等)基础上,把上面的“示例版”完全改成与你论文一致的最终可直接粘贴版:你只要告诉我两件事就行——
只有厚度和张力。ui到底是哪些执行量你理解的是什么
在你这个“五机架冷连轧、状态只含厚度+张力”的设定下,我对 的理解是:
是机架 在在线控制中“能直接下发/能被控制器改变的执行量(manipulated variables)”的集合,也就是你 MPC 的决策变量最终要落到的“设定值/指令”。
在冷连轧里,能同时影响厚度与张力、且工业上最常用、最容易让审稿人/老师接受的选择通常是下面这两个:
所以写成:
这样你第四章要做“厚度–张力协调控制”,逻辑就非常顺:
这也能直接回应老师质疑:“有 和没 的区别”:没 你就根本没法通过优化改变预测张力/厚度。
如果 是二维(辊缝+速度),那你现在的参数化只要“对每个输入通道各来一套二次系数”即可:
于是
这点你在论文里一定要一句话点明:
“当 时, 由各输入通道的多项式系数组合而成,维度为 。”
有些产线确实存在“张力内环”(通过调速实现张力跟踪),这时外环 MPC可以不直接给 ,而是给一个张力参考 :
但注意:如果你这么定义 ,你的数据集离线仿真必须把“张力内环控制器”也包含进被控对象,否则网络学到的映射和现场执行不一致。老师如果问“你到底控制的是速度还是张力参考”,你必须说清楚层级结构。
在“只关心厚度+张力”且你要做 RNE-DMPC(分布式协调) 的语境里,我建议你论文里直接采用最经典的一句:
每机架采用两个可操纵量:辊缝(压下)与速度设定。辊缝主导厚度,速度差主导机架间张力;两者通过带钢变形与张力传播产生耦合,因此需要分布式协调。
这样第二章的参数化、第三章的 NN 输入、第四章的优化目标/冲突来源都能一条线串起来。
如果你愿意,我可以按你“状态只含厚度+张力”的约束,帮你把文中所有出现的 、、、 以及第4章代价函数里“”改写成明确的厚度参考 和 张力参考 ,并把“冲突是什么、Nash 怎么解决、每个子问题怎么解、滚动怎么执行”那一段写成你论文可直接替换的 LaTeX 段落。你只要告诉我你希望:
是辊缝和机架速度。我只考虑五机架的情况。帮我上述所有问题改正,解释清楚改成完整的latex段落,改写的文字不要省略任何内容。
latex%======================== \section{Construction of Dataset} %======================== \subsection{Five-stand tandem mill setting and notation (variables, thickness and tension)} \label{subsec:notation_5stand} Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the $n$-th sampling interval length. \textbf{Thickness and tension variables.} Let $h_i(t)$ denote the exit thickness of stand $i$, and let $T_i(t)$ denote the inter-stand tension between stand $i$ and $i+1$ (i.e., $T_i$ is defined for $i=1,2,3,4$). The control goal of this paper is to achieve coordinated regulation/tracking of $\{h_i\}_{i=1}^5$ and $\{T_i\}_{i=1}^4$ in the presence of strong inter-stand coupling. \textbf{State (deviation) definition.} To avoid ambiguity, throughout this paper the symbol ``$\Delta$'' attached to a \emph{state} denotes a \emph{deviation} (tracking error) with respect to a prescribed reference (or nominal) trajectory: \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \end{equation} For the five-stand case, we choose a stand-wise local state vector that explicitly contains the thickness and the adjacent tensions: \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3, \label{eq:local_state_def} \end{equation} with the boundary convention $\Delta T_{0}(t)\equiv 0$ and $\Delta T_{5}(t)\equiv 0$ so that all stands share a unified dimension $d=3$. In the cost function and constraints, boundary ``virtual'' tensions can be assigned zero weights so that they do not affect the optimization. 
\textbf{Neighbor set (five-stand chain coupling).} The mill coupling is dominated by tension propagation between adjacent stands, therefore we use the chain neighbor sets \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad Z_5=\{4\}. \label{eq:neighbor_set} \end{equation} Define the neighbor-state stack (used to encode inter-stand coupling information) as \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}. \label{eq:neighbor_stack} \end{equation} \textbf{Control inputs (actuators) and control increments.} For each stand $i$, the manipulated variables are chosen as the \emph{roll gap (screw-down/hydraulic gap)} and the \emph{stand speed}: \begin{equation} u_i(t)= \begin{bmatrix} s_i(t)\\ v_i(t) \end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. \label{eq:ui_def} \end{equation} To ensure smooth actuator evolution and to match typical industrial implementations, the optimization is conducted on \emph{control increments} (sample-to-sample changes) rather than absolute inputs: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) = \begin{bmatrix} \Delta s_i(t_n)\\ \Delta v_i(t_n) \end{bmatrix}. \label{eq:du_increment_def} \end{equation} Note the distinction: $\Delta x$ is a \emph{deviation state} (tracking error), while $\Delta u$ is an \emph{input increment}. \textbf{Disturbance.} Let $d_i(t)$ denote exogenous disturbances (e.g., entry thickness fluctuation, friction variation, material property changes, etc.). We use $\Delta d_i(t)$ to denote their interval-level equivalent representation (defined later). 
\subsection{Discrete interval mapping and data-driven motivation} \label{subsec:discrete_mapping} The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by an equivalent discrete-time mapping \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big), \label{eq:true_unknown_mapping} \end{equation} where $\Phi_i(\cdot)$ is generally nonlinear and strongly coupled due to tension propagation and rolling deformation interactions. A commonly used \emph{conceptual} local linear discrete form is \begin{equation*} \Delta x_i(t_{n+1}) = M_d\,\Delta x_i(t_n) + N_d\,\Delta u_i(t_n) + F_d\,\Delta d_i(t_n), \end{equation*} where $M_d,N_d,F_d$ are equivalent discrete matrices and $\Delta u_i(t_n),\Delta d_i(t_n)$ are interval-level equivalent inputs/disturbances. However, in a practical five-stand cold rolling mill, accurate identification of $(M_d,N_d,F_d)$ from first principles is difficult, because of complex coupling, unmodeled nonlinearities, and time-varying operating conditions. \begin{remark} In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish the discrete mapping based on first principles. Therefore, in this paper, we aim to learn an accurate approximation of \eqref{eq:true_unknown_mapping} from data, and then embed the learned surrogate into distributed MPC. \end{remark} %======================== \subsection{Interval-level parameterization and one-step dataset} %======================== To construct the supervised dataset for training, we locally parameterize the \emph{control increment trajectory} within each sampling interval. For the local interval $[t_n,t_{n+1}]$, define the interval length \begin{equation} \delta_n = t_{n+1}-t_n , \end{equation} and introduce a local time variable $\tau\in[0,\delta_n]$. 
\textbf{Why interval-level parameterization is valid in five-stand cold rolling.} Although the controller updates the setpoints at discrete instants $t_n$, the physical actuators (hydraulic gap and drive speed loops) evolve continuously within $[t_n,t_{n+1}]$ and are typically implemented with smoothing/ramps. Moreover, abrupt changes of roll gap/speed can excite tension oscillations and degrade thickness stability. Therefore, representing the within-interval increment trajectory by a low-order polynomial provides: (i) a compact finite-dimensional decision variable for optimization; (ii) a smooth within-interval command profile; (iii) a convenient way to enforce \emph{continuous-time} bounds in the whole interval. This approximation is appropriate when the sampling interval is not excessively large compared to actuator bandwidth and the within-interval evolution can be well captured by a low-order basis. \textbf{Vector quadratic polynomial parameterization (two inputs).} Using a quadratic polynomial basis for each input channel, the control increment trajectory on $[t_n,t_{n+1}]$ is parameterized as \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) = \Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2, \qquad \tau\in[0,\delta_n], \label{eq:du_poly_vector} \end{equation} where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ are vector coefficients. Equivalently, for $n_u=2$ (roll gap and speed), one may write component-wise: \begin{equation} \begin{aligned} \Delta s_{i,n}(\tau) &= \gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\\ \Delta v_{i,n}(\tau) &= \gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2. \end{aligned} \end{equation} Define the local parameter vector by stacking all coefficients: \begin{equation} \Gamma_{i,n}\triangleq \big[ (\Gamma_{i,n0})^\top,\, (\Gamma_{i,n1})^\top,\, (\Gamma_{i,n2})^\top \big]^\top \in\mathbb{R}^{p}, \qquad p=3n_u=6. 
\label{eq:Gamma_dim} \end{equation} Here, $\Gamma_{i,n0}$ denotes the initial baseline of the increments, while $\Gamma_{i,n1}$ and $\Gamma_{i,n2}$ describe the linear and quadratic variation rates. \textbf{Equivalent discrete-time (interval-averaged) increments.} Define the equivalent discrete-time increments as the interval averages: \begin{equation} \begin{aligned} \Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\ \Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau. \end{aligned} \label{eq:interval_average_def} \end{equation} With \eqref{eq:du_poly_vector}, the interval average admits a closed form: \begin{equation} \Delta u_i(t_n)= \Gamma_{i,n0} +\Gamma_{i,n1}\frac{\delta_n}{2} +\Gamma_{i,n2}\frac{\delta_n^2}{3}. \label{eq:interval_average_closedform} \end{equation} \textbf{One-step sample generation (five-stand coupled simulation).} Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$. In addition to the local state deviation, the neighbor state deviations are also included to represent inter-stand coupling. The specific process is shown in Table~\ref{tab:interval_sample_generation_en}. \begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.15} \caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).} \label{tab:interval_sample_generation_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X} \toprule \textbf{Step} & \textbf{Operation} \\ \midrule 1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and its neighbor stack $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\ 2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (ranges of polynomial coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$). 
\\ 3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via the vector polynomial model \eqref{eq:du_poly_vector}. \\ 4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\ \bottomrule \end{tabularx} \end{table} Here, $\mathcal{I}_x$ denotes the sampling domain (admissible ranges) of the deviation states used for offline data generation, and $\mathcal{I}_\Gamma$ denotes the sampling domain of the polynomial coefficient vectors $\Gamma_{i,n}$. Accordingly, an interval sample for subsystem $i$ can be represented as \begin{equation} \mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\}, \label{eq:interval_sample_def} \end{equation} which is used to learn the mapping from the current local and neighbor deviation states and the local control-increment trajectory to the next local deviation state. For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws, the local one-step training dataset is formed as \begin{equation} \begin{split} S_i=\Big\{& \big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\\ &\qquad \Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big) \ \Big|\ j=1,\ldots,J \Big\}, \end{split} \label{eq:one_step_dataset} \end{equation} where $J$ denotes the number of one-step samples. The overall dataset for the five-stand mill can be denoted as $\{S_i\}_{i=1}^{5}$. The point-cloud visualization of the training dataset is shown in Figure~\ref{2}. \begin{figure*}[htbp] \centering \includegraphics[scale=0.5]{picture/Fig2.pdf} \caption{Point cloud map of the training dataset.}\label{2} \end{figure*} %======================== \subsection{Multi-step rollout segment dataset} %======================== The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss and reciprocal-consistency regularization, because these objectives require ground-truth state trajectories over a horizon of $K$ consecutive intervals.
Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples into $K$-step trajectory segments. Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling $\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances), and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$. Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. Define a $K$-step segment sample for subsystem $i$ as \begin{equation} \begin{aligned} \mathcal{W}_{i,n}= \Big\{& \big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\ &\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1} \Big\}. \end{aligned} \label{eq:Kstep_segment_def} \end{equation} By repeating the above segment generation, we form the multi-step training set \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}, \label{eq:Kstep_dataset} \end{equation} where $J_K$ denotes the number of $K$-step segment samples. Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (by keeping only $s=0$), thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training. %======================== \section{Construction of Residual Neural Network} %======================== \subsection{Network Architecture (what is learned, why residual, why include $u$)} \label{subsec:net_arch} Given the training dataset, the neural network model is defined and trained. The network model aims to learn the stand-wise deviation-state evolution of the interconnected five-stand cold rolling system over each local interval.
Specifically, for subsystem $i$, the proposed residual network defines a nonlinear mapping \begin{equation} \mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}, \label{eq:Ni_map} \end{equation} where $d$ denotes the dimension of the local deviation state $\Delta x_i$ ($d=3$ in \eqref{eq:local_state_def}), $|Z_i|$ is the number of neighbors of subsystem $i$ defined in \eqref{eq:neighbor_set}, and $p=\mathrm{dim}(\Gamma_{i,n})$ denotes the dimension of the local input-parameter vector ($p=6$ for two-input quadratic parameterization). \textbf{Why the input must include control information.} If the network input does \emph{not} include the control variables (here, $\Gamma_{i,n}$ and $\delta_n$), the learned model degenerates to a purely autoregressive predictor that only reproduces trajectories under the \emph{training policy} and cannot answer ``what will happen if we change the control?'' In MPC, the optimizer must evaluate the predicted trajectory under \emph{candidate} decision variables, therefore a \emph{control-dependent} predictor (including $\Gamma_{i,n}$) is necessary for online optimization. Accordingly, we define the residual mapping \begin{equation} \begin{aligned} \Delta r_i &= \mathcal{N}_i(X_{i,\text{in}}; \Theta_i),\\ X_{i,\text{in}} &\in \mathbb{R}^{d(1+|Z_i|)+p+1},\qquad \Delta r_i \in \mathbb{R}^d , \end{aligned} \label{eq:residual_def} \end{equation} where $\Theta_i$ represents the trainable parameters of the network for subsystem $i$, and $X_{i,\text{in}}$ denotes the concatenated input vector for subsystem $i$ in the current sampling interval. \textbf{Residual (shortcut) structure and interpretability.} To explicitly incorporate a residual structure, the local state component in the input is passed through an identity shortcut and added to obtain the one-step prediction. 
Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a linear selection matrix whose block form is defined by \begin{equation} \hat{I}_i = [I_d,\, 0_{d\times(d|Z_i|+p+1)}], \label{eq:selector_matrix} \end{equation} where $I_d$ is the $d\times d$ identity matrix and $0_{a\times b}$ denotes the $a\times b$ zero matrix. This shortcut represents a persistence prior: over a short interval, the deviation state tends to change moderately, and the network mainly learns the \emph{correction} caused by nonlinear rolling behavior and inter-stand coupling. \textbf{Auxiliary branch for variable $\delta_n$.} To improve robustness when the sampling interval length $\delta_n$ varies or becomes relatively large, we introduce an auxiliary branch inside $\mathcal{N}_i$: \begin{equation} \mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq \phi_i(X_{i,\text{in}};\Theta_{\phi_i}) + \mathcal{R}_i(X_{i,\text{in}};\theta_i), \label{eq:aux_branch} \end{equation} where $\phi_i(\cdot)$ can be implemented by a lightweight feedforward network and $\mathcal{R}_i(\cdot)$ denotes the remaining residual branch. Conceptually, $\phi_i(\cdot)$ captures low-frequency/scale effects associated with interval length $\delta_n$, while $\mathcal{R}_i(\cdot)$ learns the remaining coupling nonlinearities. When $\phi_i(\cdot)\equiv 0$, the model reduces to the standard residual form. Hence, the one-step prediction of the local deviation state at $t_{n+1}$ can be written as \begin{equation} X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i), \label{333} \end{equation} where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$. \begin{remark} The predictor in \eqref{333} admits a baseline-plus-correction form: the shortcut term $\hat{I}_i X_{i,\mathrm{in}}$ propagates the current local deviation state $\Delta x_i(t_n)$, while the residual network $\mathcal{N}_i(\cdot)$ learns the one-step correction.
This structure renders the model interpretable as a data-driven adjustment to a persistence prior, with the correction capturing unmodeled nonlinearities and inter-stand coupling via $\Delta x_{Z_i}$ under varying operating conditions and varying sampling intervals. \end{remark} For the $j$-th one-step data sample, $j=1,\ldots,J$, we set \begin{equation} \begin{aligned} X_{i,\text{in}}^{(j)} = \big[ \Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\ \Gamma_{i,n}^{(j)},\ \delta_n^{(j)} \big]^{\top}. \end{aligned} \label{eq:net_input_vector} \end{equation} The learning target is the one-step deviation-state change (residual) \begin{equation} \Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n). \label{eq:net_target_residual} \end{equation} %======================== \subsection{Training, Learned Model, and System Prediction (how prediction is used later)} %======================== To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability, we train the forward predictor jointly with an auxiliary backward residual model and impose a multi-step reciprocal-consistency regularization over a $K$-step segment. \textbf{Backward residual model.} In addition to the forward residual predictor, we construct a backward residual network for subsystem $i$, \begin{equation} \mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}, \end{equation} parameterized by $\bar{\Theta}_i$. For the backward step associated with the interval $[t_n,t_{n+1}]$, we define \begin{equation} \begin{aligned} X_{i,\mathrm{in}}^{b} &= \big[ \Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),\ \Gamma_{i,n},\ \delta_n \big]^{\top},\\ X_{i,\mathrm{out}}^{b} &= \hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i), \end{aligned} \label{eq:backward_model_def} \end{equation} where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$. 
Accordingly, the supervised backward residual target is \begin{equation} \Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}). \end{equation} \textbf{Forward rollout over a $K$-step segment.} Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$, we initialize the forward rollout by \begin{equation} \Delta \hat{x}_i(t_n)=\Delta x_i(t_n), \end{equation} and apply the forward predictor recursively for $K$ steps: \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1}) &= \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\, \Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i \Big), \\ &\qquad s=0,\ldots,K-1. \end{aligned} \label{eq:forward_rollout} \end{equation} \textbf{Backward rollout and reciprocal consistency.} After obtaining the terminal forward prediction, we set the terminal condition for the backward rollout as \begin{equation} \Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}), \end{equation} and roll back using $\mathcal{B}_i$ along the same segment: \begin{equation} \begin{aligned} \Delta \bar{x}_i(t_{n+s}) &= \hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s}) + \mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big), \quad s=K-1,\ldots,0, \end{aligned} \label{eq:backward_rollout} \end{equation} where the backward input at time $t_{n+s}$ is \begin{equation} X_{i,\mathrm{in}}^{b}(t_{n+s})= \big[ \Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),\ \Gamma_{i,n+s},\ \delta_{n+s} \big]^{\top}, \end{equation} and $\Delta \hat{x}_{Z_i}(\cdot)$ is obtained from the same forward rollout. With the forward and backward trajectories available on the same segment, we define the multi-step reciprocal prediction error \begin{equation} E_i(t_n) = \sum_{s=0}^{K} \left\| \Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s}) \right\|^2. 
\end{equation} \textbf{Training objectives (what each term enforces).} We train the forward and backward networks jointly by minimizing the following objective terms: \begin{equation} \begin{aligned} L_{\mathrm{1step}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big) -\mathcal{N}_i\!\left( X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i \right) \Big\|^2,\\[2mm] L_{\mathrm{bwd}}(\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big) -\mathcal{B}_i\!\left( X_{i,\mathrm{in}}^{b\,(j)}(t_{n+s});\bar{\Theta}_i \right) \Big\|^2,\\[2mm] L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm] L_{\mathrm{roll}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \Big\| \Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s}) \Big\|^2. \end{aligned} \label{eq:training_losses} \end{equation} Here, $L_{\mathrm{1step}}$ enforces one-step accuracy, $L_{\mathrm{roll}}$ directly suppresses long-horizon error accumulation (drift), and $L_{\mathrm{msrp}}$ regularizes the learned dynamics by reciprocal consistency between forward and backward rollouts. In implementation, these terms can be combined by a weighted sum (weights chosen by validation): \begin{equation} L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}}. \end{equation} After training, we obtain the forward model \begin{equation} X_{i,\mathrm{out}} = \hat{I}_i X_{i,\mathrm{in}} + \mathcal{N}_i(X_{i,\mathrm{in}}; \Theta_i^*). 
\end{equation} For system prediction on a local interval $[t_n,t_{n+1}]$, we define the input vector as \begin{equation} X_{i,\mathrm{in}} = \big[ \Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n \big]^{\top}, \end{equation} and perform the one-step prediction \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n) + \mathcal{N}_i\!\Big( \Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\, \Gamma_{i,n},\,\delta_n;\,\Theta_i^* \Big). \label{eq:one_step_predict} \end{equation} By applying this predictor recursively, we obtain a network model that predicts the system trajectory over long horizons. Finally, the network parameters are optimized using the Adam optimizer: \begin{equation} \Theta_{i,t+1} = \Theta_{i,t} - \eta \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon}, \end{equation} where $\Theta_{i,t}$ denotes the current parameters, $\Theta_{i,t+1}$ the updated parameters, $\eta$ the learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ the bias-corrected first and second moment estimates, and $\varepsilon$ a small constant for numerical stability. Figure~\ref{fig:rnn_logic} illustrates the overall structure. \begin{figure}[htbp] \centering \includegraphics[scale=0.85]{picture/x6.pdf} \caption{Logic diagram of the residual neural network.} \label{fig:rnn_logic} \end{figure} %======================== \section{Nash Equilibrium-Based RNE-DMPC (five-stand, thickness--tension, clear objective and solution)} %======================== The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation. As a result, changes in operating conditions or control actions (roll gap and speed) at one stand can affect both upstream and downstream stands, making centralized online optimization over all decision variables computationally demanding. To mitigate this issue, we decompose the global predictive-control problem into $N=5$ local subproblems associated with individual stands. 
Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers. Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate the distributed coordination process as a Nash-equilibrium-seeking iteration. Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed model predictive control method (RNE-DMPC) to achieve coordinated thickness--tension regulation/tracking. The overall control structure is shown in Figure~\ref{4}. \begin{figure*}[htbp] \centering \includegraphics[width=\linewidth]{picture/x2.pdf} \caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4} \end{figure*} \subsection{Control objective, references, and coupling conflict (what is optimized and why there is conflict)} \label{subsec:objective_reference_conflict} \textbf{Tracking variables.} The controlled outputs are the exit thicknesses $\{h_i\}_{i=1}^5$ and inter-stand tensions $\{T_i\}_{i=1}^4$. In deviation coordinates, the controller aims to drive $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$ (regulation), or track given time-varying references $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$. \textbf{Reference signal definition (clear meaning of $\Delta x_{\mathrm{ref}}$).} For each prediction step $s$, the local deviation reference vector of stand $i$ is \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s}) \triangleq \mathbf{0}\in\mathbb{R}^{3}, \end{equation} since the local state already collects the deviations of $h_i$, $T_{i-1}$, and $T_i$ from their references $h_i^{\mathrm{ref}}$, $T_{i-1}^{\mathrm{ref}}$, and $T_i^{\mathrm{ref}}$; regulating around the references therefore amounts to penalizing the deviation errors themselves.
Equivalently, if one prefers to use absolute states, one may set the tracking error as $\hat e_{x,i}(t_{n+s})=\hat x_i(t_{n+s})-x_i^{\mathrm{ref}}(t_{n+s})$. In this paper, we keep the deviation-state formulation and directly penalize $\Delta\hat x_i(t_{n+s})$. Thus, in the cost below, $\Delta x_{\mathrm{ref}}(t_{n+s})$ should be interpreted as the desired deviation (typically zero). For boundary terms ($T_0,T_5$), their references are set to zero and their weights can be set to zero. \textbf{Why conflict occurs (explicit coupling).} Inter-stand tensions are shared coupling variables: the tension $T_i$ depends on the speeds of both stand $i$ and $i+1$ and the strip transport, therefore it is affected by decisions from \emph{two} neighboring controllers. At the same time, each controller also tries to achieve its \emph{local} thickness target via its roll gap. Consequently, a speed/gap action that improves local thickness may deteriorate a shared tension, and vice versa. This creates an intrinsic multi-agent conflict, motivating Nash-equilibrium coordination. \subsection{Subsystem prediction and local optimization (decision variables, model constraint, cost, constraints, and solver)} \label{subsec:local_mpc} Each rolling stand is treated as a subsystem. The coupled subsystem dynamics can be conceptually written as \begin{equation} \Delta x_i(t_{n+1}) = f_i\!\big(\Delta x_i(t_n),u_i(t_n)\big) + \sum_{k \in Z_i} g_{ik}\!\big(\Delta x_k(t_n),u_k(t_n)\big), \end{equation} where $Z_i$ is given by \eqref{eq:neighbor_set} and $g_{ik}$ characterizes the coupling effect (mainly through tensions). Instead of using an explicit first-principles model of $f_i,g_{ik}$, we use the trained neural-network surrogate to predict the deviation-state evolution. 
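As a minimal illustrative sketch of how the surrogate rollout is evaluated for candidate decision variables, the following uses a randomly initialized two-layer network as a stand-in for the trained $\mathcal{N}_i$ (the dimensions follow the definitions above, but the weights and sampled inputs are toy values, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3        # local deviation state: [dh_i, dT_{i-1}, dT_i]
p = 6        # stacked quadratic coefficients for the two inputs (gap, speed)
n_nbr = 2    # |Z_i| = 2 for an interior stand
in_dim = d * (1 + n_nbr) + p + 1   # input dimension d(1+|Z_i|)+p+1

# Toy stand-in for the trained residual network N_i (NOT the trained model):
W1 = 0.1 * rng.standard_normal((32, in_dim))
W2 = 0.1 * rng.standard_normal((d, 32))

def N_i(x_in):
    """Residual correction for one interval (two-layer tanh MLP)."""
    return W2 @ np.tanh(W1 @ x_in)

def rollout(dx0, dx_nbr_seq, Gamma_seq, delta_seq):
    """Forward rollout: dx_hat(t_{n+s+1}) = dx_hat(t_{n+s}) + N_i([dx, dx_Z, Gamma, delta])."""
    dx, traj = dx0.copy(), [dx0.copy()]
    for dx_nbr, Gam, dlt in zip(dx_nbr_seq, Gamma_seq, delta_seq):
        x_in = np.concatenate([dx, dx_nbr, Gam, [dlt]])
        dx = dx + N_i(x_in)
        traj.append(dx.copy())
    return np.array(traj)

K = 4
traj = rollout(
    dx0=np.zeros(d),
    dx_nbr_seq=[0.01 * rng.standard_normal(d * n_nbr) for _ in range(K)],
    Gamma_seq=[0.01 * rng.standard_normal(p) for _ in range(K)],
    delta_seq=[0.1] * K,
)
print(traj.shape)  # (K+1, d) = (5, 3): predicted deviations at t_n, ..., t_{n+K}
```

Because the candidate parameters $\Gamma_{i,n+s}$ enter the network input, changing them changes the predicted trajectory, which is exactly what the MPC optimizer exploits.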
\textbf{Local polynomial parameterization of control increments (two inputs).} Over each local interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, the control increment trajectory of subsystem $i$ is parameterized by the vector quadratic polynomial \eqref{eq:du_poly_vector}: \begin{equation} \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}) = \Gamma_{i,n+s,0} +\Gamma_{i,n+s,1}\tau +\Gamma_{i,n+s,2}\tau^2, \qquad \tau \in [0,\delta_{n+s}], \label{eq:du_poly_for_mpc} \end{equation} where $\Gamma_{i,n+s}\in\mathbb{R}^{p}$ with $p=6$ in \eqref{eq:Gamma_dim}. \textbf{Neural-network-based prediction (model constraint for MPC).} Using the trained residual neural network surrogate, subsystem $i$ predicts its one-step deviation state by \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1}) &= \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\, \Delta \hat{x}_{Z_i}(t_{n+s}),\, \Gamma_{i,n+s},\, \delta_{n+s};\, \Theta_i^* \Big), \\ &\qquad s=0,\ldots,N_p-1, \end{aligned} \label{eq:nn_predict_in_mpc} \end{equation} where $N_p$ denotes the prediction horizon (the number of predicted sampling intervals) and $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communicated neighbor predictions (Nash iteration described later). Equation \eqref{eq:nn_predict_in_mpc} is the key link between \textbf{prediction} and \textbf{control}: for any candidate decision variables $\{\Gamma_{i,n+s}\}$, the MPC optimizer can roll out \eqref{eq:nn_predict_in_mpc} to evaluate the predicted thickness/tension deviations. \textbf{Decision variables.} Optimize the local parameter sequence over the control horizon $N_c$ ($N_c\le N_p$, also counted in sampling intervals): \begin{equation} \begin{aligned} \mathbf{\Gamma}_i(t_n) &= \big[ \Gamma_{i,n}^\top,\, \Gamma_{i,n+1}^\top,\, \ldots,\, \Gamma_{i,n+N_c-1}^\top \big]^\top \in \mathbb{R}^{pN_c}. \end{aligned} \end{equation} \textbf{Local objective (explicit thickness--tension meaning).} Let $\Delta \hat{x}_i(t_{n+s})=[\Delta \hat h_i(t_{n+s}),\,\Delta \widehat T_{i-1}(t_{n+s}),\,\Delta \widehat T_{i}(t_{n+s})]^\top$.
The local cost function of subsystem $i$ is defined as \begin{equation} \begin{aligned} J_i &= \sum_{s=1}^{N_p} \big\| \Delta \hat{x}_i(t_{n+s}) - \Delta x_{i,\mathrm{ref}}(t_{n+s}) \big\|_{Q_i}^2 + \sum_{s=0}^{N_c-1} \big\| \Gamma_{i,n+s} \big\|_{R_i}^2. \end{aligned} \label{eq:local_cost} \end{equation} In the deviation-state regulation setting, $\Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0$, so the first term penalizes predicted thickness and tension deviations. The weighting matrix $Q_i\in\mathbb{R}^{d\times d}$ is chosen to reflect the relative importance between thickness and tension regulation (e.g., a larger weight on the first component for strict thickness control, while still penalizing adjacent tension deviations), and $R_i\in\mathbb{R}^{p\times p}$ penalizes the control-increment trajectory parameters to ensure smooth actuation. \textbf{Constraints (how $\Gamma$ enforces bounds over the whole interval).} Typical constraints include bounds on absolute inputs and increment trajectories: \begin{align} u_{i,\min} &\le u_i(t_{n+s}) \le u_{i,\max}, \label{eq:u_abs_bound}\\ \Delta u_{i,\min} &\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}) \le \Delta u_{i,\max}, \notag \\ &\hspace{3.6cm} \forall\tau\in[0,\delta_{n+s}]. \label{eq:du_traj_bound} \end{align} Here, $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ specify bounds on roll gap and speed (component-wise), and $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ specify allowable increment bounds. In practice, for the quadratic trajectory \eqref{eq:du_poly_for_mpc}, the interval-wise bound \eqref{eq:du_traj_bound} can be checked by evaluating $\tau=0$, $\tau=\delta_{n+s}$, and, for each input channel $k$ with nonzero quadratic coefficient, the stationary point $\tau_k^\star=-[\Gamma_{i,n+s,1}]_k/(2[\Gamma_{i,n+s,2}]_k)$ whenever $\tau_k^\star\in[0,\delta_{n+s}]$.
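Since each channel of the increment trajectory is a quadratic in $\tau$, its extrema on $[0,\delta_{n+s}]$ occur at the endpoints or the interior stationary point, so a finite number of evaluations certifies the bound. A minimal numeric sketch of this check (function name and values are illustrative, not from the paper's implementation):

```python
import numpy as np

def du_traj_within_bounds(g0, g1, g2, delta, du_min, du_max):
    """
    Check g0 + g1*tau + g2*tau**2 in [du_min, du_max] for all tau in [0, delta],
    component-wise, by evaluating the endpoints and the interior stationary point.
    g0, g1, g2, du_min, du_max: arrays of shape (n_u,) (here n_u = 2: gap, speed).
    """
    g0, g1, g2 = map(np.asarray, (g0, g1, g2))
    cands = [np.zeros_like(g0), np.full_like(g0, delta)]
    # Stationary point tau* = -g1 / (2 g2) per channel, kept only if in [0, delta].
    with np.errstate(divide="ignore", invalid="ignore"):
        tau_star = np.where(g2 != 0, -g1 / (2.0 * g2), -1.0)
    cands.append(np.where((tau_star >= 0) & (tau_star <= delta), tau_star, 0.0))
    vals = np.array([g0 + g1 * t + g2 * t**2 for t in cands])  # (3, n_u)
    return bool(np.all(vals >= du_min) and np.all(vals <= du_max))

# Downward-opening parabolas whose peaks lie inside the interval:
ok = du_traj_within_bounds(
    g0=[0.0, 0.0], g1=[0.2, 0.1], g2=[-0.5, -0.25], delta=0.4,
    du_min=[-0.05, -0.05], du_max=[0.05, 0.05],
)
print(ok)  # True: the interior peaks (0.02 and 0.01) stay within the bounds
```

Checking only the sampled endpoints would miss the interior peak, which is why the stationary-point evaluation is needed.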
\textbf{Absolute-input update using interval average (consistency with execution).} To enforce the absolute-input constraints consistently within the prediction horizon, we update the absolute input using the interval average: \begin{equation} \begin{aligned} \Delta u_i(t_{n+s}) &= \frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau \\ &= \Gamma_{i,n+s,0} +\Gamma_{i,n+s,1}\frac{\delta_{n+s}}{2} +\Gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3}, \end{aligned} \label{eq:du_avg_in_mpc} \end{equation} and propagate \begin{equation} \begin{aligned} u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \\ u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1. \end{aligned} \label{eq:u_update_in_mpc} \end{equation} This ensures smooth input evolution and avoids abrupt actuator changes. \textbf{Local optimization problem (what is solved at each Nash iteration).} At Nash-iteration index $l$, subsystem $i$ solves the following nonlinear optimization problem: \begin{equation} \mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\; J_i \quad \text{s.t.}\quad \eqref{eq:nn_predict_in_mpc},\ \eqref{eq:u_abs_bound}\text{--}\eqref{eq:u_update_in_mpc}. \label{eq:local_nlp} \end{equation} Because the neural surrogate $\mathcal N_i(\cdot)$ is differentiable, \eqref{eq:local_nlp} is a differentiable nonlinear program (NLP). It can be solved by standard gradient-based NLP solvers (e.g., SQP or interior-point methods) using automatic differentiation to compute gradients. \subsection{Nash equilibrium coordination iteration (how conflict is resolved and how the final solution is obtained)} \label{subsec:nash_coordination} The Nash equilibrium is computed via distributed best-response iterations. Each controller repeatedly computes its best response to the latest neighbor strategies and exchanges predicted trajectories and decision variables through a communication module. 
This resolves coupling conflicts by seeking a strategy profile in which no single subsystem can unilaterally reduce its own objective \eqref{eq:local_cost} given the other subsystems' strategies. The distributed best-response iteration is summarized in Table~\ref{tab:nash_iter_en}. \begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.12} \setlength{\tabcolsep}{3.5pt} \caption{Distributed Nash best-response iteration for RNE-DMPC (five-stand).} \label{tab:nash_iter_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X} \toprule \textbf{Step} & \textbf{Description} \\ \midrule A & Initialize $l=1$ and initialize $\mathbf{\Gamma}_i^{(0)}$ for all stands (e.g., warm-start from previous time step). \\ B & Using the surrogate predictor \eqref{eq:nn_predict_in_mpc}, compute $\Delta \hat{x}_i^{(l)}(t_{n+s})$ for $s=1,\ldots,N_p$ \\ & given $\mathbf{\Gamma}_i^{(l-1)}$ and the latest neighbor predictions $\Delta \hat{x}_{Z_i}^{(l-1)}(t_{n+s})$. \\ C & Solve the local NLP \eqref{eq:local_nlp} to update $\mathbf{\Gamma}_i^{(l)}$ (best response). \\ D & Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories $\Delta \hat{x}_i^{(l)}(t_{n+s})$ to the communication system. \\ E & Update neighbor predictions $\Delta \hat{x}_{Z_i}^{(l)}(t_{n+s})$ using received information; re-generate predictions if needed. \\ F & Compute the maximum relative change $\varsigma^{(l)}$ as the convergence metric. \\ G & If $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; \\ & otherwise set $l\leftarrow l+1$ and repeat Steps B--F. \\ \bottomrule \end{tabularx} \end{table} The convergence metric in Step~F is defined as \begin{equation} \begin{aligned} \varsigma^{(l)} &= \max_i \frac{\left\| \mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)} \right\|_2}{ \left\| \mathbf{\Gamma}_i^{(l-1)} \right\|_2+\epsilon}, \end{aligned} \end{equation} where $\epsilon>0$ avoids division by zero. 
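The convergence test of Step~F can be sketched as follows (a minimal sketch with illustrative names and tolerances; the five parameter vectors stand in for the stands' decision variables):

```python
import numpy as np

def nash_converged(Gam_new, Gam_old, tol=1e-3, eps=1e-9):
    """
    Maximum relative change over the five stands:
    sigma = max_i ||Gam_i_new - Gam_i_old|| / (||Gam_i_old|| + eps).
    The best-response iteration stops once sigma <= tol.
    """
    sigma = max(
        np.linalg.norm(gn - go) / (np.linalg.norm(go) + eps)
        for gn, go in zip(Gam_new, Gam_old)
    )
    return float(sigma), bool(sigma <= tol)

old = [np.ones(6) for _ in range(5)]
new = [g + 1e-4 for g in old]   # a small best-response update on every stand
sigma, done = nash_converged(new, old, tol=1e-3)
print(done)  # True: the relative change (about 1e-4) is below the tolerance
```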
\textbf{Receding-horizon implementation (how control is applied).} After the Nash iteration converges, only the first-interval parameters $\Gamma_{i,n}^*$ are applied. The control increment trajectory over $[t_n,t_{n+1}]$ is \begin{equation} \Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^*), \quad \tau\in[0,\delta_n]. \end{equation} To obtain the equivalent discrete-time control increment used for updating the absolute input, we use the interval average: \begin{equation} \begin{aligned} \Delta u_i(t_n) &= \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau \\ &= \Gamma_{i,n,0}^* + \Gamma_{i,n,1}^*\frac{\delta_n}{2} + \Gamma_{i,n,2}^*\frac{\delta_n^2}{3}. \end{aligned} \end{equation} Then the absolute control input is updated by \begin{equation} u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n), \end{equation} which ensures smooth input evolution and avoids abrupt actuator changes. \textbf{How prediction and control are connected in closed loop.} At each sampling instant $t_n$: (i) measure/estimate the current deviation states $\Delta x_i(t_n)$ (thickness and tensions) and obtain neighbor information; (ii) run the Nash best-response iteration using the neural predictor \eqref{eq:nn_predict_in_mpc} as the model constraint to solve \eqref{eq:local_nlp}; (iii) apply the first control-increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ and update $u_i(t_n)$ by interval averaging; (iv) roll the horizon forward to $t_{n+1}$ and repeat. In this way, the learned predictor directly provides the multi-step forecasts needed by MPC, and the optimization computes actuator trajectories (roll gap and speed increments) that minimize thickness--tension deviations while coordinating coupling conflicts via Nash equilibrium. The overall control flow chart is shown in Figure~\ref{liu}. \begin{figure}[htbp] \centering \includegraphics[width=\linewidth]{picture/x5.pdf} \caption{The overall system control flow chart.}\label{liu} \end{figure}
%======================================================================
% GLOBAL CONSISTENCY CLEAN VERSION (Five-stand, thickness+tension only,
% actuators: roll gap + stand speed; Delta x = deviation, Delta u = increment)
%======================================================================
%========================
\section{Construction of Dataset}
%========================
\subsection{Five-stand tandem mill setting and unified notation} \label{subsec:notation_clean} Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n$ the corresponding sampling interval length. Define the local within-interval time variable $\tau=t-t_n\in[0,\delta_n]$. \paragraph{Thickness and tension signals (what is controlled).} Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$), and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$). The system is strongly coupled because the inter-stand tensions propagate along the mill line and are affected by neighboring stands' actions. \paragraph{Reference trajectories and deviation-state definition (meaning of $\Delta x$).} Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be the desired references (setpoints) given by process requirements (e.g., schedule-based references or constant setpoints). We define deviation (tracking-error) variables \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \label{eq:dev_def} \end{equation} Throughout this paper, the symbol ``$\Delta$'' attached to \emph{states} always means \emph{deviation from reference}.
\paragraph{Local state vector (where thickness and tension appear).} For each stand $i$, we choose the local deviation state as \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3, \label{eq:xi_def_clean} \end{equation} with the boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$ to keep a unified dimension $d=3$ for all stands. (Equivalently, one may remove nonexistent boundary tensions from the state and use varying dimensions; here we keep a unified form.) \paragraph{Neighbor sets and coupling representation (five-stand chain).} For a five-stand tandem mill, the dominant coupling is between adjacent stands, hence we define \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad Z_5=\{4\}. \label{eq:Zi_clean} \end{equation} Define the neighbor-state stack \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\,|\,k\in Z_i\}. \label{eq:xZi_clean} \end{equation} \paragraph{Actuators and increment inputs (meaning of $u$ and $\Delta u$).} Each stand $i$ is manipulated by \emph{roll gap} (screw-down/hydraulic gap) $s_i(t)$ and \emph{stand speed} $v_i(t)$: \begin{equation} u_i(t)= \begin{bmatrix} s_i(t)\\ v_i(t) \end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. \label{eq:ui_clean} \end{equation} To ensure smooth actuation and match industrial practice, we optimize \emph{discrete input increments}: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) = \begin{bmatrix} \Delta s_i(t_n)\\ \Delta v_i(t_n) \end{bmatrix}. \label{eq:du_discrete_clean} \end{equation} Throughout this paper, the symbol ``$\Delta$'' attached to \emph{inputs} $\Delta u_i(t_n)$ means \emph{sample-to-sample increment}. Thus, $\Delta x$ (deviation state) and $\Delta u$ (input increment) are conceptually different, and this is fixed by definition. 
\paragraph{Disturbance.} Let $d_i(t)$ denote exogenous disturbances (e.g., entry thickness fluctuation, friction variation, material parameter drift, etc.). We denote the interval-level equivalent disturbance by $\Delta d_i(t_n)$ (defined via interval averaging below). \paragraph{Basic matrix notation.} $I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes the $a\times b$ zero matrix. \subsection{Discrete interval mapping and data-driven learning objective} \label{subsec:mapping_clean} The stand-wise deviation-state evolution over $[t_n,t_{n+1}]$ can be expressed by a discrete-time mapping \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Delta u_i([t_n,t_{n+1}]),\,\Delta d_i([t_n,t_{n+1}])\Big), \label{eq:true_mapping_clean} \end{equation} where $\Phi_i(\cdot)$ is generally nonlinear and coupled due to rolling deformation and tension propagation. A commonly used \emph{conceptual} equivalent discrete linear form is \begin{equation} \Delta x_i(t_{n+1}) = M_d\,\Delta x_i(t_n) + N_d\,\Delta u_i(t_n) + F_d\,\Delta d_i(t_n), \label{eq:linear_form_concept} \end{equation} where $M_d,N_d,F_d$ represent equivalent discrete-time matrices around operating conditions. In a practical five-stand cold rolling mill, accurately deriving/identifying these matrices and disturbance models from first principles is difficult, due to strong coupling, unmodeled nonlinearities, and time-varying operating regimes. Therefore, this paper aims to learn a high-fidelity approximation of the interval evolution from data and then embed it into distributed MPC. \begin{remark} In fact, due to the existence of complex coupling relationships, it is difficult to directly and accurately establish \eqref{eq:linear_form_concept} based on first principles. Therefore, in this paper, we learn an approximate mapping of \eqref{eq:true_mapping_clean} from data. 
\end{remark} %======================== \subsection{Interval-level parameterization and one-step dataset} %======================== \paragraph{Why interval-level parameterization is reasonable in the five-stand setting.} Although decisions are updated at discrete instants $t_n$, the hydraulic gap and drive systems evolve continuously inside each interval, and abrupt within-interval changes may excite tension oscillations and deteriorate thickness stability. Thus, parameterizing the within-interval increment trajectory by a low-order polynomial: (i) yields a compact finite-dimensional decision representation; (ii) enforces smooth profiles inside the interval; (iii) enables enforcing increment constraints for all $\tau\in[0,\delta_n]$. This is appropriate when $\delta_n$ is not excessively large relative to actuator bandwidth and the within-interval evolution is well approximated by a low-order basis. \paragraph{Vector quadratic polynomial parameterization (two inputs).} On the interval $[t_n,t_{n+1}]$, parameterize the control increment trajectory as \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) = \Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2, \qquad \tau\in[0,\delta_n], \label{eq:du_poly_vec_clean} \end{equation} where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ are coefficient vectors ($n_u=2$). Component-wise, \eqref{eq:du_poly_vec_clean} corresponds to \begin{equation} \begin{aligned} \Delta s_{i,n}(\tau) &= \gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\\ \Delta v_{i,n}(\tau) &= \gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2. \end{aligned} \label{eq:du_components_clean} \end{equation} Define the stacked parameter vector \begin{equation} \Gamma_{i,n}\triangleq \big[ (\Gamma_{i,n0})^\top,\, (\Gamma_{i,n1})^\top,\, (\Gamma_{i,n2})^\top \big]^\top \in\mathbb{R}^{p}, \qquad p=3n_u=6. 
\label{eq:Gamma_clean} \end{equation} Here, $\Gamma_{i,n0}$ is the baseline increment at $\tau=0$, while $\Gamma_{i,n1}$ and $\Gamma_{i,n2}$ describe the linear and quadratic variation rates. \paragraph{Equivalent discrete-time (interval-averaged) increments.} Define the interval-averaged equivalent increments as \begin{equation} \begin{aligned} \Delta u_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau,\\ \Delta d_i(t_n) &\triangleq \frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau. \end{aligned} \label{eq:avg_def_clean} \end{equation} With \eqref{eq:du_poly_vec_clean}, the input average has a closed form: \begin{equation} \Delta u_i(t_n)= \Gamma_{i,n0} +\Gamma_{i,n1}\frac{\delta_n}{2} +\Gamma_{i,n2}\frac{\delta_n^2}{3}. \label{eq:avg_closed_clean} \end{equation} \paragraph{Sampling domains for offline data generation.} Let $\mathcal{I}_x$ denote the sampling domain (ranges) of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$, and let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$ (covering both gap and speed channels). These domains specify the operating envelope used to generate supervised training data. \paragraph{One-step sample generation (five-stand coupled simulation).} Given the above parameterization, one training sample is generated on each interval $[t_n,t_{n+1}]$. In addition to the local deviation state, the neighbor deviation states are included to represent inter-stand coupling. The process is summarized in Table~\ref{tab:interval_sample_generation_en}. 
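As a quick sanity check of the closed-form average \eqref{eq:avg_closed_clean}, which follows from $\frac{1}{\delta}\int_0^{\delta}\tau^k\,d\tau=\delta^k/(k+1)$, one can compare it against numerical quadrature of \eqref{eq:du_poly_vec_clean} for both actuator channels (a minimal sketch with illustrative coefficient values):

```python
import numpy as np

def du_poly(tau, g0, g1, g2):
    """Quadratic increment trajectory g0 + g1*tau + g2*tau^2 (vectorized)."""
    return g0 + g1 * tau + g2 * tau**2

def du_interval_average(g0, g1, g2, delta):
    """Closed-form interval average: g0 + g1*delta/2 + g2*delta^2/3."""
    return g0 + g1 * delta / 2.0 + g2 * delta**2 / 3.0

# Two channels (roll gap, speed) with illustrative coefficients.
g0, g1, g2 = np.array([0.1, -0.2]), np.array([0.5, 0.3]), np.array([-0.4, 0.2])
delta = 0.05

# Composite midpoint rule on a fine grid approximates the exact average.
M = 100_000
mid = (np.arange(M) + 0.5) * (delta / M)
numeric = du_poly(mid[:, None], g0, g1, g2).mean(axis=0)
assert np.allclose(numeric, du_interval_average(g0, g1, g2, delta), atol=1e-9)
```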
\begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.15} \caption{Procedure for generating one interval-level sample on $[t_n,t_{n+1}]$ (five-stand coupled mill).} \label{tab:interval_sample_generation_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.09\linewidth} X} \toprule \textbf{Step} & \textbf{Operation} \\ \midrule 1 & \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \\ 2 & \textbf{Parameter sampling:} draw $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ (coefficients for both $\Delta s_{i,n}(\tau)$ and $\Delta v_{i,n}(\tau)$). \\ 3 & \textbf{Control construction:} compute $\Delta u_{i,n}(\tau)$ via \eqref{eq:du_poly_vec_clean}. \\ 4 & \textbf{State propagation:} integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval control trajectory, and record $\Delta x_i(t_{n+1})$. \\ \bottomrule \end{tabularx} \end{table} Accordingly, an interval sample for subsystem $i$ can be represented as \begin{equation} \mathcal{D}_{i,n}=\big\{\Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Delta u_{i,n}(\tau),\ \Delta x_i(t_{n+1})\big\}. \label{eq:interval_sample_clean} \end{equation} Note that $\Delta u_{i,n}(\tau)$ is fully determined by $(\Gamma_{i,n},\delta_n)$ via \eqref{eq:du_poly_vec_clean}, therefore it is sufficient to store $(\Gamma_{i,n},\delta_n)$ as the learning input. For each subsystem $i$, by repeating the above procedure across multiple intervals and randomized draws, the local one-step training dataset is formed as \begin{equation} \begin{split} S_i=\Big\{& \big(\Delta x_i^{(j)}(t_n),\,\Delta x_{Z_i}^{(j)}(t_n),\,\Delta x_i^{(j)}(t_{n+1});\, \Gamma_{i,n}^{(j)},\,\delta_n^{(j)}\big) \ \Big|\ j=1,\ldots,J \Big\}. \end{split} \label{eq:S_i_clean} \end{equation} Here $J$ is the number of one-step samples for subsystem $i$. The overall dataset for the five-stand mill is denoted by $\{S_i\}_{i=1}^{5}$. 
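Steps 1--4 of Table~\ref{tab:interval_sample_generation_en} can be sketched as follows. Since the coupled five-stand simulator is not reproduced here, the sketch substitutes a stable linear placeholder for the mill dynamics (matrices \texttt{A}, \texttt{B}, \texttt{C} are illustrative, not identified), and it condenses the neighbor stack into a single averaged vector; the structure of the resulting record matches \eqref{eq:interval_sample_clean}.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_u = 3, 2                      # deviation-state and input dimensions

def du_traj(tau, gamma):
    """gamma rows = (Gamma0, Gamma1, Gamma2); increment at within-interval time tau."""
    return gamma[0] + gamma[1] * tau + gamma[2] * tau**2

def rk4_step(f, x, t0, t1, n_sub=20):
    """Classical RK4 integration of dx/dtau = f(tau, x) over [t0, t1]."""
    h = (t1 - t0) / n_sub
    t = t0
    for _ in range(n_sub):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# Placeholder dynamics: a stable linear model standing in for the coupled
# five-stand mill simulator; A, B, C below are illustrative only.
A = -0.5 * np.eye(d)
B = 0.1 * rng.normal(size=(d, n_u))
C = 0.05 * rng.normal(size=(d, d))  # influence of the (averaged) neighbor stack

def make_sample(dx, dx_Z, gamma, delta):
    """Steps 1-4: propagate one interval and record the training tuple."""
    f = lambda tau, x: A @ x + B @ du_traj(tau, gamma) + C @ dx_Z
    return {"dx": dx, "dx_Z": dx_Z, "gamma": gamma, "delta": delta,
            "dx_next": rk4_step(f, dx, 0.0, delta)}

sample = make_sample(rng.normal(size=d), rng.normal(size=d),
                     0.1 * rng.normal(size=(3, n_u)), delta=0.05)
assert sample["dx_next"].shape == (d,) and np.isfinite(sample["dx_next"]).all()
```

In the actual pipeline, the lambda \texttt{f} is replaced by the full coupled mill model, and the draws come from the operating envelopes $\mathcal{I}_x$ and $\mathcal{I}_\Gamma$.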
The point-cloud visualization of the training dataset is shown in Figure~\ref{2}. \begin{figure*}[htbp] \centering \includegraphics[scale=0.5]{picture/Fig2.pdf} \caption{Point cloud map of the training dataset.}\label{2} \end{figure*} %======================== \subsection{Multi-step rollout segment dataset} %======================== The one-step set $S_i$ is sufficient for one-step regression, but it is not sufficient for training with multi-step rollout loss and reciprocal-consistency regularization, because these objectives require ground-truth deviation-state trajectories over a horizon of $K$ consecutive intervals. Therefore, without changing the single-interval sampling mechanism above, we additionally organize the offline-simulated samples into $K$-step trajectory segments. Specifically, during offline simulation, for each starting time $t_n$ we generate a segment of length $K$ by consecutively sampling $\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ (and the corresponding inputs/disturbances), and integrating the five-stand coupled mill model over $[t_{n+s},t_{n+s+1}]$ for $s=0,\ldots,K-1$. Hence, we obtain the deviation-state sequence $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as the neighbor stacks $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. Define a $K$-step segment sample for subsystem $i$ as \begin{equation} \begin{aligned} \mathcal{W}_{i,n}= \Big\{& \big(\Delta x_i(t_{n+s}),\,\Delta x_{Z_i}(t_{n+s}),\,\Gamma_{i,n+s},\,\delta_{n+s}\big)_{s=0}^{K-1}; \\ &\big(\Delta x_i(t_{n+s+1})\big)_{s=0}^{K-1} \Big\}. \end{aligned} \label{eq:segment_clean} \end{equation} By repeating the above segment generation, we form the multi-step training set \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}, \label{eq:S_i_K_clean} \end{equation} where $J_K$ is the number of $K$-step segment samples. 
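Organizing consecutive one-step records into overlapping $K$-step segments \eqref{eq:segment_clean} is a pure slicing operation over the simulated trajectory; a minimal sketch (field names hypothetical):

```python
import numpy as np

def make_segments(states, neighbor_states, gammas, deltas, K):
    """Slice one simulated trajectory into overlapping K-step segment samples:
    inputs at s = 0..K-1, rollout targets at s = 1..K."""
    T = len(gammas)                       # number of simulated intervals
    return [{
        "x":  states[n:n + K],            # Delta x_i(t_{n+s}),    s = 0..K-1
        "xZ": neighbor_states[n:n + K],   # Delta x_{Z_i}(t_{n+s})
        "g":  gammas[n:n + K],            # Gamma_{i,n+s}
        "dt": deltas[n:n + K],            # delta_{n+s}
        "y":  states[n + 1:n + K + 1],    # Delta x_i(t_{n+s+1})
    } for n in range(T - K + 1)]

# Toy trajectory of T = 10 intervals with K = 4 gives T - K + 1 = 7 segments.
T, K, d = 10, 4, 3
states = np.arange((T + 1) * d, dtype=float).reshape(T + 1, d)
segs = make_segments(states, states.copy(), np.zeros((T, 6)), np.full(T, 0.05), K)
assert len(segs) == T - K + 1
assert np.allclose(segs[0]["y"][-1], states[K])   # terminal target of segment 0
assert np.allclose(segs[3]["x"][0], states[3])    # segment 3 starts at t_3
```

Keeping only $s=0$ of each record recovers the one-step set, consistent with viewing $S_i$ as the marginal projection of $S_i^{(K)}$.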
Note that $S_i$ can be viewed as the marginal one-step projection of $S_i^{(K)}$ (keeping only $s=0$), thus the original dataset design is preserved, and only an additional \emph{segment organization} is introduced for multi-step training. %======================== \section{Construction of Residual Neural Network} %======================== \subsection{Network architecture (what is learned, why residual, why include control)} \label{subsec:net_clean} \paragraph{Learning target (one-step controlled deviation-state evolution).} Given the dataset, the neural network model is trained to learn a stand-wise, control-dependent one-step evolution law of deviation states: \begin{equation} \Delta x_i(t_{n+1}) \approx \Delta x_i(t_n)+ \mathcal{N}_i\!\Big(\Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\,\Gamma_{i,n},\,\delta_n;\,\Theta_i\Big), \label{eq:learned_dyn_clean} \end{equation} where $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change and $\Theta_i$ are trainable parameters. \paragraph{Why the model must include $u$ (difference between ``with $u$'' and ``without $u$'').} If $\mathcal{N}_i$ does not take control information as input (here $\Gamma_{i,n}$ and $\delta_n$), the predictor becomes an autoregressive model that only reproduces trajectories under the training input patterns and cannot answer the counterfactual question: ``what will happen if we choose a different roll gap/speed trajectory?'' Since MPC optimizes over candidate decisions, a control-dependent predictor \eqref{eq:learned_dyn_clean} is necessary to evaluate the predicted thickness/tension behavior under different candidate actuator trajectories. \paragraph{Input/output dimensions.} Let $d=3$ (state dimension), $|Z_i|$ be the number of neighbors of stand $i$ in \eqref{eq:Zi_clean}, and $p=6$ in \eqref{eq:Gamma_clean}. 
Define the input vector \begin{equation} X_{i,\text{in}} \triangleq \big[ \Delta x_i(t_n)^\top,\, \Delta x_{Z_i}(t_n)^\top,\, \Gamma_{i,n}^\top,\, \delta_n \big]^\top \in \mathbb{R}^{d(1+|Z_i|)+p+1}. \label{eq:X_in_clean} \end{equation} The network mapping is \begin{equation} \mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}. \end{equation} \paragraph{Residual (shortcut) structure.} To improve training stability and long-horizon rollout robustness, we use a residual form. Let $\hat{I}_i\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}$ be a selection matrix extracting the local state block: \begin{equation} \hat{I}_i = [I_d,\, 0_{d\times(d|Z_i|+p+1)}]. \label{eq:Ihat_clean} \end{equation} Then the one-step predictor is written as \begin{equation} X_{i,\text{out}} = \hat{I}_i X_{i,\text{in}} + \mathcal{N}_i(X_{i,\text{in}}; \Theta_i), \label{eq:res_predict_clean} \end{equation} where $X_{i,\text{out}}$ represents the predicted $\Delta x_i(t_{n+1})$. This structure implements a baseline-plus-correction interpretation: the shortcut propagates the current deviation state $\Delta x_i(t_n)$, while the network learns the correction capturing unmodeled nonlinearities and inter-stand coupling (via $\Delta x_{Z_i}$) under varying operating conditions. \paragraph{Auxiliary branch for variable interval length (avoid symbol conflict).} To improve robustness when $\delta_n$ varies, we introduce an auxiliary branch inside $\mathcal{N}_i$: \begin{equation} \mathcal{N}_i(X_{i,\text{in}};\Theta_i)\triangleq \psi_i(X_{i,\text{in}};\Theta_{\psi_i}) + \rho_i(X_{i,\text{in}};\theta_i), \label{eq:aux_clean} \end{equation} where $\psi_i(\cdot)$ is a lightweight feedforward branch that captures low-frequency/scale effects strongly related to $\delta_n$, and $\rho_i(\cdot)$ captures the remaining nonlinear coupling corrections. When $\psi_i(\cdot)\equiv 0$, the model reduces to a standard residual network. 
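The residual structure \eqref{eq:res_predict_clean} with the auxiliary branch \eqref{eq:aux_clean} can be sketched with untrained random weights (shapes only; the hidden width and both branch parameterizations are illustrative, not the trained architecture):

```python
import numpy as np

d, nZ, p = 3, 2, 6                  # state dim, |Z_i|, Gamma dimension
n_in = d * (1 + nZ) + p + 1         # = 16 for an interior stand

# Selection matrix I_hat = [I_d, 0] extracting the local-state block of X_in.
I_hat = np.hstack([np.eye(d), np.zeros((d, n_in - d))])

rng = np.random.default_rng(1)
W1 = 0.1 * rng.normal(size=(32, n_in))
W2 = 0.1 * rng.normal(size=(d, 32))
Wp = 0.01 * rng.normal(size=(d, n_in))

def rho(X):                         # main nonlinear correction branch
    return W2 @ np.tanh(W1 @ X)

def psi(X):                         # lightweight auxiliary branch (delta_n effects)
    return Wp @ X

def predict(dx, dx_Z, gamma, delta):
    """One-step residual predictor: X_out = I_hat X_in + psi(X_in) + rho(X_in)."""
    X_in = np.concatenate([dx, dx_Z, gamma, [delta]])
    return I_hat @ X_in + psi(X_in) + rho(X_in)

dx = np.array([0.2, -0.1, 0.05])
out = predict(dx, np.zeros(d * nZ), np.zeros(p), 0.05)
assert out.shape == (d,)
# The shortcut alone reproduces Delta x_i(t_n): baseline plus learned correction.
assert np.allclose(I_hat[:, :d] @ dx, dx)
```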
(We use $\psi_i$ and $\rho_i$ to avoid notation conflicts with the sampling sets $\mathcal{I}_x,\mathcal{I}_\Gamma$ and the optimizer learning rate.) \paragraph{One-step supervised target.} For the $j$-th sample in \eqref{eq:S_i_clean}, define \begin{equation} X_{i,\text{in}}^{(j)} = \big[ \Delta x_i^{(j)}(t_n),\ \Delta x_{Z_i}^{(j)}(t_n),\ \Gamma_{i,n}^{(j)},\ \delta_n^{(j)} \big]^{\top}, \end{equation} and the supervised residual target \begin{equation} \Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n). \label{eq:target_clean} \end{equation} \subsection{Training, learned model, and system prediction (multi-step stability and control usage)} \label{subsec:train_clean} To suppress accumulation drift induced by long-horizon recursion and to improve long-term predictive stability, we train the forward predictor jointly with an auxiliary backward residual model and impose a multi-step reciprocal-consistency regularization over a $K$-step segment from $S_i^{(K)}$. \paragraph{Backward residual model.} Construct a backward residual network \begin{equation} \mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}, \end{equation} parameterized by $\bar{\Theta}_i$. For the backward step associated with interval $[t_n,t_{n+1}]$, define \begin{equation} \begin{aligned} X_{i,\mathrm{in}}^{b} &= \big[ \Delta x_i(t_{n+1}),\ \Delta x_{Z_i}(t_{n+1}),\ \Gamma_{i,n},\ \delta_n \big]^{\top},\\ X_{i,\mathrm{out}}^{b} &= \hat{I}_i X_{i,\mathrm{in}}^{b} + \mathcal{B}_i(X_{i,\mathrm{in}}^{b};\bar{\Theta}_i), \end{aligned} \label{eq:back_clean} \end{equation} where $X_{i,\mathrm{out}}^{b}$ represents the backward estimate of $\Delta x_i(t_n)$. The supervised backward residual target is \begin{equation} \Delta r_i^{b}=\Delta x_i(t_n)-\Delta x_i(t_{n+1}). 
\end{equation} \paragraph{Forward rollout on a $K$-step segment.} Given a segment sample $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize \begin{equation} \Delta \hat{x}_i(t_n)=\Delta x_i(t_n), \end{equation} and recursively apply the forward predictor for $K$ steps: \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1}) &= \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\,\Delta \hat{x}_{Z_i}(t_{n+s}),\, \Gamma_{i,n+s},\,\delta_{n+s};\,\Theta_i \Big),\\ &\qquad s=0,\ldots,K-1. \end{aligned} \label{eq:fwd_roll_clean} \end{equation} \paragraph{Backward rollout and reciprocal consistency.} Set the terminal condition \begin{equation} \Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}), \end{equation} and roll back using $\mathcal{B}_i$: \begin{equation} \begin{aligned} \Delta \bar{x}_i(t_{n+s}) &= \hat{I}_i X_{i,\mathrm{in}}^{b}(t_{n+s}) + \mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\,\bar{\Theta}_i\Big), \quad s=K-1,\ldots,0, \end{aligned} \label{eq:bwd_roll_clean} \end{equation} where \begin{equation} X_{i,\mathrm{in}}^{b}(t_{n+s})= \big[ \Delta \bar{x}_i(t_{n+s+1}),\ \Delta \hat{x}_{Z_i}(t_{n+s+1}),\ \Gamma_{i,n+s},\ \delta_{n+s} \big]^{\top}. \end{equation} Define the multi-step reciprocal prediction error \begin{equation} E_i(t_n) = \sum_{s=0}^{K} \left\| \Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s}) \right\|^2. 
\end{equation} \paragraph{Training objectives (meaning of each loss).} We jointly minimize: \begin{equation} \begin{aligned} L_{\mathrm{1step}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big) -\mathcal{N}_i\!\left( X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i \right) \Big\|^2,\\[2mm] L_{\mathrm{bwd}}(\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| \big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big) -\mathcal{B}_i\!\left( X_{i,\mathrm{in}}^{b\,(j)}(t_{n+s});\bar{\Theta}_i \right) \Big\|^2,\\[2mm] L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n),\\[2mm] L_{\mathrm{roll}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \Big\| \Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s}) \Big\|^2. \end{aligned} \label{eq:loss_clean} \end{equation} Here, $L_{\mathrm{1step}}$ enforces one-step accuracy; $L_{\mathrm{roll}}$ explicitly suppresses long-horizon drift under recursion; $L_{\mathrm{msrp}}$ regularizes the learned dynamics by enforcing reciprocal consistency between forward and backward rollouts; and $L_{\mathrm{bwd}}$ trains the backward model for the consistency regularization. In implementation, these terms are combined as \begin{equation} L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}}, \end{equation} where $\lambda_1,\lambda_2,\lambda_3,\lambda_4>0$ are tuned on a validation set. \paragraph{Learned forward predictor used in control.} After training, the forward predictor is \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n) + \mathcal{N}_i\!\Big( \Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\, \Gamma_{i,n},\,\delta_n;\,\Theta_i^* \Big), \label{eq:pred_clean} \end{equation} and multi-step prediction is obtained by recursive rollout of \eqref{eq:pred_clean}. 
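To make the roles of $L_{\mathrm{1step}}$, $L_{\mathrm{roll}}$, and the reciprocal term concrete, a scalar toy sketch (illustrative gains, not the trained mill model) contrasts teacher-forced one-step error with compounding rollout error, and verifies that an exactly invertible forward/backward pair makes the reciprocal error vanish:

```python
import numpy as np

a_true, a_model = 0.9, 0.85        # toy "true" vs "learned" one-step gains

def N_fwd(x):                      # forward correction: x + N(x) = a_model * x
    return (a_model - 1.0) * x

def B_bwd(x_next):                 # backward correction: exact inverse map
    return x_next / a_model - x_next

# Ground-truth K-step scalar trajectory.
K, x0 = 5, 1.0
x = [x0]
for _ in range(K):
    x.append(a_true * x[-1])

# L_1step: teacher-forced residual regression (each step starts at TRUE state).
L_1step = np.mean([(x[s + 1] - x[s] - N_fwd(x[s]))**2 for s in range(K)])

# L_roll: recursive rollout error (each step starts at the PREDICTED state).
xh = [x0]
for _ in range(K):
    xh.append(xh[-1] + N_fwd(xh[-1]))
L_roll = sum((x[s] - xh[s])**2 for s in range(1, K + 1))
assert L_roll > K * L_1step        # rollout error compounds over the horizon

# Reciprocal consistency E_i: backward rollout from the forward terminal state.
xb = [0.0] * (K + 1)
xb[K] = xh[K]
for s in range(K - 1, -1, -1):
    xb[s] = xb[s + 1] + B_bwd(xb[s + 1])
E = sum((xh[s] - xb[s])**2 for s in range(K + 1))
assert E < 1e-26                   # exact inverse pair => zero reciprocal error
```

In training, $\mathcal{B}_i$ is of course learned rather than an exact inverse, and $L_{\mathrm{msrp}}$ penalizes precisely the mismatch that then remains.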
This learned predictor is the internal model used by the MPC optimizer in the next section. Finally, network parameters are optimized using Adam: \begin{equation} \Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon}, \end{equation} where $\alpha$ is the learning rate (we use $\alpha$ to avoid conflict with other symbols), $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected moment estimates, and $\varepsilon>0$ is a small constant for numerical stability. Figure~\ref{fig:rnn_logic} illustrates the overall structure. \begin{figure}[htbp] \centering \includegraphics[scale=0.85]{picture/x6.pdf} \caption{Logic diagram of the residual neural network.} \label{fig:rnn_logic} \end{figure} %======================== \section{Nash Equilibrium-Based RNE-DMPC} %======================== The five-stand tandem cold rolling system is strongly coupled through inter-stand tension propagation. As a result, changes in control actions (roll gap and stand speed) at one stand can affect both upstream and downstream stands, making centralized online optimization over all stands' decision variables computationally demanding. To mitigate this issue, we decompose the global predictive-control problem into $N=5$ local subproblems associated with individual stands. Each local controller optimizes its own decision variables while accounting for coupling via limited information exchange with neighboring controllers. Motivated by game-theoretic coordination \citep{rawlings2008coordinating}, we formulate distributed coordination as a Nash-equilibrium-seeking iteration. Based on the trained residual neural network surrogate model, we construct a Nash-equilibrium-based distributed MPC method (RNE-DMPC) for coordinated thickness--tension regulation/tracking. The overall control structure is shown in Figure~\ref{4}. 
\begin{figure*}[htbp] \centering \includegraphics[width=\linewidth]{picture/x2.pdf} \caption{Schematic diagram of the control architecture for a tandem cold rolling mill.}\label{4} \end{figure*} \subsection{Prediction model used in MPC and the prediction--control interface} \label{subsec:interface_clean} \paragraph{Key idea: prediction serves control through model constraints.} At each sampling time $t_n$, MPC evaluates candidate actuator trajectories (encoded by $\Gamma_{i,n+s}$) by rolling out predictions of thickness/tension deviations using the learned surrogate \eqref{eq:pred_clean}. Therefore, the learned predictor directly provides the multi-step forecasts required to compute the MPC objective and enforce constraints. \paragraph{Local polynomial parameterization over the horizon.} Over each interval $[t_{n+s},t_{n+s+1}]$ with length $\delta_{n+s}$, the control increment trajectory of stand $i$ is \begin{equation} \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}) = \Gamma_{i,n+s,0} +\Gamma_{i,n+s,1}\tau +\Gamma_{i,n+s,2}\tau^2,\qquad \tau \in [0,\delta_{n+s}], \label{eq:du_poly_mpc_clean} \end{equation} where $\Gamma_{i,n+s}\in\mathbb{R}^{p}$ with $p=6$. \paragraph{Neural-network-based multi-step prediction inside MPC.} Define the prediction horizon $N_p$ (number of future intervals predicted) and the control horizon $N_c$ (number of future intervals optimized), with $N_c\le N_p$. 
Given the current measured/estimated deviation states $\Delta x_i(t_n)$ and a candidate decision sequence $\mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top$, stand $i$ predicts its deviation-state evolution by \begin{equation} \begin{aligned} \Delta \hat{x}_i(t_{n+s+1}) &= \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\, \Delta \hat{x}_{Z_i}(t_{n+s}),\, \Gamma_{i,n+s},\, \delta_{n+s};\, \Theta_i^* \Big), \\ &\qquad s=0,\ldots,N_p-1, \end{aligned} \label{eq:rollout_mpc_clean} \end{equation} with initialization $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$. Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from the latest communication with neighbors during the Nash iteration. Equation \eqref{eq:rollout_mpc_clean} is the explicit mathematical interface: \textbf{control decisions} $(\Gamma_{i,n+s})$ $\rightarrow$ \textbf{predicted thickness/tension deviations} $(\Delta \hat{x}_i)$ $\rightarrow$ \textbf{objective/constraints evaluation}. \subsection{Local optimization problem (objective, reference meaning, constraints, and numerical solution)} \label{subsec:local_opt_clean} \paragraph{Decision variables.} At time $t_n$, the local decision vector for stand $i$ is \begin{equation} \mathbf{\Gamma}_i(t_n) = \big[ \Gamma_{i,n}^\top,\, \Gamma_{i,n+1}^\top,\, \ldots,\, \Gamma_{i,n+N_c-1}^\top \big]^\top \in \mathbb{R}^{pN_c}. \label{eq:Gamma_seq_clean} \end{equation} \paragraph{Reference meaning (remove ambiguity of $\Delta x_{\mathrm{ref}}$).} Because the deviation state is defined as $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ in \eqref{eq:dev_def} and \eqref{eq:xi_def_clean}, the desired regulation/tracking objective in deviation coordinates is always \begin{equation} \Delta x_i(t)\rightarrow 0. \end{equation} Therefore, the reference in deviation form is simply the zero vector, i.e., \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}. 
\label{eq:dxref_zero_clean} \end{equation} Equivalently, one can view the cost as penalizing $(\hat x_i-x_i^{\mathrm{ref}})$ in absolute coordinates; in this paper we keep the deviation formulation. \paragraph{Local objective (explicitly thickness+tension).} Let $\Delta \hat{x}_i(t_{n+s})=[\Delta \hat h_i(t_{n+s}),\,\Delta \widehat T_{i-1}(t_{n+s}),\,\Delta \widehat T_i(t_{n+s})]^\top$. The local cost is \begin{equation} \begin{aligned} J_i &= \sum_{s=1}^{N_p} \big\| \Delta \hat{x}_i(t_{n+s}) - \Delta x_{i,\mathrm{ref}}(t_{n+s}) \big\|_{Q_i}^2 + \sum_{s=0}^{N_c-1} \big\| \Gamma_{i,n+s} \big\|_{R_i}^2, \end{aligned} \label{eq:Ji_clean} \end{equation} where $Q_i\in\mathbb{R}^{d\times d}$ weights thickness and tension deviations, and $R_i\in\mathbb{R}^{p\times p}$ penalizes the polynomial-parameter magnitude to encourage smooth increments. Using \eqref{eq:dxref_zero_clean}, the tracking term reduces to penalizing predicted deviation states directly. \paragraph{Constraints.} We enforce both absolute-input bounds and increment-trajectory bounds. \emph{Absolute input bounds:} \begin{equation} u_{i,\min} \le u_i(t_{n+s}) \le u_{i,\max}, \qquad s=0,\ldots,N_p-1, \label{eq:u_abs_clean} \end{equation} where $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ provide component-wise bounds for $(s_i,v_i)$. \emph{Increment trajectory bounds for all $\tau$:} \begin{equation} \Delta u_{i,\min} \le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s}) \le \Delta u_{i,\max}, \qquad \forall\tau\in[0,\delta_{n+s}], \label{eq:du_traj_clean} \end{equation} where $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ are component-wise bounds for $(\Delta s,\Delta v)$. \paragraph{Practical enforcement of \eqref{eq:du_traj_clean} (no nonstandard symbols).} For a scalar quadratic $q(\tau)=a+b\tau+c\tau^2$ on $\tau\in[0,\delta]$, its extrema over the interval occur at $\tau=0$, $\tau=\delta$, and possibly at the stationary point $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$. 
Therefore, to enforce \eqref{eq:du_traj_clean} for the two-channel vector $\Delta u_{i,n+s}(\tau)=[\Delta s_{i,n+s}(\tau),\,\Delta v_{i,n+s}(\tau)]^\top$, we check the above candidate points \emph{separately for each channel} using the corresponding coefficients in \eqref{eq:du_components_clean}. \paragraph{Consistency between within-interval trajectory and discrete execution (interval average).} To update the discrete absolute input and enforce \eqref{eq:u_abs_clean} consistently, we use the interval-averaged increment: \begin{equation} \Delta u_i(t_{n+s}) = \frac{1}{\delta_{n+s}}\int_0^{\delta_{n+s}}\Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\,d\tau = \Gamma_{i,n+s,0} +\Gamma_{i,n+s,1}\frac{\delta_{n+s}}{2} +\Gamma_{i,n+s,2}\frac{\delta_{n+s}^2}{3}. \label{eq:du_avg_clean} \end{equation} Then propagate the absolute input sequence: \begin{equation} \begin{aligned} u_i(t_n) &= u_i(t_{n-1}) + \Delta u_i(t_n), \\ u_i(t_{n+s}) &= u_i(t_{n+s-1}) + \Delta u_i(t_{n+s}), \qquad s=1,\ldots,N_p-1. \end{aligned} \label{eq:u_prop_clean} \end{equation} \paragraph{Local optimization problem (solved at each Nash iteration).} At Nash-iteration index $l$, subsystem $i$ solves \begin{equation} \mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\; J_i \quad \text{s.t.}\quad \eqref{eq:rollout_mpc_clean},\ \eqref{eq:u_abs_clean},\ \eqref{eq:du_traj_clean},\ \eqref{eq:u_prop_clean}. \label{eq:local_prob_clean} \end{equation} Because the learned surrogate $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_prob_clean} is a differentiable nonlinear program (NLP), which can be solved by standard gradient-based NLP solvers (e.g., SQP or interior-point methods) using automatic differentiation to compute gradients. 
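The two mechanics above, checking the quadratic increment bound at the interval endpoints and interior stationary point, and propagating absolute inputs through interval-averaged increments \eqref{eq:du_avg_clean}--\eqref{eq:u_prop_clean}, can be sketched as follows (helper names illustrative):

```python
import numpy as np

def quad_range(a, b, c, delta):
    """Extrema of q(tau) = a + b*tau + c*tau^2 on [0, delta]: endpoints plus
    the stationary point tau* = -b/(2c) when it lies inside the interval."""
    taus = [0.0, delta]
    if c != 0.0:
        tau_star = -b / (2.0 * c)
        if 0.0 <= tau_star <= delta:
            taus.append(tau_star)
    vals = [a + b * t + c * t * t for t in taus]
    return min(vals), max(vals)

def increment_feasible(coeffs, delta, lo, hi):
    """Per-channel check of du_min <= Delta u(tau) <= du_max for ALL tau."""
    return all(lo[k] <= quad_range(*coeffs[k], delta)[0]
               and quad_range(*coeffs[k], delta)[1] <= hi[k]
               for k in range(len(coeffs)))

def propagate_inputs(u_prev, gamma_seq, delta_seq):
    """u(t_{n+s}) = u(t_{n+s-1}) + interval-averaged increment
    Gamma0 + Gamma1*delta/2 + Gamma2*delta^2/3, per interval."""
    u, out = np.asarray(u_prev, float), []
    for (g0, g1, g2), delta in zip(gamma_seq, delta_seq):
        u = u + g0 + g1 * delta / 2.0 + g2 * delta**2 / 3.0
        out.append(u.copy())
    return np.array(out)

# Channel (s): tau - tau^2 peaks at 0.25; channel (v): constant 0.1.
coeffs = [(0.0, 1.0, -1.0), (0.1, 0.0, 0.0)]
assert quad_range(0.0, 1.0, -1.0, 1.0) == (0.0, 0.25)
assert increment_feasible(coeffs, 1.0, lo=[-0.1, 0.0], hi=[0.3, 0.2])
assert not increment_feasible(coeffs, 1.0, lo=[-0.1, 0.0], hi=[0.2, 0.2])

# Constant increments only (Gamma1 = Gamma2 = 0): gap +0.01, speed +0.1 each.
g = [(np.array([0.01, 0.1]), np.zeros(2), np.zeros(2))] * 2
u_seq = propagate_inputs([1.50, 10.0], g, [0.05, 0.05])
assert np.allclose(u_seq[-1], [1.52, 10.2])
```

The third feasibility check fails exactly because the interior maximum $0.25$ exceeds the bound $0.2$, which an endpoint-only check would miss.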
\subsection{Nash equilibrium coordination (where conflict comes from, how it is resolved, and how the solution is obtained)} \label{subsec:nash_clean} \paragraph{Why a Nash iteration is needed (explicit conflict explanation).} Inter-stand tensions are shared coupling variables: the tension $T_i$ is influenced by both stand $i$ and stand $i+1$ (notably through their speed actions), and changes in roll gap can also indirectly affect tensions via strip deformation and transport. Therefore, purely independent local optimization can lead to conflicting actions: improving local thickness may worsen neighbor tensions, and vice versa. To resolve this coupling conflict with limited communication, we adopt a Nash-equilibrium-seeking distributed best-response iteration. \paragraph{Distributed best-response iteration.} At iteration $l$, each stand computes its best response to the latest neighbor strategies and predictions. The procedure is summarized in Table~\ref{tab:nash_iter_en}. \begin{table}[t] \centering \small \renewcommand{\arraystretch}{1.12} \setlength{\tabcolsep}{3.5pt} \caption{Distributed Nash best-response iteration for RNE-DMPC (five-stand).} \label{tab:nash_iter_en} \begin{tabularx}{\linewidth}{>{\centering\arraybackslash}p{0.11\linewidth} X} \toprule \textbf{Step} & \textbf{Description} \\ \midrule A & Initialize $l=1$ and initialize $\mathbf{\Gamma}_i^{(0)}$ for all subsystems (e.g., warm-start from previous time step). \\ B & Using \eqref{eq:rollout_mpc_clean}, compute $\Delta \hat{x}_i^{(l)}(t_{n+s})$ for $s=1,\ldots,N_p$ \\ & given $\mathbf{\Gamma}_i^{(l-1)}$ and the latest neighbor predictions $\Delta \hat{x}_{Z_i}^{(l-1)}(t_{n+s})$. \\ C & Solve the local NLP \eqref{eq:local_prob_clean} to update $\mathbf{\Gamma}_i^{(l)}$ (best response). \\ D & Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories $\Delta \hat{x}_i^{(l)}(t_{n+s})$ to the communication system. 
\\ E & Update neighbor predictions $\Delta \hat{x}_{Z_i}^{(l)}(t_{n+s})$ using received information; re-generate predictions if needed. \\ F & Compute the maximum relative change $\varsigma^{(l)}$ as the convergence metric. \\ G & If $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$, stop and set $\mathbf{\Gamma}_i^*=\mathbf{\Gamma}_i^{(l)}$; \\ & otherwise set $l\leftarrow l+1$ and repeat Steps B--F. \\ \bottomrule \end{tabularx} \end{table} \paragraph{Convergence metric.} Define \begin{equation} \varsigma^{(l)} = \max_i \frac{\left\| \mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)} \right\|_2}{ \left\| \mathbf{\Gamma}_i^{(l-1)} \right\|_2+\epsilon}, \label{eq:nash_metric_clean} \end{equation} where $\epsilon>0$ is a small constant to avoid division by zero. \paragraph{Receding-horizon implementation (how the final control is applied).} Only the first-interval parameters $\Gamma_{i,n}^*$ are applied. The increment trajectory over $[t_n,t_{n+1}]$ is \begin{equation} \Delta u_{i,n}(\tau)=\Delta u_{i,n}(\tau;\Gamma_{i,n}^*), \quad \tau\in[0,\delta_n]. \end{equation} The discrete input increment applied for updating the absolute input is the interval average: \begin{equation} \Delta u_i(t_n) = \frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau = \Gamma_{i,n,0}^* + \Gamma_{i,n,1}^*\frac{\delta_n}{2} + \Gamma_{i,n,2}^*\frac{\delta_n^2}{3}. \label{eq:apply_avg_clean} \end{equation} Then the absolute input is updated by \begin{equation} u_i(t_n)=u_i(t_{n-1})+\Delta u_i(t_n), \label{eq:apply_u_clean} \end{equation} which ensures smooth evolution of both roll gap and stand speed and avoids abrupt actuator changes. 
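The convergence metric \eqref{eq:nash_metric_clean} is a maximum relative strategy change across stands; a minimal sketch:

```python
import numpy as np

def nash_metric(gammas_new, gammas_old, eps=1e-9):
    """varsigma = max_i ||Gamma_i_new - Gamma_i_old|| / (||Gamma_i_old|| + eps)."""
    return max(
        np.linalg.norm(gn - go) / (np.linalg.norm(go) + eps)
        for gn, go in zip(gammas_new, gammas_old)
    )

old = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
new = [np.array([1.0, 0.1]), np.array([0.0, 2.0])]   # only stand 1 changed
assert abs(nash_metric(new, old) - 0.1 / (1.0 + 1e-9)) < 1e-12
assert nash_metric(old, old) == 0.0                  # fixed point: converged
```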
\paragraph{Closed-loop prediction--control connection (complete loop).} At each sampling instant $t_n$: (i) measure/estimate the current deviation states $\Delta x_i(t_n)$ (thickness and tensions) and initialize neighbor information; (ii) perform Nash best-response iterations; in each iteration solve \eqref{eq:local_prob_clean} using the learned predictor \eqref{eq:rollout_mpc_clean}; (iii) after convergence, apply $\Gamma_{i,n}^*$ by generating the within-interval increment trajectory and updating $u_i(t_n)$ via \eqref{eq:apply_avg_clean}--\eqref{eq:apply_u_clean}; (iv) move to $t_{n+1}$ and repeat. In this way, the learned neural predictor supplies the multi-step forecasts needed by MPC, and the distributed optimization computes coordinated gap/speed increment trajectories that regulate thickness and tensions while resolving coupling conflicts via Nash equilibrium. The overall control flow chart is shown in Fig.~\ref{liu}. \begin{figure}[htbp] \centering \includegraphics[width=\linewidth]{picture/x5.pdf} \caption{The overall system control flow chart}\label{liu} \end{figure} %======================== % Optional: a compact symbol paragraph you can keep or move to Appendix %======================== \paragraph{Summary of key symbols (for reviewer clarity).} $i\in\{1,\dots,5\}$: stand index; $t_n$: sampling instant; $\delta_n$: sampling interval; $\tau$: within-interval time. $h_i,T_i$: thickness and inter-stand tension; $h_i^{\mathrm{ref}},T_i^{\mathrm{ref}}$: references. $\Delta h_i,\Delta T_i$: deviations; $\Delta x_i$: local deviation state in \eqref{eq:xi_def_clean}. $u_i=[s_i,v_i]^\top$: actuators (gap and speed); $\Delta u_i(t_n)=u_i(t_n)-u_i(t_{n-1})$: discrete input increment. $\Delta u_{i,n}(\tau;\Gamma_{i,n})$: within-interval increment trajectory; $\Gamma_{i,n}\in\mathbb{R}^{p}$: polynomial parameters ($p=6$). $Z_i$: neighbor set; $\Delta x_{Z_i}$: neighbor stack. 
$\mathcal{N}_i(\cdot)$: learned forward residual model; $\mathcal{B}_i(\cdot)$: backward model. $K$: segment length; $J,J_K$: numbers of one-step and segment samples. $N_p,N_c$: prediction and control horizons; $Q_i,R_i$: weights; $\epsilon,\varepsilon$: small constants.
This version performs the global consistency cleanup you asked for.

If you would like, I can also go on to align the residual numbering in your original text (for example the (40)-(43) you cite, and the figure numbers) with the numbering scheme of the full paper, and update the cross-references to their final form.

I now have two questions. First, is the polynomial parameterization necessarily correct, and is the present explanation reasonable? Second, are there still points that are too vague? If the current version is not good enough, you may restructure the article and write a new one.

Both of your questions hit the mark. The conclusions first:

The polynomial parameterization is not "necessarily correct"; it is a control-parameterization approximation, an engineering choice. Your current explanation goes in the right direction, but to make it solid you must spell out the assumptions under which it holds, how it is instantiated in your system, and what to do when those assumptions fail; moreover, the part where the interval average is used to update the absolute input needs a more rigorous consistency treatment, otherwise it will be challenged.

Your current globally cleaned-up version is already much clearer than before, but some points may still be considered vague.

You asked whether, if the current version is not good, the structure can be shuffled and rewritten. I suggest switching directly to a more reviewer-friendly structure: first define the closed-loop control problem, then define how the control trajectory is parameterized, then explain how the data are generated and a control-dependent model is trained, and finally explain how the DMPC/Nash problem is solved and implemented in a receding-horizon fashion. This way the reader does not get lost in Section 2.

Below I give you two parts.

The core phrasing can be written as follows.

A reviewer could easily ask:

Within the interval you specify a trajectory; what exactly is the discrete increment at the sampling instant? And what is the physical meaning of updating the absolute input with the average?

The most robust fix is to distinguish two quantities, with two symbols:

the end-point increment, which determines the absolute input at the next sampling instant and is used for the update; and

the interval-equivalent increment, used in the discrete model or as the equivalent action.

The latter serves as the "equivalent discrete input" or as an energy measure in constraints and costs, but it should not define the end-point value at the sampling instant.

This way you will not contradict yourself between the discrete update and the within-interval trajectory.

The simplest one-sentence hardening for the paper:

That sentence does not require a table, but if your experiments later include an ablation over the parameterization order (p=2, p=4, p=6), reviewers will be thoroughly convinced.

Even after the notation is clarified, the points that may still be considered vague usually concern method boundaries and engineering details that are not grounded; I suggest writing them out explicitly in the new structure.

The following is the restructured, rewritten version.

You can directly copy the passage below to replace your Sections 2 through 4 (adjust the section numbers to match the full paper).
%====================================================================== \section{Problem Formulation and Interval-Level Input Parameterization} %====================================================================== \subsection{Five-stand tandem mill, controlled variables, and references} Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ be the sampling instants and $\delta_n=t_{n+1}-t_n$ the sampling interval length. The controlled physical quantities are: (i) exit thickness $h_i(t)$ of each stand $i$ ($i=1,\dots,5$); (ii) inter-stand tension $T_i(t)$ between stand $i$ and $i+1$ ($i=1,\dots,4$). Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ denote the desired references given by process requirements. They can be constant setpoints (regulation) or time-varying trajectories (tracking). Define deviation (tracking-error) variables \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \end{equation} Accordingly, the stand-wise local deviation state is chosen as \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3, \label{eq:state_def_new} \end{equation} with boundary convention $\Delta T_0(t)\equiv 0$ and $\Delta T_5(t)\equiv 0$. The coupling is dominated by adjacent-tension propagation, therefore we use the neighbor sets \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad Z_5=\{4\}, \end{equation} and define the neighbor stack \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}. \end{equation} \subsection{Actuators and discrete-time decision variables} Each stand $i$ is manipulated by the roll gap (screw-down/hydraulic gap) $s_i(t)$ and the stand speed $v_i(t)$: \begin{equation} u_i(t)=\begin{bmatrix}s_i(t)\\ v_i(t)\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. 
\label{eq:ui_new} \end{equation} To encourage smooth actuation and avoid abrupt changes, we optimize input increments at the sampling instants: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) = \begin{bmatrix}\Delta s_i(t_n)\\ \Delta v_i(t_n)\end{bmatrix}. \label{eq:du_discrete_new} \end{equation} Note that $\Delta x$ denotes deviation-from-reference states, while $\Delta u$ denotes sample-to-sample input increments. \subsection{Interval-level input trajectory parameterization (assumption and justification)} Within each sampling interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$. We parameterize the within-interval input increment trajectory by a low-order polynomial: \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) = \Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,\qquad \tau\in[0,\delta_n], \label{eq:poly_new} \end{equation} where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and the stacked parameter vector is \begin{equation} \Gamma_{i,n}\triangleq \big[(\Gamma_{i,n0})^\top,(\Gamma_{i,n1})^\top,(\Gamma_{i,n2})^\top\big]^\top\in\mathbb{R}^{p}, \qquad p=3n_u=6. \end{equation} Component-wise, \begin{equation} \Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad \Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2. \end{equation} \paragraph{Remark on correctness and applicability.} The polynomial parameterization \eqref{eq:poly_new} is a control-parameterization choice rather than a first-principles truth. It is adopted because: (i) it provides a compact finite-dimensional decision representation; (ii) it yields smooth within-interval commands, which is consistent with typical industrial interpolation/ramps of gap and speed setpoints; (iii) it enables enforcing increment bounds for all $\tau\in[0,\delta_n]$. 
If the actual implementation is zero-order hold (ZOH), \eqref{eq:poly_new} reduces to a constant increment by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$, so the proposed framework still applies. \subsection{End-point increment versus interval-equivalent increment} To avoid ambiguity between the within-interval command and discrete-time updates, we distinguish: \textbf{End-point increment} (used to update the sampled input): \begin{equation} \Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n}). \label{eq:du_end} \end{equation} \textbf{Interval-equivalent (averaged) increment} (used as an equivalent discrete quantity when needed): \begin{equation} \overline{\Delta u}_i(t_n)\triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau = \Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}. \label{eq:du_avg} \end{equation} In online execution, the sampled input is updated by the end-point increment: \begin{equation} u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n). \label{eq:u_update_end} \end{equation} This definition is consistent with the within-interval trajectory \eqref{eq:poly_new} and avoids mixing ``average'' with ``end-point'' values. %====================================================================== \section{Dataset Construction for Learning One-step and Multi-step Dynamics} %====================================================================== \subsection{Data-driven one-step mapping to be learned} Over each interval $[t_n,t_{n+1}]$, the (unknown) deviation-state evolution can be written as a discrete mapping \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta d_i([t_n,t_{n+1}])\Big), \label{eq:unknown_map_new} \end{equation} where $\Phi_i(\cdot)$ is nonlinear and coupled due to tension propagation and rolling deformation interactions. 
Because accurate first-principles identification is difficult, we learn an approximation of \eqref{eq:unknown_map_new} from offline-simulated data. \subsection{Sampling domains and one-step sample generation} Let $\mathcal{I}_x$ denote the sampling domain of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$. Let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial parameters $\Gamma_{i,n}$. One interval-level sample is generated by: \begin{enumerate} \item Sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \item Sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and select $\delta_n$ from a prescribed range. \item Construct $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ via \eqref{eq:poly_new}. \item Integrate the \emph{five-stand coupled} mill model on $[t_n,t_{n+1}]$ (e.g., RK4) using the within-interval input trajectory, and record $\Delta x_i(t_{n+1})$. \end{enumerate} An interval sample is thus \begin{equation} \mathcal{D}_{i,n}=\Big\{\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta x_i(t_{n+1})\Big\}. \end{equation} Repeating this procedure yields the one-step dataset \begin{equation} S_i=\Big\{ \big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big) \ \Big|\ j=1,\ldots,J \Big\}, \end{equation} where $J$ is the number of one-step samples and the overall dataset is $\{S_i\}_{i=1}^{5}$. \subsection{Multi-step rollout segment dataset} For multi-step training objectives, we organize the offline-simulated samples into $K$-step segments. Starting from $t_n$, sample $\{\Gamma_{i,n+s},\delta_{n+s}\}_{s=0}^{K-1}$ consecutively, integrate the coupled mill model, and obtain $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ as well as $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. 
Define the segment sample \begin{equation} \mathcal{W}_{i,n}= \Big\{ (\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1}; (\Delta x_i(t_{n+s+1}))_{s=0}^{K-1} \Big\}, \end{equation} and the segment dataset \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}, \end{equation} where $J_K$ is the number of $K$-step segments. %====================================================================== \section{Residual Neural Surrogate Model and Training} %====================================================================== \subsection{Controlled residual predictor (must include control)} We learn a control-dependent one-step deviation-state predictor \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n)+ \mathcal{N}_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i\Big). \label{eq:surrogate_new} \end{equation} Including $(\Gamma_{i,n},\delta_n)$ is necessary: without control input, the model degenerates to an autoregressive predictor and cannot be used inside MPC, because MPC must evaluate trajectories under different candidate decisions. Define the network input vector \begin{equation} X_{i,\mathrm{in}}= \big[ \Delta x_i(t_n)^\top,\Delta x_{Z_i}(t_n)^\top,\Gamma_{i,n}^\top,\delta_n \big]^\top \in\mathbb{R}^{d(1+|Z_i|)+p+1}, \end{equation} and $\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\to\mathbb{R}^{d}$. \subsection{Residual structure and auxiliary branch for varying $\delta_n$} Let $\hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}]$ extract the local state block from $X_{i,\mathrm{in}}$. The residual predictor can be written as \begin{equation} X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i), \end{equation} where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$. 
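The selection matrix $\hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}]$ above can be checked numerically. The sketch below assumes an interior stand ($|Z_i|=2$) with $d=3$ and $p=6$; the input-vector values are hypothetical placeholders.

```python
import numpy as np

# Numerical check of the selection matrix I_hat = [I_d, 0]: it extracts
# the local-state block (the first d entries of X_in) from the stacked
# network input of dimension d*(1+|Z_i|) + p + 1.
d, n_z, p = 3, 2, 6
in_dim = d * (1 + n_z) + p + 1           # = 16 for an interior stand
I_hat = np.hstack([np.eye(d), np.zeros((d, d * n_z + p + 1))])

x_local = np.array([0.3, -0.1, 0.2])     # hypothetical local state block
x_in = np.concatenate([x_local, np.arange(in_dim - d, dtype=float)])
extracted = I_hat @ x_in                 # equals the local state block
```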
To enhance robustness to variable $\delta_n$, decompose \begin{equation} \mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i), \end{equation} where $\psi_i(\cdot)$ captures low-frequency/scale effects associated with $\delta_n$, and $\rho_i(\cdot)$ captures remaining nonlinear coupling corrections. \subsection{One-step target and multi-step training objectives} For one-step supervision, define the residual target \begin{equation} \Delta r_i^{(j)}=\Delta x_i^{(j)}(t_{n+1})-\Delta x_i^{(j)}(t_n). \end{equation} To suppress long-horizon drift, we further use $K$-step rollout loss and reciprocal-consistency regularization. Construct a backward residual model $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$ with the same input dimension. Forward rollout over a segment initializes $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$ and applies \eqref{eq:surrogate_new} recursively for $K$ steps. Backward rollout starts from $\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K})$ and rolls back using $\mathcal{B}_i$ along the same segment. Define reciprocal error \begin{equation} E_i(t_n)=\sum_{s=0}^{K}\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\|^2, \end{equation} and losses \begin{equation} \begin{aligned} L_{\mathrm{1step}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| (\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})) -\mathcal{N}_i(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i) \Big\|^2,\\ L_{\mathrm{bwd}}(\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \Big\| (\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})) -\mathcal{B}_i(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i) \Big\|^2,\\ L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}E_i^{(j)}(t_n),\\ L_{\mathrm{roll}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \|\Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s})\|^2. 
\end{aligned} \end{equation} The total loss is \begin{equation} L_{\mathrm{total}}=\lambda_1L_{\mathrm{1step}}+\lambda_2L_{\mathrm{roll}}+\lambda_3L_{\mathrm{msrp}}+\lambda_4L_{\mathrm{bwd}}, \end{equation} with $\lambda_\ell>0$ tuned on a validation set. Optimization uses Adam: \begin{equation} \Theta_{i,t+1} = \Theta_{i,t} - \alpha \frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}} + \varepsilon}, \end{equation} where $\alpha$ is the learning rate and $\varepsilon>0$ is a small constant. %====================================================================== \section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC for Thickness--Tension Control} %====================================================================== \subsection{Control objective, coupling conflict, and why Nash coordination} Because tensions $T_i$ are shared by stand $i$ and $i+1$ and are mainly affected by their speed actions, local optimization decisions can conflict: improving local thickness by changing roll gap or speed may worsen shared tensions of neighbors. To coordinate with limited communication and reduced computational burden, we employ a Nash-equilibrium-seeking distributed MPC. \subsection{Neural predictor as MPC model constraint (prediction serves control)} At time $t_n$, define prediction horizon $N_p$ and control horizon $N_c\le N_p$. Given measured/estimated $\Delta x_i(t_n)$ and a candidate decision sequence $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, multi-step prediction is obtained by recursively applying \begin{equation} \Delta \hat{x}_i(t_{n+s+1}) = \Delta \hat{x}_i(t_{n+s}) + \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}), \Gamma_{i,n+s},\delta_{n+s};\Theta_i^* \Big), \quad s=0,\ldots,N_p-1, \label{eq:mpc_rollout_new} \end{equation} with initialization $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$. Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from neighbors through communication during Nash iterations. 
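The recursive rollout above can be sketched as follows; the trained network $\mathcal{N}_i$ is replaced by a hypothetical stand-in `net` with the same interface (current state, neighbor stack, decision parameters, interval length), so only the recursion structure is illustrated.

```python
import numpy as np

def rollout(dx0, dx_nbr_seq, gamma_seq, delta_seq, net, n_p):
    """Residual multi-step rollout: dx[s+1] = dx[s] + net(...).
    Beyond the control horizon, the last decision vector is held."""
    dx_hat = [np.asarray(dx0, dtype=float)]
    n_c = len(gamma_seq)
    for s in range(n_p):
        g = gamma_seq[min(s, n_c - 1)]
        inc = net(dx_hat[-1], dx_nbr_seq[s], g, delta_seq[s])
        dx_hat.append(dx_hat[-1] + inc)   # residual (shortcut) update
    return dx_hat

def net(dx, dx_nbr, gamma, delta):
    # hypothetical stand-in for the trained residual network
    return -0.1 * dx + 0.01 * delta * gamma[:3]

traj = rollout(np.array([1.0, 0.0, -1.0]),
               [np.zeros(6)] * 5, [np.zeros(6)] * 3, [0.2] * 5, net, n_p=5)
```

With zero decisions the stand-in contracts the deviation state by a factor 0.9 per step, mimicking the persistence-plus-correction behavior of the learned predictor.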
Equation \eqref{eq:mpc_rollout_new} is the explicit interface: decisions $\Gamma\mapsto$ predicted thickness/tension deviations. \subsection{Local optimization problem (clear reference, objective, constraints)} Because $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$, the desired deviation reference is always \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0. \end{equation} Let $Q_i\in\mathbb{R}^{d\times d}$ and $R_i\in\mathbb{R}^{p\times p}$. The local cost is \begin{equation} J_i = \sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2 + \sum_{s=0}^{N_c-1}\|\Gamma_{i,n+s}\|_{R_i}^2. \label{eq:local_cost_new} \end{equation} Constraints include: \textbf{Absolute input bounds} for roll gap and speed: \begin{equation} u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1. \end{equation} \textbf{Increment trajectory bounds} over the whole interval: \begin{equation} \Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max}, \qquad \forall \tau\in[0,\delta_{n+s}]. \end{equation} For quadratic trajectories, these bounds are enforced by checking $\tau=0$, $\tau=\delta_{n+s}$, and the stationary point $\tau^\star=-\gamma_1/(2\gamma_2)$ of each channel (with $\gamma_1,\gamma_2$ denoting that channel's linear and quadratic coefficients) whenever $\gamma_2\neq 0$ and $\tau^\star\in[0,\delta_{n+s}]$. \textbf{Discrete-time input propagation} is performed using the end-point increment \eqref{eq:du_end}: \begin{equation} u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}), \qquad \Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}). \end{equation} At Nash iteration index $l$, stand $i$ solves the differentiable NLP: \begin{equation} \mathbf{\Gamma}_i^{(l)}= \arg\min_{\mathbf{\Gamma}_i}\ J_i \quad\text{s.t.}\quad \eqref{eq:mpc_rollout_new}\ \text{and all constraints above}. \end{equation} Because $\mathcal{N}_i(\cdot)$ is differentiable, this NLP can be solved using gradient-based methods (SQP/interior-point) with automatic differentiation. 
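The whole-interval bound check for a quadratic channel reduces to evaluating three candidate points, as stated above. A minimal sketch for one channel (coefficient and bound values are illustrative):

```python
def increment_extrema(gamma0, gamma1, gamma2, delta):
    """Min/max of du(tau) = g0 + g1*tau + g2*tau^2 on [0, delta]:
    evaluate tau = 0, tau = delta, and the stationary point
    tau* = -g1/(2*g2) when it lies strictly inside the interval."""
    candidates = [0.0, delta]
    if gamma2 != 0.0:
        tau_star = -gamma1 / (2.0 * gamma2)
        if 0.0 < tau_star < delta:
            candidates.append(tau_star)
    vals = [gamma0 + gamma1 * t + gamma2 * t**2 for t in candidates]
    return min(vals), max(vals)

def increment_bounds_satisfied(gamma0, gamma1, gamma2, delta, lo, hi):
    mn, mx = increment_extrema(gamma0, gamma1, gamma2, delta)
    return lo <= mn and mx <= hi

# du(tau) = tau - tau^2 on [0, 2]: max 0.25 at tau* = 0.5, min -2 at tau = 2
lo_v, hi_v = increment_extrema(0.0, 1.0, -1.0, 2.0)
```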
\subsection{Distributed Nash best-response iteration and practical termination} The Nash best-response iteration proceeds as: (A) initialize $\mathbf{\Gamma}_i^{(0)}$ (warm start from previous time $t_{n-1}$); (B) rollout predictions using \eqref{eq:mpc_rollout_new}; (C) solve local NLP to update $\mathbf{\Gamma}_i^{(l)}$; (D) broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors; (E) update neighbor predictions; (F) compute \begin{equation} \varsigma^{(l)}= \max_i\frac{\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\|_2}{\|\mathbf{\Gamma}_i^{(l-1)}\|_2+\epsilon}, \end{equation} where $\epsilon>0$ avoids division by zero; (G) stop if $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$. \paragraph{Practical safeguards.} Because best-response iterations may not be globally convergent for general nonlinear coupled problems, we set a maximum iteration number $L_{\max}$. If the stopping criterion is not met within $L_{\max}$, we apply the last iterate $\mathbf{\Gamma}_i^{(L_{\max})}$. Optionally, a relaxation update can be used: $\mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)}$ with $\beta\in(0,1]$ to enhance robustness. \subsection{Receding-horizon closed-loop implementation (complete prediction--control loop)} At each sampling time $t_n$: (i) measure/estimate $\Delta x_i(t_n)$ and receive neighbor information; (ii) run Nash best-response iterations to obtain $\Gamma_{i,n}^*$; (iii) apply the first-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$; (iv) update the sampled input by $u_i(t_{n+1})=u_i(t_n)+\Delta u_{i}^{\mathrm{end}}(t_n)$; (v) shift the horizon to $t_{n+1}$ and repeat. In this way, the learned predictor provides multi-step forecasts for MPC, and the distributed optimization computes coordinated roll-gap/speed commands that regulate thickness and tensions while resolving coupling conflicts via Nash equilibrium.
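Steps (A)--(G) of the best-response iteration can be sketched in a few lines. The local NLP solver is abstracted as a callable `best_response`; the toy quadratic best response below (each agent moving toward its chain neighbors' mean) is only a hypothetical stand-in used to exercise the loop and the relative-change stopping metric $\varsigma^{(l)}$.

```python
import numpy as np

def nash_iteration(gammas0, best_response, tol=1e-6, l_max=50, eps=1e-12):
    """Jacobi-style best-response loop with relative-change stopping:
    sigma = max_i ||G_i^(l) - G_i^(l-1)|| / (||G_i^(l-1)|| + eps)."""
    gammas = [g.copy() for g in gammas0]
    for _ in range(l_max):                     # safeguard: at most l_max sweeps
        prev = [g.copy() for g in gammas]
        for i in range(len(gammas)):           # Step C: local updates
            gammas[i] = best_response(i, prev)
        sigma = max(np.linalg.norm(g - p) / (np.linalg.norm(p) + eps)
                    for g, p in zip(gammas, prev))
        if sigma <= tol:                       # Step G: converged
            break
    return gammas

def toy_best_response(i, gammas):
    # hypothetical coupled problem on a 3-agent chain:
    # relax halfway toward the mean of the chain neighbors
    nbrs = [k for k in (i - 1, i + 1) if 0 <= k < len(gammas)]
    target = np.mean([gammas[k] for k in nbrs], axis=0)
    return 0.5 * (gammas[i] + target)

out = nash_iteration([np.array([1.0]), np.array([0.0]), np.array([2.0])],
                     toy_best_response)
```

For this toy chain the iteration contracts toward a consensus fixed point, illustrating why a relative-change metric is a practical termination test even when a global convergence certificate is unavailable.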
The points that may still be considered vague mainly concern engineering grounding and boundary conditions.

If you would like, I can also fully align the equation, figure, and table references across your text with the paper's numbering scheme (for example, which constraints your original (40)-(43) correspond to), and additionally write up how the boundary tensions are handled in the cost and constraints in the most defensible way.

This rewritten version is too brief; compared with my original content it is far too concise. It needs to be detailed. The overall approach, however, is fine.
%====================================================================== % DETAILED REWRITE (NEW STRUCTURE, FULLY EXPLAINED) % Five-stand tandem cold rolling mill % Controlled: thickness + inter-stand tension % Actuators: roll gap + stand speed % Notation: Δx = deviation-from-reference (tracking error) % Δu = sample-to-sample increment (discrete input increment) % Within-interval increment trajectory is parameterized (quadratic) % Surrogate: Residual NN with control-dependent input % Training: one-step + multi-step rollout + forward/backward reciprocal consistency % Control: Nash-equilibrium-based distributed MPC (RNE-DMPC) %====================================================================== %======================== \section{Problem Setting and Notation} %======================== \subsection{Five-stand tandem cold rolling mill and controlled variables} Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ denote the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n>0$ the sampling interval length. The controlled physical variables in this paper are: (i) exit thickness of each stand, $h_i(t)$ for $i=1,\dots,5$; and (ii) inter-stand strip tension between stand $i$ and $i+1$, denoted by $T_i(t)$ for $i=1,\dots,4$. The five-stand system exhibits strong coupling primarily through tension propagation: changes of speed and deformation at one stand influence the strip transport and elongation, thereby affecting the tensions of neighboring stands, which in turn affect thickness through rolling force and material flow coupling. This motivates a coupled prediction model and a coordinated (distributed) control strategy. \subsection{References and deviation-state definition (fixing the meaning of $\Delta x$)} Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ be reference trajectories/setpoints determined by the production schedule (or constant setpoints for regulation). 
Define deviation (tracking-error) variables: \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \label{eq:dev_def_detailed} \end{equation} \textbf{Convention (fixed throughout the paper):} the symbol ``$\Delta$'' attached to \emph{states} always denotes deviation from reference (tracking error). Thus, the control objective in deviation coordinates is always $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$. \subsection{Local state vector and boundary handling (where thickness/tension appear)} For each stand $i$, we choose a local deviation state vector containing the stand thickness deviation and adjacent tension deviations: \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3. \label{eq:local_state_detailed} \end{equation} To maintain a unified dimension across stands, we adopt the boundary convention \begin{equation} \Delta T_0(t)\equiv 0,\qquad \Delta T_5(t)\equiv 0. \label{eq:boundary_tension_detailed} \end{equation} In the cost and constraint design, the boundary (virtual) components can be assigned zero weights so that they do not influence optimization. \subsection{Neighbor sets and coupling information (five-stand chain)} In a tandem mill, the strongest coupling is between adjacent stands via inter-stand tensions, hence we define the neighbor sets: \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ \ (i=2,3,4),\quad Z_5=\{4\}. \label{eq:neighbor_set_detailed} \end{equation} Define the neighbor-state stack (column concatenation) as \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}. \label{eq:neighbor_stack_detailed} \end{equation} This neighbor stack will be used as an explicit input to the learned predictor and to the distributed MPC coordination. 
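The chain-topology neighbor sets and the column stack above can be built mechanically. A minimal sketch (the deviation-state values are hypothetical):

```python
import numpy as np

def neighbor_sets(n_stands=5):
    """Z_1 = {2}, Z_i = {i-1, i+1} for interior stands, Z_5 = {4}."""
    return {i: [k for k in (i - 1, i + 1) if 1 <= k <= n_stands]
            for i in range(1, n_stands + 1)}

def neighbor_stack(i, dx, Z):
    """col{dx_k | k in Z_i}: vertical concatenation of neighbor states."""
    return np.concatenate([dx[k] for k in Z[i]])

Z = neighbor_sets()
# hypothetical deviation states, one 3-vector per stand
dx = {i: np.array([0.1 * i, 0.01 * i, -0.01 * i]) for i in range(1, 6)}
stack2 = neighbor_stack(2, dx, Z)   # stacks stand 1 and stand 3, length 6
```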
\subsection{Actuators and input increments (fixing the meaning of $u$ and $\Delta u$)} Each stand is manipulated by two actuators: roll gap (screw-down/hydraulic gap) $s_i(t)$ and stand speed $v_i(t)$. Define the input vector \begin{equation} u_i(t)= \begin{bmatrix} s_i(t)\\ v_i(t) \end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. \label{eq:ui_detailed} \end{equation} \textbf{Discrete input increment (fixed throughout the paper):} the symbol ``$\Delta$'' attached to \emph{inputs} denotes a sample-to-sample increment: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) = \begin{bmatrix} \Delta s_i(t_n)\\ \Delta v_i(t_n) \end{bmatrix}. \label{eq:du_discrete_detailed} \end{equation} Thus, $\Delta x$ and $\Delta u$ have different meanings by definition: $\Delta x$ is a deviation-from-reference \emph{state}, while $\Delta u$ is a discrete-time \emph{input increment}. \subsection{Disturbance notation} Let $d_i(t)$ denote exogenous disturbances affecting subsystem $i$ (e.g., entry thickness fluctuation, friction variation, material property drift). We use $\Delta d_i([t_n,t_{n+1}])$ to denote the disturbance signal over the interval, and define an interval-equivalent disturbance (average) later. \subsection{Additional basic notation and dimensions} $I_d$ denotes the $d\times d$ identity matrix; $0_{a\times b}$ denotes an $a\times b$ zero matrix. For a vector $z$, $\|z\|_{Q}^{2}\triangleq z^\top Q z$. %======================== \section{Interval-Level Input Parameterization and Its Validity} %======================== \subsection{Why parameterize within-interval input trajectories?} Although supervisory controllers update commands at discrete sampling instants, the physical actuation and underlying drive/hydraulic loops evolve continuously. In cold rolling, abrupt gap/speed changes can excite tension oscillations and degrade thickness stability. 
Moreover, in many industrial implementations, setpoints are interpolated (ramps/filters) within each sampling interval to ensure smoothness. Therefore, describing the within-interval increment trajectory using a low-order basis has three benefits: \begin{itemize} \item \textbf{Finite-dimensional decision variables:} compress continuous-time command profiles into a small set of coefficients for online optimization. \item \textbf{Smoothness by construction:} avoid discontinuous commands inside the interval, improving closed-loop robustness. \item \textbf{Whole-interval constraint enforcement:} bounds can be imposed for all $\tau\in[0,\delta_n]$, not only at sampling instants. \end{itemize} \subsection{Quadratic polynomial parameterization (two-input vector form)} Within interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$. We parameterize the input increment trajectory by a vector quadratic polynomial: \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) = \Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2,\qquad \tau\in[0,\delta_n], \label{eq:du_poly_vector_detailed} \end{equation} where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and $n_u=2$. Component-wise: \begin{equation} \Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad \Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2. \label{eq:du_poly_component_detailed} \end{equation} Define the stacked coefficient vector: \begin{equation} \Gamma_{i,n}\triangleq \big[ (\Gamma_{i,n0})^\top,\, (\Gamma_{i,n1})^\top,\, (\Gamma_{i,n2})^\top \big]^\top \in\mathbb{R}^{p},\qquad p=3n_u=6. \label{eq:Gamma_dim_detailed} \end{equation} \subsection{Is polynomial parameterization ``certainly correct''? 
(explicit assumptions and fallback)} \begin{remark}[Correctness, applicability, and fallback] The parameterization \eqref{eq:du_poly_vector_detailed} is a \emph{control parameterization} choice, not a first-principles identity. It is reasonable under the following practical assumptions: \begin{enumerate} \item \textbf{Within-interval smooth implementation:} the actuator setpoints (gap and speed) are implemented with interpolation/ramps/filters, so the actual increments inside the interval can be approximated by a low-order smooth function. \item \textbf{Sampling interval not excessively large:} $\delta_n$ is not too large relative to actuator bandwidth, so low-order polynomials can capture the dominant within-interval profile. \item \textbf{Model-consistent implementation:} the same interpolation/command-generation logic used in offline data generation is used online, ensuring that the learned surrogate matches the executed control. \end{enumerate} If the real system uses zero-order-hold (ZOH) increments inside each interval, then \eqref{eq:du_poly_vector_detailed} reduces to the ZOH case by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$. Therefore, the proposed framework subsumes ZOH as a special case. More complex profiles (e.g., piecewise-linear or spline) can be adopted if needed by increasing basis richness, at the cost of more decision variables. \end{remark} \subsection{End-point increment versus interval-equivalent increment (removing ambiguity)} To avoid ambiguity between the \emph{continuous-time} within-interval command and the \emph{discrete-time} input update, we define two distinct quantities. \paragraph{1) End-point increment (used for discrete input update).} Define \begin{equation} \Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n}) = \Gamma_{i,n0}+\Gamma_{i,n1}\delta_n+\Gamma_{i,n2}\delta_n^2. 
\label{eq:du_end_detailed} \end{equation} This quantity determines the sampled input at the next sampling instant via \begin{equation} u_i(t_{n+1}) = u_i(t_n) + \Delta u_i^{\mathrm{end}}(t_n). \label{eq:u_update_end_detailed} \end{equation} \paragraph{2) Interval-equivalent (averaged) increment (used as equivalent discrete effect when needed).} Define the interval average \begin{equation} \overline{\Delta u}_i(t_n)\triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau = \Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}. \label{eq:du_avg_detailed} \end{equation} The average $\overline{\Delta u}_i(t_n)$ can be used as an equivalent discrete quantity when one needs to represent the average within-interval actuation effect, or when building consistency between continuous trajectories and discrete approximations. Importantly, we do \emph{not} use the average to define the sampled end-point update; the end-point update is governed by \eqref{eq:du_end_detailed}--\eqref{eq:u_update_end_detailed}. %======================== \section{Data-Driven Dynamics: Dataset Construction} %======================== \subsection{Unknown coupled interval mapping to be approximated} The five-stand coupled deviation-state evolution over $[t_n,t_{n+1}]$ can be represented by an unknown nonlinear mapping: \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\Big( \Delta x_i(t_n),\,\Delta x_{Z_i}(t_n),\, \Gamma_{i,n},\,\delta_n,\, \Delta d_i([t_n,t_{n+1}]) \Big), \label{eq:true_unknown_mapping_detailed} \end{equation} where coupling enters through $\Delta x_{Z_i}(t_n)$ and through the fact that the underlying physics is a coupled five-stand process. 
A conceptual equivalent linear discrete form is often written as \begin{equation} \Delta x_i(t_{n+1}) = M_d\,\Delta x_i(t_n) + N_d\,\Delta u_i(t_n) + F_d\,\Delta d_i(t_n), \label{eq:conceptual_linear_detailed} \end{equation} but accurate identification of $(M_d,N_d,F_d)$ is difficult in practical cold rolling due to strong nonlinearities and varying operating conditions. Therefore, we learn a nonlinear surrogate for \eqref{eq:true_unknown_mapping_detailed} from offline data. \subsection{Sampling domains and disturbance averaging} Let $\mathcal{I}_x$ denote the sampling domain (ranges) of deviation states $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$ used for offline data generation. Let $\mathcal{I}_\Gamma$ denote the sampling domain of polynomial coefficients $\Gamma_{i,n}$ for both gap and speed channels. For disturbances, define an interval-equivalent disturbance (average) as \begin{equation} \Delta d_i(t_n) \triangleq \frac{1}{\delta_n}\int_{0}^{\delta_n}\Delta d_i(\tau)\,d\tau, \label{eq:dist_avg_detailed} \end{equation} where $\Delta d_i(\tau)$ denotes the disturbance signal expressed in deviation form during the interval. \subsection{One-step sample generation (five-stand coupled simulation)} One training sample is generated per interval $[t_n,t_{n+1}]$ via the following steps: \begin{enumerate} \item \textbf{State sampling:} sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$. \item \textbf{Parameter sampling:} sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and choose $\delta_n$ from a prescribed range. \item \textbf{Construct within-interval increment trajectory:} compute $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ by \eqref{eq:du_poly_vector_detailed}. \item \textbf{Propagate coupled system:} integrate the \emph{five-stand coupled} rolling model on $[t_n,t_{n+1}]$ (e.g., Runge--Kutta 4) under the within-interval input trajectory, and record the resulting $\Delta x_i(t_{n+1})$. 
\end{enumerate} Thus the interval-level sample can be written as \begin{equation} \mathcal{D}_{i,n}= \Big( \Delta x_i(t_n),\ \Delta x_{Z_i}(t_n),\ \Gamma_{i,n},\ \delta_n,\ \Delta x_i(t_{n+1}) \Big). \label{eq:interval_sample_detailed} \end{equation} Repeating over many intervals yields the one-step dataset for subsystem $i$: \begin{equation} S_i=\Big\{ \big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big) \ \Big|\ j=1,\ldots,J \Big\}, \label{eq:one_step_dataset_detailed} \end{equation} where $J$ is the number of one-step samples. The overall dataset is $\{S_i\}_{i=1}^{5}$. \subsection{$K$-step segment dataset for multi-step training} One-step regression alone can lead to drift under long-horizon recursion, which is undesirable for MPC. Therefore, we also construct $K$-step segments to support multi-step rollout training and reciprocal-consistency regularization. Starting from $t_n$, generate a sequence $\{(\Gamma_{i,n+s},\delta_{n+s})\}_{s=0}^{K-1}$, simulate the coupled five-stand model across $K$ consecutive intervals, and record the deviation-state trajectory $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ and neighbor stacks $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. Define the $K$-step segment sample as \begin{equation} \mathcal{W}_{i,n}= \Big\{ (\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1}; (\Delta x_i(t_{n+s+1}))_{s=0}^{K-1} \Big\}. \label{eq:segment_sample_detailed} \end{equation} Collecting $J_K$ such segments yields \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}, \label{eq:segment_dataset_detailed} \end{equation} where $K$ is the segment length and $J_K$ is the number of segments. 
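The $K$-step segment bookkeeping ($K$ decision pairs, $K+1$ recorded states) can be sketched as below. The RK4-integrated coupled mill model is abstracted as a one-interval simulator; the linear `toy_step` stand-in and all numeric ranges are hypothetical, chosen only so the generation loop runs end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_step(dx, dx_nbr, gamma, delta):
    # placeholder for integrating the coupled five-stand model over one interval
    return 0.9 * dx + 0.05 * dx_nbr[:3] + delta * gamma[:3]

def make_segment(dx0, dx_nbr, K, delta_range=(0.1, 0.5), p=6):
    """Sample K decision pairs (Gamma, delta), roll the simulator forward,
    and return the segment: K inputs and the K+1 visited states."""
    gammas = [rng.uniform(-0.1, 0.1, size=p) for _ in range(K)]
    deltas = [rng.uniform(*delta_range) for _ in range(K)]
    states = [dx0]
    for s in range(K):
        states.append(toy_step(states[-1], dx_nbr, gammas[s], deltas[s]))
    return {"states": states, "gammas": gammas, "deltas": deltas}

seg = make_segment(np.zeros(3), np.ones(6), K=4)
```

Repeating `make_segment` $J_K$ times yields the segment dataset structure used for the rollout and reciprocal-consistency losses.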
%======================== \section{Residual Neural Surrogate Model} %======================== \subsection{What is learned: control-dependent one-step deviation dynamics} The surrogate aims to approximate \eqref{eq:true_unknown_mapping_detailed} in a form suitable for MPC. For subsystem $i$, define a control-dependent residual predictor: \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n)+ \mathcal{N}_i\!\Big( \Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i \Big), \label{eq:forward_predictor_detailed} \end{equation} where $\Theta_i$ are trainable parameters and $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change. \paragraph{Why the input must include the control parameters.} If the network does not include $(\Gamma_{i,n},\delta_n)$, it reduces to an autoregressive model that reproduces trajectories only under the training input patterns. MPC, by contrast, must evaluate predicted trajectories under \emph{candidate} control decisions; a control-dependent predictor \eqref{eq:forward_predictor_detailed} is therefore necessary. \subsection{Network input vector and dimensions} Let $d=3$ and $p=6$. Define the input vector \begin{equation} X_{i,\mathrm{in}} \triangleq \big[ \Delta x_i(t_n)^\top,\ \Delta x_{Z_i}(t_n)^\top,\ \Gamma_{i,n}^\top,\ \delta_n \big]^\top \in\mathbb{R}^{d(1+|Z_i|)+p+1}. \label{eq:X_in_detailed} \end{equation} Then \begin{equation} \mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}. \end{equation} \subsection{Residual (shortcut) structure and its motivation} To incorporate a persistence prior and improve long-horizon stability, we use a residual structure. Define a selection matrix \begin{equation} \hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}]\in\mathbb{R}^{d\times(d(1+|Z_i|)+p+1)}, \label{eq:Ihat_detailed} \end{equation} so that $\hat{I}_i X_{i,\mathrm{in}}=\Delta x_i(t_n)$.
The residual predictor can then be written as \begin{equation} X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i), \label{eq:res_form_detailed} \end{equation} where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$. \begin{remark}[Interpretation] Equation \eqref{eq:res_form_detailed} has a baseline-plus-correction form: the shortcut term propagates the current deviation state, and the network learns the correction capturing nonlinear rolling effects and coupling through neighbors. This improves optimization stability because the model only needs to learn the incremental change rather than the full next-state mapping. \end{remark} \subsection{Auxiliary decomposition for variable sampling intervals $\delta_n$} To enhance robustness under variable $\delta_n$, we decompose \begin{equation} \mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i), \label{eq:aux_decomp_detailed} \end{equation} where $\psi_i(\cdot)$ is a lightweight branch intended to capture low-frequency/scale effects correlated with $\delta_n$, and $\rho_i(\cdot)$ captures remaining nonlinear coupling corrections. The branch symbols $\psi_i$ and $\rho_i$ are chosen so as not to clash with the sampling domains $\mathcal{I}_x,\mathcal{I}_\Gamma$ or with the optimizer learning rate $\alpha$. %======================== \section{Training: One-step Accuracy, Multi-step Rollout, and Reciprocal Consistency} %======================== \subsection{One-step supervised targets} For a one-step sample $(\Delta x_i(t_n),\Delta x_i(t_{n+1}))$, define the residual target \begin{equation} \Delta r_i(t_n)\triangleq \Delta x_i(t_{n+1})-\Delta x_i(t_n). \label{eq:res_target_detailed} \end{equation} For sample $j$, the network input is \begin{equation} X_{i,\mathrm{in}}^{(j)}= \big[ \Delta x_i^{(j)}(t_n)^\top,\ \Delta x_{Z_i}^{(j)}(t_n)^\top,\ \Gamma_{i,n}^{(j)\top},\ \delta_n^{(j)} \big]^\top.
\end{equation} \subsection{Forward rollout over a $K$-step segment} Given $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize \begin{equation} \Delta \hat{x}_i(t_n)=\Delta x_i(t_n), \end{equation} and recursively roll forward: \begin{equation} \Delta \hat{x}_i(t_{n+s+1}) = \Delta \hat{x}_i(t_{n+s})+ \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}), \Gamma_{i,n+s},\delta_{n+s};\Theta_i \Big), \quad s=0,\ldots,K-1. \label{eq:fwd_rollout_detailed} \end{equation} \subsection{Backward model and backward rollout (reciprocal consistency)} To regularize long-horizon behavior, we introduce a backward residual model \begin{equation} \mathcal{B}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\rightarrow\mathbb{R}^{d}, \end{equation} parameterized by $\bar{\Theta}_i$. Define backward input at step $s$ as \begin{equation} X_{i,\mathrm{in}}^{b}(t_{n+s}) = \big[ \Delta \bar{x}_i(t_{n+s+1})^\top,\ \Delta \hat{x}_{Z_i}(t_{n+s+1})^\top,\ \Gamma_{i,n+s}^\top,\ \delta_{n+s} \big]^\top, \end{equation} where $\Delta \hat{x}_{Z_i}$ is taken from the forward rollout. Set the terminal condition \begin{equation} \Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K}), \end{equation} and roll backward: \begin{equation} \Delta \bar{x}_i(t_{n+s}) = \Delta \bar{x}_i(t_{n+s+1}) + \mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\bar{\Theta}_i\Big), \quad s=K-1,\ldots,0, \label{eq:bwd_rollout_detailed} \end{equation} where $\mathcal{B}_i(\cdot)$ outputs a backward residual correction that reconstructs the earlier state. The supervised backward residual target for one step is \begin{equation} \Delta r_i^{b}(t_n)\triangleq \Delta x_i(t_n)-\Delta x_i(t_{n+1}). \end{equation} \subsection{Loss functions (with explicit motivations)} Define the reciprocal prediction error over a segment: \begin{equation} E_i(t_n)=\sum_{s=0}^{K}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\right\|^2. 
\label{eq:recip_error_detailed} \end{equation} \paragraph{1) One-step residual loss (local accuracy).} \begin{equation} L_{\mathrm{1step}}(\Theta_i) = \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \left\| \big(\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})\big) -\mathcal{N}_i\!\left(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i\right) \right\|^2. \label{eq:L1_detailed} \end{equation} \paragraph{2) Multi-step rollout loss (suppress drift under recursion).} \begin{equation} L_{\mathrm{roll}}(\Theta_i) = \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \left\| \Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s}) \right\|^2. \label{eq:Lroll_detailed} \end{equation} \paragraph{3) Backward one-step loss (train backward model for consistency).} \begin{equation} L_{\mathrm{bwd}}(\bar{\Theta}_i) = \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \left\| \big(\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})\big) -\mathcal{B}_i\!\left(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i\right) \right\|^2. \label{eq:Lbwd_detailed} \end{equation} \paragraph{4) Multi-step reciprocal consistency loss (structural regularization).} \begin{equation} L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) = \frac{1}{J_K}\sum_{j=1}^{J_K} E_i^{(j)}(t_n). \label{eq:Lmsrp_detailed} \end{equation} \paragraph{Total loss and training.} \begin{equation} L_{\mathrm{total}}=\lambda_1 L_{\mathrm{1step}}+\lambda_2 L_{\mathrm{roll}}+\lambda_3 L_{\mathrm{msrp}}+\lambda_4 L_{\mathrm{bwd}}, \label{eq:Ltotal_detailed} \end{equation} where $\lambda_1,\lambda_2,\lambda_3,\lambda_4>0$ are tuned on a validation set. Parameters are optimized by Adam: \begin{equation} \Theta_{i,t+1}=\Theta_{i,t}-\alpha\frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}}+\varepsilon}, \end{equation} where $\alpha$ is learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are bias-corrected first/second moments, and $\varepsilon>0$ is a small constant. 
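The forward rollout, the backward rollout, and the reciprocal error $E_i(t_n)$ can be sketched as follows. The two "networks" are single linear maps used purely for illustration (not the paper's architecture), and the neighbor stack is omitted from the toy input for brevity:

```python
import numpy as np

d, K = 3, 4      # deviation-state dimension and segment length
n_in = d + 1     # toy input: own state plus interval length (neighbor stack omitted)

rng = np.random.default_rng(1)
theta = 0.01 * rng.standard_normal((d, n_in))      # stand-in for forward network N_i
theta_bar = 0.01 * rng.standard_normal((d, n_in))  # stand-in for backward network B_i

def N_fwd(x_in):
    return theta @ x_in

def B_bwd(x_in):
    return theta_bar @ x_in

dx0 = rng.standard_normal(d)
deltas = np.full(K, 0.1)

# Forward rollout: x_hat[s+1] = x_hat[s] + N_i(x_hat[s], delta_s)
x_hat = [dx0]
for s in range(K):
    x_hat.append(x_hat[-1] + N_fwd(np.concatenate([x_hat[-1], [deltas[s]]])))

# Backward rollout: terminal condition x_bar[K] = x_hat[K], then reconstruct earlier states
x_bar = [None] * (K + 1)
x_bar[K] = x_hat[K]
for s in range(K - 1, -1, -1):
    x_bar[s] = x_bar[s + 1] + B_bwd(np.concatenate([x_bar[s + 1], [deltas[s]]]))

# Reciprocal prediction error: E_i(t_n) = sum_{s=0}^{K} ||x_hat[s] - x_bar[s]||^2
E = sum(float(np.sum((x_hat[s] - x_bar[s]) ** 2)) for s in range(K + 1))
```

In training, `E` would enter $L_{\mathrm{msrp}}$ averaged over segments, with gradients flowing through both rollouts into $\Theta_i$ and $\bar{\Theta}_i$.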
\subsection{Training algorithm (explicit procedural description)} \begin{algorithm}[t] \caption{Offline training of forward/backward residual models for subsystem $i$} \label{alg:train_detailed} \small \begin{algorithmic}[1] \STATE \textbf{Input:} one-step dataset $S_i$, segment dataset $S_i^{(K)}$, horizons $K$, weights $\lambda_1\dots\lambda_4$ \STATE Initialize forward parameters $\Theta_i$ and backward parameters $\bar{\Theta}_i$ \FOR{each epoch} \STATE Sample a mini-batch of segments $\{\mathcal{W}_{i,n}\}$ from $S_i^{(K)}$ \STATE For each segment, perform forward rollout using \eqref{eq:fwd_rollout_detailed} to obtain $\{\Delta\hat{x}_i(t_{n+s})\}_{s=0}^{K}$ \STATE Perform backward rollout using \eqref{eq:bwd_rollout_detailed} to obtain $\{\Delta\bar{x}_i(t_{n+s})\}_{s=0}^{K}$ \STATE Compute $L_{\mathrm{1step}},L_{\mathrm{roll}},L_{\mathrm{bwd}},L_{\mathrm{msrp}}$ via \eqref{eq:L1_detailed}--\eqref{eq:Lmsrp_detailed} \STATE Update $(\Theta_i,\bar{\Theta}_i)$ by Adam to minimize $L_{\mathrm{total}}$ in \eqref{eq:Ltotal_detailed} \ENDFOR \STATE \textbf{Output:} trained forward model $\mathcal{N}_i(\cdot;\Theta_i^*)$ and backward model $\mathcal{B}_i(\cdot;\bar{\Theta}_i^*)$ \end{algorithmic} \end{algorithm} %======================== \section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC Using the Learned Surrogate} %======================== \subsection{Why distributed Nash coordination (explicit conflict source)} In a five-stand mill, tensions $T_i$ are shared coupling variables influenced by both stand $i$ and stand $i+1$ (mainly via speed actions), and thickness is strongly affected by roll gap while also interacting with tension through deformation and transport coupling. Hence, a purely local action that improves $\Delta h_i$ may worsen a shared tension $\Delta T_i$ for a neighbor, and vice versa. This induces an intrinsic multi-agent coupling conflict, motivating a coordination mechanism. 
We adopt a Nash-equilibrium-seeking distributed MPC (RNE-DMPC) where each stand solves a local MPC problem and iterates best responses. \subsection{Prediction model inside MPC (explicit prediction--control interface)} At time $t_n$, define prediction horizon $N_p$ and control horizon $N_c$ with $N_c\le N_p$. Let $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$. Given candidate decisions $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, the surrogate provides the multi-step prediction: \begin{equation} \Delta \hat{x}_i(t_{n+s+1}) = \Delta \hat{x}_i(t_{n+s})+ \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}), \Gamma_{i,n+s},\delta_{n+s};\Theta_i^* \Big), \quad s=0,\ldots,N_p-1. \label{eq:mpc_rollout_detailed} \end{equation} Here $\Delta \hat{x}_{Z_i}(t_{n+s})$ is provided by neighbor communication during Nash iterations. Equation \eqref{eq:mpc_rollout_detailed} makes the dependency \emph{explicit}: candidate control parameters $\Gamma_{i,n+s}$ change the predicted thickness/tension deviations, enabling online optimization. \subsection{Decision variables and reference meaning (no ambiguity)} \paragraph{Decision variables.} The local decision vector is the stacked parameter sequence over the control horizon: \begin{equation} \mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\Gamma_{i,n+1}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top\in\mathbb{R}^{pN_c}. \label{eq:Gamma_stack_detailed} \end{equation} \paragraph{Reference in deviation coordinates.} Because $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ by definition, the desired deviation reference is always \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}. \label{eq:dxref_zero_detailed} \end{equation} Thus, the MPC objective penalizes predicted deviation states directly. \subsection{Local objective function (explicit thickness+tension weighting)} Let $\Delta \hat{x}_i(t_{n+s})=[\Delta\hat{h}_i(t_{n+s}),\,\Delta\widehat{T}_{i-1}(t_{n+s}),\,\Delta\widehat{T}_{i}(t_{n+s})]^\top$. 
Choose $Q_i\in\mathbb{R}^{d\times d}$ and $R_i\in\mathbb{R}^{p\times p}$. A detailed and interpretable weighting choice is to separate thickness and tension weights, e.g., \begin{equation} Q_i=\mathrm{diag}(q_{h,i},\,q_{T,i-1},\,q_{T,i}), \label{eq:Qi_diag_detailed} \end{equation} where $q_{h,i}$ penalizes thickness deviation and $q_{T,i-1},q_{T,i}$ penalize adjacent tension deviations. For boundary tensions, set the corresponding weights to zero (e.g., $q_{T,0}=q_{T,5}=0$). The local cost is defined as \begin{equation} J_i= \sum_{s=1}^{N_p}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta x_{i,\mathrm{ref}}(t_{n+s})\right\|_{Q_i}^2 + \sum_{s=0}^{N_c-1}\left\|\Gamma_{i,n+s}\right\|_{R_i}^2. \label{eq:Ji_detailed} \end{equation} Using \eqref{eq:dxref_zero_detailed}, the first term becomes $\sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2$, which explicitly penalizes predicted thickness and tension deviations. \subsection{Constraints: absolute bounds, increment bounds over the entire interval, and propagation} \paragraph{1) Absolute input bounds (gap and speed).} \begin{equation} u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1, \label{eq:u_abs_detailed} \end{equation} where $u_{i,\min},u_{i,\max}\in\mathbb{R}^{2}$ specify component-wise bounds for $(s_i,v_i)$. \paragraph{2) Increment trajectory bounds for all $\tau$ within each interval.} \begin{equation} \Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max}, \qquad \forall \tau\in[0,\delta_{n+s}], \label{eq:du_traj_detailed} \end{equation} where $\Delta u_{i,\min},\Delta u_{i,\max}\in\mathbb{R}^{2}$ specify component-wise bounds for $(\Delta s,\Delta v)$. \paragraph{Practical enforcement of \eqref{eq:du_traj_detailed}.} For each scalar channel $q(\tau)=a+b\tau+c\tau^2$ on $[0,\delta]$, extrema occur at $\tau=0$, $\tau=\delta$, and possibly at $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$. 
Therefore, to enforce \eqref{eq:du_traj_detailed}, we check these candidate points separately for $\Delta s_{i,n+s}(\tau)$ and $\Delta v_{i,n+s}(\tau)$ using coefficients in \eqref{eq:du_poly_component_detailed}. \paragraph{3) Discrete-time input propagation consistent with the within-interval trajectory.} We propagate the sampled input using the end-point increment: \begin{equation} u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}), \qquad \Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}). \label{eq:u_prop_detailed} \end{equation} This ensures consistency between the within-interval increment trajectory and the sampled input sequence, and supports enforcing \eqref{eq:u_abs_detailed}. \subsection{Local NLP solved at each Nash iteration (explicit statement)} At Nash iteration index $l$, subsystem $i$ solves the differentiable nonlinear program: \begin{equation} \mathbf{\Gamma}_i^{(l)} = \arg\min_{\mathbf{\Gamma}_i}\ J_i \quad\text{s.t.}\quad \eqref{eq:mpc_rollout_detailed},\ \eqref{eq:u_abs_detailed},\ \eqref{eq:du_traj_detailed},\ \eqref{eq:u_prop_detailed}. \label{eq:local_nlp_detailed} \end{equation} Because the surrogate $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_nlp_detailed} can be solved via gradient-based NLP solvers (e.g., SQP or interior-point methods) with automatic differentiation. \subsection{Nash best-response coordination (detailed algorithm and safeguards)} \paragraph{Best-response update.} Given neighbor strategies $\mathbf{\Gamma}_{Z_i}$ (received from neighbors) and neighbor predicted trajectories, subsystem $i$ computes its best response by solving \eqref{eq:local_nlp_detailed}. The Nash iteration repeats until convergence. 
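The candidate-point enforcement of the per-channel increment bounds \eqref{eq:du_traj_detailed}, checking $\tau=0$, $\tau=\delta$, and the stationary point $\tau^\star=-b/(2c)$ when it lies in the interval, can be sketched as a scalar check (the function name is illustrative):

```python
def quad_within_bounds(a, b, c, delta, lo, hi, tol=1e-12):
    """Check lo <= a + b*tau + c*tau**2 <= hi for all tau in [0, delta].

    Extrema of a quadratic on a closed interval occur only at tau = 0,
    tau = delta, or the stationary point tau* = -b/(2c) when c != 0 and
    tau* lies inside [0, delta].
    """
    candidates = [0.0, delta]
    if abs(c) > tol:
        tau_star = -b / (2.0 * c)
        if 0.0 <= tau_star <= delta:
            candidates.append(tau_star)
    vals = [a + b * t + c * t * t for t in candidates]
    return min(vals) >= lo - tol and max(vals) <= hi + tol
```

In the local NLP, the same three candidate evaluations would be posed as inequality constraints on the coefficients of $\Delta s_{i,n+s}(\tau)$ and $\Delta v_{i,n+s}(\tau)$ rather than as a boolean check.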
\paragraph{Convergence metric.} Define the relative change \begin{equation} \varsigma^{(l)}= \max_i \frac{\left\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\right\|_2} {\left\|\mathbf{\Gamma}_i^{(l-1)}\right\|_2+\epsilon}, \label{eq:nash_metric_detailed} \end{equation} where $\epsilon>0$ avoids division by zero. Stop if $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$. \paragraph{Practical safeguards.} Best-response iterations are not globally guaranteed to converge for general nonlinear coupled problems. Therefore, we set a maximum iteration number $L_{\max}$ and optionally use relaxation: \begin{equation} \mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)},\qquad \beta\in(0,1]. \label{eq:relax_detailed} \end{equation} If convergence is not reached within $L_{\max}$, we apply the last iterate $\mathbf{\Gamma}_i^{(L_{\max})}$. \begin{algorithm}[t] \caption{Online RNE-DMPC at sampling time $t_n$ (five-stand)} \label{alg:online_detailed} \small \begin{algorithmic}[1] \STATE \textbf{Input:} current measurements/estimates $\{\Delta x_i(t_n)\}_{i=1}^5$, previous solution $\{\mathbf{\Gamma}_i(t_{n-1})\}$ (for warm start) \STATE Initialize $\mathbf{\Gamma}_i^{(0)}$ (warm-start), set $l\gets 1$ \REPEAT \STATE Each stand predicts $\{\Delta\hat{x}_i^{(l)}(t_{n+s})\}$ using \eqref{eq:mpc_rollout_detailed} with neighbor predictions from iteration $l-1$ \STATE Each stand solves its local NLP \eqref{eq:local_nlp_detailed} to obtain best response $\mathbf{\Gamma}_i^{(l)}$ \STATE Broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors \STATE (Optional) apply relaxation \eqref{eq:relax_detailed} \STATE Compute $\varsigma^{(l)}$ via \eqref{eq:nash_metric_detailed}, set $l\gets l+1$ \UNTIL{$\varsigma^{(l-1)}\le \varsigma_{\mathrm{tol}}$ or $l-1=L_{\max}$} \STATE Set $\Gamma_{i,n}^*$ to the first block of $\mathbf{\Gamma}_i^{(l-1)}$ \STATE Apply within-interval
increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$ \STATE Update sampled input by $u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n)$ using \eqref{eq:du_end_detailed}--\eqref{eq:u_update_end_detailed} \STATE Shift horizon: $n\gets n+1$ \end{algorithmic} \end{algorithm} \subsection{Closed-loop prediction--control connection} At each sampling time $t_n$, the closed-loop operation is: \begin{enumerate} \item Measure/estimate $\Delta h_i(t_n)$ and $\Delta T_i(t_n)$, form $\Delta x_i(t_n)$ and neighbor stacks $\Delta x_{Z_i}(t_n)$. \item Run Nash best-response iterations (Algorithm~\ref{alg:online_detailed}): each stand evaluates candidate decisions through multi-step prediction \eqref{eq:mpc_rollout_detailed} and solves \eqref{eq:local_nlp_detailed}. \item After termination, apply the first-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ to the actuators within $[t_n,t_{n+1}]$. \item Update the sampled input using the end-point increment so that $u_i(t_{n+1})$ is consistent with the within-interval command. \item Roll the horizon forward and repeat. \end{enumerate} In this way, the neural surrogate provides the control-dependent multi-step predictions required by MPC, and the distributed Nash coordination resolves coupling conflicts induced by shared tensions in the five-stand mill. %======================== \section{Symbol Summary} %======================== \paragraph{Indices and time.} $i\in\{1,\dots,5\}$: stand index; $t_n$: sampling instant; $\delta_n=t_{n+1}-t_n$: sampling interval length; $\tau\in[0,\delta_n]$: within-interval time. \paragraph{Physical variables and references.} $h_i(t)$: exit thickness of stand $i$; $T_i(t)$: inter-stand tension between stand $i$ and $i+1$ ($i=1,\dots,4$). $h_i^{\mathrm{ref}}(t),T_i^{\mathrm{ref}}(t)$: reference trajectories (setpoints).
\paragraph{Deviation state.} $\Delta h_i=h_i-h_i^{\mathrm{ref}}$; $\Delta T_i=T_i-T_i^{\mathrm{ref}}$; $\Delta x_i=[\Delta h_i,\Delta T_{i-1},\Delta T_i]^\top\in\mathbb{R}^3$. Boundary: $\Delta T_0\equiv 0,\Delta T_5\equiv 0$. \paragraph{Coupling sets.} $Z_i$: neighbor set; $\Delta x_{Z_i}=\mathrm{col}\{\Delta x_k|k\in Z_i\}$. \paragraph{Inputs and increments.} $u_i=[s_i,v_i]^\top$: roll gap and stand speed. $\Delta u_i(t_n)=u_i(t_n)-u_i(t_{n-1})$: discrete increment (not a deviation). $\Delta u_{i,n}(\tau;\Gamma_{i,n})$: within-interval increment trajectory. $\Gamma_{i,n}\in\mathbb{R}^{6}$: quadratic polynomial coefficients (two channels, three coefficients each). $\Delta u_i^{\mathrm{end}}(t_n)=\Delta u_{i,n}(\delta_n;\Gamma_{i,n})$: end-point increment. $\overline{\Delta u}_i(t_n)=\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau)\,d\tau$: interval-average increment. \paragraph{Datasets.} $\mathcal{I}_x$: sampling domain of $\Delta x$; $\mathcal{I}_\Gamma$: sampling domain of $\Gamma$. $S_i$: one-step dataset size $J$; $S_i^{(K)}$: $K$-step segment dataset size $J_K$. \paragraph{Neural models.} $\mathcal{N}_i(\cdot;\Theta_i)$: forward residual surrogate; $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$: backward model. $\psi_i(\cdot),\rho_i(\cdot)$: auxiliary decomposition branches. \paragraph{MPC and Nash.} $N_p$: prediction horizon; $N_c$: control horizon. $Q_i\in\mathbb{R}^{3\times 3}$, $R_i\in\mathbb{R}^{6\times 6}$: weighting matrices. $\varsigma^{(l)}$: Nash convergence metric; $\varsigma_{\mathrm{tol}}$: tolerance; $L_{\max}$: max iterations; $\beta$: relaxation factor. $\epsilon,\varepsilon>0$: small constants; $\alpha$: Adam learning rate.
%======================== \section{Learning a Control-Dependent Surrogate Model for a Five-Stand Tandem Mill} %======================== \subsection{Five-stand setting, controlled variables, and unified notation} Consider a five-stand tandem cold rolling mill indexed by $i\in\{1,2,3,4,5\}$. Let $t_n$ be the $n$-th sampling instant and $\delta_n=t_{n+1}-t_n>0$ the sampling interval length. Within each interval $[t_n,t_{n+1}]$, define local time $\tau=t-t_n\in[0,\delta_n]$. \paragraph{Controlled physical variables.} Let $h_i(t)$ denote the exit thickness of stand $i$ ($i=1,\dots,5$), and let $T_i(t)$ denote the inter-stand strip tension between stand $i$ and $i+1$ ($i=1,\dots,4$). The overall five-stand system is strongly coupled mainly through tension propagation: speed and deformation changes at one stand affect neighboring tensions, which in turn influence thickness and stability. \paragraph{References and deviation variables (meaning of $\Delta$ for states).} Let $h_i^{\mathrm{ref}}(t)$ and $T_i^{\mathrm{ref}}(t)$ denote reference trajectories/setpoints given by the schedule or regulation targets. Define deviation (tracking-error) variables \begin{equation} \Delta h_i(t)\triangleq h_i(t)-h_i^{\mathrm{ref}}(t),\qquad \Delta T_i(t)\triangleq T_i(t)-T_i^{\mathrm{ref}}(t). \label{eq:dev_state_def} \end{equation} \textbf{Convention (fixed throughout):} the symbol ``$\Delta$'' attached to \emph{states} always means deviation from reference. Hence, the deviation-coordinate control goal is always $\Delta h_i(t)\to 0$ and $\Delta T_i(t)\to 0$.
\paragraph{Local deviation state vector (thickness + adjacent tensions).} For each stand $i$, define the local deviation state \begin{equation} \Delta x_i(t)\triangleq \begin{bmatrix} \Delta h_i(t)\\ \Delta T_{i-1}(t)\\ \Delta T_i(t) \end{bmatrix}\in\mathbb{R}^{d},\qquad d=3. \label{eq:local_x_def} \end{equation} To keep a unified dimension for all stands, adopt boundary conventions \begin{equation} \Delta T_0(t)\equiv 0,\qquad \Delta T_5(t)\equiv 0, \label{eq:boundary_T} \end{equation} and later set the corresponding weights to zero so that boundary virtual components do not affect optimization. \paragraph{Neighbor sets and coupling information (five-stand chain).} Coupling is dominated by adjacent stands, so define \begin{equation} Z_1=\{2\},\quad Z_i=\{i-1,i+1\}\ (i=2,3,4),\quad Z_5=\{4\}. \label{eq:neighbor_set} \end{equation} Define the neighbor-state stack \begin{equation} \Delta x_{Z_i}(t_n)=\mathrm{col}\{\Delta x_k(t_n)\mid k\in Z_i\}. \label{eq:xZi_def} \end{equation} \paragraph{Actuators and discrete increments (meaning of $\Delta$ for inputs).} Each stand is manipulated by roll gap $s_i(t)$ and stand speed $v_i(t)$: \begin{equation} u_i(t)=\begin{bmatrix}s_i(t)\\ v_i(t)\end{bmatrix}\in\mathbb{R}^{n_u},\qquad n_u=2. \label{eq:ui_def} \end{equation} We optimize sample-to-sample input increments: \begin{equation} \Delta u_i(t_n)\triangleq u_i(t_n)-u_i(t_{n-1}) =\begin{bmatrix}\Delta s_i(t_n)\\ \Delta v_i(t_n)\end{bmatrix}. \label{eq:du_discrete} \end{equation} \textbf{Convention (fixed throughout):} ``$\Delta$'' on \emph{inputs} means discrete increment, not deviation-from-reference. Thus, $\Delta x$ (state deviation) and $\Delta u$ (input increment) are distinct by definition. \paragraph{Disturbance.} Let $d_i(t)$ denote exogenous disturbances (entry thickness fluctuation, friction variation, material drift, etc.). 
We write $\Delta d_i(\tau)$ as an interval disturbance signal in deviation form and define an interval-equivalent disturbance by averaging later. \paragraph{Basic notation.} $I_d$ is the $d\times d$ identity matrix and $0_{a\times b}$ is the $a\times b$ zero matrix. For any vector $z$, define $\|z\|_Q^2\triangleq z^\top Q z$. %---------------------------------------------------------------------- \subsection{Interval-level input parameterization, its validity, and dataset construction} %---------------------------------------------------------------------- \paragraph{Why parameterize within-interval trajectories?} Although supervisory decisions are updated at sampling instants $t_n$, the physical drive/hydraulic loops evolve continuously inside $[t_n,t_{n+1}]$. Abrupt changes can excite tension oscillations and deteriorate thickness stability. Therefore, we parameterize the within-interval increment trajectory by a low-order smooth basis to: (i) obtain a compact finite-dimensional decision variable for optimization, (ii) enforce smooth commands inside the interval, and (iii) enforce constraints for all $\tau\in[0,\delta_n]$. \paragraph{Quadratic polynomial parameterization (vector form, two inputs).} For interval $[t_n,t_{n+1}]$ and $\tau\in[0,\delta_n]$, parameterize \begin{equation} \Delta u_{i,n}(\tau;\Gamma_{i,n}) =\Gamma_{i,n0}+\Gamma_{i,n1}\tau+\Gamma_{i,n2}\tau^2, \label{eq:poly_param} \end{equation} where $\Gamma_{i,n0},\Gamma_{i,n1},\Gamma_{i,n2}\in\mathbb{R}^{n_u}$ and $n_u=2$. Component-wise, \begin{equation} \Delta s_{i,n}(\tau)=\gamma^{(s)}_{i,n0}+\gamma^{(s)}_{i,n1}\tau+\gamma^{(s)}_{i,n2}\tau^2,\qquad \Delta v_{i,n}(\tau)=\gamma^{(v)}_{i,n0}+\gamma^{(v)}_{i,n1}\tau+\gamma^{(v)}_{i,n2}\tau^2. \label{eq:poly_param_comp} \end{equation} Define the stacked parameter vector \begin{equation} \Gamma_{i,n}\triangleq \big[(\Gamma_{i,n0})^\top,(\Gamma_{i,n1})^\top,(\Gamma_{i,n2})^\top\big]^\top \in\mathbb{R}^{p},\qquad p=3n_u=6. 
\label{eq:Gamma_def} \end{equation} \paragraph{Is polynomial parameterization ``certainly correct''? (explicit assumption + fallback)} The representation \eqref{eq:poly_param} is a \emph{control parameterization} choice rather than a first-principles identity. It is reasonable when (i) the implemented setpoints are interpolated/filtered within each sampling interval (common in industry), (ii) $\delta_n$ is not excessively large compared to actuator bandwidth, and (iii) the same command-generation logic is used in offline simulation and online execution, so that the learned model matches the executed control. If the implementation is zero-order hold (ZOH), \eqref{eq:poly_param} reduces to ZOH by setting $\Gamma_{i,n1}=\Gamma_{i,n2}=0$, so the framework still applies. \paragraph{End-point increment vs interval-average increment (remove ambiguity).} To avoid mixing continuous-time trajectories with discrete-time updates, define two quantities: \textbf{End-point increment (used for sampled input update):} \begin{equation} \Delta u_i^{\mathrm{end}}(t_n)\triangleq \Delta u_{i,n}(\delta_n;\Gamma_{i,n}) =\Gamma_{i,n0}+\Gamma_{i,n1}\delta_n+\Gamma_{i,n2}\delta_n^2, \label{eq:du_end} \end{equation} and update the sampled input by \begin{equation} u_i(t_{n+1})=u_i(t_n)+\Delta u_i^{\mathrm{end}}(t_n). \label{eq:u_update} \end{equation} \textbf{Interval-average increment (equivalent discrete effect when needed):} \begin{equation} \overline{\Delta u}_i(t_n)\triangleq\frac{1}{\delta_n}\int_0^{\delta_n}\Delta u_{i,n}(\tau;\Gamma_{i,n})\,d\tau = \Gamma_{i,n0}+\Gamma_{i,n1}\frac{\delta_n}{2}+\Gamma_{i,n2}\frac{\delta_n^2}{3}. \label{eq:du_avg} \end{equation} We emphasize that \eqref{eq:u_update} uses the end-point increment \eqref{eq:du_end}, while \eqref{eq:du_avg} is used only as an equivalent quantity (e.g., for averaged-effect interpretations or auxiliary discrete approximations). 
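The distinction between the end-point increment \eqref{eq:du_end} and the interval-average increment \eqref{eq:du_avg} can be made concrete with a short numerical sketch (coefficient values are illustrative; two channels per stand):

```python
import numpy as np

# Illustrative coefficients Gamma_{i,n0}, Gamma_{i,n1}, Gamma_{i,n2} in R^2 (gap, speed)
g0 = np.array([0.1, 0.0])
g1 = np.array([0.2, -0.1])
g2 = np.array([0.0, 0.3])
delta = 0.5

def du_end(g0, g1, g2, delta):
    """End-point increment: Delta u(delta) = g0 + g1*delta + g2*delta**2."""
    return g0 + g1 * delta + g2 * delta**2

def du_avg(g0, g1, g2, delta):
    """Interval-average increment: g0 + g1*delta/2 + g2*delta**2/3."""
    return g0 + g1 * delta / 2.0 + g2 * delta**2 / 3.0

end = du_end(g0, g1, g2, delta)  # used for the sampled-input update u(t_{n+1})
avg = du_avg(g0, g1, g2, delta)  # used only as an equivalent averaged quantity
```

Note that the two quantities generally differ (here `end` and `avg` disagree in both channels), which is exactly why the online update must consistently use the end-point value.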
\paragraph{Unknown coupled interval mapping to be learned.} Over $[t_n,t_{n+1}]$, the five-stand coupled deviation-state evolution is represented by an unknown nonlinear mapping \begin{equation} \Delta x_i(t_{n+1}) = \Phi_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta d_i([t_n,t_{n+1}])\Big), \label{eq:unknown_map} \end{equation} where coupling enters via neighbor states and via the underlying coupled physics. A conceptual equivalent linear discrete form is often written as \begin{equation} \Delta x_i(t_{n+1})=M_d\Delta x_i(t_n)+N_d\Delta u_i(t_n)+F_d\Delta d_i(t_n), \end{equation} but accurate derivation/identification is difficult in practice due to nonlinearity and varying regimes; hence we adopt a data-driven surrogate. \paragraph{Sampling domains and disturbance averaging.} Let $\mathcal{I}_x$ denote the sampling domain (ranges) of $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ for offline data generation, and let $\mathcal{I}_\Gamma$ denote the sampling domain of $\Gamma_{i,n}$. Define the interval-average disturbance as \begin{equation} \Delta d_i(t_n)\triangleq\frac{1}{\delta_n}\int_0^{\delta_n}\Delta d_i(\tau)\,d\tau. \label{eq:dist_avg} \end{equation} \paragraph{One-step sample generation (coupled five-stand simulation).} For each interval $[t_n,t_{n+1}]$ and each subsystem $i$: (1) sample $\Delta x_i(t_n)$ and $\Delta x_{Z_i}(t_n)$ from $\mathcal{I}_x$; (2) sample $\Gamma_{i,n}\sim\mathcal{I}_\Gamma$ and choose $\delta_n$ from a prescribed range; (3) generate $\Delta u_{i,n}(\tau;\Gamma_{i,n})$ by \eqref{eq:poly_param}; (4) integrate the \emph{five-stand coupled} rolling model on $[t_n,t_{n+1}]$ (e.g., RK4) and record $\Delta x_i(t_{n+1})$. Thus, one interval-level sample is \begin{equation} \mathcal{D}_{i,n}= \Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n,\Delta x_i(t_{n+1})\Big). 
\label{eq:one_sample} \end{equation} Repeating yields the one-step dataset \begin{equation} S_i=\Big\{ \big(\Delta x_i^{(j)}(t_n),\Delta x_{Z_i}^{(j)}(t_n),\Gamma_{i,n}^{(j)},\delta_n^{(j)},\Delta x_i^{(j)}(t_{n+1})\big) \ \Big|\ j=1,\ldots,J \Big\}, \label{eq:Si} \end{equation} and the overall dataset is $\{S_i\}_{i=1}^{5}$. \paragraph{$K$-step segment dataset (for multi-step training).} To support multi-step rollout training and reciprocal-consistency regularization, we organize offline simulations into $K$-step segments. Starting at $t_n$, sample $\{(\Gamma_{i,n+s},\delta_{n+s})\}_{s=0}^{K-1}$ and simulate $K$ consecutive intervals to obtain $\{\Delta x_i(t_{n+s})\}_{s=0}^{K}$ and $\{\Delta x_{Z_i}(t_{n+s})\}_{s=0}^{K}$. Define the segment sample \begin{equation} \mathcal{W}_{i,n}= \Big\{ (\Delta x_i(t_{n+s}),\Delta x_{Z_i}(t_{n+s}),\Gamma_{i,n+s},\delta_{n+s})_{s=0}^{K-1}; (\Delta x_i(t_{n+s+1}))_{s=0}^{K-1} \Big\}, \label{eq:segment_sample} \end{equation} and collect $J_K$ segments to form \begin{equation} S_i^{(K)}=\Big\{\mathcal{W}_{i,n}^{(j)}\ \Big|\ j=1,\ldots,J_K\Big\}. \label{eq:SiK} \end{equation} %---------------------------------------------------------------------- \subsection{Residual neural surrogate and detailed training objectives} %---------------------------------------------------------------------- \paragraph{Control-dependent one-step residual predictor (why include control).} We learn a control-dependent deviation-state predictor suitable for MPC: \begin{equation} \Delta \hat{x}_i(t_{n+1}) = \Delta x_i(t_n)+ \mathcal{N}_i\!\Big(\Delta x_i(t_n),\Delta x_{Z_i}(t_n),\Gamma_{i,n},\delta_n;\Theta_i\Big), \label{eq:fwd_pred} \end{equation} where $\mathcal{N}_i(\cdot)$ outputs the one-step deviation-state change and $\Theta_i$ are parameters. Including $(\Gamma_{i,n},\delta_n)$ is essential: without control input, the model becomes autoregressive and cannot evaluate trajectories under candidate decisions, which is required by MPC optimization. 
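A minimal sketch of the control-dependent residual predictor \eqref{eq:fwd_pred}, using a small two-layer MLP as a stand-in for $\mathcal{N}_i$ (architecture, widths, and weight initialization are illustrative assumptions):

```python
import numpy as np

d, p, nZ = 3, 6, 2               # state dim, parameter dim, |Z_i| for an interior stand
n_in = d * (1 + nZ) + p + 1      # input dimension d(1+|Z_i|)+p+1 = 16

rng = np.random.default_rng(2)
W1, b1 = 0.1 * rng.standard_normal((32, n_in)), np.zeros(32)
W2, b2 = 0.1 * rng.standard_normal((d, 32)), np.zeros(d)

def N_i(x_in):
    """Two-layer tanh MLP standing in for the residual network N_i."""
    return W2 @ np.tanh(W1 @ x_in + b1) + b2

def predict_next(dx, dxZ, gamma, delta):
    """Residual predictor: Delta x_hat(t_{n+1}) = Delta x(t_n) + N_i([dx, dxZ, Gamma, delta])."""
    x_in = np.concatenate([dx, dxZ, gamma, [delta]])
    return dx + N_i(x_in)
```

Because the shortcut passes $\Delta x_i(t_n)$ through unchanged, the network output is exactly the learned one-step change, matching the baseline-plus-correction structure described next.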
\paragraph{Input vector and dimensions.} Let $d=3$ and $p=6$. Define \begin{equation} X_{i,\mathrm{in}}= \big[ \Delta x_i(t_n)^\top,\ \Delta x_{Z_i}(t_n)^\top,\ \Gamma_{i,n}^\top,\ \delta_n \big]^\top \in\mathbb{R}^{d(1+|Z_i|)+p+1}, \end{equation} and $\mathcal{N}_i:\mathbb{R}^{d(1+|Z_i|)+p+1}\to\mathbb{R}^{d}$. \paragraph{Residual (shortcut) structure (baseline-plus-correction).} Define the selection matrix \begin{equation} \hat{I}_i=[I_d,\ 0_{d\times(d|Z_i|+p+1)}], \end{equation} so that $\hat{I}_iX_{i,\mathrm{in}}=\Delta x_i(t_n)$. Then the predictor can be written as \begin{equation} X_{i,\mathrm{out}}=\hat{I}_iX_{i,\mathrm{in}}+\mathcal{N}_i(X_{i,\mathrm{in}};\Theta_i), \end{equation} where $X_{i,\mathrm{out}}$ represents $\Delta \hat{x}_i(t_{n+1})$. This residual form improves training stability and long-horizon rollout behavior because the network learns corrections rather than the full state. \paragraph{Auxiliary decomposition for varying $\delta_n$ (avoid notation conflicts).} To enhance robustness when $\delta_n$ varies, decompose \begin{equation} \mathcal{N}_i(X;\Theta_i)\triangleq \psi_i(X;\Theta_{\psi_i})+\rho_i(X;\theta_i), \end{equation} where $\psi_i(\cdot)$ captures low-frequency/scale effects correlated with $\delta_n$ and $\rho_i(\cdot)$ learns remaining nonlinear coupling corrections. \paragraph{One-step targets.} For sample $(\Delta x_i(t_n),\Delta x_i(t_{n+1}))$, define \begin{equation} \Delta r_i(t_n)\triangleq \Delta x_i(t_{n+1})-\Delta x_i(t_n). \end{equation} \paragraph{Multi-step forward rollout (reduce drift).} Given a segment $\mathcal{W}_{i,n}\in S_i^{(K)}$, initialize $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$ and roll forward: \begin{equation} \Delta \hat{x}_i(t_{n+s+1}) = \Delta \hat{x}_i(t_{n+s})+ \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}), \Gamma_{i,n+s},\delta_{n+s};\Theta_i \Big), \quad s=0,\ldots,K-1. 
\label{eq:fwd_roll} \end{equation} \paragraph{Backward model and reciprocal consistency (structural regularization).} Introduce a backward residual model $\mathcal{B}_i(\cdot;\bar{\Theta}_i)$ with the same input dimension. Set terminal condition $\Delta \bar{x}_i(t_{n+K})=\Delta \hat{x}_i(t_{n+K})$ and roll back: \begin{equation} \Delta \bar{x}_i(t_{n+s}) = \Delta \bar{x}_i(t_{n+s+1}) + \mathcal{B}_i\!\Big(X_{i,\mathrm{in}}^{b}(t_{n+s});\bar{\Theta}_i\Big), \quad s=K-1,\ldots,0, \label{eq:bwd_roll} \end{equation} where \begin{equation} X_{i,\mathrm{in}}^{b}(t_{n+s}) = \big[ \Delta \bar{x}_i(t_{n+s+1})^\top,\ \Delta \hat{x}_{Z_i}(t_{n+s+1})^\top,\ \Gamma_{i,n+s}^\top,\ \delta_{n+s} \big]^\top. \end{equation} Define the reciprocal prediction error \begin{equation} E_i(t_n)=\sum_{s=0}^{K}\left\|\Delta \hat{x}_i(t_{n+s})-\Delta \bar{x}_i(t_{n+s})\right\|^2. \end{equation} \paragraph{Loss functions (all terms explicit).} \begin{align} L_{\mathrm{1step}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \left\| (\Delta x_i^{(j)}(t_{n+s+1})-\Delta x_i^{(j)}(t_{n+s})) -\mathcal{N}_i(X_{i,\mathrm{in}}^{(j)}(t_{n+s});\Theta_i) \right\|^2, \\ L_{\mathrm{roll}}(\Theta_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\sum_{s=1}^{K} \left\| \Delta x_i^{(j)}(t_{n+s})-\Delta \hat{x}_i^{(j)}(t_{n+s}) \right\|^2, \\ L_{\mathrm{bwd}}(\bar{\Theta}_i) &= \frac{1}{J_K}\sum_{j=1}^{J_K}\frac{1}{K}\sum_{s=0}^{K-1} \left\| (\Delta x_i^{(j)}(t_{n+s})-\Delta x_i^{(j)}(t_{n+s+1})) -\mathcal{B}_i(X_{i,\mathrm{in}}^{b,(j)}(t_{n+s});\bar{\Theta}_i) \right\|^2, \\ L_{\mathrm{msrp}}(\Theta_i,\bar{\Theta}_i) &=\frac{1}{J_K}\sum_{j=1}^{J_K}E_i^{(j)}(t_n). \end{align} Combine them: \begin{equation} L_{\mathrm{total}}=\lambda_1L_{\mathrm{1step}}+\lambda_2L_{\mathrm{roll}}+\lambda_3L_{\mathrm{msrp}}+\lambda_4L_{\mathrm{bwd}}, \end{equation} with $\lambda_\ell>0$ tuned on a validation set. 
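The forward rollout, reciprocal error, and combined loss above can be sketched as follows. This is a hedged illustration: `net` stands in for the trained network $\mathcal{N}_i$, the trajectories are plain lists of state vectors, and the default `lambdas` are placeholders to be tuned on a validation set.

```python
def residual_rollout(net, x0, xZ_seq, gamma_seq, delta_seq):
    # Forward rollout of the residual predictor:
    #   x_{s+1} = x_s + N(x_s, xZ_s, Gamma_s, delta_s)
    traj = [list(x0)]
    x = list(x0)
    for xZ, gamma, delta in zip(xZ_seq, gamma_seq, delta_seq):
        corr = net(x, xZ, gamma, delta)
        x = [xi + ci for xi, ci in zip(x, corr)]
        traj.append(list(x))
    return traj

def reciprocal_error(fwd_traj, bwd_traj):
    # E_i(t_n): accumulated squared mismatch between the forward and
    # backward rollouts over s = 0..K.
    return sum(sum((a - b) ** 2 for a, b in zip(f, g))
               for f, g in zip(fwd_traj, bwd_traj))

def total_loss(l_1step, l_roll, l_msrp, l_bwd, lambdas=(1.0, 1.0, 0.1, 1.0)):
    # L_total = lambda1*L_1step + lambda2*L_roll + lambda3*L_msrp + lambda4*L_bwd
    l1, l2, l3, l4 = lambdas
    return l1 * l_1step + l2 * l_roll + l3 * l_msrp + l4 * l_bwd
```

The residual structure is visible in the rollout: the network output is added to the current state rather than replacing it, which is what keeps long-horizon drift manageable.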
Optimization uses Adam: \begin{equation} \Theta_{i,t+1}=\Theta_{i,t}-\alpha\frac{\hat{m}_{i,t}}{\sqrt{\hat{v}_{i,t}}+\varepsilon}, \end{equation} where $\alpha$ is the learning rate, $\hat{m}_{i,t}$ and $\hat{v}_{i,t}$ are the bias-corrected first- and second-moment estimates of the gradient, and $\varepsilon>0$ ensures numerical stability. %======================== \section{RNE-DMPC: Nash-Equilibrium-Based Distributed MPC with the Learned Surrogate} %======================== \subsection{Control objective and why Nash coordination is necessary} In the five-stand mill, the tensions $T_i$ are shared coupling variables influenced by both stands $i$ and $i+1$ (mainly through speed actions), while thickness is primarily governed by the roll gap but is also affected indirectly by tension coupling. Therefore, a local thickness improvement at one stand may worsen the shared tensions of its neighbors, creating a multi-agent coupling conflict. To achieve coordinated thickness--tension regulation/tracking with manageable computation, we employ a Nash-equilibrium-seeking distributed MPC (RNE-DMPC). \subsection{Neural predictor as the MPC model constraint (explicit prediction--control interface)} At time $t_n$, define the prediction horizon $N_p$ and control horizon $N_c$ with $N_c\le N_p$. Let $\Delta \hat{x}_i(t_n)=\Delta x_i(t_n)$. Given a candidate decision sequence $\{\Gamma_{i,n+s}\}_{s=0}^{N_c-1}$, generate predictions by \begin{equation} \Delta \hat{x}_i(t_{n+s+1}) = \Delta \hat{x}_i(t_{n+s})+ \mathcal{N}_i\!\Big( \Delta \hat{x}_i(t_{n+s}),\Delta \hat{x}_{Z_i}(t_{n+s}), \Gamma_{i,n+s},\delta_{n+s};\Theta_i^* \Big), \quad s=0,\ldots,N_p-1, \label{eq:mpc_rollout} \end{equation} where $\Delta \hat{x}_{Z_i}(t_{n+s})$ is obtained from neighbor communication during the Nash iterations. Equation \eqref{eq:mpc_rollout} explicitly connects the decision variables $\Gamma$ to the predicted thickness/tension deviations. 
\subsection{Local objective, constraints, and local NLP} \paragraph{Reference in deviation coordinates.} Since $\Delta x_i(t)=x_i(t)-x_i^{\mathrm{ref}}(t)$ by definition, the deviation reference is always \begin{equation} \Delta x_{i,\mathrm{ref}}(t_{n+s})\equiv 0\in\mathbb{R}^{d}. \end{equation} \paragraph{Decision variables.} The local decision vector stacks polynomial parameters over the control horizon: \begin{equation} \mathbf{\Gamma}_i(t_n)=\big[\Gamma_{i,n}^\top,\Gamma_{i,n+1}^\top,\ldots,\Gamma_{i,n+N_c-1}^\top\big]^\top\in\mathbb{R}^{pN_c}. \end{equation} \paragraph{Local cost (explicit thickness+tension weighting).} Let $\Delta \hat{x}_i(t_{n+s})=[\Delta\hat{h}_i(t_{n+s}),\Delta\widehat{T}_{i-1}(t_{n+s}),\Delta\widehat{T}_i(t_{n+s})]^\top$. Choose \begin{equation} Q_i=\mathrm{diag}(q_{h,i},q_{T,i-1},q_{T,i})\in\mathbb{R}^{d\times d},\qquad R_i\in\mathbb{R}^{p\times p}. \end{equation} For boundary virtual tensions, set corresponding weights to zero. Define \begin{equation} J_i= \sum_{s=1}^{N_p}\|\Delta \hat{x}_i(t_{n+s})\|_{Q_i}^2 + \sum_{s=0}^{N_c-1}\|\Gamma_{i,n+s}\|_{R_i}^2. \label{eq:Ji} \end{equation} \paragraph{Constraints (absolute bounds and whole-interval increment bounds).} Absolute input bounds: \begin{equation} u_{i,\min}\le u_i(t_{n+s})\le u_{i,\max},\qquad s=0,\ldots,N_p-1. \label{eq:u_bounds} \end{equation} Whole-interval increment bounds: \begin{equation} \Delta u_{i,\min}\le \Delta u_{i,n+s}(\tau;\Gamma_{i,n+s})\le \Delta u_{i,\max}, \qquad \forall\tau\in[0,\delta_{n+s}]. \label{eq:du_bounds} \end{equation} For each scalar quadratic $q(\tau)=a+b\tau+c\tau^2$ on $[0,\delta]$, extrema occur at $\tau=0$, $\tau=\delta$, and possibly $\tau^\star=-b/(2c)$ if $c\neq 0$ and $\tau^\star\in[0,\delta]$. Hence \eqref{eq:du_bounds} is enforced by checking these points separately for the gap channel and the speed channel. 
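The whole-interval increment bound \eqref{eq:du_bounds} reduces to checking finitely many candidate points, as stated above. A minimal sketch of that check for one scalar channel (function and variable names are illustrative only):

```python
def quad_extrema(a, b, c, delta):
    # Extrema of q(tau) = a + b*tau + c*tau^2 on [0, delta]:
    # the endpoints, plus the stationary point tau* = -b/(2c)
    # when c != 0 and tau* lies inside the interval.
    candidates = [0.0, delta]
    if c != 0.0:
        tau_star = -b / (2.0 * c)
        if 0.0 <= tau_star <= delta:
            candidates.append(tau_star)
    vals = [a + b * t + c * t * t for t in candidates]
    return min(vals), max(vals)

def increment_bound_satisfied(a, b, c, delta, du_min, du_max):
    # Whole-interval bound: q(tau) must stay in [du_min, du_max]
    # for all tau in [0, delta].
    lo, hi = quad_extrema(a, b, c, delta)
    return du_min <= lo and hi <= du_max
```

In the NLP, this check is applied separately to the gap channel and the speed channel of each interval; because the candidate points are known in closed form, the constraint remains differentiable almost everywhere in the polynomial coefficients.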
\paragraph{Discrete-time propagation of absolute input (consistent with within-interval command).} Update sampled input using end-point increment: \begin{equation} u_i(t_{n+s+1})=u_i(t_{n+s})+\Delta u_i^{\mathrm{end}}(t_{n+s}),\qquad \Delta u_i^{\mathrm{end}}(t_{n+s})=\Delta u_{i,n+s}(\delta_{n+s};\Gamma_{i,n+s}). \label{eq:u_prop} \end{equation} \paragraph{Local NLP at Nash iteration $l$.} At each Nash iteration, subsystem $i$ solves the differentiable NLP: \begin{equation} \mathbf{\Gamma}_i^{(l)}=\arg\min_{\mathbf{\Gamma}_i}\ J_i \quad\text{s.t.}\quad \eqref{eq:mpc_rollout},\ \eqref{eq:u_bounds},\ \eqref{eq:du_bounds},\ \eqref{eq:u_prop}. \label{eq:local_nlp} \end{equation} Because $\mathcal{N}_i(\cdot)$ is differentiable, \eqref{eq:local_nlp} can be solved by SQP/interior-point methods using automatic differentiation. \subsection{Nash best-response iteration, termination, and receding-horizon application} \paragraph{Nash best-response iteration.} Each stand repeatedly computes a best response to the latest neighbor strategies/predictions: initialize $\mathbf{\Gamma}_i^{(0)}$ (warm start), then for $l=1,2,\ldots$: (1) rollout predictions using \eqref{eq:mpc_rollout} with neighbor predictions from the previous iteration; (2) solve \eqref{eq:local_nlp} to obtain $\mathbf{\Gamma}_i^{(l)}$; (3) broadcast $\mathbf{\Gamma}_i^{(l)}$ and predicted trajectories to neighbors. \paragraph{Convergence metric.} \begin{equation} \varsigma^{(l)}= \max_i\frac{\|\mathbf{\Gamma}_i^{(l)}-\mathbf{\Gamma}_i^{(l-1)}\|_2}{\|\mathbf{\Gamma}_i^{(l-1)}\|_2+\epsilon}, \end{equation} where $\epsilon>0$ avoids division by zero. Stop when $\varsigma^{(l)}\le \varsigma_{\mathrm{tol}}$. \paragraph{Practical safeguards (non-ideal but necessary).} Best-response iterations are not globally guaranteed to converge for general nonlinear coupled problems. Thus, we set a maximum iteration number $L_{\max}$. If the criterion is not met within $L_{\max}$, apply the last iterate. 
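The best-response loop with the relative-change stopping criterion and the $L_{\max}$ safeguard can be sketched as follows. This is a schematic only: `best_response` stands in for the local NLP solve \eqref{eq:local_nlp}, decision vectors are flat lists, and `beta` implements the optional relaxation (with `beta=1.0` meaning no relaxation).

```python
def l2(v):
    # Euclidean norm of a flat decision vector.
    return sum(x * x for x in v) ** 0.5

def nash_best_response(best_response, gammas0,
                       tol=1e-4, l_max=20, eps=1e-9, beta=1.0):
    # Best-response iteration: each stand i updates its decision
    # vector against the latest neighbor strategies.  Stop when the
    # relative change varsigma drops below tol, or after l_max rounds
    # (in which case the last iterate is applied).
    gammas = [list(g) for g in gammas0]
    for _ in range(l_max):
        new = []
        for i, g in enumerate(gammas):
            others = gammas[:i] + gammas[i + 1:]
            g_new = best_response(i, g, others)
            # Optional relaxation step, beta in (0, 1].
            g_new = [(1 - beta) * a + beta * b for a, b in zip(g, g_new)]
            new.append(g_new)
        varsigma = max(
            l2([a - b for a, b in zip(gn, go)]) / (l2(go) + eps)
            for gn, go in zip(new, gammas))
        gammas = new
        if varsigma <= tol:
            break
    return gammas
```

Note that a fixed point of `best_response` terminates in one round (varsigma is zero there), while a non-convergent instance is simply truncated at `l_max`, matching the safeguard described above.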
Optionally, apply a relaxation step to improve robustness: \begin{equation} \mathbf{\Gamma}_i^{(l)}\leftarrow (1-\beta)\mathbf{\Gamma}_i^{(l-1)}+\beta\,\mathbf{\Gamma}_i^{(l)},\qquad \beta\in(0,1]. \end{equation} \paragraph{Receding-horizon implementation.} After termination at time $t_n$, apply only the first-interval parameters $\Gamma_{i,n}^*$. Execute the within-interval increment trajectory $\Delta u_{i,n}(\tau;\Gamma_{i,n}^*)$ on $[t_n,t_{n+1}]$, and update the sampled input using \eqref{eq:u_prop}. Then shift the horizon to $t_{n+1}$ and repeat. %------------------------ % Optional: very short symbol paragraph (keep if needed) %------------------------ \paragraph{Key symbols (compact).} $\Delta x$: deviation-from-reference state; $\Delta u$: discrete increment input. $u_i=[s_i,v_i]^\top$: roll gap and speed. $\Gamma_{i,n}\in\mathbb{R}^6$: quadratic coefficients. $\Delta u_i^{\mathrm{end}}$: end-point increment; $\overline{\Delta u}_i$: interval-average increment. $Z_i$: neighbor set; $N_p,N_c$: prediction/control horizons; $Q_i,R_i$: weights; $L_{\max}$: max Nash iterations.
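The receding-horizon procedure can be summarized by the following sketch, in which `solve_nash` stands in for the full Nash iteration at one sampling instant and `plant_step` for executing the first-interval increment trajectory on the plant; both are hypothetical placeholders.

```python
def receding_horizon(solve_nash, plant_step, x0, n_steps):
    # At each t_n: solve the coordinated problem over the full
    # horizon, apply only the first-interval parameters of each
    # stand, advance the plant, then shift and repeat.
    x = list(x0)
    traj = [list(x)]
    for _ in range(n_steps):
        gamma_seqs = solve_nash(x)              # per-stand horizon decisions
        first = [seq[0] for seq in gamma_seqs]  # first-interval Gamma only
        x = plant_step(x, first)
        traj.append(list(x))
    return traj
```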