Attached is my SCI journal paper, which I plan to submit to Computer Vision and Image Under...

Question

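Attached is my SCI journal paper, which I plan to submit to Computer Vision and Image Understanding. Please review the manuscript carefully and analyze the logic and writing conventions of each section, including whether abbreviations are used consistently, whether the experiments are complete, whether the wording is accurate, and whether the abstract is sound. Also check that the equations, figures, and references are correct, and give corresponding revision suggestions and revised text.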

Answer 1

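I have reviewed the attached 16-page manuscript page by page. Overall assessment: the topic and experimental results have potential for CVIU, but I do not recommend submitting the manuscript in its current form. The main risks lie not in the method itself but in section numbering, figure and table cross-references, consistency of method definitions, experimental reproducibility, reference formatting, and anonymized-submission requirements. CVIU uses double-anonymized review, so the title page must be submitted separately from the anonymized manuscript; the first page of the current PDF contains author names and affiliations and needs to be split off and anonymized. CVIU also requires an abstract of no more than 250 words, submitted highlights, equations as editable text, figures cited in order and submitted separately, and references that are at least complete and consistent in style. (ScienceDirect) This review is based on the manuscript you uploaded. (GuideHDR_T_CVIU (1).pdf)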

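I. Critical issues to fix first

1. Section structure error: the method is placed under "2. Related Work".
Sections 2.1 Overall Architecture, 2.2 MPM, 2.3 CESC, 2.4 DAFN, and 2.5 Loss Function all belong to the method, yet they are numbered as subsections of Related Work. This must be fixed before submission. Suggested structure: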

text
1. Introduction
2. Related Work
3. Proposed Method
   3.1 Overall Architecture
   3.2 Motion-aware Pre-Modulation Module
   3.3 Cross-Exposure Structure Calibration Module
   3.4 Dual-Attention Fusion Network
   3.5 Loss Function
4. Experiments
   4.1 Datasets and Implementation Details
   4.2 Comparisons with State-of-the-Art Methods
   4.3 Visual Comparisons
   4.4 Ablation Studies
   4.5 Complexity and Runtime Analysis
5. Conclusion

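2. Several figure references are wrong, which will noticeably hurt the reviewers' impression.
The manuscript says "As shown in Figure 5, the proposed GuideHDR-T adopts…", but the overall architecture is actually Figure 2; the DRP/DRR text says "Figure 5 and Figure 5", which should be Figure 3 and Figure 4; and "Figure ??" appears before Eq. (28). These are stale LaTeX cross-references or wrong labels and must be corrected one by one. (GuideHDR_T_CVIU (1).pdf)

Suggested direct replacements: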

text
As shown in Fig. 2, the proposed GuideHDR-T adopts a three-stage pipeline...
The specific structures of DRP and DRR are shown in Figs. 3 and 4, respectively.
According to the overall architecture in Fig. 2, a long skip connection...
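One way to catch unresolved references such as "Figure ??" before recompiling is to compare the labels the LaTeX source defines against the ones it references. The sketch below is a minimal illustration under stated assumptions: the function name `check_refs` and the sample labels (`fig:arch`, `fig:drp`, `fig:drr`, `fig:dr`) are hypothetical and not taken from the manuscript.

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
# Matches \ref, \eqref, \autoref, \cref and \Cref with a single key.
REF_RE = re.compile(r"\\(?:eq|auto|c|C)?ref\{([^}]+)\}")


def check_refs(tex: str) -> dict:
    """Report undefined references, unused labels, and duplicate labels."""
    labels = LABEL_RE.findall(tex)
    refs = REF_RE.findall(tex)
    defined = set(labels)
    return {
        "undefined": sorted(set(refs) - defined),
        "unused": sorted(defined - set(refs)),
        "duplicate": sorted({name for name in labels if labels.count(name) > 1}),
    }


# Hypothetical source fragment: the DRR figure label is misspelled as fig:dr.
sample = r"""
As shown in Fig.~\ref{fig:arch}, the pipeline has three stages.
DRP and DRR are shown in Figs.~\ref{fig:drp} and \ref{fig:drr}.
\begin{figure}\caption{Architecture}\label{fig:arch}\end{figure}
\begin{figure}\caption{DRP}\label{fig:drp}\end{figure}
\begin{figure}\caption{DRR}\label{fig:dr}\end{figure}
"""
report = check_refs(sample)
print(report)
```

A check like this only finds references that fail to resolve. A reference that resolves to the wrong figure, as in "Figure 5 and Figure 5", still has to be verified by reading the compiled PDF.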

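3. The method description is conceptually inconsistent: the auxiliary structural input A_i of CESC is not fully defined.
The text uses $A_1$ and $A_3$ as CESC inputs, but Figure 5 also shows labels such as RGB/YCbCr and AvgPool/MaxPool. The text only states $S_i = \mathrm{Conv}(A_i)$ and does not make clear whether $A_i$ is a YCbCr image, a luminance channel, a gradient map, a max-pooled feature, or some other structural prior. Define it up front in Overall Architecture: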

text
For each non-reference exposure, we construct an auxiliary structural input A_i from the YCbCr representation of the corresponding LDR frame. Specifically, the luminance-dominated structural branch is used to provide exposure-robust edge and layout cues, while the RGB branch preserves appearance-dependent responses.

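If your implementation does not use YCbCr, rewrite this to match the actual implementation; the key point is that the text, the equations, and the labels in Figure 5 must agree exactly.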

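4. CESC only describes the frequency-domain recombination in PSI and never states how cross-attention produces $\hat F_i^{rgb}$.
The text says CESC includes "cross-attention fusion", but the equations stop at Eq. (18), and the computation from $\bar R_i,\bar S_i$ to $\hat F_i^{rgb}$ is missing. Add a set of equations, for example: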

text
Q_i = W_Q \bar{R}_i,\quad K_i = W_K \bar{S}_i,\quad V_i = W_V \bar{S}_i,

text
C_i = \mathrm{Softmax}(Q_iK_i^\top/\sqrt{d})V_i,

text
\hat{F}^{rgb}_i = \mathrm{Conv}([\tilde{F}^{rgb}_i, C_i]),\quad i \in \{1,3\}.

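If the actual implementation does not follow this path, write the equations to match the real code structure. Reviewers will check closely whether a module with an ambitious name is backed by complete equations.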
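To make the data flow of the suggested equations concrete, here is a minimal single-head sketch in plain Python. It assumes the features are already flattened into token lists and already projected by $W_Q$, $W_K$, $W_V$; the function names and toy values are illustrative, the final concatenation and convolution step is omitted, and none of this is the manuscript's actual implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(q, k, v):
    """Single-head cross-attention: Softmax(Q K^T / sqrt(d)) V.

    q: N_q x d queries (the RGB branch in the suggested equations)
    k, v: N_k x d keys and values (the structural branch)
    Returns the attended tokens (N_q x d) and the weights (N_q x N_k).
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights.append(softmax(scores))
    out = [
        [sum(w * vj[c] for w, vj in zip(wrow, v)) for c in range(len(v[0]))]
        for wrow in weights
    ]
    return out, weights


# Toy tokens: 2 queries, 3 keys/values, feature dimension d = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C, A = cross_attention(Q, K, V)
print(C)
```

Each row of `A` sums to one, so every RGB-branch token receives a convex combination of the structural-branch values. Stating this explicitly in the paper tells reviewers which branch supplies the queries and which supplies the keys and values.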

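5. Experimental completeness is insufficient: complexity, runtime, a loss-function ablation, and reproducibility details are missing.
The manuscript currently has two datasets, SOTA comparisons, and module ablations, which covers the basics. For a Transformer-based HDR method submitted to CVIU, however, I recommend adding Params / FLOPs / runtime / memory and stating whether each compared method's results are quoted from the original paper, obtained with the official model, or retrained. In addition, the Loss Function includes $L_{perc}$, $L_{fft}$, and $L_{ssim}$, but the experiments contain no corresponding ablation; add a loss ablation table. (GuideHDR_T_CVIU (1).pdf)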

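II. Abstract, keywords, and title

The abstract's overall logic is sound: background → pain point → three modules → experimental conclusions. Two problems remain. First, it is long and its sentences are dense. Second, the contributions read as a list of modules, and the abstract does not stress enough why cross-exposure matchability is the core idea. Keep MPM/CESC/DAFN in the abstract, but cut stacked adjectives and foreground the central idea of restoring matchability first and fusing afterwards.

Suggested keywords, with consistent capitalization: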

text
High dynamic range imaging; HDR deghosting; Multi-exposure fusion; Transformer; Cross-exposure calibration

Change "Deep Learning" to "Deep learning", or replace it with "Transformer", since the paper's main technical selling points are the Transformer and cross-exposure calibration.

Below are a drop-in English abstract and draft highlights:

Revised Abstract

High dynamic range (HDR) imaging reconstructs an HDR radiance map by fusing multi-exposure low dynamic range (LDR) images. Dynamic scenes remain challenging because scene motion and severe exposure variations jointly weaken cross-exposure correspondences, especially in saturated or underexposed regions where reliable structures are missing. This paper proposes GuideHDR-T, a guided HDR Transformer that restores cross-exposure matchability before fusion. The framework first applies a Motion-aware Pre-Modulation Module (MPM) to attenuate motion-induced outliers in non-reference features and refine the reference feature as a stable structural anchor. A Cross-Exposure Structure Calibration Module (CESC) then injects exposure-robust structural cues through phase-guided frequency-domain interaction and cross-attention, improving feature comparability in degraded regions. Finally, a Dual-Attention Fusion Network (DAFN) progressively aggregates the calibrated features to recover global structures and local details. Experiments on the Kalantari and Tel dynamic HDR benchmarks show that GuideHDR-T achieves higher PSNR, SSIM, and HDR-VDP-2 scores than representative alignment-based, attention-based, Transformer-based, and diffusion-based methods. Ablation studies further verify the complementary contributions of motion suppression, structural calibration, and dual-attention fusion. These results demonstrate that explicitly improving cross-exposure feature matchability is effective for ghost-free HDR reconstruction under large motion and extreme exposure gaps.

Highlights

Cross-exposure matchability is modeled before HDR fusion.
MPM suppresses motion-induced outliers in non-reference frames.
CESC restores exposure-robust structure via phase-guided calibration.
DAFN fuses calibrated features with spatial and channel attention.
Experiments show consistent gains on two dynamic HDR benchmarks.
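The length limits are easy to check mechanically before submission. The sketch below counts abstract words against the 250-word limit mentioned earlier and flags overlong highlights. The 85-character limit per highlight is an assumption based on common Elsevier guidance and should be verified against the current CVIU guide; the function names are illustrative.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for tok in text.split() if re.search(r"[A-Za-z0-9]", tok))


def abstract_ok(text: str, limit: int = 250) -> bool:
    """True if the abstract stays within the word limit."""
    return count_words(text) <= limit


def overlong_highlights(highlights, max_chars: int = 85):
    """Return the highlights whose length (spaces included) exceeds max_chars.

    max_chars = 85 is an assumed limit; confirm it in the journal guide.
    """
    return [h for h in highlights if len(h) > max_chars]


highlights = [
    "Cross-exposure matchability is modeled before HDR fusion.",
    "MPM suppresses motion-induced outliers in non-reference frames.",
]
print(count_words("High dynamic range (HDR) imaging"))
print(overlong_highlights(highlights))
```

Word processors and journal submission systems may count hyphenated terms or parenthesized abbreviations differently, so leave some margin below the limit.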

III. Detailed comments by section

1. Introduction

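The strength is a clear problem statement: it lifts the difficulty of dynamic HDR from "plain alignment/fusion" to "motion + exposure degradation destroys feature matchability". This is more convincing than simply saying "we propose three modules".

Points to revise:

First, the opening background paragraph is somewhat generic; trim stock phrases such as "widely used / practical / low hardware cost" and give the space to this paper's observations. Second, state the role of Figure 1 more precisely: the current caption says "importance of alignment", but your method is "motion-aware pre-modulation / implicit alignment-calibration-fusion", so avoid leading reviewers to think you perform explicit alignment. Consider changing "alignment" in the caption to "motion-aware pre-modulation" or "feature response pre-shaping". Third, the contributions at the end of the Introduction can be shorter and more concrete. I suggest three items: the cross-exposure matchability perspective; progressive calibration with MPM+CESC; and SOTA results on two benchmarks, verified by ablation.

Suggested opening for the contributions: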

text
The main contributions are summarized as follows.

rather than:

text
Based on the above analysis, the main contributions of this paper can be summarized...

2. Related Work

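The taxonomy in Related Work is fairly complete: explicit alignment-fusion, implicit alignment-fusion, fusion-only, and your implicit alignment-calibration-fusion. However, the text is too long, and many paragraphs read like a survey, which weakens the main thread. Keep each paradigm to three layers: representative methods, limitations, and relation to this paper.

Pay particular attention to updating references: SCTNet should now be cited in its ICCV 2023 version rather than only as an arXiv preprint; AFUNet has a public ICCV 2025 version, so verify the final publication details if you use its conclusions or page numbers; and the CVPR 2024 paper by Hu et al. should also have page numbers and a DOI added. (CVF Open Access)

Change the title of the last subsection of Related Work from: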

text
Our Paradigm: Implicit Alignment-Calibration-Fusion.

to the more academic:

text
Discussion: From Alignment-Fusion to Alignment-Calibration-Fusion.

Otherwise, "Our Paradigm" inside Related Work reads as somewhat subjective and blurs into the method section.

3. Proposed Method / Overall Architecture

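The overall architecture is described clearly, but there are three kinds of problems:

Duplicated equations. Eq. (1) already defines shallow feature extraction, and Eq. (5) then writes $F_i^{rgb}=\mathrm{Conv}(I_i)$; the two are functionally redundant. Keep Eq. (1) and turn Eq. (5) into an implementation note: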

text
In our implementation, $N_{shallow}$ is a shared convolutional stem.

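Incomplete variable definitions. $I_i$ is called a "dual-domain input", but the earlier text says the input is three LDR frames. If there really is an RGB/YCbCr dual-domain input, define how it is constructed in Section 3.1. If $I_i$ is simply an RGB LDR image, do not call it a dual-domain input.

Module names should use consistent capitalization. The text contains "Motion-aware Pre-Modulation Module", "Motion-Aware Pre-modulation Module", and "Motion-aware Pre-modulation"; unify them as: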

text
Motion-aware Pre-Modulation Module (MPM)
Cross-Exposure Structure Calibration Module (CESC)
Dual-Attention Fusion Network (DAFN)

4. MPM

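The motivation for MPM is sound: first suppress motion-induced anomalous responses in the non-reference frames, then strengthen the structural anchor of the reference frame. The relationship between DRP and DRR, however, needs to be described in more mechanistic terms:

The current wording is descriptive: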

text
regions with more consistent structural responses ... can obtain higher weights

A more precise version:

text
The learned response map acts as a soft reliability mask for non-reference features. It suppresses activations that are inconsistent with the reference feature while preserving exposure-complementary structures.

Another problem is that DCNv2 has no citation. Cite Deformable ConvNets v2 where "DCNv2 Blocks" first appears, and state whether the offsets and masks are predicted inside the block.

5. CESC

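This is the part of the paper with the most potential for novelty, and also the one that most needs strengthening. Reviewers are likely to ask three questions:

First, why is phase better suited than amplitude for structure calibration? Second, where does $A_i$ come from? Third, how is cross-attention implemented after PSI? You currently explain the first point well, but the second and third are underdeveloped.

Add a short paragraph titled "Why phase-guided calibration?":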

text
The phase component is less sensitive to global intensity scaling and is more closely related to spatial layout. Therefore, phase exchange provides structural cues that are complementary to amplitude-dominated exposure responses.
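The claim about global intensity scaling can be illustrated numerically. The sketch below uses a small hand-written 1-D DFT and a hypothetical signal and gain; it shows that a pure global gain scales every amplitude by the same factor and leaves the phase unchanged. Real exposure changes also involve clipping and noise, so this covers only the idealized case.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a short real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical 1-D intensity profile and a global gain standing in for
# an exposure ratio (no saturation or noise is modeled).
signal = [0.1, 0.4, 0.9, 0.3, 0.2, 0.7, 0.5, 0.1]
gain = 4.0
scaled = [gain * s for s in signal]

X, Y = dft(signal), dft(scaled)

# Skip bins with negligible energy to avoid dividing by zero.
pairs = [(x, y) for x, y in zip(X, Y) if abs(x) > 1e-12]
amp_ratio = [abs(y) / abs(x) for x, y in pairs]
# Phase of y * conj(x) is the phase difference, free of wrap-around issues.
phase_diff = [abs(cmath.phase(y * x.conjugate())) for x, y in pairs]

print(max(abs(r - gain) for r in amp_ratio))
print(max(phase_diff))
```

Both printed values should be at the level of floating-point error, which is the property the suggested paragraph relies on.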

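The RGB/YCbCr labels in Figure 5 must also agree with the equations. If $F_{max,i}^{ycbcr}$ in the figure is the auxiliary structural input, write it the same way in the text: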

text
A_i = MaxPool(Conv(YCbCr(I_i))).

Or rewrite it to match the actual implementation.

6. DAFN

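The design logic of GCA + CGSF in DAFN is mostly clear, but the term "parameter-independent channel interaction modules" is easy to misread. You later write $CGSF_1$ and $CGSF_3$, which look like separate per-branch modules. Depending on the implementation, choose one of: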

text
branch-specific channel interaction modules

or

text
parameter-shared channel interaction modules

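Do not write "parameter-independent", because it is unclear whether it means "parameter-free" or "parameters not shared".

In addition, SwinLayer in Eq. (21) does not cite the original Swin Transformer paper, and the patch size of Embed/Unembed, the window size, the number of heads, and the number of DARBs $N$ should be listed in the implementation details or the method section.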

7. Loss Function

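The Loss Function section is fairly complete, but the experiments do not verify that each loss is necessary. Add the following ablation settings: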

text
w/o perceptual loss
w/o FFT loss
w/o SSIM loss
full loss

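Also state, for the VGG16 perceptual loss, whether the input is tone-mapped RGB, whether it is normalized with the ImageNet mean/std, and which layers are used, e.g. relu2_2, relu3_3, relu4_3. Without this, reproducibility is insufficient.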
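As an illustration of the preprocessing details worth documenting, the sketch below tone-maps a linear HDR pixel with $\mu$-law compression and then applies ImageNet normalization. The value $\mu = 5000$ is an assumption (it is common in HDR deghosting work but is not confirmed by the manuscript), and the function names are illustrative.

```python
import math

MU = 5000.0  # assumed compression factor; replace with the value actually used
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def mu_law(h: float, mu: float = MU) -> float:
    """Tone-map a linear HDR value in [0, 1] with mu-law compression."""
    return math.log(1.0 + mu * h) / math.log(1.0 + mu)


def preprocess_pixel(rgb_hdr):
    """Clamp to [0, 1], tone-map, then normalize one linear RGB pixel."""
    tone_mapped = [mu_law(min(max(c, 0.0), 1.0)) for c in rgb_hdr]
    return [
        (t - m) / s
        for t, m, s in zip(tone_mapped, IMAGENET_MEAN, IMAGENET_STD)
    ]


print(mu_law(0.5))
print(preprocess_pixel([0.0, 0.25, 1.0]))
```

Whatever the real pipeline is, the paper should state the order of these steps, because computing the perceptual loss on linear versus tone-mapped inputs gives different gradients.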

For terminology, write "L1 loss" consistently as "$\ell_1$ loss", and change "ADAM optimizer" to "Adam optimizer".

8. Experiments

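The results look strong, especially with PSNR, SSIM, and HDR-VDP-2 reported on both the Kalantari and Tel datasets. The experimental section still lacks several pieces of information that reviewers will ask about:

Issue	Suggested addition
Source of SOTA comparisons unclear	State whether each method's results come from the original paper, a reproduction with the official model, or the authors' own retraining
Complexity missing	Add Params, FLOPs, runtime, and GPU memory
Fairness not established	State whether all methods use the same tone mapping, cropping, and test script
No loss ablation	Add a loss ablation
Baseline definition vague	Some Baseline values in Table 3 are very close to SCTNet's in Table 1; explain whether the baseline is SCTNet, a Swin backbone, or a custom backbone
No failure cases	Add one failure case or limitation to improve credibility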
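For the complexity row, it helps to state exactly how parameters, operations, and runtime are counted. The sketch below shows one common convention for a single standard convolution layer and a simple timing helper; the layer sizes and function names are hypothetical and unrelated to the manuscript's network.

```python
import statistics
import time


def conv2d_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations for one forward pass of that layer."""
    return c_in * c_out * k * k * h_out * w_out


def median_runtime(fn, warmup=3, repeats=10):
    """Median wall-clock seconds of fn() after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Hypothetical stem layer: 3 -> 64 channels, 3x3 kernel, 128x128 output.
params = conv2d_params(3, 64, 3)
macs = conv2d_macs(3, 64, 3, 128, 128)
runtime = median_runtime(lambda: sum(range(10_000)))
print(params, macs, f"{runtime:.6f}s")
```

Papers differ on whether "FLOPs" means MACs or twice the MACs, so the table caption should name the convention, the input resolution, and the hardware. For GPU timing, the device must be synchronized before reading the clock, otherwise asynchronous kernels make the numbers look too small.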

The text explaining Table 5 also needs rewriting. The table lists "w/o Dual-Domain (RGB Only)" and "w/o Phase-Guidance (No PSI)", but the text says "When only dual-domain modeling is enabled, i.e., w/o Phase-Guidance / No PSI setting, which only uses YCbCr amplitude information", which is logically inconsistent. Suggested rewrite:

text
The RGB-only setting removes the auxiliary YCbCr structural branch.
The no-PSI setting keeps the auxiliary branch but removes phase-guided recombination.
The full CESC uses both dual-domain structural guidance and PSI.

9. Figures and Tables

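The overall architecture in Figure 2 is fairly complete, but its annotations are too dense. Export it as a high-resolution PDF/EPS and use a uniform font size for MPM/CESC/DAFN/DRP/DRR/PSI/DARB. CVIU/Elsevier requires figures to be numbered in order of appearance, all cited in the text, and captioned with explanations of symbols and abbreviations. (ScienceDirect)

As for tables, Table 1 marks the top three results with red, orange, and yellow cell shading. The Elsevier guide recommends avoiding heavy shading in tables; switch to bold for the best and underline for the second best, or to other marks that remain distinguishable in black-and-white print. (ScienceDirect)

Figure and text issues that must be fixed: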

text
Figure 5 -> Figure 2: overall architecture
Figure 5 and Figure 5 -> Figures 3 and 4: DRP and DRR
Figure ?? -> Figure 2
HDR-VDP2 -> HDR-VDP-2
fusion_1 -> Fusion-1, or consistently the other way round
Ground Truth -> ground truth, unless used as a label inside a figure

10. References

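The references largely cover the main lines of HDR deghosting work, but there are formatting and completeness problems. The CVIU guide states that no strict reference format is required at submission, as long as the style is consistent and the information complete; the journal's formal style is author-year, with the reference list in alphabetical order and DOIs recommended. (ScienceDirect) Your text currently uses numeric citations such as [7] with an alphabetically ordered reference list. This may be acceptable at submission, but to match CVIU style more closely, switch to author-year, for example: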

text
Kalantari and Ramamoorthi (2017)
Yan et al. (2019)
Tel et al. (2023)

Also add or verify the following references:

text
DCNv2 / Deformable ConvNets v2
Swin Transformer
VGG16
Adam optimizer
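HDR-VDP-2: check that the original paper's details are complete
SCTNet: use the ICCV 2023 version, not only arXiv
Hu et al. 2024: CVPR page numbers and DOI
AFUNet 2025: ICCV page numbers, conference name, and DOI/URL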

IV. Abbreviation and terminology checklist

Use the following consistently throughout:

Abbreviation	Suggested form at first use
HDR	high dynamic range (HDR)
LDR	low dynamic range (LDR)
MPM	Motion-aware Pre-Modulation Module (MPM)
CESC	Cross-Exposure Structure Calibration Module (CESC)
DAFN	Dual-Attention Fusion Network (DAFN)
DRP	Dual-Branch Response Pre-Modulation (DRP)
DRR	Deformable Reference Refinement (DRR)
PSI	Phase-Guided Structure Injection (PSI)
DARB	Dual-Attention Refinement Block (DARB)
GCA	Global Context Aggregation (GCA)
CGSF	Context-Guided Semantic Fusion (CGSF)
FFT	fast Fourier transform (FFT)
DCNv2	Deformable ConvNets v2 (DCNv2)
HDR-VDP-2	Do not write HDR-VDP2
$\mu$-PSNR / $\mu$-SSIM	Use the Greek letter $\mu$ throughout; do not mix in the micro sign or the Latin letter u
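Spelling variants like these can be flagged automatically in the LaTeX source. The sketch below is a minimal linter; the variant patterns cover only examples named in this review, and the `VARIANTS` table and function name are illustrative and would need extending for the full manuscript.

```python
import re

# Canonical spelling mapped to regex patterns for variants to flag.
# Matching is case-sensitive, so the canonical forms never match.
VARIANTS = {
    "HDR-VDP-2": [r"HDR-VDP2\b"],
    "Adam": [r"\bADAM\b"],
    "Motion-aware Pre-Modulation": [
        r"Motion-Aware Pre-modulation",
        r"Motion-aware Pre-modulation",
    ],
}


def find_variants(text: str):
    """Return (line number, matched variant, canonical form) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for canonical, patterns in VARIANTS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, line):
                    hits.append((lineno, match.group(0), canonical))
    return hits


sample = (
    "We train with the ADAM optimizer.\n"
    "We report HDR-VDP2 and HDR-VDP-2 scores.\n"
    "The Motion-Aware Pre-modulation Module (MPM) comes first.\n"
)
hits = find_variants(sample)
for hit in hits:
    print(hit)
```

A linter of this kind cannot tell whether an abbreviation is defined at first use, so the definitions in the table still need a manual pass.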

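V. Pre-submission checklist

Before submitting to CVIU, work through the items in this order:

Split the title page from the anonymized manuscript, and remove author names, affiliations, acknowledgements, and other identifying information from the main text.
Renumber the sections and move the method out of Related Work.
Recompile the whole manuscript and clear every Figure ??, wrong figure number, and wrong table number.
Unify the definitions of $A_i$, RGB/YCbCr, AvgPool/MaxPool, PSI, and cross-attention in CESC.
Add complexity/runtime results and a loss ablation.
Add the missing basic citations for DCNv2, Swin, VGG16, Adam, and similar.
Unify the capitalization and spelling of HDR-VDP-2, $\mu$-PSNR, $\mu$-SSIM, Deep learning, and similar terms.
Prepare the highlights file; upload figures separately as high-resolution PDF/EPS/TIFF.
Choose a reference strategy: at minimum keep it consistent; preferably adopt CVIU's author-year style.
Add a limitations paragraph, e.g. on real-time performance, extreme occlusion, noise under very low light, or generalization to real camera ISPs.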