What is the difference between the results of these two queries? If the results are the same, can the choose part of the second query be optimized?

Question

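What is the difference between the results of these two queries? If the results are the same, is there room to optimize the choose part of the second query?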

```xml
WITH ranked_records AS (
    SELECT *, ROW_NUMBER() OVER (
                PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
                ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
           ) AS rn
    FROM power_station_discharge
    <where>
        <if test="replaceType != null">
            AND replace_type = #{replaceType}
        </if>
        <if test="stationIds != null and stationIds.size > 0">
            AND station_id IN
            <foreach collection="stationIds" item="item" separator="," open="(" close=")">
                #{item}
            </foreach>
        </if>
        <if test="startDate != null">
            AND date >= #{startDate}
        </if>
        <if test="endDate != null">
            AND #{endDate} >= date
        </if>
    </where>
)
SELECT *
FROM ranked_records
<where>
    <choose>
        <when test="sourceType != null and sourceType.name() == 'AUDIT'">
            AND source_type = 'AUDIT'
        </when>
        <when test="sourceType == null">
        </when>
        <otherwise>
            AND rn = 1
        </otherwise>
    </choose>
</where>
```

```xml
<select id="listByCondition" resultType="com.wiscom.discharge.model.entity.PowerStationDischarge">
    WITH ranked_records AS (
        SELECT *, ROW_NUMBER() OVER (
                    PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
                    ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
               ) AS rn
        FROM power_station_discharge
        <where>
            <if test="sourceType != null and sourceType.name() == 'AUDIT'">
                AND source_type = 'AUDIT'
            </if>
            <if test="replaceType != null">
                AND replace_type = #{replaceType}
            </if>
            <if test="stationIds != null and stationIds.size > 0">
                AND station_id IN
                <foreach collection="stationIds" item="item" separator="," open="(" close=")">
                    #{item}
                </foreach>
            </if>
            <if test="startDate != null">
                AND date >= #{startDate}
            </if>
            <if test="endDate != null">
                AND #{endDate} >= date
            </if>
        </where>
    )
    SELECT *
    FROM ranked_records
    <where>
        <choose>
            <when test="sourceType != null and sourceType.name() == 'AUDIT'">
            </when>
            <when test="sourceType == null">
            </when>
            <otherwise>
                AND rn = 1
            </otherwise>
        </choose>
    </where>
</select>
```

Answer 1

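Below I break down the "rows actually returned" and the "points of difference" for the two SQL statements by the three possible values of sourceType (the differences are concentrated in the rn values and the execution cost of the AUDIT case).

1) Differences in the results of the two queries (by sourceType)

A. `sourceType = AUDIT`

Query 1: the CTE does not restrict source_type, so ROW_NUMBER() is computed over records from all sources first; the outer query then adds AND source_type='AUDIT'.

- Rows returned: all AUDIT rows that satisfy the conditions (no deduplication).
- rn values: because rn is ranked across all sources, an AUDIT row gets rn=2 or higher whenever its group also contains a REPORTED row.

Query 2: the CTE applies AND source_type='AUDIT' first, so ROW_NUMBER() is computed over the AUDIT subset only; the outer query adds no source_type condition.

- Rows returned: again all AUDIT rows that satisfy the conditions (no deduplication).
- rn values: rn is ranked within the AUDIT-only subset, so rows get rn=1, 2, ... (possibly different from query 1).

✅ Conclusion: **the set of business rows returned is essentially the same** (all AUDIT rows in both cases), but **the values in the rn column can differ**; query 2 is also usually cheaper (the window function processes fewer rows).

If your resultType entity has no rn field at all (usually it does not), the "result difference" shows up mainly in performance and the execution plan, not in business field values.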
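The AUDIT case can be reproduced on a toy table. The following is a minimal sketch, assuming SQLite (3.25 or later, for window functions) as a stand-in for the real database; the three sample rows and the `amount` column are made up for illustration, and the MyBatis conditions are reduced to the two query shapes.

```python
import sqlite3

# In-memory stand-in for power_station_discharge; rows and the amount column are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE power_station_discharge (
    date TEXT, source_type TEXT, replace_type TEXT, station_id INTEGER,
    period_type TEXT, discharge_type TEXT, special_type TEXT, amount REAL
);
INSERT INTO power_station_discharge VALUES
    ('2025-01-01', 'REPORTED', 'R', 1, 'P', 'D', 'S', 10.0),
    ('2025-01-01', 'AUDIT',    'R', 1, 'P', 'D', 'S', 11.0),
    ('2025-01-02', 'AUDIT',    'R', 1, 'P', 'D', 'S', 12.0);
""")

RN = """ROW_NUMBER() OVER (
    PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
    ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
) AS rn"""

# Query 1 shape: rank every source in the CTE, filter AUDIT in the outer query.
q1 = f"""
WITH ranked_records AS (SELECT *, {RN} FROM power_station_discharge)
SELECT date, amount, rn FROM ranked_records
WHERE source_type = 'AUDIT' ORDER BY date
"""

# Query 2 shape: filter AUDIT inside the CTE, no outer condition.
q2 = f"""
WITH ranked_records AS (
    SELECT *, {RN} FROM power_station_discharge WHERE source_type = 'AUDIT'
)
SELECT date, amount, rn FROM ranked_records ORDER BY date
"""

rows_q1 = conn.execute(q1).fetchall()
rows_q2 = conn.execute(q2).fetchall()
print(rows_q1)  # [('2025-01-01', 11.0, 2), ('2025-01-02', 12.0, 1)]
print(rows_q2)  # [('2025-01-01', 11.0, 1), ('2025-01-02', 12.0, 1)]
```

Both queries return the same two AUDIT rows; only the rn of the 2025-01-01 row differs, because in query 1 the REPORTED row of that group is ranked ahead of it.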

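B. `sourceType = null`

Neither query adds source_type='AUDIT', and the outer choose does not add rn=1 either, so:

✅ Both return: all records that satisfy the other conditions (no deduplication).
(Many people expect deduplication by default here, but your current logic is "an empty sourceType returns everything".)

C. `sourceType != null and != AUDIT` (for example REPORTED or another enum value)

Neither query filters by source_type=xxx in the CTE, and both outer queries take the otherwise branch, AND rn=1:

✅ Both return: at most one row per group (REPORTED preferred, otherwise another source).
⚠️ Note: this does not guarantee that the returned source_type equals the sourceType you passed in; the passed value only decides whether to deduplicate, not which source to filter by.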
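The warning in case C is easy to see on sample data. This is a small sketch under the same assumptions as any toy reproduction: SQLite (3.25+) standing in for the real database, made-up rows, and a hypothetical `amount` column. It runs the otherwise branch (rn = 1) as if sourceType = REPORTED had been passed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE power_station_discharge (
    date TEXT, source_type TEXT, replace_type TEXT, station_id INTEGER,
    period_type TEXT, discharge_type TEXT, special_type TEXT, amount REAL
);
-- 2025-01-01 has both sources; 2025-01-02 has only an AUDIT row.
INSERT INTO power_station_discharge VALUES
    ('2025-01-01', 'REPORTED', 'R', 1, 'P', 'D', 'S', 10.0),
    ('2025-01-01', 'AUDIT',    'R', 1, 'P', 'D', 'S', 11.0),
    ('2025-01-02', 'AUDIT',    'R', 1, 'P', 'D', 'S', 12.0);
""")

# The otherwise branch: keep rn = 1 per group; source_type is never filtered.
dedup = """
WITH ranked_records AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
        ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
    ) AS rn
    FROM power_station_discharge
)
SELECT date, source_type FROM ranked_records WHERE rn = 1 ORDER BY date
"""

kept = conn.execute(dedup).fetchall()
print(kept)  # [('2025-01-01', 'REPORTED'), ('2025-01-02', 'AUDIT')]
```

The group without a REPORTED row still contributes its AUDIT row, so a caller passing REPORTED can receive AUDIT data.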

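2) Can the `<choose>` in query 2 be optimized?

Yes, and the gain is in readability and in no longer generating empty branches: in your current <choose> only otherwise actually emits SQL, and the first two <when> branches are empty, so it boils down to:

Add AND rn = 1 only when sourceType != null && sourceType != AUDIT

A single <if> expresses this more clearly: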

```xml
SELECT *
FROM ranked_records
<where>
  <if test="sourceType != null and sourceType.name() != 'AUDIT'">
    AND rn = 1
  </if>
</where>
```

This removes the whole <choose> (functionally equivalent to what query 2 does now).
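The equivalence can be checked as a truth table. Below is a minimal sketch that models the two conditions in plain Python rather than in MyBatis; the `SourceType` enum is a hypothetical stand-in (only AUDIT and REPORTED are named in the question), and the two functions simply return whether `AND rn = 1` would be emitted.

```python
from enum import Enum


class SourceType(Enum):
    # Hypothetical enum; the real one may have more members.
    AUDIT = "AUDIT"
    REPORTED = "REPORTED"


def choose_emits_rn_filter(source_type):
    """Mirror of query 2's <choose>: two empty <when> branches, then <otherwise>."""
    if source_type is not None and source_type.name == "AUDIT":
        return False  # first <when>: emits nothing
    elif source_type is None:
        return False  # second <when>: emits nothing
    else:
        return True   # <otherwise>: AND rn = 1


def if_emits_rn_filter(source_type):
    """Mirror of the single <if> replacement."""
    return source_type is not None and source_type.name != "AUDIT"


cases = [None, SourceType.AUDIT, SourceType.REPORTED]
for case in cases:
    print(case, choose_emits_rn_filter(case), if_emits_rn_filter(case))
```

For every input the two functions agree, which is why the <choose> can be collapsed without changing the generated SQL.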

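3) A version that keeps the logic in one place (optional)

Keep the "filter by AUDIT or not" decision in the CTE (query 2 gets this right, since it reduces the window function's work) and let the outer query handle only "deduplicate with rn=1 or not":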

```xml
WITH ranked_records AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
           ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
         ) AS rn
  FROM power_station_discharge
  <where>
    <if test="sourceType != null and sourceType.name() == 'AUDIT'">
      AND source_type = 'AUDIT'
    </if>
    ...other conditions...
  </where>
)
SELECT *
FROM ranked_records
<where>
  <if test="sourceType != null and sourceType.name() != 'AUDIT'">
    AND rn = 1
  </if>
</where>
```

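Summary in one sentence

- Business rows: the two queries are essentially the same in all three sourceType cases;
- The only possible "result difference": with sourceType=AUDIT the rows are the same but the rn values can differ;
- Performance: query 2 is better (the window function processes fewer rows for AUDIT);
- choose can be optimized: replace the empty when branches in query 2 with a single <if> that emits the rn=1 condition.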

Answer 2

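Let me first "translate" the execution plan: most of the cost of this SQL goes into scanning the table and sorting for WindowAgg.

1) Where the plan shows the bottleneck

The key chain in the plan is:

- Parallel Seq Scan on power_station_discharge (parallel sequential table scan)
  This means the optimizer judged that the existing indexes are not worth using (or the statistics suggest a high match ratio), so it reads large stretches of the table and filters afterwards.
  The estimate after filtering is still about 185k rows (rows=185494).
- Sort (one sort per worker) + Gather Merge (merge)
  To compute row_number(), WindowAgg needs its input ordered by partition key + order key, hence the large sort:
  Sort Key: date, station_id, period_type, discharge_type, special_type, CASE...
- WindowAgg
  Computing row_number itself also costs CPU, but usually less than "sort + I/O"; the costs show this too: the Sort stage goes from 28618 to 39916, then Gather Merge 62520, WindowAgg 68548.

👉 Conclusion: the bottleneck is mainly the I/O of the parallel sequential scan plus the full sort done for WindowAgg (which may also spill to disk).

2) A "hidden biggest optimization": in the current SQL this window function is basically pure overhead

Your WHERE clause hard-codes: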

```sql
AND source_type = 'AUDIT'
```

The table also has a unique index:

```sql
(date, source_type, replace_type, station_id, period_type, discharge_type, special_type) UNIQUE
```

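This means: within the subset where source_type is fixed to AUDIT, each group (date, replace_type, station_id, period_type, discharge_type, special_type) has at most one row (with source_type fixed, uniqueness falls on the remaining columns). This assumes none of those columns is NULL, since a unique index treats NULLs as distinct by default.

So your current: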

```sql
ROW_NUMBER() OVER (PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ...)
```

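is, under the current query conditions, almost equivalent to: rn is 1 for every row.
In other words, you pay for a full sort plus WindowAgg just to compute a column that is always 1.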
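The "rn is always 1" argument can be checked on a toy table. The sketch below assumes SQLite (3.25+) as a stand-in for PostgreSQL; the index name `psd_uniq`, the `amount` column, and the sample rows are hypothetical. It shows that the unique index rejects a second AUDIT row for the same group, so every AUDIT row ends up alone in its partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE power_station_discharge (
    date TEXT, source_type TEXT, replace_type TEXT, station_id INTEGER,
    period_type TEXT, discharge_type TEXT, special_type TEXT, amount REAL
);
-- Same column list as the unique index described above; the name is made up.
CREATE UNIQUE INDEX psd_uniq ON power_station_discharge
    (date, source_type, replace_type, station_id, period_type, discharge_type, special_type);
INSERT INTO power_station_discharge VALUES
    ('2025-01-01', 'REPORTED', 'R', 1, 'P', 'D', 'S', 10.0),
    ('2025-01-01', 'AUDIT',    'R', 1, 'P', 'D', 'S', 11.0),
    ('2025-01-02', 'AUDIT',    'R', 1, 'P', 'D', 'S', 12.0),
    ('2025-01-01', 'AUDIT',    'R', 2, 'P', 'D', 'S', 13.0);
""")

# A second AUDIT row for an existing group violates the unique index.
try:
    conn.execute(
        "INSERT INTO power_station_discharge "
        "VALUES ('2025-01-01', 'AUDIT', 'R', 1, 'P', 'D', 'S', 99.0)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With source_type fixed to AUDIT, each partition holds exactly one row.
rns = [row[0] for row in conn.execute("""
WITH ranked_records AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type
        ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END
    ) AS rn
    FROM power_station_discharge
    WHERE source_type = 'AUDIT'
)
SELECT rn FROM ranked_records
""")]
print(duplicate_rejected, rns)  # True [1, 1, 1]
```

Since the window function can only ever produce 1 here, the sort that feeds it does no useful work.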

✅ If you do not need rn (your outer query does not filter on it either), simply drop the window function:

```sql
SELECT *
FROM power_station_discharge
WHERE date BETWEEN '2025-01-01' AND '2025-12-01'
  AND source_type = 'AUDIT'
  AND replace_type = 'POWER_REPLACE_DEFAULT'
  AND station_id = ANY('{...}'::bigint[]);
```

If you only want an rn field for the front end or for debugging, a constant is enough:

```sql
SELECT *, 1 AS rn
FROM power_station_discharge
WHERE ...
```

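This step alone is often an order-of-magnitude improvement, because it eliminates Sort + WindowAgg outright.

3) Why your existing indexes were not used (and how to fill the gap)

You currently have two indexes:

- unique: (date, source_type, replace_type, station_id, period_type, discharge_type, special_type)
- non-unique: (date, replace_type, station_id, period_type, discharge_type, special_type)

But your filter is a date range (large, close to a year) + equality on source_type/replace_type + a large IN list on station_id.

When the first column of a multicolumn B-tree carries a range condition (date BETWEEN), the later columns help much less in narrowing the scan (it often still has to walk the large index range covering that year and then check the later columns), which easily leads the optimizer to conclude that a parallel sequential scan is cheaper.
On top of that you still need a sort for WindowAgg; without an index that serves both the filter and the sort, the sort cost remains.

Index options (pick by your common query pattern)

A. If fixed conditions like AUDIT + POWER_REPLACE_DEFAULT are very common
A **partial index** gives the best value, being small and precisely targeted: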

```sql
CREATE INDEX CONCURRENTLY psd_audit_default_idx
ON power_station_discharge (date, station_id, period_type, discharge_type, special_type)
WHERE source_type = 'AUDIT'
  AND replace_type = 'POWER_REPLACE_DEFAULT';
```

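This can significantly shrink the scanned range; the key order also stays close to the sort key in your plan (with replace_type fixed, it is not in the sort key anyway).

B. If source_type and replace_type are usually equality filters and date is a range
Putting the equality columns first is usually better: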

```sql
CREATE INDEX CONCURRENTLY psd_st_rt_date_station_idx
ON power_station_discharge (source_type, replace_type, date, station_id, period_type, discharge_type, special_type);
```

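This kind of index makes it easier to confine the scan to the single (AUDIT, POWER_REPLACE_DEFAULT) "subspace" and then apply the date range.

Note: more indexes is not automatically better; pick the one that best matches the most common predicate combination in production.

4) Other optional optimizations (weigh them against actual execution)

- Turn the very long IN list into a JOIN (less planning time, and the optimizer may be more willing to use an index)
  For example, pass the station_id list in as an array parameter: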

```sql
WITH s AS (SELECT unnest($1::bigint[]) AS station_id)
SELECT p.*
FROM power_station_discharge p
JOIN s USING (station_id)
WHERE ...
```

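- Select only the columns you need instead of SELECT *
  The rows being sorted and transferred are 125B+ wide; the more columns, the more memory the sort needs and the more likely it is to spill.
- Use EXPLAIN (ANALYZE, BUFFERS) to confirm whether the sort spills
  If you see Sort Method: external merge or heavy disk reads and writes, raising work_mem for that session (for this query) may help immediately; it does not fix the root cause, though, which is that the window function could be removed.
- Partition by date (monthly or quarterly)
  If the table is large and every query carries a date range, partitioning shrinks the scanned range directly (partition pruning).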
