这两个查询结果有什么区别?如果查询结果没有区别,第二个查询的choose部分有优化空间么? ```x...

Создано: 21 января 2026 г.

Отвечено с использованием GPT-5.2 Thinking от Chat01

Вопрос

这两个查询结果有什么区别?如果查询结果没有区别,第二个查询的choose部分有优化空间么?

xml
WITH ranked_records AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END ) AS rn FROM power_station_discharge <where> <if test="replaceType != null"> AND replace_type = #{replaceType} </if> <if test="stationIds != null and stationIds.size > 0"> AND station_id IN <foreach collection="stationIds" item="item" separator="," open="(" close=")"> #{item} </foreach> </if> <if test="startDate != null"> AND date >= #{startDate} </if> <if test="endDate != null"> AND #{endDate} >= date </if> </where> ) SELECT * FROM ranked_records <where> <choose> <when test="sourceType != null and sourceType.name() == 'AUDIT'"> AND source_type = 'AUDIT' </when> <when test="sourceType == null"> </when> <otherwise> AND rn = 1 </otherwise> </choose> </where>
xml
<select id="listByCondition" resultType="com.wiscom.discharge.model.entity.PowerStationDischarge"> WITH ranked_records AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END ) AS rn FROM power_station_discharge <where> <if test="sourceType != null and sourceType.name() == 'AUDIT'"> AND source_type = 'AUDIT' </if> <if test="replaceType != null"> AND replace_type = #{replaceType} </if> <if test="stationIds != null and stationIds.size > 0"> AND station_id IN <foreach collection="stationIds" item="item" separator="," open="(" close=")"> #{item} </foreach> </if> <if test="startDate != null"> AND date >= #{startDate} </if> <if test="endDate != null"> AND #{endDate} >= date </if> </where> ) SELECT * FROM ranked_records <where> <choose> <when test="sourceType != null and sourceType.name() == 'AUDIT'"> </when> <when test="sourceType == null"> </when> <otherwise> AND rn = 1 </otherwise> </choose> </where> </select>

Мысль

Analyzing query differences

I need to compare two MyBatis XML queries that involve filtering by source_type and using a CTE. In the first query, the AUDIT filter is applied outside the CTE, so all rows are returned regardless of rn. In the second query, the filtering happens inside the CTE. Let me work through the logic.

Comparing two query behaviors

In both queries, the first one returns all AUDIT rows without filtering based on rn, while the second limits rows to rn=1 when sourceType is not AUDIT. For null source type, both queries return all rows. For sourceType = AUDIT, both queries should return identical results.

Breaking down sourceType filtering behavior

For sourceType that's neither null nor AUDIT, the first query returns one row per partition, preferring REPORTED rows, while ignoring the sourceType filter. The second query behaves similarly, as the CTE only filters for AUDIT. The result should be identical overall, despite where the filter is applied.

Considering performance and optimization

There is a potential performance difference between the two queries because the second query filters by source_type inside the CTE, reducing the number of rows processed. For AUDIT, this optimizes by filtering earlier. While the queries are identical for most cases, there may be optimization opportunities in the choose statements.

Optimizing query choose statements

To optimize the queries, I could simplify the choose logic using dynamic conditions, replacing unnecessary parts (like empty <when> clauses) with <if> conditions. Simplifying the sourceType checks and moving the rn=1 filter to the CTE can improve clarity. Performance-wise, adapting for MySQL's limitations and considering derived tables might help, but the key difference comes in execution plan rather than result.

Clarifying differences in 'rn' computation based on filters

In the first query, I am computing the row number (rn) across all source types, while in the second query, it's computed within the 'AUDIT' subset, causing possible differences in rn values. The result sets are still identical, but the row numbers may differ, especially if the rn column is used. If sourceType is AUDIT, pushing the filter into CTE changes the row number sequence. This detail might matter if the consumer relies on rn. Otherwise, they're identical.

Clarifying query optimizations based on sourceType

In the second query, I simplify the logic by replacing <choose> with a single <if> to add rn=1 when sourceType isn't null or AUDIT. By pushing the rn=1 condition into the CTE, I can optimize performance. The result set remains identical, but there's a difference in the execution plan, mainly in how rn values are computed.

Thought for 48s

Ответ

下面按 sourceType 的三种取值把两段 SQL 的“实际返回行集合”和“差异点”拆开说(因为差异主要集中在 AUDIT 场景的 rn 值执行代价)。

1) 两段查询的返回结果差异(按 sourceType 分情况)

A. sourceType = AUDIT

  • 第 1 段:CTE 里不限制 source_type,先对“所有来源”的记录做 ROW_NUMBER();外层再 AND source_type='AUDIT'
    • 返回行集合:所有满足条件的 AUDIT 行(不去重)。
    • rn 的值:因为 rn 是在“全来源”里排的,如果同一分组里有 REPORTED,那 AUDIT 往往会是 rn=2/3...
  • 第 2 段:CTE 里先 AND source_type='AUDIT',只对 AUDIT 子集做 ROW_NUMBER();外层不再加 source_type 条件。
    • 返回行集合:同样是所有满足条件的 AUDIT 行(不去重)。
    • rn 的值:rn 是在“仅 AUDIT”子集中排的,很多行会变成 rn=1/2...(与第 1 段可能不同)。

结论:** 返回的业务数据行集合基本一致**(都是 AUDIT 全量),但**rn 列的值可能不同**;并且第 2 段通常更省(窗口函数处理的行更少)。

如果你的 resultType 实体里根本没有 rn 字段(通常没有),那“结果差异”主要就体现在性能/执行计划上,而不是业务字段值上。


B. sourceType = null

两段都不会加 source_type='AUDIT',外层 choose 也不会加 rn=1,所以:

两段都返回:满足其他条件的全量记录(不去重)
(很多人这里期望默认去重,但你现在的逻辑是“sourceType 为空就全量”。)


C. sourceType != null 且 != AUDIT(比如 REPORTED / 其它枚举)

两段都不会在 CTE 里按 source_type=xxx 过滤,外层都会走 otherwise AND rn=1

两段都返回:每个分组最多 1 条(优先 REPORTED,否则其它)
⚠️ 注意:此时 并不会保证返回的 source_type 等于你传入的 sourceType;传入值只用于“是否去重”,不用于“按该来源过滤”。


2) 第 2 段的 <choose> 有优化空间吗?

有,而且是“可读性 + 少生成空分支”的优化:你现在的 <choose> 只有 otherwise 真正输出 SQL,前两个 <when> 是空的,本质就是:

仅当 sourceType != null && sourceType != AUDIT 时加 AND rn = 1

直接改成一个 <if> 更清晰:

xml
SELECT * FROM ranked_records <where> <if test="sourceType != null and sourceType.name() != 'AUDIT'"> AND rn = 1 </if> </where>

这样可以把整个 <choose> 去掉(功能等价于你第 2 段当前写法)。


3) 顺带给一个更“逻辑集中”的写法(可选)

把“是否 AUDIT 过滤”留在 CTE(你第 2 段做得对,能减少窗口函数工作量),外层只负责“是否 rn=1 去重”:

xml
WITH ranked_records AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END ) AS rn FROM power_station_discharge <where> <if test="sourceType != null and sourceType.name() == 'AUDIT'"> AND source_type = 'AUDIT' </if> ...其它条件... </where> ) SELECT * FROM ranked_records <where> <if test="sourceType != null and sourceType.name() != 'AUDIT'"> AND rn = 1 </if> </where>

一句话总结

  • 业务行集合:两段在三种 sourceType 情况下基本一致;
  • 唯一可能的“结果差异”sourceType=AUDIT 时返回行相同,但 rn 值可能不同
  • 性能:第 2 段更优(AUDIT 时窗口函数处理行更少);
  • choose 可优化:第 2 段空 when 建议改成单个 <if> 输出 rn=1 条件。

Вопрос

查询SQL如下,表结构和查询计划如下,瓶颈在哪?有哪些优化方案?

sql
WITH ranked_records AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ORDER BY CASE WHEN source_type = 'REPORTED' THEN 0 ELSE 1 END ) AS rn FROM power_station_discharge WHERE date BETWEEN '2025-01-01' AND '2025-12-01' AND source_type = 'AUDIT' AND replace_type = 'POWER_REPLACE_DEFAULT' AND station_id IN (200000053, 200000054, 200000032, 102000530, 200000100, 102000528, 200000091, 200000074, 200000058, 200000062, 102000529, 102000511, 200000078, 200000089, 200000038, 102000406, 200000098, 103000001, 102000205, 200000073, 200000069, 102000287, 102000446, 200000057, 200000029, 200000061, 200000033, 200000036, 200000059, 102000365, 102000411, 102000377, 102000427, 102000413, 102000412, 102000515, 200000044, 102000501, 200000067, 200000048, 102000517, 102000430, 102000410, 102000032, 102000072, 102000009, 102000342, 102000418, 200000019, 102000527, 102000354, 102000395, 102000070, 200000011, 102000518, 200000097, 102000426, 102000328, 200000021, 102000521, 200000027, 102000439, 102000381, 102000386, 102000066, 200000092, 200000042, 102000349, 200000034, 200000047, 200000016, 102000436, 102000265, 102000306, 102000245, 200000080, 200000077, 200000099, 102000061, 102000356, 102000048, 102000041, 102000015, 102000012, 102000034, 102000053, 102000044, 102000038, 102000429, 102000433, 102000417, 102000382, 102000371, 102000010, 200000093, 200000084, 200000088, 102000525, 200000082, 200000087, 102000357, 102000325, 200000085, 102000369, 102000415, 102000401, 102000403, 102000392, 102000344, 102000337, 102000347, 102000285, 102000071, 102000185, 102000361, 102000341, 102000397, 102000396, 102000513, 200000094, 102000326, 102000065, 102000389, 102000419, 102000368, 102000444, 102000425, 102000037, 102000423, 102000374, 102000067, 102000327, 200000028, 200000066, 200000043, 102000391, 102000384, 200000086, 102000526, 102000348, 102000058, 102000338, 102000504, 102000040, 102000054, 102000400, 102000335, 102000435, 102000432, 102000522, 102000339, 102000051, 102000330, 102000398, 102000027, 102000393, 102000405, 102000409, 102000443, 102000045, 102000524, 102000503, 102000043, 200000030, 200000046, 102000532, 102000307, 102000380, 102000506, 102000226, 102000388, 102000364, 102000376, 102000308, 102000366, 102000028, 102000362, 200000000, 102000363, 102000343, 102000016, 102000031, 102000060, 102000390, 102000014, 102000422, 102000519, 102000228, 102000447, 102000428, 102000360, 102000523, 200000060, 102000373, 102000068, 102000370, 102000029, 102000346, 102000375, 102000505, 102000047, 102000445, 102000500, 102000510, 102000502, 102000207, 102000206, 200000083, 200000075, 102000056, 102000050, 200000023, 102000437, 102000358, 102000011, 102000340, 102000394, 102000448, 200000020, 102000449, 200000018, 200000095, 200000090, 200000096, 102000186, 102000017, 102000052, 102000036, 200000026, 102000059, 102000030, 102000531, 200000015, 200000051, 200000039, 200000065, 200000025, 102000055, 102000512, 102000230, 102000026, 102000441, 102000355, 200000064, 200000031, 200000072, 200000040, 102000046, 102000062, 102000404, 102000534, 102000520, 102000064, 102000305, 102000345, 102000509, 102000351, 102000399, 200000063, 200000056, 200000070, 102000416, 200000035, 200000071, 200000050, 200000055, 200000041, 200000037, 200000045, 102000378, 102000334, 102000286, 102000379, 102000333, 102000533, 102000350, 102000033, 102000013, 102000063, 102000408, 102000431, 102000372, 102000442, 102000438, 102000069, 200000017, 200000049, 200000079, 200000022, 102000508, 102000516, 102000266, 102000329, 102000352, 102000359, 102000042, 102000336, 102000331, 102000074, 102000252, 200000052, 200000076, 102000421, 102000420, 102000414, 102000383, 102000387, 102000424, 102000367, 200000024, 102000035, 102000402, 102000507, 102000434, 102000229, 102000008, 102000049, 102000407, 102000353, 102000057, 102000385, 102000073, 200000081) ) SELECT * FROM ranked_records

表结构

sql
-- auto-generated definition create table power_station_discharge ( date date not null, source_type varchar(64) not null, replace_type varchar(64) not null, station_id bigint not null, period_type varchar(64) not null, discharge_type varchar(64) not null, special_type varchar(64) not null, measured_electricity numeric(10, 4), predict_electricity numeric(10, 4), discharge numeric(10, 4), create_time timestamp(0), update_time timestamp(0) ); comment on table power_station_discharge is '每日单站弃电记录表'; comment on column power_station_discharge.date is '日期'; comment on column power_station_discharge.source_type is '功率来源类型'; comment on column power_station_discharge.replace_type is '替换类型'; comment on column power_station_discharge.station_id is '场站id'; comment on column power_station_discharge.period_type is '弃电来源:(弃电时段类型)'; comment on column power_station_discharge.discharge_type is '弃电类型(特殊/常规)'; comment on column power_station_discharge.special_type is '特殊弃电类型(市场交易决策不当、并网技术条件不满足、特殊天气、无特殊类型)'; comment on column power_station_discharge.measured_electricity is '弃电时段内实测电量(万千瓦时)'; comment on column power_station_discharge.predict_electricity is '弃电时段内预测电量(万千瓦时)'; comment on column power_station_discharge.discharge is '总弃电量(万千瓦时)'; comment on column power_station_discharge.create_time is '创建时间'; comment on column power_station_discharge.update_time is '更新时间'; alter table power_station_discharge owner to postgres; create unique index power_station_discharge_unique_index on power_station_discharge (date, source_type, replace_type, station_id, period_type, discharge_type, special_type); create index power_station_discharge_index on power_station_discharge (date, replace_type, station_id, period_type, discharge_type, special_type);

查询计划

txt
INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES ('Subquery Scan on ranked_records (cost=40916.33..70403.65 rows=185494 width=129)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' -> WindowAgg (cost=40916.33..68548.71 rows=185494 width=133)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' -> Gather Merge (cost=40916.33..62520.16 rows=185494 width=125)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' Workers Planned: 2'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' -> Sort (cost=39916.30..40109.52 rows=77289 width=125)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' Sort Key: power_station_discharge.date, power_station_discharge.station_id, power_station_discharge.period_type, power_station_discharge.discharge_type, power_station_discharge.special_type, (CASE WHEN ((power_station_discharge.source_type)::text = ''REPORTED''::text) THEN 0 ELSE 1 END)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' -> Parallel Seq Scan on power_station_discharge (cost=0.81..28618.72 rows=77289 width=125)'); INSERT INTO "MY_TABLE"("QUERY PLAN") VALUES (' Filter: ((date >= ''2025-01-01''::date) AND (date <= ''2025-12-01''::date) AND ((source_type)::text = ''AUDIT''::text) AND ((replace_type)::text = ''POWER_REPLACE_DEFAULT''::text) AND (station_id = ANY (''{200000053,200000054,200000032,102000530,200000100,102000528,200000091,200000074,200000058,200000062,102000529,102000511,200000078,200000089,200000038,102000406,200000098,103000001,102000205,200000073,200000069,102000287,102000446,200000057,200000029,200000061,200000033,200000036,200000059,102000365,102000411,102000377,102000427,102000413,102000412,102000515,200000044,102000501,200000067,200000048,102000517,102000430,102000410,102000032,102000072,102000009,102000342,102000418,200000019,102000527,102000354,102000395,102000070,200000011,102000518,200000097,102000426,102000328,200000021,102000521,200000027,102000439,102000381,102000386,102000066,200000092,200000042,102000349,200000034,200000047,200000016,102000436,102000265,102000306,102000245,200000080,200000077,200000099,102000061,102000356,102000048,102000041,102000015,102000012,102000034,102000053,102000044,102000038,102000429,102000433,102000417,102000382,102000371,102000010,200000093,200000084,200000088,102000525,200000082,200000087,102000357,102000325,200000085,102000369,102000415,102000401,102000403,102000392,102000344,102000337,102000347,102000285,102000071,102000185,102000361,102000341,102000397,102000396,102000513,200000094,102000326,102000065,102000389,102000419,102000368,102000444,102000425,102000037,102000423,102000374,102000067,102000327,200000028,200000066,200000043,102000391,102000384,200000086,102000526,102000348,102000058,102000338,102000504,102000040,102000054,102000400,102000335,102000435,102000432,102000522,102000339,102000051,102000330,102000398,102000027,102000393,102000405,102000409,102000443,102000045,102000524,102000503,102000043,200000030,200000046,102000532,102000307,102000380,102000506,102000226,102000388,102000364,102000376,102000308,102000366,102000028,102000362,200000000,102000363,102000343,102000016,102000031,102000060,102000390,102000014,102000422,102000519,102000228,102000447,102000428,102000360,102000523,200000060,102000373,102000068,102000370,102000029,102000346,102000375,102000505,102000047,102000445,102000500,102000510,102000502,102000207,102000206,200000083,200000075,102000056,102000050,200000023,102000437,102000358,102000011,102000340,102000394,102000448,200000020,102000449,200000018,200000095,200000090,200000096,102000186,102000017,102000052,102000036,200000026,102000059,102000030,102000531,200000015,200000051,200000039,200000065,200000025,102000055,102000512,102000230,102000026,102000441,102000355,200000064,200000031,200000072,200000040,102000046,102000062,102000404,102000534,102000520,102000064,102000305,102000345,102000509,102000351,102000399,200000063,200000056,200000070,102000416,200000035,200000071,200000050,200000055,200000041,200000037,200000045,102000378,102000334,102000286,102000379,102000333,102000533,102000350,102000033,102000013,102000063,102000408,102000431,102000372,102000442,102000438,102000069,200000017,200000049,200000079,200000022,102000508,102000516,102000266,102000329,102000352,102000359,102000042,102000336,102000331,102000074,102000252,200000052,200000076,102000421,102000420,102000414,102000383,102000387,102000424,102000367,200000024,102000035,102000402,102000507,102000434,102000229,102000008,102000049,102000407,102000353,102000057,102000385,102000073,200000081}''::bigint[])))');

Мысль

Analyzing PostgreSQL query plan bottleneck

For the query using window function row_number, the bottleneck seems to be a parallel sequential scan that doesn't use an index, followed by a sort. Although there are existing indexes, none seems to match the full set of filtering conditions, especially considering the source_type filter. The window aggregate requires sorted input, making the sort phase inevitable.

Focusing on query execution and index strategy

The planner uses a parallel sequential scan, possibly due to a large station_id list and the date range being large, which covers a big portion of the table. While indexes match part of the query, there's a need for a sort order that matches the window function's partitioning and ordering requirements. Some optimization might involve adjusting the statistics or creating an index matching the full ORDER BY for the window.

Optimizing query index usage

The sort order in the query matches an existing index, but it wasn't used, possibly because of the large station_id list in the filter. A bitmap index scan could help, but it may still be less efficient than a sequential scan. I could create a composite index that includes source_type to improve performance and support index-only scans.

Fine-tuning index for better filtering and sorting

The current multi-column B-tree index makes good use of the leading column, but with the range on the first column (date), later columns aren't as useful for filtering. For equality on source_type and replace_type, I'd suggest adjusting the index order with source_type and replace_type first to improve efficiency while maintaining sorting for window functions. So, the best index would be: (source_type, replace_type, date, station_id, period_type, discharge_type, special_type).

Improving index for partitioning and sorting

The index matches the required sorting for the window function, but using it effectively depends on the positioning of columns like station_id and date. Date range limits usage of station_id after it in the index, so a composite index on source_type, replace_type, and station_id could improve filtering. However, achieving the right partition order for the window requires sorting by partition columns, which might change depending on the planner's choices. Adjusting the query could help with these partitioning needs.

Optimizing query with window function and filter

In this case, the row_number function seems unnecessary since the query is filtering on a single source_type = 'AUDIT', making the window function's sorting redundant. To optimize: remove the window function and sorting. If deduplication within partitions is needed, consider adding additional ordering (e.g., update_time). For station_id lists, using ANY can help reduce costs.

Optimizing query with partial index and window function

To improve query performance, I could create a partial index on source_type and replace_type to reduce the scan size. Additionally, I might consider partitioning the table by date for better pruning. Ensure statistics are up-to-date, and consider optimizing work_mem. For deduplication, I could tweak the row_number function to avoid unnecessary sorting.

Suggesting optimization for window function in PostgreSQL

In PostgreSQL, window functions require sorting by partition and order keys, but hashing isn't an option. Given the query uses row number without filtering, it could be optimized by removing the window function if unnecessary. To reduce overhead, consider DISTINCT ON, a partial index, or adjusting WHERE conditions. Additionally, eliminating the CTE can help improve performance.

Optimizing index and query execution

For creating efficient indexes, use INCLUDE for selected columns to enable index-only scans in PostgreSQL. Consider partitioning by date to improve pruning and reducing scan cost. If working with large station_id lists, using a temp table or JOIN may help. Be cautious with date ranges as the index may not fully optimize with multi-column scans.

Analyzing query plan and proposing index optimization

Leading with date in indexes is less efficient when filtering by source_type and replace_type. Given their existing date-based index, a better approach is leading with source_type and replace_type. Ray estimation indicates scanning too many rows, so using a narrower date range or partitioning might improve selectivity.

Proposing optimizations based on query plan analysis

The query plan involves reading a large portion of the table, using a parallel scan, filtering on date and other columns, followed by a sort for windowagg. The lack of optimal index use is due to the column order and selectivity. Suggestions: rewrite the query, refine indexes, use partitioning, and adjust stats to enhance performance.

Thought for 2m 22s

Ответ

先把执行计划“翻译”一下,你这条 SQL 的主要代价都花在 扫表 + 为 WindowAgg 排序 上。

1) 从计划看瓶颈在哪

计划关键链路是:

  • Parallel Seq Scan on power_station_discharge(并行顺序扫表)
    说明优化器判断:用现有索引不划算(或统计信息认为命中比例较高),所以直接把表的大段数据读出来再过滤。
    这里估算过滤后仍有 ~185k 行rows=185494)。

  • Sort(每个 worker 排序) + Gather Merge(合并)
    WindowAgg 计算 row_number() 必须要求输入按 partition key + order key 有序,所以出现了大排序:
    Sort Key: date, station_id, period_type, discharge_type, special_type, CASE...

  • WindowAgg
    本身计算 row_number 的 CPU 也有,但通常不如“排序 + I/O”贵;你这里成本上也能看出来:Sort 段的 cost 从 28618 拉到 39916,再到 Gather Merge 62520,WindowAgg 68548。

👉 结论:瓶颈主要是并行顺序扫表带来的 I/O + 为 WindowAgg 做的全量排序(可能还会 spill 到磁盘)。


2) 一个“隐藏的最大优化点”:你这个窗口函数在当前 SQL 下基本是纯开销

你 WHERE 里写死了:

sql
AND source_type = 'AUDIT'

同时表上有唯一索引:

sql
(date, source_type, replace_type, station_id, period_type, discharge_type, special_type) UNIQUE

这意味着:在 source_type 固定为 AUDIT 的子集内,同一组 (date, replace_type, station_id, period_type, discharge_type, special_type) 最多只有 1 行(因为 source_type 已固定,唯一性就落在其余列上)。

所以你现在的:

sql
ROW_NUMBER() OVER (PARTITION BY date, replace_type, station_id, period_type, discharge_type, special_type ...)

在当前查询条件下几乎等价于:每行 rn 都是 1
也就是说:你为了算一个恒等于 1 的列,付出了“全量排序 + WindowAgg”的代价。

如果你不需要 rn(你外层也没用 rn 过滤),直接删掉窗口:

sql
SELECT * FROM power_station_discharge WHERE date BETWEEN '2025-01-01' AND '2025-12-01' AND source_type = 'AUDIT' AND replace_type = 'POWER_REPLACE_DEFAULT' AND station_id = ANY('{...}'::bigint[]);

如果你只是为了带个 rn 字段给前端/调试,用常量即可:

sql
SELECT *, 1 AS rn FROM power_station_discharge WHERE ...

这一步通常就是“数量级”的优化,因为直接消灭了 Sort + WindowAgg。


3) 为什么没走你现有索引(以及怎么补)

你现有两个索引:

  1. unique:(date, source_type, replace_type, station_id, period_type, discharge_type, special_type)
  2. non-unique:(date, replace_type, station_id, period_type, discharge_type, special_type)

但你的过滤条件是 date 范围(很大,接近一年)+ source_type/replace_type 等值 + station_id 大 IN 列表

  • 多列 BTree 如果第一列是范围条件(date BETWEEN),后面的列对“缩小扫描范围”的帮助会明显变弱(往往还是要扫完这一年内的大段索引范围,再去判断后续列),很容易让优化器觉得“不如并行顺扫”。
  • 再加上你还需要为了 WindowAgg 做排序;如果没有一个能同时满足过滤与排序的索引,排序成本依然在。

索引优化方向(按你常见查询模式选)

A. 如果你保留 AUDIT + POWER_REPLACE_DEFAULT 这种固定条件很常见
建一个**部分索引(partial index)**最划算,体积小、命中准:

sql
CREATE INDEX CONCURRENTLY psd_audit_default_idx ON power_station_discharge (date, station_id, period_type, discharge_type, special_type) WHERE source_type = 'AUDIT' AND replace_type = 'POWER_REPLACE_DEFAULT';

这样能显著降低扫表范围;并且键顺序也贴近你计划里的排序键(replace_type 固定后它本来也不在 sort key 里)。

B. 如果 source_type、replace_type 经常作为等值过滤,date 是范围
把等值列放前面通常更好:

sql
CREATE INDEX CONCURRENTLY psd_st_rt_date_station_idx ON power_station_discharge (source_type, replace_type, date, station_id, period_type, discharge_type, special_type);

这类索引更容易把扫描范围限制在 (AUDIT, POWER_REPLACE_DEFAULT) 这一个“子空间”内,再做 date 范围。

注意:索引不是越多越好,优先选最贴合你线上最常见的谓词组合的一条。


4) 其它可选优化(看实际执行情况取舍)

  1. 把超长 IN 列表改成 JOIN(减少计划时间 & 可能让优化器更敢用索引)
    比如把 station_id 列表作为参数数组传入:
sql
WITH s AS (SELECT unnest($1::bigint[]) AS station_id) SELECT p.* FROM power_station_discharge p JOIN s USING (station_id) WHERE ...
  1. 只取需要的列,别 SELECT *
    排序/传输的行宽是 125B+,列越多排序越吃内存、越容易 spill。

  2. EXPLAIN (ANALYZE, BUFFERS) 确认是否“Sort spill”
    如果看到 Sort Method: external merge 或磁盘读写很高,那提高该会话 work_mem(针对这条查询)可能立竿见影;但它解决不了“窗口函数本可删”的根因。

  3. 按 date 分区(按月/季度)
    如果表很大且所有查询都带 date 范围,分区能让扫描范围直接变小(分区裁剪)。


推荐你优先做的两步(收益最大)

  1. 先删掉 WindowAgg(或 rn 改常量 1):这是当前 SQL 的最大浪费点。
  2. 如果还慢,再根据业务常用条件选 partial index(source_type, replace_type, date, …) 这种复合索引。

如果你把 EXPLAIN (ANALYZE, BUFFERS) 的真实执行结果也贴一下(尤其是 Sort Method / shared hit/read),我可以再把“到底是 I/O 还是 sort spill”定得更准,并给出更具体的索引列顺序建议。

Поделиться этим Q&A