I want to build a LiDAR point-cloud object detection algorithm; my current detection results are below. The scenario is suspended monorail transit, mainly small-object det...

Created: December 26, 2025

Answered using GPT-5.2 Thinking by Chat01

Question

[Attached: image.png × 2 — tables of per-model detection results referenced below]

我现在想做基于激光点云的目标检测算法,目前算法检测结果如下,我的场景是悬挂式单轨交通,主要是小目标检测和不规则结果,请根据我的场景特点和算法检测结果,选择一个可行的算法检测方案,要确保检测精度和结果以及FPS的综合最优,我目前的方案是想基于体素取进行检测,请选择一个比较好改的算法,考虑主流改进的基础模型去进行改进,我的雷达是揽沃tele-15,,点云范围在算法里X轴[0,160]m,Y轴(横向)[-20,+20]m,Z轴(垂向)[-2,6]m,本文将体素网格尺寸设定为[0.1,0.1,0.2]m,实时性要求在不要很差劲即可,我现在的是OpenPCDet,我是用是3090的24G的GPU,同时要确保可以参考的改进的的中英文文章很多,要不我从哪里知道怎么改,改哪些内容呢,我有个其他类似模型或者模块的更改依据也更快实现,比如基于体素的改进或者基于PILLAR,以及其他,比如基于点的检测方法有哪些可以拿来用的开源算法或者论文呢,类似这样,理解吗,我想选择的基础模型是单阶段的检测模型,体素化模型,请再进行分析,以及大家选择体素化的改进点,同时给出改进算法的来源和改进点,请将下面的人的论文去除:Lian等人,RAE3D: Multiscale Aggregation-Enhanced 3D Object Detection for Rail Transit Obstacle Perception,RVSA‑3D(Voxel-based fully sparse attention 3D object detection for rail transit obstacle perception)这两个作者的论文都包括我们实验室的,我们不能重复,同样的这个作者的其他论文也不要推荐给我,同时现在给我一个可执行的替换思路

Python

python
from PIL import Image
img = Image.open('/mnt/data/image.png')
img.size  # (1317, 381)

Python

python
import matplotlib.pyplot as plt, numpy as np
plt.figure(figsize=(14, 4))
plt.imshow(img)
plt.axis('off')
plt.show()

Thinking

Interpreting model metrics for selection

I’ve parsed the data: PointRCNN has the highest accuracy (81.81 mAP, 74.73) but its FPS is low at 19. Models like PV_RCNN and VoxelRCNN provide solid performance in terms of accuracy and FPS. Now I’ll weigh the trade-off between accuracy and processing speed for the best choice.

Evaluating single-stage, voxel-based models

The user wants a single-stage, voxel-based algorithm. Among the options, VoxelNeXt stands out with high FPS (62), but may have lower mAP, which we could improve with techniques like higher resolution or multi-scale. CenterPoint is also viable, especially since both it and VoxelNeXt are implemented in OpenPCDet.

Improving CenterPoint for small, irregular objects

I can suggest using CenterPoint with its voxel-based backbone, focusing on improving small target detection. To optimize it, we could use high-resolution voxels, multi-scale feature fusion, and range-dependent voxel sizes. Deformable attention or a sparse BEV transformer might help. Alternatively, SECOND or VoxelNeXt could work as voxel-based, single-stage options.

Exploring modifications for VoxelNeXt and alternatives

VoxelNeXt is known for its speed and accuracy. Possible improvements include adding multi-head, IoU branch (QualityFocalLoss), and integrating Voxel Transformer (VoTr) modules (by Qi?). Alternatively, CenterPoint with voxel backbone might work, especially with attention modules like SE blocks, CBAM, or Non-local for simplicity, avoiding multi-modality.

Refining object detection for small targets

For small targets, I suggest using high-resolution BEV with a stride of 1/4 or 1/2, adjusting anchor and gaussian sizes for precision. For CenterPoint (anchor-free), use adaptive voxelization and optional deformable attention. To optimize fps and accuracy, SECOND might serve as a baseline, with modern updates like CenterHead for improvements.

Considering hybrid models and modifications

A hybrid approach using the sparse voxel backbone from SECOND and a CenterHead (CenterPoint-style) could work, though it may just end up being like CenterPoint. For better impact, I could suggest VoxelNeXt, tuning its configuration for better mAP (e.g., voxel size, filters), or recommend alternatives like VoxelNet++ or 3DSSD.

Refining suggestions for detection model

I’m considering different options for improved small object detection. I could choose CenterPoint with voxel backbone and enhancements like multi-scale BEV and dynamic label assignment, or go with SECOND for higher FPS and better mAP. VoxelNeXt is another choice, though it has fewer resources compared to CenterPoint. Either way, adjustments can improve their mAP and small object detection.

Choosing the best baseline model

I’m considering using SECOND as a baseline and incorporating some newer improvements like SECONDFPN and CenterHead—although this could make it more like CenterPoint. It seems that their CenterPoint config isn’t fully optimized yet, so tweaking things like voxel size could help. For better small object detection, switching to out_size_factor 4 might improve results, even if it lowers FPS. Ultimately, I’d recommend CenterPoint or VoxelNeXt with upgrades like CenterFormer and multi-scale heads.

Providing model improvements and sources

To improve detection, I would recommend using CenterPoint as the baseline with a spconv backbone, adding high-resolution branches for small objects, and integrating attention blocks like SE or CBAM. Enhancing voxel feature encoding and incorporating methods like GT sampling, soft-NMS, and IoU-aware confidence can boost accuracy, especially for small objects. Open-source implementations like OpenPCDet and VoxelNeXt offer useful resources.

Choosing the right model with customizations

I’m considering proposing either CenterPoint or VoxelNeXt for the user’s case. VoxelNeXt offers high speed and good accuracy, especially for high-resolution, but it has a sparse representation, making it better for large-scale objects. On the other hand, CenterPoint might be easier for modifications and is already well-supported. Combining a VoxelNeXt backbone with a CenterPoint head could work; this way, I get the strengths of both.

Proposing a baseline and enhancements for small-object detection

To improve small-object detection, I’m suggesting a combination of CenterPoint and SA-SSD. CenterPoint offers an anchor-free approach, while SA-SSD enhances performance through auxiliary segmentation. We could incorporate VoxelVFE, increase BEV resolution, and add IoU prediction, attention modules, and data-level techniques like oversampling. This balances efficiency with small-object accuracy, making it an adaptable option for the user.

Proposing baseline and improvements for small-object detection

Based on the user's needs, I will suggest CenterPoint with a sparse convolution backbone, offering a good accuracy-speed balance while adapting to small objects. I'll propose implementing steps like VoxelVFE, dynamic voxelization, BEV downsampling, auxiliary segmentation, IoU-aware quality scores, and range-aware sampling, plus temporal fusion for small objects. This plan is executed with recommended modifications from papers, offering detailed steps and modular code improvements.

Thought for 3m 27s

Answer

Below, following your constraints ("single-stage + voxelized + easy to modify in OpenPCDet + small/irregular objects first + best overall accuracy/FPS"), is an executable replacement plan. It does not recommend any paper by the author/team you asked to exclude, including the two papers you listed and that group's other work.


1) First, your current results: among single-stage voxel models, which is closest to "best overall"?

Looking only at the single-stage voxel/BEV track in your table (i.e. excluding two-stage methods such as PointRCNN / PV-RCNN / Part-A2 / Voxel R-CNN):

  • SECOND: mAP ≈ 44.36, 106 FPS (currently your best overall among single-stage models)
  • CenterPoint: mAP ≈ 41.96, 65 FPS (decent speed, but accuracy is low in this run)
  • VoxelNeXt: mAP ≈ 40.81, 62 FPS (even lower accuracy in this run)
  • PointPillars: mAP ≈ 41.68, 49 FPS (the pillar track)

At the same time, your small-object classes (Plastic sheet / Kite / Bird) are uniformly weak across the single-stage voxel models, which points to these likely bottlenecks:

  • Too few points (the Tele-15 is not a high-line-count sensor; a distant small object often receives only a few scan lines / a handful of points)
  • Insufficient BEV output resolution (with heavy downsampling, a small object occupies barely one cell of the heatmap/feature map)
  • Training sampling and losses that are hostile to small objects (extreme positive/negative imbalance, too few far-range samples)

2) The base model I recommend: CenterPoint-VOXEL (single-stage, voxel-based, easy to modify in OpenPCDet)

Although your CenterPoint numbers are currently slightly below SECOND's, I still recommend CenterPoint-VOXEL as the foundation (rather than SECOND), because:

  1. Anchor-free suits small/irregular objects better
    Your targets (plastic sheet / kite / bird) vary widely in size, pose, and shape, so anchor-based matching and regression easily become unstable; CenterPoint's center-heatmap formulation is more robust and leaves more room for improvement.

  2. It is the most modular and easiest to modify in OpenPCDet
    Everything you plan to add later (higher-resolution BEV, multi-scale heads, an IoU quality branch, attention modules, Soft-NMS/DIoU-NMS, multi-frame fusion) is cheapest to bolt onto the CenterPoint line.

  3. The largest body of Chinese and English reference material
    CenterPoint itself, plus the whole family of center-based / heatmap-head / quality-aware-score / multi-sweep-fusion modifications, has very many reimplementations, write-ups, and improvement articles, so you can move fast.

Your plan of a [0.1, 0.1, 0.2] m voxel grid over range [0, 160] × [-20, 20] × [-2, 6] is entirely fine; keep it as the foundation and iterate on top.


3) An executable replacement plan: CenterPoint-VOXEL + 4 high-payoff improvements

Below is the V1→V4 sequence I suggest. Every step lands quickly in OpenPCDet, and every step has plenty of public papers/open-source implementations to draw on (all avoiding the author line you excluded).

V1: First rescue the BEV resolution, the single most important factor for small objects (do this first)

Goal: keep a small object from collapsing to a single cell on the BEV feature map.

How (at the OpenPCDet level)

  • Change CenterPoint's out_size_factor from the usual 8 to 4 (to 2 if necessary, but don't jump straight to 2)
  • Make the BEV neck more FPN-like: add one upsampling/fusion level so that high-resolution features reach the detection head (a sketch follows at the end of V1)

Payoff

  • Small objects (Plastic sheet / Kite / Bird) usually improve markedly (at some FPS cost, but still manageable on your 3090)
  • Of all the changes, this is the most direct lever on small-object AP

References (keywords)

  • CenterPoint (center-based heatmap head)
  • Multi-scale / FPN-style designs (generic in 2D detection, heavily reused in 3D)
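As a concrete illustration of the V1 neck change, here is a minimal FPN-style BEV fusion sketch in plain PyTorch. This is not OpenPCDet code: the module name, channel counts, and the stride-4/stride-8 pairing are illustrative assumptions.

python
import torch
import torch.nn as nn

class BEVFPNNeck(nn.Module):
    """Fuse a high-resolution (stride-4) BEV map with an upsampled deeper (stride-8) map."""
    def __init__(self, c_high=128, c_low=256, c_out=256):
        super().__init__()
        # lateral 1x1 convs align channel counts before fusion
        self.lat_high = nn.Conv2d(c_high, c_out, 1)
        self.lat_low = nn.Conv2d(c_low, c_out, 1)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_s4, feat_s8):
        # feat_s4: (B, c_high, H, W); feat_s8: (B, c_low, H/2, W/2)
        x = self.lat_high(feat_s4) + self.up(self.lat_low(feat_s8))
        return self.fuse(x)  # the detection head now runs on the stride-4 map

if __name__ == '__main__':
    neck = BEVFPNNeck()
    out = neck(torch.randn(1, 128, 100, 400), torch.randn(1, 256, 50, 200))
    print(out.shape)  # torch.Size([1, 256, 100, 400])

The design point: the head operates on the stride-4 map for localization precision, while the lateral path still feeds it the deeper stride-8 context.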

V2: Upgrade the voxel feature encoding from "mean/simple" to a stronger VFE (vital for sparse small objects)

The core problem: the Tele-15 yields few points to begin with; a weak VFE will average away what little geometric variation those points carry.

How

  • If you are currently using a plain voxel feature (e.g. MeanVFE), switch to a stronger encoder built on VoxelVFE / PFN layers (in the spirit of the PFN from VoxelNet/PointPillars)
  • Add per-point features such as (x, y, z, intensity, Δx/Δy/Δz, distance); these are reliably helpful, and distance in particular tends to help distant small objects (a sketch follows at the end of V2)

Payoff

  • For point-starved small objects and thin-sheet targets, this is usually more effective than enlarging the backbone
  • The FPS impact is fairly modest

References (keywords)

  • The VFE of the VoxelNet / SECOND line (PFN / voxel feature encoding)
  • PointPillars' pillar feature encoding (the idea transfers directly to voxels)
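To make V2 concrete, here is a toy PFN-style VFE in plain PyTorch. It assumes points have already been grouped into a dense (V, P, 4) tensor of x/y/z/intensity with a per-voxel point count; the class name and channel sizes are invented for illustration.

python
import torch
import torch.nn as nn

class TinyPFNVFE(nn.Module):
    """Decorate points with centroid offsets and range, then MLP + max-pool per voxel."""
    def __init__(self, out_channels=32):
        super().__init__()
        # per-point inputs: x, y, z, intensity, dx, dy, dz, distance -> 8 features
        self.mlp = nn.Sequential(nn.Linear(8, out_channels), nn.ReLU(inplace=True))

    def forward(self, voxel_points, num_points):
        # voxel_points: (V, P, 4); num_points: (V,) valid points per voxel
        mask = (torch.arange(voxel_points.shape[1])[None, :] < num_points[:, None]).float()
        xyz = voxel_points[..., :3] * mask[..., None]               # zero out padding
        mean_xyz = xyz.sum(1) / num_points.clamp(min=1)[:, None]    # voxel centroid
        delta = voxel_points[..., :3] - mean_xyz[:, None, :]        # offsets to centroid
        dist = voxel_points[..., :3].norm(dim=-1, keepdim=True)     # range to the sensor
        feats = torch.cat([voxel_points, delta, dist], dim=-1)      # (V, P, 8)
        feats = self.mlp(feats) * mask[..., None]                   # mask padded points
        return feats.max(dim=1).values                              # (V, C) per-voxel feature

if __name__ == '__main__':
    vfe = TinyPFNVFE()
    print(vfe(torch.randn(10, 32, 4), torch.randint(1, 32, (10,))).shape)  # (10, 32)

Max-pooling rather than averaging is the operative choice: with 3 to 5 points in a voxel, the strongest activation survives instead of being diluted.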

V3: Add a quality branch (IoU-aware / quality-aware) so the scores become trustworthy (irregular targets benefit a lot from this)

Your targets are irregular, so classification score ≠ localization quality; you easily end up with "high score, drifting box".

How

  • Add an IoU/quality prediction branch to the head
  • At inference use score = cls_score * quality_score (or a similar fusion) before NMS (a sketch follows at the end of V3)

Payoff

  • The mAP gain tends to be "clean": fewer false positives, steadier boxes
  • Especially visible on shape-drifting targets like Kite / Plastic sheet

References (keywords)

  • "IoU-aware / quality-aware" detection (a large literature in both 2D and 3D)
  • Auxiliary branches (quality / velocity, etc.) are a common modification in the CenterPoint family
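Here is a minimal sketch of the inference-time fusion described above. The α-weighted geometric blend is one common variant; α is a tuning assumption, not an OpenPCDet parameter.

python
import torch

def fuse_scores(cls_score, iou_score, alpha=0.5):
    """Blend classification confidence with predicted localization quality.

    cls_score, iou_score: (N,) tensors in [0, 1]; alpha=1 keeps pure cls, alpha=0 pure IoU.
    """
    iou_score = iou_score.clamp(0.0, 1.0)
    return cls_score.pow(alpha) * iou_score.pow(1.0 - alpha)

# A box with a high cls score but poor predicted IoU is demoted before NMS:
print(fuse_scores(torch.tensor([0.9, 0.6]), torch.tensor([0.2, 0.8])))  # ~[0.42, 0.69]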

V4: Data-side and post-processing engineering for "small, distant, sparse" targets (an excellent fit for your scenario)

In your setting this is, in my view, the family of changes with the best payoff-to-cost ratio, because the Tele-15 is at its weakest on distant small objects.

(a) Small-object database sampling / oversampling, plus distance-bucketed sampling

  • DB sampling: build a GT sample database for Drone / Kite / Bird / Plastic sheet and raise their sampling probability during training
  • Distance buckets: far-range small objects are usually badly underrepresented in training; resample/reweight by distance bucket

(b) Multi-sweep fusion: strongly recommended as the "second-stage big gun"

On a suspended monorail the platform motion is comparatively constrained (rail-bound), so multi-frame alignment is easier than usual.

  • Motion-compensate the past N sweeps (e.g. 3-5) into the current frame's coordinate system
  • Give every point a time_lag feature (Δt) so the network can learn the dynamic/static distinction (a sketch follows after the references below)

Payoff

  • For a sparse lidar like the Tele-15, multi-sweep input is typically one of the clearest sources of small-object AP gains
  • Latency increases, but your requirement is only "not terrible", and the 3090 can take it

References (keywords)

  • Multi-sweep / temporal-fusion LiDAR detection from the Waymo/nuScenes ecosystem
  • Community multi-frame-input variants of CenterPoint are plentiful
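A NumPy sketch of the multi-sweep step: transform past sweeps into the current frame with known ego poses, then append a time-lag channel. The 4×4 sensor-to-world poses and the 10 Hz sweep spacing are assumptions for illustration, not properties of your setup.

python
import numpy as np

def aggregate_sweeps(sweeps, poses, cur_pose):
    """sweeps: list of (N_i, 4) arrays (x, y, z, intensity), oldest first.
    poses: one 4x4 sensor-to-world matrix per sweep; cur_pose: 4x4 of the current frame.
    Returns an (M, 5) array with a time-lag column appended."""
    world_to_cur = np.linalg.inv(cur_pose)
    out = []
    for k, (pts, pose) in enumerate(zip(sweeps, poses)):
        dt = 0.1 * (len(sweeps) - 1 - k)                  # assumed 10 Hz sweep spacing
        xyz1 = np.concatenate([pts[:, :3], np.ones((len(pts), 1))], axis=1)
        xyz = (world_to_cur @ pose @ xyz1.T).T[:, :3]     # motion-compensate into current frame
        lag = np.full((len(pts), 1), dt, dtype=np.float32)
        out.append(np.concatenate([xyz, pts[:, 3:4], lag], axis=1))
    return np.concatenate(out, axis=0)

On the model side, the only change is that the VFE ingests one extra channel: (x, y, z, intensity, time_lag).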

(c) Change NMS to Soft-NMS or DIoU-NMS, with per-class thresholds

  • For small targets like Kite / Bird, the NMS and score thresholds strongly influence final AP
  • Set score thresh / NMS thresh per class, and try Soft-NMS or DIoU-NMS (a sketch follows below)
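A compact Soft-NMS sketch in NumPy. It uses axis-aligned BEV IoU for brevity; in OpenPCDet you would substitute the rotated-IoU utilities, and it can be run once per class with class-specific sigma / score_thresh values.

python
import numpy as np

def bev_iou(box, boxes):
    # boxes as (x1, y1, x2, y2); axis-aligned approximation of BEV overlap
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.clip(area_a + area_b - inter, 1e-6, None)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.05):
    """Gaussian Soft-NMS: decay overlapping scores instead of hard suppression."""
    scores = scores.copy()
    keep = []
    while True:
        i = int(scores.argmax())
        if scores[i] < score_thresh:
            break
        keep.append(i)
        iou = bev_iou(boxes[i], boxes)
        scores *= np.exp(-(iou ** 2) / sigma)  # soft decay of neighbours
        scores[i] = -1.0                       # never re-select the kept box
    return keep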

4) If you insist on starting from the strongest single-stage voxel model in your table: SECOND also works, but then "swap the head, keep the body"

If your priority is "no immediate loss of accuracy/FPS", start from SECOND as the trunk (currently 106 FPS and your best mAP) and migrate the head toward CenterPoint style step by step:

  • Keep SECOND's sparse voxel backbone (the speed advantage)
  • Migrate/replace the traditional anchor-based head with a center-based heatmap head (CenterPoint-style)
  • Then add V1-V4 in order: higher resolution, multi-scale, quality branch, multi-frame

The advantage of this route: you start from your current best (SECOND), so you don't lose points the moment you switch models.


5) You asked: which other point / pillar / voxel methods offer open-source code or papers worth borrowing from?

Listed from the angle of "modules you can actually lift" (again avoiding the excluded author line):

Voxel / sparse convolution (single-stage, or usable as single-stage)

  • SECOND (the classic single-stage voxel foundation)
  • CenterPoint-VOXEL (my recommended mainline)
  • SA-SSD (single-stage + auxiliary tasks; friendly to small-object/foreground learning; many people graft its auxiliary branch onto other frameworks)
  • CIA-SSD (confident IoU-aware single-stage voxel detection; borrow its quality branch / score-rectification idea for your irregular targets)
  • VoxelNeXt (you already have numbers for it; revisit it for comparison after finishing V1/V2/V3)

Pillar (if you later want something lighter)

  • PointPillars (pillar encoding + 2D backbone; easy to modify in practice)
  • CenterPoint-Pillar (the same head with the pillar encoder swapped in)

Point-based (your PointRCNN result is strong, but it is not single-stage; these are listed for "borrowable modules")

  • 3DSSD (single-stage point-based; several of its small-object tricks, e.g. the sampling strategy / key-point representation, transfer to voxel models)
  • Point Transformer-style modules (attention / local aggregation; usable as plug-in modules on voxel/BEV features)

6) A replacement roadmap you can start on today (no need to read a pile of papers first)

  1. Get OpenPCDet's CenterPoint-VOXEL running on your data (first matching your current voxel size/range)
  2. Do V1 right away: out_size_factor=4 + stronger multi-scale BEV fusion
  3. V2: a stronger VFE (PFN/VoxelVFE) + extra point features such as distance
  4. V3: a quality/IoU branch + score fusion
  5. V4: DB sampling / distance buckets + per-class thresholds / NMS strategy
  6. If Kite / Plastic sheet are still low: add multi-sweep fusion (3-5 frames); this is often the step change

If you like, paste your current OpenPCDet config (yaml) (especially: out_size_factor / backbone stride / VFE type / target assigner / NMS config / augmentation and DB sampling), and I will give you a config change list already adapted to your data range and voxel size, telling you field by field what to change, why, and the expected effect.

Question

The original CenterPoint config file is as follows; how should I improve it from here?

yaml
CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']

DATA_CONFIG:
    _BASE_CONFIG_: /media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/cfgs/dataset_configs/custom_dataset.yaml

MODEL:
    NAME: CenterPoint

    VFE:
        NAME: MeanVFE

    BACKBONE_3D:
        NAME: VoxelResBackBone8x

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
        CLASS_NAMES_EACH_HEAD: [
            ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']
        ]
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True  # TODO
        NUM_HM_CONV: 2  # TODO
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
            }
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 8
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            DENSE_REG: 1
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }
        POST_PROCESSING:
            SCORE_THRESH: 0.1
            POST_CENTER_LIMIT_RANGE: [-75.2, -75.2, -5.0, 75.2, 75.2, 3.0]
            MAX_OBJ_PER_SAMPLE: 500
            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
    GRAD_NORM_CLIP: 35

Python

python
import math

def gaussian_radius(det_size, min_overlap=0.7):
    height, width = det_size
    a1 = 1
    b1 = (height + width)
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    sq1 = math.sqrt(b1 ** 2 - 4 * a1 * c1)
    r1 = (b1 + sq1) / 2
    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    sq2 = math.sqrt(b2 ** 2 - 4 * a2 * c2)
    r2 = (b2 + sq2) / 2
    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    sq3 = math.sqrt(b3 ** 2 - 4 * a3 * c3)
    r3 = (b3 + sq3) / 2
    return min(r1, r2, r3)

for overlap in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(overlap, gaussian_radius((2, 2), overlap), gaussian_radius((3, 6), overlap))

Thinking

Providing configurations and improvements

Since OpenPCDet isn't in the container, but the user has it in their environment, I'll guide them on how to check for modules. I’ll suggest optimizing NMS by setting NMS_THRESH between 0.1-0.2, adjusting POST_CENTER_LIMIT_RANGE and SCORE_THRESH, and tweaking parameters for small objects, like GAUSSIAN_OVERLAP or MIN_RADIUS. I'll outline configuration adjustments for small objects, augmentation strategies, and provide paths for implementation tasks. I'll also create an evaluation plan and ensure dataset coordinate consistency.

Providing configuration tweaks and guidance

First, I’ll address immediate fixes: adjust POST_CENTER_LIMIT_RANGE to [0, -20, -2, 160, 20, 6], set NMS_THRESH to 0.1, and enable MULTI_CLASSES_NMS. For the 4x stride, I’ll suggest setting the backbone to VoxelResBackBone4x, adjusting FEATURE_MAP_STRIDE to 4, and updating the 2D upsample strides. I’ll outline how to create the 4x backbone if needed, with sample pseudo code. Then, I’ll focus on YAML modifications and VFE updates.

Thought for 2m 13s

Answer

Below, taking together this CenterPoint config, the Tele-15's sparse point clouds, your small/irregular targets, and the single-stage/voxel/OpenPCDet constraints, is an executable route that runs from engineering fixes that recover points immediately to structural changes genuinely aimed at small objects. Work through S0→S3 with an ablation at each step; every step lands directly in OpenPCDet.


S0 (before touching the network): two hard problems in this config that directly depress your metrics

0.1 POST_CENTER_LIMIT_RANGE does not match your point cloud range (it silently drops every prediction with x > 75.2)

Your scene is X: [0, 160], Y: [-20, 20], Z: [-2, 6], but the config currently has:

yaml
POST_CENTER_LIMIT_RANGE: [-75.2, -75.2, -5.0, 75.2, 75.2, 3.0]

This filters out every candidate box with x > 75.2 (half of your 160 m range is simply gone), and distant small objects suffer the most.

✅ Fix immediately (with a little margin added):

yaml
POST_CENTER_LIMIT_RANGE: [0.0, -20.0, -2.5, 160.0, 20.0, 6.5]

0.2 NMS_THRESH: 0.01 is extreme (it easily kills recall)

At IoU > 0.01 boxes suppress one another, so even the slightest overlap is fatal; small-object boxes jitter a lot to begin with, and recall takes the hit.

✅ Suggested first change:

yaml
MULTI_CLASSES_NMS: True
NMS_THRESH: 0.1    # start from 0.1; tune per class later

Do S0 first; you will very likely see mAP/recall rebound visibly (especially for the far-range classes).
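A trivial pure-Python sanity check that the corrected post-center range actually covers your point cloud range (values taken from your stated range and the fix above):

python
# point cloud range from the scene spec; post-center range from the S0 fix above
point_cloud_range = [0.0, -20.0, -2.0, 160.0, 20.0, 6.0]
post_center_range = [0.0, -20.0, -2.5, 160.0, 20.0, 6.5]
covers = all(post_center_range[i] <= point_cloud_range[i] for i in range(3)) and \
         all(post_center_range[i] >= point_cloud_range[i] for i in range(3, 6))
print('post-center range covers point cloud range:', covers)  # True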


S1 (the most recommended "first cut"): change the output stride from 8 to 4 (the small-object cure)

You currently have FEATURE_MAP_STRIDE: 8 with voxel_xy = 0.1 m,
=> one BEV cell covers 0.8 m.
For small targets like Bird/Kite/Plastic_sheet, the center heatmap easily gets squeezed into a single cell and never trains stably.

✅ Goal: reach stride = 4 so the BEV grid becomes 0.4 m per cell (small objects improve markedly).
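A back-of-the-envelope check (pure Python) of what the stride buys, using your voxel_xy = 0.1 m and an assumed ~0.5 m small-object footprint:

python
voxel_xy = 0.1  # metres per voxel in x/y, per your config
for stride in (8, 4, 2):
    cell = voxel_xy * stride
    print(f'stride {stride}: {cell:.1f} m/cell, a 0.5 m object spans {0.5 / cell:.2f} cells')
# stride 8: 0.8 m/cell -> a bird/kite occupies well under one heatmap cell
# stride 4: 0.4 m/cell -> neighbouring small-object centers become separable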

1.1 Config fields to change

Change:

yaml
TARGET_ASSIGNER_CONFIG:
    FEATURE_MAP_STRIDE: 8

to:

yaml
TARGET_ASSIGNER_CONFIG:
    FEATURE_MAP_STRIDE: 4

1.2 The network must actually output stride 4

Your current BACKBONE_3D: VoxelResBackBone8x is 8x-downsampling by construction.

You have two executable paths (pick the less painful one):

Option A (recommended, if your version has it): switch directly to a 4x backbone

yaml
BACKBONE_3D:
    NAME: VoxelResBackBone4x

Option B (if there is no 4x): copy the 8x backbone and turn it into a 4x one (minimal code change). In pcdet/models/backbones_3d/spconv_backbone.py, duplicate VoxelResBackBone8x as a new VoxelResBackBone4x.
Core idea: perform one fewer stride-2 downsample (2×2×2 = 8 before, 2×2 = 4 after); a toy sketch follows below.

This is the structural change with the largest small-object gain; FPS drops somewhat, but your 3090 24G can usually absorb it.
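For orientation, a toy sketch of the "one fewer stride-2 stage" idea using spconv v2 (assumed installed). It deliberately omits the residual blocks, norm plumbing, and channel plan of the real VoxelResBackBone8x; treat it as a structural template, not a drop-in replacement.

python
import torch.nn as nn
import spconv.pytorch as spconv

class ToyVoxelBackBone4x(nn.Module):
    """Two stride-2 sparse stages (2 x 2 = 4x) instead of the 8x backbone's three."""
    def __init__(self, input_channels=4):
        super().__init__()
        self.conv_input = spconv.SparseSequential(
            spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
            nn.BatchNorm1d(16), nn.ReLU(),
        )
        self.conv1 = spconv.SparseSequential(  # stride-2 stage 1
            spconv.SparseConv3d(16, 32, 3, stride=2, padding=1, bias=False, indice_key='spconv1'),
            nn.BatchNorm1d(32), nn.ReLU(),
        )
        self.conv2 = spconv.SparseSequential(  # stride-2 stage 2 (total 4x; the 8x net adds a third)
            spconv.SparseConv3d(32, 64, 3, stride=2, padding=1, bias=False, indice_key='spconv2'),
            nn.BatchNorm1d(64), nn.ReLU(),
        )

    def forward(self, x):
        # x: a spconv.SparseConvTensor built from your voxel features/coords
        return self.conv2(self.conv1(self.conv_input(x)))

Dropping the third stride-2 stage is exactly the "one fewer downsample" edit of Option B; in the real file you would also keep the residual blocks and adjust the HeightCompression / BEV channel count accordingly.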


S2 (second cut, high value for money): replace MeanVFE with a stronger VFE (the Tele-15's sparse points really need this)

You currently have:

yaml
VFE:
    NAME: MeanVFE

MeanVFE is very weak when a voxel holds only a few points (distant small objects): averaging simply erases the information.

✅ Switch to a stronger voxel feature encoder (common OpenPCDet-style options are VoxelVFE or PFN-based implementations; use whatever your installed version actually provides; the one rule is: stop using a plain mean).

A common change (example; field names must match your repo's implementation):

yaml
VFE:
    NAME: VoxelVFE
    WITH_DISTANCE: True
    USE_ABSLOTE_XYZ: True   # (sic: this is how the repo spells it)
    NUM_FILTERS: [32, 128]

Also strongly recommended: add distance (range to the sensor) and relative coordinates (Δx/Δy/Δz) to the point features; it makes distant small objects much steadier.

Do S1 (stride = 4) first and S2 (strong VFE) on top; classes like Bird/Kite usually keep climbing, and the gains are more stable.


S3 (third cut: the "clean" gain for irregular targets): add a quality branch (IoU / quality-aware)

Your CenterHead currently has only:

yaml
HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']

For irregular, drift-prone targets like Plastic_sheet / Kite, the classification score often fails to reflect localization quality, which leads to:

  • High scores on inaccurate boxes (dragging AP down)
  • The box that survives NMS is not necessarily the most accurate one

✅ How: add a quality/iou branch to the head, and at inference use: final_score = cls_score * quality_score

Config-side it looks like this (example)

yaml
SEPARATE_HEAD_CFG:
    HEAD_ORDER: ['center', 'center_z', 'dim', 'rot', 'iou']
    HEAD_DICT: {
        ...
        'iou': {'out_channels': 1, 'num_conv': 2},
    }

Code changes (a modest amount of work in OpenPCDet)

  • pcdet/models/dense_heads/center_head.py
    • add the iou head output
    • compute the iou target during target assignment (reuse the existing box-IoU utilities)
    • add one loss term (BCE or L1; keep it simple at first; a sketch follows below)

This step rarely produces a huge jump, but it makes the results "cleaner": fewer false positives and steadier boxes, which is exactly what your irregular targets need.
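A minimal sketch of the supervision for the new iou head: compute an IoU-like target from decoded predictions and regress it with L1. Axis-aligned BEV IoU stands in for brevity; in pcdet you would call the existing rotated-IoU utilities instead, and the function names here are illustrative.

python
import torch
import torch.nn.functional as F

def axis_aligned_bev_iou(pred, gt):
    """pred, gt: (N, 4) boxes as (cx, cy, dx, dy). Returns (N,) IoU."""
    p1, p2 = pred[:, :2] - pred[:, 2:] / 2, pred[:, :2] + pred[:, 2:] / 2
    g1, g2 = gt[:, :2] - gt[:, 2:] / 2, gt[:, :2] + gt[:, 2:] / 2
    inter = (torch.minimum(p2, g2) - torch.maximum(p1, g1)).clamp(min=0).prod(dim=1)
    union = pred[:, 2:].prod(1) + gt[:, 2:].prod(1) - inter
    return inter / union.clamp(min=1e-6)

def iou_branch_loss(iou_pred, pred_boxes, gt_boxes):
    # the target carries no gradient: the branch learns to *predict* quality, not improve it
    target = axis_aligned_bev_iou(pred_boxes, gt_boxes).detach()
    return F.l1_loss(iou_pred.squeeze(-1), target)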


A "suggested config diff" you can copy directly

Do S0 + S1 first (the shortest recommended path):

yaml
# S0: fix range + NMS
POST_PROCESSING:
    SCORE_THRESH: 0.1
    POST_CENTER_LIMIT_RANGE: [0.0, -20.0, -2.5, 160.0, 20.0, 6.5]
    NMS_CONFIG:
        MULTI_CLASSES_NMS: True
        NMS_TYPE: nms_gpu
        NMS_THRESH: 0.1
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

# S1: stride 4
DENSE_HEAD:
    TARGET_ASSIGNER_CONFIG:
        FEATURE_MAP_STRIDE: 4
        GAUSSIAN_OVERLAP: 0.1
        MIN_RADIUS: 1   # for small objects start from 1 (your current 2 is likely too blurry)

Then S2:

yaml
VFE:
    NAME: VoxelVFE
    WITH_DISTANCE: True
    USE_ABSLOTE_XYZ: True
    NUM_FILTERS: [32, 128]

Training/data side (critical, and it gains points without touching the network): two things you must do

Both belong in DATA_AUGMENTOR inside custom_dataset.yaml:

  1. DB sampling (aggressive oversampling of the small classes)
    Raise the sampling probability of Bird/Kite/Plastic_sheet (they are naturally scarce, especially at long range).

  2. Distance-bucket resampling / loss reweighting
    Bucket the x distance (e.g. 0-40 / 40-80 / 80-160) and force training to see enough far-range small objects (a sketch follows below).
    With the Tele-15's sparse far-range returns, this trick often beats stacking more network.
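A NumPy sketch of the distance-bucket reweighting idea. The bucket edges follow the 0-40 / 40-80 / 80-160 split above; the inverse-frequency weighting itself is an assumption, one of several reasonable choices.

python
import numpy as np

def bucket_weights(gt_x, edges=(0, 40, 80, 160)):
    """gt_x: (N,) forward distances of GT boxes. Returns normalized sampling weights."""
    idx = np.digitize(gt_x, edges[1:-1])                  # bucket index per box
    counts = np.bincount(idx, minlength=len(edges) - 1).astype(float)
    inv = 1.0 / np.clip(counts, 1, None)                  # rarer bucket -> larger weight
    w = inv[idx]
    return w / w.sum()                                    # sampling probabilities

x = np.array([5.0, 20.0, 35.0, 50.0, 120.0, 150.0])
print(bucket_weights(x))  # each 80-160 m box outweighs each of the three 0-40 m boxes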


What I suggest you do next (the most time-efficient order)

  1. Apply S0 (range + NMS) and run one pass (metrics will very likely rebound immediately)
  2. Then S1, stride = 4, and run again (small objects usually improve clearly)
  3. Then S2, the stronger VFE (far-range small objects keep improving)
  4. Finally decide whether to add the S3 quality branch (for steadier, cleaner output)

So that I can hand you a change list that runs in your exact codebase

Paste these two things, and I can match the exact VFE/backbone names available in your repo and give you a patch-style modification table (which lines, in which files):

  1. The POINT_CLOUD_RANGE / VOXEL_SIZE / DATA_AUGMENTOR / MAX_NUMBER_OF_VOXELS / MAX_POINTS_PER_VOXEL section of custom_dataset.yaml
  2. The backbone class names present in your OpenPCDet version's pcdet/models/backbones_3d/spconv_backbone.py (check whether a ready-made VoxelResBackBone4x exists)

You don't need to re-explain the scenario; this route is already designed around your constraints: suspended monorail, small irregular targets, single-stage voxel model, OpenPCDet, Tele-15, and your range/voxel size.
