I want to build a LiDAR point-cloud object detection algorithm; my current detection results are below. The scene is suspended monorail transit, mostly small targets and irregular shapes. Based on my scene and these results, please pick a feasible detection scheme that jointly optimizes accuracy (precision and recall) and FPS. My current plan is voxel-based detection, so choose an algorithm that is easy to modify, built on a mainstream base model. My LiDAR is a Livox Tele-15. In the algorithm the point-cloud range is X [0, 160] m, Y (lateral) [-20, +20] m, Z (vertical) [-2, 6] m, and the voxel grid size is [0.1, 0.1, 0.2] m. The real-time requirement is loose, as long as it is not terrible. I am on OpenPCDet with a 3090 (24 GB) GPU. There should also be plenty of Chinese and English papers on improving the chosen model, otherwise how would I know what and how to modify? Having precedents from similar models or modules also speeds up implementation, e.g. which open-source voxel-based algorithms or papers can be borrowed from. I want a single-stage, voxelized base model; please analyze again, list the improvement points people commonly apply to voxel models, and give the source of each improvement. My baseline is CenterPoint. My current code: I am using an improved CenterPoint with the PDVIntensityVFE you suggested. The config file:

```yaml
CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']

DATA_CONFIG:
    _BASE_CONFIG_: /media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/cfgs/dataset_configs/custom_dataset.yaml

MODEL:
    NAME: Test

    VFE:
        NAME: PDVMeanVFE
        USE_XY_RANGE: True
        DENSITY_LOG1P: True
        RANGE_SCALE: 160.0    # forward range 160 m (or something like 120)
        INTENSITY_IDX: 3      # xyzI order
        NUM_HIST_BINS: 8      # start with 8, increase if it helps
        HIST_MIN: 0.0         # intensity range
        HIST_MAX: 1.0         # if your intensity is already normalized

    BACKBONE_3D:
        NAME: VoxelResBackBone8x

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
        CLASS_NAMES_EACH_HEAD: [
            ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']
        ]
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True   # TODO
        NUM_HM_CONV: 2               # TODO
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
            }
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 8
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            DENSE_REG: 1
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }
        POST_PROCESSING:
            SCORE_THRESH: 0.1
            POST_CENTER_LIMIT_RANGE: [-160, -75.2, -5.0, 160, 75.2, 8.0]
            MAX_OBJ_PER_SAMPLE: 500
            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
    OUTPUT_RAW_SCORE: False
    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
    GRAD_NORM_CLIP: 35
```
The module code is as follows:

```python
import torch

from .vfe_template import VFETemplate


class PDVMeanVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size=None, point_cloud_range=None, **kwargs):
        super().__init__(model_cfg=model_cfg)

        if voxel_size is None:
            voxel_size = kwargs.get('voxel_size', None)
        if point_cloud_range is None:
            point_cloud_range = kwargs.get('point_cloud_range', None)
        assert voxel_size is not None and point_cloud_range is not None

        self.num_point_features = num_point_features
        self.voxel_x, self.voxel_y, self.voxel_z = voxel_size
        self.pc_range = point_cloud_range
        self.voxel_volume = self.voxel_x * self.voxel_y * self.voxel_z

        # PDV part
        self.use_xy_range = getattr(self.model_cfg, 'USE_XY_RANGE', True)
        self.density_log1p = getattr(self.model_cfg, 'DENSITY_LOG1P', True)
        self.range_scale = getattr(self.model_cfg, 'RANGE_SCALE', 1.0)

        # intensity-related config
        self.intensity_idx = getattr(self.model_cfg, 'INTENSITY_IDX', 3)  # I = 3 for xyzI
        self.num_hist_bins = getattr(self.model_cfg, 'NUM_HIST_BINS', 8)
        self.hist_min = getattr(self.model_cfg, 'HIST_MIN', 0.0)
        self.hist_max = getattr(self.model_cfg, 'HIST_MAX', 1.0)

        # same center-offset computation as OpenPCDet's PillarVFE (coords: [b, z, y, x]),
        # i.e. x index is coords[:, 3], y is coords[:, 2], z is coords[:, 1]
        # (https://github.com/open-mmlab/OpenPCDet/blob/master/pcdet/models/backbones_3d/vfe/pillar_vfe.py)
        self.x_offset = self.voxel_x / 2 + self.pc_range[0]
        self.y_offset = self.voxel_y / 2 + self.pc_range[1]
        self.z_offset = self.voxel_z / 2 + self.pc_range[2]

    def get_output_feature_dim(self):
        # original C + 2 (PDV) + num_hist_bins (IVE)
        return self.num_point_features + 2 + self.num_hist_bins

    @torch.no_grad()
    def _compute_voxel_center(self, coords, dtype):
        # coords shape: (M, 4) = [batch_idx, z, y, x]
        x_center = coords[:, 3].to(dtype) * self.voxel_x + self.x_offset
        y_center = coords[:, 2].to(dtype) * self.voxel_y + self.y_offset
        z_center = coords[:, 1].to(dtype) * self.voxel_z + self.z_offset
        return x_center, y_center, z_center

    def forward(self, batch_dict, **kwargs):
        voxels = batch_dict['voxels']                      # (M, T, C)
        voxel_num_points = batch_dict['voxel_num_points']  # (M,)
        coords = batch_dict['voxel_coords']                # (M, 4)

        points_sum = voxels.sum(dim=1)                     # (M, C)
        normalizer = torch.clamp_min(voxel_num_points.view(-1, 1), 1.0).type_as(voxels)
        mean_feat = points_sum / normalizer                # (M, C)

        # ---------- PDV: density ----------
        density = voxel_num_points.type_as(voxels) / self.voxel_volume
        if self.density_log1p:
            density = torch.log1p(density)

        # ---------- PDV: range ----------
        x_center, y_center, z_center = self._compute_voxel_center(coords, voxels.dtype)
        if self.use_xy_range:
            rng = torch.sqrt(x_center * x_center + y_center * y_center)
        else:
            rng = torch.sqrt(x_center * x_center + y_center * y_center + z_center * z_center)
        if self.range_scale != 1.0:
            rng = rng / float(self.range_scale)

        # ---------- IVE: intensity histogram ----------
        # take the intensity channel, (M, T)
        intensity = voxels[:, :, self.intensity_idx]
        M, T = intensity.shape
        device = intensity.device

        # only count real points (positions beyond num_points are padding)
        idx = torch.arange(T, device=device).view(1, -1)
        valid_mask = idx < voxel_num_points.view(-1, 1)  # (M, T) bool

        # clamp to [hist_min, hist_max]
        intensity = torch.clamp(intensity, self.hist_min, self.hist_max)

        # K bin edges
        edges = torch.linspace(self.hist_min, self.hist_max, self.num_hist_bins + 1, device=device)
        denom = torch.clamp_min(voxel_num_points.type_as(voxels), 1.0)

        hist_list = []
        for k in range(self.num_hist_bins):
            lower = edges[k]
            upper = edges[k + 1] + (1e-6 if k == self.num_hist_bins - 1 else 0.0)
            bin_mask = valid_mask & (intensity >= lower) & (intensity < upper)
            # points in this bin / num_points, i.e. a frequency
            bin_count = bin_mask.float().sum(dim=1) / denom
            hist_list.append(bin_count.view(-1, 1))
        hist_feat = torch.cat(hist_list, dim=1)  # (M, K)

        # ---------- concatenate all features ----------
        voxel_features = torch.cat(
            [mean_feat, density.unsqueeze(1), rng.unsqueeze(1), hist_feat],
            dim=1
        ).contiguous()
        batch_dict['voxel_features'] = voxel_features
        return batch_dict
```

The results: mAR 67.25, mAP 54.592, FPS 89. Per-class 3D AP ([email protected], 0.70, 0.70): Drone 87.01, Plastic_sheet 45.64, Kite 15.88, Balloon 97.3, Bird 27.13. I now want to take the next improvement step. Which of the directions below do you recommend? The change must be at the module level, not just parameter fine-tuning.
Model comparison on my data:

| Model | mAR | mAP | FPS |
| --- | --- | --- | --- |
| SECOND | 63.49 | 44.36 | 106 |
| PV-RCNN | 72.74 | 55.82 | 10 |
| Part-A2 | 66.47 | 54.97 | 38 |
| PointRCNN | 81.81 | 74.73 | 19 |
| VoxelRCNN | 61.34 | 44.96 | 69 |
| VoxelNeXt | 53.09 | 40.81 | 62 |
| CenterPoint | 55.34 | 41.964 | 65 |

CenterPoint per-class 3D AP ([email protected], 0.70, 0.70): Drone 67.37, Plastic_sheet 16.45, Kite 7.51, Balloon 89.5, Bird 28.99. Please give new modules and ideas. My data is not temporal, so apart from time-series modules you may change anything.
I understand your requirements: single-stage, voxelized, OpenPCDet/CenterPoint baseline; a suspended-monorail scene whose targets are mostly small, sparse-point, and irregularly shaped (Bird/Kite/Plastic_sheet are clearly hard); you want the best combined mAP/mAR + FPS, plenty of Chinese and English papers and open-source code to reference, and the changes must be module-level rather than tweaking a few thresholds or the learning rate.
Your current version (CenterPoint + your PDVMeanVFE with the intensity histogram) is already strong: mAP 54.592, mAR 67.25, FPS 89, with high Drone/Balloon AP, but Kite 15.88, Bird 27.13 and Plastic_sheet 45.64 are the main weak spots. That distribution usually means: **the bottleneck is not whether objects are detected at all, but that small-object features are diluted by downsampling, the classes conflict inside a shared head, and the scores/ranking do not match the IoU=0.7 metric**.
Given your constraints (single-stage + voxel + no temporal data), my top recommendation for the next step is a module-level package: **CenterPoint-4x + a Tiny/Normal dual head + an IoU-aware quality head**. It aims squarely at your pain points (Bird/Kite/Plastic_sheet) and has abundant references and reproductions:
1. **Drop the BEV output stride from 8 to 4** (a BACKBONE_3D module rework)
   - Single-stage voxel networks that downsample too aggressively lose structural and localization information; this is one of the core problems identified by works like SA-SSD.
   - Your tiny classes (Bird/Kite) are extremely localization-sensitive at IoU=0.7, so a structural resolution gain is usually worth more than further VFE fine-tuning.
2. **Split the dense head into at least two task heads (Tiny vs Normal)**
   - All five classes currently share one head, so gradient and scale conflicts are severe (the tiny classes get dragged around by the large or easy ones). This is exactly the CBGS recipe: class-balanced sampling plus grouped multi-heads, and it very commonly lifts small classes. The sampling half is sketched right after this list.
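If you also want the data-side half of CBGS, a minimal sketch of class-balanced frame resampling could look like the following. This is a hypothetical standalone helper, not an existing OpenPCDet API; the `frame_infos` schema with a `gt_names` entry is an assumption:

```python
import random
from collections import defaultdict

def class_balanced_resample(frame_infos, class_names, seed=0):
    """CBGS-style duplication: oversample frames that contain rare classes so
    every class contributes roughly the same number of frames per epoch.
    frame_infos: list of dicts, each with a 'gt_names' list (hypothetical schema)."""
    rng = random.Random(seed)
    frames_per_class = defaultdict(list)
    for info in frame_infos:
        for name in set(info['gt_names']):
            if name in class_names:
                frames_per_class[name].append(info)
    # target: the size of the largest per-class bucket
    target = max(len(v) for v in frames_per_class.values())
    resampled = []
    for name in class_names:
        bucket = frames_per_class.get(name, [])
        if not bucket:
            continue
        resampled += bucket * (target // len(bucket))      # whole repeats
        resampled += rng.sample(bucket, target % len(bucket))  # remainder
    rng.shuffle(resampled)
    return resampled
```

Frames containing several rare classes get duplicated into multiple buckets, which is acceptable for a rough balance; the original CBGS paper does essentially this at the dataset-info level.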
3. **Add an IoU/quality branch for quality-aware re-scoring** (a head-module extension)
   - A lot of one-stage mAP comes from making the classification score reflect localization quality (quality estimation plus quality-focal ideas); GFL/QFL is the classic line of work.
   - For your case, strict IoU=0.7 with small boxes that drift easily, the gain is usually steady and the FPS cost is small.
4. (Optional) **Swap the BEV 2D backbone for a lightweight FPN/BiFPN** doing multi-scale fusion.
   - FPN/BiFPN papers and tutorials are everywhere, and the change is relatively modular; a toy fusion block is sketched right below.
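As a rough illustration of option 4, a minimal two-level BEV fusion could look like this. This is an assumption-laden toy module, not BaseBEVBackbone's actual interface; the channel sizes simply follow your current [128, 256] config:

```python
import torch
import torch.nn as nn

class TinyBEVFPN(nn.Module):
    """Toy top-down fusion of two BEV levels (strides 1 and 2 relative to the
    BEV input), mirroring the FPN idea at minimal cost."""
    def __init__(self, c_low=128, c_high=256, c_out=256):
        super().__init__()
        self.lat_low = nn.Conv2d(c_low, c_out, 1)     # lateral 1x1 on the fine level
        self.lat_high = nn.Conv2d(c_high, c_out, 1)   # lateral 1x1 on the coarse level
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.smooth = nn.Conv2d(c_out, c_out, 3, padding=1)

    def forward(self, feat_low, feat_high):
        # feat_low: (B, 128, H, W); feat_high: (B, 256, H/2, W/2)
        p = self.lat_low(feat_low) + self.up(self.lat_high(feat_high))
        return self.smooth(p)
```

BiFPN adds learned fusion weights and repeated top-down/bottom-up passes on top of this same skeleton.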
Below, everything is broken down as module changes, exactly as you asked (not threshold tweaking).

**Module change 1 — stride 8 to 4.** Goal: cut the sparse 3D backbone's total downsampling from 8x to 4x so the BEV features after HeightCompression are denser and the tiny classes keep more local structure.

How to do it (least engineering effort): in pcdet/models/backbones_3d/, copy the VoxelResBackBone8x implementation into a new VoxelResBackBone4x, and change one stride=2 downsample stage to stride=1 (or simply drop the last downsample) so the total stride becomes 4. The config entries you must change to match (the key ones):
```yaml
MODEL:
    BACKBONE_3D:
        NAME: VoxelResBackBone4x
    DENSE_HEAD:
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 4
```
**Why this is not "parameter tuning":** it changes the network's feature-map resolution and receptive field, a structural change whose effect on tiny classes is often an order of magnitude.

Intuition: with voxel_size = 0.1 m, one BEV cell covers 0.8 m at stride 8 but 0.4 m at stride 4. For Bird/Kite-sized boxes under a strict IoU=0.7 metric, a denser semantic grid usually improves recall and localization directly; the arithmetic is below.
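The arithmetic behind that claim, using your ranges (X = [0, 160] m, Y = [-20, 20] m, voxel 0.1 m); a quick check, nothing model-specific:

```python
# BEV grid arithmetic for voxel_size = 0.1 m over X=[0,160], Y=[-20,20]
x_extent, y_extent, voxel = 160.0, 40.0, 0.1
for stride in (8, 4):
    cell = voxel * stride                       # metres per BEV cell
    h, w = int(y_extent / cell), int(x_extent / cell)
    print(f"stride {stride}: cell {cell:.1f} m, BEV map HxW = {h}x{w}")
# stride 8: cell 0.8 m, BEV map HxW = 50x200
# stride 4: cell 0.4 m, BEV map HxW = 100x400
```

So stride 4 quadruples the number of BEV cells a small box can occupy, which is exactly what center-based heatmap supervision needs for Bird/Kite.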
**Module change 2 — Tiny/Normal dual head.** Your current config has:
```yaml
CLASS_NAMES_EACH_HEAD: [
    ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']
]
```
Suggested change:
```yaml
CLASS_NAMES_EACH_HEAD: [
    ['Bird', 'Kite'],                        # Tiny head
    ['Drone', 'Plastic_sheet', 'Balloon']    # Normal head
]
```
Why: the same head-conflict argument as above. Once Bird/Kite live in their own head, they stop competing with the large and easy classes inside a single heatmap.

A small code-module change I strongly recommend alongside this (well worth it): let every head take an independent TARGET_ASSIGNER_CONFIG (e.g. different MIN_RADIUS / GAUSSIAN_OVERLAP / NUM_MAX_OBJS) instead of one global config. The place to change is the target-generation logic in pcdet/models/dense_heads/center_head.py: replace the single self.model_cfg.TARGET_ASSIGNER_CONFIG with a list (TARGET_ASSIGNER_CONFIG_EACH_HEAD) indexed by head.

This matches the spirit of the CBGS paper: grouping plus multiple heads to ease class imbalance and shape differences.
**Module change 3 — IoU/quality branch.** Your current SEPARATE_HEAD_CFG is:
```yaml
HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
```
Extend it to:
```yaml
SEPARATE_HEAD_CFG:
    HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']   # leave 'iou' out of HEAD_ORDER
    HEAD_DICT: {
        ...
        'iou': {'out_channels': 1, 'num_conv': 2}
    }
```
How to supervise the iou branch during training: regress the actual IoU between each positive's decoded box and its ground-truth box (L1 or BCE both work); a minimal sketch follows.
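A minimal sketch of that supervision, assuming OpenPCDet's boxes_iou3d_gpu and the usual gather-by-index layout. The stock loss_utils.calculate_iou_loss_centerhead in recent OpenPCDet does essentially this, so treat the snippet as illustration rather than a replacement:

```python
import torch
import torch.nn.functional as F
from pcdet.ops.iou3d_nms.iou3d_nms_utils import boxes_iou3d_gpu

def iou_branch_loss(iou_pred_map, decoded_boxes, inds, mask, gt_boxes):
    """iou_pred_map: (B, 1, H, W); decoded_boxes: (B, H*W, 7) detached predictions;
    inds/mask: (B, K) gt-center indices and validity; gt_boxes: (B, K, 7+)."""
    B = iou_pred_map.shape[0]
    iou_pred = iou_pred_map.view(B, -1)                       # (B, H*W)
    losses = []
    for b in range(B):
        pos = mask[b].bool()
        if pos.sum() == 0:
            continue
        pred_b = decoded_boxes[b][inds[b][pos]]               # (P, 7) boxes at gt centers
        gt_b = gt_boxes[b][pos][:, :7]                        # (P, 7)
        target = boxes_iou3d_gpu(pred_b, gt_b).diagonal()     # paired 3D IoU targets
        # map targets from [0, 1] to [-1, 1] to match the (x + 1) * 0.5 decode convention
        losses.append(F.l1_loss(iou_pred[b][inds[b][pos]], target * 2 - 1))
    return torch.stack(losses).mean() if losses else iou_pred.sum() * 0.
```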
How to use it at inference: rank with score = heatmap_score^(1-α) · iou_pred^α (or sqrt(heatmap_score · iou_pred)) instead of the raw heatmap score, then run NMS. This step matters a lot for Bird/Kite/Plastic_sheet: they often produce predictions whose score looks fine but whose box is slightly off, dropping the IoU below 0.7. IoU-aware re-scoring automatically down-weights exactly those predictions, which stabilizes mAP. A one-line sketch:
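The re-scoring itself is one line; this sketch matches the rectifier form used later in generate_predicted_boxes:

```python
import torch

def rectify_scores(heatmap_scores, iou_pred, alpha):
    """score = hm^(1-alpha) * iou^alpha; alpha=0 keeps the pure heatmap score,
    alpha=1 trusts the IoU branch completely."""
    iou_pred = torch.clamp(iou_pred, min=0.0, max=1.0)
    return torch.pow(heatmap_scores, 1 - alpha) * torch.pow(iou_pred, alpha)

# e.g. hm=0.6 for a well-localized box (iou=0.8) vs a badly localized one (iou=0.3):
s = rectify_scores(torch.tensor([0.6, 0.6]), torch.tensor([0.8, 0.3]), alpha=0.7)
# the badly localized candidate drops well below the good one before NMS
```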
Your regression is currently pure L1 (center/dim/rot). On positive samples, add a BEV IoU loss (or borrow the GIoU/DIoU idea) so the training objective is closer to the IoU-based metric.

The key point: you do not compute IoU for all candidates, only for the positives, so the cost stays controlled; see the sketch below.
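For the positives-only BEV IoU term, here is an axis-aligned sketch. It ignores rotation for brevity (OpenPCDet's calculate_iou_reg_loss_centerhead uses a proper rotated DIoU); the point is the cost profile, IoU over a few dozen positives per frame:

```python
import torch

def axis_aligned_bev_iou_loss(pred_boxes, gt_boxes):
    """pred_boxes/gt_boxes: (P, 7) [x, y, z, dx, dy, dz, yaw], positives only.
    Loss = 1 - axis-aligned BEV IoU (rotation ignored in this sketch)."""
    px1 = pred_boxes[:, 0] - pred_boxes[:, 3] / 2
    px2 = pred_boxes[:, 0] + pred_boxes[:, 3] / 2
    py1 = pred_boxes[:, 1] - pred_boxes[:, 4] / 2
    py2 = pred_boxes[:, 1] + pred_boxes[:, 4] / 2
    gx1 = gt_boxes[:, 0] - gt_boxes[:, 3] / 2
    gx2 = gt_boxes[:, 0] + gt_boxes[:, 3] / 2
    gy1 = gt_boxes[:, 1] - gt_boxes[:, 4] / 2
    gy2 = gt_boxes[:, 1] + gt_boxes[:, 4] / 2
    iw = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(min=0)
    ih = (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(min=0)
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return (1 - inter / union.clamp(min=1e-6)).mean()
```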
If you first want the lowest-risk version (leaving the backbone essentially untouched): note that MULTI_CLASSES_NMS: False carries a latent risk, since Bird/Kite are easily suppressed by other classes' detections. This package barely affects FPS but often clearly improves small-class mAP.

If you are willing to add training-only auxiliary branches (disabled at inference), the SA-SSD style of auxiliary supervision is the reference, but that route costs more engineering; I suggest attempting it only after Plan A runs.

I recommend landing the first three items of Plan A first (best value for effort). Together they are the structural route to fixing the tiny classes, and these directions are easy to justify precisely because the papers, reproductions, blog posts and code for each are plentiful.

If you want me to ground Plan A in your exact cfg (not generalities), I can directly give you the VoxelResBackBone4x + dual-head + iou-head + per-head-assigner field structure.
Please walk me through Plan A step by step, with the code and config files.
Below is Plan A (CenterPoint + voxel + single-stage), step by step: code changes plus a config file, directly landable on OpenPCDet.

The goals: (1) XY stride 8 to 4 for small-object resolution; (2) train Bird/Kite in a separate head from the other classes to cut task conflict; (3) add an IoU/quality branch for re-scoring to lift mAP at IoU=0.7.
Important note: the center_head.py you pasted already natively supports an iou branch, USE_IOU_TO_RECTIFY_SCORE and IOU_REG_LOSS (the logic is right there in get_loss; the gitee version you pasted does contain calculate_iou_loss_centerhead and the USE_IOU_TO_RECTIFY_SCORE handling). So the IoU-branch step is essentially "add the head and flip the switches"; you do not need to write your own IoU loss, just fill in the config as shown below.
You currently have voxel_size = [0.1, 0.1, 0.2], X = 160 m, Y = 40 m.

One pitfall to watch: after changing the stride, if Z is not downsampled enough, HeightCompression's output channel count (C×D) grows and no longer matches the 2D backbone's input channels. So the design must keep C×D at 256. The safest way: change conv4's stride to (2, 1, 1), i.e. keep downsampling in Z only and stop downsampling in XY.
**Step 1 — add a new class in pcdet/models/backbones_3d/spconv_backbone.py.** Copy the existing VoxelResBackBone8x class, rename it to VoxelResBackBone4x, then apply the complete code below (or diff it against your current 8x class; only the three marked changes differ).

The class follows the usual OpenPCDet style and reuses post_act_block, SparseBasicBlock, spconv_utils.replace_feature and friends, which already live in that file.
```python
# ========== add to pcdet/models/backbones_3d/spconv_backbone.py ==========
from functools import partial

import torch
import torch.nn as nn

from ...utils.spconv_utils import replace_feature, spconv
# assumes post_act_block and SparseBasicBlock already exist in this file;
# if your VoxelResBackBone8x uses them, they are directly available


class VoxelResBackBone4x(nn.Module):
    """
    XY stride: 4x (for small objects)
    Z stride: kept similar to the 8x backbone so HeightCompression still yields ~256 channels
    """

    def __init__(self, model_cfg, input_channels, grid_size, voxel_size, point_cloud_range, **kwargs):
        super().__init__()
        self.model_cfg = model_cfg
        self.sparse_shape = grid_size[::-1] + [1, 0, 0]
        self.voxel_size = voxel_size
        self.point_cloud_range = point_cloud_range

        norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)
        block = post_act_block

        self.conv_input = spconv.SparseSequential(
            spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
            norm_fn(16),
            nn.ReLU(),
        )

        self.conv1 = spconv.SparseSequential(
            SparseBasicBlock(16, 16, norm_fn=norm_fn, indice_key='res1'),
            SparseBasicBlock(16, 16, norm_fn=norm_fn, indice_key='res1'),
        )

        # stride=2: downsample XY and Z once
        self.conv2 = spconv.SparseSequential(
            block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),
            SparseBasicBlock(32, 32, norm_fn=norm_fn, indice_key='res2'),
            SparseBasicBlock(32, 32, norm_fn=norm_fn, indice_key='res2'),
        )

        # stride=2: downsample XY and Z again (XY stride is now 4)
        self.conv3 = spconv.SparseSequential(
            block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),
            SparseBasicBlock(64, 64, norm_fn=norm_fn, indice_key='res3'),
            SparseBasicBlock(64, 64, norm_fn=norm_fn, indice_key='res3'),
        )

        # ========== key change 1: conv4 downsamples only in Z, not XY ==========
        # original 8x: stride=2 (Z, Y, X all downsampled) -> XY stride 8
        # now 4x: stride=(2, 1, 1) -> XY stride stays 4, Z keeps downsampling
        # padding (0, 1, 1) is recommended (many OpenPCDet configs do this to avoid Z padding)
        self.conv4 = spconv.SparseSequential(
            block(64, 128, 3, norm_fn=norm_fn, stride=(2, 1, 1), padding=(0, 1, 1),
                  indice_key='spconv4', conv_type='spconv'),
            SparseBasicBlock(128, 128, norm_fn=norm_fn, indice_key='res4'),
            SparseBasicBlock(128, 128, norm_fn=norm_fn, indice_key='res4'),
        )

        last_pad = self.model_cfg.get('last_pad', 0)
        # same idea as the 8x version: one more Z-only downsample squeezes D to ~2 so C*D is ~256
        self.conv_out = spconv.SparseSequential(
            spconv.SparseConv3d(128, 128, (3, 1, 1), stride=(2, 1, 1), padding=last_pad,
                                bias=False, indice_key='spconv_down2'),
            norm_fn(128),
            nn.ReLU(),
        )
        self.num_point_features = 128

    def forward(self, batch_dict):
        voxel_features, voxel_coords = batch_dict['voxel_features'], batch_dict['voxel_coords']
        batch_size = batch_dict['batch_size']
        input_sp_tensor = spconv.SparseConvTensor(
            features=voxel_features,
            indices=voxel_coords.int(),
            spatial_shape=self.sparse_shape,
            batch_size=batch_size
        )

        x = self.conv_input(input_sp_tensor)
        x_conv1 = self.conv1(x)
        x_conv2 = self.conv2(x_conv1)
        x_conv3 = self.conv3(x_conv2)
        x_conv4 = self.conv4(x_conv3)
        out = self.conv_out(x_conv4)

        # ========== key change 2: the encoded stride is now 4 ==========
        batch_dict['encoded_spconv_tensor'] = out
        batch_dict['encoded_spconv_tensor_stride'] = 4

        # multi-scale features for other modules (handy if you later add FPN / aux losses)
        batch_dict['multi_scale_3d_features'] = {
            'x_conv1': x_conv1,
            'x_conv2': x_conv2,
            'x_conv3': x_conv3,
            'x_conv4': x_conv4,
        }
        # ========== key change 3: x_conv4 stride is 4 ==========
        batch_dict['multi_scale_3d_strides'] = {
            'x_conv1': 1,
            'x_conv2': 2,
            'x_conv3': 4,
            'x_conv4': 4,
        }
        return batch_dict
```
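To see why this keeps HeightCompression's channel count at 256, here is the Z-dimension arithmetic under your Z range [-2, 6] m and voxel_z 0.2 m. It is a sanity sketch using the standard conv output-size formula, not part of the model:

```python
def conv_out_size(n, k, s, p):
    return (n + 2 * p - k) // s + 1

d = int((6 - (-2)) / 0.2) + 1        # sparse_shape adds +1 in Z -> 41
for name, (k, s, p) in [('conv2', (3, 2, 1)), ('conv3', (3, 2, 1)),
                        ('conv4', (3, 2, 0)), ('conv_out', (3, 2, 0))]:
    d = conv_out_size(d, k, s, p)
    print(name, '-> D =', d)
# conv2 -> D = 21, conv3 -> D = 11, conv4 -> D = 5, conv_out -> D = 2
# HeightCompression then yields C*D = 128 * 2 = 256 channels, matching NUM_BEV_FEATURES
```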
**Step 2 — register it.** File: pcdet/models/backbones_3d/__init__.py. Import VoxelResBackBone4x and add it to the __all__ dict.
```python
# previously something like:
# from .spconv_backbone import VoxelBackBone8x, VoxelResBackBone8x
from .spconv_backbone import VoxelBackBone8x, VoxelResBackBone8x, VoxelResBackBone4x

__all__ = {
    'VoxelBackBone8x': VoxelBackBone8x,
    'VoxelResBackBone8x': VoxelResBackBone8x,
    'VoxelResBackBone4x': VoxelResBackBone4x,   # new entry
    # ... keep the other entries unchanged
}
```
**Step 3 — per-head target assigner.** You want a Tiny/Normal dual head where the tiny head uses a different MIN_RADIUS/GAUSSIAN_OVERLAP, which means CenterHead must pick a per-head config inside assign_targets.

File: pcdet/models/dense_heads/center_head.py, function to modify: assign_targets. The current logic uses one config for every head:
```python
target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
...
feature_map_stride=target_assigner_cfg.FEATURE_MAP_STRIDE
gaussian_overlap=target_assigner_cfg.GAUSSIAN_OVERLAP
min_radius=target_assigner_cfg.MIN_RADIUS
```
We change it so that, if the cfg provides TARGET_ASSIGNER_CONFIG_EACH_HEAD (a list), it is indexed by head_idx; otherwise it falls back to the default. Paste in the patch below (only the assign_targets section needs to change):
```python
# ====== inside assign_targets, right after `target_assigner_cfg = ...` ======
target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
target_assigner_cfg_each_head = self.model_cfg.get('TARGET_ASSIGNER_CONFIG_EACH_HEAD', None)

def _cfg_get(cfg, key, default=None):
    if cfg is None:
        return default
    if isinstance(cfg, dict):
        return cfg.get(key, default)
    if hasattr(cfg, 'get'):
        return cfg.get(key, default)
    return getattr(cfg, key, default)

# ====== replace the original `for idx, cur_class_names ...` loop with ======
for head_idx, cur_class_names in enumerate(self.class_names_each_head):
    # each head's own assigner cfg (fall back to the default if none is given)
    cur_assigner_cfg = target_assigner_cfg_each_head[head_idx] \
        if target_assigner_cfg_each_head is not None else target_assigner_cfg
    feature_map_stride = _cfg_get(cur_assigner_cfg, 'FEATURE_MAP_STRIDE', target_assigner_cfg.FEATURE_MAP_STRIDE)
    num_max_objs = _cfg_get(cur_assigner_cfg, 'NUM_MAX_OBJS', target_assigner_cfg.NUM_MAX_OBJS)
    gaussian_overlap = _cfg_get(cur_assigner_cfg, 'GAUSSIAN_OVERLAP', target_assigner_cfg.GAUSSIAN_OVERLAP)
    min_radius = _cfg_get(cur_assigner_cfg, 'MIN_RADIUS', target_assigner_cfg.MIN_RADIUS)

    heatmap_list, target_boxes_list, inds_list, masks_list, target_boxes_src_list = [], [], [], [], []
    for bs_idx in range(batch_size):
        cur_gt_boxes = gt_boxes[bs_idx]
        gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]

        gt_boxes_single_head = []

        # note: use gt_idx to avoid shadowing the head index
        for gt_idx, name in enumerate(gt_class_names):
            if name not in cur_class_names:
                continue
            temp_box = cur_gt_boxes[gt_idx]
            temp_box[-1] = cur_class_names.index(name) + 1
            gt_boxes_single_head.append(temp_box[None, :])

        if len(gt_boxes_single_head) == 0:
            gt_boxes_single_head = cur_gt_boxes[:0, :]
        else:
            gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)

        heatmap, ret_boxes, inds, mask, ret_boxes_src = self.assign_target_of_single_head(
            num_classes=len(cur_class_names),
            gt_boxes=gt_boxes_single_head.cpu(),
            feature_map_size=feature_map_size,
            feature_map_stride=feature_map_stride,
            num_max_objs=num_max_objs,
            gaussian_overlap=gaussian_overlap,
            min_radius=min_radius,
        )
        heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
        target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
        inds_list.append(inds.to(gt_boxes_single_head.device))
        masks_list.append(mask.to(gt_boxes_single_head.device))
        target_boxes_src_list.append(ret_boxes_src.to(gt_boxes_single_head.device))

    ret_dict['heatmaps'].append(torch.stack(heatmap_list, dim=0))
    ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
    ret_dict['inds'].append(torch.stack(inds_list, dim=0))
    ret_dict['masks'].append(torch.stack(masks_list, dim=0))
    ret_dict['target_boxes_src'].append(torch.stack(target_boxes_src_list, dim=0))
```
The assign_targets in the center_head.py you pasted has exactly this structure (including the head loop that reuses idx internally).

This patch is a module-level capability upgrade; the cfg below only takes effect once it is in place.
**Step 4 — the config.** Below is a minimal-intrusion version of your existing config (your PDVMeanVFE, 2D backbone etc. are kept) that adds only Plan A's three pieces, including the iou branch. Save it as tools/cfgs/custom_models/centerpoint_pdv_4x_2head_iou.yaml (path is up to you):
```yaml
CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']

DATA_CONFIG:
    _BASE_CONFIG_: /media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/cfgs/dataset_configs/custom_dataset.yaml

MODEL:
    NAME: Test

    VFE:
        NAME: PDVMeanVFE
        USE_XY_RANGE: True
        DENSITY_LOG1P: True
        RANGE_SCALE: 160.0
        INTENSITY_IDX: 3
        NUM_HIST_BINS: 8
        HIST_MIN: 0.0
        HIST_MAX: 1.0

    BACKBONE_3D:
        NAME: VoxelResBackBone4x          # <<< changed (Step 1)

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256             # still 256 (the backbone is designed not to break this)

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False

        # <<< changed: dual head (Tiny / Normal)
        CLASS_NAMES_EACH_HEAD: [
            ['Bird', 'Kite'],
            ['Drone', 'Plastic_sheet', 'Balloon']
        ]

        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True
        NUM_HM_CONV: 2

        SEPARATE_HEAD_CFG:
            # note: do NOT put iou into HEAD_ORDER (it would break the regression dims and code_weights)
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
                # <<< new: IoU / quality branch (Step 4)
                'iou': {'out_channels': 1, 'num_conv': 2}
            }

        # <<< changed: stride 8 -> 4
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 4
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            DENSE_REG: 1

        # <<< new: per-head assigner (needs the Step 3 code)
        TARGET_ASSIGNER_CONFIG_EACH_HEAD:
            - FEATURE_MAP_STRIDE: 4
              NUM_MAX_OBJS: 500
              GAUSSIAN_OVERLAP: 0.35     # tiny: "sharper" supervision (smaller radius)
              MIN_RADIUS: 1
              DENSE_REG: 1
            - FEATURE_MAP_STRIDE: 4
              NUM_MAX_OBJS: 500
              GAUSSIAN_OVERLAP: 0.10     # normal: keep your original style
              MIN_RADIUS: 2
              DENSE_REG: 1

        # <<< optional but recommended: training-time IoU regression loss (no inference FPS cost)
        IOU_REG_LOSS: True

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

        POST_PROCESSING:
            SCORE_THRESH: 0.1
            POST_CENTER_LIMIT_RANGE: [-160, -75.2, -5.0, 160, 75.2, 8.0]
            MAX_OBJ_PER_SAMPLE: 500

            # <<< new: rectify scores with the predicted iou (quality-aware ranking)
            USE_IOU_TO_RECTIFY_SCORE: True
            # order matches CLASS_NAMES: Drone, Plastic_sheet, Kite, Balloon, Bird
            # larger alpha leans more on the iou (usually safer to set higher for tiny classes)
            IOU_RECTIFIER: [0.50, 0.60, 0.70, 0.30, 0.70]

            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
    OUTPUT_RAW_SCORE: False
    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
    GRAD_NORM_CLIP: 35
```
Sanity checklist (a one-off debug hook covering all of it follows the list):

- After HeightCompression.forward: batch_dict['spatial_features_stride'] should be 4, and the C in batch_dict['spatial_features'].shape should still be 256 (otherwise your Z downsampling is misaligned and the channel count will blow up).
- Confirm CenterHead's FEATURE_MAP_STRIDE=4 actually takes effect; otherwise you will see "all visualized positions shifted, mAP actually dropping".
- Confirm the iou branch really appears in the outputs: check that pred_dict contains an 'iou' key. If not, SEPARATE_HEAD_CFG.HEAD_DICT was not set correctly, or your OpenPCDet version lacks this logic (the version you pasted has it).
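A quick way to verify the shape and stride points in one place is a temporary assert right after MAP_TO_BEV runs. This is a hypothetical debug hook, e.g. dropped at the end of HeightCompression.forward while debugging and deleted afterwards:

```python
# one-off sanity check for the 4x backbone + HeightCompression combination
def check_bev_outputs(batch_dict, expected_stride=4, expected_channels=256):
    stride = batch_dict['spatial_features_stride']
    channels = batch_dict['spatial_features'].shape[1]
    assert stride == expected_stride, f'BEV stride is {stride}, expected {expected_stride}'
    assert channels == expected_channels, (
        f'HeightCompression produced C*D={channels}, expected {expected_channels}; '
        'the Z downsampling in the 4x backbone is not aligned'
    )
```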
First get Plan A running and collect the new mAP (in particular, watch whether Bird/Kite go up). If Bird/Kite are still clearly low, the next step should add exactly one module, leaving everything else untouched.

If you like, I can also give you an **ablation checklist (A0/A1/A2/A3)**: each step toggled independently, so you can quickly pinpoint whether stride=4, the dual head, or the iou re-score contributes the most.
The original center_head.py is as follows:

```python
import copy
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_

from ..model_utils import model_nms_utils
from ..model_utils import centernet_utils
from ...utils import loss_utils
from functools import partial


class SeparateHead(nn.Module):
    def __init__(self, input_channels, sep_head_dict, init_bias=-2.19, use_bias=False, norm_func=None):
        super().__init__()
        self.sep_head_dict = sep_head_dict

        for cur_name in self.sep_head_dict:
            output_channels = self.sep_head_dict[cur_name]['out_channels']
            num_conv = self.sep_head_dict[cur_name]['num_conv']

            fc_list = []
            for k in range(num_conv - 1):
                fc_list.append(nn.Sequential(
                    nn.Conv2d(input_channels, input_channels, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.BatchNorm2d(input_channels) if norm_func is None else norm_func(input_channels),
                    nn.ReLU()
                ))
            fc_list.append(nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=1, padding=1, bias=True))
            fc = nn.Sequential(*fc_list)
            if 'hm' in cur_name:
                fc[-1].bias.data.fill_(init_bias)
            else:
                for m in fc.modules():
                    if isinstance(m, nn.Conv2d):
                        kaiming_normal_(m.weight.data)
                        if hasattr(m, "bias") and m.bias is not None:
                            nn.init.constant_(m.bias, 0)

            self.__setattr__(cur_name, fc)

    def forward(self, x):
        ret_dict = {}
        for cur_name in self.sep_head_dict:
            ret_dict[cur_name] = self.__getattr__(cur_name)(x)
        return ret_dict


class CenterHead(nn.Module):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
                 predict_boxes_when_training=True):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_class = num_class
        self.grid_size = grid_size
        self.point_cloud_range = point_cloud_range
        self.voxel_size = voxel_size
        self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)

        self.class_names = class_names
        self.class_names_each_head = []
        self.class_id_mapping_each_head = []

        for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
            self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
            cur_class_id_mapping = torch.from_numpy(np.array(
                [self.class_names.index(x) for x in cur_class_names if x in class_names]
            )).cuda()
            self.class_id_mapping_each_head.append(cur_class_id_mapping)

        total_classes = sum([len(x) for x in self.class_names_each_head])
        assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'

        norm_func = partial(nn.BatchNorm2d, eps=self.model_cfg.get('BN_EPS', 1e-5),
                            momentum=self.model_cfg.get('BN_MOM', 0.1))

        self.shared_conv = nn.Sequential(
            nn.Conv2d(
                input_channels, self.model_cfg.SHARED_CONV_CHANNEL, 3, stride=1, padding=1,
                bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False)
            ),
            norm_func(self.model_cfg.SHARED_CONV_CHANNEL),
            nn.ReLU(),
        )

        self.heads_list = nn.ModuleList()
        self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
        for idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
            cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
            self.heads_list.append(
                SeparateHead(
                    input_channels=self.model_cfg.SHARED_CONV_CHANNEL,
                    sep_head_dict=cur_head_dict,
                    init_bias=-2.19,
                    use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
                    norm_func=norm_func
                )
            )
        self.predict_boxes_when_training = predict_boxes_when_training
        self.forward_ret_dict = {}
        self.build_losses()

    def build_losses(self):
        self.add_module('hm_loss_func', loss_utils.FocalLossCenterNet())
        self.add_module('reg_loss_func', loss_utils.RegLossCenterNet())

    def assign_target_of_single_head(
            self, num_classes, gt_boxes, feature_map_size, feature_map_stride, num_max_objs=500,
            gaussian_overlap=0.1, min_radius=2
    ):
        """
        Args:
            gt_boxes: (N, 8)
            feature_map_size: (2), [x, y]

        Returns:

        """
        heatmap = gt_boxes.new_zeros(num_classes, feature_map_size[1], feature_map_size[0])
        ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
        inds = gt_boxes.new_zeros(num_max_objs).long()
        mask = gt_boxes.new_zeros(num_max_objs).long()
        ret_boxes_src = gt_boxes.new_zeros(num_max_objs, gt_boxes.shape[-1])
        ret_boxes_src[:gt_boxes.shape[0]] = gt_boxes

        x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
        coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
        coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
        coord_x = torch.clamp(coord_x, min=0, max=feature_map_size[0] - 0.5)  # bugfixed: 1e-6 does not work for center.int()
        coord_y = torch.clamp(coord_y, min=0, max=feature_map_size[1] - 0.5)  #
        center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
        center_int = center.int()
        center_int_float = center_int.float()

        dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
        dx = dx / self.voxel_size[0] / feature_map_stride
        dy = dy / self.voxel_size[1] / feature_map_stride

        radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
        radius = torch.clamp_min(radius.int(), min=min_radius)

        for k in range(min(num_max_objs, gt_boxes.shape[0])):
            if dx[k] <= 0 or dy[k] <= 0:
                continue

            if not (0 <= center_int[k][0] <= feature_map_size[0] and 0 <= center_int[k][1] <= feature_map_size[1]):
                continue

            cur_class_id = (gt_boxes[k, -1] - 1).long()
            centernet_utils.draw_gaussian_to_heatmap(heatmap[cur_class_id], center[k], radius[k].item())

            inds[k] = center_int[k, 1] * feature_map_size[0] + center_int[k, 0]
            mask[k] = 1

            ret_boxes[k, 0:2] = center[k] - center_int_float[k].float()
            ret_boxes[k, 2] = z[k]
            ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
            ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
            ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
            if gt_boxes.shape[1] > 8:
                ret_boxes[k, 8:] = gt_boxes[k, 7:-1]

        return heatmap, ret_boxes, inds, mask, ret_boxes_src

    def assign_targets(self, gt_boxes, feature_map_size=None, **kwargs):
        """
        Args:
            gt_boxes: (B, M, 8)
            range_image_polar: (B, 3, H, W)
            feature_map_size: (2) [H, W]
            spatial_cartesian: (B, 4, H, W)
        Returns:

        """
        feature_map_size = feature_map_size[::-1]  # [H, W] ==> [x, y]
        target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
        # feature_map_size = self.grid_size[:2] // target_assigner_cfg.FEATURE_MAP_STRIDE

        batch_size = gt_boxes.shape[0]
        ret_dict = {
            'heatmaps': [],
            'target_boxes': [],
            'inds': [],
            'masks': [],
            'heatmap_masks': [],
            'target_boxes_src': [],
        }

        all_names = np.array(['bg', *self.class_names])
        for idx, cur_class_names in enumerate(self.class_names_each_head):
            heatmap_list, target_boxes_list, inds_list, masks_list, target_boxes_src_list = [], [], [], [], []
            for bs_idx in range(batch_size):
                cur_gt_boxes = gt_boxes[bs_idx]
                gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]

                gt_boxes_single_head = []

                for idx, name in enumerate(gt_class_names):
                    if name not in cur_class_names:
                        continue
                    temp_box = cur_gt_boxes[idx]
                    temp_box[-1] = cur_class_names.index(name) + 1
                    gt_boxes_single_head.append(temp_box[None, :])

                if len(gt_boxes_single_head) == 0:
                    gt_boxes_single_head = cur_gt_boxes[:0, :]
                else:
                    gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)

                heatmap, ret_boxes, inds, mask, ret_boxes_src = self.assign_target_of_single_head(
                    num_classes=len(cur_class_names), gt_boxes=gt_boxes_single_head.cpu(),
                    feature_map_size=feature_map_size, feature_map_stride=target_assigner_cfg.FEATURE_MAP_STRIDE,
                    num_max_objs=target_assigner_cfg.NUM_MAX_OBJS,
                    gaussian_overlap=target_assigner_cfg.GAUSSIAN_OVERLAP,
                    min_radius=target_assigner_cfg.MIN_RADIUS,
                )
                heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
                target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
                inds_list.append(inds.to(gt_boxes_single_head.device))
                masks_list.append(mask.to(gt_boxes_single_head.device))
                target_boxes_src_list.append(ret_boxes_src.to(gt_boxes_single_head.device))

            ret_dict['heatmaps'].append(torch.stack(heatmap_list, dim=0))
            ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
            ret_dict['inds'].append(torch.stack(inds_list, dim=0))
            ret_dict['masks'].append(torch.stack(masks_list, dim=0))
            ret_dict['target_boxes_src'].append(torch.stack(target_boxes_src_list, dim=0))
        return ret_dict

    def sigmoid(self, x):
        y = torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)
        return y

    def get_loss(self):
        pred_dicts = self.forward_ret_dict['pred_dicts']
        target_dicts = self.forward_ret_dict['target_dicts']

        tb_dict = {}
        loss = 0

        for idx, pred_dict in enumerate(pred_dicts):
            pred_dict['hm'] = self.sigmoid(pred_dict['hm'])
            hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][idx])
            hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']

            target_boxes = target_dicts['target_boxes'][idx]
            pred_boxes = torch.cat([pred_dict[head_name] for head_name in self.separate_head_cfg.HEAD_ORDER], dim=1)

            reg_loss = self.reg_loss_func(
                pred_boxes, target_dicts['masks'][idx], target_dicts['inds'][idx], target_boxes
            )
            loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
            loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']

            loss += hm_loss + loc_loss
            tb_dict['hm_loss_head_%d' % idx] = hm_loss.item()
            tb_dict['loc_loss_head_%d' % idx] = loc_loss.item()

            if 'iou' in pred_dict or self.model_cfg.get('IOU_REG_LOSS', False):
                batch_box_preds = centernet_utils.decode_bbox_from_pred_dicts(
                    pred_dict=pred_dict,
                    point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                    feature_map_stride=self.feature_map_stride
                )  # (B, H, W, 7 or 9)

                if 'iou' in pred_dict:
                    batch_box_preds_for_iou = batch_box_preds.permute(0, 3, 1, 2)  # (B, 7 or 9, H, W)

                    iou_loss = loss_utils.calculate_iou_loss_centerhead(
                        iou_preds=pred_dict['iou'],
                        batch_box_preds=batch_box_preds_for_iou.clone().detach(),
                        mask=target_dicts['masks'][idx],
                        ind=target_dicts['inds'][idx], gt_boxes=target_dicts['target_boxes_src'][idx]
                    )
                    loss += iou_loss
                    tb_dict['iou_loss_head_%d' % idx] = iou_loss.item()

                if self.model_cfg.get('IOU_REG_LOSS', False):
                    iou_reg_loss = loss_utils.calculate_iou_reg_loss_centerhead(
                        batch_box_preds=batch_box_preds_for_iou,
                        mask=target_dicts['masks'][idx],
                        ind=target_dicts['inds'][idx], gt_boxes=target_dicts['target_boxes_src'][idx]
                    )
                    if target_dicts['masks'][idx].sum().item() != 0:
                        iou_reg_loss = iou_reg_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
                        loss += iou_reg_loss
                        tb_dict['iou_reg_loss_head_%d' % idx] = iou_reg_loss.item()
                    else:
                        loss += (batch_box_preds_for_iou * 0.).sum()
                        tb_dict['iou_reg_loss_head_%d' % idx] = (batch_box_preds_for_iou * 0.).sum()

        tb_dict['rpn_loss'] = loss.item()
        return loss, tb_dict

    def generate_predicted_boxes(self, batch_size, pred_dicts):
        post_process_cfg = self.model_cfg.POST_PROCESSING
        post_center_limit_range = torch.tensor(post_process_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()

        ret_dict = [{
            'pred_boxes': [],
            'pred_scores': [],
            'pred_labels': [],
        } for k in range(batch_size)]
        for idx, pred_dict in enumerate(pred_dicts):
            batch_hm = pred_dict['hm'].sigmoid()
            batch_center = pred_dict['center']
            batch_center_z = pred_dict['center_z']
            batch_dim = pred_dict['dim'].exp()
            batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
            batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
            batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
            batch_iou = (pred_dict['iou'] + 1) * 0.5 if 'iou' in pred_dict else None

            final_pred_dicts = centernet_utils.decode_bbox_from_heatmap(
                heatmap=batch_hm, rot_cos=batch_rot_cos, rot_sin=batch_rot_sin,
                center=batch_center, center_z=batch_center_z, dim=batch_dim, vel=batch_vel, iou=batch_iou,
                point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                feature_map_stride=self.feature_map_stride,
                K=post_process_cfg.MAX_OBJ_PER_SAMPLE,
                circle_nms=(post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
                score_thresh=post_process_cfg.SCORE_THRESH,
                post_center_limit_range=post_center_limit_range
            )

            for k, final_dict in enumerate(final_pred_dicts):
                final_dict['pred_labels'] = self.class_id_mapping_each_head[idx][final_dict['pred_labels'].long()]
                if post_process_cfg.get('USE_IOU_TO_RECTIFY_SCORE', False) and 'pred_iou' in final_dict:
                    pred_iou = torch.clamp(final_dict['pred_iou'], min=0, max=1.0)
                    IOU_RECTIFIER = final_dict['pred_scores'].new_tensor(post_process_cfg.IOU_RECTIFIER)
                    final_dict['pred_scores'] = torch.pow(final_dict['pred_scores'], 1 - IOU_RECTIFIER[final_dict['pred_labels']]) * torch.pow(pred_iou, IOU_RECTIFIER[final_dict['pred_labels']])

                if post_process_cfg.NMS_CONFIG.NMS_TYPE not in ['circle_nms', 'class_specific_nms']:
                    selected, selected_scores = model_nms_utils.class_agnostic_nms(
                        box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
                        nms_config=post_process_cfg.NMS_CONFIG,
                        score_thresh=None
                    )
                elif post_process_cfg.NMS_CONFIG.NMS_TYPE == 'class_specific_nms':
                    selected, selected_scores = model_nms_utils.class_specific_nms(
                        box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
                        box_labels=final_dict['pred_labels'],
                        nms_config=post_process_cfg.NMS_CONFIG,
                        score_thresh=post_process_cfg.NMS_CONFIG.get('SCORE_THRESH', None)
                    )
                elif post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms':
                    raise NotImplementedError

                final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
                final_dict['pred_scores'] = selected_scores
                final_dict['pred_labels'] = final_dict['pred_labels'][selected]

                ret_dict[k]['pred_boxes'].append(final_dict['pred_boxes'])
                ret_dict[k]['pred_scores'].append(final_dict['pred_scores'])
                ret_dict[k]['pred_labels'].append(final_dict['pred_labels'])

        for k in range(batch_size):
            ret_dict[k]['pred_boxes'] = torch.cat(ret_dict[k]['pred_boxes'], dim=0)
            ret_dict[k]['pred_scores'] = torch.cat(ret_dict[k]['pred_scores'], dim=0)
            ret_dict[k]['pred_labels'] = torch.cat(ret_dict[k]['pred_labels'], dim=0) + 1

        return ret_dict

    @staticmethod
    def reorder_rois_for_refining(batch_size, pred_dicts):
        num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
        num_max_rois = max(1, num_max_rois)  # at least one faked rois to avoid error
        pred_boxes = pred_dicts[0]['pred_boxes']

        rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
        roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
        roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()

        for bs_idx in range(batch_size):
            num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])

            rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
            roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
            roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
        return rois, roi_scores, roi_labels

    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']
        x = self.shared_conv(spatial_features_2d)

        pred_dicts = []
        for head in self.heads_list:
            pred_dicts.append(head(x))

        if self.training:
            target_dict = self.assign_targets(
                data_dict['gt_boxes'], feature_map_size=spatial_features_2d.size()[2:],
                feature_map_stride=data_dict.get('spatial_features_2d_strides', None)
            )
            self.forward_ret_dict['target_dicts'] = target_dict

        self.forward_ret_dict['pred_dicts'] = pred_dicts

        if not self.training or self.predict_boxes_when_training:
            pred_dicts = self.generate_predicted_boxes(
                data_dict['batch_size'], pred_dicts
            )

            if self.predict_boxes_when_training:
                rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], pred_dicts)
                data_dict['rois'] = rois
                data_dict['roi_scores'] = roi_scores
                data_dict['roi_labels'] = roi_labels
                data_dict['has_class_labels'] = True
            else:
                data_dict['final_box_dicts'] = pred_dicts

        return data_dict
```
Please give me the corrected code, because the modified version you gave does not look right to me.
Got it. Your center_head.py does not match the patched version I gave earlier, which is why it looked wrong. Below is a drop-in modified version based on exactly the original code you pasted, changing only what is necessary:

- adds TARGET_ASSIGNER_CONFIG_EACH_HEAD (one assigner per head: stride / overlap / min_radius / max_objs)
- fixes the trap where the idx variable is shadowed (outer head idx vs inner gt idx)
- makes the IOU_REG_LOSS path robust: with IOU_REG_LOSS=True but no 'iou' in pred_dict, batch_box_preds_for_iou could be undefined (you will normally add the iou head, but this is safer)

Just replace the whole pcdet/models/dense_heads/center_head.py with the content below (or at least the matching function implementations).
center_head.py (aligned with the version you pasted):

```python
import copy
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_

from ..model_utils import model_nms_utils
from ..model_utils import centernet_utils
from ...utils import loss_utils
from functools import partial


class SeparateHead(nn.Module):
    def __init__(self, input_channels, sep_head_dict, init_bias=-2.19, use_bias=False, norm_func=None):
        super().__init__()
        self.sep_head_dict = sep_head_dict

        for cur_name in self.sep_head_dict:
            output_channels = self.sep_head_dict[cur_name]['out_channels']
            num_conv = self.sep_head_dict[cur_name]['num_conv']

            fc_list = []
            for k in range(num_conv - 1):
                fc_list.append(nn.Sequential(
                    nn.Conv2d(input_channels, input_channels, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.BatchNorm2d(input_channels) if norm_func is None else norm_func(input_channels),
                    nn.ReLU()
                ))
            fc_list.append(nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=1, padding=1, bias=True))
            fc = nn.Sequential(*fc_list)
            if 'hm' in cur_name:
                fc[-1].bias.data.fill_(init_bias)
            else:
                for m in fc.modules():
                    if isinstance(m, nn.Conv2d):
                        kaiming_normal_(m.weight.data)
                        if hasattr(m, "bias") and m.bias is not None:
                            nn.init.constant_(m.bias, 0)

            self.__setattr__(cur_name, fc)

    def forward(self, x):
        ret_dict = {}
        for cur_name in self.sep_head_dict:
            ret_dict[cur_name] = self.__getattr__(cur_name)(x)
        return ret_dict


class CenterHead(nn.Module):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
                 predict_boxes_when_training=True):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_class = num_class
        self.grid_size = grid_size
        self.point_cloud_range = point_cloud_range
        self.voxel_size = voxel_size
        self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)

        self.class_names = class_names
        self.class_names_each_head = []
        self.class_id_mapping_each_head = []

        for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
            self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
            cur_class_id_mapping = torch.from_numpy(np.array(
                [self.class_names.index(x) for x in cur_class_names if x in class_names]
            )).cuda()
            self.class_id_mapping_each_head.append(cur_class_id_mapping)

        total_classes = sum([len(x) for x in self.class_names_each_head])
        assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'

        norm_func = partial(
            nn.BatchNorm2d,
            eps=self.model_cfg.get('BN_EPS', 1e-5),
            momentum=self.model_cfg.get('BN_MOM', 0.1)
        )

        self.shared_conv = nn.Sequential(
            nn.Conv2d(
                input_channels, self.model_cfg.SHARED_CONV_CHANNEL, 3, stride=1, padding=1,
                bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False)
            ),
            norm_func(self.model_cfg.SHARED_CONV_CHANNEL),
            nn.ReLU(),
        )

        self.heads_list = nn.ModuleList()
        self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
            cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
            self.heads_list.append(
                SeparateHead(
                    input_channels=self.model_cfg.SHARED_CONV_CHANNEL,
                    sep_head_dict=cur_head_dict,
                    init_bias=-2.19,
                    use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
                    norm_func=norm_func
                )
            )
        self.predict_boxes_when_training = predict_boxes_when_training
        self.forward_ret_dict = {}
        self.build_losses()

    def build_losses(self):
        self.add_module('hm_loss_func', loss_utils.FocalLossCenterNet())
        self.add_module('reg_loss_func', loss_utils.RegLossCenterNet())

    def assign_target_of_single_head(
            self, num_classes, gt_boxes, feature_map_size, feature_map_stride, num_max_objs=500,
            gaussian_overlap=0.1, min_radius=2
    ):
        heatmap = gt_boxes.new_zeros(num_classes, feature_map_size[1], feature_map_size[0])
        ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
        inds = gt_boxes.new_zeros(num_max_objs).long()
        mask = gt_boxes.new_zeros(num_max_objs).long()
        ret_boxes_src = gt_boxes.new_zeros(num_max_objs, gt_boxes.shape[-1])
        ret_boxes_src[:gt_boxes.shape[0]] = gt_boxes

        x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
        coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
        coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
        coord_x = torch.clamp(coord_x, min=0, max=feature_map_size[0] - 0.5)
        coord_y = torch.clamp(coord_y, min=0, max=feature_map_size[1] - 0.5)
        center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
        center_int = center.int()
        center_int_float = center_int.float()

        dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
        dx = dx / self.voxel_size[0] / feature_map_stride
        dy = dy / self.voxel_size[1] / feature_map_stride

        radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
        radius = torch.clamp_min(radius.int(), min=min_radius)

        for k in range(min(num_max_objs, gt_boxes.shape[0])):
            if dx[k] <= 0 or dy[k] <= 0:
                continue

            if not (0 <= center_int[k][0] <= feature_map_size[0] and 0 <= center_int[k][1] <= feature_map_size[1]):
                continue

            cur_class_id = (gt_boxes[k, -1] - 1).long()
            centernet_utils.draw_gaussian_to_heatmap(heatmap[cur_class_id], center[k], radius[k].item())

            inds[k] = center_int[k, 1] * feature_map_size[0] + center_int[k, 0]
            mask[k] = 1

            ret_boxes[k, 0:2] = center[k] - center_int_float[k].float()
            ret_boxes[k, 2] = z[k]
            ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
            ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
            ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
            if gt_boxes.shape[1] > 8:
                ret_boxes[k, 8:] = gt_boxes[k, 7:-1]

        return heatmap, ret_boxes, inds, mask, ret_boxes_src

    # ===================== key change: assign_targets supports a per-head assigner =====================
    def assign_targets(self, gt_boxes, feature_map_size=None, **kwargs):
        feature_map_size = feature_map_size[::-1]  # [H, W] ==> [x, y]
        target_assigner_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG

        # new: per-head config (list of dict/EasyDict)
        target_assigner_cfg_each_head = self.model_cfg.get('TARGET_ASSIGNER_CONFIG_EACH_HEAD', None)
        if target_assigner_cfg_each_head is not None:
            assert len(target_assigner_cfg_each_head) == len(self.class_names_each_head), \
                f"len(TARGET_ASSIGNER_CONFIG_EACH_HEAD) ({len(target_assigner_cfg_each_head)}) " \
                f"must equal the number of heads ({len(self.class_names_each_head)})"

        # new: if a runtime feature_map_stride was passed in (your forward passes
        # spatial_features_2d_strides), prefer it
        runtime_stride = kwargs.get('feature_map_stride', None)
        if runtime_stride is not None:
            # some callers pass a list/tuple; only take an int here
            if isinstance(runtime_stride, (list, tuple)):
                runtime_stride = int(runtime_stride[0])
            else:
                runtime_stride = int(runtime_stride)

        def _get_cfg_value(cfg, key, default):
            # cfg can be a dict / EasyDict
            if cfg is None:
                return default
            if isinstance(cfg, dict):
                return cfg.get(key, default)
            # EasyDict / edict both support getattr
            return getattr(cfg, key, default)

        batch_size = gt_boxes.shape[0]
        ret_dict = {
            'heatmaps': [],
            'target_boxes': [],
            'inds': [],
            'masks': [],
            'heatmap_masks': [],
            'target_boxes_src': [],
        }

        all_names = np.array(['bg', *self.class_names])
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            # per-head cfg (fall back to the default if none is given)
            cur_cfg = target_assigner_cfg_each_head[head_idx] \
                if target_assigner_cfg_each_head is not None else target_assigner_cfg
            cur_feature_map_stride = _get_cfg_value(cur_cfg, 'FEATURE_MAP_STRIDE', target_assigner_cfg.FEATURE_MAP_STRIDE)
            cur_num_max_objs = _get_cfg_value(cur_cfg, 'NUM_MAX_OBJS', target_assigner_cfg.NUM_MAX_OBJS)
            cur_gaussian_overlap = _get_cfg_value(cur_cfg, 'GAUSSIAN_OVERLAP', target_assigner_cfg.GAUSSIAN_OVERLAP)
            cur_min_radius = _get_cfg_value(cur_cfg, 'MIN_RADIUS', target_assigner_cfg.MIN_RADIUS)

            # the runtime stride takes priority (avoids all-shifted targets when the
            # backbone is stride 4 but the cfg says otherwise)
            if runtime_stride is not None:
                cur_feature_map_stride = runtime_stride

            heatmap_list, target_boxes_list, inds_list, masks_list, target_boxes_src_list = [], [], [], [], []
            for bs_idx in range(batch_size):
                cur_gt_boxes = gt_boxes[bs_idx]
                gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]

                gt_boxes_single_head = []

                # fix: do not reuse idx (the original code shadowed the outer idx)
                for gt_idx, name in enumerate(gt_class_names):
                    if name not in cur_class_names:
                        continue
                    temp_box = cur_gt_boxes[gt_idx].clone()
                    temp_box[-1] = cur_class_names.index(name) + 1
                    gt_boxes_single_head.append(temp_box[None, :])

                if len(gt_boxes_single_head) == 0:
                    gt_boxes_single_head = cur_gt_boxes[:0, :]
                else:
                    gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)

                heatmap, ret_boxes, inds, mask, ret_boxes_src = self.assign_target_of_single_head(
                    num_classes=len(cur_class_names),
                    gt_boxes=gt_boxes_single_head.cpu(),
                    feature_map_size=feature_map_size,
                    feature_map_stride=cur_feature_map_stride,
                    num_max_objs=cur_num_max_objs,
                    gaussian_overlap=cur_gaussian_overlap,
                    min_radius=cur_min_radius,
                )
                heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
                target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
                inds_list.append(inds.to(gt_boxes_single_head.device))
                masks_list.append(mask.to(gt_boxes_single_head.device))
                target_boxes_src_list.append(ret_boxes_src.to(gt_boxes_single_head.device))

            ret_dict['heatmaps'].append(torch.stack(heatmap_list, dim=0))
            ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
            ret_dict['inds'].append(torch.stack(inds_list, dim=0))
            ret_dict['masks'].append(torch.stack(masks_list, dim=0))
            ret_dict['target_boxes_src'].append(torch.stack(target_boxes_src_list, dim=0))
        return ret_dict

    def sigmoid(self, x):
        y = torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)
        return y

    # ===================== small fix: batch_box_preds_for_iou always exists for iou_reg_loss =====================
    def get_loss(self):
        pred_dicts = self.forward_ret_dict['pred_dicts']
        target_dicts = self.forward_ret_dict['target_dicts']

        tb_dict = {}
        loss = 0

        for head_idx, pred_dict in enumerate(pred_dicts):
            pred_dict['hm'] = self.sigmoid(pred_dict['hm'])
            hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][head_idx])
            hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']

            target_boxes = target_dicts['target_boxes'][head_idx]
            pred_boxes = torch.cat([pred_dict[head_name] for head_name in self.separate_head_cfg.HEAD_ORDER], dim=1)

            reg_loss = self.reg_loss_func(
                pred_boxes, target_dicts['masks'][head_idx], target_dicts['inds'][head_idx], target_boxes
            )
            loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
            loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']

            loss += hm_loss + loc_loss
            tb_dict['hm_loss_head_%d' % head_idx] = hm_loss.item()
            tb_dict['loc_loss_head_%d' % head_idx] = loc_loss.item()

            if ('iou' in pred_dict) or self.model_cfg.get('IOU_REG_LOSS', False):
                batch_box_preds = centernet_utils.decode_bbox_from_pred_dicts(
                    pred_dict=pred_dict,
                    point_cloud_range=self.point_cloud_range,
                    voxel_size=self.voxel_size,
                    feature_map_stride=self.feature_map_stride
                )  # (B, H, W, 7 or 9)

                # define batch_box_preds_for_iou whether or not an iou head exists,
                # so it can never be referenced before assignment
                batch_box_preds_for_iou = batch_box_preds.permute(0, 3, 1, 2).contiguous()

                if 'iou' in pred_dict:
                    iou_loss = loss_utils.calculate_iou_loss_centerhead(
                        iou_preds=pred_dict['iou'],
                        batch_box_preds=batch_box_preds_for_iou.clone().detach(),
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    loss += iou_loss
                    tb_dict['iou_loss_head_%d' % head_idx] = iou_loss.item()

                if self.model_cfg.get('IOU_REG_LOSS', False):
                    iou_reg_loss = loss_utils.calculate_iou_reg_loss_centerhead(
                        batch_box_preds=batch_box_preds_for_iou,
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    if target_dicts['masks'][head_idx].sum().item() != 0:
                        iou_reg_loss = iou_reg_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
                        loss += iou_reg_loss
                        tb_dict['iou_reg_loss_head_%d' % head_idx] = iou_reg_loss.item()
                    else:
                        loss += (batch_box_preds_for_iou * 0.).sum()
                        tb_dict['iou_reg_loss_head_%d' % head_idx] = (batch_box_preds_for_iou * 0.).sum()

        tb_dict['rpn_loss'] = loss.item()
        return loss, tb_dict

    def generate_predicted_boxes(self, batch_size, pred_dicts):
        post_process_cfg = self.model_cfg.POST_PROCESSING
        post_center_limit_range = torch.tensor(post_process_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()

        ret_dict = [{
            'pred_boxes': [],
            'pred_scores': [],
            'pred_labels': [],
        } for k in range(batch_size)]
        for head_idx, pred_dict in enumerate(pred_dicts):
            batch_hm = pred_dict['hm'].sigmoid()
            batch_center = pred_dict['center']
            batch_center_z = pred_dict['center_z']
            batch_dim = pred_dict['dim'].exp()
            batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
            batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
            batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
            batch_iou = (pred_dict['iou'] + 1) * 0.5 if 'iou' in pred_dict else None

            final_pred_dicts = centernet_utils.decode_bbox_from_heatmap(
                heatmap=batch_hm, rot_cos=batch_rot_cos, rot_sin=batch_rot_sin,
                center=batch_center, center_z=batch_center_z, dim=batch_dim, vel=batch_vel, iou=batch_iou,
                point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                feature_map_stride=self.feature_map_stride,
                K=post_process_cfg.MAX_OBJ_PER_SAMPLE,
                circle_nms=(post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
                score_thresh=post_process_cfg.SCORE_THRESH,
                post_center_limit_range=post_center_limit_range
            )

            for k, final_dict in enumerate(final_pred_dicts):
                final_dict['pred_labels'] = self.class_id_mapping_each_head[head_idx][final_dict['pred_labels'].long()]
                if post_process_cfg.get('USE_IOU_TO_RECTIFY_SCORE', False) and 'pred_iou' in final_dict:
                    pred_iou = torch.clamp(final_dict['pred_iou'], min=0, max=1.0)
                    IOU_RECTIFIER = final_dict['pred_scores'].new_tensor(post_process_cfg.IOU_RECTIFIER)
                    final_dict['pred_scores'] = torch.pow(
                        final_dict['pred_scores'], 1 - IOU_RECTIFIER[final_dict['pred_labels']]
                    ) * torch.pow(
                        pred_iou, IOU_RECTIFIER[final_dict['pred_labels']]
                    )

                if post_process_cfg.NMS_CONFIG.NMS_TYPE not in ['circle_nms', 'class_specific_nms']:
                    selected, selected_scores = model_nms_utils.class_agnostic_nms(
                        box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
                        nms_config=post_process_cfg.NMS_CONFIG,
                        score_thresh=None
                    )
                elif post_process_cfg.NMS_CONFIG.NMS_TYPE == 'class_specific_nms':
                    selected, selected_scores = model_nms_utils.class_specific_nms(
                        box_scores=final_dict['pred_scores'], box_preds=final_dict['pred_boxes'],
                        box_labels=final_dict['pred_labels'],
                        nms_config=post_process_cfg.NMS_CONFIG,
                        score_thresh=post_process_cfg.NMS_CONFIG.get('SCORE_THRESH', None)
                    )
                elif post_process_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms':
                    raise NotImplementedError

                final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
                final_dict['pred_scores'] = selected_scores
                final_dict['pred_labels'] = final_dict['pred_labels'][selected]

                ret_dict[k]['pred_boxes'].append(final_dict['pred_boxes'])
                ret_dict[k]['pred_scores'].append(final_dict['pred_scores'])
                ret_dict[k]['pred_labels'].append(final_dict['pred_labels'])

        for k in range(batch_size):
            ret_dict[k]['pred_boxes'] = torch.cat(ret_dict[k]['pred_boxes'], dim=0)
            ret_dict[k]['pred_scores'] = torch.cat(ret_dict[k]['pred_scores'], dim=0)
            ret_dict[k]['pred_labels'] = torch.cat(ret_dict[k]['pred_labels'], dim=0) + 1

        return ret_dict

    @staticmethod
    def reorder_rois_for_refining(batch_size, pred_dicts):
        num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
        num_max_rois = max(1, num_max_rois)  # at least one faked rois to avoid error
        pred_boxes = pred_dicts[0]['pred_boxes']

        rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
        roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
        roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()

        for bs_idx in range(batch_size):
            num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])

            rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
            roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
            roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
        return rois, roi_scores, roi_labels

    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']
        x = self.shared_conv(spatial_features_2d)

        pred_dicts = []
        for head in self.heads_list:
            pred_dicts.append(head(x))

        if self.training:
            target_dict = self.assign_targets(
                data_dict['gt_boxes'], feature_map_size=spatial_features_2d.size()[2:],
                feature_map_stride=data_dict.get('spatial_features_2d_strides', None)
            )
            self.forward_ret_dict['target_dicts'] = target_dict

        self.forward_ret_dict['pred_dicts'] = pred_dicts

        if not self.training or self.predict_boxes_when_training:
            pred_dicts = self.generate_predicted_boxes(
                data_dict['batch_size'], pred_dicts
            )

            if self.predict_boxes_when_training:
                rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], pred_dicts)
                data_dict['rois'] = rois
                data_dict['roi_scores'] = roi_scores
                data_dict['roi_labels'] = roi_labels
                data_dict['has_class_labels'] = True
            else:
                data_dict['final_box_dicts'] = pred_dicts

        return data_dict
```
In your DENSE_HEAD, add:

```yaml
CLASS_NAMES_EACH_HEAD: [
    ['Bird', 'Kite'],
    ['Drone', 'Plastic_sheet', 'Balloon']
]

TARGET_ASSIGNER_CONFIG:
    FEATURE_MAP_STRIDE: 4
    NUM_MAX_OBJS: 500
    GAUSSIAN_OVERLAP: 0.1
    MIN_RADIUS: 2
    DENSE_REG: 1

TARGET_ASSIGNER_CONFIG_EACH_HEAD:
    - FEATURE_MAP_STRIDE: 4
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.35
      MIN_RADIUS: 1
    - FEATURE_MAP_STRIDE: 4
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.10
      MIN_RADIUS: 2
```
And add iou to your SEPARATE_HEAD_CFG.HEAD_DICT (this code supports it natively):

```yaml
SEPARATE_HEAD_CFG:
    HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
    HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2}
    }

IOU_REG_LOSS: True

POST_PROCESSING:
    USE_IOU_TO_RECTIFY_SCORE: True
    IOU_RECTIFIER: [0.50, 0.60, 0.70, 0.30, 0.70]
```
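Since 'iou' must stay out of HEAD_ORDER, a cheap consistency check helps. This hypothetical snippet mirrors how get_loss concatenates the regression heads:

```python
# the regression concat in get_loss covers exactly HEAD_ORDER, so the sum of
# those out_channels must equal len(code_weights) (8 in this config)
head_dict = {'center': 2, 'center_z': 1, 'dim': 3, 'rot': 2, 'iou': 1}
head_order = ['center', 'center_z', 'dim', 'rot']      # note: no 'iou'
code_weights = [1.0] * 8
assert sum(head_dict[h] for h in head_order) == len(code_weights)
```

If 'iou' slips into HEAD_ORDER, the concatenated regression tensor becomes 9 channels against 8 code_weights and training fails or silently misweights.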
Two pitfalls this setup avoids:

- Your backbone stride is now 4 but the cfg still says FEATURE_MAP_STRIDE: 8 → the heatmap centers no longer line up with the GT and mAP collapses.
- You add the per-head cfg but CenterHead does not understand a list-valued cfg → that is exactly why assign_targets is patched above.

If you paste your current VoxelResBackBone8x (or tell me which spconv backbone file/class you use), I can also write VoxelResBackBone4x to match your version exactly, avoiding shape errors from the stride, the Z compression, or the HeightCompression channel count.
Now the results are all zero. What do I do?
With this combination (CenterHead + a new iou head + stride=4), "all zeros" almost always comes from one of two causes, either of which can kill every predicted box:

1. A stride mismatch: decoded box centers land outside the scene and post_center_limit_range filters them, leaving not a single box.
2. The iou mapping: generate_predicted_boxes uses batch_iou = (pred_iou + 1) * 0.5, which implicitly assumes the iou output lies in [-1, 1]. The raw conv output can be far below -1, so the "IoU" goes negative, and many implementations then drop every candidate at decode/topK or the score threshold (with USE_IOU_TO_RECTIFY_SCORE enabled, a total wipe-out is even more likely).

Here is the most robust fix.

**Fix 1 — sync self.feature_map_stride at the top of forward()** from the runtime stride (spatial_features_2d_strides if present, otherwise the cfg). Insert the snippet below into forward(), immediately after:
```python
spatial_features_2d = data_dict['spatial_features_2d']
```

insert:
```python
# ---------- key: dynamically sync feature_map_stride so a 4/8 mismatch cannot filter out every box ----------
runtime_stride = data_dict.get('spatial_features_2d_strides', None)
if runtime_stride is None:
    runtime_stride = data_dict.get('spatial_features_stride', None)  # some versions use this key
if runtime_stride is not None:
    if isinstance(runtime_stride, (list, tuple)):
        runtime_stride = runtime_stride[0]
    if torch.is_tensor(runtime_stride):
        runtime_stride = runtime_stride.item()
    self.feature_map_stride = int(runtime_stride)
```
This step is critical: no matter what the cfg says, as long as the backbone/BEV output is stride 4, decoding will use 4, and nothing gets "filtered down to zero".
Fix 2: in generate_predicted_boxes(), change the linear iou mapping to a sigmoid (this only affects inference/decoding). Find your original line:
```python
batch_iou = (pred_dict['iou'] + 1) * 0.5 if 'iou' in pred_dict else None
```
Change it to:
```python
# The iou head output is unbounded; a plain linear mapping easily goes < 0,
# wiping out all candidates at the decode/threshold stage.
batch_iou = torch.sigmoid(pred_dict['iou']) if 'iou' in pred_dict else None
```
This one change immediately fixes the high-frequency "not a single box after adding the iou head" failure.
Fix 3: in assign_targets(), do not override with runtime_stride (otherwise a single wrong key skews every training target). The version I gave you earlier contains runtime_stride override logic; remove it for now and let it use the per-head stride from the cfg exclusively (you explicitly want 4).
In other words, delete or comment out this part of my earlier patch (if you already added it):
```python
runtime_stride = kwargs.get('feature_map_stride', None)
...
if runtime_stride is not None:
    cur_feature_map_stride = runtime_stride
```
Keep only:
```python
cur_feature_map_stride = _get_cfg_value(cur_cfg, 'FEATURE_MAP_STRIDE', target_assigner_cfg.FEATURE_MAP_STRIDE)
```
Make sure both ends agree. The cfg:

```yaml
DENSE_HEAD:
    TARGET_ASSIGNER_CONFIG:
        FEATURE_MAP_STRIDE: 4
```

and the backbone, which must record the stride it actually produced:

```python
batch_dict['encoded_spconv_tensor_stride'] = 4
```
Key names differ across OpenPCDet versions. Your CenterHead forward reads:
```python
data_dict.get('spatial_features_2d_strides', None)
```
If your version simply doesn't have that key, the fallback I added above kicks in:
```python
data_dict.get('spatial_features_stride', None)
```
Or just rely on the cfg value of 4.
In generate_predicted_boxes(), add this right before the NMS:
```python
if head_idx == 0:
    print('DEBUG head0 pre-nms num:', final_dict['pred_boxes'].shape[0],
          'score min/max:',
          final_dict['pred_scores'].min().item() if final_dict['pred_scores'].numel() else -1,
          final_dict['pred_scores'].max().item() if final_dict['pred_scores'].numel() else -1)
```
If the checkpoint you load was trained without the iou head and without the stride-4 backbone, then even if nothing errors out, the outputs can easily be of terrible quality or get wiped out entirely by the thresholds.

So at minimum, first:
1. Disable USE_IOU_TO_RECTIFY_SCORE and run validation once to check whether any boxes come out at all.
2. Apply the forward() stride sync and the generate_predicted_boxes() iou sigmoid change.
3. Set FEATURE_MAP_STRIDE: 4 and make sure the backbone writes encoded_spconv_tensor_stride = 4.

If it is still all zeros after that: paste the "missing/unexpected keys" messages printed while loading the ckpt, plus the output of the debug line above (especially pre-nms num and score min/max), and I can tell you directly whether it is the stride, the iou, or which post-processing step is emptying the candidates.
Here is the BEV code I'm currently using:

```python
import torch.nn as nn


class HeightCompression(nn.Module):
    def __init__(self, model_cfg, **kwargs):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES

    def forward(self, batch_dict):
        """
        Args:
            batch_dict:
                encoded_spconv_tensor: sparse tensor
        Returns:
            batch_dict:
                spatial_features
        """
        encoded_spconv_tensor = batch_dict['encoded_spconv_tensor']
        spatial_features = encoded_spconv_tensor.dense()
        N, C, D, H, W = spatial_features.shape
        spatial_features = spatial_features.view(N, C * D, H, W)
        batch_dict['spatial_features'] = spatial_features
        batch_dict['spatial_features_stride'] = batch_dict['encoded_spconv_tensor_stride']
        return batch_dict
```
Now I need the complete center_head.py from you; with all this back-and-forth editing my file is a mess. Please also build in the code for locating the problem.
Below is a complete, drop-in replacement for center_head.py (kept in the style of the original code you pasted), with the debug output for locating the problem built in, so you don't end up with a mangled file again.

This file does the following (all aimed at your current "all zeros" problem):
- Syncs data_dict['spatial_features_stride'] (your HeightCompression already writes it) into self.feature_map_stride, so a stride 4/8 mismatch can no longer filter out every decoded box.
- Maps the iou head through a sigmoid instead of (x+1)*0.5, which produced lots of negative/abnormal values and wiped out all candidates.
- Supports TARGET_ASSIGNER_CONFIG_EACH_HEAD: each head can use its own GAUSSIAN_OVERLAP / MIN_RADIUS / NUM_MAX_OBJS.
- Fixes the outer idx being shadowed by the inner-loop idx in assign_targets; not necessarily fatal, but a source of hidden bugs.

pcdet/models/dense_heads/center_head.py (full-file replacement):

```python
import os
import copy
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_
from functools import partial

from ..model_utils import model_nms_utils
from ..model_utils import centernet_utils
from ...utils import loss_utils


class SeparateHead(nn.Module):
    def __init__(self, input_channels, sep_head_dict, init_bias=-2.19, use_bias=False, norm_func=None):
        super().__init__()
        self.sep_head_dict = sep_head_dict

        for cur_name in self.sep_head_dict:
            output_channels = self.sep_head_dict[cur_name]['out_channels']
            num_conv = self.sep_head_dict[cur_name]['num_conv']

            fc_list = []
            for k in range(num_conv - 1):
                fc_list.append(nn.Sequential(
                    nn.Conv2d(input_channels, input_channels, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.BatchNorm2d(input_channels) if norm_func is None else norm_func(input_channels),
                    nn.ReLU()
                ))
            fc_list.append(nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=1, padding=1, bias=True))
            fc = nn.Sequential(*fc_list)
            if 'hm' in cur_name:
                fc[-1].bias.data.fill_(init_bias)
            else:
                for m in fc.modules():
                    if isinstance(m, nn.Conv2d):
                        kaiming_normal_(m.weight.data)
                        if hasattr(m, "bias") and m.bias is not None:
                            nn.init.constant_(m.bias, 0)

            self.__setattr__(cur_name, fc)

    def forward(self, x):
        ret_dict = {}
        for cur_name in self.sep_head_dict:
            ret_dict[cur_name] = self.__getattr__(cur_name)(x)
        return ret_dict


class CenterHead(nn.Module):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
                 predict_boxes_when_training=True):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_class = num_class
        self.grid_size = grid_size
        self.point_cloud_range = point_cloud_range
        self.voxel_size = voxel_size
        # Default stride comes from the cfg; forward() syncs it from data_dict['spatial_features_stride']
        self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)

        self.class_names = class_names
        self.class_names_each_head = []
        self.class_id_mapping_each_head = []

        # Debug switch: DEBUG_CENTERHEAD: True in the cfg, or env var PCDET_DEBUG_CENTERHEAD=1
        self.debug = bool(self.model_cfg.get('DEBUG_CENTERHEAD', False)) or \
            (os.environ.get('PCDET_DEBUG_CENTERHEAD', '0') == '1')
        self._debug_printed_once = False

        for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
            self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
            cur_class_id_mapping = torch.from_numpy(np.array(
                [self.class_names.index(x) for x in cur_class_names if x in class_names]
            )).cuda()
            self.class_id_mapping_each_head.append(cur_class_id_mapping)

        total_classes = sum([len(x) for x in self.class_names_each_head])
        assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'

        norm_func = partial(
            nn.BatchNorm2d, eps=self.model_cfg.get('BN_EPS', 1e-5), momentum=self.model_cfg.get('BN_MOM', 0.1)
        )

        self.shared_conv = nn.Sequential(
            nn.Conv2d(
                input_channels, self.model_cfg.SHARED_CONV_CHANNEL, 3, stride=1, padding=1,
                bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False)
            ),
            norm_func(self.model_cfg.SHARED_CONV_CHANNEL),
            nn.ReLU(),
        )

        self.heads_list = nn.ModuleList()
        self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
            cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
            self.heads_list.append(
                SeparateHead(
                    input_channels=self.model_cfg.SHARED_CONV_CHANNEL,
                    sep_head_dict=cur_head_dict,
                    init_bias=-2.19,
                    use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
                    norm_func=norm_func
                )
            )
        self.predict_boxes_when_training = predict_boxes_when_training
        self.forward_ret_dict = {}
        self.build_losses()

    def _dbg(self, msg: str):
        if not self.debug:
            return
        # This runs every forward; only the first pass prints the "structure info"
        if not self._debug_printed_once:
            print(f"[CenterHead-DEBUG] {msg}")
        else:
            # Moderate printing during training is fine; comment this out if it gets too noisy
            print(f"[CenterHead-DEBUG] {msg}")

    def build_losses(self):
        from ...utils import loss_utils as loss_utils_pkg
        self.add_module('hm_loss_func', loss_utils_pkg.FocalLossCenterNet())
        self.add_module('reg_loss_func', loss_utils_pkg.RegLossCenterNet())

    def assign_target_of_single_head(
            self, num_classes, gt_boxes, feature_map_size, feature_map_stride,
            num_max_objs=500, gaussian_overlap=0.1, min_radius=2
    ):
        heatmap = gt_boxes.new_zeros(num_classes, feature_map_size[1], feature_map_size[0])
        ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
        inds = gt_boxes.new_zeros(num_max_objs).long()
        mask = gt_boxes.new_zeros(num_max_objs).long()
        ret_boxes_src = gt_boxes.new_zeros(num_max_objs, gt_boxes.shape[-1])
        ret_boxes_src[:gt_boxes.shape[0]] = gt_boxes

        x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
        coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
        coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
        coord_x = torch.clamp(coord_x, min=0, max=feature_map_size[0] - 0.5)
        coord_y = torch.clamp(coord_y, min=0, max=feature_map_size[1] - 0.5)
        center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
        center_int = center.int()
        center_int_float = center_int.float()

        dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
        dx = dx / self.voxel_size[0] / feature_map_stride
        dy = dy / self.voxel_size[1] / feature_map_stride

        radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
        radius = torch.clamp_min(radius.int(), min=min_radius)

        for k in range(min(num_max_objs, gt_boxes.shape[0])):
            if dx[k] <= 0 or dy[k] <= 0:
                continue
            if not (0 <= center_int[k][0] <= feature_map_size[0] and 0 <= center_int[k][1] <= feature_map_size[1]):
                continue
            cur_class_id = (gt_boxes[k, -1] - 1).long()
            centernet_utils.draw_gaussian_to_heatmap(heatmap[cur_class_id], center[k], radius[k].item())

            inds[k] = center_int[k, 1] * feature_map_size[0] + center_int[k, 0]
            mask[k] = 1

            ret_boxes[k, 0:2] = center[k] - center_int_float[k].float()
            ret_boxes[k, 2] = z[k]
            ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
            ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
            ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
            if gt_boxes.shape[1] > 8:
                ret_boxes[k, 8:] = gt_boxes[k, 7:-1]

        return heatmap, ret_boxes, inds, mask, ret_boxes_src

    def assign_targets(self, gt_boxes, feature_map_size=None, **kwargs):
        """
        Supports:
        - TARGET_ASSIGNER_CONFIG (global default)
        - TARGET_ASSIGNER_CONFIG_EACH_HEAD (one cfg per head)
        """
        feature_map_size = feature_map_size[::-1]  # [H, W] ==> [x, y]
        default_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
        each_head_cfg = self.model_cfg.get('TARGET_ASSIGNER_CONFIG_EACH_HEAD', None)
        if each_head_cfg is not None:
            assert len(each_head_cfg) == len(self.class_names_each_head), \
                f"TARGET_ASSIGNER_CONFIG_EACH_HEAD length {len(each_head_cfg)} != num_heads {len(self.class_names_each_head)}"

        def _get(cfg, key, default_val):
            if cfg is None:
                return default_val
            if isinstance(cfg, dict):
                return cfg.get(key, default_val)
            return getattr(cfg, key, default_val)

        batch_size = gt_boxes.shape[0]
        ret_dict = {
            'heatmaps': [],
            'target_boxes': [],
            'inds': [],
            'masks': [],
            'heatmap_masks': [],
            'target_boxes_src': [],
        }

        all_names = np.array(['bg', *self.class_names])
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_cfg = each_head_cfg[head_idx] if each_head_cfg is not None else default_cfg
            cur_feature_map_stride = _get(cur_cfg, 'FEATURE_MAP_STRIDE', default_cfg.FEATURE_MAP_STRIDE)
            cur_num_max_objs = _get(cur_cfg, 'NUM_MAX_OBJS', default_cfg.NUM_MAX_OBJS)
            cur_gaussian_overlap = _get(cur_cfg, 'GAUSSIAN_OVERLAP', default_cfg.GAUSSIAN_OVERLAP)
            cur_min_radius = _get(cur_cfg, 'MIN_RADIUS', default_cfg.MIN_RADIUS)

            heatmap_list, target_boxes_list, inds_list, masks_list, target_boxes_src_list = [], [], [], [], []
            for bs_idx in range(batch_size):
                cur_gt_boxes = gt_boxes[bs_idx]
                gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]

                gt_boxes_single_head = []
                for gt_idx, name in enumerate(gt_class_names):  # fix: do not reuse the outer idx
                    if name not in cur_class_names:
                        continue
                    temp_box = cur_gt_boxes[gt_idx].clone()
                    temp_box[-1] = cur_class_names.index(name) + 1
                    gt_boxes_single_head.append(temp_box[None, :])

                if len(gt_boxes_single_head) == 0:
                    gt_boxes_single_head = cur_gt_boxes[:0, :]
                else:
                    gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)

                heatmap, ret_boxes, inds, mask, ret_boxes_src = self.assign_target_of_single_head(
                    num_classes=len(cur_class_names),
                    gt_boxes=gt_boxes_single_head.cpu(),
                    feature_map_size=feature_map_size,
                    feature_map_stride=cur_feature_map_stride,
                    num_max_objs=cur_num_max_objs,
                    gaussian_overlap=cur_gaussian_overlap,
                    min_radius=cur_min_radius,
                )
                heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
                target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
                inds_list.append(inds.to(gt_boxes_single_head.device))
                masks_list.append(mask.to(gt_boxes_single_head.device))
                target_boxes_src_list.append(ret_boxes_src.to(gt_boxes_single_head.device))

            ret_dict['heatmaps'].append(torch.stack(heatmap_list, dim=0))
            ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
            ret_dict['inds'].append(torch.stack(inds_list, dim=0))
            ret_dict['masks'].append(torch.stack(masks_list, dim=0))
            ret_dict['target_boxes_src'].append(torch.stack(target_boxes_src_list, dim=0))

        return ret_dict

    def sigmoid(self, x):
        return torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)

    def get_loss(self):
        pred_dicts = self.forward_ret_dict['pred_dicts']
        target_dicts = self.forward_ret_dict['target_dicts']

        tb_dict = {}
        loss = 0
        for head_idx, pred_dict in enumerate(pred_dicts):
            pred_dict['hm'] = self.sigmoid(pred_dict['hm'])
            hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][head_idx])
            hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']

            target_boxes = target_dicts['target_boxes'][head_idx]
            pred_boxes = torch.cat([pred_dict[name] for name in self.separate_head_cfg.HEAD_ORDER], dim=1)

            reg_loss = self.reg_loss_func(
                pred_boxes, target_dicts['masks'][head_idx], target_dicts['inds'][head_idx], target_boxes
            )
            loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
            loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']

            loss += hm_loss + loc_loss
            tb_dict[f'hm_loss_head_{head_idx}'] = hm_loss.item()
            tb_dict[f'loc_loss_head_{head_idx}'] = loc_loss.item()

            if ('iou' in pred_dict) or self.model_cfg.get('IOU_REG_LOSS', False):
                batch_box_preds = centernet_utils.decode_bbox_from_pred_dicts(
                    pred_dict=pred_dict,
                    point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                    feature_map_stride=self.feature_map_stride
                )
                batch_box_preds_for_iou = batch_box_preds.permute(0, 3, 1, 2).contiguous()

                if 'iou' in pred_dict:
                    iou_loss = loss_utils.calculate_iou_loss_centerhead(
                        iou_preds=pred_dict['iou'],
                        batch_box_preds=batch_box_preds_for_iou.clone().detach(),
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    loss += iou_loss
                    tb_dict[f'iou_loss_head_{head_idx}'] = iou_loss.item()

                if self.model_cfg.get('IOU_REG_LOSS', False):
                    iou_reg_loss = loss_utils.calculate_iou_reg_loss_centerhead(
                        batch_box_preds=batch_box_preds_for_iou,
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    if target_dicts['masks'][head_idx].sum().item() != 0:
                        iou_reg_loss = iou_reg_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
                        loss += iou_reg_loss
                        tb_dict[f'iou_reg_loss_head_{head_idx}'] = iou_reg_loss.item()
                    else:
                        loss += (batch_box_preds_for_iou * 0.).sum()
                        tb_dict[f'iou_reg_loss_head_{head_idx}'] = (batch_box_preds_for_iou * 0.).sum()

        tb_dict['rpn_loss'] = loss.item()
        return loss, tb_dict

    def generate_predicted_boxes(self, batch_size, pred_dicts):
        post_cfg = self.model_cfg.POST_PROCESSING
        post_center_limit_range = torch.tensor(post_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()

        ret_dict = [{
            'pred_boxes': [],
            'pred_scores': [],
            'pred_labels': [],
        } for _ in range(batch_size)]

        for head_idx, pred_dict in enumerate(pred_dicts):
            batch_hm = pred_dict['hm'].sigmoid()
            batch_center = pred_dict['center']
            batch_center_z = pred_dict['center_z']
            batch_dim = pred_dict['dim'].exp()
            batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
            batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
            batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
            # Key fix: use sigmoid for the iou head; (x+1)*0.5 produced lots of
            # negative/abnormal values and wiped out all candidates
            batch_iou = torch.sigmoid(pred_dict['iou']) if 'iou' in pred_dict else None

            final_pred_dicts = centernet_utils.decode_bbox_from_heatmap(
                heatmap=batch_hm, rot_cos=batch_rot_cos, rot_sin=batch_rot_sin,
                center=batch_center, center_z=batch_center_z, dim=batch_dim, vel=batch_vel, iou=batch_iou,
                point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                feature_map_stride=self.feature_map_stride,
                K=post_cfg.MAX_OBJ_PER_SAMPLE,
                circle_nms=(post_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
                score_thresh=post_cfg.SCORE_THRESH,
                post_center_limit_range=post_center_limit_range
            )

            # Debug: check whether decoding already produces 0 boxes
            if self.debug and (not self._debug_printed_once):
                try:
                    hm_max = float(batch_hm.max().item()) if batch_hm.numel() else -1
                    self._dbg(f"stride={self.feature_map_stride}, hm_max={hm_max}, "
                              f"pred_has_iou={'iou' in pred_dict}, "
                              f"spatial K={post_cfg.MAX_OBJ_PER_SAMPLE}, score_thresh={post_cfg.SCORE_THRESH}")
                except Exception:
                    pass

            for bs_i, final_dict in enumerate(final_pred_dicts):
                final_dict['pred_labels'] = self.class_id_mapping_each_head[head_idx][final_dict['pred_labels'].long()]

                # Quality-aware rescoring
                if post_cfg.get('USE_IOU_TO_RECTIFY_SCORE', False) and 'pred_iou' in final_dict:
                    pred_iou = torch.clamp(final_dict['pred_iou'], min=0.0, max=1.0)
                    IOU_RECTIFIER = final_dict['pred_scores'].new_tensor(post_cfg.IOU_RECTIFIER)
                    final_dict['pred_scores'] = torch.pow(
                        final_dict['pred_scores'], 1 - IOU_RECTIFIER[final_dict['pred_labels']]
                    ) * torch.pow(
                        pred_iou, IOU_RECTIFIER[final_dict['pred_labels']]
                    )

                # Debug: candidate count / score range after decoding
                if self.debug:
                    n_pre = int(final_dict['pred_boxes'].shape[0])
                    if n_pre == 0:
                        self._dbg(f"[head{head_idx} bs{bs_i}] decode=0 "
                                  f"(likely stride / thresholds / center_limit_range filtering)")
                    else:
                        smin = float(final_dict['pred_scores'].min().item())
                        smax = float(final_dict['pred_scores'].max().item())
                        self._dbg(f"[head{head_idx} bs{bs_i}] decode={n_pre}, "
                                  f"score[min,max]=[{smin:.4f},{smax:.4f}]")

                # NMS
                if post_cfg.NMS_CONFIG.NMS_TYPE not in ['circle_nms', 'class_specific_nms']:
                    selected, selected_scores = model_nms_utils.class_agnostic_nms(
                        box_scores=final_dict['pred_scores'],
                        box_preds=final_dict['pred_boxes'],
                        nms_config=post_cfg.NMS_CONFIG,
                        score_thresh=None
                    )
                elif post_cfg.NMS_CONFIG.NMS_TYPE == 'class_specific_nms':
                    selected, selected_scores = model_nms_utils.class_specific_nms(
                        box_scores=final_dict['pred_scores'],
                        box_preds=final_dict['pred_boxes'],
                        box_labels=final_dict['pred_labels'],
                        nms_config=post_cfg.NMS_CONFIG,
                        score_thresh=post_cfg.NMS_CONFIG.get('SCORE_THRESH', None)
                    )
                else:
                    raise NotImplementedError

                final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
                final_dict['pred_scores'] = selected_scores
                final_dict['pred_labels'] = final_dict['pred_labels'][selected]

                ret_dict[bs_i]['pred_boxes'].append(final_dict['pred_boxes'])
                ret_dict[bs_i]['pred_scores'].append(final_dict['pred_scores'])
                ret_dict[bs_i]['pred_labels'].append(final_dict['pred_labels'])

        for bs_i in range(batch_size):
            ret_dict[bs_i]['pred_boxes'] = torch.cat(ret_dict[bs_i]['pred_boxes'], dim=0) \
                if len(ret_dict[bs_i]['pred_boxes']) else torch.empty((0, 7), device='cuda')
            ret_dict[bs_i]['pred_scores'] = torch.cat(ret_dict[bs_i]['pred_scores'], dim=0) \
                if len(ret_dict[bs_i]['pred_scores']) else torch.empty((0,), device='cuda')
            ret_dict[bs_i]['pred_labels'] = torch.cat(ret_dict[bs_i]['pred_labels'], dim=0).long() + 1 \
                if len(ret_dict[bs_i]['pred_labels']) else torch.empty((0,), dtype=torch.long, device='cuda')

        # Print the "structure info" only once
        self._debug_printed_once = True
        return ret_dict

    @staticmethod
    def reorder_rois_for_refining(batch_size, pred_dicts):
        num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
        num_max_rois = max(1, num_max_rois)
        pred_boxes = pred_dicts[0]['pred_boxes']

        rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
        roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
        roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()

        for bs_idx in range(batch_size):
            num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])
            rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
            roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
            roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
        return rois, roi_scores, roi_labels

    def forward(self, data_dict):
        # Key: stay aligned with your HeightCompression (it writes spatial_features_stride)
        runtime_stride = data_dict.get('spatial_features_stride', None)
        if runtime_stride is not None:
            if torch.is_tensor(runtime_stride):
                runtime_stride = runtime_stride.item()
            self.feature_map_stride = int(runtime_stride)

        spatial_features_2d = data_dict['spatial_features_2d']
        if self.debug and (not self._debug_printed_once):
            self._dbg(f"spatial_features_2d shape={tuple(spatial_features_2d.shape)}, "
                      f"runtime_stride={self.feature_map_stride}")

        x = self.shared_conv(spatial_features_2d)

        pred_dicts = []
        for head in self.heads_list:
            pred_dicts.append(head(x))

        if self.training:
            target_dict = self.assign_targets(
                data_dict['gt_boxes'],
                feature_map_size=spatial_features_2d.size()[2:],
            )
            self.forward_ret_dict['target_dicts'] = target_dict

        self.forward_ret_dict['pred_dicts'] = pred_dicts

        if not self.training or self.predict_boxes_when_training:
            final_box_dicts = self.generate_predicted_boxes(
                data_dict['batch_size'], pred_dicts
            )
            if self.predict_boxes_when_training:
                rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], final_box_dicts)
                data_dict['rois'] = rois
                data_dict['roi_scores'] = roi_scores
                data_dict['roi_labels'] = roi_labels
                data_dict['has_class_labels'] = True
            else:
                data_dict['final_box_dicts'] = final_box_dicts

        return data_dict
```
About spatial_features_2d: the HeightCompression you gave me outputs
- batch_dict['spatial_features']
- batch_dict['spatial_features_stride']

and the BACKBONE_2D then normally turns spatial_features into spatial_features_2d.

If some backbone change of yours stopped spatial_features_2d from being produced, CenterHead would crash outright. Since you currently get "all zeros" rather than a crash, this chain is basically intact.
Add this to your DENSE_HEAD:
```yaml
DEBUG_CENTERHEAD: True
```

or set the environment variable:

```bash
export PCDET_DEBUG_CENTERHEAD=1
```
Once enabled, you'll see output like this. How to read it:
- runtime_stride=4 (if this isn't 4, your stride sync didn't take effect)
- hm_max=... (if hm_max is tiny, i.e. near 0, training or weight loading is broken)
- [head0 bs0] decode=0 (decoding already produces no boxes; most often stride / thresholds / range filtering)
- decode=xxx score[min,max]=[...] (decoding has boxes but the final output is 0, so NMS / score rectify is clearing them)

The highest-frequency causes and fixes:
1. FEATURE_MAP_STRIDE is still 8
   → the forced sync from spatial_features_stride rescues 90% of these cases.
2. The (x+1)*0.5 iou mapping produces anomalies
   → the sigmoid basically kills the "all boxes gone after adding iou" problem in one stroke.
3. If needed, temporarily turn off the rectify step:

```yaml
USE_IOU_TO_RECTIFY_SCORE: False
```
If you paste the first ~20 lines of the debug output (especially runtime_stride, hm_max, and whether decode is 0 or not), I can tell you exactly which step is wiping the candidates to 0.
My code now gives good results: 84.01, 74.722, 49, 93.94, 36.97, 58.93, 99.91, 83.86. My config file is as follows:

```yaml
#CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']
#DATA_CONFIG:
#MODEL:
#OPTIMIZATION:
CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']

DATA_CONFIG:
    BASE_CONFIG: /media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/cfgs/dataset_configs/custom_dataset.yaml

MODEL:
    NAME: Cifu

    VFE:
        NAME: PDVMeanVFE
        USE_XY_RANGE: True
        DENSITY_LOG1P: True
        RANGE_SCALE: 160.0
        INTENSITY_IDX: 3
        NUM_HIST_BINS: 10
        HIST_MIN: 0.0
        HIST_MAX: 1.0

    BACKBONE_3D:
        NAME: VoxelResBackBone4x    # <<< changed here (Step 1)

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256       # still 256 (the backbone was designed not to break this)

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHeads
        CLASS_AGNOSTIC: False

        # <<< changed here: dual heads (Tiny / Normal)
        CLASS_NAMES_EACH_HEAD: [
            ['Bird', 'Kite'],
            ['Drone', 'Plastic_sheet', 'Balloon']
        ]

        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True
        NUM_HM_CONV: 2

        SEPARATE_HEAD_CFG:
            # Note: do NOT put iou into HEAD_ORDER (it would break the regression dims and code_weights)
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
                # <<< new: IoU / quality branch (Step 4)
                'iou': {'out_channels': 1, 'num_conv': 2}
            }

        # <<< changed here: stride 8 -> 4
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 4
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            DENSE_REG: 1

        # <<< new: per-head assigner (requires the Step 3 code)
        TARGET_ASSIGNER_CONFIG_EACH_HEAD:
            - FEATURE_MAP_STRIDE: 4
              NUM_MAX_OBJS: 500
              GAUSSIAN_OVERLAP: 0.35   # tiny: "sharper" supervision (smaller radius)
              MIN_RADIUS: 1
              DENSE_REG: 1
            - FEATURE_MAP_STRIDE: 4
              NUM_MAX_OBJS: 500
              GAUSSIAN_OVERLAP: 0.10   # normal: keep your original style
              MIN_RADIUS: 2
              DENSE_REG: 1

        # <<< optional but recommended: IoU reg during training (no inference FPS cost)
        IOU_REG_LOSS: True

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

        POST_PROCESSING:
            SCORE_THRESH: 0.1
            POST_CENTER_LIMIT_RANGE: [-160, -75.2, -5.0, 160, 75.2, 8.0]
            MAX_OBJ_PER_SAMPLE: 500
            # <<< new: rectify scores with the predicted iou (quality-aware ranking)
            USE_IOU_TO_RECTIFY_SCORE: True
            # In CLASS_NAMES order: Drone, Plastic_sheet, Kite, Balloon, Bird
            # larger alpha -> relies more on iou (usually safer to set higher for tiny classes)
            IOU_RECTIFIER: [0.50, 0.60, 0.70, 0.30, 0.70]
            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 35
```
I tried changing [-160, -75.2, -5.0, 160, 75.2, 8.0] to match my point-cloud range exactly, but the results got much worse, so I kept this default. My questions now: how do the current module changes differ from the original CenterPoint? Are the improvements three modules? Can the LOSS still be improved, and if the modules aren't enough, what else could I change? The original config is:

```yaml
CLASS_NAMES: ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']

DATA_CONFIG:
    BASE_CONFIG: /media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/cfgs/dataset_configs/custom_dataset.yaml

MODEL:
    NAME: CenterPoint

    VFE:
        NAME: MeanVFE

    BACKBONE_3D:
        NAME: VoxelResBackBone8x

    MAP_TO_BEV:
        NAME: HeightCompression
        NUM_BEV_FEATURES: 256

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [5, 5]
        LAYER_STRIDES: [1, 2]
        NUM_FILTERS: [128, 256]
        UPSAMPLE_STRIDES: [1, 2]
        NUM_UPSAMPLE_FILTERS: [256, 256]

    DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
        CLASS_NAMES_EACH_HEAD: [
            ['Drone', 'Plastic_sheet', 'Kite', 'Balloon', 'Bird']
        ]
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True  # TODO
        NUM_HM_CONV: 2  # TODO
        SEPARATE_HEAD_CFG:
            HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
            HEAD_DICT: {
                'center': {'out_channels': 2, 'num_conv': 2},
                'center_z': {'out_channels': 1, 'num_conv': 2},
                'dim': {'out_channels': 3, 'num_conv': 2},
                'rot': {'out_channels': 2, 'num_conv': 2},
            }
        TARGET_ASSIGNER_CONFIG:
            FEATURE_MAP_STRIDE: 8
            NUM_MAX_OBJS: 500
            GAUSSIAN_OVERLAP: 0.1
            MIN_RADIUS: 2
            DENSE_REG: 1
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 1.0,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }
        POST_PROCESSING:
            SCORE_THRESH: 0.1
            POST_CENTER_LIMIT_RANGE: [-75.2, -75.2, -5.0, 75.2, 75.2, 3.0]
            MAX_OBJ_PER_SAMPLE: 500
            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 35
```
The reason this version jumps from the original CenterPoint (mAP≈41) to mAP≈74.7 / mAR≈84.0 / FPS≈49 is, essentially, not "tuning": you made structural changes to information representation + resolution + task decoupling + quality-aware ranking, exactly the things that matter most for small targets.

Below I walk through it in this order: differences vs. the original CenterPoint → which modules you actually changed (more than 3 points) → module-level Loss improvements → which further modules to add if you want to keep pushing.
Original:
- MeanVFE: each voxel is just the mean of its points (xyzI), nothing more.

Yours now:
- PDVMeanVFE: the mean features plus point density, voxel-center range, and an intensity histogram.

Why it matters (for your scene):
- tiny and irregular targets (bird, plastic sheet, drone) keep their reflectance/density signature instead of having it averaged away at long range.

✅ This is core improvement module #1 (VFE).
Original:
- VoxelResBackBone8x: BEV feature stride 8; with 0.1 m voxels, one BEV cell covers 0.8 m.

Yours now:
- VoxelResBackBone4x: stride 4, i.e. 0.4 m BEV cells, with FEATURE_MAP_STRIDE: 4 to match.

Why it matters:
- a drone or bird smaller than 0.8 m collapses into a sub-cell blob at stride 8; stride 4 keeps enough resolution for the center heatmap to separate and localize it.

✅ This is core improvement module #2 (backbone resolution).
Original CenterPoint:
- one CenterHead over all 5 classes, one global target assigner, no quality branch.

Yours now:
- dual heads (tiny: Bird/Kite; normal: Drone/Plastic_sheet/Balloon), a per-head assigner (different GAUSSIAN_OVERLAP / MIN_RADIUS), an iou branch with IOU_REG_LOSS, and iou-rectified scoring at inference.

✅ That is not "one module": it is at least 4 structural changes inside the Dense Head.

If you count by top-level modules, you can call it 3 blocks:
1. VFE
2. 3D backbone (resolution)
3. Dense Head

But strictly speaking your Head already stacks several module-level changes, which is why the effect is so much stronger than changing a single point.
Your point-cloud Y range is [-20, 20], yet tightening POST_CENTER_LIMIT_RANGE to exactly [-20, 20] made things worse. The most common reason: predicted centers jitter slightly beyond the tight boundary (especially for objects near the edge of the range), and the hard clip silently deletes them, costing recall.

The [-160, -75.2, -5.0, 160, 75.2, 8.0] you use now is very loose and preserves recall; as long as NMS/score keep the false positives down, loose is the safer choice.

If you want something more principled, use the point-cloud range plus a margin, as in the sketch below:
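A minimal sketch of that idea; the margin values are illustrative assumptions, not tuned numbers:

```python
# Illustrative only: build POST_CENTER_LIMIT_RANGE as the point-cloud range plus
# a margin, instead of either the very loose default or the exact range
# (which hurt recall in your experiment).
POINT_CLOUD_RANGE = [0.0, -20.0, -2.0, 160.0, 20.0, 6.0]  # your cfg values
MARGIN_XY, MARGIN_Z = 5.0, 2.0                            # hypothetical margins

post_center_limit_range = [
    POINT_CLOUD_RANGE[0] - MARGIN_XY, POINT_CLOUD_RANGE[1] - MARGIN_XY,
    POINT_CLOUD_RANGE[2] - MARGIN_Z,
    POINT_CLOUD_RANGE[3] + MARGIN_XY, POINT_CLOUD_RANGE[4] + MARGIN_XY,
    POINT_CLOUD_RANGE[5] + MARGIN_Z,
]
print(post_center_limit_range)  # [-5.0, -25.0, -4.0, 165.0, 25.0, 8.0]
# Transcribe these numbers into POST_CENTER_LIMIT_RANGE in the yaml and compare.
```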
Your current loss structure is: CenterNet focal on the heatmap (pure classification) + L1 regression + the iou / iou-reg terms.

You already have the iou/quality branch, and inference does the rectify, but training still supervises the heatmap with a pure classification focal, so there is still a gap between the training target and the score you actually rank by at inference.
Module-level change: replace the heatmap focal with a quality-aware classification loss (Quality Focal Loss / Varifocal-style), so the classification target itself encodes localization quality.

The usual effect: mAP (especially at IoU=0.7) becomes more stable, which helps localization-sensitive classes like Kite/Plastic_sheet.
Where to change it:
- in CenterHead.build_losses(), swap FocalLossCenterNet() for your QualityFocalLoss/VarifocalLoss implementation;
- in get_loss(), introduce the quality weight when building the heatmap target (usually only the positives change).

This counts as "swapping a loss module", not parameter tuning.
You already have iou_reg_loss, but many implementations only apply it at some locations, or in a rather indirect "constraint" form. You can add something more direct: a DIoU-style (or rotated-IoU) regression term on the decoded boxes.

This aligns directly with the IoU=0.7 metric; the typical gain is higher mAP, with small-object localization in particular becoming more stable (see the sketch below).
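For concreteness, a minimal sketch of what such a term could look like. This is an axis-aligned BEV simplification that ignores heading entirely (a proper version for rotated boxes would need your build's rotated-IoU ops); the function name and shapes are assumptions:

```python
import torch


def bev_diou_loss(pred, gt, eps=1e-7):
    """Axis-aligned BEV DIoU on (x, y, dx, dy) slices of *decoded* boxes.

    Simplified sketch: ignores heading, so it is only a rough surrogate for a
    rotated-IoU loss. pred/gt are (N, 4) tensors of matched positive boxes.
    """
    p_min = pred[:, :2] - pred[:, 2:] / 2
    p_max = pred[:, :2] + pred[:, 2:] / 2
    g_min = gt[:, :2] - gt[:, 2:] / 2
    g_max = gt[:, :2] + gt[:, 2:] / 2

    # intersection / union in the BEV plane
    inter = (torch.min(p_max, g_max) - torch.max(p_min, g_min)).clamp(min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter_area
    iou = inter_area / union.clamp(min=eps)

    # DIoU penalty: center distance normalized by the enclosing-box diagonal
    center_dist = ((pred[:, :2] - gt[:, :2]) ** 2).sum(dim=1)
    enclose = torch.max(p_max, g_max) - torch.min(p_min, g_min)
    diag = (enclose ** 2).sum(dim=1).clamp(min=eps)
    return (1.0 - iou + center_dist / diag).mean()
```

You would gather the decoded positives (the same way calculate_iou_reg_loss_centerhead does with mask/ind) and add this term with its own weight.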
Module-level approach (not a single global weight):
- class-aware reweighting of the dim/center regression terms via a class mask, so tiny classes get stronger localization supervision (see the sketch below).

Changes like this can often push the tiny classes up another notch without noticeably hurting FPS.
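A minimal sketch of the idea, with hypothetical names and shapes (your RegLossCenterNet reduces differently, so this is illustrative only):

```python
import torch


def class_weighted_loc_loss(reg_loss_per_obj, obj_class_ids, class_weights):
    """Sketch: reweight the per-object regression loss by class.

    reg_loss_per_obj: (B, K) loss already reduced over the code dimensions
    obj_class_ids:    (B, K) long, class index per assigned object
    class_weights:    (num_classes,) tensor; indices follow CLASS_NAMES order
    """
    w = class_weights[obj_class_ids]  # (B, K) gather per-object weight
    return (reg_loss_per_obj * w).sum() / w.sum().clamp(min=1.0)


# Hypothetical weights in CLASS_NAMES order (Drone, Plastic_sheet, Kite, Balloon, Bird):
# upweight the tiny classes; the exact values would need validation.
class_weights = torch.tensor([1.0, 1.2, 1.3, 1.0, 1.5])
```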
You're already in good shape; for further gains, pick the directions that are most small-target-sensitive and stay modular:
- a BACKBONE_2D upgrade (relatively independent of everything else).

Also note that your NMS_THRESH=0.01 is very aggressive, and combined with score rectify it may over-suppress certain classes.

Module-level fix:
- switch to class_specific_nms with a per-class threshold (see the sketch below).

This step frequently helps Bird/Kite recall, and it barely touches the training structure.
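A minimal sketch of per-class NMS; `nms_fn` is a placeholder for whatever rotated-NMS op your OpenPCDet build exposes, and the thresholds are illustrative:

```python
import torch


def per_class_nms(boxes, scores, labels, thresh_per_class, nms_fn):
    """Sketch of class-specific NMS: loop over classes, each with its own IoU threshold.

    nms_fn(boxes, scores, iou_thresh) -> kept indices; stands in for the rotated
    NMS op your build provides. thresh_per_class maps label -> threshold, e.g.
    looser for Bird/Kite to protect recall.
    """
    keep_all = []
    for cls, thr in thresh_per_class.items():
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        keep = nms_fn(boxes[idx], scores[idx], thr)
        keep_all.append(idx[keep])
    keep_all = torch.cat(keep_all) if keep_all else labels.new_empty(0, dtype=torch.long)
    # re-sort by score so downstream max-size truncation behaves as before
    return keep_all[scores[keep_all].argsort(descending=True)]


# Hypothetical per-class thresholds, label ids in CLASS_NAMES order:
thresh_per_class = {0: 0.01, 1: 0.05, 2: 0.10, 3: 0.01, 4: 0.10}
```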
This one is the "ultimate small-object solution", but the code volume is clearly larger; I'd only do it after your current version is stable.
Relative to the original CenterPoint, your gains mainly come from the VFE information upgrade, the stride-4 resolution, the decoupled dual heads with per-head assignment, and quality-aware scoring.

This is already a complete, structured set of improvements matched to your "small targets + irregular shapes + strict IoU=0.7" setting.

If you want my single most recommended next cut (still module-level, with controlled code volume), I'd keep the current structure frozen and pick one of the two options above.

Tell me whether you care more about pushing Plastic_sheet or Kite/Bird, and I'll give you the next module plan for that target (including concrete file change points and cfg fields).
Please follow your plan and give me the complete replacement code step by step, one file at a time, and I'll test each one and report the results back to you.
Below is the "module-level Loss upgrade" I described earlier, as a step-by-step set of complete drop-in files (each step touches very few files, so you can roll back and compare).

You already have the VFE / stride-4 backbone / dual heads + per-head assigner / iou head + rectify in place; the most cost-effective next step is to replace the hm_loss (heatmap classification), swapping CenterNet Focal for Varifocal Loss (VFL).

Step 1: create pcdet/utils/vfl_loss.py and save the complete code below into it:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VarifocalLoss(nn.Module):
    """
    A simplified Varifocal Loss (VFNet), adapted to CenterNet/CenterPoint
    gaussian-heatmap soft targets.
    - target: soft labels in [0, 1] (your gaussian heatmap)
    - pred:   probabilities in [0, 1] (after sigmoid)
    Negative weight: alpha * pred^gamma
    Positive weight: target (larger near the gaussian center -> larger weight)
    """

    def __init__(self, alpha: float = 0.75, gamma: float = 2.0, reduction: str = 'mean'):
        super().__init__()
        self.alpha = float(alpha)
        self.gamma = float(gamma)
        assert reduction in ['mean', 'sum', 'none']
        self.reduction = reduction

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """
        pred/target: shape (B, C, H, W)
        """
        pred = pred.clamp(min=1e-6, max=1 - 1e-6)
        target = target.clamp(min=0.0, max=1.0)

        pos_mask = target > 0
        neg_mask = ~pos_mask

        bce = F.binary_cross_entropy(pred, target, reduction='none')

        weight = torch.zeros_like(target)
        # positives: weight by target (stronger near the center)
        weight[pos_mask] = target[pos_mask]
        # negatives: alpha * p^gamma
        weight[neg_mask] = self.alpha * pred[neg_mask].pow(self.gamma)

        loss = weight * bce
        if self.reduction == 'none':
            return loss

        # normalize by the number of positives
        # (more stable when a batch has very few positives)
        num_pos = pos_mask.float().sum().clamp(min=1.0)
        loss = loss.sum() / num_pos
        return loss
```
Step 2: center_head.py (only build_losses and the hm_loss computation change; everything else keeps the style of your current stable version).

You are currently on the version I gave you earlier (per-head assigner, stride sync, iou sigmoid, debug).

Below is a complete replaceable version; the core addition is support for an HM_LOSS: VFL config switch.
pcdet/models/dense_heads/center_head.py (full-file replacement):

```python
import os
import copy
import numpy as np
import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal_
from functools import partial

from ..model_utils import model_nms_utils
from ..model_utils import centernet_utils
from ...utils import loss_utils
from ...utils.vfl_loss import VarifocalLoss


class SeparateHead(nn.Module):
    def __init__(self, input_channels, sep_head_dict, init_bias=-2.19, use_bias=False, norm_func=None):
        super().__init__()
        self.sep_head_dict = sep_head_dict

        for cur_name in self.sep_head_dict:
            output_channels = self.sep_head_dict[cur_name]['out_channels']
            num_conv = self.sep_head_dict[cur_name]['num_conv']

            fc_list = []
            for k in range(num_conv - 1):
                fc_list.append(nn.Sequential(
                    nn.Conv2d(input_channels, input_channels, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.BatchNorm2d(input_channels) if norm_func is None else norm_func(input_channels),
                    nn.ReLU()
                ))
            fc_list.append(nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=1, padding=1, bias=True))
            fc = nn.Sequential(*fc_list)
            if 'hm' in cur_name:
                fc[-1].bias.data.fill_(init_bias)
            else:
                for m in fc.modules():
                    if isinstance(m, nn.Conv2d):
                        kaiming_normal_(m.weight.data)
                        if hasattr(m, "bias") and m.bias is not None:
                            nn.init.constant_(m.bias, 0)

            self.__setattr__(cur_name, fc)

    def forward(self, x):
        ret_dict = {}
        for cur_name in self.sep_head_dict:
            ret_dict[cur_name] = self.__getattr__(cur_name)(x)
        return ret_dict


class CenterHead(nn.Module):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range, voxel_size,
                 predict_boxes_when_training=True):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_class = num_class
        self.grid_size = grid_size
        self.point_cloud_range = point_cloud_range
        self.voxel_size = voxel_size
        self.feature_map_stride = self.model_cfg.TARGET_ASSIGNER_CONFIG.get('FEATURE_MAP_STRIDE', None)

        self.class_names = class_names
        self.class_names_each_head = []
        self.class_id_mapping_each_head = []

        self.debug = bool(self.model_cfg.get('DEBUG_CENTERHEAD', False)) or \
            (os.environ.get('PCDET_DEBUG_CENTERHEAD', '0') == '1')
        self._debug_printed_once = False

        for cur_class_names in self.model_cfg.CLASS_NAMES_EACH_HEAD:
            self.class_names_each_head.append([x for x in cur_class_names if x in class_names])
            cur_class_id_mapping = torch.from_numpy(np.array(
                [self.class_names.index(x) for x in cur_class_names if x in class_names]
            )).cuda()
            self.class_id_mapping_each_head.append(cur_class_id_mapping)

        total_classes = sum([len(x) for x in self.class_names_each_head])
        assert total_classes == len(self.class_names), f'class_names_each_head={self.class_names_each_head}'

        norm_func = partial(
            nn.BatchNorm2d, eps=self.model_cfg.get('BN_EPS', 1e-5), momentum=self.model_cfg.get('BN_MOM', 0.1)
        )

        self.shared_conv = nn.Sequential(
            nn.Conv2d(
                input_channels, self.model_cfg.SHARED_CONV_CHANNEL, 3, stride=1, padding=1,
                bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False)
            ),
            norm_func(self.model_cfg.SHARED_CONV_CHANNEL),
            nn.ReLU(),
        )

        self.heads_list = nn.ModuleList()
        self.separate_head_cfg = self.model_cfg.SEPARATE_HEAD_CFG
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_head_dict = copy.deepcopy(self.separate_head_cfg.HEAD_DICT)
            cur_head_dict['hm'] = dict(out_channels=len(cur_class_names), num_conv=self.model_cfg.NUM_HM_CONV)
            self.heads_list.append(
                SeparateHead(
                    input_channels=self.model_cfg.SHARED_CONV_CHANNEL,
                    sep_head_dict=cur_head_dict,
                    init_bias=-2.19,
                    use_bias=self.model_cfg.get('USE_BIAS_BEFORE_NORM', False),
                    norm_func=norm_func
                )
            )
        self.predict_boxes_when_training = predict_boxes_when_training
        self.forward_ret_dict = {}
        self.build_losses()

    def _dbg(self, msg: str):
        if not self.debug:
            return
        print(f"[CenterHead-DEBUG] {msg}")

    def build_losses(self):
        # ===== New: choose between VFL and the original CenterNet Focal =====
        hm_loss_type = str(self.model_cfg.get('HM_LOSS', 'FOCAL')).upper()
        if hm_loss_type in ['VFL', 'VARIFOCAL']:
            alpha = float(self.model_cfg.get('VFL_ALPHA', 0.75))
            gamma = float(self.model_cfg.get('VFL_GAMMA', 2.0))
            self.add_module('hm_loss_func', VarifocalLoss(alpha=alpha, gamma=gamma, reduction='mean'))
            self._dbg(f"Use HM_LOSS=VFL (alpha={alpha}, gamma={gamma})")
        else:
            self.add_module('hm_loss_func', loss_utils.FocalLossCenterNet())
            self._dbg("Use HM_LOSS=FOCAL (FocalLossCenterNet)")

        self.add_module('reg_loss_func', loss_utils.RegLossCenterNet())

    def assign_target_of_single_head(
            self, num_classes, gt_boxes, feature_map_size, feature_map_stride,
            num_max_objs=500, gaussian_overlap=0.1, min_radius=2
    ):
        heatmap = gt_boxes.new_zeros(num_classes, feature_map_size[1], feature_map_size[0])
        ret_boxes = gt_boxes.new_zeros((num_max_objs, gt_boxes.shape[-1] - 1 + 1))
        inds = gt_boxes.new_zeros(num_max_objs).long()
        mask = gt_boxes.new_zeros(num_max_objs).long()
        ret_boxes_src = gt_boxes.new_zeros(num_max_objs, gt_boxes.shape[-1])
        ret_boxes_src[:gt_boxes.shape[0]] = gt_boxes

        x, y, z = gt_boxes[:, 0], gt_boxes[:, 1], gt_boxes[:, 2]
        coord_x = (x - self.point_cloud_range[0]) / self.voxel_size[0] / feature_map_stride
        coord_y = (y - self.point_cloud_range[1]) / self.voxel_size[1] / feature_map_stride
        coord_x = torch.clamp(coord_x, min=0, max=feature_map_size[0] - 0.5)
        coord_y = torch.clamp(coord_y, min=0, max=feature_map_size[1] - 0.5)
        center = torch.cat((coord_x[:, None], coord_y[:, None]), dim=-1)
        center_int = center.int()
        center_int_float = center_int.float()

        dx, dy, dz = gt_boxes[:, 3], gt_boxes[:, 4], gt_boxes[:, 5]
        dx = dx / self.voxel_size[0] / feature_map_stride
        dy = dy / self.voxel_size[1] / feature_map_stride

        radius = centernet_utils.gaussian_radius(dx, dy, min_overlap=gaussian_overlap)
        radius = torch.clamp_min(radius.int(), min=min_radius)

        for k in range(min(num_max_objs, gt_boxes.shape[0])):
            if dx[k] <= 0 or dy[k] <= 0:
                continue
            if not (0 <= center_int[k][0] <= feature_map_size[0] and 0 <= center_int[k][1] <= feature_map_size[1]):
                continue
            cur_class_id = (gt_boxes[k, -1] - 1).long()
            centernet_utils.draw_gaussian_to_heatmap(heatmap[cur_class_id], center[k], radius[k].item())

            inds[k] = center_int[k, 1] * feature_map_size[0] + center_int[k, 0]
            mask[k] = 1

            ret_boxes[k, 0:2] = center[k] - center_int_float[k].float()
            ret_boxes[k, 2] = z[k]
            ret_boxes[k, 3:6] = gt_boxes[k, 3:6].log()
            ret_boxes[k, 6] = torch.cos(gt_boxes[k, 6])
            ret_boxes[k, 7] = torch.sin(gt_boxes[k, 6])
            if gt_boxes.shape[1] > 8:
                ret_boxes[k, 8:] = gt_boxes[k, 7:-1]

        return heatmap, ret_boxes, inds, mask, ret_boxes_src

    def assign_targets(self, gt_boxes, feature_map_size=None, **kwargs):
        feature_map_size = feature_map_size[::-1]
        default_cfg = self.model_cfg.TARGET_ASSIGNER_CONFIG
        each_head_cfg = self.model_cfg.get('TARGET_ASSIGNER_CONFIG_EACH_HEAD', None)
        if each_head_cfg is not None:
            assert len(each_head_cfg) == len(self.class_names_each_head), \
                f"TARGET_ASSIGNER_CONFIG_EACH_HEAD length {len(each_head_cfg)} != num_heads {len(self.class_names_each_head)}"

        def _get(cfg, key, default_val):
            if cfg is None:
                return default_val
            if isinstance(cfg, dict):
                return cfg.get(key, default_val)
            return getattr(cfg, key, default_val)

        batch_size = gt_boxes.shape[0]
        ret_dict = {
            'heatmaps': [],
            'target_boxes': [],
            'inds': [],
            'masks': [],
            'heatmap_masks': [],
            'target_boxes_src': [],
        }

        all_names = np.array(['bg', *self.class_names])
        for head_idx, cur_class_names in enumerate(self.class_names_each_head):
            cur_cfg = each_head_cfg[head_idx] if each_head_cfg is not None else default_cfg
            cur_feature_map_stride = _get(cur_cfg, 'FEATURE_MAP_STRIDE', default_cfg.FEATURE_MAP_STRIDE)
            cur_num_max_objs = _get(cur_cfg, 'NUM_MAX_OBJS', default_cfg.NUM_MAX_OBJS)
            cur_gaussian_overlap = _get(cur_cfg, 'GAUSSIAN_OVERLAP', default_cfg.GAUSSIAN_OVERLAP)
            cur_min_radius = _get(cur_cfg, 'MIN_RADIUS', default_cfg.MIN_RADIUS)

            heatmap_list, target_boxes_list, inds_list, masks_list, target_boxes_src_list = [], [], [], [], []
            for bs_idx in range(batch_size):
                cur_gt_boxes = gt_boxes[bs_idx]
                gt_class_names = all_names[cur_gt_boxes[:, -1].cpu().long().numpy()]

                gt_boxes_single_head = []
                for gt_idx, name in enumerate(gt_class_names):
                    if name not in cur_class_names:
                        continue
                    temp_box = cur_gt_boxes[gt_idx].clone()
                    temp_box[-1] = cur_class_names.index(name) + 1
                    gt_boxes_single_head.append(temp_box[None, :])

                if len(gt_boxes_single_head) == 0:
                    gt_boxes_single_head = cur_gt_boxes[:0, :]
                else:
                    gt_boxes_single_head = torch.cat(gt_boxes_single_head, dim=0)

                heatmap, ret_boxes, inds, mask, ret_boxes_src = self.assign_target_of_single_head(
                    num_classes=len(cur_class_names),
                    gt_boxes=gt_boxes_single_head.cpu(),
                    feature_map_size=feature_map_size,
                    feature_map_stride=cur_feature_map_stride,
                    num_max_objs=cur_num_max_objs,
                    gaussian_overlap=cur_gaussian_overlap,
                    min_radius=cur_min_radius,
                )
                heatmap_list.append(heatmap.to(gt_boxes_single_head.device))
                target_boxes_list.append(ret_boxes.to(gt_boxes_single_head.device))
                inds_list.append(inds.to(gt_boxes_single_head.device))
                masks_list.append(mask.to(gt_boxes_single_head.device))
                target_boxes_src_list.append(ret_boxes_src.to(gt_boxes_single_head.device))

            ret_dict['heatmaps'].append(torch.stack(heatmap_list, dim=0))
            ret_dict['target_boxes'].append(torch.stack(target_boxes_list, dim=0))
            ret_dict['inds'].append(torch.stack(inds_list, dim=0))
            ret_dict['masks'].append(torch.stack(masks_list, dim=0))
            ret_dict['target_boxes_src'].append(torch.stack(target_boxes_src_list, dim=0))

        return ret_dict

    def sigmoid(self, x):
        return torch.clamp(x.sigmoid(), min=1e-4, max=1 - 1e-4)

    def get_loss(self):
        pred_dicts = self.forward_ret_dict['pred_dicts']
        target_dicts = self.forward_ret_dict['target_dicts']

        tb_dict = {}
        loss = 0
        for head_idx, pred_dict in enumerate(pred_dicts):
            pred_hm = self.sigmoid(pred_dict['hm'])  # (B, C, H, W)
            target_hm = target_dicts['heatmaps'][head_idx]

            hm_loss = self.hm_loss_func(pred_hm, target_hm)
            hm_loss *= self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']

            target_boxes = target_dicts['target_boxes'][head_idx]
            pred_boxes = torch.cat([pred_dict[name] for name in self.separate_head_cfg.HEAD_ORDER], dim=1)

            reg_loss = self.reg_loss_func(
                pred_boxes, target_dicts['masks'][head_idx], target_dicts['inds'][head_idx], target_boxes
            )
            loc_loss = (reg_loss * reg_loss.new_tensor(self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['code_weights'])).sum()
            loc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']

            loss += hm_loss + loc_loss
            tb_dict[f'hm_loss_head_{head_idx}'] = float(hm_loss.item())
            tb_dict[f'loc_loss_head_{head_idx}'] = float(loc_loss.item())

            if ('iou' in pred_dict) or self.model_cfg.get('IOU_REG_LOSS', False):
                batch_box_preds = centernet_utils.decode_bbox_from_pred_dicts(
                    pred_dict=pred_dict,
                    point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                    feature_map_stride=self.feature_map_stride
                )
                batch_box_preds_for_iou = batch_box_preds.permute(0, 3, 1, 2).contiguous()

                if 'iou' in pred_dict:
                    iou_loss = loss_utils.calculate_iou_loss_centerhead(
                        iou_preds=pred_dict['iou'],
                        batch_box_preds=batch_box_preds_for_iou.clone().detach(),
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    loss += iou_loss
                    tb_dict[f'iou_loss_head_{head_idx}'] = float(iou_loss.item())

                if self.model_cfg.get('IOU_REG_LOSS', False):
                    iou_reg_loss = loss_utils.calculate_iou_reg_loss_centerhead(
                        batch_box_preds=batch_box_preds_for_iou,
                        mask=target_dicts['masks'][head_idx],
                        ind=target_dicts['inds'][head_idx],
                        gt_boxes=target_dicts['target_boxes_src'][head_idx]
                    )
                    if target_dicts['masks'][head_idx].sum().item() != 0:
                        iou_reg_loss = iou_reg_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']
                        loss += iou_reg_loss
                        tb_dict[f'iou_reg_loss_head_{head_idx}'] = float(iou_reg_loss.item())
                    else:
                        loss += (batch_box_preds_for_iou * 0.).sum()
                        tb_dict[f'iou_reg_loss_head_{head_idx}'] = float((batch_box_preds_for_iou * 0.).sum().item())

        tb_dict['rpn_loss'] = float(loss.item())
        return loss, tb_dict

    def generate_predicted_boxes(self, batch_size, pred_dicts):
        post_cfg = self.model_cfg.POST_PROCESSING
        post_center_limit_range = torch.tensor(post_cfg.POST_CENTER_LIMIT_RANGE).cuda().float()

        ret_dict = [{
            'pred_boxes': [],
            'pred_scores': [],
            'pred_labels': [],
        } for _ in range(batch_size)]

        for head_idx, pred_dict in enumerate(pred_dicts):
            batch_hm = pred_dict['hm'].sigmoid()
            batch_center = pred_dict['center']
            batch_center_z = pred_dict['center_z']
            batch_dim = pred_dict['dim'].exp()
            batch_rot_cos = pred_dict['rot'][:, 0].unsqueeze(dim=1)
            batch_rot_sin = pred_dict['rot'][:, 1].unsqueeze(dim=1)
            batch_vel = pred_dict['vel'] if 'vel' in self.separate_head_cfg.HEAD_ORDER else None
            batch_iou = torch.sigmoid(pred_dict['iou']) if 'iou' in pred_dict else None

            final_pred_dicts = centernet_utils.decode_bbox_from_heatmap(
                heatmap=batch_hm, rot_cos=batch_rot_cos, rot_sin=batch_rot_sin,
                center=batch_center, center_z=batch_center_z, dim=batch_dim, vel=batch_vel, iou=batch_iou,
                point_cloud_range=self.point_cloud_range, voxel_size=self.voxel_size,
                feature_map_stride=self.feature_map_stride,
                K=post_cfg.MAX_OBJ_PER_SAMPLE,
                circle_nms=(post_cfg.NMS_CONFIG.NMS_TYPE == 'circle_nms'),
                score_thresh=post_cfg.SCORE_THRESH,
                post_center_limit_range=post_center_limit_range
            )

            for bs_i, final_dict in enumerate(final_pred_dicts):
                final_dict['pred_labels'] = self.class_id_mapping_each_head[head_idx][final_dict['pred_labels'].long()]

                if post_cfg.get('USE_IOU_TO_RECTIFY_SCORE', False) and 'pred_iou' in final_dict:
                    pred_iou = torch.clamp(final_dict['pred_iou'], min=0.0, max=1.0)
                    IOU_RECTIFIER = final_dict['pred_scores'].new_tensor(post_cfg.IOU_RECTIFIER)
                    final_dict['pred_scores'] = torch.pow(
                        final_dict['pred_scores'], 1 - IOU_RECTIFIER[final_dict['pred_labels']]
                    ) * torch.pow(
                        pred_iou, IOU_RECTIFIER[final_dict['pred_labels']]
                    )

                if post_cfg.NMS_CONFIG.NMS_TYPE not in ['circle_nms', 'class_specific_nms']:
                    selected, selected_scores = model_nms_utils.class_agnostic_nms(
                        box_scores=final_dict['pred_scores'],
                        box_preds=final_dict['pred_boxes'],
                        nms_config=post_cfg.NMS_CONFIG,
                        score_thresh=None
                    )
                elif post_cfg.NMS_CONFIG.NMS_TYPE == 'class_specific_nms':
                    selected, selected_scores = model_nms_utils.class_specific_nms(
                        box_scores=final_dict['pred_scores'],
                        box_preds=final_dict['pred_boxes'],
                        box_labels=final_dict['pred_labels'],
                        nms_config=post_cfg.NMS_CONFIG,
                        score_thresh=post_cfg.NMS_CONFIG.get('SCORE_THRESH', None)
                    )
                else:
                    raise NotImplementedError

                final_dict['pred_boxes'] = final_dict['pred_boxes'][selected]
                final_dict['pred_scores'] = selected_scores
                final_dict['pred_labels'] = final_dict['pred_labels'][selected]

                ret_dict[bs_i]['pred_boxes'].append(final_dict['pred_boxes'])
                ret_dict[bs_i]['pred_scores'].append(final_dict['pred_scores'])
                ret_dict[bs_i]['pred_labels'].append(final_dict['pred_labels'])

        for bs_i in range(batch_size):
            ret_dict[bs_i]['pred_boxes'] = torch.cat(ret_dict[bs_i]['pred_boxes'], dim=0) \
                if len(ret_dict[bs_i]['pred_boxes']) else torch.empty((0, 7), device='cuda')
            ret_dict[bs_i]['pred_scores'] = torch.cat(ret_dict[bs_i]['pred_scores'], dim=0) \
                if len(ret_dict[bs_i]['pred_scores']) else torch.empty((0,), device='cuda')
            ret_dict[bs_i]['pred_labels'] = torch.cat(ret_dict[bs_i]['pred_labels'], dim=0).long() + 1 \
                if len(ret_dict[bs_i]['pred_labels']) else torch.empty((0,), dtype=torch.long, device='cuda')

        return ret_dict

    @staticmethod
    def reorder_rois_for_refining(batch_size, pred_dicts):
        num_max_rois = max([len(cur_dict['pred_boxes']) for cur_dict in pred_dicts])
        num_max_rois = max(1, num_max_rois)
        pred_boxes = pred_dicts[0]['pred_boxes']

        rois = pred_boxes.new_zeros((batch_size, num_max_rois, pred_boxes.shape[-1]))
        roi_scores = pred_boxes.new_zeros((batch_size, num_max_rois))
        roi_labels = pred_boxes.new_zeros((batch_size, num_max_rois)).long()

        for bs_idx in range(batch_size):
            num_boxes = len(pred_dicts[bs_idx]['pred_boxes'])
            rois[bs_idx, :num_boxes, :] = pred_dicts[bs_idx]['pred_boxes']
            roi_scores[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_scores']
            roi_labels[bs_idx, :num_boxes] = pred_dicts[bs_idx]['pred_labels']
        return rois, roi_scores, roi_labels

    def forward(self, data_dict):
        # Key: stay aligned with your HeightCompression (it writes spatial_features_stride)
        runtime_stride = data_dict.get('spatial_features_stride', None)
        if runtime_stride is not None:
            if torch.is_tensor(runtime_stride):
                runtime_stride = runtime_stride.item()
            self.feature_map_stride = int(runtime_stride)

        spatial_features_2d = data_dict['spatial_features_2d']
        x = self.shared_conv(spatial_features_2d)

        pred_dicts = [head(x) for head in self.heads_list]

        if self.training:
            target_dict = self.assign_targets(
                data_dict['gt_boxes'],
                feature_map_size=spatial_features_2d.size()[2:],
            )
            self.forward_ret_dict['target_dicts'] = target_dict

        self.forward_ret_dict['pred_dicts'] = pred_dicts

        if not self.training or self.predict_boxes_when_training:
            final_box_dicts = self.generate_predicted_boxes(
                data_dict['batch_size'], pred_dicts
            )
            if self.predict_boxes_when_training:
                rois, roi_scores, roi_labels = self.reorder_rois_for_refining(data_dict['batch_size'], final_box_dicts)
                data_dict['rois'] = rois
                data_dict['roi_scores'] = roi_scores
                data_dict['roi_labels'] = roi_labels
                data_dict['has_class_labels'] = True
            else:
                data_dict['final_box_dicts'] = final_box_dicts

        return data_dict
```
Step 3: in your current cfg, under DENSE_HEAD: (same level as IOU_REG_LOSS), add:
```yaml
HM_LOSS: VFL
VFL_ALPHA: 0.75
VFL_GAMMA: 2.0
```
If you want to compare against the original focal, just set HM_LOSS: FOCAL or delete those three lines entirely (the default remains FocalLossCenterNet).
If VFL shows a gain, our next "module-level" step is usually one of two options (I'll choose based on the per-class results you report back): class_specific_nms or a BiFPN/FPN-style BACKBONE_2D.

Run Steps 1-3 first and send me the new mAR/mAP/FPS plus per-class AP, and I'll send you the next full-file replacement.
Running with this plan, the code errors out as follows:

```text
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [158,0,0], thread: [126,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [158,0,0], thread: [127,0,0] Assertion `input_val >= zero && input_val <= one` failed.
epochs:   0%| | 0/80 [03:52<?, ?it/s]
Traceback (most recent call last):
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/train.py", line 233, in <module>
main()
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/train.py", line 178, in main
train_model(
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/train_utils/train_utils.py", line 180, in train_model
accumulated_iter = train_one_epoch(
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/tools/train_utils/train_utils.py", line 56, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/pcdet/models/init.py", line 44, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/home/jd/anaconda3/envs/openpcdet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/pcdet/models/detectors/cifu.py", line 15, in forward
loss, tb_dict, disp_dict = self.get_training_loss()
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/pcdet/models/detectors/cifu.py", line 27, in get_training_loss
loss, tb_dict = self.dense_head.get_loss()
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/pcdet/models/dense_heads/center_head_multiscale.py", line 276, in get_loss
hm_loss = self.hm_loss_func(pred_hm, target_hm)
File "/home/jd/anaconda3/envs/openpcdet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/media/jd/4997BB1603CFE2C4/lw/Openpcdet/pcdet/utils/vfl_loss.py", line 36, in forward
weight[pos_mask] = target[pos_mask]
RuntimeError: numel: integer multiplication overflow
```
The root cause is the CUDA BCE kernel assertion: binary_cross_entropy requires its input (pred) to be in [0, 1], but your pred_hm contains NaN/Inf (NaN fails the >= 0 && <= 1 check), so the first thing that fires is

Assertion `input_val >= zero && input_val <= one` failed.

Once a device assert fires, many subsequent operations fail with chained, weird-looking errors; the numel: integer multiplication overflow you see is almost certainly such a follow-on error after the device assert, not a genuine overflow (see the snippet below for how to surface the real failing op).
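If you want the stack trace to point at the actual failing kernel rather than a downstream victim, CUDA's synchronous-launch switch helps; it must be set before CUDA is initialized, e.g. at the very top of tools/train.py (a small sketch, standard PyTorch/CUDA behavior):

```python
# Force synchronous CUDA kernel launches so the exception is raised at the real
# failing op, not at some later unrelated call. Must run before CUDA is touched,
# i.e. before `import torch` in the entry script.
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # noqa: E402  (imported after setting the env var)
```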
The DenseHead you are actually running is CenterHeads (the stack shows center_head_multiscale.py), so the center_head.py I gave you earlier is not the direct crash site. The most robust fix is to make the VFL loss itself robust to NaN/Inf, so you don't have to touch center_head_multiscale.py at all.
Here is the first fix, which replaces only one file: pcdet/utils/vfl_loss.py.

Replace your current pcdet/utils/vfl_loss.py wholesale with the version below (note: I added nan_to_num, and it prints a one-time warning when NaN/Inf is detected, to help you find out whether the training numerics blew up):
```python
import os
import torch
import torch.nn as nn
import torch.nn.functional as F


class VarifocalLoss(nn.Module):
    """
    Varifocal Loss (probability-input version), robust to NaN/Inf.
    - pred:   probability in [0, 1] (usually the sigmoid-ed heatmap)
    - target: soft target in [0, 1] (your gaussian heatmap)
    Negative weight: alpha * pred^gamma
    Positive weight: target
    """

    def __init__(self, alpha: float = 0.75, gamma: float = 2.0, reduction: str = 'mean'):
        super().__init__()
        self.alpha = float(alpha)
        self.gamma = float(gamma)
        assert reduction in ['mean', 'sum', 'none']
        self.reduction = reduction
        # debug: whether to print NaN/Inf warnings
        self.debug_nan = (os.environ.get("PCDET_DEBUG_NAN", "0") == "1")
        self._warned = False

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """
        pred/target: (B, C, H, W)
        """
        # --- Key: scrub NaN/Inf first, otherwise the BCE CUDA kernel asserts and crashes ---
        if (not self._warned) and (self.debug_nan or torch.isnan(pred).any() or torch.isinf(pred).any()):
            # warn only once to avoid flooding the log
            n_nan = torch.isnan(pred).sum().item()
            n_inf = torch.isinf(pred).sum().item()
            if n_nan > 0 or n_inf > 0:
                print(f"[VFL WARNING] pred has NaN/Inf: nan={n_nan}, inf={n_inf}. "
                      f"Will apply nan_to_num + clamp. Consider lowering LR or checking AMP.")
            self._warned = True

        pred = torch.nan_to_num(pred, nan=0.5, posinf=1.0, neginf=0.0)
        target = torch.nan_to_num(target, nan=0.0, posinf=1.0, neginf=0.0)

        # clamp into the legal range so BCE cannot trigger the assert
        pred = pred.clamp(min=1e-6, max=1.0 - 1e-6)
        target = target.clamp(min=0.0, max=1.0)

        pos_mask = target > 0
        neg_mask = ~pos_mask

        # BCE on probabilities
        bce = F.binary_cross_entropy(pred, target, reduction='none')

        weight = torch.zeros_like(target)
        weight[pos_mask] = target[pos_mask]
        weight[neg_mask] = self.alpha * pred[neg_mask].pow(self.gamma)

        loss = weight * bce
        if self.reduction == 'none':
            return loss

        # normalize by the positive count (more stable)
        num_pos = pos_mask.float().sum().clamp(min=1.0)
        loss = loss.sum() / num_pos
        return loss
```
Optionally, enable the debug flag before training:

```bash
export PCDET_DEBUG_NAN=1
```

It will print the warning the first time NaN/Inf shows up (only once, to avoid flooding the log).
If the warning shows `pred` values far outside [0, 1] (not just NaN), then what you are feeding VFL is not a post-sigmoid probability: somewhere raw logits are being passed straight in, which makes `pred` much greater than 1 or less than 0.
In that case, paste the relevant lines and I can pinpoint where your center_head_multiscale.py computes `pred_hm`. The usual fix: make sure what goes into VFL is the sigmoid-ed (and clamped) `pred`.
(Your stack shows `hm_loss = self.hm_loss_func(pred_hm, target_hm)`; if `pred_hm` is raw logits, that is exactly what crashes.)
First do Step 1: replace vfl_loss.py, run once, and send me the resulting warning/error output. Then I'll give you Step 2 (the next full-file replacement): either upgrading to a logits-based VFL (more stable), or moving on to the next module (class_specific_nms or a BiFPN/FPN).
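For reference, a minimal sketch of what that logits version could look like; this is my assumption of Step 2, not the final file, and the key point is that `binary_cross_entropy_with_logits` is numerically stable, so the `nan_to_num`/clamp guards on `pred` become unnecessary:

```python
import torch
import torch.nn.functional as F

def varifocal_loss_with_logits(logits: torch.Tensor, target: torch.Tensor,
                               alpha: float = 0.75, gamma: float = 2.0) -> torch.Tensor:
    """Sketch of a logits-input VFL (assumed design). logits/target: (B, C, H, W),
    where target is the soft Gaussian heatmap in [0, 1]."""
    pred = torch.sigmoid(logits).detach()  # used only to build weights, no grad through them
    pos_mask = target > 0
    weight = torch.where(pos_mask, target, alpha * pred.pow(gamma))
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    num_pos = pos_mask.float().sum().clamp(min=1.0)
    return (weight * bce).sum() / num_pos
```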
Based on my current results, tell me the per-class and overall mAP50 and mAP50-95:

```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.734
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.968
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.855
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.721
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.765
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.785
01/11 16:50:32 - mmengine - INFO -
+---------------+-------+--------+--------+-------+-------+-------+
| category | mAP | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
+---------------+-------+--------+--------+-------+-------+-------+
| drone | 0.774 | 0.997 | 0.911 | nan | 0.723 | 0.783 |
| plastic_sheet | 0.489 | 0.861 | 0.506 | nan | 0.505 | 0.491 |
| kite | 0.847 | 0.98 | 0.968 | nan | 0.779 | 0.859 |
| balloon | 0.871 | 1.0 | 0.99 | nan | 0.876 | 0.873 |
| bird | 0.688 | 1.0 | 0.902 | nan | nan | 0.695 |
+---------------+-------+--------+--------+-------+-------+-------+
01/11 16:50:32 - mmengine - INFO - bbox_mAP_copypaste: 0.734 0.968 0.855 -1.000 0.721 0.740
01/11 16:50:32 - mmengine - INFO - Epoch(val) [100][37/37] coco/drone_precision: 0.7740 coco/plastic_sheet_precision: 0.4890 coco/kite_precision: 0.8470 coco/balloon_precision: 0.8710 coco/bird_precision: 0.6880 coco/bbox_mAP: 0.7340 coco/bbox_mAP_50: 0.9680 coco/bbox_mAP_75: 0.8550 coco/bbox_mAP_s: -1.0000 coco/bbox_mAP_m: 0.7210 coco/bbox_mAP_l: 0.7400 data_time: 0.0376 time: 0.4800
```

Also explain why the nan values appear.
Below I lay out the overall mAP50 / mAP50-95 and the per-class AP50-95 / AP50 / AP75 that can be read from this COCO-style evaluation log, then explain why the nan / -1.000 values appear.
From the three summary lines at the top of the log, which match your final `bbox_mAP_copypaste: 0.734 0.968 0.855 -1.000 0.721 0.740` line (same set of numbers):

- bbox_mAP = 0.734 (overall mAP50-95)
- bbox_mAP_50 = 0.968 (overall mAP50)
- bbox_mAP_75 = 0.855
- bbox_mAP_s = -1.000 (the small bucket cannot be computed; see the nan/-1 explanation below)
- bbox_mAP_m = 0.721
- bbox_mAP_l = 0.740

In the per-class table, `mAP` is that class's AP@[0.50:0.95] (i.e., its mAP50-95), listed alongside its `mAP_50` and `mAP_75`.
Per class (mAP50-95 / mAP50 / mAP75, read off your table):

- drone: 0.774 / 0.997 / 0.911
- plastic_sheet: 0.489 / 0.861 / 0.506
- kite: 0.847 / 0.98 / 0.968
- balloon: 0.871 / 1.0 / 0.99
- bird: 0.688 / 1.0 / 0.902
Your clear weakest class is plastic_sheet (0.489), followed by bird (0.688).
Your log contains two kinds of "abnormal" values: the -1.000 entries and the nan entries.
The -1.000 is a classic COCO-eval situation: the small size bucket has "no valid GT" or "no samples participating in the evaluation", so the script outputs -1 (meaning unavailable/undefined). It does not mean the model is broken.
Common cause: under your area computation, nothing lands in the small bucket. COCO's small/medium/large split is by pixel area (the original COCO thresholds are 32² and 96²). If your evaluation derives area by projecting 3D boxes, or via some custom rule, the small bucket can easily end up empty.
The nan is analogous, but per class: if a class has 0 GTs in a size bucket (or zero valid matches, making the denominator 0), that cell shows nan.
Concretely, in your table: mAP_s is nan for every class; and bird's mAP_m is also nan, which suggests bird has no samples in the medium bucket either (or all of its areas were judged large).
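For intuition, the sentinel behavior can be paraphrased like this (a sketch of the bucket-averaging logic, not a verbatim copy of pycocotools):

```python
import numpy as np

# When no GT falls into a size bucket, every precision entry keeps the
# sentinel -1, so the summary prints -1.000; a per-class mean over an
# empty slice is where the nan comes from.
precisions = np.full((10, 101), -1.0)   # e.g. the "small" bucket: all invalid
valid = precisions[precisions > -1]
print(np.mean(valid) if valid.size else -1.0)   # -> -1.0

per_class = np.array([])                # a class with no samples in the bucket
print(np.mean(per_class) if per_class.size else float('nan'))  # -> nan
```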
Do these values need "fixing"? Generally no, because they only indicate that the corresponding evaluation bucket is empty.
It is only a real numerical problem if the loss goes nan or inference outputs nan; this log looks like the "statistical nan" of empty evaluation groups.
If you do want small/medium/large to be meaningful, check how the evaluation computes bbox area (are your boxes 2D, or converted from 3D? how is area defined? are the thresholds the COCO defaults?). Paste the evaluator script/config you use (mmdet or mmengine) and I can tell you exactly which field to change so the small bucket is no longer empty.
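If the evaluation ultimately goes through pycocotools, the buckets live in `COCOeval.params.areaRng`. A sketch of lowering the thresholds so small aerial targets actually populate the "small" range (the 16²/64² values and the JSON paths are placeholders I made up, not a recommendation):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('gt.json')                 # placeholder paths
coco_dt = coco_gt.loadRes('dt.json')

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
# Defaults: [0, 32^2) small, [32^2, 96^2) medium, [96^2, inf) large.
# If every projected box exceeds 32^2 px, "small" stays empty -> -1 / nan.
coco_eval.params.areaRng = [
    [0, 1e10],           # all
    [0, 16 ** 2],        # small  (placeholder threshold)
    [16 ** 2, 64 ** 2],  # medium (placeholder threshold)
    [64 ** 2, 1e10],     # large
]
coco_eval.params.areaRngLbl = ['all', 'small', 'medium', 'large']
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```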