torch.OutOfMemoryError: CUDA out of memory. Tried ...
Created: April 10, 2025
Model used: O3 Mini High by Chat01
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 842.00 MiB. GPU 0 has a total capacity of 44.52 GiB of which 576.06 MiB is free. Process 1364514 has 6.41 GiB memory in use. Process 1741703 has 37.54 GiB memory in use. Of the allocated memory 36.99 GiB is allocated by PyTorch, and 56.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Explain this.
This error message means that PyTorch tried to allocate 842.00 MiB on the GPU and found there was not enough free memory, so an out-of-memory (OOM) error was raised. In more detail:

GPU memory status: GPU 0 has a total capacity of 44.52 GiB, of which only 576.06 MiB is currently free. Process 1364514 is holding 6.41 GiB and process 1741703 is holding 37.54 GiB.

PyTorch's internal memory management: of the memory in use, 36.99 GiB is allocated by PyTorch and 56.48 MiB is reserved by PyTorch but unallocated; a large reserved-but-unallocated amount can indicate fragmentation.

Cause of the error: the remaining free memory on the device cannot satisfy a single 842.00 MiB allocation, and fragmentation may be making this worse.

Suggested fixes: free the memory held by other processes, reduce the model size or batch size, or tune the allocator, for example:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

This setting lets PyTorch manage fragmentation better by allowing memory segments to grow dynamically, which may resolve this kind of failure; see the memory-management documentation linked in the error message.

In short, the error occurs because the GPU's free memory cannot satisfy a single large allocation request, possibly aggravated by fragmentation. You can reduce other processes' use of GPU memory, adjust the model or batch size, or tune the allocation strategy via the environment variable above. A minimal way to apply that setting is sketched below.
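A minimal sketch of applying the setting from Python, assuming it is done before PyTorch initializes its CUDA allocator (setting the variable in the shell before launching the script works just as well):

```python
import os

# Must be set before the first CUDA allocation is made.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch

if torch.cuda.is_available():
    # Check how much memory is actually free on GPU 0 before allocating.
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free_bytes / 2**30:.2f} GiB free "
          f"of {total_bytes / 2**30:.2f} GiB total")
```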
```python
import itertools
from typing import (Iterable, List, Literal, Mapping, Optional, Tuple,
                    TypedDict, Union)

import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPVisionConfig, LlavaConfig, SiglipVisionConfig

from vllm.attention import AttentionMetadata
from vllm.config import CacheConfig, MultiModalConfig
from vllm.inputs import INPUT_REGISTRY, InputContext, LLMInputs
from vllm.model_executor.layers.activation import get_act_fn
from vllm.model_executor.layers.quantization import QuantizationConfig
from vllm.model_executor.layers.sampler import SamplerOutput
from vllm.model_executor.model_loader.weight_utils import default_weight_loader
from vllm.model_executor.sampling_metadata import SamplingMetadata
from vllm.multimodal import MULTIMODAL_REGISTRY
from vllm.sequence import IntermediateTensors
from vllm.utils import is_list_of

from .clip import (CLIPVisionModel, dummy_image_for_clip,
                   dummy_seq_data_for_clip, get_max_clip_image_tokens,
                   input_processor_for_clip)
from .interfaces import SupportsMultiModal
# Assumes a sibling `.siglip` module mirroring `.clip` (as in upstream vLLM);
# the SigLIP branches below rely on these helpers.
from .siglip import (SiglipVisionModel, dummy_image_for_siglip,
                     dummy_seq_data_for_siglip, get_max_siglip_image_tokens,
                     input_processor_for_siglip)
from .utils import (filter_weights, flatten_bn, init_vllm_registered_model,
                    merge_multimodal_embeddings)

class LlavaImagePixelInputs(TypedDict):
    type: Literal["pixel_values"]
    data: torch.Tensor
    """Shape: (batch_size * num_images, num_channels, height, width)"""


class LlavaImageEmbeddingInputs(TypedDict):
    type: Literal["image_embeds"]
    data: torch.Tensor
    """Shape: (batch_size * num_images, image_feature_size, hidden_size)

    `hidden_size` must match the hidden size of language model backbone.
    """


LlavaImageInputs = Union[LlavaImagePixelInputs, LlavaImageEmbeddingInputs]


class LlavaMultiModalProjector(nn.Module):

    def __init__(self, vision_hidden_size: int, text_hidden_size: int,
                 projector_hidden_act: str):
        super().__init__()

        self.linear_1 = nn.Linear(vision_hidden_size,
                                  text_hidden_size,
                                  bias=True)
        self.act = get_act_fn(projector_hidden_act)
        self.linear_2 = nn.Linear(text_hidden_size,
                                  text_hidden_size,
                                  bias=True)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        hidden_states = self.linear_1(image_features)
        hidden_states = self.act(hidden_states)
        hidden_states = self.linear_2(hidden_states)
        return hidden_states

def get_max_llava_image_tokens(ctx: InputContext):
    hf_config = ctx.get_hf_config(LlavaConfig)
    vision_config = hf_config.vision_config

    if isinstance(vision_config, CLIPVisionConfig):
        num_image_tokens = get_max_clip_image_tokens(vision_config)
    elif isinstance(vision_config, SiglipVisionConfig):
        num_image_tokens = get_max_siglip_image_tokens(vision_config)
    else:
        msg = f"Unsupported vision config: {type(vision_config)}"
        raise NotImplementedError(msg)

    strategy = hf_config.vision_feature_select_strategy
    if strategy == "default":
        return num_image_tokens - 1
    elif strategy == "full":
        return num_image_tokens
    else:
        raise ValueError(f"Unexpected select feature strategy: {strategy}")

def dummy_data_for_llava(ctx: InputContext, seq_len: int,
                         mm_counts: Mapping[str, int]):
    hf_config = ctx.get_hf_config(LlavaConfig)
    vision_config = hf_config.vision_config
    num_images = mm_counts["image"]

    image_feature_size = get_max_llava_image_tokens(ctx)

    if isinstance(vision_config, CLIPVisionConfig):
        seq_data = dummy_seq_data_for_clip(
            vision_config,
            seq_len,
            num_images,
            image_token_id=hf_config.image_token_index,
            image_feature_size_override=image_feature_size,
        )

        mm_data = dummy_image_for_clip(vision_config, num_images)
        return seq_data, mm_data
    elif isinstance(vision_config, SiglipVisionConfig):
        seq_data = dummy_seq_data_for_siglip(
            vision_config,
            seq_len,
            num_images,
            image_token_id=hf_config.image_token_index,
            image_feature_size_override=image_feature_size,
        )

        mm_data = dummy_image_for_siglip(vision_config, num_images)
        return seq_data, mm_data

    msg = f"Unsupported vision config: {type(vision_config)}"
    raise NotImplementedError(msg)

def input_processor_for_llava(ctx: InputContext, llm_inputs: LLMInputs):
    multi_modal_data = llm_inputs.get("multi_modal_data")
    if multi_modal_data is None or "image" not in multi_modal_data:
        return llm_inputs

    model_config = ctx.model_config
    hf_config = ctx.get_hf_config(LlavaConfig)
    vision_config = hf_config.vision_config

    image_data = multi_modal_data["image"]
    if isinstance(image_data, Image.Image):
        image_feature_size = get_max_llava_image_tokens(ctx)
    elif is_list_of(image_data, Image.Image):
        image_feature_size = [get_max_llava_image_tokens(ctx)
                              ] * len(image_data)
    elif isinstance(image_data, torch.Tensor):
        num_images, image_feature_size, hidden_size = image_data.shape
    elif is_list_of(image_data, torch.Tensor):
        image_feature_size = [item.shape[1] for item in image_data]
    else:
        raise TypeError(f"Invalid image type: {type(image_data)}")

    if isinstance(vision_config, CLIPVisionConfig):
        return input_processor_for_clip(
            model_config,
            vision_config,
            llm_inputs,
            image_token_id=hf_config.image_token_index,
            image_feature_size_override=image_feature_size,
        )
    elif isinstance(vision_config, SiglipVisionConfig):
        return input_processor_for_siglip(
            model_config,
            vision_config,
            llm_inputs,
            image_token_id=hf_config.image_token_index,
            image_feature_size_override=image_feature_size,
        )

    msg = f"Unsupported vision config: {type(vision_config)}"
    raise NotImplementedError(msg)

def _init_vision_tower(hf_config: LlavaConfig):
    vision_config = hf_config.vision_config

    # Initialize the vision tower only up to the required feature layer
    vision_feature_layer = hf_config.vision_feature_layer
    if vision_feature_layer < 0:
        num_hidden_layers = hf_config.vision_config.num_hidden_layers \
            + vision_feature_layer + 1
    else:
        num_hidden_layers = vision_feature_layer + 1

    if isinstance(vision_config, CLIPVisionConfig):
        return CLIPVisionModel(
            vision_config,
            num_hidden_layers_override=num_hidden_layers,
        )
    elif isinstance(vision_config, SiglipVisionConfig):
        return SiglipVisionModel(
            vision_config,
            num_hidden_layers_override=num_hidden_layers,
        )

    msg = f"Unsupported vision config: {type(vision_config)}"
    raise NotImplementedError(msg)

@MULTIMODAL_REGISTRY.register_image_input_mapper()
@MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_llava_image_tokens)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_llava)
@INPUT_REGISTRY.register_input_processor(input_processor_for_llava)
class LlavaForConditionalGeneration(nn.Module, SupportsMultiModal):

    def __init__(self,
                 config: LlavaConfig,
                 multimodal_config: MultiModalConfig,
                 cache_config: Optional[CacheConfig] = None,
                 quant_config: Optional[QuantizationConfig] = None) -> None:
        super().__init__()

        self.config = config
        self.multimodal_config = multimodal_config

        # TODO: Optionally initializes this for supporting embeddings.
        self.vision_tower = _init_vision_tower(config)
        self.multi_modal_projector = LlavaMultiModalProjector(
            vision_hidden_size=config.vision_config.hidden_size,
            text_hidden_size=config.text_config.hidden_size,
            projector_hidden_act=config.projector_hidden_act)

        self.language_model = init_vllm_registered_model(
            config.text_config, cache_config, quant_config)

    def _validate_pixel_values(self, data: torch.Tensor) -> torch.Tensor:
        h = w = self.config.vision_config.image_size
        expected_dims = (3, h, w)
        actual_dims = tuple(data.shape[1:])

        if actual_dims != expected_dims:
            expected_expr = ("batch_size", *map(str, expected_dims))
            raise ValueError(
                f"The expected shape of pixel values is {expected_expr}. "
                f"You supplied {tuple(data.shape)}.")

        return data

    def _parse_and_validate_image_input(
            self, **kwargs: object) -> Optional[LlavaImageInputs]:
        pixel_values = kwargs.pop("pixel_values", None)
        image_embeds = kwargs.pop("image_embeds", None)

        if pixel_values is None and image_embeds is None:
            return None

        if pixel_values is not None:
            if not isinstance(pixel_values, (torch.Tensor, list)):
                raise ValueError("Incorrect type of pixel values. "
                                 f"Got type: {type(pixel_values)}")

            return LlavaImagePixelInputs(
                type="pixel_values",
                data=self._validate_pixel_values(
                    flatten_bn(pixel_values, concat=True)),
            )

        if image_embeds is not None:
            if not isinstance(image_embeds, (torch.Tensor, list)):
                raise ValueError("Incorrect type of image embeddings. "
                                 f"Got type: {type(image_embeds)}")

            return LlavaImageEmbeddingInputs(
                type="image_embeds",
                data=flatten_bn(image_embeds, concat=True),
            )

        raise AssertionError("This line should be unreachable.")

    def _select_image_features(self, image_features: torch.Tensor, *,
                               strategy: str) -> torch.Tensor:
        # Copied from https://github.com/huggingface/transformers/blob/39c3c0a72af6fbda5614dde02ff236069bb79827/src/transformers/models/llava/modeling_llava.py#L421  # noqa
        if strategy == "default":
            return image_features[:, 1:]
        elif strategy == "full":
            return image_features

        raise ValueError(f"Unexpected select feature strategy: {strategy}")

    def _image_pixels_to_features(
        self,
        vision_tower: Union[CLIPVisionModel, SiglipVisionModel],
        pixel_values: torch.Tensor,
    ) -> torch.Tensor:

        # NOTE: we skip the step to select the vision feature layer since
        # this is already done inside the vision tower
        image_features = vision_tower(pixel_values)

        return self._select_image_features(
            image_features,
            strategy=self.config.vision_feature_select_strategy,
        )

    def _process_image_pixels(self,
                              inputs: LlavaImagePixelInputs) -> torch.Tensor:
        assert self.vision_tower is not None

        pixel_values = inputs["data"]

        return self._image_pixels_to_features(self.vision_tower, pixel_values)

    def _process_image_input(self,
                             image_input: LlavaImageInputs) -> torch.Tensor:
        if image_input["type"] == "image_embeds":
            return image_input["data"]

        assert self.vision_tower is not None
        image_features = self._process_image_pixels(image_input)

        return self.multi_modal_projector(image_features)

    def forward(
        self,
        input_ids: torch.Tensor,
        positions: torch.Tensor,
        kv_caches: List[torch.Tensor],
        attn_metadata: AttentionMetadata,
        intermediate_tensors: Optional[IntermediateTensors] = None,
        **kwargs: object,
    ) -> SamplerOutput:
        """Run forward pass for LLaVA-1.5.

        One key thing to understand is the `input_ids` already accounts for
        the positions of the to-be-inserted image embeddings.

        Concretely, consider a text prompt:
        `"USER: <image>\\nWhat's the content of the image?\\nASSISTANT:"`.

        Tokenizer outputs:
        `[1, 3148, 1001, 29901, 29871, 32000, 29871, 13, 5618, 29915, 29879,
        278, 2793, 310, 278, 1967, 29973, 13, 22933, 9047, 13566, 29901]`.

        To reserve space in KV cache, we have to insert placeholder tokens
        before they are inputted to the model, so the input processor prepends
        additional image tokens (denoted as `32000`), resulting in:
        `[1, 3148, 1001, 29901, 29871, 32000, ..., 32000, 29871, 13, 5618,
        29915, 29879, 278, 2793, 310, 278, 1967, 29973, 13, 22933, 9047,
        13566, 29901]`.

        We insert 575 tokens so that including the original image token in the
        input, there are a total of 576 (24 * 24) image tokens, which
        corresponds to the number of image tokens inputted to the language
        model, i.e. the number of image tokens outputted by the visual
        encoder.

        This way, the `positions` and `attn_metadata` are consistent
        with the `input_ids`.

        Args:
            input_ids: Flattened (concatenated) input_ids corresponding to a
                batch.
            pixel_values: The pixels in each input image.

        See also:
            :class:`LlavaImageInputs`
        """
        image_input = self._parse_and_validate_image_input(**kwargs)

        if image_input is not None:
            vision_embeddings = self._process_image_input(image_input)
            inputs_embeds = self.language_model.model.get_input_embeddings(
                input_ids)

            inputs_embeds = merge_multimodal_embeddings(
                input_ids, inputs_embeds, vision_embeddings,
                self.config.image_token_index)

            input_ids = None
        else:
            inputs_embeds = None

        hidden_states = self.language_model.model(input_ids,
                                                  positions,
                                                  kv_caches,
                                                  attn_metadata,
                                                  None,
                                                  inputs_embeds=inputs_embeds)

        return hidden_states

    def compute_logits(
        self,
        hidden_states: torch.Tensor,
        sampling_metadata: SamplingMetadata,
    ) -> Optional[torch.Tensor]:
        return self.language_model.compute_logits(hidden_states,
                                                  sampling_metadata)

    def sample(
        self,
        logits: torch.Tensor,
        sampling_metadata: SamplingMetadata,
    ) -> Optional[SamplerOutput]:
        return self.language_model.sample(logits, sampling_metadata)

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
        # prepare weight iterators for components
        vit_weights, mlp_weights, llm_weights = itertools.tee(weights, 3)

        # load vision encoder
        vit_weights = filter_weights(vit_weights, "vision_tower")
        self.vision_tower.load_weights(vit_weights)

        # load mlp projector
        mlp_weights = filter_weights(mlp_weights, "multi_modal_projector")
        mlp_params_dict = dict(self.multi_modal_projector.named_parameters())
        for name, loaded_weight in mlp_weights:
            param = mlp_params_dict[name]
            weight_loader = getattr(param, "weight_loader",
                                    default_weight_loader)
            weight_loader(param, loaded_weight)

        # load llm backbone
        llm_weights = filter_weights(llm_weights, "language_model")
        self.language_model.load_weights(llm_weights)
```
How do I make this use a specific GPU?
To make sure your model runs on a specific GPU, you generally move the model and its data onto that GPU device. In PyTorch you specify the device with `torch.device` and then transfer models and tensors to it with the `.to(device)` method.

Here are the steps for moving your model and input data to a specific GPU:

First, specify the target GPU with `torch.device`. Assuming you want GPU 0, define the device like this:
```python
device = torch.device('cuda:0')  # select GPU 0
```
Next, move the model to the chosen GPU. You can do this when the model is initialized, or anywhere else in the code, by calling `.to(device)`.

For example, in your `LlavaForConditionalGeneration` model you could do it during initialization:
```python
class LlavaForConditionalGeneration(nn.Module, SupportsMultiModal):

    def __init__(self,
                 config: LlavaConfig,
                 multimodal_config: MultiModalConfig,
                 cache_config: Optional[CacheConfig] = None,
                 quant_config: Optional[QuantizationConfig] = None,
                 device: torch.device = torch.device('cuda:0')) -> None:
        super().__init__()
        self.device = device  # remember the target device
        self.config = config
        self.multimodal_config = multimodal_config

        # Initialize vision_tower, multi_modal_projector and language_model
        # and move each of them to the target device.
        self.vision_tower = _init_vision_tower(config).to(self.device)
        self.multi_modal_projector = LlavaMultiModalProjector(
            vision_hidden_size=config.vision_config.hidden_size,
            text_hidden_size=config.text_config.hidden_size,
            projector_hidden_act=config.projector_hidden_act
        ).to(self.device)
        self.language_model = init_vllm_registered_model(
            config.text_config, cache_config, quant_config
        ).to(self.device)
```
After the model has been moved to the chosen GPU, the input data (such as `input_ids`, `pixel_values`, and so on) must also be moved to the same device. You can do this in the forward pass:
```python
def forward(
    self,
    input_ids: torch.Tensor,
    positions: torch.Tensor,
    kv_caches: List[torch.Tensor],
    attn_metadata: AttentionMetadata,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    **kwargs: object,
) -> SamplerOutput:
    input_ids = input_ids.to(self.device)  # move input_ids to the target device
    positions = positions.to(self.device)  # move positions to the target device

    # Process the image data.
    image_input = self._parse_and_validate_image_input(**kwargs)
    if image_input is not None:
        vision_embeddings = self._process_image_input(image_input)
        # Make sure vision_embeddings lives on the same device.
        vision_embeddings = vision_embeddings.to(self.device)
        inputs_embeds = self.language_model.model.get_input_embeddings(
            input_ids).to(self.device)
        inputs_embeds = merge_multimodal_embeddings(
            input_ids, inputs_embeds, vision_embeddings,
            self.config.image_token_index
        )
        # As in the original implementation, pass embeddings instead of ids.
        input_ids = None
    else:
        inputs_embeds = None

    hidden_states = self.language_model.model(input_ids,
                                              positions,
                                              kv_caches,
                                              attn_metadata,
                                              None,
                                              inputs_embeds=inputs_embeds)
    return hidden_states
```
Make sure everything is moved to the specified GPU with the `.to(device)` method. The most common practice is to put everything (model and data) on the same target device (such as `cuda:0`). If you are using multiple GPUs, you can also parallelize training and inference with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel`, as in the sketch below.
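A minimal, self-contained sketch of the `DataParallel` variant; the `nn.Linear` module and the `device_ids` list are illustrative placeholders:

```python
import torch
import torch.nn as nn

# Placeholder module used only to illustrate the wrapping.
model = nn.Linear(1024, 1024)

if torch.cuda.device_count() > 1:
    # Replicate the module across GPUs 0 and 1; outputs are gathered on device_ids[0].
    model = nn.DataParallel(model, device_ids=[0, 1])

model = model.to('cuda:0')

x = torch.randn(8, 1024, device='cuda:0')
y = model(x)  # the batch is split across the listed GPUs
print(y.shape)
```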
If you just want to run on one particular GPU, make sure all the relevant code uses `.to(device)` so that the model and the data are explicitly placed on the correct device, as in the short sketch below.
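A short, self-contained illustration of that pattern, using a small placeholder module instead of the full LLaVA model (constructing a real `LlavaConfig` is outside the scope of this sketch) and assuming at least one CUDA device is available:

```python
import torch
import torch.nn as nn

# Pick the target GPU explicitly; index 1 here is just an example.
device = torch.device('cuda:1' if torch.cuda.device_count() > 1 else 'cuda:0')

# Placeholder module standing in for the real model.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4)).to(device)

# Every input tensor must be created on (or moved to) the same device.
x = torch.randn(2, 16).to(device)
out = model(x)

assert out.device == device  # model and data agree on the device
```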