fix this issues: ``` C:\Users\lucas\Downloads\Scr...

Criado em: 30 de abril de 2025

Respondido usando O3 por Chat01

Pergunta

fix this issues:

text
C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda>PS_bak_new_clean-base.py --- Configuração --- font_path: C:/Users/lucas/LilitaOne-Regular.ttf font_size: 140 video_resolution: 1920x1080 video_fps: 60 base_text_color: #FFFFFF highlight_text_color: #ff0000 num_visible_lines: 4 upper_case: True background_image: capa.png frames_per_batch: 64 default_subtitle_file: legenda.psv default_output_file: video_karaoke_char_level.mp4 ffmpeg_preset: p4 ffmpeg_tune: hq ffmpeg_bitrate: 20M ffmpeg_codec: h264_nvenc vertical_shift_pixels: 130 min_char_duration: 0.01 cuda_graph_warmup_frames: 4 max_visual_fill_duration: 3.0 Arquivo de Legenda: legenda.psv Arquivo de Áudio: audio.wav Arquivo de Saída: video_karaoke_char_level.mp4 -------------------- Using GPU: 0000:01:00.0 Afinidade da CPU definida para todos os 12 cores. Processando legendas... Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado. Processamento concluído: 8 linhas visuais, 0 pausas longas detectadas. GPU initialized successfully. Iniciando criação do vídeo... Usando duração do áudio: 52.01s Duração estimada: 52.11s | Total de frames: 3127 FFmpeg command: ffmpeg -y -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt bgr24 -r 60 -i - -c:v h264_nvenc -preset p4 -b:v 20M -pix_fmt yuv420p -tune hq video_karaoke_char_level.mp4 Gerando vídeo: 0%| | 0/3127 [00:00<?, ?frames/s]Gerando 35 frames estáticos iniciais... Gerando vídeo: 1%|█ | 35/3127 [00:00<00:46, 66.63frames/s] Erro Crítico durante a criação do vídeo: CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1642, in main karaoke_creator.create_video(lines, long_pauses, output_file, audio_file) File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1411, in create_video self.cuda_processor.process_frames_streaming( File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 885, in process_frames_streaming self._render_batch(batch_frame_times, text_mask_bool, syl_meta, File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 665, in _render_batch if syl_meta["line_idx"][k] != active_line_idx: File "cupy\\_core\\core.pyx", line 1289, in cupy._core.core._ndarray_base.__nonzero__ File "cupy\\_core\\core.pyx", line 1910, in cupy._core.core._ndarray_base.get File "cupy\\cuda\\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async File "cupy_backends\\cuda\\api\\runtime.pyx", line 607, in cupy_backends.cuda.api.runtime.memcpyAsync File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered O processo foi interrompido devido a um erro. Gerando vídeo: 1%|█ | 35/3127 [00:00<01:15, 41.15frames/s] Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Recursos da GPU liberados. Finalizando Karaokê Creator. Tempo total do script: 00:00:01 Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860> Traceback (most recent call last): File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860> Traceback (most recent call last): File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Error in sys.excepthook: Original exception was:

code:

python
""" karaoke_fast.py – versão otimizada Autor: ChatGPT (OpenAI), abr/2025 """ # ---------------------------- IMPORTS ------------------------------------ import cupy as cp import numpy as np from PIL import Image, ImageDraw, ImageFont import subprocess, threading, queue, math, os, time, psutil, traceback from tqdm import tqdm # --------------------------------------------------------------------------- # ------------------------ CONFIGURAÇÃO ----------------------------------- DEFAULT_CONFIG = { "font_path": "C:/Users/lucas/LilitaOne-Regular.ttf", "font_size": 140, "video_resolution": "1920x1080", "video_fps": 60, "base_text_color": "#FFFFFF", "highlight_text_color": "#ff0000", "num_visible_lines": 4, "upper_case": True, "background_image": "capa.png", "frames_per_batch": 64, # agora valor-mínimo "default_subtitle_file": "legenda.psv", "default_output_file": "video_karaoke_char_level.mp4", "ffmpeg_preset": "p4", "ffmpeg_tune": "hq", "ffmpeg_bitrate": "20M", "ffmpeg_codec": "h264_nvenc", "vertical_shift_pixels": 130, "min_char_duration": 0.01, "cuda_graph_warmup_frames": 4, # <-- novo "max_visual_fill_duration": 3.0, # <-- agora usado globalmente } # --------------------------------------------------------------------------- # ============================== UTILS ==================================== def hex_to_bgr_cupy(hex_color: str) -> cp.ndarray: hex_color = hex_color.lstrip('#') rgb = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4)) return cp.array(rgb[::-1], dtype=cp.uint8) def get_audio_duration(audio_file_path): if not os.path.exists(audio_file_path): print(f"Aviso: Arquivo de áudio não encontrado: {audio_file_path}") return None try: command = [ "ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", audio_file_path ] result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True) return float(result.stdout.strip()) except FileNotFoundError: print("Erro: ffprobe não encontrado. Certifique-se de que o FFmpeg está no PATH.") return None except Exception as e: print(f"Erro ao obter duração do áudio: {e}") return None def load_syllables(filepath="syllables.txt"): syllable_dict = {} not_found_words = set() try: with open(filepath, 'r', encoding='utf-8') as f: for line in f: line = line.strip() if line and '|' in line: word, syllables = line.split('|', 1) syllable_dict[word.strip().lower()] = syllables.strip() except FileNotFoundError: print(f"Aviso: Arquivo de sílabas '{filepath}' não encontrado.") except Exception as e: print(f"Erro ao carregar sílabas: {e}") return syllable_dict, not_found_words # --------------------------------------------------------------------------- # ====================== TEXT RENDERER (SEM ALTERAÇÕES) =================== class TextRenderer: def __init__(self, config): self.config = config self.font_path = config["font_path"] self.font_size = config["font_size"] self.num_visible_lines = config["num_visible_lines"] self.upper_case = config["upper_case"] self.base_text_color = config["base_text_color"] self._font_cache = {} try: self.font = ImageFont.truetype(self.font_path, self.font_size) self._font_cache[self.font_size] = self.font temp_img = Image.new("RGB", (1, 1)) temp_draw = ImageDraw.Draw(temp_img) space_bbox = temp_draw.textbbox((0, 0), " ", font=self.font) try: self.space_width_ref = temp_draw.textlength(" ", font=self.font) except AttributeError: self.space_width_ref = space_bbox[2] - space_bbox[0] if space_bbox else int(self.font_size * 0.25) try: sample_bbox = self.font.getbbox("Tg") self.line_height_ref = sample_bbox[3] - sample_bbox[1] except AttributeError: sample_bbox_fallback = temp_draw.textbbox((0, 0), "Tg", font=self.font) self.line_height_ref = sample_bbox_fallback[3] - sample_bbox_fallback[1] if sample_bbox_fallback else int(self.font_size * 1.2) del temp_draw, temp_img except Exception as e: print(f"Aviso: Falha ao carregar fonte '{self.font_path}'. Usando padrão. Erro: {e}") self.font = ImageFont.load_default() try: bbox = self.font.getbbox("M"); self.font_size = bbox[3] - bbox[1] except AttributeError: self.font_size = 20 self._font_cache[self.font_size] = self.font temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img) try: self.space_width_ref = temp_draw.textlength(" ", font=self.font) except AttributeError: self.space_width_ref = 10 try: bbox = self.font.getbbox("Tg"); self.line_height_ref = bbox[3] - bbox[1] except AttributeError: self.line_height_ref = 30 del temp_draw, temp_img spacing_multiplier = 1.0 if self.num_visible_lines <= 1 else (0.8 if self.num_visible_lines == 2 else (0.6 if self.num_visible_lines == 3 else 0.4)) self.line_spacing = max(0, int(self.line_height_ref * spacing_multiplier)) def _get_font_with_size(self, size: int) -> ImageFont.FreeTypeFont: size = max(1, int(size)) if size in self._font_cache: return self._font_cache[size] try: f = ImageFont.truetype(self.font_path, size) except Exception: f = ImageFont.load_default() self._font_cache[size] = f return f def _calculate_line_width(self, line_elements, draw, font) -> int: width_total = 0 for _, _, txt, _ in line_elements: width_total += self._get_element_width(draw, txt, font) return width_total def _get_element_width(self, draw, text, font): if text == " ": return self.space_width_ref try: return draw.textlength(text, font=font) except AttributeError: try: bbox = draw.textbbox((0, 0), text, font=font); return bbox[2] - bbox[0] if bbox else 0 except AttributeError: try: width, _ = draw.textsize(text, font=font); return width except AttributeError: font_size_est = getattr(font, 'size', self.font_size // 2) return len(text) * (font_size_est // 2) except Exception: font_size_est = getattr(font, 'size', self.font_size // 2) return len(text) * (font_size_est // 2) def render_text_images(self, displayed_content, active_line_local_idx, width, height): img_base = Image.new("RGB", (width, height), (0, 0, 0)) img_mask = Image.new("L", (width, height), 0) draw_base = ImageDraw.Draw(img_base) draw_mask = ImageDraw.Draw(img_mask) max_allowed_width = int(width * 0.90) min_font_size = max(10, int(self.font_size * 0.60)) line_render_data = [] for global_idx, line_elements in displayed_content: if not (line_elements and global_idx is not None): line_render_data.append(None) continue font_line_size = self.font_size font_line = self._get_font_with_size(font_line_size) line_width_px = self._calculate_line_width(line_elements, draw_base, font_line) reduction_step = max(1, int(self.font_size * 0.05)) while line_width_px > max_allowed_width and font_line_size > min_font_size: font_line_size = max(min_font_size, font_line_size - reduction_step) font_line = self._get_font_with_size(font_line_size) line_width_px = self._calculate_line_width(line_elements, draw_base, font_line) if font_line_size == min_font_size: break try: h_ref = font_line.getbbox("Tg"); line_height_px = h_ref[3] - h_ref[1] except Exception: line_height_px = int(self.line_height_ref * (font_line_size / self.font_size)) line_render_data.append({"font": font_line, "font_size": font_line_size, "height": line_height_px, "width": line_width_px, "elements": line_elements, "global_idx": global_idx}) vertical_shift = self.config.get("vertical_shift_pixels", 0) block_height_ref = self.num_visible_lines * self.line_height_ref + (self.num_visible_lines - 1) * self.line_spacing start_y_ref = max(0, (height - block_height_ref) // 2 + vertical_shift) line_start_y_positions = [int(start_y_ref + i * (self.line_height_ref + self.line_spacing)) for i in range(self.num_visible_lines)] all_syllable_render_info = [] active_syllable_indices = (-1, -1) current_global_syl_idx = 0 sentence_end_punctuation = ".!?" for local_idx, render_info in enumerate(line_render_data): if render_info is None: continue font_line = render_info["font"] line_width_px = render_info["width"] elements_in_line = render_info["elements"] current_global_line_idx = render_info["global_idx"] is_active_line = (local_idx == active_line_local_idx) if is_active_line: active_syllable_start_idx_global = current_global_syl_idx line_start_x = (width - line_width_px) // 2 current_x = float(line_start_x) line_y_draw = line_start_y_positions[local_idx] if line_y_draw is None: continue for i, (start_time, end_time, element_text, _) in enumerate(elements_in_line): element_width = self._get_element_width(draw_base, element_text, font_line) if not element_text.isspace(): stripped_text = element_text.rstrip() is_sentence_end = bool(stripped_text and stripped_text[-1] in sentence_end_punctuation) try: draw_x, draw_y = int(current_x), line_y_draw element_text_base = element_text_mask = element_text if is_sentence_end and element_text.rstrip().endswith('.'): element_text_mask = element_text_mask.rstrip('.') draw_base.text((draw_x, draw_y), element_text_base, font=font_line, fill=self.base_text_color) if element_text_mask: draw_mask.text((draw_x, draw_y), element_text_mask, font=font_line, fill=255) final_bbox = draw_base.textbbox((draw_x, draw_y), element_text, font=font_line) if final_bbox: bbox_left, bbox_top, bbox_right, bbox_bottom = final_bbox syl_w_actual, syl_h_actual = bbox_right - bbox_left, bbox_bottom - bbox_top bbox_top_final = bbox_top else: line_height_px_fallback = render_info["height"] bbox_left, bbox_top_final = draw_x, draw_y syl_w_actual, syl_h_actual = element_width, line_height_px_fallback except Exception as e: print(f"Fallback render/bbox for: {element_text}. Err: {e}") draw_x, draw_y = int(current_x), line_y_draw try: draw_base.text((draw_x, draw_y), element_text, font=font_line, fill=self.base_text_color) draw_mask.text((draw_x, draw_y), element_text, font=font_line, fill=255) except Exception as draw_err: print(f" -> Falha até no fallback: {draw_err}") line_height_px_fallback = render_info["height"] bbox_left, bbox_top_final = draw_x, draw_y syl_w_actual, syl_h_actual = element_width, line_height_px_fallback all_syllable_render_info.append((start_time, end_time, bbox_left, bbox_top_final, syl_w_actual, syl_h_actual, current_global_line_idx, is_sentence_end)) current_global_syl_idx += 1 current_x += element_width if is_active_line: active_syllable_end_idx_global = current_global_syl_idx active_syllable_indices = (active_syllable_start_idx_global, active_syllable_end_idx_global) base_cp = cp.asarray(np.array(img_base)) mask_cp = cp.asarray(np.array(img_mask)) return base_cp, mask_cp, all_syllable_render_info, active_syllable_indices # --------------------------------------------------------------------------- # ========================== SUBTITLE PROCESSOR (SEM ALTERAÇÕES) =============== class SubtitleProcessor: def __init__(self, text_renderer: TextRenderer, config, syllable_dict, not_found_words_set): self.text_renderer = text_renderer self.config = config self.upper_case = config["upper_case"] self.font = self.text_renderer.font self.syllable_dict = syllable_dict self.not_found_words_set = not_found_words_set @staticmethod def _parse_time_string_float(time_str): try: return float(time_str) except (ValueError, TypeError): print(f"Aviso: Timestamp inesperado: {time_str}"); return None @staticmethod def read_subtitles(file): char_timing_data = [] try: with open(file, 'r', encoding='utf-8') as f: lines = f.readlines() if not lines: print(f"Aviso: Arquivo '{file}' vazio."); return [], [] header = lines[0].strip().upper() start_idx = 1 if header == "CHARACTER|START|END" else (0 if (header and '|' not in lines[0]) else 0) if start_idx==0 and header != "CHARACTER|START|END": print("Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado.") for line_num, line in enumerate(lines[start_idx:], start=start_idx + 1): if not line.strip(): continue parts = line.rstrip('\n\r').split('|') if len(parts) != 3: print(f"Aviso: Ignorando linha {line_num} mal formatada: '{line}'"); continue char, start_str, end_str = parts[0], parts[1].strip(), parts[2].strip() start_time = SubtitleProcessor._parse_time_string_float(start_str) end_time = SubtitleProcessor._parse_time_string_float(end_str) if start_time is None or end_time is None: print(f"Aviso: Ignorando linha {line_num} com timestamp inválido: '{line}'"); continue if not char: char = " " if end_time < start_time: print(f"Aviso: Corrigindo end<start na linha {line_num}: '{line}'"); end_time = start_time char_timing_data.append((start_time, end_time, str(char))) except FileNotFoundError: print(f"Erro: Arquivo PSV não encontrado: {file}"); return [], [] except Exception as e: print(f"Erro ao ler PSV: {e}"); traceback.print_exc(); return [], [] char_timing_data.sort(key=lambda x: x[0]) long_pauses = SubtitleProcessor._identify_long_pauses(char_timing_data) return char_timing_data, long_pauses @staticmethod def _identify_long_pauses(char_timing_data, min_pause_duration=5.0): pauses = [] if not char_timing_data: return pauses first_start = char_timing_data[0][0] if first_start >= min_pause_duration: pauses.append({"start": 0.0, "end": first_start, "duration": first_start, "type": "initial"}) for i in range(1, len(char_timing_data)): prev_end, curr_start = char_timing_data[i-1][1], char_timing_data[i][0] pause_dur = curr_start - prev_end if pause_dur >= min_pause_duration: is_covered = any(p["type"] == "initial" and p["end"] >= curr_start for p in pauses) if not is_covered: pauses.append({"start": prev_end, "end": curr_start, "duration": pause_dur, "type": "between"}) for i, (start, end, _) in enumerate(char_timing_data): char_dur = end - start if char_dur >= min_pause_duration: is_covered = any(abs(p["start"] - start) < 0.01 and abs(p["end"] - end) < 0.01 for p in pauses) if not is_covered: pauses.append({"start": start, "end": end, "duration": char_dur, "type": "during"}) pauses.sort(key=lambda x: x["start"]) return pauses def _group_chars_into_words(self, char_timing_data): words_spaces = [] current_word = [] for i, (start, end, char) in enumerate(char_timing_data): proc_char = char.upper() if self.upper_case else char if proc_char.isspace(): if current_word: words_spaces.append({"type": "word", "chars": current_word}); current_word = [] words_spaces.append({"type": "space", "start": start, "end": end}) else: current_word.append((start, end, proc_char)) if current_word: words_spaces.append({"type": "word", "chars": current_word}) return words_spaces def _process_words_into_syllables(self, words_and_spaces): syllable_data = [] temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img) font = self.text_renderer.font punc_strip, sent_end = ",.!?;:", ".!?" for element in words_and_spaces: if element["type"] == "space": syllable_data.append((element["start"], element["end"], " ", self.text_renderer.space_width_ref, False)) continue word_chars = element["chars"] if not word_chars: continue word_text = "".join([c[2] for c in word_chars]) cleaned_word = word_text.rstrip(punc_strip) lookup = cleaned_word.lower() if lookup in self.syllable_dict: syl_parts = self.syllable_dict[lookup].split('-') char_idx, orig_idx = 0, 0 word_syl_indices = [] for part in syl_parts: syl_len = len(part) if char_idx + syl_len > len(cleaned_word): if orig_idx < len(word_chars): rem_chars = word_chars[orig_idx:] rem_text = "".join([c[2] for c in rem_chars]) s_start, s_end = rem_chars[0][0], rem_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars) syllable_data.append((s_start, s_end, rem_text, s_width, False)) word_syl_indices.append(len(syllable_data) - 1) break syl_chars = word_chars[orig_idx : orig_idx + syl_len] if not syl_chars: continue s_text = "".join([c[2] for c in syl_chars]) s_start, s_end = syl_chars[0][0], syl_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in syl_chars) syllable_data.append((s_start, s_end, s_text, s_width, False)) word_syl_indices.append(len(syllable_data) - 1) char_idx += syl_len; orig_idx += syl_len if orig_idx < len(word_chars): # Handle trailing punctuation rem_chars = word_chars[orig_idx:] rem_text = "".join([c[2] for c in rem_chars]) expected_punc = word_text[len(cleaned_word):] if rem_text == expected_punc and word_syl_indices: # Append to last syllable last_idx = word_syl_indices[-1] ls_start, _, ls_text, _, _ = syllable_data[last_idx] new_text = ls_text + rem_text new_end = rem_chars[-1][1] new_width = self.text_renderer._get_element_width(temp_draw, new_text, font) syllable_data[last_idx] = (ls_start, new_end, new_text, new_width, False) else: # Create new syllable for remaining rem_start, rem_end = rem_chars[0][0], rem_chars[-1][1] rem_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars) syllable_data.append((rem_start, rem_end, rem_text, rem_width, False)) word_syl_indices.append(len(syllable_data) - 1) if word_syl_indices: # Mark sentence end flag on the *actual* last syllable final_syl_idx = word_syl_indices[-1] final_syl_text = syllable_data[final_syl_idx][2].rstrip() if final_syl_text and final_syl_text[-1] in sent_end: syllable_data[final_syl_idx] = syllable_data[final_syl_idx][:4] + (True,) else: # Word not in dictionary if lookup not in self.not_found_words_set and word_text.lower() == lookup: self.not_found_words_set.add(lookup) s_start, s_end = word_chars[0][0], word_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in word_chars) is_end = word_text.rstrip()[-1] in sent_end if word_text.rstrip() else False syllable_data.append((s_start, s_end, word_text, s_width, is_end)) del temp_draw, temp_img syllable_data.sort(key=lambda x: x[0]) # Post-process end times (simpler version, maybe adjust if needed) processed_data = [] for i in range(len(syllable_data)): start, end, text, width, is_end = syllable_data[i] # simplified end time logic for now processed_data.append((start, end, text, width, is_end)) return processed_data # Returning without the 'next_syl_start' for now def group_syllables_into_lines(self, syllable_timing_data, video_width): lines = [] current_line = [] for syllable_tuple in syllable_timing_data: start, end, text, width, is_end = syllable_tuple # Adjusted unpacking current_line.append((start, end, text, width)) # Store width too if is_end: while current_line and current_line[-1][2].isspace(): current_line.pop() if current_line: lines.append(current_line) current_line = [] while current_line and current_line[-1][2].isspace(): current_line.pop() if current_line: lines.append(current_line) return lines def process_subtitles_to_syllable_lines(self, file, video_width): char_data, pauses = self.read_subtitles(file) if not char_data: return [], pauses words_spaces = self._group_chars_into_words(char_data) syl_data = self._process_words_into_syllables(words_spaces) if not syl_data: print("Aviso: Nenhum dado de sílaba gerado."); return [], pauses # Optional: Add debug print here if needed lines = self.group_syllables_into_lines(syl_data, video_width) return lines, pauses # --------------------------------------------------------------------------- # ========================== GPU CONTEXT ================================== class GPURenderContext: """ Mantém buffers, grades e gráficos CUDA persistentes para evitar alocações e capturar kernels repetitivos. """ def __init__(self, width: int, height: int, cfg): self.w, self.h = width, height self.cfg = cfg self.pool = cp.cuda.MemoryPool() # pool dedicado cp.cuda.set_allocator(self.pool.malloc) # Grades X/Y (uint16 já é suficiente p/ 4 K) yy, xx = cp.mgrid[:height, :width] self.xg = xx.astype(cp.uint16) self.yg = yy.astype(cp.uint16) del xx, yy # Buffers de saída duplos (double-buffer) self.batch_cap = 0 # será ajustado na 1.ª chamada self.out_a = self.out_b = None # Will hold cp.empty(...) arrays # Máscara de progresso (barra completa) bar_h = 20 bar_y0 = 10 self.bar_mask_full = ((self.yg >= bar_y0) & (self.yg < bar_y0 + bar_h)) # (H, W) bool # Cores (pre-calculated float32 RGB) self.base_rgbf = hex_to_bgr_cupy(cfg["base_text_color"])[::-1].astype(cp.float32)/255.0 hl_bgr = hex_to_bgr_cupy(cfg["highlight_text_color"]) self.hl_rgbf = hl_bgr[::-1].astype(cp.float32)/255.0 # Darken highlight color for progress bar background dark_hl_bgr = (hl_bgr.astype(cp.float32) * 0.4).clip(0, 255).astype(cp.uint8) self.bar_bg_rgbf= dark_hl_bgr[::-1].astype(cp.float32)/255.0 # Gráfico CUDA (capturado após warm-up) self.graph = None # Will hold instantiated CUDA Graph self.pinned_mem = None # Will hold pinned memory buffer self.pinned_mem_size = 0 # -------------- helpers -------------- def ensure_batch_buffers(self, n_frames: int): """Ensures output buffers `out_a` and `out_b` can hold `n_frames`.""" if n_frames <= self.batch_cap: return # Use power of 2 for potential alignment benefits self.batch_cap = int(2 ** math.ceil(math.log2(n_frames))) shape = (self.batch_cap, self.h, self.w, 3) # Free old buffers before creating new ones if they exist if self.out_a is not None: del self.out_a if self.out_b is not None: del self.out_b self.out_a = cp.empty(shape, dtype=cp.uint8) self.out_b = cp.empty(shape, dtype=cp.uint8) # Also ensure pinned memory is large enough for one buffer self.ensure_pinned_memory(self.out_a.nbytes) def ensure_pinned_memory(self, n_bytes: int): """Ensures pinned host memory `pinned_mem` is at least `n_bytes`.""" if n_bytes <= self.pinned_mem_size: return # Free old pinned memory if it exists if self.pinned_mem is not None: try: # Explicitly free pinned memory if CuPy version supports it well # cp.cuda.runtime.freeHost(self.pinned_mem.ptr) # Or similar if available del self.pinned_mem except Exception as e: print(f"Note: Could not explicitly free old pinned memory: {e}") self.pinned_mem = None # Ensure it's reset anyway self.pinned_mem_size = n_bytes self.pinned_mem = cp.cuda.alloc_pinned_memory(self.pinned_mem_size) def get_pinned_buffer(self, required_bytes: int) -> cp.cuda.PinnedMemory: """Gets the pinned memory buffer, ensuring it's large enough.""" self.ensure_pinned_memory(required_bytes) return self.pinned_mem # --------------------------------------------------------------------------- # ========================= CUDA PROCESSOR ================================ class CUDAProcessor: """ Versão repaginada: todas as operações puramente em GPU e capturadas em um CUDA Graph após o primeiro batch. """ def __init__(self, cfg, static_bg_rgb_cp, gpu_ctx: GPURenderContext): self.cfg = cfg self.ctx = gpu_ctx # Ensure background is float32 RGB (GPU likes RGB order usually) if static_bg_rgb_cp.shape[2] == 3: # Assuming input is RGB self.bg_f = static_bg_rgb_cp.astype(cp.float32) / 255.0 else: # Fallback if not RGB self.bg_f = cp.zeros((gpu_ctx.h, gpu_ctx.w, 3), dtype=cp.float32) self.min_dur = cfg.get("min_char_duration", 0.01) self.max_vis = cfg.get("max_visual_fill_duration", 3.0) # streams self.stream_compute = cp.cuda.Stream(non_blocking=True) self.stream_h2d = cp.cuda.Stream(non_blocking=True) # For D->H copy # Elementwise kernel para progressivo da sílaba ativa # Adjusted for RGB output and input self._hl_kernel = cp.ElementwiseKernel( # Positional arguments first 'raw float32 cut_x_batch, bool text_mask, \ uint16 X, uint16 Y, \ uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, \ float32 hl_r, float32 hl_g, float32 hl_b, \ float32 base_r, float32 base_g, float32 base_b', # in_params (positional) 'float32 io_r, float32 io_g, float32 io_b', # inout_params (positional) r""" // Check bounds and text mask just once per pixel bool is_syl_pixel = (X >= syl_x) && (X < syl_x + syl_w) && (Y >= syl_y) && (Y < syl_y + syl_h) && text_mask; if (is_syl_pixel) { // Determine color based on horizontal position relative to cut_x // cut_x_batch is specific to this frame (i is batch index) // Use 'cut_x_batch' instead of 'raw_cut_x_batch' here float current_cut_x = cut_x_batch[i]; // Access per-frame cut_x if (X < current_cut_x) { // Highlighted part io_r = hl_r; io_g = hl_g; io_b = hl_b; } else { // Base part (already drawn, but ensure it's base color) io_r = base_r; io_g = base_g; io_b = base_b; } } // Pixels outside the syllable or mask are left untouched (background/other text) """, # operation (positional) # Keyword arguments after positional ones name='apply_syllable_highlight_rgb', # We pass cut_x as raw so we can index it with `i` inside the kernel # preamble='#include <cuda_fp16.h>' # Not needed for float32 ) # ------------------ núcleo p/ um batch ------------------ def _render_batch(self, frame_times_f32: cp.ndarray, # (B,) frame times text_mask_bool: cp.ndarray, # (H,W) bool, where text *could* be syl_meta: dict, # dict of cp arrays for syllables active_syl_span: tuple, # (start_idx, end_idx) for active line active_line_idx: int, completed_line_pixels_mask: cp.ndarray, # (H,W) bool, pixels of completed lines bar_progress: cp.ndarray or None, # (B,) float32 (0-1) or None out_buf: cp.ndarray): # (B,H,W,3) uint8 - OUTPUT buffer """ Executa todo o pipeline para um batch dentro de stream_compute. Chamado dentro de um CUDA Graph depois do 1.º warm-up. OUTPUTS directly into `out_buf`. """ B = frame_times_f32.shape[0] H, W = self.ctx.h, self.ctx.w # 1. Start with background, broadcasted to batch size # Ensure bg_f is (1, H, W, 3) for broadcasting with (B,...) masks inter_f = cp.broadcast_to(self.bg_f[None,...], (B, H, W, 3)).copy() # (B,H,W,3) float32 # 2. Apply progress bar (if needed for this batch) if bar_progress is not None and cp.any(bar_progress > 0): # Optimization: check if any progress # Calculate fill width per frame fill_w = (bar_progress * W).astype(cp.uint16) # (B,) uint16 is enough # Expand grid X once: (1, 1, W) for broadcasting against (B, H, 1) X_grid_exp = self.ctx.xg[None, None, :] # Shape (1, 1, W) # Expand bar mask: (1, H, W) for broadcasting against (B,...) bar_mask_exp = self.ctx.bar_mask_full[None, :, :] # Shape (1, H, W) # Expand fill_w: (B, 1, 1) for broadcasting fill_w_exp = fill_w[:, None, None] # Shape (B, 1, 1) # Create masks for this batch using broadcasting # bar_fill_mask: Where bar should be filled (True/False) for each pixel in batch bar_fill_mask = bar_mask_exp & (X_grid_exp < fill_w_exp) # (B, H, W) # bar_bg_mask: Where bar background should be (True/False) bar_bg_mask = bar_mask_exp & (~bar_fill_mask) # (B, H, W) # Apply colors using boolean masks (expand masks to 3 channels) # Ensure colors are (1, 1, 1, 3) for broadcasting bar_bg_color_exp = self.ctx.bar_bg_rgbf[None, None, None, :] bar_fill_color_exp = self.ctx.hl_rgbf[None, None, None, :] cp.copyto(inter_f, bar_bg_color_exp, where=bar_bg_mask[..., None]) cp.copyto(inter_f, bar_fill_color_exp, where=bar_fill_mask[..., None]) # Cleanup intermediate broadcasted arrays if memory is tight del fill_w, X_grid_exp, bar_mask_exp, fill_w_exp, bar_fill_mask, bar_bg_mask del bar_bg_color_exp, bar_fill_color_exp # 3. Apply base text color everywhere text *could* be (overwrites background/bar) # Ensure base color is (1, 1, 1, 3) for broadcasting base_color_exp = self.ctx.base_rgbf[None, None, None, :] # Apply base color where text_mask_bool is True (expand mask to B and C) cp.copyto(inter_f, base_color_exp, where=text_mask_bool[None, ..., None]) del base_color_exp # Cleanup # 4. Highlight completed lines (overwrites base text color) if completed_line_pixels_mask is not None: # Ensure highlight color is (1, 1, 1, 3) for broadcasting hl_color_exp = self.ctx.hl_rgbf[None, None, None, :] # Apply highlight color where completed_line_pixels_mask is True (expand mask to B and C) cp.copyto(inter_f, hl_color_exp, where=completed_line_pixels_mask[None, ..., None]) del hl_color_exp # Cleanup # 5. Highlight active syllables (the complex part) s0, s1 = active_syl_span # Syllable indices for the currently active line [s0, s1) if s1 > s0 and syl_meta is not None: # Check if the *active_line_idx* actually corresponds to syllables in this span # This avoids running the kernel if the active line has no syllables in syl_meta # (Could happen with empty lines or data issues) active_line_has_syllables_in_span = cp.any(syl_meta["line_idx"][s0:s1] == active_line_idx) if active_line_has_syllables_in_span: # Prepare data for the kernel for *all* syllables in the span # Kernel will internally filter by active_line_idx if needed, but simpler is often faster syl_indices_in_span = cp.arange(s0, s1) # Indices within syl_meta arrays # Calculate visual progress (0.0 to 1.0) for each syllable in the span, for each frame in the batch # Shape: (B, len(span)) time_elapsed = frame_times_f32[:, None] - syl_meta["start"][syl_indices_in_span] visual_progress = cp.clip(time_elapsed / syl_meta["vis_dur"][syl_indices_in_span], 0.0, 1.0) # Calculate the horizontal cutoff point (x-coordinate) for highlighting # Shape: (B, len(span)) cut_x_batch_span = syl_meta["x"][syl_indices_in_span] + \ visual_progress * syl_meta["w"][syl_indices_in_span] # Iterate through syllables *in the active span* only num_syls_in_meta = syl_meta["line_idx"].shape[0] # Get actual size of the array syl_indices_in_span_host = cp.asnumpy(syl_indices_in_span) # Bring indices to host for easier iteration/debug # Use host iteration for k to easily check bounds for i, k_host in enumerate(syl_indices_in_span_host): # Convert k_host back to int if needed, though it should be int already k = int(k_host) # --- Explicit Bounds Check --- if k < 0 or k >= num_syls_in_meta: print(f"!!! FATAL ERROR: Index k ({k}) is out of bounds for syl_meta['line_idx'] (size {num_syls_in_meta}).") print(f" s0={s0}, s1={s1}, active_line_idx={active_line_idx}") # This indicates a logic error in how indices (s0, s1) or syl_meta are generated. # Raising an error is appropriate here. raise IndexError(f"Syllable index k={k} out of bounds (size={num_syls_in_meta})") # --- End Bounds Check --- # Filter: only process syllables belonging to the truly active line # Accessing syl_meta["line_idx"][k] happens here if syl_meta["line_idx"][k] != active_line_idx: continue # Extract syllable bounding box for kernel # Ensure k is still valid before accessing other parts of syl_meta # (redundant if the first check passed, but safe) if k < 0 or k >= syl_meta["x"].shape[0]: # Check against another array size for extra safety print(f"!!! WARNING: Index k ({k}) became invalid before accessing bbox?") continue sx, sy = syl_meta["x"][k], syl_meta["y"][k] sw, sh = syl_meta["w"][k], syl_meta["h"][k] # Skip if width or height is zero if sw <= 0 or sh <= 0: continue # Get the cut_x values specific to *this syllable* across the batch # Shape: (B,) # We need the index 'i' which corresponds to the position within the span current_syl_cut_x_batch = cut_x_batch_span[:, i] # Call the ElementwiseKernel # It modifies 'inter_f' in-place # Pass grids (H,W), masks (H,W), scalar bbox, scalar colors # Pass cut_x for the batch (B,) as 'raw' # Kernel uses 'i' (batch index) to get the correct cut_x self._hl_kernel(current_syl_cut_x_batch, # raw float32 (B,) text_mask_bool, # bool (H, W) self.ctx.xg, # uint16 (H, W) self.ctx.yg, # uint16 (H, W) sx, sy, sw, sh, # uint16 scalars self.ctx.hl_rgbf[0], self.ctx.hl_rgbf[1], self.ctx.hl_rgbf[2], # float32 scalars self.ctx.base_rgbf[0], self.ctx.base_rgbf[1], self.ctx.base_rgbf[2], # float32 scalars inter_f[..., 0], # float32 (B, H, W) R channel inter_f[..., 1], # float32 (B, H, W) G channel inter_f[..., 2]) # float32 (B, H, W) B channel # REMOVED: size=B * H * W) # Let CuPy infer the size # Cleanup batch-specific syllable calculations del time_elapsed, visual_progress, cut_x_batch_span if 'current_syl_cut_x_batch' in locals(): del current_syl_cut_x_batch # 6. Convert final float32 RGB to uint8 BGR and write to output buffer # Perform conversion and channel swap in one step if possible # Ensure output is contiguous for potential speedup if needed later out_buf[:] = (inter_f[..., ::-1] * 255.0).astype(cp.uint8) # Cleanup main intermediate buffer del inter_f # ----------------------------------------------------------------------- def process_frames_streaming(self, base_cp, mask_cp, syl_info, active_indices, video_writer, # FFmpegWriter instance video_lock, # Threading lock for FFmpegWriter fps, first_frame_idx, num_frames_to_process, # Renamed n_frames -> num_frames_to_process active_line_idx, completed_lines_set, long_pauses): # Renamed bar_pauses -> long_pauses """ Processes a sequence of frames, potentially using CUDA Graphs. Handles pre-calculation, batching, GPU execution, and H2D transfer. Writes frames to FFmpeg via the video_writer. """ if num_frames_to_process <= 0: print("Warning: process_frames_streaming called with num_frames_to_process <= 0.") return # Determine batch size (use config value, but ensure it's reasonable) # Could add dynamic sizing based on VRAM here later if needed batch_size = max(32, min(self.cfg.get("frames_per_batch", 64), 512)) # Example bounds # ---------- Pré-cálculo de dados constantes (for this whole segment) --------------- # Mask where text *could* appear (remains constant for this render call) text_mask_bool = (mask_cp > 128) if mask_cp is not None else cp.zeros((self.ctx.h, self.ctx.w), dtype=bool) # Create a combined mask of *all pixels* belonging to *any completed line* completed_line_pixels_mask = None if completed_lines_set and syl_info: # Find all syllable indices belonging to completed lines completed_syl_indices = [ idx for idx, s_info in enumerate(syl_info) if s_info[6] in completed_lines_set # s_info[6] is global_line_idx ] if completed_syl_indices: # Initialize mask to False completed_line_pixels_mask = cp.zeros_like(text_mask_bool) # Iterate through completed syllables and mark their pixels in the mask X_grid, Y_grid = self.ctx.xg, self.ctx.yg for idx in completed_syl_indices: s_start, s_end, sx, sy, sw, sh, _, _ = syl_info[idx] if sw > 0 and sh > 0: # Define the bounding box for the syllable syl_bbox_mask = (X_grid >= sx) & (X_grid < sx + sw) & \ (Y_grid >= sy) & (Y_grid < sy + sh) # Combine with the general text mask and OR into the final mask completed_line_pixels_mask |= (syl_bbox_mask & text_mask_bool) del X_grid, Y_grid, completed_syl_indices # Cleanup else: # No syllables found for the completed lines, mask remains None pass # Convert syllable info into a dictionary of CuPy arrays (if syllables exist) syl_meta = None if syl_info: num_syls = len(syl_info) if num_syls > 0: syl_meta = { # Ensure types are optimal (uint16 for coords/dims, int16 for index) "start" : cp.asarray([s[0] for s in syl_info], dtype=cp.float32), "end" : cp.asarray([s[1] for s in syl_info], dtype=cp.float32), "x" : cp.asarray([s[2] for s in syl_info], dtype=cp.uint16), "y" : cp.asarray([s[3] for s in syl_info], dtype=cp.uint16), "w" : cp.asarray([s[4] for s in syl_info], dtype=cp.uint16), "h" : cp.asarray([s[5] for s in syl_info], dtype=cp.uint16), "line_idx": cp.asarray([s[6] for s in syl_info], dtype=cp.int16), # is_sentence_end (s[7]) is not directly used in rendering kernel, but kept if needed elsewhere } # Calculate effective visual duration for highlighting (clipped) raw_duration = syl_meta["end"] - syl_meta["start"] # Apply clipping: duration is at least min_dur and at most max_vis syl_meta["vis_dur"] = cp.clip(raw_duration, self.min_dur, self.max_vis).astype(cp.float32) else: # syl_info was provided but was empty list pass else: # syl_info was None pass # ---------- Execução por batches --------------------------- self.ctx.ensure_batch_buffers(batch_size) # Ensure buffers A/B exist and are large enough outA, outB = self.ctx.out_a, self.ctx.out_b # Get references # Prepare for CUDA Graph capture warmup_frames_count = self.cfg.get("cuda_graph_warmup_frames", 4) graph_warmup_done = False graph_was_captured = self.ctx.graph is not None # Check if graph exists from previous calls total_processed_in_call = 0 current_batch_start_frame = first_frame_idx while total_processed_in_call < num_frames_to_process: # Determine size of the current batch remaining_frames = num_frames_to_process - total_processed_in_call current_batch_size = min(batch_size, remaining_frames) if current_batch_size <= 0: # Should not happen, but safety check break # Get the correct output buffer slice (ping-pong) # Use total_processed_in_call to determine A or B buffer consistently buffer_index = (total_processed_in_call // batch_size) % 2 out_buf_slice = outA[:current_batch_size] if buffer_index == 0 else outB[:current_batch_size] # Calculate frame indices and times for this specific batch batch_frame_indices = cp.arange( current_batch_start_frame, current_batch_start_frame + current_batch_size, dtype=cp.int32 ) batch_frame_times = batch_frame_indices.astype(cp.float32) / fps # Determine progress bar status for this batch batch_bar_progress = None if long_pauses: # Find if any frame in this batch falls within *any* long pause batch_in_pause = cp.zeros(current_batch_size, dtype=bool) batch_progress_values = cp.zeros(current_batch_size, dtype=cp.float32) for pause in long_pauses: pause_start, pause_end, pause_duration = pause["start"], pause["end"], pause["duration"] # Find frames within this pause's time range indices_in_pause = cp.where((batch_frame_times >= pause_start) & (batch_frame_times < pause_end))[0] if indices_in_pause.size > 0: batch_in_pause[indices_in_pause] = True # Calculate progress only for frames in *this* pause progress = (batch_frame_times[indices_in_pause] - pause_start) / max(pause_duration, 1e-6) batch_progress_values[indices_in_pause] = cp.clip(progress, 0.0, 1.0) # If any frame was in a pause, use the calculated progress values if cp.any(batch_in_pause): batch_bar_progress = batch_progress_values del batch_in_pause, batch_progress_values # Cleanup pause calculation temps # ---------------- Execute _render_batch (GPU) ----------------- can_use_graph = graph_was_captured and graph_warmup_done if can_use_graph: # --- Launch existing CUDA Graph --- # Note: We assume the graph was captured with the *maximum* batch size. # We are launching it for a potentially *smaller* batch size. # This is generally okay, but less efficient if sizes vary wildly. # TODO: Potentially re-capture graph if batch size changes significantly? # For simplicity now, we launch the existing one. # We need to ensure the *input data pointers* used by the graph are updated # if they change location (e.g., if batch_frame_times points to new memory). # However, CuPy's graph API often handles this if you pass the arrays directly. # We might need explicit graph.update() if issues arise. # Assuming _render_batch inputs are stable or handled by CuPy's graph launch: try: # Ensure inputs match what the graph expects (might need updates) # For now, assume launch handles it or inputs are stable enough self.ctx.graph.launch(self.stream_compute) # Launch on the compute stream except Exception as graph_launch_err: print(f"Error launching CUDA Graph: {graph_launch_err}. Falling back.") # Fallback to regular execution for this batch self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) # Consider disabling graph use for subsequent batches if it keeps failing # graph_was_captured = False # Option: disable graph if launch fails else: # --- Regular execution or Graph Warmup/Capture --- # Use the compute stream for the rendering task with self.stream_compute: self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) # Check if warmup period is over and graph hasn't been captured yet if not graph_was_captured and total_processed_in_call >= warmup_frames_count: print(f"Capturing CUDA Graph after {total_processed_in_call + current_batch_size} frames...") try: graph = cp.cuda.Graph() with graph.capture(stream=self.stream_compute): # Re-run the *last* batch's render call inside capture self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) self.ctx.graph = graph.instantiate() graph_was_captured = True graph_warmup_done = True # Mark warmup as complete print("CUDA Graph captured successfully.") except Exception as graph_capture_err: print(f"Error capturing CUDA Graph: {graph_capture_err}. Graph will not be used.") self.ctx.graph = None # Ensure graph is None if capture failed graph_warmup_done = True # Still mark warmup done to prevent re-attempts # ----------------------------------------------------------------- # Synchronize the compute stream to ensure rendering is finished self.stream_compute.synchronize() # ---------- Asynchronous D->H Copy and Write to FFmpeg ----------- # Get the output buffer (which is BGR uint8) output_bgr_gpu = out_buf_slice # Already in BGR uint8 from _render_batch # Ensure pinned memory is ready required_bytes = output_bgr_gpu.nbytes pinned_buffer = self.ctx.get_pinned_buffer(required_bytes) # Perform async D->H copy using the H2D stream (stream_h2d is just a name here) # Source: GPU buffer, Dest: Pinned Host buffer cp.cuda.runtime.memcpyAsync( pinned_buffer.ptr, # Destination: Pinned host memory pointer output_bgr_gpu.data.ptr, # Source: GPU memory pointer required_bytes, # Size in bytes cp.cuda.runtime.memcpyDeviceToHost, # Direction self.stream_h2d.ptr # Stream to use for the copy ) # Synchronize the H2D stream to ensure copy is finished self.stream_h2d.synchronize() # Get a NumPy view of the pinned memory (no copy involved here) frame_data_np = np.frombuffer(pinned_buffer, dtype=np.uint8, count=required_bytes) # Reshape to (current_batch_size, H, W, 3) - FFmpeg expects this BGR format frames_to_write = frame_data_np.reshape(current_batch_size, self.ctx.h, self.ctx.w, 3) # Write the batch to FFmpeg using the provided writer and lock try: with video_lock: # Iterate and write frame by frame if writer expects single frames # Or write the whole batch if writer supports it (check FFmpegWriter impl.) # Assuming FFmpegWriter's write handles a single frame numpy array: for frame_idx in range(current_batch_size): video_writer.write(frames_to_write[frame_idx]) # If FFmpegWriter could handle the whole batch byte buffer: # video_writer.write_bytes(pinned_buffer.tobytes()[:required_bytes]) # Hypothetical faster write except Exception as write_err: print(f"Error writing batch to FFmpeg: {write_err}") # Decide how to handle: stop processing, log, etc. # For now, just print and continue, but might need better error handling. # Update counters total_processed_in_call += current_batch_size current_batch_start_frame += current_batch_size # Minor optimization: Check if warmup is complete after the batch is processed if not graph_warmup_done and total_processed_in_call >= warmup_frames_count: graph_warmup_done = True # Optional: Free memory aggressively if needed (usually handled by pool) # cp.get_default_memory_pool().free_all_blocks() # cp.get_default_pinned_memory_pool().free_all_blocks() # --- End of batch loop --- # Cleanup pre-calculated data if it consumes significant memory del text_mask_bool if completed_line_pixels_mask is not None: del completed_line_pixels_mask if syl_meta: for key in list(syl_meta.keys()): del syl_meta[key] del syl_meta # Pinned memory buffer (`self.ctx.pinned_mem`) is kept for reuse # --------------------------------------------------------------------------- # ===================== FFMPEG WRITER (Mostly Unchanged) =================== class FFmpegWriter: def __init__(self, output_file, width, height, fps, config, gpu_ctx: GPURenderContext = None): self.output_file = output_file self.config = config self.gpu_ctx = gpu_ctx # Optional context for pinned memory reuse self.width = width self.height = height self.frame_size_bytes = width * height * 3 # BGR24 ffmpeg_cmd = [ "ffmpeg", "-y", "-f", "rawvideo", "-vcodec", "rawvideo", "-s", f"{width}x{height}", "-pix_fmt", "bgr24", "-r", str(fps), "-i", "-", # Input from stdin # Video codec options "-c:v", config.get("ffmpeg_codec", "libx264"), # Default to libx264 if not specified "-preset", config.get("ffmpeg_preset", "medium"), "-b:v", config.get("ffmpeg_bitrate", "5M"), # Default bitrate "-pix_fmt", "yuv420p", # Common pixel format for compatibility "-tune", config.get("ffmpeg_tune", "fastdecode"), # Example tune # Output file output_file ] print(f"FFmpeg command: {' '.join(ffmpeg_cmd)}") # Log the command # Use a large buffer for stdin if possible bufsize = 10**8 # ~100MB buffer try: self.ffmpeg_process = subprocess.Popen( ffmpeg_cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, # Suppress stdout unless needed for debugging stderr=subprocess.PIPE, # Capture stderr bufsize=bufsize ) except FileNotFoundError: print("ERROR: ffmpeg command not found. Is FFmpeg installed and in your PATH?") raise except Exception as e: print(f"ERROR: Failed to start FFmpeg process: {e}") raise # Thread to read stderr without blocking self.stderr_queue = queue.Queue() self.stderr_thread = threading.Thread(target=self._read_stderr, daemon=True) self.stderr_thread.start() def _read_stderr(self): """Reads FFmpeg's stderr output line by line and puts it in a queue.""" try: # Use iter to read lines efficiently for line in iter(self.ffmpeg_process.stderr.readline, b''): self.stderr_queue.put(line.decode('utf-8', errors='replace').strip()) except Exception as e: # Handle exceptions during stderr reading if necessary self.stderr_queue.put(f"Error reading FFmpeg stderr: {e}") finally: # Signal that reading is done (optional) self.stderr_queue.put(None) # Sentinel value def write(self, frame: np.ndarray): """Writes a single frame (NumPy array, HxWxC BGR uint8) to FFmpeg.""" if self.ffmpeg_process.stdin.closed: print("Warning: Attempted to write to closed FFmpeg stdin.") return try: # Ensure frame is numpy and bytes if isinstance(frame, cp.ndarray): frame = cp.asnumpy(frame) # Convert CuPy array if necessary # Check shape and type? Optional, depends on strictness needed. # if frame.shape != (self.height, self.width, 3) or frame.dtype != np.uint8: # print(f"Warning: Invalid frame shape/type: {frame.shape} {frame.dtype}") # # Handle error? resize/convert? skip? # return self.ffmpeg_process.stdin.write(frame.tobytes()) except (OSError, BrokenPipeError) as e: print(f"ERROR writing frame to FFmpeg: {e}. FFmpeg might have terminated.") self.release() # Attempt cleanup raise # Re-raise the exception def release(self): """Closes FFmpeg stdin, waits for the process, and prints stderr.""" if self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed: try: self.ffmpeg_process.stdin.close() except OSError as e: print(f"Warning: Error closing FFmpeg stdin: {e}") # Wait for the process to finish return_code = self.ffmpeg_process.wait() # Wait for the stderr reader thread to finish self.stderr_thread.join(timeout=2.0) # Add timeout # Print accumulated stderr messages print("\n--- FFmpeg stderr ---") while not self.stderr_queue.empty(): line = self.stderr_queue.get() if line is None: break # Sentinel hit print(line) print("--- End FFmpeg stderr ---\n") if return_code != 0: print(f"Warning: FFmpeg process exited with non-zero status: {return_code}") else: print("FFmpeg process finished successfully.") # --------------------------------------------------------------------------- # ===================== KARAOKE VIDEO CREATOR (Adjusted) =================== class KaraokeVideoCreator: def __init__(self, config, text_renderer: TextRenderer): self.config = config self.fps = config["video_fps"] self.text_renderer = text_renderer self.num_visible_lines = config["num_visible_lines"] try: width, height = map(int, self.config["video_resolution"].split("x")) except ValueError: print(f"Aviso: Resolução inválida '{self.config['video_resolution']}'. Usando 1920x1080.") width, height = 1920, 1080 self.width = width self.height = height # Load background (unchanged logic) self.static_bg_frame_rgb_cp = None # Will hold CuPy array self.static_bg_frame_bgr_np = None # Fallback/initial frame use bg_path = config.get("background_image") try: if bg_path and os.path.exists(bg_path): bg_img = Image.open(bg_path).convert("RGB").resize((width, height), Image.Resampling.LANCZOS) self.static_bg_frame_rgb_cp = cp.asarray(np.array(bg_img)) # RGB for GPU self.static_bg_frame_bgr_np = np.array(bg_img)[:, :, ::-1].copy() # BGR for OpenCV/FFmpeg initial else: raise FileNotFoundError("Background not found or not specified") except Exception as e: print(f"Aviso: Falha ao carregar fundo '{bg_path}': {e}. Usando fundo preto.") self.static_bg_frame_rgb_cp = cp.zeros((height, width, 3), dtype=cp.uint8) self.static_bg_frame_bgr_np = np.zeros((height, width, 3), dtype=np.uint8) # Initialize GPU Context and CUDA Processor *here* self.init_gpu() # Ensures GPU is ready self.gpu_ctx = GPURenderContext(self.width, self.height, self.config) self.cuda_processor = CUDAProcessor( self.config, self.static_bg_frame_rgb_cp, # Pass the RGB CuPy background self.gpu_ctx # Pass the shared GPU context ) def init_gpu(self): """Initializes the GPU device and memory pool (minimal version).""" try: cp.cuda.Device(0).use() # Memory pool is handled by GPURenderContext now cp.cuda.Stream.null.synchronize() _ = cp.zeros(1) # Warm-up allocation print("GPU initialized successfully.") except cp.cuda.runtime.CUDARuntimeError as e: print(f"Erro Crítico: Falha ao inicializar CUDA: {e}") raise except Exception as e: print(f"Erro Crítico: Falha inesperada na inicialização da GPU: {e}") raise def _get_next_global_indices(self, current_displayed_indices, count=2): """Finds the next `count` global line indices not currently displayed.""" valid_indices = {idx for idx in current_displayed_indices if idx is not None} max_existing_idx = max(valid_indices) if valid_indices else -1 next_indices = [] candidate_idx = max_existing_idx + 1 while len(next_indices) < count: next_indices.append(candidate_idx) candidate_idx += 1 return next_indices def create_video(self, syllable_lines, long_pauses, output_file, audio_file_path): """Creates the karaoke video using the optimized CUDA processor.""" start_total_time = time.time() width, height = self.width, self.height N = self.num_visible_lines # Number of slots on screen # --- Determine video duration --- audio_duration = get_audio_duration(audio_file_path) last_syl_end_time = 0.0 first_syl_start_time = 0.0 if syllable_lines: # Find last valid line and its end time last_valid_line_idx = next((idx for idx in range(len(syllable_lines) - 1, -1, -1) if syllable_lines[idx]), -1) if last_valid_line_idx != -1 and syllable_lines[last_valid_line_idx]: # Get end time of the last element in the last valid line last_syl_end_time = syllable_lines[last_valid_line_idx][-1][1] # index 1 is end_time # Find first valid line and its start time first_valid_line_idx = next((idx for idx, ln in enumerate(syllable_lines) if ln), -1) if first_valid_line_idx != -1 and syllable_lines[first_valid_line_idx]: first_syl_start_time = syllable_lines[first_valid_line_idx][0][0] # index 0 is start_time else: print("Aviso: Nenhuma linha de sílaba para processar.") last_syl_end_time = 1.0 # Default minimum duration if no syllables # Set video end time based on audio or subtitles, adding a small buffer video_end_time = last_syl_end_time + 0.5 # Default end based on subtitles if audio_duration is not None: video_end_time = max(video_end_time, audio_duration + 0.1) print(f"Usando duração do áudio: {audio_duration:.2f}s") else: print(f"Aviso: Sem duração de áudio. Vídeo terminará em {video_end_time:.2f}s (baseado nas legendas).") total_frames = math.ceil(video_end_time * self.fps) if total_frames <= 0: print("Erro: Duração total do vídeo calculada como 0 ou negativa. Abortando.") return print(f"Duração estimada: {video_end_time:.2f}s | Total de frames: {total_frames}") # --- Initialize FFmpeg Writer --- try: # Pass gpu_ctx if FFmpegWriter can use it for pinned memory video_writer = FFmpegWriter(output_file, width, height, self.fps, self.config, self.gpu_ctx) except Exception as e: print(f"Erro Crítico: Falha ao inicializar FFmpegWriter: {e}") return video_lock = threading.Lock() # Lock for thread-safe writing to FFmpeg process # --- Main Processing Loop --- current_frame_index = 0 num_total_lines = len(syllable_lines) pbar = tqdm(total=total_frames, unit="frames", desc="Gerando vídeo", bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]") # State variables for line display logic # displayed_content format: list of N tuples: (global_line_idx | None, line_data | None) displayed_content = [] for idx in range(N): line_data = syllable_lines[idx] if idx < num_total_lines else None displayed_content.append((idx if line_data else None, line_data)) completed_global_line_indices = set() # Keep track of lines fully processed prev_render_data_for_fill = None # Store last valid render data for filling gaps # --- Initial Static Frames (before first syllable) --- initial_static_frames = 0 if first_syl_start_time > 0.01: # Add a small tolerance initial_static_frames = max(0, int(first_syl_start_time * self.fps)) if initial_static_frames > 0: print(f"Gerando {initial_static_frames} frames estáticos iniciais...") try: # Render the initial state (first N lines, no highlight) initial_base_cp, initial_mask_cp, initial_syl_info, _ = self.text_renderer.render_text_images( displayed_content, -1, width, height # -1 indicates no active line ) if initial_base_cp is not None and initial_mask_cp is not None: # Use process_frames_streaming to render static frames # Pass empty syllable info or handle appropriately in CUDAProcessor if needed self.cuda_processor.process_frames_streaming( initial_base_cp, initial_mask_cp, [], # No active syllables initially (-1, -1), video_writer, video_lock, self.fps, 0, initial_static_frames, # Frame range -1, # No active line index set(), # No completed lines initially long_pauses # Pass pauses for potential initial progress bar ) prev_render_data_for_fill = (initial_base_cp.copy(), initial_mask_cp.copy()) # Store for potential gap filling del initial_base_cp, initial_mask_cp, initial_syl_info else: print("Aviso: Renderização inicial falhou. Preenchendo com background.") # Fallback: write raw background frames with video_lock: for _ in range(initial_static_frames): video_writer.write(self.static_bg_frame_bgr_np) prev_render_data_for_fill = None pbar.update(initial_static_frames) current_frame_index = initial_static_frames except Exception as e: print(f"Erro ao gerar frames iniciais: {e}") traceback.print_exc() # Fallback if rendering fails with video_lock: for _ in range(initial_static_frames): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(initial_static_frames) current_frame_index = initial_static_frames prev_render_data_for_fill = None # --- Process Each Line/Sentence --- last_line_processed_frame = current_frame_index trigger_1_pending_for_line = -1 # Track if top needs update trigger_2_pending = False # Track if bottom needs update last_trigger_1_line_completed = -1 last_trigger_2_line_completed = -1 for current_global_line_idx, current_line_syllables in enumerate(syllable_lines): if not current_line_syllables: print(f"Aviso: Pulando linha global vazia {current_global_line_idx}") continue line_start_time = current_line_syllables[0][0] line_end_time = current_line_syllables[-1][1] line_start_frame = int(line_start_time * self.fps) # --- Handle Line Transitions and Content Updates --- # Check if Trigger 2 (bottom update) is pending from the *previous* line completion if trigger_2_pending and current_global_line_idx != last_trigger_2_line_completed: print(f"Trigger 2: Atualizando slots inferiores antes da linha {current_global_line_idx}") current_indices_on_screen = [content[0] for content in displayed_content] next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2 available new_idx_bottom1 = next_indices[0] new_data_bottom1 = syllable_lines[new_idx_bottom1] if new_idx_bottom1 < num_total_lines else None displayed_content[N-2] = (new_idx_bottom1 if new_data_bottom1 else None, new_data_bottom1) new_idx_bottom2 = next_indices[1] new_data_bottom2 = syllable_lines[new_idx_bottom2] if new_idx_bottom2 < num_total_lines else None displayed_content[N-1] = (new_idx_bottom2 if new_data_bottom2 else None, new_data_bottom2) trigger_2_pending = False # Reset trigger last_trigger_2_line_completed = current_global_line_idx # Mark update time # Find the local slot index for the current global line active_local_idx = -1 for local_idx, (global_idx, _) in enumerate(displayed_content): if global_idx == current_global_line_idx: active_local_idx = local_idx break if active_local_idx == -1: print(f"ERRO FATAL: Linha ativa {current_global_line_idx} não encontrada nos slots! {displayed_content}") # This indicates a logic error in content updates. Need to decide how to handle. # Option 1: Try to recover (e.g., force it into the last slot?) - Risky # Option 2: Abort or skip processing this line and fill time - Safer # For now, fill time and skip processing the line's content frames_to_fill_until_next = 0 next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] next_start_frame = int(next_line_start_time * self.fps) frames_to_fill_until_next = max(0, next_start_frame - current_frame_index) else: # No more lines frames_to_fill_until_next = max(0, total_frames - current_frame_index) if frames_to_fill_until_next > 0: print(f" Preenchendo {frames_to_fill_until_next} frames devido à linha não encontrada...") # Use previous render data if available for filling if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, frames_to_fill_until_next, -1, completed_global_line_indices, long_pauses ) else: # Fallback to background with video_lock: for _ in range(frames_to_fill_until_next): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(frames_to_fill_until_next) current_frame_index += frames_to_fill_until_next continue # Skip to the next line in syllable_lines # --- Fill Gap Before Line Starts (if needed) --- frames_to_fill_gap = max(0, line_start_frame - current_frame_index) if frames_to_fill_gap > 0: print(f"Preenchendo {frames_to_fill_gap} frames de gap antes da linha {current_global_line_idx}...") if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, frames_to_fill_gap, -1, completed_global_line_indices, long_pauses # No active line during gap ) else: # Fallback with video_lock: for _ in range(frames_to_fill_gap): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(frames_to_fill_gap) current_frame_index += frames_to_fill_gap # --- Render the current state with the active line --- render_success = False render_data = None try: # Render base text and mask for the current arrangement of lines render_data = self.text_renderer.render_text_images( displayed_content, active_local_idx, width, height ) base_cp, mask_cp, all_syl_info, active_indices = render_data if base_cp is None or mask_cp is None: raise ValueError("TextRenderer returned None for base or mask.") render_success = True except Exception as e: print(f"Erro Crítico ao renderizar texto para linha {current_global_line_idx}: {e}") traceback.print_exc() render_success = False # Fallback handled below # --- Process the frames for this line's duration --- if render_success: base_cp, mask_cp, all_syl_info, active_indices = render_data # Determine the end frame for processing this line's active state # Usually ends when the next line starts, or at video end next_line_start_time = float('inf') next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] # End processing at the earliest of: line end + buffer, next line start, video end processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time) # Add small buffer? processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames) # Calculate number of frames to generate for this active line state effective_start_frame = max(line_start_frame, current_frame_index) # Start from current pos if line started earlier num_frames_for_line = max(0, processing_end_frame - effective_start_frame) if num_frames_for_line > 0: # Check if Trigger 1 needs to be set (update top half) is_penultimate_line_slot = (active_local_idx == N - 2) trigger_1_frame = -1 if is_penultimate_line_slot and current_global_line_idx != last_trigger_1_line_completed: line_duration = max(line_end_time - line_start_time, 0.01) midpoint_time = line_start_time + line_duration / 2.0 trigger_1_frame = int(midpoint_time * self.fps) trigger_1_pending_for_line = current_global_line_idx # Mark which line set the trigger # Call the optimized CUDA processor self.cuda_processor.process_frames_streaming( base_cp, mask_cp, all_syl_info, active_indices, video_writer, video_lock, self.fps, effective_start_frame, num_frames_for_line, current_global_line_idx, # Pass the *active* line index completed_global_line_indices, # Pass completed lines for highlighting long_pauses # Pass pauses for progress bar ) processed_frames_end = effective_start_frame + num_frames_for_line # --- Handle Trigger 1 (Top Update) --- # Check if the trigger frame occurred within the processed frames if trigger_1_pending_for_line == current_global_line_idx and trigger_1_frame != -1 and processed_frames_end > trigger_1_frame: print(f"Trigger 1: Atualizando slots superiores durante linha {current_global_line_idx}") current_indices_on_screen = [content[0] for content in displayed_content] next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2 new_idx_top1 = next_indices[0] new_data_top1 = syllable_lines[new_idx_top1] if new_idx_top1 < num_total_lines else None displayed_content[0] = (new_idx_top1 if new_data_top1 else None, new_data_top1) new_idx_top2 = next_indices[1] new_data_top2 = syllable_lines[new_idx_top2] if new_idx_top2 < num_total_lines else None displayed_content[1] = (new_idx_top2 if new_data_top2 else None, new_data_top2) trigger_1_pending_for_line = -1 # Reset trigger last_trigger_1_line_completed = current_global_line_idx # Mark update time # Update progress bar and current frame index # pbar update happens inside cuda_processor call now (implicitly via frame count) current_frame_index = processed_frames_end # Store the rendered mask/base for potential gap filling later prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy()) del base_cp, mask_cp, all_syl_info # Free GPU memory for this line's render else: # No frames needed processing (e.g., line ends before it starts relative to current_frame_index) print(f"Aviso: Linha {current_global_line_idx} não precisou de processamento de frames ({num_frames_for_line}).") # Ensure prev_render_data is updated even if no processing happened, using the latest render prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy()) if base_cp is not None and mask_cp is not None else prev_render_data_for_fill # Mark the current global line as completed *after* processing its frames completed_global_line_indices.add(current_global_line_idx) # --- Handle Trigger 2 (Bottom Update) --- # Check if this line was the last one in the visible slots is_last_line_slot = (active_local_idx == N - 1) if is_last_line_slot: trigger_2_pending = True # Set trigger for the *next* line to handle else: # render_success was False print(f"Aviso: Falha na renderização para linha {current_global_line_idx}. Preenchendo tempo...") # Calculate fill duration similar to the success case next_line_start_time = float('inf') next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time) processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames) effective_start_frame = max(line_start_frame, current_frame_index) num_frames_to_fill = max(0, processing_end_frame - effective_start_frame) if num_frames_to_fill > 0: if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, effective_start_frame, num_frames_to_fill, -1, completed_global_line_indices, long_pauses # No active line ) else: # Fallback with video_lock: for _ in range(num_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(num_frames_to_fill) current_frame_index += num_frames_to_fill # Still mark line as 'completed' in terms of timing, even if render failed completed_global_line_indices.add(current_global_line_idx) # Clean up GPU memory periodically if needed (usually handled by pool) cp.get_default_memory_pool().free_all_blocks() # --- Fill Remaining Frames (after last line) --- final_frames_to_fill = total_frames - current_frame_index if final_frames_to_fill > 0: print(f"Preenchendo {final_frames_to_fill} frames finais...") if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill # Optionally add a fade-out effect here by manipulating the mask or base over time # For simplicity, just hold the last rendered state self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, final_frames_to_fill, -1, completed_global_line_indices, long_pauses # No active line ) del fill_base, fill_mask # Cleanup last stored render data else: # Fallback with video_lock: for _ in range(final_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(final_frames_to_fill) current_frame_index += final_frames_to_fill # --- Cleanup --- pbar.close() video_writer.release() # Close FFmpeg process and print stderr if prev_render_data_for_fill: # Should be None, but just in case del prev_render_data_for_fill del displayed_content[:] # Clear display list # Explicitly clear GPU context graph if desired if self.gpu_ctx.graph: del self.gpu_ctx.graph self.gpu_ctx.graph = None # Optionally clear pinned memory # if self.gpu_ctx.pinned_mem: del self.gpu_ctx.pinned_mem; self.gpu_ctx.pinned_mem = None cp.get_default_memory_pool().free_all_blocks() cp.get_default_pinned_memory_pool().free_all_blocks() end_total_time = time.time() print(f"Criação do vídeo concluída em {time.strftime('%H:%M:%S', time.gmtime(end_total_time - start_total_time))}.") # --------------------------------------------------------------------------- # ============================= MAIN ====================================== def main(): start_main_time = time.time() config = DEFAULT_CONFIG.copy() # Start with defaults # --- Argument Parsing (Example - replace with your preferred method) --- # import argparse # parser = argparse.ArgumentParser(description="Karaoke Video Creator") # parser.add_argument("-s", "--subtitles", default=config["default_subtitle_file"], help="Input subtitle file (PSV format)") # parser.add_argument("-a", "--audio", default="audio.wav", help="Input audio file") # parser.add_argument("-o", "--output", default=config["default_output_file"], help="Output video file") # parser.add_argument("--font", default=config["font_path"], help="Path to TTF font file") # # Add other config options as needed # args = parser.parse_args() # # # Update config from args # config["default_subtitle_file"] = args.subtitles # config["default_output_file"] = args.output # config["font_path"] = args.font # audio_file = args.audio # subtitle_file = args.subtitles # output_file = args.output # --- Using defaults defined in DEFAULT_CONFIG for simplicity --- subtitle_file = config.get("default_subtitle_file", "legenda.psv") output_file = config.get("default_output_file", "video_karaoke_char_level.mp4") audio_file = "audio.wav" # Hardcoded for now, consider argparse print("--- Configuração ---") for key, value in config.items(): print(f" {key}: {value}") print(f" Arquivo de Legenda: {subtitle_file}") print(f" Arquivo de Áudio: {audio_file}") print(f" Arquivo de Saída: {output_file}") print("--------------------\n") # --- Basic Checks --- if not os.path.exists(subtitle_file): print(f"Erro Crítico: Arquivo de legenda '{subtitle_file}' não encontrado.") return if not os.path.exists(audio_file): print(f"Aviso: Arquivo de áudio '{audio_file}' não encontrado. Duração baseada nas legendas.") # --- Initialize GPU & CPU Affinity --- try: cp.cuda.Device(0).use() print(f"Using GPU: {cp.cuda.Device(0).pci_bus_id}") except cp.cuda.runtime.CUDARuntimeError as e: if 'no CUDA-capable device is detected' in str(e): print("Erro Crítico: Nenhuma GPU CUDA detectada.") elif 'CUDA driver version is insufficient' in str(e): print("Erro Crítico: Driver NVIDIA CUDA desatualizado.") else: print(f"Erro Crítico: Falha ao inicializar CUDA: {e}") return except Exception as e: print(f"Erro inesperado na inicialização da GPU: {e}") return try: process = psutil.Process() affinity = list(range(os.cpu_count())) process.cpu_affinity(affinity) print(f"Afinidade da CPU definida para todos os {len(affinity)} cores.") except (ImportError, AttributeError, OSError, ValueError) as e: print(f"Aviso: Não foi possível definir afinidade da CPU: {e}") # --- Initialize Core Components --- try: text_renderer = TextRenderer(config) syllable_dict, not_found_words = load_syllables() # Load syllable dictionary if not syllable_dict: print("Aviso: Dicionário de sílabas vazio ou não carregado.") subtitle_processor = SubtitleProcessor(text_renderer, config, syllable_dict, not_found_words) except Exception as e: print(f"Erro Crítico ao inicializar componentes: {e}") traceback.print_exc() return # --- Process Subtitles --- lines = [] long_pauses = [] try: video_width, _ = map(int, config["video_resolution"].split("x")) print("Processando legendas...") lines, long_pauses = subtitle_processor.process_subtitles_to_syllable_lines(subtitle_file, video_width) print(f"Processamento concluído: {len(lines)} linhas visuais, {len(long_pauses)} pausas longas detectadas.") if not lines and not long_pauses: print("Aviso: Nenhuma linha visual ou pausa longa encontrada. O vídeo pode ficar vazio ou curto.") # Decide if you want to exit here or proceed with potentially empty video # return if not_found_words: print(f"\nAviso: {len(not_found_words)} palavras não encontradas no dicionário de sílabas:") print(" ", ", ".join(sorted(list(not_found_words))[:20]) + ("..." if len(not_found_words) > 20 else "")) print(" -> Estas palavras serão destacadas como um todo.\n") except Exception as e: print(f"Erro Crítico ao processar legendas: {e}") traceback.print_exc() return # --- Create Video --- if not lines and not long_pauses: print("Nenhuma linha visual ou pausa longa para processar. Saindo.") return try: # KaraokeVideoCreator now initializes GPU context and CUDA processor internally karaoke_creator = KaraokeVideoCreator(config, text_renderer) print("Iniciando criação do vídeo...") karaoke_creator.create_video(lines, long_pauses, output_file, audio_file) except Exception as e: print(f"\nErro Crítico durante a criação do vídeo:") print(f"{type(e).__name__}: {e}") traceback.print_exc() print("\nO processo foi interrompido devido a um erro.") finally: # Final GPU memory cleanup cp.get_default_memory_pool().free_all_blocks() cp.get_default_pinned_memory_pool().free_all_blocks() print("Recursos da GPU liberados.") end_main_time = time.time() total_duration_str = time.strftime('%H:%M:%S', time.gmtime(end_main_time - start_main_time)) print(f"\nFinalizando Karaokê Creator. Tempo total do script: {total_duration_str}") if __name__ == "__main__": main()

Pensamento

Addressing CUDARuntimeError

The user seems to be working on debugging a CUDARuntimeError related to illegal memory access. The issue appears to occur in the _render_batch function, particularly when iterating over syl_meta["line_idx"][k]. The stack trace indicates that the error happens at this check when the batch size is greater than 1. The problem may be related to memory access issues, such as accessing arrays after they've been deallocated or using invalid indices. I should focus on checking array bounds and deallocation timing.

Fixing kernel bug in code

The error seems to stem from incorrect indexing within the elementwise kernel. The kernel passes cut_x_batch[i], but the indexing is wrong because i corresponds to pixel index, not batch index. This causes an out-of-bounds memory read. To fix this, consider broadcasting cut_x per frame or restructuring the kernel to avoid the elementwise kernel inside loops. A simpler fix

Thought for 32 seconds

Resposta

Por que o erro acontece

O cudaErrorIllegalAddress costuma aparecer quando um kernel acessa memória fora do limite do array que recebeu.
No seu “hot–spot” (_hl_kernel) isso acontece porque:

  1. A variável implícita i do ElementwiseKernel não representa o índice do frame, mas sim o índice absoluto do elemento que está sendo executado (pixel).

    cpp
    float current_cut_x = cut_x_batch[i]; // &larr; i pode ser maior que len(cut_x_batch)

    Como você passa um vector cut_x_batch com apenas B valores e lança o kernel sobre
    B &times; H &times; W elementos, a maior parte das threads lê fora dos limites -– memória ilegal.

  2. Depois que o kernel corrompe memória, a primeira operação que toca em syl_meta["line_idx"][k] (linha 665) quebra e a exceção “salta” aí.


Correção mínima (segura e rápida)

Execute o kernel uma vez por frame e passe cut_x como escalar, em vez de tentar indexá-lo dentro do kernel.

python
# 1️⃣ Redefina o kernel para receber apenas um corte escalar self._hl_kernel = cp.ElementwiseKernel( 'float32 cut_x, bool text_mask, uint16 X, uint16 Y, ' 'uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, ' 'float32 hl_r, float32 hl_g, float32 hl_b, ' 'float32 base_r, float32 base_g, float32 base_b', 'float32 io_r, float32 io_g, float32 io_b', r''' bool in_bbox = (X >= syl_x) && (X < syl_x + syl_w) && (Y >= syl_y) && (Y < syl_y + syl_h) && text_mask; if (in_bbox) { if (X < cut_x) { io_r = hl_r; io_g = hl_g; io_b = hl_b; // parte já “cantada” } else { io_r = base_r; io_g = base_g; io_b = base_b; // ainda por cantar } } ''', name='apply_syllable_highlight_rgb_oneframe' )
python
# 2️⃣ Dentro de _render_batch: for k in syl_indices_in_span: # um loop por sílaba if syl_meta["line_idx"][k] != active_line_idx: continue sx, sy, sw, sh = map(int, (syl_meta["x"][k], syl_meta["y"][k], syl_meta["w"][k], syl_meta["h"][k])) if sw == 0 or sh == 0: continue # shape: (B,) &rarr; percorremos frame-a-frame cuts = cut_x_batch_span[:, k - s0] # s0 é início da span for f in range(B): self._hl_kernel( cuts[f].astype(cp.float32), # 🔸 passa ESCALAR text_mask_bool, self.ctx.xg, self.ctx.yg, sx, sy, sw, sh, *self.ctx.hl_rgbf, *self.ctx.base_rgbf, inter_f[f, ..., 0], # canais do frame f inter_f[f, ..., 1], inter_f[f, ..., 2] )

Este ajuste garante:

  • Nenhum acesso fora dos limites (porque cada kernel “vê” só um valor cut_x).
  • Simples de entender/manter; a sobrecarga é irrelevante (B ≤ 128, por padrão).

Outras melhorias recomendadas

ProblemaComo resolver
syl_indices_in_span_host = cp.asnumpy(...) faz ida/volta CPU⇄GPU para cada batchFaça o loop inteiro em GPU: for k in syl_indices_in_span.tolist(): ... ou use int(k) diretamente (já está em device).
Muitas conversões copy() de imagensArmazene prev_render_data_for_fill como view (base_cp.copy() não é necessário; o array é imutável nesse ponto).
Pools não são esvaziadosDepois dos grandes loops: cp.get_default_memory_pool().free_all_blocks() já é suficiente; evite chamá-lo dentro de cada batch.
frames_per_batch fixo = 64Extraia da VRAM disponível (cp.cuda.runtime.memGetInfo) para usar lotes maiores quando possível.

Teste rápido

  1. Aplique o patch acima.
  2. Rode apenas 10 frames para validar (use flag --dry-run ou limite manual).
  3. Verifique uso de VRAM no nvidia-smi – não deve mais haver spikes/erros.
  4. Se tudo ok, volte ao vídeo completo.

Resumo: o kernel estava acessando além do vector de cortes.
Splittar a chamada por frame (ou redesenhar o kernel para 3-D launch) elimina o acesso ilegal e estabiliza o pipeline.

Compartilhar esta Q&A