fix this issues: ``` C:\Users\lucas\Downloads\Scr...
Criado em: 30 de abril de 2025
Criado em: 30 de abril de 2025
fix this issues:
textC:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda>PS_bak_new_clean-base.py --- Configuração --- font_path: C:/Users/lucas/LilitaOne-Regular.ttf font_size: 140 video_resolution: 1920x1080 video_fps: 60 base_text_color: #FFFFFF highlight_text_color: #ff0000 num_visible_lines: 4 upper_case: True background_image: capa.png frames_per_batch: 64 default_subtitle_file: legenda.psv default_output_file: video_karaoke_char_level.mp4 ffmpeg_preset: p4 ffmpeg_tune: hq ffmpeg_bitrate: 20M ffmpeg_codec: h264_nvenc vertical_shift_pixels: 130 min_char_duration: 0.01 cuda_graph_warmup_frames: 4 max_visual_fill_duration: 3.0 Arquivo de Legenda: legenda.psv Arquivo de Áudio: audio.wav Arquivo de Saída: video_karaoke_char_level.mp4 -------------------- Using GPU: 0000:01:00.0 Afinidade da CPU definida para todos os 12 cores. Processando legendas... Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado. Processamento concluído: 8 linhas visuais, 0 pausas longas detectadas. GPU initialized successfully. Iniciando criação do vídeo... Usando duração do áudio: 52.01s Duração estimada: 52.11s | Total de frames: 3127 FFmpeg command: ffmpeg -y -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt bgr24 -r 60 -i - -c:v h264_nvenc -preset p4 -b:v 20M -pix_fmt yuv420p -tune hq video_karaoke_char_level.mp4 Gerando vídeo: 0%| | 0/3127 [00:00<?, ?frames/s]Gerando 35 frames estáticos iniciais... Gerando vídeo: 1%|█ | 35/3127 [00:00<00:46, 66.63frames/s] Erro Crítico durante a criação do vídeo: CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1642, in main karaoke_creator.create_video(lines, long_pauses, output_file, audio_file) File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1411, in create_video self.cuda_processor.process_frames_streaming( File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 885, in process_frames_streaming self._render_batch(batch_frame_times, text_mask_bool, syl_meta, File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 665, in _render_batch if syl_meta["line_idx"][k] != active_line_idx: File "cupy\\_core\\core.pyx", line 1289, in cupy._core.core._ndarray_base.__nonzero__ File "cupy\\_core\\core.pyx", line 1910, in cupy._core.core._ndarray_base.get File "cupy\\cuda\\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async File "cupy_backends\\cuda\\api\\runtime.pyx", line 607, in cupy_backends.cuda.api.runtime.memcpyAsync File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered O processo foi interrompido devido a um erro. Gerando vídeo: 1%|█ | 35/3127 [00:00<01:15, 41.15frames/s] Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00> Traceback (most recent call last): File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Recursos da GPU liberados. Finalizando Karaokê Creator. Tempo total do script: 00:00:01 Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860> Traceback (most recent call last): File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860> Traceback (most recent call last): File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__ File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Exception ignored in: 'cupy.cuda.function.Module.__dealloc__' Traceback (most recent call last): File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered Error in sys.excepthook: Original exception was:
code:
python""" karaoke_fast.py – versão otimizada Autor: ChatGPT (OpenAI), abr/2025 """ # ---------------------------- IMPORTS ------------------------------------ import cupy as cp import numpy as np from PIL import Image, ImageDraw, ImageFont import subprocess, threading, queue, math, os, time, psutil, traceback from tqdm import tqdm # --------------------------------------------------------------------------- # ------------------------ CONFIGURAÇÃO ----------------------------------- DEFAULT_CONFIG = { "font_path": "C:/Users/lucas/LilitaOne-Regular.ttf", "font_size": 140, "video_resolution": "1920x1080", "video_fps": 60, "base_text_color": "#FFFFFF", "highlight_text_color": "#ff0000", "num_visible_lines": 4, "upper_case": True, "background_image": "capa.png", "frames_per_batch": 64, # agora valor-mínimo "default_subtitle_file": "legenda.psv", "default_output_file": "video_karaoke_char_level.mp4", "ffmpeg_preset": "p4", "ffmpeg_tune": "hq", "ffmpeg_bitrate": "20M", "ffmpeg_codec": "h264_nvenc", "vertical_shift_pixels": 130, "min_char_duration": 0.01, "cuda_graph_warmup_frames": 4, # <-- novo "max_visual_fill_duration": 3.0, # <-- agora usado globalmente } # --------------------------------------------------------------------------- # ============================== UTILS ==================================== def hex_to_bgr_cupy(hex_color: str) -> cp.ndarray: hex_color = hex_color.lstrip('#') rgb = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4)) return cp.array(rgb[::-1], dtype=cp.uint8) def get_audio_duration(audio_file_path): if not os.path.exists(audio_file_path): print(f"Aviso: Arquivo de áudio não encontrado: {audio_file_path}") return None try: command = [ "ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", audio_file_path ] result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True) return float(result.stdout.strip()) except FileNotFoundError: print("Erro: ffprobe não encontrado. Certifique-se de que o FFmpeg está no PATH.") return None except Exception as e: print(f"Erro ao obter duração do áudio: {e}") return None def load_syllables(filepath="syllables.txt"): syllable_dict = {} not_found_words = set() try: with open(filepath, 'r', encoding='utf-8') as f: for line in f: line = line.strip() if line and '|' in line: word, syllables = line.split('|', 1) syllable_dict[word.strip().lower()] = syllables.strip() except FileNotFoundError: print(f"Aviso: Arquivo de sílabas '{filepath}' não encontrado.") except Exception as e: print(f"Erro ao carregar sílabas: {e}") return syllable_dict, not_found_words # --------------------------------------------------------------------------- # ====================== TEXT RENDERER (SEM ALTERAÇÕES) =================== class TextRenderer: def __init__(self, config): self.config = config self.font_path = config["font_path"] self.font_size = config["font_size"] self.num_visible_lines = config["num_visible_lines"] self.upper_case = config["upper_case"] self.base_text_color = config["base_text_color"] self._font_cache = {} try: self.font = ImageFont.truetype(self.font_path, self.font_size) self._font_cache[self.font_size] = self.font temp_img = Image.new("RGB", (1, 1)) temp_draw = ImageDraw.Draw(temp_img) space_bbox = temp_draw.textbbox((0, 0), " ", font=self.font) try: self.space_width_ref = temp_draw.textlength(" ", font=self.font) except AttributeError: self.space_width_ref = space_bbox[2] - space_bbox[0] if space_bbox else int(self.font_size * 0.25) try: sample_bbox = self.font.getbbox("Tg") self.line_height_ref = sample_bbox[3] - sample_bbox[1] except AttributeError: sample_bbox_fallback = temp_draw.textbbox((0, 0), "Tg", font=self.font) self.line_height_ref = sample_bbox_fallback[3] - sample_bbox_fallback[1] if sample_bbox_fallback else int(self.font_size * 1.2) del temp_draw, temp_img except Exception as e: print(f"Aviso: Falha ao carregar fonte '{self.font_path}'. Usando padrão. Erro: {e}") self.font = ImageFont.load_default() try: bbox = self.font.getbbox("M"); self.font_size = bbox[3] - bbox[1] except AttributeError: self.font_size = 20 self._font_cache[self.font_size] = self.font temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img) try: self.space_width_ref = temp_draw.textlength(" ", font=self.font) except AttributeError: self.space_width_ref = 10 try: bbox = self.font.getbbox("Tg"); self.line_height_ref = bbox[3] - bbox[1] except AttributeError: self.line_height_ref = 30 del temp_draw, temp_img spacing_multiplier = 1.0 if self.num_visible_lines <= 1 else (0.8 if self.num_visible_lines == 2 else (0.6 if self.num_visible_lines == 3 else 0.4)) self.line_spacing = max(0, int(self.line_height_ref * spacing_multiplier)) def _get_font_with_size(self, size: int) -> ImageFont.FreeTypeFont: size = max(1, int(size)) if size in self._font_cache: return self._font_cache[size] try: f = ImageFont.truetype(self.font_path, size) except Exception: f = ImageFont.load_default() self._font_cache[size] = f return f def _calculate_line_width(self, line_elements, draw, font) -> int: width_total = 0 for _, _, txt, _ in line_elements: width_total += self._get_element_width(draw, txt, font) return width_total def _get_element_width(self, draw, text, font): if text == " ": return self.space_width_ref try: return draw.textlength(text, font=font) except AttributeError: try: bbox = draw.textbbox((0, 0), text, font=font); return bbox[2] - bbox[0] if bbox else 0 except AttributeError: try: width, _ = draw.textsize(text, font=font); return width except AttributeError: font_size_est = getattr(font, 'size', self.font_size // 2) return len(text) * (font_size_est // 2) except Exception: font_size_est = getattr(font, 'size', self.font_size // 2) return len(text) * (font_size_est // 2) def render_text_images(self, displayed_content, active_line_local_idx, width, height): img_base = Image.new("RGB", (width, height), (0, 0, 0)) img_mask = Image.new("L", (width, height), 0) draw_base = ImageDraw.Draw(img_base) draw_mask = ImageDraw.Draw(img_mask) max_allowed_width = int(width * 0.90) min_font_size = max(10, int(self.font_size * 0.60)) line_render_data = [] for global_idx, line_elements in displayed_content: if not (line_elements and global_idx is not None): line_render_data.append(None) continue font_line_size = self.font_size font_line = self._get_font_with_size(font_line_size) line_width_px = self._calculate_line_width(line_elements, draw_base, font_line) reduction_step = max(1, int(self.font_size * 0.05)) while line_width_px > max_allowed_width and font_line_size > min_font_size: font_line_size = max(min_font_size, font_line_size - reduction_step) font_line = self._get_font_with_size(font_line_size) line_width_px = self._calculate_line_width(line_elements, draw_base, font_line) if font_line_size == min_font_size: break try: h_ref = font_line.getbbox("Tg"); line_height_px = h_ref[3] - h_ref[1] except Exception: line_height_px = int(self.line_height_ref * (font_line_size / self.font_size)) line_render_data.append({"font": font_line, "font_size": font_line_size, "height": line_height_px, "width": line_width_px, "elements": line_elements, "global_idx": global_idx}) vertical_shift = self.config.get("vertical_shift_pixels", 0) block_height_ref = self.num_visible_lines * self.line_height_ref + (self.num_visible_lines - 1) * self.line_spacing start_y_ref = max(0, (height - block_height_ref) // 2 + vertical_shift) line_start_y_positions = [int(start_y_ref + i * (self.line_height_ref + self.line_spacing)) for i in range(self.num_visible_lines)] all_syllable_render_info = [] active_syllable_indices = (-1, -1) current_global_syl_idx = 0 sentence_end_punctuation = ".!?" for local_idx, render_info in enumerate(line_render_data): if render_info is None: continue font_line = render_info["font"] line_width_px = render_info["width"] elements_in_line = render_info["elements"] current_global_line_idx = render_info["global_idx"] is_active_line = (local_idx == active_line_local_idx) if is_active_line: active_syllable_start_idx_global = current_global_syl_idx line_start_x = (width - line_width_px) // 2 current_x = float(line_start_x) line_y_draw = line_start_y_positions[local_idx] if line_y_draw is None: continue for i, (start_time, end_time, element_text, _) in enumerate(elements_in_line): element_width = self._get_element_width(draw_base, element_text, font_line) if not element_text.isspace(): stripped_text = element_text.rstrip() is_sentence_end = bool(stripped_text and stripped_text[-1] in sentence_end_punctuation) try: draw_x, draw_y = int(current_x), line_y_draw element_text_base = element_text_mask = element_text if is_sentence_end and element_text.rstrip().endswith('.'): element_text_mask = element_text_mask.rstrip('.') draw_base.text((draw_x, draw_y), element_text_base, font=font_line, fill=self.base_text_color) if element_text_mask: draw_mask.text((draw_x, draw_y), element_text_mask, font=font_line, fill=255) final_bbox = draw_base.textbbox((draw_x, draw_y), element_text, font=font_line) if final_bbox: bbox_left, bbox_top, bbox_right, bbox_bottom = final_bbox syl_w_actual, syl_h_actual = bbox_right - bbox_left, bbox_bottom - bbox_top bbox_top_final = bbox_top else: line_height_px_fallback = render_info["height"] bbox_left, bbox_top_final = draw_x, draw_y syl_w_actual, syl_h_actual = element_width, line_height_px_fallback except Exception as e: print(f"Fallback render/bbox for: {element_text}. Err: {e}") draw_x, draw_y = int(current_x), line_y_draw try: draw_base.text((draw_x, draw_y), element_text, font=font_line, fill=self.base_text_color) draw_mask.text((draw_x, draw_y), element_text, font=font_line, fill=255) except Exception as draw_err: print(f" -> Falha até no fallback: {draw_err}") line_height_px_fallback = render_info["height"] bbox_left, bbox_top_final = draw_x, draw_y syl_w_actual, syl_h_actual = element_width, line_height_px_fallback all_syllable_render_info.append((start_time, end_time, bbox_left, bbox_top_final, syl_w_actual, syl_h_actual, current_global_line_idx, is_sentence_end)) current_global_syl_idx += 1 current_x += element_width if is_active_line: active_syllable_end_idx_global = current_global_syl_idx active_syllable_indices = (active_syllable_start_idx_global, active_syllable_end_idx_global) base_cp = cp.asarray(np.array(img_base)) mask_cp = cp.asarray(np.array(img_mask)) return base_cp, mask_cp, all_syllable_render_info, active_syllable_indices # --------------------------------------------------------------------------- # ========================== SUBTITLE PROCESSOR (SEM ALTERAÇÕES) =============== class SubtitleProcessor: def __init__(self, text_renderer: TextRenderer, config, syllable_dict, not_found_words_set): self.text_renderer = text_renderer self.config = config self.upper_case = config["upper_case"] self.font = self.text_renderer.font self.syllable_dict = syllable_dict self.not_found_words_set = not_found_words_set @staticmethod def _parse_time_string_float(time_str): try: return float(time_str) except (ValueError, TypeError): print(f"Aviso: Timestamp inesperado: {time_str}"); return None @staticmethod def read_subtitles(file): char_timing_data = [] try: with open(file, 'r', encoding='utf-8') as f: lines = f.readlines() if not lines: print(f"Aviso: Arquivo '{file}' vazio."); return [], [] header = lines[0].strip().upper() start_idx = 1 if header == "CHARACTER|START|END" else (0 if (header and '|' not in lines[0]) else 0) if start_idx==0 and header != "CHARACTER|START|END": print("Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado.") for line_num, line in enumerate(lines[start_idx:], start=start_idx + 1): if not line.strip(): continue parts = line.rstrip('\n\r').split('|') if len(parts) != 3: print(f"Aviso: Ignorando linha {line_num} mal formatada: '{line}'"); continue char, start_str, end_str = parts[0], parts[1].strip(), parts[2].strip() start_time = SubtitleProcessor._parse_time_string_float(start_str) end_time = SubtitleProcessor._parse_time_string_float(end_str) if start_time is None or end_time is None: print(f"Aviso: Ignorando linha {line_num} com timestamp inválido: '{line}'"); continue if not char: char = " " if end_time < start_time: print(f"Aviso: Corrigindo end<start na linha {line_num}: '{line}'"); end_time = start_time char_timing_data.append((start_time, end_time, str(char))) except FileNotFoundError: print(f"Erro: Arquivo PSV não encontrado: {file}"); return [], [] except Exception as e: print(f"Erro ao ler PSV: {e}"); traceback.print_exc(); return [], [] char_timing_data.sort(key=lambda x: x[0]) long_pauses = SubtitleProcessor._identify_long_pauses(char_timing_data) return char_timing_data, long_pauses @staticmethod def _identify_long_pauses(char_timing_data, min_pause_duration=5.0): pauses = [] if not char_timing_data: return pauses first_start = char_timing_data[0][0] if first_start >= min_pause_duration: pauses.append({"start": 0.0, "end": first_start, "duration": first_start, "type": "initial"}) for i in range(1, len(char_timing_data)): prev_end, curr_start = char_timing_data[i-1][1], char_timing_data[i][0] pause_dur = curr_start - prev_end if pause_dur >= min_pause_duration: is_covered = any(p["type"] == "initial" and p["end"] >= curr_start for p in pauses) if not is_covered: pauses.append({"start": prev_end, "end": curr_start, "duration": pause_dur, "type": "between"}) for i, (start, end, _) in enumerate(char_timing_data): char_dur = end - start if char_dur >= min_pause_duration: is_covered = any(abs(p["start"] - start) < 0.01 and abs(p["end"] - end) < 0.01 for p in pauses) if not is_covered: pauses.append({"start": start, "end": end, "duration": char_dur, "type": "during"}) pauses.sort(key=lambda x: x["start"]) return pauses def _group_chars_into_words(self, char_timing_data): words_spaces = [] current_word = [] for i, (start, end, char) in enumerate(char_timing_data): proc_char = char.upper() if self.upper_case else char if proc_char.isspace(): if current_word: words_spaces.append({"type": "word", "chars": current_word}); current_word = [] words_spaces.append({"type": "space", "start": start, "end": end}) else: current_word.append((start, end, proc_char)) if current_word: words_spaces.append({"type": "word", "chars": current_word}) return words_spaces def _process_words_into_syllables(self, words_and_spaces): syllable_data = [] temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img) font = self.text_renderer.font punc_strip, sent_end = ",.!?;:", ".!?" for element in words_and_spaces: if element["type"] == "space": syllable_data.append((element["start"], element["end"], " ", self.text_renderer.space_width_ref, False)) continue word_chars = element["chars"] if not word_chars: continue word_text = "".join([c[2] for c in word_chars]) cleaned_word = word_text.rstrip(punc_strip) lookup = cleaned_word.lower() if lookup in self.syllable_dict: syl_parts = self.syllable_dict[lookup].split('-') char_idx, orig_idx = 0, 0 word_syl_indices = [] for part in syl_parts: syl_len = len(part) if char_idx + syl_len > len(cleaned_word): if orig_idx < len(word_chars): rem_chars = word_chars[orig_idx:] rem_text = "".join([c[2] for c in rem_chars]) s_start, s_end = rem_chars[0][0], rem_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars) syllable_data.append((s_start, s_end, rem_text, s_width, False)) word_syl_indices.append(len(syllable_data) - 1) break syl_chars = word_chars[orig_idx : orig_idx + syl_len] if not syl_chars: continue s_text = "".join([c[2] for c in syl_chars]) s_start, s_end = syl_chars[0][0], syl_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in syl_chars) syllable_data.append((s_start, s_end, s_text, s_width, False)) word_syl_indices.append(len(syllable_data) - 1) char_idx += syl_len; orig_idx += syl_len if orig_idx < len(word_chars): # Handle trailing punctuation rem_chars = word_chars[orig_idx:] rem_text = "".join([c[2] for c in rem_chars]) expected_punc = word_text[len(cleaned_word):] if rem_text == expected_punc and word_syl_indices: # Append to last syllable last_idx = word_syl_indices[-1] ls_start, _, ls_text, _, _ = syllable_data[last_idx] new_text = ls_text + rem_text new_end = rem_chars[-1][1] new_width = self.text_renderer._get_element_width(temp_draw, new_text, font) syllable_data[last_idx] = (ls_start, new_end, new_text, new_width, False) else: # Create new syllable for remaining rem_start, rem_end = rem_chars[0][0], rem_chars[-1][1] rem_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars) syllable_data.append((rem_start, rem_end, rem_text, rem_width, False)) word_syl_indices.append(len(syllable_data) - 1) if word_syl_indices: # Mark sentence end flag on the *actual* last syllable final_syl_idx = word_syl_indices[-1] final_syl_text = syllable_data[final_syl_idx][2].rstrip() if final_syl_text and final_syl_text[-1] in sent_end: syllable_data[final_syl_idx] = syllable_data[final_syl_idx][:4] + (True,) else: # Word not in dictionary if lookup not in self.not_found_words_set and word_text.lower() == lookup: self.not_found_words_set.add(lookup) s_start, s_end = word_chars[0][0], word_chars[-1][1] s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in word_chars) is_end = word_text.rstrip()[-1] in sent_end if word_text.rstrip() else False syllable_data.append((s_start, s_end, word_text, s_width, is_end)) del temp_draw, temp_img syllable_data.sort(key=lambda x: x[0]) # Post-process end times (simpler version, maybe adjust if needed) processed_data = [] for i in range(len(syllable_data)): start, end, text, width, is_end = syllable_data[i] # simplified end time logic for now processed_data.append((start, end, text, width, is_end)) return processed_data # Returning without the 'next_syl_start' for now def group_syllables_into_lines(self, syllable_timing_data, video_width): lines = [] current_line = [] for syllable_tuple in syllable_timing_data: start, end, text, width, is_end = syllable_tuple # Adjusted unpacking current_line.append((start, end, text, width)) # Store width too if is_end: while current_line and current_line[-1][2].isspace(): current_line.pop() if current_line: lines.append(current_line) current_line = [] while current_line and current_line[-1][2].isspace(): current_line.pop() if current_line: lines.append(current_line) return lines def process_subtitles_to_syllable_lines(self, file, video_width): char_data, pauses = self.read_subtitles(file) if not char_data: return [], pauses words_spaces = self._group_chars_into_words(char_data) syl_data = self._process_words_into_syllables(words_spaces) if not syl_data: print("Aviso: Nenhum dado de sílaba gerado."); return [], pauses # Optional: Add debug print here if needed lines = self.group_syllables_into_lines(syl_data, video_width) return lines, pauses # --------------------------------------------------------------------------- # ========================== GPU CONTEXT ================================== class GPURenderContext: """ Mantém buffers, grades e gráficos CUDA persistentes para evitar alocações e capturar kernels repetitivos. """ def __init__(self, width: int, height: int, cfg): self.w, self.h = width, height self.cfg = cfg self.pool = cp.cuda.MemoryPool() # pool dedicado cp.cuda.set_allocator(self.pool.malloc) # Grades X/Y (uint16 já é suficiente p/ 4 K) yy, xx = cp.mgrid[:height, :width] self.xg = xx.astype(cp.uint16) self.yg = yy.astype(cp.uint16) del xx, yy # Buffers de saída duplos (double-buffer) self.batch_cap = 0 # será ajustado na 1.ª chamada self.out_a = self.out_b = None # Will hold cp.empty(...) arrays # Máscara de progresso (barra completa) bar_h = 20 bar_y0 = 10 self.bar_mask_full = ((self.yg >= bar_y0) & (self.yg < bar_y0 + bar_h)) # (H, W) bool # Cores (pre-calculated float32 RGB) self.base_rgbf = hex_to_bgr_cupy(cfg["base_text_color"])[::-1].astype(cp.float32)/255.0 hl_bgr = hex_to_bgr_cupy(cfg["highlight_text_color"]) self.hl_rgbf = hl_bgr[::-1].astype(cp.float32)/255.0 # Darken highlight color for progress bar background dark_hl_bgr = (hl_bgr.astype(cp.float32) * 0.4).clip(0, 255).astype(cp.uint8) self.bar_bg_rgbf= dark_hl_bgr[::-1].astype(cp.float32)/255.0 # Gráfico CUDA (capturado após warm-up) self.graph = None # Will hold instantiated CUDA Graph self.pinned_mem = None # Will hold pinned memory buffer self.pinned_mem_size = 0 # -------------- helpers -------------- def ensure_batch_buffers(self, n_frames: int): """Ensures output buffers `out_a` and `out_b` can hold `n_frames`.""" if n_frames <= self.batch_cap: return # Use power of 2 for potential alignment benefits self.batch_cap = int(2 ** math.ceil(math.log2(n_frames))) shape = (self.batch_cap, self.h, self.w, 3) # Free old buffers before creating new ones if they exist if self.out_a is not None: del self.out_a if self.out_b is not None: del self.out_b self.out_a = cp.empty(shape, dtype=cp.uint8) self.out_b = cp.empty(shape, dtype=cp.uint8) # Also ensure pinned memory is large enough for one buffer self.ensure_pinned_memory(self.out_a.nbytes) def ensure_pinned_memory(self, n_bytes: int): """Ensures pinned host memory `pinned_mem` is at least `n_bytes`.""" if n_bytes <= self.pinned_mem_size: return # Free old pinned memory if it exists if self.pinned_mem is not None: try: # Explicitly free pinned memory if CuPy version supports it well # cp.cuda.runtime.freeHost(self.pinned_mem.ptr) # Or similar if available del self.pinned_mem except Exception as e: print(f"Note: Could not explicitly free old pinned memory: {e}") self.pinned_mem = None # Ensure it's reset anyway self.pinned_mem_size = n_bytes self.pinned_mem = cp.cuda.alloc_pinned_memory(self.pinned_mem_size) def get_pinned_buffer(self, required_bytes: int) -> cp.cuda.PinnedMemory: """Gets the pinned memory buffer, ensuring it's large enough.""" self.ensure_pinned_memory(required_bytes) return self.pinned_mem # --------------------------------------------------------------------------- # ========================= CUDA PROCESSOR ================================ class CUDAProcessor: """ Versão repaginada: todas as operações puramente em GPU e capturadas em um CUDA Graph após o primeiro batch. """ def __init__(self, cfg, static_bg_rgb_cp, gpu_ctx: GPURenderContext): self.cfg = cfg self.ctx = gpu_ctx # Ensure background is float32 RGB (GPU likes RGB order usually) if static_bg_rgb_cp.shape[2] == 3: # Assuming input is RGB self.bg_f = static_bg_rgb_cp.astype(cp.float32) / 255.0 else: # Fallback if not RGB self.bg_f = cp.zeros((gpu_ctx.h, gpu_ctx.w, 3), dtype=cp.float32) self.min_dur = cfg.get("min_char_duration", 0.01) self.max_vis = cfg.get("max_visual_fill_duration", 3.0) # streams self.stream_compute = cp.cuda.Stream(non_blocking=True) self.stream_h2d = cp.cuda.Stream(non_blocking=True) # For D->H copy # Elementwise kernel para progressivo da sílaba ativa # Adjusted for RGB output and input self._hl_kernel = cp.ElementwiseKernel( # Positional arguments first 'raw float32 cut_x_batch, bool text_mask, \ uint16 X, uint16 Y, \ uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, \ float32 hl_r, float32 hl_g, float32 hl_b, \ float32 base_r, float32 base_g, float32 base_b', # in_params (positional) 'float32 io_r, float32 io_g, float32 io_b', # inout_params (positional) r""" // Check bounds and text mask just once per pixel bool is_syl_pixel = (X >= syl_x) && (X < syl_x + syl_w) && (Y >= syl_y) && (Y < syl_y + syl_h) && text_mask; if (is_syl_pixel) { // Determine color based on horizontal position relative to cut_x // cut_x_batch is specific to this frame (i is batch index) // Use 'cut_x_batch' instead of 'raw_cut_x_batch' here float current_cut_x = cut_x_batch[i]; // Access per-frame cut_x if (X < current_cut_x) { // Highlighted part io_r = hl_r; io_g = hl_g; io_b = hl_b; } else { // Base part (already drawn, but ensure it's base color) io_r = base_r; io_g = base_g; io_b = base_b; } } // Pixels outside the syllable or mask are left untouched (background/other text) """, # operation (positional) # Keyword arguments after positional ones name='apply_syllable_highlight_rgb', # We pass cut_x as raw so we can index it with `i` inside the kernel # preamble='#include <cuda_fp16.h>' # Not needed for float32 ) # ------------------ núcleo p/ um batch ------------------ def _render_batch(self, frame_times_f32: cp.ndarray, # (B,) frame times text_mask_bool: cp.ndarray, # (H,W) bool, where text *could* be syl_meta: dict, # dict of cp arrays for syllables active_syl_span: tuple, # (start_idx, end_idx) for active line active_line_idx: int, completed_line_pixels_mask: cp.ndarray, # (H,W) bool, pixels of completed lines bar_progress: cp.ndarray or None, # (B,) float32 (0-1) or None out_buf: cp.ndarray): # (B,H,W,3) uint8 - OUTPUT buffer """ Executa todo o pipeline para um batch dentro de stream_compute. Chamado dentro de um CUDA Graph depois do 1.º warm-up. OUTPUTS directly into `out_buf`. """ B = frame_times_f32.shape[0] H, W = self.ctx.h, self.ctx.w # 1. Start with background, broadcasted to batch size # Ensure bg_f is (1, H, W, 3) for broadcasting with (B,...) masks inter_f = cp.broadcast_to(self.bg_f[None,...], (B, H, W, 3)).copy() # (B,H,W,3) float32 # 2. Apply progress bar (if needed for this batch) if bar_progress is not None and cp.any(bar_progress > 0): # Optimization: check if any progress # Calculate fill width per frame fill_w = (bar_progress * W).astype(cp.uint16) # (B,) uint16 is enough # Expand grid X once: (1, 1, W) for broadcasting against (B, H, 1) X_grid_exp = self.ctx.xg[None, None, :] # Shape (1, 1, W) # Expand bar mask: (1, H, W) for broadcasting against (B,...) bar_mask_exp = self.ctx.bar_mask_full[None, :, :] # Shape (1, H, W) # Expand fill_w: (B, 1, 1) for broadcasting fill_w_exp = fill_w[:, None, None] # Shape (B, 1, 1) # Create masks for this batch using broadcasting # bar_fill_mask: Where bar should be filled (True/False) for each pixel in batch bar_fill_mask = bar_mask_exp & (X_grid_exp < fill_w_exp) # (B, H, W) # bar_bg_mask: Where bar background should be (True/False) bar_bg_mask = bar_mask_exp & (~bar_fill_mask) # (B, H, W) # Apply colors using boolean masks (expand masks to 3 channels) # Ensure colors are (1, 1, 1, 3) for broadcasting bar_bg_color_exp = self.ctx.bar_bg_rgbf[None, None, None, :] bar_fill_color_exp = self.ctx.hl_rgbf[None, None, None, :] cp.copyto(inter_f, bar_bg_color_exp, where=bar_bg_mask[..., None]) cp.copyto(inter_f, bar_fill_color_exp, where=bar_fill_mask[..., None]) # Cleanup intermediate broadcasted arrays if memory is tight del fill_w, X_grid_exp, bar_mask_exp, fill_w_exp, bar_fill_mask, bar_bg_mask del bar_bg_color_exp, bar_fill_color_exp # 3. Apply base text color everywhere text *could* be (overwrites background/bar) # Ensure base color is (1, 1, 1, 3) for broadcasting base_color_exp = self.ctx.base_rgbf[None, None, None, :] # Apply base color where text_mask_bool is True (expand mask to B and C) cp.copyto(inter_f, base_color_exp, where=text_mask_bool[None, ..., None]) del base_color_exp # Cleanup # 4. Highlight completed lines (overwrites base text color) if completed_line_pixels_mask is not None: # Ensure highlight color is (1, 1, 1, 3) for broadcasting hl_color_exp = self.ctx.hl_rgbf[None, None, None, :] # Apply highlight color where completed_line_pixels_mask is True (expand mask to B and C) cp.copyto(inter_f, hl_color_exp, where=completed_line_pixels_mask[None, ..., None]) del hl_color_exp # Cleanup # 5. Highlight active syllables (the complex part) s0, s1 = active_syl_span # Syllable indices for the currently active line [s0, s1) if s1 > s0 and syl_meta is not None: # Check if the *active_line_idx* actually corresponds to syllables in this span # This avoids running the kernel if the active line has no syllables in syl_meta # (Could happen with empty lines or data issues) active_line_has_syllables_in_span = cp.any(syl_meta["line_idx"][s0:s1] == active_line_idx) if active_line_has_syllables_in_span: # Prepare data for the kernel for *all* syllables in the span # Kernel will internally filter by active_line_idx if needed, but simpler is often faster syl_indices_in_span = cp.arange(s0, s1) # Indices within syl_meta arrays # Calculate visual progress (0.0 to 1.0) for each syllable in the span, for each frame in the batch # Shape: (B, len(span)) time_elapsed = frame_times_f32[:, None] - syl_meta["start"][syl_indices_in_span] visual_progress = cp.clip(time_elapsed / syl_meta["vis_dur"][syl_indices_in_span], 0.0, 1.0) # Calculate the horizontal cutoff point (x-coordinate) for highlighting # Shape: (B, len(span)) cut_x_batch_span = syl_meta["x"][syl_indices_in_span] + \ visual_progress * syl_meta["w"][syl_indices_in_span] # Iterate through syllables *in the active span* only num_syls_in_meta = syl_meta["line_idx"].shape[0] # Get actual size of the array syl_indices_in_span_host = cp.asnumpy(syl_indices_in_span) # Bring indices to host for easier iteration/debug # Use host iteration for k to easily check bounds for i, k_host in enumerate(syl_indices_in_span_host): # Convert k_host back to int if needed, though it should be int already k = int(k_host) # --- Explicit Bounds Check --- if k < 0 or k >= num_syls_in_meta: print(f"!!! FATAL ERROR: Index k ({k}) is out of bounds for syl_meta['line_idx'] (size {num_syls_in_meta}).") print(f" s0={s0}, s1={s1}, active_line_idx={active_line_idx}") # This indicates a logic error in how indices (s0, s1) or syl_meta are generated. # Raising an error is appropriate here. raise IndexError(f"Syllable index k={k} out of bounds (size={num_syls_in_meta})") # --- End Bounds Check --- # Filter: only process syllables belonging to the truly active line # Accessing syl_meta["line_idx"][k] happens here if syl_meta["line_idx"][k] != active_line_idx: continue # Extract syllable bounding box for kernel # Ensure k is still valid before accessing other parts of syl_meta # (redundant if the first check passed, but safe) if k < 0 or k >= syl_meta["x"].shape[0]: # Check against another array size for extra safety print(f"!!! WARNING: Index k ({k}) became invalid before accessing bbox?") continue sx, sy = syl_meta["x"][k], syl_meta["y"][k] sw, sh = syl_meta["w"][k], syl_meta["h"][k] # Skip if width or height is zero if sw <= 0 or sh <= 0: continue # Get the cut_x values specific to *this syllable* across the batch # Shape: (B,) # We need the index 'i' which corresponds to the position within the span current_syl_cut_x_batch = cut_x_batch_span[:, i] # Call the ElementwiseKernel # It modifies 'inter_f' in-place # Pass grids (H,W), masks (H,W), scalar bbox, scalar colors # Pass cut_x for the batch (B,) as 'raw' # Kernel uses 'i' (batch index) to get the correct cut_x self._hl_kernel(current_syl_cut_x_batch, # raw float32 (B,) text_mask_bool, # bool (H, W) self.ctx.xg, # uint16 (H, W) self.ctx.yg, # uint16 (H, W) sx, sy, sw, sh, # uint16 scalars self.ctx.hl_rgbf[0], self.ctx.hl_rgbf[1], self.ctx.hl_rgbf[2], # float32 scalars self.ctx.base_rgbf[0], self.ctx.base_rgbf[1], self.ctx.base_rgbf[2], # float32 scalars inter_f[..., 0], # float32 (B, H, W) R channel inter_f[..., 1], # float32 (B, H, W) G channel inter_f[..., 2]) # float32 (B, H, W) B channel # REMOVED: size=B * H * W) # Let CuPy infer the size # Cleanup batch-specific syllable calculations del time_elapsed, visual_progress, cut_x_batch_span if 'current_syl_cut_x_batch' in locals(): del current_syl_cut_x_batch # 6. Convert final float32 RGB to uint8 BGR and write to output buffer # Perform conversion and channel swap in one step if possible # Ensure output is contiguous for potential speedup if needed later out_buf[:] = (inter_f[..., ::-1] * 255.0).astype(cp.uint8) # Cleanup main intermediate buffer del inter_f # ----------------------------------------------------------------------- def process_frames_streaming(self, base_cp, mask_cp, syl_info, active_indices, video_writer, # FFmpegWriter instance video_lock, # Threading lock for FFmpegWriter fps, first_frame_idx, num_frames_to_process, # Renamed n_frames -> num_frames_to_process active_line_idx, completed_lines_set, long_pauses): # Renamed bar_pauses -> long_pauses """ Processes a sequence of frames, potentially using CUDA Graphs. Handles pre-calculation, batching, GPU execution, and H2D transfer. Writes frames to FFmpeg via the video_writer. """ if num_frames_to_process <= 0: print("Warning: process_frames_streaming called with num_frames_to_process <= 0.") return # Determine batch size (use config value, but ensure it's reasonable) # Could add dynamic sizing based on VRAM here later if needed batch_size = max(32, min(self.cfg.get("frames_per_batch", 64), 512)) # Example bounds # ---------- Pré-cálculo de dados constantes (for this whole segment) --------------- # Mask where text *could* appear (remains constant for this render call) text_mask_bool = (mask_cp > 128) if mask_cp is not None else cp.zeros((self.ctx.h, self.ctx.w), dtype=bool) # Create a combined mask of *all pixels* belonging to *any completed line* completed_line_pixels_mask = None if completed_lines_set and syl_info: # Find all syllable indices belonging to completed lines completed_syl_indices = [ idx for idx, s_info in enumerate(syl_info) if s_info[6] in completed_lines_set # s_info[6] is global_line_idx ] if completed_syl_indices: # Initialize mask to False completed_line_pixels_mask = cp.zeros_like(text_mask_bool) # Iterate through completed syllables and mark their pixels in the mask X_grid, Y_grid = self.ctx.xg, self.ctx.yg for idx in completed_syl_indices: s_start, s_end, sx, sy, sw, sh, _, _ = syl_info[idx] if sw > 0 and sh > 0: # Define the bounding box for the syllable syl_bbox_mask = (X_grid >= sx) & (X_grid < sx + sw) & \ (Y_grid >= sy) & (Y_grid < sy + sh) # Combine with the general text mask and OR into the final mask completed_line_pixels_mask |= (syl_bbox_mask & text_mask_bool) del X_grid, Y_grid, completed_syl_indices # Cleanup else: # No syllables found for the completed lines, mask remains None pass # Convert syllable info into a dictionary of CuPy arrays (if syllables exist) syl_meta = None if syl_info: num_syls = len(syl_info) if num_syls > 0: syl_meta = { # Ensure types are optimal (uint16 for coords/dims, int16 for index) "start" : cp.asarray([s[0] for s in syl_info], dtype=cp.float32), "end" : cp.asarray([s[1] for s in syl_info], dtype=cp.float32), "x" : cp.asarray([s[2] for s in syl_info], dtype=cp.uint16), "y" : cp.asarray([s[3] for s in syl_info], dtype=cp.uint16), "w" : cp.asarray([s[4] for s in syl_info], dtype=cp.uint16), "h" : cp.asarray([s[5] for s in syl_info], dtype=cp.uint16), "line_idx": cp.asarray([s[6] for s in syl_info], dtype=cp.int16), # is_sentence_end (s[7]) is not directly used in rendering kernel, but kept if needed elsewhere } # Calculate effective visual duration for highlighting (clipped) raw_duration = syl_meta["end"] - syl_meta["start"] # Apply clipping: duration is at least min_dur and at most max_vis syl_meta["vis_dur"] = cp.clip(raw_duration, self.min_dur, self.max_vis).astype(cp.float32) else: # syl_info was provided but was empty list pass else: # syl_info was None pass # ---------- Execução por batches --------------------------- self.ctx.ensure_batch_buffers(batch_size) # Ensure buffers A/B exist and are large enough outA, outB = self.ctx.out_a, self.ctx.out_b # Get references # Prepare for CUDA Graph capture warmup_frames_count = self.cfg.get("cuda_graph_warmup_frames", 4) graph_warmup_done = False graph_was_captured = self.ctx.graph is not None # Check if graph exists from previous calls total_processed_in_call = 0 current_batch_start_frame = first_frame_idx while total_processed_in_call < num_frames_to_process: # Determine size of the current batch remaining_frames = num_frames_to_process - total_processed_in_call current_batch_size = min(batch_size, remaining_frames) if current_batch_size <= 0: # Should not happen, but safety check break # Get the correct output buffer slice (ping-pong) # Use total_processed_in_call to determine A or B buffer consistently buffer_index = (total_processed_in_call // batch_size) % 2 out_buf_slice = outA[:current_batch_size] if buffer_index == 0 else outB[:current_batch_size] # Calculate frame indices and times for this specific batch batch_frame_indices = cp.arange( current_batch_start_frame, current_batch_start_frame + current_batch_size, dtype=cp.int32 ) batch_frame_times = batch_frame_indices.astype(cp.float32) / fps # Determine progress bar status for this batch batch_bar_progress = None if long_pauses: # Find if any frame in this batch falls within *any* long pause batch_in_pause = cp.zeros(current_batch_size, dtype=bool) batch_progress_values = cp.zeros(current_batch_size, dtype=cp.float32) for pause in long_pauses: pause_start, pause_end, pause_duration = pause["start"], pause["end"], pause["duration"] # Find frames within this pause's time range indices_in_pause = cp.where((batch_frame_times >= pause_start) & (batch_frame_times < pause_end))[0] if indices_in_pause.size > 0: batch_in_pause[indices_in_pause] = True # Calculate progress only for frames in *this* pause progress = (batch_frame_times[indices_in_pause] - pause_start) / max(pause_duration, 1e-6) batch_progress_values[indices_in_pause] = cp.clip(progress, 0.0, 1.0) # If any frame was in a pause, use the calculated progress values if cp.any(batch_in_pause): batch_bar_progress = batch_progress_values del batch_in_pause, batch_progress_values # Cleanup pause calculation temps # ---------------- Execute _render_batch (GPU) ----------------- can_use_graph = graph_was_captured and graph_warmup_done if can_use_graph: # --- Launch existing CUDA Graph --- # Note: We assume the graph was captured with the *maximum* batch size. # We are launching it for a potentially *smaller* batch size. # This is generally okay, but less efficient if sizes vary wildly. # TODO: Potentially re-capture graph if batch size changes significantly? # For simplicity now, we launch the existing one. # We need to ensure the *input data pointers* used by the graph are updated # if they change location (e.g., if batch_frame_times points to new memory). # However, CuPy's graph API often handles this if you pass the arrays directly. # We might need explicit graph.update() if issues arise. # Assuming _render_batch inputs are stable or handled by CuPy's graph launch: try: # Ensure inputs match what the graph expects (might need updates) # For now, assume launch handles it or inputs are stable enough self.ctx.graph.launch(self.stream_compute) # Launch on the compute stream except Exception as graph_launch_err: print(f"Error launching CUDA Graph: {graph_launch_err}. Falling back.") # Fallback to regular execution for this batch self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) # Consider disabling graph use for subsequent batches if it keeps failing # graph_was_captured = False # Option: disable graph if launch fails else: # --- Regular execution or Graph Warmup/Capture --- # Use the compute stream for the rendering task with self.stream_compute: self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) # Check if warmup period is over and graph hasn't been captured yet if not graph_was_captured and total_processed_in_call >= warmup_frames_count: print(f"Capturing CUDA Graph after {total_processed_in_call + current_batch_size} frames...") try: graph = cp.cuda.Graph() with graph.capture(stream=self.stream_compute): # Re-run the *last* batch's render call inside capture self._render_batch(batch_frame_times, text_mask_bool, syl_meta, active_indices, active_line_idx, completed_line_pixels_mask, batch_bar_progress, out_buf_slice) self.ctx.graph = graph.instantiate() graph_was_captured = True graph_warmup_done = True # Mark warmup as complete print("CUDA Graph captured successfully.") except Exception as graph_capture_err: print(f"Error capturing CUDA Graph: {graph_capture_err}. Graph will not be used.") self.ctx.graph = None # Ensure graph is None if capture failed graph_warmup_done = True # Still mark warmup done to prevent re-attempts # ----------------------------------------------------------------- # Synchronize the compute stream to ensure rendering is finished self.stream_compute.synchronize() # ---------- Asynchronous D->H Copy and Write to FFmpeg ----------- # Get the output buffer (which is BGR uint8) output_bgr_gpu = out_buf_slice # Already in BGR uint8 from _render_batch # Ensure pinned memory is ready required_bytes = output_bgr_gpu.nbytes pinned_buffer = self.ctx.get_pinned_buffer(required_bytes) # Perform async D->H copy using the H2D stream (stream_h2d is just a name here) # Source: GPU buffer, Dest: Pinned Host buffer cp.cuda.runtime.memcpyAsync( pinned_buffer.ptr, # Destination: Pinned host memory pointer output_bgr_gpu.data.ptr, # Source: GPU memory pointer required_bytes, # Size in bytes cp.cuda.runtime.memcpyDeviceToHost, # Direction self.stream_h2d.ptr # Stream to use for the copy ) # Synchronize the H2D stream to ensure copy is finished self.stream_h2d.synchronize() # Get a NumPy view of the pinned memory (no copy involved here) frame_data_np = np.frombuffer(pinned_buffer, dtype=np.uint8, count=required_bytes) # Reshape to (current_batch_size, H, W, 3) - FFmpeg expects this BGR format frames_to_write = frame_data_np.reshape(current_batch_size, self.ctx.h, self.ctx.w, 3) # Write the batch to FFmpeg using the provided writer and lock try: with video_lock: # Iterate and write frame by frame if writer expects single frames # Or write the whole batch if writer supports it (check FFmpegWriter impl.) # Assuming FFmpegWriter's write handles a single frame numpy array: for frame_idx in range(current_batch_size): video_writer.write(frames_to_write[frame_idx]) # If FFmpegWriter could handle the whole batch byte buffer: # video_writer.write_bytes(pinned_buffer.tobytes()[:required_bytes]) # Hypothetical faster write except Exception as write_err: print(f"Error writing batch to FFmpeg: {write_err}") # Decide how to handle: stop processing, log, etc. # For now, just print and continue, but might need better error handling. # Update counters total_processed_in_call += current_batch_size current_batch_start_frame += current_batch_size # Minor optimization: Check if warmup is complete after the batch is processed if not graph_warmup_done and total_processed_in_call >= warmup_frames_count: graph_warmup_done = True # Optional: Free memory aggressively if needed (usually handled by pool) # cp.get_default_memory_pool().free_all_blocks() # cp.get_default_pinned_memory_pool().free_all_blocks() # --- End of batch loop --- # Cleanup pre-calculated data if it consumes significant memory del text_mask_bool if completed_line_pixels_mask is not None: del completed_line_pixels_mask if syl_meta: for key in list(syl_meta.keys()): del syl_meta[key] del syl_meta # Pinned memory buffer (`self.ctx.pinned_mem`) is kept for reuse # --------------------------------------------------------------------------- # ===================== FFMPEG WRITER (Mostly Unchanged) =================== class FFmpegWriter: def __init__(self, output_file, width, height, fps, config, gpu_ctx: GPURenderContext = None): self.output_file = output_file self.config = config self.gpu_ctx = gpu_ctx # Optional context for pinned memory reuse self.width = width self.height = height self.frame_size_bytes = width * height * 3 # BGR24 ffmpeg_cmd = [ "ffmpeg", "-y", "-f", "rawvideo", "-vcodec", "rawvideo", "-s", f"{width}x{height}", "-pix_fmt", "bgr24", "-r", str(fps), "-i", "-", # Input from stdin # Video codec options "-c:v", config.get("ffmpeg_codec", "libx264"), # Default to libx264 if not specified "-preset", config.get("ffmpeg_preset", "medium"), "-b:v", config.get("ffmpeg_bitrate", "5M"), # Default bitrate "-pix_fmt", "yuv420p", # Common pixel format for compatibility "-tune", config.get("ffmpeg_tune", "fastdecode"), # Example tune # Output file output_file ] print(f"FFmpeg command: {' '.join(ffmpeg_cmd)}") # Log the command # Use a large buffer for stdin if possible bufsize = 10**8 # ~100MB buffer try: self.ffmpeg_process = subprocess.Popen( ffmpeg_cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, # Suppress stdout unless needed for debugging stderr=subprocess.PIPE, # Capture stderr bufsize=bufsize ) except FileNotFoundError: print("ERROR: ffmpeg command not found. Is FFmpeg installed and in your PATH?") raise except Exception as e: print(f"ERROR: Failed to start FFmpeg process: {e}") raise # Thread to read stderr without blocking self.stderr_queue = queue.Queue() self.stderr_thread = threading.Thread(target=self._read_stderr, daemon=True) self.stderr_thread.start() def _read_stderr(self): """Reads FFmpeg's stderr output line by line and puts it in a queue.""" try: # Use iter to read lines efficiently for line in iter(self.ffmpeg_process.stderr.readline, b''): self.stderr_queue.put(line.decode('utf-8', errors='replace').strip()) except Exception as e: # Handle exceptions during stderr reading if necessary self.stderr_queue.put(f"Error reading FFmpeg stderr: {e}") finally: # Signal that reading is done (optional) self.stderr_queue.put(None) # Sentinel value def write(self, frame: np.ndarray): """Writes a single frame (NumPy array, HxWxC BGR uint8) to FFmpeg.""" if self.ffmpeg_process.stdin.closed: print("Warning: Attempted to write to closed FFmpeg stdin.") return try: # Ensure frame is numpy and bytes if isinstance(frame, cp.ndarray): frame = cp.asnumpy(frame) # Convert CuPy array if necessary # Check shape and type? Optional, depends on strictness needed. # if frame.shape != (self.height, self.width, 3) or frame.dtype != np.uint8: # print(f"Warning: Invalid frame shape/type: {frame.shape} {frame.dtype}") # # Handle error? resize/convert? skip? # return self.ffmpeg_process.stdin.write(frame.tobytes()) except (OSError, BrokenPipeError) as e: print(f"ERROR writing frame to FFmpeg: {e}. FFmpeg might have terminated.") self.release() # Attempt cleanup raise # Re-raise the exception def release(self): """Closes FFmpeg stdin, waits for the process, and prints stderr.""" if self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed: try: self.ffmpeg_process.stdin.close() except OSError as e: print(f"Warning: Error closing FFmpeg stdin: {e}") # Wait for the process to finish return_code = self.ffmpeg_process.wait() # Wait for the stderr reader thread to finish self.stderr_thread.join(timeout=2.0) # Add timeout # Print accumulated stderr messages print("\n--- FFmpeg stderr ---") while not self.stderr_queue.empty(): line = self.stderr_queue.get() if line is None: break # Sentinel hit print(line) print("--- End FFmpeg stderr ---\n") if return_code != 0: print(f"Warning: FFmpeg process exited with non-zero status: {return_code}") else: print("FFmpeg process finished successfully.") # --------------------------------------------------------------------------- # ===================== KARAOKE VIDEO CREATOR (Adjusted) =================== class KaraokeVideoCreator: def __init__(self, config, text_renderer: TextRenderer): self.config = config self.fps = config["video_fps"] self.text_renderer = text_renderer self.num_visible_lines = config["num_visible_lines"] try: width, height = map(int, self.config["video_resolution"].split("x")) except ValueError: print(f"Aviso: Resolução inválida '{self.config['video_resolution']}'. Usando 1920x1080.") width, height = 1920, 1080 self.width = width self.height = height # Load background (unchanged logic) self.static_bg_frame_rgb_cp = None # Will hold CuPy array self.static_bg_frame_bgr_np = None # Fallback/initial frame use bg_path = config.get("background_image") try: if bg_path and os.path.exists(bg_path): bg_img = Image.open(bg_path).convert("RGB").resize((width, height), Image.Resampling.LANCZOS) self.static_bg_frame_rgb_cp = cp.asarray(np.array(bg_img)) # RGB for GPU self.static_bg_frame_bgr_np = np.array(bg_img)[:, :, ::-1].copy() # BGR for OpenCV/FFmpeg initial else: raise FileNotFoundError("Background not found or not specified") except Exception as e: print(f"Aviso: Falha ao carregar fundo '{bg_path}': {e}. Usando fundo preto.") self.static_bg_frame_rgb_cp = cp.zeros((height, width, 3), dtype=cp.uint8) self.static_bg_frame_bgr_np = np.zeros((height, width, 3), dtype=np.uint8) # Initialize GPU Context and CUDA Processor *here* self.init_gpu() # Ensures GPU is ready self.gpu_ctx = GPURenderContext(self.width, self.height, self.config) self.cuda_processor = CUDAProcessor( self.config, self.static_bg_frame_rgb_cp, # Pass the RGB CuPy background self.gpu_ctx # Pass the shared GPU context ) def init_gpu(self): """Initializes the GPU device and memory pool (minimal version).""" try: cp.cuda.Device(0).use() # Memory pool is handled by GPURenderContext now cp.cuda.Stream.null.synchronize() _ = cp.zeros(1) # Warm-up allocation print("GPU initialized successfully.") except cp.cuda.runtime.CUDARuntimeError as e: print(f"Erro Crítico: Falha ao inicializar CUDA: {e}") raise except Exception as e: print(f"Erro Crítico: Falha inesperada na inicialização da GPU: {e}") raise def _get_next_global_indices(self, current_displayed_indices, count=2): """Finds the next `count` global line indices not currently displayed.""" valid_indices = {idx for idx in current_displayed_indices if idx is not None} max_existing_idx = max(valid_indices) if valid_indices else -1 next_indices = [] candidate_idx = max_existing_idx + 1 while len(next_indices) < count: next_indices.append(candidate_idx) candidate_idx += 1 return next_indices def create_video(self, syllable_lines, long_pauses, output_file, audio_file_path): """Creates the karaoke video using the optimized CUDA processor.""" start_total_time = time.time() width, height = self.width, self.height N = self.num_visible_lines # Number of slots on screen # --- Determine video duration --- audio_duration = get_audio_duration(audio_file_path) last_syl_end_time = 0.0 first_syl_start_time = 0.0 if syllable_lines: # Find last valid line and its end time last_valid_line_idx = next((idx for idx in range(len(syllable_lines) - 1, -1, -1) if syllable_lines[idx]), -1) if last_valid_line_idx != -1 and syllable_lines[last_valid_line_idx]: # Get end time of the last element in the last valid line last_syl_end_time = syllable_lines[last_valid_line_idx][-1][1] # index 1 is end_time # Find first valid line and its start time first_valid_line_idx = next((idx for idx, ln in enumerate(syllable_lines) if ln), -1) if first_valid_line_idx != -1 and syllable_lines[first_valid_line_idx]: first_syl_start_time = syllable_lines[first_valid_line_idx][0][0] # index 0 is start_time else: print("Aviso: Nenhuma linha de sílaba para processar.") last_syl_end_time = 1.0 # Default minimum duration if no syllables # Set video end time based on audio or subtitles, adding a small buffer video_end_time = last_syl_end_time + 0.5 # Default end based on subtitles if audio_duration is not None: video_end_time = max(video_end_time, audio_duration + 0.1) print(f"Usando duração do áudio: {audio_duration:.2f}s") else: print(f"Aviso: Sem duração de áudio. Vídeo terminará em {video_end_time:.2f}s (baseado nas legendas).") total_frames = math.ceil(video_end_time * self.fps) if total_frames <= 0: print("Erro: Duração total do vídeo calculada como 0 ou negativa. Abortando.") return print(f"Duração estimada: {video_end_time:.2f}s | Total de frames: {total_frames}") # --- Initialize FFmpeg Writer --- try: # Pass gpu_ctx if FFmpegWriter can use it for pinned memory video_writer = FFmpegWriter(output_file, width, height, self.fps, self.config, self.gpu_ctx) except Exception as e: print(f"Erro Crítico: Falha ao inicializar FFmpegWriter: {e}") return video_lock = threading.Lock() # Lock for thread-safe writing to FFmpeg process # --- Main Processing Loop --- current_frame_index = 0 num_total_lines = len(syllable_lines) pbar = tqdm(total=total_frames, unit="frames", desc="Gerando vídeo", bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]") # State variables for line display logic # displayed_content format: list of N tuples: (global_line_idx | None, line_data | None) displayed_content = [] for idx in range(N): line_data = syllable_lines[idx] if idx < num_total_lines else None displayed_content.append((idx if line_data else None, line_data)) completed_global_line_indices = set() # Keep track of lines fully processed prev_render_data_for_fill = None # Store last valid render data for filling gaps # --- Initial Static Frames (before first syllable) --- initial_static_frames = 0 if first_syl_start_time > 0.01: # Add a small tolerance initial_static_frames = max(0, int(first_syl_start_time * self.fps)) if initial_static_frames > 0: print(f"Gerando {initial_static_frames} frames estáticos iniciais...") try: # Render the initial state (first N lines, no highlight) initial_base_cp, initial_mask_cp, initial_syl_info, _ = self.text_renderer.render_text_images( displayed_content, -1, width, height # -1 indicates no active line ) if initial_base_cp is not None and initial_mask_cp is not None: # Use process_frames_streaming to render static frames # Pass empty syllable info or handle appropriately in CUDAProcessor if needed self.cuda_processor.process_frames_streaming( initial_base_cp, initial_mask_cp, [], # No active syllables initially (-1, -1), video_writer, video_lock, self.fps, 0, initial_static_frames, # Frame range -1, # No active line index set(), # No completed lines initially long_pauses # Pass pauses for potential initial progress bar ) prev_render_data_for_fill = (initial_base_cp.copy(), initial_mask_cp.copy()) # Store for potential gap filling del initial_base_cp, initial_mask_cp, initial_syl_info else: print("Aviso: Renderização inicial falhou. Preenchendo com background.") # Fallback: write raw background frames with video_lock: for _ in range(initial_static_frames): video_writer.write(self.static_bg_frame_bgr_np) prev_render_data_for_fill = None pbar.update(initial_static_frames) current_frame_index = initial_static_frames except Exception as e: print(f"Erro ao gerar frames iniciais: {e}") traceback.print_exc() # Fallback if rendering fails with video_lock: for _ in range(initial_static_frames): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(initial_static_frames) current_frame_index = initial_static_frames prev_render_data_for_fill = None # --- Process Each Line/Sentence --- last_line_processed_frame = current_frame_index trigger_1_pending_for_line = -1 # Track if top needs update trigger_2_pending = False # Track if bottom needs update last_trigger_1_line_completed = -1 last_trigger_2_line_completed = -1 for current_global_line_idx, current_line_syllables in enumerate(syllable_lines): if not current_line_syllables: print(f"Aviso: Pulando linha global vazia {current_global_line_idx}") continue line_start_time = current_line_syllables[0][0] line_end_time = current_line_syllables[-1][1] line_start_frame = int(line_start_time * self.fps) # --- Handle Line Transitions and Content Updates --- # Check if Trigger 2 (bottom update) is pending from the *previous* line completion if trigger_2_pending and current_global_line_idx != last_trigger_2_line_completed: print(f"Trigger 2: Atualizando slots inferiores antes da linha {current_global_line_idx}") current_indices_on_screen = [content[0] for content in displayed_content] next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2 available new_idx_bottom1 = next_indices[0] new_data_bottom1 = syllable_lines[new_idx_bottom1] if new_idx_bottom1 < num_total_lines else None displayed_content[N-2] = (new_idx_bottom1 if new_data_bottom1 else None, new_data_bottom1) new_idx_bottom2 = next_indices[1] new_data_bottom2 = syllable_lines[new_idx_bottom2] if new_idx_bottom2 < num_total_lines else None displayed_content[N-1] = (new_idx_bottom2 if new_data_bottom2 else None, new_data_bottom2) trigger_2_pending = False # Reset trigger last_trigger_2_line_completed = current_global_line_idx # Mark update time # Find the local slot index for the current global line active_local_idx = -1 for local_idx, (global_idx, _) in enumerate(displayed_content): if global_idx == current_global_line_idx: active_local_idx = local_idx break if active_local_idx == -1: print(f"ERRO FATAL: Linha ativa {current_global_line_idx} não encontrada nos slots! {displayed_content}") # This indicates a logic error in content updates. Need to decide how to handle. # Option 1: Try to recover (e.g., force it into the last slot?) - Risky # Option 2: Abort or skip processing this line and fill time - Safer # For now, fill time and skip processing the line's content frames_to_fill_until_next = 0 next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] next_start_frame = int(next_line_start_time * self.fps) frames_to_fill_until_next = max(0, next_start_frame - current_frame_index) else: # No more lines frames_to_fill_until_next = max(0, total_frames - current_frame_index) if frames_to_fill_until_next > 0: print(f" Preenchendo {frames_to_fill_until_next} frames devido à linha não encontrada...") # Use previous render data if available for filling if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, frames_to_fill_until_next, -1, completed_global_line_indices, long_pauses ) else: # Fallback to background with video_lock: for _ in range(frames_to_fill_until_next): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(frames_to_fill_until_next) current_frame_index += frames_to_fill_until_next continue # Skip to the next line in syllable_lines # --- Fill Gap Before Line Starts (if needed) --- frames_to_fill_gap = max(0, line_start_frame - current_frame_index) if frames_to_fill_gap > 0: print(f"Preenchendo {frames_to_fill_gap} frames de gap antes da linha {current_global_line_idx}...") if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, frames_to_fill_gap, -1, completed_global_line_indices, long_pauses # No active line during gap ) else: # Fallback with video_lock: for _ in range(frames_to_fill_gap): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(frames_to_fill_gap) current_frame_index += frames_to_fill_gap # --- Render the current state with the active line --- render_success = False render_data = None try: # Render base text and mask for the current arrangement of lines render_data = self.text_renderer.render_text_images( displayed_content, active_local_idx, width, height ) base_cp, mask_cp, all_syl_info, active_indices = render_data if base_cp is None or mask_cp is None: raise ValueError("TextRenderer returned None for base or mask.") render_success = True except Exception as e: print(f"Erro Crítico ao renderizar texto para linha {current_global_line_idx}: {e}") traceback.print_exc() render_success = False # Fallback handled below # --- Process the frames for this line's duration --- if render_success: base_cp, mask_cp, all_syl_info, active_indices = render_data # Determine the end frame for processing this line's active state # Usually ends when the next line starts, or at video end next_line_start_time = float('inf') next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] # End processing at the earliest of: line end + buffer, next line start, video end processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time) # Add small buffer? processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames) # Calculate number of frames to generate for this active line state effective_start_frame = max(line_start_frame, current_frame_index) # Start from current pos if line started earlier num_frames_for_line = max(0, processing_end_frame - effective_start_frame) if num_frames_for_line > 0: # Check if Trigger 1 needs to be set (update top half) is_penultimate_line_slot = (active_local_idx == N - 2) trigger_1_frame = -1 if is_penultimate_line_slot and current_global_line_idx != last_trigger_1_line_completed: line_duration = max(line_end_time - line_start_time, 0.01) midpoint_time = line_start_time + line_duration / 2.0 trigger_1_frame = int(midpoint_time * self.fps) trigger_1_pending_for_line = current_global_line_idx # Mark which line set the trigger # Call the optimized CUDA processor self.cuda_processor.process_frames_streaming( base_cp, mask_cp, all_syl_info, active_indices, video_writer, video_lock, self.fps, effective_start_frame, num_frames_for_line, current_global_line_idx, # Pass the *active* line index completed_global_line_indices, # Pass completed lines for highlighting long_pauses # Pass pauses for progress bar ) processed_frames_end = effective_start_frame + num_frames_for_line # --- Handle Trigger 1 (Top Update) --- # Check if the trigger frame occurred within the processed frames if trigger_1_pending_for_line == current_global_line_idx and trigger_1_frame != -1 and processed_frames_end > trigger_1_frame: print(f"Trigger 1: Atualizando slots superiores durante linha {current_global_line_idx}") current_indices_on_screen = [content[0] for content in displayed_content] next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2 new_idx_top1 = next_indices[0] new_data_top1 = syllable_lines[new_idx_top1] if new_idx_top1 < num_total_lines else None displayed_content[0] = (new_idx_top1 if new_data_top1 else None, new_data_top1) new_idx_top2 = next_indices[1] new_data_top2 = syllable_lines[new_idx_top2] if new_idx_top2 < num_total_lines else None displayed_content[1] = (new_idx_top2 if new_data_top2 else None, new_data_top2) trigger_1_pending_for_line = -1 # Reset trigger last_trigger_1_line_completed = current_global_line_idx # Mark update time # Update progress bar and current frame index # pbar update happens inside cuda_processor call now (implicitly via frame count) current_frame_index = processed_frames_end # Store the rendered mask/base for potential gap filling later prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy()) del base_cp, mask_cp, all_syl_info # Free GPU memory for this line's render else: # No frames needed processing (e.g., line ends before it starts relative to current_frame_index) print(f"Aviso: Linha {current_global_line_idx} não precisou de processamento de frames ({num_frames_for_line}).") # Ensure prev_render_data is updated even if no processing happened, using the latest render prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy()) if base_cp is not None and mask_cp is not None else prev_render_data_for_fill # Mark the current global line as completed *after* processing its frames completed_global_line_indices.add(current_global_line_idx) # --- Handle Trigger 2 (Bottom Update) --- # Check if this line was the last one in the visible slots is_last_line_slot = (active_local_idx == N - 1) if is_last_line_slot: trigger_2_pending = True # Set trigger for the *next* line to handle else: # render_success was False print(f"Aviso: Falha na renderização para linha {current_global_line_idx}. Preenchendo tempo...") # Calculate fill duration similar to the success case next_line_start_time = float('inf') next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1) if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0] processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time) processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames) effective_start_frame = max(line_start_frame, current_frame_index) num_frames_to_fill = max(0, processing_end_frame - effective_start_frame) if num_frames_to_fill > 0: if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, effective_start_frame, num_frames_to_fill, -1, completed_global_line_indices, long_pauses # No active line ) else: # Fallback with video_lock: for _ in range(num_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(num_frames_to_fill) current_frame_index += num_frames_to_fill # Still mark line as 'completed' in terms of timing, even if render failed completed_global_line_indices.add(current_global_line_idx) # Clean up GPU memory periodically if needed (usually handled by pool) cp.get_default_memory_pool().free_all_blocks() # --- Fill Remaining Frames (after last line) --- final_frames_to_fill = total_frames - current_frame_index if final_frames_to_fill > 0: print(f"Preenchendo {final_frames_to_fill} frames finais...") if prev_render_data_for_fill: fill_base, fill_mask = prev_render_data_for_fill # Optionally add a fade-out effect here by manipulating the mask or base over time # For simplicity, just hold the last rendered state self.cuda_processor.process_frames_streaming( fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps, current_frame_index, final_frames_to_fill, -1, completed_global_line_indices, long_pauses # No active line ) del fill_base, fill_mask # Cleanup last stored render data else: # Fallback with video_lock: for _ in range(final_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np) pbar.update(final_frames_to_fill) current_frame_index += final_frames_to_fill # --- Cleanup --- pbar.close() video_writer.release() # Close FFmpeg process and print stderr if prev_render_data_for_fill: # Should be None, but just in case del prev_render_data_for_fill del displayed_content[:] # Clear display list # Explicitly clear GPU context graph if desired if self.gpu_ctx.graph: del self.gpu_ctx.graph self.gpu_ctx.graph = None # Optionally clear pinned memory # if self.gpu_ctx.pinned_mem: del self.gpu_ctx.pinned_mem; self.gpu_ctx.pinned_mem = None cp.get_default_memory_pool().free_all_blocks() cp.get_default_pinned_memory_pool().free_all_blocks() end_total_time = time.time() print(f"Criação do vídeo concluída em {time.strftime('%H:%M:%S', time.gmtime(end_total_time - start_total_time))}.") # --------------------------------------------------------------------------- # ============================= MAIN ====================================== def main(): start_main_time = time.time() config = DEFAULT_CONFIG.copy() # Start with defaults # --- Argument Parsing (Example - replace with your preferred method) --- # import argparse # parser = argparse.ArgumentParser(description="Karaoke Video Creator") # parser.add_argument("-s", "--subtitles", default=config["default_subtitle_file"], help="Input subtitle file (PSV format)") # parser.add_argument("-a", "--audio", default="audio.wav", help="Input audio file") # parser.add_argument("-o", "--output", default=config["default_output_file"], help="Output video file") # parser.add_argument("--font", default=config["font_path"], help="Path to TTF font file") # # Add other config options as needed # args = parser.parse_args() # # # Update config from args # config["default_subtitle_file"] = args.subtitles # config["default_output_file"] = args.output # config["font_path"] = args.font # audio_file = args.audio # subtitle_file = args.subtitles # output_file = args.output # --- Using defaults defined in DEFAULT_CONFIG for simplicity --- subtitle_file = config.get("default_subtitle_file", "legenda.psv") output_file = config.get("default_output_file", "video_karaoke_char_level.mp4") audio_file = "audio.wav" # Hardcoded for now, consider argparse print("--- Configuração ---") for key, value in config.items(): print(f" {key}: {value}") print(f" Arquivo de Legenda: {subtitle_file}") print(f" Arquivo de Áudio: {audio_file}") print(f" Arquivo de Saída: {output_file}") print("--------------------\n") # --- Basic Checks --- if not os.path.exists(subtitle_file): print(f"Erro Crítico: Arquivo de legenda '{subtitle_file}' não encontrado.") return if not os.path.exists(audio_file): print(f"Aviso: Arquivo de áudio '{audio_file}' não encontrado. Duração baseada nas legendas.") # --- Initialize GPU & CPU Affinity --- try: cp.cuda.Device(0).use() print(f"Using GPU: {cp.cuda.Device(0).pci_bus_id}") except cp.cuda.runtime.CUDARuntimeError as e: if 'no CUDA-capable device is detected' in str(e): print("Erro Crítico: Nenhuma GPU CUDA detectada.") elif 'CUDA driver version is insufficient' in str(e): print("Erro Crítico: Driver NVIDIA CUDA desatualizado.") else: print(f"Erro Crítico: Falha ao inicializar CUDA: {e}") return except Exception as e: print(f"Erro inesperado na inicialização da GPU: {e}") return try: process = psutil.Process() affinity = list(range(os.cpu_count())) process.cpu_affinity(affinity) print(f"Afinidade da CPU definida para todos os {len(affinity)} cores.") except (ImportError, AttributeError, OSError, ValueError) as e: print(f"Aviso: Não foi possível definir afinidade da CPU: {e}") # --- Initialize Core Components --- try: text_renderer = TextRenderer(config) syllable_dict, not_found_words = load_syllables() # Load syllable dictionary if not syllable_dict: print("Aviso: Dicionário de sílabas vazio ou não carregado.") subtitle_processor = SubtitleProcessor(text_renderer, config, syllable_dict, not_found_words) except Exception as e: print(f"Erro Crítico ao inicializar componentes: {e}") traceback.print_exc() return # --- Process Subtitles --- lines = [] long_pauses = [] try: video_width, _ = map(int, config["video_resolution"].split("x")) print("Processando legendas...") lines, long_pauses = subtitle_processor.process_subtitles_to_syllable_lines(subtitle_file, video_width) print(f"Processamento concluído: {len(lines)} linhas visuais, {len(long_pauses)} pausas longas detectadas.") if not lines and not long_pauses: print("Aviso: Nenhuma linha visual ou pausa longa encontrada. O vídeo pode ficar vazio ou curto.") # Decide if you want to exit here or proceed with potentially empty video # return if not_found_words: print(f"\nAviso: {len(not_found_words)} palavras não encontradas no dicionário de sílabas:") print(" ", ", ".join(sorted(list(not_found_words))[:20]) + ("..." if len(not_found_words) > 20 else "")) print(" -> Estas palavras serão destacadas como um todo.\n") except Exception as e: print(f"Erro Crítico ao processar legendas: {e}") traceback.print_exc() return # --- Create Video --- if not lines and not long_pauses: print("Nenhuma linha visual ou pausa longa para processar. Saindo.") return try: # KaraokeVideoCreator now initializes GPU context and CUDA processor internally karaoke_creator = KaraokeVideoCreator(config, text_renderer) print("Iniciando criação do vídeo...") karaoke_creator.create_video(lines, long_pauses, output_file, audio_file) except Exception as e: print(f"\nErro Crítico durante a criação do vídeo:") print(f"{type(e).__name__}: {e}") traceback.print_exc() print("\nO processo foi interrompido devido a um erro.") finally: # Final GPU memory cleanup cp.get_default_memory_pool().free_all_blocks() cp.get_default_pinned_memory_pool().free_all_blocks() print("Recursos da GPU liberados.") end_main_time = time.time() total_duration_str = time.strftime('%H:%M:%S', time.gmtime(end_main_time - start_main_time)) print(f"\nFinalizando Karaokê Creator. Tempo total do script: {total_duration_str}") if __name__ == "__main__": main()
O cudaErrorIllegalAddress
costuma aparecer quando um kernel acessa memória fora do limite do array que recebeu.
No seu “hot–spot” (_hl_kernel
) isso acontece porque:
A variável implícita i
do ElementwiseKernel
não representa o índice do frame, mas sim o índice absoluto do elemento que está sendo executado (pixel).
cppfloat current_cut_x = cut_x_batch[i]; // ← i pode ser maior que len(cut_x_batch)
Como você passa um vector cut_x_batch
com apenas B valores e lança o kernel sobre
B × H × W
elementos, a maior parte das threads lê fora dos limites -– memória ilegal.
Depois que o kernel corrompe memória, a primeira operação que toca em
syl_meta["line_idx"][k]
(linha 665) quebra e a exceção “salta” aí.
Execute o kernel uma vez por frame e passe cut_x
como escalar, em vez de
tentar indexá-lo dentro do kernel.
python# 1️⃣ Redefina o kernel para receber apenas um corte escalar self._hl_kernel = cp.ElementwiseKernel( 'float32 cut_x, bool text_mask, uint16 X, uint16 Y, ' 'uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, ' 'float32 hl_r, float32 hl_g, float32 hl_b, ' 'float32 base_r, float32 base_g, float32 base_b', 'float32 io_r, float32 io_g, float32 io_b', r''' bool in_bbox = (X >= syl_x) && (X < syl_x + syl_w) && (Y >= syl_y) && (Y < syl_y + syl_h) && text_mask; if (in_bbox) { if (X < cut_x) { io_r = hl_r; io_g = hl_g; io_b = hl_b; // parte já “cantada” } else { io_r = base_r; io_g = base_g; io_b = base_b; // ainda por cantar } } ''', name='apply_syllable_highlight_rgb_oneframe' )
python# 2️⃣ Dentro de _render_batch: for k in syl_indices_in_span: # um loop por sílaba if syl_meta["line_idx"][k] != active_line_idx: continue sx, sy, sw, sh = map(int, (syl_meta["x"][k], syl_meta["y"][k], syl_meta["w"][k], syl_meta["h"][k])) if sw == 0 or sh == 0: continue # shape: (B,) → percorremos frame-a-frame cuts = cut_x_batch_span[:, k - s0] # s0 é início da span for f in range(B): self._hl_kernel( cuts[f].astype(cp.float32), # 🔸 passa ESCALAR text_mask_bool, self.ctx.xg, self.ctx.yg, sx, sy, sw, sh, *self.ctx.hl_rgbf, *self.ctx.base_rgbf, inter_f[f, ..., 0], # canais do frame f inter_f[f, ..., 1], inter_f[f, ..., 2] )
Este ajuste garante:
cut_x
).Problema | Como resolver |
---|---|
syl_indices_in_span_host = cp.asnumpy(...) faz ida/volta CPU⇄GPU para cada batch | Faça o loop inteiro em GPU: for k in syl_indices_in_span.tolist(): ... ou use int(k) diretamente (já está em device). |
Muitas conversões copy() de imagens | Armazene prev_render_data_for_fill como view (base_cp.copy() não é necessário; o array é imutável nesse ponto). |
Pools não são esvaziados | Depois dos grandes loops: cp.get_default_memory_pool().free_all_blocks() já é suficiente; evite chamá-lo dentro de cada batch. |
frames_per_batch fixo = 64 | Extraia da VRAM disponível (cp.cuda.runtime.memGetInfo ) para usar lotes maiores quando possível. |
--dry-run
ou limite manual).Resumo: o kernel estava acessando além do vector de cortes.
Splittar a chamada por frame (ou redesenhar o kernel para 3-D launch) elimina o acesso ilegal e estabiliza o pipeline.