fix this issues: ``` C:\Users\lucas\Downloads\Scr...

Question

fix this issues:

text
C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda>PS_bak_new_clean-base.py
--- Configuração ---
  font_path: C:/Users/lucas/LilitaOne-Regular.ttf
  font_size: 140
  video_resolution: 1920x1080
  video_fps: 60
  base_text_color: #FFFFFF
  highlight_text_color: #ff0000
  num_visible_lines: 4
  upper_case: True
  background_image: capa.png
  frames_per_batch: 64
  default_subtitle_file: legenda.psv
  default_output_file: video_karaoke_char_level.mp4
  ffmpeg_preset: p4
  ffmpeg_tune: hq
  ffmpeg_bitrate: 20M
  ffmpeg_codec: h264_nvenc
  vertical_shift_pixels: 130
  min_char_duration: 0.01
  cuda_graph_warmup_frames: 4
  max_visual_fill_duration: 3.0
  Arquivo de Legenda: legenda.psv
  Arquivo de Áudio: audio.wav
  Arquivo de Saída: video_karaoke_char_level.mp4
--------------------

Using GPU: 0000:01:00.0
Afinidade da CPU definida para todos os 12 cores.
Processando legendas...
Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado.
Processamento concluído: 8 linhas visuais, 0 pausas longas detectadas.
GPU initialized successfully.
Iniciando criação do vídeo...
Usando duração do áudio: 52.01s
Duração estimada: 52.11s | Total de frames: 3127
FFmpeg command: ffmpeg -y -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt bgr24 -r 60 -i - -c:v h264_nvenc -preset p4 -b:v 20M -pix_fmt yuv420p -tune hq video_karaoke_char_level.mp4
Gerando vídeo:   0%|                                                                                                    | 0/3127 [00:00<?, ?frames/s]Gerando 35 frames estáticos iniciais...
Gerando vídeo:   1%|█                                                                                          | 35/3127 [00:00<00:46, 66.63frames/s]
Erro Crítico durante a criação do vídeo:
CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
  File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1642, in main
    karaoke_creator.create_video(lines, long_pauses, output_file, audio_file)
  File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 1411, in create_video
    self.cuda_processor.process_frames_streaming(
  File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 885, in process_frames_streaming
    self._render_batch(batch_frame_times, text_mask_bool, syl_meta,
  File "C:\Users\lucas\Downloads\Scripts\Karaoke-Creator\playdasegunda\PS_bak_new_clean-base.py", line 665, in _render_batch
    if syl_meta["line_idx"][k] != active_line_idx:
  File "cupy\\_core\\core.pyx", line 1289, in cupy._core.core._ndarray_base.__nonzero__
  File "cupy\\_core\\core.pyx", line 1910, in cupy._core.core._ndarray_base.get
  File "cupy\\cuda\\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 607, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

O processo foi interrompido devido a um erro.
Gerando vídeo:   1%|█                                                                                          | 35/3127 [00:00<01:15, 41.15frames/s]
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00>
Traceback (most recent call last):
  File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00>
Traceback (most recent call last):
  File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00>
Traceback (most recent call last):
  File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <cyfunction PinnedMemory.__del__ at 0x000001F69BF6FA00>
Traceback (most recent call last):
  File "cupy\\cuda\\pinned_memory.pyx", line 36, in cupy.cuda.pinned_memory.PinnedMemory.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 576, in cupy_backends.cuda.api.runtime.freeHost
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Recursos da GPU liberados.

Finalizando Karaokê Creator. Tempo total do script: 00:00:01
Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860>
Traceback (most recent call last):
  File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: <cyfunction Stream.__del__ at 0x000001F69BF61860>
Traceback (most recent call last):
  File "cupy\\cuda\\stream.pyx", line 485, in cupy.cuda.stream.Stream.__del__
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 851, in cupy_backends.cuda.api.runtime.streamDestroy
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 571, in cupy_backends.cuda.api.runtime.free
  File "cupy_backends\\cuda\\api\\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\\cuda\\api\\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\\cuda\\api\\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Error in sys.excepthook:

Original exception was:

code:

python
"""
karaoke_fast.py – versão otimizada
Autor: ChatGPT (OpenAI), abr/2025
"""

# ----------------------------  IMPORTS  ------------------------------------
import cupy as cp
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import subprocess, threading, queue, math, os, time, psutil, traceback
from tqdm import tqdm
# ---------------------------------------------------------------------------

# ------------------------  CONFIGURAÇÃO  -----------------------------------
DEFAULT_CONFIG = {
    "font_path": "C:/Users/lucas/LilitaOne-Regular.ttf",
    "font_size": 140,
    "video_resolution": "1920x1080",
    "video_fps": 60,
    "base_text_color": "#FFFFFF",
    "highlight_text_color": "#ff0000",
    "num_visible_lines": 4,
    "upper_case": True,
    "background_image": "capa.png",
    "frames_per_batch": 64, # agora valor-mínimo
    "default_subtitle_file": "legenda.psv",
    "default_output_file": "video_karaoke_char_level.mp4",
    "ffmpeg_preset": "p4",
    "ffmpeg_tune": "hq",
    "ffmpeg_bitrate": "20M",
    "ffmpeg_codec": "h264_nvenc",
    "vertical_shift_pixels": 130,
    "min_char_duration": 0.01,
    "cuda_graph_warmup_frames": 4,      # <-- novo
    "max_visual_fill_duration": 3.0,    # <-- agora usado globalmente
}
# ---------------------------------------------------------------------------

# ==============================  UTILS  ====================================
def hex_to_bgr_cupy(hex_color: str) -> cp.ndarray:
    hex_color = hex_color.lstrip('#')
    rgb = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
    return cp.array(rgb[::-1], dtype=cp.uint8)

def get_audio_duration(audio_file_path):
    if not os.path.exists(audio_file_path):
        print(f"Aviso: Arquivo de áudio não encontrado: {audio_file_path}")
        return None
    try:
        command = [
            "ffprobe", "-v", "error", "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", audio_file_path
        ]
        result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, check=True)
        return float(result.stdout.strip())
    except FileNotFoundError:
        print("Erro: ffprobe não encontrado. Certifique-se de que o FFmpeg está no PATH.")
        return None
    except Exception as e:
        print(f"Erro ao obter duração do áudio: {e}")
        return None

def load_syllables(filepath="syllables.txt"):
    syllable_dict = {}
    not_found_words = set()
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if line and '|' in line:
                    word, syllables = line.split('|', 1)
                    syllable_dict[word.strip().lower()] = syllables.strip()
    except FileNotFoundError:
        print(f"Aviso: Arquivo de sílabas '{filepath}' não encontrado.")
    except Exception as e:
        print(f"Erro ao carregar sílabas: {e}")
    return syllable_dict, not_found_words
# ---------------------------------------------------------------------------


# ======================  TEXT RENDERER (SEM ALTERAÇÕES)  ===================
class TextRenderer:
    def __init__(self, config):
        self.config = config
        self.font_path = config["font_path"]
        self.font_size = config["font_size"]
        self.num_visible_lines = config["num_visible_lines"]
        self.upper_case = config["upper_case"]
        self.base_text_color = config["base_text_color"]
        self._font_cache = {}

        try:
            self.font = ImageFont.truetype(self.font_path, self.font_size)
            self._font_cache[self.font_size] = self.font
            temp_img = Image.new("RGB", (1, 1))
            temp_draw = ImageDraw.Draw(temp_img)
            space_bbox = temp_draw.textbbox((0, 0), " ", font=self.font)
            try: self.space_width_ref = temp_draw.textlength(" ", font=self.font)
            except AttributeError: self.space_width_ref = space_bbox[2] - space_bbox[0] if space_bbox else int(self.font_size * 0.25)
            try:
                sample_bbox = self.font.getbbox("Tg")
                self.line_height_ref = sample_bbox[3] - sample_bbox[1]
            except AttributeError:
                sample_bbox_fallback = temp_draw.textbbox((0, 0), "Tg", font=self.font)
                self.line_height_ref = sample_bbox_fallback[3] - sample_bbox_fallback[1] if sample_bbox_fallback else int(self.font_size * 1.2)
            del temp_draw, temp_img
        except Exception as e:
            print(f"Aviso: Falha ao carregar fonte '{self.font_path}'. Usando padrão. Erro: {e}")
            self.font = ImageFont.load_default()
            try: bbox = self.font.getbbox("M"); self.font_size = bbox[3] - bbox[1]
            except AttributeError: self.font_size = 20
            self._font_cache[self.font_size] = self.font
            temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img)
            try: self.space_width_ref = temp_draw.textlength(" ", font=self.font)
            except AttributeError: self.space_width_ref = 10
            try: bbox = self.font.getbbox("Tg"); self.line_height_ref = bbox[3] - bbox[1]
            except AttributeError: self.line_height_ref = 30
            del temp_draw, temp_img

        spacing_multiplier = 1.0 if self.num_visible_lines <= 1 else (0.8 if self.num_visible_lines == 2 else (0.6 if self.num_visible_lines == 3 else 0.4))
        self.line_spacing = max(0, int(self.line_height_ref * spacing_multiplier))

    def _get_font_with_size(self, size: int) -> ImageFont.FreeTypeFont:
        size = max(1, int(size))
        if size in self._font_cache: return self._font_cache[size]
        try: f = ImageFont.truetype(self.font_path, size)
        except Exception: f = ImageFont.load_default()
        self._font_cache[size] = f
        return f

    def _calculate_line_width(self, line_elements, draw, font) -> int:
        width_total = 0
        for _, _, txt, _ in line_elements:
            width_total += self._get_element_width(draw, txt, font)
        return width_total

    def _get_element_width(self, draw, text, font):
        if text == " ": return self.space_width_ref
        try: return draw.textlength(text, font=font)
        except AttributeError:
            try: bbox = draw.textbbox((0, 0), text, font=font); return bbox[2] - bbox[0] if bbox else 0
            except AttributeError:
                try: width, _ = draw.textsize(text, font=font); return width
                except AttributeError:
                    font_size_est = getattr(font, 'size', self.font_size // 2)
                    return len(text) * (font_size_est // 2)
        except Exception:
             font_size_est = getattr(font, 'size', self.font_size // 2)
             return len(text) * (font_size_est // 2)

    def render_text_images(self, displayed_content, active_line_local_idx, width, height):
        img_base = Image.new("RGB", (width, height), (0, 0, 0))
        img_mask = Image.new("L", (width, height), 0)
        draw_base = ImageDraw.Draw(img_base)
        draw_mask = ImageDraw.Draw(img_mask)
        max_allowed_width = int(width * 0.90)
        min_font_size = max(10, int(self.font_size * 0.60))
        line_render_data = []

        for global_idx, line_elements in displayed_content:
            if not (line_elements and global_idx is not None):
                line_render_data.append(None)
                continue
            font_line_size = self.font_size
            font_line = self._get_font_with_size(font_line_size)
            line_width_px = self._calculate_line_width(line_elements, draw_base, font_line)
            reduction_step = max(1, int(self.font_size * 0.05))
            while line_width_px > max_allowed_width and font_line_size > min_font_size:
                font_line_size = max(min_font_size, font_line_size - reduction_step)
                font_line = self._get_font_with_size(font_line_size)
                line_width_px = self._calculate_line_width(line_elements, draw_base, font_line)
                if font_line_size == min_font_size: break
            try: h_ref = font_line.getbbox("Tg"); line_height_px = h_ref[3] - h_ref[1]
            except Exception: line_height_px = int(self.line_height_ref * (font_line_size / self.font_size))
            line_render_data.append({"font": font_line, "font_size": font_line_size, "height": line_height_px, "width": line_width_px, "elements": line_elements, "global_idx": global_idx})

        vertical_shift = self.config.get("vertical_shift_pixels", 0)
        block_height_ref = self.num_visible_lines * self.line_height_ref + (self.num_visible_lines - 1) * self.line_spacing
        start_y_ref = max(0, (height - block_height_ref) // 2 + vertical_shift)
        line_start_y_positions = [int(start_y_ref + i * (self.line_height_ref + self.line_spacing)) for i in range(self.num_visible_lines)]
        all_syllable_render_info = []
        active_syllable_indices = (-1, -1)
        current_global_syl_idx = 0
        sentence_end_punctuation = ".!?"

        for local_idx, render_info in enumerate(line_render_data):
            if render_info is None: continue
            font_line = render_info["font"]
            line_width_px = render_info["width"]
            elements_in_line = render_info["elements"]
            current_global_line_idx = render_info["global_idx"]
            is_active_line = (local_idx == active_line_local_idx)
            if is_active_line: active_syllable_start_idx_global = current_global_syl_idx
            line_start_x = (width - line_width_px) // 2
            current_x = float(line_start_x)
            line_y_draw = line_start_y_positions[local_idx]
            if line_y_draw is None: continue

            for i, (start_time, end_time, element_text, _) in enumerate(elements_in_line):
                element_width = self._get_element_width(draw_base, element_text, font_line)
                if not element_text.isspace():
                    stripped_text = element_text.rstrip()
                    is_sentence_end = bool(stripped_text and stripped_text[-1] in sentence_end_punctuation)
                    try:
                        draw_x, draw_y = int(current_x), line_y_draw
                        element_text_base = element_text_mask = element_text
                        if is_sentence_end and element_text.rstrip().endswith('.'): element_text_mask = element_text_mask.rstrip('.')
                        draw_base.text((draw_x, draw_y), element_text_base, font=font_line, fill=self.base_text_color)
                        if element_text_mask: draw_mask.text((draw_x, draw_y), element_text_mask, font=font_line, fill=255)
                        final_bbox = draw_base.textbbox((draw_x, draw_y), element_text, font=font_line)
                        if final_bbox:
                            bbox_left, bbox_top, bbox_right, bbox_bottom = final_bbox
                            syl_w_actual, syl_h_actual = bbox_right - bbox_left, bbox_bottom - bbox_top
                            bbox_top_final = bbox_top
                        else:
                            line_height_px_fallback = render_info["height"]
                            bbox_left, bbox_top_final = draw_x, draw_y
                            syl_w_actual, syl_h_actual = element_width, line_height_px_fallback
                    except Exception as e:
                         print(f"Fallback render/bbox for: {element_text}. Err: {e}")
                         draw_x, draw_y = int(current_x), line_y_draw
                         try:
                             draw_base.text((draw_x, draw_y), element_text, font=font_line, fill=self.base_text_color)
                             draw_mask.text((draw_x, draw_y), element_text, font=font_line, fill=255)
                         except Exception as draw_err: print(f"  -> Falha até no fallback: {draw_err}")
                         line_height_px_fallback = render_info["height"]
                         bbox_left, bbox_top_final = draw_x, draw_y
                         syl_w_actual, syl_h_actual = element_width, line_height_px_fallback
                    all_syllable_render_info.append((start_time, end_time, bbox_left, bbox_top_final, syl_w_actual, syl_h_actual, current_global_line_idx, is_sentence_end))
                    current_global_syl_idx += 1
                current_x += element_width
            if is_active_line:
                active_syllable_end_idx_global = current_global_syl_idx
                active_syllable_indices = (active_syllable_start_idx_global, active_syllable_end_idx_global)

        base_cp = cp.asarray(np.array(img_base))
        mask_cp = cp.asarray(np.array(img_mask))
        return base_cp, mask_cp, all_syllable_render_info, active_syllable_indices
# ---------------------------------------------------------------------------


# ========================== SUBTITLE PROCESSOR (SEM ALTERAÇÕES) ===============
class SubtitleProcessor:
    def __init__(self, text_renderer: TextRenderer, config, syllable_dict, not_found_words_set):
        self.text_renderer = text_renderer
        self.config = config
        self.upper_case = config["upper_case"]
        self.font = self.text_renderer.font
        self.syllable_dict = syllable_dict
        self.not_found_words_set = not_found_words_set

    @staticmethod
    def _parse_time_string_float(time_str):
        try: return float(time_str)
        except (ValueError, TypeError): print(f"Aviso: Timestamp inesperado: {time_str}"); return None

    @staticmethod
    def read_subtitles(file):
        char_timing_data = []
        try:
            with open(file, 'r', encoding='utf-8') as f: lines = f.readlines()
            if not lines: print(f"Aviso: Arquivo '{file}' vazio."); return [], []
            header = lines[0].strip().upper()
            start_idx = 1 if header == "CHARACTER|START|END" else (0 if (header and '|' not in lines[0]) else 0)
            if start_idx==0 and header != "CHARACTER|START|END": print("Aviso: Cabeçalho 'CHARACTER|START|END' não encontrado.")

            for line_num, line in enumerate(lines[start_idx:], start=start_idx + 1):
                if not line.strip(): continue
                parts = line.rstrip('\n\r').split('|')
                if len(parts) != 3: print(f"Aviso: Ignorando linha {line_num} mal formatada: '{line}'"); continue
                char, start_str, end_str = parts[0], parts[1].strip(), parts[2].strip()
                start_time = SubtitleProcessor._parse_time_string_float(start_str)
                end_time = SubtitleProcessor._parse_time_string_float(end_str)
                if start_time is None or end_time is None: print(f"Aviso: Ignorando linha {line_num} com timestamp inválido: '{line}'"); continue
                if not char: char = " "
                if end_time < start_time: print(f"Aviso: Corrigindo end<start na linha {line_num}: '{line}'"); end_time = start_time
                char_timing_data.append((start_time, end_time, str(char)))
        except FileNotFoundError: print(f"Erro: Arquivo PSV não encontrado: {file}"); return [], []
        except Exception as e: print(f"Erro ao ler PSV: {e}"); traceback.print_exc(); return [], []
        char_timing_data.sort(key=lambda x: x[0])
        long_pauses = SubtitleProcessor._identify_long_pauses(char_timing_data)
        return char_timing_data, long_pauses

    @staticmethod
    def _identify_long_pauses(char_timing_data, min_pause_duration=5.0):
        pauses = []
        if not char_timing_data: return pauses
        first_start = char_timing_data[0][0]
        if first_start >= min_pause_duration: pauses.append({"start": 0.0, "end": first_start, "duration": first_start, "type": "initial"})
        for i in range(1, len(char_timing_data)):
            prev_end, curr_start = char_timing_data[i-1][1], char_timing_data[i][0]
            pause_dur = curr_start - prev_end
            if pause_dur >= min_pause_duration:
                is_covered = any(p["type"] == "initial" and p["end"] >= curr_start for p in pauses)
                if not is_covered: pauses.append({"start": prev_end, "end": curr_start, "duration": pause_dur, "type": "between"})
        for i, (start, end, _) in enumerate(char_timing_data):
             char_dur = end - start
             if char_dur >= min_pause_duration:
                 is_covered = any(abs(p["start"] - start) < 0.01 and abs(p["end"] - end) < 0.01 for p in pauses)
                 if not is_covered: pauses.append({"start": start, "end": end, "duration": char_dur, "type": "during"})
        pauses.sort(key=lambda x: x["start"])
        return pauses

    def _group_chars_into_words(self, char_timing_data):
        words_spaces = []
        current_word = []
        for i, (start, end, char) in enumerate(char_timing_data):
            proc_char = char.upper() if self.upper_case else char
            if proc_char.isspace():
                if current_word: words_spaces.append({"type": "word", "chars": current_word}); current_word = []
                words_spaces.append({"type": "space", "start": start, "end": end})
            else:
                current_word.append((start, end, proc_char))
        if current_word: words_spaces.append({"type": "word", "chars": current_word})
        return words_spaces

    def _process_words_into_syllables(self, words_and_spaces):
        syllable_data = []
        temp_img = Image.new("RGB", (1, 1)); temp_draw = ImageDraw.Draw(temp_img)
        font = self.text_renderer.font
        punc_strip, sent_end = ",.!?;:", ".!?"
        for element in words_and_spaces:
            if element["type"] == "space":
                 syllable_data.append((element["start"], element["end"], " ", self.text_renderer.space_width_ref, False))
                 continue
            word_chars = element["chars"]
            if not word_chars: continue
            word_text = "".join([c[2] for c in word_chars])
            cleaned_word = word_text.rstrip(punc_strip)
            lookup = cleaned_word.lower()

            if lookup in self.syllable_dict:
                syl_parts = self.syllable_dict[lookup].split('-')
                char_idx, orig_idx = 0, 0
                word_syl_indices = []
                for part in syl_parts:
                    syl_len = len(part)
                    if char_idx + syl_len > len(cleaned_word):
                         if orig_idx < len(word_chars):
                            rem_chars = word_chars[orig_idx:]
                            rem_text = "".join([c[2] for c in rem_chars])
                            s_start, s_end = rem_chars[0][0], rem_chars[-1][1]
                            s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars)
                            syllable_data.append((s_start, s_end, rem_text, s_width, False))
                            word_syl_indices.append(len(syllable_data) - 1)
                         break
                    syl_chars = word_chars[orig_idx : orig_idx + syl_len]
                    if not syl_chars: continue
                    s_text = "".join([c[2] for c in syl_chars])
                    s_start, s_end = syl_chars[0][0], syl_chars[-1][1]
                    s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in syl_chars)
                    syllable_data.append((s_start, s_end, s_text, s_width, False))
                    word_syl_indices.append(len(syllable_data) - 1)
                    char_idx += syl_len; orig_idx += syl_len
                if orig_idx < len(word_chars): # Handle trailing punctuation
                     rem_chars = word_chars[orig_idx:]
                     rem_text = "".join([c[2] for c in rem_chars])
                     expected_punc = word_text[len(cleaned_word):]
                     if rem_text == expected_punc and word_syl_indices: # Append to last syllable
                         last_idx = word_syl_indices[-1]
                         ls_start, _, ls_text, _, _ = syllable_data[last_idx]
                         new_text = ls_text + rem_text
                         new_end = rem_chars[-1][1]
                         new_width = self.text_renderer._get_element_width(temp_draw, new_text, font)
                         syllable_data[last_idx] = (ls_start, new_end, new_text, new_width, False)
                     else: # Create new syllable for remaining
                         rem_start, rem_end = rem_chars[0][0], rem_chars[-1][1]
                         rem_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in rem_chars)
                         syllable_data.append((rem_start, rem_end, rem_text, rem_width, False))
                         word_syl_indices.append(len(syllable_data) - 1)
                if word_syl_indices: # Mark sentence end flag on the *actual* last syllable
                    final_syl_idx = word_syl_indices[-1]
                    final_syl_text = syllable_data[final_syl_idx][2].rstrip()
                    if final_syl_text and final_syl_text[-1] in sent_end:
                         syllable_data[final_syl_idx] = syllable_data[final_syl_idx][:4] + (True,)
            else: # Word not in dictionary
                if lookup not in self.not_found_words_set and word_text.lower() == lookup: self.not_found_words_set.add(lookup)
                s_start, s_end = word_chars[0][0], word_chars[-1][1]
                s_width = sum(self.text_renderer._get_element_width(temp_draw, c[2], font) for c in word_chars)
                is_end = word_text.rstrip()[-1] in sent_end if word_text.rstrip() else False
                syllable_data.append((s_start, s_end, word_text, s_width, is_end))

        del temp_draw, temp_img
        syllable_data.sort(key=lambda x: x[0])
        # Post-process end times (simpler version, maybe adjust if needed)
        processed_data = []
        for i in range(len(syllable_data)):
            start, end, text, width, is_end = syllable_data[i]
            # simplified end time logic for now
            processed_data.append((start, end, text, width, is_end))
        return processed_data # Returning without the 'next_syl_start' for now

    def group_syllables_into_lines(self, syllable_timing_data, video_width):
        lines = []
        current_line = []
        for syllable_tuple in syllable_timing_data:
            start, end, text, width, is_end = syllable_tuple # Adjusted unpacking
            current_line.append((start, end, text, width)) # Store width too
            if is_end:
                while current_line and current_line[-1][2].isspace(): current_line.pop()
                if current_line: lines.append(current_line)
                current_line = []
        while current_line and current_line[-1][2].isspace(): current_line.pop()
        if current_line: lines.append(current_line)
        return lines

    def process_subtitles_to_syllable_lines(self, file, video_width):
        char_data, pauses = self.read_subtitles(file)
        if not char_data: return [], pauses
        words_spaces = self._group_chars_into_words(char_data)
        syl_data = self._process_words_into_syllables(words_spaces)
        if not syl_data: print("Aviso: Nenhum dado de sílaba gerado."); return [], pauses
        # Optional: Add debug print here if needed
        lines = self.group_syllables_into_lines(syl_data, video_width)
        return lines, pauses
# ---------------------------------------------------------------------------


# ==========================  GPU CONTEXT  ==================================
class GPURenderContext:
    """
    Mantém buffers, grades e gráficos CUDA persistentes para evitar
    alocações e capturar kernels repetitivos.
    """
    def __init__(self, width: int, height: int, cfg):
        self.w, self.h = width, height
        self.cfg = cfg
        self.pool = cp.cuda.MemoryPool()          # pool dedicado
        cp.cuda.set_allocator(self.pool.malloc)

        # Grades X/Y (uint16 já é suficiente p/ 4 K)
        yy, xx = cp.mgrid[:height, :width]
        self.xg = xx.astype(cp.uint16)
        self.yg = yy.astype(cp.uint16)
        del xx, yy

        # Buffers de saída duplos (double-buffer)
        self.batch_cap = 0          # será ajustado na 1.ª chamada
        self.out_a = self.out_b = None # Will hold cp.empty(...) arrays

        # Máscara de progresso (barra completa)
        bar_h = 20
        bar_y0 = 10
        self.bar_mask_full = ((self.yg >= bar_y0) & (self.yg < bar_y0 + bar_h)) # (H, W) bool

        # Cores (pre-calculated float32 RGB)
        self.base_rgbf  = hex_to_bgr_cupy(cfg["base_text_color"])[::-1].astype(cp.float32)/255.0
        hl_bgr          = hex_to_bgr_cupy(cfg["highlight_text_color"])
        self.hl_rgbf    = hl_bgr[::-1].astype(cp.float32)/255.0
        # Darken highlight color for progress bar background
        dark_hl_bgr = (hl_bgr.astype(cp.float32) * 0.4).clip(0, 255).astype(cp.uint8)
        self.bar_bg_rgbf= dark_hl_bgr[::-1].astype(cp.float32)/255.0

        # Gráfico CUDA (capturado após warm-up)
        self.graph      = None # Will hold instantiated CUDA Graph
        self.pinned_mem = None # Will hold pinned memory buffer
        self.pinned_mem_size = 0

    # --------------  helpers  --------------
    def ensure_batch_buffers(self, n_frames: int):
        """Ensures output buffers `out_a` and `out_b` can hold `n_frames`."""
        if n_frames <= self.batch_cap:
            return
        # Use power of 2 for potential alignment benefits
        self.batch_cap = int(2 ** math.ceil(math.log2(n_frames)))
        shape = (self.batch_cap, self.h, self.w, 3)
        # Free old buffers before creating new ones if they exist
        if self.out_a is not None: del self.out_a
        if self.out_b is not None: del self.out_b
        self.out_a = cp.empty(shape, dtype=cp.uint8)
        self.out_b = cp.empty(shape, dtype=cp.uint8)
        # Also ensure pinned memory is large enough for one buffer
        self.ensure_pinned_memory(self.out_a.nbytes)

    def ensure_pinned_memory(self, n_bytes: int):
        """Ensures pinned host memory `pinned_mem` is at least `n_bytes`."""
        if n_bytes <= self.pinned_mem_size:
            return
        # Free old pinned memory if it exists
        if self.pinned_mem is not None:
            try:
                 # Explicitly free pinned memory if CuPy version supports it well
                 # cp.cuda.runtime.freeHost(self.pinned_mem.ptr) # Or similar if available
                 del self.pinned_mem
            except Exception as e:
                 print(f"Note: Could not explicitly free old pinned memory: {e}")
                 self.pinned_mem = None # Ensure it's reset anyway

        self.pinned_mem_size = n_bytes
        self.pinned_mem = cp.cuda.alloc_pinned_memory(self.pinned_mem_size)

    def get_pinned_buffer(self, required_bytes: int) -> cp.cuda.PinnedMemory:
        """Gets the pinned memory buffer, ensuring it's large enough."""
        self.ensure_pinned_memory(required_bytes)
        return self.pinned_mem
# ---------------------------------------------------------------------------


# =========================  CUDA PROCESSOR  ================================
class CUDAProcessor:
    """
    Versão repaginada: todas as operações puramente em GPU e
    capturadas em um CUDA Graph após o primeiro batch.
    """
    def __init__(self, cfg, static_bg_rgb_cp, gpu_ctx: GPURenderContext):
        self.cfg  = cfg
        self.ctx  = gpu_ctx
        # Ensure background is float32 RGB (GPU likes RGB order usually)
        if static_bg_rgb_cp.shape[2] == 3: # Assuming input is RGB
            self.bg_f = static_bg_rgb_cp.astype(cp.float32) / 255.0
        else: # Fallback if not RGB
             self.bg_f = cp.zeros((gpu_ctx.h, gpu_ctx.w, 3), dtype=cp.float32)

        self.min_dur = cfg.get("min_char_duration", 0.01)
        self.max_vis = cfg.get("max_visual_fill_duration", 3.0)

        # streams
        self.stream_compute = cp.cuda.Stream(non_blocking=True)
        self.stream_h2d     = cp.cuda.Stream(non_blocking=True) # For D->H copy

        # Elementwise kernel para progressivo da sílaba ativa
        # Adjusted for RGB output and input
        self._hl_kernel = cp.ElementwiseKernel(
            # Positional arguments first
            'raw float32 cut_x_batch, bool text_mask, \
             uint16 X, uint16 Y, \
             uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, \
             float32 hl_r, float32 hl_g, float32 hl_b, \
             float32 base_r, float32 base_g, float32 base_b',  # in_params (positional)
            'float32 io_r, float32 io_g, float32 io_b',      # inout_params (positional)
            r"""
                // Check bounds and text mask just once per pixel
                bool is_syl_pixel = (X >= syl_x) && (X < syl_x + syl_w) &&
                                    (Y >= syl_y) && (Y < syl_y + syl_h) &&
                                    text_mask;

                if (is_syl_pixel) {
                    // Determine color based on horizontal position relative to cut_x
                    // cut_x_batch is specific to this frame (i is batch index)
                    // Use 'cut_x_batch' instead of 'raw_cut_x_batch' here
                    float current_cut_x = cut_x_batch[i]; // Access per-frame cut_x
                    if (X < current_cut_x) { // Highlighted part
                        io_r = hl_r;  io_g = hl_g;  io_b = hl_b;
                    } else { // Base part (already drawn, but ensure it's base color)
                        io_r = base_r; io_g = base_g; io_b = base_b;
                    }
                }
                // Pixels outside the syllable or mask are left untouched (background/other text)
            """, # operation (positional)
            # Keyword arguments after positional ones
            name='apply_syllable_highlight_rgb',
            # We pass cut_x as raw so we can index it with `i` inside the kernel
            # preamble='#include <cuda_fp16.h>' # Not needed for float32
        )

    # ------------------  núcleo p/ um batch  ------------------
    def _render_batch(self,
                      frame_times_f32: cp.ndarray,        # (B,) frame times
                      text_mask_bool: cp.ndarray,         # (H,W) bool, where text *could* be
                      syl_meta: dict,                     # dict of cp arrays for syllables
                      active_syl_span: tuple,             # (start_idx, end_idx) for active line
                      active_line_idx: int,
                      completed_line_pixels_mask: cp.ndarray, # (H,W) bool, pixels of completed lines
                      bar_progress: cp.ndarray or None,   # (B,) float32 (0-1) or None
                      out_buf: cp.ndarray):               # (B,H,W,3) uint8 - OUTPUT buffer
        """
        Executa todo o pipeline para um batch dentro de stream_compute.
        Chamado dentro de um CUDA Graph depois do 1.º warm-up.
        OUTPUTS directly into `out_buf`.
        """
        B = frame_times_f32.shape[0]
        H, W = self.ctx.h, self.ctx.w

        # 1. Start with background, broadcasted to batch size
        # Ensure bg_f is (1, H, W, 3) for broadcasting with (B,...) masks
        inter_f = cp.broadcast_to(self.bg_f[None,...], (B, H, W, 3)).copy() # (B,H,W,3) float32

        # 2. Apply progress bar (if needed for this batch)
        if bar_progress is not None and cp.any(bar_progress > 0): # Optimization: check if any progress
            # Calculate fill width per frame
            fill_w = (bar_progress * W).astype(cp.uint16) # (B,) uint16 is enough
            # Expand grid X once: (1, 1, W) for broadcasting against (B, H, 1)
            X_grid_exp = self.ctx.xg[None, None, :] # Shape (1, 1, W)
            # Expand bar mask: (1, H, W) for broadcasting against (B,...)
            bar_mask_exp = self.ctx.bar_mask_full[None, :, :] # Shape (1, H, W)
            # Expand fill_w: (B, 1, 1) for broadcasting
            fill_w_exp = fill_w[:, None, None] # Shape (B, 1, 1)

            # Create masks for this batch using broadcasting
            # bar_fill_mask: Where bar should be filled (True/False) for each pixel in batch
            bar_fill_mask = bar_mask_exp & (X_grid_exp < fill_w_exp) # (B, H, W)
            # bar_bg_mask: Where bar background should be (True/False)
            bar_bg_mask = bar_mask_exp & (~bar_fill_mask) # (B, H, W)

            # Apply colors using boolean masks (expand masks to 3 channels)
            # Ensure colors are (1, 1, 1, 3) for broadcasting
            bar_bg_color_exp = self.ctx.bar_bg_rgbf[None, None, None, :]
            bar_fill_color_exp = self.ctx.hl_rgbf[None, None, None, :]

            cp.copyto(inter_f, bar_bg_color_exp, where=bar_bg_mask[..., None])
            cp.copyto(inter_f, bar_fill_color_exp, where=bar_fill_mask[..., None])

            # Cleanup intermediate broadcasted arrays if memory is tight
            del fill_w, X_grid_exp, bar_mask_exp, fill_w_exp, bar_fill_mask, bar_bg_mask
            del bar_bg_color_exp, bar_fill_color_exp

        # 3. Apply base text color everywhere text *could* be (overwrites background/bar)
        # Ensure base color is (1, 1, 1, 3) for broadcasting
        base_color_exp = self.ctx.base_rgbf[None, None, None, :]
        # Apply base color where text_mask_bool is True (expand mask to B and C)
        cp.copyto(inter_f, base_color_exp, where=text_mask_bool[None, ..., None])
        del base_color_exp # Cleanup

        # 4. Highlight completed lines (overwrites base text color)
        if completed_line_pixels_mask is not None:
            # Ensure highlight color is (1, 1, 1, 3) for broadcasting
            hl_color_exp = self.ctx.hl_rgbf[None, None, None, :]
            # Apply highlight color where completed_line_pixels_mask is True (expand mask to B and C)
            cp.copyto(inter_f, hl_color_exp, where=completed_line_pixels_mask[None, ..., None])
            del hl_color_exp # Cleanup

        # 5. Highlight active syllables (the complex part)
        s0, s1 = active_syl_span              # Syllable indices for the currently active line [s0, s1)
        if s1 > s0 and syl_meta is not None:
            # Check if the *active_line_idx* actually corresponds to syllables in this span
            # This avoids running the kernel if the active line has no syllables in syl_meta
            # (Could happen with empty lines or data issues)
            active_line_has_syllables_in_span = cp.any(syl_meta["line_idx"][s0:s1] == active_line_idx)

            if active_line_has_syllables_in_span:
                # Prepare data for the kernel for *all* syllables in the span
                # Kernel will internally filter by active_line_idx if needed, but simpler is often faster
                syl_indices_in_span = cp.arange(s0, s1) # Indices within syl_meta arrays

                # Calculate visual progress (0.0 to 1.0) for each syllable in the span, for each frame in the batch
                # Shape: (B, len(span))
                time_elapsed = frame_times_f32[:, None] - syl_meta["start"][syl_indices_in_span]
                visual_progress = cp.clip(time_elapsed / syl_meta["vis_dur"][syl_indices_in_span], 0.0, 1.0)

                # Calculate the horizontal cutoff point (x-coordinate) for highlighting
                # Shape: (B, len(span))
                cut_x_batch_span = syl_meta["x"][syl_indices_in_span] + \
                                   visual_progress * syl_meta["w"][syl_indices_in_span]

                # Iterate through syllables *in the active span* only
                num_syls_in_meta = syl_meta["line_idx"].shape[0] # Get actual size of the array
                syl_indices_in_span_host = cp.asnumpy(syl_indices_in_span) # Bring indices to host for easier iteration/debug

                # Use host iteration for k to easily check bounds
                for i, k_host in enumerate(syl_indices_in_span_host):
                    # Convert k_host back to int if needed, though it should be int already
                    k = int(k_host)

                    # --- Explicit Bounds Check ---
                    if k < 0 or k >= num_syls_in_meta:
                        print(f"!!! FATAL ERROR: Index k ({k}) is out of bounds for syl_meta['line_idx'] (size {num_syls_in_meta}).")
                        print(f"    s0={s0}, s1={s1}, active_line_idx={active_line_idx}")
                        # This indicates a logic error in how indices (s0, s1) or syl_meta are generated.
                        # Raising an error is appropriate here.
                        raise IndexError(f"Syllable index k={k} out of bounds (size={num_syls_in_meta})")
                    # --- End Bounds Check ---

                    # Filter: only process syllables belonging to the truly active line
                    # Accessing syl_meta["line_idx"][k] happens here
                    if syl_meta["line_idx"][k] != active_line_idx:
                        continue

                    # Extract syllable bounding box for kernel
                    # Ensure k is still valid before accessing other parts of syl_meta
                    # (redundant if the first check passed, but safe)
                    if k < 0 or k >= syl_meta["x"].shape[0]: # Check against another array size for extra safety
                         print(f"!!! WARNING: Index k ({k}) became invalid before accessing bbox?")
                         continue

                    sx, sy = syl_meta["x"][k], syl_meta["y"][k]
                    sw, sh = syl_meta["w"][k], syl_meta["h"][k]

                    # Skip if width or height is zero
                    if sw <= 0 or sh <= 0:
                        continue

                    # Get the cut_x values specific to *this syllable* across the batch
                    # Shape: (B,)
                    # We need the index 'i' which corresponds to the position within the span
                    current_syl_cut_x_batch = cut_x_batch_span[:, i]

                    # Call the ElementwiseKernel
                    # It modifies 'inter_f' in-place
                    # Pass grids (H,W), masks (H,W), scalar bbox, scalar colors
                    # Pass cut_x for the batch (B,) as 'raw'
                    # Kernel uses 'i' (batch index) to get the correct cut_x
                    self._hl_kernel(current_syl_cut_x_batch, # raw float32 (B,)
                                    text_mask_bool,          # bool (H, W)
                                    self.ctx.xg,             # uint16 (H, W)
                                    self.ctx.yg,             # uint16 (H, W)
                                    sx, sy, sw, sh,          # uint16 scalars
                                    self.ctx.hl_rgbf[0], self.ctx.hl_rgbf[1], self.ctx.hl_rgbf[2], # float32 scalars
                                    self.ctx.base_rgbf[0], self.ctx.base_rgbf[1], self.ctx.base_rgbf[2], # float32 scalars
                                    inter_f[..., 0],         # float32 (B, H, W) R channel
                                    inter_f[..., 1],         # float32 (B, H, W) G channel
                                    inter_f[..., 2])         # float32 (B, H, W) B channel
                                    # REMOVED: size=B * H * W) # Let CuPy infer the size

                # Cleanup batch-specific syllable calculations
                del time_elapsed, visual_progress, cut_x_batch_span
                if 'current_syl_cut_x_batch' in locals(): del current_syl_cut_x_batch

        # 6. Convert final float32 RGB to uint8 BGR and write to output buffer
        # Perform conversion and channel swap in one step if possible
        # Ensure output is contiguous for potential speedup if needed later
        out_buf[:] = (inter_f[..., ::-1] * 255.0).astype(cp.uint8)

        # Cleanup main intermediate buffer
        del inter_f

    # -----------------------------------------------------------------------

    def process_frames_streaming(self,
                                 base_cp, mask_cp, syl_info, active_indices,
                                 video_writer, # FFmpegWriter instance
                                 video_lock,   # Threading lock for FFmpegWriter
                                 fps,
                                 first_frame_idx, num_frames_to_process, # Renamed n_frames -> num_frames_to_process
                                 active_line_idx, completed_lines_set,
                                 long_pauses): # Renamed bar_pauses -> long_pauses
        """
        Processes a sequence of frames, potentially using CUDA Graphs.
        Handles pre-calculation, batching, GPU execution, and H2D transfer.
        Writes frames to FFmpeg via the video_writer.
        """
        if num_frames_to_process <= 0:
            print("Warning: process_frames_streaming called with num_frames_to_process <= 0.")
            return

        # Determine batch size (use config value, but ensure it's reasonable)
        # Could add dynamic sizing based on VRAM here later if needed
        batch_size = max(32, min(self.cfg.get("frames_per_batch", 64), 512)) # Example bounds

        # ----------   Pré-cálculo de dados constantes (for this whole segment) ---------------
        # Mask where text *could* appear (remains constant for this render call)
        text_mask_bool = (mask_cp > 128) if mask_cp is not None else cp.zeros((self.ctx.h, self.ctx.w), dtype=bool)

        # Create a combined mask of *all pixels* belonging to *any completed line*
        completed_line_pixels_mask = None
        if completed_lines_set and syl_info:
            # Find all syllable indices belonging to completed lines
            completed_syl_indices = [
                idx for idx, s_info in enumerate(syl_info)
                if s_info[6] in completed_lines_set # s_info[6] is global_line_idx
            ]

            if completed_syl_indices:
                # Initialize mask to False
                completed_line_pixels_mask = cp.zeros_like(text_mask_bool)
                # Iterate through completed syllables and mark their pixels in the mask
                X_grid, Y_grid = self.ctx.xg, self.ctx.yg
                for idx in completed_syl_indices:
                    s_start, s_end, sx, sy, sw, sh, _, _ = syl_info[idx]
                    if sw > 0 and sh > 0:
                        # Define the bounding box for the syllable
                        syl_bbox_mask = (X_grid >= sx) & (X_grid < sx + sw) & \
                                        (Y_grid >= sy) & (Y_grid < sy + sh)
                        # Combine with the general text mask and OR into the final mask
                        completed_line_pixels_mask |= (syl_bbox_mask & text_mask_bool)
                del X_grid, Y_grid, completed_syl_indices # Cleanup
            else:
                 # No syllables found for the completed lines, mask remains None
                 pass

        # Convert syllable info into a dictionary of CuPy arrays (if syllables exist)
        syl_meta = None
        if syl_info:
            num_syls = len(syl_info)
            if num_syls > 0:
                syl_meta = {
                    # Ensure types are optimal (uint16 for coords/dims, int16 for index)
                    "start"   : cp.asarray([s[0] for s in syl_info], dtype=cp.float32),
                    "end"     : cp.asarray([s[1] for s in syl_info], dtype=cp.float32),
                    "x"       : cp.asarray([s[2] for s in syl_info], dtype=cp.uint16),
                    "y"       : cp.asarray([s[3] for s in syl_info], dtype=cp.uint16),
                    "w"       : cp.asarray([s[4] for s in syl_info], dtype=cp.uint16),
                    "h"       : cp.asarray([s[5] for s in syl_info], dtype=cp.uint16),
                    "line_idx": cp.asarray([s[6] for s in syl_info], dtype=cp.int16),
                    # is_sentence_end (s[7]) is not directly used in rendering kernel, but kept if needed elsewhere
                }
                # Calculate effective visual duration for highlighting (clipped)
                raw_duration = syl_meta["end"] - syl_meta["start"]
                # Apply clipping: duration is at least min_dur and at most max_vis
                syl_meta["vis_dur"] = cp.clip(raw_duration, self.min_dur, self.max_vis).astype(cp.float32)
            else:
                 # syl_info was provided but was empty list
                 pass
        else:
             # syl_info was None
             pass


        # ----------   Execução por batches   ---------------------------
        self.ctx.ensure_batch_buffers(batch_size) # Ensure buffers A/B exist and are large enough
        outA, outB = self.ctx.out_a, self.ctx.out_b # Get references

        # Prepare for CUDA Graph capture
        warmup_frames_count = self.cfg.get("cuda_graph_warmup_frames", 4)
        graph_warmup_done = False
        graph_was_captured = self.ctx.graph is not None # Check if graph exists from previous calls

        total_processed_in_call = 0
        current_batch_start_frame = first_frame_idx

        while total_processed_in_call < num_frames_to_process:
            # Determine size of the current batch
            remaining_frames = num_frames_to_process - total_processed_in_call
            current_batch_size = min(batch_size, remaining_frames)

            if current_batch_size <= 0: # Should not happen, but safety check
                break

            # Get the correct output buffer slice (ping-pong)
            # Use total_processed_in_call to determine A or B buffer consistently
            buffer_index = (total_processed_in_call // batch_size) % 2
            out_buf_slice = outA[:current_batch_size] if buffer_index == 0 else outB[:current_batch_size]

            # Calculate frame indices and times for this specific batch
            batch_frame_indices = cp.arange(
                current_batch_start_frame,
                current_batch_start_frame + current_batch_size,
                dtype=cp.int32
            )
            batch_frame_times = batch_frame_indices.astype(cp.float32) / fps

            # Determine progress bar status for this batch
            batch_bar_progress = None
            if long_pauses:
                # Find if any frame in this batch falls within *any* long pause
                batch_in_pause = cp.zeros(current_batch_size, dtype=bool)
                batch_progress_values = cp.zeros(current_batch_size, dtype=cp.float32)
                for pause in long_pauses:
                    pause_start, pause_end, pause_duration = pause["start"], pause["end"], pause["duration"]
                    # Find frames within this pause's time range
                    indices_in_pause = cp.where((batch_frame_times >= pause_start) & (batch_frame_times < pause_end))[0]
                    if indices_in_pause.size > 0:
                        batch_in_pause[indices_in_pause] = True
                        # Calculate progress only for frames in *this* pause
                        progress = (batch_frame_times[indices_in_pause] - pause_start) / max(pause_duration, 1e-6)
                        batch_progress_values[indices_in_pause] = cp.clip(progress, 0.0, 1.0)

                # If any frame was in a pause, use the calculated progress values
                if cp.any(batch_in_pause):
                    batch_bar_progress = batch_progress_values
                del batch_in_pause, batch_progress_values # Cleanup pause calculation temps


            # ----------------  Execute _render_batch (GPU) -----------------
            can_use_graph = graph_was_captured and graph_warmup_done

            if can_use_graph:
                # --- Launch existing CUDA Graph ---
                # Note: We assume the graph was captured with the *maximum* batch size.
                # We are launching it for a potentially *smaller* batch size.
                # This is generally okay, but less efficient if sizes vary wildly.
                # TODO: Potentially re-capture graph if batch size changes significantly?
                # For simplicity now, we launch the existing one.
                # We need to ensure the *input data pointers* used by the graph are updated
                # if they change location (e.g., if batch_frame_times points to new memory).
                # However, CuPy's graph API often handles this if you pass the arrays directly.
                # We might need explicit graph.update() if issues arise.

                # Assuming _render_batch inputs are stable or handled by CuPy's graph launch:
                try:
                     # Ensure inputs match what the graph expects (might need updates)
                     # For now, assume launch handles it or inputs are stable enough
                     self.ctx.graph.launch(self.stream_compute) # Launch on the compute stream
                except Exception as graph_launch_err:
                     print(f"Error launching CUDA Graph: {graph_launch_err}. Falling back.")
                     # Fallback to regular execution for this batch
                     self._render_batch(batch_frame_times, text_mask_bool, syl_meta,
                                       active_indices, active_line_idx,
                                       completed_line_pixels_mask, batch_bar_progress, out_buf_slice)
                     # Consider disabling graph use for subsequent batches if it keeps failing
                     # graph_was_captured = False # Option: disable graph if launch fails
            else:
                # --- Regular execution or Graph Warmup/Capture ---
                # Use the compute stream for the rendering task
                with self.stream_compute:
                    self._render_batch(batch_frame_times, text_mask_bool, syl_meta,
                                       active_indices, active_line_idx,
                                       completed_line_pixels_mask, batch_bar_progress, out_buf_slice)

                # Check if warmup period is over and graph hasn't been captured yet
                if not graph_was_captured and total_processed_in_call >= warmup_frames_count:
                    print(f"Capturing CUDA Graph after {total_processed_in_call + current_batch_size} frames...")
                    try:
                        graph = cp.cuda.Graph()
                        with graph.capture(stream=self.stream_compute):
                            # Re-run the *last* batch's render call inside capture
                            self._render_batch(batch_frame_times, text_mask_bool, syl_meta,
                                               active_indices, active_line_idx,
                                               completed_line_pixels_mask, batch_bar_progress, out_buf_slice)

                        self.ctx.graph = graph.instantiate()
                        graph_was_captured = True
                        graph_warmup_done = True # Mark warmup as complete
                        print("CUDA Graph captured successfully.")
                    except Exception as graph_capture_err:
                        print(f"Error capturing CUDA Graph: {graph_capture_err}. Graph will not be used.")
                        self.ctx.graph = None # Ensure graph is None if capture failed
                        graph_warmup_done = True # Still mark warmup done to prevent re-attempts
            # -----------------------------------------------------------------

            # Synchronize the compute stream to ensure rendering is finished
            self.stream_compute.synchronize()

            # ----------  Asynchronous D->H Copy and Write to FFmpeg -----------
            # Get the output buffer (which is BGR uint8)
            output_bgr_gpu = out_buf_slice # Already in BGR uint8 from _render_batch

            # Ensure pinned memory is ready
            required_bytes = output_bgr_gpu.nbytes
            pinned_buffer = self.ctx.get_pinned_buffer(required_bytes)

            # Perform async D->H copy using the H2D stream (stream_h2d is just a name here)
            # Source: GPU buffer, Dest: Pinned Host buffer
            cp.cuda.runtime.memcpyAsync(
                pinned_buffer.ptr,          # Destination: Pinned host memory pointer
                output_bgr_gpu.data.ptr,    # Source: GPU memory pointer
                required_bytes,             # Size in bytes
                cp.cuda.runtime.memcpyDeviceToHost, # Direction
                self.stream_h2d.ptr         # Stream to use for the copy
            )

            # Synchronize the H2D stream to ensure copy is finished
            self.stream_h2d.synchronize()

            # Get a NumPy view of the pinned memory (no copy involved here)
            frame_data_np = np.frombuffer(pinned_buffer, dtype=np.uint8, count=required_bytes)
            # Reshape to (current_batch_size, H, W, 3) - FFmpeg expects this BGR format
            frames_to_write = frame_data_np.reshape(current_batch_size, self.ctx.h, self.ctx.w, 3)

            # Write the batch to FFmpeg using the provided writer and lock
            try:
                with video_lock:
                    # Iterate and write frame by frame if writer expects single frames
                    # Or write the whole batch if writer supports it (check FFmpegWriter impl.)
                    # Assuming FFmpegWriter's write handles a single frame numpy array:
                    for frame_idx in range(current_batch_size):
                         video_writer.write(frames_to_write[frame_idx])
                    # If FFmpegWriter could handle the whole batch byte buffer:
                    # video_writer.write_bytes(pinned_buffer.tobytes()[:required_bytes]) # Hypothetical faster write
            except Exception as write_err:
                 print(f"Error writing batch to FFmpeg: {write_err}")
                 # Decide how to handle: stop processing, log, etc.
                 # For now, just print and continue, but might need better error handling.

            # Update counters
            total_processed_in_call += current_batch_size
            current_batch_start_frame += current_batch_size

            # Minor optimization: Check if warmup is complete after the batch is processed
            if not graph_warmup_done and total_processed_in_call >= warmup_frames_count:
                 graph_warmup_done = True

            # Optional: Free memory aggressively if needed (usually handled by pool)
            # cp.get_default_memory_pool().free_all_blocks()
            # cp.get_default_pinned_memory_pool().free_all_blocks()

        # --- End of batch loop ---

        # Cleanup pre-calculated data if it consumes significant memory
        del text_mask_bool
        if completed_line_pixels_mask is not None: del completed_line_pixels_mask
        if syl_meta:
            for key in list(syl_meta.keys()): del syl_meta[key]
            del syl_meta
        # Pinned memory buffer (`self.ctx.pinned_mem`) is kept for reuse

# ---------------------------------------------------------------------------


# =====================  FFMPEG WRITER (Mostly Unchanged) ===================
class FFmpegWriter:
    def __init__(self, output_file, width, height, fps, config, gpu_ctx: GPURenderContext = None):
        self.output_file = output_file
        self.config = config
        self.gpu_ctx = gpu_ctx # Optional context for pinned memory reuse
        self.width = width
        self.height = height
        self.frame_size_bytes = width * height * 3 # BGR24

        ffmpeg_cmd = [
            "ffmpeg", "-y",
            "-f", "rawvideo", "-vcodec", "rawvideo",
            "-s", f"{width}x{height}", "-pix_fmt", "bgr24",
            "-r", str(fps),
            "-i", "-", # Input from stdin
            # Video codec options
            "-c:v", config.get("ffmpeg_codec", "libx264"), # Default to libx264 if not specified
            "-preset", config.get("ffmpeg_preset", "medium"),
            "-b:v", config.get("ffmpeg_bitrate", "5M"), # Default bitrate
            "-pix_fmt", "yuv420p", # Common pixel format for compatibility
            "-tune", config.get("ffmpeg_tune", "fastdecode"), # Example tune
            # Output file
            output_file
        ]

        print(f"FFmpeg command: {' '.join(ffmpeg_cmd)}") # Log the command

        # Use a large buffer for stdin if possible
        bufsize = 10**8 # ~100MB buffer

        try:
            self.ffmpeg_process = subprocess.Popen(
                ffmpeg_cmd,
                stdin=subprocess.PIPE,
                stdout=subprocess.DEVNULL, # Suppress stdout unless needed for debugging
                stderr=subprocess.PIPE,    # Capture stderr
                bufsize=bufsize
            )
        except FileNotFoundError:
            print("ERROR: ffmpeg command not found. Is FFmpeg installed and in your PATH?")
            raise
        except Exception as e:
            print(f"ERROR: Failed to start FFmpeg process: {e}")
            raise

        # Thread to read stderr without blocking
        self.stderr_queue = queue.Queue()
        self.stderr_thread = threading.Thread(target=self._read_stderr, daemon=True)
        self.stderr_thread.start()

    def _read_stderr(self):
        """Reads FFmpeg's stderr output line by line and puts it in a queue."""
        try:
            # Use iter to read lines efficiently
            for line in iter(self.ffmpeg_process.stderr.readline, b''):
                self.stderr_queue.put(line.decode('utf-8', errors='replace').strip())
        except Exception as e:
            # Handle exceptions during stderr reading if necessary
            self.stderr_queue.put(f"Error reading FFmpeg stderr: {e}")
        finally:
            # Signal that reading is done (optional)
            self.stderr_queue.put(None) # Sentinel value

    def write(self, frame: np.ndarray):
        """Writes a single frame (NumPy array, HxWxC BGR uint8) to FFmpeg."""
        if self.ffmpeg_process.stdin.closed:
             print("Warning: Attempted to write to closed FFmpeg stdin.")
             return
        try:
            # Ensure frame is numpy and bytes
            if isinstance(frame, cp.ndarray):
                frame = cp.asnumpy(frame) # Convert CuPy array if necessary
            # Check shape and type? Optional, depends on strictness needed.
            # if frame.shape != (self.height, self.width, 3) or frame.dtype != np.uint8:
            #     print(f"Warning: Invalid frame shape/type: {frame.shape} {frame.dtype}")
            #     # Handle error? resize/convert? skip?
            #     return

            self.ffmpeg_process.stdin.write(frame.tobytes())
        except (OSError, BrokenPipeError) as e:
            print(f"ERROR writing frame to FFmpeg: {e}. FFmpeg might have terminated.")
            self.release() # Attempt cleanup
            raise # Re-raise the exception

    def release(self):
        """Closes FFmpeg stdin, waits for the process, and prints stderr."""
        if self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed:
            try:
                self.ffmpeg_process.stdin.close()
            except OSError as e:
                 print(f"Warning: Error closing FFmpeg stdin: {e}")

        # Wait for the process to finish
        return_code = self.ffmpeg_process.wait()

        # Wait for the stderr reader thread to finish
        self.stderr_thread.join(timeout=2.0) # Add timeout

        # Print accumulated stderr messages
        print("\n--- FFmpeg stderr ---")
        while not self.stderr_queue.empty():
             line = self.stderr_queue.get()
             if line is None: break # Sentinel hit
             print(line)
        print("--- End FFmpeg stderr ---\n")

        if return_code != 0:
            print(f"Warning: FFmpeg process exited with non-zero status: {return_code}")
        else:
             print("FFmpeg process finished successfully.")

# ---------------------------------------------------------------------------


# ===================== KARAOKE VIDEO CREATOR (Adjusted) ===================
class KaraokeVideoCreator:
    def __init__(self, config, text_renderer: TextRenderer):
        self.config = config
        self.fps = config["video_fps"]
        self.text_renderer = text_renderer
        self.num_visible_lines = config["num_visible_lines"]
        try:
            width, height = map(int, self.config["video_resolution"].split("x"))
        except ValueError:
            print(f"Aviso: Resolução inválida '{self.config['video_resolution']}'. Usando 1920x1080.")
            width, height = 1920, 1080
        self.width = width
        self.height = height

        # Load background (unchanged logic)
        self.static_bg_frame_rgb_cp = None # Will hold CuPy array
        self.static_bg_frame_bgr_np = None # Fallback/initial frame use
        bg_path = config.get("background_image")
        try:
            if bg_path and os.path.exists(bg_path):
                bg_img = Image.open(bg_path).convert("RGB").resize((width, height), Image.Resampling.LANCZOS)
                self.static_bg_frame_rgb_cp = cp.asarray(np.array(bg_img)) # RGB for GPU
                self.static_bg_frame_bgr_np = np.array(bg_img)[:, :, ::-1].copy() # BGR for OpenCV/FFmpeg initial
            else: raise FileNotFoundError("Background not found or not specified")
        except Exception as e:
            print(f"Aviso: Falha ao carregar fundo '{bg_path}': {e}. Usando fundo preto.")
            self.static_bg_frame_rgb_cp = cp.zeros((height, width, 3), dtype=cp.uint8)
            self.static_bg_frame_bgr_np = np.zeros((height, width, 3), dtype=np.uint8)

        # Initialize GPU Context and CUDA Processor *here*
        self.init_gpu() # Ensures GPU is ready
        self.gpu_ctx = GPURenderContext(self.width, self.height, self.config)
        self.cuda_processor = CUDAProcessor(
            self.config,
            self.static_bg_frame_rgb_cp, # Pass the RGB CuPy background
            self.gpu_ctx                 # Pass the shared GPU context
        )

    def init_gpu(self):
        """Initializes the GPU device and memory pool (minimal version)."""
        try:
            cp.cuda.Device(0).use()
            # Memory pool is handled by GPURenderContext now
            cp.cuda.Stream.null.synchronize()
            _ = cp.zeros(1) # Warm-up allocation
            print("GPU initialized successfully.")
        except cp.cuda.runtime.CUDARuntimeError as e:
            print(f"Erro Crítico: Falha ao inicializar CUDA: {e}")
            raise
        except Exception as e:
            print(f"Erro Crítico: Falha inesperada na inicialização da GPU: {e}")
            raise

    def _get_next_global_indices(self, current_displayed_indices, count=2):
        """Finds the next `count` global line indices not currently displayed."""
        valid_indices = {idx for idx in current_displayed_indices if idx is not None}
        max_existing_idx = max(valid_indices) if valid_indices else -1
        next_indices = []
        candidate_idx = max_existing_idx + 1
        while len(next_indices) < count:
            next_indices.append(candidate_idx)
            candidate_idx += 1
        return next_indices

    def create_video(self, syllable_lines, long_pauses, output_file, audio_file_path):
        """Creates the karaoke video using the optimized CUDA processor."""
        start_total_time = time.time()
        width, height = self.width, self.height
        N = self.num_visible_lines # Number of slots on screen

        # --- Determine video duration ---
        audio_duration = get_audio_duration(audio_file_path)
        last_syl_end_time = 0.0
        first_syl_start_time = 0.0
        if syllable_lines:
            # Find last valid line and its end time
            last_valid_line_idx = next((idx for idx in range(len(syllable_lines) - 1, -1, -1) if syllable_lines[idx]), -1)
            if last_valid_line_idx != -1 and syllable_lines[last_valid_line_idx]:
                 # Get end time of the last element in the last valid line
                 last_syl_end_time = syllable_lines[last_valid_line_idx][-1][1] # index 1 is end_time
            # Find first valid line and its start time
            first_valid_line_idx = next((idx for idx, ln in enumerate(syllable_lines) if ln), -1)
            if first_valid_line_idx != -1 and syllable_lines[first_valid_line_idx]:
                first_syl_start_time = syllable_lines[first_valid_line_idx][0][0] # index 0 is start_time
        else:
             print("Aviso: Nenhuma linha de sílaba para processar.")
             last_syl_end_time = 1.0 # Default minimum duration if no syllables

        # Set video end time based on audio or subtitles, adding a small buffer
        video_end_time = last_syl_end_time + 0.5 # Default end based on subtitles
        if audio_duration is not None:
             video_end_time = max(video_end_time, audio_duration + 0.1)
             print(f"Usando duração do áudio: {audio_duration:.2f}s")
        else:
             print(f"Aviso: Sem duração de áudio. Vídeo terminará em {video_end_time:.2f}s (baseado nas legendas).")

        total_frames = math.ceil(video_end_time * self.fps)
        if total_frames <= 0:
             print("Erro: Duração total do vídeo calculada como 0 ou negativa. Abortando.")
             return

        print(f"Duração estimada: {video_end_time:.2f}s | Total de frames: {total_frames}")

        # --- Initialize FFmpeg Writer ---
        try:
            # Pass gpu_ctx if FFmpegWriter can use it for pinned memory
            video_writer = FFmpegWriter(output_file, width, height, self.fps, self.config, self.gpu_ctx)
        except Exception as e:
            print(f"Erro Crítico: Falha ao inicializar FFmpegWriter: {e}")
            return

        video_lock = threading.Lock() # Lock for thread-safe writing to FFmpeg process

        # --- Main Processing Loop ---
        current_frame_index = 0
        num_total_lines = len(syllable_lines)

        pbar = tqdm(total=total_frames, unit="frames", desc="Gerando vídeo",
                   bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]")

        # State variables for line display logic
        # displayed_content format: list of N tuples: (global_line_idx | None, line_data | None)
        displayed_content = []
        for idx in range(N):
            line_data = syllable_lines[idx] if idx < num_total_lines else None
            displayed_content.append((idx if line_data else None, line_data))

        completed_global_line_indices = set() # Keep track of lines fully processed
        prev_render_data_for_fill = None # Store last valid render data for filling gaps

        # --- Initial Static Frames (before first syllable) ---
        initial_static_frames = 0
        if first_syl_start_time > 0.01: # Add a small tolerance
            initial_static_frames = max(0, int(first_syl_start_time * self.fps))

        if initial_static_frames > 0:
            print(f"Gerando {initial_static_frames} frames estáticos iniciais...")
            try:
                # Render the initial state (first N lines, no highlight)
                initial_base_cp, initial_mask_cp, initial_syl_info, _ = self.text_renderer.render_text_images(
                    displayed_content, -1, width, height # -1 indicates no active line
                )
                if initial_base_cp is not None and initial_mask_cp is not None:
                     # Use process_frames_streaming to render static frames
                     # Pass empty syllable info or handle appropriately in CUDAProcessor if needed
                     self.cuda_processor.process_frames_streaming(
                         initial_base_cp, initial_mask_cp,
                         [], # No active syllables initially
                         (-1, -1),
                         video_writer, video_lock, self.fps,
                         0, initial_static_frames, # Frame range
                         -1, # No active line index
                         set(), # No completed lines initially
                         long_pauses # Pass pauses for potential initial progress bar
                     )
                     prev_render_data_for_fill = (initial_base_cp.copy(), initial_mask_cp.copy()) # Store for potential gap filling
                     del initial_base_cp, initial_mask_cp, initial_syl_info
                else:
                     print("Aviso: Renderização inicial falhou. Preenchendo com background.")
                     # Fallback: write raw background frames
                     with video_lock:
                          for _ in range(initial_static_frames):
                               video_writer.write(self.static_bg_frame_bgr_np)
                     prev_render_data_for_fill = None

                pbar.update(initial_static_frames)
                current_frame_index = initial_static_frames
            except Exception as e:
                print(f"Erro ao gerar frames iniciais: {e}")
                traceback.print_exc()
                # Fallback if rendering fails
                with video_lock:
                    for _ in range(initial_static_frames):
                        video_writer.write(self.static_bg_frame_bgr_np)
                pbar.update(initial_static_frames)
                current_frame_index = initial_static_frames
                prev_render_data_for_fill = None

        # --- Process Each Line/Sentence ---
        last_line_processed_frame = current_frame_index
        trigger_1_pending_for_line = -1 # Track if top needs update
        trigger_2_pending = False       # Track if bottom needs update
        last_trigger_1_line_completed = -1
        last_trigger_2_line_completed = -1

        for current_global_line_idx, current_line_syllables in enumerate(syllable_lines):
            if not current_line_syllables:
                print(f"Aviso: Pulando linha global vazia {current_global_line_idx}")
                continue

            line_start_time = current_line_syllables[0][0]
            line_end_time = current_line_syllables[-1][1]
            line_start_frame = int(line_start_time * self.fps)

            # --- Handle Line Transitions and Content Updates ---
            # Check if Trigger 2 (bottom update) is pending from the *previous* line completion
            if trigger_2_pending and current_global_line_idx != last_trigger_2_line_completed:
                print(f"Trigger 2: Atualizando slots inferiores antes da linha {current_global_line_idx}")
                current_indices_on_screen = [content[0] for content in displayed_content]
                next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2 available

                new_idx_bottom1 = next_indices[0]
                new_data_bottom1 = syllable_lines[new_idx_bottom1] if new_idx_bottom1 < num_total_lines else None
                displayed_content[N-2] = (new_idx_bottom1 if new_data_bottom1 else None, new_data_bottom1)

                new_idx_bottom2 = next_indices[1]
                new_data_bottom2 = syllable_lines[new_idx_bottom2] if new_idx_bottom2 < num_total_lines else None
                displayed_content[N-1] = (new_idx_bottom2 if new_data_bottom2 else None, new_data_bottom2)

                trigger_2_pending = False # Reset trigger
                last_trigger_2_line_completed = current_global_line_idx # Mark update time

            # Find the local slot index for the current global line
            active_local_idx = -1
            for local_idx, (global_idx, _) in enumerate(displayed_content):
                if global_idx == current_global_line_idx:
                    active_local_idx = local_idx
                    break

            if active_local_idx == -1:
                print(f"ERRO FATAL: Linha ativa {current_global_line_idx} não encontrada nos slots! {displayed_content}")
                # This indicates a logic error in content updates. Need to decide how to handle.
                # Option 1: Try to recover (e.g., force it into the last slot?) - Risky
                # Option 2: Abort or skip processing this line and fill time - Safer
                # For now, fill time and skip processing the line's content
                frames_to_fill_until_next = 0
                next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1)
                if next_valid_line_idx != -1:
                    next_line_start_time = syllable_lines[next_valid_line_idx][0][0]
                    next_start_frame = int(next_line_start_time * self.fps)
                    frames_to_fill_until_next = max(0, next_start_frame - current_frame_index)
                else: # No more lines
                    frames_to_fill_until_next = max(0, total_frames - current_frame_index)

                if frames_to_fill_until_next > 0:
                     print(f"  Preenchendo {frames_to_fill_until_next} frames devido à linha não encontrada...")
                     # Use previous render data if available for filling
                     if prev_render_data_for_fill:
                          fill_base, fill_mask = prev_render_data_for_fill
                          self.cuda_processor.process_frames_streaming(
                              fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps,
                              current_frame_index, frames_to_fill_until_next,
                              -1, completed_global_line_indices, long_pauses
                          )
                     else: # Fallback to background
                          with video_lock:
                              for _ in range(frames_to_fill_until_next): video_writer.write(self.static_bg_frame_bgr_np)
                     pbar.update(frames_to_fill_until_next)
                     current_frame_index += frames_to_fill_until_next

                continue # Skip to the next line in syllable_lines

            # --- Fill Gap Before Line Starts (if needed) ---
            frames_to_fill_gap = max(0, line_start_frame - current_frame_index)
            if frames_to_fill_gap > 0:
                 print(f"Preenchendo {frames_to_fill_gap} frames de gap antes da linha {current_global_line_idx}...")
                 if prev_render_data_for_fill:
                      fill_base, fill_mask = prev_render_data_for_fill
                      self.cuda_processor.process_frames_streaming(
                          fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps,
                          current_frame_index, frames_to_fill_gap,
                          -1, completed_global_line_indices, long_pauses # No active line during gap
                      )
                 else: # Fallback
                     with video_lock:
                         for _ in range(frames_to_fill_gap): video_writer.write(self.static_bg_frame_bgr_np)
                 pbar.update(frames_to_fill_gap)
                 current_frame_index += frames_to_fill_gap

            # --- Render the current state with the active line ---
            render_success = False
            render_data = None
            try:
                # Render base text and mask for the current arrangement of lines
                render_data = self.text_renderer.render_text_images(
                    displayed_content, active_local_idx, width, height
                )
                base_cp, mask_cp, all_syl_info, active_indices = render_data
                if base_cp is None or mask_cp is None:
                     raise ValueError("TextRenderer returned None for base or mask.")
                render_success = True
            except Exception as e:
                 print(f"Erro Crítico ao renderizar texto para linha {current_global_line_idx}: {e}")
                 traceback.print_exc()
                 render_success = False # Fallback handled below

            # --- Process the frames for this line's duration ---
            if render_success:
                base_cp, mask_cp, all_syl_info, active_indices = render_data

                # Determine the end frame for processing this line's active state
                # Usually ends when the next line starts, or at video end
                next_line_start_time = float('inf')
                next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1)
                if next_valid_line_idx != -1:
                     next_line_start_time = syllable_lines[next_valid_line_idx][0][0]

                # End processing at the earliest of: line end + buffer, next line start, video end
                processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time) # Add small buffer?
                processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames)

                # Calculate number of frames to generate for this active line state
                effective_start_frame = max(line_start_frame, current_frame_index) # Start from current pos if line started earlier
                num_frames_for_line = max(0, processing_end_frame - effective_start_frame)

                if num_frames_for_line > 0:
                    # Check if Trigger 1 needs to be set (update top half)
                    is_penultimate_line_slot = (active_local_idx == N - 2)
                    trigger_1_frame = -1
                    if is_penultimate_line_slot and current_global_line_idx != last_trigger_1_line_completed:
                        line_duration = max(line_end_time - line_start_time, 0.01)
                        midpoint_time = line_start_time + line_duration / 2.0
                        trigger_1_frame = int(midpoint_time * self.fps)
                        trigger_1_pending_for_line = current_global_line_idx # Mark which line set the trigger

                    # Call the optimized CUDA processor
                    self.cuda_processor.process_frames_streaming(
                        base_cp, mask_cp, all_syl_info, active_indices,
                        video_writer, video_lock, self.fps,
                        effective_start_frame, num_frames_for_line,
                        current_global_line_idx, # Pass the *active* line index
                        completed_global_line_indices, # Pass completed lines for highlighting
                        long_pauses # Pass pauses for progress bar
                    )

                    processed_frames_end = effective_start_frame + num_frames_for_line

                    # --- Handle Trigger 1 (Top Update) ---
                    # Check if the trigger frame occurred within the processed frames
                    if trigger_1_pending_for_line == current_global_line_idx and trigger_1_frame != -1 and processed_frames_end > trigger_1_frame:
                        print(f"Trigger 1: Atualizando slots superiores durante linha {current_global_line_idx}")
                        current_indices_on_screen = [content[0] for content in displayed_content]
                        next_indices = self._get_next_global_indices(current_indices_on_screen, 2) # Get next 2

                        new_idx_top1 = next_indices[0]
                        new_data_top1 = syllable_lines[new_idx_top1] if new_idx_top1 < num_total_lines else None
                        displayed_content[0] = (new_idx_top1 if new_data_top1 else None, new_data_top1)

                        new_idx_top2 = next_indices[1]
                        new_data_top2 = syllable_lines[new_idx_top2] if new_idx_top2 < num_total_lines else None
                        displayed_content[1] = (new_idx_top2 if new_data_top2 else None, new_data_top2)

                        trigger_1_pending_for_line = -1 # Reset trigger
                        last_trigger_1_line_completed = current_global_line_idx # Mark update time

                    # Update progress bar and current frame index
                    # pbar update happens inside cuda_processor call now (implicitly via frame count)
                    current_frame_index = processed_frames_end
                    # Store the rendered mask/base for potential gap filling later
                    prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy())
                    del base_cp, mask_cp, all_syl_info # Free GPU memory for this line's render
                else:
                     # No frames needed processing (e.g., line ends before it starts relative to current_frame_index)
                     print(f"Aviso: Linha {current_global_line_idx} não precisou de processamento de frames ({num_frames_for_line}).")
                     # Ensure prev_render_data is updated even if no processing happened, using the latest render
                     prev_render_data_for_fill = (base_cp.copy(), mask_cp.copy()) if base_cp is not None and mask_cp is not None else prev_render_data_for_fill


                # Mark the current global line as completed *after* processing its frames
                completed_global_line_indices.add(current_global_line_idx)

                # --- Handle Trigger 2 (Bottom Update) ---
                # Check if this line was the last one in the visible slots
                is_last_line_slot = (active_local_idx == N - 1)
                if is_last_line_slot:
                    trigger_2_pending = True # Set trigger for the *next* line to handle

            else: # render_success was False
                print(f"Aviso: Falha na renderização para linha {current_global_line_idx}. Preenchendo tempo...")
                # Calculate fill duration similar to the success case
                next_line_start_time = float('inf')
                next_valid_line_idx = next((idx for idx in range(current_global_line_idx + 1, num_total_lines) if syllable_lines[idx]), -1)
                if next_valid_line_idx != -1: next_line_start_time = syllable_lines[next_valid_line_idx][0][0]
                processing_end_time = min(line_end_time + 0.1, next_line_start_time, video_end_time)
                processing_end_frame = min(math.ceil(processing_end_time * self.fps), total_frames)
                effective_start_frame = max(line_start_frame, current_frame_index)
                num_frames_to_fill = max(0, processing_end_frame - effective_start_frame)

                if num_frames_to_fill > 0:
                    if prev_render_data_for_fill:
                         fill_base, fill_mask = prev_render_data_for_fill
                         self.cuda_processor.process_frames_streaming(
                             fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps,
                             effective_start_frame, num_frames_to_fill,
                             -1, completed_global_line_indices, long_pauses # No active line
                         )
                    else: # Fallback
                        with video_lock:
                             for _ in range(num_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np)
                    pbar.update(num_frames_to_fill)
                    current_frame_index += num_frames_to_fill
                # Still mark line as 'completed' in terms of timing, even if render failed
                completed_global_line_indices.add(current_global_line_idx)


            # Clean up GPU memory periodically if needed (usually handled by pool)
            cp.get_default_memory_pool().free_all_blocks()

        # --- Fill Remaining Frames (after last line) ---
        final_frames_to_fill = total_frames - current_frame_index
        if final_frames_to_fill > 0:
            print(f"Preenchendo {final_frames_to_fill} frames finais...")
            if prev_render_data_for_fill:
                fill_base, fill_mask = prev_render_data_for_fill
                # Optionally add a fade-out effect here by manipulating the mask or base over time
                # For simplicity, just hold the last rendered state
                self.cuda_processor.process_frames_streaming(
                    fill_base, fill_mask, [], (-1,-1), video_writer, video_lock, self.fps,
                    current_frame_index, final_frames_to_fill,
                    -1, completed_global_line_indices, long_pauses # No active line
                )
                del fill_base, fill_mask # Cleanup last stored render data
            else: # Fallback
                with video_lock:
                    for _ in range(final_frames_to_fill): video_writer.write(self.static_bg_frame_bgr_np)
            pbar.update(final_frames_to_fill)
            current_frame_index += final_frames_to_fill

        # --- Cleanup ---
        pbar.close()
        video_writer.release() # Close FFmpeg process and print stderr
        if prev_render_data_for_fill: # Should be None, but just in case
            del prev_render_data_for_fill
        del displayed_content[:] # Clear display list

        # Explicitly clear GPU context graph if desired
        if self.gpu_ctx.graph:
             del self.gpu_ctx.graph
             self.gpu_ctx.graph = None
        # Optionally clear pinned memory
        # if self.gpu_ctx.pinned_mem: del self.gpu_ctx.pinned_mem; self.gpu_ctx.pinned_mem = None

        cp.get_default_memory_pool().free_all_blocks()
        cp.get_default_pinned_memory_pool().free_all_blocks()

        end_total_time = time.time()
        print(f"Criação do vídeo concluída em {time.strftime('%H:%M:%S', time.gmtime(end_total_time - start_total_time))}.")
# ---------------------------------------------------------------------------


# =============================  MAIN  ======================================
def main():
    start_main_time = time.time()
    config = DEFAULT_CONFIG.copy() # Start with defaults

    # --- Argument Parsing (Example - replace with your preferred method) ---
    # import argparse
    # parser = argparse.ArgumentParser(description="Karaoke Video Creator")
    # parser.add_argument("-s", "--subtitles", default=config["default_subtitle_file"], help="Input subtitle file (PSV format)")
    # parser.add_argument("-a", "--audio", default="audio.wav", help="Input audio file")
    # parser.add_argument("-o", "--output", default=config["default_output_file"], help="Output video file")
    # parser.add_argument("--font", default=config["font_path"], help="Path to TTF font file")
    # # Add other config options as needed
    # args = parser.parse_args()
    #
    # # Update config from args
    # config["default_subtitle_file"] = args.subtitles
    # config["default_output_file"] = args.output
    # config["font_path"] = args.font
    # audio_file = args.audio
    # subtitle_file = args.subtitles
    # output_file = args.output

    # --- Using defaults defined in DEFAULT_CONFIG for simplicity ---
    subtitle_file = config.get("default_subtitle_file", "legenda.psv")
    output_file = config.get("default_output_file", "video_karaoke_char_level.mp4")
    audio_file = "audio.wav" # Hardcoded for now, consider argparse

    print("--- Configuração ---")
    for key, value in config.items():
        print(f"  {key}: {value}")
    print(f"  Arquivo de Legenda: {subtitle_file}")
    print(f"  Arquivo de Áudio: {audio_file}")
    print(f"  Arquivo de Saída: {output_file}")
    print("--------------------\n")


    # --- Basic Checks ---
    if not os.path.exists(subtitle_file):
        print(f"Erro Crítico: Arquivo de legenda '{subtitle_file}' não encontrado.")
        return
    if not os.path.exists(audio_file):
        print(f"Aviso: Arquivo de áudio '{audio_file}' não encontrado. Duração baseada nas legendas.")

    # --- Initialize GPU & CPU Affinity ---
    try:
        cp.cuda.Device(0).use()
        print(f"Using GPU: {cp.cuda.Device(0).pci_bus_id}")
    except cp.cuda.runtime.CUDARuntimeError as e:
        if 'no CUDA-capable device is detected' in str(e): print("Erro Crítico: Nenhuma GPU CUDA detectada.")
        elif 'CUDA driver version is insufficient' in str(e): print("Erro Crítico: Driver NVIDIA CUDA desatualizado.")
        else: print(f"Erro Crítico: Falha ao inicializar CUDA: {e}")
        return
    except Exception as e:
        print(f"Erro inesperado na inicialização da GPU: {e}")
        return

    try:
        process = psutil.Process()
        affinity = list(range(os.cpu_count()))
        process.cpu_affinity(affinity)
        print(f"Afinidade da CPU definida para todos os {len(affinity)} cores.")
    except (ImportError, AttributeError, OSError, ValueError) as e:
        print(f"Aviso: Não foi possível definir afinidade da CPU: {e}")

    # --- Initialize Core Components ---
    try:
        text_renderer = TextRenderer(config)
        syllable_dict, not_found_words = load_syllables() # Load syllable dictionary
        if not syllable_dict: print("Aviso: Dicionário de sílabas vazio ou não carregado.")
        subtitle_processor = SubtitleProcessor(text_renderer, config, syllable_dict, not_found_words)
    except Exception as e:
        print(f"Erro Crítico ao inicializar componentes: {e}")
        traceback.print_exc()
        return

    # --- Process Subtitles ---
    lines = []
    long_pauses = []
    try:
        video_width, _ = map(int, config["video_resolution"].split("x"))
        print("Processando legendas...")
        lines, long_pauses = subtitle_processor.process_subtitles_to_syllable_lines(subtitle_file, video_width)
        print(f"Processamento concluído: {len(lines)} linhas visuais, {len(long_pauses)} pausas longas detectadas.")
        if not lines and not long_pauses:
             print("Aviso: Nenhuma linha visual ou pausa longa encontrada. O vídeo pode ficar vazio ou curto.")
             # Decide if you want to exit here or proceed with potentially empty video
             # return
        if not_found_words:
             print(f"\nAviso: {len(not_found_words)} palavras não encontradas no dicionário de sílabas:")
             print(" ", ", ".join(sorted(list(not_found_words))[:20]) + ("..." if len(not_found_words) > 20 else ""))
             print("  -> Estas palavras serão destacadas como um todo.\n")

    except Exception as e:
        print(f"Erro Crítico ao processar legendas: {e}")
        traceback.print_exc()
        return

    # --- Create Video ---
    if not lines and not long_pauses:
        print("Nenhuma linha visual ou pausa longa para processar. Saindo.")
        return

    try:
        # KaraokeVideoCreator now initializes GPU context and CUDA processor internally
        karaoke_creator = KaraokeVideoCreator(config, text_renderer)
        print("Iniciando criação do vídeo...")
        karaoke_creator.create_video(lines, long_pauses, output_file, audio_file)

    except Exception as e:
        print(f"\nErro Crítico durante a criação do vídeo:")
        print(f"{type(e).__name__}: {e}")
        traceback.print_exc()
        print("\nO processo foi interrompido devido a um erro.")

    finally:
        # Final GPU memory cleanup
        cp.get_default_memory_pool().free_all_blocks()
        cp.get_default_pinned_memory_pool().free_all_blocks()
        print("Recursos da GPU liberados.")

    end_main_time = time.time()
    total_duration_str = time.strftime('%H:%M:%S', time.gmtime(end_main_time - start_main_time))
    print(f"\nFinalizando Karaokê Creator. Tempo total do script: {total_duration_str}")

if __name__ == "__main__":
    main()

Answer 1

Por que o erro acontece

O cudaErrorIllegalAddress costuma aparecer quando um kernel acessa memória fora do limite do array que recebeu.
No seu “hot–spot” (_hl_kernel) isso acontece porque:

A variável implícita i do ElementwiseKernel não representa o índice do frame, mas sim o índice absoluto do elemento que está sendo executado (pixel).
```
cpp
float current_cut_x = cut_x_batch[i];   //  &larr;  i pode ser maior que len(cut_x_batch)
```
Como você passa um vector cut_x_batch com apenas B valores e lança o kernel sobre
B × H × W elementos, a maior parte das threads lê fora dos limites -– memória ilegal.
Depois que o kernel corrompe memória, a primeira operação que toca em syl_meta["line_idx"][k] (linha 665) quebra e a exceção “salta” aí.

Correção mínima (segura e rápida)

Execute o kernel uma vez por frame e passe cut_x como escalar, em vez de tentar indexá-lo dentro do kernel.

python
# 1️⃣  Redefina o kernel para receber apenas um corte escalar
self._hl_kernel = cp.ElementwiseKernel(
    'float32 cut_x, bool text_mask, uint16 X, uint16 Y, '
    'uint16 syl_x, uint16 syl_y, uint16 syl_w, uint16 syl_h, '
    'float32 hl_r, float32 hl_g, float32 hl_b, '
    'float32 base_r, float32 base_g, float32 base_b',
    'float32 io_r, float32 io_g, float32 io_b',
    r'''
        bool in_bbox = (X >= syl_x) && (X < syl_x + syl_w) &&
                       (Y >= syl_y) && (Y < syl_y + syl_h) && text_mask;

        if (in_bbox) {
            if (X < cut_x) {
                io_r = hl_r;  io_g = hl_g;  io_b = hl_b;   // parte já “cantada”
            } else {
                io_r = base_r; io_g = base_g; io_b = base_b; // ainda por cantar
            }
        }
    ''',
    name='apply_syllable_highlight_rgb_oneframe'
)

python
# 2️⃣  Dentro de _render_batch:
for k in syl_indices_in_span:                         # um loop por sílaba
    if syl_meta["line_idx"][k] != active_line_idx:
        continue
    sx, sy, sw, sh = map(int, (syl_meta["x"][k],
                               syl_meta["y"][k],
                               syl_meta["w"][k],
                               syl_meta["h"][k]))
    if sw == 0 or sh == 0:
        continue

    # shape: (B,)  &rarr; percorremos frame-a-frame
    cuts = cut_x_batch_span[:, k - s0]                # s0 é início da span
    for f in range(B):
        self._hl_kernel( cuts[f].astype(cp.float32),  # 🔸 passa ESCALAR
                         text_mask_bool,
                         self.ctx.xg, self.ctx.yg,
                         sx, sy, sw, sh,
                         *self.ctx.hl_rgbf, *self.ctx.base_rgbf,
                         inter_f[f, ..., 0],           # canais do frame f
                         inter_f[f, ..., 1],
                         inter_f[f, ..., 2] )

Este ajuste garante:

Nenhum acesso fora dos limites (porque cada kernel “vê” só um valor cut_x).
Simples de entender/manter; a sobrecarga é irrelevante (B ≤ 128, por padrão).

Outras melhorias recomendadas

Problema	Como resolver
`syl_indices_in_span_host = cp.asnumpy(...)` faz ida/volta CPU⇄GPU para cada batch	Faça o loop inteiro em GPU: `for k in syl_indices_in_span.tolist(): ...` ou use `int(k)` diretamente (já está em device).
Muitas conversões `copy()` de imagens	Armazene `prev_render_data_for_fill` como view (`base_cp.copy()` não é necessário; o array é imutável nesse ponto).
Pools não são esvaziados	Depois dos grandes loops: `cp.get_default_memory_pool().free_all_blocks()` já é suficiente; evite chamá-lo dentro de cada batch.
`frames_per_batch` fixo = 64	Extraia da VRAM disponível (`cp.cuda.runtime.memGetInfo`) para usar lotes maiores quando possível.

Teste rápido

Aplique o patch acima.
Rode apenas 10 frames para validar (use flag --dry-run ou limite manual).
Verifique uso de VRAM no nvidia-smi – não deve mais haver spikes/erros.
Se tudo ok, volte ao vídeo completo.

Resumo: o kernel estava acessando além do vector de cortes.
Splittar a chamada por frame (ou redesenhar o kernel para 3-D launch) elimina o acesso ilegal e estabiliza o pipeline.

fix this issues: ``` C:\Users\lucas\Downloads\Scr...

Pergunta

Pensamento

Addressing CUDARuntimeError

Fixing kernel bug in code

Resposta

Por que o erro acontece

Correção mínima (segura e rápida)

Outras melhorias recomendadas

Teste rápido

Compartilhar esta Q&A