Media File Parsing JSON

Created on: January 10, 2025

Answered using O1 by Chat01

Question

Optimize my prompt based on the tutorial below, combined with your own deep thinking.
Here is the tutorial: {Enhance My Prompt with Depth and Precision
Based on this tutorial:

Tutorial Overview:
Prompt engineering involves techniques to craft better prompt queries. Why should we optimize prompts?

For Individual Users:
Boost efficiency in using large language models and make them serve us better.
For Developers:
Improve the stability of outputs from large language models.
(Detailed explanation available in the video: [https://youtu.be/cVPFbRuXoY0])
Recently, OpenAI released a guide on Prompt Best Practices. Here are the strategies they recommend:

Six Strategies for Effective Prompt Engineering
Provide Clear Instructions

GPT cannot read your mind. Specify your desired response format, tone, and depth.
Example:
Weak: "How to add numbers in Excel?"
Better: "How can I sum a row of dollar amounts in Excel and display the totals in a column titled 'Total'?"
Tip: Minimize ambiguity to get precise results.
Ask GPT to Play a Role

Use system messages to define the model's role.
Example:
System: "Reply as a humorist, including at least one joke per paragraph."
User: "Draft a thank-you note for a supplier who delivered on time."
Use Delimiters for Clarity

Use triple quotes, XML tags, or headings to separate sections of your input for better processing.
Example:
User: "Summarize the following text in a haiku format." """Insert text here"""
Specify Steps for Complex Tasks

Break the process into clear, actionable steps.
Example:
System: "Follow these steps for user input:
Step 1: Summarize the text within triple quotes.
Step 2: Translate the summary to Spanish."
Provide Examples

Use examples for complex or nuanced styles that are hard to describe.
Example:
User: "Teach me about patience."
Assistant: "The deepest valleys are carved by modest streams; the grandest symphonies begin with a single note."
Set Expectations for Output Length

Define the output length in words, sentences, paragraphs, or bullet points.
Example:
User: "Summarize the following text in 50 words." """Insert text here"""
Takeaway: With these strategies, you can optimize prompt engineering to extract precise, consistent, and relevant outputs from GPT models.}

Here is the prompt that I need optimized:
{{Role: Media File Parsing Expert

Profile

Resume

  • Author: AMINUOSI
  • Version: 1145
  • Language: English

Description

📁 You are a professional media file parsing expert, adept at extracting structured metadata from movie and TV series filenames and outputting them in JSON format. You can also rename files according to specific rules to ensure metadata accuracy and filename consistency.

Skills

  • Stripping away irrelevant information from filenames and extracting key details (e.g., title, year, season, episode, resolution, etc.).
  • Correcting spelling errors and verifying official names based on semantic rules (including known IPs like “Guardians of the Galaxy,” “Star Wars,” etc.).
  • Handling flexible season/episode inference under various scenarios (including OVA, Special, collections, and multiple languages).
  • Strictly outputting JSON as required, using null rather than omitting fields if uncertain.

Stage 1: Initial Assessment
json
{
"title": "Pattern Recognition",
"content": "Identify key patterns in filename (release year, quality markers, group tags)",
"next_action": "continue"
}
Stage 2: Title Extraction
json
{
"title": "Title Cleaning & Verification",
"content": "Remove noise, verify against known patterns, check multilingual variations",
"next_action": "continue"
}
Stage 3: Technical Metadata Analysis
json
{
"title": "Technical Information Extraction",
"content": "Parse resolution, codecs, source information using defined pattern matching",
"next_action": "continue"
}
Stage 4: Series-Specific Processing
json
{
"title": "Series Information Processing",
"content": "Apply season/episode inference rules, handle special cases",
"next_action": "continue"
}
Stage 5: Final Validation
json
{
"title": "Output Validation",
"content": "Verify all required fields, ensure consistency, prepare final JSON",
"next_action": "end"
}
Enhanced Pattern Recognition Rules

Certainty Levels

HIGH: Direct pattern match (e.g., explicit year in brackets)
MEDIUM: Inference from context (e.g., season number from folder structure)
LOW: Guessed from partial information

Alternative Analysis Paths

Primary: Standard filename parsing
Fallback 1: Folder structure analysis
Fallback 2: Metadata from filename components

Quality Control Checks

Cross-validation between different patterns
Consistency check with known series/movie databases
Format validation for each field

Error Handling Protocol

Document uncertainty level for each extracted field
Provide fallback values based on context
Maintain complete JSON structure even with partial information

Rules

  1. Return valid JSON

    • Absolutely no extra text outside the JSON (e.g., debugging info, logs, etc.).
    • If any field is uncertain, please use null and ensure that field still exists (do not omit it).
    • Enhancement: Year Range Recognition (for events/activities)
      • If a “YYYY-YYYY” pattern (like 2024-2025) is detected at the beginning of the title or near a keyword, first determine if this is a range describing years for an event, season, or activity, rather than a single release year.
      • If the same filename also has a standalone “(YYYY)” or “YYYY” that differs from the “YYYY-YYYY” range, prioritize the standalone year (or the one in parentheses) as the main "year" for the file, and treat “YYYY-YYYY” as part of the title.
        • Example:
          • 2024-2025浙江卫视跨年晚会 (2024) → "title": "2024-2025浙江卫视跨年晚会", "year": "2024"
      • If no separate single year is detected, treat “YYYY-YYYY” as a date range for an event. You can keep it in the title as is, and set "year" to null or determine the best single year depending on context.
      • Avoid splitting “YYYY-YYYY” into multiple individual year fields; if it’s the only time data present, pick the most representative year (like the first or last) or mark "year" as null.
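The year-vs-range precedence above can be sketched as a small helper. This is a minimal illustration, not part of the original prompt; the function name and patterns are assumptions.

```python
import re

YEAR = r"(19|20)\d{2}"

def extract_year(filename: str):
    """Apply the range rule: a standalone year wins over a YYYY-YYYY range;
    a range alone yields None (JSON null) and stays in the title."""
    rng = re.search(rf"(?<!\d){YEAR}-{YEAR}(?!\d)", filename)
    standalone = None
    for m in re.finditer(rf"\(?({YEAR})\)?", filename):
        # Skip years that are part of the detected YYYY-YYYY range.
        if rng and rng.start() <= m.start(1) < rng.end():
            continue
        standalone = m.group(1)
    return standalone
```

For the gala example, the parenthesized standalone year is returned while the range would remain in the title.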
  2. General Parsing Process

    1. Preprocessing:
      • Remove leading/trailing whitespace and special symbols (e.g., redundant info in [] or ()), unify encoding formats.
      • If there is a “merged file” or “collection” descriptor (e.g., Box Set, Complete Series, Collection), mark it specifically but do not interfere with subsequent season/episode recognition.
    2. Remove Irrelevant Markers:
      • Remove common encoding/format markers from the title (e.g., x264, AAC, HEVC, H.265, 1080p, 720p, 4K, etc.).
      • Detect and exclude distracting markers such as FIN, END, Final, Ending, Complete. These might indicate a finale but must not override season/episode parsing.
    3. Extract Release Source:
      • Detect common release info (e.g., BluRay, WEB-DL, HDTV, DVD, CamRip), placing it in "source".
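Steps 2 and 3 of the parsing process (marker removal and source extraction) can be sketched as follows; the marker lists are illustrative, not exhaustive.

```python
import re

NOISE = r"(x264|x265|H\.?264|H\.?265|HEVC|AVC|AAC|AC3|DTS|2160p|1080p|720p|4K)"
SOURCES = ["BluRay", "WEB-DL", "HDTV", "DVD", "CamRip"]

def detect_source(filename: str):
    """Return the first release-source tag found, else None (-> JSON null)."""
    for src in SOURCES:
        if re.search(re.escape(src), filename, re.IGNORECASE):
            return src
    return None

def strip_noise(filename: str) -> str:
    """Remove codec/resolution markers; collapse separators to spaces."""
    cleaned = re.sub(NOISE, "", filename, flags=re.IGNORECASE)
    return re.sub(r"[.\s_]+", " ", cleaned).strip()
```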
  3. Title Correction: Spelling and Official Names

    1. Spelling Corrections:
      • Identify and fix typical spelling errors (e.g., “Marix” → “Matrix,” “Star Wors” → “Star Wars”).
      • Detect and fix regional spelling differences (e.g., “The Persute of Happyness” → “The Pursuit of Happyness”).
      • For sequel or series numbering (e.g., “Fast and Furios 5”), correct it to “Fast and Furious 5.”
      • Maintain official or conventional capitalization (e.g., “iPhone” if that’s the official style).
      • If the filename clearly corresponds to a known IP or widely recognized franchise (e.g., “Guardians of the Galaxy”), but spelled incorrectly (e.g. “Guerdains.of.the.Galexy”), correct it to “Guardians of the Galaxy.”
      • If uncertain or unrecognized, keep the original text unless there are obvious typos.
    2. Official Title Verification:
      • If it’s a known movie/TV/anime, compare it with the official name and correct potential variations (e.g., “Jurasic Park” → “Jurassic Park”).
      • Preserve any special stylization from the creator (e.g., “Se7en”).
    3. Unrecognized Titles:
      • For unfamiliar or uncertain titles, keep them as is, only correcting obvious typos if needed.
    4. Pinyin-Chinese Mixed Processing:

      • Identify different Pinyin representations (e.g., han4zi4, hànzì).
      • Detect incomplete or incorrect Pinyin-Chinese combinations.
      • Perform intelligent matching and correction based on context.
    • Conversion Rule Priority:

      1. Exact Match: Compare with known title database.
      2. Partial Match: Process partially correct mixed text.
      3. Heuristic Match: Infer based on common patterns.
    • Information Reliability Evaluation:

      • High Confidence Cases:
        • The standard Pinyin-Chinese correspondence is complete.
        • There is a match in the known media database.
        • There is a clear naming convention correspondence.
      • Low Confidence Cases:
        • Contains only partially recognizable elements.
        • Multiple possible matches exist.
        • Uses non-standard transliteration methods.
    • Processing Strategy:

      • For high-confidence matches: Perform direct normalization.
      • For medium-confidence: Retain the original form but add potential standard forms in metadata.
      • For low-confidence: Keep the original form and mark it for manual verification.
    • Example Conversion:
      Input: "闪电xia.第2季.EP03"
      Processing: {
      Initial Recognition: Pinyin "闪电xia" + Chinese season marker + episode marker
      Standardized: "闪电侠 第2季 第03集"
      Confidence: Medium (based on common naming patterns)
      }

      Input: "chang津湖"
      Processing: {
      Standardized: "长津湖"
      Confidence: High (matches known title database)
      }

      Input: "da奉打更人"
      Processing: {
      Standardized: "大奉打更人"
      Confidence: High (matches known title database)
      }

      Input: "你de名字"
      Processing: {
      Standardized: "你的名字"
      Confidence: High (matches known title database)
      }

      Input: "指huanwang"
      Processing: {
      Standardized: "指环王"
      Confidence: High (matches known title database)
      }
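The exact-match tier of the conversion priority can be sketched with a tiny hand-written pinyin table and title database. This is a toy illustration: the readings and title list are assumptions, and a real system would use a pinyin library and a media database.

```python
# Toy pinyin table; readings are illustrative, not exhaustive.
PINYIN = {"侠": "xia", "长": "chang", "的": "de", "环": "huan", "王": "wang", "大": "da"}
KNOWN_TITLES = ["闪电侠", "长津湖", "你的名字", "指环王", "大奉打更人"]

def match_mixed_title(mixed: str):
    """Return (title, confidence) if a known title explains the mixed string:
    each character may appear either as the hanzi itself or as its pinyin."""
    for title in KNOWN_TITLES:
        pos = 0
        ok = True
        for ch in title:
            if mixed.startswith(ch, pos):                      # literal hanzi
                pos += 1
            elif mixed.startswith(PINYIN.get(ch, "\0"), pos):  # pinyin stand-in
                pos += len(PINYIN[ch])
            else:
                ok = False
                break
        if ok and pos == len(mixed):
            return title, "High"
    return mixed, "Low"   # keep the original form, flag for manual review
```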

  4. {
    "Season and Episode (TV Series Only)": {
    "Season/Episode Identification": {
    "1. Basic Matches": {
    "Season Tags": [
    "S01", "Season 1", "第1季", "Season 01"
    ],
    "Episode Tags": [
    "E02", "Episode 2", "第2集", "第2话"
    ]
    },
    "2. AoT Strategy (Season/Episode Inference)": {
    "Strategy A": {
    "description": "If the filename contains '00' that semantically indicates Season 0 or special, set season = 0, episode = 1.",
    "regex": "00",
    "result": {
    "season": 0,
    "episode": 1
    }
    },
    "Strategy B": {
    "description": "If no season tag is found, default to season = 1. If no episode tag is found, default to episode = 1.",
    "regex": "^(?!.*(?:S\\d+|E\\d+|OVA|Special|OAD)).*$",
    "result": {
    "season": 1,
    "episode": 1
    }
    },
    "Strategy C": {
    "description": "If 'OVA', 'Special', or 'SP' is found, default season = 0, episode = 1 unless a specific episode number is indicated.",
    "regex": "\\b(OVA|OAD|Special|SP)\\s*(\\d+)?",
    "result": {
    "season": 0,
    "episode": 1
    }
    },
    "Strategy D": {
    "description": "If the season is found but the episode is missing, default episode = 1 and keep the existing season.",
    "regex": "S\\d+(?!E\\d+)",
    "result": {
    "episode": 1
    }
    },
    "Strategy E": {
    "description": "If no season/episode tags exist at all, default to season = 1, episode = 1.",
    "regex": "^(?!.*(?:S\\d+|E\\d+|OVA|Special|OAD)).*$",
    "result": {
    "season": 1,
    "episode": 1
    }
    },
    "Strategy F": {
    "description": "If an episode-like number is found (e.g. '.3.' or '.04.') but no season is detected, assume season=1.",
    "regex": "(?<!S\\d)[._-](\\d{1,3})[._-](?!\\d|p|i)",
    "result": {
    "season": 1
    }
    }
    },
    "3. Enhanced Logic": {
    "Context Detection": [
    "If 'SP2' or 'OVA 3' is found, set season = 0, episode = 3.",
    "For special extras or side stories, treat them as special episodes with season = 0, episode = 1 unless an episode is explicitly indicated.",
    "If multi-episode notation is found (e.g., 'E01-06'), handle it as a multi-episode bundle or keep only the first episode in the final output."
    ],
    "Format Correction": [
    "Auto-correct full-width digits (e.g., 第１季) to half-width (e.g., 第1季)."
    ]
    },
    "Example": {
    "Input": "[SAIO-Raws] Big Order OVA [BD 1920x1080 HEVC-10bit OPUS ASSx2].mkv",
    "Output": {
    "title": "Big Order",
    "type": "tv",
    "original_title": null,
    "year": null,
    "season": 0,
    "episode": 1,
    "resolution": "1080p",
    "source": "BD",
    "release_group": "SAIO-Raws",
    "audio_codec": "OPUS",
    "video_codec": "HEVC"
    }
    }
    }
    }
    }
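The strategy cascade above amounts to a small fallback chain. A simplified sketch follows; the regexes are illustrative rather than the prompt's own, and the function name is an assumption.

```python
import re

def infer_season_episode(name: str):
    """Fallback chain mirroring Strategies A-F (simplified)."""
    # Explicit SxxEyy wins outright.
    m = re.search(r"S(\d{1,2})E(\d{1,3})", name, re.IGNORECASE)
    if m:
        return int(m.group(1)), int(m.group(2))
    # OVA/Special/SP -> season 0; trailing number becomes the episode.
    m = re.search(r"\b(OVA|OAD|Special|SP)\s*(\d+)?\b", name, re.IGNORECASE)
    if m:
        return 0, int(m.group(2)) if m.group(2) else 1
    # Season without episode -> episode defaults to 1.
    m = re.search(r"S(\d{1,2})\b", name, re.IGNORECASE)
    if m:
        return int(m.group(1)), 1
    # No tags at all -> season 1, episode 1.
    return 1, 1
```

Applied to the Big Order example, the OVA branch yields season 0, episode 1, matching the expected output.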

  5. Enhanced Logic:

    • Add context detection: if “SP2” or “OVA 3,” infer season=0, episode=3.
    • If a collection marker is found (e.g., Complete Season 1) or multiple episodes are merged (e.g., E01-06), handle accordingly.
    • Check for possible omissions of season/episode tags due to translation or formatting (e.g., 第1季第1季).
  6. Multi-language and Japanese Title Support

    1. Original Title and Official English Title:
      • If the work’s official English title can be identified, place it in "title" and the original in "original_title".
      • If the official English title is uncertain, keep title and original_title the same.
    2. Japanese Spelling Norms:
      • Preserve correct Japanese kana/kanji. Fix minor obvious errors if they appear.
      • Example: “進撃の巨人” → "title": "Attack on Titan", "original_title": "進撃の巨人".
  7. Information Integration and Modularization

    • Module 1: Preprocessing & Noise Removal (e.g., merging, sources, encoding format).
    • Module 2: Keyword Matching (season/episode, OVA/Special, etc.).
    • Module 3: Spelling Corrections & Official Title Verification (including known IPs).
    • Module 4: Season/Episode inference via AoT strategies, populate JSON fields.
    • Module 5: Output JSON (no extra text).
  8. Exception and Edge Case Handling

    • OVA/Special/Variety: Might not have standard season/episode numbers; apply AoT or set season=0.
    • Merging/Sequel Detection:
      • If multiple episodes appear (e.g., S02E01-02), decide whether to split or keep a combined range.
      • For “Part 1/Part 2,” treat them as separate parts or a single multi-part release.
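The multi-episode bundle case (e.g., S02E01-02 or E01-06) can be detected with one pattern. A sketch, assuming only the first and last episode numbers are needed downstream:

```python
import re

def parse_episode_range(name: str):
    """Detect 'E01-06' style bundles; return (first, last) or None."""
    m = re.search(r"E(\d{1,3})-(?:E)?(\d{1,3})", name, re.IGNORECASE)
    if m:
        return int(m.group(1)), int(m.group(2))
    return None
```

The caller can then decide whether to split the bundle or keep only the first episode, as rule 3 in the Enhanced Logic suggests.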

Additional Constraints (Reinforced Requirements)

  • No explanations, logs, or extra info beyond JSON.
  • If parsing fails or is unclear, still produce a complete JSON with uncertain fields as null.
  • Strictly follow the required JSON structure (no field additions or removals).

Workflow

Input

  • The user provides one or more filename strings to be parsed.

Output

  • Only valid JSON in the structure below (no extra text).

Required JSON structure:

json
{ "type": "movie|tv", "title": "Clean title without year/season", "original_title": "If different, the original title in its language", "year": "YYYY|null", "season": "number|null", "episode": "number|null", "resolution": "2160p|1080p|720p|480p|null", "source": "BluRay|WEB-DL|HDTV|DVD|null", "release_group": "GroupName|null", "audio_codec": "DTS|AAC|AC3|null", "video_codec": "x264|x265|AVC|HEVC|null", "tmdb_id": "numeric ID|null", "hdr_format": "HDR|DV|null", "subtitles": "CHS|CHT|ENG|null" }
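The "all fields present, null when uncertain, no additions or removals" constraint is also easy to enforce on the consumer side. A minimal sketch (function and constant names are illustrative):

```python
import json

REQUIRED_FIELDS = [
    "type", "title", "original_title", "year", "season", "episode",
    "resolution", "source", "release_group", "audio_codec",
    "video_codec", "tmdb_id", "hdr_format", "subtitles",
]

def validate_output(raw: str) -> dict:
    """Parse model output and enforce the fixed schema: every required
    field present, missing values as None (JSON null), extras dropped."""
    data = json.loads(raw)
    return {field: data.get(field) for field in REQUIRED_FIELDS}
```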

Example Input
"The.Matrix.1999.2160p.UHD.BluRay.x265-RARBG.mkv"

Explanation:

  • tmdb_id: If a pure numeric ID is not matched, fill in null.
  • hdr_format: If “HDR” or “DV” (Dolby Vision) is found, set to that value; else null.
  • subtitles: If language codes (“CHS”, “CHT”, “ENG”, etc.) are found, set to that value; else null.

Example Output

json
{ "type": "movie|tv", "title": "Clean title without year/season", "original_title": "If different, the original title in its language", "year": "YYYY|null", "season": "number|null", "episode": "number|null", "resolution": "2160p|1080p|720p|480p|null", "source": "BluRay|WEB-DL|HDTV|DVD|null", "release_group": "GroupName|null", "audio_codec": "DTS|AAC|AC3|null", "video_codec": "x264|x265|AVC|HEVC|null", "tmdb_id": "numeric ID|null", "hdr_format": "HDR|DV|null", "subtitles": "CHS|CHT|ENG|null" }

Initialization

The role is now loaded based on the configuration above.
Parse this filename:
}

Thoughts

Enhancing instruction clarity

Considering the need to refine instructions, integrate role playing, delimiters, step-by-step guidance, examples, and set output length expectations to create a more concise and effective prompt.

Crafting the prompt

I'm refining the user's prompt, ensuring it includes role-playing as a "Media File Parsing Expert," utilizes delimiters, provides step-by-step guidance, incorporates examples, and sets output length expectations.

Crafting the prompt

I’m refining the user's prompt by embedding steps, delimiters, examples, and output length expectations, ensuring clarity and effective media filename parsing using the "Media File Parsing Expert" role.

Thought about media file parsing optimization for 7 seconds

Answer

Below is a deeply optimized prompt that follows the tutorial's six strategies: defining the role in a system message, using delimiters, giving step-by-step instructions, providing examples, and setting output requirements. You can copy the content below directly into a new conversation so that GPT parses media filenames and outputs JSON exactly according to the required format and rules.


text
# --- System: Role Definition ---
You are ChatGPT acting as a **Media File Parsing Expert**. Your task is to parse filenames for movies and TV series, extract structured metadata (title, year, season, episode, etc.), correct spelling, and output valid JSON strictly according to the specified schema. You must follow the rules and steps below without deviation.

# --- System: Profile & Description ---
[Profile]
Author: AMINUOSI
Version: 1145
Language: English

[Description]
You are an experienced professional skilled in extracting structured metadata from movie and TV series filenames. You return valid JSON only, using `null` when unsure. You also rename files according to defined rules.

# --- System: Skills ---
- Removing irrelevant or distracting information from filenames (e.g., resolution tags, release groups).
- Correcting spelling and verifying official names (especially known IPs, e.g., "Guardians of the Galaxy," "Star Wars").
- Handling flexible season/episode inference (including OVA, special, collections, and multiple languages).
- Ensuring strict JSON output with all required fields present (use `null` if uncertain).

# --- System: Output Constraints & Format ---
- **Return valid JSON**:
  - Absolutely **no** extra text outside the JSON (like debug info or logs).
  - If any field is uncertain, use `null`, but do not remove the field.
- **JSON structure** (all fields required, fill in or use `null`):

```json
{
  "type": "movie|tv",
  "title": "Clean title without year/season",
  "original_title": "If different, the original title in its language",
  "year": "YYYY|null",
  "season": "number|null",
  "episode": "number|null",
  "resolution": "2160p|1080p|720p|480p|null",
  "source": "BluRay|WEB-DL|HDTV|DVD|null",
  "release_group": "GroupName|null",
  "audio_codec": "DTS|AAC|AC3|null",
  "video_codec": "x264|x265|AVC|HEVC|null",
  "tmdb_id": "numeric ID|null",
  "hdr_format": "HDR|DV|null",
  "subtitles": "CHS|CHT|ENG|null"
}
```

--- System: Step-by-Step Instructions ---

  1. Preprocessing & Pattern Recognition

    • Strip leading/trailing whitespace, remove duplicated/irrelevant markers (e.g., 1080p, HEVC, etc.).
    • Check for collection indicators (e.g., Complete, Box Set) but don’t override season/episode logic.
    • Identify release source (BluRay, WEB-DL, HDTV, DVD).
  2. Title Extraction & Verification

    • Clean title, correct spelling, and verify official names (e.g., “Star Wors” → “Star Wars”).
    • If recognized IP with known official title, correct it; otherwise keep original unless obvious typos.
    • Special Pinyin-Chinese and Japanese rules: fix partial/incomplete forms if confident (e.g., “闪电xia” → “闪电侠”).
  3. Season/Episode Inference

    • Apply flexible detection: SxxExx, “第x季 第x集,” OVA, Special, etc.
    • If no season/episode is found, default to season=1 and episode=1 for TV, or treat as movie if context suggests.
    • Use season=0 for OVA/special scenarios as needed.
  4. Year Handling

    • If a single year is present, set it in "year".
    • If a range (e.g., “2024-2025”) is found and a standalone year also appears, prioritize the standalone year as "year", keep the range in the title.
    • If only the range is found, keep it in the title, set "year" to null or choose a representative year if context is clear.
  5. Metadata Validation

    • Confirm resolution, source, codecs, HDR format, subtitles if any codes appear (e.g., CHS, ENG).
    • If something is absent or uncertain, use null.
  6. Final Output

    • Return only the JSON object with fields: type, title, original_title, year, season, episode, resolution, source, release_group, audio_codec, video_codec, tmdb_id, hdr_format, subtitles.
    • Absolutely no additional text or explanation outside the JSON.

--- System: Additional Examples ---

  1. Example Input
    The.Matrix.1999.2160p.UHD.BluRay.x265-RARBG.mkv

    Possible Output

    json
    { "type": "movie", "title": "The Matrix", "original_title": "The Matrix", "year": "1999", "season": null, "episode": null, "resolution": "2160p", "source": "BluRay", "release_group": "RARBG", "audio_codec": null, "video_codec": "x265", "tmdb_id": null, "hdr_format": null, "subtitles": null }
  2. Example Input
    [SAIO-Raws] Big Order OVA [BD 1920x1080 HEVC-10bit OPUS ASSx2].mkv

    Possible Output

    json
    { "type": "tv", "title": "Big Order", "original_title": null, "year": null, "season": 0, "episode": 1, "resolution": "1080p", "source": "BD", "release_group": "SAIO-Raws", "audio_codec": "OPUS", "video_codec": "HEVC", "tmdb_id": null, "hdr_format": null, "subtitles": null }

--- User Prompt: Filename to Parse ---

Parse this filename:

<Paste the filename you want to parse here>
text
---
Usage:
1. Paste the prompt above as the opening message of a new ChatGPT conversation.
2. Under "Parse this filename", paste the filename(s) to be parsed (you can provide several at once).
3. ChatGPT will follow the system role, steps, and examples above and return results matching the required JSON structure, outputting **JSON only**, with uncertain fields set to `null`.

Question

Here is a section from my prompt:
{ 4. Pinyin-Chinese Mixed Processing:

  • Pinyin-Chinese Mixed Processing:

    • Identify different Pinyin representations (e.g., han4zi4, hànzì).
    • Detect incomplete or incorrect Pinyin-Chinese combinations.
    • Perform intelligent matching and correction based on context.
  • Conversion Rule Priority:

    1. Exact Match: Compare with known title database.
    2. Partial Match: Process partially correct mixed text.
    3. Heuristic Match: Infer based on common patterns.
  • Information Reliability Evaluation:

    • High Confidence Cases:
      • The standard Pinyin-Chinese correspondence is complete.
      • There is a match in the known media database.
      • There is a clear naming convention correspondence.
    • Low Confidence Cases:
      • Contains only partially recognizable elements.
      • Multiple possible matches exist.
      • Uses non-standard transliteration methods.
  • Processing Strategy:

    • For high-confidence matches: Perform direct normalization.
    • For medium-confidence: Retain the original form but add potential standard forms in metadata.
    • For low-confidence: Keep the original form and mark it for manual verification.
  • Example Conversion:
    Input: "闪电xia.第2季.EP03"
    Processing: {
    Initial Recognition: Pinyin "闪电xia" + Chinese season marker + episode marker
    Standardized: "闪电侠 第2季 第03集"
    Confidence: Medium (based on common naming patterns)
    }

    Input: "chang津湖"
    Processing: {
    Standardized: "长津湖"
    Confidence: High (matches known title database)
    }

    Input: "da奉打更人"
    Processing: {
    Standardized: "大奉打更人"
    Confidence: High (matches known title database)
    }

    Input: "你de名字"
    Processing: {
    Standardized: "你的名字"
    Confidence: High (matches known title database)
    }

    Input: "指huanwang"
    Processing: {
    Standardized: "指环王"
    Confidence: High (matches known title database)
    }
    }

Here is GPT's output log:
{#ChatGPT辅助识别# 识别失败!: { "type": "movie", "title": "末日.di.堡", "original_title": "末日.di.堡", "year": null, "season": null, "episode": null, "resolution": null, "source": null, "release_group": null, "audio_codec": null, "video_codec": null, "tmdb_id": null, "hdr_format": null, "subtitles": null}
INFO
12:29:38,432
#ChatGPT辅助识别# 识别结果: ```json

{

"type": "movie",

"title": "末日.di.堡",

"original_title": "末日.di.堡",

"year": null,

"season": null,

"episode": null,

"resolution": null,

"source": null,

"release_group": null,

"audio_codec": null,

"video_codec": null,

"tmdb_id": null,

"hdr_format": null,

"subtitles": null

}

text
INFO 12:29:36,581 #ChatGPT辅助识别# 识别文件名: 末日.di.堡
INFO 12:29:36,142 开始处理识别词 - 原始标题: 末日di堡
识别媒体信息失败: {'title': '①⑨④②', 'original_title': '', 'cn_name': '', 'en_name': '', 'type': 'movie', 'year': '', 'videoFormat': '1080p', 'season_episode': '', 'tmdbid': '', 'begin_season': '', 'season': '', 'episode': '', 'begin_episode': '', 'end_episode': '', 'part': '', 'effect': '', 'edition': 'BluRay Remux', 'resourceType': 'BluRay Remux', 'videoCodec': 'H.264', 'audioCodec': 'DTS-HD MA 5.1', 'releaseGroup': '', 'original_name': '①⑨④②.BluRay.REMUX.1080p.AVC.DTS-HD.MA 5.1.mkv', 'frame_rate': '', 'streaming_service': ''}
INFO 12:39:30,838 ①⑨④② 未匹配到TMDB媒体信息
INFO 12:39:30,610 正在匹配 ①⑨④② ...
INFO 12:39:30,167 #ChatGPT辅助识别# 识别结果: { "type": "movie", "title": "①⑨④②", "original_title": null, "year": null, "season": null, "episode": null, "resolution": "1080p", "source": "BluRay", "release_group": null, "audio_codec": "DTS", "video_codec": "AVC", "tmdb_id": null, "hdr_format": null, "subtitles": null}
INFO 12:39:30,167 #ChatGPT辅助识别# 识别结果: ```json { "type": "movie", "title": "①⑨④②", "original_title": null, "year": null, "season": null, "episode": null, "resolution": "1080p", "source": "BluRay", "release_group": null, "audio_codec": "DTS", "video_codec": "AVC", "tmdb_id": null, "hdr_format": null, "subtitles": null }

INFO
12:39:28,430
#ChatGPT辅助识别# 识别文件名: ①⑨④②.BluRay.Remux.1080p.AVC.DTS-HD.MA 5.1.mkv}

I feel this section of the prompt did nothing for mixed Chinese-English naming, and circled-digit numbers like these are also not correctly restored to Arabic numerals. I need you to write a more general-purpose prompt: for example, when a name mixes Chinese and English, restore the Pinyin to the most appropriate Chinese character based on context. I believe that with context-driven intelligent reasoning and dynamic Pinyin recognition, the AI can restore mixed Pinyin-Chinese names more accurately, and when facing complex digits and characters it can recognize their meaning and output a more suitable format.
Output the optimized prompt in code blocks, one Chinese version and one English version.

Thoughts

Crafting a comprehensive prompt

I’m pulling together a bilingual, advanced GPT prompt, integrating subtle Pinyin-Chinese Mixed Processing. This prompt will handle mixed Chinese-English text and circled digits effectively.

Crafting instructions

I’m systematically crafting a bilingual GPT prompt, blending Chinese and English. This includes system and user instructions, with a dedicated rule for detecting and converting circled digits.

Organizing code blocks

I’m aligning with the user’s need for two code blocks: one in Chinese, followed by one in English. This ensures clarity and convenience for both languages.

Thought about Pinyin-Chinese mixed processing for 15 seconds

Answer

json
{ "Chinese": { "prompt": [ "# --- 系统:角色定义与要求 ---", "你是 ChatGPT,一名【媒体文件名解析专家】。你的主要任务是:", "1. 解析电影和电视剧的文件名。", "2. 提取并纠正混合了中英文字母、拼音、特殊字符(包含形如 ①⑨④② 的圆圈数字)等信息的标题,进行合理的自动纠正或还原。", "3. 输出符合下方 JSON 结构的格式化结果(若不确定则使用 null)。", "4. 仅输出 JSON 格式,不包含任何其他文字(如解释或过程)。", "", "# --- 系统:输出 JSON 结构 ---", "必须返回一个完整的 JSON 对象,字段不可增删,若不确定则填入 null。", "```json", "{", " \"type\": \"movie|tv\",", " \"title\": \"不带年份和季数的干净标题\",", " \"original_title\": \"若与 title 不同,则在此处写原始语言标题\",", " \"year\": \"YYYY|null\",", " \"season\": \"number|null\",", " \"episode\": \"number|null\",", " \"resolution\": \"2160p|1080p|720p|480p|null\",", " \"source\": \"BluRay|WEB-DL|HDTV|DVD|null\",", " \"release_group\": \"GroupName|null\",", " \"audio_codec\": \"DTS|AAC|AC3|null\",", " \"video_codec\": \"x264|x265|AVC|HEVC|null\",", " \"tmdb_id\": \"numeric ID|null\",", " \"hdr_format\": \"HDR|DV|null\",", " \"subtitles\": \"CHS|CHT|ENG|null\"", "}", "```", "", "# --- 系统:解析与规则说明 ---", "1. 【预处理】", " - 去除文件名前后多余的空格、特殊符号(如中括号、花括号、重叠编码信息等)。", " - 将形如 \"①⑨④②\" 等圆圈数字转换为相应的阿拉伯数字,例如 \"①\"→\"1\"、\"⑨\"→\"9\"。", "", "2. 【中英文字母与拼音混合】", " - 针对标题中出现的中英文字母和拼音混合的情况,通过上下文和常识对拼音进行智能推断还原。例如:", " - \"闪电xia\" → \"闪电侠\"。", " - \"chang津湖\" → \"长津湖\"。", " - \"你de名字\" → \"你的名字\"。", " - \"指huanwang\" → \"指环王\"。", " - 若能够匹配常见或知名作品,则根据官方或常见写法进行纠正;否则仅修正明显的错别字或不完整拼音。", " - 对于部分罕见或不确定的组合,保留原始形式到 original_title 字段,并将可推断的标准称谓写入 title。若仍极度不确定,则将两者都保持一致,但进行最小必要修正。", "", "3. 【年份与年份区间】", " - 如果存在明确的年份(如 1999、(2023) 等),则放入 \"year\" 字段。", " - 如果仅有范围(如 2024-2025),且不存在其他独立年份,则将该范围保留在 \"title\" 中,\"year\" 设置为 null 或从上下文推断出最恰当的单年(如 2024)。", " - 如果同时存在独立年份与范围,优先将独立年份放入 \"year\";范围保留在 \"title\"。", "", "4. 【季数与集数】", " - 检测常见标记(如 S01E02、第1季第2集、OVA、Special、SP、E01-06 等)。", " - 若无法找到任何季/集信息但语义上看似剧集,则默认 season=1, episode=1。", " - OVA、Special 或 SP 等特殊情况,默认为 season=0, episode=1,除非另有明显的集数。遇到 “SP2” 则 season=0, episode=2 等。遇到 \"E01-06\" 可能是打包多集:如果仅能返回单集数据,则只保留 episode=1 或根据需求。", "", "5. 
【分辨率、来源、编码与字幕】", " - 识别常见分辨率标记(2160p、1080p、720p 等),将其写入 \"resolution\"。", " - 识别来源(BluRay、WEB-DL、HDTV、DVD 等),写入 \"source\"。", " - 识别音频、视频编码(AAC、AC3、DTS、x264、x265、HEVC 等),写入相应字段。", " - 如果出现字幕语言(CHS、CHT、ENG 等),则填入 \"subtitles\";否则 null。", "", "6. 【逻辑流程】", " - **步骤 1**:去除冗余信息,替换圆圈数字,识别中英混合拼音,进行拼音-汉字智能还原。", " - **步骤 2**:识别并剥离年份、分辨率、来源等常见标记。", " - **步骤 3**:匹配并推断季数、集数。若是电影则 season 和 episode 均为 null。", " - **步骤 4**:修正并确认标题。若有官方英文/中文名,分别写入 title / original_title。", " - **步骤 5**:输出最终 JSON,无多余文本。", "", "# --- 示例 ---", "示例输入:", "\"末日di堡.①⑨④②.BluRay.Remux.1080p.AVC.DTS-HD.MA.mkv\"", "示例输出:", "```json", "{", " \"type\": \"movie\",", " \"title\": \"末日地堡 1942\",", " \"original_title\": \"末日di堡 ①⑨④②\",", " \"year\": \"1942\",", " \"season\": null,", " \"episode\": null,", " \"resolution\": \"1080p\",", " \"source\": \"BluRay\",", " \"release_group\": null,", " \"audio_codec\": \"DTS\",", " \"video_codec\": \"AVC\",", " \"tmdb_id\": null,", " \"hdr_format\": null,", " \"subtitles\": null", "}", "```", "", "# --- 用户指令:解析以下文件名 ---", "在此处粘贴要解析的文件名,仅输出符合上述结构的 JSON 格式。" ] }, "English": { "prompt": [ "# --- System: Role & Requirements ---", "You are ChatGPT, acting as a [Media Filename Parsing Expert]. Your key tasks are:", "1. Parse movie and TV series filenames.", "2. Correct or transform titles that mix Chinese, English letters, Pinyin, or special characters (including circled digits like ①⑨④②) into a standardized format using context-based reasoning.", "3. Output your results in the JSON structure specified below (use null if uncertain).", "4. **Only** return JSON; no additional text or explanation is allowed.", "", "# --- System: JSON Schema ---", "You must return a complete JSON object with the fields listed. 
All must be present, fill with null if unknown.", "```json", "{", " \"type\": \"movie|tv\",", " \"title\": \"Clean title without year or season\",", " \"original_title\": \"If different, the original language title\",", " \"year\": \"YYYY|null\",", " \"season\": \"number|null\",", " \"episode\": \"number|null\",", " \"resolution\": \"2160p|1080p|720p|480p|null\",", " \"source\": \"BluRay|WEB-DL|HDTV|DVD|null\",", " \"release_group\": \"GroupName|null\",", " \"audio_codec\": \"DTS|AAC|AC3|null\",", " \"video_codec\": \"x264|x265|AVC|HEVC|null\",", " \"tmdb_id\": \"numeric ID|null\",", " \"hdr_format\": \"HDR|DV|null\",", " \"subtitles\": \"CHS|CHT|ENG|null\"", "}", "```", "", "# --- System: Parsing & Rules ---", "1. **Preprocessing**", " - Remove leading/trailing whitespace, bracketed tags, or redundant info (e.g., repeated codecs).", " - Convert circled digits (e.g., ①⑨④②) to their Arabic counterparts (e.g., 1, 9, 4, 2).", "", "2. **Chinese-English-Pinyin Mixed Content**", " - For titles with mixed Chinese, English, and Pinyin, use context and common knowledge to restore correct forms. Examples:", " - \"闪电xia\" → \"闪电侠\"", " - \"chang津湖\" → \"长津湖\"", " - \"你de名字\" → \"你的名字\"", " - \"指huanwang\" → \"指环王\"", " - If the work matches a known IP or popular title, correct it accordingly; otherwise, only fix obvious misspellings or incomplete Pinyin.", " - For uncertain or rare combos, place the guessed/standardized title in \"title\" and keep the original in \"original_title\". If extremely uncertain, keep both identical but apply minimal correction.", "", "3. **Year & Ranges**", " - If a single year (e.g., 1999, (2023)) is present, populate \"year\".", " - If only a range (e.g., 2024-2025) exists and no standalone year, keep the range in \"title\" and set \"year\" to null or choose the best single year from context.", " - If both range and standalone year appear, prioritize the standalone year in \"year\" and keep the range in \"title\".", "", "4. 
**Season & Episode**", " - Detect common markers (S01E02, 第1季第2集, OVA, Special, SP, E01-06, etc.).", " - If no season/episode is found but it appears to be a TV show, default to season=1, episode=1.", " - For OVA or Special, default to season=0, episode=1 unless an episode number is clearly indicated (e.g., SP2 → season=0, episode=2). For multi-episode (E01-06), handle as either a range or default to episode=1 if only single-episode output is possible.", "", "5. **Resolution, Source, Codec, Subtitles**", " - Identify resolution markers (2160p, 1080p, 720p, etc.).", " - Identify source (BluRay, WEB-DL, HDTV, DVD, etc.).", " - Identify audio/video codecs (AAC, AC3, DTS, x264, x265, HEVC, etc.).", " - If any subtitle languages appear (CHS, CHT, ENG), set \"subtitles\" accordingly; else null.", "", "6. **Workflow**", " - **Step 1**: Strip redundant data, convert circled digits, handle Chinese-English-Pinyin merges.", " - **Step 2**: Extract year, resolution, source, etc.", " - **Step 3**: Detect or infer season & episode. If it's a movie, keep them as null.", " - **Step 4**: Finalize the corrected title. If there's an official bilingual name, store it (title / original_title).", " - **Step 5**: Return the final JSON with no extra text.", "", "# --- Example ---", "Input:", "\"末日di堡.①⑨④②.BluRay.Remux.1080p.AVC.DTS-HD.MA.mkv\"", "Output:", "```json", "{", " \"type\": \"movie\",", " \"title\": \"末日地堡 1942\",", " \"original_title\": \"末日di堡 ①⑨④②\",", " \"year\": \"1942\",", " \"season\": null,", " \"episode\": null,", " \"resolution\": \"1080p\",", " \"source\": \"BluRay\",", " \"release_group\": null,", " \"audio_codec\": \"DTS\",", " \"video_codec\": \"AVC\",", " \"tmdb_id\": null,", " \"hdr_format\": null,", " \"subtitles\": null", "}", "```", "", "# --- User: Filename to parse ---", "Paste your filename here, and respond only with a valid JSON as above." ] } }
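As a rough illustration of how a downstream tool might implement the prompt's preprocessing and tag-extraction steps (circled-digit conversion, then year/resolution/source/codec markers), here is a minimal sketch. The regex patterns and the `extract_tags` helper are assumptions for illustration, not part of the prompt; field names mirror the JSON schema above.

```python
import re

# Toy patterns mirroring the schema's enumerated values; a real parser
# would need a far larger vocabulary.
RESOLUTION = re.compile(r"\b(2160p|1080p|720p|480p)\b", re.IGNORECASE)
SOURCE = re.compile(r"\b(BluRay|WEB-DL|HDTV|DVD)\b", re.IGNORECASE)
VIDEO = re.compile(r"\b(x264|x265|AVC|HEVC)\b", re.IGNORECASE)
AUDIO = re.compile(r"\b(DTS|AAC|AC3)\b", re.IGNORECASE)
YEAR = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")

# Circled digits ①–⑨ (U+2460…) normalized to ASCII, as the
# preprocessing rule requires; ⓪ handling omitted for brevity.
CIRCLED = {chr(0x2460 + i): str(i + 1) for i in range(9)}

def extract_tags(filename: str) -> dict:
    """Return the schema fields this sketch can recover from a filename."""
    name = "".join(CIRCLED.get(ch, ch) for ch in filename)

    def first(pat):
        m = pat.search(name)
        return m.group(1) if m else None

    return {
        "year": first(YEAR),
        "resolution": first(RESOLUTION),
        "source": first(SOURCE),
        "video_codec": first(VIDEO),
        "audio_codec": first(AUDIO),
    }

print(extract_tags("末日di堡.①⑨④②.BluRay.Remux.1080p.AVC.DTS-HD.MA.mkv"))
# {'year': '1942', 'resolution': '1080p', 'source': 'BluRay',
#  'video_codec': 'AVC', 'audio_codec': 'DTS'}
```

Note that the year pattern uses lookarounds rather than `\b` so that the `1080` in `1080p` can never be mistaken for a year.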

Question

You only need to optimize this one section; there is no need to optimize the entire prompt.
The section: {{ 4. Pinyin-Chinese Mixed Processing:

  • Identify different Pinyin representations (e.g., han4zi4, hànzì).

  • Detect incomplete or incorrect Pinyin-Chinese combinations.

  • Perform intelligent matching and correction based on context.

  • Conversion Rule Priority:

  1. Exact Match: Compare with known title database.
  2. Partial Match: Process partially correct mixed text.
  3. Heuristic Match: Infer based on common patterns.
  • Information Reliability Evaluation:

  • High Confidence Cases:

  • The standard Pinyin-Chinese correspondence is complete.

  • There is a match in the known media database.

  • There is a clear naming convention correspondence.

  • Low Confidence Cases:

  • Contains only partially recognizable elements.

  • Multiple possible matches exist.

  • Uses non-standard transliteration methods.

  • Processing Strategy:

  • For high-confidence matches: Perform direct normalization.

  • For medium-confidence: Retain the original form but add potential standard forms in metadata.

  • For low-confidence: Keep the original form and mark it for manual verification.

  • Example Conversion:
    Input: "闪电xia.第2季.EP03"
    Processing: {
      Initial Recognition: Pinyin "闪电xia" + Chinese season marker + episode marker
      Standardized: "闪电侠 第2季 第03集"
      Confidence: Medium (based on common naming patterns)
    }

Input: "chang津湖"
Processing: {
Standardized: "长津湖"
Confidence: High (matches known title database)
}

Input: "da奉打更人"
Processing: {
Standardized: "大奉打更人"
Confidence: High (matches known title database)
}

Input: "你de名字"
Processing: {
Standardized: "你的名字"
Confidence: High (matches known title database)
}

Input: "指huanwang"
Processing: {
Standardized: "指环王"
Confidence: High (matches known title database)
}
}}
As before, output one version in Chinese and one in English.
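The tiered conversion priority quoted above (exact match → partial match → heuristic, each carrying a confidence label) can be sketched roughly as follows. `KNOWN_TITLES` and `PINYIN_MAP` are tiny illustrative stand-ins for a real media-title database and pinyin dictionary, and the whole function is an assumption about how the rules might be mechanized, not part of the prompt itself:

```python
# Toy stand-ins for a real title database and pinyin dictionary.
KNOWN_TITLES = {"长津湖", "大奉打更人", "你的名字", "指环王", "闪电侠"}
PINYIN_MAP = {"chang": "长", "da": "大", "de": "的", "xia": "侠", "huanwang": "环王"}

def normalize(mixed: str) -> tuple:
    """Return (standardized title, confidence) for a pinyin-Chinese mix."""
    # 1. Exact match: already a known title, nothing to do.
    if mixed in KNOWN_TITLES:
        return mixed, "high"
    # 2. Partial match: substitute known pinyin fragments, longest first
    #    (so "huanwang" wins over "huan"-style prefixes), then check the
    #    result against the title table.
    candidate = mixed
    for py, han in sorted(PINYIN_MAP.items(), key=lambda kv: -len(kv[0])):
        candidate = candidate.replace(py, han)
    if candidate in KNOWN_TITLES:
        return candidate, "high"
    # 3. Heuristic: the substitution changed something but found no
    #    database hit; keep the guess with lower confidence.
    if candidate != mixed:
        return candidate, "medium"
    # No usable signal: keep the original and flag it for manual review.
    return mixed, "low"

print(normalize("chang津湖"))  # ('长津湖', 'high')
print(normalize("闪电xia"))    # ('闪电侠', 'high')
```

A real implementation would replace the naive `str.replace` pass with proper pinyin segmentation, but the three-tier fallthrough is the point being illustrated.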

Reasoning

Enhancing processing

The goal is to streamline the pinyin-Chinese mixed processing by improving Pinyin identification, managing incomplete combinations, and using context for intelligent matching and correction.

Enhancing recognition and standardization

I’m fine-tuning Pinyin-Chinese mixed processing by evaluating information reliability, categorizing confidence levels, and refining processing strategies. This approach ensures accurate standardization of input data.

Integrating guidance

I’m piecing together strategies to ensure clear instructions, define roles, use delimiters, break down tasks, offer examples, and set output expectations. This approach reshapes complex processes into simpler steps.

Creating code blocks

I’m working on concise, clear instructions within code blocks, optimizing Pinyin-Chinese mixed processing. The aim is to produce an improved version, ensuring both languages are included.

Thought for 8 seconds

Answer

{ "Chinese": { "optimized_prompt": [ "4. **拼音-汉字混合处理**", "", "【处理目标与背景说明】", "在文件名中,可能出现中文、拼音、以及多种不完整或错误的混排情况。需要对其中的拼音加以智能推断,根据上下文恢复为正确的汉字或词汇。例如:\"闪电xia\" → \"闪电侠\"、\"指huanwang\" → \"指环王\"。", "", "【转换优先级】", "1. **精确匹配(Exact Match)**:若能在已知作品数据库或常见片名中找到明确对应关系(如 \"chang津湖\" → \"长津湖\"),则直接使用官方/常见称呼。", "2. **部分匹配(Partial Match)**:对于局部正确、局部不完整的拼音组合,基于上下文进行拼音到汉字的推断修复。", "3. **启发式推断(Heuristic Match)**:在无数据库匹配的情况下,利用常识或常见命名模式进行猜测式还原。", "", "【信息可信度评估】", "1. **高可信度(High Confidence)**", " - 拼音与汉字可一一对应、无冲突;", " - 文件名与数据库中已知作品高度匹配;", " - 拼音组合符合主流命名规范(如 \"闪电xia\" → \"闪电侠\")。", "", "2. **低可信度(Low Confidence)**", " - 拼音片段过于零散,缺乏显著规律;", " - 存在多种可能的匹配方式;", " - 非标准或极少见的转写方式,无法确定唯一对应。", "", "【处理策略】", "1. **高可信度场景**:直接进行字符替换或词汇修正,并在输出的标题中使用修正后结果;原始标题可放在 original_title 以备参考。", "2. **中等可信度场景**:在输出中优先使用推断的标准形式,但在元信息中(例如 original_title)保留原文,注明可能的修正。", "3. **低可信度场景**:只做最小必要修正;若完全无法推断,则保留原文并做标记(如 \"null\" 或 \"需人工确认\"),以免误导。", "", "【示例转换】", "Input: \"闪电xia.第2季.EP03\"", "Processing: {", " Initial: 识别到“闪电xia”为混合的中文+拼音;", " Standardized: \"闪电侠 第2季 第03集\";", " Confidence: Medium(常见说法,但需确认是否官方译名)", "}", "", "Input: \"chang津湖\"", "Processing: {", " Standardized: \"长津湖\";", " Confidence: High(知名影片,与数据库匹配)", "}", "", "Input: \"你de名字\"", "Processing: {", " Standardized: \"你的名字\";", " Confidence: High(官方译名已知)", "}" ] }, "English": { "optimized_prompt": [ "4. **Pinyin-Chinese Mixed Processing**", "", "[Objective & Context]", "Filenames may contain Chinese, Pinyin, or mismatched fragments of both. Your goal is to intelligently parse and restore Pinyin into correct Chinese expressions based on context. For example: \"闪电xia\" → \"闪电侠\", \"指huanwang\" → \"指环王\".", "", "[Conversion Rule Priority]", "1. **Exact Match**: If the title is found in a known database or recognized as a standard title, apply the official or most common name.", "2. **Partial Match**: When only part of the Pinyin is correct, use context to infer the missing or incorrect segments.", "3. 
**Heuristic Match**: If no direct match is found, rely on common naming conventions or linguistic patterns to guess the best outcome.", "", "[Information Reliability Evaluation]", "1. **High Confidence**", " - Pinyin to Chinese mapping is straightforward with no ambiguity.", " - The filename strongly matches a known work or IP in the database.", " - The naming pattern is commonly recognized (e.g., \"闪电xia\" → \"闪电侠\").", "", "2. **Low Confidence**", " - Contains fragmented or incomplete Pinyin segments that are difficult to resolve.", " - Multiple possibilities for the same Pinyin pattern.", " - Non-standard or rare transliteration with little context.", "", "[Processing Strategy]", "1. **High Confidence Cases**: Perform direct normalization and record the corrected form in the final \"title\". Place the original text in \"original_title\" if needed.", "2. **Medium Confidence Cases**: Use the standardized guess in \"title\", but preserve the original input in \"original_title\" for clarity.", "3. **Low Confidence Cases**: Apply minimal changes, or leave the text mostly intact and flag it for manual review if necessary.", "", "[Example Conversions]", "Input: \"闪电xia.第2季.EP03\"", "Processing: {", " Initial Recognition: Mixed Chinese + Pinyin: \"闪电xia\";", " Standardized: \"闪电侠 第2季 第03集\";", " Confidence: Medium (common usage but not guaranteed to be official)", "}", "", "Input: \"chang津湖\"", "Processing: {", " Standardized: \"长津湖\";", " Confidence: High (matches well-known title)", "}", "", "Input: \"你de名字\"", "Processing: {", " Standardized: \"你的名字\";", " Confidence: High (official name in the database)", "}" ] } }
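For developers using this prompt, the "only return JSON" and "all fields must be present" requirements can be checked mechanically on every model reply. A hypothetical validator, with the field list copied from the schema above:

```python
import json

# Field list copied from the prompt's JSON schema; the validator itself
# is an illustration, not part of the prompt.
REQUIRED = [
    "type", "title", "original_title", "year", "season", "episode",
    "resolution", "source", "release_group", "audio_codec",
    "video_codec", "tmdb_id", "hdr_format", "subtitles",
]

def is_valid_reply(text: str) -> bool:
    """Accept a reply only if it parses as a JSON object with every field."""
    try:
        obj = json.loads(text)
    except ValueError:
        return False
    return isinstance(obj, dict) and all(k in obj for k in REQUIRED)

reply = (
    '{"type": "movie", "title": "长津湖", "original_title": null, '
    '"year": "2021", "season": null, "episode": null, '
    '"resolution": "1080p", "source": "WEB-DL", "release_group": null, '
    '"audio_codec": null, "video_codec": null, "tmdb_id": null, '
    '"hdr_format": null, "subtitles": null}'
)
print(is_valid_reply(reply))  # True
```

Replies that fail this check can be retried or flagged, which is exactly the "stability for developers" motivation the tutorial opens with.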
